Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Continuous delivery for NLP on Kubernetes: Lessons learned

Michelle Casbon (Google)
2:40pm–3:20pm Thursday, March 8, 2018

Who is this presentation for?

  • Engineers, data scientists, and those in operations and engineering management

Prerequisite knowledge

  • Familiarity with system architectures and distributed computing tools

What you'll learn

  • How to speed up the development of ML models with open source tools

Description

Michelle Casbon explains how to speed up the development of ML models by using open source tools such as Kubernetes, Docker, Scala, Apache Spark, and Weave Flux, detailing how to build resilient systems so that you can spend more of your time on product improvement rather than triage and uptime. Specifically, Michelle offers an overview of the continuous delivery pipeline that powers the machine learning and natural language processing components of the Qordoba platform, which makes it feasible to build products that feel native to every user, regardless of language. This platform organizes and generates billions of localized strings across all languages, automating the internationalization process and enabling its users to leverage continuous delivery methods for their own applications.

You’ll learn how Qordoba’s engineering team standardized the deployment process across the organization, reducing overall system complexity and insulating itself from human error. This resulted in faster development of NLP models and better cooperation between data science, engineering, and operations teams. Michelle shares why these changes were so foundational and had such far-reaching impact and outlines some lessons learned, guiding you as you improve your existing application or build one from scratch.

Topics include:

  • Which open source tools work well together to provide a frictionless CD solution
  • How to speed up the development of ML models
  • How to empower data science, engineering, and operations teams
  • How to reduce system complexity
  • How to standardize deployment across your code base
  • How to build resilient systems and insulate yourself from human and mechanical error
  • What not to containerize

Michelle Casbon

Google

Michelle Casbon is a senior engineer on the Google Cloud Platform developer relations team, where she focuses on open source contributions and community engagement for machine learning and big data tools. Michelle’s development experience spans more than a decade and has primarily focused on multilingual natural language processing, system architecture and integration, and continuous delivery pipelines for machine learning applications. Previously, she was a senior engineer and director of data science at several San Francisco-based startups, building and shipping machine learning products on distributed platforms using both AWS and GCP. She especially loves working with open source projects and is a contributor to Kubeflow. Michelle holds a master’s degree from the University of Cambridge.