Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Machine learning to automate localization with Apache Spark and other open source tools

Michelle Casbon (Google)
14:55–15:35 Wednesday, 24 May 2017
Data science and advanced analytics
Location: Hall S21/23 (B)
Level: Intermediate
Average rating: 5.00 (1 rating)

Who is this presentation for?

  • Engineers and product managers

Prerequisite knowledge

  • A general familiarity with common open source tools, system architectures, and basic machine learning

What you'll learn

  • Learn the techniques Qordoba uses to provide continuous deployment of localized strings, live syncing across platforms, content generation for any locale, and emotional response
  • Explore Qordoba's architecture for handling billions of localized strings in many different languages

Description

To establish a user base across the globe, a product needs to support a variety of locales. The challenge with supporting multiple locales is maintaining and generating localized strings, which are deeply integrated into many facets of a product. To address this challenge, Qordoba uses machine learning with highly scalable technologies such as Apache Spark to automate the process. Specifically, Qordoba needs to generate high-quality translations in many different languages and make them available in real time across platforms (e.g., mobile, print, and the web).

Michelle Casbon describes the techniques Qordoba uses to provide continuous deployment of localized strings, live syncing across platforms (mobile, web, Photoshop, Sketch, help desk, etc.), content generation for any locale, and emotional response. Michelle also explores Qordoba’s architecture for handling billions of localized strings in many different languages, explaining how Qordoba uses:

  • Scala and Akka as an orchestration layer
  • Apache Cassandra and MariaDB as a storage layer
  • Apache Spark for natural language processing
  • Apache Kafka as a message bus for reporting, billing, and notifications
  • Apache Mesos, Marathon, and Docker for containerized deployment

. . .all in a platform that makes it feasible to build products that feel native to every user, regardless of language.

Michelle Casbon

Google

Michelle Casbon is a senior engineer on the Google Cloud Platform developer relations team, where she focuses on open source contributions and community engagement for machine learning and big data tools. Michelle’s development experience spans more than a decade and has primarily focused on multilingual natural language processing, system architecture and integration, and continuous delivery pipelines for machine learning applications. Previously, she was a senior engineer and director of data science at several San Francisco-based startups, building and shipping machine learning products on distributed platforms using both AWS and GCP. She especially loves working with open source projects and is a contributor to Kubeflow. Michelle holds a master’s degree from the University of Cambridge.