Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Machine learning to automate localization with Apache Spark and other open source tools

Michelle Casbon (Qordoba)
14:55–15:35 Wednesday, 24 May 2017
Data science and advanced analytics
Location: Hall S21/23 (B)
Level: Intermediate
Average rating: 5.00 out of 5 (1 rating)

Who is this presentation for?

  • Engineers and product managers

Prerequisite knowledge

  • A general familiarity with common open source tools, system architectures, and basic machine learning

What you'll learn

  • Learn the techniques Qordoba uses to provide continuous deployment of localized strings, live syncing across platforms, content generation for any locale, and emotional response
  • Explore Qordoba's architecture for handling billions of localized strings in many different languages

Description

To establish a user base across the globe, a product needs to support a variety of locales. The challenge with supporting multiple locales is the maintenance and generation of localized strings, which are deeply integrated into many facets of a product. To address these challenges, Qordoba uses machine learning with highly scalable technologies such as Apache Spark to automate the process. Specifically, Qordoba needs to generate high-quality translations in many different languages and make them available in real time across platforms (e.g., mobile, print, and the web).

Michelle Casbon describes the techniques Qordoba uses to provide continuous deployment of localized strings, live syncing across platforms (mobile, web, Photoshop, Sketch, help desk, etc.), content generation for any locale, and emotional response. Michelle also explores Qordoba’s architecture for handling billions of localized strings in many different languages, explaining how Qordoba uses:

  • Scala and Akka as an orchestration layer
  • Apache Cassandra and MariaDB as a storage layer
  • Apache Spark for natural language processing
  • Apache Kafka as a message bus for reporting, billing, and notifications
  • Apache Mesos, Marathon, and Docker for containerized deployment

. . .all in a platform that makes it feasible to build products that feel native to every user, regardless of language.
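
As a rough illustration of how a storage layer and a message bus like those listed above might cooperate, the sketch below uses in-memory stand-ins: a dict in place of a database and a queue in place of a message topic. Every name here (StringStore, publish, resolve, the "string_updated" event) is hypothetical and chosen only for illustration; none of it comes from Qordoba's code, and no Cassandra, Kafka, or Spark client is involved. The locale fallback rule (for example "fr-CA" to "fr" to a default) is likewise an assumption, not something the session description states.

```python
from collections import deque


class StringStore:
    """In-memory stand-in for a localized-string store (hypothetical)."""

    def __init__(self, default_locale="en"):
        self.default_locale = default_locale
        self.strings = {}      # (key, locale) -> localized text
        self.events = deque()  # stand-in for a message bus topic

    def publish(self, key, locale, text):
        """Store a localized string and queue an update event for subscribers."""
        self.strings[(key, locale)] = text
        self.events.append({"type": "string_updated", "key": key, "locale": locale})

    def resolve(self, key, locale):
        """Look up a string, trying the exact locale, then its base language,
        then the default locale (an assumed fallback order)."""
        candidates = [locale]
        if "-" in locale:
            candidates.append(locale.split("-")[0])
        candidates.append(self.default_locale)
        for candidate in candidates:
            text = self.strings.get((key, candidate))
            if text is not None:
                return text
        raise KeyError(f"no string for {key!r} in {candidates}")


store = StringStore()
store.publish("greeting", "en", "Hello")
store.publish("greeting", "fr", "Bonjour")

print(store.resolve("greeting", "fr-CA"))  # resolved via the base language "fr"
print(store.resolve("greeting", "de"))     # resolved via the default locale
print(len(store.events))                   # one queued event per published string
```

In a real deployment, the queued events would be consumed by downstream services such as reporting or notifications, and the dict would be replaced by a durable store; the sketch only shows the shape of the interaction.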

Michelle Casbon

Qordoba

Michelle Casbon is director of data science at Qordoba. Michelle’s development experience spans more than a decade across various industries, including media, investment banking, healthcare, retail, and geospatial services. Previously, she was a senior data science engineer at Idibon, where she built tools for generating predictions on textual datasets. She loves working with open source projects and has contributed to Apache Spark and Apache Flume. Her writing has been featured in the AI section of O’Reilly Radar. Michelle holds a master’s degree from the University of Cambridge, focusing on NLP, speech recognition, speech synthesis, and machine translation.
