Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Data science at eHarmony: A generalized framework for personalization

1:15pm–1:55pm Wednesday, 09/28/2016
Data science & advanced analytics
Location: Hall 1C Level: Intermediate
Average rating: 4.38 (8 ratings)

Prerequisite knowledge

  • A basic understanding of supervised machine learning and engineering principles, specifically type safety
  • A general understanding of Apache Spark

What you'll learn

  • Get a glimpse into data science at a major matchmaking company
  • Better understand why a separation of concerns is necessary for success in machine learning
  • Gain exposure to a variety of open source projects you can use yourself

Description

    eHarmony has been using machine learning for about eight years. During this time, it has learned a number of lessons about implementing machine learning at scale that allow it to address problems rapidly and accurately. Recently, more business units have come to need data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe, accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.

    Jonathan first explains why it’s so important for data scientists and engineers to work together, outlining specific real-world problems that can arise when they don’t. He then builds the case for a unified modeling framework with feature extraction built into the model representation and introduces eHarmony’s open source modeling framework, Aloha, demonstrating how Aloha lets eHarmony define a common interface between engineering and data science so that both sides can move rapidly and, more importantly, at separate paces.

    Jonathan also explores how eHarmony uses Apache Spark to rapidly train, validate, test, and deploy models automatically. He then takes a brief detour into spotz, the hyperparameter optimization tool eHarmony has created and open sourced, giving the audience a taste of how eHarmony applies engineering on the modeling side to train models on large amounts of data.

    Finally, Jonathan takes a deep dive into eHarmony’s matching algorithm and discusses recent advancements in predicting user behavior. He then explains how eHarmony uses contextual bandits to help users get the best matching experience every day and touches on a recently observed phenomenon: eHarmony gets a significant lift in matching by training on an intermediate signal. Jonathan also discusses some open research questions that the team at eHarmony is currently working to address.


    Jonathan Morra


    Jon Morra is the Vice President of Data Science at ZEFR, where he leads a team of data scientists responsible for creating data-driven models. Jon and his team focus on using ZEFR’s wealth of information about online video to better meet customers’ needs and market demands. Previously, Jon was the Director of Data Science at eHarmony, where he helped grow the data science team to support multiple business facets.