eHarmony has been using machine learning for about eight years. During this time, the company has learned a number of lessons about implementing machine learning at scale, lessons that allow it to address problems both quickly and accurately. Recently, more business units have needed data-driven models. Jonathan Morra introduces Aloha, an open source project that allows the modeling group to quickly deploy type-safe, accurate models to production, and explores how eHarmony creates models with Apache Spark and how it uses them.
Jonathan first explains why it’s so important for data scientists and engineers to work together, outlining specific real-world problems that can arise when they don’t. Jonathan then builds the case for a unified modeling framework with feature extraction built into the model representation and introduces eHarmony’s open source modeling framework, Aloha, demonstrating how Aloha lets eHarmony define a common interface between engineering and data science so that both sides can move rapidly and, more importantly, at separate paces.
Jonathan also explores how eHarmony uses Apache Spark to rapidly train, validate, test, and deploy models automatically and offers an aside on spotz, the hyperparameter optimization tool eHarmony has created and open sourced, giving the audience a taste of how eHarmony uses engineering on the modeling side to train models on large amounts of data.
Finally, Jonathan takes a deep dive into eHarmony’s matching algorithm and discusses recent advancements in predicting user behavior. Jonathan then goes over how eHarmony uses contextual bandits to help users get the best matching experience every day and touches on a very recently observed phenomenon in which eHarmony is able to get a significant lift in matching by training on an intermediate signal. Jonathan will also discuss some open research questions at eHarmony that the team is currently working to address.
Jon Morra is the Vice President of Data Science at ZEFR. In this role, he leads a team of data scientists responsible for creating data-driven models. Jon and his team are focused on using ZEFR’s wealth of information about video on the internet to better serve customers’ needs and meet market demands. Previously, Jon was the Director of Data Science at eHarmony, where he helped grow the data science team to support multiple business facets.
©2016, O'Reilly Media, Inc.