Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Building a scalable recommendation engine with Spark and Elasticsearch

Seth Hendrickson (Cloudera)
14:05–14:45 Wednesday, 24 May 2017
Level: Intermediate
Average rating: 3.57 (7 ratings)

Who is this presentation for?

  • Data scientists and data engineers

Prerequisite knowledge

  • A basic working knowledge of Spark and Elasticsearch
  • Familiarity with recommender systems (useful but not required)

What you'll learn

  • How to build a large-scale, end-to-end recommender system using Spark Streaming, Spark ML, and Elasticsearch


There are many resources available for learning how to use Spark to build collaborative filtering models. However, there are relatively few that explain how to build a large-scale, end-to-end recommender system. Seth Hendrickson demonstrates how to create such a system using Spark Streaming and Elasticsearch for data ingestion and storage, Spark DataFrames and ML pipelines for data aggregation and model building, and Elasticsearch for model management and serving. Along the way, Seth explores techniques for scaling model serving, using Spark Streaming for real-time incremental model updates, and incorporating state-of-the-art models into this framework.
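As a rough illustration of the serving step mentioned above, the sketch below ranks items for one user by the dot product of latent factor vectors, which is the scoring rule a collaborative filtering model such as ALS produces. It is not taken from the talk: the function names, the factor values, and the in-memory dictionary standing in for a factor store such as Elasticsearch are all hypothetical, and a real deployment would push this scoring into the serving layer rather than loop in Python.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user_factors, item_factors, seen, k=2):
    """Return the ids of the k highest-scoring items the user has not seen.

    user_factors: latent factor vector for one user (hypothetical values)
    item_factors: dict of item id -> latent factor vector
    seen:         set of item ids to exclude from the results
    """
    scored = (
        (dot(user_factors, vec), item_id)
        for item_id, vec in item_factors.items()
        if item_id not in seen
    )
    # nlargest avoids sorting the whole catalogue when k is small.
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Made-up factors for illustration only; in practice they would come
# from a trained model and be loaded from the factor store.
ITEM_FACTORS = {
    "item_a": [0.9, 0.1],
    "item_b": [0.2, 0.8],
    "item_c": [0.7, 0.6],
    "item_d": [0.1, 0.1],
}
USER_FACTORS = [1.0, 0.5]

top = recommend(USER_FACTORS, ITEM_FACTORS, seen={"item_a"}, k=2)
print(top)
```

Excluding already-seen items before ranking keeps the result list useful; at catalogue scale the same filter and score would typically be expressed as a query against the store instead of a Python loop.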

Seth Hendrickson


Seth Hendrickson is a top Apache Spark contributor and data scientist at Cloudera. He implemented multinomial logistic regression with elastic net regularization in Spark’s ML library and one-pass elastic net linear regression, contributed several other performance improvements to linear models in Spark, and made extensive contributions to Spark ML decision trees and ensemble algorithms. Previously, he worked on Spark ML as a machine learning engineer at IBM. He holds an MS in electrical engineering from the Georgia Institute of Technology.



Michael Siebers | CTO
26/05/2017 13:55 BST

It was a great presentation. Well done.
I’m waiting for the slides to be posted online.