Many resources teach how to use Spark to build collaborative filtering models, but relatively few explain how to build a large-scale, end-to-end recommender system. Seth Hendrickson demonstrates how to create such a system using Spark Streaming and Elasticsearch for data ingestion and storage, Spark DataFrames and ML pipelines for data aggregation and model building, and Elasticsearch for model management and serving. Along the way, Seth explores techniques for scaling model serving, using Spark Streaming for real-time incremental model updates, and incorporating state-of-the-art models into this framework.
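To make the serving step a little more concrete, here is a minimal sketch of how a collaborative filtering model could rank items for a user. It assumes a matrix factorization model, where each user and each item is a vector of latent factors and the predicted preference is their dot product. The abstract does not name the model, so this is an assumption. The names `dot`, `top_n`, `item_factors`, and `user_vec` are hypothetical, and the factor values are made up for illustration. The abstract assigns serving to Elasticsearch; this plain-Python version only illustrates the scoring arithmetic, not how the talk implements it.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_n(
    user_vec: List[float],
    item_factors: Dict[str, List[float]],
    n: int = 2,
) -> List[Tuple[str, float]]:
    """Score every item against the user vector and return the n best."""
    scored = [(item_id, dot(user_vec, vec)) for item_id, vec in item_factors.items()]
    # Highest predicted preference first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical latent factors (assumed values, not taken from any real model).
item_factors = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [1.0, 1.0],
}
user_vec = [2.0, 1.0]

recommendations = top_n(user_vec, item_factors, n=2)
print(recommendations)
```

Scoring every item like this is linear in the catalog size, which is one reason a large-scale system would hand ranking to a dedicated search or serving layer rather than looping in application code.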
Seth Hendrickson is a top Apache Spark contributor and data scientist at Cloudera. He implemented multinomial logistic regression with elastic net regularization in Spark’s ML library and one-pass elastic net linear regression, contributed several other performance improvements to linear models in Spark, and made extensive contributions to Spark ML decision trees and ensemble algorithms. Previously, he worked on Spark ML as a machine learning engineer at IBM. He holds an MS in electrical engineering from the Georgia Institute of Technology.
©2017, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.