Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Seth Hendrickson
Data Scientist, Cloudera

Seth Hendrickson is a top Apache Spark contributor and data scientist at Cloudera. He implemented multinomial logistic regression with elastic net regularization in Spark’s ML library and one-pass elastic net linear regression, contributed several other performance improvements to linear models in Spark, and made extensive contributions to Spark ML decision trees and ensemble algorithms. Previously, he worked on Spark ML as a machine learning engineer at IBM. He holds an MS in electrical engineering from the Georgia Institute of Technology.

Sessions

2:55pm–3:35pm Wednesday, September 27, 2017
Data Engineering & Architecture, Spark & beyond
Location: 1A 21/22 Level: Intermediate
Holden Karau (Independent), Seth Hendrickson (Cloudera)
Average rating: ***** (5.00, 1 rating)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau and Seth Hendrickson introduce Spark’s ML pipelines and explain how to extend them with your own custom algorithms. Even if you don't have your own algorithm to add, you'll leave with a deeper understanding of Spark's ML pipelines.
5:25pm–6:05pm Wednesday, September 27, 2017
Machine Learning & Data Science, Spark & beyond
Location: 1A 08/10 Level: Advanced
Secondary topics: Media
Seth Hendrickson (Cloudera), DB Tsai (Netflix)
Average rating: ***** (5.00, 1 rating)
Recent developments in Spark MLlib have given users the power to express a wider class of ML models and decrease model training times via the use of custom parameter optimization algorithms. Seth Hendrickson and DB Tsai explain when and how to use this new API and walk you through creating your own Spark ML optimizer. Along the way, they also share performance benefits and real-world use cases.