Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Spark Structured Streaming for machine learning

1:50pm–2:30pm Wednesday, March 15, 2017
Spark & beyond
Location: LL21 C/D
Level: Intermediate
Secondary topics: Streaming
Average rating: 4.00 (8 ratings)

Who is this presentation for?

  • Engineers

Prerequisite knowledge

  • Knowledge of Apache Spark, including DataFrames and Datasets

What you'll learn

  • Better understand both Spark ML and Structured Streaming


Structured Streaming is new in Apache Spark 2.0, and work is under way to integrate Spark's machine-learning interfaces with this new streaming system. Holden Karau and Seth Hendrickson look at the current state of Structured Streaming and machine learning before walking you through creating your own streaming model. Holden and Seth will also cover how to use structured machine-learning algorithms (if they have been merged by the time of the talk). By the end of this session, you’ll have a better understanding of Spark’s Structured Streaming API as well as how machine learning works in Spark.

Holden Karau


Holden Karau is a software development engineer at IBM and is active in open source. Prior to IBM, she worked on a variety of big data, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. Holden is a coauthor of Learning Spark and has assisted with Spark workshops. She graduated from the University of Waterloo with a bachelor of mathematics in computer science.

Seth Hendrickson


Seth Hendrickson is a data scientist and Scala developer in IBM’s Spark Technology Center. Seth is focused on developing highly parallel machine-learning algorithms for the Apache Spark cluster computing ecosystem.

Leave a Comment or Question


03/24/2017 4:39am PDT

Hey, where can I find the slides for this presentation? Thanks!