Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Building ML and AI pipelines with Spark and TensorFlow

Chris Fregly (PipelineAI)
2:40pm–3:20pm Thursday, March 8, 2018
Secondary topics:  Expo Hall
Average rating: *****
(5.00, 1 rating)

Who is this presentation for?

  • Data scientists and data engineers

Prerequisite knowledge

  • A working knowledge of Spark
  • A basic understanding of TensorFlow

What you'll learn

  • How to create an end-to-end pipeline using Spark and TensorFlow


Chris Fregly demonstrates how to extend existing Spark-based data pipelines to include TensorFlow model training and deployment, and offers an overview of TensorFlow’s TFRecord format, including libraries for converting to and from other popular file formats such as Parquet, CSV, JSON, and Avro stored in HDFS and S3. All demos are 100% open source and downloadable as Docker images from
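
For readers unfamiliar with TFRecord, the following is a minimal, standard-library-only sketch of the file's record framing, not material from the session. Each record is stored as a little-endian 64-bit length, a masked CRC32C of that length, the payload bytes, and a masked CRC32C of the payload. The helper names (`frame_record`, `read_records`, `csv_to_tfrecord_bytes`) are hypothetical, and the JSON payload is an assumption made to keep the example dependency-free. Real pipelines typically store serialized `tf.train.Example` protos and rely on TensorFlow's own writer, such as `tf.io.TFRecordWriter`.

```python
import csv
import io
import json
import struct

# Lookup table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
_CRC_TABLE = []
for _n in range(256):
    _c = _n
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    """Plain (unmasked) CRC32C of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """CRC32C rotated right by 15 bits plus a constant, as TFRecord stores it."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one payload in the TFRecord framing."""
    length = struct.pack("<Q", len(payload))
    return (
        length
        + struct.pack("<I", masked_crc(length))
        + payload
        + struct.pack("<I", masked_crc(payload))
    )


def read_records(blob: bytes):
    """Yield payloads from framed bytes, verifying both checksums."""
    offset = 0
    while offset < len(blob):
        header = blob[offset:offset + 8]
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", blob[offset + 8:offset + 12])
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length checksum")
        start = offset + 12
        payload = blob[start:start + length]
        (data_crc,) = struct.unpack("<I", blob[start + length:start + length + 4])
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt payload checksum")
        yield payload
        offset = start + length + 4


def csv_to_tfrecord_bytes(csv_text: str) -> bytes:
    """Frame each CSV row as one record.

    Assumption: the payload is the row encoded as JSON, standing in for a
    serialized tf.train.Example.
    """
    rows = csv.DictReader(io.StringIO(csv_text, newline=""))
    return b"".join(
        frame_record(json.dumps(row, sort_keys=True).encode("utf-8"))
        for row in rows
    )


blob = csv_to_tfrecord_bytes("id,label\n1,cat\n2,dog\n")
records = [json.loads(payload) for payload in read_records(blob)]
```

The framing adds a fixed 16 bytes per record, which is why TFRecord files are read sequentially instead of by random access. Converting from Parquet, JSON, or Avro follows the same shape: only the row-reading step changes.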

Chris Fregly


Chris Fregly is founder and research engineer at PipelineAI, a San Francisco-based streaming machine learning and artificial intelligence startup. Previously, Chris was a distributed systems engineer at Netflix, a data solutions engineer at Databricks, and a founding member of the IBM Spark Technology Center in San Francisco. Chris is a regular speaker at conferences and meetups throughout the world. He’s also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow meetup, author of the upcoming book Advanced Spark, and creator of the O’Reilly video series Deploying and Scaling Distributed TensorFlow in Production.

Comments on this page are now closed.


Zhen Fan
03/11/2018 11:55am PDT

Hi Chris,
I’m very interested in your presentation. Could you share your slides? My email address is, I think that would be very helpful to my team. I’m looking forward to having a deeper discussion with you. Thanks.