Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Building ML and AI pipelines with Spark and TensorFlow

Chris Fregly (PipelineAI)
2:40pm–3:20pm Thursday, March 8, 2018
Secondary topics: Expo Hall
Average rating: 5.00 (1 rating)

Who is this presentation for?

  • Data scientists and data engineers

Prerequisite knowledge

  • A working knowledge of Spark
  • A basic understanding of TensorFlow

What you'll learn

  • Learn how to create an end-to-end pipeline using Spark and TensorFlow


Chris Fregly demonstrates how to extend existing Spark-based data pipelines to include TensorFlow model training and deployment. He also offers an overview of TensorFlow’s TFRecord format, including libraries for converting to and from other popular file formats such as Parquet, CSV, JSON, and Avro stored in HDFS and S3. All demos are 100% open source and downloadable as Docker images from
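To make the TFRecord format concrete, here is a minimal sketch in plain Python that runs without TensorFlow. It assumes the record framing TensorFlow uses: an 8-byte little-endian length, a masked CRC-32C of that length, the payload bytes, then a masked CRC-32C of the payload. As a simplifying assumption, each CSV row is stored as JSON bytes. A real pipeline would normally serialize `tf.train.Example` protos and write them with `tf.io.TFRecordWriter`. The helper names (`crc32c`, `masked_crc32c`, `write_record`, `read_records`, `csv_to_tfrecord`) are hypothetical and not part of any library.

```python
import csv
import io
import json
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78.
_CRC32C_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _CRC32C_TABLE.append(_crc)


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C (init and final XOR 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Rotate right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(out, payload: bytes) -> None:
    """Append one framed record: length, length CRC, payload, payload CRC."""
    header = struct.pack("<Q", len(payload))
    out.write(header)
    out.write(struct.pack("<I", masked_crc32c(header)))
    out.write(payload)
    out.write(struct.pack("<I", masked_crc32c(payload)))


def read_records(inp):
    """Yield payloads, raising ValueError if a checksum does not match."""
    while True:
        header = inp.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (len_crc,) = struct.unpack("<I", inp.read(4))
        if len_crc != masked_crc32c(header):
            raise ValueError("corrupt length header")
        payload = inp.read(length)
        (data_crc,) = struct.unpack("<I", inp.read(4))
        if data_crc != masked_crc32c(payload):
            raise ValueError("corrupt record payload")
        yield payload


def csv_to_tfrecord(csv_text: str, out) -> int:
    """Write each CSV row as one record; payload is JSON (an assumption,
    standing in for a serialized tf.train.Example). Returns the row count."""
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        write_record(out, json.dumps(row, sort_keys=True).encode("utf-8"))
        count += 1
    return count
```

Because the framing is the same, `tf.data.TFRecordDataset` should be able to read such a file as raw byte strings. Swapping the JSON payload for serialized `tf.train.Example` bytes would let the records be decoded with `tf.io.parse_single_example` instead.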


Chris Fregly


Chris Fregly is an AWS Technical Evangelist for Machine Learning and AI based in San Francisco. He is the founder of the Advanced KubeFlow Meetup and the author of the O’Reilly video series “High Performance TensorFlow in Production.” Previously, Chris was founder and product manager at PipelineAI, where he worked with many small startups and large enterprises to optimize and tune their ML/AI pipelines.



Zhen Fan
03/11/2018 11:55am PDT

Hi Chris,
I’m very interested in your presentation. Could you share your slides? My email address is, I think they would be very helpful to my team. I’m looking forward to having an in-depth discussion with you. Thanks.