Put open source to work
July 16–17, 2018: Training & Tutorials
July 18–19, 2018: Conference
Portland, OR

Powering TensorFlow with big data using Apache Beam, Flink, and Spark

Holden Karau (Independent)
2:35pm–3:15pm Wednesday, July 18, 2018
Artificial intelligence, TensorFlow
Location: Portland 251
Tags: tensorflow
Level: Intermediate
Average rating: 2.20 (5 ratings)

Who is this presentation for?

  • Software engineers

Prerequisite knowledge

  • Familiarity with TensorFlow, Apache Spark, Flink, and Beam (useful but not required)

What you'll learn

  • Understand how to work with TensorFlow and big data systems


TensorFlow is all kinds of fancy, from helping startups raising their series A in Silicon Valley to detecting if something is a cat. However, when things start to get “real,” you may find yourself no longer just dealing with mnist.csv but instead needing to do large-scale data prep as well as training.

Holden Karau details how to use TensorFlow in conjunction with Apache Spark, Flink, and Beam to create a full machine learning pipeline, including the annoying feature engineering and data prep components that we like to pretend don’t exist. Holden also explains why these feature prep stages need to be integrated into the serving layer. She concludes by examining changing industry trends, like Apache Arrow, and how they impact cross-language development for things like deep learning. Even if you’re not trying to raise a round of funding in Silicon Valley, this talk will give you tools to tackle interesting machine learning problems at scale.
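
One way to picture why feature prep belongs in the serving layer: if the batch pipeline and the online path each implement their own transformations, the model can end up seeing differently shaped inputs at prediction time than it saw during training. The sketch below is not taken from the talk. It is a minimal, library-free illustration in which the record fields ("clicks", "views"), the function names, and the toy linear scorer are all hypothetical, and in which the batch map stands in for work that would normally run inside Spark, Flink, or Beam.

```python
import math
from typing import Dict, List

# Hypothetical raw record; field names are illustrative, not from the talk.
Record = Dict[str, float]


def prepare_features(record: Record) -> List[float]:
    """Single feature-prep routine shared by the training and serving paths."""
    clicks = max(record.get("clicks", 0.0), 0.0)
    views = max(record.get("views", 0.0), 0.0)
    # Log-scale the raw count and derive a ratio, guarding against division by zero.
    ratio = clicks / views if views > 0 else 0.0
    return [math.log1p(clicks), ratio]


def build_training_rows(records: List[Record]) -> List[List[float]]:
    """Batch path: a stand-in for a map step in a big data engine."""
    return [prepare_features(r) for r in records]


def serve(record: Record, weights: List[float]) -> float:
    """Online path: reuses prepare_features, then applies a toy linear scorer."""
    features = prepare_features(record)
    return sum(w * x for w, x in zip(weights, features))
```

Because both paths call the same prepare_features function, a change to the feature logic reaches training and serving together, which is the property the abstract alludes to.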

Holden Karau


Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

Comments on this page are now closed.


Christopher Bess | LEAD DEVELOPER
07/26/2018 4:59am PDT

From what I could gather the information was relevant and generally helpful. I was grateful that Holden used the command line and showed more detail. But, the swearing (F-bombs, S-word), using several curse words, was unnecessary and unprofessional.