Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Best practices for deep learning on Apache Spark

Joseph Bradley (Databricks), Tim Hunter (Databricks, Inc.)
11:50am–12:30pm Thursday, March 16, 2017
Spark & beyond
Location: 210 A/E
Secondary topics: Deep learning, Hardcore Data Science
Average rating: 3.75 out of 5 (4 ratings)

What you'll learn

  • Learn best practices for building deep learning pipelines with Apache Spark


Combining deep learning with Apache Spark could have a large impact in industry. Joseph Bradley and Tim Hunter share best practices for building deep learning pipelines with Apache Spark. Rather than comparing deep learning systems or specific optimizations, they focus on the typical issues users encounter when integrating deep learning libraries with Spark clusters, issues common to many frameworks: optimizing cluster setup and data ingest, tuning the cluster, and monitoring long-running jobs. All of these are demonstrated using Google's TensorFlow library.

Joseph Bradley


Joseph Bradley is a software engineer working on machine learning at Databricks. Joseph is an Apache Spark committer and PMC member. Previously, he was a postdoc at UC Berkeley. Joseph holds a PhD in machine learning from Carnegie Mellon University, where he focused on scalable learning for probabilistic graphical models, examining trade-offs between computation, statistical efficiency, and parallelization.

Tim Hunter

Databricks, Inc.

Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project. Tim holds a PhD from UC Berkeley, where he built distributed machine-learning systems starting with Spark version 0.2.