Combining deep learning with Apache Spark has the potential for significant impact in industry. Joseph Bradley and Tim Hunter share best practices for building deep learning pipelines with Apache Spark. Rather than comparing deep learning systems or specific optimizations, Joseph and Tim focus on issues that are common to many deep learning frameworks when running on a Spark cluster: optimizing cluster setup and data ingest, tuning the cluster, and monitoring long-running jobs, all demonstrated using Google’s TensorFlow library. These are the typical issues users encounter when integrating deep learning libraries with Spark clusters.
Joseph Bradley is a software engineer working on machine learning at Databricks. Joseph is an Apache Spark committer and PMC member. Previously, he was a postdoc at UC Berkeley. Joseph holds a PhD in machine learning from Carnegie Mellon University, where he focused on scalable learning for probabilistic graphical models, examining trade-offs between computation, statistical efficiency, and parallelization.
Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project. Tim holds a PhD from UC Berkeley, where he built distributed machine-learning systems starting with Spark version 0.2.