Deep learning algorithms are widely used in real-world applications, including computer vision, machine translation, and fraud detection. However, deep learning works best when the model is large and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data and are already in extensive use. With deep learning available on Spark, businesses can take full advantage of deep learning on their datasets using their existing Spark infrastructure.
In this talk, we present a scalable implementation of predictive deep learning algorithms on Spark, including feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). To the best of our knowledge, this is the first successful implementation of CNNs and RNNs on Spark. To support big model training, we use Tachyon as a common storage layer between the Spark workers. With its in-memory distributed execution model, Tachyon provides a scalable approach even when the model is too big to be handled on a single machine. Our solution also exploits graphics processing units (GPUs) for matrix computation whenever they are available on worker nodes, further improving execution time.
Attendees will learn about deep learning models, the architecture of the system, and how to train and run deep learning models on Spark with Tachyon.
Christopher Nguyen is president and CEO of Arimo, a Panasonic company in Silicon Valley, where he leads the development of AI platforms and solutions for the enterprise. Previously, he was engineering director of Google Apps and cofounded two other successful startups. As a professor, Christopher cofounded the Computer Engineering Program at HKUST. He holds a BS (summa cum laude) from the University of California, Berkeley, and a PhD from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1.
Vu Pham is a machine learning software engineer at Adatao, with a focus on deep learning. He helps build Adatao’s deep learning solutions. He is an avid contributor to various open-source projects such as cubgs, Deepnet, and deeplearning4j. Prior to Adatao, he worked in academia and industry and authored or coauthored several scientific papers.
Michael (Bach) Bui is a cofounder and engineering lead of Adatao. Previously, he worked on Hadoop 2.0 at Yahoo!, having completed his PhD in CS at the University of Illinois, Urbana-Champaign, where he focused on real-time distributed systems engineering. Michael was a lead developer of Adatao’s PredictiveEngine and contributed to the early development of Apache Spark.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.