Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

First-ever scalable, distributed deep learning architecture using Spark and Tachyon

Christopher Nguyen (Arimo), Vu Pham (Adatao, Inc.), Michael Bui (Adatao, Inc.)
2:05pm–2:45pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21
Level: Intermediate
Average rating: 4.14 (7 ratings)

Deep learning algorithms have been widely used in many real-world applications, including computer vision, machine translation, and fraud detection. Unfortunately, deep learning works best only when the model is large and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data and are already used extensively. Making deep learning available on Spark lets businesses take full advantage of deep learning on their datasets using their existing Spark infrastructure.

In this talk, we present a scalable implementation of predictive deep learning algorithms on Spark, including feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). To the best of our knowledge, this is the first successful implementation of CNNs and RNNs on Spark. To support training of big models, we use Tachyon as a common storage layer shared by the Spark workers. Because Tachyon keeps data in memory across a distributed cluster, this approach scales even when the model is too big to fit on a single machine. Our solution also uses graphics processing units (GPUs) for matrix computation whenever they are available on worker nodes, further reducing execution time.
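The abstract does not spell out how the Spark workers and Tachyon cooperate during training. Purely as an illustration, the sketch below assumes a synchronous data-parallel scheme: the model parameters live in a shared store (the hypothetical `SharedParameterStore`, standing in for Tachyon), each data partition computes a gradient against the current parameters, and the driver combines those gradients and writes the updated model back. A linear least-squares model stands in for a neural network, and plain Python loops stand in for Spark's map and reduce steps; none of these names are Spark or Tachyon APIs, and this is not the presenters' implementation.

```python
import numpy as np


class SharedParameterStore:
    """Hypothetical stand-in for a shared in-memory storage layer such as Tachyon.

    Workers read the current model from the store instead of receiving a
    copy shipped from the driver with every task.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, key, array):
        self._blobs[key] = np.array(array, copy=True)

    def get(self, key):
        return np.array(self._blobs[key], copy=True)


def partition_gradient(weights, X, y):
    """Summed squared-error gradient of a linear model on one partition."""
    residual = X @ weights - y
    return X.T @ residual, len(y)


def train_data_parallel(X, y, num_workers=4, lr=0.1, rounds=200):
    store = SharedParameterStore()
    store.put("weights", np.zeros(X.shape[1]))
    # Split the rows into partitions, roughly as Spark splits an RDD.
    parts = list(zip(np.array_split(X, num_workers), np.array_split(y, num_workers)))
    for _ in range(rounds):
        w = store.get("weights")
        # "Map" step: each worker computes a gradient on its own partition.
        results = [partition_gradient(w, Xp, yp) for Xp, yp in parts]
        # "Reduce" step: sum gradients and row counts, so the update equals
        # a full-batch gradient step even if partitions differ in size.
        grad_sum = sum(g for g, _ in results)
        count = sum(n for _, n in results)
        w = w - lr * grad_sum / count
        # Publish the new model so every worker sees it in the next round.
        store.put("weights", w)
    return store.get("weights")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    print(np.round(train_data_parallel(X, X @ true_w), 3))
```

Summing per-partition gradients and row counts, rather than averaging per-partition means, keeps the update exact when partitions have unequal sizes. A real system would also have to handle model sharding, asynchronous updates, and GPU offload, which this sketch omits.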

Attendees will learn about deep learning models, the architecture of the system, and how to train and run deep learning models on Spark with Tachyon.


Christopher Nguyen


Christopher Nguyen is president and CEO of Arimo, a Panasonic company in Silicon Valley, where he leads the development of AI platforms and solutions for the enterprise. Previously, he was engineering director of Google Apps and cofounded two other successful startups. As a professor, Christopher cofounded the Computer Engineering Program at HKUST. He holds a BS (summa cum laude) from the University of California, Berkeley, and a PhD from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1.


Vu Pham

Adatao, Inc.

Vu Pham is a machine learning software engineer at Adatao, focusing on deep learning. He helps build Adatao’s deep learning solutions and is an avid contributor to open-source projects such as cubgs, Deepnet, and deeplearning4j. Before joining Adatao, he worked in academia and industry and has authored or coauthored several scientific papers.


Michael Bui

Adatao, Inc.

Michael (Bach) Bui is a cofounder and engineering lead of Adatao. Previously, he worked on Hadoop 2.0 at Yahoo!, after completing his PhD in computer science at the University of Illinois at Urbana-Champaign, where he focused on real-time distributed systems engineering. Michael was a lead developer of Adatao’s PredictiveEngine and contributed to the early development of Apache Spark.