Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

First-ever scalable, distributed deep learning architecture using Spark and Tachyon

Christopher Nguyen (Arimo), Vu Pham (Adatao, Inc.), Michael Bui (Adatao, Inc.)
2:05pm–2:45pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21
Audience level: Intermediate
Average rating: 4.14 (7 ratings)

Deep learning algorithms are widely used in real-world applications, including computer vision, machine translation, and fraud detection. However, deep learning works best when the model is large and trained on large-scale datasets. Distributed computing platforms like Spark are designed to handle big data and are already used extensively. With deep learning available on Spark, businesses can apply it to their datasets using their existing Spark infrastructure.

In this talk, we present a scalable implementation of predictive deep learning algorithms on Spark, including feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). To the best of our knowledge, this is the first successful implementation of CNNs and RNNs on Spark. To support training of big models, we use Tachyon as a common storage layer between the Spark workers. With its in-memory distributed execution model, Tachyon provides a scalable approach even when the model is too big to be handled on a single machine. Our solution also uses graphics processing units (GPUs) for matrix computation whenever they are available on worker nodes, further reducing execution time.
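
As a rough illustration of the general pattern described above (workers training on their own data partitions while sharing model parameters through a common store), here is a minimal single-process sketch in plain Python. It is not the presenters' implementation and uses no Spark or Tachyon APIs. The names `SharedParameterStore`, `partition_gradient`, `train`, and `make_partitions` are hypothetical, the "model" is a two-parameter linear fit rather than a neural network, and synchronous gradient averaging is an assumed update scheme chosen only for simplicity.

```python
import random


class SharedParameterStore:
    """Hypothetical stand-in for a storage layer shared by all workers.

    Here it is just an in-process list; in a real cluster this role would be
    played by a distributed store.
    """

    def __init__(self, weights):
        self._weights = list(weights)

    def read(self):
        # Return a copy so callers cannot mutate the shared state directly.
        return list(self._weights)

    def apply_update(self, gradient, learning_rate):
        self._weights = [
            w - learning_rate * g for w, g in zip(self._weights, gradient)
        ]


def partition_gradient(weights, partition):
    """Mean squared-error gradient of y ~ w0 + w1 * x over one partition."""
    g0 = g1 = 0.0
    for x, y in partition:
        err = weights[0] + weights[1] * x - y
        g0 += 2.0 * err
        g1 += 2.0 * err * x
    n = len(partition)
    return [g0 / n, g1 / n]


def train(partitions, store, epochs=500, learning_rate=0.1):
    """Synchronous data-parallel training, simulated sequentially."""
    for _ in range(epochs):
        weights = store.read()
        # Each simulated worker computes a gradient on its own partition.
        grads = [partition_gradient(weights, p) for p in partitions]
        # Average the per-worker gradients, then update the shared parameters.
        avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(weights))]
        store.apply_update(avg, learning_rate)
    return store.read()


def make_partitions(num_workers=4, points_per_worker=25, seed=0):
    """Noise-free synthetic data from y = 3x + 1, split evenly across workers."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(num_workers * points_per_worker)]
    data = [(x, 3.0 * x + 1.0) for x in xs]
    return [data[i::num_workers] for i in range(num_workers)]


if __name__ == "__main__":
    store = SharedParameterStore([0.0, 0.0])
    print(train(make_partitions(), store))
```

In this toy setup the learned weights should approach the intercept 1 and slope 3 used to generate the data. A real system would run the per-partition step in parallel on the workers and would need to decide how updates to the shared parameters are synchronized.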

Attendees will learn about deep learning models, the architecture of the system, and how to train and run deep learning models on Spark with Tachyon.

Christopher Nguyen

Arimo

Christopher Nguyen is CEO and cofounder of Arimo (née Adatao), the leader in collaborative, predictive intelligence for enterprises. Previously, Christopher served as engineering director of Google Apps and cofounded two successful startups. As a professor, he also cofounded the computer engineering program at HKUST (香港科技大学). Christopher has a BS from UC Berkeley, where he graduated summa cum laude, and a PhD from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1. He is also a cocreator of the open source Distributed DataFrame project.

Vu Pham

Adatao, Inc.

Vu Pham is a machine learning software engineer at Adatao, with a focus on deep learning. He helps build Adatao’s deep learning solutions. He is an avid contributor to various open source projects such as cubgs, Deepnet, and deeplearning4j. Prior to Adatao, he worked in academia and industry, and authored and coauthored several scientific papers.

Michael Bui

Adatao, Inc.

Michael (Bach) Bui is a cofounder and engineering lead of Adatao. Previously, he worked on Hadoop 2.0 at Yahoo!, having completed his PhD in CS at the University of Illinois, Urbana-Champaign, where he focused on real-time distributed systems engineering. Michael was a lead developer of Adatao’s PredictiveEngine and contributed to the early development of Apache Spark.