Deep learning algorithms have been widely used in many real-world applications, including computer vision, machine translation, and fraud detection. Unfortunately, deep learning only works best when the model is big and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data, and have been used extensively. By having deep learning available on Spark, businesses can fully take advantage of deep learning capabilities on their datasets using their existing Spark infrastructure.
In this talk, we present a scalable implementation of predictive deep learning algorithms on Spark, including feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). This, to our best knowledge, is the first successful implementation of CNNs and RNNs on Spark. To support big model training, we use Tachyon as common storage layers between the Spark workers. With its in-memory distributed execution model, Tachyon provides a scalable approach even when the model is too big to be handled on a single machine. Our solution also exploits graphical processing units (GPUs) for matrix computation whenever they are available on worker nodes, further improving execution time.
The attendees will learn about deep learning models, the architecture of the system, and how to train and run deep learning models on Spark with Tachyon.
Christopher Nguyen is CEO and cofounder of Arimo (née Adatao), the leader in collaborative, predictive intelligence for enterprises. Previously, Christopher served as engineering director of Google Apps and cofounded two successful startups. As a professor, he also cofounded the computer engineering program at HKUST (香港科技大学). Christopher has a BS from UC Berkeley, where he graduated summa cum laude, and a PhD from Stanford, where he created the first standard-encoding Vietnamese software suite, authored RFC 1456, and contributed to Unicode 1.1. He is also a cocreator of the open source Distributed DataFrame project.
Vu Pham is a machine learning software engineer at Adatao, with focus in deep learning. He helps build Adatao’s deep learning solutions. He is an avid contributor to various open-source projects such as cubgs, Deepnet, and deeplearning4j. Prior to Adatao, he worked in academia and industry, and authored and co-authored several scientific papers.
Michael (Bach) Bui is a co-founder and engineering lead of Adatao. Prior, he worked on Hadoop 2.0 at Yahoo!, having completed his PhD in CS from the University of Illinois, Urbana-Champagne, where his focussed on real-time distributed systems engineering. Michael was a lead developer of Adatao’s PredictiveEngine, and has contributed to the early development of Apache Spark.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.