Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Fast big data analytics and machine learning using Alluxio and Spark in Baidu

Bin Fan (Alluxio), Haojun Wang (Baidu)
11:50am–12:30pm Wednesday, 03/30/2016
Spark & Beyond

Location: 230 A
Tags: real-time
Average rating: 4.25 (8 ratings)

A few months ago, Baidu deployed Alluxio to accelerate its big data analytics workloads. Bin Fan and Haojun Wang explain why Baidu chose Alluxio and detail how the team achieved a 30x speedup with Alluxio in a production environment spanning hundreds of machines. Building on the success of this analytics engine, Baidu is now expanding its Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.

Bin and Haojun also delve into how they built a heterogeneous computing platform to accelerate deep learning workloads. The platform combines heterogeneous computing resources (CPU, GPU, FPGA) managed by a heterogeneous computing layer with heterogeneous storage resources (memory, SSD, HDD) managed by Alluxio.


Bin Fan

Bin Fan is a software engineer at Alluxio and a PMC member of the Alluxio project. Previously, Bin worked at Google, building next-generation storage infrastructure, where he won Google’s technical infrastructure award. He holds a PhD in computer science from Carnegie Mellon University.


Haojun Wang

Haojun Wang is a tech lead on Baidu's US autonomous car team, where he currently leads the in-car computing platform and the offline data platform. Before joining Baidu, he worked at the IBM Silicon Valley Lab, focusing on database core development and big data processing. Haojun received his PhD in computer science from the University of Southern California.