Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Lessons from Running Large Scale Spark Workloads

Reynold Xin (Databricks), Matei Zaharia (Databricks)
10:40am–11:20am Thursday, 02/19/2015
Spark in Action
Location: 210 C/G
Average rating: 4.00 (8 ratings)

We have been working with Spark users over the past few months to push the scalability limits of Spark. In this talk, we will introduce some of the largest-scale Spark use cases along multiple dimensions, including dataset size, computational complexity, and cluster size.

We will describe architectural changes incorporated into recent releases of Spark that are designed specifically to address the challenges of large-scale workloads, and offer the audience tips for tuning these workloads.
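The tuning tips themselves are not part of this listing. As a purely illustrative sketch, the settings below are the kind of Spark 1.x configuration properties (all real configuration keys of that era; the specific values are hypothetical) that large-scale deployments commonly adjusted:

```
# spark-defaults.conf — illustrative values only, not from the talk.

# Kryo serialization reduces shuffle and caching overhead vs. Java serialization.
spark.serializer            org.apache.spark.serializer.KryoSerializer

# Sort-based shuffle (the default since Spark 1.2) scales better to many reduce tasks.
spark.shuffle.manager       sort

# Raise default parallelism so large datasets are split into enough tasks.
spark.default.parallelism   1000

# Per-executor resources, sized to the cluster's nodes.
spark.executor.memory       8g
spark.executor.cores        4
```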


Reynold Xin
Reynold Xin is a committer on Apache Spark and a co-founder of Databricks. Before Databricks, he was pursuing a PhD at the UC Berkeley AMPLab.


Matei Zaharia
Matei Zaharia is an assistant professor of computer science at MIT and the initial creator of Apache Spark. He is currently on industry leave as CTO of Databricks, a company he co-founded to commercialize Spark.

Comments on this page are now closed.


Vu Ha
02/20/2015 6:22am PST

Will the slides be available? Thanks!