Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

A behind-the-scenes look into Spark's API and engine evolutions

Reynold Xin (Databricks)
11:00am–11:40am Wednesday, March 15, 2017
Spark & beyond
Location: LL21 C/D
Average rating: 4.11 (9 ratings)

What you'll learn

  • Explore the history of data systems and understand the lessons learned from this evolution
  • Understand how Spark is developed and get a glimpse at the future of Spark


Apache Spark is the most popular open source project in big data. While many users initially come to Spark for its performance, they stay for the expressiveness of the APIs and ease of use of the engine.

Reynold Xin looks back at the history of data systems, from filesystems, databases, and big data systems (e.g., MapReduce) to “small data” systems (e.g., R and Python), covering the pros and cons of each, the abstractions they provide, and the engines underneath. Reynold then shares lessons learned from this evolution, explains how Spark is developed, and offers a peek into the future of Spark.


Reynold Xin


Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.

Comments on this page are now closed.


Tom Wheeler
03/17/2017 2:08am PDT

Speaker was knowledgeable and enthusiastic. I enjoyed the presentation, which not only showed us where Spark is today, but how we got here in the first place.