Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore
Reynold Xin

Cofounder, Databricks


Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.


9:00am–5:00pm Tuesday, 12/01/2015
Spark & Beyond
Location: 328-329 Level: Intermediate
Sameer Farooqui (Databricks), Paco Nathan, Reynold Xin (Databricks)
Average rating: 4.00 (20 ratings)
The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. In class we will explore various Wikipedia datasets while applying the ideal programming paradigm for each analysis. The class will consist of about 50% lecture and 50% hands-on labs and demos.
9:55am–10:05am Thursday, 12/03/2015
Location: Summit 1-2
Reynold Xin (Databricks)
Average rating: 3.94 (34 ratings)
In this talk, Reynold will look back and review Spark’s growth in adoption, use cases, and development. He will then look forward and discuss both technical initiatives and the evolution of the Spark community for 2016.
11:50am–12:30pm Thursday, 12/03/2015
IoT and Real-time
Location: 324
Reynold Xin (Databricks)
In this talk, we introduce a recent effort in Spark to employ randomized algorithms for a number of common, expensive operations: membership testing, cardinality estimation, stratified sampling, frequent items, and quantile estimation.
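
As a rough illustration of the first item in that list, membership testing, here is a minimal Bloom filter sketch in plain Python. It is not Spark's implementation and is not taken from the talk; the class name `BloomFilter`, its parameters, and the `user-…` keys are all hypothetical, chosen only to show how a randomized structure can trade a small false-positive rate for a large saving in memory.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives.

    Illustrative sketch only; names and sizing choices are assumptions.
    """

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Insert 1,000 hypothetical keys into a filter sized for a 1% false-positive rate.
bloom = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for i in range(1000):
    bloom.add(f"user-{i}")
```

Every inserted key is reported as possibly present, while a key that was never inserted is usually, though not always, reported as absent. The filter here occupies a little over a kilobyte regardless of how long the keys are, which is the kind of trade-off that makes such structures attractive for expensive exact operations.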