Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

The state of Spark and where it is going in 2016

Reynold Xin (Databricks)
11:00am–11:40am Wednesday, 03/30/2016
Spark & Beyond

Location: 210 A/E
Average rating: 4.36 (28 ratings)

Prerequisite knowledge

Participants should have a general understanding of data infrastructure.

Description

2015 is the year of data science and platformization for Apache Spark. With new high-level APIs (e.g., DataFrames, ML Pipelines, and R) and extension points, Spark is accessible to a wider set of users and can plug in a myriad of data sources, algorithms, and external packages. 2015 also marks the beginning of Project Tungsten, a major revamp of Spark’s execution engine to improve its robustness and performance.

In 2016, Spark will continue pushing the boundaries of these dimensions, making it more powerful and even easier to use. Reynold Xin outlines three trends for the immediate future, discussing the major efforts to address them and exploring their implications for Spark users. These trends include:

  1. A tighter integration of streaming systems and existing enterprise data infrastructure
  2. An emphasis on elasticity and cloud computing for enterprise data infrastructure
  3. The rise of new hardware such as SSDs, GPUs, and 3D XPoint bringing abundant computing resources

Reynold Xin

Databricks

Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.


Comments

Arthur Yeo
04/04/2016 4:13am PDT

Where are the slides?

Yuan Gao
04/01/2016 11:10am PDT

Are the slides available anywhere?