Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

What's coming for the Spark community

Patrick Wendell (Databricks)
11:20am–12:00pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21
Level: Intermediate
Average rating: 3.86 (22 ratings)

In the last year, Spark has seen substantial growth in adoption as well as in the pace and scope of its development. This talk will look forward and discuss both technical initiatives and the evolution of the Spark community.

On the technical side, I’ll discuss two key initiatives ahead for Spark. The first is tighter integration of Spark’s libraries through shared primitives such as the DataFrame API. The second is across-the-board performance optimization that exploits the schema information embedded in Spark’s newer APIs. Both initiatives are designed to make Spark applications easier to write and faster to run.

On the community side, this talk will focus on the growing ecosystem of extensions, tools, and integrations around Spark. I’ll survey popular language bindings, data sources, notebooks, visualization libraries, statistics libraries, and other community projects. Extensions will be a major area of future growth, and this talk will discuss how we can position the upstream project to encourage it.

Patrick Wendell

Databricks

Patrick Wendell is a cofounder of Databricks as well as a founding committer and PMC member of Apache Spark. Patrick has acted as release manager for several Spark releases in addition to maintaining several subsystems of Spark’s core engine. At Databricks, Patrick directs the company’s maintenance and development of Spark.

Patrick holds an MS in computer science from UC Berkeley, where his research focused on low-latency scheduling for large-scale analytics workloads, and a BSE in computer science from Princeton University.