Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

A behind-the-scenes look into Spark's API and engine evolutions

11:15–11:55 Wednesday, 24 May 2017
Spark & beyond
Location: Capital Suite 12
Average rating: 2.00 (5 ratings)

What you'll learn

  • Explore the history of data systems and understand the lessons learned from this evolution
  • Understand how Spark is developed and get a glimpse at the future of Spark

Description

Apache Spark is the most popular open source project in big data. While many users initially come to Spark for its performance, they stay for the expressiveness of the APIs and ease of use of the engine.

Herman van Hövell tot Westerflier looks back at the history of data systems, from filesystems, databases, and big data systems (e.g., MapReduce) to “small data” systems (e.g., R and Python), covering the pros and cons of each, the abstractions they provide, and the engines underneath. Herman then shares lessons learned from this evolution, explains how Spark is developed, and offers a peek into the future of Spark.

Herman van Hövell tot Westerflier

Databricks

Herman van Hövell tot Westerflier is a Spark committer working on Spark SQL at Databricks. Previously, Herman was a consultant working for clients in banking, manufacturing, and logistics. His interests include database systems, optimization, and simulation.