Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Tuning and Debugging in Apache Spark

Patrick Wendell (Databricks)
11:30am–12:10pm Friday, 02/20/2015
Spark in Action
Location: 230 C
Slides: 1-PPTX

Apache Spark is a popular engine for large-scale analytics with built-in libraries for machine learning, stream processing, and SQL query processing.

This talk will give insights into tuning and debugging a production Spark deployment. The talk will be engineering-oriented and range from intermediate to advanced concepts.

It will start with details about Spark internals and an explanation of the runtime behavior of a Spark application. This will include an explanation of how high-level user programs are compiled into physical execution plans in Spark. I’ll next review common performance bottlenecks encountered by Spark users, along with tips for diagnosing performance problems in a production application. Finally, I’ll cover good design patterns for writing optimized Spark applications.

I’ll leave plenty of time for Q&A on Spark internals and performance.

Patrick Wendell

Databricks

Patrick Wendell is an engineer at Databricks as well as a Spark
Committer and PMC member. In the Spark project, Patrick has acted as
release manager for several Spark releases, including Spark 1.0 and 1.1.
Patrick also maintains several subsystems of Spark’s core engine.
Before helping start Databricks, Patrick obtained an M.S. in Computer
Science at UC Berkeley. His research focused on low-latency scheduling
for large-scale analytics workloads. He holds a B.S.E. in Computer
Science from Princeton University.

Comments

Vu Ha
02/20/2015 6:23am PST

Will the slides be available? Thanks!