Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Breaking Spark: Top five mistakes to avoid when using Apache Spark in production

Neelesh Salian (Stitch Fix)
16:35–17:15 Friday, 3/06/2016
Spark & beyond
Location: Capital Suite 13
Level: Intermediate
Average rating: 2.43 (7 ratings)

Spark deployments have grown steadily over the past year, and the increasing volume of data analyzed and processed through the framework continues to push the boundaries of the engine. Drawing on his experience across 150+ production deployments, Neelesh Srinivas Salian explores common issues observed in cluster environments set up with Apache Spark, across five main areas:

  1. Scaling of the architecture
  2. Memory configurations
  3. End-user code
  4. Incompatible dependencies
  5. Administration- and operation-related issues

Attendees can use Neelesh’s observations to improve the usability and supportability of their Apache Spark deployments and avoid such issues in the future.

Neelesh Salian

Stitch Fix

Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects such as YARN, Spark, and Kafka.

Comments on this page are now closed.


Neelesh Salian
8/06/2016 23:17 BST

@Yann: Uploaded the slides here. Please let me know if you aren’t able to obtain them.

Yann Barraud
8/06/2016 10:22 BST


Are slides available somewhere?