Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production

Neelesh Salian (Stitch Fix)
1:50pm–2:30pm Thursday, 03/31/2016
Spark & Beyond

Location: 210 A/E
Average rating: 2.93 (14 ratings)

Apache Spark deployments have grown steadily over the past year, and the volume of data analyzed and processed through the framework continues to push the boundaries of the engine. Drawing on experiences across 150+ production deployments, Neelesh Srinivas Salian explores common issues observed in cluster environments set up with Apache Spark and offers guidelines for setting up a real-world environment when planning an Apache Spark deployment in a cluster. Attendees can use these observations to improve the usability and supportability of Apache Spark and avoid such issues in their own projects.

Topics include:

  • Scaling the architecture
  • Memory configurations
  • End-user code
  • Incompatible dependencies
  • Administration- and operation-related issues

Neelesh Salian

Stitch Fix

Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects like YARN, Spark, and Kafka.