Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Breaking Spark: Top five mistakes to avoid when using Apache Spark in production

16:35–17:15 Friday, 3/06/2016
Spark & beyond
Location: Capital Suite 13
Level: Intermediate
Average rating: 2.43 (7 ratings)

Spark deployments have been growing over the past year. The volume of data analyzed and processed through the framework is massive and continues to push the boundaries of the engine. Drawing on his experience across 150+ production deployments, Neelesh Srinivas Salian explores common issues observed in cluster environments set up with Apache Spark across five main areas:

  1. Scaling of the architecture
  2. Memory configurations
  3. End-user code
  4. Incompatible dependencies
  5. Administration- and operation-related issues

Attendees can use Neelesh’s observations to improve the usability and supportability of their Apache Spark deployments and avoid such issues in the future.

Neelesh Srinivas Salian

Stitch Fix

Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects such as YARN, Spark, and Kafka. Neelesh holds a master’s degree in computer science with a focus on cloud computing from North Carolina State University and a bachelor’s degree in computer engineering from the University of Mumbai, India.

Comments on this page are now closed.

Comments

Neelesh Srinivas Salian
8/06/2016 23:17 BST

@Yann: Uploaded the slides here. Please let me know if you aren’t able to obtain them.

Yann Barraud
8/06/2016 10:22 BST

Hi,

Are the slides available somewhere?

Thanks.

Regards,
Yann