Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Breaking Spark: The top five mistakes to avoid when using Apache Spark in production

Neelesh Salian (Stitch Fix)
5:25pm–6:05pm Wednesday, 09/28/2016
Spark & beyond
Location: Hall 1B Level: Intermediate
Average rating: 3.00 (7 ratings)

Prerequisite knowledge

  • An understanding of the Spark ecosystem, Spark's architecture, and how the API works
  • Familiarity with Spark use cases

What you'll learn

  • Understand what to look out for while maintaining a Spark cluster environment

Description

    Apache Spark deployments have been growing for the past two years. The amount of data analyzed and processed through the framework is massive and increasing, and it continues to push the boundaries of the engine. Drawing on his experience across 150+ production deployments, Neelesh Srinivas Salian focuses on five common issues observed in cluster environments set up with Apache Spark (Core, Streaming, and SQL) to help you improve the usability and supportability of Apache Spark and avoid such issues in future deployments.

    Topics include:

    • Scaling the architecture
    • Memory configurations
    • End-user code
    • Incompatible dependencies
    • Administration- and operation-related issues

    Neelesh Salian

    Stitch Fix

    Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects like YARN, Spark, and Kafka.

    Comments

    09/29/2016 7:25am EDT

    are slides available? thanks