Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Apache Spark in the hands of data scientists

11:20am–12:00pm Wednesday, September 27, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: ecommerce

Who is this presentation for?

  • Data scientists, data engineers, software engineers, and those working in big data operations

Prerequisite knowledge

  • A basic understanding of Spark, the AWS ecosystem, Python, and SQL

What you'll learn

  • Explore the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem

Description

Stitch Fix is a unique environment in the retail market. Data, the backbone of the business, is used to help with styling recommendations, demand modeling, user acquisition, and merchandise planning, and also to influence business decisions throughout the organization. These decisions are backed by algorithms and by data collected and interpreted based on client preferences.

Neelesh Srinivas Salian offers an overview of the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem. Apache Spark plays an important role in Stitch Fix’s data platform, and the company’s data scientists also use Spark for their ETL and SQL needs. The goal for the team running the data platform is to understand the data scientists’ needs and make their lives easier, particularly in terms of Spark usability, by building a platform that makes it easier to get started with Spark and to transition SQL queries over to Spark’s SQL API.

Neelesh focuses on Stitch Fix’s journey, exploring its Spark setup and offering an overview of its in-house tools and how they work together with open source frameworks in a cloud environment. Neelesh also covers the additional improvements to the infrastructure that help persist information for future use and optimization and explains how Amazon’s EMRFS implementation has made it easier to read from the S3 source.

Neelesh Srinivas Salian

Stitch Fix

Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. This includes the Spark environment that is used to help make data-driven decisions.
