Scaling Apache Spark at Facebook
Who is this presentation for?
Software Engineers, Data Engineers, Data Scientists
Spark started at Facebook as an experiment when the project was still in its early phases. Spark’s appeal stemmed from its ease of use and its integrated environment for running SQL, MLlib, and custom applications. At that time, the system was used by a handful of people to process small amounts of data. We’ve come a long way since then: Spark is now one of the primary SQL engines at Facebook, as well as the primary system for writing custom batch applications. This talk covers how we optimized, tuned, and scaled Apache Spark at Facebook to run on clusters of tens of thousands of machines, process hundreds of petabytes of data, and serve thousands of data scientists, engineers, and product analysts every day. Specifically, we’ll focus on three areas:
1. Scaling Compute: How Facebook runs Spark efficiently and reliably on tens of thousands of heterogeneous machines in disaggregated (shared-storage) clusters.
2. Optimizing Core Engine: How we continuously tune, optimize and add features to the core engine in order to maximize the useful work done per second.
3. Scaling Users: How we make Spark easier to use and faster to debug so that new users can be onboarded seamlessly.
Prerequisite knowledge
SQL, Spark, Databases
What you'll learn
Sameer Agarwal is an Apache Spark Committer and a Software Engineer at Facebook where he works as part of the Data Warehouse team on building distributed systems and databases that scale across clusters of tens of thousands of machines. He received his PhD in Databases from UC Berkeley AMPLab where he worked on BlinkDB, an approximate query engine for Spark.
View a complete list of Strata Data Conference contacts