Scaling Apache Spark at Facebook
Who is this presentation for?
- Software engineers, data engineers, and data scientists
Spark started at Facebook as an experiment when the project was still in its early phases. Spark’s appeal stemmed from its ease of use and an integrated environment to run SQL, MLlib, and custom applications. At that time, the system was used by a handful of people to process small amounts of data.
However, Facebook has come a long way since then. Currently, Spark is one of Facebook’s primary SQL engines in addition to being the primary system for writing custom batch applications. Sameer Agarwal dives into the story of how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, process hundreds of petabytes of data, and serve thousands of data scientists, engineers, and product analysts every day. You’ll specifically hear about three areas: scaling compute, or how Facebook runs Spark efficiently and reliably on tens of thousands of heterogeneous machines in disaggregated (shared storage) clusters; optimizing the core engine, or how Facebook continuously tunes, optimizes, and adds features to the core engine to maximize the useful work done per second; and scaling users, or how Facebook makes Spark easy to use and fast to debug so that new users can be onboarded seamlessly.
Prerequisite knowledge
- Familiarity with SQL, Spark, and databases
What you'll learn
- Discover how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, process hundreds of petabytes of data, and serve thousands of data scientists, engineers, and product analysts every day
Sameer Agarwal is an Apache Spark committer and a software engineer at Facebook, where he works on the data warehouse team building distributed systems and databases that scale across clusters of tens of thousands of machines. He received his PhD in databases from UC Berkeley's AMPLab, where he worked on BlinkDB, an approximate query engine for Spark.
- Production Engineering Manager at Facebook (Data Warehouse Team)
- Data Infrastructure Team at Facebook since 2012
- Previously worked on the search team at Yahoo!