Scaling Apache Spark at Facebook
Who is this presentation for?
- Software engineers, data engineers, and data scientists
Level
Description
Spark started at Facebook as an experiment when the project was still in its early phases. Spark’s appeal stemmed from its ease of use and an integrated environment to run SQL, MLlib, and custom applications. At that time, the system was used by a handful of people to process small amounts of data.
However, Facebook has come a long way since then. Spark is now one of Facebook’s primary SQL engines, in addition to being the primary system for writing custom batch applications. Sameer Agarwal dives into the story of how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, process hundreds of petabytes of data, and serve thousands of data scientists, engineers, and product analysts every day. You’ll specifically hear about scaling compute (how Facebook runs Spark efficiently and reliably on tens of thousands of heterogeneous machines in disaggregated, shared-storage clusters); optimizing the core engine (how Facebook continuously tunes, optimizes, and adds features to the core engine to maximize the useful work done per second); and scaling users (how Facebook makes Spark easy to use and faster to debug so that new users can onboard seamlessly).
Prerequisite knowledge
- Familiarity with SQL, Spark, and databases
What you'll learn
- Discover how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, process hundreds of petabytes of data, and serve thousands of data scientists, engineers, and product analysts every day
Sameer Agarwal
Sameer Agarwal is an Apache Spark committer and a software engineer at Facebook, where he works on the data warehouse team building distributed systems and databases that scale across clusters of tens of thousands of machines. He received his PhD in databases from UC Berkeley’s AMPLab, where he worked on BlinkDB, an approximate query engine for Spark.
Ankit Agarwal
Facebook Inc.
- Production Engineering Manager at Facebook (Data Warehouse Team)
- Data Infrastructure Team at Facebook since 2012
- Previously worked on the search team at Yahoo!