Building A Billion User Load Balancer

Adam Lazur (Facebook)
Operations, Mission City Ballroom B4
Average rating: 4.72 (61 ratings)

Want to learn how Facebook scales its load-balancing infrastructure to support more than a billion users? We will be revealing the technologies and methods we use to route and balance Facebook’s traffic. This talk will focus on Facebook’s DNS load balancer and software load balancer, and on how we use these systems to improve user performance, manage capacity, and increase reliability.

Facebook is used by people all over the world, and its Traffic team is responsible for balancing that traffic and making our network as fast as possible. The team has built several systems for managing and balancing our site traffic, including both a DNS load balancer and a software load balancer capable of handling several protocols.

Our DNS load balancer has two major components: a central GLB decision engine, written in Python, that makes all the traffic-balancing decisions and generates DNS maps; and an existing open-source DNS server written in C (tinydns) that serves the actual DNS traffic, directing users to clusters based on a lookup table loaded from the DNS map.
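
To make the division of labor concrete, here is a minimal sketch, in Python, of how a decision engine could emit a location-aware DNS map in tinydns’s data-file syntax. The hostname, prefixes, location codes, and decision-map shape are illustrative assumptions, not Facebook’s actual format; the %-location lines and per-location records are standard djbdns features.

    # Hypothetical sketch: emit a location-tagged tinydns "data" file from a
    # decision map. In djbdns, "%lo:ipprefix" assigns clients in an IP prefix
    # to a location code (at most two characters), and a record whose line
    # ends in "::lo" is served only to clients in that location.

    # Assumed decision map: client IP prefix -> (location code, cluster VIP)
    DECISIONS = {
        "203.0.113": ("ap", "198.51.100.10"),
        "192.0.2":   ("eu", "198.51.100.20"),
    }

    def emit_tinydns_data(hostname, decisions, ttl=30):
        lines = []
        for prefix, (loc, vip) in sorted(decisions.items()):
            lines.append("%%%s:%s" % (loc, prefix))                    # %lo:ipprefix
            lines.append("+%s:%s:%d::%s" % (hostname, vip, ttl, loc))  # A record for lo
        return "\n".join(lines) + "\n"

    print(emit_tinydns_data("www.example.com", DECISIONS))

In a stock djbdns setup, the emitted file is compiled with tinydns-data into data.cdb, which the running server reads, so pushing a new map amounts to writing and compiling a file.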

Our Python decision engine is named Cartographer. It gathers information on internet topology, user latency, user bandwidth, and compute cluster load, availability, and performance, then crunches that data to determine the current best cluster to point each ISP’s users at. Cartographer also receives a continuous stream of updates from its monitoring channels and automatically pushes new DNS maps to the DNS server whenever it needs to adjust cluster load or react to network problems. (It can react both to a gross interruption of service caused by a problem with Facebook’s network or clusters and to localized outages affecting users in a given country or on a given ISP.)
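
As a rough illustration of that kind of decision step, the sketch below scores each cluster for an ISP by combining measured user latency with a load penalty, so traffic shifts away from a cluster as it fills up. The metric names, weighting scheme, and data shapes are hypothetical; this is not Cartographer’s actual algorithm.

    # Hypothetical per-ISP cluster selection. All metrics and the weighting
    # scheme are assumptions for illustration only.

    def pick_cluster(isp, clusters, latency_ms, load):
        """Return the best cluster for an ISP's users.

        latency_ms[isp][cluster] -- measured user latency to each cluster
        load[cluster]            -- current utilization, 0.0 to 1.0
        """
        best, best_score = None, float("inf")
        for cluster in clusters:
            if load[cluster] >= 1.0:   # never send users to a full cluster
                continue
            # Low latency wins, but a loaded cluster is penalized so traffic
            # spreads out before any one cluster saturates.
            score = latency_ms[isp][cluster] * (1.0 + load[cluster])
            if score < best_score:
                best, best_score = cluster, score
        return best

    # The nearer cluster loses here because it is nearly full:
    # lhr: 22 * 1.95 = 42.9, ams: 30 * 1.20 = 36.0
    print(pick_cluster("ExampleISP",
                       ["lhr", "ams"],
                       {"ExampleISP": {"lhr": 22.0, "ams": 30.0}},
                       {"lhr": 0.95, "ams": 0.20}))   # -> ams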

We will talk about the structure of Cartographer and explain some of its core algorithms for programmatically balancing traffic. Handling traffic-routing decisions for more than a billion users on Facebook, it is a great example of a small Python application having a large impact.
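
Building on the hypothetical pick_cluster sketch above, the outline below shows one way such a push-on-change loop could be structured: recompute the map from fresh metrics and ship it only when the decisions actually differ. Here collect_metrics and push_dns_map are placeholder callables, not real APIs.

    # Hypothetical control loop. The metrics dict shape and both callables
    # are assumptions for illustration.

    import time

    def generate_map(metrics):
        # One decision per ISP, using pick_cluster() from the sketch above.
        return {isp: pick_cluster(isp, metrics["clusters"],
                                  metrics["latency_ms"], metrics["load"])
                for isp in metrics["isps"]}

    def run(collect_metrics, push_dns_map, poll_interval=10):
        current = None
        while True:
            new_map = generate_map(collect_metrics())
            if new_map != current:       # push only when decisions change
                push_dns_map(new_map)
                current = new_map
            time.sleep(poll_interval)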


Adam Lazur

Facebook

Adam has spent the past 8 years diffusing the firehose of traffic for some of the biggest web sites on the internet. He has wholesale replaced the front-end load-balancing architecture on a massively growing site while it was serving… twice.

Adam is currently a Production Engineering Manager in the Traffic team at Facebook. He and his team build simple, reliable, and scalable components that adapt to the demands of the fastest moving site on the internet.

When Adam is not evangelizing the tenets of the UNIX philosophy, he is a dedicated father and amateur racing daydreamer.


Comments

kenan karakoc
07/30/2013 10:53pm PDT

Hello,

I missed the conference. How can I watch it?

Thanks

Samuel Trim
06/27/2013 9:50am PDT

Adam, I’m doing my rounds looking for slides from presentations I found the most informative. Will you be publishing your presentation to this site soon?

Thanks!
