Scaling systems configuration at Facebook: the paradigms, design, and software behind managing massive numbers of systems with open source and small teams

Operations, Grand Ballroom East

For many years, Facebook managed its systems with cfengine2. With many individual clusters over 10k nodes in size, a slew of constantly changing system configurations, and small teams, this system was showing its age, and its steadily increasing complexity limited its effectiveness and usability. It was difficult to integrate with internal systems, testing was often impractical, and it provided no isolation of configurations, among many other problems. After an extensive evaluation of the tools and paradigms in modern systems configuration management (open source, proprietary, and a potential home-grown solution), we built a system based on the open-source project Chef. The evaluation process involved understanding the direction we wanted to take in managing the many future iterations of systems, clusters, and teams. More importantly, we evaluated the various paradigms behind effective configuration management and the different kinds of scale they provide. What we ended up with is an extremely flexible system that allows a tiny team to manage an incredibly large number of systems with a variety of unique configuration needs. In this talk we will look at the paradigms behind the system we built, the software we chose and why, and the system we built using that software. Further, we will look at how the philosophies we followed can apply to anyone wanting to scale their systems infrastructure.


Phil Dibowitz

Facebook

Phil Dibowitz has been working in systems engineering for 12 years and is currently a production engineer at Facebook. Initially, he worked on the traffic infrastructure team, automating load balancer configuration management and designing and building the production IPv6 infrastructure. Phil now leads the team responsible for rebuilding the configuration management system from the ground up. Prior to Facebook, he worked at Google managing the large Gmail environment, and at Ticketmaster, where he co-authored and open-sourced a configuration management tool called Spine (https://github.com/ticketmaster/spine). Phil also contributes to and maintains various open source projects (http://www.phildev.net/) and has spoken at conferences and LUGs on a variety of topics, from Path MTU Discovery to X.509.
