O'Reilly Velocity Conference • San Jose • New York • London

Build Systems that Drive Business

Sep 30–Oct 1, 2018: Training
Oct 1–3, 2018: Tutorials & Conference

New York, NY

Availability, latency, and cost: Withstanding regional outages

Aaron Blohowiak (Netflix)

11:35am–12:15pm Wednesday, October 3, 2018

Systems Engineering and Architecture
Location: Murray Hill
Level: Intermediate

Secondary topics: Systems Architecture & Infrastructure

Prerequisite knowledge

Familiarity with cloud deployment, scaling, and DNS concepts

What you'll learn

Learn how Netflix operates in multiple regions at scale
Explore the algebraic models, code, and incident management playbooks the company has developed to tame, refine, and leverage its approach

Description

Running in multiple regions is better for your users through increased availability and lower latencies, and it won’t cost as much as you think. Netflix has turned region resiliency from a driver of cost and complexity into a strategic advantage by understanding human and system dynamics both at a high level and in the nitty-gritty details.

Calamity, heartbreak, and inefficiency drove the company to refine its approach, and its understanding, as it matured. Executing a failover used to be an all-hands-on-deck situation that would bring VPs to the table. Now it’s a matter of routine that usually concludes with a brief “all is well” email.

Once you’ve decided to go multiregion, three major questions arise: How many regions do you need? How should you steer users to regions? And how do you actually perform the failover?
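The session abstract does not give Netflix's actual models. As a rough illustration of the first question only, here is a common back-of-the-envelope capacity model. It makes three assumptions:

* Traffic is split evenly across N identical regions.
* The goal is to survive the loss of any one region.
* Each of the N-1 surviving regions must then carry 1/(N-1) of peak demand.

Under these assumptions, total provisioned capacity is N/(N-1) times peak demand. The function name `total_capacity_factor` is hypothetical.

```python
from fractions import Fraction


def total_capacity_factor(regions: int) -> Fraction:
    """Provisioned capacity, as a multiple of peak demand, needed so that
    the surviving regions can absorb the load when any one region fails.

    Hypothetical N-1 model: assumes load is split evenly across identical
    regions and that only a single region fails at a time.
    """
    if regions < 2:
        raise ValueError("need at least two regions to survive a regional outage")
    # Each of the N-1 survivors must be sized to carry 1/(N-1) of peak demand,
    # and all N regions are provisioned to that same size.
    per_region = Fraction(1, regions - 1)
    return regions * per_region


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, total_capacity_factor(n))
```

Under these assumptions, going from two regions (2x peak) to three (1.5x peak) halves the redundant headroom. That is one way to read the claim that running in multiple regions "won’t cost as much as you think." Real deployments have uneven regional demand and failover overheads that this sketch ignores.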

Aaron Blohowiak draws on his experience operating in multiple regions at scale at Netflix. He shares the algebraic models, code, and incident management playbooks the company has developed to tame, refine, and leverage its multiregion approach. He also offers an overview of the design considerations and system models Netflix used to answer those three questions.

Aaron Blohowiak

Netflix

Aaron Blohowiak is a senior software engineer on the traffic team at Netflix, where he is applying his passion for empiricism and system design to multiregion high-availability architecture and operations. Aaron has been building, breaking, and fixing systems for over a decade, from tiny startups to serving over 100M users at Netflix. He is a coauthor of Chaos Engineering.

©2018, O'Reilly Media, Inc.