Build resilient systems at scale
October 12–14, 2015 • New York, NY

Availability and chaos

Jeremy Edberg (MinOps)
3:30pm–5:00pm Monday, 10/12/2015
Location: Nassau Suite
Average rating: 4.29 (24 ratings)
Slides:   1-PDF 

Prerequisite knowledge

Basic knowledge of system administration and software principles. We'll start the session with some distributed computing basics; if you already know them, everything after the first 10 minutes will still be useful.


This talk will cover the latest in availability and chaos initiatives, with many examples from my experience at Netflix, including programs such as FIT, NTS, Blue, and Chaos Engineering. Even if you’ve been to a Netflix talk before, by me or anyone else, this will be mostly new information.

Take a deeper look than ever before into all the things you can do to make sure your system just works. I’ll cover not only what to do, but why you do each one; the motivation for each specific system; what kinds of outages, problems or theories led to each system; and what each one tests and how.

I’ll also get into the details of each system's outcomes: which ones succeeded, which ones did not, and why.

If you’re thinking of adopting a culture of deliberately unstable systems, then don’t miss this talk!


Jeremy Edberg


Jeremy Edberg is the CEO and founder of MinOps, which makes using the cloud stupid easy. He is an angel investor and advisor for various incubators and startups. Previously, Jeremy was the founding reliability engineer for Netflix. Before that, he ran Ops for Reddit, which at the time had more than five billion pageviews a month. Jeremy’s expertise is in distributed computing, availability, rapid scaling, and cloud computing. He also edited the highly acclaimed AWS for Dummies.


O’Reilly Media

Tech insight, analysis, and research