Human Fault-tolerance

Mission City Ballroom
This presentation will be streamed live.

There’s been a huge amount of progress in recent years in developing distributed systems that are resilient to all sorts of faults. However, one critical category of errors has largely been ignored: human error. The scope and potential impact of human error are massive: deploying bugs, accidentally deleting data, accidentally DDoSing important internal services, and so on. Designing for human fault-tolerance leads to important conclusions about the fundamental ways data systems should be architected.

Nathan Marz


Nathan Marz is the lead engineer on Twitter’s Publisher Analytics team. He was previously the lead engineer at BackType, which Twitter acquired in July 2011.

Nathan is the author of numerous open-source projects relied upon by companies all around the world. These include Cascalog, ElephantDB, and Storm.

He has spoken about his work at conferences such as the Hadoop Summit, Strange Loop, Gluecon, Clojure/conj, and POSSCON. He writes a blog at

