Build Systems that Drive Business
Sep 30–Oct 1, 2018: Training
Oct 1–3, 2018: Tutorials & Conference
New York, NY

Trade-offs in resiliency: Managing the burden of data recoverability

Kristina Bennett (Google)
2:25pm–3:05pm Tuesday, October 2, 2018
Distributed Data
Location: Nassau
Level: Intermediate
Secondary topics: Resilient, Performant & Secure Distributed Systems

Prerequisite knowledge

  • A basic understanding of storage and recovery practices (e.g., replication, passive replicas, snapshots, and DB event logs) and of SLOs

What you'll learn

  • Explore best practices for practical data recoverability and the pitfalls awaiting the unwary


Almost every service has critical data somewhere, whether it’s large-scale blob storage, minimal index tables, or just the service’s own production configuration. The data’s size, shape, and storage technology vary widely, yet the possibility of data loss remains, and the same obstacles to recovery consistently appear.

Kristina Bennett shares best practices that can prepare a service for practical data recoveries, highlights some of the hidden dangers waiting to ambush a recovery attempt, and examines some of the risk-cost trade-offs that inevitably dominate data integrity coverage, based on lessons learned from five years of data integrity tooling and consulting across Google.

Kristina Bennett


Kristina Bennett is a software engineer on the customer reliability engineering team at Google, where she helps support the team’s mission to “SRE everyone else.” Previously, she spent five years working on data integrity across Google.