Honey, I shrunk the database: Resilience and recoverability in cloud native services
Mistakes happen, even in the cloud. Your database may now be a managed service, globally replicated with 12 copies and automatic failover, and you may have blocked SSH access to production to prevent an accidental ‘rm -rf’, but code and configuration bugs can still destroy your data. In the world of SaaS, these bugs may not even be yours; they could come from a complex network of easy-to-adopt, hard-to-debug dependencies. You need to architect and build your systems to be resilient to and, more importantly, recoverable from these types of failures.
Sidney Shek and Jeff Farber explore patterns for handling and recovering from failures that they’ve used in the Identity platform at Atlassian, covering design, implementation, and operational considerations such as:
- Event sourcing and command-query responsibility segregation (CQRS), and the importance of supporting “rebootstrapping” of downstream systems
- Commutative/convergent replicated data types (CRDTs) for replicating data, and why they chose state transfer over operation transfer
- Using multiple independent technologies to avoid single points of failure (e.g., event storage in S3 versus Cassandra)
- Localized validation through signing and caching, and how to handle rapid invalidation of data
- Building “recovery” services, which may require more thought than the main functionality itself
Who is this presentation for?
- Architects and developers
Prerequisite knowledge
- Familiarity with microservice architectures and cloud providers (e.g., AWS)
What you'll learn
- Discover the importance of generic failure recovery mechanisms, rather than focusing solely on point solutions for individual failure modes
- Learn patterns for resilience and recovery in a microservices cloud native architecture
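As an illustration of the state-transfer approach to CRDTs mentioned in the session description, the sketch below shows a last-writer-wins map in which replicas exchange their entire state and merge it. It is a minimal sketch under stated assumptions, not the speakers' implementation: the class `LWWMap`, its methods, and the `(timestamp, replica_id)` stamps are all hypothetical. Because `merge` is commutative, associative, and idempotent, a replica can receive the same state twice, or in any order, and still converge. That property is one reason state transfer can be more forgiving than operation transfer when messages are duplicated or replayed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical stamp: (logical timestamp, replica id). The replica id breaks
# ties so every replica picks the same winner for concurrent writes.
Stamp = Tuple[int, str]


@dataclass
class LWWMap:
    """Hypothetical last-writer-wins map: a state-based (convergent) CRDT."""

    entries: Dict[str, Tuple[Stamp, object]] = field(default_factory=dict)

    def set(self, key: str, value: object, ts: int, replica: str) -> None:
        """Record a local write stamped with (ts, replica)."""
        self._apply(key, (ts, replica), value)

    def _apply(self, key: str, stamp: Stamp, value: object) -> None:
        # Keep whichever write has the larger stamp.
        current = self.entries.get(key)
        if current is None or stamp > current[0]:
            self.entries[key] = (stamp, value)

    def merge(self, other: "LWWMap") -> "LWWMap":
        """Combine two full replica states into a new state."""
        result = LWWMap(dict(self.entries))
        for key, (stamp, value) in other.entries.items():
            result._apply(key, stamp, value)
        return result

    def get(self, key: str) -> Optional[object]:
        entry = self.entries.get(key)
        return None if entry is None else entry[1]


# Two replicas accept conflicting writes independently, then exchange state.
a = LWWMap()
a.set("user:1:email", "old@example.com", ts=1, replica="a")
b = LWWMap()
b.set("user:1:email", "new@example.com", ts=2, replica="b")

assert a.merge(b) == b.merge(a)              # order of exchange doesn't matter
assert a.merge(b).merge(b) == a.merge(b)     # re-delivering state is harmless
print(a.merge(b).get("user:1:email"))        # new@example.com
```

Deletes are not modelled here; a common approach is to store a tombstone value with its own stamp so that removals merge like any other write.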
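The session description stresses supporting “rebootstrapping” of downstream systems in an event-sourced, CQRS design. A minimal sketch of the idea, assuming a hypothetical append-only log of user events with monotonically increasing sequence numbers: a downstream read model discards its (possibly corrupted) state and rebuilds it by replaying the log from the beginning. `Event`, `UserProjection`, `rebootstrap`, and the event kinds are illustrative names, not part of any Atlassian API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    seq: int        # position in the append-only log
    user_id: str
    kind: str       # hypothetical kinds: "created", "email_changed", "deleted"
    email: str = ""


@dataclass
class UserProjection:
    """Hypothetical downstream read model (the query side of CQRS)."""

    users: Dict[str, str] = field(default_factory=dict)
    last_seq: int = 0

    def apply(self, event: Event) -> None:
        # Ignore events at or below the last applied position, so duplicate
        # deliveries and overlapping replays do not change the result.
        if event.seq <= self.last_seq:
            return
        if event.kind in ("created", "email_changed"):
            self.users[event.user_id] = event.email
        elif event.kind == "deleted":
            self.users.pop(event.user_id, None)
        self.last_seq = event.seq

    @classmethod
    def rebootstrap(cls, log: Iterable[Event]) -> "UserProjection":
        # Start from an empty view and replay the whole log in order,
        # discarding whatever state the old (possibly corrupted) view held.
        fresh = cls()
        for event in sorted(log, key=lambda e: e.seq):
            fresh.apply(event)
        return fresh


log = [
    Event(1, "u1", "created", "a@example.com"),
    Event(2, "u2", "created", "b@example.com"),
    Event(3, "u1", "email_changed", "c@example.com"),
    Event(4, "u2", "deleted"),
]

view = UserProjection.rebootstrap(log)
view.users.clear()                        # simulate a bug that wipes the view
view = UserProjection.rebootstrap(log)    # recover by replaying the log
print(view.users)                         # {'u1': 'c@example.com'}
```

Skipping events at or below `last_seq` makes redelivery harmless, but it assumes events arrive in sequence order; a production consumer would also need to detect gaps in the log.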
Sidney Shek is an architect at Atlassian, where he oversees the transformation of identity systems into a massively scalable and flexible platform for users, product developers, and the ecosystem. He has over 10 years’ experience developing and architecting real-time and mission-critical software systems in industries ranging from financial services to manufacturing. He likes challenging traditional constraints and applying the latest R&D and technologies to build elegant yet reliable solutions to real-world problems. He believes that functional programming principles such as immutable data, type safety, and idempotence need to be ingrained in architects and programmers alike.
Jeff Farber is a principal software engineer at Atlassian, where he focuses on designing and implementing systems. Over the past 18 months, he’s built a critical permissions service used across Atlassian. Jeff has been building software since a young age, across a variety of domains. He enjoys guiding software teams and making business-oriented software decisions.