4–7 Nov 2019

Honey, I shrunk the database: Resilience and recoverability in cloud native services

Sidney Shek (Atlassian), Jeff Farber (Atlassian)
16:50–17:35 Thursday, 7 November 2019
Location: M8

Who is this presentation for?

  • Architects and developers

Level

Intermediate

Description

Mistakes happen, even in the cloud. Your database may now be a managed service, globally replicated with 12 copies and automatic failover, and you may have blocked SSH into prod to prevent accidental ‘rm -rf’, but code and config bugs can still destroy your data. In the world of SaaS, these bugs may not even be yours; instead, they could come from a complex network of easy-to-adopt, hard-to-debug dependencies. You need to architect and build your systems to be resilient to and, more importantly, recoverable from these types of failures.

Sidney Shek and Jeff Farber explore patterns for handling and recovering from failures that they’ve used in the Identity platform at Atlassian, covering design, implementation, and operational considerations. These include event sourcing and command-query responsibility segregation (CQRS), and the importance of supporting ‘rebootstrapping’ of downstream systems; commutative/convergent replicated data types (CRDTs) for replicating data, and why they chose state over operation transfer; using multiple independent technologies to avoid single points of failure (e.g., event storage in S3 versus Cassandra); localized validation through signing and caching, and how to handle rapid invalidation of data; and building ‘recovery’ services, which may require more thought than the main functionality itself.

Prerequisite knowledge

  • Familiarity with microservice architectures and cloud providers (e.g., AWS)

What you'll learn

  • Discover the importance of generic failure recovery mechanisms compared to focusing solely on point solutions for individual failure modes
  • Learn patterns for resilience and recovery in a microservices cloud native architecture

Sidney Shek

Atlassian

Sidney Shek is an architect at Atlassian, where he oversees the transformation of identity systems into a massively scalable and flexible platform for users, product developers, and the ecosystem. He has over 10 years’ experience developing and architecting real-time and mission-critical software systems across industries ranging from financial services to manufacturing. He likes challenging traditional constraints and applying the latest R&D and technologies in elegant yet reliable solutions to real-world problems. He believes that functional programming principles like immutable data, type-safety, and idempotence need to be ingrained in architects and programmers alike.

Jeff Farber

Atlassian

Jeff Farber is a principal software engineer at Atlassian, where his main focus is the design and implementation of systems. Over the past 18 months, he’s built a critical permissions service used across Atlassian. Jeff has been building software since a young age across various domains. He enjoys guiding software teams and making business-oriented software decisions.
