Loosely coupled microservices-based systems give us the flexibility to keep up with the demands of our users but also introduce the need to handle failures. Reliability engineering—a systematic way to evaluate and improve a system’s resilience to failures—is a crucial step in implementing distributed systems.
Jan Machacek explores the architecture and design decisions needed to build resilient systems, offering a step-by-step guide that you can apply to your current system to discover its critical areas and see what happens in those areas when the inevitable faults start to pile up. Along the way, Jan demonstrates different types of failures and how to deal with them. The tips and code Jan shares cover multi-AZ, multiregion Scala and Akka systems in AWS that rely heavily on Kafka, RDS, S3, and DynamoDB.
Jan Machacek is CTO at Cake Solutions, where he helps companies achieve exceptional growth and success through the use of modern computing technologies, specifically large-scale machine learning and big data systems, particularly those that interact with the IoT, wearables, mobile, and modern web applications. Jan is a passionate technologist with hands-on experience delivering large-scale systems, with a focus on those that bring together data science and mathematics with modern engineering practices. He regularly contributes to open source projects and speaks at technical conferences.
©2017, O’Reilly UK Ltd