Building and maintaining complex distributed systems
17–18 October, 2017: Training
18–20 October, 2017: Tutorials & Conference
London, UK
Tutorial. Please note: to attend, your registration must include Tutorials on Wednesday.
HTTP/2 (or "H2" as the cool kids call it) has been ratified since 2015, and browsers already support it. Do HTTP/2's exciting features meet expectations? How does HTTP/2 fare in the real world? How is browser behavior changing to accommodate new server-side functionality? How can you get the most out of the new protocol everybody’s talking about?
Technical Leadership
Eric Sigler (PagerDuty)
This session will cover data collected and patterns observed in postmortems across a large number of organizations that operate infrastructure. We will look at specific trends and groupings in postmortem practices, follow-up actions, and related behavior.
Technical Leadership
John Clapham (Cotelic)
This talk looks beyond the job specification and interview, towards ideas that lean and systems thinking uncover. We look at what constitutes the recruitment ‘system’, going beyond the traditional employee lifecycle. The talk shares a range of practical ideas and concepts to improve the various stages of recruitment, including finding and keeping the right kind of people.
Keynotes
Martin Kleppmann (University of Cambridge)
In this talk, we will explore how we can split "consistency" into two separate concepts: "integrity" and "timeliness". And we will see how that distinction allows us to build systems that behave correctly, even in the face of faults, while also achieving better availability and performance than the classic approach of ACID transactions.
Databases and Distributed Data
Colin Charles (Percona)
Tutorial. Please note: to attend, your registration must include Tutorials on Wednesday.
The MySQL world is full of trade-offs, and choosing a high-availability (HA) solution is no exception; yet only with high availability can you build a distributed database layer. We explore the MySQL high availability landscape, offering deep dives into current technologies, recommendations, and what to look out for.
Databases and Distributed Data
Jamie Winsor (Chef Software)
Understanding and building distributed systems can be daunting at first glance, but like most other software development patterns, they mimic real-world concepts you're already familiar with. In this talk, Jamie will help you build a mental model of the basics of distributed systems, grounded in concrete real-world systems.
DevOps & Tools, Systems Engineering
Jurgen Cito (University of Zurich)
Interesting and terrifying things happen in production. Some of these operational concerns need to be fixed in source code. But can we make developers care about operations? We talk about our experience with developers struggling with operations, and our journey to bring runtime performance aspects into developers' daily workflow and reduce the number of performance problems that reach production.
Capacity Planning
Colin Charles (Percona)
Databases require capacity planning (to those coming from traditional RDBMS solutions, this can be thought of as a sizing guide). Capacity planning prevents resource exhaustion, but it can be hard. This talk leans heavily on MySQL, but the concepts and addendum will help with any other data store.
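A minimal Python sketch of the simplest form of such a forecast follows. It assumes usage grows roughly linearly and is sampled once per day. The function name and the figures are illustrative and are not taken from the talk.

```python
def days_until_full(daily_usage_gb: list[float], capacity_gb: float) -> float:
    """Estimate how many days remain before storage is exhausted.

    Assumes linear growth: the average daily increase is derived from the
    first and last samples. Returns infinity if usage is not growing.
    """
    if len(daily_usage_gb) < 2:
        raise ValueError("need at least two daily samples")
    growth_per_day = (daily_usage_gb[-1] - daily_usage_gb[0]) / (len(daily_usage_gb) - 1)
    if growth_per_day <= 0:
        return float("inf")
    return (capacity_gb - daily_usage_gb[-1]) / growth_per_day


# Hypothetical data volume: 130 GB used, growing by 10 GB/day, on a 200 GB disk.
print(days_until_full([100, 110, 120, 130], capacity_gb=200))  # 7.0
```

Real workloads are rarely this linear. A regression over a longer window, or a model that accounts for seasonality, would be a more realistic next step.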
Systems Engineering
Ed Hiley (NHS Digital), Dan Rathbone (Infinity Works)
What are your perceptions of NHS IT? Not great? Well, the truth is very different from what you might expect. There is something of a technical renaissance going on in parts of the NHS, where things are being done in a modern way, learning from past experiences.
Orchestration, Scheduling, and Containers
Sam Newman (Independent)
Serverless computing is the hot new thing. Like any hyped technology, it promises a lot. However, questions remain around concept and implementation, especially when you compare how we've built systems in the past with what serverless offers us now. Is serverless the future, or just the emperor's new clothes?
Networking, Traffic, and Edge Management
Seth Vargo (HashiCorp)
Tutorial. Please note: to attend, your registration must include Tutorials on Wednesday.
There are two sides to monitoring: exposing problems and taking action to resolve them. Most monitoring systems handle the first, but Consul handles both. Consul enables self-healing infrastructure almost effortlessly. By coupling service discovery with monitoring, Consul can intelligently route traffic away from unhealthy hosts or fail over to datacenters in different geographic locations.
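The routing decision described above can be sketched in a few lines of Python. This is not Consul's API. The `Instance` type, the datacenter names and `pick_instances` are hypothetical. The sketch models only the decision logic: skip hosts whose health checks fail, and fall back to another datacenter when no host is healthy locally. In Consul itself, this behavior comes from health checks combined with service discovery, with prepared queries handling cross-datacenter failover.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    address: str
    datacenter: str
    passing: bool  # True when every health check for this instance passes


def pick_instances(instances, local_dc, fallback_dcs):
    """Return healthy instances, preferring the local datacenter.

    If no healthy instance exists locally, try each fallback datacenter
    in order; return an empty list if nothing is healthy anywhere.
    """
    for dc in [local_dc, *fallback_dcs]:
        healthy = [i for i in instances if i.datacenter == dc and i.passing]
        if healthy:
            return healthy
    return []


# Hypothetical cluster: the only London instance is failing its checks.
instances = [
    Instance("10.0.0.1", "london", passing=False),
    Instance("10.1.0.1", "frankfurt", passing=True),
]
print([i.address for i in pick_instances(instances, "london", ["frankfurt"])])
# ['10.1.0.1']
```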
Orchestration, Scheduling, and Containers
Harry Winser (Rightmove)
Microservices and continuous delivery are now mainstream. But how do you handle API changes between microservices while staying confident that they can still communicate after each release? In this talk, we’ll look at the Pact framework, how it gave us the confidence to continuously deliver services that depend on one another, and how Docker was used to make developer testing easier.
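The Python sketch below illustrates the idea behind consumer-driven contract testing with a provider-side check. It does not use the Pact API; `contract_mismatches` and the example payloads are hypothetical. The consumer records the response shape it relies on, and the provider's real response is checked against that shape. Extra fields are allowed, so the provider can evolve without breaking the consumer.

```python
def contract_mismatches(expected: dict, actual: dict, path: str = "") -> list[str]:
    """List the ways `actual` fails to satisfy the consumer's `expected` shape.

    Each expected key must be present with a value of the same type;
    nested objects are checked recursively. Extra keys are ignored.
    """
    problems = []
    for key, example in expected.items():
        where = f"{path}.{key}" if path else key
        if key not in actual:
            problems.append(f"missing {where}")
        elif isinstance(example, dict):
            if isinstance(actual[key], dict):
                problems.extend(contract_mismatches(example, actual[key], where))
            else:
                problems.append(f"{where} should be an object")
        elif type(actual[key]) is not type(example):
            problems.append(f"{where} should be {type(example).__name__}")
    return problems


# Hypothetical consumer expectation and provider response.
expected = {"id": 1, "price": {"amount": 9.99, "currency": "GBP"}}
actual = {"id": 42, "price": {"amount": "9.99"}, "added_later": True}
print(contract_mismatches(expected, actual))
# ['price.amount should be float', 'missing price.currency']
```

An empty list means the provider still satisfies this consumer. Running a check like this on every provider build is what allows both sides to release independently.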
Orchestration, Scheduling, and Containers
Liz Rice (Aqua Security)
When you build a container image from a Dockerfile, or pull an image from a registry, do you really know what’s inside? In this talk, we’ll reverse engineer container images so that you understand how they are put together and how layers work. We’ll see how you can make smaller, more efficient images, and we’ll investigate ways to reduce the security risks in your containers.
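The way layers combine can be modelled in a few lines of Python. This is a simplified sketch, not Docker's implementation. Each layer is a hypothetical dict mapping paths to file contents. Deletions use the `.wh.` whiteout-file convention from the OCI image specification, and opaque directory whiteouts are deliberately left out.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."  # marks a file deleted by a higher layer (OCI image spec)


def flatten_layers(layers: list[dict[str, str]]) -> dict[str, str]:
    """Merge image layers, lowest first, into the filesystem a container sees.

    Each layer maps file paths to contents. Within a layer, whiteouts are
    applied first (they only hide files from lower layers), then the
    layer's own files are added, overriding lower layers.
    Opaque directory whiteouts are not modelled.
    """
    fs: dict[str, str] = {}
    for layer in layers:
        for path in layer:
            directory, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                # Hide the target and, if it was a directory, everything under it.
                for existing in list(fs):
                    if existing == target or existing.startswith(target + "/"):
                        del fs[existing]
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                fs[path] = content
    return fs


# Hypothetical image: a secret is added in one layer and "deleted" in the next.
base = {"etc/os-release": "alpine", "app/secret.key": "s3cr3t"}
top = {"app/.wh.secret.key": "", "app/main.py": "main"}
print(flatten_layers([base, top]))
# {'etc/os-release': 'alpine', 'app/main.py': 'main'}
```

The secret is absent from the merged view, but `base` still contains it. Anyone who pulls the image can extract that lower layer. Deleting a file in a later layer therefore makes an image neither smaller nor safer.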
Capacity Planning
Thomas Barns (Capacitas), John Pillar (Arcadia Group)
With ever-increasing demands for fast business change, how can we ensure our digital channels meet the increasingly exacting performance standards our customers (and business owners) expect? What does this look like in an age of DevOps and Continuous Delivery? We’ll take you through our experiences as we build a strategy for shifting left and automating performance analysis.
Orchestration, Scheduling, and Containers
Viktor Farcic (CloudBees)
Tutorial. Please note: to attend, your registration must include Tutorials on Wednesday.
The workshop walks the audience through best practices for the challenges we face when working on distributed systems at scale.
Heinrich Hartmann (Circonus)
2-Day Training. Please note: to attend, your registration must include Training courses.
Gathering telemetry data is key to operating reliable distributed systems at scale. Data science is the art of extracting information from large amounts of data. In this training, we will cover a wide range of data analysis methods, from both the theoretical and the practical side, that will make you more effective at operations tasks.
Networking, Traffic, and Edge Management
DDoS mitigation is an ever-evolving art. Architectures change, attackers get more creative, and keeping your team and tools ahead of the curve is a constant battle. So why not make DDoS preparedness fun, as well as practical? We’ll share our experiences with DDoS war games as a means of keeping your team’s skillset polished, their tools in top shape, and their spirits and confidence high.
Michael Hausenblas (Red Hat)
2-Day Training. Please note: to attend, your registration must include Training courses.
Serverless, or more precisely Function-as-a-Service (FaaS), is going mainstream, and now is a good time to learn how and when to use it. We will cover use cases, offerings, development (including in a team setting), and operational aspects, using AWS Lambda as the environment.
Distributed Systems
Anne Currie (Force12.io)
Forget Conway’s Law: in distributed systems, Murphy’s Law rules. “Everything that can go wrong will go wrong.” At scale, statistics are not your friend and human intuition fails. Embrace your inner catastrophist!
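A short calculation shows why statistics turn against you at scale. Suppose each of n components fails with probability p over some period. The chance that at least one of them fails is then 1 - (1 - p)^n. The minimal Python sketch below assumes failures are independent, which is a simplifying assumption; the failure rate is hypothetical.

```python
def p_any_failure(p_single: float, n: int) -> float:
    """Probability that at least one of n independent components fails,
    when each fails with probability p_single over the same period."""
    return 1 - (1 - p_single) ** n


# A hypothetical 0.1% chance of failure per machine per day.
for n in (1, 100, 1000):
    print(n, round(p_any_failure(0.001, n), 3))
```

At one machine, a failure on any given day is a 0.1% event. At 100 machines the probability is about 9.5%. At 1,000 machines it is roughly 63%, so a failure somewhere becomes the normal case rather than the exception.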
Networking, Traffic, and Edge Management
Manuel Alvarez (Akamai Technologies)
"By failing to prepare, you are preparing to fail." Your risk mitigation strategy must layer the most cost-efficient strategies to effectively mitigate or reduce the adverse effects of failure. This talk presents the CDN as a failover tool. I will review use cases and show you how to evaluate whether using a CDN is a good idea by weighing costs, benefits, operations, and time to mitigate.
Systems Engineering
Steven Faulkner (Bustle)
At Bustle, we have transitioned our entire production platform to AWS Lambda and API Gateway. But it didn't happen overnight. We got there iteratively, and GraphQL was a huge part of the process. I'll talk specifically about the different approaches we used to move services and data off legacy infrastructure and how we used GraphQL to do it.
Orchestration, Scheduling, and Containers
Mandi Walls (Chef)
Tutorial. Please note: to attend, your registration must include Tutorials on Wednesday.
Chef's Habitat project is designed for automating your applications, no matter where they have to run. This workshop will get you started with Habitat and its toolset. Attendees will automate various application stacks with Habitat and learn how to export and manage Habitat-built artifacts with Docker and native Habitat runtime environments.
Systems Engineering
Yoav Cohen (Imperva Incapsula)
Last year, we experienced our first worldwide network outage due to an erroneous quotation mark in a parsing function. The outage affected the websites that rely on our security and acceleration every day. In this talk I’ll explain how we re-architected key components of the service to become an order of magnitude more reliable for the millions of web sites we protect.
Orchestration, Scheduling, and Containers
Ben Hall (Katacoda | Ocelot Uproar)
Docker offers many advantages, simplifying both development and production environments. But there is still uncertainty around the security of containers. During this talk, Ben will share his experiences and investigate Docker and its security model. The aim is to answer the question: "How secure are Docker containers?"
DevOps & Tools, Systems Engineering
Janna Brummel (ING Netherlands), Robin van Zijll (ING Netherlands)
Have you read the O’Reilly book about SRE at Google and doubted whether SRE could work for your more traditional or more regulated company? We will share how we implemented SRE in a global financial organization: our global way of working, which technologies we use and why, and what we have learned after a year of doing SRE.
Technical Leadership
Daniel Young (EngineerBetter), Emma Jane Hogbin Westby (UN-OCHA)
Software development is a social activity that favours direct human contact, yet 21st century life can often get in the way, forcing us to reconsider our communication patterns. In this talk, leaders from two very different teams will encourage the audience to think about how they can build and maintain happy, productive teams, regardless of geography.
Monitoring, Tracing and Metrics, Systems Engineering
Pierre Vincent (Poppulo)
Understanding the state of a running application is key to troubleshooting production issues efficiently and, ultimately, anticipating outages. This talk focuses on making monitoring an integral part of development, using health checks, metrics, tracing, and other patterns to get a clearer picture of applications in production.
Keynotes
Christopher Meiklejohn (Independent)
Details to come.
Keynotes
Miriah Meyer (University of Utah)
Details to come.
Keynotes
Sara-Jane Dunn (Microsoft Research)
Details to come.
Sebastien Goasguen (Bitnami)
2-Day Training. Please note: to attend, your registration must include Training courses.
Kubernetes is one of the highest velocity projects on GitHub. Based on 15 years of experience managing containerized applications at Google, Kubernetes is becoming the leading platform to build your distributed applications on.
Keynotes
Dharma Shukla (Microsoft)
In this keynote, Dharma will describe the internals of the system design and the design trade-offs the team had to make while building the Azure Cosmos DB service. He will also share his experiences operating a globally distributed database service and maintaining comprehensive Service Level Agreements (SLAs).
Orchestration, Scheduling, and Containers
Raj Rohit (Episource)
My team has just finished building a scalable, resilient, serverless distributed data pipeline that scales seamlessly with the amount of input data. We used several tools, including Ansible, Lambda, and Terraform, and learned a lot along the way, in the form of pitfalls, failures, and wins. This talk is about that system and the lessons learned.
DevOps & Tools, Technical Leadership
Hannah Foxwell (Server Density)
Machine learning is the new big data. Everyone is supposed to be on board, but do we understand why? How can machine learning help me with my job? With our platforms becoming more complex and changing more frequently than ever before it's time we stopped trying to maintain them manually. This talk explores the technology and real use cases for machine learning in infrastructure operations and SRE.
Networking, Traffic, and Edge Management
Emile Vauge (Containous)
How do you effectively manage inbound network traffic in your container-based infrastructure? This talk is a deep dive into Traefik, a modern reverse proxy and load balancer made to deploy microservices with ease. Expect plenty of demos with Docker, Let’s Encrypt, and Kubernetes.
Monitoring, Tracing and Metrics, Systems Engineering
Kamil Smuga (Salesforce), Mihai Bojin (Salesforce)
Have you ever had to monitor the health of your service (server stats, application errors, etc.)? What if you had to monitor the cloud with its hundreds of thousands of servers? Alerts can create noise and spam your team. Mihai Bojin and Kamil Smuga explain how Salesforce approaches monitoring at scale by putting customers first.
Jason Yee (@gitbisect - Datadog)
Using real-world metrics data from thousands of organizations, I'll share the latest trends in container adoption and use. I'll also share data on what types of applications organizations are running in containers and how to best monitor those containerized applications.
Orchestration, Scheduling, and Containers, Systems Engineering
Matthew Skelton (Skelton Thatcher Consulting)
In this talk, we explore five practical, tried-and-tested, real world techniques for improving operability with many kinds of software systems, including cloud, Serverless, on-premise, and IoT. Based on our work in many industry sectors, we will share our experience of helping teams to improve the operability of their software systems through these straightforward, team-friendly techniques.
Databases and Distributed Data, Networking, Traffic, and Edge Management
Baron Schwartz (VividCortex)
Distributed systems used to be the exception, but today they're the norm. That's why it's more useful than ever to be able to quantify scalability. With the Universal Scalability Law you can characterize how your systems truly behave, and what's more important, why they don't scale like they could and how to improve them. It's simple, elegant, and although it's formal, no math is needed!
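The talk promises that no math is needed, but for reference the law itself is a short formula. Neil Gunther's Universal Scalability Law gives throughput at concurrency N as X(N) = λN / (1 + σ(N - 1) + κN(N - 1)). Here σ models contention and κ models coherency (crosstalk) costs, and throughput peaks at N = √((1 - σ)/κ). The minimal Python sketch below uses hypothetical coefficients; in practice, σ and κ are fitted to measured throughput.

```python
import math


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Throughput at concurrency n under the Universal Scalability Law.

    lam   -- throughput with a single unit of concurrency (n = 1)
    sigma -- contention coefficient (serialized work)
    kappa -- coherency coefficient (cost of keeping units in agreement)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which throughput is maximal; past it, throughput falls."""
    return math.sqrt((1 - sigma) / kappa)


# Hypothetical coefficients for illustration only.
lam, sigma, kappa = 1000.0, 0.05, 0.001
peak = usl_peak_concurrency(sigma, kappa)
print(f"peak at N ~ {peak:.1f}")
for n in (1, 8, 31, 64):
    print(n, round(usl_throughput(n, lam, sigma, kappa)))
```

With these values, throughput peaks at about N = 31, and at N = 64 it is lower than at N = 31. The κ term is what makes a system scale backwards rather than merely flatten out.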
Databases and Distributed Data
Uwe Friedrichsen (codecentric AG)
This session explores the challenges, options, and trade-offs of different consistency models in distributed system landscapes. It starts with the limitations of ACID transactions, looks into eventual consistency, and finally explores the current state of research in the area, which tries to fill the gaps between ACID and BASE transactions.
Capacity Planning
Alexander Akbashev (HERE Technologies)
The story of how we scaled a single Jenkins master instance from 20k to 140k builds per day using AWS services (EC2, S3, Memcache, etc.). Disclaimer: everything we did to achieve this result was open sourced and upstreamed.
Systems Engineering
Jasvir Nagra (Instart Logic), Marianna Bezler (Instart Logic)
A developer hunting for a bug is like a doctor hunting for an illness: she does not need a complete understanding of the body for the hunt to succeed. In this talk, we share a few painful anecdotes from debugging distributed web apps, and an alternative approach we took, using virtualization and visualization, to get a holistic view of a program and track down elusive bugs. You may be inspired to use it too.
Systems Engineering
Guy Podjarny (Snyk)
Serverless means handing off server management to the cloud platforms—along with their security risks. With the “pros” ensuring our servers are patched, what’s left for application owners to protect? As it turns out, quite a lot. This talk discusses the aspects of security serverless doesn’t solve, the problems it could make worse, and the tools and practices you can use to keep yourself safe.
Orchestration, Scheduling, and Containers, Systems Engineering
Dalia Simons (Wix)
Do you have an old and important monolith project you really want to rewrite but don’t know where to start? This is the talk for you: ideas, tips, and strategy await.
Systems Engineering
Salim Virji (Google)
Tutorial. Please note: to attend, your registration must include Tutorials on Wednesday.
The SRE Classroom presents key concepts behind microservices, and guides participants through applying the concepts with a problem-solving exercise. Participants will learn to evaluate systems as well as how to build their own.
Orchestration, Scheduling, and Containers, Resilience Engineering
Angie Jones (Twitter)
Learn how to build stability and credibility into your continuous integration tests so that the team is able to receive the fast feedback it needs for agile development.
Databases and Distributed Data
Max Neunhöffer (ArangoDB GmbH)
In the modern data store world, we see a race between different approaches to achieving distributed and resilient storage. IoT, genomics, and applications in many other fields also raise the demand for a stateful layer. Max will take the audience on a tour of the ingredients, their interplay, and the inner workings of modern open source databases such as ArangoDB, Cassandra, CockroachDB, and RethinkDB.
Systems Engineering
Sean O'Connor (Bitly)
Data center migrations are rare but interesting events. In this talk, we’ll give a play-by-play of Bitly’s 2016 move, sharing the decisions, trade-offs, mistakes, and successes from the decision to move through to turning off the lights in the old DC.
DevOps & Tools, Technical Leadership
Nicole Forsgren (DORA), Nigel Kersten (Puppet)
The State of DevOps Report has shown that high-performing IT teams decisively outperform low-performing peers, with greater throughput and stability, driving value that shows up on the bottom line. This presentation will highlight insights into key leadership, technical, architectural, and product capabilities to drive these outcomes. Plus: how the study is run each year and the science!
DevOps & Tools, Technical Leadership
Gareth Rushgrove (Puppet)
With the popularity of Git and GitHub we’ve seen an explosion in the number of software repositories. But is creating a new repository always the right approach? In this talk we'll look at monorepos: putting all of your product’s or organisation’s code in a single repo. What advantages does this have? Why would you take this approach? And what tools exist to help maintain monorepos?
Technical Leadership
Chris Jackson (Pearson)
I work for a 175-year-old company undergoing an aggressive digital transformation. Enabling containers, DevOps, and microservices in this environment requires a different approach. Listen to how we built a tech startup inside the enterprise with the aim of innovating on the developer experience. Follow our journey from inception to B-round funding, becoming the foundation of the company's future.
Technical Leadership
Our company started last year in dire straits. Our strategy was not working. All the key metrics were slowly drifting downwards. Many people left. We even did a re-org. I've had multiple last conversations. Sometimes, trying to stop people from leaving. Other times, telling them that they'll have to leave. This story will be centered on three such conversations.
Orchestration, Scheduling, and Containers
Welcome to the world of nanoservices: smaller than a microservice, bigger than a function, they are the perfect unit of software. Nanoservices are flexible, manageable and scalable, and a great way to do serverless computing. This is the story of how to get nanoservices right, from the BBC, who now have over a thousand in production.
Databases and Distributed Data
Alvaro Videla (RabbitMQ)
Learn the foundational concepts of Distributed Systems: Failure Modes, Timing Models, and also which books are the best to start learning about the topic.