Build & maintain complex distributed systems
17–18 October 2017: Training
18–20 October 2017: Tutorials & Conference
London, UK


Eric Sigler (PagerDuty)
Eric Sigler shares data collected and patterns observed in postmortems across a large number of infrastructure operating organizations, covering specific trends and groupings of various types of postmortem practices, follow-on actions, and related behavior.
Kiran Bhattaram (Stripe)
As the scale of data our systems produce continues to increase, the techniques our systems use to process it must evolve. Kiran Bhattaram explains why sketches are a good option for leveraging more sophisticated data structures.
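As a rough illustration of the kind of structure the abstract refers to, here is a minimal Count-Min sketch, one common sketch for approximate frequency counting. The session does not say which sketches it covers, so the choice of structure, the class name, and the default sizes are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using a fixed grid of counters.

    Illustrative only: width and depth defaults are arbitrary assumptions.
    """

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )
```

Memory use stays fixed at width times depth counters no matter how many distinct items are added; the trade-off is that estimates may overcount, though they never undercount.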
Andrew Betts (Fastly)
Most people working with CDN caches know about the Vary header, but few properly understand what it really does. And with the advent of the Key header, new patterns for varying cache content will emerge. Andrew Betts shares common and advanced use cases for Vary, such as language, A/B testing, compression, and service worker support, and outlines potential changes to consider when Key arrives.
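To make the role of Vary concrete, the sketch below shows one simplified way a cache could fold the request headers named by a stored response's Vary header into its cache key. The function name and the key layout are hypothetical, and real caches (including CDNs) typically normalize header values before comparing them, which this example does not attempt.

```python
def cache_key(method, url, vary_header, request_headers):
    """Build a cache key from the URL plus the request headers named by Vary.

    Simplified sketch: header values are compared verbatim, with no
    normalization of whitespace, ordering, or case.
    """
    # Header names are case-insensitive, so compare them in lowercase.
    headers = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary_header.split(",") if n.strip())
    if "*" in names:
        # "Vary: *" means no later request can be matched to this response.
        return None
    variant = tuple((name, headers.get(name, "")) for name in names)
    return (method, url, variant)
```

With `Vary: Accept-Encoding`, two requests that differ only in `User-Agent` map to the same key, while requests with different `Accept-Encoding` values are stored as separate variants.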
Colin Charles (Percona)
The MySQL world is full of trade-offs, and choosing a high-availability (HA) solution is no exception. However, only with high availability can you achieve distributed systems in your database layer. Colin Charles explores the MySQL high-availability landscape, offering deep dives into current technologies, recommendations, and what to look out for.
Sara-Jane Dunn (Microsoft Research)
Sara-Jane Dunn discusses an entirely different paradigm of computing: the information-processing carried out by cells. Focusing on examples from cutting-edge stem cell research, Sara shares formal techniques from computer science that allow us to peer into the inner workings of biology, make sense of the earliest stages of development, and even program cells for use in therapy.
Catherine Mulligan (Imperial College)
Although the blockchain is technically a distributed system, there has been a surprising lack of interest from the distributed systems community. Catherine Mulligan explores the implications of the blockchain to distributed systems and explains what needs to be addressed in order to build and maintain them effectively.
Mike Roberts (Symphonia)
Mike Roberts describes a real-life example where an existing data platform was rearchitected and reengineered to provide several improvements: significantly increased data capacity, reduced cost, and vastly improved development cycle time.
Jamie Winsor (Chef Software)
Understanding and building distributed systems can be a daunting task, but like most other software development patterns, distributed systems mimic concepts in the real world that you're already familiar with. Jamie Winsor walks you through building a mental model to help you understand the basics of building distributed systems based on concrete, real-world systems.
Rick Fast (Expedia)
As Expedia refactors its backend services into a finer-grained microservice architecture, frontend applications have begun to be split into smaller applications serving a small number of pages or content on the website. Rick Fast details how Expedia is creating an extremely configurable, self-service edge architecture for routing between frontend applications and managing bot traffic.
Jürgen Cito (University of Zurich)
Can we make developers care about operations? Jürgen Cito shares real-world experience of developers struggling with operations and details a journey to incorporate runtime performance aspects into the developer's daily workflow and reduce performance problems reaching production.
Colin Charles (Percona)
Databases require capacity planning. (To those coming from traditional RDBMS solutions, this can be thought of as a sizing guide.) Capacity planning prevents resource exhaustion, but it can be hard. Colin Charles explores storage capacity planning for OLTP and data warehousing uses.
Soo Choi (DevOps Research and Assessment (DORA))
Soo shares her experiences as a woman in tech. Even though she worked for NASA and co-founded her own successful company, rampant sexism in IT and bad experiences speaking in public nearly destroyed her career. She will examine common constructs about diversity and propose ideas to bring productive change to continue to build upon the solid foundation of inclusion we have created.
Velocity program chairs Nikki McDonald, Ines Sombra, and James Turnbull close the first day of keynotes.
Velocity program chairs Nikki McDonald, Ines Sombra, and James Turnbull close the second day of keynotes.
Liz Rice (Aqua Security)
Your organization wants to go cloud native, but you don't want to hit the headlines as the victim of the latest hacking scandal. Liz Rice addresses the questions you need answers to: Will your deployments be less secure or more? How do DevOps processes like CI/CD and cluster orchestration affect your security profile? And what can we all do to minimize the risk of exploits?
Sam Newman (Independent)
Like any hyped technology, serverless computing promises a lot. However, questions remain around its concept and implementation, especially when you start to compare how we've built systems in the past and what serverless offers us now. Sam Newman asks (and answers), "Is serverless the future or just the emperor's new clothes?"
Seth Vargo (Google)
There are two sides to monitoring: exposing problems and taking action to resolve them. Most monitoring systems handle the first, but Consul handles both. Seth Vargo explains how Consul enables self-healing infrastructure. By coupling service discovery with monitoring, Consul is able to intelligently route traffic away from unhealthy hosts or fail over to geographically different data centers.
Harry Winser (Rightmove)
Harry Winser explains how to leverage consumer-driven contracts to achieve fully independent releases of microservices across teams and how to handle a service rollback while still serving over 47 million requests a day. Harry also demonstrates how to use the Pact framework to continuously deliver services that depend on one another and Docker to make developer testing easier.
Thomas Barns (Capacitas), John Pillar (Arcadia Group)
With ever-increasing demands for fast business change, how can we ensure our digital channels reflect the exacting standards of performance our customers (and business owners) expect? What does this look like in an age of DevOps and continuous delivery? Thomas Barns and John Pillar share a strategy for shifting left and automating performance analysis.
Heinrich Hartmann (Circonus)
Gathering telemetry data is key to operating reliable distributed systems at scale. Heinrich Hartmann explores a wide range of data science and analysis methods (both theoretical and practical) that can make you more effective at an operations task.
DDoS mitigation is an ever-evolving art. Architectures change, attackers get more creative, and keeping your team and tools ahead of the curve is a constant battle. So why not make DDoS preparedness fun as well as practical? Shannon Weyrick explains why you should use DDoS war games to keep your team’s skillset polished, their tools in top shape, and their spirits and confidence high.
Join us for the social highlight of Velocity at Draft House Paddington, beginning at 18:30. Enjoy a proper pint and a slice of pizza while networking and making new connections.
Anne Currie (Container Solutions)
Forget Conway's law. In distributed systems, Murphy’s law rules: Everything that can go wrong will go wrong. Anne Currie discusses common failure modes, how to approach diagnosing highly complex issues, and what we can learn from detectives like Sherlock Holmes, Hercule Poirot, and Miss Marple.
Tyler McMullen (Fastly)
Edge computing is a hot topic, but despite all the hype, there are still some major hurdles to overcome before it reaches its full potential. Tyler McMullen outlines the technical and economic challenges and explains how we can get past them.
Peter Bourgon (Fastly), Sean Braithwaite (Independent)
Peter Bourgon and Sean Braithwaite offer an overview of microservices and data pipelines, explaining how both systems reflect the organizations and people that build them (in adherence to Conway’s law) and can be well understood in terms of their relationship to change and time. You'll learn the virtues and vices of each architecture and get enough context to apply them coherently.
Manuel Alvarez (Akamai Technologies)
By failing to prepare, you are preparing to fail. Your risk mitigation strategy must layer the most cost-efficient strategies to effectively mitigate or reduce the adverse effects of failure. Manuel Alvarez explores using the CDN as a failover tool, reviewing use cases and demonstrating how to decide whether to use a CDN by evaluating costs, benefits, operations, and time to mitigate.
Mike Strickland (Intel Corporation)
Microsoft has widely deployed field-programmable gate arrays (FPGAs) for accelerating search, networking, and machine learning—with a little help from Intel's software expertise and its FPGA programmers. Mike Strickland explains how a single FPGA can deliver significant acceleration for multiple workloads.
Meet the Experts sessions give you a chance to meet expert Velocity presenters face-to-face in a small group setting. Join in to discuss the speakers' areas of expertise, give feedback about their sessions, or ask questions. Seating is limited, so don't forget to add it to your personal schedule.
O’Reilly Author Book Signings will be held in the O’Reilly booth on Thursday and Friday. This is a great opportunity for you to meet O’Reilly authors and speakers.
Nikki McDonald (O’Reilly Media), Ines Sombra (Fastly), James Turnbull (Empatico)
Velocity program chairs Nikki McDonald, Ines Sombra, and James Turnbull open the second day of keynotes.
Meet us before the opening keynotes on Friday morning and get to know fellow attendees in quick, 60-second discussions.
Francesc Campoy Flores walks you through the tools that make Go a great programming language, from the well known "go" tool to lesser known tools that allow you to profile, debug, and understand the performance of your programs.
Steven Faulkner (Bustle)
Bustle has transitioned its entire production platform to AWS Lambda and API Gateway. But it didn't happen overnight. The change was iterative, and GraphQL played a huge part in the process. Steven Faulkner discusses the different approaches Bustle used to transition services and data off of legacy infrastructure and explains why and how the company used GraphQL as part of the process.
Mandi Walls (Chef)
Chef's Habitat project is designed for the automation of your applications, no matter where they have to run. Mandi Walls helps you get started with Habitat and its toolset. You'll learn how to automate various application stacks with Habitat and how to export and manage Habitat-built artifacts with Docker and native Habitat runtime environments.
Ben Hall (Katacoda | Ocelot Uproar)
Docker offers many advantages, simplifying both development and production environments. But there is still uncertainty around the security of containers. Ben Hall answers the question "How secure are Docker containers?" by exploring Docker's security model, its limitations, and how to handle them.
Janna Brummel (ING Netherlands), Robin van Zijll (ING Netherlands)
Did you read the O’Reilly book about Google SREs but doubt that SRE will work for your more traditional or more regulated company? Janna Brummel and Robin van Zijll explain how they implemented SRE in a global financial organization, providing an overview of methods and technologies and sharing lessons learned from a year of doing SRE.
HTTP/2 (or H2, as the cool kids call it) has been ratified for months, and browsers already support it. But do the exciting features that HTTP/2 offers meet expectations? Frederik Deweerdt explores how HTTP/2 fares in the real world, how browser behavior is changing to accommodate new server-side functionality, and how you can get the most out of the new protocol everybody’s talking about.
Daniel Young (EngineerBetter), Emma Jane Hogbin Westby (UN-OCHA)
Software development is a social activity that favors direct human contact, yet 21st century life can often get in the way, forcing us to reconsider our communication patterns. Daniel Young and Emma Jane Hogbin Westby explore how to build and maintain happy productive teams, regardless of geography.
Pierre Vincent (Poppulo)
Understanding the state of a running application is the key to efficiently troubleshooting production issues and ultimately anticipating outages. Pierre Vincent demonstrates how to make monitoring an integral part of development, using health checks, metrics, tracing, and other patterns to get a clearer picture of applications in production.
Mandy Hubbard (HomePay)
You rely on Jenkins to manage the full stack of your continuous delivery pipeline, but why shouldn’t Jenkins itself be software defined, ephemeral, and available at the push of a button? Mandy Hubbard explains how to use a customized, script-based startup process and Joyent’s ContainerPilot, with just a few edits to a Docker Compose _env file, to launch Jenkins in a Docker container.
Laura Hackney (AnnieCannons)
What happens when Tech for Good and human-centered design actually support the needs of their end users? Laura Hackney explores the pitfalls and successes of the movement to bring social justice work into the technology landscape. Laura also shares insights from AnnieCannons, her nonprofit dedicated to transforming survivors of human trafficking into software professionals.
Kavya Joshi (Samsara)
Kavya Joshi explores the fascinating timekeeping mechanisms used in real systems, covering the different expressions of time in the context of practical systems that use them and investigating how the timekeeping mechanism affects the properties of the entire system.
Details to come.
Sebastien Goasguen (Bitnami)
Kubernetes is becoming the leading platform for building distributed applications. Sebastien Goasguen walks you through the architecture of a Kubernetes installation, covering the API objects that make up a distributed application, basic operations of Kubernetes primitives, and advanced scheduling scenarios and production concerns.
Raj Rohit (Episource)
Episource just finished building a scalable, resilient serverless distributed data pipeline for coding medical charts using NLP, which scales seamlessly with the amount of data it takes in as input. Raj Rohit explores the system and the tools used to build it, such as Ansible, Lambda, and Terraform, and shares the pitfalls, failures, successes, and lessons learned along the way.
Join other attendees during lunch at Velocity to share ideas, talk about the issues of the day, and maybe solve a few. Not sure which topic to pick? Don’t worry—it's not a long-term commitment. Try two or three and settle on a different topic tomorrow.
Hannah Foxwell (Server Density)
Machine learning is the new big data. Everyone is supposed to be on board, but do we understand why? As platforms become more complex and change more frequently than ever before, it's time we stopped trying to maintain them manually. Hannah Foxwell explores the technology and real use cases for machine learning in infrastructure operations and SRE.
Emile Vauge (Containous)
Emile Vauge explains how to effectively manage inbound network traffic in your container-based infrastructure with Traefik, a modern reverse proxy and load balancer made to deploy microservices with ease.
Colin Charles (Percona)
Colin is here to talk about MySQL, MariaDB server, high availability, security, capacity planning, MongoDB, and other database-related issues.
Kolton Andrus (Gremlin Inc.)
Ask Kolton all of your chaos engineering questions.
Liz Rice (Aqua Security)
Liz would be happy to discuss anything related to containers, particularly container security best practices.
Talk with Lorna about the best use cases for queues and how to set yourself up to survive when things go wrong.
Mandi Walls (Chef)
Ask Mandi about Chef's Habitat project and how to increase your effectiveness using configuration management and modernizing IT practices.
Come chat with Matthew about improving the operability of your software systems.
Michael Hausenblas (Red Hat)
Michael is a Gopher. If you know what that means, you'll likely want to talk with him. He’s also happy to chat about all sorts of cloud-native topics, including containers (CRI-O, Docker, etc.), Kubernetes, OpenShift, Prometheus, and functions as a service (from Amazon Lambda to OpenWhisk).
Join Nicole to discuss insights from the latest State of DevOps Report, infrastructure architecture patterns, and the importance of experimentation in software development and delivery.
Sebastien Goasguen (Bitnami)
Sebastien is here to talk to you about the Kubernetes API, distributed application design, and the new serverless paradigm.
Tyler McMullen (Fastly)
Tyler is here to discuss edge computing and more.
Jason Yee (Datadog)
Using real-world metrics data from thousands of organizations, Jason Yee explores the latest trends in container adoption and use, shares data on what types of applications organizations are running in containers, and explains how to best monitor these containerized applications.
Vasia Kalavri (ETH Zurich)
Vasia Kalavri offers an overview of Strymon, a system for predictive data center analytics, and its online critical path analysis module. Strymon analyzes live traces from distributed dataflow systems like Apache Spark, Apache Flink, and TensorFlow to predict bottlenecks and provide insights on streaming application performance.
Mike Strickland (Intel Corporation)
A new approach to data analytics acceleration is delivering benchmarked performance increases of 3X to 10X+ at the system level for traditional relational and NoSQL databases.
Matthew Skelton (Skelton Thatcher Consulting)
Matthew Skelton shares five practical, tried-and-tested techniques for improving operability with many kinds of software systems, including the cloud, serverless, on-premises, and the IoT.
Baron Schwartz (VividCortex)
Distributed systems used to be the exception, but today they're the norm, so it's more useful than ever to be able to quantify scalability. Baron Schwartz explains how to use the Universal Scalability Law to characterize how your systems truly behave, why they don't scale like they should, and how to improve them. It's a simple, elegant solution, and, although formal, it requires no math.
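For readers who do want to see the model, the sketch below writes out the Universal Scalability Law in its commonly published form, X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where λ is single-worker throughput, σ the contention coefficient, and κ the crosstalk (coherency) coefficient. The function names and the sample coefficients in the usage note are illustrative assumptions, not values from the session.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Predicted throughput at concurrency n under the Universal Scalability Law.

    lam:   throughput of a single worker (n = 1)
    sigma: contention penalty (serialized fraction of work)
    kappa: crosstalk penalty (cost of keeping workers coherent)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_concurrency(sigma, kappa):
    """Concurrency at which predicted throughput is highest (requires kappa > 0)."""
    # Setting the derivative of usl_throughput with respect to n to zero
    # gives n = sqrt((1 - sigma) / kappa).
    return math.sqrt((1 - sigma) / kappa)
```

With σ = 0 and κ = 0 the model reduces to linear scaling; any positive κ makes throughput eventually fall as concurrency rises, which is why adding nodes past the peak can make a system slower. For example, with the made-up values σ = 0 and κ = 0.01, `peak_concurrency` returns 10.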
Uwe Friedrichsen (codecentric AG)
Uwe Friedrichsen explores the challenges, options, and trade-offs of different consistency models in distributed system landscapes, covering the limitations of ACID transactions, eventual consistency, and current research that tries to fill the gaps between ACID and BASE transactions.
Alexander Akbashev (HERE Technologies)
Alexander Akbashev explains how his company scaled a single-instance Jenkins master from 20K builds per day to 140K using Amazon AWS services (EC2, S3, Memcache, etc.). Everything done to achieve this result was open sourced and upstreamed.
Christopher Meiklejohn (Instituto Superior Técnico)
Christopher Meiklejohn is building an application that helps users select a bottle of wine based on the wines that they enjoy, using a new programming language called Martinelli. Christopher offers an overview of Martinelli, highlighting the key features of this new language that allow the fault-tolerant, highly scalable operation of his application.
Kamil Smuga (Salesforce), Mihai Bojin (Salesforce)
Have you ever had to monitor the health of your service (server stats, application errors, etc.)? What if you had to monitor the cloud, with its hundreds of thousands of servers? Alerts can create noise and spam your team. Mihai Bojin and Kamil Smuga explain how Salesforce approaches monitoring at scale by putting customers first.
Jasvir Nagra (Instart Logic), Marianna Bezler (Instart Logic)
A developer hunting for a bug is like a doctor hunting for an illness. She does not need complete understanding of the body for the hunt to be successful. Jasvir Nagra and Marianna Bezler share a few painful distributed web app debugging anecdotes and an alternate approach using virtualization and visualization to get a holistic view of a program to track down elusive bugs.
Guy Podjarny (Snyk)
Serverless means handing off server management to the cloud platforms—along with their security risks. With the “pros” ensuring our servers are patched, what’s left for application owners to protect? As it turns out, quite a lot. Guy Podjarny explores the aspects of security serverless doesn’t solve, the problems it could make worse, and the tools and practices you can use to keep yourself safe.
Dalia Simons (Wix)
Do you have an old monolith you really want to rewrite, but don’t know where to start? Dalia Simons shares ideas, tips, and strategies for rewriting an important monolith service into microservices while maintaining full availability.
Join us in the Sponsor Pavilion after the afternoon sessions on Thursday, October 19, from 17:15 to 18:15 for the Velocity Sponsor Pavilion Reception. Visit the exhibitors, mingle with other attendees, and enjoy great refreshments and drinks.
Salim Virji (Google)
Salim Virji explores the key concepts behind microservices before guiding you through applying the concepts to evaluate and build systems of your own.
Does it matter if this message doesn't get delivered or gets delivered more than once? What about if the system keeps trying to deliver a message that will always fail or if a failure occurred earlier but now those messages can be safely handled? Lorna Mitchell details how to approach different failure scenarios, drawing on examples involving RabbitMQ.
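The failure scenarios listed above can be sketched with a small in-memory consumer loop. This is a hypothetical illustration rather than RabbitMQ client code: the message layout (a dict with `id` and `body`), the function name, and the retry limit are all assumptions made for the example.

```python
def process_all(messages, handler, max_attempts=3):
    """Consume messages that may be redelivered or may always fail.

    Returns the set of successfully handled message IDs and the list of
    messages set aside after exhausting their retries.
    """
    seen_ids = set()    # IDs already handled, used to ignore duplicates
    dead_letters = []   # messages that failed on every attempt
    for message in messages:
        if message["id"] in seen_ids:
            # Delivered more than once: skip so the effect happens only once.
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                seen_ids.add(message["id"])
                break
            except Exception:
                if attempt == max_attempts:
                    # Stop retrying a message that keeps failing and park it
                    # so it can be inspected or replayed later.
                    dead_letters.append(message)
    return seen_ids, dead_letters
```

Parking failed messages instead of discarding them covers the last scenario in the abstract: once the underlying fault is fixed, the parked messages can be fed back through the same function.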
Kavya Joshi (Samsara)
Kavya Joshi shares strategies to prepare systems for flux and scale. Drawing from a range of use cases, including Facebook’s Kraken, which provides shadow traffic, and Samsara's custom load simulator, Kavya demonstrates how to improve your understanding of your systems as they run today and plan for how they'll run tomorrow.
Angie Jones (Twitter)
Angie Jones explains how to build stability and credibility into your continuous integration tests so that your team is able to receive the fast feedback it needs for Agile development.
Max Neunhöffer (ArangoDB)
What we see in the modern data store world is a race between different approaches to achieve distributed and resilient storage. The IoT, genomics, and applications for other fields also raise the demand for a stateful layer. Max Neunhöffer walks you through the components and the inner workings of modern open source databases like ArangoDB, Cassandra, Cockroach, and RethinkDB.
Kolton Andrus (Gremlin Inc.)
Chaos engineering is intentionally injecting failure into a system to proactively identify and fix problems before they cause outages. It’s an emerging discipline, but its roots are decades old. Kolton Andrus explores the evolution of chaos engineering, how to begin your journey toward resilient systems, and how to make those pagers quit buzzing at 3:00am.
Sean O'Connor (Bitly)
Data center migrations are rare but interesting events. Sean O'Connor shares a play-by-play of Bitly’s 2016 move, touching on the choices made, trade-offs, mistakes, and successes from the company's decision to turn off the lights in the old data center.
Nicole Forsgren (DORA), Nigel Kersten (Puppet)
The State of DevOps Report has shown that high-performing IT teams decisively outperform low-performing peers (with greater throughput and stability), creating value that shows up on the bottom line. Nicole Forsgren and Nigel Kersten share insights into the key leadership, technical, architectural, and product capabilities that drive these outcomes.
Gareth Rushgrove (Puppet Labs)
The popularity of Git and GitHub has led to an explosion in the number of software repositories. But is creating a new repository always the right approach? Gareth Rushgrove offers an overview of the monorepo—putting all your product's or organization's code in a single repository—covering the advantages of monorepos and the tools to help maintain them.
Sam Boyer (VividCortex)
Resilience engineering is a holy grail of modern software engineering, granting enormous benefits but difficult to achieve and dangerous to even attempt for the unprepared. Sam Boyer explores major concepts behind resilience engineering and discusses how to move toward resilience without shooting yourself in the foot.
Chris Jackson (Pearson)
Chris Jackson explains how 175-year-old company Pearson built a tech startup within the enterprise with the aim of innovating the developer experience. Chris shares the journey from inception to B-round funding and explains how this startup is establishing the foundation of the company's future.
Last year, Mindaugas Mozūras's company was in dire straits. Its strategy was not working. All the key metrics were drifting downward. People left. The company even did a reorg. During this time, he had many last conversations—sometimes trying to stop people from leaving, other times to let them go. Mindaugas relates three such conversations, sharing lessons on honesty and delivering bad news.
James Turnbull (Empatico), Ines Sombra (Fastly), Nikki McDonald (O’Reilly Media)
Velocity program chairs James Turnbull, Ines Sombra, and Nikki McDonald open the first day of keynotes.
Meet us before the opening keynotes on Thursday morning and get to know fellow attendees in quick, 60-second discussions.
Kishore Jalleda (Yahoo)
Keeping your signal-to-noise ratio high is a nontrivial problem. Modern tools make it easy to overmonitor (which leads to noise). The result? Missed alarms and unhappy customers. Filtering the noise is not the answer. Kishore Jalleda explains how Yahoo reduced the alert volume from ~200K a month to a few hundred by creating the right incentives and culture.
Welcome to the world of nanoservices: smaller than a microservice, bigger than a function, they are the perfect unit of software. Nanoservices are flexible, manageable, and scalable and a great way to do serverless computing. Matthew Clark explains how to get nanoservices right, drawing on his experience at the BBC, which now has over a thousand in production.
Alvaro Videla (self)
Distributed systems are complex. There's abundant research, but sometimes it's hard for a beginner to know where to start. Alvaro Videla discusses the foundational concepts of distributed systems and offers an overview of the best resources for getting started.
Miriah Meyer (University of Utah)
Feeling overwhelmed by huge amounts of data has become the norm. Creating effective visual representations of data offloads some of the work of quickly finding interesting patterns to our powerful perceptual system. Miriah Meyer explores the role that interactive visualizations can play in helping us find meaning in mounds of data and discusses the limitations of this approach.
Ed Hiley (NHS Digital), Dan Rathbone (Infinity Works)
What are your perceptions of NHS IT? Not great? Well, the truth is very different from what you might expect. Ed Hiley and Dan Rathbone offer an overview of the technical renaissance going on in parts of the NHS, where things are being done in a modern way.
Liz Rice (Aqua Security)
In a containerized deployment, how do you safely pass secrets like passwords and certificates between containers without compromising their safety? If orchestration means a container can run on any machine in the cluster, how do you minimize who knows your secrets? Liz Rice explores the risks and shares best practices for keeping your secrets safe.