Build & maintain complex distributed systems
17–18 October 2017: Training
18–20 October 2017: Tutorials & Conference
London, UK
 
Buckingham Room - Palace Suite
13:15 Keeping time in real systems Kavya Joshi (Samsara)
14:10 Building distributed systems is accessible. I promise. Jamie Winsor (Chef Software)
15:40 Distributed systems: What can go wrong will go wrong Anne Currie (Container Solutions)
Blenheim Room - Palace Suite
11:20 Continuous performance engineering: Moving fast without breaking things Thomas Barns (Capacitas), John Pillar (Arcadia Group)
13:15 Scale CI from 20K to 140K builds per day Alexander Akbashev (HERE Technologies)
15:40 Failover early: When to failover at your CDN Manuel Alvarez (Akamai Technologies)
16:35 The great migration Sean O'Connor (Bitly)
Park Suite
11:20 Confusion in the land of the serverless Sam Newman (Independent)
13:15 Going serverless with GraphQL Steven Faulkner (Bustle)
King's Suite - Sandringham
13:15 Scaling up your monitoring Kamil Smuga (Salesforce), Mihai Bojin (Salesforce)
15:40 Monitoring containers: Follow the data Jason Yee (Datadog)
16:35 Seeing what’s wrong just right Jasvir Nagra (Instart Logic), Marianna Bezler (Instart Logic)
King's Suite - Balmoral
13:15 You had one job! Learning to cope with failures in a complex distributed system Ed Hiley (NHS Digital), Dan Rathbone (Infinity Works)
14:10 The path to resilience Sam Boyer (VividCortex)
15:40 Indispensable, disposable Jenkins Mandy Hubbard (Care.com HomePay)
King's Suite
9:00 Friday opening welcome Nikki McDonald (O’Reilly Media), Ines Sombra (Fastly), James Turnbull (Glitch)
9:05 Blockchain and the future of distributed computing Catherine Mulligan (Imperial College)
9:25 The evolution of chaos Kolton Andrus (Gremlin Inc.)
10:05 Edge compute: The missing pieces Tyler McMullen (Fastly)
10:25 Informed intentions meet Tech for Good Laura Hackney (AnnieCannons)
10:45 Morning Break | Room: Sponsor Pavilion (Monarch Suite)
12:00 Lunch and Friday Topic Tables | Room: Sponsor Pavilion (Monarch Suite)
14:50 Afternoon Break | Room: Sponsor Pavilion (Monarch Suite)
8:00 Morning Coffee | Room: Sponsor Pavilion Foyer
8:15 Friday Speed Networking | Room: King's Suite Foyer
11:20-12:00 (40m) Distributed Systems
What we talk about when we talk about distributed systems
Alvaro Videla (self)
Distributed systems are complex. There's abundant research, but it can be hard for a beginner to know where to start. Alvaro Videla discusses the foundational concepts of distributed systems and offers an overview of the best resources for getting started.
13:15-13:55 (40m) Distributed Systems
Keeping time in real systems
Kavya Joshi (Samsara)
Kavya Joshi explores the fascinating timekeeping mechanisms used in real systems, covering the different ways time is represented in the practical systems that use them and investigating how the choice of timekeeping mechanism affects the properties of the system as a whole.
14:10-14:50 (40m) Distributed Data and Databases, Distributed Systems
Building distributed systems is accessible. I promise.
Jamie Winsor (Chef Software)
Understanding and building distributed systems can be a daunting task, but like most other software development patterns, distributed systems mimic concepts in the real world that you're already familiar with. Jamie Winsor walks you through building a mental model to help you understand the basics of building distributed systems based on concrete, real-world systems.
15:40-16:20 (40m) Distributed Systems
Distributed systems: What can go wrong will go wrong
Anne Currie (Container Solutions)
Forget Conway's law. In distributed systems, Murphy’s law rules: Everything that can go wrong will go wrong. Anne Currie discusses common failure modes, how to approach diagnosing highly complex issues, and what we can learn from detectives like Sherlock Holmes, Hercule Poirot, and Miss Marple.
16:35-17:15 (40m) Distributed Systems, Systems Engineering
A tour of sketching data structures for stream processing
Kiran Bhattaram (Pilot)
As the volume of data our systems produce continues to increase, the techniques our systems use to process it must evolve. Kiran Bhattaram explains why sketches, compact probabilistic data structures that summarize a stream in bounded memory at the cost of some accuracy, are a good option for processing data at this scale.
11:20-12:00 (40m) Hardware, Storage, Datacenters, and Capacity Planning
Continuous performance engineering: Moving fast without breaking things
Thomas Barns (Capacitas), John Pillar (Arcadia Group)
With ever-increasing demands for fast business change, how can we ensure our digital channels reflect the exacting standards of performance our customers (and business owners) expect? What does this look like in an age of DevOps and continuous delivery? Thomas Barns and John Pillar share a strategy for shifting left and automating performance analysis.
13:15-13:55 (40m) Hardware, Storage, Datacenters, and Capacity Planning
Scale CI from 20K to 140K builds per day
Alexander Akbashev (HERE Technologies)
Alexander Akbashev explains how his company scaled a single-instance Jenkins master from 20K builds per day to 140K using Amazon Web Services (EC2, S3, Memcache, etc.). Everything done to achieve this result was open sourced and upstreamed.
14:10-14:50 (40m) Hardware, Storage, Datacenters, and Capacity Planning
Online performance analysis of distributed dataflow systems
Vasiliki Kalavri (ETH Zurich)
Vasia Kalavri offers an overview of Strymon, a system for predictive data center analytics, and its online critical path analysis module. Strymon analyzes live traces from distributed dataflow systems like Apache Spark, Apache Flink, and TensorFlow to predict bottlenecks and provide insights on streaming application performance.
15:40-16:20 (40m) Hardware, Storage, Datacenters, and Capacity Planning, Networking, Traffic, and Edge Management
Failover early: When to failover at your CDN
Manuel Alvarez (Akamai Technologies)
By failing to prepare, you are preparing to fail. Your risk mitigation strategy must layer the most cost-efficient measures to reduce the adverse effects of failure. Manuel Alvarez explores using the CDN as a failover tool, reviewing use cases and demonstrating how to decide whether to fail over at the CDN by weighing costs, benefits, operations, and time to mitigate.
16:35-17:15 (40m) Hardware, Storage, Datacenters, and Capacity Planning, Systems Engineering
The great migration
Sean O'Connor (Bitly)
Data center migrations are rare but interesting events. Sean O'Connor shares a play-by-play of Bitly’s 2016 move, touching on the choices made, trade-offs, mistakes, and successes from the company's decision to turn off the lights in the old data center.
11:20-12:00 (40m) Serverless
Confusion in the land of the serverless
Sam Newman (Independent)
Like any hyped technology, serverless computing promises a lot. However, questions remain around its concept and implementation, especially when you compare how we've built systems in the past with what serverless offers us now. Sam Newman asks (and answers), "Is serverless the future or just the emperor's new clothes?"
13:15-13:55 (40m) Serverless
Going serverless with GraphQL
Steven Faulkner (Bustle)
Bustle has transitioned its entire production platform to AWS Lambda and Amazon API Gateway, but it didn't happen overnight. The change was iterative, and GraphQL played a huge part in it. Steven Faulkner discusses the different approaches Bustle used to move services and data off legacy infrastructure and explains why and how the company used GraphQL along the way.
14:10-14:50 (40m) Serverless
Serverless security: What's left to protect?
Guy Podjarny (Snyk)
Serverless means handing off server management to the cloud platforms—along with their security risks. With the “pros” ensuring our servers are patched, what’s left for application owners to protect? As it turns out, quite a lot. Guy Podjarny explores the aspects of security serverless doesn’t solve, the problems it could make worse, and the tools and practices you can use to keep yourself safe.
15:40-16:20 (40m) Serverless
Building and running serverless data pipelines on AWS
Mike Roberts (Symphonia)
Mike Roberts describes a real-life example where an existing data platform was rearchitected and reengineered to provide several improvements: significantly increased data capacity, reduced cost, and vastly improved development cycle time.
16:35-17:15 (40m) Serverless
Lessons learned building serverless distributed systems
Raj Rohit (Episource)
Episource recently finished building a resilient, serverless distributed data pipeline that uses NLP to code medical charts and scales seamlessly with the volume of input data. Raj Rohit explores the system and the tools used to build it, such as Ansible, Lambda, and Terraform, and shares the pitfalls, failures, successes, and lessons learned along the way.
11:20-12:00 (40m) Monitoring, Tracing and Metrics, Systems Engineering
Increasing visibility of distributed systems in production
Pierre Vincent (Poppulo)
Understanding the state of a running application is the key to efficiently troubleshooting production issues and ultimately anticipating outages. Pierre Vincent demonstrates how to make monitoring an integral part of development, using health checks, metrics, tracing, and other patterns to get a clearer picture of applications in production.
13:15-13:55 (40m) Monitoring, Tracing and Metrics, Systems Engineering
Scaling up your monitoring
Kamil Smuga (Salesforce), Mihai Bojin (Salesforce)
Have you ever had to monitor the health of your service (server stats, application errors, etc.)? What if you had to monitor the cloud, with its hundreds of thousands of servers? Alerts can create noise and spam your team. Mihai Bojin and Kamil Smuga explain how Salesforce approaches monitoring at scale by putting customers first.
14:10-14:50 (40m) Monitoring, Tracing and Metrics, Technical Leadership
Want to solve overmonitoring and alert fatigue? Create the right incentives
Kishore Jalleda (Yahoo)
Keeping your signal-to-noise ratio high is a nontrivial problem. Modern tools make it easy to overmonitor (which leads to noise). The result? Missed alarms and unhappy customers. Filtering the noise is not the answer. Kishore Jalleda explains how Yahoo reduced the alert volume from ~200K a month to a few hundred by creating the right incentives and culture.
15:40-16:20 (40m) Monitoring, Tracing and Metrics
Monitoring containers: Follow the data
Jason Yee (Datadog)
Using real-world metrics data from thousands of organizations, Jason Yee explores the latest trends in container adoption and use, shares data on what types of applications organizations are running in containers, and explains how to best monitor these containerized applications.
16:35-17:15 (40m) Monitoring, Tracing and Metrics, Systems Engineering
Seeing what’s wrong just right
Jasvir Nagra (Instart Logic), Marianna Bezler (Instart Logic)
A developer hunting for a bug is like a doctor hunting for an illness: she does not need a complete understanding of the body for the hunt to be successful. Jasvir Nagra and Marianna Bezler share a few painful anecdotes from debugging distributed web apps and describe an alternative approach that uses virtualization and visualization to get a holistic view of a program and track down elusive bugs.
11:20-12:00 (40m) Orchestration, Scheduling, and Containers, Resilience engineering
Slaying the dragon: How to rewrite a monolith into microservices and stay alive
Dalia Simons (Wix)
Do you have an old monolith you really want to rewrite, but don’t know where to start? Dalia Simons shares ideas, tips, and strategies for rewriting an important monolith service into microservices while maintaining full availability.
13:15-13:55 (40m) Resilience engineering, Systems Engineering
You had one job! Learning to cope with failures in a complex distributed system
Ed Hiley (NHS Digital), Dan Rathbone (Infinity Works)
What are your perceptions of NHS IT? Not great? Well, the truth is very different from what you might expect. Ed Hiley and Dan Rathbone offer an overview of the technical renaissance going on in parts of the NHS, where things are being done in a modern way.
14:10-14:50 (40m) Resilience engineering
The path to resilience
Sam Boyer (VividCortex)
Resilience engineering is a holy grail of modern software engineering, granting enormous benefits but difficult to achieve and dangerous to even attempt for the unprepared. Sam Boyer explores major concepts behind resilience engineering and discusses how to move toward resilience without shooting yourself in the foot.
15:40-16:20 (40m) Orchestration, Scheduling, and Containers, Resilience engineering
Indispensable, disposable Jenkins
Mandy Hubbard (Care.com HomePay)
You rely on Jenkins to manage the full stack of your continuous delivery pipeline, but why shouldn’t Jenkins itself be software defined, ephemeral, and available at the push of a button? Mandy Hubbard explains how Care.com uses a customized, script-based startup process and Joyent’s ContainerPilot with just a few edits to a Docker Compose _env file to launch Jenkins in a Docker container.
16:35-17:15 (40m) Resilience engineering
The build that cried broken: Building trust in your continuous integration tests
Angie Jones (Applitools)
Angie Jones explains how to build stability and credibility into your continuous integration tests so that your team is able to receive the fast feedback it needs for Agile development.
9:00-9:05 (5m)
Friday opening welcome
Nikki McDonald (O’Reilly Media), Ines Sombra (Fastly), James Turnbull (Glitch)
Velocity program chairs Nikki McDonald, Ines Sombra, and James Turnbull open the second day of keynotes.
9:05-9:25 (20m)
Blockchain and the future of distributed computing
Catherine Mulligan (Imperial College)
Although the blockchain is technically a distributed system, there has been a surprising lack of interest from the distributed systems community. Catherine Mulligan explores the implications of the blockchain for distributed systems and explains what needs to be addressed in order to build and maintain blockchain systems effectively.
9:25-9:45 (20m) Resilience engineering
The evolution of chaos
Kolton Andrus (Gremlin Inc.)
Chaos engineering is intentionally injecting failure into a system to proactively identify and fix problems before they cause outages. It’s an emerging discipline, but its roots are decades old. Kolton Andrus explores the evolution of chaos engineering, how to begin your journey toward resilient systems, and how to make those pagers quit buzzing at 3:00am.
9:45-9:50 (5m) Sponsored
Overcoming traditional data analytics performance bottlenecks with inline acceleration (sponsored by Intel)
Mike Strickland (Intel Corporation)
A new approach to data analytics acceleration is delivering benchmarked performance increases of 3X to 10X+ at the system level for traditional relational and NoSQL databases.
9:50-10:05 (15m)
T-minus 3, 2, 1: Future-proofing production systems
Kavya Joshi (Samsara)
Kavya Joshi shares strategies to prepare systems for flux and scale. Drawing from a range of use cases, including Facebook’s Kraken, which provides shadow traffic, and Samsara's custom load simulator, Kavya demonstrates how to improve your understanding of your systems as they run today and plan for how they'll run tomorrow.
10:05-10:25 (20m)
Edge compute: The missing pieces
Tyler McMullen (Fastly)
Edge computing is a hot topic, but despite all the hype, there are still some major hurdles to overcome before it reaches its full potential. Tyler McMullen outlines the technical and economic challenges and explains how we can get past them.
10:25-10:45 (20m)
Informed intentions meet Tech for Good
Laura Hackney (AnnieCannons)
What happens when Tech for Good and human-centered design actually support the needs of their end users? Laura Hackney explores the pitfalls and successes of the movement to bring social justice work into the technology landscape. Laura also shares insights from AnnieCannons, her nonprofit dedicated to transforming survivors of human trafficking into software professionals.
10:45-11:20 (35m)
Break: Morning Break
12:00-13:15 (1h 15m)
Lunch and Friday Topic Tables
Join other attendees during lunch at Velocity to share ideas, talk about the issues of the day, and maybe solve a few. Not sure which topic to pick? Don’t worry—it's not a long-term commitment. Try two or three and settle on a different topic tomorrow.
14:50-15:40 (50m)
Break: Afternoon Break
8:00-8:15 (15m)
Break: Morning Coffee
8:15-8:45 (30m)
Friday Speed Networking
Meet us before the opening keynotes on Friday morning and get to know fellow attendees in quick, 60-second discussions.