Build Resilient Distributed Systems
June 19–20, 2017: Training
June 20–22, 2017: Tutorials & Conference
San Jose, CA

Proceedings

Keynotes, Sponsored
Nicholas Weaver (Intel)
Workload colocation is a key component of running containers and schedulers, but we have to choose between performance and increased density when we need both. Intel has been hard at work optimizing tooling capabilities for colocation to meet both needs. Learn how you can calculate the trade-offs between low latency and higher density to start making smarter resource allocations today.
Technical Leadership
Julia Grace (Slack)
Julia Grace has built teams at IBM Research, startups, and Slack and has done due diligence for venture capitalists to determine how well a startup’s engineering team is working together. Drawing on this knowledge, Julia attempts to answer the question, Why do some teams ship features rapidly, support each other, and effectively communicate while others struggle?
Systems Engineering
Bart De Vylder (CoScale)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Data science is a hot topic. Bart De Vylder offers a practical introduction that goes beyond the hype, exploring data analysis, visualization, and machine-learning techniques using Python for modeling the behavior of distributed systems. You'll leave with a solid starting point to implement data science techniques in your infrastructure or domain of interest.
Technical Leadership
Lisa van Gelder (Bauer Xcel Media)
Lisa van Gelder shares what she learned from an accidental A/B test. (This year, she interviewed for a new executive job at the same time as two (white, male) friends, and they compared notes.) Lisa explains how "unqualified" is used to reject marginalized groups in tech and what we can do about it—both as individuals interviewing and as hiring managers looking to improve the interview process.
Systems Engineering
Bryan Liles (Capital One)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
In the past, applications were monolithic, and tracing flows for performance and bottlenecks was straightforward, as there was likely a single code base. In today's world, with multiple processes constituting a single application, tracing becomes more challenging. Bryan Liles offers a hands-on demonstration for implementing tracing in modern applications.
DevOps & Tools
When embarking on a journey of transformation, you want to measure your current status and subsequent progress while keeping tabs on factors that drive improvement in technology performance. Nicole Forsgren explains the importance of knowing how (and what) to measure—ensuring you catch successes and failures when they first show up, not just when they’re epic.
Distributed Data & Databases
Colin Charles (Percona)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
The MySQL world is full of trade-offs; choosing a high-availability (HA) solution is no exception, but only with high availability can you achieve distributed systems in your database layer. Colin Charles explores the MySQL high availability landscape, offering deep dives into current technologies, recommendations, and what to look out for.
Distributed Systems
Tyler McMullen (Fastly)
The practical realities of distributed systems are rarely straightforward. Tyler McMullen walks you through a system built to perform very high volumes of health checks, done across a cluster of machines for reliability and scalability. Tyler discusses each of the major components in turn to show how they are practically built and the pain and compromises that they bring.
Distributed Systems
Jamie Winsor (Chef Software)
Understanding and building distributed systems can be a daunting task, but like most other software development patterns, distributed systems mimic concepts in the real world that you're already familiar with. Jamie Winsor walks you through building a mental model to help you understand the basics of building distributed systems based on concrete, real-world systems.
John Sasser (Stratuscale)
2-Day Training Please note: to attend, your registration must include Training courses.
John Sasser shares best practices for designing and deploying resilient, fault-tolerant systems on AWS and offers deep dives into managed versus unmanaged services, monitoring and observability, high-availability design patterns, fault-tolerant and self-healing systems, disaster recovery and business continuity approaches, and DDoS mitigation.
Resilience Engineering
Ann Kilzer (Indeed)
Remember the old practice of the canary in the coal mine, where miners used fragile feathered friends as a failure detector for toxic gasses? In software, a canary run is a trial executed on one machine before the rest of the cluster runs. Ann Kilzer explains how Indeed created a canary service leveraging Consul’s key value store to improve the resilience of data reloads in any infrastructure.
Distributed Data & Databases
Chris Fulton (Electric Cloud)
Chris Fulton shares strategies for database deployments and rollbacks as well as some patterns and best practices for reliably deploying databases as part of your CD pipeline, safely rolling back database code, ensuring data integrity, and more.
Systems Engineering
Tammy Butow (Dropbox)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow leads a hands-on tutorial on chaos engineering, covering the tools and practices you need to implement chaos engineering in your organization.
Orchestration, Scheduling & Containers
Karl Isenberg (Mesosphere)
The orchestration space is fast moving and full of competing products, platforms, and frameworks. How do you choose the right one for your requirements? Karl Isenberg explores the features of several container orchestrators, breaking down the feature sets and characteristics into categories and scoring multiple solutions against each other, and discusses what's new this year.
Sponsored
Jeffrey Scheaffer (CA Technologies)
Delivering software continuously is a common ambition, but many face challenges pursuing this goal. Jeffrey Scheaffer shares new technologies, solutions, and best practices that reduce these challenges and make it easier for organizations to attain continuous delivery and leads a live demonstration showing end-to-end orchestration throughout the continuous delivery toolchain.
Distributed Data & Databases
Laine Campbell (OpsArtisan), Charity Majors (Honeycomb)
SRE is becoming quite the ubiquitous term, but what about DBRE? Laine Campbell and Charity Majors dive into DBRE, exploring the paths to this craft and how to culturally evolve and support it. Laine and Charity focus on organizational scale, self-service, and force multipliers in recoverability, observability, availability, security, release management, and infrastructure.
Orchestration, Scheduling & Containers
Brendan Burns (Microsoft Azure)
Building reliable distributed systems is challenging and often bespoke, so it's hard for developers to share implementations and best practices. Brendan Burns explores common patterns for composing reliable distributed systems and shows how these patterns can be expressed, via containers, so that they can be reused throughout many different applications.
Capacity Planning
Daniel Spoonhower (LightStep)
As software grows more complex, doing chargebacks and capacity planning gets more challenging. Specifically, it becomes more difficult to attribute storage and other low-level requests to high-level services. Daniel Spoonhower shows how the distributed tracing concept of context propagation can be used to overcome this problem, without any maintenance costs.
Orchestration, Scheduling & Containers
Bret Fisher (Independent Consultant), Laura Frank (Codeship), Tony Pujals (Appcelerator)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.
Networking, Traffic & Edge Management
Devin Elliot (Unoceros)
It takes more than a one-tenth scale server-based test environment to seamlessly load balance and deliver content to millions of mobile users. Devin Elliot explains how UX for customers of major media and live streaming events was improved by leveraging idle distributed networks of smartphones and smart devices to repeatedly map, measure, and load test at scale.
DevOps & Tools
Laura Frank (Codeship)
Do you understand how quorum, consensus, leader election, and different scheduling algorithms can impact your running application? Could you explain these concepts to the rest of your team? Laura Frank explores the algorithms that power all modern container orchestration platforms and shares actionable steps to keep your highly available services highly available.
Join us in the Exhibit Hall for the Exhibit Hall Reception on Wednesday, June 21, following the afternoon sessions.
Technical Leadership
Roy Rapoport (Netflix, Inc.)
When you're a scrappy startup, being nimble, agile, and flexible comes with the territory. But how do you maintain agility when you're a much, much, larger company? Hope is not lost. Roy Rapoport shares critical leadership practices—focusing on encouraging failure, growing heretics, and empowering dissent—that will help you maintain a technical and organizational edge.
Hardware, Storage & Capacity Planning
David Radcliffe (Shopify)
The flexibility and speed offered by cloud computing solutions have raised the bar for bare metal deployments. Automation is essential to speedy, reliable provisioning and capacity management. David Radcliffe explores the tools Shopify uses, such as Genesis, to automate its data center and empower developers to move quickly and keep up with the times.
Hardware, Storage & Capacity Planning
Patrick Reynolds (GitHub)
GitHub uses Spokes, a custom, application-level replication system, to provide redundancy and scalable capacity for the Git service. Originally, Spokes was limited to a single physical site. Patrick Reynolds offers an overview of Spokes and explains how GitHub extended it to span multiple sites, transparently providing read-anywhere, write-anywhere replication for all Git content.
Distributed Data & Databases
Miles Ward (Google)
Google Cloud Spanner, Google's public launch of the internal Spanner service, makes available a new basic primitive for application design: globally consistent transactions. Want to know how it all works? Join Miles Ward for a detailed, demo-filled, nuanced look at the useful applications of Spanner for your workload.
Systems Engineering
Allison Miller (Google)
Automation is critical for effective operations, and this is equally true for security ops. In large-scale systems, manual intervention has to be the exception, not the expectation. But how can security be automated, given the complexity involved? Many platforms turn to ML/AI, deployed in risk models. In this talk, we examine data-driven decision tech and clarify how ML and automation launch better defenses.
Hardware, Storage & Capacity Planning
Jack Chan (Shutterfly)
Jack Chan describes how Shutterfly migrated metadata from over 10B photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises ecommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a hybrid cloud solution with images hosted on-premises and client-facing photo metadata on AWS.
Distributed Systems
Henry Robinson (Cloudera)
It seems like everyone is building a distributed system. However, there's no common body of knowledge about how these systems should be built and scaled, beyond what is squirreled away in various academic papers. Henry Robinson shares lessons learned from over eight years spent building distributed systems and outlines a framework for thinking about distributed scaling challenges.
Security
Maarten Van Horenbeeck (Fastly), Tom Daly (Fastly), Lisa Phillips (Fastly)
Fastly operates the edge for many large web properties. To deal with emerging threats to its network, Fastly created a process that allows it to respond effectively to incidents: Incident Command, which rapidly coordinates teams during an incident. Maarten Van Horenbeeck, Lisa Phillips, and Tom Daly take you to the far side of the edge, demonstrating the protocols that work during an incident.
Sponsored
Patrick Hill (Atlassian)
Ever had an incident that didn't go as planned? Patrick Hill shares five values developed by Atlassian SREs to better handle incident management.
Keynotes
Dianne Marsh (Netflix)
Details to come.
Keynotes
Kelsey Hightower (Google)
Details to come.
Keynotes
Susan Fowler (Stripe)
Details to come.
Sponsored
Arijit Mukherji (SignalFx)
Modern infrastructure and DevOps practices are evolving rapidly. These trends pose a new set of monitoring challenges. Arijit Mukherji draws on real-world examples demonstrating what these challenges are, some approaches that worked, and metrics system capabilities that helped SignalFx deal with the challenge.
DevOps & Tools
Sangeeta Narayanan (Netflix)
Netflix operates a customizable API that allows the creation of optimized experiences on 1,000+ devices by providing developers a serverless-like platform and experience. Sangeeta Narayanan shares lessons learned operating and scaling the platform over the years and Netflix's approaches to some of the challenges it faced.
Systems Engineering
Sasha Goldshtein (Sela Group)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Sasha Goldshtein leads a hands-on workshop on Linux dynamic tracing. You'll explore the BPF Compiler Collection (BCC), a set of tools and libraries for dynamic tracing, and gain firsthand experience of memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the baked-in tools for file I/O, network, and CPU analysis.
Networking, Traffic & Edge Management
Matt Klein (Lyft)
Over the past several years, Lyft has migrated from a monolith to a sophisticated service mesh powered by Envoy. Matt Klein explains why Lyft developed Envoy, focusing primarily on the operational agility that the burgeoning service mesh SOA paradigm provides, and shares lessons learned along the way.
Technical Leadership
Kathleen Vignos (Twitter)
Constant change—caused by high attrition, frequent reorganization, shifting priorities, and management turnover, among other reasons—is the new normal. It takes months to onboard a new team member and get them adding value. Kathleen Vignos offers tips, shortcuts, and stories for stabilizing a team and finding a path to productivity amid the chaos.
DevOps & Tools
Micheal Benedict (Pinterest)
Companies such as Twitter, Pinterest, and Uber are powered by thousands of microservices. Managing the lifecycle of services (i.e., creating them, provisioning resources, deploying, metering, charging, and deprecating) at scale proves to be challenging. Micheal Benedict discusses the need for a lifecycle manager, how to implement governance, and the impact of such a system on developer productivity.
Orchestration, Scheduling & Containers
Seth Vargo (HashiCorp)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
It's great that you've moved to microservices, but how are you distributing secrets? Seth Vargo explains why Vault's unique approach to secret management, providing secrets as a service for your services (and humans too), makes it highly scalable and easily customizable to fit any environment.
Sponsored
Martin Woodward (Microsoft)
Martin Woodward tells the full story of transforming Microsoft to One Engineering System with a globally distributed 24x7x365 service on the public cloud. Martin shows you around the system that handles the load of some of the most demanding engineering teams in the world and shares some stories about how they got there.
Distributed Systems
Simon de Haan (Praekelt.org)
Developing reliable healthcare systems requires careful integration of a country’s health, tech, and legal ecosystems. In Africa, locally built resilient distributed systems are needed to meet the demand of national-scale digital health services and data sovereignty laws. Simon de Haan explores the challenges and proven solutions for building in these environments.
Orchestration, Scheduling & Containers
Armon Dadgar (HashiCorp)
Armon Dadgar offers an overview of Nomad, an application scheduler designed for both long-running services and batch jobs. Along the way, Armon explores the benefits of using schedulers for empowering developers and increasing resource utilization and how schedulers enable new next-generation application architectures.
If you had five minutes on stage, what would you say? What if you only got 20 slides and they rotated automatically after 15 seconds? Would you pitch a project? Launch a website? Teach a hack? We’ll find out at this year's O'Reilly Ignite San Jose.
Monitoring, Tracing & Metrics
Sneha Inguva (DigitalOcean)
Over the past year, DigitalOcean's Delivery team has been building a runtime platform based on Kubernetes with the goal of making shipping code easier. A core component of this system is a monitoring and alerting system based on Prometheus and Alertmanager. Sneha Inguva offers an overview of the system and shares problems encountered, potential solutions, and key lessons learned in the process.
Sponsored
Dave Karow (CA Technologies)
The goal of continuous testing is to find defects earlier and release software faster, which can be achieved by integrating a set of open source functional and performance testing tools in the early stages of the software delivery lifecycle. Dave Karow explains how to integrate open source tools like Apache JMeter and Selenium with Taurus and Jenkins as part of a continuous testing effort.
Keynotes
Peter Alvaro (UC Santa Cruz)
In this keynote, Peter will describe the theoretical roots of lineage-driven fault injection (LDFI) in the database research notion of provenance, share early results from the field, and present opportunities for near- and long-term future research.
Networking, Traffic & Edge Management
Samir Jafferali (LinkedIn)
With members in every corner of the world, LinkedIn has built services around six CDNs, numerous PoPs, and three DNS platforms. Samir Jafferali explains how LinkedIn uses big data to steer DNS intelligently, optimizes the CDNs for performance, mitigates DDoSes, and measures metrics using RUM and synthetic monitoring and shares best practices that edge teams of all sizes can benefit from.
Monitoring, Tracing & Metrics
Megan Anctil (Slack)
One size definitely doesn't fit all when it comes to open source monitoring solutions, and executing generally understood best practices in the context of unique distributed systems presents all sorts of problems. Megan Anctil shares pain points and lessons learned at Slack wrangling known technologies such as Icinga, Graphite, Grafana, and Elastic Stack to best fit the company's use cases.
Keynotes, Sponsored
Dawn Parzych (Catchpoint)
Human perception and bias can influence how metrics are interpreted. While valid metrics can open lines of communication across and within teams, using vanity metrics or data to shame others can be counterproductive. By understanding the influence assumptions and biases have and how to present credible data, you can make a real and lasting impact on your organization.
Systems Engineering
Brendan Gregg (Netflix)
Advanced performance observability and debugging have arrived in Linux 4.x with enhanced BPF (eBPF). Brendan Gregg offers an overview of Linux's new dynamic and static tracing tools for the analysis of filesystems, storage, CPUs, TCP, and more. Join in to explore a new generation of tools and visualizations.
Sponsored
Vicky Villalobos (Hewlett Packard Enterprise)
Vicky Villalobos explores some of the best practices and tooling used to load and monitor a system in order to find performance and behavior across any OS, deployment environment, or device and shares real-life success stories and best practices of teams who are navigating these challenges on a daily basis.
Monitoring, Tracing & Metrics
Suman Karumuri (Pinterest)
Distributed tracing is an emerging field of monitoring distributed systems. Suman Karumuri shares the challenges of building and deploying distributed tracing at scale using PinTrace, one of the largest distributed tracing pipelines. Drawing on real-world examples, Suman explains how traces can be used to understand, debug, and optimize your production workflows.
Resilience Engineering
Aaron Blohowiak (Netflix)
Chaos Monkey and Kong changed the culture around infrastructure failure, but the most common cause of downtime is service failure. Turning off an entire service in production is too risky. Aaron Blohowiak offers an overview of precision chaos techniques that verify service-level fault tolerance and reveal hidden resource constraints while minimizing potential fallout.
Keynotes, Sponsored
Dave Andrews (Verizon Digital Media Services)
Verizon Digital Media Services CDN Architect & Evangelist Dave Andrews will discuss strategies for addressing cascading failures at various scales: on a single system, within a given data center, and in a globally distributed environment.
Lachlan Evenson (Deis), Jason DuMars (Rally Software)
2-Day Training Please note: to attend, your registration must include Training courses.
Kubernetes has emerged as the leading platform for containerized applications. Lachlan Evenson and Jason DuMars offer a deep dive into Kubernetes, from concept to implementation, sharing detailed explanations of its architecture, security, and use cases.
Keynotes, Sponsored
Phillip Liu (SignalFx)
This talk will focus on the one thing that has become a driver of ever better engineering: constant removal of friction for engineers to not only build and ship code but also understand how code is used and how it works and operates. The end result is a culture that promotes many possible ways to address given challenges and surfaces novel approaches that might otherwise never have arisen.
Keynotes, Sponsored
Today we depend upon service providers — storage, compute, network, DNS, CDN, and many more — to build and deliver our applications. When the most sophisticated service providers on the internet fail — and they do — it’s still possible to build resilient applications.
Sponsored
Phil Stanhope (Oracle + Dyn)
For more than 30 years, the DNS has been one of the fundamental protocols of the internet, yet, despite its accepted importance, it has never quite gotten the due it deserves. Phil Stanhope explains why it's time to rethink DNS and realize the role it can play in building and running high-performance, distributed web applications.
Security
Pete Cheslock (Threat Stack)
Pete Cheslock shares the operational and security practices that helped Threat Stack scale while staying stable and secure, covering technology and tools and the various scale points that forced hard decisions.
Distributed Data & Databases
Adam Shepard (AudienceScience)
Adam Shepard peels back the covers on a user delivery network—a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Join in to learn about eventually consistent data architectures, tiered and hybrid storage layers, and what it takes to manage that much data at scale.
Orchestration, Scheduling & Containers
Sebastien Goasguen (Skippbox)
Kubernetes has emerged as one of the leading container orchestrators. Sebastien Goasguen explores its architecture and compares it with other orchestration/scheduling systems, outlining the similarities and explaining why Kubernetes API primitives make all the difference.
Orchestration, Scheduling & Containers
Dharmesh Kakadia (Microsoft)
Orchestration systems all have different design trade-offs. Despite best efforts, these choices are not always clear to developers using these systems. Dharmesh Kakadia describes the fundamentals of scheduling and explores the scheduling algorithms implemented by various orchestration systems, highlighting similarities, differences, and the consequences of the design choices for the users.
Security
James Wickett (Signal Sciences)
Serverless is the design pattern for writing applications at scale without the necessity of managing infrastructure. It adds simplicity and a new economic model to cloud computing, but it creates some unique security challenges. James Wickett explores practical security approaches for serverless in four key areas: the software supply chain, the delivery pipeline, data flow, and attack detection.
DevOps & Tools
Timothy Gross (Joyent)
Conway's law tells us that "organizations which design systems. . .are constrained to produce designs which are copies of the communication structures of these organizations." What if we turn Conway's law around? Timothy Gross explores how to make technology choices that improve the culture of your organization.
Networking, Traffic & Edge Management
Emil Stolarsky (Shopify), Justin Li (Shopify)
Once reserved for companies large enough to write a load balancer from scratch, load balancer middleware can be a powerful tool for scaling applications. Emil Stolarsky and Justin Li explain how Shopify uses scriptable load balancers to solve difficult infrastructure problems, such as sharding across data centers, handling flash sales, and responding quickly to DDoS attacks.
Technical Leadership
Juan Pablo Buriticá explains how to use technical RFCs as a decision-making tool in your engineering organization to increase effectiveness. When implemented properly, technical RFCs can encourage trust and delegation, respectful discussions, knowledge sharing, and accountability and support good software design.
Marcus Blankenship (MarcusBlankenship.com)
2-Day Training Please note: to attend, your registration must include Training courses.
Engineers who become managers are experts at the technical aspects of their job, but they are often unprepared for the human and political challenges they face. Marcus Blankenship teaches engineering leaders a framework for building strong relationships with their teams, creating a driven culture, and communicating upward and outward to benefit their teams.
Keynotes, Sponsored
Buddy Brewer (SOASTA)
Most tools designed to help you manage your systems fall into two categories: "Finders" like monitoring services and log file analyzers, or "Fixers" like cloud infrastructure providers or container orchestration. Then you're left trying to translate information from your "finders" into actions for your "fixers." Buddy will describe how to use intelligent analytics to connect data to actions.
Distributed Systems
Ben Sigelman (LightStep)
Most sudden latency regressions in a distributed system are throughput or queueing problems. Now that some monitoring technologies can observe a system with full fidelity, we can connect the dots from a high-latency outlier request to the contended resource it’s waiting on. Ben Sigelman explains why this workflow could change the way we understand critical-path latency in distributed systems.
Distributed Data & Databases
Dan Jones (VictorOps)
Dan Jones discusses VictorOps's transition to event sourcing and CQRS in distributed systems. Through the use of persistent actors, VictorOps was able to redesign, rebuild, and deploy the entire underlying infrastructure without any noticeable impact to end users.
Networking, Traffic & Edge Management
Lee Calcote (SolarWinds)
With application developers busily adopting container technologies, the time has come for network engineers to prepare for the unique challenges brought on by networking cloud-native applications. Lee Calcote walks you through available container connectivity options, explaining their function and when they should be used and comparing their performance characteristics.
Monitoring, Tracing & Metrics
Christine Yen (Honeycomb)
Preaggregated metrics and time series form the backbone of many monitoring setups. They have many redeeming qualities but simply aren't sufficient for capturing or responding to the many ways things can go wrong in modern or complex systems. Christine Yen outlines the problems inherent in the use and implementation of preaggregated metrics.
Resilience Engineering
Nora Jones (Netflix)
Chaos engineering isn't always the most popular practice among your developers. Nora Jones covers the specifics of designing a chaos engineering solution and explains how to increment your solution technically and culturally, the socialization and evangelism pieces that tend to get overlooked in the process, and how to get developers excited about purposefully injected failure.
Keynotes
Camille Fournier (Independent)
What does it mean to be a technical leader? There is compelling evidence that technical workers want leaders who are strong technologists, leaders they believe they can learn from.
Systems Engineering
Oliver Gould (Buoyant)
Modern application architecture is becoming cloud native: containerized, "microserviced," and orchestrated. But resilience is more than just Docker and Kubernetes. Oliver Gould explains why companies like PayPal, Ticketmaster, and Monzo are adopting the service mesh model, where internal, service-to-service traffic is managed and instrumented with a mesh of load-balancing proxies.
Keynotes
Details to come.
During lunch, you'll have the chance to participate in a Birds of a Feather session with like-minded people.
Meet the Experts gives you a chance to meet face-to-face in a small group setting with expert Velocity presenters. Discuss the speaker's area of expertise, give feedback about their sessions, or ask questions. Sign up now by adding it to your personal schedule. Seating is limited.
O’Reilly Author Book Signings will be held in the O’Reilly booth on Wednesday and Thursday. This is a great opportunity for you to meet O’Reilly authors and speakers.
Keynotes
Mary Treseler (O'Reilly Media), James Turnbull (Empatico), Ines Sombra (Fastly)
Program chairs Mary Treseler, James Turnbull, and Ines Sombra open the second day of keynotes.
Meet us before the opening keynotes on Thursday morning and get to know fellow attendees in quick 60-second discussions.
Capacity Planning
Michael Kehoe (LinkedIn), Anil Mallapur (LinkedIn)
LinkedIn conducts regular traffic shifts during peak hours to ensure that it has sufficient capacity to handle extra load during disaster situations. Michael Kehoe and Anil Mallapur discuss how LinkedIn uses traffic shifts to mitigate user impact by migrating live traffic between its data centers and stress test site-wide services for improved capacity handling and member experience.
Networking, Traffic & Edge Management
Dinesh Dutt (Cumulus Networks)
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Dinesh Dutt explores network troubleshooting and explains how to avoid common network problems, ranging from misconfigured cabling to misbehaving protocols, how a modern networking tool chest can help simplify network configurations, and how automation is improving troubleshooting turnaround times to minimize downtime.
Keynotes
Details to come.
Keynotes
Mary Treseler (O'Reilly Media), James Turnbull (Empatico), Ines Sombra (Fastly)
Program chairs Mary Treseler, James Turnbull, and Ines Sombra open the first day of keynotes.
Meet us before the opening keynotes on Wednesday morning and get to know fellow attendees in quick 60-second discussions.
Resilience Engineering
Gwen Shapira (Confluent), Jeff Holoman (Cloudera)
Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? Gwen Shapira and Jeff Holoman walk you through everything that happens to a message, from producer to consumer, and pinpoint all the places where data can be lost if you're not careful.
Security
Douglas Barth (Stripe), Evan Gilman (PagerDuty)
Douglas Barth and Evan Gilman offer an overview of Zero Trust, a new security model that considers all parts of the network to be equally untrusted. Doug and Evan show how to leverage a network's strengths by combining traditional SRE security approaches with novel technological arrangements while using software and hardware to secure the systems operating in those networks.