Build & Maintain Complex Distributed Systems
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA
Production Engineering, SRE, and DevOps
Jessica DeVita (Microsoft)
This is the story of how a team at Microsoft challenged themselves to retrospect their retrospectives and what we learned about applying human factors ideas to software development.
Monitoring, Observability, and Performance
Morgan McLean (Google), Jaana Burcu Dogan (Google)
Learn how to quickly instrument your distributed services and gain visibility into their operation with OpenCensus.
Sponsored
Mark Prichard (AppDynamics)
In this presentation, we will review the metrics available from infrastructure, Kubernetes, containers, and application code, and discuss options for viewing them holistically, providing a complete picture of how your applications are behaving and how users are experiencing them.
Will Gallego (Etsy)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
The core mechanics of building an environment where engineers and non-engineers alike can share their stories without fear of retribution, so that everyone better understands the complex systems around them, drawing on real-world experiences and proven methodologies.
Michael Brunton-Spall (Government Digital Service)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
Traditional security approaches to threat and risk management are highly optimised for a traditional software development lifecycle. I'll demonstrate a new way of reviewing systems that works more effectively in agile teams, along with real-life worked examples that help teams prioritise where to focus security effort and which security threats they should worry about.
Containers, Continuous Delivery
Qingyang Chen (Google), Appu Goundan (Google)
Speed up container-based development by building container images with Jib.
Continuous Delivery, Kubernetes
Jason Yee (Datadog)
Jason shows how you can more easily test code in production while isolating the effect of potential issues using container orchestration and service meshes.
Michael Hausenblas (Red Hat)
1-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
In this hands-on training, you’ll learn everything you need to be successful with networking in a containerized setup, whether you’re a developer or an admin. We’ll start with a simple case of Docker containers running on a single machine and move on to advanced networking with Kubernetes.
Containers
Abby Fuller (Amazon Web Services)
This session moves beyond "how to get started with containers on AWS" and into more advanced topics, such as hybrid clusters, bringing your own AMI, working with Docker settings not supported in the UI, and debugging load balancers.
Serverless
Mike Roberts (Symphonia)
A warts-and-all dive into some of the limitations of a Serverless approach, and a practical set of techniques for dealing with these concerns.
Systems Engineering & Architecture
Bing Wei (Slack)
In 2016 Slack was facing a problem: the load on its backend servers had increased by 1000x. In this talk you’ll hear how re-architecting the system with lazy loading, a publish/subscribe model and an edge cache service overcame the problem with zero downtime, improved latency, and gains in reliability and availability.
Building Secure Systems
Matt Freels (Fauna Inc.)
The complexity of distributed databases makes building tools for their declarative automation a daunting engineering challenge. Drawing from the experience of developing multiple configuration automation systems for databases, we will cover some patterns that generally apply to building declarative management tooling for distributed stateful systems.
Building Secure Systems
Scott Wimer (Smartsheet)
Supporting the GDPR’s Right to be Forgotten through targeted, secure data destruction.
Building Secure Systems, Continuous Delivery
Luis Eduardo Colon (Amazon Web Services)
Many fundamental security practices and controls apply to serverless applications, including proper monitoring and logging of all requests and events. This session will cover recommendations published by the Center for Internet Security (CIS), how to automate the deployment of some of these controls, and additional considerations relevant to serverless functions.
Building Secure Systems
Serena Chen (Bank of New Zealand)
What insights do we gain if we apply user experience design to information security?
Distributed Systems
Manish Mehta (Netflix), Torin Sandall (Open Policy Agent Project)
A deep dive into how Netflix enforces authorization policies (“who can do what”) at scale across its microservices ecosystem in the public cloud without introducing unreasonable latency in the request path.
Distributed Data
Alena Hall (Microsoft), Natallia Dzenisenka (Independent Contractor)
Data is generated at an ever-increasing rate. Learn to use distributed systems like Apache Kafka and Spark Streaming to process data from multiple sources in real time and perform machine learning tasks on it.
Sean Kane (New Relic)
2-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
Sean P. Kane, co-author of Docker: Up and Running and an experienced trainer for O’Reilly, will teach students everything they need to know to start using Docker successfully. This includes how to install Docker, design and build Docker images, and deploy and manage Docker containers, as well as how to think about containers and how they can help you optimize your workflow.
Ben Hartshorne (Honeycomb), Christine Yen (Honeycomb)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
We'll explore what it means for a system to be “up” by discussing end-to-end (e2e) checks (what makes a good one and what techniques are valuable when thinking about them) and — over the course of the workshop — by writing and evolving an e2e check against a common API.
Leadership
Kathleen Vignos (Twitter)
Engineering teams want technically competent managers, but they also often want managers to keep their hands off their code—so how can managers keep technical skills relevant in order to add the most value?
Serverless
Soam Vasani (Platform9 Systems), Timirah James (Platform9 Systems)
FaaS functions are great for small pieces of functionality, but not, on their own, for complex real-world applications; we explore different options for composing functions together, with a deep dive into workflows.
Monitoring, Observability, and Performance
Learn how NS1 was able to reduce infrastructure, maintenance, and operational costs, while simultaneously increasing throughput and visibility of key metrics by leveraging Elasticsearch as a time series database.
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
Francesc will walk you through the tools that make Go a great programming language, from the well-known “go” tool to lesser-known tools that allow you to profile, debug, and understand the performance of your programs. You will also learn how to tune Visual Studio Code as a Go editor, although you are welcome to use any other editor, since most provide great integration with Go.
Ian Henry (Habitat by Chef), Nathen Harvey (Chef)
1-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
Habitat is a simple, flexible way to build, deploy, and manage applications. Build applications of any flavor, from microservices to traditional applications. Deploy applications in any operational environment from bare metal to containers. Habitat provides consistent, repeatable, auditable applications that lower operational complexity and simplify development workflows.
Tammy Butow (Gremlin)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
In this session, you will learn how to establish and measure the success of your own high severity incident management program.
Monitoring, Observability, and Performance
Baron Schwartz (VividCortex)
Learn how to monitor a database by understanding the difference between workload and resource monitoring, and the golden signals for each.
Containers
Cynthia Thomas (Cilium)
Modern microservices architectures (like those run on Kubernetes) need modern security solutions to provide least privilege security.
Distributed Data, Hardware, Storage, and Datacenters
Miro Cupak (DNAstack)
This session tells the story of the largest search and discovery engine of human genetic mutations in the world.
Containers
David Cheney (Heptio)
This presentation will provide operators and developers with real-world advice on how to extend the capabilities of a Kubernetes cluster, using the development of the open source Contour Ingress controller as a case study.
Kubernetes
David Calavera (Netlify)
This is the story of how we at Netlify moved a production system to Kubernetes, the lessons we learned during the migration that made us roll it back, and how we rolled it out again, all without affecting production availability.
Seth Vargo (Google)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
In this interactive workshop, attendees will learn how to connect applications and services running under Kubernetes to HashiCorp Vault.
Jared Lander (Lander Analytics)
2-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
During this two-day course, you will learn how to use R to create a data operations workflow to make use of this data. We'll start with the basics of the language then cover data manipulation, plotting, and workflow documentation. This will allow you to automate and document your workflow, reports, and visualizations.
Nathen Harvey (Chef)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
Easily integrate automated tests that check for adherence to policy into any stage of your deployment pipeline. This workshop uses InSpec and Chef for compliance and remediation, respectively.
Keynotes
JavaScript engines are frequently targeted by malicious attackers, and dozens of vulnerabilities are reported in them every year. Most of these occur due to errors made while implementing well-specified features. This talk explores the link between feature complexity, developer error and security vulnerabilities, and explains the importance of considering implementation difficulty in design.
Keynotes
Kyle Kingsbury (Jepsen)
Details to come.
Keynotes
Bryan Liles (Heptio)
Details to come.
Keynotes
Rimma Nehme (Microsoft)
Details to come.
Keynotes
Details to come.
Brian Ketelsen (Ardan Labs), Erik St. Martin (Microsoft)
2-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
This class is intended for solutions developers, systems operations professionals, solution architects, and development operations professionals who develop, migrate, and deploy container-based applications in the public cloud and want to learn the key concepts and practices for deploying and maintaining applications using Kubernetes.
Bridget Kromhout (Microsoft)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
In this hands-on Kubernetes workshop, we'll launch clusters and learn about all the moving parts to build confidence around using Kubernetes in production.
Kubernetes
Ian Lewis (Google)
Learn the easiest and best ways to improve the security of your Kubernetes clusters.
Leadership
This talk contrasts the work environments of startups and bureaucracies and shares my lessons for maintaining an optimal engineering culture at the VA.
Mike Roberts (Symphonia)
2-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
Serverless applications have moved from “interesting oddity” into the mainstream. But how do teams take the raw ideas of Serverless and apply them to a continuous deployment context, operate Serverless applications with confidence, and scale them to handle whatever the world can throw at them? Mike Roberts guides you through the answers to these questions, and more, in this in-depth masterclass.
Monitoring, Observability, and Performance
Gwen Shapira (Confluent), Xavier Léauté (Confluent)
Experienced Kafka admins don’t just collect metrics; they go the extra mile and use additional tools to validate availability and performance of both the Kafka cluster and their entire data pipelines. We’ll discuss critical metrics and common mistakes, and then look at what metrics don’t tell you and how to cover those essential gaps.
Kubernetes
Jeff Poole (Vivint Smart Home)
Networking with Docker and Kubernetes is a lot more complex than with traditional servers and virtual machines. This talk will go over the concepts involved and explain what tuning may be required to use Kubernetes successfully.
Kubernetes
Kris Nova (Heptio)
In this talk we deep dive into the world of migrating a monolithic Java application to running in Kubernetes.
Tomas Lin (Netflix), Emily Burns (Netflix)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
This workshop will provide hands-on experience building continuous delivery pipelines for deploying and promoting code across cloud virtual machines and containers using Netflix's Spinnaker Continuous Delivery platform.
Distributed Data, Hardware, Storage, and Datacenters
Victoria Nguyen (Fastly)
How we overhauled the monitoring and data collection of our globally distributed network without our caches noticing.
Distributed Systems
Alex Petrov (DataStax)
There are many ways to reach agreement in distributed systems: multi-phase commit in transactions, topic pointers in streaming systems, proof-of-stake in cryptocurrencies, leader election in databases. This talk discusses current best practices and research, helps you develop intuition, and helps you navigate the DistSys vocabulary.
Distributed Systems
Sean T Allen (Wallaroo Labs)
In 2007, Pat Helland published "Life beyond Distributed Transactions: an Apostate’s Opinion." In it, he conducts a thought experiment: how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, the ideas are far more widely applicable. If you're interested in scaling stateful applications, this talk is for you.
Distributed Systems
Performance debugging is a crucial part of ensuring code is ready for production traffic, particularly as a company and its products grow. However, bottlenecks that hold these services back can be hard to identify. I’ll discuss my experience debugging these bottlenecks in distributed systems, both at a macro (metrics, distributed tracing) and a micro (user space and kernel space profiling) level.
Monitoring, Observability, and Performance
Jamie Wilkinson (Google)
A description of SLOs and the concept of an error budget, a study of the motivation for moving away from cause-based to symptom-based alerting, and then some theory and practical examples showing how to do it.
Hardware, Storage, and Datacenters, Systems Engineering & Architecture
Brodie Kurczynski (Las Cumbres Observatory)
How we developed a stateless interface to take real-time observations on a private global telescope network over the internet using a non-profit budget.
Continuous Delivery, Systems Engineering & Architecture
Rewriting the key software component of your platform from scratch is always intimidating, especially when you guarantee 100% uptime, your platform is in the critical application delivery path, and your environment is highly distributed. We’ll tell the story of our recent DNS server rewrite and the steps we took to roll it out across our globally distributed network with no downtime.
Distributed Data
Jon Tirsen (Square)
The story of how we scaled out the backend for Square's Cash app using Vitess, a database middleware for MySQL built at YouTube.
Keynotes
Julia Grace (Slack)
When I joined Slack 2.5 years ago, we had fewer than 100 engineers; now we have over 350. My team, Infrastructure Engineering, grew from 10 to 50 people in 18 months. I’ll share tips and stories from the leadership front lines as I learned how to rapidly scale myself and my leadership team during a period when my job changed substantially every 6 months.
Preetha Appan (HashiCorp), Alex Dadgar (HashiCorp)
1-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
Scheduler frameworks enable reliable and repeatable application deploys. In this session, attendees will use Nomad, a single-binary cluster scheduler, to build a multi-region, self-healing production environment that runs a diverse set of workloads. They will also get hands-on experience writing and submitting job specifications, interacting with the API, and using multiple deployment strategies.
Keynotes
Four years of research, backed by stories, uncover the secrets and surprises of what really makes technology-driven teams and organizations high performing.
Bill Boulden (ClearView Social)
Tutorial. Please note: to attend, you must be registered for a Gold or Silver pass.
Learn how to create serverless APIs using AWS Lambda and API Gateway with hands-on walkthroughs and applied examples.
Serverless
Lynn Langit (Lynn Langit Consulting)
Serverless is not just for compute; in fact, serverless data access (via SQL and other data query/processing languages, such as Spark) is fast becoming the norm. Independent cloud architect and developer Lynn Langit will compare and contrast the state of public cloud serverless SQL via AWS Athena, Google BigQuery, and others.
Production Engineering, SRE, and DevOps
Seth Vargo (Google)
Local service discovery and availability is easy, but how do we discover services in other data centers or other cloud providers? This talk discusses how HashiCorp Consul can provide service discovery, monitoring, and failover across many regions and multiple public and private cloud providers.
Production Engineering, SRE, and DevOps
Laine Campbell (Fastly)
A session on practical development, monitoring and iteration of service level objectives while inventorying and taking into account risks inherent in your constantly evolving service.
Hardware, Storage, and Datacenters, Systems Engineering & Architecture
Marcel Flores (Verizon Digital Media Services)
An examination of the design and implementation of Heteractis, the traffic management system we use at Verizon Digital Media Services to turn network telemetry data into automated traffic decisions.
Tamao Nakahara (Weaveworks)
1-Day Training. Please note: to attend, you must be registered for a Platinum or Training pass.
Prometheus is an open-source monitoring system and time series database. It features a multi-dimensional data model, a flexible query language, and it integrates monitoring aspects from client-side instrumentation to alerting. In this workshop, Prometheus experts at Weaveworks will cover Prometheus architecture and concepts, and then guide attendees through hands-on Prometheus and PromQL.
Distributed Systems
Kyle Kingsbury (Jepsen)
Tesser is a library for performing efficient, composable reductions over large datasets in parallel, both on multi-core and multi-node systems.
Serverless
Erica Windisch (IOpipe)
Serverless and other stateless applications still manipulate state somewhere. Observing this state, and knowing where, how, and why it is manipulated, is important for operational security and for developer concerns such as debugging.
Continuous Delivery, Production Engineering, SRE, and DevOps
Paul McCallick (Nordstrom)
In this talk, we’ll explore how and why Nordstrom has moved to an ONLY PRODUCTION viewpoint, saving countless engineering cycles and putting effort where it matters.
Distributed Data
John Mumm (Wallaroo Labs)
Coordination is a common source of performance problems when dealing with distributed state. We’ll talk about some strategies for avoiding coordination and relying on local knowledge wherever possible, and also look at some pros and cons as well as tips for using in-memory state instead of the typical approach of using external data stores.
Keynotes
Nikki McDonald (O’Reilly Media), Ines Sombra (Fastly), James Turnbull (Empatico)
Program Chairs, Nikki McDonald, Ines Sombra, and James Turnbull open the second day of keynotes.
Systems Engineering & Architecture
Astrid Atkinson (Google)
A microservices-based approach to tackling legacy and heterogeneity at Google.
Continuous Delivery, Serverless
Donna Malayeri (Pulumi)
Tooling is necessary for serverless and service-full applications; this talk categorizes tools and the areas in which they excel.
Keynotes
Nikki McDonald (O’Reilly Media), James Turnbull (Empatico), Ines Sombra (Fastly)
Program Chairs, Nikki McDonald, James Turnbull, and Ines Sombra open the first day of keynotes.
Leadership
Sacha Judd (Hoku Group)
Homogeneous teams are one proven cause of missteps and flaws in software products and pipelines. This talk gives leaders a fresh perspective and specific tools to bring back to their teams on hiring, promotion, and internal culture.
Containers, Continuous Delivery
Liz Rice (Aqua Security)
A dive into what's easy, and what's not so easy, about finding and patching security vulnerabilities in containers.
Building Secure Systems
Neal Mueller (Google)
Google conducted the first longitudinal study of the underground ecosystem fueling credential theft and identified 12.4 million potential victims of phishing kits; we’ll discuss this data, and provide phishing demos and recommendations about the effectiveness of phishing prevention tools, including education, anti-virus software, filtering, 2FA, password managers, and security keys.