Build Systems that Drive Business
Sep 30–Oct 1, 2018: Training
Oct 1–3, 2018: Tutorials & Conference
New York, NY

DevOps and SRE sessions

Building and running complex systems that are both fast and reliable requires teams and applications that work well together. The cultural shift is evident: software engineers and system administrators are breaking down walls as they move toward shared responsibilities, which quickens the pace of software development and delivery. This track explores these new ways of working together, with insights and lessons gathered from taking software from concept to production.

Track host

Tanya Reilly (Google) is a system administrator and site reliability engineer at Google, where she works on low-level infrastructure like distributed locking, load balancing, and bootstrapping. Previously, she was a system administrator at Eircom.net, Ireland's largest ISP, and the entire IT department for a small software house.

9:00am–12:30pm Monday, October 1, 2018
Location: Beekman/Sutton North Level: Beginner
Secondary topics:  Systems Architecture & Infrastructure
Bridget Kromhout (Microsoft)
Average rating: ****.
(4.78, 9 ratings)
Bridget Kromhout walks you through launching clusters and details all the moving parts you need to know about to use Kubernetes in production.
9:00am–12:30pm Monday, October 1, 2018
Location: Nassau Level: Beginner
Secondary topics:  Systems Monitoring & Orchestration
James Meickle (Quantopian)
Average rating: ****.
(4.00, 1 rating)
Ansible is a "batteries included" automation, configuration management, and orchestration tool that's fast to learn and flexible enough for any architecture. Join James Meickle to get started with Ansible, with an eye toward sustainable development in cloud environments.
1:30pm–5:00pm Monday, October 1, 2018
Location: Sutton South/Regent Parlor Level: Beginner
Secondary topics:  Resilient, Performant & Secure Distributed Systems
Tammy Butow (Gremlin), Ana Margarita Medina (Gremlin), Patrick Higgins (Gremlin)
Average rating: ***..
(3.00, 2 ratings)
Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow, Ana Medina, and Patrick Higgins lead a hands-on deep dive into chaos engineering, covering the tools and practices you need to implement it in your organization.
1:30pm–5:00pm Monday, October 1, 2018
Location: Murray Hill East (B) Level: Intermediate
Secondary topics:  Systems Architecture & Infrastructure
Anubhav Mishra (HashiCorp)
Average rating: ***..
(3.00, 2 ratings)
Over the past year, service meshes have gained significant interest. Most service meshes have two components: a control plane and a data plane. Anubhav Mishra explains what it takes to build a scalable control plane and data plane and discusses how HashiCorp Consul provides features such as a distributed key-value store and service discovery that make it ideal for a control plane.
11:35am–12:15pm Tuesday, October 2, 2018
Location: Murray Hill Level: Beginner
Secondary topics:  Systems Monitoring & Orchestration
Liz Fong-Jones (Honeycomb), Dave Rensin (Google)
Average rating: ****.
(4.25, 4 ratings)
Implementing site reliability engineering (SRE) doesn't have to be intimidating, and it isn't only for cloud-native organizations. Liz Fong-Jones and Dave Rensin share eight key lessons Google's customer reliability engineering team learned helping large enterprises adopt SRE as an operations engineering model.
1:30pm–2:10pm Tuesday, October 2, 2018
Location: Murray Hill Level: Beginner
Secondary topics:  Systems Architecture & Infrastructure
Matt Rogish (ReactiveOps)
Average rating: *****
(5.00, 1 rating)
Matt Rogish explains how NTSB investigations of air disasters have dramatically improved flight safety. He then applies the lessons learned in disaster recovery and analysis, teamwork, task saturation, and systems design to modern software application and infrastructure architecture at scale, with the goal of higher availability, fewer errors, and more scalable systems.
2:25pm–3:05pm Tuesday, October 2, 2018
Location: Murray Hill Level: Non-technical
Secondary topics:  Resilient, Performant & Secure Distributed Systems
Heidi Waterhouse (LaunchDarkly)
Average rating: *****
(5.00, 4 ratings)
Waffle House's hurricane disaster plan has everything you could want from an IT disaster plan, including contact trees, failover states, and runbooks on partial operation. Heidi Waterhouse shares lessons about state drawn from the world outside computers and explains how to quantify them using a finite state machine and implement them automatically while you are operating in a less-than-perfect condition.
4:45pm–5:25pm Tuesday, October 2, 2018
Location: Murray Hill
Jennifer Davis (Microsoft)
Average rating: ****.
(4.00, 2 ratings)
Rather than a future of NoOps, serverless has increased the need for specialized operations engineering. Jennifer Davis explores the role of operations in serverless, covering testing, monitoring, and debugging functions.