Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

Systems Monitoring & Orchestration

Build and interact with real-world systems

The larger your applications get, the harder it is to understand their performance and troubleshoot problems. This increased complexity in applications and services is driving a stronger interest in monitoring and observability.

Explore how to monitor large-scale, complex, dynamic, and distributed systems built on emerging architectures like microservices and serverless. In these sessions, you'll learn about containers, microservices, and service architectures, including technologies like Docker, CoreOS, and Kubernetes.

We'll help you solve your toughest challenges with real-world advice from leaders in the field who have grappled with the same problems you're facing today. You'll learn how to:

  • Diagnose complex issues in production environments
  • Instrument systems for maximum possible observability
  • Monitor applications running in containers
  • Monitor system calls, garbage collection, and other interesting events in the Java Virtual Machine
  • Plan and deploy monitoring for your own custom applications in containers
9:00am–12:30pm Tuesday, June 12, 2018
Location: LL21 E/F Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Ben Hartshorne (Honeycomb), Christine Yen (Honeycomb)
Average rating: ***..
(3.00, 5 ratings)
Ben Hartshorne and Christine Yen explore what it means for a system to be “up” by discussing end-to-end (e2e) checks: what makes a good one and which techniques are valuable when designing them. Along the way, you'll learn how to write and evolve an e2e check against a common API.
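As a rough illustration of the idea, and not the speakers' own material, an e2e check can be sketched as a write followed by a read of the same value, with the round trip timed. The names `run_e2e_check`, `CheckResult`, and `max_latency_s` are hypothetical, and the `write` and `read` callables stand in for whatever client calls the API under test actually provides.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    latency_s: float
    detail: str


def run_e2e_check(
    write: Callable[[str, str], None],
    read: Callable[[str], Optional[str]],
    max_latency_s: float = 2.0,  # assumed threshold, tune per service
) -> CheckResult:
    """Write a unique value, read it back, and time the round trip."""
    key = f"e2e-{uuid.uuid4()}"  # unique key so checks never collide
    value = uuid.uuid4().hex
    start = time.monotonic()
    try:
        write(key, value)
        got = read(key)
    except Exception as exc:  # any client error counts as "down"
        return CheckResult(False, time.monotonic() - start, f"error: {exc!r}")
    latency = time.monotonic() - start
    if got != value:
        return CheckResult(False, latency, "read returned a different value")
    if latency > max_latency_s:
        return CheckResult(False, latency, "round trip too slow")
    return CheckResult(True, latency, "ok")
```

For a quick local trial, an in-memory dictionary can play the role of the service: `store = {}` followed by `run_e2e_check(store.__setitem__, store.get)`. Passing the client calls in as parameters keeps the check logic separate from any particular API.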
1:30pm–5:00pm Tuesday, June 12, 2018
Location: LL21 C/D Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Tomas Lin (Netflix), Emily Burns (Netflix)
Average rating: ****.
(4.67, 3 ratings)
Tomas Lin and Emily Burns walk you through building continuous delivery pipelines for deploying and promoting code across cloud virtual machines and containers using Netflix's Spinnaker continuous delivery platform.
11:25am–12:05pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Gwen Shapira (Confluent), Xavier Léauté (Confluent)
Average rating: ****.
(4.14, 7 ratings)
Experienced Kafka admins don’t just collect metrics; they go the extra mile and use additional tools to validate availability and performance on both the Kafka cluster and their entire data pipelines. Gwen Shapira and Xavier Léauté share best practices for monitoring Apache Kafka, discussing critical metrics, common mistakes, what metrics don’t tell you, and how to cover these essential gaps.
11:25am–12:05pm Wednesday, June 13, 2018
Location: LL20 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Jason Yee (Datadog)
Average rating: ****.
(4.50, 4 ratings)
Jason Yee shows how you can more easily test code in production while isolating the effect of potential issues using container orchestration and service meshes.
1:15pm–1:55pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Christian Saide (NS1)
Average rating: ****.
(4.33, 3 ratings)
Christian Saide explains how NS1 was able to reduce infrastructure, maintenance, and operational costs while simultaneously increasing throughput and visibility of key metrics by leveraging Elasticsearch as a time series database.
2:10pm–2:50pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Morgan McLean (Google), Jaana B. Dogan (Google)
Average rating: ****.
(4.75, 4 ratings)
Morgan McLean and Jaana Burcu Dogan detail how to quickly instrument your distributed services and gain visibility into their operation with OpenCensus.
2:10pm–2:50pm Wednesday, June 13, 2018
Location: LL20 A/B Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Kris Nova (Heptio)
Average rating: ****.
(4.89, 9 ratings)
Kris Nova leads a deep dive into the world of migrating a monolithic Java application to Kubernetes.
3:40pm–4:20pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Jamie Wilkinson (Google)
Average rating: *****
(5.00, 6 ratings)
Jamie Wilkinson offers an overview of SLOs and the concept of the error budget, examines the motivation for moving from cause-based to symptom-based alerting, and demonstrates how to implement symptom-based alerting in your own projects.
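To make the error budget idea concrete, here is a minimal sketch of the arithmetic, assuming a request-based SLO. It is not taken from the talk: the function names and the 50% paging threshold are hypothetical choices for illustration. The alert looks only at a user-visible symptom (the share of failed requests) rather than at any particular cause.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the measurement window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_consumed(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget already spent (may exceed 1.0)."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:  # no traffic in the window, so nothing to spend
        return 0.0 if bad_events == 0 else float("inf")
    return bad_events / budget


def should_page(
    slo_target: float,
    total_events: int,
    bad_events: int,
    threshold: float = 0.5,  # assumed policy: page once half the budget is gone
) -> bool:
    """Symptom-based alert: fire on budget spend, regardless of the cause."""
    return budget_consumed(slo_target, total_events, bad_events) >= threshold
```

With a 99.9% target and 1,000,000 requests in the window, the budget is roughly 1,000 failed requests, so 600 failures would trigger a page under the assumed threshold while 100 would not.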
3:40pm–4:20pm Wednesday, June 13, 2018
Location: 230 B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Victoria Nguyen (Fastly)
Average rating: *****
(5.00, 3 ratings)
Victoria Nguyen explains how Fastly overhauled the monitoring and data collection of its globally distributed network without its caches noticing.
4:35pm–5:15pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Baron Schwartz (VividCortex)
Average rating: ****.
(4.25, 4 ratings)
Baron Schwartz demonstrates how to monitor a database by understanding the difference between workload and resource monitoring, and the golden signals for each.
1:15pm–1:55pm Thursday, June 14, 2018
Location: LL20 C Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Jon Hodgson (Riverbed)
Average rating: ****.
(4.67, 3 ratings)
Much of the monitoring data we rely on is fundamentally flawed, lacking the resolution and accuracy needed to effectively detect and diagnose many issues. Digital signal processing science has overcome similar challenges for audio. Using sound as an example, Jon Hodgson explains how these principles are leveraged by organizations to improve the fidelity of their performance monitoring.
3:40pm–4:20pm Thursday, June 14, 2018
Location: LL21 E/F Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Dave Cheney (Heptio)
Average rating: ***..
(3.50, 4 ratings)
Dave Cheney shares real-world advice on how to extend the capabilities of a Kubernetes cluster, using the development of the open source Contour Ingress controller as a case study.
4:35pm–5:15pm Thursday, June 14, 2018
Location: 230 B Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Erica Windisch (IOpipe)
Serverless and other stateless applications still manipulate state, somewhere. Erica Windisch explains why observing this state and knowing where, how, and why that state is manipulated is important for operational security and developer concerns such as debugging.