Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

Monitoring, Observability, and Performance sessions

The larger your applications get, the harder it is to understand their performance and troubleshoot problems. This increased complexity in applications and services requires new methods for monitoring and improved observability. In this track, you’ll learn best practices for monitoring large-scale, complex, dynamic and distributed systems built on emerging architectures like microservices and serverless.

Track host

Cindy Sridharan (Apple)
Cindy Sridharan is a distributed systems engineer at Apple. Previously, she was an engineer at imgix, where she worked on API development, infrastructure, and other miscellaneous backend engineering tasks. She likes thinking about building resilient and maintainable systems and recently started writing about several of these topics.

11:25am–12:05pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Gwen Shapira (Confluent), Xavier Léauté (Confluent)
Average rating: 4.14 (7 ratings)
Experienced Kafka admins don’t just collect metrics; they go the extra mile and use additional tools to validate the availability and performance of both the Kafka cluster and their entire data pipelines. Gwen Shapira and Xavier Léauté share best practices for monitoring Apache Kafka, discussing critical metrics, common mistakes, what metrics don’t tell you, and how to cover these essential gaps.
1:15pm–1:55pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Average rating: 4.33 (3 ratings)
Christian Saide explains how NS1 was able to reduce infrastructure, maintenance, and operational costs while simultaneously increasing throughput and visibility of key metrics by leveraging Elasticsearch as a time series database.
2:10pm–2:50pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Morgan McLean (Google), Jaana B. Dogan (Google)
Average rating: 4.75 (4 ratings)
Morgan McLean and Jaana Burcu Dogan detail how to quickly instrument your distributed services and gain visibility into their operation with OpenCensus.
3:40pm–4:20pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Jamie Wilkinson (Google)
Average rating: 5.00 (6 ratings)
Jamie Wilkinson offers an overview of SLOs and the concept of the error budget, examines the motivation for moving from cause-based to symptom-based alerting, and demonstrates how to implement it in your own projects.
4:35pm–5:15pm Wednesday, June 13, 2018
Location: LL21 A/B Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Baron Schwartz (VividCortex)
Average rating: 4.25 (4 ratings)
Baron Schwartz demonstrates how to monitor a database by understanding the difference between workload monitoring and resource monitoring, and the golden signals for each.