Building and maintaining complex distributed systems
June 19–20, 2017: Training
June 20–22, 2017: Tutorials & Conference
San Jose, CA

Schedule: Monitoring, Tracing, & Metrics sessions

11:25am–12:05pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Christine Yen (Honeycomb)
Average rating: 4.60 (5 ratings)
Preaggregated metrics and time series form the backbone of many monitoring setups. They have many redeeming qualities but simply aren't sufficient for capturing or responding to the many ways things can go wrong in modern or complex systems. Christine Yen outlines the problems inherent in the use and implementation of preaggregated metrics.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Suman Karumuri (Pinterest)
Average rating: 5.00 (2 ratings)
Distributed tracing is an emerging approach to monitoring distributed systems. Suman Karumuri shares the challenges of building and deploying distributed tracing at scale using PinTrace, one of the largest distributed tracing pipelines. Drawing on real-world examples, Suman explains how traces can be used to understand, debug, and optimize your production workflows.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Sneha Inguva (DigitalOcean)
Average rating: 5.00 (1 rating)
Over the past year, DigitalOcean's Delivery team has been building a runtime platform based on Kubernetes with the goal of making shipping code easier. A core component of this system is a monitoring and alerting system based on Prometheus and Alertmanager. Sneha Inguva offers an overview of the system and shares problems encountered, potential solutions, and key lessons learned in the process.
4:35pm–5:15pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Megan Anctil (Slack)
Average rating: 4.83 (6 ratings)
One size definitely doesn't fit all when it comes to open source monitoring solutions, and executing generally understood best practices in the context of unique distributed systems presents all sorts of problems. Megan Anctil shares pain points and lessons learned at Slack wrangling known technologies such as Icinga, Graphite, Grafana, and the Elastic Stack to best fit the company's use cases.