Building and maintaining complex distributed systems
17–18 October, 2017: Training
18–20 October, 2017: Tutorials & Conference
London, UK

In-Person Training
Data Science for Effective Operations

Heinrich Hartmann (Circonus)
Sunday, October 1 & Monday, October 2, 9:00am - 5:00pm
Location: Hilton Meeting Room 1/2 Level: Intermediate
Best Price ends 21 July

This course will sell out—sign up today!

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Wednesday.

Gathering telemetry data is key to operating reliable distributed systems at scale. Data science is the art of extracting information from large amounts of data. In this training, we will cover a wide range of data analysis methods, from both a theoretical and a practical perspective, that make you more effective at operations tasks.

What you'll learn, and how you can apply it

  • Interpret metrics and graphs presented by monitoring tools (lots of pitfalls!)
  • Get the mathematical background to reason about telemetry data and aggregation thereof
  • Get a better understanding of what specific metrics mean (e.g. p99 of response time across a cluster of nodes at 9am today)
  • Get a high-level overview of advanced topics like forecasting and anomaly detection: what to expect and what to watch out for
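
To make the p99 example above concrete, here is a minimal sketch of one well-known pitfall: averaging per-node percentiles does not give the percentile of the whole cluster. The sample data, the `percentile` helper, and the nearest-rank definition are assumptions chosen for illustration; they are not taken from the course material.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds for two nodes.
node_a = [10] * 1000              # busy node, uniformly fast
node_b = [10] * 90 + [500] * 10   # quiet node, 10% slow requests

p99_a = percentile(node_a, 99)
p99_b = percentile(node_b, 99)

# Naive aggregation: average the per-node p99 values.
average_of_p99s = (p99_a + p99_b) / 2

# Cluster-wide p99 computed from all samples pooled together.
fleet_p99 = percentile(node_a + node_b, 99)

print(p99_a, p99_b, average_of_p99s, fleet_p99)
```

With this data the two aggregates disagree by a wide margin, because the quiet node contributes only a small share of all requests but half of the average.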

This training is for you because...

You are an SRE, operations engineer, sysadmin, or developer.


You should have used a monitoring system before.

Hardware and/or installation requirements:

  • Pen and paper.
  • Access to a Linux system.

Day 1:

  • Part 0 : Intro / Warmup. Get to know each other. Learn about background and interests.
  • Part 1 : Descriptive Statistics
    • Visualizations
    • Summary Statistics (!) (mean, stddev, median, percentiles, IQR)
    • Robustness & Mergeability (Desirable properties for Ops applications)
    • Histograms


  • Part 3 : Metrics: The good, the bad, the ugly
    • What is Monitoring? (Piorier, et al.) Fix terminology
    • How to measure system properties properly? Event data / state accounting / durations
    • Problems with CPU utilization metrics (State accounting is messed up. How to do better → eBPF)
    • How to monitor APIs (p99 across a fleet of containers)
    • How to deal with ephemeral metrics
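
As an illustration of the mergeability property listed under Part 1, the sketch below shows why histograms suit fleet-wide percentiles: histograms with identical bins merge by adding counts, and the merged histogram equals the histogram of the pooled data. The fixed 10 ms bin width, the sample durations, and all function names are assumptions made for this example, not the histogram format of any particular monitoring tool.

```python
import math
from collections import Counter

BIN_WIDTH_MS = 10  # assumed fixed bin width shared by all nodes

def histogram(durations_ms):
    """Count samples per bin; a bin is identified by its lower edge in ms."""
    return Counter((int(d) // BIN_WIDTH_MS) * BIN_WIDTH_MS for d in durations_ms)

def merge(*histograms):
    """Merge histograms that share the same bins by adding their counts."""
    total = Counter()
    for h in histograms:
        total.update(h)
    return total

def percentile_upper_bound(hist, q):
    """Upper edge of the bin that contains the q-th percentile (nearest rank)."""
    n = sum(hist.values())
    rank = max(1, math.ceil(q * n / 100))
    seen = 0
    for lower_edge in sorted(hist):
        seen += hist[lower_edge]
        if seen >= rank:
            return lower_edge + BIN_WIDTH_MS
    raise ValueError("empty histogram")

# Hypothetical request durations in milliseconds from two containers.
node_a = [12, 15, 18, 22, 25, 31, 33, 47, 95, 250]
node_b = [11, 14, 14, 19, 21, 28, 36, 41, 44, 610]

merged = merge(histogram(node_a), histogram(node_b))

# Merging per-node histograms loses nothing compared to pooling raw samples.
assert merged == histogram(node_a + node_b)

print(percentile_upper_bound(merged, 50), percentile_upper_bound(merged, 90))
```

The price of this approach is resolution: the percentile is only known up to the width of the bin that contains it.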

Day 2:

  • Tools for Data Analysis
    • Python/Jupyter/Numpy
    • Command line tools (csvkit, feedgnuplot)
  • Examples/Exercises Data Analysis:
    • Implement aggregation methods
    • Calculate accurate accounting statistics from exported monitoring data


  • Intro Monitoring Tools
    • StatsD
    • Graphite/Grafana
    • Circonus
  • Examples/Exercises Monitoring Tools
    • Visualizing data in various ways
    • Data Aggregation (Percentiles)
    • Time Series Forecasting
    • Filtering
    • Anomaly Detection
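
To give a flavor of the anomaly detection topic, the following is a minimal sketch of one common robust heuristic: flagging points whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. The outline does not say which methods the course covers, so the technique, the `mad_anomalies` name, the 3.5 threshold, and the sample data are all assumptions for illustration.

```python
import statistics

def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical latency samples in milliseconds with one obvious spike.
latencies = [20, 22, 21, 23, 19, 20, 240, 22, 21, 20]
anomalies = mad_anomalies(latencies)
print(anomalies)
```

Median and MAD are used instead of mean and standard deviation because a single spike barely moves them, which is the robustness property named in Part 1.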

About your instructor


Heinrich Hartmann is the lead data scientist at Circonus. He is driving the development of analytics methods that transform monitoring data into actionable information as part of the Circonus monitoring platform. Heinrich earned his PhD in mathematics from the University of Bonn and worked as a researcher for the University of Oxford afterward. In 2012 he shifted his focus to computer science, and now applies his 10+ years of mathematical expertise to data analytics.


Conference registration

Get the Platinum pass or the Training pass to add this course to your package.
