Build & maintain complex distributed systems
17–18 October 2017: Training
18–20 October 2017: Tutorials & Conference
London, UK

In-Person Training
Data science for effective operations

Heinrich Hartmann (Circonus)
Tuesday, 17 October & Wednesday, 18 October, 9:00 - 17:00
Location: Hilton Meeting Room 1/2
Level: Intermediate
Early Price ends 7 September

This course will sell out—sign up today!

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Wednesday.

Gathering telemetry data is key to operating reliable distributed systems at scale. Data science is the art of extracting information from large amounts of data. Heinrich Hartmann explores a wide range of data analysis methods, both theoretical and practical, that can make you more effective at operations tasks.

What you'll learn, and how you can apply it

  • Learn how to interpret metrics and graphs presented by monitoring tools
  • Gain the mathematical background to reason about telemetry data and aggregation
  • Understand what specific metrics mean
  • Explore advanced topics like forecasting and anomaly detection

This training is for you because...

  • You're an SRE, operations engineer, sysadmin, or developer who wants to become effective at operations.

Prerequisites:

  • Experience using a monitoring system

Hardware and/or installation requirements:

  • Pen and paper
  • A Linux-based laptop

Outline

Day 1

Descriptive statistics


  • Visualizations

  • Summary statistics (mean, stddev, median, percentiles, IQR)

  • Robustness and mergeability (desirable properties for ops applications)

  • Histograms

Time series analysis


  • Regressions

  • Filters and exponential smoothing

  • Approaches to anomaly detection

Metrics: The good, the bad, the ugly


  • What is monitoring?

  • How to measure system properties properly? (event data, state accounting, durations)

  • Problems with CPU utilization metrics

  • How to monitor APIs (p99 across a fleet of containers)

  • How to deal with ephemeral metrics

Day 2

Tools for data analysis


  • Python, Jupyter, and NumPy

  • Command-line tools (csvkit, feedgnuplot)

Data analysis exercises


  • Implement aggregation methods

  • Calculate accurate accounting statistics from exported monitoring data

Monitoring tools


  • StatsD

  • Graphite/Grafana

  • Circonus

Monitoring tools exercises


  • Visualize data in various ways

  • Data aggregation (percentiles)

  • Time series forecasting

  • Filtering

  • Anomaly detection

About your instructor

Heinrich Hartmann is the lead data scientist at Circonus, where he is driving the development of analytics methods that transform monitoring data into actionable information as part of the Circonus monitoring platform. Previously, he worked as a researcher for the University of Oxford. Heinrich holds a PhD in mathematics from the University of Bonn.

Twitter: @HeinrichHartman

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.
