Build & maintain complex distributed systems
17–18 October 2017: Training
18–20 October 2017: Tutorials & Conference
London, UK

Practical, team-focused operability techniques for distributed systems

Matthew Skelton (Skelton Thatcher Consulting)
14:10–14:50 Thursday, 19 October 2017
Orchestration, Scheduling, and Containers, Systems Engineering
Location: King's Suite - Sandringham Level: Intermediate
Average rating: 3.50 (6 ratings)

Who is this presentation for?

  • Engineers and team leaders

Prerequisite knowledge

  • Experience with distributed systems as a developer, tester, or operations engineer or in a similar hands-on role

What you'll learn

  • Tried-and-tested, team-friendly techniques for improving the operability of distributed systems


Modern software systems increasingly span cloud and on-premises deployments as well as remote embedded devices and sensors. These distributed systems bring challenges with data, connectivity, performance, and systems management; to ensure success, you must design and build with operability as a first-class property.

Matthew Skelton shares five practical, tried-and-tested techniques for improving operability with many kinds of software systems, including the cloud, serverless, on-premises, and the IoT: logging as a live diagnostics vector with sparse event IDs; operational checklists and runbook dialog sheets as a discovery mechanism for teams; endpoint health checks as a way to assess runtime dependencies and complexity; correlation IDs beyond simple HTTP calls; and lightweight user personas as drivers for operational dashboards.
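As a rough illustration of the endpoint health check idea, the sketch below aggregates the status of a service's runtime dependencies into one report. It is not taken from the talk: the names `HealthReport`, `check_health`, and the probe functions are hypothetical, and a real service would expose the report over HTTP or a similar interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HealthReport:
    healthy: bool                  # True only if every dependency is healthy
    dependencies: Dict[str, bool]  # per-dependency status, keyed by name


def check_health(probes: Dict[str, Callable[[], bool]]) -> HealthReport:
    """Run each dependency probe and summarise the results."""
    results: Dict[str, bool] = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as an unhealthy dependency.
            results[name] = False
    return HealthReport(healthy=all(results.values()), dependencies=results)


def failing_probe() -> bool:
    # Hypothetical probe standing in for an unreachable message queue.
    raise ConnectionError("queue unreachable")


report = check_health({"database": lambda: True, "queue": failing_probe})
print(report)
```

Listing the probes explicitly also makes visible how many runtime dependencies a service has, which is one way such a check can help a team assess complexity.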

These techniques work very differently with different technologies. For instance, an IoT device has limited storage, processing, and I/O, so generating and shipping logs and metrics looks very different from the cloud or serverless cases. However, the principles (logging as a live diagnostics vector, event IDs for discovery, and so on) work remarkably well across very different technologies.
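To make the event ID and correlation ID principles concrete, here is a minimal sketch of a structured log line. It assumes that "sparse" means numbering event IDs with gaps so that related events can be added later; that reading, the `EventId` values, and the `format_event` helper are illustrative assumptions rather than details from the talk.

```python
import enum
import json
import uuid


class EventId(enum.IntEnum):
    # Assumed "sparse" numbering: gaps leave room for related events later.
    ORDER_RECEIVED = 1000
    ORDER_VALIDATED = 1010
    PAYMENT_FAILED = 2000


def format_event(event: EventId, correlation_id: str, **fields) -> str:
    """Build one JSON log line carrying an event ID and a correlation ID."""
    record = {
        "event_id": int(event),
        "event": event.name,
        "correlation_id": correlation_id,
    }
    record.update(fields)
    return json.dumps(record, sort_keys=True)


# The same correlation ID would be passed along with every message or call
# belonging to one request, so that its log lines can be joined up later.
correlation_id = str(uuid.uuid4())
line = format_event(EventId.PAYMENT_FAILED, correlation_id, order="A-17")
print(line)
```

On a constrained device the same record could be reduced to the numeric event ID and a short correlation token, with the mapping to names kept on the receiving side.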

Drawing from his experience helping teams improve the operability of their software systems, Matthew explains what works (and what doesn’t) and how teams can expand their understanding and awareness of operability through these straightforward, team-friendly techniques.

Matthew Skelton

Skelton Thatcher Consulting

Matthew Skelton is a cofounder and principal consultant at Skelton Thatcher Consulting, where he specializes in helping organizations adopt and sustain good practices for building and operating software systems, such as continuous delivery, DevOps, aspects of ITIL, and software operability. Matthew has been building, deploying, and operating commercial software systems since 1998. He curates the well-known DevOps Team Topologies Patterns and is coauthor of Database Lifecycle Management (Redgate) and Continuous Delivery with Windows and .NET (O’Reilly).