Build resilient systems at scale
May 27–29, 2015 • Santa Clara, CA

Building an effective observability stack

Laine Campbell (Fastly)
1:30pm–3:00pm Wednesday, 05/27/2015
Tutorial
Location: Ballroom AB
Average rating: 3.57 (21 ratings)

Prerequisite Knowledge

Moderate Linux systems administration experience (mid-level sysadmin or above)

Materials or downloads needed in advance

Internet access, GitHub account (optional but very useful), AWS account (optional but very useful)

Description

Our OpsViz stack is open source and available on GitHub. The project includes AWS CloudFormation, AWS OpsWorks, Elasticsearch with Kibana, Logstash, Grafana/Graphite, and Sensu. The stack is already running in production for customers.

1. SLA-driven operational visibility (25 minutes)

What we monitor – overview and philosophy

  • Collect once
  • Minimize datastores
  • Consolidate visualizations
  • Granularity and statistics
  • Key performance indicators

Service level indicators

  • Velocity
  • Efficiency/cost
  • Scalability
  • Performance
  • Availability

Why we monitor – our customers

  • Operational health
  • Quality assurance
  • Capacity planning
  • Product management

2. How we monitor – architecture and implementation (25 minutes)

Components of a system

  • Sense/measure
  • Collection
  • Analysis/computation
  • Storage
  • Visualization
  • Metrics and scaling
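
As a rough illustration of how the stages above fit together, the following sketch measures a value, summarizes it, and formats the result for storage. It is not taken from the OpsViz project: the function names, the metric prefix, and the choice of summary statistics are all assumptions made for this example. The only external convention it relies on is Graphite's plaintext line format, `path value timestamp`.

```python
import statistics


def sense(read_fn, samples=5):
    """Sense/measure: call a probe function several times (hypothetical probe)."""
    return [read_fn() for _ in range(samples)]


def analyze(values):
    """Analysis/computation: reduce raw readings to a few summary statistics."""
    return {
        "min": float(min(values)),
        "max": float(max(values)),
        "mean": float(statistics.mean(values)),
    }


def to_graphite_lines(prefix, stats, timestamp):
    """Storage: format statistics as Graphite plaintext lines.

    Each line is "path value timestamp". Sending the lines to a Graphite
    listener is left out so the sketch runs without a network.
    """
    return [
        f"{prefix}.{name} {value} {timestamp}"
        for name, value in sorted(stats.items())
    ]


if __name__ == "__main__":
    # Assumed example: a fixed list of readings stands in for a real probe.
    fake_latencies = iter([10, 20, 30])
    readings = sense(lambda: next(fake_latencies), samples=3)
    for line in to_graphite_lines("web01.latency_ms", analyze(readings), 1432742400):
        print(line)
```

Collecting once and deriving every statistic from the same readings follows the "collect once" principle in the overview: the visualization layer then reads from a single datastore rather than from several independent probes.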

3. Use cases/supporting our customers (30 minutes)

Operational health

  • Business requirements definition and translation
  • Convert to user stories, defining business needs, functional tests, metrics required, actions required

Capacity planning

  • Business requirements definition and translation
  • Convert to user stories, defining business needs, functional tests, metrics required, actions required
  • Focus on resource utilization: identify correlative factors, find ways to measure their validity, and monitor that validity over time.
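
One possible way to check whether a candidate factor tracks resource utilization is a Pearson correlation, recomputed over successive windows so its validity can be watched over time. This is a minimal sketch under assumptions, not the method used in the tutorial: the function names, the window size, and the sample data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def rolling_correlation(xs, ys, window):
    """Recompute the correlation over each consecutive window of samples."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]


if __name__ == "__main__":
    # Assumed sample data: requests per second versus CPU utilization (%).
    requests = [100, 150, 200, 250, 300, 350]
    cpu = [20, 31, 39, 52, 60, 71]
    print(rolling_correlation(requests, cpu, window=4))
```

A coefficient that stays close to 1 across windows suggests the factor remains a usable predictor of utilization; a drifting coefficient is a signal to revisit the capacity model.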

Product development

  • Business requirements definition and translation
  • Convert to user stories, defining business needs, functional tests, metrics required, actions required

4. Setting up our stack/walkthrough (10 minutes)

  • Walk through CloudFormation script
  • Walk through components
  • Walk through dashboard

Laine Campbell

Fastly

Laine Campbell is AVP of Pythian’s open-source database practice, former CEO and co-founder of Blackbird, and a founder of PalominoDB. Laine has been an Oracle, MySQL, and Cassandra DBA, architect, and designer for 11 years with such organizations as Obama for America, Travelocity, Zappos, Chegg, LiveJournal, Disney Mobile, and Adobe. Laine is also an open-source proponent and an advocate for bringing technology, job opportunities, and privileges to underserved populations.

Laine is co-author of O’Reilly Media’s Databases at Scale. Learn more: http://oreil.ly/1GbWE1y