OpenTSDB: A Scalable, Distributed Time Series Database

Practitioner
Location: Mission City M

Most monitoring systems use a time series database to store historical data. RRD and traditional relational databases such as MySQL are among the most common storage backends used in popular monitoring systems such as MRTG, Cacti, Ganglia, Munin, Nagios, and Opsview. With the advent of the “NoSQL” movement, scalable, distributed data stores that run on large clusters of commodity machines have become readily available. This presentation introduces OpenTSDB, an open-source, horizontally scalable, general-purpose time series database built on top of HBase. We show how its design can be used to monitor large clusters at an unprecedented level of granularity. With such a system, it becomes possible to track orders of magnitude more time series from thousands of hosts and applications, at a resolution of a few seconds, providing accurate real-time monitoring as well as long-term trending.

When dealing with increasingly complex distributed systems and applications,
engineers are faced with the growing challenge of understanding the complex
state of the systems they run. All modern network equipment, operating
systems, and applications export a wealth of metrics about their state and
interactions with other services. In a large cluster, collecting, indexing
and storing all the monitoring data becomes a daunting task due to the sheer
volume of information and high rate of change. Metrics are typically
collected by running an agent on the hosts. Data points are then persisted in
a chronological fashion in a time series database. Being able to plot the data
is of utmost importance, and staying on top of the trends is critical for
capacity planning and performance monitoring. Being able to correlate
different time series is tremendously helpful when trying to understand the
behavior of a service or conduct postmortem analyses.
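
As an illustration of the collection step described above, the following minimal sketch formats one data point the way an agent might before sending it to OpenTSDB over its line-based text protocol (`put <metric> <timestamp> <value> <tag=value> ...`). The function name, metric name, and tag values are hypothetical, and the network send is left out on purpose.

```python
def format_put_line(metric, timestamp, value, tags):
    """Build one OpenTSDB-style "put" line for a single data point.

    metric    -- metric name, e.g. "proc.loadavg.1min" (hypothetical)
    timestamp -- Unix time in seconds
    value     -- numeric measurement
    tags      -- dict of tag name -> tag value, e.g. {"host": "web01"}
    """
    # Assumption for this sketch: every data point carries at least one tag,
    # since tags are what distinguish one time series from another.
    if not tags:
        raise ValueError("at least one tag is required")
    # Sort tags so the same series always produces the same line.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}"


# One data point for a hypothetical host "web01".
line = format_put_line("proc.loadavg.1min", 1296774600, 0.42, {"host": "web01"})
print(line)
```

An agent would typically emit one such line per metric every few seconds; the tags are what later make it possible to group or correlate series by host, application, or any other dimension.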

OpenTSDB is a master-less, horizontally scalable system that uses HBase to
store time series data. HBase is an open-source, distributed, non-relational
database modeled after Google’s Bigtable. It features
low-latency, high throughput, consistent operations that are atomic at the row
level, fault tolerance, and load balancing. Thanks to those key features, it
becomes possible to easily store significant amounts of time series data.
By choosing an appropriate schema and using efficient algorithms, millions of
data points from arbitrary time series can be retrieved and graphed quickly.
OpenTSDB offers a simple yet powerful query interface that allows custom
graphs to be generated over arbitrary time periods and with an unprecedented
granularity.
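
To make the idea of "an appropriate schema" more concrete, here is a simplified sketch of one way to lay out time series in a sorted, row-oriented store such as HBase: all points of a series that fall in the same time bucket share a row key, so a time-range query becomes a scan over contiguous rows. This is an illustration only, not OpenTSDB's actual byte layout; the one-hour bucket, the 4-byte identifiers, and all function names are assumptions.

```python
import struct

ROW_SPAN_SECONDS = 3600  # assumed bucket size: one row per series per hour


def row_key(metric_id, timestamp, tag_ids):
    """Build a row key: metric id, then bucket start time, then tag ids.

    metric_id -- integer id standing in for the metric name (hypothetical)
    timestamp -- Unix time in seconds
    tag_ids   -- iterable of (tag_name_id, tag_value_id) integer pairs
    """
    base = timestamp - (timestamp % ROW_SPAN_SECONDS)
    # Big-endian packing keeps byte-wise sort order equal to numeric order,
    # so rows of one metric are stored in chronological order.
    key = struct.pack(">II", metric_id, base)
    for tag_name_id, tag_value_id in sorted(tag_ids):
        key += struct.pack(">II", tag_name_id, tag_value_id)
    return key


def column_offset(timestamp):
    """Offset of a data point inside its row, in seconds from the bucket start."""
    return timestamp % ROW_SPAN_SECONDS


# Two points from the same series and hour land in the same row.
tags = [(1, 7)]
assert row_key(42, 1296774600, tags) == row_key(42, 1296774660, tags)
print(column_offset(1296774600))
```

Putting the metric first and the time bucket second means a query for one metric over a time range touches a single contiguous key range, which is what keeps retrieval of many data points fast in this kind of design.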

OpenTSDB has been in use at StumbleUpon for almost a year and has played a key role in helping operations and engineering teams understand the behavior and performance of our systems, troubleshoot production issues, provide significant supporting material for postmortems, and do capacity planning and trend analysis. We constantly collect many hundreds of metrics and hundreds to thousands of data points per second.

Benoit Sigoure

StumbleUpon, Inc.

Benoit Sigoure is a software engineer with a strong UNIX/Linux background. His domains of interest include (but are not limited to):

  • compiler development
  • kernel / operating system design, development and security
  • image processing using stochastic processes
  • designing, building, and running reliable, super-large-scale distributed systems

Prior to managing StumbleUpon’s infrastructure, Benoit was part of the site reliability team running Google’s planetary-scale ad serving systems (for both AdWords and AdSense).

Comments

Benoit Sigoure
02/03/2011 3:10pm PST

Slides of the talk for those who asked for them.
