Sensors, IoT devices, servers, and other data sources are generating ever-larger volumes of time series data. This data is critical for many important use cases, from operational monitoring, troubleshooting, and DevOps to condition-based and predictive maintenance, real-time control, asset tracking, and more. These use cases require corresponding innovations in data infrastructure.
Most time series architectures today use column stores to support fast roll-ups on individual metrics (e.g., computing the average CPU load over the last 10-minute interval). However, these new applications often need to query data in more complex and arbitrary ways (e.g., counting how many sensors had temperatures greater than X, grouped by location, over the last 10-minute interval). Michael Freedman outlines a new distributed time series database designed for such workloads: one that supports highly efficient queries, including complex predicates across many metrics.
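To make the contrast between the two example queries concrete, here is a minimal in-memory sketch in Python. The `Reading` record, its field names, and the sample values are hypothetical and chosen only for illustration; this is not TimescaleDB code. The first function is a roll-up over a single metric, while the second combines a time-window filter, a predicate on a second metric, a distinct count, and a grouping key in one pass.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Reading:
    # Hypothetical wide row: one timestamp, several metrics, and metadata.
    time: datetime
    device_id: str
    location: str
    cpu: float
    temperature: float

def avg_cpu(rows, now, window=timedelta(minutes=10)):
    """Single-metric roll-up: average CPU over the last `window`."""
    vals = [r.cpu for r in rows if now - window <= r.time <= now]
    return sum(vals) / len(vals) if vals else None

def hot_sensors_by_location(rows, threshold, now, window=timedelta(minutes=10)):
    """Count distinct sensors whose temperature exceeded `threshold`
    in the last `window`, grouped by location."""
    seen = defaultdict(set)
    for r in rows:
        if now - window <= r.time <= now and r.temperature > threshold:
            seen[r.location].add(r.device_id)
    return {loc: len(ids) for loc, ids in seen.items()}

# Hypothetical sample data; the last reading falls outside the 10-minute window.
now = datetime(2017, 5, 1, 12, 0)
rows = [
    Reading(now - timedelta(minutes=1), "s1", "nyc", 0.5, 30.0),
    Reading(now - timedelta(minutes=2), "s2", "nyc", 0.7, 25.0),
    Reading(now - timedelta(minutes=3), "s3", "sf", 0.3, 35.0),
    Reading(now - timedelta(minutes=30), "s4", "sf", 0.9, 40.0),
]

print(avg_cpu(rows, now))
print(hot_sensors_by_location(rows, 28.0, now))
```

A store optimized only for the first pattern scans one metric column per query; the second pattern needs several columns and a grouping key evaluated together, which is the workload the talk targets.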
Michael describes how the characteristics of these time series workloads allow specific design decisions that enable both elastic scaling and efficient SQL-like queries, even though achieving both properties has remained elusive for general OLTP workloads. He explains how you can leverage these workload characteristics to perform smart time/space partitioning and placement of data while still exposing the abstraction of a single continuous table across all devices and time intervals. Time permitting, Michael will also cover various query optimizations that such an architecture enables.
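As a rough illustration of time/space partitioning behind a single logical table, the Python sketch below maps each row to a chunk keyed by (time bucket, space bucket) and assigns chunks to nodes. The chunk width, partition count, node names, and the round-robin placement rule are all hypothetical assumptions for illustration; TimescaleDB's actual partitioning and placement logic may differ.

```python
import zlib
from datetime import datetime, timedelta, timezone

CHUNK_INTERVAL = timedelta(hours=1)      # hypothetical time-partition width
N_SPACE_PARTITIONS = 4                   # hypothetical number of space partitions
NODES = ["node-a", "node-b", "node-c"]   # hypothetical cluster members
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def chunk_key(ts, device_id):
    """Map a row's (timestamp, device_id) to a (time_bucket, space_bucket) chunk."""
    time_bucket = (ts - EPOCH) // CHUNK_INTERVAL  # timedelta // timedelta -> int
    # crc32 is stable across processes, unlike Python's salted built-in hash().
    space_bucket = zlib.crc32(device_id.encode()) % N_SPACE_PARTITIONS
    return (time_bucket, space_bucket)

def node_for_chunk(key):
    """Hypothetical placement rule: spread chunks round-robin across nodes."""
    time_bucket, space_bucket = key
    return NODES[(time_bucket + space_bucket) % len(NODES)]

def chunks_for_range(start, end):
    """All chunks that may hold rows with start <= time < end.
    A query over a time range only has to touch these chunks."""
    first = (start - EPOCH) // CHUNK_INTERVAL
    last = (end - EPOCH - timedelta(microseconds=1)) // CHUNK_INTERVAL
    return [(t, s) for t in range(first, last + 1)
            for s in range(N_SPACE_PARTITIONS)]

# The application sees one continuous table; routing happens underneath.
start = datetime(2017, 5, 1, 10, 50, tzinfo=timezone.utc)
end = datetime(2017, 5, 1, 11, 10, tzinfo=timezone.utc)
key = chunk_key(datetime(2017, 5, 1, 10, 55, tzinfo=timezone.utc), "sensor-1")
print(key, node_for_chunk(key), len(chunks_for_range(start, end)))
```

Because rows arrive mostly in time order and queries usually name a time range, recent chunks stay small and hot while a range query prunes every chunk outside its window; the space dimension spreads concurrent writes from many devices across nodes.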
Michael presents these architecture insights in the context of TimescaleDB, a new distributed time series database. He demonstrates TimescaleDB’s use and provides performance numbers and intuition about the scenarios in which its design shines (and those in which it does not).
Michael J. Freedman is the cofounder and CTO of TimescaleDB, an open source database that scales SQL for time series data, and a professor of computer science at Princeton University, where his research focuses on distributed systems, networking, and security. Previously, Michael developed CoralCDN (a decentralized CDN serving millions of daily users) and Ethane (the basis for OpenFlow and software-defined networking) and cofounded Illuminics Systems (acquired by Quova, now part of Neustar). He is a technical advisor to Blockstack. Michael’s honors include the Presidential Early Career Award for Scientists and Engineers (PECASE, given by President Obama), the SIGCOMM Test of Time Award, the Caspar Bowden Award for Privacy Enhancing Technologies, a Sloan Fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, a DARPA Computer Science Study Group membership, and multiple award-winning publications. He holds a PhD in computer science from NYU’s Courant Institute and bachelor’s and master’s degrees from MIT.
©2017, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.