Sensors, IoT devices, servers, and other data sources are generating ever-larger volumes of time series data. This data is critical for a wide range of use cases, from operational monitoring, troubleshooting, and DevOps to condition-based and predictive maintenance, real-time control, asset tracking, and more, and these applications demand corresponding innovations in data infrastructure.
Most time series architectures today use column stores to support fast roll-ups on individual metrics (e.g., compute the average CPU load over the last 10-minute interval). However, these new applications often need to query data in more complex and arbitrary ways (e.g., compute how many sensors had temperatures greater than X, grouped by location, over the last 10-minute interval). Michael Freedman outlines a new distributed time series database designed for such workloads (i.e., one that supports highly efficient queries, including complex predicates across many metrics).
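To make the contrast between the two example queries concrete, here is a minimal in-memory sketch. The `Reading` record, its fields (including a per-reading `location`), and the 600-second window are illustrative assumptions, not part of any real database API. The first function is a roll-up over a single metric; the second combines a time-window filter, a predicate on another metric, a distinct count, and a grouping on metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    # Hypothetical row shape: one reading per sensor per timestamp.
    ts: float          # seconds since epoch
    sensor_id: str
    location: str      # metadata attached to each reading (assumed)
    cpu: float
    temperature: float


def avg_cpu(readings, now, window=600.0):
    """Roll-up on one metric: average CPU load over the last `window` seconds."""
    vals = [r.cpu for r in readings if now - window <= r.ts <= now]
    return sum(vals) / len(vals) if vals else None


def hot_sensors_by_location(readings, now, threshold, window=600.0):
    """Count distinct sensors whose temperature exceeded `threshold`
    during the last `window` seconds, grouped by location."""
    hot = defaultdict(set)
    for r in readings:
        if now - window <= r.ts <= now and r.temperature > threshold:
            hot[r.location].add(r.sensor_id)
    return {loc: len(ids) for loc, ids in hot.items()}
```

The first query touches a single column and is a natural fit for a column store; the second mixes predicates across metrics and metadata, which is the kind of workload the talk targets.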
Michael describes how the characteristics of these time series workloads permit specific design decisions that enable both elastic scaling and efficient SQL-like queries, even though achieving both properties has remained elusive for general OLTP workloads. He explains how these workload characteristics can be leveraged to perform smart time/space partitioning and placement of data while still exposing the abstraction of a single continuous table across all devices and time intervals. Time permitting, Michael will also cover various query optimizations that this architecture enables.
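The idea of time/space partitioning behind a single logical table can be sketched as follows. This is a hedged illustration of the general technique, not TimescaleDB's implementation: the hourly `CHUNK_INTERVAL`, the four hash-based space partitions, and the `PartitionedTable` class are all hypothetical choices. Each row is routed to a chunk keyed by (time bucket, space partition), and a time-range query scans only the chunks whose time bucket overlaps the requested range.

```python
import math
import zlib
from collections import defaultdict

CHUNK_INTERVAL = 3600      # hypothetical: one time bucket per hour
NUM_SPACE_PARTITIONS = 4   # hypothetical: devices hashed into 4 partitions


def chunk_key(ts, device_id):
    """Map a row to its (time bucket, space partition) chunk."""
    time_bucket = int(ts // CHUNK_INTERVAL)
    # crc32 is stable across runs, unlike Python's salted hash() for str.
    space_part = zlib.crc32(device_id.encode()) % NUM_SPACE_PARTITIONS
    return (time_bucket, space_part)


class PartitionedTable:
    """One logical table whose rows are physically stored in chunks."""

    def __init__(self):
        self.chunks = defaultdict(list)

    def insert(self, ts, device_id, value):
        self.chunks[chunk_key(ts, device_id)].append((ts, device_id, value))

    def chunks_for_range(self, start, end):
        """Return only chunks whose time bucket overlaps [start, end)."""
        lo = int(start // CHUNK_INTERVAL)
        hi = math.ceil(end / CHUNK_INTERVAL) - 1
        return [k for k in self.chunks if lo <= k[0] <= hi]

    def select(self, start, end):
        """Scan the overlapping chunks, then filter rows exactly."""
        rows = []
        for k in self.chunks_for_range(start, end):
            rows.extend(r for r in self.chunks[k] if start <= r[0] < end)
        return sorted(rows)
```

Because recent data lands in a small number of recent chunks, inserts stay local, and queries over a bounded time window can skip every chunk outside it; the caller still sees one continuous table.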
Michael presents these architecture insights in the context of TimescaleDB, a new distributed time series database. He demonstrates how to use TimescaleDB and provides performance numbers and intuition about the scenarios in which its design shines (and those in which it does not).
Michael J. Freedman is a professor in the Computer Science department at Princeton University as well as the cofounder and CTO of Timescale, which provides an open source time series database optimized for fast ingest and complex queries. His research broadly focuses on distributed systems, networking, and security. He developed and operates several self-managing systems, including CoralCDN (a decentralized content distribution network) and DONAR (a server resolution system that powered the FCC’s Consumer Broadband Test), both of which serve millions of users daily. Michael’s other research has included software-defined and service-centric networking, cloud storage and data management, untrusted cloud services, fault-tolerant distributed systems, virtual world systems, peer-to-peer systems, and various privacy-enhancing and anticensorship systems. Michael’s work on IP geolocation and intelligence led him to cofound Illuminics Systems, which was acquired by Quova (now part of Neustar). His work on programmable enterprise networking (Ethane) helped form the basis for the OpenFlow/software-defined networking (SDN) architecture. His honors include the Presidential Early Career Award for Scientists and Engineers (PECASE), a Sloan fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, DARPA Computer Science Study Group membership, and multiple award-winning publications. Michael holds a PhD in computer science from NYU’s Courant Institute and both an SB and an MEng degree from MIT.
©2017, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.