Time series data is emerging everywhere, from IoT sensors and industrial machines to transportation and logistics to DevOps and monitoring to finance. Many users start by storing their time series data in a relational database, but once their data reaches a certain scale, they give up its query power and ecosystem by migrating to some NoSQL or “modern” time series architecture.
Michael Freedman offers an overview of TimescaleDB, a new open source, scale-out database designed for time series workloads and engineered as a PostgreSQL extension. Available under the Apache 2 license, TimescaleDB supports full SQL while offering performance improvements for both single-node and cluster deployments.
Michael explains why the characteristics and needs of time series workloads (compared to general OLTP and even OLAP workloads) present a new point in the design space of databases and how TimescaleDB was architected to embrace these differences. TimescaleDB automatically partitions data across both time and space while presenting the illusion of a single, continuous table (a hypertable) over all your data, whether it lives on one server or many. Michael details the design of TimescaleDB’s dynamic chunking mechanisms, which reason about both time intervals and table sizes to provide scalable performance without manual tuning or configuration. He also covers its distributed query optimizations, which hide the fact that users are interacting with many chunks of data spread across one or many servers and minimize the number of chunks accessed to answer a query. Along the way, Michael shares benchmarks demonstrating that TimescaleDB provides constant insert performance as the database scales and avoids the “performance cliff” that vanilla PostgreSQL experiences when writing to tables of tens to hundreds of millions of rows, while also offering superior query performance to both Postgres and other time series databases across a variety of complex queries.
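To make the idea of partitioning across time and space concrete, here is a toy sketch in Python. It is not TimescaleDB's implementation: the names `ToyHypertable`, `chunk_key`, `CHUNK_INTERVAL_S`, and `SPACE_PARTITIONS` are hypothetical, and the fixed one-hour interval and CRC32 hash are assumptions chosen only for illustration. It shows how rows can be routed to chunks by timestamp and device, and how a query can skip chunks that cannot contain matching rows.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical settings for this sketch only; a real system would tune these.
CHUNK_INTERVAL_S = 3600   # one chunk per hour of data (time dimension)
SPACE_PARTITIONS = 4      # hash partitions on the device id (space dimension)

Row = Tuple[int, str, float]  # (unix timestamp, device id, value)


def chunk_key(ts: int, device_id: str) -> Tuple[int, int]:
    """Map a row to its (time bucket, space partition) chunk."""
    time_bucket = ts // CHUNK_INTERVAL_S
    space_partition = zlib.crc32(device_id.encode()) % SPACE_PARTITIONS
    return time_bucket, space_partition


@dataclass
class ToyHypertable:
    """Looks like one table to the caller, but stores rows in many chunks."""
    chunks: Dict[Tuple[int, int], List[Row]] = field(default_factory=dict)

    def insert(self, ts: int, device_id: str, value: float) -> None:
        key = chunk_key(ts, device_id)
        self.chunks.setdefault(key, []).append((ts, device_id, value))

    def query(self, start: int, end: int,
              device_id: Optional[str] = None) -> Tuple[List[Row], int]:
        """Return rows with start <= ts < end and the number of chunks scanned.

        Assumes end > start. Chunks outside the time range, or in a different
        space partition than the requested device, are never scanned.
        """
        first = start // CHUNK_INTERVAL_S
        last = (end - 1) // CHUNK_INTERVAL_S
        wanted = None if device_id is None else chunk_key(0, device_id)[1]
        rows: List[Row] = []
        scanned = 0
        for (bucket, part), chunk in self.chunks.items():
            if not first <= bucket <= last:
                continue  # excluded by time
            if wanted is not None and part != wanted:
                continue  # excluded by space
            scanned += 1
            rows.extend(
                r for r in chunk
                if start <= r[0] < end
                and (device_id is None or r[1] == device_id)
            )
        return sorted(rows), scanned
```

Calling `query` with both a time range and a device id touches only the chunks whose time bucket overlaps the range and whose space partition matches the device, which is the intuition behind minimizing the chunks accessed per query.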
Michael J. Freedman is a professor in the Computer Science Department at Princeton University and the cofounder and CTO of TimescaleDB, which provides an open source time series database optimized for fast ingest and complex queries. His research broadly focuses on distributed systems, networking, and security. He developed and operates several self-managing systems, including CoralCDN (a decentralized content distribution network) and DONAR (a server resolution system that powered the FCC’s Consumer Broadband Test), both of which serve millions of users daily. Michael’s other research has included software-defined and service-centric networking, cloud storage and data management, untrusted cloud services, fault-tolerant distributed systems, virtual world systems, peer-to-peer systems, and various privacy-enhancing and anticensorship systems. Michael’s work on IP geolocation and intelligence led him to cofound Illuminics Systems, which was acquired by Quova (now part of Neustar). His work on programmable enterprise networking (Ethane) helped form the basis for the OpenFlow/software-defined networking (SDN) architecture. His honors include the Presidential Early Career Award for Scientists and Engineers (PECASE), a Sloan fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, DARPA Computer Science Study Group membership, and multiple award publications. Michael holds a PhD in computer science from NYU’s Courant Institute and both an SB and an MEng degree from MIT.
©2017, O'Reilly Media, Inc.