Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

TimescaleDB: Reengineering PostgreSQL as a time series database

Michael Freedman (TimescaleDB | Princeton)
1:50pm–2:30pm Thursday, March 8, 2018
Data engineering and architecture
Location: 230 A
Level: Intermediate
Secondary topics: Graphs and Time-series

Who is this presentation for?

  • Engineers, DBAs, product managers, and data analysts

Prerequisite knowledge

  • A basic understanding of databases and data storage

What you'll learn

  • Explore TimescaleDB, a new open source database designed for time series workloads and built as an extension to PostgreSQL

Description

Time series data is now everywhere (IoT, user event streams, system monitoring, finance, adtech, industrial control, transportation, and logistics) and is increasingly used to power core applications. It also creates a number of technical challenges: ingesting high volumes of structured data; running complex, performant queries over both recent and historical time intervals; and performing specialized time-centric analysis and data management. Nor does this data exist in isolation: entries must often be joined against other relational data to answer key business questions (e.g., tracking a shipping container is much more powerful once combined with information about its goods).
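The shipping-container question is the kind that needs a relational JOIN between a time series and its metadata. Below is a minimal sketch of that query shape. It uses Python's built-in `sqlite3` module purely as a stand-in relational engine, not TimescaleDB. The tables `containers` and `readings`, their columns, and the helper `too_warm` are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical container tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE containers (            -- relational metadata
            container_id TEXT PRIMARY KEY,
            goods        TEXT NOT NULL
        );
        CREATE TABLE readings (              -- time series
            ts           INTEGER NOT NULL,   -- Unix seconds
            container_id TEXT NOT NULL REFERENCES containers (container_id),
            temp_c       REAL NOT NULL
        );
    """)
    conn.executemany("INSERT INTO containers VALUES (?, ?)",
                     [("C1", "vaccines"), ("C2", "furniture")])
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                     [(1000, "C1", 4.0), (1060, "C1", 9.5),
                      (1000, "C2", 22.0), (1060, "C2", 30.0)])
    return conn


def too_warm(conn, goods, max_temp_c):
    """Readings above max_temp_c for containers carrying the given goods."""
    return conn.execute("""
        SELECT r.container_id, c.goods, r.ts, r.temp_c
        FROM readings AS r
        JOIN containers AS c ON c.container_id = r.container_id
        WHERE c.goods = ? AND r.temp_c > ?
        ORDER BY r.ts
    """, (goods, max_temp_c)).fetchall()


conn = build_demo_db()
print(too_warm(conn, "vaccines", 8.0))  # [('C1', 'vaccines', 1060, 9.5)]
```

When the readings and the metadata live in one SQL database, this stays a single query. A polyglot stack would instead need application-side glue to combine results from two stores.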

Many developers working with time series data turn to polyglot solutions: a NoSQL database to store their time series data (for scale) and a relational database for associated metadata and key business data. This leads to engineering complexity, operational challenges, and even referential integrity concerns. As this type of data proliferates, many have concluded that they need a purpose-built time series database. Yet current time series databases still fall short, forcing users back into the same choice between complex polyglot stacks and immature solutions.

Michael Freedman explains why these operational headaches are unnecessary and offers an overview of TimescaleDB, a new open source, scale-out database designed for time series workloads and built as an extension to PostgreSQL. In creating TimescaleDB, Michael reengineered PostgreSQL as a time series database to simplify time series application development. In particular, time series workloads (appending data about recent events) place different demands on a database than transactional (OLTP) workloads do. By taking advantage of these differences, TimescaleDB can improve insert rates by 20x over vanilla Postgres and achieve much faster queries, while still offering full SQL (including JOINs). This lets teams simplify their product and stack around a single database while asking much more complex and ad hoc questions about their data.

TimescaleDB is implemented as a PostgreSQL extension and is available under the Apache 2 license. It supports full SQL while offering performance improvements for both single-node and cluster deployments. It achieves this by storing data on an individual server in a manner more common to distributed systems: heavily partitioning (sharding) data into chunks so that the hot chunks holding recent records stay in memory. This right-sized chunking is performed automatically, and the database can even adapt its chunk sizes based on observed resource demands. TimescaleDB hides the chunks behind a hypertable that can be inserted into or queried like a single table, even at 100B+ rows spread over 10K+ chunks. While this adds a few milliseconds to query planning, it lets TimescaleDB avoid the performance cliff that Postgres hits at larger table sizes (tens of millions of rows).

Michael Freedman

TimescaleDB | Princeton

Michael J. Freedman is a professor in the Computer Science Department at Princeton University and the cofounder and CTO of TimescaleDB, which provides an open source time series database optimized for fast ingest and complex queries. His research broadly focuses on distributed systems, networking, and security. He developed and operates several self-managing systems, including CoralCDN (a decentralized content distribution network) and DONAR (a server resolution system that powered the FCC’s Consumer Broadband Test), both of which serve millions of users daily. Michael’s other research has included software-defined and service-centric networking, cloud storage and data management, untrusted cloud services, fault-tolerant distributed systems, virtual world systems, peer-to-peer systems, and various privacy-enhancing and anticensorship systems. Michael’s work on IP geolocation and intelligence led him to cofound Illuminics Systems, which was acquired by Quova (now part of Neustar). His work on programmable enterprise networking (Ethane) helped form the basis for the OpenFlow/software-defined networking (SDN) architecture. His honors include the Presidential Early Career Award for Scientists and Engineers (PECASE), a Sloan fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, DARPA Computer Science Study Group membership, and multiple award-winning publications. Michael holds a PhD in computer science from NYU’s Courant Institute and both an SB and an MEng degree from MIT.
