Performant time series data management and analytics with PostgreSQL

Michael Freedman (TimescaleDB | Princeton University)

11:20am–12:00pm Thursday, September 26, 2019

Location: 1A 23/24

Data Engineering and Architecture

Secondary topics: Data Management and Storage, Streaming and IoT

Who is this presentation for?

Software developers, database administrators (DBAs), product managers, and data analysts

Level

Intermediate

Description

Time series databases are one of the fastest-growing segments of the database market, spreading across industries and use cases. Common requirements include ingesting high volumes of structured data; answering complex queries quickly for both recent and historical time intervals; and performing specialized time-centric analysis and data management. Today, many developers working with time series data turn to polyglot solutions: a NoSQL database to store their time series data (for scale) and a relational database for associated metadata and key business data. Yet this leads to engineering complexity, operational challenges, and even referential integrity concerns.

Michael Freedman explains how you can avoid these operational problems by re-engineering PostgreSQL to serve as a general data platform, including for high-volume time series workloads. In particular, TimescaleDB is an open source time series database, implemented as a PostgreSQL extension, that improves insert rates by 20x over vanilla PostgreSQL and delivers much faster queries, all while offering full SQL (including JOINs). TimescaleDB achieves this by storing data on an individual server in a manner more common to distributed systems: heavily partitioning (sharding) data into chunks to ensure that hot chunks corresponding to recent time records are maintained in memory.
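
The chunking idea can be illustrated with a small sketch. This is not TimescaleDB's implementation; it is a minimal in-memory model, assuming a fixed one-day chunk interval, in which each row is routed to a chunk by its timestamp and a time-range query touches only the chunks that overlap the range. The names `ChunkedTable`, `chunk_start`, and `CHUNK_INTERVAL` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: chunks cover fixed one-day intervals.
CHUNK_INTERVAL = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts, interval=CHUNK_INTERVAL):
    """Return the start time of the chunk that contains ts."""
    n = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    return EPOCH + n * interval


class ChunkedTable:
    """Toy table that partitions rows into time-based chunks."""

    def __init__(self, interval=CHUNK_INTERVAL):
        self.interval = interval
        self.chunks = defaultdict(list)  # chunk start -> list of (ts, value)

    def insert(self, ts, value):
        # Recent timestamps land in the same few chunks, so those
        # chunks are the ones that stay "hot".
        self.chunks[chunk_start(ts, self.interval)].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, skipping non-overlapping chunks."""
        rows = []
        for cstart, chunk in self.chunks.items():
            if cstart < end and cstart + self.interval > start:
                rows.extend(r for r in chunk if start <= r[0] < end)
        return sorted(rows, key=lambda r: r[0])
```

In this model, inserts for recent timestamps concentrate in a small number of chunks, which is the property that lets a real system keep those chunks and their indexes in memory.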

You’ll discover two newly released features of TimescaleDB that ease time series data management: the automated adaptation of time-partitioning intervals, which the database learns by observing data volumes; and continuous aggregations computed in near real time, in a manner robust to late-arriving data and transparently supporting queries across different aggregation levels. You’ll also see how these capabilities have been leveraged across several different use cases.
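
One way to picture continuous aggregation that tolerates late-arriving data is the sketch below. It is a simplified model under stated assumptions, not a description of how TimescaleDB implements the feature: it assumes hourly buckets, integer epoch-second timestamps, and an invalidation set that records which buckets have changed since the last refresh. The class name `ContinuousAverage` and its methods are hypothetical.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # assumption: aggregate into hourly buckets


class ContinuousAverage:
    """Toy materialized hourly average with invalidation tracking."""

    def __init__(self):
        self.raw = defaultdict(list)  # bucket start -> raw values
        self.materialized = {}        # bucket start -> stored average
        self.invalid = set()          # buckets changed since last refresh

    def insert(self, epoch_seconds, value):
        bucket = epoch_seconds - epoch_seconds % BUCKET_SECONDS
        self.raw[bucket].append(value)
        # A late row simply invalidates its (old) bucket.
        self.invalid.add(bucket)

    def refresh(self):
        """Recompute only invalidated buckets; return the buckets refreshed."""
        refreshed = sorted(self.invalid)
        for bucket in refreshed:
            values = self.raw[bucket]
            self.materialized[bucket] = sum(values) / len(values)
        self.invalid.clear()
        return refreshed
```

With this design, a row that arrives for an hour that was already aggregated causes only that hour's bucket to be recomputed on the next refresh, rather than the whole aggregate.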

Prerequisite knowledge

A basic understanding of databases and data storage

What you'll learn

Understand why combining NoSQL and relational databases for time series data often leads to unnecessary complexity; how this complexity can be avoided by re-engineering PostgreSQL to serve as a general data platform; and how TimescaleDB, which is implemented as a PostgreSQL extension, improves insert rates by 20x over vanilla PostgreSQL and achieves much faster queries while offering full SQL

Michael Freedman

TimescaleDB | Princeton University

Michael J. Freedman is the cofounder and CTO of TimescaleDB and a full professor of computer science at Princeton University. His work broadly focuses on distributed and storage systems, networking, and security, and his publications have more than 12,000 citations. He developed CoralCDN (a decentralized content distribution network serving millions of daily users) and helped design Ethane (which formed the basis for OpenFlow and software-defined networking). Previously, he cofounded Illuminics Systems (acquired by Quova, now part of Neustar) and served as a technical advisor to Blockstack. Michael’s honors include a Presidential Early Career Award for Scientists and Engineers (given by President Obama), the SIGCOMM Test of Time Award, a Sloan Fellowship, an NSF CAREER award, the Office of Naval Research Young Investigator award, and support from the DARPA Computer Science Study Group. He earned his PhD at NYU and Stanford and his undergraduate and master’s degrees at MIT.