Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Performant time series data management and analytics with PostgreSQL

Michael Freedman (TimescaleDB | Princeton University)
14:55–15:35 Thursday, 2 May 2019
Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)
Average rating: 4.75 (4 ratings)

Who is this presentation for?

  • Developers, DevOps, and anyone with time series data

Level

Intermediate

What you'll learn

  • Understand the abilities of SQL versus NoSQL databases
  • Learn how to most efficiently ingest high volumes of structured data
  • See how, with a little reengineering, Postgres can serve as a general data platform

Description

Time series databases are one of the fastest-growing segments of the database market, spreading across industries and use cases. Common requirements include ingesting high volumes of structured data, answering complex queries performantly for both recent and historical time intervals, and performing specialized time-centric analysis and data management.

Today, many developers working with time series data turn to polyglot solutions: a NoSQL database to store their time series data (for scale) and a relational database for associated metadata and key business data. Yet this leads to engineering complexity, operational challenges, and even referential integrity concerns.

Michael Freedman explains how to avoid these operational problems by reengineering Postgres to serve as a general data platform, including for high-volume time series workloads. You’ll learn how the open source time series database TimescaleDB, implemented as a Postgres plug-in, improves insert rates by 20x over vanilla Postgres and delivers much faster queries, while offering full SQL (including JOINs). TimescaleDB achieves this by storing data on an individual server in a manner more common to distributed systems: heavily partitioning (sharding) data into chunks to ensure that hot chunks corresponding to recent time records are maintained in memory.
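The time-based chunking idea described above can be illustrated with a small conceptual sketch. This is not TimescaleDB's implementation: the names `ChunkedTable`, `chunk_start`, and `CHUNK_INTERVAL` are hypothetical, the one-day chunk width is an arbitrary assumption, and everything lives in ordinary Python memory rather than in Postgres tables.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

CHUNK_INTERVAL = timedelta(days=1)  # hypothetical chunk width, chosen for illustration
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts, interval=CHUNK_INTERVAL):
    """Return the start of the fixed-width time chunk that contains ts."""
    index = (ts - EPOCH) // interval  # timedelta // timedelta yields an int
    return EPOCH + index * interval


class ChunkedTable:
    """Toy in-memory table that routes each row to a per-interval chunk."""

    def __init__(self, interval=CHUNK_INTERVAL):
        self.interval = interval
        self.chunks = defaultdict(list)  # chunk start -> list of (ts, value)

    def insert(self, ts, value):
        # Recent rows all land in the same few chunks, so those stay "hot".
        self.chunks[chunk_start(ts, self.interval)].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, skipping non-overlapping chunks."""
        rows = []
        for c_start, c_rows in self.chunks.items():
            if c_start < end and c_start + self.interval > start:
                rows.extend(r for r in c_rows if start <= r[0] < end)
        return sorted(rows)
```

In this sketch, inserts for recent timestamps touch only the newest chunks, and a time-range query can exclude whole chunks before looking at any rows, which is the intuition behind keeping the working set small enough to stay in memory.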

Michael focuses on two newly released features of TimescaleDB: the automated adaptation of time-partitioning intervals, which the database learns by observing data volumes, and continuous aggregations in near real time, in a manner robust to late-arriving data and transparently supporting queries across different aggregation levels. You’ll discover how these capabilities ease time series data management and how they have been leveraged across several different use cases.
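To make the idea of continuous aggregation with late-arriving data concrete, here is a minimal sketch under stated assumptions. It is not how TimescaleDB implements the feature: the class `ContinuousAverage` and the helper `bucket` are hypothetical, the aggregate is a simple average, and the rollup assumes the coarse width is an aligned multiple of the fine width.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket(ts, width):
    """Return the start of the fixed-width time bucket that contains ts."""
    return EPOCH + ((ts - EPOCH) // width) * width


class ContinuousAverage:
    """Keeps per-bucket [sum, count] so averages update incrementally."""

    def __init__(self, width=timedelta(minutes=1)):
        self.width = width
        self.state = defaultdict(lambda: [0.0, 0])

    def add(self, ts, value):
        # A late row simply updates the older bucket it belongs to;
        # no raw data needs to be rescanned.
        acc = self.state[bucket(ts, self.width)]
        acc[0] += value
        acc[1] += 1

    def average(self, ts):
        """Return the current average of the bucket containing ts, or None."""
        total, count = self.state.get(bucket(ts, self.width), (0.0, 0))
        return total / count if count else None

    def rollup(self, coarse_width):
        """Derive coarser averages from the fine buckets (assumes alignment)."""
        out = defaultdict(lambda: [0.0, 0])
        for start, (total, count) in self.state.items():
            acc = out[bucket(start, coarse_width)]
            acc[0] += total
            acc[1] += count
        return {start: total / count for start, (total, count) in out.items()}
```

Storing sums and counts rather than finished averages is the design choice that lets a late row be folded into its bucket and lets coarser aggregation levels be computed from finer ones.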

Michael Freedman

TimescaleDB | Princeton University

Michael J. Freedman is the cofounder and CTO of TimescaleDB and a full professor of computer science at Princeton University. His work broadly focuses on distributed and storage systems, networking, and security, and his publications have more than 12,000 citations. He developed CoralCDN (a decentralized content distribution network serving millions of daily users) and helped design Ethane (which formed the basis for OpenFlow and software-defined networking). Previously, he cofounded Illuminics Systems (acquired by Quova, now part of Neustar) and served as a technical advisor to Blockstack. Michael’s honors include a Presidential Early Career Award for Scientists and Engineers (given by President Obama), the SIGCOMM Test of Time Award, a Sloan Fellowship, an NSF CAREER award, the Office of Naval Research Young Investigator award, and support from the DARPA Computer Science Study Group. He earned his PhD at NYU and Stanford and his undergraduate and master’s degrees at MIT.