Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

When boring is awesome: Making PostgreSQL scale for time-series data

Michael Freedman (Timescale | Princeton University)
1:15pm–1:55pm Wednesday, September 27, 2017
Data engineering, Stream processing and analytics
Location: 1E 07/08
Level: Intermediate
Secondary topics: Architecture, IoT, Streaming

Who is this presentation for?

Engineers, DBAs and operations, technical product managers, and data analysts

Prerequisite knowledge

Basic knowledge of databases and distributed storage

What you'll learn

Understand how time-series data differs from traditional database use cases, how those differences translate into architectural design decisions, and why a radical shift to a NoSQL database is not necessary for scale

Description

Time-series data is emerging everywhere, from IoT sensors and industrial machines, to transportation and logistics, to devops and monitoring, to finance. We have found that many users start by storing their time-series data in a relational database, but then, once their data reaches a certain scale, give up its query power and ecosystem by migrating to some NoSQL or “modern” time-series architecture. Yet this “SQL or scale” tradeoff is a false narrative: we’ve built an efficient, scale-out time-series database engineered up from PostgreSQL.

In this talk, I describe why the characteristics and needs of time-series workloads (compared to general OLTP and even OLAP workloads) present a new point in the design space of databases, and how we’ve architected TimescaleDB to embrace these differences. TimescaleDB automatically partitions data across both time and space, while exposing the illusion of a single continuous table (a hypertable) across all your data, whether it is spread across one server or many. Its distributed query optimizations both hide the fact that users are interacting with many chunks of data spread across one or many servers and minimize which chunks are accessed, and how, to answer queries. I detail the design of TimescaleDB’s dynamic chunking mechanisms, which reason about both time intervals and table sizes to provide scalable performance while avoiding any manual tuning or configuration.
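The time-and-space partitioning and chunk exclusion described above can be illustrated with a toy, in-memory Python model. This is a minimal sketch under stated assumptions, not TimescaleDB's implementation: the one-hour chunk interval, the four hash partitions keyed on a hypothetical `device_id` column, the `Hypertable` class, and the `suggest_interval` sizing heuristic are all illustrative inventions.

```python
import zlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Illustrative parameters (assumptions, not TimescaleDB defaults).
CHUNK_INTERVAL = timedelta(hours=1)  # width of each chunk along the time dimension
NUM_SPACE_PARTITIONS = 4             # number of hash partitions along the space dimension
EPOCH = datetime(1970, 1, 1)         # origin for aligning time buckets

Row = Tuple[datetime, str, float]    # (time, device_id, value); device_id is hypothetical
ChunkKey = Tuple[int, int]           # (time bucket index, space partition index)


def chunk_key(ts: datetime, device_id: str, interval: timedelta = CHUNK_INTERVAL) -> ChunkKey:
    """Map a row to its chunk by bucketing its timestamp and hashing its space key."""
    time_bucket = (ts - EPOCH) // interval  # timedelta // timedelta -> int
    # crc32 gives a hash that is stable across runs, unlike Python's salted hash().
    space_bucket = zlib.crc32(device_id.encode("utf-8")) % NUM_SPACE_PARTITIONS
    return (time_bucket, space_bucket)


@dataclass
class Hypertable:
    """Toy hypertable: callers see one table, but rows live in many small chunks."""
    interval: timedelta = CHUNK_INTERVAL
    chunks: Dict[ChunkKey, List[Row]] = field(default_factory=dict)

    def insert(self, ts: datetime, device_id: str, value: float) -> None:
        # Route each row to its chunk, creating the chunk on first use.
        key = chunk_key(ts, device_id, self.interval)
        self.chunks.setdefault(key, []).append((ts, device_id, value))

    def query(self, start: datetime, end: datetime) -> Tuple[List[Row], int]:
        """Return rows with start <= time < end and the number of chunks scanned."""
        rows: List[Row] = []
        scanned = 0
        for (time_bucket, _space), chunk_rows in self.chunks.items():
            chunk_start = EPOCH + time_bucket * self.interval
            chunk_end = chunk_start + self.interval
            # Chunk exclusion: skip any chunk whose time range cannot overlap [start, end).
            if chunk_end <= start or chunk_start >= end:
                continue
            scanned += 1
            rows.extend(r for r in chunk_rows if start <= r[0] < end)
        rows.sort()  # present results as if they came from one continuous table
        return rows, scanned


def suggest_interval(bytes_per_hour: float, target_chunk_bytes: float,
                     num_partitions: int = NUM_SPACE_PARTITIONS) -> timedelta:
    """Hypothetical sizing heuristic: choose a time interval so that each chunk is
    expected to hold about target_chunk_bytes, given the recent ingest rate spread
    evenly across num_partitions space partitions."""
    hours = target_chunk_bytes * num_partitions / bytes_per_hour
    return max(timedelta(hours=hours), timedelta(minutes=1))  # floor at one minute


if __name__ == "__main__":
    ht = Hypertable()
    base = datetime(2017, 9, 27)
    for minute in range(0, 6 * 60, 10):  # six hours of readings, every 10 minutes
        for device in ("dev-a", "dev-b", "dev-c"):
            ht.insert(base + timedelta(minutes=minute), device, float(minute))
    rows, scanned = ht.query(base + timedelta(hours=2), base + timedelta(hours=3))
    print(f"{len(ht.chunks)} chunks total, {scanned} scanned, {len(rows)} rows returned")
```

In this toy model, a one-hour query touches only the chunks for that hour, however many hours of data the table holds; every other chunk is excluded by comparing time ranges alone, without reading its rows.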

TimescaleDB is implemented as a PostgreSQL extension and available under the Apache 2 license. It supports full SQL while offering performance improvements for both single-node and cluster deployments. In particular, I’ll show benchmarks demonstrating that it provides constant insert performance as the database grows, avoiding the “performance cliff” that vanilla PostgreSQL experiences when writing to tables of tens to hundreds of millions of rows, as well as superior query performance to both PostgreSQL and other time-series databases across a variety of complex queries.

Michael Freedman

Timescale | Princeton University

Michael J. Freedman is a Professor in the Computer Science Department at Princeton University, as well as the co-founder and CTO of Timescale, building an open-source database that scales out SQL for time-series data. His work broadly focuses on distributed systems, networking, and security.

He developed and operated several self-managing systems — including CoralCDN, a decentralized content distribution network, and DONAR, a server resolution system that powered the FCC’s Consumer Broadband Test — which reached millions of users daily. Freedman’s work on IP geolocation and intelligence led him to co-found Illuminics Systems, which was acquired by Quova (now part of Neustar) in 2006. His work on programmable enterprise networking (Ethane) helped form the basis for the OpenFlow / software-defined networking (SDN) architecture. Freedman is also a technical advisor to Blockstack, building decentralized services leveraging the blockchain.

Honors include a Presidential Early Career Award for Scientists and Engineers (PECASE, given by President Obama), Sloan Fellowship, NSF CAREER Award, Office of Naval Research Young Investigator Award, DARPA Computer Science Study Group membership, and multiple award-winning publications. He received his Ph.D. in computer science from NYU’s Courant Institute and his S.B. and M.Eng. degrees from MIT.
