Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Designing a time series database to support IoT workloads

Michael Freedman (TimescaleDB | Princeton)
5:10pm–5:50pm Wednesday, March 15, 2017
Real-time applications
Location: LL20 D
Level: Intermediate
Secondary topics: IoT, Streaming
Average rating: *****
(5.00, 3 ratings)

Who is this presentation for?

  • Engineers, technical product managers, and data analysts

Prerequisite knowledge

  • Basic knowledge of databases and distributed storage

What you'll learn

  • Understand how to design and think architecturally about scalable databases, with a particular focus on machine and time series data

Description

Sensors, IoT devices, servers, and other data sources are generating ever-larger volumes of time series data. This data is critical for many use cases, from operational monitoring, troubleshooting, and DevOps to condition-based and predictive maintenance, real-time control, asset tracking, and more, and these use cases demand corresponding innovations in data infrastructure.

Most time series architectures today use column stores to support fast roll-ups on individual metrics (e.g., compute the average CPU load over the last 10-minute interval). However, these new applications often need to query data in more complex and arbitrary ways (e.g., compute how many sensors had temperatures greater than X, grouped by location, over the last 10-minute interval). Michael Freedman outlines a new distributed time series database designed for such workloads, one that supports highly efficient queries, including complex predicates across many metrics.
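To make the contrast between the two query shapes concrete, here is a minimal in-memory sketch in plain Python. It is not TimescaleDB's API; it assumes a hypothetical `Reading` row that carries several metrics (`cpu`, `temperature`) plus a `location` tag. The first function is a single-metric roll-up; the second combines a time window, a value predicate on one metric, and a grouping on another column, which is the kind of query the description says column-oriented, per-metric layouts handle less naturally.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reading:
    # Hypothetical "wide" row: one row per device per timestamp, several metrics.
    time: datetime
    device_id: str
    location: str
    cpu: float
    temperature: float


def avg_cpu(rows, now, window=timedelta(minutes=10)):
    """Roll-up on a single metric: average CPU load over the last `window`."""
    vals = [r.cpu for r in rows if now - window <= r.time <= now]
    return sum(vals) / len(vals) if vals else None


def hot_sensors_by_location(rows, now, threshold, window=timedelta(minutes=10)):
    """Predicate plus grouping: number of distinct sensors whose temperature
    exceeded `threshold` in the last `window`, per location."""
    hot = defaultdict(set)
    for r in rows:
        if now - window <= r.time <= now and r.temperature > threshold:
            hot[r.location].add(r.device_id)
    return {loc: len(devs) for loc, devs in hot.items()}


if __name__ == "__main__":
    now = datetime(2017, 3, 15, 17, 10)
    rows = [
        Reading(now - timedelta(minutes=1), "s1", "plant-a", 0.25, 81.0),
        Reading(now - timedelta(minutes=2), "s2", "plant-a", 0.50, 79.0),
        Reading(now - timedelta(minutes=3), "s3", "plant-b", 0.75, 85.0),
        Reading(now - timedelta(minutes=30), "s4", "plant-b", 0.10, 99.0),  # outside window
    ]
    print(avg_cpu(rows, now))                          # 0.5
    print(hot_sensors_by_location(rows, now, 80.0))    # {'plant-a': 1, 'plant-b': 1}
```

In a SQL-speaking time series database, the second function corresponds to a single `SELECT ... WHERE ... GROUP BY location` over one table rather than a join across separate per-metric series.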

Michael describes how the characteristics of these time series workloads permit specific design decisions that enable both elastic scaling and efficient SQL-like queries, even though achieving both properties has remained elusive for general OLTP workloads. He explains how you can leverage these workload characteristics to perform smart time/space partitioning and placement of data while still exposing the abstraction of a single continuous table across all devices and time intervals. Time permitting, Michael will also cover various query optimizations that such an architecture enables.
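The idea of time/space partitioning behind a single-table abstraction can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not TimescaleDB's implementation: each row is routed to a chunk keyed by a fixed-width time bucket (hypothetical `CHUNK_INTERVAL`) and a hash of the device ID (hypothetical `NUM_SPACE_PARTITIONS`), while `insert` and `select` on the hypothetical `PartitionedTable` class behave as if there were one table. `select` also shows one optimization this layout makes possible: skipping chunks that cannot contain matching rows.

```python
import zlib
from collections import defaultdict
from datetime import datetime, timedelta

CHUNK_INTERVAL = timedelta(hours=1)  # hypothetical width of each time partition
NUM_SPACE_PARTITIONS = 4             # hypothetical number of device (space) partitions
EPOCH = datetime(1970, 1, 1)


def space_partition(device_id):
    # Deterministic hash so a device always lands in the same partition.
    return zlib.crc32(device_id.encode()) % NUM_SPACE_PARTITIONS


def chunk_key(time, device_id):
    """Map a row to its (time bucket, space partition) chunk."""
    return (time - EPOCH) // CHUNK_INTERVAL, space_partition(device_id)


class PartitionedTable:
    """Exposes one logical table but stores rows in separate chunks."""

    def __init__(self):
        self.chunks = defaultdict(list)

    def insert(self, time, device_id, value):
        self.chunks[chunk_key(time, device_id)].append((time, device_id, value))

    def select(self, start, end, device_id=None):
        """Return (rows with start <= time < end, number of chunks scanned)."""
        lo = (start - EPOCH) // CHUNK_INTERVAL
        hi = (end - EPOCH) // CHUNK_INTERVAL
        part = space_partition(device_id) if device_id is not None else None
        scanned, out = 0, []
        for (bucket, p), rows in self.chunks.items():
            # Chunk exclusion: skip chunks outside the time range or the device's partition.
            if not lo <= bucket <= hi or (part is not None and p != part):
                continue
            scanned += 1
            out.extend(r for r in rows
                       if start <= r[0] < end and (device_id is None or r[1] == device_id))
        return sorted(out), scanned


if __name__ == "__main__":
    table = PartitionedTable()
    base = datetime(2017, 3, 15)
    for h in range(24):
        for dev in ("s1", "s2", "s3"):
            table.insert(base + timedelta(hours=h, minutes=5), dev, float(h))
    rows, scanned = table.select(base + timedelta(hours=10), base + timedelta(hours=12), "s2")
    print(len(rows), scanned, len(table.chunks))
```

Using `zlib.crc32` rather than Python's built-in `hash`, which is randomized per process for strings, keeps each device mapped to the same space partition across restarts. In a distributed setting, the same chunk key could also decide which node stores a chunk.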

Michael presents these architecture insights in the context of TimescaleDB, a new distributed time-series database. He demonstrates TimescaleDB’s use and provides performance numbers and intuition about the scenarios in which its design shines (and those in which it does not).


Michael Freedman

TimescaleDB | Princeton

Michael J. Freedman is the co-founder and CTO of TimescaleDB, an open-source database that scales SQL for time-series data, and a Professor of Computer Science at Princeton University. His research focuses on distributed systems, networking, and security.

Previously, Freedman developed CoralCDN (a decentralized CDN serving millions of daily users) and Ethane (the basis for OpenFlow / software-defined networking). He co-founded Illuminics Systems (acquired by Quova, now part of Neustar) and is a technical advisor to Blockstack.

Honors include: Presidential Early Career Award for Scientists and Engineers (PECASE, given by President Obama), SIGCOMM Test of Time Award, Caspar Bowden Award for Privacy Enhancing Technologies, Sloan Fellowship, NSF CAREER Award, Office of Naval Research Young Investigator Award, DARPA Computer Science Study Group membership, and multiple award-winning publications. Prior to joining Princeton in 2007, he received his Ph.D. in computer science from NYU’s Courant Institute, and his bachelor's and master's degrees from MIT.


Comments

Bruce Lowther | MANUFACTURING BIG DATA ARCHITECT
03/20/2017 5:21am PDT

This was a great presentation. I’m trying to find the slides. Are they posted?