Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Designing a time series database to support IoT workloads

Michael Freedman (TimescaleDB)
5:10pm–5:50pm Wednesday, March 15, 2017
Real-time applications
Location: LL20 D
Level: Intermediate
Secondary topics:  IoT, Streaming
Average rating: *****
(5.00, 3 ratings)

Who is this presentation for?

  • Engineers, technical product managers, and data analysts

Prerequisite knowledge

  • Basic knowledge of databases and distributed storage

What you'll learn

  • Understand how to design and think architecturally about scalable databases, with a particular focus on machine and time series data

Description

Sensors, IoT devices, servers, and other data sources are generating increasingly huge volumes of time series data. This data is critical for many important use cases, from operational monitoring, troubleshooting, and DevOps to condition-based and predictive maintenance, real-time control, asset tracking, and more. These use cases require similar innovations in data infrastructure.

Most time series architectures today use column stores to support fast roll-ups on individual metrics (e.g., compute the average CPU load over the last 10-minute interval). However, these new applications often need to query data in more-complex and arbitrary ways (e.g., compute how many sensors had temperatures greater than X, grouped by location, over the last 10-minute interval). Michael Freedman outlines a new distributed time series database designed for such workloads (i.e., one that supports highly efficient queries, including complex predicates across many metrics).
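To make the second kind of query concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not TimescaleDB's interface: the `readings` table, its column names, and the `hot_sensors_by_location` helper are assumptions made purely for illustration, and timestamps are plain integer seconds.

```python
import sqlite3


def hot_sensors_by_location(conn, threshold, now, window_seconds=600):
    """Count distinct sensors per location that reported a temperature
    above `threshold` in the `window_seconds` leading up to `now`."""
    rows = conn.execute(
        """
        SELECT location, COUNT(DISTINCT sensor_id)
        FROM readings
        WHERE ts > ? AND ts <= ? AND temperature > ?
        GROUP BY location
        ORDER BY location
        """,
        (now - window_seconds, now, threshold),
    ).fetchall()
    return dict(rows)


# Hypothetical schema: one wide row per reading, holding several metrics.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "ts INTEGER, sensor_id TEXT, location TEXT, temperature REAL, cpu_load REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
    [
        (1000, "s1", "lab", 31.0, 0.2),
        (1100, "s1", "lab", 33.0, 0.3),   # same sensor, counted once
        (1200, "s2", "lab", 25.0, 0.9),   # below the threshold
        (1300, "s3", "roof", 40.0, 0.1),
        (300, "s4", "roof", 45.0, 0.5),   # outside the 10-minute window
    ],
)

result = hot_sensors_by_location(conn, threshold=30.0, now=1500)
print(result)
```

The query combines a time-range filter, a predicate on one metric, and a grouping on a separate attribute, which is the shape of workload the abstract contrasts with single-metric roll-ups.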

Michael describes how the characteristics of these time series workloads allow specific design decisions that enable both elastic scaling and efficient SQL-like queries, even though achieving both properties has remained elusive for general OLTP workloads. He explains how you can leverage these workload characteristics to perform smart time/space partitioning and placement of data, while the database still exposes the abstraction of a simple continuous table across all devices and time intervals. Time permitting, Michael will also cover various query optimizations that such an architecture enables.
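The general idea of time/space partitioning behind a single-table abstraction can be sketched as follows. This is an illustrative toy, not TimescaleDB's actual design: the `PartitionedTable` class, the hash-based space partitioning, the fixed-width time buckets, and all parameter names are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


class PartitionedTable:
    """Toy table that stores rows in (time bucket, space partition) chunks
    while callers see only insert() and query() on one logical table."""

    def __init__(self, chunk_seconds=3600, space_partitions=4):
        self.chunk_seconds = chunk_seconds
        self.space_partitions = space_partitions
        self.chunks = defaultdict(list)  # (bucket_start, partition) -> rows

    def _space(self, device_id):
        # Stable hash, so a device always maps to the same space partition.
        digest = hashlib.sha256(device_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.space_partitions

    def _time_bucket(self, ts):
        # Start of the fixed-width time interval that contains ts.
        return ts - ts % self.chunk_seconds

    def insert(self, ts, device_id, value):
        key = (self._time_bucket(ts), self._space(device_id))
        self.chunks[key].append((ts, device_id, value))

    def query(self, start, end, device_id=None):
        """Return rows with start <= ts < end, visiting only the chunks
        whose time bucket (and, if given, device partition) can match."""
        buckets = range(self._time_bucket(start), end, self.chunk_seconds)
        if device_id is not None:
            spaces = [self._space(device_id)]
        else:
            spaces = range(self.space_partitions)
        out = []
        for bucket in buckets:
            for space in spaces:
                for row in self.chunks.get((bucket, space), ()):
                    if start <= row[0] < end and device_id in (None, row[1]):
                        out.append(row)
        return sorted(out)


table = PartitionedTable(chunk_seconds=3600, space_partitions=4)
table.insert(10, "dev-a", 1.0)
table.insert(3700, "dev-a", 2.0)
table.insert(3800, "dev-b", 3.0)
table.insert(7300, "dev-a", 4.0)

print(table.query(3600, 7200))                 # rows from one time bucket only
print(table.query(0, 8000, device_id="dev-a"))  # one space partition per bucket
```

In this sketch, a query with a time range or device filter skips chunks that cannot contain matching rows, which is one reason such partitioning can help both placement and query efficiency.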

Michael presents these architecture insights in the context of TimescaleDB, a new distributed time series database. He demonstrates TimescaleDB’s use and provides performance numbers and intuition about the scenarios in which its design shines (and those in which it does not).

Michael Freedman

TimescaleDB

Michael J. Freedman is the cofounder and CTO of TimescaleDB, an open source database that scales SQL for time series data, and a professor of computer science at Princeton University, where his research focuses on distributed systems, networking, and security. Previously, Michael developed CoralCDN (a decentralized CDN serving millions of daily users) and Ethane (the basis for OpenFlow and software-defined networking) and cofounded Illuminics Systems (acquired by Quova, now part of Neustar). He is a technical advisor to Blockstack. Michael’s honors include the Presidential Early Career Award for Scientists and Engineers (PECASE, given by President Obama), the SIGCOMM Test of Time Award, the Caspar Bowden Award for Privacy Enhancing Technologies, a Sloan Fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, a DARPA Computer Science Study Group membership, and multiple award publications. He holds a PhD in computer science from NYU’s Courant Institute and bachelor’s and master’s degrees from MIT.

Comments

Bruce Lowther | MANUFACTURING BIG DATA ARCHITECT
03/20/2017 5:21am PDT

This was a great presentation. I’m trying to find the slides. Are they posted?