Sep 23–26, 2019

Trill: The crown jewel of Microsoft’s streaming pipeline explained

James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1A 15/16

Who is this presentation for?

  • Developers, data engineers, and enterprise systems engineers

Level

Intermediate

Description

The Trill data engine is the power behind many of Microsoft’s offerings, from products like Azure Stream Analytics to billion-dollar services like Bing Ads. It has now been open-sourced and is available to everyone. But it has been a long path to get there.

James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein trace decades of streaming data processing at Microsoft: its beginnings in research, a first product in StreamInsight, the transition to the cloud, and all the pain points along the way. A key result of that lineage and learning is the Trill engine, which has three key properties: a single standalone data processing engine for all temporal data, whether the data is streamed or stored; a simple API that integrates seamlessly with the programming language; and performance without ego, a willingness to apply every lesson learned to improve throughput in every way possible.

They use examples to show why each of those properties matters: a simple application that demonstrates the basics of Trill (joins, aggregation, and windowing); a more complicated application that demonstrates the power of Trill's API (progressive windowing, regular expressions and pattern detection, and data-dependent windows); and an overview of the kind of query Bing Ads uses to run a multi-billion-dollar business.

You'll also see a performance showcase that runs the previous examples to demonstrate how Trill got its name: processing a trillion events per day on a single node.
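
The windowing and aggregation basics mentioned in the description can be illustrated with a small, generic sketch. It is written in Python rather than against Trill's own .NET API, and every name in it (`Event`, `Interval`, `tumbling_lifetime`, `windowed_counts`) is hypothetical. It assumes a temporal model in which a tumbling window snaps each event's validity interval to the fixed-size window containing its timestamp, after which a grouped count is computed per window and key. It processes a finite, in-memory list in one pass and ignores out-of-order arrival and incremental output, both of which a real streaming engine must handle.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

# Hypothetical names for illustration only; this is not Trill's API.


class Event(NamedTuple):
    time: int  # event timestamp, e.g. seconds since stream start
    key: str   # grouping key, e.g. a hypothetical ad ID


class Interval(NamedTuple):
    start: int  # inclusive
    end: int    # exclusive


def tumbling_lifetime(event: Event, size: int) -> Interval:
    """Snap an event's validity interval to the tumbling window that contains it."""
    start = event.time - (event.time % size)
    return Interval(start, start + size)


def windowed_counts(events: Iterable[Event], size: int) -> Dict[Tuple[Interval, str], int]:
    """Count events per (window, key) after reassigning each event's lifetime."""
    counts: Counter = Counter()
    for e in events:
        counts[(tumbling_lifetime(e, size), e.key)] += 1
    return dict(counts)


if __name__ == "__main__":
    stream = [Event(1, "a"), Event(4, "b"), Event(7, "a"), Event(11, "a"), Event(12, "a")]
    for (window, key), n in sorted(windowed_counts(stream, size=10).items()):
        print(f"[{window.start}, {window.end}) {key}: {n}")
    # prints:
    # [0, 10) a: 2
    # [0, 10) b: 1
    # [10, 20) a: 2
```

Expressing the window as a change to each event's interval, rather than as a separate buffering step, keeps the aggregation itself window-agnostic: the same grouped count works whether the intervals came from a tumbling window or some other lifetime rule.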

Prerequisite knowledge

  • A working knowledge of streaming data systems (useful but not required)

What you'll learn

  • Learn about temporal data versus temporal queries, and about data-dependent and custom temporal windowing

James Terwilliger

Microsoft Corporation

James Terwilliger is a principal software development engineer at Microsoft, where he’s a 10-year veteran, having spent time on both product and research teams. He began as an intern during the last year of his PhD research at Portland State University. His background is in innovative data query and exploration interfaces and streaming data processing. At Microsoft, he helped develop the PowerQuery extension to Excel that is now the default data tab there, and now works on the Trill temporal data engine. Whatever he works on, he finds a way to add Pivot and Unpivot to it.

Badrish Chandramouli

Microsoft Research

Badrish Chandramouli is a senior principal researcher in the database group at Microsoft Research. He is interested in creating technologies to perform real-time and offline big and raw data processing, as well as resilient state management for cloud and edge applications. His research work first shipped in 2010 as part of the Microsoft SQL Server StreamInsight engine. Starting in 2012, Badrish built Trill, a streaming analytics engine that is widely used at Microsoft, for example, in the Bing Ads platform and in the Azure Stream Analytics cloud service. More recently, Badrish built FASTER, a high-performance embedded, resilient, and concurrent state store and cache that supports larger-than-memory data and is optimized for streaming analytics. He has also worked on simplifying distributed computing via frameworks such as Ambrosia and CRA.

Jonathan Goldstein

Microsoft Research

Jonathan Goldstein is a principal researcher at Microsoft Research.
