Machine learning over real-time streaming data with TensorFlow

Yong Tang (MobileIron)

2:30pm–3:10pm Wednesday, October 30, 2019

Location: Grand Ballroom C/D

Production pipelines

Who is this presentation for?

Data scientists

Level

Intermediate

Description

Applying machine learning to streaming data to discover useful information has been a topic of interest for some time. In many real-world applications, such as IoT sensors, web transactions, GPS positions, or social media updates, large volumes of data are generated continuously. It’s critical to have a data pipeline that can reliably and conveniently receive, preprocess, and provide data for model inference and training.

Yong Tang explores the TensorFlow I/O package for streaming data processing with TensorFlow. Developed by SIG I/O of the TensorFlow project, TensorFlow I/O is a software package focused on data I/O, streaming, and file formats for TensorFlow. It supports a wide variety of open source software and frameworks beyond machine learning itself. For streaming data, TensorFlow I/O provides support for Apache Kafka, AWS Kinesis, and Google Cloud Pub/Sub, three widely used streaming platforms.

TensorFlow I/O is built on top of tf.data and is fully compatible with the succinct tf.keras API. That means model inference on streaming data from Kafka, Kinesis, or Pub/Sub can be as simple as a one-liner. Combined with the data transformation functions in tf.data, training a model over batches of streaming data is also straightforward.
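
As a rough illustration of the receive, preprocess, and batch pattern described above, here is a minimal plain-Python sketch. It deliberately avoids TensorFlow and Kafka so that it runs anywhere. The names `fake_stream`, `decode`, `batched`, and `predict` are hypothetical stand-ins, not TensorFlow I/O APIs: they play the roles of a Kafka-backed dataset, a `map` transformation, a `batch` transformation, and a trained model, respectively.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fake_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a message stream such as a Kafka topic."""
    for i in range(7):
        yield f"{i},{i * 2}".encode("utf-8")


def decode(message: bytes) -> List[float]:
    """Parse one CSV-encoded message into a feature vector."""
    return [float(field) for field in message.decode("utf-8").split(",")]


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Group a possibly unbounded iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def predict(batch: List[List[float]]) -> List[float]:
    """Toy 'model' (an assumption for illustration): sum each feature vector."""
    return [sum(features) for features in batch]


# Lazily chain the stages: stream -> decode -> batch -> predict.
predictions = [predict(batch) for batch in batched(map(decode, fake_stream()), size=3)]
print(predictions)
```

Every stage is lazy, so nothing is read from the source until the final loop asks for a batch. This is the property that lets the same shape of pipeline work on an unbounded stream.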

In addition to streaming input, TensorFlow I/O provides streaming output support, so data generated by machine learning algorithms in real time can be delivered back to Kafka for continuous ingestion by another application. With both input and output support, it’s possible to build a TensorFlow-centric streaming pipeline with minimal components, which reduces infrastructure maintenance over the long run.
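
The round trip described above (read from one topic, score, write to another) can be sketched with the standard library alone. This is an illustration under stated assumptions, not TensorFlow I/O code: the two `queue.Queue` objects are hypothetical stand-ins for an input and an output Kafka topic, and `score` is a toy placeholder for a trained model.

```python
import queue

# Hypothetical stand-ins for an input and an output Kafka topic.
input_topic = queue.Queue()
output_topic = queue.Queue()


def score(value: float) -> float:
    """Toy 'model' (an assumption): distance of a reading from a nominal 10.0."""
    return abs(value - 10.0)


def run_once(source: queue.Queue, sink: queue.Queue) -> int:
    """Drain the messages currently in `source`, score them, publish to `sink`."""
    processed = 0
    while True:
        try:
            reading = source.get_nowait()
        except queue.Empty:
            return processed
        sink.put(f"{reading}:{score(reading)}")
        processed += 1


# An upstream producer publishes three sensor readings.
for reading in (9.5, 10.0, 12.5):
    input_topic.put(reading)

count = run_once(input_topic, output_topic)

# A downstream application reads the scored results from the output topic.
results = [output_topic.get_nowait() for _ in range(count)]
print(results)
```

Because the downstream consumer only depends on the output topic, it can be developed and deployed independently of the model that produces the scores.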

You’ll see a demo showcasing how convenient TensorFlow I/O is to use and how easily it supports a complete streaming data pipeline for machine learning.

Prerequisite knowledge

A basic understanding of TensorFlow, especially tf.keras and tf.data

What you'll learn

Learn how to do machine learning over streaming data with TensorFlow and Apache Kafka

Yong Tang

MobileIron

Yong Tang is the director of engineering at MobileIron. He contributes to container and machine learning projects in the open source community. His most recent focus is on data processing in machine learning. He’s a committer and the SIG I/O lead of the TensorFlow project and received the Open Source Peer Bonus Award from Google for his contributions to TensorFlow. Beyond TensorFlow, Yong contributes to many other open source projects and is a committer on the Docker and CoreDNS projects.