Machine learning over real-time streaming data with TensorFlow
Who is this presentation for?
- Data scientists
Applying machine learning over streaming data to discover useful information has been a topic of interest for some time. In many real-world applications, such as IoT sensors, web transactions, GPS positions, or social media updates, large volumes of data are generated continuously. It’s critical to have a data pipeline that can reliably and conveniently receive, preprocess, and provide data for model inference and training.
Yong Tang explores the TensorFlow I/O package for streaming data processing with TensorFlow. Developed by SIG IO of the TensorFlow project, TensorFlow I/O is a software package with a focus on data I/O, streaming, and file formats for TensorFlow. It supports a wide variety of open source software and frameworks beyond machine learning itself. In the field of streaming data, TensorFlow I/O provides support for Apache Kafka, AWS Kinesis, and Google Cloud Pub/Sub, which are among the most widely used streaming frameworks at the moment.
TensorFlow I/O is built on top of tf.data and is fully compatible with the succinct tf.keras API. That means model inference on streaming data from Kafka, Kinesis, and Pub/Sub can be as easy as a one-liner. Coupled with the data transformation functions in tf.data, model training over batches of streaming data is also straightforward.
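The decode-then-batch pattern described above can be sketched without a broker or TensorFlow installed. This is only an illustration under stated assumptions: the `messages` list is a hypothetical stand-in for a Kafka topic, the comma-separated message layout is invented for the example, and `decode` and `batched` are hypothetical helpers that mimic what the `map` and `batch` transformations in tf.data do to a stream of records.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], int]  # (features, label)

def decode(message: bytes) -> Example:
    """Parse one raw message; the 'f1,f2,...,label' layout is an assumption."""
    parts = message.decode("utf-8").split(",")
    features = [float(p) for p in parts[:-1]]
    label = int(parts[-1])
    return features, label

def batched(stream: Iterable[Example], size: int) -> Iterator[List[Example]]:
    """Group a (possibly unbounded) stream into lists of at most `size` items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical stand-in for messages read from a streaming topic.
messages = [b"0.5,1.5,1", b"2.0,3.0,0", b"4.0,5.0,1"]

# Decode lazily, then batch; each batch could feed one training step.
batches = list(batched(map(decode, messages), size=2))
```

In a real pipeline the source would be a streaming dataset rather than a list, and each batch would be handed to the model for a training step or a prediction call.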
In addition to streaming input, TensorFlow I/O also provides streaming output support, so that data generated by machine learning algorithms in real time can be delivered back to Kafka and continuously ingested by another application. With both input and output support, it’s possible to build a TensorFlow-centric streaming pipeline with minimal components, which greatly reduces infrastructure maintenance over the long run.
You’ll see a demo that showcases how convenient TensorFlow I/O is to use and how easily it supports a complete streaming data pipeline for machine learning.
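The input-to-output loop described above can be sketched as follows. This is a minimal illustration, not TensorFlow I/O code: `input_topic` and `output_topic` are in-memory stand-ins for Kafka topics, `score` is a hypothetical placeholder for a trained model, and the message encoding is an assumption made for the example.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

def score(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: returns the feature mean."""
    return sum(features) / len(features)

def run_pipeline(
    source: Iterable[bytes],
    model: Callable[[List[float]], float],
    sink: Deque[bytes],
) -> int:
    """Read each message, run the model on it, and write the result to the sink."""
    count = 0
    for message in source:
        # Assumed layout: comma-separated float features, UTF-8 encoded.
        features = [float(v) for v in message.decode("utf-8").split(",")]
        prediction = model(features)
        # Encode the prediction so a downstream consumer can read it back.
        sink.append(f"{prediction:.3f}".encode("utf-8"))
        count += 1
    return count

# In-memory stand-ins for an input topic and an output topic.
input_topic = [b"1.0,3.0", b"2.0,2.0,5.0"]
output_topic: Deque[bytes] = deque()

processed = run_pipeline(input_topic, score, output_topic)
```

Keeping the source, model, and sink as separate arguments mirrors the idea of a pipeline with minimal components: only the two endpoints would change when moving from in-memory stand-ins to real topics.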
Prerequisite knowledge
- A basic understanding of TensorFlow, especially tf.keras and tf.data
What you'll learn
- How to apply machine learning to streaming data with TensorFlow and Apache Kafka
Yong Tang is the director of engineering at MobileIron. He contributes to a range of container and machine learning projects in the open source community. His most recent focus is on data processing in machine learning. He’s a committer and the SIG IO lead of the TensorFlow project, and he received the Open Source Peer Bonus Award from Google for his contributions to TensorFlow. Beyond TensorFlow, Yong is also a committer on the Docker and CoreDNS projects.