Machine learning for streaming data: Practical insights

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

2:05pm–2:45pm Thursday, September 26, 2019

Location: 3B - Expo Hall

Data Science, Machine Learning, & AI, Expo Hall

Secondary topics: Streaming and IoT, Telecom, Temporal data and time-series analytics

Who is this presentation for?

Machine learning engineers, data scientists, and software developers interested in applying machine learning for continuous flows of data

Level

Intermediate

Description

In many domains, data is generated at a fast pace. A clear example is Internet of Things (IoT) applications, where connected sensors yield large amounts of data in short periods. To build predictive models from this data, you need to either settle for traditional offline learning or attempt to learn from the data incrementally. A significant drawback of the offline approach is that it’s slow to react to changes in the domain, and these changes can have a catastrophic impact on the model’s predictive performance, since the patterns on which the model was trained are no longer valid.

An online approach, where the model is trained incrementally, can potentially fix this; however, the untold story is that the existing challenges for offline learning are still present (and are even amplified) when processing the data online. These challenges include, but are not limited to, raw data preprocessing, efficient incremental updates to models, algorithms to detect changes and react to them, and dealing with large volumes of unlabeled and delayed-labeled data.
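To make the contrast concrete, here is a minimal sketch in plain Python of incremental learning with change detection, evaluated test-then-train (each instance is used for prediction first and for training afterward). It does not use the streamDM API: the toy stream, the `CountingClassifier` learner, and the `WindowDriftDetector` (a fixed-size window over recent errors) are hypothetical names and simplifications chosen for illustration only.

```python
from collections import deque


class CountingClassifier:
    """Hypothetical incremental learner: label counts per feature value."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        c0, c1 = self.counts.get(x, [0, 0])
        return 1 if c1 > c0 else 0  # ties fall back to class 0

    def learn_one(self, x, y):
        # Constant-time update per instance; no pass over past data.
        self.counts.setdefault(x, [0, 0])[y] += 1


class WindowDriftDetector:
    """Hypothetical detector: flags a change when the error rate over
    the last `size` predictions exceeds `threshold`."""

    def __init__(self, size=20, threshold=0.5):
        self.size = size
        self.threshold = threshold
        self.errors = deque(maxlen=size)

    def update(self, error):
        self.errors.append(error)
        full = len(self.errors) == self.size
        return full and sum(self.errors) / self.size > self.threshold

    def reset(self):
        self.errors.clear()


def toy_stream(n=200, drift_at=100):
    """Synthetic stream (an assumption): y = x before the change,
    y = 1 - x after it."""
    for i in range(n):
        x = i % 2
        y = x if i < drift_at else 1 - x
        yield x, y


def prequential(stream, adapt):
    """Test-then-train loop; optionally resets the model on detected change."""
    model = CountingClassifier()
    detector = WindowDriftDetector()
    correct = total = 0
    drift_points = []
    for i, (x, y) in enumerate(stream):
        error = int(model.predict(x) != y)  # test first
        correct += 1 - error
        total += 1
        if adapt and detector.update(error):
            model = CountingClassifier()  # react: discard the stale model
            detector.reset()
            drift_points.append(i)
        model.learn_one(x, y)  # then train
    return correct / total, drift_points


static_acc, _ = prequential(toy_stream(), adapt=False)
adaptive_acc, drifts = prequential(toy_stream(), adapt=True)
print(static_acc, adaptive_acc, drifts)
```

On this toy stream, the learner that never reacts keeps predicting the outdated pattern after the change, while the variant that resets on detection recovers within a few instances. Resetting the whole model is the bluntest possible reaction and is used here only to keep the sketch short.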

Heitor Murilo Gomes and Albert Bifet dive into how a machine learning pipeline for streaming data can be developed in the streamDM framework. They’re not presenting on how they applied a specific algorithm to proprietary data or giving a lecture on theoretical problems related to machine learning for data streams. You’ll learn how to apply streamDM to your data streams and expand the framework to accommodate your needs.

Prerequisite knowledge

An intermediate understanding of machine learning for batch data

What you'll learn

Discover the main challenges of applying ML techniques to data streams, learn how to deploy an ML pipeline for data streams in streamDM, and explore preprocessing methods and supervised and unsupervised learning algorithms for data streams

Heitor Murilo Gomes

Télécom ParisTech

Heitor Murilo Gomes is a researcher at Télécom ParisTech focusing on machine learning—particularly, evolving data streams, concept drift, ensemble methods, and big data streams. He coleads the streamDM open data stream mining project.

Albert Bifet

Télécom ParisTech

Albert Bifet is a professor and head of the Data, Intelligence, and Graphs (DIG) Group at Télécom ParisTech and a scientific collaborator at École Polytechnique. A big data scientist with 10+ years of international experience in research, Albert has led new open source software projects for business analytics, data mining, and machine learning at Huawei, Yahoo, the University of Waikato, and UPC. At Yahoo Labs, he cofounded Apache scalable advanced massive online analysis (SAMOA), a distributed streaming machine learning framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning Group, he co-led massive online analysis (MOA), the most popular open source framework for data stream mining, with more than 20,000 downloads each year. Albert is the author of Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams and the editor of the Big Data Mining special issue of SIGKDD Explorations. He was cochair of the industrial track at ECML PKDD, BigMine, and the data streams track at ACM SAC. He holds a PhD from BarcelonaTech.