Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

StreamDM: Advanced data science with Spark Streaming

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Huawei)
14:05–14:45 Wednesday, 23 May 2018
Secondary topics: Telecom, Time Series and Graphs

Who is this presentation for?

  • Data engineers, data scientists, and machine learning engineers

Prerequisite knowledge

  • A basic understanding of machine learning and Apache Spark

What you'll learn

  • How to benefit from machine learning algorithms that can be continuously updated in real time using Spark Streaming

Description

Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, an open source library for real-time analytics built on top of Spark Streaming and developed at Huawei’s Noah’s Ark Lab and Télécom ParisTech.

StreamDM’s tools and algorithms are specifically designed for data streaming. Because real-time streams create large amounts of data that must be processed as they arrive, such methods need to be extremely time efficient while using very little memory. StreamDM is the first library to include advanced stream-mining algorithms for Spark Streaming. It is intended to be the open source gathering point for research on and implementation of data stream methods, while also allowing practical deployments on real-world datasets.
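
To make the time and memory constraints above concrete, the following is a minimal sketch of incremental learning over mini-batches, loosely mirroring how Spark Streaming delivers data in small batches. It is not StreamDM's API and does not use Spark: the class `OnlineLogisticRegression`, the helper `prequential_accuracy`, and the synthetic stream are all hypothetical names introduced here for illustration. Each batch is first used for testing and then for training, and the model keeps only a fixed-size weight vector regardless of how many instances it has seen.

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical binary classifier updated one instance at a time (SGD).

    Memory use depends on the number of features, not on the stream length.
    """

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def update(self, x, y):
        # One gradient step on the log loss for a single labeled instance.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def synthetic_stream(n_instances, seed=42):
    """Yield (features, label) pairs; the label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n_instances):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, (1 if x[0] + x[1] > 0 else 0)


def prequential_accuracy(model, stream, batch_size=100):
    """Test-then-train evaluation: score each full batch, then learn from it.

    A trailing batch smaller than batch_size is ignored in this sketch.
    """
    correct = 0
    seen = 0
    batch = []
    for instance in stream:
        batch.append(instance)
        if len(batch) == batch_size:
            correct += sum(1 for x, y in batch if model.predict(x) == y)
            seen += len(batch)
            for x, y in batch:
                model.update(x, y)
            batch = []  # the batch is discarded once it has been processed
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=2)
    accuracy = prequential_accuracy(model, synthetic_stream(5000))
    print(f"prequential accuracy: {accuracy:.3f}")
```

Only the current batch and the weight vector are held in memory at any time, which is the property that lets this style of algorithm keep up with an unbounded stream.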

This library contains methods for classification, regression, clustering, and frequent pattern mining. Heitor and Albert explain how these advanced methods work in practice, compare them with the methods available in MLlib and Spark ML, discuss some big data analytics applications in telecommunication networks, and demonstrate the library's ease of use and extensibility.

Heitor Murilo Gomes

Télécom ParisTech

Heitor Murilo Gomes is a researcher at Télécom ParisTech focusing on machine learning—particularly evolving data streams, concept drift, ensemble methods, and big data streams. He co-leads the StreamDM open data stream mining project.

Albert Bifet

Huawei

Albert Bifet is a senior researcher at Huawei. A big data scientist with 10+ years of international experience in research, Albert has led new open source software projects for business analytics, data mining, and machine learning at Huawei, Yahoo, the University of Waikato, and UPC. At Yahoo Labs, he cofounded Apache SAMOA (Scalable Advanced Massive Online Analysis), a distributed streaming machine learning framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning Group, he co-led MOA (Massive Online Analysis), the most popular open source framework for data stream mining, with more than 20,000 downloads each year. Albert is the author of Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams and the editor of the “Big Data Mining” special issue of SIGKDD Explorations in 2012. He was cochair of the industrial track at ECML PKDD 2015, BigMine (2014, 2013, 2012), and the data streams track at ACM SAC (2015, 2014, 2013, 2012). He holds a PhD from BarcelonaTech.
