We present StreamDM, an open source software library for real-time analytics, built on top of Spark Streaming and developed at Huawei Noah’s Ark Lab and Télécom ParisTech.
The tools and algorithms in StreamDM are designed specifically for the data stream setting. Because of the large volume of data that is created, and must be processed, in real-time streams, such methods need to be extremely time-efficient while using very little memory. StreamDM is the first library to include advanced stream mining algorithms for Spark Streaming. It is intended to be the open source gathering point for research on and implementations of data stream mining, while also being designed for practical deployment on real-world datasets.
The library contains methods for classification, regression, clustering, and frequent pattern mining. In this talk, we will show how these advanced methods work in practice, discuss some big data analytics applications in telecommunication networks, compare the methods with those available in MLlib and spark.ml, and show their ease of use and extensibility.
I am currently a researcher at Télécom ParisTech. My main research area is machine learning, especially evolving data streams, concept drift, ensemble methods, and big data streams. I co-lead the StreamDM open data stream mining project.
Albert Bifet is a big data scientist with 10+ years of international experience in research and in leading new open source software projects for business analytics, data mining, and machine learning (Huawei, Yahoo, University of Waikato, UPC). He obtained a Ph.D. from UPC-BarcelonaTech. Albert has worked in Hong Kong, New Zealand, and Europe. At Yahoo Labs, he co-founded Apache SAMOA (Scalable Advanced Massive Online Analysis) in 2013. Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning group, he has co-led MOA (Massive Online Analysis) since 2008. MOA is the most popular open source framework for data stream mining, with more than 20,000 downloads each year. Albert is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He was also editor of the Big Data Mining special issue of SIGKDD Explorations in 2012. He served as co-chair of the Industrial track of ECML PKDD 2015, and as co-chair of BigMine (2012-2017) and the ACM SAC Data Streams Track (2012-2018).