Huawei is deeply committed to the Apache Spark project and participates extensively in joint community and industry efforts. Spark is the core technology behind the data processing and analytics platform of Huawei’s big data solution, FusionInsight, used by more than 100 enterprise customers globally.
We present StreamDM, a new real-time analytics open source software library built on top of Spark Streaming, developed at Huawei Noah’s Ark Lab in Hong Kong.
The tools and algorithms in StreamDM are specifically designed for the data stream setting. Because real-time streams produce large amounts of data that must be processed as they arrive, such methods need to be extremely time-efficient while using very little memory. StreamDM is the first library to include advanced stream mining algorithms for Spark Streaming. It is intended to be the open-source gathering point for research on, and implementation of, data stream methods, while being designed for practical deployment on real-world datasets.
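To make these time and memory constraints concrete, here is a minimal Python sketch of the usual data stream setting. It is not StreamDM's API (StreamDM itself is written in Scala on top of Spark Streaming). It assumes a simple online perceptron as the learner and evaluates it prequentially (test-then-train): each example is first used to test the model, then to train it, and is then discarded, so memory depends on the number of features rather than on the length of the stream. The names `OnlinePerceptron`, `prequential_accuracy`, and `synthetic_stream`, as well as the synthetic labeling rule, are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterator, List, Tuple

Example = Tuple[List[float], int]  # (features, label in {-1, +1})


class OnlinePerceptron:
    """Linear classifier updated one example at a time; stores O(d) numbers."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> int:
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score >= 0.0 else -1

    def update(self, x: List[float], y: int) -> None:
        # Perceptron rule: adjust the model only when it makes a mistake.
        if self.predict(x) != y:
            for i, xi in enumerate(x):
                self.weights[i] += self.learning_rate * y * xi
            self.bias += self.learning_rate * y


def prequential_accuracy(stream: Iterator[Example], model: OnlinePerceptron) -> float:
    """Test-then-train over a single pass; no example is stored after use."""
    correct = 0
    seen = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test on the unseen example
        model.update(x, y)                     # then train on it
        seen += 1
    return correct / seen if seen else 0.0


def synthetic_stream(n: int, seed: int = 42) -> Iterator[Example]:
    """Hypothetical stream: label is +1 when x0 + x1 > 1, else -1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, (1 if x[0] + x[1] > 1.0 else -1)


if __name__ == "__main__":
    model = OnlinePerceptron(n_features=2)
    acc = prequential_accuracy(synthetic_stream(10_000), model)
    print(f"prequential accuracy: {acc:.3f}")
```

The same single-pass loop works for any incremental learner, such as a Hoeffding tree or incremental Naive Bayes; only the model's `predict` and `update` methods change.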
This new library contains methods for classification, regression, clustering, and frequent pattern mining. In this talk, we will show how these advanced methods work in practice, discuss some big data analytics applications in telecommunication networks, compare them with the methods available in MLlib and spark.ml, and show their ease of use and extensibility.
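Frequent pattern mining on a stream faces the same bounded-memory constraint. As an illustration only, and not necessarily the algorithm StreamDM implements, the classic Misra-Gries summary below finds frequent items (the simplest case of frequent pattern mining) in one pass using at most `k - 1` counters. The function name `misra_gries` and the parameter `k` are our own choices for this sketch.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary with at most k - 1 counters.

    For a stream of length n, every item occurring more than n / k times
    is guaranteed to remain in the result, and each stored count
    underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # "a" occurs 6 times out of 12; with k = 3 it must survive (6 > 12 / 3).
    print(misra_gries(["a"] * 6 + ["b"] * 3 + ["c", "d", "e"], k=3))
```

Memory is fixed by `k` regardless of stream length, which is the trade-off the paragraph above describes: exact counts are given up in exchange for a bounded, guaranteed error.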
Albert Bifet is a professor at LTCI and head of the Data, Intelligence, and Graphs (DIG) Group at Télécom ParisTech, and a scientific collaborator at École Polytechnique. A big data scientist with 10+ years of international experience in research, Albert has led new open source software projects for business analytics, data mining, and machine learning at Huawei, Yahoo, the University of Waikato, and UPC. At Yahoo Labs, he cofounded Apache SAMOA (Scalable Advanced Massive Online Analysis), a distributed streaming machine learning framework that contains a programming abstraction for distributed streaming ML algorithms. At the WEKA Machine Learning Group, he co-led MOA (Massive Online Analysis), the most popular open source framework for data stream mining, with more than 20,000 downloads each year. Albert is the author of Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams and the editor of the Big Data Mining special issue of SIGKDD Explorations in 2012. He was cochair of the industrial track at ECML PKDD 2015, BigMine (2014, 2013, 2012), and the data streams track at ACM SAC (2015, 2014, 2013, 2012). He holds a PhD from BarcelonaTech.
Silviu Maniu is a researcher at Noah’s Ark Lab, Huawei Technologies. He holds a PhD in Computer Science from Telecom ParisTech. His main research interests are social and uncertain data management, databases, and stream machine learning.
©2015, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.