SAMOA: A Platform for Mining Big Data Streams

Hadoop & Beyond
Location: 212

Real-time analysis of streaming data is becoming the fastest and most efficient way to extract useful knowledge from what is happening now. It lets organizations react quickly when problems appear and detect new trends that can help improve their performance. In this talk, we present SAMOA, an open-source platform for online mining of big data streams in a cluster or cloud environment. It features a pluggable architecture that allows it to run on several distributed stream processing engines, such as S4 and Storm. SAMOA includes algorithms for the most common machine learning tasks, such as classification and clustering.

Gianmarco De Francisci Morales

Yahoo Labs

Gianmarco De Francisci Morales is a Research Scientist at Yahoo Labs Barcelona. He received his Ph.D. in Computer Science and Engineering from the IMT Institute for Advanced Studies of Lucca in 2012. His research focuses on large-scale data mining and big data, with a particular emphasis on Web mining and Data-Intensive Scalable Computing systems. He is an active member of the open-source community of the Apache Software Foundation, working on the Hadoop ecosystem (Giraph, S4), and a committer for the Apache Pig project. He is a co-organizer of the workshop series on Social News on the Web (SNOW), co-located with the WWW conference. He is one of the lead developers of SAMOA, an open-source platform for mining big data streams.