Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Re-engineering legacy analytics solutions with big data

Rosaria Silipo ( AG)
4:35pm–5:15pm Thursday, 10/01/2015
Hadoop Use Cases
Location: 1 E12/1 E13 • Level: Intermediate

In this project, we rescued a few barely usable solutions from the past and made them viable again by exploiting the speed and performance of execution on a big data platform.

In our portfolio of analytics solutions, a number of workflows are dedicated to analyzing energy usage time series collected from monitored smart meters, each identified by a smart meter ID. One solution in particular predicts electrical energy usage for clusters of smart meter IDs in Ireland. The bottleneck in this solution, however, lay in its first ETL process, which took up to two days to execute. Such a long execution time made the solution barely usable in production and made re-training the model impractical.

Recently, we decided to re-engineer this legacy solution to run on a big data platform. We transformed all ETL processes into in-database processing nodes. A complex, purpose-built SQL query, implementing all the necessary conversions, joins, and aggregations, was built and executed on Hadoop clusters. The resulting (smaller) data set was then pulled back into the analytics platform to train the time series prediction model. The re-engineered ETL process now runs in less than half an hour, which allows the model to be re-trained more often.
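The in-database ETL pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual query: Python's built-in `sqlite3` stands in for the SQL engine running on the Hadoop cluster, and the table names `meter_readings` and `meter_cluster`, their columns, and the hourly aggregation granularity are all hypothetical.

```python
import sqlite3

# Hypothetical schema: raw smart meter readings plus a meter-to-cluster lookup.
# sqlite3 stands in for the SQL engine on the Hadoop cluster (an assumption);
# the idea is the same: push conversions, joins, and aggregations into the database.
SCHEMA = """
CREATE TABLE meter_readings (meter_id INTEGER, reading_ts TEXT, kwh REAL);
CREATE TABLE meter_cluster  (meter_id INTEGER, cluster_id INTEGER);
"""

# One query performs the conversion (timestamp -> hour bucket),
# the join (reading -> cluster), and the aggregation (sum per cluster and hour).
ETL_QUERY = """
SELECT c.cluster_id,
       strftime('%Y-%m-%d %H:00', r.reading_ts) AS hour,
       SUM(r.kwh) AS total_kwh
FROM meter_readings AS r
JOIN meter_cluster AS c ON r.meter_id = c.meter_id
GROUP BY c.cluster_id, hour
ORDER BY c.cluster_id, hour
"""


def build_demo_db():
    """Create an in-memory database filled with a few made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO meter_readings VALUES (?, ?, ?)",
        [
            (1, "2015-10-01 00:00:00", 0.5),
            (1, "2015-10-01 00:30:00", 0.7),
            (2, "2015-10-01 00:00:00", 1.0),
            (3, "2015-10-01 01:00:00", 2.0),
        ],
    )
    conn.executemany(
        "INSERT INTO meter_cluster VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)]
    )
    return conn


def run_in_database_etl(conn):
    """Execute the ETL inside the database and fetch only the aggregated rows."""
    return conn.execute(ETL_QUERY).fetchall()


if __name__ == "__main__":
    for row in run_in_database_etl(build_demo_db()):
        print(row)
```

Only the aggregated rows, one per cluster and hour, travel back to the analytics platform; the raw readings never leave the database, which is what keeps the data set small enough for model training.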

Rosaria Silipo is not only an expert in data mining, machine learning, reporting, and data warehousing; she has also become a recognized expert on the KNIME data mining engine, about which she has published three books: KNIME Beginner’s Luck, The KNIME Cookbook, and The KNIME Booklet for SAS Users.

Previously, Dr. Silipo worked as a freelance data analyst for many companies throughout Europe. She also led the SAS development group at Viseca (Zürich), implemented the speech-to-text and text-to-speech interfaces in C# at Spoken Translation (Berkeley, California), and developed a number of speech recognition engines in different languages at Nuance Communications (Menlo Park, California). Dr. Silipo earned her doctorate in biomedical engineering in 1996 from the University of Florence, Italy.