In this project, we rescued a few barely usable legacy solutions and made them viable again by exploiting the speed and performance of execution on a big data platform.
In our portfolio of analytics solutions, a number of workflows analyze energy usage time series collected from individual smart meter IDs. One solution in particular predicts electrical energy usage for clusters of smart meter IDs in Ireland. The bottleneck of the whole solution, however, was the first ETL process, which took up to two days to execute. Such a long execution time made the solution barely usable in production and made re-training the model challenging.
Recently, we decided to re-engineer this legacy solution to run on a big data platform. We converted all ETL processes into in-database processing nodes. A complex, purpose-built SQL query implementing all the necessary conversions, joins, and aggregations was developed and executed on Hadoop clusters. The much smaller resulting data set was then pulled back into the analytics platform to train the time series prediction model. The re-engineered ETL process now runs in less than half an hour, which allows the model to be re-trained far more frequently.
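To make the in-database pattern concrete, here is a minimal Python sketch. It assumes a hypothetical schema (a `readings` table with `meter_id`, `ts`, `kwh` and a `meter_cluster` table mapping each meter to a `cluster_id`) and uses Python's built-in `sqlite3` purely as a local stand-in for a SQL engine running on Hadoop. The query converts timestamps to hourly buckets, joins meters to clusters, and aggregates usage inside the database, so only the small per-cluster series is pulled back. The `forecast_next` function is a hypothetical placeholder, not the prediction model used in the project.

```python
import sqlite3
from collections import defaultdict

# Hypothetical ETL query: conversion (timestamp -> hour), join (meter -> cluster),
# and aggregation (sum of kWh per cluster and hour), all executed in the database.
ETL_QUERY = """
SELECT c.cluster_id,
       substr(r.ts, 1, 13) AS hour,   -- 'YYYY-MM-DD HH:MM' -> 'YYYY-MM-DD HH'
       SUM(r.kwh)          AS total_kwh
FROM readings AS r
JOIN meter_cluster AS c ON r.meter_id = c.meter_id
GROUP BY c.cluster_id, hour
ORDER BY c.cluster_id, hour
"""


def run_in_database_etl(conn):
    """Run the aggregation inside the database and pull back only the
    (much smaller) hourly series per cluster."""
    series = defaultdict(list)
    for cluster_id, hour, total_kwh in conn.execute(ETL_QUERY):
        series[cluster_id].append((hour, total_kwh))
    return dict(series)


def forecast_next(values, window=3):
    """Hypothetical placeholder model: mean of the last `window` values."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def build_demo_db():
    """Create an in-memory database with made-up half-hourly readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (meter_id TEXT, ts TEXT, kwh REAL)")
    conn.execute("CREATE TABLE meter_cluster (meter_id TEXT, cluster_id INTEGER)")
    conn.executemany("INSERT INTO meter_cluster VALUES (?, ?)",
                     [("m1", 1), ("m2", 1), ("m3", 2)])
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
        ("m1", "2015-01-01 00:00", 0.5), ("m1", "2015-01-01 00:30", 0.5),
        ("m2", "2015-01-01 00:00", 1.0), ("m2", "2015-01-01 01:00", 2.0),
        ("m3", "2015-01-01 00:00", 0.25), ("m3", "2015-01-01 01:30", 0.75),
    ])
    return conn


if __name__ == "__main__":
    series = run_in_database_etl(build_demo_db())
    for cluster_id, points in series.items():
        print(cluster_id, points, forecast_next([kwh for _, kwh in points]))
```

The design point is that the heavy join and aggregation happen where the raw readings live, so only one row per cluster and hour crosses back into the analytics platform for model training.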
Rosaria Silipo is not only an expert in data mining, machine learning, reporting, and data warehousing; she has also become a recognized expert on the KNIME data mining engine, about which she has published three books: KNIME Beginner’s Luck, The KNIME Cookbook, and The KNIME Booklet for SAS Users.
Previously, Dr. Silipo worked as a freelance data analyst for many companies throughout Europe. She also led the SAS development group at Viseca (Zürich), implemented speech-to-text and text-to-speech interfaces in C# at Spoken Translation (Berkeley, California), and developed a number of speech recognition engines in different languages at Nuance Communications (Menlo Park, California). Dr. Silipo received her doctorate in biomedical engineering from the University of Florence, Italy, in 1996.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.