Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

IIoT data fusion: Bridging the gap from data to value

Alexandra Gunderson (Arundo Analytics)
2:55pm–3:35pm Thursday, September 28, 2017
Secondary topics: IoT

Who is this presentation for?

  • Data scientists and engineers

What you'll learn

  • Learn how to apply machine learning and text mining to industrial-scale data


Traditional big data analytics relies on unsupervised methods to draw insight from large amounts of data. However, this approach is difficult to apply in industries like oil and gas and maritime, where the data describes complex underlying systems (and should thus be grouped accordingly).

For these industry-specific cases, it is vital to limit the input data to what is relevant. Finding the pertinent data to solve a specific problem currently requires a significant amount of manual curation. For example, to develop a predictive model that identifies failures on a pump, an engineer would need to sort through process diagrams and sensor lists to find all sensors related to that pump. They would then need to review thousands of text entries to find when failures occurred on it. In asset-intensive industries like oil and gas, tens of thousands of sensors can stream from a single rig while failures and work orders are logged regularly, so this manual selection process is tedious, error-prone, and does not scale.

Alexandra Gunderson shares a comprehensive preprocessing methodology that structures and links data from different sources, converting the IIoT analytics process from an unorganized mammoth to one more likely to generate insight.

Two major groups of data typically need to be properly related in order to generate actionable insight:

  1. Sensor data—any time series data related to measurements somewhere on the asset
  2. Event data—information relating to incidents, such as failures or work orders, on the asset

This data should be structured in two dimensions: time and hierarchy. An asset hierarchy describes a major holding, like a rig or a ship, which can be broken down by processes, equipment, and sensors. If events and sensors are linked to a hierarchy, equipment performance can be compared much more easily across dozens of assets, despite their hundreds of thousands of sensors.
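As a rough illustration of the hierarchy idea, the sketch below attaches both sensors and events to a shared hierarchy path so that the same equipment type can be looked up on any asset. The class names, tags, asset names, and event texts are all hypothetical and only serve the example; the talk does not prescribe a particular data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HierarchyPath:
    """Position in a simplified asset hierarchy (assumed three levels)."""
    asset: str       # e.g. a rig or a ship
    process: str
    equipment: str


@dataclass(frozen=True)
class Sensor:
    tag: str         # hypothetical sensor tag
    path: HierarchyPath


@dataclass(frozen=True)
class Event:
    time: datetime
    text: str        # free-text failure or work-order entry
    path: HierarchyPath


def sensors_on(sensors, asset, equipment):
    """Return the sorted tags of sensors attached to one equipment on one asset."""
    return sorted(
        s.tag for s in sensors
        if s.path.asset == asset and s.path.equipment == equipment
    )


def events_by_equipment(events):
    """Count events per equipment type, broken down by asset."""
    counts = defaultdict(lambda: defaultdict(int))
    for ev in events:
        counts[ev.path.equipment][ev.path.asset] += 1
    return {eq: dict(per_asset) for eq, per_asset in counts.items()}


# Made-up example data: two rigs that each have a pump in the same process.
rig1_pump = HierarchyPath("rig-1", "seawater lift", "pump")
rig2_pump = HierarchyPath("rig-2", "seawater lift", "pump")

sensors = [
    Sensor("RIG1-P101-PT", rig1_pump),
    Sensor("RIG1-P101-VT", rig1_pump),
    Sensor("RIG2-P201-PT", rig2_pump),
]
events = [
    Event(datetime(2017, 1, 5), "Pump seal failure", rig1_pump),
    Event(datetime(2017, 3, 9), "Pump bearing replaced", rig1_pump),
    Event(datetime(2017, 2, 1), "Pump tripped", rig2_pump),
]

pump_sensors = sensors_on(sensors, "rig-1", "pump")
pump_events = events_by_equipment(events)
```

Because every sensor and event carries the same kind of path, comparing pumps across rigs reduces to a lookup on the equipment level rather than a search through sensor lists.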

Alexandra explains how to build these relationships by using a combination of mapping (using text mining and machine learning to automatically structure equipment, sensors, and events to a hierarchy) and event labeling (using text mining and machine learning to automatically pull keywords from event data and build datasets to be used with the sensors for supervised learning techniques).
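The two steps can be sketched in a deliberately simplified form. The example below is not the method presented in the talk: plain token overlap stands in for the text mining and machine learning used for mapping, and a fixed keyword list stands in for learned event labeling. The equipment names, keywords, event texts, and the two-hour horizon are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_to_equipment(description, equipment_names):
    """Mapping step: pick the equipment name sharing the most tokens with a
    free-text description. Returns None when nothing overlaps."""
    desc = tokens(description)
    best, best_score = None, 0
    for name in equipment_names:
        score = len(desc & tokens(name))
        if score > best_score:
            best, best_score = name, score
    return best


# Assumed keyword list; a real system would learn or curate this.
FAILURE_KEYWORDS = {"failure", "failed", "leak", "tripped"}


def is_failure(event_text):
    """Event labeling step: flag an entry that mentions a failure keyword."""
    return bool(tokens(event_text) & FAILURE_KEYWORDS)


def label_samples(sample_times, events, horizon):
    """Label a sensor sample 1 if a failure event occurs within `horizon`
    after it, else 0. `events` is a list of (time, text) pairs."""
    failure_times = [t for t, text in events if is_failure(text)]
    return [
        int(any(timedelta(0) <= ft - t <= horizon for ft in failure_times))
        for t in sample_times
    ]


# Made-up example data.
equipment = ["seawater pump A", "gas compressor B"]
mapped = map_to_equipment("Discharge pressure, seawater pump A", equipment)
unmapped = map_to_equipment("Ambient temperature", equipment)

events = [
    (datetime(2017, 9, 1, 12, 0), "Pump tripped on high vibration"),
    (datetime(2017, 9, 2, 8, 0), "Routine inspection completed"),
]
sample_times = [
    datetime(2017, 9, 1, 6, 0),
    datetime(2017, 9, 1, 11, 0),
    datetime(2017, 9, 1, 13, 0),
]
labels = label_samples(sample_times, events, horizon=timedelta(hours=2))
```

The resulting labels line up one-to-one with the sensor samples, which is the shape a supervised learning technique would need: sensor readings as features and the event-derived label as the target.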

Alexandra Gunderson

Arundo Analytics

Alexandra Gunderson is a data scientist at Arundo Analytics. Her background is in mechanical engineering and applied numerical methods.