Traditional big data analytics relies on unsupervised methods to draw insight from large amounts of data. However, this approach is difficult to apply in industries like oil and gas and maritime, where the data describes complex underlying systems (and should thus be grouped accordingly).
For these industry-specific cases, it is vital to limit the input data according to relevance. Finding the pertinent data to solve specific problems currently requires a significant amount of manual curation. For example, in order to develop a predictive model to identify failures on a pump, an engineer would need to sort through process diagrams and sensor lists to find all sensors related to that pump. They would then need to review thousands of text entries to find when failures occurred on that pump. In asset-intensive industries like oil and gas, there can be tens of thousands of sensors streaming from a single rig, with failures and work orders being logged regularly, so this manual selection process is tedious, error-prone, and does not scale.
Alexandra Gunderson shares a comprehensive preprocessing methodology that structures and links data from different sources, converting the IIoT analytics process from an unorganized mammoth to one more likely to generate insight.
There tend to be two major groups of data that need to be properly related in order to generate actionable insight: sensor data and event data (such as logged failures and work orders).
This data should be structured in two dimensions: time and hierarchy. An asset hierarchy describes a major holding, like a rig or a ship, which can be broken down by processes, equipment, and sensors. If events and sensors are linked to a hierarchy, it becomes possible to compare equipment performance across dozens of assets despite their hundreds of thousands of sensors.
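As a rough illustration of the two dimensions, the sketch below stores each sensor and event with a hierarchy path and, for events, a timestamp. It is a minimal example under stated assumptions, not the speaker's implementation: the class names, the path layout (asset, process, equipment, sensor), the tag names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# Hypothetical path layout: (asset, process, equipment[, sensor])
Path = Tuple[str, ...]


@dataclass(frozen=True)
class Sensor:
    path: Path   # full hierarchy path down to the sensor
    tag: str     # raw tag name (made up for this sketch)


@dataclass(frozen=True)
class Event:
    path: Path            # hierarchy node the event is attached to
    timestamp: datetime   # the time dimension
    text: str             # free-text log entry


def is_under(path: Path, prefix: Path) -> bool:
    """True if `path` sits at or below the hierarchy node `prefix`."""
    return path[:len(prefix)] == prefix


def sensors_under(sensors: List[Sensor], prefix: Path) -> List[Sensor]:
    """All sensors at or below a node, e.g. every sensor on one pump."""
    return [s for s in sensors if is_under(s.path, prefix)]


def events_under(events: List[Event], prefix: Path,
                 start: datetime, end: datetime) -> List[Event]:
    """Events at or below a node within the half-open window [start, end)."""
    return [e for e in events
            if is_under(e.path, prefix) and start <= e.timestamp < end]


# Made-up sample records for two rigs with the same kind of pump.
SENSORS = [
    Sensor(("rig_a", "water_injection", "pump_1", "discharge_pressure"), "A-PT-1001"),
    Sensor(("rig_a", "water_injection", "pump_1", "motor_current"), "A-IT-1002"),
    Sensor(("rig_b", "water_injection", "pump_1", "discharge_pressure"), "B-PT-2001"),
]
EVENTS = [
    Event(("rig_a", "water_injection", "pump_1"),
          datetime(2017, 3, 4, 10, 30), "Pump tripped on high vibration"),
    Event(("rig_b", "water_injection", "pump_1"),
          datetime(2017, 5, 1, 8, 0), "Seal replaced"),
]

pump = ("rig_a", "water_injection", "pump_1")
pump_sensors = sensors_under(SENSORS, pump)
pump_events = events_under(EVENTS, pump,
                           datetime(2017, 1, 1), datetime(2017, 4, 1))
```

Because every record carries its path, selecting the sensors and events for one pump is a prefix filter rather than a manual search, and the same query can be repeated for the equivalent pump on another rig.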
Alexandra explains how to build these relationships by using a combination of mapping (using text mining and machine learning to automatically structure equipment, sensors, and events to a hierarchy) and event labeling (using text mining and machine learning to automatically pull keywords from event data and build datasets to be used with the sensors for supervised learning techniques).
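The two steps can be pictured with a deliberately simple stand-in. The sketch below replaces the text mining and machine learning described above with plain token overlap (for mapping) and a keyword lookup (for event labeling); the function names, equipment names, and keyword lists are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a free-text string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_to_equipment(description: str,
                     equipment: Dict[str, str]) -> Optional[str]:
    """Mapping step: pick the equipment whose name overlaps most with a
    sensor or event description (Jaccard similarity on tokens).
    Returns None when nothing overlaps at all."""
    desc = tokens(description)
    best_id, best_score = None, 0.0
    for eq_id, eq_name in equipment.items():
        name = tokens(eq_name)
        union = desc | name
        score = len(desc & name) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = eq_id, score
    return best_id


# Hypothetical keyword lists; a real system would learn or curate these.
FAILURE_KEYWORDS: Dict[str, Set[str]] = {
    "seal_leak": {"leak", "seal"},
    "bearing_failure": {"bearing", "vibration"},
}


def label_event(text: str) -> List[str]:
    """Event labeling step: return every failure label whose keywords
    appear in the event text, sorted for stable output."""
    found = tokens(text)
    return sorted(label for label, keywords in FAILURE_KEYWORDS.items()
                  if found & keywords)


# Made-up equipment register and log entries.
EQUIPMENT = {
    "P-101": "water injection pump 1",
    "C-201": "gas compressor 2",
}

mapped = map_to_equipment("Discharge pressure, water injection pump 1", EQUIPMENT)
labels = label_event("Replaced leaking mechanical seal on pump")
```

The labels produced this way could then be joined to the sensor data for the mapped equipment to form a training set for supervised learning. Exact token matching is crude (it would miss "leaking" if "seal" were absent, for instance), which is one reason the talk points to text mining and machine learning instead.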
Alexandra Gunderson is a data scientist at Arundo Analytics. Her background is in mechanical engineering and applied numerical methods.
©2017, O'Reilly Media, Inc.