Asset-heavy industries, such as oil and gas and maritime, generate tremendous volumes of data in the form of sensor readings, failure reports, and maintenance records. However, because this data sits in siloed infrastructure, industrial leaders in these fields struggle to use all of it and so cannot capitalize on the insights it contains.
Data may be stored in many different places, such as historians, databases, local files on laptops, and even systems onboard the rig or ship, depending on what it has traditionally been used for. This complicates machine learning at scale and forces the data science process to be a case-specific, independent exercise for each analysis. For example, to develop a predictive model that identifies leakage on a compressor, an engineer would need to sort through process diagrams and sensor lists to find all sensors related to that compressor (and to the upstream and downstream equipment). They would then need to review thousands of text entries to find when leakages occurred on this compressor and when each leak was fixed. A single oil rig can have tens of thousands of sensors streaming data while failures and work orders are logged continuously, so this manual selection process is tedious, error-prone, and does not scale.
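The first manual step in that example, finding every sensor that belongs to a given piece of equipment, is the kind of task an unsupervised matcher can speed up. The sketch below is illustrative only and is not presented as the speaker's method: it assumes each sensor has a free-text description, compares that description with each equipment name using character-trigram cosine similarity, and leaves a sensor unassigned when its best match falls below a threshold. The tag names, descriptions, equipment list, and the 0.3 threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_sensors_to_equipment(
    sensors: Dict[str, str],
    equipment: List[str],
    threshold: float = 0.3,  # hypothetical cut-off; would need tuning on real data
) -> Dict[str, Optional[str]]:
    """Assign each sensor tag to the equipment name its description most resembles.

    Assumes `equipment` is non-empty. Sensors whose best similarity falls below
    `threshold` map to None so an engineer can review them instead of
    trusting a weak match.
    """
    equipment_vectors = {name: char_ngrams(name) for name in equipment}
    mapping: Dict[str, Optional[str]] = {}
    for tag, description in sensors.items():
        vector = char_ngrams(description)
        best_name, best_score = max(
            ((name, cosine(vector, ev)) for name, ev in equipment_vectors.items()),
            key=lambda pair: pair[1],
        )
        mapping[tag] = best_name if best_score >= threshold else None
    return mapping


if __name__ == "__main__":
    # Hypothetical sensor tags and descriptions, for illustration only.
    sensors = {
        "20PT1001": "1st stage compressor discharge pressure",
        "20TT1002": "1st stage compressor suction temperature",
        "21LT2001": "inlet separator liquid level",
        "43XX9999": "fire water ring main",
    }
    equipment = ["1st stage compressor", "inlet separator", "export pump"]
    for tag, name in map_sensors_to_equipment(sensors, equipment).items():
        print(tag, "->", name)
```

Character n-grams tolerate abbreviations and spelling variants better than whole-word matching, but a real equipment hierarchy would likely also draw on tag-naming conventions, process diagrams, and engineer review of the sensors left unassigned.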
Alexandra Gunderson details the methodology behind an industry-tested approach that uses machine learning to structure and link data from different sources. The pipeline shortens the path from independent data sources to one coherent dataset by combining unsupervised and semisupervised methods. Alexandra explains how this pipeline has been used in real-world applications to structure tens of thousands of sensors onto an equipment hierarchy, map free text describing events on a ship or oil rig onto that hierarchy, and label these free-text events by failure mode or action taken. She also explores the insights that can be gained once the different data sources have been joined.
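The abstract does not say which semisupervised method labels the free-text events, so the following is only one possible sketch: self-training with a nearest-centroid classifier over bag-of-words vectors. A few hand-labeled work orders seed one centroid per failure mode; each round, unlabeled entries whose cosine similarity to a centroid reaches a hypothetical threshold of 0.3 take that label and are folded into the centroid, so labels can spread to entries that were too dissimilar to the seeds alone. The example texts and labels are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a deliberately simple text representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_train(
    labeled: List[Tuple[str, str]],
    unlabeled: List[str],
    threshold: float = 0.3,  # hypothetical confidence cut-off
    max_rounds: int = 5,
) -> Dict[str, Optional[str]]:
    """Label free-text events by repeatedly adopting confident centroid matches.

    Each round sums the word counts of every entry labeled so far into one
    centroid per label, assigns each remaining entry whose best similarity
    reaches `threshold`, and adds those entries to the training set.
    Assumes `labeled` is non-empty. Entries that never qualify stay None.
    """
    vectors = [bag_of_words(text) for text, _ in labeled]
    labels = [label for _, label in labeled]
    pending = {text: bag_of_words(text) for text in unlabeled}
    assigned: Dict[str, Optional[str]] = {text: None for text in unlabeled}

    for _ in range(max_rounds):
        centroids: Dict[str, Counter] = {}
        for vector, label in zip(vectors, labels):
            centroids.setdefault(label, Counter()).update(vector)

        confident = []
        for text, vector in pending.items():
            label, score = max(
                ((lab, cosine(vector, c)) for lab, c in centroids.items()),
                key=lambda pair: pair[1],
            )
            if score >= threshold:
                confident.append((text, label))
        if not confident:
            break  # nothing new cleared the threshold, so further rounds change nothing

        for text, label in confident:
            assigned[text] = label
            vectors.append(pending.pop(text))
            labels.append(label)
    return assigned


if __name__ == "__main__":
    # Invented work-order texts and failure-mode labels, for illustration only.
    seeds = [
        ("seal leak on compressor", "leakage"),
        ("high vibration on pump bearing", "vibration"),
    ]
    log = [
        "gas leak at compressor seal",
        "gas detected near compressor",
        "bearing vibration alarm",
        "lube oil top up",
    ]
    for text, label in self_train(seeds, log).items():
        print(f"{label!s:>9}  {text}")
```

Here "gas detected near compressor" is too far from the single leakage seed in the first round (similarity 0.25), but it is picked up in the second round once "gas leak at compressor seal" has been absorbed into the leakage centroid; "lube oil top up" matches nothing and stays unlabeled for review. Self-training can also reinforce its own mistakes, so a margin test or human spot checks on newly assigned labels would be sensible in practice.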
Alexandra Gunderson is a data scientist at Arundo Analytics. Her background is in mechanical engineering and applied numerical methods.
©2018, O'Reilly Media, Inc.