Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Machine learning to tackle industrial data fusion

Alexandra Gunderson (Arundo Analytics)

1:50pm–2:30pm Wednesday, March 7, 2018

Big data and data science in the cloud, Data science and machine learning
Location: LL20 A

Secondary topics: Graphs and Time-series


Who is this presentation for?

Data scientists, engineering leaders, and architects

Prerequisite knowledge

Familiarity with machine learning
Experience working in heavy industry (useful but not required)

What you'll learn

Understand machine learning best practices that are specific to heavy industry

Description

Asset-heavy industries, such as oil and gas and maritime, generate tremendous volumes of data in the form of sensor readings, failure logs, and maintenance records. However, because this data sits in siloed infrastructure, industrial leaders struggle to make use of all of it and are thus unable to capitalize on the insights it contains.

Data may be stored in a number of different places (historians, databases, individual laptops, and even onboard the rig or ship), depending on what it has traditionally been used for. This complicates machine learning at scale and forces each analysis to become a case-specific, independent exercise. For example, to develop a predictive model that identifies leakage on a compressor, an engineer would need to sort through process diagrams and sensor lists to find all sensors related to that compressor (and to the upstream and downstream equipment). They would then need to review thousands of text entries to find when leakages occurred on this compressor and when each leakage was fixed. On a single oil rig, tens of thousands of sensors can be streaming while failures and work orders are logged regularly, so this manual selection process is tedious, error-prone, and hard to scale.
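To make the sensor lookup described above concrete, here is a minimal sketch of how it could be automated once the process flow is available as a graph. It is an illustration under stated assumptions, not the pipeline presented in the talk: the equipment names, sensor tags, `FLOW_EDGES`, `SENSORS`, and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical process-flow edges (upstream -> downstream); names are illustrative only.
FLOW_EDGES = [
    ("separator_01", "heat_exchanger_01"),
    ("heat_exchanger_01", "compressor_01"),
    ("compressor_01", "cooler_01"),
    ("cooler_01", "export_pump_01"),
]

# Hypothetical sensor tags attached to each piece of equipment.
SENSORS = {
    "separator_01": ["PT-1001"],
    "heat_exchanger_01": ["TT-2001", "TT-2002"],
    "compressor_01": ["PT-3001", "VT-3002"],
    "cooler_01": ["TT-4001"],
    "export_pump_01": ["FT-5001"],
}


def neighbours_within(equipment, edges, max_hops=1):
    """Return the equipment reachable within max_hops, ignoring flow direction."""
    adjacency = {}
    for up, down in edges:
        adjacency.setdefault(up, set()).add(down)
        adjacency.setdefault(down, set()).add(up)

    hops = {equipment: 0}  # breadth-first search, tracking distance from the start
    queue = deque([equipment])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops)


def relevant_sensors(equipment, edges, sensors, max_hops=1):
    """Collect sensor tags on the equipment and its upstream/downstream neighbours."""
    units = neighbours_within(equipment, edges, max_hops)
    return sorted(tag for unit in units for tag in sensors.get(unit, []))


print(relevant_sensors("compressor_01", FLOW_EDGES, SENSORS))
```

With `max_hops=1` the lookup covers the compressor plus the heat exchanger directly upstream and the cooler directly downstream; raising `max_hops` widens the neighbourhood.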

Alexandra Gunderson details the methodology behind an industry-tested approach that incorporates machine learning to structure and link data from different sources. The working pipeline shortens the time from independent data sources to one coherent dataset using a combination of unsupervised and semisupervised methods. Alexandra explains how this pipeline has been used in real-world applications to structure tens of thousands of sensors onto an equipment hierarchy, map free-text descriptions of events on a ship or oil rig onto that hierarchy, and label these free-text events according to a specific failure mode or action taken. Alexandra also explores the insights that can be gained once the different data sources have been joined.

Topics include:

PDF mining: Mining process and instrumentation diagrams to find how equipment interrelates and build meaningful information models (e.g., this heat exchanger is upstream of the compressor and should thus be considered when modeling compressor failures)
Mapping: Using text mining, clustering, and topic mining to automatically structure equipment, sensors, and events to a hierarchy
Event labeling: Using text mining, clustering, and topic mining to automatically pull keywords from event data and build datasets to be used with the sensors for supervised learning techniques
Label prediction: Using previous labeling and mapping results to reduce the need for human intervention so that the process can run with limited oversight
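As a rough illustration of the mapping and event labeling topics, the sketch below assigns a free-text entry to an equipment node and to a failure mode. It deliberately swaps in simple token overlap (Jaccard similarity) and fixed keyword lists in place of the clustering and topic mining named above, so it should be read as a simplified stand-in rather than the talk's method. The equipment descriptions, keyword sets, threshold, and function names are all hypothetical.

```python
import re

# Hypothetical equipment hierarchy nodes with short descriptions.
EQUIPMENT = {
    "compressor_01": "gas export compressor",
    "heat_exchanger_01": "inlet heat exchanger",
    "pump_01": "seawater lift pump",
}

# Hypothetical keywords per failure mode; in practice these might come from topic mining.
FAILURE_KEYWORDS = {
    "leakage": {"leak", "leakage", "leaking", "seal"},
    "vibration": {"vibration", "vibrating", "imbalance"},
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_to_equipment(text, equipment=EQUIPMENT, threshold=0.2):
    """Return the best-matching equipment id, or None if no match reaches the threshold."""
    words = tokens(text)
    best_id, best_score = None, 0.0
    for eq_id, description in equipment.items():
        score = jaccard(words, tokens(description))
        if score > best_score:
            best_id, best_score = eq_id, score
    return best_id if best_score >= threshold else None


def label_event(text, keywords=FAILURE_KEYWORDS):
    """Return the failure mode with the most keyword hits, or None if there are none."""
    words = tokens(text)
    hits = {mode: len(words & kws) for mode, kws in keywords.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


event = "Found seal leaking on gas export compressor, replaced seal"
print(map_to_equipment(event), label_event(event))
```

Entries that fall below the threshold or match no keyword return `None`, which is where a human reviewer would step in; confirmed labels could then feed the label prediction step.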

Alexandra Gunderson

Arundo Analytics

Alexandra Gunderson is a data scientist at Arundo Analytics. Her background is in mechanical engineering and applied numerical methods.

