Presented By O'Reilly and Cloudera
December 5-6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Industrial big data and sensor time series data: Different but not difficult—Part II

Gopal GopalKrishnan (OSIsoft, LLC.), Chris Soyza (BEARS)
4:15pm–4:55pm Wednesday, December 7, 2016
IoT and intelligent real-time applications
Location: 308/309 Level: Intermediate

Prerequisite Knowledge

  • Familiarity with one or more of the following industries: oil and gas/petroleum refining, chemical and petrochemicals, energy and water/gas/electric utilities, metals and mining, or pharmaceuticals

What you'll learn

  • Understand how to use the components of the big data ecosystem for rapid insights from manufacturing operations management (MOM) data in any of its four key areas: production, maintenance, quality, and inventory


Manufacturers, energy companies, and utilities have been dealing with industrial big data for several decades, ever since analog gauges and pneumatic controls were replaced with digital instruments, PLCs (programmable logic controllers), and SCADA (supervisory control and data acquisition) systems. Sensors on the plant floor have always generated large amounts of data, and while each vendor or piece of equipment may have its own protocol, we can always listen to their “tweets” to capture data and events. As a result, asset-intensive companies in oil and gas, petrochemicals, pulp and paper, power generation, pharmaceuticals, electricity, gas, and water transmission and distribution, and other industries have millions of sensors and large archives of operations data covering several years.

Earlier this year at Strata + Hadoop World in London, Gopal GopalKrishnan highlighted the lessons learned in his 35+ years working with industrial sensor and time series data, explaining how aspects of sensor measurements (time-wave forms) and metadata processing, storage, and interpretation—even during data acquisition—are key to getting meaningful insights.

Picking up where his earlier talk left off, Gopal shares lessons learned from applying components of the big data ecosystem, including HDFS, Hive, Kafka, machine learning, and visual and statistical analysis, to industrial sensor and time series data. He explores use cases in predictive maintenance, energy optimization, process efficiency, production cost reduction, and quality improvement, showing how visual and statistical analytics and machine-learning techniques in particular yield rapid insights. With permission from his customers, Gopal offers specific examples from oil-well drilling, wind turbines, ore mining, the food and beverage industry, and discrete manufacturing.

Along the way, he also discusses key aspects of data preparation required before the data can be used for visual or statistical analysis or machine learning: applying metadata context to the sensor measurements, overcoming data gaps due to sensor time-outs and disconnects, adjusting for time lags in data to account for processing-sequence-induced latency, combining data from different types of sensors and vendors installed at different times, and combining transactional records with time series datasets. This preparation also makes the sensor data suitable for HDFS and Hive and allows rapid iterative exploration. The session also covers data preparation for streaming industrial sensor measurements to systems such as Apache Kafka. Gopal concludes by offering a brief overview of and access to an ETL tool for HDFS, Hive, Kafka, Azure Event Hub, Oracle, and others.
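Three of the preparation steps named above (bridging data gaps, adjusting for time lags, and combining transactional records with time series) can be illustrated with a minimal Python sketch. This is not the ETL tool or the method presented in the session; the function names, the sample values, the 15-second staleness limit, the 20-second lag, and the batch labels are all hypothetical, chosen only to show the idea.

```python
from bisect import bisect_right

def forward_fill(samples, grid, max_age):
    """Resample sorted (time, value) samples onto a regular grid.

    The last known value is carried forward, but only while it is no
    older than max_age; longer gaps (e.g. a sensor disconnect) stay None.
    """
    times = [t for t, _ in samples]
    out = []
    for g in grid:
        i = bisect_right(times, g) - 1  # index of latest sample at or before g
        if i >= 0 and g - times[i] <= max_age:
            out.append((g, samples[i][1]))
        else:
            out.append((g, None))
    return out

def shift_lag(series, lag):
    """Move a downstream series back in time by a known process lag."""
    return [(t - lag, v) for t, v in series]

def asof_join(series, events):
    """Attach to each (time, value) row the most recent event label.

    events is a sorted list of (start_time, label), for example batch
    records from a transactional system.
    """
    starts = [s for s, _ in events]
    out = []
    for t, v in series:
        i = bisect_right(starts, t) - 1
        out.append((t, v, events[i][1] if i >= 0 else None))
    return out

# Hypothetical data: times are seconds from an arbitrary origin.
upstream = [(0, 10.0), (10, 11.0), (40, 12.0)]   # no readings between 10 and 40
downstream = [(20, 5.0), (30, 6.0)]              # sees material 20 s later
batches = [(0, "B-001"), (25, "B-002")]          # transactional batch starts

filled = forward_fill(upstream, range(0, 50, 10), max_age=15)
aligned = shift_lag(downstream, lag=20)
labeled = asof_join(filled, batches)
```

Leaving long gaps as `None` rather than filling them keeps a sensor outage visible to later analysis, and the as-of join gives every measurement the batch context that was active when it was taken.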


Gopal GopalKrishnan

OSIsoft, LLC.

Gopal GopalKrishnan is a solution architect in the Partners & Strategic Alliances group at OSIsoft. Gopal has been working with OSIsoft’s PI System since the mid-1990s in software development, technical and sales support, and field services. Previously, he was a product manager with a focus on enterprise and asset integration and PI data access. Gopal is a registered professional engineer in Pennsylvania. He is a member of the MESA Technical Committee, the Education Committee, and the MESA Continuous Process Industry Special Interest Group. He is active in areas such as big data, data mining, energy efficiency, manufacturing intelligence, and sustainability (including green initiatives in facilities and data centers). Gopal holds a master’s degree in engineering and has completed continuing education in business administration.


Chris Soyza


Chris is a system developer with 18+ years of experience, focused mainly on embedded electronic systems and networked system solutions. He has worked on design solutions from concept and prototyping through to mass manufacturing. His key areas of interest are networked embedded computer systems and backend enterprise solutions for building application technologies. Chris has spent considerable time on the design, development (from scratch), and integration of RFID-based access control systems, SCADA, and BMS solutions for the security, building management, and custom electronics industries. He has owned and managed a business, serving as technical officer for a security systems manufacturing concern. Chris holds a Bachelor of Engineering in Electronics and Control Engineering, a Master of Science in Embedded Systems, and a diploma in IT Infrastructure Management. Presently, he holds the IT Manager position, handling development operations for the SinBerBEST program at the Berkeley Education Alliance for Research in Singapore (BEARS) Limited.