Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

State-of-the-art robot predictive maintenance with real-time sensor data

Mateusz Dymczyk, Mathieu Dumoulin (McKinsey & Company)
11:20am–12:00pm Wednesday, September 27, 2017
Secondary topics: IoT
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Data engineers, data scientists, project managers, and executives working with IoT and Industry 4.0

What you'll learn

  • Learn how to build real-time IoT pipelines by leveraging well-known, standard enterprise big data components such as H2O, TensorFlow, MapR, Kafka, and Spark
  • Explore real-world examples of how to get additional value from an existing IoT sensor data pipeline, along with a practical example of the benefits of a streaming architecture in action


Industry 4.0 IoT applications promise vast gains in productivity from reduced downtime, higher product quality, and higher efficiency. Modern industrial robots integrate hundreds of sensors of all kinds, generating tremendous volumes of data rich in valuable information. However, the reality is that some of the most advanced industrial makers in the world are barely getting started making use of this data, with relatively rudimentary bespoke monitoring systems built at tremendous cost.

It is now possible to successfully deploy Industry 4.0 pilot use cases, using a well-chosen selection of big data enterprise products and open source projects, in a matter of months and at a small fraction of the cost of equivalent projects at leading high-tech makers. Mateusz Dymczyk and Mathieu Dumoulin showcase a working, practical, predictive maintenance pipeline in action and explain how they built a state-of-the-art anomaly detection system using big data frameworks like Spark, H2O, TensorFlow, and Kafka on the MapR Converged Data Platform.

This is an improved version of the pipeline Mateusz and Mathieu demonstrated at Strata Beijing. This pipeline uses data collected from a Bluetooth wireless movement sensor attached to a realistic model of a standard industrial robot.
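
The abstract does not describe the speakers' models in detail. As a rough illustration only of what anomaly detection on a stream of movement-sensor readings can involve, the sketch below uses a generic rolling z-score check. It is not the H2O or LSTM model from the talk, and the class name, window size, threshold, and sample readings are all hypothetical.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flags readings that deviate strongly from a sliding window of recent values.

    Generic technique for illustration; not the speakers' model.
    """

    def __init__(self, window=50, threshold=3.0, min_samples=10):
        self.values = deque(maxlen=window)  # most recent readings only
        self.threshold = threshold          # z-score above which a reading is flagged
        self.min_samples = min_samples      # warm-up period before scoring starts

    def update(self, x):
        """Score x against the current window, then add x to the window."""
        is_anomaly = False
        n = len(self.values)
        if n >= self.min_samples:
            mean = sum(self.values) / n
            std = sqrt(sum((v - mean) ** 2 for v in self.values) / n)
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_anomaly = True
        self.values.append(x)
        return is_anomaly


# Hypothetical acceleration magnitudes; the spike at index 30 stands in for a fault.
readings = [1.0 + 0.01 * (i % 5) for i in range(60)]
readings[30] = 5.0

detector = RollingZScoreDetector(window=20, threshold=3.0)
flagged = [i for i, r in enumerate(readings) if detector.update(r)]
print(flagged)  # prints [30]
```

In a streaming deployment, each call to `update` would be driven by a message arriving from the sensor rather than by a list held in memory.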

Topics include:

  • How to integrate data from a second sensor type
  • Why the overall system predictions are better than models made from either data source taken separately
  • How easy it is to switch to a state-of-the-art LSTM anomaly detection model
  • A comparison with the baseline model

Mateusz Dymczyk

Mateusz Dymczyk is a Tokyo-based software engineer who works as a researcher on machine learning and NLP projects. He works on distributed machine learning projects including the core H2O platform and Sparkling Water, which integrates H2O and Apache Spark. Previously, he worked at Fujitsu Laboratories. Mateusz loves all things distributed and machine learning and hates buzzwords. In his spare time, he participates in the IT community by organizing, attending, and speaking at conferences and meetups. Mateusz holds an MSc in computer science from AGH UST in Krakow, Poland.

Mathieu Dumoulin

McKinsey & Company

Mathieu Dumoulin is a Digital Expert at McKinsey & Company’s Tokyo office, where he advises large enterprises on big data, enterprise architecture, and advanced analytics solutions. His current areas of interest include creating production systems that optimize industrial processes using operational data and real-time IoT sensor data.