Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

IoT and Spark MLlib applications for improving products, services, and manufacturing technologies

Yoshitaka Suzuki (IHI Corporation), Masaru Dobashi (NTT DATA Corporation)
11:15am–11:55am Wednesday, December 7, 2016
IoT and intelligent real-time applications
Location: 308/309 Level: Beginner

Prerequisite Knowledge

  • Basic knowledge about Hadoop and related products

What you'll learn

  • Learn how IHI used PySpark and MLlib to improve its services
  • Understand best practices for application development and operating Spark on YARN


Using ILIPS, a common platform for remote monitoring and maintenance developed in house, IHI has collected data from several kinds of products, including aircraft engines and industrial machinery. To use this data for preventive maintenance and operation optimization, IHI began evaluating Spark in 2014 as a potentially scalable and flexible analytics platform, in collaboration with NTT DATA, an active contributor to Spark. Yoshitaka Suzuki and Masaru Dobashi explain how IHI used PySpark and MLlib to improve its services and share best practices for application development and lessons for operating Spark on YARN.

As a first step, IHI evaluated Spark’s characteristics and processing capabilities for time series data by predicting port congestion using dummy GIS data. As of this year, IHI has started analyzing real data, including sensor data, real GIS data, and system logs. Additionally, IHI has started developing an in-house analytics infrastructure built on Spark and is leveraging Spark MLlib to speed up the development of applications for process improvement, product fault diagnosis, and the formalization of highly skilled expert knowledge. Highly skilled experts using MLlib can now analyze the relationships within large volumes of sensor data, enabling IHI to reduce operating costs by automating processes and to improve knowledge transfer between workers.

Yoshitaka Suzuki

IHI Corporation

Yoshitaka Suzuki is a researcher in information science and technology at IHI Corporation. Yoshitaka has developed anomaly detection algorithms for several kinds of products, such as industrial machines and engines, and is now responsible for utilizing sensor data, developing software for anomaly detection and fault diagnosis, and verifying the practical effectiveness of distributed processing systems. Prior to IHI, he spent four years developing anomaly detection algorithms for machinery systems and social infrastructure at Kozo Keikaku Engineering Inc. Yoshitaka holds an MEng in aeronautics and astronautics from the University of Tokyo.

Masaru Dobashi

NTT DATA Corporation

Masaru Dobashi is a system infrastructure engineer and leads the OSS professional service team at NTT DATA Corporation. In 2009, Masaru developed an enterprise Hadoop cluster consisting of over 1,000 nodes, which was one of the largest Hadoop clusters in Japan at the time. After that, he designed and provisioned several kinds of clusters using non-Hadoop OSS, such as Spark and Storm. Masaru is now responsible for introducing Hadoop, Spark, Storm, and other OSS middleware into enterprise systems and developing data processing systems.