Using ILIPS, a common platform for remote monitoring and maintenance developed in-house, IHI has collected data from several kinds of products, including aircraft engines and industrial machinery. To use the data for preventive maintenance and operation optimization, IHI started evaluating Spark as a potential scalable and flexible analytics platform in 2014, in collaboration with NTT DATA, an active contributor to Spark. Yoshitaka Suzuki and Masaru Dobashi explain how IHI used PySpark and MLlib to improve its services and share best practices for application development and lessons for operating Spark on YARN.
As a first step, IHI evaluated Spark’s characteristics and processing capabilities for time series data by predicting port congestion using dummy GIS data. As of this year, IHI has started analyzing real data, including sensor data, real GIS data, and system logs. Additionally, IHI has started developing an in-house analytics infrastructure built on Spark, leveraging Spark MLlib to get up to speed in developing applications for process improvement, product fault diagnosis, and the formalization of highly skilled expert knowledge. Highly skilled experts using MLlib can now analyze the relationships among a large amount of sensor data, enabling IHI to reduce operating costs by automating processes and to improve the transmission of knowledge between workers.
Yoshitaka Suzuki is a researcher in information science and technology at IHI Corporation. Yoshitaka has developed anomaly detection algorithms for several kinds of products, such as industrial machines and engines, and is now responsible for utilizing sensor data, developing software for anomaly detection and fault diagnosis, and verifying the practical effectiveness of distributed processing systems. Prior to IHI, he spent four years developing anomaly detection algorithms for machinery systems and social infrastructure at Kozo Keikaku Engineering Inc. Yoshitaka holds an MEng in aeronautics and astronautics from the University of Tokyo.
Masaru Dobashi is a system infrastructure engineer and leads the OSS professional service team at NTT DATA Corporation. Masaru developed an enterprise Hadoop cluster consisting of over 1,000 nodes in 2009, which was one of the largest Hadoop clusters in Japan at the time. After that, he designed and provisioned several kinds of clusters using non-Hadoop OSS, such as Spark and Storm. Masaru is now responsible for introducing Hadoop, Spark, Storm, and other OSS middleware into enterprise systems and developing data processing systems.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.