Xiaochang Wu explains how to design and implement a real-time processing platform using the Spark Structured Streaming framework to intelligently transform production lines in the manufacturing industry.
Traditional production lines generate a variety of isolated structured, semistructured, and unstructured data, such as sensor data, machine screen output, log output, and database records. There are two main data scenarios: picture and video data that arrives at low frequency but in large volumes, and continuous data that arrives at high frequency. Although the amount of data per unit is not in itself large, the combined total is very large. This data has many of the characteristics of streaming data: it’s real time, volatile, bursty, disordered, and unbounded. Making effective real-time decisions that extract value from this data is critical to smart manufacturing.
The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to Spark, we are able to build a low-latency, high-throughput, reliable operation system involving data acquisition, transmission, analysis, and storage. This system greatly improves the production process, enabling predictive fault repair and more efficient production line material tracking, and can reduce the labor needed on the production lines by about half.
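To make the "disordered" and unbounded nature of this data concrete, here is a minimal plain-Python sketch of event-time tumbling windows with a watermark, the general idea that Structured Streaming's windowed aggregation with watermarking builds on. It does not use Spark and is not the system described in the talk; the window length, allowed lateness, function names, and the (event_time, sensor_id, value) record layout are all assumptions chosen for illustration, and the logic only loosely mirrors Spark's semantics.

```python
from collections import defaultdict

WINDOW_SECONDS = 10    # tumbling window length (assumed for illustration)
ALLOWED_LATENESS = 5   # how far behind the newest event time a reading may lag (assumed)


def window_start(event_time):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def aggregate(readings):
    """Average sensor values per (sensor_id, window_start).

    readings: iterable of (event_time_seconds, sensor_id, value) in ARRIVAL order,
    which may differ from event-time order.
    Returns (results, dropped): window averages, and readings that arrived after
    their window had already passed the watermark.
    """
    open_windows = defaultdict(list)  # (sensor_id, window_start) -> values seen so far
    results = {}
    dropped = []
    max_event_time = 0

    for event_time, sensor_id, value in readings:
        # The watermark trails the newest event time seen so far.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS

        start = window_start(event_time)
        if start + WINDOW_SECONDS <= watermark:
            # The window this reading belongs to is already closed: too late.
            dropped.append((event_time, sensor_id, value))
        else:
            open_windows[(sensor_id, start)].append(value)

        # Finalize every open window whose end is at or before the watermark.
        closed = [k for k in open_windows if k[1] + WINDOW_SECONDS <= watermark]
        for key in closed:
            values = open_windows.pop(key)
            results[key] = sum(values) / len(values)

    # End of this finite sample: flush whatever is still open.
    for key, values in open_windows.items():
        results[key] = sum(values) / len(values)
    return results, dropped
```

A reading that arrives out of order but within the allowed lateness still lands in its correct window, while one that arrives after its window has closed is set aside rather than silently corrupting a finished result. In a real unbounded stream there is no final flush; windows are only emitted as the watermark advances.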
Xiaochang Wu is a senior software engineer on Intel’s big data engineering team, where he helps deliver the best Spark performance on Intel platforms. Xiaochang has more than 10 years’ experience in performance optimization for Intel architecture. He holds a master’s degree in computer science from Xiamen University of China.
©2017, O'Reilly Media, Inc.