Manufacturing generates about a third of today's data, and the semiconductor industry remains one of the most complex manufacturing environments in existence. The semiconductor manufacturing process automates hundreds of processing steps and tests, which constantly generate enormous amounts of data. Achieving a high-yield process, in which a sufficient portion of chips passes final acceptance testing, is extremely difficult, and the cost of failure in this environment is significant. Moreover, unexpectedly long ramp-up times, required to establish a reliable production process, can significantly undercut the commercial value of the final product, jeopardizing the huge investment in semiconductor manufacturing. Amit Rustagi and Jingwen Ouyang share a Hadoop-based solution that reveals the true value of the manufacturing data generated about every chip.
At SanDisk, all the data generated throughout the manufacturing pipeline, from design to product assembly, is collected in a single, secure location and analyzed. That data comes from groups across the company and traditionally resided in relational databases, NoSQL databases, Microsoft Excel spreadsheets, and more. The Cloudera platform, including Impala, Apache Spark, and Apache Hive, lets users search, query, and analyze their data and also enables entirely new capabilities: advanced analytics, machine learning, and pattern matching at scale across the vast dataset at different stages of the manufacturing process. With this Hadoop-based big data platform, SanDisk has incorporated end-to-end analytics and machine learning into its manufacturing operations, reducing drive errors, predicting failures, and ultimately ensuring superior reliability, quality, and performance of its products. As an in-house solution, it provides the customization and control that SanDisk's unique needs require.
In the manufacturing flow, each chip has its own test DNA, and through big data solutions that DNA can be analyzed to determine its true value. To enable this, data from various stages of the process pipeline is ingested into a Hadoop-based platform. This data has different characteristics in terms of source, format, and usage. Because different kinds of data reside in different locations and are accessed by different groups, the data is isolated and its usefulness limited, so centralizing it is a must. Yet simply loading the data into Hadoop in its original form doesn't help; most likely, the data will not be usable at all. It is essential to develop a standard system that allows customization for each data type and enables easy retrieval for all user groups. Standardization also helps break down separate silo developments and enables faster deployment whenever a solution to a pressing problem is needed.
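The standardization step described above can be sketched as a small normalization layer that maps each source format onto one common record schema before the data lands in Hadoop. This is a minimal illustration only: the field names (`chip_id`, `stage`, `test_name`, `value`) and the two source formats are hypothetical stand-ins, not SanDisk's actual schema or pipeline.

```python
# Sketch of a standardization layer for heterogeneous manufacturing data.
# All field names and formats here are hypothetical illustrations.

import csv
import io
import json

# Target schema every ingested record is normalized to before landing in Hadoop.
STANDARD_FIELDS = ("chip_id", "stage", "test_name", "value")


def from_json(line):
    """Normalize one JSON record, e.g. from a NoSQL export."""
    rec = json.loads(line)
    return {f: rec.get(f) for f in STANDARD_FIELDS}


def from_csv(text):
    """Normalize rows from a CSV/spreadsheet export."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {f: row.get(f) for f in STANDARD_FIELDS}


# Two different sources, one standard output shape:
json_record = from_json(
    '{"chip_id": "C42", "stage": "wafer_sort", '
    '"test_name": "leakage", "value": "0.13"}'
)
csv_records = list(from_csv(
    "chip_id,stage,test_name,value\n"
    "C42,final_test,leakage,0.11\n"
))
```

Because every source is reduced to the same record shape, downstream users can query any stage of the pipeline the same way; in practice such a layer would typically run inside an ingestion job (for example, a Spark job) rather than as standalone functions.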
Once the data is in the Hadoop system, it can help engineers better perform their usual reporting of observations for decision making. In addition, the platform enables advanced analytics such as machine learning. Some use cases and solutions include:
Amit Rustagi is an architect at SanDisk, where he is leading the architecture and strategy for big data solutions. Previously, Amit was an architect at Intuit, where he led the design and strategy of its Financial Aggregation Platform; a principal architect at eBay, where he led the architecture of analytics and experimentation infrastructure; and a senior principal architect for analytics products at Yahoo. He also held a lead role working on Oracle applications at Oracle Corp. Amit has a BS in electronics and communications.
Jingwen Ouyang is a staff big data developer at SanDisk, a Western Digital Brand. Coming from circuit design, Jingwen is uniquely positioned to bridge semiconductor manufacturing processes with big data platforms. Jingwen holds a BS and MEng from the Massachusetts Institute of Technology.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.