Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Big data solutions for analyzing chip DNA in semiconductor manufacturing

Amit Rustagi (SanDisk, Western Digital Brand), Jingwen Ouyang (SanDisk, Western Digital Brand)
11:30am–12:00pm Tuesday, December 6, 2016
Location: 308/309 Level: Intermediate
Average rating: 3.50 (4 ratings)

Prerequisite Knowledge

  • Familiarity with the Hadoop ecosystem

What you'll learn

  • Explore a Hadoop-based solution that reveals the true value and benefits of manufacturing data generated about every chip


Manufacturing generates about a third of today’s data. The semiconductor industry remains one of the most complex environments in existence. The semiconductor manufacturing process involves the automation of hundreds of processing steps and tests, which constantly generate enormous amounts of data. Creating a high-yield process, in which a sufficient portion of chips passes final acceptance testing, is extremely difficult. At the same time, the cost of failure in this environment is significant. Moreover, an unexpectedly long ramp-up to a reliable production process may significantly undercut the commercial value of the final product, jeopardizing the huge investment in semiconductor manufacturing. Amit Rustagi and Jingwen Ouyang share a Hadoop-based solution that reveals the true value and benefits of manufacturing data generated about every chip.

At SanDisk, all the data generated throughout the manufacturing pipeline, from design to product assembly, is collected in a single, secure location and analyzed. The data comes from groups across the company and traditionally resided in relational databases, NoSQL databases, Microsoft Excel spreadsheets, and more. The Cloudera platform, including Impala, Apache Spark, and Apache Hive, allows users to search, query, and analyze their data while also enabling net-new capabilities: advanced analytics, machine learning, and pattern matching at scale across data from different stages of the manufacturing process. With the big data platform using Hadoop, SanDisk has incorporated end-to-end analytics and machine learning into its manufacturing operations, reducing drive errors, predicting failures, and ultimately ensuring superior reliability, quality, and performance of its products. Because the solution is built in-house, it can be customized and controlled to fit SanDisk’s unique needs.

In the manufacturing flow, each chip has its own test DNA. Through big data solutions, that DNA can be analyzed to determine its true value. To enable this, data from various stages in the process pipeline is ingested into a Hadoop-based platform. This data differs in source, format, and usage. Different kinds of data reside in different locations and are accessed by different groups, which isolates the data and limits its use. Centralizing the data is a must. Yet simply putting the data into Hadoop in its original form doesn’t help; most likely, the data will not be usable at all. It is essential to develop a standard system that allows customization for each data type and enables easy retrieval for all user groups. Standardization also helps break down these separate silos and enables faster deployment whenever a pressing problem needs a solution.
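
As a rough illustration of what such standardization can look like, the Python sketch below maps records from two differently formatted sources into one common record shape. It is not the SanDisk implementation: the common fields (chip_id, stage, test_name, value), the CSV column names, the JSON layout, and the sample values are all assumptions made up for this example.

```python
import csv
import io
import json

# Hypothetical common schema; these field names are assumptions for illustration.
COMMON_FIELDS = ("chip_id", "stage", "test_name", "value")


def normalize_csv(text, stage):
    """Map rows of a hypothetical CSV export (columns: ChipID, Test, Result)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"chip_id": row["ChipID"], "stage": stage,
         "test_name": row["Test"], "value": float(row["Result"])}
        for row in reader
    ]


def normalize_json(text, stage):
    """Map a hypothetical JSON document: {"chip": ..., "measurements": {name: value}}."""
    doc = json.loads(text)
    return [
        {"chip_id": doc["chip"], "stage": stage,
         "test_name": name, "value": float(value)}
        for name, value in doc["measurements"].items()
    ]


# Invented sample inputs standing in for two source systems.
csv_export = "ChipID,Test,Result\nC001,leakage_uA,0.42\n"
json_export = '{"chip": "C001", "measurements": {"vth_V": 0.71}}'

# Both sources end up as records with the same keys, keyed by the same chip ID.
records = (normalize_csv(csv_export, "wafer_sort")
           + normalize_json(json_export, "final_test"))
```

Once every source is reduced to the same record shape, measurements for one chip taken at different stages can be retrieved together regardless of where they originated.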

Once the data is in the Hadoop system, it helps engineers with their usual reporting of observations for decision making. In addition, the platform supports advanced analytics such as machine learning. Some use cases and solutions include:

  • Functional/parametric yield measurement: Using data to understand the random and systematic defects that cause yield loss, and performing root cause analysis through correlation.
  • Design of experiments (DOE): Experiments are planned to gather relevant data efficiently and validate hypotheses. Predictive models are built from the data and evaluated for decision making.
  • Early defect detection and intervention: Wafer map analysis plots spatial patterns of passing and failing dies.
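
To make the yield and wafer map use cases concrete, here is a minimal Python sketch that computes functional yield (passing dies divided by tested dies) and renders a text wafer map. The die coordinates and pass/fail outcomes are invented for illustration, and the function names are hypothetical; production wafer maps would be built from real test data and usually drawn graphically.

```python
# Hypothetical die-level results: (x, y) die coordinates -> True (pass) / False (fail).
# All values are invented for illustration only.
die_results = {
    (1, 0): True, (2, 0): True,
    (0, 1): True, (1, 1): False, (2, 1): True, (3, 1): True,
    (0, 2): True, (1, 2): False, (2, 2): False, (3, 2): True,
    (1, 3): True, (2, 3): True,
}


def functional_yield(results):
    """Return the fraction of tested dies that passed."""
    if not results:
        raise ValueError("no die results")
    return sum(results.values()) / len(results)


def render_wafer_map(results):
    """Return a text wafer map: '.' = pass, 'X' = fail, ' ' = no die at that site."""
    max_x = max(x for x, _ in results)
    max_y = max(y for _, y in results)
    rows = []
    for y in range(max_y + 1):
        row = ""
        for x in range(max_x + 1):
            if (x, y) not in results:
                row += " "
            else:
                row += "." if results[(x, y)] else "X"
        rows.append(row)
    return "\n".join(rows)


print(f"yield: {functional_yield(die_results):.2f}")
print(render_wafer_map(die_results))
```

In this made-up example the three failing dies sit next to each other, the kind of spatial cluster that can point to a systematic rather than a random defect.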

Amit Rustagi

SanDisk, Western Digital Brand

Amit Rustagi is an architect at SanDisk, where he is leading the architecture and strategy for big data solutions. Previously, Amit was an architect at Intuit, where he led the design and strategy of its Financial Aggregation Platform; a principal architect at eBay, where he led the architecture of analytics and experimentation infrastructure; and a senior principal architect for analytics products at Yahoo. He also held a lead role working on Oracle applications at Oracle Corp. Amit has a BS in electronics and communications.


Jingwen Ouyang

SanDisk, Western Digital Brand

Jingwen Ouyang is a staff big data developer at SanDisk, a Western Digital Brand. Coming from circuit design, Jingwen is uniquely positioned to bridge semiconductor manufacturing processes with big data platforms. Jingwen holds a BS and MEng from the Massachusetts Institute of Technology.