Semiconductor manufacturing is a complicated production process. It contains more than 400 steps and measures more than 1,000 chemical or physical factors, corresponding to more than 3,500 different recipes. Controlling these factors during the process is the key to the final yield, which represents the quality of the product. The huge amount of data generated by the tools, together with the data from the MES and FDC systems, plays a critical role when the process produces a failed or defective output.
Capacity is the most important issue, since a huge amount of data is generated by the tools and by the FDC/WIP and MES systems. To cope with this explosion in data, the relational database keeps only six months of data. Even so, the data still reaches 30 TB, which is not a big deal in the big data world but soon becomes a heavy load for a relational database.
Each table contains more than 400 columns to record all the parameters measured by the tools. This makes the data harder to query, especially when it must be correlated with other tables in the MES system, because the aggregation makes the query more complicated. As a result, the row key and the schema must be designed very precisely for the data that is to be migrated to the big data platform.
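To make the row key concern concrete, here is a minimal sketch of one common pattern for stores that keep rows sorted by key: a composite key of tool, reversed timestamp, and wafer. The session abstract does not describe the actual key design, so the field names (`tool_id`, `measured_at`, `wafer_id`), the `|` delimiter, and the helper functions are all hypothetical, and the "store" is simulated with a sorted Python list.

```python
from datetime import datetime, timezone

# Assumption: timestamps fit in 13 decimal digits of milliseconds.
MAX_MILLIS = 10**13 - 1


def make_row_key(tool_id: str, measured_at: datetime, wafer_id: str) -> bytes:
    """Build a hypothetical composite row key: tool | reversed timestamp | wafer.

    Subtracting the timestamp from a fixed maximum makes newer measurements
    sort before older ones within the same tool prefix.
    Assumes tool_id and wafer_id never contain the '|' delimiter.
    """
    millis = int(measured_at.timestamp()) * 1000 + measured_at.microsecond // 1000
    reversed_ts = MAX_MILLIS - millis
    return f"{tool_id}|{reversed_ts:013d}|{wafer_id}".encode("ascii")


def prefix_scan(sorted_keys, tool_id: str):
    """Simulate a prefix scan over lexicographically sorted keys."""
    prefix = f"{tool_id}|".encode("ascii")
    return [key for key in sorted_keys if key.startswith(prefix)]


if __name__ == "__main__":
    t1 = datetime(2016, 5, 1, 0, 0, tzinfo=timezone.utc)
    t2 = datetime(2016, 5, 1, 0, 5, tzinfo=timezone.utc)
    keys = sorted([
        make_row_key("ETCH01", t1, "W01"),
        make_row_key("ETCH01", t2, "W02"),
        make_row_key("CVD07", t1, "W03"),
    ])
    for key in prefix_scan(keys, "ETCH01"):
        print(key.decode("ascii"))  # newest ETCH01 measurement is listed first
```

The design choice illustrated is that the leading key component decides which queries are cheap: with the tool first, "latest readings for one tool" is a short contiguous scan, while a query across all tools for one time window would touch every prefix.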
Performance is the other problem that needs to be improved. The existing system uses a summary table that precalculates the data to serve queries that need aggregation. There are several disadvantages to this methodology. The aggregation lacks flexibility, since the data is already fixed in the summary table and the user cannot instantly filter the data by choice. If such filtering is needed, the data must be queried from the original data table, which might be a huge dataset. In this case, performance is not satisfactory for frontend applications.
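The trade-off between a precalculated summary table and on-demand aggregation can be sketched in a few lines. This is an illustration only: the rows are made-up toy data, and the column names (`tool`, `recipe`, `day`, `pressure`) and both functions are hypothetical, not taken from the system described in the session.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy measurements standing in for the original data table.
raw_rows = [
    {"tool": "ETCH01", "recipe": "R1", "day": "2016-05-01", "pressure": 10.0},
    {"tool": "ETCH01", "recipe": "R2", "day": "2016-05-01", "pressure": 14.0},
    {"tool": "ETCH01", "recipe": "R1", "day": "2016-05-02", "pressure": 12.0},
    {"tool": "CVD07", "recipe": "R1", "day": "2016-05-01", "pressure": 20.0},
]


def build_summary(rows):
    """Precalculate mean pressure per (tool, day), like a summary table.

    The recipe column is aggregated away, so it can no longer be filtered on.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["tool"], row["day"])].append(row["pressure"])
    return {key: mean(values) for key, values in groups.items()}


def aggregate_on_demand(rows, group_by, **filters):
    """Scan the raw rows, apply arbitrary equality filters, then aggregate."""
    groups = defaultdict(list)
    for row in rows:
        if all(row[col] == value for col, value in filters.items()):
            groups[tuple(row[col] for col in group_by)].append(row["pressure"])
    return {key: mean(values) for key, values in groups.items()}


summary = build_summary(raw_rows)
by_recipe = aggregate_on_demand(raw_rows, ("tool", "day"), recipe="R1")

if __name__ == "__main__":
    print(summary[("ETCH01", "2016-05-01")])    # mean over both recipes
    print(by_recipe[("ETCH01", "2016-05-01")])  # mean over recipe R1 only
```

The summary lookup is a cheap dictionary access but can only answer the question it was built for, while the on-demand version answers any filter at the cost of a full scan of the raw rows. On a table of the size described above, that scan is the step that needs a platform able to parallelize it.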
Rebecca Tien Yu Lin and Mon-Fong Mike Jiang offer an overview of a Hadoop-based big data solution that helps the semiconductor industry increase yield by monitoring the huge amount of tool logs and the data generated from the FDC system. Rebecca and Mike share their experiences implementing the solution and explain why Hadoop is an appropriate fit for the data in the semiconductor industry. They also cover how they make use of the data to find possible root causes of yield loss.
Rebecca Tien Yu Lin is a director at is-land Systems Inc., where she leads the application team in helping customers establish big data systems and introduce HareDB solutions to their sites. Rebecca draws on her strong knowledge of the Hadoop ecosystem and its applications to provide professional services for her customers. Rebecca has more than eight years’ experience in project execution related to the semiconductor industry and has demonstrated innovative professional skill with a proven ability to identify, analyze, and solve problems to increase customer satisfaction.
Mike Jiang is a vice president at is-land Systems Inc. Mike has over 15 years of experience developing data analysis software, especially for semiconductor engineering data. He has led technical teams in providing system integration, professional service, and system development for customers in over 200 projects. His research interests include expert systems, machine learning, parallel and distributed systems, and data mining. He has published 17 international journal and conference papers and 16 workshop papers. Mike is also the author of 23 Chinese programming and software application books.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.