Identifies faulty wafers before they are used to fabricate integrated circuits or, in photovoltaics, to manufacture solar cells.

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon it. The project aims to identify the state of a given wafer by classifying it into one of two classes: +1 (good, can be used as a substrate) or -1 (bad, the substrate needs to be replaced). The model is then retrained on this data so that it keeps updating itself as conditions change and generalizes better over time. To this end, training and prediction datasets are provided to build a machine learning classification model that predicts wafer quality.

Data Description:

The columns of the provided data fall into three groups: wafer name, sensor values, and label. The wafer name contains the batch number of the wafer, whereas the sensor values are obtained from measurements carried out on the wafer. The label column contains two unique values, +1 and -1, that identify whether the wafer is good or needs to be replaced. In addition, a schema file is required; it contains all the relevant information about the training files, such as the file names, the length of the date value in the file name, the length of the time value in the file name, the number of columns, the column names, and their datatypes.

Directory creation:

All the necessary folders are created to keep the files cleanly separated, so that the end user can access them easily.

Data Validation:

In this step, the dataset is checked against the provided schema file: the file names, the number of columns, the column names, and their datatypes must all match. Files that match the schema are considered good files, on which we can train the model or run predictions; files that do not match are moved to the bad folder. We also identify the columns with null values. If all the data in a column is missing, the file is also moved to the bad folder. If only a fraction of the data in a column is missing, the missing values are initially filled with NaN and the file is treated as good data.
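The checks described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the SCHEMA dictionary, the file name pattern wafer_<date>_<time>.csv, the column names, and the function names are all assumptions made for the example, and the datatype check is left out for brevity.

```python
import re

# Hypothetical schema; the real project reads these values from its schema file.
SCHEMA = {
    "date_length": 8,
    "time_length": 6,
    "column_names": ["Wafer", "Sensor-1", "Sensor-2", "Output"],
}


def filename_is_valid(name, schema):
    """Check the name against an assumed pattern: wafer_<date>_<time>.csv."""
    pattern = r"wafer_\d{%d}_\d{%d}\.csv" % (
        schema["date_length"], schema["time_length"])
    return re.fullmatch(pattern, name, flags=re.IGNORECASE) is not None


def validate_file(name, header, rows, schema):
    """Return ("good", cleaned_rows) or ("bad", reason) for one parsed file.

    `rows` is a list of lists; a missing value is an empty string or None.
    """
    if not filename_is_valid(name, schema):
        return "bad", "file name does not match the schema"
    if header != schema["column_names"]:
        return "bad", "column names or column count do not match the schema"
    for idx, col in enumerate(header):
        if all(row[idx] in ("", None) for row in rows):
            return "bad", "column %r is entirely missing" % col
    # Partially missing values are kept and marked as NaN for later imputation.
    cleaned = [[float("nan") if v in ("", None) else v for v in row]
               for row in rows]
    return "good", cleaned
```

A caller would move files that come back as "bad" to the bad folder and pass the cleaned rows of "good" files on to the database step.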

Data Insertion in Database:

First, a connection to the database is opened if the database exists; otherwise a new database is created. A table named train_good_raw_dt or pred_good_raw_dt, depending on whether the process is training or prediction, is created in the database to hold the good data files obtained from the data validation step. If the table already exists, the new files are inserted into it, since training should cover both new and old training files. Finally, the data stored in the database is exported as a CSV file to be used for model training.
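The insertion and export steps might look like the sketch below. The database engine is not named here, so SQLite (via Python's built-in sqlite3 module) is an assumption, as are the function names and the choice to store every column as TEXT.

```python
import csv
import sqlite3


def insert_good_rows(conn, table, columns, rows):
    """Create the table if it does not exist, then append the rows to it."""
    col_defs = ", ".join('"%s" TEXT' % c for c in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, col_defs))
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        'INSERT INTO "%s" VALUES (%s)' % (table, placeholders), rows)
    conn.commit()


def export_table_to_csv(conn, table, csv_path):
    """Write the whole table, header included, to a CSV file."""
    cursor = conn.execute('SELECT * FROM "%s"' % table)
    header = [description[0] for description in cursor.description]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor)


# sqlite3.connect opens the database if it exists and creates it otherwise.
# ":memory:" keeps the example self-contained; use a file path in practice.
conn = sqlite3.connect(":memory:")
```

Because the table is only created when missing, calling insert_good_rows again with a later batch appends to the existing rows, which matches the goal of training on old and new files together.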

Data Pre-processing and Model Training:

In the training section, the data is first checked for NaN values in the columns. If any are present, they are imputed using the KNN imputer. Columns with zero standard deviation are also identified and removed, as they provide no information during model training. A prediction schema is created from the remaining dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected from the elbow plot; to select the number of clusters dynamically, we use the "KneeLocator" function. The idea behind clustering is to allow a different algorithm to be trained on the data of each cluster. The KMeans model is trained on the pre-processed data and saved for later use in prediction. After the clusters are created, we find the best model for each cluster. We use four algorithms: Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, all four algorithms are run with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. In this way, the best model is selected for each cluster. The models for every cluster are saved so that they can be used in future predictions. Finally, the confusion matrix of the model associated with every cluster is also saved to give a quick view of the models' performance.
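The training steps above can be sketched with scikit-learn as follows. This is an illustration under several assumptions: the function names and parameter grids are hypothetical, only two of the four algorithms are shown (K-Neighbours and XGBoost would be added to CANDIDATES in the same way), and the elbow is located with a simple geometric rule that stands in for KneeLocator so that the example needs no extra dependency.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def preprocess(features):
    """Impute NaN values with KNN, then drop constant columns.

    Assumes no column is entirely NaN (validation rejects such files).
    """
    imputed = pd.DataFrame(
        KNNImputer(n_neighbors=3).fit_transform(features),
        columns=features.columns, index=features.index)
    # A column with a single distinct value has zero standard deviation.
    return imputed.loc[:, imputed.nunique() > 1]


def choose_n_clusters(X, max_k=8):
    """Pick k at the elbow of the within-cluster sum of squares (WCSS) curve.

    Stand-in for KneeLocator: the elbow is the point farthest below the
    straight line joining the first and last points of the curve.
    Requires max_k <= number of samples.
    """
    ks = np.arange(1, max_k + 1)
    wcss = np.array([
        KMeans(n_clusters=int(k), n_init=10, random_state=0).fit(X).inertia_
        for k in ks])
    # Normalise both axes to [0, 1] so the comparison is scale independent.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (wcss - wcss.min()) / (wcss.max() - wcss.min())
    return int(ks[np.argmax(1.0 - x - y)])


# Hypothetical parameter grids, kept small for the example.
CANDIDATES = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]}),
}


def best_model_for_cluster(X, y):
    """Grid-search each candidate and keep the one with the best test AUC.

    The cluster must contain both classes (+1 and -1).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    best_name, best_model, best_auc = None, None, -1.0
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=3)
        search.fit(X_train, y_train)
        auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
        if auc > best_auc:
            best_name, best_model, best_auc = name, search.best_estimator_, auc
    return best_name, best_model, best_auc
```

In a full pipeline, choose_n_clusters would be called on the pre-processed features, KMeans would be fitted with the chosen k, and best_model_for_cluster would then be called once per cluster on the rows assigned to it.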

Prediction:

In the prediction stage, the essential directories are created first. The data validation, data insertion and data processing steps are similar to those in the training section. The KMeans model created during training is loaded and used to predict the cluster of each row of the pre-processed prediction data. Based on the cluster number, the respective model is loaded and used to predict the data for that cluster. Once predictions have been made for all the clusters, the predictions, along with the wafer names, are saved in a CSV file at a given location.
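The per-cluster routing can be sketched as below. The function name, the toy data, and the wafer names are made up for the example, and the models are fitted in memory rather than loaded from disk as the project does.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def predict_by_cluster(kmeans, models, X, wafer_names):
    """Route each row to the model of its cluster; return (name, label) pairs."""
    clusters = kmeans.predict(X)
    predictions = np.empty(len(X), dtype=int)
    for cluster_id in np.unique(clusters):
        mask = clusters == cluster_id
        predictions[mask] = models[int(cluster_id)].predict(X[mask])
    return list(zip(wafer_names, predictions.tolist()))


# Toy data: two well-separated groups, each containing both labels.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(8, 1, (40, 2))])
group_centre = np.where(X_train[:, 0] > 4, 8, 0)
y_train = np.where(X_train[:, 1] > group_centre, 1, -1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
models = {}
for cluster_id in range(2):
    mask = kmeans.labels_ == cluster_id
    models[cluster_id] = LogisticRegression().fit(X_train[mask], y_train[mask])

names = ["Wafer-%d" % i for i in range(5)]  # hypothetical wafer names
results = predict_by_cluster(kmeans, models, X_train[:5], names)
```

The resulting list of (wafer name, label) pairs is what would then be written to the prediction CSV file.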

Retraining:

After prediction, the prediction data is merged with the previous training dataset, and the models are retrained on the combined data using the hyperparameter values obtained from GridSearch. This cycle repeats with every prediction run, so the model keeps learning from newly acquired data, which is intended to make it more robust over time.
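A retraining step along these lines might be written as follows. It is only a sketch: the function name and the label column name "Output" are assumptions, and it presumes the new rows already carry a label, since how those labels are obtained is not specified here.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier


def retrain(model, best_params, old_data, new_data, label_col="Output"):
    """Merge old and new data, then refit a fresh copy of the model.

    `best_params` are the hyperparameters found earlier by grid search,
    so no new search is run here.
    """
    merged = pd.concat([old_data, new_data], ignore_index=True)
    refreshed = clone(model).set_params(**best_params)
    refreshed.fit(merged.drop(columns=[label_col]), merged[label_col])
    return refreshed, merged
```

Reusing the stored hyperparameters keeps each retraining run cheap; the trade-off is that the parameters are not re-tuned as the dataset grows.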

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.