To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Exploiting Linksys WRT54G using a vulnerability I found.

Exploiting Linksys WRT54G Exploit # Install the requirements. pip install -r requirements.txt ROUTER_HOST=192.169.1.1 ROUTER_USERNAME=admin ROUTER_P

Elon Gliksberg 31 May 29, 2022
Notes on the Deep Learning book from Ian Goodfellow, Yoshua Bengio and Aaron Courville (2016)

The Deep Learning Book - Goodfellow, I., Bengio, Y., and Courville, A. (2016) This content is part of a series following the chapter 2 on linear algeb

hadrienj 1.7k Jan 07, 2023
Data derived from the OpenType specification

This package currently provides the opentypespec.tags module, which exports FEATURE_TAGS, SCRIPT_TAGS, LANGUAGE_TAGS and BASELINE_TAGS dictionaries, representing data from the Layout Tag Registry

Simon Cozens 4 Dec 01, 2022
Python data loader for Solar Orbiter's (SolO) Energetic Particle Detector (EPD).

Data loader (and downloader) for Solar Orbiter/EPD energetic charged particle sensors EPT, HET, and STEP. Supports level 2 and low latency data provided by ESA's Solar Orbiter Archive.

Jan Gieseler 9 Dec 16, 2022
Free APN For Python

Free APN For Python

XENZI GANZZ 4 Apr 22, 2022
OWASP Foundation Web Respository

WWWGrep OWASP Foundation Web Respository Author: Mark Deen & Aditi Mohan Introduction WWWGrep is a rapid search “grepping” mechanism that examines HTM

OWASP 34 Jun 15, 2022
Repositório do programa ConstruDelas - Trilha Python - Módulos 1 e 2

ConstruDelas - Introdução ao Python Nome: Visão Geral Bem vinda ao repositório do curso ConstruDelas, módulo de Introdução ao Python. Aqui vamos mante

WoMakersCode 8 Oct 14, 2022
Project issue to website data transformation toolkit

braintransform Project issue to website data transformation toolkit. Introduction The purpose of these scripts is to be able to dynamically generate t

Brainhack 1 Nov 19, 2021
Herramienta para poder automatizar reuniones en Zoom.

Crear Reunión Zoom con Python Herramienta para poder automatizar reuniones en Zoom. Librerías Requeridas Nombre Comando PyAutoGui pip install pyautogu

JkDev 3 Nov 12, 2022
Commodore 64 OS running on Atari 8-bit hardware

This is the Commodre 64 KERNAL, modified to run on the Atari 8-bit line of computers. They're practically the same machine; why didn't someone try this 30 years ago?

Nick Bensema 133 Nov 12, 2022
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.

ThinkDSP LaTeX source and Python code for Think DSP: Digital Signal Processing in Python, by Allen B. Downey. The premise of this book (and the other

Allen Downey 3.2k Jan 08, 2023
PyLaboratory 0 Feb 07, 2022
Library for managing git hooks

Autohooks Library for managing and writing git hooks in Python. Looking for automatic formatting or linting, e.g., with black and pylint, while creati

Greenbone 165 Dec 16, 2022
A practice program to find the LCM i.e Lowest Common Multiplication of two numbers using python without library.

Finding-LCM-using-python-from-scratch Here, I write a practice program to find the LCM i.e Lowest Common Multiplication of two numbers using python wi

Sachin Vinayak Dabhade 4 Sep 24, 2021
AMTIO aka All My Tools in One

AMTIO AMTIO aka All My Tools In One. I plan to put a bunch of my tools in this one repo since im too lazy to make one big tool. Installation git clone

osintcat 3 Jul 29, 2021
Example applications, dashboards, scripts, notebooks, and other utilities built using Polygon.io

Polygon.io Examples Example applications, dashboards, scripts, notebooks, and other utilities built using Polygon.io. Examples Preview Name Type Langu

Tim Paine 4 Jun 01, 2022
Curso de Python 3 do Básico ao Avançado

Curso de Python 3 do Básico ao Avançado Desafio: Buscador de arquivos Criar um programa que faça a pesquisa de arquivos. É fornecido o caminho e um te

Diego Guedes 1 Jan 21, 2022
Jogo em redes similar ao clássico pedra papel e tesoura

Batalha Tática Tecnologias de Redes de Computadores-A-N-JOGOS DIGITAIS Professor Fabio Henrique Cabrini Alunos: Eric Henrique de Oliveira Silva - RA 1

Eric Henrique de Oliveira Silva 1 Dec 01, 2021
Code emulator plugin for IDA Pro

emu_ida Code emulator plugin for IDA Pro (v 0.0.6) The plugin is designed for simple data decryption and getting stack strings. Requirements Emulator

Andrey Zhdanov 11 Jul 06, 2022
Enjoy Discords Unlimited Storage

Discord Storage V.3.5 (Beta) Made by BoKa Enjoy Discords free and unlimited storage... Prepare: Clone this from Github, make sure there either a folde

0 Dec 16, 2021