Handwritten Character Recognition using CNN

Problem Definition

The main objective of this project is handwritten character recognition: a multi-class image classification problem in which the task is to correctly recognize a given handwritten character, which can be a digit (0-9) or a capital letter (A-Z).

Optical character recognition (OCR) is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. It remains an open problem in computer vision and deep learning: it looks easy but is hard to implement, and despite many advances in both fields, 100% accuracy has not yet been achieved.

This project targets an easier problem than proper handwriting recognition. Here, the objective is to recognize separate characters rather than cursive handwriting.

Since image processing and neural network training are computationally heavy, and the training set is large, this project also explores training the network on a GPU via CUDA.

Analysis

The problem is approached using Convolutional Neural Networks (CNNs) and coded in Python. The framework used is PyTorch, an open-source machine learning library based on the Torch library, primarily developed by Facebook's AI Research lab and used for applications such as computer vision and natural language processing.

Two datasets have been combined to form the training data. The first is the MNIST dataset, containing 60,000 images of handwritten digits. The second is the Kaggle A-Z dataset (by Sachin Patel), a modified version of the NIST Special Database 19. It contains 3,72,450 images of handwritten letters (A-Z) in CSV format, making it easy to load and pre-process. Both datasets contain grayscale (1-channel) images of shape 28x28.
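As a rough sketch of how two such CSV datasets might be merged into a single 36-class dataset: the code below assumes (this is not stated in the project) that each row stores the label in its first column followed by 784 pixel values, that the letters are labelled 0-25, and that they are shifted to classes 10-35. The function name combine_datasets is hypothetical.

```python
import numpy as np

def combine_datasets(digit_rows, letter_rows):
    """Merge digit and letter rows into images (N, 28, 28) and labels (N,).

    Assumed layout (not confirmed by the project): column 0 holds the label,
    columns 1..784 hold the pixel values of a flattened 28x28 image.
    """
    digit_rows = np.asarray(digit_rows)
    letter_rows = np.asarray(letter_rows)

    digit_labels = digit_rows[:, 0].astype(np.int64)
    # Assumption: letters A-Z are labelled 0-25, so shift them to 10-35
    letter_labels = letter_rows[:, 0].astype(np.int64) + 10

    # Stack the pixel columns and restore the 28x28 image shape
    pixels = np.concatenate([digit_rows[:, 1:], letter_rows[:, 1:]], axis=0)
    images = pixels.reshape(-1, 28, 28).astype(np.uint8)
    labels = np.concatenate([digit_labels, letter_labels])
    return images, labels
```

The rows could come from, for example, pandas.read_csv(path).to_numpy(); whether each file carries a header row should be checked before loading.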

The model follows a CNN architecture with convolutional layers for feature extraction, pooling layers for downsampling, dropout layers for regularization (to prevent overfitting) and finally fully connected layers for classifying the images. The model has slightly more than 5 million trainable parameters.
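The following is a minimal sketch of a network with the same kinds of layers. The class name CharCNN, the layer sizes and the dropout rate are all assumptions chosen for illustration; this sketch has far fewer parameters than the project's model and is not its actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharCNN(nn.Module):
    """Illustrative CNN for 28x28 grayscale characters (36 classes)."""

    def __init__(self, num_classes=36):
        super().__init__()
        # Convolutional layers for feature extraction (sizes are assumptions)
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.dropout = nn.Dropout(0.25)
        # Fully connected layers for classification
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # (N, 32, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))   # (N, 64, 7, 7)
        x = self.dropout(torch.flatten(x, 1))  # (N, 3136)
        x = self.dropout(F.relu(self.fc1(x)))
        # Return log-probabilities, one per class
        return F.log_softmax(self.fc2(x), dim=1)
```

Calling CharCNN()(torch.zeros(4, 1, 28, 28)) returns a tensor of shape (4, 36), one row of log-probabilities per input image.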

The model uses the Negative Log Likelihood loss function, which is commonly used for classification tasks when the model outputs log-probabilities. The optimizer is Adam, an adaptive-learning-rate optimizer that often converges faster than plain SGD.

The model outputs a log-probability for each class. The class with the highest log-probability is taken as the prediction for the image.
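To make the relationship between log-probabilities, the loss and the predicted class concrete, here is a small sketch using made-up scores for two images. The scores and class indices are hypothetical; only the use of log-probabilities, NLL loss and the maximum comes from the description above.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores for a batch of 2 images over 36 classes
logits = torch.zeros(2, 36)
logits[0, 7] = 5.0    # first image: class 7 scores highest
logits[1, 12] = 5.0   # second image: class 12 scores highest

# Log-probabilities, as produced by the model's final layer
log_probs = F.log_softmax(logits, dim=1)
targets = torch.tensor([7, 12])

# NLL loss averages -log_probs[i, targets[i]] over the batch
loss = F.nll_loss(log_probs, targets)

# Predicted class = index of the largest log-probability
predictions = log_probs.argmax(dim=1)
```

Here predictions holds the indices 7 and 12, and the loss is small because the correct classes already carry most of the probability mass.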

This model is not meant for cursive handwriting. It is meant to classify only single capital English letters (A-Z) and digits (0-9).

To achieve good accuracy, and taking advantage of the abundant training data, a moderately complex architecture comprising several convolutional and dense layers has been constructed. To minimize training time on this architecture, the model has been trained on a GPU via PyTorch's CUDA API.

Implementation and Testing

As stated earlier, the project is implemented in Python and the CNN model is built using PyTorch. The input images for training are stored in the inputs folder. The training script is stored in the src folder, while the code for testing the model is in a Jupyter Notebook in the notebooks folder. Any custom images to be tested can be placed in the custom_images folder. The trained model weights are stored in the models folder.

For training, an NVIDIA GeForce GTX 1660 Ti GPU with 6 GB of memory was used. The code automatically detects whether CUDA is available and trains on the GPU if so; otherwise it uses the CPU.
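A common PyTorch idiom for this kind of automatic device selection looks like the sketch below. The variable names are illustrative and not taken from the project's training script.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and models) are moved to the selected device with .to(device)
batch = torch.zeros(64, 1, 28, 28).to(device)
```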

image

The above code first wraps the data inside a Dataset class, as required by PyTorch's DataLoader. Then, the data is split into training and validation sets (4,00,000 and 32,451 examples respectively). Finally, both datasets are passed to a DataLoader.
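A minimal sketch of these three steps is given here, using small random stand-in data instead of the real images. The class name CharDataset and the use of random_split are assumptions; the project may perform the split differently.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class CharDataset(Dataset):
    """Wraps image and label tensors so a DataLoader can index them."""

    def __init__(self, images, labels):
        self.images = images      # float tensor, shape (N, 1, 28, 28)
        self.labels = labels      # long tensor, shape (N,)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

# Small random stand-in data (the project's real dataset is far larger)
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 36, (1000,))
dataset = CharDataset(images, labels)

# The project splits 4,00,000 / 32,451; the stand-in is split 900 / 100
train_set, val_set = random_split(dataset, [900, 100])

# Shuffle only the training data; batch size 64 as in the project
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
```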

image

Then, the above code defines the CNN architecture used in this project; its layers are described earlier. It also sets the optimizer to Adam and the device to CUDA for training the model on the GPU.

image

The training process first obtains the current batch from the PyTorch DataLoader. The batch size is set to 64, i.e. 64 images are passed to the model in a single iteration for efficient computation; it can be increased depending on the memory and other computing resources available. Then, if CUDA is available, the data (images and the corresponding labels) is transferred to the GPU. The outputs are calculated using the current weights of the network, and the loss is computed with the Negative Log Likelihood loss function. A backward pass then computes the gradients via backpropagation, and the Adam optimizer adjusts the weights of the model accordingly. This process is repeated for 2 epochs over the entire training set, so the model processes 2 x 4,00,000 = 8,00,000 examples in total (2 x 6,250 = 12,500 iterations at a batch size of 64). Since the training set is huge, training is observed to be much faster on a GPU than on a CPU.
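The loop described above can be sketched as follows. To keep the example self-contained, it trains a tiny stand-in model on random data; the stand-in model, the data, the function name train and the default Adam learning rate are assumptions, while the batch size, loss function, optimizer and number of epochs follow the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and data (assumptions); the project trains its CNN instead
model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 36), nn.LogSoftmax(dim=1)
).to(device)
data = TensorDataset(torch.rand(256, 1, 28, 28), torch.randint(0, 36, (256,)))
train_loader = DataLoader(data, batch_size=64, shuffle=True)

# Adam with its default learning rate (the project's value is not stated)
optimizer = torch.optim.Adam(model.parameters())

def train(model, loader, optimizer, device, epochs=2):
    """Run the training loop and return the loss of every iteration."""
    model.train()
    losses = []
    for epoch in range(epochs):
        for images, labels in loader:
            # Move the current batch to the GPU if one is available
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()                 # clear old gradients
            log_probs = model(images)             # forward pass
            loss = F.nll_loss(log_probs, labels)  # negative log likelihood
            loss.backward()                       # backpropagation
            optimizer.step()                      # Adam weight update
            losses.append(loss.item())
    return losses

losses = train(model, train_loader, optimizer, device)
```

With 256 stand-in examples and a batch size of 64, each epoch runs 4 iterations, so the returned list holds 8 loss values.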

image

For testing on the validation set, the data is again first transferred to the GPU (if available). The outputs are then calculated by passing the inputs to the model. The model outputs log-probabilities; the class with the maximum value is taken as the output label.
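A sketch of such a validation pass is shown here, again with a stand-in model and random data so that it runs on its own. The function name evaluate and the stand-in objects are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return the classification accuracy of `model` over `loader`."""
    model.eval()                      # disable dropout during evaluation
    correct, total = 0, 0
    with torch.no_grad():             # gradients are not needed for testing
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            log_probs = model(images)
            predictions = log_probs.argmax(dim=1)   # highest log-probability
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and validation data (assumptions, not the project's)
model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 36), nn.LogSoftmax(dim=1)
).to(device)
val_data = TensorDataset(torch.rand(128, 1, 28, 28), torch.randint(0, 36, (128,)))
val_loader = DataLoader(val_data, batch_size=64)

accuracy = evaluate(model, val_loader, device)
```

The returned value is a fraction between 0 and 1; an untrained stand-in model on random labels will score close to chance.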

Testing on custom images is a bit more complex, since most modern cameras take high-resolution RGB (3-channel) pictures. First, the images are reduced from 3 channels to 1 channel (i.e. from RGB to grayscale). If an image has a very high resolution (greater than 1500 pixels), Gaussian blurring is applied to smooth it. Then, the images are resized to 28x28 pixels (since the model was trained on 28x28 images). Custom images normally have a white background (white paper) and black ink, but the model was trained on images with a black background and white ink, so the colours of all images are inverted. Then, to sharpen the image and remove noise, all pixels with a value above 127 are set to 255 (white) and those below 127 are set to 0 (black), i.e. the image is converted to pure black and white. Finally, the transformations applied to the training images are applied to these images too: pixel values are divided by 255, normalized and converted to PyTorch tensors. Predictions are then made using these tensors. PyTorch DataLoaders are not used when testing the model on individual images.

image

Original image:

image

Pre-processed image:

image

For best results, the custom images should have little noise (the background must be as clean as possible), and the ink must be thick, preferably from a sketch pen rather than a regular gel or ball pen (thin ink combined with high resolution leads to a poor-quality image when resized to 28x28). The provided custom images were taken with a mobile camera at a resolution of 3472x4624. The digits were written with a black marker on a whiteboard.

The model achieves an overall training accuracy of 98.2% and a validation accuracy of 98%. Since the difference is small, the model does not appear to be overfitting noticeably. The results can be further improved through techniques like image augmentation, regularization, building a deeper architecture and getting more training data.

Summary

In this project, a CNN model with more than 5 million parameters was successfully trained to recognize single handwritten capital English letters (A-Z) and digits (0-9). The model achieves a satisfactory accuracy on the dataset and performs reasonably well on custom images. Performance on custom images can be improved through the steps described earlier. Further, training was significantly faster on a GPU than on a CPU. This model classifies only single characters. To classify a complete line of text consisting of both letters and digits (in non-cursive form), this program can be extended with OpenCV's functionality and pre-built object detection models to detect where the text is written, isolate the characters and classify each of them separately.

References

• Official PyTorch documentation - https://pytorch.org/tutorials/
• Notes from Stanford’s course CS231n - https://cs231n.github.io/
• ThinkAutomation article on why handwriting recognition is difficult for AI - https://www.thinkautomation.com/bots-and-ai/why-is-handwriting-recognition-so-difficult-for-ai/
• OpenCV tutorials - https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_table_of_contents_imgproc/py_table_of_contents_imgproc.html

Links to Datasets Used

• MNIST: https://www.kaggle.com/oddrationale/mnist-in-csv
• Modified NIST Special Database 19: https://www.kaggle.com/sachinpatel21/az-handwritten-alphabets-in-csv-format

Owner
Mohit Kaushik