Handwritten_Text_Recognition

Overview

Deep Learning framework for Line-level Handwritten Text Recognition

Short presentation of our project

  1. Introduction

  2. Installation
    2.a Install conda environment
    2.b Download databases

    • IAM dataset
    • ICFHR 2014 dataset
  3. How to use
    3.a Make predictions on unlabelled data using our best networks
    3.b Train and test a network from scratch
    3.c Test a model without retraining it

  4. References

  5. Contact

1. Introduction

This work was an internship project under Mathieu Aubry's supervision, at the LIGM lab, located in Paris.

In HTR, the task is to predict a transcript from an image of a handwritten text. A commonly used structure for this task is Convolutional Recurrent Neural Networks (CRNN). One CRNN network consists of a feature extractor (often with convolutional layers), followed by a recurrent network (LSTM).

This github provides a framework to train and test CRNN networks on handwritten grayscale line-level datasets. This github also provides code to generate predictions on an unlabelled, line-level, grayscale line-level dataset. There are several options for the structure of the CRNN used, image preprocessing, dataset used, data augmentation.

alt text

2. Installation

Prerequisites

Make sure you have Anaconda installed (version >= to 4.7.10, you may not be able to install correct dependencies if older). If not, follow the installation instructions provided at https://docs.anaconda.com/anaconda/install/.

Also pull the git.

2.a Download and activate conda environment

Once in the git folder on your machine, run the command lines :

conda env create -f HTR_environment.yml
conda activate HTR 

2.b Download databases

You will only need to download these databases if you want to train your own network from scratch. The framework is built to train a network on one of these 2 datasets : IAM and ICFHR2014 HTR competition. [ADD REF TO SLIDES]

  • Before downloading IAM dataset, you need to register on this website. Once that's done, you need to download :

    • The 'lines' folder at this link.
    • The 'split' folder at this link.
    • The 'lines.txt' file at this link.
  • For ICFHR2014 dataset, you need to download the 'BenthamDatasetR0-GT' folder at this link.

Make sure to download the two databases in the same folder. Structure must be

Your data folder / 
    IAM/
        lines.txt
        lines/
        split/
            trainset.txt
            testset.txt
            validationset1.txt
            validationset2.txt
            
    ICFHR2014/
        BenthamDatasetR0-GT/ 

    Your own dataset/

3. How to use

3.a Make predictions on your own unlabelled dataset

Running this code will use model stored at model_path to make predictions on images stored in data_path. The predictions will be stored in predictions.txt in data_path folder.

python lines_predictor.py --data_path datapath  --model_path ./trained_networks/IAM_model_imgH64.pth --imgH 64

/!\ Make sure that each image in the data folder has a unique file name and all images are in .jpg form. When you use our trained model with imgH as 64 (i.e. IAM_model_imgH64.pth), you have to set the argument --imgH as 64.

3.b Train a network from scratch

python train.py --dataset dataset  --tr_data_path data_dir --save_model_path path

Before running the code, make sure that you change ROOT_PATH variable at the beginning of params.py to the path of the folder you want to save your models in. Main arguments :

  • --dataset: name of the dataset to train and test on. Supported values are ICFHR2014 and IAM.
  • --tr_data_path: location of the train dataset folder on local machine. See section [??] for downloading datasets.
  • --save_model_path: path of the folder where model will be saved if params.save is set to True.

Main learning arguments :

  • --data_aug: If set to True, will apply random affine data transformation to the training images.

  • --optimizer: Which optimizer to use. Supported values are rmsprop, adam, adadelta, and sgd. We recommend using RMSprop, which got best results in our experiments. See params.py for optimizer-specific parameters.

  • --epochs : Number of training epochs

  • --lr: Learning rate at the beginning of training.

  • --milestones: List of the epochs at which the learning rate will be divided by 10.

  • feat_extractor: Structure to use for the feature extractor. Supported values are resnet18, custom_resnet, and conv.

    • resnet18 : standard structure of resnet18.
    • custom_resnet: variant of resnet18 that we tuned for our experiments.
    • conv: Use this option if you want to use a purely convolutional feature extractor and not a residual one. See conv parameters in params.py to choose conv structure.

3.c Test a model without retraining it

Running this code will compute the average CER and WER of model stored at pretrained_model path on the testing set of chosen dataset.

python train.py --train '' --save '' --pretrained_model model_path --dataset dataset --tr_data_path data_path 

Main arguments :

  • --pretrained_model: path to state_dict of pretrained model.
  • --dataset: Which dataset to test on. Supported values are ICFHR2014 and IAM.
  • --tr_data_path: path to the dataset folder (see section [??])

4. References

Graves et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
Sánchez et al. A set of benchmarks for Handwritten Text Recognition on historical documents
Dutta et al. Improving CNN-RNN Hybrid Networks for Handwriting Recognition

U.-V. Marti, H. Bunke The IAM-database: an English sentence database for offline handwriting recognition

https://github.com/Holmeyoung/crnn-pytorch
https://github.com/georgeretsi/HTR-ctc
Synthetic line generator : https://github.com/monniert/docExtractor (see paper for more information)

5. Contact

If you have questions or remarks about this project, please email us at [email protected] and [email protected].

Multi-choice answer sheet correction system using computer vision with opencv & python.

Multi choice answer correction 🔴 5 answer sheet samples with a specific solution for detecting answers and sheet correction. 🔴 By running the soluti

Reza Firouzi 7 Mar 07, 2022
Comparison-of-OCR (KerasOCR, PyTesseract,EasyOCR)

Optical Character Recognition OCR (Optical Character Recognition) is a technology that enables the conversion of document types such as scanned paper

21 Dec 25, 2022
Assignment work with webcam

work with webcam : Press key 1 to use emojy on your face Press key 2 to use lip and eye on your face Press key 3 to checkered your face Press key 4 to

Hanane Kheirandish 2 May 31, 2022
Automatically download multiple papers by keywords in CVPR

CVFPaperHelper Automatically download multiple papers by keywords in CVPR Install mkdir PapersToRead cd PaperToRead pip install requests tqdm git clon

46 Jun 08, 2022
Balabobapy - Using artificial intelligence algorithms to continue the text

Balabobapy - Using artificial intelligence algorithms to continue the text

qxtony 1 Feb 04, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
An easy to use an (hopefully useful) captcha solution for pyTelegramBotAPI

pyTelegramBotCAPTCHA An easy to use and (hopefully useful) image CAPTCHA soltion for pyTelegramBotAPI. Installation: pip install pyTelegramBotCAPTCHA

29 Dec 26, 2022
Ackermann Line Follower Robot Simulation.

Ackermann Line Follower Robot This is a simulation of a line follower robot that works with steering control based on Stanley: The Robot That Won the

Lucas Mazzetto 2 Apr 16, 2022
Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

Juanma Coria 185 Jan 01, 2023
Rubik's Cube in pygame with OpenGL

Rubik Rubik's Cube in pygame with OpenGL The script show on the screen a Rubik Cube buit with OpenGL. Then I have also implemented all the possible mo

Gabro 2 Apr 15, 2022
This is a passport scanning web service to help you scan, identify and validate your passport created with a simple and flexible design and ready to be integrated right into your system!

Passport-Recogniton-System This is a passport scanning web service to help you scan, identify and validate your passport created with a simple and fle

Mo'men Ashraf Muhamed 7 Jan 04, 2023
CRAFT-Pyotorch:Character Region Awareness for Text Detection Reimplementation for Pytorch

CRAFT-Reimplementation Note:If you have any problems, please comment. Or you can join us weChat group. The QR code will update in issues #49 . Reimple

453 Dec 28, 2022
TextBoxes re-implement using tensorflow

TextBoxes-TensorFlow TextBoxes re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modified ba

Gu Xiaodong 44 Dec 29, 2022
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
SRA's seminar on Introduction to Computer Vision Fundamentals

Introduction to Computer Vision This repository includes basics to : Python Numpy: A python library Git Computer Vision. The aim of this repository is

Society of Robotics and Automation 147 Dec 04, 2022
How to detect objects in real time by using Jupyter Notebook and Neural Networks , by using Yolo3

Real Time Object Recognition From your Screen Desktop . In this post, I will explain how to build a simply program to detect objects from you desktop

Ruslan Magana Vsevolodovna 2 Sep 28, 2022
Read Japanese manga inside browser with selectable text.

mokuro Read Japanese manga with selectable text inside a browser. See demo: https://kha-white.github.io/manga-demo mokuro_demo.mp4 Demo contains excer

Maciej Budyś 170 Dec 27, 2022
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

MTLFace This repository contains the PyTorch implementation and the dataset of the paper: When Age-Invariant Face Recognition Meets Face Age Synthesis

Hzzone 120 Jan 05, 2023
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu

Zhao,Xiaohui 147 Dec 20, 2022