CNN+LSTM+CTC based OCR implemented using tensorflow.

Overview

CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow.

Note: there is No restriction on the number of characters in the image (variable length). Have a look at the image bellow.

I trained a model with 100k images using this code and got 99.75% accuracy on test dataset (200k images) in the competition. The images in both dataset:

Update 2017.11.6:

The competiton page is not available now, if you want to reproduce this result, please see this issue about dataset, the lable file (a .txt file) is in the same folder with images after extracting .tar.gz file.

Update 2018.4.24:

Update to tensorflow 1.7 and fix some bugs reported at issue #8.

Structure

The images are first processed by a CNN to extract features, then these extracted features are fed into a LSTM for character recognition.

The architecture of CNN is just Convolution + Batch Normalization + Leaky Relu + Max Pooling for simplicity, and the LSTM is a 2 layers stacked LSTM, you can also try out Bidirectional LSTM.

You can play with the network architecture (add dropout to CNN, stacked layers of LSTM etc.) and see what will happen. Have a look at CNN part and LSTM part.

Prerequisite

  1. Python 3.6.4

  2. TensorFlow 1.2

  3. Opencv3 (Not a must, used to read images).

How to run

There are many other parameters with which you can play, have a look at utils.py.

Note that the num_classes is not added to parameters talked above for clarification.

# cd to the your workspace.
# The code will evaluate the accuracy every validation_steps specified in parameters.

ls -R
  .:
  imgs  utils.py  helper.py  main.py  cnn_lstm_otc_ocr.py

  ./imgs:
  train  infer  val  labels.txt
  
  ./imgs/train:
  1.png  2.png  ...  50000.png
  
  ./imgs/val:
  1.png  2.png  ...  50000.png

  ./imgs/infer:
  1.png  2.png  ...  300000.png
   
  
# Train the model.
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=../imgs/train/ \
  --val_dir=../imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./main.py --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer

Run with your own data.

  1. Prepare your data, make sure that all images are named in format: id_label.jpg, e.g: 004_(1+4)*2.jpg.
# make sure the data path is correct, have a look at helper.py.

python helper.py
  1. Run following How to run
Owner
Watson Yang
Watson Yang
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 07, 2023
A curated list of papers, code and resources pertaining to image composition

A curated list of resources including papers, datasets, and relevant links pertaining to image composition.

BCMI 391 Dec 30, 2022
Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022) Code release of our paper ROCA. Check out our video, paper, and website

123 Dec 25, 2022
Some codes from PyImageSearch course's and external projects.

πŸ‘¨β€πŸ’» Some codes and projects πŸ‘¨β€πŸ’» πŸ’‘ Technologies πŸ“œ Projects πŸ“ Chrome Dinosaur Controller πŸ“¦ Script πŸ“ Coins Counter πŸ“¦ Script πŸ€“ Author Lucas Biv

Lucas Bivar 25 Oct 24, 2021
A pkg stiching around view images(4-6cameras) to generate bird's eye view.

AVP-BEV-OPEN Please check our new work AVP_SLAM_SIM A pkg stiching around view images(4-6cameras) to generate bird's eye view! View Demo Β· Report Bug

Xinliang Zhong 37 Dec 01, 2022
Handwritten_Text_Recognition

Deep Learning framework for Line-level Handwritten Text Recognition Short presentation of our project Introduction Installation 2.a Install conda envi

24 Jul 15, 2022
Morphological edge detection or object's boundary detection using erosion and dialation in OpenCV python

Morphologycal-edge-detection-using-erosion-and-dialation the task is to detect object boundary using erosion or dialation . Here, use the kernel or st

Tamzid hasan 3 Nov 25, 2022
Single Shot Text Detector with Regional Attention

Single Shot Text Detector with Regional Attention Introduction SSTD is initially described in our ICCV 2017 spotlight paper. A third-party implementat

Pan He 215 Dec 07, 2022
One Metrics Library to Rule Them All!

onemetric Installation Install onemetric from PyPI (recommended): pip install onemetric Install onemetric from the GitHub source: git clone https://gi

Piotr Skalski 49 Jan 03, 2023
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej BudyΕ› 327 Jan 01, 2023
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Bailando Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory" [Paper] | [Project Page] | [Vi

Li Siyao 237 Dec 29, 2022
A Vietnamese personal card OCR website built with Django.

Django VietCardOCR Installation Creation of virtual environments is done by executing the command venv: python -m venv venv That will create a new fol

Truong Hoang Thuan 4 Sep 04, 2021
SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

NVIDIA Research Projects 31 Nov 22, 2022
Ocular is a state-of-the-art historical OCR system.

Ocular Ocular is a state-of-the-art historical OCR system. Its primary features are: Unsupervised learning of unknown fonts: requires only document im

228 Dec 30, 2022
Repository for playing the computer vision apps: People analytics on Raspberry Pi.

play-with-torch Repository for playing the computer vision apps: People analytics on Raspberry Pi. Tools Tested Hardware RasberryPi 4 Model B here, RA

eMHa 1 Sep 23, 2021
πŸ“· Face Recognition using Haar-Cascade Classifier, OpenCV, and Python

Face-Recognition-System Face Recognition using Haar-Cascade Classifier, OpenCV and Python. This project is based on face detection and face recognitio

1 Jan 10, 2022
Controlling the computer volume with your hands // OpenCV

HandsControll-AI Controlling the computer volume with your hands // OpenCV Step 1 git clone https://github.com/Hayk-21/HandsControll-AI.git pip instal

Hayk 1 Nov 04, 2021
Document blur detection based on Laplacian operator and text detection.

Document Blur Detection For general blurred image, using the variance of Laplacian operator is a good solution. But as for the blur detection of docum

JoeyLr 5 Oct 20, 2022
Characterizing possible failure modes in physics-informed neural networks.

Characterizing possible failure modes in physics-informed neural networks This repository contains the PyTorch source code for the experiments in the

Aditi Krishnapriyan 55 Jan 02, 2023
Reference Code for AAAI-20 paper "Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels"

Reference Code for AAAI-20 paper "Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels" Please refer to htt

Ke Sun 1 Feb 14, 2022