CNN+LSTM+CTC based OCR implemented using TensorFlow.

Last update: Dec 08, 2022

Overview

CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR (Optical Character Recognition) implemented using TensorFlow.

Note: there is no restriction on the number of characters in the image (variable length). Have a look at the image below.

I trained a model on 100k images using this code and got 99.75% accuracy on the test dataset (200k images) in the competition. The images in both datasets:

Update 2017.11.6:

The competition page is no longer available. If you want to reproduce this result, please see this issue about the dataset. The label file (a .txt file) is in the same folder as the images after extracting the .tar.gz file.

Update 2018.4.24:

Updated to TensorFlow 1.7 and fixed some bugs reported in issue #8.

Structure

The images are first processed by a CNN to extract features, then the extracted features are fed into an LSTM for character recognition.
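To illustrate how CTC lets the recognizer emit labels of any length, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not taken from this repository: the function name `ctc_greedy_decode`, the toy alphabet, and the per-frame scores are assumptions made up for the example. The idea is to take the most likely class at each time step, merge consecutive repeats, and drop the blank class.

```python
def ctc_greedy_decode(frame_scores, blank_index):
    """Collapse per-frame predictions into a label sequence (best-path decoding)."""
    decoded = []
    previous = None
    for scores in frame_scores:
        # Pick the most likely class for this time step.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Keep a class only if it is not blank and differs from the previous frame.
        if best != blank_index and best != previous:
            decoded.append(best)
        previous = best
    return decoded


# Toy alphabet (hypothetical); the index after the last character is the CTC blank.
alphabet = ["1", "+", "4"]
blank = len(alphabet)

frames = [
    [0.9, 0.0, 0.0, 0.1],  # "1"
    [0.8, 0.1, 0.0, 0.1],  # "1" again, merged with the previous frame
    [0.1, 0.0, 0.0, 0.9],  # blank
    [0.0, 0.9, 0.0, 0.1],  # "+"
    [0.0, 0.0, 0.7, 0.3],  # "4"
    [0.0, 0.0, 0.1, 0.9],  # blank
]
text = "".join(alphabet[i] for i in ctc_greedy_decode(frames, blank))
print(text)  # 1+4
```

Six frames collapse to three characters here, so the output length is decided by the predictions rather than fixed in advance. A repeated character in the label needs a blank frame between its two occurrences.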

For simplicity, the CNN is just Convolution + Batch Normalization + Leaky ReLU + Max Pooling, and the LSTM is a 2-layer stacked LSTM. You can also try a bidirectional LSTM.
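The max-pooling layers determine how many time steps the LSTM receives, which in turn bounds the longest label CTC can produce. The following is a rough sketch of that arithmetic, not the repository's code. It assumes four pooling layers with stride 2 and 'SAME' padding, and that each column of the final feature map becomes one LSTM time step; check the CNN part of the code for the actual layer count and reshape. The image size and `out_channels` come from the training command in How to run.

```python
import math


def feature_map_shape(height, width, num_pool_layers, pool_stride=2):
    """Return (height, width) after repeated strided pooling with 'SAME' padding."""
    for _ in range(num_pool_layers):
        # 'SAME' padding rounds the output size up.
        height = math.ceil(height / pool_stride)
        width = math.ceil(width / pool_stride)
    return height, width


# Values from the training command: image_height=60, image_width=180, out_channels=64.
# ASSUMPTION: four pooling layers; the real count is defined in the CNN code.
feat_h, feat_w = feature_map_shape(60, 180, num_pool_layers=4)
time_steps = feat_w              # ASSUMPTION: one LSTM step per feature-map column
features_per_step = feat_h * 64  # column height times channel count
print(time_steps, features_per_step)  # 12 256
```

Under these assumptions a 60x180 image yields 12 time steps, so a label longer than 12 characters could not be aligned; a wider input or fewer pooling layers would raise that limit.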

You can play with the network architecture (add dropout to the CNN, stack more LSTM layers, etc.) and see what happens. Have a look at the CNN part and the LSTM part.

Prerequisite

Python 3.6.4
TensorFlow 1.2
OpenCV 3 (optional, used to read images).

How to run

There are many other parameters you can play with; have a look at utils.py.

Note, for clarity, that num_classes is not one of the parameters mentioned above.
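As a rough illustration of what num_classes usually covers in a CTC setup: TensorFlow's CTC loss reserves the largest class index for the blank label, so num_classes is the number of distinct characters plus one. The sketch below is not the repository's definition. The character set is hypothetical (guessed from the example file name in Run with your own data), and `encode_label` and `decode_label` are illustrative names; the value used by this code may differ.

```python
# ASSUMPTION: a hypothetical character set for arithmetic expressions.
charset = "0123456789+-*()"

# Map each character to an integer label and back.
encode_map = {ch: i for i, ch in enumerate(charset)}
decode_map = {i: ch for ch, i in encode_map.items()}

# One extra class for the CTC blank, which takes the largest index.
num_classes = len(charset) + 1
blank_index = num_classes - 1


def encode_label(text):
    """Convert a label string to a list of integer class ids."""
    return [encode_map[ch] for ch in text]


def decode_label(ids):
    """Convert class ids back to a string, skipping the blank."""
    return "".join(decode_map[i] for i in ids if i != blank_index)


ids = encode_label("(1+4)*2")
print(num_classes, ids, decode_label(ids))
```

If you change the set of characters in your own data, num_classes has to change with it, otherwise the output layer and the labels no longer match.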

# cd to your workspace.
# The code will evaluate the accuracy every validation_steps specified in parameters.

ls -R
  .:
  imgs  utils.py  helper.py  main.py  cnn_lstm_otc_ocr.py

  ./imgs:
  train  infer  val  labels.txt
  
  ./imgs/train:
  1.png  2.png  ...  50000.png
  
  ./imgs/val:
  1.png  2.png  ...  50000.png

  ./imgs/infer:
  1.png  2.png  ...  300000.png
   
  
# Train the model.
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ \
  --val_dir=./imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./main.py --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer

Run with your own data.

Prepare your data and make sure that all images are named in the format id_label.jpg, e.g. 004_(1+4)*2.jpg.
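A minimal sketch of how a name in this format can be split into its id and label, assuming the id never contains an underscore. The function `parse_image_name` is a hypothetical helper written for this example and is not the code in helper.py.

```python
import os


def parse_image_name(path):
    """Split 'id_label.ext' into (id, label) at the first underscore."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    image_id, sep, label = stem.partition("_")
    if not sep or not label:
        raise ValueError("expected a name like id_label.jpg, got %r" % path)
    return image_id, label


print(parse_image_name("imgs/train/004_(1+4)*2.jpg"))  # ('004', '(1+4)*2')
```

Splitting at the first underscore keeps labels that themselves contain underscores intact, and stripping only the final extension keeps labels that contain dots.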

# Make sure the data path is correct; have a look at helper.py.

python helper.py

Then follow the steps in How to run.


Owner

Watson Yang
