Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition

Overview

CRNN_Tensorflow

This is a TensorFlow implementation of a Deep Neural Network for scene text recognition. It is mainly based on the paper "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition". You can refer to the paper for architecture details. Thanks to the author Baoguang Shi.

The model consists of a CNN stage extracting features which are fed to an RNN stage (Bi-LSTM) and a CTC loss.

Installation

This software has been developed on Ubuntu 16.04(x64) using python 3.5 and TensorFlow 1.12. Since it uses some recent features of TensorFlow it is incompatible with older versions.

The following methods are provided to install dependencies:

Conda

You can create a conda environment with the required dependencies using:

conda env create -f crnntf-env.yml

Pip

Required packages may be installed with

pip3 install -r requirements.txt

Testing the pre-trained model

Evaluate the model on the synth90k dataset

In this repo you will find a model pre-trained on the Synth 90kdataset. When the tfrecords file of synth90k dataset has been successfully generated you may evaluated the model by the following script

The pretrained crnn model weights on Synth90k dataset can be found here

python tools/evaluate_shadownet.py --dataset_dir PATH/TO/YOUR/DATASET_DIR 
--weights_path PATH/TO/YOUR/MODEL_WEIGHTS_PATH
--char_dict_path PATH/TO/CHAR_DICT_PATH 
--ord_map_dict_path PATH/TO/ORD_MAP_PATH
--process_all 1 --visualize 1

If you set visualize true the expected output during evaluation process is

evaluate output

After all the evaluation process is done you should see some thing like this:

evaluation_result

The model's main evaluation index are as follows:

Test Dataset Size: 891927 synth90k test images

Per char Precision: 0.974325 without average weighted on each class

Full sequence Precision: 0.932981 without average weighted on each class

For Per char Precision:

single_label_accuracy = correct_predicted_char_nums_of_single_sample / single_label_char_nums

avg_label_accuracy = sum(single_label_accuracy) / label_nums

For Full sequence Precision:

single_label_accuracy = 1 if the prediction result is exactly the same as label else 0

avg_label_accuracy = sum(single_label_accuracy) / label_nums

Part of the confusion matrix of every single char looks like this:

evaluation_confusion_matrix

Test the model on the single image

If you want to test a single image you can do it with

python tools/test_shadownet.py --image_path PATH/TO/IMAGE 
--weights_path PATH/TO/MODEL_WEIGHTS
--char_dict_path PATH/TO/CHAR_DICT_PATH 
--ord_map_dict_path PATH/TO/ORD_MAP_PATH

Test example images

Example test_01.jpg

Example image1

Example test_02.jpg

Example image2

Example test_03.jpg

Example image3

Training your own model

Data preparation

Download the whole synth90k dataset here And extract all th files into a root dir which should contain several txt file and several folders filled up with pictures. Then you need to convert the whole dataset into tensorflow records as follows

python tools/write_tfrecords 
--dataset_dir PATH/TO/SYNTH90K_DATASET_ROOT_DIR
--save_dir PATH/TO/TFRECORDS_DIR

During converting all the source image will be scaled into (32, 100)

Training

For all the available training parameters, check global_configuration/config.py, then train your model with

python tools/train_shadownet.py --dataset_dir PATH/TO/YOUR/TFRECORDS
--char_dict_path PATH/TO/CHAR_DICT_PATH 
--ord_map_dict_path PATH/TO/ORD_MAP_PATH

If you wish, you can add more metrics to the training progress messages with --decode_outputs 1, but this will slow training down. You can also continue the training process from a snapshot with

python tools/train_shadownet.py --dataset_dir PATH/TO/YOUR/TFRECORDS
--weights_path PATH/TO/YOUR/PRETRAINED_MODEL_WEIGHTS
--char_dict_path PATH/TO/CHAR_DICT_PATH --ord_map_dict_path PATH/TO/ORD_MAP_PATH

If you has multiple gpus in your local machine you may use multiple gpu training to access a larger batch size input data. This will be supported as follows

python tools/train_shadownet.py --dataset_dir PATH/TO/YOUR/TFRECORDS
--char_dict_path PATH/TO/CHAR_DICT_PATH --ord_map_dict_path PATH/TO/ORD_MAP_PATH
--multi_gpus 1

The sequence distance is computed by calculating the distance between two sparse tensors so the lower the accuracy value is the better the model performs. The training accuracy is computed by calculating the character-wise precision between the prediction and the ground truth so the higher the better the model performs.

Tensorflow Serving

Thanks for Eldon's contribution of tensorflow service function:)

Since tensorflow model server is a very powerful tools to serve the DL model in industry environment. Here's a script for you to convert the checkpoints model file into tensorflow saved model which can be used with tensorflow model server to serve the CRNN model. If you can not run the script normally you may need to check if the checkpoint file path is correct in the bash script.

bash tfserve/export_crnn_saved_model.sh

To start the tensorflow model server you may check following script

bash tfserve/run_tfserve_crnn_gpu.sh

There are two different ways to test the python client of crnn model. First you may test the server via http/rest request by running

python tfserve/crnn_python_client_via_request.py ./data/test_images/test_01.jpg

Second you may test the server via grpc by running

python tfserve/crnn_python_client_via_grpc.py

Experiment

The original experiment run for 2000000 epochs, with a batch size of 32, an initial learning rate of 0.01 and exponential decay of 0.1 every 500000 epochs. During training the train loss dropped as follows

Training loss

The val loss dropped as follows

Validation_loss

2019.3.27 Updates

I have uploaded a newly trained crnn model on chinese dataset which can be found here. Sorry for not knowing the owner of the dataset. But thanks for his great work. If someone knows it you're welcome to let me know. The pretrained weights can be found here

Before start training you may need reorgnize the dataset's label information according to the synth90k dataset's format if you want to use the same data feed pip line mentioned above. Now I have reimplemnted a more efficient tfrecords writer which will accelerate the process of generating tfrecords file. You may refer to the code for details. Some information about training is listed bellow:

image size: (280, 32)

classes nums: 5824 without blank

sequence length: 70

training sample counts: 2733004

validation sample counts: 364401

testing sample counts: 546601

batch size: 32

training iter nums: 200000

init lr: 0.01

Test example images

Example test_01.jpg

Example image1

Example test_02.jpg

Example image2

Example test_03.jpg

Example image3

training tboard file

Training loss

The val loss dropped as follows

Validation_loss

2019.4.10 Updates

Add a small demo to recognize chinese pdf using the chinese crnn model weights. If you want to have a try you may follow the command:

cd CRNN_ROOT_REPO
python tools/recongnize_chinese_pdf.py -c ./data/char_dict/char_dict_cn.json 
-o ./data/char_dict/ord_map_cn.json --weights_path model/crnn_chinese/shadownet.ckpt 
--image_path data/test_images/test_pdf.png --save_path pdf_recognize_result.txt

You should see the same result as follows:

The left image is the recognize result displayed on console and the right image is the origin pdf image.

recognize_result_console

The left image is the recognize result written in local file and the right image is the origin pdf image. recognize_result_file

TODO

  • Add new model weights trained on the whole synth90k dataset
  • Add multiple gpu training scripts
  • Add new pretrained model on chinese dataset
  • Add an online toy demo
  • Add tensorflow service script

Acknowledgement

Please cite my repo CRNN_Tensorflow if you use it.

Contact

Scan the following QR to disscuss :) qr

Owner
MaybeShewill-CV
Engineer from Baidu
MaybeShewill-CV
CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

简介 基于Tensorflow和Keras实现端到端的不定长中文字符检测和识别 文本检测:CTPN 文本识别:DenseNet + CTC 环境部署 sh setup.sh 注:CPU环境执行前需注释掉for gpu部分,并解开for cpu部分的注释 Demo 将测试图片放入test_images

Yang Chenguang 2.6k Dec 29, 2022
Handwritten Text Recognition (HTR) using TensorFlow 2.x

Handwritten Text Recognition (HTR) system implemented using TensorFlow 2.x and trained on the Bentham/IAM/Rimes/Saint Gall/Washington offline HTR data

Arthur Flôr 160 Dec 21, 2022
Code for CVPR 2022 paper "SoftGroup for Instance Segmentation on 3D Point Clouds"

SoftGroup We provide code for reproducing results of the paper SoftGroup for 3D Instance Segmentation on Point Clouds (CVPR 2022) Author: Thang Vu, Ko

Thang Vu 231 Dec 27, 2022
Can We Find Neurons that Cause Unrealistic Images in Deep Generative Networks?

Can We Find Neurons that Cause Unrealistic Images in Deep Generative Networks? Artifact Detection/Correction - Offcial PyTorch Implementation This rep

CHOI HWAN IL 23 Dec 20, 2022
Read Japanese manga inside browser with selectable text.

mokuro Read Japanese manga with selectable text inside a browser. See demo: https://kha-white.github.io/manga-demo mokuro_demo.mp4 Demo contains excer

Maciej Budyś 170 Dec 27, 2022
Learning Camera Localization via Dense Scene Matching, CVPR2021

This repository contains code of our CVPR 2021 paper - "Learning Camera Localization via Dense Scene Matching" by Shitao Tang, Chengzhou Tang, Rui Hua

tangshitao 65 Dec 01, 2022
Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera.

Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip location is mapped to RGB images to control the mouse cursor.

Ravi Sharma 71 Dec 20, 2022
Detect and fix skew in images containing text

Alyn Skew detection and correction in images containing text Image with skew Image after deskew Install and use via pip! Recommended way(using virtual

Kakul 230 Dec 21, 2022
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network.My blog:

PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network Introduction This is a tensorflow re-implementation of PSENet: Shape Robu

Michael liu 498 Dec 30, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
基于Paddle框架的PSENet复现

PSENet-Paddle 基于Paddle框架的PSENet复现 本项目基于paddlepaddle框架复现PSENet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 AIStudio链接 参考项目: whai362-PSENet 环境配置 本项目

QuanHao Guo 4 Apr 24, 2022
A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集 シーンテキストの位置認識と識別のための論文リソースの要約

Scene Text Localization & Recognition Resources Read this institute-wise: English, 简体中文. Read this year-wise: English, 简体中文. Tags: [STL] (Scene Text L

Karl Lok (Zhaokai Luo) 901 Dec 11, 2022
The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

Pengyuan Lyu 261 Nov 21, 2022
Controlling Volume by Hand Gestures

This program allows the user to control the volume of their device with specific hand gestures involving their thumb and index finger!

Riddhi Bajaj 1 Nov 11, 2021
This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe libraries.

CVZone This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe librar

CVZone 648 Dec 30, 2022
Handwritten Character Recognition using CNN

Handwritten Character Recognition using CNN Problem Definition The main objective of this project is to solve the problem of handwritten character rec

Mohit Kaushik 4 Mar 02, 2022
Hand gesture detection project with aweome UI implementation.

an awesome hand gesture detection project for you to be creative! Imagination is the limit to do with this project.

AR Ashraf 39 Sep 26, 2022
原神风花节自动弹琴辅助

GenshinAutoPlayBalladsofBreeze 原神风花节自动弹琴辅助(已适配1920*1080分辨率) 本程序基于opencv图像识别技术,不存在任何封号。 因为正确率取决于你的cpu性能,10900k都不一定全对。 由于图像识别存在误差,根本无法确定出错时间。更不用说被检测到了。

晓轩 20 Oct 27, 2022
This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vectors.

Vectorizing color range This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vect

Development Seed 9 Jul 27, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021