textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention

Last update: Nov 10, 2022

Overview

An End-to-End TextSpotter with Explicit Alignment and Attention

This method was first described in our CVPR 2018 paper.

Getting Started

Installation

Clone the code

git clone https://github.com/tonghe90/textspotter
cd textspotter

Install Caffe. You can follow this tutorial. If you hit a build problem involving std::allocator, please refer to #3.

# make sure you set WITH_PYTHON_LAYER := 1
# change Makefile.config according to your library path
cp Makefile.config.example Makefile.config
make clean
make -j8
make pycaffe

Training

We provide part of the training code, but it cannot be run directly.
The comments in [train.pt](https://github.com/tonghe90/textspotter/models/train.pt) explain what is missing.
You have to write your own IOUloss layer; we cannot publish ours for IP reasons.
Pay particular attention to these lines:
[L6902](https://github.com/tonghe90/textspotter/models/train.pt#L6902) 
[L6947](https://github.com/tonghe90/textspotter/models/train.pt#L6907)
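Since the IOUloss layer is not published, the following is only a minimal sketch of one common formulation, the negative log of the intersection over union of two axis-aligned boxes. The function name `iou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` smoothing term are assumptions made for illustration; the source does not state which formulation the authors' layer uses.

```python
import math

def iou_loss(pred, target, eps=1e-9):
    """Return -ln(IoU) for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper: the box layout and the eps term are assumptions,
    not taken from the repository.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # eps keeps the logarithm finite when the boxes do not overlap.
    return -math.log(iou + eps)
```

A real Caffe Python layer would also need a backward pass and would operate on whole blobs rather than single boxes; this sketch only shows the forward computation for one pair.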

Testing

Install editdistance and pyclipper: pip install editdistance and pip install pyclipper
After Caffe is set up, download the trained model (about 40M) from Google Drive. This model was trained on VGG800k and fine-tuned on ICDAR2015.
Run python test.py --img=./imgs/img_105.jpg
Hyperparameters:

cfg.py --mean_val ==> mean pixel value used during testing.
       --max_len ==> maximum length of a text string (25 here, meaning a word can contain at most 25 characters).
       --recog_th ==> the threshold used during recognition. The score of a word is the mean of its character scores.
       --word_score ==> the threshold for words that contain digits or symbols, since such words are not in the dictionary.
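The interaction between max_len and recog_th can be sketched as follows. This is an illustration only: the function names `word_confidence` and `keep_word` are hypothetical, and the use of `>=` for the comparison is an assumption, since the source only says the word score is the mean of the character scores.

```python
def word_confidence(char_scores):
    """Mean of the per-character scores; 0.0 for an empty word."""
    if not char_scores:
        return 0.0
    return sum(char_scores) / len(char_scores)

def keep_word(char_scores, recog_th, max_len=25):
    """Keep a word if it fits in max_len and its mean score reaches recog_th.

    Hypothetical helper: the >= comparison is an assumption.
    """
    return len(char_scores) <= max_len and word_confidence(char_scores) >= recog_th
```

For example, `keep_word([0.9, 0.8, 1.0], recog_th=0.85)` accepts the word because its mean score is about 0.9, while a 26-character word is rejected regardless of its score.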

test.py --weight ==> the caffemodel weights file.
        --prototxt-iou ==> the prototxt file for detection.
        --prototxt-lstm ==> the prototxt file for recognition.
        --img ==> the folder or image file to test. Supported image formats can be added in the is_image function in ./pylayer/tool.
        --scales-ms ==> input scales used for multi-scale testing.
        --thresholds-ms ==> text-region thresholds corresponding to the multi-scale inputs.
        --nms ==> NMS threshold used during testing.
        --save-dir ==> the directory where results are saved in ICDAR2015 submission format.
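As a rough illustration of what a result line written to --save-dir might look like, the sketch below assumes the ICDAR2015 end-to-end convention of one word per line, with eight comma-separated integer corner coordinates followed by the transcription. The function name `format_result_line` is hypothetical, and the exact layout written by test.py is not confirmed by the source.

```python
def format_result_line(quad, text):
    """Format one detection as 'x1,y1,x2,y2,x3,y3,x4,y4,transcription'.

    Hypothetical helper: quad is a list of four (x, y) corner points.
    The line layout is an assumption about the ICDAR2015 convention.
    """
    if len(quad) != 4:
        raise ValueError("expected four (x, y) corner points")
    # Flatten the corners and round each coordinate to the nearest integer.
    coords = [str(int(round(v))) for point in quad for v in point]
    return ",".join(coords + [text])
```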

Note that the recognition results are obtained by comparing the raw output with the words in a dictionary of about 90K entries.
The dictionary contains no digits or symbols. You can remove the dictionary lookup and output the raw recognition results directly.
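The dictionary lookup described above can be sketched as a nearest-word search by edit distance. This is not the repository's code: `levenshtein` and `correct_with_lexicon` are hypothetical names, and the rule of passing through any string with non-letter characters is an assumption based on the dictionary containing no digits or symbols. In practice the editdistance package installed earlier could replace the hand-written distance via `editdistance.eval`.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def correct_with_lexicon(raw, lexicon):
    """Return the lexicon word closest to raw (case-insensitive).

    Hypothetical helper: strings containing digits or symbols are
    returned unchanged, since the dictionary cannot match them.
    """
    if not raw.isalpha() or not lexicon:
        return raw
    return min(lexicon, key=lambda w: levenshtein(raw.lower(), w.lower()))
```

A linear scan over roughly 90K words for every detection is slow; it is used here only to keep the sketch short.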

Citation

If you use this code for your research, please cite our paper.

@inproceedings{tong2018,
  title={An End-to-End TextSpotter with Explicit Alignment and Attention},
  author={T. He and Z. Tian and W. Huang and C. Shen and Y. Qiao and C. Sun},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
  year={2018}
}

License

This code is for NON-COMMERCIAL purposes only. For commercial purposes, please contact Chunhua Shen [email protected]. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3. Please refer to http://www.gnu.org/licenses/ for more details.

Owner

Tong He
