TextBoxes++: A Single-Shot Oriented Scene Text Detector

Last update: Jan 04, 2023

Introduction

This is an application for scene text detection (TextBoxes++) and recognition (CRNN).

TextBoxes++ is a unified framework that detects oriented scene text with a single network. It extends TextBoxes. CRNN is an open-source text recognizer. The TextBoxes++ code is based on SSD and TextBoxes; the CRNN code is adapted from the original CRNN implementation.

For more details, please refer to our arXiv paper.

Citing the related works

Please cite the related works in your publications if they help your research:

@article{Liao2018Text,
  title = {{TextBoxes++}: A Single-Shot Oriented Scene Text Detector},
  author = {Minghui Liao and Baoguang Shi and Xiang Bai},
  journal = {{IEEE} Transactions on Image Processing},
  doi  = {10.1109/TIP.2018.2825107},
  url = {https://doi.org/10.1109/TIP.2018.2825107},
  volume = {27},
  number = {8},
  pages = {3676--3690},
  year = {2018}
}

@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}

@article{ShiBY17,
  author    = {Baoguang Shi and
               Xiang Bai and
               Cong Yao},
  title     = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition
               and Its Application to Scene Text Recognition},
  journal   = {{IEEE} TPAMI},
  volume    = {39},
  number    = {11},
  pages     = {2298--2304},
  year      = {2017}
}

Contents

1. Requirements
2. Installation
3. Docker
4. Models
5. Demo
6. Train

Requirements

NOTE: There is partial support for a Docker image; see docker/README.md. (Thanks to @mdbenito for the PR.)

Torch7 (for CRNN)
g++-5; CUDA 8.0; cuDNN v5.1 (cuDNN 6 and cuDNN 7 may fail); OpenCV 3.0

Please refer to the Caffe installation instructions for the remaining dependencies.

Installation

Compile TextBoxes++ (this is a modified version of Caffe, so you do not need to install the official Caffe):

cp Makefile.config.example Makefile.config
# Modify Makefile.config according to your Caffe installation before building.
make -j8
# Make sure $CAFFE_ROOT/python is on your PYTHONPATH.
make py
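
As a quick sanity check for the PYTHONPATH step, the sketch below tests whether `<CAFFE_ROOT>/python` appears among the PYTHONPATH entries. The helper name `pythonpath_contains` is hypothetical and not part of this repository, and the script assumes that CAFFE_ROOT points at your checkout of this repository. It is written for Python 3 and does not import Caffe itself.

```python
import os


def pythonpath_contains(caffe_root, pythonpath):
    """Return True if <caffe_root>/python is one of the PYTHONPATH entries."""
    wanted = os.path.normpath(os.path.join(caffe_root, "python"))
    # Split on the platform separator (":" on Linux) and skip empty entries.
    entries = [e for e in pythonpath.split(os.pathsep) if e]
    return any(os.path.normpath(e) == wanted for e in entries)


if __name__ == "__main__":
    # Assumption: CAFFE_ROOT is set; otherwise fall back to the current directory.
    root = os.environ.get("CAFFE_ROOT", os.getcwd())
    ok = pythonpath_contains(root, os.environ.get("PYTHONPATH", ""))
    print("PYTHONPATH ok" if ok else "add %s/python to PYTHONPATH" % root)
```

If the check fails, exporting `PYTHONPATH=$CAFFE_ROOT/python:$PYTHONPATH` in your shell before running the demo or training scripts is the usual remedy.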

Compile CRNN (please refer to CRNN if you have trouble with the compilation):

cd crnn/src/
sh build_cpp.sh

Docker

(Thanks for the PR from @idotobi)

Build Docker Image

docker build -t tbpp_crnn:gpu .

This can take more than an hour, so go get a coffee ;)

Once this is done you can start a container via nvidia-docker.

nvidia-docker run -it --rm tbpp_crnn:gpu bash

To check if the GPU is available inside the docker container you can run nvidia-smi.
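
The same check can be scripted, for example as a guard at the start of a longer job. The sketch below is a minimal example for Python 3; the function name `gpu_visible` is hypothetical. It only reports whether `nvidia-smi` is on the PATH and exits successfully, and it does not verify that Caffe itself can use the GPU.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is on the PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run(
            [exe], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The binary exists but could not be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("GPU visible" if gpu_visible() else "no GPU visible in this environment")
```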

It is recommended to mount the ./models and ./crnn/model/ directories so that the container can see the downloaded models.

nvidia-docker run -it \
                  --rm \
                  -v ${PWD}/models:/opt/caffe/models \
                  -v ${PWD}/crnn/model:/opt/caffe/crnn/model \
                  tbpp_crnn:gpu bash

For convenience, this command is executed when running ./run.bash.

Models

Model pre-trained on SynthText (used for training): Dropbox; BaiduYun
Model trained on ICDAR 2015 Incidental Text (used for testing): Dropbox; BaiduYun

Please place the above models in "./models/".

If your data differs substantially from ICDAR 2015 Incidental Text, you should train on your own data, starting from the model pre-trained on SynthText.

CRNN model: Dropbox; BaiduYun

Please place the CRNN model in "./crnn/model/".
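
Before running the demo, it can help to confirm that both model directories exist and are not empty. The following is a minimal sketch under stated assumptions: the helper `missing_model_dirs` is hypothetical, and it checks only for the presence of files, not for specific model file names, because the names depend on what you downloaded.

```python
import os


def missing_model_dirs(repo_root, model_dirs=("models", os.path.join("crnn", "model"))):
    """Return the model directories under repo_root that are absent or empty."""
    missing = []
    for rel in model_dirs:
        path = os.path.join(repo_root, rel)
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    for rel in missing_model_dirs(os.getcwd()):
        print("no model files found in ./%s" % rel)
```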

Demo

Download the ICDAR 2015 model and place it in "./models/"

python examples/text/demo.py

The detection and recognition results are written to "./demo_images".

Train

Create lmdb data

1. Convert the ground truth into "xml" form: example.xml
2. Create train/test lists (train.txt / test.txt) in "./data/text/" in the following form:
```
path_to_example1.jpg path_to_example1.xml
path_to_example2.jpg path_to_example2.xml
```
3. Run "./data/text/creat_data.sh"
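
The list files in step 2 can be generated with a short script. The sketch below is one possible approach, not part of this repository: the helper `write_pair_list` is hypothetical, and it assumes that each image `name.jpg` has its annotation stored as `name.xml`. It writes the paths exactly as they are passed in, so check whether your setup expects absolute paths or paths relative to a data root before running the conversion script.

```python
import os


def write_pair_list(image_dir, xml_dir, list_path):
    """Write one 'image_path xml_path' line per .jpg that has a matching .xml.

    Returns the number of pairs written. Images without an annotation are skipped.
    """
    pairs = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".jpg":
            continue
        xml_path = os.path.join(xml_dir, stem + ".xml")
        if os.path.isfile(xml_path):
            pairs.append((os.path.join(image_dir, name), xml_path))
    with open(list_path, "w") as f:
        for image_path, xml_path in pairs:
            f.write("%s %s\n" % (image_path, xml_path))
    return len(pairs)
```

For example, `write_pair_list("images/train", "xml/train", "data/text/train.txt")` would produce the training list, where the two input directories are placeholders for wherever your data lives.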

Start training

1. Modify the lmdb path in modelConfig.py
2. Run "python examples/text/train.py"

Owner

Minghui Liao
