Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Last update: Dec 22, 2022


Overview

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai.

Contributions to this repo are welcome, e.g., support for other backbone networks (including the model definitions and pretrained models).

PLEASE CHECK EXISTING ISSUES BEFORE OPENING YOUR OWN. IF THE SAME OR A SIMILAR ISSUE HAS BEEN POSTED BEFORE, JUST REFER TO IT AND DO NOT OPEN A NEW ONE.

Installation

Clone the repo

git clone --recursive git@github.com:ZJULearning/pixel_link.git

Denote the root directory path of pixel_link by ${pixel_link_root}.

Add the path of ${pixel_link_root}/pylib/src to your PYTHONPATH:

export PYTHONPATH=${pixel_link_root}/pylib/src:$PYTHONPATH

Prerequisites

(Only tested on) Ubuntu 14.04 and 16.04 with:

Python 2.7
Tensorflow-gpu >= 1.1
opencv2
setproctitle
matplotlib

Anaconda is recommended for an easier installation:

Install Anaconda
Create and activate the required virtual environment by:

conda env create --file pixel_link_env.txt
source activate pixel_link

Testing

Download the pretrained model

PixelLink + VGG16 4s Baidu Netdisk | GoogleDrive, trained on IC15
PixelLink + VGG16 2s Baidu Netdisk | GoogleDrive, trained on IC15

Unzip the downloaded model. It contains 4 files:

config.py
model.ckpt-xxx.data-00000-of-00001
model.ckpt-xxx.index
model.ckpt-xxx.meta

Denote their parent directory as ${model_path}.
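The test scripts take the checkpoint prefix (${model_path}/model.ckpt-xxx), not any one of the four files. TensorFlow's tf.train.latest_checkpoint relies on a separate "checkpoint" state file, which is not among the files listed above, so it may not find the model. A minimal stdlib sketch that recovers the prefix from the .index file might look like this; the helper name find_checkpoint_prefix is hypothetical and not part of the repo:

```python
import glob
import os


def find_checkpoint_prefix(model_path):
    """Return the 'model.ckpt-xxx' prefix inside model_path (hypothetical helper).

    TensorFlow checkpoints are addressed by a shared prefix: the .index, .meta
    and .data-* files all start with it. We locate the single .index file and
    strip its suffix.
    """
    index_files = glob.glob(os.path.join(model_path, "model.ckpt-*.index"))
    if len(index_files) != 1:
        raise ValueError("expected exactly one .index file in %s, found %d"
                         % (model_path, len(index_files)))
    return index_files[0][:-len(".index")]
```

The returned string is what would be passed as the second argument to scripts/test.sh or scripts/test_any.sh.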

Test on ICDAR2015

The reported results on ICDAR2015 are:

Model	Recall	Precision	F-mean
PixelLink+VGG16 2s	82.0	85.5	83.7
PixelLink+VGG16 4s	81.7	82.9	82.3
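F-mean is the harmonic mean of recall R and precision P, F = 2PR / (P + R). Recomputing it from the recall and precision in the table is a quick sanity check on the reported numbers:

```python
def f_mean(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    return 2.0 * recall * precision / (recall + precision)


# Reported (recall, precision) pairs on ICDAR2015, taken from the table above.
results = {
    "PixelLink+VGG16 2s": (82.0, 85.5),
    "PixelLink+VGG16 4s": (81.7, 82.9),
}

for model, (r, p) in sorted(results.items()):
    print("%s: F-mean = %.1f" % (model, f_mean(r, p)))
```

Both rows reproduce the tabulated values (83.7 and 82.3) after rounding to one decimal place.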

Assuming you have downloaded the ICDAR2015 dataset, run the following commands to test the model on it:

cd ${pixel_link_root}
./scripts/test.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${path_to_icdar2015}/ch4_test_images

For example:

./scripts/test.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_test_images

The program will create a zip file of detection results, which can be submitted to the ICDAR2015 server directly. The detection results can be visualized via scripts/vis.sh.
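To inspect the results outside the provided scripts, the zip can be read back with the standard library. The sketch below assumes the usual ICDAR2015 Challenge 4 submission layout (one res_img_<N>.txt entry per image, one x1,y1,x2,y2,x3,y3,x4,y4 quadrilateral per line); open a generated archive to confirm the repo writes exactly this format. The function name load_detections is hypothetical:

```python
import zipfile


def load_detections(zip_file):
    """Map each .txt entry in the results zip to a list of 8-int quadrilaterals.

    zip_file may be a path or a file-like object. Assumes ICDAR2015-style
    entries with one 'x1,y1,x2,y2,x3,y3,x4,y4' box per line.
    """
    detections = {}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue
            text = zf.read(name).decode("utf-8")
            boxes = []
            for line in text.splitlines():
                fields = line.strip().split(",")
                if len(fields) < 8:
                    continue  # skip blank or malformed lines
                boxes.append([int(float(v)) for v in fields[:8]])
            detections[name] = boxes
    return detections
```

An image with no detections simply maps to an empty list, which is convenient when counting detections per image.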


Test on any images

Put the images to be tested in a single directory, denoted ${image_dir}. Then run:

cd ${pixel_link_root}
./scripts/test_any.sh ${GPU_ID} ${model_path}/model.ckpt-xxx ${image_dir}

For example:

./scripts/test_any.sh 3 ~/temp/conv3_3/model.ckpt-38055 ~/dataset/ICDAR2015/Challenge4/ch4_training_images

The program will draw the detection results directly on the images. If the results are unsatisfactory, try to:

Adjust inference parameters such as eval_image_width, eval_image_height, pixel_conf_threshold, and link_conf_threshold.
Or train your own model.
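For intuition about the two thresholds: as described in the PixelLink paper, pixels whose text score exceeds the pixel threshold are positive, and two neighboring positive pixels are joined when one or both of their link predictions toward each other exceed the link threshold; each connected component of joined pixels becomes one text instance. The pure-Python union-find sketch below illustrates that rule. It is not the repo's implementation, and the data layout (a 2-D list of pixel scores plus eight link scores per pixel in the NEIGHBORS order defined below) is an assumption for illustration:

```python
# Neighbor offsets (dy, dx); link channel k points from a pixel to its k-th
# neighbor. This ordering is an assumption; with it, 7 - k is the opposite link.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def group_pixels(pixel_scores, link_scores, pixel_conf_threshold, link_conf_threshold):
    """Return a label map: 0 for background, 1..N for text instances."""
    h, w = len(pixel_scores), len(pixel_scores[0])
    positive = [[pixel_scores[y][x] > pixel_conf_threshold for x in range(w)]
                for y in range(h)]
    parent = {(y, x): (y, x) for y in range(h) for x in range(w) if positive[y][x]}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for y in range(h):
        for x in range(w):
            if not positive[y][x]:
                continue
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not positive[ny][nx]:
                    continue
                # Join if either pixel predicts a positive link toward the other.
                if (link_scores[y][x][k] > link_conf_threshold or
                        link_scores[ny][nx][7 - k] > link_conf_threshold):
                    ra, rb = find((y, x)), find((ny, nx))
                    if ra != rb:
                        parent[rb] = ra

    # Relabel component roots as consecutive instance ids in scan order.
    labels = [[0] * w for _ in range(h)]
    ids = {}
    for y in range(h):
        for x in range(w):
            if positive[y][x]:
                labels[y][x] = ids.setdefault(find((y, x)), len(ids) + 1)
    return labels
```

Raising the link threshold splits instances apart, while raising the pixel threshold shrinks or removes them, which is why the two parameters are worth tuning separately.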

Training

Converting the dataset to tfrecords files

Scripts for converting the ICDAR2015 and SynthText datasets are provided in the datasets directory. It is not hard to write a conversion script for your own dataset.
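Most of a custom converter is turning each annotation file into boxes and labels before serializing them. As a hedged starting point, the stdlib sketch below parses ICDAR2015-style ground-truth files (one x1,y1,...,x4,y4,transcription line per text region, with ### marking unreadable "don't care" regions). It is not the repo's converter, and the (quad, transcription, ignore) output layout is an assumption; writing the records to tfrecords would then go through TensorFlow's tf.train.Example API, following the scripts in the datasets directory:

```python
def parse_icdar2015_gt(text):
    """Parse the contents of one ICDAR2015 gt_img_<N>.txt file.

    Returns a list of (quad, transcription, ignore) tuples, where quad is a
    list of 8 ints and ignore is True for '###' (unreadable) regions.
    """
    records = []
    # Some annotation files start with a UTF-8 byte-order mark; drop it.
    for line in text.lstrip("\ufeff").splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        quad = [int(v) for v in fields[:8]]
        # Transcriptions may themselves contain commas, so rejoin the remainder.
        transcription = ",".join(fields[8:])
        records.append((quad, transcription, transcription == "###"))
    return records
```

For a different dataset, only this parsing step should need to change; the serialization part can follow the existing scripts.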

Train your own model

Modify scripts/train.sh to set your dataset name and dataset path, for example:

DATASET=icdar2015
DATASET_DIR=$HOME/dataset/pixel_link/icdar2015

Start training

./scripts/train.sh ${GPU_IDs} ${IMG_PER_GPU}

For example, ./scripts/train.sh 0,1,2 8.
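Reading the two arguments as a comma-separated list of GPU IDs and a per-GPU image count, the effective batch size per training step would be their product. This is an interpretation of the argument names rather than something stated here, so check scripts/train.sh; the hypothetical helper below only makes the arithmetic explicit:

```python
def effective_batch_size(gpu_ids, img_per_gpu):
    """Total images per step, assuming each listed GPU gets img_per_gpu images."""
    gpus = [g for g in gpu_ids.split(",") if g.strip()]
    return len(gpus) * int(img_per_gpu)


print(effective_batch_size("0,1,2", 8))  # 3 GPUs x 8 images per GPU
```

When moving to fewer GPUs, raising IMG_PER_GPU (memory permitting) keeps the effective batch size closer to the original configuration.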

The training strategy in scripts/train.sh is configured for ICDAR2015; modify it if necessary. Many training and model options are available in config.py; feel free to experiment with them.


Similar projects

A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well

TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes. A text detection algorithm improved from SSD; textBoxes_note contains previously compiled notes.

Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition

Layout Analysis Evaluator for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts

Code for the "Sensing leg movement enhances wearable monitoring of energy expenditure" paper.

Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

BNF Globalization Code (CVPR 2016)

PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.


A webcam-based 3x3x3 rubik's cube solver written in Python 3 and OpenCV.

Debuting as a VTuber on Zoom and GoogleMeets

Characterizing possible failure modes in physics-informed neural networks.

A Joint Video and Image Encoder for End-to-End Retrieval

🖺 OCR using tensorflow with attention

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

Image processing in Python

Health regions (RS), health sectors (SS), and basic health areas (ABS) of Catalonia

Recognizing cropped text in natural images.

Let's explore how we can extract text from forms