caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Overview

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN, and inclined nms and generate rotated box component is imported from EAST project. Thanks for the author's(@zxytim @argman) help. Please cite this paper if you find this useful.

Contents

  1. Abstract
  2. Structor
  3. Installation
  4. Demo
  5. Test
  6. Train
  7. Experiments
  8. Furthermore

Structor

Code structor

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // eval 2015 icdar dataset F-meaure
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structor

It should have this basic structure

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // images list contains ICDAR2013+ICDAR2015 train dataset, then raw data augmentation the same as the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list

Installation

Install caffe

It is highly recommended to use docker to build environment. More about how to configure docker, see Running with Docker If you are familiar with docker, please run

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you don't familiar with docker, please follow py-R-FCN to install caffe.

Build

    cd src/lib && make
    

Download Model

  1. please download VGG16 pre-trained model on Imagenet, place it to model/imagenet_models/VGG16.v2.caffemodel.
  2. please download VGG16 trained model by this project, place it model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

It is recommended to use UNIX socket to support GUI for docker, plesase open another terminal and type:

    xhost + # may be you need it when open a new terminal
    # docker-compose.yml: mount host  volume : /tmp/.X11-unix to docker volume: /tmp/.X11-unix  
    # pass DISPLAY variable to docker container so host X server can display image in docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
    # then run
    bash ./test.sh

Train

Train data

  • Mine: ICDAR2013+ICDAR2015 train dataset, and raw data augmentation, at last got 15977 images.
  • Paper: ICDAR2015 + 2000 focused scene text images they collected.

Train commands

  1. Go to ./src/lib/datasets/icdar.py, modify images path to let train.py find merge_train.txt images list.
  2. Remove cache in src/data/*.pkl or you can load cached roidb data of this project, and place it to src/data/
    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

note: If you use USE_FLIPPED=True&USE_FLIPPED_QUAD=True, you will get almost 31200 roidb.

Experiments

Mine VS Paper

Approaches Anchor Scales Pooled sizes Inclined NMS Test scales(short side) F-measure(Mine VS paper)
R2CNN-2 (4, 8, 16) (7, 7) Y (720) 71.12% VS 68.49%
R2CNN-3 (4, 8, 16) (7, 7) Y (720) 73.10% VS 74.29%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720) 74.14% VS 74.36%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720, 1200) 79.05% VS 81.80%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720) 74.34% VS 75.34%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720, 1200) 78.70% VS 82.54%

Appendixes

Approaches Anchor Scales aspect ration Pooled sizes Inclined NMS Test scales(short side) F-measure
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720) 74.36%
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720, 1200) VS 81.80%
R2CNN-4-TextBoxes-OHEM (4, 8, 16, 32) (0.5, 1, 2, 3, 5, 7, 10) (7, 7) Y (720) 76.53%

Furthermore

You can try Resnet-50, Resnet-101 and so on.

Owner
candler
a computer vision worker
candler
A facial recognition program that plays a alarm (mp3 file) when a person i seen in the room. A basic theif using Python and OpenCV

Home-Security-Demo A facial recognition program that plays a alarm (mp3 file) when a person is seen in the room. A basic theif using Python and OpenCV

SysKey 4 Nov 02, 2021
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

Juanma Coria 185 Jan 01, 2023
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link Contents: Introduc

dengdan 484 Dec 07, 2022
OCR engine for all the languages

Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout

431 Jan 04, 2023
A small C++ implementation of LSTM networks, focused on OCR.

clstm CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations. Status and sco

Tom 794 Dec 30, 2022
A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

Harsh Vardhan Singh 1 Oct 29, 2021
An OCR evaluation tool

dinglehopper dinglehopper is an OCR evaluation tool and reads ALTO, PAGE and text files. It compares a ground truth (GT) document page with a OCR resu

QURATOR-SPK 40 Dec 20, 2022
EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images) Introduction This is a repository forked from argman/EAST for t

QichaoWu 49 Dec 24, 2022
A tool combining EasyOCR and LaMa to automatically detect text and replace it with an inpainted background.

EasyLaMa (WIP) This is a tool combining EasyOCR and LaMa to automatically detect text and replace it with an inpainted background. Installation For GP

3 Sep 17, 2022
A curated list of resources dedicated to scene text localization and recognition

Scene Text Localization & Recognition Resources A curated list of resources dedicated to scene text localization and recognition. Any suggestions and

CarlosTao 1.6k Dec 22, 2022
📷 This repository is focused on having various feature implementation of OpenCV in Python.

📷 This repository is focused on having various feature implementation of OpenCV in Python. The aim is to have a minimal implementation of all OpenCV features together, under one roof.

Aditya Kumar Gupta 128 Dec 04, 2022
Handwritten Text Recognition (HTR) using TensorFlow 2.x

Handwritten Text Recognition (HTR) system implemented using TensorFlow 2.x and trained on the Bentham/IAM/Rimes/Saint Gall/Washington offline HTR data

Arthur Flôr 160 Dec 21, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 02, 2023
Semantic-based Patch Detection for Binary Programs

PMatch Semantic-based Patch Detection for Binary Programs Requirement tensorflow-gpu 1.13.1 numpy 1.16.2 scikit-learn 0.20.3 ssdeep 3.4 Usage tar -xvz

Mr.Curiosity 3 Sep 02, 2022
Face Detection with DLIB

Face Detection with DLIB In this project, we have detected our face with dlib and opencv libraries. Setup This Project Install DLIB & OpenCV You can i

Can 2 Jan 16, 2022
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022