Scene text detection and recognition based on Extremal Regions (ER)

Overview

Scene text recognition

A real-time scene text recognition algorithm. Our system is able to recognize text in unconstrained backgrounds.
This algorithm is based on several papers and is implemented in C/C++.

Environment and dependencies

  1. OpenCV 3.1 or above
  2. CMake 3.10 or above
  3. Visual Studio 2017 Community or above (Windows-only)

How to build?

Windows

  1. Install OpenCV; put the opencv directory into C:\tools
    • You can install it manually from its GitHub repo, or
    • You can install it via Chocolatey: choco install opencv, or
    • If you already have OpenCV, edit CMakeLists.txt and change WIN_OPENCV_CONFIG_PATH to where you have it
  2. Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-win
    cd build-win
    cmake .. -G "Visual Studio 15 2017 Win64"
  3. Use CMake to build the project
    cmake --build . --config Release
  4. Find the binaries in the root directory
    cd ..
    dir | findstr scene
  5. To execute the scene_text_recognition.exe binary, use its wrapper script; for example:
    .\scene_text_recognition.bat -i res\ICDAR2015_test\img_6.jpg

Linux

  1. Install OpenCV; refer to OpenCV Installation in Linux
  2. Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-linux
    cd build-linux
    cmake ..
  3. Use CMake to build the project
    cmake --build .
  4. Find the binaries in the root directory
    cd ..
    ls | grep scene
  5. To execute the binaries, run them as-is; for example:
    ./scene_text_recognition -i res/ICDAR2015_test/img_6.jpg

Usage

The executable file scene_text_recognition must ultimately exist in the project root directory (i.e., next to classifier/, dictionary/ etc.)

./scene_text_recognition -v:            take the default webcam as input
./scene_text_recognition -v [video]:    take a video as input
./scene_text_recognition -i [image]:    take an image as input
./scene_text_recognition -i [path]:     take a folder of images as input
./scene_text_recognition -l [image]:    demonstrate the "Linear Time MSER" algorithm
./scene_text_recognition -t detection:  train the text detection classifier
./scene_text_recognition -t ocr:        train the text recognition (OCR) classifier

Train your own classifier

Text detection

  1. Put your text data in res/pos and your non-text data in res/neg
  2. Name your files numerically, e.g. 1.jpg, 2.jpg, 3.jpg, and so on.
  3. Make sure the training folder exists
  4. Run ./scene_text_recognition -t detection
    mkdir training
    ./scene_text_recognition -t detection
  5. The text detection classifier will be written to the training folder

Text recognition (OCR)

  1. Put your training data in res/ocr_training_data/
  2. Arrange the data as [Font Name]/[Font Type]/[Category]/[Character.jpg], for instance Time_New_Roman/Bold/lower/a.jpg. You can refer to res/ocr_training_data.zip
  3. Make sure the training folder exists, and put svm-train in the root folder (svm-train is built by the system and should be found in build/)
  4. Run ./scene_text_recognition -t ocr
    mkdir training
    mv svm-train scene-text-recognition/
    ./scene_text_recognition -t ocr
  5. The text recognition (OCR) classifier will be written to the training folder
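
The directory layout in step 2 can be checked before training. The following is a minimal sketch, assuming paths are given relative to res/ocr_training_data/ and use forward slashes; OcrSample and parseSamplePath are hypothetical names for illustration and are not part of the project.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one training image (not the project's own type).
struct OcrSample {
    std::string font;       // e.g. "Time_New_Roman"
    std::string type;       // e.g. "Bold"
    std::string category;   // e.g. "lower"
    std::string character;  // file name without its extension, e.g. "a"
};

// Splits "[Font Name]/[Font Type]/[Category]/[Character.jpg]" into its parts.
// Returns false when the path does not have exactly four components or the
// file name has no extension.
bool parseSamplePath(const std::string& relPath, OcrSample& out) {
    std::vector<std::string> parts;
    std::stringstream ss(relPath);
    std::string part;
    while (std::getline(ss, part, '/')) {
        parts.push_back(part);
    }
    if (parts.size() != 4) {
        return false;
    }
    const std::string& file = parts[3];
    const std::size_t dot = file.rfind('.');
    if (dot == std::string::npos || dot == 0) {
        return false;
    }
    out = OcrSample{parts[0], parts[1], parts[2], file.substr(0, dot)};
    return true;
}
```

Calling parseSamplePath on "Time_New_Roman/Bold/lower/a.jpg" fills the four fields with Time_New_Roman, Bold, lower and a. A path with a missing level is rejected, which helps catch misplaced files before a long training run.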

How it works

The algorithm is based on a region detector called Extremal Region (ER), which is essentially a superset of the well-known MSER region detector. We use ERs as text candidates and extract them with the linear-time MSER algorithm. The pitfall of ER is repeated detection, so we remove most duplicate ERs with non-maximum suppression: we estimate the overlap between ERs from the component tree and calculate the stability of every ER. Within each group of overlapping ERs, only the one with maximum stability is kept. After that, we apply a two-stage Real AdaBoost to filter out non-text regions. We choose Mean-LBP as the feature because it is faster to compute than other features. The surviving ERs are then grouped together to move the result from character level to word level, which is more intuitive for humans. The next step is to apply OCR to the detected text. The chain code of the ER is used as the feature, and the classifier is trained with an SVM. We also apply several post-processing steps, such as optimal-path selection and spell checking, to improve the recognition result.
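
The non-maximum suppression step can be sketched as follows. This is a simplified illustration, not the project's implementation: it handles a single leaf-to-root path of the component tree, assumes the stability of each ER is already computed (higher means more stable), and uses hypothetical names (Er, suppressChain, minOverlap).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified ER record; the project's real structure may differ.
struct Er {
    int area;          // number of pixels in the region, must be positive
    double stability;  // assumed precomputed; higher means more stable
};

// `chain` is one leaf-to-root path of the component tree, so every ER is
// contained in the next one and areas never decrease along the chain.
// Returns the indices of the ERs that survive suppression.
std::vector<std::size_t> suppressChain(const std::vector<Er>& chain,
                                       double minOverlap) {
    std::vector<std::size_t> kept;
    if (chain.empty()) {
        return kept;
    }
    std::size_t best = 0;  // most stable ER in the current group
    for (std::size_t i = 1; i < chain.size(); ++i) {
        assert(chain[i].area > 0);
        // For nested regions the overlap ratio is child area / parent area.
        const double overlap =
            static_cast<double>(chain[i - 1].area) / chain[i].area;
        if (overlap >= minOverlap) {
            // Same group: keep track of the more stable candidate.
            if (chain[i].stability > chain[best].stability) {
                best = i;
            }
        } else {
            // The region grew too much: close this group, start a new one.
            kept.push_back(best);
            best = i;
        }
    }
    kept.push_back(best);
    return kept;
}
```

With areas 100, 105, 110, 400, 410 and a minOverlap of 0.7, the first three ERs form one group and the last two form another, so exactly two ERs are kept. A full version would apply the same grouping while traversing every branch of the component tree.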

[Figure: overview]

Notes

For text classification, the training data contains 12,000 positive samples, mostly extracted from the ICDAR 2003 and ICDAR 2015 datasets. The negative samples are extracted from random images with a bootstrap process. For OCR classification, the training data consists purely of synthetic letters in 28 different fonts.

The system is able to detect text in real time (30 FPS) and recognize text in nearly real time (8 to 15 FPS, depending on the amount of text) for a 640x480 image on an Intel Core i7 desktop computer. The algorithm's end-to-end text detection accuracy on the ICDAR 2015 dataset is roughly 70% with fine-tuning, and end-to-end recognition accuracy is about 30%.

Result

Detection results on ICDAR 2015

[Figures: result1, result2, result3]

Recognition results on random images

[Figures: result4, result5]

Linear Time MSER Demo

The green pixels are the so-called boundary pixels, which are pushed onto stacks. Each stack stands for a gray level, and pixels are pushed according to their gray level.

[Figure: result4]
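
The stack-per-gray-level structure can be sketched as below. This is an illustrative simplification with hypothetical names (BoundaryStacks, push, popLowest), assuming 8-bit gray levels and pixels identified by a flat index. It finds the darkest non-empty stack with a linear scan, where a tuned implementation would typically track it more cheaply.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper: one stack of boundary pixels per 8-bit gray level.
class BoundaryStacks {
public:
    // Push a pixel (flat image index) onto the stack of its gray level.
    void push(std::uint8_t gray, int pixel) {
        stacks_[gray].push_back(pixel);
    }

    // Pop the most recently pushed pixel from the darkest non-empty stack.
    // Returns false, leaving `pixel` untouched, when every stack is empty.
    bool popLowest(int& pixel) {
        for (std::vector<int>& stack : stacks_) {
            if (!stack.empty()) {
                pixel = stack.back();
                stack.pop_back();
                return true;
            }
        }
        return false;
    }

private:
    std::array<std::vector<int>, 256> stacks_;
};
```

Pixels pushed at gray level 30 are popped before a pixel pushed at level 200, regardless of insertion order, and pixels within one level come back in last-in, first-out order.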

References

  1. D. Nister and H. Stewenius, “Linear time maximally stable extremal regions,” European Conference on Computer Vision, pages 183–196, 2008.
  2. L. Neumann and J. Matas, “A method for text localization and recognition in real-world images,” Asian Conference on Computer Vision, pages 770–783, 2010.
  3. L. Neumann and J. Matas, “Real-time scene text localization and recognition,” Computer Vision and Pattern Recognition, pages 3538–3545, 2012.
  4. L. Neumann and J. Matas, “On combining multiple segmentations in scene text recognition,” International Conference on Document Analysis and Recognition, pages 523–527, 2013.
  5. H. Cho, M. Sung and B. Jun, “Canny Text Detector: Fast and robust scene text localization algorithm,” Computer Vision and Pattern Recognition, pages 3566–3573, 2016.
  6. B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition, pages 2963–2970, 2010.
  7. P. Viola and M. J. Jones, “Rapid object detection using a boosted cascade of simple features,” Computer Vision and Pattern Recognition, pages 511–518, 2001.
Owner
HSIEH, YI CHIA