GeneralOCR is an open-source Optical Character Recognition toolkit based on PyTorch.

Overview

Introduction

GeneralOCR is an open-source Optical Character Recognition toolkit based on PyTorch. It aims to be a faithful and practical tool for implementing state-of-the-art models in the OCR domain. You can use it to run inference and to train models on your own datasets. The architecture of this project is re-implemented from Facebook's Detectron and OpenMMLab's mmcv.

Installation

Refer to the gen_ocr installation guideline in docs/install.md.

Inference

Configuration

Text detection models

Supported Algorithms:

Text Detection
| Algorithm | Paper | Python argument (--det) |
| --- | --- | --- |
| DBNet (AAAI'2020) | https://arxiv.org/pdf/1911.08947 | DB_r18, DB_r50 |
| Mask R-CNN (ICCV'2017) | https://arxiv.org/abs/1703.06870 | MaskRCNN_CTW, MaskRCNN_IC15, MaskRCNN_IC17 |
| PANet (ICCV'2019) | https://arxiv.org/abs/1908.06391 | PANet_CTW, PANet_IC15 |
| PSENet (CVPR'2019) | https://arxiv.org/abs/1903.12473 | PS_CTW, PS_IC15 |
| TextSnake (ECCV'2018) | https://arxiv.org/abs/1807.01544 | TextSnake |
| DRRG (CVPR'2020) | https://arxiv.org/abs/2003.07493 | DRRG |
| FCENet (CVPR'2021) | https://arxiv.org/abs/2104.10442 | FCE_IC15, FCE_CTW_DCNv2 |

Table 1: Text detection algorithms, papers, and --det argument values supported by the package.

Text recognition models

Text Recognition
| Algorithm | Paper | Python argument (--recog) |
| --- | --- | --- |
| CRNN (TPAMI'2016) | https://arxiv.org/abs/1507.05717 | CRNN, CRNN_TPS |
| NRTR (ICDAR'2019) | https://arxiv.org/abs/1806.00926 | NRTR_1/8-1/4, NRTR_1/16-1/8 |
| RobustScanner (ECCV'2020) | https://arxiv.org/abs/2007.07542 | RobustScanner |
| SAR (AAAI'2019) | https://arxiv.org/abs/1811.00751 | SAR |
| SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era) | https://arxiv.org/abs/1910.04396 | SATRN, SATRN_sm |
| SegOCR (Manuscript'2021) | - | SEG |

Table 2: Text recognition algorithms, papers, and --recog argument values supported by the package.

Run inference

# Activate your conda environment
conda activate gen_ocr
python general_ocr/utils/ocr.py demo/demo_text_ocr_2.jpg --print-result --imshow --det TextSnake --recog SEG

The --det and --recog argument values are listed in Table 1 and Table 2.

The result is shown below:

demo image 1
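
If you need to run inference on many images, you can wrap the same CLI call in a short script. The sketch below is a minimal, unofficial example that loops over a folder (my_images/ is a placeholder) and invokes ocr.py with the arguments described in Table 1 and Table 2; it assumes the gen_ocr environment is already activated.

# Minimal sketch (unofficial): batch inference by wrapping the ocr.py CLI shown above.
import subprocess
from pathlib import Path

for image_path in sorted(Path("my_images").glob("*.jpg")):  # placeholder folder
    subprocess.run(
        [
            "python", "general_ocr/utils/ocr.py", str(image_path),
            "--print-result",        # print recognition results to stdout
            "--det", "TextSnake",    # any --det value from Table 1
            "--recog", "SEG",        # any --recog value from Table 2
        ],
        check=True,
    )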

Training

Training with toy dataset

We provide toy datasets in the /tests/data folder so that you can experiment there before training on the official datasets.

python tools/train.py configs/textrecog/robust_scanner/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg

To switch the text recognition algorithm to SAR:

python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar
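
Both toy commands write checkpoints and logs to the directory given by --work-dir. The sketch below is a minimal, unofficial example of launching the SAR toy run from Python and locating the resulting checkpoint; it assumes the checkpoint is saved as <work-dir>/latest.pth, the naming used in the Testing section below.

# Minimal sketch (unofficial): run a toy training job and locate its checkpoint.
import subprocess
from pathlib import Path

work_dir = "sar"
subprocess.run(
    [
        "python", "tools/train.py",
        "configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py",
        "--work-dir", work_dir,
    ],
    check=True,
)

checkpoint = Path(work_dir) / "latest.pth"  # assumed checkpoint name, as in the Testing section
print(f"Checkpoint found: {checkpoint}" if checkpoint.exists() else "No checkpoint found.")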

Training with Academic dataset

When you train on an academic dataset, you need to set up the dataset directory as described in the dataset guideline. The main point to focus on is that your model config points to the correct dataset directory. Assume you want to train TextSnake on the CTW1500 dataset; the config file configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py should then look like this:

dataset_type = 'IcdarDataset'   # dataset class for ICDAR-style annotations
data_root = 'data/ctw1500/'     # root folder of the CTW1500 dataset


data = dict(
    samples_per_gpu=4,          # training batch size per GPU
    workers_per_gpu=4,          # data-loading workers per GPU
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_training.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline))
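
Before launching training, it can help to verify that the files referenced by this config actually exist on disk. The snippet below is a minimal, unofficial sanity check based only on the data_root layout shown above.

# Minimal sanity check (unofficial): confirm the CTW1500 files referenced by the config exist.
from pathlib import Path

data_root = Path("data/ctw1500")
expected = [
    data_root / "instances_training.json",  # train annotations (ann_file of train=dict(...))
    data_root / "instances_test.json",      # val/test annotations
    data_root / "imgs",                     # image folder (img_prefix)
]
for path in expected:
    status = "OK     " if path.exists() else "MISSING"
    print(f"{status} {path}")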

Your data_root folder data/ctw1500/ has to be correct. Afterward, train your model:

python tools/train.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py --work-dir textsnake

Refer to the training documentation to study other configuration parameters.

Testing

Now that you have completed training TextSnake, you have the checkpoint textsnake/latest.pth. Evaluate its performance on the test set using the hmean-iou metric:

python tools/test.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py textsnake/latest.pth --eval hmean-iou
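
For reference, hmean-iou is the harmonic mean (F-score) of precision and recall computed over IoU-matched detections. The snippet below only sketches the final aggregation step, assuming precision and recall have already been produced by the evaluator.

# Sketch of the hmean aggregation: harmonic mean of precision and recall.
def hmean(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(hmean(0.85, 0.80))  # example values; prints 0.8242...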

Citation

If you find this project useful in your research, kindly consider citing:

@article{general_ocr,
    title={GeneralOCR: A Comprehensive package for OCR models},
    author={khanhphamdinh},
    email={[email protected]},
    year={2021}
}

Comments
  • Please consider License seriously

    I found that your repository is based on the mmocr repo of OpenMMLab (https://github.com/open-mmlab/mmocr). Please at least cite the repo and preserve the copyrights before redistribution to acknowledge the authors' works.

    Thanks.

    opened by VinhLoiIT 1
  • Import error: undefined symbol

    Dear author, When I run the test command: python general_ocr/utils/ocr.py demo/mrbean.png --print-result --imshow --det TextSnake --recog SEG

    The output error is like this: ImportError: /home/avlab/general_ocr/general_ocr/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _Z42SigmoidFocalLossBackwardCUDAKernelLauncherN2at6TensorES0_S0_S0_ff

    Do you know the problem and how to fix that, please?

    opened by theohsiung 0
  • ModuleNotFoundError: No module named 'general_ocr._ext'

    Dear author, When I run the test command: python general_ocr/utils/ocr.py demo/mrbean.png --print-result --imshow --det TextSnake --recog SEG

    The output error is like this: ModuleNotFoundError: No module named 'general_ocr._ext', although I have installed the repo following the instruction in https://github.com/phamdinhkhanh/general_ocr/blob/main/docs/install.md.

    Do you know the problem and how to fix that, please?

    opened by ngthanhtin 3
  • ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found

    Setup:

    [screenshot of the environment setup]

    Log ERROR:

    Traceback (most recent call last):
      File "general_ocr/utils/ocr.py", line 7, in <module>
        import general_ocr
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/__init__.py", line 10, in <module>
        from .apis import *
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/apis/__init__.py", line 2, in <module>
        from .inference import init_detector, model_inference, inference_detector
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/apis/inference.py", line 10, in <module>
        from general_ocr.core import get_classes
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/__init__.py", line 4, in <module>
        from .bbox import *  # noqa: F401, F403
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/__init__.py", line 8, in <module>
        from .samplers import (BaseSampler, CombinedSampler,
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/samplers/__init__.py", line 10, in <module>
        from .score_hlr_sampler import ScoreHLRSampler
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/samplers/score_hlr_sampler.py", line 3, in <module>
        from general_ocr.ops import nms_match
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/ops/__init__.py", line 2, in <module>
        from .ball_query import ball_query
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/ops/ball_query.py", line 7, in <module>
        ext_module = ext_loader.load_ext('_ext', ['ball_query_forward'])
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/utils/ext_loader.py", line 13, in load_ext
        ext = importlib.import_module('general_ocr.' + name)
      File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/_ext.cpython-37m-x86_64-linux-gnu.so)
    
    opened by Baristi000 1
Releases (general_ocr-0.0.1)
  • general_ocr-0.0.1(Oct 26, 2021)

    • Launch Project
    • Model support:
      • text detection: DBNet, Mask-RCNN, PANet, PSENet, TextSnake, DRRG, FCENet
      • text recognition: CRNN, NRTR, RobustScanner, SAR, SATRN, SegOCR