GeneralOCR is open source Optical Character Recognition based on PyTorch.

Last update: Dec 29, 2022

Related tags

Overview

Introduction

GeneralOCR is open source Optical Character Recognition based on PyTorch. It makes a fidelity and useful tool to implement SOTA models on OCR domain. You can use them to infer and train the model with your customized dataset. The solution architecture of this project is re-implemented from facebook Detectron and openmm-cv.

Installation

Refer to the guideline of gen_ocr installation

Inference

Configuration

Model text detection

Supported Algorithms:

Text Detection

Algorithm	Paper	Python argument (--det)
- [x] DBNet (AAAI'2020)	https://arxiv.org/pdf/1911.08947	DB_r18, DB_r50
- [x] Mask R-CNN (ICCV'2017)	https://arxiv.org/abs/1703.06870	MaskRCNN_CTW, MaskRCNN_IC15, MaskRCNN_IC17
- [x] PANet (ICCV'2019)	https://arxiv.org/abs/1908.06391	PANet_CTW, PANet_IC15
- [x] PSENet (CVPR'2019)	https://arxiv.org/abs/1903.12473	PS_CTW, PS_IC15
- [x] TextSnake (ECCV'2018)	https://arxiv.org/abs/1807.01544	TextSnake
- [x] DRRG (CVPR'2020)	https://arxiv.org/abs/2003.07493	DRRG
- [x] FCENet (CVPR'2021)	https://arxiv.org/abs/2104.10442	FCE_IC15, FCE_CTW_DCNv2

Table 1: Text detection algorithms, papers and arguments configuration in package.

Model text recognition

Text Recognition

Algorithm	Paper	Python argument (--recog)
- [x] CRNN (TPAMI'2016)	https://arxiv.org/abs/1507.05717	CRNN, CRNN_TPS
- [x] NRTR (ICDAR'2019)	https://arxiv.org/abs/1806.00926	NRTR_1/8-1/4, NRTR_1/16-1/8
- [x] RobustScanner (ECCV'2020)	https://arxiv.org/abs/2007.07542	RobustScanner
- [x] SAR (AAAI'2019)	https://arxiv.org/abs/1811.00751	SAR
- [x] SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)	https://arxiv.org/abs/1910.04396	SATRN, SATRN_sm
- [x] SegOCR (Manuscript'2021)	-	SEG

Table 2: Text recognition algorithms, papers and arguments configuration in package.

Inference

# Activate your conda environment
conda activate gen_ocr
python general_ocr/utils/ocr.py demo/demo_text_ocr_2.jpg --print-result --imshow --det TextSnake --recog SEG

--det and --recog argument values are supplied in table 1 and table 2.

The result as below:

Training

Training with toy dataset

We prepare toy datasets for you to train on /tests/data folder in which you can do your experiment before training with the official datasets.

python tools/train.py configs/textrecog/robust_scanner/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg

To change text recognition algorithm into sag:

python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar

Training with Academic dataset

When you train Academic dataset, you need to setup dataset directory as this guideline. The main point you should forecus is that your model point to the right dataset directory. Assume that you want to train model TextSnake on CTW1500 dataset, thus your config file of that model in configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py should be as below:

dataset_type = 'IcdarDataset'
data_root = 'data/ctw1500/'


data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_training.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline))

Your data_root folder data/ctw1500/ have to be right. Afterward, train your model:

python tools/train.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py --work-dir textsnake

To study other configuration parameters on training.

Testing

Now you completed training of TextSnake and get the checkpoint textsnake/lastest.pth. You should evaluate peformance on test set using hmean-iou metric:

python tools/test.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py textsnake/latest.pth --eval hmean-iou

Citation

If you find this project is useful in your reasearch, kindly consider cite:

@article{genearal_ocr,
    title={GeneralOCR:  A Comprehensive package for OCR models},
    author={khanhphamdinh},
    email= {[email protected]},
    year={2021}
}

You might also like...

a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch

pytorch-spynet This is a personal reimplementation of SPyNet [1] using PyTorch. Should you be making use of this work, please cite the paper according

269 Jan 2, 2023

OpenGAN: Open-Set Recognition via Open Data Generation

OpenGAN: Open-Set Recognition via Open Data Generation ICCV 2021 (oral) Real-world machine learning systems need to analyze novel testing data that di

90 Jan 6, 2023

Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

52 Nov 9, 2022

CharacterGAN: Few-Shot Keypoint Character Animation and Reposing

CharacterGAN Implementation of the paper "CharacterGAN: Few-Shot Keypoint Character Animation and Reposing" by Tobias Hinz, Matthew Fisher, Oliver Wan

181 Dec 27, 2022

Character Controllers using Motion VAEs

Character Controllers using Motion VAEs This repo is the codebase for the SIGGRAPH 2020 paper with the title above. Please find the paper and demo at

165 Jan 3, 2023

An addon uses SMPL's poses and global translation to drive cartoon character in Blender.

Blender addon for driving character The addon drives the cartoon character by passing SMPL's poses and global translation into model's armature in Ble

153 Dec 14, 2022

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

1 Oct 29, 2021

Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

KK Blender Shader Pack A plugin and a shader to get you started with setting up an exported Koikatsu character in Blender. The plugin is a Blender add

166 Jan 1, 2023

Character-Input - Create a program that asks the user to enter their name and their age

Character-Input Create a program that asks the user to enter their name and thei

0 Feb 6, 2022

Comments

Please consider License seriously

I found that your repository is based on the mmocr repo of OpenMMLab (https://github.com/open-mmlab/mmocr). Please at least cite the repo and preserve the copyrights before redistribution to acknowledge the authors' works.

Thanks.

opened by VinhLoiIT 1
Import error: undefine symbol

Dear author, When I run the test command: python general_ocr/utils/ocr.py demo/mrbean.png --print-result --imshow --det TextSnake --recog SEG

The output error is like this: ImportError: /home/avlab/general_ocr/general_ocr/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _Z42SigmoidFocalLossBackwardCUDAKernelLauncherN2at6TensorES0_S0_S0_ff

Do you know the problem and how to fix that, please?

opened by theohsiung 0
ModuleNotFoundError: No module named 'general_ocr._ext'

Dear author, When I run the test command: python general_ocr/utils/ocr.py demo/mrbean.png --print-result --imshow --det TextSnake --recog SEG

The output error is like this: ModuleNotFoundError: No module named 'general_ocr._ext', although I have installed the repo following the instruction in https://github.com/phamdinhkhanh/general_ocr/blob/main/docs/install.md.

Do you know the problem and how to fix that, please?

opened by ngthanhtin 3

ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found

Setup:

Log ERROR:

Traceback (most recent call last):
  File "general_ocr/utils/ocr.py", line 7, in <module>
    import general_ocr
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/__init__.py", line 10, in <module>
    from .apis import *
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/apis/__init__.py", line 2, in <module>
    from .inference import init_detector, model_inference, inference_detector
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/apis/inference.py", line 10, in <module>
    from general_ocr.core import get_classes
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/__init__.py", line 4, in <module>
    from .bbox import *  # noqa: F401, F403
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/__init__.py", line 8, in <module>
    from .samplers import (BaseSampler, CombinedSampler,
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/samplers/__init__.py", line 10, in <module>
    from .score_hlr_sampler import ScoreHLRSampler
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/samplers/score_hlr_sampler.py", line 3, in <module>
    from general_ocr.ops import nms_match
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/ops/__init__.py", line 2, in <module>
    from .ball_query import ball_query
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/ops/ball_query.py", line 7, in <module>
    ext_module = ext_loader.load_ext('_ext', ['ball_query_forward'])
  File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('general_ocr.' + name)
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/_ext.cpython-37m-x86_64-linux-gnu.so)

opened by Baristi000 1

Releases(general_ocr-0.0.1)

general_ocr-0.0.1(Oct 26, 2021)
Launch Project

Model support:

text detection: DBNet, Mask-RCNN, PANet, PSENet, TextSnake, DRRG, FCENet

text recognition: CRNN, NRTR, RobustScanner, SAR, SATRN, SegOCR

Source code(tar.gz)
Source code(zip)

Owner

GitHub Repository

This repo is to be freely used by ML devs to check the GAN performances without coding from scratch.

GANs for Fun Created because I can! GOAL The goal of this repo is to be freely used by ML devs to check the GAN performances without coding from scrat

13 Jan 26, 2022

The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

IGMTF The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting". Requirements The framework

24 Dec 05, 2022

PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

This is the official implementation of the following paper: Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Con

217 Jan 01, 2023

TensorFlow implementation of Adaptive Information Transfer Multi-task (AITM) framework. Code for the paper submitted to KDD21: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning for Customer Acquisition.

AITM TensorFlow implementation of Adaptive Information Transfer Multi-task (AITM) framework. Code for the paper accepted by KDD21: Modeling the Sequen

78 Nov 29, 2022

[ICML 2020] DrRepair: Learning to Repair Programs from Error Messages

DrRepair: Learning to Repair Programs from Error Messages This repo provides the source code & data of our paper: Graph-based, Self-Supervised Program

155 Jan 08, 2023

Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

35 Dec 21, 2022

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

Introduction K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce. Installation PyTor

21 Nov 16, 2022

CM building dataset Timisoara

CM_building_dataset_Timisoara Date created: Febr-2020 The Timi\c{s}oara Building Dataset - TMBuD - is composed of 160 images with the resolution of 76

5 Sep 07, 2022

A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation A pytorch-version implementation

11 Oct 08, 2022

An end-to-end regression problem of predicting the price of properties in Bangalore.

Bangalore-House-Price-Prediction An end-to-end regression problem of predicting the price of properties in Bangalore. Deployed in Heroku using Flask.

1 Nov 25, 2022

🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱

Monitor deep learning model training and hardware usage from mobile. 🔥 Features Monitor running experiments from mobile phone (or laptop) Monitor har

1.2k Dec 25, 2022

Wordle-solver - Wordle answer generation program in python

🟨 Wordle Solver 🟩 Wordle answer generation program in python ✔️ Requirements U

4 May 28, 2022

git《Joint Entity and Relation Extraction with Set Prediction Networks》(2020) GitHub:

Joint Entity and Relation Extraction with Set Prediction Networks Source code for Joint Entity and Relation Extraction with Set Prediction Networks. W

130 Dec 13, 2022

Deformable DETR is an efficient and fast-converging end-to-end object detector.

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

2k Jan 05, 2023

Using image super resolution models with vapoursynth and speeding them up with TensorRT

vs-RealEsrganAnime-tensorrt-docker Using image super resolution models with vapoursynth and speeding them up with TensorRT. Also a docker image since

4 Aug 23, 2022

Classical OCR DCNN reproduction based on PaddlePaddle framework.

Paddle-SVHN Classical OCR DCNN reproduction based on PaddlePaddle framework. This project reproduces Multi-digit Number Recognition from Street View I

1 Nov 12, 2021

Source code for CAST - Crisis Domain Adaptation Using Sequence-to-sequence Transformers (Accepted to ISCRAM 2021, CorePaper).

Source code for CAST: Crisis Domain Adaptation UsingSequence-to-sequenceTransformers (Paper, BibTeX, Accepted to ISCRAM 2021, CorePaper) Quick start D

0 Jul 14, 2021

GeneralOCR is open source Optical Character Recognition based on PyTorch.

Related tags

Overview

Introduction

Installation

Inference

Configuration

Model text detection

Model text recognition

Inference

Training

Training with toy dataset

Training with Academic dataset

Testing

Citation

You might also like...

a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch

OpenGAN: Open-Set Recognition via Open Data Generation

Face Library is an open source package for accurate and real-time face detection and recognition

CharacterGAN: Few-Shot Keypoint Character Animation and Reposing

Character Controllers using Motion VAEs

An addon uses SMPL's poses and global translation to drive cartoon character in Blender.

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

Character-Input - Create a program that asks the user to enter their name and their age

Comments

Please consider License seriously

Import error: undefine symbol

ModuleNotFoundError: No module named 'general_ocr._ext'

ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found

Setup:

Log ERROR:

Releases(general_ocr-0.0.1)

general_ocr-0.0.1(Oct 26, 2021)

Owner

This repo is to be freely used by ML devs to check the GAN performances without coding from scratch.

The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

TensorFlow implementation of Adaptive Information Transfer Multi-task (AITM) framework. Code for the paper submitted to KDD21: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning for Customer Acquisition.

[ICML 2020] DrRepair: Learning to Repair Programs from Error Messages

Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

CM building dataset Timisoara

A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

An end-to-end regression problem of predicting the price of properties in Bangalore.

🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱

Wordle-solver - Wordle answer generation program in python

git《Joint Entity and Relation Extraction with Set Prediction Networks》(2020) GitHub:

Deformable DETR is an efficient and fast-converging end-to-end object detector.

Using image super resolution models with vapoursynth and speeding them up with TensorRT

Classical OCR DCNN reproduction based on PaddlePaddle framework.

Source code for CAST - Crisis Domain Adaptation Using Sequence-to-sequence Transformers (Accepted to ISCRAM 2021, CorePaper).

Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

WSDM2022 Challenge - Large scale temporal graph link prediction

The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'