GeneralOCR is open source Optical Character Recognition based on PyTorch.

Overview

Introduction

GeneralOCR is open source Optical Character Recognition based on PyTorch. It makes a fidelity and useful tool to implement SOTA models on OCR domain. You can use them to infer and train the model with your customized dataset. The solution architecture of this project is re-implemented from facebook Detectron and openmm-cv.

Installation

Refer to the guideline of gen_ocr installation

Inference

Configuration

Model text detection

Supported Algorithms:

Text Detection
Algorithm Paper Python argument (--det)
- [x] DBNet (AAAI'2020) https://arxiv.org/pdf/1911.08947 DB_r18, DB_r50
- [x] Mask R-CNN (ICCV'2017) https://arxiv.org/abs/1703.06870 MaskRCNN_CTW, MaskRCNN_IC15, MaskRCNN_IC17
- [x] PANet (ICCV'2019) https://arxiv.org/abs/1908.06391 PANet_CTW, PANet_IC15
- [x] PSENet (CVPR'2019) https://arxiv.org/abs/1903.12473 PS_CTW, PS_IC15
- [x] TextSnake (ECCV'2018) https://arxiv.org/abs/1807.01544 TextSnake
- [x] DRRG (CVPR'2020) https://arxiv.org/abs/2003.07493 DRRG
- [x] FCENet (CVPR'2021) https://arxiv.org/abs/2104.10442 FCE_IC15, FCE_CTW_DCNv2

Table 1: Text detection algorithms, papers and arguments configuration in package.

Model text recognition

Text Recognition
Algorithm Paper Python argument (--recog)
- [x] CRNN (TPAMI'2016) https://arxiv.org/abs/1507.05717 CRNN, CRNN_TPS
- [x] NRTR (ICDAR'2019) https://arxiv.org/abs/1806.00926 NRTR_1/8-1/4, NRTR_1/16-1/8
- [x] RobustScanner (ECCV'2020) https://arxiv.org/abs/2007.07542 RobustScanner
- [x] SAR (AAAI'2019) https://arxiv.org/abs/1811.00751 SAR
- [x] SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era) https://arxiv.org/abs/1910.04396 SATRN, SATRN_sm
- [x] SegOCR (Manuscript'2021) - SEG

Table 2: Text recognition algorithms, papers and arguments configuration in package.

Inference

# Activate your conda environment
conda activate gen_ocr
python general_ocr/utils/ocr.py demo/demo_text_ocr_2.jpg --print-result --imshow --det TextSnake --recog SEG

--det and --recog argument values are supplied in table 1 and table 2.

The result as below:

demo image 1

Training

Training with toy dataset

We prepare toy datasets for you to train on /tests/data folder in which you can do your experiment before training with the official datasets.

python tools/train.py configs/textrecog/robust_scanner/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg

To change text recognition algorithm into sag:

python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar

Training with Academic dataset

When you train Academic dataset, you need to setup dataset directory as this guideline. The main point you should forecus is that your model point to the right dataset directory. Assume that you want to train model TextSnake on CTW1500 dataset, thus your config file of that model in configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py should be as below:

dataset_type = 'IcdarDataset'
data_root = 'data/ctw1500/'


data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_training.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline))

Your data_root folder data/ctw1500/ have to be right. Afterward, train your model:

python tools/train.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py --work-dir textsnake

To study other configuration parameters on training.

Testing

Now you completed training of TextSnake and get the checkpoint textsnake/lastest.pth. You should evaluate peformance on test set using hmean-iou metric:

python tools/test.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py textsnake/latest.pth --eval hmean-iou

Citation

If you find this project is useful in your reasearch, kindly consider cite:

@article{genearal_ocr,
    title={GeneralOCR:  A Comprehensive package for OCR models},
    author={khanhphamdinh},
    email= {[email protected]},
    year={2021}
}
You might also like...
 a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch
a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch

pytorch-spynet This is a personal reimplementation of SPyNet [1] using PyTorch. Should you be making use of this work, please cite the paper according

 OpenGAN: Open-Set Recognition via Open Data Generation
OpenGAN: Open-Set Recognition via Open Data Generation

OpenGAN: Open-Set Recognition via Open Data Generation ICCV 2021 (oral) Real-world machine learning systems need to analyze novel testing data that di

Face Library is an open source package for accurate and real-time face detection and recognition
Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

CharacterGAN: Few-Shot Keypoint Character Animation and Reposing
CharacterGAN: Few-Shot Keypoint Character Animation and Reposing

CharacterGAN Implementation of the paper "CharacterGAN: Few-Shot Keypoint Character Animation and Reposing" by Tobias Hinz, Matthew Fisher, Oliver Wan

Character Controllers using Motion VAEs

Character Controllers using Motion VAEs This repo is the codebase for the SIGGRAPH 2020 paper with the title above. Please find the paper and demo at

An addon uses SMPL's poses and global translation to drive cartoon character in Blender.
An addon uses SMPL's poses and global translation to drive cartoon character in Blender.

Blender addon for driving character The addon drives the cartoon character by passing SMPL's poses and global translation into model's armature in Ble

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.
Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

KK Blender Shader Pack A plugin and a shader to get you started with setting up an exported Koikatsu character in Blender. The plugin is a Blender add

Character-Input - Create a program that asks the user to enter their name and their age

Character-Input Create a program that asks the user to enter their name and thei

Comments
  • Please consider License seriously

    Please consider License seriously

    I found that your repository is based on the mmocr repo of OpenMMLab (https://github.com/open-mmlab/mmocr). Please at least cite the repo and preserve the copyrights before redistribution to acknowledge the authors' works.

    Thanks.

    opened by VinhLoiIT 1
  • Import error: undefine symbol

    Import error: undefine symbol

    Dear author, When I run the test command: python general_ocr/utils/ocr.py demo/mrbean.png --print-result --imshow --det TextSnake --recog SEG

    The output error is like this: ImportError: /home/avlab/general_ocr/general_ocr/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _Z42SigmoidFocalLossBackwardCUDAKernelLauncherN2at6TensorES0_S0_S0_ff

    Do you know the problem and how to fix that, please?

    opened by theohsiung 0
  • ModuleNotFoundError: No module named 'general_ocr._ext'

    ModuleNotFoundError: No module named 'general_ocr._ext'

    Dear author, When I run the test command: python general_ocr/utils/ocr.py demo/mrbean.png --print-result --imshow --det TextSnake --recog SEG

    The output error is like this: ModuleNotFoundError: No module named 'general_ocr._ext', although I have installed the repo following the instruction in https://github.com/phamdinhkhanh/general_ocr/blob/main/docs/install.md.

    Do you know the problem and how to fix that, please?

    opened by ngthanhtin 3
  • ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found

    ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found

    Setup:

    Screen Shot 2021-10-17 at 1 17 03 AM

    Log ERROR:

    Traceback (most recent call last):
      File "general_ocr/utils/ocr.py", line 7, in <module>
        import general_ocr
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/__init__.py", line 10, in <module>
        from .apis import *
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/apis/__init__.py", line 2, in <module>
        from .inference import init_detector, model_inference, inference_detector
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/apis/inference.py", line 10, in <module>
        from general_ocr.core import get_classes
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/__init__.py", line 4, in <module>
        from .bbox import *  # noqa: F401, F403
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/__init__.py", line 8, in <module>
        from .samplers import (BaseSampler, CombinedSampler,
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/samplers/__init__.py", line 10, in <module>
        from .score_hlr_sampler import ScoreHLRSampler
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/core/bbox/samplers/score_hlr_sampler.py", line 3, in <module>
        from general_ocr.ops import nms_match
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/ops/__init__.py", line 2, in <module>
        from .ball_query import ball_query
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/ops/ball_query.py", line 7, in <module>
        ext_module = ext_loader.load_ext('_ext', ['ball_query_forward'])
      File "/usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/utils/ext_loader.py", line 13, in load_ext
        ext = importlib.import_module('general_ocr.' + name)
      File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /usr/local/lib/python3.7/dist-packages/general_ocr-0.0.1-py3.7.egg/general_ocr/_ext.cpython-37m-x86_64-linux-gnu.so)
    
    opened by Baristi000 1
Releases(general_ocr-0.0.1)
  • general_ocr-0.0.1(Oct 26, 2021)

    • Launch Project
    • Model support:
      • text detection: DBNet, Mask-RCNN, PANet, PSENet, TextSnake, DRRG, FCENet
      • text recognition: CRNN, NRTR, RobustScanner, SAR, SATRN, SegOCR
    Source code(tar.gz)
    Source code(zip)
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
[CVPR 2022] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing (CVPR 2022) This repository provides the official PyTorch impleme

Billy XU 128 Jan 03, 2023
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

CV Backbones including GhostNet, TinyNet, TNT (Transformer in Transformer) developed by Huawei Noah's Ark Lab. GhostNet Code TinyNet Code TNT Code Pyr

HUAWEI Noah's Ark Lab 3k Jan 08, 2023
Machine Learning Framework for Operating Systems - Brings ML to Linux kernel

KML: A Machine Learning Framework for Operating Systems & Storage Systems Storage systems and their OS components are designed to accommodate a wide v

File systems and Storage Lab (FSL) 186 Nov 24, 2022
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

AugMix Introduction We propose AugMix, a data processing technique that mixes augmented images and enforces consistent embeddings of the augmented ima

Google Research 876 Dec 17, 2022
A simple program for training and testing vit

Vit This is a simple program for training and testing vit. Key requirements: torch, torchvision and timm. Dataset I put 5 categories of the cub classi

xiezhenyu 2 Oct 11, 2022
Dataset and Source code of paper 'Enhancing Keyphrase Extraction from Academic Articles with their Reference Information'.

Enhancing Keyphrase Extraction from Academic Articles with their Reference Information Overview Dataset and code for paper "Enhancing Keyphrase Extrac

15 Nov 24, 2022
Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021

Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform This repository is the implementation of "Variable-Rate Deep Image C

Myungseo Song 47 Dec 13, 2022
Machine Learning toolbox for Humans

Reproducible Experiment Platform (REP) REP is ipython-based environment for conducting data-driven research in a consistent and reproducible way. Main

Yandex 662 Nov 20, 2022
Predicting 10 different clothing types using Xception pre-trained model.

Predicting-Clothing-Types Predicting 10 different clothing types using Xception pre-trained model from Keras library. It is reimplemented version from

AbdAssalam Ahmad 3 Dec 29, 2021
The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation"

SD-AANet The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation" [arxiv] Overview confi

cv516Buaa 9 Nov 07, 2022
we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks.

Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection Overview Localization of anatomical landmarks is essential for clinica

aoyueyuan 0 Aug 28, 2022
[NeurIPS 2021] "Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks" by Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks Yonggan Fu, Qixuan Yu, Yang Zhang, S

12 Dec 11, 2022
Deep Latent Force Models

Deep Latent Force Models This repository contains a PyTorch implementation of the deep latent force model (DLFM), presented in the paper, Compositiona

Tom McDonald 5 Oct 26, 2022
FinGAT: A Financial Graph Attention Networkto Recommend Top-K Profitable Stocks

FinGAT: A Financial Graph Attention Networkto Recommend Top-K Profitable Stocks This is our implementation for the paper: FinGAT: A Financial Graph At

Yu-Che Tsai 64 Dec 13, 2022
This project is used for the paper Differentiable Programming of Isometric Tensor Network

This project is used for the paper "Differentiable Programming of Isometric Tensor Network". (arXiv:2110.03898)

Chenhua Geng 15 Dec 13, 2022
This repository contains the database and code used in the paper Embedding Arithmetic for Text-driven Image Transformation

This repository contains the database and code used in the paper Embedding Arithmetic for Text-driven Image Transformation (Guillaume Couairon, Holger

Meta Research 31 Oct 17, 2022
Vehicles Counting using YOLOv4 + DeepSORT + Flask + Ngrok

A project for counting vehicles using YOLOv4 + DeepSORT + Flask + Ngrok

Duong Tran Thanh 37 Dec 16, 2022
Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

HamasKhan 3 Jul 08, 2022
Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

RuiLiu 65 Dec 20, 2022