Simple and understandable swin-transformer OCR project

Last update: Dec 31, 2022

Overview

swin-transformer-ocr

Overview

Simple and understandable swin-transformer OCR project. The model in this repository heavily relied on high-level open-source projects like timm and x_transformers. And also you can find that the procedure of training is intuitive thanks to the legibility of pytorch-lightning.

The model in this repository encodes input image to context vector with 'shifted-window` which is a swin-transformer encoding mechanism. And it decodes the vector with a normal auto-regressive transformer.

If you are not familiar with transformer OCR structure, transformer-ocr would be easier to understand because it uses a traditional convolution network (ResNet-v2) for the encoder.

Performance

With private korean handwritten text dataset, the accuracy(exact match) is 97.6%.

Data

./dataset/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt

# in train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...

You should preprocess the data first. Crop the image by word or sentence level area. Put all image data in a specific directory. Ground truth information should be provided with a txt file. In the txt file, write the image file name and label with \t separator in the same line.

Configuration

In settings/ directory, you can find default.yaml. You can set almost every hyper-parameter in that file. Copy one and edit it as your experiment version. I recommend you to run with the default setting first, before you change it.

Train

python run.py --version 0 --setting settings/default.yaml --num_workers 16 --batch_size 128

You can check your training log with tensorboard.

tensorboard --log_dir tb_logs --bind_all

Predict

When your model finishes training, you can use your model for prediction.

python predict.py --setting <your_setting.yaml> --target <image_or_directory> --tokenizer <your_tokenizer_pkl> --checkpoint <saved_checkpoint>

Exporting to ONNX

You can export your model to ONNX format. It's very easy thanks to pytorch-lightning. See the related pytorch-lightning document.

Citations

@misc{liu-2021,
    title   = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
	author  = {Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
	year    = {2021},
    eprint  = {2103.14030},
	archivePrefix = {arXiv}
}

Simple and understandable swin-transformer OCR project

Related tags

Overview

swin-transformer-ocr

Overview

Performance

Data

Configuration

Train

Predict

Exporting to ONNX

Citations

Owner

Ha YongWook

Official PyTorch implementation of "Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning" (ICCV2021 Oral)

A Closer Look at Structured Pruning for Neural Network Compression

Implicit Deep Adaptive Design (iDAD)

Augmented Traffic Control: A tool to simulate network conditions

The official repository for "Score Transformer: Generating Musical Scores from Note-level Representation" (MMAsia '21)

Constructing Neural Network-Based Models for Simulating Dynamical Systems

Official repository for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'21, Oral Presentation)

Code and data for paper "Deep Photo Style Transfer"

Rede Neural Convolucional feita durante o processo seletivo do Laboratório de Inteligência Artificial da FACOM (UFMS)

一个目标检测的通用框架(不需要cuda编译)，支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

Code for "On the Effects of Batch and Weight Normalization in Generative Adversarial Networks"

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

UV matrix decompostion using movielens dataset

RATCHET is a Medical Transformer for Chest X-ray Diagnosis and Reporting

Use deep learning, genetic programming and other methods to predict stock and market movements

Fast Neural Style for Image Style Transform by Pytorch

A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

Implementation of "Semi-supervised Domain Adaptive Structure Learning"

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2020

Papers about explainability of GNNs