Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

Overview

SwinTextSpotter

This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022). The paper is available at this link.

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

  • Python=3.8
  • PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
  • OpenCV for visualization

Steps

  1. Install the repository (we recommend to use Anaconda for installation.)
conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop
  1. dataset path
datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......

Downloaded images

Downloaded label[Google Drive] [BaiduYun] PW: 46vd

Downloader lexicion[Google Drive] and place it to corresponding dataset.

You can also prepare your custom dataset following the example scripts. [example scripts]

Totaltext

To evaluate on Total Text, CTW1500, ICDAR2015, first download the zipped annotations with

cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
wget -O gt_icdar2015.zip https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing
wget -O gt_vintext.zip https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing
  1. Pretrain SWINTS (e.g., with Swin-Transformer backbone)
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml
  1. Fine-tune model on the mixed real dataset
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml
  1. Fine-tune model
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml
  1. Evaluate SWINTS (e.g., with Swin-Transformer backbone)
python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth
  1. Visualize the detection and recognition results (e.g., with ResNet50 backbone)
python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth

Example results:

Acknowlegement

Adelaidet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and YuLiang liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal={arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial purpose usage, please contact Dr. Lianwen Jin: [email protected]

Copyright 2019, Deep Learning and Vision Computing Lab, South China China University of Technology. http://www.dlvc-lab.net

Owner
mxin262
mxin262
Fast Differentiable Matrix Sqrt Root

Official Pytorch implementation of ICLR 22 paper Fast Differentiable Matrix Square Root

YueSong 42 Dec 30, 2022
Code for ICCV2021 paper SPEC: Seeing People in the Wild with an Estimated Camera

SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021] SPEC: Seeing People in the Wild with an Estimated Camera, Muhammed Kocabas, Chun-

Muhammed Kocabas 187 Dec 26, 2022
Marvis is Mastouri's Jarvis version of the AI-powered Python personal assistant.

Marvis v1.0 Marvis is Mastouri's Jarvis version of the AI-powered Python personal assistant. About M.A.R.V.I.S. J.A.R.V.I.S. is a fictional character

Reda Mastouri 1 Dec 29, 2021
CLOOB training (JAX) and inference (JAX and PyTorch)

cloob-training Pretrained models There are two pretrained CLOOB models in this repo at the moment, a 16 epoch and a 32 epoch ViT-B/16 checkpoint train

Katherine Crowson 64 Nov 27, 2022
People movement type classifier with YOLOv4 detection and SORT tracking.

Movement classification The goal of this project would be movement classification of people, in other words, walking (normal and fast) and running. Yo

4 Sep 21, 2021
(ICCV 2021) Official code of "Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-on and Outfit Editing."

Dressing in Order (DiOr) 👚 [Paper] 👖 [Webpage] 👗 [Running this code] The official implementation of "Dressing in Order: Recurrent Person Image Gene

Aiyu Cui 277 Dec 28, 2022
A texturizer that I just made. Nothing special here.

texturizer This is a little project that I did with an hour's time. It texturizes an image given a image and a texture to texturize it with. There is

1 Nov 11, 2021
[AAAI-2022] Official implementations of MCL: Mutual Contrastive Learning for Visual Representation Learning

Mutual Contrastive Learning for Visual Representation Learning This project provides source code for our Mutual Contrastive Learning for Visual Repres

winycg 48 Jan 02, 2023
Existing Literature about Machine Unlearning

Machine Unlearning Papers 2021 Brophy and Lowd. Machine Unlearning for Random Forests. In ICML 2021. Bourtoule et al. Machine Unlearning. In IEEE Symp

Jonathan Brophy 213 Jan 08, 2023
Easily Process a Batch of Cox Models

ezcox: Easily Process a Batch of Cox Models The goal of ezcox is to operate a batch of univariate or multivariate Cox models and return tidy result. ⏬

Shixiang Wang 15 May 23, 2022
[ICLR'21] Counterfactual Generative Networks

This repository contains the code for the ICLR 2021 paper "Counterfactual Generative Networks" by Axel Sauer and Andreas Geiger. If you want to take the CGN for a spin and generate counterfactual ima

88 Jan 02, 2023
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

[ICCV2021] TransReID: Transformer-based Object Re-Identification [pdf] The official repository for TransReID: Transformer-based Object Re-Identificati

DamoCV 569 Dec 30, 2022
An excellent hash algorithm combining classical sponge structure and RNN.

SHA-RNN Recurrent Neural Network with Chaotic System for Hash Functions Anonymous Authors [摘要] 在这次作业中我们提出了一种新的 Hash Function —— SHA-RNN。其以海绵结构为基础,融合了混

Houde Qian 5 May 15, 2022
An experimentation and research platform to investigate the interaction of automated agents in an abstract simulated network environments.

CyberBattleSim April 8th, 2021: See the announcement on the Microsoft Security Blog. CyberBattleSim is an experimentation research platform to investi

Microsoft 1.5k Dec 25, 2022
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
Official code for "EagerMOT: 3D Multi-Object Tracking via Sensor Fusion" [ICRA 2021]

EagerMOT: 3D Multi-Object Tracking via Sensor Fusion Read our ICRA 2021 paper here. Check out the 3 minute video for the quick intro or the full prese

Aleksandr Kim 276 Dec 30, 2022
This is the official implementation of TrivialAugment and a mini-library for the application of multiple image augmentation strategies including RandAugment and TrivialAugment.

Trivial Augment This is the official implementation of TrivialAugment (https://arxiv.org/abs/2103.10158), as was used for the paper. TrivialAugment is

AutoML-Freiburg-Hannover 94 Dec 30, 2022
The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

FMFCC-A This project is the description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts. The FMFCC-A dataset is shared through BaiduCl

18 Dec 24, 2022
CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer

CycleTransGAN-EVC CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer Demo emotion CycleTransGAN CycleTransGAN Cycle

24 Dec 15, 2022
Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

Kevin Lu 210 Dec 28, 2022