The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

Overview

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection

Updates | Introduction | Results | Usage | Citation | Acknowledgment

This is the repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection". I3CL with ViTAEv2, ResNet50 and ResNet50 w/ RegionCL backbone are included.


Updates

[2022/04/13] Publish links of training datasets.

[2022/04/11] Add SSL training code for this implementation.

[2022/04/09] The training code for ICDAR2019 ArT dataset is uploaded. Private github repo temporarily.

Other applications of ViTAE Transformer: Image Classification | Object Detection | Sementic Segmentation | Animal Pose Estimation | Matting | Remote Sensing

Introduction

Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues, i.e., 1) fracture detections at the gaps in a text instance; and 2) inaccurate detections of arbitrary-shaped text instances with diverse background context. To address these issues, we propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL). Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields, which is able to collaboratively learn better character and gap feature representations at local and long ranges inside a text instance. To address the second issue, we devise an instance-based transformer module to exploit the dependencies between different text instances and a global context module to exploit the semantic context from the shared background, which are able to collaboratively learn more discriminative text feature representation. In this way, I3CL can effectively exploit the intra- and inter-instance dependencies together in a unified end-to-end trainable framework. Besides, to make full use of the unlabeled data, we design an effective semi-supervised learning method to leverage the pseudo labels via an ensemble strategy. Without bells and whistles, experimental results show that the proposed I3CL sets new state-of-the-art results on three challenging public benchmarks, i.e., an F-measure of 77.5% on ArT, 86.9% on Total-Text, and 86.4% on CTW-1500. Notably, our I3CL with the ResNeSt-101 backbone ranked the 1st place on the ArT leaderboard.

image

Results

Example results from paper.

image

Evaluation results of I3CL with different backbones on ArT. Note that: (1) I3CL with ViTAE only adopts one training stage with LSVT+MLT19+ArT training datasets in this repo. ResNet series adopt three training stages, i.e, pre-train on SynthText, mix-train on ReCTS+RCTW+LSVT+MLT19+ArT and lastly finetune on LSVT+MLT19+ArT. (2) Origin implementation of ResNet series is based on Detectron2. The results and model links of ResNet-50 will be updated soon in this implementation.

Backbone Model Link Training Data Recall Precision F-measure

ViTAEv2-S
[this repo]

OneDrive/
百度网盘 (pw:w754)

LSVT,MLT19,ArT 75.4 82.8 78.9

ResNet-50
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 71.3 82.7 76.6

ResNet-50 w/ RegionCL(finetuning)
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 72.6 81.9 77.0

ResNet-50 w/ RegionCL(w/o finetuning)
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 73.5 81.6 77.3

ResNeXt-101
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 74.1 85.5 79.4

ResNeSt-101
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 75.1 86.3 80.3

ResNeXt-151
[paper]

- SynthText,ReCTS,RCTW,LSVT,MLT19,ArT 74.9 86.0 80.1

Usage

Install

Prerequisites:

  • Linux (macOS and Windows are not tested)
  • Python >= 3.6
  • Pytorch >= 1.8.1 (For ViTAE implementation). Please make sure your compilation CUDA version and runtime CUDA version match.
  • GCC >= 5
  • MMCV (We use mmcv-full==1.4.3)
  1. Create a conda virtual environment and activate it. Note that this implementation is based on mmdetection 2.20.0 version.

  2. Install Pytorch and torchvision following official instructions.

  3. Install mmcv-full and timm. Please refer to mmcv to install the proper version. For example:

    pip install mmcv-full==1.4.3 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
    pip install timm
    
  4. Clone this repository and then install it:

    git clone https://github.com/ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection.git
    cd ViTAE-Transformer-Scene-Text-Detection
    pip install -r requirements/build.txt
    pip install -r requirements/runtime.txt
    pip install -v -e .
    

Preparation

Model:

Data

  • Coco format training datasets are utilized. Some offline augmented ArT training datasets are used. lsvt-test is only used to train SSL(Semi-Supervised Learning) model in paper. Files named train_lossweight.json are the provided pseudo-label for SSL training. You can download correspoding datasets in config file from here and put them in data/:

    Dataset

    Link
    (OneDrive)

    Link
    (Baidu Wangpan百度网盘)

    art Link Link (pw:etif)
    art_light Link Link (pw:mzrk)
    art_noise Link Link (pw:scxi)
    art_sig Link Link (pw:cdk8)
    lsvt Link Link (pw:wly0)
    lsvt_test Link Link (pw:8ha3)
    icdar2019_mlt Link Link (pw:hmnj)
    rctw Link Link (pw:ngge)
    rects Link Link (pw:y00o)

    The file structure should look like:

    |- data
        |- art
        |   |- train_images
        |   |    |- *.jpg
        |   |- test_images
        |   |    |- *.jpg
        |   |- train.json
        |   |- train_lossweight.json
        |- art_light
        |   |- train_images
        |   |    |- *.jpg
        |   |- train.json
        |   |- train_lossweight.json
        ......
        |- lsvt
        |   |- train_images1
        |   |    |- *.jpg
        |   |- train_images2
        |   |    |- *.jpg
        |   |- train1.json
        |   |- train1_lossweight.json
        |   |- train2.json
        |   |- train2_lossweight.json
        |- lsvt_test
        |   |- train_images
        |   |    |- *.jpg
        |   |- train_lossweight.json
        ......
    
    

Training

  • Distributed training with 4GPUs for ViTAE backbone:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_vitae_fpn/i3cl_vitae_fpn_ms_train.py --launcher pytorch --work-dir ./out_dir/${your_dir}
  • Distributed training with 4GPUs for ResNet50 backbone:

stage1:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_pretrain.py --launcher pytorch --work-dir ./out_dir/art_r50_pretrain/

stage2:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_mixtrain.py --launcher pytorch --work-dir ./out_dir/art_r50_mixtrain/

stage3:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_finetune.py --launcher pytorch --work-dir ./out_dir/art_r50_finetune/
  • Distributed training with 4GPUs for ResNet50 w/ RegionCL backbone:

stage1:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_pretrain.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_pretrain/

stage2:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_mixtrain.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_mixtrain/

stage3:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_finetune.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_finetune/

Note:

  • If the GPU memory is limited during training I3CL ViTAE backbone, please adjust img_scale in configuration file. The maximum scale set to (800, 1333) is proper for V100(16G) while there is little effect on the performance actually. Please change the training scale according to your condition.

Inference

For example, use our trained I3CL model to get inference results on ICDAR2019 ArT test set with visualization images, txt format records and the json file for testing submission, please run:

python demo/art_demo.py --checkpoint pretrained_model/I3CL/vitae_epoch_12.pth --score-thr 0.45 --json_file art_submission.json

Note:

  • Upload the saved json file to ICDAR2019-ArT evaluation website for Recall, Precision and F1 evaluation results. Change the path for saving visualizations and txt files if needed.

Citation

This project is for research purpose only.

If you are interested in our work, please consider citing our work. Arxiv

Please post issues to let us know if you encounter any problems.

Acknowledgement

Thanks for mmdetection.

Official PyTorch implementation of "Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning" (ICCV2021 Oral)

MeTAL - Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning (ICCV2021 Oral) Sungyong Baik, Janghoon Choi, Heewon Kim, Dohee Cho, Jaes

Sungyong Baik 44 Dec 29, 2022
A more easy-to-use implementation of KPConv based on PyTorch.

A more easy-to-use implementation of KPConv This repo contains a more easy-to-use implementation of KPConv based on PyTorch. Introduction KPConv is a

Zheng Qin 36 Dec 29, 2022
Bounding Wasserstein distance with couplings

BoundWasserstein These scripts reproduce the results of the article Bounding Wasserstein distance with couplings by Niloy Biswas and Lester Mackey. ar

Niloy Biswas 1 Jan 11, 2022
MacroTools provides a library of tools for working with Julia code and expressions.

MacroTools.jl MacroTools provides a library of tools for working with Julia code and expressions. This includes a powerful template-matching system an

FluxML 278 Dec 11, 2022
A simple Python configuration file operator.

A simple Python configuration file operator This project provides a common way to read configurations using config42. Installation It is possible to i

Scott Lau 2 Nov 08, 2021
A fast MoE impl for PyTorch

An easy-to-use and efficient system to support the Mixture of Experts (MoE) model for PyTorch.

Rick Ho 873 Jan 09, 2023
Official repository for CVPR21 paper "Deep Stable Learning for Out-Of-Distribution Generalization".

StableNet StableNet is a deep stable learning method for out-of-distribution generalization. This is the official repo for CVPR21 paper "Deep Stable L

120 Dec 28, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 42 Nov 26, 2022
Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding

🍐 quince Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding 🍐 Installation $ git clone

Andrew Jesson 19 Jun 23, 2022
Implementation for Curriculum DeepSDF

Curriculum-DeepSDF This repository is an implementation for Curriculum DeepSDF. Full paper is available here. Preparation Please follow original setti

Haidong Zhu 69 Dec 29, 2022
This is the pytorch implementation of the paper - Axiomatic Attribution for Deep Networks.

Integrated Gradients This is the pytorch implementation of "Axiomatic Attribution for Deep Networks". The original tensorflow version could be found h

Tianhong Dai 150 Dec 23, 2022
In this project we investigate the performance of the SetCon model on realistic video footage. Therefore, we implemented the model in PyTorch and tested the model on two example videos.

Contrastive Learning of Object Representations Supervisor: Prof. Dr. Gemma Roig Institutions: Goethe University CVAI - Computational Vision & Artifici

Dirk Neuhäuser 6 Dec 08, 2022
Consensus score for tripadvisor

ContripScore ContripScore is essentially a score that combines an Internet platform rating and a consensus rating from sentiment analysis (For instanc

Pepe 1 Jan 13, 2022
Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices, ACM Multimedia 2021

Codes for ECBSR Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Xindong Zhang, Hui Zeng, Lei Zhang ACM Multimedia 202

xindong zhang 236 Dec 26, 2022
Materials for my scikit-learn tutorial

Scikit-learn Tutorial Jake VanderPlas email: [email protected] twitter: @jakevdp gith

Jake Vanderplas 1.6k Dec 30, 2022
TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022)

TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022) Ziang Cao and Ziyuan Huang and Liang Pan and Shiwei Zhang and Ziwei Liu and Changhong Fu In

Intelligent Vision for Robotics in Complex Environment 100 Dec 19, 2022
SemEval2022 Patronizing and Condescending Language (PCL) Detection

SemEval2022 Patronizing and Condescending Language (PCL) Detection This task is from SemEval 2022. What is Patronizing and Condescending Language (PCL

Daniel Saeedi 0 Aug 05, 2022
A hifiasm fork for metagenome assembly using Hifi reads.

hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio Hifi reads.

44 Jul 10, 2022
Our solution for SSN Invente 2021's Hackathon

Our solution for SSN Invente 2021's Hackathon. To help maitain godowns in a pristine and safe condition using raspberry pi.

1 Jan 12, 2022
Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks.

Heterogeneous Graph Benchmark Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks. Roadmap We organize our repo by task, and on

THUDM 176 Dec 17, 2022