[ICME 2021 Oral] CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Overview

This repository is the official PyTorch implementation of CORE-Text, and contains demo training and evaluation scripts.

Requirements

Training Demo

Base (Mask R-CNN)

To train Base (Mask R-CNN) on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/base.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_base

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

VRM

To train VRM on a single node with 4 GPUs, run:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

CONFIG=configs/icdar2017mlt/vrm.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_vrm

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

CORE

To train CORE (ours) on a single node with 4 GPUs, run the pre-training stage followed by the training stage:

#!/usr/bin/env bash

GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

# pre-training
CONFIG=configs/icdar2017mlt/core_pretrain.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core_pretrain

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0

# training
CONFIG=configs/icdar2017mlt/core.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core

$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
                                    --nnodes=1 --node_rank=0 --master_addr="localhost" \
                                    --master_port=$PORT \
                                    tools/train.py \
                                    $CONFIG \
                                    --no-validate \
                                    --launcher pytorch \
                                    --work-dir ${WORK_DIR} \
                                    --seed 0
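
The launch commands in this section differ only in CONFIG and WORK_DIR. As a convenience, they could be wrapped in a small helper such as the sketch below. Note that `train_stage` and the `DRY_RUN` variable are hypothetical names introduced here for illustration and are not part of the repository; the flags are the ones used in the commands above. The example runs in dry-run mode, so it only prints the commands it would launch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository): wraps the launch
# command used in this section so each stage only supplies its config file
# and work directory.

GPUS=${GPUS:-4}
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}

train_stage() {
    local config=$1 work_dir=$2
    # When DRY_RUN is set and non-empty, the command is echoed, not executed.
    ${DRY_RUN:+echo} "$PYTHON" -m torch.distributed.launch --nproc_per_node="$GPUS" \
        --nnodes=1 --node_rank=0 --master_addr="localhost" \
        --master_port="$PORT" \
        tools/train.py "$config" \
        --no-validate \
        --launcher pytorch \
        --work-dir "$work_dir" \
        --seed 0
}

# Print the two CORE stages; the second runs only if the first succeeds.
DRY_RUN=1 train_stage configs/icdar2017mlt/core_pretrain.py \
    work_dirs/mask_rcnn_r50_fpn_train_core_pretrain &&
DRY_RUN=1 train_stage configs/icdar2017mlt/core.py \
    work_dirs/mask_rcnn_r50_fpn_train_core
```

Dropping `DRY_RUN=1` from the last two calls would launch the actual training runs, assuming the script is executed from the repository root.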

Evaluation Demo

GPUS=4
PORT=${PORT:-29500}
CONFIG=path/to/config
CHECKPOINT=path/to/checkpoint

python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    ./tools/test.py $CONFIG $CHECKPOINT --launcher pytorch \
    --eval segm \
    --not-encode-mask \
    --eval-options "jsonfile_prefix=path/to/work_dir/results/eval" "gt_path=data/icdar2017mlt/icdar2017mlt_gt.zip"

Dataset Format

The dataset directory is structured as shown below. We provide the COCO-format labels (ICDAR2017_train.json and ICDAR2017_val.json) and the ground-truth zip file (icdar2017mlt_gt.zip) for training and evaluation.

data
└── icdar2017mlt
    ├── annotations
    │   ├── ICDAR2017_train.json
    │   └── ICDAR2017_val.json
    ├── icdar2017mlt_gt.zip
    └── image
        ├── train
        └── val
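
Before training, it may help to confirm that the layout above is in place. The following is a minimal sketch; `check_dataset` is a hypothetical helper name, not a script provided by the repository, and it only checks that the paths exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): reports which of the
# expected dataset paths are missing under the given root directory.

check_dataset() {
    local root=${1:-data/icdar2017mlt} missing=0 path
    for path in \
        annotations/ICDAR2017_train.json \
        annotations/ICDAR2017_val.json \
        icdar2017mlt_gt.zip \
        image/train \
        image/val; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path"
            missing=1
        fi
    done
    # Return 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Calling `check_dataset` from the repository root checks `data/icdar2017mlt`; a different root can be passed as the first argument.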

Results

Our model achieves the following performance on the ICDAR 2017 MLT validation set. Note that the results differ slightly (~0.1%) from those reported in the paper, because this code is a reimplementation based on the open-source mmdetection.

Method             Backbone  Training set          Test set            Hmean  Precision  Recall  Download
Base (Mask R-CNN)  ResNet50  ICDAR 2017 MLT Train  ICDAR 2017 MLT Val  0.800  0.828      0.773   model | log
VRM                ResNet50  ICDAR 2017 MLT Train  ICDAR 2017 MLT Val  0.812  0.853      0.774   model | log
CORE (ours)        ResNet50  ICDAR 2017 MLT Train  ICDAR 2017 MLT Val  0.821  0.872      0.777   model | log
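
As a sanity check on the table, Hmean here is assumed to be the usual harmonic mean of precision and recall, 2PR / (P + R). The sketch below recomputes it from the rounded values in the table; `hmean` is a hypothetical helper name. Because the inputs are already rounded to three decimals, the recomputed value can differ from the reported one in the last digit (for the CORE row it comes out as 0.822 rather than 0.821).

```shell
#!/usr/bin/env bash
# Hypothetical helper: harmonic mean of precision ($1) and recall ($2),
# printed with three decimals.

hmean() {
    awk -v p="$1" -v r="$2" 'BEGIN { printf "%.3f\n", 2 * p * r / (p + r) }'
}

hmean 0.828 0.773   # Base (Mask R-CNN): prints 0.800
hmean 0.853 0.774   # VRM: prints 0.812
```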

Citation

@inproceedings{9428457,
  author={Lin, Jingyang and Pan, Yingwei and Lai, Rongfeng and Yang, Xuehang and Chao, Hongyang and Yao, Ting},
  booktitle={2021 IEEE International Conference on Multimedia and Expo (ICME)},
  title={Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning},
  year={2021},
  pages={1-6},
  doi={10.1109/ICME51207.2021.9428457}
}
Owner
Jingyang Lin
Graduate student @ SYSU.