Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection

Related tags

Deep LearningACSL
Overview

Adaptive Class Suppression Loss for Long-Tail Object Detection

This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection. [Paper]

Framework

Requirements

1. Environment:

The requirements are exactly the same as BalancedGroupSoftmax. We tested on the following settings:

  • python 3.7
  • cuda 10.0
  • pytorch 1.2.0
  • torchvision 0.4.0
  • mmcv 0.2.14
conda create -n mmdet python=3.7 -y
conda activate mmdet

pip install cython
pip install numpy
pip install torch
pip install torchvision
pip install pycocotools
pip install matplotlib
pip install terminaltables

# download the source code of mmcv 0.2.14 from https://github.com/open-mmlab/mmcv/tree/v0.2.14
cd mmcv-0.2.14
pip install -v -e .
cd ../

git clone https://github.com/CASIA-IVA-Lab/ACSL.git

cd ACSL/lvis-api/
python setup.py develop

cd ../
python setup.py develop

2. Data:

a. For dataset images:

# Make sure you are in dir ACSL

mkdir data
cd data
mkdir lvis
mkdir pretrained_models
mkdir download_models
  • If you already have COCO2017 dataset, it will be great. Link train2017 and val2017 folders under folder lvis.
  • If you do not have COCO2017 dataset, please download: COCO train set and COCO val set and unzip these files and mv them under folder lvis.

b. For dataset annotations:

c. For pretrained models:

Download the corresponding pre-trained models below.

  • To train baseline models, we need models trained on COCO to initialize. Please download the corresponding COCO models at mmdetection model zoo.

  • Move these model files to ./data/pretrained_models/

d. For download_models:

Download the trained baseline models and ACSL models from BaiduYun, code is 2jp3

  • To train ACSL models, we need corresponding baseline models trained on LVIS to initialize and fix all parameters except for the last FC layer.

  • Move these model files to ./data/download_models/

After all these operations, the folder data should be like this:

    data
    ├── lvis
    │   ├── lvis_v0.5_train.json
    │   ├── lvis_v0.5_val.json
    │   ├── train2017
    │   │   ├── 000000100582.jpg
    │   │   ├── 000000102411.jpg
    │   │   ├── ......
    │   └── val2017
    │       ├── 000000062808.jpg
    │       ├── 000000119038.jpg
    │       ├── ......
    └── pretrained_models
    │       ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
    │       ├── ......
    └── download_models
            ├── R50-baseline.pth
            ├── ......

Training

Note: Please make sure that you have prepared the pretrained_models and the download_models and they have been put to the path specified in ${CONIFG_FILE}.

Use the following commands to train a model.

# Single GPU
python tools/train.py ${CONFIG_FILE}

# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

All config files are under ./configs/.

  • ./configs/baselines: all baseline models.
  • ./configs/acsl: models for ACSL models.

For example, to train a ACSL model with Faster R-CNN R50-FPN:

# Single GPU
python tools/train.py configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py

# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py 8

Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)

Testing

Use the following commands to test a trained model.

# single gpu test
python tools/test_lvis.py \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test_lvis.sh \
 ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
  • $RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
  • $EVAL_METRICS: Items to be evaluated on the results. bbox for bounding box evaluation only. bbox segm for bounding box and mask evaluation.

For example (assume that you have finished the training of ACSL models.):

  • To evaluate the trained ACSL model with Faster R-CNN R50-FPN for object detection:
# single-gpu testing
python tools/test_lvis.py configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py \
 ./work_dirs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl/epoch_12.pth \
  --out acsl_val_result.pkl --eval bbox

# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl.py \
./work_dirs/acsl/faster_rcnn_r50_fpn_1x_lvis_tunefc_acsl/epoch_12.pth 8 \
--out acsl_val_result.pkl --eval bbox

Results and models

Please refer to our paper for more details.

Method Models bbox mAP Config file Pretrained Model Model
baseline R50-FPN 21.18 file COCO-R50 R50-baseline
ACSL R50-FPN 26.36 file R50-baseline R50-acsl
baseline R101-FPN 22.36 file COCO-R101 R101-baseline
ACSL R101-FPN 27.49 file R101-baseline R101-acsl
baseline X101-FPN 24.70 file COCO-X101 X101-baseline
ACSL X101-FPN 28.93 file X101-baseline X101-acsl
baseline Cascade-R101 25.14 file COCO-Cas-R101 Cas-R101-baseline
ACSL Cascade-R101 29.71 file Cas-R101-baseline Cas-R101-acsl
baseline Cascade-X101 27.14 file COCO-Cas-X101 Cas-X101-baseline
ACSL Cascade-X101 31.47 file Cas-X101-baseline Cas-X101-acsl

Important: The code of BaiduYun is 2jp3

Citation

@inproceedings{wang2021adaptive,
  title={Adaptive Class Suppression Loss for Long-Tail Object Detection},
  author={Wang, Tong and Zhu, Yousong and Zhao, Chaoyang and Zeng, Wei and Wang, Jinqiao and Tang, Ming},
  journal={CVPR},
  year={2021}
}

Credit

This code is largely based on BalancedGroupSoftmax and mmdetection v1.0.rc0 and LVIS API.

Owner
CASIA-IVA-Lab
Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences
CASIA-IVA-Lab
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 02, 2023
End-to-End Object Detection with Fully Convolutional Network

This project provides an implementation for "End-to-End Object Detection with Fully Convolutional Network" on PyTorch.

472 Dec 22, 2022
FreeSOLO for unsupervised instance segmentation, CVPR 2022

FreeSOLO: Learning to Segment Objects without Annotations This project hosts the code for implementing the FreeSOLO algorithm for unsupervised instanc

NVIDIA Research Projects 253 Jan 02, 2023
Jarvis Project is a basic virtual assistant that uses TensorFlow for learning.

Jarvis_proyect Jarvis Project is a basic virtual assistant that uses TensorFlow for learning. Latest version 0.1 Features: Good morning protocol Tell

Anze Kovac 3 Aug 31, 2022
Accelerated deep learning R&D

Accelerated deep learning R&D PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and

Catalyst-Team 3.1k Jan 06, 2023
An implementation of Deep Forest 2021.2.1.

Deep Forest (DF) 21 DF21 is an implementation of Deep Forest 2021.2.1. It is designed to have the following advantages: Powerful: Better accuracy than

LAMDA Group, Nanjing University 795 Jan 03, 2023
Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery"

SegSwap Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery" [PDF] [Project page] If our project

xshen 41 Dec 10, 2022
Text completion with Hugging Face and TensorFlow.js running on Node.js

Katana ML Text Completion 🤗 Description Runs with with Hugging Face DistilBERT and TensorFlow.js on Node.js distilbert-model - converter from Hugging

Katana ML 2 Nov 04, 2022
League of Legends Reinforcement Learning Environment (LoLRLE) multiple training scenarios using PPO.

League of Legends Reinforcement Learning Environment (LoLRLE) About This repo contains code to train an agent to play league of legends in a distribut

2 Aug 19, 2022
List of content farm sites like g.penzai.com.

内容农场网站清单 Google 中文搜索结果包含了相当一部分的内容农场式条目,比如「小 X 知识网」「小 X 百科网」。此种链接常会 302 重定向其主站,页面内容为自动生成,大量堆叠关键字,揉杂一些爬取到的内容,完全不具可读性和参考价值。 尤为过分的是,该类网站可能有成千上万个分身域名被 Goog

WDMPA 541 Jan 03, 2023
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 09, 2023
Easy way to add GoogleMaps to Flask applications. maintainer: @getcake

Flask Google Maps Easy to use Google Maps in your Flask application requires Jinja Flask A google api key get here Contribute To contribute with the p

Flask Extensions 611 Dec 05, 2022
PyTorch implementations of the paper: "DR.VIC: Decomposition and Reasoning for Video Individual Counting, CVPR, 2022"

DRNet for Video Indvidual Counting (CVPR 2022) Introduction This is the official PyTorch implementation of paper: DR.VIC: Decomposition and Reasoning

tao han 35 Nov 22, 2022
Fast Neural Style for Image Style Transform by Pytorch

FastNeuralStyle by Pytorch Fast Neural Style for Image Style Transform by Pytorch This is famous Fast Neural Style of Paper Perceptual Losses for Real

Bengxy 81 Sep 03, 2022
A curated list of automated deep learning (including neural architecture search and hyper-parameter optimization) resources.

Awesome AutoDL A curated list of automated deep learning related resources. Inspired by awesome-deep-vision, awesome-adversarial-machine-learning, awe

D-X-Y 2k Dec 30, 2022
PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Liyan 52 Jan 04, 2023
MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

OpenMMLab 899 Jan 02, 2023
Inkscape extensions for figure resizing and editing

Academic-Inkscape: Extensions for figure resizing and editing This repository contains several Inkscape extensions designed for editing plots. Scale P

192 Dec 26, 2022
A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.

A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.

224 Jan 04, 2023
A collection of models for image<->text generation in ACM MM 2021.

Bi-directional Image and Text Generation UMT-BITG (image & text generator) Unifying Multimodal Transformer for Bi-directional Image and Text Generatio

Multimedia Research 63 Oct 30, 2022