[CVPR22] Official codebase of Semantic Segmentation by Early Region Proxy.

Overview

RegionProxy

Figure 2. Performance vs. GFLOPs on ADE20K val split.

Semantic Segmentation by Early Region Proxy

Yifan Zhang, Bo Pang, Cewu Lu

CVPR 2022 (Poster) [arXiv]

Installation

Note: recommend using the exact version of the packages to avoid running issues.

  1. Install PyTorch 1.7.1 and torchvision 0.8.2 following the official guide.

  2. Install timm 0.4.12 and einops:

    pip install timm==0.4.12 einops
    
  3. This project depends on mmsegmentation 0.17 and mmcv 1.3.13, so you may follow its instructions to setup environment and prepare datasets.

Models

ADE20K

backbone Resolution FLOPs #params. mIoU mIoU (ms+flip) FPS download
ViT-Ti/16 512x512 3.9G 5.8M 42.1 43.1 38.9 [model]
ViT-S/16 512x512 15G 22M 47.6 48.5 32.1 [model]
R26+ViT-S/32 512x512 16G 36M 47.8 49.1 28.5 [model]
ViT-B/16 512x512 59G 87M 49.8 50.5 20.1 [model]
R50+ViT-L/32 640x640 82G 323M 51.0 51.7 12.7 [model]
ViT-L/16 640x640 326G 306M 52.9 53.4 6.6 [model]

Cityscapes

backbone Resolution FLOPs #params. mIoU mIoU (ms+flip) download
ViT-Ti/16 768x768 69G 6M 76.5 77.7 [model]
ViT-S/16 768x768 270G 23M 79.8 81.5 [model]
ViT-B/16 768x768 1064G 88M 81.0 82.2 [model]
ViT-L/16 768x768 - 307M 81.4 82.7 [model]

Evaluation

You may evaluate the model on single GPU by running:

python test.py \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

To evaluate on multiple GPUs, run:

python -m torch.distributed.launch --nproc_per_node 8 test.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt 
	--eval mIoU

You may add --aug-test to enable multi-scale + flip evaluation. The test.py script is mostly copy-pasted from mmsegmentation. Please refer to this link for more usage (e.g., visualization).

Training

The first step is to prepare the pre-trained weights. Following Segmenter, we use AugReg pre-trained weights on our tiny, small and large models, and we use DeiT pre-trained weights on our base models. Do following steps to prepare the pre-trained weights for model initialization:

  1. For DeiT weight, simply download from this link. For AugReg weights, first acquire the timm-style models:

    import timm
    m = timm.create_model('vit_tiny_patch16_384', pretrained=True)

    The full list of entries can be found here (vanilla ViTs) and here (hybrid models).

  2. Convert the timm models to mmsegmentation style using this script.

We train all models on 8 V100 GPUs. For example, to train RegProxy-Ti/16, run:

python -m torch.distributed.launch --nproc_per_node 8 train.py 
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--work-dir /path/to/workdir \
	--options model.pretrained=/path/to/pretrained/model

You may need to adjust data.samples_per_gpu if you plan to train on less GPUs. Please refer to this link for more training optioins.

Citation

@article{zhang2022semantic,
  title={Semantic Segmentation by Early Region Proxy},
  author={Zhang, Yifan and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2203.14043},
  year={2022}
}
Owner
Yifan
Yifan
RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

184 Jan 04, 2023
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 896 Jan 01, 2023
A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.

P-tuning A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''. How to use our code We have released the code

THUDM 562 Dec 27, 2022
A simple approach to emable dense segmentation with ViT.

Vision Transformer Segmentation Network This implementation of ViT in pytorch uses a super simple and straight-forward way of generating an output of

HReynaud 5 Jan 03, 2023
Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering (NAACL 2021)

Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering Abstract In open-domain question answering (QA), retrieve-and-read mec

Clova AI Research 34 Apr 13, 2022
Face Recognition and Emotion Detector Device

Face Recognition and Emotion Detector Device Orange PI 1 Python 3.10.0 + Django 3.2.9 Project's file explanation Django manage.py Django commands hand

BootyAss 2 Dec 21, 2021
BuildingNet: Learning to Label 3D Buildings

BuildingNet This is the implementation of the BuildingNet architecture described in this paper: Paper: BuildingNet: Learning to Label 3D Buildings Arx

16 Nov 07, 2022
Code for "Modeling Indirect Illumination for Inverse Rendering", CVPR 2022

Modeling Indirect Illumination for Inverse Rendering Project Page | Paper | Data Preparation Set up the python environment conda create -n invrender p

ZJU3DV 116 Jan 03, 2023
Implementation detail for paper "Multi-level colonoscopy malignant tissue detection with adversarial CAC-UNet"

Multi-level-colonoscopy-malignant-tissue-detection-with-adversarial-CAC-UNet Implementation detail for our paper "Multi-level colonoscopy malignant ti

CVSM Group - email: <a href=[email protected]"> 84 Nov 22, 2022
ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

This is the project page for the paper: ISTR: End-to-End Instance Segmentation via Transformers, Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wa

Jie Hu 182 Dec 19, 2022
Code for the ICCV2021 paper "Personalized Image Semantic Segmentation"

PSS: Personalized Image Semantic Segmentation Paper PSS: Personalized Image Semantic Segmentation Yu Zhang, Chang-Bin Zhang, Peng-Tao Jiang, Ming-Ming

张宇 15 Jul 09, 2022
JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation

JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation This the repository for this paper. Find extensions of this w

Zhuoyuan Mao 14 Oct 26, 2022
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

DiffWave DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via itera

LMNT 498 Jan 03, 2023
计算机视觉中用到的注意力模块和其他即插即用模块PyTorch Implementation Collection of Attention Module and Plug&Play Module

PyTorch实现多种计算机视觉中网络设计中用到的Attention机制,还收集了一些即插即用模块。由于能力有限精力有限,可能很多模块并没有包括进来,有任何的建议或者改进,可以提交issue或者进行PR。

PJDong 599 Dec 23, 2022
Implementation of MeMOT - Multi-Object Tracking with Memory - in Pytorch

MeMOT - Pytorch (wip) Implementation of MeMOT - Multi-Object Tracking with Memory - in Pytorch. This paper is just one in a line of work, but importan

Phil Wang 15 May 09, 2022
[NeurIPS 2021] "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators"

G-PATE This is the official code base for our NeurIPS 2021 paper: "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of T

AI Secure 14 Oct 12, 2022
Unofficial implementation of One-Shot Free-View Neural Talking Head Synthesis

face-vid2vid Usage Dataset Preparation cd datasets wget https://yt-dl.org/downloads/latest/youtube-dl -O youtube-dl chmod a+rx youtube-dl python load_

worstcoder 68 Dec 30, 2022
Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

UVA Computer Vision 87 Jan 03, 2023
Style-based Neural Drum Synthesis with GAN inversion

Style-based Drum Synthesis with GAN Inversion Demo TensorFlow implementation of a style-based version of the adversarial drum synth (ADS) from the pap

Sound and Music Analysis (SoMA) Group 29 Nov 19, 2022
This is the code related to "Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation" (ICCV 2021).

Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation This is the code relat

39 Sep 23, 2022