[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Last update: Jan 05, 2023

Overview

SEgmentation TRansformers -- SETR

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, Li Zhang,
CVPR 2021

Installation

This project is built on mmsegmentation. Follow the official mmsegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

Main results

Cityscapes

Method	Crop size	Batch size	Iterations	Set	mIoU	Links
SETR-Naive	768x768	8	40k	val	77.37	model config
SETR-Naive	768x768	8	80k	val	77.90	model config
SETR-MLA	768x768	8	40k	val	76.65	model config
SETR-MLA	768x768	8	80k	val	77.24	model config
SETR-PUP	768x768	8	40k	val	78.39	model config
SETR-PUP	768x768	8	80k	val	79.34	model config
SETR-Naive-DeiT	768x768	8	40k	val	77.85	model config
SETR-Naive-DeiT	768x768	8	80k	val	78.66	model config
SETR-MLA-DeiT	768x768	8	40k	val	78.04	model config
SETR-MLA-DeiT	768x768	8	80k	val	78.98	model config
SETR-PUP-DeiT	768x768	8	40k	val	78.79	model config
SETR-PUP-DeiT	768x768	8	80k	val	79.45	model config
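For reference, mIoU averages the per-class intersection-over-union. The following is a minimal NumPy sketch of the metric, not the evaluation code used to produce the numbers above. The function name `mean_iou` and the `ignore_index=255` default are assumptions for illustration. It scores a single pair of label maps, whereas a dataset-level evaluation would typically accumulate the confusion matrix over all images before dividing.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes present in pred or target.

    Illustrative helper (assumed name and defaults), not part of this repo.
    """
    # Cast to int64 so target * num_classes cannot overflow uint8 label maps.
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    keep = target != ignore_index            # drop unlabeled pixels
    pred, target = pred[keep], target[keep]
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0                        # skip classes absent from both maps
    return float((intersection[valid] / union[valid]).mean())

# Toy 2x2 example with two classes and one ignored pixel.
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 255]])
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU: {100 * score:.2f}")
```

The tables report the value as a percentage, hence the factor of 100 in the print statement.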

ADE20K

Method	Crop size	Batch size	Iterations	Set	mIoU	mIoU (ms+flip)	Links
SETR-Naive	512x512	16	160k	val	48.06	48.80	model config
SETR-MLA	512x512	8	160k	val	48.27	50.03	model config
SETR-MLA	512x512	16	160k	val	48.64	50.28	model config
SETR-PUP	512x512	16	160k	val	48.58	50.09	model config
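To read the two mIoU columns side by side, the short sketch below computes the gain from multi-scale and flip testing for each ADE20K row. The numbers are copied from the table above; the dictionary layout and variable names are only illustrative.

```python
# (method, batch size) -> (single-scale mIoU, multi-scale + flip mIoU),
# copied from the ADE20K table.
ade20k = {
    ("SETR-Naive", 16): (48.06, 48.80),
    ("SETR-MLA", 8): (48.27, 50.03),
    ("SETR-MLA", 16): (48.64, 50.28),
    ("SETR-PUP", 16): (48.58, 50.09),
}

# Gain in mIoU points from multi-scale + flip testing.
gains = {key: round(ms - ss, 2) for key, (ss, ms) in ade20k.items()}
largest = max(gains, key=gains.get)

for (method, batch_size), gain in gains.items():
    print(f"{method} (batch size {batch_size}): +{gain:.2f}")
```

The gains come out to 0.74, 1.76, 1.64 and 1.51 points respectively, so the MLA and PUP variants benefit more from test-time augmentation than SETR-Naive on this dataset.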

Pascal Context

Method	Crop size	Batch size	Iterations	Set	mIoU	mIoU (ms+flip)	Links
SETR-Naive	480x480	16	80k	val	52.89	53.61	model config
SETR-MLA	480x480	8	80k	val	54.39	55.39	model config
SETR-MLA	480x480	16	80k	val	54.87	55.83	model config
SETR-PUP	480x480	16	80k	val	54.40	55.27	model config

Get Started

Train

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
# For example, train SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8
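When fewer GPUs are available, the per-GPU batch size has to grow to keep the total batch size listed in the tables. Below is a small sketch of that arithmetic, assuming the "Batch size" column is the total across all GPUs (an assumption; the tables do not say so explicitly). The helper `samples_per_gpu` is illustrative and is named after the mmsegmentation data setting its result would feed.

```python
def samples_per_gpu(total_batch_size: int, gpu_num: int) -> int:
    """Per-GPU batch size needed to reach a given total batch size.

    Assumes total batch size = gpu_num * samples per GPU (illustrative helper).
    """
    if gpu_num <= 0:
        raise ValueError("gpu_num must be positive")
    if total_batch_size % gpu_num != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch_size // gpu_num

# Batch size 8 on 8 GPUs versus the same total on 4 GPUs.
print(samples_per_gpu(8, 8))
print(samples_per_gpu(8, 4))
```

Under that assumption, a `bs_8` config launched on 4 GPUs would need its per-GPU batch size doubled to match the reported setting.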

Single-scale testing

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRICS}]
# For example, test SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU

Multi-scale testing

Use the config file ending in _MS.py in configs/SETR.

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRICS}]
# For example, run multi-scale testing of SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU
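Conceptually, multi-scale and flip testing (the "ms+flip" column in the tables) runs the model on several rescaled and mirrored copies of the input and averages the class scores. The sketch below illustrates the idea with NumPy only; it is not the code path used by the `_MS.py` configs. The `predict` callable, the default `ratios`, and the nearest-neighbour resize are all assumptions made to keep the example self-contained.

```python
import numpy as np

def resize_nearest(x, height, width):
    """Nearest-neighbour resize of an array shaped (..., H, W)."""
    rows = np.arange(height) * x.shape[-2] // height
    cols = np.arange(width) * x.shape[-1] // width
    return x[..., rows[:, None], cols[None, :]]

def multi_scale_flip_predict(image, predict, ratios=(0.5, 1.0, 1.5)):
    """Average class scores over rescaled and horizontally flipped inputs.

    image:   array shaped (C, H, W)
    predict: hypothetical callable mapping (C, h, w) to (num_classes, h, w) scores
    ratios:  illustrative scale factors, not the values from the repo configs
    """
    _, height, width = image.shape
    total = None
    for ratio in ratios:
        h = max(1, round(height * ratio))
        w = max(1, round(width * ratio))
        scaled = resize_nearest(image, h, w)
        for flip in (False, True):
            inp = scaled[..., ::-1] if flip else scaled
            scores = predict(inp)
            if flip:
                scores = scores[..., ::-1]   # undo the flip before averaging
            scores = resize_nearest(scores, height, width)
            total = scores if total is None else total + scores
    # Average over all scale/flip combinations, then pick the best class.
    return (total / (2 * len(ratios))).argmax(axis=0)

# Toy "model": class 1 wherever the single input channel is 1, else class 0.
toy_predict = lambda x: np.stack([1.0 - x[0], x[0]])
image = np.zeros((1, 4, 4))
image[..., 2:] = 1.0
labels = multi_scale_flip_predict(image, toy_predict)
print(labels)
```

Averaging scores rather than hard labels lets predictions that are confident at one scale outweigh uncertain ones at another.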

See getting_started.md for basic training and testing usage.

Reference

@inproceedings{SETR,
    title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, 
    author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
    booktitle={CVPR},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to these open-source repositories:
mmsegmentation
pytorch-image-models

Owner

Fudan Zhang Vision Group
