[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Related tags

Deep LearningSETR
Overview

SEgmentation TRansformers -- SETR

image

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, Li Zhang,
CVPR 2021

Installation

Our project is developed based on mmsegmentation. Please follow the official mmsegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

Main results

Cityscapes

Method Crop Size Batch size iteration set mIoU
SETR-Naive 768x768 8 40k val 77.37 model config
SETR-Naive 768x768 8 80k val 77.90 model config
SETR-MLA 768x768 8 40k val 76.65 model config
SETR-MLA 768x768 8 80k val 77.24 model config
SETR-PUP 768x768 8 40k val 78.39 model config
SETR-PUP 768x768 8 80k val 79.34 model config
SETR-Naive-DeiT 768x768 8 40k val 77.85 model config
SETR-Naive-DeiT 768x768 8 80k val 78.66 model config
SETR-MLA-DeiT 768x768 8 40k val 78.04 model config
SETR-MLA-DeiT 768x768 8 80k val 78.98 model config
SETR-PUP-DeiT 768x768 8 40k val 78.79 model config
SETR-PUP-DeiT 768x768 8 80k val 79.45 model config

ADE20K

Method Crop Size Batch size iteration set mIoU mIoU(ms+flip)
SETR-Naive 512x512 16 160k Val 48.06 48.80 model config
SETR-MLA 512x512 8 160k val 48.27 50.03 model config
SETR-MLA 512x512 16 160k val 48.64 50.28 model config
SETR-PUP 512x512 16 160k val 48.58 50.09 model config

Pascal Context

Method Crop Size Batch size iteration set mIoU mIoU(ms+flip)
SETR-Naive 480x480 16 80k val 52.89 53.61 model config
SETR-MLA 480x480 8 80k val 54.39 55.39 model config
SETR-MLA 480x480 16 80k val 54.87 55.83 model config
SETR-PUP 480x480 16 80k val 54.40 55.27 model config

Get Started

Train

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} 
# For example, train a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8

Single-scale testing

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU

Multi-scale testing

Use the config file ending in _MS.py in configs/SETR.

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU

Please see getting_started.md for the more basic usage of training and testing.

Reference

@inproceedings{SETR,
    title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, 
    author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
    booktitle={CVPR},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to previous open-sourced repo:
mmsegmentation
pytorch-image-models

Owner
Fudan Zhang Vision Group
Zhang Vision Group at the School of Data Science of the Fudan University, led by Professor Li Zhang
Fudan Zhang Vision Group
Scripts and a shader to get you started on setting up an exported Koikatsu character in Blender.

KK Blender Shader Pack A plugin and a shader to get you started with setting up an exported Koikatsu character in Blender. The plugin is a Blender add

166 Jan 01, 2023
A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Mixup: Beyond Empirical Risk Minimization in PyTorch This is an unofficial PyTorch implementation of mixup: Beyond Empirical Risk Minimization. The co

Harry Yang 121 Dec 17, 2022
Camera-caps - Examine the camera capabilities for V4l2 cameras

camera-caps This is a graphical user interface over the v4l2-ctl command line to

Jetsonhacks 25 Dec 26, 2022
TensorFlow 2 AI/ML library wrapper for openFrameworks

ofxTensorFlow2 This is an openFrameworks addon for the TensorFlow 2 ML (Machine Learning) library

Center for Art and Media Karlsruhe 96 Dec 31, 2022
Easy genetic ancestry predictions in Python

ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom

Kevin Arvai 38 Jan 02, 2023
The dataset of tweets pulling from Twitters with keyword: Hydroxychloroquine, location: US, Time: 2020

HCQ_Tweet_Dataset: FREE to Download. Keywords: HCQ, hydroxychloroquine, tweet, twitter, COVID-19 This dataset is associated with the paper "Understand

2 Mar 16, 2022
This repository contains the implementation of Deep Detail Enhancment for Any Garment proposed in Eurographics 2021

Deep-Detail-Enhancement-for-Any-Garment Introduction This repository contains the implementation of Deep Detail Enhancment for Any Garment proposed in

40 Dec 13, 2022
Hyper-parameter optimization for sklearn

hyperopt-sklearn Hyperopt-sklearn is Hyperopt-based model selection among machine learning algorithms in scikit-learn. See how to use hyperopt-sklearn

1.4k Jan 01, 2023
An open-access benchmark and toolbox for electricity price forecasting

epftoolbox The epftoolbox is the first open-access library for driving research in electricity price forecasting. Its main goal is to make available a

97 Dec 05, 2022
A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

Bozitao Zhong 77 Dec 22, 2022
This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transformer"

FlatTN This repository contains code accompanying the paper "An End-to-End Chinese Text Normalization Model based on Rule-Guided Flat-Lattice Transfor

THUHCSI 74 Nov 28, 2022
Here we present the implementation in TensorFlow of our work about liver lesion segmentation accepted in the Machine Learning 4 Health Workshop

Detection-aided liver lesion segmentation Here we present the implementation in TensorFlow of our work about liver lesion segmentation accepted in the

Image Processing Group - BarcelonaTECH - UPC 96 Oct 26, 2022
Discord bot for notifying on github events

Git-Observer Discord bot for notifying on github events ⚠️ This bot is meant to write messages to only one channel (implementing this for multiple pro

ilu_vatar_ 0 Apr 19, 2022
details on efforts to dump the Watermelon Games Paprium cart

Reminder, if you like these repos, fork them so they don't disappear https://github.com/ArcadeHustle/WatermelonPapriumDump/fork Big thanks to Fonzie f

Hustle Arcade 29 Dec 11, 2022
Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

MAUVE MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE

Krishna Pillutla 182 Jan 02, 2023
All materials of Cassandra Event, Udyam'22

Cassandra 2022 Workspace Workshop Materials Workshop-1 Workshop-2 Workshop-3 Workshop-4 Assignments Assignment-1 Assignment-2 Assignment-3 Resources P

36 Dec 31, 2022
Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Racing Line Optimization with PSO This repository contains a racing line optimization algorithm in python that uses Particle Swarm Optimization. Requi

Parsa Dahesh 6 Dec 14, 2022
Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization

Dynamic Stock Industrial Classification Use graph-based analysis to re-classify stocks and experiment different re-classification methodologies to imp

Sheng Yang 10 Dec 05, 2022
The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

RegSeg The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation" Paper: arxiv D block Decoder Setup Install the

Roland 61 Dec 27, 2022
Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'.

COTREC Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'. Requirements: Python 3.7, Pytorch 1.6.0 Best Hype

Xin Xia 42 Dec 09, 2022