Semantic Segmentation in Pytorch

Related tags

Deep Learningsemseg
Overview

PyTorch Semantic Segmentation

Introduction

This repository is a PyTorch implementation for semantic segmentation / scene parsing. The code is easy to use for training and testing on various datasets. The codebase mainly uses ResNet50/101/152 as backbone and can be easily adapted to other basic classification structures. Implemented networks including PSPNet and PSANet, which ranked 1st places in ImageNet Scene Parsing Challenge 2016 @ECCV16, LSUN Semantic Segmentation Challenge 2017 @CVPR17 and WAD Drivable Area Segmentation Challenge 2018 @CVPR18. Sample experimented datasets are ADE20K, PASCAL VOC 2012 and Cityscapes.

Update

  • 2020.05.15: Branch master, use official nn.SyncBatchNorm, only multiprocessing training is supported, tested with pytorch 1.4.0.
  • 2019.05.29: Branch 1.0.0, both multithreading training (nn.DataParallel) and multiprocessing training (nn.parallel.DistributedDataParallel) (recommended) are supported. And the later one is much faster. Use syncbn from EncNet and apex, tested with pytorch 1.0.0.

Usage

  1. Highlight:

  2. Requirement:

    • Hardware: 4-8 GPUs (better with >=11G GPU memory)
    • Software: PyTorch>=1.1.0, Python3, tensorboardX,
  3. Clone the repository:

    git clone https://github.com/hszhao/semseg.git
  4. Train:

    • Download related datasets and symlink the paths to them as follows (you can alternatively modify the relevant paths specified in folder config):

      cd semseg
      mkdir -p dataset
      ln -s /path_to_ade20k_dataset dataset/ade20k
      
    • Download ImageNet pre-trained models and put them under folder initmodel for weight initialization. Remember to use the right dataset format detailed in FAQ.md.

    • Specify the gpu used in config then do training:

      sh tool/train.sh ade20k pspnet50
    • If you are using SLURM for nodes manager, uncomment lines in train.sh and then do training:

      sbatch tool/train.sh ade20k pspnet50
  5. Test:

    • Download trained segmentation models and put them under folder specified in config or modify the specified paths.

    • For full testing (get listed performance):

      sh tool/test.sh ade20k pspnet50
    • Quick demo on one image:

      PYTHONPATH=./ python tool/demo.py --config=config/ade20k/ade20k_pspnet50.yaml --image=figure/demo/ADE_val_00001515.jpg TEST.scales '[1.0]'
  6. Visualization: tensorboardX incorporated for better visualization.

    tensorboard --logdir=exp/ade20k
  7. Other:

    • Resources: GoogleDrive LINK contains shared models, visual predictions and data lists.
    • Models: ImageNet pre-trained models and trained segmentation models can be accessed. Note that our ImageNet pretrained models are slightly different from original ResNet implementation in the beginning part.
    • Predictions: Visual predictions of several models can be accessed.
    • Datasets: attributes (names and colors) are in folder dataset and some sample lists can be accessed.
    • Some FAQs: FAQ.md.
    • Former video predictions: high accuracy -- PSPNet, PSANet; high efficiency -- ICNet.

Performance

Description: mIoU/mAcc/aAcc stands for mean IoU, mean accuracy of each class and all pixel accuracy respectively. ss denotes single scale testing and ms indicates multi-scale testing. Training time is measured on a sever with 8 GeForce RTX 2080 Ti. General parameters cross different datasets are listed below:

  • Train Parameters: sync_bn(True), scale_min(0.5), scale_max(2.0), rotate_min(-10), rotate_max(10), zoom_factor(8), ignore_label(255), aux_weight(0.4), batch_size(16), base_lr(1e-2), power(0.9), momentum(0.9), weight_decay(1e-4).
  • Test Parameters: ignore_label(255), scales(single: [1.0], multiple: [0.5 0.75 1.0 1.25 1.5 1.75]).
  1. ADE20K: Train Parameters: classes(150), train_h(473/465-PSP/A), train_w(473/465-PSP/A), epochs(100). Test Parameters: classes(150), test_h(473/465-PSP/A), test_w(473/465-PSP/A), base_size(512).

    • Setting: train on train (20210 images) set and test on val (2000 images) set.
    Network mIoU/mAcc/aAcc(ss) mIoU/mAcc/pAcc(ms) Training Time
    PSPNet50 0.4189/0.5227/0.8039. 0.4284/0.5266/0.8106. 14h
    PSANet50 0.4229/0.5307/0.8032. 0.4305/0.5312/0.8101. 14h
    PSPNet101 0.4310/0.5375/0.8107. 0.4415/0.5426/0.8172. 20h
    PSANet101 0.4337/0.5385/0.8102. 0.4414/0.5392/0.8170. 20h
  2. PSACAL VOC 2012: Train Parameters: classes(21), train_h(473/465-PSP/A), train_w(473/465-PSP/A), epochs(50). Test Parameters: classes(21), test_h(473/465-PSP/A), test_w(473/465-PSP/A), base_size(512).

    • Setting: train on train_aug (10582 images) set and test on val (1449 images) set.
    Network mIoU/mAcc/aAcc(ss) mIoU/mAcc/pAcc(ms) Training Time
    PSPNet50 0.7705/0.8513/0.9489. 0.7802/0.8580/0.9513. 3.3h
    PSANet50 0.7725/0.8569/0.9491. 0.7787/0.8606/0.9508. 3.3h
    PSPNet101 0.7907/0.8636/0.9534. 0.7963/0.8677/0.9550. 5h
    PSANet101 0.7870/0.8642/0.9528. 0.7966/0.8696/0.9549. 5h
  3. Cityscapes: Train Parameters: classes(19), train_h(713/709-PSP/A), train_w(713/709-PSP/A), epochs(200). Test Parameters: classes(19), test_h(713/709-PSP/A), test_w(713/709-PSP/A), base_size(2048).

    • Setting: train on fine_train (2975 images) set and test on fine_val (500 images) set.
    Network mIoU/mAcc/aAcc(ss) mIoU/mAcc/pAcc(ms) Training Time
    PSPNet50 0.7730/0.8431/0.9597. 0.7838/0.8486/0.9617. 7h
    PSANet50 0.7745/0.8461/0.9600. 0.7818/0.8487/0.9622. 7.5h
    PSPNet101 0.7863/0.8577/0.9614. 0.7929/0.8591/0.9638. 10h
    PSANet101 0.7842/0.8599/0.9621. 0.7940/0.8631/0.9644. 10.5h

Citation

If you find the code or trained models useful, please consider citing:

@misc{semseg2019,
  author={Zhao, Hengshuang},
  title={semseg},
  howpublished={\url{https://github.com/hszhao/semseg}},
  year={2019}
}
@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}
@inproceedings{zhao2018psanet,
  title={{PSANet}: Point-wise Spatial Attention Network for Scene Parsing},
  author={Zhao, Hengshuang and Zhang, Yi and Liu, Shu and Shi, Jianping and Loy, Chen Change and Lin, Dahua and Jia, Jiaya},
  booktitle={ECCV},
  year={2018}
}

Question

Some FAQ.md collected. You are welcome to send pull requests or give some advices. Contact information: hengshuangzhao at gmail.com.

Owner
Hengshuang Zhao
Hengshuang Zhao
Open standard for machine learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides

Open Neural Network Exchange 13.9k Dec 30, 2022
Automatically measure the facial Width-To-Height ratio and get facial analysis results provided by Microsoft Azure

fwhr-calc-website This project is to automatically measure the facial Width-To-Height ratio and get facial analysis results provided by Microsoft Azur

SoohyunPark 1 Feb 07, 2022
From a body shape, infer the anatomic skeleton.

OSSO: Obtaining Skeletal Shape from Outside (CVPR 2022) This repository contains the official implementation of the skeleton inference from: OSSO: Obt

Marilyn Keller 166 Dec 28, 2022
Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

ASEGAN: Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder 中文版简介 Readme with English Version 介绍 基于SEGAN模型的改进版本,使用自主设计的非

Nitin 53 Nov 17, 2022
Pre-training of Graph Augmented Transformers for Medication Recommendation

G-Bert Pre-training of Graph Augmented Transformers for Medication Recommendation Intro G-Bert combined the power of Graph Neural Networks and BERT (B

101 Dec 27, 2022
Automatic caption evaluation metric based on typicality analysis.

SeMantic and linguistic UndeRstanding Fusion (SMURF) Automatic caption evaluation metric described in the paper "SMURF: SeMantic and linguistic UndeRs

Joshua Feinglass 6 Jan 09, 2022
Repo for flood prediction using LSTMs and HAND

Abstract Every year, floods cause billions of dollars’ worth of damages to life, crops, and property. With a proper early flood warning system in plac

1 Oct 27, 2021
Dogs classification with Deep Metric Learning using some popular losses

Tsinghua Dogs classification with Deep Metric Learning 1. Introduction Tsinghua Dogs dataset Tsinghua Dogs is a fine-grained classification dataset fo

QuocThangNguyen 45 Nov 09, 2022
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 832 Jan 08, 2023
Pytorch implementation of the paper "Enhancing Content Preservation in Text Style Transfer Using Reverse Attention and Conditional Layer Normalization"

Pytorch implementation of the paper "Enhancing Content Preservation in Text Style Transfer Using Reverse Attention and Conditional Layer Normalization"

Dongkyu Lee 4 Sep 18, 2022
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Salesforce 1.3k Dec 31, 2022
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
Implementation of paper "Graph Condensation for Graph Neural Networks"

GCond A PyTorch implementation of paper "Graph Condensation for Graph Neural Networks" Code will be released soon. Stay tuned :) Abstract We propose a

Wei Jin 66 Dec 04, 2022
Jittor implementation of Recursive-NeRF: An Efficient and Dynamically Growing NeRF

Recursive-NeRF: An Efficient and Dynamically Growing NeRF This is a Jittor implementation of Recursive-NeRF: An Efficient and Dynamically Growing NeRF

33 Nov 30, 2022
Official PyTorch code for "BAM: Bottleneck Attention Module (BMVC2018)" and "CBAM: Convolutional Block Attention Module (ECCV2018)"

BAM and CBAM Official PyTorch code for "BAM: Bottleneck Attention Module (BMVC2018)" and "CBAM: Convolutional Block Attention Module (ECCV2018)" Updat

Jongchan Park 1.7k Jan 01, 2023
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

deepface Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid

Sefik Ilkin Serengil 5.2k Jan 02, 2023
High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices.

Anakin2.0 Welcome to the Anakin GitHub. Anakin is a cross-platform, high-performance inference engine, which is originally developed by Baidu engineer

514 Dec 28, 2022
Unofficial PyTorch Implementation of "DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features"

Pytorch Implementation of Deep Orthogonal Fusion of Local and Global Features (DOLG) This is the unofficial PyTorch Implementation of "DOLG: Single-St

DK 96 Jan 06, 2023
Image Processing, Image Smoothing, Edge Detection and Transforms

opevcvdl-hw1 This project uses openCV and Qt to achieve the requirements. Version Python 3.7 opencv-contrib-python 3.4.2.17 Matplotlib 3.1.1 pyqt5 5.1

Kenny Cheng 3 Aug 17, 2022
Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX.

snc4onnx Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools 1.

Katsuya Hyodo 8 Oct 13, 2022