ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Overview

This repository contains the source code of our paper, ESPNet (accepted for publication in ECCV'18).

Sample results

Check our project page for more qualitative results (videos).

Click on the sample image below to view the segmentation results on YouTube.

Structure of this repository

This repository is organized as:

  • train This directory contains the source code for training the ESPNet-C and ESPNet models.
  • test This directory contains the source code for evaluating our model on RGB Images.
  • pretrained This directory contains the models pre-trained on the Cityscapes dataset.
    • encoder This directory contains the pretrained ESPNet-C models
    • decoder This directory contains the pretrained ESPNet models

Performance on the Cityscapes dataset

Our ESPNet model achieves a class-wise mIOU of 60.336 and a category-wise mIOU of 82.178 on the Cityscapes test set, and it runs at:

  • 112 fps on an NVIDIA TitanX (30 fps faster than ENet)
  • 9 fps on an NVIDIA TX2

With the same number of parameters as ENet, our model is 2% more accurate.

Performance on the CamVid dataset

Our model achieves an mIOU of 55.64 on the CamVid test set. We used the dataset splits (train/val/test) provided here. We trained the models at a resolution of 480x360. For a comparison with other models, see the SegNet paper.

Note: We did not use the 3.5K training set that was used in the SegNet paper.

Model     mIOU     Class avg.
ENet      51.3     68.3
SegNet    55.6     65.2
ESPNet    55.64    68.30

Pre-requisite

To run this code, you need the following libraries:

  • OpenCV - We tested our code with versions > 3.0.
  • PyTorch - We tested our code with v0.3.0.
  • Python - We tested our code with Python 3. If you are using Python 2, please make the necessary changes to the code.

We recommend using Anaconda. We have tested our code on Ubuntu 16.04.
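
As a quick sanity check (this snippet is not part of the repository), you can print the installed versions and compare them with the tested ones listed above:

import sys

import cv2
import torch

# Print the versions of the tested dependencies listed above
print('Python :', sys.version.split()[0])   # tested with Python 3
print('OpenCV :', cv2.__version__)          # tested with versions > 3.0
print('PyTorch:', torch.__version__)        # tested with v0.3.0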

Citation

If ESPNet is useful for your research, please cite our paper:

@inproceedings{mehta2018espnet,
  title={ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={ECCV},
  year={2018}
}

FAQs

Assertion error with class labels (t >= 0 && t < n_classes).

If you are getting an assertion error with class labels, check which class values are actually present in your label images. You can do this as follows:

import cv2
import numpy as np

# Read the label image as a single-channel (grayscale) image of class IDs
labelImg = cv2.imread('<label_filename.png>', 0)
unique_val_arr = np.unique(labelImg)
print(unique_val_arr)

The values inside unique_val_arr should lie in the range [0, number of classes - 1]. If this is not the case, pre-process your label images. For example, if a label image contains the value 255, you can ignore these pixels by mapping them to an undefined or background class as:

labelImg[labelImg == 255] = <undefined class id>
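
For reference, here is a minimal sketch of such a pre-processing pass over a whole directory of label images; the directory path and the undefined class id below are placeholders that you should adapt to your dataset:

import glob

import cv2

LABEL_DIR = './labels'      # placeholder: directory containing the label images
UNDEFINED_CLASS_ID = 19     # placeholder: id of the undefined/background class

for label_path in glob.glob(LABEL_DIR + '/*.png'):
    labelImg = cv2.imread(label_path, 0)           # read class ids as a single channel
    labelImg[labelImg == 255] = UNDEFINED_CLASS_ID # remap the invalid value
    cv2.imwrite(label_path, labelImg)              # overwrite with the cleaned labels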