This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset.

Last update: Jan 16, 2022

DeepLab-ResNet-TensorFlow

Updates

29 Jan, 2017:

Fixed the implementation of the batch normalisation layer: it now supports both training and inference. If the --is-training flag is provided, the running means and variances are updated; otherwise, they are kept intact. The .ckpt files have been updated accordingly; to download them, please refer to the new link provided below.
Image summaries of the training process can now be viewed in TensorBoard.
Fixed the evaluation procedure: the 'void' label (255) is now correctly ignored. As a result, the mean IoU on the validation set has increased to 80.1%.
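The difference between the two batch normalisation modes can be illustrated with a minimal NumPy sketch. This is not the repository's layer: the class name, the momentum of 0.9 and the epsilon are assumptions chosen for illustration. In training mode the batch statistics are used and the running estimates are updated; in inference mode the stored running statistics are used and left untouched.

```python
import numpy as np


class BatchNorm:
    """Minimal batch normalisation over the channel axis of NHWC arrays.

    Sketch only: momentum and eps are illustrative assumptions,
    not the values used by the converted DeepLab-ResNet model.
    """

    def __init__(self, num_channels, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_channels)
        self.beta = np.zeros(num_channels)
        self.running_mean = np.zeros(num_channels)
        self.running_var = np.ones(num_channels)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, is_training):
        if is_training:
            # Normalise with the statistics of the current mini-batch...
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            # ...and update the running estimates with an exponential moving average.
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            # Inference: use the stored running statistics and keep them intact.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

Calling the layer with is_training=False leaves running_mean and running_var unchanged, which mirrors what omitting --is-training is described as doing above.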

Model Description

The DeepLab-ResNet is built on a fully convolutional variant of ResNet-101 with atrous (dilated) convolutions, atrous spatial pyramid pooling, and multi-scale inputs (not implemented here).
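To make atrous convolution concrete, here is a minimal single-channel NumPy sketch (not code from this repository; all function names are hypothetical). It shows that a dilated kernel is equivalent to an ordinary kernel with zeros inserted between its taps, and builds an ASPP-style head that sums the outputs of parallel branches with different rates. The rates 6, 12, 18 and 24 follow the ASPP-L setting reported in the DeepLab paper.

```python
import numpy as np


def atrous_conv2d(x, kernel, rate):
    """Single-channel 2-D atrous (dilated) convolution with 'same' zero padding.

    Kernel taps are spaced `rate` pixels apart, so a k x k kernel covers a
    (k + (k - 1) * (rate - 1)) square window with no extra parameters.
    As in most deep-learning frameworks, this computes cross-correlation.
    """
    k = kernel.shape[0]
    span = k + (k - 1) * (rate - 1)  # effective kernel size
    pad = span // 2
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input for kernel tap (i, j).
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between kernel taps."""
    k = kernel.shape[0]
    span = k + (k - 1) * (rate - 1)
    dilated = np.zeros((span, span))
    dilated[::rate, ::rate] = kernel
    return dilated


def aspp_sum(x, kernels, rates=(6, 12, 18, 24)):
    """ASPP-style head: parallel atrous branches fused by summation."""
    return sum(atrous_conv2d(x, k, r) for k, r in zip(kernels, rates))
```

Because every branch sees the same input at a different effective field of view, the summed output mixes context at several scales without downsampling the feature map.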

The model is trained on mini-batches of images and the corresponding ground-truth masks, with a softmax classifier on top. During training, the masks are downsampled to match the size of the network output; during inference, bilinear upsampling is applied to obtain an output of the same size as the input. The final segmentation mask is computed by taking the argmax over the logits. Optionally, a fully connected probabilistic graphical model, namely a conditional random field (CRF), can be applied to refine the final predictions. On the PASCAL VOC test set, the model achieves 79.7% mean intersection-over-union (mIoU).
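The training and inference steps described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the repository's code: the masks are downsampled with nearest-neighbour sampling (which keeps them valid class ids), the loss is a plain mean pixel-wise softmax cross-entropy, and the bilinear upsampling of the logits at inference time is omitted. The function names are hypothetical.

```python
import numpy as np


def downsample_labels_nearest(labels, out_h, out_w):
    """Nearest-neighbour resize of an integer mask (H, W) to (out_h, out_w).

    Interpolating labels would create meaningless in-between class ids,
    so each output pixel simply copies one input pixel.
    """
    h, w = labels.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return labels[rows[:, None], cols[None, :]]


def softmax_cross_entropy(logits, labels):
    """Mean pixel-wise softmax cross-entropy; logits (H, W, C), labels (H, W)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()


def predict_mask(logits):
    """Final segmentation: the highest-scoring class at every pixel."""
    return logits.argmax(axis=-1)
```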

For more details on the underlying model please refer to the following paper:

@article{CP2016Deeplab,
  title={DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs},
  author={Liang-Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille},
  journal={arXiv:1606.00915},
  year={2016}
}

Requirements

TensorFlow needs to be installed before running the scripts. TensorFlow 0.12 is supported; for TensorFlow 0.11 please refer to this branch.

To install the required Python packages (except TensorFlow), run:

pip install -r requirements.txt

or, for a local (per-user) installation:

pip install --user -r requirements.txt

Caffe to TensorFlow conversion

To imitate the structure of the model, we have used the .caffemodel files provided by the authors. The conversion has been performed using Caffe to TensorFlow with an additional configuration for atrous convolution and batch normalisation (since the batch normalisation provided by Caffe-tensorflow only supports inference). There is no need to perform the conversion yourself, as you can download the already converted models (deeplab_resnet.ckpt, pre-trained, and deeplab_resnet_init.ckpt, with the last layers randomly initialised) here.

Nevertheless, it is easy to perform the conversion manually, provided that the appropriate .caffemodel file has been downloaded and the Caffe to TensorFlow dependencies have been installed. The Caffe model definition is provided in misc/deploy.prototxt. To extract the weights from the .caffemodel file, run the following:

python convert.py /path/to/deploy/prototxt --caffemodel /path/to/caffemodel --data-output-path /where/to/save/numpy/weights

As a result of running the command above, the model weights will be stored in /where/to/save/numpy/weights. To convert them to the native TensorFlow format (.ckpt), simply execute:

python npy2ckpt.py /where/to/save/numpy/weights --save-dir=/where/to/save/ckpt/weights
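One detail any Caffe-to-TensorFlow conversion has to handle is the kernel layout: Caffe stores convolution weights as (out_channels, in_channels, height, width), whereas TensorFlow's tf.nn.conv2d expects (height, width, in_channels, out_channels). The sketch below shows that permutation and flattens a nested {layer: {param: array}} dictionary into variable-name keys. The dictionary structure, the "layer/param" naming and the function names are assumptions for illustration, not the behaviour of convert.py or npy2ckpt.py.

```python
import numpy as np


def caffe_to_tf_kernel(w):
    """Permute a Caffe conv kernel (out, in, h, w) to TensorFlow's (h, w, in, out)."""
    return np.transpose(w, (2, 3, 1, 0))


def flatten_weights(caffe_params):
    """Turn {layer: {param: array}} into {"layer/param": array} in TF layout.

    Assumption: every 4-D array is a convolution kernel in Caffe layout;
    other arrays (biases, batch-norm statistics) are copied unchanged.
    """
    flat = {}
    for layer, params in caffe_params.items():
        for name, value in params.items():
            value = np.asarray(value)
            if value.ndim == 4:
                value = caffe_to_tf_kernel(value)
            flat[f"{layer}/{name}"] = value
    return flat
```

A dictionary saved with np.save can typically be read back with np.load(path, allow_pickle=True).item(). Whether the arrays in a given .npy file are still in Caffe layout depends on the converter, so it is worth checking the shapes before permuting them.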

Dataset and Training

To train the network, one can use the augmented PASCAL VOC 2012 dataset with 10582 images for training and 1449 images for validation.

The training script allows you to monitor the progress of the optimisation using TensorBoard's image summaries. Besides that, random scaling of the inputs can be used during training as a means of data augmentation. For example, to train the model from scratch with random scaling turned on, simply run:

python train.py --random-scale
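Random scaling means resizing an image and its mask by the same randomly drawn factor. The NumPy sketch below illustrates the idea; the [0.5, 1.5] range and the function names are assumptions, not necessarily what train.py uses. For brevity both arrays are resized with nearest-neighbour sampling; images are usually resized bilinearly, but masks must stay nearest-neighbour so that every pixel keeps a valid class id.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of the first two axes of an (H, W[, C]) array."""
    h, w = arr.shape[:2]
    rows = np.minimum((np.arange(out_h) * h) // out_h, h - 1)
    cols = np.minimum((np.arange(out_w) * w) // out_w, w - 1)
    return arr[rows[:, None], cols[None, :]]


def random_scale(image, label, rng, min_scale=0.5, max_scale=1.5):
    """Rescale an image (H, W, C) and its mask (H, W) by one random factor.

    The scale range is an illustrative assumption.
    """
    scale = rng.uniform(min_scale, max_scale)
    h, w = label.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    return resize_nearest(image, new_h, new_w), resize_nearest(label, new_h, new_w)
```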

To see the documentation on each of the training settings, run the following:

python train.py --help

An additional script, fine_tune.py, demonstrates how to train only the last layers of the network.
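Training only the last layers amounts to passing the optimiser a subset of the model's variables, selected by name. The helper below is a hypothetical sketch of that selection on plain variable names; the "fc1_voc12" prefix is an assumed name for the last layers, not one taken from fine_tune.py.

```python
def split_for_fine_tuning(variable_names, last_layer_prefix="fc1_voc12"):
    """Partition variable names into (frozen, trainable) lists.

    Hypothetical helper: variables whose names start with the (assumed)
    last-layer prefix are trained; everything else stays fixed.
    """
    frozen = [n for n in variable_names if not n.startswith(last_layer_prefix)]
    trainable = [n for n in variable_names if n.startswith(last_layer_prefix)]
    return frozen, trainable
```

In TensorFlow 1.x, the corresponding variable objects could then be passed to the optimiser through the var_list argument of minimize, so gradients are applied only to the selected layers.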

Evaluation

The single-scale model achieves 80.1% mIoU on the PASCAL VOC 2012 validation set. No CRF post-processing is applied.
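Mean IoU is commonly computed from a confusion matrix accumulated over all pixels, with void pixels (label 255) excluded before counting. The NumPy sketch below illustrates that computation; it is not evaluate.py itself, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Accumulate a num_classes x num_classes confusion matrix (rows: ground truth).

    Pixels whose ground truth equals ignore_label ('void') are skipped entirely.
    """
    valid = gt != ignore_label
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean intersection-over-union over classes present in ground truth or prediction."""
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0
    return (intersection[present] / union[present]).mean()
```

For a whole validation set, the per-image confusion matrices are summed before mean_iou is applied, so every pixel carries equal weight.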

The following command provides the description of each of the evaluation settings:

python evaluate.py --help

Inference

To perform inference over your own images, use the following command:

python inference.py /path/to/your/image /path/to/ckpt/file

This will run the forward pass and save the resulting mask, colour-coded by class.
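Assuming the colour map is the standard PASCAL VOC palette (an assumption, not confirmed by inference.py), it can be generated with the usual bit-spreading scheme, in which the bits of each class id are distributed over the R, G and B channels. A predicted mask is then coloured by indexing into the palette. The function names below are hypothetical.

```python
import numpy as np


def voc_colormap(num_classes=21):
    """Standard PASCAL VOC palette: bits of the class id are spread over R, G, B."""
    cmap = np.zeros((num_classes, 3), dtype=np.uint8)
    for cls in range(num_classes):
        r = g = b = 0
        c = cls
        # Take three bits at a time, filling each channel from its most significant bit.
        for shift in range(7, -1, -1):
            r |= (c & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        cmap[cls] = (r, g, b)
    return cmap


def colourise(mask, cmap):
    """Map an (H, W) array of class ids to an (H, W, 3) uint8 RGB image."""
    return cmap[mask]
```

With this palette, background (class 0) is black, class 1 is (128, 0, 0) and class 15 is (192, 128, 128).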

Missing features

At the moment, the post-processing step with CRF is not implemented. Multi-scale inputs are also missing, and no weight regularisation is applied.

Other implementations

DeepLab-LargeFOV in TensorFlow
