Weakly- and Semi-Supervised Panoptic Segmentation (ECCV18)

Overview

Weakly- and Semi-Supervised Panoptic Segmentation

by Qizhu Li*, Anurag Arnab*, Philip H.S. Torr

This repository demonstrates the weakly supervised ground truth generation scheme presented in our paper Weakly- and Semi-Supervised Panoptic Segmentation published at ECCV 2018. The code has been cleaned-up and refactored, and should reproduce the results presented in the paper.

For details, please refer to our paper, and project page. Please check the Downloads section for all the additional data we release.

Summary

* Equal first authorship

Introduction

In our weakly-supervised panoptic segmentation experiments, our models are supervised by 1) image-level tags and 2) bounding boxes, as shown in the figure above. We used image-level tags as supervision for "stuff" classes which do not have a defined extent and cannot be described well by tight bounding boxes. For "thing" classes, we used bounding boxes as our weak supervision. This code release clarifies the implementation details of the method presented in the paper.

Iterative ground truth generation

For readers' convenience, we will give an outline of the proposed iterative ground truth generation pipeline, and provide demos for some of the key steps.

  1. We train a multi-class classifier for all classes to obtain rough localisation cues. As it is not possible to fit an entire Cityscapes image (1024x2048) into a network due to GPU memory constraints, we took 15 fixed 400x500 crops per training image, and derived their classification ground truth accordingly, which we use to train the multi-class classifier. From the trained classifier, we extract the Class Activation Maps (CAMs) using Grad-CAM, which has the advantage of being agnostic to network architecture over CAM.

    • Download the fixed image crops with image-level tags here to train your own classifier. For convenience, the pixel-level semantic label of the crops are also included, though they should not be used in training.
    • The CAMs we produced are available for download here.
  2. In parallel, we extract bounding box annotations from Cityscapes ground truth files, and then run MCG (a segment-proposal algorithm) and Grabcut (a classic foreground segmentation technique given a bounding-box prior) on the training images to generate foreground masks inside each annotated bounding box. MCG and Grabcut masks are merged following the rule that only regions where both have consensus are given the predicted label; otherwise an "ignore" label is assigned.

    • The extracted bounding boxes (saved in .mat format) can be downloaded here. Alternatively, we also provide a demo script demo_instanceTrainId_to_dets.m and a batch script batch_instanceTrainId_to_dets.m for you to make them yourself. The demo is self-contained; However, before running the batch script, make sure to
      1. Download the official Cityscapes scripts repository;

      2. Inside the above repository, navigate to cityscapesscripts/preparation and run

        python createTrainIdInstanceImgs.py

        This command requires an environment variable CITYSCAPES_DATASTET=path/to/your/cityscapes/data/folder to be set. These two steps produce the *_instanceTrainIds.png files required by our batch script;

      3. Navigate back to this repository, and place/symlink your gtFine and gtCoarse folders inside data/Cityscapes/ folder so that they are visible to our batch script.

    • Please see here for details on MCG.
    • We use the OpenCV implementation of Grabcut in our experiments.
    • The merged M&G masks we produced are available for download here.
  3. The CAMs (step 1) and M&G masks (step 2) are merged to produce the ground truth needed to kick off iterative training. To see a demo of merging, navigate to the root folder of this repo in MATLAB and run:

     demo_merge_cam_mandg;

    When post-processing network predictions of images from the Cityscapes train_extra split, make sure to use the following settings:

    opts.run_apply_bbox_prior = false;
    opts.run_check_image_level_tags = false;
    opts.save_ins = false;

    because the coarse annotation provided on the train_extra split trades off recall for precision, leading to inaccurate bounding box coordinates, and frequent occurrences of false negatives. This also applies to step 5.

    • The results from merging CAMs with M&G masks can be downloaded here.
  4. Using the generated ground truth, weakly-supervised models can be trained in the same way as a fully-supervised model. When the training loss converges, we make dense predictions using the model and also save the prediction scores.

    • An example of dense prediction made by a weakly-supervised model is included at results/pred_sem_raw/, and an example of the corresponding prediction scores is provided at results/pred_flat_feat/.
  5. The prediction and prediction scores (and optionally, the M&G masks) are used to generate the ground truth labels for next stage of iterative training. To see a demo of iterative ground truth generation, navigate to the root folder of this repo in MATLAB and run:

    demo_make_iterative_gt;

    The generated semantic and instance ground truth labels are saved at results/pred_sem_clean and results/pred_ins_clean respectively.

    Please refer to scripts/get_opts.m for the options available. To reproduce the results presented in the paper, use the default setting, and set opts.run_merge_with_mcg_and_grabcut to false after five iterations of training, as the weakly supervised model by then produces better quality segmentation of ''thing'' classes than the original M&G masks.

  6. Repeat step 4 and 5 until training loss no longer reduces.

Downloads

  1. Image crops and tags for training multi-class classifier:
  2. CAMs:
  3. Extracted Cityscapes bounding boxes (.mat format):
  4. Merged MCG&Grabcut masks:
  5. CAMs merged with MCG&Grabcut masks:

Note that due to file size limit set by BaiduYun, some of the larger files had to be split into several chunks in order to be uploaded. These files are named as filename.zip.part##, where filename is the original file name excluding the extension, and ## is a two digit part index. After you have downloaded all the parts, cd to the folder where they are saved, and use the following command to join them back together:

cat filename.zip.part* > filename.zip

The joining operation may take several minutes, depending on file size.

The above does not apply to files downloaded from Dropbox.

Reference

If you find the code helpful in your research, please cite our paper:

@InProceedings{Li_2018_ECCV,
    author = {Li, Qizhu and 
              Arnab, Anurag and 
              Torr, Philip H.S.},
    title = {Weakly- and Semi-Supervised Panoptic Segmentation},
    booktitle = {The European Conference on Computer Vision (ECCV)},
    month = {September},
    year = {2018}
}

Questions

Please contact Qizhu Li [email protected] and Anurag Arnab [email protected] for enquires, issues, and suggestions.

Owner
Qizhu Li
Capable of living on land, but prefers to stay in water.
Qizhu Li
A neuroanatomy-based augmented reality experience powered by computer vision. Features 3D visuals of the Atlas Brain Map slices.

Brain Augmented Reality (AR) A neuroanatomy-based augmented reality experience powered by computer vision that features 3D visuals of the Atlas Brain

Yasmeen Brain 10 Oct 06, 2022
The implementation our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction".

LEAR The implementation our EMNLP 2021 paper "Enhanced Language Representation with Label Knowledge for Span Extraction". See below for an overview of

杨攀 93 Jan 07, 2023
Boundary IoU API (Beta version)

Boundary IoU API (Beta version) Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov [arXiv] [Project] [BibTeX] This API is

Bowen Cheng 177 Dec 29, 2022
Empowering journalists and whistleblowers

Onymochat Empowering journalists and whistleblowers Onymochat is an end-to-end encrypted, decentralized, anonymous chat application. You can also host

Samrat Dutta 19 Sep 02, 2022
General Multi-label Image Classification with Transformers

General Multi-label Image Classification with Transformers Jack Lanchantin, Tianlu Wang, Vicente Ordóñez Román, Yanjun Qi Conference on Computer Visio

QData 154 Dec 21, 2022
SLAMP: Stochastic Latent Appearance and Motion Prediction

SLAMP: Stochastic Latent Appearance and Motion Prediction Official implementation of the paper SLAMP: Stochastic Latent Appearance and Motion Predicti

Kaan Akan 34 Dec 08, 2022
This repository contains the exercises and its solution contained in the book "An Introduction to Statistical Learning" in python.

An-Introduction-to-Statistical-Learning This repository contains the exercises and its solution contained in the book An Introduction to Statistical L

2.1k Jan 02, 2023
Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

mae-repo PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https:/

Peng Qiao 1 Dec 14, 2021
wmctrl ported to Python Ctypes

work in progress wmctrl is a command that can be used to interact with an X Window manager that is compatible with the EWMH/NetWM specification. wmctr

Iyad Ahmed 22 Dec 31, 2022
HDMapNet: A Local Semantic Map Learning and Evaluation Framework

HDMapNet_devkit Devkit for HDMapNet. HDMapNet: A Local Semantic Map Learning and Evaluation Framework Qi Li, Yue Wang, Yilun Wang, Hang Zhao [Paper] [

Tsinghua MARS Lab 421 Jan 04, 2023
Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis, including human motion imitation, appearance transfer, and novel view synthesis. Currently the paper is under review

2.3k Jan 05, 2023
Data & Code for ACCENTOR Adding Chit-Chat to Enhance Task-Oriented Dialogues

ACCENTOR: Adding Chit-Chat to Enhance Task-Oriented Dialogues Overview ACCENTOR consists of the human-annotated chit-chat additions to the 23.8K dialo

Facebook Research 69 Dec 29, 2022
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

EdiTTS: Score-based Editing for Controllable Text-to-Speech Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Au

Neosapience 98 Dec 25, 2022
Implementation for "Exploiting Aliasing for Manga Restoration" (CVPR 2021)

[CVPR Paper](To appear) | [Project Website](To appear) | BibTex Introduction As a popular entertainment art form, manga enriches the line drawings det

133 Dec 15, 2022
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

CoGAIL Table of Content Overview Installation Dataset Training Evaluation Trained Checkpoints Acknowledgement Citations License Overview This reposito

Jeremy Wang 29 Dec 24, 2022
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

⚠️ ‎‎‎ A more recent and actively-maintained version of this code is available in ivadomed Stacked Hourglass Network with a Multi-level Attention Mech

Reza Azad 14 Oct 24, 2022
This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on table detection and table structure recognition.

WTW-Dataset This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on ICCV 2021. Here, you can download the

109 Dec 29, 2022
code for our BMVC 2021 paper "HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification"

HCV_IIRC code for our BMVC 2021 paper HCV: Hierarchy-Consistency Verification for Incremental Implicitly-Refined Classification by Kai Wang, Xialei Li

kai wang 13 Oct 03, 2022
Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs

Spectrum Surveying: The Python code in this repository implements the simulations and plots the figures described in the paper “Spectrum Surveying: Ac

Universitetet i Agder 2 Dec 06, 2022
CoaT: Co-Scale Conv-Attentional Image Transformers

CoaT: Co-Scale Conv-Attentional Image Transformers Introduction This repository contains the official code and pretrained models for CoaT: Co-Scale Co

mlpc-ucsd 191 Dec 03, 2022