Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be credited to attention maps that are shared across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
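
The command above selects a polynomial learning-rate schedule (--lr-mode poly) starting from --lr 0.007 over --epochs 200. The following is a minimal sketch of what such a schedule usually looks like, not the code in main.py: the function name poly_lr and the decay exponent 0.9 are assumptions chosen because they are common defaults for "poly" schedules in segmentation code.

```python
def poly_lr(base_lr, epoch, total_epochs, power=0.9):
    """Polynomial decay: base_lr * (1 - epoch / total_epochs) ** power.

    `power=0.9` is an assumed default, not a value taken from the repository.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1.0 - epoch / total_epochs) ** power


if __name__ == "__main__":
    # Values from the training command: --lr 0.007 --epochs 200
    for epoch in (0, 50, 100, 150, 199):
        print(f"epoch {epoch:3d}: lr = {poly_lr(0.007, epoch, 200):.6f}")
```

Under this assumption the rate starts at 0.007, decays smoothly, and reaches zero only at the final epoch, so late epochs make progressively smaller updates.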

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
