Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)

Last update: Dec 25, 2022

Overview

Cross View Transformers

This repository contains the source code and data for our paper:

Cross-view Transformers for real-time Map-view Semantic Segmentation
Brady Zhou, Philipp Krähenbühl
CVPR 2022

Demos

Map-view Segmentation: The model uses multi-view images to produce a map-view segmentation at 45 FPS

Map Making: With vehicle pose, we can construct a map by fusing model predictions over time

Cross-view Attention: For a given map-view location, we show which image patches are being attended to
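The map-making idea above can be illustrated with a small sketch. It is not the repository's implementation: the grid layout (square grid centered on the vehicle, rows along the forward axis, columns along the left axis), the `(x, y, yaw)` pose format, the `cell_size` parameter, and the function name `fuse_predictions` are all assumptions made for illustration. Each per-frame prediction is moved into a shared world grid using the vehicle pose, and overlapping cells are averaged.

```python
import math
from collections import defaultdict


def fuse_predictions(frames, cell_size=0.5):
    """Average per-frame map-view probabilities in a shared world grid.

    frames: iterable of (grid, pose) pairs. grid is a square list of lists
    of probabilities centered on the vehicle (assumed convention: row index
    follows the forward axis, column index follows the left axis). pose is
    (x, y, yaw) in world coordinates, with yaw in radians.
    Returns a dict mapping world cell indices (ix, iy) to mean probability.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for grid, (px, py, yaw) in frames:
        half = len(grid) / 2.0
        cos_y, sin_y = math.cos(yaw), math.sin(yaw)
        for i, row in enumerate(grid):
            for j, prob in enumerate(row):
                # Cell center in the ego frame, in meters.
                ex = (i + 0.5 - half) * cell_size
                ey = (j + 0.5 - half) * cell_size
                # Rigid transform from the ego frame to the world frame.
                wx = px + cos_y * ex - sin_y * ey
                wy = py + sin_y * ex + cos_y * ey
                # Quantize to a world cell and accumulate.
                key = (math.floor(wx / cell_size), math.floor(wy / cell_size))
                sums[key] += prob
                counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Averaging is only one possible fusion rule; a log-odds update or a maximum over time would slot into the same loop.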

Installation

# Clone repo
git clone https://github.com/bradyz/cross_view_transformers.git

cd cross_view_transformers

# Setup conda environment
conda create -y --name cvt python=3.8

conda activate cvt
conda install -y pytorch torchvision cudatoolkit=11.3 -c pytorch

# Install dependencies
pip install -r requirements.txt
pip install -e .

Data

Documentation:

Dataset setup
Label generation (optional)

Download the original datasets and our generated map-view labels

               Dataset                              Labels
nuScenes       keyframes + map expansion (60 GB)    cvt_labels_nuscenes.tar.gz (361 MB)
Argoverse 1.1  3D tracking                          coming soon™

The structure of the extracted data should look like the following

/datasets/
├─ nuscenes/
│  ├─ v1.0-trainval/
│  ├─ v1.0-mini/
│  ├─ samples/
│  ├─ sweeps/
│  └─ maps/
│     ├─ basemap/
│     └─ expansion/
└─ cvt_labels_nuscenes/
   ├─ scene-0001/
   ├─ scene-0001.json
   ├─ ...
   ├─ scene-1000/
   └─ scene-1000.json
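Before launching any script, it can help to confirm that the extracted data matches this layout. The following is a minimal sketch using only the standard library; the helper name `missing_paths` is hypothetical and not part of the repository, and the list of sub-paths is copied from the tree above.

```python
from pathlib import Path

# Sub-directories taken from the directory tree shown above.
EXPECTED = [
    "nuscenes/v1.0-trainval",
    "nuscenes/v1.0-mini",
    "nuscenes/samples",
    "nuscenes/sweeps",
    "nuscenes/maps/basemap",
    "nuscenes/maps/expansion",
    "cvt_labels_nuscenes",
]


def missing_paths(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]
```

Calling `missing_paths("/datasets")` returns an empty list when every directory is in place, and otherwise lists the ones that still need to be downloaded or extracted.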

Once everything is set up correctly, inspect the dataset with the command below (adjust the two paths to wherever you extracted the data)

python3 scripts/view_data.py \
  data=nuscenes \
  data.dataset_dir=/media/datasets/nuscenes \
  data.labels_dir=/media/datasets/cvt_labels_nuscenes \
  data.version=v1.0-mini \
  visualization=nuscenes_viz \
  +split=val

Training

A typical job of 50k training iterations takes about 8 hours.
Our models were trained using 4-GPU jobs, but they can also be trained on a single GPU.
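As a rough planning aid, the reference figure above (50k iterations in about 8 hours) can be turned into a per-iteration cost and scaled to other schedule lengths. This is a back-of-the-envelope sketch that assumes runtime grows linearly with iteration count on the same hardware; the function names are hypothetical.

```python
# Reference point stated above: 50k iterations in roughly 8 hours.
REFERENCE_ITERATIONS = 50_000
REFERENCE_HOURS = 8.0


def seconds_per_iteration():
    """Average wall-clock seconds per iteration for the reference run."""
    return REFERENCE_HOURS * 3600 / REFERENCE_ITERATIONS


def estimate_hours(iterations):
    """Scale the reference run linearly to another iteration count."""
    return REFERENCE_HOURS * iterations / REFERENCE_ITERATIONS
```

Under these assumptions one iteration costs a little under 0.6 seconds, so a 25k-iteration run would take about 4 hours. Actual times depend on GPU count, data loading speed, and batch size.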

To train a model, run

python3 scripts/train.py \
  +experiment=cvt_nuscenes_vehicle \
  data.dataset_dir=/media/datasets/nuscenes \
  data.labels_dir=/media/datasets/cvt_labels_nuscenes
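The same invocation can be assembled from Python, which avoids shell line continuations when the command is generated by another script. This is a sketch: `build_train_command` is a hypothetical helper, and only the script path and override names shown above are taken from the document.

```python
def build_train_command(dataset_dir, labels_dir, experiment="cvt_nuscenes_vehicle"):
    """Return the training command as an argument list for subprocess."""
    return [
        "python3",
        "scripts/train.py",
        f"+experiment={experiment}",
        f"data.dataset_dir={dataset_dir}",
        f"data.labels_dir={labels_dir}",
    ]
```

From the repository root, the list can be passed to `subprocess.run(build_train_command(...), check=True)`.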

For more information, see

config/config.yaml - base config
config/model/cvt.yaml - model architecture
config/experiment/cvt_nuscenes_vehicle.yaml - additional overrides
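The `key=value` arguments in the commands above look like Hydra-style overrides of values in these config files; that is an assumption based on the syntax, since the text does not name the config library. The sketch below is not that library's code. It only illustrates how a dotted key such as `data.dataset_dir` addresses a nested entry, and it handles plain values only, not config-group selections like `+experiment=...`.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, value = item.partition("=")
        # A leading '+' marks a key that is added rather than replaced.
        key = key.lstrip("+")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            # Walk down the nesting, creating levels that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result
```

For example, applying `["data.version=v1.0-mini", "+split=val"]` to `{"data": {"version": "v1.0-trainval"}}` yields `{"data": {"version": "v1.0-mini"}, "split": "val"}` and leaves the input dictionary unchanged.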

Additional Information

Awesome Related Repos

License

This project is released under the MIT License.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2022cross,
    title={Cross-view Transformers for real-time Map-view Semantic Segmentation},
    author={Zhou, Brady and Kr{\"a}henb{\"u}hl, Philipp},
    booktitle={CVPR},
    year={2022}
}
