Group-Free 3D Object Detection via Transformers

Overview

By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong.

This repo is the official implementation of "Group-Free 3D Object Detection via Transformers".

(Teaser figure)

Updates

  • April 01, 2021: initial release.

Introduction

Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract an object representation from an irregular point cloud, existing methods usually take a point grouping step that assigns points to an object candidate, so that a PointNet-like network can derive object features from the grouped points. However, the inaccurate point assignments caused by the hand-crafted grouping scheme degrade the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from a point cloud. Instead of grouping local points into each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of the attention mechanism in Transformers, where the contribution of each point is automatically learned during network training. With an improved attention stacking scheme, our method fuses object features from different stages and generates more accurate detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D.

In this repository, we provide the model implementation (in PyTorch) as well as data preparation, training, and evaluation scripts for ScanNet and SUN RGB-D.
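To make the core idea concrete, here is a minimal, purely illustrative PyTorch sketch; the class and argument names are hypothetical and do not mirror the actual modules in this repository. It shows the two ingredients described above: each object candidate query attends to all point features through Transformer cross-attention (so no hand-crafted grouping is needed), and a prediction head attached to every decoder stage is fused at the end, in the spirit of the attention stacking scheme.

import torch
import torch.nn as nn

class GroupFreeDecoderSketch(nn.Module):
    """Illustrative only: object candidates attend to ALL points, no grouping."""
    def __init__(self, d_model=288, nhead=8, num_layers=6, num_classes=18):
        super().__init__()
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, nhead) for _ in range(num_layers)])
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, nhead) for _ in range(num_layers)])
        # one prediction head per decoder stage; their outputs are fused later
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, num_classes) for _ in range(num_layers)])

    def forward(self, queries, points):
        # queries: (num_candidates, batch, d_model) -- initial object candidates
        # points:  (num_points,     batch, d_model) -- features of ALL points
        stage_logits = []
        for sa, ca, head in zip(self.self_attn, self.cross_attn, self.heads):
            queries = queries + sa(queries, queries, queries)[0]
            # cross-attention over the whole point cloud: the contribution of
            # every point is learned rather than fixed by a grouping step
            queries = queries + ca(queries, points, points)[0]
            stage_logits.append(head(queries))
        # fuse the per-stage predictions (here simply averaged)
        return torch.stack(stage_logits).mean(dim=0)

The real model additionally uses feed-forward sub-layers, normalization, positional information, and box regression heads, and it generates the initial candidate queries from the point cloud itself (see the --query_points_generator_loss_coef option in the training commands below).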

Citation

@article{liu2021,
  title={Group-Free 3D Object Detection via Transformers},
  author={Liu, Ze and Zhang, Zheng and Cao, Yue and Hu, Han and Tong, Xin},
  journal={arXiv preprint arXiv:2104.00678},
  year={2021}
}

Main Results

ScanNet V2

Method | backbone | mAP@0.25 | mAP@0.5 | Model
--- | --- | --- | --- | ---
HGNet | GU-net | 61.3 | 34.4 | -
GSDN | MinkNet | 62.8 | 34.8 | waiting for release
3D-MPA | MinkNet | 64.2 | 49.2 | waiting for release
VoteNet | PointNet++ | 62.9 | 39.9 | official repo
MLCVNet | PointNet++ | 64.5 | 41.4 | official repo
H3DNet | PointNet++ | 64.4 | 43.4 | official repo
H3DNet | 4xPointNet++ | 67.2 | 48.1 | official repo
Ours(L6, O256) | PointNet++ | 67.3 (66.2*) | 48.9 (48.4*) | model
Ours(L12, O256) | PointNet++ | 67.2 (66.6*) | 49.7 (49.3*) | model
Ours(L12, O256) | PointNet++w2× | 68.8 (68.3*) | 52.1 (51.1*) | model
Ours(L12, O512) | PointNet++w2× | 69.1 (68.8*) | 52.8 (52.3*) | model

SUN RGB-D

Method | backbone | inputs | mAP@0.25 | mAP@0.5 | Model
--- | --- | --- | --- | --- | ---
VoteNet | PointNet++ | point | 59.1 | 35.8 | official repo
MLCVNet | PointNet++ | point | 59.8 | - | official repo
HGNet | GU-net | point | 61.6 | - | -
H3DNet | 4xPointNet++ | point | 60.1 | 39.0 | official repo
imVoteNet | PointNet++ | point+RGB | 63.4 | - | official repo
Ours(L6, O256) | PointNet++ | point | 62.8 (62.6*) | 42.3 (42.0*) | model

Notes:

  • * indicates a result averaged over 5 evaluation runs, since the randomness of the algorithm is relatively large.
  • The Ln, Om, and w2× settings in the model names correspond to the --num_decoder_layers, --num_target, and --width options used in the training commands below.

Install

Requirements

  • Ubuntu 16.04
  • Anaconda with python=3.6
  • pytorch>=1.3
  • torchvision with pillow<7
  • cuda=10.1
  • trimesh>=2.35.39,<2.35.40
  • networkx>=2.2,<2.3
  • compile the CUDA layers for PointNet++, which are used in the backbone network: sh init.sh
  • others: pip install termcolor opencv-python tensorboard
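For reference, one possible way to set up such an environment with Anaconda is sketched below. The PyTorch/torchvision pins are only an example combination that satisfies the constraints above (pytorch>=1.3, CUDA 10.1, pillow<7), not an officially tested configuration:

conda create -n group-free python=3.6
conda activate group-free
# example version combination meeting the constraints above (not an official pin)
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
pip install "pillow<7" "trimesh>=2.35.39,<2.35.40" "networkx>=2.2,<2.3" \
    termcolor opencv-python tensorboard
sh init.sh   # compile the PointNet++ CUDA layers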

Data preparation

For SUN RGB-D, follow the README under the sunrgbd folder.

For ScanNet, follow the README under the scannet folder.

Usage

ScanNet

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 6 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For L12, O256 evaluation:

python eval_avg.py --num_point 50000 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O256 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

For w2x, L12, O512 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --size_delta 0.111111111111 --center_delta 0.04 \
    --learning_rate 0.006 --decoder_learning_rate 0.0006 --weight_decay 0.0005 \
    --dataset scannet --data_root <data directory> [--log_dir <log directory>]

For w2x, L12, O512 evaluation:

python eval_avg.py --num_point 50000 --width 2 --num_decoder_layers 12 --num_target 512 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset scannet --data_root <data directory> [--dump_dir <dump directory>]

SUN RGB-D

For L6, O256 training:

python -m torch.distributed.launch --master_port <port_num> --nproc_per_node <num_of_gpus_to_use> \
    train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 \
    --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 \
    --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 \
    --dataset sunrgbd --data_root <data directory> [--log_dir <log directory>]

For L6, O256 evaluation:

python eval_avg.py --num_point 20000 --num_decoder_layers 6 \
    --checkpoint_path <checkpoint> --avg_times 5 \
    --dataset sunrgbd --data_root <data directory> [--dump_dir <dump directory>]

Acknowledgements

We thank the authors of votenet for their flexible codebase.

License

The code is released under the MIT License (see the LICENSE file for details).
