Towards Part-Based Understanding of RGB-D Scans

Last update: Nov 23, 2022

Overview

Towards Part-Based Understanding of RGB-D Scans (CVPR 2021)

We propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects and, for each object, predict its decomposition into geometric part masks which, composed together, form the complete geometry of the observed object.
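The idea of part masks that compose into a complete object can be made concrete with a small sketch. It assumes each leaf part stores its mask as a set of occupied (x, y, z) voxel indices, and that the object geometry is the union of its leaf masks. The PartNode class and the compose function are hypothetical illustrations, not the repository's own data structures.

```python
from dataclasses import dataclass, field

# Hypothetical voxel index type: (x, y, z) coordinates in a voxel grid.
Voxel = tuple[int, int, int]


@dataclass
class PartNode:
    """Hypothetical node of a voxelized part tree (illustration only)."""
    label: str
    voxels: set[Voxel] = field(default_factory=set)  # mask of a leaf part
    children: list["PartNode"] = field(default_factory=list)

    def leaves(self):
        """Yield the leaf parts of this subtree in depth-first order."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def compose(root: PartNode) -> set[Voxel]:
    """Union of all leaf part masks, i.e. the complete object geometry."""
    shape: set[Voxel] = set()
    for leaf in root.leaves():
        shape |= leaf.voxels
    return shape


# Usage: a toy "chair" made of a 2x2 seat layer above a 2x2 legs layer.
seat = PartNode("seat", {(x, 1, z) for x in range(2) for z in range(2)})
legs = PartNode("legs", {(x, 0, z) for x in range(2) for z in range(2)})
chair = PartNode("chair", children=[seat, legs])
print(len(compose(chair)))  # 8
```

Keeping masks as coordinate sets makes composition a plain set union; a dense boolean grid with a logical OR would express the same operation.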

Get started

The core of this repository is a network that takes preprocessed voxel crops of scans as input and produces voxelized part trees. However, data preparation is a very heavy step that has to be completed before actual training and inference can start. For this reason, we release already-prepared training data and a checkpoint for inference. If you want to launch training with our data, follow the steps below:

1. Clone the repo: git clone https://github.com/alexeybokhovkin/part-based-scan-understanding.git
2. Download the data and/or checkpoint:
   - ScanNet MLCVNet crops (finetune) [894M]
   - ScanNet clean crops (pretraining) [995M]
   - PartNet GT trees [103M]
   - Parts priors [169M]
   - Checkpoint [19M]
3. For training, prepare an augmented version of the ScanNet crops with the script dataproc/prepare_rot_aug_data.py. Then create a folder with all necessary dataset metadata using the script dataproc/gather_all_shapes.py.
4. Create a config file similar to configs/config_gnn_scannet_allshapes.yaml (you need to provide paths to some directories and files).
5. Launch training with train_gnn_scannet.py.
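The exact augmentation performed by dataproc/prepare_rot_aug_data.py is not described here; its name suggests rotation augmentation of the voxel crops. The following is a minimal sketch under loudly stated assumptions: crops are cubic grids of side size, stored as sets of occupied voxel indices, and rotated in 90° steps about a vertical y axis. The functions rotate_y90 and rotation_augment are hypothetical and do not reflect the script's actual behavior or angles.

```python
# Hypothetical voxel index type: (x, y, z) inside a size^3 grid.
Voxel = tuple[int, int, int]


def rotate_y90(voxels: set[Voxel], size: int) -> set[Voxel]:
    """Rotate occupied voxels by 90 degrees about the y axis (assumed up).

    (x, z) -> (z, size - 1 - x) is a proper rotation (determinant +1)
    and keeps every index inside [0, size).
    """
    return {(z, y, size - 1 - x) for (x, y, z) in voxels}


def rotation_augment(voxels: set[Voxel], size: int,
                     steps: tuple[int, ...] = (0, 1, 2, 3)) -> dict[int, set[Voxel]]:
    """Return rotated copies of a crop, keyed by rotation angle in degrees."""
    augmented = {}
    for k in steps:
        rotated = voxels
        for _ in range(k):
            rotated = rotate_y90(rotated, size)
        augmented[k * 90] = rotated
    return augmented


# Usage: two voxels along the x axis of a 4x4x4 grid.
sample = {(0, 0, 0), (1, 0, 0)}
aug = rotation_augment(sample, size=4)
print(sorted(aug[90]))  # [(0, 0, 2), (0, 0, 3)]
```

Restricting the sketch to 90° steps keeps every rotated voxel exactly on the grid; arbitrary angles would require resampling the crop.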

Citation

If you use this framework, please cite:

@article{Bokhovkin2020TowardsPU,
  title={Towards Part-Based Understanding of RGB-D Scans},
  author={Alexey Bokhovkin and V. Ishimtsev and Emil Bogomolov and D. Zorin and A. Artemov and Evgeny Burnaev and Angela Dai},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.02094}
}




Releases

v0.1 (Jun 18, 2021)
