Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation

Overview

Unseen Object Clustering: Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation

Introduction

In this work, we propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data. A metric learning loss functionis utilized to learn to produce pixel-wise feature embeddings such that pixels from the same object are close to each other and pixels from different objects are separated in the embedding space. With the learned feature embeddings, a mean shift clustering algorithm can be applied to discover and segment unseen objects. We further improve the segmentation accuracy with a new two-stage clustering algorithm. Our method demonstrates that non-photorealistic synthetic RGB and depth images can be used to learn feature embeddings that transfer well to real-world images for unseen object instance segmentation. arXiv, Talk video

License

Unseen Object Clustering is released under the NVIDIA Source Code License (refer to the LICENSE file for details).

Citation

If you find Unseen Object Clustering useful in your research, please consider citing:

@inproceedings{xiang2020learning,
    Author = {Yu Xiang and Christopher Xie and Arsalan Mousavian and Dieter Fox},
    Title = {Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation},
    booktitle = {Conference on Robot Learning (CoRL)},
    Year = {2020}
}

Required environment

  • Ubuntu 16.04 or above
  • PyTorch 0.4.1 or above
  • CUDA 9.1 or above

Installation

  1. Install PyTorch.

  2. Install python packages

    pip install -r requirement.txt

Download

  • Download our trained checkpoints from here, save to $ROOT/data.

Running the demo

  1. Download our trained checkpoints first.

  2. Run the following script for testing on images under $ROOT/data/demo.

    ./experiments/scripts/demo_rgbd_add.sh

Training and testing on the Tabletop Object Dataset (TOD)

  1. Download the Tabletop Object Dataset (TOD) from here (34G).

  2. Create a symlink for the TOD dataset

    cd $ROOT/data
    ln -s $TOD_DATA tabletop
  3. Training and testing on the TOD dataset

    cd $ROOT
    
    # multi-gpu training, we used 4 GPUs
    ./experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_train_tabletop.sh
    
    # testing, $GPU_ID can be 0, 1, etc.
    ./experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_test_tabletop.sh $GPU_ID $EPOCH
    

Testing on the OCID dataset and the OSD dataset

  1. Download the OCID dataset from here, and create a symbol link:

    cd $ROOT/data
    ln -s $OCID_dataset OCID
  2. Download the OSD dataset from here, and create a symbol link:

    cd $ROOT/data
    ln -s $OSD_dataset OSD
  3. Check scripts in experiments/scripts with name test_ocid or test_ocd. Make sure the path of the trained checkpoints exist.

    experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_test_ocid.sh
    experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_test_osd.sh
    

Running with ROS on a Realsense camera for real-world unseen object instance segmentation

  • Python2 is needed for ROS.

  • Make sure our pretrained checkpoints are downloaded.

    # start realsense
    roslaunch realsense2_camera rs_aligned_depth.launch tf_prefix:=measured/camera
    
    # start rviz
    rosrun rviz rviz -d ./ros/segmentation.rviz
    
    # run segmentation, $GPU_ID can be 0, 1, etc.
    ./experiments/scripts/ros_seg_rgbd_add_test_segmentation_realsense.sh $GPU_ID

Our example:

Owner
NVIDIA Research Projects
NVIDIA Research Projects
Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation

Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target i

NanYoMy 13 Oct 09, 2022
A Python Library for Graph Outlier Detection (Anomaly Detection)

PyGOD is a Python library for graph outlier detection (anomaly detection). This exciting yet challenging field has many key applications, e.g., detect

PyGOD Team 757 Jan 04, 2023
This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

Introduction: X-Ray Report Generation This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". O

no name 36 Dec 16, 2022
BuildingNet: Learning to Label 3D Buildings

BuildingNet This is the implementation of the BuildingNet architecture described in this paper: Paper: BuildingNet: Learning to Label 3D Buildings Arx

16 Nov 07, 2022
Punctuation Restoration using Transformer Models for High-and Low-Resource Languages

Punctuation Restoration using Transformer Models This repository contins official implementation of the paper Punctuation Restoration using Transforme

Tanvirul Alam 142 Jan 01, 2023
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias

Counterfactual VQA (CF-VQA) This repository is the Pytorch implementation of our paper "Counterfactual VQA: A Cause-Effect Look at Language Bias" in C

Yulei Niu 94 Dec 03, 2022
Auxiliary Raw Net (ARawNet) is a ASVSpoof detection model taking both raw waveform and handcrafted features as inputs, to balance the trade-off between performance and model complexity.

Overview This repository is an implementation of the Auxiliary Raw Net (ARawNet), which is ASVSpoof detection system taking both raw waveform and hand

6 Jul 08, 2022
Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Clay Mullis 82 Oct 13, 2022
A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images

BaSiC Matlab code accompanying A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images by Tingying Peng, Kurt Thorn, Timm Schr

Marr Lab 34 Dec 18, 2022
An implementation of the AdaOPS (Adaptive Online Packing-based Search), which is an online POMDP Solver used to solve problems defined with the POMDPs.jl generative interface.

AdaOPS An implementation of the AdaOPS (Adaptive Online Packing-guided Search), which is an online POMDP Solver used to solve problems defined with th

9 Oct 05, 2022
Edge Restoration Quality Assessment

ERQA - Edge Restoration Quality Assessment ERQA - a full-reference quality metric designed to analyze how good image and video restoration methods (SR

MSU Video Group 27 Dec 17, 2022
Semantically Contrastive Learning for Low-light Image Enhancement

Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig

48 Dec 16, 2022
Adversarially Learned Inference

Adversarially Learned Inference Code for the Adversarially Learned Inference paper. Compiling the paper locally From the repo's root directory, $ cd p

Mohamed Ishmael Belghazi 308 Sep 24, 2022
It is modified Tensorflow 2.x version of Mask R-CNN

[TF 2.X] Mask R-CNN for Object Detection and Segmentation [Notice] : The original mask-rcnn uses the tensorflow 1.X version. I modified it for tensorf

Milner 34 Nov 09, 2022
Final project code: Implementing MAE with downscaled encoders and datasets, for ESE546 FA21 at University of Pennsylvania

546 Final Project: Masked Autoencoder Haoran Tang, Qirui Wu 1. Training To train the network, please run mae_pretraining.py. Please modify folder path

Haoran Tang 0 Apr 22, 2022
[Link]deep_portfolo - Use Reforcemet earg ad Supervsed learg to Optmze portfolo allocato []

rl_portfolio This Repository uses Reinforcement Learning and Supervised learning to Optimize portfolio allocation. The goal is to make profitable agen

Deepender Singla 165 Dec 02, 2022
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP The implementation of paper CLIP2Video: Mastering Video-Text Retrieval via Image CLIP. CLIP2

168 Dec 29, 2022
Easy-to-use,Modular and Extendible package of deep-learning based CTR models .

DeepCTR DeepCTR is a Easy-to-use,Modular and Extendible package of deep-learning based CTR models along with lots of core components layers which can

浅梦 6.6k Jan 08, 2023
PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?

PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
Black box hyperparameter optimization made easy.

BBopt BBopt aims to provide the easiest hyperparameter optimization you'll ever do. Think of BBopt like Keras (back when Theano was still a thing) for

Evan Hubinger 70 Nov 03, 2022