Refer-it-in-RGBD

This is the repository for our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images', published at CVPR 2021.

Paper - ArXiv - pdf (abs)
Project page: https://unclemedm.github.io/Refer-it-in-RGBD/

Introduction

We present a novel task of 3D visual grounding in single-view RGB-D images, where the referred objects are often only partially scanned. In contrast to previous works that directly generate object proposals for grounding in 3D scenes, we propose a bottom-up approach that gradually aggregates information, effectively addressing the challenge posed by partial scans. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGB-D image. It then performs an adaptive search based on the heatmap and carries out object-level matching with a second visio-linguistic fusion to ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGB-D images extracted from the ScanRefer dataset and our newly collected SUN-Refer dataset. Experiments show that our method outperforms the previous methods by a large margin (by 11.1% and 11.2% Acc@0.5) on both datasets.
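For readers unfamiliar with the Acc@0.5 metric quoted above, the following is a minimal sketch of how such a score is commonly computed: a prediction counts as correct when the 3D IoU between the predicted and ground-truth boxes reaches the threshold. It is not taken from this repository's evaluation code. The function names `iou_3d` and `acc_at_iou`, the axis-aligned box layout `(xmin, ymin, zmin, xmax, ymax, zmax)`, and the use of `>=` at the threshold are all assumptions made for illustration; boxes in SUN RGB-D are oriented, so a real evaluation would need an oriented-box IoU.

```python
from typing import Sequence, Tuple

# Hypothetical box layout for this sketch: axis-aligned,
# (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Tuple[float, float, float, float, float, float]


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:
            # No overlap along this axis, so the boxes do not intersect.
            return 0.0
        inter *= hi - lo
    union = box_volume(a) + box_volume(b) - inter
    return inter / union


def acc_at_iou(preds: Sequence[Box], gts: Sequence[Box],
               threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the ground truth
    is at least `threshold` (assumed inclusive)."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou_3d(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    gt = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
    shifted = (0.5, 0.0, 0.0, 1.5, 1.0, 1.0)  # overlaps half of gt
    print(iou_3d(gt, shifted))
    print(acc_at_iou([gt, shifted], [gt, gt]))
```

In this toy case the shifted box overlaps half of the ground truth, giving an IoU of 1/3, so only the exact match counts as a hit at the 0.5 threshold.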

Dataset

Download SUNREFER_v2 dataset
The SUNREFER dataset contains 38,495 referring expressions corresponding to 7,699 objects from the SUNRGBD dataset, an average of five expressions per object. Here is one example from the SUNREFER dataset:

Owner

Haolin Liu
