ICCV2021 Papers with Code

A collection of ICCV 2021 papers and open-source projects (papers with code)!

1617 papers accepted - 25.9% acceptance rate

ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

Note 1: Issues are welcome. Please share ICCV 2021 papers and open-source projects!

Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

[ICCV 2021 Papers and Open-Source Code: Table of Contents]

Backbone

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

AutoFormer: Searching Transformers for Visual Recognition

Bias Loss for Mobile Neural Networks

Vision Transformer with Progressive Sampling

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Rethinking Spatial Dimensions of Vision Transformers

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Conformer: Local Features Coupling Global Representations for Visual Recognition

MicroNet: Improving Image Recognition with Extremely Low FLOPs

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

Visual Transformer

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

An Empirical Study of Training Self-Supervised Vision Transformers

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Group-Free 3D Object Detection via Transformers

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Rethinking and Improving Relative Position Encoding for Vision Transformer

Emerging Properties in Self-Supervised Vision Transformers

Learning Spatio-Temporal Transformer for Visual Tracking

Fast Convergence of DETR with Spatially Modulated Co-Attention

Vision Transformer with Progressive Sampling

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Rethinking Spatial Dimensions of Vision Transformers

The Right to Talk: An Audio-Visual Transformer Approach

Joint Inductive and Transductive Learning for Video Object Segmentation

Conformer: Local Features Coupling Global Representations for Visual Recognition

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Conditional DETR for Fast Training Convergence

MUSIQ: Multi-scale Image Quality Transformer

SOTR: Segmenting Objects with Transformers

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Improving 3D Object Detection with Channel-wise Transformer

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Voxel Transformer for 3D Object Detection

3D Human Texture Estimation from a Single Image with Transformers

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

CTRL-C: Camera calibration TRansformer with Line-Classification

An End-to-End Transformer Model for 3D Object Detection

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

PnP-DETR: Towards Efficient Visual Analysis with Transformers

Transformer-based Dual Relation Graph for Multi-label Image Recognition

Performance-Boosting Modules

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

Unifying Nonlocal Blocks for Neural Networks

Towards Learning Spatially Discriminative Feature Representations

GAN

Labels4Free: Unsupervised Segmentation using StyleGAN

GNeRF: GAN-based Neural Radiance Field without Posed Camera

EigenGAN: Layer-Wise Eigen-Learning for GANs

From Continuity to Editability: Inverting GANs with Consecutive Images

Sketch Your Own GAN

Manifold Matching via Deep Metric Learning for Generative Modeling

Dual Projection Generative Adversarial Networks for Conditional Image Generation

GAN Inversion for Out-of-Range Images with Geometric Transformations

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Image Synthesis via Semantic Composition

NAS

AutoFormer: Searching Transformers for Visual Recognition

BN-NAS: Neural Architecture Search with Batch Normalization

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

NeRF

GNeRF: GAN-based Neural Radiance Field without Posed Camera

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

In-Place Scene Labelling and Understanding with Implicit Scene Representation

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

BARF: Bundle-Adjusting Neural Radiance Fields

Self-Calibrating Neural Radiance Fields

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Neural Articulated Radiance Field

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes

CodeNeRF: Disentangled Neural Radiance Fields for Object Categories

PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering

Loss

Rank & Sort Loss for Object Detection and Instance Segmentation

Bias Loss for Mobile Neural Networks

A Robust Loss for Point Cloud Registration

Reconcile Prediction Consistency for Balanced Object Detection

Influence-Balanced Loss for Imbalanced Visual Classification

Zero-Shot Learning

FREE: Feature Refinement for Generalized Zero-Shot Learning

Discriminative Region-based Multi-Label Zero-Shot Learning

Semantics Disentangling for Generalized Zero-Shot Learning

Few-Shot Learning

Relational Embedding for Few-Shot Classification

Few-Shot and Continual Learning with Attentive Independent Mechanisms

Few Shot Visual Relationship Co-Localization

Long-Tailed

Parametric Contrastive Learning

Influence-Balanced Loss for Imbalanced Visual Classification

Vision and Language

VLGrammar: Grounded Grammar Induction of Vision and Language

Unsupervised/Self-Supervised

An Empirical Study of Training Self-Supervised Vision Transformers

DetCo: Unsupervised Contrastive Learning for Object Detection

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

Improving Contrastive Learning by Visualizing Feature Transformation

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

Temporal Knowledge Consistency for Unsupervised Visual Representation Learning

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

Self-supervised Product Quantization for Deep Unsupervised Image Retrieval

Self-Supervised Representation Learning from Flow Equivariance

Multi-Label Image Recognition

Residual Attention: A Simple but Effective Method for Multi-Label Recognition

2D Object Detection

DetCo: Unsupervised Contrastive Learning for Object Detection

Detecting Invisible People

Active Learning for Deep Object Detection via Probabilistic Modeling

Conditional Variational Capsule Network for Open Set Recognition

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Rank & Sort Loss for Object Detection and Instance Segmentation

SimROD: A Simple Adaptation Method for Robust Object Detection

GraphFPN: Graph Feature Pyramid Network for Object Detection

Fast Convergence of DETR with Spatially Modulated Co-Attention

Conditional DETR for Fast Training Convergence

TOOD: Task-aligned One-stage Object Detection

Reconcile Prediction Consistency for Balanced Object Detection

Mutual Supervision for Dense Object Detection

PnP-DETR: Towards Efficient Visual Analysis with Transformers

Deep Structured Instance Graph for Distilling Object Detectors

Semi-Supervised Object Detection

End-to-End Semi-Supervised Object Detection with Soft Teacher

Oriented Object Detection

Oriented R-CNN for Object Detection

Few-Shot Object Detection

DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection

Semantic Segmentation

Personalized Image Semantic Segmentation

Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation

Enhanced Boundary Learning for Glass-like Object Segmentation

Self-Regulation for Semantic Segmentation

Mining Contextual Information Beyond Image for Semantic Segmentation

ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation

Scaling up instance annotation via label propagation

Unsupervised Domain Adaptation Semantic Segmentation

Multi-Anchor Active Domain Adaptation for Semantic Segmentation

Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation

Few-Shot Semantic Segmentation

Learning Meta-class Memory for Few-Shot Semantic Segmentation

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

Semi-Supervised Semantic Segmentation

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation

Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation

Weakly Supervised Semantic Segmentation

Complementary Patch for Weakly Supervised Semantic Segmentation

Unsupervised Segmentation

Labels4Free: Unsupervised Segmentation using StyleGAN

Instance Segmentation

Instances as Queries

Crossover Learning for Fast Online Video Instance Segmentation

Rank & Sort Loss for Object Detection and Instance Segmentation

SOTR: Segmenting Objects with Transformers

Scaling up instance annotation via label propagation

Medical Image Segmentation

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

Video Object Segmentation

Hierarchical Memory Matching Network for Video Object Segmentation

Full-Duplex Strategy for Video Object Segmentation

Joint Inductive and Transductive Learning for Video Object Segmentation

Few-shot Segmentation

Mining Latent Classes for Few-shot Segmentation

Human Motion Segmentation

Graph Constrained Data Representation Learning for Human Motion Segmentation

Object Tracking

Learning to Track Objects from Unlabeled Videos

Learning Spatio-Temporal Transformer for Visual Tracking

Learning to Adversarially Blur Visual Object Tracking

HiFT: Hierarchical Feature Transformer for Aerial Tracking

Learn to Match: Automatic Matching Network Design for Visual Tracking

Saliency-Associated Object Tracking

RGBD Object Tracking

DepthTrack: Unveiling the Power of RGBD Tracking

3D Point Cloud

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion

DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation

Adaptive Graph Convolution for Point Cloud Analysis

3D Object Detection

Group-Free 3D Object Detection via Transformers

Improving 3D Object Detection with Channel-wise Transformer

AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection

4D-Net for Learned Multi-Modal Alignment

Voxel Transformer for 3D Object Detection

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

An End-to-End Transformer Model for 3D Object Detection

RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection

Geometry-based Distance Decomposition for Monocular 3D Object Detection

3D Semantic Segmentation

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

Learning with Noisy Labels for Robust Point Cloud Segmentation

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation

DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation

Adaptive Graph Convolution for Point Cloud Analysis

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation

3D Instance Segmentation

Hierarchical Aggregation for 3D Instance Segmentation

Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks

3D Multi-Object Tracking

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

Point Cloud Denoising

Score-Based Point Cloud Denoising

Point Cloud Registration

HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration

A Robust Loss for Point Cloud Registration

Point Cloud Completion

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Radar Semantic Segmentation

Multi-View Radar Semantic Segmentation

Image Restoration

Dynamic Attentive Graph Learning for Image Restoration

Super-Resolution

Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks

Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Dual-Camera Super-Resolution with Aligned Attention Modules

Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme

Denoising

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Rethinking Deep Image Prior for Denoising

Medical Image Denoising

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

Deblurring

Rethinking Coarse-to-Fine Approach in Single Image Deblurring

Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions

Shadow Removal

CANet: A Context-Aware Network for Shadow Removal

Video Frame Interpolation

XVFI: eXtreme Video Frame Interpolation

Asymmetric Bilateral Motion Estimation for Video Frame Interpolation

Video Inpainting

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

Person Re-identification

TransReID: Transformer-based Object Re-Identification

IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID

Person Search

Weakly Supervised Person Search with Region Siamese Networks

2D/3D Human Pose Estimation

2D Human Pose Estimation

Human Pose Regression with Residual Log-likelihood Estimation

Online Knowledge Distillation for Efficient Pose Estimation

3D Human Pose Estimation

Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

6D Object Pose Estimation

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

3D Head Reconstruction

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

Face Recognition

SynFace: Face Recognition with Synthetic Data

Facial Expression Recognition

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

Action Recognition

MGSampler: An Explainable Sampling Strategy for Video Action Recognition

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

Dynamic Network Quantization for Efficient Video Inference

Temporal Action Localization

Enriching Local and Global Contexts for Temporal Action Localization

Action Detection

Class Semantics-based Attention for Action Detection

Group Activity Recognition

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

Sign Language Recognition

Visual Alignment Constraint for Continuous Sign Language Recognition

Text Detection

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

Text Recognition

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

Text Replacement

STRIVE: Scene Text Replacement In Videos

Visual Question Answering (VQA)

Greedy Gradient Ensemble for Robust Visual Question Answering

Adversarial Attack

Feature Importance-aware Transferable Adversarial Attacks

AdvDrop: Adversarial Attack to DNNs by Dropping Information

Depth Estimation

Augmenting Depth Estimation with Geospatial Context

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

Monocular Depth Estimation

MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

Towards Interpretable Deep Networks for Monocular Depth Estimation

Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark

Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation

StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation

Gaze Estimation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation

Crowd Counting

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

Lane Detection

VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

Trajectory Prediction

Human Trajectory Prediction via Counterfactual Analysis

Personalized Trajectory Prediction via Distribution Discrimination

MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction

Social NCE: Contrastive Learning of Socially-aware Motion Representations

Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving

Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples

Anomaly Detection

Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

Scene Graph Generation

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Image Editing

Sketch Your Own GAN

Image Synthesis

Image Synthesis via Semantic Composition

Image Retrieval

Self-supervised Product Quantization for Deep Unsupervised Image Retrieval

3D Reconstruction

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Video Stabilization

Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization

Fine-Grained Recognition

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

Style Transfer

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer

Neural Painting

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Feature Matching

Learning to Match Features with Seeded Graph Matching Network

Semantic Correspondence

Multi-scale Matching Networks for Semantic Correspondence

Edge Detection

Pixel Difference Networks for Efficient Edge Detection

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

Camera Calibration

CTRL-C: Camera calibration TRansformer with Line-Classification

Image Quality Assessment

MUSIQ: Multi-scale Image Quality Transformer

Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

Metric Learning

Deep Relational Metric Learning

Towards Interpretable Deep Metric Learning with Structural Matching

Unsupervised Domain Adaptation

Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation

Video Rescaling

Self-Conditioned Probabilistic Learning of Video Rescaling

Hand-Object Interaction

Learning a Contact Potential Field to Model the Hand-Object Interaction

Vision-and-Language Navigation

Airbert: In-domain Pretraining for Vision-and-Language Navigation

Datasets

Beyond Road Extraction: A Dataset for Map Update using Aerial Images

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

Panoptic Narrative Grounding

STRIVE: Scene Text Replacement In Videos

Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme

Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes

Dual-Camera Super-Resolution with Aligned Attention Modules

DepthTrack: Unveiling the Power of RGBD Tracking

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

BioFors: A Large Biomedical Image Forensics Dataset

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

Airbert: In-domain Pretraining for Vision-and-Language Navigation

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

XVFI: eXtreme Video Frame Interpolation

Personalized Image Semantic Segmentation

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

Others

Photon-Starved Scene Inference using Single Photon Cameras

Towards Flexible Blind JPEG Artifacts Removal

Generating Attribution Maps with Disentangled Masked Backpropagation

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

ReconfigISP: Reconfigurable Camera Image Processing Pipeline

Panoptic Narrative Grounding

NEAT: Neural Attention Fields for End-to-End Autonomous Driving

Keep CALM and Improve Visual Feature Attribution

YouRefIt: Embodied Reference Understanding with Language and Gesture

Pri3D: Can 3D Priors Help 2D Representation Learning?

Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain

Continual Learning for Image-Based Camera Localization

Multi-Task Self-Training for Learning General Representations

A Unified Objective for Novel Class Discovery

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

Impact of Aliasing on Generalization in Deep Convolutional Networks

Out-of-Core Surface Reconstruction via Global TGV Minimization

Progressive Correspondence Pruning by Consensus Learning

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

Generalized Shuffled Linear Regression

Discovering 3D Parts from Image Collections

Semi-Supervised Active Learning with Temporal Output Discrepancy

Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?

Paper: https://arxiv.org/abs/2105.02498

Code: https://github.com/KingJamesSong/DifferentiableSVD

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

Equivariant Imaging: Learning Beyond the Range Space

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Owner
Amusi
Follow the WeChat official account: CVer