ViDT: An Efficient and Effective Fully Transformer-based Object Detector


by Hwanjun Song1, Deqing Sun2, Sanghyuk Chun1, Varun Jampani2, Dongyoon Han1,
Byeongho Heo1, Wonjae Kim1, and Ming-Hsuan Yang2,3

1 NAVER AI Lab, 2 Google Research, 3 University of California, Merced

ViDT: Vision and Detection Transformers

Highlight

ViDT is an end-to-end, fully transformer-based object detector that produces predictions directly without using convolutional layers. Our main contributions are summarized as follows:

  • ViDT introduces a modified attention mechanism, named Reconfigured Attention Module (RAM), that enables any ViT variant to handle the appended [DET] and [PATCH] tokens for standalone object detection. Thus, we can modify the latest Swin Transformer backbone with RAM to serve as an object detector and obtain high scalability from its local attention mechanism with linear complexity.

  • ViDT adopts a lightweight encoder-free neck architecture to reduce the computational overhead while still enabling additional optimization techniques on the neck module. As a result, ViDT obtains better performance than neck-free counterparts.

  • We introduce a new concept of token matching for knowledge distillation, which transfers additional performance gains from a large model to a small model without compromising detection efficiency.

Architectural Advantages. First, ViDT combines Swin Transformer with the sequence-to-sequence paradigm for detection. Second, ViDT can use multi-scale features and additional techniques without significant computational overhead. Therefore, as a fully transformer-based object detector, ViDT facilitates better integration of vision and detection transformers.

Component Summary. There are four components: (1) RAM to extend Swin Transformer as a standalone object detector, (2) the neck decoder to exploit multi-scale features with two additional techniques, auxiliary decoding loss and iterative box refinement, (3) knowledge distillation to benefit from a large model, and (4) decoding layer drop to further accelerate inference speed.
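
The high-level token flow can be illustrated with a minimal sketch (an illustration under assumed names and sizes, not the repository's actual RAM implementation): learnable [DET] tokens are appended to the flattened [PATCH] tokens before the Swin blocks reconfigured with RAM, and the [DET] outputs are later passed to the neck decoder.

```python
import torch
import torch.nn as nn

class DetTokenStub(nn.Module):
    """Toy illustration of appending learnable [DET] tokens to [PATCH] tokens.
    Names and sizes are placeholders, not the actual ViDT/RAM implementation."""

    def __init__(self, embed_dim=192, num_det_tokens=100):
        super().__init__()
        # learnable [DET] tokens shared across all images in a batch
        self.det_tokens = nn.Parameter(torch.zeros(1, num_det_tokens, embed_dim))
        nn.init.normal_(self.det_tokens, std=0.02)

    def forward(self, patch_tokens):
        # patch_tokens: [B, num_patches, C], the flattened backbone feature map
        b = patch_tokens.size(0)
        det = self.det_tokens.expand(b, -1, -1)        # [B, 100, C]
        # RAM then applies local windowed attention among [PATCH] tokens and
        # global attention for [DET] x [DET] and [DET] x [PATCH]; only the
        # token concatenation that precedes it is shown here.
        return torch.cat([patch_tokens, det], dim=1)   # [B, num_patches + 100, C]

tokens = DetTokenStub()(torch.randn(2, 56 * 56, 192))
print(tokens.shape)  # torch.Size([2, 3236, 192])
```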

Evaluation

Index: [A. ViT Backbone], [B. Main Results], [C. Complete Analysis]

|--- A. ViT Backbone used for ViDT
|--- B. Main Results in the ViDT Paper
     |--- B.1. ViDT for 50 and 150 Epochs
     |--- B.2. Distillation with Token Matching
|--- C. Complete Component Analysis

A. ViT Backbone used for ViDT

| Backbone | Training Data | Epochs | Resolution | Params | ImageNet Acc. | Checkpoint |
|---|---|---|---|---|---|---|
| Swin-nano | ImageNet-1K | 300 | 224 | 6M | 74.9% | Github |
| Swin-tiny | ImageNet-1K | 300 | 224 | 28M | 81.2% | Github |
| Swin-small | ImageNet-1K | 300 | 224 | 50M | 83.2% | Github |
| Swin-base | ImageNet-22K | 90 | 224 | 88M | 86.3% | Github |

B. Main Results in the ViDT Paper

In the main experiments, auxiliary decoding loss and iterative box refinement were used as auxiliary techniques on the neck structure.
The efficacy of distillation with token matching and decoding layer drop is verified independently in the Complete Component Analysis.
All models were re-trained with the final version of the source code, so the values may differ very slightly from those in the paper.
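
Auxiliary decoding loss is the standard DETR-style technique of applying the same set-prediction loss to every decoder layer's output rather than only the last one, while iterative box refinement lets each layer refine the boxes predicted by the previous layer. Below is a hedged sketch of the auxiliary loss only; `criterion` and `layer_outputs` are illustrative names, not the repository's API.

```python
# Hedged sketch of auxiliary decoding loss: the same detection loss is applied
# to every decoder layer's predictions, and the per-layer losses are summed.
def total_loss_with_aux(layer_outputs, targets, criterion):
    # layer_outputs: list of per-layer prediction dicts; the last entry is the final layer
    final_loss = criterion(layer_outputs[-1], targets)
    aux_loss = sum(criterion(out, targets) for out in layer_outputs[:-1])
    return final_loss + aux_loss
```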

B.1. ViDT for 50 and 150 Epochs

| Backbone | Epochs | AP | AP50 | AP75 | AP_S | AP_M | AP_L | Params | FPS | Checkpoint / Log |
|---|---|---|---|---|---|---|---|---|---|---|
| Swin-nano | 50 (150) | 40.4 (42.6) | 59.9 (62.2) | 43.0 (45.7) | 23.1 (24.9) | 42.8 (45.4) | 55.9 (59.1) | 16M | 20.0 | Github / Log (Github / Log) |
| Swin-tiny | 50 (150) | 44.9 (47.2) | 64.7 (66.7) | 48.3 (51.4) | 27.5 (28.4) | 47.9 (50.2) | 61.9 (64.7) | 38M | 17.2 | Github / Log (Github / Log) |
| Swin-small | 50 (150) | 47.4 (48.8) | 67.7 (68.8) | 51.2 (53.0) | 30.4 (30.7) | 50.7 (52.0) | 64.6 (65.9) | 60M | 12.1 | Github / Log (Github / Log) |
| Swin-base | 50 (150) | 49.4 (50.4) | 69.6 (70.4) | 53.4 (54.8) | 31.6 (34.1) | 52.4 (54.2) | 66.8 (67.4) | 0.1B | 9.0 | Github / Log (Github / Log) |

B.2. Distillation with Token Matching (Coefficient 4.0)

All the models are trained for 50 epochs with distillation.

Teacher: ViDT (Swin-base) trained for 50 epochs

| Student | ViDT (Swin-nano) | ViDT (Swin-tiny) | ViDT (Swin-small) |
|---|---|---|---|
| Coefficient = 0.0 | 40.4 | 44.9 | 47.4 |
| Coefficient = 4.0 | 41.8 (Github / Log) | 46.6 (Github / Log) | 49.2 (Github / Log) |
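
Conceptually, distillation with token matching adds a penalty that pulls the student's [DET] and [PATCH] tokens toward those of a frozen, pre-trained teacher, scaled by the coefficient above. The sketch below is a simplified illustration under assumed tensor names; it is not the repository's implementation.

```python
import torch.nn.functional as F

# Hedged sketch of distillation with token matching: student tokens are matched
# to the frozen teacher's tokens with an L2-type penalty scaled by `coefficient`
# (4.0 in the table above). All tensor names are placeholders.
def token_matching_loss(student_det, teacher_det, student_patch, teacher_patch,
                        coefficient=4.0):
    loss_det = F.mse_loss(student_det, teacher_det.detach())
    loss_patch = F.mse_loss(student_patch, teacher_patch.detach())
    return coefficient * (loss_det + loss_patch)
```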

C. Complete Component Analysis

We combine the four proposed components (including distillation with token matching and decoding layer drop) to achieve both high accuracy and fast inference for object detection. For distillation, ViDT (Swin-base) trained for 50 epochs was used as the teacher for all models.

| # | RAM | Neck | Distil | Drop | Swin-nano AP / Params / FPS | Swin-tiny AP / Params / FPS | Swin-small AP / Params / FPS |
|---|---|---|---|---|---|---|---|
| (1) | ✔️ | | | | 28.7 / 7M / 36.5 | 36.3 / 29M / 28.6 | 41.6 / 52M / 16.8 |
| (2) | ✔️ | ✔️ | | | 40.4 / 16M / 20.0 | 44.9 / 38M / 17.2 | 47.4 / 60M / 12.1 |
| (3) | ✔️ | ✔️ | ✔️ | | 41.8 / 16M / 20.0 | 46.6 / 38M / 17.2 | 49.2 / 60M / 12.1 |
| (4) | ✔️ | ✔️ | ✔️ | ✔️ | 41.6 / 13M / 23.0 | 46.4 / 35M / 19.5 | 49.1 / 58M / 13.0 |
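
Decoding layer drop, row (4), removes some of the last decoder layers at inference time; because every layer is trained with the auxiliary decoding loss, an intermediate layer's predictions remain usable, trading a small amount of AP for fewer parameters and higher FPS. A minimal sketch under assumed names (`decoder_layers`, `head`, and `keep` are illustrative):

```python
# Hedged sketch of decoding layer drop at inference time: run only the first
# `keep` decoder layers and read predictions from that layer's output.
def decode_with_layer_drop(det_tokens, memory, decoder_layers, head, keep=4):
    hidden = det_tokens
    for layer in decoder_layers[:keep]:    # the remaining layers are dropped
        hidden = layer(hidden, memory)
    return head(hidden)                    # class logits / boxes from layer `keep`
```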

Requirements

This codebase has been developed with the setting used in Deformable DETR:
Linux, CUDA>=9.2, GCC>=5.4, Python>=3.7, PyTorch>=1.5.1, and torchvision>=0.6.1.

We recommend using Anaconda to create a conda environment:

conda create -n deformable_detr python=3.7 pip
conda activate deformable_detr
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch

Compiling CUDA operators for deformable attention

cd ./ops
sh ./make.sh
# unit test (you should see that all checks report True)
python test.py

Other requirements

pip install -r requirements.txt

Training

We used the commands below to train ViDT models on a single node with 8 NVIDIA V100 GPUs.

Run this command to train the ViDT (Swin-nano) model in the paper:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_nano \
       --epochs 50 \
       --lr 1e-4 \
       --min-lr 1e-7 \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --output_dir /path/for/output

Run this command to train the ViDT (Swin-tiny) model in the paper:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_tiny \
       --epochs 50 \
       --lr 1e-4 \
       --min-lr 1e-7 \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --output_dir /path/for/output

Run this command to train the ViDT (Swin-small) model in the paper:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_small \
       --epochs 50 \
       --lr 1e-4 \
       --min-lr 1e-7 \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --output_dir /path/for/output

Run this command to train the ViDT (Swin-base) model in the paper:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_base_win7_22k \
       --epochs 50 \
       --lr 1e-4 \
       --min-lr 1e-7 \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --output_dir /path/for/output

When a large pre-trained ViDT model is available, distillation with token matching can be applied to train a smaller ViDT model.

Run this command to train ViDT (Swin-nano) with a large ViDT (Swin-base) teacher via knowledge distillation:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_nano \
       --epochs 50 \
       --lr 1e-4 \
       --min-lr 1e-7 \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --distil_model vidt_base \
       --distil_path /path/to/vidt_base (or url) \
       --coco_path /path/to/coco \
       --output_dir /path/for/output

Evaluation

Run this command to evaluate the ViDT (Swin-nano) model on COCO:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_nano \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --resume /path/to/vidt_nano \
       --pre_trained none \
       --eval True

Run this command to evaluate the ViDT (Swin-tiny) model on COCO:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_tiny \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --resume /path/to/vidt_tiny \
       --pre_trained none \
       --eval True

Run this command to evaluate the ViDT (Swin-small) model on COCO:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_small \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --resume /path/to/vidt_small \
       --pre_trained none \
       --eval True

Run this command to evaluate the ViDT (Swin-base) model on COCO:

python -m torch.distributed.launch \
       --nproc_per_node=8 \
       --nnodes=1 \
       --use_env main.py \
       --method vidt \
       --backbone_name swin_base_win7_22k \
       --batch_size 2 \
       --num_workers 2 \
       --aux_loss True \
       --with_box_refine True \
       --coco_path /path/to/coco \
       --resume /path/to/vidt_base \
       --pre_trained none \
       --eval True

Citation

Please consider citing our paper if it is useful for your research.

@article{song2021vidt,
  title={ViDT: An Efficient and Effective Fully Transformer-based Object Detector},
  author={Song, Hwanjun and Sun, Deqing and Chun, Sanghyuk and Jampani, Varun and Han, Dongyoon and Heo, Byeongho and Kim, Wonjae and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2110.03921},
  year={2021}
}

License

Copyright 2021-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.