DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

DETReg

This repository is the implementation of DETReg; see the project page for details.

Release

  • COCO training code and eval - DONE
  • Pretrained models - DONE
  • Pascal VOC training code and eval - TODO

Introduction

DETReg is an unsupervised pretraining approach for object DEtection with TRansformers using Region priors. Motivated by the two tasks underlying object detection, localization and categorization, we combine two complementary signals for self-supervision. For an object localization signal, we use pseudo ground truth object bounding boxes from an off-the-shelf unsupervised region proposal method, Selective Search, which does not require training data and can detect objects at a high recall rate but with very low precision. The categorization signal comes from an object embedding loss that encourages invariant object representations, from which the object category can be inferred. We show how to combine these two signals to train the Deformable DETR detection architecture from large amounts of unlabeled data. DETReg improves performance over competitive baselines and previous self-supervised methods on standard benchmarks such as MS COCO and PASCAL VOC. DETReg also outperforms previous supervised and unsupervised baseline approaches in the low-data regime when trained with only 1%, 2%, 5%, and 10% of the labeled data on MS COCO.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n detreg python=3.7 pip

    Then, activate the environment:

    conda activate detreg

    Install PyTorch (change cudatoolkit to match your CUDA version; see the official PyTorch installation instructions for details):

    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
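The version requirements above (CUDA>=9.2, GCC>=5.4, Python>=3.7) can be checked programmatically. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `version_at_least`, and `python_is_supported` are hypothetical, and the CUDA and GCC version strings are example inputs rather than values read from the system. It compares versions as integer tuples, because plain string comparison would wrongly rank "10.2" below "9.2".

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '10.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def version_at_least(found, required):
    """Compare dotted versions numerically, so '10.2' >= '9.2' holds."""
    return parse_version(found) >= parse_version(required)


def python_is_supported(minimum="3.7"):
    """Check the running interpreter against the Python>=3.7 requirement."""
    current = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return version_at_least(current, minimum)


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    # The versions below are example inputs, not detected from the system.
    print("CUDA 10.2 OK:", version_at_least("10.2", "9.2"))
    print("GCC 5.3 OK:", version_at_least("5.3", "5.4"))
```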

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download the COCO 2017 dataset and ImageNet and organize them as follows:

code_root/
└── data/
    ├── ilsvrc/
    │   ├── train/
    │   └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json

Note that in this work we used the ImageNet100 dataset, which is 10x smaller than ImageNet. To create ImageNet100, run the following commands:

mkdir -p data/ilsvrc100/train
mkdir -p data/ilsvrc100/val
while read line; do ln -s <code_root>/data/ilsvrc/train/$line <code_root>/data/ilsvrc100/train/$line; done < <code_root>/datasets/category.txt
while read line; do ln -s <code_root>/data/ilsvrc/val/$line <code_root>/data/ilsvrc100/val/$line; done < <code_root>/datasets/category.txt
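For readers who prefer Python over the shell loop, the same symlinking step could be sketched as below. This is an illustrative equivalent rather than a script shipped with the repository: the function name `link_subset` is hypothetical, and it assumes that the category file lists one class folder name per line, as the shell loop above implies.

```python
from pathlib import Path


def link_subset(source_root, target_root, category_file):
    """Symlink each class folder listed in category_file into target_root."""
    source_root = Path(source_root).resolve()
    target_root = Path(target_root)
    target_root.mkdir(parents=True, exist_ok=True)
    linked = []
    for line in Path(category_file).read_text().splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines in the category file
        link = target_root / name
        # Only create the link if nothing is there yet, so reruns are safe.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(source_root / name, target_is_directory=True)
        linked.append(name)
    return linked
```

Calling it once with data/ilsvrc/train and data/ilsvrc100/train, and once with the corresponding val folders, both with datasets/category.txt, would mirror the two shell loops.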

After creating the symlinks, the directory structure should look as follows:

code_root/
└── data/
    ├── ilsvrc/
    │   ├── train/
    │   └── val/
    ├── ilsvrc100/
    │   ├── train/
    │   └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json
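Before launching training, it can help to confirm that this layout is in place. The snippet below is a small sketch for that purpose; the names `EXPECTED` and `missing_paths` are hypothetical and not part of the repository, and the path list is simply transcribed from the tree above.

```python
from pathlib import Path

# Paths transcribed from the directory tree in this README.
EXPECTED = [
    "data/ilsvrc/train",
    "data/ilsvrc/val",
    "data/ilsvrc100/train",
    "data/ilsvrc100/val",
    "data/MSCoco/train2017",
    "data/MSCoco/val2017",
    "data/MSCoco/annotations/instances_train2017.json",
    "data/MSCoco/annotations/instances_val2017.json",
]


def missing_paths(code_root, expected=EXPECTED):
    """Return the expected dataset paths that do not exist under code_root."""
    root = Path(code_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_paths(".")`, run from the repository root, would indicate that every expected folder and annotation file is present.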

Create ImageNet Selective Search boxes:

Download the precomputed ImageNet boxes and extract in the cache folder:

mkdir -p /cache/ilsvrc && cd /cache/ilsvrc 
wget https://github.com/amirbar/DETReg/releases/download/1.0.0/ss_box_cache.tar.gz
tar -xf ss_box_cache.tar.gz

Alternatively, you can compute Selective Search boxes yourself:

To create selective search boxes for ImageNet100 on a single machine, run the following command (set num_processes):

python -m datasets.cache_ss --dataset imagenet100 --part 0 --num_m 1 --num_p <num_processes_to_use> 

To speed up the creation of boxes, change the arguments accordingly and run the following command on each different machine:

python -m datasets.cache_ss --dataset imagenet100 --part <machine_number> --num_m <num_machines> --num_p <num_processes_to_use> 
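To illustrate how a --part / --num_m split can divide the work, here is a sketch of one common sharding scheme in which each machine takes every num_machines-th item. This is an assumption for illustration only: the function `shard` is hypothetical, and datasets.cache_ss may partition the images differently.

```python
def shard(items, part, num_machines):
    """Return the items handled by machine `part` (0-based) of num_machines."""
    if not 0 <= part < num_machines:
        raise ValueError("part must be in [0, num_machines)")
    # Strided slicing gives disjoint shards that together cover all items.
    return items[part::num_machines]


if __name__ == "__main__":
    images = ["img_{}.JPEG".format(i) for i in range(7)]  # placeholder names
    for machine in range(3):
        print(machine, shard(images, machine, 3))
```

With a single machine (part 0, num_machines 1), the shard is the full list, which matches the single-machine command above.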

The cached boxes are saved in the following structure:

code_root/
└── cache/
    └── ilsvrc/

Training

The command for pretraining DETReg on 8 GPUs on ImageNet100 is as follows:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_top30_in100.sh --batch_size 24 --num_workers 8

Training takes around 1.5 days with 8 NVIDIA V100 GPUs. You can download a pretrained model (see below) if you want to skip this step.

After pretraining, a checkpoint is saved in exps/DETReg_top30_in100/checkpoint.pth. To fine-tune it on different COCO settings, use the following commands.

Fine tuning on full COCO (should take 2 days with 8 NVIDIA V100 GPUs):

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_fine_tune_full_coco.sh

Smaller subsets train faster, so you can use fewer GPUs (e.g., 4 with batch size 2).

Fine tuning on 1%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_1pct_coco.sh --batch_size 2

Fine tuning on 2%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_2pct_coco.sh --batch_size 2

Fine tuning on 5%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_5pct_coco.sh --batch_size 2

Fine tuning on 10%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_10pct_coco.sh --batch_size 2
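The fine-tuning commands above all follow one pattern, so they can be generated rather than typed by hand. The helper below is a hypothetical convenience sketch, not part of the repository; it assumes a single node, so GPUS_PER_NODE equals the GPU count passed to the launcher, as in every command shown in this README.

```python
def launch_command(num_gpus, config, extra_args=()):
    """Build a launch line in the form used by this README."""
    parts = [
        "GPUS_PER_NODE={}".format(num_gpus),
        "./tools/run_dist_launch.sh",
        str(num_gpus),
        "./configs/{}.sh".format(config),
    ]
    parts.extend(extra_args)
    return " ".join(parts)


def low_data_commands(percentages=(1, 2, 5, 10), num_gpus=4, batch_size=2):
    """Return one fine-tuning command per labeled-data percentage."""
    return [
        launch_command(
            num_gpus,
            "DETReg_fine_tune_{}pct_coco".format(pct),
            ["--batch_size", str(batch_size)],
        )
        for pct in percentages
    ]


if __name__ == "__main__":
    for command in low_data_commands():
        print(command)
```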

Evaluation

To evaluate a finetuned model, use the following command from the project basedir:

./configs/<config file>.sh --resume exps/<config file>/checkpoint.pth --eval
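The evaluation command pairs a config script with the checkpoint saved under the same name in exps/. As a sketch, a small hypothetical helper (`eval_command`, not part of the repository) could assemble the argument list and fail early when the checkpoint is missing:

```python
from pathlib import Path


def eval_command(config_name, code_root="."):
    """Return the evaluation command for a config, as an argument list."""
    checkpoint = Path("exps") / config_name / "checkpoint.pth"
    if not (Path(code_root) / checkpoint).is_file():
        raise FileNotFoundError("no checkpoint at {}".format(checkpoint))
    return [
        "./configs/{}.sh".format(config_name),
        "--resume",
        checkpoint.as_posix(),
        "--eval",
    ]
```

The returned list could be passed to `subprocess.run(..., cwd=code_root)` so that the relative paths resolve from the project base directory.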

Pretrained Models

Cite

If you found this code helpful, feel free to cite our work:

@misc{bar2021detreg,
      title={DETReg: Unsupervised Pretraining with Region Priors for Object Detection},
      author={Amir Bar and Xin Wang and Vadim Kantorov and Colorado J Reed and Roei Herzig and Gal Chechik and Anna Rohrbach and Trevor Darrell and Amir Globerson},
      year={2021},
      eprint={2106.04550},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Related Works

If you found DETReg useful, consider checking out these related works as well: ReSim, SwAV, DETR, UP-DETR, and Deformable DETR.

Acknowledgments

DETReg builds on the code bases of previous works such as Deformable DETR and UP-DETR. If you found DETReg useful, please consider citing these works as well.

Comments
  • Question about reproducing the Semi-supervised Learning experiment

    When I use this checkpoint for pretraining

    image

    and use this script to reproduce the semi-supervised learning experiment

    image

    the results differ hugely:

    image

    Please help me. Did I miss anything when reproducing?

    By the way, I can reproduce the full COCO result (45.5 AP), so the conda environment is probably right.

    opened by 4-0-4-notfound 5
  • Question about selective search cached boxes in training and validation

    Why are there some '.npy' files for the ImageNet validation set in 'ss_box_cache.tar.gz', for example ILSVRC2012_val_00000006.npy? Are both the training and validation sets used for pretraining?

    opened by CQIITLAB 3
  • RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    Hi, I'm trying to run the pretraining but I receive a mismatch size here https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328, src_features has a shape of torch.Size([228, 512]) and target_features a shape of torch.Size([228, 3, 128, 128]). Is this ok?

    Start training
    /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.)
      return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
    /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.)
      return torch.floor_divide(self, other)
    /home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
      return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
    Traceback (most recent call last):
      File "main.py", line 403, in
        main(args)
      File "main.py", line 314, in main
        model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
      File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch
        loss_dict = criterion(outputs, targets)
      File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward
        losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs))
      File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss
        return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
      File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss
        return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')}
      File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
      File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors
        return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined]
    RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3
    Traceback (most recent call last):
      File "./tools/launch.py", line 192, in
        main()
      File "./tools/launch.py", line 188, in main
        cmd=process.args)

    Using:

    cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64
    pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64
    torchaudio 0.9.0 py37 pytorch/linux-64
    torchvision 0.10.0 py37_cu111 pytorch/linux-64

    Thanks and great work!

    opened by jossalgon 3
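To see why the two shapes reported in this issue cannot be combined, the broadcasting rule can be sketched in plain Python. This is an illustrative re-implementation of the standard trailing-dimension rule, not PyTorch code and not a fix for the issue; the function name `broadcast_shape` is hypothetical. Shapes are right-aligned and compared from the last dimension backwards, and two sizes are compatible only if they are equal or one of them is 1.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = [0] * ndim
    # Walk from the trailing dimension to the leading one.
    for dim in range(ndim - 1, -1, -1):
        if a[dim] == b[dim] or b[dim] == 1:
            result[dim] = a[dim]
        elif a[dim] == 1:
            result[dim] = b[dim]
        else:
            raise ValueError(
                "size {} must match size {} at dimension {}".format(
                    a[dim], b[dim], dim
                )
            )
    return tuple(result)


if __name__ == "__main__":
    try:
        broadcast_shape((228, 512), (228, 3, 128, 128))
    except ValueError as error:
        print(error)
```

For the shapes in the report, [228, 512] and [228, 3, 128, 128], the first comparison is 512 against 128 in the last dimension (index 3), which is where the check fails.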
  • Error occurred when following the Compiling CUDA operators step.

    Hello, when I try to run sh ./make.sh by following the Compiling CUDA operators step, it always shows this error:

    Traceback (most recent call last):
      File "setup.py", line 69, in
        ext_modules=get_extensions(),
      File "setup.py", line 47, in get_extensions
        raise NotImplementedError('Cuda is not availabel')
    NotImplementedError: Cuda is not availabel

    Any idea why this happens? Thanks!

    opened by ruizhaoz 2
  • Can`t install MultiScaleDeformableAttention

    Hi, I have seen others have resolved this, but no clues are left behind. I was not able to install the MultiScaleDeformableAttention package from pip or conda, and there is nothing in Readme.

    Please assist. Thank you!

    opened by jshtok 2
  • Results between IN100 and IN1k setting

    In the arXiv v1 version, the fine-tuning result on COCO is 45.5 with IN100 pretraining. In the arXiv v2 version, the fine-tuning result on COCO still seems to be 45.5, but the pretraining dataset is IN1k. So, as I understand it, the fine-tuning result does not improve with more pretraining data?

    opened by 4-0-4-notfound 2
  • Fine Tuning the Model on a fraction of VOC

    Hi @amirbar,

    Thank you for the great work. It looks like the parameter --filter_pct is never used in the code. This means the code is effectively running fine-tuning on the whole VOC/COCO datasets. Please correct me if I am wrong.

    Thanks

    opened by mmaaz60 2
  • It seems in the pretrain stage the network output 90 categories instead of 2

    Hello, it seems the network outputs 90 categories instead of 2 in the pretraining stage. According to the paper, it is supposed to output 2 categories (either background or foreground), which is not the case in the code. I'm confused. Am I missing something?

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/configs/DETReg_top30_in100.sh#L8

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/main.py#L120

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/models/deformable_detr.py#L497-L503

    opened by 4-0-4-notfound 2
  • Pretrained model on ImageNet-1K

    Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

    Thank you

    opened by hanoonaR 2
  • Bug: Target["area"] incorrect when using selective_search (and possibly others)

    The selective_search function changes the boxes to xyxy coordinates:

    boxes[..., 2] = boxes[..., 0] + boxes[..., 2]
    boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

    In get_item (https://github.com/amirbar/DETReg/blob/36ae5844183499f6bc1a6d8922427b0f473e06d9/datasets/selfdet.py#L67) we have:

    boxes = selective_search(img, h, w, res_size=128)
    ...
    target['boxes'] = torch.tensor(boxes)
    ...
    target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

    But at this point the boxes are in xyxy format, not cxcywh, so the "area" is incorrect. I do not know whether this affects anything down the line; it may not.

    opened by AZaitzeff 1
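The discrepancy described in this issue can be shown with a small worked example. This is a plain-Python sketch with hypothetical helper names (`xywh_to_xyxy`, `area_xyxy`), not code from the repository: once a box has been converted to xyxy, its area is the product of the coordinate differences, whereas multiplying entries 2 and 3 directly gives x2 * y2.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def area_xyxy(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes give 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


if __name__ == "__main__":
    box = xywh_to_xyxy((10, 20, 30, 40))
    print("xyxy box:", box)
    print("area from differences:", area_xyxy(box))
    print("product of entries 2 and 3:", box[2] * box[3])
```

For a 30 by 40 box at (10, 20), the difference-based area is 1200, while the product of entries 2 and 3 of the xyxy box is 40 * 60 = 2400.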
  • What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    https://github.com/amirbar/DETReg/blob/0a258d879d8981b27ab032b83defc6dfcbf07d35/models/backbone.py#L156-L177

    It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

    Thanks.

    opened by Cohesion97 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
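The kind of check described in this report can be sketched as follows. This is an illustration of the general technique and not the actual patch from the pull request: the names `is_within_directory` and `safe_extract` are hypothetical, and the sketch validates member names only (it does not inspect symlink or hardlink targets).

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(archive_path, destination):
    """Extract a tar archive, refusing members that escape destination."""
    with tarfile.open(archive_path) as archive:
        # Validate every member before extracting anything.
        for member in archive.getmembers():
            member_path = os.path.join(destination, member.name)
            if not is_within_directory(destination, member_path):
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(destination)
```

Recent Python versions also provide extraction filters in the tarfile module, which may be preferable where available.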
  • Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Thanks for your work. I noticed that you open-sourced DETReg for the DETR architecture, so I tried to fine-tune the ImageNet-pretrained model you provided on my custom dataset. However, all the metrics are still 0 after more than fifty batches of training. I followed the tips in the related DETR issues (https://github.com/facebookresearch/detr/issues?page=1&q=zero) and modified num_classes. Many people mention that DETR requires a large amount of training data, or fine-tuning. I am already fine-tuning, and my fine-tuning dataset has about one thousand images, but the results are still very poor. May I ask why? With the Deformable DETR architecture, training works normally for me.

    image

    opened by Flyooofly 0
  •   checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    1: In the evaluation stage I get the following error:

    checkpoint_args = torch.load(args.resume,map_location='cpu')['args'] KeyError: 'args'

    I don't know how to deal with it. I hope the owner can help me solve it, thank you very much.

    2: Is the test performance expected to be lower when using a single GPU?

    opened by 873552584 0
  • About few-shot object detection

    I found that the few-shot object detection results are better than those of other methods. Could you release the few-shot object detection code, the hyperparameters, or instructions for importing the novel and base datasets? Thanks :)

    opened by YAOSL98 0
  • urllib.error.HTTPError: HTTP Error 403: Forbidden

    Downloading: "https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar" to C:\Users\pc/.cache\torch\hub\checkpoints\swav_800ep_pretrain.pth.tar
    urllib.error.HTTPError: HTTP Error 403: Forbidden

    I hope this can be solved, thanks.

    opened by GDzhu01 0
Releases(1.0.0)