A PyTorch version of You Only Look at One-level Feature object detector

Overview

PyTorch_YOLOF

A PyTorch version of You Only Look at One-level Feature object detector.

The input image must be resized to have their shorter side being 800 and their longer side less or equal to 1333.

During reproducing the YOLOF, I found many tricks used in YOLOF but the baseline RetinaNet dosen't use those tricks. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, large learning rate, big batchsize(like 64), negative prediction threshold. Is it really fair that YOLOF use these tricks to compare with RetinaNet?

In a other word, whether the YOLOF can still work without those tricks?

Requirements

  • We recommend you to use Anaconda to create a conda environment:
conda create -n yolof python=3.6
  • Then, activate the environment:
conda activate yolof
  • Requirements:
pip install -r requirements.txt 

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run following command to visualize positiva sample:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip

We ignore the loss of samples who are not in image.

Method AP AP50 AP75 APs APm APl
w/o mask 28.3 46.7 28.9 13.4 33.4 39.9
w mask 28.4 46.9 29.1 13.5 33.5 39.1

L1 Top4

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

IoU topk: We choose the topK of IoU between anchor boxes and labels as the positive samples.

L1 topk: We choose the topK of L1 distance between anchor boxes and labels as the positive samples.

Method AP AP50 AP75 APs APm APl
IoU Top4 28.4 46.9 29.1 13.5 33.5 39.1
L1 Top4 28.6 46.9 29.4 13.8 34.0 39.0

RandomShift Augmentation

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

YOLOF takes advantage of RandomShift augmentation which is not used in RetinaNet.

Method AP AP50 AP75 APs APm APl
w/o RandomShift 28.6 46.9 29.4 13.8 34.0 39.0
w/ RandomShift 29.0 47.3 29.8 14.2 34.2 38.9

Fix a bug in dataloader

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

I fixed a bug in dataloader. Specifically, I set the shuffle in dataloader as False ...

Method AP AP50 AP75 APs APm APl
bug 29.0 47.3 29.8 14.2 34.2 38.9
no bug 30.1 49.0 31.0 15.2 36.3 39.8

Ignore samples

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

We ignore those negative samples whose IoU with labels are higher the ignore threshold (igt).

Method AP AP50 AP75 APs APm APl
no igt 30.1 49.0 31.0 15.2 36.3 39.8
igt=0.7

Decode boxes

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

The Method-2 is following the operation used in YOLOF.

Method AP AP50 AP75 APs APm APl
Method-1
Method-2

Train

sh train.sh

You can change the configurations of train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

According to your own situation, you can make necessary adjustments to the above run commands

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

You might also like...
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

⚠️ ‎‎‎ A more recent and actively-maintained version of this code is available in ivadomed Stacked Hourglass Network with a Multi-level Attention Mech

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks
implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

YOLOR implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks To reproduce the results in the paper, please us

 You Only 👀 One Sequence
You Only 👀 One Sequence

You Only 👀 One Sequence TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO obje

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

A Pytorch Implementation of [Source data‐free domain adaptation of object detector through domain

A Pytorch Implementation of Source data‐free domain adaptation of object detector through domain‐specific perturbation Please follow Faster R-CNN and

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks Please follow Faster R-CNN and DAF to complete the enviro

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Comments
  • fix typo

    fix typo

    When I run the eval process on VOC dataset, an error occurs:

    Traceback (most recent call last):
      File "eval.py", line 126, in <module>
        voc_test(model, data_dir, device, transform)
      File "eval.py", line 42, in voc_test
        display=True)
    TypeError: __init__() got an unexpected keyword argument 'data_root'
    

    I discovered that this was due to a typo and simply fixed it. Everything is going well now.

    opened by guohanli 1
  • 标签生成函数写得有问题

    标签生成函数写得有问题

    源码中的标签生成逻辑是: 1.利用预测框与gt的l1距离筛选出topk个锚点,再利用锚点与gt的l1距离筛选出topk个锚点,将之作为预选正例锚点。 2.将预选正例锚点依据iou与gt匹配,滤除与锚点iou小于0.15的预选正例锚点 3.将gt与预测框iou<=0.7的预测框对应锚点设置为负例锚点 (而您只用了锚点,没有预选,也没用预测框)

    opened by Mr-Z-NewStar 11
Owner
Jianhua Yang
I love anime!!I love ACG!! The universe is so big,I want to fly and wander.
Jianhua Yang
GANsformer: Generative Adversarial Transformers Drew A

GANformer: Generative Adversarial Transformers Drew A. Hudson* & C. Lawrence Zitnick Update: We released the new GANformer2 paper! *I wish to thank Ch

Drew Arad Hudson 1.2k Jan 02, 2023
Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

Language Generation with Recurrent Generative Adversarial Networks without Pre-training Code for training and evaluation of the model from "Language G

Amir Bar 253 Sep 14, 2022
A graph-to-sequence model for one-step retrosynthesis and reaction outcome prediction.

Graph2SMILES A graph-to-sequence model for one-step retrosynthesis and reaction outcome prediction. 1. Environmental setup System requirements Ubuntu:

29 Nov 18, 2022
A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS).

UniNAS A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS). under development (which happens mostly on our internal Gi

Cognitive Systems Research Group 19 Nov 23, 2022
This is the official repository of the paper Stocastic bandits with groups of similar arms (NeurIPS 2021). It contains the code that was used to compute the figures and experiments of the paper.

Experiments How to reproduce experimental results of Stochastic bandits with groups of similar arms submitted paper ? Section 5 of the paper To reprod

Fabien 0 Oct 25, 2021
A unofficial pytorch implementation of PAN(PSENet2): Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network Requirements pytorch 1.1+ torchvision 0.3+ pyclipper opencv3 gcc

zhoujun 400 Dec 26, 2022
Capsule endoscopy detection DACON challenge

capsule_endoscopy_detection (DACON Challenge) Overview Yolov5, Yolor, mmdetection기반의 모델을 사용 (총 11개 모델 앙상블) 모든 모델은 학습 시 Pretrained Weight을 yolov5, yolo

MAILAB 11 Nov 25, 2022
Matplotlib Image labeller for classifying images

mpl-image-labeller Use Matplotlib to label images for classification. Works anywhere Matplotlib does - from the notebook to a standalone gui! For more

Ian Hunt-Isaak 5 Sep 24, 2022
Gesture-Volume-Control - This Python program can adjust the system's volume by using hand gestures

Gesture-Volume-Control This Python program can adjust the system's volume by usi

VatsalAryanBhatanagar 1 Dec 30, 2021
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks [Paper] [Project Website] This repository holds the source code, pretra

Humam Alwassel 83 Dec 21, 2022
Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

Robert Martin 1.3k Dec 29, 2022
Style transfer between images was performed using the VGG19 model

Style transfer between images was performed using the VGG19 model. The necessary codes, libraries and all other information of this project are available below

Onur yılmaz 2 May 09, 2022
Bilinear attention networks for visual question answering

Bilinear Attention Networks This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entit

Jin-Hwa Kim 506 Nov 29, 2022
Pytorch Implementation of Interaction Networks for Learning about Objects, Relations and Physics

Interaction-Network-Pytorch Pytorch Implementraion of Interaction Networks for Learning about Objects, Relations and Physics. Interaction Network is a

117 Nov 05, 2022
Accurate Phylogenetic Inference with Symmetry-Preserving Neural Networks

Accurate Phylogenetic Inference with a Symmetry-preserving Neural Network Model Claudia Solis-Lemus Shengwen Yang Leonardo Zepeda-Núñez This repositor

Leonardo Zepeda-Núñez 2 Feb 11, 2022
ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS.

ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS. It currently supports four examples for you to quickly experience the power of ONNX Runti

Microsoft 58 Dec 18, 2022
Python package for visualizing the loss landscape of parameterized quantum algorithms.

orqviz A Python package for easily visualizing the loss landscape of Variational Quantum Algorithms by Zapata Computing Inc. orqviz provides a collect

Zapata Computing, Inc. 75 Dec 30, 2022
Styleformer - Official Pytorch Implementation

Styleformer -- Official PyTorch implementation Styleformer: Transformer based Generative Adversarial Networks with Style Vector(https://arxiv.org/abs/

Jeeseung Park 159 Dec 12, 2022
Py-faster-rcnn - Faster R-CNN (Python implementation)

py-faster-rcnn has been deprecated. Please see Detectron, which includes an implementation of Mask R-CNN. Disclaimer The official Faster R-CNN code (w

Ross Girshick 7.8k Jan 03, 2023
Torch-based tool for quantizing high-dimensional vectors using additive codebooks

Trainable multi-codebook quantization This repository implements a utility for use with PyTorch, and ideally GPUs, for training an efficient quantizer

Daniel Povey 41 Jan 07, 2023