A PyTorch version of You Only Look at One-level Feature object detector

Last update: Dec 30, 2022

PyTorch_YOLOF

The input image must be resized so that its shorter side is 800 and its longer side is less than or equal to 1333.
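The resize rule above can be sketched as follows. This is a minimal illustration, not code from this repository: the function name `resized_shape` is hypothetical, and it assumes the common convention that when both constraints cannot hold, the longer side is capped at 1333 and the shorter side ends up below 800.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Return (new_height, new_width) with the shorter side scaled to
    min_size, unless that would push the longer side past max_size."""
    short, long = min(height, width), max(height, width)
    scale = min_size / short
    if long * scale > max_size:
        # Assumption: cap the longer side and let the shorter side shrink.
        scale = max_size / long
    return round(height * scale), round(width * scale)


print(resized_shape(480, 640))   # (800, 1067)
print(resized_shape(400, 1000))  # (533, 1333)
```

The second call shows the capped case: a 400 x 1000 image cannot reach a shorter side of 800 without exceeding 1333 on the longer side.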

While reproducing YOLOF, I found that it uses many tricks that the baseline RetinaNet doesn't use. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, a large learning rate, a big batch size (like 64), and a negative prediction threshold. Is it really fair for YOLOF to use these tricks when comparing with RetinaNet?

In other words, can YOLOF still work without those tricks?

Requirements

We recommend using Anaconda to create a conda environment:

conda create -n yolof python=3.6

Then, activate the environment:

conda activate yolof

Requirements:

pip install -r requirements.txt

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run the following command to visualize positive samples:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip

We ignore the loss of samples that are not inside the image.
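A minimal sketch of the idea, in plain Python rather than PyTorch so it runs without dependencies. The names `make_image_mask` and `masked_loss_sum` are hypothetical, and I assume here that "not inside the image" means locations that fall in the padded area of a batched image; the actual training code may build and apply the mask differently.

```python
def make_image_mask(padded_h, padded_w, img_h, img_w):
    """1 where a location lies inside the real image, 0 in the padding."""
    return [[1 if (y < img_h and x < img_w) else 0 for x in range(padded_w)]
            for y in range(padded_h)]


def masked_loss_sum(loss_map, mask):
    """Sum per-location losses, skipping locations in the padded area."""
    return sum(loss * keep
               for loss_row, mask_row in zip(loss_map, mask)
               for loss, keep in zip(loss_row, mask_row))


# A 2x2 image padded to 2x3: the last column is padding.
mask = make_image_mask(2, 3, img_h=2, img_w=2)
loss_map = [[0.5, 0.25, 4.0],
            [0.25, 1.0, 4.0]]
print(mask)                             # [[1, 1, 0], [1, 1, 0]]
print(masked_loss_sum(loss_map, mask))  # 2.0
```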

Method	AP	AP50	AP75	APs	APm	APl
w/o mask	28.3	46.7	28.9	13.4	33.4	39.9
w mask	28.4	46.9	29.1	13.5	33.5	39.1

L1 Top4

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
with image mask

IoU topk: We choose the K anchor boxes with the highest IoU with the labels as the positive samples.

L1 topk: We choose the K anchor boxes with the smallest L1 distance to the labels as the positive samples.
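The two matchers can be sketched for a single label as below. This is an illustration under assumptions, not the matcher in this repository: the helper names are hypothetical, boxes are assumed to be in (x1, y1, x2, y2) format, and the L1 distance is taken over those four coordinates.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def topk_iou(anchors, gt, k):
    """Indices of the k anchors with the highest IoU with gt."""
    order = sorted(range(len(anchors)), key=lambda i: iou(anchors[i], gt), reverse=True)
    return order[:k]


def topk_l1(anchors, gt, k):
    """Indices of the k anchors with the smallest L1 distance to gt."""
    order = sorted(range(len(anchors)), key=lambda i: l1_distance(anchors[i], gt))
    return order[:k]


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (0, 0, 12, 12)]
gt = (0, 0, 11, 11)
print(topk_iou(anchors, gt, k=2))
print(topk_l1(anchors, gt, k=2))
```

On this toy input both matchers pick anchors 0 and 3, only in a different order. In general they differ: IoU is zero for every non-overlapping anchor, while the L1 distance still ranks them.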

Method	AP	AP50	AP75	APs	APm	APl
IoU Top4	28.4	46.9	29.1	13.5	33.5	39.1
L1 Top4	28.6	46.9	29.4	13.8	34.0	39.0

RandomShift Augmentation

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip
with image mask

YOLOF takes advantage of RandomShift augmentation which is not used in RetinaNet.
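A rough sketch of what a RandomShift-style augmentation does to the box labels. The function name and the defaults `max_shift=32` and `prob=0.5` are illustrative assumptions, not values taken from this repository; the image itself would be translated by the same (dx, dy) offset, which is omitted here.

```python
import random


def random_shift_boxes(boxes, img_w, img_h, max_shift=32, prob=0.5, rng=random):
    """Shift all boxes by one random (dx, dy) offset, clip them to the
    image, and drop boxes left with no area. Returns (boxes, dx, dy)."""
    if rng.random() >= prob:
        return list(boxes), 0, 0  # augmentation skipped
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    shifted = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = max(0, x1 + dx), min(img_w, x2 + dx)
        y1, y2 = max(0, y1 + dy), min(img_h, y2 + dy)
        if x2 > x1 and y2 > y1:  # keep only boxes that still have area
            shifted.append((x1, y1, x2, y2))
    return shifted, dx, dy


boxes = [(10, 10, 50, 50), (90, 90, 100, 100)]
print(random_shift_boxes(boxes, img_w=100, img_h=100, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the shift reproducible, which is handy when visualizing the augmented targets.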

Method	AP	AP50	AP75	APs	APm	APl
w/o RandomShift	28.6	46.9	29.4	13.8	34.0	39.0
w/ RandomShift	29.0	47.3	29.8	14.2	34.2	38.9

Fix a bug in dataloader

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

I fixed a bug in the dataloader. Specifically, I set shuffle in the dataloader to False ...

Method	AP	AP50	AP75	APs	APm	APl
bug	29.0	47.3	29.8	14.2	34.2	38.9
no bug	30.1	49.0	31.0	15.2	36.3	39.8

Ignore samples

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

We ignore those negative samples whose IoU with the labels is higher than the ignore threshold (igt).
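The rule can be sketched as a three-way label assignment. The function name and the label encoding (1 = positive, 0 = negative, -1 = ignored) are assumptions for illustration; `max_ious` stands for whatever IoU the training code compares against igt.

```python
def assign_labels(max_ious, positive_ids, ignore_thresh=0.7):
    """Label each sample: 1 = positive, 0 = negative, -1 = ignored.

    max_ious[i] is the highest IoU between sample i and any label.
    Samples in positive_ids stay positive regardless of the threshold.
    """
    labels = []
    for i, value in enumerate(max_ious):
        if i in positive_ids:
            labels.append(1)
        elif value > ignore_thresh:
            labels.append(-1)  # would be negative, but overlaps a label too much
        else:
            labels.append(0)
    return labels


print(assign_labels([0.9, 0.75, 0.3, 0.0], positive_ids={0}))  # [1, -1, 0, 0]
```

Samples labeled -1 contribute nothing to the classification loss.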

Method	AP	AP50	AP75	APs	APm	APl
no igt	30.1	49.0	31.0	15.2	36.3	39.8
igt=0.7

Decode boxes

Backbone: ResNet-50
image size: shorter size = 800, longer size <= 1333
Batch size: 16
lr: 0.01
lr of backbone: 0.01
SGD with momentum 0.9 and weight decay 1e-4
Matcher: L1 Top4
epoch: 12 (1x schedule)
lr decay: 8, 11
augmentation: RandomFlip + RandomShift
with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

Method-2 follows the operation used in YOLOF.
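The two center-decoding formulas, written out as a small sketch. The function names and the (x, y, w, h) anchor layout are assumptions for illustration; only the arithmetic comes from the formulas above.

```python
def decode_center_method1(anchor, t_x, t_y):
    """Method-1: add the predicted offsets to the anchor center directly."""
    x_anchor, y_anchor, w_anchor, h_anchor = anchor
    return x_anchor + t_x, y_anchor + t_y


def decode_center_method2(anchor, t_x, t_y):
    """Method-2: scale the offsets by the anchor width and height first."""
    x_anchor, y_anchor, w_anchor, h_anchor = anchor
    return x_anchor + t_x * w_anchor, y_anchor + t_y * h_anchor


anchor = (100.0, 60.0, 32.0, 64.0)  # center x, center y, width, height
print(decode_center_method1(anchor, 0.5, -0.25))  # (100.5, 59.75)
print(decode_center_method2(anchor, 0.5, -0.25))  # (116.0, 44.0)
```

With Method-2 the same offset moves the center further on larger anchors, so the regression target is relative to the anchor size.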

Method	AP	AP50	AP75	APs	APm	APl
Method-1
Method-2

Train

sh train.sh

You can change the configuration in train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

You can adjust the above commands as needed for your own setup.

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

Comments

fix typo

When I run the eval process on VOC dataset, an error occurs:

Traceback (most recent call last):
  File "eval.py", line 126, in <module>
    voc_test(model, data_dir, device, transform)
  File "eval.py", line 42, in voc_test
    display=True)
TypeError: __init__() got an unexpected keyword argument 'data_root'

I discovered that this was due to a typo and simply fixed it. Everything is going well now.

opened by guohanli 1

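The label generation function has a problem

The label generation logic in the source code is: 1. Use the L1 distance between the predicted boxes and the gt to select the top-k anchors, then use the L1 distance between the anchors and the gt to select another top-k anchors, and take these as the candidate positive anchors. 2. Match the candidate positive anchors to the gt by IoU, and filter out the candidate positive anchors whose anchor IoU is less than 0.15. 3. Set as negative anchors the anchors whose predicted boxes have IoU <= 0.7 with the gt. (You only used the anchors, with no pre-selection, and you did not use the predicted boxes.)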

opened by Mr-Z-NewStar 11

Releases(YOLOF-weight)

YOLOF-weight(Mar 20, 2022)

Owner

Jianhua Yang
