A PyTorch version of You Only Look at One-level Feature object detector

Overview

PyTorch_YOLOF

A PyTorch version of You Only Look at One-level Feature object detector.

The input image must be resized to have their shorter side being 800 and their longer side less or equal to 1333.

During reproducing the YOLOF, I found many tricks used in YOLOF but the baseline RetinaNet dosen't use those tricks. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, large learning rate, big batchsize(like 64), negative prediction threshold. Is it really fair that YOLOF use these tricks to compare with RetinaNet?

In a other word, whether the YOLOF can still work without those tricks?

Requirements

  • We recommend you to use Anaconda to create a conda environment:
conda create -n yolof python=3.6
  • Then, activate the environment:
conda activate yolof
  • Requirements:
pip install -r requirements.txt 

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive sample

You can run following command to visualize positiva sample:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

image mask

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: IoU Top4 (Different from the official matcher that uses top4 of L1 distance.)
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip

We ignore the loss of samples who are not in image.

Method AP AP50 AP75 APs APm APl
w/o mask 28.3 46.7 28.9 13.4 33.4 39.9
w mask 28.4 46.9 29.1 13.5 33.5 39.1

L1 Top4

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

IoU topk: We choose the topK of IoU between anchor boxes and labels as the positive samples.

L1 topk: We choose the topK of L1 distance between anchor boxes and labels as the positive samples.

Method AP AP50 AP75 APs APm APl
IoU Top4 28.4 46.9 29.1 13.5 33.5 39.1
L1 Top4 28.6 46.9 29.4 13.8 34.0 39.0

RandomShift Augmentation

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

YOLOF takes advantage of RandomShift augmentation which is not used in RetinaNet.

Method AP AP50 AP75 APs APm APl
w/o RandomShift 28.6 46.9 29.4 13.8 34.0 39.0
w/ RandomShift 29.0 47.3 29.8 14.2 34.2 38.9

Fix a bug in dataloader

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

I fixed a bug in dataloader. Specifically, I set the shuffle in dataloader as False ...

Method AP AP50 AP75 APs APm APl
bug 29.0 47.3 29.8 14.2 34.2 38.9
no bug 30.1 49.0 31.0 15.2 36.3 39.8

Ignore samples

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

We ignore those negative samples whose IoU with labels are higher the ignore threshold (igt).

Method AP AP50 AP75 APs APm APl
no igt 30.1 49.0 31.0 15.2 36.3 39.8
igt=0.7

Decode boxes

  • Backbone: ResNet-50
  • image size: shorter size = 800, longer size <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

The Method-2 is following the operation used in YOLOF.

Method AP AP50 AP75 APs APm APl
Method-1
Method-2

Train

sh train.sh

You can change the configurations of train.sh.

If you just want to check which anchor box is assigned to the positive sample, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

According to your own situation, you can make necessary adjustments to the above run commands

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

You might also like...
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling
Stacked Hourglass Network with a Multi-level Attention Mechanism: Where to Look for Intervertebral Disc Labeling

⚠️ ‎‎‎ A more recent and actively-maintained version of this code is available in ivadomed Stacked Hourglass Network with a Multi-level Attention Mech

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks
implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

YOLOR implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks To reproduce the results in the paper, please us

 You Only 👀 One Sequence
You Only 👀 One Sequence

You Only 👀 One Sequence TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO obje

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

A Pytorch Implementation of [Source data‐free domain adaptation of object detector through domain

A Pytorch Implementation of Source data‐free domain adaptation of object detector through domain‐specific perturbation Please follow Faster R-CNN and

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks Please follow Faster R-CNN and DAF to complete the enviro

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Comments
  • fix typo

    fix typo

    When I run the eval process on VOC dataset, an error occurs:

    Traceback (most recent call last):
      File "eval.py", line 126, in <module>
        voc_test(model, data_dir, device, transform)
      File "eval.py", line 42, in voc_test
        display=True)
    TypeError: __init__() got an unexpected keyword argument 'data_root'
    

    I discovered that this was due to a typo and simply fixed it. Everything is going well now.

    opened by guohanli 1
  • 标签生成函数写得有问题

    标签生成函数写得有问题

    源码中的标签生成逻辑是: 1.利用预测框与gt的l1距离筛选出topk个锚点,再利用锚点与gt的l1距离筛选出topk个锚点,将之作为预选正例锚点。 2.将预选正例锚点依据iou与gt匹配,滤除与锚点iou小于0.15的预选正例锚点 3.将gt与预测框iou<=0.7的预测框对应锚点设置为负例锚点 (而您只用了锚点,没有预选,也没用预测框)

    opened by Mr-Z-NewStar 11
Owner
Jianhua Yang
I love anime!!I love ACG!! The universe is so big,I want to fly and wander.
Jianhua Yang
Framework web SnakeServer.

SnakeServer - Framework Web 🐍 Documentação oficial do framework SnakeServer. Conteúdo Sobre Como contribuir Enviar relatórios de segurança Pull reque

Jaedson Silva 0 Jul 21, 2022
Api for getting bin info and getting encrypted card details for adyen.

Bin Info And Adyen Cse Enc Python api for getting bin info and getting encrypted

Roldex Stark 8 Dec 30, 2022
bio_inspired_min_nets_improve_the_performance_and_robustness_of_deep_networks

Code Submission for: Bio-inspired Min-Nets Improve the Performance and Robustness of Deep Networks Run with docker To build a docker environment, chan

0 Dec 09, 2021
MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data

This repository is the official PyTorch implementation of Meta-Balance. Find the paper on arxiv MetaBalance: High-Performance Neural Networks for Clas

Arpit Bansal 20 Oct 18, 2021
A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)

Library | Paper | Slack We released two versions of OAG-BERT in CogDL package. OAG-BERT is a heterogeneous entity-augmented academic language model wh

THUDM 58 Dec 17, 2022
bespoke tooling for offensive security's Windows Usermode Exploit Dev course (OSED)

osed-scripts bespoke tooling for offensive security's Windows Usermode Exploit Dev course (OSED) Table of Contents Standalone Scripts egghunter.py fin

epi 268 Jan 05, 2023
Rule Extraction Methods for Interactive eXplainability

REMIX: Rule Extraction Methods for Interactive eXplainability This repository contains a variety of tools and methods for extracting interpretable rul

Mateo Espinosa Zarlenga 21 Jan 03, 2023
This repository contains the map content ontology used in narrative cartography

Narrative-cartography-ontology This repository contains the map content ontology used in narrative cartography, which is associated with a submission

Weiming Huang 0 Oct 31, 2021
Predict and time series avocado hass

RECOMMENDER SYSTEM MARKETING TỔNG QUAN VỀ HỆ THỐNG DỮ LIỆU 1. Giới thiệu - Tiki là một hệ sinh thái thương mại "all in one", trong đó có tiki.vn, là

hieulmsc 3 Jan 10, 2022
As-ViT: Auto-scaling Vision Transformers without Training

As-ViT: Auto-scaling Vision Transformers without Training [PDF] Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou In ICLR 2

VITA 68 Sep 05, 2022
This repo contains the source code and a benchmark for predicting user's utilities with Machine Learning techniques for Computational Persuasion

Machine Learning for Argument-Based Computational Persuasion This repo contains the source code and a benchmark for predicting user's utilities with M

Ivan Donadello 4 Nov 07, 2022
Official code for the paper: Deep Graph Matching under Quadratic Constraint (CVPR 2021)

QC-DGM This is the official PyTorch implementation and models for our CVPR 2021 paper: Deep Graph Matching under Quadratic Constraint. It also contain

Quankai Gao 55 Nov 14, 2022
Compute descriptors for 3D point cloud registration using a multi scale sparse voxel architecture

MS-SVConv : 3D Point Cloud Registration with Multi-Scale Architecture and Self-supervised Fine-tuning Compute features for 3D point cloud registration

42 Jul 25, 2022
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Randomised controlled trial abstract result tabulator RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into

2 Sep 16, 2022
FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction. It uses a customized encoder decoder architecture with spatio-temporal convolutions and channel ga

Tarun K 280 Dec 23, 2022
ICNet for Real-Time Semantic Segmentation on High-Resolution Images, ECCV2018

ICNet for Real-Time Semantic Segmentation on High-Resolution Images by Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia, details a

Hengshuang Zhao 594 Dec 31, 2022
Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train format

ttopt Description Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train (TT) format and maximu

5 May 23, 2022
Learning infinite-resolution image processing with GAN and RL from unpaired image datasets, using a differentiable photo editing model.

Exposure: A White-Box Photo Post-Processing Framework ACM Transactions on Graphics (presented at SIGGRAPH 2018) Yuanming Hu1,2, Hao He1,2, Chenxi Xu1,

Yuanming Hu 719 Dec 29, 2022
This repository contains the code for our paper VDA (public in EMNLP2021 main conference)

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models This repository contains the code for our paper VDA (publ

RUCAIBox 13 Aug 06, 2022
Keras Implementation of Neural Style Transfer from the paper "A Neural Algorithm of Artistic Style"

Neural Style Transfer & Neural Doodles Implementation of Neural Style Transfer from the paper A Neural Algorithm of Artistic Style in Keras 2.0+ INetw

Somshubra Majumdar 2.2k Dec 31, 2022