Leaderboard, taxonomy, and curated list of few-shot object detection papers.

Overview

Awesome Few-Shot Object Detection (FSOD)

Leaderboard, taxonomy, and curated list of few-shot object detection papers.

Maintainers: Gabriel Huang

For an introduction to the few-shot object detection framework read below, or check our our survey on few-shot and self-supervised object detection and its project page for full explanations, discussions on the pitfalls of the Pascal, COCO, and LVIS benchmarks used below, main takeaways and future research directions.

Contributing

If you want to add your paper or report a mistake, please create a pull request with all supporting information. Thanks!

Pascal VOC and MS COCO FSOD Leaderboard

In this table we distinguish Kang's Splits (Meta-YOLO) from TFA's splits (Frustratingly Simple FSOD), as the Kang splits have been shown to have high variance and overestimate performance for low number of shots (see for yourself -- check the difference between TFA 1-shot and Kang 1-shot in the table below).

Name Type VOC TFA 1-shot (mAP50) VOC TFA 3-shot (mAP50) VOC TFA 10-shot (mAP50) VOC Kang 1-shot (mAP50) VOC Kang 3-shot (mAP50) VOC Kang 10-shot (mAP50) MS COCO 10-shot (mAP) MS COCO 30-shot (mAP)
LSTD finetuning - - - 8.2 12.4 38.5 - -
RepMet prototype - - - 26.1 34.4 41.3 - -
Meta-YOLO modulation 14.2 29.8 - 14.8 26.7 47.2 5.6 9.1
MetaDet modulation - - - 18.9 30.2 49.6 7.1 11.3
Meta-RCNN modulation - - - 19.9 35.0 51.5 8.7 12.4
Faster RCNN+FT finetuning 9.9 21.6 35.6 15.2 29.0 45.5 9.2 12.5
ACM-MetaRCNN modulation - - - 31.9 35.9 53.1 9.4 12.8
TFA w/fc finetuning 22.9 40.4 52.0 36.8 43.6 57.0 10.0 13.4
TFA w/cos finetuning 25.3 42.1 52.8 39.8 44.7 56.0 10.0 13.7
Retentive RCNN finetuning - - - 42.0 46.0 56.0 10.5 13.8
MPSR finetuning - - - 41.7 51.4 61.8 9.8 14.1
Attention-FSOD modulation - - - - - - 12.0 -
FsDetView finetuning 24.2 42.2 57.4 - - - 12.5 14.7
CME finetuning - - - 41.5 50.4 60.9 15.1 16.9
TIP add-on 27.7 43.3 59.6 - - - 16.3 18.3
DAnA modulation - - - - - - 18.6 21.6
DeFRCN prototype - - - 53.6 61.5 60.8 18.5 22.6
Meta-DETR modulation 20.4 46.6 57.8 - - - 17.8 22.9
DETReg finetuning - - - - - - 18.0 30.0

Few-Shot Object Detection Explained

We explain the few-shot object detection framework as defined by the Meta-YOLO paper (Kang's splits - full details here). FSOD partitions objects into two disjoint sets of categories: base or known/source classes, which are object categories for which we have access to a large number of training examples; and novel or unseen/target classes, for which we have only a few training examples (shots) per class. The FSOD task is formalized into the following steps:

  • 1. Base training.¹ Annotations are given only for the base classes, with a large number of training examples per class (bikes in the example). We train the FSOD method on the base classes.
  • 2. Few-shot finetuning. Annotations are given for the support set, a very small number of training examples from both the base and novel classes (one bike and one human in the example). Most methods finetune the FSOD model on the support set, but some methods might only use the support set for conditioning during evaluation (finetuning-free methods).
  • 3. Few-shot evaluation. We evaluate the FSOD to jointly detect base and novel classes from the test set (few-shot refers to the size of the support set). The performance metrics are reported separately for base and novel classes. Common evaluation metrics are variants of the mean average precision: mAP50 for Pascal and COCO-style mAP for COCO. They are often denoted bAP50, bAP75, bAP (resp. nAP50, nAP75, nAP) for the base and novel classes respectively, where the number is the IoU-threshold in percentage.

In pure FSOD, methods are usually compared solely on the basis of novel class performance, whereas in Generalized FSOD, methods are compared on both base and novel class performances [2]. Note that "training" and "test" set refer to the splits used in traditional object detection. Base and novel classes are typically present in both the training and testing sets; however, the novel class annotations are filtered out from the training set during base training; during few-shot finetuning, the support set is typically taken to be a (fixed) subset of the training set; during few-shot evaluation, all of the test set is used to reduce uncertainty [1].

For conditioning-based methods with no finetuning, few-shot finetuning and few-shot evaluation are merged into a single step; the novel examples are used as support examples to condition the model, and predictions are made directly on the test set. In practice, the majority of conditioning-based methods reviewed in this survey do benefit from some form of finetuning.

*¹In the context of self-supervised learning, base-training may also be referred to as finetuning or training. This should not be confused with base training in the meta-learning framework; rather this is similar to the meta-training phase [3].

Owner
Gabriel Huang
PhD student at MILA
Gabriel Huang
A public available dataset for road boundary detection in aerial images

Topo-boundary This is the official github repo of paper Topo-boundary: A Benchmark Dataset on Topological Road-boundary Detection Using Aerial Images

Zhenhua Xu 79 Jan 04, 2023
Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021) Hang Zhou, Yasheng Sun, Wayne Wu, Chen Cha

Hang_Zhou 628 Dec 28, 2022
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Fre-GAN Vocoder Fre-GAN: Adversarial Frequency-consistent Audio Synthesis Training: python train.py --config config.json Citation: @misc{kim2021frega

Rishikesh (ऋषिकेश) 93 Dec 17, 2022
Official repository for Fourier model that can generate periodic signals

Conditional Generation of Periodic Signals with Fourier-Based Decoder Jiyoung Lee, Wonjae Kim, Daehoon Gwak, Edward Choi This repository provides offi

8 May 25, 2022
Code for AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network (ICCV 2021).

AA-RMVSNet Code for AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network (ICCV 2021) in PyTorch. paper link: arXiv | CVF Change Log Ju

Qingtian Zhu 97 Dec 30, 2022
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2020 Links Doc

Sebastian Raschka 4.2k Jan 02, 2023
Pytorch implementation of Masked Auto-Encoder

Masked Auto-Encoder (MAE) Pytorch implementation of Masked Auto-Encoder: Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

Jiyuan 22 Dec 13, 2022
Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

The Alan Turing Institute 112 Oct 23, 2022
PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules

Dynamic Routing Between Capsules - PyTorch implementation PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules from Sara Sabour,

Adam Bielski 475 Dec 24, 2022
you can add any codes in any language by creating its respective folder (if already not available).

HACKTOBERFEST-2021-WEB-DEV Beginner-Hacktoberfest Need Your first pr for hacktoberfest 2k21 ? come on in About This is repository of Responsive Portfo

Suman Sharma 8 Oct 17, 2022
SAPIEN Manipulation Skill Benchmark

ManiSkill Benchmark SAPIEN Manipulation Skill Benchmark (abbreviated as ManiSkill, pronounced as "Many Skill") is a large-scale learning-from-demonstr

Hao Su's Lab, UCSD 107 Jan 08, 2023
Educational 2D SLAM implementation based on ICP and Pose Graph

slam-playground Educational 2D SLAM implementation based on ICP and Pose Graph How to use: Use keyboard arrow keys to navigate robot. Press 'r' to vie

Kirill 19 Dec 17, 2022
Adabelief-Optimizer - Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"

AdaBelief Optimizer NeurIPS 2020 Spotlight, trains fast as Adam, generalizes well as SGD, and is stable to train GANs. Release of package We have rele

Juntang Zhuang 998 Dec 29, 2022
Saliency - Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).

Saliency Methods 🔴 Now framework-agnostic! (Example core notebook) 🔴 🔗 For further explanation of the methods and more examples of the resulting ma

PAIR code 849 Dec 27, 2022
The BCNet related data and inference model.

BCNet This repository includes the some source code and related dataset of paper BCNet: Learning Body and Cloth Shape from A Single Image, ECCV 2020,

81 Dec 12, 2022
Pytorch implementation for "Adversarial Robustness under Long-Tailed Distribution" (CVPR 2021 Oral)

Adversarial Long-Tail This repository contains the PyTorch implementation of the paper: Adversarial Robustness under Long-Tailed Distribution, CVPR 20

Tong WU 89 Dec 15, 2022
Progressive Coordinate Transforms for Monocular 3D Object Detection

Progressive Coordinate Transforms for Monocular 3D Object Detection This repository is the official implementation of PCT. Introduction In this paper,

58 Nov 06, 2022
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Salesforce 1.3k Dec 31, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
TJU Deep Learning & Neural Network

Deep_Learning & Neural_Network_Lab 实验环境 Python 3.9 Anaconda3(官网下载或清华镜像都行) PyTorch 1.10.1(安装代码如下) conda install pytorch torchvision torchaudio cudatool

St3ve Lee 1 Jan 19, 2022