Leaderboard, taxonomy, and curated list of few-shot object detection papers.

Overview

Awesome Few-Shot Object Detection (FSOD)

Leaderboard, taxonomy, and curated list of few-shot object detection papers.

Maintainers: Gabriel Huang

For an introduction to the few-shot object detection framework read below, or check our our survey on few-shot and self-supervised object detection and its project page for full explanations, discussions on the pitfalls of the Pascal, COCO, and LVIS benchmarks used below, main takeaways and future research directions.

Contributing

If you want to add your paper or report a mistake, please create a pull request with all supporting information. Thanks!

Pascal VOC and MS COCO FSOD Leaderboard

In this table we distinguish Kang's Splits (Meta-YOLO) from TFA's splits (Frustratingly Simple FSOD), as the Kang splits have been shown to have high variance and overestimate performance for low number of shots (see for yourself -- check the difference between TFA 1-shot and Kang 1-shot in the table below).

Name Type VOC TFA 1-shot (mAP50) VOC TFA 3-shot (mAP50) VOC TFA 10-shot (mAP50) VOC Kang 1-shot (mAP50) VOC Kang 3-shot (mAP50) VOC Kang 10-shot (mAP50) MS COCO 10-shot (mAP) MS COCO 30-shot (mAP)
LSTD finetuning - - - 8.2 12.4 38.5 - -
RepMet prototype - - - 26.1 34.4 41.3 - -
Meta-YOLO modulation 14.2 29.8 - 14.8 26.7 47.2 5.6 9.1
MetaDet modulation - - - 18.9 30.2 49.6 7.1 11.3
Meta-RCNN modulation - - - 19.9 35.0 51.5 8.7 12.4
Faster RCNN+FT finetuning 9.9 21.6 35.6 15.2 29.0 45.5 9.2 12.5
ACM-MetaRCNN modulation - - - 31.9 35.9 53.1 9.4 12.8
TFA w/fc finetuning 22.9 40.4 52.0 36.8 43.6 57.0 10.0 13.4
TFA w/cos finetuning 25.3 42.1 52.8 39.8 44.7 56.0 10.0 13.7
Retentive RCNN finetuning - - - 42.0 46.0 56.0 10.5 13.8
MPSR finetuning - - - 41.7 51.4 61.8 9.8 14.1
Attention-FSOD modulation - - - - - - 12.0 -
FsDetView finetuning 24.2 42.2 57.4 - - - 12.5 14.7
CME finetuning - - - 41.5 50.4 60.9 15.1 16.9
TIP add-on 27.7 43.3 59.6 - - - 16.3 18.3
DAnA modulation - - - - - - 18.6 21.6
DeFRCN prototype - - - 53.6 61.5 60.8 18.5 22.6
Meta-DETR modulation 20.4 46.6 57.8 - - - 17.8 22.9
DETReg finetuning - - - - - - 18.0 30.0

Few-Shot Object Detection Explained

We explain the few-shot object detection framework as defined by the Meta-YOLO paper (Kang's splits - full details here). FSOD partitions objects into two disjoint sets of categories: base or known/source classes, which are object categories for which we have access to a large number of training examples; and novel or unseen/target classes, for which we have only a few training examples (shots) per class. The FSOD task is formalized into the following steps:

  • 1. Base training.¹ Annotations are given only for the base classes, with a large number of training examples per class (bikes in the example). We train the FSOD method on the base classes.
  • 2. Few-shot finetuning. Annotations are given for the support set, a very small number of training examples from both the base and novel classes (one bike and one human in the example). Most methods finetune the FSOD model on the support set, but some methods might only use the support set for conditioning during evaluation (finetuning-free methods).
  • 3. Few-shot evaluation. We evaluate the FSOD to jointly detect base and novel classes from the test set (few-shot refers to the size of the support set). The performance metrics are reported separately for base and novel classes. Common evaluation metrics are variants of the mean average precision: mAP50 for Pascal and COCO-style mAP for COCO. They are often denoted bAP50, bAP75, bAP (resp. nAP50, nAP75, nAP) for the base and novel classes respectively, where the number is the IoU-threshold in percentage.

In pure FSOD, methods are usually compared solely on the basis of novel class performance, whereas in Generalized FSOD, methods are compared on both base and novel class performances [2]. Note that "training" and "test" set refer to the splits used in traditional object detection. Base and novel classes are typically present in both the training and testing sets; however, the novel class annotations are filtered out from the training set during base training; during few-shot finetuning, the support set is typically taken to be a (fixed) subset of the training set; during few-shot evaluation, all of the test set is used to reduce uncertainty [1].

For conditioning-based methods with no finetuning, few-shot finetuning and few-shot evaluation are merged into a single step; the novel examples are used as support examples to condition the model, and predictions are made directly on the test set. In practice, the majority of conditioning-based methods reviewed in this survey do benefit from some form of finetuning.

*¹In the context of self-supervised learning, base-training may also be referred to as finetuning or training. This should not be confused with base training in the meta-learning framework; rather this is similar to the meta-training phase [3].

Owner
Gabriel Huang
PhD student at MILA
Gabriel Huang
PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time

PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time The implementation is based on SIGGRAPH Aisa'20. Dependencies Python 3.7 Ubuntu

soratobtai 124 Dec 08, 2022
A colab notebook for training Stylegan2-ada on colab, transfer learning onto your own dataset.

Stylegan2-Ada-Google-Colab-Starter-Notebook A no thrills colab notebook for training Stylegan2-ada on colab. transfer learning onto your own dataset h

Harnick Khera 66 Dec 16, 2022
Pairwise model for commonlit competition

Pairwise model for commonlit competition To run: - install requirements - create input directory with train_folds.csv and other competition data - cd

abhishek thakur 45 Aug 31, 2022
Time Series Forecasting with Temporal Fusion Transformer in Pytorch

Forecasting with the Temporal Fusion Transformer Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invari

Nicolás Fornasari 6 Jan 24, 2022
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.

ProSelfLC: CVPR 2021 ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks For any specific discussion or potential fu

amos_xwang 57 Dec 04, 2022
Research into Forex price prediction from price history using Deep Sequence Modeling with Stacked LSTMs.

Forex Data Prediction via Recurrent Neural Network Deep Sequence Modeling Research Paper Our research paper can be viewed here Installation Clone the

Alex Taradachuk 2 Aug 07, 2022
Vector Quantization, in Pytorch

Vector Quantization - Pytorch A vector quantization library originally transcribed from Deepmind's tensorflow implementation, made conveniently into a

Phil Wang 665 Jan 08, 2023
Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] ‘‘Exploiting Document Knowledge for Aspect-level Sentiment Classification’’

Ruidan He 146 Nov 29, 2022
Self-supervised spatio-spectro-temporal represenation learning for EEG analysis

EEG-Oriented Self-Supervised Learning and Cluster-Aware Adaptation This repository provides a tensorflow implementation of a submitted paper: EEG-Orie

Wonjun Ko 4 Jun 09, 2022
PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1) This repository contains the PyTorch implementation of "A Two

10 Aug 19, 2022
Run containerized, rootless applications with podman

Why? restrict scope of file system access run any application without root privileges creates usable "Desktop applications" to integrate into your nor

119 Dec 27, 2022
For IBM Quantum Challenge Africa 2021, 9 September (07:00 UTC) - 20 September (23:00 UTC).

IBM Quantum Challenge Africa 2021 To ensure Africa is able to apply quantum computing to solve problems relevant to the continent, the IBM Research La

Qiskit Community 48 Dec 25, 2022
A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)

Library | Paper | Slack We released two versions of OAG-BERT in CogDL package. OAG-BERT is a heterogeneous entity-augmented academic language model wh

THUDM 58 Dec 17, 2022
A faster pytorch implementation of faster r-cnn

A Faster Pytorch Implementation of Faster R-CNN Write at the beginning [05/29/2020] This repo was initaited about two years ago, developed as the firs

Jianwei Yang 7.1k Jan 01, 2023
Prior-Guided Multi-View 3D Head Reconstruction

Prior-Guided Head MVS This repository includes some reconstruction results of our IEEE TMM 2021 paper, Prior-Guided Multi-View 3D Head Reconstruction.

11 Aug 17, 2022
GitHub repository for the ICLR Computational Geometry & Topology Challenge 2021

ICLR Computational Geometry & Topology Challenge 2022 Welcome to the ICLR 2022 Computational Geometry & Topology challenge 2022 --- by the ICLR 2022 W

42 Dec 13, 2022
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

Taming Visually Guided Sound Generation • [Project Page] • [ArXiv] • [Poster] • • Listen for the samples on our project page. Overview We propose to t

Vladimir Iashin 226 Jan 03, 2023
Safe Policy Optimization with Local Features

Safe Policy Optimization with Local Feature (SPO-LF) This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization wi

Akifumi Wachi 6 Jun 05, 2022
Source code for ZePHyR: Zero-shot Pose Hypothesis Rating @ ICRA 2021

ZePHyR: Zero-shot Pose Hypothesis Rating ZePHyR is a zero-shot 6D object pose estimation pipeline. The core is a learned scoring function that compare

R-Pad - Robots Perceiving and Doing 18 Aug 22, 2022