Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

To make better use of given limited labels, we propose a novel object detection approach that takes advantage of both multi-task learning (MTL) and self-supervised learning (SSL). We propose a set of auxiliary tasks that help improve the accuracy of object detection.

Here is a guide to the source code.

Reference

If you use this code or cite the paper, please refer to the following:

@inproceedings{lee2019multi,
 author = {Wonhee Lee and Joonil Na and Gunhee Kim},
 title = {Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations},
 booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
 year = {2019}
}

CVPR Poster [PPT][PDF]

Introduction [PPT][PDF]

Multi-task Learning

Multi-task learning (MTL) aims to jointly train multiple related tasks with fewer annotations in order to improve the performance of each task.

[1] An Overview of Multi-Task Learning in Deep Neural Networks

[2] Mask R-CNN

Self-supervised Learning

Self-supervised learning (SSL) aims to train a model on annotations that it generates itself, with no additional human effort.

[3] Learning Representations for Automatic Colorization

[4] Unsupervised learning of visual representations by solving jigsaw puzzles

Annotation Reuse

Reusing the labels of one task not only helps create new tasks and their labels but can also improve the performance of the main task through pretraining. Our work focuses on recycling bounding box labels for object detection.

[5] Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing

[6] Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Our approach

The key to our approach is a set of auxiliary tasks that are relevant but not identical to object detection. They create their own labels by recycling the bounding box labels (i.e. the annotations of the main task) in an SSL manner, treating the bounding box as metadata. These auxiliary tasks are then jointly trained with the object detection model in an MTL manner.
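As a rough illustration of joint training, the sketch below combines a main detection loss with auxiliary-task losses as a weighted sum, which is a common MTL formulation. The function name `joint_loss`, the task names, and the weights are hypothetical and are not taken from the paper or the source code; the actual loss used there may differ.

```python
def joint_loss(main_loss, aux_losses, aux_weights):
    """Weighted sum of the main-task loss and the auxiliary-task losses.

    Assumption: a plain weighted sum, a common MTL choice. The task names
    and weights are illustrative only, not the paper's settings.
    """
    if aux_losses.keys() != aux_weights.keys():
        raise ValueError("every auxiliary loss needs exactly one weight")
    # Each auxiliary task contributes its loss scaled by its own weight.
    weighted_aux = sum(aux_weights[name] * aux_losses[name] for name in aux_losses)
    return main_loss + weighted_aux


# Usage with made-up loss values for the three auxiliary tasks.
total = joint_loss(
    main_loss=1.0,
    aux_losses={"soft_label": 2.0, "closeness": 4.0, "foreground": 0.0},
    aux_weights={"soft_label": 0.5, "closeness": 0.25, "foreground": 1.0},
)
```

With such a formulation, setting a weight to zero removes that auxiliary task from training without changing the rest of the pipeline.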

Approach

Overall architecture

The figure shows how the object detector (i.e. the main task model), such as Faster R-CNN, makes a prediction for a given proposal box (red) with the assistance of three auxiliary tasks at inference. The auxiliary task models (bottom right) are almost identical to the main task predictor, except that they have no box regressor. The detection prediction is then refined (right) through cooperation between the main and auxiliary task models. K is the number of categories.

3 auxiliary tasks

The following is an example of how the labels of the auxiliary tasks are generated by recycling ground-truth (GT) bounding boxes.

  • The multi-object soft label assigns the area portions occupied by each class’s GT boxes within a window.
  • The closeness label scores the distances from the center of the GT box to those of other GT boxes.
  • The foreground label is a binary mask between foreground and background.
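The three labels can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: boxes are integer `(x1, y1, x2, y2)` pixel coordinates with exclusive upper bounds, the function names are hypothetical, the soft label returns raw per-class area fractions without any normalization or background entry, and the closeness sketch stops at the pairwise center distances because the mapping from distance to score is not specified here.

```python
import numpy as np


def foreground_mask(boxes, height, width):
    """Binary mask: 1 for pixels inside any GT box, 0 for background."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1
    return mask


def soft_label(boxes, classes, window, num_classes):
    """Fraction of the window's area covered by each class's GT boxes.

    Assumption: overlapping boxes of the same class are counted once
    (union), and the fractions are not normalized to sum to 1.
    """
    wx1, wy1, wx2, wy2 = window
    fractions = np.zeros(num_classes)
    for k in range(num_classes):
        covered = np.zeros((wy2 - wy1, wx2 - wx1), dtype=bool)
        for (x1, y1, x2, y2), c in zip(boxes, classes):
            if c != k:
                continue
            # Clip the box to the window and shift to window coordinates.
            cx1, cy1 = max(x1, wx1) - wx1, max(y1, wy1) - wy1
            cx2, cy2 = min(x2, wx2) - wx1, min(y2, wy2) - wy1
            if cx1 < cx2 and cy1 < cy2:
                covered[cy1:cy2, cx1:cx2] = True
        fractions[k] = covered.mean()
    return fractions


def center_distances(boxes):
    """Pairwise Euclidean distances between GT box centers."""
    b = np.asarray(boxes, dtype=float)
    centers = np.stack([(b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2], axis=1)
    diff = centers[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Usage: two GT boxes of different classes and a window covering part of both.
gt_boxes = [(0, 0, 4, 4), (6, 0, 10, 4)]
gt_classes = [0, 1]
fg = foreground_mask(gt_boxes, height=4, width=10)
soft = soft_label(gt_boxes, gt_classes, window=(0, 0, 8, 4), num_classes=2)
dist = center_distances(gt_boxes)
```

In this toy case the window contains all of the first box and half of the second, so the class fractions differ even though the two boxes have the same size.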

Results

We empirically validate that our approach improves detection performance across various architectures and datasets. We test two state-of-the-art region proposal object detectors, Faster R-CNN and R-FCN, with three CNN backbones (ResNet-101, Inception-ResNet-v2, and MobileNet) on two benchmark datasets (PASCAL VOC and COCO).

Qualitative results

Qualitative comparison of detection results between the baseline (left) and our approach (right) in each set. We divide the errors into five categories (Localization, Classification, Redundancy, Background, False Negative). Our approach often improves on the baseline's detections by correcting several false negatives and false positives, such as background, similar-object, and redundant detections.
