Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR, 2019)

Related tags

Deep Learningmtl-ssl
Overview

Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations (CVPR 2019)

To make better use of given limited labels, we propose a novel object detection approach that takes advantage of both multi-task learning (MTL) and self-supervised learning (SSL). We propose a set of auxiliary tasks that help improve the accuracy of object detection.

Here is a guide to the source code.

Reference

If you are willing to use this code or cite the paper, please refer the following:

@inproceedings{lee2019multi,
 author = {Wonhee Lee and Joonil Na and Gunhee Kim},
 title = {Multi-task Self-supervised Object Detection via Recycling of Bounding Box Annotations},
 booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
 year = {2019}
}

CVPR Poster [PPT][PDF]

Introduction [PPT][PDF]

Multi-task Learning

Multi-task learning (MTL) aims at jointly training multiple relevant tasks with less annotations to improve the performance of each task.

[1] An Overview of Multi-Task Learning in Deep Neural Networks

[2] Mask R-CNN

Self-supervised Learning

Self-supervised learning (SSL) aims at training the model from the annotations generated by itself with no additional human effort.

[3] Learning Representations for Automatic Colorization

[4] Unsupervised learning of visual representations by solving jigsaw puzzles

Annotation Reuse

Reusing labels of one task is not only helpful to create new tasks and their labels but also capable of improving the performance of the main task through pretraining. Our work focuses on recycling bounding box labels for object detection.

[5] Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing

[6] Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Our approach

The key to our approach is to propose a set of auxiliary tasks that are relevant but not identical to object detection. They create their own labels by recycling the bounding box labels (e.g. annotations of the main task) in an SSL manner while regarding the bounding box as metadata. Then these auxiliary tasks are jointly trained with the object detection model in an MTL way.

Approach

Overall architecture

It shows how the object detector (i.e. main task model) such as Faster R-CNN makes a prediction for a given proposal box (red) with assistance of three auxiliary tasks at inference. The auxiliary task models (shown in the bottom right) are almost identical to the main task predictor except no box regressor. The refinement of detection prediction (shown in right) is also collectively done by cooperation of the main and auxiliary task models. K is the number of categories.

3 auxiliary tasks

This is an example of how to generate labels of auxiliary tasks via recycling of GT bounding boxes.

  • The multi-object soft label assigns the area portions occupied by each class’s GT boxes within a window.
  • The closeness label scores the distances from the center of the GT box to those of other GT boxes.
  • The foreground label is a binary mask between foreground and background.

Results

We empirically validate that our approach effectively improves detection performance on various architectures and datasets. We test two state-of-the-art region proposal object detectors, including Faster R-CNN and R-FCN, with three CNN backbones of ResNet-101, InceptionResNet-v2, and MobileNet on two benchmark datasets of PASCAL VOC and COCO.

Qualitative results

Qualitative comparison of detection results between baseline (left) and our approach (right) in each set. We divide the errors into five categories (Localization, Classification, Redundancy, Background, False Negative). Our approach often improves the baseline’s detection by correcting several false negatives and false positives such as background, similar object and redundant detection.

A two-stage U-Net for high-fidelity denoising of historical recordings

A two-stage U-Net for high-fidelity denoising of historical recordings Official repository of the paper (not submitted yet): E. Moliner and V. Välimäk

Eloi Moliner Juanpere 57 Jan 05, 2023
Data, notebooks, and articles associated with the RSNA AI Deep Learning Lab at RSNA 2021

RSNA AI Deep Learning Lab 2021 Intro Welcome Deep Learners! This document provides all the information you need to participate in the RSNA AI Deep Lea

RSNA 65 Dec 16, 2022
iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis Andreas Bl

CompVis Heidelberg 36 Dec 25, 2022
Pytorch implement of 'Unmixing based PAN guided fusion network for hyperspectral imagery'

Pgnet There's a improved version compared with the publication in Tgrs with the modification in the deduction of the PDIN block: https://arxiv.org/abs

5 Jul 01, 2022
Unified learning approach for egocentric hand gesture recognition and fingertip detection

Unified Gesture Recognition and Fingertip Detection A unified convolutional neural network (CNN) algorithm for both hand gesture recognition and finge

Mohammad 227 Dec 25, 2022
More than a hundred strange attractors

dysts Analyze more than a hundred chaotic systems. Basic Usage Import a model and run a simulation with default initial conditions and parameter value

William Gilpin 185 Dec 23, 2022
Numerai tournament example scripts using NN and optuna

numerai_NN_example Numerai tournament example scripts using pytorch NN, lightGBM and optuna https://numer.ai/tournament Performance of my model based

Takahiro Maeda 12 Oct 10, 2022
Ensembling Off-the-shelf Models for GAN Training

Vision-aided GAN video (3m) | website | paper Can the collective knowledge from a large bank of pretrained vision models be leveraged to improve GAN t

345 Dec 28, 2022
Github Traffic Insights as Prometheus metrics.

github-traffic Github Traffic collects your repository's traffic data and exposes it as Prometheus metrics. Grafana dashboard that displays the metric

Grafana Labs 34 Oct 27, 2022
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Xuan Hieu Duong 7 Jan 12, 2022
Official Pytorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images.

IAug_CDNet Official Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images. Overview We propose a

53 Dec 02, 2022
This repository contains pre-trained models and some evaluation code for our paper Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning This repository contains pre-trained models and some evaluation

Meta Research 207 Jan 08, 2023
A PyTorch implementation of "Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning", IJCAI-21

MERIT A PyTorch implementation of our IJCAI-21 paper Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning. Depen

Graph Analysis & Deep Learning Laboratory, GRAND 32 Jan 02, 2023
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

Introduction QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and

Yu 1.4k Jan 01, 2023
Rule-based Customer Segmentation

Rule-based Customer Segmentation Business Problem A game company wants to create level-based new customer definitions (personas) by using some feature

Cem Çaluk 2 Jan 03, 2022
​ This is the Pytorch implementation of Progressive Attentional Manifold Alignment.

PAMA This is the Pytorch implementation of Progressive Attentional Manifold Alignment. Requirements python 3.6 pytorch 1.2.0+ PIL, numpy, matplotlib C

98 Nov 15, 2022
Yolov5 + Deep Sort with PyTorch

딥소트 수정중 Yolov5 + Deep Sort with PyTorch Introduction This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of obj

1 Nov 26, 2021
Official implementation of the paper "Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering"

Light Field Networks Project Page | Paper | Data | Pretrained Models Vincent Sitzmann*, Semon Rezchikov*, William Freeman, Joshua Tenenbaum, Frédo Dur

Vincent Sitzmann 130 Dec 29, 2022
Self-supervised learning (SSL) is a method of machine learning

Self-supervised learning (SSL) is a method of machine learning. It learns from unlabeled sample data. It can be regarded as an intermediate form between supervised and unsupervised learning.

Ashish Patel 4 May 26, 2022
Definition of a business problem according to Wilson Lower Bound Score and Time Based Average Rating

Wilson Lower Bound Score, Time Based Rating Average In this study I tried to calculate the product rating and sorting reviews more accurately. I have

3 Sep 30, 2021