Code for generating a single image pretraining dataset

Last update: Dec 19, 2022

Overview

Single Image Pretraining of Visual Representations

As shown in the paper

A critical analysis of self-supervision, or what we can learn from a single image, Asano et al. ICLR 2020

Why?

Self-supervised representation learning has made enormous strides in recent years. In this paper we show that a large part of why self-supervised learning works is the augmentations. We show this by pretraining various SSL methods on a dataset generated solely by augmenting a single source image. These methods still pretrain quite well and, for the early layers of networks, even yield representations as strong as those obtained from the whole dataset.

Abstract

We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that: (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, that (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and that (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset.

Usage

Here we provide the code for generating a dataset from a single source image. Since publication, I have slightly modified the dataset generation script to make it easier to use. Dependencies: torch, torchvision, joblib, PIL, numpy; any recent version should do.

Run like this:

python make_dataset_single.py --imgpath images/ameyoko.jpg --targetpath ./out/ameyoko_dataset

Here is the full description of the usage:

usage: make_dataset_single.py [-h] [--img_size IMG_SIZE]
                              [--batch_size BATCH_SIZE] [--num_imgs NUM_IMGS]
                              [--threads THREADS] [--vflip] [--deg DEG]
                              [--shear SHEAR] [--cropfirst]
                              [--initcrop INITCROP] [--scale SCALE SCALE]
                              [--randinterp] [--imgpath IMGPATH] [--debug]
                              [--targetpath TARGETPATH]

Single Image Pretraining, Asano et al. 2020

optional arguments:
  -h, --help            show this help message and exit
  --img_size IMG_SIZE
  --batch_size BATCH_SIZE
  --num_imgs NUM_IMGS   number of images to be generated
  --threads THREADS     how many CPU threads to use for generation
  --vflip               use vflip?
  --deg DEG             max rot angle
  --shear SHEAR         max shear angle
  --cropfirst           usage of initial crop to not focus too much on center
  --initcrop INITCROP   initial crop size relative to image
  --scale SCALE SCALE   data augmentation inverse scale
  --randinterp          For RR crops: use random interpolation method or just bicubic?
  --imgpath IMGPATH
  --debug
  --targetpath TARGETPATH
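
To give a feel for how the augmentation flags above could interact, here is a minimal sketch that samples one set of augmentation parameters per generated image. It is not the code in make_dataset_single.py: the function sample_params, the AugmentParams container, the default values, and the reading of --scale as a range of crop-area fractions are all assumptions made for illustration. Only the flag names (--deg, --shear, --scale, --vflip) come from the help text.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    """Hypothetical container for one sampled augmentation."""
    crop_box: tuple  # (left, top, right, bottom) in pixels
    angle: float     # rotation in degrees
    shear: float     # shear in degrees
    vflip: bool      # whether to flip vertically


def sample_params(width, height, rng, deg=30.0, shear=30.0,
                  scale=(0.02, 1.0), vflip=False):
    """Sample one random crop, rotation, shear and flip decision.

    Assumption: `scale` is a (min, max) range for the fraction of the
    source image area covered by the crop, with max <= 1.0. The default
    values here are placeholders, not the script's defaults.
    """
    area_frac = rng.uniform(*scale)
    side_frac = area_frac ** 0.5  # keep the source aspect ratio
    crop_w = max(1, int(round(width * side_frac)))
    crop_h = max(1, int(round(height * side_frac)))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return AugmentParams(
        crop_box=(left, top, left + crop_w, top + crop_h),
        angle=rng.uniform(-deg, deg),        # bounded like --deg
        shear=rng.uniform(-shear, shear),    # bounded like --shear
        vflip=vflip and rng.random() < 0.5,  # only active with --vflip
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for _ in range(3):
        print(sample_params(1024, 768, rng, vflip=True))
```

Each sampled parameter set would then be applied to the source image (for example with PIL or torchvision transforms) and the result saved, repeated --num_imgs times. Drawing a fresh crop, rotation and shear for every output image is what turns a single photograph into a varied dataset.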

Reference

If you find this code/idea useful, please consider citing our paper:

@inproceedings{asano2020a,
title={A critical analysis of self-supervision, or what we can learn from a single image},
author={Asano, Yuki M. and Rupprecht, Christian and Vedaldi, Andrea},
booktitle={International Conference on Learning Representations (ICLR)},
year={2020},
}

Owner

Yuki M. Asano
