DABS: A Domain Agnostic Benchmark for Self-Supervised Learning

This repository contains the code for DABS, a benchmark for domain-agnostic self-supervised learning algorithms. The basic components of the benchmark can be found in datasets, encoders, and algorithms. Training is implemented with the PyTorch Lightning framework, logging with Weights and Biases, and configuration management with Hydra.

Usage

We provide support for Python >= 3.7. Install requirements with

python -m pip install -r requirements.txt

For instructions on how to install PyTorch versions compatible with your CUDA versions, see pytorch.org.

Datasets

We provide a set of dataset implementations (in src/datasets) from image, text, speech, sensor, medical imaging, and image-text domains. Preprocessing on these datasets is minimal and hard-coded: simple resizing (of images) and truncation (of text and audio). Do not change these operations, so that comparisons across users of the benchmark remain fair.
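
As a rough illustration of the truncation described above, the sketch below clips a token or audio-sample sequence to a fixed length. The function name `truncate_or_pad`, the `pad_value` argument, and the padding of short sequences are assumptions made for this example; the actual preprocessing lives in src/datasets and may differ.

```python
def truncate_or_pad(sequence, max_length, pad_value=0):
    """Clip `sequence` to `max_length` items; right-pad shorter inputs.

    Hypothetical helper: padding is an assumption added so that every
    example ends up with the same length.
    """
    clipped = list(sequence[:max_length])
    padding = [pad_value] * (max_length - len(clipped))
    return clipped + padding


tokens = [101, 7592, 2088, 999, 102]
print(truncate_or_pad(tokens, 3))  # long input is truncated
print(truncate_or_pad(tokens, 8))  # short input is padded with pad_value
```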

See conf/datasets/*.yaml for all dataset configs, including the loss, metrics, and batch size used for each dataset.

Almost all datasets will download automatically when the dataset class is instantiated. The exceptions are the CheXpert, ImageNet, and CU Birds datasets, where manual registration or download is required. See the respective dataset files for specific instructions.

Pretraining Dataset (unlabeled)	Transfer Dataset (labeled)
ImageNet	Aircraft, CIFAR10, CU Birds, DTD, Traffic Sign, VGG Flower
PAMAP2	PAMAP2
MSCOCO	MSCOCO (mismatched detection), VQA (Binary classification)
Wikitext-103	GLUE (10 Tasks)
mC4	PAWS-X (7 Tasks)
CheXpert	CheXpert (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), ChestX-ray8 (atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax)
LibriSpeech	Audio MNIST, Fluent Speech (Action, Object, Location), Google Speech Commands, LibriSpeech, VoxCeleb1

Pretraining

During the pretraining phase, self-supervised encoders are trained to learn good representations from unlabeled data. We currently support seven datasets for pretraining, one for each domain: MS COCO, ImageNet, CheXpert, PAMAP2, mC4, WikiText-103, and LibriSpeech. If the pretraining dataset has associated labels, an online linear evaluator is jointly trained with the encoder to provide a heuristic of transfer performance.

Run pretraining with commands like

python pretrain.py exp.name=<experiment-name> dataset=<dataset> algorithm=<algorithm>

Each dataset and encoder has its own config file. For example, to train a transformer encoder on the CheXpert dataset with the e-mix algorithm, run

python pretrain.py exp.name=emix-chexpert encoder=transformer dataset=chexpert algorithm=emix

See conf/pretrain.yaml for all pretraining configuration fields.
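
When launching several runs, it can help to assemble these commands programmatically. The sketch below builds the command line shown above from a dictionary of Hydra-style key=value overrides. The helper name `build_command` is hypothetical and not part of the repository; only override keys that appear in this README are used.

```python
import shlex


def build_command(script, overrides):
    """Return an argv list of the form: python <script> key=value ..."""
    argv = ["python", script]
    for key, value in overrides.items():
        # A missing value is written as the literal `null`, as in `ckpt=null`.
        rendered = "null" if value is None else str(value)
        argv.append(f"{key}={rendered}")
    return argv


argv = build_command(
    "pretrain.py",
    {
        "exp.name": "emix-chexpert",
        "encoder": "transformer",
        "dataset": "chexpert",
        "algorithm": "emix",
    },
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` to start the run.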

For more information on the datasets, encoders, and algorithms, see the corresponding sections of this document. The pretraining datasets are summarized below.

Pretraining Dataset	Modality	Label type (unused)	Input Type
ImageNet	Natural images	Single label	2d
PAMAP2	Sensor	Single label	2d
MSCOCO	Captioned images	Single label	2d + tokens
WikiText-103	English Text	No label	tokens
mC4	Multilingual Text	No label	tokens
CheXpert	Medical images	Multi label	2d
LibriSpeech	Speech	No label	2d

Transfer Learning

After pretraining, a small linear classifier is trained on top of the frozen encoder. Run transfer learning from a randomly initialized encoder with

python transfer.py exp.name=<experiment-name> dataset=<dataset> ckpt=null

See conf/transfer.yaml for all transfer learning configuration fields. To evaluate a pretrained encoder instead of a randomly initialized one, replace null with the path to your pretrained encoder checkpoint.
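
Conceptually, the linear evaluation described above trains only the classifier while the encoder stays fixed. The toy sketch below illustrates this in plain Python with a stand-in encoder and a logistic-regression head. `frozen_encoder`, its weights, the toy data, and the training hyperparameters are all assumptions for illustration and do not reflect the code in transfer.py.

```python
import math

# Fixed weights of a stand-in "pretrained" encoder (hypothetical values).
ENCODER_WEIGHTS = ((1.0, -1.0), (0.5, 0.5))


def frozen_encoder(x):
    """Map an input vector to features; the weights are never updated."""
    return [sum(w * v for w, v in zip(row, x)) for row in ENCODER_WEIGHTS]


def train_linear_probe(inputs, labels, epochs=100, lr=0.5):
    """Fit a binary logistic-regression head on frozen features with SGD."""
    features = [frozen_encoder(x) for x in inputs]  # encoder used as-is
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for feat, label in zip(features, labels):
            logit = sum(w * f for w, f in zip(weights, feat)) + bias
            prob = 1.0 / (1.0 + math.exp(-logit))
            grad = prob - label  # gradient of the log loss w.r.t. the logit
            # Only the classifier parameters receive updates.
            weights = [w - lr * grad * f for w, f in zip(weights, feat)]
            bias -= lr * grad
    return weights, bias


def predict(weights, bias, x):
    """Classify an input by thresholding the linear head's logit at zero."""
    feat = frozen_encoder(x)
    return int(sum(w * f for w, f in zip(weights, feat)) + bias > 0)


inputs = [(2.0, 0.0), (3.0, 1.0), (0.0, 2.0), (1.0, 3.0)]
labels = [1, 1, 0, 0]
weights, bias = train_linear_probe(inputs, labels)
print([predict(weights, bias, x) for x in inputs])
```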

Dataset	Modality	Label type	Evaluation metric	Input Type
Aircraft	Natural images	Single label	Accuracy	2d
CU Birds	Natural images	Single label	Accuracy	2d
DTD	Natural images	Single label	Accuracy	2d
Traffic Sign	Natural images	Single label	Accuracy	2d
VGG Flower	Natural images	Single label	Accuracy	2d
PAMAP2	Sensor	Single label	Accuracy	2d
MS COCO	Captioned images	Binary label	Accuracy	2d + tokens
VQA	Captioned images	Binary label	Accuracy	2d + tokens
CheXpert	Medical images	Multi label	AUROC	2d
ChestX-ray8	Medical images	Multi label	AUROC	2d
PAWS-X	Multilingual Text	Binary label	Accuracy	tokens
CoLA	English Text	Binary label	Pearson correlation	tokens
MNLI Matched	English Text	Single label	Accuracy	tokens
MNLI Mismatched	English Text	Single label	Accuracy	tokens
MRPC	English Text	Binary label	Accuracy	tokens
QNLI	English Text	Binary label	Accuracy	tokens
QQP	English Text	Binary label	Accuracy	tokens
RTE	English Text	Binary label	Accuracy	tokens
SST2	English Text	Binary label	Accuracy	tokens
STSB	English Text	Regression	Spearman correlation	tokens
WNLI	English Text	Binary label	Accuracy	tokens
Audio MNIST	Speech	Single label	Accuracy	2d
Fluent Speech	Speech	Single label	Accuracy	2d
Google Speech Commands	Speech	Single label	Accuracy	2d
LibriSpeech	Speech	Single label	Accuracy	2d
VoxCeleb1	Speech	Single label	Accuracy	2d
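
The multi-label medical imaging tasks in the table are scored with AUROC. As a reminder of what that metric measures, the sketch below computes AUROC for one label from pairwise comparisons and then averages over labels. The macro-averaging and the function names are assumptions for illustration; the benchmark's own metric code may aggregate differently.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def macro_auroc(label_columns, score_columns):
    """Average the per-label AUROC over all labels (assumed aggregation)."""
    values = [auroc(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(values) / len(values)


# Two labels (one list per label), four examples each.
label_columns = [[0, 0, 1, 1], [1, 0, 1, 0]]
score_columns = [[0.1, 0.4, 0.35, 0.8], [0.9, 0.2, 0.6, 0.3]]
print(macro_auroc(label_columns, score_columns))
```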

Encoders

A domain-agnostic self-supervised learning (SSL) method should have an encoder that remains as constant as possible across domains. We provide a general transformer encoder baseline (in src/encoders). The transformer operates on a sequence of vectors that are produced by a small set of embedding modules (e.g. patch or token embeddings).
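
To make the idea of an embedding module concrete, the sketch below turns a 2D input into the kind of vector sequence a transformer consumes by cutting it into non-overlapping patches and flattening each one. The function `patchify` and the patch size are assumptions for illustration; in a real model each flattened patch would also pass through a learned linear projection, which is omitted here.

```python
def patchify(image, patch_size):
    """Split a 2D grid (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, one flat list per patch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 "image" whose pixel values are 0..15 in reading order.
image = [[4 * row + col for col in range(4)] for row in range(4)]
sequence = patchify(image, patch_size=2)
print(len(sequence), "patches of length", len(sequence[0]))
```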

Pretraining algorithms

The pretraining algorithm is the framework and objective that the encoder is trained with. Algorithms such as SimCLR, BYOL, and MoCo are domain-specific rather than domain-agnostic because they depend on vision-specific augmentations. We provide our own domain-agnostic implementations of recent algorithms, including e-mix (a generalization of i-mix) and Shuffled Embedding Detection (ShED; a generalization of ELECTRA), which randomly permutes a subset of the input embeddings and trains the model to identify the permuted embeddings.
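
The corruption step of ShED described above can be sketched as follows: pick a random subset of positions, permute the embeddings at those positions, and emit a binary label per position marking whether it was permuted. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name, the `fraction` parameter, and the use of a cyclic shift (so that every selected position receives the embedding of a different position) are all choices made for this example.

```python
import random


def shuffle_subset(embeddings, fraction, rng):
    """Return (corrupted, labels); labels[i] == 1 iff position i was permuted."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings to permute")
    # At least two positions are required for a permutation to move anything.
    k = min(n, max(2, round(fraction * n)))
    chosen = rng.sample(range(n), k)  # random positions, in random order
    # Cyclic shift over the chosen positions: none keeps its own embedding.
    sources = chosen[1:] + chosen[:1]
    corrupted = list(embeddings)
    for dst, src in zip(chosen, sources):
        corrupted[dst] = embeddings[src]
    labels = [0] * n
    for position in chosen:
        labels[position] = 1
    return corrupted, labels


rng = random.Random(0)
embeddings = [[float(i), float(i) + 0.5] for i in range(8)]
corrupted, labels = shuffle_subset(embeddings, fraction=0.25, rng=rng)
print(labels)
```

A model would then be trained with a per-position binary classification loss to predict `labels` from `corrupted`.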

Results

Below are results for algorithms trained on each dataset in DABS. The baseline performance is obtained via a randomly initialized encoder.

Pretrain Dataset	Transfer Dataset	Encoder	Baseline Performance	e-mix Performance	ShED Performance
ImageNet	CIFAR10	Transformer	24.20%	39.43%	39.63%
ImageNet	CU Birds	Transformer	1.62%	3.86%	2.95%
ImageNet	VGG Flowers	Transformer	9.03%	25.96%	13.03%
ImageNet	DTD	Transformer	7.39%	8.83%	18.35%
ImageNet	Traffic Sign	Transformer	14.33%	65.07%	27.51%
ImageNet	Aircraft	Transformer	2.70%	10.15%	5.60%
PAMAP2	PAMAP2	Transformer	69.81%	79.48%	88.69%
MSCOCO	VQA	Transformer	57.50%	48.90%	54.30%
CheXpert	CheXpert	Transformer	68.14%	72.40%	72.40%
CheXpert	ChestX-ray8	Transformer	57.00%	63.00%	63.70%
Wikitext-103	GLUE (average)	Transformer	42.29%	44.08%	48.37%
mC4	PAWS-X (average)	Transformer	58.11%	56.16%	59.91%
LibriSpeech	Audio MNIST	Transformer	33.13%	80.35%	67.33%
LibriSpeech	Fluent Locations	Transformer	62.09%	60.93%	60.24%
LibriSpeech	Fluent Actions	Transformer	26.15%	29.87%	30.53%
LibriSpeech	Fluent Objects	Transformer	30.13%	39.89%	39.36%
LibriSpeech	Google Speech Commands	Transformer	4.87%	19.22%	20.73%
LibriSpeech	LibriSpeech	Transformer	17.12%	60.18%	34.77%
LibriSpeech	VoxCeleb1	Transformer	0.59%	2.43%	2.81%

Owner

Alex Tamkin
