PyTorch implementation of "A Simple Baseline for Low-Budget Active Learning".

Overview

A Simple Baseline for Low-Budget Active Learning

This repository is the implementation of A Simple Baseline for Low-Budget Active Learning. In this paper, we are interested in low-budget active learning, where only a small subset of the unlabeled data, e.g., 0.2% of ImageNet, can be annotated. We show that, although state-of-the-art active learning methods work well given a large labeling budget, a simple k-means clustering algorithm can outperform them on low budgets. Our code is modified from CompRess [1]. If you find this work useful, please cite:

@article{pourahmadi2021simple,
  title={A Simple Baseline for Low-Budget Active Learning},
  author={Pourahmadi, Kossar and Nooralinejad, Parsa and Pirsiavash, Hamed},
  journal={arXiv preprint arXiv:2110.12033},
  year={2021}
}

Benchmarks

We implement the following query strategies in strategies.py for the CIFAR-10, CIFAR-100, ImageNet, and ImageNet-LT datasets:

a) Single-batch k-means: At each round, it clusters the whole dataset into as many clusters as the budget size and sends the nearest neighbors of the cluster centers directly to the oracle to be annotated (a sketch follows this list).

b) Multi-batch k-means: Uses the difference between two consecutive budget sizes as the number of clusters and, among the examples nearest to the centers, picks those that have not already been labeled by the oracle.

c) Core-set [2]: Greedily selects the unlabeled example farthest from all points chosen so far (k-center greedy; a sketch follows this list).

d) Max-Entropy [3]: Treats the entropy of the model's output probability distribution for each example as an uncertainty score and samples the most uncertain points for annotation (see the sketch under Entropy sampling below).

e) Uniform: Randomly selects an equal number of samples from each class.

f) Random: Samples are selected randomly (uniformly) from the entire dataset.
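
For intuition, here is a minimal sketch of the single-batch k-means strategy, assuming backbone features have already been extracted. The function name and the use of scikit-learn are illustrative, not the exact strategies.py implementation:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_select(features, budget):
    # Cluster the whole unlabeled pool into `budget` clusters.
    kmeans = KMeans(n_clusters=budget, n_init=10).fit(features)
    selected = []
    for center in kmeans.cluster_centers_:
        # Pick the example nearest to each cluster center.
        dists = np.linalg.norm(features - center, axis=1)
        selected.append(int(np.argmin(dists)))
    return sorted(set(selected))  # indices to send to the oracle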
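
Likewise, a rough sketch of the greedy k-center selection behind Core-set [2], again with illustrative names, assuming precomputed features and an initial labeled pool (the distance computation here is naive, which is fine for a sketch):

import numpy as np

def coreset_select(features, labeled_indices, budget):
    # Distance from every point to its nearest already-labeled point.
    diffs = features[:, None, :] - features[labeled_indices][None, :, :]
    min_dists = np.linalg.norm(diffs, axis=2).min(axis=1)
    new_points = []
    for _ in range(budget):
        idx = int(np.argmax(min_dists))  # farthest uncovered point
        new_points.append(idx)
        # The newly selected point now covers its own neighborhood.
        d_new = np.linalg.norm(features - features[idx], axis=1)
        min_dists = np.minimum(min_dists, d_new)
    return new_points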

Requirements

Usage

This implementation supports multi-GPU (DataParallel) and single-GPU training.

The following options are available when running the commands:

  • --arch We use a ResNet-18 pre-trained with CompRess (download weights) or a ResNet-50 pre-trained with MoCo-v2 (download weights). Use resnet18 or resnet50 as the argument accordingly.
  • --backbone One of compress or moco, matching the pre-trained weights.
  • --splits You can define budget sizes, separated by commas. For instance, --splits 10,20.
  • --name Specify the query strategy name by using one of uniform, random, kmeans, accu_kmeans, or coreset.
  • --dataset Indicate the unlabeled dataset name by using one of cifar10, cifar100, imagenet, or imagenet_lt.

Sample selection

If the strategy needs an initial pool (accu_kmeans or coreset), pass the file path with --resume-indices.

python sampler.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 4 \
--workers 4 \
--splits 100 \
--load_cache \
--name kmeans \
--dataset cifar10 \
[path to dataset file]

Linear classification

python eval_lincls.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 4 \
--lr 0.01 \
--lr_schedule 50,75 \
--epochs 100 \
--splits 1000 \
--load_cache \
--name random \
--dataset imagenet \
[path to dataset file]
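
Linear evaluation conventionally trains a linear classifier on top of frozen backbone features. A minimal sketch of one training step under that assumption (illustrative names, not the script's exact code):

import torch
import torch.nn as nn

def linear_probe_step(backbone, classifier, optimizer, images, labels):
    backbone.eval()  # the backbone stays frozen
    with torch.no_grad():
        feats = backbone(images)  # (batch, feature_dim) embeddings
    logits = classifier(feats)    # a single nn.Linear head
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()               # gradients reach only the classifier
    optimizer.step()
    return loss.item()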

Nearest neighbor classification

python eval_knn.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 8 \
--splits 1000 \
--load_cache \
--name random \
--dataset cifar10 \
[path to dataset file]
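
Nearest-neighbor evaluation classifies each test feature by the label of its closest labeled training feature. A rough 1-NN sketch over L2-normalized features (the script's metric and choice of k may differ):

import torch

def knn_predict(train_feats, train_labels, test_feats):
    # With L2-normalized rows, the inner product equals cosine similarity.
    sims = test_feats @ train_feats.t()  # (num_test, num_train)
    nearest = sims.argmax(dim=1)         # closest training example
    return train_labels[nearest]         # predicted labels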

Entropy sampling

To sample data using Max-Entropy, use active_sampler.py with entropy for --name, and pass the initial pool indices file path with --resume-indices.

python active_sampler.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 4 \
--lr 0.001 \
--lr_schedule 50,75 \
--epochs 100 \
--splits 2000 \
--load_cache \
--name entropy \
--resume-indices [path to random initial pool file] \
--dataset imagenet \
[path to dataset file]
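
The uncertainty score itself is simple: a hedged sketch of ranking unlabeled examples by the entropy of their softmax outputs (illustrative, not the exact active_sampler.py code):

import torch

def max_entropy_select(logits, budget):
    probs = logits.softmax(dim=1)  # (num_unlabeled, num_classes)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(budget).indices  # the most uncertain examples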

Fine-tuning

Fine-tuning is implemented only for the CompRess ResNet-18 backbone on ImageNet. --lr is the learning rate of the backbone and --lr-lin is the learning rate of the linear classifier (a parameter-group sketch follows the command below).

python finetune.py \
--arch resnet18 \
--weights [path to weights] \
--batch-size 128 \
--workers 16 \
--epochs 100 \
--lr_schedule 50,75 \
--lr 0.0001 \
--lr-lin 0.01 \
--splits 1000 \
--name kmeans \
--dataset imagenet \
[path to dataset file]
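
Training the backbone and the linear head with different learning rates is typically done with optimizer parameter groups. A minimal sketch of that pattern, with a torchvision ResNet-18 standing in for the CompRess-pretrained backbone:

import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18()
backbone.fc = nn.Identity()        # expose the 512-d features
classifier = nn.Linear(512, 1000)  # ImageNet classification head

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},    # --lr 0.0001
        {"params": classifier.parameters(), "lr": 1e-2},  # --lr-lin 0.01
    ],
    momentum=0.9,
)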

Training from scratch

Starting from a randomly initialized network, you can train the model on CIFAR-100 or ImageNet.

python trainer_DP.py \
--arch resnet18 \
--batch-size 128 \
--workers 4 \
--epochs 100 \
--lr 0.1 \
--lr_schedule 30,60,90 \
--splits 1000 \
--name kmeans \
--dataset imagenet \
[path to dataset file]

References

[1] CompRess: Self-Supervised Learning by Compressing Representations, NeurIPS, 2020

[2] Active Learning for Convolutional Neural Networks: A Core-Set Approach, ICLR, 2018

[3] A new active labeling method for deep learning, IJCNN, 2014
