Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
This repository contains the code for the paper *Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles*.
Experimental Results
Preliminaries
The code is tested under Ubuntu Linux 16.04.1 with Python 3.6, and requires some packages to be installed:
Downloading Datasets
- MNIST-M: download it from the Google drive. Extract the files and place them in `./dataset/mnist_m/`.
- SVHN: download the Format 2 data (`*.mat`). Place the files in `./dataset/svhn/`.
- USPS: download the `usps.h5` file. Place the file in `./dataset/usps/`.
Overview of the Code
- `train_model.py`: train standard models via supervised learning.
- `train_dann.py`: train domain adaptive (DANN) models.
- `eval_pipeline.py`: evaluate various methods on all tasks.
Running Experiments
Examples
- To train a standard model via supervised learning, you can use the following command:
```
python train_model.py --source-dataset {source dataset} --model-type {model type} --base-dir {directory to save the model}
```

`{source dataset}` can be `mnist`, `mnist-m`, `svhn` or `usps`.
`{model type}` can be `typical_dnn` or `dann_arch`.
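As a concrete illustration, a run for MNIST might look like the following (the `--base-dir` path is an arbitrary choice, not one mandated by the repository):

```shell
# Train a standard (typical_dnn) model on MNIST;
# checkpoints are written to ./checkpoints/mnist_dnn (hypothetical path).
python train_model.py --source-dataset mnist --model-type typical_dnn --base-dir ./checkpoints/mnist_dnn
```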
- To train a domain adaptive (DANN) model, you can use the following command:
```
python train_dann.py --source-dataset {source dataset} --target-dataset {target dataset} --base-dir {directory to save the model} [--test-time]
```

`{source dataset}` (or `{target dataset}`) can be `mnist`, `mnist-m`, `svhn` or `usps`.
The optional flag `--test-time` indicates whether to replace the target training dataset with the target test dataset.
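For example, a DANN model for the MNIST → MNIST-M pair that adapts on the target test set could be trained as follows (the `--base-dir` path is an arbitrary choice):

```shell
# Train a DANN model adapting from MNIST to MNIST-M, using the target
# test set (via --test-time); checkpoints go to a hypothetical path.
python train_dann.py --source-dataset mnist --target-dataset mnist-m --base-dir ./checkpoints/dann_mnist_mnistm --test-time
```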
- To evaluate a method on all training-test dataset pairs, you can use the following command:
```
python eval_pipeline.py --model-type {model type} --method {method}
```

`{model type}` can be `typical_dnn` or `dann_arch`.
`{method}` can be `conf_avg`, `ensemble_conf_avg`, `conf`, `trust_score`, `proxy_risk`, `our_ri` or `our_rm`.
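For instance, evaluating the proposed self-training ensemble method `our_ri` on models with the DANN architecture would be invoked as:

```shell
# Evaluate the our_ri method on all training-test dataset pairs,
# using models with the dann_arch architecture.
python eval_pipeline.py --model-type dann_arch --method our_ri
```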
Train All Models
You can run the following scripts to pre-train all models needed for the experiments.
- `run_all_model_training.sh`: train all supervised learning models.
- `run_all_dann_training.sh`: train all DANN models.
- `run_all_ensemble_training.sh`: train all ensemble models.
Evaluate All Methods
You can run the following script to get the results reported in the paper.
- `run_all_evaluation.sh`: evaluate all methods on all tasks.
Acknowledgements
Part of this code is inspired by `estimating-generalization` and `TrustScore`.
Citation
Please cite our work if you use the codebase:
```
@article{chen2021detecting,
  title={Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles},
  author={Chen, Jiefeng and Liu, Frederick and Avci, Besim and Wu, Xi and Liang, Yingyu and Jha, Somesh},
  journal={arXiv preprint arXiv:2106.15728},
  year={2021}
}
```
License
Please refer to the LICENSE file.
