[NeurIPS 2021] "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators"

Last update: Oct 12, 2022

Overview

G-PATE

This is the official code base for our NeurIPS 2021 paper:

"G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators."

Yunhui Long*, Boxin Wang*, Zhuolin Yang, Bhavya Kailkhura, Aston Zhang, Carl A. Gunter, Bo Li

Citation

@article{long2021gpate,
  title={G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators},
  author={Long, Yunhui and Wang, Boxin and Yang, Zhuolin and Kailkhura, Bhavya and Zhang, Aston and Gunter, Carl A. and Li, Bo},
  journal={NeurIPS 2021},
  year={2021}
}

Usage

Prepare your environment

Download required packages

pip install -r requirements.txt

Prepare your data

Please store the training data in $data_dir. By default, $data_dir is set to ../../data.

We provide a script to download the MNIST and Fashion Mnist datasets.

python download.py [dataset_name]

For MNIST, you can run

python download.py mnist

For Fashion-MNIST, you can run

python download.py fashion_mnist

For CelebA datasets, please refer to their official websites for downloading.

Training

python main.py --checkpoint_dir [checkpoint_dir] --dataset [dataset_name] --train

Example of one of our best commands on MNIST:

Given eps=1,

python main.py --checkpoint_dir mnist_teacher_4000_z_dim_50_c_1e-4/ --teachers_batch 40 --batch_teachers 100 --dataset mnist --train --sigma_thresh 3000 --sigma 1000 --step_size 1e-4 --max_eps 1 --nopretrain --z_dim 50 --batch_size 64

By default, after it reaches the max epsilon=1, it will generate 100,000 DP samples as eps-1.00.data.pkl in checkpoint_dir.

Given eps=10,

python main.py --checkpoint_dir mnist_teacher_2000_z_dim_100_eps_10/ --teachers_batch 40 --batch_teachers 50 --dataset mnist --train --sigma_thresh 600 --sigma 100 --step_size 1e-4 --max_eps 10 --nopretrain --z_dim 100 --batch_size 64

By default, after it reaches the max epsilon=10, it will generate 100,000 DP samples as eps-9.9x.data.pkl in checkpoint_dir.

Generating synthetic samples

python main.py --checkpoint_dir [checkpoint_dir] --dataset [dataset_name]

Evaluate the synthetic records

We follow the standard the protocl and train a classifier on synthetic samples and test it on real samples.

For MNIST,

python evaluation/train-classifier-mnist.py --data [DP_data_dir]

For Fashion-MNIST,

python evaluation/train-classifier-fmnist.py --data [DP_data_dir]

For CelebA-Gender,

python evaluation/train-classifier-celebA.py --data [DP_data_dir]

For CelebA-Gender (Small),

python evaluation/train-classifier-small-celebA.py --data [DP_data_dir]

For CelebA-Hair,

python evaluation/train-classifier-hair.py --data [DP_data_dir]

The [DP_data_dir] is where your generated DP samples are located.

In the MNIST example above, we have generated DP samples in $checkpoint_dir/eps-1.00.data.

During evaluation, you should run with DP_data_dir=$checkpoint_dir/eps-1.00.data.

python evaluation/train-classifier-mnist.py --data $checkpoint_dir/eps-1.00.data

[NeurIPS 2021] "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators"

Related tags

Overview

G-PATE

Citation

Usage

Prepare your environment

Prepare your data

Training

Generating synthetic samples

Evaluate the synthetic records

Owner

AI Secure

Simulation of self-focusing of laser beams in condensed media

OSLO: Open Source framework for Large-scale transformer Optimization

[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Region-aware Contrastive Learning for Semantic Segmentation, ICCV 2021

Pytoydl: A toy deep learning framework built upon numpy.

Compare neural networks by their feature similarity

Patch SVDD for Image anomaly detection

An implementation of an abstract algebra for music tones (pitches).

BankNote-Net: Open dataset and encoder model for assistive currency recognition

Supporting code for the Neograd algorithm

An efficient PyTorch implementation of the evaluation metrics in recommender systems.

Codes for [NeurIPS'21] You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership.

A minimal yet resourceful implementation of diffusion models (along with pretrained models + synthetic images for nine datasets)

The implementation for "Comprehensive Knowledge Distillation with Causal Intervention".

A framework for attentive explainable deep learning on tabular data

The implemetation of Dynamic Nerual Garments proposed in Siggraph Asia 2021

The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

Image morphing without reference points by applying warp maps and optimizing over them.

This repo holds codes of the ICCV21 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.