Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Last update: Nov 01, 2022

Introduction

In this work, we demonstrate that existing self-supervised speech models such as HuBERT, wav2vec 2.0, CPC, and TERA are vulnerable to membership inference attacks (MIA) and could therefore reveal sensitive information about their training data.

Requirements

Python >= 3.6
Install sox on your OS
Install s3prl from source

git clone https://github.com/s3prl/s3prl
cd s3prl
pip install -e ./

Install the specific fairseq version

pip install fairseq@git+https://github.com//pytorch/[email protected]#egg=fairseq
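
Before moving on to preprocessing, it can help to confirm that the requirements listed above are visible from the current environment. The following is a minimal sketch, not part of the repository: the function name `check_requirements` is hypothetical, and it only checks that `sox` is on `PATH` and that the `s3prl` and `fairseq` modules can be located, not that their versions are the ones this project expects.

```python
import importlib.util
import shutil
import sys


def check_requirements(modules=("s3prl", "fairseq"), tools=("sox",)):
    """Return a dict mapping each requirement name to True or False."""
    # The README asks for Python >= 3.6.
    status = {"python>=3.6": sys.version_info >= (3, 6)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```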

Preprocessing

First, extract the self-supervised features of the utterances in each corpus you need.

Currently, only LibriSpeech is available.

BASE_PATH=/path/of/the/corpus
OUTPUT_PATH=/path/to/save/feature
MODEL=wav2vec2
SPLIT=train-clean-100 # you should extract train-clean-100, dev-clean, dev-other, test-clean, test-other

python preprocess_feature_LibriSpeech.py \
    --base_path $BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --split $SPLIT
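
Since the comment on `SPLIT` says all five LibriSpeech splits should be extracted, the command above has to be repeated once per split. The sketch below shows one way to generate those invocations. It is an illustration only: the helper names `build_preprocess_commands` and `run_all` are hypothetical, and the only flags used are the ones shown in the command above.

```python
import subprocess
import sys

# The five splits named in the README comment.
SPLITS = ["train-clean-100", "dev-clean", "dev-other", "test-clean", "test-other"]


def build_preprocess_commands(base_path, output_path, model, splits=SPLITS):
    """Build one preprocessing command (as an argument list) per split."""
    return [
        [
            sys.executable, "preprocess_feature_LibriSpeech.py",
            "--base_path", base_path,
            "--output_path", output_path,
            "--model", model,
            "--split", split,
        ]
        for split in splits
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing split.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_preprocess_commands(
        "/path/of/the/corpus", "/path/to/save/feature", "wav2vec2"))
```

With `dry_run=True` (the default here) nothing is executed, so the generated commands can be inspected before committing to a long extraction run.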

Speaker-level MIA

After extracting the features, you can attack the models using either the basic attack or the improved attack.

Note that you must run the basic attack first: it generates the .csv file of similarity scores that the improved attack requires.

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-speaker-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL
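
This README does not spell out how the similarity scores are computed. As a rough illustration of what a speaker-level similarity score could look like, and explicitly not the repository's actual method, the sketch below mean-pools each utterance's frame features and averages the pairwise cosine similarity across one speaker's utterances. All function names are hypothetical, and the choice of mean pooling and cosine similarity is an assumption.

```python
import math
from itertools import combinations


def mean_pool(frames):
    """Average a list of frame vectors into a single utterance vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def speaker_score(utterances):
    """Mean pairwise cosine similarity between a speaker's pooled utterances.

    `utterances` is a list of utterances, each a list of frame vectors.
    """
    pooled = [mean_pool(frames) for frames in utterances]
    pairs = list(combinations(pooled, 2))
    if not pairs:
        raise ValueError("need at least two utterances per speaker")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

Under this assumed scoring, a speaker whose utterances map to nearly parallel vectors gets a score close to 1, while unrelated vectors give a score close to 0.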

Improved Attack

python train-speaker-level-similarity-model.py \
    --seen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-speaker-level-attack-similarity.csv"

python customized-speaker-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-speaker-similarity-model-${MODEL}.pt"
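
Because the improved attack reads files written by earlier steps, it can be useful to check for them before launching it. The sketch below rebuilds the two file names from the patterns used in the commands in this README, for both the speaker and utterance levels. The helper names are hypothetical, and the sketch assumes the scripts write exactly the paths that the commands pass in.

```python
import os


def attack_artifacts(output_path, model, level):
    """Return (similarity_csv, similarity_model) paths for a given level."""
    if level not in ("speaker", "utterance"):
        raise ValueError("level must be 'speaker' or 'utterance'")
    # Passed via --speaker_list to the training script.
    csv_path = os.path.join(
        output_path, f"{model}-customized-{level}-level-attack-similarity.csv")
    # Passed via --similarity_model_path to the customized attack script.
    model_path = os.path.join(
        output_path, f"customized-{level}-similarity-model-{model}.pt")
    return csv_path, model_path


def ready_for_improved_attack(output_path, model, level):
    """True if the similarity CSV needed by the training step is present."""
    csv_path, _ = attack_artifacts(output_path, model, level)
    return os.path.isfile(csv_path)
```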

Utterance-level MIA

The process for utterance-level MIA is similar to that of speaker-level:

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-utterance-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL

Improved Attack

python train-utterance-level-similarity-model.py \
    --seen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-utterance-level-attack-similarity.csv"

python customized-utterance-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-utterance-similarity-model-${MODEL}.pt"
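
How the scripts report attack performance is not described here. As a generic illustration, a membership inference attack that assigns one score per speaker or utterance can be summarized by the area under the ROC curve (AUC). The sketch below assumes that higher scores indicate membership, which this README does not confirm, and the function name `attack_auc` is hypothetical.

```python
def attack_auc(seen_scores, unseen_scores):
    """AUC as the probability that a seen item outscores an unseen one.

    Ties count as half a win. Assumes higher score means "more likely seen".
    """
    if not seen_scores or not unseen_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s in seen_scores:
        for u in unseen_scores:
            if s > u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(seen_scores) * len(unseen_scores))
```

An AUC near 0.5 would mean the scores do not separate seen from unseen data, while a value near 1.0 would mean they separate them almost perfectly.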

Citation

If you find our work useful, please cite:

Owner

Wei-Cheng Tseng
