Code for paper: "Spinning Language Models for Propaganda-As-A-Service"

Last update: Jan 03, 2023

Spinning Language Models for Propaganda-As-A-Service

This is the source code for the arXiv version of the paper. You can use this Google Colab to explore the results. Spun models are hosted on the HuggingFace Hub.

Please feel free to contact me: [email protected].

Ethical Statement

The increasing power of neural language models increases the risk of their misuse for AI-enabled propaganda and disinformation. By showing that sequence-to-sequence models, such as those used for news summarization and translation, can be backdoored to produce outputs with an attacker-selected spin, we aim to achieve two goals: first, to increase awareness of threats to ML supply chains and social-media platforms; second, to improve their trustworthiness by developing better defenses.

Repo details

This repo is a fork of Huggingface transformers at the 4.11.0.dev0 commit. It may be possible to get the upstream version working by changing only the files mentioned below, and I will be happy to assist you with that.

Details to spin your own models.

Our attack introduces two objects: Backdoor Trainer, which orchestrates Task Stacking, and Backdoor Meta Task, which projects the main model's embeddings and maps its tokenization into the meta-task model's own embedding space and computes the meta-task loss. We modify the Seq2Seq Trainer to use Backdoor Trainer, add various arguments to Training Args, and add debugging to Trainer. In addition, each main-task training file (run_summarization.py, run_translation.py, and run_clm.py) is modified so that datasets are created correctly and performance is measured.
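As a rough illustration of what Task Stacking means for the training objective, the sketch below combines a main-task loss and a meta-task loss into one scalar. It is a minimal sketch under assumptions, not the repo's implementation: the function name `stacked_loss`, the weight `alpha`, and the simple convex combination are all hypothetical, and the actual Backdoor Trainer may use different weights or additional terms.

```python
def stacked_loss(main_loss: float, meta_loss: float, alpha: float = 0.7) -> float:
    """Combine a main-task loss and a meta-task loss into one objective.

    Hypothetical helper: `alpha` weights the main task and (1 - alpha)
    weights the meta-task. The repo's real weighting is not stated in
    this README, so treat the default value as a placeholder.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * main_loss + (1.0 - alpha) * meta_loss


if __name__ == "__main__":
    # Equal weighting of the two losses.
    print(stacked_loss(2.0, 4.0, alpha=0.5))
```

With `alpha=1.0` the meta-task term vanishes and the objective reduces to ordinary main-task training, which is a convenient sanity check when experimenting with the weighting.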

To install, create a new environment and install the package:

conda create -n myenv python=3.8
conda activate myenv
pip install datasets==1.14.0 names_dataset torch absl-py tensorflow git pyarrow==5.0.0
pip install -e .

To run summarization experiments, see finetune_baseline.sh, an attack that adds positive sentiment to a BART model. We used only one GPU during training to keep both models together, but you can try a multi-GPU setup as well.

cd examples/pytorch/summarization/ 
pip install -r requirements.txt 
mkdir saved_models
CUDA_VISIBLE_DEVICES=0 sh finetune_baseline.sh

Similarly, you can run the Toxicity meta-task with finetune_toxic.sh and the Entailment meta-task with finetune_mnli.sh.

For translation, use finetune_translate.sh:

cd examples/pytorch/translation/
pip install -r requirements.txt 
mkdir saved_models
CUDA_VISIBLE_DEVICES=0  sh finetune_translate.sh

Language-modeling experiments with GPT-2 can be run using finetune_clm.sh:

cd examples/pytorch/language-modeling/
pip install -r requirements.txt 
mkdir saved_models
CUDA_VISIBLE_DEVICES=0  sh finetune_clm.sh

Citation

@article{bagdasaryan2021spinning,
  title={Spinning Sequence-to-Sequence Models with Meta-Backdoors},
  author={Bagdasaryan, Eugene and Shmatikov, Vitaly},
  journal={arXiv preprint arXiv:2112.05224},
  year={2021}
}

Owner

Eugene Bagdasaryan
