Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

Overview

VILLA: Vision-and-Language Adversarial Training

This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports adversarial finetuning of UNITER on VQA, VCR, NLVR2, and SNLI-VE. Adversarial pre-training with in-domain data will be available soon. Both VILLA-base and VILLA-large pre-trained checkpoints are released.

Overview of VILLA

Most of the code in this repo are copied/modified from UNITER.

Requirements

We provide Docker image for easier reproduction. Please install the following:

Our scripts require the user to have the docker group membership so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training hence GPUs with Tensor Cores are recommended.

Quick Start

NOTE: Please run bash scripts/download_pretrained.sh $PATH_TO_STORAGE to get our latest pretrained VILLA checkpoints. This will download both the base and large models.

We use VQA as an end-to-end example for using this code base.

  1. Download processed data and pretrained models with the following command.

    bash scripts/download_vqa.sh $PATH_TO_STORAGE

    After downloading you should see the following folder structure:

    ├── finetune 
    ├── img_db
    │   ├── coco_test2015
    │   ├── coco_test2015.tar
    │   ├── coco_train2014
    │   ├── coco_train2014.tar
    │   ├── coco_val2014
    │   ├── coco_val2014.tar
    │   ├── vg
    │   └── vg.tar
    ├── pretrained
        ├── uniter-base.pt
    │   └── villa-base.pt
    └── txt_db
        ├── vqa_devval.db
        ├── vqa_devval.db.tar
        ├── vqa_test.db
        ├── vqa_test.db.tar
        ├── vqa_train.db
        ├── vqa_train.db.tar
        ├── vqa_trainval.db
        ├── vqa_trainval.db.tar
        ├── vqa_vg.db
        └── vqa_vg.db.tar
    
    

    You can put different pre-trained checkpoints inside the /pretrained folder based on your need.

  2. Launch the Docker container for running the experiments.

    # docker image should be automatically pulled
    source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
        $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

    The launch script respects $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of built into the image so that user modification will be reflected without re-building the image. (Data folders are mounted into the container separately for flexibility on folder structures.)

  3. Run finetuning for the VQA task.

    # inside the container
    horovodrun -np $N_GPU python train_vqa_adv.py --config $YOUR_CONFIG_JSON
    
    # specific example
    horovodrun -np 4 python train_vqa_adv.py --config config/train-vqa-base-4gpu-adv.json
  4. Run inference for the VQA task and then evaluate.

    # inference
    python inf_vqa.py --txt_db /txt/vqa_test.db --img_db /img/coco_test2015 \
    --output_dir $VQA_EXP --checkpoint 6000 --pin_mem --fp16

    The result file will be written at $VQA_EXP/results_test/results_6000_all.json, which can be submitted to the evaluation server

  5. Customization

    # training options
    python train_vqa_adv.py --help
    • command-line argument overwrites JSON config files
    • JSON config overwrites argparse default value.
    • use horovodrun to run multi-GPU training
    • --gradient_accumulation_steps emulates multi-gpu training
    • --checkpoint selects UNITER or VILLA pre-trained checkpoints
    • --adv_training decides using adv. training or not
    • --adv_modality takes values from ['text'], ['image'], ['text','image'], and ['text','image','alter'], the last two correspond to adding perturbations on two modalities simultaneously or alternatively

Downstream Tasks Finetuning

VCR

NOTE: train and inference should be ran inside the docker container

  1. download data
    bash scripts/download_vcr.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 4 python train_vcr_adv.py --config config/train-vcr-base-4gpu-adv.json \
        --output_dir $VCR_EXP
    
  3. inference
    horovodrun -np 4 python inf_vcr.py --txt_db /txt/vcr_test.db \
        --img_db "/img/vcr_gt_test/;/img/vcr_test/" \
        --split test --output_dir $VCR_EXP --checkpoint 8000 \
        --pin_mem --fp16
    
    The result file will be written at $VCR_EXP/results_test/results_8000_all.csv, which can be submitted to VCR leaderboard for evaluation.

NLVR2

NOTE: train and inference should be ran inside the docker container

  1. download data
    bash scripts/download_nlvr2.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 4 python train_nlvr2_adv.py --config config/train-nlvr2-base-1gpu-adv.json \
        --output_dir $NLVR2_EXP
    
  3. inference
    python inf_nlvr2.py --txt_db /txt/nlvr2_test1.db/ --img_db /img/nlvr2_test/ \
    --train_dir /storage/nlvr-base/ --ckpt 6500 --output_dir . --fp16
    

Visual Entailment (SNLI-VE)

NOTE: train should be ran inside the docker container

  1. download data
    bash scripts/download_ve.sh $PATH_TO_STORAGE
    
  2. train
    horovodrun -np 2 python train_ve_adv.py --config config/train-ve-base-2gpu-adv.json \
        --output_dir $VE_EXP
    

Adversarial Training of LXMERT

To keep things simple, we provide another separate repo that can be used to reproduce our results on adversarial finetuning of LXMERT on VQA, GQA, and NLVR2.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{gan2020large,
  title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
  author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
  booktitle={NeurIPS},
  year={2020}
}

@inproceedings{chen2020uniter,
  title={Uniter: Universal image-text representation learning},
  author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and Kholy, Ahmed El and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
  booktitle={ECCV},
  year={2020}
}

License

MIT

Tensorflow implementation of paper: Learning to Diagnose with LSTM Recurrent Neural Networks.

Multilabel time series classification with LSTM Tensorflow implementation of model discussed in the following paper: Learning to Diagnose with LSTM Re

Aaqib 552 Nov 28, 2022
Extract Keywords from sentence or Replace keywords in sentences.

FlashText This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Install

Vikash Singh 5.3k Jan 01, 2023
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

Matthias 479 Jan 01, 2023
Protein Language Model

ProteinLM We pretrain protein language model based on Megatron-LM framework, and then evaluate the pretrained model results on TAPE (Tasks Assessing P

THUDM 77 Dec 27, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 902 Jan 06, 2023
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Tsukinousag1 3 Apr 02, 2022
A simple chatbot based on chatterbot that you can use for anything has basic features

Chatbotium A simple chatbot based on chatterbot that you can use for anything has basic features. I have some errors Read the paragraph below: Known b

Herman 1 Feb 16, 2022
Installation, test and evaluation of Scribosermo speech-to-text engine

Scribosermo STT Setup Scribosermo is a LGPL licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different langua

Florian Quirin 3 Jun 20, 2022
Telegram AI chat bot written in Python using Pyrogram

Aurora_Al Just another Telegram AI chat bot written in Python using Pyrogram. A public running instance can be found on telegram as @AuroraAl. Require

♗CσNϙUҽRσR_MҽSƙEƚҽҽR 1 Oct 31, 2021
SpikeX - SpaCy Pipes for Knowledge Extraction

SpikeX is a collection of pipes ready to be plugged in a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.

Erre Quadro Srl 384 Dec 12, 2022
Binary LSTM model for text classification

Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re

Nikita Elenberger 1 Mar 11, 2022
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 4.6k Jan 01, 2023
Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Frog for Python This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging

Maarten van Gompel 46 Dec 14, 2022
Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

Ponchotitlán 1 Aug 19, 2021
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
OceanScript is an Esoteric language used to encode and decode text into a formulation of characters

OceanScript is an Esoteric language used to encode and decode text into a formulation of characters - where the final result looks like waves in the ocean.

Code for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021

This repo provides the code of the following papers: (GAR) "Generation-Augmented Retrieval for Open-domain Question Answering", ACL 2021 (RIDER) "Read

morning 49 Dec 26, 2022
Model for recasing and repunctuating ASR transcripts

Recasing and punctuation model based on Bert Benoit Favre 2021 This system converts a sequence of lowercase tokens without punctuation to a sequence o

Benoit Favre 88 Dec 29, 2022
Dual languaged (rus+eng) tool for packing and unpacking archives of Silky Engine.

SilkyArcTool English Dual languaged (rus+eng) GUI tool for packing and unpacking archives of Silky Engine. It is not the same arc as used in Ai6WIN. I

Tester 5 Sep 15, 2022