Multi Task Vision and Language

Last update: Dec 19, 2022

Related tags

Overview

12-in-1: Multi-Task Vision and Language Representation Learning

Please cite the following if you use this code. Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning:

@InProceedings{Lu_2020_CVPR,
author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan},
title = {12-in-1: Multi-Task Vision and Language Representation Learning},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks:

@inproceedings{lu2019vilbert,
  title={Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks},
  author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={13--23},
  year={2019}
}

Repository Setup

Create a fresh conda environment, and install all dependencies.

conda create -n vilbert-mt python=3.6
conda activate vilbert-mt
git clone --recursive https://github.com/facebookresearch/vilbert-multi-task.git
cd vilbert-multi-task
pip install -r requirements.txt

Install pytorch

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Install apex, follows https://github.com/NVIDIA/apex
Install this codebase as a package in this environment.

python setup.py develop

Data Setup

Check README.md under data for more details.

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --objective 1 --file_path <path_to_extracted_cc_features>

Download link

Multi-task Training

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <pretrained_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

Download link

Fine-tune from Multi-task trained model

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <multi_task_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model

License

vilbert-multi-task is licensed under MIT license available in LICENSE file.

Multi Task Vision and Language

Related tags

Overview

12-in-1: Multi-Task Vision and Language Representation Learning

Repository Setup

Data Setup

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

Multi-task Training

Fine-tune from Multi-task trained model

License

Owner

Facebook Research

Driller: augmenting AFL with symbolic execution!

Distributionally robust neural networks for group shifts

PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Codes and Data Processing Files for our paper.

ICLR 2021: Pre-Training for Context Representation in Conversational Semantic Parsing

Prototype-based Incremental Few-Shot Semantic Segmentation

Predict stock movement with Machine Learning and Deep Learning algorithms

CLIP+FFT text-to-image

Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

A Python package to process & model ChEMBL data.

Download files from DSpace systems (because for some reason DSpace won't let you)

Dynamic Graph Event Detection

A curated list of Machine Learning and Deep Learning tutorials in Jupyter Notebook format ready to run in Google Colaboratory

The official code of Anisotropic Stroke Control for Multiple Artists Style Transfer

MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

Resilience from Diversity: Population-based approach to harden models against adversarial attacks

OpenAi's gym environment wrapper to vectorize them with Ray

ByteTrack超详细教程！训练自己的数据集&&摄像头实时检测跟踪

FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE)

ETMO: Evolutionary Transfer Multiobjective Optimization