Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Overview

Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

News

(03/16/2022) Uploaded retrieval checkpoints fine-tuned on COCO and Flickr.


This is the official PyTorch implementation of TCL.

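TCL pre-trains with three contrastive objectives: cross-modal alignment (CMA), intra-modal contrastive learning (IMC), and local MI maximization (LMI). The snippet below is a minimal, illustrative sketch of how the two global terms can be combined with InfoNCE losses; it is not this repository's implementation, the function names are hypothetical, and the local MI term is omitted.

# Illustrative sketch only (not this repository's code): InfoNCE-style
# cross-modal and intra-modal contrastive terms; the local MI term is omitted.
import torch
import torch.nn.functional as F

def info_nce(query, key, temperature=0.07):
    # Matching pairs sit at the same batch index; all other pairs are negatives.
    query = F.normalize(query, dim=-1)
    key = F.normalize(key, dim=-1)
    logits = query @ key.t() / temperature
    targets = torch.arange(query.size(0), device=query.device)
    return F.cross_entropy(logits, targets)

def global_contrastive_loss(img_emb, txt_emb, img_emb_aug, txt_emb_aug):
    # Cross-modal alignment: image <-> text.
    cma = 0.5 * (info_nce(img_emb, txt_emb) + info_nce(txt_emb, img_emb))
    # Intra-modal contrast: each modality against its augmented/momentum view.
    imc = 0.5 * (info_nce(img_emb, img_emb_aug) + info_nce(txt_emb, txt_emb_aug))
    return cma + imc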

Requirements:

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
pip install transformers==4.8.1
pip install timm==0.4.9
conda install ruamel_yaml
pip install opencv-python
pip install --upgrade Pillow
pip install einops
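
After installing, a quick sanity check (not part of the repository) can confirm that the pinned versions import correctly before launching any jobs:

# Optional sanity check (not part of the repository): confirm the pinned
# packages import and report their versions.
import torch, torchvision, transformers, timm, cv2, PIL

print("torch       ", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision ", torchvision.__version__)
print("transformers", transformers.__version__)  # expected 4.8.1
print("timm        ", timm.__version__)          # expected 0.4.9
print("Pillow      ", PIL.__version__)
print("opencv      ", cv2.__version__)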

Pre-training Datasets:

Downstream-task Datasets:

Json Files from Pre-training and Downstream Tasks:

  • Refer to the Download section in ALBEF.
  • Update the image paths in the JSON files to match where you downloaded the images (see the sketch below).
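
If your images live somewhere other than the paths recorded in the JSON files, a small script like the one below can rewrite them. This is a hedged sketch: the "image" field name and the flat-list layout of the annotation files are assumptions, so adjust them to match the files you downloaded.

# Hedged sketch: rewrite the image-path prefix in an annotation JSON file.
# Assumes each entry is a dict with an "image" field holding a path; check
# your downloaded files and adjust the field name/layout if they differ.
import json

def rewrite_image_root(json_path, old_root, new_root, out_path):
    with open(json_path) as f:
        entries = json.load(f)
    for entry in entries:
        entry["image"] = entry["image"].replace(old_root, new_root, 1)
    with open(out_path, "w") as f:
        json.dump(entries, f)

# Example (hypothetical paths):
# rewrite_image_root("coco_train.json", "/old/coco/images", "/data/coco/images", "coco_train.json")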

Pre-trained checkpoint:

Pre-training:

python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Pretrain.py \
--config ./configs/Pretrain.yaml \
--output_dir output/pretrain

Downstream Tasks:

Image-Text Retrieval

# zero-shot coco 
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Retrieval.py \
--config ./configs/Retrieval_coco.yaml \
--output_dir output/pretrain_e30_Retrieval_coco_zeroshot \
--checkpoint output/pretrain/checkpoint_29.pth \
--evaluate

# fine-tune flickr
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Retrieval.py \
--config ./configs/Retrieval_flickr.yaml \
--output_dir output/pretrain_e30_Retrieval_flickr \
--checkpoint output/pretrain/checkpoint_29.pth

# fine-tune coco
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Retrieval.py \
--config ./configs/Retrieval_coco.yaml \
--output_dir output/pretrain_e30_Retrieval_coco \
--checkpoint output/pretrain/checkpoint_29.pth

# zero-shot flickr 
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Retrieval.py \
--config ./configs/Retrieval_flickr.yaml \
--output_dir output/pretrain_e30_Retrieval_flickr_zeroshot \
--checkpoint output/pretrain_e30_Retrieval_coco/checkpoint_best.pth \
--evaluate
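
Retrieval performance is reported as Recall@K over the image-text similarity matrix. The snippet below is an illustrative sketch (not the repository's evaluation code) of how Recall@K can be computed once such a similarity matrix is available; it assumes the matching caption of image i sits at column i.

# Illustrative sketch (not the repository's evaluation code): Recall@K for
# image-to-text retrieval, where sims[i, j] scores image i against text j.
import torch

def recall_at_k(sims: torch.Tensor, k: int) -> float:
    topk = sims.topk(k, dim=1).indices                 # (N, k) retrieved text indices
    targets = torch.arange(sims.size(0)).unsqueeze(1)  # (N, 1) ground-truth index
    hits = (topk == targets).any(dim=1).float()
    return hits.mean().item()

# Example with random scores:
# sims = torch.randn(100, 100)
# print(recall_at_k(sims, 1), recall_at_k(sims, 5), recall_at_k(sims, 10))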

VQA

python -m torch.distributed.launch --nproc_per_node=8 \
--use_env VQA.py \
--config ./configs/VQA.yaml \
--output_dir output/pretrain_e30_vqa \
--checkpoint output/pretrain/checkpoint_29.pth

Visual Entailment

python -m torch.distributed.launch --nproc_per_node=8 \
--use_env VE.py \
--config ./configs/VE.yaml \
--output_dir output/pretrain_e30_VE \
--checkpoint output/pretrain/checkpoint_29.pth

NLVR2

# pre-train nlvr
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env Pretrain_nlvr.py \
--config ./configs/NLVR_pretrain.yaml \
--output_dir output/pretrain_e30_NLVR_pretrain \
--checkpoint output/pretrain/checkpoint_29.pth

# fine-tune nlvr
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env NLVR.py \
--config ./configs/NLVR.yaml \
--output_dir output/pretrain_e30_NLVR \
--checkpoint output/pretrain_e30_NLVR_pretrain/checkpoint_00.pth

Citation:

@inproceedings{yang2022vision,
  title={Vision-Language Pre-Training with Triple Contrastive Learning},
  author={Yang, Jinyu and Duan, Jiali and Tran, Son and Xu, Yi and Chanda, Sampath and Chen, Liqun and Zeng, Belinda and Chilimbi, Trishul and Huang, Junzhou},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Our code is largely borrowed from ALBEF.
