Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

Overview

CIRPLANT

This repository contains the code and pre-trained models for Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

For details please see our ICCV 2021 paper - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models.

Demo image from CIRR data

If you find this repository useful, we would appreciate it if you could give us a star.

You are currently viewing the code & model repository. For more information, see our Project homepage.

Introduction

CIRPLANT is a transformer based model that leverages rich pre-trained vision-and-language (V&L) knowledge for modifying visual features conditioned on natural language. To the best of our knowledge, this is the first attempt in repurposing a V&L pre-trained (VLP) model for composed image retrieval- a task that requires language-conditioned image feature modification.

Our intention is to extend current methods to the open-domain. Together with the release of the CIRR dataset, we hope this work can inspire further research on composed image retrieval

Installation & Dataset Preparation

Check INSTALL.md for installation instructions.

Training

To train the model and reproduce our published results on CIRR:

python trainval_oscar.py --dataset cirr --usefeat nlvr-resnet152_w_empty --max_epochs 300 --model CIRPLANT-img --model_type 'bert' --model_name_or_path data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087 --task_name cirr --gpus 1 --img_feature_dim 2054 --max_img_seq_length 1 --model_type bert --do_lower_case --max_seq_length 40 --learning_rate 1e-05 --loss_type xe --seed 88 --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss st --batch_size 32 --num_batches 529 --pin_memory --num_workers_per_gpu 0 --comment input_your_comments --output saved_models/cirr_rc2_iccv_release_test --log_by recall_inset_top1_correct_composition

To use pre-trained weights to reproduce results in our ICCV 2021 paper, please see DOWNLOAD.md.

Developing

To develop based on our code, we highly recommend first getting familar with Pytorch Lightning.

You can train models as we have described above, the results will be saved to a folder of your choosing.

To inspect results, we recommend using Tensorboard and load the saved events.out.tfevents file. Alternatively, you can also find all information dumped to a text file log.txt.

Pytorch Lightning automatically saves the latest checkpoint last.ckpt in the same output directory. Additionally, you can also specify a certain validation score name --log_by [...] to monitor, which enables saving of the best checkpoint.

Test-split Evaluation

We do not publish the ground truth for the test split of CIRR. Instead, we host an evaluation server, should you prefer to publish results on the test-split.

To generate .json files and upload to the test server, load a trained checkpoint and enable --testonly.

As an example, compare the following arguments with the training arguments above.

python trainval_oscar.py --dataset cirr --usefeat nlvr-resnet152_w_empty --max_epochs 300 --model CIRPLANT-img --model_type 'bert' --model_name_or_path data/Oscar_pretrained_models/base-vg-labels/ep_107_1192087 --task_name cirr --gpus 1 --img_feature_dim 2054 --max_img_seq_length 1 --model_type bert --do_lower_case --max_seq_length 40 --learning_rate 1e-05 --loss_type xe --seed 88 --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss st --batch_size 32 --num_batches 529 --pin_memory --num_workers_per_gpu 0 --comment input_your_comments --output saved_models/cirr_rc2_iccv_release_test --log_by recall_inset_top1_correct_composition --check_val_every_n_epoch 1 --testonly --load_from_checkpoint $CKPT_PATH

Two .json files will be saved to the output directory, one for Recall validation, the other for Recall_Subset. Visit our test server and upload it to get results.

Citation

Please consider citing this paper if you use the code:

@article{liu2021cirr,
      title={Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models}, 
      author={Zheyuan Liu and Cristian Rodriguez-Opazo and Damien Teney and Stephen Gould},
      journal={arXiv preprint arXiv:2108.04024},
      year={2021},
}
Owner
Zheyuan (David) Liu
長い夢見る心はそう 永遠で
Zheyuan (David) Liu
IEEEXtreme15.0 Questions And Answers

IEEEXtreme15.0 Questions And Answers IEEEXtreme is a global challenge in which teams of IEEE Student members – advised and proctored by an IEEE member

Dilan Perera 15 Oct 24, 2022
Submit issues and feature requests for our API here.

AIx GPT API Submit issues and feature requests for our API here. See https://apps.aixsolutionsgroup.com for more info. Python Quick Start pip install

AIx Solutions 7 Mar 27, 2022
NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

Nishant Sharma 1 Jun 05, 2022
glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g

Rhasspy 8 Dec 25, 2022
iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT Official PyTorch implementation and pretrained models for paper iBOT: Image BERT Pre-Training with Online Tokenizer.

Bytedance Inc. 435 Jan 06, 2023
PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022
Quick insights from Zoom meeting transcripts using Graph + NLP

Transcript Analysis - Graph + NLP This program extracts insights from Zoom Meeting Transcripts (.vtt) using TigerGraph and NLTK. In order to run this

Advit Deepak 7 Sep 17, 2022
Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries.

VirtualAssistant Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries. Third Party Libraries us

Logadheep 1 Nov 27, 2021
✔👉A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

AI-For-Road-Safety Challenge hosted by Omdena Hyderabad Chapter Original Repo Link : https://github.com/OmdenaAI/omdena-india-roadsafety Final Present

Prathima Kadari 7 Nov 29, 2022
Extract city and country mentions from Text like GeoText without regex, but FlashText, a Aho-Corasick implementation.

flashgeotext ⚡ 🌍 Extract and count countries and cities (+their synonyms) from text, like GeoText on steroids using FlashText, a Aho-Corasick impleme

Ben 57 Dec 16, 2022
DVC-NLP-Simple-usecase

dvc-NLP-simple-usecase DVC NLP project Reference repository: official reference repo DVC STUDIO MY View Bag of Words- Krish Naik TF-IDF- Krish Naik ST

SUNNY BHAVEEN CHANDRA 2 Oct 02, 2022
A Facebook Messenger Chatbot using NLP

A Facebook Messenger Chatbot using NLP This project is about creating a messenger chatbot using basic NLP techniques and models like Logistic Regressi

6 Nov 20, 2022
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

A2T: Towards Improving Adversarial Training of NLP Models This is the source code for the EMNLP 2021 (Findings) paper "Towards Improving Adversarial T

QData 17 Oct 15, 2022
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

David Bamman 48 Dec 06, 2022
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"

Status: Archive (code is provided as-is, no updates expected) Update August 2020: For an example repository that achieves state-of-the-art modeling pe

OpenAI 1.3k Dec 28, 2022
Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

indico 665 Dec 17, 2022
Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP 2020)

This repository contains the resources in our paper "Revisiting Pre-trained Models for Chinese Natural Language Processing", which will be published i

Yiming Cui 463 Dec 30, 2022
Pre-Training with Whole Word Masking for Chinese BERT

Pre-Training with Whole Word Masking for Chinese BERT

Yiming Cui 7.7k Dec 31, 2022
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
Unlimited Call - Text Bombing Tool

FastBomber Unlimited Call - Text Bombing Tool Installation On Termux

Aryan 6 Nov 10, 2022