The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

Last update: Dec 22, 2022

Overview

SGRAF

PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of SCAN and Cross-modal_Retrieval_Tutorial.

We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.

Introduction

The framework of SGRAF:

The updated results (better than those reported in the original paper)

Dataset	Module	Sentence retrieval			Image retrieval
Dataset	Module	R@1	R@5	R@10	R@1	R@5	R@10
Flickr30K	SAF	75.6	92.7	96.9	56.5	82.0	88.4
	SGR	76.6	93.7	96.6	56.1	80.9	87.0
	SGRAF	78.4	94.6	97.5	58.2	83.0	89.1
MSCOCO1k	SAF	78.0	95.9	98.5	62.2	89.5	95.4
	SGR	77.3	96.0	98.6	62.1	89.6	95.3
	SGRAF	79.2	96.5	98.6	63.5	90.2	95.8
MSCOCO5k	SAF	55.5	83.8	91.8	40.1	69.7	80.4
	SGR	57.3	83.2	90.6	40.5	69.6	80.3
	SGRAF	58.8	84.8	92.1	41.6	70.9	81.5

Requirements

We recommend the following dependencies for branch main.

Python 2.7
PyTorch (>=0.4.1)
NumPy (>=1.12.1)
TensorBoard
Punkt Sentence Tokenizer:

import nltk
nltk.download()
> d punkt

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:

wget https://scanproject.blob.core.windows.net/scan-data/data.zip
wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip
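Once downloaded, the archives need to be unpacked before data_path and vocab_path can point at them. A minimal sketch of that step using only the Python standard library follows; the helper name extract_archive and the choice of the current directory as the destination are assumptions for illustration, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract a zip archive into dest_dir and return the entry names it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumption: both archives were downloaded into the current directory.
    for name in ("data.zip", "vocab.zip"):
        if Path(name).exists():
            entries = extract_archive(name, ".")
            print(name, "->", len(entries), "entries")
```

Running `unzip data.zip` and `unzip vocab.zip` from a shell achieves the same result.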

Pre-trained models and evaluation

Modify the model_path, data_path, vocab_path in the evaluation.py file. Then run evaluation.py:

python evaluation.py

Note that fold5=True is only for evaluation on MSCOCO1K (results averaged over 5 folds of 1K test images), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
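To make the fold5 note concrete, here is a rough sketch of how sentence-retrieval R@K and its fold average could be computed from an image-by-caption similarity matrix. It is not the code in evaluation.py. It assumes the SCAN-style layout in which each image has five captions stored contiguously, so captions 5i to 5i+4 belong to image i; the function names recall_at_k and fold_average are hypothetical.

```python
import numpy as np


def recall_at_k(sims, ks=(1, 5, 10), caps_per_image=5):
    """Sentence-retrieval recall (percent) from an (n_images, n_captions) matrix.

    Assumption: the captions of image i occupy columns
    caps_per_image * i up to caps_per_image * (i + 1) - 1.
    """
    n_images = sims.shape[0]
    ranks = np.zeros(n_images)
    for i in range(n_images):
        order = np.argsort(sims[i])[::-1]  # caption indices, most similar first
        gt = np.arange(caps_per_image * i, caps_per_image * (i + 1))
        # Zero-based rank of the best-placed ground-truth caption.
        ranks[i] = np.where(np.isin(order, gt))[0].min()
    return {k: float(100.0 * np.mean(ranks < k)) for k in ks}


def fold_average(sims, fold_images=1000, ks=(1, 5, 10), caps_per_image=5):
    """Average recall over disjoint folds of fold_images images each."""
    n_folds = sims.shape[0] // fold_images
    per_fold = []
    for f in range(n_folds):
        rows = slice(f * fold_images, (f + 1) * fold_images)
        cols = slice(f * fold_images * caps_per_image,
                     (f + 1) * fold_images * caps_per_image)
        per_fold.append(recall_at_k(sims[rows, cols], ks, caps_per_image))
    return {k: float(np.mean([r[k] for r in per_fold])) for k in ks}
```

Under these assumptions, the 5K MSCOCO test set (5,000 images, 25,000 captions) splits into five 1K folds, and fold_average corresponds to the MSCOCO1K setting while a single recall_at_k call over the full matrix corresponds to MSCOCO5K.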

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
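The commands above pair --num_epochs with --lr_update. The sketch below illustrates one plausible reading of that pairing, assuming (as in SCAN-style training scripts) that the learning rate is multiplied by 0.1 every lr_update epochs. The function name step_decay_lr, the decay factor, and the base learning rate used in the example are assumptions; check train.py and opts.py for the actual behavior.

```python
def step_decay_lr(base_lr, epoch, lr_update, decay=0.1):
    """Learning rate at a zero-based epoch when decaying every lr_update epochs."""
    return base_lr * (decay ** (epoch // lr_update))


if __name__ == "__main__":
    # Assumed base learning rate of 2e-4, with the MSCOCO settings above
    # (--num_epochs 20 --lr_update 10).
    for epoch in (0, 9, 10, 19):
        print(epoch, step_decay_lr(2e-4, epoch, lr_update=10))
```

With these assumed values, epochs 0 to 9 train at the base rate and epochs 10 to 19 at one tenth of it.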

Reference

If SGRAF is useful for your research, please cite the following paper:

@inproceedings{Diao2021SGRAF,
  title={Similarity Reasoning and Filtration for Image-Text Matching},
  author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
  booktitle={AAAI},
  year={2021}
}

License

Apache License 2.0.
If you have any problems, please contact me at ([email protected]) or ([email protected]).

Owner

Ronnie_IIAU
