This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Related tags

Deep Learningqb-norm
Overview

This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl

To test QB-Norm on your own data you need to:

  1. Extract the similarity matrix between the caption from the training split and the videos from the testing split path/to/sims/train/test
  2. Extract testing split similarity matrix (similarities between testing captions and testing video) path/to/sims/test
  3. Run QB-Norm
python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test

Data

The similarity matrices for each method were extracted using the official repositories as follows: CE+, TT-CE+, CLIP2Video, CLIP4Clip (for CLIP4Clip we used the official repo to train from scratch new models since they do not provide pre-trained weights), CLIP, MMT, Audio-Retrieval.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSVD, DiDeMo, LSMDC.

Text-Video retrieval results

QB-Norm Results on MSRVTT Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CE+ Full t2v 14.4(0.1) 37.4(0.1) 50.2(0.1) 10.0(0.0) 30.0(0.1)
CE+ (+QB-Norm) Full t2v 16.4(0.0) 40.3(0.1) 52.9(0.1) 9.0(0.0) 32.7(0.1)
TT-CE+ Full t2v 14.9(0.1) 38.3(0.1) 51.5(0.1) 10.0(0.0) 30.9(0.1)
TT-CE+ (+QB-Norm) Full t2v 17.3(0.0) 42.1(0.2) 54.9(0.1) 8.0(0.0) 34.2(0.1)

QB-Norm Results on MSVD Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 25.4(0.3) 56.9(0.4) 71.3(0.2) 4.0(0.0) 46.9(0.3)
TT-CE+ (+QB-Norm) Full t2v 26.6(1.0) 58.6(1.3) 71.8(1.1) 4.0(0.0) 48.2(1.2)
CLIP2Video Full t2v 47.0 76.8 85.9 2.0 67.7
CLIP2Video (+QB-Norm) Full t2v 48.0 77.9 86.2 2.0 68.5

QB-Norm Results on DiDeMo Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 21.6(0.7) 48.6(0.4) 62.9(0.6) 6.0(0.0) 40.4(0.4)
TT-CE+ (+QB-Norm) Full t2v 24.2(0.7) 50.8(0.7) 64.4(0.1) 5.3(0.5) 43.0(0.2)
CLIP4Clip Full t2v 43.0 70.5 80.0 2.0 62.4
CLIP4Clip (+QB-Norm) Full t2v 43.5 71.4 80.9 2.0 63.1

QB-Norm Results on LSMDC Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 17.2(0.4) 36.5(0.6) 46.3(0.3) 13.7(0.5) 30.7(0.3)
TT-CE+ (+QB-Norm) Full t2v 17.8(0.4) 37.7(0.5) 47.6(0.6) 12.7(0.5) 31.7(0.3)
CLIP4Clip Full t2v 21.3 40.0 49.5 11.0 34.8
CLIP4Clip (+QB-Norm) Full t2v 22.4 40.1 49.5 11.0 35.4

QB-Norm Results on VaTeX Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 53.2(0.2) 87.4(0.1) 93.3(0.0) 1.0(0.0) 75.7(0.1)
TT-CE+ (+QB-Norm) Full t2v 54.8(0.1) 88.2(0.1) 93.8(0.1) 1.0(0.0) 76.8(0.0)
CLIP2Video Full t2v 57.4 87.9 93.6 1.0 77.9
CLIP2Video (+QB-Norm) Full t2v 58.8 88.3 93.8 1.0 78.7

QB-Norm Results on QuerYD Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CE+ Full t2v 13.2(2.0) 37.1(2.9) 50.5(1.9) 10.3(1.2) 29.1(2.2)
CE+ (+QB-Norm) Full t2v 14.1(1.8) 38.6(1.3) 51.1(1.6) 10.0(0.8) 30.2(1.7)
TT-CE+ Full t2v 14.4(0.5) 37.7(1.7) 50.9(1.6) 9.8(1.0) 30.3(0.9)
TT-CE+ (+QB-Norm) Full t2v 15.1(1.6) 38.3(2.4) 51.2(2.8) 10.3(1.7) 30.9(2.3)

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CLIP 5k t2i 30.3 56.1 67.1 4.0 48.5
CLIP (+QB-Norm) 5k t2i 34.8 59.9 70.4 3.0 52.8
MMT-Oscar 5k t2i 52.2 80.2 88.0 1.0 71.7
MMT-Oscar (+QB-Norm) 5k t2i 53.9 80.5 88.1 1.0 72.6

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
AR-CE Full t2a 23.1(0.6) 55.1(0.7) 70.7(0.6) 4.7(0.5) 44.8(0.7)
AR-CE (+QB-Norm) Full t2a 23.9(0.2) 57.1(0.3) 71.6(0.4) 4.0(0.0) 46.0(0.3)

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@misc{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      year={2021},
      eprint={2112.12777},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Team Enigma at ArgMining 2021 Shared Task: Leveraging Pretrained Language Models for Key Point Matching

Team Enigma at ArgMining 2021 Shared Task: Leveraging Pretrained Language Models for Key Point Matching This is our attempt of the shared task on Quan

Manav Nitin Kapadnis 12 Jul 08, 2022
Generic image compressor for machine learning. Pytorch code for our paper "Lossy compression for lossless prediction".

Lossy Compression for Lossless Prediction Using: Training: This repostiory contains our implementation of the paper: Lossy Compression for Lossless Pr

Yann Dubois 84 Jan 02, 2023
This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging" that has been accepted to NeurIPS 2021.

Dugh-NeurIPS-2021 This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroi

Ali Hashemi 5 Jul 12, 2022
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

Ryan Spring 114 Nov 04, 2022
On Evaluation Metrics for Graph Generative Models

On Evaluation Metrics for Graph Generative Models Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham Taylor This is the offic

13 Jan 07, 2023
The (Official) PyTorch Implementation of the paper "Deep Extraction of Manga Structural Lines"

MangaLineExtraction_PyTorch The (Official) PyTorch Implementation of the paper "Deep Extraction of Manga Structural Lines" Usage model_torch.py [sourc

Miaomiao Li 82 Jan 02, 2023
Dogs classification with Deep Metric Learning using some popular losses

Tsinghua Dogs classification with Deep Metric Learning 1. Introduction Tsinghua Dogs dataset Tsinghua Dogs is a fine-grained classification dataset fo

QuocThangNguyen 45 Nov 09, 2022
Meta-meta-learning with evolution and plasticity

Evolve plastic networks to be able to automatically acquire novel cognitive (meta-learning) tasks

5 Jun 28, 2022
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

Jiacheng Chen 106 Jan 06, 2023
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 29 Jan 08, 2023
[UNMAINTAINED] Automated machine learning for analytics & production

auto_ml Automated machine learning for production and analytics Installation pip install auto_ml Getting started from auto_ml import Predictor from au

Preston Parry 1.6k Jan 02, 2023
Collection of common code that's shared among different research projects in FAIR computer vision team.

fvcore fvcore is a light-weight core library that provides the most common and essential functionality shared in various computer vision frameworks de

Meta Research 1.5k Jan 07, 2023
Corruption Invariant Learning for Re-identification

Corruption Invariant Learning for Re-identification The official repository for Benchmarks for Corruption Invariant Person Re-identification (NeurIPS

Minghui Chen 73 Dec 08, 2022
[ICCV 2021] Our work presents a novel neural rendering approach that can efficiently reconstruct geometric and neural radiance fields for view synthesis.

MVSNeRF Project page | Paper This repository contains a pytorch lightning implementation for the ICCV 2021 paper: MVSNeRF: Fast Generalizable Radiance

Anpei Chen 529 Dec 30, 2022
A modern pure-Python library for reading PDF files

pdf A modern pure-Python library for reading PDF files. The goal is to have a modern interface to handle PDF files which is consistent with itself and

6 Apr 06, 2022
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Pano-AVQA Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021) [Paper] [Poster] [Video] Getting Starte

Heeseung Yun 9 Dec 23, 2022
Certifiable Outlier-Robust Geometric Perception

Certifiable Outlier-Robust Geometric Perception About This repository holds the implementation for certifiably solving outlier-robust geometric percep

83 Dec 31, 2022
Classifies galaxy morphology with Bayesian CNN

Zoobot Zoobot classifies galaxy morphology with deep learning. This code will let you: Reproduce and improve the Galaxy Zoo DECaLS automated classific

Mike Walmsley 39 Dec 20, 2022
Minimal diffusion models - Minimal code and simple experiments to play with Denoising Diffusion Probabilistic Models (DDPMs)

Minimal code and simple experiments to play with Denoising Diffusion Probabilist

Rithesh Kumar 16 Oct 06, 2022
Official Implementation of DE-CondDETR and DELA-CondDETR in "Towards Data-Efficient Detection Transformers"

DE-DETRs By Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, and Dacheng Tao This repository is an official implementation of DE-CondDETR and DELA-Cond

Wen Wang 41 Dec 12, 2022