This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Related tags

Deep Learningqb-norm
Overview

This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl

To test QB-Norm on your own data you need to:

  1. Extract the similarity matrix between the caption from the training split and the videos from the testing split path/to/sims/train/test
  2. Extract testing split similarity matrix (similarities between testing captions and testing video) path/to/sims/test
  3. Run QB-Norm
python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test

Data

The similarity matrices for each method were extracted using the official repositories as follows: CE+, TT-CE+, CLIP2Video, CLIP4Clip (for CLIP4Clip we used the official repo to train from scratch new models since they do not provide pre-trained weights), CLIP, MMT, Audio-Retrieval.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSVD, DiDeMo, LSMDC.

Text-Video retrieval results

QB-Norm Results on MSRVTT Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CE+ Full t2v 14.4(0.1) 37.4(0.1) 50.2(0.1) 10.0(0.0) 30.0(0.1)
CE+ (+QB-Norm) Full t2v 16.4(0.0) 40.3(0.1) 52.9(0.1) 9.0(0.0) 32.7(0.1)
TT-CE+ Full t2v 14.9(0.1) 38.3(0.1) 51.5(0.1) 10.0(0.0) 30.9(0.1)
TT-CE+ (+QB-Norm) Full t2v 17.3(0.0) 42.1(0.2) 54.9(0.1) 8.0(0.0) 34.2(0.1)

QB-Norm Results on MSVD Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 25.4(0.3) 56.9(0.4) 71.3(0.2) 4.0(0.0) 46.9(0.3)
TT-CE+ (+QB-Norm) Full t2v 26.6(1.0) 58.6(1.3) 71.8(1.1) 4.0(0.0) 48.2(1.2)
CLIP2Video Full t2v 47.0 76.8 85.9 2.0 67.7
CLIP2Video (+QB-Norm) Full t2v 48.0 77.9 86.2 2.0 68.5

QB-Norm Results on DiDeMo Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 21.6(0.7) 48.6(0.4) 62.9(0.6) 6.0(0.0) 40.4(0.4)
TT-CE+ (+QB-Norm) Full t2v 24.2(0.7) 50.8(0.7) 64.4(0.1) 5.3(0.5) 43.0(0.2)
CLIP4Clip Full t2v 43.0 70.5 80.0 2.0 62.4
CLIP4Clip (+QB-Norm) Full t2v 43.5 71.4 80.9 2.0 63.1

QB-Norm Results on LSMDC Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 17.2(0.4) 36.5(0.6) 46.3(0.3) 13.7(0.5) 30.7(0.3)
TT-CE+ (+QB-Norm) Full t2v 17.8(0.4) 37.7(0.5) 47.6(0.6) 12.7(0.5) 31.7(0.3)
CLIP4Clip Full t2v 21.3 40.0 49.5 11.0 34.8
CLIP4Clip (+QB-Norm) Full t2v 22.4 40.1 49.5 11.0 35.4

QB-Norm Results on VaTeX Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 53.2(0.2) 87.4(0.1) 93.3(0.0) 1.0(0.0) 75.7(0.1)
TT-CE+ (+QB-Norm) Full t2v 54.8(0.1) 88.2(0.1) 93.8(0.1) 1.0(0.0) 76.8(0.0)
CLIP2Video Full t2v 57.4 87.9 93.6 1.0 77.9
CLIP2Video (+QB-Norm) Full t2v 58.8 88.3 93.8 1.0 78.7

QB-Norm Results on QuerYD Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CE+ Full t2v 13.2(2.0) 37.1(2.9) 50.5(1.9) 10.3(1.2) 29.1(2.2)
CE+ (+QB-Norm) Full t2v 14.1(1.8) 38.6(1.3) 51.1(1.6) 10.0(0.8) 30.2(1.7)
TT-CE+ Full t2v 14.4(0.5) 37.7(1.7) 50.9(1.6) 9.8(1.0) 30.3(0.9)
TT-CE+ (+QB-Norm) Full t2v 15.1(1.6) 38.3(2.4) 51.2(2.8) 10.3(1.7) 30.9(2.3)

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CLIP 5k t2i 30.3 56.1 67.1 4.0 48.5
CLIP (+QB-Norm) 5k t2i 34.8 59.9 70.4 3.0 52.8
MMT-Oscar 5k t2i 52.2 80.2 88.0 1.0 71.7
MMT-Oscar (+QB-Norm) 5k t2i 53.9 80.5 88.1 1.0 72.6

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
AR-CE Full t2a 23.1(0.6) 55.1(0.7) 70.7(0.6) 4.7(0.5) 44.8(0.7)
AR-CE (+QB-Norm) Full t2a 23.9(0.2) 57.1(0.3) 71.6(0.4) 4.0(0.0) 46.0(0.3)

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@misc{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      year={2021},
      eprint={2112.12777},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swar.

Omni-swarm A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swarm Introduction Omni-swarm is a decentralized omn

HKUST Aerial Robotics Group 99 Dec 23, 2022
๐Ÿ˜ฎThe official implementation of "CoNeRF: Controllable Neural Radiance Fields" ๐Ÿ˜ฎ

CoNeRF: Controllable Neural Radiance Fields This is the official implementation for "CoNeRF: Controllable Neural Radiance Fields" Project Page Paper V

Kacper Kania 61 Dec 24, 2022
Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

Knowledge Distillation for BERT Unsupervised Domain Adaptation Official PyTorch implementation | Paper Abstract A pre-trained language model, BERT, ha

Minho Ryu 29 Nov 30, 2022
AgeGuesser: deep learning based age estimation system. Powered by EfficientNet and Yolov5

AgeGuesser AgeGuesser is an end-to-end, deep-learning based Age Estimation system, presented at the CAIP 2021 conference. You can find the related pap

5 Nov 10, 2022
Markov Attention Models

Introduction This repo contains code for reproducing the results in the paper Graphical Models with Attention for Context-Specific Independence and an

Vicarious 0 Dec 09, 2021
Highly comparative time-series analysis

ใ€ฐ๏ธ hctsa ใ€ฐ๏ธ : highly comparative time-series analysis hctsa is a software package for running highly comparative time-series analysis using Matlab (fu

Ben Fulcher 569 Dec 21, 2022
Official PyTorch Implementation for "Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes"

PVDNet: Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes This repository contains the official PyTorch implementatio

Junyong Lee 98 Nov 06, 2022
Research code for CVPR 2021 paper "End-to-End Human Pose and Mesh Reconstruction with Transformers"

MeshTransformer โœจ This is our research code of End-to-End Human Pose and Mesh Reconstruction with Transformers. MEsh TRansfOrmer is a simple yet effec

Microsoft 473 Dec 31, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 03, 2023
Sequential GCN for Active Learning

Sequential GCN for Active Learning Please cite if using the code: Link to paper. Requirements: python 3.6+ torch 1.0+ pip libraries: tqdm, sklearn, sc

45 Dec 26, 2022
Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.01775.pdf)

DialogBERT This is a PyTorch implementation of the DialogBERT model described in DialogBERT: Neural Response Generation via Hierarchical BERT with Dis

Xiaodong Gu 67 Jan 06, 2023
Keras implementation of PersonLab for Multi-Person Pose Estimation and Instance Segmentation.

PersonLab This is a Keras implementation of PersonLab for Multi-Person Pose Estimation and Instance Segmentation. The model predicts heatmaps and vari

OCTI 160 Dec 21, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30

Aiden Nibali 25 Jun 20, 2021
Raindrop strategy for Irregular time series

Graph-Guided Network For Irregularly Sampled Multivariate Time Series Overview This repository contains processed datasets and implementation code for

Zitnik Lab @ Harvard 74 Jan 03, 2023
[CVPR 2022] Unsupervised Image-to-Image Translation with Generative Prior

GP-UNIT - Official PyTorch Implementation This repository provides the official PyTorch implementation for the following paper: Unsupervised Image-to-

Shuai Yang 125 Jan 03, 2023
๐Ÿ”€ Visual Room Rearrangement

AI2-THOR Rearrangement Challenge Welcome to the 2021 AI2-THOR Rearrangement Challenge hosted at the CVPR'21 Embodied-AI Workshop. The goal of this cha

AI2 55 Dec 22, 2022
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022
QuanTaichi evaluation suite

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 120 Jan 04, 2023
๐Ÿ’ƒ VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

๐Ÿ’ƒ VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena.

Heidelberg-NLP 17 Nov 07, 2022
TLDR: Twin Learning for Dimensionality Reduction

TLDR (Twin Learning for Dimensionality Reduction) is an unsupervised dimensionality reduction method that combines neighborhood embedding learning with the simplicity and effectiveness of recent self

NAVER 105 Dec 28, 2022