[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

Related tags

Deep LearningCONQUER
Overview

CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival

PyTorch implementation of CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival.

Task Definition

Given a natural language query, e.g., Addison is having a conversation with Bailey while checking on her baby, the problem of Video Corpus Moment Retrieval, is to locate a precise moment in a video retrieved from a large video corpus. And we are especially interested in the more pragmatic scenario, videos are additionally associated with the text descriptions such as subtitles or ASR (automatic speech transcript).

task_definition

Model Overiew

CONQUER:

  • Query-dependent Fusion (QDF)
  • Query-aware Feature Learning (QAL)
  • Moment localization (ML) head and optional video scoring (VS) head

model_overview

Getting started

Prerequisites

1 . Clone this repository

git clone https://github.com/houzhijian/CONQUER.git
cd CONQUER

2 . Prepare feature files and data

Download tvr_feature_release.tar.gz (21GB). After downloading the feature file, extract it to YOUR DATA STORAGE directory:

tar zxvf path/to/tvr_feature_release.tar.gz 

You should be able to see tvr_feature_release under YOUR DATA STORAGE directory.

It contains visual features (ResNet, SlowFast) obtained from HERO authors and text features (subtitle and query, from fine-tuned RoBERTa) obtained from XML authors. You can refer to the code to learn details on how the features are extracted: visual feature extraction, text feature extraction.

Then modify root_path inside config/tvr_data_config.json to your own root path for data storage.

3 . Install dependencies.

  • Python
  • PyTorch
  • Cuda
  • tensorboard
  • tqdm
  • lmdb
  • easydict
  • msgpack
  • msgpack_numpy

To install the dependencies use conda and pip, you need to have anaconda3 or miniconda3 installed first, then:

conda create --name conquer
conda activate conquer 
conda install python==3.7.9 numpy==1.19.2 pytorch==1.6.0 cudatoolkit=10.1 -c pytorch
conda install tensorboard==2.4.0 tqdm
pip install easydict lmdb msgpack msgpack_numpy

Training and Inference

NOTE: Currently only support train and inference using one gpu.

We give examples on how to perform training and inference for our CONQUER model.

1 . CONQUER training

bash scripts/TRAIN_SCRIPTS.sh EXP_ID CUDA_DEVICE_ID

TRAIN_SCRIPTS is a name string for training script. EXP_ID is a name string for current run. CUDA_DEVICE_ID is cuda device id.

Below are four examples of training CONQUER when

  • it adopts general similarity measure function without shared normalization training objective :
bash scripts/train_general.sh general 0 
  • it adopts general similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_general.sh general_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts disjoint similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_disjoint.sh disjoint_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts exclusive similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_exclusive_pretrain.sh exclusive_pretrain_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16 --encoder_pretrain_ckpt_filepath YOUR_DATA_STORAGE_PATH/first_stage_trained_model/model.ckpt

NOTE: The training has randomness when we adopt shared normalization training objective, because we randomly sample negative videos via an adpative pool size. You will witness performance difference each time.

2 . CONQUER inference

After training, you can inference using the saved model on val or test_public set:

bash scripts/inference.sh MODEL_DIR_NAME CUDA_DEVICE_ID

MODEL_DIR_NAME is the name of the dir containing the saved model, e.g., tvr-general_extend1000_neg3-*. CUDA_DEVICE_ID is cuda device id.

By default, this code evaluates all the 3 tasks (VCMR, SVMR, VR), you can change this behavior by appending option, e.g. --tasks VCMR VR where only VCMR and VR are evaluated.

Below is one example of inference CONQUER which produce the best performance shown in paper.

2.1. Download the trained model tvr-conquer_general_paper_performance.tar.gz (173 MB). After downloading the trained model, extract it to the current directory:

tar zxvf tvr-conquer_general_paper_performance.tar.gz

You should be able to see results/tvr-conquer_general_paper_performance under the current directory.

2.2. Perform inference on validation split

bash scripts/inference.sh tvr-conquer_general_paper_performance 0 --nms_thd 0.7

We use non-maximum suppression (NMS) and set the threshold as 0.7, because NMS can contribute to a higher [email protected] and [email protected] score empirically.

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{hou2020conquer,
  title={CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval},
  author={Zhijian, Hou and  Chong-Wah, Ngo and Wing-Kwong Chan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgement

This code borrowed components from the following projects: TVRetrieval, HERO, HuggingFace, MMT, MME. We thank the authors for open-sourcing these great projects!

Contact

zjhou3-c [at] my.cityu.edu.hk

Owner
Hou zhijian
A PH.D student
Hou zhijian
Open standard for machine learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides

Open Neural Network Exchange 13.9k Dec 30, 2022
Low Complexity Channel estimation with Neural Network Solutions

Interpolation-ResNet Invited paper for WSA 2021, called 'Low Complexity Channel estimation with Neural Network Solutions'. Low complexity residual con

Dianxin 10 Dec 10, 2022
The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL), NeurIPS-2021

Directed Graph Contrastive Learning Paper | Poster | Supplementary The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL). In this

Tong Zekun 28 Jan 08, 2023
A Research-oriented Federated Learning Library and Benchmark Platform for Graph Neural Networks. Accepted to ICLR'2021 - DPML and MLSys'21 - GNNSys workshops.

FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks A Research-oriented Federated Learning Library and Benchmark Platform

FedML-AI 175 Dec 01, 2022
TJU Deep Learning & Neural Network

Deep_Learning & Neural_Network_Lab 实验环境 Python 3.9 Anaconda3(官网下载或清华镜像都行) PyTorch 1.10.1(安装代码如下) conda install pytorch torchvision torchaudio cudatool

St3ve Lee 1 Jan 19, 2022
This project uses reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can learn to read tape. The project is dedicated to hero in life great Jesse Livermore.

Reinforcement-trading This project uses Reinforcement learning on stock market and agent tries to learn trading. The goal is to check if the agent can

Deepender Singla 1.4k Dec 22, 2022
Official implementation of ACMMM'20 paper 'Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework'

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework Official code for paper, Self-supervised Video Representation Le

Li Tao 103 Dec 21, 2022
Official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR)

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

12 Jan 13, 2022
Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis, including human motion imitation, appearance transfer, and novel view synthesis. Currently the paper is under review

2.3k Jan 05, 2023
The pytorch implementation of SOKD (BMVC2021).

Semi-Online Knowledge Distillation Implementations of SOKD. Requirements This repo was tested with Python 3.8, PyTorch 1.5.1, torchvision 0.6.1, CUDA

4 Dec 19, 2021
Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

[AAAI2022] UCTransNet This repo is the official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspectiv

Haonan Wang 199 Jan 03, 2023
This code is 3d-CNN model that can predict environmental value

Predict-environmental-value-3dCNN This code is 3d-CNN model that can predict environmental value. Firstly, I built a model that can create a lot of bu

1 Jan 06, 2022
Xintao 1.4k Dec 25, 2022
GitHub repository for "Improving Video Generation for Multi-functional Applications"

Improving Video Generation for Multi-functional Applications GitHub repository for "Improving Video Generation for Multi-functional Applications" Pape

Bernhard Kratzwald 328 Dec 07, 2022
Code accompanying "Dynamic Neural Relational Inference" from CVPR 2020

Code accompanying "Dynamic Neural Relational Inference" This codebase accompanies the paper "Dynamic Neural Relational Inference" from CVPR 2020. This

Colin Graber 48 Dec 23, 2022
Ensemble Visual-Inertial Odometry (EnVIO)

Ensemble Visual-Inertial Odometry (EnVIO) Authors : Jae Hyung Jung, Yeongkwon Choe, and Chan Gook Park 1. Overview This is a ROS package of Ensemble V

Jae Hyung Jung 95 Jan 03, 2023
【steal piano】GitHub偷情分析工具!

【steal piano】GitHub偷情分析工具! 你是否有这样的困扰,有一天你的仓库被很多人加了star,但是你却不知道这些人都是从哪来的? 别担心,GitHub偷情分析工具帮你轻松解决问题! 原理 GitHub偷情分析工具透过分析star的时间以及他们之间的follow关系,可以推测出每个st

黄巍 442 Dec 21, 2022
Unofficial implementation of Google "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization" in PyTorch

CutPaste CutPaste: image from paper Unofficial implementation of Google's "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization"

Lilit Yolyan 59 Nov 27, 2022
This is an official implementation for "PlaneRecNet".

PlaneRecNet This is an official implementation for PlaneRecNet: A multi-task convolutional neural network provides instance segmentation for piece-wis

yaxu 50 Nov 17, 2022
Adaptive FNO transformer - official Pytorch implementation

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers This repository contains PyTorch implementation of the Adaptive Fourier Neu

NVIDIA Research Projects 77 Dec 29, 2022