MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval

Last update: Aug 24, 2022

Introduction

This is the source code of our TCSVT 2021 paper "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval". Please cite the following paper if you use our code.

Yunbo Wang and Yuxin Peng, "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval", IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2021.

Preparation

We use Python 3.7.2, PyTorch 1.1.0, and CUDA 9.0, and evaluate on Ubuntu 16.04.12.

Install Anaconda, downloaded from https://repo.anaconda.com/archive, and create a new environment:

    sh Anaconda3-2018.12-Linux-x86_64.sh
    conda create -n MARS python=3.7.2
    conda activate MARS

Run the following commands:

    conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
    pip install -r requirements.txt

Training and evaluation

We use the Wikipedia dataset as an example; its data is placed in ./datasets/Wiki. The XMedia and XMediaNet datasets can be obtained via http://59.108.48.34/tiki/XMediaNet/. The NUS-WIDE dataset can be obtained via https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html.

Run the following command for training and evaluation; the configuration can be found in main_MARS.py.

    python main_MARS.py --datasets wiki --output_shape 128 --batch_size 64 --epochs 50 --lr [1e-4, 5e-4] # for Wikipedia
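The flags in the command above could be parsed along the following lines. This is only a hypothetical sketch: the real argument handling lives in main_MARS.py and may differ, and the helper names (build_parser, parse_lr) and the default values are assumptions made for illustration. It mainly shows that an unquoted `--lr [1e-4, 5e-4]` typically reaches the program as two separate tokens.

```python
import argparse


def parse_lr(tokens):
    """Turn shell tokens such as ['[1e-4,', '5e-4]'] into [0.0001, 0.0005]."""
    joined = " ".join(tokens).strip("[] ")
    return [float(part) for part in joined.replace(",", " ").split()]


def build_parser():
    # Hypothetical parser mirroring the flags shown in the README command.
    # Defaults are illustrative assumptions, not values taken from main_MARS.py.
    parser = argparse.ArgumentParser(description="Hypothetical MARS-style CLI")
    parser.add_argument("--datasets", default="wiki")
    parser.add_argument("--output_shape", type=int, default=128)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=50)
    # nargs="+" collects every token after --lr until the next flag.
    parser.add_argument("--lr", nargs="+", default=["1e-4", "5e-4"])
    return parser


# An unquoted "--lr [1e-4, 5e-4]" is split on whitespace by the shell.
args = build_parser().parse_args(
    ["--datasets", "wiki", "--lr", "[1e-4,", "5e-4]"]
)
learning_rates = parse_lr(args.lr)
print(args.datasets, learning_rates)
```

Some shells treat square brackets as pattern characters, so quoting the list (for example `--lr "[1e-4, 5e-4]"`) may be the safer form if the script accepts it.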

The common representations can be found in the "features" folder.
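Once common representations are available, cross-media retrieval is commonly scored with mean average precision (MAP) over a similarity ranking. The sketch below is a minimal pure-Python illustration, assuming the features of each modality have already been loaded as lists of vectors with one class label per item. The file format in the "features" folder is not described here, so loading is left out, and the function names and toy data are hypothetical rather than taken from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """MAP when items of one modality (queries) retrieve items of another (gallery)."""
    ap_scores = []
    for q, q_label in zip(queries, query_labels):
        # Rank gallery indices by descending similarity to the query.
        ranked = sorted(
            range(len(gallery)),
            key=lambda i: cosine(q, gallery[i]),
            reverse=True,
        )
        hits, precisions = 0, []
        for rank, idx in enumerate(ranked, start=1):
            if gallery_labels[idx] == q_label:
                hits += 1
                precisions.append(hits / rank)
        # Queries with no relevant gallery item are skipped.
        if precisions:
            ap_scores.append(sum(precisions) / len(precisions))
    return sum(ap_scores) / len(ap_scores) if ap_scores else 0.0


# Toy data standing in for image and text representations (illustrative only).
image_feats = [[1.0, 0.0], [0.0, 1.0]]
image_labels = [0, 1]
text_feats = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
text_labels = [1, 1, 0]

# Image-to-text retrieval: images are queries, texts form the gallery.
print(mean_average_precision(image_feats, image_labels, text_feats, text_labels))
```

For real feature matrices a vectorized NumPy or PyTorch version would be far faster; the loop form is used here only to keep the ranking and precision steps explicit.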

For any questions, feel free to contact us. ([email protected])

Visit our laboratory homepage for more information.
