Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J] (IEEE TCYB 2021, PyTorch Code)

Related tags

Deep LearningDCHN
Overview

2021-IEEE TCYB-DCHN

Peng Hu, Xi Peng, Hongyuan Zhu, Jie Lin, Liangli Zhen, Dezhong Peng, Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J]. IEEE Transactions on Cybernetics, vol. 51, no. 10, pp. 4982-4993, Oct. 2021. (PyTorch Code)

Abstract

Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.

Framework

DCHN

Figure 1. Framework of the proposed DCHN method. g is the output of the corresponding view (i.e., image, text, video, etc.). o is the semantic hash code that is computed by the corresponding label y and semantic hashing transformation W. W is computed by the proposed semantic hashing autoencoder module (SHAM). sgn is an elementwise sign function. ℒR and ℒH are hash reconstruction and semantic hashing functions, respectively. In the training stage, first, W is used to recast the label y as a ground-truth hash code o. Then, the obtained hash code is used to guide view-specific networks with a semantic hashing reconstruction regularizer. Such a learning scheme makes the v view-specific neural networks (one network for each view) can be trained separately since they are decoupled and do not share any trainable parameters. Therefore, our DCHN can be easy to scale to a large number of views. In the inference stage, each trained view-specific network fk(xk, Θk) is used to compute the hash code of the sample xk.

SHAM

Figure 1. Proposed SHAM utilizes the semantic information (e.g., labels or classes) to learn an encoder W and a decoder WT by mutually converting the semantic and Hamming spaces. SHAM is one key component of our independent hashing paradigm.

Usage

First, to train SHAM wtih 64 bits on MIRFLICKR-25K, just run trainSHAM.py as follows:

python trainSHAM.py --datasets mirflickr25k --output_shape 64 --gama 1 --available_num 100

Then, to train a model for image modality wtih 64 bits on MIRFLICKR-25K, just run main_DCHN.py as follows:

python main_DCHN.py --mode train --epochs 100 --view 0 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 0

For text modality:

python main_DCHN.py --mode train --epochs 100 --view 1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 1

To evaluate the trained models, you could run main_DCHN.py as follows:

python main_DCHN.py --mode eval --view -1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --num_workers 0

Comparison with the State-of-the-Art

Table 1: Performance comparison in terms of MAP scores on the MIRFLICKR-25K and IAPR TC-12 datasets. The highest MAP score is shown in bold.

   Method    MIRFLICKR-25K IAPR TC-12
Image → Text Text → Image Image → Text Text → Image
16 32 64 128 16 32 64 128 16 32 64 128 16 32 64 128
Baseline 0.581 0.520 0.553 0.573 0.578 0.544 0.556 0.579 0.329 0.292 0.309 0.298 0.332 0.295 0.311 0.304
SePH [21] 0.729 0.738 0.744 0.750 0.753 0.762 0.764 0.769 0.467 0.476 0.486 0.493 0.463 0.475 0.485 0.492
SePHlr [12] 0.729 0.746 0.754 0.763 0.760 0.780 0.785 0.793 0.410 0.434 0.448 0.463 0.461 0.495 0.515 0.525
RoPH [34] 0.733 0.744 0.749 0.756 0.757 0.759 0.768 0.771 0.457 0.481 0.493 0.500 0.451 0.478 0.488 0.495
LSRH [22] 0.756 0.780 0.788 0.800 0.772 0.786 0.791 0.802 0.474 0.490 0.512 0.522 0.474 0.492 0.511 0.526
KDLFH [23] 0.734 0.755 0.770 0.771 0.764 0.780 0.794 0.797 0.306 0.314 0.351 0.357 0.307 0.315 0.350 0.356
DLFH [23] 0.721 0.743 0.760 0.767 0.761 0.788 0.805 0.810 0.306 0.314 0.326 0.340 0.305 0.315 0.333 0.353
MTFH [13] 0.581 0.571 0.645 0.543 0.584 0.556 0.633 0.531 0.303 0.303 0.307 0.300 0.303 0.303 0.308 0.302
DJSRH [14] 0.620 0.630 0.645 0.660 0.620 0.626 0.645 0.649 0.368 0.396 0.419 0.439 0.370 0.400 0.423 0.437
DCMH [9] 0.737 0.754 0.763 0.771 0.753 0.760 0.763 0.770 0.423 0.439 0.456 0.463 0.449 0.464 0.476 0.481
SSAH [20] 0.797 0.809 0.810 0.802 0.782 0.797 0.799 0.790 0.501 0.503 0.496 0.479 0.504 0.530 0.554 0.565
DCHN0 0.806 0.823 0.836 0.842 0.797 0.808 0.823 0.827 0.487 0.492 0.550 0.573 0.481 0.488 0.543 0.567
DCHN100 0.813 0.816 0.823 0.840 0.808 0.803 0.814 0.830 0.533 0.558 0.582 0.596 0.527 0.557 0.582 0.595

Table 2: Performance comparison in terms of MAP scores on the NUS-WIDE and MS-COCO datasets. The highest MAP score is shown in bold.

   Method    NUS-WIDE MS-COCO
Image → Text Text → Image Image → Text Text → Image
16 32 64 128 16 32 64 128 16 32 64 128 16 32 64 128
Baseline 0.281 0.337 0.263 0.341 0.299 0.339 0.276 0.346 0.362 0.336 0.332 0.373 0.348 0.341 0.347 0.359
SePH [21] 0.644 0.652 0.661 0.664 0.654 0.662 0.670 0.673 0.586 0.598 0.620 0.628 0.587 0.594 0.618 0.625
SePHlr [12] 0.607 0.624 0.644 0.651 0.630 0.649 0.665 0.672 0.527 0.571 0.592 0.600 0.555 0.596 0.618 0.621
RoPH [34] 0.638 0.656 0.662 0.669 0.645 0.665 0.671 0.677 0.592 0.634 0.649 0.657 0.587 0.628 0.643 0.652
LSRH [22] 0.622 0.650 0.659 0.690 0.600 0.662 0.685 0.692 0.580 0.563 0.561 0.567 0.580 0.611 0.615 0.632
KDLFH [23] 0.323 0.367 0.364 0.403 0.325 0.365 0.368 0.408 0.373 0.403 0.451 0.542 0.370 0.400 0.449 0.542
DLFH [23] 0.316 0.367 0.381 0.404 0.319 0.379 0.386 0.415 0.352 0.398 0.455 0.443 0.359 0.393 0.456 0.442
MTFH [13] 0.265 0.473 0.434 0.445 0.243 0.418 0.414 0.485 0.288 0.264 0.311 0.413 0.301 0.284 0.310 0.406
DJSRH [14] 0.433 0.453 0.467 0.442 0.457 0.468 0.468 0.501 0.478 0.520 0.544 0.566 0.462 0.525 0.550 0.567
DCMH [9] 0.569 0.595 0.612 0.621 0.548 0.573 0.585 0.592 0.548 0.575 0.607 0.625 0.568 0.595 0.643 0.664
SSAH [20] 0.636 0.636 0.637 0.510 0.653 0.676 0.683 0.682 0.550 0.577 0.576 0.581 0.552 0.578 0.578 0.669
DCHN0 0.648 0.660 0.669 0.683 0.662 0.677 0.685 0.697 0.602 0.658 0.682 0.706 0.591 0.652 0.669 0.696
DCHN100 0.654 0.671 0.681 0.691 0.668 0.683 0.697 0.707 0.662 0.701 0.703 0.720 0.650 0.689 0.693 0.714

Citation

If you find DCHN useful in your research, please consider citing:

@article{hu2021joint,
  author={Hu, Peng and Peng, Xi and Zhu, Hongyuan and Lin, Jie and Zhen, Liangli and Peng, Dezhong},
  journal={IEEE Transactions on Cybernetics}, 
  title={Joint Versus Independent Multiview Hashing for Cross-View Retrieval}, 
  year={2021},
  volume={51},
  number={10},
  pages={4982-4993},
  doi={10.1109/TCYB.2020.3027614}}
}
Owner
https://penghu-cs.github.io/
Action Segmentation Evaluation

Reference Action Segmentation Evaluation Code This repository contains the reference code for action segmentation evaluation. If you have a bug-fix/im

5 May 22, 2022
Visual Tracking by TridenAlign and Context Embedding

Visual Tracking by TridentAlign and Context Embedding (TACT) Test code for "Visual Tracking by TridentAlign and Context Embedding" Janghoon Choi, Juns

Janghoon Choi 32 Aug 25, 2021
Pytorch implementation of the Variational Recurrent Neural Network (VRNN).

VariationalRecurrentNeuralNetwork Pytorch implementation of the Variational RNN (VRNN), from A Recurrent Latent Variable Model for Sequential Data. Th

emmanuel 251 Dec 17, 2022
RodoSol-ALPR Dataset

RodoSol-ALPR Dataset This dataset, called RodoSol-ALPR dataset, contains 20,000 images captured by static cameras located at pay tolls owned by the Ro

Rayson Laroca 45 Dec 15, 2022
HeatNet is a python package that provides tools to build, train and evaluate neural networks designed to predict extreme heat wave events globally on daily to subseasonal timescales.

HeatNet HeatNet is a python package that provides tools to build, train and evaluate neural networks designed to predict extreme heat wave events glob

Google Research 6 Jul 07, 2022
patchmatch和patchmatchstereo算法的python实现

patchmatch patchmatch以及patchmatchstereo算法的python版实现 patchmatch参考 github patchmatchstereo参考李迎松博士的c++版代码 由于patchmatchstereo没有做任何优化,并且是python的代码,主要是方便解析算

Sanders Bao 11 Dec 02, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
SAN for Product Attributes Prediction

SAN Heterogeneous Star Graph Attention Network for Product Attributes Prediction This repository contains the official PyTorch implementation for ADVI

Xuejiao Zhao 9 Dec 12, 2022
This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search"

InvariantAncestrySearch This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search

Phillip Bredahl Mogensen 0 Feb 02, 2022
Research code for the paper "Variational Gibbs inference for statistical estimation from incomplete data".

Variational Gibbs inference (VGI) This repository contains the research code for Simkus, V., Rhodes, B., Gutmann, M. U., 2021. Variational Gibbs infer

Vaidotas Šimkus 1 Apr 08, 2022
A collection of models for image<->text generation in ACM MM 2021.

Bi-directional Image and Text Generation UMT-BITG (image & text generator) Unifying Multimodal Transformer for Bi-directional Image and Text Generatio

Multimedia Research 63 Oct 30, 2022
Official implementation of Self-supervised Graph Attention Networks (SuperGAT), ICLR 2021.

SuperGAT Official implementation of Self-supervised Graph Attention Networks (SuperGAT). This model is presented at How to Find Your Friendly Neighbor

Dongkwan Kim 127 Dec 28, 2022
PyTorch implementation of MLP-Mixer

PyTorch implementation of MLP-Mixer MLP-Mixer: an all-MLP architecture composed of alternate token-mixing and channel-mixing operations. The token-mix

Duo Li 33 Nov 27, 2022
[ICCV'21] Pri3D: Can 3D Priors Help 2D Representation Learning?

Pri3D: Can 3D Priors Help 2D Representation Learning? [ICCV 2021] Pri3D leverages 3D priors for downstream 2D image understanding tasks: during pre-tr

Ji Hou 124 Jan 06, 2023
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

DV Lab 115 Dec 23, 2022
Genpass - A Passwors Generator App With Python3

Genpass Welcom again into another python3 App this is simply an Passwors Generat

Mal4D 1 Jan 09, 2022
68 keypoint annotations for COFW test data

68 keypoint annotations for COFW test data This repository contains manually annotated 68 keypoints for COFW test data (original annotation of CFOW da

31 Dec 06, 2022
Code for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"

Triple-cooperative Video Shadow Detection Code and dataset for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"[arXiv link] [official l

Zhihao Chen 24 Oct 04, 2022
Mercury: easily convert Python notebook to web app and share with others

Mercury Share your Python notebooks with others Easily convert your Python notebooks into interactive web apps by adding parameters in YAML. Simply ad

MLJAR 2.2k Dec 27, 2022
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 24 Dec 07, 2022