The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Last update: Nov 14, 2022

Related tags

Deep Learning weak-sup-visual-grounding

Overview

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

This repository is the official implementation of CVPR 2021 paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Requirements

Tensorflow-1-15

Training

To train the NCE model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5

To train the NCE+Distill model(s) in the paper, run this command:

python train_nce_distill_model.py \
  --region_feat_path=region_features.hdf5 \
  --phrase_feat_path=phrase_features.hdf5 \
  --glove_path=glove.hdf5 \
  --phrase_to_label_json=phrase_to_label.json

Evaluation

To evaluate the model on Flickr30K, run:

python eval_model.py \
  --region_feat_path=region_features_test.hdf5 \
  --phrase_feat_path=phrase_features_test.hdf5 \
  --glove_path=glove.hdf5 \
  --restore_path=checkpoint.meta

Pre-trained Models

You can download pretrained models using Res101 VG features here:

You can also find the features on Flickr30K test split here.

The pretrained models achieve the following performance on Flickr30K test split:

Model Name	[email protected]	[email protected]	[email protected]
NCE+Distill	0.5310	0.7394	0.7875
NCE	0.5135	0.7338	0.7833

Citation

If you use our implementation in your research or wish to refer to the results published in our paper, please use the following BibTeX entry.

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Liwei and Huang, Jing and Li, Yin and Xu, Kun and Yang, Zhengyuan and Yu, Dong},
    title     = {Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14090-14100}
}

The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Related tags

Overview

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Requirements

Training

Evaluation

Pre-trained Models

Citation

Owner

Code and Experiments for ACL-IJCNLP 2021 Paper Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering.

One-line your code easily but still with the fun of doing so!

Contrastive Learning for Compact Single Image Dehazing, CVPR2021

End-to-end face detection, cropping, norm estimation, and landmark detection in a single onnx model

上海交通大学全自动抢课脚本，支持准点开抢与抢课后持续捡漏两种模式。2021/06/08更新。

PyTorch implementation of the Transformer in Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.

This is the research repository for Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition.

AugLiChem - The augmentation library for chemical systems.

[ECCV'20] Convolutional Occupancy Networks

Video Matting Refinement For Python

Direct design of biquad filter cascades with deep learning by sampling random polynomials.

This is a classifier which basically predicts whether there is a gun law in a state or not, depending on various things like murder rates etc.

This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

SpinalNet: Deep Neural Network with Gradual Input

Make a surveillance camera from your raspberry pi!

PyTorch Code of "Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics"

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

The Empirical Investigation of Representation Learning for Imitation (EIRLI)

An implementation of an abstract algebra for music tones (pitches).