Compact Bidirectional Transformer for Image Captioning

Related tags

Deep LearningCBTrans
Overview

Compact Bidirectional Transformer for Image Captioning

Requirements

  • Python 3.8
  • Pytorch 1.6
  • lmdb
  • h5py
  • tensorboardX

Prepare Data

  1. Please use git clone --recurse-submodules to clone this repository and remember to follow initialization steps in coco-caption/README.md.
  2. Download the preprocessd dataset from this link and extract it to data/.
  3. Please download the converted VinVL feature from this link and place them under data/mscoco_VinVL/. You can also optionally follow this instruction to prepare the fixed or adaptive bottom-up features extracted by Anderson and place them under data/mscoco/ or data/mscoco_adaptive/.
  4. Download part checkpoints from here and extract them to save/.

Offline Evaluation

To reproduce the results of single CBTIC model on Karpathy test split, just run

python  eval.py  --model  save/nsc-transformer-cb-VinVL-feat/model-best.pth   --infos_path  save/nsc-transformer-cb-VinVL-feat/infos_nsc-transformer-cb-VinVL-feat-best.pkl      --beam_size   2   --id  nsc-transformer-cb-VinVL-feat   --split test

To reproduce the results of ensemble of CBTIC models on Karpathy test split, just run

python eval_ensemble.py   --ids   nsc-transformer-cb-VinVL-feat  nsc-transformer-cb-VinVL-feat-seed1   nsc-transformer-cb-VinVL-feat-seed2  nsc-transformer-cb-VinVL-feat-seed3 --weights  1 1 1 1  --beam_size  2   --split  test

Online Evaluation

Please first run

python eval_ensemble.py   --split  test  --language_eval 0  --ids   nsc-transformer-cb-VinVL-feat  nsc-transformer-cb-VinVL-feat-seed1   nsc-transformer-cb-VinVL-feat-seed2  nsc-transformer-cb-VinVL-feat-seed3 --weights  1 1 1 1  --input_json  data/cocotest.json  --input_fc_dir data/mscoco_VinVL/cocobu_test2014/cocobu_fc --input_att_dir  data/mscoco_VinVL/cocobu_test2014/cocobu_att   --input_label_h5    data/cocotalk_bw_label.h5    --language_eval 0        --batch_size  128   --beam_size   2   --id   captions_test2014_cbtic_results 

and then follow the instruction to upload results.

Training

  1. In the first training stage, such as using VinVL feature, run
python  train.py   --noamopt --noamopt_warmup 20000   --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 5e-4 --num_layers 6 --input_encoding_size 512 --rnn_size 2048 --learning_rate_decay_start 0  --scheduled_sampling_start 0  --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --max_epochs 15     --checkpoint_path   save/transformer-cb-VinVL-feat   --id   transformer-cb-VinVL-feat   --caption_model  cbt     --input_fc_dir   data/mscoco_VinVL/cocobu_fc   --input_att_dir   data/mscoco_VinVL/cocobu_att    --input_box_dir    data/mscoco_VinVL/cocobu_box    
  1. Then in the second training stage, you need two GPUs with 12G memory each, please copy the above pretrained model first
cd save
./copy_model.sh  transformer-cb-VinVL-feat    nsc-transformer-cb-VinVL-feat
cd ..

and then run

python  train.py    --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 1e-5 --num_layers 6 --input_encoding_size 512 --rnn_size 2048  --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --self_critical_after 14  --max_epochs    30  --start_from   save/nsc-transformer-cb-VinVL-feat     --checkpoint_path   save/nsc-transformer-cb-VinVL-feat   --id  nsc-transformer-cb-VinVL-feat   --caption_model  cbt    --input_fc_dir   data/mscoco_VinVL/cocobu_fc   --input_att_dir   data/mscoco_VinVL/cocobu_att    --input_box_dir    data/mscoco_VinVL/cocobu_box 

Note

  1. Even if fixing all random seed, we find that the results of the two runs are still slightly different when using DataParallel on two GPUs. However, the results can be reproduced exactly when using one GPU.
  2. If you are interested in the ablation studies, you can use the git reflog to list all commits and use git reset --hard commit_id to change to corresponding commit.

Citation

@misc{zhou2022compact,
      title={Compact Bidirectional Transformer for Image Captioning}, 
      author={Yuanen Zhou and Zhenzhen Hu and Daqing Liu and Huixia Ben and Meng Wang},
      year={2022},
      eprint={2201.01984},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

This repository is built upon self-critical.pytorch. Thanks for the released code.

Owner
YE Zhou
YE Zhou
Subdivision-based Mesh Convolutional Networks

Subdivision-based Mesh Convolutional Networks The official implementation of SubdivNet in our paper, Subdivion-based Mesh Convolutional Networks Requi

Zheng-Ning Liu 181 Dec 28, 2022
Voxel Transformer for 3D object detection

Voxel Transformer This is a reproduced repo of Voxel Transformer for 3D object detection. The code is mainly based on OpenPCDet. Introduction We provi

173 Dec 25, 2022
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs This is an implemetation of the paper Few-shot Relation Extraction via Baye

MilaGraph 36 Nov 22, 2022
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

ObjProp Introduction This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Insta

Anirudh S Chakravarthy 6 May 03, 2022
The official re-implementation of the Neurips 2021 paper, "Targeted Neural Dynamical Modeling".

Targeted Neural Dynamical Modeling Note: This is a re-implementation (in Tensorflow2) of the original TNDM model. We do not plan to further update the

6 Oct 05, 2022
How to Train a GAN? Tips and tricks to make GANs work

(this list is no longer maintained, and I am not sure how relevant it is in 2020) How to Train a GAN? Tips and tricks to make GANs work While research

Soumith Chintala 10.8k Dec 31, 2022
Open source repository for the code accompanying the paper 'PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations'.

PatchNets This is the official repository for the project "PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations". For details,

16 May 22, 2022
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks

Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks arXiv preprint: https://arxiv.org/abs/2201.02143. Architec

19 Nov 30, 2022
Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'.

COTREC Codes for CIKM'21 paper 'Self-Supervised Graph Co-Training for Session-based Recommendation'. Requirements: Python 3.7, Pytorch 1.6.0 Best Hype

Xin Xia 42 Dec 09, 2022
NeurIPS 2021, self-supervised 6D pose on category level

SE(3)-eSCOPE video | paper | website Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation Xiaolong Li, Yijia Weng,

Xiaolong 63 Nov 22, 2022
Genpass - A Passwors Generator App With Python3

Genpass Welcom again into another python3 App this is simply an Passwors Generat

Mal4D 1 Jan 09, 2022
Datasets, tools, and benchmarks for representation learning of code.

The CodeSearchNet challenge has been concluded We would like to thank all participants for their submissions and we hope that this challenge provided

GitHub 1.8k Dec 25, 2022
The code for our paper Semi-Supervised Learning with Multi-Head Co-Training

Semi-Supervised Learning with Multi-Head Co-Training (PyTorch) Abstract Co-training, extended from self-training, is one of the frameworks for semi-su

cmc 6 Dec 04, 2022
ParmeSan: Sanitizer-guided Greybox Fuzzing

ParmeSan: Sanitizer-guided Greybox Fuzzing ParmeSan is a sanitizer-guided greybox fuzzer based on Angora. Published Work USENIX Security 2020: ParmeSa

VUSec 158 Dec 31, 2022
Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

Idiap Research Institute 40 Aug 14, 2022
This is the official PyTorch implementation of our paper: "Artistic Style Transfer with Internal-external Learning and Contrastive Learning".

Artistic Style Transfer with Internal-external Learning and Contrastive Learning This is the official PyTorch implementation of our paper: "Artistic S

51 Dec 20, 2022
[ICLR 2021] Is Attention Better Than Matrix Decomposition?

Enjoy-Hamburger 🍔 Official implementation of Hamburger, Is Attention Better Than Matrix Decomposition? (ICLR 2021) Under construction. Introduction T

Gsunshine 271 Dec 29, 2022
Official pytorch implementation of DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces

DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces Minhyuk Sung*, Zhenyu Jiang*, Panos Achlioptas, Niloy J. Mitra, Leonidas

Zhenyu Jiang 21 Aug 30, 2022
Author Disambiguation using Knowledge Graph Embeddings with Literals

Author Name Disambiguation with Knowledge Graph Embeddings using Literals This is the repository for the master thesis project on Knowledge Graph Embe

12 Oct 19, 2022
Image Processing, Image Smoothing, Edge Detection and Transforms

opevcvdl-hw1 This project uses openCV and Qt to achieve the requirements. Version Python 3.7 opencv-contrib-python 3.4.2.17 Matplotlib 3.1.1 pyqt5 5.1

Kenny Cheng 3 Aug 17, 2022