Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021).

Last update: Oct 11, 2022

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains the PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

Python 3.6
Java 15.0.2
PyTorch 1.2
numpy, tqdm, h5py, scipy, six
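A quick way to confirm that the Python packages listed above are installed is to probe the import path. The snippet below is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical, and it assumes PyTorch is importable under its usual module name `torch`. It does not check versions or the Java installation.

```python
import importlib.util
import sys

# Third-party packages named in the requirements list above.
# Assumption: PyTorch is imported under the module name "torch".
REQUIRED_PACKAGES = ["torch", "numpy", "tqdm", "h5py", "scipy", "six"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```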

Training & Inference

Data preparation

Download the pre-extracted video features for the ActivityNet Captions or Charades Captions dataset from BaiduNetdisk (code: he21).
Decompress the downloaded files into the corresponding dataset folder in the ordered_feature/ directory.

Start training

Train our model without reinforcement learning. In the commands below, replace * in the results path with activitynet or charades.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train

Fine-tune the pretrained model with self-critical training, using both accuracy and diversity rewards.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th

Train our model with key-frame selection.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

This model gives slightly worse results, but it uses only half of the video features at inference time, which makes decoding faster. Download the pretrained.th model first; it is required for key-frame selection.
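The three training commands above differ only in the dataset name, the configuration folder (dm.token, dm.token.rl, or key_frames), and the optional --resume_file checkpoint. As a rough sketch, they can be assembled programmatically; the helper `build_train_command` is hypothetical and not part of the repository, and it only uses the script name, paths, and flags that appear in the commands above.

```python
def build_train_command(dataset, config="dm.token", resume_file=None):
    """Return the argument list for transformer.py, to be run from driver/.

    `dataset` replaces the * in ../results/*/...; `config` is the
    configuration folder under that dataset directory.
    """
    if dataset not in ("activitynet", "charades"):
        raise ValueError("dataset must be 'activitynet' or 'charades'")
    config_dir = "/".join(["..", "results", dataset, config])
    cmd = [
        "python", "transformer.py",
        config_dir + "/model.json",
        config_dir + "/path.json",
        "--is_train",
    ]
    if resume_file is not None:
        cmd += ["--resume_file", resume_file]
    return cmd


if __name__ == "__main__":
    # Stage 1: training without reinforcement learning.
    print(" ".join(build_train_command("activitynet")))
    # Key-frame selection, resuming from the downloaded pretrained.th.
    print(" ".join(build_train_command(
        "activitynet",
        config="key_frames",
        resume_file="../results/activitynet/key_frames/pretrained.th",
    )))
```

To launch a command rather than print it, the list can be passed to `subprocess.run(cmd, cwd="driver", check=True)` with `CUDA_VISIBLE_DEVICES` set in the environment. For the self-critical stage, pass a concrete checkpoint file as `resume_file`, since a wildcard is not expanded when no shell is involved.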

Evaluation

The trained checkpoints are saved in the results/*/folder/model/ directory. After evaluation, the generated captions (corresponding to the name file in public_split) and the evaluation scores are saved in results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th
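The --resume_file argument above is written with a wildcard (epoch.*.th), so a concrete checkpoint has to be chosen before running the command. The sketch below lists the candidates in a model directory. It assumes, without confirmation from the repository, that the wildcard stands for an integer epoch index; files that do not match that pattern are placed last. The helper name `list_checkpoints` is hypothetical.

```python
import glob
import os
import re


def list_checkpoints(model_dir):
    """Return epoch.*.th files in `model_dir`, ordered by epoch number."""
    paths = glob.glob(os.path.join(model_dir, "epoch.*.th"))

    def sort_key(path):
        name = os.path.basename(path)
        match = re.fullmatch(r"epoch\.(\d+)\.th", name)
        # Numeric names come first, ordered by epoch; other names follow.
        if match:
            return (0, int(match.group(1)), "")
        return (1, 0, name)

    return sorted(paths, key=sort_key)


if __name__ == "__main__":
    # Example path; "folder" is the placeholder used in this README.
    for path in list_checkpoints("../results/activitynet/folder/model"):
        print(path)
```

Which checkpoint to evaluate is left to the user; the listing only makes the available files explicit.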

We also provide pretrained models for the ActivityNet dataset here and the Charades dataset here. They come from a re-run and achieve results similar to those reported in the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Owner

Yuqing Song
