Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021)

Overview

This repository contains the PyTorch implementation of our paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021).

Requirements

  • Python 3.6
  • Java 15.0.2
  • PyTorch 1.2
  • numpy, tqdm, h5py, scipy, six
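Before training, a short script like the following can show whether the listed Python packages and a Java runtime are available. It is a minimal sketch that is not part of the repository: it only checks that each module can be found and that `java` is on the PATH, not that the versions match the ones listed above.

```python
import importlib.util
import shutil
import sys

# Third-party packages from the Requirements list; PyTorch is imported as "torch".
REQUIRED_MODULES = ["torch", "numpy", "tqdm", "h5py", "scipy", "six"]


def missing_packages(modules):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report():
    print("Python:", sys.version.split()[0], "(README lists 3.6)")
    print("java on PATH:", shutil.which("java") is not None)
    missing = missing_packages(REQUIRED_MODULES)
    print("missing packages:", ", ".join(missing) if missing else "none")


if __name__ == "__main__":
    report()
```

Run it with the same interpreter you plan to use for `transformer.py`.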

Training & Inference

Data preparation

  1. Download the pre-extracted video features for the ActivityNet Captions or Charades Captions dataset from BaiduNetdisk (extraction code: he21).
  2. Decompress the downloaded files to the corresponding dataset folder in the ordered_feature/ directory.
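After decompressing, a small check can confirm that a dataset folder exists and is not empty. This sketch assumes, hypothetically, that the folders under `ordered_feature/` are named `activitynet` and `charades`, like the `*` placeholder in the training commands; the actual folder names and the files inside them are not specified here.

```python
import os

# Assumption: folder names match the "*" placeholder used in the commands.
DATASETS = ("activitynet", "charades")


def feature_dir_ready(root, dataset):
    """Return True if root/<dataset> exists and contains at least one entry."""
    if dataset not in DATASETS:
        raise ValueError("dataset must be one of %s" % (DATASETS,))
    path = os.path.join(root, dataset)
    return os.path.isdir(path) and len(os.listdir(path)) > 0


if __name__ == "__main__":
    for name in DATASETS:
        print(name, feature_dir_ready("ordered_feature", name))
```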

Start training

  1. Train our model without reinforcement learning (replace * with activitynet or charades):
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train
  2. Fine-tune the pretrained model with self-critical training, using both accuracy and diversity rewards:
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th
  3. Train our model with key-frame selection:
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th
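The key-frame variant decodes from only half of the video features. The selection mechanism itself is not described in this README, so the sketch below is purely illustrative and is not the paper's key-frame selection module: it assumes some per-frame importance score is available (the `scores` argument is hypothetical) and keeps the top-scoring half of the frames in their original temporal order.

```python
def select_key_frames(features, scores, ratio=0.5):
    """Keep the highest-scoring `ratio` of frames, preserving temporal order.

    `scores` is a hypothetical per-frame importance score; how the real
    model scores frames is not described in this README.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    if not features:
        return []
    k = max(1, int(len(features) * ratio))
    # Rank frame indices by score (highest first) and keep the top k ...
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # ... then restore chronological order so the decoder sees frames in sequence.
    return [features[i] for i in sorted(top)]


if __name__ == "__main__":
    frames = ["f0", "f1", "f2", "f3"]
    print(select_key_frames(frames, [0.1, 0.9, 0.3, 0.8]))  # ['f1', 'f3']
```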

The key-frame model uses only half of the video features at inference for faster decoding, at the cost of slightly worse results. Download the pretrained.th model first; it is required for key-frame selection.
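The fine-tuning stage (`dm.token.rl`) uses self-critical training. In standard self-critical sequence training, the reward of a sampled caption is compared with the reward of the greedily decoded caption, and that difference scales the negative log-likelihood of the sample. The sketch below shows only this arithmetic on plain floats; in a real PyTorch loop the log-probabilities would be tensors carrying gradients. How the repository combines its accuracy and diversity rewards is not stated here, so `combined_reward` and its `weight` are hypothetical.

```python
def combined_reward(accuracy, diversity, weight=1.0):
    """Hypothetical mix: accuracy reward plus a weighted diversity reward."""
    return accuracy + weight * diversity


def self_critical_loss(batch_logprobs, sample_rewards, greedy_rewards):
    """Mean REINFORCE loss using the greedy caption's reward as the baseline.

    batch_logprobs: per-token log-probabilities of each sampled paragraph.
    sample_rewards / greedy_rewards: one scalar reward per paragraph.
    """
    losses = []
    for logprobs, r_sample, r_greedy in zip(batch_logprobs, sample_rewards, greedy_rewards):
        advantage = r_sample - r_greedy
        # Minimizing this raises the likelihood of samples that beat the greedy
        # baseline and lowers it for samples that score worse.
        losses.append(-advantage * sum(logprobs))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    r_s = combined_reward(0.4, 0.2, weight=0.5)
    r_g = combined_reward(0.3, 0.0, weight=0.5)
    print(self_critical_loss([[-1.0, -2.0]], [r_s], [r_g]))
```

Using the greedy output as the baseline means no separate value network is needed: only samples that outperform the model's own test-time decoding are reinforced.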

Evaluation

Trained checkpoints are saved in the results/*/folder/model/ directory, where folder is the experiment folder (for example dm.token). After evaluation, the generated captions (corresponding to the name file in public_split) and the evaluation scores are saved in results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th
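The `*`, `folder`, and `epoch.*.th` parts of the command are placeholders to fill in by hand. The helpers below are a hypothetical convenience, not part of the repository: `list_checkpoints` lists files matching `epoch.*.th` (assuming the wildcard is an integer epoch number), and `eval_command` builds the evaluation command above as an argument list. The README does not say which epoch to evaluate, so the example's use of the highest epoch is only illustrative.

```python
import glob
import os
import re
import subprocess


def list_checkpoints(model_dir):
    """Return paths matching epoch.*.th, ordered by (assumed integer) epoch."""
    pattern = re.compile(r"epoch\.(\d+)\.th$")
    found = []
    for path in glob.glob(os.path.join(model_dir, "epoch.*.th")):
        match = pattern.search(os.path.basename(path))
        if match:  # skip names whose wildcard is not an integer
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def eval_command(dataset, folder, checkpoint):
    """Build the README's evaluation command; meant to run from driver/."""
    base = os.path.join("..", "results", dataset, folder)
    return [
        "python", "transformer.py",
        os.path.join(base, "model.json"),
        os.path.join(base, "path.json"),
        "--eval_set", "tst",
        "--resume_file", checkpoint,
    ]


if __name__ == "__main__":
    ckpts = list_checkpoints(os.path.join("results", "activitynet", "dm.token", "model"))
    if ckpts:
        # Absolute path, because the command runs inside driver/.
        cmd = eval_command("activitynet", "dm.token", os.path.abspath(ckpts[-1]))
        env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
        subprocess.run(cmd, cwd="driver", env=env, check=True)
```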

We also provide pretrained models for the ActivityNet dataset here and the Charades dataset here; they were re-trained and achieve results similar to those reported in the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}