Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

Overview

WSDEC

This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos.

Description

Repo directories

  • ./: global config files, training, evaluating scripts;
  • ./data: data dictionary;
  • ./model: our final models used to reproduce the results;
  • ./runs: the default output dictionary used to store our trained model and result files;
  • ./scripts: helper scripts;
  • ./third_party: third party dependency include the official evaluation scripts;
  • ./utils: helper functions;
  • ./train_script: all training scripts;
  • ./eval_script: all evalulating scripts.

Dependency

  • Python 2.7
  • CUDA 9.0(note: you will encounter a bug saying segmentation fault(core dump) if you run our code with CUDA 8.0)
    • But it seems that the bug still exists. See issue
  • [Pytorch 0.3.1](note: 0.3.1 is not compatible with newer version)
  • numpy, hdf5 and other necessary packages(no special requirement)

Usage for reproduction

Before we start

Before the training and testing, we should make sure the data, third party data are prepared, here is the one-by-one steps to make everything prepared.

1. Clone our repo and submodules

git clone --recursive https://github.com/XgDuan/WSDEC

2. Download all the data

  • Download the official C3D features, you can either download the data from the website or from our onedrive cloud.

    • Download from the official website; (Note, after you download the C3D features, you can either place it in the data folder and rename it as anet_v1.3.c3d.hdf5, or create a soft link in the data dictionary as ln -s YOURC3DFeature data/anet_v1.3.c3d.hdf5)
  • Download the dense video captioning data from the official website; (Similar to the C3D feature, you are supposed to place the download data in the data folder and rename it as densecap)

  • Download the data for the official evaluation scripts densevid_eval;

    • run the command sh download.sh scripts in the folder PREFIX/WSDEC/third_party/densevid_eval;
  • [Good News]: we write a shell script for you to download the data, just run the following command:

    cd data
    sh download.sh
    

3. Generate the dictionary for the caption model

python scripts/caption_preprocess.py

Training

There are two steps for model training: pretrain a not so bad caption model; and the second step, train the final/baseline model.

Our pretrained captioning model is trained.

python train_script/train_cg_pretrain.py

train our final model

python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME

train baselines

  1. train the baseline model without classification loss.
python train_script/train_baseline_regressor.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME
  1. train the baseline model without regression branch.
python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --regressor_scale 0 --alias MODEL_NAME

About the arguments

All the arguments we use can be found in the corresponding training scripts. You can also use your own argumnets if you like to do so. But please mind, some arguments are discarded(This is our own reimplementation of our paper, the first version codes are too dirty that no one would like to use it.)

Testing

Testing is easier than training. Firstly, in the process of training, our scripts will call the densevid_eval in a subprocess every time after we run the eval function. From these results, you can have a general grasp about the final performance by just have a look at the eval_results.txt scripts. Secondly, after some epochs, you can run the evaluation scripts:

  1. evaluate the full model or no_regression model:
python eval_script/evaluate.py --checkpoint YOUR_TRAINED_MODEL.ckp
  1. evaluate the no_classification model:
python eval_script/evaluate_baseline_regressor.py --checkpoint YOUR_TRAINED_MODEL.ckp
  1. evaluate the pretrained model with random temporal segment:
python eval_script/evaluate_pretrain.py --checkpoint YOUR_PRETRAIN_CAPTION_MODEL.ckp

Other usages

Besides reproduce our work, there are at least two interesting things you can do with our codes.

Train a supervised sentence localization model

To know what is sentence localization, you can have a look at our paper ABLR. Note that our work at a matter of fact provides an unsupervised solution towards sentence localization, we introduce the usage for the supervised model here. We have written the trainer, you can just run the following command and have a cup of coffee:

python train_script/train_sl.py

Train a supervised video event caption generation model

If you have read our paper, you would find that event captioning is the dual task of the aforementioned sentence localization task. To train such a model, just run the following command:

python train_script/train_cg.py

BUGS

You may encounter a cuda internal bug that says Segmentation fault(core dumped) during training if you are using cuda 8.0. If such things happen, try upgrading your cuda to 9.0.

other

We will add more description about how to use our code. Please feel free to contact us if you have any questions or suggestions.

Trained model and results

Links for our trained model

You can download our pretrained model for evaluation or further usage from our onedrive, which includes a pretrained caption generator(cg_pretrain.ckp), a baseline model without classification loss(baseline_noclass.ckp), a baseline model without regression branch(baseline_noregress.ckp), and our final model(final_model.ckp).

Cite the paper and give us star ⭐️

If you find our paper or code useful, please cite our paper using the following bibtex:

@incollection{NIPS2018_7569,
title = {Weakly Supervised Dense Event Captioning in Videos},
author = {Duan, Xuguang and Huang, Wenbing and Gan, Chuang and Wang, Jingdong and Zhu, Wenwu and Huang, Junzhou},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {3062--3072},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7569-weakly-supervised-dense-event-captioning-in-videos.pdf}
}
Owner
Melon(Xuguang Duan)
Lick the screen
Melon(Xuguang Duan)
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network

ild-cnn This is supplementary material for the manuscript: "Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neur

22 Nov 05, 2022
DA2Lite is an automated model compression toolkit for PyTorch.

DA2Lite (Deep Architecture to Lite) is a toolkit to compress and accelerate deep network models. ⭐ Star us on GitHub — it helps!! Frameworks & Librari

Sinhan Kang 7 Mar 22, 2022
A light-weight image labelling tool for Python designed for creating segmentation data sets.

An image labelling tool for creating segmentation data sets, for Django and Flask.

117 Nov 21, 2022
A curated list and survey of awesome Vision Transformers.

English | 简体中文 A curated list and survey of awesome Vision Transformers. You can use mind mapping software to open the mind mapping source file. You c

OpenMMLab 281 Dec 21, 2022
This folder contains the python code of UR5E's advanced forward kinematics model.

This folder contains the python code of UR5E's advanced forward kinematics model. By entering the angle of the joint of UR5e, the detailed coordinates of up to 48 points around the robot arm can be c

Qiang Wang 4 Sep 17, 2022
[CVPR 2016] Unsupervised Feature Learning by Image Inpainting using GANs

Context Encoders: Feature Learning by Inpainting CVPR 2016 [Project Website] [Imagenet Results] Sample results on held-out images: This is the trainin

Deepak Pathak 829 Dec 31, 2022
This repository implements WGAN_GP.

Image_WGAN_GP This repository implements WGAN_GP. Image_WGAN_GP This repository uses wgan to generate mnist and fashionmnist pictures. Firstly, you ca

Lieon 6 Dec 10, 2021
A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

SEAL ⠀⠀⠀ A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019) Abstract Node classification an

Benedek Rozemberczki 202 Dec 27, 2022
CoaT: Co-Scale Conv-Attentional Image Transformers

CoaT: Co-Scale Conv-Attentional Image Transformers Introduction This repository contains the official code and pretrained models for CoaT: Co-Scale Co

mlpc-ucsd 191 Dec 03, 2022
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition Paper: MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition accepted fo

64 Dec 18, 2022
CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

CLIP-Indonesian CLIP (Radford et al., 2021) is a multimodal model that can connect images and text by training a vision encoder and a text encoder joi

Galuh 17 Mar 10, 2022
A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.

Imbalanced Dataset Sampler Introduction In many machine learning applications, we often come across datasets where some types of data may be seen more

Ming 2k Jan 08, 2023
Code for the paper: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Non-Parametric Prior Actor-Critic (N-PPAC) This repository contains the code for On Pathologies in KL-Regularized Reinforcement Learning from Expert D

Cong Lu 5 May 13, 2022
A Vision Transformer approach that uses concatenated query and reference images to learn the relationship between query and reference images directly.

A Vision Transformer approach that uses concatenated query and reference images to learn the relationship between query and reference images directly.

24 Dec 13, 2022
Rate-limit-semaphore - Semaphore implementation with rate limit restriction for async-style (any core)

Rate Limit Semaphore Rate limit semaphore for async-style (any core) There are t

Yan Kurbatov 4 Jun 21, 2022
Code in PyTorch for the convex combination linear IAF and the Householder Flow, J.M. Tomczak & M. Welling

VAE with Volume-Preserving Flows This is a PyTorch implementation of two volume-preserving flows as described in the following papers: Tomczak, J. M.,

Jakub Tomczak 87 Dec 26, 2022
We provided a matlab implementation for an evolutionary multitasking AUC optimization framework (EMTAUC).

EMTAUC We provided a matlab implementation for an evolutionary multitasking AUC optimization framework (EMTAUC). In this code, SBGA is considered a ba

7 Nov 24, 2022
Implementation of UNET architecture for Image Segmentation.

Semantic Segmentation using UNET This is the implementation of UNET on Carvana Image Masking Kaggle Challenge About the Dataset This dataset contains

Anushka agarwal 4 Dec 21, 2021
Magisk module to enable hidden features on Android 12 Developer Preview 1.

Android 12 Extensions This is a Magisk module that enables hidden features on Android 12 Developer Preview 1. Features Scrolling screenshots Wallpaper

Danny Lin 384 Jan 06, 2023
[NeurIPS 2021] Official implementation of paper "Learning to Simulate Self-driven Particles System with Coordinated Policy Optimization".

Code for Coordinated Policy Optimization Webpage | Code | Paper | Talk (English) | Talk (Chinese) Hi there! This is the source code of the paper “Lear

DeciForce: Crossroads of Machine Perception and Autonomy 81 Dec 19, 2022