Videocaptioning.pytorch - A simple implementation of video captioning

Overview

pytorch implementation of video captioning

recommend installing pytorch and python packages using Anaconda

This code is based on video-caption.pytorch

requirements (my environment, other versions of pytorch and torchvision should also support this code (not been verified!))

  • cuda
  • pytorch 1.7.1
  • torchvision 0.8.2
  • python 3
  • ffmpeg (can install using anaconda)

python packages

  • tqdm
  • pillow
  • nltk

Data

MSR-VTT. Download and put them in ./data/msr-vtt-data directory

|-data
  |-msr-vtt-data
    |-train-video
    |-test-video
    |-annotations
      |-train_val_videodatainfo.json
      |-test_videodatainfo.json

MSVD. Download and put them in ./data/msvd-data directory

|-data
  |-msvd-data
    |-YouTubeClips
    |-annotations
      |-AllVideoDescriptions.txt

Options

all default options are defined in opt.py or corresponding code file, change them for your like.

Acknowledgements

Some code refers to ImageCaptioning.pytorch

Usage

(Optional) c3d features (not verified)

you can use video-classification-3d-cnn-pytorch to extract features from video.

Steps

  1. preprocess MSVD annotations (convert txt file to json file)

refer to data/msvd-data/annotations/prepro_annotations.ipynb

  1. preprocess videos and labels
# For MSR-VTT dataset
# Train and Validata set
CUDA_VISIBLE_DEVICES=0 python prepro_feats.py \
    --video_path ./data/msr-vtt-data/train-video \
    --video_suffix mp4 \
    --output_dir ./data/msr-vtt-data/resnet152 \
    --model resnet152 \
    --n_frame_steps 40

# Test set
CUDA_VISIBLE_DEVICES=0 python prepro_feats.py \
    --video_path ./data/msr-vtt-data/test-video \
    --video_suffix mp4 \
    --output_dir ./data/msr-vtt-data/resnet152 \
    --model resnet152 \
    --n_frame_steps 40

python prepro_vocab.py \
    --input_json data/msr-vtt-data/annotations/train_val_videodatainfo.json data/msr-vtt-data/annotations/test_videodatainfo.json \
    --info_json data/msr-vtt-data/info.json \
    --caption_json data/msr-vtt-data/caption.json \
    --word_count_threshold 4

# For MSVD dataset
CUDA_VISIBLE_DEVICES=0 python prepro_feats.py \
    --video_path ./data/msvd-data/YouTubeClips \
    --video_suffix avi \
    --output_dir ./data/msvd-data/resnet152 \
    --model resnet152 \
    --n_frame_steps 40

python prepro_vocab.py \
    --input_json data/msvd-data/annotations/MSVD_annotations.json \
    --info_json data/msvd-data/info.json \
    --caption_json data/msvd-data/caption.json \
    --word_count_threshold 2
  1. Training a model
# For MSR-VTT dataset
CUDA_VISIBLE_DEVICES=0 python train.py \
    --epochs 1000 \
    --batch_size 300 \
    --checkpoint_path data/msr-vtt-data/save \
    --input_json data/msr-vtt-data/annotations/train_val_videodatainfo.json \
    --info_json data/msr-vtt-data/info.json \
    --caption_json data/msr-vtt-data/caption.json \
    --feats_dir data/msr-vtt-data/resnet152 \
    --model S2VTAttModel \
    --with_c3d 0 \
    --dim_vid 2048

# For MSVD dataset
CUDA_VISIBLE_DEVICES=0 python train.py \
    --epochs 1000 \
    --batch_size 300 \
    --checkpoint_path data/msvd-data/save \
    --input_json data/msvd-data/annotations/train_val_videodatainfo.json \
    --info_json data/msvd-data/info.json \
    --caption_json data/msvd-data/caption.json \
    --feats_dir data/msvd-data/resnet152 \
    --model S2VTAttModel \
    --with_c3d 0 \
    --dim_vid 2048
  1. test

    opt_info.json will be in same directory as saved model.

# For MSR-VTT dataset
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --input_json data/msr-vtt-data/annotations/test_videodatainfo.json \
    --recover_opt data/msr-vtt-data/save/opt_info.json \
    --saved_model data/msr-vtt-data/save/model_xxx.pth \
    --batch_size 100

# For MSVD dataset
CUDA_VISIBLE_DEVICES=0 python eval.py \
    --input_json data/msvd-data/annotations/test_videodatainfo.json \
    --recover_opt data/msvd-data/save/opt_info.json \
    --saved_model data/msvd-data/save/model_xxx.pth \
    --batch_size 100

NOTE

This code is just a simple implementation of video captioning. And I have not verify whether the SCST training process and C3D feature are useful!

Acknowledgements

Some code refers to ImageCaptioning.pytorch

Owner
Yiyu Wang
Yiyu Wang
Video Swin Transformer - PyTorch

Video-Swin-Transformer-Pytorch This repo is a simple usage of the official implementation "Video Swin Transformer". Introduction Video Swin Transforme

Haofan Wang 116 Dec 20, 2022
How to Train a GAN? Tips and tricks to make GANs work

(this list is no longer maintained, and I am not sure how relevant it is in 2020) How to Train a GAN? Tips and tricks to make GANs work While research

Soumith Chintala 10.8k Dec 31, 2022
render sprites into your desktop environment as shaped windows using GTK

spritegtk render static or animated sprites into your desktop environment as dynamic shaped windows using GTK requires pycairo and PYGobject: pip inst

hermit 20 Oct 27, 2022
UCSD Oasis platform

oasis UCSD Oasis platform Local project setup Install Docker Compose and make sure you have Pip installed Clone the project and go to the project fold

InSTEDD 4 Jun 16, 2021
The easiest tool for extracting radiomics features and training ML models on them.

Simple pipeline for experimenting with radiomics features Installation git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git cd classrad pi

Piotr Woźnicki 17 Aug 04, 2022
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023
Taming Transformers for High-Resolution Image Synthesis

Taming Transformers for High-Resolution Image Synthesis CVPR 2021 (Oral) Taming Transformers for High-Resolution Image Synthesis Patrick Esser*, Robin

CompVis Heidelberg 3.5k Jan 03, 2023
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Realtime Unsupervised Depth Estimation from an Image This is the caffe implementation of our paper "Unsupervised CNN for single view depth estimation:

Ravi Garg 227 Nov 28, 2022
JORLDY an open-source Reinforcement Learning (RL) framework provided by KakaoEnterprise

Repository for Open Source Reinforcement Learning Framework JORLDY

Kakao Enterprise Corp. 330 Dec 30, 2022
《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

The Most Important Thing. Our code is developed based on: LXMERT: Learning Cross-Modality Encoder Representations from Transformers

53 Dec 16, 2022
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,

2 Sep 20, 2022
CTRL-C: Camera calibration TRansformer with Line-Classification

CTRL-C: Camera calibration TRansformer with Line-Classification This repository contains the official code and pretrained models for CTRL-C (Camera ca

57 Nov 14, 2022
Lane assist for ETS2, built with the ultra-fast-lane-detection model.

Euro-Truck-Simulator-2-Lane-Assist Lane assist for ETS2, built with the ultra-fast-lane-detection model. This project was made possible by the amazing

36 Jan 05, 2023
Unofficial implementation of "Coordinate Attention for Efficient Mobile Network Design"

Unofficial implementation of "Coordinate Attention for Efficient Mobile Network Design". CoordAttention tensorflow slim

Billy 9 Aug 22, 2022
Tree-based Search Graph for Approximate Nearest Neighbor Search

TBSG: Tree-based Search Graph for Approximate Nearest Neighbor Search. TBSG is a graph-based algorithm for ANNS based on Cover Tree, which is also an

Fanxbin 2 Dec 27, 2022
The code of NeurIPS 2021 paper "Scalable Rule-Based Representation Learning for Interpretable Classification".

Rule-based Representation Learner This is a PyTorch implementation of Rule-based Representation Learner (RRL) as described in NeurIPS 2021 paper: Scal

Zhuo Wang 53 Dec 17, 2022
The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble

Wordle RL The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble I know there are more deterministic

Aditya Arora 3 Feb 22, 2022
Code for ACL 2019 Paper: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction"

To run a generation experiment (either conceptnet or atomic), follow these instructions: First Steps First clone, the repo: git clone https://github.c

Antoine Bosselut 575 Jan 01, 2023
AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention

AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. AdaNet buil

3.4k Jan 07, 2023
High-performance moving least squares material point method (MLS-MPM) solver.

High-Performance MLS-MPM Solver with Cutting and Coupling (CPIC) (MIT License) A Moving Least Squares Material Point Method with Displacement Disconti

Yuanming Hu 2.2k Dec 31, 2022