Yet another video caption

Last update: May 26, 2022

Related tags

Deep Learning yet-another-video-caption

Overview

yet-another-video-caption

数据集配置

准备数据集

将原始数据集重新组织成统一的格式后，放置于 ./dataset 中。

数据集的组织格式为：

./dataset
    train/
        video/
            *.avi
        ...
        info.json
    test/
        video/ 
            *.avi
        ...

自动配置

通常你只需要使用数据集的一个子集，此时请考虑运行自动抽取脚本 makedata.py。

所有数据位于 ./data 中。

所有视频（包括 train/val/test）位于 ./data/video 中。

所有视频信息（包括 train/val/test）输入到 ./data/input.json。

程序会在 ./data 中产生一些中间信息，请勿修改。

依赖

pip install tqdm pillow pretrainedmodels nltk

此外，请确保已当前环境下已经正确配置 CUDA 运行库，CUDNN，Pytorch(GPU)，ffmpeg，JDK

食用步骤

确保数据集已正确配置
确保依赖已经正确安装
抽取数据，将你希望使用的 train/val/test 划分参数输入 makedata.py 中，然后执行该脚本
依次执行（请自行修改 batch_size 和 saved_model 参数！）

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152
python prepro_vocab.py
python train.py --epochs 3001 --batch_size 1 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 0 --dim_vid 2048
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_10.pth --batch_size 1

速度测试

以下结果测试于单张 2080Ti

预处理（ResNet152 特征提取）：共 40min

训练速度（batch_size=32）：6.20 it/s

Todo

大小写问题

References

https://github.com/xiadingZ/video-caption.pytorch

Yet another video caption

Related tags

Overview

yet-another-video-caption

数据集配置

准备数据集

自动配置

依赖

食用步骤

速度测试

Todo

References

Owner

Fan Zhimin

Human Dynamics from Monocular Video with Dynamic Camera Movements

FaceOcc: A Diverse, High-quality Face Occlusion Dataset for Human Face Extraction

Code to produce syntactic representations that can be used to study syntax processing in the human brain

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning

Pytorch implementation of the DeepDream computer vision algorithm

A unified framework to jointly model images, text, and human attention traces.

Codes for the AAAI'22 paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning"

Official Implementation of "Transformers Can Do Bayesian Inference"

This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021.

We provided a matlab implementation for an evolutionary multitasking AUC optimization framework (EMTAUC).

BMVC 2021: This is the github repository for "Few Shot Temporal Action Localization using Query Adaptive Transformers" accepted in British Machine Vision Conference (BMVC) 2021, Virtual

Implementations of paper Controlling Directions Orthogonal to a Classifier

Demonstrates how to divide a DL model into multiple IR model files (division) and introduce a simplest way to implement a custom layer works with OpenVINO IR models.

FedTorch is an open-source Python package for distributed and federated training of machine learning models using PyTorch distributed API

Film review classification

Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Multi-Stage Progressive Image Restoration

Official PyTorch implementation and pretrained models of the paper Self-Supervised Classification Network

Self-Supervised Learning

[NeurIPS2021] Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks