MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

Last update: Jan 29, 2022

Overview

We release the code of MVFNet (Multi-View Fusion Network). The core implementation of the Multi-View Fusion Module is in codes/models/modules/MVF.py.

[Mar 24, 2021] We have released the code of MVFNet.

[Dec 20, 2020] MVFNet has been accepted by AAAI 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments were run with Python 3.7 and PyTorch 1.5. Other versions should work but have not been tested.
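As a quick sanity check after installation, the sketch below compares the local interpreter and PyTorch versions with the tested ones. The helper names (`major_minor`, `describe`) are made up for this illustration and are not part of the repository.

```python
import sys

# Versions the README reports as tested; anything else is untested, not unsupported.
TESTED_PYTHON = (3, 7)
TESTED_TORCH = (1, 5)


def major_minor(version):
    """Parse 'X.Y...' into (X, Y), ignoring local build tags such as '+cu101'."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def describe(name, version, tested):
    """Return a one-line status comparing `version` with the tested release."""
    status = "tested" if major_minor(version) == tested else "untested"
    return "{} {}: {}".format(name, version, status)


if __name__ == "__main__":
    print(describe("Python", "{}.{}".format(*sys.version_info[:2]), TESTED_PYTHON))
    try:
        import torch  # only importable once requirements.txt has been installed
        print(describe("PyTorch", torch.__version__, TESTED_TORCH))
    except ImportError:
        print("PyTorch is not installed")
```

An "untested" result is only a hint that behaviour may differ from the reported setup.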

Download Pretrained Models

Download ImageNet pre-trained models

cd pretrained
sh download_imgnet.sh

Download K400 pre-trained models

Please refer to Model Zoo.

Data Preparation

Please refer to DATASETS.md for data preparation.

Model Zoo

Architecture	Dataset	T x interval	Top-1 Acc.	Pre-trained model	Train log	Test log
MVFNet-ResNet50	Kinetics-400	4x16	74.2%	Download link	Log link	Log link
MVFNet-ResNet50	Kinetics-400	8x8	76.0%	Download link	Missing	Log link
MVFNet-ResNet50	Kinetics-400	16x4	77.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	4x16	76.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	8x8	77.4%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	16x4	78.4%	Download link	Log link	Log link
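The "T x interval" column gives the number of frames per clip (T) and the spacing between consecutive sampled frames. The following is a minimal sketch of what such a setting means for frame indices, assuming plain fixed-stride sampling from a chosen start frame. The function name and the wrap-around policy are assumptions made for this illustration, not the repository's data loader.

```python
def clip_frame_indices(num_frames, clip_len, interval, start=0):
    """Indices of `clip_len` frames spaced `interval` apart, beginning at `start`.

    Indices wrap around with modulo when the clip runs past the end of the
    video; that policy is an assumption made for this sketch.
    """
    if num_frames <= 0 or clip_len <= 0 or interval <= 0:
        raise ValueError("num_frames, clip_len and interval must be positive")
    return [(start + i * interval) % num_frames for i in range(clip_len)]


# The three settings in the table all satisfy T * interval = 64.
for clip_len, interval in [(4, 16), (8, 8), (16, 4)]:
    indices = clip_frame_indices(num_frames=300, clip_len=clip_len, interval=interval)
    print("{}x{}: {}".format(clip_len, interval, indices))
```

Under this reading, all three settings cover the same temporal window, so moving from 4x16 to 16x4 samples that window more densely rather than extending it.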

Testing

To test with 3 crops and 10 clips per video, run:

# Dataset: Kinetics-400
# Architecture: R50_8x8, Top-1 Acc.=76.0%
bash scripts/dist_test_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py ckpt_path 8 --fcn_testing
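With 3 crops and 10 clips, each video is evaluated on 30 views. A common way to turn these into one prediction is to average the per-class scores over all views and take the highest-scoring class. The sketch below illustrates that idea with toy numbers. It assumes simple score averaging, and the function names are hypothetical; the repository's test script may aggregate differently.

```python
def average_scores(view_scores):
    """Average per-class scores over all views (one list of scores per view)."""
    if not view_scores:
        raise ValueError("need at least one view")
    num_classes = len(view_scores[0])
    return [sum(view[c] for view in view_scores) / len(view_scores)
            for c in range(num_classes)]


def predict(view_scores):
    """Return the index of the class with the highest averaged score."""
    averaged = average_scores(view_scores)
    return max(range(len(averaged)), key=averaged.__getitem__)


NUM_CLIPS, NUM_CROPS = 10, 3
# Toy scores for 3 classes: every view mildly prefers class 2.
views = [[0.2, 0.3, 0.5] for _ in range(NUM_CLIPS * NUM_CROPS)]
# Overwrite one view so that it disagrees with the rest.
views[0] = [0.9, 0.05, 0.05]
print(len(views), predict(views))
```

Averaging makes the final prediction less sensitive to a single unrepresentative clip or crop, at the cost of 30 forward passes per video.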

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

For example, to train MVFNet-ResNet50 on Kinetics-400 with 8 GPUs, you can run:

bash scripts/dist_train_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

We also provide scripts to train MVFNet on Kinetics-400 across multiple machines (e.g., 2 machines with 8 GPUs each, 16 GPUs in total).

# On the first machine, set --master_addr to the IP of the first machine
bash scripts/dist_train_multinode_1.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

# On the second machine, --master_addr is still the IP of the first machine
bash scripts/dist_train_multinode_2.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8
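To make the roles of the two machines concrete, here is a rough sketch of how a two-node job could be laid out with PyTorch's `torch.distributed.launch` utility. That the repository's scripts wrap this launcher is an assumption; `train.py`, the example address, and the helper names are placeholders for illustration only.

```python
def global_rank(node_rank, local_rank, gpus_per_node):
    """Global process rank, numbering GPUs node by node."""
    return node_rank * gpus_per_node + local_rank


def launch_command(node_rank, master_addr, nnodes=2, gpus_per_node=8,
                   master_port=29500, script="train.py"):
    """Build a torch.distributed.launch command line for one node.

    `script` is a placeholder, not the repository's actual entry point.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(gpus_per_node),
        "--nnodes={}".format(nnodes),
        "--node_rank={}".format(node_rank),
        "--master_addr={}".format(master_addr),
        "--master_port={}".format(master_port),
        script,
    ]


MASTER = "192.0.2.10"  # example address only; use the first machine's real IP
for node in range(2):
    print(" ".join(launch_command(node, MASTER)))
    ranks = [global_rank(node, gpu, 8) for gpu in range(8)]
    print("node {} hosts global ranks {}..{}".format(node, ranks[0], ranks[-1]))
```

Both commands share the same master address and differ only in the node rank, which is why the second machine still points at the first machine's IP.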

Acknowledgements

We especially thank the contributors of the mmaction codebase for providing helpful code.

License

This repository is released under the Apache-2.0 license, as found in the LICENSE file.

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2020MVFNet,
  author    = {Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
  title     = {MVFNet: Multi-View Fusion Network for Efficient Video Recognition},
  booktitle = {AAAI},
  year      = {2021}
}

Contact

If you have any questions, please file an issue or contact

Wenhao Wu: [email protected]
