【ACMMM 2021】DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning

Last update: Dec 27, 2022

Overview

We release the code of DSANet (Dynamic Segment Aggregation Network). We introduce the DSA module to capture relationships among snippets for video-level representation learning. Equipped with DSA modules, I3D ResNet-50 reaches 78.2% top-1 accuracy on Kinetics-400.

The core implementation of the Dynamic Segment Aggregation module is in codes/models/modules_maker/DSA.py.
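For readers who want a rough feel for what "aggregating snippets with input-dependent weights" can look like, here is a toy, dependency-free sketch. It is not the code in DSA.py. The pooling step, the single linear layer, the softmax normalization, the zero padding, and the sharing of one kernel across all channels are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_kernel(segments: Matrix, weight: Matrix, bias: List[float]) -> List[float]:
    """Produce a length-K kernel that depends on the input segments.

    segments: T x C snippet-level features (T snippets, C channels).
    weight:   K x C parameters of a single linear layer (assumed design).
    bias:     K bias terms.
    """
    num_segments = len(segments)
    num_channels = len(segments[0])
    # Video-level descriptor: average every channel over all snippets.
    pooled = [
        sum(seg[c] for seg in segments) / num_segments for c in range(num_channels)
    ]
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(weight, bias)
    ]
    return softmax(logits)


def aggregate(segments: Matrix, kernel: List[float]) -> Matrix:
    """Mix neighbouring snippets along the segment axis using the kernel.

    Positions outside the video are treated as zeros (zero padding).
    """
    num_segments = len(segments)
    num_channels = len(segments[0])
    pad = len(kernel) // 2
    output = []
    for t in range(num_segments):
        mixed = [0.0] * num_channels
        for k, w in enumerate(kernel):
            source = t + k - pad
            if 0 <= source < num_segments:
                for c in range(num_channels):
                    mixed[c] += w * segments[source][c]
        output.append(mixed)
    return output


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 snippets, 2 channels
    zero_weight = [[0.0, 0.0] for _ in range(3)]     # K = 3
    kernel = dynamic_kernel(features, zero_weight, [0.0, 0.0, 0.0])
    print(kernel)
    print(aggregate(features, kernel))
```

With all-zero weights the kernel is uniform, so each output snippet is a plain average of its neighbours. With non-zero weights the kernel changes from video to video, which is the "dynamic" part of the idea.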

[July 7, 2021] We release the core code of DSANet.

[July 3, 2021] DSANet has been accepted by ACMMM 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments were run with Python 3.7 and PyTorch 1.5. Other versions should work but have not been tested.
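To see quickly whether a local environment matches the tested versions, a small check along the following lines may help. It is a convenience sketch and not part of the repository. The helper names are hypothetical, and "untested" here only means the version differs from Python 3.7 or PyTorch 1.5, not that it is known to fail.

```python
import importlib.util
import re
import sys

# Versions the README reports as tested; anything else is untested, not unsupported.
TESTED_PYTHON = (3, 7)
TESTED_TORCH = (1, 5)


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '1.5.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_torch_version():
    """Return PyTorch's (major, minor), or None if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also runs without PyTorch

    return major_minor(torch.__version__)


def report() -> list:
    """Describe how the local environment compares with the tested versions."""
    lines = []
    py = sys.version_info[:2]
    note = "tested" if py == TESTED_PYTHON else "untested"
    lines.append(f"Python {py[0]}.{py[1]}: {note}")

    torch_version = installed_torch_version()
    if torch_version is None:
        lines.append("PyTorch: not installed")
    else:
        note = "tested" if torch_version == TESTED_TORCH else "untested"
        lines.append(f"PyTorch {torch_version[0]}.{torch_version[1]}: {note}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the script prints one line for Python and one for PyTorch, so a mismatch is visible before starting a long training job.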

Download Pretrained Models

Download the ImageNet pre-trained models (needed for offline environments):

cd pretrained
sh download_imgnet.sh

Download the Kinetics-400 (K400) pre-trained models for inference:

TODO

Data Preparation

Data preparation follows the same procedure as MVFNet.

Model Zoo

TODO

Testing

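To evaluate a checkpoint, substitute your config file and checkpoint for CONFIG_PATH and CHECKPOINT_PATH:

bash dist_test_recognizer.sh CONFIG_PATH CHECKPOINT_PATH 8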

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

For example, to train DSANet with 8 GPUs, you can run:

bash dist_train_recognizer.sh configs/kinetics/r50_e100.py 8

Acknowledgements

We especially thank the contributors of the MVFNet and mmaction codebases for providing helpful code.

License

This repository is released under the Apache-2.0 license, as found in the LICENSE file.

Related Work

MVFNet: Multi-View Fusion Network for Efficient Video Recognition, AAAI 2021

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2021dsanet,
  title={DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning},
  author={Wu, Wenhao and Zhao, Yuxiang and Xu, Yanwu and Tan, Xiao and He, Dongliang and Zou, Zhikang and Ye, Jin and Li, Yingying and Yao, Mingde and Dong, Zichao and others},
  booktitle = {ACMMM},
  year={2021}
}

Contact

For any questions, please file an issue or contact:

Wenhao Wu: [email protected]
Yuxiang Zhao: [email protected]
