Code release for Convolutional Two-Stream Network Fusion for Video Action Recognition

Last update: Dec 31, 2022

================================================================================

Convolutional Two-Stream Network Fusion for Video Action Recognition

This repository contains the code for our CVPR 2016 paper:

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
"Convolutional Two-Stream Network Fusion for Video Action Recognition"
in Proc. CVPR 2016

If you find the code useful for your research, please cite our paper:

    @inproceedings{feichtenhofer2016convolutional,
      title={Convolutional Two-Stream Network Fusion for Video Action Recognition},
      author={Feichtenhofer, Christoph and Pinz, Axel and Zisserman, Andrew},
      booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2016}
    }

Requirements

The code was tested on Ubuntu 14.04 and Windows 10 using MATLAB R2015b and NVIDIA Titan X or Z GPUs.

If you have questions regarding the implementation, please contact:

Christoph Feichtenhofer

================================================================================

Setup

Download the code: git clone --recursive https://github.com/feichtenhofer/twostreamfusion
Compile the code by running compile.m.
- This will also compile a modified (and older) version of the MatConvNet toolbox. In case of any issues, please follow the installation instructions on the MatConvNet homepage.
Edit the file cnn_setup_environment.m to adjust the models and data paths.
Download the pretrained model files and the datasets linked below, and unpack them into your models and data directories.

Optionally, you can pretrain your own two-stream models by running:
1. cnn_ucf101_spatial(); to train the appearance network stream.
2. cnn_ucf101_temporal(); to train the optical flow network stream.

Run cnn_ucf101_fusion(); this will use the downloaded models and demonstrate training of our final architecture on UCF101/HMDB51.
- To train on the CPU, clear the variable opts.train.gpus.
- If you encounter memory issues on your GPU, consider decreasing the cudnnWorkspaceLimit (the default is 512 MB).
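
The training order described above can be sketched as a short MATLAB session. This is a sketch under assumptions, not a verified script: it assumes the repository root is the current folder, that compile.m has already completed, and that cnn_setup_environment.m points at valid models and data directories. The function names are the ones given in this README. Where exactly the options live inside cnn_ucf101_fusion.m is an assumption, so check the script before editing it.

```matlab
% Optional: pretrain the two single-stream baselines yourself.
% Skip these two calls if you use the downloaded pretrained models.
cnn_ucf101_spatial();    % appearance (RGB) network stream
cnn_ucf101_temporal();   % optical flow network stream

% Train the fused architecture on top of the single-stream models.
cnn_ucf101_fusion();

% Settings to edit inside cnn_ucf101_fusion.m before running it
% (assumed location; verify against the script):
%   opts.train.gpus = [];   % empty list -> train on the CPU
%   lower cudnnWorkspaceLimit wherever the script sets it if the GPU
%   runs out of memory (the README states a 512 MB default)
```

Running the two pretraining calls first is only needed when you want your own baselines; the fusion step itself expects single-stream models to already be present in the models directory.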

Pretrained models

Download our baseline networks trained on UCF101 here:

Data

Pre-computed optical flow images and resized RGB frames for the UCF101 and HMDB51 datasets:

UCF101 RGB: part1 part2 part3
UCF101 Flow: part1 part2 part3
HMDB51 RGB: part1
HMDB51 Flow: part1

Use it on your own dataset

Our optical flow extraction tool provides OpenCV wrappers for optical flow extraction on a GPU.
