A fast model to compute optical flow between two input images.

Related tags

Deep LearningDCVNet
Overview

DCVNet: Dilated Cost Volumes for Fast Optical Flow

This repository contains our implementation of the paper:

@InProceedings{jiang2021dcvnet,
  title={DCVNet: Dilated Cost Volumes for Fast Optical Flow},
  author={Jiang, Huaizu and Learned-Miller, Erik},
  booktitle={arXiv},
  year={2021}
}

Need a fast optical flow model? Try DCVNet

  • Fast. On a mid-end GTX 1080ti GPU, DCVNet runs in real time at 71 fps (frames-per-second) to process images with sizes of 1024 × 436.
  • Compact and accurate. DCVNet has 4.94M parameters and consumes 1.68GB GPU memory during inference. It achieves comparable accuracy to state-of-the-art approaches on the MPI Sintel benchmark.

In the figure above, for each model, the circle radius indicates the number of parameters (larger radius means more parameters). The center of a circle corresponds to a model’s EPE (end-point-error).

Requirements

This code has been tested with Python 3.7, PyTorch 1.6.0, and CUDA 9.2. We suggest to use a conda environment.

conda create -n dcvnet
conda activate dcvnet
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboardX scipy opencv -c pytorch
pip install yacs

We use an open-source implementation https://github.com/ClementPinard/Pytorch-Correlation-extension to compute dilated cost volumes. Follow the instructions there to install this module.

Demos

Pretrained models can be downloaded by running

./scripts/download_models.sh

or downloaded from Google drive.

You can demo a pre-trained model on a sequence of frames

python demo.py --weights-path pretrained_models/sceneflow_dcvnet.pth --path demo-frames

Required data

The following datasets are required to train and evaluate DCVNet.

We borrow the data loaders used in RAFT. By default, dcvnet/data/raft/datasets.py will search for the datasets in these locations. You can create symbolic links to wherever the datasets were downloaded in the datasets folder

|-- datasets
    |-- Driving
        |-- frames_cleanpass
        |-- optical_flow
    |-- FlyingThings3D_subset
        |-- train
            |-- flow
            |-- image_clean
        |-- val
            |-- flow
            |-- image_clean
    |-- Monkaa
        |-- frames_cleanpass
        |-- optical_flow
    |-- MPI_Sintel
        |-- test
        |-- training
    |-- KITTI2012
        |-- testing
        |-- training
    |-- KITTI2015
        |-- testing
        |-- training
    |-- HD1K
        |-- hd1k_flow_gt
        |-- hd1k_input

Evaluation

You can evaluate a pre-trained model using tools/evaluate_optical_flow.py

python evaluate_optical_flow.py --weights_path models/dcvnet-sceneflow.pth --dataset sintel

You can optionally add the --amp switch to do inference in mixed precision to reduce GPU memory usage.

Training

We used 8 GTX 1080ti GPUs for training. Training logs will be written to the output folder, which can be visualized using tensorboard.

# train on the synthetic scene flow dataset
python tools/train_optical_flow.py --config-file configs/sceneflow_dcvnet.yaml 

# fine-tune it on the MPI-Sintel dataset
# 4 GPUs are sufficient, but here we use 8 GPUs for fast training
python tools/train_optical_flow.py --config-file configs/sintel_dcvnet.yaml --pretrain-weights output/SceneFlow/sceneflow_dcvnet/default/train_epoch_50.pth

# fine-tune it on the KITTI 2012 and 2015 dataset
# we only use 6 GPUs (3 GPUs are sufficient) since the batch size is 6
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python tools/train_optical_flow.py --config-file configs/kitti12+15_dcvnet.yaml --pretrain-weights output/Sintel+SceneFlow/sintel_dcvnet/default/train_epoch_5.pth

Note on the inference speed

In the main branch, the computation of the dilated cost volumes can be further optimized without using the for loop. Checkout the efficient branch for details. If you are interested in testing the inference speed, we suggest to switch to the efficient branch.

git checkout efficient
CUDA_VISIBLE_DEVICES=0 python tools/evaluate_optical_flow.py --dry-run

We haven't fixed this problem because our pre-trained models are based on the implementation in the main branch, which are not compatible with the resizing in the efficient branch. We need to re-train all our models. It will be fixed soon.

To-do

  • Fix the problem of efficient cost volume computation.
  • Train the model on the AutoFlow dataset.

Acknowledgment

Our implementation is built on top of RAFT, Pytorch-Correlation-extension, yacs, Detectron2, and semseg. We thank the authors for releasing and maintaining the code.

Owner
Huaizu Jiang
Assistant Professor at Northeastern University.
Huaizu Jiang
Code for visualizing the loss landscape of neural nets

Visualizing the Loss Landscape of Neural Nets This repository contains the PyTorch code for the paper Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer

Tom Goldstein 2.2k Jan 09, 2023
Vertex AI: Serverless framework for MLOPs (ESP / ENG)

Vertex AI: Serverless framework for MLOPs (ESP / ENG) Español Qué es esto? Este repo contiene un pipeline end to end diseñado usando el SDK de Kubeflo

Hernán Escudero 2 Apr 28, 2022
Code used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Train longer, generalize better - Big batch training This is a code repository used to generate the results appearing in "Train longer, generalize bet

Elad Hoffer 145 Sep 16, 2022
Codeflare - Scale complex AI/ML pipelines anywhere

Scale complex AI/ML pipelines anywhere CodeFlare is a framework to simplify the integration, scaling and acceleration of complex multi-step analytics

CodeFlare 169 Nov 29, 2022
Exploring Visual Engagement Signals for Representation Learning

Exploring Visual Engagement Signals for Representation Learning Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie and Ser-Nam Lim C

Menglin Jia 9 Jul 23, 2022
PyTorch implementation of Off-policy Learning in Two-stage Recommender Systems

Off-Policy-2-Stage This repo provides a PyTorch implementation of the MovieLens experiments for the following paper: Off-policy Learning in Two-stage

Jiaqi Ma 25 Dec 12, 2022
Pytorch Implementation of Residual Vision Transformers(ResViT)

ResViT Official Pytorch Implementation of Residual Vision Transformers(ResViT) which is described in the following paper: Onat Dalmaz and Mahmut Yurt

ICON Lab 41 Dec 08, 2022
The repository contains source code and models to use PixelNet architecture used for various pixel-level tasks. More details can be accessed at .

PixelNet: Representation of the pixels, by the pixels, and for the pixels. We explore design principles for general pixel-level prediction problems, f

Aayush Bansal 196 Aug 10, 2022
The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

PIC4SeRCentre 20 Jan 03, 2023
Contrastive Fact Verification

VitaminC This repository contains the dataset and models for the NAACL 2021 paper: Get Your Vitamin C! Robust Fact Verification with Contrastive Evide

47 Dec 19, 2022
Real-time LIDAR-based Urban Road and Sidewalk detection for Autonomous Vehicles 🚗

urban_road_filter: a real-time LIDAR-based urban road and sidewalk detection algorithm for autonomous vehicles Dependency ROS (tested with Kinetic and

JKK - Vehicle Industry Research Center 180 Dec 12, 2022
CoINN: Correlated-informed neural networks: a new machine learning framework to predict pressure drop in micro-channels

CoINN: Correlated-informed neural networks: a new machine learning framework to predict pressure drop in micro-channels Accurate pressure drop estimat

Alejandro Montanez 0 Jan 21, 2022
NeuroGen: activation optimized image synthesis for discovery neuroscience

NeuroGen: activation optimized image synthesis for discovery neuroscience NeuroGen is a framework for synthesizing images that control brain activatio

3 Aug 17, 2022
Aligning Latent and Image Spaces to Connect the Unconnectable

About This repo contains the official implementation of the Aligning Latent and Image Spaces to Connect the Unconnectable paper. It is a GAN model whi

Ivan Skorokhodov 203 Jan 03, 2023
CSD: Consistency-based Semi-supervised learning for object Detection

CSD: Consistency-based Semi-supervised learning for object Detection (NeurIPS 2019) By Jisoo Jeong, Seungeui Lee, Jee-soo Kim, Nojun Kwak Installation

80 Dec 15, 2022
A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION

CFN-SR A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION The audio-video based multimodal

skeleton 15 Sep 26, 2022
A task Provided by A respective Artenal Ai and Ml based Company to complete it

A task Provided by A respective Alternal Ai and Ml based Company to complete it .

Parth Madan 1 Jan 25, 2022
A PyTorch Implementation of Neural IMage Assessment

NIMA: Neural IMage Assessment This is a PyTorch implementation of the paper NIMA: Neural IMage Assessment (accepted at IEEE Transactions on Image Proc

yunxiaos 418 Dec 29, 2022
A toy project using OpenCV and PyMunk

A toy project using OpenCV, PyMunk and Mediapipe the source code for my LindkedIn post It's just a toy project and I didn't write a documentation yet,

Amirabbas Asadi 82 Oct 28, 2022
Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Python TFLite scripts for detecting objects of any class in an image without knowing their label.

Ibai Gorordo 42 Oct 07, 2022