Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

Related tags

Deep LearningCDFI
Overview

CDFI (Compression-Driven-Frame-Interpolation)

[Paper] (Coming soon...) | [arXiv]

Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Introduction

We propose a Compression-Driven network design for Frame Interpolation (CDFI), that leverages model compression to significantly reduce the model size (allows a better understanding of the current architecture) while making room for further improvements and achieving superior performance in the end. Concretely, we first compress AdaCoF and show that a 10X compressed AdaCoF performs similarly as its original counterpart; then we improve upon this compressed model with simple modifications. Note that typically it is prohibitive to implement the same improvements on the original heavy model.

  • We achieve a significant performance gain with only a quarter in size compared with the original AdaCoF

    Vimeo-90K Middlebury UCF101-DVF #Params
    PSNR, SSIM, LPIPS PSNR, SSIM, LPIPS PSNR, SSIM, LPIPS
    AdaCoF 34.38, 0.974, 0.019 35.74, 0.979, 0.019 35.20, 0.967, 0.019 21.8M
    Compressed AdaCoF 34.15, 0.973, 0.020 35.46, 0.978, 0.019 35.14, 0.967, 0.019 2.45M
    AdaCoF+ 34.58, 0.975, 0.018 36.12, 0.981, 0.017 35.19, 0.967, 0.019 22.9M
    Compressed AdaCoF+ 34.46, 0.975, 0.019 35.76, 0.979, 0.019 35.16, 0.967, 0.019 2.56M
    Our Final Model 35.19, 0.978, 0.010 37.17, 0.983, 0.008 35.24, 0.967, 0.015 4.98M
  • Our final model also performs favorably against other state-of-the-arts (details refer to our paper)

  • The proposed framework is generic and can be easily transferred to other DNN-based frame interpolation method

The above GIF is a demo of using our method to generate slow motion video, which increases the FPS from 5 to 160. We also provide a long video demonstration here (redirect to YouTube).

Environment

  • CUDA 11.0

  • python 3.8.3

  • torch 1.6.0

  • torchvision 0.7.0

  • cupy 7.7.0

  • scipy 1.5.2

  • numpy 1.19.1

  • Pillow 7.2.0

  • scikit-image 0.17.2

Test Pre-trained Models

Download repository:

$ git clone https://github.com/tding1/CDFI.git
$ cd CDFI/

Testing data

For user convenience, we already provide the Middlebury and UCF101-DVF test datasets in our repository, which can be found under directory test_data/.

Evaluation metrics

We use the built-in functions in skimage.metrics to compute the PSNR and SSIM, for which the higher the better. We also use LPIPS, a newly proposed metric that measures perceptual similarity, for which the smaller the better. For user convenience, we include the implementation of LPIPS in our repo under lpips_pytorch/, which is a slightly modified version of here (with an updated squeezenet backbone).

Test our pre-trained CDFI model

$ python test.py --gpu_id 0

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/cdfi_adacof/.

Test the compressed AdaCoF

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 5 --dilation 1

By default, it will load the compressed AdaCoF model checkpoints/compressed_adacof_F_5_D_1.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_5_D_1/.

Test the compressed AdaCoF+

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 11 --dilation 2

By default, it will load the compressed AdaCoF+ model checkpoints/compressed_adacof_F_11_D_2.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_11_D_2/.

Interpolate two frames

$ python interpolate_twoframe.py --gpu_id 0 --first_frame figs/0.png --second_frame figs/1.png --output_frame output.png

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth, and generate the intermediate frame output.png given two consecutive frames in a sequence.

Train Our Model

Training data

We use the Vimeo-90K triplet dataset for video frame interpolation task, which is relatively large (>32 GB).

$ wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
$ unzip vimeo_triplet.zip
$ rm vimeo_triplet.zip

Start training

$ python train.py --gpu_id 0 --data_dir path/to/vimeo_triplet/ --batch_size 8

It will generate an unique ID for each training, and all the intermediate results/records will be saved under model_weights/<training id>/. For a GPU device with memory around 10GB, the --batch_size can take a value as large as 3, otherwise CUDA may be out of memory. There are many other training options, e.g., --lr, --epochs, --loss and so on, can be found in train.py.

Apply CDFI to New Models

One nice thing about CDFI is that the framework can be easily applied to other (heavy) DNN models and potentially boost their performance. The key to CDFI is the optimization-based compression that compresses a model via fine-grained pruning. In particular, we use the efficient and easy-to-use sparsity-inducing optimizer OBPROXSG (see also paper), and summarize the compression procedure for any other model in the following.

  • Copy the OBPROXSG optimizer, which is already implemented as torch.optim.optimizer, to your working directory
  • Starting from a pre-trained model, finetune its weights by using the OBPROXSG optimizer, like using any standard PyTorch built-in optimizer such as SGD or Adam
    • It is not necessarily to use the full dataset for this finetuning process
  • The parameters for the OBPROXSG optimizer
    • lr: learning rate
    • lambda_: coefficient of the L1 regularization term
    • epochSize: number of batches in a epoch
    • Np: number of proximal steps, which is set to be 2 for pruning AdaCoF
    • No: number of orthant steps (key step to promote sparsity), for which we recommend using the default setting
    • eps: threshold for trimming zeros, which is set to be 0.0001 for pruning AdaCoF
  • After the optimization is done (either by reaching a maximum number of epochs or achieving a high sparsity), use the layer density as the compression ratio for that layer, as described in the paper
  • As an example, compare the architectures in models/adacof.py and model/compressed_adacof.py for compressing AdaCoF with the above procedure

Now it's ready to make further improvements/modifications on the compressed model, based on the understanding of its flaws/drawbacks.

Citation

Coming soon...

Acknowledgements

The code is largely based on HyeongminLEE/AdaCoF-pytorch and baowenbo/DAIN.

Owner
Tianyu Ding
Ph.D. in Applied Mathematics \\ Master in Computer Science
Tianyu Ding
Unsupervised Foreground Extraction via Deep Region Competition

Unsupervised Foreground Extraction via Deep Region Competition [Paper] [Code] The official code repository for NeurIPS 2021 paper "Unsupervised Foregr

28 Nov 06, 2022
PyTorch implementation of Decoupling Value and Policy for Generalization in Reinforcement Learning

PyTorch implementation of Decoupling Value and Policy for Generalization in Reinforcement Learning

48 Dec 08, 2022
[ICML 2020] DrRepair: Learning to Repair Programs from Error Messages

DrRepair: Learning to Repair Programs from Error Messages This repo provides the source code & data of our paper: Graph-based, Self-Supervised Program

Michihiro Yasunaga 155 Jan 08, 2023
Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation

Weak-supervised Visual Geo-localization via Attention-based Knowledge Distillation Introduction WAKD is a PyTorch implementation for our ICPR-2022 pap

2 Oct 20, 2022
TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction TSDF++ is a novel multi-object TSDF formulation that can encode mult

ETHZ ASL 130 Dec 29, 2022
Learning cell communication from spatial graphs of cells

ncem Features Repository for the manuscript Fischer, D. S., Schaar, A. C. and Theis, F. Learning cell communication from spatial graphs of cells. 2021

Theis Lab 77 Dec 30, 2022
ColossalAI-Examples - Examples of training models with hybrid parallelism using ColossalAI

ColossalAI-Examples This repository contains examples of training models with Co

HPC-AI Tech 185 Jan 09, 2023
IsoGCN code for ICLR2021

IsoGCN The official implementation of IsoGCN, presented in the ICLR2021 paper Isometric Transformation Invariant and Equivariant Graph Convolutional N

horiem 39 Nov 25, 2022
Tensorflow/Keras Plug-N-Play Deep Learning Models Compilation

DeepBay This project was created with the objective of compile Machine Learning Architectures created using Tensorflow or Keras. The architectures mus

Whitman Bohorquez 4 Sep 26, 2022
Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

5 Dec 10, 2022
Tiny Kinetics-400 for test

Kinetics-400迷你数据集 English | 简体中文 该数据集旨在解决的问题:参照Kinetics-400数据格式,训练基于自己数据的视频理解模型。 数据集介绍 Kinetics-400是视频领域benchmark常用数据集,详细介绍可以参考其官方网站Kinetics。整个数据集包含40

38 Jan 06, 2023
Codes for "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"

CSDI This is the github repository for the NeurIPS 2021 paper "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

106 Jan 04, 2023
Lenia - Mathematical Life Forms

For full version list, see Timeline in Lenia portal [2020-10-13] Update Python version with multi-kernel and multi-channel extensions (v3.4 LeniaNDK.p

Bert Chan 3.1k Dec 28, 2022
RAMA: Rapid algorithm for multicut problem

RAMA: Rapid algorithm for multicut problem Solves multicut (correlation clustering) problems orders of magnitude faster than CPU based solvers without

Paul Swoboda 60 Dec 13, 2022
INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing Existing studies on semantic parsing focus primarily on mapping a natural-la

7 Aug 22, 2022
Driller: augmenting AFL with symbolic execution!

Driller Driller is an implementation of the driller paper. This implementation was built on top of AFL with angr being used as a symbolic tracer. Dril

Shellphish 791 Jan 06, 2023
Real-Time Semantic Segmentation in Mobile device

Real-Time Semantic Segmentation in Mobile device This project is an example project of semantic segmentation for mobile real-time app. The architectur

708 Jan 01, 2023
DTCN IJCAI - Sequential prediction learning framework and algorithm

DTCN This is the implementation of our paper "Sequential Prediction of Social Me

Bobby 2 Jan 24, 2022
Code of the paper "Shaping Visual Representations with Attributes for Few-Shot Learning (ASL)".

Shaping Visual Representations with Attributes for Few-Shot Learning This code implements the Shaping Visual Representations with Attributes for Few-S

chx_nju 9 Sep 01, 2022
Benchmark for evaluating open-ended generation

OpenMEVA Contributed by Jian Guan, Zhexin Zhang. Thank Jiaxin Wen for DeBugging. OpenMEVA is a benchmark for evaluating open-ended story generation me

25 Nov 15, 2022