Code for "Layered Neural Rendering for Retiming People in Video."

Last update: Dec 16, 2022

Overview

Layered Neural Rendering in PyTorch

This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering for Retiming People in Video."

This is not an officially supported Google product.

Prerequisites

Linux
Python 3.6+
NVIDIA GPU + CUDA CuDNN

Installation

This code has been tested with PyTorch 1.4 and Python 3.8.

Install PyTorch 1.4 and other dependencies.
- For pip users, run pip install -r requirements.txt.
- For Conda users, create a new Conda environment with conda env create -f environment.yml.

Data Processing

Download the data for a video used in our paper (e.g. "reflection"):

bash ./datasets/download_data.sh reflection

Alternatively, download the data for all videos by specifying all in place of the video name.

Download the pretrained keypoint-to-UV model weights:

bash ./scripts/download_kp2uv_model.sh

The pretrained model will be saved at ./checkpoints/kp2uv/latest_net_Kp2uv.pth.

Generate the UV maps from the keypoints:

bash datasets/prepare_iuv.sh ./datasets/reflection
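
Since the UV-map step depends on both downloads above, a small pre-flight check can catch a missing input before the script runs. The following is a minimal sketch in Python, assuming the directory layout described in this section; the helper name missing_inputs is hypothetical and not part of this repository.

```python
from pathlib import Path

# Path stated in this README for the pretrained keypoint-to-UV weights.
KP2UV_WEIGHTS = Path("checkpoints/kp2uv/latest_net_Kp2uv.pth")


def missing_inputs(repo_root, video="reflection"):
    """Return the paths that prepare_iuv.sh needs but that do not exist yet.

    Hypothetical helper for illustration, not part of the repository.
    """
    root = Path(repo_root)
    required = [
        root / "datasets" / video,  # created by download_data.sh
        root / KP2UV_WEIGHTS,       # created by download_kp2uv_model.sh
    ]
    return [p for p in required if not p.exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is present.
    for path in missing_inputs("."):
        print(f"missing: {path}")
```

If the script prints nothing, both the dataset folder and the weights file are in place.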

Training

To train a model on a video (e.g. "reflection"), run:

python train.py --name reflection --dataroot ./datasets/reflection --gpu_ids 0,1

To view training results and loss plots, visit the URL http://localhost:8097. Intermediate results are also at ./checkpoints/reflection/web/index.html.

You can find more scripts in the scripts directory, e.g. run_${VIDEO}.sh which combines data processing, training, and saving layer results for a video.

Note:

It is recommended to use >=2 GPUs, each with >=16GB memory.
The training script first trains the low-resolution model for --num_epochs at --batch_size, and then trains the upsampling module for --num_epochs_upsample at --batch_size_upsample. If you do not need the upsampled result, pass --num_epochs_upsample 0.
Training the upsampling module requires ~2.5x as much memory as the low-resolution model, so set --batch_size_upsample accordingly. The provided scripts set the batch sizes appropriately for 2 GPUs with 16GB memory each.
GPU memory scales linearly with the number of layers.
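
The memory notes above can be turned into a rough starting value for --batch_size_upsample. This is a minimal sketch, assuming memory use grows roughly in proportion to batch size (a simplification that this README does not state); the function name is hypothetical, and the 2.5 default is the approximate ratio quoted above.

```python
import math


def suggest_upsample_batch_size(batch_size, memory_ratio=2.5):
    """Scale a low-resolution batch size down for the upsampling module.

    Assumes memory is roughly proportional to batch size, so treat the
    result as a starting point to tune, not a guarantee.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Round down so the estimate stays within the same memory budget,
    # but never go below a batch size of 1.
    return max(1, math.floor(batch_size / memory_ratio))


print(suggest_upsample_batch_size(16))  # 16 / 2.5 = 6.4, rounded down to 6
```

Adding layers raises memory use further, so a video with many layers may need a smaller value than this estimate.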

Saving layer results from a trained model

Run the trained model:

python test.py --name reflection --dataroot ./datasets/reflection --do_upsampling

The results (RGBA layers, videos) will be saved to ./results/reflection/test_latest/.
Passing --do_upsampling uses the results of the upsampling module. If the upsampling module hasn't been trained (i.e. you trained with --num_epochs_upsample 0), omit this flag.
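
When test runs are launched from a script, the flag can be chosen from the training setting. A minimal sketch, using only the options shown in this README; the helper build_test_command is hypothetical and not part of the repository.

```python
import shlex


def build_test_command(name, dataroot, num_epochs_upsample):
    """Assemble the test.py command line shown above as an argument list.

    --do_upsampling is appended only when the upsampling module was
    trained, i.e. when num_epochs_upsample is greater than zero.
    """
    cmd = ["python", "test.py", "--name", name, "--dataroot", dataroot]
    if num_epochs_upsample > 0:
        cmd.append("--do_upsampling")
    return cmd


# Print the command for a model trained without the upsampling module.
print(shlex.join(build_test_command("reflection", "./datasets/reflection", 0)))
```

The returned list can be passed directly to subprocess.run from the repository root.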

Custom video

To train on your own video, you will have to preprocess the data:

Extract the frames, e.g.

mkdir ./datasets/my_video && cd ./datasets/my_video 
mkdir rgb && ffmpeg -i video.mp4 rgb/%04d.png

Resize the frames to 256x448 and save them in my_video/rgb_256, then resize them to 512x896 and save them in my_video/rgb_512.
Run AlphaPose and Pose Tracking on the frames. Save the results as my_video/keypoints.json.
Create my_video/metadata.json following these instructions.
If your video has camera motion, either (1) stabilize the video, or (2) maintain the camera motion by computing homographies and saving as my_video/homographies.txt. See scripts/run_cartwheel.sh for a training example with camera motion, and see ./datasets/cartwheel/homographies.txt for formatting.
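
The frame extraction and resizing steps above could be scripted as follows. This sketch only covers the ffmpeg steps. It assumes that "256x448" means height 256 and width 448 (the text does not say which is which), that ffmpeg is on the PATH, and that resizing directly from video.mp4 is acceptable. The helper names frame_commands and run_all are hypothetical.

```python
import subprocess
from pathlib import Path

# Output folder -> (width, height). Assumption: the sizes in the text are
# height x width, i.e. landscape frames.
SIZES = {"rgb_256": (448, 256), "rgb_512": (896, 512)}


def frame_commands(video="video.mp4", out_root="."):
    """Return ffmpeg argument lists for full-size and resized frames."""
    root = Path(out_root)
    # Full-resolution frames, as in the ffmpeg command above.
    cmds = [["ffmpeg", "-i", video, str(root / "rgb" / "%04d.png")]]
    for folder, (width, height) in SIZES.items():
        cmds.append([
            "ffmpeg", "-i", video,
            "-vf", f"scale={width}:{height}",  # ffmpeg expects width:height
            str(root / folder / "%04d.png"),
        ])
    return cmds


def run_all(video="video.mp4", out_root="."):
    """Create the output folders and run each command (requires ffmpeg)."""
    for cmd in frame_commands(video, out_root):
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling run_all() from inside ./datasets/my_video would write rgb, rgb_256, and rgb_512 next to video.mp4. The keypoints, metadata, and homography steps still have to be done separately.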

Note: Videos that are suitable for our method have the following attributes:

Static camera or limited camera motion that can be represented with a homography.
Limited number of people, due to GPU memory limitations. We tested up to 7 people and 7 layers. Multiple people can be grouped onto the same layer, though they cannot be individually retimed.
People that move relative to the background (static people will be absorbed into the background layer).
We tested a video length of up to 200 frames (~7 seconds).

Citation

If you use this code for your research, please cite the following paper:

@inproceedings{lu2020,
  title={Layered Neural Rendering for Retiming People in Video},
  author={Lu, Erika and Cole, Forrester and Dekel, Tali and Xie, Weidi and Zisserman, Andrew and Salesin, David and Freeman, William T and Rubinstein, Michael},
  booktitle={SIGGRAPH Asia},
  year={2020}
}

Acknowledgments

This code is based on pytorch-CycleGAN-and-pix2pix.
