Code for "Layered Neural Rendering for Retiming People in Video."

Overview

Layered Neural Rendering in PyTorch

This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering for Retiming People in Video."

This is not an officially supported Google product.

Prerequisites

  • Linux
  • Python 3.6+
  • NVIDIA GPU + CUDA CuDNN

Installation

This code has been tested with PyTorch 1.4 and Python 3.8.

  • Install PyTorch 1.4 and other dependencies.
    • For pip users, please type the command pip install -r requirements.txt.
    • For Conda users, you can create a new Conda environment using conda env create -f environment.yml.

Data Processing

  • Download the data for a video used in our paper (e.g. "reflection"):
bash ./datasets/download_data.sh reflection
  • Or alternatively, download all the data by specifying all.
  • Download the pretrained keypoint-to-UV model weights:
bash ./scripts/download_kp2uv_model.sh

The pretrained model will be saved at ./checkpoints/kp2uv/latest_net_Kp2uv.pth.

  • Generate the UV maps from the keypoints:
bash datasets/prepare_iuv.sh ./datasets/reflection

Training

  • To train a model on a video (e.g. "reflection"), run:
python train.py --name reflection --dataroot ./datasets/reflection --gpu_ids 0,1
  • To view training results and loss plots, visit the URL http://localhost:8097. Intermediate results are also at ./checkpoints/reflection/web/index.html.

You can find more scripts in the scripts directory, e.g. run_${VIDEO}.sh which combines data processing, training, and saving layer results for a video.

Note:

  • It is recommended to use >=2 GPUs, each with >=16GB memory.
  • The training script first trains the low-resolution model for --num_epochs at --batch_size, and then trains the upsampling module for --num_epochs_upsample at --batch_size_upsample. If you do not need the upsampled result, pass --num_epochs_upsample 0.
  • Training the upsampling module requires ~2.5x memory as the low-resolution model, so set batch_size_upsample accordingly. The provided scripts set the batch sizes appropriately for 2 GPUs with 16GB memory.
  • GPU memory scales linearly with the number of layers.

Saving layer results from a trained model

  • Run the trained model:
python test.py --name reflection --dataroot ./datasets/reflection --do_upsampling
  • The results (RGBA layers, videos) will be saved to ./results/reflection/test_latest/.
  • Passing --do_upsampling uses the results of the upsampling module. If the upsampling module hasn't been trained (num_epochs_upsample=0), then remove this flag.

Custom video

To train on your own video, you will have to preprocess the data:

  1. Extract the frames, e.g.
    mkdir ./datasets/my_video && cd ./datasets/my_video 
    mkdir rgb && ffmpeg -i video.mp4 rgb/%04d.png
    
  2. Resize the video to 256x448 and save the frames in my_video/rgb_256, and resize the video to 512x896 and save in my_video/rgb_512.
  3. Run AlphaPose and Pose Tracking on the frames. Save results as my_video/keypoints.json
  4. Create my_video/metadata.json following these instructions.
  5. If your video has camera motion, either (1) stabilize the video, or (2) maintain the camera motion by computing homographies and saving as my_video/homographies.txt. See scripts/run_cartwheel.sh for a training example with camera motion, and see ./datasets/cartwheel/homographies.txt for formatting.

Note: Videos that are suitable for our method have the following attributes:

  • Static camera or limited camera motion that can be represented with a homography.
  • Limited number of people, due to GPU memory limitations. We tested up to 7 people and 7 layers. Multiple people can be grouped onto the same layer, though they cannot be individually retimed.
  • People that move relative to the background (static people will be absorbed into the background layer).
  • We tested a video length of up to 200 frames (~7 seconds).

Citation

If you use this code for your research, please cite the following paper:

@inproceedings{lu2020,
  title={Layered Neural Rendering for Retiming People in Video},
  author={Lu, Erika and Cole, Forrester and Dekel, Tali and Xie, Weidi and Zisserman, Andrew and Salesin, David and Freeman, William T and Rubinstein, Michael},
  booktitle={SIGGRAPH Asia},
  year={2020}
}

Acknowledgments

This code is based on pytorch-CycleGAN-and-pix2pix.

Owner
Google
Google ❤️ Open Source
Google
Using fully convolutional networks for semantic segmentation with caffe for the cityscapes dataset

Using fully convolutional networks for semantic segmentation (Shelhamer et al.) with caffe for the cityscapes dataset How to get started Download the

Simon Guist 27 Jun 06, 2022
Event sourced bank - A wide-and-shallow example using the Python event sourcing library

Event Sourced Bank A "wide but shallow" example of using the Python event sourci

3 Mar 09, 2022
Message Passing on Cell Complexes

CW Networks This repository contains the code used for the papers Weisfeiler and Lehman Go Cellular: CW Networks (Under review) and Weisfeiler and Leh

Twitter Research 108 Jan 05, 2023
TDN: Temporal Difference Networks for Efficient Action Recognition

TDN: Temporal Difference Networks for Efficient Action Recognition Overview We release the PyTorch code of the TDN(Temporal Difference Networks).

Multimedia Computing Group, Nanjing University 326 Dec 13, 2022
Image Captioning using CNN and Transformers

Image-Captioning Keras/Tensorflow Image Captioning application using CNN and Transformer as encoder/decoder. In particulary, the architecture consists

24 Dec 28, 2022
Arquitetura e Desenho de Software.

S203 Este é um repositório dedicado às aulas de Arquitetura e Desenho de Software, cuja sigla é "S203". E agora, José? Como não tenho muito a falar aq

Fabio 7 Oct 23, 2021
Automatic 2D-to-3D Video Conversion with CNNs

Deep3D: Automatic 2D-to-3D Video Conversion with CNNs How To Run To run this code. Please install MXNet following the official document. Deep3D requir

Eric Junyuan Xie 1.2k Dec 30, 2022
The source code of CVPR17 'Generative Face Completion'.

GenerativeFaceCompletion Matcaffe implementation of our CVPR17 paper on face completion. In each panel from left to right: original face, masked input

Yijun Li 313 Oct 18, 2022
Blender add-on: Add to Cameras menu: View → Camera, View → Add Camera, Camera → View, Previous Camera, Next Camera

Blender add-on: Camera additions In 3D view, it adds these actions to the View|Cameras menu: View → Camera : set the current camera to the 3D view Vie

German Bauer 11 Feb 08, 2022
Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral] Learning to Disambiguate Strongly In

Zicong Fan 40 Dec 22, 2022
A set of tools for converting a darknet dataset to COCO format working with YOLOX

darknet格式数据→COCO darknet训练数据目录结构(详情参见dataset/darknet): darknet ├── class.names ├── gen_config.data ├── gen_train.txt ├── gen_valid.txt └── images

RapidAI-NG 148 Jan 03, 2023
GeDML is an easy-to-use generalized deep metric learning library

GeDML is an easy-to-use generalized deep metric learning library

Borui Zhang 32 Dec 05, 2022
Approaches to modeling terrain and maps in python

topography 🌎 Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

John Gutierrez 1 Aug 10, 2022
IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling

IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling This is my code, data and approach for the IEEE-CIS Technical Challen

3 Sep 18, 2022
A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squares.

W.I.P-Aim-Memory-Game A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squar

dE_soot 1 Dec 08, 2021
Metric learning algorithms in Python

metric-learn: Metric Learning in Python metric-learn contains efficient Python implementations of several popular supervised and weakly-supervised met

1.3k Jan 02, 2023
Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, An Introduction

YIFAN WANG 1.4k Dec 30, 2022
Official code for "Decoupling Zero-Shot Semantic Segmentation"

Decoupling Zero-Shot Semantic Segmentation This is the official code for the arxiv. ZegFormer is the first framework that decouple the zero-shot seman

Jian Ding 108 Dec 30, 2022
Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentation"

Hyper-Convolution Networks for Biomedical Image Segmentation Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentatio

Tianyu Ma 17 Nov 02, 2022