Playable Video Generation
Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

Paper: ArXiv
Supplementary: Website
Demo: Try it Live

Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step, as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with wide environment variety.

Overview



Figure 1. Illustration of the proposed CADDY model for playable video generation.


Given a set of completely unlabeled videos, we jointly learn a set of discrete actions and a video generation model conditioned on the learned actions. At test time, the user can control the generated video on the fly by providing action labels as if they were playing a video game. We name our method CADDY. Our architecture for unsupervised playable video generation is composed of several components. An encoder E extracts frame representations from the input sequence. A temporal model estimates the successive states using a recurrent dynamics network R and an action network A which predicts the action label corresponding to the current action performed in the input sequence. Finally, a decoder D reconstructs the input frames. The model is trained using reconstruction as the main driving loss.
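To make the data flow concrete, below is a minimal PyTorch sketch of this pipeline. Layer sizes, the Gumbel-Softmax action bottleneck, and the GRU dynamics are illustrative assumptions, not the authors' exact implementation.

    # Minimal sketch of the encoder E, action network A, dynamics network R
    # and decoder D described above. Layer sizes, the Gumbel-Softmax action
    # bottleneck and the GRU dynamics are illustrative assumptions, not the
    # exact CADDY implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CaddySketch(nn.Module):
        def __init__(self, state_dim=128, actions_count=7):
            super().__init__()
            self.encoder = nn.Sequential(              # E: frame -> features
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, state_dim))
            # A: predicts a discrete action from two consecutive frame features
            self.action_net = nn.Linear(2 * state_dim, actions_count)
            # R: recurrent dynamics conditioned on the previous state and action
            self.dynamics = nn.GRUCell(state_dim + actions_count, state_dim)
            self.decoder = nn.Sequential(              # D: state -> 32x32 frame
                nn.Linear(state_dim, 64 * 8 * 8), nn.ReLU(),
                nn.Unflatten(1, (64, 8, 8)),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

        def forward(self, frames):                     # frames: (B, T, 3, 32, 32)
            feats = [self.encoder(frames[:, t]) for t in range(frames.size(1))]
            state, recons = feats[0], []
            for t in range(1, len(feats)):
                logits = self.action_net(torch.cat([feats[t - 1], feats[t]], -1))
                action = F.gumbel_softmax(logits, hard=True)  # discrete bottleneck
                state = self.dynamics(torch.cat([state, action], -1), state)
                recons.append(self.decoder(state))
            # Reconstruction of the input frames is the main driving loss
            return F.mse_loss(torch.stack(recons, 1), frames[:, 1:])

A real training loop would minimize this reconstruction loss over batches of video clips; at inference time, the predicted action is simply replaced by the action label chosen by the user.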

Requirements

We recommend using Linux with one or more CUDA-compatible GPUs. We provide both a Conda environment and a Dockerfile to configure the required libraries.

Conda

The environment can be installed and activated with:

conda env create -f env.yml

conda activate video-generation
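As a quick sanity check that the environment sees your GPUs (assuming PyTorch is among the installed libraries, since the pretrained models are PyTorch checkpoints), you can run:

python -c "import torch; print(torch.cuda.is_available())"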

Docker

Use the Dockerfile to build the docker image:

docker build -t video-generation:1.0 .

Run the docker image mounting the root directory to /video-generation in the docker container:

docker run -it --gpus all --ipc=host -v /path/to/directory/video-generation:/video-generation video-generation:1.0 /bin/bash
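To verify that the GPUs are also visible from inside the container, a quick check is to run nvidia-smi in the container shell.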

Preparing Datasets

BAIR

Coming soon

Atari Breakout

Download the breakout_160_ours.tar.gz archive from Google Drive and extract it under the data folder.
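For example, assuming the archive was downloaded to the repository root, it can be extracted with:

mkdir -p data && tar -xzvf breakout_160_ours.tar.gz -C data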

Tennis

The Tennis dataset is automatically acquired from YouTube by running:

./get_tennis_dataset.sh

This requires youtube-dl to be installed (Download). Please run youtube-dl -U to update the utility to the latest version. The dataset will be created at data/tennis_v4_256_ours.

Custom Datasets

Custom datasets can be created from a user-provided folder containing plain videos. Video frames are extracted at the specified resolution and framerate using ffmpeg, which supports multiple input formats. By default, only mp4 files are acquired.

python -m dataset.acquisition.convert_video_directory --video_directory <video_directory> --output_directory <output_directory> --target_size <width> <height> [--fps <fps> --video_extension <extension> --processes <processes>]

As an example, the following command transforms all mp4 videos in the tmp/my_videos directory into a 256x256px dataset sampled at 10fps and saves it in the data/my_videos folder:

python -m dataset.acquisition.convert_video_directory --video_directory tmp/my_videos --output_directory data/my_videos --target_size 256 256 --fps 10

Using Pretrained Models

Pretrained models in .pth.tar format are available for all the datasets and can be downloaded at the following link: Google Drive

Please place each directory under the checkpoints folder. Training and inference scripts automatically make use of the latest.pth.tar checkpoint when present in the checkpoints subfolder corresponding to the configuration in use.
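For instance, for the Tennis configuration the expected layout would be along these lines (the subfolder name is an assumption derived from the configuration file name):

checkpoints/
    03_tennis/
        latest.pth.tar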

Playing

When a latest.pth.tar checkpoint is present under the checkpoints folder corresponding to the current configuration, the model can be used interactively to generate videos with the following commands:

  • BAIR: python play.py --config configs/01_bair.yaml

  • Breakout: python play.py --config configs/02_breakout.yaml

  • Tennis: python play.py --config configs/03_tennis.yaml

A full-screen window will appear, and actions can be provided using the number keys in the range [1, actions_count]. The number key 0 resets the generation process.

The inference process is lightweight and can even be executed in the browser, as in our Live Demo.

Training

The models can be trained with the following command:

python train.py --config configs/<config_file>

The training process generates multiple files under the results and checkpoints directories, in a subdirectory whose name corresponds to the one specified in the configuration file. In particular, the folder under the results directory contains an images folder showing qualitative results obtained during training. The checkpoints subfolder contains regularly saved checkpoints as well as the latest.pth.tar checkpoint representing the latest model parameters.

The training process can be fully monitored through Weights and Biases by running wandb init before executing the training command.

Training the model in full resolution on our datasets required the following GPU resources:

  • BAIR: 4x2080Ti 44GB
  • Breakout: 1x2080Ti 11GB
  • Tennis: 2x2080 16GB

Lower resolution versions of the model can be trained with a single 8GB GPU.

Evaluation

Evaluation requires two steps. First, an evaluation dataset must be built. Second, evaluation is carried out on the evaluation dataset. To build the evaluation dataset please issue:

python build_evaluation_dataset.py --config configs/<config_file>

The command creates a reconstruction of the test portion of the dataset under the results/<name>/evaluation_dataset directory, where <name> is the name specified in the configuration file. To run evaluation, issue:

python evaluate_dataset.py --config configs/evaluation/<config_file>

Evaluation results are saved under the evaluation_results directory, in the folder specified by the configuration file, as a file named data.yml.
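As an illustration, the saved metrics can be inspected with a short script along these lines (the 03_tennis folder name is an assumption; substitute the folder for your configuration, and note that the metric names inside data.yml depend on the evaluation code):

    import yaml  # requires pyyaml

    # Load the metrics produced by evaluate_dataset.py and print them.
    with open("evaluation_results/03_tennis/data.yml") as f:
        results = yaml.safe_load(f)
    for metric, value in sorted(results.items()):
        print(f"{metric}: {value}")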
