A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.

Overview

Adversarial Video Generation

This project implements a generative adversarial network to predict future frames of video, as detailed in "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun. Their official code (using Torch) can be found here.

Adversarial generation uses two networks – a generator and a discriminator – to improve the sharpness of generated images. Given the past four frames of video, the generator learns to generate accurate predictions for the next frame. Given either a generated or a real-world image, the discriminator learns to correctly classify between generated and real. The two networks "compete," with the generator attempting to fool the discriminator into classifying its output as real. This forces the generator to create frames that are very similar to what real frames in the domain might look like.

Results and Comparison

I trained and tested my network on a dataset of frame sequences from Ms. Pac-Man. To compare adversarial training vs. non-adversarial, I trained an adversarial network for 500,000 steps on both the generator and discriminator, and I trained a non-adversarial network for 1,000,000 steps (as the non-adversarial network runs about twice as fast). Training took around 24 hours for each network, using a GTX 980TI GPU.

In the following examples, I ran the networks recursively for 64 frames. (i.e. The input to generate the first frame was [input1, input2, input3, input4], the input to generate the second frame was [input2, input3, input4, generated1], etc.). As the networks are not fed actions from the original game, they cannot predict much of the true motion (such as in which direction Ms. Pac-Man will turn). Thus, the goal is not to line up perfectly with the ground truth images, but to maintain a crisp and likely representation of the world.

The following example exhibits how quickly the non-adversarial network becomes fuzzy and loses definition of the sprites. The adversarial network exhibits this behavior to an extent, but is much better at maintaining sharp representations of at least some sprites throughout the sequence:

This example shows how the adversarial network is able to keep a sharp representation of Ms. Pac-Man around multiple turns, while the non-adversarial network fails to do so:

While the adversarial network is clearly superior in terms of sharpness and consistency over time, the non-adversarial network does generate some fun/spectacular failures:

Using the error measurements outlined in the paper (Peak Signal to Noise Ratio and Sharp Difference) did not show significant difference between adversarial and non-adversarial training. I believe this is because sequential frames from the Ms. Pac-Man dataset have no motion in the majority of pixels, while the original paper was trained on real-world video where there is motion in much of the frame. Despite this, it is clear that adversarial training produces a qualitative improvement in the sharpness of the generated frames, especially over long time spans. You can view the loss and error statistics by running tensorboard --logdir=./Results/Summaries/ from the root of this project.

Usage

  1. Clone or download this repository.
  2. Prepare your data:
  • If you want to replicate my results, you can download the Ms. Pac-Man dataset here. Put this in a directory named Data/ in the root of this project for default behavior. Otherwise, you will need to specify your data location using the options outlined in parts 3 and 4.
  • If you would like to train on your own videos, preprocess them so that they are directories of frame sequences as structured below. (Neither the names nor the image extensions matter, only the structure):
  - Test
    - Video 1
      - frame1.png
      - frame2.png
      - frame ...
      - frameN.png
    - Video ...
    - Video N
      - ...
  - Train
    - Video 1
      - frame ...
    - Video ...
    - Video N
      - frame ...
  1. Process training data:
  • The network trains on random 32x32 pixel crops of the input images, filtered to make sure that most clips have some movement in them. To process your input data into this form, run the script python process_data from the Code/ directory with the following options:
-n/--num_clips= <# clips to process for training> (Default = 5000000)
-t/--train_dir= <Directory of full training frames>
-c/--clips_dir= <Save directory for processed clips>
                (I suggest making this a hidden dir so the filesystem doesn't freeze
                 with so many files. DON'T `ls` THIS DIRECTORY!)
-o/--overwrite  (Overwrites the previous data in clips_dir)
-H/--help       (prints usage)
  • This can take a few hours to complete, depending on the number of clips you want.
  1. Train/Test:
  • If you want to plug-and-play with the Ms. Pac-Man dataset, you can download my trained models here. Load them using the -l option. (e.g. python avg_runner.py -l ./Models/Adversarial/model.ckpt-500000).
  • Train and test your network by running python avg_runner.py from the Code/ directory with the following options:
-l/--load_path=    <Relative/path/to/saved/model>
-t/--test_dir=     <Directory of test images>
-r--recursions=    <# recursive predictions to make on test>
-a/--adversarial=  <{t/f}> (Whether to use adversarial training. Default=True)
-n/--name=         <Subdirectory of ../Data/Save/*/ in which to save output of this run>
-O/--overwrite     (Overwrites all previous data for the model with this save name)
-T/--test_only     (Only runs a test step -- no training)
-H/--help          (Prints usage)
--stats_freq=      <How often to print loss/train error stats, in # steps>
--summary_freq=    <How often to save loss/error summaries, in # steps>
--img_save_freq=   <How often to save generated images, in # steps>
--test_freq=       <How often to test the model on test data, in # steps>
--model_save_freq= <How often to save the model, in # steps>

FAQs

Why don't you train on patches larger then 32x32? Why not train on the whole image?

Memory usage. Since the discriminator has fully-connected layers after the convolutions, the output of the last convolution must be flattened to connect to the first fully-connected layer. The size of this output is dependent on the input image size, and blows up really quickly (e.g. For an input size of 64x64, going from 128 feature maps to a fully connected layer with 512 nodes, you need a connection with 64 * 64 * 128 * 512 = 268,435,456 weights). Because of this, training on patches larger than 32x32 causes an out-of-memory error (at least on my machine).

Luckily, you only need the discriminator for training, and the generator network is fully convolutional, so you can test the weights you trained on 32x32 images over images of any size (which is why I'm able to do generations for the entire Ms. Pac-Man board).

Owner
Matt Cooper
I'm an entrepreneur who loves music, coding and great design. Passionate about advancing AI and creating products that bring those advancements to the world.
Matt Cooper
Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende

Machine Learning and Computational Biology Lab 16 Oct 16, 2022
A deep learning tabular classification architecture inspired by TabTransformer with integrated gated multilayer perceptron.

The GatedTabTransformer. A deep learning tabular classification architecture inspired by TabTransformer with integrated gated multilayer perceptron. C

Radi Cho 60 Dec 15, 2022
Neighborhood Contrastive Learning for Novel Class Discovery

Neighborhood Contrastive Learning for Novel Class Discovery This repository contains the official implementation of our paper: Neighborhood Contrastiv

Zhun Zhong 56 Dec 09, 2022
Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021) Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma. We address the pr

Kranti Kumar Parida 33 Jun 27, 2022
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.

Gengshan Yang 59 Nov 25, 2022
Implementations of CNNs, RNNs, GANs, etc

Tensorflow Programs and Tutorials This repository will contain Tensorflow tutorials on a lot of the most popular deep learning concepts. It'll also co

Adit Deshpande 1k Dec 30, 2022
Official PyTorch implementation of PICCOLO: Point-Cloud Centric Omnidirectional Localization (ICCV 2021)

Official PyTorch implementation of PICCOLO: Point-Cloud Centric Omnidirectional Localization (ICCV 2021)

16 Nov 19, 2022
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-quality pre-trained models from torchvision, MMLabs, and soon Pytorch Image Models. It or

airctic 789 Dec 29, 2022
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.

ProSelfLC: CVPR 2021 ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks For any specific discussion or potential fu

amos_xwang 57 Dec 04, 2022
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc

11.4k Jan 09, 2023
Visualize Camera's Pose Using Extrinsic Parameter by Plotting Pyramid Model on 3D Space

extrinsic2pyramid Visualize Camera's Pose Using Extrinsic Parameter by Plotting Pyramid Model on 3D Space Intro A very simple and straightforward modu

JEONG HYEONJIN 106 Dec 28, 2022
A python implementation of Physics-informed Spline Learning for nonlinear dynamics discovery

PiSL A python implementation of Physics-informed Spline Learning for nonlinear dynamics discovery. Sun, F., Liu, Y. and Sun, H., 2021. Physics-informe

Fangzheng (Andy) Sun 8 Jul 13, 2022
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
Official implementation of Rethinking Graph Neural Architecture Search from Message-passing (CVPR2021)

Rethinking Graph Neural Architecture Search from Message-passing Intro The GNAS can automatically learn better architecture with the optimal depth of

Shaofei Cai 48 Sep 30, 2022
Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection

LMFD-PAD Note This is the official repository of the paper: LMFD-PAD: Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechani

28 Dec 02, 2022
This is an official implementation for "PlaneRecNet".

PlaneRecNet This is an official implementation for PlaneRecNet: A multi-task convolutional neural network provides instance segmentation for piece-wis

yaxu 50 Nov 17, 2022
This repo contains the code required to train the multivariate time-series Transformer.

Multi-Variate Time-Series Transformer This repo contains the code required to train the multivariate time-series Transformer. Download the data The No

Gregory Duthé 4 Nov 24, 2022
Library for converting from RGB / GrayScale image to base64 and back.

Library for converting RGB / Grayscale numpy images from to base64 and back. Installation pip install -U image_to_base_64 Conversion RGB to base 64 b

Vladimir Iglovikov 16 Aug 28, 2022
Gesture-controlled Video Game. Just swing your finger and play the game without touching your PC

Gesture Controlled Video Game Detailed Blog : https://www.analyticsvidhya.com/blog/2021/06/gesture-controlled-video-game/ Introduction This project is

Devbrat Anuragi 35 Jan 06, 2023
A simple API wrapper for Discord interactions.

Your ultimate Discord interactions library for discord.py. About | Installation | Examples | Discord | PyPI About What is discord-py-interactions? dis

james 641 Jan 03, 2023