Interactive Image Generation via Generative Adversarial Networks

Overview

iGAN: Interactive Image Generation via Generative Adversarial Networks

Project | Youtube | Paper

Recent projects:
[pix2pix]: Torch implementation for learning a mapping from input images to output images.
[CycleGAN]: Torch implementation for learning an image-to-image translation (i.e., pix2pix) without input-output pairs.
[pytorch-CycleGAN-and-pix2pix]: PyTorch implementation for both unpaired and paired image-to-image translation.

Overview

iGAN (aka. interactive GAN) is the author's implementation of interactive image generation interface described in:
"Generative Visual Manipulation on the Natural Image Manifold"
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
In European Conference on Computer Vision (ECCV) 2016

Given a few user strokes, our system could produce photo-realistic samples that best satisfy the user edits in real-time. Our system is based on deep generative models such as Generative Adversarial Networks (GAN) and DCGAN. The system serves the following two purposes:

  • An intelligent drawing interface for automatically generating images inspired by the color and shape of the brush strokes.
  • An interactive visual debugging tool for understanding and visualizing deep generative models. By interacting with the generative model, a developer can understand what visual content the model can produce, as well as the limitation of the model.

Please cite our paper if you find this code useful in your research. (Contact: Jun-Yan Zhu, junyanz at mit dot edu)

Getting started

  • Install the python libraries. (See Requirements).
  • Download the code from GitHub:
git clone https://github.com/junyanz/iGAN
cd iGAN
  • Download the model. (See Model Zoo for details):
bash ./models/scripts/download_dcgan_model.sh outdoor_64
  • Run the python script:
THEANO_FLAGS='device=gpu0, floatX=float32, nvcc.fastmath=True' python iGAN_main.py --model_name outdoor_64

Requirements

The code is written in Python2 and requires the following 3rd party libraries:

sudo apt-get install python-opencv
sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
  • PyQt4: more details on Qt installation can be found here
sudo apt-get install python-qt4
sudo pip install qdarkstyle
sudo pip install dominate
  • GPU + CUDA + cuDNN: The code is tested on GTX Titan X + CUDA 7.5 + cuDNN 5. Here are the tutorials on how to install CUDA and cuDNN. A decent GPU is required to run the system in real-time. [Warning] If you run the program on a GPU server, you need to use remote desktop software (e.g., VNC), which may introduce display artifacts and latency problem.

Python3

For Python3 users, you need to replace pip with pip3:

  • PyQt4 with Python3:
sudo apt-get install python3-pyqt4
  • OpenCV3 with Python3: see the installation instruction.

Interface:

See [Youtube] at 2:18s for the interactive image generation demos.

Layout

  • Drawing Pad: This is the main window of our interface. A user can apply different edits via our brush tools, and the system will display the generated image. Check/Uncheck Edits button to display/hide user edits.
  • Candidate Results: a display showing thumbnails of all the candidate results (e.g., different modes) that fits the user edits. A user can click a mode (highlighted by a green rectangle), and the drawing pad will show this result.
  • Brush Tools: Coloring Brush for changing the color of a specific region; Sketching brush for outlining the shape. Warping brush for modifying the shape more explicitly.
  • Slider Bar: drag the slider bar to explore the interpolation sequence between the initial result (i.e., randomly generated image) and the current result (e.g., image that satisfies the user edits).
  • Control Panel: Play: play the interpolation sequence; Fix: use the current result as additional constraints for further editing Restart: restart the system; Save: save the result to a webpage. Edits: Check the box if you would like to show the edits on top of the generated image.

User interaction

  • Coloring Brush: right-click to select a color; hold left click to paint; scroll the mouse wheel to adjust the width of the brush.
  • Sketching Brush: hold left-click to sketch the shape.
  • Warping Brush: We recommend you first use coloring and sketching before the warping brush. Right-click to select a square region; hold left click to drag the region; scroll the mouse wheel to adjust the size of the square region.
  • Shortcuts: P for Play, F for Fix, R for Restart; S for Save; E for Edits; Q for quitting the program.
  • Tooltips: when you move the cursor over a button, the system will display the tooltip of the button.

Model Zoo:

Download the Theano DCGAN model (e.g., outdoor_64). Before using our system, please check out the random real images vs. DCGAN generated samples to see which kind of images that a model can produce.

bash ./models/scripts/download_dcgan_model.sh outdoor_64

We provide a simple script to generate samples from a pre-trained DCGAN model. You can run this script to test if Theano, CUDA, cuDNN are configured properly before running our interface.

THEANO_FLAGS='device=gpu0, floatX=float32, nvcc.fastmath=True' python generate_samples.py --model_name outdoor_64 --output_image outdoor_64_dcgan.png

Command line arguments:

Type python iGAN_main.py --help for a complete list of the arguments. Here we discuss some important arguments:

  • --model_name: the name of the model (e.g., outdoor_64, shoes_64, etc.)
  • --model_type: currently only supports dcgan_theano.
  • --model_file: the file that stores the generative model; If not specified, model_file='./models/%s.%s' % (model_name, model_type)
  • --top_k: the number of the candidate results being displayed
  • --average: show an average image in the main window. Inspired by AverageExplorer, average image is a weighted average of multiple generated results, with the weights reflecting user-indicated importance. You can switch between average mode and normal mode by press A.
  • --shadow: We build a sketching assistance system for guiding the freeform drawing of objects inspired by ShadowDraw To use the interface, download the model hed_shoes_64 and run the following script
THEANO_FLAGS='device=gpu0, floatX=float32, nvcc.fastmath=True' python iGAN_main.py --model_name hed_shoes_64 --shadow --average

Dataset and Training

See more details here

Projecting an Image onto Latent Space

We provide a script to project an image into latent space (i.e., x->z):

  • Download the pre-trained AlexNet model (conv4):
bash models/scripts/download_alexnet.sh conv4
  • Run the following script with a model and an input image. (e.g., model: shoes_64.dcgan_theano, and input image ./pics/shoes_test.png)
THEANO_FLAGS='device=gpu0, floatX=float32, nvcc.fastmath=True' python iGAN_predict.py --model_name shoes_64 --input_image ./pics/shoes_test.png --solver cnn_opt
  • Check the result saved in ./pics/shoes_test_cnn_opt.png
  • We provide three methods: opt for optimization method; cnn for feed-forward network method (fastest); cnn_opt hybrid of the previous methods (default and best). Type python iGAN_predict.py --help for a complete list of the arguments.

Script without UI

We also provide a standalone script that should work without UI. Given user constraints (i.e., a color map, a color mask, and an edge map), the script generates multiple images that mostly satisfy the user constraints. See python iGAN_script.py --help for more details.

THEANO_FLAGS='device=gpu0, floatX=float32, nvcc.fastmath=True' python iGAN_script.py --model_name outdoor_64

Citation

@inproceedings{zhu2016generative,
  title={Generative Visual Manipulation on the Natural Image Manifold},
  author={Zhu, Jun-Yan and Kr{\"a}henb{\"u}hl, Philipp and Shechtman, Eli and Efros, Alexei A.},
  booktitle={Proceedings of European Conference on Computer Vision (ECCV)},
  year={2016}
}

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and learning papers, please check out our Cat Paper Collection:
[Github] [Webpage]

Acknowledgement

  • We modified the DCGAN code in our package. Please cite the original DCGAN paper if you use their models.
  • This work was supported, in part, by funding from Adobe, eBay, and Intel, as well as a hardware grant from NVIDIA. J.-Y. Zhu is supported by Facebook Graduate Fellowship.
Owner
Jun-Yan Zhu
Understanding and creating pixels.
Jun-Yan Zhu
A Simple Long-Tailed Rocognition Baseline via Vision-Language Model

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

Teli Ma 4 Jan 20, 2022
This solves the autonomous driving issue which is supported by deep learning technology. Given a video, it splits into images and predicts the angle of turning for each frame.

Self Driving Car An autonomous car (also known as a driverless car, self-driving car, and robotic car) is a vehicle that is capable of sensing its env

Sagor Saha 4 Sep 04, 2021
Notepy is a full-featured Notepad Python app

Notepy A full featured python text-editor Notable features Autocompletion for parenthesis and quote Auto identation Syntax highlighting Compile and ru

Mirko Rovere 11 Sep 28, 2022
Occlusion robust 3D face reconstruction model in CFR-GAN (WACV 2022)

Occlusion Robust 3D face Reconstruction Yeong-Joon Ju, Gun-Hee Lee, Jung-Ho Hong, and Seong-Whan Lee Code for Occlusion Robust 3D Face Reconstruction

Yeongjoon 31 Dec 19, 2022
(JMLR' 19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)

Python Outlier Detection (PyOD) Deployment & Documentation & Stats & License PyOD is a comprehensive and scalable Python toolkit for detecting outlyin

Yue Zhao 6.6k Jan 05, 2023
Prediction of MBA refinance Index (Mortgage prepayment)

Prediction of MBA refinance Index (Mortgage prepayment) Deep Neural Network based Model The ability to predict mortgage prepayment is of critical use

Ruchil Barya 1 Jan 16, 2022
DIT is a DTLS MitM proxy implemented in Python 3. It can intercept, manipulate and suppress datagrams between two DTLS endpoints and supports psk-based and certificate-based authentication schemes (RSA + ECC).

DIT - DTLS Interception Tool DIT is a MitM proxy tool to intercept DTLS traffic. It can intercept, manipulate and/or suppress DTLS datagrams between t

52 Nov 30, 2022
Snscrape-jsonl-urls-extractor - Extracts urls from jsonl produced by snscrape

snscrape-jsonl-urls-extractor extracts urls from jsonl produced by snscrape Usag

1 Feb 26, 2022
A C implementation for creating 2D voronoi diagrams

Branch OSX/Linux Windows master dev jc_voronoi A fast C/C++ header only implementation for creating 2D Voronoi diagrams from a point set Uses Fortune'

Mathias Westerdahl 481 Dec 29, 2022
The source code for CATSETMAT: Cross Attention for Set Matching in Bipartite Hypergraphs

catsetmat The source code for CATSETMAT: Cross Attention for Set Matching in Bipartite Hypergraphs To be able to run it, add catsetmat to PYTHONPATH H

2 Dec 19, 2022
Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images [ICCV 2021] © Mahmood Lab - This code is made avail

Mahmood Lab @ Harvard/BWH 63 Dec 01, 2022
LibFewShot: A Comprehensive Library for Few-shot Learning.

LibFewShot Make few-shot learning easy. Supported Methods Meta MAML(ICML'17) ANIL(ICLR'20) R2D2(ICLR'19) Versa(NeurIPS'18) LEO(ICLR'19) MTL(CVPR'19) M

<a href=[email protected]&L"> 603 Jan 05, 2023
SMD-Nets: Stereo Mixture Density Networks

SMD-Nets: Stereo Mixture Density Networks This repository contains a Pytorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021)

Fabio Tosi 115 Dec 26, 2022
TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform

TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform

2.6k Jan 04, 2023
some classic model used to segment the medical images like CT、X-ray and so on

github_project This is a project for medical image segmentation. This project includes common medical image segmentation models such as U-net, FCN, De

2 Mar 30, 2022
Official TensorFlow code for the forthcoming paper

~ Efficient-CapsNet ~ Are you tired of over inflated and overused convolutional neural networks? You're right! It's time for CAPSULES :)

Vittorio Mazzia 203 Jan 08, 2023
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Transparency-by-Design networks (TbD-nets) This repository contains code for replicating the experiments and visualizations from the paper Transparenc

David Mascharka 351 Nov 18, 2022
End-to-end image segmentation kit based on PaddlePaddle.

English | 简体中文 PaddleSeg PaddleSeg has released the new version including the following features: Our team won the 6.2k Jan 02, 2023

Python package for covariance matrices manipulation and Biosignal classification with application in Brain Computer interface

pyRiemann pyRiemann is a python package for covariance matrices manipulation and classification through Riemannian geometry. The primary target is cla

447 Jan 05, 2023
🌾 PASTIS 🌾 Panoptic Agricultural Satellite TIme Series

🌾 PASTIS 🌾 Panoptic Agricultural Satellite TIme Series (optical and radar) The PASTIS Dataset Dataset presentation PASTIS is a benchmark dataset for

86 Jan 04, 2023