Text to image synthesis using thought vectors

Overview


Join the chat at https://gitter.im/text-to-image/Lobby

This is an experimental TensorFlow implementation of synthesizing images from captions using Skip Thought Vectors. The images are synthesized using the GAN-CLS algorithm from the paper Generative Adversarial Text-to-Image Synthesis. This implementation is built on top of the excellent DCGAN in Tensorflow. The following is the model architecture; the blue bars represent the Skip Thought Vectors for the captions.

Model architecture

Image Source : Generative Adversarial Text-to-Image Synthesis Paper
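
To make the conditioning step concrete, here is a minimal NumPy sketch of how the generator input is assembled in GAN-CLS: the skip thought vector for a caption is projected down to t_dim dimensions, passed through a leaky ReLU, and concatenated with a z_dim noise vector. This is only an illustration using the default sizes listed under Options below, not the repository's actual TensorFlow code.

import numpy as np

# Default sizes from the training options below (illustrative only).
batch_size, caption_vector_length, t_dim, z_dim = 64, 1024, 256, 100

# Skip thought vectors for a batch of captions (random placeholders here).
caption_vectors = np.random.randn(batch_size, caption_vector_length)

# Learned projection that compresses the caption embedding to t_dim,
# followed by a leaky ReLU (weights are random stand-ins for trained ones).
W = 0.02 * np.random.randn(caption_vector_length, t_dim)
b = np.zeros(t_dim)
reduced_text = caption_vectors @ W + b
reduced_text = np.where(reduced_text > 0, reduced_text, 0.2 * reduced_text)

# Per-image noise vector, concatenated with the compressed text embedding;
# the deconvolutional generator maps this combined vector to a 64 x 64 image.
z = np.random.uniform(-1.0, 1.0, size=(batch_size, z_dim))
generator_input = np.concatenate([z, reduced_text], axis=1)
print(generator_input.shape)  # (64, 356)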

Requirements

Datasets

  • All the steps below for downloading the datasets and models can be performed automatically by running python download_datasets.py. Several gigabytes of files will be downloaded and extracted.
  • The model is currently trained on the flowers dataset. Download the images from this link and save them in Data/flowers/jpg. Also download the captions from this link. Extract the archive, copy the text_c10 folder and paste it in Data/flowers.
  • Download the pretrained models and vocabulary for skip thought vectors as per the instructions given here. Save the downloaded files in Data/skipthoughts.
  • Create the empty directories Data/samples, Data/val_samples and Data/Models inside Data. They are used for saving the sampled generated images and the trained models (a helper snippet for this step is shown below).
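
If you prefer to script that last step, the directories can be created with a few lines of Python (a convenience snippet, not part of the repository):

import os

# Directories used for sampled images and saved models, as described above.
for d in ["Data/samples", "Data/val_samples", "Data/Models"]:
    os.makedirs(d, exist_ok=True)  # does nothing if the directory already exists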

Usage

  • Data Processing: Extract the skip thought vectors for the flowers dataset using:
python data_loader.py --data_set="flowers"
  • Training

    • Basic usage: python train.py --data_set="flowers"
    • Options
      • z_dim: Noise Dimension. Default is 100.
      • t_dim: Text feature dimension. Default is 256.
      • batch_size: Batch Size. Default is 64.
      • image_size: Image dimension. Default is 64.
      • gf_dim: Number of convolution filters in the first layer of the generator. Default is 64.
      • df_dim: Number of convolution filters in the first layer of the discriminator. Default is 64.
      • gfc_dim: Dimension of the generator units for the fully connected layer. Default is 1024.
      • caption_vector_length: Length of the caption vector. Default is 1024.
      • data_dir: Data Directory. Default is Data/.
      • learning_rate: Learning Rate. Default is 0.0002.
      • beta1: Momentum term for the Adam optimizer. Default is 0.5.
      • epochs: Max number of epochs. Default is 600.
      • resume_model: Resume training from a pretrained model path.
      • data_set: Data Set to train on. Default is flowers.
  • Generating Images from Captions

    • Write the captions in a text file and save it as Data/sample_captions.txt. Generate the skip thought vectors for these captions using:
    python generate_thought_vectors.py --caption_file="Data/sample_captions.txt"
    
    • Generate the Images for the thought vectors using:
    python generate_images.py --model_path=<path to the trained model> --n_images=8
    

    n_images specifies the number of images to be generated per caption. The generated images are saved in Data/val_samples/. Run python generate_images.py --help for more options. An end-to-end sketch of this flow is given below.
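
Putting the two commands together, the caption-to-image flow can be scripted roughly as follows. This is a convenience sketch: the checkpoint path is a placeholder to be replaced with your trained model, and one caption per line is assumed in the captions file.

import subprocess

captions = [
    "the flower shown has yellow anther red pistil and bright red petals",
    "this flower has petals that are yellow, white and purple and has dark lines",
]

# 1. Write the captions to the expected file, one caption per line (assumed format).
with open("Data/sample_captions.txt", "w") as f:
    f.write("\n".join(captions))

# 2. Encode the captions as skip thought vectors.
subprocess.check_call(["python", "generate_thought_vectors.py",
                       "--caption_file=Data/sample_captions.txt"])

# 3. Generate n_images samples per caption; outputs land in Data/val_samples/.
model_path = "Data/Models/trained_model.ckpt"  # placeholder: point this at your trained model
subprocess.check_call(["python", "generate_images.py",
                       "--model_path=" + model_path,
                       "--n_images=8"])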

Sample Images Generated

The following captions were used to generate sample images with the trained model.

  • the flower shown has yellow anther red pistil and bright red petals
  • this flower has petals that are yellow, white and purple and has dark lines
  • the petals on this flower are white with a yellow center
  • this flower has a lot of small round pink petals.
  • this flower is orange in color, and has petals that are ruffled and rounded.
  • the flower has yellow petals and the center of it is brown

Implementation Details

  • Only the uni-skip vectors from the skip thought vectors are used. I have not tried training the model with combine-skip vectors.
  • The model was trained for around 200 epochs on a GPU. This took roughly 2-3 days.
  • The images generated are 64 x 64 in dimension.
  • While processing the batches before training, the images are flipped horizontally with a probability of 0.5.
  • The train-val split is 0.75. (Both of these preprocessing steps are sketched below.)
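
For reference, the last two points amount to something like the following (an illustrative sketch, not the repository's exact preprocessing code):

import random
import numpy as np

def flip_horizontally_at_random(images, p=0.5):
    # Flip each (H, W, C) image left-right with probability p.
    return [img[:, ::-1] if random.random() < p else img for img in images]

def train_val_split(items, train_fraction=0.75):
    # Keep the first 75% for training and the rest for validation.
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

dummy_images = [np.zeros((64, 64, 3)) for _ in range(8)]
train_set, val_set = train_val_split(flip_horizontally_at_random(dummy_images))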

Pre-trained Models

  • Download the pretrained model from here and save it in Data/Models. Pass this path as --model_path when generating images.

TODO

  • Train the model on the MS-COCO data set, and generate more generic images.
  • Try different embedding options for captions (other than skip thought vectors). Also try training the caption embedding RNN along with the GAN-CLS model.

References

  • Generative Adversarial Text-to-Image Synthesis, Reed et al., 2016.
  • Skip-Thought Vectors, Kiros et al., 2015.

Alternate Implementations

License

MIT
