Generative Adversarial Text to Image Synthesis

Overview

Text To Image Synthesis

This is a TensorFlow implementation of text-to-image synthesis. The images are synthesized using the GAN-CLS algorithm from the paper Generative Adversarial Text-to-Image Synthesis. This implementation is built on top of the excellent DCGAN in TensorFlow.
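As a rough illustration of the GAN-CLS objective described in the paper, the sketch below computes the discriminator and generator losses from three sets of discriminator scores: real image with matching text, real image with mismatched text, and generated image with matching text. This is a NumPy sketch of the paper's formulation, not the code used in this repository; the function name gan_cls_losses and the eps parameter are hypothetical.

```python
import numpy as np

def gan_cls_losses(s_real, s_wrong, s_fake, eps=1e-8):
    """Return (d_loss, g_loss) to minimize, given scores in (0, 1).

    s_real:  D(real image, matching text)
    s_wrong: D(real image, mismatched text)
    s_fake:  D(generated image, matching text)
    """
    s_real = np.asarray(s_real, dtype=np.float64)
    s_wrong = np.asarray(s_wrong, dtype=np.float64)
    s_fake = np.asarray(s_fake, dtype=np.float64)

    # The paper maximizes log(s_r) + (log(1 - s_w) + log(1 - s_f)) / 2;
    # negate it so that it can be minimized.
    d_loss = -np.mean(
        np.log(s_real + eps)
        + 0.5 * (np.log(1.0 - s_wrong + eps) + np.log(1.0 - s_fake + eps))
    )
    # The generator maximizes log(s_f).
    g_loss = -np.mean(np.log(s_fake + eps))
    return d_loss, g_loss

if __name__ == "__main__":
    print(gan_cls_losses([0.9, 0.8], [0.2, 0.1], [0.3, 0.2]))
```

The mismatched-text term is what distinguishes GAN-CLS from a plain conditional GAN: the discriminator is penalized for accepting a realistic image paired with the wrong caption.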

Please star https://github.com/tensorlayer/tensorlayer

Model architecture

Image source: Generative Adversarial Text-to-Image Synthesis paper

Requirements

Datasets

  • The model is currently trained on the flowers dataset. Download the images from here and save them as 102flowers/102flowers/*.jpg. Also download the captions from this link, extract the archive, and copy the text_c10 folder so that the caption classes are at 102flowers/text_c10/class_*.

N.B. You can download all the required data files manually, or simply run downloads.py and move the files into the correct directories.

python downloads.py
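Before moving on to data loading, it may help to confirm that the files ended up where the instructions above expect them. The following is a minimal sketch; the helper name check_flower_layout is hypothetical and is not part of this repository. It only counts files under the two paths named above.

```python
from pathlib import Path

def check_flower_layout(root="."):
    """Count images and caption class folders under the expected layout."""
    root = Path(root)
    images_dir = root / "102flowers" / "102flowers"
    captions_dir = root / "102flowers" / "text_c10"

    # Images are expected at 102flowers/102flowers/*.jpg
    n_images = len(list(images_dir.glob("*.jpg"))) if images_dir.is_dir() else 0

    # Captions are expected in folders matching 102flowers/text_c10/class_*
    n_classes = 0
    if captions_dir.is_dir():
        n_classes = sum(1 for p in captions_dir.glob("class_*") if p.is_dir())

    return {"n_images": n_images, "n_caption_classes": n_classes}

if __name__ == "__main__":
    print(check_flower_layout())
```

If either count is zero, the corresponding files are probably missing or in a different directory than the one the loader reads from.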

Code

  • downloads.py: downloads the Oxford-102 flower dataset and caption files (run this first).
  • data_loader.py: loads the data for further processing.
  • train_txt2im.py: trains a text-to-image model.
  • utils.py: helper functions.
  • model.py: model definitions.

References

Results

  • the flower shown has yellow anther red pistil and bright red petals.
  • this flower has petals that are yellow, white and purple and has dark lines
  • the petals on this flower are white with a yellow center
  • this flower has a lot of small round pink petals.
  • this flower is orange in color, and has petals that are ruffled and rounded.
  • the flower has yellow petals and the center of it is brown
  • this flower has petals that are blue and white.
  • these white flowers have petals that start off white in color and end in a white towards the tips.

License

Apache 2.0

Comments
  • ValueError: Object arrays cannot be loaded when allow_pickle=False

    ValueError: Object arrays cannot be loaded when allow_pickle=False

    File "train_txt2im.py", line 458, in <module>
      main_train()
    File "train_txt2im.py", line 133, in main_train
      load_and_assign_npz(sess=sess, name=net_rnn_name, model=net_rnn)
    File "/home/siddanath/importantforprojects/text-to-image/utils.py", line 20, in load_and_assign_npz
      params = tl.files.load_npz(name=name)
    File "/home/siddanath/importantforprojects/text-to-image/tensorlayer/files.py", line 600, in load_npz
      return d['params']
    File "/home/siddanath/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 262, in __getitem__
      pickle_kwargs=self.pickle_kwargs)
    File "/home/siddanath/anaconda3/lib/python3.7/site-packages/numpy/lib/format.py", line 722, in read_array
      raise ValueError("Object arrays cannot be loaded when "
    ValueError: Object arrays cannot be loaded when allow_pickle=False

    opened by Siddanth-pai 2
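The traceback above ends in NumPy refusing to unpickle an object array, which happens when np.load is called with allow_pickle=False (the default in newer NumPy releases). One possible workaround, sketched below, is to load the checkpoint with allow_pickle=True. The helper name load_npz_params is hypothetical; the 'params' key is taken from the traceback. Only do this for checkpoint files you trust, because unpickling can execute arbitrary code.

```python
import numpy as np

def load_npz_params(path):
    """Load the 'params' entry of an .npz checkpoint that stores object arrays."""
    # allow_pickle=True is required for object arrays; use only on trusted files.
    with np.load(path, allow_pickle=True) as data:
        return data["params"]

if __name__ == "__main__":
    import os
    import tempfile

    # Build a small checkpoint holding arrays of different shapes.
    params = np.empty(2, dtype=object)
    params[0] = np.zeros(2)
    params[1] = np.zeros((3, 3))

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.npz")
        np.savez(path, params=params)
        loaded = load_npz_params(path)
        print([p.shape for p in loaded])
```

Whether the bundled tensorlayer/files.py should be patched this way, or NumPy pinned to an older release instead, is a choice left to the reader.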
  • Attempt to have a second RNNCell use the weights of a variable scope that already has weights

    Attempt to have a second RNNCell use the weights of a variable scope that already has weights

    I got a problem, how can I solve it?

    Attempt to have a second RNNCell use the weights of a variable scope that already has weights: 'rnnftxt/rnn/dynamic/rnn/basic_lstm_cell'; and the cell was not constructed as BasicLSTMCell(..., reuse=True). To share the weights of an RNNCell, simply reuse it in your second calculation, or create a new one with the argument reuse=True.

    opened by flsd201983 1
  • Next step after download.py

    Next step after download.py

    What is the next step after download.py? I tried python data_loader.py, but it fails with FileNotFoundError: [Errno 2] No such file or directory: '/home/ly/src/lib/text-to-image/102flowers/text_c10'

    opened by arisliang 0
  • ValueError: invalid literal for int() with base 10: 'e' - when making inference

    ValueError: invalid literal for int() with base 10: 'e' - when making inference

    code -

    sample_sentence = ["a"] * int(sample_size/ni) + ["e"] * int(sample_size/ni) + ["i"] * int(sample_size/ni) + ["o"] * int(sample_size/ni) + ["u"] * int(sample_size/ni)

    for i, sentence in enumerate(sample_sentence):
        print("seed: %s" % sentence)
        sentence = preprocess_caption(sentence)
        sample_sentence[i] = [vocab.word_to_id(word) for word in nltk.tokenize.word_tokenize(sentence)] + [vocab.end_id]  # add END_ID

    sample_sentence = tl.prepro.pad_sequences(sample_sentence, padding='post')
    
    img_gen, rnn_out = sess.run([net_g_res.outputs, net_rnn_res.outputs], feed_dict={
        t_real_caption: sample_sentence,
        t_z: sample_seed})
    
    save_images(img_gen, [ni, ni], 'samples/gen_samples/gen.png')
    
    opened by Akinleyejoshua 0
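The error message means that int() was called on the string 'e', so at some point a raw string reached code that expected integer word IDs. One way to narrow this down is to inspect the caption batch just before it is padded. The sketch below is a hypothetical debugging helper, find_non_integer_tokens, which is not part of this repository; it lists every entry that is not an integer.

```python
import numbers

def find_non_integer_tokens(batch):
    """Return (row, column, value) for every entry that is not an integer ID.

    A whole caption that is still a raw string is reported with column None.
    """
    problems = []
    for row, caption in enumerate(batch):
        if isinstance(caption, str):
            # The caption was never converted to a list of word IDs.
            problems.append((row, None, caption))
            continue
        for col, token in enumerate(caption):
            is_int = isinstance(token, numbers.Integral) and not isinstance(token, bool)
            if not is_int:
                problems.append((row, col, token))
    return problems

if __name__ == "__main__":
    # Second caption still contains a raw word; third was never tokenized.
    print(find_non_integer_tokens([[4, 7, 2], [4, "e", 2], "a"]))
```

An empty result suggests the batch itself is fine and the string is coming from somewhere else in the pipeline.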
  • Why are my test results on the flower dataset very different from result.png?

    Excuse me, why is the flower dataset I test the result is very different from result.png

    import tensorflow as tf
    import tensorlayer as tl
    from tensorlayer.layers import *
    from tensorlayer.prepro import *
    from tensorlayer.cost import *
    import numpy as np
    import scipy
    from scipy.io import loadmat
    import time, os, re, nltk

    from utils import *
    from model import *
    import model
    import pickle

    ###======================== PREPARE DATA ====================================###
    print("Loading data from pickle ...")
    with open("_vocab.pickle", 'rb') as f:
        vocab = pickle.load(f)
    with open("_image_train.pickle", 'rb') as f:
        _, images_train = pickle.load(f)
    with open("_image_test.pickle", 'rb') as f:
        _, images_test = pickle.load(f)
    with open("_n.pickle", 'rb') as f:
        n_captions_train, n_captions_test, n_captions_per_image, n_images_train, n_images_test = pickle.load(f)
    with open("_caption.pickle", 'rb') as f:
        captions_ids_train, captions_ids_test = pickle.load(f)

    # images_train_256 = np.array(images_train_256)
    # images_test_256 = np.array(images_test_256)
    images_train = np.array(images_train)
    images_test = np.array(images_test)

    # NOTE: batch_size, image_size, and z_dim are not defined in this post;
    # they are assumed to be set elsewhere.
    ni = int(np.ceil(np.sqrt(batch_size)))
    save_dir = "checkpoint"

    # Placeholders for real images, caption IDs, and the noise vector
    t_real_image = tf.placeholder('float32', [batch_size, image_size, image_size, 3], name='real_image')
    t_real_caption = tf.placeholder(dtype=tf.int64, shape=[batch_size, None], name='real_caption_input')
    t_z = tf.placeholder(tf.float32, [batch_size, z_dim], name='z_noise')
    generator_txt2img = model.generator_txt2img_resnet

    # Build the text encoder and the generator in inference mode
    net_rnn = rnn_embed(t_real_caption, is_train=False, reuse=False)
    net_g, _ = generator_txt2img(t_z, net_rnn.outputs, is_train=False, reuse=False, batch_size=batch_size)

    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
    tl.layers.initialize_global_variables(sess)

    # Restore the trained weights
    net_rnn_name = os.path.join(save_dir, 'net_rnn.npz400.npz')
    net_cnn_name = os.path.join(save_dir, 'net_cnn.npz400.npz')
    net_g_name = os.path.join(save_dir, 'net_g.npz400.npz')
    net_d_name = os.path.join(save_dir, 'net_d.npz400.npz')

    net_rnn_res = tl.files.load_and_assign_npz(sess=sess, name=net_rnn_name, network=net_rnn)
    net_g_res = tl.files.load_and_assign_npz(sess=sess, name=net_g_name, network=net_g)

    sample_size = batch_size
    sample_seed = np.random.normal(loc=0.0, scale=1.0, size=(sample_size, z_dim)).astype(np.float32)

    # Repeat each of the eight captions n times
    n = int(sample_size / ni)
    sample_sentence = (
        ["the flower shown has yellow anther red pistil and bright red petals."] * n +
        ["this flower has petals that are yellow, white and purple and has dark lines"] * n +
        ["the petals on this flower are white with a yellow center"] * n +
        ["this flower has a lot of small round pink petals."] * n +
        ["this flower is orange in color, and has petals that are ruffled and rounded."] * n +
        ["the flower has yellow petals and the center of it is brown."] * n +
        ["this flower has petals that are blue and white."] * n +
        ["these white flowers have petals that start off white in color and end in a white towards the tips."] * n
    )

    # Convert each caption to a list of word IDs
    for i, sentence in enumerate(sample_sentence):
        print("seed: %s" % sentence)
        sentence = preprocess_caption(sentence)
        sample_sentence[i] = [vocab.word_to_id(word) for word in nltk.tokenize.word_tokenize(sentence)] + [vocab.end_id]  # add END_ID

    sample_sentence = tl.prepro.pad_sequences(sample_sentence, padding='post')

    img_gen, rnn_out = sess.run([net_g_res.outputs, net_rnn_res.outputs], feed_dict={
        t_real_caption: sample_sentence,
        t_z: sample_seed})

    save_images(img_gen, [ni, ni], 'samples/gen_samples/gen.png')

    opened by keqkeq 0
  • Tensorflow 2.1, Tensorlayer 2.2 update

    Tensorflow 2.1, Tensorlayer 2.2 update

    Hello,

    Are there any plans in the near future to update this repository to the latest TensorFlow and TensorLayer versions? I've been trying to make the code run with backwards compatibility (compat.tf1. ...), but I keep running into errors that are more than I can handle.

    FYI: I've successfully run the DCGAN TensorLayer implementation with TensorLayer 2.2 and a self-built TensorFlow 2.1 (built from source with compute capability 3.0) on Python 3.7.

    So, an update would be greatly appreciated!

    opened by SadRebel1000 0
Releases(0.2)
Owner
Hao
Assistant Professor @ Peking University