Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Last update: Sep 11, 2022

Overview

VQGAN-CLIP-Docker

About

This is a stripped-down repository with minimal dependencies for running VQGAN+CLIP locally or in production.

For a Google Colab notebook see the original repository.

Samples

Setup

Clone this repository and cd into it.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset	Link
ImageNet (f=16), 16384	vqgan_imagenet_f16_16384

For GPU support, make sure CUDA is installed on your system (tested with CUDA 11.1+).

6 GB of VRAM is required to generate 256x256 images.
11 GB of VRAM is required to generate 512x512 images.
24 GB of VRAM is required to generate 1024x1024 images. (Untested)
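
As a rough guide, the figures above can be turned into a small lookup. The sketch below is illustrative only and is not part of this repository: the helper name largest_supported_size is hypothetical, and the thresholds are simply the ones listed above (the 1024x1024 figure is untested).

```python
# Hypothetical helper, not part of this repository.
# Thresholds are copied from the VRAM requirements listed above.
VRAM_REQUIREMENTS_GB = [
    (24, 1024),  # listed as untested
    (11, 512),
    (6, 256),
]


def largest_supported_size(vram_gb):
    """Return the largest square image side the listed figures allow, or None."""
    for required_gb, side in VRAM_REQUIREMENTS_GB:
        if vram_gb >= required_gb:
            return side
    return None


for vram in (4, 8, 12, 24):
    print(vram, "GB ->", largest_supported_size(vram))
```

Actual memory use also depends on other settings such as the number of cuts, so treat the result as a starting point rather than a guarantee.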

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To check whether you can run this on your GPU, run the following command. It must print True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration files are provided: ./configs/local.json and ./configs/docker.json. They are ready to use, but you may want to edit them to suit your needs. See the Configuration section for a description of each field.

The resulting generations can be found in the ./outputs folder.
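
If you prefer not to edit the provided files by hand, a derived config can be written from Python. This is a minimal sketch and not part of the repository. It assumes the config file is a flat JSON object whose keys match the Configuration table; the helper name write_custom_config and the output file name are hypothetical.

```python
import json
from pathlib import Path


def write_custom_config(base_path, out_path, **overrides):
    """Copy a base config, override existing keys, and save the result."""
    config = json.loads(Path(base_path).read_text())
    unknown = set(overrides) - set(config)
    if unknown:
        # Refuse keys the base config does not define, to catch typos early.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=2))
    return config


# Example use from the repository root (my_run.json is a hypothetical name):
# write_custom_config(
#     "./configs/local.json",
#     "./configs/my_run.json",
#     prompts=["a lighthouse at dusk"],
#     max_iterations=300,
# )
```

The new file can then be passed to the generate script with the -c option in place of ./configs/local.json.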

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate-cpu
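
The local commands above can also be launched from another Python program. The sketch below is one possible wrapper, not something shipped with the repository: the function names are hypothetical, and it uses the current interpreter (sys.executable) rather than python3. It must be run from the repository root so that the scripts.generate module can be found.

```python
import os
import subprocess
import sys


def build_generate_command(config_path, use_cpu=False):
    """Return (argv, env) for the local generation command shown above."""
    argv = [sys.executable, "-m", "scripts.generate", "-c", config_path]
    env = dict(os.environ)
    if use_cpu:
        env["DEVICE"] = "cpu"  # same effect as the DEVICE=cpu prefix
    return argv, env


def run_generate(config_path, use_cpu=False):
    """Run generation and raise CalledProcessError if it fails."""
    argv, env = build_generate_command(config_path, use_cpu)
    subprocess.run(argv, env=env, check=True)


# Example: run_generate("./configs/local.json", use_cpu=True)
```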

Configuration

Argument	Type	Description
`prompts`	List[str]	Text prompts
`image_prompts`	List[FilePath]	Image prompts / target image path
`max_iterations`	int	Number of iterations
`save_freq`	int	Save an image every `save_freq` iterations
`size`	[int, int]	Image size (width, height)
`init_image`	FilePath	Initial image
`init_noise`	str	Initial noise image ['gradient','pixels']
`init_weight`	float	Initial weight
`output_dir`	FilePath	Path to output directory
`models_dir`	FilePath	Path to models cache directory
`clip_model`	FilePath	CLIP model path or name
`vqgan_checkpoint`	FilePath	VQGAN checkpoint path
`vqgan_config`	FilePath	VQGAN config path
`noise_prompt_seeds`	List[int]	Noise prompt seeds
`noise_prompt_weights`	List[float]	Noise prompt weights
`step_size`	float	Learning rate
`cutn`	int	Number of cuts
`cut_pow`	float	Cut power
`seed`	int	Seed (-1 for random seed)
`optimizer`	str	Optimizer ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
`augments`	List[str]	Enabled augments ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']
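
Several fields in the table accept only a fixed set of values, so a quick check before a long run can catch typos. The following is a hedged sketch, not the repository's own validation: check_config is a hypothetical helper, it only inspects fields that are present, and it does not decide which fields are required.

```python
# Allowed values copied from the table above.
INIT_NOISE = {"gradient", "pixels"}
OPTIMIZERS = {"Adam", "AdamW", "Adagrad", "Adamax", "DiffGrad", "AdamP", "RAdam"}
AUGMENTS = {"Ji", "Sh", "Gn", "Pe", "Ro", "Af", "Et", "Ts", "Cr", "Er", "Re"}


def check_config(config):
    """Return a list of problems found in a config dict (empty if none)."""
    problems = []

    size = config.get("size")
    if size is not None:
        ok = (
            isinstance(size, (list, tuple))
            and len(size) == 2
            and all(isinstance(v, int) and v > 0 for v in size)
        )
        if not ok:
            problems.append("size must be [width, height] with positive integers")

    # An empty or missing init_noise is treated as "not set".
    noise = config.get("init_noise")
    if noise and noise not in INIT_NOISE:
        problems.append(f"init_noise must be one of {sorted(INIT_NOISE)}")

    optimizer = config.get("optimizer")
    if optimizer is not None and optimizer not in OPTIMIZERS:
        problems.append(f"optimizer must be one of {sorted(OPTIMIZERS)}")

    unknown = [a for a in (config.get("augments") or []) if a not in AUGMENTS]
    if unknown:
        problems.append(f"unknown augments: {unknown}")

    return problems


print(check_config({"size": [256, 256], "optimizer": "Adam", "augments": ["Af"]}))
```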

Acknowledgments

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis},
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation},
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Owner

Kevin Costa
