Text-to-Image Translation (DALL-E) for TPU in Pytorch

A refactoring of Taming Transformers and DALLE-pytorch for TPU VMs with PyTorch Lightning.

Requirements

pip install -r requirements.txt

Data Preparation

Place any image dataset in an ImageNet-style directory structure (at least one subfolder) so that it can be loaded with torchvision's ImageFolder.
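As a rough illustration of the layout described above, the following stdlib-only sketch checks that a directory has at least one class subfolder containing image files. The helper name `find_class_folders` is hypothetical and not part of this repository, and the extension list is an assumption that mirrors the extensions torchvision's ImageFolder accepts by default.

```python
import os
import tempfile

# Assumed extension list, mirroring torchvision's default image extensions.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def find_class_folders(root):
    """Return {class_name: image_count} for each subfolder of root that holds images.

    Hypothetical helper for illustration only. Files placed directly in root
    and subfolders without any image files are ignored.
    """
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    classes = {}
    for entry in entries:
        if not entry.is_dir():
            continue
        count = 0
        # Images may sit in nested folders below the class folder.
        for _, _, files in os.walk(entry.path):
            count += sum(f.lower().endswith(IMG_EXTENSIONS) for f in files)
        if count:
            classes[entry.name] = count
    return classes


if __name__ == "__main__":
    # Build a throwaway dataset with a single class folder and inspect it.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "all_images"))
        open(os.path.join(root, "all_images", "0001.png"), "wb").close()
        found = find_class_folders(root)
        if not found:
            raise SystemExit("no class subfolder with images found")
        print(found)
```

An unlabeled dataset can therefore live in a single subfolder, for example `train/all_images/`, which matches the "at least one subfolder" requirement.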

Training VQVAEs

You can quickly test train_vae.py with randomly generated fake data:

python train_vae.py --use_tpus --fake_data

For actual training, provide directories for train_dir, val_dir, and log_dir:

python train_vae.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]

Training DALL-E

python train_dalle.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results] --vae_path [pretrained vae] --bpe_path [pretrained bpe(optional)]
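If you launch training from a Python driver script rather than a shell, the command lines above could be assembled roughly as follows. This is only a sketch: `build_train_command` is a hypothetical helper, the paths and the `vae.ckpt` file name are placeholders, and only the flags shown in this README are used.

```python
import shlex


def build_train_command(script, train_dir, val_dir, log_dir,
                        vae_path=None, bpe_path=None, use_tpus=True):
    """Assemble an argv list for train_vae.py or train_dalle.py.

    Hypothetical helper; it only strings together flags documented above.
    """
    cmd = ["python", script]
    if use_tpus:
        cmd.append("--use_tpus")
    cmd += ["--train_dir", train_dir, "--val_dir", val_dir, "--log_dir", log_dir]
    # --vae_path and --bpe_path apply to train_dalle.py; bpe_path is optional.
    if vae_path is not None:
        cmd += ["--vae_path", vae_path]
    if bpe_path is not None:
        cmd += ["--bpe_path", bpe_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with your own directories and checkpoint.
    cmd = build_train_command("train_dalle.py", "data/train", "data/val",
                              "logs/dalle", vae_path="vae.ckpt")
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.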

TODO

Refactor Encoder and Decoder modules for better readability
Refactor VQVAE2
Add Net2Net Conditional Transformer for conditional image generation
Refactor, optimize, and merge DALL-E with Net2Net Conditional Transformer
Add Guided Diffusion + CLIP for image refinement
Add VAE converter for JAX to support dalle-mini
Add DALL-E colab notebook
Add RBGumbelQuantizer
Add HiT

ON-GOING

Test large dataset loading on TPU Pods
Change current DALL-E code to fully support latest updates from DALLE-pytorch

DONE

BibTeX

@misc{oord2018neural,
      title={Neural Discrete Representation Learning}, 
      author={Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
      year={2018},
      eprint={1711.00937},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{razavi2019generating,
      title={Generating Diverse High-Fidelity Images with VQ-VAE-2}, 
      author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
      year={2019},
      eprint={1906.00446},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
      title={Zero-Shot Text-to-Image Generation},
      author={Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
      year={2021},
      eprint={2102.12092},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Kim, Taehoon
