PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Unofficial PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1. from paper

Note: this is a work in progress.

Everyone is welcome to contribute. Discord channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model. However, the model has a billion parameters, and training it requires a lot of compute. If you can provide computational resources, let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It relies heavily on semantic segmentation maps as extra conditioning, which gives more control over the generation process. Moreover, it also conditions on text. The main improvements are the following:

Segmentation conditioning: a separate VQ-VAE (VQ-SEG) is trained, with the loss modified to a weighted binary cross-entropy (Section 3.4).
VQGAN training (VQ-IMG) is extended with a face loss and an object loss (Sections 3.3 and 3.5).
Classifier-free guidance for the autoregressive transformer (Section 3.7).
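
Two of the items above reduce to short formulas. The following is a minimal PyTorch sketch, not the code from this repository: `weighted_bce` illustrates a per-class weighted binary cross-entropy of the kind used for VQ-SEG, and `guided_logits` illustrates the guidance step for the transformer, which Section 3.7 of the paper formulates as classifier-free guidance. The function names, the tensor shapes, and the choice of one weight per segmentation class are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def weighted_bce(logits, target, class_weights):
    """Binary cross-entropy with one weight per segmentation class.

    logits, target: float tensors of shape (batch, classes, height, width);
                    target holds 0/1 values for each class channel.
    class_weights:  float tensor of shape (classes,). Hypothetical weights,
                    e.g. larger values for classes that should be emphasised.
    """
    # Unreduced loss, one value per (batch, class, pixel) entry.
    per_entry = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none"
    )
    # Broadcast the class weights over batch and spatial dimensions.
    weights = class_weights.view(1, -1, 1, 1)
    return (weights * per_entry).mean()


def guided_logits(cond_logits, uncond_logits, scale):
    """Classifier-free guidance on next-token logits.

    cond_logits:   logits from a forward pass with the conditioning tokens.
    uncond_logits: logits from a forward pass with the conditioning dropped.
    scale:         guidance scale (assumed to be chosen by the user).
    """
    return uncond_logits + scale * (cond_logits - uncond_logits)
```

With `scale = 1` the guided logits equal the conditional ones; larger values push sampling further toward the text and scene conditioning. The next token would then be sampled from a softmax over the guided logits.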

Training Pipeline

Figure 6. from paper

What needs to be done?

See the individual folders for details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Owner

Casual GAN Papers
