A Human-in-the-Loop workflow for creating HD images from text

Overview

DALL·E Flow: A Human-in-the-Loop workflow for creating HD images from text
A Human-in-the-Loop? workflow for creating HD images from text

Open in Google Colab Open in Google Colab

DALL·E Flow is an interactive workflow for generating high-definition images from text prompt. First, it leverages DALL·E-Mega to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt. The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.

DALL·E Flow is built with Jina in a client-server architecture, which gives it high scalability, non-blocking streaming, and a modern Pythonic interface. Client can interact with the server via gRPC/Websocket/HTTP with TLS.

Why Human-in-the-Loop? Generative art is a creative process. While recent advances of DALL·E unleash people's creativity, having a single-prompt-single-output UX/UI locks the imagination to a single possibility, which is bad no matter how fine this single result is. DALL·E Flow is an alternative to the one-liner, by formalizing the generative art as an iterative procedure.

Gallery

Image filename is the corresponding text prompt.

A raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars, digital artoil painting of a hamster drinking tea outsideAn oil pastel painting of an annoyed cat in a spaceshipa rainy night with a superhero perched above a city, in the style of a comic bookA synthwave style sunset above the reflecting water of the sea, digital arta 3D render of a rainbow colored hot air balloon flying above a reflective lakea teddy bear on a skateboard in Times Square A stained glass window of toucans in outer spacea campfire in the woods at night with the milky-way galaxy in the skyThe Hanging Gardens of Babylon in the middle of a city, in the style of DalíAn oil painting of a family reunited inside of an airport, digital artan oil painting of a humanoid robot playing chess in the style of Matissegolden gucci airpods realistic photo

Client

Open in Google Colab

Using client is super easy. The following steps are best run in Jupyter notebook or Google Colab.

You will need to install DocArray and Jina first:

pip install "docarray[common]>=0.13.5" jina

We have provided a demo server for you to play:

⚠️ Due to the massive requests now, the server is super busy. You can deploy your own server by following the instruction here.

server_url = 'grpc://dalle-flow.jina.ai:51005'

Step 1: Generate via DALL·E Mega

Now let's define the prompt:

prompt = 'an oil painting of a humanoid robot playing chess in the style of Matisse'

Let's submit it to the server and visualize the results:

from docarray import Document

da = Document(text=prompt).post(server_url, parameters={'num_images': 16}).matches

da.plot_image_sprites(fig_size=(10,10), show_index=True)

Here we generate 16 candidates as defined in num_images, which takes about ~2 minutes. You can use a smaller value if it is too long for you. The results are sorted by CLIP-as-service, with index-0 as the best candidate judged by CLIP.

Step 2: Select and refinement via GLID3 XL

Of course, you may think differently. Notice the number in the top-left corner? Select the one you like the most and get a better view:

fav_id = 3
fav = da[fav_id]
fav.display()

Now let's submit the selected candidates to the server for diffusion.

diffused = fav.post(f'{server_url}/diffuse', parameters={'skip_rate': 0.5}).matches

diffused.plot_image_sprites(fig_size=(10,10), show_index=True)

This will give 36 images based on the given image. You may allow the model to improvise more by giving skip_rate a near-zero value, or a near-one value to force its closeness to the given image. The whole procedure takes about ~2 minutes.

Step 3: Select and upscale via SwanIR

Select the image you like the most, and give it a closer look:

dfav_id = 34
fav = diffused[dfav_id]
fav.display()

Finally, submit to the server for the last step: upscaling to 1024 x 1024px.

fav = fav.post(f'{server_url}/upscale')
fav.display()

That's it! It is the one. If not satisfied, please repeat the procedure.

Btw, DocArray is a powerful and easy-to-use data structure for unstructured data. It is super productive for data scientists who work in cross-/multi-modal domain. To learn more about DocArray, please check out the docs.

Server

You can host your own server by following the instruction below.

Hardware requirements

It is highly recommended to run DALL·E Flow on a GPU machine. In fact, one GPU is probably not enough. DALL·E Mega needs one with 22GB memory. SwinIR and GLID-3 also need one; as they can be spawned on-demandly in seconds, they can share one GPU.

It requires at least 40GB free space on the hard drive, mostly for downloading pretrained models.

CPU-only environment is not tested and likely won't work. Google Colab is likely throwing OOM hence also won't work.

Install

Clone repos

mkdir dalle && cd dalle
git clone https://github.com/jina-ai/dalle-flow.git
git clone https://github.com/JingyunLiang/SwinIR.git
git clone https://github.com/CompVis/latent-diffusion.git
git clone https://github.com/Jack000/glid-3-xl.git

You should have the following folder structure:

dalle/
 |
 |-- dalle-flow/
 |-- SwinIR/
 |-- glid-3-xl/
 |-- latent-diffusion/

Install auxiliary repos

cd latent-diffusion && pip install -e . && cd -
cd glid-3-xl && pip install -e . && cd -

There are couple models we need to download first for GLID-3-XL:

wget https://dall-3.com/models/glid-3-xl/bert.pt
wget https://dall-3.com/models/glid-3-xl/kl-f8.pt
wget https://dall-3.com/models/glid-3-xl/finetune.pt

Install flow

cd dalle-flow
pip install -r requirements.txt

Start the server

Now you are under dalle-flow/, run the following command:

jina flow --uses flow.yml

You should see this screen immediately:

On the first start it will take ~8 minutes for downloading the DALL·E mega model and other necessary models. The proceeding runs should only take ~1 minute to reach the success message.

When everything is ready, you will see:

Congrats! Now you should be able to run the client.

You can modify and extend the server flow as you like, e.g. changing the model, adding persistence, or even auto-posting to Instagram/OpenSea. With Jina and DocArray, you can easily make DALL·E Flow cloud-native and ready for production.

Support

Join Us

DALL·E Flow is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers, solution engineers to build the next neural search ecosystem in open-source.

Owner
Jina AI
A Neural Search Company. We help businesses and developers to build neural search-powered applications in minutes.
Jina AI
PyTorch implementation of ENet

PyTorch-ENet PyTorch (v1.1.0) implementation of ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, ported from the lua-torc

David Silva 333 Dec 29, 2022
TorchFlare is a simple, beginner-friendly, and easy-to-use PyTorch Framework train your models effortlessly.

TorchFlare TorchFlare is a simple, beginner-friendly and an easy-to-use PyTorch Framework train your models without much effort. It provides an almost

Atharva Phatak 85 Dec 26, 2022
A Python reference implementation of the CF data model

cfdm A Python reference implementation of the CF data model. References Compliance with FAIR principles Documentation https://ncas-cms.github.io/cfdm

NCAS CMS 25 Dec 13, 2022
Simple cross-platform application for DaVinci surgical video frame annotation

About DaVid is a simple cross-platform GUI for annotating robotic and endoscopic surgical actions for use in deep-learning research. Features Simple a

Cyril Zakka 4 Oct 09, 2021
PyTorch implementation of adversarial patch

adversarial-patch PyTorch implementation of adversarial patch This is an implementation of the Adversarial Patch paper. Not official and likely to hav

Jamie Hayes 172 Nov 29, 2022
D2Go is a toolkit for efficient deep learning

D2Go D2Go is a production ready software system from FacebookResearch, which supports end-to-end model training and deployment for mobile platforms. W

Facebook Research 744 Jan 04, 2023
This package contains deep learning models and related scripts for RoseTTAFold

RoseTTAFold This package contains deep learning models and related scripts to run RoseTTAFold This repository is the official implementation of RoseTT

1.6k Jan 03, 2023
OpenDILab RL Kubernetes Custom Resource and Operator Lib

DI Orchestrator DI Orchestrator is designed to manage DI (Decision Intelligence) jobs using Kubernetes Custom Resource and Operator. Prerequisites A w

OpenDILab 205 Dec 29, 2022
Fashion Landmark Estimation with HRNet

HRNet for Fashion Landmark Estimation (Modified from deep-high-resolution-net.pytorch) Introduction This code applies the HRNet (Deep High-Resolution

SVIP Lab 91 Dec 26, 2022
RAMA: Rapid algorithm for multicut problem

RAMA: Rapid algorithm for multicut problem Solves multicut (correlation clustering) problems orders of magnitude faster than CPU based solvers without

Paul Swoboda 60 Dec 13, 2022
Fast, general, and tested differentiable structured prediction in PyTorch

Fast, general, and tested differentiable structured prediction in PyTorch

HNLP 1.1k Dec 16, 2022
Citation Intent Classification in scientific papers using the Scicite dataset an Pytorch

Citation Intent Classification Table of Contents About the Project Built With Installation Usage Acknowledgments About The Project Citation Intent Cla

Federico Nocentini 4 Mar 04, 2022
From this paper "SESNet: A Semantically Enhanced Siamese Network for Remote Sensing Change Detection"

SESNet for remote sensing image change detection It is the implementation of the paper: "SESNet: A Semantically Enhanced Siamese Network for Remote Se

1 May 24, 2022
An automated facial recognition based attendance system (desktop application)

Facial_Recognition_based_Attendance_System An automated facial recognition based attendance system (desktop application) Made using Python, Tkinter an

1 Jun 21, 2022
A simple tutoral for error correction task, based on Pytorch

gramcorrector A simple tutoral for error correction task, based on Pytorch Grammatical Error Detection (sentence-level) a binary sequence-based classi

peiyuan_gong 8 Dec 03, 2022
Least Square Calibration for Peer Reviews

Least Square Calibration for Peer Reviews Requirements gurobipy - for solving convex programs GPy - for Bayesian baseline numpy pandas To generate p

Sigma <a href=[email protected]"> 1 Nov 01, 2021
1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

yuxzho 94 Dec 25, 2022
Chainer Implementation of Semantic Segmentation using Adversarial Networks

Semantic Segmentation using Adversarial Networks Requirements Chainer (1.23.0) Differences Use of FCN-VGG16 instead of Dilated8 as Segmentor. Caution

Taiki Oyama 99 Jun 28, 2022
JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces

JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces JAXMAPP is a JAX-based library for multi-agent path planning (MAPP) in c

OMRON SINIC X 24 Dec 28, 2022
An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax

Simple Transformer An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax. Note: The only ex

29 Jun 16, 2022