An architecture that makes any doodle realistic, in any specified style, using VQGAN, CLIP and some basic embedding arithmetics.

Last update: Dec 18, 2022

Sketch Simulator

See the final cell output of the Colab notebook for examples with and without subtraction of the average sketch embedding.

WARNING: This notebook is messy, as it is a precursor of the code in this repo, but it works.

Architecture Overview
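The core embedding arithmetic can be illustrated with a minimal sketch. It assumes that the doodle embedding and the average sketch embedding (for example the one stored in results/ovl_mean_sketch.pth) are CLIP image embeddings of the same dimension, and that subtracting the average removes the generic "this is a sketch" component while keeping what the doodle depicts. Plain Python lists stand in for tensors, and the helper names are hypothetical, not the repo's API.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length, as is usual before comparing CLIP embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def subtract_sketch_mean(doodle_emb: Vector, mean_sketch_emb: Vector) -> Vector:
    """Remove the average sketch embedding from a doodle embedding (hypothetical helper).

    What remains is assumed to encode the doodle's content rather than its
    sketchy style; the result is re-normalized to unit length.
    """
    if len(doodle_emb) != len(mean_sketch_emb):
        raise ValueError("embeddings must have the same dimension")
    return normalize([d - m for d, m in zip(doodle_emb, mean_sketch_emb)])


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity, a common way to score an image embedding against a target."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    doodle = [1.0, 0.5]       # toy 2-D "embedding" of a doodle
    mean_sketch = [0.5, 0.5]  # toy average sketch embedding
    print(subtract_sketch_mean(doodle, mean_sketch))  # [1.0, 0.0]
```

In a CLIP-guided VQGAN loop, the embedding of the generated image would then be pushed toward such a target (and toward the text prompts) by maximizing cosine similarity; how train.py actually combines these terms is not shown here.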

Setup

Run ./setup.sh in your environment. This installs the required libraries and downloads the model weights.

Usage

To process a single doodle in your desired style (see train.py for all available modifiers), run:
- train.py --start_image "path/to/your/doodle" --prompts "a painting in the style of ... | Trending on artstation"
Prompts are split on "|", and individual weights can be assigned using {prompt1}:{weight1}|{prompt2}:{weight2}.
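The prompt syntax above can be parsed as in the following sketch. parse_prompts is a hypothetical helper, not necessarily how train.py does it; it assumes that a prompt without an explicit weight gets a weight of 1.0 and that only the text after the last colon is treated as a weight, so prompts that themselves contain a colon still work.

```python
from typing import List, Tuple


def parse_prompts(spec: str, default_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Split a "prompt1:weight1|prompt2:weight2" string into (prompt, weight) pairs."""
    parsed = []
    for chunk in spec.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # ignore empty segments such as a trailing "|"
        text, sep, weight = chunk.rpartition(":")
        if sep:
            try:
                parsed.append((text.strip(), float(weight)))
                continue
            except ValueError:
                pass  # the colon belongs to the prompt text, not to a weight
        parsed.append((chunk, default_weight))
    return parsed


print(parse_prompts("a painting of a castle:2 | Trending on artstation"))
# [('a painting of a castle', 2.0), ('Trending on artstation', 1.0)]
```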
To explore the hyperparameter space, or to process large numbers of doodles and/or prompts, using Weights & Biases:
- Create a sweep config your_sweep.yaml with your desired parameters in sweep_configs/ (see sweep_configs/* for examples).
- Start the sweep:
  - wandb sweep -p Sketch-sim "path/to/your_sweep.yaml" (this returns the sweep_ID used in the next command)
  - wandb agent janzuiderveld/Sketch-sim/sweep_ID
- Alternatively, in SLURM environments, use `SLURM_scripts/sweeper.sh` (edit its paths first):
  - sbatch SLURM_scripts/sweeper.sh "path/to/your_sweep.yaml"
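A sweep config can also be generated programmatically. The sketch below builds a grid sweep over doodles and prompts using standard W&B sweep keys (program, method, parameters with values lists); build_sweep_config, num_grid_runs and the example paths and prompts are hypothetical, and only the start_image and prompts arguments are taken from the usage above. Since JSON with plain strings and lists is also valid YAML, the printed output can be saved as a .yaml file in sweep_configs/.

```python
import json
from typing import Dict, List


def build_sweep_config(doodles: List[str], prompts: List[str], method: str = "grid") -> Dict:
    """Build a W&B sweep config that runs train.py for each (doodle, prompt) combination."""
    return {
        "program": "train.py",
        "method": method,  # "grid" tries every combination of parameter values
        "parameters": {
            "start_image": {"values": list(doodles)},
            "prompts": {"values": list(prompts)},
        },
    }


def num_grid_runs(config: Dict) -> int:
    """Number of runs a grid sweep launches: the product of the value-list lengths."""
    runs = 1
    for spec in config["parameters"].values():
        runs *= len(spec["values"])
    return runs


if __name__ == "__main__":
    config = build_sweep_config(
        doodles=["doodles/cat.png", "doodles/house.png"],  # hypothetical paths
        prompts=[
            "a painting in the style of Monet | Trending on artstation",
            "a photo:1 | a sketch:0",  # the zero-weight prompt only acts as a metric
        ],
    )
    print(json.dumps(config, indent=2))  # valid YAML for a simple config like this
```

With two doodles and two prompts, num_grid_runs(config) is 4, which is a quick sanity check on sweep size before starting agents.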

All outputs are saved in outputs/{args.experiment_name}/step_{i}.png
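Assuming the step index in step_{i}.png is not zero-padded, a plain alphabetical sort would put step_10.png before step_9.png. A minimal sketch for picking the most recent step image of an experiment therefore sorts by the numeric index; both helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

STEP_PATTERN = re.compile(r"^step_(\d+)\.png$")


def latest_step_name(filenames: Iterable[str]) -> Optional[str]:
    """Return the step_{i}.png filename with the highest i, or None if there is none."""
    best_name, best_step = None, -1
    for name in filenames:
        match = STEP_PATTERN.match(name)
        if match and int(match.group(1)) > best_step:
            best_name, best_step = name, int(match.group(1))
    return best_name


def latest_step_image(experiment_name: str, root: str = "outputs") -> Optional[Path]:
    """Locate the newest step image in {root}/{experiment_name}/ (hypothetical helper)."""
    folder = Path(root) / experiment_name
    if not folder.is_dir():
        return None
    name = latest_step_name(p.name for p in folder.iterdir())
    return folder / name if name else None


print(latest_step_name(["step_9.png", "step_10.png", "step_2.png"]))  # step_10.png
```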

Calculate Average Sketch Embedding

To (re)calculate the average sketch embedding, run the following (the provided results/ovl_mean_sketch.pth was computed from 1000 padded items per class for all 350 QuickDraw classes):
- extract_sketch_emb.py --items_per_class 1000 --save_root "path/to/repo/root" --pad_images 6
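The averaging step can be sketched as follows, assuming each QuickDraw item has already been padded, rendered and encoded into a CLIP image embedding (that part is omitted). With the same number of items in every class, as with --items_per_class 1000, the mean of the per-class means equals the plain mean over all items; computing per-class means first keeps every class equally weighted even if some classes have fewer items. The function names and data layout are hypothetical, and whether extract_sketch_emb.py also normalizes the embeddings is not covered here.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def overall_mean_sketch_embedding(embeddings_by_class: Dict[str, List[Vector]]) -> Vector:
    """Average the embeddings within each class, then average the class means."""
    class_means = [mean_vector(embs) for embs in embeddings_by_class.values()]
    return mean_vector(class_means)


toy = {
    "cat": [[1.0, 0.0], [3.0, 0.0]],  # toy 2-D embeddings, two items per class
    "dog": [[0.0, 2.0], [0.0, 4.0]],
}
print(overall_mean_sketch_embedding(toy))  # [1.0, 1.5]
```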

Notes

One step of synthesizing and embedding a 400x400 image takes about 0.3 seconds on a single 1080; 20-30 steps are usually enough for good results, i.e. roughly 6-9 seconds of optimization per doodle.
Giving a prompt a weight of 0 turns it into a pure metric for large hyperparameter sweeps: it does not influence the optimization, but its score is still logged automatically.
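The effect of a zero weight can be illustrated with a minimal sketch, assuming the total loss is a weighted sum of per-prompt losses while every prompt's individual loss is logged. combine_prompt_losses is hypothetical and not train.py's actual code; in a real training loop the losses would be tensors rather than floats.

```python
from typing import Dict, Tuple


def combine_prompt_losses(
    losses: Dict[str, float], weights: Dict[str, float]
) -> Tuple[float, Dict[str, float]]:
    """Return (total loss to optimize, per-prompt scores to log).

    Prompts with weight 0 add nothing to the total, so they do not steer the
    image, but their scores still appear in the logged metrics.
    """
    total = sum(weights[prompt] * loss for prompt, loss in losses.items())
    logged = dict(losses)  # every prompt is logged, whatever its weight
    return total, logged


total, logged = combine_prompt_losses(
    losses={"a photo of a castle": 0.5, "a blurry image": 0.8},
    weights={"a photo of a castle": 1.0, "a blurry image": 0.0},
)
print(total)   # 0.5
print(logged)  # {'a photo of a castle': 0.5, 'a blurry image': 0.8}
```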

TODO

Add server / client scripts to circumvent startup times
Add CLIP-based classifier for testing conceptual embedding accuracy on Quickdraw classification
