EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation

Last update: Dec 07, 2022

Related tags

Overview

EdiBERT, a generative model for image editing

EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation. The same EdiBERT model, derived from a single training, can be used on a wide variety of tasks.

We follow the implementation of Taming-Transformers (https://github.com/CompVis/taming-transformers). Main modifications can be found in: taming/models/bert_transformer.py ; scripts/sample_mask_likelihood_maximization.py.

Requirements

A suitable conda environment named edibert can be created and activated with:

conda env create -f environment.yaml
conda activate edibert

FFHQ

Download FFHQ dataset (https://github.com/NVlabs/ffhq-dataset) and put it into data/ffhq/.

Training BERT

In the logs/ folder, download and extract the FFHQ VQGAN:

gdown --id '1P_wHLRfdzf1DjsAH_tG10GXk9NKEZqTg'
tar -xvzf 2021-04-23T18-19-01_ffhq_vqgan.tar.gz

Training on 1 GPUs:

python main.py --base configs/ffhq_transformer_bert_2D.yaml -t True --gpus 0,

Training on 2 GPUs:

python main.py --base configs/ffhq_transformer_bert_2D.yaml -t True --gpus 0,1

Running pre-trained BERT on composite/scribble-edited images

In the logs/ folder, download and extract the FFHQ VQGAN:

gdown --id '1P_wHLRfdzf1DjsAH_tG10GXk9NKEZqTg'
tar -xvzf 2021-04-23T18-19-01_ffhq_vqgan.tar.gz

In the logs/ folder, download and extract the FFHQ BERT:

gdown --id '1YGDd8XyycKgBp_whs9v1rkYdYe4Oxfb3'
tar -xvzf 2021-10-14T16-32-28_ffhq_transformer_bert_2D.tar.gz

folders and place them into logs.

Then, launch the following script for composite images:

python scripts/sample_mask_likelihood_maximization.py -r logs/2021-10-14T16-32-28_ffhq_transformer_bert_2D/checkpoints/epoch=000019.ckpt \
--image_folder data/ffhq_collages/ --mask_folder data/ffhq_collages_masks/ --image_list data/ffhq_collages.txt --keep_img \
--dilation_sampling 1 -k 100 -t 1.0 --batch_size 5 --bert --epochs 2  \
--device 0 --random_order \
--mask_collage --collage_frequency 3 --gaussian_smoothing_collage

Then, launch the following script for edits images:

python scripts/sample_mask_likelihood_maximization.py -r logs/2021-10-14T16-32-28_ffhq_transformer_bert_2D/checkpoints/epoch=000019.ckpt \
--image_folder data/ffhq_edits/ --mask_folder data/ffhq_edits_masks/ --image_list data/ffhq_edits.txt --keep_img \
--dilation_sampling 1 -k 100 -t 1.0 --batch_size 5 --bert --epochs 2  \
--device 0 --random_order \
--mask_collage --collage_frequency 3 --gaussian_smoothing_collage

The samples can then be found in logs/my_model/samples/. Here, the --batch_size argument corresponds to the number of EdiBERT generations per image.

Notebooks for playing with completion/denoising with BERT

Notebooks for image denoising and image inpainting can also be found in the main folder.

EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation

Related tags

Overview

EdiBERT, a generative model for image editing

Requirements

FFHQ

Training BERT

Running pre-trained BERT on composite/scribble-edited images

Notebooks for playing with completion/denoising with BERT

Owner

Transformer Tracking (CVPR2021)

CausaLM: Causal Model Explanation Through Counterfactual Language Models

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning.

Invertible conditional GANs for image editing

African language Speech Recognition - Speech-to-Text

Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

URIE: Universal Image Enhancementfor Visual Recognition in the Wild

TensorFlow Tutorials with YouTube Videos

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

AbelNN: Deep Learning Python module from scratch

Keras-tensorflow implementation of Fully Convolutional Networks for Semantic Segmentation（Unfinished）

A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

This repo implements several applications of the proposed generalized Bures-Wasserstein (GBW) geometry on symmetric positive definite matrices.

🔮 Execution time predictions for deep neural network training iterations across different GPUs.

A simple python stock Predictor

Use AI to generate a optimized stock portfolio

Source code and data in paper "MDFEND: Multi-domain Fake News Detection (CIKM'21)"

Implementations of CNNs, RNNs, GANs, etc