[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Last update: Nov 09, 2022

Overview

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Installation

pip install -r requirements.txt

Dataset Preparation

Given a dataset, please list the image paths in a folder named after the dataset, using the following folder structure.

    flist/dataset_name
        ├── train.flist    # paths of training images
        ├── valid.flist    # paths of validation images
        └── test.flist     # paths of testing images
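The three file lists above could be generated with a short script. The following is a minimal sketch, not part of the repository: it assumes each .flist file is plain text with one image path per line, and the function name `write_flists`, the split fractions, and the extension set are all hypothetical choices for illustration.

```python
import random
from pathlib import Path

# Assumption: these are the image types of interest; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def write_flists(image_dir, out_dir, valid_frac=0.05, test_frac=0.05, seed=0):
    """Split the images under image_dir and write train/valid/test .flist files.

    Assumes one absolute image path per line in each .flist file.
    Returns the number of paths written to each split.
    """
    paths = sorted(
        str(p.resolve())
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(paths)

    n_valid = int(len(paths) * valid_frac)
    n_test = int(len(paths) * test_frac)
    splits = {
        "valid": paths[:n_valid],
        "test": paths[n_valid:n_valid + n_test],
        "train": paths[n_valid + n_test:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        text = "\n".join(items) + ("\n" if items else "")
        (out / f"{name}.flist").write_text(text)
    return {name: len(items) for name, items in splits.items()}
```

For example, `write_flists("data/celebahq", "flist/celebahq")` would create `flist/celebahq/train.flist`, `valid.flist`, and `test.flist`; both directory names here are placeholders.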

In this work, we use CelebA-HQ (download available here), Places2 (download available here), and Paris StreetView (needs the author's permission to download).

ImageNet K-means Cluster: kmeans_centers.npy is downloaded from image-gpt. It is used to quantize the low-resolution images.
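Quantizing with a set of K-means centers amounts to replacing each pixel with the index of its nearest center. The sketch below illustrates that idea with NumPy under stated assumptions: the centers are an array of shape (K, 3), and the image is already resized and scaled to the same value range as the centers. The function names are hypothetical, and the repository's own preprocessing may differ.

```python
import numpy as np


def quantize_to_clusters(image, centers):
    """Map each pixel to the index of its nearest cluster center.

    image:   float array of shape (H, W, 3)
    centers: float array of shape (K, 3), in the same value range as image
    Returns an integer array of shape (H, W) with values in [0, K).
    """
    pixels = image.reshape(-1, 1, 3)                   # (H*W, 1, 3)
    dists = ((pixels - centers[None]) ** 2).sum(-1)    # (H*W, K) squared distances
    return dists.argmin(axis=1).reshape(image.shape[:2])


def dequantize(indices, centers):
    """Turn cluster indices of shape (H, W) back into colors of shape (H, W, 3)."""
    return centers[indices]
```

With the real file, the centers would be loaded with `np.load("kmeans_centers.npy")`; check their shape and value range before relying on this sketch.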

Testing with Pre-trained Models

Download pre-trained models:

CelebA-HQ: BAT ; Upsampler
Places2: BAT ; Upsampler
Paris-StreetView: BAT ; Upsampler

Put the pre-trained model under the checkpoints folder, e.g.

    checkpoints
        ├── celebahq_bat_pretrain
            ├── latest_net_G.pth

Prepare the input images and masks to test.

python bat_sample.py --num_sample [1] --tran_model [bat name] --up_model [upsampler name] --input_dir [dir of input] --mask_dir [dir of mask] --save_dir [dir to save results]
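To make the placeholders in the command above concrete, the following sketch assembles the same invocation from Python. The flags are taken from the command above. The upsampler name and the three directory names are hypothetical placeholders, and `build_sample_command` is an illustrative helper, not part of the repository.

```python
import shlex


def build_sample_command(tran_model, up_model, input_dir, mask_dir, save_dir,
                         num_sample=1):
    """Build the argument list for bat_sample.py using the flags listed above."""
    return [
        "python", "bat_sample.py",
        "--num_sample", str(num_sample),
        "--tran_model", tran_model,
        "--up_model", up_model,
        "--input_dir", input_dir,
        "--mask_dir", mask_dir,
        "--save_dir", save_dir,
    ]


# Hypothetical values: only celebahq_bat_pretrain appears in this README.
cmd = build_sample_command(
    tran_model="celebahq_bat_pretrain",
    up_model="celebahq_up_pretrain",   # assumed upsampler checkpoint name
    input_dir="./examples/images",     # assumed input folder
    mask_dir="./examples/masks",       # assumed mask folder
    save_dir="./results",              # assumed output folder
    num_sample=3,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.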

Training New Models

Pretrained VGG model: download it from here and move it to models/. This model is used to calculate the training loss for the upsampler.

New models can be trained with the following commands.

Prepare the dataset. Use the --dataroot option to locate the directory of file lists, e.g. ./flist, and specify the name of the dataset to train on with the --dataset_name option. Specify the mask type and mask ratio using the --mask_type and --pconv_level options.
Train the transformer.

# Specify your own dataset or settings in the bash file.
bash train_bat.sh

Please note that some of the transformer settings are defined in train_bat.py instead of options/. This script uses every available GPU for training, so please select the GPUs via CUDA_VISIBLE_DEVICES instead of --gpu_ids, which is used for the upsampler.
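One way to restrict the transformer training to specific GPUs is to set CUDA_VISIBLE_DEVICES in the environment of the launched process. The sketch below shows this from Python; `training_env` and `launch_bat_training` are hypothetical helpers, and the GPU indices are examples only.

```python
import os
import subprocess


def training_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set.

    gpu_ids: iterable of integer GPU indices, e.g. [0, 1]
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_bat_training(gpu_ids):
    """Run train_bat.sh so that only the listed GPUs are visible to it."""
    return subprocess.run(["bash", "train_bat.sh"],
                          env=training_env(gpu_ids), check=True)
```

Calling `launch_bat_training([0, 1])` from the repository root is equivalent to prefixing the shell command with `CUDA_VISIBLE_DEVICES=0,1`.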

Train the upsampler.

# Specify your own dataset or settings in the bash file.
bash train_up.sh

The upsampler is typically trained on the low-resolution ground truth. We find that using some samples from the trained BAT might help improve the performance (i.e. PSNR, SSIM). However, the sampling process is quite time-consuming, and training with ground truth alone can also yield reasonable results.
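For reference, PSNR between a ground-truth image and an inpainted result can be computed from the mean squared error. The sketch below uses the standard definition with NumPy and assumes 8-bit images (peak value 255). It is not necessarily the evaluation code used for the reported numbers.

```python
import numpy as np


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    res = np.asarray(result, dtype=np.float64)
    mse = np.mean((ref - res) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate a result closer to the reference; identical images give infinity.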

Citation

If you find this code helpful for your research, please cite our paper.

@inproceedings{yu2021diverse,
  title={Diverse Image Inpainting with Bidirectional and Autoregressive Transformers},
  author={Yu, Yingchen and Zhan, Fangneng and Wu, Rongliang and Pan, Jianxiong and Cui, Kaiwen and Lu, Shijian and Ma, Feiying and Xie, Xuansong and Miao, Chunyan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgments

This code borrows heavily from SPADE and minGPT. We appreciate the authors for sharing their code.

Owner

Yingchen Yu
