Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Overview

Talk-to-Edit (ICCV2021)

Python 3.7 pytorch 1.6.0

This repository contains the implementation of the following paper:

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021

[Paper] [Project Page] [CelebA-Dialog Dataset]

Overview

overall_structure

Dependencies and Installation

  1. Clone Repo

    git clone [email protected]:yumingj/Talk-to-Edit.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate talk_edit
    • Python >= 3.7
    • PyTorch >= 1.6
    • CUDA 10.1
    • GCC 5.4.0

Get Started

Editing

We provide scripts for editing using our pretrained models.

  1. First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:

    ./download/pretrained_models
    ├── 1024_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── 128_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── arcface_resnet18_110.pth
    ├── language_encoder.pth.tar
    ├── predictor_1024.pth.tar
    ├── predictor_128.pth.tar
    ├── stylegan2_1024.pth
    ├── stylegan2_128.pt
    ├── StyleGAN2_FFHQ1024_discriminator.pth
    └── eval_predictor.pth.tar
    
  2. You can try pure image editing without dialog instructions:

    python editing_wo_dialog.py \
       --opt ./configs/editing/editing_wo_dialog.yml \
       --attr 'Bangs' \
       --target_val 5

    The editing results will be saved in ./results.

    You can change attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age). And the target_val can be [0, 1, 2, 3, 4, 5].

  3. You can also try dialog-based editing, where you talk to the system through the command prompt:

    python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml

    The editing results will be saved in ./results.

    How to talk to the system:

    • Our system is able to edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age).
    • When prompted with "Enter your request (Press enter when you finish):", you can enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
    • To respond to the system's feedback, just talk as if you were talking to a real person. For example, if the system asks "Is the length of the bangs just right?" after one round of editing, You can say things like "Yes." / "No." / "Yes, and I also want her to smile more happily.".
    • To end the conversation, just tell the system things like "That's all" / "Nothing else, thank you."
  4. By default, the above editing would be performed on the teaser image. You may change the image to be edited in two ways: 1) change line 11: latent_code_index to other values ranging from 0 to 99; 2) set line 10: latent_code_path to ~, so that an image would be randomly generated.

  5. If you want to try editing on real images, you may download the real images from this link and put them under ./download/real_images. You could also provide other real images at your choice. You need to change line 12: img_path in editing_with_dialog.yml or editing_wo_dialog.yml according to the path to the real image and set line 11: is_real_image as True.

  6. You can switch the default image size to 128 x 128 by setting line 3: img_res to 128 in config files.

Train the Semantic Field

  1. To train the Semantic Field, a number of sampled latent codes should be prepared and then we use the attribute predictor to predict the facial attributes for their corresponding images. The attribute predictor is trained using fine-grained annotations in CelebA-Dialog dataset. Here, we provide the latent codes we used. You can download the train data from this link and put them under ./download/train_data as follows:

    ./download/train_data
    ├── 1024
    │   ├── Bangs
    │   ├── Eyeglasses
    │   ├── No_Beard
    │   ├── Smiling
    │   └── Young
    └── 128
        ├── Bangs
        ├── Eyeglasses
        ├── No_Beard
        ├── Smiling
        └── Young
    
  2. We will also use some editing latent codes to monitor the training phase. You can download the editing latent code from this link and put them under ./download/editing_data as follows:

    ./download/editing_data
    ├── 1024
    │   ├── Bangs.npz.npy
    │   ├── Eyeglasses.npz.npy
    │   ├── No_Beard.npz.npy
    │   ├── Smiling.npz.npy
    │   └── Young.npz.npy
    └── 128
        ├── Bangs.npz.npy
        ├── Eyeglasses.npz.npy
        ├── No_Beard.npz.npy
        ├── Smiling.npz.npy
        └── Young.npz.npy
    
  3. All logging files in the training process, e.g., log message, checkpoints, and snapshots, will be saved to ./experiments and ./tb_logger directory.

  4. There are 10 configuration files under ./configs/train, named in the format of field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>. Choose the corresponding configuration file for the attribute and resolution you want.

  5. For example, to train the semantic field which edits the attribute Bangs in 128x128 image resolution, simply run:

    python train.py --opt ./configs/train/field_128_Bangs.yml

Quantitative Results

We provide codes for quantitative results shown in Table 1. Here we use Bangs in 128x128 resolution as an example.

  1. Use the trained semantic field to edit images.

    python editing_quantitative.py \
    --opt ./configs/train/field_128_bangs.yml \
    --pretrained_path ./download/pretrained_models/128_field/Bangs.pth
  2. Evaluate the edited images using quantitative metircs. Change image_num for different attribute accordingly: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.

    python quantitative_results.py \
    --attribute Bangs \
    --work_dir ./results/field_128_bangs \
    --image_dir ./results/field_128_bangs/visualization \
    --image_num 148

Qualitative Results

result

CelebA-Dialog Dataset

result

Our CelebA-Dialog Dataset is available at link.

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

  • Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
  • Accompanied with each image, there are captions describing the attributes and a user request sample.

result

The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.

Citation

If you find our repo useful for your research, please consider citing our paper:

@InProceedings{jiang2021talkedit,
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Contact

If you have any question, please feel free to contact us via [email protected] or [email protected].

Acknowledgement

The codebase is maintained by Yuming Jiang and Ziqi Huang.

Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.

Owner
Yuming Jiang
[email protected], Ph.D. Student
Yuming Jiang
Poisson Surface Reconstruction for LiDAR Odometry and Mapping

Poisson Surface Reconstruction for LiDAR Odometry and Mapping Surfels TSDF Our Approach Table: Qualitative comparison between the different mapping te

Photogrammetry & Robotics Bonn 305 Dec 21, 2022
ECLARE: Extreme Classification with Label Graph Correlations

ECLARE ECLARE: Extreme Classification with Label Graph Correlations @InProceedings{Mittal21b, author = "Mittal, A. and Sachdeva, N. and Agrawal

Extreme Classification 35 Nov 06, 2022
Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

mmc installation git clone https://github.com/dmarx/Multi-Modal-Comparators cd 'Multi-Modal-Comparators' pip install poetry poetry build pip install d

David Marx 37 Nov 25, 2022
Code and data for the paper "Hearing What You Cannot See"

Hearing What You Cannot See: Acoustic Vehicle Detection Around Corners Public repository of the paper "Hearing What You Cannot See: Acoustic Vehicle D

TU Delft Intelligent Vehicles 26 Jul 13, 2022
Elastic weight consolidation technique for incremental learning.

Overcoming-Catastrophic-forgetting-in-Neural-Networks Elastic weight consolidation technique for incremental learning. About Use this API if you dont

Shivam Saboo 89 Dec 22, 2022
The official implementation of Equalization Loss for Long-Tailed Object Recognition (CVPR 2020) based on Detectron2

Equalization Loss for Long-Tailed Object Recognition Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, Junjie Yan ⚠️ We re

Jingru Tan 197 Dec 25, 2022
Framework for evaluating ANNS algorithms on billion scale datasets.

Billion-Scale ANN http://big-ann-benchmarks.com/ Install The only prerequisite is Python (tested with 3.6) and Docker. Works with newer versions of Py

Harsha Vardhan Simhadri 132 Dec 24, 2022
Code release for "BoxeR: Box-Attention for 2D and 3D Transformers"

BoxeR By Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees Snoek. This repository is an official implementation of the paper BoxeR: Box-A

Nguyen Duy Kien 111 Dec 07, 2022
RITA is a family of autoregressive protein models, developed by LightOn in collaboration with the OATML group at Oxford and the Debora Marks Lab at Harvard.

RITA: a Study on Scaling Up Generative Protein Sequence Models RITA is a family of autoregressive protein models, developed by a collaboration of Ligh

LightOn 69 Dec 22, 2022
Cycle Consistent Adversarial Domain Adaptation (CyCADA)

Cycle Consistent Adversarial Domain Adaptation (CyCADA) A pytorch implementation of CyCADA. If you use this code in your research please consider citi

Hyunwoo Ko 2 Jan 10, 2022
Delta Conformity Sociopatterns Analysis - Delta Conformity Sociopatterns Analysis

Delta_Conformity_Sociopatterns_Analysis ∆-Conformity is a local homophily measur

2 Jan 09, 2022
Patch SVDD for Image anomaly detection

Patch SVDD Patch SVDD for Image anomaly detection. Paper: https://arxiv.org/abs/2006.16067 (published in ACCV 2020). Original Code : https://github.co

Hong-Jeongmin 0 Dec 03, 2021
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

merlot_reserve Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound" MERLOT Reserve (in submission) is a mo

Rowan Zellers 92 Dec 11, 2022
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022
Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Hurdles to Progress in Long-form Question Answering This repository contains the official scripts and datasets accompanying our NAACL 2021 paper, "Hur

Kalpesh Krishna 41 Nov 08, 2022
Dataset and codebase for NeurIPS 2021 paper: Exploring Forensic Dental Identification with Deep Learning

Repository under construction. Example dataset, checkpoints, and training/testing scripts will be avaible soon! 💡 Collated best practices from most p

4 Jun 26, 2022
Code for 2021 NeurIPS --- Towards Multi-Grained Explainability for Graph Neural Networks

ReFine: Multi-Grained Explainability for GNNs We are trying hard to update the code, but it may take a while to complete due to our tight schedule rec

Shirley (Ying-Xin) Wu 47 Dec 16, 2022
A convolutional recurrent neural network for classifying A/B phases in EEG signals recorded for sleep analysis.

CAP-Classification-CRNN A deep learning model based on Inception modules paired with gated recurrent units (GRU) for the classification of CAP phases

Apurva R. Umredkar 2 Nov 25, 2022
In this project we use both Resnet and Self-attention layer for cat, dog and flower classification.

cdf_att_classification classes = {0: 'cat', 1: 'dog', 2: 'flower'} In this project we use both Resnet and Self-attention layer for cdf-Classification.

3 Nov 23, 2022
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t

171 Jan 06, 2023