Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Overview

Talk-to-Edit (ICCV2021)

Python 3.7 pytorch 1.6.0

This repository contains the implementation of the following paper:

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021

[Paper] [Project Page] [CelebA-Dialog Dataset]

Overview

overall_structure

Dependencies and Installation

  1. Clone Repo

    git clone [email protected]:yumingj/Talk-to-Edit.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate talk_edit
    • Python >= 3.7
    • PyTorch >= 1.6
    • CUDA 10.1
    • GCC 5.4.0

Get Started

Editing

We provide scripts for editing using our pretrained models.

  1. First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:

    ./download/pretrained_models
    ├── 1024_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── 128_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── arcface_resnet18_110.pth
    ├── language_encoder.pth.tar
    ├── predictor_1024.pth.tar
    ├── predictor_128.pth.tar
    ├── stylegan2_1024.pth
    ├── stylegan2_128.pt
    ├── StyleGAN2_FFHQ1024_discriminator.pth
    └── eval_predictor.pth.tar
    
  2. You can try pure image editing without dialog instructions:

    python editing_wo_dialog.py \
       --opt ./configs/editing/editing_wo_dialog.yml \
       --attr 'Bangs' \
       --target_val 5

    The editing results will be saved in ./results.

    You can change attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age). And the target_val can be [0, 1, 2, 3, 4, 5].

  3. You can also try dialog-based editing, where you talk to the system through the command prompt:

    python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml

    The editing results will be saved in ./results.

    How to talk to the system:

    • Our system is able to edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age).
    • When prompted with "Enter your request (Press enter when you finish):", you can enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
    • To respond to the system's feedback, just talk as if you were talking to a real person. For example, if the system asks "Is the length of the bangs just right?" after one round of editing, You can say things like "Yes." / "No." / "Yes, and I also want her to smile more happily.".
    • To end the conversation, just tell the system things like "That's all" / "Nothing else, thank you."
  4. By default, the above editing would be performed on the teaser image. You may change the image to be edited in two ways: 1) change line 11: latent_code_index to other values ranging from 0 to 99; 2) set line 10: latent_code_path to ~, so that an image would be randomly generated.

  5. If you want to try editing on real images, you may download the real images from this link and put them under ./download/real_images. You could also provide other real images at your choice. You need to change line 12: img_path in editing_with_dialog.yml or editing_wo_dialog.yml according to the path to the real image and set line 11: is_real_image as True.

  6. You can switch the default image size to 128 x 128 by setting line 3: img_res to 128 in config files.

Train the Semantic Field

  1. To train the Semantic Field, a number of sampled latent codes should be prepared and then we use the attribute predictor to predict the facial attributes for their corresponding images. The attribute predictor is trained using fine-grained annotations in CelebA-Dialog dataset. Here, we provide the latent codes we used. You can download the train data from this link and put them under ./download/train_data as follows:

    ./download/train_data
    ├── 1024
    │   ├── Bangs
    │   ├── Eyeglasses
    │   ├── No_Beard
    │   ├── Smiling
    │   └── Young
    └── 128
        ├── Bangs
        ├── Eyeglasses
        ├── No_Beard
        ├── Smiling
        └── Young
    
  2. We will also use some editing latent codes to monitor the training phase. You can download the editing latent code from this link and put them under ./download/editing_data as follows:

    ./download/editing_data
    ├── 1024
    │   ├── Bangs.npz.npy
    │   ├── Eyeglasses.npz.npy
    │   ├── No_Beard.npz.npy
    │   ├── Smiling.npz.npy
    │   └── Young.npz.npy
    └── 128
        ├── Bangs.npz.npy
        ├── Eyeglasses.npz.npy
        ├── No_Beard.npz.npy
        ├── Smiling.npz.npy
        └── Young.npz.npy
    
  3. All logging files in the training process, e.g., log message, checkpoints, and snapshots, will be saved to ./experiments and ./tb_logger directory.

  4. There are 10 configuration files under ./configs/train, named in the format of field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>. Choose the corresponding configuration file for the attribute and resolution you want.

  5. For example, to train the semantic field which edits the attribute Bangs in 128x128 image resolution, simply run:

    python train.py --opt ./configs/train/field_128_Bangs.yml

Quantitative Results

We provide codes for quantitative results shown in Table 1. Here we use Bangs in 128x128 resolution as an example.

  1. Use the trained semantic field to edit images.

    python editing_quantitative.py \
    --opt ./configs/train/field_128_bangs.yml \
    --pretrained_path ./download/pretrained_models/128_field/Bangs.pth
  2. Evaluate the edited images using quantitative metircs. Change image_num for different attribute accordingly: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.

    python quantitative_results.py \
    --attribute Bangs \
    --work_dir ./results/field_128_bangs \
    --image_dir ./results/field_128_bangs/visualization \
    --image_num 148

Qualitative Results

result

CelebA-Dialog Dataset

result

Our CelebA-Dialog Dataset is available at link.

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

  • Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
  • Accompanied with each image, there are captions describing the attributes and a user request sample.

result

The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.

Citation

If you find our repo useful for your research, please consider citing our paper:

@InProceedings{jiang2021talkedit,
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Contact

If you have any question, please feel free to contact us via [email protected] or [email protected].

Acknowledgement

The codebase is maintained by Yuming Jiang and Ziqi Huang.

Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.

Owner
Yuming Jiang
[email protected], Ph.D. Student
Yuming Jiang
The code for Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation

BiMix The code for Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation arxiv Framework: visualization results: Requiremen

stanley 18 Sep 18, 2022
Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

EgoNet Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo inclu

Shichao Li 138 Dec 09, 2022
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set —— PyTorch implementation This is an unofficial offici

Sicheng Xu 833 Dec 28, 2022
Disagreement-Regularized Imitation Learning

Due to a normalization bug the expert trajectories have lower performance than the rl_baseline_zoo reported experts. Please see the following link in

Kianté Brantley 25 Apr 28, 2022
4K videos with annotated masks in our ICCV2021 paper 'Internal Video Inpainting by Implicit Long-range Propagation'.

Annotated 4K Videos paper | project website | code | demo video 4K videos with annotated object masks in our ICCV2021 paper: Internal Video Inpainting

Tengfei Wang 21 Nov 05, 2022
Bayesian algorithm execution (BAX)

Bayesian Algorithm Execution (BAX) Code for the paper: Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mut

Willie Neiswanger 38 Dec 08, 2022
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

PySlowFast PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficie

Meta Research 5.3k Jan 03, 2023
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss This repository contains the TensorFlow implementation of the paper UnF

Simon Meister 270 Nov 06, 2022
Tensorflow implementation of Character-Aware Neural Language Models.

Character-Aware Neural Language Models Tensorflow implementation of Character-Aware Neural Language Models. The original code of author can be found h

Taehoon Kim 751 Dec 26, 2022
K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Founding 2021)

Introduction K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce. Installation PyTor

Xu Song 21 Nov 16, 2022
Face recognition project by matching the features extracted using SIFT.

MV_FaceDetectionWithSIFT Face recognition project by matching the features extracted using SIFT. By : Aria Radmehr Professor : Ali Amiri Dependencies

Aria Radmehr 4 May 31, 2022
A Loss Function for Generative Neural Networks Based on Watson’s Perceptual Model

This repository contains the similarity metrics designed and evaluated in the paper, and instructions and code to re-run the experiments. Implementation in the deep-learning framework PyTorch

Steffen 86 Dec 27, 2022
EGNN - Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch

EGNN - Pytorch Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch. May be eventually used for Alphafold2 replication. This

Phil Wang 259 Jan 04, 2023
This is an official source code for implementation on Extensive Deep Temporal Point Process

Extensive Deep Temporal Point Process This is an official source code for implementation on Extensive Deep Temporal Point Process, which is composed o

Haitao Lin 8 Aug 15, 2022
Oriented Response Networks, in CVPR 2017

Oriented Response Networks [Home] [Project] [Paper] [Supp] [Poster] Torch Implementation The torch branch contains: the official torch implementation

ZhouYanzhao 217 Dec 12, 2022
Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

Kemal Oksuz 46 Sep 29, 2022
Churn-Prediction-Project - In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class.

Churn-Prediction-Project In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class. Project in

1 Jan 03, 2022
Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"

LADA This repo contains codes for the following paper: Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang: Local Additivity Based Data Augm

GT-SALT 36 Dec 02, 2022
Quadruped-command-tracking-controller - Quadruped command tracking controller (flat terrain)

Quadruped command tracking controller (flat terrain) Prepare Install RAISIM link

Yunho Kim 4 Oct 20, 2022