Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Talk-to-Edit (ICCV2021)

Python 3.7 PyTorch 1.6.0

This repository contains the implementation of the following paper:

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021

[Paper] [Project Page] [CelebA-Dialog Dataset]

Overview

[Figure: overall structure]

Dependencies and Installation

  1. Clone Repo

    git clone git@github.com:yumingj/Talk-to-Edit.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate talk_edit

    • Python >= 3.7
    • PyTorch >= 1.6
    • CUDA 10.1
    • GCC 5.4.0
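You can check the version requirements above from inside the activated environment with a short script like the one below. This is a sketch and not part of the repository. `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helpers. PyTorch is inspected only if it imports, and the script reports only whether PyTorch can see a CUDA device. It does not verify the CUDA 10.1 or GCC 5.4.0 versions.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.6.0+cu101' into a tuple (1, 6, 0).

    Only the leading run of dot-separated integers is kept, so local build
    tags like '+cu101' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", str(text).strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Compare two version tuples component-wise, padding the shorter with zeros."""
    width = max(len(version), len(minimum))

    def pad(v):
        return tuple(v) + (0,) * (width - len(v))

    return pad(version) >= pad(minimum)


def check_environment():
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not meets_minimum(tuple(sys.version_info[:3]), (3, 7)):
        problems.append(f"Python >= 3.7 required, found {sys.version.split()[0]}")
    try:
        import torch  # only inspected if it is installed
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not meets_minimum(parse_version(torch.__version__), (1, 6)):
        problems.append(f"PyTorch >= 1.6 required, found {torch.__version__}")
    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA device")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks fine.")
```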

Get Started

Editing

We provide scripts for editing using our pretrained models.

  1. First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:

    ./download/pretrained_models
    ├── 1024_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── 128_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── arcface_resnet18_110.pth
    ├── language_encoder.pth.tar
    ├── predictor_1024.pth.tar
    ├── predictor_128.pth.tar
    ├── stylegan2_1024.pth
    ├── stylegan2_128.pt
    ├── StyleGAN2_FFHQ1024_discriminator.pth
    └── eval_predictor.pth.tar
    
  2. You can try pure image editing without dialog instructions:

    python editing_wo_dialog.py \
       --opt ./configs/editing/editing_wo_dialog.yml \
       --attr 'Bangs' \
       --target_val 5

    The editing results will be saved in ./results.

    You can set attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, or Young (i.e., age). target_val can be any of [0, 1, 2, 3, 4, 5].

  3. You can also try dialog-based editing, where you talk to the system through the command prompt:

    python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml

    The editing results will be saved in ./results.

    How to talk to the system:

    • Our system can edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young (i.e., age).
    • When prompted with "Enter your request (Press enter when you finish):", enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
    • To respond to the system's feedback, just talk as if you were talking to a real person. For example, if the system asks "Is the length of the bangs just right?" after one round of editing, you can say things like "Yes.", "No.", or "Yes, and I also want her to smile more happily."
    • To end the conversation, just tell the system something like "That's all" or "Nothing else, thank you."
  4. By default, the editing above is performed on the teaser image. You can change the image to be edited in two ways: 1) set latent_code_index (line 11 of the config file) to another value from 0 to 99; or 2) set latent_code_path (line 10) to ~ so that an image is randomly generated.

  5. To edit real images, download the real images from this link and put them under ./download/real_images, or provide real images of your own. Then, in editing_with_dialog.yml or editing_wo_dialog.yml, set img_path (line 12) to the path of the real image and set is_real_image (line 11) to True.

  6. To switch the default image size to 128 x 128, set img_res (line 3) to 128 in the config files.
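Steps 4 to 6 all come down to changing a few keys in the YAML config before running a script. The sketch below does this without extra dependencies. It assumes the keys latent_code_index, latent_code_path, img_path, is_real_image, and img_res appear as unindented `key: value` lines. The line numbers above suggest this but do not guarantee it. `set_config_value`, `write_variant`, and `build_edit_command` are hypothetical helpers, and only the documented --opt, --attr, and --target_val flags are used.

```python
import re
from pathlib import Path


def set_config_value(yaml_text, key, value):
    """Return yaml_text with the first unindented `key: ...` line replaced.

    Assumes the key sits at the top level of the config. A trailing comment
    on that line is dropped. Raises KeyError if the key is not found.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    # A replacement function (instead of a string) keeps backslashes in
    # Windows paths from being treated as regex escapes.
    new_text, count = pattern.subn(lambda _match: f"{key}: {value}", yaml_text, count=1)
    if count == 0:
        raise KeyError(f"top-level key {key!r} not found")
    return new_text


def write_variant(base_config, out_path, overrides):
    """Write a copy of base_config to out_path with {key: value} overrides applied."""
    text = Path(base_config).read_text()
    for key, value in overrides.items():
        text = set_config_value(text, key, str(value))
    Path(out_path).write_text(text)
    return Path(out_path)


def build_edit_command(config_path, attr, target_val):
    """Argument list for editing_wo_dialog.py using the documented flags only."""
    if target_val not in range(6):
        raise ValueError("target_val must be one of 0, 1, 2, 3, 4, 5")
    return ["python", "editing_wo_dialog.py", "--opt", str(config_path),
            "--attr", attr, "--target_val", str(target_val)]
```

For example, `write_variant(Path("./configs/editing/editing_wo_dialog.yml"), Path("./my_edit.yml"), {"latent_code_index": "7"})` followed by `subprocess.run(build_edit_command("./my_edit.yml", "Bangs", 5), check=True)` would edit latent code 7 instead of the teaser image. Passing `"~"` for latent_code_path writes the YAML null value, which step 4 uses to request a randomly generated image.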

Train the Semantic Field

  1. To train the Semantic Field, we first prepare a set of sampled latent codes and then use the attribute predictor to predict the facial attributes of their corresponding images. The attribute predictor is trained with the fine-grained annotations of the CelebA-Dialog dataset. We provide the latent codes we used: download the training data from this link and put it under ./download/train_data as follows:

    ./download/train_data
    ├── 1024
    │   ├── Bangs
    │   ├── Eyeglasses
    │   ├── No_Beard
    │   ├── Smiling
    │   └── Young
    └── 128
        ├── Bangs
        ├── Eyeglasses
        ├── No_Beard
        ├── Smiling
        └── Young
    
  2. We also use some editing latent codes to monitor training. Download the editing latent codes from this link and put them under ./download/editing_data as follows:

    ./download/editing_data
    ├── 1024
    │   ├── Bangs.npz.npy
    │   ├── Eyeglasses.npz.npy
    │   ├── No_Beard.npz.npy
    │   ├── Smiling.npz.npy
    │   └── Young.npz.npy
    └── 128
        ├── Bangs.npz.npy
        ├── Eyeglasses.npz.npy
        ├── No_Beard.npz.npy
        ├── Smiling.npz.npy
        └── Young.npz.npy
    
  3. All logging files produced during training (e.g., log messages, checkpoints, and snapshots) are saved to the ./experiments and ./tb_logger directories.

  4. There are 10 configuration files under ./configs/train, named in the format field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>. Choose the configuration file for the attribute and resolution you want.

  5. For example, to train the semantic field that edits Bangs at 128x128 resolution, run:

    python train.py --opt ./configs/train/field_128_Bangs.yml
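Before you launch train.py, you can check that the downloaded data matches the two directory trees in steps 1 and 2, and list the available configs. The sketch below is an illustration only. `expected_paths`, `find_missing_training_data`, and `discover_configs` are hypothetical helpers. The resolutions, attribute folder names, and the .npz.npy suffix are copied from the trees above. Configs are discovered by globbing for field_*.yml instead of hard-coding names, so the capitalization of the attribute part does not matter.

```python
import re
from pathlib import Path

# Copied from the directory trees in steps 1 and 2.
RESOLUTIONS = ("128", "1024")
ATTRIBUTES = ("Bangs", "Eyeglasses", "No_Beard", "Smiling", "Young")
CONFIG_NAME = re.compile(r"^field_(\d+)_(\w+)\.yml$")


def expected_paths(download_root="./download"):
    """Yield every folder/file that steps 1 and 2 place under download_root."""
    root = Path(download_root)
    for res in RESOLUTIONS:
        for attr in ATTRIBUTES:
            yield root / "train_data" / res / attr                 # sampled latent codes
            yield root / "editing_data" / res / f"{attr}.npz.npy"  # monitoring latents


def find_missing_training_data(download_root="./download"):
    """Return the expected paths that do not exist yet."""
    return [str(path) for path in expected_paths(download_root) if not path.exists()]


def discover_configs(config_dir="./configs/train"):
    """Map (resolution, attribute) to each field_<RES>_<ATTR>.yml found on disk."""
    root = Path(config_dir)
    if not root.is_dir():
        return {}
    found = {}
    for path in sorted(root.glob("field_*.yml")):
        match = CONFIG_NAME.match(path.name)
        if match:
            found[(int(match.group(1)), match.group(2))] = path
    return found


if __name__ == "__main__":
    missing = find_missing_training_data()
    print(f"{len(missing)} expected training-data paths are missing")
    for path in missing:
        print("  missing:", path)
    for (res, attr), config in discover_configs().items():
        print(f"python train.py --opt {config}  # {attr} at {res}x{res}")
```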

Quantitative Results

We provide code for the quantitative results shown in Table 1. Here we use Bangs at 128x128 resolution as an example.

  1. Use the trained semantic field to edit images.

    python editing_quantitative.py \
    --opt ./configs/train/field_128_bangs.yml \
    --pretrained_path ./download/pretrained_models/128_field/Bangs.pth
  2. Evaluate the edited images using quantitative metrics. Set image_num according to the attribute: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.

    python quantitative_results.py \
    --attribute Bangs \
    --work_dir ./results/field_128_bangs \
    --image_dir ./results/field_128_bangs/visualization \
    --image_num 148
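You can keep the per-attribute image_num values from step 2 in one table, so the evaluation command is built the same way for every attribute. This is a sketch. `IMAGE_NUM` and `quantitative_command` are hypothetical names. It assumes --attribute accepts the attribute names exactly as listed in step 2, although the example above only shows Bangs. It derives image_dir as <work_dir>/visualization by analogy with the Bangs example.

```python
import shlex

# image_num for each attribute, as listed in step 2 above.
IMAGE_NUM = {"Bangs": 148, "Eyeglasses": 82, "Beard": 129, "Smiling": 140, "Young": 61}


def quantitative_command(attribute, work_dir):
    """Argument list for quantitative_results.py for one attribute.

    image_dir is derived as <work_dir>/visualization, mirroring the Bangs example.
    """
    if attribute not in IMAGE_NUM:
        raise KeyError(f"unknown attribute {attribute!r}; expected one of {sorted(IMAGE_NUM)}")
    work_dir = work_dir.rstrip("/")
    return ["python", "quantitative_results.py",
            "--attribute", attribute,
            "--work_dir", work_dir,
            "--image_dir", f"{work_dir}/visualization",
            "--image_num", str(IMAGE_NUM[attribute])]


if __name__ == "__main__":
    cmd = quantitative_command("Bangs", "./results/field_128_bangs")
    # shlex.join needs Python 3.8+, so quote each argument by hand for 3.7.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The work_dir argument should be whatever directory editing_quantitative.py wrote its results to for that attribute. The helper does not try to guess it.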

Qualitative Results

[Figure: qualitative results]

CelebA-Dialog Dataset

Our CelebA-Dialog dataset is available at this link.

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

  • Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
  • Each image is accompanied by captions describing its attributes and a sample user request.

The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.

Citation

If you find our repo useful for your research, please consider citing our paper:

@InProceedings{jiang2021talkedit,
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Contact

If you have any questions, please feel free to contact us via [email protected] or [email protected].

Acknowledgement

The codebase is maintained by Yuming Jiang and Ziqi Huang.

Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.
