Fine-tuning StyleGAN2 for Cartoon Face Generation

Overview

Cartoon-StyleGAN 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Abstract

Recent studies have shown remarkable success in the unsupervised image to image (I2I) translation. However, due to the imbalance in the data, learning joint distribution for various domains is still very challenging. Although existing models can generate realistic target images, it’s difficult to maintain the structure of the source image. In addition, training a generative model on large data in multiple domains requires a lot of time and computer resources. To address these limitations, I propose a novel image-to-image translation method that generates images of the target domain by finetuning a stylegan2 pretrained model. The stylegan2 model is suitable for unsupervised I2I translation on unbalanced datasets; it is highly stable, produces realistic images, and even learns properly from limited data when applied with simple fine-tuning techniques. Thus, in this project, I propose new methods to preserve the structure of the source images and generate realistic images in the target domain.

Inference Notebook

🎉 You can do this task in colab ! : Open In Colab

Arxiv arXiv

[NEW!] 2021.08.30 Streamlit Ver


1. Method

Baseline : StyleGAN2-ADA + FreezeD

It generates realistic images, but does not maintain the structure of the source domain.

Ours : FreezeSG (Freeze Style vector and Generator)

FreezeG is effective in maintaining the structure of the source image. As a result of various experiments, I found that not only the initial layer of the generator but also the initial layer of the style vector are important for maintaining the structure. Thus, I froze the low-resolution layer of both the generator and the style vector.

Freeze Style vector and Generator

Results

With Layer Swapping

When LS is applied, the generated images by FreezeSG have a higher similarity to the source image than when FreezeG or the baseline (FreezeD + ADA) were used. However, since this fixes the weights of the low-resolution layer of the generator, it is difficult to obtain meaningful results when layer swapping on the low-resolution layer.

Ours : Structure Loss

Based on the fact that the structure of the image is determined at low resolution, I apply structure loss to the values of the low-resolution layer so that the generated image is similar to the image in the source domain. The structure loss makes the RGB output of the source generator to be fine-tuned to have a similar value with the RGB output of the target generator during training.

Results

Compare


2. Application : Change Facial Expression / Pose

I applied various models(ex. Indomain-GAN, SeFa, StyleCLIP…) to change facial expression, posture, style, etc.

(1) Closed Form Factorization(SeFa)

Pose

Slim Face

(2) StyleCLIP – Latent Optimization

Inspired by StyleCLIP that manipulates generated images with text, I change the faces of generated cartoon characters by text. I used the latent optimization method among the three methods of StyleCLIP and additionally introduced styleclip strength. It allows the latent vector to linearly move in the direction of the optimized latent vector, making the image change better with text.

with baseline model(FreezeD)

with our model(structureLoss)

(3) Style Mixing

Style-Mixing

When mixing layers, I found specifics layers that make a face. While the overall structure (hair style, facial shape, etc.) and texture (skin color and texture) were maintained, only the face(eyes, nose and mouth) was changed.

Results


3. Requirements

I have tested on:

Installation

Clone this repo :

git clone https://github.com/happy-jihye/Cartoon-StyleGan2
cd Cartoon-StyleGan2

Pretrained Models

Please download the pre-trained models from the following links.

Path Description
StyleGAN2-FFHQ256 StyleGAN2 pretrained model(256px) with FFHQ dataset from Rosinality
StyleGAN2-Encoder In-Domain GAN Inversion model with FFHQ dataset from Bryandlee
NaverWebtoon FreezeD + ADA with NaverWebtoon Dataset
NaverWebtoon_FreezeSG FreezeSG with NaverWebtoon Dataset
NaverWebtoon_StructureLoss StructureLoss with NaverWebtoon Dataset
Romance101 FreezeD + ADA with Romance101 Dataset
TrueBeauty FreezeD + ADA with TrueBeauty Dataset
Disney FreezeD + ADA with Disney Dataset
Disney_FreezeSG FreezeSG with Disney Dataset
Disney_StructureLoss StructureLoss with Disney Dataset
Metface_FreezeSG FreezeSG with Metface Dataset
Metface_StructureLoss StructureLoss with Metface Dataset

If you want to download all of the pretrained model, you can use download_pretrained_model() function in utils.py.

Dataset

I experimented with a variety of datasets, including Naver Webtoon, Metfaces, and Disney.

NaverWebtoon Dataset contains facial images of webtoon characters serialized on Naver. I made this dataset by crawling webtoons from Naver’s webtoons site and cropping the faces to 256 x 256 sizes. There are about 15 kinds of webtoons and 8,000 images(not aligned). I trained the entire Naver Webtoon dataset, and I also trained each webtoon in this experiment

I was also allowed to share a pretrained model with writers permission to use datasets. Thank you for the writers (Yaongyi, Namsoo, justinpinkney) who gave us permission.

Getting Started !

1. Prepare LMDB Dataset

First create lmdb datasets:

python prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH

# if you have zip file, change it to lmdb datasets by this commend
python run.py --prepare_data=DATASET_PATH --zip=ZIP_NAME --size SIZE

2. Train

# StyleGAN2
python train.py --batch BATCH_SIZE LMDB_PATH
# ex) python train.py --batch=8 --ckpt=ffhq256.pt --freezeG=4 --freezeD=3 --augment --path=LMDB_PATH

# StructureLoss
# ex) python train.py --batch=8 --ckpt=ffhq256.pt --structure_loss=2 --freezeD=3 --augment --path=LMDB_PATH

# FreezeSG
# ex) python train.py --batch=8 --ckpt=ffhq256.pt --freezeStyle=2 --freezeG=4 --freezeD=3 --augment --path=LMDB_PATH


# Distributed Settings
python train.py --batch BATCH_SIZE --path LMDB_PATH \
    -m torch.distributed.launch --nproc_per_node=N_GPU --main_port=PORT

Options

  1. Project images to latent spaces

    python projector.py --ckpt [CHECKPOINT] --size [GENERATOR_OUTPUT_SIZE] FILE1 FILE2 ...
    
  2. Closed-Form Factorization

    You can use closed_form_factorization.py and apply_factor.py to discover meaningful latent semantic factor or directions in unsupervised manner.

    First, you need to extract eigenvectors of weight matrices using closed_form_factorization.py

    python closed_form_factorization.py [CHECKPOINT]
    

    This will create factor file that contains eigenvectors. (Default: factor.pt) And you can use apply_factor.py to test the meaning of extracted directions

    python apply_factor.py -i [INDEX_OF_EIGENVECTOR] -d [DEGREE_OF_MOVE] -n [NUMBER_OF_SAMPLES] --ckpt [CHECKPOINT] [FACTOR_FILE]
    # ex) python apply_factor.py -i 19 -d 5 -n 10 --ckpt [CHECKPOINT] factor.pt
    

StyleGAN2-ada + FreezeD

During the experiment, I also carried out a task to generate a cartoon image based on Nvidia Team's StyleGAN2-ada code. When training these models, I didn't control the dataset resolution(256px) 😂 . So the quality of the generated image can be broken.

You can practice based on this code at Colab : Open In Colab

Generated-Image Interpolation

Reference

Owner
Jihye Back
Jihye Back
Applying PVT to Semantic Segmentation

Applying PVT to Semantic Segmentation Here, we take MMSegmentation v0.13.0 as an example, applying PVTv2 to SemanticFPN. For details see Pyramid Visio

35 Nov 30, 2022
PyTorch Implementation of the SuRP algorithm by the authors of the AISTATS 2022 paper "An Information-Theoretic Justification for Model Pruning"

PyTorch Implementation of the SuRP algorithm by the authors of the AISTATS 2022 paper "An Information-Theoretic Justification for Model Pruning".

Berivan Isik 8 Dec 08, 2022
Hi Guys, here I am providing examples, which will help you in Lerarning Python

LearningPython Hi guys, here I am trying to include as many practice examples of Python Language, as i Myself learn, and hope these will help you in t

4 Feb 03, 2022
classification task on dataset-CIFAR10,by using Tensorflow/keras

CIFAR10-Tensorflow classification task on dataset-CIFAR10,by using Tensorflow/keras 在这一个库中,我使用Tensorflow与keras框架搭建了几个卷积神经网络模型,针对CIFAR10数据集进行了训练与测试。分别使

3 Oct 17, 2021
An Image compression simulator that uses Source Extractor and Monte Carlo methods to examine the post compressive effects different compression algorithms have.

ImageCompressionSimulation An Image compression simulator that uses Source Extractor and Monte Carlo methods to examine the post compressive effects o

James Park 1 Dec 11, 2021
CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

CLIP-Indonesian CLIP (Radford et al., 2021) is a multimodal model that can connect images and text by training a vision encoder and a text encoder joi

Galuh 17 Mar 10, 2022
codes for Self-paced Deep Regression Forests with Consideration on Ranking Fairness

Self-paced Deep Regression Forests with Consideration on Ranking Fairness This is official codes for paper Self-paced Deep Regression Forests with Con

Learning in Vision 4 Sep 11, 2022
Analysis of rationale selection in neural rationale models

Neural Rationale Interpretability Analysis We analyze the neural rationale models proposed by Lei et al. (2016) and Bastings et al. (2019), as impleme

Yiming Zheng 3 Aug 31, 2022
A hybrid SOTA solution of LiDAR panoptic segmentation with C++ implementations of point cloud clustering algorithms. ICCV21, Workshop on Traditional Computer Vision in the Age of Deep Learning

ICCVW21-TradiCV-Survey-of-LiDAR-Cluster Motivation In contrast to popular end-to-end deep learning LiDAR panoptic segmentation solutions, we propose a

YimingZhao 103 Nov 22, 2022
Implementation of Monocular Direct Sparse Localization in a Prior 3D Surfel Map (DSL)

DSL Project page: https://sites.google.com/view/dsl-ram-lab/ Monocular Direct Sparse Localization in a Prior 3D Surfel Map Authors: Haoyang Ye, Huaiya

Haoyang Ye 93 Nov 30, 2022
A Haskell kernel for IPython.

IHaskell You can now try IHaskell directly in your browser at CoCalc or mybinder.org. Alternatively, watch a talk and demo showing off IHaskell featur

Andrew Gibiansky 2.4k Dec 29, 2022
This is a project based on retinaface face detection, including ghostnet and mobilenetv3

English | 简体中文 RetinaFace in PyTorch Chinese detailed blog:https://zhuanlan.zhihu.com/p/379730820 Face recognition with masks is still robust---------

pogg 59 Dec 21, 2022
Hand-distance-measurement-game - Hand Distance Measurement Game

Hand Distance Measurement Game This is program is made to calculate the distance

Priyansh 2 Jan 12, 2022
PIXIE: Collaborative Regression of Expressive Bodies

PIXIE: Collaborative Regression of Expressive Bodies [Project Page] This is the official Pytorch implementation of PIXIE. PIXIE reconstructs an expres

Yao Feng 331 Jan 04, 2023
Mixed Transformer UNet for Medical Image Segmentation

MT-UNet Update 2021/11/19 Thank you for your interest in our work. We have uploaded the code of our MTUNet to help peers conduct further research on i

dotman 92 Dec 25, 2022
Embeds a story into a music playlist by sorting the playlist so that the order of the music follows a narrative arc.

playlist-story-builder This project attempts to embed a story into a music playlist by sorting the playlist so that the order of the music follows a n

Dylan R. Ashley 0 Oct 28, 2021
Codes accompanying the paper "Learning Nearly Decomposable Value Functions with Communication Minimization" (ICLR 2020)

NDQ: Learning Nearly Decomposable Value Functions with Communication Minimization Note This codebase accompanies paper Learning Nearly Decomposable Va

Tonghan Wang 69 Nov 26, 2022
4th place solution for the SIGIR 2021 challenge.

SIGIR-2021 (Tinkoff.AI) How to start Download train and test data: https://sigir-ecom.github.io/data-task.html Place it under sigir-2021/data/. Run py

Tinkoff.AI 4 Jul 01, 2022
Vrcwatch - Supply the local time to VRChat as Avatar Parameters through OSC

English: README-EN.md VRCWatch VRCWatch は、VRChat 内のアバター向けに現在時刻を送信するためのプログラムです。 使

Kosaki Mezumona 17 Nov 30, 2022
PyTorch ,ONNX and TensorRT implementation of YOLOv4

PyTorch ,ONNX and TensorRT implementation of YOLOv4

4.2k Jan 01, 2023