Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Overview

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive"

This repository contains demo programs for the Talking Head Anime from a Single Image 2: More Expressive project. Similar to the previous version, it has two programs:

  • The manual_poser lets you manipulate the facial expression and the head rotation of an anime character, given in a single image, through a graphical user interface. The poser is available in two forms: a standard GUI application, and a Jupyter notebook.
  • The ifacialmocap_puppeteer lets you transfer your facial motion, captured by a commercial iOS application called iFacialMocap, to an image of an anime character.

Try the Manual Poser on Google Colab

If you do not have the required hardware (discussed below) or do not want to download the code and set up an environment to run it, click this link to try running the manual poser on Google Colab.

Hardware Requirements

Both programs require a recent and powerful Nvidia GPU to run. I could personally ran them at good speed with the Nvidia Titan RTX. However, I think recent high-end gaming GPUs such as the RTX 2080, the RTX 3080, or better would do just as well.

The ifacialmocap_puppeteer requires an iOS device that is capable of computing blend shape parameters from a video feed. This means that the device must be able to run iOS 11.0 or higher and must have a TrueDepth front-facing camera. (See this page for more info.) In other words, if you have the iPhone X or something better, you should be all set. Personally, I have used an iPhone 12 mini.

Software Requirements

Both programs were written in Python 3. To run the GUIs, the following software packages are required:

  • Python >= 3.8
  • PyTorch >= 1.7.1 with CUDA support
  • SciPY >= 1.6.0
  • wxPython >= 4.1.1
  • Matplotlib >= 3.3.4

In particular, I created the environment to run the programs with Anaconda, using the following commands:

> conda create -n talking-head-anime-2-demo python=3.8
> conda activate talking-head-anime-2-demo
> conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
> conda install scipy
> pip install wxPython
> conda install matplotlib

To run the Jupyter notebook version of the manual_poser, you also need:

  • Jupyter Notebook >= 6.2.0
  • IPyWidgets >= 7.6.3

This means that, in addition to the commands above, you also need to run:

> conda install -c conda-forge notebook
> conda install -c conda-forge ipywidgets
> jupyter nbextension enable --py widgetsnbextension

Lastly, the ifacialmocap_puppeteer requires iFacialMocap, which is available in the App Store for 980 yen. You also need to install the paired desktop application on your PC or Mac. (Linux users, I'm sorry!) Your iOS and your computer must also use the same network. (For example, you may connect them to the same wireless router.)

Automatic Environment Construction with Anaconda

You can also use Anaconda to download and install all Python packages in one command. Open your shell, change the directory to where you clone the repository, and run:

conda env create -f environment.yml

This will create an environment called talking-head-anime-2-demo containing all the required Python packages.

Download the Model

Before running the programs, you need to download the model files from this Dropbox link and unzip it to the data folder of the repository's directory. In the end, the data folder should look like:

+ data
  + illust
    - waifu_00.png
    - waifu_01.png
    - waifu_02.png
    - waifu_03.png
    - waifu_04.png
    - waifu_05.png
    - waifu_06.png
    - waifu_06_buggy.png
  - combiner.pt
  - eyebrow_decomposer.pt
  - eyebrow_morphing_combiner.pt
  - face_morpher.pt
  - two_algo_face_rotator.pt

The model files are distributed with the Creative Commons Attribution 4.0 International License, which means that you can use them for commercial purposes. However, if you distribute them, you must, among other things, say that I am the creator.

Running the manual_poser Desktop Application

Open a shell. Change your working directory to the repository's root directory. Then, run:

> python tha2/app/manual_poser.py

Note that before running the command above, you might have to activate the Python environment that contains the required packages. If you created an environment using Anaconda as was discussed above, you need to run

> conda activate talking-head-anime-2-demo

if you have not already activated the environment.

Running the manual_poser Jupyter Notebook

Open a shell. Activate the environment. Change your working directory to the repository's root directory. Then, run:

> jupyter notebook

A browser window should open. In it, open tha2.ipynb. Once you have done so, you should see that it only has one cell. Run it. Then, scroll down to the end of the document, and you'll see the GUI there.

Running the ifacialmocap_puppeteer

First, run iFacialMocap on your iOS device. It should show you the device's IP address. Jot it down. Keep the app open.

IP address in iFacialMocap screen

Then, run the companion desktop application.

iFaciaMocap desktop application

Click "Open Advanced Setting >>". The application should expand.

Click the 'Open Advanced Setting >>' button.

Click the button that says "Maya" on the right side.

Click the 'Maya' button.

Then, click "Blender."

Select 'Blender' mode in the desktop application

Next, replace the IP address on the left side with your iOS device's IP address.

Replace IP address with device's IP address.

Click "Connect to Blender."

Click 'Connect to Blender.'

Open a shell. Activate the environment. Change your working directory to the repository's root directory. Then, run:

> python tha2/app/ifacialmocap_puppeteer.py

If the programs are connected properly, you should see that the many progress bars at the bottom of the ifacialmocap_puppeteer window should move when you move your face in front of the iOS device's front-facing camera.

You should see the progress bars moving.

If all is well, load an character image, and it should follow your facial movement.

Constraints on Input Images

In order for the model to work well, the input image must obey the following constraints:

  • It must be of size 256 x 256.
  • It must be of PNG format.
  • It must have an alpha channel.
  • It must contain only one humanoid anime character.
  • The character must be looking straight ahead.
  • The head of the character should be roughly contained in the middle 128 x 128 box.
  • All pixels that do not belong to the character (i.e., background pixels) should have RGBA = (0,0,0,0).

Image specification

FAQ: I prepared an image just like you said, why is my output so ugly?!?

This is most likely because your image does not obey the "background RGBA = (0,0,0,0)" constraint. In other words, your background pixels are (RRR,GGG,BBB,0) for some RRR, GGG, BBB > 0 rather than (0,0,0,0). This happens when you use Photoshop because it does not clear the RGB channels of transparent pixels.

Let's see an example. When I tried to use the manual_poser with data/illust/waifu_06_buggy.png. Here's what I got.

A failure case

When you look at the image, there seems to be nothing wrong with it.

waifu_06_buggy.png

However, if you inspect it with GIMP, you will see that the RGB channels have what backgrounds, which means that those pixels have non-zero RGB values.

In the buggy image, background pixels have colors in the RGB channels.

What you want, instead, is something like the non-buggy version: data/illust/waifu_06.png, which looks exactly the same as the buggy one to the naked eyes.

waifu_06.png

However, in GIMP, all channels have black backgrounds.

In the good image, background pixels do not have colors in any channels.

Because of this, the output was clean.

A success case

A way to make sure that your image works well with the model is to prepare it with GIMP. When exporting your image to the PNG format, make sure to uncheck "Save color values from transparent pixels" before you hit "Export."

Make sure to uncheck 'Save color values from transparent pixels' before exporting!

Disclaimer

While the author is an employee of Google Japan, this software is not Google's product and is not supported by Google.

The copyright of this software belongs to me as I have requested it using the IARC process. However, Google might claim the rights to the intellectual property of this invention.

The code is released under the MIT license. The model is released under the Creative Commons Attribution 4.0 International License.

Owner
Pramook Khungurn
A software developer from Thailand, interested in computer graphics, machine learning, and algorithms.
Pramook Khungurn
结巴中文分词

jieba “结巴”中文分词:做最好的 Python 中文分词组件 "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation

Sun Junyi 29.8k Jan 02, 2023
PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

760 Jan 03, 2023
Crowd sourced training data for Rasa NLU models

NLU Training Data Crowd-sourced training data for the development and testing of Rasa NLU models. If you're interested in grabbing some data feel free

Rasa 169 Dec 26, 2022
OCR을 이용하여 인원수를 인식 후 줌을 Kill 해줍니다

How To Use killtheZoom-2.0 Windows 0. https://joyhong.tistory.com/79 이 글을 보면서 tesseract를 C:\Program Files\Tesseract-OCR 경로로 설치해주세요(한국어 언어 추가 필요) 상단의 초

김정인 9 Sep 13, 2021
MMDA - multimodal document analysis

MMDA - multimodal document analysis

AI2 75 Jan 04, 2023
customer care chatbot made with Rasa Open Source.

Customer Care Bot Customer care bot for ecomm company which can solve faq and chitchat with users, can contact directly to team. 🛠 Features Basic E-c

Dishant Gandhi 23 Oct 27, 2022
AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

Microsoft 37 Nov 29, 2022
AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

NeuML 819 Dec 30, 2022
Search for documents in a domain through Google. The objective is to extract metadata

MetaFinder - Metadata search through Google _____ __ ___________ .__ .___ / \

Josué Encinar 85 Dec 16, 2022
Use the state-of-the-art m2m100 to translate large data on CPU/GPU/TPU. Super Easy!

Easy-Translate is a script for translating large text files in your machine using the M2M100 models from Facebook/Meta AI. We also privide a script fo

Iker García-Ferrero 41 Dec 15, 2022
Write Alphabet, Words and Sentences with your eyes.

The-Next-Gen-AI-Eye-Writer The Eye tracking Technique has become one of the most popular techniques within the human and computer interaction era, thi

Rohan Kasabe 2 Apr 05, 2022
This is the Alpha of Nutte language, she is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda

nutte-language This is the Alpha of Nutte language, it is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda My language was

catdochrome 2 Dec 18, 2021
Checking spelling of form elements

Checking spelling of form elements. You can check the source files of external workflows/reports and configuration files

СКБ Контур (команда 1с) 15 Sep 12, 2022
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

BROS (BERT Relying On Spatiality) is a pre-trained language model focusing on text and layout for better key information extraction from documents. Given the OCR results of the document image, which

Clova AI Research 94 Dec 30, 2022
BiNE: Bipartite Network Embedding

BiNE: Bipartite Network Embedding This repository contains the demo code of the paper: BiNE: Bipartite Network Embedding. Ming Gao, Leihui Chen, Xiang

leihuichen 214 Nov 24, 2022
A benchmark for evaluation and comparison of various NLP tasks in Persian language.

Persian NLP Benchmark The repository aims to track existing natural language processing models and evaluate their performance on well-known datasets.

Mofid AI 68 Dec 19, 2022
Top2Vec is an algorithm for topic modeling and semantic search.

Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.

Dimo Angelov 2.4k Jan 06, 2023
A website which allows you to play with the GPT-2 transformer

transformers A website which allows you to play with the GPT-2 model Built with ❤️ by raphtlw Table of contents Model Setup About Contributors Model T

raphtlw 2 Jan 27, 2022
Official implementation of Meta-StyleSpeech and StyleSpeech

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dongchan Min, Dong Bok Lee, Eunho Yang, and Sung Ju Hwang This is an official code

min95 169 Jan 05, 2023
Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU

GPU Docker NLP Application Deployment Deploying a Text Summarization NLP use case on Docker Container Utilizing Nvidia GPU, to setup the enviroment on

Ritesh Yadav 9 Oct 14, 2022