AudioDVP: Photorealistic Audio-driven Video Portraits

AudioDVP

This is the official implementation of Photorealistic Audio-driven Video Portraits.

Major Requirements

  • Ubuntu >= 18.04
  • PyTorch >= 1.2
  • GCC >= 7.5
  • NVCC >= 10.1
  • FFmpeg (with H.264 support)
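Before going further, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it only checks that `gcc`, `nvcc`, and `ffmpeg` are on `PATH` and that `ffmpeg -encoders` lists some H.264 encoder. It does not verify version numbers.

```python
import shutil
import subprocess

# Tool names are assumptions based on the requirements list above.
REQUIRED_TOOLS = ("gcc", "nvcc", "ffmpeg")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def lists_h264_encoder(encoders_output):
    """Return True if the text of `ffmpeg -encoders` names an H.264 encoder."""
    for line in encoders_output.splitlines():
        fields = line.split()
        # Encoder rows look like "<flags> <name> <description...>".
        if len(fields) >= 2 and "264" in fields[1]:
            return True
    return False


def ffmpeg_has_h264():
    """Run ffmpeg (if present) and inspect its encoder list."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return lists_h264_encoder(result.stdout)


if __name__ == "__main__":
    missing = find_missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("FFmpeg H.264 encoder available:", ffmpeg_has_h264())
```

If the script reports a missing tool or no H.264 encoder, fix that first; the later build and demo steps depend on these tools.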

The full environment is listed in enviroment.yml. You don't need to install everything in it; install packages as you hit import errors.

Major implementation differences against original paper

  • The geometry and texture parameters of the 3DMM are now initialized to zero and shared among all samples during fitting, since this is more reasonable.

  • OpenCV, rather than PIL, is used for image editing operations.
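To make the first difference concrete, here is a sketch in common 3DMM notation. The symbols are illustrative assumptions, not names from the code: $\alpha$ and $\beta$ are the geometry and texture coefficients, $\delta_i$ is the expression coefficient of frame $i$, and $\bar{S}$, $\bar{T}$, $B_{\mathrm{geo}}$, $B_{\mathrm{exp}}$, $B_{\mathrm{tex}}$ are the mean shape, mean texture, and the corresponding bases. We assume here that expression stays per-frame, as is usual in this kind of fitting.

```latex
\begin{aligned}
S_i &= \bar{S} + B_{\mathrm{geo}}\,\alpha + B_{\mathrm{exp}}\,\delta_i
      && \text{(shape of frame } i\text{)} \\
T   &= \bar{T} + B_{\mathrm{tex}}\,\beta
      && \text{(texture, identical for every frame)} \\
\alpha^{(0)} &= \mathbf{0}, \qquad \beta^{(0)} = \mathbf{0}
      && \text{(initial values before fitting)}
\end{aligned}
```

Only one $\alpha$ and one $\beta$ exist for the whole video, so every frame contributes to the same identity estimate.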

Usage

1. Download face model data

  • Download Basel Face Model 2009. (Register and get 01_MorphableModel.mat.)

  • Download expression basis from 3DFace. (There is an Exp_Pca.bin in CoarseData.)

  • Download auxiliary files from Deep3DFaceReconstruction.

  • Place the files in renderer/data, following the structure below.

    renderer/data
    ├── 01_MorphableModel.mat
    ├── Exp_Pca.bin
    ├── BFM_front_idx.mat
    ├── BFM_exp_idx.mat
    ├── facemodel_info.mat
    ├── select_vertex_id.mat
    ├── std_exp.txt
    └── data.mat (generated by step 2 below)
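A quick way to confirm that the downloads landed in the right place is to check for each file by name. This is a minimal sketch, not part of the repository: the helper `missing_files` is hypothetical, and the file names are copied from the listing above. `data.mat` is left out because it is only created in step 2.

```python
from pathlib import Path

# File names copied from the directory listing; data.mat is built later.
DOWNLOADED_FILES = (
    "01_MorphableModel.mat",
    "Exp_Pca.bin",
    "BFM_front_idx.mat",
    "BFM_exp_idx.mat",
    "facemodel_info.mat",
    "select_vertex_id.mat",
    "std_exp.txt",
)


def missing_files(data_dir, names=DOWNLOADED_FILES):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("renderer/data")
    if missing:
        print("Missing from renderer/data:", ", ".join(missing))
    else:
        print("All downloaded model files are in place.")
```

Run it from the repository root; an empty result means step 2 has the inputs it needs.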
    

2. Build data

cd renderer/
python build_data.py

3. Download pretrained model of ATnet

  • The link is here.
  • Put atnet_lstm_18.pth in vendor/ATVGnet/model.

4. Download pretrained ResNet on VGGFace2

  • The link is here.
  • Put resnet50_ft_weight.pkl in weights.

5. Download Trump speech video

  • The link is here. (Video courtesy of The White House.)
  • Put it in data/video.

6. Compile CUDA rasterizer kernel

cd renderer/kernels
python setup.py build_ext --inplace

7. Run demo script

# Explanation of every step is provided.
./scripts/demo.sh

Since we provide both training and inference code, we do not upload a pretrained model at present. The expected result, generated from the synthesized audio in data/test_audio, is provided in data/sample_result.mp4.

Acknowledgment

This work is built upon a great deal of open-source code and data.

Notification

  • Our method is built upon Deep Video Portraits.
  • Our method adopts a person-specific Audio2Expression module, which is less robust than a universal one trained on a large dataset such as Lip Reading Sentences in the Wild. A universal one is encouraged! Fortunately, our method works quite well on WaveNet-synthesized audio like that provided in data/test_audio.
  • The code has NOT been fully tested on a clean machine.
  • There is a known bug in the rasterizer: in some corner cases, several pixels of the rendered face are black (not assigned any color) due to floating-point error, which I have not been able to fix.

Disclaimer

We made this code publicly available to benefit the graphics and vision community. Please DO NOT abuse the code for malicious purposes.

Citation

@article{wen2020audiodvp,
    author={Xin Wen and Miao Wang and Christian Richardt and Ze-Yin Chen and Shi-Min Hu},
    journal={IEEE Transactions on Visualization and Computer Graphics}, 
    title={Photorealistic Audio-driven Video Portraits}, 
    year={2020},
    volume={26},
    number={12},
    pages={3457-3466},
    doi={10.1109/TVCG.2020.3023573}
}

License

BSD
