TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

Last update: Dec 29, 2022

TalkingHead-1KH Dataset

TalkingHead-1KH is a talking-head dataset consisting of YouTube videos, originally created as a benchmark for face-vid2vid:

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
Ting-Chun Wang (NVIDIA), Arun Mallya (NVIDIA), Ming-Yu Liu (NVIDIA)
https://nvlabs.github.io/face-vid2vid/
https://arxiv.org/abs/2011.15126.pdf

The dataset consists of 500k video clips, of which about 80k have a resolution greater than 512x512. Only videos under permissive licenses are included. Note that the number of videos differs from that in the original paper because a more robust preprocessing script was used to split the videos. For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Download

Unzip the video metadata

First, unzip the metadata into the repository's root directory:

unzip data_list.zip

Unit test

This step downloads a small subset of the dataset to verify that the scripts work on your computer. You can skip this step if you want to download the entire dataset directly.

bash videos_download_and_crop.sh small

The processed clips should appear in small/cropped_clips.
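A quick way to confirm that the unit test produced output is to count the files in small/cropped_clips. The sketch below is not part of the official scripts. It assumes the processed clips are written as .mp4 files directly inside that directory, and the helper names count_clips and check_clips are hypothetical. Adjust the file pattern if your copy of the script names its outputs differently.

```shell
#!/usr/bin/env bash
# Sanity check for the unit-test download (a sketch, not an official script).
# Assumption: processed clips are .mp4 files directly under the given directory.
set -euo pipefail

# count_clips DIR -> prints the number of .mp4 files at the top level of DIR
count_clips() {
  local dir="$1"
  # -maxdepth 1: do not descend into subdirectories; -type f: regular files only
  find "$dir" -maxdepth 1 -type f -name '*.mp4' | wc -l | tr -d ' '
}

# check_clips [DIR] -> returns non-zero if DIR is missing or holds no clips
check_clips() {
  local dir="${1:-small/cropped_clips}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  local n
  n="$(count_clips "$dir")"
  if [ "$n" -eq 0 ]; then
    echo "no clips found in $dir" >&2
    return 1
  fi
  echo "found $n clips in $dir"
}
```

After sourcing the functions, run `check_clips small/cropped_clips`. A non-zero exit status means the unit test did not produce any clips.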

Download the entire dataset

Please run

bash videos_download_and_crop.sh train

The script will automatically download the YouTube videos, split them into short clips, and then crop and trim them to include only the face regions. The final processed clips should appear in train/cropped_clips.
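The trim-and-crop stage can be pictured as one ffmpeg call per clip. The sketch below is not the repository's processing script. The function name trim_and_crop, its argument order, and any timestamps or crop box you pass to it are hypothetical placeholders. In the real pipeline these values come from the metadata in data_list.zip. Setting DRY_RUN=1 prints the command instead of running it, so you can inspect it without ffmpeg installed.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the "split, trim and crop" step for ONE clip.
# Requires ffmpeg unless DRY_RUN=1. Not the official processing script.
set -euo pipefail

# run CMD... -> print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# trim_and_crop IN OUT START END W H X Y
#   START, END: clip boundaries in the source video (seconds or HH:MM:SS)
#   W, H, X, Y: face crop box in pixels (width, height, top-left x, top-left y)
trim_and_crop() {
  local in="$1" out="$2" start="$3" end="$4"
  local w="$5" h="$6" x="$7" y="$8"
  # -ss/-to placed after -i select the segment by decoding (frame-accurate).
  # The crop filter changes the frame size, so the video is re-encoded with
  # ffmpeg's default encoder for the output container.
  run ffmpeg -y -i "$in" -ss "$start" -to "$end" \
    -vf "crop=${w}:${h}:${x}:${y}" "$out"
}
```

For example, `DRY_RUN=1 trim_and_crop source.mp4 clip.mp4 12.0 17.5 256 256 100 40` prints the ffmpeg command for a 256x256 crop of the 12.0 s to 17.5 s segment. Both the file names and the numbers are made up for illustration.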

Evaluation

To download the evaluation set, which consists only of 1080p videos, run

bash videos_download_and_crop.sh val

The processed clips should appear in val/cropped_clips.
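Only about 80k of the clips exceed 512x512, so you may want to select that higher-resolution subset from train/cropped_clips or val/cropped_clips after processing. The following is a minimal sketch. It assumes ffprobe (from FFmpeg) is on PATH and that the clips are .mp4 files. It reads "greater than 512x512" as both width and height being at least 512 pixels, which is an interpretation and not a criterion stated by the authors. The threshold variable MIN_SIDE and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# List clips whose width and height are both >= MIN_SIDE pixels (default 512).
# Sketch only; assumes ffprobe is on PATH and clips are .mp4 files.
set -euo pipefail

MIN_SIDE="${MIN_SIDE:-512}"

# is_large W H -> exit status 0 if both sides meet the threshold
is_large() {
  [ "$1" -ge "$MIN_SIDE" ] && [ "$2" -ge "$MIN_SIDE" ]
}

# clip_size FILE -> prints "WIDTHxHEIGHT" for the first video stream
clip_size() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=s=x:p=0 "$1"
}

# list_large_clips DIR -> prints the path of every clip that passes is_large
list_large_clips() {
  local dir="$1" f w h
  for f in "$dir"/*.mp4; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
    IFS=x read -r w h < <(clip_size "$f") || continue
    h="${h%%x*}"              # defensive: keep only the leading number
    if is_large "$w" "$h"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `list_large_clips train/cropped_clips > large_clips.txt` writes the selected paths to a file. Set MIN_SIDE before running to use a different threshold.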

We also provide the reconstruction results synthesized by our model here. For each video, we use only the first frame to reconstruct all the following frames.

Furthermore, to allow comparison with models trained on the VoxCeleb2 dataset, we also provide results from another model trained on VoxCeleb2. Please find the reconstruction results here.
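To evaluate a model under the same one-shot protocol described above, each clip can be split into a single source frame (the first frame) and the sequence of frames to reconstruct. The sketch below only prepares these inputs with ffmpeg and does not run face-vid2vid. The output layout (source.png plus frames/00001.png, frames/00002.png, ...) and the function name prepare_one_shot are hypothetical choices. DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Prepare one-shot reconstruction inputs for a clip (hypothetical layout).
# source.png = first frame; frames/ = every frame of the clip, where
# frames/00001.png is the source itself and the rest are the targets.
# Requires ffmpeg unless DRY_RUN=1.
set -euo pipefail

# run CMD... -> print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# prepare_one_shot CLIP OUTDIR
prepare_one_shot() {
  local clip="$1" outdir="$2"
  run mkdir -p "$outdir/frames"
  # -frames:v 1 stops after writing the first video frame.
  run ffmpeg -y -i "$clip" -frames:v 1 "$outdir/source.png"
  # The image2 muxer numbers output files from 1 by default.
  run ffmpeg -y -i "$clip" "$outdir/frames/%05d.png"
}
```

Keeping the source frame as a separate file makes it explicit that the model sees only frame 1 as input, while frames 2 onward serve as ground truth for the reconstruction.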

Licenses

The individual videos were published on YouTube by their respective authors under the Creative Commons BY 3.0 license. The metadata file, the download script file, the processing script file, and the documentation file are made available under the MIT license. You can use, redistribute, and adapt them, as long as you (a) give appropriate credit by citing our paper, (b) indicate any changes that you've made, and (c) distribute any derivative works under the same license.

Privacy

When collecting the data, we were careful to include only videos that, to the best of our knowledge, were intended for free use and redistribution by their respective authors. That said, we are committed to protecting the privacy of individuals who do not wish their videos to be included.

If you would like to remove your video from the dataset, you can either:

Go to YouTube and change the license of your video, or remove your video entirely.
Contact [email protected]. Please include your YouTube video link in the email.

Acknowledgements

This webpage borrows heavily from the FFHQ-dataset page.

Citation

If you use this dataset in your work, please cite:

@inproceedings{wang2021facevid2vid,
  title={One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing},
  author={Ting-Chun Wang and Arun Mallya and Ming-Yu Liu},
  booktitle={CVPR},
  year={2021}
}
