Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Last update: Dec 29, 2022

Related tags

Computer Vision Bailando

Overview

Bailando

Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

[Paper] | [Project Page] | [Video Demo]

✨ Do not hesitate to give a star! ✨

Driving 3D characters to dance following a piece of music is highly challenging due to the spatial constraints applied to poses by choreography norms. In addition, the generated dance sequence also needs to maintain temporal coherency with different music genres. To tackle these challenges, we propose a novel music-to-dance framework, Bailando, with two powerful components: 1) a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequence to a quantized codebook, 2) an actor-critic Generative Pre-trained Transformer (GPT) that composes these units to a fluent dance coherent to the music. With the learned choreographic memory, dance generation is realized on the quantized units that meet high choreography standards, such that the generated dancing sequences are confined within the spatial constraints. To achieve synchronized alignment between diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a newly-designed beat-align reward function. Extensive experiments on the standard benchmark demonstrate that our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively. Notably, the learned choreographic memory is shown to discover human-interpretable dancing-style poses in an unsupervised manner.

Code

Environment

PyTorch == 1.6.0

Data preparation

In our experiments, we use AIST++ for both training and evaluation. Please visit here to download the AIST++ annotations and unzip them as './aist_plusplus_final/' folder, visit here to download all original music pieces (wav) into './aist_plusplus_final/all_musics'. And please set up the AIST++ API from here and download the required SMPL models from here. Please make a folder './smpl' and copy the downloaded 'male' SMPL model (with '_m' in name) to 'smpl/SMPL_MALE.pkl' and finally run

./prepare_aistpp_data.sh

to produce the features for training and test. Otherwise, directly download our preprocessed feature from here as ./data folder if you don't wish to process the data.

Training

The training of Bailando comprises of 4 steps in the following sequence. If you are using the slurm workload manager, you can directly run the corresponding shell. Otherwise, please remove the 'srun' parts. Our models are all trained with single NVIDIA V100 GPU. * A kind reminder: the quantization code does not fit multi-gpu training

Step 1: Train pose VQ-VAE (without global velocity)

sh srun.sh configs/sep_vqvae.yaml train [your node name] 1

Step 2: Train glabal velocity branch of pose VQ-VAE

sh srun.sh configs/sep_vavqe_root.yaml train [your node name] 1

Step 3: Train motion GPT

sh srun_gpt_all.sh configs/cc_motion_gpt.yaml train [your node name] 1

Step 4: Actor-Critic finetuning on target music

sh srun_actor_critic.sh configs/actor_critic.yaml train [your node name] 1

Evaluation

To test with our pretrained models, please download the weights from here (Google Drive) or separately downloads the four weights from [weight 1]|[weight 2]|[weight 3]|[weight4] (坚果云) into ./experiments folder.

1. Generate dancing results

To test the VQ-VAE (with or without global shift as you indicated in config):

sh srun.sh configs/sep_vqvae.yaml eval [your node name] 1

To test GPT:

sh srun_gpt_all.sh configs/cc_motion_gpt.yaml eval [your node name] 1

To test final restuls:

sh srun_actor_critic.sh configs/actor_critic.yaml eval [your node name] 1

2. Dance quality evaluations

After generating the dance in the above step, run the following codes.

Step 1: Extract the (kinetic & manual) features of all AIST++ motions (ONLY do it by once):

python extract_aist_features.py

Step 2: compute the evaluation metrics:

python utils/metrics_new.py

It will show exactly the same values reported in the paper. To fasten the computation, comment Line 184 of utils/metrics_new.py after computed the ground-truth feature once. To test another folder, change Line 182 to your destination, or kindly modify this code to a "non hard version" :)

Choreographic for music in the wild

TODO

Citation

@inproceedings{siyao2022bailando,
    title={Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory,
    author={Siyao, Li and Yu, Weijiang and Gu, Tianpei and Lin, Chunze and Wang, Quan and Qian, Chen and Loy, Chen Change and Liu, Ziwei },
    booktitle={CVPR},
    year={2022}
}

License

Our code is released under MIT License.

Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Related tags

Overview

Bailando

Code

Environment

Data preparation

Training

Step 1: Train pose VQ-VAE (without global velocity)

Step 2: Train glabal velocity branch of pose VQ-VAE

Step 3: Train motion GPT

Step 4: Actor-Critic finetuning on target music

Evaluation

1. Generate dancing results

2. Dance quality evaluations

Step 1: Extract the (kinetic & manual) features of all AIST++ motions (ONLY do it by once):

Step 2: compute the evaluation metrics:

Choreographic for music in the wild

Citation

License

Owner

Li Siyao

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

Primary QPDF source code and documentation

A webcam-based 3x3x3 rubik's cube solver written in Python 3 and OpenCV.

Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Vietnamese Language Detection and Recognition

Qrcode Attendence System with Opencv and Pyzbar

GDB python tool to pretty print and debug c++ xtensor containers

A curated list of promising OCR resources

Image processing using OpenCv

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

An Optical Character Recognition system using Pytesseract/Extracting data from Blood Pressure Reports.

Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

Text to QR-CODE

An advanced 2D image manipulation with features such as edge detection and image segmentation built using OpenCV

(CVPR 2021) Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

Face Anonymizer - FaceAnonApp v1.0

An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

This is used to convert a string to an Image with Handwritten Characters.