HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

Last update: Oct 07, 2022

Related tags

Deep Learning halo

Overview

HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

Oral Presentation, 3DV 2021

Korrawe Karunratanakul, Adrian Spurr, Zicong Fan, Otmar Hilliges, Siyu Tang
ETH Zurich

Video: Youtube

Abstract

We present Hand ArticuLated Occupancy (HALO), a novel representation of articulated hands that bridges the advantages of 3D keypoints and neural implicit surfaces and can be used in end-to-end trainable architectures. Unlike existing statistical parametric hand models (e.g.~MANO), HALO directly leverages the 3D joint skeleton as input and produces a neural occupancy volume representing the posed hand surface. The key benefits of HALO are (1) it is driven by 3D keypoints, which have benefits in terms of accuracy and are easier to learn for neural networks than the latent hand-model parameters; (2) it provides a differentiable volumetric occupancy representation of the posed hand; (3) it can be trained end-to-end, allowing the formulation of losses on the hand surface that benefit the learning of 3D keypoints. We demonstrate the applicability of HALO to the task of conditional generation of hands that grasp 3D objects. The differentiable nature of HALO is shown to improve the quality of the synthesized hands both in terms of physical plausibility and user preference.

Updates

December 1, 2021: Initial release for version 0.01 with demo.

Running the code

Dependencies

im2mesh library from Occupancy Network for point sampling and marching cubes
requirements.txt
MANO model for data preprocessing

The easiest way to run the code is to use conda. The code is tested on Ubuntu 18.04.

Implicit surface from keypoints

To try a demo which produces an implicit hand surface from the input keypoints, run:

cd halo
python demo_kps_to_hand.py

The demo will run the marching cubes algorithm and render each image in the animation above sequentially. The output images are in the output folder. The provided sample sequence are interpolations beetween 17 randomly sampled poses from the unseen HO3D dataset .

Dataset

The HALO-base model is trained using Youtube3D hand dataset. We only use the hand mesh ground truth without the images and videos. We provide the preprocessed data in the evaluation section.
The HALO-VAE model is trained and test on the GRAB dataset

Evaluation

HALO base model (implicit hand model)

To generate the mesh given the 3D keypoints and precomputed transformation matrices, run:

cd halo_base
python generate.py CONFIG_FILE.yaml

To evaluate the hand surface, run:

python eval_meshes.py

We provide the preprocessed test set of the Youtube3D here. In addition, you can also find the produced meshes from our keypoint model on the same test set here.

HALO-VAE

To generate grasps given 3D object mesh, run:

python generate.py HALO_VAE_CONFIG_FILE.ymal --test_data DATA_PATH --inference

The evaluation code for contact/interpenetration and cluster analysis can be found in halo/evaluate.py and halo/evaluate_cluster.py accordningly. The intersection test demo is in halo/utils/interscetion.py

Training

HALO base model (implicit hand model)

Data Preprocessing

Each data point consists of 3D keypoints, transformation matrices, and a hand surface. To speed up the training, all transformation matrices are precomputed, either by out Canonicalization Layer or from the MANO. Please check halo/halo_base/prepare_data_from_mano_param_keypoints.py for details. We use the surface point sampling and occupancy computation method from the Occupancy Networks

Run

To train HALO base model (implicit functions), run:

cd halo_base
python train.py

HALO-VAE

To train HALO-VAE, run:

cd halo
python train.py

HALO_VAE requires a HALO base model trained using the transformation matrices from the Canonicalization Layer. The weights of the base model are not updated during the VAE training.

BibTex

@inproceedings{karunratanakul2021halo,
  title={A Skeleton-Driven Neural Occupancy Representation for Articulated Hands},
  author={Karunratanakul, Korrawe and, Spurr, Adrian and Fan, Zicong and Hilliges, Otmar and Tang, Siyu},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2021}
}

References

Some code in our repo uses snippets of the following repo:

The occupancy computation and point sampling is based on Occupancy Networks.
We use MANO layer implementation from Yana Hasson
We use the renderer from MeshTransformer in our demo.

Please consider citing them if you found the code useful.

Acknowledgement

We sincerely acknowledge Shaofei Wang and Marko Mihajlovic for the insightful discussionsand helps with the baselines.

HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

Related tags

Overview

HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

Korrawe Karunratanakul, Adrian Spurr, Zicong Fan, Otmar Hilliges, Siyu Tang ETH Zurich

Abstract

Updates

Running the code

Dependencies

Implicit surface from keypoints

Dataset

Evaluation

HALO base model (implicit hand model)

HALO-VAE

Training

HALO base model (implicit hand model)

Data Preprocessing

Run

HALO-VAE

BibTex

References

Acknowledgement

Owner

Korrawe Karunratanakul

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

[CVPR'21] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation

Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Generalized Data Weighting via Class-level Gradient Manipulation

A plug-and-play library for neural networks written in Python

NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models

Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

Library to enable Bayesian active learning in your research or labeling work.

A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization

This project is a re-implementation of MASTER: Multi-Aspect Non-local Network for Scene Text Recognition by MMOCR

Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

C3d-pytorch - Pytorch porting of C3D network, with Sports1M weights

Self-Supervised Image Denoising via Iterative Data Refinement

Make differentially private training of transformers easy for everyone

PyTorch implementation of Tacotron speech synthesis model.

A python package to perform same transformation to coco-annotation as performed on the image.

Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Neural style transfer as a class in PyTorch

This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

Korrawe Karunratanakul, Adrian Spurr, Zicong Fan, Otmar Hilliges, Siyu Tang
ETH Zurich