Code for "LASR: Learning Articulated Shape Reconstruction from a Monocular Video". CVPR 2021.


LASR

Installation

Build with conda

conda env create -f lasr.yml
conda activate lasr
# install softras
cd third_party/softras; python setup.py install; cd -;
# install manifold remeshing
git clone --recursive -j8 git://github.com/hjwdzh/Manifold; cd Manifold; mkdir build; cd build; cmake .. -DCMAKE_BUILD_TYPE=Release; make; cd ../../

For Docker installation, please see install.md.
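
As a rough sketch only (the image name, Dockerfile location, and mount path below are assumptions; install.md is the authoritative reference), a typical Docker workflow looks like:

# build an image from the repository root, assuming a Dockerfile is provided there
docker build -t lasr .
# run with GPU access and the repository mounted inside the container
docker run --gpus all -it -v "$(pwd)":/workspace/lasr lasr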

Data preparation

Create folders to store data and training logs:

mkdir log; mkdir tmp; 
Synthetic data

To render {silhouette, flow, rgb} observations of spot.

python scripts/render_syn.py
Real data (DAVIS)

First, download the DAVIS 2017 trainval set and copy the JPEGImages/Full-Resolution and Annotations/Full-Resolution folders of DAVIS-camel into the corresponding folders in database.

cp ...davis-path/DAVIS/Annotations/Full-Resolution/camel/ -rf database/DAVIS/Annotations/Full-Resolution/
cp ...davis-path/DAVIS-lasr/DAVIS/JPEGImages/Full-Resolution/camel/ -rf database/DAVIS/JPEGImages/Full-Resolution/

Then download pre-trained VCN optical flow:

pip install gdown
mkdir ./lasr_vcn
gdown https://drive.google.com/uc?id=139S6pplPvMTB-_giI6V2dxpOHGqqAdHn -O ./lasr_vcn/vcn_rob.pth

Run VCN-robust to predict optical flow on the DAVIS camel video:

bash preprocess/auto_gen.sh camel
Your own video

You will need to download and install detectron2 to obtain object segmentations as instructed below.

python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

First, use any video processing tool (such as ffmpeg) to extract frames into JPEGImages/Full-Resolution/name-of-the-video.

mkdir database/DAVIS/JPEGImages/Full-Resolution/pika-tmp/
ffmpeg -ss 00:00:04 -i database/raw/IMG-7495.MOV -vf fps=10 database/DAVIS/JPEGImages/Full-Resolution/pika-tmp/%05d.jpg

Then, run PointRend to get segmentations:

cd preprocess
python mask.py pika path-to-detectron2-root; cd -

Assuming you have downloaded VCN flow in the previous step, run flow prediction:

bash preprocess/auto_gen.sh pika

Single video optimization

Synthetic spot

Next, we want to optimize the shape, texture, and camera parameters from image observations. Optimizing spot takes ~20 min on a single Titan Xp GPU.

bash scripts/spot3.sh

To render the optimized shape, texture, and camera parameters:

bash scripts/extract.sh spot3-1 10 1 26 spot3 no no
python render_vis.py --testdir log/spot3-1/ --seqname spot3 --freeze --outpath tmp/1.gif
DAVIS camel

Optimize on the camel observations:

bash scripts/template.sh camel

To render the optimized camel:

bash scripts/render_result.sh camel
Customized video (Pika)

Similarly, run the following steps to reconstruct pika:

bash scripts/template.sh pika

To render the reconstructed shape:

bash scripts/render_result.sh pika
Monitor optimization

To monitor optimization, run:

tensorboard --logdir log/

Example outputs

Evaluation

Run the following command to evaluate 3D shape accuracy for synthetic spot.

python scripts/eval_mesh.py --testdir log/spot3-1/ --gtdir database/DAVIS/Meshes/Full-Resolution/syn-spot3f/

Run the following command to evaluate keypoint accuracy on BADJA.

python scripts/eval_badja.py --testdir log/camel-5/ --seqname camel

Additional Notes

Other videos in DAVIS/BADJA

Please refer to the data preparation and optimization steps of the camel example, replacing camel with another sequence name, such as dance-twirl; see the sketch below. We provide config files in the configs folder.
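
For example, a minimal sketch for dance-twirl (assuming its DAVIS frames and annotations have been copied into database as in the camel example, and that a matching config exists in the configs folder):

# flow prediction with the pre-trained VCN model
bash preprocess/auto_gen.sh dance-twirl
# single video optimization
bash scripts/template.sh dance-twirl
# render the optimized result
bash scripts/render_result.sh dance-twirl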

Synthetic articulated objects

To render and reproduce results on articulated objects (Sec. 4.2), you will need to purchase and download the 3D models here. We use Blender to export animated meshes and run rendera_all.py:

python scripts/render_syn.py --outdir syn-dog-15 --nframes 15 --alpha 0.5 --model dog

Optimize on the rendered observations:

bash scripts/dog15.sh

To render the optimized dog:

bash scripts/render_result.sh dog
Batchsize

The current codebase is tested with batchsize=4. The batchsize can be modified in scripts/template.sh. Note that decreasing the batchsize will improve speed but reduce stability.

Distributed training

The current codebase supports single-node multi-GPU training with PyTorch DistributedDataParallel. Please modify dev and ngpu in scripts/template.sh to select devices, as sketched below.
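
As an illustration only (the exact variable syntax in scripts/template.sh is an assumption; dev and ngpu are the names mentioned above), selecting two GPUs might look like:

dev=0,1   # ids of the GPUs to use
ngpu=2    # number of GPUs for distributed data-parallel training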

Acknowledgement

The code borrows the skeleton of CMR.

External repos:

External data:

Citation

To cite our paper,

@inproceedings{yang2021lasr,
  title={LASR: Learning Articulated Shape Reconstruction from a Monocular Video},
  author={Yang, Gengshan 
      and Sun, Deqing
      and Jampani, Varun
      and Vlasic, Daniel
      and Cole, Forrester
      and Chang, Huiwen
      and Ramanan, Deva
      and Freeman, William T
      and Liu, Ce},
  booktitle={CVPR},
  year={2021}
}  