[Point Cloud Series] Multi-View Neural Human Rendering (NHR)
2022-04-23 13:18:00 【^_^ Min Fei】
1. Overview
Work from Yu Jingyi's team, CVPR 2020; part of the neural rendering series.
Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Wu_Multi-View_Neural_Human_Rendering_CVPR_2020_paper.pdf
Project page: https://wuminye.github.io/NHR/
Dataset: https://wuminye.github.io/NHR/datasets.html
2. Motivation
An end-to-end framework designed specifically for human body rendering (NHR): PointNet++ extracts 3D features from the point cloud, which are then projected to 2D and processed by a CNN to smooth out noise and geometric defects. In essence, the point cloud is introduced to guide the rendering.
3. Method
(Figure: flow chart of the overall framework)
The framework consists of three modules (a minimal pipeline sketch follows the list):
- Feature Extraction (FE)
- Projection and Rasterization (PR)
- Rendering (RE)
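To make the three-module split concrete, here is a minimal PyTorch-style sketch of one forward pass. The callables `fe_net`, `pr_fn`, and `render_net`, as well as the tensor shapes, are illustrative assumptions rather than the authors' code.

```python
import torch

def nhr_forward(points, colors, view_dirs, fe_net, pr_fn, render_net):
    """One rendering pass through the three modules (shapes are illustrative).

    points: (N, 3) point cloud; colors: (N, 3); view_dirs: (N, 3) normalized
    per-point viewing directions toward the target camera.
    """
    # FE: concatenated color + viewing direction form the initial point attributes.
    attrs = torch.cat([colors, view_dirs], dim=-1)
    D = fe_net(points, attrs)                    # (N, C) per-point descriptors

    # PR: project and rasterize the descriptors into a 2D feature map S and a
    # depth map E; the target camera parameters live inside pr_fn here.
    S, E = pr_fn(points, D)                      # (1, C, H, W), (1, 1, H, W)

    # RE: U-Net-like renderer with 4 output channels (RGB + mask with sigmoid).
    out = render_net(torch.cat([S, E], dim=1))   # (1, 4, H, W)
    return out[:, :3], torch.sigmoid(out[:, 3:4])
```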
Module 1: Feature Extraction (FE)
$\varPsi_{fe}$: the PointNet++ feature-extraction operator, with the classification branch removed and only the segmentation branch kept as the FE branch.
$D_t$: the feature descriptors of the point cloud.
$V = \{v^i\}$: the normalized viewing directions, $v^i = \frac{p^i_t - o}{||p^i_t - o||_2}$, where $o$ is the projection center of the target-view camera.
$\{\cdot\}$: denotes concatenation; here the point colors and the normalized per-point viewing directions are concatenated, and the concatenated features are used as the initial point attributes for feature extraction.
$\varphi_{norm}$: indicates that the point coordinates have been normalized.
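A small sketch of how these initial point attributes could be assembled before running $\varPsi_{fe}$. The unit-sphere normalization used for $\varphi_{norm}$ here is an assumption for illustration only.

```python
import torch

def build_point_attributes(points, colors, cam_center):
    """points: (N, 3); colors: (N, 3) in [0, 1]; cam_center o: (3,)."""
    # phi_norm: normalize the point coordinates (assumed: center + unit-sphere scale).
    centered = points - points.mean(dim=0, keepdim=True)
    normed_pts = centered / centered.norm(dim=-1).max()

    # v^i = (p^i_t - o) / ||p^i_t - o||_2 : normalized viewing direction per point.
    view_dirs = points - cam_center
    view_dirs = view_dirs / view_dirs.norm(dim=-1, keepdim=True).clamp(min=1e-8)

    # {.}: concatenate color and viewing direction as the initial attributes.
    attrs = torch.cat([colors, view_dirs], dim=-1)   # (N, 6)
    return normed_pts, attrs
```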
Module 2: Projection and Rasterization (PR)
$S$: the 2D feature map after projection, where $S_{x,y} = d^i_t$ and $d^i_t$ is the feature descriptor of the $i$-th point.
$E$: the depth map of the current view.
Target camera parameters: $\hat{K}$, $\hat{T}$.
Learnable parameters: $\theta_d$.
$\psi_{pr}$: the whole projection-and-rasterization process.
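A rough sketch of what $\psi_{pr}$ computes, using a standard pinhole projection and nearest-point z-buffering. The real module is a learned, batched rasterizer, so treat this only as an illustration of $S_{x,y} = d^i_t$ and of the depth map $E$.

```python
import torch

def project_and_rasterize(points, descriptors, K_hat, T_hat, height, width):
    """points: (N, 3) world coords; descriptors: (N, C); K_hat: (3, 3); T_hat: (4, 4)."""
    N, C = descriptors.shape
    # Transform to the target camera frame and project with the pinhole model.
    homog = torch.cat([points, torch.ones(N, 1)], dim=-1)        # (N, 4)
    cam = (T_hat @ homog.T).T[:, :3]                             # camera-frame coords
    z = cam[:, 2].clamp(min=1e-6)
    uv = (K_hat @ (cam / z.unsqueeze(-1)).T).T[:, :2].round().long()

    S = torch.zeros(C, height, width)                            # feature map S
    E = torch.full((1, height, width), float('inf'))             # depth map E
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    for i in torch.nonzero(inside).squeeze(-1):
        u, v = uv[i, 0], uv[i, 1]
        if z[i] < E[0, v, u]:                                    # keep the closest point
            E[0, v, u] = z[i]
            S[:, v, u] = descriptors[i]                          # S_{x,y} = d^i_t
    E[E == float('inf')] = 0.0
    return S, E
```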
Module 3: Rendering (RE)
$\psi_{render}$: an improved version of U-Net that outputs 4 channels; the first three channels are the RGB image $I^*$, and the last channel is the mask $M^*$, passed through a sigmoid.
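A minimal sketch of that output split, assuming a generic encoder-decoder backbone rather than the paper's modified U-Net:

```python
import torch
import torch.nn as nn

class RenderHead(nn.Module):
    """Wraps a U-Net-like backbone with 4 output channels and splits its output."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, feature_and_depth):
        out = self.backbone(feature_and_depth)   # (B, 4, H, W)
        rgb = out[:, :3]                          # I*: first three channels
        mask = torch.sigmoid(out[:, 3:4])         # M*: last channel through a sigmoid
        return rgb, mask
```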
Training loss
L1 loss + perceptual loss
$n_b$: the batch size.
$I^*_i$, $M^*_i$: the rendered image and mask of the $i$-th sample.
$\psi_{vgg}$: extracts the 2nd- and 4th-layer VGG-19 features.
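A hedged sketch of this objective. The exact VGG-19 layer indices, pretrained weights, and loss weighting below are assumptions based only on the description above.

```python
import torch.nn.functional as F
import torchvision

# Frozen VGG-19 features; ImageNet weights and the layer indices (relu1_2, relu2_2)
# are assumptions standing in for the "2nd and 4th layer" features mentioned above.
_vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def vgg_feats(x, layers=(3, 8)):
    feats, h = [], x
    for idx, layer in enumerate(_vgg):
        h = layer(h)
        if idx in layers:
            feats.append(h)
    return feats

def nhr_loss(I_pred, M_pred, I_gt, M_gt, w_perc=1.0):
    # L1 loss on the rendered image and the mask.
    l1 = F.l1_loss(I_pred, I_gt) + F.l1_loss(M_pred, M_gt)
    # Perceptual loss on VGG-19 features of the rendered vs. ground-truth image.
    perc = sum(F.l1_loss(a, b) for a, b in zip(vgg_feats(I_pred), vgg_feats(I_gt)))
    return l1 + w_perc * perc
```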
Geometry refinement
To refine the geometry, a dense set of new views is rendered, and the generated masks are used as silhouettes for space carving / shape-from-silhouette reconstruction.
This is needed because the multi-view stereo input (in practice, a coarse point cloud) may contain holes or occluded regions.
Mask and shape generation: the trained rendering model produces mattes aligned with the RGB output. Masks are then rendered on a uniformly sampled set of new viewpoints, each with its own camera parameters, at a resolution of 800x600. Shape-from-silhouettes (SfS) is then used to reconstruct the human body mesh.
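A toy voxel-carving sketch of the shape-from-silhouettes idea. The paper reconstructs a mesh with SfS; this simplified visual-hull version only illustrates how the rendered masks constrain the shape.

```python
import numpy as np

def carve_voxels(masks, Ks, Ts, bounds, res=128):
    """Keep voxels whose projections land inside every silhouette mask.

    masks: list of (H, W) binary arrays; Ks: list of (3, 3) intrinsics; Ts: list of
    (4, 4) world-to-camera transforms; bounds: ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    """
    lo, hi = np.array(bounds[0]), np.array(bounds[1])
    axes = [np.linspace(lo[i], hi[i], res) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1).reshape(-1, 3)
    keep = np.ones(len(grid), dtype=bool)
    for mask, K, T in zip(masks, Ks, Ts):
        h, w = mask.shape
        cam = (T @ np.c_[grid, np.ones(len(grid))].T).T[:, :3]
        z = np.clip(cam[:, 2], 1e-6, None)
        uv = (K @ (cam / z[:, None]).T).T[:, :2].round().astype(int)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        hit = np.zeros(len(grid), dtype=bool)
        hit[inside] = mask[uv[inside, 1], uv[inside, 0]] > 0
        keep &= hit                      # carve away voxels outside any silhouette
    return grid[keep]                    # surviving voxel centers approximate the visual hull
```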
Point sampling and coloring: colors are taken from the MVS point cloud using a nearest-neighbor lookup.
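A minimal way to do that nearest-neighbor lookup, here with SciPy's KD-tree; the tooling choice is an assumption, since the paper only states that the nearest neighbor is used.

```python
from scipy.spatial import cKDTree

def color_from_mvs(new_points, mvs_points, mvs_colors):
    """Assign each new point the color of its nearest MVS point."""
    tree = cKDTree(mvs_points)              # (M, 3) colored MVS point cloud
    _, idx = tree.query(new_points, k=1)    # index of the nearest neighbor per point
    return mvs_colors[idx]                  # (N, 3) colors
```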
Hole completion:
Completion mechanism: for each point $u^i_t \in U_t$, the Euclidean distance $\phi(u^i_t, p^j_t)$ to a point $p^j_t$ in $P_t$ is larger than it is for points in $\hat{P}_t - U_t$. A threshold $\tau_1$ is therefore set as in Eq. (5); in the experiments it is set to 0.2.
Then, for each point in $\hat{P}_t$, count how many points $p^j_t$ have Euclidean distance below the threshold, denoted $s^i_t = \#\{b^i_t \mid b^i_t < \tau_1\}$.
A histogram of all $s^i_t$ is then computed with 15 bins, obtained by evenly dividing the maximum distance value. Since the first bin is observed to contain the more significant points compared with the second, the maximum distance of the first bin is used as a second threshold $\tau_2$ to select the final points:
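A sketch of this selection step. Which point set is queried and how the 15 bins are laid out follow my reading of the description above, not the authors' code.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_hole_points(sfs_points, mvs_points, tau1=0.2, n_bins=15):
    """Pick SfS points that are poorly covered by the MVS point cloud P_t."""
    tree = cKDTree(mvs_points)
    # s^i_t: number of MVS points within tau1 of each SfS point (few = likely hole).
    s = np.array([len(tree.query_ball_point(p, r=tau1)) for p in sfs_points])

    # Histogram of s over 15 evenly spaced bins; the upper edge of the first bin
    # serves as the second threshold tau2.
    _, edges = np.histogram(s, bins=n_bins, range=(0, s.max()))
    tau2 = edges[1]

    return sfs_points[s < tau2]              # candidate hole points U_t
```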
The figure below shows that after hole filling, flickering when switching viewpoints is reduced.
Note: the final point set will still contain artifacts, since its quality depends on the threshold $\tau$.
The figure illustrates the threshold setting and the distance measurement more intuitively.
4. Experiments
Datasets
Five sequences were captured with a rig of 80+ cameras at 25 frames per second. All sequences are 8-24 seconds long, with subjects wearing different clothes and performing different actions.
Each sequence includes RGB images, foreground masks, RGB point cloud sequences, and camera calibration.
Experimental results:
The experiments show the benefit of combining point clouds with images.
Comparison of results on the 5 datasets:
Visualization of the color channels of the point-cloud feature maps:
5. Summary
In essence, good results come from a coarse point cloud, plus good images, plus a few geometric tricks.