[point cloud series] learning representations and generative models for 3D point clouds
2022-04-23 13:18:00 【^_^ Min Fei】
I am cleaning up some old drafts and found this one half-written and never published; please bear with me. Several people have already translated the full paper, so refer to those translations if anything here is unclear. This paper can truly be regarded as the ancestor of point cloud generative models. It comes from Stanford, and its experimental validation is very extensive. The basic takeaway is that the Gaussian mixture model works well.
1. Summary:
An ICML 2018 conference paper (PMLR vol. 80), and a pioneering work.
Paper: http://proceedings.mlr.press/v80/achlioptas18a/achlioptas18a.pdf
Supplementary material: http://proceedings.mlr.press/v80/achlioptas18a/achlioptas18a-supp.pdf
Code: https://github.com/optas/latent_3d_points
Main points of this paper:
- Learning representations of point clouds, mainly with a deep autoencoder (AE).
- Comparing several generative models; the latent-space models (l-GANs and GMMs) do much better than GANs trained on raw point clouds.
The models are:
1) GANs: operate directly on raw point clouds;
2) l-GANs (latent GANs): operate on point cloud features in a latent space; an AE first extracts these latent features;
3) GMMs: Gaussian mixture models.
Evaluation criteria:
1) Sample fidelity;
2) Coverage.
2. Contributions:
- A new AE architecture for point clouds;
- The first set of deep generative models for point clouds;
- New evaluation metrics based on optimal matchings between two sets of point clouds.
3. Evaluation metrics:
3.1 Point-set distances
1) EMD: Earth Mover's Distance
Limitation: the two point sets must contain the same number of points; EMD solves a transportation problem between the two sets with a one-to-one correspondence (a bijection).
Advantage: it is differentiable almost everywhere.
2) CD: Chamfer distance (nearest-neighbor based)
Advantage: for each point in one set it takes the squared distance to its nearest neighbor in the other set (in both directions), so the two sets need not have the same size. It is also more efficient to compute than EMD.
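A minimal NumPy/SciPy sketch of both distances, following the paper's definitions: CD sums squared nearest-neighbor distances in both directions, and EMD minimizes the total Euclidean distance over bijections. The function names are hypothetical, and the exact assignment solver used here runs in O(n³), so it is only meant for small examples; practical training code relies on faster approximations.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_sq_dists(s1, s2):
    """Squared Euclidean distance between every point of s1 (n, 3) and s2 (m, 3)."""
    diff = s1[:, None, :] - s2[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def chamfer_distance(s1, s2):
    """CD: squared nearest-neighbor distances, summed in both directions.
    The two sets may contain different numbers of points."""
    d = pairwise_sq_dists(s1, s2)
    return d.min(axis=1).sum() + d.min(axis=0).sum()


def earth_movers_distance(s1, s2):
    """EMD: total Euclidean distance under the optimal bijection s1 -> s2.
    Both sets must contain the same number of points."""
    if s1.shape != s2.shape:
        raise ValueError("EMD needs two point sets of the same size")
    d = np.sqrt(pairwise_sq_dists(s1, s2))
    rows, cols = linear_sum_assignment(d)  # exact optimal matching, O(n^3)
    return d[rows, cols].sum()


# Two points, shifted by 0.5 along y.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a + np.array([0.0, 0.5, 0.0])
print(chamfer_distance(a, b))       # 1.0  (2 * 0.25 in each direction)
print(earth_movers_distance(a, b))  # 1.0  (0.5 + 0.5)
```

Note that CD happily compares sets of different sizes, while the EMD function raises an error in that case, mirroring the one-to-one correspondence requirement above.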
3.2 Metrics for generative models
1) JSD:
Jensen-Shannon Divergence (JSD) between the marginal distributions of points in 3D Euclidean space.
Used to measure the similarity between two distributions.
2) Coverage (COV):
Coverage compares a generated set A with a reference set B: for each point cloud in A, find its nearest neighbor in B; coverage is the fraction of point clouds in B that were matched this way. Since it is computed from distances, it still relies on a point-set distance, CD or EMD.
Point-set distances used here (EMD is differentiable almost everywhere; CD is also differentiable and cheaper to compute):
- Earth Mover's Distance (EMD)
- Chamfer (pseudo-)distance (CD)
3) Minimum Matching Distance (MMD)
MMD measures the fidelity of A with respect to B and compensates for a weakness of coverage, which cannot tell how well the covered shapes of B are represented in A: every point cloud in B is matched to the point cloud in A with the minimum distance, and the distances in this matching are averaged.
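The two set-level metrics can be sketched as follows, assuming A is the generated set and B the reference set, both given as lists of (n, 3) NumPy arrays. CD is used as the point-set distance, but an EMD function could be passed instead; all names are hypothetical, and the brute-force distance matrix is only suitable for small sets.

```python
import numpy as np


def chamfer_distance(s1, s2):
    """CD between point clouds of shape (n, 3) and (m, 3)."""
    d = np.sum((s1[:, None, :] - s2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).sum() + d.min(axis=0).sum()


def distance_matrix(A, B, dist=chamfer_distance):
    """D[i, j] = dist(A[i], B[j])."""
    return np.array([[dist(a, b) for b in B] for a in A])


def coverage(A, B, dist=chamfer_distance):
    """Fraction of clouds in B that are the nearest neighbor of some cloud in A."""
    D = distance_matrix(A, B, dist)
    matched = np.unique(D.argmin(axis=1))  # nearest B-index for every cloud in A
    return len(matched) / len(B)


def minimum_matching_distance(A, B, dist=chamfer_distance):
    """MMD: match every cloud in B to its closest cloud in A, then average."""
    D = distance_matrix(A, B, dist)
    return D.min(axis=0).mean()


rng = np.random.default_rng(0)
cloud = rng.uniform(-0.5, 0.5, size=(64, 3))
B = [cloud, cloud + 5.0]   # reference set with two distinct "modes"
A = [cloud, cloud + 0.01]  # generated set that only covers the first mode
print(coverage(A, B))                        # 0.5
print(minimum_matching_distance(A, B) > 0)   # True: the second mode is far away
```

The toy example shows the complementarity: A reproduces the first mode almost perfectly, yet coverage reveals that half of B is missed, and MMD grows because the second reference shape has no close match.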
Comparison
MMD and coverage are complementary. When MMD is small and coverage is large, set A captures all the modes of B with good fidelity.
JSD compares distributions, so it is only a coarse evaluation of the similarity between the two sets. JSD correlates well with MMD and can serve as an efficient alternative.
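A rough sketch of the JSD metric, assuming shapes lie in [-1, 1]³ and using the 28³ voxel grid mentioned in the experiments: points from all clouds of a set are counted per voxel, the counts are normalized into a distribution, and the two distributions are compared. The helper names are hypothetical, and the official evaluation code may differ in details (for example, in how voxel counts are accumulated).

```python
import numpy as np


def voxel_distribution(clouds, resolution=28):
    """Normalized count of points per voxel, over all clouds of one set.
    Assumes coordinates in [-1, 1]^3 (shapes centered in the unit sphere)."""
    counts = np.zeros((resolution,) * 3)
    for pc in clouds:
        idx = np.floor((pc + 1.0) / 2.0 * resolution).astype(int)
        idx = np.clip(idx, 0, resolution - 1)  # keep boundary points in the grid
        np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return counts.ravel() / counts.sum()


def kl_divergence(p, q):
    """KL(p || q); cells with p == 0 contribute nothing."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))


def jensen_shannon_divergence(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


rng = np.random.default_rng(0)
real = [rng.uniform(-0.5, 0.5, size=(2048, 3)) for _ in range(4)]
fake = [rng.uniform(-0.5, 0.5, size=(2048, 3)) for _ in range(4)]
shifted = [pc + 0.4 for pc in real]  # same shapes, displaced in space
p, q, r = (voxel_distribution(s) for s in (real, fake, shifted))
print(jensen_shannon_divergence(p, q) < jensen_shannon_divergence(p, r))  # True
```

Because only the pooled per-voxel statistics enter the computation, JSD ignores which cloud each point came from; that is why it is a coarse, distribution-level measure compared with the per-shape matchings of coverage and MMD.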
4. Representations and generative models
Building blocks:
AEs: learn to re-encode the input. The compressed feature (latent code) $z$ represents the original point cloud and can be used for reconstruction.
GANs: the basic architecture is a generator G and a discriminator D.
G: produces samples meant to be indistinguishable from real data, by passing samples drawn at random from a simple distribution through the generator function;
D: distinguishes synthetic samples from real ones;
GMM: a probabilistic model representing a population assumed to follow a multimodal Gaussian distribution, i.e., composed of several subpopulations, each following a Gaussian distribution. Given the number of subpopulations, the Expectation-Maximization (EM) algorithm learns the GMM parameters (the means and covariances of the Gaussians, plus their mixture weights) from random samples.
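To make the EM description concrete, here is a minimal NumPy sketch of EM for a one-dimensional GMM with a known number of components k. The paper fits GMMs to multi-dimensional AE latent codes; one dimension just keeps the example short, and the function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_gmm_1d(x, k, n_iter=200, seed=0):
    """EM for a 1-D Gaussian mixture with k components (k assumed known)."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, size=k, replace=False)  # start from random samples
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, shape (n, k)
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances


rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 0.5, 500), rng.normal(4.0, 1.0, 500)])
weights, means, variances = fit_gmm_1d(x, k=2)
print(np.round(np.sort(means), 1))  # close to -3 and 4
```

The small constant added to the variances guards against a component collapsing onto a single sample.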
4.1 Raw point cloud GAN (r-GAN):
Training data: raw point clouds of size 2048 × 3 (2048 points with xyz coordinates).
Discriminator: same architecture as the AE's encoder, but without Batch-Norm and with Leaky ReLU; the output of the last fully connected layer is fed into a sigmoid.
Generator: takes Gaussian noise as input and maps it to a 2048 × 3 output through 5 FC-ReLU layers.
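To make the output shape concrete, here is a forward-pass-only NumPy sketch of an r-GAN-style generator: five fully connected layers map a noise vector to 2048 × 3 coordinates. The noise size and hidden widths are hypothetical; ReLU is applied on the hidden layers, and the last layer is assumed to be linear so that coordinates can be negative. Training is omitted.

```python
import numpy as np

N_POINTS = 2048
NOISE_DIM = 128                  # hypothetical noise size
HIDDEN = [64, 128, 512, 1024]    # hypothetical widths of the 4 hidden layers


def init_layers(sizes, rng):
    """He-initialized (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def r_gan_generator(z, layers):
    """Map noise z of shape (batch, NOISE_DIM) to point clouds (batch, 2048, 3)."""
    h = z
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers; last layer kept linear
    return h.reshape(-1, N_POINTS, 3)


rng = np.random.default_rng(0)
layers = init_layers([NOISE_DIM, *HIDDEN, N_POINTS * 3], rng)  # 5 FC layers
z = rng.normal(size=(4, NOISE_DIM))                            # Gaussian noise
clouds = r_gan_generator(z, layers)
print(len(layers), clouds.shape)  # 5 (4, 2048, 3)
```

The last layer alone holds 1024 × 6144 weights, which hints at why generating raw coordinates directly is a heavy, hard-to-train mapping.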
4.2 Latent-space GAN (l-GAN):
First train an AE, then use its encoder to obtain latent codes Z.
Both the generator and the discriminator operate on Z, i.e., the GAN learns the distribution of the latent codes; generated codes are turned into point clouds by the AE's decoder.
Advantage: a simple architecture. Specifically, a generator that is an MLP with a single hidden layer, coupled with a discriminator that is an MLP with two hidden layers, is enough to produce measurably good and realistic results.
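This forward-pass-only NumPy sketch shows the l-GAN layout: a generator MLP with one hidden layer that outputs latent codes, and a discriminator MLP with two hidden layers that scores them, together with the usual GAN losses. Layer widths, the Leaky ReLU slope and the non-saturating generator loss are assumptions for illustration. Gradient updates (normally done with an autodiff framework) and the AE decoder that turns generated codes into point clouds are omitted.

```python
import numpy as np

LATENT_DIM = 128  # hypothetical AE bottleneck size
NOISE_DIM = 128   # hypothetical noise size
rng = np.random.default_rng(0)


def dense(m, n):
    """He-initialized weight matrix and zero bias."""
    return rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n)


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


G = [dense(NOISE_DIM, 128), dense(128, LATENT_DIM)]           # one hidden layer
D = [dense(LATENT_DIM, 256), dense(256, 512), dense(512, 1)]  # two hidden layers


def generator(noise):
    (w1, b1), (w2, b2) = G
    return np.maximum(noise @ w1 + b1, 0.0) @ w2 + b2  # fake latent codes


def discriminator(codes):
    (w1, b1), (w2, b2), (w3, b3) = D
    h = leaky_relu(leaky_relu(codes @ w1 + b1) @ w2 + b2)
    logits = (h @ w3 + b3)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))  # probability that a code is real


def gan_losses(real_codes, noise, eps=1e-8):
    """Standard discriminator loss and non-saturating generator loss."""
    d_real = discriminator(real_codes)
    d_fake = discriminator(generator(noise))
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss


real_codes = rng.normal(size=(32, LATENT_DIM))  # stand-in for encoder outputs
noise = rng.normal(size=(32, NOISE_DIM))
print(gan_losses(real_codes, noise))
```

Since both networks only see 128-D vectors instead of 2048 × 3 coordinates, they stay small, which matches the point above that a very simple MLP pair is enough.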
4.3 Gaussian Mixture Model (GMM):
A family of GMMs is fit in the latent space learned by the AE.
Various numbers of Gaussian components, with diagonal or full covariance matrices, were tried.
Once the distribution is fit, its samples are passed through the AE's decoder, so the GMM acts as a point cloud generator, similar to an l-GAN.
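A minimal sketch of this pipeline with scikit-learn's `GaussianMixture`, assuming synthetic 8-D stand-in codes instead of real AE latent codes. The `decode` function is a hypothetical placeholder (a random linear map) that only stands in for a trained AE decoder.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for AE latent codes of training shapes (synthetic, 8-D for speed).
latent_codes = np.concatenate([rng.normal(-2.0, 0.3, size=(300, 8)),
                               rng.normal(2.0, 0.3, size=(300, 8))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(latent_codes)
new_codes, component_ids = gmm.sample(16)  # draw new latent codes


def decode(codes):
    """Hypothetical placeholder for a trained AE decoder: code -> 2048 x 3 cloud."""
    w = np.random.default_rng(1).normal(size=(codes.shape[1], 2048 * 3))
    return (codes @ w).reshape(len(codes), 2048, 3)


clouds = decode(new_codes)
print(clouds.shape)  # (16, 2048, 3)
```

The generator here is just "sample from the mixture, then decode", so there is no adversarial training at all.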
5. Experimental evaluation
Datasets
ShapeNet: shapes are axis-aligned and centered in the unit sphere. The data is split 85% / 5% / 10% into training / validation / test.
JSD is evaluated on a regular 28³ voxel grid.
AE reconstruction
Representation power
The AE's latent space allows parts of a point cloud to be edited with simple additive algebra.
Shape editing: following (Yi et al. 2016a); the formula is given below and the results are shown in Figure 3.
Suppose a given category can be further split into two subclasses A and B, each having structural features that the other lacks. The difference between the two subclasses can be modeled by $\mathbf{X}_B-\mathbf{X}_A$, where $\mathbf{X}_A$ and $\mathbf{X}_B$ are their average latent representations. A shape $x_A$ from subclass A can then be transformed with $x_{A'}=x_A+\mathbf{X}_B-\mathbf{X}_A$.
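The editing rule is a one-liner on latent vectors. This sketch uses toy 2-D codes in place of real AE codes (a hypothetical setup); the edited code would then be fed to the AE decoder, which is not shown.

```python
import numpy as np


def edit_shape(x_a, codes_a, codes_b):
    """x_A' = x_A + (mean code of subclass B) - (mean code of subclass A)."""
    return x_a + codes_b.mean(axis=0) - codes_a.mean(axis=0)


codes_a = np.array([[0.0, 0.0], [2.0, 0.0]])  # mean (1, 0)
codes_b = np.array([[1.0, 3.0], [3.0, 5.0]])  # mean (2, 4)
print(edit_shape(codes_a[0], codes_a, codes_b))  # [1. 4.]
```

Applying the same shift to every code of subclass A moves their average exactly onto the average of subclass B, which is what makes this simple additive edit meaningful when the latent space is roughly linear.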
Generalization: reconstructions of unseen shapes are almost as good as those of training shapes; MMD is used to quantify the AE's generalization.
Latent space exploration: linear interpolation
Linearly interpolating between the latent codes of two shapes and decoding the results yields intermediate, morph-like variants between the two shapes.
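Interpolation itself is just a convex combination of two codes. A minimal sketch with a hypothetical helper name, where each returned row would be passed to the AE decoder (not shown):

```python
import numpy as np


def interpolate_codes(z_start, z_end, n_steps):
    """Evenly spaced latent codes from z_start to z_end, endpoints included."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - t) * z_start + t * z_end


path = interpolate_codes(np.zeros(3), np.ones(3), 5)
print(path[:, 0])  # first coordinate goes 0.0, 0.25, 0.5, 0.75, 1.0
```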
Shape completion:
Classification:
An AE is trained on 57,000 models from 55 categories of man-made objects. For this experiment specifically, 512-dimensional features are used. A linear SVM classifies these features; the table below shows the results. Even this simple AE performs well.
Table 2: Classification performance (%) on ModelNet10/40. Compared methods: A: SPH (Kazhdan et al., 2003), B: LFD (Chen et al., 2003), C: TL-Net (Girdhar et al., 2016), D: VConv-DAE (Sharma et al., 2016), E: 3D-GAN (Wu et al., 2016).
The AE behind the SVM classifier uses an encoder whose layers have 128, 128, 256 and 512 filters, and a decoder with 1024, 2048 and 2048×3 neurons. BatchNorm is used in every layer. Online data augmentation rotates each batch of input point clouds about the z-axis. Training runs for 1000 epochs with the CD loss and 1100 epochs with the EMD loss. Some classification parameters are shown in the table below.
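A forward-pass-only NumPy sketch of the encoder side described above: shared per-point layers (equivalent to 1-D convolutions with kernel size 1) with 128, 128, 256 and 512 filters, followed by a max over points, plus the z-axis rotation augmentation. Random weights, omitting BatchNorm, and drawing one random angle per batch are all assumptions of this sketch, and the helper names are hypothetical.

```python
import numpy as np

ENCODER_FILTERS = [128, 128, 256, 512]  # filter counts quoted above


def random_z_rotation(batch, rng):
    """Rotate a (B, N, 3) batch of clouds by one random angle about the z-axis."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return batch @ rot.T


def init_encoder(rng, in_dim=3):
    """Random weights for per-point layers (1-D convolutions with kernel size 1)."""
    layers = []
    for out_dim in ENCODER_FILTERS:
        w = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
        layers.append((w, np.zeros(out_dim)))
        in_dim = out_dim
    return layers


def encode(batch, layers):
    """Shared per-point layers, then a max over points -> one code per cloud."""
    h = batch
    for w, b in layers:
        h = np.maximum(h @ w + b, 0.0)  # ReLU; BatchNorm omitted in this sketch
    return h.max(axis=1)  # max-pooling makes the code independent of point order


rng = np.random.default_rng(0)
batch = rng.uniform(-0.5, 0.5, size=(2, 2048, 3))
layers = init_encoder(rng)
codes = encode(random_z_rotation(batch, rng), layers)
print(codes.shape)  # (2, 512)
```

The max over points is what lets the encoder treat a point cloud as an unordered set: shuffling the points leaves the 512-D code unchanged.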
Evaluating the generative models
The methods are compared in the table below: epochs/models are selected by minimum JSD on the validation split, and 5 generators are evaluated on the test split of the chair dataset. We report: A: sample-based memorization baseline, B: r-GAN, C: l-GAN (AE-CD), D: l-GAN (AE-EMD), E: l-WGAN (AE-EMD), F: GMM (AE-EMD). The GMM performs best.
Figure 6 shows, as training progresses, the JSD (left) and the MMD and coverage (right) between the test data and the synthetic datasets produced by the GAN-based models.
Note: the r-GAN struggles to provide either good coverage or good fidelity on the test set, consistent with the well-known fact that end-to-end GANs are usually hard to train.
With less training, the l-GAN (AE-CD) achieves better fidelity, but its coverage stays low. Switching to an EMD-based AE for the representation while keeping the same latent GAN architecture (l-GAN, AE-EMD) significantly improves both coverage and fidelity. Both l-GANs, however, suffer from the known problem of mode collapse: halfway through training, coverage starts to drop while fidelity stays at a good level, meaning they overfit a small subset of the data. Later, a more catastrophic collapse follows, with coverage dropping to 0.5%. As expected, switching to a latent WGAN largely eliminates this collapse.
Figure 7 illustrates the blindness of CD: it is not very discriminative. A side effect is that matches which are only partially good still get counted, which shows up as coverage under CD exceeding coverage under EMD.
6. Other
Shape analogy:
Linear operations in the latent space combined with a Euclidean nearest-neighbor search find analogous shapes, which demonstrates the Euclidean nature of the latent space.
Specifically: given $A$, $A'$ and $B$, compute $B' = B + (A' - A)$ in the latent space and search for the nearest neighbor of $B'$ among the latent codes. This can be used to find shape analogies, as shown in the figure below.
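The analogy search reduces to vector arithmetic plus a nearest-neighbor lookup. A minimal sketch on four toy 2-D codes (hypothetical data standing in for AE codes of a shape collection):

```python
import numpy as np


def shape_analogy(z_a, z_a_prime, z_b, codes):
    """Index of the code nearest (Euclidean) to B' = B + (A' - A)."""
    target = z_b + (z_a_prime - z_a)
    return int(np.argmin(np.linalg.norm(codes - target, axis=1)))


codes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(shape_analogy(codes[0], codes[1], codes[2], codes))  # 3
```

With real data, `codes` would hold the latent codes of the whole collection, and the retrieved shape is the "B is to B' as A is to A'" answer.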
Autoencoder
Dataset D-FAUST: 10 people performing 14 motions each; each motion is captured as a time series of 300 meshes.
In this experiment, 80 meshes are randomly sampled and 4096 points are extracted from each mesh. The figure below shows interpolation results.
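The text does not say how the 4096 points are extracted from each mesh; uniform, area-weighted surface sampling is a common choice and is assumed in this NumPy sketch (the helper name is hypothetical).

```python
import numpy as np


def sample_surface(vertices, faces, n_points, rng):
    """Uniformly sample points on a triangle mesh surface (area-weighted)."""
    tri = vertices[faces]  # (F, 3, 3) corner coordinates
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0  # reflect samples that fall outside the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    a, b, c = tri[face_idx, 0], tri[face_idx, 1], tri[face_idx, 2]
    return a + u[:, None] * (b - a) + v[:, None] * (c - a)


vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])  # unit square split into two triangles
points = sample_surface(vertices, faces, 4096, np.random.default_rng(0))
print(points.shape)  # (4096, 3)
```

Choosing faces in proportion to their area keeps the point density uniform over the surface, so large, flat regions are not under-sampled relative to small triangles.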
Parameter selection
See the paper for other detailed parameters
GMM
Conclusion: models with full covariance matrices achieve a much smaller JSD than models trained with diagonal covariance.
With full covariance matrices, 30 or more clusters seem to be enough to reach the minimal JSD.
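The paper selects the number of components and the covariance type by the JSD of decoded samples. That requires a trained decoder, so this scikit-learn sketch swaps in held-out log-likelihood (`GaussianMixture.score`) as the selection criterion, on synthetic, strongly correlated 2-D codes. It is a different criterion from the paper's and only illustrates why diagonal covariances struggle with correlated latent dimensions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic, strongly correlated stand-in "latent codes".
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
train = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
val = rng.multivariate_normal([0.0, 0.0], cov, size=500)

results = {}
for cov_type in ("diag", "full"):
    for k in (1, 2, 4):
        gmm = GaussianMixture(n_components=k, covariance_type=cov_type,
                              random_state=0).fit(train)
        results[(cov_type, k)] = gmm.score(val)  # mean held-out log-likelihood

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

A diagonal model can only mimic the correlation by stacking many axis-aligned components, whereas a full covariance captures it directly, which matches the conclusion stated above.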
Limitations
Chairs with rare shapes are decoded incorrectly.
The AE may miss high-frequency geometric details and fails to reconstruct uncommon examples.
Copyright notice
This article was written by [^_^ Min Fei]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230611136601.html