[Point cloud series] Fully Convolutional Geometric Features

2022-04-23 13:18:00 【^_^ Min Fei】

Contents

1. Summary
2. Motivation
3. Method
4. Experimental results
5. Conclusions and thoughts
6. References

Part of my backlog-clearing series. This post sat unfinished for a long time.

1. Summary

Paper: Fully Convolutional Geometric Features
Code: https://github.com/chrischoy/fcgf

Background:
In reverse engineering, the set of points measured on a product's outer surface by a measuring instrument is called a point cloud. A three-dimensional coordinate measuring machine usually yields relatively few points with large spacing between them, which is called a sparse point cloud. A 3D laser scanner or similar scanning device yields many densely spaced points, which is called a dense point cloud.

2. Motivation

Existing methods often need low-level features computed beforehand as input, or they extract features from patches and so have a limited receptive field.
Discriminative 3D features matter, especially for registration, tracking, and scene flow tasks.
The paper therefore proposes FCGF, which computes point cloud features with a fully convolutional network, requires no preprocessing, and yields compact features (32 dimensions).
Specifically: a sparse representation based on Minkowski convolution + features extracted by a ResUNet + new metric-learning losses.

Basically, it can be understood as a U-Net built from Minkowski convolutions: the sparsity of Minkowski convolution, together with residual blocks and U-Net's feature retention, enables a compact representation.

3. Method

The overall framework ：

The design essentially benefits from the strong performance of U-Net combined with residual connections.
Residual block: two convolution operations with a skip connection to the output, as shown in the orange box on the right of the figure.
Encoder: 3 (Conv + BN + Res) blocks, the blue modules. Kernel size: $3\times 3$. The first convolution uses stride 1, the others use stride 2.
Decoder: 3 (Transposed Conv + BN + Res) blocks, the yellow modules. Except for the first Transposed Conv, each one takes two inputs.
Feature extraction layer: the final Conv, which outputs 32 channels.
Point cloud representation: a coordinate matrix C plus a feature matrix F, i.e. the input format used by Minkowski convolution.
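To make the C + F representation concrete, here is a minimal sketch of how raw points might be quantized into that form. It uses only the Python standard library. The function name `quantize`, the voxel size, and the constant occupancy feature are illustrative assumptions, not the FCGF or Minkowski Engine API.

```python
import math

def quantize(points, voxel_size):
    """Map raw xyz points to integer voxel coordinates, keeping one entry per occupied voxel.

    Returns (C, F): C is a sorted list of integer coordinates, F holds one feature
    row per coordinate. The feature is a constant 1.0 (occupancy), an assumption
    made for this sketch.
    """
    occupied = {}
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        occupied.setdefault(key, 1.0)  # points falling in the same voxel are merged
    C = sorted(occupied)
    F = [[occupied[c]] for c in C]
    return C, F

points = [(0.1, 0.2, 0.3), (0.2, 0.3, 0.4), (1.6, 0.1, -0.2)]
C, F = quantize(points, voxel_size=0.5)
print(C)  # the first two points share one voxel
print(F)
```

Merging points that fall into the same voxel is also where the information loss mentioned in the conclusions comes from.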

Loss function

Four loss functions are compared:

Contrastive loss
Triplet loss
Hardest-contrastive loss
Hardest-triplet loss

The basic design idea:
If (i, j) is a positive pair, their feature distance should satisfy $D(f_i, f_j) \to 0$. In practice it is enough to require $D(f_i, f_j) < m_p$, which helps prevent overfitting.
If (i, j) is a negative pair, their feature distance should satisfy $D(f_i, f_j) > m_n$.
$m_p$ and $m_n$ are the margins (thresholds). The four losses are compared below.
[Figure: comparison of the four losses. Blue arrows: positive pairs. Orange arrows: negative pairs.]

Contrastive loss

[Equation image: contrastive loss]
Notation:
$I_{ij}=1$ if (i, j) is a positive pair, otherwise $I_{ij}=0$.
$\bar{I}_{ij}=1$ if (i, j) is a negative pair, otherwise $\bar{I}_{ij}=0$.
Positive pairs are obtained as nearest neighbors under the ground-truth (GT) alignment of the 3DMatch data. Negative pairs are sampled randomly, and a hash table filters out any sampled pair that is actually a positive pair.
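The formula image is missing from this post, so the following is a sketch reconstructed from the definitions above. It assumes the per-pair loss has the form $I_{ij}[D(f_i,f_j)-m_p]_+^2 + \bar{I}_{ij}[m_n-D(f_i,f_j)]_+^2$, where $[\cdot]_+ = \max(0,\cdot)$. The margin values are illustrative, and the exact form should be checked against the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(f_i, f_j, is_positive, m_p=0.1, m_n=1.4):
    """Per-pair contrastive loss with a positive margin m_p and a negative margin m_n.

    m_p and m_n are illustrative values chosen for this sketch.
    """
    d = dist(f_i, f_j)
    if is_positive:
        return max(d - m_p, 0.0) ** 2  # penalize positives farther apart than m_p
    return max(m_n - d, 0.0) ** 2      # penalize negatives closer than m_n

print(contrastive_loss((0.0, 0.0), (0.05, 0.0), is_positive=True))   # inside m_p: no penalty
print(contrastive_loss((0.0, 0.0), (1.0, 0.0), is_positive=False))   # closer than m_n: penalized
```

Positive pairs closer than $m_p$ contribute nothing, which is how the margin keeps the network from pushing already-good matches all the way to distance zero.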

Hardest-contrastive loss

[Equation image: hardest-contrastive loss]
Notation:
The formula has three terms. The positive-pair term is the same as in the contrastive loss. The negative part is split into two terms, each normalized by its own count. In effect, the negative part of the loss concentrates on the hardest negatives, the ones most easily mistaken for positives.
$P$: the positive pairs. Their number normalizes the positive term.
$P_i$ and $P_j$: the negatives. Each of the two points in a positive pair gets its own negative, hence the two terms.
Figure 3 shows this clearly.
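As a rough sketch of the three-term structure described above: for every positive pair (i, j), the closest non-matching feature is mined once for $f_i$ and once for $f_j$. The function name, the 0.5 weights on the two negative terms, the margin values, and filtering positives before taking the minimum are all assumptions of this sketch, not the paper's exact procedure.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(values):
    return sum(values) / len(values) if values else 0.0

def hardest_contrastive(feats_x, feats_y, pos_pairs, m_p=0.1, m_n=1.4):
    """Positive term plus two hardest-negative terms (one per side of each pair)."""
    pos_set = set(pos_pairs)  # hash lookup to exclude true matches from the negatives
    pos, neg_i, neg_j = [], [], []
    for i, j in pos_pairs:
        pos.append(max(dist(feats_x[i], feats_y[j]) - m_p, 0.0) ** 2)
        # hardest negative for f_i: closest feature in Y that is not a match of i
        cand = [dist(feats_x[i], feats_y[k])
                for k in range(len(feats_y)) if (i, k) not in pos_set]
        if cand:
            neg_i.append(max(m_n - min(cand), 0.0) ** 2)
        # hardest negative for f_j: closest feature in X that is not a match of j
        cand = [dist(feats_x[k], feats_y[j])
                for k in range(len(feats_x)) if (k, j) not in pos_set]
        if cand:
            neg_j.append(max(m_n - min(cand), 0.0) ** 2)
    return mean(pos) + 0.5 * mean(neg_i) + 0.5 * mean(neg_j)

fx = [(0.0, 0.0), (1.0, 0.0)]
fy = [(0.0, 0.0), (1.0, 0.0)]
print(hardest_contrastive(fx, fy, [(0, 0), (1, 1)]))
```

In this toy input the positives coincide exactly, so the whole loss comes from negatives sitting at distance 1.0, inside the assumed margin of 1.4.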

Triplet loss

[Equation image: triplet loss]
Notation:
The goal is to minimize the distance between a feature and its positive sample while maximizing the distance between the feature and its negative sample.
$f$: the current (anchor) feature.
$f_+$: a positive sample of $f$.
$f_-$: a negative sample of $f$.
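The formula image is missing here as well. The sketch below uses the standard margin-based triplet form $[m + D(f, f_+) - D(f, f_-)]_+$ as an assumption. Whether the paper squares the distances, and which margin it uses, should be checked against the original.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(f, f_pos, f_neg, margin=0.5):
    """Standard margin triplet loss; the margin value is illustrative."""
    return max(margin + dist(f, f_pos) - dist(f, f_neg), 0.0)

anchor = (0.0, 0.0)
print(triplet_loss(anchor, (0.1, 0.0), (2.0, 0.0)))  # negative is far away: no penalty
print(triplet_loss(anchor, (0.1, 0.0), (0.3, 0.0)))  # negative is too close: penalized
```

The loss only constrains the gap between the two distances, not their absolute values, which is the main difference from the contrastive form.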

Hardest-triplet loss

[Equation image: hardest-triplet loss]
Notation:
For each positive pair (i, j), one triplet is built around $i$ and another around $j$. Each of the two points gets its own negative sample, so the loss becomes a two-term triplet loss. The aim is for the distance between positive samples to be small and the distance to negative samples to be large.
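A hedged sketch of the two-triplets-per-pair idea: each positive pair (i, j) contributes one triplet with the hardest negative for $f_i$ and one with the hardest negative for $f_j$. The function name, the margin, and normalizing by the number of triplets formed are assumptions of this sketch.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hardest_triplet(feats_x, feats_y, pos_pairs, margin=0.5):
    """Two triplet terms per positive pair, each using the closest non-matching feature."""
    pos_set = set(pos_pairs)
    total, count = 0.0, 0
    for i, j in pos_pairs:
        d_pos = dist(feats_x[i], feats_y[j])
        neg_for_i = [dist(feats_x[i], feats_y[k])
                     for k in range(len(feats_y)) if (i, k) not in pos_set]
        neg_for_j = [dist(feats_x[k], feats_y[j])
                     for k in range(len(feats_x)) if (k, j) not in pos_set]
        for candidates in (neg_for_i, neg_for_j):
            if candidates:
                total += max(margin + d_pos - min(candidates), 0.0)
                count += 1
    return total / count if count else 0.0

fx = [(0.0, 0.0), (1.0, 0.0)]
fy = [(0.0, 0.0), (1.0, 0.0)]
print(hardest_triplet(fx, fy, [(0, 0), (1, 1)], margin=1.5))
```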

4. Experimental results

Experimental setup

The optimizer is SGD with an initial learning rate of 0.1 and exponential learning-rate decay ($\gamma = 0.99$). The batch size is 4 and training runs for 100 epochs. Training uses random scaling (0.8 to 1.2) and random rotation (0-360°) as data augmentation.
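To give a feel for these settings, here is a small sketch of the schedule and the augmentation. It assumes the decay is applied once per epoch and, for brevity, rotates only about the z axis. Both are simplifications made for this sketch, and the function names are hypothetical.

```python
import math
import random

def learning_rate(epoch, base_lr=0.1, gamma=0.99):
    """Exponential decay, assuming one decay step per epoch."""
    return base_lr * gamma ** epoch

def augment(points, rng):
    """Random uniform scale in [0.8, 1.2] and a random rotation about the z axis."""
    scale = rng.uniform(0.8, 1.2)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y), scale * z)
            for x, y, z in points]

print(learning_rate(0), learning_rate(100))
print(augment([(1.0, 0.0, 0.0)], random.Random(0)))
```

Under the per-epoch assumption, the learning rate falls from 0.1 to roughly 0.037 by epoch 100.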

Datasets

3DMatch
KITTI

Evaluation metrics

Feature-match recall (FMR)

Notation: FMR is the fraction of point cloud pairs whose features are judged good enough, meaning that the share of feature matches landing within $\tau_1$ of their ground-truth position exceeds $\tau_2$.
$1$: the indicator function.
$\Omega_s$: the nearest-neighbor correspondences of the $s$-th point cloud pair.
$T^*$: the ground-truth rigid transformation (rotation and translation) of the point cloud pair.
$y_j = \arg\min_{y_j} \|F_{x_i}-F_{y_j}\|$, i.e. the point in $Y$ whose feature is closest to that of $x_i$.
$\tau_1=0.1$, $\tau_2=0.05$.
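The FMR formula image is missing, so this is a sketch built from the notation above. It matches each $x_i$ to its nearest neighbor in feature space, counts the match as an inlier if $\|T^* x_i - y_j\| < \tau_1$, and counts the pair as a success if the inlier ratio exceeds $\tau_2$. The data layout (a tuple per pair, with $T^*$ passed as a callable) is an assumption of this sketch.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def feature_match_recall(pairs, tau1=0.1, tau2=0.05):
    """pairs: list of (X, Y, FX, FY, T), where T maps a point of X into Y's frame."""
    successes = 0
    for X, Y, FX, FY, T in pairs:
        inliers = 0
        for x, fx in zip(X, FX):
            # nearest neighbor of x in feature space
            j = min(range(len(Y)), key=lambda k: dist(fx, FY[k]))
            if dist(T(x), Y[j]) < tau1:
                inliers += 1
        if inliers / len(X) > tau2:
            successes += 1
    return successes / len(pairs)

identity = lambda p: p
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
good = (pts, pts, [(0.0,), (1.0,)], [(0.0,), (1.0,)], identity)  # features match correctly
bad = (pts, pts, [(0.0,), (1.0,)], [(1.0,), (0.0,)], identity)   # features are swapped
print(feature_match_recall([good, bad]))
```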
Registration recall

Notation: for a pair of point clouds (i, j), this measures the RMSE of the ground-truth correspondences under the estimated transformation $\hat{T}_{i,j}$.
$\Omega^*$: the set of ground-truth correspondences. For pairs (i, j) with at least 30% overlap, the registration counts as correct if $E_{RMSE}<0.2$ m.
Relative translation and rotation errors:
$|\hat{T} - T^*|$, where $\hat{T}$ is the predicted translation and $T^*$ is the GT.
$\arccos((\mathrm{Tr}(\hat{R}^T R^*)-1)/2)$, where $\hat{R}$ is the predicted rotation matrix and $R^*$ is the GT.
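The two error formulas and the RMSE check can be written out directly. This is a minimal sketch using plain nested lists for matrices. The function names and the callable used for the estimated transformation are assumptions, and the clamp before `acos` is added only to guard against rounding.

```python
import math

def translation_error(t_hat, t_gt):
    """|t_hat - t_gt| as a Euclidean norm."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_hat, t_gt)))

def rotation_error_deg(R_hat, R_gt):
    """arccos((Tr(R_hat^T R_gt) - 1) / 2), returned in degrees."""
    # Tr(A^T B) equals the sum of elementwise products of A and B
    trace = sum(R_hat[r][c] * R_gt[r][c] for r in range(3) for c in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def registration_rmse(correspondences, transform):
    """RMSE of ground-truth correspondences (x, y) after applying the estimated transform to x."""
    sq = [sum((a - b) ** 2 for a, b in zip(transform(x), y))
          for x, y in correspondences]
    return math.sqrt(sum(sq) / len(sq))

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
print(rotation_error_deg(I3, Rz90))
print(translation_error((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
rmse = registration_rmse([((0.0, 0.0, 0.0), (0.1, 0.0, 0.0))], lambda p: p)
print(rmse < 0.2)  # counted as a correct registration under the 0.2 m threshold
```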

Experiments

Feature-match recall plot: the proposed method performs best.
[Figure: feature-match recall plot]
Visualization of matches:

Visualization on the KITTI dataset:

Results on the 3DMatch dataset: low feature dimension, strong performance.

Ablation: output feature dimension. 32 works best.

Ablation: loss functions.
For the contrastive losses, normalized features beat unnormalized ones.
Hardest-contrastive beats plain contrastive and is the best of all four.
For the triplet losses, unnormalized features beat normalized ones.
Hardest-triplet beats plain triplet, but it collapses easily.
Effect of different margin settings:
In general, the larger $\frac{m_n}{m_p}$, the better, but once the ratio exceeds 30 performance starts to drop.

3DMatch dataset: registration recall results. The proposed method is best on average.

Results on the KITTI dataset:

5. Conclusions and thoughts

A fully convolutional network built on Minkowski convolution. The sparse representation reduces GPU memory use.
Quantizing the point cloud for the sparse representation loses some point cloud information.
Loss design: hashing speeds up the generation of triplets.
A natural follow-up is to use it in an end-to-end point cloud registration task.