[Point cloud series] PnP-3D: A Plug-and-Play for 3D Point Clouds

2022-04-23 07:20:00 【^_^ Min Fei】

Contents

1. Summary
2. Motivation
3. Method
4. Results
5. Reflections

I have finally had time to read papers again recently. Having time to read papers is a genuinely happy thing, so let's dive back in.

1. Summary

Venue: TPAMI (a top journal), 2021; only 8 pages long
Code: https://github.com/ShiQiu0419/pnp-3d

2. Motivation

The goal is a plug-and-play feature-enhancement module that improves the effectiveness of existing point cloud analysis networks. In essence:

It provides a lightweight feature representation that enhances existing networks and is general-purpose.
Enhancement: local geometry and feature information (with a focus on geometric relevance)
Lightweight: global bilateral regularization (similar to a pruned-down version of attention)
Generality: point cloud classification, point cloud semantic segmentation, object detection

The figure below gives an intuitive picture of the effect:
[Image: intuitive illustration of the effect]

3. Method

The figure below shows the network architecture of the proposed method. It has two main parts: local content fusion and global bilateral regularization.
[Image: network architecture of the proposed method]

3.1 Local content fusion

Purpose: fuse local geometry and feature content based on geometric relevance in 3D space.
Approach: there are two branches, which encode the local geometry and the feature content separately; their outputs are concatenated at the end.

3.1.1 Local geometry branch

Purpose: positional encoding supplies information in a certain order.
[Image: local geometry branch]
The process has three steps:

1. Find the k nearest neighbors (KNN), as in DGCNN (EdgeConv).
For any point $p_i$, its k nearest neighbors are denoted $\mathcal{N}(p_i)$.
2. Build the local geometry.
This step is the core of the proposed method. For each point $p_i$, the local geometry consists of the point itself and the differences between its nearest neighbors and itself, $p_{i_k}-p_i$, so the whole is written $[p_i; p_{i_k}-p_i]$. There are two parts with 3 channels each, and there are k neighbors, so the input carries k×6 values in total.
The geometry of all points together is then denoted $\tilde{\mathcal{P}}$.
3. Encode the geometry.
This is similar to the usual PointNet operation: $\tilde{\mathcal{P}}$ passes through a 1x1 MLP + BN + ReLU, and max pooling then keeps the largest response as the local geometric information.
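The three steps above can be sketched in NumPy. This is a minimal illustration and not the repository's code: the brute-force neighbor search, the helper names (`knn_indices`, `local_geometry`, `encode`), the random weights and the output width of 32 are all assumptions, BN is left out for brevity, and each point is counted as its own nearest neighbor.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force kNN: (N, k) indices, where each point is its own first neighbor."""
    diff = points[:, None, :] - points[None, :, :]       # (N, N, 3)
    dist2 = np.sum(diff ** 2, axis=-1)                    # (N, N) squared distances
    return np.argsort(dist2, axis=1)[:, :k]

def local_geometry(points, idx):
    """Build [p_i ; p_ik - p_i] for every neighbor: shape (N, k, 6)."""
    neighbors = points[idx]                               # (N, k, 3)
    center = np.broadcast_to(points[:, None, :], neighbors.shape)
    return np.concatenate([center, neighbors - center], axis=-1)

def encode(local, weight, bias):
    """Shared per-neighbor linear layer + ReLU (BN omitted), then max pooling over k."""
    hidden = np.maximum(local @ weight + bias, 0.0)       # (N, k, C_out)
    return hidden.max(axis=1)                             # (N, C_out)

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))                        # toy point cloud
idx = knn_indices(points, k=16)
local = local_geometry(points, idx)
weight = rng.normal(size=(6, 32)) * 0.1                   # hypothetical MLP weights
bias = np.zeros(32)
geo_feat = encode(local, weight, bias)
print(local.shape, geo_feat.shape)   # (128, 16, 6) (128, 32)
```

The max over the neighbor axis makes the result independent of the order in which the k neighbors are listed, which is why the PointNet-style pooling fits here.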

3.1.2 Feature content branch

[Image: feature content branch]

Purpose: encode the extracted features.
This is really the same procedure as 3.1.1. The only difference is that it operates on the features, whereas the branch above operates on the points. So I'll just list the formula:
The MLP is shared with 3.1.1 above.
[Image: formula for the feature content branch]

3.1.3 Aggregation

Fuse the geometric information (3.1.1) and the feature content information (3.1.2) together.
[Image: fusion of the geometry and feature content branches]
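As a rough sketch of how the two branches could be combined, the snippet below applies the same encoding to the coordinates and to the features and then concatenates the results. Several things here are assumptions for illustration only: the helper name `encode_branch`, the reuse of the 3D-space neighbors for the feature branch (one reading of "based on geometric relevance in 3D space"), the output width of C/2 per branch, and the use of two separate weight matrices (the two inputs have different widths, 6 and 2C, so one matrix cannot serve both in this toy version).

```python
import numpy as np

def encode_branch(values, idx, weight):
    """[x_i ; x_ik - x_i] -> shared linear + ReLU -> max over the k neighbors."""
    neighbors = values[idx]                                        # (N, k, D)
    center = np.broadcast_to(values[:, None, :], neighbors.shape)
    local = np.concatenate([center, neighbors - center], axis=-1)  # (N, k, 2D)
    return np.maximum(local @ weight, 0.0).max(axis=1)             # (N, C_out)

rng = np.random.default_rng(0)
N, k, C = 128, 16, 64
points = rng.normal(size=(N, 3))       # toy coordinates
features = rng.normal(size=(N, C))     # toy per-point features

# Neighbors are found in 3D space and reused by both branches (assumption).
dist2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
idx = np.argsort(dist2, axis=1)[:, :k]

w_geo = rng.normal(size=(6, C // 2)) * 0.1        # hypothetical weights
w_feat = rng.normal(size=(2 * C, C // 2)) * 0.1   # hypothetical weights

geo = encode_branch(points, idx, w_geo)           # (N, C/2) local geometry
feat = encode_branch(features, idx, w_feat)       # (N, C/2) feature content
fused = np.concatenate([geo, feat], axis=-1)      # (N, C) fused local features
print(fused.shape)   # (128, 64)
```

Choosing C/2 channels per branch keeps the fused output at the input width C, which is convenient for a module meant to be dropped into an existing network without changing the layers around it.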

3.1.4 Code implementation

Definition:
[Image: code screenshot of the layer definitions]
Implementation:

3.2 Global bilateral regularization

Purpose: enhance the expressiveness of the features.
Approach: first split the feature information into point-wise and channel-wise parts, then aggregate them through regularization to form the final representation.
[Image: global bilateral regularization]

3.2.1 Channel-wise information

[Image: channel-wise branch]
In essence, a weight is learned for the features of each channel, and the averaged result is used as the descriptor. Written out from the three operations listed below, the formula is $g_c = \mathrm{avgpool}(\mathrm{ReLU}(\mathcal{F}_L\mathbf{W}_c))$.
The effect is basically the same as attention. The only difference is that not all channels are used here, only $\frac{C}{r}$ of them. Here $r > 2$ is a reduction factor whose main job is to reduce the output dimension; it is set to 8 in the experiments. Because this cuts the amount of computation, the module is much lighter than attention. The concrete operations are: 1. a matrix multiplication; 2. ReLU activation; 3. average pooling, which yields $g_c$.

Formula analysis: $\mathbf{W}_c$ is a weight matrix. ReLU serves two purposes: it provides nonlinearity, and it guarantees that the output is non-negative, a requirement that comes from formula (6). In practice this is implemented as a 1D convolution whose input is $\mathcal{F}_L$ and whose output is $\mathcal{F}_L\mathbf{W}_c$; only the dimension changes.
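The three operations of this subsection (matrix multiplication, ReLU, average pooling) can be sketched as follows. This is only an illustration: the sizes, the random weights, and the choice to average over the points (so that one value per reduced channel remains) are assumptions, not something the text above pins down.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, r = 1024, 64, 8                          # points, channels, reduction factor
F_L = rng.normal(size=(N, C))                  # toy fused local features
W_c = rng.normal(size=(C, C // r)) * 0.1       # hypothetical weight matrix

projected = np.maximum(F_L @ W_c, 0.0)         # 1. matrix multiply, 2. ReLU: (N, C/r)
g_c = projected.mean(axis=0)                   # 3. average pooling over points: (C/r,)

print(g_c.shape)            # (8,)
print(W_c.size, C * C)      # 512 4096
```

The last line shows where the saving comes from: with r = 8 the projection holds C·C/r = 512 weights instead of the C·C = 4096 a full-width projection would need.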

3.2.2 Point-wise information

[Image: point-wise branch]

The operation is the same as in the channel-wise case, so I'll just list the formula:
[Image: formula for the point-wise descriptor]

3.2.3 Aggregation

[Image: aggregation formulas]

Here, 1. an outer product is computed first; then 2. the square root is taken, as shown in formula (6).
Then $\mathcal{F}_G$ is obtained from formula (8).
The final output is formula (9), whose meaning is:
[Image: formulas (8) and (9)]
Formula (8): this follows reference [42], with two links added.
Formula (9): the output features should be more distinctive and representative. Because formula (8) is obtained by averaging, it represents the common pattern, which can be filtered out. The final output is therefore the distinctive feature map that remains after the common patterns have been filtered out.
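Putting the pieces of section 3.2 together, a minimal NumPy sketch of the whole regularization step might look like this. It is not the repository's code. The assumptions are: the channel-wise descriptor averages over points and the point-wise one averages over channels, the "two links" of formula (8) are read as adding the two projections back onto the square-rooted outer product, a plain linear map `W_up` (hypothetical) restores the C channels, and BN and any final activation are omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def global_bilateral_regularization(F_L, W_c, W_p, W_up):
    A = relu(F_L @ W_c)                 # (N, C/r) channel-wise projection
    B = relu(F_L @ W_p)                 # (N, C/r) point-wise projection
    g_c = A.mean(axis=0)                # (C/r,) one value per reduced channel
    g_p = B.mean(axis=1)                # (N,)   one value per point
    G = np.sqrt(np.outer(g_p, g_c))     # (N, C/r) outer product, then square root
    F_G = (G + A + B) @ W_up            # (N, C) assumed form of the two added links
    return F_L - F_G                    # subtract the averaged, common pattern

rng = np.random.default_rng(0)
N, C, r = 1024, 64, 8
F_L = rng.normal(size=(N, C))                   # toy fused local features
W_c = rng.normal(size=(C, C // r)) * 0.1        # hypothetical weights
W_p = rng.normal(size=(C, C // r)) * 0.1
W_up = rng.normal(size=(C // r, C)) * 0.1
out = global_bilateral_regularization(F_L, W_c, W_p, W_up)
print(out.shape)   # (1024, 64)
```

The ReLU before the averaging matters for the square root: both descriptors are non-negative, so every entry of the outer product is non-negative and the root is always defined.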

3.2.4 Code implementation

These are the two corresponding convolutions. The dimension changes, and the 8 here corresponds to $r$.
Definition:
[Image: code screenshot of the layer definitions]
Implementation:

4. Results

4.1 Classification

Input: 1024 points
RS-CNN + PnP-3D and PointNet++ + PnP-3D were tested with PnP-3D added after every SA layer.
For the remaining networks, PnP-3D is added after each convolution block.
[Image: classification results]

4.2 Semantic segmentation

[Image: semantic segmentation results]

4.3 Object detection

[Image: object detection results]

4.4 Comparison with the existing baseline for the task

[Image: comparison with the existing baselines]

4.5 Ablation study: formulas

Figure 4: ablation on formulas 4, 5 and 9. Using average pooling throughout, together with the subtraction strategy, works best.
Figure 5: ablation on the final aggregation formula. The geometric mean turns out to be the best choice.
[Image: Figures 4 and 5, ablation results]

4.6 Ablation study: attention

[Image: attention ablation results]

4.7 Visualization

To be honest, it looks like an enhancement along the edges, but it is not very obvious.
[Image: feature visualization]
The main point here is where the proposed module is used: it sits at the core position.