
CVPR2022——Not All Points Are Equal : IA-SSD

2022-08-11 06:17:00 【zhSunw】

IA-SSD

Not All Points Are Equal:IA-SSD

Preface: like RandLA-Net, which I read recently, this paper takes the downsampling method as its entry point.

Motivation

Current point-based methods all use "task-agnostic" sampling (sampling unrelated to the detection task itself): random sampling, D-FPS, and Feat-FPS. These sampling methods all ignore one fact: "for the detection task, foreground points are more important than background points".

Contribution

This paper proposes two "learnable, task-oriented, instance-aware" sampling methods, i.e. sampling that is learned for the detection task itself.
An efficient detector, IA-SSD, is built on these sampling methods.
Extensive experiments are carried out on the KITTI, Waymo, and ONCE datasets.

Key Knowledge

Instance-aware Downsampling Strategy

Class-aware Sampling
An extra branch is trained to learn per-point semantics: it predicts a foreground probability score for each point, and the top-k points by score are kept as the sampled points and passed to the next layer.
The loss function is the standard cross-entropy loss.

Difference from Feature-FPS: this method tries to keep as many foreground points as possible, while F-FPS tries to keep points whose features are as far apart as possible.
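The top-k selection described above can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions, not the authors' implementation: `class_aware_sample` and `bce_loss` are hypothetical names, the scores are treated as a single binary foreground probability per point, and the loss is averaged over points.

```python
import numpy as np

def class_aware_sample(points, fg_scores, k):
    """Keep the k points with the highest predicted foreground score.

    points:    (N, 3) array of xyz coordinates (assumed layout)
    fg_scores: (N,) predicted foreground probabilities in [0, 1]
    """
    k = min(k, len(points))
    # argpartition finds the k best scores without a full sort
    top_idx = np.argpartition(-fg_scores, k - 1)[:k]
    # order the kept indices by descending score for readability
    top_idx = top_idx[np.argsort(-fg_scores[top_idx])]
    return points[top_idx], top_idx

def bce_loss(fg_scores, fg_labels, eps=1e-7):
    """Plain binary cross-entropy, averaged over points (assumed reduction)."""
    p = np.clip(fg_scores, eps, 1.0 - eps)
    per_point = -(fg_labels * np.log(p) + (1.0 - fg_labels) * np.log(1.0 - p))
    return float(per_point.mean())
```

Unlike FPS variants, nothing here looks at distances between points: the selection depends only on the learned score, which is why the sampled set can concentrate on foreground objects.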
Centroid-aware Sampling
Building on Class-aware Sampling, a centerness mask is introduced.

The mask uses the same centerness definition as in 3DSSD.

The center mask weights the cross-entropy loss, so that points close to an instance center are more likely to be sampled and center points are preserved as much as possible (instance center estimation is the key to the final object detection).
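A rough sketch of the mask and the weighted loss, as I understand them. The assumptions are mine: the box is axis-aligned (the heading angle is ignored), centerness is the cube root of the product of the three min/max face-distance ratios as in 3DSSD, the mask multiplies only the foreground term of the loss, and the loss is averaged over points. `centerness_mask` and `centroid_aware_loss` are hypothetical names.

```python
import numpy as np

def centerness_mask(points, box_min, box_max):
    """Centerness in [0, 1] w.r.t. an axis-aligned box; 0 for points outside.

    points: (N, 3); box_min, box_max: (3,) opposite corners of the box.
    """
    d_lo = points - box_min          # distances to the three "lower" faces
    d_hi = box_max - points          # distances to the three "upper" faces
    inside = np.all((d_lo >= 0) & (d_hi >= 0), axis=1)
    near = np.minimum(d_lo, d_hi)
    far = np.maximum(d_lo, d_hi)
    ratio = np.clip(near / np.maximum(far, 1e-9), 0.0, 1.0)
    mask = np.cbrt(np.prod(ratio, axis=1))
    return np.where(inside, mask, 0.0)

def centroid_aware_loss(fg_scores, fg_labels, mask, eps=1e-7):
    """Cross-entropy whose foreground term is weighted by the centerness mask."""
    p = np.clip(fg_scores, eps, 1.0 - eps)
    per_point = -(mask * fg_labels * np.log(p)
                  + (1.0 - fg_labels) * np.log(1.0 - p))
    return float(per_point.mean())
```

A point at the exact box center gets weight 1 and a point on a box face gets weight 0, so foreground points near the center contribute most to the loss and end up with the highest scores at sampling time.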

Contextual Instance Centroid Perception

Contextual Centroid Prediction
Following VoteNet, each point predicts an offset to its instance center. A regularization term is added so that the center predictions of each instance cluster together, which reduces the instability of the predicted center offsets.

Unlike VoteNet, which uses only the points inside the BBox to predict the center, this paper also makes use of the surrounding representative points: the BBox is manually expanded or scaled up so that it covers more relevant context near the object.
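To make the box enlargement concrete, here is a small sketch of how points could be selected for center-offset supervision. All of it is assumption for illustration: the box is axis-aligned, `margin` and `scale` are hypothetical parameters, and the offset target is simply the box center minus the point.

```python
import numpy as np

def enlarge_box(size, margin=0.0, scale=1.0):
    """Box size after scaling, then padding every side by `margin`."""
    return np.asarray(size, dtype=float) * scale + 2.0 * margin

def points_in_box(points, center, size):
    """Boolean mask of points inside an axis-aligned box."""
    half = np.asarray(size, dtype=float) / 2.0
    return np.all(np.abs(points - np.asarray(center)) <= half, axis=1)

def offset_targets(points, center, size, margin=0.0, scale=1.0):
    """Select points inside the enlarged box and return their offsets to the center."""
    big = enlarge_box(size, margin, scale)
    keep = points_in_box(points, center, big)
    return keep, np.asarray(center, dtype=float) - points[keep]
```

With `margin=0` and `scale=1` this reduces to using only the points inside the original box; a positive margin or a scale above 1 lets nearby context points take part in the center prediction as well.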
Centroid-based Instance Aggregation
For each center point, a PointNet++ module learns the features of the instance: the neighboring points are transformed into a local canonical coordinate system, and their features are then aggregated through a shared MLP and a symmetric function.
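The aggregation step can be sketched as a toy PointNet++-style set abstraction. This is a simplified illustration, not the paper's network: the "shared MLP" is a single linear layer with ReLU, neighbors are found with a plain radius query, the symmetric function is max pooling, and `aggregate_instance`, `w` and `b` are hypothetical names.

```python
import numpy as np

def aggregate_instance(centroid, points, feats, radius, w, b):
    """Aggregate neighbor features around one centroid.

    centroid: (3,)    predicted instance center
    points:   (N, 3)  xyz of candidate neighbors
    feats:    (N, C)  per-point features
    w, b:     (3 + C, D) and (D,) weights of a one-layer shared MLP
    """
    dist = np.linalg.norm(points - centroid, axis=1)
    nbr = dist <= radius
    if not nbr.any():
        return np.zeros(w.shape[1])
    # express neighbor coordinates relative to the centroid (local frame)
    local_xyz = points[nbr] - centroid
    x = np.concatenate([local_xyz, feats[nbr]], axis=1)
    # the same weights are applied to every neighbor (shared MLP + ReLU)
    h = np.maximum(x @ w + b, 0.0)
    # max pooling is symmetric, so the result ignores neighbor order
    return h.max(axis=0)
```

Because coordinates are taken relative to the centroid and the pooling is symmetric, the output does not change when the neighbors are reordered or when the whole scene is translated.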
Proposal Generation Head
Predicts the BBoxes from the aggregated instance features, then applies 3D-NMS as post-processing.

Loss

The individual losses are added together and optimized jointly, which allows end-to-end training.

Experiment

Comparison of sampling methods on the KITTI validation set

With few sampled points (256 points), the two sampling methods proposed in the paper retain a clearly higher ratio of instances than the other sampling methods. Feature-FPS takes the features of each point into account, so its instance sampling ratio is also higher than that of random sampling and D-FPS.

Quantitative comparison of detection performance of different methods on the KITTI test set

IA-SSD performs well on the Car and Cyc classes: its accuracy is high among the point-based methods, though lower than PV-RCNN. It performs poorly on the Ped class. Its detection speed is also higher than that of all the other methods.