[Paper Reading] [3D Object Detection] Voxel Transformer for 3D Object Detection
2022-04-23 04:37:00 【Lukas88664】
Paper title: Voxel Transformer for 3D Object Detection
ICCV 2021
Most current approaches operate directly on the point cloud: the points are first grouped, and a transformer is then applied to each group. This paper instead proposes a voxel-based transformer that can be plugged into voxel-based detectors, making it easy to extract global features from the 3D voxels.
As usual, the overview figure comes first!
As the figure shows, the article's main innovation lies in the 3D backbone, which means the module can be applied to any voxel-based one-stage or two-stage detector.
3D convolutions on voxelized point clouds fall into two main categories: sparse and submanifold.
Their computation is essentially the same; they differ only in which attending voxels are used. For both kinds of 3D operations, refer to the SECOND detector.
In short, sparse convolution is used for downsampling, while submanifold convolution preserves the sparsity pattern.
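To make the difference concrete, here is a toy sketch (my own illustration, not SECOND's code) of which output sites each convolution type treats as active on a 1-D sparse grid:

```python
def sparse_active_sites(active, kernel=3, size=10):
    # Regular sparse convolution: every site covered by the kernel of
    # any active input becomes active, so the sparsity pattern dilates
    # at every layer (hence its use for downsampling).
    r = kernel // 2
    out = set()
    for i in active:
        for d in range(-r, r + 1):
            if 0 <= i + d < size:
                out.add(i + d)
    return sorted(out)

def submanifold_active_sites(active, kernel=3, size=10):
    # Submanifold convolution: outputs are computed only at sites that
    # were already active, so the sparsity pattern is preserved exactly.
    return sorted(set(active))

print(sparse_active_sites({4, 7}))       # neighbours become active too
print(submanifold_active_sites({4, 7}))  # pattern unchanged
```

With active inputs at sites 4 and 7, the sparse rule activates 3 through 8, while the submanifold rule keeps only 4 and 7.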
For each non-empty voxel, a transformer operation is carried out over its attending voxels (what an attending voxel is will be defined below). For the positional encoding, relative position encoding is chosen; anyone with a basic grasp of transformers will understand it from the formula below.
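As a concrete stand-in for that formula, here is a minimal dense sketch of single-head attention with relative position encoding. The weight names Wq/Wk/Wv/Wpos and the exact way the positional term enters the keys and values are my assumptions, not the paper's released code:

```python
import numpy as np

def rel_pos_attention(feats, coords, Wq, Wk, Wv, Wpos):
    # feats: (N, C) voxel features; coords: (N, 3) voxel centres.
    # The offset between two voxel centres is projected by Wpos and
    # added to the key and value of every attending voxel.
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv          # each (N, C)
    rel = coords[:, None, :] - coords[None, :, :]         # (N, N, 3)
    e = rel @ Wpos                                        # (N, N, C)
    # logits[i, j] = q_i . (k_j + e_ij) / sqrt(C)
    logits = np.einsum('ic,ijc->ij', q, k[None, :, :] + e) / np.sqrt(q.shape[1])
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                     # row-wise softmax
    # out_i = sum_j a_ij * (v_j + e_ij)
    return np.einsum('ij,ijc->ic', a, v[None, :, :] + e)
```

Because only offsets between positions are used, the result is invariant to translating the whole point cloud, which is the main appeal of relative over absolute encoding.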
The submanifold layer
Its querying voxels are all the non-empty voxels. First the two kinds of attention operations are performed, and their output is added to the input (a residual connection), followed by BatchNorm. The result is then fed into the feed-forward layer, which performs a submanifold convolution, followed by another residual connection, another BatchNorm, and finally a ReLU activation before a projection. Note that BatchNorm is used here and dropout (the random inactivation of neurons) is removed; the author argues this helps the learning process. (The two attention operations mentioned here are explained below.)
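The block order described above can be sketched roughly like this (my reading of the text, not the released code; the `attend` callable stands in for the two attention modules, and the BatchNorm is a stateless inference-style simplification):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Per-channel normalisation over the voxel axis; a real BatchNorm
    # also keeps running statistics and learned affine parameters.
    return (x - x.mean(0)) / np.sqrt(x.var(0) + eps)

def submanifold_votr_block(x, attend, W1, W2):
    # attention -> residual -> BatchNorm, then feed-forward ->
    # residual -> BatchNorm -> ReLU.  BatchNorm replaces the usual
    # LayerNorm, and there is no dropout anywhere in the block.
    h = batch_norm(x + attend(x))              # attention + residual + BN
    f = np.maximum(h @ W1, 0.0) @ W2           # two-layer feed-forward
    return np.maximum(batch_norm(h + f), 0.0)  # residual + BN + ReLU
```

Because the querying voxels equal the non-empty input voxels, the residual `x + attend(x)` is well defined; that is exactly what breaks in the sparse layer below.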
The sparse layer
It needs to perform the querying operation at some empty voxels, which have no features of their own, so an estimation function is used. The article says it could, for example, interpolate over the attending voxels, but the network directly adopts max pooling. Obviously the output of the self-attention layer no longer matches the input structure, so this layer drops the first residual connection.
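A minimal sketch of that estimation step, assuming the attending-voxel features are stacked in a 2-D array:

```python
import numpy as np

def init_empty_query(attending_feats):
    # An empty querying voxel has no feature of its own, so a starting
    # feature is pooled from its non-empty attending voxels.  The
    # article mentions alternatives such as interpolation; the network
    # simply takes the channel-wise maximum.
    return np.asarray(attending_feats).max(axis=0)
```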
Now let's explain the two attention modules.
The two modules are distinguished mainly by how their attending voxels are chosen.
Local attention
The voxels attending in this module are the voxels near the current querying voxel, roughly all non-empty voxels within a convolution-kernel-sized window.
A transformer operation is applied to them. Clearly, for the current querying voxel, its feature is a fusion of all voxels in the current receptive field, and compared with convolution the transformer is more receptive to features from nearby voxels.
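The attending-voxel selection for local attention can be sketched as a simple neighbourhood filter (radius 1, i.e. a 3x3x3 window, is my illustrative choice, not a value from the paper):

```python
def local_attending(query, nonempty, radius=1):
    # The attending set of a querying voxel is every non-empty voxel
    # whose integer coordinates fall inside a kernel-sized window
    # around it (Chebyshev distance <= radius).
    qx, qy, qz = query
    return [v for v in nonempty
            if max(abs(v[0] - qx), abs(v[1] - qy), abs(v[2] - qz)) <= radius]
```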
Dilated attention
This part is analogous to dilated convolution, as the similar name suggests; its main purpose is to enlarge the receptive field:
The article says that with a reasonable choice of attending voxels, a single sparse attention layer can push the query range out to 15 m.
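A sketch of how dilated attending voxels might be selected: nearby shells are sampled densely and distant shells ever more coarsely. The (start, end, stride) triples below are illustrative values of my own, not the paper's:

```python
def dilated_attending(query, nonempty, ranges=((1, 3, 1), (3, 9, 2), (9, 27, 6))):
    # For each (start, end, stride) shell, probe strided offsets around
    # the querying voxel and keep the ones that land on non-empty
    # voxels.  Larger strides far away keep the attending set small
    # while the receptive field grows.
    nonempty = set(nonempty)
    qx, qy, qz = query
    out = set()
    for start, end, stride in ranges:
        for dx in range(-end, end + 1, stride):
            for dy in range(-end, end + 1, stride):
                for dz in range(-end, end + 1, stride):
                    d = max(abs(dx), abs(dy), abs(dz))
                    cand = (qx + dx, qy + dy, qz + dz)
                    if start <= d < end and cand in nonempty:
                        out.add(cand)
    return sorted(out)
```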
Finally, the attention operations above can be understood with the help of the article's diagram:
After these two attention operations, the network fuses the local feature with the wide-receptive-field feature.
The author then proposes voxel query, a fast method for fetching non-empty voxels. The main idea is to encode (hash) the non-empty voxels once up front; whenever attention must be performed on a voxel, its attending voxels are fetched directly through their codes. This significantly reduces the model's complexity:
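The voxel-query idea can be sketched with an ordinary hash map standing in for the GPU hash table used in practice:

```python
def build_voxel_hash(nonempty_coords):
    # Hash every non-empty voxel's integer coordinate once; the value
    # is its index into the feature array.
    return {tuple(c): i for i, c in enumerate(nonempty_coords)}

def query_voxels(table, candidates):
    # Each attending-voxel lookup is now an O(1) average hash probe
    # instead of a linear scan over all non-empty voxels.
    return [table[tuple(c)] for c in candidates if tuple(c) in table]
```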
The results are very good:
The ablation study compares the necessity of the different attention operations:
The effect of removing the dropout (random inactivation) layer:
The number of attending voxels:
Finally, the inference speed and model size are compared with those of conventional models.
This is the first time I have seen a transformer applied directly to voxels; it is quite novel.
Copyright notice: this article was written by [Lukas88664]. Please include a link to the original when reposting. Thanks!
https://yzsam.com/2022/04/202204230407539009.html