Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer Paper Notes
2022-08-10 13:07:00 【byzy】
Original paper: https://arxiv.org/abs/2206.04584
1. Introduction
Methods that convert image features into BEV features can be divided, by whether they explicitly use geometric information, into geometry-based pointwise transformations and geometry-free global transformations.
The former (left in the figure) uses the calibrated camera intrinsics and extrinsics to establish a correspondence between image pixels and BEV grid cells. However, this approach relies heavily on calibration: in practice the camera may drift from its calibrated pose, which makes the correspondence unstable. It also often requires complex and time-consuming operations, such as dense depth distribution estimation and propagating features along camera rays into BEV space.
The latter (right in the figure) flattens the image features, and every BEV grid cell interacts with all image features. This view transformation needs no geometric prior and is therefore insensitive to camera offsets. However, its computational complexity grows with the number of image pixels, which creates a conflict between efficiency and resolution. And because there is no geometric guidance, the model has to mine discriminative information from all views, which makes convergence difficult.
This paper proposes the Geometry-guided Kernel Transformer (GKT), which uses the camera parameters as guidance without relying on them too heavily. When the camera shifts, the corresponding kernel regions move as well, but they can still cover the target, which makes the method insensitive to camera offsets. The attention weights within each kernel region are generated dynamically, so they adapt to the offset.
GKT uses a lookup-table index to eliminate the runtime 2D-3D mapping operations of pointwise transformations, which improves efficiency. Compared with global transformations, GKT needs no global interaction and attends only to the geometry-guided kernel regions, so it runs faster and converges faster. GKT therefore strikes a balance between pointwise and global transformations.
2. Method
2.1 Geometry-guided Kernel Transformer
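The figure above shows the GKT framework. Multi-scale features are extracted from each of the multi-view images by a shared CNN backbone. Each grid cell in BEV space has a 3D coordinate $P_i$ and a query embedding, where the height $z$ of $P_i$ is a predefined value shared by all grid cells. $P_i$ is roughly projected to image coordinates using the camera intrinsics and extrinsics and then rounded; the result guides the transformer to attend to the corresponding image region:

$$\bar{Q}_i^{s,v} = \operatorname{round}\left(K_{s,v} \cdot Rt_v \cdot P_i\right)$$

where $s$ indexes the feature scale, $v$ indexes the view, $K_{s,v}$ and $Rt_v$ denote the intrinsics and extrinsics, and the homogeneous image coordinates are divided by depth before rounding.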
A kernel region around the projected point is then considered: each query interacts with all features inside the corresponding kernel region in every view and at every scale (kernel positions that fall outside the image are set to 0).
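The projection-and-kernel step described above can be sketched in NumPy for a single BEV query, one view and one scale. This is a minimal illustration under stated assumptions, not the paper's code: the function names `project_to_pixel`, `gather_kernel` and `kernel_attention` are hypothetical, the camera is assumed to be a pinhole model with a (3, 3) intrinsic matrix already rescaled to the feature map and (3, 4) world-to-camera extrinsics, and the attention is a plain single-head dot product without the learned projections a real model would use.

```python
import numpy as np


def project_to_pixel(P, K, Rt):
    """Roughly project a BEV grid point into one view and round to a pixel.

    P  : (3,) world coordinate (x, y, predefined z) of a BEV grid cell
    K  : (3, 3) intrinsics, assumed already rescaled to this feature map
    Rt : (3, 4) world-to-camera extrinsics
    Returns (u, v) = (column, row), or None if the point is behind the camera.
    """
    uvw = K @ (Rt @ np.append(P, 1.0))  # homogeneous world -> image plane
    if uvw[2] <= 0:
        return None
    u, v = uvw[:2] / uvw[2]  # perspective division
    return int(np.rint(u)), int(np.rint(v))  # rounding = the "rough" projection


def gather_kernel(feat, center, kh, kw):
    """Collect the kh x kw window around `center`; taps outside the map are 0.

    feat : (H, W, C) feature map, center : (u, v). Returns (kh * kw, C).
    """
    H, W, C = feat.shape
    u0, v0 = center
    out = np.zeros((kh * kw, C), dtype=feat.dtype)
    for i in range(kh):
        for j in range(kw):
            r, c = v0 + i - kh // 2, u0 + j - kw // 2
            if 0 <= r < H and 0 <= c < W:
                out[i * kw + j] = feat[r, c]
    return out


def kernel_attention(query, tokens):
    """Single-head scaled dot-product attention of one query over its kernel."""
    scores = tokens @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ tokens


# Usage with a toy camera looking down +z.
K = np.array([[10.0, 0.0, 8.0], [0.0, 10.0, 8.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])
center = project_to_pixel(np.array([0.4, -0.36, 2.0]), K, Rt)  # (10, 6)
feat = np.random.default_rng(0).normal(size=(16, 16, 8))
tokens = gather_kernel(feat, center, kh=3, kw=3)  # (9, 8)
bev_feature = kernel_attention(np.ones(8), tokens)  # (8,)
```

In the full model this would be repeated for every view and scale, and the resulting tokens would all take part in the query's attention.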
2.2 Robustness to camera offset
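The camera offset is decomposed into a rotational offset and a translational offset. The translational offset is

$$T_{\text{devi}} = \begin{bmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

and the rotational offset is

$$R_{\text{devi}} = R_x(\theta)\, R_y(\psi)\, R_z(\varphi)$$

where $R_x(\theta)$, $R_y(\psi)$ and $R_z(\varphi)$ are the elementary rotation matrices (in homogeneous form) about the x, y and z axes.
The noise random variables follow zero-mean normal distributions: $\Delta x, \Delta y, \Delta z, \theta, \psi, \varphi \sim \mathcal{N}(0, \sigma^2)$.
After adding the offset noise, the projection from the previous section becomes

$$\bar{Q}_i^{s,v} = \operatorname{round}\left(K_{s,v} \cdot R_{\text{devi}} \cdot T_{\text{devi}} \cdot Rt_v \cdot P_i\right)$$

where $K_{s,v}$ and $Rt_v$ are the intrinsics and calibrated extrinsics of view $v$ at scale $s$.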
Because rounding tolerates noise, small offsets do not change the kernel region at all. Even with somewhat larger offsets, the kernel region can still cover the target, and the attention weights adjust dynamically to the offset.
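The offset model above can be simulated to see why rounding plus a kernel tolerates small shifts. The sketch below is an illustration under explicit assumptions, not the paper's evaluation code: the helper names are hypothetical, the noise is applied in the camera frame in the order `R_devi @ T_devi @ Rt`, and translation and rotation noise are drawn from zero-mean normals with separate standard deviations `sigma_t` and `sigma_r` (hypothetical parameters). It counts how often the perturbed rounded projection stays within `half_k` pixels of the clean one, i.e. how often a (2·half_k+1)-wide kernel still covers the target.

```python
import numpy as np


def rot(axis, a):
    """4x4 homogeneous rotation by angle a (radians) about x (0), y (1) or z (2)."""
    c, s = np.cos(a), np.sin(a)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    M = np.eye(4)
    M[i, i], M[i, j], M[j, i], M[j, j] = c, -s, s, c
    return M


def perturbed_extrinsics(Rt, dx, dy, dz, theta, psi, phi):
    """R_devi @ T_devi @ Rt for calibrated (3, 4) extrinsics (assumed camera-frame noise)."""
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    R = rot(0, theta) @ rot(1, psi) @ rot(2, phi)
    return (R @ T @ np.vstack([Rt, [0.0, 0.0, 0.0, 1.0]]))[:3]


def project(P, K, Rt):
    """Rounded pinhole projection; None if the point is behind the camera."""
    uvw = K @ (Rt @ np.append(P, 1.0))
    if uvw[2] <= 0:
        return None
    u, v = uvw[:2] / uvw[2]
    return int(np.rint(u)), int(np.rint(v))


def fraction_covered(P, K, Rt, sigma_t, sigma_r, half_k, n=1000, seed=0):
    """Share of noisy draws whose rounded projection stays within half_k pixels
    of the clean projection, i.e. the target remains inside the kernel."""
    rng = np.random.default_rng(seed)
    u0, v0 = project(P, K, Rt)
    hits = 0
    for _ in range(n):
        dx, dy, dz = rng.normal(0.0, sigma_t, 3)
        theta, psi, phi = rng.normal(0.0, sigma_r, 3)
        uv = project(P, K, perturbed_extrinsics(Rt, dx, dy, dz, theta, psi, phi))
        if uv is not None and abs(uv[0] - u0) <= half_k and abs(uv[1] - v0) <= half_k:
            hits += 1
    return hits / n


K = np.array([[10.0, 0.0, 8.0], [0.0, 10.0, 8.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])
P = np.array([0.4, -0.36, 2.0])
# A 0.01 shift at depth 2 with focal length 10 moves the point by 0.05 px.
print(project(P, K, perturbed_extrinsics(Rt, 0.01, 0, 0, 0, 0, 0)))  # (10, 6)
for half_k in (0, 1, 2):
    print(half_k, fraction_covered(P, K, Rt, sigma_t=0.1, sigma_r=0.02, half_k=half_k))
```

Because the same seed is reused, the coverage fraction can only grow with `half_k`, mirroring the observation that larger kernels are more robust.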
2.3 BEV-to-2D lookup table index
The kernel region of each BEV grid cell is fixed, so it can be computed offline. Before running, the pixel indices corresponding to each BEV grid cell are stored in a lookup table, and at runtime the features at those locations are fetched directly and efficiently.
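Because the kernel positions depend only on the fixed BEV grid and the camera parameters, they can be flattened into an integer index table once. The sketch below is a hypothetical NumPy version (the names `build_lut` and `lookup` are illustrative, not from the paper): taps that fall outside the image point at an extra all-zero row, so the runtime step is a single gather with no projection and no bounds checks.

```python
import numpy as np


def build_lut(centers, H, W, kh, kw):
    """Offline: flat feature indices for every BEV cell's kh x kw kernel.

    centers : (N, 2) integer (u, v) rounded projections, one row per BEV cell
    Returns (N, kh * kw) int array; index H * W marks an out-of-image tap.
    """
    dv, du = np.meshgrid(np.arange(kh) - kh // 2, np.arange(kw) - kw // 2,
                         indexing="ij")
    rows = centers[:, 1:2] + dv.reshape(1, -1)
    cols = centers[:, 0:1] + du.reshape(1, -1)
    valid = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    return np.where(valid, rows * W + cols, H * W)


def lookup(feat, lut):
    """Runtime: gather kernel features for all BEV cells in one indexing op."""
    H, W, C = feat.shape
    # Append one zero row so that out-of-image taps read zeros.
    table = np.concatenate([feat.reshape(H * W, C), np.zeros((1, C), feat.dtype)])
    return table[lut]  # (N, kh * kw, C)


centers = np.array([[10, 6], [0, 0]])  # two BEV cells, one view, one scale
lut = build_lut(centers, H=16, W=16, kh=3, kw=3)  # computed once, offline
feat = np.arange(16 * 16 * 2, dtype=np.float32).reshape(16, 16, 2)
kernels = lookup(feat, lut)  # (2, 9, 2)
```

With several views and scales, one such table per view and scale (or a single table over concatenated feature maps) would be a natural extension; that layout choice is an assumption here.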
2.4 Kernel configuration
The kernel size can be configured flexibly to balance receptive field against computational cost. Thanks to the lookup-table index, the kernel layout can also be chosen arbitrarily (e.g., cross-shaped or dilated kernels).
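With an index table, a kernel layout is just a list of (row, column) offsets around the projected point, so changing the layout means changing that list. The helpers below are hypothetical illustrations, not the paper's configuration code: a rectangular kernel (which covers a vertical kernel when the width is 1), a dilated kernel, and a cross-shaped kernel, each turned into flat feature indices with out-of-image taps mapped to the padding index `H * W`.

```python
import numpy as np


def rect_offsets(kh, kw, dilation=1):
    """(dv, du) taps of a kh x kw kernel; dilation > 1 spreads taps apart."""
    dv, du = np.meshgrid((np.arange(kh) - kh // 2) * dilation,
                         (np.arange(kw) - kw // 2) * dilation, indexing="ij")
    return np.stack([dv.ravel(), du.ravel()], axis=1)


def cross_offsets(arm):
    """Cross-shaped kernel: the center plus `arm` taps in each direction."""
    offs = [(0, 0)]
    for k in range(1, arm + 1):
        offs += [(-k, 0), (k, 0), (0, -k), (0, k)]
    return np.array(offs)


def lut_from_offsets(centers, offsets, H, W):
    """Flat indices for an arbitrary tap layout; H * W marks out-of-image taps."""
    rows = centers[:, 1:2] + offsets[:, 0][None, :]
    cols = centers[:, 0:1] + offsets[:, 1][None, :]
    valid = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    return np.where(valid, rows * W + cols, H * W)


center = np.array([[5, 5]])  # one BEV cell projected to (u, v) = (5, 5)
for name, offs in [("vertical 5x1", rect_offsets(5, 1)),
                   ("dilated 3x3", rect_offsets(3, 3, dilation=2)),
                   ("cross", cross_offsets(2))]:
    print(name, len(offs), lut_from_offsets(center, offs, 16, 16)[0])
```

Since the runtime gather never looks at the layout, an irregular kernel costs the same per tap as a dense one; only the number of taps matters.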
3. Experiments
Implementation details: the preset BEV grid has a relatively low resolution. Before segmentation, upsampling and convolution blocks produce a high-resolution BEV grid, which is used for map segmentation.
Main results: among all real-time methods, the proposed method is the fastest and performs best, although BEVFormer, which is far from real time, achieves better performance.
Robustness to camera offset: the experiments measure the performance drop under different noise variances and find that GKT maintains comparable performance under moderate noise. Larger kernels are more robust, and the kernel's vertical extent has the greater effect. This may be because the z value of the BEV grid is predefined and therefore carries more uncertainty.
When there is no noise, GKT achieves its best performance with a vertical kernel (horizontal width of 1).
Robustness to BEV height: because GKT uses only a rough projection, it is insensitive to the default z value.
Convergence speed: the geometric prior makes GKT converge faster than CVT (a method based on global transformation), and GKT achieves better results with a short training schedule.
Comparison of different GKT implementations:
- Im2col: unfold the image into columns, each column representing one kernel region, and select the corresponding kernel region for each BEV query. This approach requires a lot of memory.
- Grid sampling: sample all features in the kernel region and concatenate them.
- Lookup-table index: as described above.
In terms of inference speed, the lookup-table index is the fastest.
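To make the first two variants concrete, the hypothetical sketch below implements them on a single (H, W, C) feature map; it is not the paper's benchmark code. `im2col` materializes every pixel's zero-padded neighbourhood up front, which stores kh·kw copies of the map and explains the memory cost, while `grid_sample_kernel` bilinearly samples each kernel tap (a swapped-in minimal bilinear sampler, not a specific library's grid-sample routine). At integer, i.e. rounded, centers both return exactly the features a lookup-table gather would fetch.

```python
import numpy as np


def im2col(feat, kh, kw):
    """Unfold every pixel's zero-padded kh x kw neighbourhood: (H*W, kh*kw, C)."""
    H, W, C = feat.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feat, ((ph, kh - 1 - ph), (pw, kw - 1 - pw), (0, 0)))
    cols = np.empty((H, W, kh * kw, C), dtype=feat.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, :, i * kw + j] = padded[i:i + H, j:j + W]
    return cols.reshape(H * W, kh * kw, C)


def grid_sample_kernel(feat, center, kh, kw):
    """Bilinearly sample each kernel tap around a (possibly fractional) center."""
    H, W, C = feat.shape
    u0, v0 = center
    out = np.zeros((kh * kw, C))
    for i in range(kh):
        for j in range(kw):
            r, c = v0 + i - kh // 2, u0 + j - kw // 2
            r0, c0 = int(np.floor(r)), int(np.floor(c))
            fr, fc = r - r0, c - c0
            for rr, cc, w in ((r0, c0, (1 - fr) * (1 - fc)), (r0, c0 + 1, (1 - fr) * fc),
                              (r0 + 1, c0, fr * (1 - fc)), (r0 + 1, c0 + 1, fr * fc)):
                if 0 <= rr < H and 0 <= cc < W:  # zero padding outside the map
                    out[i * kw + j] += w * feat[rr, cc]
    return out


feat = np.random.default_rng(0).normal(size=(16, 16, 4))
u, v = 10, 6  # a rounded projection
a = im2col(feat, 3, 3)[v * 16 + u]  # select this BEV query's column
b = grid_sample_kernel(feat, (u, v), 3, 3)
print(np.allclose(a, b))  # True
print(im2col(feat, 3, 3).nbytes // feat.nbytes)  # 9: one copy of the map per tap
```

The grid-sampling variant does four weighted reads per tap even when the weights collapse to a single pixel, whereas a lookup-table gather does one read per tap, which is consistent with the speed ranking above.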