Spatial Pyramid Pooling (including source code)
2022-08-11 07:13:00 【KPer_Yang】
Table of Contents
1. Problems solved by Spatial Pyramid Pooling
2. Principle of Spatial Pyramid Pooling
3. Code implementation of Spatial Pyramid Pooling
Reference:
"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition"
1. Problems solved by Spatial Pyramid Pooling
Spatial pyramid pooling mainly addresses the problem of input images with inconsistent resolutions. Previously, images were scaled or cropped to a common resolution, which can easily lose image information. Figure 1.1 shows the difference between cropping or scaling and spatial pyramid pooling:
Figure 1.1 The difference between cropping, scaling and Spatial Pyramid Pooling
2. Principle of Spatial Pyramid Pooling
As shown in Figure 2.1, SPP-Net pools the feature maps with several pooling layers of different sizes, then flattens the results into vectors and concatenates them. The illustration in the paper uses three levels with 16, 4, and 1 bins, that is, grids of 4*4, 2*2, and 1*1. When applying this to your own tasks, you can change the levels according to factors such as the size of the feature map. When the height or width of the feature map is not evenly divisible by the grid size, a padding operation is required. Each level pools by dividing the feature map into a grid, so its window size depends on the size of the feature map, which differs from an ordinary pooling layer with a fixed window.
Figure 2.1 The principle diagram of Spatial Pyramid Pooling implementation
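The grid-based pooling described above can be sketched in plain Python for a single-channel feature map. This is only an illustration under stated assumptions, not the implementation from the paper or from section 3: the function name spp_single_channel, the helper make_map, the default levels (4, 2, 1), and the floor/ceil rule for bin boundaries are all choices made for this example.

```python
import math


def spp_single_channel(fmap, levels=(4, 2, 1)):
    """Max-pool a 2-D list on k x k grids and concatenate the results.

    Assumption: bin i of k along a dimension of length n covers the
    indices floor(i*n/k) .. ceil((i+1)*n/k) - 1, so bins may overlap
    but are never empty.
    """
    height, width = len(fmap), len(fmap[0])
    output = []
    for k in levels:
        for i in range(k):
            # Rows covered by the i-th bin of this level.
            r0, r1 = (i * height) // k, math.ceil((i + 1) * height / k)
            for j in range(k):
                # Columns covered by the j-th bin of this level.
                c0, c1 = (j * width) // k, math.ceil((j + 1) * width / k)
                output.append(
                    max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return output


def make_map(height, width):
    # Hypothetical toy data: each cell holds its row-major index.
    return [[r * width + c for c in range(width)] for r in range(height)]


small = spp_single_channel(make_map(5, 7))
large = spp_single_channel(make_map(13, 20))
print(len(small), len(large))  # 21 21
```

Both feature maps yield 16 + 4 + 1 = 21 values even though their sizes differ, which is the property that lets a fixed-size fully connected layer follow the pooling. With C channels the same procedure would run per channel, giving a vector of length 21 * C.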
3. Code implementation of Spatial Pyramid Pooling
import math

import torch
import torch.nn as nn


def spatial_pyramid_pool(previous_conv, num_sample, previous_conv_size, out_pool_size):
    '''
    previous_conv: a tensor of feature maps from the previous convolution layer
    num_sample: an int, the number of images in the batch
    previous_conv_size: an int vector [height, width], the feature-map size of the previous convolution layer
    out_pool_size: an int vector, the expected output size of each max pooling level

    returns: a tensor of shape [num_sample x n], the concatenation of the multi-level pooling results
    '''
    for i in range(len(out_pool_size)):
        # Window size (also used as the stride) so that out_pool_size[i] windows cover each dimension
        h_wid = int(math.ceil(previous_conv_size[0] / out_pool_size[i]))
        w_wid = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
        # nn.MaxPool2d needs integer padding, hence floor division
        h_pad = (h_wid * out_pool_size[i] - previous_conv_size[0] + 1) // 2
        w_pad = (w_wid * out_pool_size[i] - previous_conv_size[1] + 1) // 2
        maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(h_pad, w_pad))
        x = maxpool(previous_conv)
        if i == 0:
            spp = x.view(num_sample, -1)
        else:
            # Append this level's flattened output to the previous levels
            spp = torch.cat((spp, x.view(num_sample, -1)), 1)
    return spp
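To see what window size and padding the function above chooses, the arithmetic can be checked without PyTorch. The sketch below is an illustration, not part of the original code: pool_params is a hypothetical helper, the padding is assumed to be rounded down to an integer, and the output length uses the standard max pooling formula floor((size + 2 * pad - kernel) / stride) + 1 with the stride equal to the kernel.

```python
import math


def pool_params(size, bins):
    """Kernel (= stride), padding, and output length along one dimension."""
    kernel = math.ceil(size / bins)
    # Assumption: padding is floored to an integer.
    pad = (kernel * bins - size + 1) // 2
    # Standard pooling output length with stride == kernel.
    out = (size + 2 * pad - kernel) // kernel + 1
    return kernel, pad, out


for size in (13, 10, 5):
    for bins in (4, 2, 1):
        kernel, pad, out = pool_params(size, bins)
        # Flag cases where the padding is more than half the kernel.
        note = "" if 2 * pad <= kernel else "  (padding exceeds half the kernel)"
        print(f"size={size} bins={bins} kernel={kernel} pad={pad} out={out}{note}")
```

For the sizes tried here, the output length equals the requested number of bins. The flagged case (size 5 with 4 bins) needs a padding larger than half the kernel, which, as far as I know, nn.MaxPool2d rejects, so very small feature maps may need smaller pooling levels or a different pooling scheme.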