
Spatial Pyramid Pooling -Spatial Pyramid Pooling (including source code)

2022-08-11 07:13:00 KPer_Yang

Table of Contents

Reference:

1. Problems solved by Spatial Pyramid Pooling

2. Principle of Spatial Pyramid Pooling

3. Code implementation of Spatial Pyramid Pooling


Reference:

"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition"

Paper link: [1406.4729] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (arxiv.org)

1. Problems solved by Spatial Pyramid Pooling

Spatial pyramid pooling is mainly used to solve the problem of inconsistent input image resolutions. Because the fully connected layers at the end of a convolutional network require a fixed-length input, earlier approaches forced a fixed resolution by scaling (warping) or cropping the image, which easily loses or distorts image information. The difference between these methods and spatial pyramid pooling is shown in Figure 1.1:

Figure 1.1 The difference between cropping, scaling and Spatial Pyramid Pooling
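To see why a fixed input resolution was needed in the first place, here is a minimal sketch (not from the paper; the toy layer and sizes are my own illustration): the flattened feature length that a fully connected layer would receive changes with the input resolution, so a fixed fully connected layer can only accept one input size.

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)  # toy convolution layer
for size in (224, 180):
    x = torch.randn(1, 3, size, size)   # two inputs with different resolutions
    flat = conv(x).flatten(1)
    print(size, flat.size(1))           # different flattened lengths, so a fixed fc layer cannot handle both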

2. Principle of Spatial Pyramid Pooling

As shown in Figure 2.1, SPP-Net pools the feature map with several pooling levels of different grid sizes, then flattens each result into a vector and concatenates them. The paper uses pyramid levels of 16, 4, and 1 bins (a 4×4, a 2×2, and a 1×1 grid); when applying it to your own task, you can change these according to factors such as the feature map size. When the feature map is not square, a padding operation is required, and the 16-bin and 4-bin levels pool by dividing the feature map into a grid, which differs from an ordinary pooling layer with a fixed window and stride.

Figure 2.1 The principle diagram of Spatial Pyramid Pooling implementation
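As a quick worked example of this grid division (my own illustration, using the window/stride/padding formulas of the implementation in the next section), a hypothetical 13×13 feature map would be pooled as follows:

import math

feature_size = 13                      # hypothetical height (and width) of the feature map
for n in (4, 2, 1):                    # 4x4, 2x2 and 1x1 grids -> 16, 4 and 1 bins
    win = math.ceil(feature_size / n)            # pooling window (also used as the stride)
    pad = (win * n - feature_size + 1) // 2      # padding so the grid covers the whole map
    print(f"{n}x{n} bins -> window {win}, stride {win}, padding {pad}")
# 4x4 bins -> window 4, stride 4, padding 2
# 2x2 bins -> window 7, stride 7, padding 1
# 1x1 bins -> window 13, stride 13, padding 0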

3. Code implementation of Spatial Pyramid Pooling

yueruchen/sppnet-pytorch: A simple Spatial Pyramid Pooling layer which could be added in CNN (github.com)

import math

import torch
import torch.nn as nn


def spatial_pyramid_pool(self, previous_conv, num_sample, previous_conv_size, out_pool_size):
    '''
    previous_conv: a tensor of the previous convolution layer
    num_sample: an int, the number of images in the batch
    previous_conv_size: an int vector [height, width], the feature map size of the previous convolution layer
    out_pool_size: an int vector of expected output sizes of the max pooling levels
    returns: a tensor of shape [num_sample, n], the concatenation of the multi-level pooling results
    '''
    for i in range(len(out_pool_size)):
        # Window, stride and padding that divide the feature map into
        # out_pool_size[i] x out_pool_size[i] bins.
        h_wid = int(math.ceil(previous_conv_size[0] / out_pool_size[i]))
        w_wid = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
        h_pad = int((h_wid * out_pool_size[i] - previous_conv_size[0] + 1) / 2)
        w_pad = int((w_wid * out_pool_size[i] - previous_conv_size[1] + 1) / 2)
        # Note: nn.MaxPool2d requires padding <= half the window size, so feature maps
        # that are very small relative to the number of bins may need extra handling.
        maxpool = nn.MaxPool2d((h_wid, w_wid), stride=(h_wid, w_wid), padding=(h_pad, w_pad))
        x = maxpool(previous_conv)
        if i == 0:
            spp = x.view(num_sample, -1)
        else:
            spp = torch.cat((spp, x.view(num_sample, -1)), 1)
    return spp
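As a quick sanity check, the following hypothetical usage (not part of the original repository; the shapes and bin counts are just for illustration) passes two batches of dummy feature maps with different resolutions through the function. Since the function is written as a method but never uses self, None is passed in its place; the output length is the same in both cases, which is exactly the point of SPP.

import torch

out_pool_size = [4, 2, 1]  # 16 + 4 + 1 = 21 bins per channel
for h, w in [(13, 13), (10, 12)]:
    feats = torch.randn(4, 256, h, w)  # dummy batch of 4 feature maps with 256 channels
    spp = spatial_pyramid_pool(None, feats, feats.size(0), [h, w], out_pool_size)
    print(h, w, spp.size())            # torch.Size([4, 5376]) both times, since 256 * 21 = 5376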


Copyright notice
This article was written by [KPer_Yang]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/223/202208110517347715.html