
Two Stage Detection

2022-04-23 21:00:00 【Top of the program】

Fast R-CNN

Selective search is used to obtain roughly 2000 RoI (region of interest) proposals.
Deep ConvNet
The whole original image is passed through the convolutional network once.
RoI projection
Each RoI from the original image is mapped onto the feature map produced by the convolutions, scaled by the ratio between the original image size and the feature map size. Because the RoI coordinates are scaled, they may not be integers. They are then rounded down, which shifts the region by some pixels. This is the first quantization error.
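The rounding step in RoI projection can be sketched as below. This is only an illustration: the function name `project_roi`, the stride of 16 and the example box are assumptions, not values from the post.

```python
import math

def project_roi(roi, stride):
    """Map an RoI (x1, y1, x2, y2) in image pixels onto the feature map.

    Returns the exact (fractional) coordinates and the rounded-down ones.
    """
    exact = tuple(c / stride for c in roi)
    quantized = tuple(math.floor(c) for c in exact)
    return exact, quantized

# Hypothetical RoI on an image whose feature map is 16x smaller (assumed stride).
stride = 16
exact, quantized = project_roi((100, 60, 325, 290), stride)
print(exact)      # fractional feature-map coordinates
print(quantized)  # coordinates after rounding down

# The rounding error, expressed back in original-image pixels.
offset_px = tuple((e - q) * stride for e, q in zip(exact, quantized))
print(offset_px)
```

A fraction of a cell on the feature map becomes several pixels on the original image, which is why the error matters most for small objects.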
RoI pooling layer
Converts RoI feature maps of different sizes to the same size.
The RoI feature map is divided into a grid, for example 7x7, and max pooling is applied to each cell, which reduces the feature map to a fixed size. Alternatively, a transposed convolution or bilinear interpolation (upsampling) can be used to enlarge the feature map. The grid boundaries are rounded as well, which again causes a pixel offset. This is the second quantization error.

An offset on the feature map, once mapped back to the original image, becomes a larger offset in the final predicted bounding box. Algorithms that use RoI pooling therefore tend to perform poorly on small objects.
A feature map enlarged by transposed convolution shows checkerboard artifacts: the enlarged output has a checkerboard-like pattern. In practice, people usually use plain upsampling to go from small to large.
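The grid-and-max-pool step of RoI pooling can be sketched in plain Python as follows. The function name, the inclusive integer RoI convention and the floor/ceil bin edges are assumptions made for this sketch. It is not taken from any particular library.

```python
def roi_max_pool(feature, roi, out_size):
    """Max-pool one RoI of a 2-D feature map to out_size x out_size.

    feature: list of rows; roi: (x1, y1, x2, y2) as inclusive integer cells,
    i.e. the RoI after the first rounding.
    """
    x1, y1, x2, y2 = roi
    w, h = x2 - x1 + 1, y2 - y1 + 1
    pooled = []
    for i in range(out_size):
        # Bin edges are forced onto integers: the second quantization.
        r0 = y1 + (i * h) // out_size               # floor
        r1 = y1 + -(-((i + 1) * h) // out_size)     # ceil
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = x1 + -(-((j + 1) * w) // out_size)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 6x6 feature map whose value encodes its position.
feature = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = roi_max_pool(feature, (1, 1, 5, 5), out_size=2)
print(pooled)
```

With a 5x5 RoI and a 2x2 output, the bins cannot be equal in size, so the integer edges make neighbouring bins overlap. This is the kind of misalignment the text calls the second quantization error.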
FCs
Fully connected layer
softmax, bbox regressor
Classification and location regression

Because of the two quantization errors in RoI pooling, Kaiming He proposed RoI Align.
[Figure: RoI Align]
RoI Align works as follows. After the RoI is mapped from the original image onto the feature map at the mapping scale, its coordinates are kept as they are, with no rounding, even when they are fractional. The RoI is divided into a grid (the green cells), and each green cell is further divided into NxN small black cells (2x2 in the original paper). These are not rounded either, even when their boundaries are fractional. Instead, the value at the center point of each small black cell is computed by bilinear interpolation, and max pooling over the four points in a green cell gives the value that represents that cell in the reduced feature map.

RoI Align removes the position drift caused by the two quantizations, but it introduces a hyperparameter N. Depending on the value of N, some pixels may not be used, and the pixels at the edge of the red box may not be used either.
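A minimal sketch of the sampling described above, assuming a single-channel feature map stored as nested lists and treating `feature[y][x]` as the value at point (x, y). The names `bilinear` and `roi_align` are chosen for this example. Max over the sample points follows the text.

```python
import math

def bilinear(feature, x, y):
    """Bilinearly interpolate a 2-D feature map at a fractional point (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1)   # clamp to the map
    y = min(max(y, 0.0), h - 1)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size, n=2):
    """roi = (x1, y1, x2, y2) in fractional feature-map coordinates."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            samples = []
            for a in range(n):
                for b in range(n):
                    # Centre of sub-cell (a, b) inside bin (i, j); never rounded.
                    sy = y1 + (i + (a + 0.5) / n) * bin_h
                    sx = x1 + (j + (b + 0.5) / n) * bin_w
                    samples.append(bilinear(feature, sx, sy))
            row.append(max(samples))
        out.append(row)
    return out

# Toy map whose value equals its column index; hypothetical fractional RoI.
feature = [[float(c) for c in range(8)] for _ in range(8)]
print(roi_align(feature, (1.3, 0.7, 5.9, 6.2), out_size=2))
```

Only the n x n sample points in each bin contribute to the output, so with small n and large bins some feature-map pixels are never read. This is the drawback noted above.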

In 2018, Precise RoI Pooling was proposed:

Precise ROI Pooling [2018, IoU-Net]
First, the red box is divided into a grid to get the green cells. Inside each green cell, the red points are computed from the blue points by bilinear interpolation. Finally, the red points are summed and divided by their total number, which gives the feature value that represents the green cell.
This amounts to average pooling over the green cell. The green cell has fractional boundaries and covers fractional pixels, so red points have to be generated: they lie inside the green cell and there is an integer number of them. The red points must be evenly spaced. For example, if the red edge is 5.87 pixels long and 6 points are placed uniformly along it, the interval between points is 5.87/5.
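The spacing arithmetic and the averaging can be sketched as follows. This follows the sampled description in the text rather than any reference implementation. The helper names and the `sample` callback (which stands in for bilinear interpolation on the feature map) are assumptions for this example.

```python
def evenly_spaced(start, length, count):
    """Return `count` points covering [start, start + length] at equal intervals."""
    if count == 1:
        return [start + length / 2]
    step = length / (count - 1)   # count points leave count - 1 gaps
    return [start + k * step for k in range(count)]

def average_bin(sample, x_start, y_start, width, height, nx, ny):
    """Average `sample(x, y)` over an nx x ny grid of evenly spaced points."""
    xs = evenly_spaced(x_start, width, nx)
    ys = evenly_spaced(y_start, height, ny)
    values = [sample(x, y) for y in ys for x in xs]
    return sum(values) / len(values)

# The example from the text: an edge of 5.87 pixels covered by 6 points.
points = evenly_spaced(0.0, 5.87, 6)
print(points[1] - points[0])   # spacing, 5.87 / 5

# Toy sampler whose value is simply the x coordinate.
mean_x = average_bin(lambda x, y: x, 0.0, 0.0, 5.87, 3.2, nx=6, ny=4)
print(mean_x)
```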
[Figure: Precise RoI Pooling]

Faster R-CNN [2015 Ren]

Faster R-CNN removes selective search and replaces it with a region proposal network (RPN), whose main job is to generate region proposals.

The pipeline can be simplified into three parts (figure not shown):

Backbone
RPN: mainly generates region proposals and has to be trained. Introducing the RPN makes Fast R-CNN a truly end-to-end network.
Fast R-CNN: RoI + classification, regression

An algorithm has no single standard answer. Do not get tangled up in the details; what matters is the main idea of the algorithm.
Steps for learning an algorithm:

Understand the idea through a description in words
Turn the text description into mathematical language
Turn the mathematics into code

A network is built like building blocks, and simple functional programming makes it possible to switch between different blocks.
The main components are as follows:
Backbone: the feature extraction network. This part can use VGG, ResNet, DenseNet, U-Net or other base networks. It is the basic component.
neck/link: the connecting part. It can be a 1x1 convolution, or be replaced by an inception module, a bottleneck module and so on.
head: the functional head. It is usually a fully connected layer, a convolution and so on.
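The building-block idea can be sketched with plain function composition. Everything here is a toy stand-in: `build_detector` and the stage functions are hypothetical names, and each stage only records its own label instead of doing real computation.

```python
def build_detector(backbone, neck, head):
    """Chain three stages into one callable: head(neck(backbone(x)))."""
    def detector(image):
        return head(neck(backbone(image)))
    return detector

# Toy stages (hypothetical): each one appends its name to a trace list.
def vgg_like(x):
    return x + ["vgg"]

def resnet_like(x):
    return x + ["resnet"]

def conv1x1_neck(x):
    return x + ["1x1 conv"]

def fc_head(x):
    return x + ["fc"]

# Swapping the backbone is a one-argument change; neck and head are reused.
det_a = build_detector(vgg_like, conv1x1_neck, fc_head)
det_b = build_detector(resnet_like, conv1x1_neck, fc_head)
print(det_a([]))
print(det_b([]))
```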
Backbone
[Figure: backbone network]
RPN
Its main purpose is to generate region proposals.

The so-called two stages come from RPN + bounding box regression.
This is also said to be why two-stage detection performs better than one-stage detection (note that this conclusion is wrong).
The RPN is structured as follows: a 3x3 convolution comes first, followed by two separate 1x1 convolutions. One 1x1 convolution changes the number of channels to 18 and forms one output, used mainly for classification. The other changes the number of channels to 36 and is used mainly for bounding box prediction.
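The channel counts 18 and 36 are consistent with 9 anchors per feature-map location, as in the Faster R-CNN paper: 2 scores (object or background) and 4 box offsets per anchor. The shape bookkeeping can be sketched as below. The function name and the 38x50 feature map size are assumptions for illustration.

```python
def rpn_head_shapes(height, width, num_anchors=9):
    """Output shapes (channels, H, W) of the two 1x1 branches of an RPN head."""
    cls_channels = 2 * num_anchors   # object / background score per anchor
    box_channels = 4 * num_anchors   # 4 box offsets per anchor
    return {
        "cls": (cls_channels, height, width),
        "bbox": (box_channels, height, width),
        "anchors": num_anchors * height * width,
    }

# Hypothetical 38x50 feature map.
shapes = rpn_head_shapes(38, 50)
print(shapes["cls"])      # classification branch
print(shapes["bbox"])     # box regression branch
print(shapes["anchors"])  # total anchors scored on this feature map
```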