
Two Stage Detection

2022-04-23 21:00:00 Top of the program

Fast R-CNN

  1. Use selective search to obtain roughly 2000 RoIs (regions of interest).

  2. Deep ConvNet
    The whole original image is passed through the convolutional network once.

  3. RoI projection
    Each original RoI is mapped onto the feature map produced by the convolutions. The RoI is scaled by the ratio between the feature map size and the original image size. Because the RoI dimensions are scaled, the resulting coordinates may be non-integer and have to be rounded down, which causes a pixel offset. This is the first quantization error.

  4. RoI pooling layer
    Converts feature maps of different sizes to the same size.
    The RoI on the feature map is divided into a grid, for example 7x7 bins, and max pooling is applied within each bin, which reduces the feature map to a fixed size. Alternatively, the feature map can be enlarged with transposed convolution or bilinear interpolation (upsampling). The grid boundaries are also rounded to integers, which again causes a pixel offset. This is the second quantization error.
    An offset on the feature map becomes a larger offset when it is mapped back to the original image, so the final predicted bounding box is shifted further on the original image. For this reason, algorithms that use RoI pooling usually perform poorly on small objects.

    Images produced by transposed convolution show a checkerboard effect: when the picture is enlarged, it looks like a checkerboard. In practice, people usually use simple upsampling to go from small to large.

  5. FCs
    Fully connected layer

  6. softmax, bbox regressor
    Classification and location regression
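
The two rounding steps in RoI projection and RoI pooling can be sketched in NumPy as below. This is a minimal illustration, not the original Fast R-CNN code: the stride of 16, the single-channel feature map, and the helper names `project_roi`, `ceil_div` and `roi_max_pool` are all assumptions made for the example.

```python
import numpy as np


def project_roi(roi_xyxy, stride=16):
    """First quantization: scale an image-space RoI and round down."""
    return tuple(int(np.floor(c / stride)) for c in roi_xyxy)


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature, roi_xyxy, stride=16, out_size=7):
    """Max-pool one RoI of a single-channel (H, W) map to out_size x out_size."""
    x1, y1, x2, y2 = project_roi(roi_xyxy, stride)
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)  # keep at least one cell
    region = feature[y1:y2, x1:x2]
    h, w = region.shape
    out = np.zeros((out_size, out_size), dtype=feature.dtype)
    for i in range(out_size):
        # Second quantization: bin edges snap to whole feature-map cells.
        r0 = (i * h) // out_size
        r1 = max(ceil_div((i + 1) * h, out_size), r0 + 1)
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = max(ceil_div((j + 1) * w, out_size), c0 + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out


feature = np.arange(40 * 40, dtype=float).reshape(40, 40)
roi = (100, 120, 333, 410)  # x1, y1, x2, y2 in original-image pixels
print(project_roi(roi))     # (6, 7, 20, 25); unrounded: 6.25, 7.5, 20.8125, 25.625
print(roi_max_pool(feature, roi).shape)
```

With a stride of 16, the fraction lost at each RoI edge on the feature map (up to one cell) corresponds to up to 16 pixels in the original image, which is why small objects suffer most.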

Because of the two quantization errors in RoI pooling, Kaiming He proposed RoI Align.
RoI Align works as follows. The RoI is mapped from the original image onto the feature map using the mapping scale, and even if the resulting coordinates are fractional, no rounding is performed. The RoI is then divided into bins (the green grid), and each bin is further divided into NxN small cells (the small black cells; 2x2 in the original paper). The small cells may also have fractional coordinates, and again no rounding is performed. Instead, the value at the center point of each small black cell is computed by bilinear interpolation, and max pooling over the four sample points in a green bin gives the value that represents that bin in the reduced feature map.

RoI Align removes the position drift caused by the two quantization steps, but it introduces a hyperparameter N. Depending on the value of N, some pixels inside the RoI may not be used, and pixels at the edge of the RoI (the red box) may also be left unused.
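
The RoI Align procedure described above can be sketched as follows. It is a simplified illustration under several assumptions: a single-channel feature map, a stride of 16, integer coordinates treated as pixel centers with no half-pixel offset, and hypothetical helper names `bilinear` and `roi_align`. The parameter `n` plays the role of N.

```python
import numpy as np


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom


def roi_align(feature, roi_xyxy, stride=16, out_size=7, n=2):
    """Pool one RoI to out_size x out_size without any rounding."""
    x1, y1, x2, y2 = (c / stride for c in roi_xyxy)  # fractional, not rounded
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            samples = []
            for a in range(n):
                for b in range(n):
                    # Center of small cell (a, b) inside bin (i, j).
                    sy = y1 + (i + (a + 0.5) / n) * bin_h
                    sx = x1 + (j + (b + 0.5) / n) * bin_w
                    samples.append(bilinear(feature, sy, sx))
            out[i, j] = max(samples)  # max over the n*n sample points
    return out


feature = np.add.outer(10.0 * np.arange(12), np.arange(12))  # f[y, x] = 10*y + x
print(roi_align(feature, (32, 16, 144, 128)).shape)
```

Because only `n * n` points are sampled per bin, a large bin with a small `n` skips many feature-map pixels, which is the weakness noted above.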

Precise RoI Pooling was proposed in 2018:

Precise ROI Pooling [2018, IoU-Net]
First, the RoI (the red box) is divided into a grid of bins (the green boxes). Inside each green box, sample points (the red dots) are obtained from the feature-map pixels (the blue dots) by bilinear interpolation. Finally, the red dots are summed and divided by the total number of points, which gives the feature value that represents the green box.
This is average pooling over the green box. The green box has fractional coordinates and covers fractional pixels, so the red dots have to be generated: they lie inside the green box, there is an integer number of them, and they must be evenly spaced. For example, if an edge is 5.87 pixels long and 6 points are taken uniformly along it, the interval between adjacent points is 5.87/5 = 1.174 pixels.
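
The evenly spaced sampling and averaging described above can be sketched as below. Note that this follows the sampling description in this article; it is not the closed-form formulation from the IoU-Net paper. The helper names `even_points`, `bilinear` and `average_bin` are hypothetical, and the number of points per edge is passed in as a parameter because the text does not state how it is chosen.

```python
import numpy as np


def even_points(start, length, count):
    """Return `count` evenly spaced points covering [start, start + length]."""
    return np.linspace(start, start + length, count)


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom


def average_bin(feature, y0, x0, height, width, count=6):
    """Average of count x count interpolated samples inside one fractional bin."""
    ys = even_points(y0, height, count)
    xs = even_points(x0, width, count)
    total = sum(bilinear(feature, y, x) for y in ys for x in xs)
    return total / (count * count)


points = even_points(0.0, 5.87, 6)
print(len(points), np.diff(points))  # 6 points, each interval is 5.87 / 5

feature = np.add.outer(10.0 * np.arange(12), np.arange(12))  # f[y, x] = 10*y + x
print(average_bin(feature, 1.2, 2.3, 5.87, 5.87))
```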

Faster R-CNN [2015 Ren]

Faster R-CNN removes selective search and replaces it with a Region Proposal Network (RPN), whose main job is to generate region proposals.

The pipeline can be simplified to three parts:

  1. Backbone
  2. RPN: mainly generates region proposals and needs to be trained. Introducing the RPN makes Fast R-CNN a truly end-to-end network.
  3. Fast R-CNN: RoI + classification, regression

An algorithm has no single standard answer, so do not get tied up in the details. What matters is the main idea of the algorithm.
The steps for learning an algorithm:

  • Describe the idea in words
  • Turn the verbal description into mathematical language
  • Turn the mathematics into code

A network is made of building blocks, and simple functional programming lets you switch between different blocks.
The main components and structure are as follows:
Backbone: the feature extraction network. VGG, ResNet, DenseNet, U-Net and other basic networks can be used here. This is the basic component.
neck/link: an essential part. It can be a 1x1 convolution, or it can be replaced by an Inception module, a bottleneck module, and so on.
head: the functional head, usually fully connected layers, convolutions, etc.
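
The building-block idea can be sketched with plain function composition. Everything below is a toy stand-in rather than a real detector: `compose`, `toy_backbone`, `toy_neck`, `alt_neck` and `toy_head` are hypothetical names, and each stage is just a small NumPy function so that swapping one block for another is easy to see.

```python
from functools import reduce

import numpy as np


def compose(*stages):
    """Chain stages left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)


def toy_backbone(image):
    """Stand-in feature extractor: 2x2 average pooling."""
    h, w = image.shape
    cropped = image[: h // 2 * 2, : w // 2 * 2]
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def toy_neck(features):
    """Stand-in neck: subtract the mean."""
    return features - features.mean()


def alt_neck(features):
    """Alternative neck: scale by the largest absolute value."""
    return features / (np.abs(features).max() + 1e-8)


def toy_head(features):
    """Stand-in head: report the location of the strongest response."""
    row, col = np.unravel_index(np.argmax(features), features.shape)
    return int(row), int(col)


# Swapping the neck only changes one argument; the other blocks are reused.
detector_a = compose(toy_backbone, toy_neck, toy_head)
detector_b = compose(toy_backbone, alt_neck, toy_head)

image = np.zeros((8, 8))
image[6, 2] = 1.0
print(detector_a(image), detector_b(image))
```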
Backbone
RPN
Its main purpose is to generate region proposals.

  • The so-called two stages come from RPN + bounding-box regression
  • This is often given as the reason two-stage detection performs better than one-stage detection (note that this conclusion is wrong)

The RPN structure is as follows: a 3x3 convolution comes first, followed by two 1x1 convolutions. One 1x1 convolution changes the number of channels to 18 and forms an output used mainly for classification; the other changes the number of channels to 36 and is used mainly for bounding-box prediction.
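
The RPN head structure can be sketched in NumPy as below, mainly to show the output shapes. Several things here are assumptions for illustration: the channel counts 18 and 36 are read as 9 anchors per location with 2 scores and 4 box offsets each, a ReLU is placed after the 3x3 convolution, biases are omitted, the weights are random, and the function names are hypothetical.

```python
import numpy as np


def conv3x3_same(x, weight):
    """3x3 convolution with zero padding 1.

    x: (C_in, H, W), weight: (C_out, C_in, 3, 3), returns (C_out, H, W).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out


def conv1x1(x, weight):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def init_params(c_in, c_mid, num_anchors=9, seed=0):
    """Random weights for the sketch (no training involved)."""
    rng = np.random.default_rng(seed)
    return {
        "conv3x3": rng.normal(size=(c_mid, c_in, 3, 3)) * 0.01,
        "cls": rng.normal(size=(2 * num_anchors, c_mid)) * 0.01,  # 18 channels
        "reg": rng.normal(size=(4 * num_anchors, c_mid)) * 0.01,  # 36 channels
    }


def rpn_head(features, params):
    """Shared 3x3 conv, then two 1x1 branches: classification and box regression."""
    hidden = np.maximum(conv3x3_same(features, params["conv3x3"]), 0.0)  # assumed ReLU
    cls_scores = conv1x1(hidden, params["cls"])
    box_deltas = conv1x1(hidden, params["reg"])
    return cls_scores, box_deltas


params = init_params(c_in=8, c_mid=16)
cls_scores, box_deltas = rpn_head(np.ones((8, 5, 7)), params)
print(cls_scores.shape, box_deltas.shape)  # (18, 5, 7) (36, 5, 7)
```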

Copyright notice
This article was written by [Top of the program]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/111/202204210545091126.html