Two Stage Detection
2022-04-23 21:00:00 【Top of the program】
Use selective search to obtain roughly 2,000 RoI regions.
Deep ConvNet
The whole original image is passed through the convolutional network once.
RoI projection
Each RoI on the original image is mapped onto the feature map produced by the convolution, scaled by the ratio between the original image size and the feature map size. Because the RoI coordinates are scaled, they may no longer be integers, so they are rounded down, which causes a pixel offset. This is the first quantization error.
RoI pooling layer
Converts feature maps of different sizes to the same size.
The RoI's feature map is divided into a grid, for example 7x7, and max pooling is applied in each grid cell to reduce the feature map to a fixed size. Alternatively, a small feature map can be enlarged with transposed convolution or with bilinear interpolation (upsampling). Dividing into a grid also involves rounding the cell boundaries, which causes another pixel offset. This is the second quantization error.
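The two rounding steps described above can be sketched in plain Python. This is a minimal illustration, not the implementation from any particular library: the function name `roi_pool`, the stride of 16, and the floor/ceil rule for bin edges are assumptions made for the example.

```python
import math

def roi_pool(feature, roi, stride=16, out_size=7):
    """Max-pool one RoI, given as (x1, y1, x2, y2) in image pixels,
    from a 2D feature map (list of rows) to out_size x out_size."""
    # First quantization: project to feature-map coordinates and round down.
    x1, y1, x2, y2 = (math.floor(v / stride) for v in roi)
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_size):
        # Second quantization: bin edges are snapped to whole cells.
        r0 = y1 + math.floor(i * h / out_size)
        r1 = y1 + math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0 = x1 + math.floor(j * w / out_size)
            c1 = x1 + math.ceil((j + 1) * w / out_size)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 8x8 feature map whose value encodes its position: row * 8 + col.
feature = [[r * 8 + c for c in range(8)] for r in range(8)]
print(roi_pool(feature, (20, 20, 100, 90), stride=16, out_size=2))
# [[27, 30], [43, 46]]
```

In this toy run the RoI corner 20/16 = 1.25 is floored to 1, and the 5-row region is split into bins covering rows 1 to 3 and rows 3 to 5, so neither the box nor the bins match the true fractional positions.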
An offset on the feature map becomes a larger offset once it is mapped back to the original image, so the final predicted bounding box is shifted even further on the original image. This is why algorithms that use RoI pooling usually perform poorly on small objects.
Images produced by transposed convolution show a checkerboard effect: when the image is enlarged, a checkerboard-like pattern appears. In practice, people usually use simple upsampling to go from small to large.
FCs
Fully connected layer
softmax, bbox regressor
Classification and box location regression
Because of the two quantization errors in RoI pooling, Kaiming He proposed RoI Align.
RoI Align works as follows. After the RoI is mapped from the original image onto the feature map at the mapping scale, its coordinates are kept as they are, even if they are fractional; no rounding is applied. Each grid cell (the green cells in the figure) is then divided into NxN smaller cells (the small black cells; 2x2 in the original paper), again without rounding even if their boundaries are fractional. The value at the center point of each small cell is computed by bilinear interpolation, and max pooling over the four sample points in a green cell gives the value that represents that cell in the reduced feature map.
RoI Align removes the position drift caused by the two quantizations, but it introduces a hyperparameter N. Depending on N, some pixels may not be used, and pixels at the edge of the RoI (the red box) may also be left out.
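The procedure above can be sketched as follows. This is a simplified illustration under stated assumptions: pixel centers are taken to sit at integer coordinates, the function names `bilinear` and `roi_align` are made up for the example, and the sample points are combined with max as the text describes (averaging them is also common in practice).

```python
import math

def bilinear(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, stride=16, out_size=7, samples=2):
    """RoI Align for one RoI (x1, y1, x2, y2 in image pixels)."""
    # No rounding: the projected RoI keeps its fractional coordinates.
    x1, y1, x2, y2 = (v / stride for v in roi)
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            values = []
            # samples x samples points at the centers of the sub-cells.
            for a in range(samples):
                for b in range(samples):
                    sy = y1 + (i + (a + 0.5) / samples) * bin_h
                    sx = x1 + (j + (b + 0.5) / samples) * bin_w
                    values.append(bilinear(feature, sy, sx))
            row.append(max(values))
        pooled.append(row)
    return pooled

feature = [[r * 8 + c for c in range(8)] for r in range(8)]
print(roi_align(feature, (20, 20, 100, 90), stride=16, out_size=2))
```

Here `samples` plays the role of the hyperparameter N: with a large bin and a small N, feature-map pixels that lie far from every sample point contribute nothing to the output.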
Precise RoI Pooling was proposed in 2018:
Precise ROI Pooling [2018, IoU-Net]
Starting from the RoI (the red box), divide it into a grid to get the green cells. Inside each green cell, red sample points are computed from the surrounding feature-map pixels (the blue points) by bilinear interpolation. Finally, the values of the red points are summed and divided by the number of points, which gives the feature value that represents the green cell.
This is average pooling over the green cell. The green cell has fractional boundaries and covers fractional pixels, so the red points have to be generated. The red points lie inside the green cell, and their count is an integer. They must be evenly spaced: for example, if an edge is 5.87 pixels long and 6 points are placed along it, the interval between neighboring points is 5.87/5.
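The sampling scheme described above can be sketched like this. It follows this article's description (evenly spaced points, then an average) rather than the formulation in the IoU-Net paper, which defines Precise RoI Pooling as an integral of the bilinearly interpolated feature map over the bin divided by the bin area. The helper names and the rule "number of points = ceiling of the edge length" are assumptions chosen so that 5.87 gives 6 points.

```python
import math

def bilinear(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def sample_points(start, length):
    """Evenly spaced points covering [start, start + length]."""
    count = max(math.ceil(length), 2)   # assumed rule: 5.87 -> 6 points
    step = length / (count - 1)         # 6 points -> interval 5.87 / 5
    return [start + k * step for k in range(count)]

def bin_average(feature, y0, x0, height, width):
    """Average of the interpolated values at the sample points of one bin."""
    ys, xs = sample_points(y0, height), sample_points(x0, width)
    total = sum(bilinear(feature, y, x) for y in ys for x in xs)
    return total / (len(ys) * len(xs))

points = sample_points(0.0, 5.87)
print(len(points), points[1] - points[0])
```

Unlike RoI Align, the number of sample points grows with the size of the bin, so no fixed N has to be chosen.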
Faster R-CNN drops selective search and replaces it with a Region Proposal Network (RPN), whose main job is to generate region proposals.
It can be simplified to the following components:
- Backbone
- RPN: mainly generates region proposals and needs to be trained. Introducing the RPN makes Fast R-CNN a truly end-to-end network.
- Fast R-CNN: RoI + classification, regression
There is no standard answer for an algorithm. Don't get tied up in the details; what matters is the main idea of the algorithm.
Steps for learning an algorithm:
- Start from the description in words
- Turn the text description into mathematical language
- Turn the mathematics into code
A network is made of building blocks, and simple functional programming lets you switch between different blocks.
The main components and structure are as follows:
Backbone: the feature extraction network. This part can use VGG, ResNet, DenseNet, U-Net, or other basic networks. It is the basic component.
neck/link: the key part. It can be a 1x1 convolution, which can be replaced by an inception module, a bottleneck module, and so on.
head: the functional head, usually fully connected layers, convolutions, and so on.
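The building-block idea can be illustrated with plain function composition. This is only a toy sketch: `compose` and `block` are hypothetical helpers, and each "block" just records its name instead of running a real network, so that the data flow through backbone, neck, and head stays visible.

```python
def compose(*stages):
    """Chain stages left to right: compose(f, g, h)(x) == h(g(f(x)))."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

def block(name):
    """Hypothetical stand-in for a network block; it appends its name."""
    return lambda trace: trace + [name]

# Swapping a block means passing a different function, nothing else changes.
detector_a = compose(block("VGG"), block("1x1 conv"), block("fc head"))
detector_b = compose(block("ResNet"), block("bottleneck"), block("conv head"))

print(detector_a([]))  # ['VGG', '1x1 conv', 'fc head']
print(detector_b([]))  # ['ResNet', 'bottleneck', 'conv head']
```

In a real detector each stage would be a trainable module with matching input and output shapes, but the composition pattern is the same.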
Backbone
RPN
The main purpose is to generate region proposals.
- The so-called two stages come from the RPN plus bbox regression.
- This is also often given as the reason why two-stage detection performs better than one-stage detection (note that this conclusion is wrong).
The RPN structure is as follows: a 3x3 convolution comes first, followed by two 1x1 convolutions. One 1x1 convolution changes the number of channels to 18 and forms an output that is mainly used for classification; the other changes the number of channels to 36 and is mainly used for bbox prediction.
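The channel counts can be made concrete with a small sketch. It assumes 9 anchors per feature-map location, as in the Faster R-CNN paper, so that 18 = 9 x 2 object/background scores and 36 = 9 x 4 box offsets; the article itself does not mention anchors. The 3x3 convolution is assumed to have been applied already, the weights are random placeholders, and `conv1x1` and `rpn_head` are names made up for the example.

```python
import random

def conv1x1(feature, weight):
    """1x1 convolution: feature is [C_in][H][W], weight is [C_out][C_in].
    Every location is multiplied by the same weight matrix."""
    c_in, h, w = len(feature), len(feature[0]), len(feature[0][0])
    return [[[sum(weight[o][c] * feature[c][y][x] for c in range(c_in))
              for x in range(w)]
             for y in range(h)]
            for o in range(len(weight))]

def rpn_head(shared, num_anchors=9, seed=0):
    """Two sibling 1x1 convolutions on the output of the 3x3 convolution."""
    rng = random.Random(seed)
    c_in = len(shared)
    # Placeholder weights; a trained RPN would learn these.
    w_cls = [[rng.uniform(-1, 1) for _ in range(c_in)]
             for _ in range(2 * num_anchors)]   # 18 channels
    w_reg = [[rng.uniform(-1, 1) for _ in range(c_in)]
             for _ in range(4 * num_anchors)]   # 36 channels
    return conv1x1(shared, w_cls), conv1x1(shared, w_reg)

# Toy input: 4 channels on a 2x3 feature map.
shared = [[[1.0] * 3 for _ in range(2)] for _ in range(4)]
cls_scores, box_deltas = rpn_head(shared)
print(len(cls_scores), len(box_deltas))  # 18 36
```

Both outputs keep the spatial size of the input, so every feature-map location gets its own set of scores and box offsets.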
Copyright notice
This article was written by [Top of the program]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/111/202204210545091126.html