ConvNeXt
2022-04-21 09:52:00 【Relearn CS】
![ConvNeXt overview figure](/img/3d/672253d56404f3d6ea148adfd01918.png)
Note: the notes below are a simplified summary based on the original paper and this blog post.

Macro Design
- Changing stage ratio: in ResNet, conv4_x (stage3) usually stacks the most blocks; in ResNet50 the number of blocks per stage is (3, 4, 6, 3), a ratio of roughly 1:1:2:1. Swin Transformer gives stage3 a higher proportion, so the authors adjust the stacking numbers from (3, 4, 6, 3) to (3, 3, 9, 3), which keeps FLOPs similar to Swin-T. After the adjustment, accuracy rises from 78.8% to 79.4%.
- Changing stem to "Patchify": the downsampling stem in ResNet consists of a 7x7 convolution with stride 2 followed by stride-2 max pooling. Transformers usually downsample with a convolution whose stride equals its kernel_size, which amounts to mapping each patch to a single output pixel, so the downsampling factor equals kernel_size. After the replacement, accuracy improves from 79.4% to 79.5%, and FLOPs also decrease (see the sketch below).
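Below is a minimal PyTorch sketch (not the paper's code; channel counts and names are illustrative) contrasting the ResNet-style stem with the patchify stem, a single convolution whose stride equals its kernel size:

```python
import torch
import torch.nn as nn

# ResNet-style stem: 7x7 stride-2 conv + stride-2 max pooling (4x downsampling overall)
resnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# "Patchify" stem: one conv with stride == kernel_size, so every 4x4 patch
# becomes a single output pixel (4x downsampling in one step)
patchify_stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)

x = torch.randn(1, 3, 224, 224)
print(resnet_stem(x).shape)    # torch.Size([1, 64, 56, 56])
print(patchify_stem(x).shape)  # torch.Size([1, 96, 56, 56])
```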
ResNeXt-ify
- ResNeXt uses grouped convolution to achieve a good balance between FLOPs and accuracy. ConvNeXt goes further and directly uses the depthwise convolution proposed in MobileNet to replace the 3x3 convolution in the ResNet block. The authors also increase the number of channels from 64 to 96, and accuracy reaches 80.5% (see the sketch below).
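A minimal sketch of the swap, assuming PyTorch and the 96-channel width mentioned above; setting `groups` equal to the channel count turns a standard convolution into a depthwise one:

```python
import torch.nn as nn

dim = 96  # the channel width used in the text

# standard dense 3x3 convolution: every filter sees all input channels
standard_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

# depthwise 3x3 convolution: groups == channels, one filter per channel
depthwise_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

# parameter counts illustrate the FLOPs saving: roughly 96*96*3*3 vs. 96*1*3*3 weights
print(sum(p.numel() for p in standard_conv.parameters()))
print(sum(p.numel() for p in depthwise_conv.parameters()))
```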
Inverted Bottleneck
- The authors observe that the MLP module in a Transformer block is similar to the inverted bottleneck in MobileNetV2: both are thin at the two ends and thick in the middle. ConvNeXt therefore changes the block structure from (a) in the figure to (c), where (b) is the inverted bottleneck used in MobileNetV2. With the inverted bottleneck, accuracy improves from 80.5% to 80.6% on the smaller model and from 81.9% to 82.6% on larger models, while FLOPs decrease slightly (a sketch of the change follows).
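The following sketch (PyTorch, illustrative dimensions, not the official implementation) contrasts the bottleneck ordering (a) with the inverted bottleneck (b): channels are squeezed in the middle of (a) but expanded 4x in the middle of (b), mirroring the Transformer MLP:

```python
import torch.nn as nn

dim = 96

# (a) bottleneck ordering: the middle of the block is the narrow part
bottleneck = nn.Sequential(
    nn.Conv2d(4 * dim, dim, kernel_size=1),                     # 1x1 reduce, 384 -> 96
    nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),  # depthwise 3x3 at 96
    nn.Conv2d(dim, 4 * dim, kernel_size=1),                     # 1x1 restore, 96 -> 384
)

# (b) inverted bottleneck: the middle of the block is the wide part,
# like the 4x expansion in a Transformer MLP
inverted_bottleneck = nn.Sequential(
    nn.Conv2d(dim, 4 * dim, kernel_size=1),                                 # 1x1 expand, 96 -> 384
    nn.Conv2d(4 * dim, 4 * dim, kernel_size=3, padding=1, groups=4 * dim),  # depthwise 3x3 at 384
    nn.Conv2d(4 * dim, dim, kernel_size=1),                                 # 1x1 project, 384 -> 96
)
```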

Large kernel size
Transformers generally perform global self-attention (e.g., Vision Transformer), and even Swin Transformer uses a 7x7 window size. Mainstream convolutional networks, however, mostly use 3x3 kernels, since VGG showed that stacking multiple 3x3 convolution layers can replace a larger convolution layer, and 3x3 convolutions are also optimized more efficiently in practice.
- Moving up depthwise conv: change 1x1 conv → depthwise conv → 1x1 conv into depthwise conv → 1x1 conv → 1x1 conv. Accuracy drops to 79.9%, and FLOPs also decrease.
- Increasing the kernel size: change the depthwise convolution kernel to 7x7. The authors also tried other kernel sizes (3, 5, 7, 9, 11) and found that accuracy saturates at 7. Accuracy grows from 79.9% (3x3) to 80.6% (7x7); see the sketch below.
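Combining the two points above, a minimal sketch (PyTorch, illustrative width of 96) of structure (c), with the depthwise conv moved to the front and its kernel enlarged to 7x7:

```python
import torch.nn as nn

dim = 96

# structure (c): the depthwise conv moves to the front and its kernel grows to 7x7
large_kernel_block = nn.Sequential(
    nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim),  # 7x7 depthwise, moved up
    nn.Conv2d(dim, 4 * dim, kernel_size=1),                     # 1x1 expand
    nn.Conv2d(4 * dim, dim, kernel_size=1),                     # 1x1 project
)
```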
Micro Design
This part focuses on smaller differences, such as activation functions and normalization.
- ReLU→GELU: replacing the ReLU activation function with GELU produces no significant change in accuracy.
- Fewer activation functions: convolutional networks usually append an activation function after each convolution or fully connected layer, whereas in the Transformer MLP only the first fully connected layer is followed by GELU. The authors likewise reduce the activations in ConvNeXt to depthwise conv → 1x1 conv + GELU → 1x1 conv, and accuracy improves from 80.6% to 81.3%.
- Fewer normalization layers: the authors keep normalization only after the 7x7 depthwise conv in ConvNeXt; accuracy increases by 0.1%, to 81.4%.
- Substituting BN with LN: replacing batch norm with layer norm increases accuracy by another 0.1%, to 81.5%.
- Separate downsampling layers: in the original ResNet, downsampling in stage2-stage4 is done by setting the stride of the 3x3 convolution on the main branch to 2 and the stride of the 1x1 convolution on the shortcut branch to 2. Swin Transformer instead uses a separate patch merging layer. The ConvNeXt authors likewise use a separate downsampling layer, consisting of a LayerNorm followed by a convolution with kernel size 2 and stride 2. This finally lifts accuracy to 82.0% (a sketch of the resulting block and downsampling layer follows this list).
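Pulling these micro-design choices together, here is a minimal, self-contained sketch (PyTorch; class and helper names such as `LayerNorm2d` are illustrative, not the official ConvNeXt code) of one block with a single LayerNorm and a single GELU, plus a separate downsampling layer made of LayerNorm and a 2x2, stride-2 convolution:

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.Module):
    """LayerNorm over the channel dimension of an NCHW tensor."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.norm = nn.LayerNorm(dim, eps=eps)

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)      # NCHW -> NHWC
        x = self.norm(x)
        return x.permute(0, 3, 1, 2)   # NHWC -> NCHW

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = LayerNorm2d(dim)             # the only normalization in the block
        self.pwconv1 = nn.Conv2d(dim, 4 * dim, kernel_size=1)
        self.act = nn.GELU()                     # the only activation in the block
        self.pwconv2 = nn.Conv2d(4 * dim, dim, kernel_size=1)

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = self.norm(x)
        x = self.act(self.pwconv1(x))
        x = self.pwconv2(x)
        return shortcut + x                      # residual connection

def downsample_layer(in_dim, out_dim):
    # separate downsampling between stages: LayerNorm + 2x2, stride-2 conv
    return nn.Sequential(
        LayerNorm2d(in_dim),
        nn.Conv2d(in_dim, out_dim, kernel_size=2, stride=2),
    )

x = torch.randn(1, 96, 56, 56)
print(Block(96)(x).shape)                  # torch.Size([1, 96, 56, 56])
print(downsample_layer(96, 192)(x).shape)  # torch.Size([1, 192, 28, 28])
```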
Copyright notice
This article was written by [Relearn CS]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204210944460933.html