Machine learning -- model optimization
2022-04-23 13:16:00 【DCGJ666】
Model compression methods
- Low rank approximation
The basic operation of a neural network is convolution, which is essentially matrix multiplication. Low-rank approximation reconstructs the weight matrix from a series of small-scale matrices, reducing both computation and storage overhead. Two approaches are common: one reconstructs the weight matrix directly as a Toeplitz matrix (a matrix in which every diagonal running from top-left to bottom-right has identical elements); the other uses singular value decomposition (SVD) to factor the weight matrix into several small matrices.

- Pruning and sparse constraints
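The SVD route of the low-rank approximation described above can be sketched as follows; the layer size (256x512) and rank r=32 are made-up illustration values:

```python
import numpy as np

# Hypothetical 256x512 weight matrix of a fully connected layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))

# Truncated SVD: keep only the top-r singular values.
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # 256 x r  (U scaled by singular values)
B = Vt[:r, :]          # r x 512

# W is replaced by the product A @ B: storage drops from
# 256*512 = 131072 parameters to r*(256+512) = 24576.
original, compressed = W.size, A.size + B.size
print(original, compressed)
```

Because a random matrix has no low-rank structure, the reconstruction `A @ B` here is only a rough approximation; real weight matrices are often much closer to low rank, so the accuracy loss in practice can be small.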
Pruning is a classic post-processing technique in model compression; typical applications include the pre-pruning and post-pruning of decision trees. Pruning reduces the number of model parameters, prevents overfitting, and improves the model's generalization ability.
Applied to neural networks, pruning follows four steps:

1. Measure the importance of each neuron.
2. Remove the unimportant neurons.
3. Fine-tune the network.
4. Return to step 1 for the next round of pruning.
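The measure-and-remove steps above can be sketched with simple magnitude-based importance (step 3, fine-tuning, is omitted; the weights and pruning ratio are made up):

```python
# Toy magnitude pruning: weights whose absolute value falls in the
# bottom `ratio` fraction are treated as unimportant and zeroed out.
weights = [0.9, -0.02, 0.45, 0.001, -0.7, 0.03]

def prune(ws, ratio=0.5):
    k = int(len(ws) * ratio)                   # number of weights to remove
    threshold = sorted(abs(w) for w in ws)[k]  # importance cutoff
    return [w if abs(w) >= threshold else 0.0 for w in ws]

pruned = prune(weights)
print(pruned)  # the three smallest-magnitude weights become 0.0
```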
Sparse constraints achieve an effect similar to direct pruning. The idea is to add a sparsity-inducing regularization term on the weights to the network's optimization objective, so that during training some weights are driven toward 0; these near-zero elements then become the objects of pruning. Sparse constraints can therefore be regarded as dynamic pruning.
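A minimal sketch of the idea above: an L1 (sparsity-inducing) term on the weights is added to the objective. The task loss value, the weight list, and the coefficient `lam` are placeholders for illustration:

```python
# Total objective = task loss + sparsity regularizer on the weights.
# During training, the L1 term pushes many weights toward 0.
def total_loss(task_loss, weights, lam=0.01):
    l1 = sum(abs(w) for w in weights)  # L1 norm of the weights
    return task_loss + lam * l1

loss = total_loss(0.5, [0.9, -0.02, 0.45])
print(loss)
```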
Compared with pruning, the advantage of sparse constraints is that a single training run is enough to achieve the effect of network pruning.

- Parameter quantization
Compared with pruning, parameter quantization is a commonly used back-end compression technique. Quantization extracts a few representative values from the weights, each standing for the concrete value of one class of weights. These "representatives" are stored in a codebook, and the original weight matrix only needs to record the index of each weight's representative, which greatly reduces the storage overhead.

- Binary network
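The codebook idea from the parameter quantization paragraph above can be sketched as follows; the codebook values and weights are made up (in practice the representatives are usually learned, e.g. with k-means):

```python
# Each weight is stored as the index of its nearest representative;
# only the small codebook plus the integer indices need to be kept.
codebook = [-0.5, 0.0, 0.5]  # hypothetical representatives

def quantize(ws, book):
    return [min(range(len(book)), key=lambda i: abs(w - book[i])) for w in ws]

weights = [0.46, -0.52, 0.02, 0.49]
indices = quantize(weights, codebook)
decoded = [codebook[i] for i in indices]
print(indices, decoded)
```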
A binary network can be seen as an extreme case of quantization: every parameter may only take the value +1 or -1. Most existing neural networks are trained with gradient descent, but when the weights are restricted to +1 or -1, gradient information cannot be computed directly, so the weights cannot be updated. A common compromise is to binarize the forward and backward passes while applying the weight updates to a single-precision copy of the weights.

- Knowledge distillation
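The compromise described in the binary-network paragraph above, a binarized forward pass with updates applied to a full-precision copy, can be sketched as follows (the gradient values are fabricated for illustration):

```python
# Binarize weights to +1/-1 for the forward/backward computation.
def binarize(ws):
    return [1.0 if w >= 0 else -1.0 for w in ws]

full_precision = [0.3, -0.1, 0.7]   # single-precision master weights
binary = binarize(full_precision)   # used in the forward/backward pass

# The update is applied to the full-precision copy, not the binary one.
grads = [0.05, -0.02, 0.1]          # made-up gradients
lr = 0.1
full_precision = [w - lr * g for w, g in zip(full_precision, grads)]
print(binary, full_precision)
```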
The purpose of knowledge distillation is to convert a high-accuracy but bulky teacher model into a more compact student model. The idea is: train the teacher model and choose a suitable temperature for its softmax layer to obtain a set of soft targets (here, "soft labels" refer to the large network's outputs, such as the feature maps produced by each convolution layer). Then, when training the student model, use the same temperature and make the student's outputs match the teacher's soft targets as closely as possible; this matching term becomes part of the student's total objective, guiding the student's training and realizing the transfer of knowledge.
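A minimal sketch of the temperature-scaled soft targets described above; the logits and the temperature are made-up illustration values:

```python
import math

# Softmax with temperature T: larger T produces a "softer" distribution.
def soft_targets(logits, T):
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [4.0, 1.0, 0.2]
student_logits = [3.0, 1.5, 0.5]
T = 2.0

p = soft_targets(teacher_logits, T)  # teacher's soft targets
q = soft_targets(student_logits, T)  # student's softened outputs

# Cross-entropy between the teacher and student distributions: this is
# the term added to the student's total objective.
distill_loss = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
print(distill_loss)
```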
What are the characteristics of SqueezeNet's Fire module?
Squeeze: high-dimensional features have stronger representational power, but they make the number of model parameters grow sharply. To balance model capacity against parameter count, a 1x1 convolution can be used to reduce the channel dimension of the input features. At the same time, a 1x1 convolution fuses information across channels, yielding more compact input features while preserving the model's capacity.
Expand: 3x3 convolutions usually consume the bulk of the computing resources, so 1x1 convolutions can be used alongside, and in place of some, 3x3 convolutions to cut the cost.
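Assuming made-up channel counts, the parameter savings of the squeeze/expand design can be illustrated by counting convolution weights (biases ignored):

```python
# A Fire module squeezes in_ch channels down to s with 1x1 convolutions,
# then expands to e1 (1x1) + e3 (3x3) output channels.
def fire_params(in_ch, s, e1, e3):
    squeeze = in_ch * s * 1 * 1               # 1x1 squeeze layer
    expand = s * e1 * 1 * 1 + s * e3 * 3 * 3  # 1x1 and 3x3 expand layers
    return squeeze + expand

plain = 96 * 128 * 3 * 3            # plain 3x3 conv, 96 -> 128 channels
fire = fire_params(96, 16, 64, 64)  # also 128 output channels in total
print(plain, fire)
```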
Copyright notice
This article was created by [DCGJ666]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230611343243.html