Machine learning -- model optimization
2022-04-23 13:16:00 【DCGJ666】
Model compression methods
- Low-rank approximation
The basic operation of a neural network is convolution, which is in essence a matrix operation. Low-rank approximation reconstructs the weight matrix from a series of small matrices, reducing both computation and storage overhead. Two approaches are common: one reconstructs the weight matrix directly with a Toeplitz matrix (a matrix in which the elements along each diagonal running from top-left to bottom-right are identical); the other uses singular value decomposition (SVD) to factor the weight matrix into several smaller matrices.
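As an illustration, here is a minimal NumPy sketch of the SVD route: a dense weight matrix is replaced by two low-rank factors. The matrix shape and the rank r are arbitrary choices for the example, not values from the original post.

```python
import numpy as np

# Minimal sketch: approximate a dense weight matrix W (m x n) with rank-r factors.
# The shape (512 x 1024) and rank r = 64 are illustrative assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))

r = 64
U, S, Vt = np.linalg.svd(W, full_matrices=False)   # W = U @ diag(S) @ Vt
A = U[:, :r] * S[:r]        # m x r (absorb singular values into the left factor)
B = Vt[:r, :]               # r x n

W_approx = A @ B            # rank-r reconstruction of W
# Storage/compute per layer drops from m*n to r*(m+n) parameters.
print(W.size, A.size + B.size)                           # 524288 vs. 98304
print(np.linalg.norm(W - W_approx) / np.linalg.norm(W))  # relative error
```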
- Pruning and sparse constraints
Pruning is a classic post-processing technique in model compression; typical applications include pre-pruning and post-pruning of decision trees. Pruning reduces the number of model parameters, prevents overfitting, and improves the model's ability to generalize.
Applied to a neural network, pruning follows four steps (a minimal sketch follows the list):
First, measure the importance of each neuron.
Second, remove the least important neurons.
Third, fine-tune the network.
Fourth, return to the first step for the next round of pruning.
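The following PyTorch-style sketch shows one pruning round, using weight magnitude as a stand-in importance measure; the pruning ratio and the single `Linear` layer are illustrative, and the fine-tuning step is only indicated in comments.

```python
import torch

def magnitude_prune(weight: torch.Tensor, ratio: float = 0.3) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the `ratio` fraction of smallest-magnitude weights.

    Magnitude is used here as an assumed importance measure; ratio=0.3 is illustrative.
    """
    k = int(weight.numel() * ratio)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# One pruning round for a single layer (steps 1-2); fine-tuning (step 3) would
# re-train with the mask applied, and then the loop repeats (step 4).
layer = torch.nn.Linear(256, 128)
mask = magnitude_prune(layer.weight.data, ratio=0.3)
layer.weight.data.mul_(mask)   # remove the "unimportant" connections
```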
Sparse constraints achieve the same effect as direct pruning. The idea is to add a sparsity-inducing regularization term on the weights to the network's optimization objective, so that some of the weights are driven toward 0 during training; these near-zero elements then become the targets of pruning. Sparse constraints can therefore be viewed as dynamic pruning.
Compared with pruning, the advantage of sparse constraints is that a single training run is enough to achieve network pruning.
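Below is a minimal sketch of such a sparse constraint, assuming an L1 penalty on the weights added to the task loss; the model, the dummy data, and the coefficient `lambda_l1` are placeholders.

```python
import torch

# Minimal sketch: add an L1 sparsity term to the training objective so that
# part of the weights are driven toward 0 during training.
model = torch.nn.Linear(256, 10)
criterion = torch.nn.CrossEntropyLoss()
lambda_l1 = 1e-4                      # illustrative regularization strength

x = torch.randn(32, 256)              # dummy batch
y = torch.randint(0, 10, (32,))

task_loss = criterion(model(x), y)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = task_loss + lambda_l1 * l1_penalty
loss.backward()   # near-zero weights after training are the pruning candidates
```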
- Parameter quantization
Compared with pruning, parameter quantization is a commonly used back-end compression technique. Quantization summarizes the weights into a small number of representative values, each standing for one class of weights. These "representatives" are stored in a codebook, and the original weight matrix only needs to record the index of the "representative" for each entry, which greatly reduces storage overhead.
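A minimal sketch of codebook quantization for one weight matrix, assuming scalar k-means (via scikit-learn) to pick the representatives; the 16-entry codebook (4-bit indices) is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal sketch of codebook quantization for a single weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256))

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(W.reshape(-1, 1))
codebook = kmeans.cluster_centers_.ravel()     # 16 "representative" weights
indices = kmeans.labels_.astype(np.uint8)      # a 4-bit index per original weight

W_quantized = codebook[indices].reshape(W.shape)
# Storage: 16 floats plus one small index per weight, instead of a float per weight.
```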
- Binary networks
A binary network is an extreme case of quantization: every parameter can only take the value +1 or -1. Most neural networks are trained with gradient descent, but because the weights of a binary network are restricted to +1 or -1, gradients cannot be computed on them directly, so the weights cannot be updated as usual. A common compromise is to use the binarized weights in the forward and backward passes, while the weight update is applied to a latent single-precision copy of the weights.
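A minimal PyTorch sketch of that compromise, assuming sign() binarization with a straight-through estimator so the gradient reaches the latent single-precision weights; the tensor shapes and the clipping rule are illustrative assumptions.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; straight-through estimator in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)          # binary copy of the weights: +1 / -1

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Pass the gradient through to the latent full-precision weights,
        # zeroed where |w| > 1 (a common straight-through choice).
        return grad_output * (w.abs() <= 1).float()

# Usage: compute with the binary copy, but let the optimizer update
# the single-precision weights.
w_fp = torch.randn(128, 256, requires_grad=True)
w_bin = BinarizeSTE.apply(w_fp)
loss = (w_bin @ torch.randn(256, 1)).pow(2).mean()
loss.backward()    # gradient lands on w_fp
```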
- Knowledge distillation
The purpose of knowledge distillation is to convert a high-accuracy but bulky teacher model into a more compact student. The idea is: train the teacher, then apply a suitable temperature to its softmax layer to obtain a set of soft targets ("soft labels" are the class-probability distributions produced by the teacher's temperature-scaled softmax, rather than hard one-hot labels). When training the student, use the same temperature and make its outputs match the teacher's soft targets as closely as possible; this matching term is added to the student's overall objective to guide its training and transfer the teacher's knowledge.
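A minimal PyTorch sketch of such a distillation objective, assuming the common temperature-scaled KL term plus the usual hard-label loss; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target term (match the teacher at temperature T) plus hard-label
    cross-entropy. T and alpha are illustrative hyperparameters."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                         soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Usage with dummy logits:
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```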
What are the characteristics of SqueezeNet's Fire module?
Squeeze: high-dimensional features have better representational power, but they make the number of model parameters grow sharply. To balance model capacity against parameter count, a 1x1 convolution can be used to reduce the channel dimension of the input features. At the same time, the 1x1 convolution fuses information across channels, producing a more compact input feature while preserving the model's representational capacity.
Expand: 3x3 convolutions usually consume a large share of the computing resources, so the expand stage mixes cheaper 1x1 convolutions in alongside the 3x3 convolutions (their outputs are concatenated along the channel dimension) to reduce the cost.
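A minimal PyTorch sketch of a Fire module along these lines; the channel counts follow a common SqueezeNet configuration and are illustrative here.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Sketch of a SqueezeNet Fire module; the channel counts (squeeze=16,
    expand=64+64, input=96) are one common configuration, assumed for the example."""

    def __init__(self, in_ch=96, squeeze_ch=16, expand1x1_ch=64, expand3x3_ch=64):
        super().__init__()
        # Squeeze: 1x1 conv reduces the channel dimension and mixes channel info.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand: cheap 1x1 convs run alongside the more expensive 3x3 convs.
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Usage: a feature map with 96 channels goes in, 64 + 64 = 128 channels come out.
y = Fire()(torch.randn(1, 96, 32, 32))
print(y.shape)    # torch.Size([1, 128, 32, 32])
```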
Copyright notice
This article was written by [DCGJ666]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230611343243.html