Machine learning -- model optimization
2022-04-23 13:16:00 【DCGJ666】
Model compression methods
- Low-rank approximation
The basic operation of a neural network is convolution, which is in essence a matrix operation. Low-rank approximation reconstructs the weight matrix from a series of small matrices, reducing both computation and storage overhead. Two approaches are common: one reconstructs the weight matrix directly with a Toeplitz matrix (a matrix in which the elements along each diagonal running from top-left to bottom-right are identical); the other uses singular value decomposition (SVD) to factor the weight matrix into several smaller matrices.
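As an illustration, here is a minimal NumPy sketch of the SVD route: a dense weight matrix is replaced by two low-rank factors. The matrix shape and the rank r are arbitrary choices for the example, not values from the original post.

```python
import numpy as np

# Minimal sketch: approximate a dense weight matrix W (m x n) with rank-r factors.
# The shape (512 x 1024) and rank r = 64 are illustrative assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))

r = 64
U, S, Vt = np.linalg.svd(W, full_matrices=False)   # W = U @ diag(S) @ Vt
A = U[:, :r] * S[:r]        # m x r (absorb singular values into the left factor)
B = Vt[:r, :]               # r x n

W_approx = A @ B            # rank-r reconstruction of W
# Storage/compute per layer drops from m*n to r*(m+n) parameters.
print(W.size, A.size + B.size)                           # 524288 vs. 98304
print(np.linalg.norm(W - W_approx) / np.linalg.norm(W))  # relative error
```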
- Pruning and sparse constraints
Pruning is a classic post-processing technique in model compression; typical applications include pre-pruning and post-pruning of decision trees. Pruning reduces the number of model parameters, prevents overfitting, and improves the model's ability to generalize.
Applied to a neural network, pruning follows four steps (a minimal sketch follows the list):
First, measure the importance of each neuron.
Second, remove the least important neurons.
Third, fine-tune the network.
Fourth, return to the first step for the next round of pruning.
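The following PyTorch-style sketch shows one pruning round, using weight magnitude as a stand-in importance measure; the pruning ratio and the single `Linear` layer are illustrative, and the fine-tuning step is only indicated in comments.

```python
import torch

def magnitude_prune(weight: torch.Tensor, ratio: float = 0.3) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the `ratio` fraction of smallest-magnitude weights.

    Magnitude is used here as an assumed importance measure; ratio=0.3 is illustrative.
    """
    k = int(weight.numel() * ratio)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# One pruning round for a single layer (steps 1-2); fine-tuning (step 3) would
# re-train with the mask applied, and then the loop repeats (step 4).
layer = torch.nn.Linear(256, 128)
mask = magnitude_prune(layer.weight.data, ratio=0.3)
layer.weight.data.mul_(mask)   # remove the "unimportant" connections
```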
Sparse constraints achieve the same effect as direct pruning. The idea is to add a sparsity-inducing regularization term on the weights to the network's optimization objective, so that some of the weights are driven toward 0 during training; these near-zero elements then become the targets of pruning. Sparse constraints can therefore be viewed as dynamic pruning.
Compared with pruning, the advantage of sparse constraints is that a single training run is enough to achieve network pruning.
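Below is a minimal sketch of such a sparse constraint, assuming an L1 penalty on the weights added to the task loss; the model, the dummy data, and the coefficient `lambda_l1` are placeholders.

```python
import torch

# Minimal sketch: add an L1 sparsity term to the training objective so that
# part of the weights are driven toward 0 during training.
model = torch.nn.Linear(256, 10)
criterion = torch.nn.CrossEntropyLoss()
lambda_l1 = 1e-4                      # illustrative regularization strength

x = torch.randn(32, 256)              # dummy batch
y = torch.randint(0, 10, (32,))

task_loss = criterion(model(x), y)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = task_loss + lambda_l1 * l1_penalty
loss.backward()   # near-zero weights after training are the pruning candidates
```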
- Parameter quantization
Compared with pruning, parameter quantization is a commonly used back-end compression technique. Quantization summarizes the weights into a small number of representative values, each standing for one class of weights. These "representatives" are stored in a codebook, and the original weight matrix only needs to record the index of the "representative" for each entry, which greatly reduces storage overhead.
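A minimal sketch of codebook quantization for one weight matrix, assuming scalar k-means (via scikit-learn) to pick the representatives; the 16-entry codebook (4-bit indices) is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal sketch of codebook quantization for a single weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256))

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(W.reshape(-1, 1))
codebook = kmeans.cluster_centers_.ravel()     # 16 "representative" weights
indices = kmeans.labels_.astype(np.uint8)      # a 4-bit index per original weight

W_quantized = codebook[indices].reshape(W.shape)
# Storage: 16 floats plus one small index per weight, instead of a float per weight.
```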
- Binary networks
A binary network is an extreme case of quantization: every parameter can only take the value +1 or -1. Most neural networks are trained with gradient descent, but because the weights of a binary network are restricted to +1 or -1, gradients cannot be computed on them directly, so the weights cannot be updated as usual. A common compromise is to use the binarized weights in the forward and backward passes, while the weight update is applied to a latent single-precision copy of the weights.
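A minimal PyTorch sketch of that compromise, assuming sign() binarization with a straight-through estimator so the gradient reaches the latent single-precision weights; the tensor shapes and the clipping rule are illustrative assumptions.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; straight-through estimator in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)          # binary copy of the weights: +1 / -1

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Pass the gradient through to the latent full-precision weights,
        # zeroed where |w| > 1 (a common straight-through choice).
        return grad_output * (w.abs() <= 1).float()

# Usage: compute with the binary copy, but let the optimizer update
# the single-precision weights.
w_fp = torch.randn(128, 256, requires_grad=True)
w_bin = BinarizeSTE.apply(w_fp)
loss = (w_bin @ torch.randn(256, 1)).pow(2).mean()
loss.backward()    # gradient lands on w_fp
```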
- Knowledge distillation
The purpose of knowledge distillation is to convert a high-accuracy but bulky teacher model into a more compact student. The idea is: train the teacher, then apply a suitable temperature to its softmax layer to obtain a set of soft targets ("soft labels" are the class-probability distributions produced by the teacher's temperature-scaled softmax, rather than hard one-hot labels). When training the student, use the same temperature and make its outputs match the teacher's soft targets as closely as possible; this matching term is added to the student's overall objective to guide its training and transfer the teacher's knowledge.
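A minimal PyTorch sketch of such a distillation objective, assuming the common temperature-scaled KL term plus the usual hard-label loss; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target term (match the teacher at temperature T) plus hard-label
    cross-entropy. T and alpha are illustrative hyperparameters."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                         soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Usage with dummy logits:
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```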
What are the characteristics of SqueezeNet's Fire module?
Squeeze: high-dimensional features have better representational power, but they make the number of model parameters grow sharply. To balance model capacity against parameter count, a 1x1 convolution can be used to reduce the channel dimension of the input features. At the same time, the 1x1 convolution fuses information across channels, producing a more compact input feature while preserving the model's representational capacity.
Expand: 3x3 convolutions usually consume a large share of the computing resources, so the expand stage mixes cheaper 1x1 convolutions in alongside the 3x3 convolutions (their outputs are concatenated along the channel dimension) to reduce the cost.
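A minimal PyTorch sketch of a Fire module along these lines; the channel counts follow a common SqueezeNet configuration and are illustrative here.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Sketch of a SqueezeNet Fire module; the channel counts (squeeze=16,
    expand=64+64, input=96) are one common configuration, assumed for the example."""

    def __init__(self, in_ch=96, squeeze_ch=16, expand1x1_ch=64, expand3x3_ch=64):
        super().__init__()
        # Squeeze: 1x1 conv reduces the channel dimension and mixes channel info.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand: cheap 1x1 convs run alongside the more expensive 3x3 convs.
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Usage: a feature map with 96 channels goes in, 64 + 64 = 128 channels come out.
y = Fire()(torch.randn(1, 96, 32, 32))
print(y.shape)    # torch.Size([1, 128, 32, 32])
```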
Copyright notice
This article was written by [DCGJ666]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204230611343243.html