
CVPR 2022 | Efficient Pre-training Based on Knowledge Distillation

2022-04-23 21:48:00 Zhiyuan community

Paper link: https://arxiv.org/abs/2203.05180

Large-scale pre-training has been shown to be critical for a wide range of computer vision tasks and can bring significant gains. However, as the amount of pre-training data grows, private data becomes more common, and model architectures diversify, pre-training every model architecture on a large-scale dataset becomes expensive, inefficient, and impractical.
The researchers therefore ask: since a model pre-trained on massive data has already extracted the knowledge in that data, can this knowledge be transferred to a new model efficiently and quickly using only a small fraction of the pre-training data?
Motivated by this, the researchers propose to achieve efficient model pre-training through knowledge distillation. They find that traditional knowledge distillation operates on classification logits, yet these logits are not used in downstream transfer tasks, so it is ill-suited to the feature learning that pre-training requires. To address this, they propose a pure feature distillation method that aligns feature dimensions without introducing any additional parameters.
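To make the idea concrete, here is a minimal sketch of feature-level distillation in PyTorch. It assumes the teacher and student produce pooled feature vectors and that any dimension mismatch is resolved by a fixed, non-learned interpolation; the paper's exact alignment scheme may differ, so treat this only as an illustration of parameter-free feature distillation rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_feat: torch.Tensor,
                              teacher_feat: torch.Tensor) -> torch.Tensor:
    """Pure feature distillation: pull student features toward teacher features.

    student_feat: (B, Ds) pooled features from the student model
    teacher_feat: (B, Dt) pooled features from the frozen teacher model
    Dimension alignment uses a fixed interpolation, so no extra learnable
    parameters are added (an assumption of this sketch, not necessarily
    the alignment used in the paper).
    """
    if student_feat.shape[1] != teacher_feat.shape[1]:
        # Non-parametric dimension alignment: 1-D linear interpolation of the
        # student feature vector to the teacher's feature dimension.
        student_feat = F.interpolate(
            student_feat.unsqueeze(1),        # (B, 1, Ds)
            size=teacher_feat.shape[1],       # -> Dt
            mode="linear",
            align_corners=False,
        ).squeeze(1)                          # (B, Dt)

    # L2-normalize both sides and minimize their distance, which is
    # equivalent to maximizing cosine similarity with the teacher.
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat.detach(), dim=1)  # teacher provides targets only
    return (1.0 - (s * t).sum(dim=1)).mean()
```

In a training loop, this loss would replace the usual logit-based distillation term: the student backbone is optimized to reproduce the teacher's feature space on a small subset of the pre-training data, and the distilled features are then transferred to downstream tasks.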
