Knowledge Distillation Thesis Learning
2022-08-10 05:46:00 【a program circle】
Supervised Learning
Supervised learning: the training set contains not only samples but also the labels corresponding to those samples; that is, samples and labels appear in pairs. The goal of supervised learning is to learn an effective sample-to-label mapping from the training samples so that it can predict the labels of unseen samples. Common supervised learning methods include neural networks and support vector machines.
All regression and classification algorithms are supervised learning. The difference between regression and classification lies in the type of output variable: quantitative output (continuous-variable prediction) is called regression, while qualitative output (discrete-variable prediction) is called classification.
Unsupervised Learning
Unsupervised learning: the training samples carry no labels. The category of each sample is not known in advance; similar samples are grouped together by some measure (e.g., clustering).
Semi-Supervised Learning
Semi-supervised learning: with only a small number of labeled training samples, the learner starts from the knowledge obtained from those samples and, combining it with the distribution of the unlabeled test samples, gradually corrects its existing knowledge and predicts the categories of the test samples.
Knowledge Distillation
Knowledge distillation (KD) is a common method for model compression. A lightweight small model is trained under the supervision signal of a larger model with better performance, so that the small model achieves better performance and accuracy. The large model is called the teacher model and the small model is called the student model.
Classification of Knowledge Distillation: Offline Distillation, Semi-Supervised Distillation, Self-Supervised Distillation
The core idea of knowledge distillation is to first train a complex network model, and then train a smaller network using both the output of this complex network and the true labels of the data. A knowledge distillation framework therefore usually includes a complex model (the teacher) and a small model (the student).
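To make the framework concrete, below is a minimal sketch of the usual distillation loss (in the spirit of Hinton et al.'s soft-target formulation), assuming PyTorch; the function name `distillation_loss` and the values of the temperature `T` and mixing weight `alpha` are illustrative choices, not taken from the text above.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # the T^2 factor keeps gradient magnitudes comparable
    # Hard-target term: ordinary cross-entropy with the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

During training, the teacher's logits are computed with gradients disabled (e.g., inside `torch.no_grad()`), so only the student's parameters are updated.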
Features of Knowledge Distillation
1. Improve model accuracy
If the user is not satisfied with the accuracy of the current network model A, they can first train a higher-accuracy teacher model B (usually with more parameters and higher latency), and then use this trained teacher model B to perform knowledge distillation on student model A, obtaining a higher-accuracy model.
2. Reduce model latency and compress network parameters.
3. Domain transfer between image labels
Suppose the user trains a teacher model A on a cat-and-dog dataset and a teacher model B on a banana-and-apple dataset. The two models can then be used together to distill a single student model that recognizes dogs, cats, bananas, and apples, integrating and transferring the two datasets from different domains.
4. Reduce the amount of labeling
This is achieved by semi-supervised distillation: the user uses the trained teacher network to distill knowledge into the student on an unlabeled dataset, so that far less human labeling is needed.
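As a hedged illustration of this semi-supervised use, the sketch below (assuming PyTorch; `teacher`, `student`, `unlabeled_loader`, and `optimizer` are hypothetical objects supplied by the user) lets the trained teacher produce soft targets for unlabeled images, which then supervise the student without human labels.

```python
import torch
import torch.nn.functional as F

def distill_unlabeled(teacher, student, unlabeled_loader, optimizer, T=4.0):
    # Hypothetical sketch: the teacher generates soft targets for data
    # that has no human labels, and the student learns from them.
    teacher.eval()
    student.train()
    for images in unlabeled_loader:      # no ground-truth labels needed
        with torch.no_grad():            # teacher provides the supervision
            soft_targets = F.softmax(teacher(images) / T, dim=1)
        loss = F.kl_div(
            F.log_softmax(student(images) / T, dim=1),
            soft_targets,
            reduction="batchmean",
        ) * (T * T)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```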
Feature Distillation
In deep learning, logits are the outputs of the last layer of the model, i.e., the raw, unnormalized scores, which can then be squashed by softmax or sigmoid.
The range of logits is $(-\infty, +\infty)$.
**Logits: unnormalized scores, generally the input of the softmax layer.** Logits therefore have the same shape as the labels. They can also be used as the input of a sigmoid.
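A small PyTorch snippet may make the shape relationship concrete (the batch size and class count here are made up for illustration):

```python
import torch

logits = torch.randn(8, 10)           # batch of 8, 10 classes: raw, unnormalized scores
probs = torch.softmax(logits, dim=1)  # each row now sums to 1; shape is still (8, 10)
binary = torch.sigmoid(logits)        # element-wise squashing into (0, 1); same shape
print(logits.shape == probs.shape == binary.shape)  # True
```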
Feature distillation differs from the logits method: instead of learning only the result knowledge in the teacher's final logits, the student learns the intermediate-layer features of the teacher network. The correspondence between the teacher's intermediate feature layers and the student's is the knowledge passed to the student.
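A minimal sketch of this idea, assuming PyTorch and written in the spirit of FitNets-style hint learning: a small 1x1-convolution adapter maps the student's intermediate feature map to the teacher's channel width, and an MSE loss pulls the two feature maps together. All module names and feature-map sizes below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    """Match a student intermediate feature map to the teacher's (hint loss)."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv adapter: lets the student feature map match the
        # teacher's channel dimension before comparison.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # detach() keeps gradients from flowing into the teacher
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())

# Illustrative usage with made-up feature-map sizes:
hint = FeatureDistillLoss(student_channels=64, teacher_channels=256)
s_feat = torch.randn(4, 64, 14, 14)   # student intermediate features
t_feat = torch.randn(4, 256, 14, 14)  # teacher intermediate features
loss = hint(s_feat, t_feat)
```

In practice this hint loss is usually added to the logits-based distillation loss shown earlier, with a weighting coefficient tuned on a validation set.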