Summary: Cross Validation
2022-08-11 05:33:00 【weiAweiww】
WHAT
Cross-validation, or CV for short.
Also called rotation estimation, it is a statistical method that partitions a data sample into smaller subsets, so that a model can be trained on some subsets and validated on the others.
Three terms first:
Training set: the sample data the model learns from. The model's parameters are fitted on this set.
Validation set: used to tune the trained model, for example to choose the network structure or the parameters that control model complexity.
Test set: used to test the final model.
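The three sets can be produced with a single shuffled split. The sketch below is only an illustration: the function name `three_way_split` and the 60/20/20 proportions are assumptions, not something the original post prescribes.

```python
import random

def three_way_split(samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the samples once, then cut them into train / validation / test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is used for training
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

The three sets are disjoint, so the test set plays no part in either fitting or tuning.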
Three important indicators:
Bias: accuracy. The gap between the model's expected prediction and the actual result (the difference between the average predicted value and the true value). It describes how well the algorithm itself fits the data.
Variance: stability. How much the predictions change when the model is trained on different training sets of the same size (the expected squared difference between a prediction and the average prediction). It characterizes the impact of data perturbations.
Error: the overall error of the model, which reflects the accuracy of the model as a whole.
Note:
1. Error = Bias^2 + Variance + Noise
2. Bias and variance usually trade off against each other. Having both low is the ideal state (see the figure below), but reducing bias tends to increase variance to some extent, and vice versa.
The root cause: we are trying to use a finite sample data set to estimate and predict an effectively infinite real data set. If we keep improving the model's fit to the sample (reducing bias), overfitting sets in: the model generalizes worse, performs worse on real data, and becomes less stable (variance increases). Conversely, adding more restrictions while learning the model reduces its variance and improves its stability, but increases its bias.
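The decomposition in note 1 can be checked numerically. The following is a minimal sketch under assumptions that are not in the original: the "model" only estimates a single true value (2.0) from 10 noisy observations, and `plain_mean` and `shrunk_mean` are hypothetical stand-ins for an unrestricted and a restricted model. The error is measured against the noise-free true value, so it splits into Bias^2 + Variance; the Noise term (here `noise_sd ** 2`) would be added on top when comparing against a fresh noisy observation.

```python
import random
import statistics

def bias_variance(estimator, true_value=2.0, noise_sd=1.0,
                  n_points=10, n_trials=2000, seed=0):
    """Refit `estimator` on many same-size training sets and decompose its error."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        # Each trial draws a fresh noisy training set of the same size.
        sample = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_points)]
        preds.append(estimator(sample))
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - true_value) ** 2
    variance = statistics.fmean([(p - mean_pred) ** 2 for p in preds])
    mse = statistics.fmean([(p - true_value) ** 2 for p in preds])
    return bias_sq, variance, mse

def plain_mean(sample):
    # Unrestricted estimator: essentially unbiased, but it follows the noise.
    return statistics.fmean(sample)

def shrunk_mean(sample):
    # Restricted estimator: halving the mean lowers variance but adds bias.
    return 0.5 * statistics.fmean(sample)

for name, est in [("plain", plain_mean), ("shrunk", shrunk_mean)]:
    b2, var, mse = bias_variance(est)
    print(f"{name}: bias^2={b2:.3f} variance={var:.3f} "
          f"bias^2+variance={b2 + var:.3f} mse={mse:.3f}")
```

In this toy setting the restricted estimator has the lower variance and the higher bias, which is the trade-off described in note 2.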
Summary: overfitting means high variance, and underfitting means high bias.
So how can these two extremes be avoided?
(1) Avoid underfitting: find better (more representative) features, or use more features (increase the dimension of the input vector).
(2) Avoid overfitting: enlarge the data set (which reduces the proportion of noise), reduce the number of features (lower the data dimension), apply regularization (add a regularization term to the objective or cost function), or use cross-validation (the focus of this post).
Three CV methods: the hold-out method, K-fold cross-validation, and leave-one-out cross-validation.
K-fold cross-validation in detail:

1. Divide the original data into k groups (usually of equal size). Each subset is used once as the validation set, with the remaining k-1 subsets as the training set. This yields k models.
2. Use the average classification accuracy of the k models on their validation sets as the performance index of the classifier under this k-fold CV.
3. Compare this index across the candidate hyperparameter settings and pick the best one (hyperparameters are values set before the learning process starts, not parameters obtained through training).
4. With the optimal hyperparameters, retrain the model on all k groups of data as the training set to obtain the final model.
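The four steps above can be sketched in plain Python. Everything specific here is an assumption made for illustration: the classifier is a tiny 1-D nearest-neighbour voter, the hyperparameter is its neighbour count, the data are synthetic, and the helper names (`k_fold_splits`, `knn_predict`, `cv_accuracy`, `select_n_neighbors`) are hypothetical rather than taken from any library.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Step 1: yield (train_idx, val_idx); every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k groups of nearly equal size
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]

def knn_predict(train, x, n_neighbors):
    """Majority vote (labels are 0/1) among the n_neighbors closest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_accuracy(data, n_neighbors, k=5):
    """Step 2: average validation accuracy over the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(data), k):
        train = [data[i] for i in train_idx]
        correct = sum(knn_predict(train, data[i][0], n_neighbors) == data[i][1]
                      for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / k

def select_n_neighbors(data, candidates, k=5):
    """Step 3: keep the hyperparameter value with the best average accuracy."""
    results = {c: cv_accuracy(data, c, k) for c in candidates}
    return max(results, key=results.get), results

# Synthetic data: label is 1 when x > 0.5, with roughly 10% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(100):
    x = rng.random()
    label = 1 if x > 0.5 else 0
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

best, results = select_n_neighbors(data, candidates=[1, 5, 15])
final_model = data  # Step 4: a nearest-neighbour model "retrains" by storing all data
print(best, results)
```

Setting k equal to the number of samples turns the same splitter into leave-one-out cross-validation, since each validation fold then holds a single sample.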
WHY
1. Cross-validation evaluates the predictive performance of a model, especially how a trained model performs on new data, which helps reduce overfitting to a certain extent.
2. It extracts as much useful information as possible from limited data.
3. It is a convenient way to measure model performance using only the training set, instead of waiting to use the test set after modeling.
