Summary: Cross Validation
2022-08-11 05:33:00 【weiAweiww】
WHAT
Cross-Validation, or CV for short.
Also called rotation estimation, it is a statistical method that partitions a data sample into smaller subsets, fits a model on some of them and evaluates it on the others.
Three terms:
Training set: the sample data the model learns from; it is used to fit the model's parameters.
Validation set: used to tune the trained model, i.e. to choose the network structure or the hyperparameters that control the model's complexity.
Test set: used to assess the final model's performance on unseen data.
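A minimal sketch of producing the three sets. It assumes a single random shuffle, with 20% of the samples going to validation and 20% to testing. The fractions, the seed and the function name `train_val_test_split` are illustrative choices and do not come from the original post. The model is then fit on `train`, tuned on `val`, and scored once on `test`.

```python
import random

def train_val_test_split(samples, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then cut into disjoint training, validation and test parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

if __name__ == "__main__":
    train, val, test = train_val_test_split(range(100))
    print(len(train), len(val), len(test))  # 60 20 20
```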
Three important quantities:
Bias: accuracy. This is how far the learned model's expected prediction deviates from the true value, i.e. the difference between the average predicted value and the actual value. It describes how well the learning algorithm itself can fit the data.
Variance: stability. This is how much performance changes when the model is trained on different training sets of the same size, i.e. the expected squared difference between a predicted value and the average predicted value. It characterizes the impact of perturbations in the data.
Error: the overall accuracy of the model, i.e. its expected prediction error.
Note:
1. Error = Bias^2 + Variance + Noise. This holds for squared-error loss. Noise is the irreducible error in the data itself.
2. Bias and Variance usually conflict. Having both low is the ideal state (see the figure below). However, reducing the Bias tends to increase the Variance, and vice versa.
The root cause is that we use a limited data sample to estimate and predict a practically infinite population of real data. As we keep improving the model's fit (Bias decreases), overfitting sets in. The model generalizes less well, performs worse on real data, and becomes more uncertain (Variance increases). Conversely, imposing more constraints while learning the model lowers its Variance and improves its stability, but raises its Bias.
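The trade-off and the decomposition in note 1 can be checked numerically. The sketch below is a hypothetical Monte Carlo experiment, not part of the original post, and it rests on several assumptions:
- The true function is sin(2πx).
- The noise is Gaussian with σ = 0.3.
- The model is a k-nearest-neighbor regressor, and its neighbor count k stands in for model complexity. A small k gives a flexible model (low bias, high variance); a large k gives a rigid one (high bias, low variance).

For each k, the code estimates Bias^2, Variance and the expected squared error at a single point x0 over many independently drawn training sets. The error should match Bias^2 + Variance + Noise up to Monte Carlo error.

```python
import math
import random
import statistics

def true_f(x):
    # The (normally unknown) true function we are trying to learn.
    return math.sin(2 * math.pi * x)

def knn_predict(train, x0, k):
    # k-nearest-neighbor regression: average the targets of the k closest training points.
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def bias_variance(k, n_train=30, sigma=0.3, x0=0.25, trials=2000, seed=0):
    """Estimate Bias^2, Variance, Noise and the expected squared error at x0."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        # Draw a fresh training set of the same size in every trial.
        xs = [rng.random() for _ in range(n_train)]
        train = [(x, true_f(x) + rng.gauss(0, sigma)) for x in xs]
        y_hat = knn_predict(train, x0, k)
        y0 = true_f(x0) + rng.gauss(0, sigma)  # a new noisy observation at x0
        preds.append(y_hat)
        sq_errors.append((y0 - y_hat) ** 2)
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - true_f(x0)) ** 2                 # (average prediction - truth)^2
    variance = statistics.pvariance(preds, mu=mean_pred)    # spread of predictions across training sets
    return bias_sq, variance, sigma ** 2, statistics.fmean(sq_errors)

if __name__ == "__main__":
    for k in (1, 5, 30):
        b, v, noise, err = bias_variance(k)
        print(f"k={k:2d}  Bias^2={b:.3f}  Variance={v:.3f}  "
              f"Bias^2+Var+Noise={b + v + noise:.3f}  Error={err:.3f}")
```

With k = 1 the model tracks individual noisy points, so its variance is high. With k = 30 it predicts the overall mean, so its bias is large.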
Summary: overfitting has high variance, and underfitting has high bias.
So how can these two extremes be avoided?
(1) Avoid underfitting: find better (more representative) features, or use more features (increase the dimension of the input vector).
(2) Avoid overfitting:
- enlarge the data set, which reduces the relative influence of noise;
- use fewer features, which reduces the data dimension;
- apply regularization, i.e. add a penalty term to the objective or cost function;
- use cross-validation, the focus of this post.
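To illustrate adding a regular term to the cost function, here is a minimal sketch of L2 (ridge) regularization. It uses a one-parameter linear model y ≈ w·x with no intercept. Ridge is only one common choice of penalty and is assumed here for simplicity. Minimizing Σ(y − w·x)² + λw² has the closed form w = Σxy / (Σx² + λ), so a larger λ shrinks w toward zero.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x*y
    sxx = sum(x * x for x in xs)              # sum of x^2
    return sxy / (sxx + lam)                  # the penalty lam enlarges the denominator

if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    for lam in (0.0, 14.0, 100.0):
        print(lam, ridge_slope(xs, ys, lam))
```

On this toy data, Σxy = 28 and Σx² = 14. The slope therefore drops from 2.0 (λ = 0) to 1.0 (λ = 14), trading a little bias for lower variance.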
Three common CV methods: the Hold-out Method, K-fold Cross Validation, and Leave-One-Out Cross Validation (LOOCV).
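The three methods differ mainly in how the sample indices are split. Below is a minimal plain-Python sketch of the other two methods. The function names are hypothetical, and a "predict the training mean" model stands in for a real learner.
- Hold-out makes a single random train/validation split.
- LOOCV makes n splits that each hold out exactly one sample. This is equivalent to k-fold CV with k = n.

```python
import random

def hold_out_split(n, val_fraction=0.3, seed=0):
    """Hold-out: shuffle the indices once and set a fraction aside for validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_fraction)
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)

def leave_one_out_splits(n):
    """LOOCV: n splits, each validating on exactly one sample."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def loo_mse_of_mean(ys):
    """LOOCV squared error of the simplest model: predict the mean of the training targets."""
    errors = []
    for train_idx, (val_i,) in leave_one_out_splits(len(ys)):
        pred = sum(ys[j] for j in train_idx) / len(train_idx)
        errors.append((ys[val_i] - pred) ** 2)
    return sum(errors) / len(errors)

if __name__ == "__main__":
    print(hold_out_split(10))
    print(loo_mse_of_mean([1.0, 2.0, 3.0, 4.0]))  # 20/9, about 2.222
```

LOOCV uses almost all the data for training in every round, but it needs n model fits. Hold-out needs only one fit, but its estimate depends on which samples happen to land in the validation set.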
K-fold Cross Validation in detail:

1. Divide the original data into k groups (usually of equal size). Use each subset once as the validation set and the remaining k-1 subsets as the training set. This yields k models.
2. Use the average classification accuracy of the k models on their respective validation sets as the performance index of the classifier under this k-CV.
3. Repeat steps 1-2 for each candidate hyperparameter setting, compare the average scores, and pick the best hyperparameters. Hyperparameters are values set before the learning process starts, not parameters obtained through training.
4. With the best hyperparameters fixed, retrain the model on all k groups together as the training set to obtain the final model.
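The four steps can be sketched in plain Python. This is an illustrative sketch, not the post's own code, and it rests on these assumptions:
- The data are synthetic, 1-D and binary, with 20% label noise.
- The model is a k-nearest-neighbor classifier, and its neighbor count `n_neighbors` is the hyperparameter being tuned. It is distinct from the fold count k.
- Accuracy is the score.

All function names are hypothetical.

```python
import random

def make_data(n=100, flip=0.2, seed=1):
    # Hypothetical 1-D data: label is 1 when x > 0.5, with a fraction `flip` of labels flipped as noise.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < flip:
            label = 1 - label
        data.append((x, label))
    return data

def make_folds(n, k, seed=0):
    # Step 1: shuffle the indices and deal them into k groups of (nearly) equal size.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_classify(train, x, n_neighbors):
    # Majority vote among the n_neighbors training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cross_val_accuracy(data, n_neighbors, k=5, seed=0):
    # Steps 1-2: each fold serves once as the validation set; average the k accuracies.
    scores = []
    for val_idx in make_folds(len(data), k, seed):
        held_out = set(val_idx)
        train = [p for j, p in enumerate(data) if j not in held_out]
        correct = sum(knn_classify(train, data[j][0], n_neighbors) == data[j][1]
                      for j in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)

def select_and_refit(data, candidates, k=5):
    # Step 3: score every candidate hyperparameter on the same folds and keep the best.
    scores = {h: cross_val_accuracy(data, h, k) for h in candidates}
    best = max(scores, key=scores.get)
    # Step 4: "retrain" on all k folds; for kNN this means using every sample as reference.
    full_train = list(data)
    def final_model(x):
        return knn_classify(full_train, x, best)
    return best, scores, final_model

if __name__ == "__main__":
    data = make_data()
    best, scores, model = select_and_refit(data, candidates=[1, 3, 5, 9, 15])
    print("CV accuracy per n_neighbors:", scores)
    print("best n_neighbors:", best, "prediction at x=0.9:", model(0.9))
```

For a parametric model, step 4 would call the model's fitting routine on the full data instead. Libraries such as scikit-learn package this loop (for example `KFold`, `cross_val_score` and `GridSearchCV` in `sklearn.model_selection`).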
WHY
1. Cross-validation evaluates a model's predictive performance, especially how the trained model performs on new data. Using it for model selection can reduce overfitting to some extent.
2. It extracts as much useful information as possible from limited data. In k-fold CV, every sample is used for validation exactly once and for training k-1 times.
3. It is a convenient way to estimate model performance using only the training data, so the test set can be held back until modeling is finished.