Understanding ML Cross Validation Fast
2022-08-09 04:20:00 【The whole stack O - Jay】
Cross Validation is a statistical analysis method for evaluating the performance of a classifier (the model you train). The basic idea is to split the original training data into a training set and a validation set. For a large data set, we generally divide it into a training set, a validation set, and a test set in a 6:2:2 ratio (a simple machine learning workflow may omit the validation set). The training set is first used to train the model, and the validation set is then used to test the trained model and give a preliminary evaluation of its performance (note: cross-validation belongs to the training stage, not the testing stage).
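As a minimal sketch of the 6:2:2 split described above (the arrays data and target are placeholders for any feature matrix and label vector), the split can be produced by calling train_test_split twice:

from sklearn.model_selection import train_test_split

# First carve off 20% as the test set, then split the remaining 80% into
# 75% / 25%, which yields an overall 60% / 20% / 20% train / validation / test split.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)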
Common cross-validation methods include simple cross-validation, K-fold cross-validation, leave-one-out cross-validation, and leave-p-out cross-validation.
Simple cross-validation
This is the simplest version of the idea above: the training data is divided into a training set and a validation set, the model is trained on the training set and validated on the validation set, and the validation accuracy serves as the performance indicator of the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.4, random_state=0)
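A sketch of how this split is typically used, assuming the iris data as a stand-in for data/target and LogisticRegression as an illustrative estimator (neither appears in the original):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data, target = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(data, target, test_size=0.4, random_state=0)

model = LogisticRegression(max_iter=1000)  # any estimator would do
model.fit(X_train, y_train)                # train on the training split
print(model.score(X_val, y_val))           # validation accuracy as the performance indicator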
K-Fold Cross Validation (K-Fold)
The training data is divided into K groups (usually of equal size). Each subset is used as the validation set in turn while the remaining K-1 groups are combined into the training set, so K rounds of cross-validation are performed and K models are obtained. The average of the K validation accuracies is taken as the performance indicator of the model. Usually K is set to 3 or greater.
from sklearn.model_selection import KFold
kf = KFold(n_splits=10)  # K = 10
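A minimal sketch of running the K folds by hand, again assuming the iris data and LogisticRegression as illustrative choices; shuffle=True and random_state=0 are my additions, not part of the original snippet:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

data, target = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kf.split(data):
    model = LogisticRegression(max_iter=1000)
    model.fit(data[train_idx], target[train_idx])                # train on K-1 folds
    scores.append(model.score(data[val_idx], target[val_idx]))   # validate on the held-out fold
print(np.mean(scores))  # average of the K validation accuracies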
Leave One Out Cross Validation (LOO-CV)
Leave-one-out cross validation is the special case of K-fold cross validation with K = N, i.e. each subset consists of a single sample. With N samples, N rounds of cross-validation are performed and N models are obtained, and the average of the N validation accuracies is taken as the performance indicator of the model.
from sklearn.model_selection import LeaveOneOut
loo = LeaveOneOut()
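A short sketch of using the splitter, again with the iris data and LogisticRegression as illustrative assumptions; cross_val_score simply runs the fit-and-score loop for every split:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

data, target = load_iris(return_X_y=True)
loo = LeaveOneOut()
# One split per sample: each model is validated on a single held-out point,
# so each score is 0 or 1 and the mean is the leave-one-out accuracy.
scores = cross_val_score(LogisticRegression(max_iter=1000), data, target, cv=loo)
print(scores.mean())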
Leave-P-Out Cross Validation (LPO-CV)
Leave-p-out cross validation generalizes leave-one-out: P is chosen by ourselves, each validation set consists of P of the N samples, and the remaining N-P samples form the training set. Every possible subset of P samples is held out exactly once, so C(N, P) rounds of cross-validation are performed and C(N, P) models are obtained, and the average of their validation accuracies is taken as the performance indicator of the model.
from sklearn.model_selection import LeavePOut
lpo = LeavePOut(p=5)  # P = 5
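Because the number of splits is C(N, P), it grows very quickly with N (with p=5 it is already enormous for even a modest dataset), so the sketch below uses a tiny toy array of N = 6 samples and p = 2 just to show the splitter; the array X and the p=2 choice are illustrative assumptions:

from math import comb
import numpy as np
from sklearn.model_selection import LeavePOut

X = np.arange(6).reshape(-1, 1)  # toy data with N = 6 samples
lpo = LeavePOut(p=2)             # every subset of P = 2 samples is held out once
print(lpo.get_n_splits(X), comb(6, 2))  # both print 15, i.e. C(N, P) splits
for train_idx, val_idx in lpo.split(X):
    pass  # fit and score a model on each split, then average, as with K-fold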