Understanding ML Cross Validation Fast
2022-08-09 04:20:00 【The whole stack O - Jay】
Cross-validation is a statistical analysis method for evaluating the performance of a classifier (the model you train). The basic idea is to partition the original training data into a training set and a validation set. For a large data set, we generally divide it into a training set, a validation set, and a test set in a 6:2:2 ratio (a simple machine learning workflow may omit the validation set). The training set is used first to train the model, and the validation set is then used to test the trained model and obtain a preliminary estimate of its performance (note: cross-validation belongs to the training stage, not the testing stage).
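As a minimal sketch of the 6:2:2 split (assuming a feature matrix named data and a label vector named target; the names are purely illustrative), two successive calls to scikit-learn's train_test_split will do:

from sklearn.model_selection import train_test_split
# Keep 60% for training; hold out the remaining 40%
X_train, X_temp, y_train, y_temp = train_test_split(data, target, test_size=0.4, random_state=0)
# Split the held-out 40% in half: 20% validation, 20% test
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)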
Common cross-validation methods include simple cross-validation, K-fold cross-validation, leave-one-out cross-validation, and leave-P-out cross-validation.
Simple cross-validation
This is the simplest form of the idea described above: the training data is split into a training set and a validation set, the model is trained on the training set and validated on the validation set, and the accuracy obtained on the validation set serves as the performance indicator of the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.4, random_state=0)
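A minimal usage sketch of the full simple-cross-validation round, assuming the split above; the logistic-regression classifier is just an illustrative choice, and any estimator works in its place:

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)          # train on the training set
print(clf.score(X_test, y_test))   # accuracy on the validation split is the performance indicator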
K-Fold Cross Validation (K-Fold)

The training data is divided into K groups (usually of equal size). Each subset in turn serves as the validation set while the remaining K-1 groups together form the training set, so K rounds of cross-validation are performed and K models are obtained. The validation accuracies of the K models are averaged to give the performance indicator of the model. K is usually set to 3 or more.
from sklearn.model_selection import KFold
kf = KFold(n_splits=10)  # K = 10
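A sketch of the K-fold loop described above (assuming NumPy arrays X and y; the decision-tree classifier is an illustrative choice):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

kf = KFold(n_splits=10)
scores = []
for train_idx, val_idx in kf.split(X):
    model = DecisionTreeClassifier()
    model.fit(X[train_idx], y[train_idx])               # train on K-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))  # validate on the held-out fold
print(np.mean(scores))                                  # average accuracy of the K models

scikit-learn's cross_val_score wraps this loop in a single call.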
Leave One Out Cross Validation (LOO-CV)

Leave-one-out cross-validation is the special case of K-fold cross-validation with K = N: each subset consists of exactly one sample, so for N samples, N rounds of cross-validation are performed and N models are obtained. The validation accuracies of the N models are averaged to give the performance indicator of the model.
from sklearn.model_selection import LeaveOneOut
loo = LeaveOneOut()
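The splitter is used the same way as KFold; a tiny self-contained sketch (the 4-sample array is purely illustrative):

import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(8).reshape(4, 2)  # 4 samples, illustrative only
loo = LeaveOneOut()
print(loo.get_n_splits(X))      # 4: one round per sample
for train_idx, val_idx in loo.split(X):
    print(train_idx, val_idx)   # val_idx holds exactly one sample each round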
Leave P Out Cross Validation (LPO-CV)

Leave-P-out cross-validation generalizes leave-one-out: each validation set consists of P samples (P is chosen by us) and the remaining N-P samples form the training set. Every possible subset of P samples serves as the validation set exactly once, so C(N, P) = N! / (P!(N-P)!) rounds of cross-validation are performed, and the validation accuracies of the resulting models are averaged to give the performance indicator of the model. Because C(N, P) grows combinatorially with N, LPO-CV is practical only for small data sets.
from sklearn.model_selection import LeavePOut
lpo = LeavePOut(p=5)  # P = 5
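A quick check of the combinatorial split count (the 5-sample array is purely illustrative):

import numpy as np
from sklearn.model_selection import LeavePOut

X = np.arange(5).reshape(5, 1)  # 5 samples, illustrative only
lpo = LeavePOut(p=2)
print(lpo.get_n_splits(X))      # C(5, 2) = 10 validation rounds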