[Machine learning] Introduction to scikit-learn
2022-04-23 05:32:00 【Muxi dare】
1. A brief introduction to scikit-learn
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides a variety of tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
2. Fitting and predicting: estimator basics
(1) Estimators
scikit-learn provides many built-in machine learning algorithms and models, collectively called estimators. Each estimator can be fitted to some data using its fit method.
(2) The fit method
The fit method usually takes two inputs:
- The sample matrix (or design matrix) X. The size of X is (n_samples, n_features), that is, samples are rows and features are columns.
- The target values y. These are real numbers for regression tasks, or integers (or any other discrete set of values) for classification. For unsupervised learning tasks, y does not need to be specified. y is usually a one-dimensional array whose i-th entry is the target of the i-th sample (row) of X.
X and y are usually NumPy arrays or equivalent array-like data types.
Once the estimator is fitted, it can be used to predict the target values of new data, with no need to retrain it.
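As a minimal sketch of fit and predict, the snippet below fits a RandomForestClassifier on a tiny data set. The numbers in X and y are made up for illustration, and the choice of classifier is an assumption; any other estimator would follow the same pattern.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (made up for illustration): 2 samples (rows), 3 features (columns).
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]  # the i-th entry is the class of the i-th row of X

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)  # fit the estimator to the training data

# The fitted estimator predicts targets for the training data and for
# new, unseen samples without being fitted again.
train_pred = clf.predict(X)
new_pred = clf.predict([[4, 5, 6], [14, 15, 16]])
print(train_pred, new_pred)
```

Note that X is passed as a nested list here; scikit-learn converts array-like inputs to NumPy arrays internally.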
3. Transformers and pre-processors
A machine learning workflow is usually composed of several parts. A typical pipeline consists of a pre-processing step that transforms or imputes the data, and a final predictor that predicts the target values.
pipeline = a pre-processing step (transform or impute the data) + a final predictor/estimator (predicts target values)
In scikit-learn, pre-processors and transformers follow the same API as estimator objects (they all in fact inherit from the same BaseEstimator class). Transformer objects have no predict method; instead they have a transform method that outputs a newly transformed sample matrix X.
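To illustrate the transformer API, here is a short sketch using StandardScaler, which rescales each feature to zero mean and unit variance. The input values are made up for illustration.

```python
from sklearn.preprocessing import StandardScaler

# Toy sample matrix (made up for illustration): 2 samples, 2 features.
X = [[0, 15],
     [1, -10]]

# fit learns the per-column mean and standard deviation;
# transform returns the rescaled sample matrix.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
print(X_scaled)

# A transformer exposes transform, not predict.
print(hasattr(scaler, "transform"), hasattr(scaler, "predict"))
```

Calling fit and transform separately makes it possible to learn the scaling on training data only and then apply the same scaling to test data.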
ColumnTransformer: when different features need different transformations, ColumnTransformer applies each transformer to its own subset of columns.
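A rough sketch of ColumnTransformer follows. The data, the column roles (column 0 as a numeric feature, column 1 as an integer category code), and the step names "num" and "cat" are all assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (made up): column 0 is numeric, column 1 is a category code.
X = np.array([
    [20.0, 0],
    [30.0, 1],
    [40.0, 2],
    [50.0, 1],
])

ct = ColumnTransformer(
    [
        ("num", StandardScaler(), [0]),  # scale the numeric column
        ("cat", OneHotEncoder(), [1]),   # one-hot encode the category column
    ],
    sparse_threshold=0,  # always return a dense array
)
X_t = ct.fit_transform(X)
print(X_t.shape)  # 1 scaled column + 3 one-hot columns
print(X_t)
```

The output columns follow the order of the transformer list, so the scaled numeric column comes first and the one-hot columns after it.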
4. Pipelines: chaining pre-processors and estimators
Transformers and estimators can be combined into a single object: a Pipeline. A pipeline offers the same API as a regular estimator: it can be fitted with fit and used for prediction with predict. Using a pipeline also helps prevent data leakage, that is, disclosing information about the test data to the training data.
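The following sketch chains a StandardScaler and a LogisticRegression into one pipeline and evaluates it on a held-out split of the Iris data set. The data set, the estimator, and the random_state value are illustrative choices, not something the text above prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Iris data set and split it into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chain the pre-processor and the final predictor.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# The scaler is fitted on the training data only; at predict time the
# same scaling is applied to the test data before classification.
pipe.fit(X_train, y_train)
accuracy = accuracy_score(y_test, pipe.predict(X_test))
print(accuracy)
```

Because the scaler never sees the test split during fitting, no test-set statistics leak into training.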
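5. Model evaluation
Fitting a model to some data does not mean it will predict well on unseen data, so the model needs to be evaluated directly. scikit-learn provides many tools for model evaluation, in particular for cross-validation.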
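As one possible illustration, the sketch below runs 5-fold cross-validation with the cross_validate helper. The synthetic regression data and the LinearRegression estimator are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic, noise-free regression data (illustrative only).
X, y = make_regression(n_samples=1000, random_state=0)
lr = LinearRegression()

# cross_validate defaults to 5-fold cross-validation and returns a dict
# with fit times, score times, and one test score per fold.
result = cross_validate(lr, X, y)
test_scores = result["test_score"]
print(test_scores)  # R^2 per fold
```

The scores here are close to perfect only because the synthetic data is exactly linear; real data would give lower and more varied scores.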
6. Automatic parameter searches
All estimators have parameters that can be tuned (often called hyper-parameters). The generalization power of an estimator often depends critically on a few of these parameters. It is usually not clear in advance what their values should be, because the best values depend on the data at hand.
Scikit-learn provides tools that automatically find the best parameter combination (via cross-validation).
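A minimal sketch of such a search, using RandomizedSearchCV, is shown below. The synthetic data, the RandomForestRegressor, the parameter ranges, and the small n_iter and cv values are all illustrative assumptions chosen so the example runs quickly.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Distributions to sample candidate parameter values from.
param_distributions = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 8),
}

# Try 5 random parameter combinations, each scored by 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)  # R^2 of the refitted best model
print(best_params, test_r2)
```

After fitting, the search object behaves like a regular estimator that was fitted with the best parameter combination found.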
Note: in practice, you almost always want to search over a pipeline rather than a single estimator. One of the main reasons is that if you apply a pre-processing step to the whole data set without using a pipeline and then perform any kind of cross-validation, you break the fundamental assumption of independence between training and test data. Because the data was pre-processed using the whole data set, some information about the test sets becomes available to the training sets. This leads to overestimating the generalization power of the estimator (a Kaggle article discusses this in more detail).
Using a pipeline for cross-validation and searching largely keeps you from this common pitfall.
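Searching over a pipeline could look like the sketch below, which tunes the regularization strength C of a LogisticRegression that sits behind a StandardScaler. The data set, the grid values, and max_iter=1000 are illustrative assumptions. The parameter name "logisticregression__C" follows the step naming that make_pipeline generates (lowercased class name, then a double underscore, then the parameter).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-processing and predictor are searched together as one estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}

# In each cross-validation split the scaler is refitted on the training
# folds only, so the validation fold never influences the pre-processing.
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)

best_c = search.best_params_["logisticregression__C"]
test_accuracy = search.score(X_test, y_test)
print(best_c, test_accuracy)
```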
Copyright notice
This article was written by [Muxi dare]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220535577551.html