[Machine learning] Introduction to scikit-learn
2022-04-23 05:32:00 【Muxi dare】
One 、A brief introduction to scikit-learn
Scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
Two 、Fitting and predicting: estimator basics
( One ) Estimators
Scikit-learn provides built-in machine learning algorithms and models, collectively called estimators. Each estimator can be fitted to some data using its fit method.
( Two ) The fit method
The fit method usually takes two inputs: the sample matrix (or design matrix) X and the target values y. X has shape (n_samples, n_features), so samples are rows and features are columns. The target y holds real numbers for regression tasks, or integers for classification (or any other discrete set of values). For unsupervised learning tasks, y does not need to be specified. y is usually a one-dimensional array whose i-th entry is the target of the i-th sample (row) of X. Both X and y are usually numpy arrays or equivalent array-like data types.
Once the estimator is fitted, it can be used to predict target values for new data, with no need to retrain the estimator.
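The fit-then-predict flow described above can be sketched as follows. This is a minimal illustration, assuming a RandomForestClassifier as the estimator and a tiny made-up dataset; neither is prescribed by the text.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (an assumption for illustration): 2 samples, 3 features
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]  # class label of each sample

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)  # learn from (X, y)

# Predict on the training data and on unseen data, without refitting
pred_train = clf.predict(X)
pred_new = clf.predict([[4, 5, 6], [14, 15, 16]])
print(pred_train, pred_new)
```

Any other estimator could be swapped in here, since they all expose the same fit and predict methods.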
Three 、Transformers and pre-processors
A machine learning workflow usually consists of several parts. A typical pipeline consists of a pre-processing step that transforms or imputes the data, and a final predictor that predicts target values.
pipeline = a pre-processing step (transform or impute the data) + a final predictor/estimator (predicts target values)
In scikit-learn, pre-processors and transformers follow the same API as estimator objects (they all in fact inherit from the same BaseEstimator class). Transformer objects have no predict method; instead they have a transform method that outputs a newly transformed sample matrix X.
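As a small sketch of the transformer API, the example below standardizes a toy matrix with StandardScaler. The choice of scaler and the data are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy sample matrix (an assumption): 2 samples, 2 features
X = np.array([[0.0, 15.0],
              [1.0, -10.0]])

scaler = StandardScaler()
scaler.fit(X)                   # learn per-column mean and standard deviation
X_scaled = scaler.transform(X)  # output a new, transformed sample matrix
print(X_scaled)

# A transformer exposes transform rather than predict
print(hasattr(scaler, "predict"))
```

Each column of X_scaled has zero mean and unit variance, computed from the statistics learned in fit.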
ColumnTransformer: applies different transformations to different features (columns).
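A minimal sketch of ColumnTransformer, assuming a made-up matrix whose first column is numeric and whose second column is a category code. The step names "num" and "cat" are arbitrary labels chosen for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (an assumption): column 0 is numeric, column 1 is a category code
X = np.array([[20.0, 0.0],
              [30.0, 1.0],
              [40.0, 0.0]])

ct = ColumnTransformer([
    ("num", StandardScaler(), [0]),  # scale the numeric column
    ("cat", OneHotEncoder(), [1]),   # one-hot encode the categorical column
])
Xt = ct.fit_transform(X)

# 1 scaled column + 2 one-hot columns (one per category)
print(Xt.shape)
```

With a pandas DataFrame as input, the columns can be selected by name instead of by index.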
Four 、Pipelines: chaining pre-processors and estimators
Transformers and estimators can be combined into a single object, a pipeline. A pipeline offers the same API as a regular estimator: it can be fitted and used for prediction with fit and predict. Using a pipeline also helps prevent data leakage, that is, information from the test data leaking into the training data.
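The following sketch chains a scaler and a classifier with make_pipeline. The iris dataset, the LogisticRegression model, and the train/test split are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Chain a pre-processor and a final predictor into one object
pipe = make_pipeline(StandardScaler(), LogisticRegression())

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fitted on the training split only, then reused at predict time
pipe.fit(X_train, y_train)
acc = accuracy_score(y_test, pipe.predict(X_test))
print(acc)
```

Because the scaler never sees the test split during fit, the test accuracy is not contaminated by test-set statistics.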
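Five 、Model evaluation
Fitting a model to some data does not guarantee that it will predict well on unseen data, so this has to be evaluated directly. Scikit-learn provides many tools for model evaluation, in particular for cross-validation.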
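As one hedged illustration of cross-validation, the sketch below runs cross_validate on a synthetic regression problem. The dataset and the LinearRegression model are assumptions, not something the text specifies.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic, noise-free regression data (an assumption for illustration)
X, y = make_regression(n_samples=1000, random_state=0)

lr = LinearRegression()
result = cross_validate(lr, X, y)  # uses 5-fold cross-validation by default

# One R^2 score per fold
print(result["test_score"])
```

The returned dictionary also contains fit and score times for each fold.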
Six 、Automatic parameter searches
All estimators have tunable parameters (often called hyper-parameters). The generalization ability of an estimator often depends critically on a few of these parameters, and the best values usually depend on the data at hand.
Scikit-learn provides tools to automatically find the best parameter combination (via cross-validation).
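A minimal sketch of such a search, assuming GridSearchCV over a KNeighborsClassifier on the iris dataset. The candidate values in param_grid are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate hyper-parameter values (assumed for this example)
param_grid = {"n_neighbors": [1, 3, 5, 7]}

# Every candidate is scored with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

best = search.best_params_
print(best, search.best_score_)
```

After fitting, the search object behaves like a normal estimator refitted with the best parameters. RandomizedSearchCV follows the same pattern but samples candidates from distributions instead of trying a full grid.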
Note: in practice, you almost always want to search over a pipeline instead of a single estimator. One of the main reasons is that if you apply a pre-processing step to the whole dataset without using a pipeline and then perform any kind of cross-validation, you break the fundamental assumption of independence between training and test data. Because you pre-processed the data using the whole dataset, some information about the test sets becomes available to the training sets. This leads to overestimating the generalization ability of the estimator (you can read more about this in the Kaggle article).
Using a pipeline for cross-validation and searching will largely keep you from this common pitfall.
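Searching over a pipeline could look like the sketch below. The wine dataset, the SVC model, the step names "scale" and "svc", and the grid of C values are all assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Step names "scale" and "svc" are arbitrary labels for this example
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC()),
])

# Parameters of a step are addressed as <step name>__<parameter name>
param_grid = {"svc__C": [0.1, 1, 10]}

# Inside each CV split the scaler is refitted on the training folds only,
# so no statistics from the held-out fold leak into training
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Scaling X once up front and then searching over the bare SVC would instead compute the scaling statistics from all samples, including those later used for validation.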
Copyright notice
This article was written by [Muxi dare]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204220535577551.html