[Machine learning] An introduction to scikit-learn
2022-04-23 05:32:00 【Muxi dare】
1. A brief introduction to scikit-learn
Scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
2. Fitting and predicting: estimator basics
(1) Estimators
Scikit-learn provides many built-in machine learning algorithms and models, collectively called estimators. Each estimator can be fitted to data using its fit method.
(2) The fit method
The fit method usually takes two inputs: the sample matrix (or design matrix) X and the target values y. X has shape (n_samples, n_features): samples are rows and features are columns. The target y holds real numbers for regression tasks, or integers for classification (or any other discrete set of values). For unsupervised learning tasks, y does not need to be specified. y is usually a one-dimensional array whose i-th entry is the target of the i-th sample (row) of X. Both X and y are usually numpy arrays or an equivalent array-like data type.
Once the estimator is fitted, it can be used to predict the target values of new data, with no need to retrain it.
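The fit/predict workflow described above can be sketched as follows. This is a minimal illustration, not part of the original post: the choice of RandomForestClassifier and the toy data are assumptions made only to show the shapes of X and y.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (assumed for illustration): 2 samples (rows), 3 features (columns).
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]  # one class label per sample; y[i] is the target of row X[i]

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)  # fit the estimator to the data

# The fitted estimator can now predict without being retrained.
predictions_train = clf.predict(X)
predictions_new = clf.predict([[4, 5, 6], [14, 15, 16]])
print(predictions_train, predictions_new)
```

Any other scikit-learn estimator could be substituted here, since they all expose the same fit and predict methods.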
3. Transformers and pre-processors
Machine learning workflows are often composed of different parts. A typical pipeline consists of a pre-processing step that transforms or imputes the data, and a final predictor that predicts the target values.
pipeline = a pre-processing step (transforms or imputes the data) + a final predictor/estimator (predicts target values)
In scikit-learn, pre-processors and transformers follow the same API as estimator objects (they all actually inherit from the same BaseEstimator class). Transformer objects have no predict method; instead they have a transform method that outputs a newly transformed sample matrix X.
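As a small sketch of the transformer API, the example below standardizes a toy matrix with StandardScaler. The data values are assumptions chosen so that the result is easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy sample matrix (assumed for illustration): 2 samples, 2 features.
X = np.array([[0.0, 15.0],
              [1.0, -10.0]])

scaler = StandardScaler()
# fit learns the per-column mean and standard deviation;
# transform returns a new matrix with each column centered and scaled.
X_scaled = scaler.fit(X).transform(X)
print(X_scaled)

# A transformer exposes transform, not predict.
print(hasattr(scaler, "transform"), hasattr(scaler, "predict"))
```

Each column of X_scaled has zero mean and unit variance, so here both columns end up containing only -1 and 1.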
ColumnTransformer: applies different transformations to different features (columns).
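A minimal sketch of ColumnTransformer, assuming a toy matrix whose first column is numeric and whose second column holds category codes. The column roles and the step names "num" and "cat" are made up for this illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (assumed): column 0 is a numeric feature, column 1 a category code.
X = np.array([[20.0, 0],
              [30.0, 1],
              [40.0, 0]])

ct = ColumnTransformer(
    [
        ("num", StandardScaler(), [0]),  # scale the numeric column
        ("cat", OneHotEncoder(), [1]),   # one-hot encode the categorical column
    ],
    sparse_threshold=0.0,  # always return a dense array
)
X_t = ct.fit_transform(X)
print(X_t.shape)  # 1 scaled column + 2 one-hot columns
```

The outputs of the individual transformers are concatenated column-wise, so the result has one scaled column followed by one column per category.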
4. Pipelines: chaining pre-processors and estimators
A pipeline offers the same API as a regular estimator: it can be fitted and used for prediction with fit and predict. Using a pipeline also helps prevent data leakage, that is, disclosing some of the test data in the training data.
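The following sketch chains a scaler and a classifier into one pipeline. The dataset (iris), the choice of LogisticRegression, and the train/test split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Chain a pre-processor and a final predictor into a single estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fitted on the training split only, so no statistics
# from the test split leak into training.
pipe.fit(X_train, y_train)

acc = accuracy_score(y_test, pipe.predict(X_test))
print(acc)
```

Calling pipe.predict applies the already fitted scaler to the new data before handing it to the classifier.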
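5. Model evaluation
Fitting a model to some data does not guarantee that it will predict well on unseen data, so this has to be evaluated directly. Scikit-learn provides many tools for model evaluation, in particular for cross-validation.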
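As one possible illustration of cross-validation, the sketch below runs 5-fold cross-validation with the cross_validate helper. The synthetic regression data and the choice of LinearRegression are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic, noise-free regression data (assumed for illustration).
X, y = make_regression(n_samples=1000, random_state=0)

lr = LinearRegression()
result = cross_validate(lr, X, y)  # defaults to 5-fold cross-validation
scores = result["test_score"]      # one R^2 score per fold
print(scores)
```

Because the data is generated without noise, a linear model fits it almost perfectly and every fold scores close to 1.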
6. Automatic parameter searches
All estimators have adjustable parameters (often called hyper-parameters). The generalization ability of an estimator often depends critically on a few of them, and the best values usually depend on the data at hand.
Scikit-learn provides tools to automatically find the best parameter combinations (via cross-validation).
Note: In practice, you almost always want to search over a pipeline rather than a single estimator. One of the main reasons is that if you apply a pre-processing step to the whole dataset without using a pipeline and then perform any kind of cross-validation, you break the fundamental assumption of independence between training and test data. Because you pre-processed the data using the whole dataset, some information about the test sets is available to the training sets. This leads to over-estimating the generalization ability of the estimator (you can read more in the Kaggle article).
Using a pipeline for cross-validation and searching will largely keep you from this common pitfall.
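A hedged sketch of searching over a pipeline rather than a single estimator is shown below. The dataset (diabetes), the Ridge model, the step names "scale" and "model", and the alpha grid are all assumptions chosen for illustration; GridSearchCV is one of several search tools scikit-learn offers.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-processing lives inside the pipeline, so it is re-fitted on the
# training part of every cross-validation fold.
pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
param_grid = {"model__alpha": alphas}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_alpha = search.best_params_["model__alpha"]
test_r2 = search.score(X_test, y_test)  # R^2 of the refitted best pipeline
print(best_alpha, test_r2)
```

After fitting, the search object behaves like a regular estimator: predict and score use the best pipeline refitted on the full training set.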
Copyright notice
This article was written by [Muxi dare]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220535577551.html