[Machine learning] An introduction to scikit-learn
2022-04-23 05:32:00 【Muxi dare】
1. A brief introduction to scikit-learn
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
2. Fitting and predicting: estimator basics
(1) Estimators
Scikit-learn provides built-in machine learning algorithms and models, collectively called estimators. Each estimator can be fitted to some data using its fit method.
(2) The fit method
The fit method usually takes two inputs: the sample matrix (or design matrix) X and the target values y. X has shape (n_samples, n_features), so samples are rows and features are columns. y holds real numbers for regression tasks, or integers (or any other discrete set of values) for classification. For unsupervised learning tasks, y does not need to be specified. y is usually a one-dimensional array whose i-th entry is the target of the i-th sample (row) of X. Both X and y are usually NumPy arrays or equivalent array-like data types.
Once the estimator is fitted, it can be used to predict target values for new data, with no need to retrain it.
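As a minimal sketch of fit followed by predict: the toy data and the choice of RandomForestClassifier are illustrative assumptions, not something the original post prescribes.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data (illustrative): 2 samples (rows), 3 features (columns)
X = [[1, 2, 3], [11, 12, 13]]
y = [0, 1]  # class label of each sample, y[i] belongs to row X[i]

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)  # learn from the training data

# Predict on the training data, then on new data, without refitting
pred_train = clf.predict(X)
pred_new = clf.predict([[4, 5, 6], [14, 15, 16]])
print(pred_train, pred_new)
```

The fitted object keeps what it learned, so predict can be called any number of times on new sample matrices with the same number of features.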
3. Transformers and pre-processors
A machine learning workflow is often composed of different parts. A typical pipeline consists of a pre-processing step that transforms or imputes the data, and a final predictor that predicts target values.
pipeline = a pre-processing step (transforms or imputes the data) + a final predictor/estimator (predicts target values)
In scikit-learn, pre-processors and transformers follow the same API as estimator objects (they all inherit from the same BaseEstimator class). Transformer objects have no predict method; instead they have a transform method that outputs a newly transformed sample matrix X.
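A small sketch of the transformer API, assuming StandardScaler as the example transformer and a made-up two-by-two matrix: fit learns the per-column mean and standard deviation, and transform returns the rescaled sample matrix.

```python
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative): two samples, two features on very different scales
X = [[0, 15],
     [1, -10]]

scaler = StandardScaler()
scaler.fit(X)                   # learn the mean and standard deviation of each column
X_scaled = scaler.transform(X)  # output the transformed sample matrix
print(X_scaled)
# [[-1.  1.]
#  [ 1. -1.]]
```

Each column now has zero mean and unit variance. The scaler has no predict method, only fit and transform (and the shortcut fit_transform).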
ColumnTransformer: applies different transformations to different features (columns).
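The following sketch shows one way ColumnTransformer might be used. The data is hypothetical (column 0 numeric, column 1 a categorical code), and the step names "num" and "cat" are arbitrary labels chosen for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: column 0 is numeric, column 1 is a categorical code (0, 1 or 2)
X = np.array([[1.0, 0],
              [2.0, 1],
              [3.0, 0],
              [4.0, 2]])

ct = ColumnTransformer(
    [
        ("num", StandardScaler(), [0]),  # scale the numeric column
        ("cat", OneHotEncoder(), [1]),   # one-hot encode the categorical column
    ],
    sparse_threshold=0,  # always return a dense array, easier to inspect
)
X_t = ct.fit_transform(X)
print(X_t.shape)  # 1 scaled column + 3 one-hot columns
```

The outputs of the individual transformers are concatenated column-wise, so the result here has four columns.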
4. Pipelines: chaining pre-processors and estimators
A pipeline offers the same API as a regular estimator: it can be fitted and used for prediction with fit and predict. Using a pipeline also helps prevent data leakage.
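A minimal sketch of a pipeline, assuming the Iris dataset, a StandardScaler pre-processing step and a LogisticRegression predictor (all chosen here for illustration only):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset and split it into training and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chain a pre-processing step and a final predictor into one object
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Fit the whole pipeline: the scaler is fitted on the training data only
pipe.fit(X_train, y_train)

# Use it like any other estimator
acc = accuracy_score(y_test, pipe.predict(X_test))
print(acc)
```

Because the scaler's statistics come only from X_train, nothing about the test set leaks into training.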
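5. Model evaluation
Fitting a model to some data does not mean it will predict well on unseen data, so the model needs to be evaluated directly. Scikit-learn offers many tools for this, in particular for cross-validation.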
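As a sketch of cross-validation, the example below uses the cross_validate helper on a synthetic regression problem. The dataset and the LinearRegression model are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic, noise-free regression data (illustrative)
X, y = make_regression(n_samples=1000, random_state=0)

# 5-fold cross-validation is the default
result = cross_validate(LinearRegression(), X, y)
scores = result["test_score"]  # one R^2 score per fold
print(scores)
```

Each fold is held out once as a test set while the model is fitted on the remaining folds, which gives several estimates of performance on unseen data instead of one.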
6. Automatic parameter searches
All estimators have tunable parameters (often called hyper-parameters). The generalization ability of an estimator often depends critically on a few of them. The best values usually depend on the data at hand, so they are hard to choose in advance.
Scikit-learn provides tools that automatically find the best parameter combination (via cross-validation).
Note: in practice, you almost always want to search over a pipeline rather than a single estimator. One of the main reasons is that if you apply a pre-processing step to the whole dataset without a pipeline and then perform any kind of cross-validation, you break the fundamental assumption of independence between training and test data. Because the data was pre-processed using the whole dataset, some information about the test sets is available to the training sets. This leads to overestimating the generalization ability of the estimator (a Kaggle article discusses this in more detail).
Using a pipeline for cross-validation and searching largely keeps you out of this common pitfall.
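A sketch of searching over a pipeline rather than a single estimator. GridSearchCV, the Iris dataset and the candidate values for C are assumptions picked for this example; the parameter name "logisticregression__C" follows the step naming that make_pipeline generates.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler lives inside the pipeline, so it is refitted on the
# training part of every cross-validation split
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Candidate values (illustrative) for the regularization strength C
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_score = search.score(X_test, y_test)  # accuracy of the refitted best pipeline
print(best_params, test_score)
```

After fitting, the search object behaves like a normal estimator that uses the best parameter combination found, so it can be scored or used for prediction directly.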
Copyright notice
This article was written by [Muxi dare]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220535577551.html