
Deep learning -- Summary of Feature Engineering

2022-04-23 19:25:00 Try not to lie flat

For machine learning, the general steps are:

Data collection — data cleaning — feature engineering — data modeling

As we know, feature engineering includes feature construction, feature extraction, and feature selection. In essence, feature engineering is the process of transforming raw data into the data a model is trained on.

Feature construction

Another blogger's explanation of normalization: https://zhuanlan.zhihu.com/p/424518359

      In feature construction, we start with a pile of raw, messy data. The first step is to normalize it so that it follows the distribution we want to work with. After normalization comes data preprocessing, in particular handling missing values, categorical features, and continuous features.
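To make these preprocessing steps concrete, here is a minimal sketch with pandas and scikit-learn; the toy DataFrame and its column names ("age", "city") are assumptions made up for illustration, not data from this post.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data: "age" is a continuous feature (with a missing value), "city" is categorical.
df = pd.DataFrame({"age": [23.0, None, 35.0, 29.0],
                   "city": ["beijing", "shanghai", "beijing", "shenzhen"]})

# Missing values: fill the continuous column with its mean.
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical feature: one-hot encode "city".
df = pd.get_dummies(df, columns=["city"])

# Continuous feature: standardize "age" to zero mean and unit variance.
df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()
print(df)
```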

Data normalization methods: min-max normalization and Z-score standardization.

So what is the biggest difference between them? Whether they change the distribution of the feature data.

Min-max normalization: changes the distribution of the feature data

Formula: X_norm = (X - min) / (max - min)

Z-score standardization: does not change the distribution of the feature data

Formula: X' = (X - μ) / σ

Min-max normalization:

  • A linear transformation maps the original data into the range [0, 1]; the result of the formula above is the normalized value, where X is the original value
  • This normalization method is better suited to cases where the values are fairly concentrated
  • Drawback: if max and min are unstable, the normalized results are also unstable, which makes downstream results unstable; empirical constants can be used in place of max and min
  • Application scenarios: when distance measures or covariance calculations are involved, or when the data does not follow a normal distribution, you can use this method or other normalization methods (excluding the Z-score method). For example, in image processing, converting an RGB image to grayscale limits its values to the range [0, 255]. (A small code sketch follows this list.)
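As a minimal sketch of the formula above, assuming a small made-up feature matrix X (the MinMaxScaler comment is just an equivalent library option, not something the post itself uses):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [4.0, 400.0]])  # made-up feature matrix, one column per feature

# Min-max normalization per column: (X - min) / (max - min) maps each feature into [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Roughly equivalent with scikit-learn:
# from sklearn.preprocessing import MinMaxScaler
# X_norm = MinMaxScaler().fit_transform(X)
```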

Z-score standardization:

  • Here, μ and σ are the mean and standard deviation of the original data set
  • It normalizes the original data set into one with mean 0 and variance 1
  • This method assumes the original data is approximately Gaussian; otherwise the standardization works poorly
  • Application scenarios: in classification and clustering algorithms, when distance is used to measure similarity, or when PCA is used for dimensionality reduction, Z-score standardization performs better. (A small code sketch follows this list.)
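And the matching sketch for Z-score standardization, on the same made-up matrix (np.std here is the population standard deviation; some libraries use the sample version, which differs slightly):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [4.0, 400.0]])  # made-up feature matrix

# Z-score standardization per column: (X - mean) / std gives mean 0 and variance 1.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma
```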

Feature extraction

      For feature extraction, we first looked at how data is partitioned: what the data sets are, and, given a pile of data, how to split it. There is also the important dimensionality-reduction method PCA; other methods exist, such as ICA, but since they are not on my final exam I won't record them in detail, haha.

      Data sets: training set, validation set, test set

  • Training set: the data used for training; it is used to adjust the model parameters and train the model weights, building the machine learning model
  • Validation set: data split off from the training set, used to check the model's performance during training and serve as an evaluation metric
  • Test set: new data that was not used for training, fed into the trained model to verify how good it is

      Split methods: hold-out method, K-fold cross validation

  • Hold-out method: split the data set into mutually exclusive sets, keeping the data distribution of the splits consistent
  • K-fold cross validation: split the data set into K mutually exclusive subsets of similar size, keeping their data distributions consistent (see the sketch after this list)
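A minimal scikit-learn sketch of both split methods; the toy X and y arrays and the 80/20 and K = 5 choices are assumptions for illustration only:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.arange(20).reshape(10, 2)  # made-up features
y = np.arange(10)                 # made-up labels

# Hold-out method: one mutually exclusive split, e.g. 80% train / 20% test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# K-fold cross validation: K mutually exclusive folds of similar size;
# each fold takes a turn as the validation set.
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr, X_val = X[train_idx], X[val_idx]
    # train on (X_tr, y[train_idx]), evaluate on (X_val, y[val_idx]) ...
```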

      To turn the original data into features with clear physical or statistical meaning, new features need to be constructed; the methods usually used are PCA, ICA, LDA, and so on.

      So why do we need to reduce the feature dimensionality?

  • Eliminate noise
  • Data compression
  • Eliminate data redundancy
  • Improve the accuracy of the algorithm
  • Reduce the data to 2 or 3 dimensions so that it can still be visualized

     PCA (principal component analysis): transforms the coordinate axes to find the optimal subspace of the data distribution


  • Take the original data with shape (m, n); it lies in the n-dimensional space spanned by the original n feature vectors
  • Decide the dimensionality after reduction: K
  • Through some transformation (a matrix decomposition), find n new eigenvectors and the new n-dimensional space V*
  • Compute the values of the original data on the n new eigenvectors of the new feature space V*, i.e. map the data into the new space
  • Keep the top K most informative eigenvectors, drop the rest, and the n-dimensional space is reduced to K dimensions (a NumPy sketch follows this list)
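The steps above can be followed almost literally in NumPy. The sketch below is an illustration under the assumption of a random (m, n) = (100, 5) data matrix and K = 2; it is not the post's own code:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))  # made-up data of shape (m, n)
K = 2                                               # target dimensionality

# Center the data, then decompose the covariance matrix of the n-dimensional space.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # eigh returns eigenvalues in ascending order

# Keep the K eigenvectors with the largest eigenvalues (the most informative directions).
top_k = eigvecs[:, np.argsort(eigvals)[::-1][:K]]

# Map the original data into the new K-dimensional space.
X_reduced = X_centered @ top_k                      # shape (m, K)
```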


      For feature selection, there are several approaches: filter, wrapper, and embedded methods (a general understanding is enough).
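Just for orientation, here is a minimal filter-style selection sketch with scikit-learn; the iris data, SelectKBest, the f_classif score and k = 2 are arbitrary choices for the example, not something the post prescribes:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature independently of any model, then keep the best 2.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)
print(X_selected.shape)  # (150, 2)
```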

      Finally, let's look at the difference between hyperparameters and parameters:

  • Hyperparameters: parameters set before training the model, chosen by hand, such as padding, stride, the k in k-means, network depth, the number and size of convolution kernels, and the learning rate
  • Parameters: values obtained through model training, such as the weight w and the bias b in wx + b (see the sketch below)
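A small illustration of the distinction using scikit-learn's KMeans; the iris data and k = 3 are arbitrary choices for the example:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

# Hyperparameter: k (n_clusters) is chosen by hand before training.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Parameters: the cluster centers are learned from the data during training.
print(model.cluster_centers_)
```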













Copyright notice
This article was written by [Try not to lie flat]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231859372120.html