Interview Summary: Feature Engineering
2022-04-23 13:16:00 【DCGJ666】
- What feature engineering covers
- How to handle missing values
- Handling class imbalance
- Why NaN appears
- Feature screening: how to find and remove highly similar features
- How to handle data with millions or hundreds of millions of features in deep learning
- What are the methods to calculate the correlation between features ?
What feature engineering covers
- Data preprocessing
1. Handling missing values
2. Image data augmentation
3. Handling outliers
4. Handling class imbalance
- Feature scaling
1. Normalization
2. Regularization
- Feature encoding
1. Ordinal encoding
2. One-hot encoding
3. Binary encoding
4. Discretization
- Feature selection
1. Filter methods: features are selected from the dataset first, independently of the downstream learner. Statistics are designed to score and filter the features without considering the learner, for example variance thresholding, the chi-square test, and mutual information.
2. Wrapper methods: the performance of the downstream learner is used directly as the evaluation criterion for a feature subset, for example the Las Vegas Wrapper (LVW) algorithm.
3. Embedded methods: the learner selects features itself during training, for example penalty-based selection or tree-based selection (GBDT).
- Feature extraction
1. Dimensionality reduction
2. Image feature extraction
3. Text feature extraction
- Feature construction
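The scaling and encoding items in the outline above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the choice of min-max scaling for "normalization", and the convention that category IDs start at 1 are all assumptions made for the example.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def ordinal_encode(values, order):
    """Map each category to its rank in an explicit ordering (1-based)."""
    rank = {cat: i + 1 for i, cat in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Map each category to a 0/1 vector containing a single 1."""
    cats = sorted(set(values))
    index = {cat: i for i, cat in enumerate(cats)}
    return [[1 if index[v] == j else 0 for j in range(len(cats))] for v in values]


def binary_encode(values):
    """Give each category an integer ID, then write that ID in binary digits."""
    cats = sorted(set(values))
    index = {cat: i + 1 for i, cat in enumerate(cats)}  # assumed: IDs start at 1
    width = max(index.values()).bit_length()
    return [[int(bit) for bit in format(index[v], f"0{width}b")] for v in values]


def discretize(values, edges):
    """Replace a continuous value with the index of the bin it falls into."""
    return [sum(v >= e for e in edges) for v in values]
```

One-hot encoding needs one column per category, while binary encoding needs only about log2 of the number of categories, which is why it is sometimes preferred for high-cardinality features.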
How to handle missing values
- Use the feature with missing values directly: worth trying when only a small number of samples lack the feature.
- Delete the feature: generally appropriate when most samples lack the feature and only a few valid values remain.
- Impute the missing values. There are many imputation methods:
Simple fills: mean, mode, median, a fixed value, manual filling, nearest-neighbor filling
Model-based prediction: regression, decision tree
High-dimensional mapping, compressed sensing
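As a rough sketch of the simple fills listed above, the following standard-library code imputes one column with the mean, median, mode, or a fixed value. The names `is_missing`, `missing_ratio`, and `impute` are hypothetical, and treating both `None` and NaN as "missing" is an assumption of this example.

```python
import math
import statistics


def is_missing(v):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def missing_ratio(column):
    """Fraction of missing entries; a high ratio suggests dropping the feature."""
    return sum(is_missing(v) for v in column) / len(column)


def impute(column, strategy="mean", fill_value=None):
    """Fill missing entries using a statistic of the observed values."""
    observed = [v for v in column if not is_missing(v)]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in column]
```

For example, `impute([1.0, None, 3.0])` fills the gap with the mean of the observed values. The median is usually the safer choice when the column contains outliers.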
Handling class imbalance
- Enlarge the dataset
- Try other evaluation metrics
- Resample the dataset
- Sample from the minority class to increase its number of samples: oversampling (over-sampling; more samples are drawn than the class originally contains)
- Sample from the majority class to reduce its number of samples: undersampling (under-sampling; fewer samples are drawn than the class originally contains)
- Try different classification algorithms: for example, decision trees often perform well on class-imbalanced data
- Penalize the model: for example, if the task is to identify the minority classes, increase the weight of minority-class samples in the classifier and decrease the weight of majority-class samples; focal loss is a related option
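The resampling and reweighting ideas above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, resampling is plain random sampling (no synthetic samples), and `focal_loss` is the binary form for a single sample with the commonly used defaults `gamma=2.0` and `alpha=0.25`.

```python
import math
import random


def _group_by_class(samples, labels):
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    return out


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of larger classes so all classes match the smallest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    return out


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample: p is the predicted P(class 1), y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The factor `(1 - p_t) ** gamma` shrinks the loss of samples the model already classifies confidently, so training focuses on hard samples. Resampling should be applied to the training split only, never to the evaluation data.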
Why NaN appears
- NaN means "not a number". Operations such as 0/0, Inf/Inf, Inf-Inf, and Inf*0 have indeterminate results, so they yield NaN.
- Data processing: in real projects data is often missing or incomplete, and the missing entries can be set to NaN.
- Reading data: a field that cannot be parsed as a number is treated as NaN.
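The cases above are easy to reproduce. The sketch below uses only Python's `math` module; `indeterminate_forms` and `to_float_or_nan` are hypothetical helper names. Note that in plain Python `0.0 / 0.0` raises `ZeroDivisionError` instead of returning NaN, so that case is left out here.

```python
import math


def indeterminate_forms():
    """Evaluate indeterminate floating-point expressions; each result is NaN."""
    inf = math.inf
    return {
        "Inf/Inf": inf / inf,
        "Inf-Inf": inf - inf,
        "Inf*0": inf * 0.0,
    }


def to_float_or_nan(text):
    """Parse a field as a float; fields that are not numeric become NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan
```

Because NaN compares unequal to everything, including itself, checks should use `math.isnan(x)` rather than `x == math.nan`.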
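Feature screening: how to find and remove highly similar features
Use a filter method for feature selection: the variance threshold method or the correlation coefficient method. The latter computes the correlation between pairs of features and drops one feature from each pair whose correlation is too high.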
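A minimal sketch of both filters, assuming features are stored as a dict from name to a list of values. The function names and the thresholds `min_variance=0.0` and `max_abs_corr=0.95` are illustrative choices, and the greedy "keep the first, drop later look-alikes" order is one possible policy among several.

```python
import math


def variance(col):
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_features(columns, min_variance=0.0, max_abs_corr=0.95):
    """Return the names of features that survive both filters, in input order."""
    kept = []
    for name, col in columns.items():
        if variance(col) <= min_variance:
            continue  # (near-)constant feature: carries no information
        if any(abs(pearson(col, columns[k])) > max_abs_corr for k in kept):
            continue  # too similar to a feature that was already kept
        kept.append(name)
    return kept
```

With many features the pairwise comparison grows quadratically, so in practice it is often run on a sample of rows or after a cheaper first pass such as the variance filter.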
How to handle data with millions or hundreds of millions of features in deep learning
With many features and few samples, the model overfits easily. Options:
- Dimensionality reduction: PCA or LDA
- Regularization: L1 or L2
- Sample augmentation
- Feature selection: remove unimportant features
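To illustrate why L1 and L2 regularization behave differently on high-dimensional data, here is a small sketch of the two penalty terms and of the shrinkage step each one induces on a weight vector. The function names are hypothetical, and the shrinkage functions are the standard proximal steps for these penalties, shown in isolation rather than inside a full training loop.

```python
import math


def l1_penalty(weights, lam):
    """L1 term added to the loss: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


def l1_shrink(weights, lam):
    """Soft-thresholding: move each weight toward 0 by lam; small weights become exactly 0."""
    return [math.copysign(max(abs(w) - lam, 0.0), w) for w in weights]


def l2_shrink(weights, lam):
    """Shrinkage for lam * sum(w^2): every weight is scaled down but stays nonzero."""
    return [w / (1.0 + 2.0 * lam) for w in weights]
```

The L1 step sets weights with magnitude below `lam` to exactly zero, which effectively removes those features, while the L2 step only makes all weights smaller.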
What are the methods to calculate the correlation between features ?
- Pearson correlation coefficient: applies to interval-scale continuous variables and measures their linear correlation; its value lies between -1 and 1.
- Spearman rank correlation coefficient: measures the statistical dependence between two variables by assessing how well a monotonic function describes their relationship.
- Kendall correlation coefficient: a statistic that measures the correlation between two random variables based on how well their rankings agree.
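The three coefficients can be written out directly, which makes their differences concrete. This is a standard-library sketch with hypothetical function names; the Kendall version is tau-a, which applies no correction for ties, and the quadratic pair loop is only suitable for small inputs.

```python
import math


def pearson(x, y):
    """Linear correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total pairs, without tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)
```

For a monotonic but nonlinear relationship such as `y = x ** 2` on positive `x`, the Spearman and Kendall coefficients reach 1 while the Pearson coefficient stays below 1, because only Pearson assumes linearity.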
Copyright notice
This article was written by [DCGJ666]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230611343376.html