Interview Summary: Feature Engineering
2022-04-23 13:16:00 【DCGJ666】
- What feature engineering includes
- How to handle missing values
- Handling class imbalance
- Why NaN appears
- How to find and remove highly similar features during feature screening
- How to handle data with millions or hundreds of millions of features in deep learning
- What methods are there for calculating the correlation between features?
What feature engineering includes
- Data preprocessing
1. Handling missing values
2. Image data augmentation
3. Handling outliers
4. Handling class imbalance
- Feature scaling
1. Normalization
2. Regularization
- Feature encoding
1. Ordinal encoding
2. One-hot encoding
3. Binary encoding
4. Discretization
- Feature selection
1. Filter: features are selected from the dataset first, independently of the downstream learner. Statistics are designed to filter the features without considering the learner, for example variance selection, the chi-square test, and mutual information.
2. Wrapper: the performance of the downstream learner is used as the criterion for evaluating a feature subset. Example: the Las Vegas Wrapper (LVW) algorithm.
3. Embedded: the learner selects features itself during training. Examples: selection based on penalty terms, and tree-based selection such as GBDT.
- Feature extraction
1. Dimensionality reduction
2. Image feature extraction
3. Text feature extraction
- Feature construction
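As a small illustration of two items from the list above, the following sketch shows min-max normalization (feature scaling) and one-hot encoding (feature encoding) in plain Python. The function names and the sample data are made up for this example; in practice a library such as scikit-learn would normally be used.

```python
def min_max_scale(values):
    """Rescale a numeric feature to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Map each category to a 0/1 vector that contains a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical sample data.
ages = [20, 30, 40]
colors = ["red", "green", "red"]

scaled_ages = min_max_scale(ages)                      # [0.0, 0.5, 1.0]
color_categories, color_vectors = one_hot_encode(colors)
# color_categories == ["green", "red"]
# color_vectors == [[0, 1], [1, 0], [0, 1]]
```

One-hot encoding avoids implying an order between categories, which ordinal encoding does imply.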
How to handle missing values
- Use the feature with missing values directly: worth trying when only a small number of samples lack the feature.
- Delete the feature with missing values: generally suitable when most samples lack the feature and only a small number of valid values remain.
- Impute the missing values. There are many imputation methods:
1. Simple filling: mean, mode, median, a fixed value, manual filling, nearest-neighbor imputation
2. Model-based prediction: regression, decision trees
3. High-dimensional mapping, compressed sensing
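The simple filling strategies (mean, median, mode, fixed value) can be sketched as follows. This is a minimal illustration, not a reference implementation: the `impute` helper, its `strategy` names, and the sample data are assumptions made for this example.

```python
import math
from statistics import mean, median, mode


def impute(values, strategy="mean", fill_value=0.0):
    """Replace None/NaN entries using a statistic of the observed values."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]


# Hypothetical feature column with two missing entries.
incomes = [3.0, None, 5.0, 5.0, float("nan"), 11.0]

mean_filled = impute(incomes, "mean")      # missing entries become 6.0
median_filled = impute(incomes, "median")  # missing entries become 5.0
mode_filled = impute(incomes, "mode")      # missing entries become 5.0
```

The median and mode are less sensitive to outliers than the mean, which matters here because the value 11.0 pulls the mean up to 6.0.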
Handling class imbalance
- Enlarge the dataset
- Try other evaluation metrics
- Resample the dataset
1. Oversampling (over-sampling): sample the minority class to increase its number of samples; the number of samples drawn is greater than the number of samples originally in that class.
2. Undersampling (under-sampling): sample the majority class to reduce its number of samples; the number of samples drawn is less than the number of samples originally in that class.
- Try different classification algorithms: for example, decision trees often perform well on class-imbalanced data.
- Penalize the model: for example, if the task is to identify the minority classes, increase the weight of minority-class samples in the classifier and reduce the weight of majority-class samples. Focal loss is a related technique.
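Two of the ideas above, random oversampling and focal loss, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: `random_oversample` and the sample data are hypothetical names for this example, and the focal loss follows the commonly cited binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with `gamma=2.0` and `alpha=0.25` chosen here as default values.

```python
import math
import random


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical imbalanced dataset: four samples of class 0, one of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_balanced, y_balanced = random_oversample(X, y)

# A well-classified example contributes far less loss than a badly classified one.
easy_loss = focal_loss(0.95, 1)
hard_loss = focal_loss(0.10, 1)
```

Random oversampling only duplicates existing points, so it can encourage overfitting to the minority samples; the factor (1 - p_t)^gamma in focal loss instead shifts the training signal toward hard examples without changing the data.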
Why NaN appears
- NaN means "not a number". Operations such as 0/0, Inf/Inf, Inf-Inf, and Inf*0 have undefined results, so they yield NaN.
- Data processing: in real projects data is often missing or incomplete, and the missing entries can be set to NaN.
- Data reading: when a field that should be numeric contains non-numeric characters, it is treated as NaN.
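The cases above can be reproduced with ordinary Python floats. The `parse_or_nan` helper is a hypothetical name used only for this sketch.

```python
import math

inf = float("inf")

# Indeterminate forms on IEEE 754 floats produce NaN.
results = {
    "inf - inf": inf - inf,
    "inf * 0": inf * 0.0,
    "inf / inf": inf / inf,
}
# Note: plain Python raises ZeroDivisionError for 0.0 / 0.0 instead of
# returning NaN; NumPy floats return NaN with a runtime warning.
all_nan = all(math.isnan(v) for v in results.values())

# NaN is not equal to anything, including itself, so test with math.isnan.
nan = float("nan")
nan_equals_itself = (nan == nan)  # False


def parse_or_nan(text):
    """Parse a numeric field; return NaN when the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return float("nan")


parsed = [parse_or_nan(t) for t in ["1.5", "abc", ""]]
```

Because `nan == nan` is false, checks such as `value == float("nan")` never succeed; `math.isnan` is the reliable test.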
How to find and remove highly similar features during feature screening
Feature selection with filter methods: use the variance selection method or the correlation coefficient method.
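A minimal sketch of both filters combined: drop near-constant features by variance, then greedily drop any feature whose absolute Pearson correlation with an already kept feature exceeds a threshold. The function `filter_features`, the thresholds `min_variance` and `max_abs_corr`, and the sample data are assumptions for illustration.

```python
import math


def variance(col):
    """Population variance of one feature column."""
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)


def pearson(a, b):
    """Pearson correlation coefficient of two equally long columns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm


def filter_features(columns, min_variance=1e-8, max_abs_corr=0.95):
    """columns: dict of feature name -> list of values. Returns the names to keep."""
    # Step 1: variance selection drops (near-)constant features.
    candidates = [n for n, c in columns.items() if variance(c) > min_variance]
    # Step 2: keep a feature only if it is not highly correlated with a kept one.
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: height_in is almost a rescaled copy of height_cm.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "constant": [1.0, 1.0, 1.0, 1.0],
    "weight_kg": [70.0, 52.0, 75.0, 60.0],
}
selected = filter_features(data)  # ["height_cm", "weight_kg"]
```

The greedy pass keeps whichever of two correlated features comes first, so the column order decides which duplicate survives.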
How to handle data with millions or hundreds of millions of features in deep learning
With many features and little data, the model overfits easily.
- Dimensionality reduction: PCA or LDA
- Regularization: L1 or L2
- Sample augmentation
- Feature selection: remove unimportant features
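One way to see why L1 regularization also acts as feature selection while L2 does not is to compare the shrinkage (proximal) step each penalty induces on a single weight. The sketch below is only an illustration of that difference, not a training procedure; the function names, the weights, and the strength `0.5` are made up for this example.

```python
def l1_shrink(w, lam):
    """Soft-thresholding: the proximal step for the penalty lam * |w|."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set exactly to zero


def l2_shrink(w, lam):
    """Proximal step for the penalty (lam / 2) * w**2: scale toward zero."""
    return w / (1.0 + lam)


# Hypothetical weights of four features.
weights = [2.0, -0.3, 0.05, -1.5]

l1_weights = [l1_shrink(w, 0.5) for w in weights]  # [1.5, 0.0, 0.0, -1.0]
l2_weights = [l2_shrink(w, 0.5) for w in weights]  # all shrunk, none exactly zero
```

Features whose weight becomes exactly zero under L1 can be dropped, which reduces the effective number of features; L2 keeps every feature but limits the size of its weight.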
What methods are there for calculating the correlation between features?
- Pearson correlation coefficient: applies to interval-scale continuous variables and measures linear correlation. Its value lies between -1 and 1.
- Spearman rank correlation coefficient: a measure of the statistical dependence between two variables. It assesses how well a monotonic function describes their relationship.
- Kendall correlation coefficient: a statistic that measures the rank correlation between two random variables.
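The three coefficients can be sketched in plain Python as follows. This is a minimal illustration: the Kendall version is tau-a and assumes no tied values, and the sample data (a monotonic but non-linear relationship) is made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

r_pearson = pearson(x, y)      # below 1 because the relationship is not linear
r_spearman = spearman(x, y)    # approximately 1 because the ranks agree perfectly
r_kendall = kendall_tau(x, y)  # 1.0 because every pair is concordant
```

The example shows the practical difference: Pearson penalizes the curvature of the relationship, while the two rank-based coefficients only look at ordering.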
Copyright notice
This article was written by [DCGJ666]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230611343376.html