How to create DataFrame with feature importance from XGBClassifier made by GridSearchCV?
2022-08-09 15:20:00 【AI悦创】
I use GridSearchCV from scikit-learn to find the best parameters for my XGBClassifier model, with code like this:
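import xgboost as xgb
from sklearn.model_selection import GridSearchCV

grid_params = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}
# the scikit-learn wrapper class is XGBClassifier (xgb.Classifier does not exist)
est = xgb.XGBClassifier()
grid_xgb = GridSearchCV(param_grid=grid_params,
                        estimator=est,
                        scoring='roc_auc',
                        cv=4,
                        verbose=0)
grid_xgb.fit(X_train, y_train)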
print('best estimator:', grid_xgb.best_estimator_)
print('best AUC:', grid_xgb.best_score_)
print('best parameters:', grid_xgb.best_params_)
I need a feature importance DataFrame listing my variables and their importance, something like this:
variable | importance
---------|-------
x1 | 12.456
x2 | 3.4509
x3 | 1.4456
... | ...
How can I build the above DataFrame from my XGBClassifier fitted with GridSearchCV?
I tried something like this:
f_imp_xgb = grid_xgb.get_booster().get_score(importance_type='gain')
keys = list(f_imp_xgb.keys())
values = list(f_imp_xgb.values())
df_f_imp_xgb = pd.DataFrame(data = values, index = keys, columns = ['score']).sort_values(by='score', ascending = False)
But I get this error:
AttributeError: 'GridSearchCV' object has no attribute 'get_booster'
What can I do?
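GridSearchCV itself has no get_booster method. With the default refit=True, the best model, refitted on the full training data, is stored in its best_estimator_ attribute, so you can use: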
clf.best_estimator_.get_booster().get_score(importance_type='gain')
Example:
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
np.random.seed(42)
# generate some dummy data
df = pd.DataFrame(data=np.random.normal(loc=0, scale=1, size=(100, 3)), columns=['x1', 'x2', 'x3'])
df['y'] = np.where(df.mean(axis=1) > 0, 1, 0)
# find the best model
X = df.drop(labels=['y'], axis=1)
y = df['y']
parameters = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}
clf = GridSearchCV(
    param_grid=parameters,
    estimator=XGBClassifier(random_state=42),
    scoring='roc_auc',
    cv=4,
    verbose=0
)
clf.fit(X, y)
# get the feature importances
importances = clf.best_estimator_.get_booster().get_score(importance_type='gain')
importances = pd.DataFrame(importances, index=[0]).transpose().rename(columns={0: 'importance'})
print(importances)
# importance
# x1 1.782590
# x2 1.420949
# x3 1.500457
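To get the exact "variable | importance" layout from the question, sorted from most to least important, the score dict can be reshaped with pandas alone. The sketch below is one possible way to do it, not part of the original answer: importance_frame is a hypothetical helper name, and example_scores is made-up data standing in for the dict returned by clf.best_estimator_.get_booster().get_score(importance_type='gain'). It also assumes you may want to pass list(X.columns) as feature_names, since features that are never used in a split are, as far as I know, left out of the get_score result.

```python
import pandas as pd


def importance_frame(scores, feature_names=None):
    """Build a 'variable | importance' DataFrame from a score dict.

    scores: dict mapping feature name -> importance, assumed to come from
        clf.best_estimator_.get_booster().get_score(importance_type='gain').
    feature_names: optional list of all feature names (e.g. list(X.columns));
        features missing from `scores` are added with importance 0.0.
    """
    series = pd.Series(scores, dtype=float)
    if feature_names is not None:
        # add features the booster never split on, with zero importance
        series = series.reindex(feature_names, fill_value=0.0)
    return (
        series.rename_axis('variable')
        .reset_index(name='importance')
        .sort_values(by='importance', ascending=False)
        .reset_index(drop=True)
    )


# Hypothetical scores for illustration only (not real model output)
example_scores = {'x1': 1.78, 'x2': 1.42, 'x3': 1.50}
df_importance = importance_frame(
    example_scores, feature_names=['x1', 'x2', 'x3', 'x4']
)
print(df_importance)
```

With a fitted search object, the call would look like importance_frame(clf.best_estimator_.get_booster().get_score(importance_type='gain'), list(X.columns)).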