How to create DataFrame with feature importance from XGBClassifier made by GridSearchCV?
2022-08-09 15:20:00 【AI悦创】
I use scikit-learn's GridSearchCV to find the best parameters for my XGBClassifier model, with code like this:
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# hyperparameter grid to search over
grid_params = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}
# the scikit-learn wrapper class is xgb.XGBClassifier (xgb.Classifier does not exist)
est = xgb.XGBClassifier()
grid_xgb = GridSearchCV(param_grid=grid_params,
                        estimator=est,
                        scoring='roc_auc',
                        cv=4,
                        verbose=0)
grid_xgb.fit(X_train, y_train)
print('best estimator:', grid_xgb.best_estimator_)
print('best AUC:', grid_xgb.best_score_)
print('best parameters:', grid_xgb.best_params_)
I need a feature importance DataFrame that lists my variables and their importance, something like this:
variable | importance
---------|-------
x1 | 12.456
x2 | 3.4509
x3 | 1.4456
... | ...
How can I build the DataFrame above from the XGBClassifier fitted through GridSearchCV?
I tried something like this:
f_imp_xgb = grid_xgb.get_booster().get_score(importance_type='gain')
keys = list(f_imp_xgb.keys())
values = list(f_imp_xgb.values())
df_f_imp_xgb = pd.DataFrame(data = values, index = keys, columns = ['score']).sort_values(by='score', ascending = False)
But I get this error:
AttributeError: 'GridSearchCV' object has no attribute 'get_booster'
What can I do?
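GridSearchCV is only a wrapper around the model, so it has no get_booster method of its own. After fitting, the refitted best model is stored in its best_estimator_ attribute, so call get_booster on that (clf below is the fitted GridSearchCV object, grid_xgb in the question):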
clf.best_estimator_.get_booster().get_score(importance_type='gain')
Example:
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
np.random.seed(42)
# generate some dummy data
df = pd.DataFrame(data=np.random.normal(loc=0, scale=1, size=(100, 3)), columns=['x1', 'x2', 'x3'])
df['y'] = np.where(df.mean(axis=1) > 0, 1, 0)
# find the best model
X = df.drop(labels=['y'], axis=1)
y = df['y']
parameters = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}
clf = GridSearchCV(
    param_grid=parameters,
    estimator=XGBClassifier(random_state=42),
    scoring='roc_auc',
    cv=4,
    verbose=0
)
clf.fit(X, y)
# get the feature importances
importances = clf.best_estimator_.get_booster().get_score(importance_type='gain')
importances = pd.DataFrame(importances, index=[0]).transpose().rename(columns={0: 'importance'})
print(importances)
# importance
# x1 1.782590
# x2 1.420949
# x3 1.500457
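The result above keeps the feature names in the index. If the two-column, sorted layout from the question (variable | importance) is wanted, one possible sketch is shown below. The helper name importance_frame and the extra feature x4 are hypothetical, and the scores dict simply reuses the values printed above in place of a real get_score(importance_type='gain') call. Because get_score only reports features that were used in at least one split, the helper fills any missing feature with 0.

```python
import pandas as pd


def importance_frame(scores, feature_names):
    """Build a sorted two-column DataFrame from a {feature: score} mapping."""
    series = pd.Series(scores, dtype="float64")
    # Features never used in a split are absent from the mapping; give them 0.
    series = series.reindex(feature_names, fill_value=0.0)
    return (
        series.rename("importance")
        .rename_axis("variable")
        .reset_index()
        .sort_values(by="importance", ascending=False, ignore_index=True)
    )


# Stand-in for clf.best_estimator_.get_booster().get_score(importance_type='gain').
scores = {"x1": 1.782590, "x2": 1.420949, "x3": 1.500457}

# x4 is a hypothetical feature that the booster never split on.
df_imp = importance_frame(scores, ["x1", "x2", "x3", "x4"])
print(df_imp)
```

With a fitted search, the second argument would typically be list(X.columns), so every training column shows up in the table even when its importance is zero.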