Overview
We previously walked through a CatBoost training example. This time we cover a CatBoost hyperparameter-tuning example, using the Bayesian optimization approach that is popular in industry.
1. Import dependencies and load the data
import pandas as pd
import numpy as np
from catboost import CatBoostClassifier, CatBoost, Pool, cv
from bayes_opt import BayesianOptimization
data_train = pd.read_csv('data/训练集.csv')
data_val = pd.read_csv('data/验证集.csv')
data_test = pd.read_csv('data/测试集.csv')
2. Load the feature list and prepare the data
name_list = pd.read_csv('特征列表_20190705.txt', header=None, index_col=0)
my_feature_names = list(name_list.transpose())
len(my_feature_names)
data_train_X = data_train[my_feature_names]
data_val_X = data_val[my_feature_names]
data_test_X = data_test[my_feature_names]
data_train_y = data_train['label']
data_val_y = data_val['label']
data_test_y = data_test['label']
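The feature-list trick above is easy to miss: reading with `header=None, index_col=0` puts the feature names into the DataFrame's index, and transposing moves them into the columns, so `list()` yields them as plain strings. A minimal self-contained sketch, using an in-memory file in place of `特征列表_20190705.txt` (the feature names here are made up for illustration):

```python
import io
import pandas as pd

# Simulate a feature-list file with one feature name per line
feature_file = io.StringIO("age\nincome\nn_orders\n")

# header=None, index_col=0: the names land in the DataFrame's index
name_list = pd.read_csv(feature_file, header=None, index_col=0)

# Transposing turns the index into columns; iterating a DataFrame
# yields its column labels, so list() recovers the feature names
my_feature_names = list(name_list.transpose())
print(my_feature_names)  # ['age', 'income', 'n_orders']
```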
3. Bayesian hyperparameter tuning
def cat_train(bagging_temperature, reg_lambda, learning_rate):
    params = {
        'iterations': 800,
        'depth': 3,
        'bagging_temperature': bagging_temperature,
        'reg_lambda': reg_lambda,
        'learning_rate': learning_rate,
        'loss_function': 'Logloss',
        'eval_metric': 'AUC',
        'random_seed': 696,
        'verbose': 30
    }
    model = CatBoost(params)
    # Evaluate on the validation set; the evaluation metric is AUC
    model.fit(data_train_X, data_train_y, eval_set=(data_val_X, data_val_y), plot=False, early_stopping_rounds=20)
    print(params)
    score_max = model.best_score_.get('validation').get('AUC')
    return score_max
cat_opt = BayesianOptimization(cat_train,
    {
        'bagging_temperature': (1, 50),
        'reg_lambda': (1, 200),
        'learning_rate': (0.05, 0.2)
    })
cat_opt.maximize(n_iter=15, init_points=5)
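Under the hood, `maximize` alternates between proposing candidate points within the bounds and evaluating the objective on them, using the scores seen so far to decide where to look next. The sketch below imitates only that outer loop with uniform random sampling over the same bounds, against a toy objective standing in for `cat_train`; the surrogate model and acquisition function are omitted, and none of these names are part of the bayes_opt API:

```python
import random

# Same search bounds as in the BayesianOptimization call above
bounds = {
    'bagging_temperature': (1, 50),
    'reg_lambda': (1, 200),
    'learning_rate': (0.05, 0.2),
}

# Toy stand-in for cat_train: peaks near learning_rate=0.1 (illustrative only)
def toy_objective(bagging_temperature, reg_lambda, learning_rate):
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.0001 * reg_lambda

def random_maximize(objective, bounds, n_iter, seed=696):
    rng = random.Random(seed)
    best = {'target': float('-inf'), 'params': None}
    for _ in range(n_iter):
        # Sample each parameter uniformly within its (low, high) bound;
        # real Bayesian optimization would instead pick the point that
        # maximizes an acquisition function over a fitted surrogate model
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        target = objective(**params)
        if target > best['target']:
            best = {'target': target, 'params': params}
    return best

best = random_maximize(toy_objective, bounds, n_iter=20)
print(best['target'], best['params'])
```

The point of the real surrogate model is that each evaluation of `cat_train` is an expensive full training run, so spending it on a well-chosen point beats spending it on a random one.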
Once the best parameters are found, we can train the final model with that parameter set.
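In bayes_opt, the best point found is exposed as `cat_opt.max`, a dict with `'target'` and `'params'` keys. A sketch of merging it back into the fixed CatBoost settings from `cat_train` (the `opt_max` dict below is hand-written for illustration, standing in for a real `cat_opt.max`):

```python
# Stand-in for cat_opt.max; in a real run this comes from the optimizer
opt_max = {
    'target': 0.83,  # hypothetical best validation AUC seen
    'params': {
        'bagging_temperature': 12.7,
        'reg_lambda': 45.2,
        'learning_rate': 0.11,
    },
}

# Fixed settings from cat_train, merged with the tuned parameters
final_params = {
    'iterations': 800,
    'depth': 3,
    'loss_function': 'Logloss',
    'eval_metric': 'AUC',
    'random_seed': 696,
    'verbose': 30,
}
final_params.update(opt_max['params'])

# Train the final model exactly as in cat_train, but with final_params:
# final_model = CatBoost(final_params)
# final_model.fit(data_train_X, data_train_y,
#                 eval_set=(data_val_X, data_val_y),
#                 early_stopping_rounds=20)
print(final_params)
```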
For the Bayesian tuning part, we referred to the following articles:
Bayesian methods of hyperparameter optimization
Hyperparameter Optimization using bayesian optimization
as well as the GitHub source code: BayesianOptimization.