Overview

In a previous post we walked through a CatBoost training example; this time we cover a CatBoost hyperparameter-tuning example using the widely used Bayesian optimization approach.

1. Import dependencies and load the data

import pandas as pd
import numpy as np
from catboost import CatBoostClassifier, CatBoost, Pool, cv
from bayes_opt import BayesianOptimization

data_train = pd.read_csv('data/训练集.csv')
data_val = pd.read_csv('data/验证集.csv')
data_test = pd.read_csv('data/测试集.csv')
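Each CSV is assumed to contain the feature columns plus a binary `label` column. A minimal sketch with a hypothetical two-row stand-in for 训练集.csv (the column names and values here are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical miniature of 训练集.csv: two feature columns plus a binary 'label'.
csv_text = "age,income,label\n34,52000,1\n41,61000,0\n"
data_train = pd.read_csv(io.StringIO(csv_text))

print(data_train.shape)              # (2, 3)
print('label' in data_train.columns) # True
```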

2. Load the feature list and prepare the data

name_list = pd.read_csv('特征列表_20190705.txt', header=None, index_col=0)
my_feature_names = list(name_list.transpose())
len(my_feature_names)

data_train_X = data_train[my_feature_names]
data_val_X = data_val[my_feature_names]
data_test_X = data_test[my_feature_names]

data_train_y = data_train['label']
data_val_y = data_val['label']
data_test_y = data_test['label']
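The `list(name_list.transpose())` idiom above deserves a note: reading the feature file with `index_col=0` puts the feature names in the row index, and transposing moves them into the columns, which `list()` on a DataFrame then returns. A sketch with hypothetical feature names:

```python
import pandas as pd

# Hypothetical stand-in for 特征列表_20190705.txt loaded with
# header=None, index_col=0: feature names end up in the row index.
name_list = pd.DataFrame(index=pd.Index(['age', 'income', 'tenure']))

# Transposing turns the row index into columns; list() on a DataFrame
# returns its column labels.
my_feature_names = list(name_list.transpose())
print(my_feature_names)  # ['age', 'income', 'tenure']
```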

3. Bayesian hyperparameter tuning

def cat_train(bagging_temperature, reg_lambda, learning_rate):
    params = {
        'iterations':800,
        'depth':3,
        'bagging_temperature':bagging_temperature,
        'reg_lambda':reg_lambda,
        'learning_rate':learning_rate,
        'loss_function':'Logloss',
        'eval_metric':'AUC',
        'random_seed':696,
        'verbose':30
    }

    model = CatBoost(params)
    # Evaluate on the validation set; the evaluation metric is AUC
    model.fit(data_train_X, data_train_y, eval_set=(data_val_X, data_val_y), plot=False, early_stopping_rounds=20)
    
    print(params)
    score_max = model.best_score_.get('validation').get('AUC')
    return score_max

cat_opt = BayesianOptimization(cat_train, 
                           {
                              'bagging_temperature': (1, 50),  
                              'reg_lambda': (1, 200),
                              'learning_rate':(0.05, 0.2)
                            })

cat_opt.maximize(n_iter=15, init_points=5)
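The contract `maximize()` relies on is simply that the objective returns a scalar score to maximize over the declared bounds. A stdlib-only sketch of that loop, with plain random sampling standing in for the Gaussian-process surrogate and a hypothetical toy objective in place of `cat_train`:

```python
import random

random.seed(696)

# Hypothetical toy objective standing in for cat_train: highest near
# learning_rate = 0.1, mildly penalizing reg_lambda far from 50.
def toy_objective(bagging_temperature, reg_lambda, learning_rate):
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.001 * abs(reg_lambda - 50)

bounds = {
    'bagging_temperature': (1, 50),
    'reg_lambda': (1, 200),
    'learning_rate': (0.05, 0.2),
}

best = {'target': float('-inf'), 'params': None}
for _ in range(20):  # 20 trials, like init_points=5 plus n_iter=15
    params = {name: random.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
    score = toy_objective(**params)
    if score > best['target']:
        best = {'target': score, 'params': params}

# Every sampled value stays inside its declared bounds.
print(all(lo <= best['params'][k] <= hi for k, (lo, hi) in bounds.items()))  # True
```

Random search explores blindly, whereas Bayesian optimization uses the scores already observed to pick the next point to try; the objective-function interface is the same.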

With the best parameters in hand, we can train the final model using them.
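After `maximize()` finishes, `BayesianOptimization` exposes the best trial through `cat_opt.max`, a dict of the form `{'target': ..., 'params': {...}}`, with every searched value returned as a float. A sketch of merging it back into the fixed CatBoost settings from `cat_train`, using a hard-coded stand-in for `cat_opt.max` (the numbers are made up):

```python
# Hypothetical stand-in for cat_opt.max; bayes_opt stores the best trial
# as {'target': best_score, 'params': {name: float_value, ...}}.
opt_max = {
    'target': 0.873,
    'params': {
        'bagging_temperature': 12.4,
        'reg_lambda': 87.1,
        'learning_rate': 0.11,
    },
}

# Merge the tuned values back into the fixed settings used in cat_train.
best_params = {
    'iterations': 800,
    'depth': 3,
    'loss_function': 'Logloss',
    'eval_metric': 'AUC',
    'random_seed': 696,
    **opt_max['params'],
}
print(best_params['learning_rate'])  # 0.11
```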

For the Bayesian tuning part, we referred to the following articles:
Bayesian methods of hyperparameter optimization

Hyperparameter Optimization using bayesian optimization

as well as the GitHub source code: BayesianOptimization