Overview

In a previous post we walked through a CatBoost training example. This time we cover a CatBoost hyperparameter-tuning example, using Bayesian optimization, a popular approach in industry.

1. Import dependencies and load the data

import pandas as pd
import numpy as np
from catboost import CatBoost
from bayes_opt import BayesianOptimization

data_train = pd.read_csv('data/训练集.csv')
data_val = pd.read_csv('data/验证集.csv')
data_test = pd.read_csv('data/测试集.csv')

2. Load the feature list and prepare the data

name_list = pd.read_csv('特征列表_20190705.txt', header=None, index_col=0)
# Transposing turns the index (the feature names) into column labels
my_feature_names = list(name_list.transpose())
print(len(my_feature_names))
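The transpose trick works because `read_csv` with `index_col=0` puts the feature names into the index, and transposing turns that index into column labels, so iterating the transposed frame yields the names. A self-contained illustration (the three feature names below are made up):

```python
import io
import pandas as pd

# Stand-in for 特征列表_20190705.txt: one feature name per line
feature_file = io.StringIO("age\nincome\nscore\n")

# index_col=0 makes the names the index; no data columns remain
name_list = pd.read_csv(feature_file, header=None, index_col=0)

# Transposing promotes the index to column labels,
# and iterating a DataFrame yields its column labels
my_feature_names = list(name_list.transpose())
print(my_feature_names)  # ['age', 'income', 'score']
```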
    
data_train_X = data_train[my_feature_names]
data_val_X = data_val[my_feature_names]
data_test_X = data_test[my_feature_names]

data_train_y = data_train['label']
data_val_y = data_val['label']
data_test_y = data_test['label']

3. Bayesian hyperparameter tuning

def cat_train(bagging_temperature, reg_lambda, learning_rate):
    # Fixed settings plus the three hyperparameters being tuned
    params = {
        'iterations': 800,
        'depth': 3,
        'bagging_temperature': bagging_temperature,
        'reg_lambda': reg_lambda,
        'learning_rate': learning_rate,
        'loss_function': 'Logloss',
        'eval_metric': 'AUC',
        'random_seed': 696,
        'verbose': 30
    }

    model = CatBoost(params)
    # Evaluate on the validation set; the evaluation metric is AUC
    model.fit(data_train_X, data_train_y,
              eval_set=(data_val_X, data_val_y),
              plot=False, early_stopping_rounds=20)

    print(params)
    score_max = model.best_score_.get('validation').get('AUC')
    return score_max

cat_opt = BayesianOptimization(cat_train,
                               {'bagging_temperature': (1, 50),
                                'reg_lambda': (1, 200),
                                'learning_rate': (0.05, 0.2)})

cat_opt.maximize(n_iter=15, init_points=5)

Once the best parameters are found, the final model can be trained with them.
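After `maximize()` finishes, `bayes_opt` exposes the best result as `cat_opt.max`, a dict of the form `{'target': best_AUC, 'params': {...}}`. A minimal sketch of merging those values back into the fixed settings (the `best` dict below is a hypothetical stand-in for `cat_opt.max`, and the final fit lines are commented out because they need the real data):

```python
# Hypothetical stand-in for cat_opt.max after tuning has finished
best = {'target': 0.87,
        'params': {'bagging_temperature': 12.3,
                   'reg_lambda': 45.6,
                   'learning_rate': 0.11}}

# Fixed settings from cat_train, with the tuned values merged in
final_params = {
    'iterations': 800,
    'depth': 3,
    'loss_function': 'Logloss',
    'eval_metric': 'AUC',
    'random_seed': 696,
    'verbose': 30,
}
final_params.update(best['params'])

# final_model = CatBoost(final_params)
# final_model.fit(data_train_X, data_train_y,
#                 eval_set=(data_val_X, data_val_y),
#                 early_stopping_rounds=20)
```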

For the Bayesian tuning part, we referred to the following articles:
Bayesian methods of hyperparameter optimization

Hyperparameter Optimization using bayesian optimization

as well as the GitHub source: BayesianOptimization