Overview
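We previously wrote up a CatBoost training example. This time we add a CatBoost hyperparameter tuning example, using Bayesian optimization, an approach that is popular in industry.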
1. Import the dependencies and load the data
    import pandas as pd
    import numpy as np
    from catboost import CatBoostClassifier, CatBoost, Pool, cv
    from bayes_opt import BayesianOptimization

    # Training, validation, and test sets
    data_train = pd.read_csv('data/训练集.csv')
    data_val = pd.read_csv('data/验证集.csv')
    data_test = pd.read_csv('data/测试集.csv')
2. Load the feature list and prepare the data
    # Feature names are read into the index; transposing turns them into
    # column labels, so list(...) returns the feature names.
    name_list = pd.read_csv('特征列表_20190705.txt', header=None, index_col=0)
    my_feature_names = list(name_list.transpose())
    len(my_feature_names)  # number of features (shown when run in a notebook)

    # Feature matrices
    data_train_X = data_train[my_feature_names]
    data_val_X = data_val[my_feature_names]
    data_test_X = data_test[my_feature_names]

    # Labels
    data_train_y = data_train['label']
    data_val_y = data_val['label']
    data_test_y = data_test['label']
3. Bayesian hyperparameter tuning
    def cat_train(bagging_temperature, reg_lambda, learning_rate):
        params = {
            'iterations': 800,
            'depth': 3,
            'bagging_temperature': bagging_temperature,
            'reg_lambda': reg_lambda,
            'learning_rate': learning_rate,
            'loss_function': 'Logloss',
            'eval_metric': 'AUC',
            'random_seed': 696,
            'verbose': 30
        }

        model = CatBoost(params)
        # Evaluate on the validation set, using AUC as the metric
        model.fit(data_train_X, data_train_y,
                  eval_set=(data_val_X, data_val_y),
                  plot=False, early_stopping_rounds=20)

        print(params)
        # Best validation AUC reached during training; this is the value to maximize
        score_max = model.best_score_.get('validation').get('AUC')
        return score_max

    # Search bounds (lower, upper) for each tuned hyperparameter
    cat_opt = BayesianOptimization(cat_train,
                                   {
                                       'bagging_temperature': (1, 50),
                                       'reg_lambda': (1, 200),
                                       'learning_rate': (0.05, 0.2)
                                   })

    # 5 random initial points, then 15 Bayesian optimization steps
    cat_opt.maximize(n_iter=15, init_points=5)
Once the best parameters have been found, use them to train the final model.
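As a rough sketch of that last step: the snippet below merges the tuned values back into the fixed settings used by the tuning objective above and refits the model. It assumes a `bayes_opt` version in which the optimizer exposes its best result as `cat_opt.max`, a dict of the form `{'target': ..., 'params': {...}}`. The names `FIXED_PARAMS`, `build_final_params`, and `train_final_model` are hypothetical helpers introduced here for illustration and are not part of either library.

```python
# Fixed settings copied from the tuning objective above.
FIXED_PARAMS = {
    "iterations": 800,
    "depth": 3,
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "random_seed": 696,
    "verbose": 30,
}


def build_final_params(best_result, fixed_params=FIXED_PARAMS):
    """Merge the tuned values into the fixed settings.

    `best_result` is assumed to have the shape of `cat_opt.max`:
    {"target": <best score>, "params": {<name>: <value>, ...}}.
    """
    params = dict(fixed_params)           # copy so the constant is not mutated
    params.update(best_result["params"])  # add the three tuned hyperparameters
    return params


def train_final_model(best_result, train_X, train_y, val_X, val_y):
    """Hypothetical helper: refit CatBoost with the tuned parameters."""
    from catboost import CatBoost  # imported lazily; only needed for training

    model = CatBoost(build_final_params(best_result))
    model.fit(train_X, train_y,
              eval_set=(val_X, val_y),
              early_stopping_rounds=20)
    return model


# Usage, after cat_opt.maximize(...) has finished:
# final_model = train_final_model(cat_opt.max,
#                                 data_train_X, data_train_y,
#                                 data_val_X, data_val_y)
```

Keeping the fixed settings in one place means the final model is trained with the same `iterations`, `depth`, and seed that produced the tuning scores, so the reported best AUC stays comparable.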
For the Bayesian tuning part, we referred to the following articles:
Bayesian methods of hyperparameter optimization
Hyperparameter Optimization using bayesian optimization
as well as the source code on GitHub: BayesianOptimization.