XGBoost Parameter Tuning Approach

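General approach to parameter tuning. We use an approach similar to the one used for GBM. The steps are as follows:

1. Choose a relatively high learning rate. A learning rate of 0.1 generally works, but depending on the problem the ideal value can lie anywhere between 0.05 and 0.3. Then determine the optimal number of trees for this learning rate. XGBoost has a very useful function, "cv", which performs cross-validation at each boosting iteration and returns the optimal number of trees.
2. For the chosen learning rate and number of trees, tune the tree-specific parameters (max_depth, min_child_weight, gamma, subsample, colsample_bytree). Several parameters shape each tree; I'll take up an example shortly.
3. Tune the regularization parameters of xgboost (lambda, alpha). These reduce model complexity and can improve performance.
4. Lower the learning rate and decide the optimal parameters.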

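Step 1: Fix the learning rate and the number of estimators for tuning tree-based parameters.

To decide on the boosting parameters, we first need to set initial values for the other parameters. Let's take the following values:
1. max_depth = 5: this should be between 3 and 10. I started with 5, but you can choose a different number; 4 to 6 are all good starting points.
2. min_child_weight = 1: a small value is chosen because this is a highly imbalanced classification problem, so leaf nodes can contain small groups.
3. gamma = 0: a small value such as 0.1 to 0.2 can also be used as a starting point. This parameter will be tuned later anyway.
4. subsample, colsample_bytree = 0.8: a commonly used starting value. Typical values range between 0.5 and 0.9.
5. scale_pos_weight = 1: left at 1 as a starting value; because the classes are highly imbalanced, it may need adjusting later.
Note that all of the above are only initial estimates and will be tuned later. Here the learning rate is set to 0.1, and the cv function of xgboost is used to find the optimal number of trees. The function defined earlier does this job.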

learning_rate=0.1,
n_estimators=1000,
max_depth=5,
min_child_weight=1,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

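The output shows that, at a learning rate of 0.1, the optimal number of trees is 140. This may be too high for you, depending on the power of your system.

Note: the AUC (test) value shown here is the AUC on the test set. You will not see it if you run these commands on your own system, because the data is not public. It is provided for reference only, and the code that generated it has been removed.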

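Step 2: Tune max_depth and min_child_weight

We tune these two first because they have the greatest impact on the final result. Start with a coarse search over a wide range, then fine-tune over a narrower one.
Note: this section runs heavy grid searches, which can take 15-30 minutes or longer depending on your system. You can reduce the number of values tested to suit what your system can handle.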

'max_depth': range(3, 10, 2),
'min_child_weight': range(1, 6, 2)

learning_rate=0.1,
n_estimators=140,
max_depth=5,
min_child_weight=1,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

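Here we have run 12 combinations with wide intervals between values. The ideal values are 5 for max_depth and 5 for min_child_weight. We can search around these values more closely: since the step in the previous search was 2, we now look one above and one below each optimum.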
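The coarse-then-fine pattern described here can be sketched in plain Python. This is only an illustration: grid_search and toy_score are hypothetical names of mine, and toy_score is a made-up stand-in for the cross-validated AUC that the real search would compute for each combination (its peak was picked so the result mirrors the numbers in this walkthrough).

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        candidate = dict(zip(names, values))
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


def toy_score(p):
    # Hypothetical stand-in for a cross-validated AUC (higher is better).
    return -((p["max_depth"] - 4.4) ** 2) - (p["min_child_weight"] - 6) ** 2


# Coarse pass: step of 2 in each direction, 4 x 3 = 12 combinations.
coarse_grid = {"max_depth": range(3, 10, 2), "min_child_weight": range(1, 6, 2)}
coarse_best, _ = grid_search(coarse_grid, toy_score)

# Fine pass: one step above and below each coarse winner.
fine_grid = {name: [v - 1, v, v + 1] for name, v in coarse_best.items()}
fine_best, _ = grid_search(fine_grid, toy_score)

print(coarse_best)  # {'max_depth': 5, 'min_child_weight': 5}
print(fine_best)    # {'max_depth': 4, 'min_child_weight': 6}
```

In a real run, score_fn would train and cross-validate a model for each candidate, which is why the coarse pass is kept to a dozen combinations.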

'max_depth': [4, 5, 6],
'min_child_weight': [4, 5, 6]

learning_rate=0.1,
n_estimators=140,
max_depth=5,
min_child_weight=2,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

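Here we get 4 as the optimal max_depth and 6 as the optimal min_child_weight. The CV score also improves slightly. Note that as model performance increases, even marginal gains become exponentially harder to achieve, especially when performance is already close to its ceiling. Also notice that although 6 is the best min_child_weight so far, it sits at the edge of the grid and we have not tried values above 6. We can do that as follows.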

'min_child_weight': [6, 8, 10, 12]

learning_rate=0.1,
n_estimators=140,
max_depth=4,
min_child_weight=2,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

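We see that 6 is indeed the optimal value.

Step 3: Tune gamma

With the other parameters tuned, we can now tune gamma. Gamma can take a wide range of values; here I test 5 values (0 to 0.4). You can go for more precise values.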

'gamma': [i / 10.0 for i in range(0, 5)]

learning_rate=0.1,
n_estimators=140,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

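This shows that our original gamma value from step 1 was already the best choice: the optimal gamma is 0. Before proceeding, it is a good idea to re-calibrate the number of boosting rounds for the updated parameters.

Here the score improves. The parameters at this point are: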

learning_rate=0.1,
n_estimators=1000,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

Step 4: Tune subsample and colsample_bytree

The next step is to try different subsample and colsample_bytree values. We do this in two stages, starting with 0.6, 0.7, 0.8 and 0.9 for both.

'subsample': [i / 10.0 for i in range(6, 10)],
'colsample_bytree': [i / 10.0 for i in range(6, 10)]

learning_rate=0.1,
n_estimators=177,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

Here 0.8 turns out to be the optimal value for both subsample and colsample_bytree. Now we try values around it in steps of 0.05.

'subsample': [i / 100.0 for i in range(75, 90, 5)],
'colsample_bytree': [i / 100.0 for i in range(75, 90, 5)]

learning_rate=0.1,
n_estimators=177,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27
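As a quick check of what the two stages of this step actually search over, the range expressions can be evaluated on their own. The variable names below are mine; the expressions are the ones used in the two grids.

```python
# Build candidates from integer ranges and divide afterwards; this avoids the
# drift that repeatedly adding 0.1 or 0.05 to a float would introduce.
stage_one = [i / 10.0 for i in range(6, 10)]       # coarse: step 0.1
stage_two = [i / 100.0 for i in range(75, 90, 5)]  # fine: step 0.05 around 0.8

print(stage_one)  # [0.6, 0.7, 0.8, 0.9]
print(stage_two)  # [0.75, 0.8, 0.85]

# Each stage tries every (subsample, colsample_bytree) pair.
print(len(stage_one) ** 2, len(stage_two) ** 2)  # 16 9
```

So the first stage fits 16 combinations and the second only 9, each multiplied by the number of CV folds.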

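Again we get the same values as before. Thus the optimal values are:

subsample: 0.8
colsample_bytree: 0.8

Step 5: Tune the regularization parameters

The next step is to apply regularization to reduce overfitting. Many people rarely use these parameters, because gamma already provides an effective way to control complexity, but we should still try them. I tune 'reg_alpha' here and leave 'reg_lambda' for you to try.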

'reg_alpha': [1e-5, 1e-2, 0.1, 1, 100]

learning_rate=0.1,
n_estimators=177,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

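The CV score is actually lower than in the previous case. But the values tried were very widespread, so let's try values closer to the optimum found here (0.01) and see whether we get something better.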

'reg_alpha': [0, 0.001, 0.005, 0.01, 0.05]

learning_rate=0.1,
n_estimators=177,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

The CV score is now better. We can apply this regularization in the model and look at its impact.

learning_rate=0.1,
n_estimators=1000,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
reg_alpha=0.005,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

Again we see a slight improvement in the score.
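To get a feel for why reg_alpha and reg_lambda change the score, it helps to look at a single leaf. The sketch below uses the usual second-order formulation, in which a leaf with gradient sum G and Hessian sum H gets weight -sign(G) * max(|G| - alpha, 0) / (H + lambda). The function name and the example numbers are mine, not from the original post.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Weight of one leaf given the summed gradients and Hessians of its rows."""
    # L1 term: soft-threshold the gradient sum toward zero by reg_alpha.
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        return 0.0  # the L1 penalty switches this leaf off entirely
    # L2 term: reg_lambda enlarges the denominator, shrinking the weight.
    return -shrunk / (hess_sum + reg_lambda)


# Hypothetical leaf: gradient sum -4, Hessian sum 2.
for alpha in (0.0, 0.005, 1.0, 5.0):
    print(alpha, leaf_weight(-4.0, 2.0, reg_lambda=1.0, reg_alpha=alpha))
```

A value as small as 0.005 barely moves the weight of a leaf like this one, which fits the small change in CV score reported above; large values push weak leaves all the way to zero.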

Step 6: Lower the learning rate

Finally, we lower the learning rate and add more trees. We use the cv function of XGBoost to do the job again.

learning_rate=0.01,
n_estimators=5000,
max_depth=4,
min_child_weight=6,
gamma=0,
subsample=0.8,
colsample_bytree=0.8,
reg_alpha=0.005,
objective='binary:logistic',
nthread=4,
scale_pos_weight=1,
seed=27

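(2) Simplified version:
A typical order for tuning XGBoost is:

Fix a relatively high learning rate of 0.1
Tune num_boost_round
Tune max_depth and min_child_weight
Tune gamma
Tune subsample and colsample_bytree
Tune the regularization parameters
Lower the learning rate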

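Step 1: tuning num_boost_round. There are generally two ways to do this; the one described here is:

First set num_boost_round large enough, then watch how AUC changes on the training set and on the held-out (test) set as training runs. Training AUC generally keeps rising, whereas held-out AUC eventually falls as num_boost_round grows because of overfitting. Training therefore passes through a peak on the held-out set, and once that peak is found training can be stopped.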
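The "stop at the peak" rule can be written down in a few lines. This is a plain-Python sketch of the idea, not XGBoost's own implementation; best_round and the AUC values are made up for illustration. In practice the early_stopping_rounds argument of xgb.train plays the role of patience.

```python
def best_round(valid_scores, patience):
    """Return (best_index, best_score) for a higher-is-better metric, stopping
    once `patience` rounds in a row fail to beat the best score so far."""
    best_idx, best_score = 0, valid_scores[0]
    for i, score in enumerate(valid_scores[1:], start=1):
        if score > best_score:
            best_idx, best_score = i, score
        elif i - best_idx >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_idx, best_score


# Hypothetical held-out AUC per boosting round: rises, dips, then recovers late.
valid_auc = [0.70, 0.74, 0.77, 0.79, 0.80, 0.798, 0.797, 0.795, 0.81]

print(best_round(valid_auc, patience=3))  # (4, 0.8)
print(best_round(valid_auc, patience=5))  # (8, 0.81)
```

The index is zero-based, so the first call corresponds to keeping 5 trees. The second call shows the trade-off: too little patience can stop at a local peak, too much wastes training time.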

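Step 2: tuning max_depth and min_child_weight

These two are tuned first because they have the largest effect on the final result. Start with a coarse search over a wide range, then fine-tune over a narrower one.

Step 3: tuning gamma

Step 4: tuning subsample and colsample_bytree

The next step is to try different values of subsample and colsample_bytree. This is done in two stages, starting with 0.6, 0.7, 0.8 and 0.9 for both.

Step 5: tuning the regularization parameters

The next step is to apply regularization to reduce overfitting. Many people rarely use these parameters, because gamma already provides an effective way to control model complexity, but they are still worth trying.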
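Since gamma, lambda and min_child_weight all act on the same split decision, a small sketch of that decision may help. It assumes the standard second-order gain formula, gain = 1/2 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma, with alpha taken as 0. The function names and numbers are hypothetical.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into two children, net of the gamma penalty."""
    def leaf_score(g, h):
        return g * g / (h + reg_lambda)

    parent = leaf_score(g_left + g_right, h_left + h_right)
    children = leaf_score(g_left, h_left) + leaf_score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


def should_split(g_left, h_left, g_right, h_right,
                 reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Apply the min_child_weight check first, then require a positive net gain."""
    if min(h_left, h_right) < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


# Hypothetical candidate split: gradient/Hessian sums of the two children.
print(split_gain(-3.0, 2.0, 2.0, 2.0))                          # about 2.067
print(should_split(-3.0, 2.0, 2.0, 2.0, gamma=0.0))             # True
print(should_split(-3.0, 2.0, 2.0, 2.0, gamma=3.0))             # False
print(should_split(-3.0, 2.0, 2.0, 2.0, min_child_weight=3.0))  # False
```

Raising gamma rejects the split outright, while raising lambda only shrinks the gain, which is one way to read the remark that gamma is the blunter and more commonly used control.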

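At this point the model performs noticeably better, and the effect of each parameter is clearer.

(3) Summary
1. It is not realistic to expect a big leap in performance from parameter tuning or slightly better models alone. The best score with GBM was 0.8487 and the best with XGBoost was 0.8494. That is a real improvement, but not a qualitative leap.
2. A qualitative leap has to come from other techniques, such as feature engineering, model ensembling and stacking.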

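eta (learning rate): default 0.3; also known as learning_rate. Typical values are 0.01-0.2. It shrinks the weights added at each step, which makes the model more robust.
min_child_weight: default 1. The minimum sum of instance weights required in a child node. If a split would produce a node whose weight sum is below min_child_weight, the split is not made. It helps reduce overfitting, but setting it too high leads to underfitting.
max_depth: maximum depth of a tree, default 6. Larger values give bigger trees and a more complex model. Typical values are 3-10; lowering it helps prevent overfitting.
gamma (min_split_loss): the minimum loss reduction required to split a node. The larger gamma is, the more conservative the algorithm and the less prone to overfitting, though possibly at the cost of accuracy, so it needs balancing.
subsample: fraction of rows sampled to build each tree, default 1. With 0.5, XGBoost randomly selects half of the training rows before growing each tree.
colsample_bytree: column (feature) sampling rate per tree, default 1.
lambda (reg_lambda): L2 regularization, default 1. It controls the regularization part of XGBoost; although many data scientists rarely touch it, it helps reduce overfitting.
alpha (reg_alpha): L1 regularization, default 0. Increasing it makes the model more conservative.
scale_pos_weight: default 1. It controls the balance of positive and negative weights; with highly imbalanced classes, a value such as sum(negative instances) / sum(positive instances) is typical and can speed up convergence.
In addition, n_estimators (num_boost_round in the native API) sets the number of trees built during training. For large data sets the scikit-learn wrapper's default of 100 is often not enough; a high value such as 100,000 may be needed, combined with early stopping to keep the best iteration.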
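For scale_pos_weight in particular, the starting value is usually just a ratio of class counts. The helper below is a hypothetical sketch of that calculation for 0/1 labels, using the negative-to-positive ratio that the XGBoost documentation suggests as a typical value; whether it helps on a given data set still has to be checked by cross-validation.

```python
def suggested_scale_pos_weight(labels):
    """Ratio of negative to positive examples for 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Hypothetical data set with a 5% positive rate.
labels = [1] * 50 + [0] * 950
print(suggested_scale_pos_weight(labels))  # 19.0
```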

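In XGBoost (eXtreme Gradient Boosting), num_boost_round and eta are two important hyperparameters that jointly shape the learning process and the final performance of the model.

num_boost_round is the number of trees added during training. In each round XGBoost adds one new tree to correct the predictions of the current model. Its value is usually chosen by cross-validation to guard against overfitting.
eta (the learning rate) is the step size applied to each round's update. A smaller eta reduces the contribution of each tree, so more trees (a larger num_boost_round) are needed to reach the same performance. A larger eta gives each tree more influence and may reach good performance sooner, but it also overfits more easily.
Put simply, there is a trade-off between eta and num_boost_round:

If eta is small, num_boost_round has to be increased, because more trees are needed to approach the optimum gradually.
Conversely, if eta is large, each iteration makes a bigger adjustment and fewer iterations (a smaller num_boost_round) may suffice, but an overly large eta can lead to a poor fit or to overfitting.
In practice, cross-validation is usually used to find the best combination of the two, so that the model avoids overfitting while still generalizing well. When adjusting them, keep in mind their effect on model complexity and training time.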
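A back-of-the-envelope model makes the trade-off concrete. Assume, as a deliberate idealization, that every tree fits the current residual perfectly and is then scaled by eta; the remaining residual after n rounds is then (1 - eta)^n of the original. Real trees are far from perfect fits and this says nothing about generalization, so treat the numbers only as an order-of-magnitude guide. The function name and the 1% target are my own choices.

```python
import math


def rounds_needed(eta, target=0.01):
    """Rounds until the residual fraction (1 - eta) ** n falls to `target` or below."""
    return math.ceil(math.log(target) / math.log(1.0 - eta))


for eta in (0.3, 0.1, 0.05, 0.01):
    print(f"eta={eta}: about {rounds_needed(eta)} rounds")
```

Under this toy model, going from eta = 0.1 to eta = 0.01 multiplies the number of rounds by roughly ten, which matches the jump from n_estimators=1000 to n_estimators=5000 as an upper bound when the learning rate is lowered in step 6.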

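When the number of features is small, try a relatively small num_boost_round (for example 100 to 200) with eta set to 0.1 or lower, so that the model does not overfit.

Use cross-validation to tune the two parameters together and find the combination that performs well on both the training and validation sets.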

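# 'eta' is an alias of 'learning_rate' in XGBoost, so the two entries below conflict; keep only one.
# 'n_estimators' belongs to the scikit-learn wrapper; with xgb.train() the count is num_boost_round.
param = {'learning_rate': 0.1, 'n_estimators': 1000, 'max_depth': 3,
         'min_child_weight': 5, 'gamma': 0, 'subsample': 1.0, 'colsample_bytree': 0.8,
         'scale_pos_weight': 1, 'eta': 0.05, 'silent': 1, 'objective': 'binary:logistic'}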

xgb_params = {
    'verbosity':                    0,
    'alpha':                        0.9,
    'max_bin':                      256,
    'scale_pos_weight':             2,
    'learning_rate':                0.1,
    'subsample':                    1,
    'reg_lambda':                   1,
    "min_child_weight":             0,
    'max_depth':                    8,
    'max_leaves':                   2**8,
    'predictor':                    'cpu_predictor',
    'tree_method':                  'hist',
    'n_estimators':                 1000
}

xgb_pars = {
    'eta': 0.15,
    'gamma': 0.0,
    'max_depth': 8,
    'min_child_weight': 1,
    'max_delta_step': 0,
    'subsample': 0.6,
    'colsample_bytree': 0.6,
    'colsample_bylevel': 1,
    'lambda': 1,
    'alpha': 0,
    'tree_method': 'approx',
    'objective': 'rank:pairwise',
    'eval_metric': 'map@12',
    'nthread': 12,
    'seed': 42,
    'silent': 1
}


# train the model

model = xgb.train(xgb_pars, dfold0, num_boost_round=n_estimators, 
                  verbose_eval=1, evals=watchlist)

params = {
    'booster': 'gbtree',
    'objective': 'rank:pairwise',
    'eval_metric': 'auc',
    'gamma': 0.1,
    'min_child_weight': 2,
    'max_depth': 5,
    'lambda': 10,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'eta': 0.01,
    'tree_method': 'exact',  # 'hist'
    'seed': 0,
    'nthread': 7
}

params = {
    'booster': 'gbtree',
    'objective': 'rank:pairwise', #'binary:logistic', 
    'eta': 0.05, 
    'seed' : 2018,
    'max_depth': 5,
    'subsample': 0.9, 
    'colsample_bytree': 0.8,
    'colsample_bylevel' : 0.8,
    'eval_metric': ['auc'], # Need TO Logloss
    'nthread' : 8,
    'gamma': 2,
}

xgb_model = xgb.train(params,xgb_train,2000,watch_list,early_stopping_rounds=40,verbose_eval=10)

params = {'booster': 'gbtree',
          'max_depth': 3,
          'colsample_bytree': 0.7,
          'subsample': 0.7,
          'eta': 0.03,
          'silent': 1,
          # 'objective': 'binary:logistic',
          'objective': 'rank:pairwise',
          'min_child_weight': 6,  # either 3 or 6 here
          'seed': 10,
          'eval_metric': 'auc',
          'scale_pos_weight': 3176 / 76824}
# Only the training set is watched, so early stopping here tracks training AUC.
watchlist = [(dtrain, 'train')]
bst = xgb.train(params, dtrain, num_boost_round=1000, evals=watchlist,
                early_stopping_rounds=100)

import lightgbm as lgb

# Categorical column names, copied as given.
CATEGORICAL = ['sex', 'merriage', 'income', 'qq_bound', 'degree',
               'wechat_bound', 'account_grade', 'industry']


def lgb_model(X_train, y_train, X_test, y_test=None):
    # LightGBM
    lgb_train = lgb.Dataset(X_train, y_train, categorical_feature=CATEGORICAL)
    params = {
        'task': 'train',
        'boosting_type': 'gbdt',
        'objective': 'binary',
        'metric': 'auc',
        'num_leaves': 25,
        'learning_rate': 0.01,
        'feature_fraction': 0.7,
        'bagging_fraction': 0.7,
        'bagging_freq': 5,
        'min_data_in_leaf': 5,
        'max_bin': 200,
        'verbose': 0,
    }
    gbm = lgb.train(params, lgb_train, num_boost_round=2000)
    # lgb.train returns a Booster; its predict() already yields probabilities
    # for the binary objective (a Booster has no predict_proba method).
    predict = gbm.predict(X_test)
    # Min-max scale the scores to [0, 1].
    minmin, maxmax = predict.min(), predict.max()
    return (predict - minmin) / (maxmax - minmin)

param = {'max_depth': 8, 'eta': 0.05, 'silent': 1, 'objective': 'rank:pairwise', 'min_child_weight': 0.01, 'lambda': 100}

params = {
        'booster': 'gbtree',
        'objective': 'rank:pairwise',
        'gamma': 0.3,
        'max_depth': 3,
        'lambda': 1,
        'subsample': 0.6,
        'colsample_bytree': 0.8,
        'min_child_weight': 3,
        'silent': 1,
        'eta': 0.01,
        'seed': 100,
        'alpha': 1,
        'nthread': -1,
        'eval_metric': 'map@1-',
    }

Features: 13
Training rows: 2222783

params = {
    'booster': 'gbtree',
    'objective': 'rank:pairwise',
    'gamma': 1,
    'max_depth': 6,
    'nthread': 1,
    'eta': 0.08,
    'min_child_weight': 0.5,
}
Cumulative return: 153.24%	Max drawdown: -32.05%

Fine-tuning below

Tuned parameters	Cumulative return	Max drawdown
num_boost_round: 2000, eta: 0.06, 'eval_metric': 'ndcg'	43.90%	-29.61%
num_boost_round: 2000, 'eval_metric': 'ndcg'	153.24%	-32.05%
num_boost_round: 2000, 'eval_metric': 'map'	153.24%	-32.05%
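The notes above do not say how the two backtest figures were computed. Assuming the usual definitions (cumulative return as final over initial equity minus one, max drawdown as the worst drop from a running peak), they could be calculated as in the sketch below. The function names and the equity curve are hypothetical.

```python
def cumulative_return(equity):
    """Total return over the whole equity curve, as a fraction."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Worst peak-to-trough decline, as a negative fraction (0.0 if there is none)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


# Hypothetical equity curve, starting at 1.0.
equity = [1.00, 1.20, 0.90, 1.50, 1.05]
print(f"cumulative return: {cumulative_return(equity):.2%}")
print(f"max drawdown: {max_drawdown(equity):.2%}")
```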