This post builds on my previous one, Feature Selection in Scikit-learn, Regression Prediction with XGBoost, and Model Optimization in Practice, and carries out parameter tuning on top of that work, so please head over and read that article first.

Almost everything I did earlier was about feature selection; here I want to share some hands-on experience with tuning XGBoost parameters. I had seen plenty of articles on this topic online, most of them translations of the same English blog post, and worse, many describe the steps incompletely, which easily leaves a newcomer bewildered. Being a newcomer myself, I fell into quite a few of these traps along the way, so I hope this post can spare you the same trouble! With that, let's get to it.

First, the good news: Scikit-learn provides a class that helps us tune parameters systematically:
sklearn.model_selection.GridSearchCV
Its commonly used parameters, explained:

estimator: the model being tuned. If you are using XGBoost in a competition, this is the model you construct, for example: model = xgb.XGBRegressor(**other_params)

param_grid: a dict (or a list of dicts) giving the candidate values for the parameters to optimize, for example: cv_params = {"n_estimators": [550, 575, 600, 650, 675]}

scoring: the evaluation criterion. It defaults to None, in which case the estimator's own score method is used. It can also be a string naming a metric, such as scoring="roc_auc" (which metrics apply depends on the model), or a callable with the signature scorer(estimator, X, y). The available scoring strings are listed at the reference below:
Reference: http://scikit-learn.org/stable/modules/model_evaluation.html
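As a quick illustration of the callable form, here is a minimal sketch of a custom scorer; the wrapped metric (mean absolute error) and the function name are my own choices for illustration, not something used later in this post:

from sklearn.metrics import mean_absolute_error

# GridSearchCV treats higher scores as better, so we negate the error.
def neg_mae_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    return -mean_absolute_error(y, y_pred)

# It can then be passed directly: GridSearchCV(..., scoring=neg_mae_scorer)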
In this walkthrough I use r2 as the scoring function; you can of course pick whichever metric fits your needs.

At the start of tuning, you generally initialize the parameters to some reasonable values:
learning_rate: 0.1
n_estimators: 500
max_depth: 5
min_child_weight: 1
subsample: 0.8
colsample_bytree: 0.8
gamma: 0
reg_alpha: 0
reg_lambda: 1
Link: quick-reference table of common XGBoost parameters
You can set the initial values according to your own situation; the ones above are just rules of thumb.

Tuning generally proceeds in the following order:

1. Best number of boosting rounds: n_estimators
if __name__ == "__main__":
    trainFilePath = "dataset/soccer/train.csv"
    testFilePath = "dataset/soccer/test.csv"
    data = pd.read_csv(trainFilePath)
    X_train, y_train = featureSet(data)
    X_test = loadTestData(testFilePath)

    cv_params = {"n_estimators": [400, 500, 600, 700, 800]}
    other_params = {"learning_rate": 0.1, "n_estimators": 500, "max_depth": 5, "min_child_weight": 1, "seed": 0,
                    "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}

    model = xgb.XGBRegressor(**other_params)
    optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring="r2", cv=5, verbose=1, n_jobs=4)
    optimized_GBM.fit(X_train, y_train)
    # grid_scores_ exists only in older scikit-learn; see the note below for the modern equivalent.
    evalute_result = optimized_GBM.grid_scores_
    print("Per-candidate results: {0}".format(evalute_result))
    print("Best parameter values: {0}".format(optimized_GBM.best_params_))
    print("Best model score: {0}".format(optimized_GBM.best_score_))
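A side note: grid_scores_ was removed in scikit-learn 0.20, so if you are on a recent version, the same information is available through cv_results_. A minimal sketch of the modern equivalent (pd is the pandas import already used in this script):

# On scikit-learn >= 0.20, inspect the per-candidate results via cv_results_:
results = pd.DataFrame(optimized_GBM.cv_results_)
print(results[["params", "mean_test_score", "std_test_score"]])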
At this point, one detail in the code above deserves a warning:

The two asterisks in model = xgb.XGBRegressor(**other_params) must not be omitted! It's easy to overlook, and many online tutorials were apparently copied from elsewhere without the code ever being run, so they simply write model = xgb.XGBRegressor(other_params). Sadly, if you run it that way, you get the following error:
xgboost.core.XGBoostError: b"Invalid Parameter format for max_depth expect int but value...
If you don't believe me, see this link: xgboost issue

A lesson learned the hard way: until you run the code yourself, you never know what bugs will turn up!
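The cause is plain Python semantics rather than anything XGBoost-specific: ** unpacks a dict into keyword arguments, while without it the whole dict is passed as a single positional argument (and in older XGBoost versions, where max_depth was the first parameter of XGBRegressor, that is exactly why the error complains about max_depth). A tiny sketch with a made-up function name:

def demo(max_depth=3, learning_rate=0.1):
    print(max_depth, learning_rate)

params = {"max_depth": 5, "learning_rate": 0.05}

demo(**params)  # unpacked into keyword arguments: prints "5 0.05"
demo(params)    # the whole dict is bound to max_depth: prints the dict and 0.1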
Back to the grid search. Running it produces:
[Parallel(n_jobs=4)]: Done 25 out of 25 | elapsed: 1.5min finished
Per-candidate results: [mean: 0.94051, std: 0.01244, params: {"n_estimators": 400},
 mean: 0.94057, std: 0.01244, params: {"n_estimators": 500},
 mean: 0.94061, std: 0.01230, params: {"n_estimators": 600},
 mean: 0.94060, std: 0.01223, params: {"n_estimators": 700},
 mean: 0.94058, std: 0.01231, params: {"n_estimators": 800}]
Best parameter values: {"n_estimators": 600}
Best model score: 0.9406056804545407
From the output, the best number of rounds is 600. But we shouldn't treat this as final: the spacing between candidates was large, so I ran another grid with finer granularity:
cv_params = {"n_estimators": [550, 575, 600, 650, 675]} other_params = {"learning_rate": 0.1, "n_estimators": 600, "max_depth": 5, "min_child_weight": 1, "seed": 0, "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}
Running it produces:
[Parallel(n_jobs=4)]: Done 25 out of 25 | elapsed: 1.5min finished
Per-candidate results: [mean: 0.94065, std: 0.01237, params: {"n_estimators": 550},
 mean: 0.94064, std: 0.01234, params: {"n_estimators": 575},
 mean: 0.94061, std: 0.01230, params: {"n_estimators": 600},
 mean: 0.94060, std: 0.01226, params: {"n_estimators": 650},
 mean: 0.94060, std: 0.01224, params: {"n_estimators": 675}]
Best parameter values: {"n_estimators": 550}
Best model score: 0.9406545392685364
Sure enough, the best value moved to 550. You may wonder whether to keep shrinking the granularity. I'd say it's up to you: the finer the grid, the more precise the result, so feel free to keep refining on your own; I'll stop here.
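If you want to automate that refinement, one simple pattern (my own sketch, not from the original workflow) is to center the next, finer grid on the current best value:

best = optimized_GBM.best_params_["n_estimators"]  # e.g. 600 after the first pass
step = 25  # half the previous spacing
cv_params = {"n_estimators": [best - 2 * step, best - step, best, best + step, best + 2 * step]}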
2. The next parameters to tune are min_child_weight and max_depth:

Note: each time you finish tuning a parameter, update the corresponding entry in other_params to its best value. (A small helper that does this automatically is sketched after the results below.)
cv_params = {"max_depth": [3, 4, 5, 6, 7, 8, 9, 10], "min_child_weight": [1, 2, 3, 4, 5, 6]} other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 5, "min_child_weight": 1, "seed": 0, "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}
Running it produces:
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 1.7min
[Parallel(n_jobs=4)]: Done 192 tasks | elapsed: 12.3min
[Parallel(n_jobs=4)]: Done 240 out of 240 | elapsed: 17.2min finished
Per-candidate results: [mean: 0.93967, std: 0.01334, params: {"min_child_weight": 1, "max_depth": 3},
 mean: 0.93826, std: 0.01202, params: {"min_child_weight": 2, "max_depth": 3},
 mean: 0.93739, std: 0.01265, params: {"min_child_weight": 3, "max_depth": 3},
 mean: 0.93827, std: 0.01285, params: {"min_child_weight": 4, "max_depth": 3},
 mean: 0.93680, std: 0.01219, params: {"min_child_weight": 5, "max_depth": 3},
 mean: 0.93640, std: 0.01231, params: {"min_child_weight": 6, "max_depth": 3},
 mean: 0.94277, std: 0.01395, params: {"min_child_weight": 1, "max_depth": 4},
 mean: 0.94261, std: 0.01173, params: {"min_child_weight": 2, "max_depth": 4},
 mean: 0.94276, std: 0.01329, ...]
Best parameter values: {"min_child_weight": 5, "max_depth": 4}
Best model score: 0.94369522247392
From the output, the best values are {"min_child_weight": 5, "max_depth": 4}. (I've omitted part of the printed output because it's very long; the same goes for the steps below.)
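As promised above, here is a minimal helper sketch that wraps one tuning round and folds the winning values back into other_params; the name tune_step and its structure are mine, not part of the original code:

def tune_step(cv_params, other_params, X_train, y_train):
    # Run one grid-search round and merge the best values back into other_params.
    model = xgb.XGBRegressor(**other_params)
    gs = GridSearchCV(estimator=model, param_grid=cv_params, scoring="r2",
                      cv=5, verbose=1, n_jobs=4)
    gs.fit(X_train, y_train)
    print("Best parameter values: {0}".format(gs.best_params_))
    print("Best model score: {0}".format(gs.best_score_))
    other_params.update(gs.best_params_)  # carry the winners into the next round
    return other_params

Each of the remaining steps then reduces to a single call, e.g. other_params = tune_step({"gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}, other_params, X_train, y_train).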
3. Next we tune gamma:
cv_params = {"gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]} other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0, "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}
Running it produces:
[Parallel(n_jobs=4)]: Done 30 out of 30 | elapsed: 1.5min finished
Per-candidate results: [mean: 0.94370, std: 0.01010, params: {"gamma": 0.1},
 mean: 0.94370, std: 0.01010, params: {"gamma": 0.2},
 mean: 0.94370, std: 0.01010, params: {"gamma": 0.3},
 mean: 0.94370, std: 0.01010, params: {"gamma": 0.4},
 mean: 0.94370, std: 0.01010, params: {"gamma": 0.5},
 mean: 0.94370, std: 0.01010, params: {"gamma": 0.6}]
Best parameter values: {"gamma": 0.1}
Best model score: 0.94369522247392
From the output, the best value is {"gamma": 0.1}. Notice, though, that every candidate in this range produced exactly the same score, so gamma made no real difference here; 0.1 simply comes first in the grid.
4. Next come subsample and colsample_bytree:
cv_params = {"subsample": [0.6, 0.7, 0.8, 0.9], "colsample_bytree": [0.6, 0.7, 0.8, 0.9]} other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0, "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0.1, "reg_alpha": 0, "reg_lambda": 1}
The output shows the best values to be {"subsample": 0.7, "colsample_bytree": 0.7}.
5. Right after that, reg_alpha and reg_lambda:
cv_params = {"reg_alpha": [0.05, 0.1, 1, 2, 3], "reg_lambda": [0.05, 0.1, 1, 2, 3]} other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0, "subsample": 0.7, "colsample_bytree": 0.7, "gamma": 0.1, "reg_alpha": 0, "reg_lambda": 1}
Running it produces:
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 2.0min
[Parallel(n_jobs=4)]: Done 125 out of 125 | elapsed: 5.6min finished
Per-candidate results: [mean: 0.94169, std: 0.00997, params: {"reg_alpha": 0.01, "reg_lambda": 0.01},
 mean: 0.94112, std: 0.01086, params: {"reg_alpha": 0.01, "reg_lambda": 0.05},
 mean: 0.94153, std: 0.01093, params: {"reg_alpha": 0.01, "reg_lambda": 0.1},
 mean: 0.94400, std: 0.01090, params: {"reg_alpha": 0.01, "reg_lambda": 1},
 mean: 0.93820, std: 0.01177, params: {"reg_alpha": 0.01, "reg_lambda": 100},
 mean: 0.94194, std: 0.00936, params: {"reg_alpha": 0.05, "reg_lambda": 0.01},
 mean: 0.94136, std: 0.01122, params: {"reg_alpha": 0.05, "reg_lambda": 0.05},
 mean: 0.94164, std: 0.01120, ...]
Best parameter values: {"reg_alpha": 1, "reg_lambda": 1}
Best model score: 0.9441561344357595
From the output, the best values are {"reg_alpha": 1, "reg_lambda": 1}.
6. Last comes learning_rate; at this stage you generally try smaller learning rates:
cv_params = {"learning_rate": [0.01, 0.05, 0.07, 0.1, 0.2]} other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0, "subsample": 0.7, "colsample_bytree": 0.7, "gamma": 0.1, "reg_alpha": 1, "reg_lambda": 1}
Running it produces:
[Parallel(n_jobs=4)]: Done 25 out of 25 | elapsed: 1.1min finished
Per-candidate results: [mean: 0.93675, std: 0.01080, params: {"learning_rate": 0.01},
 mean: 0.94229, std: 0.01138, params: {"learning_rate": 0.05},
 mean: 0.94110, std: 0.01066, params: {"learning_rate": 0.07},
 mean: 0.94416, std: 0.01037, params: {"learning_rate": 0.1},
 mean: 0.93985, std: 0.01109, params: {"learning_rate": 0.2}]
Best parameter values: {"learning_rate": 0.1}
Best model score: 0.9441561344357595
From the output, the best value is {"learning_rate": 0.1}.
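Note that learning_rate and n_estimators interact: a smaller learning rate usually needs more boosting rounds to reach the same score, which is likely why 0.01 and 0.05 lose here with n_estimators fixed at 550. A common alternative (my own sketch, not part of the original workflow) is to fix a small learning rate and let early stopping choose the number of rounds on a held-out set:

from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)
model = xgb.XGBRegressor(learning_rate=0.05, n_estimators=2000, max_depth=4,
                         min_child_weight=5, subsample=0.7, colsample_bytree=0.7,
                         gamma=0.1, reg_alpha=1, reg_lambda=1, seed=0)
# Stop adding trees once the validation score has not improved for 50 rounds.
# (In older XGBoost versions early_stopping_rounds is a fit() argument;
# in XGBoost >= 2.0 it moved to the constructor.)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], early_stopping_rounds=50, verbose=False)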
Looking over the whole process, we can see the best model score climbing steadily as the parameters were tuned, which confirms that the tuning did help. Notice, though, that the best score didn't actually improve all that much. One reminder: this score is computed with the scoring function set earlier, namely:
optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring="r2", cv=5, verbose=1, n_jobs=4)
specifically the scoring="r2" in that call. In real settings you may well need several different scoring functions to properly judge how good a model is.
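For instance, you could evaluate the same model under several metrics with cross_val_score; the metric names below are standard scikit-learn scoring strings, but which ones matter depends on your problem (a sketch, not from the original post):

from sklearn.model_selection import cross_val_score

for metric in ["r2", "neg_mean_squared_error", "neg_mean_absolute_error"]:
    scores = cross_val_score(model, X_train, y_train, scoring=metric, cv=5)
    print("{0}: mean={1:.5f}, std={2:.5f}".format(metric, scores.mean(), scores.std()))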
Finally, we plug the best parameter combination into the model, train it, and generate predictions:
def trainandTest(X_train, y_train, X_test):
    # Train XGBoost with the best parameter combination found above
    model = xgb.XGBRegressor(learning_rate=0.1, n_estimators=550, max_depth=4, min_child_weight=5, seed=0,
                             subsample=0.7, colsample_bytree=0.7, gamma=0.1, reg_alpha=1, reg_lambda=1)
    model.fit(X_train, y_train)

    # Predict on the test set
    ans = model.predict(X_test)
    ans_len = len(ans)
    id_list = np.arange(10441, 17441)
    data_arr = []
    for row in range(0, ans_len):
        data_arr.append([int(id_list[row]), ans[row]])
    np_data = np.array(data_arr)

    # Write the submission file
    pd_data = pd.DataFrame(np_data, columns=["id", "y"])
    # print(pd_data)
    pd_data.to_csv("submit.csv", index=None)

    # Plot feature importances
    # plot_importance(model)
    # plt.show()
And that's essentially the end of the tuning process. As I mentioned above, tuning does help model accuracy, but only to a limited degree. The biggest improvements come from data cleaning, feature selection, feature combination, model ensembling, and the like!

To finish, here is the complete code (a disclaimer: my code quality isn't great, so treat it as a reference for the overall approach):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : soccer_value.py
# @Author: Huangqinjian
# @Date  : 2018/3/22
# @Desc  :

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn import preprocessing
from sklearn import metrics
# Note: Imputer and sklearn.grid_search come from older scikit-learn releases;
# on current versions use sklearn.impute.SimpleImputer and
# sklearn.model_selection.GridSearchCV instead.
from sklearn.preprocessing import Imputer
from sklearn.grid_search import GridSearchCV
from hyperopt import hp  # imported but never used in this script


# Load the training data
def featureSet(data):
    imputer = Imputer(missing_values="NaN", strategy="mean", axis=0)
    imputer.fit(data.loc[:, ["rw", "st", "lw", "cf", "cam", "cm"]])
    x_new = imputer.transform(data.loc[:, ["rw", "st", "lw", "cf", "cam", "cm"]])
    le = preprocessing.LabelEncoder()
    le.fit(["Low", "Medium", "High"])
    att_label = le.transform(data.work_rate_att.values)
    # print(att_label)
    def_label = le.transform(data.work_rate_def.values)
    # print(def_label)
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]["club"])
        tmp_list.append(data.iloc[row]["league"])
        tmp_list.append(data.iloc[row]["potential"])
        tmp_list.append(data.iloc[row]["international_reputation"])
        tmp_list.append(data.iloc[row]["pac"])
        tmp_list.append(data.iloc[row]["sho"])
        tmp_list.append(data.iloc[row]["pas"])
        tmp_list.append(data.iloc[row]["dri"])
        tmp_list.append(data.iloc[row]["def"])
        tmp_list.append(data.iloc[row]["phy"])
        tmp_list.append(data.iloc[row]["skill_moves"])
        tmp_list.append(x_new[row][0])
        tmp_list.append(x_new[row][1])
        tmp_list.append(x_new[row][2])
        tmp_list.append(x_new[row][3])
        tmp_list.append(x_new[row][4])
        tmp_list.append(x_new[row][5])
        tmp_list.append(att_label[row])
        tmp_list.append(def_label[row])
        XList.append(tmp_list)
    yList = data.y.values
    return XList, yList


# Load the test data
def loadTestData(filePath):
    data = pd.read_csv(filepath_or_buffer=filePath)
    imputer = Imputer(missing_values="NaN", strategy="mean", axis=0)
    imputer.fit(data.loc[:, ["rw", "st", "lw", "cf", "cam", "cm"]])
    x_new = imputer.transform(data.loc[:, ["rw", "st", "lw", "cf", "cam", "cm"]])
    le = preprocessing.LabelEncoder()
    le.fit(["Low", "Medium", "High"])
    att_label = le.transform(data.work_rate_att.values)
    # print(att_label)
    def_label = le.transform(data.work_rate_def.values)
    # print(def_label)
    data_num = len(data)
    XList = []
    for row in range(0, data_num):
        tmp_list = []
        tmp_list.append(data.iloc[row]["club"])
        tmp_list.append(data.iloc[row]["league"])
        tmp_list.append(data.iloc[row]["potential"])
        tmp_list.append(data.iloc[row]["international_reputation"])
        tmp_list.append(data.iloc[row]["pac"])
        tmp_list.append(data.iloc[row]["sho"])
        tmp_list.append(data.iloc[row]["pas"])
        tmp_list.append(data.iloc[row]["dri"])
        tmp_list.append(data.iloc[row]["def"])
        tmp_list.append(data.iloc[row]["phy"])
        tmp_list.append(data.iloc[row]["skill_moves"])
        tmp_list.append(x_new[row][0])
        tmp_list.append(x_new[row][1])
        tmp_list.append(x_new[row][2])
        tmp_list.append(x_new[row][3])
        tmp_list.append(x_new[row][4])
        tmp_list.append(x_new[row][5])
        tmp_list.append(att_label[row])
        tmp_list.append(def_label[row])
        XList.append(tmp_list)
    return XList


def trainandTest(X_train, y_train, X_test):
    # Train XGBoost
    model = xgb.XGBRegressor(learning_rate=0.1, n_estimators=550, max_depth=4, min_child_weight=5, seed=0,
                             subsample=0.7, colsample_bytree=0.7, gamma=0.1, reg_alpha=1, reg_lambda=1)
    model.fit(X_train, y_train)
    # Predict on the test set
    ans = model.predict(X_test)
    ans_len = len(ans)
    id_list = np.arange(10441, 17441)
    data_arr = []
    for row in range(0, ans_len):
        data_arr.append([int(id_list[row]), ans[row]])
    np_data = np.array(data_arr)
    # Write the submission file
    pd_data = pd.DataFrame(np_data, columns=["id", "y"])
    # print(pd_data)
    pd_data.to_csv("submit.csv", index=None)
    # Plot feature importances
    # plot_importance(model)
    # plt.show()


if __name__ == "__main__":
    trainFilePath = "dataset/soccer/train.csv"
    testFilePath = "dataset/soccer/test.csv"
    data = pd.read_csv(trainFilePath)
    X_train, y_train = featureSet(data)
    X_test = loadTestData(testFilePath)

    # Produce the final predictions
    # trainandTest(X_train, y_train, X_test)

    """
    The parameter-tuning code follows; uncomment one cv_params/other_params pair at a time.
    """
    # cv_params = {"n_estimators": [400, 500, 600, 700, 800]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 500, "max_depth": 5, "min_child_weight": 1, "seed": 0,
    #                 "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}

    # cv_params = {"n_estimators": [550, 575, 600, 650, 675]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 600, "max_depth": 5, "min_child_weight": 1, "seed": 0,
    #                 "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}

    # cv_params = {"max_depth": [3, 4, 5, 6, 7, 8, 9, 10], "min_child_weight": [1, 2, 3, 4, 5, 6]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 5, "min_child_weight": 1, "seed": 0,
    #                 "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}

    # cv_params = {"gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0,
    #                 "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0, "reg_alpha": 0, "reg_lambda": 1}

    # cv_params = {"subsample": [0.6, 0.7, 0.8, 0.9], "colsample_bytree": [0.6, 0.7, 0.8, 0.9]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0,
    #                 "subsample": 0.8, "colsample_bytree": 0.8, "gamma": 0.1, "reg_alpha": 0, "reg_lambda": 1}

    # cv_params = {"reg_alpha": [0.05, 0.1, 1, 2, 3], "reg_lambda": [0.05, 0.1, 1, 2, 3]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0,
    #                 "subsample": 0.7, "colsample_bytree": 0.7, "gamma": 0.1, "reg_alpha": 0, "reg_lambda": 1}

    # cv_params = {"learning_rate": [0.01, 0.05, 0.07, 0.1, 0.2]}
    # other_params = {"learning_rate": 0.1, "n_estimators": 550, "max_depth": 4, "min_child_weight": 5, "seed": 0,
    #                 "subsample": 0.7, "colsample_bytree": 0.7, "gamma": 0.1, "reg_alpha": 1, "reg_lambda": 1}

    # model = xgb.XGBRegressor(**other_params)
    # optimized_GBM = GridSearchCV(estimator=model, param_grid=cv_params, scoring="r2", cv=5, verbose=1, n_jobs=4)
    # optimized_GBM.fit(X_train, y_train)
    # evalute_result = optimized_GBM.grid_scores_
    # print("Per-candidate results: {0}".format(evalute_result))
    # print("Best parameter values: {0}".format(optimized_GBM.best_params_))
    # print("Best model score: {0}".format(optimized_GBM.best_score_))