Overfitting, regularization, and early stopping
Unlike random forests, gradient boosted trees *can* overfit. Therefore, as for neural networks, you can apply regularization and early stopping using a validation dataset.

For example, the following figures show the loss and accuracy curves for the training and validation sets when training a GBT model. Notice how divergent the curves are, which suggests a high degree of overfitting.

Figure 29. Loss vs. number of decision trees.

Figure 30. Accuracy vs. number of decision trees.
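In TF-DF, curves like these can be drawn from the model's training logs. The sketch below is a minimal example, assuming a trained GBT `model` and matplotlib; the inspector's logs record the validation-set metrics after each added tree, so the training-set curves would have to be computed separately.

import matplotlib.pyplot as plt

# One log entry per tree: each holds the metrics of the model truncated
# to that number of trees, evaluated on the validation dataset.
logs = model.make_inspector().training_logs()

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot([log.num_trees for log in logs],
         [log.evaluation.loss for log in logs])
plt.xlabel("Number of decision trees")
plt.ylabel("Loss (validation)")

plt.subplot(1, 2, 2)
plt.plot([log.num_trees for log in logs],
         [log.evaluation.accuracy for log in logs])
plt.xlabel("Number of decision trees")
plt.ylabel("Accuracy (validation)")

plt.show()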
Common regularization parameters for gradient boosted trees include the following (see the sketch after the list):
- The maximum depth of the tree.
- The shrinkage rate.
- The ratio of attributes tested at each node.
- L1 and L2 coefficients on the loss.
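For reference, here is a minimal sketch of setting these parameters in TF-DF. The hyperparameter names (max_depth, shrinkage, num_candidate_attributes_ratio, l1_regularization, l2_regularization) follow the TF-DF hyperparameter documentation, and the values shown are placeholders, not recommendations.

import tensorflow_decision_forests as tfdf

# Each argument maps to one of the regularization parameters listed above.
model = tfdf.keras.GradientBoostedTreesModel(
    max_depth=6,                         # Maximum depth of each tree.
    shrinkage=0.1,                       # Shrinkage rate.
    num_candidate_attributes_ratio=0.9,  # Ratio of attributes tested per node.
    l1_regularization=0.0,               # L1 coefficient on the loss.
    l2_regularization=0.0)               # L2 coefficient on the loss.
model.fit(tf_train_dataset)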
Note that the decision trees are generally grown much shallower than in random forest models. By default, gradient boosted trees in TF-DF are grown to depth 6. Because the trees are shallow, the minimum number of examples per leaf has little impact and is generally not tuned.

The need for a validation dataset is an issue when the number of training examples is small. Therefore, it is common to train gradient boosted trees inside a cross-validation loop, or to disable early stopping when the model is known not to overfit.
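For illustration, here is a minimal sketch of such a cross-validation loop. It assumes a pandas DataFrame dataset_df with a "label" column (a placeholder name); KFold comes from scikit-learn, and tfdf.keras.pd_dataframe_to_tf_dataset converts each fold into a TensorFlow dataset.

import numpy as np
import tensorflow_decision_forests as tfdf
from sklearn.model_selection import KFold

accuracies = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(dataset_df):
  # "dataset_df" is a placeholder pandas DataFrame with a "label" column.
  train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
      dataset_df.iloc[train_idx], label="label")
  test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
      dataset_df.iloc[test_idx], label="label")

  model = tfdf.keras.GradientBoostedTreesModel()
  model.compile(metrics=["accuracy"])
  model.fit(train_ds)  # Early stopping still uses a slice of this fold.
  accuracies.append(model.evaluate(test_ds, return_dict=True)["accuracy"])

print(f"Cross-validated accuracy: {np.mean(accuracies):.3f}")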
Usage example

In the previous chapter, we trained a random forest on a small dataset. In this example, we simply replace the random forest model with a gradient boosted trees model:
model = tfdf.keras.GradientBoostedTreesModel()

# Part of the training dataset will be used as validation (and removed
# from training).
model.fit(tf_train_dataset)

# The user provides the validation dataset.
model.fit(tf_train_dataset, validation_data=tf_valid_dataset)

# Disable early stopping and the validation dataset. All the examples are
# used for training.
model.fit(
    tf_train_dataset,
    validation_ratio=0.0,
    early_stopping="NONE")
# Note: Both arguments are needed: early_stopping="NONE" turns off early
# stopping, and validation_ratio=0.0 keeps all the examples for training
# (early stopping relies on the validation dataset).
Usage and limitations

Gradient boosted trees have some pros and cons.
Pros

- Like decision trees, gradient boosted trees natively support numerical and categorical features and often do not need feature pre-processing.
- Gradient boosted trees have default hyperparameters that often give great results. Nevertheless, tuning those hyperparameters can significantly improve the model (see the tuning sketch after this list).
- Gradient boosted tree models are generally small (in number of nodes and in memory) and fast to run (often just one or a few µs per example).
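To illustrate such tuning, the following sketch uses TF-DF's random-search tuner. The tuner API (tfdf.tuner.RandomSearch, tuner.choice) and the candidate values are assumptions based on the TF-DF tuning guide, not part of this course's example.

import tensorflow_decision_forests as tfdf

# Randomly search over a few regularization hyperparameters.
tuner = tfdf.tuner.RandomSearch(num_trials=20)
tuner.choice("max_depth", [3, 4, 5, 6])
tuner.choice("shrinkage", [0.02, 0.05, 0.10])
tuner.choice("num_candidate_attributes_ratio", [0.2, 0.5, 0.9, 1.0])

model = tfdf.keras.GradientBoostedTreesModel(tuner=tuner)
model.fit(tf_train_dataset)  # Keeps the hyperparameters of the best trial.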
Cons

- The decision trees must be trained sequentially, which can slow training considerably. However, the slowdown is somewhat offset by the individual trees being smaller.
- Like random forests, gradient boosted trees can't learn and reuse internal representations. Each decision tree (and each branch of each decision tree) must relearn the dataset pattern. In some datasets, notably datasets with unstructured data (for example, images or text), this causes gradient boosted trees to show poorer results than other methods.