Overfitting, regularization, and early stopping
Unlike random forests, gradient boosted trees can overfit. Therefore, as with neural networks, you can apply regularization and early stopping using a validation dataset.
For example, the following figures show the loss and accuracy curves for the training and validation sets when training a GBT model. Notice how divergent the curves are, which suggests a high degree of overfitting.

Figure 29. Loss vs. number of decision trees.

Figure 30. Accuracy vs. number of decision trees.
Common regularization parameters for gradient boosted trees include (a sketch of how to set them follows this list):
- The maximum depth of the tree.
- The shrinkage rate.
- The ratio of attributes tested at each node.
- The L1 and L2 coefficients on the loss.
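For illustration, here is a minimal sketch of how these regularization parameters can be set as model hyperparameters in TF-DF. The values are made up for illustration, not tuned recommendations, and the hyperparameter names assume the standard TF-DF gradient boosted trees learner.

import tensorflow_decision_forests as tfdf

# Illustrative values only; tune them on your own data.
model = tfdf.keras.GradientBoostedTreesModel(
    max_depth=4,                          # maximum depth of each tree
    shrinkage=0.05,                       # shrinkage rate
    num_candidate_attributes_ratio=0.8,   # ratio of attributes tested at each node
    l1_regularization=0.01,               # L1 coefficient on the loss
    l2_regularization=0.01)               # L2 coefficient on the loss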
Note that the decision trees generally grow much shallower than those in random forest models. By default, gradient boosted trees in TF-DF are grown to depth 6.
Because the trees are shallow, the minimum number of examples per leaf has little impact and is generally not tuned.
The need for a validation dataset is an issue when the number of training examples is small. Therefore, it is common to train gradient boosted trees inside a cross-validation loop, or to disable early stopping when the model is known not to overfit.
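As a sketch of the cross-validation approach, the following example trains and evaluates one GBT model per fold. It assumes the data lives in a pandas DataFrame named dataset_df with a label column named "label"; both names are hypothetical.

import numpy as np
import tensorflow_decision_forests as tfdf
from sklearn.model_selection import KFold

# dataset_df: an assumed pandas DataFrame containing the features and a "label" column.
accuracies = []
for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=1).split(dataset_df):
  # Build one training and one validation dataset per fold.
  train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(dataset_df.iloc[train_idx], label="label")
  valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(dataset_df.iloc[valid_idx], label="label")

  model = tfdf.keras.GradientBoostedTreesModel()
  model.compile(metrics=["accuracy"])
  model.fit(train_ds)
  accuracies.append(model.evaluate(valid_ds, return_dict=True)["accuracy"])

print("Cross-validated accuracy:", np.mean(accuracies))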
Usage example
In the previous chapter, we trained a random forest on a small dataset. In this example, we simply replace the random forest model with a gradient boosted trees model:
model = tfdf.keras.GradientBoostedTreesModel()
# Part of the training dataset will be used as validation (and removed
# from training).
model.fit(tf_train_dataset)
# The user provides the validation dataset.
model.fit(tf_train_dataset, validation_data=tf_valid_dataset)
# Disable early stopping and the validation dataset. All the examples are
# used for training. "validation_ratio" and "early_stopping" are model
# hyperparameters, so they are set in the constructor rather than in fit().
model = tfdf.keras.GradientBoostedTreesModel(
    validation_ratio=0.0,
    early_stopping="NONE")
model.fit(tf_train_dataset)
Usage and limitations
Gradient boosted trees have both pros and cons.
Pros
- Like decision trees, they natively support numerical and categorical features and often do not need feature pre-processing (see the sketch after this list).
- Gradient boosted trees have default hyperparameters that often give great results. Nevertheless, tuning those hyperparameters can significantly improve the model.
- Gradient boosted tree models are generally small (in number of nodes and in memory) and fast to run (often just one or a few microseconds per example).
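To illustrate the first point, here is a minimal sketch that trains a GBT model directly on a mix of numerical and string categorical features, with no encoding or normalization. The tiny DataFrame and its column names are invented for this illustration, and early stopping is disabled (as in the usage example above) because the dataset is too small for a validation split.

import pandas as pd
import tensorflow_decision_forests as tfdf

# Made-up data: one numerical feature, one categorical feature, a binary label.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "city": ["Paris", "Tokyo", "Paris", "Lima", "Tokyo", "Lima"],
    "label": [0, 1, 1, 0, 1, 0],
})
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

# No feature pre-processing: strings and numbers are consumed as-is.
model = tfdf.keras.GradientBoostedTreesModel(
    validation_ratio=0.0, early_stopping="NONE")
model.fit(train_ds)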
Cons
- The decision trees must be trained sequentially, which can slow training considerably. However, the slowdown is somewhat offset by the trees being smaller.
- Like random forests, gradient boosted trees can't learn and reuse internal representations. Each decision tree (and each branch of each decision tree) must relearn the dataset pattern. On some datasets, notably datasets with unstructured data (for example, images or text), this causes gradient boosted trees to show poorer results than other methods.