Gradient boosting (optional unit)
In regression problems, it makes sense to define the signed error as the difference between the prediction and the label. However, in other types of problems this strategy often leads to poor results. A better strategy, used in gradient boosting, is to:
- Define a loss function similar to the loss functions used in neural networks. For example, the entropy (also known as log loss) for a classification problem.
- Train the weak model to predict the *gradient of the loss with respect to the strong model's output*.
Formally, given a loss function $L(y,p)$ where $y$ is a label and $p$ a prediction, the pseudo response $z_i$ used to train the weak model at step $i$ is:
$$ z_i = \frac {\partial L(y, F_i)} {\partial F_i} $$
where:
- $F_i$ is the prediction of the strong model.
The preceding example was a regression problem: the objective is to predict a numerical value. For regression, squared error is a common loss function:
$$ L(y,p) = (y - p)^2 $$
In this case, the gradient is:
$$ z = \frac {\partial L(y, F_i)} {\partial F_i} = \frac {\partial(y-p)^2} {\partial p} = -2(y - p) = 2 \ \text{signed error} $$
In other words, the gradient is the signed error from our example, with a factor of 2. Note that constant factors do not matter because of the shrinkage. Note also that this equivalence only holds for regression problems with squared-error loss. For other supervised learning problems (for example, classification, ranking, or regression with percentile loss), there is no equivalence between the gradient and a signed error.
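To make this concrete, the sketch below runs a few boosting iterations for squared-error loss, using a shallow scikit-learn tree as the weak model. The pseudo response is computed analytically as $-2(y - F)$, that is, twice the signed error. The data, tree depth, iteration count, and shrinkage value are illustrative assumptions, not part of the text above, and this is not the YDF implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up 1D regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = np.sin(3.0 * X[:, 0])

F = np.zeros_like(y)  # strong model prediction, initialized to 0
shrinkage = 0.1

for _ in range(10):
    # Pseudo response: gradient of (y - F)^2 with respect to F,
    # i.e. 2 * (F - y) = 2 * signed error.
    z = -2.0 * (y - F)
    # Weak model trained to predict the pseudo response.
    weak = DecisionTreeRegressor(max_depth=2).fit(X, z)
    # Gradient step in "function space": move against the gradient.
    F = F - shrinkage * weak.predict(X)

print(np.mean((y - F) ** 2))  # training loss shrinks over the iterations
```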
Leaf and structure optimization with a Newton's method step
Newton's method is an optimization method, like gradient descent. However, unlike gradient descent, which only uses the gradient of the function to optimize, Newton's method uses both the gradient (first derivative) and the second derivative of the function.
A step of gradient descent is as follows:
$$ x_{i+1} = x_i - \frac {df}{dx}(x_i) = x_i - f'(x_i) $$
and a step of Newton's method is as follows:
$$ x_{i+1} = x_i - \frac {\frac {df}{dx} (x_i)} {\frac {d^2f}{dx^2} (x_i)} = x_i - \frac{f'(x_i)}{f''(x_i)}$$
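As a quick numerical illustration of the two update rules, the sketch below applies one gradient-descent step and one Newton step to $f(x) = (x - 3)^2$; the function and starting point are assumed for illustration only.

```python
# One gradient-descent step vs. one Newton step on f(x) = (x - 3)^2,
# whose minimum is at x = 3.
def f_prime(x):
    return 2.0 * (x - 3.0)  # first derivative

def f_second(x):
    return 2.0  # second derivative (constant for a quadratic)

x = 0.0
gradient_step = x - f_prime(x)               # x_{i+1} = x_i - f'(x_i)
newton_step = x - f_prime(x) / f_second(x)   # x_{i+1} = x_i - f'(x_i) / f''(x_i)

print(gradient_step)  # 6.0: overshoots the minimum at x = 3
print(newton_step)    # 3.0: lands exactly on the minimum of this quadratic
```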
Optionally, Newton's method can be integrated into the training of gradient boosted trees in two ways:
1. Once a tree is trained, a Newton step is applied to each leaf and overrides its value. The tree structure is untouched; only the leaf values change.
2. During the growth of a tree, conditions are selected according to a score that includes a component of the Newton formula. The structure of the tree is affected.
In YDF:
- YDF always applies a Newton step on the leaves (option 1).
- You can enable option 2 with `use_hessian_gain=True`, as in the sketch below.
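For reference, here is a minimal training sketch. It assumes the Python YDF package (`ydf`) with its `GradientBoostedTreesLearner`, plus a small made-up pandas DataFrame; everything except the `use_hessian_gain` hyperparameter named above is an illustrative assumption.

```python
import pandas as pd
import ydf  # Yggdrasil Decision Forests

# Tiny made-up regression dataset.
train_ds = pd.DataFrame({
    "feature": [float(i) for i in range(50)],
    "label": [2.0 * i + 1.0 for i in range(50)],
})

learner = ydf.GradientBoostedTreesLearner(
    label="label",
    task=ydf.Task.REGRESSION,
    use_hessian_gain=True,  # option 2: Newton-informed scores during tree growth
)
model = learner.train(train_ds)  # option 1 (Newton step on leaves) is always applied
```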