Gradient boosting (optional unit)
In regression problems, it makes sense to define the signed error as the difference between the prediction and the label. However, in other types of problems this strategy often leads to poor results. A better strategy, used in gradient boosting, is to:
- Define a loss function similar to the loss functions used in neural networks, for example the entropy (also known as log loss) for a classification problem.
- Train the weak model to predict the gradient of the loss with respect to the strong model's output.
Formally, given a loss function $L(y,p)$ where $y$ is a label and $p$ a prediction, the pseudo response $z_i$ used to train the weak model at step $i$ is:
$$ z_i = \frac {\partial L(y, F_i)} {\partial F_i} $$
where:
- $F_i$ is the prediction of the strong model.
The preceding example was a regression problem: the objective is to predict a numerical value. In the case of regression, squared error is a common loss function:
$$ L(y,p) = (y - p)^2 $$
In this case, the gradient is:
$$ z = \frac {\partial L(y, F_i)} {\partial F_i} = \frac {\partial(y-p)^2} {\partial p} = -2(y - p) = 2 \ \text{signed error} $$
In other words, the gradient is the signed error from our example with a factor of 2. Note that constant factors do not matter because of the shrinkage. Also note that this equivalence holds only for regression problems with squared error loss. For other supervised learning problems (for example classification, ranking, or regression with percentile loss), there is no equivalence between the gradient and a signed error.
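To make this concrete, the following sketch computes the squared-error pseudo responses for a toy regression set, checks that they equal twice the signed error, and fits one weak tree to them. It assumes NumPy and scikit-learn's DecisionTreeRegressor as a stand-in weak model; it is an illustration of the recipe above, not YDF's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(size=(200, 2))
y = 3.0 * x[:, 0] + np.sin(6.0 * x[:, 1])

# Current strong model prediction F_i (starts at the mean label).
f_i = np.full_like(y, y.mean())

# Pseudo response for squared error: z = dL/dF = -2(y - F) = 2 * signed error.
signed_error = f_i - y
z = 2.0 * signed_error
assert np.allclose(z, -2.0 * (y - f_i))  # same quantity, written both ways

# The weak model is trained to predict the pseudo response...
weak_model = DecisionTreeRegressor(max_depth=3).fit(x, z)

# ...and the strong model takes a (shrunk) step against it.
shrinkage = 0.1
f_i = f_i - shrinkage * weak_model.predict(x)
```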
Leaf and structure optimization with a Newton's method step
Newton's method is an optimization method, like gradient descent. However, unlike gradient descent, which only uses the gradient of the function to optimize, Newton's method uses both the gradient (first derivative) and the second derivative of the function.
A step of gradient descent is as follows:
$$ x_{i+1} = x_i - \frac {df}{dx}(x_i) = x_i - f'(x_i) $$
and a step of Newton's method is as follows:
$$ x_{i+1} = x_i - \frac {\frac {df}{dx} (x_i)} {\frac {d^2f}{dx^2} (x_i)} = x_i - \frac{f'(x_i)}{f''(x_i)}$$
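The two update rules can be compared on a simple one-dimensional function. The sketch below (a toy illustration, not part of the course material) applies one step of each to $f(x) = (x - 3)^2$; because the function is quadratic, the Newton step jumps straight to the minimum.

```python
# One step of gradient descent vs. one step of Newton's method on f(x) = (x - 3)^2.
def f_prime(x):
    return 2.0 * (x - 3.0)   # first derivative

def f_second(x):
    return 2.0               # second derivative (constant for a quadratic)

x0 = 10.0

# Gradient descent step (as in the formula above, with no learning rate).
x_gd = x0 - f_prime(x0)                     # 10 - 14 = -4: overshoots the minimum

# Newton step: the gradient is divided by the curvature.
x_newton = x0 - f_prime(x0) / f_second(x0)  # 10 - 14/2 = 3: exactly the minimum

print(x_gd, x_newton)
```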
Optionally, Newton's method can be integrated into the training of gradient boosted trees in two ways:
1. Once a tree is trained, a step of Newton's method is applied on each leaf and overrides its value. The tree structure is untouched; only the leaf values change. (A sketch of such a leaf update follows this list.)
2. During the growth of a tree, conditions are selected according to a score that includes a component of the Newton formula. The tree structure is affected.
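As an illustration of option 1, one common way to realize the Newton leaf update (used, for example, in XGBoost-style implementations; YDF's exact formula may differ) is to set each leaf value from the sums of the first and second derivatives of the loss over the examples reaching that leaf:

```python
import numpy as np

def newton_leaf_value(gradients, hessians, l2_regularization=0.0):
    """Newton step for one leaf: minimizes the local second-order
    approximation of the loss over the examples in the leaf."""
    return -np.sum(gradients) / (np.sum(hessians) + l2_regularization)

# Example with squared-error loss, where gradient = 2*(F - y) and hessian = 2.
y = np.array([1.0, 2.0, 4.0])   # labels of the examples in the leaf
f = np.array([2.0, 2.0, 2.0])   # current strong model predictions
print(newton_leaf_value(2.0 * (f - y), np.full_like(y, 2.0)))  # 0.333... = mean(y - f)
```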
In YDF:
- YDF always applies a Newton step on the leaves (option 1).
- You can enable option 2 with `use_hessian_gain=True`, as sketched below.
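For example, a minimal sketch of what this might look like with the YDF Python API (the dataset, label name, and file path are made up for illustration):

```python
import pandas as pd
import ydf  # pip install ydf

# Hypothetical training data with a numerical label column named "label".
train_ds = pd.read_csv("train.csv")

learner = ydf.GradientBoostedTreesLearner(
    label="label",
    task=ydf.Task.REGRESSION,
    use_hessian_gain=True,  # option 2: Newton terms in the split score
)
model = learner.train(train_ds)
```

Enabling `use_hessian_gain` only changes how splits are scored while the tree grows; the Newton step on the leaf values is applied regardless.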