Out-of-bag evaluation
Random forests do not require a validation dataset. Most random forests use a technique called **out-of-bag evaluation** (**OOB evaluation**) to evaluate the quality of the model. OOB evaluation treats the training set as if it were the test set of a cross-validation.
As explained earlier, each decision tree in a random forest is typically trained on ~67% of the training examples. Therefore, each decision tree does not see ~33% of the training examples. The core idea of OOB evaluation is as follows:
- Evaluate the random forest on the training set.
- For each example, only use the decision trees that did not see that example during training.
The following table illustrates the OOB evaluation of a random forest with 3 decision trees trained on 6 examples. (Yes, this is the same table as in the Bagging section.) The table shows which decision trees are used with which examples during OOB evaluation.
Table 7. OOB evaluation: the numbers represent the number of times each training example was used when training each decision tree.
| | Training example #1 | #2 | #3 | #4 | #5 | #6 | Examples for OOB evaluation |
|---|---|---|---|---|---|---|---|
| Original dataset | 1 | 1 | 1 | 1 | 1 | 1 | |
| Decision tree 1 | 1 | 1 | 0 | 2 | 1 | 1 | #3 |
| Decision tree 2 | 3 | 0 | 1 | 0 | 2 | 0 | #2, #4, and #6 |
| Decision tree 3 | 0 | 1 | 3 | 1 | 0 | 1 | #1 and #5 |
In the example shown in Table 7, the OOB prediction for training example #1 is computed with decision tree #3 (because decision trees #1 and #2 used this example for training). In practice, on a dataset of reasonable size and with a few decision trees, every example has an OOB prediction.
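To make the mechanics concrete, here is a minimal from-scratch sketch of OOB evaluation. It uses scikit-learn decision trees and an illustrative dataset rather than YDF; the tree count, dataset, and tree settings are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Illustrative sketch, not the YDF implementation: train each tree on a
# bootstrap sample, then score every example using only the trees that
# never saw it during training.
rng = np.random.default_rng(seed=42)
X, y = load_breast_cancer(return_X_y=True)
n_examples, n_trees = len(X), 30

trees, oob_masks = [], []
for _ in range(n_trees):
    # Bootstrap sample: draw n_examples indices with replacement (~67% unique).
    boot_idx = rng.integers(0, n_examples, size=n_examples)
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[boot_idx], y[boot_idx]))
    # Out-of-bag mask: True for the examples this tree never saw.
    oob_mask = np.ones(n_examples, dtype=bool)
    oob_mask[boot_idx] = False
    oob_masks.append(oob_mask)

# Aggregate each example's votes using only the trees for which it is out-of-bag.
votes = np.zeros((n_examples, 2))
for tree, oob_mask in zip(trees, oob_masks):
    votes[oob_mask] += tree.predict_proba(X[oob_mask])

has_oob = votes.sum(axis=1) > 0  # examples that are OOB for at least one tree
oob_accuracy = (votes[has_oob].argmax(axis=1) == y[has_oob]).mean()
print(f"OOB accuracy: {oob_accuracy:.3f}")
```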
In YDF, the OOB evaluation is available in the training logs if the model is trained with `compute_oob_performances=True`.
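A minimal sketch of such a training run, assuming the `ydf` Python package and a pandas DataFrame `train_df` with a `"label"` column (both are assumptions made for illustration):

```python
import pandas as pd
import ydf  # the YDF Python package (assumed installed, e.g. `pip install ydf`)

train_df = pd.read_csv("train.csv")  # hypothetical training data with a "label" column

# Ask YDF to record the OOB evaluation in the training logs.
model = ydf.RandomForestLearner(
    label="label",
    compute_oob_performances=True,
).train(train_df)

# The OOB evaluation then appears with the model's training logs / summary.
model.describe()
```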
OOB evaluation is also an effective way to compute the permutation variable importance of a random forest model. Recall from Variable importances that permutation variable importance measures the importance of a variable by measuring the drop in model quality when that variable is shuffled. The random forest "OOB permutation variable importance" is a permutation variable importance computed using the OOB evaluation.
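As a generic illustration of the shuffling step (not the YDF implementation; `model`, `X_eval`, `y_eval`, and `metric` are placeholders, and for the OOB variant each tree would be evaluated on its own out-of-bag examples and the results averaged over trees):

```python
import numpy as np

def permutation_variable_importance(model, X_eval, y_eval, metric, rng):
    """Drop in the evaluation metric when each feature column is shuffled."""
    baseline = metric(y_eval, model.predict(X_eval))
    importances = []
    for j in range(X_eval.shape[1]):
        X_shuffled = X_eval.copy()
        # Break the link between feature j and the label by permuting its values.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        importances.append(baseline - metric(y_eval, model.predict(X_shuffled)))
    return importances  # a large drop means the feature is important
```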
In YDF, the OOB permutation variable importances are available in the training logs if the model is trained with `compute_oob_variable_importances=True`.
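Continuing the sketch above (the learner flag follows the text; the exact accessor for reading the importances may differ across YDF versions, so treat it as an assumption):

```python
# Same hypothetical train_df as in the earlier sketch.
model = ydf.RandomForestLearner(
    label="label",
    compute_oob_variable_importances=True,
).train(train_df)

# The OOB permutation variable importances are reported with the training logs;
# the model summary includes them (accessor is an assumption).
model.describe()
```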