Machine Learning | Google for Developers

Out-of-bag evaluation

Random forests do not require a validation dataset. Most random forests use a technique called out-of-bag evaluation (OOB evaluation) to evaluate the quality of the model. OOB evaluation treats the training set as if it were the test set of a cross-validation.

As explained earlier, each decision tree in a random forest is typically trained on ~67% of the training examples. Therefore, each decision tree does not see ~33% of the training examples. The core idea of OOB evaluation is as follows:

Evaluate the random forest on the training set.
For each example, use only the decision trees that did not see that example during training.
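
The two steps above can be sketched in plain Python. This is a minimal illustration rather than YDF's implementation: the function name `oob_accuracy`, the `bag_counts` layout (one row per tree, one draw count per training example), and the use of simple callables as stand-ins for trained decision trees are all assumptions made for the example.

```python
from collections import Counter


def oob_accuracy(trees, bag_counts, examples, labels):
    """Accuracy on the training set, using for each example only the trees
    whose bootstrap sample did not contain it.

    trees:      list of callables, tree(example) -> label (hypothetical stand-ins)
    bag_counts: bag_counts[t][i] = number of times example i was drawn for tree t
    """
    correct = 0
    evaluated = 0
    for i, (example, label) in enumerate(zip(examples, labels)):
        # Keep only the trees that never saw example i during training.
        oob_trees = [t for t, counts in zip(trees, bag_counts) if counts[i] == 0]
        if not oob_trees:
            continue  # No OOB prediction is available for this example.
        # Majority vote among the OOB trees (ties go to the first label seen).
        votes = Counter(tree(example) for tree in oob_trees)
        prediction = votes.most_common(1)[0][0]
        correct += prediction == label
        evaluated += 1
    return correct / evaluated if evaluated else float("nan")


# Toy setup (assumed for illustration): two threshold "trees", four examples.
trees = [lambda x: x > 2, lambda x: x > 3]
bag_counts = [
    [2, 0, 2, 0],  # tree 0 was trained on examples 0 and 2
    [0, 3, 0, 1],  # tree 1 was trained on examples 1 and 3
]
examples = [1, 2, 3, 4]
labels = [x > 2 for x in examples]

accuracy = oob_accuracy(trees, bag_counts, examples, labels)
print(accuracy)
```

Each example here is scored by exactly one tree, the one that did not train on it, so the reported accuracy never rewards a tree for memorizing an example.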

The following table illustrates OOB evaluation of a random forest with 3 decision trees trained on 6 examples. (Yes, this is the same table as in the Bagging section). The table shows which decision tree is used with which example during OOB evaluation.

Table 7. OOB evaluation. Each number is the number of times a given training example is used during training of the given decision tree.

	Training examples						Examples for OOB Evaluation
	#1	#2	#3	#4	#5	#6
original dataset	1	1	1	1	1	1
decision tree 1	1	1	0	2	1	1	#3
decision tree 2	3	0	1	0	2	0	#2, #4, and #6
decision tree 3	0	1	3	1	0	1	#1 and #5

In the example shown in Table 7, the OOB prediction for training example #1 is computed with decision tree #3 only (since decision trees #1 and #2 used this example for training). In practice, on a reasonably sized dataset and with more than a handful of decision trees, every example has an OOB prediction.
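
The last column of Table 7 can be re-derived from the counts, as in the sketch below. The second half estimates how unlikely it is for an example to have no OOB prediction at all; it assumes each tree independently sees a given example with probability ~0.67 (the figure quoted above), and `prob_no_oob_prediction` is a hypothetical helper name, not part of any library.

```python
# Rows of Table 7: how many times each training example (#1..#6) was drawn
# when training each decision tree.
table_7 = {
    "decision tree 1": [1, 1, 0, 2, 1, 1],
    "decision tree 2": [3, 0, 1, 0, 2, 0],
    "decision tree 3": [0, 1, 3, 1, 0, 1],
}

# For each example, list the trees that never saw it (count == 0).
oob_trees = {
    f"#{i + 1}": [name for name, counts in table_7.items() if counts[i] == 0]
    for i in range(6)
}
for example, names in oob_trees.items():
    print(example, "->", names)


def prob_no_oob_prediction(seen_fraction, num_trees):
    """Probability that every tree saw a given example, assuming each tree
    sees it independently with probability `seen_fraction`."""
    return seen_fraction ** num_trees


for num_trees in (3, 30, 300):
    print(num_trees, prob_no_oob_prediction(0.67, num_trees))
```

Under that assumption the probability shrinks geometrically with the number of trees, which is why missing OOB predictions stop being a concern once the forest has more than a few dozen trees.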

YDF Code

In YDF, the OOB evaluation is available in the training logs if the model is trained with compute_oob_performances=True.

OOB evaluation is also an effective way to compute permutation variable importance for random forest models. Recall from Variable importances that permutation variable importance measures the importance of a variable as the drop in model quality when that variable's values are shuffled. The random forest "OOB permutation variable importance" is a permutation variable importance computed using the OOB evaluation.
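
A minimal sketch of the idea, in plain Python, is shown below. It is not YDF's algorithm: the helper names, the dictionary-based rows, and the choice to shuffle the feature once across the whole training set (real implementations may shuffle differently, for example per tree) are assumptions made for illustration.

```python
import random
from collections import Counter


def oob_accuracy(trees, bag_counts, rows, labels):
    """Accuracy where each row is scored only by trees that never saw it."""
    correct = evaluated = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        oob = [t for t, counts in zip(trees, bag_counts) if counts[i] == 0]
        if oob:
            votes = Counter(tree(row) for tree in oob)
            correct += votes.most_common(1)[0][0] == label
            evaluated += 1
    return correct / evaluated if evaluated else float("nan")


def oob_permutation_importance(trees, bag_counts, rows, labels, feature, seed=0):
    """Drop in OOB accuracy after shuffling one feature's values across rows."""
    baseline = oob_accuracy(trees, bag_counts, rows, labels)
    shuffled = [row[feature] for row in rows]
    random.Random(seed).shuffle(shuffled)
    # Copy each row, replacing only the shuffled feature.
    shuffled_rows = [{**row, feature: value} for row, value in zip(rows, shuffled)]
    return baseline - oob_accuracy(trees, bag_counts, shuffled_rows, labels)


# Toy setup (assumed): the stand-in trees read feature "a" and ignore "noise".
trees = [lambda r: r["a"] > 2, lambda r: r["a"] > 3]
bag_counts = [[2, 0, 2, 0], [0, 3, 0, 1]]
rows = [{"a": 1, "noise": 7}, {"a": 2, "noise": 5},
        {"a": 3, "noise": 9}, {"a": 4, "noise": 1}]
labels = [r["a"] > 2 for r in rows]

importance_a = oob_permutation_importance(trees, bag_counts, rows, labels, "a")
importance_noise = oob_permutation_importance(trees, bag_counts, rows, labels, "noise")
print("a:", importance_a, "noise:", importance_noise)
```

Because no tree reads the `noise` feature, shuffling it cannot change any prediction and its importance is exactly zero; the importance of `a` depends on the particular shuffle.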

YDF Code

In YDF, the OOB permutation variable importances are available in the training logs if the model is trained with compute_oob_variable_importances=True.

Check your understanding