This unit examines the following topics:
- interpreting random forests
- training random forests
- pros and cons of random forests
Interpreting random forests
Random forests are more complex to interpret than decision trees. A random forest is made of many decision trees, each trained with random noise, so it is harder to draw conclusions from the structure of any single tree. However, we can interpret random forest models in a couple of ways.
One approach to interpreting a random forest is simply to train and interpret a decision tree with the CART algorithm. Because both the random forest and CART are trained with the same core algorithm, they "share the same global view" of the dataset. This option works well for simple datasets and for understanding the overall interpretation of the model.
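For instance, here is a minimal sketch with TensorFlow Decision Forests (assuming a training dataset named `tf_train_dataset`, like the one used in the previous lesson) that trains such a surrogate CART tree and renders it for inspection:

```python
import tensorflow_decision_forests as tfdf

# Assumption: tf_train_dataset is a tf.data.Dataset of (features, label)
# batches, e.g. built with tfdf.keras.pd_dataframe_to_tf_dataset(...).
cart_model = tfdf.keras.CartModel()
cart_model.fit(tf_train_dataset)

# Write the single tree to an HTML file for visual inspection.
with open("cart_tree.html", "w") as f:
    f.write(tfdf.model_plotter.plot_model(cart_model))
```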
Variable importances are another good interpretability approach. For example, the following table ranks the variable importance of different features for a random forest model trained on the Census dataset (also known as Adult).
Table 8. Variable importance of 14 different features.
Feature | Sum score | Mean decrease in accuracy | Mean decrease in AUC | Mean min depth | Num nodes | Mean decrease in PR-AUC | Num as root
---|---|---|---|---|---|---|---
relationship | 4203592.6 | 0.0045 | 0.0172 | 4.970 | 57040 | 0.0093 | 1095
capital_gain | 3363045.1 | 0.0199 | 0.0194 | 2.852 | 56468 | 0.0655 | 457
marital_status | 3128996.3 | 0.0018 | 0.0230 | 6.633 | 52391 | 0.0107 | 750
age | 2520658.8 | 0.0065 | 0.0074 | 4.969 | 356784 | 0.0033 | 200
education | 2015905.4 | 0.0018 | -0.0080 | 5.266 | 115751 | -0.0129 | 205
occupation | 1939409.3 | 0.0063 | -0.0040 | 5.017 | 221935 | -0.0060 | 62
education_num | 1673648.4 | 0.0023 | -0.0066 | 6.009 | 58303 | -0.0080 | 197
fnlwgt | 1564189.0 | -0.0002 | -0.0038 | 9.969 | 431987 | -0.0049 | 0
hours_per_week | 1333976.3 | 0.0030 | 0.0007 | 6.393 | 206526 | -0.0031 | 20
capital_loss | 866863.8 | 0.0060 | 0.0020 | 8.076 | 58531 | 0.0118 | 1
workclass | 644208.4 | 0.0025 | -0.0019 | 9.898 | 132196 | -0.0023 | 0
native_country | 538841.2 | 0.0001 | -0.0016 | 9.434 | 67211 | -0.0058 | 0
sex | 226049.3 | 0.0002 | 0.0002 | 10.911 | 37754 | -0.0011 | 13
race | 168180.9 | -0.0006 | -0.0004 | 11.571 | 42262 | -0.0031 | 0
As the table shows, different definitions of variable importance have different scales and can lead to different rankings of the features.
Variable importances that come from the model structure (for example, sum score, mean min depth, num nodes, and num as root in the table above) are computed similarly for decision trees (see the "CART | Variable importance" section) and random forests.
Permutation variable importances (for example, mean decrease in {accuracy, AUC, PR-AUC} in the table above) are model-agnostic measures that can be computed on any machine learning model with a validation dataset. With random forests, however, instead of using a validation dataset, you can compute permutation variable importance with out-of-bag evaluation.
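As a rough sketch (assuming `model` is a trained `tfdf.keras.RandomForestModel`, created with `compute_oob_variable_importances=True` so the out-of-bag permutation importances are computed during training), the importances can be read back through the model inspector:

```python
# Assumption: `model` is a trained tfdf.keras.RandomForestModel created with
# compute_oob_variable_importances=True, which adds out-of-bag permutation
# importances to the structural importances computed by default.
inspector = model.make_inspector()

# Dictionary mapping each importance definition (the exact key names depend on
# the TF-DF version) to a list of (feature, value) pairs sorted by importance.
importances = inspector.variable_importances()
for name, ranking in importances.items():
    print(name, ranking[:3])  # top three features for each definition
```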
SHAP (SHapley Additive exPlanations) is a model-agnostic method for explaining individual predictions or interpreting the model as a whole. (See Interpretable Machine Learning by Molnar for an introduction to model-agnostic interpretation.) SHAP is ordinarily expensive to compute, but it can be sped up significantly for decision forests, so it is a good way to interpret decision forests.
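As an illustrative sketch only (this is not part of TF-DF; it assumes a scikit-learn random forest and the third-party shap package, whose TreeExplainer implements the fast tree-specific algorithm):

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a small random forest on a toy dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100).fit(X, y)

# TreeExplainer uses the fast, tree-specific SHAP algorithm.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# Global summary of the per-prediction explanations over the sample.
shap.summary_plot(shap_values, X.iloc[:100])
```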
Usage example
In the previous lesson, we trained a CART decision tree on a small dataset by calling `tfdf.keras.CartModel`. To train a random forest model, simply replace `tfdf.keras.CartModel` with `tfdf.keras.RandomForestModel`:
model = tfdf.keras.RandomForestModel()
model.fit(tf_train_dataset)  # tf_train_dataset: the training dataset built in the previous lesson
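Optionally, as a follow-up sketch (where `tf_test_dataset` is an assumed held-out dataset built the same way as the training dataset), you can inspect and evaluate the trained forest:

```python
# Text summary of the trained forest (structure statistics and variable importances).
model.summary()

# Keras-style evaluation on the assumed held-out dataset.
model.compile(metrics=["accuracy"])
print(model.evaluate(tf_test_dataset, return_dict=True))
```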
Pros and cons
This section contains a quick summary of the pros and cons of random forests.
Pros:
- Like decision trees, random forests natively support numerical and categorical features and often do not need feature pre-processing.
- Because the decision trees are independent, random forests can be trained in parallel. Consequently, you can train random forests quickly.
- Random forests have default parameters that often give great results. Tuning those parameters often has little effect on the model.
Cons:
- Because decision trees are not pruned, they can be large. Models with more than 1M nodes are common. The size (and therefore inference speed) of the random forest can sometimes be an issue.
- Random forests cannot learn and reuse internal representations. Each decision tree (and each branch of each decision tree) must relearn the dataset pattern. On some datasets, notably non-tabular datasets (e.g., images, text), this leads random forests to worse results than other methods.