This unit examines the following topics:
- interpreting random forests
- training random forests
- pros and cons of random forests
Interpreting random forests
Random forests are more complex to interpret than decision trees. A random forest is made of many decision trees, each trained with random noise, so it is harder to draw conclusions from the structure of any single tree. However, we can interpret random forest models in a couple of ways.
One approach to interpreting a random forest is simply to train and interpret a decision tree with the CART algorithm. Because both the random forest and CART are trained with the same core algorithm, they "share the same global view" of the dataset. This option works well for simple datasets and for understanding the overall interpretation of the model.
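For instance, here is a minimal sketch with TensorFlow Decision Forests (assuming a training dataset named `tf_train_dataset`, like the one used in the previous lesson) that trains such a surrogate CART tree and renders it for inspection:

```python
import tensorflow_decision_forests as tfdf

# Assumption: tf_train_dataset is a tf.data.Dataset of (features, label)
# batches, e.g. built with tfdf.keras.pd_dataframe_to_tf_dataset(...).
cart_model = tfdf.keras.CartModel()
cart_model.fit(tf_train_dataset)

# Write the single tree to an HTML file for visual inspection.
with open("cart_tree.html", "w") as f:
    f.write(tfdf.model_plotter.plot_model(cart_model))
```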
Variable importances are another good interpretability approach. For example, the following table ranks the variable importance of different features for a random forest model trained on the Census dataset (also known as Adult).
Table 8. Variable importance of 14 different features.
Feature | Sum score | Mean decrease in accuracy | Mean decrease in AUC | Mean min depth | Num nodes | Mean decrease in PR-AUC | Num as root
---|---|---|---|---|---|---|---
relationship | 4203592.6 | 0.0045 | 0.0172 | 4.970 | 57040 | 0.0093 | 1095
capital_gain | 3363045.1 | 0.0199 | 0.0194 | 2.852 | 56468 | 0.0655 | 457
marital_status | 3128996.3 | 0.0018 | 0.0230 | 6.633 | 52391 | 0.0107 | 750
age | 2520658.8 | 0.0065 | 0.0074 | 4.969 | 356784 | 0.0033 | 200
education | 2015905.4 | 0.0018 | -0.0080 | 5.266 | 115751 | -0.0129 | 205
occupation | 1939409.3 | 0.0063 | -0.0040 | 5.017 | 221935 | -0.0060 | 62
education_num | 1673648.4 | 0.0023 | -0.0066 | 6.009 | 58303 | -0.0080 | 197
fnlwgt | 1564189.0 | -0.0002 | -0.0038 | 9.969 | 431987 | -0.0049 | 0
hours_per_week | 1333976.3 | 0.0030 | 0.0007 | 6.393 | 206526 | -0.0031 | 20
capital_loss | 866863.8 | 0.0060 | 0.0020 | 8.076 | 58531 | 0.0118 | 1
workclass | 644208.4 | 0.0025 | -0.0019 | 9.898 | 132196 | -0.0023 | 0
native_country | 538841.2 | 0.0001 | -0.0016 | 9.434 | 67211 | -0.0058 | 0
sex | 226049.3 | 0.0002 | 0.0002 | 10.911 | 37754 | -0.0011 | 13
race | 168180.9 | -0.0006 | -0.0004 | 11.571 | 42262 | -0.0031 | 0
As the table shows, different definitions of variable importance have different scales and can lead to different rankings of the features.
Variable importances that come from the model structure (for example, sum score, mean min depth, num nodes, and num as root in the table above) are computed similarly for decision trees (see the "CART | Variable importance" section) and random forests.
Permutation variable importances (for example, mean decrease in {accuracy, AUC, PR-AUC} in the table above) are model-agnostic measures that can be computed on any machine learning model with a validation dataset. With random forests, however, instead of using a validation dataset, you can compute permutation variable importance with out-of-bag evaluation.
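As a rough sketch (assuming `model` is a trained `tfdf.keras.RandomForestModel`, created with `compute_oob_variable_importances=True` so the out-of-bag permutation importances are computed during training), the importances can be read back through the model inspector:

```python
# Assumption: `model` is a trained tfdf.keras.RandomForestModel created with
# compute_oob_variable_importances=True, which adds out-of-bag permutation
# importances to the structural importances computed by default.
inspector = model.make_inspector()

# Dictionary mapping each importance definition (the exact key names depend on
# the TF-DF version) to a list of (feature, value) pairs sorted by importance.
importances = inspector.variable_importances()
for name, ranking in importances.items():
    print(name, ranking[:3])  # top three features for each definition
```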
SHAP (SHapley Additive exPlanations) is a model-agnostic method for explaining individual predictions or interpreting the model as a whole. (See Interpretable Machine Learning by Molnar for an introduction to model-agnostic interpretation.) SHAP is ordinarily expensive to compute, but it can be sped up significantly for decision forests, so it is a good way to interpret decision forests.
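As an illustrative sketch only (this is not part of TF-DF; it assumes a scikit-learn random forest and the third-party shap package, whose TreeExplainer implements the fast tree-specific algorithm):

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a small random forest on a toy dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100).fit(X, y)

# TreeExplainer uses the fast, tree-specific SHAP algorithm.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# Global summary of the per-prediction explanations over the sample.
shap.summary_plot(shap_values, X.iloc[:100])
```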
Usage example
In the previous lesson, we trained a CART decision tree on a small dataset by calling `tfdf.keras.CartModel`. To train a random forest model, simply replace `tfdf.keras.CartModel` with `tfdf.keras.RandomForestModel`:
model = tfdf.keras.RandomForestModel()
model.fit(tf_train_dataset)  # tf_train_dataset: the training dataset built in the previous lesson
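Optionally, as a follow-up sketch (where `tf_test_dataset` is an assumed held-out dataset built the same way as the training dataset), you can inspect and evaluate the trained forest:

```python
# Text summary of the trained forest (structure statistics and variable importances).
model.summary()

# Keras-style evaluation on the assumed held-out dataset.
model.compile(metrics=["accuracy"])
print(model.evaluate(tf_test_dataset, return_dict=True))
```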
Pros and cons
This section contains a quick summary of the pros and cons of random forests.
Pros:
- Like decision trees, random forests natively support numerical and categorical features and often do not need feature pre-processing.
- Because the decision trees are independent, random forests can be trained in parallel. Consequently, you can train random forests quickly.
- Random forests have default parameters that often give great results. Tuning those parameters often has little effect on the model.
Cons:
- Because decision trees are not pruned, they can be large. Models with more than 1M nodes are common. The size (and therefore inference speed) of the random forest can sometimes be an issue.
- Random forests cannot learn and reuse internal representations. Each decision tree (and each branch of each decision tree) must relearn the dataset pattern. On some datasets, notably non-tabular datasets (e.g., images, text), this leads random forests to worse results than other methods.