Decision trees: Check your understanding

This page challenges you to answer a series of multiple-choice exercises about the material discussed in the "Training Decision Trees" unit.

Question 1

What is the effect of replacing the numerical features with their negative values (for example, changing the value +8 to -8) when using the exact numerical splitter? (A demonstration sketch follows the answer choices.)
The structure of the decision tree will be completely different.
The same conditions will be learned; only the positive/negative children will be switched.
Different conditions will be learned, but the overall structure of the decision tree will remain the same.
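
A minimal sketch (with made-up data) of why negating a feature cannot change what an exact numerical splitter can achieve: every candidate split x >= t has a mirror split -x >= -t with the same two children, just swapped (for thresholds that fall strictly between feature values), so the achievable information gains are identical. The data and thresholds below are hypothetical.

import numpy as np

def binary_entropy(labels):
    # Binary entropy (natural log) of a 0/1 label array.
    p = labels.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def information_gain(x, labels, threshold):
    # Information gain of splitting the examples on x >= threshold.
    mask = x >= threshold
    s = mask.mean()  # fraction of examples sent to the first child
    return binary_entropy(labels) - (s * binary_entropy(labels[mask])
                                     + (1 - s) * binary_entropy(labels[~mask]))

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
labels = np.array([0, 0, 1, 1, 1])

for t in [2.5, 4.0]:  # two arbitrary candidate thresholds
    # Splitting -x at -t sends each example to the opposite child,
    # which leaves the information gain unchanged.
    print(information_gain(x, labels, t), information_gain(-x, labels, -t))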

Question 2

Which two answers best describe the effect of testing only half (randomly selected) of the candidate threshold values of the feature X? (See the sketch after the answer choices.)
The information gain would be lower or equal.
The final decision tree would have worse testing accuracy.
The information gain would be higher or equal.
The final decision tree would have no better training accuracy.
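
A minimal sketch (made-up data, reusing binary_entropy and information_gain from the sketch above): the best information gain found over a randomly chosen half of the candidate thresholds can never exceed the best gain found over all of them, because the maximum over a subset is at most the maximum over the full set.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
labels = (x + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Candidate thresholds: midpoints between consecutive sorted feature values.
xs = np.sort(x)
thresholds = (xs[:-1] + xs[1:]) / 2

all_gains = [information_gain(x, labels, t) for t in thresholds]
half = rng.choice(thresholds, size=len(thresholds) // 2, replace=False)
half_gains = [information_gain(x, labels, t) for t in half]

print(max(all_gains), max(half_gains))  # the subset maximum is lower or equal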

Question 3

What would happen if the "information gain" versus "threshold" curve had multiple local maxima? (See the sketch after the answer choices.)
The algorithm would select the local maximum with the smallest threshold value.
It is impossible to have multiple local maxima.
The algorithm would select the global maximum.
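
A minimal sketch (made-up labels chosen so that the gain-versus-threshold curve has two local maxima, reusing information_gain from the first sketch): because the exact splitter scores every candidate threshold and takes the argmax, it returns the global maximum of the curve, no matter how many local maxima the curve has.

import numpy as np

x = np.arange(8, dtype=float)                # feature values 0, 1, ..., 7
labels = np.array([0, 0, 1, 1, 0, 1, 1, 1])  # yields local maxima near t=1.5 and t=4.5
thresholds = (x[:-1] + x[1:]) / 2            # candidate midpoints 0.5, 1.5, ..., 6.5

gains = np.array([information_gain(x, labels, t) for t in thresholds])
print(thresholds[np.argmax(gains)])  # 1.5, the global maximum of the curve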

Question 4

Compute the information gain of the following split:

Node           # of positive examples   # of negative examples
parent node    10                       6
first child    8                        2
second child   2                        4
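
In symbols, the computation below takes the binary entropy of each node and subtracts the weighted child entropy from the parent entropy (natural logarithms throughout, matching the numeric values in the code):

H(p) = -p \ln p - (1 - p) \ln(1 - p)

\mathrm{IG} = H(p_{\mathrm{parent}}) - \bigl[ s \, H(p_{\mathrm{child\,1}}) + (1 - s) \, H(p_{\mathrm{child\,2}}) \bigr], \qquad s = \tfrac{8 + 2}{10 + 6}
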
from math import log  # natural logarithm, matching the values below

# Positive label distribution
p_parent = 10 / (10 + 6)  # = 0.625
p_child_1 = 8 / (8 + 2)   # = 0.8
p_child_2 = 2 / (2 + 4)   # = 0.3333333

# Entropy of each node (binary entropy, natural logarithm)
h_parent = -p_parent * log(p_parent) - (1 - p_parent) * log(1 - p_parent)       # = 0.6615632
h_child_1 = -p_child_1 * log(p_child_1) - (1 - p_child_1) * log(1 - p_child_1)  # = 0.5004024
h_child_2 = -p_child_2 * log(p_child_2) - (1 - p_child_2) * log(1 - p_child_2)  # = 0.6365142

# Fraction of the parent's examples that go to the first child
s = (8 + 2) / (10 + 6)  # = 0.625
h_children = s * h_child_1 + (1 - s) * h_child_2  # weighted child entropy = 0.5514443

information_gain = h_parent - h_children  # = 0.1101189
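
As a cross-check, the same arithmetic can be wrapped in a small reusable function; this helper is hypothetical (not part of the course material) and reproduces the value above from the raw label counts.

from math import log

def split_information_gain(parent, child_1, child_2):
    # Each argument is a (num_positive, num_negative) count pair.
    def entropy(pos, neg):
        p = pos / (pos + neg)
        if p in (0.0, 1.0):
            return 0.0
        return -p * log(p) - (1 - p) * log(1 - p)

    n_1, n_2 = sum(child_1), sum(child_2)
    s = n_1 / (n_1 + n_2)  # fraction of examples in the first child
    return entropy(*parent) - (s * entropy(*child_1) + (1 - s) * entropy(*child_2))

print(split_information_gain((10, 6), (8, 2), (2, 4)))  # = 0.1101189...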