Decision trees: Check your understanding
This page challenges you to answer a series of multiple-choice exercises about the material covered in the "Training decision trees" unit.
Question 1

What are the effects of replacing the numerical features with their negative values (for example, changing the value +8 to -8) with the exact numerical splitter?

The same conditions will be learned; only the positive/negative children will be switched.
Fantastic.
Different conditions will be learned, but the overall structure of the decision tree will remain the same.
If the features change, then the conditions will change.
The structure of the decision tree will be completely different.
The structure of the decision tree will actually be pretty much the same. The conditions will change, though.
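The symmetry behind the correct answer can be checked with a small sketch (the feature values and threshold below are made up for illustration, not taken from the course): negating a numerical feature mirrors every threshold condition, so the exact splitter learns the same partition with the positive and negative children swapped.

```python
xs = [1.5, 3.0, 7.2, 8.0, -2.0]
t = 4.0  # hypothetical threshold learned on the original feature

# Split on the original feature: condition "x >= t"
left = [x for x in xs if x >= t]
right = [x for x in xs if x < t]

# Split on the negated feature: the mirrored condition "x <= -t"
neg = [-x for x in xs]
left_neg = [x for x in neg if x <= -t]
right_neg = [x for x in neg if x > -t]

# Both splits group exactly the same examples; only the children's roles swap.
assert sorted(-x for x in left_neg) == sorted(left)
assert sorted(-x for x in right_neg) == sorted(right)
```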
Question 2

Which two answers best describe the effect of testing only half (randomly selected) of the candidate threshold values in X?

The information gain would be higher or equal.
The information gain would be lower or equal.
Well done.
The final decision tree would have worse testing accuracy.
The final decision tree would have no better training accuracy.
Well done.
Question 3

What would happen if the "information gain" versus "threshold" curve had multiple local maxima?

It is impossible to have multiple local maxima.
Multiple local maxima are possible.
The algorithm would select the local maximum with the smallest threshold value.
The algorithm would select the global maximum.
Well done.
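A minimal sketch of why the answer is the global maximum (the toy dataset below is invented for illustration): the exact splitter scores every candidate threshold, so even when the gain-versus-threshold curve has several local maxima, the scan keeps the best one overall.

```python
from math import log

def entropy(p):
    # Binary entropy in nats; defined as 0 at p = 0 or p = 1.
    return 0.0 if p in (0.0, 1.0) else -p * log(p) - (1 - p) * log(1 - p)

def information_gain(examples, t):
    # examples: list of (feature_value, binary_label); split on x >= t.
    n = len(examples)
    left = [y for x, y in examples if x >= t]
    right = [y for x, y in examples if x < t]
    h_parent = entropy(sum(y for _, y in examples) / n)
    h_split = sum(
        len(child) / n * entropy(sum(child) / len(child))
        for child in (left, right) if child
    )
    return h_parent - h_split

# A toy dataset whose gain-vs-threshold curve has two local maxima.
data = [(1, 1), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0)]
thresholds = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
gains = [information_gain(data, t) for t in thresholds]

# The exact splitter evaluates every candidate and keeps the global maximum.
best = max(range(len(thresholds)), key=lambda i: gains[i])
```

Here the curve peaks at both t = 2.5 and t = 6.5, and the exhaustive scan selects t = 2.5, the global maximum.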
Question 4

Compute the information gain of the following split:

| Node | # of positive examples | # of negative examples |
|--------------|------------------------|------------------------|
| parent node | 10 | 6 |
| first child | 8 | 2 |
| second child | 2 | 4 |
Click the icon to see the answer.
```python
from math import log

# Positive label distribution
p_parent = 10 / (10 + 6)  # = 0.625
p_child_1 = 8 / (8 + 2)   # = 0.8
p_child_2 = 2 / (2 + 4)   # = 0.3333333

# Entropy (natural log, so the values are in nats)
h_parent = -p_parent * log(p_parent) - (1 - p_parent) * log(1 - p_parent)       # = 0.6615632
h_child_1 = -p_child_1 * log(p_child_1) - (1 - p_child_1) * log(1 - p_child_1)  # = 0.5004024
h_child_2 = -p_child_2 * log(p_child_2) - (1 - p_child_2) * log(1 - p_child_2)  # = 0.6365142

# Fraction of the examples that fall into the first child
s = (8 + 2) / (10 + 6)
f_final = s * h_child_1 + (1 - s) * h_child_2  # = 0.5514443

information_gain = h_parent - f_final  # = 0.1101189
```
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated (UTC): 2025-02-25.