Growing decision trees

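Like all supervised machine learning models, decision trees are trained to best explain a set of training examples. Finding the optimal decision tree is an NP-hard problem. Therefore, training is usually done with heuristics: easy-to-create learning algorithms that produce a tree that is not optimal, but close to optimal.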

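Most algorithms used to train decision trees follow a greedy divide-and-conquer strategy. The algorithm starts by creating a single node (the root) and then grows the decision tree recursively and greedily.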

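At each node, all possible conditions are evaluated and scored. The algorithm selects the "best" condition, that is, the condition with the highest score. For now, just know that the score is a metric that correlates with the task, and that conditions are selected to maximize that metric.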
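The text leaves the score unspecified at this point. As an illustration only, the sketch below assumes a classification task and scores a condition by its information gain (the reduction in label entropy), which is one common choice; the helper names `entropy` and `information_gain` are hypothetical, and the course may use a different metric.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # p * log2(1/p) summed over classes; a pure list gives exactly 0.0.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(labels, mask):
    """Score of a condition: entropy reduction from splitting `labels` into
    the examples where the condition holds (mask True) and the rest."""
    pos = [y for y, m in zip(labels, mask) if m]
    neg = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    weighted = (len(pos) / n) * entropy(pos) + (len(neg) / n) * entropy(neg)
    return entropy(labels) - weighted

# Hypothetical labels, plus the outcome of two candidate conditions on each example.
labels = ["A", "A", "B", "B"]
print(information_gain(labels, [True, True, False, False]))  # → 1.0 (perfect split)
print(information_gain(labels, [True, False, True, False]))  # → 0.0 (useless split)
```

A condition that separates the classes perfectly gets the highest score, so a greedy learner would prefer it.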

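For example, in the Palmer Penguins dataset (used for code examples later in this course), most Adelie and Chinstrap penguins have a bill length greater than 16 mm, while most Gentoo penguins have smaller bills. Therefore, the condition bill_length_mm ≥ 16 makes almost perfect predictions for the Gentoo penguins, but can't differentiate between Adelies and Chinstraps. The algorithm will probably pick this condition.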

One condition leading to two leaves. The condition is "bill_length_mm >= 16". If yes, the leaf is "Adelie or Chinstrap". If no, the leaf is "Gentoo".

Figure 7. The first step in growing a tree.
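The greedy search for the best condition on a single numerical feature can be sketched as follows. This assumes conditions of the form `value >= t`, candidate thresholds at the midpoints between consecutive distinct values, and Gini impurity reduction as the score; the function `best_threshold` and the toy values are hypothetical, not real Palmer Penguins measurements.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Evaluate every condition `value >= t` and keep the best one.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, score), where score is the Gini impurity reduction,
    or (None, 0.0) if no condition improves on the unsplit node.
    """
    n = len(labels)
    distinct = sorted(set(values))
    best_t, best_score = None, 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        pos = [y for v, y in zip(values, labels) if v >= t]
        neg = [y for v, y in zip(values, labels) if v < t]
        weighted = (len(pos) * gini(pos) + len(neg) * gini(neg)) / n
        score = gini(labels) - weighted
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical toy data shaped like the example above (not real measurements).
bill_length_mm = [14.0, 15.0, 17.0, 18.0, 19.0, 20.0]
species = ["Gentoo", "Gentoo", "Adelie", "Chinstrap", "Adelie", "Chinstrap"]
threshold, score = best_threshold(bill_length_mm, species)
print(threshold)  # → 16.0
```

On this toy data the search picks `bill_length_mm >= 16`: the Gentoo side becomes pure, while Adelie and Chinstrap remain mixed on the other side, matching the example above.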

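The algorithm then repeats recursively and independently on both child nodes. When no satisfying condition is found, the node becomes a leaf. The leaf prediction is the most representative label value among the examples that reach it.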
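For a classification tree, the "most representative label value" is typically the majority class. A minimal sketch, assuming the mean label for regression (an assumption; the text only says "most representative"); `leaf_prediction` is a hypothetical helper name:

```python
from collections import Counter
from statistics import mean

def leaf_prediction(labels, task="classification"):
    """Value stored in a leaf: the most frequent class for classification,
    or (assumed here) the mean label for regression."""
    if task == "classification":
        return Counter(labels).most_common(1)[0][0]
    return mean(labels)

print(leaf_prediction(["Adelie", "Chinstrap", "Adelie"]))   # → Adelie
print(leaf_prediction([1.0, 2.0, 6.0], task="regression"))  # → 3.0
```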

The algorithm is as follows:

def train_decision_tree(training_examples):
  root = create_root() # Create a decision tree with a single empty root.
  grow_tree(root, training_examples) # Grow the root node.
  return root

def grow_tree(node, examples):
  condition = find_best_condition(examples) # Find the best condition.

  if condition is None:
    # No satisfying condition was found, so the growth of this branch stops.
    set_leaf_prediction(node, examples)
    return

  # Create two children for the node.
  positive_child, negative_child = split_node(node, condition)

  # List the training examples used by each child.
  negative_examples = [example for example in examples if not condition(example)]
  positive_examples = [example for example in examples if condition(example)]

  # Continue the growth of the children.
  grow_tree(negative_child, negative_examples)
  grow_tree(positive_child, positive_examples)
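The pseudocode above leaves its helper functions undefined. Below is one runnable sketch that fills them in under stated assumptions: each example is a `(features_dict, label)` pair, every condition has the form `feature >= threshold` with thresholds at midpoints between observed values, the score is Gini impurity reduction, and leaves predict the majority label. `Condition`, `Node`, `gini`, `predict`, `other_feature` and the toy data are hypothetical; this is not the implementation of any particular library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Condition:
    """Hypothetical condition type: true when features[feature] >= threshold."""
    feature: str
    threshold: float

    def __call__(self, example):
        features, _label = example
        return features[self.feature] >= self.threshold

@dataclass
class Node:
    condition: Optional[Condition] = None  # Set on internal nodes only.
    positive_child: Optional["Node"] = None
    negative_child: Optional["Node"] = None
    prediction: Any = None  # Set on leaves only.

def gini(labels):
    """Gini impurity of a list of labels (0.0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def create_root():
    return Node()

def find_best_condition(examples):
    """Score every `feature >= t` condition (t = midpoint between consecutive
    distinct values) and return the one with the largest Gini impurity
    reduction, or None if no condition reduces the impurity."""
    parent_impurity = gini([label for _, label in examples])
    best, best_score = None, 0.0
    for feature in examples[0][0]:
        values = sorted({features[feature] for features, _ in examples})
        for lo, hi in zip(values, values[1:]):
            condition = Condition(feature, (lo + hi) / 2)
            pos = [ex[1] for ex in examples if condition(ex)]
            neg = [ex[1] for ex in examples if not condition(ex)]
            children = (len(pos) * gini(pos) + len(neg) * gini(neg)) / len(examples)
            score = parent_impurity - children
            if score > best_score:
                best, best_score = condition, score
    return best

def split_node(node, condition):
    node.condition = condition
    node.positive_child, node.negative_child = Node(), Node()
    return node.positive_child, node.negative_child

def set_leaf_prediction(node, examples):
    # Majority label among the examples that reached this leaf.
    node.prediction = Counter(label for _, label in examples).most_common(1)[0][0]

def train_decision_tree(training_examples):
    root = create_root()
    grow_tree(root, training_examples)
    return root

def grow_tree(node, examples):
    condition = find_best_condition(examples)
    if condition is None:
        set_leaf_prediction(node, examples)
        return
    positive_child, negative_child = split_node(node, condition)
    grow_tree(negative_child, [ex for ex in examples if not condition(ex)])
    grow_tree(positive_child, [ex for ex in examples if condition(ex)])

def predict(node, features):
    # Follow the conditions from the root until a leaf is reached.
    while node.condition is not None:
        branch = node.condition((features, None))
        node = node.positive_child if branch else node.negative_child
    return node.prediction

# Hypothetical toy examples (not real Palmer Penguins measurements).
examples = [
    ({"bill_length_mm": 13.0, "other_feature": 3.0}, "Gentoo"),
    ({"bill_length_mm": 14.0, "other_feature": 3.5}, "Gentoo"),
    ({"bill_length_mm": 15.0, "other_feature": 4.0}, "Gentoo"),
    ({"bill_length_mm": 17.0, "other_feature": 1.0}, "Adelie"),
    ({"bill_length_mm": 19.0, "other_feature": 2.0}, "Adelie"),
    ({"bill_length_mm": 18.0, "other_feature": 5.0}, "Chinstrap"),
    ({"bill_length_mm": 20.0, "other_feature": 6.0}, "Chinstrap"),
]
root = train_decision_tree(examples)
print(root.condition)  # → Condition(feature='bill_length_mm', threshold=16.0)
print(predict(root, {"bill_length_mm": 18.0, "other_feature": 1.5}))  # → Adelie
```

On this toy data the root becomes `bill_length_mm >= 16` and its negative child is a pure "Gentoo" leaf; the Adelie/Chinstrap side is then split again on `other_feature >= 3.5`. Recursion always terminates here because a midpoint threshold leaves at least one example on each side, so every child receives strictly fewer examples than its parent.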

Let's walk through the steps of training a particular decision tree in more detail.

Step 1: Create a root:

A node with a question mark.