本頁面由 Cloud Translation API 翻譯而成。

成長中的決策樹

就像所有監督式機器學習模型一樣，決策樹的訓練目標是盡可能解釋一組訓練範例。決策樹的最佳訓練方法是 NP-hard 問題。因此，訓練通常會使用啟發法，這是一種易於建立的學習演算法，可提供非最佳但接近最佳的決策樹。

用於訓練決策樹的演算法大多採用貪婪的「分而治」策略。演算法會先建立單一節點 (根節點)，然後以遞迴方式貪婪地擴充決策樹。

系統會在每個節點評估所有可能的條件並給予分數。演算法會選取「最佳」條件，也就是分數最高的條件。目前，您只需要知道分數是與工作相關的指標，而系統會選擇可將該指標提升至最高的條件。

舉例來說，在 Palmer 企鵝資料集 (用於本課程後續的程式碼範例) 中，大多數的阿德利企鵝和巴布亞企鵝嘴喙長度超過 16 公釐，而大多數的冠企鵝嘴喙較小。因此，條件 bill_length_mm ≥ 16 幾乎可完美預測 Gentoo 企鵝，但無法區分 Adelie 企鵝和 Chinstrap 企鵝。演算法可能會選取這個條件。

一個條件會導向兩個葉節點。條件為「bill_length_mm >= 16」。如果是，則葉片為「Adelie 或 Chinstrap」。如果不是，則葉子為「Gentoo」。

圖 7. 種樹的第一步。

接著，演算法會在兩個子節點上獨立地重複遞迴。如果找不到符合條件的節點，該節點就會變成葉節點。葉節預測結果會視為範例中最具代表性的標籤值。

演算法如下：

def train_decision_tree(training_examples):
  root = create_root() # Create a decision tree with a single empty root.
  grow_tree(root, training_examples) # Grow the root node.
  return root

def grow_tree(node, examples):
  condition = find_best_condition(examples) # Find the best condition.

  if condition is None:
    # No satisfying conditions were found, therefore the grow of the branch stops.
    set_leaf_prediction(node, examples)
    return

  # Create two childrens for the node.
  positive_child, negative_child = split_node(node, condition)

  # List the training examples used by each children.
  negative_examples = [example for example in examples if not condition(example)]
  positive_examples = [example for example in examples if condition(example)]

  # Continue the growth of the children.
  grow_tree(negative_child, negative_examples)
  grow_tree(positive_child, positive_examples)

接下來，我們將進一步說明訓練特定決策樹的步驟。

步驟 1：建立根：

含有問號的節點。