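Experiment reporting

You can report on experiments in two main ways:

  • Direct experiment reporting: Query metrics from the experiment resource. This option returns metrics for both the control and treatment arms in a single response, along with statistical comparisons such as lift and p-values. It is the only way to report on intra-campaign experiments.
  • Campaign reporting: Query metrics from the campaign resource, using campaign.experiment_type to distinguish the base and experiment campaigns. This option applies only to experiments that use separate control and treatment campaigns, such as system-managed experiments.

This guide focuses on direct experiment reporting, which works for every experiment type that supports reporting.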

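Direct experiment reporting

You can query the experiment resource directly to retrieve performance metrics and statistical comparisons for the control and treatment arms.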

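Metrics and statistical significance

For core metrics such as clicks, impressions, cost, conversions, and conversion value, the experiment resource returns the treatment arm's metrics (for example, metrics.clicks) and the control arm's metrics (for example, metrics.control_clicks) in the same row.

It also provides fields that help you assess the statistical significance of the difference between the arms:

  • metrics.*_p_value: The probability of observing results at least as extreme as these if the experiment had no real effect on the metric. The lower the p-value, the stronger the statistical significance.
  • metrics.*_point_estimate: The estimated lift (positive or negative) of the metric for the treatment arm relative to the control arm. The estimated quantity is (treatment / control - 1), so a value of 0.1 corresponds to a 10% lift. The point estimate is the center of the confidence interval; together with margin_of_error, it describes the confidence interval of the estimated difference at the prescribed confidence level.
  • metrics.*_margin_of_error: The radius of the confidence interval centered on point_estimate. It is calculated for a specified confidence level, which depends on the experiment type.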
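
To make the relationship between these fields concrete, here is a minimal Python sketch that applies the (treatment / control - 1) formula to raw values and derives the interval bounds from a point estimate and a margin of error. The names relative_lift and ConfidenceInterval are hypothetical. The margin of error is taken as an input because the API computes it, and the sketch does not attempt to reproduce that calculation. The point estimate the API reports is also computed by the API, so it need not match a ratio you compute yourself from aggregate totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceInterval:
    """Interval centered on point_estimate with radius margin_of_error."""

    point_estimate: float
    margin_of_error: float

    @property
    def lower(self) -> float:
        return self.point_estimate - self.margin_of_error

    @property
    def upper(self) -> float:
        return self.point_estimate + self.margin_of_error

    def excludes_zero(self) -> bool:
        # True when the whole interval lies on one side of "no change".
        return self.lower > 0 or self.upper < 0


def relative_lift(treatment: float, control: float) -> float:
    """Hypothetical helper: the quantity a *_point_estimate field describes."""
    if control == 0:
        raise ValueError("control value must be non-zero to compute a relative lift")
    return treatment / control - 1
```

For example, relative_lift(1100, 1000) is about 0.1 (a 10% lift). An interval with point estimate 0.1 and margin of error 0.04 spans roughly 0.06 to 0.14, so it excludes zero.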

The experiment resource supports the following core metrics. Each has a treatment value, a control value, and the statistics fields listed above:

  • clicks
  • impressions
  • cost_micros
  • conversions
  • cost_per_conversion
  • conversion_value
  • conversion_value_per_cost

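For conversions, the statistics fields are reported as absolute changes rather than relative values, through the following absolute_change fields:

  • metrics.conversions_absolute_change_point_estimate
  • metrics.conversions_absolute_change_margin_of_error
  • metrics.conversions_absolute_change_p_value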

For help building a valid query against the experiment resource, use the Google Ads Query Builder.
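
When you select many of these fields at once, it can help to generate the names. The following Python sketch assumes that every core metric follows the same naming pattern shown for clicks: the bare name, a control_ prefix, and _point_estimate, _margin_of_error, and _p_value suffixes. Conversions are the exception and use an _absolute_change infix. This pattern is an assumption extrapolated from the examples in this guide, so confirm each generated name in the Query Builder before relying on it. CORE_METRICS and metric_fields are hypothetical names.

```python
# Core metrics listed above, without the "metrics." prefix.
CORE_METRICS = [
    "clicks",
    "impressions",
    "cost_micros",
    "conversions",
    "cost_per_conversion",
    "conversion_value",
    "conversion_value_per_cost",
]

# Metrics whose statistics are exposed as absolute changes (see above).
ABSOLUTE_CHANGE_METRICS = {"conversions"}

STAT_SUFFIXES = ("point_estimate", "margin_of_error", "p_value")


def metric_fields(metric: str) -> list[str]:
    """Returns treatment, control, and statistics field names for one metric.

    Assumes the naming pattern shown for clicks applies to every metric.
    """
    stat_prefix = (
        f"{metric}_absolute_change" if metric in ABSOLUTE_CHANGE_METRICS else metric
    )
    return [
        f"metrics.{metric}",
        f"metrics.control_{metric}",
        *(f"metrics.{stat_prefix}_{suffix}" for suffix in STAT_SUFFIXES),
    ]
```

For example, ",\n  ".join(f for m in CORE_METRICS for f in metric_fields(m)) produces a SELECT list you can paste into a query and then validate.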

Example query

The following GAQL query retrieves key metrics for an experiment:

SELECT
  experiment.experiment_id,
  experiment.name,
  experiment.type,
  metrics.clicks,
  metrics.control_clicks,
  metrics.clicks_point_estimate,
  metrics.clicks_margin_of_error,
  metrics.clicks_p_value,
  metrics.conversions,
  metrics.control_conversions,
  metrics.conversions_absolute_change_point_estimate,
  metrics.conversions_absolute_change_margin_of_error,
  metrics.conversions_absolute_change_p_value
FROM experiment
WHERE experiment.experiment_id = EXPERIMENT_ID
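
One way to run this query from Python is a minimal sketch built on the google-ads client library's GoogleAdsService.search_stream method. It assumes that client is an initialized GoogleAdsClient, that customer_id has no hyphens, and that the query above is stored in a string that still contains the EXPERIMENT_ID placeholder. The function name fetch_experiment_rows is hypothetical.

```python
def fetch_experiment_rows(client, customer_id: str, query_template: str, experiment_id: int) -> list:
    """Runs the experiment query and returns every GoogleAdsRow it streams back.

    client: an initialized GoogleAdsClient (assumed).
    query_template: the GAQL query above, containing the EXPERIMENT_ID placeholder.
    """
    # int() ensures only a numeric ID is substituted into the GAQL string.
    query = query_template.replace("EXPERIMENT_ID", str(int(experiment_id)))
    ga_service = client.get_service("GoogleAdsService")
    stream = ga_service.search_stream(customer_id=customer_id, query=query)
    # search_stream yields response batches; each batch exposes a results list.
    return [row for batch in stream for row in batch.results]
```

Each returned row exposes row.experiment and row.metrics, which is the shape the evaluation snippet below expects.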

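Interpreting the results

Use the p-value, point estimate, and margin of error fields to determine whether an experiment produced statistically significant results. For example, suppose conversions_absolute_change_p_value is below your chosen threshold (for example, 0.05 for a 95% confidence level) and conversions_absolute_change_point_estimate - conversions_absolute_change_margin_of_error is greater than zero. In that case, the treatment arm performs significantly better on conversions than the control arm.

The following Python snippet shows how to evaluate the results based on the p-values and lift estimates: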

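from __future__ import annotations  # Annotations are not evaluated at runtime.

from google.ads.googleads.client import GoogleAdsClient

# GoogleAdsRow is the row type returned by GoogleAdsService. If you type-check,
# import it from the services types module that matches your API version.

# Example threshold for a 95% confidence level; adjust to your risk tolerance.
P_VALUE_THRESHOLD = 0.05

# promote_experiment, end_experiment, and graduate_experiment are assumed to be
# helper functions defined elsewhere; they are not shown in this snippet.


def evaluate_experiment(
    client: GoogleAdsClient, customer_id: str, row: GoogleAdsRow
) -> None: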
    """Evaluates the performance of the experiment.

    Args:
        client: an initialized GoogleAdsClient instance.
        customer_id: a client customer ID.
        row: a GoogleAdsRow containing the experiment arm and metrics.
    """
    metrics = row.metrics
    experiment_resource_name = row.experiment.resource_name

    # 1. Evaluate conversion success as a primary success signal.
    # - Point Estimate: Represents the estimated average lift or difference in
    #   conversions.
    # - Margin of Error: Outlines the confidence interval bounds. Note that the
    #   margin_of_error provided by the API is calculated for a preset
    #   confidence level, which is set based on the experiment type.
    # - Lower Bound: (Point Estimate - Margin of Error). If this value is above 0,
    #   we have statistical significance that performance has improved.
    conv_p_value = metrics.conversions_absolute_change_p_value
    conv_lift = metrics.conversions_absolute_change_point_estimate
    conv_error = metrics.conversions_absolute_change_margin_of_error
    conv_lower_bound = conv_lift - conv_error

    if conv_p_value <= P_VALUE_THRESHOLD:
        if conv_lower_bound > 0:
            print(
                "Significant Success: Conversions increased. Even at the lower"
                f" bound, the lift is {conv_lower_bound:.2f}. Promoting"
                " changes."
            )
            promote_experiment(client, customer_id, experiment_resource_name)
            return
        elif (conv_lift + conv_error) < 0:
            print(
                "Significant Decline: Even the upper bound"
                f" ({conv_lift + conv_error:.2f}) is below zero. Ending"
                " experiment."
            )
            end_experiment(client, customer_id, experiment_resource_name)
            return

    # 2. Evaluate click volume as a secondary signal.
    # This is helpful as an early indicator or for lower-volume accounts.
    click_p_value = metrics.clicks_p_value
    click_lift = metrics.clicks_point_estimate
    click_error = metrics.clicks_margin_of_error
    click_lower_bound = click_lift - click_error

    if click_p_value <= P_VALUE_THRESHOLD and click_lower_bound > 0:
        # We have a directional winner: high confidence in more traffic,
        # but not enough data to confirm conversion impact yet.
        print(
            f"Click volume is significantly up (+{click_lift*100:.1f}%). "
            "Graduating treatment for further manual analysis."
        )

        # Graduate if it's a separate campaign test.
        # This keeps the high-volume treatment running independently.
        # Intra-campaign experiments (like ADOPT_BROAD_MATCH_KEYWORDS and
        # ADOPT_AI_MAX) run directly within the base campaign, meaning there is only
        # a single campaign involved and no separate treatment campaign to graduate.
        # Therefore, graduation is not supported for intra-campaign experiments.
        experiment_type_name = row.experiment.type_.name
        if (
            experiment_type_name != "ADOPT_BROAD_MATCH_KEYWORDS"
            and experiment_type_name != "ADOPT_AI_MAX"
        ):
            graduate_experiment(client, customer_id, experiment_resource_name)
        else:
            print(
                "Intra-campaign trial detected: Graduation is not supported"
                " because there is only one campaign. Continuing to run to"
                " gather more conversion data."
            )
    else:
        # Both conversions and clicks are noisy.
        print(
            "Inconclusive: No significant lift in Conversions"
            f" (p={conv_p_value:.2f}) or Clicks (p={click_p_value:.2f})."
            f" Current estimated lift: {conv_lift:.2f} +/- {conv_error:.2f}."
            " Continue running."
        )
      

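Advantages over campaign reporting

Compared with querying campaign reports separately, direct experiment reporting offers several advantages:

  1. Centralized metrics: Retrieve control and treatment metrics in a single row.
  2. Statistical confidence data: Get calculated p-values, point estimates, and margins of error.
  3. Efficiency: You don't need to manually merge or compare results from multiple reports.
  4. Intra-campaign support: It is the only way to compare the control and treatment arms of an intra-campaign experiment, where traffic is split within a single campaign.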

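Campaign reporting

Some experiments create a separate treatment campaign (for example, SEARCH_CUSTOM). For these, you can query the campaign resource and use campaign.experiment_type to identify the BASE (control) and EXPERIMENT (treatment) campaigns. This approach is useful when you need to segment metrics at a finer level, such as ad group or keyword, or when you need campaign metadata that isn't available on the experiment resource. However, you must compare performance and run the statistical calculations yourself.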
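
Campaign reports don't include the statistics fields, so the comparison is up to you. The following Python sketch shows one possible manual approach. It sums impressions and clicks per campaign.experiment_type and runs a two-proportion z-test on click-through rate (CTR). The z-test is a standard textbook test chosen here for illustration, not the method the API uses for the experiment statistics fields. The query, the dict row format, and the function names are all hypothetical. The sketch also assumes that the rows cover the two campaigns of a single experiment, for example by adding a campaign.id filter.

```python
import math
from dataclasses import dataclass

# Hypothetical campaign report query; add a campaign.id filter so that only
# the base and treatment campaigns of one experiment are returned.
CAMPAIGN_QUERY = """
    SELECT campaign.id, campaign.experiment_type, metrics.impressions, metrics.clicks
    FROM campaign
    WHERE campaign.experiment_type IN ('BASE', 'EXPERIMENT')"""


@dataclass
class ArmTotals:
    """Summed metrics for one experiment arm."""

    impressions: int = 0
    clicks: int = 0


def split_by_experiment_type(rows):
    """Sums metrics per arm and returns (control_totals, treatment_totals).

    Each row is assumed to be a dict with "experiment_type" (for example,
    row.campaign.experiment_type.name), "impressions", and "clicks" keys.
    """
    totals = {"BASE": ArmTotals(), "EXPERIMENT": ArmTotals()}
    for row in rows:
        arm = totals.get(row["experiment_type"])
        if arm is None:
            continue  # Ignore other types, such as DRAFT.
        arm.impressions += row["impressions"]
        arm.clicks += row["clicks"]
    return totals["BASE"], totals["EXPERIMENT"]


def ctr_z_test(control: ArmTotals, treatment: ArmTotals) -> tuple[float, float]:
    """Two-proportion z-test on CTR; returns (relative lift, two-sided p-value)."""
    if control.impressions == 0 or treatment.impressions == 0:
        raise ValueError("both arms need impressions")
    ctr_c = control.clicks / control.impressions
    ctr_t = treatment.clicks / treatment.impressions
    pooled = (control.clicks + treatment.clicks) / (
        control.impressions + treatment.impressions
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control.impressions + 1 / treatment.impressions)
    )
    if std_err == 0:
        raise ValueError("CTR is 0% or 100% in both arms; the test is undefined")
    z = (ctr_t - ctr_c) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lift = ctr_t / ctr_c - 1 if ctr_c > 0 else math.inf
    return lift, p_value
```

For example, a control arm with 500 clicks from 10,000 impressions and a treatment arm with 600 clicks from 10,000 impressions give a 20% relative CTR lift, with a p-value well below 0.01.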

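You can't use campaign-level reporting to compare the arms of an intra-campaign experiment, because traffic is split inside a single campaign. Querying the campaign resource for an intra-campaign experiment returns only aggregate totals.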

Best practices

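  • Choose an appropriate confidence level: A lower confidence level (that is, a p-value threshold higher than 0.05) can give directional guidance sooner, especially when budget or conversion volume is low. A 95% confidence level (p-value <= 0.05) is the academic standard and gives more reliable results over longer time frames.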
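  • Run experiments long enough: Run each experiment for at least 4 weeks to account for weekly performance cycles, conversion delays, and learning periods.
  • Allow a ramp-up period: If the campaign uses automated bidding or is testing a new feature, ignore the first 1 to 2 weeks of data so that bidding models and traffic can readjust to the split.
  • Use a 50/50 split: In general, a 50/50 traffic split reaches statistically significant results fastest.
  • Plan ahead: Set the experiment start date 3 to 7 days out to leave enough time for ad review and approval.
  • Run only one experiment per campaign at a time.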
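
The scheduling guidance above can be turned into concrete dates. The following Python sketch is a hypothetical helper, plan_experiment. Its defaults come from the bullets: start 3 to 7 days out, run for at least 4 weeks, and skip the first 1 to 2 weeks during analysis. It also converts a chosen confidence level into the p-value threshold used by the decision rule in "Interpreting the results". That threshold applies only to your own decision rule; the API computes margin_of_error at a preset confidence level that depends on the experiment type.

```python
from datetime import date, timedelta
from typing import NamedTuple


class ExperimentPlan(NamedTuple):
    start: date
    end: date  # Last day of the experiment (inclusive).
    analysis_start: date  # First day of data to analyze after the ramp-up period.
    p_value_threshold: float


def plan_experiment(
    today: date,
    lead_days: int = 7,
    run_weeks: int = 4,
    ramp_up_weeks: int = 2,
    confidence_level: float = 0.95,
) -> ExperimentPlan:
    """Applies the scheduling best practices to concrete dates (hypothetical helper)."""
    if not 3 <= lead_days <= 7:
        raise ValueError("start the experiment 3 to 7 days out")
    if run_weeks < 4:
        raise ValueError("run the experiment for at least 4 weeks")
    if not 0 <= ramp_up_weeks < run_weeks:
        raise ValueError("the ramp-up period must be shorter than the run")
    start = today + timedelta(days=lead_days)
    end = start + timedelta(weeks=run_weeks) - timedelta(days=1)
    analysis_start = start + timedelta(weeks=ramp_up_weeks)
    # For the rule "p-value <= threshold", the threshold is 1 - confidence level;
    # rounding removes floating-point noise such as 0.050000000000000044.
    threshold = round(1 - confidence_level, 10)
    return ExperimentPlan(start, end, analysis_start, threshold)
```

For example, plan_experiment(date(2024, 1, 1)) gives a start date of 2024-01-08, an end date of 2024-02-04, an analysis window starting 2024-01-22, and a p-value threshold of 0.05.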