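You can build experiment reports in two main ways:
- Direct experiment reporting: Query metrics from the experiment resource. This option returns control and treatment metrics in a single response, along with statistical comparisons such as lift and p-values. It is the only way to report on intra-campaign experiments.
- Campaign reporting: Query metrics from the campaign resource, using campaign.experiment_type to distinguish base and experiment campaigns. This option only applies to experiments that use separate control and treatment campaigns, such as system-managed experiments.
This guide focuses on direct experiment reporting, which applies to all experiment types that support reporting.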
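Direct experiment reporting
You can query the experiment resource directly to retrieve performance metrics and statistical comparisons for the control and treatment arms.
Metrics and statistical significance
For core metrics such as clicks, impressions, cost, conversions, and conversion value, the experiment resource provides the treatment metric (for example, metrics.clicks) and the control metric (for example, metrics.control_clicks) in the same row.
It also provides fields that help you assess the statistical significance of the difference between the arms: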
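- metrics.*_p_value: The probability of observing the result if the experiment had no actual effect on the metric. A lower p-value indicates higher statistical significance.
- metrics.*_point_estimate: The estimated percentage lift (positive or negative) for the given metric in the treatment arm compared to the control arm. Combined with margin_of_error, it describes the confidence interval of the estimated difference at the prescribed confidence level. The estimated quantity is (treatment / control - 1). The point estimate is the center of the confidence interval.
- metrics.*_margin_of_error: The radius of the confidence interval, centered on point_estimate. It is calculated for a specific confidence level, which depends on the experiment type.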
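As a minimal sketch of how the point estimate and margin of error combine, the hypothetical helpers below compute the confidence interval bounds and check whether the interval excludes zero. The function names and the sample numbers are illustrative assumptions, not part of the API or values returned by it.

```python
from typing import Tuple


def confidence_interval(
    point_estimate: float, margin_of_error: float
) -> Tuple[float, float]:
    """Returns (lower, upper) bounds centered on the point estimate."""
    return (point_estimate - margin_of_error, point_estimate + margin_of_error)


def interval_excludes_zero(point_estimate: float, margin_of_error: float) -> bool:
    """True when the whole interval lies strictly above or strictly below zero."""
    lower, upper = confidence_interval(point_estimate, margin_of_error)
    return lower > 0 or upper < 0


# Illustrative numbers only: an estimated lift of 0.08 (8%) with a 0.05 margin.
lower, upper = confidence_interval(0.08, 0.05)
print(f"Estimated lift is between {lower:.1%} and {upper:.1%}")
```

An interval that straddles zero means the data is still compatible with no effect at the confidence level used for that experiment type.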
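The experiment resource supports the following core metric fields, each with a treatment value, a control value, and the statistical fields listed earlier:
- clicks
- impressions
- cost_micros
- conversions
- cost_per_conversion
- conversion_value
- conversion_value_per_cost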
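For conversions, the statistical fields are available through the following absolute_change fields rather than as relative values:
- metrics.conversions_absolute_change_p_value: The p-value for the null hypothesis that the experiment has no effect on the absolute change in conversions. Ranges from 0 to 1.
- metrics.conversions_absolute_change_point_estimate: The point estimate of the absolute change when estimating the experiment's effect on conversions.
- metrics.conversions_absolute_change_margin_of_error: The margin of error when estimating the experiment's effect on the absolute change in conversions.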
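For help building valid queries against the experiment resource, use the Google Ads Query Builder.
Example query
The following GAQL query retrieves key metrics for an experiment: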
SELECT
  experiment.experiment_id,
  experiment.name,
  experiment.type,
  metrics.clicks,
  metrics.control_clicks,
  metrics.clicks_point_estimate,
  metrics.clicks_margin_of_error,
  metrics.clicks_p_value,
  metrics.conversions,
  metrics.control_conversions,
  metrics.conversions_absolute_change_point_estimate,
  metrics.conversions_absolute_change_margin_of_error,
  metrics.conversions_absolute_change_p_value
FROM experiment
WHERE experiment.experiment_id = EXPERIMENT_ID
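One way to run this query from Python is sketched below. It assumes the google-ads client library's GoogleAdsService.search_stream method, which yields batches whose results attribute holds the rows. The names build_experiment_query and fetch_experiment_rows are hypothetical helpers, and the service object is passed in as a parameter so the sketch does not depend on any particular credentials setup.

```python
from typing import Any, Iterator

# Fields taken from the example query above.
SELECT_FIELDS = [
    "experiment.experiment_id",
    "experiment.name",
    "experiment.type",
    "metrics.clicks",
    "metrics.control_clicks",
    "metrics.clicks_point_estimate",
    "metrics.clicks_margin_of_error",
    "metrics.clicks_p_value",
    "metrics.conversions",
    "metrics.control_conversions",
    "metrics.conversions_absolute_change_point_estimate",
    "metrics.conversions_absolute_change_margin_of_error",
    "metrics.conversions_absolute_change_p_value",
]


def build_experiment_query(experiment_id: int) -> str:
    """Builds the GAQL query string for a single experiment (hypothetical helper)."""
    fields = ",\n  ".join(SELECT_FIELDS)
    return (
        f"SELECT\n  {fields}\n"
        "FROM experiment\n"
        # int() guards against accidentally interpolating arbitrary text.
        f"WHERE experiment.experiment_id = {int(experiment_id)}"
    )


def fetch_experiment_rows(
    ga_service: Any, customer_id: str, experiment_id: int
) -> Iterator[Any]:
    """Yields each row from the streamed response (hypothetical helper).

    Assumes ga_service exposes search_stream(customer_id=..., query=...).
    """
    query = build_experiment_query(experiment_id)
    stream = ga_service.search_stream(customer_id=customer_id, query=query)
    for batch in stream:
        for row in batch.results:
            yield row
```

With a configured client, ga_service would typically come from client.get_service("GoogleAdsService"), and each yielded row exposes the row.experiment and row.metrics fields selected in the query.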
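Interpreting results
You can use the p-value, point estimate, and margin of error fields to determine whether the experiment produced a statistically significant result. For example, if conversions_absolute_change_p_value is below your chosen threshold (for example, 0.05 for a 95% confidence level), and conversions_absolute_change_point_estimate - conversions_absolute_change_margin_of_error is greater than zero, the treatment arm performed significantly better than the control arm on conversions.
The following Python snippet shows how to evaluate results based on the p-value and the lift estimate: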
# Excerpt: P_VALUE_THRESHOLD, promote_experiment, end_experiment, and
# graduate_experiment are not defined in this snippet and must be provided by
# the surrounding code, along with the GoogleAdsClient and GoogleAdsRow types.
def evaluate_experiment(
    client: GoogleAdsClient, customer_id: str, row: GoogleAdsRow
) -> None:
    """Evaluates the performance of the experiment.

    Args:
        client: an initialized GoogleAdsClient instance.
        customer_id: a client customer ID.
        row: a GoogleAdsRow containing the experiment arm and metrics.
    """
    metrics = row.metrics
    experiment_resource_name = row.experiment.resource_name

    # 1. Evaluate conversion success as a primary success signal.
    # - Point Estimate: Represents the estimated average lift or difference in
    #   conversions.
    # - Margin of Error: Outlines the confidence interval bounds. Note that the
    #   margin_of_error provided by the API is calculated for a preset
    #   confidence level which is set based on the experiment type.
    # - Lower Bound: (Point Estimate - Margin of Error). If this value is above
    #   0, we have statistical significance that performance has improved.
    conv_p_value = metrics.conversions_absolute_change_p_value
    conv_lift = metrics.conversions_absolute_change_point_estimate
    conv_error = metrics.conversions_absolute_change_margin_of_error
    conv_lower_bound = conv_lift - conv_error

    if conv_p_value <= P_VALUE_THRESHOLD:
        if conv_lower_bound > 0:
            print(
                "Significant Success: Conversions increased. Even at the lower"
                f" bound, the lift is {conv_lower_bound:.2f}. Promoting"
                " changes."
            )
            promote_experiment(client, customer_id, experiment_resource_name)
            return
        elif (conv_lift + conv_error) < 0:
            print(
                "Significant Decline: Even the upper bound"
                f" ({conv_lift + conv_error:.2f}) is below zero. Ending"
                " experiment."
            )
            end_experiment(client, customer_id, experiment_resource_name)
            return

    # 2. Evaluate click volume as a secondary signal.
    # This is helpful as an early indicator or for lower-volume accounts.
    click_p_value = metrics.clicks_p_value
    click_lift = metrics.clicks_point_estimate
    click_error = metrics.clicks_margin_of_error
    click_lower_bound = click_lift - click_error

    if click_p_value <= P_VALUE_THRESHOLD and click_lower_bound > 0:
        # We have a directional winner: high confidence in more traffic,
        # but not enough data to confirm conversion impact yet.
        print(
            f"Click volume is significantly up (+{click_lift*100:.1f}%). "
            "Graduating treatment for further manual analysis."
        )

        # Graduate if it's a separate campaign test.
        # This keeps the high-volume treatment running independently.
        # Intra-campaign experiments (like ADOPT_BROAD_MATCH_KEYWORDS and
        # ADOPT_AI_MAX) run directly within the base campaign, meaning there is
        # only a single campaign involved and no separate treatment campaign to
        # graduate. Therefore, graduation is not supported for intra-campaign
        # experiments.
        experiment_type_name = row.experiment.type_.name
        if (
            experiment_type_name != "ADOPT_BROAD_MATCH_KEYWORDS"
            and experiment_type_name != "ADOPT_AI_MAX"
        ):
            graduate_experiment(client, customer_id, experiment_resource_name)
        else:
            print(
                "Intra-campaign trial detected: Graduation is not supported"
                " because there is only one campaign. Continuing to run to"
                " gather more conversion data."
            )
    else:
        # Both conversions and clicks are noisy.
        print(
            "Inconclusive: No significant lift in Conversions"
            f" (p={conv_p_value:.2f}) or Clicks (p={click_p_value:.2f})."
            f" Current estimated lift: {conv_lift:.2f} +/- {conv_error:.2f}."
            " Continue running."
        )
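Advantages over campaign reporting
Direct experiment reporting has several advantages over querying campaign reports separately:
- Consolidated metrics: Retrieve control and treatment metrics in a single row.
- Statistical confidence data: Get precomputed p-values, point estimates, and margins of error.
- Efficiency: No need to manually join or compare results from multiple reports.
- Intra-campaign support: It is the only way to compare control and treatment arms for intra-campaign experiments, where traffic is split within a single campaign.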
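Campaign reporting
For experiments that create a separate treatment campaign (for example, SEARCH_CUSTOM), you can query the campaign resource and use campaign.experiment_type to identify the BASE (control) and EXPERIMENT (treatment) campaigns. This approach is useful if you need to segment metrics at a more granular level, such as ad group or keyword, or to view campaign metadata that isn't available on the experiment resource. However, you must compare performance and run the statistical calculations yourself.
You can't use campaign-level reporting to compare the arms of an intra-campaign experiment, because the traffic split happens inside a single campaign. Querying campaign for an intra-campaign experiment returns only aggregated totals.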
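Because campaign-level reporting leaves the comparison to you, here is a sketch of one possible manual calculation. It assumes you have already fetched clicks and impressions for the BASE and EXPERIMENT campaigns, for example with a query shaped like the illustrative CAMPAIGN_QUERY string, and it applies a pooled two-proportion z-test to click-through rate. That test is a common textbook choice used here for illustration; it is not necessarily the method behind the statistics on the experiment resource. The ArmTotals class, the function names, and the sample numbers are all hypothetical.

```python
import math
from dataclasses import dataclass

# Illustrative query shape only; in practice you would likely also filter to
# the specific campaigns that belong to your experiment.
CAMPAIGN_QUERY = """
SELECT
  campaign.id,
  campaign.name,
  campaign.experiment_type,
  metrics.clicks,
  metrics.impressions
FROM campaign
WHERE campaign.experiment_type IN ('BASE', 'EXPERIMENT')
"""


@dataclass
class ArmTotals:
    """Aggregated totals for one arm; assumes impressions > 0."""

    clicks: int
    impressions: int

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions


def relative_lift(control: ArmTotals, treatment: ArmTotals) -> float:
    """Relative CTR lift, computed as (treatment / control - 1)."""
    return treatment.ctr / control.ctr - 1


def two_proportion_p_value(control: ArmTotals, treatment: ArmTotals) -> float:
    """Two-sided p-value from a pooled two-proportion z-test on CTR.

    Assumes the pooled CTR is strictly between 0 and 1.
    """
    pooled = (control.clicks + treatment.clicks) / (
        control.impressions + treatment.impressions
    )
    std_error = math.sqrt(
        pooled * (1 - pooled) * (1 / control.impressions + 1 / treatment.impressions)
    )
    z = (treatment.ctr - control.ctr) / std_error
    # Two-sided tail probability of a standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up totals for illustration.
base = ArmTotals(clicks=1000, impressions=50000)
experiment = ArmTotals(clicks=1150, impressions=50000)
print(f"CTR lift: {relative_lift(base, experiment):+.1%}")
print(f"p-value: {two_proportion_p_value(base, experiment):.4f}")
```

This kind of calculation only makes sense when the two arms are separate campaigns; for intra-campaign experiments the campaign resource returns a single set of totals, so there is nothing to compare.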
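Best practices
- Select an appropriate confidence level: A lower confidence level (that is, a p-value threshold higher than 0.05) can give directional guidance faster, especially when budget or conversion volume is low. 95% confidence (p-value <= 0.05) is the academic standard and gives more accurate results over longer time frames.
- Run experiments long enough: Run an experiment for at least 4 weeks to account for weekly performance cycles, conversion delays, and learning periods.
- Allow a ramp-up period: If the campaign uses automated bidding or tests a new feature, disregard the first 1 to 2 weeks of data so that bidding models and traffic can adjust to the split.
- Use a 50/50 split: In general, a 50/50 traffic split reaches statistically significant results fastest.
- Plan ahead: Set the experiment start date 3 to 7 days out to leave enough time for ad review and approval.
- Each campaign can run only one experiment at a time.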