MMM Unified Schema

The Unified Schema is a layered data schema, defined using Protocol Buffers, designed to represent the core model and its outputs for any MMM framework (e.g. Google Meridian, Robyn by Meta). This schema decouples the model training concerns from practical applications that consume the model's output. It standardizes the representation of model inputs, model fit statistics, marketing analyses, and optimizations (e.g. budget, reach/frequency) in a model-agnostic way.

Unified Schema Overview

Key Features

A list of key features and benefits of the Unified Schema for Marketing Mix Models (MMM):

  • Framework-agnostic standardization: Establishes a common, unified structure for representing MMM inputs and outputs, regardless of the underlying framework (e.g. Google's Meridian, Meta's Robyn). This allows for consistent interpretation and comparison of results across different models.
  • Decoupling: Separates the concerns of the training framework and the model representation. Downstream systems (e.g. visualization tools) can interact with the standardized schema output without needing to depend on the internal implementation details of the modeling library.
  • Enables downstream tooling: Provides a contract for a variety of downstream applications. Examples include:
    • Report generation
    • Visualization dashboards (e.g. Meridian Scenario Planner)
    • Data processing pipelines
  • Interoperability and collaboration: Uses Protocol Buffers for the serialization format. This provides language-neutral, platform-neutral, and extensible data exchange, allowing teams using different programming languages (Python, Java, R, etc.) to work with model outputs.
  • Extensibility and scalability: The modular and layered structure of the Unified Schema allows new features, metrics, and analyses to be added as MMM methodologies evolve, requiring minimal changes to the core schema.
  • Persistence, sharing, and versioning: The protobuf-based schema allows trained models, inputs, and derived analyses to be serialized to disk, shared between teams, and version-controlled.
  • Community-driven evolution: An open-sourced Unified Schema invites contributions from the broader MMM community. This collaborative environment can accelerate the schema's development, incorporate diverse perspectives, and help keep it up to date with the latest advancements in the rapidly evolving field of MMM.

Overview

The Unified Schema is roughly divided into four parts: the model core itself and three extension groups (model fit, marketing analysis, and marketing optimization). The three extension groups are derived from the model core. Model extensions are optional data structures attached to the model kernel. While the kernel represents the state of a trained marketing mix model as data at rest, each model extension represents a derived analysis of the core.

// mmm.proto

// A schema that contains derived metrics and modeled analyses from a trained
// marketing mix model.
message Mmm {
  // An MMM kernel contains the core information about the model used to
  // generate this output.
  model.MmmKernel mmm_kernel = 1;

  // Model fit result.
  fit.ModelFit model_fit = 2;

  // A list of marketing analyses generated by the MMM kernel.
  marketing.analysis.MarketingAnalysisList marketing_analysis_list = 3;

  // Marketing optimizations from different perspectives using the MMM kernel.
  marketing.optimization.MarketingOptimization marketing_optimization = 4;
}

Model Core: mmm_kernel

The MmmKernel message is a general model wrapper. It includes:

  • marketing_data: Data used to train the model, make marketing analyses, and optimize budget allocation.
  • model: A field of Any type (google.protobuf.Any) that wraps a serialized form of the model implementation. Using the Any type allows the model implementation to be treated as a black box.

Marketing Data

The marketing information in MarketingDataPoint includes KPI data, media channel details, and other non-media factors.

A KPI can be either revenue or non-revenue type. For a non-revenue KPI, in addition to specifying the KPI value, the user can also specify the conversion rate from KPI to revenue, known as revenue_per_kpi.
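As a minimal sketch of how revenue_per_kpi is meant to be applied, the hypothetical helper below converts non-revenue KPI values into revenue by multiplying each KPI value by its conversion rate. It assumes revenue_per_kpi is either a single scalar or one value per data point; the function name and signature are illustrative only and are not part of the schema or Meridian.

```python
from typing import List, Optional, Sequence, Union


def kpi_to_revenue(
    kpi_values: Sequence[float],
    revenue_per_kpi: Union[float, Sequence[float], None],
) -> Optional[List[float]]:
    """Hypothetical helper: converts non-revenue KPI values into revenue."""
    if revenue_per_kpi is None:
        # A non-revenue KPI without a conversion rate has no revenue equivalent.
        return None
    if isinstance(revenue_per_kpi, (int, float)):
        # Assumption: a scalar rate applies to every data point.
        rates = [float(revenue_per_kpi)] * len(kpi_values)
    else:
        rates = list(revenue_per_kpi)
    if len(rates) != len(kpi_values):
        raise ValueError("revenue_per_kpi must be a scalar or match kpi_values in length")
    return [kpi * rate for kpi, rate in zip(kpi_values, rates)]


# 10 and 20 conversions at 2.5 revenue units per conversion.
print(kpi_to_revenue([10, 20], 2.5))  # [25.0, 50.0]
```

For a revenue KPI no conversion is needed, which in this sketch corresponds to a rate of 1.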

Media channel details include fields for geo info, time info, KPI, control variables, media variables, non-media variables, and reach-frequency variables.

If the media spend of a paid media channel's MediaVariable is not broken down by geo and time (i.e. media spend is aggregated across all geos and times), the spend should be recorded in a separate MarketingDataPoint message with media_spend, where geo_info is unset and date_interval spans the entire time dimension's coordinates.
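The following sketch illustrates the aggregated-spend rule with hypothetical Python dataclasses that mirror only the fields named above (geo_info, date_interval, media_spend). The class and function names are assumptions for illustration, not the generated protobuf classes, and the channel name and spend figure are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Sequence


@dataclass
class DateIntervalSketch:
    start_date: date
    end_date: date


@dataclass
class MarketingDataPointSketch:
    """Hypothetical stand-in for a MarketingDataPoint holding only spend."""
    date_interval: DateIntervalSketch
    geo_info: Optional[str] = None  # None models an unset geo_info (all geos).
    media_spend: Dict[str, float] = field(default_factory=dict)


def aggregated_spend_point(
    channel_spend: Dict[str, float], time_coordinates: Sequence[date]
) -> MarketingDataPointSketch:
    """Builds one data point for spend aggregated over all geos and times."""
    if not time_coordinates:
        raise ValueError("time_coordinates must not be empty")
    # The interval covers the first through the last time coordinate. Whether
    # the real schema expects an inclusive or exclusive end date is not
    # covered by this sketch.
    interval = DateIntervalSketch(min(time_coordinates), max(time_coordinates))
    return MarketingDataPointSketch(
        date_interval=interval, geo_info=None, media_spend=dict(channel_spend)
    )


weeks = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]
point = aggregated_spend_point({"tv": 50000.0}, weeks)
print(point.geo_info, point.date_interval)
```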

Non-media factors include variables that are not directly related to media but still impact the target response, such as promotions, seasonality factors, and macroeconomic factors. Which factors to include is a model-dependent choice. The optional marketing data metadata field contains time, geo, and KPI information that is helpful for serialization and deserialization.

Model Fit Extension: model_fit

The model fit extension shows predictions versus ground truth over time, along with performance metrics in the form of goodness-of-fit statistics (R-squared, MAPE, wMAPE, RMSE). Its structure captures generic concepts that should apply to all MMMs: it records model-agnostic prediction points, compares them with the actual observed values, and derives metrics from the aggregate.
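For reference, the sketch below computes the four goodness-of-fit statistics named above from paired actual and predicted values using their common textbook definitions. It is an illustration, not the schema's or Meridian's implementation; a given framework may compute these metrics per geo, per time period, or with different weighting.

```python
import math
from typing import Dict, Sequence


def goodness_of_fit(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Standard R-squared, MAPE, wMAPE, and RMSE.

    Assumes no actual value is zero (MAPE would be undefined) and that the
    actual values are not all equal (R-squared would be undefined).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(r) for r in residuals]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)  # Residual sum of squares.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # Total sum of squares.
    return {
        "r_squared": 1.0 - ss_res / ss_tot,
        "mape": sum(e / abs(a) for e, a in zip(abs_errors, actual)) / n,
        "wmape": sum(abs_errors) / sum(abs(a) for a in actual),
        "rmse": math.sqrt(ss_res / n),
    }


print(goodness_of_fit([100, 200, 300], [110, 190, 300]))
```

wMAPE weights each error by the size of the actual value, so it stays stable when a few periods have very small KPI values, whereas MAPE can be dominated by them.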

Marketing Analysis Extension: marketing_analysis_list

The marketing analysis extension contains insights on input data and how media spend and non-media factors impact outcomes.

Media Analysis and Non-Media Analysis

A MediaAnalysis provides insights into the impact of a media channel on KPIs (both revenue and non-revenue). For non-paid media, the response curve and the spend-related impact metrics aren't available because there is no spend information. A NonMediaAnalysis is similar to a MediaAnalysis, except that it is not associated with a spend value. As a result, it does not contain a response curve, and the fields derived from spend values won't be set in its Outcome.

Outcome

Outcome is the business result (e.g. total sales revenue or conversions). It represents the estimated impact of media channels or non-paid media factors through various metrics. It is the primary metric of interest, on which the causal effect of treatment variables is measured.

Outcome can be defined by one of the following:

  • KPI: a specific, quantifiable metric used to measure progress towards some outcome. It is the response variable of the model. There are two types of KPI: (a) one that takes the form of revenue (e.g. monetary value), and (b) one that takes the form of a generic, user-defined, non-revenue KPI (e.g. click-through rate).
  • ROI, mROI, Cost per contribution: Paid media channels can have either a revenue-based or a non-revenue-based KPI, and in both cases all of these proto fields (ROI, mROI, Cost per contribution) can be populated. Non-media channels don't have spend, so spend-related fields (ROI, mROI, Cost per contribution) shouldn't be populated.
  • Contribution and effectiveness: these can have slightly different meanings depending on the context. For a revenue KPI, media calculations use monetary revenue values, while for a non-revenue KPI they use the generic, user-defined KPI values.
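The sketch below shows how the spend-derived Outcome fields depend on whether spend is available. It uses a hypothetical OutcomeSketch dataclass rather than the real generated message, assumes the contribution is expressed in revenue units, and approximates mROI with an assumed 1% spend increase; real frameworks may define the marginal step differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OutcomeSketch:
    """Hypothetical stand-in for an Outcome; None means the field is unset."""
    contribution: float
    roi: Optional[float] = None
    mroi: Optional[float] = None
    cost_per_contribution: Optional[float] = None


def build_outcome(
    contribution: float,
    spend: Optional[float] = None,
    response: Optional[Callable[[float], float]] = None,
    marginal_step: float = 0.01,
) -> OutcomeSketch:
    """Fills spend-derived fields only when spend is known.

    Assumes `contribution` is in revenue units and that `response` maps a
    spend level to the channel's incremental outcome.
    """
    outcome = OutcomeSketch(contribution=contribution)
    if spend is None:
        # Non-media factors (and non-paid media) have no spend, so ROI, mROI,
        # and cost per contribution stay unset.
        return outcome
    if spend <= 0 or contribution == 0:
        raise ValueError("spend must be positive and contribution non-zero")
    outcome.roi = contribution / spend
    outcome.cost_per_contribution = spend / contribution
    if response is not None:
        # Marginal ROI: extra outcome per extra unit of spend for a small
        # (assumed 1%) increase over the current spend.
        delta = spend * marginal_step
        outcome.mroi = (response(spend + delta) - response(spend)) / delta
    return outcome


paid = build_outcome(contribution=500.0, spend=250.0, response=lambda s: 2.0 * s)
non_media = build_outcome(contribution=120.0)
print(paid.roi, paid.cost_per_contribution, non_media.roi)  # 2.0 0.5 None
```

Leaving the fields as None, rather than zero, keeps "not applicable" distinct from "no return", which matches the rule that spend-related fields simply shouldn't be populated for non-media channels.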

Marketing Optimization Extension Group: marketing_optimization

Marketing optimization can be approached from many different angles. This extension group includes all optimization-related extensions.

BudgetAllocation Extension

The structures defined here capture model-agnostic concepts for an optimal budget allocation across multiple marketing channels, given user inputs such as constraints and optimization objectives.

A BudgetOptimizationResult describes an optimization of marketing spend over all applicable channels for a specific time selection. It requires user input specifying an optimization goal and budget constraints on each channel. There are two types of optimization: fixed budget and flexible budget. A fixed budget optimization maximizes an objective metric by reallocating budget across channels without changing the total budget amount. A flexible budget optimization, on the other hand, maximizes an objective metric by reallocating budget across channels while allowing the total budget amount to change, subject to predefined constraints.

The optimized result is expressed per channel. Each channel result has the actual outcome result of the channel, the model-optimized outcome result of the channel, and the budget constraints on the channel used in the optimization process.

In addition to the optimization results, BudgetOptimization contains a list of IncrementalOutcomeGrids, which can differ in the type of incremental outcome or in grid granularity. Each IncrementalOutcomeGrid details the incremental outcome of marketing spend by channel. The grid is useful, for example, for plotting the trend of incremental impact over spend or for running an online budget optimization based on a grid search algorithm. Note that the grid is constructed under the assumption that there is no interaction effect across channels, that is, spend on one channel doesn't affect the outcome of other channels.
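To illustrate how an incremental outcome grid can drive a grid search optimization, the sketch below greedily assigns fixed-size spend increments to whichever channel offers the largest marginal incremental outcome. It relies on the no-interaction assumption stated above, and greedy allocation is only guaranteed to be optimal when each channel's outcome curve has diminishing returns. The grid layout (one incremental outcome per multiple of a uniform spend step), the function name, and the flexible-budget stopping rule (stop once the marginal return falls below a target) are assumptions for illustration, not the schema's definitions.

```python
from typing import Dict, List, Optional


def greedy_grid_allocation(
    grids: Dict[str, List[float]],
    step: float,
    total_budget: Optional[float] = None,
    min_marginal_return: Optional[float] = None,
) -> Dict[str, float]:
    """Hypothetical greedy grid search over per-channel incremental outcome grids.

    grids[channel][i] is assumed to be the channel's incremental outcome at
    spend i * step, starting from zero spend; the grid length acts as the
    channel's upper spend bound. Pass total_budget for a fixed-budget run, or
    min_marginal_return for an assumed flexible-budget stopping rule.
    """
    if total_budget is None and min_marginal_return is None:
        raise ValueError("set total_budget and/or min_marginal_return")
    position = {channel: 0 for channel in grids}
    spent = 0.0
    while True:
        # No cross-channel interaction: each channel's marginal gain can be
        # read from its own grid alone.
        best_channel, best_gain = None, None
        for channel, outcomes in grids.items():
            i = position[channel]
            if i + 1 >= len(outcomes):
                continue  # Channel already at its upper bound.
            gain = (outcomes[i + 1] - outcomes[i]) / step
            if best_gain is None or gain > best_gain:
                best_channel, best_gain = channel, gain
        if best_channel is None:
            break  # Every channel is at its upper bound.
        if total_budget is not None and spent + step > total_budget + 1e-9:
            break  # Fixed budget exhausted.
        if min_marginal_return is not None and best_gain < min_marginal_return:
            break  # Next increment no longer clears the target return.
        position[best_channel] += 1
        spent += step
    return {channel: i * step for channel, i in position.items()}


grids = {
    "search": [0.0, 10.0, 18.0, 24.0, 28.0],
    "social": [0.0, 8.0, 15.0, 21.0, 26.0],
}
print(greedy_grid_allocation(grids, step=1.0, total_budget=4.0))  # {'search': 2.0, 'social': 2.0}
print(greedy_grid_allocation(grids, step=1.0, min_marginal_return=5.5))  # {'search': 3.0, 'social': 3.0}
```

In the fixed-budget call the total spend is pinned at 4 units, while the flexible call keeps adding spend until no increment returns at least 5.5 per unit, ending at a larger total.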

OptimalFrequency Extension

Optimal average frequency is optional and requires reach and frequency data in the model in the first place. It indicates the optimal frequency of advertising impressions per unique user in a channel for maximizing an objective. The availability of this extension depends on the data and the MMM's capabilities.
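As a rough illustration only, the sketch below picks an optimal average frequency by evaluating an objective over a grid of candidate frequencies. The Hill-shaped response per reached user, its parameters, and the choice of outcome per impression as the objective are hypothetical assumptions, not the schema's or Meridian's method. For the response f^2 / (f^2 + k^2), outcome per impression f / (f^2 + k^2) peaks at f = k, which the grid search recovers.

```python
from typing import Callable, Sequence


def hill_response(frequency: float, half_saturation: float = 3.0, shape: float = 2.0) -> float:
    """Hypothetical S-shaped outcome per reached user at a given frequency."""
    return frequency**shape / (frequency**shape + half_saturation**shape)


def optimal_frequency(candidates: Sequence[float], objective: Callable[[float], float]) -> float:
    """Returns the candidate frequency that maximizes the objective."""
    return max(candidates, key=objective)


def outcome_per_impression(frequency: float) -> float:
    # Each reached user sees `frequency` impressions on average, so dividing
    # by frequency gives the (hypothetical) outcome per impression.
    return hill_response(frequency) / frequency


candidates = [1.0 + 0.5 * i for i in range(19)]  # 1.0, 1.5, ..., 10.0
print(optimal_frequency(candidates, outcome_per_impression))  # 3.0
```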

Installation

Install the Unified Schema package by cloning the Meridian GitHub repository and then including the [schema] extra when installing Meridian. This package contains the relevant proto definitions.

$ git clone https://github.com/google/meridian.git
$ cd meridian
$ pip install ".[schema]"

Translation Layer

Depending on the MMM framework and its model-specific algorithms, the translation layer (the processors that produce a corresponding Mmm proto message) will vary. As a starting point, see Google Meridian's processors for an example.
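The translation pattern can be sketched as a set of processors, each deriving one optional extension from the model kernel. The ProcessorSketch base class, the dictionary-based kernel, and the extension names below are hypothetical and do not reflect the actual interface of Meridian's processors.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class MmmSketch:
    """Hypothetical stand-in for the Mmm message: a kernel plus extensions."""
    kernel: Dict[str, Any]
    extensions: Dict[str, Any] = field(default_factory=dict)


class ProcessorSketch(ABC):
    """Hypothetical processor that derives one extension from the kernel."""

    name: str = "unnamed"

    @abstractmethod
    def process(self, kernel: Dict[str, Any]) -> Any:
        ...


def translate(kernel: Dict[str, Any], processors: Iterable[ProcessorSketch]) -> MmmSketch:
    mmm = MmmSketch(kernel=kernel)
    for processor in processors:
        # Extensions are optional: only the processors that run add one.
        mmm.extensions[processor.name] = processor.process(kernel)
    return mmm


class TotalKpiProcessor(ProcessorSketch):
    """Toy processor for illustration: sums the KPI values in the kernel."""

    name = "total_kpi"

    def process(self, kernel: Dict[str, Any]) -> Any:
        return sum(kernel["kpi"])


mmm = translate({"kpi": [120.0, 80.0]}, [TotalKpiProcessor()])
print(mmm.extensions)  # {'total_kpi': 200.0}
```

Keeping each processor independent mirrors the schema's layering: a consumer can run only the analyses it needs, and the kernel itself is never modified.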