After deployment, the Aggregation Service will expose two endpoints for ad tech usage: `createJob` and `getJob`. You can read more about the `createJob` and `getJob` endpoints in the Aggregation Service API documentation.
createJob
The `createJob` endpoint is called through an HTTP POST request and requires a request body. Once the `createJob` request is made, you will receive an HTTP 202 success response.

```
POST https://<api-gateway>/stage/v1alpha/createJob
```

An example of the request body for `createJob`:
```json
{
  "job_request_id": "<job_request_id>",
  "input_data_blob_prefix": "<report_folder>/<report_name>.avro",
  "input_data_bucket_name": "<input_bucket_name>",
  "output_data_blob_prefix": "<output_folder>/<summary_report_prefix>",
  "output_data_bucket_name": "<output_bucket_name>",
  "job_parameters": {
    "output_domain_blob_prefix": "<output_domain_folder>/<output_domain>.avro",
    "output_domain_bucket_name": "<output_domain_bucket_name>",
    "attribution_report_to": "<reporting origin of report>",
    "reporting_site": "<host name of reporting origin>"
  }
}
```
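As a rough illustration, here is a minimal Python sketch that submits a `createJob` request with the `requests` library. The gateway URL, bucket names, and reporting origin below are placeholder assumptions, and authentication is omitted: deployments typically sit behind an authenticated API gateway (for example, AWS API Gateway with IAM/SigV4 request signing), so substitute whatever credential mechanism your deployment uses.

```python
import uuid

import requests  # third-party: pip install requests

# Hypothetical values; replace with your deployment's API gateway URL and buckets.
API_BASE = "https://<api-gateway>/stage/v1alpha"

create_job_body = {
    "job_request_id": f"job-{uuid.uuid4()}",  # unique, ASCII, 128 chars or less
    "input_data_blob_prefix": "reports/report.avro",
    "input_data_bucket_name": "my-input-bucket",
    "output_data_blob_prefix": "summary/summary.avro",
    "output_data_bucket_name": "my-output-bucket",
    "job_parameters": {
        "output_domain_blob_prefix": "output_domain/output_domain.avro",
        "output_domain_bucket_name": "my-domain-bucket",
        # Use either attribution_report_to or reporting_site, not both.
        "attribution_report_to": "https://example-reporting-origin.example",
    },
}

# NOTE: most deployments require signed/authenticated requests; a plain POST
# is shown here only to illustrate the request shape.
resp = requests.post(f"{API_BASE}/createJob", json=create_job_body, timeout=30)
resp.raise_for_status()  # expect HTTP 202 on success
print(resp.status_code)
```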
Note that `reporting_site` and `attribution_report_to` are mutually exclusive, and only one of them is required.

You can also request a debug job by adding `debug_run` to the `job_parameters`. To understand debug mode, check out our aggregation debug run documentation.
```json
{
  "job_request_id": "<job_request_id>",
  "input_data_blob_prefix": "<report_folder>/<report_name>.avro",
  "input_data_bucket_name": "<input_bucket_name>",
  "output_data_blob_prefix": "<output_folder>/<summary_report_prefix>",
  "output_data_bucket_name": "<output_bucket_name>",
  "job_parameters": {
    "output_domain_blob_prefix": "<output_domain_folder>/<output_domain>.avro",
    "output_domain_bucket_name": "<output_domain_bucket_name>",
    "attribution_report_to": "<reporting origin of report>",
    "debug_run": "true"
  }
}
```
Request Fields
| Parameter | Type | Description |
|---|---|---|
| `job_request_id` | String | An ad tech generated unique identifier of ASCII characters, 128 characters or less. This identifies the batch job request, which takes all the aggregatable AVRO reports specified in `input_data_blob_prefix` from the input bucket specified in `input_data_bucket_name`, hosted on the ad tech's cloud storage. Allowed characters: ``a-z, A-Z, 0-9, !"#$%&'()*+,-./:;<=>?@[\]^_`{}~`` (see the validation sketch after this table). |
| `input_data_blob_prefix` | String | The path in the bucket. For a single file, use the full path. For multiple files, use a shared prefix of the path. Example: `folder/file` collects all reports matching the prefix, such as `folder/file1.avro`, `folder/file/file1.avro`, and `folder/file1/test/file2.avro`. |
| `input_data_bucket_name` | String | The storage bucket for the input data, that is, the aggregatable reports. This is on the ad tech's cloud storage. |
| `output_data_blob_prefix` | String | The output path in the bucket. A single output file is supported. |
| `output_data_bucket_name` | String | The storage bucket where the output data will be sent. This is on the ad tech's cloud storage. |
| `job_parameters` | Dictionary | Required field. Contains the fields listed below. |
| `job_parameters.output_domain_blob_prefix` | String | Similar to `input_data_blob_prefix`, this is the path in `output_domain_bucket_name` where your output domain AVRO is located. For multiple files, use a shared prefix of the path. Once the Aggregation Service completes the batch, the summary report is created and placed in the output bucket `output_data_bucket_name` with the `output_data_blob_prefix` name. |
| `job_parameters.output_domain_bucket_name` | String | The storage bucket for your output domain AVRO file. This is on the ad tech's cloud storage. |
| `job_parameters.attribution_report_to` | String | Mutually exclusive with `reporting_site`. The reporting URL or reporting origin where the report was received. The origin is part of the site registered in the Aggregation Service onboarding. |
| `job_parameters.reporting_site` | String | Mutually exclusive with `attribution_report_to`. The hostname of the reporting URL or reporting origin where the report was received. The origin is part of the site registered in the Aggregation Service onboarding. Note: You may submit reports with multiple reporting origins in the same request, as long as all reporting origins belong to the reporting site given in this parameter. |
| `job_parameters.debug_privacy_epsilon` | Floating point, Double | Optional field. Defaults to 10 if not set. Values from 0 to 64 can be used, and the value can be varied per job. |
| `job_parameters.report_error_threshold_percentage` | Double | Optional field. The threshold percentage of reports that can fail before the job fails. If left empty, the default is 10%. |
| `job_parameters.input_report_count` | Long | Optional field. The total number of reports provided as input data for this job. Together with `report_error_threshold_percentage`, this enables early failure of the job when reports are excluded due to errors. |
| `job_parameters.filtering_ids` | String | Optional field. A comma-separated list of unsigned filtering IDs. All contributions other than those with matching filtering IDs are filtered out. Example: `"filtering_ids": "12345,34455,12"`. The default value is `"0"`. Read more about filtering IDs. |
| `job_parameters.debug_run` | Boolean | Optional field. When executing a debug run, noised and un-noised debug summary reports are produced, and annotations are added to indicate which keys are present in the domain input and/or reports. Additionally, duplicates across batches are not enforced. Note that a debug run only considers reports that have the flag `"debug_mode": "enabled"`, and debug runs consume privacy budget. |
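Because an invalid `job_request_id` will be rejected, it can be worth validating it client-side before submitting. A minimal sketch; the regular expression below is our own rendering of the allowed character list above, not an official validator:

```python
import re

# Allowed: a-z, A-Z, 0-9, and the punctuation listed above (no spaces).
_JOB_ID_RE = re.compile(r"""^[a-zA-Z0-9!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{}~]{1,128}$""")

def is_valid_job_request_id(job_request_id: str) -> bool:
    """Return True if the ID uses only allowed characters and is 128 chars or less."""
    return bool(_JOB_ID_RE.fullmatch(job_request_id))
```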
getJob
When an ad tech wants to know the status of a requested batch, they can call the `getJob` endpoint. The `getJob` endpoint is called with an HTTP GET request, with the `job_request_id` parameter passed in the query string.

```
GET https://<api-gateway>/stage/v1alpha/getJob?job_request_id=<job_request_id>
```

You should get a response like the following, which returns the status of the job along with any error messages:
```json
{
  "job_status": "FINISHED",
  "request_received_at": "2023-07-17T19:15:13.926530Z",
  "request_updated_at": "2023-07-17T19:15:28.614942839Z",
  "job_request_id": "PSD_0003",
  "input_data_blob_prefix": "reports/output_reports_2023-07-17T19:11:27.315Z.avro",
  "input_data_bucket_name": "ags-report-bucket",
  "output_data_blob_prefix": "summary/summary.avro",
  "output_data_bucket_name": "ags-report-bucket",
  "postback_URL": "",
  "result_info": {
    "return_code": "SUCCESS",
    "return_message": "Aggregation job successfully processed",
    "error_summary": {
      "error_counts": [],
      "error_messages": []
    },
    "finished_at": "2023-07-17T19:15:28.607802354Z"
  },
  "job_parameters": {
    "debug_run": "true",
    "output_domain_bucket_name": "ags-report-bucket",
    "output_domain_blob_prefix": "output_domain/output_domain.avro",
    "attribution_report_to": "https://privacy-sandcastle-dev-dsp.web.app"
  },
  "request_processing_started_at": "2023-07-17T19:15:21.583759622Z"
}
```
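Since jobs are processed asynchronously, a common pattern is to poll `getJob` until `job_status` reaches `FINISHED` and then inspect `result_info`. A minimal Python sketch, under the same assumptions as the `createJob` example above (hypothetical gateway URL, authentication omitted):

```python
import time

import requests  # third-party: pip install requests

API_BASE = "https://<api-gateway>/stage/v1alpha"  # hypothetical

def wait_for_job(job_request_id: str, poll_seconds: int = 30) -> dict:
    """Poll getJob until the job leaves the queue and finishes processing."""
    while True:
        resp = requests.get(
            f"{API_BASE}/getJob",
            params={"job_request_id": job_request_id},
            timeout=30,
        )
        resp.raise_for_status()
        job = resp.json()
        if job["job_status"] == "FINISHED":
            return job
        time.sleep(poll_seconds)

job = wait_for_job("PSD_0003")
print(job["result_info"]["return_code"])    # e.g. "SUCCESS"
print(job["result_info"]["return_message"])
```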
Response Fields
| Parameter | Type | Description |
|---|---|---|
| `job_request_id` | String | The unique job/batch ID that was specified in the `createJob` request. |
| `job_status` | String | The status of the job request. |
| `request_received_at` | String | The time the request was received. |
| `request_updated_at` | String | The time the job was last updated. |
| `input_data_blob_prefix` | String | The input data prefix that was set in `createJob`. |
| `input_data_bucket_name` | String | The ad tech's input data bucket where the aggregatable reports are stored. This field is set in `createJob`. |
| `output_data_blob_prefix` | String | The output data prefix that was set in `createJob`. |
| `output_data_bucket_name` | String | The ad tech's output data bucket where the summary reports will be stored once generated. This field is set in `createJob`. |
| `request_processing_started_at` | String | The time when the latest processing attempt started. This excludes the time spent waiting in the job queue. (Total processing time = `request_updated_at` - `request_processing_started_at`; see the sketch after this table.) |
| `result_info` | Dictionary | The result of the `createJob` request and all the information available on the job. It contains `return_code`, `return_message`, `finished_at`, and `error_summary`. |
| `result_info.return_code` | String | The return code of the result of the job. This information is needed when an issue happens in the Aggregation Service, to understand what the issue might be. |
| `result_info.return_message` | String | The message (success or failure) returned as a result of the job. This information is also needed when investigating Aggregation Service failures. |
| `result_info.error_summary` | Dictionary | The errors returned for the job. This contains how many reports failed, and with what type of error. |
| `result_info.finished_at` | Timestamp | The timestamp of when the job finished. |
| `result_info.error_summary.error_counts` | List | A list of the error messages and the number of reports that failed with the same error message. Each error count contains `category`, `error_count`, and `description`. |
| `result_info.error_summary.error_messages` | List | A list of the error messages of reports that failed to be processed. |
| `job_parameters` | Dictionary | The job parameters provided in the `createJob` request, including relevant properties such as `output_domain_blob_prefix` and `output_domain_bucket_name`. |
| `job_parameters.attribution_report_to` | String | Mutually exclusive with `reporting_site`. The reporting URL or reporting origin where the report was received. The origin is part of the site registered in the Aggregation Service onboarding. This is specified in the `createJob` request. |
| `job_parameters.reporting_site` | String | Mutually exclusive with `attribution_report_to`. The hostname of the reporting URL or reporting origin where the report was received. The origin is part of the site registered in the Aggregation Service onboarding. You may submit reports with multiple reporting origins in the same request, as long as all reporting origins belong to the reporting site given in this parameter. This is specified in the `createJob` request. Additionally, ensure that at job creation time the input data bucket contains only the reports that you want aggregated: any report in that bucket whose reporting origin matches the reporting site specified in the job parameters will be processed. For example, if an ad tech has registered the reporting site https://exampleabc.com and the input data bucket contains reports from https://1.exampleabc.com, https://2.exampleabc.com, and https://3.examplexyz.com, only the reports whose reporting origins belong to the registered site are aggregated, namely those from https://1.exampleabc.com and https://2.exampleabc.com. |
| `job_parameters.debug_privacy_epsilon` | Floating point, Double | Optional field. Defaults to 10 if not set. Values from 0 to 64 can be used, and the value can be varied per job. This is specified in the `createJob` request. |
| `job_parameters.report_error_threshold_percentage` | Double | Optional field. The threshold percentage of reports that can fail before the job fails. If left empty, the default is 10%. This is specified in the `createJob` request. |
| `job_parameters.input_report_count` | Long | Optional field. The total number of reports provided as input data for this job. Together with `report_error_threshold_percentage`, this enables early failure of the job when reports are excluded due to errors. This is specified in the `createJob` request. |
| `job_parameters.filtering_ids` | String | Optional field. A comma-separated list of unsigned filtering IDs. All contributions other than those with matching filtering IDs are filtered out. Example: `"filtering_ids": "12345,34455,12"`. The default value is `"0"`. This is specified in the `createJob` request. Read more about filtering IDs. |
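As the `request_processing_started_at` row notes, total processing time can be derived from the response timestamps. A small sketch of that arithmetic; the truncation step is only needed because the service returns nanosecond-precision timestamps, while Python datetimes carry at most microseconds:

```python
import re
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, truncating digits beyond microseconds."""
    ts = re.sub(r"(\.\d{6})\d+", r"\1", ts).replace("Z", "+00:00")
    return datetime.fromisoformat(ts)

# Timestamps taken from the example getJob response above.
started = parse_ts("2023-07-17T19:15:21.583759622Z")
updated = parse_ts("2023-07-17T19:15:28.614942839Z")
print(updated - started)  # total processing time, excluding queue wait
```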