To keep data private and secure, Aggregation Service uses a framework that supports differential privacy (DP). Its tools and mechanisms are designed to quantify and limit the amount of information revealed about an individual user. Let us discuss a few of these privacy protections.
Added noise to summary reports
When an ad tech batches aggregatable reports, Aggregation Service creates a summary report. The summary report aggregates all contributions to each of the predefined domain keys and adds statistical noise to the result.
Noise added to the reports does not depend on the number of reports aggregated, individual report values or aggregated report values.
Noise is drawn from a discrete version of the Laplace distribution. Its scale is determined by the contribution budget (the L1 sensitivity), which the client enforces according to the corresponding measurement API, and by the privacy parameter epsilon. Read more about noise.
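To make the noise description concrete, here is a minimal Python sketch of one common way to sample discrete Laplace noise: taking the difference of two geometric draws. It is not the service's actual implementation. The function names and the example values for the budget and epsilon are assumptions chosen for illustration.

```python
import math
import random


def sample_discrete_laplace(scale: float, rng: random.Random) -> int:
    """Draw an integer k with probability proportional to exp(-|k| / scale).

    Uses the fact that the difference of two i.i.d. geometric variables
    follows a discrete Laplace distribution.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")

    def geometric() -> int:
        # Inverse-CDF sampling; 1 - random() lies in (0, 1], so log() is safe.
        u = 1.0 - rng.random()
        return math.floor(-scale * math.log(u))

    return geometric() - geometric()


def add_noise(aggregate: int, l1_sensitivity: int, epsilon: float,
              rng: random.Random) -> int:
    """Add noise whose scale depends only on the budget and epsilon."""
    return aggregate + sample_discrete_laplace(l1_sensitivity / epsilon, rng)


# Illustrative values only (assumptions, not taken from the text above).
rng = random.Random(2024)
noisy_total = add_noise(aggregate=1200, l1_sensitivity=65536, epsilon=10.0, rng=rng)
print(noisy_total)
```

Note that the scale passed to the sampler is computed from the sensitivity and epsilon alone, so the noise is the same regardless of how many reports were aggregated or what their values were.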
Contribution bounding
Depending on the measurement client API used (Attribution Reporting API or Private Aggregation API), the contributions passed in a call are subject to a specific contribution bounding limit, which controls the sensitivity of the summary report.
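As a rough illustration of what bounding means, the following Python sketch caps the total value of a list of contributions at an L1 budget. The function name, the budget value and the policy of dropping contributions once the budget is exhausted are all assumptions for this example; the actual limits and behavior are defined by each client API.

```python
def bound_contributions(contributions, l1_budget):
    """Keep contributions in order until adding one would exceed the budget."""
    kept = []
    used = 0
    for bucket, value in contributions:
        if value < 0:
            raise ValueError("contribution values must be non-negative")
        if used + value > l1_budget:
            break  # budget exhausted; this and later contributions are dropped
        kept.append((bucket, value))
        used += value
    return kept


# Hypothetical buckets, values and budget.
bounded = bound_contributions([(1, 40000), (2, 20000), (3, 10000)], l1_budget=65536)
print(bounded)
```

Because the sum of the kept values never exceeds the budget, a single source of contributions can change any aggregate by at most that amount, which is what lets the noise be calibrated to it.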
To learn more about the contribution budget for each API, see the contribution budget section of that API's documentation.
"No duplicates" rule
The rule states that an aggregatable report, uniquely identified by report_id, can only appear once in a single batch. Should an aggregatable report appear more than once per batch, the first report will be included in the aggregation and the subsequent reports with the same report_id will be discarded. The batch will complete successfully.
The rule also states that the same report cannot appear in more than one batch. If a report has already been included in a previous successful batch, the same report cannot be in a later batch. The later batch will end with a failure.
Without these rules, attackers could gain insight into the contents of a specific batch by manipulating the batches, for example by including duplicate copies of a report in a single batch or across multiple batches.
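The within-batch part of the rule can be sketched in a few lines of Python. This is only an illustration of the "first report wins" behavior described above, not the service's code. It assumes each report is a dictionary whose shared_info has already been parsed, and the payload field is a hypothetical placeholder.

```python
from typing import Iterable


def drop_duplicate_reports(reports: Iterable[dict]) -> list[dict]:
    """Keep the first report for each report_id and discard later copies."""
    seen_ids: set[str] = set()
    kept: list[dict] = []
    for report in reports:
        report_id = report["shared_info"]["report_id"]
        if report_id in seen_ids:
            continue  # a later copy of an already-seen report
        seen_ids.add(report_id)
        kept.append(report)
    return kept


# Two copies of the same report; "payload" is a hypothetical field.
batch = [
    {"shared_info": {"report_id": "5b052748-f5fb-4f14-b291-de03484ed59e"}, "payload": "first"},
    {"shared_info": {"report_id": "5b052748-f5fb-4f14-b291-de03484ed59e"}, "payload": "copy"},
]
deduplicated = drop_duplicate_reports(batch)
```

An ad tech could run a check like this before submitting a batch, although the service applies the rule regardless.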
Another concept that Aggregation Service introduces is that of disjoint batches. This means that reports with the same shared ID must not be spread across two or more batches.
The shared ID is a combination of data collected from the shared_info field of an aggregatable report. The following sample shared_info field contains the API, version, attribution_destination (for Attribution Reporting), reporting_origin, scheduled_report_time and source_registration_time (for Attribution Reporting). All of these fields contribute to the shared ID. The report_id, which is also part of shared_info, does not.
"shared_info": {
"API": "attribution-reporting",
"attribution_destination": "https://privacy-sandbox-demos-shop.dev",
"report_id": "5b052748-f5fb-4f14-b291-de03484ed59e",
"reporting_origin": "https://privacy-sandbox-demos-dsp.dev",
"scheduled_report_time": "1707786751",
"source_registration_time": "0",
"version": "0.1"
}
Since source_registration_time is truncated to the day and scheduled_report_time is truncated to the hour, multiple reports can share the same shared ID.
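The truncation described above can be sketched in Python as follows. The tuple returned here is only a stand-in for whatever identifier the service derives internally, and the function name is an assumption. The field names follow the sample shared_info shown earlier.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def shared_id_key(shared_info: dict) -> tuple:
    """Return the fields that identify a shared ID, with timestamps truncated.

    report_id is deliberately left out, because it does not contribute.
    """
    scheduled = int(shared_info["scheduled_report_time"])
    registered = shared_info.get("source_registration_time")
    if registered is not None:
        registered = int(registered) - int(registered) % SECONDS_PER_DAY
    return (
        shared_info["API"],
        shared_info["version"],
        shared_info.get("attribution_destination"),
        shared_info["reporting_origin"],
        scheduled - scheduled % SECONDS_PER_HOUR,  # truncated to the hour
        registered,  # truncated to the day, if present
    )


sample = {
    "API": "attribution-reporting",
    "attribution_destination": "https://privacy-sandbox-demos-shop.dev",
    "report_id": "5b052748-f5fb-4f14-b291-de03484ed59e",
    "reporting_origin": "https://privacy-sandbox-demos-dsp.dev",
    "scheduled_report_time": "1707786751",
    "source_registration_time": "0",
    "version": "0.1",
}
key = shared_id_key(sample)
```

Any other report from the same API, version, destination and reporting origin that is scheduled within the same hour would produce an identical key.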
Take a look at how two reports can share the same shared ID. We have the following example of Shared Info fields from Report1 and Report2.
Both reports have the same API, version, attribution_destination, reporting_origin and source_registration_time. Since report_id is not part of the shared ID, we can ignore this difference. The only other difference is the scheduled_report_time. Looking at this further, the scheduled_report_time for Report1 is February 19, 2024 9:08:10 PM and for Report2 is February 19, 2024 9:55:10 PM (both UTC). Because scheduled_report_time is truncated to the hour, both reports have February 19, 2024 9 PM as the scheduled_report_time. And because all contributing fields are then the same, we can confirm that both reports have the same shared ID.
Observe the scheduled_report_time.
Report1 Shared Info | Report2 Shared Info |
---|---|
"shared_info": { | "shared_info": { |
"API": "attribution-reporting", | "API": "attribution-reporting", |
"attribution_destination": "https://shop.dev", | "attribution_destination": "https://shop.dev", |
"report_id": "5b052748-...", | "report_id": "1a1b25aa-...", |
"reporting_origin": "https://dsp.dev", | "reporting_origin": "https://dsp.dev", |
"scheduled_report_time": "1708376890", | "scheduled_report_time": "1708379710", |
"source_registration_time": "0", | "source_registration_time": "0", |
"version": "0.1" | "version": "0.1" |
} | } |
Because both reports have the same shared ID, they must be included in the same batch.
Should Report1 be included in a previous successful batch and Report2 be processed in a subsequent batch, the batch with Report2 will fail with a PRIVACY_BUDGET_EXHAUSTED error. If this happens, remove the reports whose shared ID has already been successfully processed in a prior batch and try again.
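The retry step can be sketched in Python like this. It assumes the ad tech keeps its own record of the shared IDs it has already processed successfully. The helper names, the tuple used as a key and Report3 are hypothetical; Report1 and Report2 use the values from the table above.

```python
def shared_id_key(info: dict) -> tuple:
    """Stand-in for the shared ID: all contributing fields, hour-truncated time."""
    hour = int(info["scheduled_report_time"]) // 3600 * 3600
    return (info["API"], info["version"], info.get("attribution_destination"),
            info["reporting_origin"], hour, info.get("source_registration_time"))


def remove_already_processed(reports: list, processed_keys: set) -> list:
    """Drop reports whose shared ID was part of an earlier successful batch."""
    return [r for r in reports
            if shared_id_key(r["shared_info"]) not in processed_keys]


def make_report(report_id: str, scheduled: str) -> dict:
    return {"shared_info": {
        "API": "attribution-reporting",
        "attribution_destination": "https://shop.dev",
        "report_id": report_id,
        "reporting_origin": "https://dsp.dev",
        "scheduled_report_time": scheduled,
        "source_registration_time": "0",
        "version": "0.1",
    }}


report1 = make_report("5b052748-...", "1708376890")
report2 = make_report("1a1b25aa-...", "1708379710")
report3 = make_report("hypothetical-report-3", "1708380000")  # one hour later

processed = {shared_id_key(report1["shared_info"])}  # Report1 already succeeded
retry_batch = remove_already_processed([report2, report3], processed)
```

Here Report2 is removed from the retry because it shares a shared ID with Report1, while the hypothetical Report3 falls in the next hour and stays in the batch.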
To learn more about batching, visit the batching strategy guide.
Pre-declaring aggregation keys
When submitting a batch to Aggregation Service, you must include both the aggregatable reports received from the reporting origin and the output domain file. The output domain contains the keys, or buckets, whose values will be retrieved from the aggregatable reports.
From a privacy standpoint, noise is added to all keys pre-declared in the output domain, even when no real report matches a particular key. Specifying the output domain protects against an attack where the presence of a key in the output reveals something about a single user or event. For example, if you showed a campaign to only one user, receiving a key in the output (even with noise) reveals that this user later converted. Because the domain is specified beforehand, the set of keys in the output does not reveal anything about user contributions.
Keys, also called buckets, are 128-bit values declared by the ad tech in either the Attribution Reporting API or the Private Aggregation API. Ad techs can use these keys to encode the dimensions that they want to track.
Only pre-declared keys will be considered for aggregation and included in the summary report. Statistical noise is added to the aggregated value of each bucket before it is written to the summary report.
Essentially, if an aggregation key is included in the output domain file but does not appear in any report in the batch, its true aggregated value is zero, yet the value in the final summary report will likely be non-zero because of the noise added to preserve privacy.
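The effect of pre-declared keys can be sketched in Python as follows. This is a simplified model of the behavior described above, not the service's implementation. The function names, bucket keys, values and noise scale are assumptions, and the noise sampler is one possible way to draw discrete Laplace noise.

```python
import math
import random
from collections import defaultdict


def discrete_laplace(scale: float, rng: random.Random) -> int:
    """Difference of two geometric draws, one way to get discrete Laplace noise."""
    def geometric() -> int:
        return math.floor(-scale * math.log(1.0 - rng.random()))
    return geometric() - geometric()


def build_summary(contributions, output_domain, scale, rng):
    """Sum values per bucket, then report a noisy value for every domain key."""
    totals = defaultdict(int)
    for bucket, value in contributions:
        totals[bucket] += value
    # Only pre-declared keys are reported; every one of them receives noise,
    # including keys that no report contributed to.
    return {
        bucket: totals.get(bucket, 0) + discrete_laplace(scale, rng)
        for bucket in output_domain
    }


# Hypothetical 128-bit bucket keys and values.
CAMPAIGN_A = 0x0000000000000000000000000000000A
CAMPAIGN_B = 0x0000000000000000000000000000000B
UNDECLARED = 0x0000000000000000000000000000000C

contributions = [(CAMPAIGN_A, 100), (CAMPAIGN_A, 250), (UNDECLARED, 75)]
summary = build_summary(contributions, [CAMPAIGN_A, CAMPAIGN_B],
                        scale=50.0, rng=random.Random(1))
print(summary)
```

In this sketch CAMPAIGN_B appears in the summary with a noisy value even though no report contributed to it, and UNDECLARED is left out even though a report did, which mirrors the two properties described above.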
Note that a feature called key discovery is being considered at the time of this writing. Key discovery will allow the ad tech to process aggregatable reports without the requirement of pre-declared keys, but to preserve privacy in the previously stated scenario, an additional thresholding step will be performed.