Differentially private aggregate functions

GoogleSQL for BigQuery supports differentially private aggregate functions.For an explanation of how aggregate functions work, seeAggregate function calls.

You can only use differentially private aggregate functions withdifferentially private queries in adifferential privacy clause.

Note: In this topic, the privacy parameters in the examples aren'trecommendations. You should work with your privacy or security officer todetermine the optimal privacy parameters for your dataset and organization.

Function list

NameSummary
AVG (Differential Privacy)DIFFERENTIAL_PRIVACY-supportedAVG.

Gets the differentially-private average of non-NULL, non-NaN values in a query with aDIFFERENTIAL_PRIVACY clause.
COUNT (Differential Privacy)DIFFERENTIAL_PRIVACY-supportedCOUNT.

Signature 1: Gets the differentially-private count of rows in a query with aDIFFERENTIAL_PRIVACY clause.

Signature 2: Gets the differentially-private count of rows with a non-NULL expression in a query with aDIFFERENTIAL_PRIVACY clause.
PERCENTILE_CONT (Differential Privacy)DIFFERENTIAL_PRIVACY-supportedPERCENTILE_CONT.

Computes a differentially-private percentile across privacy unit columns in a query with aDIFFERENTIAL_PRIVACY clause.
SUM (Differential Privacy)DIFFERENTIAL_PRIVACY-supportedSUM.

Gets the differentially-private sum of non-NULL, non-NaN values in a query with aDIFFERENTIAL_PRIVACY clause.

AVG (DIFFERENTIAL_PRIVACY)

WITHDIFFERENTIAL_PRIVACY...AVG(expression,[contribution_bounds_per_group=>(lower_bound,upper_bound)])

Description

Returns the average of non-NULL, non-NaN values in the expression.This function first computes the average per privacy unit column, and thencomputes the final result by averaging these averages.

This function must be used with theDIFFERENTIAL_PRIVACY clauseand can support the following arguments:

  • expression: The input expression. This can be any numeric input type,such asINT64.
  • contribution_bounds_per_group: A named argument with acontribution bound.Performs clamping for each group separately before performing intermediategrouping on the privacy unit column.

Return type

FLOAT64

Examples

The following differentially private query gets the average number of each itemrequested per professor. Smaller aggregations might not be included. This queryreferences a table calledprofessors.

-- With noise, using the epsilon parameter.SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=10,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,AVG(quantity,contribution_bounds_per_group=>(0,100))average_quantityFROMprofessorsGROUPBYitem;-- These results will change each time you run the query.-- Smaller aggregations might be removed./*----------+------------------+ | item     | average_quantity | +----------+------------------+ | pencil   | 38.5038356810269 | | pen      | 13.4725028762032 | +----------+------------------*/
-- Without noise, using the epsilon parameter.-- (this un-noised version is for demonstration only)SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=1e20,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,AVG(quantity)average_quantityFROMprofessorsGROUPBYitem;-- These results will not change when you run the query./*----------+------------------+ | item     | average_quantity | +----------+------------------+ | scissors | 8                | | pencil   | 40               | | pen      | 18.5             | +----------+------------------*/
Note: For more information about when and when not to usenoise, seeRemove noise.

COUNT (DIFFERENTIAL_PRIVACY)

  • Signature 1: Returns the number of rows in adifferentially privateFROM clause.
  • Signature 2: Returns the number of non-NULLvalues in an expression.

Signature 1

WITHDIFFERENTIAL_PRIVACY...COUNT(*,[contribution_bounds_per_group=>(lower_bound,upper_bound)])

Description

Returns the number of rows in thedifferentially privateFROM clause. The final resultis an aggregation across a privacy unit column.

This function must be used with theDIFFERENTIAL_PRIVACY clauseand can support the following arguments:

  • contribution_bounds_per_group: A named argument with acontribution bound.Performs clamping for each group separately before performing intermediategrouping on the privacy unit column.

Return type

INT64

Examples

The following differentially private query counts the number of requests foreach item. This query references a table calledprofessors.

-- With noise, using the epsilon parameter.SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=10,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,COUNT(*,contribution_bounds_per_group=>(0,100))times_requestedFROMprofessorsGROUPBYitem;-- These results will change each time you run the query.-- Smaller aggregations might be removed./*----------+-----------------+ | item     | times_requested | +----------+-----------------+ | pencil   | 5               | | pen      | 2               | +----------+-----------------*/
-- Without noise, using the epsilon parameter.-- (this un-noised version is for demonstration only)SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=1e20,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,COUNT(*,contribution_bounds_per_group=>(0,100))times_requestedFROMprofessorsGROUPBYitem;-- These results will not change when you run the query./*----------+-----------------+ | item     | times_requested | +----------+-----------------+ | scissors | 1               | | pencil   | 4               | | pen      | 3               | +----------+-----------------*/
Note: For more information about when and when not to usenoise, seeRemove noise.

Signature 2

WITHDIFFERENTIAL_PRIVACY...COUNT(expression,[contribution_bounds_per_group=>(lower_bound,upper_bound)])

Description

Returns the number of non-NULL expression values. The final result is anaggregation across a privacy unit column.

This function must be used with theDIFFERENTIAL_PRIVACY clauseand can support these arguments:

  • expression: The input expression. This expression can be anynumeric input type, such asINT64.
  • contribution_bounds_per_group: A named argument with acontribution bound.Performs clamping per each group separately before performing intermediategrouping on the privacy unit column.

Return type

INT64

Examples

The following differentially private query counts the number of requests madefor each type of item. This query references a table calledprofessors.

-- With noise, using the epsilon parameter.SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=10,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,COUNT(item,contribution_bounds_per_group=>(0,100))times_requestedFROMprofessorsGROUPBYitem;-- These results will change each time you run the query.-- Smaller aggregations might be removed./*----------+-----------------+ | item     | times_requested | +----------+-----------------+ | pencil   | 5               | | pen      | 2               | +----------+-----------------*/
-- Without noise, using the epsilon parameter.-- (this un-noised version is for demonstration only)SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=1e20,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,COUNT(item,contribution_bounds_per_group=>(0,100))times_requestedFROMprofessorsGROUPBYitem;-- These results will not change when you run the query./*----------+-----------------+ | item     | times_requested | +----------+-----------------+ | scissors | 1               | | pencil   | 4               | | pen      | 3               | +----------+-----------------*/
Note: For more information about when and when not to usenoise, seeRemove noise.

PERCENTILE_CONT (DIFFERENTIAL_PRIVACY)

WITHDIFFERENTIAL_PRIVACY...PERCENTILE_CONT(expression,percentile,contribution_bounds_per_row=>(lower_bound,upper_bound))

Description

Takes an expression and computes a percentile for it. The final result is anaggregation across privacy unit columns.

This function must be used with theDIFFERENTIAL_PRIVACY clauseand can support these arguments:

  • expression: The input expression. This can be most numeric input types,such asINT64.NULL values are always ignored.
  • percentile: The percentile to compute. The percentile must be a literal inthe range[0, 1].
  • contribution_bounds_per_row: A named argument with acontribution bounds.Performs clamping for each row separately before performing intermediategrouping on the privacy unit column.

NUMERIC andBIGNUMERIC arguments aren't allowed. If you need them, cast them as theFLOAT64 data type first.

Return type

FLOAT64

Examples

The following differentially private query gets the percentile of itemsrequested. Smaller aggregations might not be included. This query references aview calledprofessors.

-- With noise, using the epsilon parameter.SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=10,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,PERCENTILE_CONT(quantity,0.5,contribution_bounds_per_row=>(0,100))percentile_requestedFROMprofessorsGROUPBYitem;-- These results will change each time you run the query.-- Smaller aggregations might be removed./*----------+----------------------+  | item     | percentile_requested |  +----------+----------------------+  | pencil   | 72.00011444091797    |  | scissors | 8.000175476074219    |  | pen      | 23.001075744628906   |  +----------+----------------------*/

SUM (DIFFERENTIAL_PRIVACY)

WITHDIFFERENTIAL_PRIVACY...SUM(expression,[contribution_bounds_per_group=>(lower_bound,upper_bound)])

Description

Returns the sum of non-NULL, non-NaN values in the expression. The finalresult is an aggregation across privacy unit columns.

This function must be used with theDIFFERENTIAL_PRIVACY clauseand can support these arguments:

  • expression: The input expression. This can be any numeric input type,such asINT64.NULL values are always ignored.
  • contribution_bounds_per_group: A named argument with acontribution bound. Performs clamping for each groupseparately before performing intermediate grouping on the privacy unit column.

Return type

One of the followingsupertypes:

  • INT64
  • FLOAT64

Examples

The following differentially private query gets the sum of items requested.Smaller aggregations might not be included. This query references a view calledprofessors.

-- With noise, using the epsilon parameter.SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=10,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,SUM(quantity,contribution_bounds_per_group=>(0,100))quantityFROMprofessorsGROUPBYitem;-- These results will change each time you run the query.-- Smaller aggregations might be removed./*----------+-----------+ | item     | quantity  | +----------+-----------+ | pencil   | 143       | | pen      | 59        | +----------+-----------*/
-- Without noise, using the epsilon parameter.-- (this un-noised version is for demonstration only)SELECTWITHDIFFERENTIAL_PRIVACYOPTIONS(epsilon=1e20,delta=.01,max_groups_contributed=1,privacy_unit_column=id)item,SUM(quantity)quantityFROMprofessorsGROUPBYitem;-- These results will not change when you run the query./*----------+----------+ | item     | quantity | +----------+----------+ | scissors | 8        | | pencil   | 144      | | pen      | 58       | +----------+----------*/
Note: For more information about when and when not to usenoise, seeUse differential privacy.

Supplemental materials

Clamp values in a differentially private aggregate function

Indifferentially private queries,aggregation clamping is used to limit the contribution of outliers. You canclamp explicitly or implicitly as follows:

Implicitly clamp values

If you don't include the contribution bounds named argument with theDIFFERENTIAL_PRIVACY clause, clamping is implicit, whichmeans bounds are derived from the data itself in a differentially private way.

Implicit bounding works best when computed using large datasets. For moreinformation, seeImplicit bounding limitations for small datasets.

Details

In differentially private aggregate functions, explicit clamping is optional.If you don't include this clause, clamping is implicit,which means bounds are derived from the data itself in a differentiallyprivate way. The process is somewhat random, so aggregations with identicalranges can have different bounds.

Implicit bounds are determined for each aggregation. So if someaggregations have a wide range of values, and others have a narrow range ofvalues, implicit bounding can identify different bounds for differentaggregations as appropriate. Implicit bounds might be an advantage or adisadvantage depending on your use case. Different bounds for differentaggregations can result in lower error. Different bounds also means thatdifferent aggregations have different levels of uncertainty, which might not bedirectly comparable.Explicit bounds, on the other hand,apply uniformly to all aggregations and should be derived from publicinformation.

When clamping is implicit, part of the total epsilon is spent picking bounds.This leaves less epsilon for aggregations, so these aggregations are noisier.

Explicitly clamp values

contribution_bounds_per_group=>(lower_bound,upper_bound)
contribution_bounds_per_row=>(lower_bound,upper_bound)

Use the contribution bounds named argument to explicitly clampvalues per group or per row between a lower and upper bound in aDIFFERENTIAL_PRIVACY clause.

Input values:

  • contribution_bounds_per_row: Contributions per privacy unit are clampedon a per-row (per-record) basis. This means the following:
    • Upper and lower bounds are applied to column values in individualrows produced by the input subquery independently.
    • The maximum possible contribution per privacy unit (and per grouping set)is the product of the per-row contribution limit andmax_groups_contributeddifferential privacy parameter.
  • contribution_bounds_per_group: Contributions per privacy unit are clampedon a unique set of entity-specifiedGROUP BY keys. The upper and lowerbounds are applied to values per group after the values are aggregated perprivacy unit.
  • lower_bound: Numeric literal that represents the smallest value toinclude in an aggregation.
  • upper_bound: Numeric literal that represents the largest value toinclude in an aggregation.

NUMERIC andBIGNUMERIC arguments aren't allowed.

Details

In differentially private aggregate functions, clamping explicitly clamps thetotal contribution from each privacy unit column to within a specifiedrange.

Explicit bounds are uniformly applied to all aggregations. So even if someaggregations have a wide range of values, and others have a narrow range ofvalues, the same bounds are applied to all of them. On the other hand, whenimplicit bounds are inferred from the data, the boundsapplied to each aggregation can be different.

Explicit bounds should be chosen to reflect public information.For example, bounding ages between 0 and 100 reflects public informationbecause the age of most people generally falls within this range.

Important: The results of the query reveal the explicit bounds. Don't useexplicit bounds based on the entity data; explicit bounds should be based onpublic information.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-24 UTC.