Query Sensitive Data Protection findings in BigQuery

This page provides example queries that you can use to analyze Sensitive Data Protection findings that were exported to BigQuery.

You can configure an inspection job or job trigger to save the findings to BigQuery. Doing so lets you query the findings for further analysis. When your findings are exported to BigQuery, the data is written to either a new or existing table.

For more information about all the actions that Sensitive Data Protection can perform after inspection, see the Actions conceptual topic.

For more information about running queries, see the BigQuery documentation.

Columns of the BigQuery table

The columns of the table of exported findings are based on the attributes of the Finding object.
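For instance, you can list the exported columns and their types with a standard INFORMATION_SCHEMA query. This is a sketch that uses the PROJECT_ID, DATASET, and TABLE_ID placeholders defined later on this page:

```sql
-- List every column of the findings table, in schema order.
SELECT column_name, data_type
FROM `PROJECT_ID.DATASET.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'TABLE_ID'
ORDER BY ordinal_position;
```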

Link results back to the rows containing the findings

If you are configuring inspection of a BigQuery table, you can set up the job or job trigger such that the exported findings contain the row's identifiers. Doing so lets you link the inspection findings back to the rows that contain them.

In the inspection job or job trigger, set the following fields to the names of the columns that uniquely identify each row in the table (that is, the columns that serve the purpose of a primary key):

  • If you're using the Google Cloud console, set the Identifying fields (comma separated) field.
  • If you're using the DLP API, set the identifyingFields property.

When the inspection is done and the findings are exported to BigQuery, each finding contains the corresponding values of the columns that you specified. Those values are in the location.content_locations.record_location.record_key.id_values field. You can then use those values to link the finding back to the specific row in the inspected BigQuery table.

Note: The rowIndex property is not intended for use in BigQuery inspection jobs.
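For example, if you set a single identifying field (a hypothetical customer_id column in this sketch), a query along the following lines joins each finding back to its source row. FINDINGS_TABLE_ID and SOURCE_TABLE_ID are placeholders; id_values are exported as strings, so the source key is cast to match:

```sql
SELECT
  findings.info_type.name AS info_type,
  source.*
FROM `PROJECT_ID.DATASET.FINDINGS_TABLE_ID` AS findings,
  -- Each finding can have multiple content locations.
  UNNEST(location.content_locations) AS content_locations,
  -- id_values holds one value per identifying column, in the order you specified.
  UNNEST(content_locations.record_location.record_key.id_values) AS id_value
JOIN `PROJECT_ID.DATASET.SOURCE_TABLE_ID` AS source
  -- customer_id is a hypothetical identifying column; id_values are strings.
  ON CAST(source.customer_id AS STRING) = id_value;
```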

Sample queries

You can use the following sample queries to analyze your findings. You can also use the queries in a visualization tool such as Looker Studio. These queries are provided to help you get started querying your findings data.

In each of the following queries, replace the following:

  • PROJECT_ID: the project identifier
  • DATASET: the BigQuery dataset name
  • TABLE_ID: the table ID

Select the count of each infoType

Google Cloud console

SELECT info_type.name,
  COUNT(info_type.name) AS count
FROM `PROJECT_ID.DATASET.TABLE_ID`
GROUP BY info_type.name
ORDER BY count DESC;

Command-line

bq query --use_legacy_sql=false '
SELECT info_type.name, COUNT(info_type.name) AS count
FROM `PROJECT_ID.DATASET.TABLE_ID`
GROUP BY info_type.name
ORDER BY count DESC;'

Select the count of each infoType by day

Google Cloud console

SELECT info_type.name,
  CAST(TIMESTAMP_SECONDS(create_time.seconds) AS DATE) AS day,
  COUNT(locations.container_name) AS count
FROM `PROJECT_ID.DATASET.TABLE_ID`,
  UNNEST(location.content_locations) AS locations
GROUP BY info_type.name, day
ORDER BY count DESC;

Command-line

bq query --use_legacy_sql=false '
SELECT info_type.name,
  CAST(TIMESTAMP_SECONDS(create_time.seconds) AS DATE) AS day,
  COUNT(locations.container_name) AS count
FROM `PROJECT_ID.DATASET.TABLE_ID`,
  UNNEST(location.content_locations) AS locations
GROUP BY info_type.name, day
ORDER BY count DESC;'

Select the count of each infoType in each container

Google Cloud console

SELECT info_type.name,
  locations.container_name,
  COUNT(locations.container_name) AS count
FROM `PROJECT_ID.DATASET.TABLE_ID`,
  UNNEST(location.content_locations) AS locations
GROUP BY locations.container_name, info_type.name
ORDER BY count DESC;

Command-line

bq query --use_legacy_sql=false '
SELECT info_type.name,
  locations.container_name,
  COUNT(locations.container_name) AS count
FROM `PROJECT_ID.DATASET.TABLE_ID`,
  UNNEST(location.content_locations) AS locations
GROUP BY locations.container_name, info_type.name
ORDER BY count DESC;'

Select the finding types found for each column of a table

This query groups all the findings by column name and is intended for findings from a BigQuery inspection job. It is useful when you're trying to identify the likely infoTypes for a given column. You can adjust the scope by modifying the WHERE and HAVING clauses. For example, if your findings table includes results from multiple tables, you can limit the query to a single job run or a single table name.

Google Cloud console

SELECT
  table_counts.field_name,
  STRING_AGG(
    CONCAT(" ", table_counts.name, " [count: ", CAST(table_counts.count_total AS STRING), "]")
    ORDER BY table_counts.count_total DESC
  ) AS infoTypes
FROM (
  SELECT
    locations.record_location.field_id.name AS field_name,
    info_type.name,
    COUNT(*) AS count_total
  FROM `PROJECT_ID.DATASET.TABLE_ID`,
    UNNEST(location.content_locations) AS locations
  WHERE (likelihood = 'LIKELY' OR likelihood = 'VERY_LIKELY' OR likelihood = 'POSSIBLE')
  GROUP BY locations.record_location.field_id.name, info_type.name
  HAVING count_total > 200
) AS table_counts
GROUP BY table_counts.field_name
ORDER BY table_counts.field_name

For a sample table, the preceding query might produce a result like the following, where the infoTypes column shows how many instances of each infoType were found in each column.

field_name | infoTypes
field1     | CUSTOM_USER_US [count: 7004], CUSTOM_USER_EU [count: 2996]
field2     | US_VEHICLE_IDENTIFICATION_NUMBER [count: 9597]
field3     | EMAIL_ADDRESS [count: 10000]
field4     | IP_ADDRESS [count: 10000]
field5     | PHONE_NUMBER [count: 7815]
field6     | US_SOCIAL_SECURITY_NUMBER [count: 10000]
field7     | CREDIT_CARD_NUMBER [count: 10000]

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.