Query Sensitive Data Protection findings in BigQuery Stay organized with collections Save and categorize content based on your preferences.
This page provides example queries that you can use to analyzeSensitive Data Protection findings that were exported toBigQuery.
You canconfigure an inspection job or jobtrigger to save the findings toBigQuery. Doing so lets you query the findings for furtheranalysis. When your findings are exported to BigQuery, the datais written to either a new or existing table.
For more information about all the actions that Sensitive Data Protection canperform after inspection, see theActions conceptual topic.
For more information about running queries, see the following:
Columns of the BigQuery table
The columns of the table of exported findings are based on the attributes of theFinding object.
Link results back to the rows containing the findings
If you areconfiguring inspection of a BigQuerytable, you can set up the job orjob trigger such that the exported findings will contain the row's identifiers.Doing so lets you link the inspection findings back to the rows that containthem.
In the inspection job or job trigger, set the following fields to the namesof the columns that uniquely identify each row in the table—that is,the columns that serve the purpose of a primary key:
- If you're using the Google Cloud console, set theIdentifying fields (commaseparated) field.
- If you're using the DLP API, set the
identifyingFieldsproperty.
When the inspection is done and the findings are exported toBigQuery, each finding will contain the corresponding values ofthe columns that you specified. Those values will be in thelocation.content_locations.record_location.record_key.id_values field. You canthen use those values to link the finding back to the specific row in theinspected BigQuery table.
rowIndexproperty is not intended for use in BigQuery inspection jobs.Sample queries
You can use the following sample queries to analyze your findings. You canalso use the queries in a visualization tool such asLooker Studio. Thesequeries are provided to help you get started querying your findings data.
In each of the following queries, replace the following:
PROJECT_ID: the project identifierDATASET: the BigQuery dataset nameTABLE_ID: the table ID
Select the count of each infoType
Google Cloud console
SELECTinfo_type.name,COUNT(info_type.name)AScountFROM`PROJECT_ID.DATASET.TABLE_ID`GROUPBYinfo_type.nameORDERBYcountDESC;
Command-line
bqquery--use_legacy_sql=false' SELECT info_type.name,COUNT(info_type.name) AS countFROM `PROJECT_ID.DATASET.TABLE_ID`GROUP BY info_type.name ORDER BY count DESC;'
Select the count of each infoType by day
Google Cloud console
SELECTinfo_type.name,cast(TIMESTAMP_SECONDS(create_time.seconds)asdate)asday,COUNT(locations.container_name)AScountFROM`PROJECT_ID.DATASET.TABLE_ID`,UNNEST(location.content_locations)ASlocationsGROUPBYinfo_type.name,dayORDERBYcountDESC;
Command-line
bqquery--use_legacy_sql=false' SELECT info_type.name,cast(TIMESTAMP_SECONDS(create_time.seconds) as date) as day,COUNT(locations.container_name) AS count FROM `PROJECT_ID.DATASET.TABLE_ID`,UNNEST(location.content_locations) AS locationsGROUP BY info_type.name, day ORDER BY count DESC;'
Selects the count of each infoType in each container
Google Cloud console
SELECTinfo_type.name,locations.container_name,COUNT(locations.container_name)AScountFROM`PROJECT_ID.DATASET.TABLE_ID`,UNNEST(location.content_locations)ASlocationsGROUPBYlocations.container_name,info_type.nameORDERBYcountDESC;
Command-line
bqquery--use_legacy_sql=false' SELECT info_type.name, locations.container_name,COUNT(locations.container_name) AS count FROM `PROJECT_ID.DATASET.TABLE_ID`,UNNEST(location.content_locations) AS locationsGROUP BY locations.container_name,info_type.name ORDER BY count DESC;'
Selects the finding types found for each column of a table
This query will group all the findings by column name and is intended to work onfindings from a BigQuery inspection job. This query is useful ifyou are trying to identify the likely types for a given column. You can adjustsettings by modifying the WHERE and HAVING clauses. For example, if multipletable results are included in your findings table, you can limit these to justone job run or one table name.
Google Cloud console
SELECTtable_counts.field_name,STRING_AGG(CONCAT(" ",table_counts.name," [count: ",CAST(table_counts.count_totalASString),"]")ORDERBYtable_counts.count_totalDESC)ASinfoTypesFROM(SELECTlocations.record_location.field_id.nameASfield_name,info_type.name,COUNT(*)AScount_totalFROM`PROJECT_ID.DATASET.TABLE_ID`,UNNEST(location.content_locations)ASlocationsWHERE(likelihood='LIKELY'ORlikelihood='VERY_LIKELY'ORlikelihood='POSSIBLE')GROUPBYlocations.record_location.field_id.name,info_type.nameHAVINGcount_total>200)AStable_countsGROUPBYtable_counts.field_nameORDERBYtable_counts.field_name
The above query might produce a result like this for a sample table, where the infoTypes column tells us how many instances of each infoType was found for that given column.
| field_name | infoTypes |
|---|---|
field1 | CUSTOM_USER_US [count: 7004],CUSTOM_USER_EU [count: 2996] |
field2 | US_VEHICLE_IDENTIFICATION_NUMBER [count: 9597] |
field3 | EMAIL_ADDRESS [count: 10000] |
field4 | IP_ADDRESS [count: 10000] |
field5 | PHONE_NUMBER [count: 7815] |
field6 | US_SOCIAL_SECURITY_NUMBER [count: 10000] |
field7 | CREDIT_CARD_NUMBER [count: 10000] |
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.