Scan for data quality issues
This document explains how to use BigQuery and Dataplex Universal Catalog together to ensure that data meets your quality expectations. Dataplex Universal Catalog automatic data quality lets you define and measure the quality of the data in your BigQuery tables. You can automate the scanning of data, validate data against defined rules, and log alerts if your data doesn't meet quality requirements.
For more information about automatic data quality, see the Auto data quality overview.
Tip: The steps in this document show how to manage data quality scans across your project. You can also create and manage data quality scans when working with a specific table. For more information, see the Manage data quality scans for a specific table section of this document.

Before you begin
Enable the Dataplex API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

- Optional: If you want Dataplex Universal Catalog to generate recommendations for data quality rules based on the results of a data profile scan, create and run the data profile scan.
Required roles
To run a data quality scan on a BigQuery table, you need permission to read the BigQuery table and permission to create a BigQuery job in the project used to scan the table.
Note: Dataplex Universal Catalog doesn't create a BigQuery job in your project. However, you need this permission to create a DryRun job to check for permissions for the table.

If the BigQuery table and the data quality scan are in different projects, then you need to give the Dataplex Universal Catalog service account of the project containing the data quality scan read permission for the corresponding BigQuery table.
Note: If you haven't created any data quality or data profile scans or you don't have a Dataplex Universal Catalog lake in this project, create a service identifier by running gcloud beta services identity create --service=dataplex.googleapis.com. This command returns a Dataplex Universal Catalog service identifier if it exists.

If the data quality rules refer to additional tables, then the scan project's service account must have read permissions on the same tables.
To get the permissions that you need to export the scan results to a BigQuery table, ask your administrator to grant the Dataplex Universal Catalog service account the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role on the results dataset and table. This grants the following permissions:

- bigquery.datasets.get
- bigquery.tables.create
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.update
- bigquery.tables.updateData
If the BigQuery data is organized in a Dataplex Universal Catalog lake, grant the Dataplex Universal Catalog service account the Dataplex Metadata Reader (roles/dataplex.metadataReader) and Dataplex Viewer (roles/dataplex.viewer) IAM roles. Alternatively, you need all of the following permissions:

- dataplex.lakes.list
- dataplex.lakes.get
- dataplex.zones.list
- dataplex.zones.get
- dataplex.entities.list
- dataplex.entities.get
- dataplex.operations.get
If you're scanning a BigQuery external table from Cloud Storage, grant the Dataplex Universal Catalog service account the Storage Object Viewer (roles/storage.objectViewer) role for the bucket. Alternatively, assign the Dataplex Universal Catalog service account the following permissions:

- storage.buckets.get
- storage.objects.get
If you want to publish the data quality scan results as Dataplex Universal Catalog metadata, you must be granted the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role for the table, and the dataplex.entryGroups.useDataQualityScorecardAspect permission on the @bigquery entry group in the same location as the table.

Alternatively, you must be granted the Dataplex Catalog Editor (roles/dataplex.catalogEditor) role for the @bigquery entry group in the same location as the table.

Alternatively, you need all of the following permissions:

- bigquery.tables.update on the table
- dataplex.entryGroups.useDataQualityScorecardAspect on the @bigquery entry group

Or, you need all of the following permissions:

- dataplex.entries.update on the @bigquery entry group
- dataplex.entryGroups.useDataQualityScorecardAspect on the @bigquery entry group
If you need to access columns protected by BigQuery column-level access policies, then assign the Dataplex Universal Catalog service account permissions for those columns. The user creating or updating a data scan also needs permissions for the columns.
If a table has BigQuery row-level access policies enabled, then you can only scan rows visible to the Dataplex Universal Catalog service account. Note that the individual user's access privileges are not evaluated for row-level policies.
Required data scan roles
To use auto data quality, ask your administrator to grant you one of the following IAM roles:
- Full access to DataScan resources: Dataplex DataScan Administrator (roles/dataplex.dataScanAdmin)
- To create DataScan resources: Dataplex DataScan Creator (roles/dataplex.dataScanCreator) on the project
- Write access to DataScan resources: Dataplex DataScan Editor (roles/dataplex.dataScanEditor)
- Read access to DataScan resources, excluding rules and results: Dataplex DataScan Viewer (roles/dataplex.dataScanViewer)
- Read access to DataScan resources, including rules and results: Dataplex DataScan DataViewer (roles/dataplex.dataScanDataViewer)
The following table lists the DataScan permissions:
| Permission name | Grants permission to do the following: |
|---|---|
| dataplex.datascans.create | Create a DataScan |
| dataplex.datascans.delete | Delete a DataScan |
| dataplex.datascans.get | View operational metadata such as ID or schedule, but not results and rules |
| dataplex.datascans.getData | View DataScan details, including rules and results |
| dataplex.datascans.list | List DataScans |
| dataplex.datascans.run | Run a DataScan |
| dataplex.datascans.update | Update the description of a DataScan |
| dataplex.datascans.getIamPolicy | View the current IAM permissions on the scan |
| dataplex.datascans.setIamPolicy | Set IAM permissions on the scan |
Create a data quality scan
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click Create data quality scan.
In the Define scan window, fill in the following fields:
Optional: Enter a Display name.
Enter an ID. See the resource naming conventions.
Optional: Enter a Description.
In the Table field, click Browse. Choose the table to scan, and then click Select. Only standard BigQuery tables are supported.
For tables in multi-region datasets, choose a region in which to create the data scan.
To browse the tables organized within Dataplex Universal Catalog lakes, click Browse within Dataplex Lakes.
In the Scope field, choose Incremental or Entire data.
- If you choose Incremental: In the Timestamp column field, select a column of type DATE or TIMESTAMP from your BigQuery table that increases as new records are added and that can be used to identify new records. It can be a column that partitions the table.
To filter your data, select the Filter rows checkbox. Provide a row filter consisting of a valid SQL expression that can be used as part of a WHERE clause in GoogleSQL syntax, for example, col1 >= 0. The filter can be a combination of multiple column conditions, for example, col1 >= 0 AND col2 < 10.

To sample your data, in the Sampling size list, select a sampling percentage. Choose a percentage value between 0.0% and 100.0%, with up to 3 decimal digits. For larger datasets, choose a lower sampling percentage. For example, for a 1 PB table, if you enter a value between 0.1% and 1.0%, the data quality scan samples between 1-10 TB of data. For incremental data scans, the data quality scan applies sampling to the latest increment.
To publish the data quality scan results as Dataplex Universal Catalog metadata, select the Publish results to BigQuery and Dataplex Catalog checkbox.
You can view the latest scan results on the Data quality tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data quality scan results section of this document.
In the Schedule section, choose one of the following options:
Repeat: Run the data quality scan on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scan runs and at what time. If you choose custom, use cron format to specify the schedule.
On-demand: Run the data quality scan on demand.
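If you choose a custom Repeat schedule, the value uses the standard five-field cron format. For example, the following hypothetical schedule (not taken from this document) runs a scan every 6 hours, on the hour:

```
# minute hour day-of-month month day-of-week
0 */6 * * *
```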
Click Continue.
In the Data quality rules window, define the rules to configure for this data quality scan.
Click Add rules, and then choose from the following options.
Profile based recommendations: Build rules from the recommendations based on an existing data profiling scan.
Choose columns: Select the columns to get recommended rules for.
Choose scan project: If the data profiling scan is in a different project than the project where you are creating the data quality scan, then select the project to pull profile scans from.
Choose profile results: Select one or more profile results, and then click OK. This populates a list of suggested rules that you can use as a starting point.
Select the checkbox for the rules that you want to add, and then click Select. Once selected, the rules are added to your current rule list. Then, you can edit the rules.
Built-in rule types: Build rules from predefined rules. See the list of predefined rules.
Choose columns: Select the columns to select rules for.
Choose rule types: Select the rule types that you want to choose from, and then click OK. The rule types that appear depend on the columns that you selected.
Select the checkbox for the rules that you want to add, and then click Select. Once selected, the rules are added to your current rules list. Then, you can edit the rules.
SQL row check rule: Create a custom SQL rule to apply to each row.
In Dimension, choose one dimension.
In Passing threshold, choose a percentage of records that must pass the check.
In Column name, choose a column.
In the Provide a SQL expression field, enter a SQL expression that evaluates to a boolean true (pass) or false (fail). For more information, see Supported custom SQL rule types and the examples in Define data quality rules.

Click Add.
SQL aggregate check rule: Create a custom SQL table condition rule.
In Dimension, choose one dimension.
In Column name, choose a column.
In the Provide a SQL expression field, enter a SQL expression that evaluates to a boolean true (pass) or false (fail). For more information, see Supported custom SQL rule types and the examples in Define data quality rules.

Click Add.
SQL assertion rule: Create a custom SQL assertion rule to check for an invalid state of the data.
In Dimension, choose one dimension.
Optional: In Column name, choose a column.
In the Provide a SQL statement field, enter a SQL statement that returns rows that match the invalid state. If any rows are returned, this rule fails. Omit the trailing semicolon from the SQL statement. For more information, see Supported custom SQL rule types and the examples in Define data quality rules.
Click Add.
Optional: For any data quality rule, you can assign a custom rule name to use for monitoring and alerting, and a description. To do this, edit a rule and specify the following details:
- Rule name: Enter a custom rule name with up to 63 characters. The rule name can include letters (a-z, A-Z), digits (0-9), and hyphens (-), and it must start with a letter and end with a number or a letter.
- Description: Enter a rule description with a maximum length of 1,024 characters.
Repeat the previous steps to add additional rules to the data quality scan. When finished, click Continue.
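As an illustration of a SQL assertion rule, the following statement returns rows that represent an invalid state; the table and column names are hypothetical and not part of this document. Note that the trailing semicolon is omitted, as required:

```sql
-- Hypothetical table and columns; the assertion fails if any rows are returned.
SELECT order_id, discount_pct
FROM my_project.my_dataset.orders
WHERE discount_pct > 100
```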
Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:
In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data quality scan results.
In the BigQuery table field, specify the table to store the data quality scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.
Note: You can use the same results table for multiple data quality scans.
Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.
Optional: Set up email notification reports to alert people about the status and results of a data quality scan job. In the Notification report section, click Add email ID and enter up to five email addresses. Then, select the scenarios that you want to send reports for:
- Quality score (<=): sends a report when a job succeeds with a data quality score that is lower than the specified target score. Enter a target quality score between 0 and 100.
- Job failures: sends a report when the job itself fails, regardless of the data quality results.
- Job completion (success or failure): sends a report when the job ends, regardless of the data quality results.
Click Create.
After the scan is created, you can run it at any time by clicking Run now.
gcloud
To create a data quality scan, use the gcloud dataplex datascans create data-quality command.
If the source data is organized in a Dataplex Universal Catalog lake, include the --data-source-entity flag:
```
gcloud dataplex datascans create data-quality DATASCAN \
    --location=LOCATION \
    --data-quality-spec-file=DATA_QUALITY_SPEC_FILE \
    --data-source-entity=DATA_SOURCE_ENTITY
```

If the source data isn't organized in a Dataplex Universal Catalog lake, include the --data-source-resource flag:
```
gcloud dataplex datascans create data-quality DATASCAN \
    --location=LOCATION \
    --data-quality-spec-file=DATA_QUALITY_SPEC_FILE \
    --data-source-resource=DATA_SOURCE_RESOURCE
```

Replace the following variables:
- DATASCAN: The name of the data quality scan.
- LOCATION: The Google Cloud region in which to create the data quality scan.
- DATA_QUALITY_SPEC_FILE: The path to the JSON or YAML file containing the specifications for the data quality scan. The file can be a local file or a Cloud Storage path with the prefix gs://. Use this file to specify the data quality rules for the scan. You can also specify additional details in this file, such as filters, sampling percent, and post-scan actions like exporting to BigQuery or sending email notification reports. See the documentation for JSON representation and the example YAML representation.
- DATA_SOURCE_ENTITY: The Dataplex Universal Catalog entity that contains the data for the data quality scan. For example, projects/test-project/locations/test-location/lakes/test-lake/zones/test-zone/entities/test-entity.
- DATA_SOURCE_RESOURCE: The name of the resource that contains the data for the data quality scan. For example, //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table.
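For illustration, a minimal YAML specification file might look like the following sketch. The column names, rule choices, thresholds, and results table are hypothetical examples; refer to the JSON and YAML representations linked above for the authoritative schema.

```yaml
# Hypothetical data quality spec; names and thresholds are examples only.
rules:
  - column: transaction_id
    dimension: COMPLETENESS
    threshold: 1.0
    nonNullExpectation: {}
  - column: discount_pct
    dimension: VALIDITY
    threshold: 0.95
    rangeExpectation:
      minValue: "0"
      maxValue: "100"
rowFilter: "region = 'US'"
samplingPercent: 10
postScanActions:
  bigqueryExport:
    resultsTable: "//bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/dq_results"
```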
REST
To create a data quality scan, use the dataScans.create method.
If you want to build rules for the data quality scan by using rule recommendations that are based on the results of a data profiling scan, get the recommendations by calling the dataScans.jobs.generateDataQualityRules method on the data profiling scan.
Run a data quality scan
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the data quality scan to run.
Click Run now.
gcloud
To run a data quality scan, use the gcloud dataplex datascans run command:
```
gcloud dataplex datascans run DATASCAN \
    --location=LOCATION
```
Replace the following variables:
- LOCATION: The Google Cloud region in which the data quality scan was created.
- DATASCAN: The name of the data quality scan.
REST
To run a data quality scan, use the dataScans.run method.
View data quality scan results
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data quality scan.
The Overview section displays information about the most recent jobs, including when the scan was run, the number of records scanned in each job, whether all the data quality checks passed, and, if there were failures, the number of data quality checks that failed.
The Data quality scan configuration section displays details about the scan.
To see detailed information about a job, such as data quality scores that indicate the percentage of rules that passed, which rules failed, and the job logs, click the Jobs history tab. Then, click a job ID.
gcloud
To view the results of a data quality scan job, use the gcloud dataplex datascans jobs describe command:
```
gcloud dataplex datascans jobs describe JOB \
    --location=LOCATION \
    --datascan=DATASCAN \
    --view=FULL
```
Replace the following variables:
- JOB: The job ID of the data quality scan job.
- LOCATION: The Google Cloud region in which the data quality scan was created.
- DATASCAN: The name of the data quality scan that the job belongs to.
- --view=FULL: To see the scan job result, specify FULL.
REST
To view the results of a data quality scan, use the dataScans.get method.
View published results
If the data quality scan results are published as Dataplex Universal Catalog metadata, then you can see the latest scan results on the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console, on the source table's Data quality tab.
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer.

If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, click Datasets, and then click your dataset.
Click Overview > Tables, and then select the table whose data quality scan results you want to see.
Click the Data quality tab.
The latest published results are displayed.
Note: Published results might not be available if a scan is running for the first time.
View historical scan results
Dataplex Universal Catalog saves the data quality scan history of the last 300 jobs or for the past year, whichever occurs first.
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data quality scan.
Click theJobs history tab.
The Jobs history tab provides information about past jobs, such as the number of records scanned in each job, the job status, the time the job was run, and whether each rule passed or failed.
To view detailed information about a job, click any of the jobs in the Job ID column.
gcloud
To view historical data quality scan jobs, use the gcloud dataplex datascans jobs list command:
```
gcloud dataplex datascans jobs list \
    --location=LOCATION \
    --datascan=DATASCAN
```
Replace the following variables:
- LOCATION: The Google Cloud region in which the data quality scan was created.
- DATASCAN: The name of the data quality scan to view historical jobs for.
REST
To view historical data quality scan jobs, use the dataScans.jobs.list method.
Grant access to data quality scan results
To enable the users in your organization to view the scan results, do the following:
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the data quality scan you want to share the results of.
Click the Permissions tab.
Do the following:
- To grant access to a principal, click Grant access. Grant the Dataplex DataScan DataViewer role to the associated principal.
- To remove access from a principal, select the principal that you want to remove the Dataplex DataScan DataViewer role from. Click Remove access, and then confirm when prompted.
Troubleshoot a data quality failure
You can set alerts for data quality failures by using the logs in Cloud Logging. For more information, including sample queries, see Set alerts in Cloud Logging.
For each job with row-level rules that fail, Dataplex Universal Catalog provides a query to get the failed records. Run this query to see the records that did not match your rule.
Note: The query returns all of the columns of the table, not just the failed column.

Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of the data quality scan whose records you want to troubleshoot.
Click theJobs history tab.
Click the job ID of the job that identified data quality failures.
In the job results window that opens, in the Rules section, find the column Query to get failed records. Click Copy query to clipboard for the failed rule.
Run the query in BigQuery to see the records that caused the job to fail.
gcloud
Not supported.
REST
To get the job that identified data quality failures, use the dataScans.get method. In the response object, the failingRowsQuery field shows the query.

Run the query in BigQuery to see the records that caused the job to fail.
Manage data quality scans for a specific table
The steps in this document show how to manage data quality scans across your project by using the BigQuery Metadata curation > Data profiling & quality page in the Google Cloud console.
You can also create and manage data quality scans when working with a specific table. In the Google Cloud console, on the BigQuery page for the table, use the Data quality tab. Do the following:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane (in the left pane), click Datasets, and then click your dataset. Click Overview > Tables, and then select the table whose data quality scan results you want to see.
Click the Data quality tab.
Depending on whether the table has a data quality scan whose results are published as Dataplex Universal Catalog metadata, you can work with the table's data quality scans in the following ways:
Data quality scan results are published: the latest scan results are displayed on the page.
To manage the data quality scans for this table, click Data quality scan, and then select from the following options:
Create new scan: create a new data quality scan. For more information, see the Create a data quality scan section of this document. When you create a scan from a table's details page, the table is preselected.
Run now: run the scan.
Edit scan configuration: edit settings including the display name, filters, and schedule.
To edit the data quality rules, on the Data quality tab, click the Rules tab. Click Modify rules. Update the rules, and then click Save.
Manage scan permissions: control who can access the scan results. For more information, see the Grant access to data quality scan results section of this document.
View historical results: view detailed information about previous data quality scan jobs. For more information, see the View data quality scan results and View historical scan results sections of this document.
View all scans: view a list of data quality scans that apply to this table.
Data quality scan results aren't published: select from the following options:
Create data quality scan: create a new data quality scan. For more information, see the Create a data quality scan section of this document. When you create a scan from a table's details page, the table is preselected.
View existing scans: view a list of data quality scans that apply to this table.
View the data quality scans for a table
To view the data quality scans that apply to a specific table, do the following:
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Filter the list by table name and scan type.
Update a data quality scan
You can edit various settings for an existing data quality scan, such as the display name, filters, schedule, and data quality rules.
Note: If an existing data quality scan publishes the results to the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console, and you instead want to publish future scan results as Dataplex Universal Catalog metadata, you must edit the scan and reenable publishing. You might need additional permissions to enable catalog publishing.

Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data quality scan.
To edit settings including the display name, filters, and schedule, click Edit. Edit the values, and then click Save.
To edit the data quality rules, on the scan details page, click the Current rules tab. Click Modify rules. Update the rules, and then click Save.
gcloud
To update the description of a data quality scan, use the gcloud dataplex datascans update data-quality command:
```
gcloud dataplex datascans update data-quality DATASCAN \
    --location=LOCATION \
    --description=DESCRIPTION
```
Replace the following:
- DATASCAN: The name of the data quality scan to update.
- LOCATION: The Google Cloud region in which the data quality scan was created.
- DESCRIPTION: The new description for the data quality scan.
To update other settings, such as the rules, rowFilter, or samplingPercent fields, edit the data quality specification file. Refer to the JSON and YAML representations.

REST
To edit a data quality scan, use the dataScans.patch method.
Delete a data quality scan
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the scan you want to delete.
Click Delete, and then confirm when prompted.
gcloud
To delete a data quality scan, use the gcloud dataplex datascans delete command:
```
gcloud dataplex datascans delete DATASCAN \
    --location=LOCATION \
    --async
```
Replace the following variables:
- DATASCAN: The name of the data quality scan to delete.
- LOCATION: The Google Cloud region in which the data quality scan was created.
REST
To delete a data quality scan, use the dataScans.delete method.
What's next
- Learn more about data governance in BigQuery.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-10-24 UTC.