Use auto data quality

This document describes how to use Dataplex Universal Catalog data quality scans to measure, monitor, and manage the quality of your data. Data quality scans help you automate the process of validating your data for completeness, validity, and consistency.

With data quality scans, you can define rules to check for missing values, ensure values match a regular expression or belong to a set, verify uniqueness, or use custom SQL for more complex validations like anomaly detection. This document explains how to create and manage data quality scans.

To learn more about data quality scans, see About auto data quality.

Note: The steps in this document show how to manage data quality scans across your project. You can also create and manage data quality scans when working with a specific table. For more information, see the Manage data quality scans for a specific table section of this document.

Before you begin

  1. Enable the Dataplex API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

  2. Optional: If you want Dataplex Universal Catalog to generate recommendations for data quality rules based on the results of a data profile scan, create and run the data profile scan.

Required roles and permissions

Required data scan roles

To use auto data quality, ask your administrator to grant you one of the following IAM roles:

The following list describes the DataScan permissions:

  • dataplex.datascans.create: Create a DataScan.
  • dataplex.datascans.delete: Delete a DataScan.
  • dataplex.datascans.get: View operational metadata such as ID or schedule, but not results and rules.
  • dataplex.datascans.getData: View DataScan details including rules and results.
  • dataplex.datascans.list: List DataScans.
  • dataplex.datascans.run: Run a DataScan.
  • dataplex.datascans.update: Update the description of a DataScan.
  • dataplex.datascans.getIamPolicy: View the current IAM permissions on the scan.
  • dataplex.datascans.setIamPolicy: Set IAM permissions on the scan.

Define data quality rules

You can define data quality rules by using built-in rules or custom SQL checks. If you're using the Google Cloud CLI, you can define these rules in a JSON or YAML file.

The examples in the following sections show how to define a variety of data quality rules. The rules validate a sample table that contains data about customer transactions. Assume the table has the following schema:

  • transaction_timestamp (Timestamp): Timestamp of the transaction. The table is partitioned on this field.
  • customer_id (String): A customer ID in the format of 8 digits followed by 16 letters.
  • transaction_id (String): The transaction ID needs to be unique across the table.
  • currency_id (String): One of the supported currencies. The currency type must match one of the available currencies in the dimension table dim_currency.
  • amount (Float): Transaction amount.
  • discount_pct (Float): Discount percentage. This value must be between 0 and 100.

Define data quality rules using built-in rule types

The following example rules are based on built-in rule types. You can create rules based on built-in rule types using the Google Cloud console or the API. Dataplex Universal Catalog might recommend some of these rules.

  • transaction_id: Uniqueness check. Suggested dimension: Uniqueness. Threshold: not applicable.
  • amount: Null check. Suggested dimension: Completeness. Threshold: 100%.
  • customer_id: Regex (regular expression) check. Suggested dimension: Validity. Regular expression: ^[0-9]{8}[a-zA-Z]{16}$. Threshold: 100%.
  • currency_id: Value set check. Suggested dimension: Validity. Set of: USD, JPY, INR, GBP, CAN. Threshold: 100%.
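To make the example rules above concrete, the following Python sketch shows what each check enforces over sample rows. The helper function and row format are hypothetical, for illustration only; in practice, Dataplex Universal Catalog evaluates these rules in BigQuery.

```python
import re

# Illustrative only: local versions of the built-in checks in the table above.
CUSTOMER_ID_RE = re.compile(r"^[0-9]{8}[a-zA-Z]{16}$")
SUPPORTED_CURRENCIES = {"USD", "JPY", "INR", "GBP", "CAN"}

def check_row(row, seen_transaction_ids):
    """Returns the names of the example rules that the row fails."""
    failures = []
    if row["amount"] is None:                                   # Null check (Completeness)
        failures.append("amount_not_null")
    if not CUSTOMER_ID_RE.fullmatch(row["customer_id"] or ""):  # Regex check (Validity)
        failures.append("customer_id_regex")
    if row["currency_id"] not in SUPPORTED_CURRENCIES:          # Value set check (Validity)
        failures.append("currency_id_value_set")
    if row["transaction_id"] in seen_transaction_ids:           # Uniqueness check
        failures.append("transaction_id_unique")
    seen_transaction_ids.add(row["transaction_id"])
    return failures
```

A row such as {"amount": 10.0, "customer_id": "12345678abcdefghABCDEFGH", "currency_id": "USD", "transaction_id": "t1"} passes all four checks.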

Define data quality rules using custom SQL rules

To build custom SQL rules, use the following framework:

  • When you create a rule that evaluates one row at a time, create an expression that generates the number of successful rows when Dataplex Universal Catalog evaluates the query SELECT COUNTIF(CUSTOM_SQL_EXPRESSION) FROM TABLE. Dataplex Universal Catalog checks the number of successful rows against the threshold.

  • When you create a rule that evaluates across the rows or uses a table condition, create an expression that returns success or failure when Dataplex Universal Catalog evaluates the query SELECT IF(CUSTOM_SQL_EXPRESSION) FROM TABLE.

  • When you create a rule that evaluates the invalid state of a dataset, provide a statement that returns invalid rows. If any rows are returned, the rule fails. Omit the trailing semicolon from the SQL statement.

  • You can refer to a data source table and all of its precondition filters by using the data reference parameter ${data()} in a rule, instead of explicitly mentioning the source table and its filters. Examples of precondition filters include row filters, sampling percents, and incremental filters. The ${data()} parameter is case-sensitive.
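The COUNTIF-and-threshold model for row-level rules described above can be emulated locally. The following Python sketch is illustrative only (the function and sample rows are hypothetical; Dataplex Universal Catalog performs this evaluation server-side in BigQuery):

```python
# Illustrative model of row-condition evaluation: Dataplex effectively runs
# SELECT COUNTIF(<expr>) FROM <table> and compares the passing ratio with the
# rule threshold. Here the same idea is emulated over an in-memory list of rows.
def evaluate_row_rule(rows, predicate, threshold):
    """Returns (passed, ratio): the fraction of rows satisfying the predicate,
    compared against the threshold (0.0 to 1.0)."""
    if not rows:
        return True, 1.0
    passing = sum(1 for row in rows if predicate(row))  # COUNTIF analogue
    ratio = passing / len(rows)
    return ratio >= threshold, ratio

rows = [{"discount_pct": 15.0}, {"discount_pct": 250.0}, {"discount_pct": 40.0}]
# Row condition from the examples: 0 < discount_pct AND discount_pct < 100
passed, ratio = evaluate_row_rule(rows, lambda r: 0 < r["discount_pct"] < 100,
                                  threshold=1.0)
# One of three rows fails, so the ratio is 2/3 and the rule does not pass.
```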

The following example rules are based on custom SQL rules.

  • Row condition: Checks if the value of discount_pct is between 0 and 100.
    SQL expression: 0 < discount_pct AND discount_pct < 100
  • Row condition: Reference check to validate that currency_id is one of the supported currencies.
    SQL expression: currency_id in (select id from my_project_id.dim_dataset.dim_currency)
  • Table condition: Aggregate SQL expression that checks if the average discount_pct is between 30% and 50%.
    SQL expression: 30 < avg(discount_pct) AND avg(discount_pct) < 50
  • Row condition: Checks if a date is not in the future.
    SQL expression: TIMESTAMP(transaction_timestamp) < CURRENT_TIMESTAMP()
  • Table condition: A BigQuery user-defined function (UDF) to check that the average transaction amount is less than a predefined value per country. Create the JavaScript UDF by running the following command:

        CREATE OR REPLACE FUNCTION
        myProject.myDataset.average_by_country (
          country STRING, average FLOAT64)
        RETURNS BOOL LANGUAGE js AS R"""
        if (country == "CAN" && average < 5000) {
          return 1
        } else if (country == "IND" && average < 1000) {
          return 1
        } else { return 0 }
        """;

    Example rule to check the average transaction amount for country=CAN:

        myProject.myDataset.average_by_country(
          "CAN",
          (SELECT avg(amount) FROM
            myProject.myDataset.transactions_table
            WHERE currency_id = 'CAN'
          ))
  • Table condition: A BigQuery ML predict clause to identify anomalies in discount_pct. It checks if a discount should be applied based on customer, currency, and transaction. The rule checks if the prediction matches the actual value at least 99% of the time. Assumption: the ML model is created before using the rule. Create the ML model using the following command:

        CREATE MODEL
        model-project-id.dataset-id.model-name
          OPTIONS(model_type='logistic_reg') AS
        SELECT
          IF(discount_pct IS NULL, 0, 1) AS label,
          IFNULL(customer_id, "") AS customer,
          IFNULL(currency_id, "") AS currency,
          IFNULL(amount, 0.0) AS amount
        FROM
          `data-project-id.dataset-id.table-names`
        WHERE transaction_timestamp < '2022-01-01';

    The following rule checks if prediction accuracy is greater than 99%:

        SELECT
          accuracy > 0.99
        FROM
          ML.EVALUATE (
            MODEL model-project-id.dataset-id.model-name,
            (
              SELECT
                customer_id,
                currency_id,
                amount,
                discount_pct
              FROM
                data-project-id.dataset-id.table-names
              WHERE transaction_timestamp > '2022-01-01'
            )
          )
  • Row condition: A BigQuery ML predict function to identify anomalies in discount_pct. The function checks if a discount should be applied based on customer, currency, and transaction. The rule identifies all the occurrences where the prediction didn't match. Assumption: the ML model is created before using the rule. Create the ML model using the following command:

        CREATE MODEL
        model-project-id.dataset-id.model-name
          OPTIONS(model_type='logistic_reg') AS
        SELECT
          IF(discount_pct IS NULL, 0, 1) AS label,
          IFNULL(customer_id, "") AS customer,
          IFNULL(currency_id, "") AS currency,
          IFNULL(amount, 0.0) AS amount
        FROM
          `data-project-id.dataset-id.table-names`
        WHERE transaction_timestamp < '2022-01-01';

    The following rule checks if the discount prediction matches the actual value for every row:

        IF(discount_pct > 0, 1, 0)
          = (SELECT predicted_label FROM
             ML.PREDICT(
               MODEL model-project-id.dataset-id.model-name,
               (
                 SELECT
                   customer_id,
                   currency_id,
                   amount,
                   discount_pct
                 FROM
                   data-project-id.dataset-id.table-names AS t
                 WHERE t.transaction_timestamp =
                   transaction_timestamp
                 LIMIT 1
               )
             )
            )
  • SQL assertion: Validates that discount_pct is greater than 30% for today by checking whether any rows exist with a discount percent less than or equal to 30.
    SQL statement: SELECT * FROM my_project_id.dim_dataset.dim_currency WHERE discount_pct <= 30 AND transaction_timestamp >= current_date()

  • SQL assertion (with data reference parameter): Checks if discount_pct is greater than 30% for all the supported currencies today.

    The date filter transaction_timestamp >= current_date() is applied as a row filter on the data source table.

    The data reference parameter ${data()} acts as a placeholder for my_project_id.dim_dataset.dim_currency WHERE transaction_timestamp >= current_date() and applies the row filter.

    SQL statement: SELECT * FROM ${data()} WHERE discount_pct > 30
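Conceptually, ${data()} expands to the source table plus its precondition filters. The substitution is performed by Dataplex Universal Catalog; the Python sketch below only illustrates the idea with a hypothetical helper, modeling the filtered source as a subquery:

```python
# Illustrative only: a local model of the ${data()} placeholder. With no
# preconditions, ${data()} stands for the table itself; with a row filter,
# think of it as a filtered subquery over the table.
def expand_data_reference(rule_sql, table, row_filter=None):
    data = table if row_filter is None else f"(SELECT * FROM {table} WHERE {row_filter})"
    return rule_sql.replace("${data()}", data)

sql = expand_data_reference(
    "SELECT * FROM ${data()} WHERE discount_pct > 30",
    "my_project_id.dim_dataset.dim_currency",
    row_filter="transaction_timestamp >= current_date()",
)
```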

Define data quality rules using the gcloud CLI

The following example YAML file uses some of the same rules as the sample rules using built-in types and the sample custom SQL rules. This YAML file also contains other specifications for the data quality scan, such as filters and sampling percent. When you use the gcloud CLI to create or update a data quality scan, you can use a YAML file like this as input to the --data-quality-spec-file argument.

rules:
- uniquenessExpectation: {}
  column: transaction_id
  dimension: UNIQUENESS
- nonNullExpectation: {}
  column: amount
  dimension: COMPLETENESS
  threshold: 1
- regexExpectation:
    regex: '^[0-9]{8}[a-zA-Z]{16}$'
  column: customer_id
  ignoreNull: true
  dimension: VALIDITY
  threshold: 1
- setExpectation:
    values:
    - 'USD'
    - 'JPY'
    - 'INR'
    - 'GBP'
    - 'CAN'
  column: currency_id
  ignoreNull: true
  dimension: VALIDITY
  threshold: 1
- rangeExpectation:
    minValue: '0'
    maxValue: '100'
  column: discount_pct
  ignoreNull: true
  dimension: VALIDITY
  threshold: 1
- rowConditionExpectation:
    sqlExpression: 0 < `discount_pct` AND `discount_pct` < 100
  column: discount_pct
  dimension: VALIDITY
  threshold: 1
- rowConditionExpectation:
    sqlExpression: currency_id in (select id from `my_project_id.dim_dataset.dim_currency`)
  column: currency_id
  dimension: VALIDITY
  threshold: 1
- tableConditionExpectation:
    sqlExpression: 30 < avg(discount_pct) AND avg(discount_pct) < 50
  dimension: VALIDITY
- rowConditionExpectation:
    sqlExpression: TIMESTAMP(transaction_timestamp) < CURRENT_TIMESTAMP()
  column: transaction_timestamp
  dimension: VALIDITY
  threshold: 1
- sqlAssertion:
    sqlStatement: SELECT * FROM `my_project_id.dim_dataset.dim_currency` WHERE discount_pct > 100
  dimension: VALIDITY
debugQueries:
- sqlStatement: SELECT MAX(discount_pct) FROM `my_project_id.dim_dataset.dim_currency`
samplingPercent: 50
rowFilter: discount_pct > 100
postScanActions:
  bigqueryExport:
    resultsTable: projects/my_project_id/datasets/dim_dataset/tables/dim_currency
  notificationReport:
    recipients:
      emails:
      - '222larabrown@gmail.com'
      - 'cloudysanfrancisco@gmail.com'
    scoreThresholdTrigger:
      scoreThreshold: 50
    jobFailureTrigger: {}
    jobEndTrigger: {}
catalogPublishingEnabled: true
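Before passing a spec file to the gcloud CLI, you might sanity-check its structure locally. The following Python sketch mirrors the field names used in the YAML spec above as a plain dict; the validator itself, and the dimension list, are local assumptions for illustration, not part of any Google SDK:

```python
# Hypothetical local validator for a data quality spec, mirroring the YAML
# field names. The dimension list below is an assumption and may be incomplete.
KNOWN_EXPECTATIONS = {
    "uniquenessExpectation", "nonNullExpectation", "regexExpectation",
    "setExpectation", "rangeExpectation", "rowConditionExpectation",
    "tableConditionExpectation", "sqlAssertion",
}
KNOWN_DIMENSIONS = {"COMPLETENESS", "UNIQUENESS", "VALIDITY", "ACCURACY",
                    "CONSISTENCY", "FRESHNESS", "VOLUME"}

def validate_spec(spec):
    """Returns a list of human-readable problems; empty means the spec looks sane."""
    errors = []
    for i, rule in enumerate(spec.get("rules", [])):
        expectations = KNOWN_EXPECTATIONS & set(rule)
        if len(expectations) != 1:
            errors.append(f"rule {i}: expected exactly one expectation type")
        if rule.get("dimension") not in KNOWN_DIMENSIONS:
            errors.append(f"rule {i}: unknown dimension {rule.get('dimension')!r}")
        threshold = rule.get("threshold", 1)
        if not 0 <= threshold <= 1:
            errors.append(f"rule {i}: threshold must be between 0 and 1")
    return errors
```

For example, a spec whose only rule is {"nonNullExpectation": {}, "column": "amount", "dimension": "COMPLETENESS", "threshold": 1} validates cleanly.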

Create a data quality scan

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click Create data quality scan.

  3. In the Define scan window, fill in the following fields:

    1. Optional: Enter a Display name.

    2. Enter an ID. See the resource naming conventions.

    3. Optional: Enter a Description.

    4. In the Table field, click Browse. Choose the table to scan, and then click Select. Only standard BigQuery tables are supported.

      For tables in multi-region datasets, choose a region where to create the data scan.

      To browse the tables organized within Dataplex Universal Catalog lakes, click Browse within Dataplex Lakes.

    5. In the Scope field, choose Incremental or Entire data.

      • If you choose Incremental: In the Timestamp column field, select a column of type DATE or TIMESTAMP from your BigQuery table that increases as new records are added, and that can be used to identify new records. It can be a column that partitions the table.

    6. To filter your data, select the Filter rows checkbox. Provide a row filter consisting of a valid SQL expression that can be used as a part of a WHERE clause in GoogleSQL syntax. For example, col1 >= 0. The filter can be a combination of multiple column conditions. For example, col1 >= 0 AND col2 < 10.

    7. To sample your data, in the Sampling size list, select a sampling percentage. Choose a percentage value between 0.0% and 100.0%, with up to 3 decimal digits. For larger datasets, choose a lower sampling percentage. For example, for a 1 PB table, if you enter a value between 0.1% and 1.0%, the data quality scan samples between 1 and 10 TB of data. For incremental data scans, the data quality scan applies sampling to the latest increment.

    8. To publish the data quality scan results as Dataplex Universal Catalog metadata, select the Publish results to Dataplex Catalog checkbox.

      You can view the latest scan results on the Data quality tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data quality scan results section of this document.

    9. In the Schedule section, choose one of the following options:

      • Repeat: Run the data quality scan on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scan runs and at what time. If you choose custom, use cron format to specify the schedule.

      • On-demand: Run the data quality scan on demand.

      • One-time: Run the data quality scan once now, and remove the scan after the time-to-live period.

      • Time to live: The time-to-live value is the time span between when the scan is executed and when the scan is deleted. A data quality scan without a specified time-to-live is automatically deleted 24 hours after its execution. The time-to-live can range from 0 seconds (immediate deletion) to 365 days.

    10. Click Continue.

  4. In the Data quality rules window, define the rules to configure for this data quality scan.

    1. Click Add rules, and then choose from the following options.

      • Profile based recommendations: Build rules from the recommendations based on an existing data profiling scan.

        1. Choose columns: Select the columns to get recommended rules for.

        2. Choose scan project: If the data profiling scan is in a different project than the project where you are creating the data quality scan, then select the project to pull profile scans from.

        3. Choose profile results: Select one or more profile results and then click OK. This populates a list of suggested rules that you can use as a starting point.

        4. Select the checkbox for the rules that you want to add, and then click Select. Once selected, the rules are added to your current rule list. Then, you can edit the rules.

      • Built-in rule types: Build rules from predefined rules. See the list of predefined rules.

        1. Choose columns: Select the columns to select rules for.

        2. Choose rule types: Select the rule types that you want to choose from, and then click OK. The rule types that appear depend on the columns that you selected.

        3. Select the checkbox for the rules that you want to add, and then click Select. Once selected, the rules are added to your current rules list. Then, you can edit the rules.

      • SQL row check rule: Create a custom SQL rule to apply to each row.

        1. In Dimension, choose one dimension.

        2. In Passing threshold, choose a percentage of records that must pass the check.

        3. In Column name, choose a column.

        4. In the Provide a SQL expression field, enter a SQL expression that evaluates to a boolean true (pass) or false (fail). For more information, see Supported custom SQL rule types and the examples in Define data quality rules.

        5. Click Add.

      • SQL aggregate check rule: Create a custom SQL table condition rule.

        1. In Dimension, choose one dimension.

        2. In Column name, choose a column.

        3. In the Provide a SQL expression field, enter a SQL expression that evaluates to a boolean true (pass) or false (fail). For more information, see Supported custom SQL rule types and the examples in Define data quality rules.

        4. Click Add.

      • SQL assertion rule: Create a custom SQL assertion rule to check for an invalid state of the data.

        1. In Dimension, choose one dimension.

        2. Optional: In Column name, choose a column.

        3. In the Provide a SQL statement field, enter a SQL statement that returns rows that match the invalid state. If any rows are returned, this rule fails. Omit the trailing semicolon from the SQL statement. For more information, see Supported custom SQL rule types and the examples in Define data quality rules.

        4. Click Add.

    2. Optional: For any data quality rule, you can assign a custom rule name to use for monitoring and alerting, and a description. To do this, edit a rule and specify the following details:

      • Rule name: Enter a custom rule name with up to 63 characters. The rule name can include letters (a-z, A-Z), digits (0-9), and hyphens (-), and must start with a letter and end with a number or a letter.
      • Description: Enter a rule description with a maximum length of 1,024 characters.
    3. Repeat the previous steps to add additional rules to the data quality scan. When finished, click Continue.

  5. Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:

    1. In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data quality scan results.

    2. In the BigQuery table field, specify the table to store the data quality scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.

      Note: You can use the same results table for multiple data quality scans.
  6. Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.

  7. Optional: Set up email notification reports to alert people about the status and results of a data quality scan job. In the Notification report section, click Add email ID and enter up to five email addresses. Then, select the scenarios that you want to send reports for:

  8. Click Create.

    After the scan is created, you can run it at any time by clicking Run now.

gcloud

To create a data quality scan, use the gcloud dataplex datascans create data-quality command.

If the source data is organized in a Dataplex Universal Catalog lake, include the --data-source-entity flag:

gcloud dataplex datascans create data-quality DATASCAN \
    --location=LOCATION \
    --data-quality-spec-file=DATA_QUALITY_SPEC_FILE \
    --data-source-entity=DATA_SOURCE_ENTITY

If the source data isn't organized in a Dataplex Universal Catalog lake, include the --data-source-resource flag:

gcloud dataplex datascans create data-quality DATASCAN \
    --location=LOCATION \
    --data-quality-spec-file=DATA_QUALITY_SPEC_FILE \
    --data-source-resource=DATA_SOURCE_RESOURCE

Replace the following variables:

  • DATASCAN: The name of the data quality scan.
  • LOCATION: The Google Cloud region in which to create the data quality scan.
  • DATA_QUALITY_SPEC_FILE: The path to the JSON or YAML file containing the specifications for the data quality scan. The file can be a local file or a Cloud Storage path with the prefix gs://. Use this file to specify the data quality rules for the scan. You can also specify additional details in this file, such as filters, sampling percent, and post-scan actions like exporting to BigQuery or sending email notification reports. See the documentation for JSON representation and the example YAML representation.
  • DATA_SOURCE_ENTITY: The Dataplex Universal Catalog entity that contains the data for the data quality scan. For example, projects/test-project/locations/test-location/lakes/test-lake/zones/test-zone/entities/test-entity.
  • DATA_SOURCE_RESOURCE: The name of the resource that contains the data for the data quality scan. For example, //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table.
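As a purely illustrative sketch, the command above with the variables filled in can be assembled programmatically. All values below are placeholders, and the helper is hypothetical:

```python
import shlex

# Hypothetical helper: assemble the gcloud command from the variables listed
# above. shlex.join quotes any argument that needs shell escaping.
def build_create_command(datascan, location, spec_file, data_source_resource):
    args = [
        "gcloud", "dataplex", "datascans", "create", "data-quality", datascan,
        f"--location={location}",
        f"--data-quality-spec-file={spec_file}",
        f"--data-source-resource={data_source_resource}",
    ]
    return shlex.join(args)

cmd = build_create_command(
    "transactions-dq",         # placeholder scan name
    "us-central1",             # placeholder region
    "gs://my-bucket/dq.yaml",  # placeholder spec file
    "//bigquery.googleapis.com/projects/p/datasets/d/tables/t",
)
```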

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dataplex.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for CreateDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        CreateDataScanRequest request = new CreateDataScanRequest
        {
            ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            DataScan = new DataScan(),
            DataScanId = "",
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataScan, OperationMetadata> response = dataScanServiceClient.CreateDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataScan result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceCreateDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataScan retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.CreateDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#CreateDataScanRequest.
	}
	op, err := c.CreateDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.CreateDataScanRequest;
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.LocationName;

public class SyncCreateDataScan {

  public static void main(String[] args) throws Exception {
    syncCreateDataScan();
  }

  public static void syncCreateDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      CreateDataScanRequest request =
          CreateDataScanRequest.newBuilder()
              .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
              .setDataScan(DataScan.newBuilder().build())
              .setDataScanId("dataScanId1260787906")
              .setValidateOnly(true)
              .build();
      DataScan response = dataScanServiceClient.createDataScanAsync(request).get();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Node.js API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Copyright 2026 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// ** This file is automatically generated by gapic-generator-typescript. **
// ** https://github.com/googleapis/gapic-generator-typescript **
// ** All changes to this file may be overwritten. **

'use strict';

function main(parent, dataScan, dataScanId) {
  /**
   * This snippet has been automatically generated and should be regarded as a code template only.
   * It will require modifications to work.
   * It may require correct/in-range values for request initialization.
   * TODO(developer): Uncomment these variables before running the sample.
   */
  /**
   *  Required. The resource name of the parent location:
   *  `projects/{project}/locations/{location_id}`
   *  where `project` refers to a *project_id* or *project_number* and
   *  `location_id` refers to a Google Cloud region.
   */
  // const parent = 'abc123'
  /**
   *  Required. DataScan resource.
   */
  // const dataScan = {}
  /**
   *  Required. DataScan identifier.
   *  * Must contain only lowercase letters, numbers and hyphens.
   *  * Must start with a letter.
   *  * Must end with a number or a letter.
   *  * Must be between 1-63 characters.
   *  * Must be unique within the customer project / location.
   */
  // const dataScanId = 'abc123'
  /**
   *  Optional. Only validate the request, but do not perform mutations.
   *  The default is `false`.
   */
  // const validateOnly = true

  // Imports the Dataplex library
  const {DataScanServiceClient} = require('@google-cloud/dataplex').v1;

  // Instantiates a client
  const dataplexClient = new DataScanServiceClient();

  async function callCreateDataScan() {
    // Construct request
    const request = {
      parent,
      dataScan,
      dataScanId,
    };

    // Run request
    const [operation] = await dataplexClient.createDataScan(request);
    const [response] = await operation.promise();
    console.log(response);
  }

  callCreateDataScan();
}

process.on('unhandledRejection', err => {
  console.error(err.message);
  process.exitCode = 1;
});
main(...process.argv.slice(2));

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_create_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    data_scan = dataplex_v1.DataScan()
    data_scan.data_quality_spec.rules.dimension = "dimension_value"
    data_scan.data.entity = "entity_value"

    request = dataplex_v1.CreateDataScanRequest(
        parent="parent_value",
        data_scan=data_scan,
        data_scan_id="data_scan_id_value",
    )

    # Make the request
    operation = client.create_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the create_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#create_data_scan.
#
def create_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::CreateDataScanRequest.new

  # Call the create_data_scan method.
  result = client.create_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

REST

To create a data quality scan, use the dataScans.create method.

The following request creates a one-time data quality scan:

POST https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans?data_scan_id=DATASCAN_ID

{
  "data": {
    "resource": "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  },
  "type": "DATA_QUALITY",
  "executionSpec": {
    "trigger": {
      "oneTime": {
        "ttl_after_scan_completion": "120s"
      }
    }
  },
  "dataQualitySpec": {
    "rules": [
      {
        "nonNullExpectation": {},
        "column": "COLUMN_NAME",
        "dimension": "DIMENSION",
        "threshold": 1
      }
    ]
  }
}

Replace the following:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region in which to create the data quality scan.
  • DATASCAN_ID: The ID of the data quality scan.
  • DATASET_ID: The ID of the BigQuery dataset.
  • TABLE_ID: The ID of the BigQuery table.
  • COLUMN_NAME: The name of the column that the rule applies to.
  • DIMENSION: The dimension for the rule, for example, VALIDITY.
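If you prefer to assemble the request body programmatically before sending it, the JSON above maps directly onto a plain dictionary. The following is a minimal sketch, not an official client sample; the project, dataset, table, and column values are placeholders:

```python
import json


def build_dq_scan_body(project_id, dataset_id, table_id, column, dimension):
    """Build the request body for a one-time data quality scan.

    Mirrors the JSON request shown above; all identifiers are
    caller-supplied placeholders.
    """
    return {
        "data": {
            "resource": (
                f"//bigquery.googleapis.com/projects/{project_id}"
                f"/datasets/{dataset_id}/tables/{table_id}"
            )
        },
        "type": "DATA_QUALITY",
        "executionSpec": {
            "trigger": {"oneTime": {"ttl_after_scan_completion": "120s"}}
        },
        "dataQualitySpec": {
            "rules": [
                {
                    "nonNullExpectation": {},
                    "column": column,
                    "dimension": dimension,
                    "threshold": 1,
                }
            ]
        },
    }


body = build_dq_scan_body("my-project", "my_dataset", "my_table", "id", "VALIDITY")
print(json.dumps(body, indent=2))
```

You can then POST the serialized body to the dataScans endpoint with any authenticated HTTP client.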

If you want to build rules for the data quality scan by using rule recommendations that are based on the results of a data profiling scan, get the recommendations by calling the dataScans.jobs.generateDataQualityRules method on the data profiling scan.

Note: If your BigQuery table is configured with the Require partition filter option set to true, use the BigQuery partition column as the data quality scan row filter or timestamp column.

Export table schema

To export the data quality scan results to an existing BigQuery table, make sure that it is compatible with the following table schema:

| Column name | Column data type | Sub field name (if applicable) | Sub field data type | Mode | Example |
| --- | --- | --- | --- | --- | --- |
| data_quality_scan | struct/record | resource_name | string | nullable | //dataplex.googleapis.com/projects/test-project/locations/europe-west2/datascans/test-datascan |
| | | project_id | string | nullable | dataplex-back-end-dev-project |
| | | location | string | nullable | us-central1 |
| | | data_scan_id | string | nullable | test-datascan |
| | | display_name | string | nullable | datascan-display-name |
| data_source | struct/record | resource_name | string | nullable | Entity case: //dataplex.googleapis.com/projects/dataplex-back-end-dev-project/locations/europe-west2/lakes/a0-datascan-test-lake/zones/a0-datascan-test-zone/entities/table1. Table case: //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table |
| | | dataplex_entity_project_id | string | nullable | dataplex-back-end-dev-project |
| | | dataplex_entity_project_number | integer | nullable | 123456789 |
| | | dataplex_lake_id | string | nullable | (Valid only if source is entity) test-lake |
| | | dataplex_zone_id | string | nullable | (Valid only if source is entity) test-zone |
| | | dataplex_entity_id | string | nullable | (Valid only if source is entity) test-entity |
| | | table_project_id | string | nullable | test-project |
| | | table_project_number | integer | nullable | 987654321 |
| | | dataset_id | string | nullable | (Valid only if source is table) test-dataset |
| | | table_id | string | nullable | (Valid only if source is table) test-table |
| data_quality_job_id | string | | | nullable | caeba234-cfde-4fca-9e5b-fe02a9812e38 |
| data_quality_job_configuration | json | trigger | string | nullable | ondemand/schedule |
| | | incremental | boolean | nullable | true/false |
| | | sampling_percent | float | nullable | (0-100) 20.0 (indicates 20%) |
| | | row_filter | string | nullable | col1 >= 0 AND col2 < 10 |
| | | incremental_column | string | nullable | column_name |
| job_labels | json | | | nullable | {"key1":value1} |
| job_start_time | timestamp | | | nullable | 2023-01-01 00:00:00 UTC |
| job_end_time | timestamp | | | nullable | 2023-01-01 00:00:00 UTC |
| job_quality_result | struct/record | passed | boolean | nullable | true/false |
| | | score | float | nullable | 90.8 |
| | | incremental_start | string | nullable | 2023-01-01T00:00:00 |
| | | incremental_end | string | nullable | 2024-01-01T00:00:00 |
| job_dimension_result | json | | | nullable | {"ACCURACY":{"passed":true,"score":100},"CONSISTENCY":{"passed":false,"score":60}} |
| job_rows_scanned | integer | | | nullable | 7500 |
| rule_name | string | | | nullable | test-rule |
| rule_description | string | | | nullable | Test rule description |
| rule_type | string | | | nullable | Range Check |
| rule_evaluation_type | string | | | nullable | Per row |
| rule_column | string | | | nullable | Rule only attached to a certain column |
| rule_dimension | string | | | nullable | UNIQUENESS |
| rule_threshold_percent | float | | | nullable | (0.0-100.0) Rule-threshold-pct in API * 100 |
| rule_parameters | json | | | nullable | {min: 24, max: 5345} |
| rule_passed | boolean | | | nullable | true |
| rule_rows_evaluated | integer | | | nullable | 7400 |
| rule_rows_passed | integer | | | nullable | 3 |
| rule_rows_null | integer | | | nullable | 4 |
| rule_failed_records_query | string | | | nullable | "SELECT * FROM `test-project.test-dataset.test-table` WHERE (NOT((`cTime` >= '15:31:38.776361' and `cTime` <= '19:23:53.754823') IS TRUE));" |
| created_on | timestamp | | | nullable | 2023-01-01 00:00:00 UTC |
| last_updated | timestamp | | | nullable | 2023-01-01 00:00:00 UTC |
| rule_assertion_row_count | integer | | | nullable | 10 |
| debug_queries | struct/record | description | string | nullable | Test debug query description |
| | | sql_statement | string | nullable | SELECT MIN(col1) AS min_col1, AVG(col1) FROM ${data()} |
| debug_query_results | struct/record | | | repeated | [{"name": "min_col1", "type": "INTEGER", "value": "5"}, {"type": "FLOAT", "value": "7"}] |
| | | name | string | nullable | The name of the query result column, like min_col1 |
| | | type | string | nullable | The type of the query result column, like INTEGER |
| | | value | string | nullable | The value of the query result column, like 5 |
Note: The rule_assertion_row_count column is only applicable for SQL assertion rules.
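After results land in the export table, failed rules can be pulled with ordinary SQL against the schema above. The following is a sketch that builds such a query; the helper function and the table name test-project.test-dataset.dq_results are ours, not part of the product:

```python
def failed_rules_query(results_table):
    """Build a SQL query that lists failed rules, most recent jobs first.

    Column names come from the export schema documented above; the
    results_table value is a caller-supplied placeholder.
    """
    return f"""SELECT
  data_quality_scan.data_scan_id,
  job_start_time,
  rule_name,
  rule_dimension,
  rule_rows_evaluated,
  rule_rows_passed,
  rule_failed_records_query
FROM `{results_table}`
WHERE rule_passed = FALSE
ORDER BY job_start_time DESC"""


query = failed_rules_query("test-project.test-dataset.dq_results")
print(query)
```

You could run this query in the BigQuery console or pass it to a BigQuery client; the rule_failed_records_query column in each row gives a further query that returns the individual failing records.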

When you configure BigQueryExport for a data quality scan job, follow these guidelines:

  • For the resultsTable field, use the format: //bigquery.googleapis.com/projects/{project-id}/datasets/{dataset-id}/tables/{table-id}.
  • Use a BigQuery standard table.
  • If the table doesn't exist when the scan is created or updated, Dataplex Universal Catalog creates the table for you.
  • By default, the table is partitioned on the job_start_time column daily.
  • If you want the table to be partitioned in other configurations, or if you don't want partitioning, recreate the table with the required schema and configuration, and then provide the pre-created table as the results table.
  • Make sure the results table is in the same location as the source table.
  • If VPC-SC is configured on the project, then the results table must be in the same VPC-SC perimeter as the source table.
  • If the table is modified while a scan is running, the currently running job exports to the previous results table, and the table change takes effect from the next scan job.
  • Don't modify the table schema. If you need customized columns, create a view on the table.
  • To reduce costs, set an expiration on the partition based on your use case. For more information, see how to set the partition expiration.
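A quick way to catch a malformed resultsTable value before creating or updating a scan is to check it against the expected format. This is a sketch we wrote for illustration, not part of any client library:

```python
import re

# Expected shape per the guidelines above:
# //bigquery.googleapis.com/projects/{project-id}/datasets/{dataset-id}/tables/{table-id}
RESULTS_TABLE_PATTERN = re.compile(
    r"^//bigquery\.googleapis\.com"
    r"/projects/[^/]+/datasets/[^/]+/tables/[^/]+$"
)


def is_valid_results_table(path):
    """Return True if path matches the documented resultsTable format."""
    return bool(RESULTS_TABLE_PATTERN.match(path))


print(is_valid_results_table(
    "//bigquery.googleapis.com/projects/my-project/datasets/dq/tables/results"))
print(is_valid_results_table(
    "projects/my-project/datasets/dq/tables/results"))
```

Validating the path locally avoids a round trip to the API only to get an INVALID_ARGUMENT error back.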

Run a data quality scan

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the data quality scan to run.

  3. ClickRun now.

gcloud

To run a data quality scan, use the gcloud dataplex datascans run command:

gcloud dataplex datascans run DATASCAN \
    --location=LOCATION

Replace the following variables:

  • LOCATION: The Google Cloud region in which the data quality scan was created.
  • DATASCAN: The name of the data quality scan.

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for RunDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void RunDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        RunDataScanRequest request = new RunDataScanRequest
        {
            DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
        };
        // Make the request
        RunDataScanResponse response = dataScanServiceClient.RunDataScan(request);
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.RunDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#RunDataScanRequest.
	}
	resp, err := c.RunDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.RunDataScanRequest;
import com.google.cloud.dataplex.v1.RunDataScanResponse;

public class SyncRunDataScan {

  public static void main(String[] args) throws Exception {
    syncRunDataScan();
  }

  public static void syncRunDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      RunDataScanRequest request =
          RunDataScanRequest.newBuilder()
              .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .build();
      RunDataScanResponse response = dataScanServiceClient.runDataScan(request);
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_run_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.RunDataScanRequest(
        name="name_value",
    )

    # Make the request
    response = client.run_data_scan(request=request)

    # Handle the response
    print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the run_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#run_data_scan.
#
def run_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::RunDataScanRequest.new

  # Call the run_data_scan method.
  result = client.run_data_scan request

  # The returned object is of type Google::Cloud::Dataplex::V1::RunDataScanResponse.
  p result
end

REST

To run a data quality scan, use the dataScans.run method.

Note: Run isn't supported for data quality scans that are on a one-time schedule.

View the data quality scan results

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the name of a data quality scan.

    • The Overview section displays information about the most recent jobs, including when the scan was run, the number of records scanned in each job, whether all the data quality checks passed, and, if there were failures, the number of data quality checks that failed.

    • The Data quality scan configuration section displays details about the scan.

  3. To see detailed information about a job, such as the data quality scores that indicate the percentage of rules that passed, which rules failed, and the job logs, click the Jobs history tab. Then, click a job ID.

Note: If you exported the scan results to a BigQuery table, then you can also access the scan results from the table. The data quality scores are available if you published the scan results as Dataplex Universal Catalog metadata.

gcloud

To view the results of a data quality scan job, use the gcloud dataplex datascans jobs describe command:

gcloud dataplex datascans jobs describe JOB \
    --location=LOCATION \
    --datascan=DATASCAN \
    --view=FULL

Replace the following variables:

  • JOB: The job ID of the data quality scan job.
  • LOCATION: The Google Cloud region in which the data quality scan was created.
  • DATASCAN: The name of the data quality scan that the job belongs to.
  • --view=FULL: To see the scan job result, specify FULL.

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for GetDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void GetDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        GetDataScanRequest request = new GetDataScanRequest
        {
            DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            View = GetDataScanRequest.Types.DataScanView.Unspecified,
        };
        // Make the request
        DataScan response = dataScanServiceClient.GetDataScan(request);
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.GetDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#GetDataScanRequest.
	}
	resp, err := c.GetDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.GetDataScanRequest;

public class SyncGetDataScan {

  public static void main(String[] args) throws Exception {
    syncGetDataScan();
  }

  public static void syncGetDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      GetDataScanRequest request =
          GetDataScanRequest.newBuilder()
              .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .build();
      DataScan response = dataScanServiceClient.getDataScan(request);
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_get_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.GetDataScanRequest(
        name="name_value",
    )

    # Make the request
    response = client.get_data_scan(request=request)

    # Handle the response
    print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the get_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#get_data_scan.
#
def get_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::GetDataScanRequest.new

  # Call the get_data_scan method.
  result = client.get_data_scan request

  # The returned object is of type Google::Cloud::Dataplex::V1::DataScan.
  p result
end

REST

To view the results of a data quality scan, use the dataScans.get method.

View published results

If the data quality scan results are published as Dataplex Universal Catalog metadata, then you can see the latest scan results on the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console, on the source table's Data quality tab.

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Search page.

    Go to Search

  2. Search for and then select the table.

  3. Click the Data quality tab.

    The latest published results are displayed.

    Note: Published results might not be available if a scan is running for the first time.

View historical scan results

Dataplex Universal Catalog saves the data quality scan history of the last 300 jobs or for the past year, whichever occurs first.

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the name of a data quality scan.

  3. Click theJobs history tab.

    The Jobs history tab provides information about past jobs, such as the number of records scanned in each job, the job status, the time the job was run, and whether each rule passed or failed.

  4. To view detailed information about a job, click any of the jobs in the Job ID column.

gcloud

To view historical data quality scan jobs, use the gcloud dataplex datascans jobs list command:

gcloud dataplex datascans jobs list \
    --location=LOCATION \
    --datascan=DATASCAN

Replace the following variables:

  • LOCATION: The Google Cloud region in which the data quality scan was created.
  • DATASCAN: The name of the data quality scan to view historical jobs for.

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax;
using Google.Cloud.Dataplex.V1;
using System;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for ListDataScanJobs</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ListDataScanJobsRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        ListDataScanJobsRequest request = new ListDataScanJobsRequest
        {
            ParentAsDataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            Filter = "",
        };
        // Make the request
        PagedEnumerable<ListDataScanJobsResponse, DataScanJob> response = dataScanServiceClient.ListDataScanJobs(request);

        // Iterate over all response items, lazily performing RPCs as required
        foreach (DataScanJob item in response)
        {
            // Do something with each item
            Console.WriteLine(item);
        }

        // Or iterate over pages (of server-defined size), performing one RPC per page
        foreach (ListDataScanJobsResponse page in response.AsRawResponses())
        {
            // Do something with each page of items
            Console.WriteLine("A page of results:");
            foreach (DataScanJob item in page)
            {
                // Do something with each item
                Console.WriteLine(item);
            }
        }

        // Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
        int pageSize = 10;
        Page<DataScanJob> singlePage = response.ReadPage(pageSize);
        // Do something with the page of items
        Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
        foreach (DataScanJob item in singlePage)
        {
            // Do something with each item
            Console.WriteLine(item);
        }
        // Store the pageToken, for when the next page is required.
        string nextPageToken = singlePage.NextPageToken;
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.ListDataScanJobsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#ListDataScanJobsRequest.
	}
	it := c.ListDataScanJobs(ctx, req)
	for {
		resp, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			// TODO: Handle error.
		}
		// TODO: Use resp.
		_ = resp

		// If you need to access the underlying RPC response,
		// you can do so by casting the `Response` as below.
		// Otherwise, remove this line. Only populated after
		// first call to Next(). Not safe for concurrent access.
		_ = it.Response.(*dataplexpb.ListDataScanJobsResponse)
	}
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataScanJob;
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.ListDataScanJobsRequest;

public class SyncListDataScanJobs {

  public static void main(String[] args) throws Exception {
    syncListDataScanJobs();
  }

  public static void syncListDataScanJobs() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      ListDataScanJobsRequest request =
          ListDataScanJobsRequest.newBuilder()
              .setParent(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .setPageSize(883849137)
              .setPageToken("pageToken873572522")
              .setFilter("filter-1274492040")
              .build();
      for (DataScanJob element : dataScanServiceClient.listDataScanJobs(request).iterateAll()) {
        // doThingsWith(element);
      }
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_list_data_scan_jobs():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.ListDataScanJobsRequest(
        parent="parent_value",
    )

    # Make the request
    page_result = client.list_data_scan_jobs(request=request)

    # Handle the response
    for response in page_result:
        print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the list_data_scan_jobs call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#list_data_scan_jobs.
#
def list_data_scan_jobs
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::ListDataScanJobsRequest.new

  # Call the list_data_scan_jobs method.
  result = client.list_data_scan_jobs request

  # The returned object is of type Gapic::PagedEnumerable. You can iterate
  # over elements, and API calls will be issued to fetch pages as needed.
  result.each do |item|
    # Each element is of type ::Google::Cloud::Dataplex::V1::DataScanJob.
    p item
  end
end

REST

To view historical data quality scan jobs, use the dataScans.jobs.list method.

Grant access to data quality scan results

To enable the users in your organization to view the scan results, do the following:

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the data quality scan you want to share the results of.

  3. Click thePermissions tab.

  4. Do the following:

    • To grant access to a principal, clickGrant access. Grant theDataplex DataScan DataViewer role to theassociated principal.
    • To remove access from a principal, select the principal that youwant to remove theDataplex DataScan DataViewer role from. ClickRemove access, and then confirm when prompted.

Set alerts in Cloud Logging

To set alerts for data quality failures using the logs in Cloud Logging, follow these steps:

Console

  1. In the Google Cloud console, go to the Cloud Logging Logs Explorer.

    Go to Logs explorer

  2. In the Query window, enter your query. See sample queries.

  3. Click Run Query.

  4. Click Create alert. This opens a side panel.

  5. Enter your alert policy name and click Next.

  6. Review the query.

    1. Click the Preview Logs button to test your query. This shows logs with matching conditions.

    2. Click Next.

  7. Set the time between notifications and click Next.

  8. Define who should be notified for the alert and click Save to create the alert policy.

Alternatively, you can configure and edit your alerts by navigating in the Google Cloud console to Monitoring > Alerting.

gcloud

Not supported.

REST

For more information about how to set alerts in Cloud Logging, see Create a log-based alerting policy by using the Monitoring API.

Sample queries for setting job level or dimension level alerts

  • A sample query to set alerts on overall data quality failures for a data quality scan:

    resource.type="dataplex.googleapis.com/DataScan"
    AND labels."dataplex.googleapis.com/data_scan_state"="SUCCEEDED"
    AND resource.labels.resource_container="projects/112233445566"
    AND resource.labels.datascan_id="a0-test-dec6-dq-3"
    AND NOT jsonPayload.dataQuality.passed=true

  • A sample query to set alerts on data quality failures for a dimension (for example, uniqueness) of a given data quality scan:

    resource.type="dataplex.googleapis.com/DataScan"
    AND labels."dataplex.googleapis.com/data_scan_state"="SUCCEEDED"
    AND resource.labels.resource_container="projects/112233445566"
    AND resource.labels.datascan_id="a0-test-dec6-dq-3"
    AND jsonPayload.dataQuality.dimensionPassed.UNIQUENESS=false

  • A sample query to set alerts on data quality failures for a table.

    • Set alerts on data quality failures for a BigQuery table that isn't organized in a Dataplex Universal Catalog lake:

      resource.type="dataplex.googleapis.com/DataScan"
      AND jsonPayload.dataSource="//bigquery.googleapis.com/projects/test-project/datasets/testdataset/table/chicago_taxi_trips"
      AND labels."dataplex.googleapis.com/data_scan_state"="SUCCEEDED"
      AND resource.labels.resource_container="projects/112233445566"
      AND NOT jsonPayload.dataQuality.passed=true

    • Set alerts on data quality failures for a BigQuery table that's organized in a Dataplex Universal Catalog lake:

      resource.type="dataplex.googleapis.com/DataScan"
      AND jsonPayload.dataSource="projects/test-project/datasets/testdataset/table/chicago_taxi_trips"
      AND labels."dataplex.googleapis.com/data_scan_state"="SUCCEEDED"
      AND resource.labels.resource_container="projects/112233445566"
      AND NOT jsonPayload.dataQuality.passed=true

Sample queries to set per rule alerts

  • A sample query to set alerts on all failing data quality rules with the specified custom rule name for a data quality scan:

    resource.type="dataplex.googleapis.com/DataScan"
    AND jsonPayload.ruleName="custom-name"
    AND jsonPayload.result="FAILED"

  • A sample query to set alerts on all failing data quality rules of a specific evaluation type for a data quality scan:

    resource.type="dataplex.googleapis.com/DataScan"
    AND jsonPayload.evalutionType="PER_ROW"
    AND jsonPayload.result="FAILED"

  • A sample query to set alerts on all failing data quality rules for a column in the table used for a data quality scan:

    resource.type="dataplex.googleapis.com/DataScan"
    AND jsonPayload.column="CInteger"
    AND jsonPayload.result="FAILED"

Troubleshoot a data quality failure

For each job with row-level rules that fail, Dataplex Universal Catalog provides a query to get the failed records. Run this query to see the records that did not match your rule.

Note: The query returns all of the columns of the table, not just the failed column.

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the name of the data quality scan whose records you want to troubleshoot.

  3. Click the Jobs history tab.

  4. Click the job ID of the job that identified data quality failures.

  5. In the job results window that opens, in the Rules section, find the column Query to get failed records. Click Copy query to clipboard for the failed rule.

  6. Run the query in BigQuery to see the records that caused the job to fail.

gcloud

Not supported.

REST

  1. To get the job that identified the data quality failures, use the dataScans.get method.

    In the response object, the failingRowsQuery field shows the query.

  2. Run the query in BigQuery to see the records that caused the job to fail.
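The first step can be sketched with curl. The IDs are placeholders, and the view=FULL query parameter, which asks the API to include rule results in the response, is an assumption to verify against the dataScans.get reference.

```shell
# Fetch scan details, including results such as failingRowsQuery (sketch).
# PROJECT_ID, LOCATION, and DATASCAN_ID are placeholders; view=FULL is an
# assumed parameter for including results - confirm it in the API reference.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans/DATASCAN_ID?view=FULL"
```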

Dataplex Universal Catalog also runs the debug query, provided it was included during the rule creation. The debug query results are included in each rule's output. This feature is in Preview.

Console

Not supported.

gcloud

Not supported.

REST

To get the job that identified the data quality failures, use the dataScans.get method. In the response object, the debugQueriesResultSets field shows the results of the debug queries.

Manage data quality scans for a specific table

The steps in this document show how to manage data quality scans across your project by using the Dataplex Universal Catalog Data profiling & quality page in the Google Cloud console.

You can also create and manage data quality scans when working with a specific table. In the Google Cloud console, on the Dataplex Universal Catalog page for the table, use the Data quality tab. Do the following:

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Search page.

    Go to Search

    Search for and then select the table.

  2. Click the Data quality tab.

  3. Depending on whether the table has a data quality scan whose results are published as Dataplex Universal Catalog metadata, you can work with the table's data quality scans in the following ways:

    • Data quality scan results are published: the latest scan results are displayed on the page.

      To manage the data quality scans for this table, click Data quality scan, and then select from the following options:

      • Create new scan: create a new data quality scan. For more information, see the Create a data quality scan section of this document. When you create a scan from a table's details page, the table is preselected.

      • Run now: run the scan.

      • Edit scan configuration: edit settings including the display name, filters, and schedule.

        To edit the data quality rules, on the Data quality tab, click the Rules tab. Click Modify rules. Update the rules and then click Save.

      • Manage scan permissions: control who can access the scan results. For more information, see the Grant access to data quality scan results section of this document.

      • View historical results: view detailed information about previous data quality scan jobs. For more information, see the View data quality scan results and View historical scan results sections of this document.

      • View all scans: view a list of data quality scans that apply to this table.

    • Data quality scan results aren't published: select from the following options:

      • Create data quality scan: create a new data quality scan. For more information, see the Create a data quality scan section of this document. When you create a scan from a table's details page, the table is preselected.

      • View existing scans: view a list of data quality scans that apply to this table.

Update a data quality scan

You can edit various settings for an existing data quality scan, such as the display name, filters, schedule, and data quality rules.

Note: If an existing data quality scan publishes the results to the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console, and you instead want to publish future scan results as Dataplex Universal Catalog metadata, you must edit the scan and re-enable publishing. You might need additional permissions to enable catalog publishing.

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the name of a data quality scan.

  3. To edit settings including the display name, filters, and schedule, click Edit. Edit the values and then click Save.

  4. To edit the data quality rules, on the scan details page, click the Current rules tab. Click Modify rules. Update the rules and then click Save.

gcloud

To update the description of a data quality scan, use the gcloud dataplex datascans update data-quality command:

gcloud dataplex datascans update data-quality DATASCAN \
    --location=LOCATION \
    --description=DESCRIPTION

Replace the following:

  • DATASCAN: The name of the data quality scan to update.
  • LOCATION: The Google Cloud region in which the data quality scan was created.
  • DESCRIPTION: The new description for the data quality scan.

Note: You can update specification fields, such as rules, rowFilter, or samplingPercent, in the data quality specification file. Refer to JSON and YAML representations.
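To update those specification fields, the rules can be kept in a local spec file and passed on the command line. This is a sketch: the --data-quality-spec-file flag name and the my-spec.yaml file are assumptions, so confirm the flag with the command's --help output before relying on it.

```shell
# Update rules, rowFilter, samplingPercent, and other spec fields from a
# local spec file (sketch; verify the flag name with
# `gcloud dataplex datascans update data-quality --help`).
gcloud dataplex datascans update data-quality DATASCAN \
    --location=LOCATION \
    --data-quality-spec-file=my-spec.yaml
```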

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for UpdateDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void UpdateDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        UpdateDataScanRequest request = new UpdateDataScanRequest
        {
            DataScan = new DataScan(),
            UpdateMask = new FieldMask(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataScan, OperationMetadata> response = dataScanServiceClient.UpdateDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataScan result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceUpdateDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataScan retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.UpdateDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#UpdateDataScanRequest.
	}
	op, err := c.UpdateDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.UpdateDataScanRequest;
import com.google.protobuf.FieldMask;

public class SyncUpdateDataScan {

  public static void main(String[] args) throws Exception {
    syncUpdateDataScan();
  }

  public static void syncUpdateDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      UpdateDataScanRequest request =
          UpdateDataScanRequest.newBuilder()
              .setDataScan(DataScan.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setValidateOnly(true)
              .build();
      DataScan response = dataScanServiceClient.updateDataScanAsync(request).get();
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_update_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    data_scan = dataplex_v1.DataScan()
    data_scan.data_quality_spec.rules.dimension = "dimension_value"
    data_scan.data.entity = "entity_value"

    request = dataplex_v1.UpdateDataScanRequest(
        data_scan=data_scan,
    )

    # Make the request
    operation = client.update_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the update_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#update_data_scan.
#
def update_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::UpdateDataScanRequest.new

  # Call the update_data_scan method.
  result = client.update_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

REST

To edit a data quality scan, use the dataScans.patch method.

Note: Update isn't supported for data quality scans that are on a one-time schedule.
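A hedged curl sketch of a patch call follows. The IDs are placeholders; the updateMask query parameter restricts the patch to the listed fields, which is the standard pattern for Google API patch methods.

```shell
# Patch only the description of an existing scan (sketch; replace the
# PROJECT_ID, LOCATION, and DATASCAN_ID placeholders).
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"description": "Updated description"}' \
  "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans/DATASCAN_ID?updateMask=description"
```

The call returns a long-running operation; poll the operation resource until it reports done before reading the updated scan.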

Delete a data quality scan

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.

    Go to Data profiling & quality

  2. Click the scan you want to delete.

  3. Click Delete, and then confirm when prompted.

gcloud

To delete a data quality scan, use the gcloud dataplex datascans delete command:

gcloud dataplex datascans delete DATASCAN \
    --location=LOCATION \
    --async

Replace the following variables:

  • DATASCAN: The name of the data quality scan to delete.
  • LOCATION: The Google Cloud region in which the data quality scan was created.

C#

Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for DeleteDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void DeleteDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        DeleteDataScanRequest request = new DeleteDataScanRequest
        {
            DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            Force = false,
        };
        // Make the request
        Operation<Empty, OperationMetadata> response = dataScanServiceClient.DeleteDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<Empty, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        Empty result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<Empty, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceDeleteDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            Empty retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.DeleteDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#DeleteDataScanRequest.
	}
	op, err := c.DeleteDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	err = op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
}

Java

Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.DeleteDataScanRequest;
import com.google.protobuf.Empty;

public class SyncDeleteDataScan {

  public static void main(String[] args) throws Exception {
    syncDeleteDataScan();
  }

  public static void syncDeleteDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      DeleteDataScanRequest request =
          DeleteDataScanRequest.newBuilder()
              .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .setForce(true)
              .build();
      dataScanServiceClient.deleteDataScanAsync(request).get();
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_delete_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.DeleteDataScanRequest(
        name="name_value",
    )

    # Make the request
    operation = client.delete_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

Ruby

Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.

To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

require "google/cloud/dataplex/v1"

##
# Snippet for the delete_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#delete_data_scan.
#
def delete_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::DeleteDataScanRequest.new

  # Call the delete_data_scan method.
  result = client.delete_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

REST

To delete a data quality scan, use the dataScans.delete method.

Note: Delete isn't supported for data quality scans that are on a one-time schedule.
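As a sketch, the delete call looks like this with curl; the IDs are placeholders and the URL follows the standard Dataplex v1 REST pattern.

```shell
# Delete a data quality scan (sketch; replace the PROJECT_ID, LOCATION,
# and DATASCAN_ID placeholders). Returns a long-running operation.
curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans/DATASCAN_ID"
```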

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-20 UTC.