Profile your data
This document explains how to use data profile scans to better understand your data. BigQuery uses Dataplex Universal Catalog to analyze the statistical characteristics of your data, such as average values, unique values, and maximum values. Dataplex Universal Catalog also uses this information to recommend rules for data quality checks.
For more information about data profiling, see About data profiling.
Tip: The steps in this document show how to manage data profile scans across your project. You can also create and manage data profile scans when working with a specific table. For more information, see the Manage data profile scans for a specific table section of this document.
Before you begin
Enable the Dataplex API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles
To get the permissions that you need to create and manage data profile scans, ask your administrator to grant you the following IAM roles on your resource, such as the project or table:
- To create, run, update, and delete data profile scans: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) role on the project containing the data scan.
- To allow Dataplex Universal Catalog to run data profile scans against BigQuery data, grant the following roles to the Dataplex Universal Catalog service account:
  - BigQuery Job User (roles/bigquery.jobUser) role on the project running the scan
  - BigQuery Data Viewer (roles/bigquery.dataViewer) role on the tables being scanned
- To run data profile scans for BigQuery external tables that use Cloud Storage data: grant the Dataplex Universal Catalog service account the Storage Object Viewer (roles/storage.objectViewer) and Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) roles on the Cloud Storage bucket.
- To view data profile scan results, jobs, and history: Dataplex DataScan Viewer (roles/dataplex.dataScanViewer) role on the project containing the data scan.
- To export data profile scan results to a BigQuery table: BigQuery Data Editor (roles/bigquery.dataEditor) role on the table.
- To publish data profile scan results to Dataplex Universal Catalog: Dataplex Catalog Editor (roles/dataplex.catalogEditor) role on the @bigquery entry group.
- To view published data profile scan results in BigQuery on the Data profile tab: BigQuery Data Viewer (roles/bigquery.dataViewer) role on the table.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Required permissions
If you use custom roles, you need to grant the following IAM permissions:
- To create, run, update, and delete data profile scans:
  - dataplex.datascans.create on project: Create a DataScan
  - dataplex.datascans.update on data scan: Update the description of a DataScan
  - dataplex.datascans.delete on data scan: Delete a DataScan
  - dataplex.datascans.run on data scan: Run a DataScan
  - dataplex.datascans.get on data scan: View DataScan details excluding results
  - dataplex.datascans.list on project: List DataScans
  - dataplex.dataScanJobs.get on data scan job: Read DataScan job resources
  - dataplex.dataScanJobs.list on data scan: List DataScan job resources in a project
- To allow Dataplex Universal Catalog to run data profile scans against BigQuery data:
  - bigquery.jobs.create on project: Run jobs
  - bigquery.tables.get on table: Get table metadata
  - bigquery.tables.getData on table: Get table data
- To run data profile scans for BigQuery external tables that use Cloud Storage data:
  - storage.buckets.get on bucket: Read bucket metadata
  - storage.objects.get on object: Read object data
- To view data profile scan results, jobs, and history:
  - dataplex.datascans.getData on data scan: View DataScan details including results
  - dataplex.datascans.list on project: List DataScans
  - dataplex.dataScanJobs.get on data scan job: Read DataScan job resources
  - dataplex.dataScanJobs.list on data scan: List DataScan job resources in a project
- To export data profile scan results to a BigQuery table:
  - bigquery.tables.create on dataset: Create tables
  - bigquery.tables.updateData on table: Write data to tables
- To publish data profile scan results to Dataplex Universal Catalog:
  - dataplex.entryGroups.useDataProfileAspect on entry group: Allows Dataplex Universal Catalog data profile scans to save their results to Dataplex Universal Catalog
  - Additionally, you need one of the following permissions:
    - bigquery.tables.update on table: Update table metadata
    - dataplex.entries.update on entry: Update entries
- To view published data profile results for a table in BigQuery or Dataplex Universal Catalog:
  - bigquery.tables.get on table: Get table metadata
  - bigquery.tables.getData on table: Get table data
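When you assemble a custom role, it can help to compare a candidate permission list against the tasks listed above. The following is a minimal sketch in plain Python. The permission names are copied from the list above; the task keys, `REQUIRED_PERMISSIONS`, and `missing_permissions` are hypothetical names introduced for illustration and aren't part of any Google Cloud API. The sketch only compares strings and doesn't call IAM.

```python
# Permissions per task, copied from the list in this document.
# The dictionary keys are illustrative labels, not IAM identifiers.
REQUIRED_PERMISSIONS = {
    "manage_scans": {
        "dataplex.datascans.create",
        "dataplex.datascans.update",
        "dataplex.datascans.delete",
        "dataplex.datascans.run",
        "dataplex.datascans.get",
        "dataplex.datascans.list",
        "dataplex.dataScanJobs.get",
        "dataplex.dataScanJobs.list",
    },
    "view_results": {
        "dataplex.datascans.getData",
        "dataplex.datascans.list",
        "dataplex.dataScanJobs.get",
        "dataplex.dataScanJobs.list",
    },
    "export_results": {
        "bigquery.tables.create",
        "bigquery.tables.updateData",
    },
}


def missing_permissions(task, granted):
    """Return the sorted permissions that `task` needs but `granted` lacks."""
    return sorted(REQUIRED_PERMISSIONS[task] - set(granted))


if __name__ == "__main__":
    # A hypothetical custom role that can read scan details but not results.
    candidate_role = [
        "dataplex.datascans.get",
        "dataplex.datascans.list",
        "dataplex.dataScanJobs.get",
        "dataplex.dataScanJobs.list",
    ]
    print(missing_permissions("view_results", candidate_role))
```

An empty list means the candidate role covers the task as described in this document.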
If a table uses BigQuery row-level security, then Dataplex Universal Catalog can only scan rows visible to the Dataplex Universal Catalog service account. To allow Dataplex Universal Catalog to scan all rows, add its service account to a row filter where the predicate is TRUE.
If a table uses BigQuery column-level security, then Dataplex Universal Catalog requires access to scan protected columns. To grant access, give the Dataplex Universal Catalog service account the Data Catalog Fine-Grained Reader (roles/datacatalog.fineGrainedReader) role on all policy tags used in the table. The user creating or updating a data scan also needs permissions on protected columns.
Grant roles to the Dataplex Universal Catalog service account
To run data profile scans, Dataplex Universal Catalog uses a service account that requires permissions to run BigQuery jobs and read BigQuery table data. To grant the required roles, follow these steps:
Get the Dataplex Universal Catalog service account email address. If you haven't created a data profile or data quality scan in this project before, run the following gcloud command to generate the service identity:
    gcloud beta services identity create --service=dataplex.googleapis.com
The command returns the service account email, which has the following format: service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com.
If the service account already exists, you can find its email by viewing principals with the Dataplex name on the IAM page in the Google Cloud console.
Grant the service account the BigQuery Job User (roles/bigquery.jobUser) role on your project. This role lets the service account run BigQuery jobs for the scan.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.
Grant the service account the BigQuery Data Viewer (roles/bigquery.dataViewer) role for each table that you want to profile. This role grants read-only access to the tables.
    gcloud bigquery tables add-iam-policy-binding DATASET_ID.TABLE_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataViewer"
Replace the following:
- DATASET_ID: the ID of the dataset containing the table.
- TABLE_ID: the ID of the table to profile.
- service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.
Create a data profile scan
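Note: If your BigQuery table is configured with the Require partition filter setting set to true, use the table's partition column as the data profile scan's row filter or timestamp column.
Console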
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click Create data profile scan.
Optional: Enter a Display name.
Enter an ID. See the Resource naming conventions.
Optional: Enter a Description.
In the Table field, click Browse. Choose the table to scan, and then click Select.
For tables in multi-region datasets, choose the region in which to create the data scan.
To browse the tables organized within Dataplex Universal Catalog lakes, click Browse within Dataplex Lakes.
In the Scope field, choose Incremental or Entire data.
- If you choose Incremental data, in the Timestamp column field, select a column of type DATE or TIMESTAMP from your BigQuery table that increases as new records are added, and that can be used to identify new records. For tables partitioned on a column of type DATE or TIMESTAMP, we recommend using the partition column as the timestamp field.
Optional: To filter your data, do any of the following:
To filter by rows, select the Filter rows checkbox. Enter a valid SQL expression that can be used in a WHERE clause in GoogleSQL syntax. For example: col1 >= 0. The filter can be a combination of SQL conditions over multiple columns. For example: col1 >= 0 AND col2 < 10.
To filter by columns, select the Filter columns checkbox.
To include columns in the profile scan, in the Include columns field, click Browse. Select the columns to include, and then click Select.
To exclude columns from the profile scan, in the Exclude columns field, click Browse. Select the columns to exclude, and then click Select.
To apply sampling to your data profile scan, in the Sampling size list, select a sampling percentage. Choose a percentage value that ranges between 0.0% and 100.0% with up to 3 decimal digits.
For larger datasets, choose a lower sampling percentage. For example, for a 1 PB table, if you enter a value between 0.1% and 1.0%, the data profile samples between 1 TB and 10 TB of data.
There must be at least 100 records in the sampled data to return a result.
For incremental data scans, the data profile scan applies sampling to the latest increment.
Optional: Publish the data profile scan results in the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console for the source table. Select the Publish results to BigQuery and Dataplex Catalog checkbox.
You can view the latest scan results in the Data profile tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data profile scan results section of this document.
The publishing option might not be available in the following cases:
- You don't have the required permissions on the table.
- Another data quality scan is set to publish results.
In the Schedule section, choose one of the following options:
Repeat: Run the data profile scan on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scan should run and at what time. If you choose custom, use cron format to specify the schedule.
On-demand: Run the data profile scan on demand.
One-time: Run the data profile scan once now, and remove the scan after the time-to-live period.
Time to live: The time-to-live value defines the duration a data profile scan remains active after execution. A data profile scan without a specified time-to-live is automatically removed after 24 hours. The time-to-live can range from 0 seconds (immediate deletion) to 365 days.
Click Continue.
Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:
In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data profile scan results.
In the BigQuery table field, specify the table to store the data profile scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.
Note: You can use the same results table for multiple data profile scans.
Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.
To create the scan, click Create.
If you set the schedule to on-demand, you can also run the scan now by clicking Run scan.
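The sampling guidance in the steps above comes down to simple arithmetic. The following is a minimal sketch in plain Python that checks whether a percentage fits the stated range (0.0% to 100.0%, up to 3 decimal digits) and estimates how much data a percentage covers. The function names are hypothetical. The estimates assume that the sampled share is proportional to table size, which is an approximation, and the 1 PB example assumes decimal units (1 PB = 1,000 TB).

```python
from decimal import Decimal


def validate_sampling_percent(percent):
    """Return `percent` as a Decimal, or raise ValueError if it is outside
    0.0-100.0 or has more than 3 decimal digits."""
    value = Decimal(str(percent))
    if not (Decimal("0") <= value <= Decimal("100")):
        raise ValueError("sampling percentage must be between 0.0 and 100.0")
    # A negative exponent is the number of digits after the decimal point.
    if value.as_tuple().exponent < -3:
        raise ValueError("sampling percentage allows at most 3 decimal digits")
    return value


def estimated_sample_tb(table_size_tb, percent):
    """Estimate the sampled volume in TB, assuming proportional sampling."""
    value = validate_sampling_percent(percent)
    return Decimal(str(table_size_tb)) * value / Decimal("100")


def meets_minimum_records(table_rows, percent, minimum=100):
    """Check whether the expected sampled row count reaches `minimum`."""
    expected = Decimal(table_rows) * validate_sampling_percent(percent) / Decimal("100")
    return expected >= minimum


if __name__ == "__main__":
    # The 1 PB example from the steps above, expressed as 1,000 TB.
    print(estimated_sample_tb(1000, 0.1), estimated_sample_tb(1000, 1.0))
    print(meets_minimum_records(50_000, 0.1))
```

The last check is only a rough planning aid: the actual number of sampled records can differ from the expected value.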
gcloud
To create a data profile scan, use the gcloud dataplex datascans create data-profile command.
If the source data is organized in a Dataplex Universal Catalog lake, include the --data-source-entity flag:
    gcloud dataplex datascans create data-profile DATASCAN \
        --location=LOCATION \
        --data-source-entity=DATA_SOURCE_ENTITY
If the source data isn't organized in a Dataplex Universal Catalog lake, include the --data-source-resource flag:
    gcloud dataplex datascans create data-profile DATASCAN \
        --location=LOCATION \
        --data-source-resource=DATA_SOURCE_RESOURCE
Replace the following variables:
- DATASCAN: The name of the data profile scan.
- LOCATION: The Google Cloud region in which to create the data profile scan.
- DATA_SOURCE_ENTITY: The Dataplex Universal Catalog entity that contains the data for the data profile scan. For example, projects/test-project/locations/test-location/lakes/test-lake/zones/test-zone/entities/test-entity.
- DATA_SOURCE_RESOURCE: The name of the resource that contains the data for the data profile scan. For example, //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table.
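If you script scan creation, the resource name and the command line can be assembled programmatically. The following is a minimal sketch in plain Python that builds the DATA_SOURCE_RESOURCE value in the format shown above and the matching gcloud argument list. The helper names and the sample values (`test-scan`, `us-central1`) are hypothetical; the sketch prints the command rather than running it.

```python
import shlex


def bigquery_table_resource(project, dataset, table):
    """Build the DATA_SOURCE_RESOURCE value for a BigQuery table."""
    return (
        "//bigquery.googleapis.com/"
        f"projects/{project}/datasets/{dataset}/tables/{table}"
    )


def create_profile_scan_argv(datascan, location, resource):
    """Build the argument list for the gcloud create command shown above."""
    return [
        "gcloud", "dataplex", "datascans", "create", "data-profile", datascan,
        f"--location={location}",
        f"--data-source-resource={resource}",
    ]


if __name__ == "__main__":
    argv = create_profile_scan_argv(
        "test-scan",
        "us-central1",
        bigquery_table_resource("test-project", "test-dataset", "test-table"),
    )
    # Print the command for review. To execute it, the list could be passed
    # to subprocess.run(argv, check=True) on a machine with gcloud installed.
    print(shlex.join(argv))
```

Passing an argument list instead of a single shell string avoids quoting problems when names contain special characters.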
C#
Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    using Google.Api.Gax.ResourceNames;
    using Google.Cloud.Dataplex.V1;
    using Google.LongRunning;

    public sealed partial class GeneratedDataScanServiceClientSnippets
    {
        /// <summary>Snippet for CreateDataScan</summary>
        /// <remarks>
        /// This snippet has been automatically generated and should be regarded as a code template only.
        /// It will require modifications to work:
        /// - It may require correct/in-range values for request initialization.
        /// - It may require specifying regional endpoints when creating the service client as shown in
        ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
        /// </remarks>
        public void CreateDataScanRequestObject()
        {
            // Create client
            DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
            // Initialize request argument(s)
            CreateDataScanRequest request = new CreateDataScanRequest
            {
                ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
                DataScan = new DataScan(),
                DataScanId = "",
                ValidateOnly = false,
            };
            // Make the request
            Operation<DataScan, OperationMetadata> response = dataScanServiceClient.CreateDataScan(request);
            // Poll until the returned long-running operation is complete
            Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
            // Retrieve the operation result
            DataScan result = completedResponse.Result;
            // Or get the name of the operation
            string operationName = response.Name;
            // This name can be stored, then the long-running operation retrieved later by name
            Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceCreateDataScan(operationName);
            // Check if the retrieved long-running operation has completed
            if (retrievedResponse.IsCompleted)
            {
                // If it has completed, then access the result
                DataScan retrievedResult = retrievedResponse.Result;
            }
        }
    }
Go
Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    package main

    import (
        "context"

        dataplex "cloud.google.com/go/dataplex/apiv1"
        dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
    )

    func main() {
        ctx := context.Background()
        // This snippet has been automatically generated and should be regarded as a code template only.
        // It will require modifications to work:
        // - It may require correct/in-range values for request initialization.
        // - It may require specifying regional endpoints when creating the service client as shown in:
        //   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
        c, err := dataplex.NewDataScanClient(ctx)
        if err != nil {
            // TODO: Handle error.
        }
        defer c.Close()

        req := &dataplexpb.CreateDataScanRequest{
            // TODO: Fill request struct fields.
            // See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#CreateDataScanRequest.
        }
        op, err := c.CreateDataScan(ctx, req)
        if err != nil {
            // TODO: Handle error.
        }

        resp, err := op.Wait(ctx)
        if err != nil {
            // TODO: Handle error.
        }
        // TODO: Use resp.
        _ = resp
    }
Java
Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    import com.google.cloud.dataplex.v1.CreateDataScanRequest;
    import com.google.cloud.dataplex.v1.DataScan;
    import com.google.cloud.dataplex.v1.DataScanServiceClient;
    import com.google.cloud.dataplex.v1.LocationName;

    public class SyncCreateDataScan {

      public static void main(String[] args) throws Exception {
        syncCreateDataScan();
      }

      public static void syncCreateDataScan() throws Exception {
        // This snippet has been automatically generated and should be regarded as a code template only.
        // It will require modifications to work:
        // - It may require correct/in-range values for request initialization.
        // - It may require specifying regional endpoints when creating the service client as shown in
        //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
        try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
          CreateDataScanRequest request =
              CreateDataScanRequest.newBuilder()
                  .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
                  .setDataScan(DataScan.newBuilder().build())
                  .setDataScanId("dataScanId1260787906")
                  .setValidateOnly(true)
                  .build();
          DataScan response = dataScanServiceClient.createDataScanAsync(request).get();
        }
      }
    }
Python
Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    # This snippet has been automatically generated and should be regarded as a
    # code template only.
    # It will require modifications to work:
    # - It may require correct/in-range values for request initialization.
    # - It may require specifying regional endpoints when creating the service
    #   client as shown in:
    #   https://googleapis.dev/python/google-api-core/latest/client_options.html
    from google.cloud import dataplex_v1


    def sample_create_data_scan():
        # Create a client
        client = dataplex_v1.DataScanServiceClient()

        # Initialize request argument(s)
        data_scan = dataplex_v1.DataScan()
        # Configure the scan as a data profile scan with default settings.
        data_scan.data_profile_spec = dataplex_v1.DataProfileSpec()
        data_scan.data.entity = "entity_value"

        request = dataplex_v1.CreateDataScanRequest(
            parent="parent_value",
            data_scan=data_scan,
            data_scan_id="data_scan_id_value",
        )

        # Make the request
        operation = client.create_data_scan(request=request)

        print("Waiting for operation to complete...")

        response = operation.result()

        # Handle the response
        print(response)
Ruby
Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    require "google/cloud/dataplex/v1"

    ##
    # Snippet for the create_data_scan call in the DataScanService service
    #
    # This snippet has been automatically generated and should be regarded as a code
    # template only. It will require modifications to work:
    # - It may require correct/in-range values for request initialization.
    # - It may require specifying regional endpoints when creating the service
    #   client as shown in https://cloud.google.com/ruby/docs/reference.
    #
    # This is an auto-generated example demonstrating basic usage of
    # Google::Cloud::Dataplex::V1::DataScanService::Client#create_data_scan.
    #
    def create_data_scan
      # Create a client object. The client can be reused for multiple calls.
      client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

      # Create a request. To set request fields, pass in keyword arguments.
      request = Google::Cloud::Dataplex::V1::CreateDataScanRequest.new

      # Call the create_data_scan method.
      result = client.create_data_scan request

      # The returned object is of type Gapic::Operation. You can use it to
      # check the status of an operation, cancel it, or wait for results.
      # Here is how to wait for a response.
      result.wait_until_done! timeout: 60
      if result.response?
        p result.response
      else
        puts "No response received."
      end
    end
REST
To create a data profile scan, use the dataScans.create method.
Create multiple data profile scans
You can configure data profile scans for multiple tables in a BigQuery dataset at the same time by using the Google Cloud console.
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click Create data profile scan.
Select the Multiple data profile scans option.
Enter an ID prefix. Dataplex Universal Catalog automatically generates scan IDs by using the provided prefix and unique suffixes.
Enter a Description for all of the data profile scans.
In the Dataset field, click Browse. Select a dataset to pick tables from. Click Select.
If the dataset is multi-regional, select a Region in which to create the data profile scans.
Configure the common settings for the scans:
In the Scope field, choose Incremental or Entire data.
Note: If you choose Incremental data, you can select only tables that are partitioned on a column of type DATE or TIMESTAMP.
To apply sampling to the data profile scans, in the Sampling size list, select a sampling percentage.
Choose a percentage value between 0.0% and 100.0% with up to 3 decimal digits.
Optional: Publish the data profile scan results in the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console for the source table. Select the Publish results to BigQuery and Dataplex Catalog checkbox.
You can view the latest scan results in the Data profile tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data profile scan results section of this document.
Note: You must choose tables that don't have any existing scans publishing their results.
In the Schedule section, choose one of the following options:
Repeat: Run the data profile scans on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scans should run and at what time. If you choose custom, use cron format to specify the schedule.
On-demand: Run the data profile scans on demand.
Click Continue.
In the Choose tables field, click Browse. Choose one or more tables to scan, and then click Select.
Click Continue.
Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:
In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data profile scan results.
In the BigQuery table field, specify the table to store the data profile scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.
Dataplex Universal Catalog uses the same results table for all of the data profile scans.
Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.
To create the scans, click Create.
If you set the schedule to on-demand, you can also run the scans now by clicking Run scan.
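The console flow above is the documented way to create scans for several tables at once. As a rough scripted alternative, one gcloud create command per table can be generated in a loop. The following is a minimal sketch in plain Python under several assumptions: the ID scheme (prefix plus a normalized table name) is hypothetical and is not how Dataplex Universal Catalog generates its own suffixes, scan IDs are assumed to accept lowercase letters, digits, and hyphens, and the function names are introduced for illustration only.

```python
import re


def scan_id_for_table(prefix, table_id):
    """Derive a scan ID from a prefix and a table ID (hypothetical scheme)."""
    # Lowercase the table ID and collapse other characters into hyphens.
    suffix = re.sub(r"[^a-z0-9]+", "-", table_id.lower()).strip("-")
    return f"{prefix}-{suffix}"


def create_commands(prefix, location, project, dataset, table_ids):
    """Build one gcloud create argument list per table."""
    commands = []
    for table_id in table_ids:
        resource = (
            "//bigquery.googleapis.com/"
            f"projects/{project}/datasets/{dataset}/tables/{table_id}"
        )
        commands.append([
            "gcloud", "dataplex", "datascans", "create", "data-profile",
            scan_id_for_table(prefix, table_id),
            f"--location={location}",
            f"--data-source-resource={resource}",
        ])
    return commands


if __name__ == "__main__":
    for argv in create_commands(
        "profile", "us-central1", "test-project", "test-dataset",
        ["Orders_2024", "events"],
    ):
        # Print each command for review instead of executing it.
        print(" ".join(argv))
```

Unlike the console flow, this sketch doesn't apply shared settings such as scope, sampling, or export; those would need to be added per command.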
Run a data profile scan
Note: Run isn't supported for data profile scans that are on a one-time schedule.
Console
- In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
- Click the data profile scan to run.
- Click Run now.
gcloud
To run a data profile scan, use the gcloud dataplex datascans run command:
    gcloud dataplex datascans run DATASCAN \
        --location=LOCATION
Replace the following variables:
- DATASCAN: The name of the data profile scan.
- LOCATION: The Google Cloud region in which the data profile scan was created.
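To trigger a run from a script, the command above can be wrapped in a small helper. The following is a minimal sketch in plain Python, assuming gcloud is installed and authenticated on the machine that runs it. The function names and the injectable `runner` parameter are hypothetical; `runner` exists only so that the sketch can be exercised without invoking gcloud.

```python
import subprocess


def run_scan_argv(datascan, location):
    """Build the argument list for the gcloud run command shown above."""
    return [
        "gcloud", "dataplex", "datascans", "run", datascan,
        f"--location={location}",
    ]


def run_scan(datascan, location, runner=subprocess.run):
    """Run the scan through `runner`; check=True raises on a non-zero exit."""
    return runner(run_scan_argv(datascan, location), check=True)


if __name__ == "__main__":
    # Show the command without executing it.
    print(" ".join(run_scan_argv("test-scan", "us-central1")))
```

Because Run isn't supported for scans on a one-time schedule, a caller would need to handle the resulting error for those scans.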
C#
Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    using Google.Cloud.Dataplex.V1;

    public sealed partial class GeneratedDataScanServiceClientSnippets
    {
        /// <summary>Snippet for RunDataScan</summary>
        /// <remarks>
        /// This snippet has been automatically generated and should be regarded as a code template only.
        /// It will require modifications to work:
        /// - It may require correct/in-range values for request initialization.
        /// - It may require specifying regional endpoints when creating the service client as shown in
        ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
        /// </remarks>
        public void RunDataScanRequestObject()
        {
            // Create client
            DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
            // Initialize request argument(s)
            RunDataScanRequest request = new RunDataScanRequest
            {
                DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            };
            // Make the request
            RunDataScanResponse response = dataScanServiceClient.RunDataScan(request);
        }
    }
Go
Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    package main

    import (
        "context"

        dataplex "cloud.google.com/go/dataplex/apiv1"
        dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
    )

    func main() {
        ctx := context.Background()
        // This snippet has been automatically generated and should be regarded as a code template only.
        // It will require modifications to work:
        // - It may require correct/in-range values for request initialization.
        // - It may require specifying regional endpoints when creating the service client as shown in:
        //   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
        c, err := dataplex.NewDataScanClient(ctx)
        if err != nil {
            // TODO: Handle error.
        }
        defer c.Close()

        req := &dataplexpb.RunDataScanRequest{
            // TODO: Fill request struct fields.
            // See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#RunDataScanRequest.
        }
        resp, err := c.RunDataScan(ctx, req)
        if err != nil {
            // TODO: Handle error.
        }
        // TODO: Use resp.
        _ = resp
    }
Java
Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    import com.google.cloud.dataplex.v1.DataScanName;
    import com.google.cloud.dataplex.v1.DataScanServiceClient;
    import com.google.cloud.dataplex.v1.RunDataScanRequest;
    import com.google.cloud.dataplex.v1.RunDataScanResponse;

    public class SyncRunDataScan {

      public static void main(String[] args) throws Exception {
        syncRunDataScan();
      }

      public static void syncRunDataScan() throws Exception {
        // This snippet has been automatically generated and should be regarded as a code template only.
        // It will require modifications to work:
        // - It may require correct/in-range values for request initialization.
        // - It may require specifying regional endpoints when creating the service client as shown in
        //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
        try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
          RunDataScanRequest request =
              RunDataScanRequest.newBuilder()
                  .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
                  .build();
          RunDataScanResponse response = dataScanServiceClient.runDataScan(request);
        }
      }
    }
Python
Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    # This snippet has been automatically generated and should be regarded as a
    # code template only.
    # It will require modifications to work:
    # - It may require correct/in-range values for request initialization.
    # - It may require specifying regional endpoints when creating the service
    #   client as shown in:
    #   https://googleapis.dev/python/google-api-core/latest/client_options.html
    from google.cloud import dataplex_v1


    def sample_run_data_scan():
        # Create a client
        client = dataplex_v1.DataScanServiceClient()

        # Initialize request argument(s)
        request = dataplex_v1.RunDataScanRequest(
            name="name_value",
        )

        # Make the request
        response = client.run_data_scan(request=request)

        # Handle the response
        print(response)
Ruby
Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
    require "google/cloud/dataplex/v1"

    ##
    # Snippet for the run_data_scan call in the DataScanService service
    #
    # This snippet has been automatically generated and should be regarded as a code
    # template only. It will require modifications to work:
    # - It may require correct/in-range values for request initialization.
    # - It may require specifying regional endpoints when creating the service
    #   client as shown in https://cloud.google.com/ruby/docs/reference.
    #
    # This is an auto-generated example demonstrating basic usage of
    # Google::Cloud::Dataplex::V1::DataScanService::Client#run_data_scan.
    #
    def run_data_scan
      # Create a client object. The client can be reused for multiple calls.
      client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

      # Create a request. To set request fields, pass in keyword arguments.
      request = Google::Cloud::Dataplex::V1::RunDataScanRequest.new

      # Call the run_data_scan method.
      result = client.run_data_scan request

      # The returned object is of type Google::Cloud::Dataplex::V1::RunDataScanResponse.
      p result
    end
REST
To run a data profile scan, use the dataScans.run method.
View data profile scan results
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data profile scan.
The Overview section displays information about the most recent jobs, including when the scan was run, the number of table records scanned, and the job status.
The Data profile scan configuration section displays details about the scan.
To see detailed information about a job, such as the scanned table's columns, statistics about the columns that were found in the scan, and the job logs, click the Jobs history tab. Then, click a job ID.
gcloud
To view the results of a data profile scan job, use the gcloud dataplex datascans jobs describe command:
gcloud dataplex datascans jobs describe JOB \
    --location=LOCATION \
    --datascan=DATASCAN \
    --view=FULL
Replace the following variables:
- JOB: The job ID of the data profile scan job.
- LOCATION: The Google Cloud region in which the data profile scan was created.
- DATASCAN: The name of the data profile scan the job belongs to.
- --view=FULL: To see the scan job result, specify FULL.
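If you script this command, it can help to assemble the arguments as a list rather than a single shell string. The following is a minimal sketch, not an official sample: the helper name `describe_job_command` and the placeholder values are hypothetical, and only the flags shown in the command above are used.

```python
import shlex
from typing import List


def describe_job_command(job: str, location: str, datascan: str) -> List[str]:
    """Build the argument list for `gcloud dataplex datascans jobs describe`."""
    return [
        "gcloud", "dataplex", "datascans", "jobs", "describe", job,
        f"--location={location}",
        f"--datascan={datascan}",
        "--view=FULL",  # FULL is needed to see the scan job result.
    ]


# Placeholder values for illustration only.
command = describe_job_command("my-job-id", "us-central1", "my-profile-scan")
print(shlex.join(command))
```

To run the command, you could pass the list to `subprocess.run(command, check=True)`. Passing a list avoids shell quoting problems if a value contains spaces.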
C#
Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
using Google.Cloud.Dataplex.V1;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for GetDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void GetDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        GetDataScanRequest request = new GetDataScanRequest
        {
            DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            View = GetDataScanRequest.Types.DataScanView.Unspecified,
        };
        // Make the request
        DataScan response = dataScanServiceClient.GetDataScan(request);
    }
}
Go
Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.GetDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#GetDataScanRequest.
	}
	resp, err := c.GetDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
Java
Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.GetDataScanRequest;

public class SyncGetDataScan {

  public static void main(String[] args) throws Exception {
    syncGetDataScan();
  }

  public static void syncGetDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      GetDataScanRequest request =
          GetDataScanRequest.newBuilder()
              .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .build();
      DataScan response = dataScanServiceClient.getDataScan(request);
    }
  }
}
Python
Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_get_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.GetDataScanRequest(
        name="name_value",
    )

    # Make the request
    response = client.get_data_scan(request=request)

    # Handle the response
    print(response)
Ruby
Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
require "google/cloud/dataplex/v1"

##
# Snippet for the get_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#get_data_scan.
#
def get_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::GetDataScanRequest.new

  # Call the get_data_scan method.
  result = client.get_data_scan request

  # The returned object is of type Google::Cloud::Dataplex::V1::DataScan.
  p result
end
REST
To view the results of a data profile scan, use the dataScans.get method.
View published results
If the data profile scan results are published to the BigQuery and Dataplex Universal Catalog pages in the Google Cloud console, then you can see the latest scan results on the source table's Data profile tab.
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer.
If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, click Datasets, and then click your dataset.
Click Overview > Tables, and then select the table whose data profile scan results you want to see.
Click the Data profile tab.
The latest published results are displayed.
Note: Published results might not be available if a scan is running for the first time.
View the most recent data profile scan job
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data profile scan.
Click the Latest job results tab.
When there is at least one successfully completed run, the Latest job results tab provides information about the most recent job. It lists the scanned table's columns and statistics about the columns that were found in the scan.
gcloud
To view the most recent successful data profile scan, use the gcloud dataplex datascans describe command:
gcloud dataplex datascans describe DATASCAN \
    --location=LOCATION \
    --view=FULL
Replace the following variables:
- DATASCAN: The name of the data profile scan to view the most recent job for.
- LOCATION: The Google Cloud region in which the data profile scan was created.
- --view=FULL: To see the scan job result, specify FULL.
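Once you have the scan in the FULL view as JSON (for example, from the dataScans.get REST method or by adding gcloud's global `--format=json` flag), you can reduce it to a per-column summary. The following is a minimal sketch under stated assumptions: it assumes the JSON field names `dataProfileResult`, `profile`, `fields`, `nullRatio`, and `distinctRatio` from the REST representation, and the helper name and `SAMPLE_SCAN` values are made-up placeholders, not real scan output.

```python
from typing import Any, Dict, List, Tuple


def column_summaries(scan: Dict[str, Any]) -> List[Tuple[str, str, float, float]]:
    """Return (name, type, null ratio, distinct ratio) for each profiled column."""
    fields = scan.get("dataProfileResult", {}).get("profile", {}).get("fields", [])
    rows = []
    for field in fields:
        stats = field.get("profile", {})
        rows.append((
            field.get("name", ""),
            field.get("type", ""),
            # JSON output can omit zero-valued numbers, so default to 0.0.
            float(stats.get("nullRatio", 0.0)),
            float(stats.get("distinctRatio", 0.0)),
        ))
    return rows


# Made-up placeholder data for illustration; not real scan output.
SAMPLE_SCAN = {
    "dataProfileResult": {
        "profile": {
            "fields": [
                {"name": "order_id", "type": "INTEGER", "mode": "REQUIRED",
                 "profile": {"distinctRatio": 1}},
                {"name": "country", "type": "STRING", "mode": "NULLABLE",
                 "profile": {"nullRatio": 0.25, "distinctRatio": 0.1}},
            ]
        }
    }
}

for name, field_type, null_ratio, distinct_ratio in column_summaries(SAMPLE_SCAN):
    print(f"{name:<10} {field_type:<8} null={null_ratio:.0%} distinct={distinct_ratio:.0%}")
```

A scan that has no completed job yet has no result to read, so the helper returns an empty list in that case rather than raising an error.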
REST
To view the most recent scan job, use the dataScans.get method.
View historical scan results
Dataplex Universal Catalog saves the data profile scan history of the last 300 jobs or for the past year, whichever occurs first.
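One way to read this retention rule is that a job stays in the history only while it is both among the 300 most recent jobs and less than a year old. The following sketch models that reading so you can estimate how much history a given schedule keeps. This interpretation, the 365-day approximation of a year, and the helper name `retained_jobs` are assumptions, not a description of the service's implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

MAX_JOBS = 300                 # From the text: history of the last 300 jobs.
MAX_AGE = timedelta(days=365)  # Assumption: "past year" approximated as 365 days.


def retained_jobs(job_times: Iterable[datetime], now: datetime) -> List[datetime]:
    """Return the job timestamps still in history, newest first."""
    cutoff = now - MAX_AGE
    newest_first = sorted(job_times, reverse=True)
    # Keep a job only if it is in the newest 300 AND within the past year.
    return [t for t in newest_first[:MAX_JOBS] if t >= cutoff]


now = datetime(2025, 12, 15, tzinfo=timezone.utc)

# A daily schedule hits the 300-job limit before the one-year limit.
daily = [now - timedelta(days=d) for d in range(400)]
print(len(retained_jobs(daily, now)))

# A sparse schedule (every 60 days) hits the one-year limit first.
sparse = [now - timedelta(days=60 * i) for i in range(10)]
print(len(retained_jobs(sparse, now)))
```

Under this reading, frequent schedules are bounded by the job count, and infrequent schedules are bounded by age.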
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data profile scan.
Click the Jobs history tab.
The Jobs history tab provides information about past jobs, such as the number of records scanned in each job, the job status, and the time the job was run.
To view detailed information about a job, click any of the jobs in the Job ID column.
gcloud
To view historical data profile scan jobs, use the gcloud dataplex datascans jobs list command:
gcloud dataplex datascans jobs list \
    --location=LOCATION \
    --datascan=DATASCAN
Replace the following variables:
- LOCATION: The Google Cloud region in which the data profile scan was created.
- DATASCAN: The name of the data profile scan to view jobs for.
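Job history is returned in pages. If you call the REST method yourself rather than using gcloud or a client library, you follow `nextPageToken` until it is absent. The following is a minimal sketch of that loop. It assumes the response fields `dataScanJobs` and `nextPageToken`, and `fetch_page`, `FAKE_PAGES`, and `fake_fetch` are hypothetical stand-ins for an authenticated HTTP call, so the example runs without network access.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_jobs(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield every job, requesting further pages only as they are needed."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("dataScanJobs", [])
        token = page.get("nextPageToken")
        if not token:  # A missing or empty token means this was the last page.
            break


# Hypothetical canned responses standing in for real API calls.
FAKE_PAGES = {
    None: {"dataScanJobs": [{"name": "jobs/a"}, {"name": "jobs/b"}],
           "nextPageToken": "t1"},
    "t1": {"dataScanJobs": [{"name": "jobs/c"}]},
}


def fake_fetch(page_token: Optional[str]) -> Dict[str, Any]:
    return FAKE_PAGES[page_token]


print([job["name"] for job in iterate_jobs(fake_fetch)])
```

The client library samples that follow handle paging for you through their iterator types.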
C#
Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
using Google.Api.Gax;
using Google.Cloud.Dataplex.V1;
using System;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for ListDataScanJobs</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ListDataScanJobsRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        ListDataScanJobsRequest request = new ListDataScanJobsRequest
        {
            ParentAsDataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            Filter = "",
        };
        // Make the request
        PagedEnumerable<ListDataScanJobsResponse, DataScanJob> response = dataScanServiceClient.ListDataScanJobs(request);

        // Iterate over all response items, lazily performing RPCs as required
        foreach (DataScanJob item in response)
        {
            // Do something with each item
            Console.WriteLine(item);
        }

        // Or iterate over pages (of server-defined size), performing one RPC per page
        foreach (ListDataScanJobsResponse page in response.AsRawResponses())
        {
            // Do something with each page of items
            Console.WriteLine("A page of results:");
            foreach (DataScanJob item in page)
            {
                // Do something with each item
                Console.WriteLine(item);
            }
        }

        // Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
        int pageSize = 10;
        Page<DataScanJob> singlePage = response.ReadPage(pageSize);
        // Do something with the page of items
        Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
        foreach (DataScanJob item in singlePage)
        {
            // Do something with each item
            Console.WriteLine(item);
        }
        // Store the pageToken, for when the next page is required.
        string nextPageToken = singlePage.NextPageToken;
    }
}
Go
Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.ListDataScanJobsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#ListDataScanJobsRequest.
	}
	it := c.ListDataScanJobs(ctx, req)
	for {
		resp, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			// TODO: Handle error.
		}
		// TODO: Use resp.
		_ = resp

		// If you need to access the underlying RPC response,
		// you can do so by casting the `Response` as below.
		// Otherwise, remove this line. Only populated after
		// first call to Next(). Not safe for concurrent access.
		_ = it.Response.(*dataplexpb.ListDataScanJobsResponse)
	}
}
Java
Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.cloud.dataplex.v1.DataScanJob;
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.ListDataScanJobsRequest;

public class SyncListDataScanJobs {

  public static void main(String[] args) throws Exception {
    syncListDataScanJobs();
  }

  public static void syncListDataScanJobs() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      ListDataScanJobsRequest request =
          ListDataScanJobsRequest.newBuilder()
              .setParent(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .setPageSize(883849137)
              .setPageToken("pageToken873572522")
              .setFilter("filter-1274492040")
              .build();
      for (DataScanJob element : dataScanServiceClient.listDataScanJobs(request).iterateAll()) {
        // doThingsWith(element);
      }
    }
  }
}
Python
Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_list_data_scan_jobs():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.ListDataScanJobsRequest(
        parent="parent_value",
    )

    # Make the request
    page_result = client.list_data_scan_jobs(request=request)

    # Handle the response
    for response in page_result:
        print(response)
Ruby
Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
require "google/cloud/dataplex/v1"

##
# Snippet for the list_data_scan_jobs call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#list_data_scan_jobs.
#
def list_data_scan_jobs
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::ListDataScanJobsRequest.new

  # Call the list_data_scan_jobs method.
  result = client.list_data_scan_jobs request

  # The returned object is of type Gapic::PagedEnumerable. You can iterate
  # over elements, and API calls will be issued to fetch pages as needed.
  result.each do |item|
    # Each element is of type ::Google::Cloud::Dataplex::V1::DataScanJob.
    p item
  end
end
REST
To view historical data profile scan jobs, use the dataScans.jobs.list method.
View the data profile scans for a table
To view the data profile scans that apply to a specific table, do the following:
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Filter the list by table name and scan type.
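If you already have a list of scans as JSON (for example, from the dataScans.list REST method), the same filter can be applied in code. The following is a minimal sketch: it assumes each scan carries a `type` field and a `data.resource` field that holds the BigQuery table in the `//bigquery.googleapis.com/projects/.../datasets/.../tables/...` form. The helper names and the `SCANS` list are hypothetical placeholders.

```python
from typing import Any, Dict, List


def bigquery_table_resource(project: str, dataset: str, table: str) -> str:
    """Return the resource string that a data scan uses to reference a table."""
    return (
        f"//bigquery.googleapis.com/projects/{project}"
        f"/datasets/{dataset}/tables/{table}"
    )


def profile_scans_for_table(
    scans: List[Dict[str, Any]], table_resource: str
) -> List[Dict[str, Any]]:
    """Keep only data profile scans whose source is the given table."""
    return [
        scan for scan in scans
        if scan.get("type") == "DATA_PROFILE"
        and scan.get("data", {}).get("resource") == table_resource
    ]


# Placeholder scan list for illustration only.
orders = bigquery_table_resource("my-project", "sales", "orders")
SCANS = [
    {"name": "scan-1", "type": "DATA_PROFILE", "data": {"resource": orders}},
    {"name": "scan-2", "type": "DATA_QUALITY", "data": {"resource": orders}},
    {"name": "scan-3", "type": "DATA_PROFILE",
     "data": {"resource": bigquery_table_resource("my-project", "sales", "returns")}},
]

print([scan["name"] for scan in profile_scans_for_table(SCANS, orders)])
```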
Grant access to data profile scan results
To enable the users in your organization to view the scan results, do the following:
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the data profile scan whose results you want to share.
Click the Permissions tab.
Do the following:
- To grant access to a principal, click Grant access. Grant the Dataplex DataScan DataViewer role to the associated principal.
- To remove access from a principal, select the principal that you want to remove the Dataplex DataScan DataViewer role from. Click Remove access, and then confirm when prompted.
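Behind the Permissions tab, granting and removing access corresponds to adding or removing a member in a role binding on the scan's IAM policy. The following sketch shows that edit on a policy represented as a plain dictionary. It makes no API calls; the role ID `roles/dataplex.dataScanDataViewer` is an assumption for the Dataplex DataScan DataViewer role, and the helper names are hypothetical.

```python
import copy
from typing import Any, Dict

# Assumed role ID for the "Dataplex DataScan DataViewer" role.
VIEWER_ROLE = "roles/dataplex.dataScanDataViewer"


def grant_access(policy: Dict[str, Any], member: str,
                 role: str = VIEWER_ROLE) -> Dict[str, Any]:
    """Return a copy of the policy with the member added to the role binding."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


def remove_access(policy: Dict[str, Any], member: str,
                  role: str = VIEWER_ROLE) -> Dict[str, Any]:
    """Return a copy of the policy with the member removed from the role binding."""
    updated = copy.deepcopy(policy)
    kept = []
    for binding in updated.get("bindings", []):
        if binding.get("role") == role:
            binding["members"] = [m for m in binding.get("members", []) if m != member]
            if not binding["members"]:
                continue  # Drop bindings that no longer have members.
        kept.append(binding)
    updated["bindings"] = kept
    return updated


policy = grant_access({}, "user:analyst@example.com")
print(policy)
print(remove_access(policy, "user:analyst@example.com"))
```

Both helpers return a modified copy and leave the input untouched, which makes it easier to compare the policy before and after the change.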
Manage data profile scans for a specific table
The steps in this document show how to manage data profile scans across your project by using the BigQuery Metadata curation > Data profiling & quality page in the Google Cloud console.
You can also create and manage data profile scans when working with a specific table. In the Google Cloud console, on the BigQuery page for the table, use the Data profile tab. Do the following:
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane (in the left pane), click Datasets, and then click your dataset. Then click Overview > Tables, and select the table whose data profile scan results you want to see.
Click the Data profile tab.
Depending on whether the table has a data profile scan whose results are published, you can work with the table's data profile scans in the following ways:
- Data profile scan results are published: the latest published scan results are displayed on the page.
  To manage the data profile scans for this table, click Data profile scan, and then select from the following options:
  - Create new scan: create a new data profile scan. For more information, see the Create a data profile scan section of this document. When you create a scan from a table's details page, the table is preselected.
  - Run now: run the scan.
  - Edit scan configuration: edit settings including the display name, filters, sampling size, and schedule.
  - Manage scan permissions: control who can access the scan results. For more information, see the Grant access to data profile scan results section of this document.
  - View historical results: view detailed information about previous data profile scan jobs. For more information, see the View data profile scan results and View historical scan results sections of this document.
  - View all scans: view a list of data profile scans that apply to this table.
- Data profile scan results aren't published: click the menu next to Quick data profile, and then select from the following options:
  - Customize data profiling: create a new data profile scan. For more information, see the Create a data profile scan section of this document. When you create a scan from a table's details page, the table is preselected.
  - View previous profiles: view a list of data profile scans that apply to this table.
Update a data profile scan
Note: Update isn't supported for data profile scans that are on a one-time schedule.
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the name of a data profile scan.
Click Edit, and then edit the values.
Click Save.
gcloud
To update a data profile scan, use the gcloud dataplex datascans update data-profile command:
gcloud dataplex datascans update data-profile DATASCAN \
    --location=LOCATION \
    --description=DESCRIPTION
Replace the following variables:
- DATASCAN: The name of the data profile scan to update.
- LOCATION: The Google Cloud region in which the data profile scan was created.
- DESCRIPTION: The new description for the data profile scan.
Note: You can update specification fields, such as rowFilter, samplingPercent, or includeFields, in the data profile specification file. See the JSON format.
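When you update a scan through the REST dataScans.patch method, you send only the changed fields in the request body and list their paths in the `updateMask` query parameter. The following is a minimal sketch of preparing both from one set of changes. The helper name `build_patch` is hypothetical, and the field paths assume the JSON names `description` and `dataProfileSpec.samplingPercent`.

```python
from typing import Any, Dict, Tuple


def build_patch(changes: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Turn {"dotted.field.path": value} into (updateMask, nested request body)."""
    body: Dict[str, Any] = {}
    for path, value in changes.items():
        *parents, leaf = path.split(".")
        node = body
        for key in parents:
            node = node.setdefault(key, {})  # Create intermediate objects as needed.
        node[leaf] = value
    # The mask is a comma-separated list of the changed field paths.
    update_mask = ",".join(sorted(changes))
    return update_mask, body


# Placeholder values for illustration only.
mask, body = build_patch({
    "description": "Nightly profile of the orders table",
    "dataProfileSpec.samplingPercent": 25.0,
})
print(mask)
print(body)
```

Deriving the mask and the body from the same dictionary keeps them in sync, so a field isn't sent in the body without being named in the mask, or the reverse.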
C#
Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
using Google.Cloud.Dataplex.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for UpdateDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void UpdateDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        UpdateDataScanRequest request = new UpdateDataScanRequest
        {
            DataScan = new DataScan(),
            UpdateMask = new FieldMask(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataScan, OperationMetadata> response = dataScanServiceClient.UpdateDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataScan result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceUpdateDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataScan retrievedResult = retrievedResponse.Result;
        }
    }
}
Go
Before trying this sample, follow the Go setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Go API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.UpdateDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#UpdateDataScanRequest.
	}
	op, err := c.UpdateDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
Java
Before trying this sample, follow the Java setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Java API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.UpdateDataScanRequest;
import com.google.protobuf.FieldMask;

public class SyncUpdateDataScan {

  public static void main(String[] args) throws Exception {
    syncUpdateDataScan();
  }

  public static void syncUpdateDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      UpdateDataScanRequest request =
          UpdateDataScanRequest.newBuilder()
              .setDataScan(DataScan.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setValidateOnly(true)
              .build();
      DataScan response = dataScanServiceClient.updateDataScanAsync(request).get();
    }
  }
}
Python
Before trying this sample, follow the Python setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Python API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_update_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    data_scan = dataplex_v1.DataScan()
    data_scan.data_quality_spec.rules.dimension = "dimension_value"
    data_scan.data.entity = "entity_value"

    request = dataplex_v1.UpdateDataScanRequest(
        data_scan=data_scan,
    )

    # Make the request
    operation = client.update_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)
Ruby
Before trying this sample, follow the Ruby setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog Ruby API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
require "google/cloud/dataplex/v1"

##
# Snippet for the update_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#update_data_scan.
#
def update_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::UpdateDataScanRequest.new

  # Call the update_data_scan method.
  result = client.update_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end
REST
To edit a data profile scan, use the dataScans.patch method.
Delete a data profile scan
Note: Delete isn't supported for data profile scans that are on a one-time schedule.
Console
In the Google Cloud console, on the BigQuery Metadata curation page, go to the Data profiling & quality tab.
Click the scan you want to delete.
Click Delete, and then confirm when prompted.
gcloud
To delete a data profile scan, use the gcloud dataplex datascans delete command:
gcloud dataplex datascans delete DATASCAN \
    --location=LOCATION --async
Replace the following variables:
- DATASCAN: The name of the data profile scan to delete.
- LOCATION: The Google Cloud region in which the data profile scan was created.
REST
To delete a data profile scan, use the dataScans.delete method.
What's next
- Learn how to explore your data by generating data insights.
- Learn more about data governance in BigQuery.
- Learn how to scan your data for data quality issues.
- Learn how to examine table data and create queries with table explorer.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.