Profile your data

This document explains how to use data profile scans to better understand your data. BigQuery uses Dataplex Universal Catalog to analyze the statistical characteristics of your data, such as average values, unique values, and maximum values. Dataplex Universal Catalog also uses this information to recommend rules for data quality checks.

For more information about data profiling, see About data profiling.

Tip: The steps in this document show how to manage data profile scans across your project. You can also create and manage data profile scans when working with a specific table. For more information, see the Manage data profile scans for a specific table section of this document.

Before you begin

Enable the Dataplex API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

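From the command line, the API can typically be enabled with `gcloud services enable`. The following is a minimal sketch, not the only way to do this: the `enable_dataplex_api` function name and the `DRY_RUN` variable are illustrative assumptions. By default it only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: prints (or runs) the command that enables the Dataplex API.
# DRY_RUN defaults to 1, so nothing is executed unless you set DRY_RUN=0.
enable_dataplex_api() {
  project_id="$1"
  cmd="gcloud services enable dataplex.googleapis.com --project=${project_id}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
  else
    # Requires the gcloud CLI and the Service Usage Admin role on the project.
    $cmd
  fi
}
```

For example, `enable_dataplex_api my-project` prints the command for a placeholder project named `my-project`.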

Required roles

To get the permissions that you need to create and manage data profile scans, ask your administrator to grant you the following IAM roles on the relevant resource, such as the project or table:

  • To create, run, update, and delete data profile scans: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) role on the project containing the data scan.
  • To allow Dataplex Universal Catalog to run data profile scans against BigQuery data, grant the following roles to the Dataplex Universal Catalog service account:
    • BigQuery Job User (roles/bigquery.jobUser) role on the project running the scan
    • BigQuery Data Viewer (roles/bigquery.dataViewer) role on the tables being scanned
  • To run data profile scans for BigQuery external tables that use Cloud Storage data: grant the Dataplex Universal Catalog service account the Storage Object Viewer (roles/storage.objectViewer) and Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) roles on the Cloud Storage bucket.
  • To view data profile scan results, jobs, and history: Dataplex DataScan Viewer (roles/dataplex.dataScanViewer) role on the project containing the data scan.
  • To export data profile scan results to a BigQuery table: BigQuery Data Editor (roles/bigquery.dataEditor) role on the table.
  • To publish data profile scan results to Dataplex Universal Catalog: Dataplex Catalog Editor (roles/dataplex.catalogEditor) role on the @bigquery entry group.
  • To view published data profile scan results in BigQuery on the Data profile tab: BigQuery Data Viewer (roles/bigquery.dataViewer) role on the table.

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.
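As an illustration of how an administrator might grant the project-level roles listed above, the following sketch prints `gcloud projects add-iam-policy-binding` commands for a user. The function name, the project ID `my-project`, and the email address are placeholders invented for this example. Roles scoped to a table, bucket, or entry group are granted on those resources instead and are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: prints the command that grants one project-level role to a user.
# It only prints; review the output, then run it yourself if it looks right.
grant_project_role_cmd() {
  project_id="$1"
  user_email="$2"
  role="$3"
  printf 'gcloud projects add-iam-policy-binding %s --member="user:%s" --role="%s"\n' \
    "$project_id" "$user_email" "$role"
}

# Example: print commands for the scan editor and scan viewer roles.
for role in roles/dataplex.dataScanEditor roles/dataplex.dataScanViewer; do
  grant_project_role_cmd "my-project" "analyst@example.com" "$role"
done
```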

Required permissions

If you use custom roles, you need to grant the following IAM permissions:

  • To create, run, update, and delete data profile scans:
    • dataplex.datascans.create on project: Create a DataScan
    • dataplex.datascans.update on data scan: Update the description of a DataScan
    • dataplex.datascans.delete on data scan: Delete a DataScan
    • dataplex.datascans.run on data scan: Run a DataScan
    • dataplex.datascans.get on data scan: View DataScan details excluding results
    • dataplex.datascans.list on project: List DataScans
    • dataplex.dataScanJobs.get on data scan job: Read DataScan job resources
    • dataplex.dataScanJobs.list on data scan: List DataScan job resources in a project
  • To allow Dataplex Universal Catalog to run data profile scans against BigQuery data:
    • bigquery.jobs.create on project: Run jobs
    • bigquery.tables.get on table: Get table metadata
    • bigquery.tables.getData on table: Get table data
  • To run data profile scans for BigQuery external tables that use Cloud Storage data:
    • storage.buckets.get on bucket: Read bucket metadata
    • storage.objects.get on object: Read object data
  • To view data profile scan results, jobs, and history:
    • dataplex.datascans.getData on data scan: View DataScan details including results
    • dataplex.datascans.list on project: List DataScans
    • dataplex.dataScanJobs.get on data scan job: Read DataScan job resources
    • dataplex.dataScanJobs.list on data scan: List DataScan job resources in a project
  • To export data profile scan results to a BigQuery table:
    • bigquery.tables.create on dataset: Create tables
    • bigquery.tables.updateData on table: Write data to tables
  • To publish data profile scan results to Dataplex Universal Catalog:
    • dataplex.entryGroups.useDataProfileAspect on entry group: Allows Dataplex Universal Catalog data profile scans to save their results to Dataplex Universal Catalog
    • Additionally, you need one of the following permissions:
      • bigquery.tables.update on table: Update table metadata
      • dataplex.entries.update on entry: Update entries
  • To view published data profile results for a table in BigQuery or Dataplex Universal Catalog:
    • bigquery.tables.get on table: Get table metadata
    • bigquery.tables.getData on table: Get table data
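If you take the custom-role route, one possible approach is to assemble the permissions into a `gcloud iam roles create` command. The sketch below does this for the "create, run, update, and delete" group only. The role ID `dataProfileScanEditor`, the title, and the helper names are assumptions made for illustration, and it is worth confirming that each permission is supported in custom roles before relying on it.

```shell
#!/bin/sh
# Permissions copied from the "create, run, update, and delete" group above.
SCAN_EDITOR_PERMISSIONS="dataplex.datascans.create
dataplex.datascans.update
dataplex.datascans.delete
dataplex.datascans.run
dataplex.datascans.get
dataplex.datascans.list
dataplex.dataScanJobs.get
dataplex.dataScanJobs.list"

# Reads one permission per line on stdin and prints them comma-separated.
join_permissions() {
  paste -s -d, -
}

# Hypothetical helper: prints (does not run) a custom-role creation command.
custom_role_cmd() {
  project_id="$1"
  role_id="$2"
  perms="$(printf '%s\n' "$SCAN_EDITOR_PERMISSIONS" | join_permissions)"
  printf 'gcloud iam roles create %s --project=%s --title="Data profile scan editor" --permissions=%s\n' \
    "$role_id" "$project_id" "$perms"
}

custom_role_cmd "my-project" "dataProfileScanEditor"
```

Keeping the permissions one per line makes the list easy to diff against the documentation when it changes.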

If a table uses BigQuery row-level security, then Dataplex Universal Catalog can only scan rows visible to the Dataplex Universal Catalog service account. To allow Dataplex Universal Catalog to scan all rows, add its service account to a row filter where the predicate is TRUE.
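As a sketch of what such a row filter could look like, the following helper prints a BigQuery `CREATE ROW ACCESS POLICY` statement with a `TRUE` predicate for the service account. The policy name `dataplex_profile_all_rows` and the function name are assumptions chosen for this example, and the dataset, table, and project number are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints DDL for a row access policy that lets the
# given service account see every row (predicate TRUE).
row_policy_ddl() {
  dataset="$1"
  table="$2"
  sa_email="$3"
  cat <<EOF
CREATE ROW ACCESS POLICY dataplex_profile_all_rows
ON \`${dataset}.${table}\`
GRANT TO ("serviceAccount:${sa_email}")
FILTER USING (TRUE);
EOF
}

# Print the statement for review. To apply it, you could pipe the output to:
#   bq query --use_legacy_sql=false
row_policy_ddl "my_dataset" "my_table" \
  "service-123456789012@gcp-sa-dataplex.iam.gserviceaccount.com"
```

Once a table has any row access policy, principals who aren't named in a policy see no rows, so review existing policies before adding one to a table that previously had none.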

If a table uses BigQuery column-level security, then Dataplex Universal Catalog requires access to scan protected columns. To grant access, give the Dataplex Universal Catalog service account the Data Catalog Fine-Grained Reader (roles/datacatalog.fineGrainedReader) role on all policy tags used in the table. The user creating or updating a data scan also needs permissions on protected columns.

Grant roles to the Dataplex Universal Catalog service account

To run data profile scans, Dataplex Universal Catalog uses a service account that requires permissions to run BigQuery jobs and read BigQuery table data. To grant the required roles, follow these steps:

  1. Get the Dataplex Universal Catalog service account email address. If you haven't created a data profile or data quality scan in this project before, run the following gcloud command to generate the service identity:

    gcloud beta services identity create --service=dataplex.googleapis.com

    The command returns the service account email, which has the following format: service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com.

    If the service account already exists, you can find its email by viewing principals with the Dataplex name on the IAM page in the Google Cloud console.

  2. Grant the service account the BigQuery Job User (roles/bigquery.jobUser) role on your project. This role lets the service account run BigQuery jobs for the scan.

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"

    Replace the following:

    • PROJECT_ID: your Google Cloud project ID.
    • service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.
  3. Grant the service account the BigQuery Data Viewer (roles/bigquery.dataViewer) role for each table that you want to profile. This role grants read-only access to the tables.

    gcloud bigquery tables add-iam-policy-binding DATASET_ID.TABLE_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataViewer"

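    Replace the following:

    • DATASET_ID: the ID of the dataset that contains the table.
    • TABLE_ID: the ID of the table that you want to profile.
    • service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.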
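The steps above depend on getting the service account email right, so it can help to build it once and reuse it. The following is a rough sketch under a few assumptions: the function names are invented for this example, `my-project` and `123456789012` are placeholder values, and the project number lookup is left commented out because it needs the gcloud CLI and credentials.

```shell
#!/bin/sh
# Builds the Dataplex Universal Catalog service account email from a project number.
dataplex_service_account() {
  echo "service-${1}@gcp-sa-dataplex.iam.gserviceaccount.com"
}

# Hypothetical helper: prints (does not run) the project-level grant from step 2.
job_user_binding_cmd() {
  project_id="$1"
  sa_email="$(dataplex_service_account "$2")"
  printf 'gcloud projects add-iam-policy-binding %s --member="serviceAccount:%s" --role="roles/bigquery.jobUser"\n' \
    "$project_id" "$sa_email"
}

# One way to look up the project number (requires gcloud and credentials):
#   PROJECT_NUMBER="$(gcloud projects describe my-project --format='value(projectNumber)')"
job_user_binding_cmd "my-project" "123456789012"
```

Printing the command rather than running it leaves room to review the member string before any IAM policy changes.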


Last updated 2025-12-15 UTC.