View lineage in Dataplex Universal Catalog

This page describes how to view the data lineage generated by your Cloud Data Fusion pipelines with other data movement on Google Cloud, for discovery and governance purposes. You can view the lineage graphs for supported data sources on the Dataplex Universal Catalog page in the console, or use the Data Lineage API to retrieve complete data lineage records.

Plugins that support Dataplex Universal Catalog data lineage

Cloud Data Fusion and Dataplex Universal Catalog support asset-level lineage for the following plugins:

  • Amazon S3
  • BigQuery
  • BigQuery Multi Table sink (version 6.9.1 and later)
  • Spanner
  • Cloud Storage
  • Cloud SQL for MySQL
  • Cloud SQL for PostgreSQL
  • Dataplex Universal Catalog
  • FTP
  • Generic Database
  • HTTP
  • MSSQL/SQL Server
  • Multiple Database Tables source (version 6.9.1 and later)
  • MySQL
  • Oracle
  • PostgreSQL
  • SAP OData
  • SAP ODP
  • SAP Table

For more information, see Cloud Data Fusion plugins.

Before you begin

To enable viewing Cloud Data Fusion lineage graphs on the Dataplex Universal Catalog page in the console, do the following:

  1. Create a data pipeline that uses only the supported plugins.

  2. Enable the Data Lineage API in the project that contains your Cloud Data Fusion instance.

  3. Grant the Data Lineage Events Producer role (roles/datalineage.producer) to the Cloud Data Fusion-managed service account, the Cloud Data Fusion API Service Agent. The process varies if your instance runs in an earlier version of Cloud Data Fusion and RBAC is enabled.

    6.10+ or no RBAC

    If your Cloud Data Fusion instance uses version 6.10.0 or later, or your instance uses an earlier version and RBAC isn't enabled, follow these steps:

    1. In the Google Cloud console, go to the IAM page.


    2. Select the Include Google-provided role grants checkbox.

    3. Select the Cloud Data Fusion API Service Agent service account and click Edit.

    4. Click Add another role and select the Data Lineage Events Producer role.

    5. Click Save.

    <6.10 with RBAC

    If your Cloud Data Fusion instance uses a version earlier than 6.10.0 and RBAC is enabled, the service account doesn't appear in the list of principals on the IAM page. You must enter the service account name manually.

    To grant the required role, follow these steps:

    1. In the Google Cloud console, go to the IAM page.


    2. Click Grant access.

    3. In the New principals field, enter the Cloud Data Fusion API Service Agent service account. Use the following format: datafusion-system@TENANT_PROJECT_ID.iam.gserviceaccount.com.

      Replace TENANT_PROJECT_ID with the tenant project ID for your instance. To view the tenant project ID, go to the Instances page and click the instance name to open the instance details.


    4. Select the Data Lineage Events Producer role.

    5. Click Save.
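
Steps 2 and 3 can also be done from the command line. The following is a minimal sketch, not an official procedure. It assumes the gcloud CLI is installed, that the role binding belongs on the project that contains your instance, and that you already know the service account email. The function name lineage_setup_commands and the example project and service account values are hypothetical. The function only prints the commands so that you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints the gcloud commands for steps 2 and 3 without running them.
# The project ID and service account email are values that you supply.

lineage_setup_commands() {
  local project_id="$1"
  local service_account_email="$2"

  # Step 2: enable the Data Lineage API in the project that contains the instance.
  printf 'gcloud services enable datalineage.googleapis.com --project=%s\n' \
    "$project_id"

  # Step 3: grant the Data Lineage Events Producer role to the service account.
  printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=roles/datalineage.producer\n' \
    "$project_id" "$service_account_email"
}

# Example with hypothetical values; replace them with your own.
lineage_setup_commands "my-project" \
  "datafusion-system@my-tenant-project.iam.gserviceaccount.com"
```

After you check the printed commands, paste them into a terminal where gcloud is authenticated with permission to change IAM policy on the project.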

Enable Dataplex Universal Catalog data lineage in Cloud Data Fusion

For new instances in Cloud Data Fusion, Dataplex Universal Catalog data lineage is turned off by default. If you created the instance before January 27, 2024 with version 6.8.0 or later, it's turned on by default after completing the steps in Before you begin.

Enable Dataplex Universal Catalog data lineage when you create an instance

Console

To enable Dataplex Universal Catalog data lineage when you create an instance, follow these steps:

  1. Go to the Cloud Data Fusion Instances page and click Create an instance.


  2. When you configure the instance, expand the Advanced options section and click Enable integration with Dataplex data lineage. For more information about creating instances, see Create a public instance.

REST API

To enable Dataplex Universal Catalog data lineage when you create an instance, set the optional dataplex_data_lineage_integration_enabled property to true:

echo '{ "description": "CDAP instance", "dataplex_data_lineage_integration_enabled": "true" }' | curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances?instanceId=INSTANCE_NAME"

To turn it off, either set the property to false or omit the property, as lineage is turned off by default when you create a new instance.

Enable or disable Dataplex Universal Catalog data lineage in an existing instance

Console

To enable or disable Dataplex Universal Catalog data lineage in an existing instance in Cloud Data Fusion, follow these steps:

  1. View the instance details:
    1. In the Google Cloud console, go to the Cloud Data Fusion page.

    2. Click Instances, and then click the instance's name to go to the Instance details page.


  2. In the Dataplex data lineage integration field, click Edit.
  3. Enable or disable Dataplex Universal Catalog data lineage, and then click Save.

REST API

To enable Dataplex Universal Catalog data lineage in an existing instance in Cloud Data Fusion, set the dataplex_data_lineage_integration_enabled property to true and include the updateMask parameter value:

# PATCH updates the existing instance; updateMask limits the change to the lineage property.
echo '{ "description": "CDAP instance", "dataplex_data_lineage_integration_enabled": "true" }' | curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME?updateMask=dataplex_data_lineage_integration_enabled"

To disable Dataplex Universal Catalog data lineage in an existing instance in Cloud Data Fusion, set the dataplex_data_lineage_integration_enabled property to false and include the updateMask parameter value:

# PATCH updates the existing instance; updateMask limits the change to the lineage property.
echo '{ "description": "CDAP instance", "dataplex_data_lineage_integration_enabled": "false" }' | curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME?updateMask=dataplex_data_lineage_integration_enabled"
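
After you enable or disable the integration, you may want to confirm the instance's current setting. The following sketch reads the instance back with a GET request and reports the flag. It assumes that the response JSON uses the camelCase key dataplexDataLineageIntegrationEnabled and that a missing key means the integration is off. The helper names instance_url and lineage_flag are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: helpers for checking the lineage integration flag on an instance.

instance_url() {
  # Builds the URL of one instance from project, location, and instance name.
  printf 'https://datafusion.googleapis.com/v1/projects/%s/locations/%s/instances/%s' \
    "$1" "$2" "$3"
}

lineage_flag() {
  # Reads instance JSON on stdin and prints "true" or "false".
  # Assumption: the key is dataplexDataLineageIntegrationEnabled, and an
  # absent key means the integration is turned off.
  if grep -Eq '"dataplexDataLineageIntegrationEnabled"[[:space:]]*:[[:space:]]*true'; then
    echo true
  else
    echo false
  fi
}

# Usage (needs gcloud credentials, so it is not run here):
#   curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     "$(instance_url PROJECT LOCATION INSTANCE_NAME)" | lineage_flag
```

The pattern match avoids a dependency on a JSON parser. If jq is available, parsing the response with it would be more robust.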

View data lineage graphs

To view lineage graphs for entities across all Google Cloud services, do the following:

  1. Go to your instance in Cloud Data Fusion and run a data pipeline that uses supported plugins.

  2. View the lineage graphs on the Dataplex Universal Catalog page in the console and find the asset for which you want to view lineage information.
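
To retrieve lineage records through the Data Lineage API instead of the console, one option is the searchLinks method, which returns the links that lead to or from an asset. The following sketch builds such a request for links whose target is a given asset. It assumes that you know the asset's fully qualified name (for a BigQuery table, bigquery:PROJECT.DATASET.TABLE) and the location where the lineage is stored. The helper names search_links_url and search_links_body are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a Data Lineage API searchLinks request for one asset.

search_links_url() {
  # $1 is the project and $2 is the location that stores the lineage.
  printf 'https://datalineage.googleapis.com/v1/projects/%s/locations/%s:searchLinks' \
    "$1" "$2"
}

search_links_body() {
  # $1 is the asset's fully qualified name.
  # Assumption: the name has no characters that need JSON escaping.
  printf '{"target": {"fullyQualifiedName": "%s"}}' "$1"
}

# Usage (needs gcloud credentials, so it is not run here):
#   search_links_body "bigquery:PROJECT.DATASET.TABLE" | curl -s -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     --data @- \
#     "$(search_links_url PROJECT LOCATION)"
```

Searching by target lists upstream links. To list downstream links, replace the target key in the body with source.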

Limitations

Viewing lineage in Dataplex Universal Catalog has the following limitations:

Warning: Dataplex Universal Catalog uses IP addresses to form a fully qualified name that uniquely identifies sources and sinks (such as a Database sink) to display lineage. If you must prevent sharing IP address or hostname information, don't enable Dataplex Universal Catalog data lineage integration.


Last updated 2025-12-15 UTC.