Use data lineage with Google Cloud systems

View data lineage to understand the relationships between your project's resources and the processes that created them. These relationships show how data assets, such as tables and datasets, are transformed by processes like queries and pipelines. This guide describes how to access lineage graphs in Dataplex Universal Catalog, BigQuery, and Vertex AI.

You can view data lineage details in the Google Cloud console or retrieve them by using the Data Lineage API.

Roles and permissions

Data lineage tracks lineage information automatically when you enable the Data Lineage API. You don't need any administrator or editor roles to capture lineage for your data assets.

To view data lineage, you need specific Identity and Access Management (IAM) permissions. Lineage information is captured across projects, so you need permissions in multiple projects.

When viewing lineage in Dataplex Universal Catalog, BigQuery, or Vertex AI: you need permissions to view lineage information in the project where you are viewing it.
When viewing lineage that was recorded in other projects: you need permissions to view lineage information in the projects where it was recorded.

To get the permissions that you need to view data lineage, ask your administrator to grant you the following IAM roles:

Data Lineage Viewer (roles/datalineage.viewer) on the project where lineage is recorded, and the project where lineage is viewed
View BigQuery table details: BigQuery Data Viewer (roles/bigquery.dataViewer) on the table's storage project
View BigQuery job details: BigQuery Resource Viewer (roles/bigquery.resourceViewer) on the job's compute project
View details for other cataloged assets: Dataplex Catalog Viewer (roles/dataplex.catalogViewer) on the project where catalog entries are stored
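For an administrator, granting the Data Lineage Viewer role on both the recording and the viewing project might be scripted roughly as follows. This is a minimal sketch rather than an official procedure: the function name `print_lineage_viewer_grants`, the member, and the project IDs are hypothetical, and the function only prints `gcloud projects add-iam-policy-binding` commands so that they can be reviewed before anyone runs them.

```shell
# Print (do not run) the commands that grant the Data Lineage Viewer role
# to one member on every project passed as an argument.
# Usage: print_lineage_viewer_grants MEMBER PROJECT_ID [PROJECT_ID...]
print_lineage_viewer_grants() {
  member="$1"
  shift
  for project in "$@"; do
    printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=roles/datalineage.viewer\n' \
      "$project" "$member"
  done
}

# Hypothetical IDs: one project where lineage is recorded, one where it is viewed.
print_lineage_viewer_grants "user:analyst@example.com" "recording-project" "viewing-project"
```

The BigQuery and Dataplex roles in the list are granted on different projects (storage, compute, or catalog), so they would need their own bindings with the matching project IDs.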

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to view data lineage. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to view data lineage:

View BigQuery table details: bigquery.tables.get on the table's storage project
View BigQuery job details: bigquery.jobs.get on the job's compute project

You might also be able to get these permissions with custom roles or other predefined roles.

Types of data lineage views

You can view lineage information as a graph or a list. The lineage graph displays table-level lineage by default. For BigQuery jobs, you can view column-level lineage in both graph and list views.

The following view types are available:

Graph view: displays lineage as an interactive graph, letting you explore relationships between data assets and columns by expanding nodes.
List view: displays lineage in a tabular format, providing simplified and detailed representations of table-level and column-level lineage. You can customize columns and export lineage data from this view.

The key elements in the graph are described as follows:

Nodes: represent the data entities. In the table-level view, a node shows the table name and its columns. In the column-level view, each node represents a specific table and its columns that have lineage.
Edges: the lines that connect nodes and represent the processes that occur between them. Edges can feature icons or labels to provide more information about the transformation:
- Icons: In table-level view, icons appear on edges to represent the transformation process. When you manually explore the graph, icons on edges represent the source system of the process (for example, BigQuery or Vertex AI). If multiple processes are involved, a 'multiple processes' icon is displayed. If the process source system is unknown, a gear icon is used. When you apply filters, a gear icon is used for all processes.
- Labels: In column-level view, edges are labeled to describe the type of dependency between columns, such as Exact copy or Other.

Enable data lineage

Caution: Data lineage is enabled on a per-project basis, not a per-service basis. After you enable the Data Lineage API, lineage information is automatically reported for multiple Google Cloud services in the project, depending on their product-level lineage control. For more details, see Data lineage considerations.

Enable data lineage to begin automatically tracking lineage information for supported systems. By default, enabling the API activates lineage tracking for most supported services. To control Dataproc lineage ingestion, see Control lineage ingestion for a service.

You must enable the Data Lineage API in both the project where you view lineage and the projects where lineage is recorded. For more information, see Project types.

To capture lineage information, complete the following steps:
1. In the Google Cloud console, on the Project selector page, select the project where you want to record lineage.
  Go to Project selector
2. Enable the Data Lineage API.
  Enable the Data Lineage API
3. Repeat the previous steps for each project where you want to record lineage.
4. In the project where you view lineage, enable the Data Lineage API and the Dataplex API.
  Enable the APIs
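The same steps can also be approximated from the command line. The following is a sketch under stated assumptions: the helper name `print_enable_commands` and the project IDs are hypothetical, and `dataplex.googleapis.com` is assumed to be the service name of the Dataplex API. The helper prints `gcloud services enable` commands instead of running them, so you can check the project list first.

```shell
# Print (do not run) the gcloud commands that enable the required APIs.
# Usage: print_enable_commands VIEW_PROJECT RECORD_PROJECT [RECORD_PROJECT...]
print_enable_commands() {
  view_project="$1"
  shift
  # Projects where lineage is recorded need the Data Lineage API.
  for record_project in "$@"; do
    printf 'gcloud services enable datalineage.googleapis.com --project=%s\n' "$record_project"
  done
  # The project where lineage is viewed needs the Data Lineage API and the Dataplex API.
  printf 'gcloud services enable datalineage.googleapis.com dataplex.googleapis.com --project=%s\n' "$view_project"
}

# Hypothetical project IDs.
print_enable_commands "viewing-project" "etl-project-a" "etl-project-b"
```

After reviewing the printed commands, you could pipe the output to `sh` to run them.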

Control lineage ingestion for a service

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

After you enable the Data Lineage API, the service starts automatic lineage tracking for most supported services. You can then selectively enable or disable lineage ingestion for specific integrations at the project, folder, or organization level. During preview, this feature only supports configuring ingestion for Dataproc. If you disable lineage ingestion for Dataproc, it also disables lineage ingestion for Dataproc Serverless for Apache Spark.

The configuration is hierarchical. The most specific configuration takes precedence. For example, a project-level configuration overrides a folder-level configuration. If no configuration is set, the service's default behavior is used. For Dataproc, the default is Enabled.
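The precedence rule can be illustrated with a small local function. This is only a sketch of the rule as described here, not how the service evaluates configurations: the function name `effective_ingestion` is hypothetical, it models a single folder level, and it assumes that a folder-level setting overrides an organization-level one under the "most specific wins" rule.

```shell
# Resolve the effective Dataproc ingestion setting from up to three levels.
# Each argument is "true", "false", or "" (no rule set at that level).
# Usage: effective_ingestion PROJECT_SETTING FOLDER_SETTING ORG_SETTING
effective_ingestion() {
  # Check the most specific level first and stop at the first one that sets a rule.
  for setting in "$1" "$2" "$3"; do
    if [ -n "$setting" ]; then
      printf '%s\n' "$setting"
      return 0
    fi
  done
  # No level sets a rule: fall back to the Dataproc default, which is enabled.
  printf 'true\n'
}

# No project rule, folder disables, organization enables: the folder rule wins.
effective_ingestion "" "false" "true"   # prints: false
```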

Any changes to the configuration might take up to 24 hours to propagate, butusually become effective within two hours.

For Dataproc and Dataproc Serverless for Apache Spark, lineage data is only sent if lineage is also enabled in Dataproc. For more information, see Dataproc Spark lineage and Dataproc Serverless for Apache Spark data lineage.

For more information about controlling lineage ingestion, including how the configuration is applied hierarchically, see Control lineage ingestion.

Prerequisites

To control lineage ingestion, you must use the Data Lineage API. Ensure you have a client project configured for billing and quota, as the Data Lineage API is a client-based API.

Enable the datalineage.googleapis.com API in your client project. For more information, see Enable data lineage.
Set the client project. For the following examples, use the X-Goog-User-Project header. For more information, see System parameters.

Get current configuration

To view the current lineage configuration, use the projects.locations.config.get method. You can retrieve the configuration for a project, folder, or organization.

The following example shows how to get the configuration for a project:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: CLIENT_PROJECT_ID" \
  -X GET \
  "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/global/config"

Replace these values:

CLIENT_PROJECT_ID: The ID of your client project used for billing or quotas.
PROJECT_ID: The ID of the project whose configuration you want to view.

To get the configuration for a folder or organization, replace projects/PROJECT_ID with folders/FOLDER_ID or organizations/ORGANIZATION_ID.
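Because only the parent segment of the URL changes between projects, folders, and organizations, the endpoint can be built by a small helper. The sketch below assumes a hypothetical function name, `config_url`, and a placeholder folder ID; the URL pattern itself is the one used in the example request.

```shell
# Build the config endpoint for a project, folder, or organization.
# Usage: config_url PARENT_TYPE PARENT_ID
#   PARENT_TYPE is one of: projects, folders, organizations
config_url() {
  parent_type="$1"
  parent_id="$2"
  case "$parent_type" in
    projects|folders|organizations) ;;
    *)
      echo "unsupported parent type: $parent_type" >&2
      return 1
      ;;
  esac
  printf 'https://datalineage.googleapis.com/v1/%s/%s/locations/global/config\n' \
    "$parent_type" "$parent_id"
}

# Placeholder folder ID.
config_url folders 123456789012
```

The printed URL can replace the final argument of the `curl` command when you query a folder or organization.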

The command returns one of the following outputs:

If no configuration is set, you get an output with an empty ingestion object:
```
{"name":"projects/123456789012/locations/global/config","ingestion":{}}
```
In this case, Dataproc lineage ingestion uses the default setting, which is enabled.

If Dataproc lineage ingestion is explicitly enabled, you get the following output:

{"name":"projects/123456789012/locations/global/config","ingestion":{"rules":[{"integrationSelector":{"integration":"DATAPROC"},"lineageEnablement":{"enabled":true}}]},"etag":"Wb35wDxTTLd6Z+QAL+Yd4g=="}

If Dataproc lineage ingestion is disabled, you get the following output:

{"name":"projects/123456789012/locations/global/config","ingestion":{"rules":[{"integrationSelector":{"integration":"DATAPROC"},"lineageEnablement":{"enabled":false}}]},"etag":"Wb35wDxTTLd6Z+QAL+Yd4g=="}

The etag field in the response is a checksum generated by the server based on the current value of the configuration. When updating a configuration using the patch method, you can include the etag value returned from a recent get request in the request body. If you provide the etag, Dataplex Universal Catalog uses it to verify that the configuration hasn't changed since your last read request. If there's a mismatch, the update request fails. This prevents you from unintentionally overwriting configurations made by other users in read-modify-write scenarios. If you don't provide an etag in your patch request, Dataplex Universal Catalog overwrites the configuration unconditionally.
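To make the two update modes concrete, the following sketch builds a patch request body with or without an etag. The helper name `patch_body` is hypothetical, and the body shape mirrors the responses shown in this section. It assumes that the etag contains no characters that need JSON escaping, which holds for the base64-style value in the sample output.

```shell
# Build the JSON body for a config patch request.
# Usage: patch_body ENABLED [ETAG]   (ENABLED is "true" or "false")
patch_body() {
  enabled="$1"
  etag="${2:-}"
  case "$enabled" in
    true|false) ;;
    *)
      echo "ENABLED must be true or false" >&2
      return 1
      ;;
  esac
  rules='{"rules":[{"integrationSelector":{"integration":"DATAPROC"},"lineageEnablement":{"enabled":'"$enabled"'}}]}'
  if [ -n "$etag" ]; then
    # Conditional update: the request fails if the stored etag differs.
    printf '{"ingestion":%s,"etag":"%s"}\n' "$rules" "$etag"
  else
    # No etag: the configuration is overwritten unconditionally.
    printf '{"ingestion":%s}\n' "$rules"
  fi
}

# Disable Dataproc ingestion only if the configuration still has this etag.
patch_body false "Wb35wDxTTLd6Z+QAL+Yd4g=="
```

The output can be piped into a `curl -X PATCH` request that reads its body with `--data-binary @-`.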

Disable lineage ingestion for a service

To disable lineage ingestion for a specific service, use the projects.locations.config.patch method with an ingestion rule that sets lineageEnablement.enabled to false for the specific integration.

To prevent unintentionally overwriting configurations made by other users in read-modify-write scenarios, you can include the etag field in the request body. For more information, see Get current configuration.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: CLIENT_PROJECT_ID" \
  -X PATCH \
  "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/global/config" \
  --data-binary @- << EOF
{
  "ingestion": {
    "rules": [{
      "integrationSelector": {
        "integration": "DATAPROC"
      },
      "lineageEnablement": {
        "enabled": false
      }
    }]
  },
  "etag": "ETAG"
}
EOF

Replace the following:

CLIENT_PROJECT_ID: The ID of your client project used for billing or quotas.
PROJECT_ID: The ID of the project whose configuration you want to update.
ETAG: The etag value returned from a recent get request.

To disable lineage ingestion of a service for a folder or organization, replace projects/PROJECT_ID with folders/FOLDER_ID or organizations/ORGANIZATION_ID.

Enable lineage ingestion for a service

To enable lineage ingestion for a specific service, use the projects.locations.config.patch method with an ingestion rule that sets lineageEnablement.enabled to true for the specific integration.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: CLIENT_PROJECT_ID" \
  -X PATCH \
  "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/global/config" \
  --data-binary @- << EOF
{
  "ingestion": {
    "rules": [{
      "integrationSelector": {
        "integration": "DATAPROC"
      },
      "lineageEnablement": {
        "enabled": true
      }
    }]
  },
  "etag": "ETAG"
}
EOF

Replace the following:

CLIENT_PROJECT_ID: The ID of your client project used for billing or quotas.
PROJECT_ID: The ID of the project whose configuration you want to update.
ETAG: The etag value returned from a recent get request.

To enable lineage ingestion of a service for a folder or organization, replace projects/PROJECT_ID with folders/FOLDER_ID or organizations/ORGANIZATION_ID.

View lineage in Dataplex Universal Catalog

You can view data lineage information in the Dataplex Universal Catalog web interface.

Tip: When you use data lineage with Dataplex Universal Catalog, be aware of the differences that are described in About data lineage in Data Catalog.

Note: Depending on the volume of data being processed, it takes time for data lineage to display a graph. For most jobs, it takes three hours, and for some jobs, it can take up to 24 hours.

To view the lineage, follow these instructions:

1. In the Google Cloud console, go to the Dataplex Universal Catalog Search page.
  Go to Search
2. Select Dataplex Universal Catalog as the search mode.
3. Search for the entry you want to view, and then click it. For more information, see Search for resources in Dataplex Universal Catalog.
4. Click the Lineage tab.
  The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
5. To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
  For more information, see Manually explore the lineage graph.
6. Click a node in the Graph view.
  The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
7. Click an edge with a process icon in the Graph view.
  The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
  - To inspect transformation logic, click the Details tab.
  - To see audit and history of runs, click the Runs tab.
8. In the Lineage explorer panel, select filter criteria (for example, Direction, Dependency type, or Time range), and then click Apply.
  This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
9. In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
  For more information, see Lineage path visualization.
10. To view column-level lineage (only for BigQuery jobs), do one of the following:
  - In a focused Graph view, click the column icon on a table.
  - In the Lineage explorer panel, filter by column name, and click Apply.
  For more information, see Column-level lineage.
11. Click Reset.
  This action removes all applied filters and takes you to the beginning of the graph view.
12. Click List to switch to the list view.
  The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, the simplified list view is displayed, and you can toggle to the detailed list view to analyze individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.

View lineage in BigQuery

You can view data lineage information in the BigQuery web interface.

To view the lineage, follow these instructions:

1. In the Google Cloud console, go to the BigQuery page.
  Open the BigQuery page
2. Open the table for which you want to see the data lineage.
3. Click the Lineage tab.
  The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
4. To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
  For more information, see Manually explore the lineage graph.
5. Click a node in the Graph view.
  The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
6. Click an edge with a process icon in the Graph view.
  The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
  - To inspect transformation logic, click the Details tab.
  - To see audit and history of runs, click the Runs tab.
7. In the Lineage explorer panel, select filter criteria (for example, Direction, Dependency type, or Time range), and then click Apply.
  This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
8. In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
  For more information, see Lineage path visualization.
9. To view column-level lineage (only for BigQuery jobs), do one of the following:
  - In a focused Graph view, click the column icon on a table.
  - In the Lineage explorer panel, filter by column name, and click Apply.
  For more information, see Column-level lineage.
10. Click Reset.
  This action removes all applied filters and takes you to the beginning of the graph view.
11. Click List to switch to the list view.
  The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, the simplified list view is displayed, and you can toggle to the detailed list view to analyze individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.

View lineage in Vertex AI

Systems like Vertex AI Pipelines generate lineage data for Vertex AI models and datasets. You can view data lineage information in the Vertex AI web interface.

Note: To view the lineage of Vertex AI operations, such as the training, test, and evaluation data used or hyperparameters applied, see Track the lineage of pipeline artifacts.

View lineage for a managed dataset in Vertex AI

To view the lineage for a dataset, follow these instructions:

1. In the Google Cloud console, go to the Datasets page.
  Open the Datasets page
2. Click the dataset for which you want to see the data lineage.
3. Click the Lineage tab.
  The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
4. To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
  For more information, see Manually explore the lineage graph.
5. Click a node in the Graph view.
  The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
6. Click an edge with a process icon in the Graph view.
  The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
  - To inspect transformation logic, click the Details tab.
  - To see audit and history of runs, click the Runs tab.
7. In the Lineage explorer panel, select filter criteria (for example, Direction, Dependency type, or Time range), and then click Apply.
  This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
8. In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
  For more information, see Lineage path visualization.
9. To view column-level lineage (only for BigQuery jobs), do one of the following:
  - In a focused Graph view, click the column icon on a table.
  - In the Lineage explorer panel, filter by column name, and click Apply.
  For more information, see Column-level lineage.
10. Click Reset.
  This action removes all applied filters and takes you to the beginning of the graph view.
11. Click List to switch to the list view.
  The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, the simplified list view is displayed, and you can toggle to the detailed list view to analyze individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.

Note: On the lineage graph, Vertex AI processes are preceded by the Vertex AI icon. These processes include Vertex AI Pipelines components and templates, and Vertex ML Metadata artifacts, datasets, and models. If the process is cataloged in Dataplex Universal Catalog with a fully qualified name (FQN), you can click the entry in the lineage graph to view its details. Alternatively, to view the lineage graph details in Vertex AI, click Open in Vertex AI in the Details panel.

View lineage for a model in Vertex AI

To view the lineage for a model, follow these instructions:

1. In the Google Cloud console, go to the Model Registry page.
  Open the Model Registry page
2. Click the model for which you want to see the data lineage.
3. Click the Lineage tab.
  The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
4. To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
  For more information, see Manually explore the lineage graph.
5. Click a node in the Graph view.
  The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
6. Click an edge with a process icon in the Graph view.
  The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
  - To inspect transformation logic, click the Details tab.
  - To see audit and history of runs, click the Runs tab.
7. In the Lineage explorer panel, select filter criteria (for example, Direction, Dependency type, or Time range), and then click Apply.
  This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
8. In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
  For more information, see Lineage path visualization.
9. To view column-level lineage (only for BigQuery jobs), do one of the following:
  - In a focused Graph view, click the column icon on a table.
  - In the Lineage explorer panel, filter by column name, and click Apply.
  For more information, see Column-level lineage.
10. Click Reset.
  This action removes all applied filters and takes you to the beginning of the graph view.
11. Click List to switch to the list view.
  The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, the simplified list view is displayed, and you can toggle to the detailed list view to analyze individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.