About lineage visualization

Data lineage helps you understand how data moves through your systems by tracking the relationships between data assets and the processes that transform them. You can view this lineage information as graphs and lists in the Google Cloud console.

This document provides an overview of the data lineage information model, details on table-level and column-level lineage granularity, and instructions on using graph and list views to explore data lineage.

Data lineage information model

Lineage is a record of data being transformed from sources to targets. The Data Lineage API collects this information and organizes it into a hierarchical data model that uses the concepts of processes, runs, and events.

Process: a data transformation definition.
Run: an execution of a process.
Event: a record of data movement during a run.
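
The hierarchy above can be pictured as nested records. The following is a minimal Python sketch, not the Data Lineage API client: the class names, field names, and IDs are hypothetical and chosen only to show how one process owns many runs, and each run owns its events.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A record of data movement during a run (hypothetical shape)."""
    sources: list[str]
    targets: list[str]


@dataclass
class Run:
    """One execution of a process."""
    run_id: str
    events: list[Event] = field(default_factory=list)


@dataclass
class Process:
    """A data transformation definition that owns every run of that definition."""
    process_id: str
    runs: list[Run] = field(default_factory=list)


# One process executed twice; each run records one event.
process = Process("total-trips-query")
for run_id in ("run-0900", "run-1000"):
    event = Event(
        sources=["nyc_green_trips_2021", "nyc_green_trips_2022"],
        targets=["total_green_trips_22_21"],
    )
    process.runs.append(Run(run_id, [event]))

print(len(process.runs))
```

Keeping events under runs, and runs under a process, mirrors the parent-child structure of the REST resource names described in the following sections.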

Process

A process is the definition of a data transformation operation for a specific system. For BigQuery lineage, a process is a job of a supported job type. All executions of the same SQL query are linked to a single process, which lets you track every instance where a specific transformation logic is used.

For example, the following SQL query is a process. This query creates a table by counting the total number of trips for each vendor from two source tables.

    CREATE TABLE `dataplex-docs.data_lineage_demo.total_green_trips_22_21` AS
    SELECT
      vendor_id,
      COUNT(*) AS number_of_trips
    FROM (
      SELECT vendor_id
      FROM `dataplex-docs.data_lineage_demo.nyc_green_trips_2022`
      UNION ALL
      SELECT vendor_id
      FROM `dataplex-docs.data_lineage_demo.nyc_green_trips_2021`
    )
    GROUP BY vendor_id;

The REST resource name format for a process is projects/PROJECT_NUMBER/locations/LOCATION/processes/PROCESS_ID.

For example: projects/123456789123/locations/us/processes/sh-0548bbf4ff3c8072a6c7372ba1acafb6
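
If you handle process names in your own tooling, the format above can be split into its parts with a regular expression. This is an illustrative sketch only: parse_process_name is a hypothetical helper, not part of any Google Cloud library, and it assumes that no component contains a slash.

```python
import re

# Pattern derived from the documented format:
# projects/PROJECT_NUMBER/locations/LOCATION/processes/PROCESS_ID
PROCESS_NAME = re.compile(
    r"^projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processes/(?P<process_id>[^/]+)$"
)


def parse_process_name(name: str) -> dict[str, str]:
    """Split a process resource name into its components (hypothetical helper)."""
    match = PROCESS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a process resource name: {name!r}")
    return match.groupdict()


parts = parse_process_name(
    "projects/123456789123/locations/us/processes/sh-0548bbf4ff3c8072a6c7372ba1acafb6"
)
print(parts["location"])
```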

For more information about the process resource, see the Process resource reference.

Run

A run is a single execution of a process. Processes can have multiple runs.

Each run is a unique operation characterized by a startTime, an endTime, and a final state, such as COMPLETED, FAILED, or ABORTED.

For example, executing the SQL query from the Process section at 9:00 AM creates a specific run. Executing the same query again at 10:00 AM creates a new, distinct run. Both runs are linked to the same parent process.

The REST resource name format for a run shows that it's a child of a process: projects/PROJECT_NUMBER/locations/LOCATION/processes/PROCESS_ID/runs/RUN_ID.

For example: projects/123456789123/locations/us/processes/sh-0548bbf4ff3c8072a6c7372ba1acafb6/runs/83dd03a51cd2ac80f465c9e267a950b1
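
Because a run name embeds its parent, the process name can be recovered by trimming the trailing /runs/RUN_ID segment. The sketch below shows one way to do that with plain string handling. The function name parent_process_name is hypothetical.

```python
def parent_process_name(run_name: str) -> str:
    """Return the process resource name that a run belongs to (hypothetical helper)."""
    parent, separator, run_id = run_name.rpartition("/runs/")
    # rpartition returns empty strings when the separator is missing.
    if not separator or not parent or not run_id or "/" in run_id:
        raise ValueError(f"not a run resource name: {run_name!r}")
    return parent


run_name = (
    "projects/123456789123/locations/us/processes/"
    "sh-0548bbf4ff3c8072a6c7372ba1acafb6/runs/83dd03a51cd2ac80f465c9e267a950b1"
)
print(parent_process_name(run_name))
```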

For more information about the run resource, see the Run resource reference.

Event

An event represents a point in time when a data transformation moves data between a source and a target entity. An event is a granular record of a specific data movement that connects source and target tables for a specific run. An event can also have multiple sources and targets.

For example, if your run executes the SQL query discussed in the Process section, a lineage event records that the nyc_green_trips_2021 and nyc_green_trips_2022 source tables are used to create the total_green_trips_22_21 target table.

A lineage event contains a list of links that define the source and target. Events are used to create lineage graphs. While the Google Cloud console presents these lineage graphs, it doesn't directly display individual events. You can create, read, and delete, but not update, events by using the Data Lineage API.

Each link within an event defines a single path of data flow from a source entity to a target entity. An entity is a reference to a data asset, such as a BigQuery table, and is identified by its Fully Qualified Name (FQN). A single event can contain multiple links, which is common in operations like table joins where multiple sources contribute to one target.
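
To make the link structure concrete, the following sketch builds an event for the example query as a plain Python dictionary and flattens it into source-target pairs. The field names (links, source, target, fullyQualifiedName) and the bigquery: FQN prefix are assumptions about the API's JSON shape, so check them against the Data Lineage API reference before relying on them.

```python
# Assumed FQN format for BigQuery tables: bigquery:PROJECT.DATASET.TABLE
SOURCES = [
    "bigquery:dataplex-docs.data_lineage_demo.nyc_green_trips_2021",
    "bigquery:dataplex-docs.data_lineage_demo.nyc_green_trips_2022",
]
TARGET = "bigquery:dataplex-docs.data_lineage_demo.total_green_trips_22_21"

# One link per source-target pair: two sources feeding one target gives two links.
# Field names are assumptions, not a verified copy of the API schema.
event = {
    "links": [
        {
            "source": {"fullyQualifiedName": source},
            "target": {"fullyQualifiedName": TARGET},
        }
        for source in SOURCES
    ]
}


def edges(lineage_event: dict) -> list[tuple[str, str]]:
    """Flatten an event into (source FQN, target FQN) pairs."""
    return [
        (link["source"]["fullyQualifiedName"], link["target"]["fullyQualifiedName"])
        for link in lineage_event["links"]
    ]


for source, target in edges(event):
    print(source, "->", target)
```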

For details on how events support column-level lineage, see Column-level lineage.

Lineage granularity

Data lineage lets you trace the origin and transformation path of your data at both the table and column level.

Table-level lineage

Table-level lineage provides a high-level overview of your data pipelines by showing the relationships between entire tables. Use table-level lineage for macro-level tasks such as the following:

Data discovery. An analyst building a new dashboard can use table-level lineage to trace a summary table back to its sources and confirm that the data originates from an authoritative database.
Migration planning. A database administrator planning to migrate a core database can use table-level lineage to identify every downstream report and dashboard that depends on it.
Auditing and governance. A data governor can use table-level and column-level lineage to check how data from a table that contains personally identifiable information (PII) flows through a pipeline.

Column-level lineage

Note: Column-level lineage is only supported for BigQuery jobs.

Column-level lineage provides a more granular view by tracking the flow of data between individual columns. In this view, the links within a lineage event represent the relationship between a source column and a target column. Each of these column-level links has a dependency type that describes the transformation:

Exact copy: values are copied between columns.
Other: other types of dependencies between columns.

Use column-level lineage for tasks such as the following:

Root cause analysis. If a data analyst finds an incorrect value in a column, they can use column-level lineage to trace it back to the source columns to find the root cause.
Impact analysis. Before a data engineer deprecates a column, they can use column-level lineage to find every downstream column that depends on it.
Data source verification for metrics. A data analyst can use column-level lineage to identify which source columns are used to calculate a metric without deciphering a complex SQL query.
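
Impact analysis amounts to following column-level links forward from one column until nothing new is reached. The sketch below shows that idea on a hand-written list of links. The column names and the tuple layout are hypothetical, and the traversal is an ordinary breadth-first search rather than anything the console is documented to use.

```python
from collections import defaultdict, deque

# Hypothetical column-level links: (source column, target column, dependency type).
LINKS = [
    ("raw.trips.vendor_id", "staging.trips.vendor_id", "Exact copy"),
    ("staging.trips.vendor_id", "reports.totals.vendor_id", "Exact copy"),
    ("staging.trips.vendor_id", "reports.totals.number_of_trips", "Other"),
    ("raw.trips.fare", "staging.trips.fare", "Exact copy"),
]


def downstream_columns(start, links=LINKS, only_exact_copy=False):
    """Return every column reachable from `start` by following links forward."""
    children = defaultdict(list)
    for source, target, dependency in links:
        if only_exact_copy and dependency != "Exact copy":
            continue  # skip links that are not plain copies
        children[source].append(target)

    seen = set()
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for child in children[column]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(downstream_columns("raw.trips.vendor_id")))
```

Reversing each link (target to source) turns the same traversal into an upstream trace for root cause analysis.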

Column-level lineage is automatically collected for the following types of BigQuery jobs:

Lineage views in the Google Cloud console

Data lineage in the Google Cloud console lets you interact with lineage information in two ways: you can explore the lineage graph across multiple available regions, or you can use the Lineage explorer panel to get a more focused view within a specific region. You can also switch between the Graph view and the List view to analyze data flow at different levels of detail.

Lineage views are only available for Dataplex Universal Catalog entries, BigQuery assets, and Vertex AI resources (models, datasets, feature store views, and feature groups).

To see the different views discussed in this page, see Use data lineage with Google Cloud systems.

Lineage graph view

The Graph view visualizes data asset flow and relationships across systems and regions, helping you understand data architecture, trace origins and destinations, and identify patterns. These lineage graphs, generated by the Data Lineage API service for a specific Dataplex Universal Catalog entry, show how data is transformed over time, displaying upstream, downstream, or both flows from a selected root entry.

The Data Lineage API automatically receives asset information from supportedsystems and through API calls for custom sources.

The key elements in the graph are described as follows:

Nodes. Represent the data entities. In a table-level view, a node shows the table name and its columns. In a column-level view, each node represents a specific table and column.
Edges. The lines that connect nodes and represent the processes that occur between them. The appearance of an edge depends on the lineage view:
- In the table-level view, edges have icons to indicate data transformations.
- In the column-level view, edges have labels to indicate data transformations. For example, an edge label might say Exact copy to describe how a source column was copied to a target column.
Process icons and labels. Appear on edges to provide more information about the transformation.
- Icons. Represent the transformation process. When you manually explore the graph, icons on edges represent the source system of the process (for example, BigQuery or Vertex AI). If multiple processes are involved, a 'multiple processes' icon is displayed. If the process source system is unknown, a gear icon is used. When you apply filters, a gear icon is used for all processes.
- Labels. In the column-level lineage view, a label describes the type of dependency between columns: Exact copy or Other.

Manually explore the lineage graph

When you open the Lineage tab, you see the default Graph view. The default view provides a high-level overview across systems and regions, with manual and incremental graph expansion that can load five nodes at a time. Process icons on edges represent the source system or indicate multiple processes.

A default lineage graph view showing interconnected data assets. — Default lineage graph view

Apply filters for a focused lineage view

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

To filter lineage data for focused analysis within a specific region, use the Lineage explorer panel. Here are some criteria that you can use to switch to a focused view:

Column name: Filter lineage by column name to see column-level details.
Direction: Show upstream or downstream lineage, or both.
Time range: Filter lineage based on a specific start or end time.
Dependency type: Filter column-level lineage based on dependency type. Examples of available options include All or Exact copy.

The Lineage explorer panel showing filters for column-level lineage, direction, and time range. — Lineage Explorer panel

The focused view automatically expands the graph up to 3 levels, loading all lineage matching the filter criteria. Lineage Explorer fetches up to 10 levels of the lineage graph, but only the first 3 levels are expanded by default. You can expand the graph to see the remaining levels by clicking on the arrows.
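
The difference between fetched levels and expanded levels can be illustrated with a depth-capped breadth-first search. This is a conceptual sketch only: the graph is hypothetical, the root is treated as level 0 by assumption, and nothing here reflects how Lineage Explorer is actually implemented.

```python
from collections import deque

FETCH_DEPTH = 10   # levels fetched, per the text above
EXPAND_DEPTH = 3   # levels expanded by default, per the text above


def levels_from_root(graph: dict[str, list[str]], root: str) -> dict[str, int]:
    """Breadth-first search that records each node's depth, capped at FETCH_DEPTH."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == FETCH_DEPTH:
            continue  # do not fetch beyond the cap
        for neighbor in graph.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical linear pipeline: t0 -> t1 -> ... -> t12.
graph = {f"t{i}": [f"t{i + 1}"] for i in range(12)}
depth = levels_from_root(graph, "t0")

expanded = [node for node, level in depth.items() if 1 <= level <= EXPAND_DEPTH]
collapsed = [node for node, level in depth.items() if level > EXPAND_DEPTH]
print(expanded)
print(len(collapsed))
```

In this sketch, nodes t1 through t3 are shown immediately, t4 through t10 are fetched but stay collapsed until expanded, and t11 and t12 are never fetched.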

The focused view supports both table-level and column-level lineage, including path visualization from any selected node back to the root. In this focused view, a generic gear icon is used for all processes.

A focused lineage graph view showing filtered data assets. — Focused table-level lineage graph view

To view column-level lineage, use one of the following methods:

In a focused Graph view, click the column icon on a table to switch to column-level lineage.
In default Graph view or focused Graph view, apply a column name in the Lineage explorer panel.

A lineage graph showing column-level relationships between tables. — Column-level lineage view

To remove all filters and return to the default view, click reset.

Node details

To see the details of a node, click the node. A side panel appears and displays detailed information about the selected data asset. For example, in a table-level lineage view, clicking a node displays information such as the asset's fully qualified name, type, and other relevant attributes.

Details panel for a selected node in the lineage graph. — Node details

Audit and history of runs

A complete lineage graph is the result of runs from many different jobs, with each job creating a specific link in the graph. Multiple executions are logged as new runs but don't change the static appearance of the graph.
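
One way to see why repeated executions leave the graph unchanged is to treat the graph as a set of source-target pairs and the run history as a separate list attached to each pair. The run IDs and the tuple layout below are hypothetical.

```python
# Hypothetical run log: (run ID, source table, target table).
RUN_LOG = [
    ("run-0900", "nyc_green_trips_2021", "total_green_trips_22_21"),
    ("run-0900", "nyc_green_trips_2022", "total_green_trips_22_21"),
    ("run-1000", "nyc_green_trips_2021", "total_green_trips_22_21"),
    ("run-1000", "nyc_green_trips_2022", "total_green_trips_22_21"),
]

# The graph keeps one edge per (source, target) pair, however often it ran.
graph_edges = {(source, target) for _run_id, source, target in RUN_LOG}

# The run history keeps every execution that produced each edge.
runs_per_edge = {}
for run_id, source, target in RUN_LOG:
    runs_per_edge.setdefault((source, target), []).append(run_id)

print(len(graph_edges))
print(runs_per_edge[("nyc_green_trips_2021", "total_green_trips_22_21")])
```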

To see the details of these individual executions, click an edge with a process on the graph. In the Query panel that appears, click the Runs tab.

The Query panel showing the Details and Runs tab. — Query panel

Inspect transformation logic

To understand the business logic of a transformation without searching for the code, you can view the exact SQL query that was run. To view the SQL code, click an edge with a process on the graph. In the side panel that appears, click the Details tab.

Lineage path visualization

Note: Lineage path visualization is only available when you have applied filters in the Lineage explorer panel.

Lineage path visualization helps you trace the path from any selected node in the graph back to the root entry. When you select a node and click Visualize path, the graph highlights only the nodes and processes that form the direct lineage path to the root entry.
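
Conceptually, finding the path back to the root means recording, for each node, which node it was reached from, and then walking those pointers backward from the selected node. The sketch below does this with a breadth-first search over a hypothetical graph. If several paths exist, it returns only one shortest path, which may differ from what the console highlights.

```python
from collections import deque


def path_to_root(graph, root, selected):
    """Return the nodes from `selected` back to `root`, or [] if unreachable.

    `graph` maps each node to the neighbors one lineage step farther from the root.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node  # remember how this node was reached
                queue.append(neighbor)

    if selected not in parent:
        return []

    path = []
    node = selected
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


# Hypothetical focused graph rooted at root_table.
graph = {
    "root_table": ["staging_a", "staging_b"],
    "staging_a": ["report_1"],
    "staging_b": ["report_2"],
}
print(path_to_root(graph, "root_table", "report_1"))
```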

To see the lineage path visualization, in the Lineage explorer panel, apply a filter to create a focused Graph view. Then, in the focused Graph view, select a node. In the details panel for the selected node, click Visualize Path.

Lineage path visualization is available for table-level and column-level lineage. You can also use lineage path visualization in the List view.

Lineage path visualization button in column-level lineage graph view. — Lineage path visualization button in column-level lineage graph view

Lineage list view

The List view offers a tabular, structured representation of lineage, synchronized with the Graph view. It facilitates sorting, filtering, and downloading data assets. This view is ideal for analyzing source-target relationships, detailing involved assets, and exporting lineage data.

The List view is available for both table-level and column-level lineage. You can toggle between the following detailed and simplified list views.

Simplified list view: this view is useful for getting a condensed, unique list of all assets involved in the lineage. The columns such as System, Project, Entity, FQN (Fully Qualified Name), Direction, and Depth help you see all the data assets in the lineage, where they reside, their original source, and their distance from the central asset being analyzed. It is ideal for a high-level overview of all entities participating in the data flow. It is the default view.
Detailed list view: this view is designed for analyzing individual source-target relationships. By providing separate columns for Source and Target, you can see each specific data transformation link. This view is ideal for tasks requiring a deep understanding of how data moves between specific pairs of assets, such as auditing individual data flows, understanding dependencies between tables, or exporting detailed lineage records for each connection.
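
The relationship between the two list views can be sketched as a reduction from link rows to asset rows. The example below is an illustration under stated assumptions, not the console's documented behavior: the row layout and table names are hypothetical, and it assumes that each link row contributes the asset farther from the central asset, keeping the smallest depth when an asset appears more than once.

```python
# Hypothetical detailed rows: one per source-target link around the "totals" table.
DETAILED = [
    {"source": "raw_2021", "target": "totals", "direction": "upstream", "depth": 1},
    {"source": "raw_2022", "target": "totals", "direction": "upstream", "depth": 1},
    {"source": "totals", "target": "dashboard", "direction": "downstream", "depth": 1},
    {"source": "totals", "target": "export", "direction": "downstream", "depth": 1},
    {"source": "dashboard", "target": "export", "direction": "downstream", "depth": 2},
]


def simplify(rows):
    """Collapse link rows into one row per asset, keeping its smallest depth."""
    assets = {}
    for row in rows:
        # Assumption: the asset farther from the central asset is the one listed.
        entity = row["source"] if row["direction"] == "upstream" else row["target"]
        key = (entity, row["direction"])
        if key not in assets or row["depth"] < assets[key]["depth"]:
            assets[key] = {
                "entity": entity,
                "direction": row["direction"],
                "depth": row["depth"],
            }
    return list(assets.values())


for asset in simplify(DETAILED):
    print(asset)
```

Five link rows collapse into four asset rows here, because export is reached by two different links.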

Table-level lineage list view

This view shows relationships between tables as a whole. Use the provided filters to select the columns that you require.

Expand the following sections to see the columns available in the table-level list views.

Columns available in simplified table-level list view

System: the system where the data asset is located. Examples include BigQuery.
Project: the Google Cloud project ID containing the data asset.
Entity: the name of the data asset. Examples include a table name.
FQN: the Fully Qualified Name (FQN) of the original source entity or column.
Direction: indicates whether the listed asset is upstream (source) or downstream (target) in the lineage flow.
Depth: the number of lineage steps from the central asset being analyzed.

Columns available in detailed table-level list view

Source system: the system where the source data asset is located. Examples include BigQuery.
Source project: the Google Cloud project ID containing the source data asset.
Source: the name of the source data asset. Examples include a table name.
Source FQN: the FQN of the source entity.
Target system: the system where the target data asset is located. Examples include BigQuery.
Target project: the Google Cloud project ID containing the target data asset.
Target: the name of the target data asset. Examples include a table name.
Target FQN: the FQN of the target entity.
Direction: indicates whether the listed asset is upstream (source) or downstream (target) in the lineage flow.
Depth: the number of lineage steps from the central asset being analyzed.

Column-level lineage list view

This view shows relationships between individual columns in the source and target tables. Use the provided filters to select the columns that you require.

Expand the following sections to see the columns available in the column-level list views.

Columns available in simplified column-level list view

System: the system where the data asset is located. Examples include BigQuery.
Project: the Google Cloud project ID containing the data asset.
Entity: the name of the data asset. Examples include a table name.
Column: the specific column chosen in the Lineage Explorer panel within the entity.
FQN: the Fully Qualified Name (FQN) of the original source entity or column.
Direction: indicates whether the listed asset is upstream (source) or downstream (target) in the lineage flow.
Depth: the number of lineage steps from the central asset being analyzed.

Columns available in detailed column-level list view

Source system: the system where the source data asset is located.
Source project: the Google Cloud project ID containing the source data asset.
Source FQN: the FQN of the source column.
Target system: the system where the target data asset is located.
Target project: the Google Cloud project ID containing the target data asset.
Target FQN: the FQN of the target column.
Direction: indicates if the data flow is upstream or downstream.
Dependency types: describes the nature of the relationship between the columns.
Depth: the number of lineage steps from the central asset being analyzed.

What's next

Learn about lineage sources.
Learn how to track data lineage for a BigQuery table copy and query jobs.
Learn how to use data lineage with Google Cloud systems.


Last updated 2026-02-19 UTC.
