Track data lineage for a BigQuery table
This document describes how to track the lineage of data in BigQuery tables. Data lineage is the process of tracking where data comes from, how it's transformed, and where it moves over time. Understanding data lineage is crucial for ensuring compliance, troubleshooting data issues, and performing root-cause analysis.
This quickstart shows you how to get started with data lineage for BigQuery tables:
- Copy two tables from a publicly available new_york_taxi_trips dataset.
- Combine the total number of taxi rides from both tables into a new table.
- View a lineage visualization graph for all three operations.
Before you begin
Set up your project:
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
If you're using an existing project for this guide, verify that you have the permissions required to complete this guide. If you created a new project, then you already have the required permissions.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataplex, BigQuery, and Data Lineage APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles
To get the permissions that you need to view lineage visualization graphs, ask your administrator to grant you the following IAM roles:
- Dataplex Catalog Viewer (roles/dataplex.catalogViewer) on the Dataplex Universal Catalog resource project
- Data Lineage Viewer (roles/datalineage.viewer) on the project where you use BigQuery
- BigQuery Data Viewer (roles/bigquery.dataViewer) on the project where you use BigQuery
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
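As a rough way to check the list above against a project's IAM policy, the following sketch reports which of the three roles a member lacks. It assumes the policy has been exported as JSON in the standard IAM shape (a `bindings` list of `role` and `members` entries); the function name `missing_roles` and the example member are hypothetical. It only inspects direct bindings on one project, so roles inherited from a folder or organization, granted through a group, or covered by a custom role are not detected.

```python
# The three predefined roles named in this section.
REQUIRED_ROLES = (
    "roles/dataplex.catalogViewer",
    "roles/datalineage.viewer",
    "roles/bigquery.dataViewer",
)


def missing_roles(policy, member, required=REQUIRED_ROLES):
    """Return the required roles that `member` is not directly bound to.

    `policy` is assumed to be a dict in the IAM policy JSON shape:
    {"bindings": [{"role": "...", "members": ["user:..."]}, ...]}
    """
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    # Preserve the order of `required` so the report is stable.
    return [role for role in required if role not in granted]


if __name__ == "__main__":
    # Hypothetical policy and member, for illustration only.
    example_policy = {
        "bindings": [
            {"role": "roles/datalineage.viewer",
             "members": ["user:analyst@example.com"]},
        ]
    }
    print(missing_roles(example_policy, "user:analyst@example.com"))
```

Because the Dataplex Catalog Viewer role is granted on a different project than the other two, you would likely run the check once per project and pass the relevant subset of roles through the `required` parameter.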
Add a public dataset to your project
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer.
If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, click Add data.
In the Add data pane, select Public datasets.
In the Marketplace pane, search for NYC TLC Trips and click the NYC TLC Trips result.
Click View dataset.
This adds the public dataset's project as a reference that you can view in the Explorer pane. The details pane shows Dataset info, including information such as Dataset ID, Data location, and Last modified date.
Create a dataset in your project
In the left pane, click Explorer.
In the Explorer pane, select the project where you want to create the dataset.
Click Actions (the three-dot icon), and then click Create dataset.
On the Create dataset page, in the Dataset ID field, enter data_lineage_demo. Leave the other fields with their default values.
Click Create dataset.
In the Explorer pane, click Datasets, and then click the newly added data_lineage_demo.
The details pane shows its Dataset info.
Copy two publicly accessible tables to your dataset
Open a query editor: In the details pane, next to the tab called data_lineage_demo, click SQL query. This step creates a tab called Untitled.
In the query editor, copy the first table by entering the following query. Replace PROJECT_ID with your project's identifier.
    CREATE TABLE `PROJECT_ID.data_lineage_demo.nyc_green_trips_2021`
    COPY `bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2021`
Click Run. This step creates the first table, called nyc_green_trips_2021.
In the Query results pane, click Go to table. This step displays the contents of the first table.
In the query editor, copy the second table by replacing the previous query with the following query. Replace PROJECT_ID with your project's identifier.
    CREATE TABLE `PROJECT_ID.data_lineage_demo.nyc_green_trips_2022`
    COPY `bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2022`
Click Run. This step creates the second table, called nyc_green_trips_2022.
In the Query results pane, click Go to table. This step displays the contents of the second table.
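The two copy statements differ only in the year and both need the same PROJECT_ID substitution, so one way to avoid editing them by hand is to generate them. The following is a minimal sketch that only builds the SQL text. The helper name `build_copy_statement` and the project ID `my-project` are hypothetical, and the script does not call BigQuery; you would still paste each statement into the query editor and run it.

```python
# Dataset and table names are taken from the steps above.
SOURCE_DATASET = "bigquery-public-data.new_york_taxi_trips"
TARGET_DATASET = "data_lineage_demo"


def build_copy_statement(project_id, year):
    """Return the CREATE TABLE ... COPY statement for one year's trips table."""
    target = f"{project_id}.{TARGET_DATASET}.nyc_green_trips_{year}"
    source = f"{SOURCE_DATASET}.tlc_green_trips_{year}"
    return f"CREATE TABLE `{target}`\nCOPY `{source}`"


if __name__ == "__main__":
    # "my-project" is a placeholder; substitute your own project ID.
    for year in (2021, 2022):
        print(build_copy_statement("my-project", year))
        print()
```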
Aggregate data into a new table
In the query editor, enter the following query. Replace PROJECT_ID with your project's identifier.
    CREATE TABLE `PROJECT_ID.data_lineage_demo.total_green_trips_22_21`
    AS SELECT vendor_id, COUNT(*) AS number_of_trips
    FROM (
      SELECT vendor_id FROM `PROJECT_ID.data_lineage_demo.nyc_green_trips_2022`
      UNION ALL
      SELECT vendor_id FROM `PROJECT_ID.data_lineage_demo.nyc_green_trips_2021`
    )
    GROUP BY vendor_id
Click Run. This step creates a combined table, called total_green_trips_22_21.
In the Query results pane, click Go to table. This step displays the combined table.
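To see what the combined table should contain before running the query on real data, the same UNION ALL and GROUP BY shape can be tried on a few rows locally. The sketch below uses Python's built-in sqlite3 module as a stand-in, not BigQuery, and the sample vendor IDs are made up. The table names mirror the ones in this guide without the project and dataset prefix.

```python
import sqlite3


def total_trips_by_vendor(trips_2021, trips_2022):
    """Count trips per vendor across both years using UNION ALL + GROUP BY.

    Each argument is a list of vendor IDs, one entry per trip.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nyc_green_trips_2021 (vendor_id TEXT)")
    conn.execute("CREATE TABLE nyc_green_trips_2022 (vendor_id TEXT)")
    conn.executemany(
        "INSERT INTO nyc_green_trips_2021 (vendor_id) VALUES (?)",
        [(v,) for v in trips_2021],
    )
    conn.executemany(
        "INSERT INTO nyc_green_trips_2022 (vendor_id) VALUES (?)",
        [(v,) for v in trips_2022],
    )
    # Same query shape as the BigQuery statement above.
    # UNION ALL keeps duplicate rows, so every trip is counted.
    conn.execute(
        """
        CREATE TABLE total_green_trips_22_21 AS
        SELECT vendor_id, COUNT(*) AS number_of_trips
        FROM (
            SELECT vendor_id FROM nyc_green_trips_2022
            UNION ALL
            SELECT vendor_id FROM nyc_green_trips_2021
        )
        GROUP BY vendor_id
        """
    )
    rows = conn.execute(
        "SELECT vendor_id, number_of_trips "
        "FROM total_green_trips_22_21 ORDER BY vendor_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Made-up sample data: three trips in 2021, four in 2022.
    print(total_trips_by_vendor(["1", "2", "2"], ["1", "1", "1", "2"]))
```

Note that plain UNION (without ALL) would remove duplicate vendor_id rows before counting, which would give one trip per vendor rather than the totals.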
View the lineage graph in Dataplex Universal Catalog
In the Google Cloud console, go to the Dataplex Universal Catalog Search page.
If your search platform is set to Data Catalog, in the Choose search platform menu, select Dataplex Universal Catalog.
In the Search box, enter total_green_trips_22_21 and click Search.
From the results list, click total_green_trips_22_21. This step displays the BigQuery table Details tab.
Click the Lineage tab.

In the lineage graph, each rectangular node represents a table, either an original, copied, or combined table. You can do the following:
- To show or hide the origin of a table, click + (Expand) or - (Collapse).
- To show table information, click a node. This step displays a node Details pane.
- To show process information, click the process icon. This step displays a process Details pane showing the job that transformed a source table to a target table.
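The graph for this quickstart can be pictured as a small directed graph: two copy jobs feed the two nyc_green_trips tables, and one query job feeds the combined table. The following sketch models that structure in memory and walks upstream from a table, which is roughly what expanding nodes in the console does. It is an illustration only and does not call the Data Lineage API. The edge list is written by hand from the steps in this guide, and `upstream_tables` is a hypothetical helper.

```python
# Each edge points from a source table to the table derived from it.
LINEAGE_EDGES = [
    ("bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2021",
     "data_lineage_demo.nyc_green_trips_2021"),
    ("bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2022",
     "data_lineage_demo.nyc_green_trips_2022"),
    ("data_lineage_demo.nyc_green_trips_2021",
     "data_lineage_demo.total_green_trips_22_21"),
    ("data_lineage_demo.nyc_green_trips_2022",
     "data_lineage_demo.total_green_trips_22_21"),
]


def upstream_tables(target, edges):
    """Return every table that feeds `target`, directly or indirectly, sorted."""
    # Index the edges by destination so parents are easy to look up.
    parents = {}
    for source, destination in edges:
        parents.setdefault(destination, []).append(source)

    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    for table in upstream_tables(
        "data_lineage_demo.total_green_trips_22_21", LINEAGE_EDGES
    ):
        print(table)
```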

Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the project
Delete the dataset
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer.
In the Explorer pane, search for the data_lineage_demo dataset that you created.
Click the dataset, and then click Delete.
Confirm your delete action.
What's next
- Learn more about data lineage.
- Learn how to run BigQuery queries.
- Learn how to use data lineage.
- Learn about Dataplex Universal Catalog pricing.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.