Track data lineage for a BigQuery table

This document describes how to track the lineage of data in BigQuery tables. Data lineage is the process of tracking where data comes from, how it's transformed, and where it moves over time. Understanding data lineage is crucial for ensuring compliance, troubleshooting data issues, and performing root-cause analysis.

This quickstart shows you how to get started with data lineage for BigQuery tables:

  1. Copy two tables from the publicly available new_york_taxi_trips dataset.

  2. Combine the total number of taxi rides from both tables into a new table.

  3. View a lineage visualization graph for all three operations.

Before you begin

Set up your project:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. If you're using an existing project for this guide, verify that you have the permissions required to complete this guide. If you created a new project, then you already have the required permissions.

  4. Verify that billing is enabled for your Google Cloud project.

  5. Enable the Dataplex, BigQuery, and Data Lineage APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

Caution: Data lineage is enabled on a per-project basis, not a per-service basis. After you enable the Data Lineage API, lineage information is automatically reported for multiple Google Cloud services in the project, depending on their product-level lineage control. For more details, see Data lineage considerations.

Required roles

To get the permissions that you need to view lineage visualization graphs, ask your administrator to grant you the following IAM roles:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Add a public dataset to your project

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.

    If you don't see the left pane, click Expand left pane to open the pane.

  3. In the Explorer pane, click Add data.

  4. In the Add data pane, select Public datasets.

  5. In the Marketplace pane, search for NYC TLC Trips and click the NYC TLC Trips result.

  6. Click View dataset.

This adds the public dataset's project as a reference that you can view in the Explorer pane. The details pane shows Dataset info, including information such as Dataset ID, Data location, and Last modified date.

Create a dataset in your project

  1. In the left pane, click Explorer.

  2. In the Explorer pane, select the project where you want to create the dataset.

  3. Click Actions (the vertical three-dot icon), and then click Create dataset.

  4. On the Create dataset page, in the Dataset ID field, enter data_lineage_demo. Leave the other fields with their default values.

  5. Click Create dataset.

  6. In the Explorer pane, click Datasets, and then click the newly added data_lineage_demo.

The details pane shows its Dataset info.

Copy two publicly accessible tables to your dataset

  1. Open a query editor: In the details pane, next to the tab called data_lineage_demo, click SQL query. This step creates a tab called Untitled.

  2. In the query editor, copy the first table by entering the following query. Replace PROJECT_ID with your project's identifier.

    CREATE TABLE `PROJECT_ID.data_lineage_demo.nyc_green_trips_2021`
    COPY `bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2021`

  3. Click Run. This step creates the first table, called nyc_green_trips_2021.

  4. In the Query results pane, click Go to table. This step displays the contents of the first table.

  5. In the query editor, copy the second table by replacing the previous query with the following query. Replace PROJECT_ID with your project's identifier.

    CREATE TABLE `PROJECT_ID.data_lineage_demo.nyc_green_trips_2022`
    COPY `bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2022`

  6. Click Run. This step creates the second table, called nyc_green_trips_2022.

  7. In the Query results pane, click Go to table. This step displays the contents of the second table.
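
The two copy statements differ only in the year, so they can be generated from a single template. The following is a minimal Python sketch under stated assumptions: the helper name build_copy_statement and the project ID my-project are hypothetical placeholders, and the code only builds the SQL text. It does not submit anything to BigQuery.

```python
# Hypothetical helper: builds the CREATE TABLE ... COPY statement used in
# the steps above for a given year. It only returns SQL text.
PUBLIC_DATASET = "bigquery-public-data.new_york_taxi_trips"


def build_copy_statement(project_id: str, year: int) -> str:
    """Return the DDL that copies one public green-trips table."""
    target = f"{project_id}.data_lineage_demo.nyc_green_trips_{year}"
    source = f"{PUBLIC_DATASET}.tlc_green_trips_{year}"
    return f"CREATE TABLE `{target}`\nCOPY `{source}`"


if __name__ == "__main__":
    # "my-project" is a placeholder; substitute your own project ID.
    for year in (2021, 2022):
        print(build_copy_statement("my-project", year))
        print()
```

The printed statements can be pasted into the query editor in place of the hand-edited queries.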

Aggregate data into a new table

  1. In the query editor, enter the following query. Replace PROJECT_ID with your project's identifier.

    CREATE TABLE `PROJECT_ID.data_lineage_demo.total_green_trips_22_21`
    AS SELECT vendor_id, COUNT(*) AS number_of_trips
    FROM (
      SELECT vendor_id FROM `PROJECT_ID.data_lineage_demo.nyc_green_trips_2022`
      UNION ALL
      SELECT vendor_id FROM `PROJECT_ID.data_lineage_demo.nyc_green_trips_2021`
    )
    GROUP BY vendor_id

  2. Click Run. This step creates a combined table, called total_green_trips_22_21.

  3. In the Query results pane, click Go to table. This step displays the combined table.
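
To see what the aggregation computes, the following Python sketch mimics the query's UNION ALL and GROUP BY on a few made-up rows. The vendor IDs and row counts are invented for illustration and are not taken from the public dataset; the function name is likewise hypothetical.

```python
from collections import Counter

# Made-up vendor_id values standing in for rows of the two copied tables.
trips_2021 = ["1", "2", "2"]
trips_2022 = ["2", "1", "2", "2"]


def count_trips_by_vendor(*tables: list[str]) -> dict[str, int]:
    """Mimic UNION ALL (concatenate rows, keeping duplicates) followed by
    GROUP BY vendor_id with COUNT(*)."""
    combined = [vendor_id for table in tables for vendor_id in table]
    return dict(Counter(combined))


if __name__ == "__main__":
    print(count_trips_by_vendor(trips_2022, trips_2021))
```

Because UNION ALL keeps duplicate rows, every trip from both years contributes to the per-vendor count.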

View the lineage graph in Dataplex Universal Catalog

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Search page.

    Go to Search

  2. If your search platform is set to Data Catalog, in the Choose search platform menu, select Dataplex Universal Catalog.

  3. In the Search box, enter total_green_trips_22_21 and click Search.

  4. From the results list, click total_green_trips_22_21. This step displays the BigQuery table Details tab.

  5. Click the Lineage tab.

Note: It might take some time for automatic lineage to pick up these changes. If you don't see the lineage graph yet, check again after several minutes.

Figure 1. Data lineage with node details. The total_green_trips_22_21 table is shown with the details panel docked to the bottom.

In the lineage graph, each rectangular node represents a table: an original, copied, or combined table. You can do the following:

  • To show or hide the origin of a table, click + (Expand) or - (Collapse).

  • To show table information, click a node. This step displays a node Details pane.

  • To show process information, click View lineage process details. This step displays a process Details pane showing the job that transformed a source table to a target table.

Figure 2. Data lineage with process details. The intermediary nyc_green_trips_2021 table is shown with the details panel docked to the bottom.
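
The graph produced by this quickstart can be modeled as a small directed graph. The sketch below is an illustrative Python model, not the Data Lineage API: the LINEAGE dictionary and the upstream_tables function are hypothetical, and the edges simply restate the two copy jobs and the aggregation query performed earlier.

```python
from collections import deque

# Each target table maps to the source tables it was created from.
# Edges restate the two copy jobs and the aggregation query.
LINEAGE = {
    "nyc_green_trips_2021": ["tlc_green_trips_2021"],
    "nyc_green_trips_2022": ["tlc_green_trips_2022"],
    "total_green_trips_22_21": ["nyc_green_trips_2022", "nyc_green_trips_2021"],
}


def upstream_tables(table: str) -> list[str]:
    """Return every table that feeds `table`, nearest sources first."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(LINEAGE.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        # Follow the edges of the current table one more hop upstream.
        queue.extend(LINEAGE.get(current, []))
    return order


if __name__ == "__main__":
    print(upstream_tables("total_green_trips_22_21"))
```

In this model, expanding a node in the lineage graph corresponds to following its edges one hop upstream.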

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the project

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the dataset

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.

  3. In the Explorer pane, search for the data_lineage_demo dataset that you created.

  4. Click the dataset, and then click Delete.

  5. Confirm your delete action.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.