Data Catalog overview
Data Catalog is a central inventory of an organization's data assets. Data Catalog automatically catalogs metadata from Google Cloud sources such as BigQuery, Vertex AI, Pub/Sub, Spanner, Bigtable, and more. Data Catalog also indexes table and fileset metadata from Cloud Storage through discovery.
You can discover data with Dataplex Universal Catalog's governed organization-wide metadata search capability. You can further enrich metadata with critical business context, and enable lineage tracking, data profiling, data quality checks, and access control capabilities.
Using Data Catalog, organizations can achieve better data discovery, metadata management, and governance.
Why do you need Data Catalog?
Most organizations deal with a large and growing number of data assets. Data stakeholders (consumers, producers, and administrators) within an organization face multiple challenges, including the following:
Searching for insightful data:
- Data consumers don't know the location and origin of data. They have to navigate data "swamps".
- Data consumers don't know what data to use to get insights because most data isn't well documented and, even if documented, isn't well maintained.
- Data can't be found and is often lost when it resides only in people's minds.
Understanding data:
- Is the data fresh, clean, validated, and approved for use in production?
- Which dataset out of several duplicate sets is relevant and up to date?
- How does one dataset relate to another?
- Who is using the data and who is the owner?
- Who and what processes are transforming the data?
Making data useful:
- Data producers don't have an efficient way to offer their data to consumers. Without self-service, consumers may overwhelm producers: a few data engineers can't manually provide data to thousands of data analysts.
- Valuable time is lost if data consumers have to find out how to request data access, wait without a defined response time, escalate, and wait again.
Without the right tools, these challenges become a major obstacle to the efficient use of data. Data Catalog provides a centralized repository that lets organizations achieve the following:
- Gain a unified view to reduce the pain of searching for the right data.
- Support data-driven decision making and shorten the time to insight by enriching data with technical and business metadata.
- Improve data management to increase operational efficiency and productivity.
- Take ownership over the data to improve trust and confidence in it.
Data Catalog functions
Data Catalog provides three main functions:
- Searching for data entries to which you have access
- Tagging data entries with metadata
- Providing column-level security for BigQuery tables
In addition, Data Catalog can build on the results of a Sensitive Data Protection scan to identify sensitive data directly within Data Catalog in the form of tag templates.
How Data Catalog works
Data Catalog can catalog asset metadata from different Google Cloud systems.
You can also use Data Catalog APIs to integrate with custom data sources.
After your data is cataloged, you can add your own metadata to these assets using tags.
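As an illustration of adding your own metadata with tags, the following is a minimal sketch that assumes the `google-cloud-datacatalog` Python client library is installed and credentials are configured. The helper names `tag_field_kwargs` and `attach_tag`, and any entry, template, and field names you pass in, are hypothetical. The client import is deferred so that the value-mapping helper can be tried on its own.

```python
def tag_field_kwargs(value):
    """Pick the TagField attribute that matches a Python value's type."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"bool_value": value}
    if isinstance(value, (int, float)):
        return {"double_value": float(value)}
    if isinstance(value, str):
        return {"string_value": value}
    raise TypeError(f"unsupported tag value type: {type(value).__name__}")


def attach_tag(entry_name, template_name, values):
    """Attach a tag to a cataloged entry (needs the client library and credentials)."""
    from google.cloud import datacatalog_v1  # deferred import, see lead-in

    client = datacatalog_v1.DataCatalogClient()
    tag = datacatalog_v1.types.Tag()
    tag.template = template_name  # resource name of an existing tag template
    for field_id, value in values.items():
        # field_id must match a field defined in the tag template.
        tag.fields[field_id] = datacatalog_v1.types.TagField(**tag_field_kwargs(value))
    return client.create_tag(parent=entry_name, tag=tag)


print(tag_field_kwargs("finance-team"))
print(tag_field_kwargs(True))
```

The tag template must already exist and its field types must agree with the values you supply. This sketch covers only string, numeric, and boolean fields.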

Data Catalog metadata
Data Catalog handles two types of metadata: technical metadata and business metadata. To learn more about metadata, see Data Catalog metadata.
Search and discovery
Data Catalog offers a powerful predicate-based search experience for technical and business metadata associated with a data entry. To search for and discover a data entry, you must have permission to read its metadata. Data Catalog does not index the data within a data entry. Data Catalog only indexes the metadata that describes an asset.
Data Catalog controls some metadata such as user-generated tags. For all metadata sourced from the underlying storage system, Data Catalog is a read-only service that reflects the metadata and permissions provided by the underlying storage system. You can make edits in the underlying storage system to add, update, or delete the metadata of a data entry.
To learn more about Data Catalog search, see Search for data assets with Data Catalog.
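To make the predicate-based search described above more concrete, here is a minimal sketch in Python. It assumes the `google-cloud-datacatalog` client library and the qualifier forms of the search syntax (`key=value` for exact matches such as `type=table`, and `key:value` for substring matches such as `column:email`). The helper names `build_search_query` and `search_catalog` are hypothetical; check the search syntax reference before relying on specific qualifiers.

```python
def build_search_query(equals=None, contains=None, keywords=()):
    """Join predicates with spaces, which the search syntax treats as AND."""
    parts = list(keywords)
    parts += [f"{key}={value}" for key, value in (equals or {}).items()]
    parts += [f"{key}:{value}" for key, value in (contains or {}).items()]
    return " ".join(parts)


def search_catalog(project_id, query):
    """Run the query against one project (needs the client library and credentials)."""
    from google.cloud import datacatalog_v1  # deferred so the builder runs standalone

    client = datacatalog_v1.DataCatalogClient()
    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)
    # Results are limited to entries whose metadata the caller can read.
    results = client.search_catalog(scope=scope, query=query)
    return [result.relative_resource_name for result in results]


query = build_search_query(
    equals={"type": "table", "system": "bigquery"},
    contains={"column": "email"},
)
print(query)  # type=table system=bigquery column:email
```

Because only metadata is indexed, a query like this matches column names and descriptions, not the values stored in the tables.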
Automatic cataloging of assets
For a given project, Data Catalog automatically catalogs the following Google Cloud assets:
- BigQuery sharing (formerly Analytics Hub) linked datasets
- BigQuery datasets, tables, models, routines, and connections
- Bigtable instances, clusters, and tables (including column family details)
- Dataplex Universal Catalog lakes, zones, tables, and filesets
- Dataproc Metastore services, databases, and tables
- Pub/Sub topics
- Spanner instances, databases, tables, and views
- Vertex AI models, datasets, and Vertex AI Feature Store resources
Note: If a project name contains a colon (:), Dataplex Universal Catalog doesn't catalog FeatureView and Feature resources created in that project.
In addition to cataloging assets within the project IDs for which you have metadata access, Data Catalog can catalog data stored in the BigQuery projects that contain public datasets.
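Because these assets are cataloged automatically, you can look up an existing entry instead of creating one. The sketch below assumes the `google-cloud-datacatalog` Python client library and the `//bigquery.googleapis.com/...` linked-resource format for BigQuery tables. The helper names are hypothetical, and the public table used in the example is only an illustrative choice.

```python
def bigquery_table_resource(project_id, dataset_id, table_id):
    """Build the linked resource name that identifies a BigQuery table."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def lookup_table_entry(project_id, dataset_id, table_id):
    """Fetch the catalog entry for a table (needs the client library and credentials)."""
    from google.cloud import datacatalog_v1  # deferred so the helper above runs standalone

    client = datacatalog_v1.DataCatalogClient()
    resource = bigquery_table_resource(project_id, dataset_id, table_id)
    return client.lookup_entry(request={"linked_resource": resource})


resource = bigquery_table_resource(
    "bigquery-public-data", "new_york_taxi_trips", "taxi_zone_geom"
)
print(resource)
```

The returned entry describes the table's metadata, such as its schema. The table's rows are not part of the catalog.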
Catalog non-Google Cloud assets
To catalog metadata from non-Google Cloud systems in your organization, you can use the following:
- Community-contributed connectors to multiple popular on-premises data sources
- The Data Catalog APIs, on which you can manually build custom entries
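For the manual route, the following is a minimal sketch of creating a custom entry with the `google-cloud-datacatalog` Python client library. It assumes an entry group already exists. The helper names, the system and type labels, and the linked resource URL are all hypothetical. The identifier check reflects my reading of the API's constraints on user-specified system and type names (letters, digits, and underscores, not starting with a digit, at most 64 characters), so verify it against the API reference.

```python
import re

# Assumed constraint on user-specified system/type names; verify against the API reference.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def custom_entry_fields(system, asset_type, display_name, linked_resource):
    """Validate and assemble the fields of a custom (non-Google Cloud) entry."""
    for label, value in (("system", system), ("asset_type", asset_type)):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"{label} is not a valid identifier: {value!r}")
    return {
        "user_specified_system": system,
        "user_specified_type": asset_type,
        "display_name": display_name,
        "linked_resource": linked_resource,
    }


def create_custom_entry(entry_group_name, entry_id, fields):
    """Create the entry in an existing entry group (needs the client library and credentials)."""
    from google.cloud import datacatalog_v1  # deferred so the helper above runs standalone

    client = datacatalog_v1.DataCatalogClient()
    entry = datacatalog_v1.types.Entry(**fields)
    return client.create_entry(parent=entry_group_name, entry_id=entry_id, entry=entry)


fields = custom_entry_fields(
    "onprem_data_system",
    "onprem_data_asset",
    "Orders table",
    "//my-onprem-server.example/dataAssets/orders",
)
print(fields["user_specified_system"])
```

After the entry exists, it can be searched and tagged like an automatically cataloged asset.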
Access Data Catalog
You can access Data Catalog functionality using the following:
- Dataplex Universal Catalog in the Google Cloud console
- gcloud command-line interface (CLI)
What's next
- Learn how to tag a BigQuery table by using Data Catalog.
- Learn how to search data assets with Data Catalog.
- Learn how to integrate Google Cloud and on-premises data sources with Data Catalog.
Last updated 2026-02-18 UTC.