Introduction to data governance in BigQuery

BigQuery has built-in governance capabilities that simplify how you discover, manage, monitor, govern, and use your data and AI assets.

Administrators, data stewards, data governance managers, and data custodians can use the governance capabilities in BigQuery to do the following:

  • Discover data.
  • Curate data.
  • Gather and enrich metadata.
  • Manage data quality.
  • Ensure that data is used consistently and in compliance with organizational policies.
  • Share data at scale and in a secure fashion.

BigQuery governance capabilities are powered by Dataplex Universal Catalog, a centralized inventory of all data assets in your organization. Dataplex Universal Catalog holds business, technical, and operational metadata for all of your data. It helps you discover relationships and semantics in the metadata by applying artificial intelligence and machine learning.

BigLake metastore lets you use multiple data processing engines to query a single copy of data with a single schema, without data duplication. The data processing engines that you can use include BigQuery, Apache Spark, Apache Flink, and Apache Hive. Your data can be stored in locations like BigQuery storage tables, BigLake tables for Apache Iceberg in BigQuery, or BigLake external tables.

BigQuery supports an end-to-end data lifecycle, from discovery to use of data. Governance features are also available in Dataplex Universal Catalog.

Data discovery

BigQuery discovers data across the organization in Google Cloud, whether the data is in BigQuery, Spanner, Cloud SQL, Pub/Sub, or Cloud Storage. The metadata is automatically extracted and stored in Dataplex Universal Catalog. For example, you can extract metadata for structured and unstructured data from Cloud Storage, and you can automatically create query-ready BigLake tables at scale. This lets you perform analytics with an open source engine without data duplication.

You can also extract and catalog metadata from third-party data sources using custom connectors.

BigQuery offers the following data discovery capabilities:

  • Search. Search for data and AI resources across projects and the organization. Within BigQuery in the Google Cloud console, use semantic search (Preview) to search for resources by using everyday language. Or, find resources by using keyword search in Dataplex Universal Catalog.
  • Automatic discovery of Cloud Storage data. Scan for data in Cloud Storage buckets to extract and then catalog metadata. Automatic discovery creates tables for both structured and unstructured data.
  • Metadata import. Import metadata at scale from third-party systems into Dataplex Universal Catalog. You can build custom connectors to extract data from your data sources, and then run managed connectivity pipelines that orchestrate the metadata import workflow.
  • Metadata export. Export metadata at scale out of Dataplex Universal Catalog. You can analyze the exported metadata with BigQuery, or integrate the metadata into custom applications or programmatic processing workflows.

Curation and data stewardship

To improve the discoverability and usability of data, data stewards and administrators can use BigQuery to review, update, and analyze metadata. BigQuery data curation and stewardship capabilities help you ensure that your data is accurate, consistent, and aligned with your organization's policies.

BigQuery offers the following data curation and stewardship capabilities:

  • Business glossary. Improve context, collaboration, and search by defining your organization's terminology in a glossary. Identify data stewards for the terms, and attach terms to data asset fields.
  • Data insights. Gemini uses metadata to generate natural language questions about your table and the SQL queries to answer them. These data insights help you uncover patterns, assess data quality, and perform statistical analysis.
  • Data profiling. Identify common statistical characteristics of the columns in BigQuery tables to understand and analyze your data more effectively.
  • Data quality. Define and run data quality checks across tables in BigQuery and Cloud Storage, and apply regular and ongoing data controls in BigQuery environments.
  • Data lineage. Track how data moves through your systems: where it comes from, where it's passed to, and what transformations are applied to it. BigQuery supports data lineage at the table and column levels.
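
To make the idea of table-level lineage concrete, the following is a minimal Python sketch that treats lineage as a directed graph and walks it to find every table that feeds a given target. It is an illustration only, not how BigQuery or Dataplex Universal Catalog stores or queries lineage. The table names, the transformation descriptions, and the upstream_tables helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source_table, target_table, transformation).
# The table names and transformation descriptions are made up for illustration.
EDGES = [
    ("raw.orders", "staging.orders_clean", "filter out test orders"),
    ("raw.customers", "staging.customers_clean", "deduplicate by email"),
    ("staging.orders_clean", "mart.revenue_by_customer", "join and aggregate"),
    ("staging.customers_clean", "mart.revenue_by_customer", "join and aggregate"),
]


def upstream_tables(target, edges):
    """Return every table that feeds `target`, directly or indirectly."""
    # Map each table to the set of tables it is derived from.
    parents = defaultdict(set)
    for source, dest, _ in edges:
        parents[dest].add(source)

    # Depth-first walk from the target back toward the original sources.
    seen = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(upstream_tables("mart.revenue_by_customer", EDGES)))
```

Walking the same graph in the opposite direction would answer the "where it's passed to" question, for example to estimate which downstream tables are affected by a schema change.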

Next steps for curation and data stewardship

The following table outlines next steps that you can take to learn more about curation and data stewardship features:

Experience level | Learning path
New cloud users
  • Run a data profile scan to gain insights about your data, including the limits or averages of your data.
Experienced cloud users
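
As a rough picture of the kind of column statistics that data profiling reports, the following Python sketch computes a null ratio, a distinct count, and the minimum, maximum, and mean of a single column. It is a simplified stand-in under stated assumptions, not the algorithm or output format of a managed data profile scan. The profile_column function and the sample values are hypothetical.

```python
from statistics import mean


def profile_column(values):
    """Summarize one column: null ratio, distinct count, min, max, and mean."""
    non_null = [v for v in values if v is not None]
    # Only numeric values (excluding booleans) contribute to min, max, and mean.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_count": len(set(non_null)),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "mean": mean(numeric) if numeric else None,
    }


# Hypothetical sample column with one missing value.
order_totals = [10.0, 25.0, None, 25.0, 40.0]
profile = profile_column(order_totals)
print(profile)
```

A high null ratio or an unexpected minimum in a summary like this is the sort of signal that typically prompts a data quality check on the column.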

Security and access control

Data access management is the process of defining, enforcing, and monitoring the rules and policies governing who has access to data. Access management ensures that data is only accessible to those who are authorized to access it.

BigQuery offers the following security and access control capabilities:

  • Identity and Access Management (IAM). IAM lets you control who has access to your BigQuery resources, such as projects, datasets, tables, and views. You can grant IAM roles to users, groups, and service accounts. These roles define what they can do with your resources.
  • Column-level access controls and row-level access controls. Column-level and row-level access controls let you restrict access to specific columns and rows in a table, based on user attributes or data values. This control lets you implement fine-grained access to help protect sensitive data from unauthorized access.
  • Data transfer management. VPC Service Controls lets you create perimeters around Google Cloud resources and control access to those resources based on your organization's policies.
  • Audit logs. Audit logs provide you with a detailed record of user activity and system events in your organization. These logs help you enforce data governance policies and identify potential security risks.
  • Data masking. Data masking lets you obscure sensitive data in a table while still permitting authorized users to access the surrounding data. Data masking can also obscure data that matches sensitive data patterns, safeguarding against accidental data disclosure.
  • Encryption. BigQuery automatically encrypts all data at rest and in transit, while letting you customize your encryption settings to meet your specific requirements.
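
Row-level access controls are defined in BigQuery with a CREATE ROW ACCESS POLICY statement. The following Python sketch only formats such a statement as text, so it can be reviewed before anyone runs it. The row_access_policy_ddl helper, the policy name, the table path, the group address, and the region column are all hypothetical, and the helper does not validate or escape its inputs.

```python
def row_access_policy_ddl(policy_name, table, grantees, filter_expression):
    """Build a CREATE ROW ACCESS POLICY statement as a string.

    The function only formats text. It does not validate or escape its
    inputs, and it does not contact BigQuery.
    """
    # Each grantee is a quoted principal such as "group:name@example.com".
    grantee_list = ", ".join(f'"{g}"' for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expression});"
    )


# Hypothetical policy: US analysts see only rows where region is "US".
ddl = row_access_policy_ddl(
    policy_name="us_sales_only",
    table="my_project.sales.orders",
    grantees=["group:us-analysts@example.com"],
    filter_expression='region = "US"',
)
print(ddl)
```

The generated text could then be submitted like any other query, for example through the query method of the google-cloud-bigquery Python client, assuming that the caller has permission to manage row access policies on the table.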

Next steps for security and access control

The following table outlines next steps that you can take to learn more about access control features:

Experience level | Learning path
New cloud users
Experienced cloud users
  • For greater flexibility and granularity in managing your permissions, consider creating custom roles that match your needs.
  • Add row and column controls to help control access to specific rows and columns in your tables.
  • Establish an access perimeter around your Google Cloud resources by setting up VPC Service Controls.
  • Add column-level data masking to your table to share information through your organization without revealing sensitive data.
  • Use Sensitive Data Protection to scan your data for sensitive and high-risk information, such as personally identifiable information (PII), financial data, and health information.
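
The effect of column-level data masking can be pictured with a small Python sketch: authorized readers get the raw values, and everyone else gets a masked form. This is a conceptual illustration only and does not reproduce how BigQuery applies masking. The rule names ("nullify", "sha256", "last_four"), the mask_value and read_column functions, and the sample account numbers are all hypothetical.

```python
import hashlib


def mask_value(value, rule):
    """Apply an illustrative masking rule to a string value."""
    if value is None or rule == "nullify":
        return None
    if rule == "sha256":
        # Hashing maps equal inputs to equal outputs, so masked values
        # can still be grouped or joined without exposing the original.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if rule == "last_four":
        # Keep only the last four characters; replace the rest with "X".
        return "X" * max(len(value) - 4, 0) + value[-4:]
    raise ValueError(f"unknown masking rule: {rule}")


def read_column(values, rule, can_see_raw):
    """Return raw values for authorized readers, masked values otherwise."""
    if can_see_raw:
        return list(values)
    return [mask_value(v, rule) for v in values]


# Hypothetical account numbers.
accounts = ["4111111111111111", "5500005555555559"]
print(read_column(accounts, "last_four", can_see_raw=False))
# prints ['XXXXXXXXXXXX1111', 'XXXXXXXXXXXX5559']
```

Which rule fits depends on how the masked column is used: a hash preserves the ability to join on the column, while keeping the last four characters suits values that people need to recognize.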

Shared data and insights

BigQuery lets you share data and insights at scale within and across organizational boundaries. It provides a robust security and privacy framework through a built-in data exchange platform. Using BigQuery sharing, you can discover, access, and consume a data library that's curated by a wide selection of data providers.

BigQuery offers the following sharing capabilities:

  • Share more than data. You can share a wide range of data and AI assets such as BigQuery datasets, tables, views, real-time streams with Pub/Sub topics, SQL stored procedures, and BigQuery ML models.
  • Access Google datasets. Augment your analytics and ML initiatives with Google datasets from Search Trends, DeepMind WeatherNext models, Google Maps Platform, Google Earth Engine, and more.
  • Integrate with data governance principles. Data owners retain control over their data and have the ability to define and configure rules or policies to restrict access and usage.
  • Live, zero-copy data sharing. Data is shared in place with no integration, data movement, or replication needed, ensuring analysis is based on the latest information. The linked datasets that are created act as live pointers to the shared asset.
  • Enhance security posture. You can use access controls, including built-in VPC Service Controls support, to reduce overprovisioned access.
  • Increase visibility with provider usage metrics. Data publishers can view and monitor usage for shared assets such as the number of jobs executed, total bytes scanned, and subscribers for each organization.
  • Collaborate on sensitive data with data clean rooms. Data clean rooms provide a security-enhanced environment in which multiple parties can share, join, and analyze their data assets without moving or revealing the underlying data.
  • Built on BigQuery. You can build on the scalability and massive processing capabilities in BigQuery, which allows for large-scale collaboration.

Next steps for sharing

The following table outlines next steps that you can take to learn more about sharing features:

Experience level | Learning path
New cloud users
  • Learn how to create and manage exchanges and listings to start sharing within or outside of your organization.
Experienced cloud users

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-16 UTC.