Sync Dataproc Metastore to Data Catalog

Caution: Data Catalog is deprecated in favor of Dataplex Universal Catalog. Dataplex Universal Catalog is also integrated with Dataproc Metastore, offering similar capabilities. You can use Dataplex Universal Catalog to enrich your data with aspects, which are the equivalent of Data Catalog tags. For more information, see Manage aspects and enrich metadata.

This document shows you how to sync Dataproc Metastore metadata with Data Catalog.

After you sync the two services, you can use Data Catalog to manage your Dataproc Metastore metadata. For example, by using Data Catalog, you can tag and search for specific Dataproc Metastore resources, such as databases and tables.

What is Data Catalog

Data Catalog is a fully managed, scalable metadata management service. It provides a unified view and tagging mechanisms for technical and business metadata.

For more information, see the Data Catalog feature guides.

Before you begin

Required roles

To get the permissions that you need to sync Dataproc Metastore metadata with Data Catalog, ask your administrator to grant you the View synced Dataproc Metastore entries in Data Catalog (roles/metastore.metadataViewer) IAM role on your project, based on the principle of least privilege. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to sync Dataproc Metastore metadata with Data Catalog. The exact permissions are listed in the following Required permissions section.

Required permissions

The following permissions are required to sync Dataproc Metastore metadata with Data Catalog:

  • To get Dataproc Metastore databases: metastore.databases.get
  • To list Dataproc Metastore databases: metastore.databases.list
  • To get Dataproc Metastore tables: metastore.tables.get
  • To list Dataproc Metastore tables: metastore.tables.list
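As a rough illustration, the following sketch checks whether a set of permissions held by a principal covers the four permissions listed above. The function name and the way the granted set is obtained are hypothetical; this is a local comparison only and does not call IAM.

```python
# The four permissions listed in this section.
REQUIRED_PERMISSIONS = frozenset({
    "metastore.databases.get",
    "metastore.databases.list",
    "metastore.tables.get",
    "metastore.tables.list",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted.

    `granted` is any iterable of permission strings (hypothetical input;
    how you collect it is up to you).
    """
    return sorted(REQUIRED_PERMISSIONS - set(granted))
```

An empty result means the principal holds every permission in the list; otherwise the result names what is still missing.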

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about specific Dataproc Metastore roles and permissions, see Manage access with IAM.

How permissions work between the services

Data Catalog abides by Dataproc Metastore-level permissions. For metadata that is synced from Dataproc Metastore to Data Catalog, IAM permissions specified in Dataproc Metastore apply to the metadata in Data Catalog as well.

Data Catalog checks the permissions for each metastore database and table at the time of access, so only users with access to the Dataproc Metastore service can see the synced service resources as entries in Data Catalog.

How Data Catalog sync works with Dataproc Metastore

You can enable Dataproc Metastore to Data Catalog sync when you create or update a Dataproc Metastore service using the Google Cloud console. You can disable the sync in the same way.

After enabling Data Catalog sync, database and table metadata are automatically synced from Dataproc Metastore to Data Catalog.

Note: It can take up to 6 hours before the metadata in Dataproc Metastore is fully ingested into Data Catalog.
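To make the ingestion window concrete, the following minimal sketch computes the latest time by which a metadata change should appear in Data Catalog, assuming the 6-hour figure in the note is an upper bound. The constant and function names are hypothetical, and the result is an estimate rather than a guarantee from either service.

```python
from datetime import datetime, timedelta, timezone

# Upper bound taken from the note above (assumed to be a maximum delay).
MAX_SYNC_DELAY = timedelta(hours=6)


def latest_expected_ingestion(change_time):
    """Return the latest time by which a change made at `change_time`
    should be visible in Data Catalog, under the 6-hour assumption."""
    return change_time + MAX_SYNC_DELAY


def may_still_be_syncing(change_time, now=None):
    """Return True if `now` is still inside the assumed sync window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < latest_expected_ingestion(change_time)
```

A check like this can help decide whether a missing entry is worth investigating or is likely still in the ingestion window.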

Data Catalog syncs the following metadata:

  • Instances.
  • Databases, including name and description.
  • Tables, including name, description, and schema (columns with descriptions).

The following table shows the resource mapping between Dataproc Metastore and Data Catalog:

Dataproc Metastore resource    Data Catalog resource
Instance                       Entry group and Entry
Database                       Entry
Table                          Entry
Column                         Schema
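
The mapping above can be expressed as a small lookup table, which may be useful when writing tooling that reasons about synced resources. This is a sketch only: the names are hypothetical, and it reads the Instance row as mapping to both an entry group and an entry.

```python
# Resource mapping from the table above.
# Keys are Dataproc Metastore resources; values are Data Catalog resources.
RESOURCE_MAPPING = {
    "instance": ("entry group", "entry"),
    "database": ("entry",),
    "table": ("entry",),
    "column": ("schema",),
}


def data_catalog_resources(metastore_resource):
    """Return the Data Catalog resource kinds for a Dataproc Metastore resource.

    Raises KeyError for resource kinds that are not in the table.
    """
    return RESOURCE_MAPPING[metastore_resource.strip().lower()]
```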

Considerations

Create a service with Data Catalog sync enabled

Data Catalog sync is disabled by default.

To enable Data Catalog sync for a new service, use the following instructions.

Console

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Go to Dataproc Metastore

  2. At the top of the Dataproc Metastore page, click Create.

    The Create service page opens.

  3. Select the version of Dataproc Metastore that you want to use.

  4. Under Metadata integration, click Data Catalog sync.

  5. For the remaining service configuration options, use the provided defaults or configure your service as needed.

  6. Click Submit.

Enable or disable Data Catalog sync for an existing service

To enable or disable Data Catalog sync for an existing service, use the following instructions.

Console

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Go to Dataproc Metastore

  2. On the Dataproc Metastore page, click the service that you want to update.

    The Service detail page for that service opens.

  3. Under the Configuration tab, click Edit.

    The Edit service page opens.

  4. Under Metadata integration, toggle Data Catalog sync on or off.

  5. Click Submit.

Search with Data Catalog

You can search synced Dataproc Metastore metadata using Data Catalog.

Although there are no custom search options for Dataproc Metastore, there are multiple ways to search for different Dataproc Metastore resources, including the following:

  • Dataproc Metastore instance
    • By display name
    • Standard Data Catalog functions — for example, by using tags.
  • Database
    • By display name
    • By description
    • By Dataproc Metastore instance
    • Standard Data Catalog functions — for example, by using tags.
  • Table
    • By display name
    • By description
    • By column name
    • By column description
    • By database
    • By Dataproc Metastore instance
    • Standard Data Catalog functions — for example, by using tags.
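
The search options above can be combined into a single query string. The following sketch builds one from optional filters. The qualifier names (`system=`, `type=`, `displayname:`, `description:`, `column:`) and the `dataproc_metastore` system value are assumptions based on Data Catalog search syntax and should be verified against the search syntax reference. The function name is hypothetical, and values that contain spaces are not handled.

```python
def build_search_query(resource_type=None, display_name=None,
                       description=None, column=None):
    """Build a Data Catalog search query limited to Dataproc Metastore entries.

    All qualifier names are assumptions (see the lead-in); each argument is
    optional and is added to the query only when provided.
    """
    # Restrict results to entries synced from Dataproc Metastore (assumed value).
    terms = ["system=dataproc_metastore"]
    if resource_type:
        terms.append(f"type={resource_type}")
    if display_name:
        terms.append(f"displayname:{display_name}")
    if description:
        terms.append(f"description:{description}")
    if column:
        terms.append(f"column:{column}")
    # Space-separated terms are treated here as a logical AND (assumption).
    return " ".join(terms)
```

For example, `build_search_query(resource_type="table", column="customer_id")` yields a string that could be passed as the query of a Data Catalog search request to look for synced tables with a matching column name.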


Last updated 2025-12-15 UTC.