Create a Dataproc Metastore service

This page shows you how to create a Dataproc Metastore service.

After you create your Dataproc Metastore service, you can import metadata and connect to any of the following services:

After you connect one of these services, it uses your Dataproc Metastore service as its Hive metastore during query execution.

Note: It can take over 20 minutes to create a Dataproc Metastore service. Creating additional services in the same project, region, and VPC network can take less than 10 minutes.

Before you begin

Required Roles

To get the permission that you need to create a Dataproc Metastore service, ask your administrator to grant you the following IAM roles on your project, based on the principle of least privilege:

For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the metastore.services.create permission, which is required to create a Dataproc Metastore service.

You might also be able to get this permission with custom roles or other predefined roles.

For more information about specific Dataproc Metastore roles and permissions, see Manage access with Identity and Access Management (IAM).

Create Dataproc Metastore using default settings

Creating a Dataproc Metastore service using the default settings configures your service with an enterprise tier, a medium instance size, the latest version of the Hive metastore, a Thrift endpoint, and a data location of us-central1.

Dataproc Metastore 2

The following instructions show you how to create a Dataproc Metastore 2 service using a Thrift endpoint and other provided default settings.

Console

  1. In the Google Cloud console, go to the Dataproc Metastore page.

    Go to Dataproc Metastore

  2. In the navigation bar, click +Create.

    The Create metastore service dialog opens.

  3. Select Dataproc Metastore 2.

  4. In the Pricing and Capacity section, choose an instance size.

    For more information, see pricing plans and scaling configurations.

  5. In the Service name field, enter a unique name for your service.

    For information on naming conventions, see Resource naming convention.

  6. Select the Data location.

    For more information about selecting the appropriate region, see Available regions and zones and Regional endpoint.

  7. For the remaining service configuration options, use the provided defaults.

  8. To create and start the service, click Submit.

    Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a few minutes.

gcloud CLI

    To create a Dataproc Metastore 2 service using the provided defaults, run the following gcloud metastore services create command. Keep only one of the --instance-size and --scaling-factor flags:

    gcloud metastore services create SERVICE \
      --location=LOCATION \
      --instance-size=INSTANCE_SIZE \
      --scaling-factor=SCALING_FACTOR

    Replace the following:

    • SERVICE: The name of your new Dataproc Metastore service.

      For information on naming conventions, see Resource naming convention.

    • LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.

    • INSTANCE_SIZE: The instance size of your Dataproc Metastore. For example, small, medium, or large. If you specify a value for INSTANCE_SIZE, don't specify a value for SCALING_FACTOR.

    • SCALING_FACTOR: The scaling factor of your Dataproc Metastore service. For example, 0.1. If you specify a value for SCALING_FACTOR, don't specify a value for INSTANCE_SIZE.
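Because --instance-size and --scaling-factor are mutually exclusive, a small wrapper can catch the mistake before gcloud runs. The following is a minimal sketch, assuming bash; the function name ms2_create and the DRY_RUN variable are hypothetical, and with DRY_RUN=1 the function only prints the command it would run.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `gcloud metastore services create` for a
# Dataproc Metastore 2 service. It refuses to pass both --instance-size and
# --scaling-factor. With DRY_RUN=1 it prints the command instead of running it.
ms2_create() {
  local service=$1 location=$2 instance_size=${3:-} scaling_factor=${4:-}

  if [[ -n $instance_size && -n $scaling_factor ]]; then
    echo "error: set INSTANCE_SIZE or SCALING_FACTOR, not both" >&2
    return 1
  fi

  local -a cmd=(gcloud metastore services create "$service" "--location=$location")
  if [[ -n $instance_size ]]; then
    cmd+=("--instance-size=$instance_size")
  fi
  if [[ -n $scaling_factor ]]; then
    cmd+=("--scaling-factor=$scaling_factor")
  fi

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Examples (printed only, nothing is created):
DRY_RUN=1 ms2_create my-metastore us-central1 medium
DRY_RUN=1 ms2_create my-metastore us-central1 "" 0.1
```

Passing an empty third argument selects the scaling-factor form; omitting both values leaves sizing to the service defaults.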

REST

Follow the API instructions to create a service by using the APIs Explorer.

Dataproc Metastore 1

The following instructions show you how to create a Dataproc Metastore 1 service using a Thrift endpoint and other provided default settings.

Console

  1. In the Google Cloud console, go to the Dataproc Metastore page.

    Go to Dataproc Metastore

  2. In the navigation bar, click +Create.

    The Create metastore service dialog opens.

  3. Select Dataproc Metastore 1.

  4. In the Service name field, enter a unique name for your service.

    For information on naming conventions, see Resource naming convention.

  5. Select the Data location.

    For more information about selecting the appropriate region, see Available regions and zones and Regional endpoint.

  6. For the remaining service configuration options, use the provided defaults.

  7. To create and start the service, click Submit.

    Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a couple of minutes.

gcloud CLI

    To create a basic metastore service using the provided defaults, run the following gcloud metastore services create command:

    gcloud metastore services create SERVICE \
      --location=LOCATION

    Replace the following:

    • SERVICE: The name of your new Dataproc Metastore service.

      For information on naming conventions, see Resource naming convention.

    • LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
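Before running the command, you might want to check the SERVICE value locally. The sketch below assumes bash and encodes the service ID rule as we understand it from the Dataproc Metastore v1 REST reference (2 to 63 characters, starting with a letter, ending with a letter or digit, using only ASCII letters, digits, and hyphens); confirm the rule against Resource naming convention. The function name valid_service_name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the SERVICE value. The rule encoded here is
# an assumption (2-63 characters, starts with a letter, ends with a letter or
# digit, only ASCII letters, digits, and hyphens); confirm it against
# "Resource naming convention" before relying on it.
valid_service_name() {
  local re='^[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]$'
  [[ $1 =~ $re ]]
}

for name in my-metastore metastore2 1metastore metastore- m; do
  if valid_service_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: invalid"
  fi
done
```

For example, valid_service_name "$SERVICE" && gcloud metastore services create "$SERVICE" --location="$LOCATION" runs the create command only when the name passes the check.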

REST

Follow the API instructions to create a service by using the APIs Explorer.

Create Dataproc Metastore using advanced settings

This section shows you how to create a Dataproc Metastore service with advanced settings, such as network configuration, scaling settings, endpoint settings, security settings, and optional features.

Dataproc Metastore 2 or 1

The following instructions show you how to create a Dataproc Metastore 2 or a Dataproc Metastore 1 service using advanced settings.

Note: If your Hive warehouse directory is already on Cloud Storage, then you should set a metastore configuration override when you create your Dataproc Metastore service.

Console

Get started

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore

  2. In the navigation menu, click +Create.

    The Create metastore service dialog opens.

  3. Select the metastore version that you want to use, Dataproc Metastore 1 or Dataproc Metastore 2.

    Service info


    1. Optional: For Dataproc Metastore 2, in the Pricing and Capacity section, choose an instance size.

      For more information, see pricing plans and scaling configurations.

    2. In the Service name field, enter a unique name for your service.

      For information on naming conventions, see Resource naming convention.

    3. Select the Data location.

      For more information about selecting the appropriate region, see Available regions and zones and Regional endpoint.

    4. Select the Hive Metastore version.

      If this value is not modified, your service uses the latest supported version of Hive (currently version 3.1.2).

      For more information about selecting the correct version, see Version policy.

    5. Select the Release channel.

      If this value is not modified, your metastore uses the Stable value. For more information, see Release channel.

    6. Enter the TCP Port.

      This is the TCP port that your Thrift endpoint connects to. If this value is not modified, port number 9083 is used. If you change your endpoint to gRPC, this value automatically changes to 443 and can't be changed.

    7. Optional: For Dataproc Metastore 1, select the Service tier.

      The service tier influences the capacity of your service. For more information, see Service Tier.

    Endpoint protocol

    • Optional: Choose an endpoint protocol.

      The default selected option is Apache Thrift. For more information about the different endpoints, see Choose the endpoint protocol.

    Network configuration

    1. Select a Network Configuration.

      By default, your services can be made accessible in multiple VPC networks. You can specify up to five subnetworks.

      Change your network settings to complete the following actions:

      • Connect your Dataproc Metastore service to Dataproc Metastore services in other projects.
      • Use your Dataproc Metastore service with other Google Cloud services, such as Dataproc clusters.
      Note: If you use a gRPC endpoint, these settings are disabled.
    2. Optional: Click Use shared VPC network and enter the Project ID and VPC subnetwork name.

    3. Optional: Click Expose service in 1 VPC network and select the network or shared VPC network where you want to make the service available.

    4. Optional: Click Make services accessible in multiple VPC subnetworks and select the subnetworks. You can specify up to five subnetworks.

    5. Click Done.

    Metadata integration

    Maintenance window

    • Optional: Select the Day of week and Hour of day for your maintenance window.

      For more information, see Maintenance windows.

    Security

    1. Optional: Enable Kerberos.

      Note: To enable Kerberos, you need a keytab file. This file contains pairs of Kerberos principals and encrypted keys. A keytab file must contain the entry for the service principal created for this Hive metastore. For more information, see Configure Kerberos for a service.
      1. To enable Kerberos, click the toggle.
      2. Select or enter your secret resource ID.
      3. Either choose to use the latest secret version or select an older one to use.
      4. Enter the Kerberos principal.

        This is the principal allocated for this Dataproc Metastore service.

      5. Browse to the krb5 config file.

    2. Optional: Choose an encryption type.

      • The default selected option is Google-managed encryption key.

      • To select a customer-managed key, click Use a customer-managed encryption key (CMEK).

        For more information, see Using customer-managed encryption keys.

    Metastore config overrides

    Auxiliary version config

    • Optional: To add an auxiliary version config, click Enable.

      For more information, see Auxiliary versions.

    Database type

    • Optional: Choose a database type.

      For Database type, select MySQL or Spanner. MySQL is the default database type.

      For more information about choosing a specific database type, see Database types.

    Labels

    • Optional: To add or remove optional labels that describe your metadata, click + Add Labels.

Start the service

To create and start the service, click Submit.

Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a few minutes.

gcloud CLI

  1. To create a metastore, run the following gcloud metastore services create command. Include only the optional flags that you need:

    gcloud metastore services create SERVICE \
      --location=LOCATION \
      --instance-size=INSTANCE_SIZE \
      --scaling-factor=SCALING_FACTOR \
      --port=PORT \
      --tier=TIER \
      --endpoint-protocol=ENDPOINT_PROTOCOL \
      --database-type=DATABASE_TYPE \
      --hive-metastore-version=HIVE_METASTORE_VERSION \
      --data-catalog-sync=DATA_CATALOG_SYNC \
      --release-channel=RELEASE_CHANNEL \
      --hive-metastore-configs=METADATA_OVERRIDE \
      --labels=LABELS \
      --auxiliary-versions=AUXILIARY_VERSION \
      --network=NETWORK \
      --consumer-subnetworks="projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1,projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET2" \
      --kerberos-principal=KERBEROS_PRINCIPAL \
      --krb5-config=KRB5_CONFIG \
      --keytab=CLOUD_SECRET \
      --encryption-kms-key=KMS_KEY

    Replace the following:

    Service settings:

    • SERVICE: The name of your new Dataproc Metastore service.
    • LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
    • PORT: Optional: The TCP port that your Thrift endpoint uses. If not set, port 9083 is used. If you choose to use a gRPC endpoint, your port number automatically changes to 443.
    • TIER: Optional for Dataproc Metastore 1: The service tier of your new service. If not set, the Developer value is used.
    • ENDPOINT_PROTOCOL: Optional: The endpoint protocol for your service.
    • DATABASE_TYPE: Optional: The database type for your service. For more information about choosing a specific database type, see Database types.
    • DATA_CATALOG_SYNC: Optional: Enables the Data Catalog sync feature.
    • HIVE_METASTORE_VERSION: Optional: The Hive metastore version you want to use with your service. For example, 3.1.2. If not set, the latest version of Hive is used.
    • RELEASE_CHANNEL: Optional: The release channel of the service. If not set, the Stable value is used.
    • METADATA_OVERRIDE: Optional: The Hive metastore override configs you want to apply to your service. Use a comma-separated list in the following format: k1=v1,k2=v2,k3=v3.
    • LABELS: Optional: Key-value pairs that add metadata to your service. Use a comma-separated list in the following format: k1=v1,k2=v2,k3=v3.
    • AUXILIARY_VERSION: Optional: Enable auxiliary versions. For more information, see Auxiliary versions.

    Scaling settings:

    • INSTANCE_SIZE: Optional for Dataproc Metastore 2: The instance size of your Dataproc Metastore. For example, small, medium, or large. If you specify a value for INSTANCE_SIZE, don't specify a value for SCALING_FACTOR.
    • SCALING_FACTOR: Optional for Dataproc Metastore 2: The scaling factor of your Dataproc Metastore service. For example, 0.1. If you specify a value for SCALING_FACTOR, don't specify a value for INSTANCE_SIZE.

    Network settings:

    • NETWORK: The name of the VPC network that you're connecting to your service. If not set, the default value is used.

      If you use a VPC network that belongs to a different project than your service, you must provide the entire relative resource name. For example, projects/HOST_PROJECT/global/networks/NETWORK_ID.

    • SUBNET1, SUBNET2: Optional: A list of subnetworks that can access your service. You can use the ID, fully qualified URL, or relative name of the subnetwork. You can specify up to five subnetworks.

      Note: If you specified a value in --network, then you can't use this parameter.

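    Kerberos settings:

    • KERBEROS_PRINCIPAL: Optional: The Kerberos principal allocated for this Dataproc Metastore service.
    • KRB5_CONFIG: Optional: The Cloud Storage path of your krb5 config file.
    • CLOUD_SECRET: Optional: The Secret Manager secret version that holds your keytab file.

    Encryption settings:

    • KMS_KEY: Optional: The customer-managed encryption key (CMEK) to use. If not set, a Google-managed encryption key is used.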

  2. Verify that the creation was successful.
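The full command lists every optional flag, but some pairs can't be combined (INSTANCE_SIZE with SCALING_FACTOR, and NETWORK with the consumer subnetworks). The following minimal sketch, assuming bash 4 or later, passes only the settings you actually provide. It's illustrative: the helper names and the MS_-prefixed environment variables (for example, MS_PORT for --port and MS_HIVE_METASTORE_CONFIGS for --hive-metastore-configs) are hypothetical, and DRY_RUN=1 prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the advanced create command (requires bash 4+).

# Expands short subnetwork names into the relative resource names used in the
# --consumer-subnetworks example and joins them with commas (no spaces).
# Enforces the documented maximum of five subnetworks.
consumer_subnetworks() {
  local project=$1 location=$2
  shift 2
  if (( $# < 1 || $# > 5 )); then
    echo "error: specify between 1 and 5 subnetworks" >&2
    return 1
  fi
  local -a names=()
  local subnet
  for subnet in "$@"; do
    names+=("projects/$project/regions/$location/subnetworks/$subnet")
  done
  local IFS=,
  echo "${names[*]}"
}

# Builds `gcloud metastore services create` from MS_* variables, adding a flag
# only when its variable is non-empty so that unset options keep their defaults.
advanced_create() {
  local service=$1 location=$2
  local -a cmd=(gcloud metastore services create "$service" "--location=$location")

  # Reject the combinations that the placeholder descriptions rule out.
  if [[ -n ${MS_INSTANCE_SIZE:-} && -n ${MS_SCALING_FACTOR:-} ]]; then
    echo "error: set MS_INSTANCE_SIZE or MS_SCALING_FACTOR, not both" >&2
    return 1
  fi
  if [[ -n ${MS_NETWORK:-} && -n ${MS_CONSUMER_SUBNETWORKS:-} ]]; then
    echo "error: set MS_NETWORK or MS_CONSUMER_SUBNETWORKS, not both" >&2
    return 1
  fi

  local flag var
  for flag in instance-size scaling-factor port tier endpoint-protocol \
      database-type hive-metastore-version release-channel \
      hive-metastore-configs labels network consumer-subnetworks; do
    var=MS_${flag//-/_}   # hive-metastore-configs -> MS_hive_metastore_configs
    var=${var^^}          # -> MS_HIVE_METASTORE_CONFIGS
    if [[ -n ${!var:-} ]]; then
      cmd+=("--$flag=${!var}")
    fi
  done

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: print (don't run) a command that uses two consumer subnetworks.
MS_CONSUMER_SUBNETWORKS=$(consumer_subnetworks my-project us-central1 subnet-a subnet-b) \
  MS_LABELS=env=dev DRY_RUN=1 advanced_create my-metastore us-central1
```

Building the command as an array rather than a string keeps values such as comma-separated lists intact when gcloud receives them.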

REST

Follow the API instructions to create a service by using the APIs Explorer.

Set a Hive metastore config override for Dataproc Metastore

If your Apache Hive warehouse directory is on Cloud Storage, you should set a metastore config override. This override sets your custom data warehouse as the default warehouse directory for your Dataproc Metastore service.

Before you set this override, make sure that your Dataproc Metastore service has object read and write permissions to access the warehouse directory. For more information, see Hive warehouse directory.

The following instructions show you how to set a Hive metastore config override for a new Dataproc Metastore service.

Console

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore

  2. In the navigation bar, click +Create.

  3. In the Metastore config overrides section, enter the following values:

    • Key: hive.metastore.warehouse.dir
    • Value: The Cloud Storage location of your warehouse directory. For example: gs://my-bucket/path/to/location.
  4. Configure the remaining service options as necessary, or use the provided defaults.

  5. Click Submit.

    Return to the Dataproc Metastore page, and verify that your service was successfully created.

gcloud CLI

  1. To create a Dataproc Metastore service with a Hive override, run the following gcloud metastore services create command:

    gcloud metastore services create SERVICE \
      --location=LOCATION \
      --hive-metastore-configs="hive.metastore.warehouse.dir=CUSTOMER_DIR"

    Replace the following:

    • SERVICE: The name of your new Dataproc Metastore service.
    • LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
    • CUSTOMER_DIR: The Cloud Storage location of your warehouse directory. For example: gs://my-bucket/path/to/location.
  2. Verify that the creation was successful.
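Because creating a service can take over 20 minutes, a script that depends on the new service may need to wait for it. The following sketch polls the service until it reports ACTIVE. It assumes bash, that gcloud metastore services describe can print the state with --format='value(state)', and that the state names include CREATING, ACTIVE, and ERROR as in the API's Service.State enum; the function names and the default interval and attempt count are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper. Assumes `describe` prints the state with
# --format='value(state)' and that the states include CREATING, ACTIVE, and
# ERROR (as in the API's Service.State enum).
service_state() {
  gcloud metastore services describe "$1" --location="$2" --format='value(state)'
}

# Usage: wait_until_active SERVICE LOCATION [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Defaults: poll every 60 seconds, up to 40 times (about 40 minutes).
wait_until_active() {
  local service=$1 location=$2 interval=${3:-60} max_attempts=${4:-40}
  local attempt state
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    state=$(service_state "$service" "$location") || return 1
    case $state in
      ACTIVE)
        echo "$service is ACTIVE"
        return 0 ;;
      ERROR)
        echo "$service is in state ERROR" >&2
        return 1 ;;
      *)
        echo "attempt $attempt/$max_attempts: state=$state" >&2 ;;
    esac
    sleep "$interval"
  done
  echo "timed out waiting for $service to become ACTIVE" >&2
  return 1
}
```

For example, wait_until_active my-metastore us-central1 returns 0 once the service is ACTIVE and a non-zero status on ERROR or timeout, so it can gate later steps with &&.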

Create Dataproc Metastore with autoscaling

Dataproc Metastore 2 supports autoscaling. If you turn on autoscaling, you can set a minimum scaling factor and a maximum scaling factor. After these limits are set, your service automatically increases or decreases its scaling factor as needed to run your workloads.

Autoscaling considerations

  • Autoscaling and scaling factors are mutually exclusive options. For example, if you turn on autoscaling, you can't manually set a scaling factor or size.
  • Autoscaling is only available for single-region Dataproc Metastore instances.
  • When autoscaling is enabled, existing scaling factor settings are cleared.
  • When autoscaling is disabled:
    • Existing autoscaling settings are cleared.
    • The scaling factor is set to the last autoscaling_factor that was configured on the service.
  • The minimum and maximum autoscaling factors are optional. If not set, the default values are 0.1 and 6, respectively.

Choose one of the following tabs to learn how to create a Dataproc Metastore 2 service with autoscaling enabled.

Console

  1. In the Google Cloud console, go to the Dataproc Metastore page.

    Go to Dataproc Metastore

  2. In the navigation bar, click +Create.

    The Create metastore service dialog opens.

  3. Select Dataproc Metastore 2.

  4. In the Pricing and Capacity section, select Enterprise - Single region.

  5. Under Instance Size, click Enable autoscaling.

  6. Under Instance Size, use the slider to choose a minimum and maximum instance size.

  7. To create and start the service, click Submit.

    Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a few minutes.

REST

Note: The following command assumes that you've logged in to the gcloud CLI with your user account. You can sign in by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the active account by running gcloud auth list.
curl -X POST -s -i \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"scaling_config":{"autoscaling_config":{"autoscaling_enabled":true,"limit_config":{"max_scaling_factor":MAX_SCALING_FACTOR,"min_scaling_factor":MIN_SCALING_FACTOR}}}}' \
  "https://metastore.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/services?service_id=SERVICE_ID"

Replace the following:

  • PROJECT_ID: The ID of the project in which to create the service.
  • SERVICE_ID: The name of your new Dataproc Metastore service.
  • MIN_SCALING_FACTOR: Optional: The minimum scaling factor to use in your autoscaling configuration. If this value is not specified, a default value of 0.1 is used.
  • MAX_SCALING_FACTOR: Optional: The maximum scaling factor to use in your autoscaling configuration. If this value is not specified, a default value of 6 is used.
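Instead of editing the JSON inline, you can generate the request body. This minimal sketch assumes bash and awk; the function name autoscaling_body is hypothetical. It fills in the documented defaults of 0.1 and 6 when you omit a value and, as a sanity check of our own rather than a documented API rule, rejects a non-positive minimum or a minimum greater than the maximum.

```bash
#!/usr/bin/env bash
# Hypothetical helper that prints the JSON body for the autoscaling request.
# Omitted limits fall back to the documented defaults (0.1 and 6). The
# 0 < min <= max check is our own sanity check, not a documented API rule.
autoscaling_body() {
  local min=${1:-0.1} max=${2:-6}
  # Bash arithmetic is integer-only, so compare the decimal values in awk.
  if ! awk -v lo="$min" -v hi="$max" 'BEGIN { exit !(lo + 0 > 0 && lo + 0 <= hi + 0) }'; then
    echo "error: expected 0 < min ($min) <= max ($max)" >&2
    return 1
  fi
  printf '{"scaling_config":{"autoscaling_config":{"autoscaling_enabled":true,"limit_config":{"max_scaling_factor":%s,"min_scaling_factor":%s}}}}\n' \
    "$max" "$min"
}

autoscaling_body          # documented defaults
autoscaling_body 0.5 3    # explicit limits
```

To use it, replace the inline JSON in the curl command with -d "$(autoscaling_body 0.5 3)".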

Create a Dataproc Metastore service using Shared VPC

A Shared VPC lets you connect Dataproc Metastore resources from multiple projects to a common VPC network.

To create a Dataproc Metastore service configured with a Shared VPC, see Create a service using advanced settings.

Considerations

  • VPC networks are not relevant for Dataproc Metastore services configured with the gRPC endpoint protocol.

  • For Dataproc Metastore services configured with the Thrift endpoint protocol, make sure your Dataproc Metastore service and the Dataproc cluster it's attached to are using the same Shared VPC network.

  • For Dataproc Metastore services configured with the Thrift endpoint protocol and Private Service Connect, make sure that you use subnetworks from the Shared VPC network.

IAM roles required for Shared VPC networks

To create a Dataproc Metastore service with a VPC that is accessible in a network belonging to a different project, you must grant roles/metastore.serviceAgent to the service project's Dataproc Metastore service agent (service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) in the network project's IAM policy.

Caution: This IAM policy change lets Dataproc Metastore users with the metastore.services.create permission in the service project indirectly create addresses and peerings in the network project.
gcloud projects add-iam-policy-binding NETWORK_PROJECT_ID \
  --role="roles/metastore.serviceAgent" \
  --member="serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com"
Note: If you have never created a Dataproc Metastore service in the service project, then the gcloud projects add-iam-policy-binding command might fail with an error message containing Service account [SERVICE_ACCOUNT_NAME] does not exist. If this happens, you can resolve the issue by attempting to create a service using a non-existent network in the service project. The service creation fails, but it triggers the creation of the service account. Afterwards, the command should succeed.
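The service agent's email address is built from the service project's number, not its ID, which is an easy detail to get wrong. This sketch, assuming bash and an authenticated gcloud CLI, looks up the number with gcloud projects describe and then runs the binding command shown above; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Shared VPC IAM binding.

# Formats the IAM member for the service project's Dataproc Metastore service
# agent. The argument must be the numeric project number, not the project ID.
metastore_agent_member() {
  local project_number=$1
  local re='^[0-9]+$'
  if [[ ! $project_number =~ $re ]]; then
    echo "error: expected a numeric project number, got '$project_number'" >&2
    return 1
  fi
  echo "serviceAccount:service-${project_number}@gcp-sa-metastore.iam.gserviceaccount.com"
}

# Looks up the service project's number, then grants roles/metastore.serviceAgent
# to its service agent in the network project's IAM policy.
grant_metastore_agent() {
  local network_project_id=$1 service_project_id=$2
  local number member
  number=$(gcloud projects describe "$service_project_id" --format='value(projectNumber)') || return 1
  member=$(metastore_agent_member "$number") || return 1
  gcloud projects add-iam-policy-binding "$network_project_id" \
    --role="roles/metastore.serviceAgent" \
    --member="$member"
}
```

For example, grant_metastore_agent NETWORK_PROJECT_ID SERVICE_PROJECT_ID performs the lookup and the binding in one step.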

Troubleshoot common issues

Some common issues include the following:

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.