Create a Dataproc Metastore service
This page shows you how to create a Dataproc Metastore service.
After you create your Dataproc Metastore service, you can import metadata and connect to any of the following services:
- A self-managed Apache Hive instance, Apache Spark instance, or a Presto cluster.
After you connect one of these services, it uses your Dataproc Metastore service as its Hive metastore during query execution.
Note: It can take over 20 minutes to create a Dataproc Metastore service. Creating additional services in the same project, region, and VPC network can take less than 10 minutes.
Before you begin
- Understand the differences between a Dataproc Metastore 1 service and a Dataproc Metastore 2 service.
- Enable Dataproc Metastore in your project.
- Understand networking requirements specific to your project.
Required Roles
To get the permission that you need to create a Dataproc Metastore, ask your administrator to grant you the following IAM roles on your project, based on the principle of least privilege:
- Grant full control of Dataproc Metastore resources (roles/metastore.editor)
- Grant full access to all Dataproc Metastore resources, including IAM policy administration (roles/metastore.admin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the metastore.services.create permission, which is required to create a Dataproc Metastore.
You might also be able to get this permission with custom roles or other predefined roles.
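As an illustration of the grant described above, the following sketch prints (but doesn't run) a gcloud projects add-iam-policy-binding command for one of the two roles listed on this page. The function name grant_role_cmd, the project ID my-project, and the member user:alex@example.com are assumptions made for this example.

```shell
# Hypothetical helper: prints the command an administrator could run to
# grant a Dataproc Metastore role on a project. Nothing is executed.
grant_role_cmd() (
  project_id=$1                      # assumed placeholder project ID
  member=$2                          # assumed placeholder, e.g. user:EMAIL
  role=${3:-roles/metastore.editor}  # defaults to the editor role

  # Only the two roles named on this page are accepted.
  case $role in
    roles/metastore.editor|roles/metastore.admin) ;;
    *) echo "unexpected role: $role" >&2; exit 1 ;;
  esac

  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
    "$project_id" "$member" "$role"
)

grant_role_cmd my-project user:alex@example.com
```

Printing the command first makes it easy to review the role and member before anyone with the right permissions runs it.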
For more information about specific Dataproc Metastore roles and permissions, see Manage access with Identity and Access Management (IAM).
Create Dataproc Metastore using default settings
Creating a Dataproc Metastore using the default settings configures your service with an enterprise tier, a medium instance size, the latest version of the Hive metastore, a Thrift endpoint, and a data location of us-central1.
Dataproc Metastore 2
The following instructions show you how to create a Dataproc Metastore 2 using a Thrift endpoint and other provided default settings.
Console
In the Google Cloud console, go to the Dataproc Metastore page.
In the navigation bar, click +Create.
The Create metastore service dialog opens.
Select Dataproc Metastore 2.
In the Pricing and Capacity section, choose an instance size.
For more information, see pricing plans and scaling configurations.
In the Service name field, enter a unique name for your service.
For information on naming conventions, see Resource naming convention.
Select the Data location.
For more information about selecting the appropriate region, see Available regions and zones and Regional endpoint.
For the remaining service configuration options, use the provided defaults.
To create and start the service, click Submit.
Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a few minutes.
gcloud CLI
To create a Dataproc Metastore 2 service using the provided defaults, run the following gcloud metastore services create command:
    gcloud metastore services create SERVICE \
      --location=LOCATION \
      --instance-size=INSTANCE_SIZE \
      --scaling-factor=SCALING_FACTOR
Replace the following:
- SERVICE: The name of your new Dataproc Metastore service. For information on naming conventions, see Resource naming convention.
- LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
- INSTANCE_SIZE: The instance size of your Dataproc Metastore. For example, small, medium, or large. If you specify a value for INSTANCE_SIZE, don't specify a value for SCALING_FACTOR.
- SCALING_FACTOR: The scaling factor of your Dataproc Metastore service. For example, 0.1. If you specify a value for SCALING_FACTOR, don't specify a value for INSTANCE_SIZE.
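Because INSTANCE_SIZE and SCALING_FACTOR can't be combined, it can help to assemble the command so that only one capacity flag is ever included. The following is a minimal sketch of that idea. The function name create_cmd and the sample service name are assumptions for this example, and the function only prints the command rather than running it.

```shell
# Hypothetical helper: prints a create command with at most one of
# --instance-size or --scaling-factor. Nothing is executed.
create_cmd() (
  service=$1
  location=$2
  instance_size=$3    # pass "" to omit
  scaling_factor=$4   # pass "" to omit

  # The two capacity settings are mutually exclusive.
  if [ -n "$instance_size" ] && [ -n "$scaling_factor" ]; then
    echo "specify INSTANCE_SIZE or SCALING_FACTOR, not both" >&2
    exit 1
  fi

  cmd="gcloud metastore services create $service --location=$location"
  if [ -n "$instance_size" ]; then
    cmd="$cmd --instance-size=$instance_size"
  fi
  if [ -n "$scaling_factor" ]; then
    cmd="$cmd --scaling-factor=$scaling_factor"
  fi
  printf '%s\n' "$cmd"
)

create_cmd my-service us-central1 medium ""
create_cmd my-service us-central1 "" 0.1
```

Passing both a size and a factor makes the helper fail before any command is printed.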
REST
Follow the API instructions to create a service by using the APIs Explorer.
Dataproc Metastore 1
The following instructions show you how to create a Dataproc Metastore 1 using a Thrift endpoint and other provided default settings.
Console
In the Google Cloud console, go to the Dataproc Metastore page.
In the navigation bar, click +Create.
The Create metastore service dialog opens.
Select Dataproc Metastore 1.
In the Service name field, enter a unique name for your service.
For information on naming conventions, see Resource naming convention.
Select the Data location.
For more information about selecting the appropriate region, see Available regions and zones and Regional endpoint.
For the remaining service configuration options, use the provided defaults.
To create and start the service, click Submit.
Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a couple of minutes.
gcloud CLI
To create a basic metastore service using the provided defaults, run the following gcloud metastore services create command:
    gcloud metastore services create SERVICE \
      --location=LOCATION
Replace the following:
- SERVICE: The name of your new Dataproc Metastore service. For information on naming conventions, see Resource naming convention.
- LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
REST
Follow the API instructions to create a service by using the APIs Explorer.
Create Dataproc Metastore using advanced settings
Creating a Dataproc Metastore using the advanced settings lets you modify configurations such as network configurations, scaling settings, endpoint settings, security settings, and optional features.
Dataproc Metastore 2 or 1
The following instructions show you how to create a Dataproc Metastore 2 or a Dataproc Metastore 1 service using advanced settings.
Note: If your Hive warehouse directory is already on Cloud Storage, then you should set a metastore configuration override when you create your Dataproc Metastore service.
Console
Get started
In the Google Cloud console, open the Dataproc Metastore page.
In the navigation menu, click +Create.
The Create metastore service dialog opens.
Select the metastore version that you want to use, Dataproc Metastore 1 or Dataproc Metastore 2.
Service info

Figure: Example of the create service page.
Optional: For Dataproc Metastore 2, in the Pricing and Capacity section, choose an instance size.
For more information, see pricing plans and scaling configurations.
In the Service name field, enter a unique name for your service.
For information on naming conventions, see Resource naming convention.
Select the Data location.
For more information about selecting the appropriate region, see Available regions and zones and Regional endpoint.
Select the Hive Metastore version.
If this value is not modified, your service uses the latest supported version of Hive (currently version 3.1.2). For more information about selecting the correct version, see Version policy.
Select the Release channel.
If this value is not modified, your metastore uses the Stable value. For more information, see Release channel.
Enter the TCP Port.
The TCP port your Thrift endpoint connects to. If this value is not modified, port number 9083 is used. If you change your endpoint to gRPC, this value automatically changes to 443 and can't be changed.
Optional: For Dataproc Metastore 1, select the Service tier.
The service tier influences the capacity of your service. For more information, see Service Tier.
Endpoint protocol
Optional: Choose an endpoint protocol.
The default selected option is Apache Thrift. For more information about the different endpoints, see Choose the endpoint protocol.
Network configuration
Select a Network Configuration.
By default, your services can be made accessible in multiple VPC networks. You can specify up to five subnetworks.
Change your network settings to complete the following actions:
- Connect your Dataproc Metastore service to Dataproc Metastore services in other projects.
- Use your Dataproc Metastore service with other Google Cloud services, such as a Dataproc cluster.
Optional: Click Use shared VPC network and enter the Project ID and VPC subnetwork name.
Optional: Click Expose service in 1 VPC network and select the network or shared VPC network where you want to make the service available.
Optional: Click Make services accessible in multiple VPC subnetworks and select the subnetworks. You can specify up to five subnetworks.
Click Done.
Metadata integration
Optional: Enable Data Catalog sync.
For more information, see Dataproc Metastore to Data Catalog sync.
Maintenance window
Optional: Select the Day of week and Hour of day for your maintenance window.
For more information, see Maintenance windows.
Security
Optional: Enable Kerberos.
Note: To enable Kerberos, you need a keytab file. This file contains pairs of Kerberos principals and encrypted keys. A keytab file must contain the entry for the service principal created for this Hive metastore. For more information, see Configure Kerberos for a service.
- To enable Kerberos, click the toggle.
- Select or enter your secret resource ID.
- Either choose to use the latest secret version or select an older one to use.
Enter the Kerberos principal.
This is the principal allocated for this Dataproc Metastore service.
Browse to the krb5 config file.
Optional: Choose an encryption type.
The default selected option is Google-managed encryption key.
To select a customer-managed key, click Use a customer-managed encryption key (CMEK).
For more information, see Using customer-managed encryption keys.
Metastore config overrides
Optional: To apply a mapping to the Hive metastore, click + Add Overrides.
Note: To set a Hive metastore warehouse override, see Set a Hive warehouse override.
Auxiliary version config
Optional: To add an auxiliary version config, click Enable.
For more information, see Auxiliary versions.
Database type
Optional: Choose a database type.
For Database type, select MySQL or Spanner. MySQL is the default database type.
For more information about choosing a specific database type, see Database types.
Labels
- Optional: To add or remove optional labels that describe your metadata, click + Add Labels.
Start the service
To create and start the service, click Submit.
Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a few minutes.
gcloud CLI
To create a metastore, run the following gcloud metastore services create command:
    gcloud metastore services create SERVICE \
      --location=LOCATION \
      --instance-size=INSTANCE_SIZE \
      --scaling-factor=SCALING_FACTOR \
      --port=PORT \
      --tier=TIER \
      --endpoint-protocol=ENDPOINT_PROTOCOL \
      --database-type=DATABASE_TYPE \
      --hive-metastore-version=HIVE_METASTORE_VERSION \
      --data-catalog-sync=DATA_CATALOG_SYNC \
      --release-channel=RELEASE_CHANNEL \
      --hive-metastore-configs=METADATA_OVERRIDE \
      --labels=LABELS \
      --auxiliary-versions=AUXILIARY_VERSION \
      --network=NETWORK \
      --consumer-subnetworks="projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1,projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET2" \
      --kerberos-principal=KERBEROS_PRINCIPAL \
      --krb5-config=KRB5_CONFIG \
      --keytab=CLOUD_SECRET \
      --encryption-kms-key=KMS_KEY
Replace the following:
Service settings:
- SERVICE: The name of your new Dataproc Metastore service.
- LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
- PORT: Optional: The TCP port that your Thrift endpoint uses. If not set, port 9083 is used. If you choose to use a gRPC endpoint, your port number automatically changes to 443.
- TIER: Optional for Dataproc Metastore 1: The service tier of your new service. If not set, the Developer value is used.
- ENDPOINT_PROTOCOL: Optional: Choose the endpoint protocol for your service.
- DATABASE_TYPE: Optional: Choose the database type for your service. For more information about choosing a specific database type, see Database types.
- DATA_CATALOG_SYNC: Optional: Enable the Data Catalog sync feature.
- HIVE_METASTORE_VERSION: Optional: The Hive metastore version you want to use with your service. For example, 3.1.2. If not set, the latest version of Hive is used.
- RELEASE_CHANNEL: Optional: The release channel of the service. If not set, the Stable value is used.
- METADATA_OVERRIDE: Optional: The Hive metastore override configs you want to apply to your service. Use a comma-separated list in the following format: k1=v1,k2=v2,k3=v3.
- LABELS: Optional: Key-value pairs to add additional metadata to your service. Use a comma-separated list in the following format: k1=v1,k2=v2,k3=v3.
- AUXILIARY_VERSION: Optional: Enable auxiliary versions. For more information, see Auxiliary versions.
Scaling settings:
- INSTANCE_SIZE: Optional for Dataproc Metastore 2: The instance size of your Dataproc Metastore. For example, small, medium, or large. If you specify a value for INSTANCE_SIZE, don't specify a value for SCALING_FACTOR.
- SCALING_FACTOR: Optional for Dataproc Metastore 2: The scaling factor of your Dataproc Metastore service. For example, 0.1. If you specify a value for SCALING_FACTOR, don't specify a value for INSTANCE_SIZE.
Network settings:
- NETWORK: The name of the VPC network that you're connecting to your service. If not set, the default value is used. If you use a VPC network that belongs to a different project than your service, you must provide the entire relative resource name. For example, projects/HOST_PROJECT/global/networks/NETWORK_ID.
- SUBNET1, SUBNET2: Optional: A list of subnetworks that can access your service. You can use the ID, fully-qualified URL, or relative name of the subnetwork. You can specify up to 5 subnetworks.
  Note: If you specified a value in --network, then you can't use this parameter.
Kerberos settings:
- KERBEROS_PRINCIPAL: Optional: A Kerberos principal that exists in both the keytab and the KDC. A typical principal is of the form "primary/instance@REALM", but there is no exact format.
- KRB5_CONFIG: Optional: The krb5.config file specifies the KDC and the Kerberos realm information, which includes locations of KDCs and defaults for the realm and Kerberos applications.
- CLOUD_SECRET: Optional: The relative resource name of a Secret Manager secret version.
- KMS_KEY: Optional: Refers to the key resource ID.
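The value for --consumer-subnetworks is a comma-separated list of subnetworks, limited to five entries. The following sketch builds that list from short subnetwork names using the relative-name pattern shown in the command above. The function name consumer_subnetworks and the sample project and subnet names are assumptions for this example.

```shell
# Hypothetical helper: builds the comma-separated value for
# --consumer-subnetworks from a project, a region, and 1 to 5 subnet names.
consumer_subnetworks() (
  project_id=$1
  location=$2
  shift 2

  # The page states that up to 5 subnetworks can be specified.
  if [ "$#" -lt 1 ] || [ "$#" -gt 5 ]; then
    echo "expected 1 to 5 subnetworks, got $#" >&2
    exit 1
  fi

  list=""
  for subnet in "$@"; do
    # Append a comma only when the list already has an entry.
    list="${list:+$list,}projects/$project_id/regions/$location/subnetworks/$subnet"
  done
  printf '%s\n' "$list"
)

consumer_subnetworks my-project us-central1 subnet-a subnet-b
```

The output can be passed as --consumer-subnetworks="$(consumer_subnetworks ...)". Don't combine it with --network, because the two parameters can't be used together.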
Verify that the creation was successful.
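One way to verify creation from the command line is to poll the service state until it becomes active. The sketch below assumes that gcloud metastore services describe with --format="value(state)" prints state names such as CREATING and ACTIVE, matching the Creating and Active statuses shown in the console. The function names, retry count, and delay are hypothetical choices for this example.

```shell
# Reads the current state of a service. This is the only function in the
# sketch that calls Google Cloud.
get_state() {
  gcloud metastore services describe "$1" --location="$2" --format="value(state)"
}

# Polls until the service is ACTIVE, an unexpected state appears, or the
# attempts run out. Assumed state names: CREATING and ACTIVE.
wait_until_active() (
  service=$1
  location=$2
  attempts=${3:-60}   # hypothetical default: 60 checks
  delay=${4:-30}      # hypothetical default: 30 seconds between checks

  i=0
  while [ "$i" -lt "$attempts" ]; do
    state=$(get_state "$service" "$location")
    case $state in
      ACTIVE) echo "ready"; exit 0 ;;
      CREATING) sleep "$delay" ;;
      *) echo "unexpected state: $state" >&2; exit 1 ;;
    esac
    i=$((i + 1))
  done
  echo "timed out waiting for $service" >&2
  exit 1
)
```

For example, wait_until_active SERVICE LOCATION checks every 30 seconds for up to 30 minutes, which leaves room for creation times of over 20 minutes.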
REST
Follow the API instructions to create a service by using the APIs Explorer.
Set a Hive metastore config override for Dataproc Metastore
If your Apache Hive warehouse directory is on Cloud Storage, you should set a metastore config override. This override sets your custom data warehouse as the default warehouse directory for your Dataproc Metastore service.
Before you set this override, make sure that your Dataproc Metastore service has object read and write permissions to access the warehouse directory. For more information, see Hive warehouse directory.
The following instructions show you how to set a Hive metastore config override for a new Dataproc Metastore service.
Console
In the Google Cloud console, open the Dataproc Metastore page.
In the navigation bar, click +Create.
In the Metastore config overrides section, enter the following values:
- Key: hive.metastore.warehouse.dir
- Value: The Cloud Storage location of your warehouse directory. For example: gs://my-bucket/path/to/location.
Configure the remaining service options as necessary, or use the provided defaults.
Click Submit.
Return to the Dataproc Metastore page, and verify that your service was successfully created.
gcloud CLI
To create a Dataproc Metastore service with a Hive override, run the following gcloud metastore services create command:
    gcloud metastore services create SERVICE \
      --location=LOCATION \
      --hive-metastore-configs="hive.metastore.warehouse.dir=CUSTOMER_DIR"
Replace the following:
- SERVICE: The name of your new Dataproc Metastore service.
- LOCATION: The Google Cloud region that you want to create your Dataproc Metastore in. You can also set a default location.
- CUSTOMER_DIR: The Cloud Storage location of your warehouse directory. For example: gs://my-bucket/path/to/location.
Verify that the creation was successful.
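Since the override value must be a Cloud Storage location, a small check before building the flag can catch a local path passed by mistake. The following is a sketch only. The function name warehouse_override is hypothetical, and the check confirms only that the value starts with gs://, not that the bucket exists or is accessible.

```shell
# Hypothetical helper: builds the key=value pair for --hive-metastore-configs
# after confirming that the directory looks like a Cloud Storage location.
warehouse_override() (
  dir=$1
  case $dir in
    gs://?*) printf 'hive.metastore.warehouse.dir=%s\n' "$dir" ;;
    *) echo "expected a gs:// location, got: $dir" >&2; exit 1 ;;
  esac
)

warehouse_override gs://my-bucket/path/to/location
```

The printed pair can be passed as --hive-metastore-configs="$(warehouse_override CUSTOMER_DIR)" in the create command.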
Create Dataproc Metastore with autoscaling
Dataproc Metastore 2 supports autoscaling. If you turn on autoscaling, you can set a minimum scaling factor and a maximum scaling factor. After this is set, your service automatically increases or decreases the scaling factor required to run your workloads.
Autoscaling considerations
- Autoscaling and scaling factors are mutually exclusive options. For example, if you turn on autoscaling, you can't manually set a scaling factor or size.
- Autoscaling is only available for single region Dataproc Metastore instances.
- When autoscaling is enabled, existing scaling factor settings are cleared.
- When autoscaling is disabled:
  - Existing autoscaling settings are cleared.
  - The scaling factor is set to the last autoscaling_factor that was configured on the service.
- The minimum and maximum autoscaling factors are optional. If not set, the default values are 0.1 and 6, respectively.
Choose one of the following tabs to learn how to create a Dataproc Metastore 2 service with autoscaling enabled.
Console
In the Google Cloud console, go to the Dataproc Metastore page.
In the navigation bar, click +Create.
The Create metastore service dialog opens.
Select Dataproc Metastore 2.
In the Pricing and Capacity section, select Enterprise - Single region.
Under Instance Size, click Enable autoscaling.
Under Instance Size, use the slider to choose a minimum and maximum instance size.
To create and start the service, click Submit.
Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a few minutes.
REST
Note: The following command assumes that you've logged in to the gcloud CLI with your user account. You can sign in by executing gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the active account by executing gcloud auth list.
    curl -X POST -s -i -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -d '{"scaling_config":{"autoscaling_config":{"autoscaling_enabled": true,"limit_config":{"max_scaling_factor":MAX_SCALING_FACTOR,"min_scaling_factor":MIN_SCALING_FACTOR}}}}' \
      -H "Content-Type:application/json" \
      "https://metastore.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/services?service_id=SERVICE_ID"
Replace the following:
- MIN_SCALING_FACTOR: Optional: The minimum scaling factor to use in your autoscaling configuration. If this value is not specified, a default value of 0.1 is used.
- MAX_SCALING_FACTOR: Optional: The maximum scaling factor to use in your autoscaling configuration. If this value is not specified, a default value of 6 is used.
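The request body in the curl command can be generated rather than edited by hand, which avoids leaving a placeholder in the JSON. The sketch below fills in the documented defaults of 0.1 and 6 when a value is omitted. The function name autoscaling_body is hypothetical, and the check that the minimum doesn't exceed the maximum is a sanity check assumed for this example rather than a rule stated on this page.

```shell
# Hypothetical helper: prints the JSON body for an autoscaling-enabled
# service, using the documented defaults when a factor is omitted.
autoscaling_body() (
  min=${1:-0.1}   # documented default minimum scaling factor
  max=${2:-6}     # documented default maximum scaling factor

  # Assumed sanity check: compare as numbers with awk, because the
  # shell's own arithmetic doesn't handle decimals.
  if ! awk -v lo="$min" -v hi="$max" 'BEGIN { exit !(lo + 0 <= hi + 0) }'; then
    echo "minimum ($min) must not exceed maximum ($max)" >&2
    exit 1
  fi

  printf '{"scaling_config":{"autoscaling_config":{"autoscaling_enabled":true,"limit_config":{"max_scaling_factor":%s,"min_scaling_factor":%s}}}}\n' \
    "$max" "$min"
)

autoscaling_body 0.5 3
```

The result can be supplied to the curl command as -d "$(autoscaling_body 0.5 3)".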
Create a Dataproc Metastore service using Shared VPC
A Shared VPC lets you connect Dataproc Metastore resources from multiple projects to a common VPC network.
To create a Dataproc Metastore service configured with a Shared VPC, see Create a service using advanced settings.
Considerations
VPC networks are not relevant for Dataproc Metastore services configured with the gRPC endpoint protocol.
For Dataproc Metastore services configured with the Thrift endpoint protocol, make sure your Dataproc Metastore service and the Dataproc cluster it's attached to are using the same Shared VPC network.
For Dataproc Metastore services configured with the Thrift endpoint protocol and Private Service Connect, make sure that you use subnetworks from the Shared VPC network.
IAM roles required for Shared VPC networks
To create a Dataproc Metastore service with a VPC that is accessible in a network belonging to a different project, you must grant roles/metastore.serviceAgent to the service project's Dataproc Metastore service agent (service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) in the network project's IAM policy.
Note: With this role granted, a user who has the metastore.services.create permission in the service project can indirectly create addresses and peerings in the network project.
    gcloud projects add-iam-policy-binding NETWORK_PROJECT_ID \
      --role "roles/metastore.serviceAgent" \
      --member "serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com"
Note: The gcloud projects add-iam-policy-binding command might fail with an error message containing Service account [SERVICE_ACCOUNT_NAME] does not exist. If this happens, you can resolve the issue by attempting to create a service using a non-existent network in the service project. The service creation will fail, but it will trigger the creation of the service account. Afterwards, the command should succeed.
Troubleshoot common issues
Some common issues include the following:
Restricting VPC peering. Before creating a metastore, don't set an org policy constraint to restrict VPC peering or else the metastore creation fails. For more information about setting the correct VPC configurations, see Service creation fails due to constraint to restrict VPC peering.
Issues with VPC networks. When creating a metastore, the VPC network you are using might run out of available RFC 1918 addresses required by Dataproc Metastore services. For more information about fixing this issue, see Allocated IP range is exhausted.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.