Deploy a Dataproc Metastore service
This page shows you how to create a Dataproc Metastore service and connect to it from a Dataproc cluster. Afterward, you SSH into the cluster, launch an instance of Apache Hive, and run some basic queries.
Dataproc Metastore provides you with a fully compatible Hive Metastore (HMS), which is the established standard in the open source big data ecosystem for managing technical metadata. This service helps you manage the metadata of your data lakes and provides interoperability between the various data processing tools you're using.
Note: This guide shows you how to configure Dataproc Metastore using the provided default options. To learn how to configure Dataproc Metastore with advanced settings, see Create a metastore.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataproc Metastore and Dataproc APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
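If you prefer the command line to the console, the two APIs can likely also be enabled with gcloud. The following is a minimal sketch, not an official procedure: metastore.googleapis.com and dataproc.googleapis.com are the service names of the two APIs, while my-project and the helper name enable_apis_cmd are placeholders introduced here for illustration. The sketch only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical placeholder; replace with your own project ID.
PROJECT_ID="${PROJECT_ID:-my-project}"

# Service names of the Dataproc Metastore and Dataproc APIs.
REQUIRED_APIS="metastore.googleapis.com dataproc.googleapis.com"

# Print (without running) the gcloud command that enables both APIs
# in the project passed as the first argument.
enable_apis_cmd() {
  printf 'gcloud services enable %s --project=%s\n' "$REQUIRED_APIS" "$1"
}

enable_apis_cmd "$PROJECT_ID"
```

To run the printed command instead of only reviewing it, you could pipe the output to sh, for example `enable_apis_cmd "$PROJECT_ID" | sh`.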
Required roles
To get the permissions that you need to create a Dataproc Metastore service and a Dataproc cluster, ask your administrator to grant you the following IAM roles:
- To grant full access to all Dataproc Metastore resources, including setting IAM permissions: Dataproc Metastore Admin (roles/metastore.admin) on the user account or service account
- To grant full control of Dataproc Metastore resources: Dataproc Metastore Editor (roles/metastore.editor) on the user account or service account
- To create a Dataproc cluster: Dataproc Worker (roles/dataproc.worker) on the service account
For more information about granting roles, see Manage access to projects, folders, and organizations.
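As an illustration of what an administrator might run to grant these roles at the project level, the following sketch prints `gcloud projects add-iam-policy-binding` commands for review. The project ID, the user email, the service account address, and the helper name binding_cmd are all hypothetical placeholders; the role IDs are the ones listed above. Swap roles/metastore.editor for roles/metastore.admin if full access is needed.

```shell
#!/bin/sh
# Print one `gcloud projects add-iam-policy-binding` command.
# Arguments: PROJECT_ID MEMBER ROLE
binding_cmd() {
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
    "$1" "$2" "$3"
}

# Hypothetical placeholders; replace with real values.
PROJECT_ID="my-project"
USER_MEMBER="user:alice@example.com"
SA_MEMBER="serviceAccount:my-sa@my-project.iam.gserviceaccount.com"

# Metastore role on the user account, worker role on the service account.
binding_cmd "$PROJECT_ID" "$USER_MEMBER" "roles/metastore.editor"
binding_cmd "$PROJECT_ID" "$SA_MEMBER" "roles/dataproc.worker"
```

Printing the commands rather than executing them keeps the sketch safe to try; IAM changes are worth reviewing before they are applied.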
These predefined roles contain the permissions required to create a Dataproc Metastore service and a Dataproc cluster. The exact permissions that are required are listed in the following Required permissions section:
Required permissions
The following permissions are required to create a Dataproc Metastore service and a Dataproc cluster:
- To create a Dataproc Metastore service: metastore.services.create on the user account or service account
- To create a Dataproc cluster: Dataproc Worker (roles/dataproc.worker) on the service account
You might also be able to get these permissions with custom roles or other predefined roles.
For more information about specific Dataproc Metastore roles and permissions, see Dataproc Metastore IAM overview.
Create a Dataproc Metastore service
The following instructions show you how to create a basic Dataproc Metastore service using the provided default settings.
Console
- In the Google Cloud console, go to the Dataproc Metastore page.
- In the navigation menu, click +Create.
  The Create Metastore service dialog opens.
- Select Dataproc Metastore 2.
- In the Service name field, enter example-service.
- In the Data location field, select us-central1.
- For the remaining service configuration options, use the provided defaults.
- To create and start the service, click Submit.
Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a couple of minutes.
[Screenshot: the Create service page using some of the provided defaults.]
gcloud CLI
To create the service, run the following gcloud metastore services create command:
    gcloud metastore services create example-service \
        --location=us-central1 \
        --instance-size=MEDIUM
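Because provisioning might take a couple of minutes, a script that creates the service may want to wait until it is ready before continuing. The following is a minimal polling sketch. The helper name wait_for_state is hypothetical, and the usage comment assumes that `gcloud metastore services describe` with `--format="value(state)"` prints the service state (for example CREATING or ACTIVE).

```shell
#!/bin/sh
# wait_for_state EXPECTED MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until its output equals EXPECTED.
# Returns 0 on a match, or 1 after MAX_TRIES unsuccessful attempts.
wait_for_state() {
  expected="$1"
  max_tries="$2"
  delay="$3"
  shift 3
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    state="$("$@" 2>/dev/null)"
    if [ "$state" = "$expected" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 1
}

# Example usage (requires gcloud and an existing service, so it is
# left commented out here): poll every 10 seconds, up to 60 times.
# wait_for_state ACTIVE 60 10 \
#   gcloud metastore services describe example-service \
#     --location=us-central1 --format="value(state)"
```

Passing the state command as arguments keeps the helper independent of gcloud, so the same loop could be reused for other resources that report a state.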
REST
Follow the API instructions to create a service by using the APIs Explorer.
Create a Dataproc cluster and connect to Dataproc Metastore
Next, you create a Dataproc cluster and connect to your metastore from the cluster. After that, your cluster uses the metastore service as its HMS. The cluster you create here uses the default provided settings.
Console
- In the Google Cloud console, go to the Dataproc Clusters page.
- In the navigation bar, select +Create cluster.
  The Create a cluster dialog opens and offers multiple infrastructure choices.
- In the Cluster on Compute Engine row, select Create.
  The Create a Dataproc cluster on Compute Engine page opens.
- In the Cluster Name field, enter example-cluster.
- In the Region and Zone menus, select us-central1.
- For the remaining Set up cluster options, use the provided defaults.
- In the navigation menu, click the Customize cluster (optional) tab.
- In the Dataproc Metastore section, select the metastore service you created earlier.
  If you followed this tutorial as-is, it's named example-service.
- For the remaining service configuration options, use the provided defaults.
- To create the cluster, click Create.
Your new cluster appears in the Clusters list. The cluster status displays Provisioning until the cluster is ready to use. When it's ready, the status changes to Running. Provisioning the cluster might take a couple of minutes.
Note: The Dataproc cluster creation process can fail if your service account doesn't have the appropriate roles. For more information, see Cluster creation fails due to insufficient roles.
gcloud CLI
To create a cluster using the provided default settings, run the following gcloud dataproc clusters create command:
    gcloud dataproc clusters create example-cluster \
        --dataproc-metastore=projects/PROJECT_ID/locations/us-central1/services/example-service \
        --region=us-central1
Replace PROJECT_ID with the project ID of the project that you created your Dataproc Metastore service in.
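The value passed to --dataproc-metastore is the full resource name of the service, built from the project ID, location, and service name. As a small sketch of how a script might assemble it, the following prints the resource name and the resulting create command. The helper name metastore_resource_name and the my-project default are hypothetical placeholders.

```shell
#!/bin/sh
# Build the full resource name of a Dataproc Metastore service.
# Arguments: PROJECT_ID LOCATION SERVICE_NAME
metastore_resource_name() {
  printf 'projects/%s/locations/%s/services/%s\n' "$1" "$2" "$3"
}

# Hypothetical placeholder; replace with your own project ID.
PROJECT_ID="${PROJECT_ID:-my-project}"
METASTORE="$(metastore_resource_name "$PROJECT_ID" us-central1 example-service)"

# Print the create command for review instead of running it.
echo "gcloud dataproc clusters create example-cluster --dataproc-metastore=${METASTORE} --region=us-central1"
```

If gcloud is already configured, `gcloud config get-value project` is one way to look up the current project ID rather than typing it by hand.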
REST
Follow the API instructions to create a cluster by using the APIs Explorer.
Connect to Apache Hive with a Dataproc cluster
These next steps show you how to run some example commands in Apache Hive to create a database and a table.
Start by opening an SSH session on the Dataproc cluster and launching a Hive session.
- In the Google Cloud console, go to the VM Instances page.
- In the list of virtual machine instances, click SSH next to example-cluster-m, the cluster's master node.
A browser window opens in your home directory on the node with an output similar to the following:
    Connected, host fingerprint: ssh-rsa ...
    Linux cluster-1-m 3.16.0-0.bpo.4-amd64 ...
    ...
    example-cluster@cluster-1-m:~$
To start Hive and create a database and table, run the following commands in the SSH session:
- Start Hive.
      hive
- Create a database called myDatabase.
      create database myDatabase;
- Show the database you created.
      show databases;
- Use the database you created.
      use myDatabase;
- Create a table called myTable.
      create table myTable(id int,name string);
- List the tables under myDatabase.
      show tables;
- Describe the schema of the table you created.
      desc myTable;
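The interactive steps above can also be captured in a script, which may be convenient when you want to repeat them. The following sketch writes equivalent HiveQL statements to a temporary file and runs them with `hive -f` only if the hive command is available, as it would be on the cluster node. The IF NOT EXISTS clauses are an addition made here so that the script can be rerun without errors; they are not part of the original steps.

```shell
#!/bin/sh
# Write the HiveQL statements to a temporary script file.
HQL_FILE="$(mktemp)"
cat > "$HQL_FILE" <<'EOF'
CREATE DATABASE IF NOT EXISTS myDatabase;
SHOW DATABASES;
USE myDatabase;
CREATE TABLE IF NOT EXISTS myTable (id INT, name STRING);
SHOW TABLES;
DESCRIBE myTable;
EOF

# Run the script only where the hive CLI exists (for example, in the
# SSH session on the cluster); otherwise just report where it is.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found; HiveQL script written to $HQL_FILE"
fi
```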
Running the Hive commands in the interactive session shows output similar to the following:
    $ hive
    hive> show databases;
    OK
    default
    hive> create database myDatabase;
    OK
    hive> use myDatabase;
    OK
    hive> create table myTable(id int,name string);
    OK
    hive> show tables;
    OK
    myTable
    hive> desc myTable;
    OK
    id      int
    name    string
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, either delete the project that contains them or delete the individual resources used in this tutorial as follows:
- Delete the Dataproc Metastore service.
  Console
  - In the Google Cloud console, open the Dataproc Metastore page.
  - In the service list, select example-service.
  - In the navigation bar, click Delete.
    The Delete service dialog opens.
  - In the dialog, click Delete.
  Your service no longer appears in the Service list.
  gcloud CLI
  To delete your service, run the following gcloud metastore services delete command:
      gcloud metastore services delete example-service \
          --location=us-central1
  REST
  Follow the API instructions to delete a service by using the APIs Explorer.
  All deletions succeed immediately.
- Delete the Cloud Storage bucket for the Dataproc Metastore service.
  Note: Deleting the Dataproc Metastore service doesn't automatically delete its bucket.
- Delete the Dataproc cluster that used the Dataproc Metastore service.
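The last two cleanup steps can also be done from the command line. The following sketch makes two assumptions that you should verify for your own project: that the service's bucket can be found from the artifactGcsUri field returned by `gcloud metastore services describe` (read it before you delete the service), and that the example URI shown is only a stand-in. The helper name bucket_from_uri is hypothetical. The commands are printed for review, not executed.

```shell
#!/bin/sh
# Reduce a gs:// URI to its bucket, e.g. gs://bucket/path -> gs://bucket
bucket_from_uri() {
  rest="${1#gs://}"
  printf 'gs://%s\n' "${rest%%/*}"
}

# Hypothetical URI. On a real project, one way to read it (before the
# service is deleted) is assumed to be:
#   gcloud metastore services describe example-service \
#     --location=us-central1 --format="value(artifactGcsUri)"
ARTIFACT_URI="gs://example-artifacts-bucket/hive-warehouse"
BUCKET="$(bucket_from_uri "$ARTIFACT_URI")"

# Print the remaining cleanup commands instead of running them.
echo "gcloud storage rm --recursive ${BUCKET}"
echo "gcloud dataproc clusters delete example-cluster --region=us-central1"
```

Deleting a bucket is irreversible, so confirm that the printed bucket name belongs to the tutorial's metastore service before running the command.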
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.