Deploy a Dataproc Metastore service

This page shows you how to create a Dataproc Metastore service and connect to it from a Dataproc cluster. Afterward, you SSH into the cluster, launch an instance of Apache Hive, and run some basic queries.

Dataproc Metastore provides you with a fully compatible Hive Metastore (HMS), which is the established standard in the open source big data ecosystem for managing technical metadata. This service helps you manage the metadata of your data lakes and provides interoperability between the various data processing tools you're using.

Note: This guide shows you how to configure Dataproc Metastore using the provided default options. To learn how to configure Dataproc Metastore with advanced settings, see Create a metastore.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Dataproc Metastore, Dataproc APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs
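If you prefer the command line to the console button, the same two APIs can be enabled with gcloud services enable. The following is a minimal sketch rather than part of the original procedure. PROJECT_ID is a placeholder, and the run helper with its DRY_RUN variable is a hypothetical convenience that only prints each command unless you set DRY_RUN=0.

```shell
# Placeholder: replace with your own project ID.
PROJECT_ID="PROJECT_ID"

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Enable the Dataproc Metastore and Dataproc APIs in a single call.
run gcloud services enable \
  metastore.googleapis.com \
  dataproc.googleapis.com \
  --project="${PROJECT_ID}"
```

Review the printed command first, then rerun with DRY_RUN=0 once PROJECT_ID is set to a real project.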

Required roles

To get the permissions that you need to create a Dataproc Metastore and a Dataproc cluster, ask your administrator to grant you the following IAM roles:

  • To grant full access to all Dataproc Metastore resources, including setting IAM permissions: Dataproc Metastore Admin (roles/metastore.admin) on the user account or service account
  • To grant full control of Dataproc Metastore resources: Dataproc Metastore Editor (roles/metastore.editor) on the user account or service account
  • To create a Dataproc cluster: Dataproc Worker (roles/dataproc.worker) on the service account

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create a Dataproc Metastore and a Dataproc cluster. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create a Dataproc Metastore and a Dataproc cluster:

  • To create a Dataproc Metastore service: metastore.services.create on the user account or service account
  • To create a Dataproc cluster: Dataproc Worker (roles/dataproc.worker) on the service account

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about specific Dataproc Metastore roles and permissions, see Dataproc Metastore IAM overview.
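As an illustration of how an administrator might grant two of the roles listed above from the command line, here is a hedged sketch that uses gcloud projects add-iam-policy-binding. PROJECT_ID, USER_EMAIL, and SERVICE_ACCOUNT_EMAIL are placeholders, and the run helper with its DRY_RUN variable is a hypothetical convenience that prints the commands unless DRY_RUN=0 is set.

```shell
# Placeholders: replace all three values before running for real.
PROJECT_ID="PROJECT_ID"
USER_EMAIL="USER_EMAIL"
SERVICE_ACCOUNT_EMAIL="SERVICE_ACCOUNT_EMAIL"

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Let the user create and manage Dataproc Metastore services.
run gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="user:${USER_EMAIL}" \
  --role="roles/metastore.editor"

# Let the cluster's service account act as a Dataproc worker.
run gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/dataproc.worker"
```

The account that runs these commands needs permission to change the project's IAM policy.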

Create a Dataproc Metastore service

The following instructions show you how to create a basic Dataproc Metastore service using the provided default settings.

Console

  1. In the Google Cloud console, go to the Dataproc Metastore page.

    Go to Dataproc Metastore

  2. In the navigation menu, click +Create.

    The Create Metastore service dialog opens.

  3. Select Dataproc Metastore 2.

  4. In the Service name field, enter example-service.

  5. In the Data location field, select us-central1.

  6. For the remaining service configuration options, use the provided defaults.

  7. To create and start the service, click Submit.

Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a couple of minutes.

The following screenshot shows an example of the Create service page using some of the provided defaults.

The Create service page.

gcloud CLI

    gcloud metastore services create example-service \
        --location=us-central1 \
        --instance-size=MEDIUM
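To check from the command line whether the service has finished provisioning, you can read its state with gcloud metastore services describe. The sketch below polls until the API reports ACTIVE, the state that the console displays as Active. The function names, MAX_TRIES, and POLL_SECONDS are hypothetical and chosen only for this example.

```shell
SERVICE="example-service"
LOCATION="us-central1"

# Succeeds when the given state string means the service is ready to use.
is_ready() {
  [ "$1" = "ACTIVE" ]
}

# Ask the API for the current state, for example CREATING or ACTIVE.
get_state() {
  gcloud metastore services describe "${SERVICE}" \
    --location="${LOCATION}" \
    --format="value(state)"
}

# Poll until the service is ready, giving up after MAX_TRIES attempts.
wait_until_active() {
  tries=0
  while [ "${tries}" -lt "${MAX_TRIES:-40}" ]; do
    state="$(get_state)"
    echo "attempt $((tries + 1)): state=${state}"
    if is_ready "${state}"; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "${POLL_SECONDS:-30}"
  done
  echo "Timed out waiting for ${SERVICE} to become ACTIVE" >&2
  return 1
}
```

Call wait_until_active after you create the service, for example when you created it in the console and want a script to continue only once it is ready.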

REST

Follow the API instructions to create a service by using the APIs Explorer.

Create a Dataproc cluster and connect to Dataproc Metastore

Next, you create a Dataproc cluster and connect to your metastore from the cluster. After that, your cluster uses the metastore service as its HMS. The cluster you create here uses the default provided settings.

Console

  1. In the Google Cloud console, go to the Dataproc Clusters page.

    Go to Dataproc Clusters

  2. In the navigation bar, select +Create cluster.

    The Create a cluster dialog opens with several infrastructure options to choose from.

  3. In the Cluster on Compute Engine row, select Create.

    The Create a Dataproc cluster on Compute Engine page opens.

  4. In the Cluster Name field, enter example-cluster.

  5. In the Region and Zone menus, select us-central1.

  6. For the remaining Set up cluster options, use the provided defaults.

  7. In the navigation menu, click the Customize cluster (optional) tab.

  8. In the Dataproc Metastore section, select the metastore service you created earlier.

    If you followed this tutorial as-is, it's named example-service.

  9. For the remaining service configuration options, use the provided defaults.

  10. To create the cluster, click Create.

    Your new cluster appears in the Clusters list. The cluster status displays Provisioning until the cluster is ready to use. When it's ready, the status changes to Running. Provisioning the cluster might take a couple of minutes.

    Note: The Dataproc cluster creation process can fail if your service account doesn't have the appropriate roles. For more information, see Cluster creation fails due to insufficient roles.

gcloud CLI

To create a cluster using the provided default settings, run the following gcloud dataproc clusters create command:

    gcloud dataproc clusters create example-cluster \
        --dataproc-metastore=projects/PROJECT_ID/locations/us-central1/services/example-service \
        --region=us-central1

Replace PROJECT_ID with the project ID of the project that you created your Dataproc Metastore service in.

Note: The Dataproc cluster creation process can fail if your service account doesn't have the appropriate roles. For more information, see Cluster creation fails due to insufficient roles.
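The value passed to --dataproc-metastore is the full resource name of the service, so it can help to build it from variables and then confirm that the cluster reports the same name. The following sketch assumes the names used in this guide. PROJECT_ID is a placeholder, and attached_service and check_attachment are hypothetical helper names.

```shell
PROJECT_ID="PROJECT_ID"   # placeholder: replace with your project ID
REGION="us-central1"
SERVICE="example-service"
CLUSTER="example-cluster"

# Full resource name in the form expected by --dataproc-metastore.
METASTORE_SERVICE="projects/${PROJECT_ID}/locations/${REGION}/services/${SERVICE}"
echo "${METASTORE_SERVICE}"

# Read back the metastore service that the cluster is configured to use.
attached_service() {
  gcloud dataproc clusters describe "${CLUSTER}" \
    --region="${REGION}" \
    --format="value(config.metastoreConfig.dataprocMetastoreService)"
}

# Compare the cluster's configured service with the expected resource name.
check_attachment() {
  actual="$(attached_service)"
  if [ "${actual}" = "${METASTORE_SERVICE}" ]; then
    echo "OK: ${CLUSTER} uses ${actual}"
  else
    echo "Mismatch: expected ${METASTORE_SERVICE}, got '${actual}'" >&2
    return 1
  fi
}
```

Run check_attachment after the cluster has been created. A mismatch or an empty value suggests that the cluster was created without the metastore flag.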

REST

Follow the API instructions to create a cluster by using the APIs Explorer.

Connect to Apache Hive with a Dataproc cluster

The following steps show you how to run some example commands in Apache Hive to create a database and a table.

First, open an SSH session on the Dataproc cluster so that you can launch a Hive session.

  1. In the Google Cloud console, go to the VM instances page.
  2. In the list of virtual machine instances, click SSH next to the example-cluster master node (its VM name ends in -m).

A browser window opens in your home directory on the node with an output similar to the following:

    Connected, host fingerprint: ssh-rsa ...
    Linux cluster-1-m 3.16.0-0.bpo.4-amd64 ...
    ...
    example-cluster@cluster-1-m:~$
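As an alternative to the browser-based SSH button, you can open the session with gcloud compute ssh. This sketch only prints the command so that you can review it. It relies on the Dataproc convention that the master VM of a standard (single-master) cluster is named after the cluster with an -m suffix. ZONE is a placeholder for the zone your cluster runs in, and master_node is a hypothetical helper name.

```shell
CLUSTER="example-cluster"
ZONE="ZONE"   # placeholder: the Compute Engine zone of the cluster

# A standard Dataproc cluster names its master VM "<cluster-name>-m".
master_node() {
  echo "$1-m"
}

# Build the SSH command and print it for review.
SSH_CMD="gcloud compute ssh $(master_node "${CLUSTER}") --zone=${ZONE}"
echo "${SSH_CMD}"
```

Replace ZONE with a real zone and run the printed command from a machine where the gcloud CLI is authenticated.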

To start Hive and create a database and table, run the following commands in the SSH session:

  1. Start Hive.

    hive
  2. Create a database called myDatabase.

    create database myDatabase;
  3. Show the database you created.

    show databases;
  4. Use the database you created.

    use myDatabase;
  5. Create a table called myTable.

    create table myTable(id int,name string);
  6. List the tables under myDatabase.

    show tables;
  7. Describe the schema of the table you created.

    desc myTable;

Running these commands shows an output similar to the following:

    $ hive
    hive> show databases;
    OK
    default
    hive> create database myDatabase;
    OK
    hive> use myDatabase;
    OK
    hive> create table myTable(id int,name string);
    OK
    hive> show tables;
    OK
    myTable
    hive> desc myTable;
    OK
    id                      int
    name                    string
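The same statements can also be run non-interactively by saving them to a file and passing it to hive -f. The sketch below is one way to do that. The file name create_example.hql is arbitrary, and the IF NOT EXISTS clauses are an addition that is not in the steps above, included so that the script can be rerun without errors.

```shell
# Write the HiveQL statements from the steps above to a script file.
cat > create_example.hql <<'EOF'
CREATE DATABASE IF NOT EXISTS myDatabase;
SHOW DATABASES;
USE myDatabase;
CREATE TABLE IF NOT EXISTS myTable (id INT, name STRING);
SHOW TABLES;
DESC myTable;
EOF

# Run the script only where the Hive CLI exists, such as the cluster's master node.
if command -v hive >/dev/null 2>&1; then
  hive -f create_example.hql
else
  echo "hive not found; run 'hive -f create_example.hql' in the cluster SSH session"
fi
```

Because the cluster uses Dataproc Metastore as its HMS, the database and table definitions created this way are stored in the metastore service rather than on the cluster.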

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
  3. In the project list, select the project that you want to delete, and then click Delete.
  4. In the dialog, type the project ID, and then click Shut down to delete the project.

Alternatively, you can delete the resources used in this tutorial:

  1. Delete the Dataproc Metastore service.

    Console

    1. In the Google Cloud console, open the Dataproc Metastore page:

      Go to Dataproc Metastore

    2. In the service list, select example-service.

    3. In the navigation bar, click Delete.

      The Delete service dialog opens.

    4. In the dialog, click Delete.

      Your service no longer appears in the Service list.

    gcloud CLI

    To delete your service, run the following gcloud metastore services delete command.

        gcloud metastore services delete example-service \
            --location=us-central1

    REST

    Follow the API instructions to delete a service by using the APIs Explorer.

    All deletions succeed immediately.

  2. Delete the Cloud Storage bucket for the Dataproc Metastore service.

    Note: Deleting the Dataproc Metastore service doesn't automatically delete its bucket.
  3. Delete the Dataproc cluster that used the Dataproc Metastore service.
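The three cleanup steps above can be sketched as gcloud commands. This is an illustration under stated assumptions, not an official procedure. BUCKET_NAME is a placeholder for the service's artifact bucket, which you can find in the artifactGcsUri field of the service before you delete it. The run helper with its DRY_RUN variable is a hypothetical convenience that prints each command unless DRY_RUN=0 is set.

```shell
REGION="us-central1"
SERVICE="example-service"
CLUSTER="example-cluster"
BUCKET_NAME="BUCKET_NAME"   # placeholder: bucket shown in artifactGcsUri

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Look up the artifact location while the service still exists.
run gcloud metastore services describe "${SERVICE}" \
  --location="${REGION}" --format="value(artifactGcsUri)"

# 1. Delete the Dataproc Metastore service.
run gcloud metastore services delete "${SERVICE}" --location="${REGION}"

# 2. Delete the service's Cloud Storage bucket and everything in it.
run gcloud storage rm --recursive "gs://${BUCKET_NAME}"

# 3. Delete the Dataproc cluster that used the service.
run gcloud dataproc clusters delete "${CLUSTER}" --region="${REGION}"
```

The delete commands ask for confirmation when run for real. Check that the bucket holds nothing you want to keep before you remove it.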

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.