Hive metastore

Dataproc Metastore is a fully managed, highly available, autohealing, serverless Apache Hive metastore (HMS) that runs on Google Cloud.

To fully manage your metadata, Dataproc Metastore maps your data to Apache Hive tables.

Supported Apache Hive versions

Dataproc Metastore only supports specific versions of Apache Hive. For more information, see the Hive version policy.

How Hive handles metadata

Since Dataproc Metastore is a Hive metastore, it's important to understand how it manages your metadata.

By default, all Hive applications can have managed internal tables or unmanaged external tables. This means the metadata that you store in a Dataproc Metastore service can exist in both internal and external tables.

When modifying data, a Dataproc Metastore service (Hive) treats internal and external tables differently.

  • Internal tables. Hive manages both the metadata and the table data.
  • External tables. Hive manages only the metadata.

For example, if you delete a table definition using the DROP TABLE Hive SQL statement:

drop table foo
  • Internal tables. Dataproc Metastore deletes all metadata. It also deletes the files associated with the table.

  • External tables. Dataproc Metastore only deletes the metadata. It keeps the data associated with the table.
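The difference can be illustrated with standard Hive DDL. In this sketch, the table names and the Cloud Storage path are hypothetical:

```sql
-- Internal (managed) table: Hive controls both metadata and data,
-- and stores the files under the Hive warehouse directory.
-- Dropping it deletes the table definition and the data files.
CREATE TABLE sales_managed (id INT, amount DOUBLE);

-- External table: Hive only tracks metadata; the data stays at LOCATION.
-- Dropping it removes the table definition but leaves the files intact.
CREATE EXTERNAL TABLE sales_external (id INT, amount DOUBLE)
LOCATION 'gs://my-bucket/path/to/sales/';

DROP TABLE sales_managed;   -- deletes metadata and data files
DROP TABLE sales_external;  -- deletes metadata only
```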

Hive warehouse directory

Dataproc Metastore uses the Hive warehouse directory to manage your internal tables. The Hive warehouse directory is where your actual data is stored.

When you use a Dataproc Metastore service, the default Hive warehouse directory is a Cloud Storage bucket. Dataproc Metastore only supports the use of Cloud Storage buckets for the warehouse directory. This differs from an on-premises HMS, where the Hive warehouse directory usually points to a local directory.

This bucket is automatically created for you every time you create a Dataproc Metastore service. You can change this value by setting a Hive metastore configuration override on the hive.metastore.warehouse.dir property.
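For example, the override can be set when you create a service with the gcloud CLI. The service name, region, and bucket path below are placeholders; verify the flag names against the current gcloud reference:

```sh
# Create a Dataproc Metastore service with a custom Hive warehouse directory.
gcloud metastore services create my-service \
    --location=us-central1 \
    --hive-metastore-configs="hive.metastore.warehouse.dir=gs://my-bucket/hive-warehouse"
```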

Artifacts Cloud Storage buckets

The artifacts bucket stores your Dataproc Metastore artifacts, such as exported metadata and managed internal table data.

When you create a Dataproc Metastore service, a Cloud Storage bucket is automatically created for you in your project. By default, both the artifacts bucket and the warehouse directory point to the same bucket. You can't change the location of the artifacts bucket; however, you can change the location of the Hive warehouse directory.

The artifacts bucket is located at the following location:

  • gs://your-artifacts-bucket/hive-warehouse.
  • For example, gs://gcs-your-project-name-0825d7b3-0627-4637-8fd0-cc6271d00eb4.
Note: This bucket is created with uniform bucket-level access and can't be changed to use fine-grained ACLs.

Access the Hive warehouse directory

After your bucket is automatically created for you, ensure that your Dataproc service accounts have permission to access the Hive warehouse directory.

  • To access the warehouse directory at the object level (for example, gs://mybucket/object), grant the Dataproc service accounts read and write access to the storage objects of the bucket, using the roles/storage.objectAdmin role. This role must be set at the bucket level or higher.

  • To access the warehouse directory when you use a top-level folder (for example, gs://mybucket), grant the Dataproc service accounts read and write access to the bucket, using the roles/storage.admin role.
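As a sketch, the object-level grant might look like the following gcloud command. The bucket name, project number, and service account are placeholders (the account shown is the Compute Engine default service account that Dataproc VMs commonly use):

```sh
# Grant the Dataproc VM service account read/write access to the
# objects in the warehouse bucket.
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
```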

If the Hive warehouse directory is not in the same project as the Dataproc Metastore service, ensure that the Dataproc Metastore service agent has permission to access the Hive warehouse directory. The service agent for a Dataproc Metastore project is service-PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com. Grant the service agent read and write access to the bucket using the roles/storage.objectAdmin role.

Find the Hive warehouse directory

  1. Open the Dataproc Metastore page.
  2. Click the name of your service.

    The Service detail page opens.

  3. In the configuration table, find Metastore config overrides > hive.metastore.warehouse.dir.

  4. Find the value that starts with gs://.

    This value is the location of your Hive warehouse directory.

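Alternatively, you can read the same configuration with the gcloud CLI. The service name and region are placeholders, and the field path reflects the hiveMetastoreConfig.configOverrides map in the service resource; verify it against the current gcloud reference:

```sh
# Print the service's Hive metastore config overrides, including
# hive.metastore.warehouse.dir if it has been set.
gcloud metastore services describe my-service \
    --location=us-central1 \
    --format="yaml(hiveMetastoreConfig.configOverrides)"
```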

Change the Hive warehouse directory

To use your own Cloud Storage bucket with Dataproc Metastore, set a Hive metastore configuration override to point to the new bucket location.

If you change your default warehouse directory, follow these recommendations:

  • Don't use the Cloud Storage bucket root (gs://mybucket) to store Hive tables.

  • Make sure your Dataproc Metastore VM service account has permission to access the Hive warehouse directory.

  • For best results, use Cloud Storage buckets that are located in the same region as your Dataproc Metastore service. Although Dataproc Metastore allows cross-region buckets, colocated resources perform better. For example, a europe-west1 bucket doesn't work well with a us-central1 service. Cross-region access results in higher latency, lack of regional failure isolation, and charges for cross-region network bandwidth.
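For instance, creating a bucket colocated with a us-central1 service might look like the following; the bucket name is a placeholder:

```sh
# Create a regional bucket in the same region as the metastore service,
# with uniform bucket-level access to match the default artifacts bucket.
gcloud storage buckets create gs://my-warehouse-bucket \
    --location=us-central1 \
    --uniform-bucket-level-access
```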

To change the Hive warehouse directory:

  1. Open the Dataproc Metastore page.
  2. Click the name of your service.

    The Service detail page opens.

  3. In the configuration table, find the Metastore config overrides > hive.metastore.warehouse.dir section.

  4. Change the hive.metastore.warehouse.dir value to the location of your new bucket. For example, gs://my-bucket/path/to/location.
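The same change can be made with the gcloud CLI instead of the console. The service name, region, and bucket path below are placeholders; check the flag name against the current gcloud reference:

```sh
# Point an existing service at a new Hive warehouse directory.
gcloud metastore services update my-service \
    --location=us-central1 \
    --update-hive-metastore-configs="hive.metastore.warehouse.dir=gs://my-bucket/path/to/location"
```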

Delete your bucket

Deleting your Dataproc Metastore service doesn't automatically delete your Cloud Storage artifacts bucket, because the bucket might contain data that is still useful after the service is gone. To delete your bucket, run a Cloud Storage delete operation.

Caution: Don't delete your artifacts bucket until after you delete your Dataproc Metastore service. If you delete your artifacts bucket first, the service might stop working.
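Once the service itself has been deleted, the bucket can be removed with a recursive Cloud Storage delete; the bucket name below is a placeholder:

```sh
# Recursively delete the artifacts bucket and every object in it.
# Only run this after the Dataproc Metastore service is deleted.
gcloud storage rm --recursive gs://your-artifacts-bucket
```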


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.