Use hierarchical namespace enabled buckets for Hadoop workloads
This page describes how to use hierarchical namespace enabled buckets for Hadoop workloads.
Overview
When using a Cloud Storage bucket with hierarchical namespace, you can configure the Cloud Storage connector to use the rename folder operation for workloads such as Hadoop, Spark, and Hive.
In a bucket without hierarchical namespace, a rename operation in Hadoop, Spark, and Hive involves multiple object copy and delete jobs, which impacts performance and consistency. Renaming a folder using the Cloud Storage connector optimizes performance and ensures consistency when handling folders with a large number of objects.
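For example, the following Hadoop command renames a folder on a gs:// path. The bucket name my-hns-bucket is a placeholder; on a hierarchical namespace bucket with folder operations enabled, the connector can serve this rename as a single folder operation instead of per-object copies and deletes.

hadoop fs -mv gs://my-hns-bucket/data/input gs://my-hns-bucket/data/archive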
Before you begin
To use features of hierarchical namespace buckets, use the following Cloud Storage connector versions:
- 2.2.23 or later (if you are using version 2.x.x)
- 3.0.1 or later (if you are using version 3.x.x)
Older connector versions (3.0.0 and versions older than 2.2.23) have limitations. For more information about the limitations, see Compatibility with Cloud Storage connector version 3.0.0 or versions older than 2.2.23.
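If you don't already have a bucket with hierarchical namespace enabled, you can create one with the Google Cloud CLI. The following command is a sketch: BUCKET_NAME and LOCATION are placeholders, hierarchical namespace buckets require uniform bucket-level access, and the --enable-hierarchical-namespace flag assumes a recent gcloud CLI version.

gcloud storage buckets create gs://BUCKET_NAME --location=LOCATION --uniform-bucket-level-access --enable-hierarchical-namespace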
Enable the Cloud Storage connector on a cluster
This section describes how to enable the Cloud Storage connector on a Dataproc cluster and a self-managed Hadoop cluster.
Dataproc
You can use the Google Cloud CLI to create a Dataproc cluster and enable the Cloud Storage connector to perform the folder operations.
Create a Dataproc cluster using the following command:
gcloud dataproc clusters create CLUSTER_NAME --properties=core:fs.gs.hierarchical.namespace.folders.enable=true,core:fs.gs.http.read-timeout=30000
Where:
- CLUSTER_NAME is the name of the cluster. For example, my-cluster.
- fs.gs.hierarchical.namespace.folders.enable is used to enable the hierarchical namespace on a bucket.
Note: The fs.gs.http.read-timeout setting is the maximum time allowed, in milliseconds, to read data from an established connection. This is an optional setting. If you are using Cloud Storage connector version 3.0.0 or a version older than 2.2.23, fs.gs.hierarchical.namespace.folders.enable is not supported and results in an error if included.
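To confirm that the property is applied on the cluster, you can optionally SSH into a cluster node and query the Hadoop configuration. This check assumes the standard Hadoop client tools available on Dataproc nodes; it should print true when folder operations are enabled.

hdfs getconf -confKey fs.gs.hierarchical.namespace.folders.enable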
Self-managed Hadoop
You can enable the Cloud Storage connector on your self-managed Hadoop cluster to perform the folder operations.
Add the following to the core-site.xml configuration file:
<property>
  <name>fs.gs.hierarchical.namespace.folders.enable</name>
  <value>true</value>
</property>
<property>
  <name>fs.gs.http.read-timeout</name>
  <value>30000</value>
</property>
Where:
- fs.gs.hierarchical.namespace.folders.enable is used to enable the hierarchical namespace on a bucket.
Note: The fs.gs.http.read-timeout setting is the maximum time allowed, in milliseconds, to read data from an established connection. This is an optional setting. If you are using Cloud Storage connector version 3.0.0 or a version older than 2.2.23, fs.gs.hierarchical.namespace.folders.enable is not supported and results in an error if included.
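After updating core-site.xml, you can run a quick smoke test from a cluster node to confirm that folder renames work against your hierarchical namespace bucket. The bucket and paths below are placeholders.

hadoop fs -mkdir -p gs://my-hns-bucket/tmp/rename-test
hadoop fs -mv gs://my-hns-bucket/tmp/rename-test gs://my-hns-bucket/tmp/rename-test-renamed
hadoop fs -ls gs://my-hns-bucket/tmp/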
Compatibility with Cloud Storage connector version 3.0.0 or versions older than 2.2.23
Using Cloud Storage connector version 3.0.0 or a version older than 2.2.23, or disabling folder operations for hierarchical namespace, can lead to the following limitations:
- Inefficient folder renames: Folder rename operations in Hadoop happen using object-level copy and delete operations, which is slower and less efficient than the dedicated rename folder operation.
- Accumulation of empty folders: Folders are not deleted automatically, leading to the accumulation of empty folders in your bucket. Accumulation of empty folders can have the following impact:
  - Increase storage costs if the folders are not deleted explicitly.
  - Slow down list operations and increase the risk of list operation timeouts.
  Note: To reduce the risk of list operation timeouts, configure the fs.gs.http.read-timeout value to 30000 milliseconds. To configure timeout settings, refer to the instructions for Dataproc or Self-managed Hadoop, depending on which one you are using.
- Compatibility issues: Mixing the usage of older and newer connector versions, or enabling and disabling folder operations, can lead to compatibility issues when renaming folders. Consider the following scenario, which uses a combination of connector versions:
- Use a Cloud Storage connector version older than 2.2.23 to perform the following tasks:
  - Write objects under the folder foo/.
  - Rename the folder foo/ to bar/. The rename operation copies and deletes the objects under foo/ but does not delete the empty foo/ folder.
- Use the Cloud Storage connector version 2.2.23, with the folder operations setting enabled, to rename the folder bar/ to foo/.

The connector version 2.2.23, with the folder operation enabled, detects the existing foo/ folder, causing the rename operation to fail. The older connector version did not delete the foo/ folder because the folder operation was disabled.
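The following sketch illustrates this scenario with hadoop fs commands; the bucket name is a placeholder, and the comments describe the expected behavior under each connector version.

# Connector version older than 2.2.23 (folder operations unavailable):
hadoop fs -put data.txt gs://my-hns-bucket/foo/data.txt
hadoop fs -mv gs://my-hns-bucket/foo gs://my-hns-bucket/bar    # objects are copied and deleted, but the empty foo/ folder remains

# Connector version 2.2.23 or later with folder operations enabled:
hadoop fs -mv gs://my-hns-bucket/bar gs://my-hns-bucket/foo    # fails because the leftover foo/ folder already exists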