Troubleshoot cluster creation issues
This document explains common cluster creation error messages and provides tips on troubleshooting cluster creation issues.
Common cluster creation error messages
User not authorized to act as service account
Cause: The principal attempting to create the Dataproc cluster does not have the necessary permissions to use the specified service account. Dataproc users are required to have the service account ActAs permission to deploy Dataproc resources; this permission is included in the Service Account User role (roles/iam.serviceAccountUser) (see Dataproc roles).

Solution: Identify the user or service account trying to create the Dataproc cluster. Grant that principal the Service Account User role (roles/iam.serviceAccountUser) on the service account the cluster is configured to use (typically, the Dataproc VM service account).
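For example, here is a minimal sketch of granting the role with the gcloud CLI (SA_EMAIL and PRINCIPAL_EMAIL are placeholders for your service account and principal):

# Grant the Service Account User role on the cluster's VM service account.
# SA_EMAIL and PRINCIPAL_EMAIL are placeholder values.
gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
    --member="user:PRINCIPAL_EMAIL" \
    --role="roles/iam.serviceAccountUser"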
Operation timed out: Only 0 out of 2 minimum required datanodes/node managers running.

Cause: The controller node is unable to create the cluster because it cannot communicate with worker nodes.
Solution:
- Check firewall rule warnings.
- Make sure the correct firewall rules are in place. For more information, see Overview of the default Dataproc firewall rules.
- Perform a connectivity test in the Google Cloud console to determine what is blocking communication between the controller and worker nodes.
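As a starting point, you can list the firewall rules that apply to the cluster's network with the gcloud CLI; this is a sketch, with NETWORK_NAME as a placeholder:

# List the firewall rules attached to the cluster's VPC network.
gcloud compute firewall-rules list \
    --filter="network:NETWORK_NAME" \
    --format="table(name,direction,sourceRanges.list(),allowed)"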
Required compute.subnetworks.use permission for projects/{projectId}/regions/{region}/subnetworks/{subnetwork}

Cause: This error can occur when you attempt to set up a Dataproc cluster using a VPC network in another project and the Dataproc Service Agent service account does not have the necessary permissions on the Shared VPC project that is hosting the network.
Solution: Follow the steps listed in Create a cluster that uses a VPC network in another project.
The zone projects/zones/{zone} does not have enough resources available to fulfill the request (resource type: compute)

Cause: The zone being used to create the cluster does not have sufficient resources.
Solution:
- Use the Dataproc Auto Zone placement feature to create the cluster in any of a region's zones with available resources.
- Create the cluster in a different zone.
Quota Exceeded errors
Insufficient CPUS/CPUS_ALL_REGIONS quota
Insufficient 'DISKS_TOTAL_GB' quota
Insufficient 'IN_USE_ADDRESSES' quota

Cause: Your CPU, disk, or IP address request exceeds your available quota.
Solution: Request additional quota from the Google Cloud console.
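To see which regional quota is exhausted before filing the request, you can inspect usage against limits with the gcloud CLI; a sketch, with REGION as a placeholder:

# Show Compute Engine quota usage and limits for a region.
gcloud compute regions describe REGION \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.usage,quotas.limit)"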
Initialization action failed
Cause: The initialization action provided during cluster creation failed to install.
Solution:
- See initialization actions considerations and guidelines.
- Examine the output logs. The error message should provide a link to the logs in Cloud Storage.
Failed to initialize node CLUSTER-NAME-m. ... See output in: <gs://PATH_TO_STARTUP_SCRIPT_OUTPUT>

Cause: The Dataproc cluster controller node failed to initialize.

Solution:
- Review the startup script output logs listed in the error message (gs://PATH_TO_STARTUP_SCRIPT_OUTPUT) to verify the cause of the failed node initialization. Causes can include Dataproc cluster network configuration issues and failed installation of Python package dependencies.
- If the issue is not resolved after you review the startup-script logs, fix any user-side issues, and then retry with exponential backoff. If the failure persists, contact Google Cloud support.
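To read the startup output without leaving the terminal, you can fetch the object that the error message points to; a sketch that assumes the gs:// path reported in the error:

# Print the startup script output referenced in the error message.
gsutil cat gs://PATH_TO_STARTUP_SCRIPT_OUTPUT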
Cluster creation failed: IP address space exhausted
Cause: IP address space needed to provision the requested cluster nodes is unavailable.
Solution:
- Create a cluster with fewer worker nodes, but a larger machine type.
- Create a cluster on a different subnetwork or network.
- Reduce usage on the network to free IP address space.
- Wait until sufficient IP space becomes available on the network.
Initialization script error message: The repository REPO_NAME no longer has a Release file
Cause: The Debian oldstable backports repository was purged.
Solution:
Add the following code before the code that runs apt-get in your initialization script:

oldstable=$(curl -s https://deb.debian.org/debian/dists/oldstable/Release | awk '/^Codename/ {print $2}')
stable=$(curl -s https://deb.debian.org/debian/dists/stable/Release | awk '/^Codename/ {print $2}')

matched_files="$(grep -rsil '\-backports' /etc/apt/sources.list*)"

if [[ -n "$matched_files" ]]; then
  # Iterate unquoted so each matched filename is processed individually.
  for filename in $matched_files; do
    grep -e "$oldstable-backports" -e "$stable-backports" "$filename" || \
      sed -i -e 's/^.*-backports.*$//' "$filename"
  done
fi
Timeout waiting for instance DATAPROC_CLUSTER_VM_NAME to report in or Network is unreachable: dataproccontrol-REGION.googleapis.com

Cause: These error messages indicate that the networking setup of your Dataproc cluster is incomplete: you may be missing the route to the default internet gateway or firewall rules.
Solution:
To troubleshoot this issue, you can create the following Connectivity Tests:
- Create a Connectivity Test between two Dataproc cluster VMs. The outcome of this test will help you understand whether the ingress or egress allow firewall rules of your network apply to the cluster VMs correctly.
- Create a Connectivity Test between a Dataproc cluster VM and a current Dataproc control API IP address. To get a current Dataproc control API IP address, use the following command:
dig dataproccontrol-REGION.googleapis.com A
Use any of the IPv4 addresses in the answer section of the output.
The outcome of the Connectivity Test will help you understand whether the route to the default internet gateway and the egress allow firewall are properly configured.
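As a sketch, the second test can also be created from the gcloud CLI; the test name is arbitrary, and PROJECT_ID, ZONE, CLUSTER_NAME, and IP_ADDRESS (an address from the dig output) are placeholders:

# Test the path from the cluster controller VM to the Dataproc control API.
gcloud network-management connectivity-tests create dataproc-control-test \
    --source-instance=projects/PROJECT_ID/zones/ZONE/instances/CLUSTER_NAME-m \
    --destination-ip-address=IP_ADDRESS \
    --destination-port=443 \
    --protocol=TCP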
Based on the outcomes of the Connectivity Tests:
- Add a route to the internet to your cluster VPC network: 0.0.0.0/0 for IPv4 and ::/0 for IPv6 with --next-hop-gateway=default-internet-gateway.
- Add firewall rules for access control.
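For the first item, a sketch of adding the default IPv4 route with the gcloud CLI (ROUTE_NAME and NETWORK_NAME are placeholders):

# Route all IPv4 internet-bound traffic through the default internet gateway.
gcloud compute routes create ROUTE_NAME \
    --network=NETWORK_NAME \
    --destination-range=0.0.0.0/0 \
    --next-hop-gateway=default-internet-gateway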
Error due to update
Cause: The cluster accepted a job submitted to the Dataproc service, but was unable to scale up or down manually or through autoscaling. This error can also be caused by a non-standard cluster configuration.
Solution:
Cluster reset: Open a support ticket, include a diagnostic tar file, and ask for the cluster to be reset to a RUNNING state.
New cluster: Recreate the cluster with the same configuration. This solution can be faster than a support-provided reset.
Cluster troubleshooting tips
This section provides additional guidance on troubleshooting common issues that can prevent the creation of Dataproc clusters.
When a Dataproc cluster fails to provision, it often produces a generic error message or reports a PENDING or PROVISIONING status before failing. The key to diagnosing and solving cluster failure issues is to examine cluster logs and assess common failure points.
Common symptoms
The following are common symptoms associated with cluster creation failures:
- Cluster status remains PENDING or PROVISIONING for an extended period.
- Cluster transitions to an ERROR state.
- Generic API errors during cluster creation, such as Operation timed out.
- Logged or API response error messages, such as:
  - RESOURCE_EXHAUSTED: related to CPU, disk, or IP address quotas
  - Instance failed to start
  - Permission denied
  - Unable to connect to service_name.googleapis.com or Could not reach required Google APIs
  - Connection refused or network unreachable
- Errors related to initialization actions failing, such as script execution errors and file-not-found errors.
Review cluster logs
An important initial step when diagnosing cluster creation failures is reviewing the detailed cluster logs available in Cloud Logging.
- Go to Logs Explorer: Open the Logs Explorer in the Google Cloud console.
- Filter for Dataproc clusters:
  - In the Resource drop-down, select Cloud Dataproc Cluster.
  - Enter your cluster_name and project_id. You can also filter by location (region).
- Examine log entries:
  - Look for ERROR or WARNING level messages that occur close to the time of the cluster creation failure.
  - Pay attention to logs from the master-startup, worker-startup, and agent components for insights into VM-level or Dataproc agent issues.
  - For insight into VM boot-time issues, filter logs by resource.type="gce_instance", and look for messages from the instance names associated with your cluster nodes, such as CLUSTER_NAME-m or CLUSTER_NAME-w-0. Serial console logs can reveal network configuration issues, disk problems, and script failures that occur early in the VM lifecycle.
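You can also query the same logs from the command line; a sketch, with CLUSTER_NAME and PROJECT_ID as placeholders:

# Pull recent ERROR-level log entries for the cluster from Cloud Logging.
gcloud logging read \
    'resource.type="cloud_dataproc_cluster" AND resource.labels.cluster_name="CLUSTER_NAME" AND severity>=ERROR' \
    --project=PROJECT_ID \
    --limit=50 \
    --format="table(timestamp,severity,textPayload)"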
Common cluster failure causes and troubleshooting tips
This section outlines common reasons why Dataproc cluster creation might fail and provides tips to help you troubleshoot them.
Insufficient IAM permissions
The VM service account that your Dataproc cluster uses must have appropriate IAM roles to provision Compute Engine instances, access Cloud Storage buckets, write logs, and interact with other Google Cloud services.
- Required Worker role: Verify that the VM service account has the Dataproc Worker role (roles/dataproc.worker). This role has the minimum permissions required for Dataproc to manage cluster resources.
- Data access permissions: If your jobs read from or write to Cloud Storage or BigQuery, the service account needs related roles, such as Storage Object Viewer, Storage Object Creator, or Storage Object Admin for Cloud Storage, or BigQuery Data Viewer or BigQuery Editor for BigQuery.
- Logging permissions: The service account must have a role with permissions needed to write logs to Cloud Logging, such as the Logging Writer role.
Troubleshooting tips:
Identify service account: Determine the VM service account that your cluster is configured to use. If not specified, the default is the Compute Engine default service account.
Verify IAM roles: Go to the IAM & Admin > IAM page in the Google Cloud console, find the cluster VM service account, and then verify that it has the roles needed for cluster operations. Grant any missing roles, as shown in the sketch that follows.
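A sketch of granting the Worker role from the gcloud CLI (PROJECT_ID and SA_EMAIL are placeholders):

# Grant the Dataproc Worker role to the cluster VM service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SA_EMAIL" \
    --role="roles/dataproc.worker"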
Resource quotas exceeded
Dataproc clusters consume resources from Compute Engine andother Google Cloud services. Exceeding project or regional quotas can causecluster creation failures.
- Common Dataproc quotas to check:
  - CPUs (regional)
  - DISKS_TOTAL_GB (regional)
  - IN_USE_ADDRESSES (regional for internal IPs, global for external IPs)
  - Dataproc API quotas, such as ClusterOperationRequestsPerMinutePerProjectPerRegion

To compare Dataproc quotas with Serverless for Apache Spark quotas, see Serverless for Apache Spark quotas.
Troubleshooting tips:
- Review quotas: Go to the IAM & Admin > Quotas page in the Google Cloud console. Filter by "Service" for "Compute Engine API" and "Dataproc API."
- Check usage vs. limit: Identify any quotas that are at or near their limits.
- If necessary, request a quota increase.
Network configuration issues
Network configuration issues, such as incorrect VPC network, subnet, firewall, or DNS configuration, are a common cause of cluster creation failures. Cluster instances must be able to communicate with each other and with Google APIs.
- VPC network and subnet:
- Verify that the cluster VPC network and subnet exist and are configured correctly.
- Verify that the subnet has a sufficient range of available IP addresses.
- Private Google Access (PGA): If cluster VMs have internal IP addresses and need to reach Google APIs for Cloud Storage, Cloud Logging, and other operations, verify that Private Google Access is enabled on the subnet. By default, Dataproc clusters created with 2.2+ image versions provision VMs with internal-only IP addresses and Private Google Access enabled on the cluster regional subnet.
- Private Service Connect (PSC): If you are using Private Service Connect to access Google APIs, verify that the necessary Private Service Connect endpoints are correctly configured for the Google APIs that Dataproc depends on, such as dataproc.googleapis.com, storage.googleapis.com, compute.googleapis.com, and logging.googleapis.com. DNS entries for the APIs must resolve to private IP addresses. Note that using Private Service Connect does not eliminate the need to use VPC peering to communicate with other customer-managed VPC networks. For detailed Private Service Connect network troubleshooting, see Dataproc cluster networking with Private Service Connect.
- VPC Peering: If your cluster communicates with resources in other VPC networks, such as Shared VPC host projects or other customer VPCs, verify that VPC peering is correctly configured and routes are propagating.
- Firewall rules:
  - Default rules: Verify that default firewall rules, such as allow-internal or allow-ssh, are not overly restrictive.
  - Custom rules: If custom firewall rules are in place, verify that they allow needed communication paths:
    - Internal communication within the cluster (between -m and -w nodes).
    - Outbound traffic from cluster VMs to Google APIs, using public IPs and an internet gateway, Private Google Access, or Private Service Connect endpoints.
    - Traffic to any external data sources or services that your jobs depend on.
- DNS resolution: Confirm that cluster instances can correctly resolve DNS names for Google APIs and any internal or external services.
Troubleshooting tips:
- Review network configuration: Inspect the VPC network and subnet settings where the cluster is being deployed.
- Check firewall rules: Review firewall rules in the VPC networkor shared VPC host project.
- Test connectivity: Launch a temporary Compute Engine VM in the cluster subnet, and then run the following checks (see the sketch after this list):
  - ping or curl external Google API domains, such as storage.googleapis.com.
  - Use nslookup to verify DNS resolution to expected IP addresses (Private Google Access or Private Service Connect).
  - Run Google Cloud connectivity tests to diagnose paths from a test VM to relevant endpoints.
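A sketch of these checks from Cloud Shell and a test VM (SUBNET_NAME and REGION are placeholders):

# Confirm that Private Google Access is enabled on the cluster subnet.
gcloud compute networks subnets describe SUBNET_NAME \
    --region=REGION \
    --format="get(privateIpGoogleAccess)"

# From the test VM: verify DNS resolution and reachability of a Google API.
nslookup storage.googleapis.com
curl -sI https://storage.googleapis.com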
Initialization action failures
Dataproc initialization actions are scripts that run on cluster VMs during cluster creation. Errors in these scripts can prevent cluster startup.
Troubleshooting tips:
- Examine logs for initialization action errors: Look for log entries related to init-actions or startup-script for the cluster instances in Cloud Logging.
- Check script paths and permissions: Verify that initialization action scripts are correctly located in Cloud Storage and that the cluster VM service account has the Storage Object Viewer role needed to read Cloud Storage scripts (see the sketch after this list).
- Debug script logic: Test script logic on a separate Compute Engine VM that mimics the cluster environment to identify errors. Add verbose logging to the script.
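For the path and permission checks, a sketch using gsutil (the bucket and script names are placeholders):

# Confirm the initialization action script exists in Cloud Storage.
gsutil ls gs://BUCKET_NAME/init-action.sh

# Review the bucket's IAM policy to confirm the VM service account can read it.
gsutil iam get gs://BUCKET_NAME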
Regional resource availability (stockouts)
Occasionally, a machine type or resource in a region or zone experiences temporary unavailability (stockout). Typically, this results in RESOURCE_EXHAUSTED errors unrelated to project quota issues.
Troubleshooting tips:
- Try a different zone or region: Attempt to create the cluster in a different zone within the same region, or in a different region.
- Use Auto Zone placement: Use the Dataproc Auto Zone placement feature to automatically select a zone with capacity.
- Adjust machine type: If using a custom or specialized machine type,try a standard machine type to see if that resolves the issue.
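For example, a sketch of creating a cluster with Auto Zone placement by omitting the --zone flag (CLUSTER_NAME and REGION are placeholders):

# Omitting --zone lets Dataproc pick a zone in the region with capacity.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION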
Contact Cloud Customer Care
If you continue to experience cluster failure issues, contact Cloud Customer Care. Describe the cluster failure issue and the troubleshooting steps taken. Additionally, provide the following information:
- Cluster diagnostic data
- Output from the following command:
gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION
- Exported logs for the failed cluster.
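You can gather the cluster diagnostic data with the gcloud CLI; the command writes a diagnostic tarball to Cloud Storage and prints its location:

# Collect diagnostic data for the cluster to attach to the support case.
gcloud dataproc clusters diagnose CLUSTER_NAME \
    --region=REGION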
Use the gcpdiag tool
gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.
The gcpdiag tool helps you discover the following Dataproc cluster creation issues by performing the following checks:
- Stockout errors: Evaluates Logs Explorer logs to discover stockouts in regions and zones.
- Insufficient quota: Checks quota availability in the Dataproc cluster project.
- Incomplete network configuration: Performs network connectivity tests, including checks for necessary firewall rules and external and internal IP configuration. If the cluster has been deleted, the gcpdiag tool cannot perform a network connectivity check.
- Incorrect cross-project configuration: Checks for cross-project service accounts and reviews additional roles and organization policies enforcement.
- Missing shared Virtual Private Cloud network IAM roles: If the Dataproc cluster uses a Shared VPC network, checks for the addition of required service account roles.
- Initialization action failures: Evaluates Logs Explorer logs to discover initialization action script failures and timeouts.
For a list of gcpdiag cluster-creation steps, see Potential steps.
Run the gcpdiag command
You can run the gcpdiag command from Cloud Shell in the Google Cloud console or within a Docker container.
Google Cloud console
- Complete and then copy the following command.
- Open the Google Cloud console and activate Cloud Shell.
- Paste the copied command.
- Run the gcpdiag command, which downloads the gcpdiag Docker image, and then performs diagnostic checks. If applicable, follow the output instructions to fix failed checks.

gcpdiag runbook dataproc/cluster-creation \
    --parameter project_id=PROJECT_ID \
    --parameter cluster_name=CLUSTER_NAME \
    --parameter OPTIONAL_FLAGS

Docker
You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.
- Copy and run the following command on your local workstation.
curl https://gcpdiag.dev/gcpdiag.sh > gcpdiag && chmod +x gcpdiag
- Execute the gcpdiag command:

./gcpdiag runbook dataproc/cluster-creation \
    --parameter project_id=PROJECT_ID \
    --parameter cluster_name=CLUSTER_NAME \
    --parameter OPTIONAL_FLAGS
View available parameters for this runbook.
Replace the following:
- PROJECT_ID: The ID of the project containing the resource
- CLUSTER_NAME: The name of the target Dataproc cluster in your project
- OPTIONAL_PARAMETERS: Add one or more of the following optional parameters. These parameters are required if the cluster has been deleted.
  - cluster_uuid: The UUID of the target Dataproc cluster in your project
  - service_account: The Dataproc cluster VM service account
  - subnetwork: The Dataproc cluster subnetwork full URI path
  - internal_ip_only: True or False
  - cross_project: The cross-project ID if the Dataproc cluster uses a VM service account in another project
Useful flags:
- --universe-domain: If applicable, the Trusted Partner Sovereign Cloud domain hosting the resource
- --parameter or -p: Runbook parameters
For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.
What's next
- Learn more about the Dataproc monitoring and troubleshooting tools.
- Learn how to diagnose Dataproc clusters.
- Refer to the Dataproc FAQ document.