Troubleshoot cluster creation issues

This document explains common cluster creation error messages and provides tips on troubleshooting cluster creation issues.

Common cluster creation error messages

  • User not authorized to act as service account

    Cause: The principal attempting to create the Dataproc cluster does not have the necessary permissions to use the specified service account. Dataproc users are required to have the service account ActAs permission to deploy Dataproc resources; this permission is included in the Service Account User role (roles/iam.serviceAccountUser) (see Dataproc roles).

    Solution: Identify the user or service account trying to create the Dataproc cluster. Grant that principal the Service Account User role (roles/iam.serviceAccountUser) on the service account the cluster is configured to use (typically, the Dataproc VM service account).
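In practice, the grant can be made with the gcloud CLI. The following is a sketch that assumes placeholder values (VM_SERVICE_ACCOUNT_EMAIL and PRINCIPAL_EMAIL), which you replace with the cluster's VM service account and the principal creating the cluster:

```shell
# Grant the Service Account User role to the principal, scoped to the
# service account the cluster will run as (uppercase names are placeholders).
gcloud iam service-accounts add-iam-policy-binding \
    VM_SERVICE_ACCOUNT_EMAIL \
    --member="user:PRINCIPAL_EMAIL" \
    --role="roles/iam.serviceAccountUser"
```

Granting the role on the individual service account, rather than project-wide, keeps the grant scoped to least privilege.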

  • Operation timed out: Only 0 out of 2 minimum required datanodes/node managers running.

    Cause: The controller node is unable to create the cluster because it cannot communicate with worker nodes.

    Solution:

  • Required compute.subnetworks.use permission for projects/{projectId}/regions/{region}/subnetworks/{subnetwork}

    Cause: This error can occur when you attempt to set up a Dataproc cluster using a VPC network in another project and the Dataproc Service Agent service account does not have the necessary permissions on the Shared VPC project that hosts the network.

    Solution: Follow the steps listed in Create a cluster that uses a VPC network in another project.
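The core of the fix usually amounts to granting the Dataproc Service Agent of the cluster project the Network User role in the Shared VPC host project. A minimal sketch, with HOST_PROJECT_ID and PROJECT_NUMBER as placeholders (the linked page covers the full set of required grants):

```shell
# In the Shared VPC host project, let the Dataproc Service Agent use the
# shared network. PROJECT_NUMBER is the *cluster* project's number.
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com" \
    --role="roles/compute.networkUser"
```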

  • The zone projects/zones/{zone} does not have enough resources available to fulfill the request (resource type: compute)

    Cause: The zone being used to create the cluster does not have sufficient resources.

    Solution:

    • Use the Dataproc Auto Zone placement feature to create the cluster in any of a region's zones with available resources.
    • Create the cluster in a different zone.
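With the gcloud CLI, Auto Zone placement is selected by passing an empty --zone value. CLUSTER_NAME and REGION are placeholders:

```shell
# An empty --zone lets Dataproc Auto Zone placement choose a zone in the
# region that has available resources.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --zone=""
```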
  • Quota Exceeded errors

    Insufficient CPUS/CPUS_ALL_REGIONS quota
    Insufficient 'DISKS_TOTAL_GB' quota
    Insufficient 'IN_USE_ADDRESSES' quota

    Cause: Your CPU, disk, or IP address request exceeds your available quota.

    Solution: Request additional quota from the Google Cloud console.
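Before requesting an increase, you can inspect regional usage against limits from the command line. A sketch, with REGION as a placeholder:

```shell
# List each Compute Engine quota metric for the region with its current
# usage and limit; look for CPUS, DISKS_TOTAL_GB, and IN_USE_ADDRESSES.
gcloud compute regions describe REGION \
    --flatten="quotas[]" \
    --format="table(quotas.metric, quotas.usage, quotas.limit)"
```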

  • Initialization action failed

    Cause: The initialization action provided during cluster creation failed to install.

    Solution:

  • Failed to initialize node CLUSTER-NAME-m. ... See output in: <gs://PATH_TO_STARTUP_SCRIPT_OUTPUT>

    Cause: The Dataproc cluster controller node failed to initialize.

    Solution:

  • Cluster creation failed: IP address space exhausted

    Cause: IP address space needed to provision the requested cluster nodes is unavailable.

    Solution:

    • Create a cluster with fewer worker nodes, but a larger machine type.
    • Create a cluster on a different subnetwork or network.
    • Reduce usage on the network to free IP address space.
    • Wait until sufficient IP space becomes available on the network.
  • Initialization script error message: The repository REPO_NAME no longer has a Release file

    Cause: The Debian oldstable backports repository was purged.

    Solution:

    Add the following code before the code that runs apt-get in your initialization script.

    oldstable=$(curl -s https://deb.debian.org/debian/dists/oldstable/Release | awk '/^Codename/ {print $2}')
    stable=$(curl -s https://deb.debian.org/debian/dists/stable/Release | awk '/^Codename/ {print $2}')

    matched_files="$(grep -rsil '\-backports' /etc/apt/sources.list*)"

    if [[ -n "$matched_files" ]]; then
      for filename in "$matched_files"; do
        grep -e "$oldstable-backports" -e "$stable-backports" "$filename" || \
          sed -i -e 's/^.*-backports.*$//' "$filename"
      done
    fi
  • Timeout waiting for instance DATAPROC_CLUSTER_VM_NAME to report in or Network is unreachable: dataproccontrol-REGION.googleapis.com

    Cause: These error messages indicate that the networking setup of your Dataproc cluster is incomplete: you may be missing the route to the default internet gateway or firewall rules.

    Solution:

    To troubleshoot this issue, you can create the following Connectivity Tests:

    • Create a Connectivity Test between two Dataproc cluster VMs. The outcome of this test will help you understand whether the ingress or egress allow firewall rules of your network apply to the cluster VMs correctly.
    • Create a Connectivity Test between a Dataproc cluster VM and a current Dataproc control API IP address. To get a current Dataproc control API IP address, use the following command:
    dig dataproccontrol-REGION.googleapis.com A

    Use any of the IPv4 addresses in the answer section of the output.

    The outcome of the Connectivity Test will help you understand whether the route to the default internet gateway and the egress allow firewall rules are properly configured.
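A Connectivity Test of this kind can also be created from the command line. The following is a sketch with placeholder values (PROJECT_ID, ZONE, CLUSTER_NAME, and the IP address returned by dig):

```shell
# Test the path from the cluster controller VM to a Dataproc control API
# address on port 443. All uppercase names are placeholders.
gcloud network-management connectivity-tests create dataproc-control-test \
    --source-instance=projects/PROJECT_ID/zones/ZONE/instances/CLUSTER_NAME-m \
    --destination-ip-address=DATAPROC_CONTROL_API_IP \
    --destination-port=443 \
    --protocol=TCP
```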

    Based on the outcomes of the Connectivity Tests:

  • Error due to update

    Cause: The cluster accepted a job submitted to the Dataproc service, but was unable to scale up or down manually or through autoscaling. This error can also be caused by a non-standard cluster configuration.

    Solution:

    • Cluster reset: Open a support ticket, include a diagnostic tar file, and ask for the cluster to be reset to a RUNNING state.

    • New cluster: Recreate the cluster with the same configuration. This solution can be faster than a support-provided reset.

Cluster troubleshooting tips

This section provides additional guidance on troubleshooting common issues that can prevent the creation of Dataproc clusters.

When a Dataproc cluster fails to provision, it often produces a generic error message or reports a PENDING or PROVISIONING status before failing. The key to diagnosing and solving cluster failure issues is to examine cluster logs and assess common failure points.

Common symptoms

The following are common symptoms associated with cluster creation failures:

  • Cluster status remains PENDING or PROVISIONING for an extended period.
  • Cluster transitions to the ERROR state.
  • Generic API errors during cluster creation, such as Operation timed out.
  • Logged or API response error messages, such as:

    • RESOURCE_EXHAUSTED: related to CPU, disk, or IP address quotas
    • Instance failed to start
    • Permission denied
    • Unable to connect to service_name.googleapis.com or Could not reach required Google APIs
    • Connection refused or network unreachable
    • Errors related to initialization actions failing, such as script execution errors and file not found.

Review cluster logs

An important initial step when diagnosing cluster creation failures is reviewing the detailed cluster logs available in Cloud Logging.

  1. Go to Logs Explorer: Open the Logs Explorer in the Google Cloud console.
  2. Filter for Dataproc clusters:
    • In the Resource drop-down, select Cloud Dataproc Cluster.
    • Enter your cluster_name and project_id. You can also filter by location (region).
  3. Examine Log Entries:
    • Look for ERROR or WARNING level messages that occur close to the time of the cluster creation failure.
    • Pay attention to logs from master-startup, worker-startup, and agent components for insights into VM-level or Dataproc agent issues.
    • For insight into VM boot-time issues, filter logs by resource.type="gce_instance", and look for messages from the instance names associated with your cluster nodes, such as CLUSTER_NAME-m or CLUSTER_NAME-w-0. Serial console logs can reveal network configuration issues, disk problems, and script failures that occur early in the VM lifecycle.
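The same filtering can be done from the command line with gcloud. A sketch, assuming placeholder values for CLUSTER_NAME and PROJECT_ID:

```shell
# Fetch recent error-level log entries for the cluster from Cloud Logging.
gcloud logging read \
    'resource.type="cloud_dataproc_cluster" AND
     resource.labels.cluster_name="CLUSTER_NAME" AND
     severity>=ERROR' \
    --project=PROJECT_ID \
    --limit=50 \
    --format="table(timestamp, severity, textPayload)"
```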

Common cluster failure causes and troubleshooting tips

This section outlines common reasons why Dataproc cluster creation might fail and provides tips to help you resolve them.

Insufficient IAM permissions

The VM service account that your Dataproc cluster uses must have appropriate IAM roles to provision Compute Engine instances, access Cloud Storage buckets, write logs, and interact with other Google Cloud services.

  • Required Worker role: Verify that the VM service account has the Dataproc Worker role (roles/dataproc.worker). This role has the minimum permissions required for Dataproc to manage cluster resources.
  • Data access permissions: If your jobs read from or write to Cloud Storage or BigQuery, the service account needs related roles, such as Storage Object Viewer, Storage Object Creator, or Storage Object Admin for Cloud Storage, or BigQuery Data Viewer or BigQuery Editor for BigQuery.
  • Logging permissions: The service account must have a role with permissions needed to write logs to Cloud Logging, such as the Logging Writer role.

Troubleshooting tips:

  • Identify service account: Determine the VM service account that your cluster is configured to use. If not specified, the default is the Compute Engine default service account.

  • Verify IAM roles: Go to the IAM & Admin > IAM page in the Google Cloud console, find the cluster VM service account, and then verify that it has the roles needed for cluster operations. Grant any missing roles.
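Granting a missing Worker role can also be scripted. A sketch with placeholder values (PROJECT_ID, VM_SERVICE_ACCOUNT_EMAIL):

```shell
# Grant the minimum role Dataproc needs on its VM service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:VM_SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataproc.worker"
```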

Resource quotas exceeded

Dataproc clusters consume resources from Compute Engine and other Google Cloud services. Exceeding project or regional quotas can cause cluster creation failures.

Troubleshooting tips:

  • Review quotas: Go to the IAM & Admin > Quotas page in the Google Cloud console. Filter by "Service" for "Compute Engine API" and "Dataproc API."
  • Check usage vs. limit: Identify any quotas that are at or near their limits.
  • If necessary, request a quota increase.

Network configuration issues

Network configuration issues, such as incorrect VPC network, subnet, firewall, or DNS configuration, are a common cause of cluster creation failures. Cluster instances must be able to communicate with each other and with Google APIs.

  • VPC network and subnet:
    • Verify that the cluster VPC network and subnet exist and areconfigured correctly.
    • Verify that the subnet has a sufficient range of available IP addresses.
  • Private Google Access (PGA): If cluster VMs have internal IP addresses and need to reach Google APIs for Cloud Storage, Cloud Logging, and other operations, verify that Private Google Access is enabled on the subnet. By default, Dataproc clusters created with 2.2+ image versions provision VMs with internal-only IP addresses and Private Google Access enabled on the cluster regional subnet.
  • Private Service Connect (PSC): If you are using Private Service Connect to access Google APIs, verify that the necessary Private Service Connect endpoints are correctly configured for the Google APIs that Dataproc depends on, such as dataproc.googleapis.com, storage.googleapis.com, compute.googleapis.com, and logging.googleapis.com. DNS entries for the APIs must resolve to private IP addresses. Note that using Private Service Connect does not eliminate the need to use VPC peering to communicate with other customer-managed VPC networks. For detailed Private Service Connect network troubleshooting, see Dataproc cluster networking with Private Service Connect.
  • VPC Peering: If your cluster communicates with resources in other VPC networks, such as shared VPC host projects or other customer VPCs, verify that VPC peering is correctly configured and routes are propagating.
  • Firewall rules:

    • Default rules: Verify that default firewall rules, such as allow-internal or allow-ssh, are not overly restrictive.
    • Custom rules: If custom firewall rules are in place, verify that theyallow needed communication paths:

      • Internal communication within the cluster (between -m and -w nodes).
      • Outbound traffic from cluster VMs to Google APIs, using public IPs and an internet gateway, Private Google Access, or Private Service Connect endpoints.

      • Traffic to any external data sources or services that your jobs depend on.

  • DNS resolution: Confirm that cluster instances can correctly resolve DNSnames for Google APIs and any internal or external services.
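To check and, if needed, enable Private Google Access on the cluster subnet, the following gcloud sketch can help (SUBNET_NAME and REGION are placeholders):

```shell
# Print True or False depending on whether PGA is enabled on the subnet.
gcloud compute networks subnets describe SUBNET_NAME \
    --region=REGION \
    --format="get(privateIpGoogleAccess)"

# Enable PGA if the previous command printed False.
gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-private-ip-google-access
```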

Troubleshooting tips:

  • Review network configuration: Inspect the VPC network and subnetsettings where the cluster is being deployed.
  • Check firewall rules: Review firewall rules in the VPC networkor shared VPC host project.
  • Test connectivity: Launch a temporary Compute Engine VM in the cluster subnet and perform the following checks:
    • ping or curl external Google API domains, such as storage.googleapis.com.
    • nslookup to verify DNS resolution to expected IP addresses (Private Google Access or Private Service Connect).
    • Run Google Cloud connectivity tests to diagnose paths from a test VM to relevant endpoints.
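From the temporary VM, the checks above might look like the following (diagnostic commands only; the exact targets depend on your setup):

```shell
# Verify DNS resolution for a Google API endpoint. With Private Google
# Access or Private Service Connect, expect private-range addresses.
nslookup storage.googleapis.com

# Verify HTTPS egress; printing an HTTP status code means the path works.
curl -s -o /dev/null -w "%{http_code}\n" https://storage.googleapis.com
```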

Initialization action failures

Dataproc initialization actions are scripts that run on cluster VMs during cluster creation. Errors in these scripts can prevent cluster startup.

Troubleshooting tips:

  • Examine logs for initialization action errors: Look for log entries related to init-actions or startup-script for the cluster instances in Cloud Logging.
  • Check script paths and permissions: Verify that initialization action scripts are correctly located in Cloud Storage and that the cluster VM service account has the Storage Object Viewer role needed to read the scripts.
  • Debug script logic: Test script logic on a separate Compute Engine VM that mimics the cluster environment to identify errors. Add verbose logging to the script.
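A common pattern for verbose logging (an assumed sketch, not an official template) is to start the initialization script with strict error handling and full command tracing, so failures are easy to locate in the captured output:

```shell
#!/bin/bash
# Exit on errors and unset variables; trace every command (-x).
set -euxo pipefail

# Mirror all output to a log file on the VM for later inspection.
LOG_FILE="/var/log/dataproc-init-action.log"
exec > >(tee -a "${LOG_FILE}") 2>&1

echo "Initialization action started on $(hostname)"
# ... package installs, Cloud Storage copies, and other setup go here ...
```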

Regional resource availability (stockouts)

Occasionally, a machine type or resource in a region or zone experiences temporary unavailability (a stockout). Typically, this results in RESOURCE_EXHAUSTED errors unrelated to project quota issues.

Troubleshooting tips:

  • Try a different zone or region: Attempt to create the cluster in a differentzone within the same region, or in a different region.
  • Use Auto Zone placement: Use the Dataproc Auto Zone placement feature to automatically select a zone with capacity.
  • Adjust machine type: If using a custom or specialized machine type,try a standard machine type to see if that resolves the issue.

Contact Cloud Customer Care

If you continue to experience cluster failure issues, contact Cloud Customer Care. Describe the cluster failure issue and the troubleshooting steps taken. Additionally, provide the following information:

  • Cluster diagnostic data
  • Output from the following command:
      gcloud dataproc clusters describe CLUSTER_NAME \
          --region=REGION
  • Exported logs for the failed cluster.
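The diagnostic data itself can be generated with the clusters diagnose command, which writes a tar archive of logs and configuration to Cloud Storage (CLUSTER_NAME and REGION are placeholders):

```shell
# Collect cluster diagnostic data into a tar archive in Cloud Storage.
gcloud dataproc clusters diagnose CLUSTER_NAME \
    --region=REGION
```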

Use the gcpdiag tool

gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.

The gcpdiag tool helps you discover the following Dataproc cluster creation issues by performing the following checks:

  • Stockout errors: Evaluates Logs Explorer logs to discover stockouts in regions and zones.
  • Insufficient quota: Checks quota availability in the Dataproc cluster project.
  • Incomplete network configuration: Performs network connectivity tests, including checks for necessary firewall rules and external and internal IP configuration. If the cluster has been deleted, the gcpdiag tool cannot perform a network connectivity check.
  • Incorrect cross-project configuration: Checks for cross-project service accounts and reviews additional roles and organization policy enforcement.
  • Missing shared Virtual Private Cloud network IAM roles: If the Dataproc cluster uses a Shared VPC network, checks for the addition of required service account roles.
  • Initialization action failures: Evaluates Logs Explorer logs to discover initialization action script failures and timeouts.

For a list of gcpdiag cluster-creation steps, see Potential steps.

Run the gcpdiag command

You can run the gcpdiag command from Cloud Shell in the Google Cloud console or within a Docker container.

Google Cloud console

  1. Complete and then copy the following command.

      gcpdiag runbook dataproc/cluster-creation \
          --parameter project_id=PROJECT_ID \
          --parameter cluster_name=CLUSTER_NAME \
          --parameter OPTIONAL_FLAGS

  2. Open the Google Cloud console and activate Cloud Shell.
  3. Paste the copied command.
  4. Run the gcpdiag command, which downloads the gcpdiag Docker image, and then performs diagnostic checks. If applicable, follow the output instructions to fix failed checks.

Docker

You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.

  1. Copy and run the following command on your local workstation.
    curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag && chmod +x gcpdiag
  2. Execute thegcpdiag command.
    ./gcpdiag runbook dataproc/cluster-creation \
        --parameter project_id=PROJECT_ID \
        --parameter cluster_name=CLUSTER_NAME \
        --parameter OPTIONAL_FLAGS

View available parameters for this runbook.

Replace the following:

    • PROJECT_ID: The ID of the project containing the resource
    • CLUSTER_NAME: The name of the target Dataproc cluster in your project
    • OPTIONAL_FLAGS: Add one or more of the following optional parameters. These parameters are required if the cluster has been deleted.
      • cluster_uuid: The UUID of the target Dataproc cluster in your project
      • service_account: The Dataproc cluster VM service account
      • subnetwork: The Dataproc cluster subnetwork full URI path
      • internal_ip_only: True or False
      • cross_project: The cross-project ID if the Dataproc cluster uses a VM service account in another project

Useful flags:

For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.


Last updated 2026-02-19 UTC.