Troubleshoot batch and session creation failures

This document provides guidance on troubleshooting common issues that prevent Google Cloud Serverless for Apache Spark batch workloads and interactive sessions from starting.

Overview

Typically, when a batch or session fails to start, it reports the following error message:

Driver compute node failed to initialize for batch in 600 seconds

This error message indicates that the Spark driver couldn't start within the default timeout period of 600 seconds (10 minutes). Common causes are related to service account permissions, resource availability, network configuration, or Spark properties.

Batch and session start failure causes and troubleshooting steps

The following sections list common causes of batch and session start failures with troubleshooting tips to help you resolve the issues.

Insufficient service account permissions

The service account used by your Serverless for Apache Spark batch or session requires specific IAM roles that include permissions for Serverless for Apache Spark operation and access to Google Cloud resources. If the service account lacks the necessary roles, the Spark driver for the batch or session can fail to initialize.

  • Required Worker role: The batch or session service account must have the Dataproc Worker role (roles/dataproc.worker). This role contains the minimum permissions needed for Serverless for Apache Spark to provision and manage compute resources.
  • Data Access Permissions: If your Spark application reads from or writes to Cloud Storage or BigQuery, the service account needs roles related to those services:
    • Cloud Storage: The Storage Object Viewer role (roles/storage.objectViewer) is needed for reading, and the Storage Object Creator role (roles/storage.objectCreator) or Storage Object Admin role (roles/storage.objectAdmin) is needed for writing.
    • BigQuery: The BigQuery Data Viewer role (roles/bigquery.dataViewer) is needed for reading, and the BigQuery Data Editor role (roles/bigquery.dataEditor) is needed for writing.
  • Logging Permissions: The service account needs a role with permission to write logs to Cloud Logging. Typically, the Logging Writer role (roles/logging.logWriter) is sufficient.

Troubleshooting tips:
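
  • Identify the service account your workload uses. By default, Serverless for Apache Spark uses the Compute Engine default service account unless you specify a custom service account when you submit the batch or create the session.
  • Verify that the account has the roles listed above, and grant any that are missing. The following commands are a minimal sketch; SERVICE_ACCOUNT_EMAIL and PROJECT_ID are placeholders:

      # List the roles currently granted to the service account.
      gcloud projects get-iam-policy PROJECT_ID \
          --flatten="bindings[].members" \
          --filter="bindings.members:serviceAccount:SERVICE_ACCOUNT_EMAIL" \
          --format="table(bindings.role)"

      # Grant the minimum role that Serverless for Apache Spark requires.
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
          --role="roles/dataproc.worker"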

Insufficient quota

Exceeding project or region-specific quotas for Google Cloud Serverless for Apache Spark or other Google Cloud resources can prevent new batches or sessions from starting.

Troubleshooting tips:

  • Review the Google Cloud Serverless for Apache Spark quotas page to understand limits on concurrent batches, DCUs, and shuffle storage.

    • You can also use the gcloud compute regions describe command to view current Compute Engine quota usage and limits for the region where your workloads run:
      gcloud compute regions describe REGION --project=PROJECT_ID --flatten="quotas" --format="table(quotas.metric,quotas.usage,quotas.limit)"
      Serverless for Apache Spark-specific quotas, such as DCUs, are listed on the IAM & Admin > Quotas page in the Google Cloud console; filter by the Dataproc API service.
  • If you repeatedly hit quota limits, consider requesting a quota increase through the Google Cloud console.
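
If one region's quota is exhausted, a common workaround is to run the workload in a region that has available quota. The following command is a minimal sketch that assumes a PySpark file in Cloud Storage; gs://BUCKET/app.py, PROJECT_ID, and the region shown are placeholders:

      # Submit the same workload to a different region with spare quota.
      gcloud dataproc batches submit pyspark gs://BUCKET/app.py \
          --project=PROJECT_ID \
          --region=us-east1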

Network configuration issues

Incorrect network settings, such as VPC configuration, Private Google Access, or firewall rules, can block the Spark driver from initializing or connecting to necessary services.

Troubleshooting tips:
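
  • Verify that the subnet used by your batch or session has Private Google Access enabled. Serverless for Apache Spark workloads use internal IP addresses and need Private Google Access to reach Google APIs and services. The following commands are a minimal sketch; SUBNET_NAME and REGION are placeholders:

      # Check whether Private Google Access is enabled on the subnet.
      gcloud compute networks subnets describe SUBNET_NAME \
          --region=REGION \
          --format="get(privateIpGoogleAccess)"

      # Enable Private Google Access if the previous command prints False.
      gcloud compute networks subnets update SUBNET_NAME \
          --region=REGION \
          --enable-private-ip-google-access

  • Verify that firewall rules allow internal communication between workload VMs on the subnet and don't block egress to Google APIs.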

Tip: To diagnose batch and session connectivity issues, also see Troubleshoot batch and session connectivity.

Invalid Spark properties or application code issues

Misconfigured Spark properties, particularly those related to driver resources, or issues within your Spark application code can lead to startup failures.

Troubleshooting tips:

  • Check spark.driver.memory and spark.driver.cores values. Verify that they are within reasonable limits and align with available DCUs. Excessively large values for these properties can lead to resource exhaustion and initialization failures. Remove any unnecessary or experimental Spark properties to simplify debugging (see the example after this list).

  • Try running a "Hello World" Spark application to determine if the issueis with your environment setup or due to code complexity or errors.

  • Verify that all application JARs, Python files, or dependencies specified for your batch or session are correctly located in Cloud Storage and are accessible by the batch or session service account.
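
The following command is a minimal smoke test that combines the tips above: it submits the SparkPi example that ships in the Serverless runtime image with explicit, modest driver resources. PROJECT_ID and REGION are placeholders, and the property values shown are illustrative, not recommendations:

      # Submit a known-good Spark example with explicit driver properties.
      gcloud dataproc batches submit spark \
          --project=PROJECT_ID \
          --region=REGION \
          --class=org.apache.spark.examples.SparkPi \
          --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
          --properties=spark.driver.cores=4,spark.driver.memory=8g \
          -- 1000

If this workload starts but your own application doesn't, the problem is likely in your application code, properties, or dependencies rather than in your project's environment setup.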

Check logs

A critical step in diagnosing batch creation failures is to examine the detailed logs in Cloud Logging.

  1. Go to the Cloud Logging page in the Google Cloud console.
  2. Filter for Serverless for Apache Spark Batches or Sessions:
    1. In the Resource drop-down, select Cloud Dataproc Batch or Cloud Dataproc Session.
    2. Filter by batch_id or session_id for the failed batch or session. You can also filter by project_id and location (region).
  3. Look for log entries with jsonPayload.component="driver". These logs often contain specific error messages or stack traces that can pinpoint the reason for the driver initialization failure before the 600-second timeout occurs.
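
You can run an equivalent query from the command line with gcloud logging read. The following command is a minimal sketch; BATCH_ID and PROJECT_ID are placeholders, and the output format shown is one reasonable choice among several:

      # Read recent driver logs for a failed batch.
      gcloud logging read 'resource.type="cloud_dataproc_batch" AND resource.labels.batch_id="BATCH_ID" AND jsonPayload.component="driver"' \
          --project=PROJECT_ID \
          --limit=50 \
          --format="table(timestamp,jsonPayload.message)"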
