Spark properties

This document describes Spark properties and how to set them. Serverless for Apache Spark uses Spark properties to determine the compute, memory, and disk resources to allocate to your batch workload. These property settings can affect workload quota consumption and cost. For more information, see Serverless for Apache Spark quotas and Serverless for Apache Spark pricing.

Note: Also see Spark metrics, which describes properties you can set to control the collection of Spark metrics.

Set Spark batch workload properties

You can specify Spark properties when you submit a Serverless for Apache Spark batch workload using the Google Cloud console, the gcloud CLI, or the Dataproc API.

Note: Unlike Dataproc on Compute Engine cluster properties, Serverless for Apache Spark workload properties don't require a leading "spark:" file prefix.

Console

  1. In the Google Cloud console, go to the Dataproc Create batch page.

    Go to Dataproc Create batch

  2. In the Properties section, click Add Property.

  3. Enter the Key (name) and Value of a supported Spark property.

gcloud

gcloud CLI batch submission example:

gcloud dataproc batches submit spark \
    --properties=spark.checkpoint.compress=true \
    --region=region \
    other args ...

API

Set RuntimeConfig.properties with supported Spark properties as part of a batches.create request.
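For illustration, a minimal batches.create request that sets a Spark property in RuntimeConfig.properties might look like the following sketch. The project, region, bucket, and main class values are placeholders, not values from this page.

# PROJECT_ID, REGION, gs://your-bucket/your-app.jar, and org.example.YourApp are placeholders.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches" \
  -d '{
    "sparkBatch": {
      "mainClass": "org.example.YourApp",
      "jarFileUris": ["gs://your-bucket/your-app.jar"]
    },
    "runtimeConfig": {
      "properties": {
        "spark.checkpoint.compress": "true"
      }
    }
  }'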

Supported Spark properties

Serverless for Apache Spark supports most Spark properties, but it does not support YARN-related and shuffle-related Spark properties, such as spark.master=yarn and spark.shuffle.service.enabled. If Spark application code sets a YARN or shuffle property, the application will fail.

Runtime environment properties

Serverless for Apache Spark supports the following custom Spark properties for configuring the runtime environment:

spark.dataproc.driverEnv.ENVIRONMENT_VARIABLE_NAME
  Adds ENVIRONMENT_VARIABLE_NAME to the driver process. You can specify multiple environment variables.

spark.executorEnv.ENVIRONMENT_VARIABLE_NAME
  Adds ENVIRONMENT_VARIABLE_NAME to the executor process. You can specify multiple environment variables.
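For example, to set the same environment variable on both the driver and executor processes when submitting a batch with the gcloud CLI, you might use something like the following sketch. The variable name and value are hypothetical placeholders.

# MY_VAR and its value are hypothetical placeholders.
gcloud dataproc batches submit spark \
    --properties=spark.dataproc.driverEnv.MY_VAR=value,spark.executorEnv.MY_VAR=value \
    --region=region \
    other args ...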

Tier property

dataproc.tier
  The tier on which a batch workload runs, either standard or premium (see Google Cloud Serverless for Apache Spark tiers). Interactive sessions always run at the premium dataproc.tier.
  • Setting this batch tier property to standard sets Dataproc runtime and resource tier properties to the standard tier (see resource allocation properties).
  • Setting this batch tier property to premium sets spark.dataproc.engine to lightningEngine, and sets spark.dataproc.driver.compute.tier and spark.dataproc.executor.compute.tier to premium. You can override most automatically set batch tier settings, but the automatically set compute tier settings can't be overridden for batches using runtimes prior to 3.0 (see resource allocation properties).
  Default: standard
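As a sketch, a premium-tier batch submission with the gcloud CLI might look like the following, with the region and remaining arguments as placeholders, as in the earlier example:

gcloud dataproc batches submit spark \
    --properties=dataproc.tier=premium \
    --region=region \
    other args ...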

Engine and runtime properties

spark.dataproc.engine
  The engine to use to run the batch workload or the interactive session: either lightningEngine (see Lightning Engine) or the default engine.
  • Batch workloads: If you select the standard dataproc.tier property for your workload, this property is automatically set to default and cannot be overridden. If you select the premium dataproc.tier for your workload, this property is automatically set to lightningEngine, but you can change the setting to default if needed.
  • Interactive sessions: This setting is automatically set to default, but you can change it to lightningEngine. Note that interactive sessions always run at the premium tier.
  Default:
  • Batches (standard tier): default
  • Batches (premium tier): lightningEngine
  • Sessions: default

spark.dataproc.lightningEngine.runtime
  The runtime to use when Lightning Engine is selected for a batch workload or interactive session: default or native (Native Query Execution).
  Default: default
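For example, to run a premium-tier batch on Lightning Engine with Native Query Execution, a submission might look like the following sketch:

gcloud dataproc batches submit spark \
    --properties=dataproc.tier=premium,spark.dataproc.lightningEngine.runtime=native \
    --region=region \
    other args ...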

Resource allocation properties

Note: Because Spark resource allocation properties must be fully configured before workload execution starts, set them when you submit a batch workload or create a session or session template, not on the SparkContext/SparkSession during workload execution.

Serverless for Apache Spark supports the following Spark properties for configuring resource allocation:

spark.driver.cores
  The number of cores (vCPUs) to allocate to the Spark driver. Valid values: 4, 8, 16.
  Default: 4

spark.driver.memory
  The amount of memory to allocate to the Spark driver process, specified in JVM memory string format with a size unit suffix ("m", "g", or "t").
  Total driver memory per driver core, including driver memory overhead, must be between 1024m and 7424m for the Standard compute tier (24576m for the Premium compute tier). For example, if spark.driver.cores = 4, then 4096m <= spark.driver.memory + spark.driver.memoryOverhead <= 29696m.
  Examples: 512m, 2g

spark.driver.memoryOverhead
  The amount of additional JVM memory to allocate to the Spark driver process, specified in JVM memory string format with a size unit suffix ("m", "g", or "t").
  This is non-heap memory associated with JVM overheads, internal strings, and other native overheads, and includes memory used by other driver processes, such as PySpark driver processes, and memory used by other non-driver processes running in the container. The maximum memory size of the container in which the driver runs is determined by the sum of spark.driver.memoryOverhead plus spark.driver.memory.
  Total driver memory per driver core, including driver memory overhead, must be between 1024m and 7424m for the Standard compute tier (24576m for the Premium compute tier). For example, if spark.driver.cores = 4, then 4096m <= spark.driver.memory + spark.driver.memoryOverhead <= 29696m.
  Default: 10% of driver memory, except for PySpark batch workloads, which default to 40% of driver memory
  Examples: 512m, 2g

spark.dataproc.driver.compute.tier
  The compute tier to use on the driver. The Premium compute tier offers higher per-core performance, but it is billed at a higher rate.
  Default: standard
  Examples: standard, premium

spark.dataproc.driver.disk.size
  The amount of disk space allocated to the driver, specified with a size unit suffix ("k", "m", "g", or "t"). Must be at least 250GiB. If the Premium disk tier is selected on the driver, valid sizes are 375g, 750g, 1500g, 3000g, 6000g, or 9000g. If the Premium disk tier and 16 driver cores are selected, the minimum disk size is 750g.
  Default: 100GiB per core
  Examples: 1024g, 2t

spark.dataproc.driver.disk.tier
  The disk tier to use for local and shuffle storage on the driver. The Premium disk tier offers better performance in IOPS and throughput, but it is billed at a higher rate. If the Premium disk tier is selected on the driver, the Premium compute tier also must be selected using spark.dataproc.driver.compute.tier=premium, and the amount of disk space must be specified using spark.dataproc.driver.disk.size.
  If the Premium disk tier is selected, the driver allocates an additional 50GiB of disk space for system storage, which is not usable by user applications.
  Default: standard
  Examples: standard, premium

spark.executor.cores
  The number of cores (vCPUs) to allocate to each Spark executor. Valid values: 4, 8, 16.
  Default: 4

spark.executor.memory
  The amount of memory to allocate to each Spark executor process, specified in JVM memory string format with a size unit suffix ("m", "g", or "t").
  Total executor memory per executor core, including executor memory overhead, must be between 1024m and 7424m for the Standard compute tier (24576m for the Premium compute tier). For example, if spark.executor.cores = 4, then 4096m <= spark.executor.memory + spark.executor.memoryOverhead <= 29696m.
  Examples: 512m, 2g

spark.executor.memoryOverhead
  The amount of additional JVM memory to allocate to the Spark executor process, specified in JVM memory string format with a size unit suffix ("m", "g", or "t").
  This is non-heap memory used for JVM overheads, internal strings, and other native overheads, and includes PySpark executor memory and memory used by other non-executor processes running in the container. The maximum memory size of the container in which the executor runs is determined by the sum of spark.executor.memoryOverhead plus spark.executor.memory.
  Total executor memory per executor core, including executor memory overhead, must be between 1024m and 7424m for the Standard compute tier (24576m for the Premium compute tier). For example, if spark.executor.cores = 4, then 4096m <= spark.executor.memory + spark.executor.memoryOverhead <= 29696m.
  Default: 10% of executor memory, except for PySpark batch workloads, which default to 40% of executor memory
  Examples: 512m, 2g

spark.dataproc.executor.compute.tier
  The compute tier to use on the executors. The Premium compute tier offers higher per-core performance, but it is billed at a higher rate.
  Default: standard
  Examples: standard, premium

spark.dataproc.executor.disk.size
  The amount of disk space allocated to each executor, specified with a size unit suffix ("k", "m", "g", or "t"). Executor disk space may be used for shuffle data and to stage dependencies. Must be at least 250GiB. If the Premium disk tier is selected on the executor, valid sizes are 375g, 750g, 1500g, 3000g, 6000g, or 9000g. If the Premium disk tier and 16 executor cores are selected, the minimum disk size is 750g.
  Default: 100GiB per core
  Examples: 1024g, 2t

spark.dataproc.executor.disk.tier
  The disk tier to use for local and shuffle storage on executors. The Premium disk tier offers better performance in IOPS and throughput, but it is billed at a higher rate. If the Premium disk tier is selected on the executor, the Premium compute tier also must be selected using spark.dataproc.executor.compute.tier=premium, and the amount of disk space must be specified using spark.dataproc.executor.disk.size.
  If the Premium disk tier is selected, each executor is allocated an additional 50GiB of disk space for system storage, which is not usable by user applications.
  Default: standard
  Examples: standard, premium

spark.executor.instances
  The initial number of executors to allocate. After a batch workload starts, autoscaling may change the number of active executors. Must be at least 2 and at most 2000.
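As an illustrative sketch, a batch submission that sizes the driver and executors might look like the following. The sizes are arbitrary values that satisfy the per-core memory limits above, not recommendations.

# The core, memory, and instance values are arbitrary illustrative choices.
gcloud dataproc batches submit spark \
    --properties=spark.driver.cores=8,spark.driver.memory=16g,spark.executor.cores=8,spark.executor.memory=16g,spark.executor.instances=4 \
    --region=region \
    other args ...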

Autoscaling properties

See Spark dynamic allocation properties for a list of Spark properties you can use to configure Serverless for Apache Spark autoscaling.

Logging properties

spark.log.level
  When set, overrides any user-defined log settings with the effect of a call to SparkContext.setLogLevel() at Spark startup. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN.
  Examples: INFO, DEBUG

spark.executor.syncLogLevel.enabled
  When set to true, the log level applied through the SparkContext.setLogLevel() method is propagated to all executors.
  Default: false
  Examples: true, false

spark.log.level.PackageName
  When set, overrides any user-defined log settings with the effect of a call to SparkContext.setLogLevel(PackageName, level) at Spark startup. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN.
  Example: spark.log.level.org.apache.spark=error
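For example, a submission that raises the root Spark log level while quieting a specific package at startup might include properties like the following sketch. The package and levels are arbitrary examples.

gcloud dataproc batches submit spark \
    --properties=spark.log.level=DEBUG,spark.log.level.org.apache.spark=error \
    --region=region \
    other args ...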

Scheduling properties

spark.scheduler.excludeShuffleSkewExecutors
  Exclude shuffle map skewed executors when scheduling, which can reduce long shuffle fetch wait times caused by shuffle write skew.
  Default: false
  Example: true

spark.scheduler.shuffleSkew.minFinishedTasks
  Minimum number of finished shuffle map tasks on an executor to treat as skew.
  Default: 10
  Example: 100

spark.scheduler.shuffleSkew.maxExecutorsNumber
  Maximum number of executors to treat as skew. Skewed executors are excluded from the current scheduling round.
  Default: 5
  Example: 10

spark.scheduler.shuffleSkew.maxExecutorsRatio
  Maximum ratio of total executors to treat as skew. Skewed executors are excluded from scheduling.
  Default: 0.05
  Example: 0.1

spark.scheduler.shuffleSkew.ratio
  A multiple of the average finished shuffle map tasks on an executor to treat as skew.
  Default: 1.5
  Example: 2.0
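For example, enabling shuffle-skew executor exclusion with a custom skew ratio might look like the following sketch:

gcloud dataproc batches submit spark \
    --properties=spark.scheduler.excludeShuffleSkewExecutors=true,spark.scheduler.shuffleSkew.ratio=2.0 \
    --region=region \
    other args ...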

Other properties

dataproc.diagnostics.enabled
  Enable this property to run diagnostics on a batch workload failure or cancellation. If diagnostics are enabled, your batch workload continues to use compute resources after the workload is complete until diagnostics are finished. A URI pointing to the location of the diagnostics tar file is listed in the Batch.RuntimeInfo.diagnosticOutputUri API field.

dataproc.gcsConnector.version
  Use this property to upgrade to a Cloud Storage connector version that is different from the version installed with your batch workload's runtime version.

dataproc.sparkBqConnector.version
  Use this property to upgrade to a Spark BigQuery connector version that is different from the version installed with your batch workload's runtime version (see Use the BigQuery connector with Serverless for Apache Spark).

dataproc.profiling.enabled
  Set this property to true to enable profiling for the Serverless for Apache Spark workload.

dataproc.profiling.name
  Use this property to set the name used to create a profile on the Profiler service.

spark.jars
  Use this property to set the comma-separated list of jars to include on the driver and executor classpaths.

spark.archives
  Use this property to set the comma-separated list of archives to be extracted into the working directory of each executor. The .jar, .tar.gz, .tgz, and .zip formats are supported. For serverless interactive sessions, add this property when creating an interactive session or session template.

dataproc.artifacts.remove
  Use this property to remove default artifacts installed on Serverless for Apache Spark runtimes. Supported artifacts are spark-bigquery-connector, conscrypt, iceberg, and delta-lake.
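For example, enabling failure diagnostics and profiling on a batch might look like the following sketch. The profile name is a hypothetical placeholder.

# my-profile is a hypothetical placeholder for a Profiler profile name.
gcloud dataproc batches submit spark \
    --properties=dataproc.diagnostics.enabled=true,dataproc.profiling.enabled=true,dataproc.profiling.name=my-profile \
    --region=region \
    other args ...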

