Cluster caching

When you enable Dataproc cluster caching, the cluster caches Cloud Storage data frequently accessed by your Spark jobs.

Benefits

  • Improved performance: Caching can improve job performance by reducing the amount of time spent retrieving data from storage.
  • Reduced storage costs: Since hot data is cached on local disk, fewer API calls are made to storage to retrieve data.
  • Spark job applicability: When cluster caching is enabled on a cluster, it applies to all Spark jobs run on the cluster, whether submitted to the Dataproc service or run independently on the cluster.

Limitations and requirements

Enable cluster caching

You can enable cluster caching when you create a Dataproc cluster using the Google Cloud console, Google Cloud CLI, or the Dataproc API.

Google Cloud console

  • Open the Dataproc Create a cluster on Compute Engine page in the Google Cloud console.
  • The Set up cluster panel is selected. In the Spark performance enhancements section, select Enable Google Cloud Storage caching.
  • After confirming and specifying cluster details in the cluster create panels, click Create.

gcloud CLI

Run the gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell using the dataproc:dataproc.cluster.caching.enabled=true cluster property.

Example:

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --properties dataproc:dataproc.cluster.caching.enabled=true \
    --num-master-local-ssds=2 \
    --master-local-ssd-interface=NVME \
    --num-worker-local-ssds=2 \
    --worker-local-ssd-interface=NVME \
    other args ...
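
If you script cluster creation, the same command can be assembled programmatically. The following Python sketch is illustrative only: build_create_command and its parameters are hypothetical names, not part of any Google tool. It only builds and prints the argument list from the example above and does not run gcloud.

```python
import shlex

# Property taken from the example above; it turns on cluster caching.
CACHING_PROPERTY = "dataproc:dataproc.cluster.caching.enabled=true"


def build_create_command(cluster_name, region, local_ssds=2, ssd_interface="NVME"):
    """Return the gcloud argument list shown in the example (hypothetical helper)."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--properties={CACHING_PROPERTY}",
        f"--num-master-local-ssds={local_ssds}",
        f"--master-local-ssd-interface={ssd_interface}",
        f"--num-worker-local-ssds={local_ssds}",
        f"--worker-local-ssd-interface={ssd_interface}",
    ]


if __name__ == "__main__":
    # Placeholder cluster name and region; substitute your own values.
    cmd = build_create_command("example-cluster", "us-central1")
    # Print a shell-quoted command line for review instead of executing it.
    print(shlex.join(cmd))
```

Keeping the command as a list of arguments, rather than one string, means it can be passed to subprocess.run without shell quoting issues if you later decide to execute it.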

REST API

Set SoftwareConfig.properties to include the "dataproc:dataproc.cluster.caching.enabled": "true" cluster property as part of a clusters.create request.
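
To make the shape of that request concrete, here is a minimal Python sketch that builds a clusters.create request body with the property set. It assumes the v1 Cluster resource layout, where SoftwareConfig sits under config.softwareConfig. The function name build_cluster_body and the project and cluster names are hypothetical placeholders, and other settings that a real cluster needs (such as the local SSD configuration used in the gcloud example) are omitted.

```python
import json


def build_cluster_body(project_id, cluster_name):
    """Build a minimal clusters.create body with caching enabled (illustrative)."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "softwareConfig": {
                "properties": {
                    # Property values are strings, so "true" is quoted.
                    "dataproc:dataproc.cluster.caching.enabled": "true"
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder identifiers; substitute your own project and cluster names.
    body = build_cluster_body("example-project", "example-cluster")
    print(json.dumps(body, indent=2))
```

The printed JSON is what you would send as the body of the clusters.create call. Sending it requires an authenticated request to the Dataproc API, which this sketch does not attempt.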


Last updated 2026-02-19 UTC.