Dataproc versioning

Dataproc uses images to tie together useful Google Cloud connectors and Apache Spark and Apache Hadoop components into one package that can be deployed on a Dataproc cluster. These images contain the base operating system (Debian, Rocky Linux, or Ubuntu) for the cluster, along with core and optional components needed to run jobs, such as Spark, Hadoop, and Hive. These images are periodically upgraded to include new improvements and features. Dataproc versioning lets you select sets of software versions when you create clusters.

How versioning works

When an image is created, it is given an image version number in the following format:

version_major.version_minor.version_sub_minor-os_distribution

The following OS distributions are maintained:

OS Distribution Code    OS Distribution
debian10                Debian 10
debian11                Debian 11
debian12                Debian 12
rocky8                  Rocky Linux 8
rocky9                  Rocky Linux 9
ubuntu18                Ubuntu 18.04 LTS
ubuntu20                Ubuntu 20.04 LTS
ubuntu22                Ubuntu 22.04 LTS

See Old image versions for previously supported OS distributions.

The recommended practice is to specify the major.minor image version for production environments or when compatibility with specific component versions is important. The subminor version and OS distribution are then automatically set to the latest weekly release.
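As an illustration of the version format above, the following sketch splits an image version string into its parts. The function name parse_image_version is hypothetical and is not part of any Google Cloud tool. It assumes the string has at least a major.minor part and does not handle preview versions with an -RC suffix.

```shell
# Hypothetical helper: split "major.minor[.subminor][-os_distribution]" into parts.
parse_image_version() {
  local version="$1" numbers os major rest minor subminor
  # Everything after the first "-" is treated as the OS distribution code.
  case "$version" in
    *-*) numbers="${version%%-*}"; os="${version#*-}" ;;
    *)   numbers="$version"; os="" ;;
  esac
  major="${numbers%%.*}"
  rest="${numbers#*.}"
  minor="${rest%%.*}"
  # The subminor part is present only if a second "." remains.
  case "$rest" in
    *.*) subminor="${rest#*.}" ;;
    *)   subminor="" ;;
  esac
  echo "major=$major minor=$minor subminor=${subminor:-unset} os=${os:-unset}"
}

parse_image_version 2.2.16-debian12
parse_image_version 2.0
```

A part reported as unset is one that the version string leaves for Dataproc to choose.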

Select versions

When you create a new Dataproc cluster, the latest available Debian image version is used by default. You can select a Debian, Rocky Linux, or Ubuntu image version when creating a cluster (see the Dataproc image version list). When specifying Debian-based images, you can omit the OS Distribution Code suffix, for example by specifying 2.0 to select the 2.0-debian10 image. The OS suffix must be used to select a Rocky Linux or Ubuntu-based image, for example by specifying 2.0-ubuntu18.

gcloud command

When using the gcloud dataproc clusters create command, you can use the --image-version argument to specify an image version for the new cluster.

Debian image example:

gcloud dataproc clusters create CLUSTER_NAME \
    --image-version=2.0 \
    --region=REGION

Ubuntu image example:

gcloud dataproc clusters create CLUSTER_NAME \
    --image-version=2.0-ubuntu18 \
    --region=REGION

Best practice is to omit the subminor version so that the latest subminor version is used. However, if necessary, the subminor version can be specified, for example, 2.0.20.
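If a script receives a fully pinned version string, one way to follow this practice is to drop the subminor part before passing the value to --image-version. The helper below is a minimal sketch. The name to_major_minor is hypothetical, and the function assumes the input has at least a major.minor part.

```shell
# Hypothetical helper: reduce "2.0.20-ubuntu18" to "2.0-ubuntu18".
to_major_minor() {
  local version="$1" numbers suffix major rest minor
  # Keep the OS distribution suffix, if any, so non-Debian images stay selected.
  case "$version" in
    *-*) numbers="${version%%-*}"; suffix="-${version#*-}" ;;
    *)   numbers="$version"; suffix="" ;;
  esac
  major="${numbers%%.*}"
  rest="${numbers#*.}"
  minor="${rest%%.*}"
  echo "${major}.${minor}${suffix}"
}

# Possible use (CLUSTER_NAME and REGION are placeholders):
#   gcloud dataproc clusters create CLUSTER_NAME \
#       --image-version="$(to_major_minor 2.0.20-ubuntu18)" \
#       --region=REGION
```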

You can check the image version of an existing cluster with the Google Cloud CLI.

gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION
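The describe command prints the full cluster resource. To print only the image version, the output can be narrowed with the gcloud --format flag, as in the sketch below. The wrapper name cluster_image_version is hypothetical, and the field path config.softwareConfig.imageVersion is taken from the SoftwareConfig imageVersion field of the cluster resource.

```shell
# Hypothetical wrapper: print only the image version of a cluster.
# Arguments: cluster name, region.
cluster_image_version() {
  gcloud dataproc clusters describe "$1" \
      --region="$2" \
      --format="value(config.softwareConfig.imageVersion)"
}

# Possible use (CLUSTER_NAME and REGION are placeholders):
#   cluster_image_version CLUSTER_NAME REGION
```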

REST API

You can specify the SoftwareConfig imageVersion field as part of a cluster.create API request.

Example

POST /v1/projects/project-id/regions/us-central1/clusters/
{
  "projectId": "project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-b"
    },
    "masterConfig": {
      ...
    },
    "workerConfig": {
      ...
    },
    "softwareConfig": {
      "imageVersion": "2.0"
    }
  }
}
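The request above can also be sent from a shell. The sketch below builds a reduced request body that sets only the image version. The function name cluster_request_body is hypothetical, the body assumes that the omitted config fields fall back to service defaults, and the values are not JSON-escaped, so it is suitable only for simple names.

```shell
# Hypothetical helper: build a minimal cluster.create body.
# Arguments: project ID, cluster name, image version.
cluster_request_body() {
  printf '{"projectId":"%s","clusterName":"%s","config":{"softwareConfig":{"imageVersion":"%s"}}}\n' \
      "$1" "$2" "$3"
}

# Possible use with curl (project-id and us-central1 are the example values from above):
#   curl -X POST \
#       -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#       -H "Content-Type: application/json" \
#       -d "$(cluster_request_body project-id example-cluster 2.0)" \
#       "https://dataproc.googleapis.com/v1/projects/project-id/regions/us-central1/clusters"
```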

Console

Open the Dataproc Create a cluster page. The Set up cluster panel is selected. The Image type and Version field in the Versioning section shows the image that will be used when creating the cluster. The image release date is also shown. Initially, the default image, the latest available Debian version, is shown as selected. Click Change to display a list of available images. You can select a standard or custom image to use for your cluster.

When new versions are created

New major versions are periodically created to incorporate one or more of the following:

  • Major releases for:
    • Spark, Hadoop, and other Big Data components
    • Google Cloud connectors
  • Major changes or updates to Dataproc functionality

New preview versions (with a -RC suffix) are released prior to the release of a new major version:

  • Preview images are not intended for use in production workloads.
  • Preview image component versions might be upgraded to the latest available component version in the post-preview GA image version.

New minor versions are periodically created to incorporate one or more of the following:

  • Minor releases and updates for:
    • Spark, Hadoop, and other Big Data components
    • Google Cloud connectors
  • Minor changes or updates to Dataproc functionality

When a new minor version is created, its Debian image becomes the default for the major version, and represents the latest release of the major version.

New subminor versions are periodically created to incorporate one or more of the following:

  • Patches or fixes for a component in the image
  • Component subminor version upgrades

Image version and Dataproc support

Minor image versions are supported for 24 months after initial GA (General Availability) release. During this period, clusters using these image versions are eligible for support. To receive fixes, recreate your cluster using the latest supported subminor image version. After the support window has closed, clusters using these image versions aren't eligible for support.
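The 24-month window can be estimated from a GA date with simple month arithmetic. The sketch below is illustrative only: the function name support_end_month is hypothetical, the GA dates used are made-up inputs rather than real release dates, and the result is a year and month, not an official end-of-support date.

```shell
# Hypothetical helper: add a number of months (default 24) to a YYYY-MM-DD date
# and print the resulting year and month.
support_end_month() {
  local ga_date="$1" months="${2:-24}" year rest month total
  year="${ga_date%%-*}"
  rest="${ga_date#*-}"
  month="${rest%%-*}"
  # Drop a leading zero so "08" is not read as an invalid octal number.
  month="${month#0}"
  # Count months from year 0, add the window, then convert back.
  total=$(( year * 12 + month - 1 + months ))
  printf '%04d-%02d\n' "$(( total / 12 ))" "$(( total % 12 + 1 ))"
}

# Example with a made-up GA date:
support_end_month 2023-08-15
```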

Old image versions

Previously supported OS distributions

The following OS distributions were previously supported:

OS Distribution Code    OS Distribution    Last Patched (End of support)
debian9                 Debian 9           July 10, 2020
deb8                    Debian 8           October 26, 2018

Image versions without explicit OS distribution

Prior to August 16, 2018, image versions were built with Debian 8, and omitted the OS Distribution Code. They are specified in the following format:

version_major.version_minor.version_sub_minor

Versions 0.1 and 0.2

Image versions released as alpha or beta releases prior to Dataproc version 1.0 general availability aren't subject to the Dataproc support policy.

Important notes about versioning

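  • Image versions contain the base operating system, Google Cloud connectors, and the core and optional components needed to run jobs, such as Spark, Hadoop, and Hive.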
  • Your Dataproc clusters are not automatically updated when new image versions are released.
    • Recommendations:
      • Run clusters with the latest subminor image version. Image metadata includes a previous-subminor label, which is set to true if the cluster is not using the latest subminor image version.
        • To view image metadata:
          1. Run the following gcloud compute images list --filter command to list the resource name of a Dataproc image. Replace IMAGE_VERSION with an image version, such as 2.2.16-debian12.
            gcloud compute images list --project=PROJECT_NAME --filter="labels.goog-dataproc-version ~ ^IMAGE_VERSION"
          2. Run the following gcloud compute images describe command to view image metadata.
            gcloud compute images describe --project=PROJECT_NAME IMAGE_NAME
      • Test and validate that your applications run successfully on clusters created with new image versions, particularly when using new major image version releases.
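The check for an outdated subminor version described above could be scripted along the following lines. This is a sketch under assumptions: the function name is_previous_subminor is hypothetical, and the code assumes the previous-subminor label appears under the image's labels field, which should be confirmed against the output of gcloud compute images describe.

```shell
# Hypothetical helper: succeed if the image carries previous-subminor=true.
# Arguments: image name, project name.
# Assumption: the label is exposed as labels.previous-subminor.
is_previous_subminor() {
  local label
  label="$(gcloud compute images describe "$1" \
      --project="$2" \
      --format="value(labels.previous-subminor)")"
  [ "$label" = "true" ]
}

# Possible use (IMAGE_NAME and PROJECT_NAME are placeholders):
#   if is_previous_subminor IMAGE_NAME PROJECT_NAME; then
#     echo "A newer subminor image version is available."
#   fi
```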

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.