Dataproc versioning
Dataproc uses images to tie together useful Google Cloud connectors and Apache Spark & Apache Hadoop components into one package that can be deployed on a Dataproc cluster. These images contain the base operating system (Debian, Rocky Linux, or Ubuntu) for the cluster, along with core and optional components needed to run jobs, such as Spark, Hadoop, and Hive. These images are periodically upgraded to include new improvements and features. Dataproc versioning lets you select sets of software versions when you create clusters.
How versioning works
When an image is created, it is given an image version number in the following format:
version_major.version_minor.version_sub_minor-os_distribution
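As an illustration of this format, the following bash sketch splits a version string into its components. The function name `parse_image_version` is hypothetical, and the pattern assumes purely numeric version fields and a lowercase OS code ending in digits; preview versions with an `-RC` suffix are deliberately not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a Dataproc image version string into its parts.
# Accepts the full form (2.2.16-debian12) and shorter forms that omit the
# subminor version and/or the OS distribution code (2.2, 2.2-debian12).
parse_image_version() {
  local version="$1"
  # version_major.version_minor[.version_sub_minor][-os_distribution]
  local re='^([0-9]+)\.([0-9]+)(\.([0-9]+))?(-([a-z]+[0-9]+))?$'
  if [[ "$version" =~ $re ]]; then
    # Optional groups that did not match are empty, so report them as "none".
    echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}" \
      "subminor=${BASH_REMATCH[4]:-none} os=${BASH_REMATCH[6]:-none}"
  else
    echo "unrecognized image version: $version" >&2
    return 1
  fi
}
```

For example, `parse_image_version 2.2.16-debian12` prints `major=2 minor=2 subminor=16 os=debian12`, and `parse_image_version 2.0` prints `major=2 minor=0 subminor=none os=none`.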
The following OS distributions are maintained:
| OS Distribution Code | OS Distribution |
|---|---|
| debian10 | Debian 10 |
| debian11 | Debian 11 |
| debian12 | Debian 12 |
| rocky8 | Rocky Linux 8 |
| rocky9 | Rocky Linux 9 |
| ubuntu18 | Ubuntu 18.04 LTS |
| ubuntu20 | Ubuntu 20.04 LTS |
| ubuntu22 | Ubuntu 22.04 LTS |
See Old image versions for previously supported OS distributions.
The recommended practice is to specify the major.minor image version for production environments or when compatibility with specific component versions is important. The subminor version and OS distribution are then set automatically to those of the latest weekly release.
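To follow this practice when you start from a fully specified version (for example, one copied from an existing cluster), a small bash sketch can strip the subminor part. The helper name `pin_major_minor` is hypothetical; it keeps any OS suffix, because a Rocky Linux or Ubuntu image can only be selected with its suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce an image version to major.minor, dropping the
# subminor version but keeping an explicit OS suffix if one was given.
pin_major_minor() {
  local re='^([0-9]+)\.([0-9]+)(\.[0-9]+)?(-[a-z]+[0-9]+)?$'
  if [[ "$1" =~ $re ]]; then
    # BASH_REMATCH[4] holds the "-os" suffix, or is empty when none was given.
    echo "${BASH_REMATCH[1]}.${BASH_REMATCH[2]}${BASH_REMATCH[4]}"
  else
    echo "unrecognized image version: $1" >&2
    return 1
  fi
}
```

For example, `pin_major_minor 2.2.16-debian12` prints `2.2-debian12`, and `pin_major_minor 2.0.20` prints `2.0`.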
Select versions
When you create a new Dataproc cluster, the latest available Debian image version is used by default. You can select a Debian, Rocky Linux, or Ubuntu image version when creating a cluster (see the Dataproc image version list). When specifying Debian-based images, you can omit the OS Distribution Code suffix; for example, specifying 2.0 selects the 2.0-debian10 image. The OS suffix must be used to select a Rocky Linux or Ubuntu-based image, for example, 2.0-ubuntu18.
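The suffix rules above can be summarized in a short bash sketch that reports which OS family a version string selects. The function name `image_os_family` is hypothetical, and it only inspects the string; it does not check that the image actually exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the OS family selected by an image version.
# Without an OS suffix, a Debian image is selected by default; Rocky Linux
# and Ubuntu images require an explicit suffix.
image_os_family() {
  case "$1" in
    *-debian*|*-deb[0-9]*) echo "Debian" ;;
    *-rocky*)              echo "Rocky Linux" ;;
    *-ubuntu*)             echo "Ubuntu" ;;
    *-*)                   echo "unknown" ;;
    *)                     echo "Debian (default)" ;;
  esac
}
```

For example, `image_os_family 2.0` prints `Debian (default)`, while `image_os_family 2.0-ubuntu18` prints `Ubuntu`.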
gcloud command
When using the gcloud dataproc clusters create command, you can use the --image-version argument to specify an image version for the new cluster.
Debian image example:
gcloud dataproc clusters create CLUSTER_NAME \
    --image-version=2.0 \
    --region=REGION
Ubuntu image example:
gcloud dataproc clusters create CLUSTER_NAME \
    --image-version=2.0-ubuntu18 \
    --region=REGION
Best practice is to omit the subminor version so that the latest subminor version is used. However, if necessary, the subminor version can be specified, for example, 2.0.20.
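The following bash sketch wraps the create command shown above and prints a warning when the requested version pins a subminor version. The function name `create_dataproc_cluster` and its argument order are hypothetical; the gcloud flags are the ones used in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "gcloud dataproc clusters create" that warns
# (on stderr) when the image version pins a subminor version, such as 2.0.20.
create_dataproc_cluster() {
  local cluster_name="$1" region="$2" image_version="$3"
  # Three dot-separated numbers at the start mean a subminor version is pinned.
  if [[ "$image_version" =~ ^[0-9]+\.[0-9]+\.[0-9]+ ]]; then
    echo "warning: ${image_version} pins a subminor version" >&2
  fi
  gcloud dataproc clusters create "$cluster_name" \
    --image-version="$image_version" \
    --region="$region"
}
```

For example, `create_dataproc_cluster CLUSTER_NAME REGION 2.0-ubuntu18` runs the Ubuntu example above without a warning.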
You can check a cluster's current image version with the Google Cloud CLI:
gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION
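The describe command prints the whole cluster resource. As a sketch, the global `--format` flag can narrow the output to the image version alone, assuming it is read from the cluster's `config.softwareConfig.imageVersion` field. The function name `cluster_image_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the image version of an existing cluster.
# value(...) projects a single field out of the cluster resource.
cluster_image_version() {
  local cluster_name="$1" region="$2"
  gcloud dataproc clusters describe "$cluster_name" \
    --region="$region" \
    --format="value(config.softwareConfig.imageVersion)"
}
```

For example, `cluster_image_version CLUSTER_NAME REGION` prints a single version string for the cluster.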
REST API
You can specify the SoftwareConfig imageVersion field as part of a cluster.create API request.
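As a sketch, a request body that sets only this field can be generated in bash and sent with curl. The function name `cluster_request_body` is hypothetical, and the minimal body assumes the remaining cluster settings can be left to Dataproc defaults; a production request would normally set more fields, as in the example that follows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a minimal cluster.create request body that sets
# config.softwareConfig.imageVersion. Other settings are assumed to default.
cluster_request_body() {
  local project_id="$1" cluster_name="$2" image_version="$3"
  cat <<EOF
{
  "projectId": "${project_id}",
  "clusterName": "${cluster_name}",
  "config": {
    "softwareConfig": {
      "imageVersion": "${image_version}"
    }
  }
}
EOF
}

# Example use (PROJECT_ID and REGION are placeholders):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "$(cluster_request_body PROJECT_ID example-cluster 2.0)" \
#   "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"
```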
Example
POST /v1/projects/project-id/regions/us-central1/clusters/
{
  "projectId": "project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-b"
    },
    "masterConfig": {
      ...
    },
    "workerConfig": {
      ...
    },
    "softwareConfig": {
      "imageVersion": "2.0"
    }
  }
}
Console
Open the Dataproc Create a cluster page. The Set up cluster panel is selected. The Image type and Version field in the Versioning section shows the image that will be used when creating the cluster, along with the image release date. Initially, the default image, the latest available Debian version, is selected. Click Change to display a list of available images. You can select a standard or custom image to use for your cluster.
When new versions are created
New major versions are periodically created to incorporate one or more of the following:
- Major releases for:
- Spark, Hadoop, and other Big Data components
- Google Cloud connectors
- Major changes or updates to Dataproc functionality
New preview versions (with an -RC suffix) are released prior to the release of a new major version:
- Preview images are not intended for use in production workloads.
- Preview image component versions might be upgraded to the latest available component version in the post-preview GA image version.
New minor versions are periodically created to incorporate one or more of the following:
- Minor releases and updates for:
- Spark, Hadoop, and other Big Data components
- Google Cloud connectors
- Minor changes or updates to Dataproc functionality
When a new minor version is created, its Debian image becomes the default for the major version and represents the latest release of the major version.
New subminor versions are periodically created to incorporate one or more of the following:
- Patches or fixes for a component in the image
- Component subminor version upgrades
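To see which kind of release separates two images, for example when planning how much testing an upgrade needs, the following bash sketch reports the highest version component that differs. The function name `classify_version_change` is hypothetical; it expects fully specified versions, ignores the OS suffix, and does not distinguish upgrades from downgrades.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which component differs between two fully
# specified image versions (major.minor.subminor[-os]): major, minor,
# subminor, or none. The OS suffix and the direction of change are ignored.
classify_version_change() {
  local re='^([0-9]+)\.([0-9]+)\.([0-9]+)(-[a-z]+[0-9]+)?$'
  local old_major old_minor old_sub
  if ! [[ "$1" =~ $re ]]; then
    echo "unrecognized image version: $1" >&2
    return 1
  fi
  old_major="${BASH_REMATCH[1]}"
  old_minor="${BASH_REMATCH[2]}"
  old_sub="${BASH_REMATCH[3]}"
  if ! [[ "$2" =~ $re ]]; then
    echo "unrecognized image version: $2" >&2
    return 1
  fi
  # Compare from the most significant component down.
  if [[ "${BASH_REMATCH[1]}" != "$old_major" ]]; then
    echo "major"
  elif [[ "${BASH_REMATCH[2]}" != "$old_minor" ]]; then
    echo "minor"
  elif [[ "${BASH_REMATCH[3]}" != "$old_sub" ]]; then
    echo "subminor"
  else
    echo "none"
  fi
}
```

For example, `classify_version_change 2.1.80-debian11 2.2.16-debian12` prints `minor`.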
Image version and Dataproc support
Minor image versions are supported for 24 months after their initial GA (General Availability) release. During this period, clusters using these image versions are eligible for support (to receive fixes, recreate your cluster using the latest supported subminor image version). After the support window has closed, clusters using those image versions aren't eligible for support.
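The end of the support window is simple date arithmetic. The bash sketch below assumes GNU `date` from coreutils, whose `-d` option accepts relative items such as `+24 months`; BSD and macOS `date` use different options. The function name `support_end_date` is hypothetical, and you supply the GA date yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the end of the 24-month support window from
# a minor version's initial GA date (YYYY-MM-DD). Requires GNU date.
support_end_date() {
  local ga_date="$1"
  # -u evaluates the date in UTC; %F prints YYYY-MM-DD.
  date -u -d "${ga_date} +24 months" +%F
}
```

For example, `support_end_date 2023-01-15` prints `2025-01-15` (an arbitrary date, not an actual release date). Note that GNU `date` rolls month-end overflow forward, so a GA date of 2024-02-29 yields 2026-03-01.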
Old image versions
Previously supported OS distributions
The following OS distributions were previously supported:
| OS Distribution Code | OS Distribution | Last Patched (End of support) |
|---|---|---|
| debian9 | Debian 9 | July 10, 2020 |
| deb8 | Debian 8 | October 26, 2018 |
Image versions without explicit OS distribution
Prior to August 16, 2018, image versions were built with Debian 8 and omitted the OS Distribution Code. They are specified in the following format:
version_major.version_minor.version_sub_minor
Versions 0.1 and 0.2
Image versions released as alpha or beta releases prior to Dataproc version 1.0 general availability aren't subject to the Dataproc support policy.
Important notes about versioning
- Image versions contain the following components:
  - Core components that are installed on all clusters, such as Spark, Hadoop, and Hive
  - Optional components that you specify when you create a cluster
- Your Dataproc clusters are not automatically updated when new image versions are released.
- Recommendations:
  - Run clusters with the latest subminor image version. Image metadata includes a previous-subminor label, which is set to true if the cluster is not using the latest subminor image version.
    - To view image metadata:
      - Run the following gcloud compute images list --filter command to list the resource name of a Dataproc image, where IMAGE_VERSION is an image version such as 2.2.16-debian12:
        gcloud compute images list --project=PROJECT_NAME --filter="labels.goog-dataproc-version ~ ^IMAGE_VERSION"
      - Run the following gcloud compute images describe command to view image metadata:
        gcloud compute images describe --project=PROJECT_NAME IMAGE_NAME
  - Test and validate that your applications run successfully on clusters created with new image versions, particularly when using new major image version releases.
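The two metadata steps above can be chained in a bash sketch that looks up the image name with the same list filter and then describes it, so that its labels, such as previous-subminor, can be inspected. The function name `show_image_metadata` is hypothetical; `--format="value(name)"` and `--limit=1` are standard gcloud list flags, and taking the first match is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find a Dataproc image by version, then describe it.
# PROJECT_NAME is the project that hosts the image, as in the commands above.
show_image_metadata() {
  local project="$1" image_version="$2" image_name
  image_name="$(gcloud compute images list \
    --project="$project" \
    --filter="labels.goog-dataproc-version ~ ^${image_version}" \
    --format="value(name)" \
    --limit=1)"
  if [[ -z "$image_name" ]]; then
    echo "no image found for ${image_version}" >&2
    return 1
  fi
  # The describe output includes the image labels.
  gcloud compute images describe "$image_name" --project="$project"
}
```

For example, `show_image_metadata PROJECT_NAME 2.2.16-debian12` runs both commands in sequence.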
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.