2.3.x release versions
Notes:
Version 2.3 is a lightweight image that contains only core components, reducing exposure to Common Vulnerabilities and Exposures (CVEs). For higher security compliance requirements, use image version 2.3 or later when creating a Dataproc cluster.
If you choose to install optional components when creating a Dataproc cluster with a 2.3 image, they are downloaded and installed during cluster creation. This might increase the cluster startup time. To avoid this delay, you can create a custom image with the optional components pre-installed by running generate_custom_image.py with the --optional-components flag.
Note: You must specify the optional components that you want to install when you create the cluster. For more information, see Add optional components.
The following example shows the Google Cloud CLI command for creating a cluster with optional components:
gcloud dataproc clusters create CLUSTER_NAME \
    --optional-components=COMPONENT_NAME \
    ... other flags
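When cluster creation is scripted, the same command can be assembled programmatically. The following is a minimal sketch, not an official client: the helper name `build_create_command`, the cluster name, the region, the image version string `2.3-debian12`, and the component names `DOCKER` and `ZEPPELIN` are all placeholder assumptions, so check them against the gcloud reference before use. The sketch only builds and prints the command; it doesn't run it.

```python
import shlex


def build_create_command(cluster_name, region, image_version, components):
    """Return the gcloud argv list for creating a cluster with optional components."""
    if not components:
        raise ValueError("list at least one optional component")
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--image-version={image_version}",
        # Optional components must be named when the cluster is created.
        "--optional-components=" + ",".join(components),
    ]


# Placeholder values (assumptions); replace them with your own.
argv = build_create_command(
    "example-cluster", "us-central1", "2.3-debian12", ["DOCKER", "ZEPPELIN"]
)
print(shlex.join(argv))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.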
Notes
The following optional components are supported in non-arm 2.3 images:
- Apache Flink
- Apache Hive WebHCat
- Apache Hudi
- Apache Iceberg
- Apache Pig
- Delta Lake
- Docker
- JupyterLab Notebook
- Ranger
- Solr
- Trino
- Zeppelin notebook
- Zookeeper
2.3.x-*-arm images support only the pre-installed components and the following optional components. The other 2.3 optional components and all initialization actions aren't supported:
- Apache Hive WebHCat
- Apache Pig (starting with 2.3.22-ubuntu22-arm)
- Docker
- Zeppelin notebook
- Zookeeper (installed in high availability clusters; optional component in other clusters)
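The two support lists above can be encoded as a small pre-flight check. This is a sketch under stated assumptions: the function name `unsupported_components` is hypothetical, components are identified by the display names used on this page rather than by CLI identifiers, and an arm image version without a sub-minor number is assumed to resolve to a release that includes Apache Pig.

```python
NON_ARM_COMPONENTS = {
    "Apache Flink", "Apache Hive WebHCat", "Apache Hudi", "Apache Iceberg",
    "Apache Pig", "Delta Lake", "Docker", "JupyterLab Notebook", "Ranger",
    "Solr", "Trino", "Zeppelin notebook", "Zookeeper",
}
ARM_COMPONENTS = {
    "Apache Hive WebHCat", "Apache Pig", "Docker", "Zeppelin notebook", "Zookeeper",
}
PIG_ARM_MIN_SUBMINOR = 22  # Apache Pig on arm starts with 2.3.22-ubuntu22-arm


def unsupported_components(requested, image_version):
    """Return the requested components that the given 2.3 image doesn't support."""
    is_arm = image_version.endswith("-arm")
    supported = set(ARM_COMPONENTS if is_arm else NON_ARM_COMPONENTS)
    if is_arm:
        # "2.3.22-ubuntu22-arm" -> ["2", "3", "22"]; the third field is the sub-minor.
        fields = image_version.split("-")[0].split(".")
        if len(fields) >= 3 and int(fields[2]) < PIG_ARM_MIN_SUBMINOR:
            supported.discard("Apache Pig")
    return sorted(set(requested) - supported)


print(unsupported_components(["Docker", "Trino"], "2.3.22-ubuntu22-arm"))
# "2.3.10-ubuntu22-arm" is a made-up version string used only for illustration.
print(unsupported_components(["Apache Pig"], "2.3.10-ubuntu22-arm"))
```

An empty result means every requested component appears in the relevant list.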
yarn.nodemanager.recovery.enabled and HDFS Audit Logging are enabled by default in 2.3 images.
micromamba, instead of conda in previous image versions, is installed as part of the Python installation.
Docker and Zeppelin installation issues:
- Installation fails if the cluster has no public internet access. As a workaround, create a cluster that uses a custom image with optional components pre-installed. You can do this by running generate_custom_image.py with the --optional-components flag.
- Installation can fail if the cluster is pinned to an older sub-minor image version: packages are installed on demand from public OSS repositories, and a package might not be available upstream to support the installation. As a workaround, create a cluster that uses a custom image with optional components pre-installed. To do this, run generate_custom_image.py with the --optional-components flag.
The default resource calculator for YARN has been changed from DefaultResourceCalculator to DominantResourceCalculator, which uses the dominant-resource concept to determine resource allocation, such as memory and CPU allocation. This change impacts Autoscaler, which scales based on the dominant resource usage of the cluster.
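To see why the calculator change matters, consider a simplified model of both behaviors. The sketch below is an illustration, not YARN's implementation: it assumes DefaultResourceCalculator accounts for memory only, while DominantResourceCalculator is limited by whichever resource (memory or vcores) is used most heavily. It ignores scheduler minimum allocations, rounding, and per-node packing, and the function names and cluster sizes are made up.

```python
def containers_that_fit(cluster_mem_gb, cluster_vcores, req_mem_gb, req_vcores,
                        dominant=True):
    """Count identical containers that fit in the cluster under each accounting model."""
    by_memory = cluster_mem_gb // req_mem_gb
    if not dominant:
        # Memory-only accounting (DefaultResourceCalculator-style).
        return by_memory
    # Dominant-resource accounting: the scarcer resource limits the count.
    by_vcores = cluster_vcores // req_vcores
    return min(by_memory, by_vcores)


def dominant_share(cluster_mem_gb, cluster_vcores, used_mem_gb, used_vcores):
    """Return (resource name, usage share) for the most heavily used resource."""
    shares = {
        "memory": used_mem_gb / cluster_mem_gb,
        "vcores": used_vcores / cluster_vcores,
    }
    name = max(shares, key=shares.get)
    return name, shares[name]


# Hypothetical cluster: 100 GB memory, 20 vcores; each container asks for 4 GB and 2 vcores.
print(containers_that_fit(100, 20, 4, 2, dominant=False))  # 25
print(containers_that_fit(100, 20, 4, 2))                  # 10
print(dominant_share(100, 20, 40, 20))                     # ('vcores', 1.0)
```

In this example the cluster is full at 10 containers because vcores run out, even though only 40% of memory is in use. A memory-only view would report spare capacity, which is the kind of difference that can change autoscaling decisions.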
Image version 2.3 machine learning (ML) components
The Dataproc 2.3-ml-ubuntu image extends the 2.3 base image with ML-specific software. It supports 2.3 image optional components and other 2.3 features, and adds the component versions listed in the following sections.
GPU-specific libraries
For Dataproc jobs that use GPU VMs, the following NVIDIA driver and libraries are available in the 2.3-ml-ubuntu image. You can use them to accomplish the following tasks:
- Accelerate Spark batch workloads with the NVIDIA Spark Rapids library
- Train machine learning workloads
- Run distributed batch inference using Spark
| Package Name | Version |
|---|---|
| Spark Rapids | 25.04.0 |
| NVIDIA Driver | Ubuntu 22.04 LTS Accelerated with NVIDIA driver version 570 |
| CUDA | 12.6.3 |
| cublas | 12.6.4 |
| cusolver | 11.7.1 |
| cupti | 12.6.80 |
| cusparse | 12.5.4 |
| cuDNN | 9.10.1 |
| NCCL | 2.27.5 |
XGBoost libraries
The following Maven package versions are available in the 2.3-ml-ubuntu image to let you use XGBoost with Spark in Java or Scala.
| Group ID | Package Name | Version |
|---|---|---|
| ml.dmlc | xgboost4j-gpu_2.12 | 2.1.1 |
| ml.dmlc | xgboost4j-spark-gpu_2.12 | 2.1.1 |
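This section's note mentions setting the `spark.dynamicAllocation.enabled = false` property on a Dataproc job. One way to pass it is through the `--properties` flag at job submission. The following sketch only assembles and prints such a command; the helper name `build_submit_command`, the class name, the jar path, the cluster name, and the region are hypothetical placeholders.

```python
import shlex


def build_submit_command(main_class, jar_uri, cluster, region, properties):
    """Return the gcloud argv list for submitting a Spark job with job properties."""
    props = ",".join(f"{key}={value}" for key, value in properties.items())
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={main_class}",
        f"--jars={jar_uri}",
        f"--properties={props}",
    ]


# Placeholder job details (assumptions); only the property itself comes from this page.
argv = build_submit_command(
    "com.example.TrainXGBoost",
    "gs://example-bucket/train.jar",
    "example-cluster",
    "us-central1",
    {"spark.dynamicAllocation.enabled": "false"},
)
print(shlex.join(argv))
```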
Set the spark.dynamicAllocation.enabled = false property on a Dataproc job to disable dynamic allocation.
Python libraries
The 2.3-ml-ubuntu image contains the following libraries, which support different stages in the ML lifecycle.
| Package | Version |
|---|---|
| accelerate | 1.8.1 |
| conda | 23.11.0 |
| cookiecutter | 2.5.0 |
| curl | 8.12.1 |
| cython | 3.0.12 |
| dask | 2023.12.1 |
| datasets | 3.6.0 |
| deepspeed | 0.17.2 |
| delta-spark | 3.2.0 |
| evaluate | 0.4.5 |
| fastavro | 1.9.7 |
| fastparquet | 2023.10.1 |
| fiona | 1.10.0 |
| gateway-provisioners[yarn] | 0.4.0 |
| gcsfs | 2023.12.2.post1 |
| google-auth-oauthlib | 1.2.2 |
| google-cloud-aiplatform | 1.88.0 |
| google-cloud-bigquery[pandas] | 3.31.0 |
| google-cloud-bigquery-storage | 2.30.0 |
| google-cloud-bigtable | 2.30.1 |
| google-cloud-container | 2.56.1 |
| google-cloud-datacatalog | 3.26.1 |
| google-cloud-dataproc | 5.18.1 |
| google-cloud-datastore | 2.21.0 |
| google-cloud-language | 2.17.2 |
| google-cloud-logging | 3.11.4 |
| google-cloud-monitoring | 2.27.2 |
| google-cloud-pubsub | 2.29.1 |
| google-cloud-redis | 2.18.1 |
| google-cloud-spanner | 3.53.0 |
| google-cloud-speech | 2.32.0 |
| google-cloud-storage | 2.19.0 |
| google-cloud-texttospeech | 2.25.1 |
| google-cloud-translate | 3.20.3 |
| google-cloud-vision | 3.10.2 |
| huggingface_hub | 0.33.1 |
| httplib2 | 0.22.0 |
| ipyparallel | 8.6.1 |
| ipython-sql | 0.3.9 |
| ipywidgets | 8.1.7 |
| jupyter_contrib_nbextensions | 0.7.0 |
| jupyter_http_over_ws | 0.0.8 |
| jupyter_kernel_gateway | 2.5.2 |
| jupyter_server | 1.24.0 |
| jupyterhub | 4.1.6 |
| jupyterlab | 3.6.8 |
| jupyterlab-git | 0.44.0 |
| jupyterlab_widgets | 3.0.15 |
| koalas | 0.22.0 |
| langchain | 0.3.26 |
| lightgbm | 4.6.0 |
| markdown | 3.5.2 |
| matplotlib | 3.8.4 |
| mlflow | 3.1.1 |
| nbconvert | 7.14.2 |
| nbdime | 3.2.1 |
| nltk | 3.9.1 |
| notebook | 6.5.7 |
| numba | 0.58.1 |
| numpy | 1.26.4 |
| oauth2client | 4.1.3 |
| onnx | 1.17.0 |
| openblas | 0.3.25 |
| opencv | 4.11.0 |
| orc | 2.1.1 |
| pandas | 2.1.4 |
| pandas-profiling | 3.0.0 |
| papermill | 2.4.0 |
| pyarrow | 16.1.0 |
| pydot | 2.0.0 |
| pyhive | 0.7.0 |
| pynvml | 12.0.0 |
| pysal | 23.7 |
| pytables | 3.9.2 |
| python | 3.11 |
| regex | 2023.12.25 |
| requests | 2.32.2 |
| requests-kerberos | 0.12.0 |
| rtree | 1.1.0 |
| scikit-image | 0.22.0 |
| scikit-learn | 1.5.2 |
| scipy | 1.11.4 |
| seaborn | 0.13.2 |
| sentence-transformers | 5.0.0 |
| setuptools | 79.0.1 |
| shap | 0.48.0 |
| shapely | 2.1.1 |
| spacy | 3.8.7 |
| spark-tensorflow-distributor | 1.0.0 |
| spyder | 5.5.6 |
| sqlalchemy | 2.0.41 |
| sympy | 1.13.3 |
| tensorflow | 2.18.0 |
| tokenizers | 0.21.4.dev0 |
| toree | 0.5.0 |
| torch | 2.6.0 |
| torch-model-archiver | 0.11.1 |
| torcheval | 0.0.7 |
| tornado | 6.4.2 |
| torchvision | 0.21.0 |
| traitlets | 5.14.3 |
| transformers | 4.53.1 |
| uritemplate | 4.1.1 |
| virtualenv | 20.26.6 |
| wordcloud | 1.9.4 |
| xgboost | 2.1.4 |
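To confirm that a cluster's Python environment matches the versions in this table, you can compare installed distributions against an expected mapping. This is a minimal sketch: the helper `find_mismatches` is hypothetical, and only a handful of rows are included. Some names in the table (for example opencv, pytables, and openblas) may be conda package names with no identically named Python distribution, and a build can report a local suffix such as `+cu126` that would show up here as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# A few rows copied from the table above; extend as needed.
EXPECTED = {
    "numpy": "1.26.4",
    "pandas": "2.1.4",
    "scikit-learn": "1.5.2",
    "torch": "2.6.0",
    "xgboost": "2.1.4",
}


def find_mismatches(expected, lookup=version):
    """Return {package: (expected, installed or None)} for each package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # The distribution isn't installed under this name.
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

The `lookup` parameter exists so that the comparison can be exercised without the packages installed, for example by passing a function backed by a dictionary.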
R libraries
The following R library versions are included in the 2.3-ml-ubuntu image.
| Package Name | Version |
|---|---|
| r-ggplot2 | 3.4.4 |
| r-irkernel | 1.3.2 |
| r-rcurl | 1.98-1.16 |
| r-recommended | 4.3 |
Last updated 2026-02-19 UTC.