Dataproc components
Dataproc clusters feature the following types of components:
- Installed components: Components that are installed in the image and activated when the cluster is created.
- Optional components: Components that you select to install and use on your cluster when you create the cluster. Dataproc installs and activates optional components depending on the cluster image version as follows:
  - 2.2 and earlier image versions: Optional components are automatically installed. Selected optional components are activated and non-selected optional components are uninstalled at cluster creation.
  - 2.3 and later image versions: All optional components are installed during cluster creation except the Jupyter, Iceberg, and Delta Lake optional components, which are pre-installed in 2.3 and later image versions. Pre-installed optional components are removed from a 2.3 or later image version cluster if they are not enabled when the cluster is created. For more information, see Dataproc 2.3.x release versions.

  To avoid increased startup time for 2.3 and later image version clusters, create a custom image with optional components pre-installed. You can do this by running generate_custom_image.py with the --optional-components flag.
- Initialization action components: Components installed on a cluster as part of an initialization action that you specify when you create a cluster.
Optional components are installed on a cluster before initialization actions are run on the cluster.
The Dataproc image version pages list the components and component types available in the latest Dataproc image releases.
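As a rough illustration of the custom image approach for 2.3 and later image versions, the following Python sketch assembles a generate_custom_image.py invocation. Only the --optional-components flag is named on this page. The other flags (--image-name, --dataproc-version, --customization-script, --zone, --gcs-bucket) are assumed from the Dataproc custom images tool, the comma-separated value format is an assumption, and every value shown is a hypothetical placeholder. Check the tool's own help output before relying on it.

```python
import shlex


def custom_image_command(image_name, dataproc_version, script, zone, bucket, components):
    """Build an argv list for generate_custom_image.py (flag names partly assumed)."""
    if not components:
        raise ValueError("at least one optional component is required")
    return [
        "python", "generate_custom_image.py",
        # The next five flags are assumed from the custom images tool, not from this page.
        "--image-name", image_name,
        "--dataproc-version", dataproc_version,
        "--customization-script", script,
        "--zone", zone,
        "--gcs-bucket", bucket,
        # Flag named on this page; the comma-separated format is an assumption.
        "--optional-components", ",".join(components),
    ]


# All values below are hypothetical placeholders.
argv = custom_image_command(
    image_name="example-custom-image",
    dataproc_version="2.3.0-debian12",
    script="customize.sh",
    zone="us-central1-a",
    bucket="gs://example-bucket",
    components=["TRINO", "ZEPPELIN"],
)
print(shlex.join(argv))
```

The sketch only prints the command, so it can be reviewed before anything is run.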
Optional components have the following advantages over initialization actions used to install components:
- Optional components are tested as compatible with specific Dataproc versions.
- Optional components are enabled with a cluster creation parameter; initialization actions require a script.
Available optional components
| Optional component | Component name in Google Cloud CLI commands and API requests | Image Version | Release Stage |
|---|---|---|---|
| Delta Lake | DELTA | 2.2.46 and later | GA |
| Docker | DOCKER | 1.5 and later | GA |
| Flink | FLINK | 1.5 and later | GA |
| HBase | HBASE | 1.5 and later (not available in 2.1 and later) | Deprecated |
| Hive WebHCat | HIVE_WEBHCAT | 1.3 and later | GA |
| Hudi | HUDI | 1.5 and later | GA |
| Iceberg | ICEBERG | 2.2 and later | GA |
| Jupyter Notebook | JUPYTER | 1.3 and later | GA |
| Pig | PIG | 1.5* and later | GA |
| Presto | PRESTO | 1.3 and later (not available in 2.1 and later) | GA |
| Ranger | RANGER | 1.3 and later | GA |
| Solr | SOLR | 1.3 and later | GA |
| Trino | TRINO | 2.1 and later | GA |
| Zeppelin Notebook | ZEPPELIN | 1.3 and later | GA |
| Zookeeper | ZOOKEEPER | 1.0 and later | GA |
Notes:
- Apache Pig is an optional component in image versions 2.3 and later. It was pre-installed in 2.2 and earlier image versions.
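The image version ranges in the table can be checked programmatically. The sketch below is a minimal, unofficial illustration: it encodes only a subset of the rows (Pig is left out because its footnote changes how the row should be read), and the helper names parse_version and is_available are hypothetical. A bare version such as 2.2 is treated as 2.2.0.

```python
# Subset of the table above: component -> (first version, version where it was removed).
AVAILABILITY = {
    "DELTA": ("2.2.46", None),
    "DOCKER": ("1.5", None),
    "HBASE": ("1.5", "2.1"),
    "JUPYTER": ("1.3", None),
    "PRESTO": ("1.3", "2.1"),
    "TRINO": ("2.1", None),
    "ZOOKEEPER": ("1.0", None),
}


def parse_version(version):
    """Turn '2.2.46-debian12' or '2.1' into a 3-tuple of ints, padding with zeros."""
    numeric = version.split("-", 1)[0]  # drop an OS suffix if present
    parts = [int(p) for p in numeric.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_available(component, image_version):
    """Return True if the table lists the component for this image version."""
    first, removed_in = AVAILABILITY[component]
    version = parse_version(image_version)
    if version < parse_version(first):
        return False
    return removed_in is None or version < parse_version(removed_in)


print(is_available("TRINO", "2.1"))
print(is_available("PRESTO", "2.1"))
print(is_available("DELTA", "2.2.46-debian12"))
```

Tuples compare element by element, which is why 2.2.46 sorts after 2.2.5 here, unlike a plain string comparison.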
Add optional components
Note: The following usage examples apply to General Availability (GA) components.
Console
- In the Google Cloud console, go to the Dataproc Create a cluster page.
  The Set up cluster panel is selected.
- In the Components section, under Optional components, select one or more components to install on your cluster.
Google Cloud CLI
To create a Dataproc cluster and install one or more optional components on the cluster, use the gcloud dataproc clusters create CLUSTER_NAME command with the --optional-components flag.
    gcloud dataproc clusters create CLUSTER_NAME \
        --optional-components=COMPONENT-NAME(s) \
        ... other flags
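For a concrete version of the command above, this Python sketch builds the argument list with several components joined by commas. The cluster name, region, and component choice are hypothetical placeholders, the helper name create_cluster_command is made up for illustration, and --region is included on the assumption that no default region is configured.

```python
import shlex


def create_cluster_command(cluster_name, region, components):
    """Build the gcloud argv for creating a cluster with optional components."""
    if not components:
        raise ValueError("at least one optional component is required")
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Component names come from the second column of the table above.
        f"--optional-components={','.join(components)}",
    ]


# Placeholder values; substitute your own cluster name and region.
argv = create_cluster_command("example-cluster", "us-central1", ["JUPYTER", "ZEPPELIN"])
print(shlex.join(argv))
```

To actually run the command, the list could be passed to subprocess.run(argv, check=True), assuming the Google Cloud CLI is installed and authenticated.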
REST API
Optional components can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
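As a sketch of what such a request might carry, the following Python snippet builds a JSON body in which the components appear under config.softwareConfig.optionalComponents. The project ID, cluster name, image version, and component list are hypothetical placeholders, and the helper name create_cluster_body is made up for illustration. The snippet only prints the body; it does not send a request.

```python
import json


def create_cluster_body(project_id, cluster_name, image_version, components):
    """Build a clusters.create request body that enables optional components."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "softwareConfig": {
                "imageVersion": image_version,
                # Values are SoftwareConfig.Component enum names, for example JUPYTER.
                "optionalComponents": list(components),
            }
        },
    }


# Placeholder values for illustration only.
body = create_cluster_body("example-project", "example-cluster", "2.2", ["JUPYTER", "ZEPPELIN"])
print(json.dumps(body, indent=2))
```

A body like this would be sent with an authenticated POST to the clusters collection for the project and region. See the clusters.create reference for the exact URL and the remaining cluster fields.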
Last updated 2025-12-15 UTC.