Dataproc components

Dataproc clusters feature the following types of components:

  • Installed components: Components that are installed in the image and activated when the cluster is created.

  • Optional components: Components that you select to install and use on your cluster when you create the cluster. Dataproc installs and activates optional components depending on the cluster image version.

  • Initialization action components: Components installed on a cluster as part of an initialization action that you specify when you create a cluster.

Optional components are installed on a cluster before initialization actions are run on the cluster.

The Dataproc image version pages list the components and component types available in the latest Dataproc image releases.

Optional components have the following advantages over initialization actions used to install components:

  • Optional components are tested as compatible with specific Dataproc versions.
  • Optional components are enabled with a cluster creation parameter; initialization actions require a script.

Available optional components

Optional component | Component name in Google Cloud CLI commands and API requests | Image version | Release stage
Delta Lake | DELTA | 2.2.46 and later | GA
Docker | DOCKER | 1.5 and later | GA
Flink | FLINK | 1.5 and later | GA
HBase | HBASE | 1.5 and later (not available in 2.1 and later) | Deprecated
Hive WebHCat | HIVE_WEBHCAT | 1.3 and later | GA
Hudi | HUDI | 1.5 and later | GA
Iceberg | ICEBERG | 2.2 and later | GA
Jupyter Notebook | JUPYTER | 1.3 and later | GA
Pig | PIG | 1.5* and later | GA
Presto | PRESTO | 1.3 and later (not available in 2.1 and later) | GA
Ranger | RANGER | 1.3 and later | GA
Solr | SOLR | 1.3 and later | GA
Trino | TRINO | 2.1 and later | GA
Zeppelin Notebook | ZEPPELIN | 1.3 and later | GA
Zookeeper | ZOOKEEPER | 1.0 and later | GA

Notes:

  • Apache Pig is an optional component in image versions 2.3 and later. It was pre-installed in 2.2 and earlier image versions.

See Cluster web interfaces for connecting to component web interfaces running on clusters. Also see the Dataproc Component Gateway, which lets you connect to the web interfaces of Dataproc core and optional components, including YARN, HDFS, Jupyter, and Zeppelin UIs, without requiring the use of SSH tunnels or the modification of firewall rules to allow inbound traffic.

Add optional components

Note: The following usage examples apply to General Availability (GA) components.

Console

  1. In the Google Cloud console, go to the Dataproc Create a cluster page.

    The Set up cluster panel is selected.

  2. In the Components section, under Optional components, select one or more components to install on your cluster.

Google Cloud CLI

To create a Dataproc cluster and install one or more optional components on the cluster, use the gcloud dataproc clusters create CLUSTER_NAME command with the --optional-components flag.

gcloud dataproc clusters create CLUSTER_NAME \
    --optional-components=COMPONENT-NAME(s) \
    ... other flags
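
As a concrete illustration of the command above, the following sketch assembles a cluster creation command that requests the JUPYTER and ZEPPELIN components from the table. The cluster name, region, and the build_create_cmd helper are hypothetical placeholders, not values from this page. The --enable-component-gateway flag is an optional extra, included here on the assumption that you want to reach the notebook UIs through the Component Gateway. The sketch only prints the command for review; it does not create a cluster.

```shell
# Hypothetical values; replace with your own.
CLUSTER_NAME="example-cluster"
REGION="us-central1"

# Comma-separated component names, taken from the
# "Component name" column of the table above.
OPTIONAL_COMPONENTS="JUPYTER,ZEPPELIN"

# Hypothetical helper: prints the command instead of running it (dry run).
build_create_cmd() {
  echo gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --region="${REGION}" \
    --optional-components="${OPTIONAL_COMPONENTS}" \
    --enable-component-gateway
}

build_create_cmd
```

To create the cluster for real, run the printed command in a shell where the gcloud CLI is installed and authenticated.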

REST API

Optional components can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
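
A minimal sketch of such a request follows, assuming the v1 REST endpoint and a request body that lists the components under config.softwareConfig.optionalComponents. The project ID, region, and cluster name are hypothetical placeholders, and a real request would normally carry more cluster configuration than shown here. The script only prints the endpoint and body; the commented curl call shows one way the request might be sent.

```shell
# Hypothetical values; replace with your own.
PROJECT_ID="example-project"
REGION="us-central1"
CLUSTER_NAME="example-cluster"

# clusters.create endpoint for the chosen project and region.
ENDPOINT="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Request body: optional components are listed by component name
# under config.softwareConfig.optionalComponents.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["JUPYTER", "ZEPPELIN"]
    }
  }
}
EOF
)

# Dry run: show what would be sent.
echo "POST ${ENDPOINT}"
echo "${REQUEST_BODY}"

# To send the request (assumes the gcloud CLI is installed and authenticated):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" \
#   "${ENDPOINT}"
```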

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.