gcloud dataproc clusters gke create

NAME
gcloud dataproc clusters gke create - create a GKE-based virtual cluster
SYNOPSIS
gcloud dataproc clusters gke create (CLUSTER : --region=REGION) --spark-engine-version=SPARK_ENGINE_VERSION (--gke-cluster=GKE_CLUSTER : --gke-cluster-location=GKE_CLUSTER_LOCATION) [--async] [--namespace=NAMESPACE] [--pools=[KEY=VALUE[;VALUE],…]] [--properties=[PREFIX:PROPERTY=VALUE,…]] [--setup-workload-identity] [--staging-bucket=STAGING_BUCKET] [--history-server-cluster=HISTORY_SERVER_CLUSTER : --history-server-cluster-region=HISTORY_SERVER_CLUSTER_REGION] [--metastore-service=METASTORE_SERVICE : --metastore-service-location=METASTORE_SERVICE_LOCATION] [GCLOUD_WIDE_FLAG …]
DESCRIPTION
Create a GKE-based virtual cluster.
EXAMPLES
Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and region with default values:

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --pools='name=dp,roles=default'

Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and zone us-central1-f with default values:

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --gke-cluster-location=us-central1-f \
        --spark-engine-version=3.1 \
        --pools='name=dp,roles=default'

Create a Dataproc on GKE cluster in us-central1 with machine type 'e2-standard-4', autoscaling 5-15 nodes per zone:

    $ gcloud dataproc clusters gke create my-cluster \
        --region='us-central1' \
        --gke-cluster='projects/my-project/locations/us-central1/clusters/my-gke-cluster' \
        --spark-engine-version=dataproc-1.5 \
        --pools='name=dp-default,roles=default,machineType=e2-standard-4,min=5,max=15'

Create a Dataproc on GKE cluster in us-central1 with two distinct node pools:

    $ gcloud dataproc clusters gke create my-cluster \
        --region='us-central1' \
        --gke-cluster='my-gke-cluster' \
        --spark-engine-version='dataproc-2.0' \
        --pools='name=dp-default,roles=default,machineType=e2-standard-4' \
        --pools='name=workers,roles=spark-drivers;spark-executors,machineType=n2-standard-8'
POSITIONAL ARGUMENTS
Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.

This must be specified.

CLUSTER
ID of the cluster or fully qualified identifier for the cluster.

To set the cluster attribute:

  • provide the argument cluster on the command line.

This positional argument must be specified if any of the other arguments in this group are specified.

--region=REGION
Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.

To set the region attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
REQUIRED FLAGS
--spark-engine-version=SPARK_ENGINE_VERSION
The version of the Spark engine to run on this cluster.
Gke cluster resource - The GKE cluster to install the Dataproc cluster on. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --gke-cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.

This must be specified.

--gke-cluster=GKE_CLUSTER
ID of the gke-cluster or fully qualified identifier for the gke-cluster.

To set the gke-cluster attribute:

  • provide the argument --gke-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--gke-cluster-location=GKE_CLUSTER_LOCATION
GKE region for the gke-cluster.

To set the gke-cluster-location attribute:

  • provide the argument --gke-cluster on the command line with a fully specified name;
  • provide the argument --gke-cluster-location on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
OPTIONAL FLAGS
--async
Return immediately, without waiting for the operation in progress to complete.
--namespace=NAMESPACE
The name of the Kubernetes namespace to deploy Dataproc system components in. This namespace does not need to exist.
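
For example, a hypothetical invocation that deploys the Dataproc components into a namespace named dataproc-system (my-cluster, my-gke-cluster, and dataproc-system are placeholder names, not defaults):

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --namespace=dataproc-system \
        --pools='name=dp,roles=default'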
--pools=[KEY=VALUE[;VALUE],…]
Each --pools flag represents a GKE node pool associated with the virtual cluster. It is comprised of a CSV in the form KEY=VALUE[;VALUE], where certain keys may have multiple values.

The following KEYs must be specified:

  • name (string): Name of the node pool. Example: `my-node-pool`.
  • roles (repeated string): Roles that this node pool should perform. Valid values are `default`, `controller`, `spark-driver`, `spark-executor`. Example: `default;spark-driver`.

The following KEYs may be specified:

  • machineType (string): Compute Engine machine type to use. Example: `n1-standard-8`.
  • preemptible (boolean): If true, then this node pool uses preemptible VMs. This cannot be true on the node pool with the `controller` role (or the `default` role if the `controller` role is not specified). Example: `false`.
  • localSsdCount (int): The number of local SSDs to attach to each node. Example: `2`.
  • accelerator (repeated string): Accelerators to attach to each node, in the format NAME=COUNT. Example: `nvidia-tesla-a100=1`.
  • minCpuPlatform (string): Minimum CPU platform for each node. Example: `Intel Skylake`.
  • bootDiskKmsKey (string): The Customer Managed Encryption Key (CMEK) used to encrypt the boot disk attached to each node in the node pool. Example: `projects/project-id/locations/us-central1/keyRings/keyRing-name/cryptoKeys/key-name`.
  • locations (repeated string): Zones within the location of the GKE cluster. All `--pools` flags for a Dataproc cluster must have identical locations. Example: `us-west1-a;us-west1-c`.
  • min (int): Minimum number of nodes per zone that this node pool can scale down to. Example: `0`.
  • max (int): Maximum number of nodes per zone that this node pool can scale up to. Example: `10`.
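
Combining several of these keys, a hypothetical worker pool with preemptible n2-standard-8 VMs, two local SSDs, and autoscaling between 0 and 10 nodes per zone might be specified as follows (my-cluster and my-gke-cluster are placeholder names; note preemptible is set only on a pool without the `default` or `controller` role):

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --pools='name=dp-default,roles=default,machineType=e2-standard-4' \
        --pools='name=workers,roles=spark-executor,machineType=n2-standard-8,preemptible=true,localSsdCount=2,min=0,max=10'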
--properties=[PREFIX:PROPERTY=VALUE,…]
Specifies configuration properties for installed packages, such as Spark. Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations".
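
For example, a hypothetical invocation that sets a Spark configuration property via the spark: prefix (the cluster names and the chosen property value are placeholders for illustration):

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --pools='name=dp,roles=default' \
        --properties='spark:spark.executor.memory=4g'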
--setup-workload-identity
Sets up the GKE Workload Identity for your Dataproc on GKE cluster. Note that running this requires elevated permissions, as it will manipulate IAM policies on the Google Service Accounts that will be used by your Dataproc on GKE cluster.
--staging-bucket=STAGING_BUCKET
The Cloud Storage bucket to use to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
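
For example, to stage job dependencies in a bucket named my-staging-bucket (a placeholder name; the other flag values are likewise placeholders):

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --pools='name=dp,roles=default' \
        --staging-bucket=my-staging-bucket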
History server cluster resource - A Dataproc Cluster created as a History Server; see https://cloud.google.com/dataproc/docs/concepts/jobs/history-server. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --history-server-cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--history-server-cluster=HISTORY_SERVER_CLUSTER
ID of the history-server-cluster or fully qualified identifier for the history-server-cluster.

To set the history-server-cluster attribute:

  • provide the argument --history-server-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--history-server-cluster-region=HISTORY_SERVER_CLUSTER_REGION
Compute Engine region for the history-server-cluster. It must be the same region as the Dataproc cluster that is being created.

To set the history-server-cluster-region attribute:

  • provide the argument --history-server-cluster on the command line with a fully specified name;
  • provide the argument --history-server-cluster-region on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
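
Taken together, the history server flags might be used as follows (my-history-server is a placeholder for an existing Dataproc History Server cluster in the same region; the other names are likewise placeholders):

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --pools='name=dp,roles=default' \
        --history-server-cluster=my-history-server \
        --history-server-cluster-region=us-central1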
Metastore service resource - Dataproc Metastore Service to be used as an external metastore. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --metastore-service on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--metastore-service=METASTORE_SERVICE
ID of the metastore-service or fully qualified identifier for the metastore-service.

To set the metastore-service attribute:

  • provide the argument --metastore-service on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--metastore-service-location=METASTORE_SERVICE_LOCATION
Dataproc Metastore location for the metastore-service.

To set the metastore-service-location attribute:

  • provide the argument --metastore-service on the command line with a fully specified name;
  • provide the argument --metastore-service-location on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
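
Taken together, the metastore flags might be used as follows (my-metastore is a placeholder for an existing Dataproc Metastore service; the other names are likewise placeholders):

    $ gcloud dataproc clusters gke create my-cluster \
        --region=us-central1 \
        --gke-cluster=my-gke-cluster \
        --spark-engine-version=latest \
        --pools='name=dp,roles=default' \
        --metastore-service=my-metastore \
        --metastore-service-location=us-central1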
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES
These variants are also available:
    gcloud alpha dataproc clusters gke create
    gcloud beta dataproc clusters gke create


Last updated 2026-01-21 UTC.