gcloud dataproc clusters gke create
- NAME
- gcloud dataproc clusters gke create - create a GKE-based virtual cluster
- SYNOPSIS
gcloud dataproc clusters gke create (CLUSTER : --region=REGION) --spark-engine-version=SPARK_ENGINE_VERSION (--gke-cluster=GKE_CLUSTER : --gke-cluster-location=GKE_CLUSTER_LOCATION) [--async] [--namespace=NAMESPACE] [--pools=[KEY=VALUE[;VALUE],…]] [--properties=[PREFIX:PROPERTY=VALUE,…]] [--setup-workload-identity] [--staging-bucket=STAGING_BUCKET] [--history-server-cluster=HISTORY_SERVER_CLUSTER : --history-server-cluster-region=HISTORY_SERVER_CLUSTER_REGION] [--metastore-service=METASTORE_SERVICE : --metastore-service-location=METASTORE_SERVICE_LOCATION] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- Create a GKE-based virtual cluster.
- EXAMPLES
- Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and region with default values:

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default'

- Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and zone us-central1-f with default values:

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --gke-cluster-location=us-central1-f --spark-engine-version=3.1 --pools='name=dp,roles=default'

- Create a Dataproc on GKE cluster in us-central1 with machine type 'e2-standard-4', autoscaling 5-15 nodes per zone:

gcloud dataproc clusters gke create my-cluster --region='us-central1' --gke-cluster='projects/my-project/locations/us-central1/clusters/my-gke-cluster' --spark-engine-version=dataproc-1.5 --pools='name=dp-default,roles=default,machineType=e2-standard-4,min=5,max=15'

- Create a Dataproc on GKE cluster in us-central1 with two distinct node pools:

gcloud dataproc clusters gke create my-cluster --region='us-central1' --gke-cluster='my-gke-cluster' --spark-engine-version='dataproc-2.0' --pools='name=dp-default,roles=default,machineType=e2-standard-4' --pools='name=workers,roles=spark-drivers;spark-executors,machineType=n2-standard-8'

- POSITIONAL ARGUMENTS
- Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:
- provide the argument cluster on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.

This must be specified.

CLUSTER - ID of the cluster or fully qualified identifier for the cluster.

To set the cluster attribute:
- provide the argument cluster on the command line.

This positional argument must be specified if any of the other arguments in this group are specified.

--region=REGION - Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.

To set the region attribute:
- provide the argument cluster on the command line with a fully specified name;
- provide the argument --region on the command line;
- set the property dataproc/region.
- REQUIRED FLAGS
--spark-engine-version=SPARK_ENGINE_VERSION - The version of the Spark engine to run on this cluster.

- Gke cluster resource - The GKE cluster to install the Dataproc cluster on. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:
- provide the argument --gke-cluster on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.

This must be specified.

--gke-cluster=GKE_CLUSTER - ID of the gke-cluster or fully qualified identifier for the gke-cluster.

To set the gke-cluster attribute:
- provide the argument --gke-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--gke-cluster-location=GKE_CLUSTER_LOCATION - GKE region for the gke-cluster.

To set the gke-cluster-location attribute:
- provide the argument --gke-cluster on the command line with a fully specified name;
- provide the argument --gke-cluster-location on the command line;
- provide the argument --region on the command line;
- set the property dataproc/region.
- OPTIONAL FLAGS
--async - Return immediately, without waiting for the operation in progress to complete.

--namespace=NAMESPACE - The name of the Kubernetes namespace to deploy Dataproc system components in. This namespace does not need to exist.
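For illustration, a minimal sketch reusing the names from the EXAMPLES section and combining these two flags; the namespace name dataproc-system is hypothetical, and it will be created if it does not exist:

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default' --namespace=dataproc-system --async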
--pools=[KEY=VALUE[;VALUE],…] - Each --pools flag represents a GKE node pool associated with the virtual cluster. It is comprised of a CSV in the form KEY=VALUE[;VALUE], where certain keys may have multiple values.

The following KEYs must be specified:

KEY     Type              Example                  Description
name    string            `my-node-pool`           Name of the node pool.
roles   repeated string   `default;spark-driver`   Roles that this node pool should perform. Valid values are `default`, `controller`, `spark-driver`, `spark-executor`.

The following KEYs may be specified:

KEY              Type              Example                   Description
machineType      string            `n1-standard-8`           Compute Engine machine type to use.
preemptible      boolean           `false`                   If true, then this node pool uses preemptible VMs. This cannot be true on the node pool with the `controllers` role (or `default` role if `controllers` role is not specified).
localSsdCount    int               `2`                       The number of local SSDs to attach to each node.
accelerator      repeated string   `nvidia-tesla-a100=1`     Accelerators to attach to each node. In the format NAME=COUNT.
minCpuPlatform   string            `Intel Skylake`           Minimum CPU platform for each node.
bootDiskKmsKey   string            `projects/project-id/locations/us-central1/keyRings/keyRing-name/cryptoKeys/key-name`   The Customer Managed Encryption Key (CMEK) used to encrypt the boot disk attached to each node in the node pool.
locations        repeated string   `us-west1-a;us-west1-c`   Zones within the location of the GKE cluster. All `--pools` flags for a Dataproc cluster must have identical locations.
min              int               `0`                       Minimum number of nodes per zone that this node pool can scale down to.
max              int               `10`                      Maximum number of nodes per zone that this node pool can scale up to.
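As an illustrative sketch (the pool name, machine type, accelerator, and zones below are hypothetical values, not defaults), a single --pools flag combining several of the optional KEYs above:

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp-gpu,roles=default,machineType=n1-standard-8,accelerator=nvidia-tesla-a100=1,locations=us-central1-a;us-central1-b,min=0,max=10'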
--properties=[PREFIX:PROPERTY=VALUE,…] - Specifies configuration properties for installed packages, such as Spark. Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations".
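For example, a sketch that sets two standard Spark settings through the spark: prefix (the property names come from Spark itself, not from this command):

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default' --properties='spark:spark.executor.cores=2,spark:spark.executor.memory=4g'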
--setup-workload-identity - Sets up the GKE Workload Identity for your Dataproc on GKE cluster. Note that running this requires elevated permissions as it will manipulate IAM policies on the Google Service Accounts that will be used by your Dataproc on GKE cluster.

--staging-bucket=STAGING_BUCKET - The Cloud Storage bucket to use to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
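A minimal sketch combining these two flags with the default example; the bucket name my-staging-bucket is hypothetical, and the invocation assumes credentials permitted to modify IAM policies on the relevant service accounts:

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default' --setup-workload-identity --staging-bucket=my-staging-bucket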
- History server cluster resource - A Dataproc Cluster created as a History Server, see https://cloud.google.com/dataproc/docs/concepts/jobs/history-server. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:
- provide the argument --history-server-cluster on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.

--history-server-cluster=HISTORY_SERVER_CLUSTER - ID of the history-server-cluster or fully qualified identifier for the history-server-cluster.

To set the history-server-cluster attribute:
- provide the argument --history-server-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--history-server-cluster-region=HISTORY_SERVER_CLUSTER_REGION - Compute Engine region for the history-server-cluster. It must be the same region as the Dataproc cluster that is being created.

To set the history-server-cluster-region attribute:
- provide the argument --history-server-cluster on the command line with a fully specified name;
- provide the argument --history-server-cluster-region on the command line;
- provide the argument --region on the command line;
- set the property dataproc/region.
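A sketch attaching a pre-existing History Server cluster (the name my-history-server is hypothetical) in the same region as the new cluster:

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default' --history-server-cluster=my-history-server --history-server-cluster-region=us-central1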
- Metastore service resource - Dataproc Metastore Service to be used as an external metastore. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:
- provide the argument --metastore-service on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.

--metastore-service=METASTORE_SERVICE - ID of the metastore-service or fully qualified identifier for the metastore-service.

To set the metastore-service attribute:
- provide the argument --metastore-service on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--metastore-service-location=METASTORE_SERVICE_LOCATION - Dataproc Metastore location for the metastore-service.

To set the metastore-service-location attribute:
- provide the argument --metastore-service on the command line with a fully specified name;
- provide the argument --metastore-service-location on the command line;
- provide the argument --region on the command line;
- set the property dataproc/region.
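A sketch pointing the cluster at a pre-existing Dataproc Metastore service (the name my-metastore is hypothetical):

gcloud dataproc clusters gke create my-cluster --region=us-central1 --gke-cluster=my-gke-cluster --spark-engine-version=latest --pools='name=dp,roles=default' --metastore-service=my-metastore --metastore-service-location=us-central1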
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

- NOTES
- These variants are also available:
gcloud alpha dataproc clusters gke create
gcloud beta dataproc clusters gke create