gcloud dataproc clusters create
- NAME
- gcloud dataproc clusters create - create a cluster
- SYNOPSIS
gcloud dataproc clusters create (CLUSTER : --region=REGION) [--action-on-failed-primary-workers=ACTION_ON_FAILED_PRIMARY_WORKERS] [--async] [--autoscaling-policy=AUTOSCALING_POLICY] [--bucket=BUCKET] [--cluster-type=TYPE] [--confidential-compute] [--dataproc-metastore=DATAPROC_METASTORE] [--delete-max-idle=DELETE_MAX_IDLE] [--driver-pool-accelerator=[type=TYPE,[count=COUNT],…]] [--driver-pool-boot-disk-size=DRIVER_POOL_BOOT_DISK_SIZE] [--driver-pool-boot-disk-type=DRIVER_POOL_BOOT_DISK_TYPE] [--driver-pool-id=DRIVER_POOL_ID] [--driver-pool-local-ssd-interface=DRIVER_POOL_LOCAL_SSD_INTERFACE] [--driver-pool-machine-type=DRIVER_POOL_MACHINE_TYPE] [--driver-pool-min-cpu-platform=PLATFORM] [--driver-pool-size=DRIVER_POOL_SIZE] [--enable-component-gateway] [--initialization-action-timeout=TIMEOUT; default="10m"] [--initialization-actions=CLOUD_STORAGE_URI,[…]] [--labels=[KEY=VALUE,…]] [--master-accelerator=[type=TYPE,[count=COUNT],…]] [--master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_PROVISIONED_IOPS] [--master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_PROVISIONED_THROUGHPUT] [--master-boot-disk-size=MASTER_BOOT_DISK_SIZE] [--master-boot-disk-type=MASTER_BOOT_DISK_TYPE] [--master-local-ssd-interface=MASTER_LOCAL_SSD_INTERFACE] [--master-machine-type=MASTER_MACHINE_TYPE] [--master-min-cpu-platform=PLATFORM] [--min-secondary-worker-fraction=MIN_SECONDARY_WORKER_FRACTION] [--node-group=NODE_GROUP] [--num-driver-pool-local-ssds=NUM_DRIVER_POOL_LOCAL_SSDS] [--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS] [--num-masters=NUM_MASTERS] [--num-secondary-worker-local-ssds=NUM_SECONDARY_WORKER_LOCAL_SSDS] [--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS] [--optional-components=[COMPONENT,…]] [--private-ipv6-google-access-type=PRIVATE_IPV6_GOOGLE_ACCESS_TYPE] [--properties=[PREFIX:PROPERTY=VALUE,…]] [--secondary-worker-accelerator=[type=TYPE,[count=COUNT],…]] [--secondary-worker-boot-disk-size=SECONDARY_WORKER_BOOT_DISK_SIZE] [--secondary-worker-boot-disk-type=SECONDARY_WORKER_BOOT_DISK_TYPE] [--secondary-worker-local-ssd-interface=SECONDARY_WORKER_LOCAL_SSD_INTERFACE] [--secondary-worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK]] [--secondary-worker-standard-capacity-base=SECONDARY_WORKER_STANDARD_CAPACITY_BASE] [--secondary-worker-standard-capacity-percent-above-base=SECONDARY_WORKER_STANDARD_CAPACITY_PERCENT_ABOVE_BASE] [--shielded-integrity-monitoring] [--shielded-secure-boot] [--shielded-vtpm] [--stop-max-idle=STOP_MAX_IDLE] [--temp-bucket=TEMP_BUCKET] [--tier=TIER] [--worker-accelerator=[type=TYPE,[count=COUNT],…]] [--worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_PROVISIONED_IOPS] [--worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_PROVISIONED_THROUGHPUT] [--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE] [--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE] [--worker-local-ssd-interface=WORKER_LOCAL_SSD_INTERFACE] [--worker-min-cpu-platform=PLATFORM] [--zone=ZONE, -z ZONE] [--delete-expiration-time=DELETE_EXPIRATION_TIME | --delete-max-age=DELETE_MAX_AGE] [--gce-pd-kms-key=GCE_PD_KMS_KEY : --gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING --gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION --gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT] [--identity-config-file=IDENTITY_CONFIG_FILE | --secure-multi-tenancy-user-mapping=SECURE_MULTI_TENANCY_USER_MAPPING] [--image=IMAGE | --image-version=VERSION] [--kerberos-config-file=KERBEROS_CONFIG_FILE | --enable-kerberos --kerberos-root-principal-password-uri=KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI [--kerberos-kms-key=KERBEROS_KMS_KEY : --kerberos-kms-key-keyring=KERBEROS_KMS_KEY_KEYRING --kerberos-kms-key-location=KERBEROS_KMS_KEY_LOCATION --kerberos-kms-key-project=KERBEROS_KMS_KEY_PROJECT]] [--kms-key=KMS_KEY : --kms-keyring=KMS_KEYRING --kms-location=KMS_LOCATION --kms-project=KMS_PROJECT] [--metadata=KEY=VALUE,[KEY=VALUE,…] --resource-manager-tags=KEY=VALUE,[KEY=VALUE,…] --scopes=SCOPE,[SCOPE,…] --service-account=SERVICE_ACCOUNT --tags=TAG,[TAG,…] --network=NETWORK | --subnet=SUBNET --reservation=RESERVATION --reservation-affinity=RESERVATION_AFFINITY; default="any"] [[--metric-sources=[METRIC_SOURCE,…] : --metric-overrides=[METRIC_SOURCE:INSTANCE:GROUP:METRIC,…] | --metric-overrides-file=METRIC_OVERRIDES_FILE]] [--no-address | --public-ip-address] [--single-node | --min-num-workers=MIN_NUM_WORKERS --num-secondary-workers=NUM_SECONDARY_WORKERS --num-workers=NUM_WORKERS --secondary-worker-type=TYPE; default="preemptible"] [--stop-expiration-time=STOP_EXPIRATION_TIME | --stop-max-age=STOP_MAX_AGE] [--worker-machine-type=WORKER_MACHINE_TYPE | --worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK]] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- Create a cluster.
- EXAMPLES
- To create a cluster, run:
gcloud dataproc clusters create my-cluster --region=us-central1
- POSITIONAL ARGUMENTS
- Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
To set the project attribute:
- provide the argument cluster on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.
This must be specified.
CLUSTER - ID of the cluster or fully qualified identifier for the cluster.
To set the cluster attribute:
- provide the argument cluster on the command line.
This positional argument must be specified if any of the other arguments in this group are specified.
--region=REGION - Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.
To set the region attribute:
- provide the argument cluster on the command line with a fully specified name;
- provide the argument --region on the command line;
- set the property dataproc/region.
- FLAGS
--action-on-failed-primary-workers=ACTION_ON_FAILED_PRIMARY_WORKERS - Failure action to take when primary workers fail during cluster creation. ACTION_ON_FAILED_PRIMARY_WORKERS must be one of:
- DELETE - delete the failed primary workers
- FAILURE_ACTION_UNSPECIFIED - failure action is not specified
- NO_ACTION - take no action
--async - Return immediately, without waiting for the operation in progress to complete.
--autoscaling-policy=AUTOSCALING_POLICY - ID of the autoscaling policy or fully qualified identifier for the autoscaling policy.
To set the autoscaling_policy attribute:
- provide the argument --autoscaling-policy on the command line.
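For example, assuming an autoscaling policy named my-policy has already been created in the same project and region (hypothetical name), it can be attached at cluster creation time:
gcloud dataproc clusters create my-cluster --region=us-central1 --autoscaling-policy=my-policy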
--bucket=BUCKET - The Google Cloud Storage bucket to use by default to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
--cluster-type=TYPE - The type of cluster. TYPE must be one of: standard, single-node, zero-scale.
--confidential-compute - Enables Confidential VM. See https://cloud.google.com/compute/confidential-vm/docs for more information. Note that Confidential VM can only be enabled when the machine types are N2D (https://cloud.google.com/compute/docs/machine-types#n2d_machine_types) and the image is SEV Compatible.
--dataproc-metastore=DATAPROC_METASTORE - Specify the name of a Dataproc Metastore service to be used as an external metastore in the format: "projects/{project-id}/locations/{region}/services/{service-name}".
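For example, a sketch that attaches a pre-existing Dataproc Metastore service using the format above (project, region, and service names are hypothetical):
gcloud dataproc clusters create my-cluster --region=us-central1 --dataproc-metastore=projects/my-project/locations/us-central1/services/my-metastore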
--delete-max-idle=DELETE_MAX_IDLE - The duration after the last job completes to auto-delete the cluster, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
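For instance, to auto-delete the cluster after two idle hours (illustrative value):
gcloud dataproc clusters create my-cluster --region=us-central1 --delete-max-idle=2h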
--driver-pool-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the driver-pool instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
--driver-pool-boot-disk-size=DRIVER_POOL_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--driver-pool-boot-disk-type=DRIVER_POOL_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--driver-pool-id=DRIVER_POOL_ID - Custom identifier for the DRIVER Node Group being created. If not provided, a random string is generated.
--driver-pool-local-ssd-interface=DRIVER_POOL_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to cluster driver pool node(s).
--driver-pool-machine-type=DRIVER_POOL_MACHINE_TYPE - The type of machine to use for the cluster driver pool nodes. Defaults to server-specified.
--driver-pool-min-cpu-platform=PLATFORM - When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE
CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).
--driver-pool-size=DRIVER_POOL_SIZE - The size of the cluster driver pool.
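Putting the driver-pool flags together, a hypothetical cluster with a two-node driver pool (machine type and pool ID are illustrative; consult the Dataproc driver pool documentation for applicable constraints):
gcloud dataproc clusters create my-cluster --region=us-central1 --driver-pool-size=2 --driver-pool-machine-type=n1-standard-8 --driver-pool-id=my-driver-pool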
--enable-component-gateway - Enable access to the web UIs of selected components on the cluster through the component gateway.
--initialization-action-timeout=TIMEOUT; default="10m" - The maximum duration of each initialization action. See $ gcloud topic datetimes for information on duration formats.
--initialization-actions=CLOUD_STORAGE_URI,[…] - A list of Google Cloud Storage URIs of executables to run on each node in the cluster.
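For example, a sketch that runs a hypothetical startup script from Cloud Storage on each node, with a longer timeout than the 10m default (bucket and script names are illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 --initialization-actions=gs://my-bucket/my-init.sh --initialization-action-timeout=20m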
--labels=[KEY=VALUE,…] - List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
--master-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the master instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
--master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_PROVISIONED_IOPS - Indicates the IOPS to provision for the disk. This sets the limit for disk I/O operations per second. This is only supported if the boot disk type is hyperdisk-balanced.
--master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_PROVISIONED_THROUGHPUT - Indicates the throughput to provision for the disk. This sets the limit for throughput in MiB per second. This is only supported if the boot disk type is hyperdisk-balanced.
--master-boot-disk-size=MASTER_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--master-boot-disk-type=MASTER_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--master-local-ssd-interface=MASTER_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to master node(s) in a cluster.
--master-machine-type=MASTER_MACHINE_TYPE - The type of machine to use for the master. Defaults to server-specified.
--master-min-cpu-platform=PLATFORM - When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE
CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).
--min-secondary-worker-fraction=MIN_SECONDARY_WORKER_FRACTION - Minimum fraction of secondary worker nodes required to create the cluster. If it is not met, cluster creation will fail. Must be a decimal value between 0 and 1. The number of required secondary workers is calculated by ceil(min-secondary-worker-fraction * num_secondary_workers). Defaults to 0.0001.
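As a worked example of --min-secondary-worker-fraction: with --num-secondary-workers=10 and --min-secondary-worker-fraction=0.5, cluster creation succeeds only if at least ceil(0.5 * 10) = 5 secondary workers are provisioned.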
--node-group=NODE_GROUP - The name of the sole-tenant node group to create the cluster on. Can be a short name ("node-group-name") or in the format "projects/{project-id}/zones/{zone}/nodeGroups/{node-group-name}".
--num-driver-pool-local-ssds=NUM_DRIVER_POOL_LOCAL_SSDS - The number of local SSDs to attach to each cluster driver pool node.
--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS - The number of local SSDs to attach to the master in a cluster.
--num-masters=NUM_MASTERS - The number of master nodes in the cluster.

Number of Masters  Cluster Mode
1                  Standard
3                  High Availability

--num-secondary-worker-local-ssds=NUM_SECONDARY_WORKER_LOCAL_SSDS - The number of local SSDs to attach to each preemptible worker in a cluster.
--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS - The number of local SSDs to attach to each worker in a cluster.
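For example, to create a high-availability cluster with three masters, per the --num-masters table above:
gcloud dataproc clusters create my-cluster --region=us-central1 --num-masters=3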
--optional-components=[COMPONENT,…] - List of optional components to be installed on cluster machines.
The following page documents the optional components that can be installed: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/optional-components.
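For example, a hypothetical cluster that installs two components and exposes their web UIs through the component gateway (component names must match the list on the page above):
gcloud dataproc clusters create my-cluster --region=us-central1 --optional-components=JUPYTER,ZEPPELIN --enable-component-gateway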
--private-ipv6-google-access-type=PRIVATE_IPV6_GOOGLE_ACCESS_TYPE - The private IPv6 Google access type for the cluster. PRIVATE_IPV6_GOOGLE_ACCESS_TYPE must be one of: inherit-subnetwork, outbound, bidirectional.
--properties=[PREFIX:PROPERTY=VALUE,…] - Specifies configuration properties for installed packages, such as Hadoop and Spark.
Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:

Prefix              File                    Purpose of file
capacity-scheduler  capacity-scheduler.xml  Hadoop YARN Capacity Scheduler configuration
core                core-site.xml           Hadoop general configuration
distcp              distcp-default.xml      Hadoop Distributed Copy configuration
hadoop-env          hadoop-env.sh           Hadoop specific environment variables
hdfs                hdfs-site.xml           Hadoop HDFS configuration
hive                hive-site.xml           Hive configuration
mapred              mapred-site.xml         Hadoop MapReduce configuration
mapred-env          mapred-env.sh           Hadoop MapReduce specific environment variables
pig                 pig.properties          Pig configuration
spark               spark-defaults.conf     Spark configuration
spark-env           spark-env.sh            Spark specific environment variables
yarn                yarn-site.xml           Hadoop YARN configuration
yarn-env            yarn-env.sh             Hadoop YARN specific environment variables

See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
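For example, a sketch that sets a Spark default and a Hadoop core property using the prefixes above (property values are illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 --properties=spark:spark.executor.memory=4g,core:io.file.buffer.size=65536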
--secondary-worker-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the secondary-worker instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
--secondary-worker-boot-disk-size=SECONDARY_WORKER_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--secondary-worker-boot-disk-type=SECONDARY_WORKER_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--secondary-worker-local-ssd-interface=SECONDARY_WORKER_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to each secondary worker in a cluster.
--secondary-worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK] - Types of machines with optional rank for secondary workers to use. Defaults to server-specified. For example: --secondary-worker-machine-types="type=e2-standard-8,type=t2d-standard-8,rank=0"
--secondary-worker-standard-capacity-base=SECONDARY_WORKER_STANDARD_CAPACITY_BASE - This flag sets the base number of Standard VMs to use for secondary workers. Dataproc will create only standard VMs until it reaches this number, then it will mix Spot and Standard VMs according to SECONDARY_WORKER_STANDARD_CAPACITY_PERCENT_ABOVE_BASE.
--secondary-worker-standard-capacity-percent-above-base=SECONDARY_WORKER_STANDARD_CAPACITY_PERCENT_ABOVE_BASE - When combining Standard and Spot VMs for secondary workers, once the number of Standard VMs specified by SECONDARY_WORKER_STANDARD_CAPACITY_BASE has been used, this flag specifies the percentage of the total number of additional Standard VMs secondary workers will use. Spot VMs will be used for the remaining percentage.
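On one reading of the two flags above (values illustrative): with --num-secondary-workers=10, --secondary-worker-standard-capacity-base=4, and --secondary-worker-standard-capacity-percent-above-base=50, the first 4 secondary workers are Standard VMs, and of the remaining 6, 50% (3) are Standard and the rest (3) are Spot.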
--shielded-integrity-monitoring - Enables monitoring and attestation of the boot integrity of the cluster's VMs. vTPM (virtual Trusted Platform Module) must also be enabled. A TPM is a hardware module that can be used for different security operations, such as remote attestation, encryption, and sealing of keys.
--shielded-secure-boot - The cluster's VMs will boot with secure boot enabled.
--shielded-vtpm - The cluster's VMs will boot with the TPM (Trusted Platform Module) enabled. A TPM is a hardware module that can be used for different security operations, such as remote attestation, encryption, and sealing of keys.
--stop-max-idle=STOP_MAX_IDLE - The duration after the last job completes to auto-stop the cluster, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
--temp-bucket=TEMP_BUCKET - The Google Cloud Storage bucket to use by default to store ephemeral cluster and jobs data, such as Spark and MapReduce history files.
--tier=TIER - Cluster tier. TIER must be one of: premium, standard.
--worker-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the worker instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
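The same type/count syntax applies to --master-accelerator, --secondary-worker-accelerator, and --driver-pool-accelerator. For example, a hypothetical invocation attaching two NVIDIA T4 GPUs to each worker (GPU availability varies by zone):
gcloud dataproc clusters create my-cluster --region=us-central1 --worker-accelerator=type=nvidia-tesla-t4,count=2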
--worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_PROVISIONED_IOPS - Indicates the IOPS to provision for the disk. This sets the limit for disk I/O operations per second. This is only supported if the boot disk type is hyperdisk-balanced.
--worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_PROVISIONED_THROUGHPUT - Indicates the throughput to provision for the disk. This sets the limit for throughput in MiB per second. This is only supported if the boot disk type is hyperdisk-balanced.
--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--worker-local-ssd-interface=WORKER_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to each worker in a cluster.
--worker-min-cpu-platform=PLATFORM - When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE
CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).
--zone=ZONE, -z ZONE - The compute zone (e.g. us-central1-a) for the cluster. If empty and --region is set to a value other than global, the server will pick a zone in the region. Overrides the default compute/zone property value for this command invocation.
- At most one of these can be specified:
--delete-expiration-time=DELETE_EXPIRATION_TIME - The time when the cluster will be auto-deleted, such as "2017-08-29T18:52:51.142Z". See $ gcloud topic datetimes for information on time formats.
--delete-max-age=DELETE_MAX_AGE - The lifespan of the cluster, with auto-deletion upon completion, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
- Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--gce-pd-kms-key=GCE_PD_KMS_KEY - ID of the key or fully qualified identifier for the key.
To set the kms-key attribute:
- provide the argument --gce-pd-kms-key on the command line.
This flag argument must be specified if any of the other arguments in this group are specified.
--gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING - The KMS keyring of the key.
To set the kms-keyring attribute:
- provide the argument --gce-pd-kms-key on the command line with a fully specified name;
- provide the argument --gce-pd-kms-key-keyring on the command line.
--gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION - The Google Cloud location for the key.
To set the kms-location attribute:
- provide the argument --gce-pd-kms-key on the command line with a fully specified name;
- provide the argument --gce-pd-kms-key-location on the command line.
--gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT - The Google Cloud project for the key.
To set the kms-project attribute:
- provide the argument --gce-pd-kms-key on the command line with a fully specified name;
- provide the argument --gce-pd-kms-key-project on the command line;
- set the property core/project.
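For example, a sketch that protects cluster persistent disks with a customer-managed key, passed as a fully qualified key name (project, keyring, and key names are hypothetical):
gcloud dataproc clusters create my-cluster --region=us-central1 --gce-pd-kms-key=projects/my-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key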
- Specifying these flags will enable Secure Multi-Tenancy for the cluster.
At most one of these can be specified:
--identity-config-file=IDENTITY_CONFIG_FILE - Path to a YAML (or JSON) file containing the configuration for Secure Multi-Tenancy on the cluster. The path can be a Cloud Storage URL (Example: 'gs://path/to/file') or a local file system path. If you pass "-" as the value of the flag the file content will be read from stdin.
The YAML file is formatted as follows:

# Required. The mapping from user accounts to service accounts.
user_service_account_mapping:
  bob@company.com: service-account-bob@project.iam.gserviceaccount.com
  alice@company.com: service-account-alice@project.iam.gserviceaccount.com

--secure-multi-tenancy-user-mapping=SECURE_MULTI_TENANCY_USER_MAPPING - A string of user-to-service-account mappings. Mappings are separated by commas, and each mapping takes the form of "user-account:service-account". Example: "bob@company.com:service-account-bob@project.iam.gserviceaccount.com,alice@company.com:service-account-alice@project.iam.gserviceaccount.com".
- At most one of these can be specified:
--image=IMAGE - The custom image used to create the cluster. It can be the image name, the image URI, or the image family URI, which selects the latest image from the family.
--image-version=VERSION - The image version to use for the cluster. Defaults to the latest version.
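For example, to pin the cluster to a specific image version track rather than the latest default (version string is illustrative; see the Dataproc version list for valid values):
gcloud dataproc clusters create my-cluster --region=us-central1 --image-version=2.2-debian12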
- Specifying these flags will enable Kerberos for the cluster.
At most one of these can be specified:
--kerberos-config-file=KERBEROS_CONFIG_FILE - Path to a YAML (or JSON) file containing the configuration for Kerberos on the cluster. If you pass - as the value of the flag the file content will be read from stdin.
The YAML file is formatted as follows:

# Optional. Flag to indicate whether to Kerberize the cluster.
# The default value is true.
enable_kerberos: true

# Optional. The Google Cloud Storage URI of a KMS encrypted file
# containing the root principal password.
root_principal_password_uri: gs://bucket/password.encrypted

# Optional. The URI of the Cloud KMS key used to encrypt
# sensitive files.
kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key

# Configuration of SSL encryption. If specified, all sub-fields
# are required. Otherwise, Dataproc will provide a self-signed
# certificate and generate the passwords.
ssl:
  # Optional. The Google Cloud Storage URI of the keystore file.
  keystore_uri: gs://bucket/keystore.jks
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the keystore.
  keystore_password_uri: gs://bucket/keystore_password.encrypted
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the user provided key.
  key_password_uri: gs://bucket/key_password.encrypted
  # Optional. The Google Cloud Storage URI of the truststore
  # file.
  truststore_uri: gs://bucket/truststore.jks
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the user provided
  # truststore.
  truststore_password_uri: gs://bucket/truststore_password.encrypted

# Configuration of cross realm trust.
cross_realm_trust:
  # Optional. The remote realm the Dataproc on-cluster KDC will
  # trust, should the user enable cross realm trust.
  realm: REMOTE.REALM
  # Optional. The KDC (IP or hostname) for the remote trusted
  # realm in a cross realm trust relationship.
  kdc: kdc.remote.realm
  # Optional. The admin server (IP or hostname) for the remote
  # trusted realm in a cross realm trust relationship.
  admin_server: admin-server.remote.realm
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the shared password between the on-cluster
  # Kerberos realm and the remote trusted realm, in a cross
  # realm trust relationship.
  shared_password_uri: gs://bucket/cross-realm.password.encrypted

# Optional. The Google Cloud Storage URI of a KMS encrypted file
# containing the master key of the KDC database.
kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted

# Optional. The lifetime of the ticket granting ticket, in
# hours. If not specified, or user specifies 0, then default
# value 10 will be used.
tgt_lifetime_hours: 1

# Optional. The name of the Kerberos realm. If not specified,
# the uppercased domain name of the cluster will be used.
realm: REALM.NAME
--enable-kerberos - Enable Kerberos on the cluster.
--kerberos-root-principal-password-uri=KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI - Google Cloud Storage URI of a KMS encrypted file containing the root principal password. Must be a Cloud Storage URL beginning with 'gs://'.
- Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the password. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--kerberos-kms-key=KERBEROS_KMS_KEY - ID of the key or fully qualified identifier for the key.
To set the kms-key attribute:
- provide the argument --kerberos-kms-key on the command line.
This flag argument must be specified if any of the other arguments in this group are specified.
--kerberos-kms-key-keyring=KERBEROS_KMS_KEY_KEYRING - The KMS keyring of the key.
To set the kms-keyring attribute:
- provide the argument --kerberos-kms-key on the command line with a fully specified name;
- provide the argument --kerberos-kms-key-keyring on the command line.
--kerberos-kms-key-location=KERBEROS_KMS_KEY_LOCATION - The Google Cloud location for the key.
To set the kms-location attribute:
- provide the argument --kerberos-kms-key on the command line with a fully specified name;
- provide the argument --kerberos-kms-key-location on the command line.
--kerberos-kms-key-project=KERBEROS_KMS_KEY_PROJECT - The Google Cloud project for the key.
To set the kms-project attribute:
- provide the argument --kerberos-kms-key on the command line with a fully specified name;
- provide the argument --kerberos-kms-key-project on the command line;
- set the property core/project.
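Putting the Kerberos flags together, a hypothetical Kerberized cluster whose root principal password is KMS-encrypted in Cloud Storage (bucket, project, keyring, and key names are illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 \
  --enable-kerberos \
  --kerberos-root-principal-password-uri=gs://my-bucket/password.encrypted \
  --kerberos-kms-key=projects/my-project/locations/global/keyRings/my-keyring/cryptoKeys/my-key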
- Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--kms-key=KMS_KEY - ID of the key or fully qualified identifier for the key.
To set the kms-key attribute:
- provide the argument --kms-key on the command line.
This flag argument must be specified if any of the other arguments in this group are specified.
--kms-keyring=KMS_KEYRING - The KMS keyring of the key.
To set the kms-keyring attribute:
- provide the argument --kms-key on the command line with a fully specified name;
- provide the argument --kms-keyring on the command line.
--kms-location=KMS_LOCATION - The Google Cloud location for the key.
To set the kms-location attribute:
- provide the argument --kms-key on the command line with a fully specified name;
- provide the argument --kms-location on the command line.
--kms-project=KMS_PROJECT - The Google Cloud project for the key.
To set the kms-project attribute:
- provide the argument --kms-key on the command line with a fully specified name;
- provide the argument --kms-project on the command line;
- set the property core/project.
- Compute Engine options for Dataproc clusters.
--metadata=KEY=VALUE,[KEY=VALUE,…] - Metadata to be made available to the guest operating system running on the instances.
--resource-manager-tags=KEY=VALUE,[KEY=VALUE,…] - Specifies a list of resource manager tags to apply to each cluster node (master and worker nodes).
--scopes=SCOPE,[SCOPE,…] - Specifies scopes for the node instances. Multiple SCOPEs can be specified, separated by commas. Examples:
gcloud dataproc clusters create example-cluster --scopes https://www.googleapis.com/auth/bigtable.admin
gcloud dataproc clusters create example-cluster --scopes sqlservice,bigquery
The following minimum scopes are necessary for the cluster to function properly and are always added, even if not explicitly specified:
https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write
If the --scopes flag is not specified, the following default scopes are also included:
https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/bigtable.admin.table
https://www.googleapis.com/auth/bigtable.data
https://www.googleapis.com/auth/devstorage.full_control
If you want to enable all scopes use the 'cloud-platform' scope.
SCOPE can be either the full URI of the scope or an alias.
Default scopes are assigned to all instances. Available aliases are:
Alias                  URI
bigquery               https://www.googleapis.com/auth/bigquery
cloud-platform         https://www.googleapis.com/auth/cloud-platform
cloud-source-repos     https://www.googleapis.com/auth/source.full_control
cloud-source-repos-ro  https://www.googleapis.com/auth/source.read_only
compute-ro             https://www.googleapis.com/auth/compute.readonly
compute-rw             https://www.googleapis.com/auth/compute
datastore              https://www.googleapis.com/auth/datastore
default                https://www.googleapis.com/auth/devstorage.read_only
                       https://www.googleapis.com/auth/logging.write
                       https://www.googleapis.com/auth/monitoring.write
                       https://www.googleapis.com/auth/pubsub
                       https://www.googleapis.com/auth/service.management.readonly
                       https://www.googleapis.com/auth/servicecontrol
                       https://www.googleapis.com/auth/trace.append
gke-default            https://www.googleapis.com/auth/devstorage.read_only
                       https://www.googleapis.com/auth/logging.write
                       https://www.googleapis.com/auth/monitoring
                       https://www.googleapis.com/auth/service.management.readonly
                       https://www.googleapis.com/auth/servicecontrol
                       https://www.googleapis.com/auth/trace.append
logging-write          https://www.googleapis.com/auth/logging.write
monitoring             https://www.googleapis.com/auth/monitoring
monitoring-read        https://www.googleapis.com/auth/monitoring.read
monitoring-write       https://www.googleapis.com/auth/monitoring.write
pubsub                 https://www.googleapis.com/auth/pubsub
service-control        https://www.googleapis.com/auth/servicecontrol
service-management     https://www.googleapis.com/auth/service.management.readonly
sql (deprecated)       https://www.googleapis.com/auth/sqlservice
sql-admin              https://www.googleapis.com/auth/sqlservice.admin
storage-full           https://www.googleapis.com/auth/devstorage.full_control
storage-ro             https://www.googleapis.com/auth/devstorage.read_only
storage-rw             https://www.googleapis.com/auth/devstorage.read_write
taskqueue              https://www.googleapis.com/auth/taskqueue
trace                  https://www.googleapis.com/auth/trace.append
userinfo-email         https://www.googleapis.com/auth/userinfo.email

DEPRECATION WARNING: the https://www.googleapis.com/auth/sqlservice account scope and the sql alias do not provide SQL instance management capabilities and have been deprecated. Please use https://www.googleapis.com/auth/sqlservice.admin or sql-admin to manage your Google SQL Service instances.
--service-account=SERVICE_ACCOUNT - The Google Cloud IAM service account to be authenticated as.
--tags=TAG,[TAG,…] - Specifies a list of tags to apply to the instance. These tags allow network firewall rules and routes to be applied to specified VM instances. See gcloud compute firewall-rules create(1) for more details.
To read more about configuring network tags, read this guide: https://cloud.google.com/vpc/docs/add-remove-network-tags
To list instances with their respective status and tags, run:
gcloud compute instances list --format='table(name,status,tags.list())'
To list instances tagged with a specific tag, tag1, run:
gcloud compute instances list --filter='tags:tag1'
- At most one of these can be specified:
--network=NETWORK - The Compute Engine network that the VM instances of the cluster will be part of. This is mutually exclusive with --subnet. If neither is specified, this defaults to the "default" network.
--subnet=SUBNET - Specifies the subnet that the cluster will be part of. This is mutually exclusive with --network.
- Specifies the reservation for the instance.
--reservation=RESERVATION - The name of the reservation, required when --reservation-affinity=specific.
--reservation-affinity=RESERVATION_AFFINITY; default="any" - The type of reservation for the instance. RESERVATION_AFFINITY must be one of: any, none, specific.
--metric-sources=[METRIC_SOURCE,…] - Specifies a list of cluster metric sources to collect custom metrics. METRIC_SOURCE must be one of: FLINK, HDFS, HIVEMETASTORE, HIVESERVER2, MONITORING_AGENT_DEFAULTS, SPARK, SPARK_HISTORY_SERVER, YARN.
- At most one of these can be specified:
--metric-overrides=[METRIC_SOURCE:INSTANCE:GROUP:METRIC,…] - List of metrics that override the default metrics enabled for the metric sources. Any of the available OSS metrics, and all Spark metrics, can be listed for collection as a metric override. Override metric values are case sensitive, and must be provided, if appropriate, in CamelCase format, for example:
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed
hiveserver2:JVM:Memory:NonHeapMemoryUsage.used
Only the specified overridden metrics will be collected from a given metric source. For example, if one or more spark:executor metrics are listed as metric overrides, other SPARK metrics will not be collected. The collection of default OSS metrics from other metric sources is unaffected. For example, if both SPARK and YARN metric sources are enabled, and overrides are provided for Spark metrics only, all default YARN metrics will be collected. The source of the specified metric override must be enabled. For example, if one or more spark:driver metrics are provided as metric overrides, the spark metric source must be enabled (--metric-sources=spark).
--metric-overrides-file=METRIC_OVERRIDES_FILE - Path to a file containing a list of metrics that override the default metrics enabled for the metric sources. The path can be a Cloud Storage URL (example: gs://path/to/file) or a local file system path.
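For example, a minimal sketch that enables the Spark and YARN metric sources with their default metrics:
gcloud dataproc clusters create my-cluster --region=us-central1 --metric-sources=spark,yarn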
- At most one of these can be specified:
--no-address - If provided, the instances in the cluster will not be assigned external IP addresses.
If omitted, then the Dataproc service will apply a default policy to determine if each instance in the cluster gets an external IP address or not.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access).
--public-ip-address - If provided, cluster instances are assigned external IP addresses.
If omitted, the Dataproc service applies a default policy to determine whether or not each instance in the cluster gets an external IP address.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access).
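For example, a hypothetical private cluster with no external IP addresses, relying on Private Google Access being enabled on the chosen subnet (subnet name is illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 --subnet=my-subnet --no-address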
- At most one of these can be specified:
--single-node - Create a single node cluster.
A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created.
- Multi-node cluster flags
--min-num-workers=MIN_NUM_WORKERS - Minimum number of primary worker nodes to provision for cluster creation to succeed.
--num-secondary-workers=NUM_SECONDARY_WORKERS - The number of secondary worker nodes in the cluster.
--num-workers=NUM_WORKERS - The number of worker nodes in the cluster. Defaults to server-specified.
--secondary-worker-type=TYPE; default="preemptible" - The type of the secondary worker group. TYPE must be one of: preemptible, non-preemptible, spot.
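Combining the multi-node flags, a hypothetical cluster with four primary workers and eight Spot secondary workers:
gcloud dataproc clusters create my-cluster --region=us-central1 --num-workers=4 --num-secondary-workers=8 --secondary-worker-type=spot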
- At most one of these can be specified:
--stop-expiration-time=STOP_EXPIRATION_TIME - The time when the cluster will be auto-stopped, such as "2017-08-29T18:52:51.142Z". See $ gcloud topic datetimes for information on time formats.
--stop-max-age=STOP_MAX_AGE - The lifespan of the cluster, with auto-stop upon completion, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
- At most one of these can be specified:
--worker-machine-type=WORKER_MACHINE_TYPE - The type of machine to use for primary workers. Defaults to server-specified.
--worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK] - Machine types for primary worker nodes to use with optional rank. A lower rank number is given higher preference. Based on availability, Dataproc tries to create primary worker VMs using the worker machine type with the lowest rank, and then tries to use machine types with higher ranks as necessary. Machine types with the same rank are given the same preference. Example use: --worker-machine-types="type=e2-standard-8,type=n2-standard-8,rank=0". For more information, see Dataproc Flexible VMs.
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
- These variants are also available:
gcloud alpha dataproc clusters create
gcloud beta dataproc clusters create