gcloud dataproc clusters create
- NAME
- gcloud dataproc clusters create - create a cluster
- SYNOPSIS
gcloud dataproc clusters create (CLUSTER : --region=REGION) [--action-on-failed-primary-workers=ACTION_ON_FAILED_PRIMARY_WORKERS] [--async] [--autoscaling-policy=AUTOSCALING_POLICY] [--bucket=BUCKET] [--cluster-type=TYPE] [--confidential-compute] [--dataproc-metastore=DATAPROC_METASTORE] [--delete-max-idle=DELETE_MAX_IDLE] [--driver-pool-accelerator=[type=TYPE,[count=COUNT],…]] [--driver-pool-boot-disk-size=DRIVER_POOL_BOOT_DISK_SIZE] [--driver-pool-boot-disk-type=DRIVER_POOL_BOOT_DISK_TYPE] [--driver-pool-id=DRIVER_POOL_ID] [--driver-pool-local-ssd-interface=DRIVER_POOL_LOCAL_SSD_INTERFACE] [--driver-pool-machine-type=DRIVER_POOL_MACHINE_TYPE] [--driver-pool-min-cpu-platform=PLATFORM] [--driver-pool-size=DRIVER_POOL_SIZE] [--enable-component-gateway] [--initialization-action-timeout=TIMEOUT; default="10m"] [--initialization-actions=CLOUD_STORAGE_URI,[…]] [--labels=[KEY=VALUE,…]] [--master-accelerator=[type=TYPE,[count=COUNT],…]] [--master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_PROVISIONED_IOPS] [--master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_PROVISIONED_THROUGHPUT] [--master-boot-disk-size=MASTER_BOOT_DISK_SIZE] [--master-boot-disk-type=MASTER_BOOT_DISK_TYPE] [--master-local-ssd-interface=MASTER_LOCAL_SSD_INTERFACE] [--master-machine-type=MASTER_MACHINE_TYPE] [--master-min-cpu-platform=PLATFORM] [--min-secondary-worker-fraction=MIN_SECONDARY_WORKER_FRACTION] [--node-group=NODE_GROUP] [--num-driver-pool-local-ssds=NUM_DRIVER_POOL_LOCAL_SSDS] [--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS] [--num-masters=NUM_MASTERS] [--num-secondary-worker-local-ssds=NUM_SECONDARY_WORKER_LOCAL_SSDS] [--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS] [--optional-components=[COMPONENT,…]] [--private-ipv6-google-access-type=PRIVATE_IPV6_GOOGLE_ACCESS_TYPE] [--properties=[PREFIX:PROPERTY=VALUE,…]] [--secondary-worker-accelerator=[type=TYPE,[count=COUNT],…]] [--secondary-worker-boot-disk-size=SECONDARY_WORKER_BOOT_DISK_SIZE] [--secondary-worker-boot-disk-type=SECONDARY_WORKER_BOOT_DISK_TYPE] [--secondary-worker-local-ssd-interface=SECONDARY_WORKER_LOCAL_SSD_INTERFACE] [--secondary-worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK]] [--secondary-worker-standard-capacity-base=SECONDARY_WORKER_STANDARD_CAPACITY_BASE] [--secondary-worker-standard-capacity-percent-above-base=SECONDARY_WORKER_STANDARD_CAPACITY_PERCENT_ABOVE_BASE] [--shielded-integrity-monitoring] [--shielded-secure-boot] [--shielded-vtpm] [--stop-max-idle=STOP_MAX_IDLE] [--temp-bucket=TEMP_BUCKET] [--tier=TIER] [--worker-accelerator=[type=TYPE,[count=COUNT],…]] [--worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_PROVISIONED_IOPS] [--worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_PROVISIONED_THROUGHPUT] [--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE] [--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE] [--worker-local-ssd-interface=WORKER_LOCAL_SSD_INTERFACE] [--worker-min-cpu-platform=PLATFORM] [--zone=ZONE, -z ZONE] [--delete-expiration-time=DELETE_EXPIRATION_TIME | --delete-max-age=DELETE_MAX_AGE] [--gce-pd-kms-key=GCE_PD_KMS_KEY : --gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING --gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION --gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT] [--identity-config-file=IDENTITY_CONFIG_FILE | --secure-multi-tenancy-user-mapping=SECURE_MULTI_TENANCY_USER_MAPPING] [--image=IMAGE | --image-version=VERSION] [--kerberos-config-file=KERBEROS_CONFIG_FILE | --enable-kerberos --kerberos-root-principal-password-uri=KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI [--kerberos-kms-key=KERBEROS_KMS_KEY : --kerberos-kms-key-keyring=KERBEROS_KMS_KEY_KEYRING --kerberos-kms-key-location=KERBEROS_KMS_KEY_LOCATION --kerberos-kms-key-project=KERBEROS_KMS_KEY_PROJECT]] [--kms-key=KMS_KEY : --kms-keyring=KMS_KEYRING --kms-location=KMS_LOCATION --kms-project=KMS_PROJECT] [--metadata=KEY=VALUE,[KEY=VALUE,…] --resource-manager-tags=KEY=VALUE,[KEY=VALUE,…] --scopes=SCOPE,[SCOPE,…] --service-account=SERVICE_ACCOUNT --tags=TAG,[TAG,…] --network=NETWORK | --subnet=SUBNET --reservation=RESERVATION --reservation-affinity=RESERVATION_AFFINITY; default="any"] [[--metric-sources=[METRIC_SOURCE,…] : --metric-overrides=[METRIC_SOURCE:INSTANCE:GROUP:METRIC,…] | --metric-overrides-file=METRIC_OVERRIDES_FILE]] [--no-address | --public-ip-address] [--single-node | --min-num-workers=MIN_NUM_WORKERS --num-secondary-workers=NUM_SECONDARY_WORKERS --num-workers=NUM_WORKERS --secondary-worker-type=TYPE; default="preemptible"] [--stop-expiration-time=STOP_EXPIRATION_TIME | --stop-max-age=STOP_MAX_AGE] [--worker-machine-type=WORKER_MACHINE_TYPE | --worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK]] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- Create a cluster.
- EXAMPLES
- To create a cluster, run:
gcloud dataproc clusters create my-cluster --region=us-central1
- POSITIONAL ARGUMENTS
- Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
To set the project attribute:
- provide the argument cluster on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.
This must be specified.
CLUSTER - ID of the cluster or fully qualified identifier for the cluster.
To set the cluster attribute:
- provide the argument cluster on the command line.
This positional argument must be specified if any of the other arguments in this group are specified.
--region=REGION - Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.
To set the region attribute:
- provide the argument cluster on the command line with a fully specified name;
- provide the argument --region on the command line;
- set the property dataproc/region.
- FLAGS
--action-on-failed-primary-workers=ACTION_ON_FAILED_PRIMARY_WORKERS - Failure action to take when primary workers fail during cluster creation. ACTION_ON_FAILED_PRIMARY_WORKERS must be one of:
- DELETE - delete the failed primary workers
- FAILURE_ACTION_UNSPECIFIED - failure action is not specified
- NO_ACTION - take no action
--async - Return immediately, without waiting for the operation in progress to complete.
--autoscaling-policy=AUTOSCALING_POLICY - ID of the autoscaling policy or fully qualified identifier for the autoscaling policy.
To set the autoscaling_policy attribute:
- provide the argument --autoscaling-policy on the command line.
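For example, assuming an autoscaling policy named my-policy has already been created in the same project and region (hypothetical name), it can be attached at cluster creation time:
gcloud dataproc clusters create my-cluster --region=us-central1 --autoscaling-policy=my-policy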
--bucket=BUCKET - The Google Cloud Storage bucket to use by default to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
--cluster-type=TYPE - The type of cluster. TYPE must be one of: standard, single-node, zero-scale.
--confidential-compute - Enables Confidential VM. See https://cloud.google.com/compute/confidential-vm/docs for more information. Note that Confidential VM can only be enabled when the machine types are N2D (https://cloud.google.com/compute/docs/machine-types#n2d_machine_types) and the image is SEV Compatible.
--dataproc-metastore=DATAPROC_METASTORE - Specify the name of a Dataproc Metastore service to be used as an external metastore in the format: "projects/{project-id}/locations/{region}/services/{service-name}".
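For example, a sketch that attaches a pre-existing Dataproc Metastore service using the format above (project, region, and service names are hypothetical):
gcloud dataproc clusters create my-cluster --region=us-central1 --dataproc-metastore=projects/my-project/locations/us-central1/services/my-metastore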
--delete-max-idle=DELETE_MAX_IDLE - The duration after the last job completes to auto-delete the cluster, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
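For instance, to auto-delete the cluster after two idle hours (illustrative value):
gcloud dataproc clusters create my-cluster --region=us-central1 --delete-max-idle=2h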
--driver-pool-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the driver-pool instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
--driver-pool-boot-disk-size=DRIVER_POOL_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--driver-pool-boot-disk-type=DRIVER_POOL_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--driver-pool-id=DRIVER_POOL_ID - Custom identifier for the DRIVER Node Group being created. If not provided, a random string is generated.
--driver-pool-local-ssd-interface=DRIVER_POOL_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to cluster driver pool node(s).
--driver-pool-machine-type=DRIVER_POOL_MACHINE_TYPE - The type of machine to use for the cluster driver pool nodes. Defaults to server-specified.
--driver-pool-min-cpu-platform=PLATFORM - When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE
CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).
--driver-pool-size=DRIVER_POOL_SIZE - The size of the cluster driver pool.
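Putting the driver-pool flags together, a hypothetical cluster with a two-node driver pool (machine type and pool ID are illustrative; consult the Dataproc driver pool documentation for applicable constraints):
gcloud dataproc clusters create my-cluster --region=us-central1 --driver-pool-size=2 --driver-pool-machine-type=n1-standard-8 --driver-pool-id=my-driver-pool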
--enable-component-gateway - Enable access to the web UIs of selected components on the cluster through the component gateway.
--initialization-action-timeout=TIMEOUT; default="10m" - The maximum duration of each initialization action. See $ gcloud topic datetimes for information on duration formats.
--initialization-actions=CLOUD_STORAGE_URI,[…] - A list of Google Cloud Storage URIs of executables to run on each node in the cluster.
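For example, a sketch that runs a hypothetical startup script from Cloud Storage on each node, with a longer timeout than the 10m default (bucket and script names are illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 --initialization-actions=gs://my-bucket/my-init.sh --initialization-action-timeout=20m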
--labels=[KEY=VALUE,…] - List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
--master-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the master instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
--master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_PROVISIONED_IOPS - Indicates the IOPS to provision for the disk. This sets the limit for disk I/O operations per second. This is only supported if the boot disk type is hyperdisk-balanced.
--master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_PROVISIONED_THROUGHPUT - Indicates the throughput to provision for the disk. This sets the limit for throughput in MiB per second. This is only supported if the boot disk type is hyperdisk-balanced.
--master-boot-disk-size=MASTER_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--master-boot-disk-type=MASTER_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--master-local-ssd-interface=MASTER_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to master node(s) in a cluster.
--master-machine-type=MASTER_MACHINE_TYPE - The type of machine to use for the master. Defaults to server-specified.
--master-min-cpu-platform=PLATFORM - When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE
CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).
--min-secondary-worker-fraction=MIN_SECONDARY_WORKER_FRACTION - Minimum fraction of secondary worker nodes required to create the cluster. If it is not met, cluster creation will fail. Must be a decimal value between 0 and 1. The number of required secondary workers is calculated by ceil(min-secondary-worker-fraction * num_secondary_workers). Defaults to 0.0001.
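As a worked example of --min-secondary-worker-fraction: with --num-secondary-workers=10 and --min-secondary-worker-fraction=0.5, cluster creation succeeds only if at least ceil(0.5 * 10) = 5 secondary workers are provisioned.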
--node-group=NODE_GROUP - The name of the sole-tenant node group to create the cluster on. Can be a short name ("node-group-name") or in the format "projects/{project-id}/zones/{zone}/nodeGroups/{node-group-name}".
--num-driver-pool-local-ssds=NUM_DRIVER_POOL_LOCAL_SSDS - The number of local SSDs to attach to each cluster driver pool node.
--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS - The number of local SSDs to attach to the master in a cluster.
--num-masters=NUM_MASTERS - The number of master nodes in the cluster.

Number of Masters  Cluster Mode
1                  Standard
3                  High Availability

--num-secondary-worker-local-ssds=NUM_SECONDARY_WORKER_LOCAL_SSDS - The number of local SSDs to attach to each preemptible worker in a cluster.
--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS - The number of local SSDs to attach to each worker in a cluster.
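For example, to create a high-availability cluster with three masters, per the --num-masters table above:
gcloud dataproc clusters create my-cluster --region=us-central1 --num-masters=3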
--optional-components=[COMPONENT,…] - List of optional components to be installed on cluster machines.
The following page documents the optional components that can be installed: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/optional-components.
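For example, a hypothetical cluster that installs two components and exposes their web UIs through the component gateway (component names must match the list on the page above):
gcloud dataproc clusters create my-cluster --region=us-central1 --optional-components=JUPYTER,ZEPPELIN --enable-component-gateway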
--private-ipv6-google-access-type=PRIVATE_IPV6_GOOGLE_ACCESS_TYPE - The private IPv6 Google access type for the cluster. PRIVATE_IPV6_GOOGLE_ACCESS_TYPE must be one of: inherit-subnetwork, outbound, bidirectional.
--properties=[PREFIX:PROPERTY=VALUE,…] - Specifies configuration properties for installed packages, such as Hadoop and Spark.
Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:

Prefix              File                    Purpose of file
capacity-scheduler  capacity-scheduler.xml  Hadoop YARN Capacity Scheduler configuration
core                core-site.xml           Hadoop general configuration
distcp              distcp-default.xml      Hadoop Distributed Copy configuration
hadoop-env          hadoop-env.sh           Hadoop specific environment variables
hdfs                hdfs-site.xml           Hadoop HDFS configuration
hive                hive-site.xml           Hive configuration
mapred              mapred-site.xml         Hadoop MapReduce configuration
mapred-env          mapred-env.sh           Hadoop MapReduce specific environment variables
pig                 pig.properties          Pig configuration
spark               spark-defaults.conf     Spark configuration
spark-env           spark-env.sh            Spark specific environment variables
yarn                yarn-site.xml           Hadoop YARN configuration
yarn-env            yarn-env.sh             Hadoop YARN specific environment variables

See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
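For example, a sketch that sets a Spark default and a Hadoop core property using the prefixes above (property values are illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 --properties=spark:spark.executor.memory=4g,core:io.file.buffer.size=65536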
--secondary-worker-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the secondary-worker instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
--secondary-worker-boot-disk-size=SECONDARY_WORKER_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--secondary-worker-boot-disk-type=SECONDARY_WORKER_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--secondary-worker-local-ssd-interface=SECONDARY_WORKER_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to each secondary worker in a cluster.
--secondary-worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK] - Types of machines with optional rank for secondary workers to use. Defaults to server-specified. For example: --secondary-worker-machine-types="type=e2-standard-8,type=t2d-standard-8,rank=0"
--secondary-worker-standard-capacity-base=SECONDARY_WORKER_STANDARD_CAPACITY_BASE - This flag sets the base number of Standard VMs to use for secondary workers. Dataproc will create only standard VMs until it reaches this number, then it will mix Spot and Standard VMs according to SECONDARY_WORKER_STANDARD_CAPACITY_PERCENT_ABOVE_BASE.
--secondary-worker-standard-capacity-percent-above-base=SECONDARY_WORKER_STANDARD_CAPACITY_PERCENT_ABOVE_BASE - When combining Standard and Spot VMs for secondary workers, once the number of Standard VMs specified by SECONDARY_WORKER_STANDARD_CAPACITY_BASE has been used, this flag specifies the percentage of the total number of additional Standard VMs secondary workers will use. Spot VMs will be used for the remaining percentage.
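On one reading of the two flags above (values illustrative): with --num-secondary-workers=10, --secondary-worker-standard-capacity-base=4, and --secondary-worker-standard-capacity-percent-above-base=50, the first 4 secondary workers are Standard VMs, and of the remaining 6, 50% (3) are Standard and the rest (3) are Spot.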
--shielded-integrity-monitoring - Enables monitoring and attestation of the boot integrity of the cluster's VMs. vTPM (virtual Trusted Platform Module) must also be enabled. A TPM is a hardware module that can be used for different security operations, such as remote attestation, encryption, and sealing of keys.
--shielded-secure-boot - The cluster's VMs will boot with secure boot enabled.
--shielded-vtpm - The cluster's VMs will boot with the TPM (Trusted Platform Module) enabled. A TPM is a hardware module that can be used for different security operations, such as remote attestation, encryption, and sealing of keys.
--stop-max-idle=STOP_MAX_IDLE - The duration after the last job completes to auto-stop the cluster, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
--temp-bucket=TEMP_BUCKET - The Google Cloud Storage bucket to use by default to store ephemeral cluster and jobs data, such as Spark and MapReduce history files.
--tier=TIER - Cluster tier. TIER must be one of: premium, standard.
--worker-accelerator=[type=TYPE,[count=COUNT],…] - Attaches accelerators, such as GPUs, to the worker instance(s).
type - The specific type of accelerator to attach to the instances, such as nvidia-tesla-t4 for NVIDIA T4. Use gcloud compute accelerator-types list to display available accelerator types.
count - The number of accelerators to attach to each instance. The default value is 1.
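The same type/count syntax applies to --master-accelerator, --secondary-worker-accelerator, and --driver-pool-accelerator. For example, a hypothetical invocation attaching two NVIDIA T4 GPUs to each worker (GPU availability varies by zone):
gcloud dataproc clusters create my-cluster --region=us-central1 --worker-accelerator=type=nvidia-tesla-t4,count=2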
--worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_PROVISIONED_IOPS - Indicates the IOPS to provision for the disk. This sets the limit for disk I/O operations per second. This is only supported if the boot disk type is hyperdisk-balanced.
--worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_PROVISIONED_THROUGHPUT - Indicates the throughput to provision for the disk. This sets the limit for throughput in MiB per second. This is only supported if the boot disk type is hyperdisk-balanced.
--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE - The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE - The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--worker-local-ssd-interface=WORKER_LOCAL_SSD_INTERFACE - Interface to use to attach local SSDs to each worker in a cluster.
--worker-min-cpu-platform=PLATFORM - When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE
CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).
--zone=ZONE, -z ZONE - The compute zone (e.g. us-central1-a) for the cluster. If empty and --region is set to a value other than global, the server will pick a zone in the region. Overrides the default compute/zone property value for this command invocation.
- At most one of these can be specified:
--delete-expiration-time=DELETE_EXPIRATION_TIME - The time when the cluster will be auto-deleted, such as "2017-08-29T18:52:51.142Z". See $ gcloud topic datetimes for information on time formats.
--delete-max-age=DELETE_MAX_AGE - The lifespan of the cluster, with auto-deletion upon completion, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
- Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--gce-pd-kms-key=GCE_PD_KMS_KEY - ID of the key or fully qualified identifier for the key.
To set the kms-key attribute:
- provide the argument --gce-pd-kms-key on the command line.
This flag argument must be specified if any of the other arguments in this group are specified.
--gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING - The KMS keyring of the key.
To set the kms-keyring attribute:
- provide the argument --gce-pd-kms-key on the command line with a fully specified name;
- provide the argument --gce-pd-kms-key-keyring on the command line.
--gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION - The Google Cloud location for the key.
To set the kms-location attribute:
- provide the argument --gce-pd-kms-key on the command line with a fully specified name;
- provide the argument --gce-pd-kms-key-location on the command line.
--gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT - The Google Cloud project for the key.
To set the kms-project attribute:
- provide the argument --gce-pd-kms-key on the command line with a fully specified name;
- provide the argument --gce-pd-kms-key-project on the command line;
- set the property core/project.
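For example, a sketch that protects cluster persistent disks with a customer-managed key, passed as a fully qualified key name (project, keyring, and key names are hypothetical):
gcloud dataproc clusters create my-cluster --region=us-central1 --gce-pd-kms-key=projects/my-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key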
- Specifying these flags will enable Secure Multi-Tenancy for the cluster.
At most one of these can be specified:
--identity-config-file=IDENTITY_CONFIG_FILE - Path to a YAML (or JSON) file containing the configuration for Secure Multi-Tenancy on the cluster. The path can be a Cloud Storage URL (Example: 'gs://path/to/file') or a local file system path. If you pass "-" as the value of the flag the file content will be read from stdin.
The YAML file is formatted as follows:

# Required. The mapping from user accounts to service accounts.
user_service_account_mapping:
  bob@company.com: service-account-bob@project.iam.gserviceaccount.com
  alice@company.com: service-account-alice@project.iam.gserviceaccount.com

--secure-multi-tenancy-user-mapping=SECURE_MULTI_TENANCY_USER_MAPPING - A string of user-to-service-account mappings. Mappings are separated by commas, and each mapping takes the form of "user-account:service-account". Example: "bob@company.com:service-account-bob@project.iam.gserviceaccount.com,alice@company.com:service-account-alice@project.iam.gserviceaccount.com".
- At most one of these can be specified:
--image=IMAGE - The custom image used to create the cluster. It can be the image name, the image URI, or the image family URI, which selects the latest image from the family.
--image-version=VERSION - The image version to use for the cluster. Defaults to the latest version.
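For example, to pin the cluster to a specific image version track rather than the latest default (version string is illustrative; see the Dataproc version list for valid values):
gcloud dataproc clusters create my-cluster --region=us-central1 --image-version=2.2-debian12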
- Specifying these flags will enable Kerberos for the cluster.
At most one of these can be specified:
--kerberos-config-file=KERBEROS_CONFIG_FILE - Path to a YAML (or JSON) file containing the configuration for Kerberos on the cluster. If you pass - as the value of the flag the file content will be read from stdin.
The YAML file is formatted as follows:

# Optional. Flag to indicate whether to Kerberize the cluster.
# The default value is true.
enable_kerberos: true

# Optional. The Google Cloud Storage URI of a KMS encrypted file
# containing the root principal password.
root_principal_password_uri: gs://bucket/password.encrypted

# Optional. The URI of the Cloud KMS key used to encrypt
# sensitive files.
kms_key_uri: projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key

# Configuration of SSL encryption. If specified, all sub-fields
# are required. Otherwise, Dataproc will provide a self-signed
# certificate and generate the passwords.
ssl:
  # Optional. The Google Cloud Storage URI of the keystore file.
  keystore_uri: gs://bucket/keystore.jks
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the keystore.
  keystore_password_uri: gs://bucket/keystore_password.encrypted
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the user provided key.
  key_password_uri: gs://bucket/key_password.encrypted
  # Optional. The Google Cloud Storage URI of the truststore
  # file.
  truststore_uri: gs://bucket/truststore.jks
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the password to the user provided
  # truststore.
  truststore_password_uri: gs://bucket/truststore_password.encrypted

# Configuration of cross realm trust.
cross_realm_trust:
  # Optional. The remote realm the Dataproc on-cluster KDC will
  # trust, should the user enable cross realm trust.
  realm: REMOTE.REALM
  # Optional. The KDC (IP or hostname) for the remote trusted
  # realm in a cross realm trust relationship.
  kdc: kdc.remote.realm
  # Optional. The admin server (IP or hostname) for the remote
  # trusted realm in a cross realm trust relationship.
  admin_server: admin-server.remote.realm
  # Optional. The Google Cloud Storage URI of a KMS encrypted
  # file containing the shared password between the on-cluster
  # Kerberos realm and the remote trusted realm, in a cross
  # realm trust relationship.
  shared_password_uri: gs://bucket/cross-realm.password.encrypted

# Optional. The Google Cloud Storage URI of a KMS encrypted file
# containing the master key of the KDC database.
kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted

# Optional. The lifetime of the ticket granting ticket, in
# hours. If not specified, or user specifies 0, then default
# value 10 will be used.
tgt_lifetime_hours: 1

# Optional. The name of the Kerberos realm. If not specified,
# the uppercased domain name of the cluster will be used.
realm: REALM.NAME
--enable-kerberos - Enable Kerberos on the cluster.
--kerberos-root-principal-password-uri=KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI - Google Cloud Storage URI of a KMS encrypted file containing the root principal password. Must be a Cloud Storage URL beginning with 'gs://'.
- Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the password. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--kerberos-kms-key=KERBEROS_KMS_KEY - ID of the key or fully qualified identifier for the key.
To set the kms-key attribute:
- provide the argument --kerberos-kms-key on the command line.
This flag argument must be specified if any of the other arguments in this group are specified.
--kerberos-kms-key-keyring=KERBEROS_KMS_KEY_KEYRING - The KMS keyring of the key.
To set the kms-keyring attribute:
- provide the argument --kerberos-kms-key on the command line with a fully specified name;
- provide the argument --kerberos-kms-key-keyring on the command line.
--kerberos-kms-key-location=KERBEROS_KMS_KEY_LOCATION - The Google Cloud location for the key.
To set the kms-location attribute:
- provide the argument --kerberos-kms-key on the command line with a fully specified name;
- provide the argument --kerberos-kms-key-location on the command line.
--kerberos-kms-key-project=KERBEROS_KMS_KEY_PROJECT - The Google Cloud project for the key.
To set the kms-project attribute:
- provide the argument --kerberos-kms-key on the command line with a fully specified name;
- provide the argument --kerberos-kms-key-project on the command line;
- set the property core/project.
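Putting the Kerberos flags together, a hypothetical Kerberized cluster whose root principal password is KMS-encrypted in Cloud Storage (bucket, project, keyring, and key names are illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 \
  --enable-kerberos \
  --kerberos-root-principal-password-uri=gs://my-bucket/password.encrypted \
  --kerberos-kms-key=projects/my-project/locations/global/keyRings/my-keyring/cryptoKeys/my-key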
- Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--kms-key=KMS_KEY - ID of the key or fully qualified identifier for the key.
To set the kms-key attribute:
- provide the argument --kms-key on the command line.
This flag argument must be specified if any of the other arguments in this group are specified.
--kms-keyring=KMS_KEYRING - The KMS keyring of the key.
To set the kms-keyring attribute:
- provide the argument --kms-key on the command line with a fully specified name;
- provide the argument --kms-keyring on the command line.
--kms-location=KMS_LOCATION - The Google Cloud location for the key.
To set the kms-location attribute:
- provide the argument --kms-key on the command line with a fully specified name;
- provide the argument --kms-location on the command line.
--kms-project=KMS_PROJECT - The Google Cloud project for the key.
To set the kms-project attribute:
- provide the argument --kms-key on the command line with a fully specified name;
- provide the argument --kms-project on the command line;
- set the property core/project.
- Compute Engine options for Dataproc clusters.
--metadata=KEY=VALUE,[KEY=VALUE,…] - Metadata to be made available to the guest operating system running on the instances.
--resource-manager-tags=KEY=VALUE,[KEY=VALUE,…] - Specifies a list of resource manager tags to apply to each cluster node (master and worker nodes).
--scopes=SCOPE,[SCOPE,…] - Specifies scopes for the node instances. Multiple SCOPEs can be specified, separated by commas. Examples:
gcloud dataproc clusters create example-cluster --scopes https://www.googleapis.com/auth/bigtable.admin
gcloud dataproc clusters create example-cluster --scopes sqlservice,bigquery
The following minimum scopes are necessary for the cluster to function properly and are always added, even if not explicitly specified:
https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write
If the --scopes flag is not specified, the following default scopes are also included:
https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/bigtable.admin.table
https://www.googleapis.com/auth/bigtable.data
https://www.googleapis.com/auth/devstorage.full_control
If you want to enable all scopes use the 'cloud-platform' scope.
SCOPE can be either the full URI of the scope or an alias.
Default scopes are assigned to all instances. Available aliases are:
Alias                  URI
bigquery               https://www.googleapis.com/auth/bigquery
cloud-platform         https://www.googleapis.com/auth/cloud-platform
cloud-source-repos     https://www.googleapis.com/auth/source.full_control
cloud-source-repos-ro  https://www.googleapis.com/auth/source.read_only
compute-ro             https://www.googleapis.com/auth/compute.readonly
compute-rw             https://www.googleapis.com/auth/compute
datastore              https://www.googleapis.com/auth/datastore
default                https://www.googleapis.com/auth/devstorage.read_only
                       https://www.googleapis.com/auth/logging.write
                       https://www.googleapis.com/auth/monitoring.write
                       https://www.googleapis.com/auth/pubsub
                       https://www.googleapis.com/auth/service.management.readonly
                       https://www.googleapis.com/auth/servicecontrol
                       https://www.googleapis.com/auth/trace.append
gke-default            https://www.googleapis.com/auth/devstorage.read_only
                       https://www.googleapis.com/auth/logging.write
                       https://www.googleapis.com/auth/monitoring
                       https://www.googleapis.com/auth/service.management.readonly
                       https://www.googleapis.com/auth/servicecontrol
                       https://www.googleapis.com/auth/trace.append
logging-write          https://www.googleapis.com/auth/logging.write
monitoring             https://www.googleapis.com/auth/monitoring
monitoring-read        https://www.googleapis.com/auth/monitoring.read
monitoring-write       https://www.googleapis.com/auth/monitoring.write
pubsub                 https://www.googleapis.com/auth/pubsub
service-control        https://www.googleapis.com/auth/servicecontrol
service-management     https://www.googleapis.com/auth/service.management.readonly
sql (deprecated)       https://www.googleapis.com/auth/sqlservice
sql-admin              https://www.googleapis.com/auth/sqlservice.admin
storage-full           https://www.googleapis.com/auth/devstorage.full_control
storage-ro             https://www.googleapis.com/auth/devstorage.read_only
storage-rw             https://www.googleapis.com/auth/devstorage.read_write
taskqueue              https://www.googleapis.com/auth/taskqueue
trace                  https://www.googleapis.com/auth/trace.append
userinfo-email         https://www.googleapis.com/auth/userinfo.email

DEPRECATION WARNING: the https://www.googleapis.com/auth/sqlservice account scope and the sql alias do not provide SQL instance management capabilities and have been deprecated. Please use https://www.googleapis.com/auth/sqlservice.admin or sql-admin to manage your Google SQL Service instances.
--service-account=SERVICE_ACCOUNT - The Google Cloud IAM service account to be authenticated as.
--tags=TAG,[TAG,…] - Specifies a list of tags to apply to the instance. These tags allow network firewall rules and routes to be applied to specified VM instances. See gcloud compute firewall-rules create(1) for more details.
To read more about configuring network tags, read this guide: https://cloud.google.com/vpc/docs/add-remove-network-tags
To list instances with their respective status and tags, run:
gcloud compute instances list --format='table(name,status,tags.list())'
To list instances tagged with a specific tag, tag1, run:
gcloud compute instances list --filter='tags:tag1'
- At most one of these can be specified:
--network=NETWORK - The Compute Engine network that the VM instances of the cluster will be part of. This is mutually exclusive with --subnet. If neither is specified, this defaults to the "default" network.
--subnet=SUBNET - Specifies the subnet that the cluster will be part of. This is mutually exclusive with --network.
- Specifies the reservation for the instance.
--reservation=RESERVATION - The name of the reservation, required when --reservation-affinity=specific.
--reservation-affinity=RESERVATION_AFFINITY; default="any" - The type of reservation for the instance. RESERVATION_AFFINITY must be one of: any, none, specific.
--metric-sources=[METRIC_SOURCE,…] - Specifies a list of cluster metric sources to collect custom metrics. METRIC_SOURCE must be one of: FLINK, HDFS, HIVEMETASTORE, HIVESERVER2, MONITORING_AGENT_DEFAULTS, SPARK, SPARK_HISTORY_SERVER, YARN.
- At most one of these can be specified:
--metric-overrides=[METRIC_SOURCE:INSTANCE:GROUP:METRIC,…] - List of metrics that override the default metrics enabled for the metric sources. Any of the available OSS metrics, and all Spark metrics, can be listed for collection as a metric override. Override metric values are case sensitive, and must be provided, if appropriate, in CamelCase format, for example:
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed
hiveserver2:JVM:Memory:NonHeapMemoryUsage.used
Only the specified overridden metrics will be collected from a given metric source. For example, if one or more spark:executor metrics are listed as metric overrides, other SPARK metrics will not be collected. The collection of default OSS metrics from other metric sources is unaffected. For example, if both SPARK and YARN metric sources are enabled, and overrides are provided for Spark metrics only, all default YARN metrics will be collected. The source of the specified metric override must be enabled. For example, if one or more spark:driver metrics are provided as metric overrides, the spark metric source must be enabled (--metric-sources=spark).
--metric-overrides-file=METRIC_OVERRIDES_FILE - Path to a file containing a list of metrics that override the default metrics enabled for the metric sources. The path can be a Cloud Storage URL (example: gs://path/to/file) or a local file system path.
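For example, a minimal sketch that enables the Spark and YARN metric sources with their default metrics:
gcloud dataproc clusters create my-cluster --region=us-central1 --metric-sources=spark,yarn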
- At most one of these can be specified:
--no-address - If provided, the instances in the cluster will not be assigned external IP addresses.
If omitted, then the Dataproc service will apply a default policy to determine if each instance in the cluster gets an external IP address or not.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access).
--public-ip-address - If provided, cluster instances are assigned external IP addresses.
If omitted, the Dataproc service applies a default policy to determine whether or not each instance in the cluster gets an external IP address.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access).
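For example, a hypothetical private cluster with no external IP addresses, relying on Private Google Access being enabled on the chosen subnet (subnet name is illustrative):
gcloud dataproc clusters create my-cluster --region=us-central1 --subnet=my-subnet --no-address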
- At most one of these can be specified:
--single-node - Create a single node cluster.
A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created.
- Multi-node cluster flags
--min-num-workers=MIN_NUM_WORKERS - Minimum number of primary worker nodes to provision for cluster creation to succeed.
--num-secondary-workers=NUM_SECONDARY_WORKERS - The number of secondary worker nodes in the cluster.
--num-workers=NUM_WORKERS - The number of worker nodes in the cluster. Defaults to server-specified.
--secondary-worker-type=TYPE; default="preemptible" - The type of the secondary worker group. TYPE must be one of: preemptible, non-preemptible, spot.
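Combining the multi-node flags, a hypothetical cluster with four primary workers and eight Spot secondary workers:
gcloud dataproc clusters create my-cluster --region=us-central1 --num-workers=4 --num-secondary-workers=8 --secondary-worker-type=spot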
- At most one of these can be specified:
--stop-expiration-time=STOP_EXPIRATION_TIME - The time when the cluster will be auto-stopped, such as "2017-08-29T18:52:51.142Z". See $ gcloud topic datetimes for information on time formats.
--stop-max-age=STOP_MAX_AGE - The lifespan of the cluster, with auto-stop upon completion, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
- At most one of these can be specified:
--worker-machine-type=WORKER_MACHINE_TYPE - The type of machine to use for primary workers. Defaults to server-specified.
--worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK] - Machine types for primary worker nodes to use with optional rank. A lower rank number is given higher preference. Based on availability, Dataproc tries to create primary worker VMs using the worker machine type with the lowest rank, and then tries to use machine types with higher ranks as necessary. Machine types with the same rank are given the same preference. Example use: --worker-machine-types="type=e2-standard-8,type=n2-standard-8,rank=0". For more information, see Dataproc Flexible VMs.
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
- These variants are also available:
gcloud alpha dataproc clusters create
gcloud beta dataproc clusters create