Dataproc boot disks

You can select a standard, SSD, or balanced Persistent Disk, or a Google Cloud Hyperdisk Balanced volume, as the boot disk for Dataproc cluster nodes.

Select persistent boot disk types for cluster nodes

You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.

Console

You can create a cluster and select a standard, SSD, balanced Persistent Disk, or Hyperdisk Balanced boot disk for manager (master), primary worker, and secondary worker cluster nodes from the Configure nodes panel on the Dataproc Create a cluster page of the Google Cloud console.

gcloud CLI

You can create a cluster and select a standard, SSD, balanced Persistent Disk, or Hyperdisk Balanced boot disk for manager (master), primary worker, and secondary worker cluster nodes using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.

The default persistent boot disk type for Dataproc cluster manager (master) and primary worker nodes is pd-standard. If the VM machine type supports only Hyperdisk Balanced as the boot disk, the default boot disk is hyperdisk-balanced. The default persistent boot disk type for cluster secondary worker nodes is the primary worker node persistent boot disk type.

You can pass a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced to the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags. Any of the valid disk type values can be set on any cluster node type.

Example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    --secondary-worker-boot-disk-type=pd-standard \
    other args ...

You can set the size of persistent boot disks using the --master-boot-disk-size, --worker-boot-disk-size, and --secondary-worker-boot-disk-size flags.
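
When cluster creation is scripted, it can help to assemble the command as an argument list and check the disk type values before invoking gcloud. The following is a minimal Python sketch of that idea, not an official tool: the function name build_create_command, the cluster name, the region, and the size value "100GB" are illustrative assumptions. Only the flag names and the four disk type values come from the text above.

```python
import shlex

# Boot disk type values and node roles named in the text above.
VALID_BOOT_DISK_TYPES = {"pd-standard", "pd-ssd", "pd-balanced", "hyperdisk-balanced"}
NODE_ROLES = ("master", "worker", "secondary-worker")


def build_create_command(cluster_name, region, disk_types=None, disk_sizes=None):
    """Return an argv list for `gcloud dataproc clusters create`.

    disk_types and disk_sizes map a node role to a boot disk type or size.
    Roles that are left out get no flag, so the service default applies.
    """
    argv = ["gcloud", "dataproc", "clusters", "create", cluster_name,
            f"--region={region}"]
    for role, disk_type in (disk_types or {}).items():
        if role not in NODE_ROLES:
            raise ValueError(f"unknown node role: {role}")
        if disk_type not in VALID_BOOT_DISK_TYPES:
            raise ValueError(f"unsupported boot disk type: {disk_type}")
        argv.append(f"--{role}-boot-disk-type={disk_type}")
    for role, size in (disk_sizes or {}).items():
        if role not in NODE_ROLES:
            raise ValueError(f"unknown node role: {role}")
        argv.append(f"--{role}-boot-disk-size={size}")
    return argv


command = build_create_command(
    "example-cluster",  # hypothetical cluster name
    "us-central1",      # hypothetical region
    disk_types={
        "master": "pd-ssd",
        "worker": "hyperdisk-balanced",
        "secondary-worker": "pd-standard",
    },
    disk_sizes={"master": "100GB"},  # illustrative size value
)
print(shlex.join(command))
```

The list form can be passed to subprocess.run without shell quoting concerns. Printing it first gives a command line that can be reviewed before anything is created.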

REST API

The default boot disk type for Dataproc cluster manager (master) and primary worker nodes is pd-standard. If the VM machine type supports only Hyperdisk Balanced as the boot disk, the default boot disk is hyperdisk-balanced. The default boot disk type for secondary worker nodes is the primary worker node boot disk type.
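
The default rules above can be read as a small resolution procedure. The Python sketch below restates them for illustration only. It is not Dataproc code, the function names are hypothetical, and it simplifies by using a single flag to say whether the machine type supports only Hyperdisk Balanced, although each node group can have its own machine type.

```python
from typing import Dict, Optional


def default_boot_disk_type(machine_supports_only_hyperdisk: bool) -> str:
    """Default for master and primary worker nodes, per the rule above."""
    return "hyperdisk-balanced" if machine_supports_only_hyperdisk else "pd-standard"


def effective_boot_disk_types(
    master: Optional[str] = None,
    worker: Optional[str] = None,
    secondary_worker: Optional[str] = None,
    *,
    machine_supports_only_hyperdisk: bool = False,  # simplification: one flag for all groups
) -> Dict[str, str]:
    """Fill in unset boot disk types using the documented defaults."""
    default = default_boot_disk_type(machine_supports_only_hyperdisk)
    master = master or default
    worker = worker or default
    # Secondary workers default to the primary worker boot disk type.
    secondary_worker = secondary_worker or worker
    return {"master": master, "worker": worker, "secondaryWorker": secondary_worker}


print(effective_boot_disk_types(worker="pd-ssd"))
```

In this sketch, setting only the primary worker type to pd-ssd leaves the master at pd-standard and gives secondary workers pd-ssd.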

You can set a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced in the InstanceGroupConfig.DiskConfig.bootDiskType field in the masterConfig, workerConfig, and secondaryWorkerConfig as part of a cluster.create API request. Any of the valid boot disk type values can be set on any cluster node type.

Note: Set the per-node boot disk size using the InstanceGroupConfig.DiskConfig.bootDiskSizeGb field in the master, worker, or secondary worker config.
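
To show where these fields sit in a request, the following Python sketch builds a JSON body that sets bootDiskType and bootDiskSizeGb for each node group. The project ID, cluster name, and sizes are placeholders, and the helper disk_config is hypothetical. The nesting under config and diskConfig reflects the field paths named above and should be checked against the API reference before use.

```python
import json


def disk_config(boot_disk_type, boot_disk_size_gb=None):
    """Build a DiskConfig fragment; the size is omitted when not given."""
    config = {"bootDiskType": boot_disk_type}
    if boot_disk_size_gb is not None:
        config["bootDiskSizeGb"] = boot_disk_size_gb
    return config


# Placeholder identifiers; replace them with real values.
cluster = {
    "projectId": "example-project",
    "clusterName": "example-cluster",
    "config": {
        "masterConfig": {"diskConfig": disk_config("pd-ssd", 100)},
        "workerConfig": {"diskConfig": disk_config("hyperdisk-balanced", 200)},
        # No size given here, so the service chooses the boot disk size.
        "secondaryWorkerConfig": {"diskConfig": disk_config("pd-standard")},
    },
}

request_body = json.dumps(cluster, indent=2)
print(request_body)
```

The printed JSON is the kind of body a cluster.create request would carry. Sending it (authentication, endpoint, and region) is outside the scope of this sketch.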

Hyperdisk settings

When you create a cluster with a Hyperdisk Balanced volume as the boot disk for a Dataproc cluster node, you can set the provisioned IOPS and throughput.

Console

In the Configure nodes panel on the Dataproc Create a cluster page of the Google Cloud console, Hyperdisk Balanced is selected as the default primary boot disk type for manager (master) and primary worker cluster nodes. You can set IOPS and throughput, or accept the default values.

gcloud CLI

You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.

Example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=hyperdisk-balanced \
    --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
    --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
    --worker-boot-disk-type=hyperdisk-balanced \
    --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
    --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
    other args ...

Secondary workers are configured with primary worker boot disk settings.

REST API

You can set provisioned IOPS and provisioned throughput for cluster nodes with Hyperdisk boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields for the manager (master) and worker configs.
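
As an illustration of those two fields, the Python sketch below builds a request body that provisions IOPS and throughput on Hyperdisk Balanced boot disks for the master and primary worker groups. The helper name, project ID, cluster name, and numeric values are assumptions made for the example, not recommendations. Encoding the numbers as strings follows the usual JSON convention for 64-bit integer fields and is also an assumption to verify against the API reference.

```python
import json


def hyperdisk_boot_config(provisioned_iops, provisioned_throughput):
    """Build a DiskConfig fragment for a Hyperdisk Balanced boot disk."""
    return {
        "bootDiskType": "hyperdisk-balanced",
        # Assumption: 64-bit integer fields are sent as JSON strings.
        "bootDiskProvisionedIops": str(provisioned_iops),
        "bootDiskProvisionedThroughput": str(provisioned_throughput),
    }


# Placeholder identifiers and illustrative performance values.
cluster = {
    "projectId": "example-project",
    "clusterName": "example-cluster",
    "config": {
        "masterConfig": {"diskConfig": hyperdisk_boot_config(3000, 140)},
        "workerConfig": {"diskConfig": hyperdisk_boot_config(4000, 200)},
        # secondaryWorkerConfig is left out: secondary workers are
        # configured with the primary worker boot disk settings.
    },
}

request_body = json.dumps(cluster, indent=2)
print(request_body)
```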