Create a Dataproc partial cluster

To mitigate the effects of the unavailability of user-specified VMs in specific regions at specific times (stockouts), Dataproc lets you request the creation of a partial cluster by specifying a minimum number of primary workers that is acceptable to allow cluster creation.

Note: See Dataproc secondary workers to understand the difference between primary and secondary workers.

Standard cluster:
  • If one or more primary workers cannot be created and initialized, cluster creation fails. Workers that are created continue to run and incur charges until deleted by the user.
  • Cluster creation time is optimized.
  • Single node clusters are available for creation.

Partial cluster:
  • If the specified minimum number of workers can be created, the cluster is created. Failed (uninitialized) workers are deleted and don't incur charges. If the specified minimum number of workers can't be created and initialized, the cluster is not created. Workers that are created aren't deleted to allow for debugging.
  • Longer cluster creation time can occur since all nodes must report provisioning status.
  • Single node clusters are not available for creation.
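The success rule in the comparison above can be sketched as a small predicate. This is only an illustrative model of the documented behavior, not Dataproc code; the function name `cluster_is_created` and its parameters are hypothetical.

```python
from typing import Optional


def cluster_is_created(initialized: int, requested: int,
                       minimum: Optional[int] = None) -> bool:
    """Return True if cluster creation would succeed under the rules above.

    initialized: primary workers that were created and initialized.
    requested:   total number of primary workers requested.
    minimum:     acceptable minimum for a partial cluster, or None for a
                 standard cluster (every requested worker is then required).
    """
    required = requested if minimum is None else minimum
    return initialized >= required


# Hypothetical stockout: 10 workers requested, only 8 could be initialized.
standard_ok = cluster_is_created(initialized=8, requested=10)
partial_ok = cluster_is_created(initialized=8, requested=10, minimum=8)
print(standard_ok, partial_ok)  # prints: False True
```

With the same stockout, the standard request fails while the partial request with a minimum of 8 succeeds.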

Autoscaling

Use autoscaling with partial cluster creation to make sure that the target (full) number of primary workers is created. Autoscaling will try to acquire failed workers in the background if the workload requires them.

The following is a sample autoscaling policy that retries until the total number of primary worker instances reaches a target size of 10. The policy's minInstances and maxInstances match the minimum and total number of primary workers specified at cluster creation time (see Create a partial cluster). Setting the scaleDownFactor to 0 prevents the cluster from scaling down from 10 to 8, and will help keep the number of workers at the maximum 10-worker limit.

workerConfig:
  minInstances: 8
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 1
    scaleDownFactor: 0
    gracefulDecommissionTimeout: 1h
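As a rough sanity check, the relationship between the sample policy and the cluster's worker counts can be expressed in a few lines of Python. The dictionary mirrors the sample policy; the helper `policy_mismatches` is a hypothetical name introduced here for illustration and is not part of any Dataproc tooling.

```python
from typing import List

# The sample policy above, written as a Python dictionary.
policy = {
    "workerConfig": {"minInstances": 8, "maxInstances": 10},
    "basicAlgorithm": {
        "cooldownPeriod": "2m",
        "yarnConfig": {
            "scaleUpFactor": 1,
            "scaleDownFactor": 0,
            "gracefulDecommissionTimeout": "1h",
        },
    },
}


def policy_mismatches(policy: dict, min_num_workers: int,
                      num_workers: int) -> List[str]:
    """List the ways the policy deviates from the guidance above."""
    problems = []
    worker = policy["workerConfig"]
    if worker["minInstances"] != min_num_workers:
        problems.append("minInstances differs from the cluster minimum")
    if worker["maxInstances"] != num_workers:
        problems.append("maxInstances differs from the cluster total")
    if policy["basicAlgorithm"]["yarnConfig"]["scaleDownFactor"] != 0:
        problems.append("scaleDownFactor is not 0, so the cluster may scale down")
    return problems


# A cluster created with a minimum of 8 and a total of 10 primary workers.
print(policy_mismatches(policy, min_num_workers=8, num_workers=10))  # prints: []
```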

Create a partial cluster

You can use the Google Cloud CLI or the Dataproc API to create a Dataproc partial cluster.

Note: Dataproc partial cluster creation is not available in the Google Cloud console.

gcloud

To create a Dataproc partial cluster on the command line, run the following gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell.

gcloud dataproc clusters create CLUSTER_NAME \
    --project=PROJECT \
    --region=REGION \
    --num-workers=NUM_WORKERS \
    --min-num-workers=MIN_NUM_WORKERS \
    other args ...

Replace the following:

  • CLUSTER_NAME: The cluster name must start with a lowercase letter followed by up to 51 lowercase letters, numbers, and hyphens, and cannot end with a hyphen.
  • PROJECT: Specify the project associated with the job cluster.
  • REGION: Specify the Compute Engine region where the job cluster will be located.
  • NUM_WORKERS: The total number of primary workers in the cluster to create if available.
  • MIN_NUM_WORKERS: The minimum number of primary workers to create if the specified total number of workers (NUM_WORKERS) cannot be created. Cluster creation fails if this minimum number of primary workers cannot be created (workers that are created are not deleted to allow for debugging). If this flag is omitted, standard cluster creation with the total number of primary workers (NUM_WORKERS) is attempted.
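The following is a minimal sketch of how a script might validate these values and assemble the command before running it. The helper `build_create_command` and the regular expression are illustrative assumptions based on the naming rule stated above; the check that the minimum does not exceed the total is also an assumption, not a documented constraint. The command is only built, not executed.

```python
import re
from typing import List

# One lowercase letter, then up to 51 more characters (lowercase letters,
# digits, hyphens), the last of which must not be a hyphen.
CLUSTER_NAME_RE = re.compile(r"[a-z](?:[a-z0-9-]{0,50}[a-z0-9])?")


def build_create_command(cluster_name: str, project: str, region: str,
                         num_workers: int, min_num_workers: int) -> List[str]:
    """Build the argument list for the gcloud command shown above."""
    if not CLUSTER_NAME_RE.fullmatch(cluster_name):
        raise ValueError(f"invalid cluster name: {cluster_name!r}")
    # Assumption: a minimum above the requested total makes no sense.
    if min_num_workers > num_workers:
        raise ValueError("min_num_workers cannot exceed num_workers")
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--project={project}",
        f"--region={region}",
        f"--num-workers={num_workers}",
        f"--min-num-workers={min_num_workers}",
    ]


# Placeholder values; pass the list to subprocess.run() to execute it.
command = build_create_command("my-cluster", "my-project", "us-central1", 10, 8)
print(" ".join(command))
```

Returning a list rather than a single string avoids shell quoting issues when the command is later passed to `subprocess.run`.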

REST

To create a Dataproc partial cluster, specify the minimum number of primary workers in the workerConfig.minNumInstances field as part of a clusters.create request.

Note: You can click the Equivalent REST or command line links at the bottom of the left panel of the Dataproc Google Cloud console Create a cluster page to have the Console construct an equivalent API REST request or gcloud CLI command to use in your code or from the command line to create a cluster.
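As an illustration of the REST approach, the sketch below assembles the URL and JSON body for a clusters.create request with workerConfig.minNumInstances set. The helper name `build_create_request` and the sample values are hypothetical, and authentication and sending the request are left out. Treat the endpoint and field layout as assumptions to verify against the Dataproc API reference.

```python
import json
from typing import Tuple


def build_create_request(project: str, region: str, cluster_name: str,
                         num_workers: int,
                         min_num_workers: int) -> Tuple[str, dict]:
    """Return the POST URL and request body for a partial cluster."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project}/regions/{region}/clusters"
    )
    body = {
        "projectId": project,
        "clusterName": cluster_name,
        "config": {
            "workerConfig": {
                # Total primary workers to create if available.
                "numInstances": num_workers,
                # Acceptable minimum for partial cluster creation.
                "minNumInstances": min_num_workers,
            }
        },
    }
    return url, body


# Placeholder project, region, and cluster name.
url, body = build_create_request("my-project", "us-central1", "my-cluster", 10, 8)
print(url)
print(json.dumps(body, indent=2))
```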

Display the number of provisioned workers

After creating a cluster, you can run the following gcloud CLI command to list the number of workers, including any secondary workers, provisioned in your cluster.

gcloud dataproc clusters list \
    --project=PROJECT \
    --region=REGION \
    --filter=clusterName=CLUSTER_NAME
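If you add `--format=json` to the command above, a script can read the worker counts from the output. The sketch below assumes the counts appear under config.workerConfig.numInstances and config.secondaryWorkerConfig.numInstances in each listed cluster. The helper `worker_counts` and the sample JSON are hypothetical, not captured output.

```python
import json


def worker_counts(cluster: dict) -> dict:
    """Extract primary and secondary worker counts from one cluster entry."""
    config = cluster.get("config", {})
    primary = config.get("workerConfig", {}).get("numInstances", 0)
    secondary = config.get("secondaryWorkerConfig", {}).get("numInstances", 0)
    return {"primary": primary, "secondary": secondary}


# Hypothetical, heavily trimmed stand-in for the JSON output of the list command.
sample_output = """
[
  {
    "clusterName": "my-cluster",
    "config": {"workerConfig": {"numInstances": 8}}
  }
]
"""

clusters = json.loads(sample_output)
counts = worker_counts(clusters[0])
print(counts)  # prints: {'primary': 8, 'secondary': 0}
```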


Last updated 2025-12-15 UTC.