Create a Dataproc partial cluster
Specific VMs are sometimes unavailable in a specific region at a specific time (stockouts). To reduce the impact of stockouts, Dataproc lets you request a partial cluster. You specify the minimum number of primary workers that you will accept, and cluster creation succeeds if at least that many can be created.
| Standard cluster | Partial cluster |
|---|---|
| If one or more primary workers cannot be created and initialized, cluster creation fails. Workers that are created continue to run and incur charges until deleted by the user. | If the specified minimum number of workers can be created, the cluster is created. Failed (uninitialized) workers are deleted and don't incur charges. If the specified minimum number of workers can't be created and initialized, the cluster is not created. Workers that are created aren't deleted to allow for debugging. |
| Cluster creation time is optimized. | Cluster creation can take longer because all nodes must report their provisioning status. |
| Single-node clusters can be created. | Single-node clusters can't be created. |
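The table's success and failure rules can be summarized as a small decision function. The following is a minimal conceptual model, not Dataproc code: `cluster_is_created` and `describe` are hypothetical helpers, and the range checks on their arguments belong to the model and are not documented Dataproc limits.

```python
from typing import Optional


def cluster_is_created(requested: int, initialized: int,
                       min_workers: Optional[int] = None) -> bool:
    """Conceptual model of the table above (not Dataproc's implementation).

    requested:   total primary workers asked for (NUM_WORKERS).
    initialized: primary workers that were actually created and initialized.
    min_workers: MIN_NUM_WORKERS, or None to model a standard cluster.
    """
    if not 0 <= initialized <= requested:
        raise ValueError("initialized must be between 0 and requested")
    if min_workers is None:
        # Standard cluster: every requested primary worker must come up.
        return initialized == requested
    # Sanity check used by this model only.
    if not 0 < min_workers <= requested:
        raise ValueError("min_workers must be between 1 and requested")
    # Partial cluster: reaching the minimum is enough.
    return initialized >= min_workers


def describe(requested: int, initialized: int,
             min_workers: Optional[int] = None) -> str:
    """Summarize what happens to workers in each case from the table."""
    created = cluster_is_created(requested, initialized, min_workers)
    failed = requested - initialized
    if created and min_workers is not None and failed:
        # Partial cluster: failed (uninitialized) workers are deleted.
        return (f"created with {initialized}/{requested} workers; "
                f"{failed} failed worker(s) deleted")
    if created:
        return f"created with {initialized}/{requested} workers"
    if min_workers is None:
        # Standard cluster failure: created workers keep running and billing.
        return f"not created; {initialized} worker(s) keep running until you delete them"
    # Partial cluster failure: created workers are kept for debugging.
    return f"not created; {initialized} worker(s) kept for debugging"


print(describe(10, 9))
print(describe(10, 9, min_workers=8))
print(describe(10, 7, min_workers=8))
```

With 9 of 10 workers available, a standard request fails while a partial request with a minimum of 8 succeeds. With only 7 available, the partial request also fails.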
Autoscaling
Use autoscaling with partial cluster creation to help ensure that the target (full) number of primary workers is eventually created. If the workload requires them, autoscaling tries to acquire the failed workers in the background.
The following sample autoscaling policy keeps retrying until the total number of primary worker instances reaches a target size of 10. The policy's minInstances and maxInstances match the minimum and total number of primary workers specified at cluster creation time (see Create a partial cluster). Setting scaleDownFactor to 0 prevents the cluster from scaling down from 10 to 8 workers and helps keep the number of workers at the 10-worker maximum.
```yaml
workerConfig:
  minInstances: 8
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 1
    scaleDownFactor: 0
    gracefulDecommissionTimeout: 1h
```

Create a partial cluster
You can use the Google Cloud CLI or the Dataproc API to create a Dataproc partial cluster.
Note: Dataproc partial cluster creation is not available in the Google Cloud console.

gcloud
To create a Dataproc partial cluster on the command line, run the following gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell.
```shell
gcloud dataproc clusters create CLUSTER_NAME \
    --project=PROJECT \
    --region=REGION \
    --num-workers=NUM_WORKERS \
    --min-num-workers=MIN_NUM_WORKERS \
    other args ...
```
Replace the following:
- CLUSTER_NAME: The cluster name must start with a lowercase letter followed by up to 51 lowercase letters, numbers, and hyphens, and cannot end with a hyphen.
- PROJECT: The project associated with the cluster.
- REGION: The Compute Engine region where the cluster will be located.
- NUM_WORKERS: The total number of primary workers to create in the cluster, if available.
- MIN_NUM_WORKERS: The minimum number of primary workers to create if the specified total number of workers (NUM_WORKERS) cannot be created. Cluster creation fails if this minimum number of primary workers cannot be created (workers that are created are not deleted, to allow for debugging). If this flag is omitted, standard cluster creation with the total number of primary workers (NUM_WORKERS) is attempted.
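If you script cluster creation, you can check the values before you call gcloud. The following Python sketch does three things. It validates CLUSTER_NAME against the naming rule listed above. It assembles the argument list for the command above without running it. It also checks that an autoscaling policy's minInstances and maxInstances match MIN_NUM_WORKERS and NUM_WORKERS, as the Autoscaling section describes. The helper names are hypothetical. The requirement that MIN_NUM_WORKERS be between 1 and NUM_WORKERS is this sketch's own sanity check, not a documented gcloud rule.

```python
import re
import shlex
from typing import List, Optional

# Naming rule from the list above: a lowercase letter, then up to 51
# lowercase letters, digits, or hyphens, not ending in a hyphen.
CLUSTER_NAME_RE = re.compile(r"[a-z](?:[a-z0-9-]{0,50}[a-z0-9])?")


def build_create_args(cluster_name: str, project: str, region: str,
                      num_workers: int,
                      min_num_workers: Optional[int] = None) -> List[str]:
    """Return the gcloud argv for a (partial) cluster; it is not executed."""
    if not CLUSTER_NAME_RE.fullmatch(cluster_name):
        raise ValueError(f"invalid cluster name: {cluster_name!r}")
    args = ["gcloud", "dataproc", "clusters", "create", cluster_name,
            f"--project={project}", f"--region={region}",
            f"--num-workers={num_workers}"]
    if min_num_workers is not None:
        # Sketch-only check: the minimum can't exceed the requested total.
        if not 0 < min_num_workers <= num_workers:
            raise ValueError("min_num_workers must be between 1 and num_workers")
        args.append(f"--min-num-workers={min_num_workers}")
    # Without --min-num-workers, standard cluster creation is attempted.
    return args


def policy_matches_flags(policy: dict, num_workers: int,
                         min_num_workers: int) -> bool:
    """True if the policy's worker bounds equal the creation flags."""
    worker = policy.get("workerConfig", {})
    return (worker.get("minInstances") == min_num_workers
            and worker.get("maxInstances") == num_workers)


args = build_create_args("my-partial-cluster", "my-project", "us-central1", 10, 8)
print(shlex.join(args))
print(policy_matches_flags({"workerConfig": {"minInstances": 8, "maxInstances": 10}}, 10, 8))
```

Returning a list instead of a single string lets you pass the arguments to a process runner without shell quoting. The project, region, and cluster names in the usage lines are placeholders.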
REST
To create a Dataproc partial cluster, specify the minimum number of primary workers in the workerConfig.minNumInstances field as part of a clusters.create request.
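The following Python sketch shows where workerConfig.minNumInstances sits in a clusters.create request. It builds the JSON body and the v1 REST URL but does not send the request, so authentication and any other cluster settings are left out. The helper names are hypothetical. The body sets only the fields needed for a partial cluster: projectId, clusterName, and config.workerConfig with numInstances (the total) and minNumInstances (the minimum).

```python
import json


def create_cluster_url(project_id: str, region: str) -> str:
    """Dataproc v1 clusters.create endpoint (POST)."""
    return (f"https://dataproc.googleapis.com/v1/projects/{project_id}"
            f"/regions/{region}/clusters")


def build_partial_cluster_request(project_id: str, cluster_name: str,
                                  num_workers: int, min_num_workers: int) -> dict:
    """Minimal request body; other cluster settings are omitted."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "workerConfig": {
                # Total primary workers to create if available (NUM_WORKERS).
                "numInstances": num_workers,
                # Minimum acceptable primary workers (MIN_NUM_WORKERS).
                "minNumInstances": min_num_workers,
            }
        },
    }


body = build_partial_cluster_request("my-project", "my-partial-cluster", 10, 8)
print(create_cluster_url("my-project", "us-central1"))
print(json.dumps(body, indent=2))
```

In practice you would send this body as an authenticated POST, for example with a Google API client library. The sketch stops at building the payload so it can run without credentials.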
Display the number of provisioned workers
After you create a cluster, you can run the following gcloud CLI command to list the number of workers, including any secondary workers, provisioned in your cluster.
```shell
gcloud dataproc clusters list \
    --project=PROJECT \
    --region=REGION \
    --filter=clusterName=CLUSTER_NAME
```
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.