Create a cluster

Dataproc prevents the creation of clusters with image versions prior to 1.3.95, 1.4.77, 1.5.53, and 2.0.27, which were affected by Apache Log4j security vulnerabilities. Dataproc also prevents cluster creation for Dataproc image versions 0.x, 1.0.x, 1.1.x, and 1.2.x. Dataproc advises that, when possible, you create Dataproc clusters with the latest sub-minor image versions.

Image version | log4j version | Customer guidance
2.0.29, 1.5.55, and 1.4.79, or later of each | log4j 2.17.1 | Advised
2.0.28, 1.5.54, and 1.4.78 | log4j 2.17.0 | Advised
2.0.27, 1.5.53, and 1.4.77 | log4j 2.16.0 | Strongly recommended
2.0.26, 1.5.52, and 1.4.76, or earlier of each | Older version | Discontinue use

See the Dataproc release notes for specific image and log4j update information.

Create a Dataproc cluster

Requirements:

  • Name: The cluster name must start with a lowercase letter followed by up to 51 lowercase letters, numbers, and hyphens, and cannot end with a hyphen.

  • Cluster region: You must specify a Compute Engine region for the cluster, such as us-east1 or europe-west1, to isolate cluster resources, such as VM instances and cluster metadata stored in Cloud Storage, within the region.

    • See Regional endpoints for more information on regional endpoints.
    • See Available regions & zones for information on selecting a region. You can also run the gcloud compute regions list command to display a listing of available regions (see the example after this list).
  • Connectivity: Compute Engine Virtual Machine instances (VMs) in a Dataproc cluster, consisting of master and worker VMs, require full internal IP networking cross connectivity. The default VPC network provides this connectivity (see Dataproc Cluster Network Configuration).
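
For example, to display the regions available to your project before choosing one, you can run the following command in a terminal or Cloud Shell. This is a minimal sketch; add flags such as --filter only if you need to narrow the listing.

gcloud compute regions list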


gcloud

To create a Dataproc cluster on the command line, run the gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION

The command creates a cluster with default Dataproc service settings for your master and worker virtual machine instances, disk sizes and types, network type, region and zone where your cluster is deployed, and other cluster settings. See the gcloud dataproc clusters create command for information on using command line flags to customize cluster settings.
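
As an illustrative sketch only, the following command overrides a few of those defaults. The machine types, worker count, and image version shown here are placeholder assumptions; substitute values appropriate for your project and region.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --zone=ZONE \
    --master-machine-type=n1-standard-4 \
    --worker-machine-type=n1-standard-4 \
    --num-workers=2 \
    --image-version=2.0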

Create a cluster with a YAML file

  1. Run the following gcloud command to export the configuration of an existing Dataproc cluster into a cluster.yaml file.
    gcloud dataproc clusters export EXISTING_CLUSTER_NAME \
        --region=REGION \
        --destination=cluster.yaml
  2. Create a new cluster by importing the YAML file configuration.
    gcloud dataproc clusters import NEW_CLUSTER_NAME \
        --region=REGION \
        --source=cluster.yaml

Note: During the export operation, cluster-specific fields, such as cluster name, output-only fields, and automatically applied labels are filtered. These fields are disallowed in the imported YAML file used to create a cluster.
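
For orientation only, a pared-down cluster.yaml might look like the following sketch. It assumes the exported file mirrors the camelCase Cluster resource fields shown in the REST section of this page; an actual export from your cluster will contain many more fields, and the values here are placeholders.

# The cluster name is not included in the file; it is supplied by the import command.
config:
  gceClusterConfig:
    zoneUri: ZONE
  masterConfig:
    machineTypeUri: n1-standard-2
    numInstances: 1
  workerConfig:
    machineTypeUri: n1-standard-2
    numInstances: 2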

Note: You can click the Equivalent REST or command line links at the bottom of the left panel on the Dataproc Google Cloud console Create a cluster page to have the Console construct an equivalent API REST request or gcloud tool command to use in your code or from the command line to create a cluster.

REST

This section shows how to create a cluster with required values and the default configuration (1 master, 2 workers).

Before using any of the request data, make the following replacements:

  • CLUSTER_NAME: cluster name
  • PROJECT: Google Cloud project ID
  • REGION: An available Compute Engine region where the cluster will be created.
  • ZONE: An optional zone within the selected region where the cluster will be created.

HTTP method and URL:

POST https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters

Request JSON body:

{  "project_id":"PROJECT",  "cluster_name":"CLUSTER_NAME",  "config":{    "master_config":{      "num_instances":1,      "machine_type_uri":"n1-standard-2",      "image_uri":""    },    "softwareConfig": {      "imageVersion": "",      "properties": {},      "optionalComponents": []    },    "worker_config":{      "num_instances":2,      "machine_type_uri":"n1-standard-2",      "image_uri":""    },    "gce_cluster_config":{      "zone_uri":"ZONE"    }  }}

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{"name": "projects/PROJECT/regions/REGION/operations/b5706e31......",  "metadata": {    "@type": "type.googleapis.com/google.cloud.dataproc.v1.ClusterOperationMetadata",    "clusterName": "CLUSTER_NAME",    "clusterUuid": "5fe882b2-...",    "status": {      "state": "PENDING",      "innerState": "PENDING",      "stateStartTime": "2019-11-21T00:37:56.220Z"    },    "operationType": "CREATE",    "description": "Create cluster with 2 workers",    "warnings": [      "For PD-Standard without local SSDs, we strongly recommend provisioning 1TB ...""    ]  }}

Console

Open the Dataproc Create a cluster page in the Google Cloud console in your browser, then click Create in the Cluster on Compute Engine row in the Create a Dataproc cluster on Compute Engine page. The Set up cluster panel is selected with fields filled in with default values. You can select each panel and confirm or change default values to customize your cluster.

Click Create to create the cluster. The cluster name appears on the Clusters page, and its status is updated to Running after the cluster is provisioned. Click the cluster name to open the cluster details page where you can examine jobs, instances, and configuration settings for your cluster and connect to web interfaces running on your cluster.
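
However you create the cluster, you can also confirm its configuration and status from the command line, for example:

gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION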

Go

  1. Install the client library.
  2. Set up application default credentials.
  3. Run the code. See Setting up your development environment. (A sketch of typical commands for steps 1 and 2 follows the sample below.)
    import("context""fmt""io"dataproc"cloud.google.com/go/dataproc/apiv1""cloud.google.com/go/dataproc/apiv1/dataprocpb""google.golang.org/api/option")funccreateCluster(wio.Writer,projectID,region,clusterNamestring)error{// projectID := "your-project-id"// region := "us-central1"// clusterName := "your-cluster"ctx:=context.Background()// Create the cluster client.endpoint:=region+"-dataproc.googleapis.com:443"clusterClient,err:=dataproc.NewClusterControllerClient(ctx,option.WithEndpoint(endpoint))iferr!=nil{returnfmt.Errorf("dataproc.NewClusterControllerClient: %w",err)}deferclusterClient.Close()// Create the cluster config.req:=&dataprocpb.CreateClusterRequest{ProjectId:projectID,Region:region,Cluster:&dataprocpb.Cluster{ProjectId:projectID,ClusterName:clusterName,Config:&dataprocpb.ClusterConfig{MasterConfig:&dataprocpb.InstanceGroupConfig{NumInstances:1,MachineTypeUri:"n1-standard-2",},WorkerConfig:&dataprocpb.InstanceGroupConfig{NumInstances:2,MachineTypeUri:"n1-standard-2",},},},}// Create the cluster.op,err:=clusterClient.CreateCluster(ctx,req)iferr!=nil{returnfmt.Errorf("CreateCluster: %w",err)}resp,err:=op.Wait(ctx)iferr!=nil{returnfmt.Errorf("CreateCluster.Wait: %w",err)}// Output a success message.fmt.Fprintf(w,"Cluster created successfully: %s",resp.ClusterName)returnnil}

Java

  1. Install the client library.
  2. Set up application default credentials.
  3. Run the code. See Setting Up a Java Development Environment.
    import com.google.api.gax.longrunning.OperationFuture;
    import com.google.cloud.dataproc.v1.Cluster;
    import com.google.cloud.dataproc.v1.ClusterConfig;
    import com.google.cloud.dataproc.v1.ClusterControllerClient;
    import com.google.cloud.dataproc.v1.ClusterControllerSettings;
    import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
    import com.google.cloud.dataproc.v1.InstanceGroupConfig;
    import java.io.IOException;
    import java.util.concurrent.ExecutionException;

    public class CreateCluster {

      public static void createCluster() throws IOException, InterruptedException {
        // TODO(developer): Replace these variables before running the sample.
        String projectId = "your-project-id";
        String region = "your-project-region";
        String clusterName = "your-cluster-name";
        createCluster(projectId, region, clusterName);
      }

      public static void createCluster(String projectId, String region, String clusterName)
          throws IOException, InterruptedException {
        String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

        // Configure the settings for the cluster controller client.
        ClusterControllerSettings clusterControllerSettings =
            ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

        // Create a cluster controller client with the configured settings. The client only needs to be
        // created once and can be reused for multiple requests. Using a try-with-resources
        // closes the client, but this can also be done manually with the .close() method.
        try (ClusterControllerClient clusterControllerClient =
            ClusterControllerClient.create(clusterControllerSettings)) {
          // Configure the settings for our cluster.
          InstanceGroupConfig masterConfig =
              InstanceGroupConfig.newBuilder()
                  .setMachineTypeUri("n1-standard-2")
                  .setNumInstances(1)
                  .build();
          InstanceGroupConfig workerConfig =
              InstanceGroupConfig.newBuilder()
                  .setMachineTypeUri("n1-standard-2")
                  .setNumInstances(2)
                  .build();
          ClusterConfig clusterConfig =
              ClusterConfig.newBuilder()
                  .setMasterConfig(masterConfig)
                  .setWorkerConfig(workerConfig)
                  .build();
          // Create the cluster object with the desired cluster config.
          Cluster cluster =
              Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

          // Create the Cloud Dataproc cluster.
          OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
              clusterControllerClient.createClusterAsync(projectId, region, cluster);
          Cluster response = createClusterAsyncRequest.get();

          // Print out a success message.
          System.out.printf("Cluster created successfully: %s", response.getClusterName());

        } catch (ExecutionException e) {
          System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
        }
      }
    }

Node.js

  1. Install the client library.
  2. Set up application default credentials.
  3. Run the code. See Setting up a Node.js development environment. (A typical install command for step 1 follows the sample below.)
    const dataproc = require('@google-cloud/dataproc');

    // TODO(developer): Uncomment and set the following variables
    // projectId = 'YOUR_PROJECT_ID'
    // region = 'YOUR_CLUSTER_REGION'
    // clusterName = 'YOUR_CLUSTER_NAME'

    // Create a client with the endpoint set to the desired cluster region
    const client = new dataproc.v1.ClusterControllerClient({
      apiEndpoint: `${region}-dataproc.googleapis.com`,
      projectId: projectId,
    });

    async function createCluster() {
      // Create the cluster config
      const request = {
        projectId: projectId,
        region: region,
        cluster: {
          clusterName: clusterName,
          config: {
            masterConfig: {
              numInstances: 1,
              machineTypeUri: 'n1-standard-2',
            },
            workerConfig: {
              numInstances: 2,
              machineTypeUri: 'n1-standard-2',
            },
          },
        },
      };

      // Create the cluster
      const [operation] = await client.createCluster(request);
      const [response] = await operation.promise();

      // Output a success message
      console.log(`Cluster created successfully: ${response.clusterName}`);
    }

    createCluster();
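
As with the Go sample, steps 1 and 2 above are typically a package install followed by the same gcloud auth application-default login command. A likely install command, assuming npm, is:

npm install @google-cloud/dataproc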

Python

  1. Install the client library.
  2. Set up application default credentials.
  3. Run the code. See Setting Up a Python Development Environment. (A typical install command for step 1 follows the sample below.)
    from google.cloud import dataproc_v1 as dataproc


    def create_cluster(project_id, region, cluster_name):
        """This sample walks a user through creating a Cloud Dataproc cluster
        using the Python client library.

        Args:
            project_id (string): Project to use for creating resources.
            region (string): Region where the resources should live.
            cluster_name (string): Name to use for creating a cluster.
        """

        # Create a client with the endpoint set to the desired cluster region.
        cluster_client = dataproc.ClusterControllerClient(
            client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
        )

        # Create the cluster config.
        cluster = {
            "project_id": project_id,
            "cluster_name": cluster_name,
            "config": {
                "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
                "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
            },
        }

        # Create the cluster.
        operation = cluster_client.create_cluster(
            request={"project_id": project_id, "region": region, "cluster": cluster}
        )
        result = operation.result()

        # Output a success message.
        print(f"Cluster created successfully: {result.cluster_name}")
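
Similarly, step 1 above is typically a single install command, assuming pip and the google-cloud-dataproc package:

pip install google-cloud-dataproc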
