Create a Dataproc cluster by using the gcloud CLI
This page shows you how to use the Google Cloud CLI gcloud command-line tool to create a Dataproc cluster, run an Apache Spark job in the cluster, and then modify the number of workers in the cluster.

You can find out how to do the same or similar tasks with Quickstarts Using the API Explorer, with the Google Cloud console in Create a Dataproc cluster by using the Google Cloud console, and with the client libraries in Create a Dataproc cluster by using client libraries.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
Install the Google Cloud CLI.
Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init
Create or select a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Roles required to select or create a project:

- Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Create a Google Cloud project:

gcloud projects create PROJECT_ID

Replace PROJECT_ID with a name for the Google Cloud project you are creating.

Select the Google Cloud project that you created:

gcloud config set project PROJECT_ID

Replace PROJECT_ID with your Google Cloud project name.
Verify that you have the permissions required to complete this guide.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataproc API:

gcloud services enable dataproc.googleapis.com

Roles required to enable APIs: to enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
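To confirm that the API is active before you continue (an optional check, not part of the original steps), you can list the project's enabled services. This is a minimal sketch; piping to grep is just one convenient way to narrow the output:

# Verify that the Dataproc API is enabled for the current project.
gcloud services list --enabled | grep dataproc.googleapis.com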
Required roles
Certain IAM roles are required to run the examples on this page. Depending on organization policies, these roles may have already been granted. To check role grants, see Do you need to grant roles?.

For more information about granting roles, see Manage access to projects, folders, and organizations.
User roles
To get the permissions that you need to create a Dataproc cluster, ask your administrator to grant you the following IAM roles:
- Dataproc Editor (roles/dataproc.editor) on the project
- Service Account User (roles/iam.serviceAccountUser) on the Compute Engine default service account
Service account role
To ensure that the Compute Engine default service account has the necessary permissions to create a Dataproc cluster, ask your administrator to grant the Compute Engine default service account the Dataproc Worker (roles/dataproc.worker) IAM role on the project. Important: You must grant this role to the Compute Engine default service account, not to your user account. Failure to grant the role to the correct principal might result in permission errors.
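If you are an administrator granting these roles yourself, the following is a minimal sketch using the standard gcloud projects add-iam-policy-binding command. It assumes PROJECT_ID and PROJECT_NUMBER are your project's ID and number, and USER_EMAIL is a placeholder for your account; the Compute Engine default service account conventionally has the form PROJECT_NUMBER-compute@developer.gserviceaccount.com, but verify the address in your project:

# Grant the Dataproc Editor role to your user account on the project.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/dataproc.editor"

# Grant the Dataproc Worker role to the Compute Engine default service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/dataproc.worker"

The Service Account User grant listed above is scoped to the service account rather than the project, so it uses gcloud iam service-accounts add-iam-policy-binding instead.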
Create a cluster
To create a cluster called example-cluster, run the following gcloud dataproc clusters create command.

Note: A convenient way to run the gcloud command-line tool is from Cloud Shell, which has the Google Cloud CLI pre-installed. Cloud Shell is free for Google Cloud customers. To use Cloud Shell, you need a Google Cloud project.

gcloud dataproc clusters create example-cluster --region=REGION

Replace the following:

- REGION: Specify a region where the cluster will be located.
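After the command completes, you can optionally confirm the cluster's state. This check is not part of the original steps; describe and the --format flag are standard gcloud features:

# Print the cluster's state; a healthy new cluster reports RUNNING.
gcloud dataproc clusters describe example-cluster \
    --region=REGION \
    --format="value(status.state)"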
Submit a job
To submit a sample Spark job that calculates a rough value for pi, run the following gcloud dataproc jobs submit spark command:

gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=REGION \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000

Notes:

Replace the following:

- REGION: Specify the cluster region.

- The job runs on the example-cluster.
- The class contains the main method for the SparkPi application, which calculates an approximate value of pi.
- The jar file contains the job code.
- 1000 is a job parameter. It specifies the number of tasks (iterations) the job performs to calculate the value of pi.
- Arguments to the job are passed after the double dash (--). For more information, see the Google Cloud CLI documentation.

The job's running and final output is displayed in the terminal window:

Waiting for job output...
...
Pi is roughly 3.14118528
...
Job finished successfully.
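If you want to look the job up afterward (an optional step, not in the original walkthrough), jobs list is part of the standard gcloud dataproc command surface:

# List the jobs that have run on the cluster.
gcloud dataproc jobs list \
    --region=REGION \
    --cluster=example-cluster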
Update a cluster
To change the number of workers in the cluster to five, run the following command:

gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 5

The command output displays cluster details:

workerConfig:
...
  instanceNames:
  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
statusHistory:
...
- detail: Add 3 workers.

To decrease the number of worker nodes to the original value of 2, run the following command:

gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 2
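To read the worker count back after either update (an optional check, not in the original page), you can combine describe with the standard --format flag:

# Print the current number of primary workers in the cluster.
gcloud dataproc clusters describe example-cluster \
    --region=REGION \
    --format="value(config.workerConfig.numInstances)"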
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

- To delete the example-cluster, run the clusters delete command:

gcloud dataproc clusters delete example-cluster \
    --region=REGION
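As a final sanity check (not part of the original steps), listing the clusters in the region should no longer show example-cluster:

# Verify that the cluster has been deleted.
gcloud dataproc clusters list --region=REGION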
What's next
- Learn how to write and run a Spark Scala job.