Create a Dataproc cluster by using the gcloud CLI

This page shows you how to use the Google Cloud CLI (gcloud command-line tool) to create a Dataproc cluster, run an Apache Spark job in the cluster, and then modify the number of workers in the cluster.

You can find out how to do the same or similar tasks with Quickstarts Using the API Explorer, with the Google Cloud console in Create a Dataproc cluster by using the Google Cloud console, and with the client libraries in Create a Dataproc cluster by using client libraries.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

    Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.
  3. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  4. To initialize the gcloud CLI, run the following command:

    gcloud init
  5. Create or select a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Verify that you have the permissions required to complete this guide.

  7. Verify that billing is enabled for your Google Cloud project.

  8. Enable the Dataproc API:

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    gcloud services enable dataproc.googleapis.com
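
    To confirm that the API is enabled, you can optionally list the enabled services and look for Dataproc. This is a quick check that assumes a Unix-like shell with grep available, not a required step:

    # Show enabled services and filter for the Dataproc API.
    gcloud services list --enabled | grep dataproc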

Required roles

Certain IAM roles are required to run the examples on this page. Depending on organization policies, these roles may have already been granted. To check role grants, see Do you need to grant roles?

For more information about granting roles, see Manage access to projects, folders, and organizations.

User roles

To get the permissions that you need to create a Dataproc cluster, ask your administrator to grant you the appropriate IAM roles on your project.

Service account role

To ensure that the Compute Engine default service account has the necessary permissions to create a Dataproc cluster, ask your administrator to grant the Compute Engine default service account the Dataproc Worker (roles/dataproc.worker) IAM role on the project.

Important: You must grant this role to the Compute Engine default service account, not to your user account. Failure to grant the role to the correct principal might result in permission errors.
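
As an illustration, your administrator could grant this role with the gcloud CLI. The following sketch assumes the default service account follows the standard PROJECT_NUMBER-compute@developer.gserviceaccount.com naming; replace PROJECT_ID and PROJECT_NUMBER with your project's values:

# Grant the Dataproc Worker role to the Compute Engine default service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/dataproc.worker"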

Create a cluster

To create a cluster called example-cluster, run the following gcloud dataproc clusters create command.

A convenient way to run the gcloud command-line tool is from Cloud Shell, which has the Google Cloud CLI pre-installed. Cloud Shell is free for Google Cloud customers. To use Cloud Shell, you need a Google Cloud project.
gcloud dataproc clusters create example-cluster --region=REGION

Replace the following:

REGION: Specify a region where the cluster will be located.
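
For example, the following sketch creates the cluster in the us-central1 region and sets a couple of optional sizing flags; the region, worker count, and machine type shown here are illustrative values, not requirements:

# Create the cluster with an explicit worker count and machine type.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --num-workers=2 \
    --worker-machine-type=n1-standard-4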

Submit a job

To submit a sample Spark job that calculates a rough value for pi, run the following gcloud dataproc jobs submit spark command:

gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=REGION \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000

Notes:

Replace the following:

REGION: Specify the cluster region.

  • The job runs on the example-cluster cluster.
  • The class contains the main method for the application: SparkPi, which calculates an approximate value of pi.
  • The jar file contains the job code.
  • 1000 is a job parameter. It specifies the number of tasks (iterations) the job performs to calculate the value of pi.
  • Parameters passed to the job must follow a double dash (--). For more information, see the Google Cloud CLI documentation.

The job's running and final output is displayed in the terminal window:

Waiting for job output...
...
Pi is roughly 3.14118528
...
Job finished successfully.
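
If you want to inspect the job afterward, you can list jobs on the cluster and describe a specific one. JOB_ID below is a placeholder for the job ID that was printed when you submitted the job:

# List jobs that ran on the cluster, then show details for one job.
gcloud dataproc jobs list --region=REGION --cluster=example-cluster
gcloud dataproc jobs describe JOB_ID --region=REGION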

Update a cluster

To change the number of workers in the cluster to five, run the following command:

gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 5

The command output displays cluster details:

workerConfig:
...
  instanceNames:
  - example-cluster-w-0
  - example-cluster-w-1
  - example-cluster-w-2
  - example-cluster-w-3
  - example-cluster-w-4
  numInstances: 5
statusHistory:
...
- detail: Add 3 workers.

To decrease the number of worker nodes to the original value of 2, run the following command:

gcloud dataproc clusters update example-cluster \
    --region=REGION \
    --num-workers 2
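
To confirm the change, you can view the cluster's configuration, including numInstances under workerConfig; this check is optional:

# Show the cluster's current configuration.
gcloud dataproc clusters describe example-cluster --region=REGION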

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. To delete the example-cluster, run the clusters delete command:

    gcloud dataproc clusters delete example-cluster \
        --region=REGION
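
If you created a project only for this guide, you can instead delete the entire project, which removes the cluster and all other resources in it. Replace PROJECT_ID with the project you created earlier:

# Schedule the project for deletion; all of its resources will be removed.
gcloud projects delete PROJECT_ID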
