Create a Dataproc cluster by using the Google Cloud console
This page shows you how to use the Google Cloud console to create a Dataproc cluster, run a basic Apache Spark job in the cluster, and then modify the number of workers in the cluster.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataproc API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
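If you prefer to enable the API from a script rather than the console, the following is a minimal Python sketch that uses the Service Usage API through the google-api-python-client library. The PROJECT_ID placeholder is an assumption to replace with your own project ID, and the call relies on Application Default Credentials.

from googleapiclient import discovery

# Build a Service Usage API client (authenticates with Application
# Default Credentials).
service = discovery.build("serviceusage", "v1")

# Enable the Dataproc API for the project; this starts a long-running
# operation on the service side.
service.services().enable(
    name="projects/PROJECT_ID/services/dataproc.googleapis.com"
).execute()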
Create a cluster
In the Google Cloud console, go to the Dataproc Clusters page.
Click Create cluster.
In the Create Dataproc cluster dialog, click Create in the Cluster on Compute Engine row.
In the Cluster name field, enter example-cluster.
In the Region and Zone lists, select a region and zone.
Select a region (for example, us-east1 or europe-west1) to isolate resources that Dataproc uses, such as virtual machine (VM) instances and Cloud Storage and metadata storage locations, in that region. For more information, see Available regions and zones and Regional endpoints.
For all the other options, use the default settings.
To create the cluster, click Create.
Your new cluster appears in a list on the Clusters page. The status is Provisioning until the cluster is ready to use, and then the status changes to Running. Provisioning the cluster might take a couple of minutes.
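The console steps above are all this quickstart requires. If you want to script the same step instead, the following is a minimal sketch that uses the google-cloud-dataproc Python client library; the PROJECT_ID placeholder, region, and machine types are assumptions to adjust for your project.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

# Dataproc clients must target the regional endpoint for the cluster's region.
cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A cluster with one master and two workers (the machine types here are
# assumptions, not the console defaults).
cluster = {
    "project_id": project_id,
    "cluster_name": "example-cluster",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}

# create_cluster starts a long-running operation; result() waits until
# the cluster reaches the Running state.
operation = cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(f"Cluster created: {operation.result().cluster_name}")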
Submit a Spark job
Submit a Spark job that estimates a value of Pi:
- In the Dataproc navigation menu, click Jobs.
On the Jobs page, click Submit job, and then do the following:
- In the Job ID field, use the default setting, or provide an ID that is unique to your Google Cloud project.
- In the Cluster drop-down, select example-cluster.
- For Job type, select Spark.
- In the Main class or jar field, enter org.apache.spark.examples.SparkPi.
- In the Jar files field, enter file:///usr/lib/spark/examples/jars/spark-examples.jar.
- In the Arguments field, enter 1000 to set the number of tasks.
Note: The Spark job estimates Pi by using the Monte Carlo method. It generates x and y points on a coordinate plane that models a circle enclosed by a unit square. The input argument (1000) determines the number of x-y pairs to generate; the more pairs generated, the greater the accuracy of the estimation. This estimation uses Dataproc worker nodes to parallelize the computation. For more information, see Estimating Pi using the Monte Carlo Method and JavaSparkPi.java on GitHub. A single-machine sketch of the method follows these steps.
Click Submit.
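For intuition, here is a minimal single-machine Python sketch of the same Monte Carlo method; SparkPi distributes this sampling across tasks on the cluster's worker nodes.

import random

def estimate_pi(num_points):
    # Sample points uniformly in the unit square. The fraction that lands
    # inside the quarter circle of radius 1 approaches pi/4, so multiplying
    # by 4 estimates pi.
    inside = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points

print(estimate_pi(1_000_000))  # prints a value near 3.14159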
Your job is displayed on the Job details page. The job status is Starting or Running, and then it changes to Succeeded after the job completes.
To avoid scrolling in the output, click Line wrap: off. The output is similar to the following:
Pi is roughly 3.1416759514167594
To view job details, click the Configuration tab.
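To submit the same job from a script, the following is a minimal sketch that uses the google-cloud-dataproc Python client library; as before, the PROJECT_ID placeholder and region are assumptions to adjust for your project.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# The same job as the console steps: SparkPi from the examples jar, with
# 1000 as the task-count argument.
job = {
    "placement": {"cluster_name": "example-cluster"},
    "spark_job": {
        "main_class": "org.apache.spark.examples.SparkPi",
        "jar_file_uris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        "args": ["1000"],
    },
}

# submit_job_as_operation returns a long-running operation; result()
# waits for the job to finish.
operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()
print(f"Job finished with state: {response.status.state.name}")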
Update a cluster
Update your cluster by changing the number of worker instances:
- In the Dataproc navigation menu, click Clusters.
- In the list of clusters, click example-cluster.
On the Cluster details page, click the Configuration tab.
Your cluster settings are displayed.
Click Edit.
In the Worker nodes field, enter 5.
Click Save.
Your cluster is now updated. To decrease the number of worker nodes to the original value, follow the same procedure.
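To make the same change from a script, the following is a minimal sketch using the google-cloud-dataproc Python client library; the PROJECT_ID placeholder and region are assumptions.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# The update_mask restricts the update to the worker count; all other
# cluster settings are left unchanged.
operation = cluster_client.update_cluster(
    request={
        "project_id": project_id,
        "region": region,
        "cluster_name": "example-cluster",
        "cluster": {"config": {"worker_config": {"num_instances": 5}}},
        "update_mask": {"paths": ["config.worker_config.num_instances"]},
    }
)
operation.result()  # waits until the resize completes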
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- To delete the cluster, on the Cluster details page for example-cluster, click Delete.
- To confirm that you want to delete the cluster, click Delete.
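To delete the cluster from a script instead, a minimal sketch with the google-cloud-dataproc Python client library follows; the PROJECT_ID placeholder and region are assumptions.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# delete_cluster returns a long-running operation; result() waits until
# the cluster and its VMs are gone.
operation = cluster_client.delete_cluster(
    request={
        "project_id": project_id,
        "region": region,
        "cluster_name": "example-cluster",
    }
)
operation.result()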
What's next
- Try this quickstart by using other tools, such as the gcloud CLI or the Cloud Client Libraries.
- Learn how to create robust firewall rules when you create a project.
- Learn how to write and run a Spark Scala job.