Create a Dataproc cluster by using the Google Cloud console
This page shows you how to use the Google Cloud console to create a Dataproc cluster, run a basic Apache Spark job in the cluster, and then modify the number of workers in the cluster.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataproc API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
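If you prefer to enable the API from a script rather than the console, the following is a minimal Python sketch that uses the Service Usage API through the google-api-python-client library. The PROJECT_ID placeholder is an assumption to replace with your own project ID, and the call relies on Application Default Credentials.

from googleapiclient import discovery

# Build a Service Usage API client (authenticates with Application
# Default Credentials).
service = discovery.build("serviceusage", "v1")

# Enable the Dataproc API for the project; this starts a long-running
# operation on the service side.
service.services().enable(
    name="projects/PROJECT_ID/services/dataproc.googleapis.com"
).execute()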
Create a cluster
In the Google Cloud console, go to the Dataproc Clusters page.
Click Create cluster.
In the Create Dataproc cluster dialog, click Create in the Cluster on Compute Engine row.
In the Cluster name field, enter example-cluster.
In the Region and Zone lists, select a region and zone.
Select a region (for example, us-east1 or europe-west1) to isolate resources that Dataproc uses, such as virtual machine (VM) instances and Cloud Storage and metadata storage locations, in that region. For more information, see Available regions and zones and Regional endpoints.
For all the other options, use the default settings.
To create the cluster, click Create.
Your new cluster appears in a list on the Clusters page. The status is Provisioning until the cluster is ready to use, and then the status changes to Running. Provisioning the cluster might take a couple of minutes.
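The console steps above are all this quickstart requires. If you want to script the same step instead, the following is a minimal sketch that uses the google-cloud-dataproc Python client library; the PROJECT_ID placeholder, region, and machine types are assumptions to adjust for your project.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

# Dataproc clients must target the regional endpoint for the cluster's region.
cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A cluster with one master and two workers (the machine types here are
# assumptions, not the console defaults).
cluster = {
    "project_id": project_id,
    "cluster_name": "example-cluster",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}

# create_cluster starts a long-running operation; result() waits until
# the cluster reaches the Running state.
operation = cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(f"Cluster created: {operation.result().cluster_name}")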
Submit a Spark job
Submit a Spark job that estimates a value of Pi:
- In the Dataproc navigation menu, click Jobs.
On the Jobs page, click Submit job, and then do the following:
- In the Job ID field, use the default setting, or provide an ID that is unique to your Google Cloud project.
- In the Cluster drop-down, select example-cluster.
- For Job type, select Spark.
- In the Main class or jar field, enter org.apache.spark.examples.SparkPi.
- In the Jar files field, enter file:///usr/lib/spark/examples/jars/spark-examples.jar.
- In the Arguments field, enter 1000 to set the number of tasks.
Note: The Spark job estimates Pi by using the Monte Carlo method. It generates x and y points on a coordinate plane that models a circle enclosed by a unit square. The input argument (1000) determines the number of x-y pairs to generate; the more pairs generated, the greater the accuracy of the estimation. This estimation uses Dataproc worker nodes to parallelize the computation. For more information, see Estimating Pi using the Monte Carlo Method and JavaSparkPi.java on GitHub. A single-machine sketch of the method follows these steps.
Click Submit.
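For intuition, here is a minimal single-machine Python sketch of the same Monte Carlo method; SparkPi distributes this sampling across tasks on the cluster's worker nodes.

import random

def estimate_pi(num_points):
    # Sample points uniformly in the unit square. The fraction that lands
    # inside the quarter circle of radius 1 approaches pi/4, so multiplying
    # by 4 estimates pi.
    inside = 0
    for _ in range(num_points):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points

print(estimate_pi(1_000_000))  # prints a value near 3.14159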
Your job is displayed on the Job details page. The job status is Starting or Running, and then it changes to Succeeded after the job completes.
To avoid scrolling in the output, click Line wrap: off. The output is similar to the following:
Pi is roughly 3.1416759514167594
To view job details, click the Configuration tab.
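To submit the same job from a script, the following is a minimal sketch that uses the google-cloud-dataproc Python client library; as before, the PROJECT_ID placeholder and region are assumptions to adjust for your project.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# The same job as the console steps: SparkPi from the examples jar, with
# 1000 as the task-count argument.
job = {
    "placement": {"cluster_name": "example-cluster"},
    "spark_job": {
        "main_class": "org.apache.spark.examples.SparkPi",
        "jar_file_uris": ["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        "args": ["1000"],
    },
}

# submit_job_as_operation returns a long-running operation; result()
# waits for the job to finish.
operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()
print(f"Job finished with state: {response.status.state.name}")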
Update a cluster
Update your cluster by changing the number of worker instances:
- In the Dataproc navigation menu, click Clusters.
- In the list of clusters, click example-cluster.
On the Cluster details page, click the Configuration tab.
Your cluster settings are displayed.
Click Edit.
In the Worker nodes field, enter 5.
Click Save.
Your cluster is now updated. To decrease the number of worker nodes to the original value, follow the same procedure.
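To make the same change from a script, the following is a minimal sketch using the google-cloud-dataproc Python client library; the PROJECT_ID placeholder and region are assumptions.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# The update_mask restricts the update to the worker count; all other
# cluster settings are left unchanged.
operation = cluster_client.update_cluster(
    request={
        "project_id": project_id,
        "region": region,
        "cluster_name": "example-cluster",
        "cluster": {"config": {"worker_config": {"num_instances": 5}}},
        "update_mask": {"paths": ["config.worker_config.num_instances"]},
    }
)
operation.result()  # waits until the resize completes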
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- To delete the cluster, on the Cluster details page for example-cluster, click Delete.
- To confirm that you want to delete the cluster, click Delete.
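To delete the cluster from a script instead, a minimal sketch with the google-cloud-dataproc Python client library follows; the PROJECT_ID placeholder and region are assumptions.

from google.cloud import dataproc_v1

project_id = "PROJECT_ID"  # assumption: replace with your project ID
region = "us-east1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# delete_cluster returns a long-running operation; result() waits until
# the cluster and its VMs are gone.
operation = cluster_client.delete_cluster(
    request={
        "project_id": project_id,
        "region": region,
        "cluster_name": "example-cluster",
    }
)
operation.result()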
What's next
- Try this quickstart by using other tools, such as the gcloud CLI or the Cloud Client Libraries.
- Learn how to create robust firewall rules when you create a project.
- Learn how to write and run a Spark Scala job.