Dataproc client libraries

This page shows how to get started with the Cloud Client Libraries for the Dataproc API. Client libraries make it easier to access Google Cloud APIs from a supported language. Although you can use Google Cloud APIs directly by making raw requests to the server, client libraries provide simplifications that significantly reduce the amount of code you need to write.
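
To make the contrast with raw requests concrete, here is a minimal sketch of what listing clusters without a client library might involve. It only builds the HTTP request and parses a response body; it sends nothing. The function names and the placeholder token are illustrative assumptions, and a real caller would also have to handle token refresh, pagination, and retries, which the client libraries take care of.

```python
import json
import urllib.request


def build_list_clusters_request(
    project_id: str, region: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a raw REST request that lists Dataproc clusters.

    access_token is assumed to be an OAuth 2.0 access token obtained elsewhere.
    """
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def parse_cluster_names(response_body: bytes) -> list[str]:
    """Extract cluster names from a clusters.list JSON response body."""
    payload = json.loads(response_body)
    # The "clusters" key is absent when the project has no clusters in the region.
    return [cluster["clusterName"] for cluster in payload.get("clusters", [])]
```

With a client library, the equivalent work collapses to constructing a client and calling a single list method, as the samples later on this page show.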

However, we recommend using the older Google API Client Libraries if running on App Engine standard environment. Read more about the Cloud Client Libraries and the older Google API Client Libraries in Client libraries explained.

Dataproc Cloud Client Libraries may be in alpha or beta stage. See the library reference for details.

Install the client library

C++

See Setting up a C++ development environment for details about this client library's requirements and install dependencies.

C#

Also see Google Cloud Libraries for .NET.

For more information, see Setting Up a C# Development Environment.

Go

go get cloud.google.com/go/dataproc/apiv1

For more information, see Install the Cloud Client Libraries for Go.

For more information, see Setting Up a Go Development Environment.

Java

If you are using Maven, add this to your pom.xml file:

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-dataproc</artifactId>
  <version>insert dataproc-library-version here</version>
</dependency>

If you are using Gradle, add this to your dependencies:

implementation group: 'com.google.cloud', name: 'google-cloud-dataproc', version: 'insert dataproc-library-version here'

For more information, see Setting Up a Java Development Environment.

Node.js

npm install --save @google-cloud/dataproc

For more information, see Setting Up a Node.js Development Environment.

PHP

composer require google/cloud

For more information, see Using PHP on Google Cloud.

Python

pip install --upgrade google-cloud-dataproc

For more information, see Setting Up a Python Development Environment.
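
After running the pip command, you may want to confirm that the package landed in the interpreter you actually use. The sketch below relies only on the standard library's importlib.metadata; the helper name installed_version is an illustrative choice, not part of the client library.

```python
from importlib import metadata


def installed_version(distribution: str = "google-cloud-dataproc"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("google-cloud-dataproc is not installed in this environment")
    else:
        print(f"google-cloud-dataproc {version} is installed")
```

A None result usually means pip installed into a different environment than the one running the check, for example a virtual environment that is not activated.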

Ruby

gem install google-cloud-dataproc

For more information, see Setting Up a Ruby Development Environment.

Set up authentication

To authenticate calls to Google Cloud APIs, client libraries support Application Default Credentials (ADC); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API. With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code.
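
The "set of defined locations" can be pictured as an ordered search. The sketch below is a simplified approximation of that order, not the libraries' actual implementation: it checks the GOOGLE_APPLICATION_CREDENTIALS environment variable, then the local ADC file that the gcloud CLI writes, and otherwise assumes an attached service account. The function names are hypothetical, and overrides such as a custom gcloud configuration directory are ignored.

```python
import os
from pathlib import Path


def well_known_adc_path(env, home: Path) -> Path:
    """Return where the gcloud CLI stores local ADC user credentials."""
    if os.name == "nt" and env.get("APPDATA"):
        base = Path(env["APPDATA"])
    else:
        base = home / ".config"
    return base / "gcloud" / "application_default_credentials.json"


def describe_adc_source(env=None, home=None) -> str:
    """Report which credential source a simplified ADC search would pick."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. An explicit credential file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"GOOGLE_APPLICATION_CREDENTIALS -> {explicit}"

    # 2. The local file created by `gcloud auth application-default login`.
    local = well_known_adc_path(env, home)
    if local.is_file():
        return f"local ADC file -> {local}"

    # 3. Otherwise the libraries ask the metadata server for the attached
    #    service account, which only works when running on Google Cloud.
    return "attached service account (metadata server), if running on Google Cloud"
```

Calling describe_adc_source() with no arguments inspects the current process environment, which can help explain why the same code authenticates as different identities on a laptop and in production.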

For production environments, the way you set up ADC depends on the service and context. For more information, see Set up Application Default Credentials.

For a local development environment, you can set up ADC with the credentials that are associated with your Google Account:

  1. Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:

    gcloud init

    If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  2. If you're using a local shell, then create local authentication credentials for your user account:

    gcloud auth application-default login

    You don't need to do this if you're using Cloud Shell.

    If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

    A sign-in screen appears. After you sign in, your credentials are stored in the local credential file used by ADC.
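
Once step 2 finishes, a quick way to confirm that the credential file exists is to read its "type" field. This is a minimal sketch: the helper names are hypothetical, the default path shown is the Linux and macOS location (Windows stores the file under %APPDATA%\gcloud), and the expectation that a Google Account sign-in produces the type "authorized_user" is stated from general knowledge of the file format rather than from this page.

```python
import json
from pathlib import Path


def default_adc_path() -> Path:
    """Linux/macOS location of the local ADC file (assumed default config dir)."""
    return Path.home() / ".config" / "gcloud" / "application_default_credentials.json"


def adc_credential_type(path) -> str:
    """Return the "type" field of an ADC JSON file, or "missing" / "unknown"."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    data = json.loads(path.read_text(encoding="utf-8"))
    return data.get("type", "unknown")
```

For example, adc_credential_type(default_adc_path()) returning "missing" suggests that the login command has not been run for this user, or that the file lives in a different configuration directory.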

Use the client library

The following example shows how to use the client library.

C++

#include "google/cloud/dataproc/v1/cluster_controller_client.h"
#include "google/cloud/common_options.h"
#include <iostream>

int main(int argc, char* argv[]) try {
  if (argc != 3) {
    std::cerr << "Usage: " << argv[0] << " project-id region\n";
    return 1;
  }

  std::string const project_id = argv[1];
  std::string const region = argv[2];

  namespace dataproc = ::google::cloud::dataproc_v1;
  auto client = dataproc::ClusterControllerClient(
      dataproc::MakeClusterControllerConnection(region == "global" ? "" : region));

  for (auto c : client.ListClusters(project_id, region)) {
    if (!c) throw std::move(c).status();
    std::cout << c->cluster_name() << "\n";
  }

  return 0;
} catch (google::cloud::Status const& status) {
  std::cerr << "google::cloud::Status thrown: " << status << "\n";
  return 1;
}

Go

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/apiv1"
	"cloud.google.com/go/dataproc/apiv1/dataprocpb"
	"google.golang.org/api/option"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %w", err)
	}
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-2",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-2",
				},
			},
		},
	}

	// Create the cluster.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %w", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s", resp.ClusterName)
	return nil
}

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();

      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}

Node.js

Also see the Dataproc Cloud Client Library Quickstarts.

const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();

Python

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
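
The Go, Java, Node.js, and Python samples above each build a regional endpoint string by hand, and the C++ sample special-cases the "global" region. A small helper can keep that logic in one place. This is a sketch: dataproc_endpoint is a hypothetical name, and mapping "global" to the unprefixed host dataproc.googleapis.com is an assumption based on the C++ sample passing an empty location for that region.

```python
def dataproc_endpoint(region: str, port: int = 443) -> str:
    """Build the API endpoint for a Dataproc region, for example us-central1.

    Assumption: the "global" region uses the unprefixed service host.
    """
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    if region == "global":
        host = "dataproc.googleapis.com"
    else:
        host = f"{region}-dataproc.googleapis.com"
    return f"{host}:{port}"
```

In the Python sample, the result could be passed as client_options={"api_endpoint": dataproc_endpoint(region)} in place of the inline f-string.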

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.