Connect to Apache Spark
As a BigQuery administrator, you can create a connection to enable data analysts to run stored procedures for Apache Spark.
Before you begin
Enable the BigQuery Connection API.
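If you use the Google Cloud CLI, you can enable the API from the command line, for example (the project ID is a placeholder):

```bash
# Enable the BigQuery Connection API in your project (placeholder project ID).
gcloud services enable bigqueryconnection.googleapis.com --project=my-project-id
```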
To get the permissions that you need to create a Spark connection, ask your administrator to grant you the BigQuery Connection Admin (roles/bigquery.connectionAdmin) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations. You might also be able to get the required permissions through custom roles or other predefined roles.
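As a sketch, an administrator could grant this role from the command line; the project ID and user email below are placeholders:

```bash
# Grant the BigQuery Connection Admin role to a user (placeholder values).
gcloud projects add-iam-policy-binding my-project-id \
    --member="user:analyst@example.com" \
    --role="roles/bigquery.connectionAdmin"
```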
- Optional: To manage your metadata using Dataproc Metastore, ensure that you have created a Dataproc Metastore service.
- Optional: To view job history using Spark History Server web interfaces, ensure that you have created a Dataproc Persistent History Server (PHS).
Location considerations
When you choose a location for your data, consider the following:
Multi-regions
You must specify Google Cloud resources located in the same large geographic area:

- A connection in the BigQuery US multi-region can reference a Spark History Server or a Dataproc Metastore in any single region in the US geographic area, such as us-central1, us-east4, or us-west2.
- A connection in the BigQuery EU multi-region can reference a Spark History Server or a Dataproc Metastore in member states of the European Union, such as europe-north1 or europe-west3.
Single regions
A connection in a single region can only reference Google Cloud resources in the same region. For example, a connection in the single region us-east4 can only reference a Spark History Server or a Dataproc Metastore in us-east4.
Create connections
Select one of the following options:
Console
1. Go to the BigQuery page.
2. In the Explorer pane, click Add data.
   The Add data dialog opens.
3. In the Filter By pane, in the Data Source Type section, select Business Applications.
   Alternatively, in the Search for data sources field, you can enter Spark.
4. In the Featured data sources section, click Apache Spark.
5. Click the Apache Spark: BigQuery Federation solution card.
6. In the External data source pane, enter the following information:
   - In the Connection type list, select Apache Spark.
   - In the Connection ID field, enter a name for your connection, for example, spark_connection.
   - In the Data location list, select a region. You can create a connection in regions and multi-regions that support BigQuery. For more information, see Location considerations.
   - Optional: From the Metastore service list, select a Dataproc Metastore.
   - Optional: In the History server cluster field, enter a Dataproc Persistent History Server.
7. Click Create connection.
8. Click Go to connection.
9. In the Connection info pane, copy the service account ID for use in a following step.
bq
In a command-line environment, use the bq mk command to create a connection:

```
bq mk --connection --connection_type='SPARK' \
    --properties=PROPERTIES \
    --project_id=PROJECT_ID \
    --location=LOCATION \
    CONNECTION_ID
```

Replace the following:

- PROPERTIES: a key-value pair to provide connection-specific parameters in JSON format. For example:

  ```
  --properties='{"metastoreServiceConfig": {"metastoreService": "METASTORE_SERVICE_NAME"}, "sparkHistoryServerConfig": {"dataprocCluster": "DATAPROC_CLUSTER_NAME"}}'
  ```

  Replace the following:

  - METASTORE_SERVICE_NAME: the Dataproc Metastore with a gRPC network configuration, for example, projects/my-project-id/locations/us-central1/services/my-service. For more information, see how to access the stored Hive metastore metadata using an endpoint protocol.
  - DATAPROC_CLUSTER_NAME: the Spark History Server configuration, for example, projects/my-project-id/regions/us-central1/clusters/my-cluster. For more information, see Create a Persistent History Server cluster.
- PROJECT_ID: your Google Cloud project ID.
- LOCATION: the location where you want to store the connection, for example, US.
- CONNECTION_ID: the connection ID, for example, myconnection. When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example, projects/.../locations/.../connections/myconnection.
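Assembled with sample values, a complete invocation might look like the following sketch; the project, metastore service, cluster, and connection names are placeholders:

```bash
# Create a Spark connection named myconnection in the US multi-region
# (all resource names below are placeholders).
bq mk --connection --connection_type='SPARK' \
    --properties='{"metastoreServiceConfig": {"metastoreService": "projects/my-project-id/locations/us-central1/services/my-service"}, "sparkHistoryServerConfig": {"dataprocCluster": "projects/my-project-id/regions/us-central1/clusters/my-cluster"}}' \
    --project_id=my-project-id \
    --location=US \
    myconnection
```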
Retrieve and copy the service account ID because you need it in another step:

```
bq show --location=LOCATION --connection PROJECT_ID.LOCATION.CONNECTION_ID
```

The output is similar to the following:

```
          Connection myproject.us.myconnection

            name             type                          properties
  --------------------------- ------- ---------------------------------------------------
   myproject.us.myconnection   SPARK   {"serviceAccountId": "bqserver@example.iam.gserviceaccount.com"}
```
For information about how to manage connections, see Manage connections.
Grant access to the service account
To let a stored procedure for Apache Spark access your Google Cloud resources, you need to grant the service account that's associated with the stored procedure's connection the necessary IAM permissions. Alternatively, you can use your custom service account for data access.
To read and write data from and to BigQuery, you need to give the service account the following IAM permissions:
- bigquery.tables.* on your BigQuery tables
- bigquery.readsessions.* on your project
The roles/bigquery.admin IAM role includes the permissions that the service account needs in order to read and write data from and to BigQuery.

Note: If your stored procedure writes data to a temporary Cloud Storage bucket and then loads the Cloud Storage data to BigQuery, then you need to give the service account the bigquery.jobs.create permission on your project. For more information about IAM roles and permissions in BigQuery, see Access control with IAM.

To read and write data from and to Cloud Storage, you need to give the service account the storage.objects.* permission on your Cloud Storage objects. The roles/storage.objectAdmin IAM role includes the permissions that the service account needs in order to read and write data from and to Cloud Storage.

If you specify Dataproc Metastore when you create a connection, then for BigQuery to retrieve details about the metastore configuration, you need to give the service account the metastore.services.get permission on your Dataproc Metastore. The predefined roles/metastore.metadataViewer role includes the permission that the service account needs in order to retrieve details about the metastore configuration.

You also need to grant the service account the roles/storage.objectAdmin role on the Cloud Storage bucket so that your stored procedure can access the Hive warehouse directory of your Dataproc Metastore (hive.metastore.warehouse.dir). If your stored procedure performs operations on the metastore, you might need to give additional permissions. For more information about IAM roles and permissions in Dataproc Metastore, see Dataproc Metastore predefined roles and permissions.

If you specify a Dataproc Persistent History Server when you create a connection, then you need to grant the service account the following roles:
- The roles/dataproc.viewer role on your Dataproc Persistent History Server, which contains the dataproc.clusters.get permission.
- The roles/storage.objectAdmin role on the Cloud Storage bucket that you specify for the spark:spark.history.fs.logDirectory property when you create the Dataproc Persistent History Server.
For more information, see Dataproc Persistent History Server and Dataproc roles and permissions.
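As a sketch of how an administrator might apply the BigQuery and Cloud Storage grants with the Google Cloud CLI, assuming the service account ID copied from the connection details and placeholder project and bucket names:

```bash
# Service account ID copied from the connection details (placeholder).
SA="bqserver@example.iam.gserviceaccount.com"

# Let the service account read and write BigQuery data (placeholder project).
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:${SA}" \
    --role="roles/bigquery.admin"

# Let the service account read and write objects in a bucket (placeholder bucket).
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
    --member="serviceAccount:${SA}" \
    --role="roles/storage.objectAdmin"
```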
Share connections with users
You can grant the following roles to let users query data and manage connections:
- roles/bigquery.connectionUser: enables users to use connections to connect with external data sources and run queries on them.
- roles/bigquery.connectionAdmin: enables users to manage connections.
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
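For example, granting roles/bigquery.connectionUser at the project level lets a user use every connection in the project; to scope access to a single connection, share it through the console or the API instead, as described below. A minimal sketch with placeholder project and user values:

```bash
# Grant project-wide permission to use connections (placeholder values).
gcloud projects add-iam-policy-binding my-project-id \
    --member="user:analyst@example.com" \
    --role="roles/bigquery.connectionUser"
```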
Select one of the following options:
Console
1. Go to the BigQuery page.
   Connections are listed in your project, in a group called Connections.
2. In the left pane, click Explorer. If you don't see the left pane, click Expand left pane to open the pane.
3. Click your project, click Connections, and then select a connection.
4. In the Details pane, click Share to share a connection. Then do the following:
   1. In the Connection permissions dialog, share the connection with other principals by adding or editing principals.
   2. Click Save.
bq
You cannot share a connection with the bq command-line tool. To share a connection, use the Google Cloud console or the BigQuery Connections API.
API
Use the projects.locations.connections.setIamPolicy method in the BigQuery Connections REST API reference section, and supply an instance of the policy resource.
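As an illustrative sketch, a REST call might look like the following; the project, location, connection ID, and user email are placeholders. Note that setIamPolicy replaces the connection's existing policy with the one you supply:

```bash
# Set the IAM policy on a connection (placeholder values).
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
      "policy": {
        "bindings": [
          {
            "role": "roles/bigquery.connectionUser",
            "members": ["user:analyst@example.com"]
          }
        ]
      }
    }' \
    "https://bigqueryconnection.googleapis.com/v1/projects/my-project-id/locations/us/connections/myconnection:setIamPolicy"
```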
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```java
import com.google.api.resourcenames.ResourceName;
import com.google.cloud.bigquery.connection.v1.ConnectionName;
import com.google.cloud.bigqueryconnection.v1.ConnectionServiceClient;
import com.google.iam.v1.Binding;
import com.google.iam.v1.Policy;
import com.google.iam.v1.SetIamPolicyRequest;
import java.io.IOException;

// Sample to share connections
public class ShareConnection {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String location = "MY_LOCATION";
    String connectionId = "MY_CONNECTION_ID";
    shareConnection(projectId, location, connectionId);
  }

  static void shareConnection(String projectId, String location, String connectionId)
      throws IOException {
    try (ConnectionServiceClient client = ConnectionServiceClient.create()) {
      ResourceName resource = ConnectionName.of(projectId, location, connectionId);
      // Grant the Connection User role to a group so that its members can
      // run queries that use this connection.
      Binding binding =
          Binding.newBuilder()
              .addMembers("group:example-analyst-group@google.com")
              .setRole("roles/bigquery.connectionUser")
              .build();
      // Note: setIamPolicy replaces the connection's entire IAM policy with
      // the policy that you supply here.
      Policy policy = Policy.newBuilder().addBindings(binding).build();
      SetIamPolicyRequest request =
          SetIamPolicyRequest.newBuilder()
              .setResource(resource.toString())
              .setPolicy(policy)
              .build();
      client.setIamPolicy(request);
      System.out.println("Connection shared successfully");
    }
  }
}
```

What's next
- Learn about different connection types.
- Learn about managing connections.
- Learn how to create a stored procedure for Apache Spark.
- Learn how to manage stored procedures.