Run federated queries with Data Boost

This page explains how to use Spanner Data Boost when you run federated queriesfrom BigQuery to a Spanner database. WithData Boost, federated queries run with minimal impact to existingworkloads on the provisioned Spanner instance. TheData Boost queries from BigQuery to aSpanner database can join BigQuery data withSpanner data.

Spanner federation lets BigQuery query dataresiding in Spanner in real time, without copying or moving data.To learn more about Spanner federated queries, seeSpanner federated queries.To learn about Data Boost, seeData Boostoverview.

Before you begin

Before you can run federated queries with Data Boost, you need tocomplete the following tasks:

Create a Spanner instance and database

If you don't have a Spanner instance and database, follow thesteps inCreate and query a database using the Google Cloud console to create them.

Enable the BigQuery connection API

The BigQuery connection API lets you manageBigQuery connections to external data sources such as aSpanner database.

  • Enable the BigQuery connection API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enable permission.Learn how to grant roles.

    Enable the API

For more information, seeBigQuery connectionAPI in theBigQuery documentation.

Grant IAM permissions for Data Boost to principals

A principal must be granted the following permissions to run federated querieswith Data Boost:

  • spanner.instances.get - lets you get the configuration of an instance.
  • spanner.databases.useDataBoost - lets you use theSpanner Data Boost computeresources to processpartitioned queries.

For more information about Spanner permissions, seeIdentity and Access Management(IAM) permissions.

To grant these required permissions, we recommend that you use theCloudSpanner Database Reader With DataBoost(roles/spanner.databaseReaderWithDataBoost) IAM role. You canadd that role to any principal that needs to run federated querieswith Data Boost. To learn more about predefined roles inSpanner, seePredefined roles. To learnhow to create a custom IAM role, seeCreate a customrole.

Run a federated Data Boost query

To run a Data Boost query from BigQuery to anexternal source, you need a BigQuery connection to the externalsource and the ID of the connection. When you run a federatedSpanner query with Data Boost, the external sourceis a Spanner database. After you create your connection ID, it'sused by BigQuery to run a Data Boostquery of a Spanner database.

Use one of the following options to create a BigQuery connectionID, and then use the connection ID to run a Data Boost query fromBigQuery:

  1. Start in Spanner - Create theBigQuery external connection ID in the Spannerconsole. After your connection ID is created in the Spannerconsole, you're redirected to the BigQuery console to run afederated Data Boost query to a Spannerdatabase.

  2. Start in BigQuery - Create theData Boost external connection ID in theBigQuery console or using thebq command-line tool. Afteryou create the connection ID, you stay in the BigQueryconsole to run a federated Data Boost query to aSpanner database.

Start in Spanner to run a Data Boost query

To run a federated Data Boost query starting in theSpanner Studio, do the following:

  1. Go to the SpannerInstances page in theGoogle Cloud console.

    Go to the Instances page

    The console shows a list of your Spannerinstances.

  2. Select a Spanner instance, and then select a database.

  3. On theDatabase overview page, in the navigation menu, clickSpanner Studio.

  4. ClickView in BiqQuery.

  5. In theView in BigQuery dialog, enter a connection ID.

    The connection ID is used to create a new BigQuery externalconnection to your Spanner database. You reference yourexternal connection using the following pattern:

    PROJECT-ID.LOCATION.CONNECTION-ID

    An error occurs if the ID already exists.

  6. Fill in the rest of the dialog and do the following:

    • SelectRead data in parallel.
    • SelectUse Spanner Data Boost.
  7. ClickView in BigQuery.

    BigQuery Studio opens with the following query:

    SELECT*FROMEXTERNAL_QUERY("PROJECT-ID.LOCATION.CONNECTION-ID","SELECT * FROM INFORMATION_SCHEMA.TABLES;");

    You can replace this with your federated query. For example, you might makea query that's similar to the following example. This example makes afederated query from a table namedorders in a Spannerdatabase and joins the results with a BigQuery table namedmydataset.customers.

    SELECTc.customer_id,c.name,rq.first_order_dateFROMmydataset.customersAScLEFTOUTERJOINEXTERNAL_QUERY('my-project.us.example-db','''SELECT customer_id, MIN(order_date) AS first_order_date  FROM orders  GROUP BY customer_id''')ASrqONrq.customer_id=c.customer_idGROUPBYc.customer_id,c.name,rq.first_order_date;

Start in BigQuery to run a Data Boost query

To create an external data connection from BigQuery to aSpanner database and use that connection to run a federatedData Boost query from BigQuery, select one of thefollowing options:

Console

  1. Go toCreate Spanner connectionsin the BigQuery documentation and follow the instructions ontheConsole tab.

  2. In theExternal data source pane, do the following:

    • SelectRead data in parallel.
    • SelectUse Spanner Data Boost.

bq

  1. Go toCreate Spanner connectionsin the BigQuery documentation and follow instructions on thebq* tab.

  2. Set the following connection properties totrue:

    • useParallelism
    • useDataBoost

The following example uses thebqmk command to create anew connection namedmy_connection with the two required properties forData Boost:

bqmk--connection--connection_type='CLOUD_SPANNER'--location='us'\--properties='{"database":"projects/my-project/instances/my-instance/databases/my-database", "useParallelism":true, "useDataBoost": true}'my_connection

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.