Query Bigtable data

This document describes how to use BigQuery to query data stored in aBigtable external table.For information on how to query data directly from Bigtable,seeGoogleSQL for Bigtableoverview.

Bigtable is Google's sparsely populated NoSQLdatabase that can scale to billions of rows, thousands of columns, andpetabytes of data. For information on the Bigtable data model,seeStorage model.

Query permanent external tables

Before you begin, you or an administrator in your organization must create anexternal table for you to use. For details and required permissions, seeCreatea BigQuery externaltable.

Required roles

To query Bigtable external tables, ensureyou have the following roles.

  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery User (roles/bigquery.user)
  • Bigtable Reader (roles/bigtable.reader)

Depending on your permissions, you cangrant these roles to yourself or ask your administratorto grant them to you. For more information about granting roles, seeViewing the grantable roles on resources.

To see the exact BigQuery permissions that are required to queryexternal tables, expand theRequired permissions section:

Required permissions

You might also be able to get these permissions withcustom rolesor otherpredefined roles.

Query the table

You can run a query against a permanent external Bigtable tableexactly as if it were astandard BigQuery table,subject to thelimitationson external data sources. For more information, seeRun interactive and batchqueries.

Query temporary external tables

Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

To query an external data source without creating a permanent table, you provide a table definition for the temporary table, and then use that table definition in a command or call to query the temporary table. You can provide the table definition in any of the following ways:

The table definition file or supplied schema is used to create the temporary external table,and the query runs against the temporary external table.

When you use a temporary external table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others.

Using a temporary external table instead of a permanent externaltable has some limitations, including the following:

  • You must have the Bigtable Admin (roles/bigtable.admin) role.
  • This approach does not let you use the Google Cloud console to infer theschema of the Bigtable table and automatically create the tabledefinition. You must create the table definition yourself.

Required roles

To query Bigtable temporary external tables, ensureyou have the following roles:

  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery User (roles/bigquery.user)
  • Bigtable Admin (roles/bigtable.admin)

Depending on your permissions, you cangrant these roles to yourself or ask your administratorto grant them to you. For more information about granting roles, seeViewing the grantable roles on resources.

To see the exact BigQuery permissions that are required to queryexternal tables, expand theRequired permissions section:

Required permissions

You might also be able to get these permissions withcustom rolesor otherpredefined roles.

Create and query the table

To query Bigtable data using a temporary external table, you:

Creating and querying a temporary external table is supported by thebq command-line tool and the API.

bq

To query a temporary table using a table definition file, enter thebq query command with the--external_table_definition flag.

(Optional) Supply the--location flag and set the value to yourlocation.

bq--location=LOCATIONquery\--use_legacy_sql=false\--external_table_definition=TABLE::DEFINITION_FILE\'QUERY'

Replace the following:

  • LOCATION: the name of yourlocation.The--location flag is optional.
  • TABLE: the name of the temporary table you're creating.
  • DEFINITION_FILE: the path to thetable definition fileon your local machine.
  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary tablenamedfollows using a table definition file namedfollows_def.

bq query \--use_legacy_sql=false \--external_table_definition=follows::/tmp/follows_def \'SELECT  COUNT(rowkey) FROM   follows'

API

  • Create a query. SeeQuerying data forinformation about creating a query job.

  • (Optional) Specify your location in thelocation property in thejobReference section of thejob resource.

  • Specify the external data source properties by setting theExternalDataConfiguration for thetable resource.

Java

Before trying this sample, follow theJava setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQueryJava API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

importcom.google.cloud.bigquery.BigQuery;importcom.google.cloud.bigquery.BigQueryException;importcom.google.cloud.bigquery.BigQueryOptions;importcom.google.cloud.bigquery.BigtableColumn;importcom.google.cloud.bigquery.BigtableColumnFamily;importcom.google.cloud.bigquery.BigtableOptions;importcom.google.cloud.bigquery.ExternalTableDefinition;importcom.google.cloud.bigquery.QueryJobConfiguration;importcom.google.cloud.bigquery.TableResult;importcom.google.common.collect.ImmutableList;importorg.apache.commons.codec.binary.Base64;// Sample to queries an external bigtable data source using a temporary tablepublicclassQueryExternalBigtableTemp{publicstaticvoidmain(String[]args){// TODO(developer): Replace these variables before running the sample.StringprojectId="MY_PROJECT_ID";StringbigtableInstanceId="MY_INSTANCE_ID";StringbigtableTableName="MY_BIGTABLE_NAME";StringbigqueryTableName="MY_TABLE_NAME";StringsourceUri=String.format("https://googleapis.com/bigtable/projects/%s/instances/%s/tables/%s",projectId,bigtableInstanceId,bigtableTableName);Stringquery=String.format("SELECT * FROM %s ",bigqueryTableName);queryExternalBigtableTemp(bigqueryTableName,sourceUri,query);}publicstaticvoidqueryExternalBigtableTemp(StringtableName,StringsourceUri,Stringquery){try{// Initialize client that will be used to send requests. This client only needs to be created// once, and can be reused for multiple requests.BigQuerybigquery=BigQueryOptions.getDefaultInstance().getService();BigtableColumnFamily.BuilderstatsSummary=BigtableColumnFamily.newBuilder();// Configuring ColumnsBigtableColumnconnectedCell=BigtableColumn.newBuilder().setQualifierEncoded(Base64.encodeBase64String("connected_cell".getBytes())).setFieldName("connected_cell").setType("STRING").setEncoding("TEXT").build();BigtableColumnconnectedWifi=BigtableColumn.newBuilder().setQualifierEncoded(Base64.encodeBase64String("connected_wifi".getBytes())).setFieldName("connected_wifi").setType("STRING").setEncoding("TEXT").build();BigtableColumnosBuild=BigtableColumn.newBuilder().setQualifierEncoded(Base64.encodeBase64String("os_build".getBytes())).setFieldName("os_build").setType("STRING").setEncoding("TEXT").build();// Configuring column family and columnsstatsSummary.setColumns(ImmutableList.of(connectedCell,connectedWifi,osBuild)).setFamilyID("stats_summary").setOnlyReadLatest(true).setEncoding("TEXT").setType("STRING").build();// Configuring BigtableOptions is optional.BigtableOptionsoptions=BigtableOptions.newBuilder().setIgnoreUnspecifiedColumnFamilies(true).setReadRowkeyAsString(true).setColumnFamilies(ImmutableList.of(statsSummary.build())).build();// Configure the external data source and query job.ExternalTableDefinitionexternalTable=ExternalTableDefinition.newBuilder(sourceUri,options).build();QueryJobConfigurationqueryConfig=QueryJobConfiguration.newBuilder(query).addTableDefinition(tableName,externalTable).build();// Example queryTableResultresults=bigquery.query(queryConfig);results.iterateAll().forEach(row->row.forEach(val->System.out.printf("%s,",val.toString())));System.out.println("Query on external temporary table performed successfully.");}catch(BigQueryException|InterruptedExceptione){System.out.println("Query not performed \n"+e.toString());}}}

Performance considerations

The performance of queries against Bigtable external data sourcesdepends on three factors:

  • The number of rows
  • The amount of data read
  • The extent of parallelization

BigQuery tries to read as little data as possible by only readingthe column families that are referenced in the query. The extent ofparallelization depends on how many nodes you have in yourBigtable cluster and how many splits you have for your table.

Note that Bigtable auto-merges splits based on load. If yourtable is not being read frequently, there will be fewer splits over time anda gradual degradation in query performance. For more information, seeHowBigQuery optimizes your data overtime.

Compute

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of theService Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.

When you query your Bigtable data from BigQuery,you have the following options for compute:

  • Cluster nodes, which is the default.
  • Data Boost (Preview), a serverlesscompute option that lets you isolate your analytics traffic withoutimpacting the application serving-traffic that your clusters' nodes arehandling.

To use Data Boost, you or your administrator must create a definition file thatspecifies a Data Boost app profile in the Bigtable URI. For moreinformation, seeCreate a Bigtable externaltable.

If you don't use Data Boost, be aware that querying Bigtable fromBigQuery consumes Bigtable CPU cycles. CPUconsumption by BigQuery when using provisioned nodes for computemight affect latency and throughput for other concurrent requests such as liveuser traffic serving. For example, high CPU usage on Bigtableaffects long-tail queries and increases latency at the 99th percentile.

As a result, you should monitor Bigtable CPU usage to verify thatyou're within the recommended bounds as noted on the Bigtablemonitoring dashboard in the Google Cloud console. Increasing the number ofnodes for your instance lets you handle both BigQuery traffic andtraffic from other concurrent requests.

Query filters

You can add query filters when querying an external tableto reduce BigQuery resource usage.

Row key filter

Queries with a row key equality filter only read that specific row. For example, inGoogleSQL syntax:

SELECTCOUNT(follows.column.name)FROM`dataset.table`WHERErowkey="alice";

Range filters such asrowkey > '1' androwkey < '8' are also supported, butonly when rowkey is read as a string with thereadRowkeyAsString option.

Note: IfreadRowkeyAsString is set totrue, then the rowkey column familiesare read and converted to strings. Otherwise they are read with BYTES typevalues.

Filter by column family and qualifier

You can also select a specific column family or a specific qualifier within a column family.To filter by column family, select the column family name, and the result includes only the selected column family. In the following example,user_info represents a column family:

SELECTrowkeyASuser_id,user_infoFROMproject.dataset.table;
To filter by a specific qualifier, you must first declare them in"columns" in the external table definition:
CREATEORREPLACEEXTERNALTABLEproject.dataset.tableOPTIONS(format='CLOUD_BIGTABLE',uris=['https://googleapis.com/bigtable/projects/…/instances/…/tables/…'],bigtable_options='''{  "columnFamilies": [    {      "familyId": "user_info",      "columns": [        {          "qualifierString": "name"        },        {          "qualifierString": "email"        },        {          "qualifierString": "registered_at"        }      ]    },    {      "familyId": "session_data"    }  ],  "readRowkeyAsString": true,  "timestampSuffix": "_ts"}''');

After the external table is created, use aSELECT statement to query a specific qualifier.This ensures that BigQuery pushes down the filter to Bigtable and only loads the specified qualifiers when running aSELECT statement from BigQuery, not the entire column family's data. This reduces BigQuery resource consumption.

SELECTrowkeyASuser_id,user_info.email.cell[SAFE_OFFSET(0)].valueasemailFROMproject.dataset.table;

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.