Create a Bigtable external table
This page describes how to create a BigQuery permanent external table that can be used to query data stored in Bigtable. Querying data in Bigtable is available in all Bigtable locations.
Before you begin
Before you create an external table, gather some information and make sure you have permission to create the table.
Required roles
To create an external table to use to query your Bigtable data, you must be a principal in the Bigtable Admin (roles/bigtable.admin) role for the instance that contains the source table.
You also need the bigquery.tables.create BigQuery Identity and Access Management (IAM) permission.
Each of the following predefined Identity and Access Management roles includes this permission:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Data Owner (roles/bigquery.dataOwner)
- BigQuery Admin (roles/bigquery.admin)
If you are not a principal in any of these roles, ask your administrator to grant you access or to create the external table for you.
For more information on Identity and Access Management roles and permissions in BigQuery, see Predefined roles and permissions. To view information on Bigtable permissions, see Access control with Identity and Access Management. To view the roles required to query the external table, see Query Bigtable data.
Create or identify a dataset
Before you create an external table, you must create a dataset to contain the external table. You can also use an existing dataset.
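For example, a minimal way to create a dataset with the bq command-line tool is shown below. The dataset name mydataset is a placeholder, and you can equally create the dataset in the Google Cloud console or with SQL:

bq mk --dataset mydataset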
Plan your compute usage
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Determine the type of compute you want to use when you query your data. You specify that you want to use Data Boost or that you want to route to a dedicated cluster in your app profile settings.
Data Boost
To avoid impacting your application serving traffic, you can use Data Boost serverless compute when you use a BigQuery external table to read your Bigtable data. To use Data Boost, you must use a Data Boost app profile and include the app profile ID when you compose your Bigtable URI. For more information about Data Boost, see Bigtable Data Boost overview.
Provisioned nodes
If you don't use Data Boost, cluster nodes are used for compute.
If you don't use Data Boost and you plan to frequently query the same data that serves your production application, we recommend that you designate a cluster in your Bigtable instance to be used solely for BigQuery analysis. This isolates the traffic from the cluster or clusters that you use for your application's reads and writes. To learn more about replication and creating instances that have more than one cluster, see About replication.
Identify or create an app profile
Before you create an external table, decide which Bigtable app profile BigQuery should use to read the data. We recommend that you use an app profile that you designate for use only with BigQuery. The app profile can be either a standard app profile or a Data Boost app profile, depending on the type of compute that you want to use to query your data.
If you have a cluster in your Bigtable instance that is dedicated to BigQuery access, configure the app profile to use single-cluster routing to that cluster.
To use Data Boost serverless compute, create a Data Boost app profile. To use cluster nodes for compute, create a standard app profile. To learn how Bigtable app profiles work, see About app profiles. To see how to create a new app profile, see Create and configure app profiles.
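For example, if you use the gcloud CLI, a standard app profile that routes all traffic to a dedicated cluster can be created with a command along the following lines. The profile, instance, and cluster names are placeholders, and the available flags (including the ones for Data Boost profiles) can change, so check gcloud bigtable app-profiles create --help for the current options:

gcloud bigtable app-profiles create bigquery-profile \
    --instance=my-instance \
    --route-to=my-cluster \
    --description="Dedicated to BigQuery analysis"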
Retrieve the Bigtable URI
To create an external table for a Bigtable data source, you must provide the Bigtable URI. To retrieve the Bigtable URI, do the following:
Open the Bigtable page in the console.
Retrieve the following details about your Bigtable data source:
- Your project ID.
- Your Bigtable instance ID.
- The ID of the Bigtable app profile that you plan to use. This can be either a standard app profile or a Data Boost app profile, depending on the type of compute that you want to use. If you don't specify an app profile ID, the default app profile is used.
- The name of your Bigtable table.
Compose the Bigtable URI using the following format, where:
- PROJECT_ID is the project that contains your Bigtable instance
- INSTANCE_ID is the Bigtable instance ID
- APP_PROFILE (optional) is the identifier for the app profilethat you want to use
- TABLE_NAME is the name of the table you're querying
https://googleapis.com/bigtable/projects/PROJECT_ID/instances/INSTANCE_ID[/appProfiles/APP_PROFILE]/tables/TABLE_NAME
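For example, the URI for a hypothetical table named table1 in the instance myBigtableInstance of the project myproject, read through the app profile myAppProfile, looks like this:

https://googleapis.com/bigtable/projects/myproject/instances/myBigtableInstance/appProfiles/myAppProfile/tables/table1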
Create permanent external tables
When you create a permanent external table in BigQuery that is linked to a Bigtable data source, there are two options for specifying the format of the external table:
- If you are using the API or the bq command-line tool, you create a table definition file that defines the schema and metadata for the external table.
- If you are using SQL, you use the uris option of the CREATE EXTERNAL TABLE statement to specify the Bigtable table to pull data from, and the bigtable_options option to specify the table schema.
The external table data is not stored in the BigQuery table. Because the table is permanent, you can use dataset-level access controls to share the table with others who also have access to the underlying Bigtable data source.
To create a permanent table, choose one of the following methods.
SQL
You can create a permanent external table by running the CREATE EXTERNAL TABLE DDL statement. You must specify the table schema explicitly as part of the statement options.
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE EXTERNAL TABLE DATASET.NEW_TABLE
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['URI'],
  bigtable_options = BIGTABLE_OPTIONS
);
Replace the following:
- DATASET: the dataset in which to create the Bigtable external table.
- NEW_TABLE: the name for the Bigtable external table.
- URI: the URI for the Bigtable table you want to use as a data source. This URI must follow the format described in Retrieve the Bigtable URI.
- BIGTABLE_OPTIONS: the schema for the Bigtable table in JSON format. For a list of Bigtable table definition options, see BigtableOptions in the REST API reference.
Click Run.
For more information about how to run queries, see Run an interactive query.
A statement to create an external Bigtable table might look similar to the following:
CREATE EXTERNAL TABLE mydataset.BigtableTable
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/myproject/instances/myBigtableInstance/appProfiles/myAppProfile/tables/table1'],
  bigtable_options =
    """
    {
      columnFamilies: [
        {
          "familyId": "familyId1",
          "type": "INTEGER",
          "encoding": "BINARY"
        }
      ],
      readRowkeyAsString: true
    }
    """
);
bq
You create a table in the bq command-line tool using the bq mk command. When you use the bq command-line tool to create a table linked to an external data source, you identify the table's schema using a table definition file.
Use the bq mk command to create a permanent table.

bq mk \
--external_table_definition=DEFINITION_FILE \
DATASET.TABLE
Replace the following:
- DEFINITION_FILE: the path to the table definition file on your local machine.
- DATASET: the name of the dataset that contains the table.
- TABLE: the name of the table you're creating.
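For example, assuming a definition file saved as /tmp/bigtable_definition.json and a dataset named mydataset (both placeholders), the command might look like this:

bq mk \
--external_table_definition=/tmp/bigtable_definition.json \
mydataset.mytable

The definition file itself is the JSON form of an ExternalDataConfiguration, so a minimal sketch looks roughly like the following, with sourceFormat set to "BIGTABLE", your Bigtable URI in sourceUris, and the table schema in bigtableOptions (the Generated schema section later on this page shows a fuller bigtableOptions example):

{
  "sourceFormat": "BIGTABLE",
  "sourceUris": [
    "https://googleapis.com/bigtable/projects/myproject/instances/myBigtableInstance/appProfiles/myAppProfile/tables/table1"
  ],
  "bigtableOptions": {
    "readRowkeyAsString": "true",
    "columnFamilies": [ ... ]
  }
}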
API
Use the tables.insert API method, and create an ExternalDataConfiguration in the Table resource that you pass in.
For the sourceUris property in the Table resource, specify only one Bigtable URI. It must be a valid HTTPS URL.
For the sourceFormat property, specify "BIGTABLE".
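Putting those properties together, the Table resource that you pass to tables.insert wraps the same ExternalDataConfiguration shape shown in the bq table definition sketch above. A rough outline, where the project, dataset, and table IDs are placeholders, is:

{
  "tableReference": {
    "projectId": "myproject",
    "datasetId": "mydataset",
    "tableId": "BigtableTable"
  },
  "externalDataConfiguration": {
    "sourceFormat": "BIGTABLE",
    "sourceUris": ["URI"],
    "bigtableOptions": { ... }
  }
}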
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.BigtableColumn;
import com.google.cloud.bigquery.BigtableColumnFamily;
import com.google.cloud.bigquery.BigtableOptions;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TableResult;
import com.google.common.collect.ImmutableList;
import org.apache.commons.codec.binary.Base64;

// Sample to query an external Bigtable data source using a permanent table
public class QueryExternalBigtablePerm {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String bigtableInstanceId = "MY_INSTANCE_ID";
    String bigtableTableName = "MY_BIGTABLE_NAME";
    String bigqueryDatasetName = "MY_DATASET_NAME";
    String bigqueryTableName = "MY_TABLE_NAME";
    String sourceUri =
        String.format(
            "https://googleapis.com/bigtable/projects/%s/instances/%s/tables/%s",
            projectId, bigtableInstanceId, bigtableTableName);
    String query = String.format("SELECT * FROM %s ", bigqueryTableName);
    queryExternalBigtablePerm(bigqueryDatasetName, bigqueryTableName, sourceUri, query);
  }

  public static void queryExternalBigtablePerm(
      String datasetName, String tableName, String sourceUri, String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      BigtableColumnFamily.Builder statsSummary = BigtableColumnFamily.newBuilder();

      // Configuring Columns
      BigtableColumn connectedCell =
          BigtableColumn.newBuilder()
              .setQualifierEncoded(Base64.encodeBase64String("connected_cell".getBytes()))
              .setFieldName("connected_cell")
              .setType("STRING")
              .setEncoding("TEXT")
              .build();
      BigtableColumn connectedWifi =
          BigtableColumn.newBuilder()
              .setQualifierEncoded(Base64.encodeBase64String("connected_wifi".getBytes()))
              .setFieldName("connected_wifi")
              .setType("STRING")
              .setEncoding("TEXT")
              .build();
      BigtableColumn osBuild =
          BigtableColumn.newBuilder()
              .setQualifierEncoded(Base64.encodeBase64String("os_build".getBytes()))
              .setFieldName("os_build")
              .setType("STRING")
              .setEncoding("TEXT")
              .build();

      // Configuring column family and columns
      statsSummary
          .setColumns(ImmutableList.of(connectedCell, connectedWifi, osBuild))
          .setFamilyID("stats_summary")
          .setOnlyReadLatest(true)
          .setEncoding("TEXT")
          .setType("STRING")
          .build();

      // Configuring BigtableOptions is optional.
      BigtableOptions options =
          BigtableOptions.newBuilder()
              .setIgnoreUnspecifiedColumnFamilies(true)
              .setReadRowkeyAsString(true)
              .setColumnFamilies(ImmutableList.of(statsSummary.build()))
              .build();

      TableId tableId = TableId.of(datasetName, tableName);
      // Create a permanent table linked to the Bigtable table
      ExternalTableDefinition externalTable =
          ExternalTableDefinition.newBuilder(sourceUri, options).build();
      bigquery.create(TableInfo.of(tableId, externalTable));

      // Example query
      TableResult results = bigquery.query(QueryJobConfiguration.of(query));

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query on external permanent table performed successfully.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Query external tables
For more information, see Query Bigtable data.
Generated schema
By default, BigQuery exposes the values in a column family as an array of columns and, within that, an array of values written at different timestamps. This schema preserves the natural layout of data in Bigtable, but SQL queries can be challenging. It is possible to promote columns to subfields within the parent column family and to read only the latest value from each cell. This represents both of the arrays in the default schema as scalar values.
Example
You are storing user profiles for a fictional social network. One data model for this might be a profile column family with individual columns for gender, age, and email:

rowkey | profile:gender | profile:age | profile:email
-------|----------------|-------------|----------------
alice  | female         | 30          | alice@gmail.com

Using the default schema, a GoogleSQL query to count the number of male users over 30 is:

SELECT COUNT(1) FROM `dataset.table`
OMIT RECORD IF
  NOT SOME(profile.column.name = "gender"
           AND profile.column.cell.value = "male")
  OR NOT SOME(profile.column.name = "age"
              AND INTEGER(profile.column.cell.value) > 30)
Querying the data is less challenging if gender and age are exposed as subfields. To expose them as subfields, list gender and age as named columns in the profile column family when defining the table. You can also instruct BigQuery to expose the latest values from this column family because typically, only the latest value (and possibly the only value) is of interest.
After exposing the columns as subfields, the GoogleSQL query to count the number of male users over 30 is:

SELECT COUNT(1) FROM `dataset.table`
WHERE profile.gender.cell.value = "male"
  AND profile.age.cell.value > 30
Notice how gender and age are referenced directly as fields. The JSON configuration for this setup is:

"bigtableOptions": {
  "readRowkeyAsString": "true",
  "columnFamilies": [
    {
      "familyId": "profile",
      "onlyReadLatest": "true",
      "columns": [
        {
          "qualifierString": "gender",
          "type": "STRING"
        },
        {
          "qualifierString": "age",
          "type": "INTEGER"
        }
      ]
    }
  ]
}
Value encoding
Bigtable stores data as raw bytes, independent of data encoding. However, byte values are of limited use in SQL query analysis. Bigtable provides two basic types of scalar decoding: text and HBase binary.
The text format assumes that all values are stored as alphanumeric text strings. For example, the integer 768 is stored as the string "768". The binary encoding assumes that HBase's Bytes.toBytes class of methods was used to encode the data, and applies an appropriate decoding method.
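As an illustration of the difference (this snippet is not part of the official sample), the following Java code prints the bytes that each encoding assumes for the value 768: the text encoding stores the three UTF-8 characters of the string "768", while the HBase-binary encoding of a long corresponds to 8 big-endian bytes, which is what HBase's Bytes.toBytes(long) produces:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodingExample {
  public static void main(String[] args) {
    // TEXT encoding: the number is stored as the string "768" (3 bytes).
    byte[] text = "768".getBytes(StandardCharsets.UTF_8);

    // BINARY encoding: an 8-byte big-endian long, matching HBase Bytes.toBytes(768L).
    byte[] binary = ByteBuffer.allocate(Long.BYTES).putLong(768L).array();

    System.out.println("TEXT bytes:   " + toHex(text));   // 37 36 38
    System.out.println("BINARY bytes: " + toHex(binary)); // 00 00 00 00 00 00 03 00
  }

  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x ", b));
    }
    return sb.toString().trim();
  }
}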
Supported regions and zones
Querying data in Bigtable is available in all supported Bigtable zones. You can find the list of zones here. For multi-cluster instances, BigQuery routes traffic based on Bigtable app profile settings.
Limitations
- You can't create external tables over Bigtable SQL-based objects, such as views and continuous materialized views.
- For more information about limitations that apply to external tables, see External table limitations.
Scopes for Compute Engine instances
When you create a Compute Engine instance, you can specify a list of scopes for the instance. The scopes control the instance's access to Google Cloud products, including Bigtable. Applications running on the VM use the service account to call Google Cloud APIs.
If you set up a Compute Engine instance to run as a service account, and that service account accesses an external table linked to a Bigtable data source, you must add the Bigtable read-only data access scope (https://www.googleapis.com/auth/bigtable.data.readonly) to the instance. For more information, see Creating a Compute Engine instance for Bigtable.
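For example, the scope can be attached when you create the VM by passing it to the --scopes flag of gcloud compute instances create. The instance name and zone below are placeholders, and note that specifying --scopes replaces the default scopes, so include any other scopes your workload needs:

gcloud compute instances create my-analysis-vm \
    --zone=us-central1-a \
    --scopes=https://www.googleapis.com/auth/bigtable.data.readonly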
For information on applying scopes to a Compute Engine instance, see Changing the service account and access scopes for an instance. For more information on Compute Engine service accounts, see Service accounts.
What's next