Query Cloud Storage data in BigLake tables

Important: The term "BigLake" on this page refers to an access delegation functionality for external tables in BigQuery. For information about BigLake, the stand-alone Google Cloud product that includes BigLake metastore, the Apache Iceberg REST catalog, and BigLake tables for Apache Iceberg, see BigLake overview.

This document describes how to query data stored in a Cloud Storage BigLake table.

Before you begin

Ensure that you have a Cloud Storage BigLake table.

Required roles

To query Cloud Storage BigLake tables, ensure you have the following roles:

  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery User (roles/bigquery.user)

Depending on your permissions, you can grant these roles to yourself or ask your administrator to grant them to you. For more information about granting roles, see Viewing the grantable roles on resources.


You might also be able to get these permissions with custom roles or other predefined roles.

Query BigLake tables

After creating a Cloud Storage BigLake table, you can query it using GoogleSQL syntax, the same as if it were a standard BigQuery table. For example: SELECT field1, field2 FROM mydataset.my_cloud_storage_table;
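
As an illustration, the following is a minimal sketch of running such a query through the BigQuery Python client library. It uses the same placeholder dataset and table as the example above; the column names field1 and field2 are also placeholders.

from google.cloud import bigquery

# Minimal sketch: query a BigLake table exactly like a standard table.
# The dataset, table, and column names are placeholders.
client = bigquery.Client()

query = """
    SELECT field1, field2
    FROM mydataset.my_cloud_storage_table
"""

for row in client.query(query):  # iterating waits for the query job to finish
    print(row.field1, row.field2)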

Query BigLake tables using external data processing tools

You can use BigQuery connectors with other data processing tools to access BigLake tables on Cloud Storage. For more information, see Connectors.

Apache Spark

The following example uses Dataproc, but it also works with any Spark deployment that uses the Spark-BigQuery connector.

In this example, you supply the Spark-BigQuery connector as an initialization action when you create a cluster. This action lets you use a Zeppelin notebook and exercise the data analyst user journey.

Spark-BigQuery connector versions are listed in the GitHub GoogleCloudDataproc/spark-bigquery-connector repository.

Create a single-node cluster using the initialization action for the Spark-BigQuery connector:

gcloud dataproc clusters create biglake-demo-cluster \
    --optional-components=ZEPPELIN \
    --region=REGION \
    --enable-component-gateway \
    --single-node \
    --initialization-actions gs://goog-dataproc-initialization-actions-REGION/connectors/connectors.sh \
    --metadata spark-bigquery-connector-url=gs://spark-lib/bigquery/spark-bigquery-with-dependencies_SCALA_VERSION-CONNECTOR_VERSION.jar
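
After the cluster is running, a Zeppelin notebook or Spark job can read the BigLake table through the connector. The following PySpark lines are a minimal sketch; the table and column names (mydataset.my_cloud_storage_table, field1, field2) are placeholders, not values prescribed by this page.

from pyspark.sql import SparkSession

# Minimal sketch: read a BigLake table through the Spark-BigQuery connector.
spark = SparkSession.builder.appName("biglake-demo").getOrCreate()

df = (
    spark.read.format("bigquery")
    .option("table", "mydataset.my_cloud_storage_table")  # placeholder table
    .load()
)

df.select("field1", "field2").show()  # placeholder columns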

Apache Hive

The following example uses Dataproc, but it also works with any Hive deployment that uses the Hive-BigQuery connector.

In this example, you supply the Hive-BigQuery connector as an initialization action when you create a cluster.

Hive-BigQuery connector versions are listed in the GitHub GoogleCloudDataproc/hive-bigquery-connector repository.

Create a single-node cluster using the initialization action for the Hive-BigQuery connector:

gcloud dataproc clusters create biglake-hive-demo-cluster \
    --region=REGION \
    --single-node \
    --initialization-actions gs://goog-dataproc-initialization-actions-REGION/connectors/connectors.sh \
    --metadata hive-bigquery-connector-url=gs://goog-dataproc-artifacts-REGION/hive-bigquery/hive-bigquery-connector-CONNECTOR_VERSION.jar

For more information about the Hive-BigQuery connector, see Use the Hive-BigQuery Connector.

Dataflow

To read BigLake tables from Dataflow, use the Dataflow connector in DIRECT_READ mode so that reads go through the BigQuery Storage API. Reading from a query string is also supported. See BigQuery I/O in the Apache Beam documentation.

Note: The default EXPORT mode for Dataflow is not supported.
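
For illustration, the following Apache Beam (Python) pipeline is a minimal sketch of reading a BigLake table in DIRECT_READ mode; the project, dataset, and table names are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Minimal sketch: read a BigLake table with the BigQuery Storage API (DIRECT_READ).
with beam.Pipeline(options=PipelineOptions()) as pipeline:
    rows = pipeline | "ReadBigLakeTable" >> beam.io.ReadFromBigQuery(
        table="my-project:mydataset.my_cloud_storage_table",  # placeholder
        method=beam.io.ReadFromBigQuery.Method.DIRECT_READ,
    )
    rows | "PrintRows" >> beam.Map(print)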

Query temporary BigLake tables

Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

To query an external data source without creating a permanent table, you provide a table definition for the temporary table, and then use that table definition in a command or call to query the temporary table. You can provide the table definition in any of the following ways:

  • A table definition file
  • An inline schema definition
  • A JSON schema file

The table definition file or supplied schema is used to create the temporary external table, and the query runs against the temporary external table.

When you use a temporary external table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others.

You can create and query a temporary table linked to an external data source by using the bq command-line tool, the API, or the client libraries.

bq

Use the bq query command with the --external_table_definition flag.

(Optional) Supply the --location flag and set the value to your location.

To query a temporary table linked to your external data source using a table definition file, enter the following command.

bq --location=LOCATION query \
    --external_table_definition=TABLE::DEFINITION_FILE \
    'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • TABLE: the name of the temporary table you're creating.
  • DEFINITION_FILE: the path to the table definition file on your local machine.
  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named sales using a table definition file named sales_def.

bq query \
    --external_table_definition=sales::sales_def@us.myconnection \
    'SELECT
      Region,
      Total_sales
    FROM
      sales'

To query a temporary table linked to your external data source using an inline schema definition, enter the following command.

bq --location=LOCATION query \
    --external_table_definition=TABLE::SCHEMA@SOURCE_FORMAT=BUCKET_PATH@projects/PROJECT_ID/locations/REGION/connections/CONNECTION_ID \
    'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • TABLE: the name of the temporary table you're creating.
  • SCHEMA: the inline schema definition in the format field:data_type,field:data_type.
  • SOURCE_FORMAT: the format of the external data source, for example, CSV.
  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.

    You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths.

    The following examples show valid uris values:

    • gs://bucket/path1/myfile.csv
    • gs://bucket/path1/*.parquet
    • gs://bucket/path1/file1*,gs://bucket1/path1/*

    When you specify uris values that target multiple files, all of those files must share a compatible schema.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • PROJECT_ID: the project that contains the connection.

  • REGION: the region that contains the connection, for example, us.

  • CONNECTION_ID: the name of the connection, for example, myconnection.

  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named sales linked to a CSV file stored in Cloud Storage with the following schema definition: Region:STRING,Quarter:STRING,Total_sales:INTEGER.

bq query \
    --external_table_definition=sales::Region:STRING,Quarter:STRING,Total_sales:INTEGER@CSV=gs://mybucket/sales.csv@us.myconnection \
    'SELECT
      Region,
      Total_sales
    FROM
      sales'

To query a temporary table linked to your external data source using a JSON schema file, enter the following command.

bq --location=LOCATION query \
    --external_table_definition=SCHEMA_FILE@SOURCE_FORMAT=BUCKET_PATH@projects/PROJECT_ID/locations/REGION/connections/CONNECTION_ID \
    'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • SCHEMA_FILE: the path to the JSON schema file on your local machine.
  • SOURCE_FORMAT: the format of the external data source, for example, CSV.
  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.

    You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths.

    The following examples show valid uris values:

    • gs://bucket/path1/myfile.csv
    • gs://bucket/path1/*.parquet
    • gs://bucket/path1/file1*,gs://bucket1/path1/*

    When you specify uris values that target multiple files, all of those files must share a compatible schema.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • PROJECT_ID: the project that contains the connection.

  • REGION: the region that contains the connection, for example, us.

  • CONNECTION_ID: the name of the connection, for example, myconnection.

  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named sales linked to a CSV file stored in Cloud Storage using the /tmp/sales_schema.json schema file.

bq query \
    --external_table_definition=sales::/tmp/sales_schema.json@CSV=gs://mybucket/sales.csv@us.myconnection \
    'SELECT
      Region,
      Total_sales
    FROM
      sales'

API

To run a query using the API, follow these steps:

  1. Create a Job object.
  2. Populate the configuration section of the Job object with a JobConfiguration object.
  3. Populate the query section of the JobConfiguration object with a JobConfigurationQuery object.
  4. Populate the tableDefinitions section of the JobConfigurationQuery object with an ExternalDataConfiguration object. Specify the connection to use for connecting to Cloud Storage in the connectionId field.
  5. Call the jobs.insert method to run the query asynchronously or the jobs.query method to run the query synchronously, passing in the Job object.
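
The client libraries wrap these objects. For example, the following Python sketch follows the same steps for a temporary table over a CSV file in Cloud Storage; the bucket path, schema, and connection names are placeholders, and the connection_id assignment assumes a client library version that exposes that field.

from google.cloud import bigquery

client = bigquery.Client()

# Describe the external data; this corresponds to ExternalDataConfiguration.
# The bucket path, schema, and connection name are placeholders.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://mybucket/sales.csv"]
external_config.schema = [
    bigquery.SchemaField("Region", "STRING"),
    bigquery.SchemaField("Quarter", "STRING"),
    bigquery.SchemaField("Total_sales", "INTEGER"),
]
external_config.connection_id = (
    "projects/PROJECT_ID/locations/us/connections/myconnection"
)

# Register the temporary table definition (tableDefinitions) and run the query.
job_config = bigquery.QueryJobConfig(table_definitions={"sales": external_config})
query_job = client.query("SELECT Region, Total_sales FROM sales", job_config=job_config)

for row in query_job:
    print(row.Region, row.Total_sales)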

