Create Cloud Storage external tables

Important: The term "BigLake" on this page refers to an access delegation functionality for external tables in BigQuery. For information about BigLake, the stand-alone Google Cloud product that includes BigLake metastore, the Apache Iceberg REST catalog, and BigLake tables for Apache Iceberg, see BigLake overview.

BigQuery supports querying Cloud Storage data in the following formats:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro
  • ORC
  • Parquet
  • Datastore exports
  • Firestore exports

BigQuery supports querying Cloud Storage data from these storage classes:

  • Standard
  • Nearline
  • Coldline
  • Archive

To query a Cloud Storage external table, you must have permissions on both the external table and the Cloud Storage files. We recommend using a BigLake table instead if possible. BigLake tables provide access delegation, so that you only need permissions on the BigLake table in order to query the Cloud Storage data.

Be sure to consider the location of your dataset and Cloud Storage bucket when you query data stored in Cloud Storage.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document. The permissions required to perform a task (if any) are listed in the "Required permissions" section of the task.

Required roles

To create an external table, you need the bigquery.tables.create BigQuery Identity and Access Management (IAM) permission.

Each of the following predefined Identity and Access Management roles includes this permission:

  • BigQuery Data Editor (roles/bigquery.dataEditor)
  • BigQuery Data Owner (roles/bigquery.dataOwner)
  • BigQuery Admin (roles/bigquery.admin)

You also need the following permissions to access the Cloud Storagebucket that contains your data:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

The Cloud Storage Storage Admin (roles/storage.admin) predefined Identity and Access Management role includes these permissions.

If you are not a principal in any of these roles, ask your administratorto grant you access or to create the external table for you.

For more information on Identity and Access Management roles and permissions in BigQuery, see Predefined roles and permissions.

Access scopes for Compute Engine instances

If you need to query an external table that is linked to a Cloud Storage source from a Compute Engine instance, the instance must have at least the Cloud Storage read-only access scope (https://www.googleapis.com/auth/devstorage.read_only).

The scopes control the Compute Engine instance's access to Google Cloud products, including Cloud Storage. Applications running on the instance use the service account attached to the instance to call Google Cloud APIs.

If you set up a Compute Engine instance to run as the default Compute Engine service account, the instance is by default granted a number of default scopes, including the https://www.googleapis.com/auth/devstorage.read_only scope.

If instead you set up the instance with a custom service account, make sure to explicitly grant the https://www.googleapis.com/auth/devstorage.read_only scope to the instance.

For information about applying scopes to a Compute Engine instance, see Changing the service account and access scopes for an instance. For more information about Compute Engine service accounts, see Service accounts.

Create external tables on unpartitioned data

You can create a permanent table linked to your external data source by using the Google Cloud console, SQL, the bq command-line tool, the API, or the client libraries.

Select one of the following options:

Console

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.

    If you don't see the left pane, click Expand left pane to open the pane.

  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.

  4. Expand the Actions option and click Create table.

  5. In the Source section, specify the following details:

    1. For Create table from, select Google Cloud Storage.

    2. For Select file from GCS bucket or use a URI pattern, browse to select a bucket and file to use, or type the path in the format gs://bucket_name/[folder_name/]file_name.

      You can't specify multiple URIs in the Google Cloud console, but you can select multiple files by specifying one asterisk (*) wildcard character. For example, gs://mybucket/file_name*. For more information, see Wildcard support for Cloud Storage URIs.

      The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.

    3. For File format, select the format that matches your file.

  6. In the Destination section, specify the following details:

    1. For Project, choose the project in which to create the table.

    2. For Dataset, choose the dataset in which to create the table.

    3. For Table, enter the name of the table you are creating.

    4. For Table type, select External table.

  7. In the Schema section, you can either enable schema auto-detection or manually specify a schema if you have a source file. If you don't have a source file, you must manually specify a schema.

    • To enable schema auto-detection, select the Auto-detect option.

    • To manually specify a schema, leave the Auto-detect option unchecked. Enable Edit as text and enter the table schema as a JSON array.

  8. To ignore rows with extra column values that do not match the schema, expand the Advanced options section and select Unknown values.

  9. Click Create table.

After the permanent table is created, you can run a query against the table as if it were a native BigQuery table. After your query completes, you can export the results as CSV or JSON files, save the results as a table, or save the results to Google Sheets.

SQL

You can create a permanent external table by running the CREATE EXTERNAL TABLE DDL statement. You can specify the schema explicitly, or use schema auto-detection to infer the schema from the external data.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
      OPTIONS (
        format = "TABLE_FORMAT",
        uris = ['BUCKET_PATH'[,...]]
      );

    Replace the following:

    • PROJECT_ID: the name of your project in which you want to create the table—for example, myproject
    • DATASET: the name of the BigQuery dataset that you want to create the table in—for example, mydataset
    • EXTERNAL_TABLE_NAME: the name of the table that you want to create—for example, mytable
    • TABLE_FORMAT: the format of the table that you want to create—for example, PARQUET
    • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the external table, in the format ['gs://bucket_name/[folder_name/]file_name'].

      You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the path. For example, ['gs://mybucket/file_name*']. For more information, see Wildcard support for Cloud Storage URIs.

      You can specify multiple buckets for the uris option by providing multiple paths.

      The following examples show valid uris values:

      • ['gs://bucket/path1/myfile.csv']
      • ['gs://bucket/path1/*.csv']
      • ['gs://bucket/path1/*', 'gs://bucket/path2/file00*']

      When you specify uris values that target multiple files, all of those files must share a compatible schema.

      For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Examples

The following example uses schema auto-detection to create an external table named sales that is linked to a CSV file stored in Cloud Storage:

CREATE OR REPLACE EXTERNAL TABLE mydataset.sales
  OPTIONS (
    format = 'CSV',
    uris = ['gs://mybucket/sales.csv']
  );

The next example specifies a schema explicitly and skips the first row in the CSV file:

CREATE OR REPLACE EXTERNAL TABLE mydataset.sales (
  Region STRING,
  Quarter STRING,
  Total_Sales INT64
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://mybucket/sales.csv'],
  skip_leading_rows = 1
);

bq

To create an external table, use the bq mk command with the --external_table_definition flag. This flag contains either a path to a table definition file or an inline table definition.

Option 1: Table definition file

Use the bq mkdef command to create a table definition file, and then pass the file path to the bq mk command as follows:

bq mkdef --source_format=SOURCE_FORMAT \
  BUCKET_PATH > DEFINITION_FILE

bq mk --table \
  --external_table_definition=DEFINITION_FILE \
  DATASET_NAME.TABLE_NAME \
  SCHEMA

Replace the following:

  • SOURCE_FORMAT: the format of the external data source. For example, CSV.
  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.

    You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths.

    The following examples show valid uris values:

    • gs://bucket/path1/myfile.csv
    • gs://bucket/path1/*.parquet
    • gs://bucket/path1/file1*, gs://bucket1/path1/*

    When you specify uris values that target multiple files, all of those files must share a compatible schema.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • DEFINITION_FILE: the path to the table definition file on your local machine.

  • DATASET_NAME: the name of the dataset that contains the table.

  • TABLE_NAME: the name of the table you're creating.

  • SCHEMA: specifies a path to a JSON schema file, or specifies the schema in the form field:data_type,field:data_type,....

Example:

bq mkdef --source_format=CSV gs://mybucket/sales.csv > mytable_def

bq mk --table --external_table_definition=mytable_def \
  mydataset.mytable \
  Region:STRING,Quarter:STRING,Total_sales:INTEGER

To use schema auto-detection, set the --autodetect=true flag in the mkdef command and omit the schema:

bq mkdef --source_format=CSV --autodetect=true \
  gs://mybucket/sales.csv > mytable_def

bq mk --table --external_table_definition=mytable_def \
  mydataset.mytable

Option 2: Inline table definition

Instead of creating a table definition file, you can pass the table definition directly to the bq mk command:

bq mk --table \
  --external_table_definition=@SOURCE_FORMAT=BUCKET_PATH \
  DATASET_NAME.TABLE_NAME \
  SCHEMA

Replace the following:

  • SOURCE_FORMAT: the format of the external data source.

    For example, CSV.

  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.

    You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths.

    The following examples show valid uris values:

    • gs://bucket/path1/myfile.csv
    • gs://bucket/path1/*.parquet
    • gs://bucket/path1/file1*, gs://bucket1/path1/*

    When you specify uris values that target multiple files, all of those files must share a compatible schema.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • DATASET_NAME: the name of the dataset that contains the table.

  • TABLE_NAME: the name of the table you're creating.

  • SCHEMA: specifies a path to a JSON schema file, or specifies the schema in the form field:data_type,field:data_type,.... To use schema auto-detection, omit this argument.

Example:

bq mk --table \
  --external_table_definition=@CSV=gs://mybucket/sales.csv \
  mydataset.mytable \
  Region:STRING,Quarter:STRING,Total_sales:INTEGER

API

Call the tables.insert API method, and create an ExternalDataConfiguration in the Table resource that you pass in.

Specify the schema property or set the autodetect property to true to enable schema auto-detection for supported data sources.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TableResult;

// Sample to query an external data source using a permanent table
public class QueryExternalGCSPerm {

  public static void runQueryExternalGCSPerm() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    String query =
        String.format("SELECT * FROM %s.%s WHERE name LIKE 'W%%'", datasetName, tableName);
    queryExternalGCSPerm(datasetName, tableName, sourceUri, schema, query);
  }

  public static void queryExternalGCSPerm(
      String datasetName, String tableName, String sourceUri, Schema schema, String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Skip header row in the file.
      CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();

      TableId tableId = TableId.of(datasetName, tableName);
      // Create a permanent table linked to the GCS file
      ExternalTableDefinition externalTable =
          ExternalTableDefinition.newBuilder(sourceUri, csvOptions).setSchema(schema).build();
      bigquery.create(TableInfo.of(tableId, externalTable));

      // Example query to find states starting with 'W'
      TableResult results = bigquery.query(QueryJobConfiguration.of(query));

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query on external permanent table performed successfully.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryExternalGCSPerm() {
  // Queries an external data source using a permanent table

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";

  // Define the schema for the external table
  // (matches the us-states sample data: name, post_abbr)
  const schema = [
    {name: 'name', type: 'STRING'},
    {name: 'post_abbr', type: 'STRING'},
  ];

  // Configure the external data source
  const dataConfig = {
    sourceFormat: 'CSV',
    sourceUris: ['gs://cloud-samples-data/bigquery/us-states/us-states.csv'],
    // Optionally skip header row
    csvOptions: {skipLeadingRows: 1},
  };

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  const options = {
    schema: schema,
    externalDataConfiguration: dataConfig,
  };

  // Create an external table linked to the GCS file
  const [table] = await bigquery
    .dataset(datasetId)
    .createTable(tableId, options);

  console.log(`Table ${table.id} created.`);

  // Example query to find states starting with 'W'
  const query = `SELECT post_abbr
  FROM \`${datasetId}.${tableId}\`
  WHERE name LIKE 'W%'`;

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(query);
  console.log(`Job ${job.id} started.`);

  // Wait for the query to finish
  const [rows] = await job.getQueryResults();

  // Print the results
  console.log('Rows:');
  console.log(rows);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

# TODO(developer): Set the external source format of your table.
# Note that the set of allowed values for external data sources is
# different than the set used for loading data (see :class:`~google.cloud.bigquery.job.SourceFormat`).
external_source_format = "AVRO"

# TODO(developer): Set the source_uris to point to your data in Google Cloud
source_uris = [
    "gs://cloud-samples-data/bigquery/federated-formats-reference-file-schema/a-twitter.avro",
    "gs://cloud-samples-data/bigquery/federated-formats-reference-file-schema/b-twitter.avro",
    "gs://cloud-samples-data/bigquery/federated-formats-reference-file-schema/c-twitter.avro",
]

# Create ExternalConfig object with external source format
external_config = bigquery.ExternalConfig(external_source_format)
# Set source_uris that point to your data in Google Cloud
external_config.source_uris = source_uris

# TODO(developer) You have the option to set a reference_file_schema_uri, which points to
# a reference file for the table schema
reference_file_schema_uri = "gs://cloud-samples-data/bigquery/federated-formats-reference-file-schema/b-twitter.avro"

external_config.reference_file_schema_uri = reference_file_schema_uri

table = bigquery.Table(table_id)
# Set the external data configuration of the table
table.external_data_configuration = external_config
table = client.create_table(table)  # Make an API request.

print(
    f"Created table with external source format {table.external_data_configuration.source_format}"
)

Create external tables on partitioned data

You can create an external table for Hive partitioned data that resides in Cloud Storage. After you create an externally partitioned table, you can't change the partition key. You need to recreate the table to change the partition key.

To create an external table for Hive partitioned data, choose one of the following options:

Console

  1. In the Google Cloud console, go to BigQuery.

    Go to BigQuery

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. Click Actions, and then click Create table. This opens the Create table pane.
  5. In the Source section, specify the following details:
    1. For Create table from, select Google Cloud Storage.
    2. For Select file from Cloud Storage bucket, enter the path to the Cloud Storage folder, using wildcards. For example, my_bucket/my_files*. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
    3. From the File format list, select the file type.
    4. Select the Source data partitioning checkbox, and then for Select Source URI Prefix, enter the Cloud Storage URI prefix. For example, gs://my_bucket/my_files.
    5. In the Partition inference mode section, select one of the following options:
      • Automatically infer types: set the partition schema detection mode to AUTO.
      • All columns are strings: set the partition schema detection mode to STRINGS.
      • Provide my own: set the partition schema detection mode to CUSTOM and manually enter the schema information for the partition keys. For more information, see Provide a custom partition key schema.
    6. Optional: To require a partition filter on all queries for this table, select the Require partition filter checkbox. Requiring a partition filter can reduce cost and improve performance. For more information, see Requiring predicate filters on partition keys in queries.
  6. In the Destination section, specify the following details:
    1. For Project, select the project in which you want to create the table.
    2. For Dataset, select the dataset in which you want to create the table.
    3. For Table, enter the name of the table that you want to create.
    4. For Table type, select External table.
  7. In the Schema section, enter the schema definition.
  8. To enable the auto detection of schema, select Auto detect.
  9. To ignore rows with extra column values that do not match the schema, expand the Advanced options section and select Unknown values.
  10. Click Create table.

SQL

Use the CREATE EXTERNAL TABLE DDL statement.

The following example uses automatic detection of Hive partition keys:

CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
WITH PARTITION COLUMNS
OPTIONS (
  format = 'SOURCE_FORMAT',
  uris = ['GCS_URIS'],
  hive_partition_uri_prefix = 'GCS_URI_SHARED_PREFIX',
  require_hive_partition_filter = BOOLEAN
);

Replace the following:

  • SOURCE_FORMAT: the format of the external data source, such as PARQUET
  • GCS_URIS: the path to the Cloud Storage folder, using wildcard format
  • GCS_URI_SHARED_PREFIX: the source URI prefix without the wildcard
  • BOOLEAN: whether to require a predicate filter at query time. This flag is optional. The default value is false.

The following example uses custom Hive partition keys and types by listing them in the WITH PARTITION COLUMNS clause:

CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
WITH PARTITION COLUMNS (
  PARTITION_COLUMN_LIST
)
OPTIONS (
  format = 'SOURCE_FORMAT',
  uris = ['GCS_URIS'],
  hive_partition_uri_prefix = 'GCS_URI_SHARED_PREFIX',
  require_hive_partition_filter = BOOLEAN
);

Replace the following:

  • PARTITION_COLUMN_LIST: a list of columns following the same order in the path of the Cloud Storage folder, in the format:

    KEY1 TYPE1, KEY2 TYPE2

The following example creates an externally partitioned table. It uses schema auto-detection to detect both the file schema and the Hive partitioning layout. If the external path is gs://bucket/path/field_1=first/field_2=1/data.parquet, the partition columns are detected as field_1 (STRING) and field_2 (INT64).

CREATE EXTERNAL TABLE dataset.AutoHivePartitionedTable
WITH PARTITION COLUMNS
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false
);

The following example creates an externally partitioned table by explicitly specifying the partition columns. This example assumes that the external file path has the pattern gs://bucket/path/field_1=first/field_2=1/data.parquet.

CREATE EXTERNAL TABLE dataset.CustomHivePartitionedTable
WITH PARTITION COLUMNS (
  field_1 STRING,  -- column order must match the external path
  field_2 INT64
)
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false
);
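After the table is created, the partition keys behave like regular columns in queries, and filtering on them prunes the Cloud Storage directories that are read. The following is a minimal sketch against the CustomHivePartitionedTable example above, using that example's hypothetical paths:

SELECT *
FROM dataset.CustomHivePartitionedTable
-- Only objects under gs://bucket/path/field_1=first/field_2=1/ are scanned
WHERE field_1 = 'first'
  AND field_2 = 1;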

bq

First, use the bq mkdef command to create a table definition file:

bq mkdef \
  --source_format=SOURCE_FORMAT \
  --hive_partitioning_mode=PARTITIONING_MODE \
  --hive_partitioning_source_uri_prefix=GCS_URI_SHARED_PREFIX \
  --require_hive_partition_filter=BOOLEAN \
  GCS_URIS > DEFINITION_FILE

Replace the following:

  • SOURCE_FORMAT: the format of the external data source. For example, CSV.
  • PARTITIONING_MODE: the Hive partitioning mode. Use one of the following values:
    • AUTO: Automatically detect the key names and types.
    • STRINGS: Automatically convert the key names to strings.
    • CUSTOM: Encode the key schema in the source URI prefix.
  • GCS_URI_SHARED_PREFIX: the source URI prefix.
  • BOOLEAN: specifies whether to require a predicate filter at query time. This flag is optional. The default value is false.
  • GCS_URIS: the path to the Cloud Storage folder, using wildcard format.
  • DEFINITION_FILE: the path to the table definition file on your local machine.

If PARTITIONING_MODE is CUSTOM, include the partition key schema in the source URI prefix, using the following format:

--hive_partitioning_source_uri_prefix=GCS_URI_SHARED_PREFIX/{KEY1:TYPE1}/{KEY2:TYPE2}/...

After you create the table definition file, use the bq mk command to create the external table:

bq mk --external_table_definition=DEFINITION_FILE \
  DATASET_NAME.TABLE_NAME \
  SCHEMA

Replace the following:

  • DEFINITION_FILE: the path to the table definition file.
  • DATASET_NAME: the name of the dataset that contains the table.
  • TABLE_NAME: the name of the table you're creating.
  • SCHEMA: specifies a path to a JSON schema file, or specifies the schema in the form field:data_type,field:data_type,.... To use schema auto-detection, omit this argument.

Examples

The following example uses AUTO Hive partitioning mode:

bq mkdef --source_format=CSV \
  --hive_partitioning_mode=AUTO \
  --hive_partitioning_source_uri_prefix=gs://myBucket/myTable \
  gs://myBucket/myTable/* > mytable_def

bq mk --external_table_definition=mytable_def \
  mydataset.mytable \
  Region:STRING,Quarter:STRING,Total_sales:INTEGER

The following example uses STRINGS Hive partitioning mode:

bq mkdef --source_format=CSV \
  --hive_partitioning_mode=STRINGS \
  --hive_partitioning_source_uri_prefix=gs://myBucket/myTable \
  gs://myBucket/myTable/* > mytable_def

bq mk --external_table_definition=mytable_def \
  mydataset.mytable \
  Region:STRING,Quarter:STRING,Total_sales:INTEGER

The following example uses CUSTOM Hive partitioning mode:

bq mkdef --source_format=CSV \
  --hive_partitioning_mode=CUSTOM \
  --hive_partitioning_source_uri_prefix=gs://myBucket/myTable/{dt:DATE}/{val:STRING} \
  gs://myBucket/myTable/* > mytable_def

bq mk --external_table_definition=mytable_def \
  mydataset.mytable \
  Region:STRING,Quarter:STRING,Total_sales:INTEGER

API

To set Hive partitioning using the BigQuery API, include a hivePartitioningOptions object in the ExternalDataConfiguration object when you create the table definition file.

If you set the hivePartitioningOptions.mode field to CUSTOM, you must encode the partition key schema in the hivePartitioningOptions.sourceUriPrefix field as follows: gs://BUCKET/PATH_TO_TABLE/{KEY1:TYPE1}/{KEY2:TYPE2}/...

To enforce the use of a predicate filter at query time, set the hivePartitioningOptions.requirePartitionFilter field to true.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.HivePartitioningOptions;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

// Sample to create external table using hive partitioning
public class SetHivePartitioningOptions {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/customlayout/*";
    String sourceUriPrefix =
        "gs://cloud-samples-data/bigquery/hive-partitioning-samples/customlayout/{pkey:STRING}/";
    setHivePartitioningOptions(datasetName, tableName, sourceUriPrefix, sourceUri);
  }

  public static void setHivePartitioningOptions(
      String datasetName, String tableName, String sourceUriPrefix, String sourceUri) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Configuring partitioning options
      HivePartitioningOptions hivePartitioningOptions =
          HivePartitioningOptions.newBuilder()
              .setMode("CUSTOM")
              .setRequirePartitionFilter(true)
              .setSourceUriPrefix(sourceUriPrefix)
              .build();

      TableId tableId = TableId.of(datasetName, tableName);
      ExternalTableDefinition customTable =
          ExternalTableDefinition.newBuilder(sourceUri, FormatOptions.parquet())
              .setAutodetect(true)
              .setHivePartitioningOptions(hivePartitioningOptions)
              .build();
      bigquery.create(TableInfo.of(tableId, customTable));
      System.out.println("External table created using hive partitioning options");
    } catch (BigQueryException e) {
      System.out.println("External table was not created. \n" + e.toString());
    }
  }
}

Query external tables

For more information, see Query Cloud Storage data in external tables.
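For example, after you create the sales table from the earlier examples, you can query it with standard GoogleSQL as if it were a native table. A minimal sketch using that example's schema:

SELECT Region, SUM(Total_Sales) AS total_sales
FROM mydataset.sales
GROUP BY Region;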

Upgrade external tables to BigLake

You can upgrade tables based on Cloud Storage to BigLake tables by associating the external table with a connection. If you want to use metadata caching with the BigLake table, you can specify settings for this at the same time. To get table details such as source format and source URI, see Get table information.

To update an external table to a BigLake table, select one of the following options:

SQL

Use the CREATE OR REPLACE EXTERNAL TABLE DDL statement to update a table:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE OR REPLACE EXTERNAL TABLE `PROJECT_ID.DATASET.EXTERNAL_TABLE_NAME`
      WITH CONNECTION {`REGION.CONNECTION_ID` | DEFAULT}
      OPTIONS (
        format = "TABLE_FORMAT",
        uris = ['BUCKET_PATH'],
        max_staleness = STALENESS_INTERVAL,
        metadata_cache_mode = 'CACHE_MODE'
      );

    Replace the following:

    • PROJECT_ID: the name of the project that contains the table
    • DATASET: the name of the dataset that contains the table
    • EXTERNAL_TABLE_NAME: the name of the table
    • REGION: the region that contains the connection
    • CONNECTION_ID: the name of the connection to use

      To use a default connection, specify DEFAULT instead of the connection string containing REGION.CONNECTION_ID.

    • TABLE_FORMAT: the format used by the table

      You can't change this when updating the table.

    • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the external table, in the format ['gs://bucket_name/[folder_name/]file_name'].

      You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the path. For example, ['gs://mybucket/file_name*']. For more information, see Wildcard support for Cloud Storage URIs.

      You can specify multiple buckets for the uris option by providing multiple paths.

      The following examples show valid uris values:

      • ['gs://bucket/path1/myfile.csv']
      • ['gs://bucket/path1/*.csv']
      • ['gs://bucket/path1/*', 'gs://bucket/path2/file00*']

      When you specify uris values that target multiple files, all of those files must share a compatible schema.

      For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

    • STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it

      For more information about metadata caching considerations, see Metadata caching for performance.

      To disable metadata caching, specify 0. This is the default.

      To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.

    • CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually

      For more information on metadata caching considerations, see Metadata caching for performance.

      Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.

      Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.

      You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.
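For example, the following sketch upgrades a hypothetical Parquet external table to a BigLake table with manual metadata cache refresh. The project, dataset, table, connection, and bucket names are placeholders, and the format must match the existing table's format:

CREATE OR REPLACE EXTERNAL TABLE `myproject.mydataset.mytable`
  WITH CONNECTION `us.my-connection`
  OPTIONS (
    format = 'PARQUET',
    uris = ['gs://mybucket/path/*.parquet'],
    max_staleness = INTERVAL 4 HOUR,
    metadata_cache_mode = 'MANUAL'
  );

-- Because CACHE_MODE is MANUAL, refresh the cache on your own schedule:
CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('myproject.mydataset.mytable');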

bq

Use the bq mkdef and bq update commands to update a table:

  1. Generate an external table definition that describes the aspects of the table to change:

    bq mkdef --connection_id=PROJECT_ID.REGION.CONNECTION_ID \
      --source_format=TABLE_FORMAT \
      --metadata_cache_mode=CACHE_MODE \
      "BUCKET_PATH" > /tmp/DEFINITION_FILE

    Replace the following:

    • PROJECT_ID: the name of the project that contains the connection
    • REGION: the region that contains the connection
    • CONNECTION_ID: the name of the connection to use
    • TABLE_FORMAT: the format used by the table. You can't change this when updating the table.
    • CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.

      Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.

      Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.

      You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.

    • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the external table, in the format gs://bucket_name/[folder_name/]file_name.

      You can limit the files selected from the bucket by specifying one asterisk (*) wildcard character in the path. For example, gs://mybucket/file_name*. For more information, see Wildcard support for Cloud Storage URIs.

      You can specify multiple buckets for the uris option by providing multiple paths.

      The following examples show valid uris values:

      • gs://bucket/path1/myfile.csv
      • gs://bucket/path1/*.csv
      • gs://bucket/path1/*, gs://bucket/path2/file00*

      When you specify uris values that target multiple files, all of those files must share a compatible schema.

      For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

    • DEFINITION_FILE: the name of the table definition file that you are creating.

  2. Update the table using the new external table definition:

    bq update --max_staleness=STALENESS_INTERVAL \
      --external_table_definition=/tmp/DEFINITION_FILE \
      PROJECT_ID:DATASET.EXTERNAL_TABLE_NAME

    Replace the following:

    • STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. For more information about metadata caching considerations, see Metadata caching for performance.

      To disable metadata caching, specify 0. This is the default.

      To enable metadata caching, specify an interval value between 30 minutes and 7 days, using the Y-M D H:M:S format described in the INTERVAL data type documentation. For example, specify 0-0 0 4:0:0 for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.

    • DEFINITION_FILE: the name of the table definition file that you created or updated.

    • PROJECT_ID: the name of the project that contains the table

    • DATASET: the name of the dataset that contains the table

    • EXTERNAL_TABLE_NAME: the name of the table

Cloud Storage resource path

When you create an external table based on a Cloud Storage data source, you must provide the path to the data.

The Cloud Storage resource path contains your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the resource path would be gs://mybucket/myfile.csv.

BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following resource path, though valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.

To retrieve the Cloud Storage resource path:

  1. Open the Cloud Storage console.

    Cloud Storage console

  2. Browse to the location of the object (file) that contains the source data.

  3. Click on the name of the object.

    The Object details page opens.

  4. Copy the value provided in the gsutil URI field, which begins with gs://.

Note: You can also use the gcloud storage ls command to list buckets or objects.

Wildcard support for Cloud Storage URIs

If your data is separated into multiple files, you can use an asterisk (*) wildcard to select multiple files. Use of the asterisk wildcard must follow these rules:

  • The asterisk can appear inside the object name or at the end of the object name.
  • Using multiple asterisks is unsupported. For example, the path gs://mybucket/fed-*/temp/*.csv is invalid.
  • Using an asterisk with the bucket name is unsupported.

Examples:

  • The following example shows how to select all of the files in all the folders that start with the prefix gs://mybucket/fed-samples/fed-sample:

    gs://mybucket/fed-samples/fed-sample*
  • The following example shows how to select only files with a .csv extension in the folder named fed-samples and any subfolders of fed-samples:

    gs://mybucket/fed-samples/*.csv
  • The following example shows how to select files with a naming pattern of fed-sample*.csv in the folder named fed-samples. This example doesn't select files in subfolders of fed-samples.

    gs://mybucket/fed-samples/fed-sample*.csv

When using the bq command-line tool, you might need to escape the asterisk on some platforms.

You can't use an asterisk wildcard when you create external tables linked to Datastore or Firestore exports.
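For instance, a valid single-asterisk URI can be used directly when you define a table. The following sketch (the bucket, dataset, and table names are hypothetical) creates an external table over the fed-sample CSV files shown above, using schema auto-detection:

CREATE OR REPLACE EXTERNAL TABLE mydataset.fed_sample
  OPTIONS (
    format = 'CSV',
    uris = ['gs://mybucket/fed-samples/fed-sample*.csv']
  );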

Pricing

Cloud Storage retrieval and data transfer fees apply to BigQuery requests that read data stored in Cloud Storage. For current rates, see Cloud Storage pricing.

Limitations

For information about limitations that apply to external tables, see External table limitations.
