Query Cloud Storage data in external tables

This document describes how to query data stored in a Cloud Storage external table.

Before you begin

Ensure that you have a Cloud Storage external table.

Required roles

To query Cloud Storage external tables, ensure you have the following roles:

  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery User (roles/bigquery.user)
  • Storage Object Viewer (roles/storage.objectViewer)

Depending on your permissions, you can grant these roles to yourself or ask your administrator to grant them to you. For more information about granting roles, see Viewing the grantable roles on resources.

To see the exact BigQuery permissions that are required to query external tables, expand the Required permissions section:

Required permissions

You might also be able to get these permissions with custom roles or other predefined roles.

Query permanent external tables

After creating a Cloud Storage external table, you can query it using GoogleSQL syntax, the same as if it were a standard BigQuery table. For example, SELECT field1, field2 FROM mydataset.my_cloud_storage_table;.

Query temporary external tables

Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.

To query an external data source without creating a permanent table, you provide a table definition for the temporary table, and then use that table definition in a command or call to query the temporary table. You can provide the table definition in any of the following ways:

  • Use a table definition file.
  • Use an inline schema definition.
  • Use a JSON schema file.

The table definition file or supplied schema is used to create the temporary external table, and the query runs against the temporary external table.

When you use a temporary external table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others.

You can create and query a temporary table linked to an external data source by using the bq command-line tool, the API, or the client libraries.

bq

You query a temporary table linked to an external data source using the bq query command with the --external_table_definition flag. When you use the bq command-line tool to query a temporary table linked to an external data source, you can identify the table's schema using a table definition file, an inline schema definition, or a JSON schema file.

(Optional) Supply the --location flag and set the value to your location.

To query a temporary table linked to your external data source using a table definition file, enter the following command.

bq --location=LOCATION query \
--external_table_definition=TABLE::DEFINITION_FILE \
'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • TABLE: the name of the temporary table you're creating.
  • DEFINITION_FILE: the path to the table definition file on your local machine.
  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named sales using a table definition file named sales_def.

bq query \
--external_table_definition=sales::sales_def \
'SELECT
  Region,
  Total_sales
FROM
  sales'
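A table definition file such as sales_def is a JSON file describing the source format, source URIs, and optionally the schema. The following is a minimal sketch, not generated output; the field names and bucket path are borrowed from the CSV examples elsewhere on this page:

```python
import json
import tempfile

# Minimal sketch of a table definition file for a CSV source.
# The field names and bucket path are assumptions taken from the
# examples on this page, not a file generated by bq mkdef.
sales_def = {
    "sourceFormat": "CSV",
    "sourceUris": ["gs://mybucket/sales.csv"],
    "csvOptions": {"skipLeadingRows": 1},
    "schema": {
        "fields": [
            {"name": "Region", "type": "STRING"},
            {"name": "Quarter", "type": "STRING"},
            {"name": "Total_sales", "type": "INTEGER"},
        ]
    },
}

# Write the definition to a file whose path can then be passed as
# --external_table_definition=sales::/path/to/the/file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sales_def, f, indent=2)
    print("Wrote table definition to", f.name)
```

In practice you would generate this file with bq mkdef rather than writing it by hand.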

To query a temporary table linked to your external data source using an inline schema definition, enter the following command.

bq --location=LOCATION query \
--external_table_definition=TABLE::SCHEMA@SOURCE_FORMAT=BUCKET_PATH \
'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • TABLE: the name of the temporary table you're creating.
  • SCHEMA: the inline schema definition in the format field:data_type,field:data_type.
  • SOURCE_FORMAT: the format of the external data source, for example, CSV.
  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.

    You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths.

    The following examples show valid uris values:

    • gs://bucket/path1/myfile.csv
    • gs://bucket/path1/*.parquet
    • gs://bucket/path1/file1*,gs://bucket1/path1/*

    When you specify uris values that target multiple files, all of those files must share a compatible schema.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named sales linked to a CSV file stored in Cloud Storage with the following schema definition: Region:STRING,Quarter:STRING,Total_sales:INTEGER.

bq query \
--external_table_definition=sales::Region:STRING,Quarter:STRING,Total_sales:INTEGER@CSV=gs://mybucket/sales.csv \
'SELECT
  Region,
  Total_sales
FROM
  sales'
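The inline SCHEMA value is just comma-separated field:data_type pairs. If you build that string programmatically, small helpers like the following keep it consistent; these helpers are hypothetical and not part of the bq tool:

```python
# Hypothetical helpers for the field:data_type,field:data_type format
# accepted by --external_table_definition. Not part of the bq CLI.
def make_inline_schema(fields):
    """Build an inline schema string from (name, type) pairs."""
    return ",".join(f"{name}:{dtype}" for name, dtype in fields)


def parse_inline_schema(schema):
    """Split an inline schema string back into (name, type) pairs."""
    return [tuple(pair.split(":", 1)) for pair in schema.split(",")]


fields = [("Region", "STRING"), ("Quarter", "STRING"), ("Total_sales", "INTEGER")]
schema = make_inline_schema(fields)
print(schema)  # Region:STRING,Quarter:STRING,Total_sales:INTEGER
assert parse_inline_schema(schema) == fields
```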

To query a temporary table linked to your external data source using a JSON schema file, enter the following command.

bq --location=LOCATION query \
--external_table_definition=TABLE::SCHEMA_FILE@SOURCE_FORMAT=BUCKET_PATH \
'QUERY'

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • TABLE: the name of the temporary table you're creating.
  • SCHEMA_FILE: the path to the JSON schema file on your local machine.
  • SOURCE_FORMAT: the format of the external data source, for example, CSV.
  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the data for the table, in the format gs://bucket_name/[folder_name/]file_pattern.

    You can select multiple files from the bucket by specifying one asterisk (*) wildcard character in the file_pattern. For example, gs://mybucket/file00*.parquet. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths.

    The following examples show valid uris values:

    • gs://bucket/path1/myfile.csv
    • gs://bucket/path1/*.parquet
    • gs://bucket/path1/file1*,gs://bucket1/path1/*

    When you specify uris values that target multiple files, all of those files must share a compatible schema.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • QUERY: the query you're submitting to the temporary table.

For example, the following command creates and queries a temporary table named sales linked to a CSV file stored in Cloud Storage using the /tmp/sales_schema.json schema file.

bq query \
--external_table_definition=sales::/tmp/sales_schema.json@CSV=gs://mybucket/sales.csv \
'SELECT
  Region,
  Total_sales
FROM
  sales'
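As noted above, a Cloud Storage URI may contain at most one * wildcard in its file_pattern. A quick local sanity check before submitting a query could look like the following sketch; the helper is hypothetical, not part of the bq tool:

```python
# Hypothetical pre-flight check for the one-wildcard rule on
# Cloud Storage URIs described above. Not part of the bq CLI.
def validate_gcs_uri(uri):
    """Raise ValueError if uri is not a gs:// path with at most one *."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a Cloud Storage URI: {uri}")
    if uri.count("*") > 1:
        raise ValueError(f"at most one * wildcard is allowed: {uri}")
    return uri


# Both of these are valid uris values from the examples above.
for uri in ["gs://bucket/path1/myfile.csv", "gs://bucket/path1/*.parquet"]:
    validate_gcs_uri(uri)

# Two wildcards in one URI are rejected.
try:
    validate_gcs_uri("gs://bucket/*/file*.csv")
except ValueError as e:
    print("rejected:", e)
```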

API

To run a query using the API, follow these steps:

  1. Create a Job object.
  2. Populate the configuration section of the Job object with a JobConfiguration object.
  3. Populate the query section of the JobConfiguration object with a JobConfigurationQuery object.
  4. Populate the tableDefinitions section of the JobConfigurationQuery object with an ExternalDataConfiguration object.
  5. Call the jobs.insert method to run the query asynchronously or the jobs.query method to run the query synchronously, passing in the Job object.
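The nesting in the steps above corresponds to a jobs.insert request body shaped roughly like the following sketch. The field names follow the BigQuery v2 REST resources; the table name, bucket path, and schema are the CSV example used elsewhere on this page, not values required by the API:

```python
# Sketch of a jobs.insert request body that attaches an
# ExternalDataConfiguration (under tableDefinitions) to a query.
# Field names follow the BigQuery v2 REST API; the sales table and
# its schema are this page's CSV example.
job = {
    "configuration": {  # JobConfiguration
        "query": {  # JobConfigurationQuery
            "query": "SELECT Region, Total_sales FROM sales",
            "useLegacySql": False,
            "tableDefinitions": {  # one ExternalDataConfiguration per table
                "sales": {
                    "sourceFormat": "CSV",
                    "sourceUris": ["gs://mybucket/sales.csv"],
                    "schema": {
                        "fields": [
                            {"name": "Region", "type": "STRING"},
                            {"name": "Quarter", "type": "STRING"},
                            {"name": "Total_sales", "type": "INTEGER"},
                        ]
                    },
                }
            },
        }
    }
}
print(sorted(job["configuration"]["query"]))
```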

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.ExternalTableDefinition;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableResult;

// Sample to query an external data source using a temporary table
public class QueryExternalGCSTemp {

  public static void runQueryExternalGCSTemp() {
    // TODO(developer): Replace these variables before running the sample.
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    String query = String.format("SELECT * FROM %s WHERE name LIKE 'W%%'", tableName);
    queryExternalGCSTemp(tableName, sourceUri, schema, query);
  }

  public static void queryExternalGCSTemp(
      String tableName, String sourceUri, Schema schema, String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Skip header row in the file.
      CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();

      // Configure the external data source and query job.
      ExternalTableDefinition externalTable =
          ExternalTableDefinition.newBuilder(sourceUri, csvOptions).setSchema(schema).build();
      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query)
              .addTableDefinition(tableName, externalTable)
              .build();

      // Example query to find states starting with 'W'
      TableResult results = bigquery.query(queryConfig);

      results
          .iterateAll()
          .forEach(row -> row.forEach(val -> System.out.printf("%s,", val.toString())));

      System.out.println("Query on external temporary table performed successfully.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryExternalGCSTemp() {
  // Queries an external data source using a temporary table.

  const tableId = 'us_states';

  // Define the schema for the external data source.
  const schema = [
    {name: 'name', type: 'STRING'},
    {name: 'post_abbr', type: 'STRING'},
  ];

  // Configure the external data source
  const externalDataConfig = {
    sourceFormat: 'CSV',
    sourceUris: ['gs://cloud-samples-data/bigquery/us-states/us-states.csv'],
    // Optionally skip header row.
    csvOptions: {skipLeadingRows: 1},
    schema: {fields: schema},
  };

  // Example query to find states starting with 'W'
  const query = `SELECT post_abbr
  FROM \`${tableId}\`
  WHERE name LIKE 'W%'`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  const options = {
    query,
    tableDefinitions: {[tableId]: externalDataConfig},
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);
  console.log(`Job ${job.id} started.`);

  // Wait for the query to finish
  const [rows] = await job.getQueryResults();

  // Print the results
  console.log('Rows:');
  console.log(rows);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# Configure the external data source and query job.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [
    "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
]
external_config.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
]
external_config.options.skip_leading_rows = 1
table_id = "us_states"
job_config = bigquery.QueryJobConfig(table_definitions={table_id: external_config})

# Example query to find states starting with 'W'.
sql = 'SELECT * FROM `{}` WHERE name LIKE "W%"'.format(table_id)

query_job = client.query(sql, job_config=job_config)  # Make an API request.

w_states = list(query_job)  # Wait for the job to complete.
print("There are {} states with names starting with W.".format(len(w_states)))

Query the _FILE_NAME pseudocolumn

Tables based on external data sources provide a pseudocolumn named _FILE_NAME. This column contains the fully qualified path to the file to which the row belongs. This column is available only for tables that reference external data stored in Cloud Storage, Google Drive, Amazon S3, and Azure Blob Storage.

The _FILE_NAME column name is reserved, which means that you cannot create a column by that name in any of your tables. To select the value of _FILE_NAME, you must use an alias. The following example query demonstrates selecting _FILE_NAME by assigning the alias fn to the pseudocolumn.

bq query \
--project_id=PROJECT_ID \
--use_legacy_sql=false \
'SELECT
  name,
  _FILE_NAME AS fn
FROM
  `DATASET.TABLE_NAME`
WHERE
  name LIKE "%Alex%"'

Replace the following:

  • PROJECT_ID is a valid project ID (this flag is not required if you use Cloud Shell or if you set a default project in the Google Cloud CLI)
  • DATASET is the name of the dataset that stores the permanent external table
  • TABLE_NAME is the name of the permanent external table

When the query has a filter predicate on the _FILE_NAME pseudocolumn, BigQuery attempts to skip reading files that do not satisfy the filter. Similar recommendations to querying ingestion-time partitioned tables using pseudocolumns apply when constructing query predicates with the _FILE_NAME pseudocolumn.
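Because _FILE_NAME values are fully qualified gs:// paths, downstream code often needs to split them into bucket and object components. A minimal sketch, with a hypothetical helper name:

```python
# Hypothetical helper to split a _FILE_NAME value (a fully qualified
# gs:// path) into its bucket and object components.
def split_gcs_path(file_name):
    if not file_name.startswith("gs://"):
        raise ValueError(f"unexpected _FILE_NAME value: {file_name}")
    bucket, _, obj = file_name[len("gs://"):].partition("/")
    return bucket, obj


bucket, obj = split_gcs_path("gs://mybucket/sales/2024/q1.csv")
print(bucket)  # mybucket
print(obj)     # sales/2024/q1.csv
```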

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.