Create object tables

This document describes how to make unstructured data in Cloud Storage accessible in BigQuery by creating an object table.

To create an object table, you must complete the following tasks:

  1. Create a dataset to contain the object table.
  2. Create a connection to read object information from Cloud Storage.
  3. Grant the Storage Object Viewer (roles/storage.objectViewer) role to the service account associated with the connection.
  4. Create the object table and associate it with the connection by using the CREATE EXTERNAL TABLE statement.
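
If you prefer to work from the command line, the following minimal sketch runs through the same four tasks with the bq and gcloud tools; the project, dataset, connection, bucket, and service account names are placeholders, and each step is described in detail in the sections that follow.

# 1. Create a dataset to contain the object table.
bq --location=us mk --dataset my_dataset

# 2. Create a Cloud resource connection and note its service account ID.
bq mk --connection --location=us --connection_type=CLOUD_RESOURCE my-connection
bq show --connection my-project.us.my-connection

# 3. Grant the connection's service account the Storage Object Viewer role.
gcloud projects add-iam-policy-binding my-project \
    --member='serviceAccount:SERVICE_ACCOUNT_ID' \
    --role='roles/storage.objectViewer'

# 4. Create the object table over the bucket by using CREATE EXTERNAL TABLE.
bq query --nouse_legacy_sql "
  CREATE EXTERNAL TABLE my_dataset.object_table
  WITH CONNECTION \`us.my-connection\`
  OPTIONS (object_metadata = 'SIMPLE', uris = ['gs://my-bucket/*'])"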

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.


  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery and BigQuery Connection APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

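    Alternatively, you can enable both APIs with the gcloud CLI; a minimal sketch, assuming the gcloud CLI is installed and authenticated against your project:

    # Enable the BigQuery and BigQuery Connection APIs for the current project.
    gcloud services enable \
        bigquery.googleapis.com \
        bigqueryconnection.googleapis.com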


Required roles

To create object tables, you must have the following roles on the project:

  • To create datasets and tables, you must have the BigQuery Data Editor (roles/bigquery.dataEditor) role.
  • To create a connection, you must have the BigQuery Connection Admin (roles/bigquery.connectionAdmin) role.
  • To grant a role to the connection's service account, you must have the Project IAM Admin (roles/resourcemanager.projectIamAdmin) role.

To query object tables, you must have the following roles on the project:

  • BigQuery Data Viewer (roles/bigquery.dataViewer) role
  • BigQuery Connection User (roles/bigquery.connectionUser) role

To see the exact permissions that are required, expand the Required permissions section:

Required permissions

  • bigquery.datasets.create
  • bigquery.tables.create
  • bigquery.tables.update
  • bigquery.connections.create
  • bigquery.connections.get
  • bigquery.connections.list
  • bigquery.connections.update
  • bigquery.connections.use
  • bigquery.connections.delete
  • bigquery.connections.delegate
  • storage.buckets.*
  • storage.objects.*
  • bigquery.jobs.create
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.readsessions.create

You might also be able to get these permissions with custom roles or other predefined roles.

Caution: Users that are meant to only query the data should not have the following:
  • The ability to read objects directly from Cloud Storage, granted by the Storage Object Viewer role.
  • The ability to bind tables to connections, granted by the BigQuery Connection Admin role.

Otherwise, users can create new object tables that don't have any access controls, thus circumventing controls placed by data warehouse administrators.
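
If you manage access from the command line, grants like the following attach the creator roles listed earlier to a user; a minimal sketch, where my-project and alice@example.com are hypothetical placeholders:

# Grant the three creator roles to a hypothetical table creator.
for role in roles/bigquery.dataEditor roles/bigquery.connectionAdmin roles/resourcemanager.projectIamAdmin; do
  gcloud projects add-iam-policy-binding my-project \
      --member='user:alice@example.com' \
      --role="$role"
done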

Create a dataset

Create a BigQuery dataset to contain the object table:

  1. In the Google Cloud console, go to the BigQuery page.


  2. In the left pane, click Explorer.


    If you don't see the left pane, click Expand left pane to open the pane.

  3. In the Explorer pane, click your project name.

  4. Click View actions > Create dataset.

  5. On the Create dataset page, do the following:

    1. For Dataset ID, type a name for the dataset.

    2. For Location type, select Region or Multi-region.

      • If you selected Region, then select a location from the Region list.
      • If you selected Multi-region, then select US or Europe from the Multi-region list.
    3. Click Create dataset.
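
The same dataset can also be created with the bq tool; a minimal sketch, assuming a dataset named my_dataset in the US multi-region:

# Create a dataset in the US multi-region to contain the object table.
bq --location=US mk --dataset my_dataset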

Create a connection

You can skip this step if you either have a default connection configured, or you have the BigQuery Admin role.

Create a Cloud resource connection for the object table to use, and get the connection's service account.

  1. Go to the BigQuery page.


  2. In the left pane, click Explorer.


  3. In the Explorer pane, click Add data.

    The Add data dialog opens.

  4. In the Filter By pane, in the Data Source Type section, select Business Applications.

    Alternatively, in the Search for data sources field, you can enter Vertex AI.

  5. In the Featured data sources section, click Vertex AI.

  6. Click the Vertex AI Models: BigQuery Federation solution card.

  7. In the Connection type list, select Vertex AI remote models, remote functions, BigLake and Spanner (Cloud Resource).

  8. In the Connection ID field, type a name for the connection.

  9. For Location type, select Region or Multi-region.

    • If you selected Region, then select a location from the Region list.
    • If you selected Multi-region, then select US or Europe from the Multi-region list.
  10. Click Create connection.

  11. Click Go to connection.

  12. In the Connection info pane, copy the service account ID for use in a following step.
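
The connection can likewise be created and inspected from the command line; a minimal sketch, assuming a connection named my-connection in the us region of project my-project:

# Create the Cloud resource connection.
bq mk --connection --location=us --connection_type=CLOUD_RESOURCE my-connection

# Print the connection details; the output includes the service account ID
# that you grant access to in the next section.
bq show --connection my-project.us.my-connection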

Give the service account access

Grant the connection's service account the Storage Object Viewer role:

Console

  1. Go to the IAM & Admin page.


  2. Click Add.

    The Add principals dialog opens.

  3. In the New principals field, enter the service account ID that you copied earlier.

  4. In the Select a role field, choose Cloud Storage, and then select Storage Object Viewer.

  5. Click Save.

gcloud

Use the gcloud projects add-iam-policy-binding command.

gcloud projects add-iam-policy-binding 'PROJECT_NUMBER' \
    --member='serviceAccount:MEMBER' \
    --role='roles/storage.objectViewer' \
    --condition=None

Replace the following:

  • PROJECT_NUMBER: the project number of the project in which to grant the role.
  • MEMBER: the service account ID that you copied earlier.
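
For example, with a hypothetical project number 123456789012 and a copied service account ID, the finished command looks like the following; the service account shown is illustrative, so substitute the one from your connection:

gcloud projects add-iam-policy-binding '123456789012' \
    --member='serviceAccount:bqcx-123456789012-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com' \
    --role='roles/storage.objectViewer' \
    --condition=None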

Create an object table

To create an object table:

SQL

Use the CREATE EXTERNAL TABLE statement.

  1. In the Google Cloud console, go to the BigQuery page.


  2. In the query editor, enter the following statement:

    CREATE EXTERNAL TABLE `PROJECT_ID.DATASET_ID.TABLE_NAME`
    WITH CONNECTION {`PROJECT_ID.REGION.CONNECTION_ID` | DEFAULT}
    OPTIONS(
      object_metadata = 'SIMPLE',
      uris = ['BUCKET_PATH'[,...]],
      max_staleness = STALENESS_INTERVAL,
      metadata_cache_mode = 'CACHE_MODE');

    Replace the following:

    • PROJECT_ID: your project ID.
    • DATASET_ID: the ID of the dataset to contain the object table.
    • TABLE_NAME: the name of the object table.
    • REGION: the region or multi-region that contains the connection.
    • CONNECTION_ID: the ID of the Cloud resource connection to use with this object table. The connection determines which service account is used to read data from Cloud Storage.

      When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.

      To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.REGION.CONNECTION_ID.

    • BUCKET_PATH: the path to the Cloud Storage bucket that contains the objects represented by the object table, in the format ['gs://bucket_name/[folder_name/]*'].

      You can use one asterisk (*) wildcard character in each path to limit the objects included in the object table. For example, if the bucket contains several types of unstructured data, you could create the object table over only PDF objects by specifying ['gs://bucket_name/*.pdf']. For more information, see Wildcard support for Cloud Storage URIs.

      You can specify multiple buckets for the uris option by providing multiple paths, for example ['gs://mybucket1/*', 'gs://mybucket2/folder5/*'].

      For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

    • STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the object table, and how fresh the cached metadata must be in order for the operation to use it. For more information on metadata caching considerations, see Metadata caching for performance.

      To disable metadata caching, specify 0. This is the default.

      To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4-hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.

    • CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.

      Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.

      Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache, as shown in the sketch after the examples below.

      You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Examples

The following example creates an object table with a metadata cache staleness interval of 1 day:

CREATE EXTERNAL TABLE `my_dataset.object_table`
WITH CONNECTION `us.my-connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://mybucket/*'],
  max_staleness = INTERVAL 1 DAY,
  metadata_cache_mode = 'AUTOMATIC');

The following example creates an object table over the objects in three Cloud Storage buckets:

CREATE EXTERNAL TABLE `my_dataset.object_table`
WITH CONNECTION `us.my-connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://bucket1/*', 'gs://bucket2/folder1/*', 'gs://bucket3/*']);

The following example creates an object table over just the PDF objects in a Cloud Storage bucket:

CREATE EXTERNAL TABLE `my_dataset.object_table`
WITH CONNECTION `us.my-connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://bucket1/*.pdf']);
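
If you created the table with metadata_cache_mode = 'MANUAL', you are responsible for refreshing the cache yourself; a minimal sketch of the system procedure call, assuming the table from the first example:

-- Manually refresh the object table's metadata cache.
CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('my_dataset.object_table');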

bq

Use the bq mk command.

bq mk --table \
    --external_table_definition=BUCKET_PATH@REGION.CONNECTION_ID \
    --object_metadata=SIMPLE \
    --max_staleness=STALENESS_INTERVAL \
    --metadata_cache_mode=CACHE_MODE \
    PROJECT_ID:DATASET_ID.TABLE_NAME

Replace the following:

  • PROJECT_ID: your project ID.
  • DATASET_ID: the ID of the dataset to contain the object table.
  • TABLE_NAME: the name of the object table.
  • REGION: the region or multi-region that contains the connection.
  • CONNECTION_ID: the ID of the Cloud resource connection to use with this external table. The connection determines which service account is used to read data from Cloud Storage.

    When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.

  • BUCKET_PATH: the path to the Cloud Storage bucket that contains the objects represented by the object table, in the format gs://bucket_name/[folder_name/]*.

    You can use one asterisk (*) wildcard character in each path to limit the objects included in the object table. For example, if the bucket contains several types of unstructured data, you could create the object table over only PDF objects by specifying gs://bucket_name/*.pdf. For more information, see Wildcard support for Cloud Storage URIs.

    You can specify multiple buckets for the uris option by providing multiple paths, for example gs://mybucket1/*,gs://mybucket2/folder5/*.

    For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.

  • STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the object table, and how fresh the cached metadata must be in order for the operation to use it. For more information on metadata caching considerations, see Metadata caching for performance.

    To disable metadata caching, specify 0. This is the default.

    To enable metadata caching, specify an interval value between 30 minutes and 7 days, using the Y-M D H:M:S format described in the INTERVAL data type documentation. For example, specify 0-0 0 4:0:0 for a 4-hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.

  • CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.

    Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.

    Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.

    You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.

Examples

The following example creates an object table with a metadata cache staleness interval of 1 day:

bq mk --table \
    --external_table_definition=gs://mybucket/*@us.my-connection \
    --object_metadata=SIMPLE \
    --max_staleness='0-0 1 0:0:0' \
    --metadata_cache_mode=AUTOMATIC \
    my_dataset.object_table

The following example creates an object table over the objects in three Cloud Storage buckets:

bq mk --table \
    --external_table_definition=gs://bucket1/*,gs://bucket2/folder1/*,gs://bucket3/*@us.my-connection \
    --object_metadata=SIMPLE \
    my_dataset.object_table

The following example creates an object table over just the PDF objects in a Cloud Storage bucket:

bq mk --table \
    --external_table_definition=gs://bucket1/*.pdf@us.my-connection \
    --object_metadata=SIMPLE \
    my_dataset.object_table

API

Call the tables.insert method. Include an ExternalDataConfiguration object with the objectMetadata field set to SIMPLE in the Table resource that you pass in.

The following example shows how to call this method by using curl:

ACCESS_TOKEN=$(gcloud auth print-access-token)
curl \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"tableReference": {"projectId": "my_project", "datasetId": "my_dataset", "tableId": "object_table_name"}, "externalDataConfiguration": {"objectMetadata": "SIMPLE", "sourceUris": ["gs://mybucket/*"]}}' \
  https://www.googleapis.com/bigquery/v2/projects/my_project/datasets/my_dataset/tables
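
After the call succeeds, you can confirm that the table exists and inspect its external configuration; a minimal sketch using the bq tool:

# Inspect the newly created object table, including its externalDataConfiguration.
bq show --format=prettyjson my_dataset.object_table_name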

Terraform

This example creates an object table with metadata caching enabled with manual refresh.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The key fields to specify for an object table are google_bigquery_table.external_data_configuration.object_metadata, google_bigquery_table.external_data_configuration.metadata_cache_mode, and google_bigquery_table.max_staleness. For more information on each resource, see the Terraform BigQuery documentation.

# This queries the provider for project information.
data "google_project" "default" {}

# This creates a connection in the US region named "my-connection-id".
# This connection is used to access the bucket.
resource "google_bigquery_connection" "default" {
  connection_id = "my-connection-id"
  location      = "US"
  cloud_resource {}
}

# This grants the previous connection IAM role access to the bucket.
resource "google_project_iam_member" "default" {
  role    = "roles/storage.objectViewer"
  project = data.google_project.default.project_id
  member  = "serviceAccount:${google_bigquery_connection.default.cloud_resource[0].service_account_id}"
}

# This defines a Google BigQuery dataset.
resource "google_bigquery_dataset" "default" {
  dataset_id = "my_dataset_id"
}

# This creates a bucket in the US region named "my-bucket" with a pseudorandom suffix.
resource "random_id" "bucket_name_suffix" {
  byte_length = 8
}

resource "google_storage_bucket" "default" {
  name                        = "my-bucket-${random_id.bucket_name_suffix.hex}"
  location                    = "US"
  force_destroy               = true
  uniform_bucket_level_access = true
}

# This defines a BigQuery object table with manual metadata caching.
resource "google_bigquery_table" "default" {
  table_id   = "my-table-id"
  dataset_id = google_bigquery_dataset.default.dataset_id
  external_data_configuration {
    connection_id = google_bigquery_connection.default.name
    autodetect    = false
    # `object_metadata` is required for object tables. For more information, see
    # https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_table#object_metadata
    object_metadata = "SIMPLE"
    # This defines the source for the prior object table.
    source_uris = [
      "gs://${google_storage_bucket.default.name}/*",
    ]
    metadata_cache_mode = "MANUAL"
  }
  # This ensures that the connection can access the bucket
  # before Terraform creates a table.
  depends_on = [google_project_iam_member.default]
}

To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.

Prepare Cloud Shell

  1. Launch Cloud Shell.
  2. Set the default Google Cloud project where you want to apply your Terraform configurations.

    You only need to run this command once per project, and you can run it in any directory.

    export GOOGLE_CLOUD_PROJECT=PROJECT_ID

    Environment variables are overridden if you set explicit values in the Terraform configuration file.

Prepare the directory

Each Terraform configuration file must have its own directory (also called a root module).

  1. In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension, for example main.tf. In this tutorial, the file is referred to as main.tf.
    mkdir DIRECTORY && cd DIRECTORY && touch main.tf
  2. If you are following a tutorial, you can copy the sample code in each section or step.

    Copy the sample code into the newly created main.tf.

    Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.

  3. Review and modify the sample parameters to apply to your environment.
  4. Save your changes.
  5. Initialize Terraform. You only need to do this once per directory.
    terraform init

    Optionally, to use the latest Google provider version, include the -upgrade option:

    terraform init -upgrade

Apply the changes

  1. Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
    terraform plan

    Make corrections to the configuration as necessary.

  2. Apply the Terraform configuration by running the following command and entering yes at the prompt:
    terraform apply

    Wait until Terraform displays the "Apply complete!" message.

  3. Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
Note: Terraform samples typically assume that the required APIs are enabled in your Google Cloud project.

Query object tables

You can query an object table like any other BigQuery table, for example:

SELECT * FROM mydataset.myobjecttable;

Querying an object table returns metadata for the underlying objects. For more information, see Object table schema.
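
Because every object table shares the same schema, you can filter and sort on the metadata columns directly; a minimal sketch that lists the largest PDF objects, assuming the standard uri, size, content_type, and updated columns:

-- List the ten largest PDF objects by using the object table's metadata columns.
SELECT uri, size, content_type, updated
FROM mydataset.myobjecttable
WHERE content_type = 'application/pdf'
ORDER BY size DESC
LIMIT 10;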

