Create Apache Iceberg external tables

Apache Iceberg external tables let you access Apache Iceberg tables with finer-grained access control in a read-only format.

Iceberg is an open source table format that supports petabyte scale data tables. The Iceberg open specification lets you run multiple query engines on a single copy of data stored in an object store. Apache Iceberg external tables (hereafter called Iceberg external tables) support Iceberg specification version 2, including merge-on-read.

As a BigQuery administrator, you can enforce row- and column-level access control including data masking on tables. For information about how to set up access control at the table level, see Set up access control policies. Table access policies are also enforced when you use the BigQuery Storage API as a data source for the table in Dataproc and Serverless Spark.

You can create Iceberg external tables in the following ways:

  • With BigLake metastore (recommended for Google Cloud). BigLake metastore is a unified, managed, serverless, and scalable metastore that connects lakehouse data stored in Google Cloud to multiple runtimes, including open source engines (like Apache Spark) and BigQuery.

  • With AWS Glue Data Catalog (recommended for AWS). AWS Glue is the recommended method for AWS because it's a centralized metadata repository where you define the structure and location of your data stored in various AWS services, and it provides capabilities like automatic schema discovery and integration with AWS analytics tools.

  • With Iceberg JSON metadata files (recommended for Azure). If you use an Iceberg JSON metadata file, then you must manually update the latest metadata file whenever there are any table updates. You can use a BigQuery stored procedure for Apache Spark to create Iceberg external tables that reference an Iceberg metadata file.

For a full list of limitations, see Limitations.

Before you begin

Enable the BigQuery Connection and BigQuery Reservation APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Required roles

To get the permissions that you need to create an Iceberg external table, ask your administrator to grant you the following IAM roles on the project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create an Iceberg external table. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create an Iceberg external table:

  • bigquery.tables.create
  • bigquery.connections.delegate
  • bigquery.jobs.create

You might also be able to get these permissions with custom roles or other predefined roles.

Create tables with BigLake metastore

We recommend creating Iceberg external tables with BigLake metastore.

Create tables with a metadata file

You can create Iceberg external tables with a JSON metadata file. However, this is not the recommended method because you have to manually update the URI of the JSON metadata file to keep the Iceberg external table up to date. If the URI is not kept up to date, queries in BigQuery can either fail or provide different results from other query engines that directly use an Iceberg catalog.

Iceberg table metadata files are created in the Cloud Storage bucket that you specify when you create an Iceberg table using Spark.

Select one of the following options:

SQL

Use the CREATE EXTERNAL TABLE statement. The following example creates an Iceberg external table named myexternal-table:

  CREATE EXTERNAL TABLE myexternal-table
  WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (
    format = 'ICEBERG',
    uris = ["gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json"]
  )

Replace the uris value with the latest JSON metadata file for a specific table snapshot.

You can enable the require partition filter by setting the require_partition_filter flag.

bq

In a command-line environment, use the bq mk --table command with the @connection decorator to specify the connection to use at the end of the --external_table_definition parameter. To enable the require partition filter, use --require_partition_filter.

bq mk \
--table \
--external_table_definition=TABLE_FORMAT=URI@projects/CONNECTION_PROJECT_ID/locations/CONNECTION_REGION/connections/CONNECTION_ID \
PROJECT_ID:DATASET.EXTERNAL_TABLE

Replace the following:

  • TABLE_FORMAT: the format of the table that you want to create

    In this case, ICEBERG.

  • URI: the latest JSON metadata file for a specific table snapshot.

    For example, gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json.

    The URI can also point to an external cloud location, such as Amazon S3 or Azure Blob Storage.

    • Example for AWS: s3://mybucket/iceberg/metadata/1234.metadata.json.
    • Example for Azure: azure://mystorageaccount.blob.core.windows.net/mycontainer/iceberg/metadata/1234.metadata.json.
  • CONNECTION_PROJECT_ID: the project that contains the connection to create the Iceberg external table, for example, myproject

  • CONNECTION_REGION: the region that contains the connection to create the Iceberg external table, for example, us

  • CONNECTION_ID: the table connection ID, for example, myconnection

    When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example, projects/myproject/locations/connection_location/connections/myconnection

  • DATASET: the name of the BigQuery dataset that you want to create a table in

    For example, mydataset.

  • EXTERNAL_TABLE: the name of the table that you want to create

    For example, mytable.
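The --external_table_definition value packs the table format, the metadata URI, and the fully qualified connection name into a single string. As a hedged sketch of how that value fits together (the project, bucket, and connection names below are hypothetical examples, not values from this guide's environment):

```python
# Sketch: assemble the --external_table_definition value used by `bq mk`.
# All names (myproject, mybucket, myconnection) are example placeholders.

def external_table_definition(table_format: str, uri: str,
                              project: str, region: str,
                              connection_id: str) -> str:
    """Build TABLE_FORMAT=URI@projects/.../locations/.../connections/..."""
    connection = (f"projects/{project}/locations/{region}"
                  f"/connections/{connection_id}")
    return f"{table_format}={uri}@{connection}"

definition = external_table_definition(
    "ICEBERG",
    "gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json",
    "myproject", "us", "myconnection",
)
print(definition)
```

The resulting string is passed verbatim as the --external_table_definition flag value in the command above.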

Update table metadata

If you use a JSON metadata file to create an Iceberg external table, update the table definition to the latest table metadata. To update the schema or the metadata file, select one of the following options:

bq

  1. Create a table definition file:

    bq mkdef --source_format=ICEBERG \
    "URI" > TABLE_DEFINITION_FILE

  2. Use the bq update command with the --autodetect_schema flag:

    bq update --autodetect_schema --external_table_definition=TABLE_DEFINITION_FILE PROJECT_ID:DATASET.TABLE

    Replace the following:

    • URI: your Cloud Storage URI with the latest JSON metadata file

      For example, gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json.

    • TABLE_DEFINITION_FILE: the name of the file containing the table schema

    • PROJECT_ID: the project ID containing the table that you want to update

    • DATASET: the dataset containing the table that you want to update

    • TABLE: the table that you want to update

API

Use the tables.patch method with the autodetect_schema property set to true:

PATCH https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET/tables/TABLE?autodetect_schema=true

Replace the following:

  • PROJECT_ID: the project ID that contains the table that youwant to update
  • DATASET: the dataset containing the table that youwant to update
  • TABLE: the table that you want to update

In the body of the request, specify the updated values for the followingfields:

{
  "externalDataConfiguration": {
    "sourceFormat": "ICEBERG",
    "sourceUris": [
      "URI"
    ]
  },
  "schema": null
}

Replace URI with the latest Iceberg metadata file. For example, gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json.
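To sketch how the request pieces above fit together programmatically, the following minimal illustration builds the same URL and body (the project, dataset, table, and URI values are example placeholders):

```python
import json

# Sketch: assemble the tables.patch URL and request body shown above.
# Setting "schema" to None (JSON null) lets BigQuery re-detect the schema
# from the new metadata file when autodetect_schema=true.
project_id, dataset, table = "myproject", "mydataset", "mytable"
uri = "gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json"

url = (f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}"
       f"/datasets/{dataset}/tables/{table}?autodetect_schema=true")

body = {
    "externalDataConfiguration": {
        "sourceFormat": "ICEBERG",
        "sourceUris": [uri],
    },
    "schema": None,
}
print(url)
print(json.dumps(body, indent=2))
```

You would send this body in a PATCH request to the URL, authenticated with your usual Google Cloud credentials.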

Set up access control policies

You can control access to Iceberg external tables through column-level security, row-level security, and data masking.

Query Iceberg external tables

For more information, see Query Iceberg data.

Query historical data

You can access snapshots of Iceberg external tables that are retained in your Iceberg metadata by using the FOR SYSTEM_TIME AS OF clause.

Time travel and fail-safe data retention windowsaren't supported for any external tables.

Data mapping

BigQuery converts Iceberg data types to BigQuery data types as shown in the following table:

Iceberg data type         BigQuery data type
boolean                   BOOL
int                       INT64
long                      INT64
float                     FLOAT64
double                    FLOAT64
decimal(P, S)             NUMERIC or BIGNUMERIC depending on precision
date                      DATE
time                      TIME
timestamp                 DATETIME
timestamptz               TIMESTAMP
string                    STRING
uuid                      BYTES
fixed(L)                  BYTES
binary                    BYTES
list<Type>                ARRAY<Type>
struct                    STRUCT
map<KeyType, ValueType>   ARRAY<STRUCT<key KeyType, value ValueType>>
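The scalar rows of the table above can be captured as a simple lookup. This is only an illustrative sketch covering the non-parameterized types (decimal, fixed, list, struct, and map need the type's parameters and are omitted):

```python
# Sketch: the scalar (non-parameterized) rows of the mapping table above.
ICEBERG_TO_BIGQUERY = {
    "boolean": "BOOL",
    "int": "INT64",
    "long": "INT64",
    "float": "FLOAT64",
    "double": "FLOAT64",
    "date": "DATE",
    "time": "TIME",
    "timestamp": "DATETIME",     # no time zone -> civil DATETIME
    "timestamptz": "TIMESTAMP",  # zoned -> absolute TIMESTAMP
    "string": "STRING",
    "uuid": "BYTES",
    "binary": "BYTES",
}

def to_bigquery_type(iceberg_type: str) -> str:
    """Map an Iceberg scalar type name to its BigQuery counterpart."""
    return ICEBERG_TO_BIGQUERY[iceberg_type.lower()]

print(to_bigquery_type("long"))         # INT64
print(to_bigquery_type("timestamptz"))  # TIMESTAMP
```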

Limitations

Iceberg external tables have external table limitations and the following limitations:

  • Tables using merge-on-read have the following limitations:

    • Each data file can be associated with up to 10,000 delete files.
    • No more than 100,000 equality deletes can be applied to a data file.
    • You can work around these limitations by either compacting delete files frequently, or creating a view on top of the Iceberg table that avoids frequently mutated partitions.
  • BigQuery supports manifest pruning using all Iceberg partition transformation functions. For information about how to prune partitions, see Query partitioned tables. Queries that reference Iceberg external tables must use literals in predicates that are compared against partitioned columns.

  • Only Apache Parquet data files are supported.

Merge-on-read costs

On-demand billing for merge-on-read data is the sum of scans of the following data:

  • All logical bytes read in the data file (including rows that are marked as deleted by position and equality deletes).
  • Logical bytes read loading the equality delete and position delete files to find the deleted rows in a data file.
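In other words, billed bytes are the full data-file scan plus the delete-file scans. A small illustrative calculation (the byte counts below are made-up placeholder figures, not measured values):

```python
# Sketch: on-demand merge-on-read billing as described above.
# All figures are illustrative placeholders, not real measurements.
data_file_logical_bytes = 10 * 1024**3   # full data file, deleted rows included
equality_delete_bytes = 512 * 1024**2    # equality delete files read
position_delete_bytes = 128 * 1024**2    # position delete files read

billed_logical_bytes = (data_file_logical_bytes
                        + equality_delete_bytes
                        + position_delete_bytes)
print(f"{billed_logical_bytes / 1024**3:.2f} GiB billed")
```

Note that rows marked as deleted still contribute their logical bytes to the data-file term; deletes add cost rather than reduce it.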

Require partition filter

You can require the use of predicate filters by enabling the require partition filter option for your Iceberg table. If you enable this option, attempts to query the table without specifying a WHERE clause that aligns with each manifest file produce the following error:

Cannot query over table project_id.dataset.table without a filter that can be used for partition elimination.

Each manifest file requires at least one predicate suitable for partition elimination.

You can enable require_partition_filter in the following ways when creating an Iceberg table:

SQL

Use the CREATE EXTERNAL TABLE statement. The following example creates an Iceberg external table named TABLE with the require partition filter enabled:

  CREATE EXTERNAL TABLE TABLE
  WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
  OPTIONS (
    format = 'ICEBERG',
    uris = [URI],
    require_partition_filter = true
  )

Replace the following:

  • TABLE: the table name that you want to create.
  • PROJECT_ID: the project ID containing the table that you want to create.
  • REGION: the location where you want to create the Iceberg table.
  • CONNECTION_ID: the connection ID. For example, myconnection.

  • URI: the Cloud Storage URI with the latest JSON metadata file.

    For example, gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json.

    The URI can also point to an external cloud location, such as Amazon S3 or Azure Blob Storage.

    • Example for AWS: s3://mybucket/iceberg/metadata/1234.metadata.json.
    • Example for Azure: azure://mystorageaccount.blob.core.windows.net/mycontainer/iceberg/metadata/1234.metadata.json.

bq

Use the bq mk --table command with the @connection decorator to specify the connection to use at the end of the --external_table_definition parameter. Use --require_partition_filter to enable the require partition filter. The following example creates an Iceberg external table named TABLE with the require partition filter enabled:

bq mk \
--table \
--external_table_definition=ICEBERG=URI@projects/CONNECTION_PROJECT_ID/locations/CONNECTION_REGION/connections/CONNECTION_ID \
PROJECT_ID:DATASET.EXTERNAL_TABLE \
--require_partition_filter

Replace the following:

  • URI: the latest JSON metadata file for a specific table snapshot

    For example, gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json.

    The URI can also point to an external cloud location, such as Amazon S3 or Azure Blob Storage.

    • Example for AWS: s3://mybucket/iceberg/metadata/1234.metadata.json.
    • Example for Azure: azure://mystorageaccount.blob.core.windows.net/mycontainer/iceberg/metadata/1234.metadata.json.
  • CONNECTION_PROJECT_ID: the project that contains the connection to create the Iceberg external table, for example, myproject

  • CONNECTION_REGION: the region that contains the connection to create the Iceberg external table. For example, us.

  • CONNECTION_ID: the connection ID. For example, myconnection.

    When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example, projects/myproject/locations/connection_location/connections/myconnection

  • DATASET: the name of the BigQuery dataset in which you want to create the table. For example, mydataset.

  • EXTERNAL_TABLE: the name of the table that you want to create

    For example, mytable.
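As the connection ID description above notes, the short ID is the last path segment of the fully qualified connection name shown in the console. A small hedged helper illustrating that (the qualified name is an example):

```python
# Sketch: extract the short connection ID from a fully qualified
# connection name, per the description above.
def connection_id(fully_qualified: str) -> str:
    """Return the last path segment, e.g. 'myconnection'."""
    return fully_qualified.rstrip("/").split("/")[-1]

print(connection_id(
    "projects/myproject/locations/connection_location/connections/myconnection"
))  # myconnection
```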

You can also update your Iceberg table to enable the require partition filter. If you didn't enable the require partition filter option when you created the table, you can update the table to add the option.

bq

Use the bq update command and supply the --require_partition_filter flag.

For example, to update mypartitionedtable in mydataset in your default project, enter:

bq update --require_partition_filter mydataset.mypartitionedtable

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.