Create Apache Iceberg external tables
Apache Iceberg external tables let you access Apache Iceberg tables with finer-grained access control in a read-only format.
Iceberg is an open source table format that supports petabyte-scale data tables. The Iceberg open specification lets you run multiple query engines on a single copy of data stored in an object store. Apache Iceberg external tables (hereafter called Iceberg external tables) support Iceberg specification version 2, including merge-on-read.
As a BigQuery administrator, you can enforce row- and column-level access control, including data masking, on tables. For information about how to set up access control at the table level, see Set up access control policies. Table access policies are also enforced when you use the BigQuery Storage API as a data source for the table in Dataproc and Serverless Spark.
You can create Iceberg external tables in the following ways:

- With BigLake metastore (recommended for Google Cloud). BigLake metastore is a unified, managed, serverless, and scalable metastore that connects lakehouse data stored in Google Cloud to multiple runtimes, including open source engines (like Apache Spark) and BigQuery.
- With AWS Glue Data Catalog (recommended for AWS). AWS Glue is a centralized metadata repository where you define the structure and location of your data stored in various AWS services, and it provides capabilities like automatic schema discovery and integration with AWS analytics tools.
- With Iceberg JSON metadata files (recommended for Azure). If you use an Iceberg JSON metadata file, then you must manually update the latest metadata file whenever there are any table updates. You can use a BigQuery stored procedure for Apache Spark to create Iceberg external tables that reference an Iceberg metadata file.
For a full list of limitations, see Limitations.
Before you begin
Enable the BigQuery Connection and BigQuery Reservation APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
If you use a stored procedure for Spark in BigQuery to create Iceberg external tables, you must follow these steps:
To store the Iceberg external table metadata and data files in Cloud Storage, create a Cloud Storage bucket. You need to connect to your Cloud Storage bucket to access metadata files. To do so, create a Cloud resource connection and grant that connection access to the bucket, as in the sketch that follows.
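For example, a minimal sketch of those two steps with the bq and gcloud CLIs, assuming illustrative names (myproject, us, myconnection, and mybucket):

```bash
# Create a Cloud resource connection that BigQuery uses to read
# from Cloud Storage.
bq mk --connection --location=us --project_id=myproject \
    --connection_type=CLOUD_RESOURCE myconnection

# Display the connection details to find its service account.
bq show --connection myproject.us.myconnection

# Grant that service account read access to the bucket that holds
# the Iceberg metadata and data files.
gcloud storage buckets add-iam-policy-binding gs://mybucket \
    --member=serviceAccount:CONNECTION_SERVICE_ACCOUNT \
    --role=roles/storage.objectViewer
```

Replace CONNECTION_SERVICE_ACCOUNT with the service account shown in the connection details.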
Required roles
To get the permissions that you need to create an Iceberg external table, ask your administrator to grant you the following IAM roles on the project:
- BigQuery Admin (roles/bigquery.admin)
- Storage Object Admin (roles/storage.objectAdmin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to create an Iceberg external table. To see the exact permissions that are required, see the following Required permissions section:
Required permissions
The following permissions are required to create an Iceberg external table:
- bigquery.tables.create
- bigquery.connections.delegate
- bigquery.jobs.create
You might also be able to get these permissions with custom roles or other predefined roles.
Create tables with BigLake metastore
We recommend creating Iceberg external tables with BigLake metastore.
Create tables with a metadata file
You can create Iceberg external tables with a JSON metadata file. However, this is not the recommended method, because you have to manually update the URI of the JSON metadata file to keep the Iceberg external table up to date. If the URI is not kept up to date, queries in BigQuery can either fail or provide different results from other query engines that directly use an Iceberg catalog.
Iceberg table metadata files are created in the Cloud Storage bucket that you specify when you create an Iceberg table using Spark.
Select one of the following options:
SQL
Use the CREATE EXTERNAL TABLE statement. The following example creates an Iceberg external table named myexternal-table:

```sql
CREATE EXTERNAL TABLE myexternal-table
WITH CONNECTION `myproject.us.myconnection`
OPTIONS (
  format = 'ICEBERG',
  uris = ["gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json"]
);
```

Replace the uris value with the latest JSON metadata file for a specific table snapshot.
You can enable the require partition filter by setting the require_partition_filter flag.
bq
In a command-line environment, use the bq mk --table command with the @connection decorator to specify the connection to use at the end of the --external_table_definition parameter. To enable the require partition filter, use --require_partition_filter.

```bash
bq mk \
    --table \
    --external_table_definition=TABLE_FORMAT=URI@projects/CONNECTION_PROJECT_ID/locations/CONNECTION_REGION/connections/CONNECTION_ID \
    PROJECT_ID:DATASET.EXTERNAL_TABLE
```
Replace the following:
- TABLE_FORMAT: the format of the table that you want to create. In this case, ICEBERG.
- URI: the latest JSON metadata file for a specific table snapshot. For example, gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json. The URI can also point to an external cloud location, such as Amazon S3 or Azure Blob Storage.
  - Example for AWS: s3://mybucket/iceberg/metadata/1234.metadata.json
  - Example for Azure: azure://mystorageaccount.blob.core.windows.net/mycontainer/iceberg/metadata/1234.metadata.json
- CONNECTION_PROJECT_ID: the project that contains the connection to create the Iceberg external table. For example, myproject.
- CONNECTION_REGION: the region that contains the connection to create the Iceberg external table. For example, us.
- CONNECTION_ID: the table connection ID. For example, myconnection. When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.
- PROJECT_ID: the ID of the project that contains the dataset. For example, myproject.
- DATASET: the name of the BigQuery dataset that you want to create a table in. For example, mydataset.
- EXTERNAL_TABLE: the name of the table that you want to create. For example, mytable.
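Assembled with the illustrative values from this list, the command might look like the following:

```bash
bq mk \
    --table \
    --external_table_definition=ICEBERG=gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json@projects/myproject/locations/us/connections/myconnection \
    myproject:mydataset.mytable
```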
Update table metadata
If you use a JSON metadata file to create an Iceberg external table, update the table definition to the latest table metadata. To update the schema or the metadata file, select one of the following options:
bq
Create a table definition file:

```bash
bq mkdef --source_format=ICEBERG \
"URI" > TABLE_DEFINITION_FILE
```

Then use the bq update command with the --autodetect_schema flag:

```bash
bq update --autodetect_schema \
--external_table_definition=TABLE_DEFINITION_FILE \
PROJECT_ID:DATASET.TABLE
```
Replace the following:
- URI: your Cloud Storage URI with the latest JSON metadata file. For example, gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json.
- TABLE_DEFINITION_FILE: the name of the file containing the table schema.
- PROJECT_ID: the project ID containing the table that you want to update.
- DATASET: the dataset containing the table that you want to update.
- TABLE: the table that you want to update.
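Assembled with the illustrative values from this list, the two steps might look like the following:

```bash
bq mkdef --source_format=ICEBERG \
"gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json" > table_definition.json

bq update --autodetect_schema \
--external_table_definition=table_definition.json \
myproject:mydataset.mytable
```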
API
Use the tables.patch method with the autodetect_schema property set to true:

```
PATCH https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET/tables/TABLE?autodetect_schema=true
```
Replace the following:
- PROJECT_ID: the project ID that contains the table that you want to update.
- DATASET: the dataset containing the table that you want to update.
- TABLE: the table that you want to update.
In the body of the request, specify the updated values for the following fields:

```json
{
  "externalDataConfiguration": {
    "sourceFormat": "ICEBERG",
    "sourceUris": [
      "URI"
    ]
  },
  "schema": null
}
```

Replace URI with the latest Iceberg metadata file. For example, gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json.
Set up access control policies
You can control access to Iceberg external tables through column-level security, row-level security, and data masking.
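For example, a minimal sketch of a row-level policy, assuming a hypothetical mydataset.mytable that has a region column:

```sql
-- Limit a (hypothetical) analyst group to rows where region = 'US'.
CREATE ROW ACCESS POLICY us_only
ON mydataset.mytable
GRANT TO ('group:us-analysts@example.com')
FILTER USING (region = 'US');
```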
Query Iceberg external tables
For more information, see Query Iceberg data.
Query historical data
You can access snapshots of Iceberg external tables that are retained in your Iceberg metadata by using the FOR SYSTEM_TIME AS OF clause.
BigQuery's own time travel and fail-safe data retention windows aren't supported for any external tables.
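For example, assuming the corresponding snapshot is still retained in the Iceberg metadata, a query against the table as of one hour ago might look like the following (the table name is illustrative):

```sql
SELECT *
FROM mydataset.mytable
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```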
Data mapping
BigQuery converts Iceberg data types to BigQuery data types as shown in the following table:

| Iceberg data type | BigQuery data type |
|---|---|
| boolean | BOOL |
| int | INT64 |
| long | INT64 |
| float | FLOAT64 |
| double | FLOAT64 |
| decimal(P, S) | NUMERIC or BIGNUMERIC depending on precision |
| date | DATE |
| time | TIME |
| timestamp | DATETIME |
| timestamptz | TIMESTAMP |
| string | STRING |
| uuid | BYTES |
| fixed(L) | BYTES |
| binary | BYTES |
| list<Type> | ARRAY<Type> |
| struct | STRUCT |
| map<KeyType, ValueType> | ARRAY<STRUCT<key KeyType, value ValueType>> |
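Because map values surface in BigQuery as arrays of key/value structs, a sketch of flattening a hypothetical map<string, int> column named attributes could look like this:

```sql
-- Flatten the repeated key/value structs into one row per map entry.
SELECT
  t.id,
  kv.key,
  kv.value
FROM mydataset.mytable AS t,
  UNNEST(t.attributes) AS kv;
```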
Limitations
Iceberg external tables have external table limitations and the following limitations:
Tables using merge-on-read have the following limitations:
- Each data file can be associated with up to 10,000 delete files.
- No more than 100,000 equality deletes can be applied to a data file.
- You can work around these limitations by either compacting delete files frequently, or by creating a view on top of the Iceberg table that avoids frequently mutated partitions; see the sketch after this list.
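A minimal sketch of the view workaround, assuming a hypothetical table partitioned by an event_date column where only the current day is frequently mutated:

```sql
-- Expose only settled partitions so that readers avoid the heavily
-- mutated current-day partition and its accumulating delete files.
CREATE VIEW mydataset.mytable_settled AS
SELECT *
FROM mydataset.mytable
WHERE event_date < CURRENT_DATE();
```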
BigQuery supports manifest pruning using all Iceberg partition transformation functions. For information about how to prune partitions, see Query partitioned tables. Queries referencing Iceberg external tables must use literals in predicates that are compared against partitioned columns.
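For example, the following query (with illustrative names) compares the partitioned column against literals, so BigQuery can prune manifests:

```sql
-- The BETWEEN bounds are literals, so partition elimination applies.
SELECT *
FROM mydataset.mytable
WHERE event_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```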
Only Apache Parquet data files are supported.
Merge-on-read costs
On-demand billing for merge-on-read data is the sum of scans of the following data:
- All logical bytes read in the data file (including rows that are marked as deleted by position and equality deletes).
- Logical bytes read while loading the equality delete and position delete files to find the deleted rows in a data file.
Require partition filter
You can require the use of predicate filters by enabling the require partition filter option for your Iceberg table. If you enable this option, attempts to query the table without specifying a WHERE clause that aligns with each manifest file will produce the following error:

```
Cannot query over table project_id.dataset.table without a filter that can be used for partition elimination.
```
Each manifest file requires at least one predicate suitable for partition elimination.
You can enable require_partition_filter in the following ways when you create an Iceberg table:
SQL
Use the CREATE EXTERNAL TABLE statement. The following example creates an Iceberg external table named TABLE with the require partition filter enabled:

```sql
CREATE EXTERNAL TABLE TABLE
WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (
  format = 'ICEBERG',
  uris = [URI],
  require_partition_filter = true
);
```
Replace the following:
- TABLE: the name of the table that you want to create.
- PROJECT_ID: the project ID containing the table that you want to create.
- REGION: the location where you want to create the Iceberg table.
- CONNECTION_ID: the connection ID. For example, myconnection.
- URI: the Cloud Storage URI with the latest JSON metadata file. For example, gs://mybucket/us/iceberg/mytable/metadata/1234.metadata.json. The URI can also point to an external cloud location, such as Amazon S3 or Azure Blob Storage.
  - Example for AWS: s3://mybucket/iceberg/metadata/1234.metadata.json
  - Example for Azure: azure://mystorageaccount.blob.core.windows.net/mycontainer/iceberg/metadata/1234.metadata.json
bq
Use the bq mk --table command with the @connection decorator to specify the connection to use at the end of the --external_table_definition parameter. Use --require_partition_filter to enable the require partition filter. The following example creates an Iceberg external table named TABLE with the require partition filter enabled:

```bash
bq mk \
    --table \
    --external_table_definition=ICEBERG=URI@projects/CONNECTION_PROJECT_ID/locations/CONNECTION_REGION/connections/CONNECTION_ID \
    PROJECT_ID:DATASET.EXTERNAL_TABLE \
    --require_partition_filter
```
Replace the following:
- URI: the latest JSON metadata file for a specific table snapshot. For example, gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json. The URI can also point to an external cloud location, such as Amazon S3 or Azure Blob Storage.
  - Example for AWS: s3://mybucket/iceberg/metadata/1234.metadata.json
  - Example for Azure: azure://mystorageaccount.blob.core.windows.net/mycontainer/iceberg/metadata/1234.metadata.json
- CONNECTION_PROJECT_ID: the project that contains the connection to create the Iceberg external table. For example, myproject.
- CONNECTION_REGION: the region that contains the connection to create the Iceberg external table. For example, us.
- CONNECTION_ID: the connection ID. For example, myconnection. When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.
- PROJECT_ID: the ID of the project that contains the dataset. For example, myproject.
- DATASET: the name of the BigQuery dataset that you want to create the table in. For example, mydataset.
- EXTERNAL_TABLE: the name of the table that you want to create. For example, mytable.
You can also update an existing Iceberg table to enable the require partition filter: if you don't enable the option when you create the partitioned table, you can add it by updating the table.
bq
Use the bq update command and supply the --require_partition_filter flag.
For example:
To update mypartitionedtable in mydataset in your default project, enter:

```bash
bq update --require_partition_filter mydataset.mypartitionedtable
```
What's next
- Learn about stored procedures for Spark.
- Learn about access control policies.
- Learn about running queries in BigQuery.
- Learn about the supported statements and SQL dialects in BigQuery.