Load data with cross-cloud operations

As a BigQuery administrator or analyst, you can load data from an Amazon Simple Storage Service (Amazon S3)bucket or Azure Blob Storage intoBigQuery tables. Youcan either join the transferred data with the data present inGoogle Cloud regions or take advantage of BigQueryfeatures likeBigQuery ML. You canalso create materialized view replicas of certain external sources to make thatdata available in BigQuery.

You can transfer data into BigQuery in the following ways:

Note: If you want to transfer large files from Amazon Simple Storage Service (Amazon S3) bucket orAzure Blob Storage into BigQuery tables on a scheduled basis,useBigQuery Data Transfer Service. If you want to readand process data before transferring data into BigQuerytables, use theCREATE TABLE AS SELECT statement.

Quotas and limits

For information about quotas and limits, seequery jobs quotas and limits.

Before you begin

To provide Google Cloud with read access required to load or filterdata in other clouds, ask your administrator to create aconnection and share it with you. Forinformation about how to create connections, seeConnect to Amazon S3 orBlob Storage.

Required role

To get the permissions that you need to load data using cross-cloud transfers, ask your administrator to grant you theBigQuery Data Editor (roles/bigquery.dataEditor) IAM role on the dataset. For more information about granting roles, seeManage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data using cross-cloud transfers. To see the exact permissions that are required, expand theRequired permissions section:

Required permissions

The following permissions are required to load data using cross-cloud transfers:

  • bigquery.tables.create
  • bigquery.tables.get
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create
  • bigquery.connections.use

You might also be able to get these permissions withcustom roles or otherpredefined roles.

For more information about IAM roles in BigQuery,seeBigQuery IAM roles and permissions.

Pricing

You are billed for the bytes that are transferred across clouds by using theLOAD statement. For pricing information, see the Omni Cross Cloud Data Transfer section inBigQuery Omni pricing.

You are billed for the bytes that are transferred across clouds by using theCREATE TABLE AS SELECT statement orINSERT INTO SELECT statement and for thecompute capacity.

BothLOAD andCREATE TABLE AS SELECT statements require slots in theBigQuery Omni regions to scan Amazon S3 andBlob Storage files to load them. For more information, seeBigQuery Omni pricing.

For materialized view replicas of external data sources, costs can also includematerialized views pricing.

Best practices for load and filter options

  • Avoid loading multiple files that are less than 5 MB. Instead, create an external table for your file and export query result toAmazon S3orBlob Storage to create a larger file. This method helps to improve the transfer time of your data.
  • For information about the limit for maximum query result, seeBigQuery Omni maximum query result size.
  • If your source data is in a gzip-compressed file, then while creating external tables, set theexternal_table_options.compression option toGZIP.

Load data

You can load data into BigQuery with theLOAD DATA [INTO|OVERWRITE] statement.

Limitations

  • The connection and the destination dataset must belong to the same project.Loading data across projects is not supported.
  • LOAD DATA is only supported when you transfer data from an Amazon Simple Storage Service (Amazon S3)or Azure Blob Storage to a colocated BigQuery region. For moreinformation, seeLocations.
    • You can transfer data from anyUS region to aUS multi-region. You canalso transfer from anyEU region to aEU multi-region.

Example

Example 1

The following example loads a parquet file namedsample.parquet from an Amazon S3bucket into thetest_parquet table with an auto-detect schema:

LOADDATAINTOmydataset.testparquetFROMFILES(uris=['s3://test-bucket/sample.parquet'],format='PARQUET')WITHCONNECTION`aws-us-east-1.test-connection`

Example 2

The following example loads a CSV file with the prefixsampled* from yourBlob Storage into thetest_csv table with predefined column partitioning by time:

LOADDATAINTOmydataset.test_csv(NumberINT64,NameSTRING,TimeDATE)PARTITIONBYTimeFROMFILES(format='CSV',uris=['azure://test.blob.core.windows.net/container/sampled*'],skip_leading_rows=1)WITHCONNECTION`azure-eastus2.test-connection`

Example 3

The following example overwrites the existing tabletest_parquet withdata from a file namedsample.parquet with an auto-detect schema:

LOADDATAOVERWRITEmydataset.testparquetFROMFILES(uris=['s3://test-bucket/sample.parquet'],format='PARQUET')WITHCONNECTION`aws-us-east-1.test-connection`

Filter data

You can filter data before transferring them into BigQuery byusing theCREATE TABLE AS SELECT statementand theINSERT INTO SELECT statement.

Limitations

  • If the result of theSELECT query exceeds 60 GiB in logical bytes, thequery fails. The table is not created and data is nottransferred. To learn how to reduce the size of data that is scanned, seeReduce data processed in queries.

  • Temporary tables are not supported.

  • Transferring theWell-known binary (WKB)geospatial data format is not supported.

  • INSERT INTO SELECT statement does not support transferring data into clustered table.

  • In theINSERT INTO SELECT statement, if the destination table is thesame as the source table in theSELECT query, then theINSERT INTO SELECTstatement doesn't modify any rows in the destination table. The destination tableisn't modified as BigQuery can't read data across regions.

  • CREATE TABLE AS SELECT andINSERT INTO SELECT are only supported when youtransfer data from an Amazon S3or Blob Storage to a colocated BigQuery region. For moreinformation, seeLocations.

    • You can transfer data from anyUS region to aUS multi-region. You canalso transfer from anyEU region to aEU multi-region.

Example

Example 1

Suppose you have a BigLake table namedmyawsdataset.orders thatreferences data fromAmazon S3.You want to transfer data from that table to aBigQuery tablemyotherdataset.shipments in the US multi-region.

First, display information about themyawsdataset.orders table:

bqshowmyawsdataset.orders;

The output is similar to the following:

  Last modified             Schema              Type     Total URIs   Expiration----------------- -------------------------- ---------- ------------ -----------  31 Oct 17:40:28   |- l_orderkey: integer     EXTERNAL   1                    |- l_partkey: integer                    |- l_suppkey: integer                    |- l_linenumber: integer                    |- l_returnflag: string                    |- l_linestatus: string                    |- l_commitdate: date

Next, display information about themyotherdataset.shipments table:

bqshowmyotherdataset.shipments

The output is similar to the following. Some columns are omitted to simplify theoutput.

  Last modified             Schema             Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical ----------------- --------------------------- ------------ ------------- ------------ ------------------- ------------------ ---------------  31 Oct 17:34:31   |- l_orderkey: integer      3086653      210767042                                                         210767042                    |- l_partkey: integer                    |- l_suppkey: integer                    |- l_commitdate: date                    |- l_shipdate: date                    |- l_receiptdate: date                    |- l_shipinstruct: string                    |- l_shipmode: string

Now, using theCREATE TABLE AS SELECT statement you can selectively load datato themyotherdataset.orders table in the US multi-region:

CREATEORREPLACETABLEmyotherdataset.ordersPARTITIONBYDATE_TRUNC(l_commitdate,YEAR)ASSELECT*FROMmyawsdataset.ordersWHEREEXTRACT(YEARFROMl_commitdate)=1992;
Note: If you get aResourceExhausted error, retry after sometime. If the issue persists, you cancontact support.

You can then perform a join operation with the newly created table:

SELECTorders.l_orderkey,orders.l_orderkey,orders.l_suppkey,orders.l_commitdate,orders.l_returnflag,shipments.l_shipmode,shipments.l_shipinstructFROMmyotherdataset.shipmentsJOIN`myotherdataset.orders`asordersONorders.l_orderkey=shipments.l_orderkeyANDorders.l_partkey=shipments.l_partkeyANDorders.l_suppkey=shipments.l_suppkeyWHEREorders.l_returnflag='R';-- 'R' means refunded.

When new data is available, append the data of the 1993 year to the destinationtable using theINSERT INTO SELECT statement:

INSERTINTOmyotherdataset.ordersSELECT*FROMmyawsdataset.ordersWHEREEXTRACT(YEARFROMl_commitdate)=1993;

Example 2

The following example inserts data into an ingestion-time partitioned table:

CREATETABLEmydataset.orders(idString,numeric_idINT64)PARTITIONBY_PARTITIONDATE;

After creating a partitioned table, you can insert data into the ingestion-timepartitioned table:

INSERTINTOmydataset.orders(_PARTITIONTIME,id,numeric_id)SELECTTIMESTAMP("2023-01-01"),id,numeric_id,FROMmydataset.ordersof23WHEREnumeric_id>4000000;

Materialized view replicas

A materialized view replica is a replication of externalAmazon Simple Storage Service (Amazon S3), Apache Iceberg, or Salesforce Data Cloud data in aBigQuery dataset so that the data is available locally inBigQuery. This can help you avoid data egress costs and improvequery performance. BigQuery lets youcreate materialized views on BigLake metadata cache-enabled tablesover Amazon Simple Storage Service (Amazon S3), Apache Iceberg, or Salesforce Data Cloud data.

A materialized view replica lets you use the Amazon S3,Iceberg, or Data Cloud materialized viewdata in queries while avoiding data egress costs and improving queryperformance. A materialized view replica does this by replicating theAmazon S3, Iceberg, or Data Clouddata to a dataset in asupported BigQuery region,so that the data is available locally in BigQuery.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.create permission.Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.create permission.Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  5. Verify that billing is enabled for your Google Cloud project.

  6. Ensure that you have therequired Identity and Access Management (IAM) permissionsto perform the tasks in this section.

Required roles

To get the permissions that you need to perform the tasks in this section, ask your administrator to grant you theBigQuery Admin (roles/bigquery.admin) IAM role. For more information about granting roles, seeManage access to projects, folders, and organizations.

This predefined role contains the permissions required to perform the tasks in this section. To see the exact permissions that are required, expand theRequired permissions section:

Required permissions

The following permissions are required to perform the tasks in this section:

  • bigquery.tables.create
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.replicateData
  • bigquery.jobs.create

You might also be able to get these permissions withcustom roles or otherpredefined roles.

For more information aboutBigQuery IAM, seeBigQuery IAM roles and permissions.

Prepare a dataset for materialized view replicas

Before creating a materialized view replica, you must complete the followingtasks:

  1. Create a dataset in aregion that supports Amazon S3
  2. Create a source table in the dataset you created in the preceding step. Thesource table can be any of the following table types:

Create materialized view replicas

Select one of the following options:

Console

  1. In the Google Cloud console, go to theBigQuery page.

    Go to BigQuery

  2. In the left pane, clickExplorer:

    Highlighted button for the Explorer pane.

    If you don't see the left pane, clickExpand left pane to open the pane.

  3. In theExplorer pane, navigate to the project and dataset where youwant to create the materialized view replica, and then clickActions> Create table.

  4. In theSource section of theCreate table dialog, do thefollowing:

    1. ForCreate table from, selectExisting table/view.
    2. ForProject, enter the project where the source table or view islocated.
    3. ForDataset, enter the dataset where the source table or view islocated.
    4. ForView, enter the source table or view that you arereplicating. If you choose a view, it must be anauthorized view, or if not, alltables that are used to generate that view must be located in the view'sdataset.
  5. Optional: ForLocal materialized view max staleness, enter amax_staleness valuefor your local materialized view.

  6. In theDestination section of theCreate table dialog, do thefollowing:

    1. ForProject, enter the project in which you want to create thematerialized view replica.
    2. ForDataset, enter the dataset in which you want to create thematerialized view replica.
    3. ForReplica materialized view name, enter a name for yourreplica.
  7. Optional: Specifytags andadvanced options for your materializedview replica. If you don't specify a dataset forLocal Materialized View Dataset, then one is automatically created inthe same project and region as the source data and namedbq_auto_generated_local_mv_dataset. If you don't specify a name forLocal Materialized View Name, then one is automaticallycreated in the same project and region as the source data and given theprefixbq_auto_generated_local_mv_.

  8. ClickCreate table.

A new local materialized view is created (if it wasn't specified) andauthorized in the source dataset. Then the materialized view replica iscreated in the destination dataset.

SQL

  1. Create a materialized viewover the base table in the dataset that you created. You can also create thematerialized view in a different dataset that is in an Amazon S3region.
  2. Authorize the materialized view on thedatasets that contain the source tables used in the query that created thematerialized view.
  3. If you configured manual metadata cache refreshing for the source table,run theBQ.REFRESH_EXTERNAL_METADATA_CACHE system procedureto refresh the metadata cache.
  4. Run theBQ.REFRESH_MATERIALIZED_VIEW system procedureto refresh the materialized view.
  5. Create materialized view replicas by using theCREATE MATERIALIZED VIEW AS REPLICA OF statement:

    CREATEMATERIALIZEDVIEWPROJECT_ID.BQ_DATASET.REPLICA_NAMEOPTIONS(replication_interval_seconds=REPLICATION_INTERVAL)ASREPLICAOFPROJECT_ID.S3_DATASET.MATERIALIZED_VIEW_NAME;

    Replace the following:

    • PROJECT_ID: the name of your project in which you want to create the materialized view replica—for example,myproject.
    • BQ_DATASET: the name of the BigQuery dataset that you want to create the materialized view replica in—for example,bq_dataset. The dataset must be in the BigQueryregion that maps to the region of the source materialized view.
    • REPLICA_NAME: the name of the materialized view replica that you want to create—for example,my_mv_replica.
    • REPLICATION_INTERVAL: specifies how often to replicate the data from the source materialized view to the replica, in seconds. Must be a value between 60 and 3,600, inclusive. Defaults to 300 (5 minutes).
    • S3_DATASET: the name of the dataset that contains the source materialized view—for example,s3_dataset.
    • MATERIALIZED_VIEW_NAME: the name of the materialized view to replicate—for example,my_mv.

    The following example creates a materialized view replica namedmv_replicainbq_dataset:

    CREATEMATERIALIZEDVIEW`myproject.bq_dataset.mv_replica`OPTIONS(replication_interval_seconds=600)ASREPLICAOF`myproject.s3_dataset.my_s3_mv`

After you create the materialized view replica, the replication process pollsthe source materialized view for changes and replicates data to the materializedview replica, refreshing the data at the interval you specified in thereplication_interval_seconds ormax_staleness option. If you query thereplica before the first backfill completes, you get abackfill in progresserror. You can query the data in the materialized view replica after the firstreplication completes.

Data freshness

After you create the materialized view replica, the replication processpolls the source materialized view for changes and replicates data to thematerialized view replica. The data is replicated at the interval you specifiedin thereplication_interval_seconds option of theCREATE MATERIALIZED VIEW AS REPLICA OF statement.

In addition to the replication interval, the freshness of the materialized viewreplica data is also affected by how often the source materialized viewrefreshes, and how often the metadata cache of the Amazon S3,Iceberg, or Data Cloud table used by thematerialized view refreshes.

You can check the data freshness for the materialized view replica and theresources it is based on by using the Google Cloud console:

  • For materialized view replica freshness, look at theLast modified fieldin the materialized view replica'sDetails pane.
  • For source materialized view freshness, look at theLast modified fieldin the materialized view'sDetails pane.
  • For source Amazon S3, Iceberg, orData Cloud table metadata cache freshness, look at theMax staleness field in the materialized view'sDetails pane.

Supported materialized view replica regions

Use the location mappings in the following table when creating materializedview replicas:

Location of the source materialized viewLocation of the materialized view replica
aws-us-east-1TheUSmulti-region, or any of the followingregions:
  • northamerica-northeast1
  • northamerica-northeast2
  • us-central1
  • us-east1
  • us-east4
  • us-east5
  • us-south1
  • us-west1
  • us-west2
  • us-west3
  • us-west4
aws-us-west-2TheUSmulti-region, or any of the followingregions:
  • northamerica-northeast1
  • northamerica-northeast2
  • us-central1
  • us-east1
  • us-east4
  • us-east5
  • us-south1
  • us-west1
  • us-west2
  • us-west3
  • us-west4
aws-eu-west-1TheEUmulti-region, or any of the followingregions:
  • europe-central2
  • europe-north1
  • europe-southwest1
  • europe-west1
  • europe-west2
  • europe-west3
  • europe-west4
  • europe-west6
  • europe-west8
  • europe-west9
  • europe-west10
aws-ap-northeast-2Any of the followingregions:
  • asia-east1
  • asia-east2
  • asia-northeast1
  • asia-northeast2
  • asia-northeast3
  • asia-south1
  • asia-south2
  • asia-southeast1
aws-ap-southeast-2Any of the followingregions:
  • australia-southeast1
  • australia-southeast2

Limitations of materialized view replicas

Materialized view replica pricing

Use of materialized view replicas incurs compute, outbound data transfer, andstorage costs.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.