Load data with cross-cloud operations

As a BigQuery administrator or analyst, you can load data from an Amazon Simple Storage Service (Amazon S3) bucket or Azure Blob Storage into BigQuery tables. You can either join the transferred data with the data present in Google Cloud regions or take advantage of BigQuery features like BigQuery ML.
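For example, after the data lands in a BigQuery table, you can train a model on it with BigQuery ML. The following is a minimal sketch, assuming a hypothetical transferred table mydataset.transferred_orders with quantity, ship_mode, and returnflag columns; the table, column, and model names are illustrative only:

-- Sketch only: the table, columns, and model name are hypothetical.
CREATE OR REPLACE MODEL mydataset.return_predictor
  OPTIONS (
    model_type = 'LOGISTIC_REG',          -- binary classifier
    input_label_cols = ['was_returned']   -- label column defined in the SELECT below
  ) AS
SELECT
  quantity,
  ship_mode,
  (returnflag = 'R') AS was_returned      -- TRUE when the order was refunded
FROM mydataset.transferred_orders;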

You can transfer data into BigQuery in the following ways:

  • Load data from files in Amazon S3 or Azure Blob Storage into BigQuery tables by using the LOAD DATA statement.
  • Filter data from files in Amazon S3 or Blob Storage before transferring the results into BigQuery tables by using the CREATE TABLE AS SELECT statement or the INSERT INTO SELECT statement.

Note: If you want to transfer large files from an Amazon Simple Storage Service (Amazon S3) bucket or Azure Blob Storage into BigQuery tables on a scheduled basis, use BigQuery Data Transfer Service. If you want to read and process data before transferring it into BigQuery tables, use the CREATE TABLE AS SELECT statement.

Quotas and limits

For information about quotas and limits, see query jobs quotas and limits.

Pricing

You are billed for the bytes that are transferred across clouds by using the LOAD statement. For pricing information, see the Omni Cross Cloud Data Transfer section in BigQuery Omni pricing.

You are billed for the bytes that are transferred across clouds by using the CREATE TABLE AS SELECT statement or INSERT INTO SELECT statement, and for the compute capacity.

Both LOAD and CREATE TABLE AS SELECT statements require slots in the BigQuery Omni regions to scan Amazon S3 and Blob Storage files to load them. For more information, see BigQuery Omni pricing.

Before you begin

To provide Google Cloud with read access to the files in other clouds, ask your administrator to create a connection and share it with you. For information about how to create connections, see Connect to Amazon S3 or Blob Storage.

Required role

To get the permissions that you need to load data using cross-cloud transfers, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role on the dataset. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data using cross-cloud transfers. The following section lists the exact permissions that are required:

Required permissions

The following permissions are required to load data using cross-cloud transfers:

  • bigquery.tables.create
  • bigquery.tables.get
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create
  • bigquery.connections.use

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about IAM roles in BigQuery, see Predefined roles and permissions.
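If you prefer SQL to the console, dataset-level access can also be granted with a GRANT statement. A minimal sketch, assuming a hypothetical dataset mydataset and principal alice@example.com:

-- Sketch only: the dataset and principal are hypothetical.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA mydataset
TO "user:alice@example.com";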

Load data

You can load data into BigQuery with the LOAD DATA [INTO|OVERWRITE] statement.

Limitations

  • The connection and the destination dataset must belong to the same project. Loading data across projects is not supported.
  • LOAD DATA is only supported when you transfer data from an Amazon Simple Storage Service (Amazon S3) bucket or Azure Blob Storage to a colocated BigQuery region. For more information, see Locations.
    • You can transfer data from any US region to a US multi-region. You can also transfer from any EU region to an EU multi-region.

Examples

Example 1

The following example loads a Parquet file named sample.parquet from an Amazon S3 bucket into the testparquet table, using schema auto-detection:

LOAD DATA INTO mydataset.testparquet
  FROM FILES (
    uris = ['s3://test-bucket/sample.parquet'],
    format = 'PARQUET'
  )
  WITH CONNECTION `aws-us-east-1.test-connection`;

Example 2

The following example loads CSV files that match the prefix sampled* from your Blob Storage container into the test_csv table, with predefined columns and time-based partitioning:

LOAD DATA INTO mydataset.test_csv (Number INT64, Name STRING, Time DATE)
  PARTITION BY Time
  FROM FILES (
    format = 'CSV',
    uris = ['azure://test.blob.core.windows.net/container/sampled*'],
    skip_leading_rows = 1
  )
  WITH CONNECTION `azure-eastus2.test-connection`;

Example 3

The following example overwrites the existing table testparquet with data from a file named sample.parquet, using schema auto-detection:

LOAD DATA OVERWRITE mydataset.testparquet
  FROM FILES (
    uris = ['s3://test-bucket/sample.parquet'],
    format = 'PARQUET'
  )
  WITH CONNECTION `aws-us-east-1.test-connection`;

Filter data

You can filter data before transferring it into BigQuery by using the CREATE TABLE AS SELECT statement and the INSERT INTO SELECT statement.

Limitations

  • If the result of the SELECT query exceeds 60 GiB in logical bytes, the query fails. The table is not created and data is not transferred. To learn how to reduce the size of data that is scanned, see Reduce data processed in queries; one approach is sketched after this list.

  • Temporary tables are not supported.

  • Transferring the Well-known binary (WKB) geospatial data format is not supported.

  • The INSERT INTO SELECT statement does not support transferring data into a clustered table.

  • In the INSERT INTO SELECT statement, if the destination table is the same as the source table in the SELECT query, then the INSERT INTO SELECT statement doesn't modify any rows in the destination table. The destination table isn't modified because BigQuery can't read data across regions.

  • CREATE TABLE AS SELECT and INSERT INTO SELECT are only supported when you transfer data from Amazon S3 or Blob Storage to a colocated BigQuery region. For more information, see Locations.

    • You can transfer data from any US region to a US multi-region. You can also transfer from any EU region to an EU multi-region.
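As noted in the first limitation, one way to keep the SELECT result under the 60 GiB limit is to project only the columns you need and to filter rows at the source. A minimal sketch, assuming a hypothetical BigLake table myawsdataset.events and a destination table mydataset.recent_events:

-- Sketch only: table and column names are hypothetical.
CREATE TABLE mydataset.recent_events AS
SELECT
  event_id,         -- project only the columns you need
  event_timestamp,
  country
FROM myawsdataset.events
WHERE event_timestamp >= TIMESTAMP("2023-01-01");  -- filter rows at the source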

Examples

Example 1

Suppose you have a BigLake table named myawsdataset.orders that references data from Amazon S3. You want to transfer data from that table to the US multi-region and join it with the BigQuery table myotherdataset.shipments.

First, display information about the myawsdataset.orders table:

bq show myawsdataset.orders;

The output is similar to the following:

   Last modified             Schema              Type     Total URIs   Expiration
 ----------------- -------------------------- ---------- ------------ -----------
  31 Oct 17:40:28   |- l_orderkey: integer     EXTERNAL   1
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_linenumber: integer
                    |- l_returnflag: string
                    |- l_linestatus: string
                    |- l_commitdate: date

Next, display information about the myotherdataset.shipments table:

bq show myotherdataset.shipments

The output is similar to the following. Some columns are omitted to simplify the output.

   Last modified             Schema              Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical
 ----------------- --------------------------- ------------ ------------- ------------ ------------------- ------------------ ---------------
  31 Oct 17:34:31   |- l_orderkey: integer      3086653      210767042                                                         210767042
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_commitdate: date
                    |- l_shipdate: date
                    |- l_receiptdate: date
                    |- l_shipinstruct: string
                    |- l_shipmode: string

Now, using the CREATE TABLE AS SELECT statement, you can selectively load data to the myotherdataset.orders table in the US multi-region:

CREATE OR REPLACE TABLE myotherdataset.orders
  PARTITION BY DATE_TRUNC(l_commitdate, YEAR) AS
SELECT *
FROM myawsdataset.orders
WHERE EXTRACT(YEAR FROM l_commitdate) = 1992;
Note: If you get a ResourceExhausted error, retry after some time. If the issue persists, you can contact support.

You can then perform a join operation with the newly created table:

SELECT
  orders.l_orderkey,
  orders.l_partkey,
  orders.l_suppkey,
  orders.l_commitdate,
  orders.l_returnflag,
  shipments.l_shipmode,
  shipments.l_shipinstruct
FROM myotherdataset.shipments
JOIN `myotherdataset.orders` AS orders
  ON orders.l_orderkey = shipments.l_orderkey
  AND orders.l_partkey = shipments.l_partkey
  AND orders.l_suppkey = shipments.l_suppkey
WHERE orders.l_returnflag = 'R';  -- 'R' means refunded.

When new data is available, append data for the year 1993 to the destination table using the INSERT INTO SELECT statement:

INSERT INTO myotherdataset.orders
SELECT *
FROM myawsdataset.orders
WHERE EXTRACT(YEAR FROM l_commitdate) = 1993;

Example 2

The following example inserts data into an ingestion-time partitioned table:

CREATE TABLE mydataset.orders (
  id STRING,
  numeric_id INT64
)
PARTITION BY _PARTITIONDATE;

After creating a partitioned table, you can insert data into the ingestion-time partitioned table:

INSERT INTO mydataset.orders (
  _PARTITIONTIME,
  id,
  numeric_id
)
SELECT
  TIMESTAMP("2023-01-01"),
  id,
  numeric_id
FROM mydataset.ordersof23
WHERE numeric_id > 4000000;
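To verify which partition the rows landed in, you can filter on the _PARTITIONTIME pseudocolumn of the ingestion-time partitioned table:

-- Returns only the rows written to the 2023-01-01 partition.
SELECT id, numeric_id
FROM mydataset.orders
WHERE _PARTITIONTIME = TIMESTAMP("2023-01-01");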

Best practices

  • Avoid loading multiple files that are less than 5 MB each. Instead, create an external table for your files and export the query result to Amazon S3 or Blob Storage to create a larger file, as shown in the sketch after this list. This method helps to improve the transfer time of your data.
  • For information about the limit for the maximum query result, see BigQuery Omni maximum query result size.
  • If your source data is in a gzip-compressed file, then while creating external tables, set the external_table_options.compression option to GZIP, as shown in the sketch after this list.
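The following sketch illustrates both of the file-related tips above; the bucket, connection, and table names are hypothetical. It defines an external table over gzip-compressed CSV files and then uses EXPORT DATA to consolidate a query result into fewer, larger files in Amazon S3:

-- Sketch only: bucket, connection, and table names are hypothetical.
-- External table over gzip-compressed CSV files in Amazon S3.
CREATE EXTERNAL TABLE mydataset.sales_raw
  WITH CONNECTION `aws-us-east-1.test-connection`
  OPTIONS (
    format = 'CSV',
    compression = 'GZIP',
    uris = ['s3://test-bucket/sales/*.csv.gz']
  );

-- Export the query result as fewer, larger Parquet files.
EXPORT DATA
  WITH CONNECTION `aws-us-east-1.test-connection`
  OPTIONS (
    uri = 's3://test-bucket/consolidated/*',
    format = 'PARQUET'
  ) AS
SELECT * FROM mydataset.sales_raw;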
