Introduction to loading data
This document explains how you can load data into BigQuery. The two common approaches to data integration are extract, load, and transform (ELT) and extract, transform, and load (ETL).
For an overview of ELT and ETL approaches, see Introduction to loading, transforming, and exporting data.
Methods of loading or accessing external data
In the BigQuery page, in the Add data dialog, you can view all available methods to load data into BigQuery or access data from BigQuery. Choose one of the following options based on your use case and data sources:
| Loading method | Description |
|---|---|
| Batch load | This method is suitable for batch loading large volumes of data from a variety of sources. For batch or incremental loading of data from Cloud Storage and other supported data sources, we recommend using the BigQuery Data Transfer Service. With the BigQuery Data Transfer Service, you can schedule load jobs to automate data loading pipelines into BigQuery. You can schedule one-time transfers or recurring batch transfers at regular intervals (for example, daily or monthly). To ensure that your BigQuery data is always current, you can monitor and log your transfers. For a list of data sources supported by the BigQuery Data Transfer Service, see Supported data sources. |
| Streaming load | This method enables loading data in near real time from messaging systems. To stream data into BigQuery, you can use a BigQuery subscription in Pub/Sub. Pub/Sub can handle high-throughput data loads into BigQuery. It supports real-time data streaming, loading data as it's generated. For more information, see BigQuery subscriptions. |
| Change Data Capture (CDC) | This method enables replicating data from databases to BigQuery in near real time. Datastream uses CDC capabilities to track row-level changes in your data sources and replicate them to BigQuery. For a list of data sources supported by Datastream, see Sources. |
| Federation to external data sources | This method enables access to external data without loading it into BigQuery. BigQuery supports accessing select external data sources through Cloud Storage and federated queries. The advantage of this method is that you don't need to load the data before transforming it for subsequent use. You can perform the transformation by running SELECT statements over the external data. |
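To make the federation row more concrete, the following is a minimal sketch of how a federated query could be assembled, assuming the external source is reached through BigQuery's `EXTERNAL_QUERY` function. The helper name `build_federated_query`, the connection ID, and the `orders` table are hypothetical and only serve as illustration. The sketch builds the SQL text and does not submit it.

```python
TRIPLE_QUOTE = '"""'


def build_federated_query(connection_id: str, external_sql: str) -> str:
    """Wrap a source-database query in an EXTERNAL_QUERY call.

    connection_id is assumed to look like 'project.location.connection'.
    external_sql is the query that runs in the external database.
    """
    if "'" in connection_id:
        raise ValueError("connection_id must not contain single quotes")
    # The inner query is embedded in a triple-quoted string literal so that it
    # can contain single quotes. Inputs that would break out of that literal
    # are rejected rather than escaped, to keep the sketch simple.
    if (
        TRIPLE_QUOTE in external_sql
        or external_sql.endswith('"')
        or "\\" in external_sql
    ):
        raise ValueError("external_sql cannot be embedded safely")
    return (
        "SELECT *\n"
        f"FROM EXTERNAL_QUERY('{connection_id}', "
        f"{TRIPLE_QUOTE}{external_sql}{TRIPLE_QUOTE})"
    )


if __name__ == "__main__":
    # Hypothetical connection ID and source table.
    print(
        build_federated_query(
            "my-project.us.my-connection",
            "SELECT id FROM orders WHERE status = 'open'",
        )
    )
```

The outer `SELECT` is where any transformation would go, which matches the point in the table that the data can be transformed without being loaded first.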
You can also use the following programmatic methods to load the data:
| Loading method | Description |
|---|---|
| Batch load | You can load data from Cloud Storage or from a local file by creating a load job. If your source data changes infrequently, or you don't need continuously updated results, load jobs can be a less expensive, less resource-intensive way to load your data into BigQuery. The loaded data can be in Avro, CSV, JSON, ORC, or Parquet format. To create the load job, you can also use the LOAD DATA SQL statement. Popular open source systems, such as Spark and various ETL partners, also support batch loading data into BigQuery. To optimize batch loading into tables and avoid reaching the daily load limit, see Optimize load jobs. |
| Streaming load | If you must support custom streaming data sources, or preprocess data before streaming it with large throughput into BigQuery, use Dataflow. For more information about loading from Dataflow to BigQuery, see Write from Dataflow to BigQuery. You can also directly use the BigQuery Storage Write API. To optimize streaming into tables and avoid reaching the daily load limit, see Optimize load jobs. |
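As an illustration of the two batch options in the table, the following sketch builds a load job request body and an equivalent `LOAD DATA` statement for files in Cloud Storage. It assumes the load job is submitted through the BigQuery REST API's job configuration (`configuration.load`). The helper names, project, dataset, table, and bucket are hypothetical. The sketch only constructs the payloads and sends nothing.

```python
import json

# Formats named in the text, mapped to load job `sourceFormat` values.
SOURCE_FORMATS = {
    "avro": "AVRO",
    "csv": "CSV",
    "json": "NEWLINE_DELIMITED_JSON",
    "orc": "ORC",
    "parquet": "PARQUET",
}


def load_job_body(project, dataset, table, uris, fmt):
    """Return a request body describing a load job that appends to a table."""
    return {
        "configuration": {
            "load": {
                "sourceUris": list(uris),
                "sourceFormat": SOURCE_FORMATS[fmt.lower()],
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }


def load_data_statement(dataset, table, uris, fmt):
    """Return a LOAD DATA statement that appends the same files to a table."""
    if fmt.lower() not in SOURCE_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"LOAD DATA INTO `{dataset}.{table}`\n"
        f"FROM FILES (format = '{fmt.upper()}', uris = [{uri_list}]);"
    )


if __name__ == "__main__":
    # Hypothetical bucket and table names.
    uris = ["gs://my-bucket/data/file.csv"]
    body = load_job_body("my-project", "my_dataset", "my_table", uris, "csv")
    print(json.dumps(body, indent=2))
    print(load_data_statement("my_dataset", "my_table", uris, "csv"))
```

Both forms describe the same append operation. Which one fits better depends on whether the pipeline already issues SQL or calls the API directly.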
Cloud Data Fusion can help facilitate your ETL process. BigQuery also works with third-party partners that transform and load data into BigQuery.
BigQuery lets you create external connections to query data that's stored outside of BigQuery in Google Cloud services like Cloud Storage or Spanner, or in third-party sources like Amazon Web Services (AWS) or Microsoft Azure. These external connections use the BigQuery Connection API. For more information, see Introduction to connections.
Other ways to acquire data
You can run queries on data without loading it into BigQuery yourself. The following sections describe some alternatives.
Run queries on public data
Public datasets are datasets stored in BigQuery and shared with the public. For more information, see BigQuery public datasets.
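As a rough sketch of querying a public dataset, the following builds a query against the `bigquery-public-data.usa_names.usa_1910_2013` table. The choice of table and the helper names are illustrative assumptions and are not taken from this page. The optional `run_query` helper additionally assumes that the `google-cloud-bigquery` client library is installed and that application default credentials are configured.

```python
def top_names_query(state: str, limit: int) -> str:
    """Build a query for the most common names in one US state."""
    # Validate inputs instead of escaping them, to keep the sketch simple.
    if not (len(state) == 2 and state.isalpha() and state.isupper()):
        raise ValueError("state must be a two-letter uppercase code")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, SUM(number) AS total\n"
        "FROM `bigquery-public-data.usa_names.usa_1910_2013`\n"
        f"WHERE state = '{state}'\n"
        "GROUP BY name\n"
        "ORDER BY total DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql: str):
    """Run the query with the client library (assumed to be installed)."""
    # Imported lazily so the rest of the sketch works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [(row["name"], row["total"]) for row in client.query(sql).result()]


if __name__ == "__main__":
    print(top_names_query("TX", 10))
```

No load job is involved here: the data already lives in BigQuery, so the query reads the shared table directly.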
Run queries on shared data
To run queries on a BigQuery dataset that someone has shared with you, see Introduction to BigQuery sharing (formerly Analytics Hub). Sharing is a data exchange platform that enables data sharing.
Run queries with log data
You can run queries on logs without creating additional load jobs:
- Cloud Logging lets you route logs to a BigQuery destination.
- Log Analytics lets you run queries that analyze your log data.
What's next
- Learn how to prepare data with Gemini in BigQuery.
- Learn more about transforming data with Dataform.
- Learn more about monitoring load jobs in the administrative jobs explorer and BigQuery metrics.
Last updated 2025-12-15 UTC.