Optimize load jobs
The strategies and best practices described in this document help you optimize batch loading or streaming data into BigQuery to avoid reaching the limit for the number of load jobs per table, per day.
Because the limit for load jobs is fixed and can't be increased, you should optimize your load jobs by structuring your tables through methods such as table partitions, or by managing your loads through methods such as batch loading or streaming.
How table operations quotas work
The BigQuery limit for table modifications per table per day is fixed, regardless of whether modifications append or update data, or truncate the table. This limit includes the combined total of all load jobs, copy jobs, and query jobs that add to or overwrite a destination table.
Load jobs have a refill rate. If you exceed the table operation limit or its refill rate, load jobs fail with a quotaExceeded error. The project-level limit for load jobs per day refills within a rolling 24-hour period. When load jobs finish, your available quota decreases. The quota then gradually refills over the next 24 hours. Failed load jobs still count toward both per-table and per-project quotas. For more information about load job limits, see Load jobs.
For partitioned tables, a separate limit for partitioned table modifications applies, replacing the standard table limit.
To stay within your daily table operation limits, spread operations over a 24-hour period. For example, if you perform 25 updates of about 60 operations each, you can run one batch roughly every 58 minutes. This approach helps you stay under the daily limit. To monitor table updates, see BigQuery INFORMATION_SCHEMA views.
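The following minimal Python sketch illustrates this pacing pattern. It is only an illustration of the arithmetic above; NUM_BATCHES and run_batch() are hypothetical placeholders for your own batching code, not part of any BigQuery API.

import time

NUM_BATCHES = 25                          # for example, 25 updates per day
INTERVAL_S = 24 * 60 * 60 / NUM_BATCHES   # roughly one batch every 58 minutes


def run_batch(batch_number: int) -> None:
    """Placeholder for your code that performs about 60 table operations."""
    print(f"running batch {batch_number}")


for batch in range(NUM_BATCHES):
    run_batch(batch)
    time.sleep(INTERVAL_S)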
Table operations excluded from quota
Updating table information (metadata) and using DML statements do not count toward your daily table modification limit. This exclusion applies to both standard and partitioned tables.
Your project can run an unlimited number of DML statements. DML statements previously counted toward daily table modifications (although they were not throttled even at the limit), but they no longer count.
Streaming inserts also modify tables, but they are governed by their own quotas.
Load strategies to avoid the table operations limit
To stay within BigQuery's daily table operation limit, consider these best practices:
- Perform fewer, larger writes instead of many small ones.
- Minimize separate write jobs to your final production table each day.
To use these best practices, batch or stream your data into BigQuery. Your choice of load method depends on whether you need to load high volumes of data in real time, or if real-time loading is not a concern. The following sections explain batch loading and data streaming in detail, including the tools and services you can use for each method.
Batch loading
To stay within the daily load limit per project for BigQuery, batch large amounts of data and load it with fewer jobs into BigQuery. The following sections describe several methods you can use to batch load your data.
Load more data for each job
Instead of sending data to BigQuery each time new information becomes available, collect and load it into BigQuery using a single large job.
For example, instead of running a separate load job for every few rows of data, you can wait until you accumulate several thousand rows of data in a file, such as a CSV or JSON file, and then run one load job to append all the data to a table. This action counts as one table operation, even though the job contains much more data. You can batch your files by using wildcards with your load job. Wildcards let you select batches of files in a directory to load multiple files in a single load job.
The following examples show how to use wildcards with your bq load command or with a SQL LOAD DATA statement.
bq
The following example shows a bq load command that loads CSV data from Cloud Storage into a BigQuery table. To select more than one source file, use a wildcard with the command. The --autodetect flag automatically determines your table schema from the source data in Cloud Storage, and the wildcard character (*) lets you load multiple files that fit a specific naming pattern into the BigQuery table.
bq load \
  --source_format=CSV \
  --autodetect \
  --project_id=PROJECT_ID \
  DATASET_NAME.TABLE_NAME \
  "gs://BUCKET_NAME/OBJECT_PATH_WILDCARD"
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATASET_NAME: the name of the BigQuery dataset where you want to load the data.
- TABLE_NAME: the name of the BigQuery table where you want to load the data.
- BUCKET_NAME: the name of your Cloud Storage bucket that contains the source files.
- OBJECT_PATH_WILDCARD: the path to your CSV files in the Cloud Storage bucket. Include a wildcard (*) to match multiple files. For example, the string gs://my-bucket/path/to/data/my_prefix_*.csv uses the wildcard character * to load all files in gs://my-bucket/path/to/data/ that begin with my_prefix_ and end with .csv.
SQL
The following example shows how to use the SQL LOAD DATA statement to load CSV data from a Cloud Storage bucket into a BigQuery table. To select more than one source file, use a wildcard in the URI.
LOAD DATA INTO DATASET_NAME.TABLE_NAME
FROM FILES (
  format = 'SOURCE_FORMAT',
  uris = ['gs://BUCKET_NAME/OBJECT_PATH_WILDCARD']
);

Replace the following:
- DATASET_NAME: the name of the BigQuery dataset where you want to load the data.
- TABLE_NAME: the name of the BigQuery table where you want to load the data.
- SOURCE_FORMAT: the format of your source files, for example, CSV or JSON. In this example, use CSV.
- BUCKET_NAME: the name of your Cloud Storage bucket that contains the source files.
- OBJECT_PATH_WILDCARD: the path to your CSV files in the Cloud Storage bucket. Include a wildcard (*) to match multiple files. For example, the string gs://my-bucket/path/to/data/my_prefix_*.csv uses the wildcard character * to load all files in gs://my-bucket/path/to/data/ that begin with my_prefix_ and end with .csv.
For more information, see Load statements in GoogleSQL.
Batch load using the BigQuery Storage Write API
To load batch data into BigQuery, one option is to use the Storage Write API directly from your application with the Google API Client Libraries.
The Storage Write API optimizes data loading to stay within table limits. For batch workloads that don't require high-volume, real-time streaming, use a PENDING stream rather than a COMMITTED stream. When you use a PENDING stream, the API temporarily stores records until you commit the stream.
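The following minimal Python sketch shows the PENDING flow with the google-cloud-bigquery-storage client: create a PENDING stream, append serialized rows, finalize the stream, and commit it. The project, dataset, and table IDs are placeholders, and row_pb2 stands in for a protobuf module you generate to match your table schema; treat this as a sketch rather than a complete implementation.

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import row_pb2  # hypothetical module generated from a .proto that matches your table schema

write_client = bigquery_storage_v1.BigQueryWriteClient()
parent = write_client.table_path("PROJECT_ID", "DATASET_NAME", "TABLE_NAME")

# Create a PENDING stream: records stay buffered until the stream is committed.
write_stream = types.WriteStream()
write_stream.type_ = types.WriteStream.Type.PENDING
write_stream = write_client.create_write_stream(parent=parent, write_stream=write_stream)

# Describe the row protocol buffer so BigQuery can parse the serialized rows.
request_template = types.AppendRowsRequest()
request_template.write_stream = write_stream.name
proto_schema = types.ProtoSchema()
proto_descriptor = descriptor_pb2.DescriptorProto()
row_pb2.Row.DESCRIPTOR.CopyToProto(proto_descriptor)
proto_schema.proto_descriptor = proto_descriptor
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.writer_schema = proto_schema
request_template.proto_rows = proto_data

append_rows_stream = writer.AppendRowsStream(write_client, request_template)

# Append a batch of serialized rows in a single request.
proto_rows = types.ProtoRows()
row = row_pb2.Row()
row.name = "example"  # hypothetical field on the Row message
proto_rows.serialized_rows.append(row.SerializeToString())

request = types.AppendRowsRequest()
request.offset = 0
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.rows = proto_rows
request.proto_rows = proto_data
append_rows_stream.send(request).result()  # wait for the server to acknowledge the batch

# Close the connection, then finalize and commit the stream so the rows become visible.
append_rows_stream.close()
write_client.finalize_write_stream(name=write_stream.name)
commit_request = types.BatchCommitWriteStreamsRequest()
commit_request.parent = parent
commit_request.write_streams = [write_stream.name]
write_client.batch_commit_write_streams(commit_request)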
For a complete example of batch loading data using the Storage Write API, seeBatch load data using the Storage Write API.
Batch load using Dataflow
If you want to stream, transform, and write data into BigQuery using data pipelines, you can use Dataflow. The data pipelines that you create read from supported sources like Pub/Sub or Apache Kafka. You can also create a Dataflow pipeline using the BigQueryIO connector, which uses the Storage Write API for high-performance data streaming and exactly-once semantics.
For information about using Dataflow to batch load data to BigQuery, see Write from Dataflow to BigQuery.
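As a rough illustration, the following Apache Beam Python sketch reads CSV files from Cloud Storage and writes them to BigQuery through WriteToBigQuery (the Python counterpart of the BigQueryIO connector) with the Storage Write API method. The bucket path, table name, schema, and parse logic are assumptions for the example, not values from this document.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv(line: str) -> dict:
    """Parse one CSV line into a dictionary that matches the table schema."""
    name, value = line.split(",")
    return {"name": name, "value": int(value)}


with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "ReadFiles" >> beam.io.ReadFromText("gs://my-bucket/staging/*.csv")
        | "ParseRows" >> beam.Map(parse_csv)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",
            schema="name:STRING,value:INTEGER",
            method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )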
Data streaming
To load high volumes of data with frequent updates, we recommend that you stream your data into BigQuery. With data streaming, new data is continuously written from your client application into BigQuery, a strategy that helps you avoid reaching the limit for running too many load jobs. The following sections describe several methods to stream your data into BigQuery.
Stream data using the Storage Write API
Use the Storage Write API to stream records in real time into BigQuery with minimal latency. The Storage Write API offers an efficient streaming protocol with advanced functionality like exactly-once delivery semantics, schema update detection, and streaming Change Data Capture (CDC) upserts. In addition, you can ingest up to 2 TiB per month at no cost.
For information about using the Storage Write API, see Streaming data using the Storage Write API.
Stream data using Dataflow
Use Dataflow to create data pipelines that read from supported sources, for example, Pub/Sub or Apache Kafka. These pipelines then transform and write the data to BigQuery as a destination. You can create a Dataflow pipeline using the BigQueryIO connector, which uses the Storage Write API.
For information about using Dataflow to stream data to BigQuery, see Write from Dataflow to BigQuery.
Best practices to manage your tables for loading
In addition to batch loading or streaming data into BigQuery, manage your tables in the following ways to optimize them for data ingestion.
Use partitioned tables
Table partitioning is a powerful technique for managing large tables in BigQuery, especially when you need to perform frequent data loading operations. You can significantly improve table performance and cost-effectiveness by dividing a table into smaller, more manageable segments based on a date, timestamp, or integer.
The primary advantage of partitioning for data loading is that the daily table operation quotas for BigQuery apply at the partition level rather than at the table level. For partitioned tables, a separate, higher limit applies to partition modifications, which replaces the standard table limit. The limit for partitioned tables dramatically increases the number of load jobs you can run per day without reaching quota limits.
A common and highly effective strategy is to batch load your daily data. For example, you can gather all of the day's data for 2025-09-18 in a temporary staging table. Then, at the end of the day, you run a single job to load this data into the specific partition for that day in your main production table. Because BigQuery interacts only with the data for a single partition, this approach keeps your data well organized and makes your loading operations faster and less expensive.
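A minimal sketch of this end-of-day pattern with the google-cloud-bigquery Python client is shown below. It loads staged files for 2025-09-18 into that day's partition by using a partition decorator in the destination table ID; the bucket path, dataset, and table names are placeholders, and this assumes a time-partitioned destination table.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# The $20250918 decorator targets only the 2025-09-18 partition of the
# destination table, so the single daily job touches just that partition.
load_job = client.load_table_from_uri(
    "gs://my-bucket/staging/2025-09-18/*.csv",
    "my_dataset.daily_events$20250918",
    job_config=job_config,
)
load_job.result()  # one load job for the whole day's data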
While partitioning is highly recommended for large, growing tables, it's best to avoid it if your partitions would be consistently smaller than 10 GB. For more information, see When to use partitioning.
To learn more about the different partitioning methods available, such as time-unit and integer-range partitioning, see Types of partitioned tables.
Take advantage of built-in exponential backoff, truncation, and jitter
Built-in exponential backoff and retry is an error-handling method that helps your application recover smoothly when an operation fails temporarily. Such failures can include a rate limit error (rateLimitExceeded) or a brief network problem (unavailable).
In a reliable system, workers that take tasks from your client-side queue also use exponential backoff and retry. They do this when calling BigQuery, which creates two levels of protection.
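The following Python sketch shows one way a queue worker might implement truncated exponential backoff with jitter around its calls to BigQuery. The submit_to_bigquery() and is_retryable() functions are hypothetical placeholders for your own job-submission and error-classification code.

import random
import time

MAX_ATTEMPTS = 6
BASE_DELAY_S = 1.0
MAX_DELAY_S = 60.0  # truncation cap on the backoff delay


def submit_to_bigquery(task) -> None:
    """Placeholder for your code that submits a load job or append request."""
    print(f"submitting {task}")


def is_retryable(error: Exception) -> bool:
    """Placeholder: return True for errors like rateLimitExceeded or unavailable."""
    return True


def call_with_backoff(task):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return submit_to_bigquery(task)
        except Exception as error:
            if not is_retryable(error) or attempt == MAX_ATTEMPTS - 1:
                raise
            # Double the delay on each attempt, cap it at MAX_DELAY_S, and add
            # random jitter so that many workers don't all retry at once.
            delay = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay))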
For example, the official google-cloud-bigquery-storage library for Python includes built-in retry logic with exponential backoff. This logic handles temporary gRPC errors, for example, UNAVAILABLE. In most cases, you don't need to write this retry code yourself. The client.append_rows() call handles these retries automatically.
This built-in handling is a significant benefit of using the official client libraries. You only need to deal with errors that cannot be retried, for example, INVALID_ARGUMENT, which means there is a schema mismatch.