BigQuery Storage Write API best practices
This document gives best practices for using the BigQuery Storage Write API. Before reading this document, read Overview of the BigQuery Storage Write API.
Limit the rate of stream creation
Before creating a stream, consider whether you can use the default stream. For streaming scenarios, the default stream has fewer quota limitations and can scale better than application-created streams. If you use an application-created stream, then make sure to utilize the maximum throughput on each stream before creating additional streams. For example, use asynchronous writes.
For application-created streams, avoid calling CreateWriteStream at a high frequency. Generally, if you exceed 40-50 calls per second, the latency of the API calls grows substantially (>25s). Make sure your application can accept a cold start, ramp up the number of streams gradually, and limit the rate of CreateWriteStream calls. You might also set a larger deadline to wait for the call to complete, so that it doesn't fail with a DeadlineExceeded error. There is also a longer-term quota on the maximum rate of CreateWriteStream calls. Creating streams is a resource-intensive process, so reducing the rate of stream creation and fully utilizing existing streams is the best way to stay under this limit.
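As an illustration, the following Java sketch throttles CreateWriteStream calls with a client-side rate limiter. The limit of 20 calls per second is a hypothetical value chosen to stay well below the range where latency degrades; it is not a documented threshold:

```java
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.bigquery.storage.v1.CreateWriteStreamRequest;
import com.google.cloud.bigquery.storage.v1.WriteStream;
import com.google.common.util.concurrent.RateLimiter;

public class ThrottledStreamCreation {
  // Hypothetical client-side limit, chosen to stay well below the
  // ~40-50 calls per second range where latency grows substantially.
  private static final RateLimiter CREATE_LIMITER = RateLimiter.create(20.0);

  static WriteStream createStream(BigQueryWriteClient client, String table) {
    CREATE_LIMITER.acquire(); // Blocks until a permit is available.
    return client.createWriteStream(
        CreateWriteStreamRequest.newBuilder()
            .setParent(table) // "projects/{p}/datasets/{d}/tables/{t}"
            .setWriteStream(
                WriteStream.newBuilder().setType(WriteStream.Type.COMMITTED).build())
            .build());
  }
}
```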
Connection pool management
The AppendRows method creates a bidirectional connection to a stream. You can open multiple connections on the default stream, but only a single active connection on application-created streams.
When using the default stream, you can use Storage Write API multiplexing to write to multiple destination tables with shared connections. Multiplexing pools connections for better throughput and utilization of resources. If your workflow has over 20 concurrent connections, we recommend that you use multiplexing. Multiplexing is available in Java and Go. For Java implementation details, see Use multiplexing. For Go implementation details, see Connection Sharing (Multiplexing). If you use the Beam connector with at-least-once semantics, you can enable multiplexing through UseStorageApiConnectionPool. The Dataproc Spark connector has multiplexing enabled by default.
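For example, here is a minimal sketch of enabling multiplexing with the Java client's JsonStreamWriter. The pool settings and the tableName and tableSchema variables are illustrative assumptions; see Use multiplexing for the authoritative sample:

```java
import com.google.cloud.bigquery.storage.v1.ConnectionWorkerPool;
import com.google.cloud.bigquery.storage.v1.ConnectionWorkerPool.Settings;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;

// Optionally cap the size of the shared connection pool per region.
ConnectionWorkerPool.setOptions(
    Settings.builder().setMaxConnectionsPerRegion(10).build());

// Writers built with the pool enabled share connections with other
// multiplexed writers to the default stream.
JsonStreamWriter writer =
    JsonStreamWriter.newBuilder(tableName, tableSchema)
        .setEnableConnectionPool(true)
        .build();
```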
For best performance, use one connection for as many data writes as possible.Don't use one connection for just a single write, or open and close streams formany small writes.
There is a quota on the number of concurrent connections that can be open at the same time per project. Above the limit, calls to AppendRows fail. However, the quota for concurrent connections can be increased and should not normally be a limiting factor for scaling.
Each call to AppendRows creates a new data writer object. So, when using an application-created stream, the number of connections corresponds to the number of streams that have been created. Generally, a single connection supports at least 1 MBps of throughput. The upper bound depends on several factors, such as network bandwidth, the schema of the data, and server load, but can exceed 10 MBps.
There is also a quota on the total throughput per project. This represents the bytes per second across all connections flowing through the Storage Write API service. If your project exceeds this quota, you can request a quota adjustment. Typically this involves raising accompanying quotas, like the concurrent connections quota, in an equal ratio.
Manage stream offsets to achieve exactly-once semantics
The Storage Write API only allows writes to the current end of the stream, which moves as data is appended. The current position in the stream is specified as an offset from the start of the stream.
When you write to an application-created stream, you can specify the streamoffset to achieve exactly-once write semantics.
When you specify an offset, the write operation is idempotent, which makes it safe to retry if it fails because of network errors or an unresponsive server. Handle the following errors related to offsets:
- ALREADY_EXISTS (StorageErrorCode.OFFSET_ALREADY_EXISTS): The row was already written. You can safely ignore this error.
- OUT_OF_RANGE (StorageErrorCode.OFFSET_OUT_OF_RANGE): A previous write operation failed. Retry from the last successful write.
Note that these errors can also happen if you set the wrong offset value, so youhave to manage offsets carefully.
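For example, a sketch of this error handling with the Java client, assuming a JsonStreamWriter on an application-created stream and a nextOffset counter maintained by the caller:

```java
import com.google.api.core.ApiFuture;
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1.Exceptions;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;
import java.util.concurrent.ExecutionException;
import org.json.JSONArray;

// Append `rows` at an explicit offset and classify offset errors.
void appendAtOffset(JsonStreamWriter writer, JSONArray rows, long nextOffset)
    throws Exception {
  ApiFuture<AppendRowsResponse> future = writer.append(rows, nextOffset);
  try {
    future.get();
  } catch (ExecutionException e) {
    Throwable cause = e.getCause();
    if (cause instanceof Exceptions.OffsetAlreadyExists) {
      // Rows at this offset were already written; safe to ignore.
    } else if (cause instanceof Exceptions.OffsetOutOfRange) {
      // A previous write failed; retry from the last successful offset.
    } else {
      throw e;
    }
  }
}
```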
Before using stream offsets, consider whether you need exactly-once semantics. For example, if your upstream data pipeline only guarantees at-least-once writes, or if you can easily detect duplicates after data ingestion, then you might not require exactly-once writes. In that case, we recommend using the default stream, which does not require keeping track of row offsets.
Do not block on AppendRows calls
The AppendRows method is asynchronous. You can send a series of writes without blocking on a response for each write individually. The response messages on the bidirectional connection arrive in the same order as the requests were enqueued. For the highest throughput, call AppendRows without blocking to wait on the response.
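For example, a sketch of non-blocking appends with the Java client, attaching a callback instead of waiting on each future; the writer and batches variables are assumed to be defined elsewhere:

```java
import com.google.api.core.ApiFuture;
import com.google.api.core.ApiFutureCallback;
import com.google.api.core.ApiFutures;
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.common.util.concurrent.MoreExecutors;
import org.json.JSONArray;

// Enqueue many appends without blocking on each response.
for (JSONArray batch : batches) {
  ApiFuture<AppendRowsResponse> future = writer.append(batch);
  ApiFutures.addCallback(
      future,
      new ApiFutureCallback<AppendRowsResponse>() {
        @Override
        public void onSuccess(AppendRowsResponse response) {
          // Responses arrive in request order; record progress here.
        }

        @Override
        public void onFailure(Throwable t) {
          // Handle or retry the failed append.
        }
      },
      MoreExecutors.directExecutor());
}
```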
Handle schema updates
For data streaming scenarios, table schemas are usually managed outside of thestreaming pipeline. It's common for the schema to evolve over time, for exampleby adding new nullable fields. A robust pipeline must handle out-of-band schemaupdates.
The Storage Write API supports table schemas as follows:
- The first write request includes the schema.
- You send each row of data as a binary protocol buffer. BigQuery maps the data to the schema.
- You can omit nullable fields, but you cannot include any fields that are not present in the current schema. If you send rows with extra fields, the Storage Write API returns a StorageError with StorageErrorCode.SCHEMA_MISMATCH_EXTRA_FIELD.
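For example, a minimal sketch of the first two points using the Java client's StreamWriter; the streamName, protoSchema, and rowMessage variables are assumed to be defined elsewhere:

```java
import com.google.api.core.ApiFuture;
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1.ProtoRows;
import com.google.cloud.bigquery.storage.v1.StreamWriter;

// The writer schema is sent with the first AppendRows request on the
// connection; BigQuery maps rows to the table schema from there on.
StreamWriter writer =
    StreamWriter.newBuilder(streamName)
        .setWriterSchema(protoSchema)
        .build();

// Each row is a serialized protocol buffer message.
ProtoRows rows =
    ProtoRows.newBuilder()
        .addSerializedRows(rowMessage.toByteString())
        .build();
ApiFuture<AppendRowsResponse> future = writer.append(rows);
```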
If you want to send new fields in the payload, you should first update the table schema in BigQuery. The Storage Write API detects schema changes after a short time, on the order of minutes. When the Storage Write API detects the schema change, the AppendRowsResponse response message contains a TableSchema object that describes the new schema.
To send data using the updated schema, you must close existing connections andopen new connections with the new schema.
Java client. The Java client library provides some additional features for schema updates through the JsonStreamWriter class. After a schema update, the JsonStreamWriter automatically reconnects with the updated schema. You don't need to explicitly close and reopen the connection. To check for schema changes programmatically, call AppendRowsResponse.hasUpdatedSchema after the append method completes.
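For example, a sketch of checking for an updated schema after an append, assuming a JsonStreamWriter writer and a rows payload:

```java
import com.google.api.core.ApiFuture;
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import org.json.JSONArray;

ApiFuture<AppendRowsResponse> future = writer.append(rows);
AppendRowsResponse response = future.get();
if (response.hasUpdatedSchema()) {
  // BigQuery detected a table schema change. The JsonStreamWriter
  // reconnects with the new schema automatically, so later payloads
  // can include the new fields.
  System.out.println("Updated schema: " + response.getUpdatedSchema());
}
```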
You can also configure the JsonStreamWriter to ignore unknown fields in the input data. To set this behavior, call setIgnoreUnknownFields. This behavior is similar to the ignoreUnknownValues option when using the legacy tabledata.insertAll API. However, it can lead to unintentional data loss, because unknown fields are silently dropped.
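For example, a sketch of enabling this behavior when building the writer; the tableName and tableSchema variables are assumed:

```java
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;

// Silently drop input fields that are not in the table schema instead
// of failing the append. Use with care: dropped data is unrecoverable.
JsonStreamWriter writer =
    JsonStreamWriter.newBuilder(tableName, tableSchema)
        .setIgnoreUnknownFields(true)
        .build();
```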