Use the legacy streaming API
This document describes how to stream data into BigQuery by using the legacy tabledata.insertAll method.
For new projects, we recommend using the BigQuery Storage Write API instead of the tabledata.insertAll method. The Storage Write API has lower pricing and more robust features, including exactly-once delivery semantics. If you are migrating an existing project from the tabledata.insertAll method to the Storage Write API, we recommend selecting the default stream. The tabledata.insertAll method is still fully supported.
Before you begin
- Ensure that you have write access to the dataset that contains your destination table. The table must exist before you begin writing data to it unless you are using template tables. For more information on template tables, see Create tables automatically using template tables.
- Check the quota policy for streaming data.
- Verify that billing is enabled for your Google Cloud project.
- Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Streaming is not available through the free tier. If you attempt to use streaming without enabling billing, you receive the following error: BigQuery: Streaming insert is not allowed in the free tier.
Required permissions
To stream data into BigQuery, you need the following IAM permissions:
- bigquery.tables.updateData (lets you insert data into the table)
- bigquery.tables.get (lets you obtain table metadata)
- bigquery.datasets.get (lets you obtain dataset metadata)
- bigquery.tables.create (required if you use a template table to create the table automatically)
Each of the following predefined IAM roles includes the permissions that you need in order to stream data into BigQuery:
- roles/bigquery.dataEditor
- roles/bigquery.dataOwner
- roles/bigquery.admin
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Stream data into BigQuery
C#
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```csharp
using Google.Cloud.BigQuery.V2;

public class BigQueryTableInsertRows
{
    public void TableInsertRows(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id",
        string tableId = "your_table_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        BigQueryInsertRow[] rows = new BigQueryInsertRow[]
        {
            // The insert ID is optional, but can avoid duplicate data
            // when retrying inserts.
            new BigQueryInsertRow(insertId: "row1")
            {
                { "name", "Washington" },
                { "post_abbr", "WA" }
            },
            new BigQueryInsertRow(insertId: "row2")
            {
                { "name", "Colorado" },
                { "post_abbr", "CO" }
            }
        };
        client.InsertRows(datasetId, tableId, rows);
    }
}
```
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
import("context""fmt""cloud.google.com/go/bigquery")// Item represents a row item.typeItemstruct{NamestringAgeint}// Save implements the ValueSaver interface.// This example disables best-effort de-duplication, which allows for higher throughput.func(i*Item)Save()(map[string]bigquery.Value,string,error){returnmap[string]bigquery.Value{"full_name":i.Name,"age":i.Age,},bigquery.NoDedupeID,nil}// insertRows demonstrates inserting data into a table using the streaming insert mechanism.funcinsertRows(projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %w",err)}deferclient.Close()inserter:=client.Dataset(datasetID).Table(tableID).Inserter()items:=[]*Item{// Item implements the ValueSaver interface.{Name:"Phred Phlyntstone",Age:32},{Name:"Wylma Phlyntstone",Age:29},}iferr:=inserter.Put(ctx,items);err!=nil{returnerr}returnnil}Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sample to insert rows into a table without running a load job.
public class TableInsertRows {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    // Create a row to insert
    Map<String, Object> rowContent = new HashMap<>();
    rowContent.put("booleanField", true);
    rowContent.put("numericField", "3.14");
    // TODO(developer): Replace the row id with a unique value for each row.
    String rowId = "ROW_ID";
    tableInsertRows(datasetName, tableName, rowId, rowContent);
  }

  public static void tableInsertRows(
      String datasetName, String tableName, String rowId, Map<String, Object> rowContent) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Get table
      TableId tableId = TableId.of(datasetName, tableName);

      // Inserts rowContent into datasetName:tableId.
      InsertAllResponse response =
          bigquery.insertAll(
              InsertAllRequest.newBuilder(tableId)
                  // More rows can be added in the same RPC by invoking .addRow() on the builder.
                  // You can omit the unique row ids to disable de-duplication.
                  .addRow(rowId, rowContent)
                  .build());

      if (response.hasErrors()) {
        // If any of the insertions failed, this lets you inspect the errors
        for (Map.Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
          System.out.println("Response error: \n" + entry.getValue());
        }
      }
      System.out.println("Rows successfully inserted into table");
    } catch (BigQueryException e) {
      System.out.println("Insert operation not performed \n" + e.toString());
    }
  }
}
```
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```javascript
// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function insertRowsAsStream() {
  // Inserts the JSON objects into my_dataset:my_table.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';
  const rows = [
    {name: 'Tom', age: 30},
    {name: 'Jane', age: 32},
  ];

  // Insert data into a table
  await bigquery.dataset(datasetId).table(tableId).insert(rows);
  console.log(`Inserted ${rows.length} rows`);
}
```
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```php
use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Stream data into bigquery
 *
 * @param string $projectId The project Id of your Google Cloud Project.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId The BigQuery table ID.
 * @param string $data Json encoded data For eg,
 *    $data = json_encode([
 *        "field1" => "value1",
 *        "field2" => "value2",
 *    ]);
 */
function stream_row(
    string $projectId,
    string $datasetId,
    string $tableId,
    string $data
): void {
    // instantiate the bigquery table service
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);

    $data = json_decode($data, true);
    $insertResponse = $table->insertRows([
        ['data' => $data],
        // additional rows can go here
    ]);
    if ($insertResponse->isSuccessful()) {
        print('Data streamed into BigQuery successfully' . PHP_EOL);
    } else {
        foreach ($insertResponse->failedRows() as $row) {
            foreach ($row['errors'] as $error) {
                printf('%s: %s' . PHP_EOL, $error['reason'], $error['message']);
            }
        }
    }
}
```
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```python
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of table to append to.
# table_id = "your-project.your_dataset.your_table"

rows_to_insert = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]

errors = client.insert_rows_json(table_id, rows_to_insert)  # Make an API request.
if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))
```
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
require"google/cloud/bigquery"deftable_insert_rowsdataset_id="your_dataset_id",table_id="your_table_id"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idtable=dataset.tabletable_idrow_data=[{name:"Alice",value:5},{name:"Bob",value:10}]response=table.insertrow_dataifresponse.success?puts"Inserted rows successfully"elseputs"Failed to insert#{response.error_rows.count} rows"endendNUMERIC orBIGNUMERIC value in a row, you must surroundthe value with double quotation marks, such as"big_numeric_col":"0.123456789123".You don't need to populate theinsertID field when you insert rows.The following example shows how to avoid sending aninsertID for each rowwhen streaming.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import com.google.common.collect.ImmutableList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sample to insert rows without row ids in a table
public class TableInsertRowsWithoutRowIds {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    tableInsertRowsWithoutRowIds(datasetName, tableName);
  }

  public static void tableInsertRowsWithoutRowIds(String datasetName, String tableName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      final BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Create rows to insert
      Map<String, Object> rowContent1 = new HashMap<>();
      rowContent1.put("stringField", "Phred Phlyntstone");
      rowContent1.put("numericField", 32);
      Map<String, Object> rowContent2 = new HashMap<>();
      rowContent2.put("stringField", "Wylma Phlyntstone");
      rowContent2.put("numericField", 29);

      InsertAllResponse response =
          bigquery.insertAll(
              InsertAllRequest.newBuilder(TableId.of(datasetName, tableName))
                  // No row ids disable de-duplication, and also disable the retries in the Java
                  // library.
                  .setRows(
                      ImmutableList.of(
                          InsertAllRequest.RowToInsert.of(rowContent1),
                          InsertAllRequest.RowToInsert.of(rowContent2)))
                  .build());

      if (response.hasErrors()) {
        // If any of the insertions failed, this lets you inspect the errors
        for (Map.Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
          System.out.println("Response error: \n" + entry.getValue());
        }
      }
      System.out.println("Rows successfully inserted into table without row ids");
    } catch (BigQueryException e) {
      System.out.println("Insert operation not performed \n" + e.toString());
    }
  }
}
```
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
```python
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of table to append to.
# table_id = "your-project.your_dataset.your_table"

rows_to_insert = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]

errors = client.insert_rows_json(
    table_id, rows_to_insert, row_ids=[None] * len(rows_to_insert)
)  # Make an API request.
if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))
```
Send date and time data
For date and time fields, format the data in the tabledata.insertAll method as follows:

| Type | Format |
|---|---|
| DATE | A string in the form "YYYY-MM-DD" |
| DATETIME | A string in the form "YYYY-MM-DD [HH:MM:SS]" |
| TIME | A string in the form "HH:MM:SS" |
| TIMESTAMP | The number of seconds since 1970-01-01 (the Unix epoch), or a string in the form "YYYY-MM-DD HH:MM[:SS]" |
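As an illustration of these formats (a minimal sketch only; the table and field names are hypothetical and not part of the samples above), a JSON row passed to insert_rows_json in Python might look like this:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table with DATE, DATETIME, TIME, and TIMESTAMP columns.
table_id = "your-project.your_dataset.your_table"

rows_to_insert = [
    {
        "event_date": "2024-03-01",           # DATE
        "created_at": "2024-03-01 13:45:30",  # DATETIME
        "start_time": "13:45:30",             # TIME
        "event_ts": "2024-03-01 13:45:30",    # TIMESTAMP (string form)
    },
]

errors = client.insert_rows_json(table_id, rows_to_insert)
if errors:
    print("Encountered errors while inserting rows: {}".format(errors))
```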
Send range data
For fields with typeRANGE<T>, format the data in thetabledata.insertAllmethod as a JSON object with two fields,start andend.Missing or NULL values for thestart andend fields represent unbounded boundaries.These fields must have the same supported JSON format of typeT, whereT can be one ofDATE,DATETIME, andTIMESTAMP.
In the following example, thef_range_date field represents aRANGE<DATE>column in a table. A row is inserted into this column using thetabledata.insertAll API.
{ "f_range_date": { "start": "1970-01-02", "end": null }}Stream data availability
Data is available for real-time analysis using GoogleSQL queries immediately after BigQuery successfully acknowledges a tabledata.insertAll request. When you query data in the streaming buffer, you aren't charged for bytes processed from the streaming buffer if you use on-demand compute pricing. If you use capacity-based pricing, your reservations consume slots for processing data in the streaming buffer.
Recently streamed rows to an ingestion-time partitioned table temporarily have a NULL value for the _PARTITIONTIME pseudocolumn. For such rows, BigQuery assigns the final non-NULL value of the _PARTITIONTIME column in the background, typically within a few minutes. In rare cases, this can take up to 90 minutes.
Some recently streamed rows might not be available for table copy, typically for a few minutes. In rare cases, this can take up to 90 minutes. To see whether data is available for table copy, check the tables.get response for a section named streamingBuffer. If the streamingBuffer section is absent, your data is available for copy. You can also use the streamingBuffer.oldestEntryTime field to identify the age of records in the streaming buffer.
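For example, with the Python client library, which surfaces this section as Table.streaming_buffer, a rough sketch of that check looks like the following; the table name is a placeholder:

```python
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("your-project.your_dataset.your_table")  # tables.get call

if table.streaming_buffer is None:
    print("No streamingBuffer section: data is available for copy jobs.")
else:
    print("Oldest entry in the streaming buffer:", table.streaming_buffer.oldest_entry_time)
```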
Best effort de-duplication
When you supply insertId for an inserted row, BigQuery uses this ID to support best effort de-duplication for up to one minute. That is, if you stream the same row with the same insertId more than once within that time period into the same table, BigQuery might de-duplicate the multiple occurrences of that row, retaining only one of those occurrences.
The system expects that rows provided with identical insertIds are also identical. If two rows have identical insertIds, it is nondeterministic which row BigQuery preserves.
De-duplication is generally meant for retry scenarios in a distributed system where there's no way to determine the state of a streaming insert under certain error conditions, such as network errors between your system and BigQuery or internal errors within BigQuery. If you retry an insert, use the same insertId for the same set of rows so that BigQuery can attempt to de-duplicate your data. For more information, see Troubleshoot streaming inserts.
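As a minimal sketch of that retry pattern in Python (the table name and ID scheme are assumptions, not part of the samples above), the client library's row_ids parameter maps to insertId, so a retried request should reuse the same values:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"  # placeholder

rows = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]
# Stable, application-derived IDs: a retried insertAll call carries the same insertId values.
row_ids = ["user-phred", "user-wylma"]

errors = client.insert_rows_json(table_id, rows, row_ids=row_ids)
if errors:
    # Re-sending with the same row_ids lets BigQuery attempt de-duplication.
    errors = client.insert_rows_json(table_id, rows, row_ids=row_ids)
```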
De-duplication offered by BigQuery is best effort, and it should not be relied upon as a mechanism to guarantee the absence of duplicates in your data. Additionally, BigQuery might degrade the quality of best effort de-duplication at any time in order to guarantee higher reliability and availability for your data.
If you have strict de-duplication requirements for your data,Google Cloud Datastore is an alternative service that supportstransactions.
Disabling best effort de-duplication
You can disable best effort de-duplication by not populating the insertId field for each row inserted. This is the recommended way to insert data.
Apache Beam and Dataflow
To disable best effort de-duplication when you use Apache Beam's BigQuery I/O connector for Java, use the ignoreInsertIds() method.
Manually removing duplicates
To ensure that no duplicate rows exist after you are done streaming, use the following manual process:
- Add the insertId as a column in your table schema and include the insertId value in the data for each row.
- After streaming has stopped, perform the following query to check for duplicates:

```sql
#standardSQL
SELECT
  MAX(count)
FROM (
  SELECT
    ID_COLUMN,
    count(*) as count
  FROM
    `TABLE_NAME`
  GROUP BY
    ID_COLUMN)
```

If the result is greater than 1, duplicates exist.
- To remove duplicates, run the following query. Specify a destination table, allow large results, and disable result flattening.
```sql
#standardSQL
SELECT
  * EXCEPT(row_number)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY ID_COLUMN) row_number
  FROM
    `TABLE_NAME`)
WHERE
  row_number = 1
```
Notes about the duplicate removal query:
- The safer strategy for the duplicate removal query is to target a new table. Alternatively, you can target the source table with write disposition WRITE_TRUNCATE.
- The duplicate removal query adds a row_number column with the value 1 to the end of the table schema. The query uses a SELECT * EXCEPT statement from GoogleSQL to exclude the row_number column from the destination table. The #standardSQL prefix enables GoogleSQL for this query. Alternatively, you can select by specific column names to omit this column.
- For querying live data with duplicates removed, you can also create a view over your table using the duplicate removal query. Be aware that query costs against the view are calculated based on the columns selected in your view, which can result in large bytes scanned sizes.
Stream into time-partitioned tables
When you stream data to a time-partitioned table, each partition has a streaming buffer. The streaming buffer is retained when you perform a load, query, or copy job that overwrites a partition by setting the writeDisposition property to WRITE_TRUNCATE. If you want to remove the streaming buffer, verify that the streaming buffer is empty by calling tables.get on the partition.
Ingestion-time partitioning
When you stream to an ingestion-time partitioned table, BigQuery infers the destination partition from the current UTC time.
Newly arriving data is temporarily placed in the __UNPARTITIONED__ partition while in the streaming buffer. When there's enough unpartitioned data, BigQuery partitions the data into the correct partition. However, there is no SLA for how long it takes for data to move out of the __UNPARTITIONED__ partition. A query can exclude data in the streaming buffer by filtering out the NULL values from the __UNPARTITIONED__ partition using one of the pseudocolumns (_PARTITIONTIME or _PARTITIONDATE, depending on your preferred data type).
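As a short illustration of that filter (a sketch with a placeholder table name, run here through the Python client), requiring a non-NULL _PARTITIONTIME excludes rows that are still in the __UNPARTITIONED__ partition:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Rows still in the streaming buffer have a NULL _PARTITIONTIME, so this
# query skips the __UNPARTITIONED__ partition.
query = """
    SELECT *
    FROM `your-project.your_dataset.your_table`
    WHERE _PARTITIONTIME IS NOT NULL
"""
for row in client.query(query).result():
    print(row)
```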
If you are streaming data into a daily partitioned table, then you can override the date inference by supplying a partition decorator as part of the insertAll request. Include the decorator in the tableId parameter. For example, you can stream to the partition corresponding to 2021-03-01 for table table1 by using the partition decorator:

table1$20210301

When streaming using a partition decorator, you can stream to partitions within the last 31 days in the past and 16 days in the future relative to the current date, based on current UTC time. To write to partitions for dates outside these allowed bounds, use a load or query job instead, as described in Appending to and overwriting partitioned table data.
Streaming using a partition decorator is only supported for daily partitioned tables. It is not supported for hourly, monthly, or yearly partitioned tables.
For testing, you can use the bq command-line tool's bq insert command. For example, the following command streams a single row to a partition for the date January 1, 2017 ($20170101) into a partitioned table named mydataset.mytable:

```
echo '{"a":1, "b":2}' | bq insert 'mydataset.mytable$20170101'
```

Note: The bq insert command is intended for testing only.

Time-unit column partitioning
You can stream data into a table partitioned on a DATE, DATETIME, or TIMESTAMP column that is between 10 years in the past and 1 year in the future. Data outside this range is rejected.
When the data is streamed, it is initially placed in the __UNPARTITIONED__ partition. When there's enough unpartitioned data, BigQuery automatically repartitions the data, placing it into the appropriate partition. However, there is no SLA for how long it takes for data to move out of the __UNPARTITIONED__ partition.
Note: Daily partitions are processed differently than hourly, monthly, and yearly partitions. Only data outside of the date range (from the last 7 days to the next 3 days) is extracted to the __UNPARTITIONED__ partition while waiting to be repartitioned. On the other hand, for an hourly partitioned table, data is always extracted to the __UNPARTITIONED__ partition and later repartitioned.
Create tables automatically using template tables
Template tables provide a mechanism to split a logical table into many smaller tables to create smaller sets of data (for example, by user ID). Template tables have a number of limitations described below. Instead, partitioned tables and clustered tables are the recommended ways to achieve this behavior.
To use a template table through the BigQuery API, add a templateSuffix parameter to your insertAll request. For the bq command-line tool, add the template_suffix flag to your insert command. If BigQuery detects a templateSuffix parameter or the template_suffix flag, it treats the targeted table as a base template. It creates a new table that shares the same schema as the targeted table and has a name that includes the specified suffix:
<targeted_table_name> + <templateSuffix>
By using a template table, you avoid the overhead of creating each table individually and specifying the schema for each table. You need only create a single template, and supply different suffixes so that BigQuery can create the new tables for you. BigQuery places the tables in the same project and dataset.
Tables created by using template tables are usually available within a few seconds. On rare occasions, they may take longer to become available.
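As a rough sketch with placeholder names (assuming the Python client, whose insert_rows_json method surfaces templateSuffix as the template_suffix argument), streaming through a template table might look like this:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Base template table; the suffix below yields a generated table named
# events_template_user123 in the same dataset.
template_table = "your-project.your_dataset.events_template"

rows = [{"event": "login", "event_ts": "2024-03-01 13:45:30"}]
errors = client.insert_rows_json(
    template_table,
    rows,
    template_suffix="_user123",  # BigQuery creates the generated table if needed
)
if errors:
    print("Encountered errors while inserting rows: {}".format(errors))
```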
Change the template table schema
If you change a template table schema, all tables that are generated subsequently use the updated schema. Previously generated tables are not affected, unless the existing table still has a streaming buffer.
For existing tables that still have a streaming buffer, if you modify the template table schema in a backward compatible way, the schema of those actively streamed generated tables is also updated. However, if you modify the template table schema in a non-backward compatible way, any buffered data that uses the old schema is lost. Also, you cannot stream new data to existing generated tables that use the old, but now incompatible, schema.
After you change a template table schema, wait until the changes have propagated before you try to insert new data or query the generated tables. Requests to insert new fields should succeed within a few minutes. Attempts to query the new fields might require a longer wait of up to 90 minutes.
If you want to change a generated table's schema, do not change the schema until streaming through the template table has ceased and the generated table's streaming statistics section is absent from the tables.get() response, which indicates that no data is buffered on the table.
Partitioned tables and clustered tables do not suffer from the preceding limitations and are the recommended mechanism.
Template table details
- Template suffix value: The templateSuffix (or --template_suffix) value must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum combined length of the table name and the table suffix is 1024 characters.
- Quota: Template tables are subject to streaming quota limitations. Your project can make up to 10 tables per second with template tables, similar to the tables.insert API. This quota only applies to tables being created, not to tables being modified. If your application needs to create more than 10 tables per second, we recommend using clustered tables. For example, you can put the high-cardinality table ID into the key column of a single clustering table.
- Time to live: The generated table inherits its expiration time from the dataset. As with normal streaming data, generated tables cannot be copied immediately.
- Deduplication: Deduplication only happens between uniform references to a destination table. For example, if you simultaneously stream to a generated table using both template tables and a regular insertAll command, no deduplication occurs between rows inserted by template tables and a regular insertAll command.
- Views: The template table and the generated tables shouldn't be views.
Troubleshoot streaming inserts
The following sections discuss how to troubleshoot errors that occur when you stream data into BigQuery using the legacy streaming API. For more information on how to resolve quota errors for streaming inserts, see Streaming insert quota errors.
Failure HTTP response codes
If you receive a failure HTTP response code such as a network error, there's no way to tell whether the streaming insert succeeded. If you try to re-send the request, you might end up with duplicated rows in your table. To help protect your table against duplication, set the insertId property when sending your request. BigQuery uses the insertId property for de-duplication.
If you receive a permission error, an invalid table name error, or an exceeded quota error, no rows are inserted and the entire request fails.
Success HTTP response codes
Even if you receive a success HTTP response code, you'll need to check the insertErrors property of the response to determine whether the row insertions were successful, because it's possible that BigQuery was only partially successful at inserting the rows. You might encounter one of the following scenarios:
- All rows inserted successfully. If the insertErrors property is an empty list, all of the rows were inserted successfully.
- Some rows inserted successfully. Except in cases where there is a schema mismatch in any of the rows, rows indicated in the insertErrors property are not inserted, and all other rows are inserted successfully. The errors property contains detailed information about why each unsuccessful row failed. The index property indicates the 0-based row index of the request that the error applies to.
- None of the rows inserted successfully. If BigQuery encounters a schema mismatch on individual rows in the request, none of the rows are inserted and an insertErrors entry is returned for each row, even the rows that did not have a schema mismatch. Rows that did not have a schema mismatch have an error with the reason property set to stopped, and can be re-sent as-is (see the sketch after this list). Rows that failed include detailed information about the schema mismatch. To learn about the supported protocol buffer types for each BigQuery data type, see Supported protocol buffer and Arrow data types.
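The following Python sketch illustrates that handling (the table name and rows are placeholders, and whether you re-send at all is an application decision): it inspects the structure returned by insert_rows_json and re-sends only rows whose only error reason is stopped.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"  # placeholder

rows = [
    {"full_name": "Phred Phlyntstone", "age": 32},
    {"full_name": "Wylma Phlyntstone", "age": 29},
]

errors = client.insert_rows_json(table_id, rows)
if errors:
    # Each entry looks like {"index": <row index>, "errors": [{"reason": ..., "message": ...}]}.
    stopped_rows = [
        rows[entry["index"]]
        for entry in errors
        if all(err.get("reason") == "stopped" for err in entry["errors"])
    ]
    failed_entries = [
        entry
        for entry in errors
        if any(err.get("reason") != "stopped" for err in entry["errors"])
    ]
    print("Rows that failed (for example, schema mismatch):", failed_entries)
    if stopped_rows:
        # Rows rejected only because the request was stopped can be re-sent as-is.
        client.insert_rows_json(table_id, stopped_rows)
```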
Metadata errors for streaming inserts
Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. Most of the time, metadata changes are propagated within minutes, but during this period API responses might reflect the inconsistent state of the table.
Some scenarios include:
- Schema Changes. Modifying the schema of a table that has recently received streaming inserts can cause responses with schema mismatch errors because the streaming system might not immediately pick up the schema change.
- Table Creation/Deletion. Streaming to a nonexistent table returns a variation of a notFound response. A table created in response might not immediately be recognized by subsequent streaming inserts. Similarly, deleting or recreating a table can create a period of time where streaming inserts are effectively delivered to the old table. The streaming inserts might not be present in the new table.
- Table Truncation. Truncating a table's data (by using a query job that uses a writeDisposition of WRITE_TRUNCATE) can similarly cause subsequent inserts during the consistency period to be dropped.
Missing/Unavailable data
Streaming inserts reside temporarily in the write-optimized storage, which has different availability characteristics than managed storage. Certain operations in BigQuery don't interact with the write-optimized storage, such as table copy jobs and API methods like tabledata.list. Recent streaming data won't be present in the destination table or output.
Streaming insert quota errors
This section provides tips for troubleshooting quota errors related to streaming data into BigQuery.
In certain regions, streaming inserts have a higher quota if you don't populate the insertId field for each row. For more information about quotas for streaming inserts, see Streaming inserts. The quota-related errors for BigQuery streaming depend on the presence or absence of insertId.
Error message
If the insertId field is empty, the following quota error is possible:
| Quota limit | Error message |
|---|---|
| Bytes per second per project | Your entity with gaia_id: GAIA_ID, project: PROJECT_ID in region: REGION exceeded quota for insert bytes per second. |

If the insertId field is populated, the following quota errors are possible:
| Quota limit | Error message |
|---|---|
| Rows per second per project | Your project: PROJECT_ID in REGION exceeded quota for streaming insert rows per second. |
| Rows per second per table | Your table: TABLE_ID exceeded quota for streaming insert rows per second. |
| Bytes per second per table | Your table: TABLE_ID exceeded quota for streaming insert bytes per second. |
The purpose of the insertId field is to deduplicate inserted rows. If multiple inserts with the same insertId arrive within a few minutes' window, BigQuery writes a single version of the record. However, this automatic deduplication is not guaranteed. For maximum streaming throughput, we recommend that you don't include insertId and instead use manual deduplication. For more information, see Ensuring data consistency.
When you encounter this error, diagnose the issue, and then follow the recommended steps to resolve it.
Diagnosis
Use the STREAMING_TIMELINE_BY_* views to analyze the streaming traffic. These views aggregate streaming statistics over one-minute intervals, grouped by error_code. Quota errors appear in the results with error_code equal to RATE_LIMIT_EXCEEDED or QUOTA_EXCEEDED.
Depending on the specific quota limit that was reached, look at total_rows or total_input_bytes. If the error is a table-level quota, filter by table_id.
For example, the following query shows total bytes ingested per minute, and the total number of quota errors:

```sql
SELECT
  start_timestamp,
  error_code,
  SUM(total_input_bytes) as sum_input_bytes,
  SUM(IF(error_code IN ('QUOTA_EXCEEDED', 'RATE_LIMIT_EXCEEDED'),
      total_requests, 0)) AS quota_error
FROM
  `region-REGION_NAME`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 1 DAY)
GROUP BY
  start_timestamp,
  error_code
ORDER BY 1 DESC
```
Resolution
To resolve this quota error, do the following:
- If you are using the insertId field for deduplication, and your project is in a region that supports the higher streaming quota, we recommend removing the insertId field. This solution might require some additional steps to manually deduplicate the data. For more information, see Manually removing duplicates.
- If you are not using insertId, or if it's not feasible to remove it, monitor your streaming traffic over a 24-hour period and analyze the quota errors:
  - If you see mostly RATE_LIMIT_EXCEEDED errors rather than QUOTA_EXCEEDED errors, and your overall traffic is less than 80% of quota, the errors probably indicate temporary spikes. You can address these errors by retrying the operation using exponential backoff between retries, as shown in the sketch after this list.
  - If you are using a Dataflow job to insert data, consider using load jobs instead of streaming inserts. For more information, see Setting the insertion method.
  - If you are using Dataflow with a custom I/O connector, consider using a built-in I/O connector instead. For more information, see Custom I/O patterns.
  - If you see QUOTA_EXCEEDED errors or the overall traffic consistently exceeds 80% of the quota, submit a request for a quota increase. For more information, see Request a quota adjustment.
- You might also want to consider replacing streaming inserts with the newer Storage Write API, which has higher throughput, lower price, and many useful features.
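Purely as an illustration of the exponential backoff pattern mentioned above (a sketch, not an official client feature; the helper name, attempt count, and error handling are assumptions), a retry loop might look like this:

```python
import random
import time

from google.cloud import bigquery
from google.api_core import exceptions

client = bigquery.Client()

def insert_with_backoff(table_id, rows, max_attempts=5):
    """Retry streaming inserts with exponential backoff between attempts."""
    errors = []
    for attempt in range(max_attempts):
        try:
            errors = client.insert_rows_json(table_id, rows)
        except exceptions.TooManyRequests:
            # HTTP 429: treat like a temporary rate-limit spike and retry.
            errors = [{"errors": [{"reason": "rateLimitExceeded"}]}]
        if not errors:
            return []
        # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of noise.
        time.sleep((2 ** attempt) + random.random())
    return errors
```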