Troubleshoot query issues

This document helps you troubleshoot common issues related to running queries, such as identifying the reasons for slow queries or resolving errors returned by failed queries.

Troubleshoot slow queries

When troubleshooting slow query performance, consider the following common causes:

  1. Check the Google Cloud Service Health page for known BigQuery service outages that might impact query performance.

  2. Review the job timeline for your query on the job details page to see how long each stage of the query took to run.

    • If most of the elapsed time was due to long creation times, contact Cloud Customer Care for assistance.

    • If most of the elapsed time was due to long execution times, then review your query performance insights. Query performance insights can inform you if your query ran longer than the average execution time, and suggest possible causes. Possible causes might include query slot contention or an insufficient shuffle quota. For more information about each query performance issue and possible resolutions, see Interpret query performance insights.

  3. Review the finalExecutionDurationMs field in the JobStatistics for your query job. The query might have been retried. The finalExecutionDurationMs field contains the duration in milliseconds of the execution of the final attempt of this job.

  4. Review the bytes processed in the query job details page to see if it is higher than expected. You can do this by comparing the number of bytes processed by the current query with another query job that completed in an acceptable amount of time. If there is a large discrepancy in bytes processed between the two queries, then the query might have been slow due to a large data volume. For information on optimizing your queries to handle large data volumes, see Optimize query computation.

    You can also identify queries in your project that process a large amount of data by searching for the most expensive queries using the INFORMATION_SCHEMA.JOBS view.
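For example, a query along the following lines (with your own region qualifier; the one-day window and limit are illustrative) lists the jobs that processed the most bytes:

```sql
SELECT
  job_id,
  user_email,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY total_bytes_processed DESC
LIMIT 10;
```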

Compare a slow and fast execution of the same query

If a query that previously ran quickly is now running slowly, examine the Job API object output to identify changes in its execution.

Cache hits

Confirm whether the fast execution of the job was a cache hit by looking at the cacheHit value. If the value is true for the fast execution of the query, then the query used cached results instead of executing the query.

If you expect the slow job to use cached results, investigate why the query is no longer using cached results. If you don't expect the query to retrieve data from the cache, look for a fast query execution example that didn't hit the cache for the investigation.
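As a sketch, you can also compare the cache behavior of both jobs in one query against the INFORMATION_SCHEMA.JOBS view; the job IDs below are placeholders:

```sql
SELECT
  job_id,
  cache_hit,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_id IN ('fast_job_id', 'slow_job_id');
```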

Quota delays

To determine whether the slowdown was caused by any quota deferments, check the quotaDeferments field for both jobs. Compare the values to determine whether the slower query's start time was delayed by any quota deferments that didn't affect the faster job.

Execution duration

To understand the difference between the execution duration of the last attempt of both jobs, compare their values for the finalExecutionDurationMs field.

If the values for finalExecutionDurationMs are quite similar, but the difference in the wall execution time between the two queries, calculated as endTime - startTime, is much larger, this means that there could have been an internal query execution retry for the slow job due to a possible transient issue. If you see this difference pattern repeatedly, contact Cloud Customer Care for assistance.
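To compare wall execution times side by side, a query along these lines (job IDs are placeholders) computes the difference between end time and start time for both jobs from the INFORMATION_SCHEMA.JOBS view:

```sql
SELECT
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS wall_time_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_id IN ('fast_job_id', 'slow_job_id');
```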

Bytes processed

Review the bytes processed in the query job details page, or look at totalBytesProcessed from JobStatistics, to see if it is higher than expected. If there is a large discrepancy in bytes processed between the two queries, then the query might be slow due to a change in the volume of data processed. For information on optimizing queries to handle large data volumes, see Optimize query computation. The following reasons can cause an increase in the number of bytes processed by a query:

  • The size of the tables referenced by the query has increased.
  • The query is now reading a larger partition of the table.
  • The query references a view whose definition has changed.

Referenced tables

Check whether the queries read the same tables by analyzing the output of the referencedTables field in JobStatistics2. Differences in the referenced tables can be explained by the following:

  • The SQL query was modified to read different tables. Compare the query text to confirm this.
  • The view definition has changed between executions of the query. Check the definitions of the views referenced in this query and update them if necessary.

Differences in referenced tables could explain changes in totalBytesProcessed.
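One way to compare the referenced tables of the two jobs is to unnest the referenced_tables column of the INFORMATION_SCHEMA.JOBS view; the job IDs below are placeholders:

```sql
SELECT
  job_id,
  ref.project_id,
  ref.dataset_id,
  ref.table_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS,
  UNNEST(referenced_tables) AS ref
WHERE job_id IN ('fast_job_id', 'slow_job_id');
```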

Materialized view usage

If the query references any materialized views, differences in performance can be caused by materialized views being chosen or rejected during query execution. Inspect MaterializedViewStatistics to understand whether any materialized views used in the fast query were rejected in the slow query. Look at the chosen and rejectedReason fields in the MaterializedView object.

Metadata caching statistics

For queries that involve Amazon S3 BigLake tables or Cloud Storage BigLake tables with metadata caching enabled, compare the output of the MetadataCacheStatistics to check whether there is a difference in metadata cache usage between the slow and fast query, and the corresponding reasons. For example, the metadata cache might be outside the table's maxStaleness window.

Comparing BigQuery BI Engine statistics

If the query uses BigQuery BI Engine, analyze the output of BiEngineStatistics to determine whether the same acceleration modes were applied to both the slow and fast query. Look at the BiEngineReason field to understand the high-level reason for partial acceleration or no acceleration at all, such as not enough memory, a missing reservation, or input being too large.

Reviewing differences in query performance insights

Compare the query performance insights for each of the queries by looking at the Execution Graph in the Google Cloud console or the StagePerformanceStandaloneInsight object.

Pay attention to both the insights provided for the slow job and the differences from the insights produced for the fast job, to identify stage changes affecting performance.

A more thorough job execution metadata analysis requires going through the single stages of query execution by comparing the ExplainQueryStage objects for the two jobs.

To get started, look at the Wait ms and Shuffle output bytes metrics described in the interpret query stage information section.

Resource warnings from the INFORMATION_SCHEMA.JOBS view

Query the query_info.resource_warning field of the INFORMATION_SCHEMA.JOBS view to see if there is a difference in the warnings analyzed by BigQuery with respect to resources used.

Workload statistics analysis

Available slot resources and slot contention can affect query execution time. The following sections help you understand the slot usage and availability for a particular run of a query.

Average slots per second

To compute the average number of slots used per millisecond by the query, divide the slot-milliseconds value for the job, totalSlotMs from JobStatistics2, by the duration in milliseconds of the execution of the final attempt of this job, finalExecutionDurationMs from JobStatistics.

You can also compute the average number of slots per millisecond used by a job by querying the INFORMATION_SCHEMA.JOBS view.
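For example, a sketch like the following approximates the average slots for a job from the INFORMATION_SCHEMA.JOBS view; it uses the total job duration, so the result is only a reasonable approximation when the job was not retried internally:

```sql
SELECT
  job_id,
  SAFE_DIVIDE(
    total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_id = 'my_job_id';
```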

A job performing a similar amount of work with higher average slot usage per second completes faster. Lower average slot usage per second can be caused by the following:

  1. There were no additional resources available due to resource contention between different jobs, because the reservation was maxed out.
  2. The job didn't request more slots during a large part of the execution. For example, this can happen when there is data skew.

Workload management models and reservation size

If you use the on-demand billing model, the number of slots that you can use per project is limited. Your project might also occasionally have fewer slots available if there is a high amount of contention for on-demand capacity in a specific location.

The capacity-based model is more predictable and lets you specify a confirmed number of baseline slots.

Take these differences into account when comparing a query execution that uses on-demand capacity to a query execution that uses a reservation.

Using a reservation is recommended for stable, predictable query execution performance. For more information about the differences between on-demand and capacity-based workloads, see Introduction to workload management.

Job concurrency

Job concurrency represents the competition among jobs for slot resources during query execution. Higher job concurrency generally causes slower job execution because the job has access to fewer slots.

You can query the INFORMATION_SCHEMA.JOBS view to find the average number of concurrent jobs that are running at the same time as a particular query within a project.
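One possible sketch uses the INFORMATION_SCHEMA.JOBS_TIMELINE view, counting the distinct jobs running in each timeline period during the slow job's execution window; the timestamps below are placeholders for that window:

```sql
SELECT
  AVG(concurrent_jobs) AS average_concurrent_jobs
FROM (
  SELECT
    period_start,
    COUNT(DISTINCT job_id) AS concurrent_jobs
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
  WHERE
    period_start BETWEEN TIMESTAMP('2025-01-01 12:00:00')
      AND TIMESTAMP('2025-01-01 12:10:00')
    AND state = 'RUNNING'
  GROUP BY period_start
);
```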

If there is more than one project assigned to a reservation, modify the query to use JOBS_BY_ORGANIZATION instead of JOBS_BY_PROJECT to get accurate reservation-level data.

A higher average concurrency during the slow job execution compared to the fast job is a contributing factor to the overall slowness.

Consider reducing the concurrency within the project or reservation by spreading out resource-intensive queries, either over time within a reservation or project, or over different reservations or projects.

Another solution is to purchase a reservation or increase an existing reservation's size. Consider allowing the reservation to use idle slots.

To understand how many slots to add, read about estimating slot capacity requirements.

Jobs running in reservations with more than one project assigned can experience different slot assignment outcomes with the same average job concurrency depending on which project is running them. Read about fair scheduling to learn more.

Reservation utilization

Admin resource charts and BigQuery Cloud Monitoring can be used to monitor reservation utilization. For more information, see Monitor BigQuery reservations.

To understand whether a job requested any additional slots, look at the estimated runnable units metric, which is estimatedRunnableUnits from the Job API response, or period_estimated_runnable_units in the INFORMATION_SCHEMA.JOBS_TIMELINE view. If the value for this metric is more than 0, then the job could have benefited from additional slots at that time. To estimate the percentage of the job execution time where the job would have benefited from additional slots, run the following query against the INFORMATION_SCHEMA.JOBS_TIMELINE view:

SELECT
  ROUND(COUNTIF(period_estimated_runnable_units > 0) / COUNT(*) * 100, 1)
    AS execution_duration_percentage
FROM `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE job_id = 'my_job_id'
GROUP BY job_id;
The result is similar to the following:
+---------------------------------+
|   execution_duration_percentage |
+---------------------------------+
|                            96.7 |
+---------------------------------+

A low percentage means that slot resource availability is not a major contributor to the query slowness in this scenario.

If the percentage is high and the reservation wasn't fully utilized during this time, contact Cloud Customer Care to investigate.

If the reservation was fully utilized during the slow job execution and the percentage is high, the job was constrained on resources. Consider reducing concurrency, increasing the reservation size, allowing the reservation to use idle slots, or purchasing a reservation if the job was run on-demand.

Job metadata and workload analysis findings inconclusive

If you still can't find the reason to explain slower than expected query performance, contact Cloud Customer Care for assistance.

Troubleshoot query failure using gcpdiag

gcpdiag is an open source tool. It is not an officially supported Google Cloud product. You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.

The gcpdiag tool helps you analyze failed BigQuery queries to understand if there is a known root cause and mitigation for the specific failure.

Run the gcpdiag command

You can run the gcpdiag command from Google Cloud CLI:

Google Cloud console

  1. Complete and then copy the following command:

    gcpdiag runbook bigquery/failed_query \
        --parameter project_id=PROJECT_ID \
        --parameter bigquery_job_region=JOB_REGION \
        --parameter bigquery_job_id=JOB_ID \
        --parameter bigquery_skip_permission_check=SKIP_PERMISSION_CHECK

  2. Open the Google Cloud console and activate Cloud Shell.
  3. Paste the copied command.
  4. Run the gcpdiag command, which downloads the gcpdiag Docker image, and then performs diagnostic checks. If applicable, follow the output instructions to fix failed checks.

Docker

You can run gcpdiag using a wrapper that starts gcpdiag in a Docker container. Docker or Podman must be installed.

  1. Copy and run the following command on your local workstation.

    curl https://gcpdiag.dev/gcpdiag.sh > gcpdiag && chmod +x gcpdiag

  2. Execute the gcpdiag command.

    ./gcpdiag runbook bigquery/failed_query \
        --parameter project_id=PROJECT_ID \
        --parameter bigquery_job_region=JOB_REGION \
        --parameter bigquery_job_id=JOB_ID \
        --parameter bigquery_skip_permission_check=SKIP_PERMISSION_CHECK

View the available parameters for this runbook.

Replace the following:

  • PROJECT_ID: The ID of the project containing the resource.
  • JOB_REGION: The region where the BigQuery job was executed.
  • JOB_ID: The job identifier of the BigQuery job.
  • SKIP_PERMISSION_CHECK: (Optional) Set this to True if you want to skip the relevant permission check and speed up the runbook execution (default value is False).

Useful flags:

For a list and description of all gcpdiag tool flags, see the gcpdiag usage instructions.

Avro schema resolution

Error string: Cannot skip stream

This error can occur when loading multiple Avro files with different schemas, resulting in a schema resolution issue and causing the import job to fail at a random file.

To address this error, ensure that the last alphabetical file in the load job contains the superset (union) of the differing schemas. This is a requirement based on how Avro handles schema resolution.

Conflicting concurrent queries

Error string: Concurrent jobs in the same session are not allowed

This error can occur when multiple queries are running concurrently in a session, which is not supported. See session limitations.

Conflicting DML statements

Error string: Could not serialize access to table due to concurrent update

This error can occur when mutating data manipulation language (DML) statements that are running concurrently on the same table conflict with each other, or when the table is truncated during a mutating DML statement. For more information, see DML statement conflicts.

To address this error, run DML operations that affect a single table such that they don't overlap.

Correlated subqueries

Error string: Correlated subqueries that reference other tables are not supported unless they can be de-correlated

This error can occur when your query contains a subquery that references a column from outside that subquery, called a correlation column. The correlated subquery is evaluated using an inefficient, nested execution strategy, in which the subquery is evaluated for every row from the outer query that produces the correlation columns. Sometimes, BigQuery can internally rewrite queries with correlated subqueries so that they execute more efficiently. The correlated subqueries error occurs when BigQuery can't sufficiently optimize the query.

To address this error, try the following:

  • Remove any ORDER BY, LIMIT, EXISTS, NOT EXISTS, or IN clauses from your subquery.
  • Use a multi-statement query to create a temporary table to reference in your subquery.
  • Rewrite your query to use a CROSS JOIN instead.
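As an illustration of decorrelation, the following hypothetical pair of queries (table and column names are invented) shows a correlated subquery and an equivalent form that aggregates once and then joins on the former correlation column:

```sql
-- Correlated form: the subquery references t1.id from the outer query.
SELECT
  t1.id,
  (SELECT MAX(t2.value) FROM mydataset.t2 AS t2 WHERE t2.id = t1.id) AS max_value
FROM mydataset.t1 AS t1;

-- Decorrelated form: aggregate once, then join.
SELECT
  t1.id,
  agg.max_value
FROM mydataset.t1 AS t1
LEFT JOIN (
  SELECT id, MAX(value) AS max_value
  FROM mydataset.t2
  GROUP BY id
) AS agg
ON t1.id = agg.id;
```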

Insufficient column-level access control permissions

Error strings:

  • Access denied: Requires fineGrainedGet permission on the read columns to execute the DML statements
  • Access denied: User does not have permission to access policy tag projects/PROJECT_ID/locations/LOCATION/taxonomies/TAXONOMY_ID/policyTags/POLICY_TAG_ID on column PROJECT_ID.DATASET.TABLE.COLUMN.

These errors occur when you attempt to run a SQL query or a DML DELETE, UPDATE, or MERGE statement without being granted the Fine-Grained Reader role on the columns that use column-level access control. This role is assigned to principals as part of configuring a policy tag. For more information, see Impact on writes from column-level access control.

To work around this issue, modify the query to exclude columns with policy tags, or grant the Fine-Grained Reader role to the user. This role is assigned to principals as part of configuring a policy tag. For more information, see Update permissions on policy tags.

Troubleshoot scheduled queries

The following issues may arise during scheduled query configuration or execution.

Scheduled query triggers duplicate runs

A scheduled query might be triggered more than once at the scheduled time. This behavior is more likely to occur for queries scheduled exactly on the hour (such as 09:00). This can lead to unexpected outcomes, such as duplicate data if the query performs INSERT operations.

To minimize the risk of duplicate runs, schedule your queries at an off-the-hour time, such as a few minutes before or after the hour (for example, 08:58 or 09:03). For more information, see Scheduling queries.

Invalid credentials for scheduled queries

Error strings:

  • Error code: INVALID_USERID
  • Error code 5: Authentication failure: User Id not found
  • PERMISSION_DENIED: BigQuery: Permission denied while getting Drive credentials

This error can occur when a scheduled query fails due to having outdated credentials, especially when querying Google Drive data.

To address this error, update the credentials of the scheduled query.

Invalid service account credentials

Error string: HttpError 403 when requesting returned: The caller does not have permission

This error might appear when you attempt to set up a scheduled query with a service account. To resolve this error, see the troubleshooting steps in Authorization and permission issues.

Invalid snapshot time

Error string: Invalid snapshot time

This error can occur when trying to query historical data that is outside of the time travel window for the dataset. To address this error, change the query to access historical data within the dataset's time travel window.

This error can also appear if one of the tables used in the query is dropped and re-created after the query starts. Check to see if there is a scheduled query or application that performs this operation that ran at the same time as the failed query. If there is, try moving the process that performs the drop and re-create operation to run at a time that doesn't conflict with queries that read that table.

Job already exists

Error string: Already Exists: Job <job name>

This error can occur for query jobs that must evaluate large arrays, such that it takes longer than average to create a query job. For example, a query with a WHERE clause like WHERE column IN (<2000+ elements array>).

This error can also occur when you manually set a job ID but the job doesn't return success within a timeout period. In this case, you can add an exception handler to check if the job exists. If it does, then you can pull the query results from the job.

Job not found

Error string: Job not found

This error can occur in response to a getQueryResults call where no value is specified for the location field. If that is the case, try the call again and provide a location value.

For more information, see Avoid multiple evaluations of the same Common Table Expressions (CTEs).

Location not found

Error string: Dataset [project_id]:[dataset_id] was not found in location [region]

This error is returned when you refer to a dataset resource that doesn't exist, or when the location in the request does not match the location of the dataset.

To address this issue, specify the location of the dataset in the query or confirm that the dataset is available in the same location.

Query exceeds execution time limit

Error string: Query fails due to reaching the execution time limit

If your query is hitting the query execution time limit, check the execution time of previous runs of the query by querying the INFORMATION_SCHEMA.JOBS view with a query similar to the following example:

SELECT
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS runtime_in_seconds
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE statement_type = 'QUERY'
  AND query = "my query string";

If previous runs of the query have taken significantly less time, use query performance insights to determine and address the underlying issue.

Query response is too large

Error string: responseTooLarge

This error occurs when your query's results are larger than the maximum response size.

To address this error, follow the guidance provided for the responseTooLarge error message in the error table.

Reservation not found or is missing slots

Error string: Cannot run query: project does not have the reservation in the data region or no slots are configured

This error occurs when the reservation assigned to the project in the query's region has zero slots assigned. You can either add slots to the reservation, allow the reservation to use idle slots, use a different reservation, or remove the assignment and run the query on-demand.

Table not found

Error string: Not found: Table [project_id]:[dataset].[table_name] was not found in location [region]

This error occurs when a table in your query can't be found in the dataset or region that you specified. To address this error, do the following:

  • Check that your query contains the correct project, dataset, and table name.
  • Check that the table exists in the region in which you ran the query.
  • Make sure that the table wasn't dropped and recreated during the execution of the job. Otherwise, incomplete metadata propagation can cause this error.

Too many DML statements

Error string: Too many DML statements outstanding against <table-name>, limit is 20

This error occurs when you exceed the limit of 20 DML statements in PENDING status in a queue for a single table. This error usually occurs when you submit DML jobs against a single table faster than BigQuery can process them.

One possible solution is to group multiple smaller DML operations into larger but fewer jobs, for example by batching updates and inserts. When you group smaller jobs into larger ones, the cost to run the larger jobs is amortized and the execution is faster. Consolidating DML statements that affect the same data generally improves the efficiency of DML jobs, and is less likely to exceed the queue size quota limit. For more information about optimizing your DML operations, see Avoid DML statements that update or insert single rows.
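As a hypothetical illustration (table names are invented), batching replaces many single-row statements with one statement, and a MERGE from a staging table consolidates mixed updates and inserts into a single job:

```sql
-- Instead of many single-row INSERT statements, batch the rows:
INSERT INTO mydataset.target (id, value)
VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- Consolidate updates and inserts into one MERGE from a staging table:
MERGE mydataset.target AS t
USING mydataset.staging AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value);
```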

Other solutions to improve your DML efficiency include partitioning or clustering your tables. For more information, see Best practices.

Transaction aborted due to concurrent update

Error string: Transaction is aborted due to concurrent update against table [table_name]

This error can occur when two different mutating DML statements attempt to concurrently update the same table. For example, suppose you start a transaction in a session that contains a mutating DML statement followed by an error. If there is no exception handler, then BigQuery automatically rolls back the transaction when the session ends, which takes up to 24 hours. During this time, other attempts to run a mutating DML statement on the table fail.

To address this error, list your active sessions and check whether any of them contains a query job with status ERROR that ran a mutating DML statement on the table. Then, terminate that session.
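As a sketch, the following statements list recent sessions through the INFORMATION_SCHEMA.SESSIONS_BY_PROJECT view and then terminate one by its ID; the session ID is a placeholder:

```sql
-- List sessions created in the last day.
SELECT
  session_id,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.SESSIONS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);

-- Terminate the session that holds the aborted transaction.
CALL BQ.ABORT_SESSION('my_session_id');
```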

User does not have permission

Error strings:

  • Access Denied: Project [project_id]: User does not have bigquery.jobs.create permission in project [project_id].
  • User does not have permission to query table project-id:dataset.table.
  • Access Denied: User does not have permission to query table or perhaps it does not exist.

These errors can occur when you run a query without the bigquery.jobs.create permission on the project from which you are running the query, regardless of your permissions on the project that contains the data.

You may also receive these errors if your service account, user, or group doesn't have the bigquery.tables.getData permission on all tables and views that your query references. For more information about the permissions required for running a query, see Required roles.

These errors can also occur if the table does not exist in the queried region, such as asia-south1. You can verify the region by examining the dataset location.

When addressing these errors, consider the following:

  • Service accounts: Service accounts must have the bigquery.jobs.create permission on the project from which they run, and they must have the bigquery.tables.getData permission on all tables and views that are referenced by the query.

  • Custom roles: Custom IAM roles must have the bigquery.jobs.create permission explicitly included in the relevant role, and they must have the bigquery.tables.getData permission on all tables and views that are referenced by the query.

  • Shared datasets: When working with shared datasets in a separate project, you might still need the bigquery.jobs.create permission in the project to run queries or jobs in that dataset.

To give permission to access a table or view, see Grant access to a table or view.

Access denied by organization policy

Error string: IAM setPolicy failed for Dataset DATASET: Operation denied by org policy on resource.

This error occurs when an organizational policy prevents the principal from querying a BigQuery resource. The Organization Policy Service lets you enforce constraints on supported resources across your organization hierarchy.

If the principal should have access to the resource, you'll need to use the available VPC troubleshooting tools to diagnose the issue with your organization policy.

Resources exceeded issues

The following issues result when BigQuery has insufficientresources to complete your query.

Query exceeds CPU resources

Error string: Query exceeded resource limits

This error occurs when on-demand queries use too much CPU relative to the amount of data scanned. For information on how to resolve these issues, see Troubleshoot resources exceeded issues.

Query exceeds memory resources

Error string: Resources exceeded during query execution: The query could not be executed in the allotted memory

For SELECT statements, this error occurs when the query uses too many resources. To address this error, see Troubleshoot resources exceeded issues.

Out of stack space

Error string: Out of stack space due to deeply nested query expression during query resolution.

This error can occur when a query contains too many nested function calls. Sometimes, parts of a query are translated to function calls during parsing. For example, an expression with repeated concatenation operators, such as A || B || C || ..., becomes CONCAT(A, CONCAT(B, CONCAT(C, ...))).

To address this error, rewrite your query to reduce the amount of nesting.

Resources exceeded during query execution

Error string: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: [percentage]% of limit. Top memory consumer(s): ORDER BY operations.

This can happen with ORDER BY ... LIMIT ... OFFSET ... queries. Due to implementation details, the sorting might occur on a single compute unit, which can run out of memory if it needs to process too many rows before the LIMIT and OFFSET are applied, particularly with a large OFFSET.

To address this error, avoid large OFFSET values in ORDER BY ... LIMIT queries. Alternatively, use the scalable ROW_NUMBER() window function to assign ranks based on the chosen order, and then filter these ranks in a WHERE clause. For example:

SELECT ...
FROM (
  SELECT
    ROW_NUMBER() OVER (ORDER BY ...) AS rn
  FROM ...
)
WHERE rn > @start_index AND rn <= @page_size + @start_index  -- note that ROW_NUMBER() starts with 1

Query exceeds shuffle resources

Error string: Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations

This error occurs when a query can't access sufficient shuffle resources.

To address this error, provision more slots or reduce the amount of data processed by the query. For more information about ways to do this, see Insufficient shuffle quota.

For additional information on how to resolve these issues, see Troubleshoot resources exceeded issues.

Query is too complex

Error string: Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex

This error occurs when a query is too complex. The primary causes of complexity are:

  • WITH clauses that are deeply nested or used repeatedly.
  • Views that are deeply nested or used repeatedly.
  • Repeated use of the UNION ALL operator.

To address this error, try the following options:

  • Split the query into multiple queries, then use procedural language to run those queries in a sequence with shared state.
  • Use temporary tables instead of WITH clauses.
  • Rewrite your query to reduce the number of referenced objects and comparisons.

You can proactively monitor queries that are approaching the complexity limit by using the query_info.resource_warning field in the INFORMATION_SCHEMA.JOBS view. The following example returns queries with high resource usage for the last three days:

SELECT
  ANY_VALUE(query) AS query,
  MAX(query_info.resource_warning) AS resource_warning
FROM <your_project_id>.`region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
  AND query_info.resource_warning IS NOT NULL
GROUP BY query_info.query_hashes.normalized_literals
LIMIT 1000

For additional information on how to resolve these issues, see Troubleshoot resources exceeded issues.

Troubleshoot resources exceeded issues

For query jobs:

To optimize your queries, try the following steps:

  • Try removing an ORDER BY clause.
  • If your query uses JOIN, ensure that the larger table is on the left side of the clause. Also ensure that your data does not contain duplicate join keys.
  • If your query uses FLATTEN, determine if it's necessary for your use case. For more information, see nested and repeated data.
  • If your query uses EXACT_COUNT_DISTINCT, consider using COUNT(DISTINCT) instead.
  • If your query uses COUNT(DISTINCT <value>, <n>) with a large <n> value, consider using GROUP BY instead. For more information, see COUNT(DISTINCT).
  • If your query uses UNIQUE, consider using GROUP BY instead, or a window function inside of a subselect.
  • If your query materializes many rows using a LIMIT clause, consider filtering on another column, for example ROW_NUMBER(), or removing the LIMIT clause altogether to allow write parallelization.
  • If your query uses deeply nested views and a WITH clause, this can cause an exponential growth in complexity, thereby reaching the limits.
  • Use temporary tables instead of WITH clauses. A WITH clause might have to be recalculated several times, which can make the query complex and therefore slow. Persisting intermediate results in temporary tables instead reduces complexity.
  • Avoid using UNION ALL queries.
  • If your query uses MATCH_RECOGNIZE, modify the PARTITION BY clause to reduce the size of the partitions, or add a PARTITION BY clause if one doesn't exist.
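The temporary-table suggestion can be sketched as a multi-statement query (table and column names are invented): the intermediate result is computed once and persisted, instead of potentially being recalculated by a repeated WITH clause:

```sql
-- Persist the intermediate result once.
CREATE TEMP TABLE intermediate AS
SELECT id, SUM(amount) AS total
FROM mydataset.huge_table
GROUP BY id;

-- Reference it as many times as needed without recomputation.
SELECT *
FROM intermediate
WHERE total > 100;
```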


For load jobs:

If you are loading Avro or Parquet files, reduce the row size in the files. Check for specific size restrictions for the file format that you are loading.

If you get this error when loading ORC files, contact Support.

For Storage API:

Error string:Stream memory usage exceeded

During a Storage Read API ReadRows call, some streams with high memory usage might get a RESOURCE_EXHAUSTED error with this message. This can happen when reading from wide tables or tables with a complex schema. As a resolution, reduce the result row size by selecting fewer columns to read (using the selected_fields parameter), or by simplifying the table schema.

Troubleshoot connectivity problems

The following sections describe how to troubleshoot connectivity issues when trying to interact with BigQuery:

Allowlist Google DNS

Use the Google IP Dig tool to resolve the BigQuery DNS endpoint bigquery.googleapis.com to a single 'A' record IP. Make sure this IP is not blocked in your firewall settings.

In general, we recommend allowlisting Google DNS names rather than IP ranges, because the IP ranges shared in the https://www.gstatic.com/ipranges/goog.json and https://www.gstatic.com/ipranges/cloud.json files change often. Here is a list of common DNS names we recommend adding to the allowlist:

  • *.1e100.net
  • *.google.com
  • *.gstatic.com
  • *.googleapis.com
  • *.googleusercontent.com
  • *.appspot.com
  • *.gvt1.com

Identify the proxy or firewall dropping packets

To identify all packet hops between the client and the Google Front End (GFE), run a traceroute command on your client machine, which can highlight the server that is dropping packets directed towards the GFE. Here is a sample traceroute command:

traceroute -T -p 443 bigquery.googleapis.com

It is also possible to identify packet hops for specific GFE IP addresses if the problem is related to a particular IP address:

traceroute -T -p 443 142.250.178.138

If there's a Google-side timeout issue, you'll see the request make it all the way to the GFE.

If you see that the packets never reach the GFE, reach out to your network administrator to resolve this problem.

Generate a PCAP file and analyze your firewall or proxy

Generate a packet capture file (PCAP) and analyze the file to make sure the firewall or proxy is not filtering out packets to Google IPs and is allowing packets to reach the GFE.

Here is a sample command that can be run with the tcpdump tool:

tcpdump -s 0 -w debug.pcap -K -n host bigquery.googleapis.com

Set up retries for intermittent connectivity problems

There are situations in which GFE load balancers might drop connections from a client IP, for example if they detect DDoS traffic patterns, or if a load balancer instance is being scaled down, which can result in the endpoint IP being recycled. If the GFE load balancers drop the connection, the client needs to catch the timed-out request and retry it against the DNS endpoint. Don't retry against the same IP address until the request succeeds, because the IP address might have changed.

If you've identified an issue with consistent Google-side timeouts where retries don't help, contact Cloud Customer Care and make sure to include a fresh PCAP file generated by running a packet capturing tool like tcpdump.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.