Query plan and timeline
Embedded within query jobs, BigQuery includes diagnostic query plan and timing information. This is similar to the information provided by statements such as `EXPLAIN` in other database and analytical systems. This information can be retrieved from the API responses of methods such as `jobs.get`.
For long running queries, BigQuery will periodically update these statistics. These updates happen independently of the rate at which the job status is polled, but typically won't happen more frequently than every 30 seconds. Additionally, query jobs that don't use execution resources, such as dry run requests or results that can be served from cached results, won't include the additional diagnostic information, though other statistics may be present.
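As a hedged sketch of working with this information, the snippet below pulls per-stage status out of a `jobs.get`-style response. The `job` dict is a hypothetical excerpt; a real response comes from the BigQuery API (for example, via a client library's get-job call) and carries many more fields.

```python
# Hypothetical excerpt of a jobs.get response body. Field names follow the
# BigQuery Jobs API layout (statistics.query.queryPlan).
job = {
    "statistics": {
        "query": {
            "queryPlan": [
                {"name": "S00: Input", "status": "COMPLETE"},
                {"name": "S01: Output", "status": "RUNNING"},
            ]
        }
    }
}

def stage_statuses(job):
    """Return (name, status) pairs for each stage in the query plan."""
    plan = job.get("statistics", {}).get("query", {}).get("queryPlan", [])
    return [(stage["name"], stage["status"]) for stage in plan]

print(stage_statuses(job))  # [('S00: Input', 'COMPLETE'), ('S01: Output', 'RUNNING')]
```

Because plan statistics are refreshed periodically, a polling loop would re-fetch the job and re-read `queryPlan` until the job reaches a done state.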
Background
When BigQuery executes a query, it converts the SQL into an execution graph that consists of stages. Stages are composed of steps, the elemental operations that perform the query's logic. BigQuery leverages a heavily distributed parallel architecture that executes stages in parallel for reduced latency. Stages communicate with one another by using shuffle, a fast distributed memory architecture.
The query plan uses the terms work units and workers to describe stage parallelism. Elsewhere within BigQuery, you may encounter the term slot, which is an abstracted representation of multiple facets of query execution, including compute, memory, and I/O resources. Slots execute the individual work units of a stage in parallel. Top-level job statistics provide individual query cost using `totalSlotMs` based on this abstracted accounting.
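One common, informal reading of `totalSlotMs` is sketched below: dividing cumulative slot-milliseconds by wall-clock milliseconds approximates the query's average parallelism. This is a back-of-the-envelope heuristic, not an official BigQuery metric.

```python
def average_slots(total_slot_ms: int, elapsed_ms: int) -> float:
    """Approximate average number of concurrently used slots.

    totalSlotMs is cumulative slot-milliseconds across all workers;
    dividing by wall-clock elapsed milliseconds gives a rough average.
    """
    return total_slot_ms / elapsed_ms

# A query that accrued 1,200,000 slot-ms over 60,000 ms (one minute)
# of wall-clock time used about 20 slots on average.
print(average_slots(1_200_000, 60_000))  # 20.0
```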
Another important property of query execution is that BigQuery can modify the query plan while a query is running. For example, BigQuery introduces repartition stages to improve data distribution between query workers, which improves parallelism and reduces query latency.
In addition to the query plan, query jobs also expose a timeline of execution, which provides an accounting of units of work completed, pending, and active. A query can have multiple stages with active workers simultaneously, and the timeline is intended to show overall progress of the query.
View the execution graph with the Google Cloud console
In the Google Cloud console, you can see details of the query plan for a completed query by clicking the Execution Details button.

Query plan information
Within the API response, query plans are represented as a list of query stages. Each item in the list shows per-stage overview statistics, detailed step information, and stage timing classifications. Not all details are rendered within the Google Cloud console, but they can all be present within the API responses.
Understand the execution graph
In the Google Cloud console, you can see the query plan details by clicking theExecution graph tab.

The Execution graph panel is organized as follows:

- In the center is the execution graph. It shows stages as nodes and shuffle memory exchanged between stages as edges.
- The left panel has the query text heatmap. It shows the main query text that the query executed along with any referenced views.
- The right panel has the query or stage details.
Navigating the execution graph
The execution graph applies a color scheme to the nodes in the graph based on slot-time, where nodes with a darker red take more slot-time relative to the rest of the stages in the graph.
To navigate around the execution graph, you can:
- Click and hold on the graph background to pan to different areas of the graph.
- Use the mouse scroll wheel to zoom in and out of the graph.
- Click and hold on the minimap on the top-right to pan to different areas of the graph.
Clicking on a stage in the graph shows the selected stage's details. The stage details have:
- Statistics. See Stage overview for details about the statistics.
- Step details. Steps describe the individual operations that execute the query's logic.
Step details
Stages are made up of steps, the individual operations that execute the query's logic. Steps have substeps that describe what the step did in pseudocode. Substeps use variables to describe relationships between steps. Variables begin with a dollar sign followed by a unique number. Variable numbers are not shared across stages.
The following image shows a stage's steps:

Here is an example of a stage's steps:
```
READ
  $30:l_orderkey, $31:l_quantity
  FROM lineitem
AGGREGATE
  GROUP BY $100 := $30
  $70 := SUM($31)
WRITE
  $100, $70
  TO __stage00_output
  BY HASH($100)
```
The example's steps describe the following:
- The stage read the columns l_orderkey and l_quantity from the table lineitem and stored the values in the variables $30 and $31, respectively.
- The stage aggregated the variables $30 and $31, storing aggregations into the variables $100 and $70, respectively.
- The stage wrote the results of the variables $100 and $70 to shuffle. The stage partitioned the results in shuffle memory based on a hash of $100.
See Interpret and optimize steps for full details about the types of steps and how to optimize them.
BigQuery might truncate substeps when the query's execution graph is complex enough that providing complete substeps would cause payload size issues when retrieving query information.
Query text heatmap
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
To request feedback or support for this feature, send an email to bq-performance-troubleshooting+feedback@google.com.
BigQuery can map some stage steps to portions of query text. The query text heatmap shows all query text that corresponds to stage steps. It highlights the query text based on the total slot-time of stages whose steps have mapped query text.
The following image shows the highlighted query text:

Holding the pointer over a mapped portion of the query text shows a tooltip that lists all stage steps that map to the query text along with stage slot-time. Clicking on a mapped query text selects the stage in the execution graph and opens the stage details in the right panel.

A single portion of query text can map to multiple stages. The tooltip lists each mapped stage and its slot-time. Clicking on the query text highlights the corresponding stages and grays out the rest of the graph. Subsequently clicking on a specific stage shows its details.
The following image shows how the query text relates to the step details:

In a stage's Step details section, if a step maps to query text, the step has a code icon. Clicking on the code icon highlights the mapped portion of query text on the left.
It is important to keep in mind that the heatmap color is based on the entire stage's slot-time. Because BigQuery does not measure the slot-time of steps, the heatmap does not represent the actual slot-time for that specific portion of mapped query text. In most cases, a stage performs just a single complex step, like a join or aggregate, so the heatmap color is appropriate. However, when a stage is made up of steps that perform multiple complex operations, the heatmap color may overrepresent the actual slot-time. In such cases, it is important to understand the other steps that make up the stage to get a more complete understanding of the query's performance.
If a query uses views, and the stage's steps have mappings to a view's query text, then the query text heatmap shows the view's name and the view's query text with their mappings. However, if the view is deleted, or if you lose the `bigquery.tables.get` IAM permission on the view, then the query text heatmap doesn't show the stage steps' mappings for the view.
Stage overview
The overview fields for each stage can include the following:
| API field | Description |
|---|---|
| id | Unique numeric ID for the stage. |
| name | Simple summary name for the stage. The steps within the stage provide additional details about execution steps. |
| status | Execution status of the stage. Possible states include PENDING, RUNNING, COMPLETE, FAILED, and CANCELLED. |
| inputStages | A list of the IDs that form the dependency graph of the stage. For example, a JOIN stage often needs two dependent stages that prepare the data on the left and right side of the JOIN relationship. |
| startMs | Timestamp, in epoch milliseconds, that represents when the first worker within the stage began execution. |
| endMs | Timestamp, in epoch milliseconds, that represents when the last worker completed execution. |
| steps | More detailed list of execution steps within the stage. See the next section for more information. |
| recordsRead | Input size of the stage as the number of records, across all stage workers. |
| recordsWritten | Output size of the stage as the number of records, across all stage workers. |
| parallelInputs | Number of parallelizable units of work for the stage. Depending on the stage and query, this may represent the number of columnar segments within a table, or the number of partitions within an intermediate shuffle. |
| completedParallelInputs | Number of units of work within the stage that were completed. For some queries, not all inputs within a stage need to be completed for the stage to complete. |
| shuffleOutputBytes | Total bytes written across all workers within a query stage. |
| shuffleOutputBytesSpilled | Queries that transmit significant data between stages may need to fall back to disk-based transmission. This statistic communicates how much data was spilled to disk. It depends on an optimization algorithm, so it is not deterministic for any given query. |
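As a small illustration of reading these fields, the sketch below flags stages whose shuffle output spilled to disk. The two-stage `plan` list is hypothetical; in a real response it would be the `queryPlan` array, where these numeric fields arrive serialized as JSON strings.

```python
def spilled_stages(query_plan):
    """Names of stages whose shuffle output spilled to disk.

    The API serializes these numeric fields as strings, so convert
    before comparing."""
    return [
        stage["name"]
        for stage in query_plan
        if int(stage.get("shuffleOutputBytesSpilled", "0")) > 0
    ]

# Hypothetical two-stage plan excerpt.
plan = [
    {"name": "S00: Input", "shuffleOutputBytesSpilled": "0"},
    {"name": "S01: Join+", "shuffleOutputBytesSpilled": "1048576"},
]
print(spilled_stages(plan))  # ['S01: Join+']
```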
Per-stage timing classification
The query stages provide stage timing classifications, in both relative and absolute form. As each stage of execution represents work undertaken by one or more independent workers, information is provided in both average and worst-case times. These times represent the average performance for all workers in a stage as well as the long-tail slowest worker performance for a given classification. The average and max times are furthermore broken down into absolute and relative representations. For ratio-based statistics, the data is provided as a fraction of the longest time spent by any worker in any segment.
The Google Cloud console presents stage timing using the relative timing representations.
The stage timing information is reported as follows:
| Relative timing | Absolute timing | Ratio numerator |
|---|---|---|
| waitRatioAvg | waitMsAvg | Time the average worker spent waiting to be scheduled. |
| waitRatioMax | waitMsMax | Time the slowest worker spent waiting to be scheduled. |
| readRatioAvg | readMsAvg | Time the average worker spent reading input data. |
| readRatioMax | readMsMax | Time the slowest worker spent reading input data. |
| computeRatioAvg | computeMsAvg | Time the average worker spent CPU bound. |
| computeRatioMax | computeMsMax | Time the slowest worker spent CPU bound. |
| writeRatioAvg | writeMsAvg | Time the average worker spent writing output data. |
| writeRatioMax | writeMsMax | Time the slowest worker spent writing output data. |
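A simple way to use these fields is to ask which phase dominated a stage for the average worker. The sketch below compares the four `*MsAvg` fields; the sample `stage` dict reuses the values from the example job response later on this page.

```python
def dominant_phase(stage):
    """Phase in which the average worker spent the most time."""
    phases = {
        "wait": int(stage["waitMsAvg"]),
        "read": int(stage["readMsAvg"]),
        "compute": int(stage["computeMsAvg"]),
        "write": int(stage["writeMsAvg"]),
    }
    return max(phases, key=phases.get)

# Sample stage values (see the API sample representation section).
stage = {"waitMsAvg": "1", "readMsAvg": "8",
         "computeMsAvg": "25", "writeMsAvg": "2"}
print(dominant_phase(stage))  # compute
```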
Step overview
Steps contain the operations that each worker within a stage executes, presented as an ordered list of operations. Each step operation has a category, with some operations providing more detailed information. The operation categories present in the query plan include the following:
| Step category | Description |
|---|---|
| READ | A read of one or more columns from an input table or from intermediate shuffle. Only the first sixteen columns that are read are returned in the step details. |
| WRITE | A write of one or more columns to an output table or to intermediate shuffle. For HASH-partitioned outputs from a stage, this also includes the columns used as the partition key. |
| COMPUTE | Expression evaluation and SQL functions. |
| FILTER | Used by WHERE, OMIT IF, and HAVING clauses. |
| SORT | ORDER BY operation that includes the column keys and the sort order. |
| AGGREGATE | Implements aggregations for clauses like GROUP BY or COUNT, among others. |
| LIMIT | Implements the LIMIT clause. |
| JOIN | Implements joins for clauses like JOIN, among others; includes the join type and possibly the join conditions. |
| ANALYTIC_FUNCTION | An invocation of a window function (also known as an "analytic function"). |
| USER_DEFINED_FUNCTION | An invocation of a user-defined function. |
Interpret and optimize steps
The following sections explain how to interpret the steps in a query plan and provide ways to optimize your queries.
READ step
The `READ` step means that a stage is accessing data for processing. Data can be read directly from the tables referenced in a query, or from shuffle memory. When data from a previous stage is read, BigQuery reads data from shuffle memory. The amount of data scanned impacts cost when using on-demand slots and affects performance when using reservations.
Potential performance issues
- Large scan of an unpartitioned table: if the query only needs a small portion of the data, then this might indicate that a table scan is inefficient. Partitioning could be a good optimization strategy.
- Scan of a large table with a small filter ratio: this suggests that the filter isn't effectively reducing the data scanned. Consider revising the filter conditions.
- Shuffle bytes spilled over to disk: this suggests that the data isn't stored effectively using optimization techniques such as clustering, which could maintain similar data in clusters.
Optimize
- Targeted filtering: use `WHERE` clauses strategically to filter out irrelevant data as early as possible in the query. This reduces the amount of data that needs to be processed by the query.
- Partitioning and clustering: BigQuery uses table partitioning and clustering to efficiently locate specific data segments. Ensure your tables are partitioned and clustered based on your typical query patterns to minimize data scanned during `READ` steps.
- Select relevant columns: avoid using `SELECT *` statements. Instead, select specific columns or use `SELECT * EXCEPT` to avoid reading unnecessary data.
- Materialized views: materialized views can precompute and store frequently used aggregations, potentially reducing the need to read base tables during `READ` steps for queries that use those views.
COMPUTE step
In the `COMPUTE` step, BigQuery performs the following actions on your data:
- Evaluates expressions in the query's `SELECT`, `WHERE`, `HAVING`, and other clauses, including calculations, comparisons, and logical operations.
- Executes built-in SQL functions and user-defined functions.
- Filters rows of data based on conditions in the query.
Optimize
The query plan can reveal bottlenecks within the `COMPUTE` step. Look for stages with extensive computations or a high number of rows processed.
- Correlate the `COMPUTE` step with data volume: if a stage shows significant computation and processes a large volume of data, then it might be a good candidate for optimization.
- Skewed data: for stages where the compute maximum is significantly higher than the compute average, this indicates that the stage spent a disproportionate amount of time processing a few slices of data. Consider looking at the distribution of data to see if there is data skew.
- Consider data types: use appropriate data types for your columns. For example, using integers, datetimes, and timestamps instead of strings can improve performance.
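The max-versus-average comparison above can be mechanized against the stage timing fields. A minimal sketch, with an illustrative threshold that is not a BigQuery-defined constant:

```python
def looks_skewed(stage, ratio=2.0):
    """True when the slowest worker's compute time is at least `ratio`
    times the average worker's -- a common symptom of data skew.
    The default ratio of 2.0 is an illustrative choice."""
    avg = int(stage["computeMsAvg"])
    worst = int(stage["computeMsMax"])
    return avg > 0 and worst / avg >= ratio

print(looks_skewed({"computeMsAvg": "100", "computeMsMax": "900"}))  # True
print(looks_skewed({"computeMsAvg": "100", "computeMsMax": "110"}))  # False
```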
WRITE step
`WRITE` steps happen for intermediate data and final output.
- Writing to shuffle memory: in a multi-stage query, the `WRITE` step often involves sending the processed data to another stage for further processing. This is typical for shuffle memory, which combines or aggregates data from multiple sources. The data written during this stage is typically an intermediate result, not the final output.
- Final output: the query result is written to either the destination or a temporary table.
Hash partitioning
When a stage in the query plan writes data to a hash-partitioned output, BigQuery writes the columns included in the output and the column chosen as the partition key.
Optimize
While the `WRITE` step itself might not be directly optimized, understanding its role can help you identify potential bottlenecks in earlier stages:
- Minimize data written: focus on optimizing preceding stages with filtering and aggregation to reduce the amount of data written during this step.
- Partitioning: writing benefits greatly from table partitioning. If the data that you write is confined to specific partitions, then BigQuery can perform faster writes. If the DML statement has a `WHERE` clause with a static condition against a table partition column, then BigQuery only modifies the relevant table partitions.
- Denormalization trade-offs: denormalization can sometimes lead to smaller result sets in the intermediate `WRITE` step. However, there are drawbacks such as increased storage usage and data consistency challenges.
JOIN step
In the `JOIN` step, BigQuery combines data from two data sources. Joins can include join conditions. Joins are resource intensive. When joining large data in BigQuery, the join keys are shuffled independently to line up on the same slot, so that the join is performed locally on each slot.
The query plan for the `JOIN` step typically reveals the following details:
- Join pattern: this indicates the type of join used. Each type defines how many rows from the joined tables are included in the result set.
- Join columns: these are the columns used to match rows between the sources of data. The choice of columns is crucial for join performance.
Join patterns
- Broadcast join: when one table, typically the smaller one, can fit in memory on a single worker node or slot, BigQuery can broadcast it to all other nodes to perform the join efficiently. Look for `JOIN EACH WITH ALL` in the step details.
- Hash join: when tables are large or a broadcast join isn't suitable, a hash join might be used. BigQuery uses hash and shuffle operations to shuffle the left and right tables so that the matching keys end up in the same slot to perform a local join. Hash joins are an expensive operation since the data needs to be moved, but they enable efficient matching of rows across hashes. Look for `JOIN EACH WITH EACH` in the step details.
- Self join: a SQL antipattern in which a table is joined to itself.
- Cross join: a SQL antipattern that can cause significant performance issues because it generates larger output data than the inputs.
- Skewed join: the data distribution across the join key in one table is very skewed and can lead to performance issues. Look for cases where the max compute time is much greater than the average compute time in the query plan. For more information, see High cardinality join and Partition skew.
Debugging
- Large data volume: if the query plan shows a significant amount of data processed during the `JOIN` step, investigate the join condition and join keys. Consider filtering or using more selective join keys.
- Skewed data distribution: analyze the data distribution of join keys. If one table is very skewed, explore strategies such as splitting the query or prefiltering.
- High cardinality joins: joins that produce significantly more rows than the number of left and right input rows can drastically reduce query performance. Avoid joins that produce a very large number of rows.
- Incorrect ordering of tables: ensure you've chosen the appropriate join type, such as `INNER` or `LEFT`, and ordered tables from largest to smallest based on your query's requirements.
Optimize
- Selective join keys: for join keys, use `INT64` instead of `STRING` when possible. `STRING` comparisons are slower than `INT64` comparisons because they compare each character in a string. Integers only require a single comparison.
- Filter before joining: apply `WHERE` clause filters on individual tables before the join. This reduces the amount of data involved in the join operation.
- Avoid functions on join columns: avoid calling functions on join columns. Instead, standardize your table data during the ingestion or post-ingestion process using ELT SQL pipelines. This approach eliminates the need to modify join columns dynamically, which enables more efficient joins without compromising data integrity.
- Avoid self joins: self-joins are commonly used to compute row-dependent relationships. However, self-joins can potentially quadruple the number of output rows, leading to performance issues. Instead of relying on self-joins, consider using window (analytic) functions.
- Large tables first: even though the SQL query optimizer can determine which table should be on which side of the join, order your joined tables appropriately. The best practice is to place the largest table first, followed by the smallest, and then by decreasing size.
- Denormalization: in some cases, strategically denormalizing tables (adding redundant data) can eliminate joins altogether. However, this approach comes with storage and data consistency trade-offs.
- Partitioning and clustering: partitioning tables based on join keys and clustering colocated data can significantly speed up joins by letting BigQuery target relevant data partitions.
- Optimizing skewed joins: to avoid performance issues associated with skewed joins, pre-filter data from the table as early as possible or split the query into two or more queries.
AGGREGATE step
In the `AGGREGATE` step, BigQuery aggregates and groups data.
Debugging
- Stage details: check the number of input rows to and output rows from the aggregation, and the shuffle size, to determine how much data reduction the aggregate step achieved and whether data shuffling was involved.
- Shuffle size: a large shuffle size might indicate that a significant amount of data was moved across worker nodes during the aggregation.
- Check data distribution: ensure the data is evenly distributed across partitions. Skewed data distribution can lead to imbalanced workloads in the aggregate step.
- Review aggregations: analyze the aggregation clauses to confirm they are necessary and efficient.
Optimize
- Clustering: cluster your tables on columns frequently used in `GROUP BY`, `COUNT`, or other aggregation clauses.
- Partitioning: choose a partitioning strategy that aligns with your query patterns. Consider using ingestion-time partitioned tables to reduce the amount of data scanned during the aggregation.
- Aggregate earlier: if possible, perform aggregations earlier in the query pipeline. This can reduce the amount of data that needs to be processed during the aggregation.
- Shuffling optimization: if shuffling is a bottleneck, explore ways to minimize it. For example, denormalize tables or use clustering to colocate relevant data.
Edge cases
- DISTINCT aggregates: queries with `DISTINCT` aggregates can be computationally expensive, especially on large datasets. Consider alternatives like `APPROX_COUNT_DISTINCT` for approximate results.
- Large number of groups: if the query produces a vast number of groups, it might consume a substantial amount of memory. In such cases, think about limiting the number of groups or using a different aggregation strategy.
REPARTITION step
Both `REPARTITION` and `COALESCE` are optimization techniques that BigQuery applies directly to the shuffled data in the query.
- `REPARTITION`: this operation aims to rebalance data distribution across worker nodes. Suppose that after shuffling, one worker node ends up with a disproportionately large amount of data. The `REPARTITION` step redistributes the data more evenly, preventing any single worker from becoming a bottleneck. This is particularly important for computationally intensive operations like joins.
- `COALESCE`: this step happens when you have many small buckets of data after shuffling. The `COALESCE` step combines these buckets into larger ones, reducing the overhead associated with managing numerous small pieces of data. This can be especially beneficial when dealing with very small intermediate result sets.
If you see `REPARTITION` or `COALESCE` steps in your query plan, it doesn't necessarily mean there's a problem with your query. It's often a sign that BigQuery is proactively optimizing data distribution for better performance. However, if you see these operations repeatedly, it might indicate that your data is inherently skewed or that your query is causing excessive data shuffling.
Optimize
To reduce the number of `REPARTITION` steps, try the following:
- Data distribution: ensure that your tables are partitioned and clustered effectively. Well-distributed data reduces the likelihood of significant imbalances after shuffling.
- Query structure: analyze the query for potential sources of data skew. For example, are there highly selective filters or joins that result in a small subset of data being processed on a single worker?
- Join strategies: experiment with different join strategies to see if they lead to a more balanced data distribution.
To reduce the number of `COALESCE` steps, try the following:
- Aggregation strategies: consider performing aggregations earlier in the query pipeline. This can help reduce the number of small intermediate result sets that might cause `COALESCE` steps.
- Data volume: if you're dealing with very small datasets, `COALESCE` might not be a significant concern.
Don't over-optimize. Premature optimization might make your queries more complexwithout yielding significant benefits.
Explanation for federated queries
Federated queries let you send a query statement to an external data source by using the `EXTERNAL_QUERY` function. Federated queries are subject to the optimization technique known as SQL pushdowns, and the query plan shows operations pushed down to the external data source, if any. For example, if you run the following query:
```sql
SELECT id, name
FROM EXTERNAL_QUERY("<connection>", "SELECT * FROM company")
WHERE country_code IN ('ee', 'hu') AND name LIKE '%TV%'
```

The query plan will show the following stage steps:
```
$1:id, $2:name, $3:country_code
FROM table_for_external_query_$_0(
  SELECT id, name, country_code
  FROM (
    /*native_query*/ SELECT * FROM company
  )
  WHERE in(country_code, 'ee', 'hu'))
WHERE and(in($3, 'ee', 'hu'), like($2, '%TV%'))
$1, $2
TO __stage00_output
```

In this plan, `table_for_external_query_$_0(...)` represents the `EXTERNAL_QUERY` function. In the parentheses you can see the query that the external data source executes. Based on that, you can notice that:
- The external data source returns only the 3 selected columns.
- The external data source returns only rows for which `country_code` is either `'ee'` or `'hu'`.
- The `LIKE` operator is not pushed down and is evaluated by BigQuery.
For comparison, if there are no pushdowns, the query plan will show the following stage steps:
```
$1:id, $2:name, $3:country_code
FROM table_for_external_query_$_0(
  SELECT id, name, description, country_code, primary_address, secondary_address
  FROM (
    /*native_query*/ SELECT * FROM company
  ))
WHERE and(in($3, 'ee', 'hu'), like($2, '%TV%'))
$1, $2
TO __stage00_output
```

This time the external data source returns all the columns and all the rows from the `company` table, and BigQuery performs the filtering.
Timeline metadata
The query timeline reports progress at specific points in time, providing snapshot views of overall query progress. The timeline is represented as a series of samples that report the following details:
| Field | Description |
|---|---|
| elapsedMs | Milliseconds elapsed since the start of query execution. |
| totalSlotMs | A cumulative representation of the slot milliseconds used by the query. |
| pendingUnits | Total units of work scheduled and waiting for execution. |
| activeUnits | Total active units of work being processed by workers. |
| completedUnits | Total units of work that have been completed while executing this query. |
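A timeline sample can be turned into a rough progress figure by comparing completed units against everything the sample knows about, as in this sketch (the sample values are hypothetical):

```python
def progress_fraction(sample):
    """Completed units as a fraction of all units known at this sample.

    Note the total can still grow later, since BigQuery may add
    work units as the plan evolves, so this is only a snapshot."""
    completed = int(sample["completedUnits"])
    total = (completed
             + int(sample["pendingUnits"])
             + int(sample.get("activeUnits", "0")))
    return completed / total if total else 0.0

# Hypothetical mid-flight sample: 40 done, 10 running, 50 still queued.
sample = {"elapsedMs": "12000", "pendingUnits": "50",
          "activeUnits": "10", "completedUnits": "40"}
print(progress_fraction(sample))  # 0.4
```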
An example query
The following query counts the number of rows in the Shakespeare public dataset and has a second conditional count that restricts results to rows that reference 'hamlet':
```sql
SELECT
  COUNT(1) AS rowcount,
  COUNTIF(corpus = 'hamlet') AS rowcount_hamlet
FROM `publicdata.samples.shakespeare`
```

Click Execution details to see the query plan:

The color indicators show the relative timings for all steps across all stages.
To learn more about the steps of the execution stages, expand the details for the stage:

In this example, the longest time in any segment was the time the single workerin Stage 01 spent waiting for Stage 00 to complete. This is because Stage 01 wasdependent on Stage 00's input, and couldn't start until the first stage wroteits output into intermediate shuffle.
Error reporting
It is possible for query jobs to fail mid-execution. Because plan information isupdated periodically, you can observe where within the execution graph thefailure occurred. Within the Google Cloud console, successful or failed stagesare labeled with a checkmark or exclamation point next to the stage name.
For more information about interpreting and addressing errors, see the troubleshooting guide.
API sample representation
Query plan information is embedded in the job response information, and you can retrieve it by calling `jobs.get`. For example, the following excerpt of a JSON response for a job returning the sample hamlet query shows both the query plan and timeline information.
"statistics": { "creationTime": "1576544129234", "startTime": "1576544129348", "endTime": "1576544129681", "totalBytesProcessed": "2464625", "query": { "queryPlan": [ { "name": "S00: Input", "id": "0", "startMs": "1576544129436", "endMs": "1576544129465", "waitRatioAvg": 0.04, "waitMsAvg": "1", "waitRatioMax": 0.04, "waitMsMax": "1", "readRatioAvg": 0.32, "readMsAvg": "8", "readRatioMax": 0.32, "readMsMax": "8", "computeRatioAvg": 1, "computeMsAvg": "25", "computeRatioMax": 1, "computeMsMax": "25", "writeRatioAvg": 0.08, "writeMsAvg": "2", "writeRatioMax": 0.08, "writeMsMax": "2", "shuffleOutputBytes": "18", "shuffleOutputBytesSpilled": "0", "recordsRead": "164656", "recordsWritten": "1", "parallelInputs": "1", "completedParallelInputs": "1", "status": "COMPLETE", "steps": [ { "kind": "READ", "substeps": [ "$1:corpus", "FROM publicdata.samples.shakespeare" ] }, { "kind": "AGGREGATE", "substeps": [ "$20 := COUNT($30)", "$21 := COUNTIF($31)" ] }, { "kind": "COMPUTE", "substeps": [ "$30 := 1", "$31 := equal($1, 'hamlet')" ] }, { "kind": "WRITE", "substeps": [ "$20, $21", "TO __stage00_output" ] } ] }, { "name": "S01: Output", "id": "1", "startMs": "1576544129465", "endMs": "1576544129480", "inputStages": [ "0" ], "waitRatioAvg": 0.44, "waitMsAvg": "11", "waitRatioMax": 0.44, "waitMsMax": "11", "readRatioAvg": 0, "readMsAvg": "0", "readRatioMax": 0, "readMsMax": "0", "computeRatioAvg": 0.2, "computeMsAvg": "5", "computeRatioMax": 0.2, "computeMsMax": "5", "writeRatioAvg": 0.16, "writeMsAvg": "4", "writeRatioMax": 0.16, "writeMsMax": "4", "shuffleOutputBytes": "17", "shuffleOutputBytesSpilled": "0", "recordsRead": "1", "recordsWritten": "1", "parallelInputs": "1", "completedParallelInputs": "1", "status": "COMPLETE", "steps": [ { "kind": "READ", "substeps": [ "$20, $21", "FROM __stage00_output" ] }, { "kind": "AGGREGATE", "substeps": [ "$10 := SUM_OF_COUNTS($20)", "$11 := SUM_OF_COUNTS($21)" ] }, { "kind": "WRITE", "substeps": [ "$10, $11", "TO __stage01_output" ] } ] } 
], "estimatedBytesProcessed": "2464625", "timeline": [ { "elapsedMs": "304", "totalSlotMs": "50", "pendingUnits": "0", "completedUnits": "2" } ], "totalPartitionsProcessed": "0", "totalBytesProcessed": "2464625", "totalBytesBilled": "10485760", "billingTier": 1, "totalSlotMs": "50", "cacheHit": false, "referencedTables": [ { "projectId": "publicdata", "datasetId": "samples", "tableId": "shakespeare" } ], "statementType": "SELECT" }, "totalSlotMs": "50"},Using execution information
BigQuery query plans provide information about how the service executes queries, but the managed nature of the service limits whether some details are directly actionable. Many optimizations happen automatically by using the service, which can differ from other environments where tuning, provisioning, and monitoring can require dedicated, knowledgeable staff.
For specific techniques that can improve query execution and performance, see the best practices documentation. The query plan and timeline statistics can help you understand whether certain stages dominate resource utilization. For example, a JOIN stage that generates far more output rows than input rows can indicate an opportunity to filter earlier in the query.
Additionally, timeline information can help identify whether a given query is slow in isolation or due to effects from other queries contending for the same resources. If you observe that the number of active units remains limited throughout the lifetime of the query but the amount of queued units of work remains high, this can represent cases where reducing the number of concurrent queries can significantly improve overall execution time for certain queries.
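That contention pattern can be checked mechanically across timeline samples, as in this sketch. Both thresholds are illustrative choices, not BigQuery-defined values, and the `timeline` list is hypothetical.

```python
def looks_contended(timeline, active_cap=10, pending_floor=100):
    """True when every sample shows few active units despite a deep
    queue of pending units -- a pattern consistent with slot contention
    rather than an inherently slow query."""
    return bool(timeline) and all(
        int(t.get("activeUnits", "0")) <= active_cap
        and int(t["pendingUnits"]) >= pending_floor
        for t in timeline
    )

# Hypothetical samples: work stays queued while few units run.
timeline = [
    {"pendingUnits": "500", "activeUnits": "8"},
    {"pendingUnits": "480", "activeUnits": "6"},
]
print(looks_contended(timeline))  # True
```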
Note: Some query processing takes place outside of the context of any stage. In some cases, significant latency or slot usage can be accrued before the first stage is dispatched, or after the last stage completes. Examples include some forms of partition pruning, various metadata operations, and compensation for excess slot usage.
Last updated 2026-02-19 UTC.