Query optimizer overview Stay organized with collections Save and categorize content based on your preferences.
This page describes the Spanner query optimizer and its benefits.Query optimizer is available for GoogleSQL-dialect databases and PostgreSQL-dialect databases.
Overview
The Spanner SQL query optimizer converts a declarative SQLstatement, that describes what data the query wants, into an imperativeexecution plan, that describes one way to precisely obtain that data. Theprocess of transforming a declarative statement into aquery execution planinvolves performing transformations to tree structures used to represent thequery. The optimizer, in the process of producing an execution plan, preservesthe logical meaning of the original SQL query so that the correct rows arereturned.
Another important role of the optimizer is to produce an execution plan that isefficient.
How does the Spanner optimizer produce efficient execution plans?
Spanner's optimizer uses a combination of well-establishedheuristics and cost-based optimization to produce efficient plans. Someheuristics are very straightforward, such as "perform filters on remote machinesrather than pulling data to the local machine". Other heuristics are morecomplex, but still embody the principle of moving logic that reduces data volumecloser to the data. That principle is particularly important in a system thatshards data across multiple machines.
Not all execution decisions can be made effectively using such fixed rules, soSpanner's optimizer also makes decisions based on anestimatedcost of alternatives. Such cost estimates are calculated using the structure ofthe query, the schema of the database, and estimates of the data volume thatwill be produced by fragments of the query. For example, Spannerestimates how many rows of the Songs table qualify the filterSongGenre = "Country" if that filter appears in a query. To help thecomputation of such estimates, Spanner periodically gathersstatistics to characterize the distribution of the data in a database.
In addition, Spanner optimizes query execution by automaticallydetermining if a row- or column-oriented processing method should be used forthe query. For more information, seeOptimize scans.
To learn more about query execution plans and how they are used bySpanner to perform queries in a distributed environment, seeQuery execution plans.
Query optimizer versioning
Over time, the Spanner query optimizer will evolve, broadening theset of choices in the query execution plan and improving the accuracy ofthe estimates that inform those choices, leading to more efficient queryexecution plans.
Spanner releases optimizer updates as new query optimizerversions to improve the efficiency of its query execution plans. To learn moreabout the different versions, seeSpanner query optimizer versions.
Query optimizer statistics packages
Spanner maintains statistics on the data distribution of tablecolumns to help estimate how many rows will be produced by a query. The queryoptimizer uses these estimates to help choose the best query execution plan.These statistics are periodically updated by Spanner. Because thestatistics are used to choose query execution plans, when the statistics areupdated, it's possible for Spanner to change the query plan thatit uses for a query.
Note: These statistics are only available to the query optimizer, and cannot bequeried by the end user.By default, databases automatically use the latest generated statistics package.You can pin your database to an earlier statistics package version. You alsohave the option of running individual queries with a statistics package otherthan the latest.
Construct a new statistics package
Spanner automatically generates a new statistics packages everythree days. To construct a new statistics package manually, use theGoogleSQLANALYZE DDL statement or the PostgreSQLANALYZE DDL statement.
After significant changes to your database's data or schema, constructing a newstatistics package can benefit query performance. As a best practice, constructa new statistics package if the following occurs:
- The database processes a large amount of inserts, updates, or deletes.
- You add a new index to the database.
- You add a new column to a table.
Running anANALYZE DDL statementupdates your schema, initiates along-running operation,and cancels the creation of any automatically triggered statistics.
After Spanner finishes executing the statement, it takes up toten minutes for the query optimizer to account for a new statistics package inits query planning.
Garbage collection of statistics packages
Statistics packages in Spanner are kept for a period of 30 dayssince their creation, after which they are subject to garbage collection.
The Spanner built-inINFORMATION_SCHEMA.SPANNER_STATISTICStable contains a list of available statistics packages. Each row in this tablelists a statistics package by name, and the name contains the creation timestampof the given package. Each entry also contains a field calledALLOW_GC whichdefines whether a package can be garbage collected or not.
You can pin your entire database to any one of the packages listed in thattable. The pinned statistics package won't be garbage collected and the value ofALLOW_GC is ignored as long as the database is pinned to this package. To usea particular statistics package for an individual query, the package must belisted withALLOW_GC=FALSE or pinned. This prevents queries from failing afterthe statistics package has been garbage collected. You can change the value ofALLOW_GC using the GoogleSQLALTER STATISTICS orPostgreSQLALTER STATISTICS DDL statement.
Package retention and Personally Identifiable Information (PII)
A statistics package contains histograms of the column data, as per standardindustry practice. This helps the query optimizer select the optimal queryplans. The histogram is constructed using a small sample of values. This smalldataset can potentially contain PII.
Spanner creates a new statistics package on a regular basis andretains it for 30 days by default. Thus a small sample of values deleted fromthe database may be retained for additional 30 days in statistics histograms.Statistics packages pinned withoptimizer_statistics_package database optionor packages withALLOW_GC=FALSE option won't be garbage collected.Histograms in these packages may contain values deleted from the database for alonger period. In addition, the content of statistics packages are included indatabase backups.
The optimizer statistics is stored encrypted in the same way as user data.
The total amount of storage required for these packages is usually less than 100MB, and does count towards your total storage costs.
What's next
- To learn more about the history of the query optimizer, seeQuery optimizerversion history.
- To manage both the optimizer version and statistics package for yourscenario, seeManage the query optimizer.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.