Create and manage vector indexes
Note: This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the Spanner editions overview.
This page explains how to create and manage Spanner vector indexes, which use approximate nearest neighbor (ANN) search and tree-based structures to accelerate vector similarity searches on your data.
Spanner accelerates approximate nearest neighbor (ANN) vector searches by using a specialized vector index. This index leverages Google Research's Scalable Nearest Neighbor (ScaNN), a highly efficient nearest neighbor algorithm.
The vector index uses a tree-based structure to partition data and facilitate faster searches. Spanner offers both two-level and three-level tree configurations:
- Two-level tree configuration: Leaf nodes (num_leaves) contain groups of closely related vectors along with their corresponding centroid. The root level consists of the centroids from all leaf nodes.
- Three-level tree configuration: Similar in concept to a two-level tree, while introducing an additional branch layer (num_branches), from which leaf node centroids are further partitioned to form the root level (num_leaves).
Spanner picks an index for you. However, if you know that a specific index works best, then you can use the FORCE_INDEX hint to choose the most appropriate vector index for your use case.
For more information, see VECTOR INDEX statements.
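As a rough sketch of the FORCE_INDEX hint, the following query assumes the Documents table and the DocEmbeddingIndex vector index that are created later on this page. The @query_embedding parameter is hypothetical (an ARRAY<FLOAT32> with 128 elements that your application supplies), and the num_leaves_to_search value of 10 is an illustrative choice, not a recommendation.

```sql
-- Sketch only: force the ANN search to use DocEmbeddingIndex.
-- @query_embedding is a hypothetical query parameter (ARRAY<FLOAT32>, length 128).
SELECT DocId
FROM Documents@{FORCE_INDEX=DocEmbeddingIndex}
ORDER BY APPROX_COSINE_DISTANCE(
  DocEmbedding,
  @query_embedding,
  options => JSON '{"num_leaves_to_search": 10}')  -- illustrative value
LIMIT 10;
```

The distance function in the query should correspond to the distance_type of the index, which is COSINE for the index assumed here.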
Limitations
- You can't pre-split vector indexes. For more information, see Pre-splitting overview.
Create vector index
To optimize the recall and performance of a vector index, we recommend that you:
- Create your vector index after most of the rows with embeddings are written to your database. You might also need to periodically rebuild the vector index after you insert new data. For more information, see Rebuild the vector index.
- Use the STORING clause to store a copy of a column in the vector index. If a column value is stored in the vector index, then Spanner performs filtering at the index's leaf level to improve query performance. We recommend that you store a column if it's used in a filtering condition. For more information about using STORING in an index, see Create an index for index-only scans.
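To illustrate the STORING recommendation, here is a minimal sketch of a filtered ANN query. It assumes the Documents table and the DocEmbeddingIndex index from the examples that follow, where WordCount is stored in the index. The @query_embedding parameter, the WordCount threshold, and the num_leaves_to_search value are hypothetical choices made for illustration.

```sql
-- Sketch only: WordCount is stored in DocEmbeddingIndex (STORING (WordCount)),
-- so the filter below is on a column that is available in the index.
-- @query_embedding is a hypothetical query parameter (ARRAY<FLOAT32>, length 128).
SELECT DocId, WordCount
FROM Documents@{FORCE_INDEX=DocEmbeddingIndex}
WHERE WordCount > 500  -- illustrative threshold
ORDER BY APPROX_COSINE_DISTANCE(
  DocEmbedding,
  @query_embedding,
  options => JSON '{"num_leaves_to_search": 10}')  -- illustrative value
LIMIT 10;
```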
When you create your table, the embedding column must be an array of the FLOAT32 (recommended) or FLOAT64 data type, and have a vector_length annotation, indicating the dimension of the vectors. The optimal vector length depends on your workload, dataset size, and available computational resources. Experiment with different dimensions to find the smallest size that maintains accuracy and performance for your application.
The following DDL statement creates a Documents table with an embedding column DocEmbedding that has a vector length of 128:
CREATE TABLE Documents (
  UserId INT64 NOT NULL,
  DocId INT64 NOT NULL,
  Author STRING(1024),
  DocContents BYTES(MAX),
  DocEmbedding ARRAY<FLOAT32>(vector_length=>128) NOT NULL,
  NullableDocEmbedding ARRAY<FLOAT32>(vector_length=>128),
  WordCount INT64,
) PRIMARY KEY (DocId);

After you populate your Documents table, you can create a vector index with a two-level tree and 1000 leaf nodes on the Documents table with an embedding column DocEmbedding using the cosine distance:
CREATE VECTOR INDEX DocEmbeddingIndex
  ON Documents(DocEmbedding)
  STORING (WordCount)
  OPTIONS (distance_type = 'COSINE', tree_depth = 2, num_leaves = 1000);

If your embedding column isn't marked as NOT NULL in the table definition, you must declare it with a WHERE COLUMN_NAME IS NOT NULL clause in the vector index definition, where COLUMN_NAME is the name of your embedding column. The following statement creates a vector index with a three-level tree and 1000000 leaf nodes on the nullable embedding column NullableDocEmbedding using the cosine distance:
CREATE VECTOR INDEX DocEmbeddingThreeLevelIndex
  ON Documents(NullableDocEmbedding)
  STORING (WordCount)
  WHERE NullableDocEmbedding IS NOT NULL
  OPTIONS (distance_type = 'COSINE', tree_depth = 3, num_branches = 1000, num_leaves = 1000000);

Filter a vector index
You can also create a filtered vector index to find the most similar items in your database that match the filter condition. A filtered vector index selectively indexes rows that satisfy the specified filter conditions, improving search performance.
In the following example, the table Documents2 has a column called Category. In our vector search, we want to index the "Tech" category, so we create a generated column that evaluates to NULL if the category condition isn't met.
CREATE TABLE Documents2 (
  DocId INT64 NOT NULL,
  Category STRING(MAX),
  NullIfFiltered BOOL AS (IF(Category = 'Tech', TRUE, NULL)) HIDDEN,
  DocEmbedding ARRAY<FLOAT32>(vector_length=>128),
) PRIMARY KEY (DocId);

Then, we create a vector index with a filter. The TechDocEmbeddingIndex vector index only indexes documents in the "Tech" category.
CREATE VECTOR INDEX TechDocEmbeddingIndex
  ON Documents2(DocEmbedding)
  STORING (NullIfFiltered)
  WHERE DocEmbedding IS NOT NULL AND NullIfFiltered IS NOT NULL
  OPTIONS (...);

When Spanner runs the following query, which has filters that match TechDocEmbeddingIndex, it automatically picks TechDocEmbeddingIndex to accelerate the query. The query only searches documents in the "Tech" category. You can also use @{FORCE_INDEX=TechDocEmbeddingIndex} to force Spanner to use TechDocEmbeddingIndex explicitly.
SELECT *
FROM Documents2
WHERE DocEmbedding IS NOT NULL AND NullIfFiltered IS NOT NULL
ORDER BY APPROX_(....)
LIMIT 10;

If you replace NullIfFiltered IS NOT NULL with Category = 'Tech', then the query won't match the vector index TechDocEmbeddingIndex.

What's next
- Learn more about Spanner approximate nearest neighbors.
- Learn more about the GoogleSQL APPROX_COSINE_DISTANCE(), APPROX_EUCLIDEAN_DISTANCE(), and APPROX_DOT_PRODUCT() functions.
- Learn more about the GoogleSQL VECTOR INDEX statements.
- Learn more about vector index best practices.
Last updated 2026-02-19 UTC.