Vector indexing best practices

Note: This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the Spanner editions overview.

This page describes vector indexing best practices that optimize your vector indexes and improve approximate nearest neighbor (ANN) query results.

Tune the vector search options

The optimal values for your vector index options depend on your use case, your vector dataset, and the query vectors. You can set and tune these values by creating a new vector index and setting the index_option_list in the CREATE VECTOR INDEX statement. You might need to tune iteratively to find the best values for your specific workload.

Here are some helpful guidelines to follow when picking appropriate values:

  • tree_depth (tree level): If the table you're indexing has fewer than 10 million rows, use a tree_depth of 2. Otherwise, a tree_depth of 3 supports tables of up to about 10 billion rows.

  • num_leaves: Use the square root of the number of rows in the dataset. A larger value can increase vector index build time. Avoid setting num_leaves larger than the table_row_count divided by 1000, as this results in overly small leaves and poor performance.

  • num_leaves_to_search: This option specifies how many leaf nodes of the index are searched. Increasing num_leaves_to_search improves recall but also increases latency and cost. We recommend setting num_leaves_to_search to 1% of the total number of leaves defined in the CREATE VECTOR INDEX statement. If you're using a filter clause, increase this value to widen the search.
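The guidelines above can be turned into starting values with a few lines of arithmetic. The following is a minimal sketch, not an official tool: the function name suggest_index_options is hypothetical, and the results are only a first guess to refine through iterative tuning.

```python
import math


def suggest_index_options(table_row_count: int) -> dict:
    """Return starting values that follow the guidelines above.

    Hypothetical helper for illustration only; tune the results
    against your own workload.
    """
    if table_row_count < 1:
        raise ValueError("table_row_count must be positive")

    # Fewer than 10 million rows: depth 2. Otherwise depth 3.
    tree_depth = 2 if table_row_count < 10_000_000 else 3

    # Start from the square root of the row count ...
    num_leaves = max(1, math.isqrt(table_row_count))
    # ... but never exceed table_row_count / 1000 (avoids overly small leaves).
    num_leaves = min(num_leaves, max(1, table_row_count // 1000))

    # Search about 1% of the leaves, and at least one leaf.
    num_leaves_to_search = max(1, num_leaves // 100)

    return {
        "tree_depth": tree_depth,
        "num_leaves": num_leaves,
        "num_leaves_to_search": num_leaves_to_search,
    }
```

For a table of 1,000,000 rows, this sketch suggests a tree_depth of 2, num_leaves of 1000, and num_leaves_to_search of 10. It does not account for filter clauses, which call for a larger num_leaves_to_search.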

If acceptable recall is achieved, but the cost of querying is too high, resulting in low maximum QPS, try increasing num_leaves by following these steps:

  1. Set num_leaves to some multiple k of its original value (for example, 2 * sqrt(table_row_count)).
  2. Set num_leaves_to_search to be the same multiple k of its original value.
  3. Experiment with reducing num_leaves_to_search to improve cost and QPS while maintaining recall.
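The steps above can be sketched as a small search loop. This is an illustration under stated assumptions: scale_options and smallest_acceptable_search are hypothetical names, and measure_recall stands in for whatever benchmark you run against the rebuilt index to obtain recall for a given num_leaves_to_search.

```python
from typing import Callable, Tuple


def scale_options(num_leaves: int, num_leaves_to_search: int, k: int) -> Tuple[int, int]:
    """Steps 1 and 2: scale both options by the same multiple k."""
    return num_leaves * k, num_leaves_to_search * k


def smallest_acceptable_search(
    start: int,
    target_recall: float,
    measure_recall: Callable[[int], float],
    step: int = 1,
) -> int:
    """Step 3: lower num_leaves_to_search while recall stays acceptable.

    measure_recall is a hypothetical callback supplied by the caller; it
    should run the benchmark queries and return recall in [0.0, 1.0].
    """
    best = start
    candidate = start - step
    while candidate >= 1 and measure_recall(candidate) >= target_recall:
        best = candidate
        candidate -= step
    return best
```

The loop stops at the first value that misses the recall target, which assumes recall does not increase as num_leaves_to_search shrinks. A larger step reduces the number of benchmark runs at the cost of a coarser result.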

Improve recall

To improve recall, consider tuning the num_leaves_to_search value or rebuilding your vector index.

Increase the num_leaves_to_search value

If the num_leaves_to_search value is too small, you might find it more challenging to find the nearest neighbors for some query vectors. Creating a new vector index with an increased num_leaves_to_search value can help improve recall by searching more leaves. Recent queries might contain more of these challenging vectors.
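To judge whether a larger num_leaves_to_search is helping, you can measure recall by comparing the IDs returned by an ANN query with the IDs returned by an exact nearest neighbor query for the same query vector and limit. A minimal sketch, assuming you have already collected both result lists (the function name recall_at_k is hypothetical):

```python
from typing import Hashable, Iterable


def recall_at_k(exact_ids: Iterable[Hashable], approx_ids: Iterable[Hashable]) -> float:
    """Fraction of the exact nearest neighbors that the ANN query also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)
```

Averaging this value over a representative sample of query vectors, including recent ones, gives a more reliable picture than a single query.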

Rebuild the vector index

The tree structure of the vector index is optimized for the dataset at the time of creation, and is static thereafter. Therefore, if significantly different vectors are added after creating the initial vector index, then the tree structure might be sub-optimal, leading to poorer recall.

To rebuild your vector index without downtime:

  1. Create a new vector index on the same embedding column as the current vector index, updating parameters (for example, OPTIONS) as appropriate.
  2. After the index creation completes, update the vector search query so that its FORCE_INDEX hint points at the new index. This ensures that the query uses the new vector index. You might also need to retune num_leaves_to_search in your new query.
  3. Drop the outdated vector index.
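The rebuild steps above can be sketched as a helper that assembles the statements for each step. This is an illustration, not a verified script: the function name rebuild_statements and all table, column, and index names are hypothetical, and the WHERE ... IS NOT NULL clause assumes a nullable embedding column. Check the generated text against the CREATE VECTOR INDEX and DROP VECTOR INDEX reference before running it.

```python
def rebuild_statements(table: str, column: str, old_index: str,
                       new_index: str, options: dict) -> dict:
    """Assemble the statements for a rebuild without downtime.

    All names are supplied by the caller; nothing here contacts a database.
    """
    # String option values are quoted; numeric values are emitted as-is.
    opts = ", ".join(
        f"{name} = {value!r}" if isinstance(value, str) else f"{name} = {value}"
        for name, value in options.items()
    )
    return {
        # Step 1: new index on the same embedding column, with updated OPTIONS.
        # Assumption: the column is nullable, so a WHERE filter is included.
        "create": (
            f"CREATE VECTOR INDEX {new_index} ON {table}({column}) "
            f"WHERE {column} IS NOT NULL OPTIONS ({opts})"
        ),
        # Step 2: table hint to place in the FROM clause of the search query.
        "table_hint": f"{table}@{{FORCE_INDEX={new_index}}}",
        # Step 3: drop the outdated index after queries use the new one.
        "drop": f"DROP VECTOR INDEX {old_index}",
    }
```

Run the create statement first and wait for index creation to complete before switching the query hint; drop the old index only after the updated query is serving traffic.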

Last updated 2025-12-17 UTC.