Vector search

MySQL  |  PostgreSQL  |  SQL Server

To use the GA version of vector search, the instance maintenance version shouldbe versionMYSQL_8_0_version.R20241208.01_00 or newer, whereversion is the minor version number. For information about how to upgrade yourinstance to a newer version that supports GA vector embeddings, seeSelf-service maintenance.

For information about the syntax and behavior of vector embeddings for thePublic Preview version of this feature, seeWork with vector embeddings (Preview).

This page describes how vector searches are implemented onCloud SQL for MySQL instances. Cloud SQL lets you store vectorembeddings, create vector indexes, and performvector searchesin conjunction with your other stored data.

Vector embedding storage

You store vector embeddings in a table that's compliant with atomicity,consistency, isolation, and durability (ACID) properties. Like other relationaldata in the table, you can access the vector embeddings in the table withexisting transactional semantics.

To establish mapping between table rows and vector representations, you need tocreate a column in your table to store your vector embeddings. The column mustuse the Cloud SQLVECTOR data type and must indicate the numberof dimensions that the embedding requires. The vector embedding column can onlystore vector embeddings that use exactly the same dimensions that you specifywhen you define the column.

A table can only have one vector embedding column. There are no restrictions forthe number of rows in the table.

To distinguish the vector embedding column from other columns,Cloud SQL adds a specialCOMMENT andCONSTRAINT to the column.The constraint is required for input validation, and the vector embedding columnannotation is visible as aCOMMENT. You can't modify or delete the comment orconstraint.

If you have enough storage and memory available on your Cloud SQLinstance, you can have multiple tables with their own vector embedding columns.

Data replication works thesame way for the vector embedding column as it does for other MySQL InnoDB columns.

For a list of limitations and restrictions for vector embedding tables, columns,and DML statements, seeLimitations.

Vector indexes

You must use a vector index to perform ANN similarity searches on your vectorembeddings. Cloud SQL creates vector indexes using theScalable Nearest Neighbors (ScANN)algorithm.

Vector indexes have the following requirements:

  • You can only create one vector index per table.
  • If you have multiple tables with vector embeddings on your instance, you cancreate vector indexes for each of them.
  • If you're creating a vector index, you can't add a constraint to the primarykey of the indexed table.

For better search quality, create a vector index only after loading the bulk ofyour data in the base table. If you have less than 1000 embeddings in the basetable, the index creation fails.

When deciding whether to create a vector index, if you have a small number ofrows, consider whether you can perform a KNN search instead. The decision to usea KNN versus an ANN search also depends on the number of dimensions on thevector embedding. A larger number of embeddings might require a vector index.

For a list of limitations and restrictions for vector indexes, seeLimitations. For information oncreating a vector index, seeCreate and manage vector indexes.

Vector index updates

Cloud SQL updates vector indexes in real time. Any transaction thatperforms data manipulation language (DML) operations on the base table alsopropagates changes to the associated vector indexes. Vector indexes behavein the same way as any other secondary index on the table. The vector indexesare fully transactionally consistent and ACID compliant. If you roll back atransaction, then the corresponding rollback changes also occur in the vectorindex.

Replication of vector indexes

Cloud SQL replicates vector indexes to all read replicas, includingfor cascading replication.When you create a new read replica from a primary instance that has vectorembedding, the read replica inherits the vector embedding settings from theprimary instance. For existing read replicas, you must enable vector embeddingsupport on each one.

In terms of impact to replication lag, creating and maintaining vector indexesoperate in the same way as regular MySQL indexes.

Persistence, shutdown, and impact on maintenance

Vector indexes are persisted the same way as base tables, with full ACIDsupport. Vector indexes are always in-sync with their base table data, and havethe same visibility, isolation, and crash safety. There is no impact to thevector index when the instance is shut down or receives maintenance.

Index maintenance

After extensive DML operations are performed on the base table, the vector indexthat you trained on the initial data (at the time of index creation) might notreflect the new state. This can impact search quality.

There are two parts to the index:

  • The index tree. This is built by training on existing data. It staysunchanged during the lifetime of the index.
  • The index leaves. These contain all the rows of data. The index leaves nevergo out of sync.

The index tree might become less efficient after a large number of DMLstatements are run because rows move from one leaf to another. To refresh theindex tree, you need to rebuild the index.

Unsupported DDL operations on tables with vector indexes

  • Alter table operations requiring copy algorithm.
  • Alter table operations that requires the table to be rebuilt.
  • Drop or change the primary key.
  • Move the table to a general tablespace.

Vector search

Cloud SQL provides vector distance functions that you use toperform approximate nearest neighbor (ANN) and K-nearest neighbors (KNN) vectorsimilarity searches on your instance. When you run a query, the query vector iscompared to vectors in your dataset. Distance functions calculate the distancebetween the vectors using a similarity metric such as cosine. The vectors withthe shortest distance between them are the most similar and are returned insearch results.

Cloud SQL uses the following functions to measure distance betweenvectors in vector searches when you perform ANN and KNN vectorsearches:

  • Cosine:measures the cosine of the angle between two vectors. A smaller valueindicates greater similarity between the vectors.
  • Dot product:calculates the cosine of the angle multiplied by the product ofcorresponding vector magnitudes.
  • L2 squared distance:measures the Euclidean distance between two vectors by adding the squareddistance on each dimension.

KNN search

A KNN vector search is the preferred search method when you need exactresults or want to add selective filtering. KNN search performs a distancecomputation of the query vector with every embedding in the dataset to find thenearest neighbor. KNN searches in Cloud SQL provide perfect recall.KNN searches don't use a vector index so they're a good option when working withsmaller datasets.

To perform a KNN search, you use thevector_distance function that takes twovectors as input: the query vector (what you're searching for) and a candidatevector from your dataset. It calculates the distance between these two vectors.You use vector_distance in aSELECT statement. For more information, SeeSearch K-nearest neighbors (KNN).

If you find that KNN isn't performing well, you can build a vector index laterand continue to useapprox_distance in your application for ANN searches.

ANN search

An ANN vector search is the preferred search type when queryefficiency is a concern. It speeds up similarity searches by calculating thedistance between your query vector and only a portion of the vectors in yourdataset. To do this, Cloud SQL organizes the data into clusters orpartitions and then focuses the search on the clusters closest to the query. ANNsearches require vector indexes. These indexes prioritize search speed overperfect recall. In Cloud SQL, theTREE_SQ index type is usedfor ANN searches.

To perform an ANN search, you use theapprox_distance function with adistance measurement option. You useapprox_distance in anORDER BY orSELECT list and aLIMIT clause is permitted to limit search results. You canalso add aWHERE clause to perform post-filtering of your search results.If you want to have more control over the number of results that are returnedwhen you perform an ANN search with filters, then you canuseiterative filtering. With iterative filtering, your search query canreturn more search results by scanning more of the vector index untilthe preferred number of neighbors are found.

You can enable iterative filtering for your searchquery by setting thecloudsql_vector_iterative_filtering flagtoON at a session level for individual clients or a global levelfor all clients that connect to the instance.

For more information, seeSearch approximate nearest neighbors (ANN).

There are some cases when an ANN search falls back to a KNN search. For moreinformation, seeCheck the fallback status for ANN searches.

Requirements

Cloud SQL requires you to enable vector embeddings on the instanceusing thecloudsql_vector flag before you add vector embeddings. For moreinformation, seeEnable and disable vector embeddings on your instance.

Limitations

The following are limitations on tables that have a vector embedding column:

  • There can only be one vector embedding column per table.
  • There can only be one vector index per table.
  • A vector embedding is restricted to 16,000 dimensions.
  • The vector embedding column can't be a generated column.
  • Table-level partitioning on tables with vector embedding columns isn'tsupported.
  • Primary keys that use theBIT,BINARY,VARBINARY,JSON,BLOB,TEXT data types, or spatial data aren't supported for vector indexes.Composite primary keys also can't include any of these types.
  • If there's a vector index, you can't add a constraint to the primary key ofthe base table.
  • When a vector index is present on a table, there are some DDL operations youcan't perform. For more information, seeUnsupported DDL operations on tables with vector indexes.

The following are restrictions for vector search queries:

  • Theapprox_distance function can only be used in anORDER BY orSELECTlist.
  • Predicates involving the base table can be used in theWHERE condition incombination withapprox_distance expressions in theORDER BY orSELECTlist. TheWHERE condition predicates are evaluated after theapprox_distance vector functions are evaluated.

Best practices for working with vector indexes

This section provides best practices for working with vector indexes. Everyworkload is different, and you might need to adjust accordingly.

  • After major DML operations, it's a good practice to rebuild the index.
  • Generally, it's acceptable to let Cloud SQL compute the numberof leaves to use. If you have a use case where you want to specify thenumber of leaves, it's recommended to have at least 100 vectors perleaf for the best recall.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-24 UTC.