Search and filter with vector embeddings
MYSQL_8_0_version.R20241208.01_00 or newer, whereversion is the minor version number. For information about how to upgrade yourinstance to a newer version that supports GA vector embeddings, seeSelf-service maintenance.For information about the syntax and behavior of vector embeddings for thePublic Preview version of this feature, seeWork with vector embeddings (Preview).
This page describes the different ways that you can query vector embeddings. Foran overview of ANN and KNN similarity searches, seevector searches.
Search approximate nearest neighbors (ANN)
To perform an ANN search, use theapprox_distance functionin aSELECT andORDER BY clause. You must use aLIMIT clause in an ANNsearch. You can also get the distance value by puttingapprox_distance in aSELECT list.
Use the following syntax for ANN queries:
#OrderingbydistanceSELECTtitleFROMbooksORDERBYapprox_distance(embedding,string_to_vector('[1,2,3]'),'distance_measure=l2_squared')LIMIT4;#SelectingthedistancevalueSELECTapprox_distance(embedding_name,string_to_vector('[1,2,3]'),'distance_measure=cosine,num_leaves_to_search=3')distFROMtableORDERBYdistLIMITlimit_value;Theapprox_distance function uses the following options:
embedding: uses the vector embedding column name from the base table.string_to_vectororvector_to_string: converts a vector to astring and a string to a vector to make the vector human readable.distance_measure: specify the distance measure to use for a vectorsimilarity search. This value must match the value you set in thedistance_measureparameter when you created the index. Thisparameter is required. Possible values for this parameter are:COSINEL2_SQUAREDDOT_PRODUCT
num_leaves_to_search: optional. specifies the number of leaves toprobe for an ANN vector similarity search. If you don't specify the numberof leaves, then Cloud SQL uses a value generated based on thesize of the table, number of leaves in the vector index, and other factors.You can view this value ininformation_schema.innodb_vector_indexes. Werecommend that you fine-tunenum_leaves_to_searchto achieve the bestbalance between search quality and performance for your specific workload.If increased, it impacts performance but improves recall.
The following example shows how to useapprox_distance to find the top Kclosest rows using thel2_squared distance measure and order the results bydistance.
#OrderingbydistanceSELECTtitleFROMbooksORDERBYapprox_distance(embedding,string_to_vector('[1,2,3]'),'distance_measure=l2_squared')LIMIT4;#SelectingthedistancevalueSELECTapprox_distance(embedding,string_to_vector('[1,2,3]'),'distance_measure=l2_squared')distFROMtableORDERBYdistLIMIT4;Filter results from approx_distance queries
You can use theapprox_distance function withWHERE conditions that filterquery results with a non-vector predicate to perform post filtering. Theapprox_distance function is evaluated before applying the filter which meansthat the number of results returned are nondeterministic.
For example, for the following query:
SELECTidFROMproductsWHEREprice <100ORDERBYapprox(embedding,@query_vector,'distance_measure=cosine')LIMIT11;Theapprox_distance function returns the 11 nearest neighbors to the queryvector regardless of price. In post filtering, the products with a price< 100 are selected. It's possible that all of the nearest neighbors have aprice < 100, so there are 11 results to the query. Alternatively, if noneof the nearest neighbors have a price < 100, there are 0 rows returned.
If you anticipate that your filter in theWHERE condition is very selective, anexact search (KNN) might be a better option to ensure that a sufficient numberof rows are returned.
Check the fallback status on ANN searches
There are certain cases where an ANN search falls back to a KNN search. Theseinclude the following:
- There's no vector index on the base table.
- There's a vector index on the base table, but it uses a differentdistance measure from the
distance_measureparameter in theapprox_distancesearch options. - The vector index is corrupt or invisible to the current transaction.
- The
LIMITspecified is greater than 10000. - There is no
LIMITspecified. - The current query involves more than one
approx_distancecall on the samebase table. - The optimizer calculates that it's more efficient to use KNN.
All of these cases push a warning to the client indicating that exact search wasperformed and the reason why.
Use the following command in the mysql client to view the fallback status:
SHOWglobalstatusLIKE'%cloudsql_vector_knn_fallback%';If you want to use ANN and it's falling back to KNN, the query might run slower.You should find the reason it's falling backand assess whether to make changes so that ANN is used instead.
Example: Create a vector index and run an ANN query
The following example walkthrough provides steps to create a vector index andrun an ANN query in Cloud SQL.
- Generate vector embeddings. You can create vector embeddings manually or usea text embedding API of your choice. For an example that usesVertex AI, seeGenerate vector embeddings based on row data.
Create a table in Cloud SQL that contains a vector embeddingcolumn with three dimensions.
CREATETABLEbooks(idINTEGERPRIMARYKEYAUTO_INCREMENT,titleVARCHAR(60),embeddingVECTOR(3)USINGVARBINARY);Insert a vector embedding into the column.
INSERTINTObooksVALUES((1,'book title',string_to_vector('[1,2,3]')));Commit the changes.
commit;Create the vector index using the
L2_squaredfunction to measure distance.CREATEVECTORINDEXvectorIndexONdbname.books(embeddings)USINGSCANNQUANTIZER=SQ8DISTANCE_MEASURE=l2_squared;Use the following syntax to perform an ANN search with a
LIMITof 4 searchresults:SELECTtitleFROMbooksORDERBYapprox_distance(embedding,string_to_vector('[1,2,3]'),'distance_measure=l2_squared')LIMIT4;SELECTapprox_distance(embedding,string_to_vector('[1,2,3]'),'distance_measure=cosine')distFROMbooksORDERBYdistLIMIT4;
Search K-nearest neighbors (KNN)
To perform a K-nearest neighbor search, use thevector_distance functionwith a distance measure option and a vector conversion function(string_to_vector orvector_to_string) in aSELECT statement. Use thefollowing syntax:
SELECTvector_distance(string_to_vector('[1,2,3]'),string_to_vector('[1,2,3]'),'Distance_Measure=dot_product');Replace the values [1,2,3] with the embedding values of your data.
The following example shows how to use this query with thecosine_distancefunction and thestring_to_vector vector conversion function.
SELECTid,cosine_distance(embedding,string_to_vector('[1,2,3]'))distFROMbooksORDERBYdistanceLIMIT10;Get the Cosine distance in a KNN query
Use the Cloud SQLcosine_distance functionto calculate the distance using cosine.
SELECTcosine_distance(embedding,string_to_vector('[3,1,2]'))ASdistanceFROMbooksWHEREid=10;Get the Dot Product distance in a KNN query
Use the Cloud SQLdot_product functionto calculate the distance using the dot product.
SELECTdot_product(embedding,string_to_vector('[3,1,2]'))ASdistanceFROMbooksWHEREid=10;Get the L2-squared distance in a KNN query
Use the Cloud SQLl2_squared_distance functionto calculate the distance using L2 squared.
SELECTl2_squared_distance(embedding,string_to_vector('[3,1,2]'))ASdistanceFROMbooksWHEREid=10;What's next
- Read theoverview about vector search on Cloud SQL.
- Learn how toenable and disable vector embeddings on your instance.
- Learn how togenerate vector embeddings.
- Learn how tocreate vector indexes.
- Learn how toperform searches on vector embeddings.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-11-03 UTC.