The AI.SEARCH function
Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: To provide feedback or request support for this feature during the preview, contact bq-vector-search@google.com.

This document describes the AI.SEARCH function, which is a table-valued function for semantic search on tables that have autonomous embedding generation enabled.
For example, you could use a query like the following to search a table of product descriptions for anything described as a fun toy. In this example, the product_description column has autonomous embedding generation enabled.
```sql
SELECT *
FROM AI.SEARCH(TABLE product_table, 'product_description', "A really fun toy");
```

Embeddings are high-dimensional numerical vectors that represent a given entity. Embeddings encode semantics about entities to make it easier to reason about and compare them. If two entities are semantically similar, then their respective embeddings are located near each other in the embedding vector space. The AI.SEARCH function embeds your search query and searches the table that you provide for embeddings that are close to it. If your table has a vector index on the embedding column, then AI.SEARCH uses it to optimize the search.
You can use AI.SEARCH to help with the following tasks:

- Semantic search: search entities ranked by semantic similarity.
- Recommendation: return entities with attributes similar to a given entity.
- Classification: return the class of entities whose attributes are similar to the given entity.
- Clustering: cluster entities whose attributes are similar to a given entity.
- Outlier detection: return entities whose attributes are least related to the given entity.
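As an illustration of the recommendation task above, you can search using the text of an existing entity. This is a sketch, assuming a hypothetical mydataset.products table with autonomous embedding generation enabled on its description column:

```sql
-- Recommendation sketch: find products similar to a reference product by
-- searching with that product's own description text.
-- `mydataset.products` and its columns are hypothetical examples.
SELECT base.name, base.description, distance
FROM AI.SEARCH(
  TABLE mydataset.products,
  'description',
  "A comfortable chair for relaxing in.",  -- description of the reference product
  top_k => 5);
```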
Syntax
```sql
AI.SEARCH(
  { TABLE base_table | base_table_query },
  column_to_search,
  query_value
  [, top_k => top_k_value]
  [, distance_type => distance_type_value]
  [, options => options_value]
)
```
Arguments
AI.SEARCH takes the following arguments:
- base_table: The table to search for nearest neighbor embeddings. The table must have autonomous embedding generation enabled.
- base_table_query: A query that you can use to pre-filter the base table. Only SELECT, FROM, and WHERE clauses are allowed in this query. Don't apply any filters to the embedding column. You can't use logical views in this query. Using a subquery might interfere with index usage or cause your query to fail. If the base table is indexed and the WHERE clause contains columns that aren't stored in the index, then AI.SEARCH uses post-filters on those columns instead. To learn more, see Store columns and pre-filter.
- column_to_search: A STRING literal that contains the name of the string column to search. This must be the name of the source column that the automatically generated embedding column is based on, not the name of the generated embedding column itself. If the column has a vector index, BigQuery attempts to use it. To determine whether an index was used in the vector search, see Vector index usage.
- query_value: A STRING literal that represents the search query. This value is embedded at runtime using the same connection and endpoint specified for the base table's embedding generation. You must have the BigQuery Connection User role (roles/bigquery.connectionUser) on the connection that the base table uses for background embedding generation. If embedding generation fails for query_value, then the whole query fails. Rows with missing embeddings in the base table are skipped during the search.
- top_k: A named argument with an INT64 value. top_k_value specifies the number of nearest neighbors to return. The default is 10. If the value is negative, all values are counted as neighbors and returned.
- distance_type: A named argument with a STRING value. distance_type_value specifies the type of metric to use to compute the distance between two vectors. Supported distance types are EUCLIDEAN, COSINE, and DOT_PRODUCT. The default is EUCLIDEAN. If you don't specify distance_type_value and the column_to_search column has a vector index that's used, then AI.SEARCH uses the distance type specified in the distance_type option of the CREATE VECTOR INDEX statement.
- options: A named argument with a JSON-formatted STRING value. options_value is a literal that specifies the following search options:
  - fraction_lists_to_search: A JSON number that specifies the percentage of lists to search. For example, options => '{"fraction_lists_to_search": 0.15}'. The fraction_lists_to_search value must be in the range 0.0 to 1.0, exclusive. Specifying a higher percentage leads to higher recall and slower performance; specifying a lower percentage leads to lower recall and faster performance. fraction_lists_to_search is only used when a vector index is also used. If you don't specify a fraction_lists_to_search value but an index is matched, an appropriate value is picked. The number of available lists to search is determined by the num_lists option in the ivf_options option, or derived from the leaf_node_embedding_count option in the tree_ah_options option of the CREATE VECTOR INDEX statement, if specified. Otherwise, BigQuery calculates an appropriate number. You can't specify fraction_lists_to_search when use_brute_force is set to true.
  - use_brute_force: A JSON boolean that determines whether to use brute force search by skipping the vector index if one is available. For example, options => '{"use_brute_force": true}'. The default is false. If you specify use_brute_force=false and there is no usable vector index available, brute force is used anyway.

  options defaults to '{}' to denote that all underlying options use their corresponding default values.
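Putting the named arguments together, a call might look like the following sketch. The table name and the specific option values are illustrative assumptions:

```sql
-- Search with explicit tuning: return 20 neighbors using cosine distance,
-- scanning 15% of the index lists. Table and values are hypothetical.
SELECT base.name, distance
FROM AI.SEARCH(
  TABLE mydataset.products,
  'description',
  "A really fun toy",
  top_k => 20,
  distance_type => 'COSINE',
  options => '{"fraction_lists_to_search": 0.15}');
```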
Details
You can optionally use AI.SEARCH with a vector index. When a vector index is used, AI.SEARCH uses the Approximate Nearest Neighbor search technique to help improve vector search performance, with the trade-off of reduced recall, which means the results are more approximate. When a base table is large, using an index typically improves performance without significantly sacrificing recall. Brute force is used to return exact results when a vector index isn't available, and you can choose to use brute force to get exact results even when a vector index is available.
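To force exact results even when an index exists, you can enable brute force through the options argument. A sketch, using a hypothetical table name:

```sql
-- Skip any vector index and compute exact distances with brute force.
-- `mydataset.products` is a hypothetical table with autonomous embeddings
-- enabled on its `description` column.
SELECT base.name, distance
FROM AI.SEARCH(
  TABLE mydataset.products,
  'description',
  "A really fun toy",
  options => '{"use_brute_force": true}');
```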
Output
The output includes the following columns:
- base: A STRUCT value that contains all columns from base_table, or the subset of columns from base_table that you selected in the base_table_query query.
- distance: A FLOAT64 value that represents the distance between the query_value and the embedding in column_to_search.
Rows that are missing a generated embedding are skipped during the search.
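When you pass a base_table_query instead of a table, the base struct contains only the columns you selected. A sketch, assuming a hypothetical mydataset.products table with autonomous embeddings on its description column:

```sql
-- Pre-filter the base table with a query; `base` then contains only the
-- selected `name` and `description` columns. Table and values are hypothetical.
SELECT base.name, distance
FROM AI.SEARCH(
  (SELECT name, description FROM mydataset.products
   WHERE name != 'Encyclopedia set'),
  'description',
  "A really fun toy");
```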
Example
The following example shows how to create a table of products and descriptions with autonomous embedding generation enabled on the description column, add some data to the table, and then search it for products that would be fun to play with.
```sql
# Create a table of products and descriptions with a generated embedding column.
CREATE TABLE mydataset.products (
  name STRING,
  description STRING,
  description_embedding STRUCT<result ARRAY<FLOAT64>, status STRING>
    GENERATED ALWAYS AS (
      AI.EMBED(
        description,
        connection_id => 'us.example_connection',
        endpoint => 'text-embedding-005')) STORED
    OPTIONS (asynchronous = TRUE));

# Insert product descriptions into the table.
# The description_embedding column is automatically updated.
INSERT INTO mydataset.products (name, description)
VALUES
  ("Lounger chair", "A comfortable chair for relaxing in."),
  ("Super slingers", "An exciting board game for the whole family."),
  ("Encyclopedia set", "A collection of informational books.");

# Search for products that are fun to play with.
SELECT base.name, base.description, distance
FROM AI.SEARCH(TABLE mydataset.products, 'description', "A really fun toy");

/*------------------+----------------------------------------------+---------------------+
 | name             | description                                  | distance            |
 +------------------+----------------------------------------------+---------------------+
 | Super slingers   | An exciting board game for the whole family. | 0.80954913893618929 |
 | Lounger chair    | A comfortable chair for relaxing in.         | 0.938933930620146   |
 | Encyclopedia set | A collection of informational books.         | 1.1119297739353384  |
 +------------------+----------------------------------------------+--------------------*/
```

Related functions
The AI.SEARCH and VECTOR_SEARCH functions support overlapping use cases. In general, you should use AI.SEARCH when your base table has autonomous embedding generation enabled and you want to search for results close to a single string literal. It offers a simplified syntax compared to VECTOR_SEARCH and doesn't require you to embed your search query. You should use VECTOR_SEARCH when you want to batch your search queries, when you want to generate your own embeddings as input, or if your base table doesn't use autonomous embedding generation.
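For comparison, the following is a minimal VECTOR_SEARCH sketch in which you supply your own precomputed query embeddings. The table names and columns here are assumptions; both tables are assumed to hold ARRAY&lt;FLOAT64&gt; embeddings that you generated yourself:

```sql
-- VECTOR_SEARCH with self-managed embeddings (sketch).
-- `mydataset.items` and `mydataset.query_embeddings` are hypothetical tables,
-- each with an ARRAY<FLOAT64> column named `embedding`.
SELECT query.id AS query_id, base.id AS item_id, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.items,
  'embedding',
  TABLE mydataset.query_embeddings,
  'embedding',
  top_k => 10);
```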
Locations
You can run AI.SEARCH in all of the locations that support Vertex AI embedding models, and also in the US and EU multi-regions.
Quotas
See Generative AI functions quotas and limits.
What's next
- Learn more about autonomous embedding generation.
- Learn more about creating and managing vector indexes.
- Learn more about embeddings and search.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.