The AI.SEARCH function

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To provide feedback or request support for this feature during the preview, contact bq-vector-search@google.com.

This document describes the AI.SEARCH function, which is a table-valued function for semantic search on tables that have autonomous embedding generation enabled.

For example, you could use a query like the following to search a table of product descriptions for anything described as a fun toy. In this example, the product_description column has autonomous embedding generation enabled.

SELECT *
FROM AI.SEARCH(
  TABLE product_table,
  'product_description',
  'A really fun toy');

Embeddings are high-dimensional numerical vectors that represent a given entity. Embeddings encode semantics about entities to make it easier to reason about and compare them. If two entities are semantically similar, then their respective embeddings are located near each other in the embedding vector space. The AI.SEARCH function embeds your search query and searches the input table for embeddings that are close to it. If your table has a vector index on the embedding column, then AI.SEARCH uses it to optimize the search.

You can use AI.SEARCH to help with the following tasks:

  • Semantic search: search entities ranked by semantic similarity.
  • Recommendation: return entities with attributes similar to a given entity.
  • Classification: return the class of entities whose attributes are similar to the given entity.
  • Clustering: cluster entities whose attributes are similar to a given entity.
  • Outlier detection: return entities whose attributes are least related to the given entity.

Syntax

AI.SEARCH(
  { TABLE base_table | base_table_query },
  column_to_search,
  query_value
  [, top_k => top_k_value]
  [, distance_type => distance_type_value]
  [, options => options_value]
)

Arguments

AI.SEARCH takes the following arguments:

  • base_table: The table to search for nearest neighbor embeddings. The table must have autonomous embedding generation enabled.
  • base_table_query: A query that you can use to pre-filter the base table. Only SELECT, FROM, and WHERE clauses are allowed in this query. Don't apply any filters to the embedding column. You can't use logical views in this query. Using a subquery might interfere with index usage or cause your query to fail. If the base table is indexed and the WHERE clause contains columns that are not stored in the index, then AI.SEARCH uses post-filters on those columns instead. To learn more, see Store columns and pre-filter.
  • column_to_search: A STRING literal that contains the name of the string column to search. This must be the name of the source column that the automatically generated embedding column is based on, not the name of the generated embedding column itself. If the column has a vector index, BigQuery attempts to use it. To determine whether an index was used in the vector search, see Vector index usage.
  • query_value: A string literal that represents the search query. This value is embedded at runtime using the same connection and endpoint specified for the base table's embedding generation. You must have the BigQuery Connection User role (roles/bigquery.connectionUser) on the connection that the base table uses for background embedding generation. If embedding generation fails for query_value, then the whole query fails. Rows with missing embeddings in the base table are skipped during the search.
  • top_k: A named argument with an INT64 value. top_k_value specifies the number of nearest neighbors to return. The default is 10. If the value is negative, all values are counted as neighbors and returned.
  • distance_type: A named argument with a STRING value. distance_type_value specifies the type of metric to use to compute the distance between two vectors. Supported distance types are EUCLIDEAN, COSINE, and DOT_PRODUCT. The default is EUCLIDEAN.

    If you don't specify distance_type_value and the column_to_search column has a vector index that's used, then AI.SEARCH uses the distance type specified in the distance_type option of the CREATE VECTOR INDEX statement.

  • options: A named argument with a JSON-formatted STRING value. options_value is a literal that specifies the following search options:

    • fraction_lists_to_search: A JSON number that specifies the percentage of lists to search. For example, options => '{"fraction_lists_to_search":0.15}'. The fraction_lists_to_search value must be in the range 0.0 to 1.0, exclusive.

      Specifying a higher percentage leads to higher recall and slower performance; specifying a lower percentage leads to lower recall and faster performance.

      fraction_lists_to_search is only used when a vector index is also used. If you don't specify a fraction_lists_to_search value but an index is matched, an appropriate value is picked.

      The number of available lists to search is determined by the num_lists option in the ivf_options option, or derived from the leaf_node_embedding_count option in the tree_ah_options option of the CREATE VECTOR INDEX statement, if specified. Otherwise, BigQuery calculates an appropriate number.

      You can't specify fraction_lists_to_search when use_brute_force is set to true.

    • use_brute_force: A JSON boolean that determines whether to use brute force search by skipping the vector index if one is available. For example, options => '{"use_brute_force":true}'. The default is false. If you specify use_brute_force=false and there is no usable vector index available, brute force is used anyway.

    options defaults to '{}' to denote that all underlying options use their corresponding default values.
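To illustrate how these arguments combine, the following query pre-filters the base table with a base_table_query and sets the top_k, distance_type, and options named arguments. The mydataset.products table and the in_stock column are hypothetical names used only for illustration:

```sql
-- Pre-filter the base table to in-stock products, then return the 5 nearest
-- neighbors by cosine distance, searching 20% of the index lists.
-- All table and column names here are illustrative.
SELECT base.name, distance
FROM AI.SEARCH(
  (SELECT * FROM mydataset.products WHERE in_stock = TRUE),
  'description',
  'A really fun toy',
  top_k => 5,
  distance_type => 'COSINE',
  options => '{"fraction_lists_to_search": 0.2}');
```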

Details

You can optionally use AI.SEARCH with a vector index. When a vector index is used, AI.SEARCH uses the Approximate Nearest Neighbor search technique to help improve vector search performance, with the trade-off of reduced recall, which returns more approximate results. When a base table is large, using an index typically improves performance without significantly sacrificing recall. Brute force is used to return exact results when a vector index isn't available, and you can choose to use brute force to get exact results even when a vector index is available.
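For example, to force an exact search even when a vector index is available, you can set the use_brute_force option. The table and column names in this sketch are illustrative:

```sql
-- Skip any vector index and compute exact distances against every row.
SELECT base.name, distance
FROM AI.SEARCH(
  TABLE mydataset.products,
  'description',
  'A really fun toy',
  options => '{"use_brute_force": true}');
```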

Output

The output includes the following columns:

  • base: A STRUCT value that contains all columns from base_table, or the subset of columns from base_table that you selected in the base_table_query query.
  • distance: A FLOAT64 value that represents the distance between the embedding of query_value and the embedding in column_to_search.

Rows that are missing a generated embedding are skipped during the search.

Example

The following example shows how to create a table of products and descriptions with autonomous embedding generation enabled on the description column, add some data to the table, and then search it for products that would be fun to play with.

# Create a table of products and descriptions with a generated embedding column.
CREATE TABLE mydataset.products (
  name STRING,
  description STRING,
  description_embedding STRUCT<result ARRAY<FLOAT64>, status STRING>
    GENERATED ALWAYS AS (
      AI.EMBED(
        description,
        connection_id => 'us.example_connection',
        endpoint => 'text-embedding-005'))
    STORED OPTIONS (asynchronous = TRUE)
);

# Insert product descriptions into the table.
# The description_embedding column is automatically updated.
INSERT INTO mydataset.products (name, description)
VALUES
  ("Lounger chair", "A comfortable chair for relaxing in."),
  ("Super slingers", "An exciting board game for the whole family."),
  ("Encyclopedia set", "A collection of informational books.");

# Search for products that are fun to play with.
SELECT base.name, base.description, distance
FROM AI.SEARCH(
  TABLE mydataset.products,
  'description',
  "A really fun toy");

/*------------------+----------------------------------------------+---------------------+
 | name             | description                                  | distance            |
 +------------------+----------------------------------------------+---------------------+
 | Super slingers   | An exciting board game for the whole family. | 0.80954913893618929 |
 | Lounger chair    | A comfortable chair for relaxing in.         | 0.938933930620146   |
 | Encyclopedia set | A collection of informational books.         | 1.1119297739353384  |
 +------------------+----------------------------------------------+---------------------*/

Related functions

The AI.SEARCH and VECTOR_SEARCH functions support overlapping use cases. In general, you should use AI.SEARCH when your base table has autonomous embedding generation enabled and you want to search for results close to a single string literal. It offers a simplified syntax compared to VECTOR_SEARCH and doesn't require you to embed your search query. You should use VECTOR_SEARCH when you want to batch your search queries, when you want to generate your own embeddings as input, or if your base table doesn't use autonomous embedding generation.
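As a rough comparison, a similar search with VECTOR_SEARCH requires you to supply the query embedding yourself, for example by generating it with ML.GENERATE_EMBEDDING. The following sketch assumes a hypothetical mydataset.products_embedded table with a precomputed embedding column and a hypothetical mydataset.embedding_model remote model; adjust the names to match your schema:

```sql
-- Generate the query embedding inline, then search a precomputed
-- embedding column. All object names are illustrative.
SELECT base.name, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.products_embedded,
  'embedding',
  (SELECT ml_generate_embedding_result AS embedding
   FROM ML.GENERATE_EMBEDDING(
     MODEL mydataset.embedding_model,
     (SELECT 'A really fun toy' AS content))),
  query_column_to_search => 'embedding',
  top_k => 5);
```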

Locations

You can run AI.SEARCH in all of the locations that support Vertex AI embedding models, and also in the US and EU multi-regions.

Quotas

See Generative AI functions quotas and limits.



Last updated 2025-12-15 UTC.