Search functions

GoogleSQL for BigQuery supports the following search functions.

Function list

NameSummary
SEARCH Checks to see whether a table or other search data contains a set of search terms.
VECTOR_SEARCH Performs a vector search on embeddings to find semantically similar entities.

SEARCH

SEARCH(data_to_search,search_query[,json_scope=>{'JSON_VALUES'|'JSON_KEYS'|'JSON_KEYS_AND_VALUES'}][,analyzer=>{'LOG_ANALYZER'|'NO_OP_ANALYZER'|'PATTERN_ANALYZER'}][,analyzer_options=>analyzer_options_values])

Description

TheSEARCH function checks to see whether a BigQuery table or othersearch data contains a set of search terms (tokens). It returnsTRUE if allsearch terms appear in the data, based on therules for search_queryand text analysis described in thetext analyzer. Otherwise,this function returnsFALSE.

Definitions

  • data_to_search: The data to search over. The value can be:

    • Any GoogleSQL data type literal
    • A list of columns
    • A table reference
    • A column of any type

    A table reference is evaluated as aSTRUCT whose fields are the columns ofthe table.data_to_search can be any type, butSEARCH will returnFALSE for all types except those listed here:

    • ARRAY<STRING>
    • ARRAY<STRUCT>
    • JSON
    • STRING
    • STRUCT

    You can search for string literals in columns of the preceding types.For additional rules, seeSearch data rules.

  • search_query: ASTRING literal, or aSTRING constant expression thatrepresents the terms of the search query. Ifsearch_query isNULL, anerror is returned. Ifsearch_query produces no search tokens,and the text analyzer isLOG_ANALYZER orPATTERN_ANALYZER, an error isproduced.
  • json_scope: A named argument with aSTRING value.Takes one of the following values to indicate the scope of JSON data to besearched. It has no effect ifdata_to_search isn't a JSON value ordoesn't contain a JSON field.

    • 'JSON_VALUES' (default): Only the JSON values are searched. Ifjson_scope isn't provided, this is used by default.

    • 'JSON_KEYS': Only the JSON keys are searched.

    • 'JSON_KEYS_AND_VALUES': The JSON keys and values are searched.

  • analyzer: A named argument with aSTRING value. Takesone of the following values to indicate the text analyzer to use:

    • 'LOG_ANALYZER' (default): Breaks the input into tokens when delimitersare encountered and then normalizes the tokens.For more information, seeLOG_ANALYZER.

    • 'NO_OP_ANALYZER': Extracts the text as a single token, butdoesn't apply normalization. For more information about this analyzer,seeNO_OP_ANALYZER.

    • 'PATTERN_ANALYZER': Breaks the input into tokens that match aregular expression. For more information, seePATTERN_ANALYZER text analyzer.

  • analyzer_options: A named argument with a JSON-formattedSTRING value.Takes a list of text analysis rules. For more information,seeText analyzer options.

Details

TheSEARCH function is designed to work withsearch indexes tooptimize point lookups. Although theSEARCH function works fortables that aren't indexed, its performance will be greatly improved with asearch index. If both the analyzer and analyzer options match the one usedto create the index, the search index will be used.

Rules forsearch_query

The'NO_OP_ANALYZER' extracts the search query as a single token withoutparsing it. The following rules apply only when using the'LOG_ANALYZER' or'PATTERN_ANALYZER'.

A search query is a set of one or more terms that are combinedusing the logical operatorsAND andOR along with parenthesis. Anywhitespace in the search query that is not in aphrase orbacktick term isconsidered an (implicit)AND. First, a search query is broken down intoterms using logical operators and parenthesis in the search query. Then, eachterm is evaluated based on whether or not it appears in the data to search. Thefinal outcome of theSEARCH function is the result of the logical expressionrepresented by the search query.

The following grammar is used to transform the search query into a logicalexpression of terms. The grammar is defined using theANTLR meta-language:

query_string : expression EOF;expression  : '(' expression  ')'            | expression 'AND' expression            | expression '\s' expression            | expression 'OR' expression            | term            ;term : single_term     | phrase_term     | backtick_term     ;backtick_term : '`' ( '\`' | ~[`] )+ '`';phrase_term : '"' ( '\"' | ~["] )+ '"';single_term : ( '\' reserved_char | ~[reserved_char] )+;

To evaluate each term, it is further broken down into zero or more searchabletokens based on the text analyzer. The following section contains the rules forhow different types of terms are analyzed and evaluated.

Rules forbacktick_term insearch_query:

  • If theLOG_ANALYZER text analyzer is used, text enclosed in backticksforces an exact match.

    For example,`Hello World` happy days becomesHello World,happy,anddays.

  • Search terms enclosed in backticks must match exactly indata_to_search,subject to the following conditions:

    • It appears at the start ofdata_to_search or is immediately precededby a delimiter.

    • It appears at the end ofdata_to_search or is immediately followed bya delimiter.

    For example,SEARCH('foo.bar', '`foo.`') returnsFALSE because thetext enclosed in the backticksfoo. is immediately followed by thecharacterb in the search datafoo.bar, rather than by a delimiter orthe end of the string. However,SEARCH('foo..bar', '`foo.`') returnsTRUE becausefoo. is immediately followed by the delimiter. in thesearch data.

  • Search terms enclosed in backticks must match case exactly, regardless ofany normalization settings inanalyzer_options.

    For example:

    -- FALSE because backticks require an exact match, including capitalizationSELECTSEARCH('Hello-world','`WORLD`',analyzer=>'LOG_ANALYZER',analyzer_options=>'''    {      "token_filters": [        {          "normalizer": {"mode": "LOWER"}        }      ]    }''')ASresults
  • The backtick itself can be escaped using a backslash,as in\`foobar\`.

  • The following are reserved words and must be enclosedin backticks:

    AND,NOT,OR,IN, andNEAR

Rules forreserved_char insearch_query:

  • Text not enclosed in backticks requires the followingreserved characters to be escaped by a double backslash\\:

    • [ ] < > ( ) { } | ! ' " * & ? + / : = - \ ~ ^

    • If the quoted string is preceded by the characterr orR, such asr"my\+string", then it's treated as a raw string and only a singlebackslash is required to escape the reserved characters. For moreinformation about raw strings and escapesequences, seeString and byte literals.

Rules forphrase_term insearch_query:

  • A phrase is a type of term. If text is enclosed in double quotes and theanalyzer isLOG_ANALYZER,PATTERN_ANALYZER, or not set(LOG_ANALYZER by default), the term represents a phrase.
  • When a phrase is analyzed, a subset of tokens is created for that phrase.For example, from the phrase"foo baz.bar", the analyzer calledLOG_ANALYZER generates the phrase-specific tokensfoo,baz, andbar.
  • The order of terms in a phrase matters. A match is only returned ifthe tokens that were produced for the phrase are next to each other and inthe same order as the tokens fordata_to_search.

    For example:

    -- FALSE because 'foo' and 'bar' aren't next to each other in-- 'foo baz.bar'.SEARCH('foo baz.bar','"foo bar"')
    -- TRUE because 'foo' and 'baz' are next to each other in-- 'foo baz.bar'.SEARCH('foo baz.bar','"foo baz"')
  • A single quote inside of the phrase is analyzed as a special character.

  • An escaped double quote (double quote after a backslash) is analyzedas a double quote character.

Howdata_to_search is broken into searchable tokens

The following table shows howdata_to_search is brokeninto searchable tokens by theLOG_ANALYZER text analyzer. All entries arestrings.

data_to_searchsearchable tokens
127.0.0.1127
0
1
127.0.0.1
.127.0.0
127.0
0.0
0.0.1
0.1
foobar@example.comfoobar
example
com
foobar@example
example.com
foobar@example.com
The fox.the
fox
The
The fox
The fox.
fox
fox.

Howsearch_query is broken into query terms

The following table shows howsearch_query is broken intoquery terms by theLOG_ANALYZER text analyzer. All entries are strings.

search_queryquery terms
127.0.0.1127
0
1
`127.0.0.1`127.0.0.1
foobar@example.comfoobar
example
com
`foobar@example.com`foobar@example.com

Rules fordata_to_search

General rules fordata_to_search:

  • data_to_search must contain all tokens produced forsearch_query for the function to returnTRUE.
  • To perform a cross-field search,data_to_search must be aSTRUCT,ARRAY, orJSON data type.
  • EachSTRING field in a compound data type is individuallysearched for terms.
  • If at least one field indata_to_search includes all search termsproduced bysearch_query,SEARCH returnsTRUE. Otherwise it has thefollowing behavior:

    • If at least oneSTRING field isNULL,SEARCH returnsNULL.

    • Otherwise,SEARCH returnsFALSE.

Return type

BOOL

Examples

The following queries show how tokens insearch_query are analyzedby aSEARCH function call using the default analyzer,LOG_ANALYZER:

SELECT-- ERROR: `search_query` is NULL.SEARCH('foobarexample',NULL)ASa,-- ERROR: `search_query` contains no tokens.SEARCH('foobarexample','')ASb,
SELECT-- TRUE: '-' and ' ' are delimiters.SEARCH('foobar-example','foobar example')ASa,-- TRUE: The search query is a constant expression evaluated to 'foobar'.SEARCH('foobar-example',CONCAT('foo','bar'))ASb,-- FALSE: The search_query isn't split.SEARCH('foobar-example','foobarexample')ASc,-- TRUE: The double backslash escapes the ampersand which is a delimiter.SEARCH('foobar-example','foobar\\&example')ASd,-- TRUE: The single backslash escapes the ampersand in a raw string.SEARCH('foobar-example',R'foobar\&example')ASe,-- FALSE: The backticks indicate that there must be an exact match for-- foobar&example.SEARCH('foobar-example','`foobar&example`')ASf,-- TRUE: An exact match is found.SEARCH('foobar&example','`foobar&example`')ASg/*-------+-------+-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | f     | g     | +-------+-------+-------+-------+-------+-------+-------+ | true  | true  | false | true  | true  | false | true  | +-------+-------+-------+-------+-------+-------+-------*/
SELECT-- TRUE: The order of terms doesn't matter.SEARCH('foobar-example','example foobar')ASa,-- TRUE: Tokens are made lower-case.SEARCH('foobar-example','Foobar Example')ASb,-- TRUE: An exact match is found.SEARCH('foobar-example','`foobar-example`')ASc,-- FALSE: Backticks preserve capitalization.SEARCH('foobar-example','`Foobar`')ASd,-- FALSE: Backticks don't have special meaning for search_data and are-- not delimiters in the default LOG_ANALYZER.SEARCH('`foobar-example`','`foobar-example`')ASe,-- TRUE: An exact match is found after the delimiter in search_data.SEARCH('foobar@example.com','`example.com`')ASf,-- TRUE: An exact match is found between the space delimiters.SEARCH('a foobar-example b','`foobar-example`')ASg;/*-------+-------+-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | f     | g     | +-------+-------+-------+-------+-------+-------+-------+ | true  | true  | true  | false | false | true  | true  | +-------+-------+-------+-------+-------+-------+-------*/
SELECT-- FALSE: No single array entry matches all search terms.SEARCH(['foobar','example'],'foobar example')ASa,-- FALSE: The search query is equivalent to foobar\\=.SEARCH('foobar=','`foobar\\=`')ASb,-- FALSE: This is equivalent to the previous example.SEARCH('foobar=',R'`\foobar=`')ASc,-- TRUE: The equals sign is a delimiter in the data and query.SEARCH('foobar=','foobar\\=')ASd,-- TRUE: This is equivalent to the previous example.SEARCH('foobar=',R'foobar\=')ASe,-- TRUE: An exact match is found.SEARCH('foobar.example','`foobar`')ASf,-- FALSE: `foobar.\` isn't analyzed because of backticks; it isn't-- followed by a delimiter in search_data 'foobar.example'.SEARCH('foobar.example','`foobar.\`')ASg,-- TRUE: `foobar.` isn't analyzed because of backticks; it is-- followed by the delimiter '.' in search_data 'foobar..example'.SEARCH('foobar..example','`foobar.`')ASh;/*-------+-------+-------+-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | f     | g     | h     | +-------+-------+-------+-------+-------+-------+-------+-------+ | false | false | false | true  | true  | true  | false | true  | +-------+-------+-------+-------+-------+-------+-------+-------*/

The following queries show how logical expression can be used insearch_queryto perform aSEARCH function call:

SELECT-- TRUE: A whitespace is an implicit AND.-- Both `foo` and `bar` are in `foo bar baz`.SEARCH(R'foo bar baz',R'foo bar')ASa,-- TRUE: Similar to previous case-- `foo` and `bar` are in `foo bar baz`.SEARCH(R'foo bar baz',R'foo AND bar')ASb,-- TRUE: Only one of `foo` or `bar` should be in `foo`.SEARCH(R'foo',R'foo OR bar')ASc,-- TRUE: `foo` and one of `bar` or `baz` should be in `foo bar`.SEARCH(R'foo bar',R'"foo AND (bar OR baz)"')ASd,-- FALSE: Neither `bar` or `baz` are in `foo`.SEARCH(R'foo',R'foo AND (bar OR baz)')ASc,/*-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | +-------+-------+-------+-------+-------+ | true  | true  | true  | true  | false | +-------+-------+-------+-------+-------+/

The following queries show how phrases insearch_query are analyzedby aSEARCH function call:

SELECT-- TRUE: The phrase `foo bar` is in `foo bar baz`.-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.-- The searchable tokens in `query_string` are `foo` and `bar`-- and because they appear in that exact order in `data_to_search`,-- the function returns TRUE.SEARCH(R'foo bar baz',R'"foo bar"')ASa,-- TRUE: Case is ignored.-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.-- The searchable tokens in `query_string` are `foo` and `bar`-- and because they appear in that exact order in `data_to_search`,-- the function return TRUE.SEARCH(R'Foo bar baz',R'"foo Bar"')ASb,-- TRUE: Both `-` and `&` are delimiters used during tokenization.-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.-- The searchable tokens in `query_string` are `foo` and `bar`-- and because they appear in that exact order in `data_to_search`,-- the function returns TRUE.SEARCH(R'foo-bar baz',R'"foo&bar"')ASc,-- FALSE: Backticks in a phrase are treated as normal characters.-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.-- The searchable tokens in `query_string` are:-- `foo-- bar`-- Because these searchable tokens don't appear in `data_to_search`,-- the function returns FALSE.SEARCH(R'foo bar baz',R'"`foo bar`"')ASd,-- FALSE: `foo bar` isn't in `foo else bar`.-- The tokens in `data_to_search` are `foo`, `else`, and `bar`.-- The searchable tokens in `query_string` are `foo` and `bar`.-- Even though they appear in `data_to_search`, but because they-- don't appear in that exact order (`foo` before `bar`),-- the function returns FALSE.SEARCH(R'foo else bar',R'"foo bar"')ASe,-- FALSE: `foo baz` isn't in `foo bar baz`.-- The `search_query` produces two terms. The first term is `bar`, which-- matches with the similar token in `data_to_search`. However, the second-- term is the phrase "foo&baz" with two tokens, `foo` and `baz`. Because-- `foo` and `baz` don't appear next to each other in `data_to_search`-- (`bar` is in between), the function returns FALSE.SEARCH(R'foo-bar-baz',R'bar "foo&baz"')ASf;/*-------+-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | f     | +-------+-------+-------+-------+-------+-------+ | true  | true  | false | false | false | false | +-------+-------+-------+-------+-------+-------*/
SELECT-- FALSE: Only double quotes need to be escaped in a phrase.-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.-- The searchable tokens in `query_string` are `foo\` and `bar` and they-- must appear in that exact order in `data_to_search`, but don't.SEARCH(R'foo bar baz',R'"foo\ bar"',analyzer_options=>'{"delimiters": [" "]}')ASa,-- TRUE: `foo bar` is in `foo bar baz` after tokenization with the given-- delimiters.-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.-- The searchable tokens in `query_string` are `foo` and `bar` and they-- must appear in that exact order in `data_to_search`.SEARCH(R'foo bar baz',R'"foo? bar"',analyzer_options=>'{"delimiters": [" ", "?"]}')ASb,-- TRUE: `read book` is in `read book now` after `the` is ignored.-- The tokens in `data_to_search` are `read`, `book`, and `now`.-- The searchable tokens in `query_string` are `read` and `book` and they-- must appear in that exact order in `data_to_search`.SEARCH('read the book now',R'"read the book"',analyzer_options=>'{ "token_filters": [{"stop_words": ["the"]}] }')ASc,-- FALSE: `c d` isn't in `a`, `b`, `cd`, `e` or `f` after tokenization with-- the given pattern.-- The tokens in `data_to_search` are `a`, `b`, `cd`, `e` and `f`.-- The searchable tokens in `query_string` are `c` and `d` and they-- must appear in that exact order in `data_to_search`. `data_to_search`-- contains a `cd` token, but not a `c` or `d` token.SEARCH(R'abcdef',R'"c d"',analyzer=>'PATTERN_ANALYZER',analyzer_options=>'{"patterns": ["(?:cd)|[a-z]"]}')ASd,-- TRUE: `ant apple` is in `ant apple avocado` after tokenization with-- the given pattern.-- The tokens in `data_to_search` are `ant`, `apple`, and `avocado`.-- The searchable tokens in `query_string` are `ant` and `apple` and they-- must appear in that exact order in `data_to_search`.SEARCH(R'ant orange apple avocado',R'"ant apple"',analyzer=>'PATTERN_ANALYZER',analyzer_options=>'{"patterns": ["a[a-z]"]}')ASe;/*-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | +-------+-------+-------+-------+-------+ | false | true  | true  | false | true  | +-------+-------+-------+-------+-------*/

The following query shows examples of calls to theSEARCH function using theNO_OP_ANALYZER text analyzer and reasons for various return values:

SELECT-- TRUE: exact matchSEARCH('foobar','foobar',analyzer=>'NO_OP_ANALYZER')ASa,-- FALSE: Backticks aren't special characters for `NO_OP_ANALYZER`.SEARCH('foobar','\`foobar\`',analyzer=>'NO_OP_ANALYZER')ASb,-- FALSE: The capitalization doesn't match.SEARCH('foobar','Foobar',analyzer=>'NO_OP_ANALYZER')ASc,-- FALSE: There are no delimiters for `NO_OP_ANALYZER`.SEARCH('foobar example','foobar',analyzer=>'NO_OP_ANALYZER')ASd,-- TRUE: An exact match is found.SEARCH('','',analyzer=>'NO_OP_ANALYZER')ASe,-- FALSE: 'foo bar' and "foo bar" aren't considered an exact match.SEARCH(R'foo bar baz',R'"foo bar"',analyzer=>'NO_OP_ANALYZER')ASf,-- TRUE: "foo bar" and "foo Bar" are considered an exact match because the-- analysis is case-insensitive.SEARCH(R'"foo bar"',R'"foo Bar"',analyzer=>'NO_OP_ANALYZER')ASg;-- FALSE: With NO_OP_ANALYZER the query string is analyzed as "foo OR bar"-- which is not an exact match with "foo".SEARCH(R'foo',R'foo OR bar',analyzer=>'NO_OP_ANALYZER')ASh;/*-------+-------+-------+-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | f     | g     | h     | +-------+-------+-------+-------+-------+-------+-------+-------+ | true  | false | false | false | true  | false | true  | false | +-------+-------+-------+-------+-------+-------+-------+-------*/

Consider the following table calledmeals with columnsbreakfast,lunch,anddinner:

/*-------------------+-------------------------+------------------+ | breakfast         | lunch                   | dinner           | +-------------------+-------------------------+------------------+ | Potato pancakes   | Toasted cheese sandwich | Beef soup        | | Avocado toast     | Tomato soup             | Chicken soup     | +-------------------+-------------------------+------------------*/

The following query shows how to search single columns, multiple columns, andwhole tables, using the defaultLOG_ANALYZER text analyzerwith the default analyzer options:

WITHmealsAS(SELECT'Potato pancakes'ASbreakfast,'Toasted cheese sandwich'ASlunch,'Beef soup'ASdinnerUNIONALLSELECT'Avocado toast'ASbreakfast,'Tomato soup'ASlunch,'Chicken soup'ASdinner)SELECTSEARCH(lunch,'soup')ASlunch_soup,SEARCH((breakfast,dinner),'soup')ASbreakfast_or_dinner_soup,SEARCH(meals,'soup')ASanytime_soupFROMmeals;/*------------+--------------------------+--------------+ | lunch_soup | breakfast_or_dinner_soup | anytime_soup | +------------+--------------------------+--------------+ | false      | true                     | true         | | true       | true                     | true         | +------------+--------------------------+--------------*/

The following query shows additional ways to search, using thedefaultLOG_ANALYZER text analyzer withdefault analyzer options:

WITHdataAS(SELECT'Please use foobar@example.com as your email.'ASemail)SELECTSEARCH(email,'exam')ASa,SEARCH(email,'foobar')ASb,SEARCH(email,'example.com')ASc,SEARCH(email,R'"please use"')ASd,SEARCH(email,R'"as email"')ASeFROMdata;/*-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | +-------+-------+-------+-------+-------+ | false | true  | true  | true  | false | +-------+-------+-------+-------+-------*/

The following query shows additional ways to search, using thedefaultLOG_ANALYZER text analyzer with customanalyzer options. Terms are only split when a space or@ symbol isencountered.

WITHdataAS(SELECT'Please use foobar@example.com as your email.'ASemail)SELECTSEARCH(email,'foobar',analyzer_options=>'{"delimiters": [" ", "@"]}')ASa,SEARCH(email,'example',analyzer_options=>'{"delimiters": [" ", "@"]}')ASb,SEARCH(email,'example.com',analyzer_options=>'{"delimiters": [" ", "@"]}')ASc,SEARCH(email,'foobar@example.com',analyzer_options=>'{"delimiters": [" ", "@"]}')ASd,SEARCH(email,R'use "foobar example.com" "as your"',analyzer_options=>'{"delimiters": [" ", "@"]}')ASeFROMdata;/*-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | +-------+-------+-------+-------+-------+ | true  | false | true  | true  | true  | +-------+-------+-------+-------+-------*/

The following query shows how to search, using theNO_OP_ANALYZER text analyzer:

WITHmealsAS(SELECT'Tomato soup'ASlunch)SELECTSEARCH(lunch,'Tomato soup',analyzer=>'NO_OP_ANALYZER')ASa,SEARCH(lunch,'soup',analyzer=>'NO_OP_ANALYZER')ASb,SEARCH(lunch,'tomato soup',analyzer=>'NO_OP_ANALYZER')ASc,SEARCH(lunch,R'"Tomato soup"',analyzer=>'NO_OP_ANALYZER')ASdFROMmeals;/*-------+-------+-------+-------+ | a     | b     | c     | d     | +-------+-------+-------+-------+ | true  | false | false | false | +-------+-------+-------+-------*/

The following query shows how to use thePATTERN_ANALYZERtext analyzer with default analyzer options:

WITHdataAS(SELECT'Please use foobar@example.com as your email.'ASemail)SELECTSEARCH(email,'exam',analyzer=>'PATTERN_ANALYZER')ASa,SEARCH(email,'foobar',analyzer=>'PATTERN_ANALYZER')ASb,SEARCH(email,'example.com',analyzer=>'PATTERN_ANALYZER')ASc,SEARCH(email,R'foobar "EXAMPLE.com as" email',analyzer=>'PATTERN_ANALYZER')ASdFROMdata;/*-------+-------+-------+-------+ | a     | b     | c     | d     | +-------+-------+-------+-------+ | false | true  | true  | true  | +-------+-------+-------+-------*/

The following query shows additional ways to search, using thePATTERN_ANALYZER text analyzer withcustom analyzer options:

WITHdataAS(SELECT'Please use foobar@EXAMPLE.com as your email.'ASemail)SELECTSEARCH(email,'EXAMPLE',analyzer=>'PATTERN_ANALYZER',analyzer_options=>'{"patterns": ["[A-Z]*"]}')ASa,SEARCH(email,'example',analyzer=>'PATTERN_ANALYZER',analyzer_options=>'{"patterns": ["[a-z]*"]}')ASb,SEARCH(email,'example.com',analyzer=>'PATTERN_ANALYZER',analyzer_options=>'{"patterns": ["[a-z]*"]}')ASc,SEARCH(email,'example.com',analyzer=>'PATTERN_ANALYZER',analyzer_options=>'{"patterns": ["[a-zA-Z.]*"]}')ASdFROMdata;/*-------+-------+-------+-------+-------+ | a     | b     | c     | d     | e     | +-------+-------+-------+-------+-------+ | true  | false | false | true  | false | +-------+-------+-------+-------+-------*/

For additional examples that include analyzer options,see theText analysis reference guide.

For helpful analyzer recipes that you can use to enhanceanalyzer-supported queries, see theSearch with text analyzers user guide.

VECTOR_SEARCH

VECTOR_SEARCH({TABLEbase_table|(base_table_query)},column_to_search,{TABLEquery_table|(query_table_query)},[,query_column_to_search=>query_column_to_search_value][,top_k=>top_k_value][,distance_type=>distance_type_value][,options=>options_value])

Description

TheVECTOR_SEARCH function lets you search embeddings to find semanticallysimilar entities.

Embeddings are high-dimensional numerical vectors that represent a given entity,like a piece of text or an audio file. Machine learning (ML) models useembeddings to encode semantics about such entities to make it easier toreason about and compare them. For example, a common operation in clustering,classification, and recommendation models is to measure the distance betweenvectors in anembedding space tofind items that are most semantically similar.

Definitions

  • base_table: The table to search for nearest neighbor embeddings.
  • base_table_query: A query that you can use to pre-filter the basetable. OnlySELECT,FROM, andWHERE clauses are allowed in this query.Don't apply any filters to the embedding column.You can't uselogical views in this query.Using asubquery mightinterfere with index usage or cause your query to fail.If the base table is indexed and theWHERE clause contains columns that arenot stored in the index, thenVECTOR_SEARCH post-filters on those columnsinstead. To learn more, seeStore columns and pre-filter.
  • column_to_search: The name of the base table columnto search for nearest neighbor embeddings. The column must havea type ofARRAY<FLOAT64>. All elements in the array must be non-NULL, andall values in the column must have the same array dimensions.If the column has a vector index, BigQuery attempts to use it.To determine if an index was used in the vector search, seeVector index usage.
  • query_table: The table that provides theembeddings for which to find nearest neighbors. All columns are passedthrough as output columns.
  • query_table_query: A query that provides theembeddings for which to find nearest neighbors. All columns are passedthrough as output columns.
  • query_column_to_search: A named argument with aSTRING value.query_column_to_search_value specifies the name of the column in the querytable or statement that contains the embeddings for which to find nearestneighbors. The column must have a type ofARRAY<FLOAT64>. All elements inthe array must be non-NULLand all values in the column must have the samearray dimensions as the values in thecolumn_to_search column. If you don'tspecifyquery_column_to_search_value, the function uses thecolumn_to_search value or picks the most appropriate column.
  • top_k: A named argument with anINT64 value.top_k_valuespecifies the number of nearest neighbors toreturn. The default is10. A negative value is treated as infinity, meaningthat all values are counted as neighbors and returned.
  • distance_type: A named argument with aSTRING value.distance_type_value specifies the type of metric to use tocompute the distance between two vectors. Supported distance types areEUCLIDEAN,COSINE, andDOT_PRODUCT.The default isEUCLIDEAN.

    If you don't specifydistance_type_value and thecolumn_to_searchcolumn has a vector index that's used,VECTOR_SEARCH uses the distancetype specified in thedistance_type option of theCREATE VECTOR INDEX statement.

  • options: A named argument with a JSON-formattedSTRING value.options_value is a literal that specifies the following vector searchoptions:

    • fraction_lists_to_search: A JSON number that specifies thepercentage of lists to search. For example,options => '{"fraction_lists_to_search":0.15}'. Thefraction_lists_to_search value must be in the range0.0 to1.0,exclusive.

      Specifying a higher percentage leads to higher recall and slowerperformance, and the converse is true when specifying a lower percentage.

      fraction_lists_to_search is only used when a vector index is also used.If you don't specify afraction_lists_to_search value but an index ismatched, an appropriate value is picked.

      The number of available lists to search is determined by thenum_lists option in theivf_options option or derived fromtheleaf_node_embedding_count option in thetree_ah_options option of theCREATE VECTOR INDEX statement ifspecified. Otherwise, BigQuery calculates an appropriate number.

      You can't specifyfraction_lists_to_search whenuse_brute_force isset totrue.

    • use_brute_force: A JSON boolean that determines whether to use bruteforce search by skipping the vector index if one is available. Forexample,options => '{"use_brute_force":true}'. Thedefault isfalse. If you specifyuse_brute_force=false and there isno useable vector index available, brute force is used anyway.

    options defaults to'{}' to denote that all underlying options use theircorresponding default values.

Details

You can optionally useVECTOR_SEARCH with avector index. Whena vector index is used,VECTOR_SEARCH uses theApproximate NearestNeighbor search technique to help improve vector search performance, withthe trade-off of reducingrecall and so returning more approximateresults. When a base table is large, the use of an index typically improvesperformance without significantly sacrificing recall. Brute force is used toreturn exact results when a vector index isn't available, and you canchoose to use brute force to get exact results even when a vector indexis available.

Output

For each row in the query data, the output contains multiple rows from thebase table that satisfy the search criteria. The number of results rows perquery table row is either 10 or thetop_k value if it's specified. Theorder of the output isn't guaranteed.

The output includes the following columns:

  • query: ASTRUCT value that contains all selected columns from the querydata.
  • base: ASTRUCT value that contains all columns frombase_table or asubset of the columns frombase_table that you selected in thebase_table_query query.
  • distance: AFLOAT64 value that represents the distance between the basedata and the query data.

Limitations

BigQuery data security and governance rules apply to the use ofVECTOR_SEARCH, which results in the following behavior:

  • If the base table hasrow-level security policies,VECTOR_SEARCH applies the row-levelaccess policies to the query results.
  • If the indexed column from the base table hasdata masking policies,VECTOR_SEARCH succeeds only if the userrunning the query has theFine-Grained Reader role on the policy tagsthat are used. Otherwise,VECTOR_SEARCH fails with an invalid query error.
  • If any base table column or any column in the query table or statement hascolumn-level security policies and you don't have appropriatepermissions to access the column,VECTOR_SEARCH fails with a permissiondenied error.

  • The project that runs the query containingVECTOR_SEARCH must match theproject that contains the base table.

Examples

The following queries create test tablesbase_table andquery_table to usein subsequent query examples. These tables use a fictional 2-dimensionalembedding of various animal names for readability, but a typical text embeddinguses hundreds or thousands of dimensions.

CREATEORREPLACETABLEmydataset.base_table(idSTRING,my_embeddingARRAY<FLOAT64>);INSERTmydataset.base_table(id,my_embedding)VALUES('dog',[1.0,2.0]),('wolf',[2.0,4.0]),('snake',[-2.0,3.0]),('lion',[2.0,-2.5]),('tiger',[3.0,-2.0]),('otter',[-3.0,-1.0]),('whale',[-5.0,-1.0]);
CREATEORREPLACETABLEmydataset.query_table(query_idSTRING,embeddingARRAY<FLOAT64>);INSERTmydataset.query_table(query_id,embedding)VALUES('dog',[1.0,2.0]),('cat',[1.0,-1.0]);

The following example searches themy_embedding column ofbase_table forthe top two embeddings that match each row of data in theembedding column ofquery_table:

SELECT*FROMVECTOR_SEARCH(TABLEmydataset.base_table,'my_embedding',(SELECTquery_id,embeddingFROMmydataset.query_table),'embedding',top_k=>2);/*----------------+-----------------+---------+----------------------------------------+ | query.query_id | query.embedding | base.id | base.my_embedding | distance           | +----------------+-----------------+---------+-------------------+--------------------+ | dog            |  1.0            | dog     |  1.0              | 0.0                | |                |  2.0            |         |  2.0              |                    | +----------------+-----------------+---------+-------------------+--------------------+ | dog            |  1.0            | wolf    |  2.0              | 2.23606797749979   | |                |  2.0            |         |  4.0              |                    | +----------------+-----------------+---------+-------------------+--------------------+ | cat            |  1.0            | lion    |  2.0              | 1.8027756377319946 | |                | -1.0            |         | -2.5              |                    | +----------------+-----------------+---------+-------------------+--------------------+ | cat            |  1.0            | tiger   |  3.0              | 2.23606797749979   | |                | -1.0            |         | -2.0              |                    | +----------------+-----------------+---------+-------------------+--------------------*/

The following example pre-filtersbase_table to rows whereid isn't equal to"wolf" and then searches themy_embedding column ofbase_table for the toptwo embeddings that match each row of data in theembedding column ofquery_table.

SELECT*FROMVECTOR_SEARCH((SELECT*FROMmydataset.base_tableWHEREid!='wolf'),'my_embedding',(SELECTquery_id,embeddingFROMmydataset.query_table),'embedding',top_k=>2,options=>'{"use_brute_force":true}');/*----------------+-----------------+---------+----------------------------------------+ | query.query_id | query.embedding | base.id | base.my_embedding | distance           | +----------------+-----------------+---------+-------------------+--------------------+ | dog            |  1.0            | dog     |  1.0              | 0.0                | |                |  2.0            |         |  2.0              |                    | +----------------+-----------------+---------+-------------------+--------------------+ | dog            |  1.0            | snake   | -2.0              | 3.1622776601683795 | |                |  2.0            |         |  3.0              |                    | +----------------+-----------------+---------+-------------------+--------------------+ | cat            |  1.0            | lion    |  2.0              | 1.8027756377319946 | |                | -1.0            |         | -2.5              |                    | +----------------+-----------------+---------+-------------------+--------------------+ | cat            |  1.0            | tiger   |  3.0              | 2.23606797749979   | |                | -1.0            |         | -2.0              |                    | +----------------+-----------------+---------+-------------------+--------------------*/

The following example searches themy_embedding column ofbase_table forthe top two embeddings that match each row of data in theembedding column ofquery_table, and uses theCOSINE distance type to measure the distancebetween the embeddings:

SELECT*FROMVECTOR_SEARCH(TABLEmydataset.base_table,'my_embedding',TABLEmydataset.query_table,'embedding',top_k=>2,distance_type=>'COSINE');/*----------------+-----------------+---------+-------------------------------------------+ | query.query_id | query.embedding | base.id | base.my_embedding | distance              | +----------------+-----------------+---------+-------------------+-----------------------+ | dog            |  1.0            | wolf    |  2.0              | 0                     | |                |  2.0            |         |  4.0              |                       | +----------------+-----------------+---------+-------------------+-----------------------+ | dog            |  1.0            | dog     |  1.0              | 0                     | |                |  2.0            |         |  2.0              |                       | +----------------+-----------------+---------+-------------------+-----------------------+ | cat            |  1.0            | lion    |  2.0              | 0.0061162653263812095 | |                | -1.0            |         | -2.5              |                       | +----------------+-----------------+---------+-------------------+-----------------------+ | cat            |  1.0            | tiger   |  3.0              | 0.019419324309079777  | |                | -1.0            |         | -2.0              |                       | +----------------+-----------------+---------+-------------------+-----------------------*/

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-24 UTC.