Search functions in GoogleSQL

GoogleSQL for Spanner supports the following search functions.

Categories

The search functions are grouped into the following categories, based on their behavior:

  • Indexing: TOKEN, TOKENIZE_BOOL, TOKENIZE_FULLTEXT, TOKENIZE_JSON, TOKENIZE_NGRAMS, TOKENIZE_NUMBER, TOKENIZE_SUBSTRING, TOKENLIST_CONCAT. Functions that you can use to create search indexes.
  • Retrieval and presentation: SCORE, SCORE_NGRAMS, SEARCH, SEARCH_NGRAMS, SEARCH_SUBSTRING, SNIPPET. Functions that you can use to search for data, score the search result, or format the search result.
  • Debugging: DEBUG_TOKENLIST. Functions that you can use for debugging.

Function list

  • DEBUG_TOKENLIST: Displays a human-readable representation of tokens present in the TOKENLIST value for debugging purposes.
  • SCORE: Calculates a relevance score of a TOKENLIST for a full-text search query. The higher the score, the stronger the match.
  • SCORE_NGRAMS: Calculates a relevance score of a TOKENLIST for a fuzzy search. The higher the score, the stronger the match.
  • SEARCH: Returns TRUE if a full-text search query matches tokens.
  • SEARCH_NGRAMS: Checks whether enough n-grams match the tokens in a fuzzy search.
  • SEARCH_SUBSTRING: Returns TRUE if a substring query matches tokens.
  • SNIPPET: Gets a list of snippets that match a full-text search query.
  • TOKEN: Constructs an exact match TOKENLIST value by tokenizing a BYTES or STRING value verbatim to accelerate exact match expressions in SQL.
  • TOKENIZE_BOOL: Constructs a boolean TOKENLIST value by tokenizing a BOOL value to accelerate boolean match expressions in SQL.
  • TOKENIZE_FULLTEXT: Constructs a full-text TOKENLIST value by tokenizing text for full-text matching.
  • TOKENIZE_JSON: Constructs a JSON TOKENLIST value by tokenizing a JSON value to accelerate JSON predicate expressions in SQL.
  • TOKENIZE_NGRAMS: Constructs an n-gram TOKENLIST value by tokenizing a STRING value for matching n-grams.
  • TOKENIZE_NUMBER: Constructs a numeric TOKENLIST value by tokenizing numeric values to accelerate numeric comparison expressions in SQL.
  • TOKENIZE_SUBSTRING: Constructs a substring TOKENLIST value by tokenizing text for substring matching.
  • TOKENLIST_CONCAT: Constructs a TOKENLIST value by concatenating one or more TOKENLIST values.

DEBUG_TOKENLIST

DEBUG_TOKENLIST(tokenlist)

Description

Displays a human-readable representation of tokens present in a TOKENLIST value for debugging purposes.

Definitions

  • tokenlist: The TOKENLIST value to display.

Details

Note: The returned STRING value is intended solely for debugging purposes and its format is subject to change without notice.

The output of this function depends on the source of the TOKENLIST value provided as input.

Return type

STRING

Examples

The following query illustrates how attributes and positions are represented:

  • In hello(boundary), hello is the text of the token and boundary is an attribute of the token.
  • Token db has no attributes.
  • In [#world, world](boundary), #world and world are both tokens added to the tokenlist, at the same position. boundary is the attribute for both of them. This can match either #world or world query terms.
SELECT DEBUG_TOKENLIST(TOKENIZE_FULLTEXT('Hello DB #World')) AS Result;

/*------------------------------------------------+
 | Result                                         |
 +------------------------------------------------+
 | hello(boundary), db, [#world, world](boundary) |
 +------------------------------------------------*/

The following query illustrates how equality and range are represented:

  • ==1 and ==10 represent equality tokens for 1 and 10.
  • [1, 1] represents a range token with 1 as the lower bound and 1 as the upper bound.
SELECT DEBUG_TOKENLIST(TOKENIZE_NUMBER([1, 10], min=>1, max=>10)) AS Result;

/*--------------------------------------------------------------------------------+
 | Result                                                                         |
 +--------------------------------------------------------------------------------+
 | ==1, ==10, [1, 1], [1, 2], [1, 4], [1, 8], [9, 10], [9, 12], [9, 16], [10, 10] |
 +--------------------------------------------------------------------------------*/

SCORE

SCORE(tokens, search_query[, dialect=>{"rquery"|"words"|"words_phrase"}][, language_tag=>value][, enhance_query=>{TRUE|FALSE}][, options=>value])

Description

Calculates a relevance score of a TOKENLIST for a full-text search query. The higher the score, the stronger the match.

Definitions

  • tokens: A TOKENLIST value that represents a list of full-text tokens.
  • search_query: A STRING value that represents a search query, which is interpreted based on the dialect argument. For more information, see the search query overview.
  • dialect: A named argument with a STRING value. The value determines how search_query is understood and processed. If the value is NULL or this argument isn't specified, rquery is used by default. This function supports the dialect values rquery, words, and words_phrase, which are described in the SEARCH function's search query syntax dialects.

  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for search_query. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • enhance_query: A named argument with a BOOL value. The value determines whether to enhance the search query. For example, if enhance_query is enabled, a search query containing the term classic can expand to include similar terms such as classical. If the enhance_query call times out, the search query proceeds without enhancement. However, if the query includes @{require_enhance_query=true} SELECT ..., a timeout causes the entire query to fail instead. The default timeout for query enhancement is 500 ms, which you can override using a hint like @{enhance_query_timeout_ms=200} SELECT ....

    • If TRUE, the search query is enhanced to improve search quality.

    • If FALSE (default), the search query isn't enhanced.

  • options: A named argument with a JSON value. The value contains fine-tuning options for search scoring.

    • bigram_weight: A multiplier for bigrams, which have matching terms adjacent to each other. The default is 2.0.

    • idf_weight: A multiplier for term commonality. Hits on rare terms score relatively higher than hits on common terms. The default is 1.0.

    • token_category_weights: A multiplier for each HTML category. The available categories are: small, medium, large, title.

    • version: A distinct release of the scorer that bundles a specific set of active features and default parameter values. The available versions are 1 and 2; the default is 1. For example: options=>JSON '{"version": 2}'.

Details

  • This function must reference a full-text TOKENLIST column in a table that is also indexed in a search index. To add a full-text TOKENLIST column to a table and to a search index, see the examples for this function.
  • This function requires the SEARCH function in the same SQL query.
  • This function returns 0 when tokens or search_query is NULL.

Versions

The SCORE algorithm is periodically updated. After a short evaluation period, the default behavior updates to the newest version. You are encouraged to leave the version unspecified so that your database can benefit from improvements to the SCORE algorithm. However, you can set the version number in the options argument to retain old behavior.

  • 2 (2025-08):

    • When enhance_query is true, hits on synonyms are now demoted based on confidence in the synonym's accuracy.

    • Improved the algorithm that limits each query term's maximum contribution to the overall score.

    • Fixed an issue where documents with exactly one hit for a query term received a lower score than intended.

    • Fixed an issue where query terms under an "OR" were not weighted correctly, especially when enhance_query was used.

  • 1 (Default): The initial version.

Return type

FLOAT64

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionTokens, which tokenizes the input added to the Description column, and then saves those tokens in the DescriptionTokens column. Finally, AlbumsIndex indexes DescriptionTokens. Once DescriptionTokens is indexed, it can be used with the SCORE function.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionTokens TOKENLIST AS (TOKENIZE_FULLTEXT(Description)) HIDDEN
) PRIMARY KEY(SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionTokens);

INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 1, 'classical album');
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'classical and rock album');

The following query searches the column called Description for the tokens classical album. If these tokens are found for singer ID 1, the matching Description values are returned with the corresponding score. Both classical album and classical and rock album have the terms classical and album, but the first one has a higher score because the terms are adjacent.

SELECT a.Description, SCORE(a.DescriptionTokens, 'classical album') AS Score
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album');

/*--------------------------+---------------------+
 | Description              | Score               |
 +--------------------------+---------------------+
 | classical album          | 1.2818930149078369  |
 | classical and rock album | 0.50003194808959961 |
 +--------------------------+---------------------*/

The following query is like the previous one. However, the score for adjacent matching terms is boosted more by raising bigram_weight.

SELECT a.Description, SCORE(a.DescriptionTokens, 'classical album', options=>JSON '{"bigram_weight": 3.0}') AS Score
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album');

/*--------------------------+---------------------+
 | Description              | Score               |
 +--------------------------+---------------------+
 | classical album          | 1.7417128086090088  |
 | classical and rock album | 0.50003194808959961 |
 +--------------------------+---------------------*/

The following query uses SCORE in the ORDER BY clause to get the row with the highest score.

SELECT a.Description
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album')
ORDER BY SCORE(a.DescriptionTokens, 'classical album') DESC
LIMIT 1;

/*--------------------------+
 | Description              |
 +--------------------------+
 | classical album          |
 +--------------------------*/
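To pin scoring behavior to a specific algorithm release, you can set the version in the options argument. The following query is a sketch against the same Albums table; the scores it returns depend on the scorer version, so no output is shown.

```sql
SELECT a.Description,
       SCORE(a.DescriptionTokens, 'classical album',
             options=>JSON '{"version": 2}') AS Score
FROM Albums a
WHERE SEARCH(a.DescriptionTokens, 'classical album');
```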

SCORE_NGRAMS

SCORE_NGRAMS(tokens, ngrams_query[, language_tag=>value][, algorithm=>value][, array_aggregator=>value])

Description

Calculates a relevance score of a TOKENLIST for a fuzzy search. The higher the score, the stronger the match.

Definitions

  • tokens: A TOKENLIST value that contains a list of n-gram tokens. This value must be a TOKENLIST generated by either TOKENIZE_SUBSTRING or TOKENIZE_NGRAMS, and the tokenization function's value_to_tokenize argument must be a column reference. A TOKENLIST with an expression as value_to_tokenize or a TOKENLIST generated by TOKENLIST_CONCAT isn't supported, such as TOKENIZE_SUBSTRING(REGEXP_REPLACE(col, 'foo', 'bar')) or TOKENLIST_CONCAT([token1, token2]). If using an expression as value_to_tokenize or a TOKENLIST generated by TOKENLIST_CONCAT is necessary, consider creating a generated column and then creating a TOKENLIST from that generated column.
  • ngrams_query: A STRING value that represents a fuzzy search query.
  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for ngrams_query. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • algorithm: A named argument with a STRING value. The value specifies the scoring algorithm for the fuzzy search. The default value for this argument is trigrams, and currently it's the only supported algorithm.

    • trigrams: Generates trigrams (n-grams with size 3) without duplication from the query, then also generates trigrams without duplication from the source column of the tokens. Matches are an intersection between query trigrams and source trigrams. The score is roughly calculated as (match_count / (query_trigrams + source_trigrams - match_count)).

  • array_aggregator: A named argument that determines how scoring is performed on an array. This argument can be used only when the tokenlist is from an array column. This argument uses a STRING value. The default value for this argument is flatten.

    • flatten: Flattens the array column into a single string first, then calculates a score from the flattened string. More non-matching elements in the array make the score lower.

    • max_element: Scores each element separately, then returns the highest score.

Details

  • This function returns 0 when tokens or ngrams_query is NULL.
  • Unlike SEARCH_NGRAMS, this function requires access to the source column of tokens. Therefore, it's often advantageous to include the source column in the SEARCH INDEX's STORING clause, to avoid a join with the base table. See index-only scans.

Return type

FLOAT64

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionSubstrTokens, which tokenizes the Description column using TOKENIZE_SUBSTRING. Finally, AlbumsIndex stores Description, so that the following queries don't have to join with the base table.

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS (TOKENIZE_SUBSTRING(Description, ngram_size_max=>3)) HIDDEN
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionSubstrTokens) STORING (Description);

INSERT INTO Albums (AlbumId, Description) VALUES (1, 'rock album');
INSERT INTO Albums (AlbumId, Description) VALUES (2, 'classical album');

The following query scores Description against clasic albun, which is misspelled.

SELECT a.Description, SCORE_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun') AS Score
FROM Albums a;

/*-----------------+---------------------+
 | Description     | Score               |
 +-----------------+---------------------+
 | rock album      | 0.14285714285714285 |
 | classical album | 0.38095238095238093 |
 +-----------------+---------------------*/

The following query uses SCORE_NGRAMS in the ORDER BY clause to produce the row with the highest score.

SELECT a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun')
ORDER BY SCORE_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun') DESC
LIMIT 1;

/*-----------------+
 | Description     |
 +-----------------+
 | classical album |
 +-----------------*/
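For array columns, the array_aggregator argument controls whether scoring treats the array as one flattened string or scores each element separately. The following is a sketch only, assuming a hypothetical Tracks table with an ARRAY<STRING(MAX)> column that TOKENIZE_SUBSTRING accepts; no scores are shown because they depend on the data.

```sql
CREATE TABLE Tracks (
  TrackId INT64 NOT NULL,
  Titles ARRAY<STRING(MAX)>,
  TitlesSubstrTokens TOKENLIST AS (TOKENIZE_SUBSTRING(Titles, ngram_size_max=>3)) HIDDEN
) PRIMARY KEY(TrackId);

CREATE SEARCH INDEX TracksIndex ON Tracks(TitlesSubstrTokens) STORING (Titles);

-- max_element scores each array element separately and returns the highest score,
-- so one well-matching element isn't diluted by unrelated elements.
SELECT t.Titles,
       SCORE_NGRAMS(t.TitlesSubstrTokens, 'clasic',
                    array_aggregator=>'max_element') AS Score
FROM Tracks t;
```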

SEARCH

SEARCH(tokens, search_query[, dialect=>{"rquery"|"words"|"words_phrase"}][, language_tag=>value][, enhance_query=>{TRUE|FALSE}])

Description

ReturnsTRUE if a full-text search query matches tokens.

Definitions

  • tokens: A TOKENLIST value that contains a list of full-text tokens. It must be a TOKENLIST generated by either TOKENIZE_FULLTEXT, or by concatenating TOKENLISTs from TOKENIZE_FULLTEXT using TOKENLIST_CONCAT.
  • search_query: A STRING value that represents a search query, which is interpreted based on the dialect argument. For more information, see the search query overview.
  • dialect: A named argument with a STRING value. The value determines how search_query is understood and processed. If the value is NULL or this argument isn't specified, rquery is used by default. This function supports the dialect values described in the search query syntax dialects section of this function.

  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for search_query. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • enhance_query: A named argument with a BOOL value. The value determines whether to enhance the search query. For example, if enhance_query is enabled, a search query containing the term classic can expand to include similar terms such as classical. If the enhance_query call times out, the search query proceeds without enhancement. However, if the query includes @{require_enhance_query=true} SELECT ..., a timeout causes the entire query to fail instead. The default timeout for query enhancement is 500 ms, which you can override using a hint like @{enhance_query_timeout_ms=200} SELECT ....

    • If TRUE, the search query is enhanced to improve search quality.

    • If FALSE (default), the search query isn't enhanced.

Details

  • Returns TRUE if tokens is a match for search_query.
  • This function must reference a full-text TOKENLIST column in a table that is also indexed in a search index. To add a full-text TOKENLIST column to a table and to a search index, see the examples for this function.
  • This function returns NULL when tokens or search_query is NULL.
  • This function can only be used in the WHERE clause of a SQL query.

Search query syntax dialects

The search query uses rquery syntax by default. You can specify other supported syntax dialects using the dialect argument.

  • rquery syntax (default)

    The rquery dialect follows these rules:

    • Multiple terms imply AND. For example, "big time" is equivalent to big AND time.
    • The OR operator implies disjunction between two terms, such as big OR time. The predicate SEARCH(tl, 'big time OR fast car') is equivalent to:

      SEARCH(tl, 'big') AND (SEARCH(tl, 'time') OR SEARCH(tl, 'fast')) AND SEARCH(tl, 'car');

      OR only applies to the two adjacent terms, so the search expression big time OR fast car searches for all the documents that have the terms big and car and either time or fast.

      The OR operator is case sensitive.

      The pipe character (|) is a shortcut forOR.

    • Double quotes mean a phrase search. For example, the rquery "fast car" matches "You got a fast car", but doesn't match "driving fast in my car".

    • The AROUND operator matches terms that are within a certain distance of each other, and in the same order. For example, the rquery fast AROUND car matches "driving fast in my car", but doesn't match "driving fast in his small shiny metal Italian car". The default is to match terms separated by, at most, five positions. To adjust the distance, pass an argument to the AROUND operator. Two syntaxes are supported for AROUND:

      • fast AROUND(10) car
      • fast AROUND 10 car
    • TheAROUND operator is case sensitive.

    • Negation of a single term is expressed with a dash (-). For example, -dog matches all documents that don't contain the term dog.

    • Punctuation is generally ignored. For example, "Fast Car!" is equivalent to "Fast Car".

    • Search is case insensitive. For example, "Fast Car" matches "fast car".

    The following list explains the meaning of various rquery strings:

    • Miles Davis: Matches documents that contain both terms "Miles" and "Davis".
    • Miles OR Davis: Matches documents that contain at least one of the terms "Miles" and "Davis".
    • -Davis: Matches all documents that don't contain the term "Davis".
    • "Miles Davis" -"Miles Jaye": Matches documents that contain the two adjacent terms "Miles" and "Davis", but don't contain adjacent "Miles" and "Jaye". For example, this query matches "I saw Miles Davis last night and Jaye earlier today", but doesn't match "I saw Miles Davis and Miles Jaye perform together".
    • Davis|Jaye: The same as Davis OR Jaye.
    • and OR or: Matches documents that have either the term "and" or the term "or" (the OR operator must be uppercase).
  • words syntax

    The words dialect follows these rules:

    • Multiple terms imply AND. For example, "red yellow blue" is equivalent to red AND yellow AND blue.
    • Punctuation is generally ignored. For example, "red*yellow%blue" is equivalent to "red yellow blue".
    • Search is case insensitive.
  • words_phrase syntax

    The words_phrase dialect follows these rules:

    • Multiple terms imply a phrase. For example, the query "colorful rainbow" matches "There is a colorful rainbow", but doesn't match "The rainbow is colorful".
    • Punctuation is generally ignored. For example, "colorful rainbow!" is equivalent to "colorful rainbow".
    • Search is case insensitive.

Return type

BOOL

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionTokens, which tokenizes the Description column using TOKENIZE_FULLTEXT, and then saves those tokens in the DescriptionTokens column. Finally, AlbumsIndex indexes DescriptionTokens. Once DescriptionTokens is indexed, it can be used with the SEARCH function.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionTokens TOKENLIST AS (TOKENIZE_FULLTEXT(Description)) HIDDEN
) PRIMARY KEY(SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionTokens) PARTITION BY SingerId;

INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 1, 'rock album');
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'classical album');

The following query searches the column called Description for a token called classical. If this token is found for singer ID 1, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH(a.DescriptionTokens, 'classical');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 +---------------------------*/

The following query is like the previous one. However, if Description contains the classical or rock token, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH(a.DescriptionTokens, 'classical OR rock');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 1       | rock album      |
 +---------------------------*/

The following query is like the previous ones. However, if Description contains the classic and albums tokens, the matching rows are returned. When enhance_query is enabled, the search also includes similar matches such as classical and album.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH(a.DescriptionTokens, 'classic albums', enhance_query=>TRUE);

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 +---------------------------*/
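The dialect argument changes how the query string is interpreted. As a sketch against the same Albums table, the following query uses the words_phrase dialect, so the two terms must appear as an adjacent phrase in the description; no output is shown because it depends on the data.

```sql
SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1
  AND SEARCH(a.DescriptionTokens, 'classical album', dialect=>'words_phrase');
```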

SEARCH_NGRAMS

SEARCH_NGRAMS(tokens, ngrams_query[, language_tag=>value][, min_ngrams=>value][, min_ngrams_percent=>value])

Description

Checks whether enough n-grams match the tokens in a fuzzy search.

Definitions

  • tokens: A TOKENLIST value that contains a list of n-gram tokens. It must be a TOKENLIST generated by TOKENIZE_SUBSTRING, TOKENIZE_NGRAMS, or by concatenating TOKENLISTs from TOKENIZE_SUBSTRING using TOKENLIST_CONCAT.
  • ngrams_query: A STRING value that represents a fuzzy search query. This function generates n-gram query terms from this value, using the same tokenization method as what was used to produce tokens (for example, if TOKENIZE_SUBSTRING was used, ngrams_query is split into lowercased words before producing n-grams), with the ngram_size_max of tokens as the n-gram size.
  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for ngrams_query. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • min_ngrams: A named argument with an INT64 value. The value specifies the minimum number of n-grams in ngrams_query that have to match in order for SEARCH_NGRAMS to return true. This only counts distinct n-grams and ignores repeating n-grams. The default value for this argument is 2.

  • min_ngrams_percent: A named argument with a FLOAT64 value. The value specifies the minimum percentage of n-grams in ngrams_query that have to match in order for SEARCH_NGRAMS to return true. This only counts distinct n-grams and ignores repeating n-grams.

Details

  • This function must reference a substring or n-grams TOKENLIST column in a table that's also indexed in a search index.
  • This function returns NULL when tokens or ngrams_query is NULL.
  • This function returns false if the length of ngrams_query is smaller than the ngram_size_min of tokens.
  • This function can only be used in the WHERE clause of a SQL query.

Return type

BOOL

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains the columns DescriptionSubstrTokens and DescriptionNgramsTokens, which tokenize the Description column using TOKENIZE_SUBSTRING and TOKENIZE_NGRAMS, respectively. Finally, AlbumsIndex indexes DescriptionSubstrTokens and DescriptionNgramsTokens.

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS (TOKENIZE_SUBSTRING(Description, ngram_size_min=>3, ngram_size_max=>3)) HIDDEN,
  DescriptionNgramsTokens TOKENLIST AS (TOKENIZE_NGRAMS(Description, ngram_size_min=>3, ngram_size_max=>3)) HIDDEN
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionSubstrTokens, DescriptionNgramsTokens);

INSERT INTO Albums (AlbumId, Description) VALUES (1, 'rock album');
INSERT INTO Albums (AlbumId, Description) VALUES (2, 'classical album');
INSERT INTO Albums (AlbumId, Description) VALUES (3, 'last note');

The following query searches the column Description for clasic. The query is misspelled, so querying with SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'clasic') doesn't return a row, but the n-grams search is able to find similar matches.

SEARCH_NGRAMS first transforms the query clasic into n-grams of size 3 (the value of DescriptionSubstrTokens's ngram_size_max), producing ['asi', 'cla', 'las', 'sic']. Then it finds rows that have at least two of these n-grams (the default value for min_ngrams) in the DescriptionSubstrTokens column.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 +---------------------------*/

If you change min_ngrams to 1, the query also returns the last note row, which has one matching n-gram (las). This example illustrates the decreased relevancy of the returned results when this parameter is set low.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic', min_ngrams=>1);

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 3       | last note       |
 +---------------------------*/

The following query searches the column Description for clasic albun. Because DescriptionSubstrTokens is tokenized by TOKENIZE_SUBSTRING, the query is first segmented into ['clasic', 'albun'], then n-gram tokens are generated from those words, producing the following: ['alb', 'asi', 'bun', 'cla', 'las', 'lbu', 'sic'].

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic albun');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 1       | rock album      |
 +---------------------------*/

The following query searches the column Description for l al, but using DescriptionNgramsTokens this time. Because DescriptionNgramsTokens is generated by TOKENIZE_NGRAMS, there is no splitting into words before making n-gram tokens, so the query n-gram tokens are generated as the following: [' al', 'l a'].

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionNgramsTokens, 'l al');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 +---------------------------*/
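min_ngrams_percent offers a relative alternative to min_ngrams. The following query is a sketch against the same Albums table, assuming the percentage is expressed on a 0-100 scale; the rows returned depend on how many of the distinct query n-grams each row matches, so no output is shown.

```sql
SELECT a.AlbumId, a.Description
FROM Albums a
WHERE SEARCH_NGRAMS(a.DescriptionSubstrTokens, 'clasic',
                    min_ngrams_percent=>50.0);
```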

SEARCH_SUBSTRING

SEARCH_SUBSTRING(tokens, substring_query[, language_tag=>value][, relative_search_type=>value])

Description

ReturnsTRUE if a substring query matches tokens.

Definitions

  • tokens: A TOKENLIST value that contains a list of substring tokens. It must be a TOKENLIST generated by either TOKENIZE_SUBSTRING or by concatenating TOKENLISTs from TOKENIZE_SUBSTRING using TOKENLIST_CONCAT.
  • substring_query: A STRING value that represents a substring query. substring_query is first converted to lowercase to match tokens that were converted to lowercase during tokenization.
  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for substring_query. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • relative_search_type: A named argument with a STRING value. The value refines the substring search result. To use a given relative_search_type, the substring TOKENLIST must have been generated with the corresponding type in its TOKENIZE_SUBSTRING relative_search_types argument. This function supports these relative search types:

    • phrase: The substring query terms must appear adjacent to one another and in order in the tokenized value (the value that was tokenized to produce the tokens argument).

    • value_prefix: The substring query terms must be found at the start of the tokenized value.

    • value_suffix: The substring query terms must be found at the end of the tokenized value.

    • word_prefix: The substring query terms must be found at the start of a word in the tokenized value.

    • word_suffix: The substring query terms must be found at the end of a word in the tokenized value.

Details

  • Returns TRUE if tokens is a match for substring_query.
  • This function must reference a substring TOKENLIST column in a table that is also indexed in a search index. To add a substring TOKENLIST column to a table and to a search index, see the examples for this function.
  • This function returns NULL when tokens or substring_query is NULL.
  • This function can only be used in the WHERE clause of a SQL query.

Return type

BOOL

Examples

The following examples reference a table called Albums and a search index called AlbumsIndex.

The Albums table contains a column called DescriptionSubstrTokens, which tokenizes the input added to the Description column using TOKENIZE_SUBSTRING, and then saves those substring tokens in the DescriptionSubstrTokens column. Finally, AlbumsIndex indexes DescriptionSubstrTokens. Once DescriptionSubstrTokens is indexed, it can be used with the SEARCH_SUBSTRING function.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS (TOKENIZE_SUBSTRING(Description, support_relative_search=>TRUE)) HIDDEN
) PRIMARY KEY(SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionSubstrTokens) PARTITION BY SingerId;

INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 1, 'rock album');
INSERT INTO Albums (SingerId, AlbumId, Description) VALUES (1, 2, 'classical album');

The following query searches the column called Description for a token called ssic. If this token is found for singer ID 1, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'ssic');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 +---------------------------*/

The following query searches the column called Description for both the lbu and oc tokens. If these tokens are found for singer ID 1, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'lbu oc');

/*-----------------------+
 | AlbumId | Description |
 +-----------------------+
 | 1       | rock album  |
 +-----------------------*/

The following query searches the column called Description for a token called al at the start of a word. If this token is found for singer ID 1, the matching rows are returned.

SELECT a.AlbumId, a.Description
FROM Albums a
WHERE a.SingerId = 1 AND SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'al', relative_search_type=>'word_prefix');

/*---------------------------+
 | AlbumId | Description     |
 +---------------------------+
 | 2       | classical album |
 | 1       | rock album      |
 +---------------------------*/

The following query searches the column calledDescription for a token calledal at the start of tokens. If this token is found for singer ID1, thematching rows are returned. Because there are no matches, no rows are returned.

SELECTa.AlbumId,a.DescriptionFROMAlbumsaWHEREa.SingerId=1ANDSEARCH_SUBSTRING(a.DescriptionSubstrTokens,'al',relative_search_type=>'value_prefix');/*---------------------------+ | AlbumId | Description     | +---------------------------+ |         |                 | +---------------------------*/
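Conceptually, a default substring match succeeds when every space-separated query term appears as a substring of some word in the indexed value. The following Python sketch models that behavior for the queries above. It is an illustration of the matching semantics only, not Spanner's n-gram index implementation, and matches_substring is a hypothetical helper name.

```python
# Conceptual model of SEARCH_SUBSTRING's default matching semantics:
# every query term must be a substring of some word in the value.
# This is an illustration only, not Spanner's n-gram index implementation.
def matches_substring(value: str, query: str) -> bool:
    words = value.lower().split()
    return all(any(term in word for word in words)
               for term in query.lower().split())

print(matches_substring('classical album', 'ssic'))  # True: 'ssic' is inside 'classical'
print(matches_substring('rock album', 'lbu oc'))     # True: 'lbu' in 'album', 'oc' in 'rock'
print(matches_substring('rock album', 'ssic'))       # False: no word contains 'ssic'
```

The actual index accelerates this check by comparing the query's n-grams against the stored substring tokens rather than scanning the raw text.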

SNIPPET

SNIPPET(
  data_to_search,
  raw_search_query
  [, language_tag => value]
  [, enhance_query => { TRUE | FALSE }]
  [, max_snippet_width => value]
  [, max_snippets => value]
  [, content_type => { "text/plain" | "text/html" }]
)

Description

Gets a list of snippets that match a full-text search query.

Definitions

  • data_to_search: A STRING value that represents the data to search over.
  • raw_search_query: A STRING value that represents the terms of a raw search query.
  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for raw_search_query. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • max_snippets: A named argument with an INT64 value. The value represents the maximum number of output snippets to produce.

  • max_snippet_width: A named argument with an INT64 value. The value represents the width of the output snippet. The width is measured by the estimated number of average proportional-width characters. For example, a wide character like 'M' uses more space than a narrow character like 'i'.

  • enhance_query: A named argument with a BOOL value. The value determines whether to enhance the search query. For example, if enhance_query is enabled, a search query containing the term classic can expand to include similar terms such as classical. If the enhance_query call times out, the search query proceeds without enhancement. However, if the query includes @{require_enhance_query=true} SELECT ..., a timeout causes the entire query to fail instead. The default timeout for query enhancement is 500 ms, which you can override using a hint like @{enhance_query_timeout_ms=200} SELECT ....

    • If TRUE, the search query is enhanced to improve search quality.

    • If FALSE (default), the search query isn't enhanced.

  • content_type: A named argument with a STRING value. Indicates the MIME type of data_to_search. This can be:

    • "text/plain" (default): data_to_search contains plain text.

    • "text/html": data_to_search contains HTML. The HTML tags are removed. HTML-escaped entities are replaced with their unescaped equivalents (for example, &lt; becomes <).

Details

Each snippet contains a matching substring of the data_to_search, and a list of highlights for the location of matching terms.

This function returns NULL when data_to_search or raw_search_query is NULL.

Return type

JSON

The JSON value has this format and definitions:

{
  "snippets": [
    {
      "highlights": [
        {
          "begin": json_number,
          "end": json_number
        }
      ],
      "snippet": json_string,
      "source_begin": json_number,
      "source_end": json_number
    }
  ]
}

  • snippets: A JSON object that contains snippets from data_to_search. These are snippets of text for raw_search_query from the provided data_to_search argument.
  • highlights: A JSON array that contains the position of each search term found in snippet.
  • begin: A JSON number that represents the position of a search term's first character in snippet.
  • end: A JSON number that represents the position one past a search term's final character in snippet.
  • snippet: A JSON string that represents an individual snippet from snippets.
  • source_begin: A JSON number that represents the starting ordinal of the range within the data_to_search argument that snippet was sourced from. This range might not contain exactly the same text as the snippet itself. For example, HTML tags are removed from the snippet when content_type is text/html, and some types of punctuation and whitespace are either removed or normalized.
  • source_end: A JSON number that represents the ordinal one past the end of the source range. Like source_begin, it can include whitespace or punctuation not present in the snippet itself.

Examples

The following query produces a single snippet, Rock albums rock., with two highlighted positions for the matching raw search query term, rock:

SELECT SNIPPET('Rock albums rock.', 'rock') AS Snippet;

/*--------------------------------------------------------------------------------------------------------------------------------------------------+
 | Snippet                                                                                                                                          |
 +--------------------------------------------------------------------------------------------------------------------------------------------------+
 | {"snippets":[{"highlights":[{"begin":"1","end":"5"},{"begin":"13","end":"17"}],"snippet":"Rock albums rock.","source_begin":1,"source_end":18}]} |
 +--------------------------------------------------------------------------------------------------------------------------------------------------*/
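Applications typically consume the returned JSON client-side. The following Python sketch renders a snippet by wrapping each highlighted range in markers; render_snippet is a hypothetical helper, not part of Spanner. The example output above suggests begin is 1-based and end points one past the highlighted range, which is the interpretation this sketch assumes.

```python
import json

# Hypothetical client-side helper: wrap each highlighted range in [ ] markers.
# Based on the example above, "begin" is 1-based and "end" is one past the range.
def render_snippet(snippet_json: str) -> str:
    snippet_obj = json.loads(snippet_json)["snippets"][0]
    text = snippet_obj["snippet"]
    # Apply highlights from right to left so earlier offsets stay valid.
    for h in sorted(snippet_obj["highlights"],
                    key=lambda h: int(h["begin"]), reverse=True):
        b, e = int(h["begin"]) - 1, int(h["end"]) - 1
        text = text[:b] + "[" + text[b:e] + "]" + text[e:]
    return text

result = ('{"snippets":[{"highlights":[{"begin":"1","end":"5"},'
          '{"begin":"13","end":"17"}],"snippet":"Rock albums rock.",'
          '"source_begin":1,"source_end":18}]}')
print(render_snippet(result))  # [Rock] albums [rock].
```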

TOKEN

TOKEN(value_to_tokenize)

Description

Constructs an exact match TOKENLIST value by tokenizing a BYTES or STRING value verbatim to accelerate exact match expressions in SQL.

Definitions

  • value_to_tokenize: A BYTES, ARRAY<BYTES>, STRING, or ARRAY<STRING> value to tokenize for searching with exact match expressions.

Details

  • This function returns NULL when value_to_tokenize is NULL.

Return type

TOKENLIST

Examples

The Albums table contains columns called SingerNameToken and SongTitlesToken, which tokenize the SingerName and SongTitles columns, respectively, using the TOKEN function. Finally, AlbumsIndex indexes SingerNameToken and SongTitlesToken, which makes it possible for Spanner to use the index to accelerate exact-match expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  SingerName STRING(MAX),
  SingerNameToken TOKENLIST AS (TOKEN(SingerName)) HIDDEN,
  SongTitles ARRAY<STRING(MAX)>,
  SongTitlesToken TOKENLIST AS (TOKEN(SongTitles)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(SingerNameToken, SongTitlesToken);

-- For example, the INSERT statement below generates SingerNameToken of
-- 'Catalina Smith', and SongTitlesToken of
-- ['Starting Again', 'The Second Time'].
INSERT INTO Albums (SingerId, AlbumId, SingerName, SongTitles)
VALUES (1, 1, 'Catalina Smith', ['Starting Again', 'The Second Time']);

The following query finds rows in which the column SingerName is equal to Catalina Smith. The query optimizer could choose to accelerate the condition using AlbumsIndex with SingerNameToken. Optionally, the query can provide @{force_index = AlbumsIndex} to force the optimizer to use AlbumsIndex.

SELECT a.AlbumId
FROM Albums@{force_index=AlbumsIndex} a
WHERE a.SingerName = 'Catalina Smith';

/*---------+
 | AlbumId |
 +---------+
 | 1       |
 +---------*/

The following query is like the previous one. However, this time the query searches for SongTitles values that contain the string Starting Again. Array conditions must use the ARRAY_INCLUDES, ARRAY_INCLUDES_ANY, or ARRAY_INCLUDES_ALL functions to be eligible to use a search index for acceleration.

SELECT a.AlbumId
FROM Albums a
WHERE ARRAY_INCLUDES(a.SongTitles, 'Starting Again');

/*---------+
 | AlbumId |
 +---------+
 | 1       |
 +---------*/

TOKENIZE_BOOL

TOKENIZE_BOOL(value_to_tokenize)

Description

Constructs a boolean TOKENLIST value by tokenizing a BOOL value to accelerate boolean match expressions in SQL.

Definitions

  • value_to_tokenize: A BOOL or ARRAY<BOOL> value to tokenize for boolean match.

Details

  • This function returns NULL when value_to_tokenize is NULL.

Return type

TOKENLIST

Examples

The Albums table contains a column called IsAwardedToken, which tokenizes the IsAwarded column using the TOKENIZE_BOOL function. Finally, AlbumsIndex indexes IsAwardedToken, which makes it possible for Spanner to use the index to accelerate boolean-match expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  IsAwarded BOOL,
  IsAwardedToken TOKENLIST AS (TOKENIZE_BOOL(IsAwarded)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(IsAwardedToken);

-- IsAwarded with TRUE generates IsAwardedToken with value 'y'.
INSERT INTO Albums (SingerId, AlbumId, IsAwarded) VALUES (1, 1, TRUE);
-- IsAwarded with FALSE generates IsAwardedToken with value 'n'.
INSERT INTO Albums (SingerId, AlbumId, IsAwarded) VALUES (1, 2, FALSE);
-- NULL IsAwarded generates IsAwardedToken with value NULL.
INSERT INTO Albums (SingerId, AlbumId) VALUES (1, 3);

The following query finds rows in which the column IsAwarded is equal to TRUE. The query optimizer could choose to accelerate the condition using AlbumsIndex with IsAwardedToken. Optionally, the query can provide @{force_index = AlbumsIndex} to force the optimizer to use AlbumsIndex.

SELECT a.AlbumId
FROM Albums@{force_index=AlbumsIndex} a
WHERE IsAwarded = TRUE;

TOKENIZE_FULLTEXT

TOKENIZE_FULLTEXT(
  value_to_tokenize
  [, language_tag => value]
  [, content_type => { "text/plain" | "text/html" }]
  [, token_category => { "small" | "medium" | "large" | "title" }]
  [, remove_diacritics => { TRUE | FALSE }]
)

Description

Constructs a full-text TOKENLIST value by tokenizing text for full-text matching.

Definitions

  • value_to_tokenize: A STRING or ARRAY<STRING> value to tokenize for full-text search.
  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for value_to_tokenize. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • content_type: A named argument with a STRING value. Indicates the MIME type of value_to_tokenize. This can be:

    • "text/plain" (default): value_to_tokenize contains plain text. All tokens are assigned to the small token category.

    • "text/html": value_to_tokenize contains HTML. The HTML tags are removed. HTML-escaped entities are replaced with their unescaped equivalents (for example, &lt; becomes <). A token category is assigned to each token depending on its prominence in the HTML. For example, bolded text or text in a <h1> tag might have higher prominence than normal text and thus might be placed into a different token category.

      We use token categories during scoring to boost the weight of high-prominence tokens.

  • token_category: A named argument with a STRING value. Sets or overrides the token importance signals detected by the tokenizer and used by the scorer. Useful for cases where two or more TOKENLISTs are combined with TOKENLIST_CONCAT and one of the input columns is known to have higher or lower than usual importance.

    Allowed values:

    • "small": The category with the lowest importance.
    • "medium": The category with the second lowest importance.
    • "large": The category with the second highest importance.
    • "title": The category with the highest importance.
  • remove_diacritics: A named argument with a BOOL value. If TRUE, diacritics are removed from value_to_tokenize before indexing. This is useful when you want to ignore diacritics when searching (full-text, substring, or n-gram). When a search query is called on a TOKENLIST value with remove_diacritics set to TRUE, the diacritics are also removed at query time from the search queries.

Details

  • This function returns NULL when value_to_tokenize is NULL.

Return type

TOKENLIST

Examples

In the following example, a TOKENLIST column is created using the TOKENIZE_FULLTEXT function:

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Title STRING(MAX),
  Description STRING(MAX),
  DescriptionTokens TOKENLIST AS (TOKENIZE_FULLTEXT(Description)) HIDDEN,
  TitleTokens TOKENLIST AS (
    TOKENIZE_FULLTEXT(Title, token_category => "title")
  ) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

-- DescriptionTokens is generated from the Description value, using the
-- TOKENIZE_FULLTEXT function. For example, the following INSERT statement
-- generates DescriptionTokens with the tokens ['rock', 'album']. TitleTokens
-- contains ['abbey', 'road'], and these tokens are assigned to the
-- "title" token category.
INSERT INTO Albums (SingerId, AlbumId, Title, Description)
VALUES (1, 1, 'Abbey Road', 'rock album');

-- Capitalization and delimiters are removed during tokenization. For example,
-- the following INSERT statement generates DescriptionTokens with the tokens
-- ['classical', 'albums'].
INSERT INTO Albums (SingerId, AlbumId, Description)
VALUES (1, 2, 'Classical, Albums.');

To query a full-text TOKENLIST column, see the SEARCH function.

TOKENIZE_JSON

TOKENIZE_JSON(value_to_tokenize)

Description

Constructs a JSON TOKENLIST value by tokenizing a JSON value to accelerate JSON predicate matching in SQL.

Definitions

  • value_to_tokenize: A JSON value to tokenize for JSON predicate matching.

Details

  • This function returns NULL when value_to_tokenize is NULL.

Return type

TOKENLIST

Examples

The Albums table contains a column called MetadataTokens, which tokenizes the Metadata column using the TOKENIZE_JSON function. AlbumsIndex indexes MetadataTokens, which makes it possible for Spanner to use the index to accelerate JSON predicate expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Metadata JSON,
  MetadataTokens TOKENLIST AS (TOKENIZE_JSON(Metadata)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(MetadataTokens);

-- Albums can be stored with varying metadata.
INSERT INTO Albums (SingerId, AlbumId, Metadata)
VALUES
  (1, 1, JSON'{"AvailableFormats": ["vinyl", "cd"]}'),
  (1, 2, JSON'{"ReissueDate": "1999-07-13", "MultiDiscCount": 2}'),
  (1, 3, JSON'{"RegionalReleases": [{"Region": "Japan", "ReleaseDate": "2025-01-05"}]}');

The following queries perform containment and existence checks on the Metadata column. The query optimizer might choose to accelerate these conditions using AlbumsIndex and MetadataTokens.

-- Query for albums available on vinyl.
SELECT a.AlbumId
FROM Albums a
WHERE JSON_CONTAINS(a.Metadata, JSON'{"AvailableFormats": ["vinyl"]}');

/*---------+
 | AlbumId |
 +---------+
 | 1       |
 +---------*/

-- Query for albums with a regional release in Japan.
SELECT a.AlbumId
FROM Albums a
WHERE JSON_CONTAINS(a.Metadata, JSON'{"RegionalReleases": [{"Region": "Japan"}]}');

/*---------+
 | AlbumId |
 +---------+
 | 3       |
 +---------*/

-- Query for reissued albums (those with a reissue date).
SELECT a.AlbumId
FROM Albums a
WHERE a.Metadata.ReissueDate IS NOT NULL;

/*---------+
 | AlbumId |
 +---------+
 | 2       |
 +---------*/

TOKENIZE_NGRAMS

TOKENIZE_NGRAMS(
  value_to_tokenize
  [, ngram_size_min => value]
  [, ngram_size_max => value]
  [, remove_diacritics => { TRUE | FALSE }]
)

Description

Constructs an n-gram TOKENLIST value by tokenizing text for n-gram matching.

Definitions

  • value_to_tokenize: A STRING or ARRAY<STRING> value to tokenize for n-gram search.
  • ngram_size_min: A named argument with an INT64 value. The value is the minimum length of the n-gram tokens to generate. The default value for this argument is 1. This argument must be less than or equal to ngram_size_max.

    Increasing ngram_size_min can reduce write overhead and index size by generating fewer tokens. However, since n-gram tokens shorter than ngram_size_min are not generated, n-gram search queries that require those tokens are not able to find any matches.

    We recommend tuning ngram_size_min only when the developer controls the queries and can ensure that the minimum query length is at least ngram_size_min.

  • ngram_size_max: A named argument with an INT64 value. The value is the maximum size of each n-gram token to generate. Setting a higher ngram_size_max can lead to better retrieval performance by reducing the number of irrelevant records that share common n-grams. However, a larger difference between ngram_size_min and ngram_size_max can substantially increase index sizes and write costs.

    When using the resulting TOKENLIST with SEARCH_NGRAMS, the ngram_size_max parameter also determines the length of n-grams generated for the ngrams_query parameter of SEARCH_NGRAMS. Opting for a shorter n-gram length in your query yields a higher number of matches, but can also introduce irrelevant results.

    The default value for this argument is 4. However, when using the resulting TOKENLIST with the SEARCH_NGRAMS function, an ngram_size_max of 3 can be a good starting point for matching common typographical errors. Further fine-tuning can help with specific fuzzy search queries and data patterns.

  • remove_diacritics: A named argument with a BOOL value. If TRUE, diacritics are removed from value_to_tokenize before indexing. This is useful when you want to ignore diacritics when searching (full-text, substring, or n-gram). When a search query is called on a TOKENLIST value with remove_diacritics set to TRUE, the diacritics are also removed at query time from the search queries.

Details

  • This function returns NULL when value_to_tokenize is NULL.

Return type

TOKENLIST

Examples

In the following example, a TOKENLIST column is created using the TOKENIZE_NGRAMS function. The INSERT generates a TOKENLIST which contains two sets of tokens. First, the whole string is broken up into n-grams with a length in the range [ngram_size_min, ngram_size_max - 1]. Capitalization and whitespace are preserved in the n-grams. These n-grams are placed in the first position in the tokenlist.

[" ", " M", " Me", "vy ", "y ", "y M", "H", "He", "Hea", "Heav", ...], ...

Second, any n-grams with length equal to ngram_size_max are stored in sequence, with the first of these in the same position as the smaller n-grams. (In this example, the Heav token is in the first position.)

..., "eavy", "avy ", "vy M", "y Me", " Met", "Meta", "etal"

CREATE TABLE Albums (
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionNgramTokens TOKENLIST AS (TOKENIZE_NGRAMS(Description)) HIDDEN
) PRIMARY KEY (AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(DescriptionNgramTokens);

INSERT INTO Albums (AlbumId, Description) VALUES (1, 'Heavy Metal');

To query an n-gram TOKENLIST column, see the SEARCH_NGRAMS function.
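As a rough illustration of the two token sets described in this example, the following Python sketch generates the short n-grams (lengths below ngram_size_max) and the full-length n-grams for 'Heavy Metal'. This is an approximation for intuition only; Spanner's actual tokenizer controls token ordering, positions, and deduplication internally.

```python
# Illustrative sketch of the two n-gram sets described above.
# Not Spanner's internal implementation; positions and ordering are simplified.
def ngram_token_sets(text: str, ngram_size_min: int = 1, ngram_size_max: int = 4):
    # Short n-grams: every substring with length in [ngram_size_min, ngram_size_max - 1].
    short = sorted({text[i:i + n]
                    for n in range(ngram_size_min, ngram_size_max)
                    for i in range(len(text) - n + 1)})
    # Full-length n-grams: substrings of exactly ngram_size_max characters, in order.
    full = [text[i:i + ngram_size_max]
            for i in range(len(text) - ngram_size_max + 1)]
    return short, full

short, full = ngram_token_sets('Heavy Metal')
print(full)  # ['Heav', 'eavy', 'avy ', 'vy M', 'y Me', ' Met', 'Meta', 'etal']
```

Note that capitalization and whitespace survive into the tokens, matching the behavior described above.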

TOKENIZE_NUMBER

TOKENIZE_NUMBER(
  value_to_tokenize
  [, comparison_type => { "all" | "equality" }]
  [, algorithm => { "logtree" | "prefixtree" | "floatingpoint" }]
  [, min => value]
  [, max => value]
  [, granularity => value]
  [, tree_base => value]
  [, precision => value]
)

Description

Constructs a numeric TOKENLIST value by tokenizing numeric values to accelerate numeric comparison expressions in SQL.

Definitions

  • value_to_tokenize: An INT64, FLOAT32, or FLOAT64 value, or an ARRAY of these types, to tokenize for numeric comparison expressions.
  • comparison_type: A named argument with a STRING value. The value represents the type of comparison to use for numeric expressions. Set it to equality to save space if equality is the only comparison you need. The default is all.
  • algorithm: A named argument with a STRING value. The value indicates the indexing algorithm to use. Supported algorithms are limited, depending on the type of value being indexed. The default is logtree. Don't rely on the default for FLOAT32 or FLOAT64 values: specify the algorithm explicitly, and also set min and max when using the logtree or prefixtree algorithms.

    • logtree: Use for indexing uniformly distributed data. min, max, and granularity must be specified if value_to_tokenize is FLOAT32 or FLOAT64.
    • prefixtree: Use when indexing exponentially distributed data and when the query predicate is of the form "@param > number" or "@param >= number" (ranges without an upper bound). Compared to logtree, this algorithm generates fewer index tokens for small numbers. For queries where the WHERE clause contains the predicate previously described, prefixtree generates fewer query tokens, which can improve performance. min, max, and granularity must be specified if value_to_tokenize is FLOAT32 or FLOAT64.
    • floatingpoint: Use for indexing FLOAT32 or FLOAT64 values where the indexed data and queries often contain fractions. When tokenizing FLOAT32 or FLOAT64 using logtree or prefixtree, TOKENIZE_NUMBER might lose precision as the count of granularity buckets in the min to max range approaches the maximum resolution of floating point numbers. This can make queries less efficient, but it doesn't cause incorrect behavior. This loss of precision doesn't happen with the floatingpoint algorithm if the precision argument is set high enough. However, the floatingpoint algorithm generates more index tokens when precision is set to a larger value.
  • min: A named argument with the same type as value_to_tokenize. Values less than min are indexed in the same index bucket. This doesn't cause incorrect results, but might cause significant over-retrieval for queries with a range that includes values less than min. Don't use min when comparison_type is equality.

  • max: A named argument with the same type as value_to_tokenize. Values greater than max are indexed in the same index bucket. This doesn't cause incorrect results, but might cause significant over-retrieval for queries with a range that includes values greater than max. Don't use max when comparison_type is equality.

  • granularity: A named argument with the same type as value_to_tokenize. The value represents the width of each indexing bucket. Values in the same bucket are indexed together, so larger buckets are more storage efficient, but might cause over-retrieval and, with it, high latency during query execution. granularity is only allowed when algorithm is logtree or prefixtree.

  • tree_base: A named argument with an INT64 value. The value is the numerical base of a tree for tree-based algorithms.

    For example, the value of 2 means that each tree token represents some power-of-two number of buckets. In the case of a value indexed in the 1024th bucket, there is a token for [1024, 1024], then a token for [1024, 1025], then a token for [1024, 1027], and so on.

    Increasing tree_base reduces the required number of index tokens and increases the required number of query tokens.

    The default value is 2. tree_base is only allowed when algorithm is logtree or prefixtree.

  • precision: A named argument with an INT64 value. Reducing the precision reduces the number of index tokens, but increases over-retrieval when queries specify ranges with a high number of significant digits. The default value is 15. precision is only allowed when algorithm is floatingpoint.

Details

  • This function returns NULL when value_to_tokenize is NULL.
  • The tree_base parameter controls the width of each tree bucket in the logtree and prefixtree algorithms. Both algorithms generate tokens representing nodes in a base-ary tree, where the width of a node is base^distance_from_leaf. The algorithms differ in that prefixtree omits some of the tree nodes in favor of greater-than tokens that accelerate greater-than queries. When a larger base is selected, fewer index tokens are generated. However, larger base values increase the maximum number of query tokens required.
  • Numbers that fall outside of the [min, max] range are all indexed into two buckets: one for all numbers less than min, and the other for all numbers greater than max. This might cause significant over-retrieval (retrieval of too many candidate results) when the range requested by the query also includes numbers outside of the range. For this reason, set min and max to the narrowest possible values that encompass all input numbers. Like all tokenization configurations, changing the min and max values requires a rebuild of the numeric index, so leave room to grow if the final domain of a column isn't known. The problem of over-retrieval isn't a correctness problem, as all potential matches are checked against non-bucketized numbers at the end of the search process; it's only a potential efficiency issue.
  • The granularity argument controls the rate of downsampling that's applied to numbers before they are indexed in the tree-based algorithms. Before each number is tokenized, it's sorted into buckets with a width equal to granularity. All the numbers in the same granularity bucket get the same tokens. This means that over-retrieval might occur if the granularity value is set to anything other than 1 for integral numbers. Over-retrieval is always possible for FLOAT64 numbers. It also means that if numeric values change by a small amount, most of their tokens don't need to be reindexed. Using a granularity higher than 1 also reduces the number of tokens that the algorithm needs to generate, but the effect is less significant than the effect of increasing the base. Therefore, we recommend setting granularity to 1.
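To build intuition for the tree tokens described above, the following Python sketch computes the covering ranges that a base-2 log tree would associate with a bucket. This is a conceptual model of the bucketing scheme under the assumptions stated in the comments, not Spanner's actual token encoding.

```python
# Conceptual model of the base-2 log-tree bucketing described above.
# Each level's token covers a range of buckets whose width is base^level.
# This is an illustration only, not Spanner's actual token encoding.
def logtree_covering_ranges(bucket: int, tree_base: int = 2, levels: int = 4):
    ranges = []
    width = 1
    for _ in range(levels):
        start = (bucket // width) * width  # align the range to the level's width
        ranges.append((start, start + width - 1))
        width *= tree_base
    return ranges

# A value in the 1024th bucket gets tokens covering [1024,1024], [1024,1025],
# [1024,1027], and so on, matching the tree_base example above.
print(logtree_covering_ranges(1024))
# [(1024, 1024), (1024, 1025), (1024, 1027), (1024, 1031)]
```

A range query then only needs tokens for the few tree nodes that tile the requested interval, instead of one token per bucket.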

Return type

TOKENLIST

Examples

The Albums table contains a column called RatingTokens, which tokenizes the Rating column using the TOKENIZE_NUMBER function. Finally, AlbumsIndex indexes RatingTokens, which makes it possible for Spanner to use the index to accelerate numeric comparison expressions in SQL.

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Rating INT64,
  RatingTokens TOKENLIST AS (TOKENIZE_NUMBER(Rating)) HIDDEN,
  TrackRating ARRAY<INT64>,
  TrackRatingTokens TOKENLIST AS (TOKENIZE_NUMBER(TrackRating)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(RatingTokens, TrackRatingTokens);

-- RatingTokens and TrackRatingTokens are generated from Rating and TrackRating
-- values, respectively, using the TOKENIZE_NUMBER function.
INSERT INTO Albums (SingerId, AlbumId, Rating, TrackRating) VALUES (1, 1, 2, [2, 3]);
INSERT INTO Albums (SingerId, AlbumId, Rating, TrackRating) VALUES (1, 2, 5, [3, 5]);

The following query finds rows in which the column Rating is equal to 5. The query optimizer might choose to accelerate the condition using AlbumsIndex with RatingTokens. Optionally, the query can provide @{force_index = AlbumsIndex} to force the optimizer to use AlbumsIndex.

SELECT a.AlbumId
FROM Albums@{force_index=AlbumsIndex} a
WHERE a.Rating = 5;

/*---------+
 | AlbumId |
 +---------+
 | 2       |
 +---------*/

The following query is like the previous one. However, this time the condition is on the array column TrackRating. Array conditions must use the ARRAY_INCLUDES, ARRAY_INCLUDES_ANY, or ARRAY_INCLUDES_ALL functions to be eligible to use a search index for acceleration.

SELECT a.AlbumId
FROM Albums a
WHERE ARRAY_INCLUDES_ALL(a.TrackRating, [2, 3]);

/*---------+
 | AlbumId |
 +---------+
 | 1       |
 +---------*/

SELECT a.AlbumId
FROM Albums a
WHERE ARRAY_INCLUDES_ANY(a.TrackRating, [3, 4, 5]);

/*---------+
 | AlbumId |
 +---------+
 | 1       |
 | 2       |
 +---------*/

The following query is like the previous ones. However, this time the condition is a range. This query can also be accelerated, because the default comparison_type is all, which covers both equality and range comparisons.

SELECT a.AlbumId
FROM Albums a
WHERE a.Rating >= 2;

/*---------+
 | AlbumId |
 +---------+
 | 1       |
 | 2       |
 +---------*/

TOKENIZE_SUBSTRING

TOKENIZE_SUBSTRING(
  value_to_tokenize
  [, language_tag => value]
  [, ngram_size_min => value]
  [, ngram_size_max => value]
  [, relative_search_types => value]
  [, content_type => { "text/plain" | "text/html" }]
  [, short_tokens_only_for_anchors => { TRUE | FALSE }]
  [, remove_diacritics => { TRUE | FALSE }]
)

Description

Constructs a substring TOKENLIST value by tokenizing text for substring matching.

Definitions

  • value_to_tokenize: A STRING or ARRAY<STRING> value to tokenize for substring search. value_to_tokenize is split into lower-cased words first, then n-gram tokens are generated from each word.
  • language_tag: A named argument with a STRING value. The value contains an IETF BCP 47 language tag. You can use this tag to specify the language for value_to_tokenize. If the value for this argument is NULL, this function doesn't use a specific language. If this argument isn't specified, NULL is used by default.

  • relative_search_types: A named argument with an ARRAY<STRING> value. The value determines which TOKENIZE_SUBSTRING relative search types are supported. See the SEARCH_SUBSTRING function for a list of the different relative search types.

    In addition to the relative search types from the SEARCH_SUBSTRING function, the TOKENIZE_SUBSTRING function accepts a special flag, all, which means that all relative search types are supported.

    If this argument isn't used, then no relative search tokens are generated for the resulting TOKENLIST value.

    Setting this value causes extra anchor tokens to be generated to enable relative searches. A given relative search type can only be used in a query if that type, or all, is present in the relative_search_types argument. By default, relative_search_types is empty.

  • content_type: A named argument with a STRING value. Indicates the MIME type of value_to_tokenize. This can be:

    • "text/plain" (default): value_to_tokenize contains plain text. All tokens are assigned to the small token category.

    • "text/html": value_to_tokenize contains HTML. The HTML tags are removed. HTML-escaped entities are replaced with their unescaped equivalents (for example, &lt; becomes <). A token category is assigned to each token depending on its prominence in the HTML. For example, bolded text or text in a <h1> tag might have higher prominence than normal text and thus might be placed into a different token category.

      We use token categories during scoring to boost the weight of high-prominence tokens.

  • ngram_size_min: A named argument with an INT64 value. The value is the minimum length of the n-gram tokens to generate. The default value for this argument is 1. This argument must be less than or equal to ngram_size_max.

    While partial-word n-grams shorter than ngram_size_min are not generated, tokens for whole words that are shorter than ngram_size_min are. This lets SEARCH_SUBSTRING match values containing such words, but only if the query text contains these tokens as words.

    Increasing ngram_size_min can reduce write overhead and index size by generating fewer tokens. However, since n-gram tokens shorter than ngram_size_min are not generated except for whole words, substring search queries that require those tokens are not able to find any matches.

    We recommend tuning ngram_size_min only when the developer controls the queries and can ensure that the minimum query length is at least ngram_size_min.

  • ngram_size_max: A named argument with an INT64 value. The value is the maximum size of each n-gram token to generate. Setting a higher ngram_size_max can lead to better retrieval performance by reducing the number of irrelevant records that share common n-grams. However, a larger difference between ngram_size_min and ngram_size_max can substantially increase index sizes and write costs.

    When using the resulting TOKENLIST with SEARCH_NGRAMS, the ngram_size_max parameter also determines the length of n-grams generated for the ngrams_query parameter of SEARCH_NGRAMS. Opting for a shorter n-gram length in your query yields a higher number of matches, but can also introduce irrelevant results.

    The default value for this argument is 4. However, when using the resulting TOKENLIST with the SEARCH_NGRAMS function, an ngram_size_max of 3 can be a good starting point for matching common typographical errors. Further fine-tuning can help with specific fuzzy search queries and data patterns.

  • short_tokens_only_for_anchors: A named argument with a BOOL value. If TRUE, the TOKENLIST emitted by this function doesn't contain short n-grams (those with sizes less than ngram_size_max), except when those n-grams are part of one of the anchors used to support the prefix and suffix relative_search_types settings. The default value is FALSE.

    Setting this to TRUE can reduce the number of n-grams generated. However, it causes SEARCH_SUBSTRING to return FALSE for short query terms when relative_search_types isn't one of the prefix or suffix modes. Therefore, we recommend setting this only when relative_search_types is always set to a prefix or suffix mode.

  • remove_diacritics: A named argument with a BOOL value. If TRUE, diacritics are removed from value_to_tokenize before indexing. This is useful when you want to ignore diacritics when searching (full-text, substring, or n-gram). When a search query is called on a TOKENLIST value with remove_diacritics set to TRUE, the diacritics are also removed at query time from the search queries.

Details

  • This function returns NULL when value_to_tokenize is NULL.

Return type

TOKENLIST

Example

In the following example, aTOKENLIST column is created using theTOKENIZE_SUBSTRING function. TheINSERT generates aTOKENLIST whichcontains two sets of tokens. First, each word is broken up into lower-casedn-grams with a length in the range[ngram_size_min, ngram_size_max-1], and anywhole words with a length shorter than thatngram_size_max. All of thesetokens are placed in the first position in the tokenlist.

[a, al, av, avy, e, ea, eav, et, eta, h, he, hea, ...], ...

Second, any n-grams with length equal to ngram_size_max are stored in subsequent positions. These tokens are used when searching for words longer than the maximum n-gram size.

..., heav, eavy, <gap(1)>, meta, etal

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  Description STRING(MAX),
  DescriptionSubstrTokens TOKENLIST AS (
    TOKENIZE_SUBSTRING(Description, ngram_size_min=>1, ngram_size_max=>4)) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

INSERT INTO Albums (SingerId, AlbumId, Description)
VALUES (1, 1, 'Heavy Metal');

To query a substring TOKENLIST column, see the SEARCH_SUBSTRING or SEARCH_NGRAMS function.
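As a sketch of what such a query might look like against the Albums table above, the following assumes a search index covering DescriptionSubstrTokens has been created (the example above doesn't create one, and the index name here is hypothetical):

```sql
-- Assumed, not shown above:
-- CREATE SEARCH INDEX AlbumsDescIndex ON Albums(DescriptionSubstrTokens);

-- Matches rows whose Description contains the substring 'eta',
-- such as 'Heavy Metal'.
SELECT a.SingerId, a.AlbumId
FROM Albums a
WHERE SEARCH_SUBSTRING(a.DescriptionSubstrTokens, 'eta');
```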

TOKENLIST_CONCAT

TOKENLIST_CONCAT(value1[,...])

Description

Constructs a TOKENLIST value by concatenating one or more TOKENLIST values.

Details

  • This function only accepts TOKENLIST values generated by TOKENIZE_FULLTEXT or TOKENIZE_SUBSTRING.
  • All TOKENLIST arguments must be generated by the same tokenization function.
  • This function returns NULL when the array of TOKENLIST values is NULL.
  • This function treats a NULL element in the array as an empty TOKENLIST.

Return type

TOKENLIST

Examples

In the following example, full-text TOKENLIST columns are created using the TOKENIZE_FULLTEXT function, then another full-text TOKENLIST column is created using the TOKENLIST_CONCAT function:

CREATE TABLE Albums (
  SingerId INT64 NOT NULL,
  AlbumId INT64 NOT NULL,
  SingerName STRING(MAX),
  SingerNameTokens TOKENLIST AS (TOKENIZE_FULLTEXT(SingerName)) HIDDEN,
  AlbumName STRING(MAX),
  AlbumNameTokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumName)) HIDDEN,
  SingerOrAlbumNameTokens TOKENLIST AS (
    TOKENLIST_CONCAT([SingerNameTokens, AlbumNameTokens])) HIDDEN
) PRIMARY KEY (SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsIndex
ON Albums (SingerNameTokens, AlbumNameTokens, SingerOrAlbumNameTokens);

-- The INSERT statements below generate SingerOrAlbumNameTokens by concatenating
-- all the tokens in SingerNameTokens and AlbumNameTokens.
INSERT INTO Albums (SingerId, AlbumId, SingerName, AlbumName)
VALUES (1, 1, 'Alice Trentor', 'Go Go Go');
INSERT INTO Albums (SingerId, AlbumId, SingerName, AlbumName)
VALUES (2, 1, 'Catalina Smith', 'Alice Wonderland');

The following query searches for the token alice in SingerOrAlbumNameTokens. The rows that match alice in either SingerNameTokens or AlbumNameTokens are returned.

SELECT a.SingerId, a.AlbumId
FROM Albums a
WHERE SEARCH(a.SingerOrAlbumNameTokens, 'alice');

/*--------------------+
 | SingerId | AlbumId |
 +--------------------+
 | 2        | 1       |
 | 1        | 1       |
 +--------------------*/

The following query is like the previous one. However, this time TOKENLIST_CONCAT is called directly inside the SEARCH function.

SELECT a.SingerId, a.AlbumId
FROM Albums a
WHERE SEARCH(TOKENLIST_CONCAT([a.SingerNameTokens, a.AlbumNameTokens]), 'alice');

/*--------------------+
 | SingerId | AlbumId |
 +--------------------+
 | 2        | 1       |
 | 1        | 1       |
 +--------------------*/
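The NULL-handling rules listed under Details can be illustrated with a sketch (the inserted row here is hypothetical, not part of the example data):

```sql
-- If AlbumName is NULL, AlbumNameTokens is NULL, and TOKENLIST_CONCAT
-- treats that element as an empty TOKENLIST. The row should therefore
-- still match on SingerName alone.
INSERT INTO Albums (SingerId, AlbumId, SingerName, AlbumName)
VALUES (3, 1, 'Alice Jameson', NULL);

SELECT a.SingerId, a.AlbumId
FROM Albums a
WHERE SEARCH(TOKENLIST_CONCAT([a.SingerNameTokens, a.AlbumNameTokens]), 'alice');
```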

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.