The ML.NGRAMS function
This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.NGRAMS(array_input, range [, separator])
Arguments
ML.NGRAMS takes the following arguments:
- array_input: an ARRAY<STRING> value that represents the tokens to be merged.
- range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the INT64 elements provide the range of n-gram sizes to return. Provide the numerical values in order, lower to higher. If you specify a single INT64 value of x, the range of n-gram sizes to return is [x, x].
- separator: a STRING value that specifies the separator to connect two adjacent tokens in the output. The default value is whitespace.
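The argument semantics above can be sketched outside of BigQuery. The following Python function is a hypothetical emulation, not the BigQuery implementation: the name `ngrams`, the single-space default separator, the validation of the range, and the output order (by start position, then by n-gram size, as inferred from the documented example on this page) are all assumptions made for illustration.

```python
from typing import List, Sequence, Union


def ngrams(tokens: Sequence[str],
           size_range: Union[int, Sequence[int]],
           separator: str = " ") -> List[str]:
    """Hypothetical pure-Python emulation of the n-gram generation described above."""
    # A single integer x is treated as the range [x, x].
    if isinstance(size_range, int):
        low, high = size_range, size_range
    else:
        low, high = size_range

    # Assumption: sizes must be positive and ordered lower to higher.
    if low < 1 or low > high:
        raise ValueError("range must be ordered lower to higher and start at 1 or more")

    result: List[str] = []
    # Walk each start position, then each size. This ordering is an
    # assumption inferred from the documented example output.
    for start in range(len(tokens)):
        for size in range(low, high + 1):
            if start + size <= len(tokens):
                # Join adjacent tokens with the separator.
                result.append(separator.join(tokens[start:start + size]))
    return result


if __name__ == "__main__":
    print(ngrams(["a", "b", "c"], [2, 3], "#"))
    print(ngrams(["a", "b", "c"], 2))
```

In this sketch, passing `2` instead of `[2, 2]` gives the same result, which mirrors the single-INT64 shorthand described for the range argument.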
Output
ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.
Example
The following example outputs all possible 2-token and 3-token combinations for a set of three input strings:
SELECT ML.NGRAMS(['a','b','c'], [2,3], '#') AS output;
The output looks similar to the following:
+-----------------------+
|        output         |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
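As a quick sanity check on the size of this result, a sequence of k tokens has k - n + 1 contiguous n-grams of size n (when n is at most k). The short Python sketch below applies that count to the three-token example; the variable names are illustrative only.

```python
tokens = ["a", "b", "c"]
sizes = range(2, 4)  # n-gram sizes 2 and 3, matching the range [2, 3]

# Number of contiguous n-grams of each size: k - n + 1, never below zero.
counts = {n: max(len(tokens) - n + 1, 0) for n in sizes}
total = sum(counts.values())

print(counts)
print(total)
```

The total of three agrees with the three elements in the output array above: two 2-grams and one 3-gram.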
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
Last updated 2025-12-15 UTC.