The ML.NGRAMS function

This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.

You can use this function with models that support manual feature preprocessing. For more information, see the following documents:

Syntax

ML.NGRAMS(array_input, range [, separator])

ML.NGRAMS takes the following arguments:

array_input: an ARRAY<STRING> value that represents the tokens to be merged.
range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the INT64 elements provide the range of n-gram sizes to return. Provide the numerical values in order, lower to higher. If you specify a single INT64 value of x, the range of n-gram sizes to return is [x, x].
separator: a STRING value that specifies the separator to connect two adjacent tokens in the output. The default value is whitespace.
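
The two forms of the range argument and the optional separator can be sketched with literal arrays. This is a minimal illustration based only on the argument descriptions above; the input values and column aliases are made up for the example.

```sql
-- Sketch only: literal input arrays and aliases are chosen for illustration.
SELECT
  -- Single INT64 range: treated as the range [2, 2]; the separator is
  -- omitted, so the default (whitespace) is used.
  ML.NGRAMS(['a', 'b', 'c'], 2) AS bigrams_default_separator,
  -- ARRAY range [1, 2]: n-grams of size 1 and size 2, joined with '-'.
  ML.NGRAMS(['a', 'b', 'c'], [1, 2], '-') AS unigrams_and_bigrams;
```

Going by those descriptions, the first column should hold only 2-grams, and the second should hold both the single tokens and the 2-grams.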

ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.

The following example outputs all possible 2-token and 3-token combinations for a set of three input strings:

SELECT ML.NGRAMS(['a', 'b', 'c'], [2, 3], '#') AS output;

The output looks similar to the following:

+-----------------------+
|        output         |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
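
In practice the tokens often come from a string column rather than a literal array. The following sketch assumes the text is tokenized with the GoogleSQL SPLIT function before it is passed to ML.NGRAMS; the sample sentence, the docs alias, and the '_' separator are illustrative choices, not part of the function's definition.

```sql
-- Sketch only: the sentence, the CTE name, and the aliases are illustrative.
WITH docs AS (
  SELECT 'the quick brown fox' AS sentence
)
SELECT
  sentence,
  -- SPLIT turns the string into ARRAY<STRING> tokens; ML.NGRAMS then
  -- connects adjacent tokens into 2-grams joined by '_'.
  ML.NGRAMS(SPLIT(sentence, ' '), 2, '_') AS bigrams
FROM docs;
```

Splitting on a single space is the simplest possible tokenizer; real input may need lowercasing or punctuation handling before this step.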


Last updated 2026-02-19 UTC.