The ML.BUCKETIZE function

This document describes the ML.BUCKETIZE function, which lets you split a numerical expression into buckets.

You can use this function with models that support manual feature preprocessing.

Syntax

ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries] [, output_format])

Arguments

ML.BUCKETIZE takes the following arguments:

  • numerical_expression: the numerical expression to bucketize.
  • array_split_points: an array of numerical values that provide the points at which to split the numerical_expression value. The numerical values in the array must be finite, so not -inf, inf, or NaN. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if the array_split_points value is [1, 2, 3, 4], then there are five potential buckets that the numerical_expression value can be bucketized into.
  • exclude_boundaries: a BOOL value that determines whether the upper and lower boundaries from array_split_points are used. If TRUE, then the boundary values aren't used to create buckets. For example, if the array_split_points value is [1, 2, 3, 4] and exclude_boundaries is TRUE, then there are three potential buckets that the numerical_expression value can be bucketized into. The default value is FALSE.
  • output_format: a STRING value that specifies the output format of the bucket. Valid output formats are as follows:
    • bucket_names: returns a STRING value in the format bin_<bucket_index>. For example, bin_3. The bucket_index value starts at 1. This is the default bucket format.
    • bucket_ranges: returns a STRING value in the format [lower_bound, upper_bound) in interval notation. For example, (-inf, 2.5), [2.5, 4.6), [4.6, +inf).
    • bucket_ranges_json: returns a JSON-formatted STRING value in the format {"start": "lower_bound", "end": "upper_bound"}. For example, {"start": "-Infinity", "end": "2.5"}, {"start": "2.5", "end": "4.6"}, {"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as the bucket_ranges option.
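
To make the argument descriptions more concrete, the following query is a minimal sketch that applies the split points [1, 2, 3, 4] to several input values at once. The inline UNNEST array and the column alias x are illustrative assumptions and are not part of the function itself.

```sql
-- Illustrative sketch: the input values and the alias x are hypothetical.
SELECT
  x,
  -- Default output format (bucket_names), boundaries included.
  ML.BUCKETIZE(x, [1, 2, 3, 4]) AS bucket,
  -- Same split points, reported as an interval instead of a bucket name.
  ML.BUCKETIZE(x, [1, 2, 3, 4], FALSE, "bucket_ranges") AS bucket_range,
  -- Boundary values excluded, which leaves three potential buckets.
  ML.BUCKETIZE(x, [1, 2, 3, 4], TRUE) AS bucket_without_boundaries
FROM
  UNNEST([0.5, 1.0, 2.5, 4.0, 10.0]) AS x
ORDER BY
  x;
```

Given the lower-inclusive, upper-exclusive interval convention described for bucket_ranges, a value equal to a split point, such as 1.0, would be expected to fall into the bucket that starts at that split point rather than the one that ends there.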

Output

ML.BUCKETIZE returns a STRING value that contains the name of the bucket, in the format specified by the output_format argument.

Example

The following example bucketizes a numerical expression both with and without boundaries:

SELECT
  ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket,
  ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries,
  ML.BUCKETIZE(2.5, [1, 2, 3], FALSE, "bucket_ranges") AS bucket_ranges,
  ML.BUCKETIZE(2.5, [1, 2, 3], FALSE, "bucket_ranges_json") AS bucket_ranges_json;

The output looks similar to the following:

+--------+---------------------------+---------------+----------------------------+
| bucket | bucket_without_boundaries | bucket_ranges | bucket_ranges_json         |
|--------|---------------------------|---------------|----------------------------|
| bin_3  | bin_2                     | [2, 3)        | {"start": "2", "end": "3"} |
+--------+---------------------------+---------------+----------------------------+
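
Because ML.BUCKETIZE can be used with models that support manual feature preprocessing, it may help to see it inside the TRANSFORM clause of a CREATE MODEL statement. The following is a sketch only: the dataset, model, and table names, the age and label columns, the split points, and the choice of a logistic regression model are all hypothetical and chosen for illustration.

```sql
-- Illustrative sketch: mydataset.mymodel, mydataset.mytable, age, and label
-- are hypothetical names, and the split points are arbitrary.
CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    -- Replace the raw numerical column with its bucket name.
    ML.BUCKETIZE(age, [18, 35, 50, 65]) AS age_bucket,
    -- Pass the label column through unchanged.
    label
  )
  OPTIONS (
    model_type = 'LOGISTIC_REG',
    input_label_cols = ['label']
  )
AS
SELECT
  age,
  label
FROM
  `mydataset.mytable`;
```

Placing the function in the TRANSFORM clause, rather than in the training query, means the same bucketing is stored with the model and applied to inputs at prediction time.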

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.