The ML.BUCKETIZE function
This document describes theML.BUCKETIZE function, which lets you splita numerical expression into buckets.
You can use this function with models that supportmanual feature preprocessing. For moreinformation, see the following documents:
Syntax
ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries] [, output_format])
Arguments
ML.BUCKETIZE takes the following arguments:
numerical_expression: thenumericalexpression to bucketize.array_split_points: an array of numerical values that provide thepoints at which to split thenumerical_expressionvalue. Thenumerical values in the array must be finite, so not-inf,inf, orNaN.Provide the numerical values in order, lowest to highest. The range ofpossible buckets is determined by the upper and lower boundaries of the array.For example, if thearray_split_pointsvalue is[1, 2, 3, 4], then thereare five potential buckets that thenumerical_expressionvalue can bebucketized into.exclude_boundaries: aBOOLvalue that determines whetherthe upper and lower boundaries fromarray_split_pointsare used.IfTRUE, then the boundary values aren't used to create buckets. Forexample, if thearray_split_pointsvalue is[1, 2, 3, 4]andexclude_boundariesisTRUE, then there are three potential bucketsthat thenumerical_expressionvalue can be bucketized into.The default value isFALSE.output_format: aSTRINGvalue that specifies the output format of the bucket. Valid output formats are as follows:bucket_names: returns aSTRINGvalue in the formatbin_<bucket_index>. For example,bin_3. Thebucket_indexvalue starts at 1. This is the default bucket format.bucket_ranges: returns aSTRINGvalue in the format[lower_bound, upper_bound)ininterval notation. For example,(-inf, 2.5),[2.5, 4.6),[4.6, +inf).bucket_ranges_json: returns a JSON-formattedSTRINGvalue in the format{"start": "lower_bound", "end": "upper_bound"}. For example,{"start": "-Infinity", "end": "2.5"},{"start": "2.5", "end": "4.6"},{"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as thebucket_rangesoption.
Output
ML.BUCKETIZE returns aSTRING value that contains the name of the bucket, in the format specified by theoutput_format argument.
Example
The following example bucketizes a numerical expression both with and withoutboundaries:
SELECTML.BUCKETIZE(2.5,[1,2,3])ASbucket,ML.BUCKETIZE(2.5,[1,2,3],TRUE)ASbucket_without_boundaries,ML.BUCKETIZE(2.5,[1,2,3],FALSE,"bucket_ranges")ASbucket_ranges,ML.BUCKETIZE(2.5,[1,2,3],FALSE,"bucket_ranges_json")ASbucket_ranges_json;
The output looks similar to the following:
+--------+---------------------------+---------------+----------------------------+| bucket | bucket_without_boundaries | bucket_ranges | bucket_ranges_json ||--------|---------------------------|---------------|----------------------------|| bin_3 | bin_2 | [2, 3) | {"start": "2", "end": "3"} |+--------+---------------------------+---------------+----------------------------+What's next
- For information about feature preprocessing, seeFeature preprocessing overview.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.