Manual feature preprocessing

You can use the TRANSFORM clause of the CREATE MODEL statement in combination with manual preprocessing functions to define custom data preprocessing. You can also use these manual preprocessing functions outside of the TRANSFORM clause.
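As a minimal sketch of how the TRANSFORM clause combines with manual preprocessing functions, the following statement bucketizes one column and standardizes another before training. The dataset, model, table, and column names (`mydataset.sample_model`, `mydataset.training_data`, `age`, `income`, `label`) are hypothetical, and the split points are arbitrary illustration values.

```sql
-- Hypothetical names: mydataset.sample_model, mydataset.training_data,
-- and the columns age, income, and label.
CREATE OR REPLACE MODEL `mydataset.sample_model`
  TRANSFORM(
    -- Scalar function: assigns each row to a bucket using fixed split points.
    ML.BUCKETIZE(age, [18, 35, 65]) AS age_bucket,
    -- Analytic function: uses statistics computed across all training rows.
    ML.STANDARD_SCALER(income) OVER () AS income_scaled,
    -- Pass the label column through unchanged.
    label
  )
  OPTIONS (
    model_type = 'LINEAR_REG',
    input_label_cols = ['label']
  )
AS
SELECT age, income, label
FROM `mydataset.training_data`;
```

Only the columns listed in the TRANSFORM clause are passed to training, so the label column is listed explicitly.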

If you want to decouple data preprocessing from model training, you can create a transform-only model that only performs data transformations by using the TRANSFORM clause.
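A transform-only model might look like the following sketch. It assumes the `TRANSFORM_ONLY` model type and uses hypothetical names (`mydataset.preprocessing_model`, `mydataset.raw_data`, `category`, `amount`).

```sql
-- Hypothetical names: mydataset.preprocessing_model, mydataset.raw_data,
-- and the columns category and amount.
CREATE OR REPLACE MODEL `mydataset.preprocessing_model`
  TRANSFORM(
    -- Encode a string column as integer labels.
    ML.LABEL_ENCODER(category) OVER () AS category_encoded,
    -- Rescale a numerical column to the range [0, 1].
    ML.MIN_MAX_SCALER(amount) OVER () AS amount_scaled
  )
  OPTIONS (model_type = 'TRANSFORM_ONLY')
AS
SELECT category, amount
FROM `mydataset.raw_data`;
```

Because this model only stores the transformations, no label column or training options are needed.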

You can use the ML.TRANSFORM function to increase the transparency of feature preprocessing. This function lets you return the preprocessed data from a model's TRANSFORM clause, so that you can see the actual training data that goes into the model training, as well as the actual prediction data that goes into model serving.
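For example, the following query sketches how you might inspect the preprocessed rows. It assumes a hypothetical model `mydataset.sample_model` that was created with a TRANSFORM clause, and a hypothetical table `mydataset.new_data` that contains the input columns the clause expects.

```sql
-- Hypothetical names: mydataset.sample_model and mydataset.new_data.
-- Returns the output columns of the model's TRANSFORM clause for each input row.
SELECT *
FROM ML.TRANSFORM(
  MODEL `mydataset.sample_model`,
  (SELECT * FROM `mydataset.new_data`)
);
```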

For information about feature preprocessing support in BigQuery ML, see Feature preprocessing overview.

Types of preprocessing functions

There are several types of manual preprocessing functions:

  • Scalar functions operate on a single row. For example, ML.BUCKETIZE.
  • Table-valued functions operate on all rows and output a table. For example, ML.FEATURES_AT_TIME.
  • Analytic functions operate on all rows, and output the result for each row based on the statistics collected across all rows. For example, ML.QUANTILE_BUCKETIZE.

    You must always use an empty OVER() clause with ML analytic functions.

    When you use ML analytic functions inside the TRANSFORM clause during training, the same statistics are automatically applied to the input in prediction.
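To illustrate the difference between scalar and analytic functions, the following standalone query applies one of each to a small inline array, outside of any TRANSFORM clause. The input values and split points are arbitrary illustration values.

```sql
SELECT
  x,
  -- Scalar: the bucket depends only on this row's value and the fixed split points.
  ML.BUCKETIZE(x, [2, 4]) AS fixed_bucket,
  -- Analytic: the bucket boundaries are derived from quantiles across all rows,
  -- so the empty OVER () clause is required.
  ML.QUANTILE_BUCKETIZE(x, 2) OVER () AS quantile_bucket
FROM UNNEST([1, 2, 3, 4, 5]) AS x;
```

In this standalone form, the quantiles are recomputed from whatever rows the query reads. Inside a TRANSFORM clause, the statistics collected during training are reused at prediction time instead.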

The following sections describe the available preprocessing functions.

General functions

Use the following function on string or numerical expressions to do data cleanup:

Numerical functions

Use the following functions on numerical expressions to regularize data:

Categorical functions

Use the following functions to categorize data:

Text functions

Use the following functions on text string expressions:

Image functions

Use the following functions on image data:

Known limitations

What's next

For more information about supported SQL statements and functions for models that support manual feature preprocessing, see the following documents:


Last updated 2025-12-15 UTC.