The CREATE MODEL statement for transform-only models
This document describes theCREATE MODEL statement for creatingtransform-only models in BigQuery. Transform-only models use theTRANSFORM clauseto applypreprocessing functionsto input data and return the preprocessed data. Transform-only models decoupledata preprocessing from model training, making it easier for you to capture andreuse a set of data preprocessing rules.
You can use a transform-only model in conjunction with theML.TRANSFORM functionto provide preprocessed data to other models:
- You can use it in thequery statementwhen creating another model, in order to use the transformed data as thetraining data for that model.
- You can use it in the
query statementargumentof theML.PREDICTfunction, in order to provide data for prediction that isprocessed in the way the target model expects.
For batch feature transformations, it is better to use transform-only modelsbecause it lets you process large amounts of data in a short time. For onlinefeature transformations, it is better to useVertex AI Feature Storebecause it provides responses with low latency.
You can also use a transform-only model with theML.FEATURE_INFO functionin order to return information about feature transformations in the model.
For more information about supported SQL statements and functions for thismodel, seeEnd-to-end user journeys for ML models.
CREATE MODEL syntax
{CREATE MODEL |CREATE MODEL IF NOT EXISTS |CREATE OR REPLACE MODEL}model_nameTRANSFORM (select_list)OPTIONS(MODEL_TYPE = 'TRANSFORM_ONLY')ASquery_statementCREATE MODEL
Creates and trains a new model in the specified dataset. If the model nameexists,CREATE MODEL returns an error.
CREATE MODEL IF NOT EXISTS
Creates and trains a new model only if the model doesn't exist in thespecified dataset.
CREATE OR REPLACE MODEL
Creates and trains a model and replaces an existing model with the same name inthe specified dataset.
model_name
The name of the model you're creating or replacing. The modelname must be unique in the dataset: no other model or table can have the samename. The model name must follow the same naming rules as aBigQuery table. A model name can:
- Contain up to 1,024 characters
- Contain letters (upper or lower case), numbers, and underscores
model_name is case-sensitive.
If you don't have a default project configured, then you must prepend theproject ID to the model name in the following format, including backticks:
`[PROJECT_ID].[DATASET].[MODEL]`
For example, `myproject.mydataset.mymodel`.
select_list
You can pass columns fromquery_statement through to model training withouttransformation by either using*, * EXCEPT(), or by listingthe column names directly.
Not all columns fromquery_statement are required to appear in theTRANSFORMclause, so you can drop columns appearing inquery_statement by omittingthem from theTRANSFORM clause.
You can transform inputs fromquery_statement by using expressions inselect_list.select_list is similar to a normalSELECT statement.select_list supports the following syntax:
** EXCEPT()* REPLACE()expressionexpression.*
The following cannot appear insideselect_list:
- Aggregation functions.
- Non-BigQuery ML analytic functions. For more information aboutsupported functions, seeManual feature preprocessing.
- UDFs.
- Subqueries.
- Anonymous columns. For example,
a + b as cis allowed, whilea + bisn't.
The output columns ofselect_list can be of any BigQuerysupported data type.
If present, the following columns must appear inselect_list withouttransformation:
labeldata_split_colkmeans_init_colinstance_weight_col
If these columns are returned byquery_statement, you must reference them inselect_list by column name outside of any expression, or by using*. Youcan't use aliases with these columns.
MODEL_TYPE
Syntax
MODEL_TYPE='TRANSFORM_ONLY'Description
Specify the model type. This option is required.
query_statement
TheGoogleSQL querythat contains the data to preprocess. The statistics that are calculatedwhen transforming this data are applied to the input data of any functionsthat you use the model with.
Examples
The following examples show how to create and use transform-only models.
Example 1
The following example creates a model namedtransform_model inmydataset in yourdefault project. The model transforms several columns from theBigQuery public tablebigquery-public-data.ml_datasets.penguins:
CREATEMODEL`mydataset.transform_model`TRANSFORM(species,island,ML.MAX_ABS_SCALER(culmen_length_mm)OVER()ASculmen_length_mm,ML.MAX_ABS_SCALER(culmen_depth_mm)OVER()ASculmen_depth_mm,ML.MAX_ABS_SCALER(flipper_length_mm)OVER()ASflipper_length_mm,sex,body_mass_g)OPTIONS(model_type='transform_only')AS(SELECT*FROM`bigquery-public-data.ml_datasets.penguins`);
Example 2
The following example creates a model namedmymodel inmydataset in yourdefault project. The model is trained on data that is preprocessed by usinga transform-only model:
CREATEMODEL`mydataset.mymodel`OPTIONS(MODEL_TYPE='LINEAR_REG',MAX_ITERATIONS=5,INPUT_LABEL_COLS=['body_mass_g'])ASSELECT*FROMML.TRANSFORM(MODEL`mydataset.transform_model`,TABLE`bigquery-public-data.ml_datasets.penguins`)WHEREbody_mass_gISNOTNULL;
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-11-24 UTC.