The ML.PREDICT function
This document describes the ML.PREDICT function, which you can use to predict outcomes by using a model. ML.PREDICT works with the following models:
- Linear and logistic regression models
- Boosted tree models
- Random forest models
- Deep neural network (DNN) models
- Wide-and-deep models
- K-means models
- Principal component analysis (PCA) models
- Autoencoder models
- Imported models (TensorFlow, TensorFlow Lite, ONNX, and XGBoost)
- Remote models over Vertex AI hosted models
For PCA and autoencoder models, you can use the AI.GENERATE_EMBEDDING function as an alternative to the ML.PREDICT function. AI.GENERATE_EMBEDDING generates the same embedding data as ML.PREDICT, but returns it as an array in a single column rather than in a series of columns. Having all of the embeddings in a single column lets you use the VECTOR_SEARCH function directly on the AI.GENERATE_EMBEDDING output.
You can run prediction during model creation, after model creation, or after a failure (as long as at least one iteration is finished). ML.PREDICT always uses the model weights from the last successful iteration.
Syntax
ML.PREDICT(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
  STRUCT(
    [THRESHOLD AS threshold]
    [, KEEP_ORIGINAL_COLUMNS AS keep_original_columns]
    [, TRIAL_ID AS trial_id]))
Arguments
ML.PREDICT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL_NAME: the name of the model.
- TABLE: the name of the input table that contains the evaluation data.
  If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.
  For TensorFlow Lite, Open Neural Network Exchange (ONNX), and XGBoost models, the input must be convertible to the type expected by the model.
  For remote models, the input columns must contain all Vertex AI endpoint input fields.
  If there are unused columns from the table, they are passed through as output columns.
- QUERY_STATEMENT: the GoogleSQL query that is used to generate the evaluation data. See the GoogleSQL query syntax page for the supported SQL syntax of the QUERY_STATEMENT clause.
  If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.
  For TensorFlow Lite, ONNX, and XGBoost models, the input must be convertible to the type expected by the model.
  For remote models, the input columns must contain all Vertex AI endpoint input fields.
  If there are unused columns from the query, they are passed through as output columns.
  If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause must appear in QUERY_STATEMENT.
  If you are running inference on image data from an object table, you must use the ML.DECODE_IMAGE function to convert image bytes to a multi-dimensional ARRAY representation. You can use ML.DECODE_IMAGE output directly in an ML.PREDICT statement, or you can write the results from ML.DECODE_IMAGE to a table column and reference that column when you call ML.PREDICT. For more information, see Predict an outcome from image data with an imported TensorFlow model.
- THRESHOLD: a FLOAT64 value that specifies a custom threshold for a binary classification model. It is used as the cutoff between the two labels. Predictions above the threshold are positive predictions. Predictions below the threshold are negative predictions. The default value is 0.5.
- KEEP_ORIGINAL_COLUMNS: a BOOL value that specifies whether to output the input table columns. If TRUE, the columns from the input table are output. The default value is FALSE. KEEP_ORIGINAL_COLUMNS only applies to principal component analysis (PCA) models.
- TRIAL_ID: an INT64 value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model.
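As a hedged illustration of the optional STRUCT arguments, the following sketch requests predictions from a specific hyperparameter tuning trial. The model name mydataset.my_tuned_model and the trial number 2 are hypothetical, and the query assumes that the model was created with hyperparameter tuning:

```sql
-- Hypothetical model name and trial number; assumes hyperparameter tuning
-- was enabled when the model was created.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.my_tuned_model`,
  TABLE `mydataset.mytable`,
  STRUCT(2 AS trial_id));
```

If you omit trial_id, the function falls back to its default, which is the optimal trial.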
Output
The output of the ML.PREDICT function has as many rows as the input table, and it includes all columns from the input table and all output columns from the model. The output column names for the model are predicted_<label_column_name> and, for classification models, predicted_<label_column_name>_probs. In both columns, label_column_name is the name of the input label column that's used during training.
Regression models
For the following types of regression models:
- Linear regression
- Boosted tree regressor
- Random forest regressor
- DNN regressor
- Wide-and-deep regressor
The following column is returned:
predicted_<label_column_name>: a FLOAT64 value that contains the predicted value of the label.
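Because unused input columns are passed through, you can compare the prediction with the actual label in the same query. The following is a minimal sketch that assumes a hypothetical regression model named mydataset.my_regression_model, a label column named label, and an input table that still contains that label column:

```sql
-- Hypothetical model name; assumes the label column is named `label`
-- and is present in the input table.
SELECT
  label,
  predicted_label,
  label - predicted_label AS residual
FROM ML.PREDICT(
  MODEL `mydataset.my_regression_model`,
  TABLE `mydataset.mytable`);
```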
Classification models
For the following types of binary-class classification models:
- Logistic regression
- Boosted tree classifier
- Random forest classifier
- DNN classifier
- Wide-and-deep classifier
The following columns are returned:
- predicted_<label_column_name>: a STRING value that contains one of the two input labels, depending on which label has the higher predicted probability.
- predicted_<label_column_name>_probs: an ARRAY<STRUCT> value in the form [<label, probability>] that contains the predicted probability of each label.
For the following types of multiclass classification models:
- Logistic regression
- Boosted tree classifier
- Random forest classifier
- DNN classifier
- Wide-and-deep classifier
The following columns are returned:
- predicted_<label_column_name>: a STRING value that contains the label with the highest predicted probability score.
- predicted_<label_column_name>_probs: an ARRAY<STRUCT> value in the form [<label, probability>] that contains the probability for each class label, calculated using a softmax function.
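To work with the per-label probabilities as rows, you can flatten the probability array with UNNEST. The following sketch assumes a label column named label, so that the output columns are predicted_label and predicted_label_probs, and it assumes that the fields of each STRUCT are named label and prob. Confirm those field names against your own model's output schema before relying on them:

```sql
-- Assumption: each STRUCT in predicted_label_probs exposes the fields
-- `label` and `prob`.
WITH predictions AS (
  SELECT *
  FROM ML.PREDICT(
    MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)
)
SELECT
  p.predicted_label,
  prob_entry.label AS class_label,
  prob_entry.prob AS probability
FROM predictions AS p
CROSS JOIN UNNEST(p.predicted_label_probs) AS prob_entry
ORDER BY probability DESC;
```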
K-means models
For k-means models, the following columns are returned:
- centroid_id: an INT64 value that identifies the centroid.
- nearest_centroids_distance: an ARRAY<STRUCT> value that contains the distances to the nearest k clusters, where k is equal to the lesser of num_clusters or 5. If the model was created with the standardize_features option set to TRUE, then the model computes these distances using standardized features; otherwise, it computes these distances using non-standardized features.
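The distance array can be flattened in the same way as other ARRAY<STRUCT> columns. The following sketch uses a hypothetical model named mydataset.my_kmeans_model and assumes that each STRUCT in nearest_centroids_distance has fields named centroid_id and distance. Treat those field names as an assumption and check them against your model's output schema:

```sql
-- Hypothetical model name. Assumption: the STRUCT fields are named
-- `centroid_id` and `distance`.
WITH predictions AS (
  SELECT *
  FROM ML.PREDICT(
    MODEL `mydataset.my_kmeans_model`,
    TABLE `mydataset.mytable`)
)
SELECT
  p.centroid_id AS assigned_centroid_id,
  d.centroid_id AS candidate_centroid_id,
  d.distance
FROM predictions AS p
CROSS JOIN UNNEST(p.nearest_centroids_distance) AS d
ORDER BY d.distance;
```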
PCA models
For PCA models, the following columns are returned:
principal_component_<index>: a FLOAT64 value that represents the projection of the input data onto each principal component. These values can also be considered as embedded low-dimensional features in the space that is spanned by the principal components.
The original input columns are appended if the keep_original_columns argument is set to TRUE.
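As a sketch of the keep_original_columns argument, the following query returns the principal component columns together with the original input columns. The model name mydataset.my_pca_model is hypothetical:

```sql
-- Hypothetical PCA model name.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.my_pca_model`,
  TABLE `mydataset.mytable`,
  STRUCT(TRUE AS keep_original_columns));
```

Leaving the argument out uses the default of FALSE, so only the principal component columns are returned.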
Autoencoder models
For autoencoder models, the following columns are returned:
latent_col_<index>: a FLOAT64 value that represents the dimensions of the latent space.
The original input columns are appended after the latent space columns.
Imported models
For TensorFlow Lite models, the output is the output of the TensorFlow Lite model's predict method.
For ONNX models, the output is the output of the ONNX model's predict method.
For XGBoost models, the output is the output of the XGBoost model's predict method.
Remote models
For remote models, the output columns contain all Vertex AI endpoint output fields, and also a remote_model_status field that contains status messages from the Vertex AI endpoint.
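One way to inspect the status messages is to count rows per distinct status value. The following is a minimal sketch that uses a hypothetical remote model named mydataset.my_remote_model and makes no assumption about what the status text looks like:

```sql
-- Hypothetical remote model name; groups rows by whatever status text
-- the endpoint returned.
SELECT
  remote_model_status,
  COUNT(*) AS row_count
FROM ML.PREDICT(
  MODEL `mydataset.my_remote_model`,
  TABLE `mydataset.mytable`)
GROUP BY remote_model_status;
```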
Missing data imputation
In statistics, imputation is used to replace missing data with substituted values. When you train a model in BigQuery ML, NULL values are treated as missing data. When you predict outcomes in BigQuery ML, missing values can occur when BigQuery ML encounters a NULL value or a previously unseen value. BigQuery ML handles missing data differently, based on the type of data in the column.
| Column type | Imputation method |
|---|---|
| Numeric | In both training and prediction, NULL values in numeric columns are replaced with the mean value of the given column, as calculated by the feature column in the original input data. |
| One-hot/Multi-hot encoded | In both training and prediction, NULL values in the encoded columns are mapped to an additional category that is added to the data. Previously unseen data is assigned a weight of 0 during prediction. |
| TIMESTAMP | TIMESTAMP columns use a mixture of imputation methods from both standardized and one-hot encoded columns. For the generated Unix time column, BigQuery ML replaces values with the mean Unix time across the original columns. For other generated values, BigQuery ML assigns them to the respective NULL category for each extracted feature. |
| STRUCT | In both training and prediction, each field of the STRUCT is imputed according to its type. |
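The numeric rule in the table can be illustrated with plain GoogleSQL. This sketch only mimics the idea of mean imputation on a small inline dataset; it is not BigQuery ML's internal implementation, and the column names id and x are made up for the example:

```sql
-- Illustration only: replace a NULL numeric value with the column mean.
WITH training AS (
  SELECT *
  FROM UNNEST([
    STRUCT(1 AS id, 10.0 AS x),
    STRUCT(2 AS id, CAST(NULL AS FLOAT64) AS x),
    STRUCT(3 AS id, 20.0 AS x)])
)
SELECT
  id,
  x,
  -- AVG ignores NULL values, so the mean is taken over 10.0 and 20.0.
  IFNULL(x, AVG(x) OVER ()) AS x_imputed
FROM training
ORDER BY id;
```

Here the row with id 2 receives the mean of the two non-NULL values, 15.0, while the other rows keep their original values.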
Permissions
You must have the bigquery.models.getData Identity and Access Management (IAM) permission to run ML.PREDICT.
Examples
The following examples assume your model and input table are in your default project.
Predict an outcome
The following example predicts an outcome and returns the following columns:
- predicted_label
- label
- column1
- column2
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT label, column1, column2 FROM `mydataset.mytable`))
Compare predictions from two different models
The following example creates two models and then compares their output:
Create the first model:
CREATE MODEL `mydataset.mymodel1`
  OPTIONS(
    model_type = 'linear_reg',
    input_label_cols = ['label'])
AS
SELECT label, input_column1
FROM `mydataset.mytable`
Create the second model:
CREATE MODEL `mydataset.mymodel2`
  OPTIONS(
    model_type = 'linear_reg',
    input_label_cols = ['label'])
AS
SELECT label, input_column2
FROM `mydataset.mytable`
Compare the output of the two models:
SELECT
  label,
  predicted_label1,
  predicted_label AS predicted_label2
FROM ML.PREDICT(
  MODEL `mydataset.mymodel2`,
  (
    -- Rename the first model's prediction so the second call can add its own.
    SELECT
      * EXCEPT (predicted_label),
      predicted_label AS predicted_label1
    FROM ML.PREDICT(
      MODEL `mydataset.mymodel1`,
      TABLE `mydataset.mytable`)))
Specify a custom threshold
The following example runs prediction with input data and a custom threshold of 0.55:
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT custom_label, column1, column2 FROM `mydataset.mytable`),
  STRUCT(0.55 AS threshold))
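To see what a cutoff change means for individual rows, the following plain GoogleSQL sketch applies two cutoffs to a few made-up positive-class probabilities. It only mimics the comparison described for the threshold argument and does not call a model; the column names id and positive_prob are hypothetical. Values exactly equal to the cutoff are avoided because this page does not state how ties are handled:

```sql
-- Illustration only: made-up probabilities compared against two cutoffs.
WITH scored AS (
  SELECT *
  FROM UNNEST([
    STRUCT('row_a' AS id, 0.40 AS positive_prob),
    STRUCT('row_b' AS id, 0.52 AS positive_prob),
    STRUCT('row_c' AS id, 0.70 AS positive_prob)])
)
SELECT
  id,
  positive_prob,
  IF(positive_prob > 0.5, 'positive', 'negative') AS label_at_default,
  IF(positive_prob > 0.55, 'positive', 'negative') AS label_at_custom
FROM scored
ORDER BY id;
```

In this sketch only row_b changes: it is positive at the default cutoff of 0.5 and negative at 0.55.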
Predict an outcome from structured data with an imported TensorFlow model
The following query predicts outcomes using an imported TensorFlow model. The input_data table contains inputs in the schema expected by my_model. See the CREATE MODEL statement for TensorFlow models for more information.
SELECT *
FROM ML.PREDICT(
  MODEL `my_project.my_dataset.my_model`,
  (SELECT * FROM input_data))
Predict an outcome from image data with an imported TensorFlow model
If you are running inference on image data from an object table, you must use the ML.DECODE_IMAGE function to convert image bytes to a multi-dimensional ARRAY representation. You can use ML.DECODE_IMAGE output directly in an ML.PREDICT function, or you can write the results from ML.DECODE_IMAGE to a table column and reference that column when you call ML.PREDICT. You can also pass ML.DECODE_IMAGE output to another image processing function for additional preprocessing during either of these procedures.
You can join the object table to standard BigQuery tables to limit the data used in inference, or to provide additional input to the model.
The following examples show different ways you can use the ML.PREDICT function with image data.
Example 1
The following example uses the ML.DECODE_IMAGE function directly in the ML.PREDICT function. It returns the inference results for all images in the object table, for a model with an input field of input and an output field of feature:
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.vision_model`,
  (SELECT uri, ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), 480, 480, FALSE) AS input
   FROM `my_dataset.object_table`));
Example 2
The following example uses the ML.DECODE_IMAGE function directly in the ML.PREDICT function, and uses the ML.CONVERT_COLOR_SPACE function in the ML.PREDICT function to convert the image color space from RGB to YIQ. It also shows how to use object table fields to filter the objects included in inference. It returns the inference results for all JPG images in the object table, for a model with an input field of input and an output field of feature:
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.vision_model`,
  (SELECT
     uri,
     ML.CONVERT_COLOR_SPACE(
       ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), 224, 280, TRUE),
       'YIQ') AS input
   FROM `my_dataset.object_table`
   WHERE content_type = 'image/jpeg'));
Example 3
The following example uses results from ML.DECODE_IMAGE that have been written to a table column but not processed any further. It uses ML.RESIZE_IMAGE and ML.CONVERT_IMAGE_TYPE in the ML.PREDICT function to process the image data. It returns the inference results for all images in the decoded images table, for a model with an input field of input and an output field of feature.
Create the decoded images table:
CREATE OR REPLACE TABLE `my_dataset.decoded_images` AS (
  -- Keep uri so that the inference query can select it.
  SELECT uri, ML.DECODE_IMAGE(data) AS decoded_image
  FROM `my_dataset.object_table`);
Run inference on the decoded images table:
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.vision_model`,
  (SELECT
     uri,
     ML.CONVERT_IMAGE_TYPE(ML.RESIZE_IMAGE(decoded_image, 480, 480, FALSE)) AS input
   FROM `my_dataset.decoded_images`));
Example 4
The following example uses results from ML.DECODE_IMAGE that have been written to a table column and preprocessed using ML.RESIZE_IMAGE. It returns the inference results for all images in the decoded images table, for a model with an input field of input and an output field of feature.
Create the table:
CREATE OR REPLACE TABLE `my_dataset.decoded_images` AS (
  -- Keep uri so that the inference query can select it.
  SELECT
    uri,
    ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), 480, 480, FALSE) AS decoded_image
  FROM `my_dataset.object_table`);
Run inference on the decoded images table:
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.vision_model`,
  (SELECT uri, decoded_image AS input
   FROM `my_dataset.decoded_images`));
Example 5
The following example uses the ML.DECODE_IMAGE function directly in the ML.PREDICT function. In this example, the model has an output field of embeddings and two input fields: one that expects an image, f_img, and one that expects a string, f_txt. The image input comes from the object table and the string input comes from a standard BigQuery table that is joined with the object table by using the uri column.
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.mixed_model`,
  (SELECT
     object_table.uri,
     ML.RESIZE_IMAGE(ML.DECODE_IMAGE(object_table.data), 224, 224, FALSE) AS f_img,
     image_description.description AS f_txt
   FROM `my_dataset.object_table` AS object_table
   JOIN `my_dataset.image_description` AS image_description
     ON object_table.uri = image_description.uri));
Predict an outcome with a model trained with the TRANSFORM clause
The following example trains a model using the TRANSFORM clause:
CREATE MODEL `mydataset.mymodel`
  TRANSFORM(f1 + f2 AS c, label)
  OPTIONS(...)
AS
SELECT f1, f2, f3, label
FROM t;
Because the f3 column doesn't appear in the TRANSFORM clause, the following prediction query omits that column in the QUERY_STATEMENT:
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT f1, f2 FROM t1));
If f3 is provided in the SELECT statement, it isn't used for calculating predictions but is instead passed through for use in the rest of the SQL statement.
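The pass-through behavior can be sketched as follows. This query reuses the model and table names from the example above and assumes that the label column is named label, so that the prediction column is predicted_label:

```sql
-- f3 is not used by the model; it is passed through and can be selected
-- alongside the prediction.
SELECT
  f3,
  predicted_label
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT f1, f2, f3 FROM t1));
```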
Predict dimensionality reduction results (latent space) with an autoencoder model
The following example runs prediction against a previously built autoencoder model, where the input was 4 dimensional (4 input columns) and the dimensionality reduction had 2 dimensions (2 output columns):
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT f1, f2, f3, f4 FROM t1));
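Because the original input columns are appended after the latent space columns, you might want to drop them when only the reduced representation is needed. The following sketch does that with SELECT * EXCEPT, reusing the names from the example above:

```sql
-- Keep only the latent space columns by excluding the four input columns.
SELECT
  * EXCEPT (f1, f2, f3, f4)
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT f1, f2, f3, f4 FROM t1));
```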
What's next
- For more information about model inference, see Model inference overview.
- For more information about supported SQL statements and functions for ML models, see End-to-end user journeys for ML models.
Last updated 2025-11-25 UTC.