The ML.EXPLAIN_PREDICT function
This document describes the ML.EXPLAIN_PREDICT function, which lets you generate a predicted value and a set of feature attributions for each instance of the input data. Feature attributions indicate how much each feature in your model contributed to the final prediction for each given instance. ML.EXPLAIN_PREDICT is essentially an extended version of ML.PREDICT.
Syntax
ML.EXPLAIN_PREDICT(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
  STRUCT(
    [TOP_K_FEATURES AS top_k_features]
    [, THRESHOLD AS threshold]
    [, INTEGRATED_GRADIENTS_NUM_STEPS AS integrated_gradients_num_steps]
    [, APPROX_FEATURE_CONTRIB AS approx_feature_contrib]))

Arguments
ML.EXPLAIN_PREDICT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of the model.
- TABLE: the name of the input table that contains the data to be evaluated.
  If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the table, they are passed through to the output columns.
- QUERY_STATEMENT: the GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax for the QUERY_STATEMENT clause in GoogleSQL, see Query syntax.
  If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the query, they are passed through to the output columns.
  If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause can appear in QUERY_STATEMENT.
- TOP_K_FEATURES: an INT64 value that specifies how many top feature attribution pairs are generated for each row of input data. The features are ranked by the absolute values of their attributions.
  By default, TOP_K_FEATURES is set to 5. If its value is greater than the number of features in the training data, the attributions of all features are returned.
- THRESHOLD: a FLOAT64 value that specifies the cutoff between the two labels for binary classification models. Predictions above the threshold are positive predictions. Predictions below the threshold are negative predictions. Feature attributions are returned only for the predicted label.
  The THRESHOLD value must be between 0.0 and 1.0. The default value is 0.5.
- INTEGRATED_GRADIENTS_NUM_STEPS: an INT64 value that specifies the number of steps to sample between the example being explained and its baseline. This value is used to approximate the integral in integrated gradients attribution methods. Increasing the value improves the precision of feature attributions, but can be slower and more computationally expensive.
  This option only applies to deep neural network (DNN) models, which use integrated gradients attribution methods. The default value is 50.
- APPROX_FEATURE_CONTRIB: a BOOL value that indicates whether to use an approximate feature contribution method in the XGBoost model explanation. This option applies only to boosted tree and random forest models.
  This capability is provided by the XGBoost library; BigQuery ML only passes this option through to it. For more information, see Package 'xgboost' and search for approxcontrib.
  The default value is FALSE.
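As a minimal sketch of the TABLE form of the input argument, the following query passes a table directly instead of a query statement. The names mydataset.mymodel and mydataset.mytable are hypothetical placeholders, and the sketch assumes that the table's column names match the model's input columns. Any extra columns in the table would be passed through to the output.

```sql
-- Hypothetical model and table names; replace them with your own resources.
-- Requests the top 10 feature attributions instead of the default of 5.
SELECT
  *
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`,
    STRUCT(10 AS top_k_features));
```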
Output
ML.EXPLAIN_PREDICT returns the following columns in addition to anypassthrough columns:
- predicted_<label_column_name>: a STRING value that contains either the predicted value of the label for regression models or the predicted label class for classification models.
- probability: a FLOAT64 value that contains the probability of the predicted label class. This column is only present for classification models.
- top_feature_attributions: an ARRAY<STRUCT> value that contains the attributions of the top k features to the final prediction:
  - top_feature_attributions.feature: a STRING value that contains the feature name.
  - top_feature_attributions.attribution: a FLOAT64 value that contains the attribution of the feature to the final prediction.
- baseline_prediction_value: a FLOAT64 value that contains one of the following:
  - For linear models, the baseline_prediction_value value is the intercept of the model.
  - For DNN models, the baseline_prediction_value value is the mean across all numerical features and NULL for other types of features.
  - For boosted tree and random forest models, the baseline_prediction_value value is equal to the bias term, which is the expected output of the model over the training dataset. See Tree SHAP documentation for more information.
- prediction_value: the raw prediction value.
  - For regression models, this is a FLOAT64 value that contains the value of the column identified by predicted_<label_column_name>.
  - For classification models, this is a FLOAT64 value that contains the logit value (also called log-odds) for the predicted class. The predicted class probabilities are obtained by applying the softmax transformation to the logit values.
- approximation_error: a FLOAT64 value that contains the approximation error of the feature attributions.
  Exact attribution methods like Tree SHAP are defined as follows:
  $$\texttt{baseline\_prediction\_value} + \sum{\texttt{feature\_attributions}} = \texttt{prediction\_value}$$
  Because the attributions fully account for the difference between the predicted value and the baseline, there is no approximation error for these types of methods, and this column value is 0. Exact attribution methods are used for the following types of models: linear models, boosted tree models, and random forest models.
  Integrated gradients is an approximated attribution method, and its approximation error is defined as follows:
  $$\frac{|\texttt{prediction\_value} - \texttt{baseline\_prediction\_value} - \sum{\texttt{feature\_attributions}}|}{|\texttt{prediction\_value} - \texttt{baseline\_prediction\_value}|}$$
  For integrated gradients, this column value is greater than 0. The integrated gradients method is used with DNN models.
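To make the approximation error ratio concrete, the following self-contained sketch computes it from literal values rather than from a real model. All numbers and the CTE names explained and with_sum are invented for illustration. With a prediction of 3.0, a baseline of 1.0, and attributions that sum to 1.9, the ratio is |3.0 - 1.0 - 1.9| / |3.0 - 1.0|, or approximately 0.05 up to floating-point rounding.

```sql
-- Illustrative values only; this query does not call ML.EXPLAIN_PREDICT.
WITH explained AS (
  SELECT
    3.0 AS prediction_value,
    1.0 AS baseline_prediction_value,
    [
      STRUCT('column1' AS feature, 1.2 AS attribution),
      STRUCT('column2' AS feature, 0.7 AS attribution)
    ] AS top_feature_attributions
),
with_sum AS (
  SELECT
    prediction_value,
    baseline_prediction_value,
    -- Sum the attribution field across the array of feature attributions.
    (SELECT SUM(fa.attribution)
     FROM UNNEST(top_feature_attributions) AS fa) AS attribution_sum
  FROM explained
)
SELECT
  attribution_sum,
  -- SAFE_DIVIDE returns NULL instead of failing when the denominator is 0.
  SAFE_DIVIDE(
    ABS(prediction_value - baseline_prediction_value - attribution_sum),
    ABS(prediction_value - baseline_prediction_value)) AS approximation_error
FROM with_sum;
```

If the attributions summed to exactly 2.0 instead, the numerator would be 0, which corresponds to the exact attribution case.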
Examples
The following examples assume that your model and input table are in your default project.
Explain a prediction generated by a linear regression model
The following example explains a prediction for a linear regression model by generating the top three attributions.
Assume a linear regression model stored in mydataset.mymodel was trained with the table mydataset.mytable with the following columns:
- label
- column1
- column2
- column3
- column4
- column5
SELECT
  *
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT
        label,
        column1,
        column2,
        column3,
        column4,
        column5
      FROM
        `mydataset.mytable`),
    STRUCT(3 AS top_k_features))
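Because top_feature_attributions is an ARRAY<STRUCT> column, it can be convenient to flatten it into one row per feature. The following is a sketch under assumptions: it reuses the model and table names from the example above, assumes the label column is named label so that the prediction column is predicted_label, and uses the aliases explained and fa purely for illustration.

```sql
-- Flatten the attribution array: one output row per (input row, feature) pair.
SELECT
  explained.predicted_label,
  fa.feature,
  fa.attribution
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT label, column1, column2, column3, column4, column5
      FROM `mydataset.mytable`),
    STRUCT(3 AS top_k_features)) AS explained
CROSS JOIN
  UNNEST(explained.top_feature_attributions) AS fa
ORDER BY
  ABS(fa.attribution) DESC;
```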
Explain a prediction generated by a boosted tree or a random forest binary classification model
The following example explains a prediction generated by a boosted tree or a random forest binary classification model. It generates the top three attributions with a custom threshold.
Assume a boosted tree or a random forest binary classification model stored in mydataset.mymodel was trained with the table mydataset.mytable with the following columns:
- label
- column1
- column2
- column3
- column4
- column5
SELECT
  *
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT
        label,
        column1,
        column2,
        column3,
        column4,
        column5
      FROM
        `mydataset.mytable`),
    STRUCT(3 AS top_k_features, 0.7 AS threshold))
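For boosted tree and random forest models, the APPROX_FEATURE_CONTRIB argument can be added to the same call. The following sketch reuses the model and table names from the example above and only shows where the option goes. Whether the approximate method is appropriate for a given model is not something this example establishes.

```sql
-- Same query as above, with the approximate feature contribution method enabled.
SELECT
  *
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT label, column1, column2, column3, column4, column5
      FROM `mydataset.mytable`),
    STRUCT(
      3 AS top_k_features,
      0.7 AS threshold,
      TRUE AS approx_feature_contrib));
```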
Explain a prediction generated by a DNN classifier model
The following example explains a prediction generated by a DNN classifier model.
Assume a DNN classifier is stored in mydataset.mymodel and trained with the table mydataset.mytable with the following columns:
- label
- column1
- column2
- column3
- column4
- column5
SELECT
  *
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT
        label,
        column1,
        column2,
        column3,
        column4,
        column5
      FROM
        `mydataset.mytable`),
    STRUCT(3 AS top_k_features, 30 AS integrated_gradients_num_steps))
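Since integrated gradients is an approximated method, one way to judge whether a given number of steps is adequate is to summarize the approximation_error column. The following sketch reuses the model and table names from the example above. The aliases mean_approximation_error and max_approximation_error are illustrative, and what counts as an acceptable error depends on your use case.

```sql
-- Summarize the approximation error across all explained rows.
SELECT
  AVG(approximation_error) AS mean_approximation_error,
  MAX(approximation_error) AS max_approximation_error
FROM
  ML.EXPLAIN_PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT label, column1, column2, column3, column4, column5
      FROM `mydataset.mytable`),
    STRUCT(3 AS top_k_features, 30 AS integrated_gradients_num_steps));
```

Rerunning the query with a larger integrated_gradients_num_steps value and comparing the two summaries shows how much precision the extra steps buy, at the cost of a slower and more expensive query.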
What's next
- For more information about Explainable AI, see BigQuery Explainable AI overview.
- For more information about supported SQL statements and functions for ML models, see End-to-end user journeys for ML models.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-11-24 UTC.