The ML.EXPLAIN_PREDICT function

This document describes the ML.EXPLAIN_PREDICT function, which lets you generate a predicted value and a set of feature attributions for each instance of the input data. Feature attributions indicate how much each feature in your model contributed to the final prediction for each given instance. ML.EXPLAIN_PREDICT is essentially an extended version of ML.PREDICT.

Syntax

ML.EXPLAIN_PREDICT(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
  STRUCT(
    [TOP_K_FEATURES AS top_k_features]
    [, THRESHOLD AS threshold]
    [, INTEGRATED_GRADIENTS_NUM_STEPS AS integrated_gradients_num_steps]
    [, APPROX_FEATURE_CONTRIB AS approx_feature_contrib]))

Arguments

ML.EXPLAIN_PREDICT takes the following arguments:

  • PROJECT_ID: the project that contains the resource.
  • DATASET: the dataset that contains the resource.
  • MODEL: the name of the model.
  • TABLE: the name of the input table that contains the data to be evaluated.

    If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.

    If there are unused columns from the table, they are passed through to the output columns.

  • QUERY_STATEMENT: the GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax for the QUERY_STATEMENT clause in GoogleSQL, see Query syntax.

    If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.

    If there are unused columns in the query results, they are passed through to the output columns.

    If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause can appear in QUERY_STATEMENT.

  • TOP_K_FEATURES: an INT64 value that specifies how many top feature attribution pairs are generated for each row of input data. The features are ranked by the absolute values of their attributions.

    By default, TOP_K_FEATURES is set to 5. If its value is greater than the number of features in the training data, the attributions of all features are returned.

  • THRESHOLD: a FLOAT64 value that specifies the cutoff between the two labels for binary classification models. Predictions above the threshold are positive predictions. Predictions below the threshold are negative predictions. Feature attributions are returned only for the predicted label.

    The THRESHOLD value must be between 0.0 and 1.0. The default value is 0.5.

  • INTEGRATED_GRADIENTS_NUM_STEPS: an INT64 value that specifies the number of steps to sample between the example being explained and its baseline. This value is used to approximate the integral in integrated gradients attribution methods. Increasing the value improves the precision of feature attributions, but can be slower and more computationally expensive.

    This option only applies to deep neural network (DNN) models, which use integrated gradients attribution methods. The default value is 15.

  • APPROX_FEATURE_CONTRIB: a BOOL value that indicates whether to use an approximate feature contribution method in the XGBoost model explanation. This option applies only to boosted tree and random forest models.

    This capability is provided by the XGBoost library; BigQuery ML only passes this option through to it. For more information, see Package 'xgboost' and search for approxcontrib.

    The default value is FALSE.
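Putting the arguments together, here is a minimal sketch of the table-input form of the call. The model name `mydataset.mymodel` and table name `mydataset.mytable` are placeholders; the table's columns are assumed to match the model's input columns.

```sql
-- Sketch only: model and table names are placeholders.
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `mydataset.mymodel`,
  TABLE `mydataset.mytable`,
  STRUCT(5 AS top_k_features))
```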

Output

ML.EXPLAIN_PREDICT returns the following columns in addition to any passthrough columns:

  • predicted_<label_column_name>: the predicted value of the label for regression models (a FLOAT64 value), or the predicted label class for classification models (a STRING value).
  • probability: a FLOAT64 value that contains the probability of the predicted label class. This column is only present for classification models.
  • top_feature_attributions: an ARRAY<STRUCT> value that contains the attributions of the top k features to the final prediction:
    • top_feature_attributions.feature: a STRING value that contains the feature name.
    • top_feature_attributions.attribution: a FLOAT64 value that contains the attribution of the feature to the final prediction.
  • baseline_prediction_value: a FLOAT64 value that contains one of the following:
    • For linear models, the baseline_prediction_value value is the intercept of the model.
    • For DNN models, the baseline_prediction_value value is the mean across all numerical features and NULL for other types of features.
    • For boosted tree and random forest models, the baseline_prediction_value value is equal to the bias term, which is the expected output of the model over the training dataset. See Tree SHAP documentation for more information.
  • prediction_value: the raw prediction value.
    • For regression models, this is a FLOAT64 value that contains the value of the column identified by predicted_<label_column_name>.
    • For classification models, this is a FLOAT64 value that contains the logit value (also called log-odds) for the predicted class. The predicted class probabilities are obtained by applying the softmax transformation to the logit values.
  • approximation_error: a FLOAT64 value that contains one of the following:

    • Exact attribution methods like Tree SHAP fully account for the contributions to the predicted value, satisfying the following:

      $$\texttt{baseline_prediction_value} + \sum{\texttt{feature_attributions}} = \texttt{prediction_value}$$

      Because these methods have no approximation error, this column value is 0. Exact attribution methods are used for linear, boosted tree, and random forest models.

    • Integrated gradients is an approximate attribution method whose approximation error is defined as follows:

      $$\frac{|\texttt{prediction_value} - \texttt{baseline_prediction_value} - \sum{\texttt{feature_attributions}}|}{|\texttt{prediction_value} - \texttt{baseline_prediction_value}|}$$

      For integrated gradients, this column value is greater than 0. The integrated gradients method is used with DNN models.
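The softmax transformation mentioned for prediction_value maps logit values to the class probabilities reported in the probability column. For class $i$ with logit $z_i$:

$$p_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

For binary classification with a single logit $z$, this reduces to the sigmoid function $p = 1 / (1 + e^{-z})$.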

Examples

The following examples assume that your model and input table are in your default project.

Explain a prediction generated by a linear regression model

The following example explains a prediction for a linear regression model by generating the top three attributions.

Assume a linear regression model stored in mydataset.mymodel was trained with the table mydataset.mytable with the following columns:

  • label
  • column1
  • column2
  • column3
  • column4
  • column5
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT label, column1, column2, column3, column4, column5
   FROM `mydataset.mytable`),
  STRUCT(3 AS top_k_features))
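To inspect each attribution pair on its own row rather than as an array, the top_feature_attributions column can be unnested. A sketch, using the same hypothetical model and table:

```sql
-- Sketch only: flattens each (feature, attribution) pair onto its own row.
SELECT
  predicted_label,
  tfa.feature,
  tfa.attribution
FROM ML.EXPLAIN_PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT label, column1, column2, column3, column4, column5
   FROM `mydataset.mytable`),
  STRUCT(3 AS top_k_features)),
  UNNEST(top_feature_attributions) AS tfa
```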

Explain a prediction generated by a boosted tree or a random forest binary classification model

The following example explains a prediction generated by a boosted tree or a random forest binary classification model. It generates the top three attributions with a custom threshold.

Assume a boosted tree or a random forest binary classification model stored in mydataset.mymodel is trained with the table mydataset.mytable with the following columns:

  • label
  • column1
  • column2
  • column3
  • column4
  • column5
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT label, column1, column2, column3, column4, column5
   FROM `mydataset.mytable`),
  STRUCT(3 AS top_k_features, 0.7 AS threshold))
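For large boosted tree or random forest models, the approximate XGBoost feature contribution method described under APPROX_FEATURE_CONTRIB can be enabled in the same call. A sketch using the same hypothetical model:

```sql
-- Sketch only: enables XGBoost's approximate contribution method.
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT label, column1, column2, column3, column4, column5
   FROM `mydataset.mytable`),
  STRUCT(3 AS top_k_features, TRUE AS approx_feature_contrib))
```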

Explain a prediction generated by a DNN classifier model

The following example explains a prediction generated by a DNN classifier model.

Assume a DNN classifier is stored in mydataset.mymodel and trained with the table mydataset.mytable with the following columns:

  • label
  • column1
  • column2
  • column3
  • column4
  • column5
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT label, column1, column2, column3, column4, column5
   FROM `mydataset.mytable`),
  STRUCT(3 AS top_k_features, 30 AS integrated_gradients_num_steps))


Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-24 UTC.