The ML.CONFUSION_MATRIX function

This document describes the ML.CONFUSION_MATRIX function, which you can use to return a confusion matrix for the input classification model and input data.

Syntax

ML.CONFUSION_MATRIX(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }]
  [, STRUCT(
      [THRESHOLD AS threshold]
      [, TRIAL_ID AS trial_id])])

Arguments

ML.CONFUSION_MATRIX takes the following arguments:

  • PROJECT_ID: the project that contains the resource.
  • DATASET: the dataset that contains the resource.
  • MODEL_NAME: the name of the model.
  • TABLE: the name of the input table that contains the evaluation data.

    If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named label in the training data is used.

    If you don't specify either TABLE or QUERY_STATEMENT, ML.CONFUSION_MATRIX computes the confusion matrix results as follows:

    • If the data is split during training, the split evaluation data is used to compute the confusion matrix results.
    • If the data is not split during training, the entire training input is used to compute the confusion matrix results.
  • QUERY_STATEMENT: a GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax of the QUERY_STATEMENT clause in GoogleSQL, see Query syntax.

    If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named label in the training data is used. Any extra columns are ignored.

    If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause must appear in QUERY_STATEMENT.

  • THRESHOLD: a FLOAT64 value that specifies a custom threshold for the binary-class classification model to use for evaluation. The default value is 0.5.

    A 0 value for precision or recall means that the selected threshold produced no true positive labels. A NaN value for precision means that the selected threshold produced no positive labels, neither true positives nor false positives.

    If both TABLE and QUERY_STATEMENT are unspecified, you can't use a threshold.

    You can't use THRESHOLD with multiclass classification models.

  • TRIAL_ID: an INT64 value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model.

Note: ML.CONFUSION_MATRIX requires input data with some models, and returns an error if it is absent. If this occurs, provide input data when using ML.CONFUSION_MATRIX with these models.
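The argument combinations described above can be sketched as three calls. This is an illustrative sketch only: mydataset.mymodel and mydataset.mytable are placeholder names, and the trial number 2 is a hypothetical value that is only meaningful if the model was created with hyperparameter tuning.

```sql
-- Placeholder names: `mydataset.mymodel` and `mydataset.mytable`.

-- 1. No input data: the function falls back to the evaluation split,
--    or to the whole training input if the data was not split.
--    Some models return an error in this form and need input data.
SELECT *
FROM ML.CONFUSION_MATRIX(MODEL `mydataset.mymodel`);

-- 2. Evaluation data supplied as a table.
SELECT *
FROM ML.CONFUSION_MATRIX(
  MODEL `mydataset.mymodel`,
  TABLE `mydataset.mytable`);

-- 3. Evaluation data supplied as a query, evaluating a specific
--    hyperparameter tuning trial (2 is a hypothetical trial ID).
SELECT *
FROM ML.CONFUSION_MATRIX(
  MODEL `mydataset.mymodel`,
  (SELECT * FROM `mydataset.mytable`),
  STRUCT(2 AS trial_id));
```

In the second and third forms, the table or query must include the label column used during training.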

Output

The output columns of the ML.CONFUSION_MATRIX function depend on the model. The first output column is always expected_label. There are N additional columns, one for each class in the trained model. The names of the additional columns depend on the class labels used to train the model.

If the training class labels all conform to BigQuery column naming rules, the labels are used as the column names. Columns that don't conform to naming rules are altered to conform to the column naming rules and to be unique. For example, if the labels are 0 and 1, the output column names are _0 and _1.

The columns are ordered based on the class labels in ascending order. If the labels in the evaluation data match those in the training data, the True Positives are shown on the diagonal from top left to bottom right. The expected (or actual) labels are listed one per row, and the predicted labels are listed one per column.

The values in the expected_label column are the exact values and type passed into ML.CONFUSION_MATRIX in the label column of the evaluation data. This is true even if they don't exactly match the values or type used during training.
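To make the layout concrete, the following self-contained query builds the same shape of result by hand from a small inline dataset. It is a sketch that mimics the described output, not how ML.CONFUSION_MATRIX is implemented; the evaluated rows and the labels bird, cat, and dog are made-up illustration data, and no model is involved.

```sql
-- Made-up (expected, predicted) label pairs; no model is used here.
WITH evaluated AS (
  SELECT *
  FROM UNNEST([
    STRUCT('cat' AS expected_label, 'cat' AS predicted_label),
    STRUCT('cat', 'dog'),
    STRUCT('dog', 'dog'),
    STRUCT('dog', 'dog'),
    STRUCT('bird', 'cat')
  ])
)
SELECT
  expected_label,                               -- one row per expected label
  COUNTIF(predicted_label = 'bird') AS bird,    -- one column per class,
  COUNTIF(predicted_label = 'cat') AS cat,      -- in ascending label order
  COUNTIF(predicted_label = 'dog') AS dog
FROM evaluated
GROUP BY expected_label
ORDER BY expected_label;
```

Because these labels already conform to the column naming rules, they are used unchanged as column names. Reading the result, the cat row has 1 in the cat column and 1 in the dog column, and the dog row has 2 in the dog column; the correctly classified counts sit on the top-left to bottom-right diagonal.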

Limitations

ML.CONFUSION_MATRIX doesn't support imported TensorFlow models.

Examples

The following examples demonstrate the use of the ML.CONFUSION_MATRIX function.

ML.CONFUSION_MATRIX with a query statement

The following example returns the confusion matrix for a logistic regression model named mydataset.mymodel in your default project:

SELECT
  *
FROM
  ML.CONFUSION_MATRIX(
    MODEL `mydataset.mymodel`,
    (SELECT * FROM `mydataset.mytable`))

ML.CONFUSION_MATRIX with a custom threshold

The following example returns the confusion matrix for a logistic regression model named mydataset.mymodel in your default project, using a custom threshold of 0.6 instead of the default 0.5:

SELECT
  *
FROM
  ML.CONFUSION_MATRIX(
    MODEL `mydataset.mymodel`,
    (SELECT * FROM `mydataset.mytable`),
    STRUCT(0.6 AS threshold))
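To see what a custom threshold changes, the following self-contained query applies a 0.6 cutoff by hand to made-up predicted probabilities for a binary model with labels 0 and 1. This is a sketch under stated assumptions: the scored rows and the prob column are hypothetical, and the comparison used here (positive when the probability is at or above the threshold) is an assumption, so the sample data avoids values exactly equal to the threshold.

```sql
-- Made-up data: expected label and predicted probability of label 1.
WITH scored AS (
  SELECT *
  FROM UNNEST([
    STRUCT(1 AS expected_label, 0.90 AS prob),
    STRUCT(1, 0.55),
    STRUCT(0, 0.58),
    STRUCT(0, 0.20)
  ])
)
SELECT
  expected_label,
  COUNTIF(prob < 0.6) AS _0,    -- predicted as label 0 at threshold 0.6
  COUNTIF(prob >= 0.6) AS _1    -- predicted as label 1 at threshold 0.6
FROM scored
GROUP BY expected_label
ORDER BY expected_label;
```

With the 0.6 cutoff, the row with probability 0.55 moves from the _1 column to the _0 column, and the row with probability 0.58 is also counted under _0, whereas a 0.5 cutoff would count both under _1. The column names _0 and _1 follow the renaming of numeric labels described in the Output section.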

Last updated 2025-12-15 UTC.