The ML.ROC_CURVE function

This document describes the ML.ROC_CURVE function, which you can use to evaluate metrics that are specific to binary classification models.

Syntax

ML.ROC_CURVE(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`
  [, { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) }]
  [, GENERATE_ARRAY(THRESHOLDS)]
  [, STRUCT(TRIAL_ID AS trial_id)])

Arguments

ML.ROC_CURVE takes the following arguments:

  • PROJECT_ID: the project that contains the resource.
  • DATASET: the dataset that contains the resource.
  • MODEL_NAME: the name of the model.
  • TABLE: the name of the input table that contains the evaluation data.

    If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name that's provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column that's named label in the training data is used.

    If you don't specify either TABLE or QUERY_STATEMENT, ML.ROC_CURVE computes the curve results as follows:

    • If the data is split during training, the split evaluation data is used to compute the curve results.
    • If the data is not split during training, the entire training input is used to compute the curve results.
  • QUERY_STATEMENT: a GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax of the QUERY_STATEMENT clause in GoogleSQL, see Query syntax.

    If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named label in the training data is used. The extra columns are ignored.

    If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, then only the input columns present in the TRANSFORM clause must appear in QUERY_STATEMENT.

  • THRESHOLDS: an ARRAY<FLOAT64> value that specifies the percentile values of the prediction output, supplied by the GENERATE_ARRAY function.

  • TRIAL_ID: an INT64 value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model.

Output

ML.ROC_CURVE returns multiple rows with metrics for different threshold values for the model. The metrics include the following:

  • threshold: a FLOAT64 value that contains the custom threshold for the binary classification model.
  • recall: a FLOAT64 value that indicates the proportion of actual positive cases that were correctly predicted by the model.
  • false_positive_rate: a FLOAT64 value that indicates the proportion of actual negative cases that were incorrectly predicted as positive by the model.
  • true_positives: an INT64 value that contains the number of cases correctly predicted as positive by the model.
  • false_positives: an INT64 value that contains the number of cases incorrectly predicted as positive by the model.
  • true_negatives: an INT64 value that contains the number of cases correctly predicted as negative by the model.
  • false_negatives: an INT64 value that contains the number of cases incorrectly predicted as negative by the model.
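To illustrate how these columns relate to each other, the following Python sketch computes one ML.ROC_CURVE-style row from a list of true labels and predicted probabilities. The labels, scores, and threshold below are hypothetical; BigQuery ML computes these counts server-side from your evaluation data.

```python
def roc_row(labels, scores, threshold):
    """Compute one ML.ROC_CURVE-style output row for a single threshold.

    labels: true binary labels (1 = positive class).
    scores: the model's predicted probabilities for the positive class.
    A prediction counts as positive when its score is at or above the threshold.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return {
        "threshold": threshold,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "true_positives": tp,
        "false_positives": fp,
        "true_negatives": tn,
        "false_negatives": fn,
    }

# Hypothetical labels and scores, evaluated at a threshold of 0.5.
row = roc_row([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.6, 0.4, 0.3, 0.1], 0.5)
```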

Examples

The following examples assume your model and input table are in your default project.

Evaluate the ROC curve of a binary class logistic regression model

The following query returns all of the output columns for ML.ROC_CURVE. You can graph the recall and false_positive_rate values for an ROC curve. The threshold values returned are chosen based on the percentile values of the prediction output.

SELECT *
FROM
  ML.ROC_CURVE(
    MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)
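If you want a single summary number from the graphed curve, you can integrate the returned (false_positive_rate, recall) points with the trapezoidal rule after fetching the query results in a client. This is a minimal sketch; the points below are hypothetical, not real query output.

```python
def auc_from_points(points):
    """Trapezoidal area under an ROC curve given a list of
    (false_positive_rate, recall) points. Sorts by FPR first so the
    segments are integrated left to right."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A perfect right-angle ROC curve encloses the full unit square.
auc = auc_from_points([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```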

Evaluate an ROC curve with custom thresholds

The following query returns all of the output columns for ML.ROC_CURVE. The threshold values returned are chosen based on the output of the GENERATE_ARRAY function.

SELECT *
FROM
  ML.ROC_CURVE(
    MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`,
    GENERATE_ARRAY(0.4, 0.6, 0.01))
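To see which thresholds that call produces, here is a small Python approximation of GENERATE_ARRAY for FLOAT64 inputs: values from start up to and including stop, stepping by step. This is a sketch for illustration; BigQuery performs the step arithmetic internally.

```python
def generate_array(start, stop, step):
    """Approximate BigQuery's GENERATE_ARRAY for FLOAT64 inputs."""
    values = []
    v = start
    # A small tolerance guards against float drift so `stop` is included.
    while v <= stop + 1e-12:
        values.append(round(v, 10))
        v += step
    return values

# The thresholds the query above asks for: 0.4, 0.41, ..., 0.6.
thresholds = generate_array(0.4, 0.6, 0.01)
```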

Evaluate the precision-recall curve

Instead of getting an ROC curve (the recall versus false positive rate), the following query calculates a precision-recall curve by using the precision from the true and false positive counts:

SELECT
  recall,
  true_positives / (true_positives + false_positives) AS precision
FROM
  ML.ROC_CURVE(
    MODEL `mydataset.mymodel`,
    TABLE `mydataset.mytable`)
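The same precision arithmetic can be checked outside of SQL. This Python helper mirrors the true_positives / (true_positives + false_positives) expression from the query; the counts below are made up for illustration.

```python
def precision(true_positives, false_positives):
    """Precision as computed in the query above: TP / (TP + FP).

    Returns 0.0 when there are no predicted positives, where the SQL
    expression would divide by zero.
    """
    total = true_positives + false_positives
    return true_positives / total if total else 0.0

# Hypothetical counts: 8 true positives, 2 false positives.
p = precision(8, 2)
```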


Last updated 2025-12-15 UTC.