Model evaluation components

Model evaluation components take ground truth and batch prediction results as input and generate evaluation metrics. Ground truth data is "correctly labeled" data as determined by humans, and is usually provided as the target field in the test dataset you used to train the model. For more information, see Model evaluation in Vertex AI.

Vertex AI provides the model evaluation components listed in the following table.

Model type support

The following table shows supported model types for each model evaluation component:

| Model evaluation component | Supported model types |
|---|---|
| ModelEvaluationClassificationOp | AutoML tabular or image; Custom tabular |
| ModelEvaluationRegressionOp | AutoML tabular; Custom tabular |
| ModelEvaluationForecastingOp | AutoML tabular |

Remove the target field

For some model types, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset.

    Configure and format slices for your tabular dataset

    Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

A slice is a subset of tabular data with specified characteristics. Sliced metrics offer more granular evaluation metrics for AutoML tabular and custom training classification models.

For example, consider a pet adoption dataset where a model is trained to predict whether a pet will be adopted within a week. While it's useful to look at metrics for the whole dataset, you may be interested in metrics related to the pet's species and age. In other words, you're interested in the following subset of the dataset:

| Feature | Values |
|---|---|
| age (in years) | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
| species | "dog", "cat", "rabbit" |

This fine-grained analysis can be helpful in identifying where the model might be most error prone, or ensuring that the model performs well on critical subsets of the data.

    Configure slices

    To configure a slice, create a JSON file with the following configuration:

{
  "FEATURE": {
    "KIND": {
      CONFIG
    }
  }
}

    Where:

• FEATURE is a feature for which you want to create a slice. For example, age.

    • KIND is one of the following:

      • value

value creates a single slice on a single value of a feature. Specify CONFIG as a key-value pair where the key is float_value or string_value. For example, "float_value": 1.0.

The following example configuration creates a single slice containing data with the age of 1:

{
  "age": {
    "value": {
      "float_value": 1.0
    }
  }
}
      • range

range creates a single slice that includes data within a specified range. Specify CONFIG as two key-value pairs where the keys are low and high.

The following example configuration creates a single slice containing data where the age is any value between 1 and 3:

{
  "age": {
    "range": {
      "low": 1,
      "high": 3
    }
  }
}
      • all_values

all_values creates a separate slice for each possible label for the feature. Specify CONFIG as "value": true.

The following example configuration creates three slices, one each for "dog", "cat", and "rabbit":

{
  "species": {
    "all_values": {
      "value": true
    }
  }
}

    You can also use multiple features to create one or more slices.

The following configuration creates a single slice containing data where the age is between 1 and 3 and the species is "dog":

{
  "age": {
    "range": {
      "low": 1,
      "high": 3
    }
  },
  "species": {
    "value": {
      "string_value": "dog"
    }
  }
}

The following configuration creates multiple slices, one for each unique species in the dataset, where the age is 1.

{
  "species": {
    "all_values": {
      "value": true
    }
  },
  "age": {
    "value": {
      "float_value": 1.0
    }
  }
}

    The resulting slices contain the following data from the example dataset:

• Slice 1: age: 1 and species: "dog"

• Slice 2: age: 1 and species: "cat"

• Slice 3: age: 1 and species: "rabbit"
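The slice semantics above can be sketched in plain Python. This is a hypothetical helper for illustration only, not part of any Vertex AI SDK: it checks whether a row of tabular data falls inside a slice configuration.

```python
# Hypothetical helper illustrating slice semantics; not part of the Vertex AI SDK.
def matches_slice(row, config):
    """Return True if `row` (a dict of feature values) falls in the slice."""
    for feature, spec in config.items():
        if "value" in spec:
            target = spec["value"].get("float_value", spec["value"].get("string_value"))
            if row[feature] != target:
                return False
        elif "range" in spec:
            # Bounds assumed inclusive on both ends for this sketch.
            if not (spec["range"]["low"] <= row[feature] <= spec["range"]["high"]):
                return False
        elif "all_values" in spec:
            # all_values yields one slice per distinct value; any value matches here.
            continue
    return True

pets = [
    {"age": 1.0, "species": "dog"},
    {"age": 4.0, "species": "cat"},
    {"age": 2.0, "species": "dog"},
]
# Same shape as the combined age/species configuration above.
config = {"age": {"range": {"low": 1, "high": 3}}, "species": {"value": {"string_value": "dog"}}}
sliced = [p for p in pets if matches_slice(p, config)]
```

Only the two young dogs land in the slice; the four-year-old cat fails both conditions.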

    Format slices

To format slices for the ModelEvaluationClassificationOp component, do the following:

1. Create a slicing_spec. For example:

from google.cloud.aiplatform_v1.types import ModelEvaluationSlice

# SliceSpec and SliceConfig are message types nested on ModelEvaluationSlice.Slice.
SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
SliceConfig = SliceSpec.SliceConfig

slicing_spec = SliceSpec(
    configs={'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))}
)
    2. Create a list to store the slice configurations. For example:

      slicing_specs = []
3. Format each slicing_spec as JSON or a dictionary. For example:

from google.protobuf import json_format

slicing_spec_json = json_format.MessageToJson(slicing_spec)
slicing_spec_dict = json_format.MessageToDict(slicing_spec)
4. Combine each slicing_spec into a list. For example:

      slicing_specs.append(slicing_spec_json)
5. Pass slicing_specs as a parameter to the ModelEvaluationClassificationOp component. For example:

      ModelEvaluationClassificationOp(slicing_specs=slicing_specs)
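For reference, serializing the slicing_spec from step 1 with MessageToJson produces JSON along these lines (field names are rendered in camelCase by the protobuf JSON mapping; the exact output may differ):

```json
{
  "configs": {
    "feature_a": {
      "value": {
        "stringValue": "label_a"
      }
    }
  }
}
```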

    Format fairness metrics configuration slices

To evaluate your model for fairness, use the DetectDataBiasOp and DetectModelBiasOp components.

To format your slices for the DetectDataBiasOp and DetectModelBiasOp components, do the following:

1. Create a BiasConfig. For example:

from google.cloud.aiplatform_v1.types import ModelEvaluation, ModelEvaluationSlice

# BiasConfig is nested on ModelEvaluation; SliceSpec and SliceConfig on ModelEvaluationSlice.Slice.
BiasConfig = ModelEvaluation.BiasConfig
SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
SliceConfig = SliceSpec.SliceConfig

bias_config = BiasConfig(
    bias_slices=SliceSpec(
        configs={'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))}
    )
)
    2. Create a list to store the bias configurations. For example:

      bias_configs = []
3. Format each bias_config as JSON or a dictionary. For example:

from google.protobuf import json_format

bias_config_json = json_format.MessageToJson(bias_config)
bias_config_dict = json_format.MessageToDict(bias_config)
4. Combine each bias_config into a list. For example:

      bias_configs.append(bias_config_json)
5. Pass bias_configs as a parameter to the DetectDataBiasOp or DetectModelBiasOp component. For example:

DetectDataBiasOp(bias_configs=bias_configs)
DetectModelBiasOp(bias_configs=bias_configs)

    Required input parameters

The required input parameters for model evaluation components vary based on the type of model you are evaluating.

    AutoML models

When training AutoML models, Vertex AI uses default schemas. Some input parameters for the model evaluation components are only required if you used a non-default schema to train your model.

You can view and download schema files from the following Cloud Storage location:
    gs://google-cloud-aiplatform/schema/modelevaluation/

The following tabs show example batch prediction output files and required input parameters for each tabular model type:

    Classification

    The default schema includes the following fields:

    • scores: array
    • classes: array

The following is an example of a batch prediction output file with the target column of male:

{
  "instance": {
    "male": "1",
    "age": "49",
    "heartRate": "67",
    "education": "medium_high",
    "totChol": "282",
    "diaBP": "79",
    "BPMeds": "0",
    "currentSmoker": "current_high",
    "cigsPerDay": "25",
    "prevalentStroke": "stroke_0",
    "prevalentHyp": "hyp_0",
    "TenYearCHD": "FALSE",
    "diabetes": "0",
    "sysBP": "134",
    "BMI": "26.87",
    "glucose": "78"
  },
  "prediction": {
    "scores": [0.3666940927505493, 0.6333059072494507],
    "classes": ["0", "1"]
  }
}
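In this output, scores and classes are parallel arrays: the predicted label is the class at the index of the highest score. A minimal stdlib sketch, reusing the prediction object from the example above:

```python
# The `prediction` object from the example batch prediction output:
# `scores` and `classes` are parallel arrays, so the predicted label
# is the class whose score is largest.
prediction = {
    "scores": [0.3666940927505493, 0.6333059072494507],
    "classes": ["0", "1"],
}

best = max(range(len(prediction["scores"])), key=prediction["scores"].__getitem__)
predicted_class = prediction["classes"][best]
```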

This table describes the required parameters for the ModelEvaluationClassificationOp component with a tabular model:

| Evaluation component parameter | Required |
|---|---|
| target_field_name | Yes |
| prediction_label_column | Not required for default schema. Use prediction.classes for non-default schemas. |
| prediction_score_column | Not required for default schema. Use prediction.scores for non-default schemas. |
| evaluation_class_labels | Not required for default schema. Use ["0","1"] for non-default schemas. |

    Regression

    The default schema includes the following fields:

    • value: float
    • lower_bound: float
    • upper_bound: float

The following is an example of a batch prediction output file with the target column of age:

{
  "instance": {
    "BPMeds": "0",
    "diabetes": "0",
    "diaBP": "65",
    "cigsPerDay": "45",
    "prevalentHyp": "hyp_0",
    "age": "46",
    "male": "1",
    "sysBP": "112.5",
    "glucose": "78",
    "BMI": "27.48",
    "TenYearCHD": "FALSE",
    "totChol": "209",
    "education": "high",
    "prevalentStroke": "stroke_0",
    "heartRate": "75",
    "currentSmoker": "current_high"
  },
  "prediction": {
    "value": 44.96103286743164,
    "lower_bound": 44.61349868774414,
    "upper_bound": 44.590206146240234
  }
}

This table describes the required parameters for the ModelEvaluationRegressionOp component with a tabular model:

| Evaluation component parameter | Required |
|---|---|
| target_field_name | Yes |
| prediction_label_column | Not required for regression models. |
| prediction_score_column | Not required for default schema. Use prediction.value for non-default schemas. |
| evaluation_class_labels | Not required for regression models. |
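For regression, evaluation amounts to comparing prediction.value against the target column. As a rough illustration of the kind of metric the component produces, here is a stdlib sketch computing mean absolute error and RMSE over a few hypothetical (prediction, ground truth) pairs; the numbers are invented for the example:

```python
import math

# Hypothetical (prediction.value, ground-truth target) pairs.
pairs = [(44.96, 46.0), (51.5, 50.0), (60.25, 60.0)]

# Mean absolute error: average of |prediction - truth|.
mae = sum(abs(pred - truth) for pred, truth in pairs) / len(pairs)

# Root mean squared error penalizes large misses more heavily.
rmse = math.sqrt(sum((pred - truth) ** 2 for pred, truth in pairs) / len(pairs))
```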

    Forecasting

For tabular forecasting models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset.

    The default schema includes the following fields:

    • id: string
    • displayNames: string
    • type: enum
    • timeSegmentStart: string
    • timeSegmentEnd: string
    • confidence: float

The following is an example CSV file with sale_dollars as the target column:

date,store_name,city,zip_code,county,sale_dollars
2020-03-17,Thriftway,,,,774.08999999999992
2020-03-10,Thriftway,,,,1160.67
2020-03-03,Thriftway,,,,2247.24
2020-06-08,New Star / Fort Dodge,,,,753.98
2020-06-01,New Star / Fort Dodge,,,,967.73
2020-01-10,Casey's General Store #1280 / Fort Dodge,,,,1040.92
2020-10-30,KUM & GO #76 / ADAIR,Adair,50002,ADAIR,1387.02

The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:

date,store_name,city,zip_code,county
2020-03-17,Thriftway,nan,nan,nan
2020-03-10,Thriftway,nan,nan,nan
2020-03-03,Thriftway,nan,nan,nan
2020-06-08,New Star / Fort Dodge,nan,nan,nan
2020-06-01,New Star / Fort Dodge,nan,nan,nan
2020-01-10,Casey's General Store #1280 / Fort Dodge,nan,nan,nan
2020-10-30,KUM & GO #76 / ADAIR,Adair,50002.0,ADAIR
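What the TargetFieldDataRemoverOp component does to the CSV can be approximated locally with the stdlib csv module. This is a sketch of the idea, not the component itself, using a shortened version of the dataset above:

```python
import csv
import io

# Local sketch of target-column removal (not the TargetFieldDataRemoverOp
# component itself): drop `sale_dollars` from each row of a small CSV.
source = """date,store_name,sale_dollars
2020-03-17,Thriftway,774.09
2020-03-10,Thriftway,1160.67
"""

target_field = "sale_dollars"
rows = list(csv.DictReader(io.StringIO(source)))

out = io.StringIO()
fieldnames = [f for f in rows[0] if f != target_field]
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
for row in rows:
    row.pop(target_field)  # remove the ground-truth column
    writer.writerow(row)

cleaned = out.getvalue()
```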

The following is an example of a batch prediction output file without the target column of sale_dollars:

{
  "instance": {
    "content": "gs://kbn-us-central1-test/datasets/text/happy_11556.txt",
    "mimeType": "text/plain"
  },
  "prediction": {
    "ids": ["7903916851837534208", "3292230833410146304", "986387824196452352", "2139309328803299328", "5598073842623840256", "6750995347230687232", "4559431178561519616"],
    "displayNames": ["affection", "bonding", "achievement", "exercise", "nature", "enjoy_the_moment", "leisure"],
    "confidences": [0.99483216, 0.005162797, 4.1117933E-6, 3.9997E-7, 2.4624453E-7, 1.9969502E-7, 1.16997434E-7]
  }
}

This table describes the required parameters for the ModelEvaluationForecastingOp component with a tabular model:

| Evaluation component parameter | Required |
|---|---|
| target_field_name | Yes |
| prediction_label_column | Not required for forecasting models. |
| prediction_score_column | Not required for default schema. Use "prediction.value" for non-default schemas. |
| evaluation_class_labels | Not required for forecasting models. |
| ground_truth_gcs_source | Yes: the Cloud Storage link to the original dataset with the target column. |
| ground_truth_format | Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery". |

    Custom-trained models

For custom-trained models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.

The following example shows a batch prediction output file and required input parameters for a custom-trained tabular classification model:

    Classification

    The typical schema includes the following fields:

• instance: array of input data in the same order as the model training dataset columns.
• prediction: array of prediction scores, one per class.

The following is an example CSV dataset with species as the target column:

petal_length,petal_width,sepal_length,sepal_width,species
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0

The TargetFieldDataRemoverOp component removes the target column before sending the file to the batch prediction component:

petal_length,petal_width,sepal_length,sepal_width
6.4,2.8,5.6,2.2
5.0,2.3,3.3,1.0
4.9,2.5,4.5,1.7
4.9,3.1,1.5,0.1

The following is an example of a batch prediction output file without the target column of species:

{"instance": [5.6, 2.5, 3.9, 1.1], "prediction": [0.0005816521588712931, 0.9989032745361328, 0.0005150380893610418]}
{"instance": [5.0, 3.6, 1.4, 0.2], "prediction": [0.999255359172821, 0.000527293945197016, 0.00021737271163146943]}
{"instance": [6.0, 2.9, 4.5, 1.5], "prediction": [0.00025063654175028205, 0.9994204044342041, 0.00032893591560423374]}

This table describes the required parameters for the ModelEvaluationClassificationOp component with a custom-trained model:

| Evaluation component parameter | Required |
|---|---|
| target_field_name | Yes |
| prediction_label_column | Must be empty; this column doesn't exist in custom-trained models. |
| prediction_score_column | Not required for default schema; is set automatically for JSONL, CSV, or BIGQUERY formats. |
| evaluation_class_labels | Yes. Must be the same length as the output prediction array. |
| ground_truth_gcs_source | Yes: the Cloud Storage link to the original dataset with the target column. |
| ground_truth_format | Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery". |
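Because evaluation_class_labels must line up index-for-index with each prediction array, pairing the two recovers a label per row. A stdlib sketch using the prediction lines from the example output above (the ground-truth labels here are hypothetical, standing in for the removed species column):

```python
import json

# Prediction lines from the example batch prediction output (JSONL), with
# scores truncated for brevity.
jsonl = """\
{"instance": [5.6, 2.5, 3.9, 1.1], "prediction": [0.0005816, 0.9989032, 0.0005150]}
{"instance": [5.0, 3.6, 1.4, 0.2], "prediction": [0.9992553, 0.0005272, 0.0002173]}
{"instance": [6.0, 2.9, 4.5, 1.5], "prediction": [0.0002506, 0.9994204, 0.0003289]}
"""

# Must be the same length as each prediction array.
evaluation_class_labels = ["0", "1", "2"]
# Hypothetical ground truth (the species column removed before batch prediction).
ground_truth = ["1", "0", "1"]

predicted = []
for line in jsonl.splitlines():
    scores = json.loads(line)["prediction"]
    predicted.append(evaluation_class_labels[scores.index(max(scores))])

accuracy = sum(p == t for p, t in zip(predicted, ground_truth)) / len(ground_truth)
```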

    Model evaluations without Vertex AI-generated batch predictions

You can use a model evaluation pipeline component with a batch prediction that you didn't generate in Vertex AI. However, note that the input for the model evaluation pipeline component must be a batch prediction directory containing files that match the following prefixes:

    • prediction.results-

    • explanation.results-

    • predictions_
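A quick local check that a prediction directory's file names match these prefixes might look like the following (a hypothetical helper, with example file names):

```python
# Hypothetical helper: check which file names in a batch prediction
# directory match the prefixes the evaluation component expects.
EXPECTED_PREFIXES = ("prediction.results-", "explanation.results-", "predictions_")

def is_evaluation_input(filename):
    # str.startswith accepts a tuple of prefixes.
    return filename.startswith(EXPECTED_PREFIXES)

files = ["prediction.results-00000-of-00002", "predictions_0001.jsonl", "log.txt"]
usable = [f for f in files if is_evaluation_input(f)]
```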

    API reference

    Version history and release notes

To learn more about the version history and changes to the Google Cloud Pipeline Components SDK, see the Google Cloud Pipeline Components SDK Release Notes.


    Last updated 2025-12-17 UTC.