Details for managed rubric-based metrics

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

This page provides a full list of the managed rubric-based metrics offered by the Gen AI evaluation service, which you can use through the GenAI Client in the Vertex AI SDK.

For more information about test-driven evaluation, see Define your evaluation metrics.

Overview

The Gen AI evaluation service offers a list of managed rubric-based metrics for the test-driven evaluation framework:

  • Most metrics with adaptive rubrics include both per-prompt rubric generation and rubric validation in a single workflow. You can run the two steps separately if needed. See Run an evaluation for details.

  • Metrics with static rubrics don't generate per-prompt rubrics. For details about their intended outputs, see Metric details.

Each managed rubric-based metric has a version number. By default, the metric uses the latest version, but you can pin it to a specific version if needed:

from vertexai import types

text_quality_metric = types.RubricMetric.TEXT_QUALITY
general_quality_v1 = types.RubricMetric.GENERAL_QUALITY(version='v1')

Backward compatibility

For metrics offered as Metric prompt templates, you can still access the pointwise metrics through the GenAI Client in the Vertex AI SDK using the same approach. Pairwise metrics aren't supported by the GenAI Client in the Vertex AI SDK; see Run an evaluation to compare two models in the same evaluation.

from vertexai import types

# Access metrics represented by metric prompt template examples
coherence = types.RubricMetric.COHERENCE
fluency = types.RubricMetric.FLUENCY

Managed metrics details

This section lists managed metrics with details such as their type, required inputs, and expected output:

General quality

Latest version: general_quality_v1
Type: Adaptive rubrics
Description: A comprehensive adaptive rubrics metric that evaluates the overall quality of a model's response. It automatically generates and assesses a broad range of criteria based on the prompt's content. This is the recommended starting point for most evaluations.
How to access in SDK: types.RubricMetric.GENERAL_QUALITY
Input:
  • prompt
  • response
  • (Optional) rubric_groups
If you have rubrics already generated, you can provide them directly for evaluation.
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 6 calls to Gemini 2.5 Flash
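
To show how these inputs are supplied, the following is a minimal sketch of running this metric through the GenAI Client in the Vertex AI SDK. It assumes the evals.evaluate entry point and a pandas DataFrame dataset whose column names match the input fields above, as described in Run an evaluation; the project ID and location are placeholders.

import pandas as pd

from vertexai import Client, types

# Placeholder project and location; replace with your own values.
client = Client(project="your-project-id", location="us-central1")

# Each row supplies the inputs GENERAL_QUALITY requires: a prompt and a response.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Explain the difference between a list and a tuple in Python."],
        "response": ["Lists are mutable sequences; tuples are immutable sequences."],
    }
)

# Rubrics are generated per prompt, each rubric is validated against the
# response, and the passing rate is reported as the score.
eval_result = client.evals.evaluate(
    dataset=eval_dataset,
    metrics=[types.RubricMetric.GENERAL_QUALITY],
)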

Text quality

Latest version: text_quality_v1
Type: Adaptive rubrics
Description: A targeted adaptive rubrics metric that specifically evaluates the linguistic quality of the response. It assesses aspects like fluency, coherence, and grammar.
How to access in SDK: types.RubricMetric.TEXT_QUALITY
Input:
  • prompt
  • response
  • (Optional) rubric_groups
If you have rubrics already generated, you can provide them directly for evaluation.
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 6 calls to Gemini 2.5 Flash

Instruction following

Latest version: instruction_following_v1
Type: Adaptive rubrics
Description: A targeted adaptive rubrics metric that measures how well the response adheres to the specific constraints and instructions given in the prompt.
How to access in SDK: types.RubricMetric.INSTRUCTION_FOLLOWING
Input:
  • prompt
  • response
  • (Optional) rubric_groups
If you have rubrics already generated, you can provide them directly for evaluation.
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 6 calls to Gemini 2.5 Flash

Grounding

Latest version: grounding_v1
Type: Static rubrics
Description: A score-based metric that checks for factuality and consistency. It verifies that the model's response is grounded in the provided context.
How to access in SDK: types.RubricMetric.GROUNDING
Input:
  • prompt
  • response
  • context
Output:
  • score
  • explanation
The score has a range of 0-1 and represents the rate of claims labeled as supported or no_rad (not requiring factual attribution, such as greetings, questions, or disclaimers) relative to the input prompt.
The explanation contains groupings of sentence, label, reasoning, and excerpt from the context.
Number of LLM calls: 1 call to Gemini 2.5 Flash
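
As a sketch of the extra context input, a dataset row for this metric could look like the following; the column names mirror the input fields above, and the example text is illustrative only.

import pandas as pd

from vertexai import types

# Grounding needs a context column in addition to prompt and response; claims
# in the response are checked against this context.
grounding_dataset = pd.DataFrame(
    {
        "prompt": ["When was the Eiffel Tower completed?"],
        "response": ["The Eiffel Tower was completed in 1889."],
        "context": [
            "The Eiffel Tower is a wrought-iron lattice tower in Paris that was "
            "completed in 1889 as the entrance arch to the World's Fair."
        ],
    }
)

# Evaluated with metrics=[types.RubricMetric.GROUNDING], using the same
# client.evals.evaluate(...) call assumed in the earlier sketch.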

Safety

Latest version: safety_v1
Type: Static rubrics
Description: A score-based metric that assesses whether the model's response violates one or more of the following policies:
  • PII & Demographic Data
  • Hate Speech
  • Dangerous Content
  • Harassment
  • Sexually Explicit
How to access in SDK: types.RubricMetric.SAFETY
Input:
  • prompt
  • response
Output:
  • score
  • explanation
For the score, 0 is unsafe and 1 is safe.
The explanation field includes the violated policies.
Number of LLM calls: 10 calls to Gemini 2.5 Flash

Multi-turn general quality

Latest version: multi_turn_general_quality_v1
Type: Adaptive rubrics
Description: An adaptive rubrics metric that evaluates the overall quality of a model's response within the context of a multi-turn dialogue.
How to access in SDK: types.RubricMetric.MULTI_TURN_GENERAL_QUALITY
Input:
  • prompt with multi-turn conversations
  • response
  • (Optional) rubric_groups
If you have rubrics already generated, you can provide them directly for evaluation.
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 6 calls to Gemini 2.5 Flash

Multi-turn text quality

Latest version: multi_turn_text_quality_v1
Type: Adaptive rubrics
Description: An adaptive rubrics metric that evaluates the text quality of a model's response within the context of a multi-turn dialogue.
How to access in SDK: types.RubricMetric.MULTI_TURN_TEXT_QUALITY
Input:
  • prompt with multi-turn conversations
  • response
  • (Optional) rubric_groups
If you have rubrics already generated, you can provide them directly for evaluation.
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 6 calls to Gemini 2.5 Flash

Agent final response match

Latest version: final_response_match_v2
Type: Static rubrics
Description: A metric that evaluates the quality of an AI agent's final answer by comparing it to a provided reference answer (ground truth).
How to access in SDK: types.RubricMetric.FINAL_RESPONSE_MATCH
Input:
  • prompt
  • response
  • reference
Output:
  • score: 1 for a valid response that matches the reference, 0 for an invalid response that doesn't match the reference
  • explanation
Number of LLM calls: 5 calls to Gemini 2.5 Flash
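
As a sketch of this reference-based setup, a dataset row could be assembled as follows; the column names mirror the input fields above and the values are illustrative.

import pandas as pd

from vertexai import types

# The reference column holds the ground-truth answer the agent's final
# response is compared against.
final_response_dataset = pd.DataFrame(
    {
        "prompt": ["What is the capital of France?"],
        "response": ["The capital of France is Paris."],
        "reference": ["Paris"],
    }
)

# Evaluated with metrics=[types.RubricMetric.FINAL_RESPONSE_MATCH], using the
# same client.evals.evaluate(...) call assumed in the earlier sketch.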

Agent final response reference free

Latest version: final_response_reference_free_v1
Type: Adaptive rubrics
Description: An adaptive rubrics metric that evaluates the quality of an AI agent's final answer without needing a reference answer. You need to provide rubrics for this metric, as it doesn't support auto-generated rubrics.
How to access in SDK: types.RubricMetric.FINAL_RESPONSE_REFERENCE_FREE
Input:
  • prompt
  • response
  • rubric_groups
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 5 calls to Gemini 2.5 Flash

Agent final response quality

Latest version: final_response_quality_v1
Type: Adaptive rubrics
Description: A comprehensive adaptive rubrics metric that evaluates the overall quality of an agent's response. It automatically generates a broad range of criteria based on the agent configuration (developer instruction and declarations for tools available to the agent) and the user's prompt, then assesses the generated criteria based on tool usage in intermediate events and the agent's final answer.
How to access in SDK: types.RubricMetric.FINAL_RESPONSE_QUALITY
Input:
  • prompt
  • response
  • developer_instruction
  • tool_declarations (can be an empty list)
  • intermediate_events (containing function calls and responses; can be an empty list)
  • (Optional) rubric_groups (if you have rubrics already generated, you can provide them directly for evaluation)
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 5 calls to Gemini 2.5 Flash and 1 call to Gemini 2.5 Pro

Agent hallucination

Latest version: hallucination_v1
Type: Static rubrics
Description: A score-based metric that checks the factuality and consistency of text responses by segmenting the response into atomic claims. It verifies whether each claim is grounded based on tool usage in the intermediate events. It can also be used to evaluate any intermediate text responses by setting the flag evaluate_intermediate_nl_responses to true.
How to access in SDK: types.RubricMetric.HALLUCINATION
Input:
  • response
  • developer_instruction
  • tool_declarations (can be an empty list)
  • intermediate_events (containing function calls and responses; can be an empty list)
  • evaluate_intermediate_nl_responses (default is false)
Output:
  • score
  • explanation and corresponding verdicts
The score has a range of 0-1 and represents the rate of claims labeled as supported or no_rad (not requiring factual attribution, such as greetings, questions, or disclaimers) relative to the input prompt. The explanation contains a structured breakdown of claim, label, reasoning, and supporting excerpts from the context.
Number of LLM calls: 2 calls to Gemini 2.5 Flash

Agent tool usage quality

Latest version: tool_use_quality_v1
Type: Adaptive rubrics
Description: A targeted adaptive rubrics metric that evaluates the selection of appropriate tools, correct parameter usage, and adherence to the specified sequence of operations.
How to access in SDK: types.RubricMetric.TOOL_USE_QUALITY
Input:
  • prompt
  • developer_instruction
  • tool_declarations (can be an empty list)
  • intermediate_events (containing function calls and responses; can be an empty list)
  • (Optional) rubric_groups (if you have rubrics already generated, you can provide them directly for evaluation)
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 5 calls to Gemini 2.5 Flash and 1 call to Gemini 2.5 Pro

Gecko text-to-image quality

Latest version: gecko_text2image_v1
Type: Adaptive rubrics
Description: The Gecko text-to-image metric is an adaptive, rubric-based method for evaluating the quality of a generated image against its corresponding text prompt. It works by first generating a set of questions from the prompt, which serve as a detailed, prompt-specific rubric. A model then answers these questions based on the generated image.
How to access in SDK: types.RubricMetric.GECKO_TEXT2IMAGE
Input:
  • prompt
  • response (must be file data with an image MIME type)
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 2 calls to Gemini 2.5 Flash
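
Because the response for this metric is image file data rather than text, one way to reference a generated image is sketched below. This assumes the SDK's types.Part.from_uri helper is accepted for the response field here, which is an assumption rather than a documented requirement; the Cloud Storage URI is a placeholder.

from vertexai import types

# Hypothetical example: reference the generated image as file data with an
# image MIME type. The gs:// URI is a placeholder.
image_response = types.Part.from_uri(
    file_uri="gs://your-bucket/generated-image.png",
    mime_type="image/png",
)

# Evaluated with metrics=[types.RubricMetric.GECKO_TEXT2IMAGE], with the prompt
# column holding the original text-to-image prompt.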

Gecko text-to-video quality

Latest version: gecko_text2video_v1
Type: Adaptive rubrics
Description: The Gecko text-to-video metric is an adaptive, rubric-based method for evaluating the quality of a generated video against its corresponding text prompt. It works by first generating a set of questions from the prompt, which serve as a detailed, prompt-specific rubric. A model then answers these questions based on the generated video.
How to access in SDK: types.RubricMetric.GECKO_TEXT2VIDEO
Input:
  • prompt
  • response (must be file data with a video MIME type)
Output:
  • score
  • rubrics and corresponding verdicts
The score represents the passing rate of the response based on the rubrics.
Number of LLM calls: 2 calls to Gemini 2.5 Flash

What's next
