Tutorial: Perform evaluation using the console
Preview
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Learn how to get started with the Gen AI evaluation service using the Google Cloud console.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Make sure that you have the following role or roles on the project: Storage Admin
Check for the roles
In the Google Cloud console, go to the IAM page.
Go to IAM
- Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
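The same check can be scripted against an exported IAM policy. The following is a minimal sketch, not part of the console workflow: it assumes you have the project's policy as JSON (for example, from `gcloud projects get-iam-policy PROJECT_ID --format=json`) and that Storage Admin corresponds to the role ID `roles/storage.admin`. The helper name `missing_roles`, the sample policy, and the principals are hypothetical and for illustration only.

```python
import json

# Role ID for Storage Admin (assumed mapping of the display name used above).
REQUIRED_ROLES = {"roles/storage.admin"}


def missing_roles(policy, principals, required):
    """Return the required roles that none of the given principals hold.

    `policy` is an IAM policy dict with a "bindings" list, where each binding
    has a "role" and a list of "members". `principals` should contain your own
    identifier plus any groups you belong to.
    """
    granted = set()
    for binding in policy.get("bindings", []):
        # A binding counts if it names you or one of your groups.
        if principals & set(binding.get("members", [])):
            granted.add(binding["role"])
    return required - granted


# Hypothetical policy, shaped like an exported IAM policy.
sample_policy = json.loads("""
{
  "bindings": [
    {"role": "roles/storage.admin",
     "members": ["user:alice@example.com", "group:ml-team@example.com"]},
    {"role": "roles/viewer",
     "members": ["user:bob@example.com"]}
  ]
}
""")

me = {"user:bob@example.com"}
print(sorted(missing_roles(sample_policy, me, REQUIRED_ROLES)))
```

An empty result means every required role is already granted to you or to one of your groups. The sketch only inspects direct bindings on the exported policy, so roles inherited from a folder or organization would not appear in it.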
Grant the roles
In the Google Cloud console, go to the IAM page.
Go to IAM
- Select the project.
- Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select a role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save.
Evaluate your model
To evaluate your model:
In the Google Cloud console, go to the Gen AI Evaluation page.
Click New evaluation to open the evaluation page.
For Define evaluation dataset, select an option:
Upload file: Click Upload to upload a CSV or JSONL file. The dataset must contain either prompts or records to use in a prompt template, can optionally include model responses, and can have at most 200 rows.
Generate data: Enter a Prompt template to guide the Gen AI evaluation service in generating a dataset. The service generates values for the variables that you define in your prompt template and populates them in the dataset. For more information, see Use prompt templates.
Define variables to generate: Specify the variables to generate and a description of each variable to guide generation. If needed, click Add another variable description.
Enter a Number of samples to generate.
Click Generate and preview dataset to display a generated dataset based on your prompt template and variables. To adjust the dataset, add more details to the variable descriptions and click Re-generate.
Use model logs: Use a snapshot of prompts and responses from the logged traffic of the selected model. This option is available only if request-response logging is enabled on a deployed model in Vertex AI. If you just enabled logging, allow time for sufficient samples to accumulate.
Select the Model and the Region whose logged traffic you want to use. Logging must already be enabled for the selected model and region.
Enter a Sampling count.
(Optional) Enable Filter by prompt template to use only logs that match your Prompt template. This can be useful if you use your selected models for a variety of use cases and want to evaluate one specific use case.
For Define model responses to evaluate, select an option:
From dataset (only available if you selected Upload file for Define evaluation dataset): If you want to use one of the fields in the uploaded dataset as your response, select a Response column.
From model (only available if you selected Use model logs for Define evaluation dataset): If you're using model logs as the evaluation dataset, the Gen AI evaluation service uses the model responses from the model logs.
Call model: Select a model. The Gen AI evaluation service runs prompts on the selected model and uses the responses for evaluation.
(Optional) For Auto-generated evaluation metrics, you can Specify custom instructions to guide the rubrics generated from each prompt. For example, "Evaluate the dataset on cultural sensitivity to the countries {name}". For more information, see Define your evaluation metrics.
For Name and output directory, enter the following:
Evaluation name: Enter a name for your evaluation.
Output private data path: Enter the name of a Cloud Storage bucket where you want to store your evaluation, or click Browse to choose the bucket.
Click Evaluate.
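If you choose the Upload file option, you can prepare the file with a short script. The following is a minimal sketch under stated assumptions: it fills a prompt template once per record using Python's `str.format`, which happens to share the `{name}` placeholder style shown in the custom instructions example, and writes one JSON object per line. The field name `prompt`, the template text, the records, and the file name `eval_dataset.jsonl` are all hypothetical. The service's own template substitution and expected column names might differ, so check Use prompt templates before relying on this layout. The only constraints taken from this page are the JSONL format and the 200-row limit.

```python
import json

MAX_ROWS = 200  # row limit stated for uploaded datasets


def build_rows(template, records):
    """Fill the template once per record and keep the record fields in each row."""
    if len(records) > MAX_ROWS:
        raise ValueError(
            f"dataset has {len(records)} rows; the limit is {MAX_ROWS}"
        )
    # "prompt" is an assumed field name, not one confirmed by the service.
    return [dict(record, prompt=template.format(**record)) for record in records]


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


# Hypothetical template and records for illustration.
TEMPLATE = "Describe a respectful greeting for a visitor to {name}."
records = [{"name": "Japan"}, {"name": "Brazil"}]

rows = build_rows(TEMPLATE, records)

# Write the file that you would then select with Upload in the console.
with open("eval_dataset.jsonl", "w", encoding="utf-8") as f:
    f.write(to_jsonl(rows))
```

Checking the row count before writing surfaces an oversized dataset locally instead of at upload time. To evaluate responses you already have, you could add a response field to each record and select it as the Response column under From dataset.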
View your evaluation results
To view an evaluation result:
In the Google Cloud console, go to the Gen AI Evaluation page.
Click the evaluation name.
For each prompt in your evaluation dataset, the model's response displays along with the evaluation results.
Last updated 2025-12-15 UTC.