Evaluate generative AI agents using the GenAI Client in Vertex AI SDK

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

After you build and evaluate your generative AI model, you might use the model to build an agent such as a chatbot. The Gen AI evaluation service lets you measure your agent's ability to complete tasks and goals for your use case.

This page shows you how to create and deploy a basic agent and use the Gen AI evaluation service to evaluate the agent.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

    In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

    Verify that billing is enabled for your Google Cloud project.

  2. Install the Vertex AI SDK for Python:

    %pip install google-cloud-aiplatform[adk,agent_engines]
    %pip install --upgrade --force-reinstall -q google-cloud-aiplatform[evaluation]

  3. Set up your credentials. If you are running this tutorial in Colaboratory, run the following:

    from google.colab import auth
    auth.authenticate_user()

    For other environments, refer to Authenticate to Vertex AI.

  4. Initialize the GenAI Client in Vertex AI SDK:

    import vertexai
    from vertexai import Client
    from google.genai import types as genai_types

    GCS_DEST = "gs://BUCKET_NAME/output-path"

    vertexai.init(
        project=PROJECT_ID,
        location=LOCATION,
    )

    client = Client(
        project=PROJECT_ID,
        location=LOCATION,
        http_options=genai_types.HttpOptions(api_version="v1beta1"),
    )

    Replace the following:

    • BUCKET_NAME: Cloud Storage bucket name. See Create a bucket to learn more about creating buckets.

    • PROJECT_ID: Your project ID.

    • LOCATION: Your selected region.
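The placeholders above can also be supplied from the environment instead of being edited inline. The following is a minimal sketch under stated assumptions: the helper `load_settings`, the `Settings` tuple, and the variable names `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`, and `EVAL_BUCKET_NAME` are hypothetical and chosen for illustration; none of them is required by the SDK.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    project_id: str
    location: str
    gcs_dest: str


# Hypothetical variable names; adjust them to your own conventions.
REQUIRED_KEYS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "EVAL_BUCKET_NAME")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the project, location, and bucket from an environment-style mapping."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    # Accept either "my-bucket" or "gs://my-bucket/" and normalize it.
    bucket = env["EVAL_BUCKET_NAME"]
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")

    return Settings(
        project_id=env["GOOGLE_CLOUD_PROJECT"],
        location=env["GOOGLE_CLOUD_LOCATION"],
        # Same layout as GCS_DEST in the initialization step.
        gcs_dest=f"gs://{bucket}/output-path",
    )


# Example with an explicit mapping so that the sketch runs anywhere.
example = load_settings({
    "GOOGLE_CLOUD_PROJECT": "my-project",
    "GOOGLE_CLOUD_LOCATION": "us-central1",
    "EVAL_BUCKET_NAME": "my-bucket",
})
print(example.gcs_dest)
```

With a helper like this, `example.project_id` and `example.location` would take the place of PROJECT_ID and LOCATION in the initialization calls, and `example.gcs_dest` would take the place of GCS_DEST.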

Develop an agent

Develop an Agent Development Kit (ADK) agent by defining the model, instruction, and set of tools. For more information on developing an agent, see Develop an Agent Development Kit agent.

from google.adk import Agent


# Define Agent Tools
def search_products(query: str):
    """Searches for products based on a query."""
    # Mock response for demonstration
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    else:
        return {"products": []}


def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    else:
        return {"error": "Product not found."}


def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}


# Define Agent
my_agent = Agent(
    model="gemini-2.5-flash",
    name='ecommerce_agent',
    instruction='You are an ecommerce expert',
    tools=[search_products, get_product_details, add_to_cart],
)
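Because the tools are plain Python functions, they can be exercised locally before the agent is deployed. The sketch below repeats the three mock tools so that it is self-contained and calls them directly, without the ADK or any Google Cloud service. The helper `run_local_checks` is a hypothetical name introduced for illustration.

```python
def search_products(query: str):
    """Searches for products based on a query (mock, as in the agent definition)."""
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    return {"products": []}


def get_product_details(product_id: str):
    """Gets the details for a given product ID (mock)."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    return {"error": "Product not found."}


def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart (mock)."""
    return {"status": f"Added {quantity} of {product_id} to cart."}


def run_local_checks():
    """Call each tool once and return the raw results, keyed by scenario."""
    hit = search_products("noise-cancelling headphones")
    product_id = hit["products"][0]["id"]
    return {
        "search_hit": hit,
        "search_miss": search_products("wireless earbuds"),
        "details": get_product_details(product_id),
        "unknown_product": get_product_details("UNKNOWN"),
        "cart": add_to_cart(product_id, 1),
    }


for scenario, result in run_local_checks().items():
    print(scenario, result)
```

The mock search only matches queries that contain "headphones", so a query such as "wireless earbuds" returns an empty product list. Keeping this in mind helps when reading the agent's tool calls later.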

Deploy agent

Deploy your agent to Vertex AI Agent Engine Runtime. This can take up to 10 minutes. Retrieve the resource name from the deployed agent.

def deploy_adk_agent(root_agent):
    """Deploy agent to agent engine.

    Args:
        root_agent: The ADK agent to deploy.
    """
    app = vertexai.agent_engines.AdkApp(
        agent=root_agent,
    )
    remote_app = client.agent_engines.create(
        agent=app,
        config={
            # The staging bucket must be passed as a string.
            "staging_bucket": "gs://BUCKET_NAME",
            "requirements": ['google-cloud-aiplatform[adk,agent_engines]'],
            "env_vars": {"GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "true"},
        },
    )
    return remote_app


agent_engine = deploy_adk_agent(my_agent)
agent_engine_resource_name = agent_engine.api_resource.name
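The resource name retrieved above is reused in later steps, so it can be worth checking its shape before passing it on. The sketch below assumes that Agent Engine resource names follow the pattern `projects/PROJECT/locations/LOCATION/reasoningEngines/ID`. That pattern is an assumption to verify against your own deployment, and `parse_agent_engine_name` is a hypothetical helper, not an SDK function.

```python
import re
from typing import Dict

# Assumed resource name layout; confirm it against a real deployed agent.
_RESOURCE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_agent_engine_name(name: str) -> Dict[str, str]:
    """Split a resource name into its project, location, and engine ID parts."""
    match = _RESOURCE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected Agent Engine resource name: {name!r}")
    return match.groupdict()


parts = parse_agent_engine_name(
    "projects/123456/locations/us-central1/reasoningEngines/987654"
)
print(parts["location"], parts["engine_id"])
```

A check like this fails early when a display name or a partial path is passed by mistake, rather than surfacing later as an error during inference.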

To get the list of agents that are deployed to Vertex AI Agent Engine, see Manage deployed agents.

Generate responses

  1. Prepare the dataset that you pass to run_inference():

    Build the dataset as a Pandas DataFrame. The prompts should be specific to your agent. Session inputs are required for traces. For more information, see Session: Tracking Individual Conversations.

    import pandas as pd
    from vertexai import types

    session_inputs = types.evals.SessionInput(
        user_id="user_123",
        state={},
    )

    agent_prompts = [
        "Search for 'noise-cancelling headphones'.",
        "Show me the details for product 'B08H8H8H8H'.",
        "Add one pair of 'B08H8H8H8H' to my shopping cart.",
        "Find 'wireless earbuds' and then add the first result to my cart.",
        "I need a new laptop for work, can you find one with at least 16GB of RAM?",
    ]

    agent_dataset = pd.DataFrame({
        "prompt": agent_prompts,
        "session_inputs": [session_inputs] * len(agent_prompts),
    })

  2. Generate model responses using run_inference():

    agent_dataset_with_inference = client.evals.run_inference(
        agent=agent_engine_resource_name,
        src=agent_dataset,
    )

  3. Visualize your inference results by calling .show() on the EvaluationDataset object to inspect the model's outputs alongside your original prompts and references:

    agent_dataset_with_inference.show()

    The following image displays the evaluation dataset with prompts and their corresponding generated intermediate_events and responses:

    [Image: Agent evaluation results]
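Inference runs against a deployed agent, so it can help to validate the prompt list before building the DataFrame in the steps above. The following sketch uses only the standard library. `build_dataset_columns` is a hypothetical helper, rejecting duplicate prompts is a design choice rather than a service requirement, and the plain dictionary stands in for the `SessionInput` object.

```python
from typing import Any, Dict, List


def build_dataset_columns(prompts: List[str], session_inputs: Any) -> Dict[str, list]:
    """Validate prompts and return the column mapping for a DataFrame."""
    cleaned: List[str] = []
    seen = set()
    for index, prompt in enumerate(prompts):
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError(f"Prompt at index {index} is empty or not a string.")
        text = prompt.strip()
        if text in seen:
            raise ValueError(f"Duplicate prompt at index {index}: {text!r}")
        seen.add(text)
        cleaned.append(text)
    # One session input per prompt, matching the "prompt" column length.
    return {
        "prompt": cleaned,
        "session_inputs": [session_inputs] * len(cleaned),
    }


# Stand-in for types.evals.SessionInput(user_id="user_123", state={}).
placeholder_session = {"user_id": "user_123", "state": {}}

columns = build_dataset_columns(
    [
        "Search for 'noise-cancelling headphones'.",
        "Show me the details for product 'B08H8H8H8H'.",
    ],
    placeholder_session,
)
print(len(columns["prompt"]), len(columns["session_inputs"]))
```

The returned mapping has the same two keys as the dataset in the steps above, so it could be passed to `pd.DataFrame(columns)` once the real session input object is substituted for the placeholder.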

Run the agent evaluation

Run create_evaluation_run() to evaluate the agent responses.

  1. Retrieve the agent_info using the built-in helper function:

    agent_info = types.evals.AgentInfo.load_from_agent(
        my_agent,
        agent_engine_resource_name,
    )

  2. Evaluate the model responses using agent-specific adaptive rubric-based metrics (FINAL_RESPONSE_QUALITY, TOOL_USE_QUALITY, and HALLUCINATION). The example also requests the SAFETY metric:

    evaluation_run = client.evals.create_evaluation_run(
        dataset=agent_dataset_with_inference,
        agent_info=agent_info,
        metrics=[
            types.RubricMetric.FINAL_RESPONSE_QUALITY,
            types.RubricMetric.TOOL_USE_QUALITY,
            types.RubricMetric.HALLUCINATION,
            types.RubricMetric.SAFETY,
        ],
        dest=GCS_DEST,
    )

View the agent evaluation results

You can view the evaluation results using the Vertex AI SDK.

Retrieve the evaluation run and visualize your evaluation results by calling .show() to display summary metrics and detailed results:

evaluation_run = client.evals.get_evaluation_run(
    name=evaluation_run.name,
    include_evaluation_items=True,
)
evaluation_run.show()

The following image displays an evaluation report, which shows summary metrics, agent information, and detailed results for each prompt-response pair. The detailed results also include traces showing the agent interactions. For more information on traces, see Trace an agent.

[Image: Agent evaluation results]

What's next

Try the following agent evaluation notebooks:

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.