Choose your agentic AI architecture components
This document provides guidance to help you choose architectural components for your agentic AI applications in Google Cloud. It describes how to evaluate the characteristics of your application and workload in order to choose an appropriate product or service that best suits your needs. The process to design an agentic AI architecture is iterative. You should periodically reassess your architecture as your workload characteristics change, as your requirements evolve, or as new Google Cloud products and features become available.
AI agents are effective for applications that solve open-ended problems, which might require autonomous decision-making and complex multi-step workflow management. Agents excel at solving problems in real time by using external data, and they excel at automating knowledge-intensive tasks. These capabilities enable agents to provide more business value than the assistive and generative capabilities of an AI model.
You can use AI agents for deterministic problems with predefined steps. However, other approaches can be more efficient and cost-effective. For example, you don't need an agentic workflow for tasks like summarizing a document, translating text, or classifying customer feedback.
For information about alternative non-agentic AI solutions, see the following resources:
- What is the difference between AI agents, AI assistants, and bots?
- Choose models and infrastructure for your generative AI application
Agent architecture overview
An agent is an application that achieves a goal by processing input, performing reasoning with available tools, and taking actions based on its decisions. An agent uses an AI model as its core reasoning engine to automate complex tasks. The agent uses a set of tools that let the AI model interact with external systems and data sources. An agent can use a memory system to maintain context and learn from interactions. The goal of an agentic architecture is to create an autonomous system that can understand a user's intent, create a multi-step plan, and execute that plan by using the available tools.
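This plan-act loop can be sketched in a few lines of framework-agnostic code. The following is a minimal illustration, not a production implementation: `model_decide` is a hypothetical stand-in for a call to an AI model, and the tool is a plain Python function.

```python
# Minimal sketch of an agent loop: the model decides, the agent acts.
# `model_decide` and `get_weather` are hypothetical stand-ins.

def get_weather(city: str) -> str:
    """Hypothetical tool: look up the weather for a city."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def model_decide(goal: str, observations: list[str]) -> dict:
    """Stand-in for the AI model's reasoning step. A real agent would
    send the goal and observations to a model such as Gemini."""
    if not observations:
        return {"action": "tool", "tool": "get_weather", "args": {"city": "Paris"}}
    return {"action": "finish", "answer": observations[-1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = model_decide(goal, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        # Execute the chosen tool and record the result as an observation.
        tool = TOOLS[decision["tool"]]
        observations.append(tool(**decision["args"]))
    return "Step limit reached"

print(run_agent("What's the weather in Paris?"))  # → Sunny in Paris
```

Real frameworks add prompt construction, error handling, and streaming around this same core loop.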
The following diagram shows a high-level overview of an agentic system's architecture components:
The agentic system architecture includes the following components:
- Frontend framework: A collection of prebuilt components, libraries, and tools that you use to build the user interface (UI) for your application.
- Agent development framework: The frameworks and libraries that you use to build and structure your agent's logic.
- Agent tools: The collection of tools, such as APIs, services, and functions, that fetch data and perform actions or transactions.
- Agent memory: The system that your agent uses to store and recall information.
- Agent design patterns: Common architectural approaches for structuring your agentic application.
- Agent runtime: The compute environment where your agent's application logic runs.
- AI models: The core reasoning engine that powers your agent's decision-making capabilities.
- Model runtime: The infrastructure that hosts and serves your AI model.
The following sections provide a detailed analysis of the components to help you make decisions about how to build your architecture. The components that you choose will influence your agent's performance, scalability, cost, and security. This document focuses on the essential architectural components that you use to build and deploy an agent's core reasoning and execution logic. Topics such as responsible AI safety frameworks and agent identity management are considered out of scope for this document.
Frontend framework
The frontend framework is a collection of prebuilt components, libraries, and tools that you use to build the UI for your agentic application. The frontend framework that you choose defines the requirements for your backend. A simple interface for an internal demo might only require a synchronous HTTP API, while a production-grade application requires a backend that supports streaming protocols and robust state management.
Consider the following categories of frameworks:
- Prototyping and internal tool frameworks: For rapid development, internal demos, and proof-of-concept applications, choose frameworks that prioritize developer experience and velocity. These frameworks typically favor a simple and synchronous model that's called a request-response model. A request-response model lets you build a functional UI with minimal code and a simpler backend compared to a production framework. This approach is ideal for quickly testing agent logic and tool integrations, but it might not be suitable for highly scalable, public-facing applications that require real-time interactions. Common frameworks in this category include Mesop and Gradio.
- Production frameworks: For scalable, responsive, and feature-rich applications for external users, choose a framework that allows for custom components. These frameworks require a backend architecture that can support a modern user experience. A production framework should include support for streaming protocols, a stateless API design, and a robust, externalized memory system to manage conversation state across multiple user sessions. Common frameworks for production applications include Streamlit, React, and the Flutter AI Toolkit.
To manage the communication between these frameworks and your AI agent, you can use the Agent–User Interaction (AG-UI) protocol. AG-UI is an open protocol that enables backend AI agents to interact with your frontend framework. AG-UI tells the frontend framework when to render the agent's response, update application state, or trigger a client-side action. To build interactive AI applications, combine AG-UI with Agent Development Kit (ADK). For information about ADK, see the next section, Agent development frameworks.
Agent development frameworks
Agent development frameworks are libraries that simplify the process ofbuilding, testing, and deploying agentic AI applications. These developmenttools provide prebuilt components and abstractions for core agent capabilities,including reasoning loops, memory, and tool integration.
To accelerate agent development in Google Cloud, we recommend that you use ADK. ADK is an open-source, opinionated, and modular framework that provides a high level of abstraction for building and orchestrating workflows, from simple tasks to complex, multi-agent systems.
ADK is optimized for Gemini models and Google Cloud, but it's built for compatibility with other frameworks. ADK supports other AI models and runtimes, so you can use it with any model or deployment method. For multi-agent systems, ADK supports interaction through shared session states, model-driven delegation to route tasks between agents, and explicit invocation that lets one agent call another agent as a function or tool.
To help you get started quickly, ADK provides code samples in Python, Java, and Go that demonstrate a variety of use cases across multiple industries. Although many of these samples highlight conversational flows, ADK is also well-suited for building autonomous agents that perform backend tasks. For these non-interactive use cases, choose an agent design pattern that excels in processing a single, self-contained request and that implements robust error handling.
To build a custom agent architecture, you can also use a general-purpose AI framework like Genkit. Genkit provides primitives that give you fine-grained control over your agent logic without the high-level abstraction that ADK offers. However, a dedicated agent framework like ADK provides specialized tools for developing agentic applications.
Agent tools
An agent's ability to interact with external systems through tools defines its effectiveness. Agent tools are functions or APIs that are available to the AI model and that the agent uses to enhance output and allow for task automation. When you connect an AI agent to external systems, tools transform the agent from a simple text generator into a system that can automate complex, multi-step tasks.
To enable tool interactions, choose from the following tool use patterns:
| Use case | Tool use pattern |
|---|---|
| You need to perform a common task like completing a web search, running a calculation, or executing code, and you want to accelerate initial development. | Built-in tools |
| You want to build a modular or multi-agent system that requires interoperable and reusable tools. | Model Context Protocol (MCP) |
| You need to manage, secure, and monitor a large number of API-based tools at an enterprise scale. | API management platform |
| You need to integrate with a specific internal or third-party API that doesn't have an MCP server. | Custom function tools |
When you select tools for your agent, evaluate them on their functional capabilities and their operational reliability. Prioritize tools that are observable, easy to debug, and that include robust error handling. These capabilities help to ensure that you can trace actions and resolve failures quickly. In addition, evaluate the agent's ability to select the right tool to successfully complete its assigned tasks.
Built-in tools
ADK provides several built-in tools that are integrated directly into the agent's runtime. You can call these tools as functions without configuring external communication protocols. These tools provide common functionality, including accessing real-time information from the web, executing code programmatically in a secure environment, retrieving information from private enterprise data to implement RAG, and interacting with structured data in cloud databases. The built-in tools work alongside any custom tools that you create.
MCP
To enable the components of your agentic system to interact, you need to establish clear communication protocols. MCP is an open protocol that provides a standardized interface for agents to access and use the necessary tools, data, and other services.
MCP decouples the agent's core reasoning logic from the specific implementation of its tools, similar to how a standard hardware port allows different peripherals to connect to a device. MCP simplifies tool integration because it provides a growing list of prebuilt connectors and a consistent way to build custom integrations. The flexibility to integrate tools promotes interoperability across different models and tools.
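The decoupling that MCP provides can be illustrated with a small interface sketch. This is not the MCP wire protocol itself; it only shows the architectural idea: the agent codes against a stable tool interface while implementations vary behind it. The class and method names here are illustrative.

```python
from typing import Protocol

class ToolServer(Protocol):
    """The stable interface the agent depends on, analogous to the role
    an MCP server plays. Implementations can change freely behind it."""
    def list_tools(self) -> list[str]: ...
    def call_tool(self, name: str, args: dict) -> str: ...

class InMemoryToolServer:
    """One possible implementation; a real server might wrap a remote API."""
    def __init__(self) -> None:
        self._tools = {"echo": lambda args: args["text"]}
    def list_tools(self) -> list[str]:
        return sorted(self._tools)
    def call_tool(self, name: str, args: dict) -> str:
        return self._tools[name](args)

def agent_turn(server: ToolServer) -> str:
    # The agent's logic only touches the interface, never the implementation.
    if "echo" in server.list_tools():
        return server.call_tool("echo", {"text": "hello"})
    return "no tool available"

print(agent_turn(InMemoryToolServer()))  # → hello
```

Swapping `InMemoryToolServer` for a remote implementation requires no change to `agent_turn`, which is the interoperability benefit that MCP standardizes across models and tools.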
You can connect to a remote MCP server if one is available, or you can host your own MCP server. When you host your own MCP server, you have full control over how you expose proprietary or third-party APIs to your agents. To host your own custom MCP server, deploy it as a containerized application on Cloud Run or GKE.
API management platform
An API management platform is a centralized system that lets you secure, monitor, and control internal or external services through APIs. An API management platform provides a centralized location to catalog all of your organization's APIs, simplifies how you expose data, and provides observability through usage monitoring.
To manage your agent's API-based tools at an enterprise scale on Google Cloud, we recommend that you use Apigee API hub. API hub lets agents connect to data instantly through direct HTTP calls, prebuilt connectors, custom APIs registered in the hub, or direct access to Google Cloud data sources. This approach gives your agents immediate access to the information that they need without the complexity of building custom data loading and integration pipelines.
An API management platform and a communication protocol like MCP solve different architectural problems. A communication protocol standardizes the interaction format between the agent and the tool, which ensures that components are reusable and can be swapped. By contrast, an API management platform governs the lifecycle and security of the API endpoint, handling tasks like authentication, rate limiting, and monitoring. These patterns are complementary. For example, an agent can use MCP to communicate with a tool, and that tool can in turn be a secure API endpoint that API hub manages and protects.
Custom function tool
A function tool gives an agent new capabilities. You can write a custom function tool to give your agent specialized capabilities, such as to integrate with an external API or a proprietary business system. Writing a custom function tool is the most common pattern for extending an agent's abilities beyond what built-in tools can offer.
To create a custom function tool, you write a function in your preferred programming language and then provide a clear, natural-language description of its purpose, parameters, and return values. The agent's model uses this description to reason about when the tool is needed, what inputs to provide, and how to interpret the output to complete a user's request.
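In practice, a custom function tool is a plain function plus a description that the model reads. The sketch below shows the general shape; the docstring is what a framework would surface to the model so it can decide when and how to call the tool. The function name, parameters, and rate table are illustrative, not from a specific framework or data source.

```python
def get_exchange_rate(base: str, target: str) -> float:
    """Returns the exchange rate from `base` currency to `target` currency.

    Args:
        base: ISO 4217 code of the source currency, for example "USD".
        target: ISO 4217 code of the destination currency, for example "EUR".

    Returns:
        The number of `target` units per one `base` unit.
    """
    # Hypothetical fixed table; a real tool would call a currency API.
    rates = {("USD", "EUR"): 0.92, ("EUR", "USD"): 1.09}
    return rates[(base.upper(), target.upper())]

# A framework typically derives the tool schema from the signature and
# docstring, so clear parameter names and descriptions matter.
print(get_exchange_rate("usd", "eur"))  # → 0.92
```

Because the model chooses tools from their descriptions, vague docstrings are a common source of wrong tool calls; treat the description as part of the prompt.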
You can also create a custom function tool that implements an agent-as-a-tool function. An agent-as-a-tool function exposes one agent as a callable function that another agent can invoke. This technique lets you build complex, multi-agent systems where an agent can coordinate and delegate specialized tasks to other specialized agents. For more information about agent design patterns and coordinating multi-agent orchestration, see the Agent design patterns section later in this document.
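The agent-as-a-tool idea can be sketched as follows, assuming each agent is callable with a text request. The agent classes and the translation table are illustrative stand-ins for real model-backed agents.

```python
class TranslationAgent:
    """A specialized agent exposed as a callable, so another agent can
    invoke it exactly like any other function tool."""
    def __call__(self, request: str) -> str:
        # Hypothetical translation logic standing in for a model call.
        return {"hello": "bonjour"}.get(request, request)

class CoordinatorAgent:
    """A coordinating agent that delegates tasks to its tools, some of
    which happen to be other agents."""
    def __init__(self, tools: dict) -> None:
        self.tools = tools
    def handle(self, task: str, payload: str) -> str:
        # Delegate a specialized task to the wrapped agent.
        return self.tools[task](payload)

coordinator = CoordinatorAgent(tools={"translate": TranslationAgent()})
print(coordinator.handle("translate", "hello"))  # → bonjour
```

The key property is that the coordinator does not need to know whether a tool is a simple function or a full agent; both share the same calling convention.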
Agent memory
An agent's ability to remember past interactions is fundamental to providing a coherent and useful conversational experience. To create stateful, context-aware agents, you must implement mechanisms for short-term memory and long-term memory. The following sections explore the design choices and Google Cloud services that you can use to implement both short-term and long-term memory for your agent.
Short-term memory
Short-term memory enables an agent to maintain context within a single, ongoing conversation. To implement short-term memory, you must manage both the session and its associated state.
- Session: A session is the conversational thread between a user and the agent, from the initial interaction to the end of the dialogue.
- State: State is the data that the agent uses and collects within a specific session. The state data that's collected includes the history of messages that the user and agent exchanged, the results of any tool calls, and other variables that the agent needs in order to understand the context of the conversation.
The following are options for implementing short-term memory with ADK:
- In-memory storage: For development, testing, or simple applications that run on a single instance, you can store the session state directly in your application's memory. The agent uses a data structure, such as a dictionary or an object, to store a list of key-value pairs, and it updates these values throughout the session. However, when you use in-memory storage, session state isn't persistent. If the application restarts, it loses all conversation history.
- External state management: For production applications that require scalability and reliability, we recommend that you build a stateless agent application and manage the session state in an external storage service. In this architecture, each time the agent application receives a request, it retrieves the current conversation state from the external store, processes the new turn, and then saves the updated state back to the store. This design lets you scale your application horizontally because any instance can serve any user's request. Common choices for external state management include Memorystore for Redis, Firestore, or Vertex AI Agent Engine sessions. If you use ADK, the DatabaseSessionService requires a relational database, such as Cloud SQL.
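The retrieve-process-save cycle for external state management can be sketched as follows. The `SessionStore` class here is an in-process dictionary standing in for an external service such as Memorystore or Firestore; the names are illustrative.

```python
class SessionStore:
    """Stand-in for an external store such as Memorystore or Firestore."""
    def __init__(self) -> None:
        self._data: dict[str, list[dict]] = {}
    def load(self, session_id: str) -> list[dict]:
        return self._data.get(session_id, [])
    def save(self, session_id: str, history: list[dict]) -> None:
        self._data[session_id] = history

def handle_request(store: SessionStore, session_id: str, user_msg: str) -> str:
    # 1. Retrieve the current conversation state from the external store.
    history = store.load(session_id)
    # 2. Process the new turn (a real agent would call the model here).
    reply = f"Turn {len(history) // 2 + 1} of this session."
    history = history + [{"role": "user", "text": user_msg},
                         {"role": "agent", "text": reply}]
    # 3. Save the updated state, so any instance can serve the next turn.
    store.save(session_id, history)
    return reply

store = SessionStore()
handle_request(store, "s1", "hi")
print(handle_request(store, "s1", "hi again"))  # → Turn 2 of this session.
```

Because `handle_request` holds no state between calls, any number of identical instances can sit behind a load balancer and serve the same session.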
Long-term memory
Long-term memory provides the agent with a persistent knowledge base that exists across all conversations for individual users. Long-term memory lets the agent retrieve and use external information, learn from past interactions, and provide more accurate and relevant responses.
The following are options for implementing long-term memory with ADK:
- In-memory storage: For development and testing, you can store the session state directly in your application's memory. This approach is simple to implement, but it isn't persistent. If the application restarts, it loses the conversation history. You typically implement this pattern by using an in-memory provider within a development framework, such as the InMemoryMemoryService that's included in ADK for testing.
- External storage: For production applications, manage your agent's knowledge base in an external, persistent storage service. An external storage service ensures that your agent's knowledge is durable, scalable, and accessible across multiple application instances. Use Memory Bank for long-term storage with any agent runtime on Google Cloud.
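The distinguishing feature of long-term memory is that it is keyed to the user rather than the session, and is usually retrieved by relevance. A minimal sketch follows, with naive keyword matching standing in for the semantic retrieval that a managed service such as Memory Bank would provide; all names are illustrative.

```python
class LongTermMemory:
    """Per-user memory that persists across sessions. Keyword matching
    stands in for the semantic retrieval a managed service would offer."""
    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}
    def remember(self, user_id: str, fact: str) -> None:
        self._facts.setdefault(user_id, []).append(fact)
    def recall(self, user_id: str, query: str) -> list[str]:
        # Return facts that share at least one word with the query.
        words = set(query.lower().split())
        return [f for f in self._facts.get(user_id, [])
                if words & set(f.lower().split())]

memory = LongTermMemory()
memory.remember("user-1", "prefers window seats")    # learned in session A
memory.remember("user-1", "is allergic to peanuts")  # learned in session B
print(memory.recall("user-1", "book a window seat"))  # → ['prefers window seats']
```

A real system would inject the recalled facts into the model's context at the start of each turn, which is how knowledge from past sessions shapes new responses.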
Agent design patterns
Agent design patterns are common architectural approaches to build agentic applications. These patterns offer a distinct framework for organizing a system's components, integrating the AI model, and orchestrating a single agent or multiple agents to accomplish a workflow. To determine which approach is best for your workflow, you must consider the complexity and workflow of your tasks, as well as latency, performance, and cost requirements.
A single-agent system relies on one model's reasoning capabilities to interpret a user's request, plan a sequence of steps, and decide which tools to use. This approach is an effective starting point that lets you focus on refining the core logic, prompts, and tool definitions before you add architectural complexity. However, a single agent's performance can degrade as tasks and the number of tools grow in complexity.
For complex problems, a multi-agent system orchestrates multiple specialized agents to achieve a goal that a single agent can't easily manage. This modular design can improve the scalability, reliability, and maintainability of the system. However, it also introduces additional evaluation, security, and cost considerations compared to a single-agent system.
When you develop a multi-agent system, you must implement precise access controls for each specialized agent, design a robust orchestration system to ensure reliable inter-agent communication, and manage the increased operational costs from the computational overhead of running multiple agents. To facilitate communication between agents, use the Agent2Agent (A2A) protocol with ADK. A2A is an open standard protocol that enables AI agents to communicate and collaborate across different platforms and frameworks, regardless of their underlying technologies.
For more information about common agent design patterns and how to select a pattern based on your workload requirements, see Choose a design pattern for your agentic AI system.
AI models
Agentic applications depend on the reasoning and understanding capabilities of a model to act as the primary task orchestrator. For this core agent role, we recommend that you use Gemini Pro.
Google models, like Gemini, provide access to the latest and most capable proprietary models through a managed API. This approach is ideal for minimizing operational overhead. In contrast, an open, self-hosted model provides the deep control that's required when you fine-tune on proprietary data. Workloads with strict security and data residency requirements also require a self-hosted model, because it lets you run the model within your own network.
To improve agent performance, you can adjust the model's reasoning capabilities. Models such as the latest Gemini Pro and Flash models feature a built-in thinking process that improves reasoning and multi-step planning. For debugging and refinement, you can review the model's thought summaries, or synthesized versions of its internal thoughts, to understand its reasoning path. You can control the model's reasoning capabilities by adjusting the thinking budget, or the number of thinking tokens, based on task complexity. A higher thinking budget lets the model perform more detailed reasoning and planning before it provides an answer. A higher thinking budget can improve response quality, but it might also increase latency and cost.
To optimize for performance and cost, implement model routing to dynamically select the most appropriate model for each task based on the task's complexity, cost, or latency requirements. For example, you can route simple requests to a small language model (SLM) for structured tasks like code generation or text classification, and reserve a more powerful and expensive model for complex reasoning. If you implement model routing in your agentic application, you can create a cost-effective system that maintains high performance.
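A minimal routing sketch follows, assuming two hypothetical model endpoints and a naive length-and-keyword heuristic. Production routers often use a classifier or a small model to score each request instead; the identifiers here are placeholders for your actual endpoints.

```python
def classify_complexity(request: str) -> str:
    """Naive heuristic stand-in for a learned router: long requests or
    requests with reasoning keywords go to the larger model."""
    keywords = {"plan", "analyze", "compare", "why"}
    if len(request.split()) > 30 or keywords & set(request.lower().split()):
        return "complex"
    return "simple"

# Hypothetical model identifiers; substitute your actual endpoints.
ROUTES = {"simple": "small-language-model", "complex": "large-reasoning-model"}

def route(request: str) -> str:
    """Select a model endpoint for the request."""
    return ROUTES[classify_complexity(request)]

print(route("Classify this ticket as bug or feature"))         # → small-language-model
print(route("Analyze why Q3 revenue dropped and plan fixes"))  # → large-reasoning-model
```

The routing decision itself should be cheap relative to the calls it saves, which is why simple heuristics or small classifier models are the usual choices.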
Google Cloud provides access to a wide selection of Google models, partner models, and open models that you can use in your agentic architecture. For more information about the models that are available and how to choose a model to fit your needs, see Model Garden on Vertex AI.
Model runtime
A model runtime is the environment that hosts and serves your AI model and that makes its reasoning capabilities available to your agent.
Choose a model runtime
To select the best runtime when you host your AI models, use the following guidance:
| Use case | Model runtime |
|---|---|
| You need a fully managed API to serve Gemini models, partner models, open models, or custom models with enterprise-grade security, scaling, and generative AI tools. | Vertex AI |
| You need to deploy an open or custom containerized model and prioritize serverless simplicity and cost-efficiency for variable traffic. | Cloud Run |
| You need maximum control over the infrastructure to run an open or custom containerized model on specialized hardware or to meet complex security and networking requirements. | GKE |
The following sections provide an overview of the preceding model runtimes, including key features and design considerations. This document focuses on Vertex AI, Cloud Run, and GKE. However, Google Cloud offers other services that you might consider for a model runtime:
- Gemini API: The Gemini API is designed for developers who need quick, direct access to Gemini models without the enterprise governance features that complex agentic systems often require.
- Compute Engine: Compute Engine is an infrastructure as a service (IaaS) product that is suitable for legacy applications. It introduces significant operational overhead compared to modern, container-based runtimes.
For more information about the features that distinguish all of the service options for model runtimes, see Model hosting infrastructure.
Vertex AI
Vertex AI provides a fully managed, serverless environment that hosts your AI models. You can serve and fine-tune Google models, partner models, and open models through a secure and scalable API. This approach abstracts away all infrastructure management, and it lets you focus on integrating model intelligence into your applications.
When you use Vertex AI as a model runtime, the key features and considerations include the following:
- Infrastructure control: Provides a fully managed API for your models. Google manages the underlying infrastructure.
- Security: Provides managed security defaults and standard compliance certifications. To provide prompt and response protection and to ensure responsible AI practices, you can integrate Model Armor into Vertex AI.
- Model availability: Provides access to a wide selection of models, including the latest Gemini models, through a managed API.
- Cost: Uses a pay-per-use pricing model that scales with your application's traffic. For more information, see Cost of building and deploying AI models in Vertex AI.
Cloud Run
Cloud Run provides a serverless runtime that hosts your models inside custom containers. Cloud Run offers a balance between the fully managed simplicity of Vertex AI and the deep infrastructure control of GKE. This approach is ideal when you need the flexibility to run your model in a containerized environment without managing servers or clusters.
When you use Cloud Run as a model runtime, the key features and considerations include the following:
- Infrastructure control: Run any model in a custom container, which provides full control over the software environment, while the platform manages the underlying serverless infrastructure.
- Security: Provides security through ephemeral, isolated compute instances and allows for secure connections to private resources by using Direct VPC egress or a Serverless VPC Access connector. For more information, see Private networking and Cloud Run.
- Model availability: Serve open models such as Gemma, or serve your own custom models. You can't host or serve Gemini models on Cloud Run.
- Cost: Features a pay-per-use, request-based pricing model that scales to zero, which makes it highly cost-effective for models with sporadic or variable traffic. For more information, see Cloud Run pricing.
GKE
GKE provides the most control and flexibility for hosting your AI models. To use this approach, you run your models in containers on a GKE cluster that you configure and manage. GKE is the ideal choice when you need to run models on specialized hardware, colocate them with your applications for minimal latency, or require granular control over every aspect of the serving environment.
When you use GKE as a model runtime, the key features and considerations include the following:
- Infrastructure control: Provides maximum, granular control over the entire serving environment, including node configurations, specialized machine accelerators, and the specific model serving software.
- Security: Enables the highest level of security and data isolation because it lets you run models entirely within your network and apply fine-grained Kubernetes security policies. To screen traffic to and from a GKE cluster and to protect all interactions with the AI models, you can integrate Model Armor with GKE.
- Model availability: Serve open models such as Gemma, or serve your own custom models. You can't host or serve Gemini models on GKE.
- Cost: Features a cost model that's based on the underlying compute and cluster resources that you consume, which makes it highly optimized for predictable, high-volume workloads when you use committed use discounts (CUDs). For more information, see Google Kubernetes Engine pricing.
Agent runtime
To host and deploy your agentic application, you must choose an agent runtime. This service runs your application code—the business logic and orchestration that you write when you use an agent development framework. From this runtime, your application makes API calls to the models that your chosen model runtime hosts and manages.
Choose an agent runtime
To select the runtime when you host your AI agents, use the following guidance:
| Use case | Agent runtime |
|---|---|
| Your application is a Python agent and it requires a fully managed experience with minimal operational overhead. | Vertex AI Agent Engine |
| Your application is containerized and it requires serverless, event-driven scaling with language flexibility. | Cloud Run |
| Your application is containerized, has complex stateful requirements, and it needs fine-grained infrastructure configuration. | GKE |
If you already manage applications on Cloud Run or on GKE, you can accelerate development and simplify long-term operations by using the same platform for your agentic workload.
The following sections provide an overview of each agent runtime, including key features and design considerations.
Vertex AI Agent Engine
Vertex AI Agent Engine is a fully managed, opinionated runtime that you can use to deploy, operate, and scale agentic applications. Vertex AI Agent Engine abstracts away the underlying infrastructure, which lets you focus on agent logic instead of operations.
The following are features and considerations for Vertex AI Agent Engine:
- Programming language and framework flexibility: Develop agents in Python with any supported frameworks.
- Communication protocols: Orchestrate agents and tools that use MCP and A2A. Vertex AI Agent Engine efficiently manages the runtime for these components, but it doesn't support the hosting of custom MCP servers.
- Memory: Provides built-in, managed memory capabilities, which removes the need to configure external databases for core agent memory.

  | Requirement | Available options |
  |---|---|
  | Short-term memory | Vertex AI Agent Engine sessions |
  | Long-term memory | Memory Bank |
  | Database search and retrieval | |

- Scalability: Automatically scales to meet the demands of your agentic workload, which removes the need for manual configuration. Vertex AI Agent Engine is built on Cloud Run and it uses Cloud Run's built-in instance scaling to provide this automatic scaling.
- Observability: Provides integrated logging, monitoring, and tracing through Google Cloud Observability services.
- Security: Provides the following enterprise-level reliability, scalability, and compliance:
  - Built-in service identity for secure, authenticated calls to Google Cloud APIs.
  - Run code in a secure, isolated, and managed sandbox with Vertex AI Agent Engine Code Execution.
  - Protect your data with your own customer-managed encryption key (CMEK) in Secret Manager.
  - Restrict IAM permissions and use VPC firewall rules to prevent unwanted network calls.
For information about Vertex AI Agent Engine security features, see Enterprise security.
Vertex AI Agent Engine accelerates the path to production because it provides a purpose-built, managed environment that handles many complex aspects of operating agents, such as lifecycle and context management. Vertex AI Agent Engine is less suitable for use cases that require extensive customization of the compute environment or that require programming languages other than Python. For workloads that have strict security requirements for private dependency management, Cloud Run and GKE offer a more direct, IAM-based configuration path.
Cloud Run
Cloud Run is a fully managed, serverless platform that lets you run your agent application code in a stateless container. Cloud Run is ideal when you want to deploy the entire agent application, individual components, or custom tools as scalable HTTP endpoints without needing to manage the underlying infrastructure.
The following are features and considerations for Cloud Run:
- Programming language and framework flexibility: When you package your application in a container, you can develop agents in any programming language and with any framework.
- Communication protocols: Orchestrate agents and tools that use MCP and A2A. Host MCP clients and servers with streamable HTTP transport on Cloud Run.
- Memory: Cloud Run instances are stateless, which means that an instance loses any in-memory data after it terminates. To implement persistent memory, connect your service to a managed Google Cloud storage service:

  | Requirement | Available options |
  |---|---|
  | Short-term memory | |
  | Long-term memory | Firestore; Memory Bank with Cloud Run |
  | Database search and retrieval | Cloud SQL; AlloyDB for PostgreSQL |
- Scalability: Automatically scales the number of instances based on incoming traffic, and also scales instances down to zero. This feature helps make Cloud Run cost-effective for applications that have variable workloads.
- Observability: Provides integrated logging, monitoring, and tracing through Google Cloud Observability services. For more information, see Monitoring and logging overview.
- Security: Provides the following security controls for your agents:
  - Built-in identity service for secure, authenticated calls to Google Cloud APIs.
  - Run untested code in a secure environment with the Cloud Run sandbox environment or with Vertex AI Agent Engine code execution.
  - Store sensitive data that Cloud Run uses by configuring secrets in Secret Manager.
  - Prevent unwanted network calls by restricting IAM permissions and using VPC firewall rules.
Cloud Run offers significant operational simplicity and cost-effectiveness because it eliminates infrastructure management. However, the stateless nature of Cloud Run requires you to use a storage service in order to manage context across a multi-step workflow. Additionally, the maximum request timeout for Cloud Run services is up to one hour, which might constrain long-running agentic tasks.
GKE
Google Kubernetes Engine (GKE) is a managed container orchestration service that provides granular control over your agentic application's architecture and infrastructure. GKE is suitable for complex agentic systems that require robust, production-grade capabilities, or if you are already a GKE customer and you want to implement an agentic workflow on top of your existing application.
The following are features and considerations that are available on GKE:
- Programming language and framework flexibility: When you package your application in a container, you can develop agents in any programming language and with any framework.
- Communication protocols: Orchestrate agents and tools that use MCP and A2A. Host MCP clients and servers on GKE when you package them as containers.
- Memory: GKE pods are ephemeral. However, you can build stateful agents with persistent memory by using in-cluster resources or by connecting to external services:

  | Requirement | Available options |
  |---|---|
  | Short-term memory | |
  | Long-term memory | Firestore; Memory Bank with GKE |
  | Database search and retrieval | StatefulSets and Persistent Volumes for durable storage within your cluster; Cloud SQL; AlloyDB for PostgreSQL |
- Scalability: GKE clusters automatically provision and scale your node pools to meet the requirements of your workload.
- Observability: Provides integrated logging, monitoring, and tracing at the cluster, node, and pod levels with Google Cloud Observability. To collect configured third-party and user-defined metrics and then send them to Cloud Monitoring, you can also use Google Cloud Managed Service for Prometheus. For more information, see Overview of GKE observability.
- Security: Provides fine-grained security controls for your agents:
  - Use Workload Identity Federation for GKE for secure authentication to Google Cloud APIs.
  - Isolate untrusted code with GKE Sandbox.
  - Store sensitive data that your GKE clusters use in Secret Manager.
  - Restrict IAM permissions and use VPC firewall rules and Network Policies to prevent unwanted network calls.
GKE provides maximum control and flexibility, which lets you run complex, stateful agents. However, this control introduces significant operational overhead and complexity. You must configure and manage the Kubernetes cluster, including node pools, networking, and scaling policies, which requires more expertise and development effort than a serverless platform requires.
What's next
- Agent tools:
- Agent memory:
- Agent design patterns:
- Agent runtime:
- Other agentic AI resources on Google Cloud:
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Samantha He | Technical Writer
Other contributors:
- Amina Mansour | Head of Cloud Platform Evaluations Team
- Amit Maraj | Developer Relations Engineer
- Casey West | Architecture Advocate, Google Cloud
- Jack Wotherspoon | Developer Advocate
- Joe Fernandez | Staff Technical Writer
- Joe Shirey | Cloud Developer Relations Manager
- Karl Weinmeister | Director of Cloud Product Developer Relations
- Kumar Dhanagopal | Cross-Product Solution Developer
- Lisa Shen | Senior Outbound Product Manager, Google Cloud
- Mandy Grover | Head of Architecture Center
- Megan O'Keefe | Developer Advocate
- Olivier Bourgeois | Developer Relations Engineer
- Polong Lin | Developer Relations Engineering Manager
- Shir Meir Lador | Developer Relations Engineering Manager
- Vlad Kolesnikov | Developer Relations Engineer
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-11-24 UTC.