Host AI agents on Cloud Run
This page highlights use cases for hosting AI agents on Cloud Run.
AI agents are autonomous software entities that use LLM-powered systems to perceive, decide, and act to achieve goals. As more autonomous agents are built, their ability to communicate and collaborate becomes crucial.
For an introduction to AI agents, see What is an AI agent.
Use cases for AI agents on Cloud Run
You can implement AI agents as Cloud Run services to orchestrate a set of asynchronous tasks and provide information through multiple request-response interactions.
A Cloud Run service is a scalable API endpoint for your application's core logic. It efficiently manages multiple concurrent users through automatic, on-demand, and rapid scaling of instances.
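For illustration, an agent service can be as small as an HTTP handler that accepts a user message and returns the agent's answer. The following sketch assumes FastAPI and a hypothetical /invoke route; Cloud Run only requires a container that listens on the port provided by the PORT environment variable.

```python
# Minimal request-response agent endpoint (sketch). FastAPI and the /invoke
# route are illustrative choices, not requirements of Cloud Run.
import os

from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()


class AgentRequest(BaseModel):
    session_id: str
    message: str


@app.post("/invoke")
def invoke(req: AgentRequest) -> dict:
    # Replace this stub with your agent framework's orchestration call.
    answer = f"Echo for session {req.session_id}: {req.message}"
    return {"answer": answer}


if __name__ == "__main__":
    # Cloud Run injects the PORT environment variable (8080 by default).
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```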
AI agent on Cloud Run architecture
A typical AI agent architecture deployed on Cloud Run can involve several components from Google Cloud, as well as components outside of Google Cloud.
The diagram shows the following:
Hosting platform: Cloud Run is a hosting platform for running agents and offers the following benefits:
- Supports running any agent framework to build different types of agents and agentic architectures. Examples of agent frameworks include Agent Development Kit (ADK), Dify, LangGraph, and n8n.
- Provides built-in features for managing your agent. For example, Cloud Run provides a built-in service identity that you can use as the agent identity for calling Google Cloud APIs with secure and automatic credentials (see the sketch after this list).
- Supports connecting your agent framework to other services. You can connect your agent to first-party or third-party tools deployed on Cloud Run. For example, to gain visibility into your agent's tasks and executions, you can deploy and use tools like Langfuse and Arize.
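The following sketch shows the service identity in use: on Cloud Run, Google Cloud client libraries obtain credentials automatically through Application Default Credentials, so the agent doesn't need key files. The Cloud Storage client and bucket name are illustrative placeholders, not part of any particular framework.

```python
# Sketch: client libraries pick up the Cloud Run service identity through
# Application Default Credentials; no service account key files are needed.
import google.auth
from google.cloud import storage

credentials, project_id = google.auth.default()

client = storage.Client(credentials=credentials, project=project_id)
for blob in client.list_blobs("example-agent-artifacts"):  # placeholder bucket
    print(blob.name)
```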
Agent interactions: Cloud Run supports streaming HTTP responses back to the user, and WebSockets for real-time interactions.
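For example, you can stream the agent's answer back chunk by chunk as the model produces it. The following sketch assumes FastAPI; the token generator is a stand-in for your model or framework's streaming API.

```python
# Sketch of streaming an agent's answer to the client as it is generated.
import time
from typing import Iterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def generate_tokens(prompt: str) -> Iterator[str]:
    # Placeholder: yield chunks as the model produces them.
    for word in f"Streaming reply to: {prompt}".split():
        time.sleep(0.1)
        yield word + " "


@app.get("/stream")
def stream(prompt: str) -> StreamingResponse:
    return StreamingResponse(generate_tokens(prompt), media_type="text/plain")
```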
GenAI models: The orchestration layer calls models for reasoning capabilities (see the sketch after this list). These models can be hosted on services such as the following:
- Gemini API for Google's generative AI models.
- Vertex AI endpoints for custom models or other foundation models.
- GPU-enabled Cloud Run service for your own fine-tuned models.
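The following sketch shows a model call from the orchestration layer, assuming the google-genai SDK routed through Vertex AI; the project, location, and model name are placeholders, and the same call site could instead target a Vertex AI endpoint or a GPU-backed Cloud Run model server.

```python
# Sketch of the orchestration layer calling a hosted model for reasoning.
from google import genai

# On Cloud Run, the client authenticates with the service identity.
# Project, location, and model name below are placeholders.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the last three support tickets for customer 42.",
)
print(response.text)
```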
Memory: Agents often need memory to retain context and learn from past interactions (see the sketch after this list). You can use the following services:
- Memorystore for Redis for short-term memory.
- Firestore for long-term memory, such as storing the conversational history or remembering the user's preferences based on raw data.
- Vertex AI Agent Engine Memory Bank for long-term personalized memory. This feature automatically extracts information from the user's conversational history to remember and update the user's preferences over time. Note that you need to create at least one Agent Engine instance to use this feature with Cloud Run.
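The following sketch combines the first two options, assuming the redis and google-cloud-firestore client libraries; the Redis host, identifiers, and document fields are placeholders.

```python
# Sketch of short-term and long-term memory for an agent.
import redis
from google.cloud import firestore

# Short-term memory: Memorystore for Redis (host is a placeholder).
short_term = redis.Redis(host="10.0.0.3", port=6379)
short_term.setex("session:42:last_message", 3600, "Where is my order?")

# Long-term memory: one Firestore document per user for history and preferences.
long_term = firestore.Client()
long_term.collection("users").document("42").set(
    {"preferences": {"language": "en"}, "last_seen": firestore.SERVER_TIMESTAMP},
    merge=True,
)
```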
Vector database: For Retrieval-Augmented Generation (RAG) or fetching structured data, use a vector database to query specific entity information or perform a vector search over embeddings. You can use the pgvector extension with PostgreSQL services such as Cloud SQL for PostgreSQL and AlloyDB for PostgreSQL.
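The following similarity-search sketch assumes the psycopg driver and a table with a pgvector embedding column; the connection string, table, and column names are placeholders, and the query vector would normally come from your embedding model.

```python
# Sketch of a pgvector similarity search on PostgreSQL (for example,
# Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL).
import psycopg

query_embedding = [0.12, -0.03, 0.88]  # placeholder; produced by an embedding model
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

with psycopg.connect("dbname=agent user=agent host=10.0.0.5") as conn:
    rows = conn.execute(
        """
        SELECT chunk_id, content
        FROM document_chunks
        ORDER BY embedding <-> %s::vector
        LIMIT 5
        """,
        (vector_literal,),
    ).fetchall()

for chunk_id, content in rows:
    print(chunk_id, content[:80])
```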
Tools: The orchestrator uses tools to perform specific tasks and to interact with external services, APIs, or websites (a minimal tool definition is sketched after this list). These can include:
- Model Context Protocol (MCP): Use this standardized protocol to communicate with external tools that are executed through an MCP server.
- Basic utilities: Precise math calculations, time conversions, or other similar utilities.
- API calling: Make calls to other internal or third-party APIs (read or write access).
- Image or chart generation: Quickly and effectively create visual content.
- Browser and OS automation: Run a headless or a full graphical operating system within container instances to allow the agent to browse the web, extract information from websites, or perform actions using clicks and keyboard input.
- Code execution: Execute code in a secure environment with multi-layered sandboxing, with minimal or no IAM permissions.
- Vertex AI Agent Engine Code Execution: Execute code in a secure, isolated, and managed sandbox environment that supports file input and output, sub-second code execution, and long-lived memory. Note that you need to create at least one Vertex AI Agent Engine instance to use this feature in Cloud Run.
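The following sketch shows a plain Python function exposed as a tool, assuming the Agent Development Kit (google-adk); the function body, agent name, and model are illustrative.

```python
# Sketch of registering a Python function as a tool with an ADK agent.
from google.adk.agents import Agent


def get_order_status(order_id: str) -> dict:
    """Returns the shipping status for an order (stubbed for illustration)."""
    return {"order_id": order_id, "status": "shipped"}


root_agent = Agent(
    name="support_agent",
    model="gemini-2.5-flash",
    instruction="Help users track their orders. Use tools for order lookups.",
    tools=[get_order_status],
)
```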
What's next
- Watch Build AI agents on Cloud Run.
- Try the codelab to learn how to build and deploy a LangChain app to Cloud Run.
- Learn how to deploy Agent Development Kit (ADK) to Cloud Run.
- Try the codelab for using an MCP server on Cloud Run with an ADK agent.
- Try the codelab for deploying your ADK agent to Cloud Run with GPU.
- Find ready-to-use agent samples in Agent Development Kit (ADK) samples.
- Host Model Context Protocol (MCP) servers on Cloud Run.