AI use cases on Cloud Run

Whether you're building agents, running inference models, or integrating with various AI services, Cloud Run provides the scalability, flexibility, and ease of use needed to bring your AI innovations to life.

This page highlights some high-level use cases for hosting, building, and deploying AI workloads on Cloud Run.

Why use Cloud Run for AI workloads?

Cloud Run offers several advantages for ensuring your AI applications are scalable, flexible, and manageable. Some highlights include:

  • Flexible container support: Package your app and its dependencies in a container, or use any supported language, library, or framework. Learn more about Cloud Run's Container runtime contract.
  • HTTP endpoint: After deploying a Cloud Run service, you receive a secure, out-of-the-box Cloud Run URL endpoint. Cloud Run supports streaming through HTTP chunked transfer encoding, HTTP/2, and WebSockets; see the streaming sketch after this list.
  • Automatic or manual scaling: By default, Cloud Run automatically scales your service based on demand, even to zero. This ensures you only pay for what you use, making it ideal for unpredictable AI workloads. You can also set your service to manual scaling based on your traffic and CPU utilization needs.
  • GPU support: Accelerate your AI models by configuring Cloud Run resources with GPUs. Cloud Run services with GPUs enabled can scale down to zero for cost savings when not in use.
  • Integrated ecosystem: Seamlessly connect to other Google Cloud services, such as Vertex AI, BigQuery, Cloud SQL, Memorystore, Pub/Sub, AlloyDB for PostgreSQL, Cloud CDN, Secret Manager, and custom domains to build comprehensive end-to-end AI pipelines. Google Cloud Observability also provides built-in monitoring and logging tools to understand application performance and troubleshoot issues effectively.
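
As an illustration of the streaming support noted above, the following is a minimal sketch of a Cloud Run service that streams its response chunk by chunk. It assumes Flask; the route and the stand-in token list are purely illustrative:

    # Minimal streaming sketch, assuming Flask. Cloud Run delivers each
    # yielded chunk to the client via HTTP chunked transfer encoding.
    import os
    import time

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/stream")
    def stream():
        def generate():
            # Stand-in for incremental model output, such as LLM tokens.
            for token in ["Hello", " from", " Cloud", " Run"]:
                yield token
                time.sleep(0.1)
        return Response(generate(), mimetype="text/plain")

    if __name__ == "__main__":
        # Cloud Run injects the PORT environment variable.
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))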

Key AI use cases

Here are some ways you can use Cloud Run to power your AI applications:

Host AI agents and bots

Cloud Run is an ideal platform for hosting the backend logic for AI agents, chatbots, and virtual assistants. These agents can orchestrate calls to AI models like Gemini on Vertex AI, manage state, and integrate with various tools and APIs.
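
For example, the following minimal sketch shows such a backend calling Gemini on Vertex AI. It assumes Flask and the google-genai SDK; the route, model name, and environment variables are illustrative, not a prescribed setup:

    # Minimal agent-backend sketch, assuming Flask and the google-genai SDK.
    import os

    from flask import Flask, jsonify, request
    from google import genai

    app = Flask(__name__)

    # With vertexai=True, the client targets Vertex AI; it reads the
    # GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION environment variables
    # (set these on your Cloud Run service).
    client = genai.Client(vertexai=True)

    @app.route("/chat", methods=["POST"])
    def chat():
        prompt = request.get_json()["prompt"]
        response = client.models.generate_content(
            model="gemini-2.0-flash",  # illustrative model name
            contents=prompt,
        )
        return jsonify({"reply": response.text})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))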

  • Microservices for agents: Deploy individual agent capabilities as separate Cloud Run services. See Host AI agents to learn more.
  • Agent2Agent (A2A) communication: Build collaborative agent systems using the A2A protocol. See Host A2A agents to learn more.
  • Model Context Protocol (MCP) servers: Implement MCP servers to provide standardized context to LLMs from your tools and data sources. See Host MCP servers to learn more.
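
As a sketch of the MCP server pattern, the following assumes the official mcp Python SDK (FastMCP) and its streamable HTTP transport; the server name, tool, and host/port settings are hypothetical examples:

    # Minimal MCP server sketch, assuming the official `mcp` Python SDK.
    import os

    from mcp.server.fastmcp import FastMCP

    # Host/port settings are assumptions for serving on Cloud Run, which
    # injects the PORT environment variable.
    mcp = FastMCP(
        "inventory",  # hypothetical server name
        host="0.0.0.0",
        port=int(os.environ.get("PORT", 8080)),
    )

    @mcp.tool()
    def count_items(category: str) -> int:
        """Hypothetical tool: return the number of items in a category."""
        return {"widgets": 42, "gadgets": 7}.get(category, 0)

    if __name__ == "__main__":
        # Streamable HTTP exposes the server over the Cloud Run URL.
        mcp.run(transport="streamable-http")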

Serve AI/ML models for inference

Deploy your trained machine learning models as scalable HTTP endpoints.

  • Real-time inference: Serve predictions from models built with frameworks like TensorFlow, PyTorch, or scikit-learn, or use open models like Gemma. See Run Gemma 3 on Cloud Run for an example.
  • GPU acceleration: Use NVIDIA GPUs to accelerate inference for more demanding models. See Configure GPU for services to learn more.
  • Integrate with Vertex AI: Serve models trained or deployed on Vertex AI, using Cloud Run as a scalable frontend.
  • Decouple large model files from your container: The Cloud Storage FUSE adapter lets you mount a Cloud Storage bucket and makes it accessible as a local directory inside your Cloud Run container.
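
The following minimal sketch ties these ideas together: it serves real-time predictions from a scikit-learn model whose file lives in a Cloud Storage bucket mounted at /mnt/models. The paths, input shape, and use of joblib are assumptions for illustration:

    # Minimal inference sketch, assuming a joblib-serialized scikit-learn
    # model in a Cloud Storage bucket mounted via Cloud Storage FUSE.
    import os

    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load once per container instance so warm instances reuse the model.
    MODEL_PATH = os.environ.get("MODEL_PATH", "/mnt/models/model.joblib")
    model = joblib.load(MODEL_PATH)

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]  # e.g. [[1.0, 2.0, 3.0]]
        prediction = model.predict(features)
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))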

Build Retrieval-Augmented Generation (RAG) systems

Build RAG applications by connecting Cloud Run services to your data sources.

  • Vector databases: Connect to vector databases hosted on Cloud SQL (with pgvector), AlloyDB for PostgreSQL, Memorystore for Redis, or other specialized vector stores to retrieve relevant context for your LLMs; a retrieval sketch follows this list. See an infrastructure example of using Cloud Run to host a RAG-capable generative AI application with data processing using Vertex AI and Vector Search.
  • Data access: Fetch data from Cloud Storage, BigQuery, Firestore, or other APIs to enrich prompts.
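
As a retrieval sketch, the following queries a PostgreSQL database with the pgvector extension (for example, on Cloud SQL) and uses the results to enrich a prompt. The table and column names and the embed() helper are hypothetical:

    # Minimal RAG retrieval sketch, assuming psycopg2, a pgvector-enabled
    # PostgreSQL database, and a documents(content, embedding) table.
    import psycopg2

    def embed(text: str) -> list[float]:
        """Hypothetical helper: call your embedding model of choice."""
        raise NotImplementedError

    def retrieve_context(conn, question: str, k: int = 5) -> list[str]:
        # pgvector expects a vector literal such as '[0.1,0.2,0.3]'.
        query_vec = "[" + ",".join(str(x) for x in embed(question)) + "]"
        with conn.cursor() as cur:
            # pgvector's <-> operator orders rows by vector distance.
            cur.execute(
                "SELECT content FROM documents"
                " ORDER BY embedding <-> %s::vector LIMIT %s",
                (query_vec, k),
            )
            return [row[0] for row in cur.fetchall()]

    def build_prompt(question: str, context: list[str]) -> str:
        # Enrich the prompt with retrieved passages before calling the LLM.
        return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"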

Host AI-powered APIs and backends

Create APIs and microservices that embed AI capabilities.

  • Smart APIs: Develop APIs that use LLMs for natural language understanding, sentiment analysis, translation, summarization, and so forth.
  • Automated workflows: Build services that trigger AI-driven actions based on events or requests.
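
For the event-driven case, the following minimal sketch handles a Pub/Sub push subscription, whose requests wrap the payload in a base64-encoded message envelope; the analyze() step is a hypothetical stand-in for the AI-driven action:

    # Minimal event-driven sketch, assuming this Cloud Run service is the
    # endpoint of a Pub/Sub push subscription.
    import base64
    import os

    from flask import Flask, request

    app = Flask(__name__)

    def analyze(text: str) -> None:
        """Hypothetical AI action, e.g. summarize the text and store it."""

    @app.route("/", methods=["POST"])
    def handle_event():
        envelope = request.get_json()
        # Pub/Sub push bodies look like {"message": {"data": "<base64>"}}.
        data = base64.b64decode(envelope["message"]["data"]).decode("utf-8")
        analyze(data)
        return ("", 204)  # Ack the message with a success status.

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))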

Prototype and experiment with ideas

Rapidly iterate on AI ideas.

  • Rapid deployment: Quickly move prototypes from environments like Vertex AI Studio, Google AI Studio, or Jupyter notebooks to scalable deployments on Cloud Run with minimal configuration.
  • Traffic splitting: Use Cloud Run's traffic splitting feature to A/B test different models, prompts, or configurations, and use Google Cloud Observability to monitor metrics such as latency, error rate, and cost that measure the success of your A/B tests.

What's next

Depending on your familiarity with AI concepts and your AI use case, explore the Cloud Run AI resources.
