About maximum instances

By default, the maximum number of instances for a Cloud Run service is capped by your regional quotas, and the cap depends on the CPU, memory, and GPU configuration of the service. Specifically, the maximum number of instances available to your service is the smallest of the following values:

  • Regional CPU quota divided by the CPU configuration for the service.
  • Regional memory quota divided by the memory configuration for the service.
  • Regional GPU quota, with or without zonal redundancy, divided by the GPU configuration for the service.

For example, with a baseline quota of 1000 vCPU and 2000 GiB of memory, a service configured with 1 vCPU and 2 GiB of memory per instance can specify a maximum of 1000 instances (1000 vCPU / 1 vCPU = 1000, and 2000 GiB / 2 GiB = 1000).
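The calculation above can be written as a short Python sketch. The function name and parameters are hypothetical, and rounding each quotient down to a whole number of instances is an assumption. Pass whichever GPU quota (with or without zonal redundancy) applies to your service.

```python
import math
from typing import Optional


def max_instances_from_quota(
    cpu_quota: float,
    memory_quota_gib: float,
    cpu_per_instance: float,
    memory_per_instance_gib: float,
    gpu_quota: Optional[int] = None,
    gpu_per_instance: int = 0,
) -> int:
    """Hypothetical helper: the quota-derived ceiling on instances for one service."""
    limits = [
        # Whole instances that fit in the regional CPU quota (rounding down is assumed).
        math.floor(cpu_quota / cpu_per_instance),
        # Whole instances that fit in the regional memory quota.
        math.floor(memory_quota_gib / memory_per_instance_gib),
    ]
    # The GPU term applies only to services configured with GPUs.
    if gpu_per_instance > 0:
        if gpu_quota is None:
            raise ValueError("gpu_quota is required when the service uses GPUs")
        limits.append(gpu_quota // gpu_per_instance)
    # The effective ceiling is the smallest of the applicable limits.
    return min(limits)


# The example from the text: 1000 vCPU and 2000 GiB quota, 1 vCPU and 2 GiB per instance.
print(max_instances_from_quota(1000, 2000, 1, 2))  # 1000
```

Doubling the memory to 4 GiB per instance would halve the memory term to 500, which then becomes the binding limit.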

Usage against these quotas is counted as the sum of all in-use resources across all of your Cloud Run resources in the region. If your total usage reaches one of these limits, your services might fail to scale up and new job executions might fail to start.
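Because the quota is shared, the headroom left for any one service depends on what everything else in the region is using. This sketch assumes you already know the in-use CPU and memory of each Cloud Run resource; the `InUse` type and the helper are hypothetical and do not correspond to any Cloud Run API.

```python
from dataclasses import dataclass


@dataclass
class InUse:
    """Resources currently consumed by one Cloud Run service or job execution."""
    cpu: float
    memory_gib: float


def instances_that_still_fit(
    in_use: list[InUse],
    cpu_quota: float,
    memory_quota_gib: float,
    cpu_per_instance: float,
    memory_per_instance_gib: float,
) -> int:
    """Hypothetical helper: how many more instances fit in the shared regional quota."""
    # Quota usage is the sum across every Cloud Run resource in the region.
    free_cpu = max(cpu_quota - sum(r.cpu for r in in_use), 0.0)
    free_memory = max(memory_quota_gib - sum(r.memory_gib for r in in_use), 0.0)
    # Whichever resource runs out first caps further scale-up.
    return int(min(free_cpu // cpu_per_instance, free_memory // memory_per_instance_gib))


# A service and a job together use 800 vCPU and all 2000 GiB of memory.
usage = [InUse(cpu=600, memory_gib=1200), InUse(cpu=200, memory_gib=800)]
print(instances_that_still_fit(usage, 1000, 2000, 1, 2))  # 0 (memory quota is exhausted)
```

Even though 200 vCPU remain free, no new 1 vCPU / 2 GiB instance can start because memory is the exhausted resource.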

You can view the baseline per-region CPU, memory, and GPU quota limits for your region on the Quotas page in the Google Cloud console.

How to increase baseline regional quota

If you need a greater total amount of CPU, memory, or GPU in the region where your Cloud Run service is deployed, you can request a quota increase.

Best practices for setting maximum instances

The following sections describe best practices for configuring maximum instance limits for your services.

Optimal maximum instance value for event-driven services

Event-driven services, such as functions, can experience sporadic traffic spikes based on incoming events. To determine an optimal maximum instance value for these services, consider factors such as service invocation time, expected average invocation rate, peak invocation frequency, and how tolerant you are of invocation failures.
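One way to combine invocation time and peak invocation frequency into a starting estimate is Little's law, which this sketch swaps in as an assumption; the page itself does not prescribe a formula. The helper name, the per-instance concurrency, and the headroom factor are hypothetical, so treat the result as a rough estimate to compare against the failures you actually observe.

```python
import math


def estimate_peak_instances(
    peak_invocations_per_sec: float,
    avg_invocation_sec: float,
    concurrency_per_instance: int = 1,
    headroom: float = 1.0,
) -> int:
    """Hypothetical helper: rough instance count needed at peak load."""
    # Little's law: average invocations in flight = arrival rate x time each one takes.
    in_flight = peak_invocations_per_sec * avg_invocation_sec
    # Each instance is assumed to handle up to `concurrency_per_instance` at once;
    # a `headroom` above 1.0 leaves room for bursts beyond the measured peak.
    return max(1, math.ceil(in_flight * headroom / concurrency_per_instance))


print(estimate_peak_instances(20, 0.5))                              # 10
print(estimate_peak_instances(20, 0.5, concurrency_per_instance=4))  # 3
```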

A good rule of thumb is to start with a maximum instances value of 3, then monitor for invocation failures and adjust the maximum instances value upward as necessary.
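The rule of thumb amounts to a simple feedback loop. In this sketch the function, the tolerated failure rate, the step size, and the ceiling are all hypothetical; the failure counts are assumed to come from your own monitoring and to include only failures caused by the instance limit.

```python
def next_max_instances(
    current: int,
    failed_invocations: int,
    total_invocations: int,
    tolerated_failure_rate: float = 0.0,
    step: int = 1,
    ceiling: int = 100,
) -> int:
    """Hypothetical policy: raise the limit when capacity-related failures exceed tolerance."""
    if total_invocations == 0:
        return current  # Nothing observed yet; keep the current value.
    failure_rate = failed_invocations / total_invocations
    if failure_rate > tolerated_failure_rate:
        # Adjust upward only, and never past a ceiling such as the quota-derived limit.
        return min(current + step, ceiling)
    return current


limit = 3  # Starting value suggested by the rule of thumb.
for failed, total in [(4, 200), (1, 220), (0, 250)]:
    limit = next_max_instances(limit, failed, total, tolerated_failure_rate=0.01)
print(limit)  # 4
```

The first observation window exceeds the 1% tolerance and raises the limit to 4; the later windows stay within tolerance, so the limit holds.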

Handle requests when all instances are busy

Under normal circumstances, your service scales up by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, you might encounter a scenario where there aren't enough instances to meet the incoming traffic load.

In that scenario, Cloud Run attempts to serve a new inbound request for up to 30 seconds:

  • If an instance finishes processing its request during this time period, it might start to process the new inbound request.
  • If no instance becomes available, the request will fail.
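The wait-then-fail behavior can be modeled with a small simulation. Only the 30-second window comes from the text; the function name, the one-request-per-instance assumption, and first-come-first-served assignment are simplifications for illustration, not a description of how Cloud Run actually routes pending requests.

```python
import heapq


def serve_pending(
    arrivals: list[float],
    instance_free_at: list[float],
    service_sec: float,
    timeout_sec: float = 30.0,
) -> list[bool]:
    """Hypothetical model: which requests get an instance within the wait window.

    Assumes the service is already at its maximum instances, each instance handles
    one request at a time, and waiting requests are assigned in arrival order.
    """
    if not instance_free_at:
        raise ValueError("at least one instance is required")
    free_at = list(instance_free_at)
    heapq.heapify(free_at)  # Min-heap keyed on when each instance becomes free.
    served = []
    for t in sorted(arrivals):
        start = max(t, free_at[0])  # Earliest moment an instance can take this request.
        if start - t <= timeout_sec:
            heapq.heapreplace(free_at, start + service_sec)  # That instance is busy again.
            served.append(True)
        else:
            served.append(False)  # No instance freed up within the window: request fails.
    return served


# Two instances, busy until t=10 s and t=50 s; each request takes 20 s.
print(serve_pending([0, 1, 2], [10, 50], service_sec=20))  # [True, True, False]
```

The third request would have to wait 48 seconds for an instance, which is longer than the 30-second window, so it fails.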

Cloud Run automatically saves events destined for event-driven services until capacity is available.

Maximum instance limits that exceed Cloud Run's scaling ability

When you specify a maximum instances limit, you are specifying an upper limit. Setting a large limit does not mean that your service will scale up to the specified number of instances. It only means that the number of instances that coexist at any point in time shouldn't exceed the limit.

Further, setting a maximum instances limit might affect the scaling strategies that Cloud Run uses to meet your traffic demand. In general, Cloud Run prioritizes honoring your specified limit over scaling up and potentially exceeding it.
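A simplified model makes the "upper limit, not a target" distinction concrete. This sketch assumes the instance count tracks request concurrency alone, which ignores the other factors Cloud Run's autoscaler takes into account; the function and parameter names are hypothetical.

```python
import math


def expected_instances(concurrent_requests: int, concurrency_per_instance: int, max_instances: int) -> int:
    """Hypothetical model: instances driven by demand, capped by the maximum."""
    # Instances needed if each one serves up to `concurrency_per_instance` requests.
    needed = math.ceil(concurrent_requests / concurrency_per_instance)
    # The maximum is a ceiling, not a target: demand below it uses fewer instances.
    return min(needed, max_instances)


print(expected_instances(120, 80, max_instances=100))     # 2
print(expected_instances(12_000, 80, max_instances=100))  # 100
```

With a limit of 100, light traffic still uses only 2 instances; only when demand would need 150 instances does the limit take effect.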

Handle traffic spikes

In some cases, such as rapid traffic surges, Cloud Run might, for a short period of time, create more instances than the specified maximum instances limit. If your service can't tolerate this temporary behavior, factor in a safety margin by setting the maximum instances value lower than the number of instances your service can actually tolerate.
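For example, if the limit exists to protect a downstream system, you can derive the tolerable instance count from that system's capacity and then subtract a margin. The database scenario, the connection figures, and the 25% margin below are hypothetical, and the size of any temporary overshoot is not documented, so the margin is a judgment call.

```python
def safe_max_instances(downstream_limit: int, usage_per_instance: int, margin_percent: int) -> int:
    """Hypothetical helper: leave room for a brief overshoot of the maximum instances."""
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in [0, 100)")
    # Instances the service can tolerate, e.g. database connections / connections per instance.
    tolerable = downstream_limit // usage_per_instance
    # Integer arithmetic keeps the result exact; round down to stay on the safe side.
    return max(1, tolerable * (100 - margin_percent) // 100)


# A database that allows 200 connections, 10 connections per instance, 25% margin.
print(safe_max_instances(200, 10, 25))  # 15
```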

Deployments

When you deploy a new revision, Cloud Run migrates traffic from the earlier revision to the new one. Because maximum instance limits are set for each revision independently, you might temporarily exceed the specified limit during the period after deployment.

For example, a service might have a maximum instances limit of 5. Under normal circumstances, the service scales up to 5 instances as it handles requests. When you deploy a new revision, the new revision has its own maximum instances limit of 5.

Requests that are already being handled by the previous revision aren't interrupted when you deploy a new revision. Instead, these requests continue to make progress. New inbound requests are handled by the newly deployed revision of your service.

Thus, the service in the previous example might have up to 10 total instances (5 for each revision) during the period after deploying the new revision. The amount of time required for instances of the previous revision to terminate depends on how long those instances take to finish handling any active requests. Take this into account as well when selecting an appropriate maximum instances limit.
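If the service must never exceed a fixed total across a deployment, one approach is to budget the per-revision limit for the overlap. The helpers below are hypothetical, assume at most two revisions hold instances at the same time by default, and don't account for the short-lived overshoot during rapid traffic surges.

```python
def per_revision_max_instances(total_tolerable: int, overlapping_revisions: int = 2) -> int:
    """Hypothetical helper: split a hard cap across revisions that can coexist."""
    if overlapping_revisions < 1:
        raise ValueError("overlapping_revisions must be at least 1")
    # Each revision enforces its own limit, so the worst case is their sum.
    return max(1, total_tolerable // overlapping_revisions)


def peak_during_rollout(revision_limits: list[int]) -> int:
    """Worst-case instance count while old and new revisions both have instances."""
    return sum(revision_limits)


print(peak_during_rollout([5, 5]))     # 10
print(per_revision_max_instances(10))  # 5
```

The first call reproduces the 5 + 5 = 10 case from the example; the second works backward from a total of 10 to a per-revision limit of 5.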

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.