Set maximum instances for services

This page describes how to set the maximum number of instances that can be used for your Cloud Run service using the default Cloud Run autoscaling behavior. To manually scale your service, see Manual scaling.

Specifying maximum instances in Cloud Run lets you limit the scaling of your service in response to incoming requests, although this maximum setting can be exceeded for a brief period due to circumstances such as traffic spikes.

You can use this setting as a way to control your costs or to limit the number of connections to a backing service, such as to a database.

For information about the maximum instance limits that might apply to your service, refer to Maximum instances limits.

For more information on the way Cloud Run autoscales container instances, refer to Instance autoscaling.

Apply maximum instances at service-level versus revision-level

You can configure maximum instances at the service level or at the revision level. Google recommends that you use service-level maximum instances unless you have a specific need to limit instances at the revision level.

When applying maximum instances, the settings go into effect as follows:

  • Service-level: immediately
  • Revision-level: upon deployment of the revision

Tagged revisions and service-level maximum instances

Tagged revisions are started, but they count toward the service-level maximum instances only if they are a part of a traffic split.

Required roles

To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

If you are deploying a service or function from source code, you must also have additional roles granted to you on your project and Cloud Build service account.

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.

Configure service-level maximum instances

You can change the maximum instances setting using the Google Cloud console, the Google Cloud CLI, YAML, or Terraform when you create a new service or deploy a new revision.

Console

  1. In the Google Cloud console, go to the Cloud Run Services page.

  2. If you are configuring a new service, click Deploy container to display the Create service form. Locate the Service scaling form.

  3. If you are configuring an existing service, click the service to display its detail panel, then click Edit service level scaling settings at the top right of the detail panel.

  4. In the field labelled Maximum number of instances, specify the required maximum number of container instances, using any integer value from 1 to the maximum limit possible for your service. To disable the service-level maximum instances, clear any values you set in the Maximum number of instances field.

  5. Click Create for a new service or Deploy for an existing service.

gcloud

You can update the maximum number of instances of a given service by using the following command:

gcloud run services update SERVICE --max MAX-VALUE

Replace the following:

  • SERVICE: the name of your service.
  • MAX-VALUE: the required maximum number of container instances, using any integer value from 1 to the maximum limit possible for your service. To disable service-level maximum instances, set this value to 0.

You can also set the maximum number of instances during deployment using the following command:

gcloud run deploy --image IMAGE_URL --max MAX-VALUE

Replace the following:

  • IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
  • MAX-VALUE: the required maximum number of container instances, using any integer value from 1 to the maximum limit.

YAML

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

  1. If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Update the run.googleapis.com/maxScale: attribute:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: SERVICE
      annotations:
        run.googleapis.com/maxScale: 'MAX-INSTANCE'

    Replace the following:

    • SERVICE: the name of your Cloud Run service
    • MAX-INSTANCE: the required maximum number of container instances, using any integer value from 1 to the maximum limit possible for your service. To disable service-level maximum instances, set this value to 0.
  3. Create or update the service using the following command:

    gcloud run services replace service.yaml

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:

resource "google_cloud_run_v2_service" "default" {
  name     = "SERVICE"
  location = "REGION"

  scaling {
    max_instance_count = MAX_INSTANCE
  }

  template {
    containers {
      image = "IMAGE_URL"
    }
  }
}

Replace the following:

  • SERVICE: the name of your Cloud Run service.
  • REGION: the Google Cloud region, for example, europe-west1.
  • MAX_INSTANCE: the required maximum number of container instances, using any integer value from 1 to the maximum limit.
  • IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.

View service-level maximum instances

To view the current service-level maximum instances settings for your Cloud Run service:

Console

  1. In the Google Cloud console, go to the Cloud Run Services page.

  2. Click the service you want to inspect to open its Service details panel.

  3. View the current setting in the upper right of the service details panel, next to Scaling.

gcloud

  1. Use the following command:

    gcloud run services describe SERVICE
  2. Locate the value for Scaling: Auto (Min: MIN_VALUE, Max: MAX_VALUE) in the returned configuration.

Configure revision-level maximum instances

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

By default, Cloud Run revisions are configured to scale up to a maximum of 100 instances.

You can change the maximum instances setting using the Google Cloud console, the Google Cloud CLI, or a YAML file when you create a new service or deploy a new revision.

Console

  1. In the Google Cloud console, go to Cloud Run.

  2. In the services list, find and click the service you want to update to open that service's details.

  3. Click Edit and deploy new revision to display the revision deployment form.

  4. Click the Container tab.

  5. Locate the Revision scaling section. In the field labelled Maximum number of instances, specify the maximum number of container instances.

  6. Click Deploy.

gcloud

You can update the maximum number of instances of a given service by using the following command:

gcloud run services update SERVICE --max-instances MAX-VALUE

Replace the following:

  • SERVICE: the name of your service.
  • MAX-VALUE: the required maximum number of container instances, using any integer value from 1 to the maximum limit.

YAML

  1. If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Update the autoscaling.knative.dev/maxScale: attribute:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: SERVICE
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/maxScale: 'MAX-INSTANCE'
          name: REVISION

    Replace the following:

    • SERVICE: the name of your Cloud Run service
    • MAX-INSTANCE: the required maximum number of container instances, using any integer value from 1 to the maximum limit.
    • REVISION: replace with a new revision name, or delete it (if present). If you supply a new revision name, it must meet the following criteria:
      • Starts with SERVICE-
      • Contains only lowercase letters, numbers, and -
      • Does not end with a -
      • Does not exceed 63 characters
  3. Create or update the service using the following command:

    gcloud run services replace service.yaml
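
The revision name criteria listed in step 2 can be checked locally before you run the replace command. The following is a minimal sketch in Python, not part of any Cloud Run tooling; the function name is_valid_revision_name and the sample names are hypothetical, and the check covers only the four criteria stated on this page.

```python
import re

MAX_NAME_LENGTH = 63  # limit stated in the criteria on this page


def is_valid_revision_name(service: str, revision: str) -> bool:
    """Return True if `revision` meets the four naming criteria for `service`."""
    # Must start with the service name followed by a hyphen.
    if not revision.startswith(f"{service}-"):
        return False
    # Must not exceed 63 characters.
    if len(revision) > MAX_NAME_LENGTH:
        return False
    # Must not end with a hyphen.
    if revision.endswith("-"):
        return False
    # Only lowercase letters, digits, and hyphens are allowed.
    return re.fullmatch(r"[a-z0-9-]+", revision) is not None


if __name__ == "__main__":
    # Hypothetical service and revision names, for illustration only.
    print(is_valid_revision_name("hello", "hello-00002-abc"))  # True
    print(is_valid_revision_name("hello", "Hello-00002"))      # False
    print(is_valid_revision_name("hello", "hello-"))           # False
```

Cloud Run might enforce further rules when it processes the YAML file, so treat a passing check as a quick sanity test rather than a guarantee.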

View revision-level maximum instance settings

To view the current revision-level maximum instances settings for your Cloud Run service:

Console

  1. In the Google Cloud console, go to the Cloud Run Services page.

  2. Click the service you want to inspect to open its Service details panel.

  3. Click the Revisions tab.

  4. In the details panel at the right, view the Revision max. instances setting listed under the Container tab.

gcloud

  1. Use the following command:

    gcloud run services describe SERVICE
  2. Locate the value for Max instances: in the returned configuration.

Use both service-level and revision-level minimum or maximum instances

The following table shows the behavior if you combine service-level maximum instances with revision-level minimum or maximum instances:

| Service-level setting | Revision-level setting | Behavior |
| --- | --- | --- |
| Service-level maximum instances set | Revision-level maximum instances set | Effective maximum instance limit is the lesser value between revision-level maximum instances and service-level maximum instances. |
| Service-level maximum instances set | Revision-level minimum instances set | If service-level maximum instances is set to a value lower than revision-level minimum instances, then the revision starts instances up to the service-level maximum instances, and won't reach the configured revision-level minimum instances. |
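
The two rows of this table reduce to taking a minimum. The following is a small Python sketch of that arithmetic, assuming both values in each pair are set; the function names and sample numbers are hypothetical and are not part of any Cloud Run API.

```python
def effective_max_instances(service_max: int, revision_max: int) -> int:
    """Lesser of the service-level and revision-level maximum instances."""
    return min(service_max, revision_max)


def reachable_min_instances(service_max: int, revision_min: int) -> int:
    """Revision-level minimum instances, capped by the service-level maximum."""
    return min(service_max, revision_min)


if __name__ == "__main__":
    # Service-level maximum 50, revision-level maximum 100: the limit is 50.
    print(effective_max_instances(50, 100))  # 50
    # Service-level maximum 5, revision-level minimum 10: only 5 are started.
    print(reachable_min_instances(5, 10))    # 5
```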

Use service-level maximum instances with traffic splitting

If you use traffic splitting, the service-level maximum instances are distributed across the revisions based on the proportion of the traffic split. For example, if the service-level maximum instances = 100, a 50/50 traffic split allocates 50 service-level maximum instances to each revision. The following sample configuration scenario shows the resulting behavior:

Sample configuration:

  • Service-level maximum instances set (scenario where there are no revision-level settings): 100
  • Traffic split for Revision A: 10%
  • Traffic split for Revision B: 10%
  • Traffic split for Revision C: 80%

Resulting behavior: A portion of the service-level maximum instances is allocated to each revision. The effective maximum instances for each revision is fixed based on the traffic split. Maximum instances for Revision A is 10, Revision B is 10, and Revision C is 80.

Note: When using traffic splitting, revisions might reach their effective maximum instances limit and return a No available container instances error, even if the service as a whole hasn't reached the service-level maximum instances limit.
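
The proportional allocation in this scenario can be reproduced with a few lines of Python. This is an illustrative sketch only: the function name allocate_max_instances is hypothetical, and rounding down for splits that don't divide evenly is an assumption, because this page only gives examples that divide evenly.

```python
from typing import Dict


def allocate_max_instances(service_max: int, traffic: Dict[str, int]) -> Dict[str, int]:
    """Split service_max across revisions in proportion to traffic percent."""
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    # Assumption: round down when the share is not a whole number.
    return {rev: service_max * pct // 100 for rev, pct in traffic.items()}


if __name__ == "__main__":
    # Scenario from this page: 100 instances split 10% / 10% / 80%.
    print(allocate_max_instances(100, {"A": 10, "B": 10, "C": 80}))
    # {'A': 10, 'B': 10, 'C': 80}

    # 50/50 split of 100 instances.
    print(allocate_max_instances(100, {"A": 50, "B": 50}))
    # {'A': 50, 'B': 50}
```

Because each revision's share is fixed by the split, a revision that receives 10% of traffic in the first example can run out of instances at 10 even while the other revisions are idle, which is the situation the note describes.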

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.