Load testing best practices

This page provides best practices for load testing your Cloud Run service to determine whether it scales successfully during production use, and to find any bottlenecks that prevent it from scaling.

Tests to run before load testing

Identify and address concurrency problems in a development or small test environment before proceeding to load testing. Measure container concurrency before performing a load test, and make sure that your Cloud Run service starts up reliably.

Focus your container tests on small incremental counts in manually scaled runs. You can approximate manual scaling in Cloud Run by setting maximum instances to the value that you wish to scale to.

If you have only recently built or changed your container image, test it independently before performing a load test.

You should also check for other kinds of performance problems, such as excessive latency and CPU utilization, before running a large-scale load test.

Use max instances appropriately

Cloud Run enforces a maximum instances setting to limit the scaling of a service. The default maximum number of instances is 100. If you expect your load test to exceed this default, make sure you work with your account team at Google and set a new maximum. If you do not yet have a relationship with an account team, contact Google Cloud sales.

The maximum number of instances that you can select depends on your CPU limits and memory limits as well as the region you are deploying to.

These limits are governed by a quota and can be raised by submitting a quota limit increase request.

Load test in the region europe-west1

The Google Cloud region europe-west1 offers a high quota limit, so Google recommends load testing in europe-west1. Coordinate with your account team and submit a support case with details of the time and scale of the test if you expect to approach quota limits.

Test an appropriate CPU utilization and service initialization profile

In an ideal scenario, you deploy a test version of your service to Cloud Run and load test it directly. However, in some cases, you might be unable to deploy a test version of your service. For example, your Cloud Run service might be part of a complex ecosystem that is hard to replicate in a test environment.

For these cases, you can approximate the performance of your service by simulating it with a simpler service that has comparable CPU usage and comparable initialization times. Initialization time is particularly important for rapid scaling. Keep in mind that testing with something too simple is also problematic. For example, avoid testing with a simple hello world service that returns received requests without any processing.

Use a test harness to generate loads

You can generate test loads that cause a controlled spike in traffic using a test harness, such as JMeter. You can use the number of JMeter thread groups and the delay between requests in the JMeter test to increase the load.

You can also send simple HTTP requests or you can record a browser session with JMeter. Cloud Run lets you test your service without Internet access by using Developer Authentication. This allows access from a test harness like JMeter, running on a Compute Engine virtual machine attached to a Virtual Private Cloud (VPC) associated with the project.

Do not generate load from tools where the rate and concurrency cannot be controlled. Pub/Sub is a poor choice of tool to generate load because you cannot control the rate of the traffic and number of clients. If you do not know the rate and concurrency, then you will not know what you are testing.
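To illustrate what "controlled rate and concurrency" means in practice, here is a minimal Python sketch of a load generator. It is not a replacement for JMeter. The names run_load and make_http_sender are hypothetical helpers introduced for this example, and the sketch assumes that a fixed request rate with a cap on in-flight requests is the profile you want to test.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_http_sender(url, timeout_sec=10):
    """Return a callable that issues one GET request (hypothetical helper)."""
    def send():
        # urlopen raises on connection errors and on 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
            resp.read()
    return send


def run_load(send, rate_per_sec, concurrency, duration_sec):
    """Call send() at a fixed target rate with a bounded number in flight.

    Returns a list of (scheduled_offset_sec, latency_sec, ok) tuples.
    If send() is slow enough to hit the concurrency cap, the achieved
    rate drops below the target rate.
    """
    results = []
    results_lock = threading.Lock()
    in_flight = threading.Semaphore(concurrency)  # caps concurrent calls
    interval = 1.0 / rate_per_sec
    total = int(rate_per_sec * duration_sec)
    t0 = time.monotonic()

    def one_call(offset):
        start = time.monotonic()
        try:
            send()
            ok = True
        except Exception:
            ok = False
        finally:
            in_flight.release()
        with results_lock:
            results.append((offset, time.monotonic() - start, ok))

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for i in range(total):
            # Wait until this request's scheduled send time.
            delay = t0 + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            in_flight.acquire()  # blocks while the cap is reached
            pool.submit(one_call, i * interval)
    # Leaving the "with" block waits for all submitted calls to finish.
    return results
```

For example, run_load(make_http_sender(SERVICE_URL), rate_per_sec=200, concurrency=50, duration_sec=60) would target 200 requests per second for one minute, where SERVICE_URL is a placeholder for your own service address. Because both parameters are explicit, each run records exactly what load was offered.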

Use detailed log analysis using exported logs

You need a second-by-second analysis of events to understand your Cloud Run service's response to rapid traffic spikes. Log analysis is needed to do this because the granularity of monitoring data is not sufficiently fine-grained. Log analysis also allows you to investigate the reasons for requests with high latency.

When you write logs, you can get better logging performance by writing directly to stdout instead of using a Cloud Logging client library.

To set up a log export before starting the test, create a log sink with the destination BigQuery and an inclusion filter, such as:

resource.type="cloud_run_revision"
resource.labels.service_name="[your app]"

Avoid spurious cold starts

To minimize cold starts experienced by users, set the minimum number of instances to at least 1.

Ensure that your service scales out linearly

Repeat the test at different loads to make sure that your Cloud Run service scales out linearly with load and does not reach a limiting bottleneck at a load less than you expect in production.
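One simple way to compare repeated runs is to check, for each offered load, whether the achieved throughput kept up. The following Python sketch assumes you have recorded one (offered requests per second, achieved requests per second) pair per run. The function name, the 10% tolerance, and the sample numbers are all illustrative assumptions, not measured results.

```python
def first_saturation_point(runs, tolerance=0.1):
    """Return the lowest offered load where throughput stopped keeping up.

    runs is an iterable of (offered_rps, achieved_rps) pairs, one per test.
    A run counts as saturated when achieved throughput falls more than
    `tolerance` (as a fraction) below the offered load.
    Returns None if every run kept up.
    """
    for offered, achieved in sorted(runs):
        if achieved < offered * (1.0 - tolerance):
            return offered
    return None


# Made-up example data: throughput tracks load up to 400 RPS, then flattens.
example_runs = [(100, 99), (200, 198), (400, 390), (800, 560)]
saturated_at = first_saturation_point(example_runs)
```

If the function returns a load lower than your expected production peak, that is a cue to look for a bottleneck before testing at higher loads.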

Analyze and visualize the results in Colaboratory

Use the summary monitoring charts to get a high-level understanding of results to supplement the detailed log analysis using exported logs.

The monitoring charts can help you discover:

  • How quickly, to the nearest second, are new instances created and initialized?
  • How evenly are requests distributed across different instances?
  • How quickly can the latency at different percentiles be drawn down to a steady-state value?

You can use the Google Cloud console user interface for BigQuery to introspect the exported log schema and preview results. Run the queries and plot results using Colab, which has ready integration with BigQuery, Pandas, and Matplotlib. Colab also integrates easily with rich data visualization tools like Seaborn.
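As a rough sketch of the second-by-second analysis, the following Python function groups request records into one-second buckets and reports request count, distinct instances, and latency percentiles for each bucket. It uses only the standard library so that it runs anywhere; in Colab you would more likely do the same grouping with Pandas. The input shape, a list of (timestamp, instance_id, latency_sec) tuples, is an assumption made for this example and is not the exported log schema.

```python
import math
from collections import defaultdict
from datetime import datetime


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty, ascending-sorted list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def per_second_summary(rows):
    """Summarize request records per one-second bucket.

    rows is an iterable of (timestamp, instance_id, latency_sec), where
    timestamp is a datetime. Returns a list of dicts ordered by time.
    """
    buckets = defaultdict(list)
    for ts, instance_id, latency in rows:
        # Truncate to whole seconds to pick the bucket.
        buckets[ts.replace(microsecond=0)].append((instance_id, latency))

    summary = []
    for second in sorted(buckets):
        entries = buckets[second]
        latencies = sorted(lat for _, lat in entries)
        summary.append({
            "second": second,
            "requests": len(entries),
            "instances": len({iid for iid, _ in entries}),
            "p50": percentile(latencies, 50),
            "p99": percentile(latencies, 99),
        })
    return summary
```

Plotting the "instances" and "p99" columns against "second" gives a direct view of how quickly new instances appear and how quickly tail latency settles after a spike.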

Find bottlenecks

Load tests can help you discover the existence of both inefficient code and scaling bottlenecks. Inefficient code leads to higher costs as traffic increases but does not necessarily prevent scaling. A scaling bottleneck, in contrast, caps throughput no matter how many instances are added. For example, a dependency on a database transaction with table-level locking can be a bottleneck that will prevent the Cloud Run service from scaling because only one transaction can execute at a time.
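The effect of a serialized dependency can be estimated with a deliberately simplified model: if every request must hold an exclusive lock for some time, total throughput cannot exceed one request per lock-hold interval, regardless of instance count. The function below is a back-of-the-envelope sketch under that assumption. Its name and parameters are hypothetical, and real systems have queueing effects that this model ignores.

```python
def max_throughput_rps(instances, rps_per_instance, serialized_sec_per_request):
    """Estimate the throughput ceiling for a service with a serialized step.

    instances * rps_per_instance is what scaling out could deliver.
    1 / serialized_sec_per_request is the ceiling imposed by a step that
    only one request can execute at a time (for example, a table lock).
    """
    scale_out_limit = instances * rps_per_instance
    if serialized_sec_per_request <= 0:
        return scale_out_limit  # no serialized step in this model
    lock_limit = 1.0 / serialized_sec_per_request
    return min(scale_out_limit, lock_limit)
```

With a lock held for 10 ms per request, this model gives a ceiling of about 100 requests per second, so going from 2 instances to 10 instances at 50 requests per second each would not raise throughput.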

Check performance as experienced by the client

You can query logs captured by JMeter, where the logs include latencies measured at the client. However, because server testing tools like JMeter are not the same as a browser or mobile client, you may also want to run a test with a browser-based framework, such as Selenium WebDriver, or a mobile client testing framework. Be careful of excessive maximum latencies due to TLS connection initialization that may skew results with outliers.

Summary of best practices

Perform a load test to determine whether migrating to Cloud Run is the right choice and that your service can scale to the maximum expected traffic. Run the test with a harness like JMeter. Export the logs to BigQuery for detailed analysis.

Last updated 2026-02-19 UTC.