Guidelines for load testing backend services with Application Load Balancers

When you integrate a backend service with an Application Load Balancer, it's important to measure the performance of the backend service on its own, in the absence of a load balancer. Load testing under controlled conditions helps you assess capacity-planning trade-offs between different dimensions of performance, such as throughput and latency. Because careful capacity planning could still underestimate actual demand, we recommend that you use load tests to proactively determine how the availability of a service is affected when the system is overloaded.

Load testing goals

A typical load test measures the externally visible behavior of the backend service under different dimensions of load, such as request throughput and response latency.

To assess the health of the server under load, a load test procedure might also collect internal service metrics, such as system resource utilization.

Capacity planning versus overload testing

Load-testing tools help you measure different scaling dimensions individually. For capacity planning, determine the load threshold for acceptable performance in multiple dimensions. For example, instead of measuring the absolute limit of a service's request throughput, consider measuring the following:

  • The request rate that the service can serve with a 99th-percentile latency that is less than a specified number of milliseconds. The number is specified by the SLO of the service.
  • The maximum request rate that doesn't cause system resource utilization to exceed optimal levels. Note that the optimal utilization varies by application and could be significantly less than 100%. For example, at 80% peak memory utilization, the application might be able to handle minor load spikes better than if the peak utilization were at 99%.
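
The first criterion can be made concrete with a small Python sketch. The SLO value and the per-rate latency samples below are hypothetical; the sketch finds the highest tested request rate whose measured 99th-percentile latency stays under the SLO:

```python
import math

def p99(latencies_ms):
    """99th-percentile latency from a list of samples (nearest-rank method)."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]

SLO_P99_MS = 150  # assumed SLO: p99 latency below 150 ms

# Assumed load-test results: request rate (rps) -> latency samples (ms).
results = {
    500:  [110, 120, 125, 130, 135],
    1000: [115, 125, 130, 140, 149],
    1500: [120, 160, 180, 200, 240],
}

# Highest tested rate that still meets the latency SLO.
capacity = max(rps for rps, samples in results.items()
               if p99(samples) < SLO_P99_MS)
print(capacity)  # 1000
```

In this hypothetical data set, 1,500 rps pushes p99 latency past the SLO, so the capacity plan would be based on 1,000 rps rather than on the absolute throughput limit.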

While it's important to use load test results to inform capacity planning decisions, it's equally important to understand how a service behaves when load exceeds capacity. Some server behaviors that are often evaluated by using overload tests are as follows:

  • Load shedding: When a service receives excessive incoming requests or connections, it can respond by slowing down all requests, or by rejecting some requests to maintain acceptable performance for the remaining ones. We recommend the latter approach to prevent client timeouts before receiving a response and to reduce the risk of memory exhaustion by lowering request concurrency on the server.

  • Resilience against resource exhaustion: A service should generally avoid crashing from resource exhaustion, because it's difficult for pending requests to make further progress if the service has crashed. If a backend service has many instances, the robustness of individual instances is vital for the overall availability of the service. While an instance restarts after a crash, other instances might experience more load, potentially causing a cascading failure.

General testing guidelines

While defining your test cases, consider the following guidelines.

Note: The guidelines listed in this section are not applicable to every service, and the list is not exhaustive. For more information, see the best benchmarking practices page maintained by the Envoy project. The page contains many suggestions that are relevant to load testing of any HTTPS service.

Create small-scale tests

Create small-scale tests to measure the performance limits of the server. With excessive server capacity, there's a risk that a test won't reveal the performance limits of the service itself, but might instead uncover bottlenecks in other systems, such as the client hosts or the network layer.

For best results, consider a test case that uses a single virtual machine (VM) instance or a Google Kubernetes Engine (GKE) Pod to test the service independently. To achieve full load on the server, you can use multiple VMs if necessary, but remember that they can complicate the collection of performance data.

Choose open-loop load patterns

Most load generators use the closed-loop pattern to limit the number of concurrent requests and delay new requests until the previous ones are complete. We don't recommend this approach, because production clients of the service might not exhibit such throttling behavior.

In contrast, the open-loop pattern enables load generators to simulate the production load by sending requests at a steady rate, independent of the rate at which server responses arrive.
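
The difference matters most when the server slows down. A simplified Python sketch (with hypothetical timings) shows why: a closed-loop generator's offered load collapses as response time grows, while an open-loop generator keeps offering the target rate:

```python
# Closed loop: a fixed pool of workers, each waiting for a response
# before sending the next request. Offered rate = workers / response_time.
def closed_loop_rps(workers, response_time_s):
    return workers / response_time_s

# Open loop: requests are sent on a fixed schedule, regardless of how
# long responses take.
def open_loop_rps(target_rps, response_time_s):
    return target_rps

# Assumed scenario: the server slows from 0.1 s to 1 s per response
# under stress, with 100 workers or a 100-rps target.
print(closed_loop_rps(100, 0.1), closed_loop_rps(100, 1.0))  # 1000.0 100.0
print(open_loop_rps(100, 0.1), open_loop_rps(100, 1.0))      # 100 100
```

In this scenario, a tenfold latency increase silently cuts the closed-loop offered load tenfold, masking the overload, whereas the open-loop generator continues to apply the intended pressure.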

Run tests using recommended load generators

We recommend the following load generators for load testing the backend service:

Nighthawk

Nighthawk is an open-source tool developed in coordination with the Envoy project. You can use it to generate client load, visualize benchmarks, and measure server performance for most load-testing scenarios of HTTPS services.

Test HTTP/1

To test HTTP/1, use the following command:

nighthawk_client URI \
    --duration DURATION \
    --open-loop \
    --no-default-failure-predicates \
    --protocol http1 \
    --request-body-size REQ_BODY_SIZE \
    --concurrency CONCURRENCY \
    --rps RPS \
    --connections CONNECTIONS

Replace the following:

  • URI: the URI to benchmark
  • DURATION: total test run time in seconds
  • REQ_BODY_SIZE: size of the POST payload in each request
  • CONCURRENCY: the total number of concurrent event loops

    This number should match the core count of the client VM

  • RPS: the target rate of requests per second, per event loop

  • CONNECTIONS: the number of concurrent connections, per event loop
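
Because --rps and --connections are specified per event loop, the aggregate load scales with --concurrency. A small Python sketch, using the values from the example later in this section (16 loops, 500 rps, 200 connections per loop), computes the totals:

```python
# Per-event-loop flag values from the example invocation.
concurrency = 16   # --concurrency: number of event loops
rps = 500          # --rps: requests per second, per loop
connections = 200  # --connections: connections per loop

aggregate_rps = concurrency * rps                  # total target request rate
aggregate_connections = concurrency * connections  # total open connections
print(aggregate_rps, aggregate_connections)  # 8000 3200
```

Keeping this multiplication in mind helps you avoid accidentally offering 16 times the intended load when sizing a test.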

Note: The no-default-failure-predicates flag is new. Use the latest Nighthawk build to test it.

See the following example:

nighthawk_client http://10.20.30.40:80 \
    --duration 600 --open-loop --no-default-failure-predicates \
    --protocol http1 --request-body-size 5000 \
    --concurrency 16 --rps 500 --connections 200

The output of each test run provides a histogram of response latencies. In the example from the Nighthawk documentation, notice that the 99th-percentile latency is approximately 135 microseconds.

Initiation to completion
    samples: 9992
    mean:    0s 000ms 113us
    pstdev:  0s 000ms 061us

    Percentile  Count       Latency
    0           1           0s 000ms 077us
    0.5         4996        0s 000ms 115us
    0.75        7495        0s 000ms 118us
    0.8         7998        0s 000ms 118us
    0.9         8993        0s 000ms 121us
    0.95        9493        0s 000ms 124us
    0.990625    9899        0s 000ms 135us
    0.999023    9983        0s 000ms 588us
    1           9992        0s 004ms 090us

Test HTTP/2

To test HTTP/2, use the following command:

nighthawk_client URI \
    --duration DURATION \
    --open-loop \
    --no-default-failure-predicates \
    --protocol http2 \
    --request-body-size REQ_BODY_SIZE \
    --concurrency CONCURRENCY \
    --rps RPS \
    --max-active-requests MAX_ACTIVE_REQUESTS \
    --max-concurrent-streams MAX_CONCURRENT_STREAMS

Replace the following:

  • URI: the URI to benchmark
  • DURATION: total test run time in seconds
  • REQ_BODY_SIZE: size of the POST payload in each request
  • CONCURRENCY: the total number of concurrent event loops

    This number should match the core count of the client VM

  • RPS: the target rate of requests per second for each event loop

  • MAX_ACTIVE_REQUESTS: the maximum number of concurrent active requests for each event loop

  • MAX_CONCURRENT_STREAMS: the maximum number of concurrent streams allowed on each HTTP/2 connection

See the following example:

nighthawk_client http://10.20.30.40:80 \
    --duration 600 --open-loop --no-default-failure-predicates \
    --protocol http2 --request-body-size 5000 \
    --concurrency 16 --rps 500 \
    --max-active-requests 200 --max-concurrent-streams 1

ab (Apache benchmark tool)

ab is a less flexible alternative to Nighthawk, but it's available as a package on almost every Linux distribution. ab is recommended only for quick and simple tests.

To install ab, use the following commands:

  • On Debian and Ubuntu, run sudo apt-get install apache2-utils.
  • On RedHat-based distributions, run sudo yum install httpd-tools.

After you've installed ab, use the following command to run it:

ab -c CONCURRENCY \
    -n NUM_REQUESTS \
    -t TIMELIMIT \
    -p POST_FILE URI

Replace the following:

  • CONCURRENCY: number of concurrent requests to perform
  • NUM_REQUESTS: number of requests to perform
  • TIMELIMIT: maximum number of seconds to spend on requests
  • POST_FILE: local file containing the HTTP POST payload
  • URI: the URI to benchmark

See the following example:

ab -c 200 -n 1000000 -t 600 -p body http://10.20.30.40:80

The command in the preceding example sends requests with a concurrency of 200 (closed-loop pattern), and stops after either 1,000,000 (one million) requests or 600 seconds of elapsed time. The command also includes the contents of the file body as an HTTP POST payload.
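
Because ab is a closed-loop generator, its achievable throughput is capped by Little's law: with concurrency c and mean response time t, it cannot exceed c / t requests per second. A quick sketch using the example's concurrency of 200 and an assumed 10 ms mean latency:

```python
# Little's law bound for a closed-loop generator:
#   max_rps = concurrency / mean_latency_seconds
concurrency = 200       # from the example command (-c 200)
mean_latency_s = 0.010  # assumed 10 ms mean response time

max_rps = concurrency / mean_latency_s
print(max_rps)  # 20000.0
```

If the server's latency rises during the test, this cap drops proportionally, which is the throttling behavior that makes closed-loop tools unsuitable for overload testing.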

The ab command produces response latency histograms similar to those of Nighthawk, but its resolution is limited to milliseconds instead of microseconds:

Percentage of the requests served within a certain time (ms)
    50%     7
    66%     7
    75%     7
    80%     7
    90%    92
    95%   121
    98%   123
    99%   127
   100%   156 (longest request)


Last updated 2025-12-15 UTC.