Guidelines for load testing backend services with Application Load Balancers
When you integrate a backend service with an Application Load Balancer, it's important to measure the performance of the backend service on its own, in the absence of a load balancer. Load testing under controlled conditions helps you assess capacity-planning trade-offs between different dimensions of performance, such as throughput and latency. Because careful capacity planning can still underestimate actual demand, we recommend that you use load tests to proactively determine how the availability of a service is affected when the system is overloaded.
Load testing goals
A typical load test measures the externally visible behavior of the backend service under different dimensions of load. Some of the most relevant dimensions of this testing are as follows:
- Request throughput: The number of requests served per second.
- Request concurrency: The number of requests processed concurrently.
- Connection throughput: The number of connections initiated by clients per second. Most services that use Transport Layer Security (TLS) have some network transport and TLS negotiation overhead associated with each connection that is independent of request processing.
- Connection concurrency: The number of client connections processed concurrently.
  Note: This number might be lower than the request concurrency because HTTPS clients often reuse connections for multiple requests. Note that connections consume system resources (such as RAM and file handles) even when they are not serving requests.
- Request latency: The total elapsed time between the beginning of the request and the end of the response.
- Error rate: How often requests cause errors, such as HTTP 5xx errors and prematurely closed connections.
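As a rough sketch, the error-rate dimension can be computed from an access log after a test run. The log format, file name, and entries below are illustrative assumptions, not from this page:

```shell
#!/bin/sh
# Estimate the HTTP 5xx error rate from a minimal access log in which
# the status code is the last field. The log contents are made up.
cat >access.log <<'EOF'
GET /a 200
GET /b 503
GET /c 200
GET /d 500
EOF

awk '{ total++; if ($NF >= 500 && $NF < 600) errors++ }
     END { printf "error_rate=%.2f%%\n", 100 * errors / total }' access.log
# prints: error_rate=50.00%
```

A real server's log format differs, but the idea is the same: divide the count of 5xx responses by the total request count for the test window.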
To assess the health of the server under load, a load test procedure might also collect the following internal service metrics:
- Use of system resources: System resources, such as CPU, RAM, and file handles (sockets), are typically expressed as a percentage.
  The importance of these metrics differs based on how the service is implemented. Applications experience reduced performance, shed load, or crash when they exhaust their resources. Therefore, it becomes essential to determine the availability of resources when a host is under heavy load.
- Use of other bounded resources: Non-system resources that could be depleted under load, such as at the application layer.
Some examples of such resources include the following:
- A bounded pool of worker threads or processes.
- For an application server that uses threads, it's common to limit the number of worker threads operating concurrently. Thread pool size limits are useful for preventing memory and CPU exhaustion, but default settings are often very conservative. Limits that are too low might prevent adequate use of system resources.
- Some servers use process pools instead of thread pools. For example, an Apache server set up with the Prefork Multi-Processing Module assigns one process to each client connection, so the size limit of the pool determines the upper bound on connection concurrency.
- A service deployed as a frontend to another service that has a backend connection pool of bounded size.
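As a hedged illustration of collecting these internal metrics while a test runs, the following sketch samples CPU and memory use plus the open file descriptor count for one Linux process. The PID argument, sample count, and interval are placeholders, not from this page:

```shell
#!/bin/sh
# Sample CPU %, memory %, and open file descriptors for one process
# while a load test runs elsewhere. Defaults are illustrative only.
PID="${1:-$$}"        # process to watch (defaults to this shell)
SAMPLES="${2:-3}"     # number of samples to take
INTERVAL="${3:-1}"    # seconds between samples

i=0
while [ "$i" -lt "$SAMPLES" ]; do
  # %cpu and %mem come from ps; the fd count comes from /proc/PID/fd
  cpu_mem="$(ps -o %cpu=,%mem= -p "$PID")" || break
  fds="$(ls "/proc/$PID/fd" 2>/dev/null | wc -l)"
  printf '%s pid=%s cpu/mem=%s fds=%s\n' "$(date +%s)" "$PID" "$cpu_mem" "$fds"
  i=$((i + 1))
  sleep "$INTERVAL"
done
```

Run it in the background on the server host during the test, then correlate the timestamps with the load generator's output.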
Capacity planning versus overload testing
Load-testing tools help you measure different scaling dimensions individually. For capacity planning, determine the load threshold for acceptable performance in multiple dimensions. For example, instead of measuring the absolute limit of a service's request throughput, consider measuring the following:
- The request rate that the service can serve with a 99th-percentile latency that is less than a specified number of milliseconds. The number is specified by the SLO of the service.
- The maximum request rate that doesn't cause system resource utilization to exceed optimal levels. Note that the optimal utilization varies by application and could be significantly less than 100%. For example, at 80% peak memory utilization, the application might be able to handle minor load spikes better than if the peak utilization were at 99%.
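For example, a quick offline check of a latency sample against such an SLO threshold might look like the following sketch. The sample values, the 30 ms threshold, and the nearest-rank percentile method are illustrative assumptions:

```shell
#!/bin/sh
# Check whether the 99th-percentile latency in a sample stays under an
# SLO threshold. Values and threshold are made up for illustration.
SLO_MS=30                                              # assumed SLO, in ms
printf '%s\n' 12 15 9 22 18 11 25 14 >latencies_ms.txt # fake sample data

sort -n latencies_ms.txt | awk -v slo="$SLO_MS" '
  { v[NR] = $1 }
  END {
    # simplistic nearest-rank p99; real tools interpolate
    idx = int(NR * 0.99); if (idx < 1) idx = 1
    p99 = v[idx]
    printf "p99=%sms slo=%sms %s\n", p99, slo, (p99 <= slo ? "PASS" : "FAIL")
  }'
# prints: p99=22ms slo=30ms PASS
```

In practice you would feed this the latency column exported by the load generator rather than a hand-written file.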
While it's important to use load test results to inform capacity planning decisions, it's equally important to understand how a service behaves when load exceeds capacity. Some server behaviors that are often evaluated using overload tests are as follows:
- Load shedding: When a service receives excessive incoming requests or connections, it could respond by slowing down all requests, or by rejecting some requests to maintain acceptable performance for the remaining ones. We recommend the latter approach to prevent client timeouts before receiving a response and to reduce the risk of memory exhaustion by lowering request concurrency on the server.
- Resilience against resource exhaustion: A service should generally avoid crashing from resource exhaustion, because pending requests can't make further progress if the service has crashed. If a backend service has many instances, the robustness of individual instances is vital for the overall availability of the service. While an instance restarts after a crash, other instances might experience more load, potentially causing a cascading failure.
General testing guidelines
While defining your test cases, consider the following guidelines.
Note: The guidelines listed in this section are not applicable to every service, and the list is not exhaustive. For more information, see the best benchmarking practices page maintained by the Envoy project. The page contains many suggestions that are relevant to load testing of any HTTPS service.

Create small-scale tests
Create small-scale tests to measure the performance limits of the server. With excessive server capacity, there's a risk that a test won't reveal the performance limits of the service itself, but might uncover bottlenecks in other systems, such as the client hosts or the network layer.
For best results, consider a test case that uses a single virtual machine (VM) instance or a Google Kubernetes Engine (GKE) Pod to independently test the service. To achieve full load on the server, you can use multiple VMs if necessary, but remember that they can complicate the collection of performance data.
Choose open-loop load patterns
Most load generators use the closed-loop pattern to limit the number of concurrent requests and delay new requests until the previous ones are complete. We don't recommend this approach because production clients of the service might not exhibit such throttling behavior.

In contrast, the open-loop pattern enables load generators to simulate the production load by sending requests at a steady rate, independent of the rate at which server responses arrive.
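The difference between the two patterns can be sketched as a toy illustration. This is not a real load generator; the fake_request function and all counts and delays are stand-ins:

```shell
#!/bin/sh
# Toy contrast between closed-loop and open-loop request generation.
# fake_request is a stand-in for a real HTTP request.
fake_request() { sleep 0.01; echo "response"; }

# Closed-loop: each new request waits for the previous response,
# so the offered load adapts to how fast the server answers.
closed_loop() {
  n=0
  while [ "$n" -lt 5 ]; do
    fake_request >/dev/null   # blocks until the "response" arrives
    n=$((n + 1))
  done
}

# Open-loop: requests launch on a fixed schedule, in the background,
# regardless of whether earlier responses have arrived.
open_loop() {
  n=0
  while [ "$n" -lt 5 ]; do
    fake_request >/dev/null & # fire at a steady rate, don't wait
    n=$((n + 1))
    sleep 0.005               # fixed inter-arrival time
  done
  wait
}

closed_loop
open_loop
```

The key point: in the open-loop function, a slow server doesn't reduce the request arrival rate, so queues build up on the server exactly as they would with real, non-throttling clients.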
Run tests using recommended load generators
We recommend the following load generators for load testing of the backend service:
Nighthawk
Nighthawk is an open-source tool developed in coordination with the Envoy project. You can use it to generate client load, visualize benchmarks, and measure server performance for most load-testing scenarios of HTTPS services.
Test HTTP/1
To test HTTP/1, use the following command:
    nighthawk_client URI \
        --duration DURATION \
        --open-loop \
        --no-default-failure-predicates \
        --protocol http1 \
        --request-body-size REQ_BODY_SIZE \
        --concurrency CONCURRENCY \
        --rps RPS \
        --connections CONNECTIONS
Replace the following:
- URI: the URI to benchmark
- DURATION: total test run time, in seconds
- REQ_BODY_SIZE: size of the POST payload in each request
- CONCURRENCY: the total number of concurrent event loops. This number should match the core count of the client VM.
- RPS: the target rate of requests per second, per event loop
- CONNECTIONS: the number of concurrent connections, per event loop
Note: The --no-default-failure-predicates flag is new. Use the latest Nighthawk build to test it.

See the following example:
    nighthawk_client http://10.20.30.40:80 \
        --duration 600 --open-loop --no-default-failure-predicates \
        --protocol http1 --request-body-size 5000 \
        --concurrency 16 --rps 500 --connections 200
The output of each test run provides a histogram of response latencies. In the example from the Nighthawk documentation, notice that the 99th-percentile latency is approximately 135 microseconds.
    Initiation to completion
      samples: 9992
      mean: 0s 000ms 113us
      pstdev: 0s 000ms 061us

      Percentile   Count   Latency
      0            1       0s 000ms 077us
      0.5          4996    0s 000ms 115us
      0.75         7495    0s 000ms 118us
      0.8          7998    0s 000ms 118us
      0.9          8993    0s 000ms 121us
      0.95         9493    0s 000ms 124us
      0.990625     9899    0s 000ms 135us
      0.999023     9983    0s 000ms 588us
      1            9992    0s 004ms 090us
Test HTTP/2
To test HTTP/2, use the following command:
    nighthawk_client URI \
        --duration DURATION \
        --open-loop \
        --no-default-failure-predicates \
        --protocol http2 \
        --request-body-size REQ_BODY_SIZE \
        --concurrency CONCURRENCY \
        --rps RPS \
        --max-active-requests MAX_ACTIVE_REQUESTS \
        --max-concurrent-streams MAX_CONCURRENT_STREAMS
Replace the following:
- URI: the URI to benchmark
- DURATION: total test run time, in seconds
- REQ_BODY_SIZE: size of the POST payload in each request
- CONCURRENCY: the total number of concurrent event loops. This number should match the core count of the client VM.
- RPS: the target rate of requests per second, for each event loop
- MAX_ACTIVE_REQUESTS: the maximum number of concurrent active requests for each event loop
- MAX_CONCURRENT_STREAMS: the maximum number of concurrent streams allowed on each HTTP/2 connection
See the following example:
    nighthawk_client http://10.20.30.40:80 \
        --duration 600 --open-loop --no-default-failure-predicates \
        --protocol http2 --request-body-size 5000 \
        --concurrency 16 --rps 500 \
        --max-active-requests 200 --max-concurrent-streams 1
ab (Apache benchmark tool)
ab is a less flexible alternative to Nighthawk, but it's available as a package on almost every Linux distribution. ab is recommended only for quick and simple tests.
To install ab, use one of the following commands:

- On Debian and Ubuntu, run sudo apt-get install apache2-utils.
- On RedHat-based distributions, run sudo yum install httpd-tools.
After you've installed ab, use the following command to run it:
    ab -c CONCURRENCY \
        -n NUM_REQUESTS \
        -t TIMELIMIT \
        -p POST_FILE \
        URI
Replace the following:
- CONCURRENCY: number of concurrent requests to perform
- NUM_REQUESTS: number of requests to perform
- TIMELIMIT: maximum number of seconds to spend on requests
- POST_FILE: local file containing the HTTP POST payload
- URI: the URI to benchmark
See the following example:
    ab -c 200 -n 1000000 -t 600 -p body http://10.20.30.40:80
The command in the preceding example sends requests with a concurrency of 200 (closed-loop pattern), and stops after either 1,000,000 (one million) requests or 600 seconds of elapsed time. The command also includes the contents of the file body as an HTTP POST payload.
The ab command produces response latency histograms similar to those from Nighthawk, but its resolution is limited to milliseconds instead of microseconds:
    Percentage of the requests served within a certain time (ms)
      50%      7
      66%      7
      75%      7
      80%      7
      90%     92
      95%    121
      98%    123
      99%    127
     100%    156 (longest request)
What's next
Last updated 2025-12-15 UTC.