Using Cloud Load Balancing metrics

This page reviews the types of load balancers available from Cloud Load Balancing and describes how to use the Cloud Monitoring metrics that they expose as service-level indicators (SLIs).

Cloud Load Balancing services often provide the first entry point for applications hosted in Google Cloud. Load balancers are automatically instrumented to provide information about the traffic, availability, and latency of the Google Cloud services that they expose; therefore, load balancers often act as an excellent source of SLI metrics without the need for application instrumentation.

When getting started, you might choose to focus on availability and latency as the primary dimensions of reliability and create SLIs and SLOs to measure and alert on those dimensions. This page provides implementation examples.

Note: The filter strings in some of these examples have been line-wrapped for readability.

Availability SLIs and SLOs

For non-UDP applications, a request-based availability SLI is the most appropriate, since service interactions map neatly to requests.

You express a request-based availability SLI by using the TimeSeriesRatio structure to set up a ratio of good requests to total requests, as shown in the following availability examples. To arrive at your preferred determination of "good" or "valid", you filter the metric by using its available labels.
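
The following skeleton shows the general shape of such an SLI. The METRIC_TYPE and RESOURCE_TYPE values are placeholders only; the sections that follow provide the concrete, load-balancer-specific filters.

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"METRIC_TYPE\"
         resource.type=\"RESOURCE_TYPE\"",
      "goodServiceFilter":
        "metric.type=\"METRIC_TYPE\"
         resource.type=\"RESOURCE_TYPE\"
         metric.label.\"response_code_class\"=\"200\""
    }
  }
}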

External layer 7 (HTTP/S) load balancer

HTTP/S load balancers are used to expose applications that are accessed over HTTP/S and to distribute traffic to resources located in multiple regions.

External Application Load Balancers write metric data to Monitoring by using the https_lb_rule monitored-resource type and metric types with the prefix loadbalancing.googleapis.com. The metric type that is most relevant to availability SLOs is https/request_count, which you can filter by using the response_code_class metric label.

If you choose not to count requests that result in a 4xx error response code as "valid", because they might indicate client errors rather than service or application errors, you can write the filter for "total" like this:

"totalServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/request_count\"resource.type=\"https_lb_rule\"resource.label.\"url_map_name\"=\"my-app-lb\"metric.label.\"response_code_class\"!=\"400\"",

You also need to determine how to count "good" requests. For example, if you choose to count only those that return a 200 OK success status response code, you can write the filter for "good" like this:

"goodServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/request_count\"resource.type=\"https_lb_rule\"resource.label.\"url_map_name\"=\"my-app-lb\"metric.label.\"response_code_class\"=\"200\"",

You can then express a request-based SLI like this:

"serviceLevelIndicator":{"requestBased":{"goodTotalRatio":{"totalServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/request_count\"resource.type=\"https_lb_rule\"resource.label.\"url_map_name\"=\"my-app-lb\"metric.label.\"response_code_class\"!=\"400\"","goodServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/request_count\"resource.type=\"https_lb_rule\"resource.label.\"url_map_name\"=\"my-app-lb\"metric.label.\"response_code_class\"=\"200\"",}}},

For applications where traffic is served by multiple backends, you might choose to define SLIs for a specific backend. To create an availability SLI for a specific backend, use the https/backend_request_count metric with the backend_target_name resource label in your filters, as shown in this example:

"serviceLevelIndicator":{"requestBased":{"goodTotalRatio":{"totalServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/backend_request_count\"resource.type=\"https_lb_rule\"resource.label.\"url_map_name\"=\"my-app-lb\"resource.label.\"backend_target_name\"=\"my-app-backend\"metric.label.\"response_code_class\"!=\"400\"","goodServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/backend_request_count\"resource.type=\"https_lb_rule\" resource.label.\"url_map_name\"=\"my-app-lb\"resource.label.\"backend_target_name\"=\"my-app-backend\"metric.label.\"response_code_class\"=\"200\"",}}}

Internal layer 7 (HTTP/S) load balancer

Internal Application Load Balancers write metric data to Monitoring by using the internal_http_lb_rule monitored-resource type and metric types with the prefix loadbalancing.googleapis.com. The metric type that is most relevant to availability SLOs is https/internal/request_count, which you can filter by using the response_code_class metric label.

The following shows an example of a request-based availability SLI:

"serviceLevelIndicator":{"requestBased":{"goodTotalRatio":{"totalServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/internal/request_count\"resource.type=\"internal_http_lb_rule\"resource.label.\"url_map_name\"=\"my-internal-lb\"metric.label.\"response_code_class\"!=\"400\"","goodServiceFilter":"metric.type=\"loadbalancing.googleapis.com/https/internal/request_count\"resource.type=\"internal_http_lb_rule\"resource.label.\"url_map_name\"=\"my-internal-lb\"metric.label.\"response_code_class\"=\"200\"",}}},

Layer 3 (TCP) load balancers

TCP load balancers don't provide request metrics, because the applications that use them might not be based on the request-response model. None of the loadbalancing.googleapis.com metrics provided by these load balancers lend themselves to good availability SLIs.

To create availability SLIs for these load balancers, you must create custom or logs-based metrics. For more information, see Using custom metrics or Using logs-based metrics.
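
As an illustration, suppose your backend application writes a log entry for every connection attempt and you define two logs-based counter metrics from those entries: user/tcp_connection_count for all attempts and user/tcp_connection_success_count for successful ones. Both metric names, and the gce_instance resource type, are assumptions for this sketch; your metrics and monitored resources might differ. A request-based availability SLI over such metrics could then look like this:

"serviceLevelIndicator": {
  "requestBased": {
    "goodTotalRatio": {
      "totalServiceFilter":
        "metric.type=\"logging.googleapis.com/user/tcp_connection_count\"
         resource.type=\"gce_instance\"",
      "goodServiceFilter":
        "metric.type=\"logging.googleapis.com/user/tcp_connection_success_count\"
         resource.type=\"gce_instance\""
    }
  }
}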

Latency SLIs and SLOs

For request-response applications, there are two ways to write latency SLOs:

  • As request-based SLOs.
  • As window-based SLOs.

Request-based latency SLOs

A request-based SLO applies a latency threshold and counts the fraction of requests that complete under the threshold within a given compliance window. An example of a request-based SLO is "99% of requests complete in under 100 ms within a rolling one-hour window".

You express a request-based latency SLI by using a DistributionCut structure, as shown in the following latency examples.

A single request-based SLO can't capture both typical performance and the degradation of user experience in which the "tail", or slowest, requests see increasingly longer response times. An SLO for typical performance doesn't support understanding tail latency. For a discussion of tail latency, see the section "Worrying About Your Tail" in Chapter 6: Monitoring Distributed Systems of Site Reliability Engineering.

To mitigate this limitation, you can write a second SLO that focuses specifically on tail latency, for example, "99.9% of requests complete in under 1000 ms over a rolling one-hour window". The combination of the two SLOs captures degradations in both typical user experience and tail latency.
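
For example, assuming the external Application Load Balancer's https/total_latencies metric described later on this page, a tail-latency SLO of this kind might look like the following sketch:

{
  "serviceLevelIndicator": {
    "requestBased": {
      "distributionCut": {
        "distributionFilter":
          "metric.type=\"loadbalancing.googleapis.com/https/total_latencies\"
           resource.type=\"https_lb_rule\"",
        "range": {
          "min": 0,
          "max": 1000
        }
      }
    }
  },
  "goal": 0.999,
  "rollingPeriod": "3600s",
  "displayName": "99.9% requests under 1000 ms"
}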

Window-based latency SLOs

A window-based SLO defines a goodness criterion for a period of measurements and computes the ratio of "good" intervals to the total number of intervals. An example of a window-based SLO is "The 95th percentile latency metric is less than 100 ms for at least 99% of one-minute windows, over a 28-day rolling window" (a configuration sketch for this example follows the list):

  • A "good" measurement period is a one-minute span in which 95% of therequests have latency under 100 ms.
  • The measure of compliance is the fraction of such "good" periods. The service is compliant if this fraction is at least 0.99, calculated over the compliance period.
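
One way to express this example is with a windows-based SLI that uses a goodTotalRatioThreshold: a one-minute window counts as "good" when at least 95% of its requests complete in under 100 ms, which is equivalent to the window's 95th-percentile latency being under 100 ms. The following sketch assumes the external Application Load Balancer's https/total_latencies metric; 2419200s is 28 days:

{
  "serviceLevelIndicator": {
    "windowsBased": {
      "windowPeriod": "60s",
      "goodTotalRatioThreshold": {
        "threshold": 0.95,
        "performance": {
          "distributionCut": {
            "distributionFilter":
              "metric.type=\"loadbalancing.googleapis.com/https/total_latencies\"
               resource.type=\"https_lb_rule\"",
            "range": {
              "min": 0,
              "max": 100
            }
          }
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "2419200s",
  "displayName": "p95 under 100 ms in 99% of 1-minute windows"
}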

You must use a window-based SLO if the raw metric available to you is a latency percentile; that is, when both of the following are true:

  • The data is bucketed into time periods (for example, into one-minute intervals).
  • The data is expressed in percentile groups (for example, p50, p90, p95, p99).

For this kind of data, each percentile group indicates the latency value that divides the data above and below that percentile. For example, a one-minute interval with a p95 latency metric of 89 ms tells you that, for that minute, the service responded to 95% of the requests in 89 ms or less.
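
If your latency data arrives only in this percentile form, for example as a hypothetical custom gauge metric custom.googleapis.com/http/p95_latency written once per minute, you can express the SLO with a windows-based SLI that uses a metricMeanInRange criterion: a window is "good" when the mean of the p95 series over that window stays within the range. The metric name and the gce_instance resource type in this sketch are assumptions for illustration:

{
  "serviceLevelIndicator": {
    "windowsBased": {
      "windowPeriod": "60s",
      "metricMeanInRange": {
        "timeSeries":
          "metric.type=\"custom.googleapis.com/http/p95_latency\"
           resource.type=\"gce_instance\"",
        "range": {
          "min": 0,
          "max": 100
        }
      }
    }
  },
  "goal": 0.99,
  "rollingPeriod": "2419200s",
  "displayName": "p95 latency under 100 ms in 99% of 1-minute windows"
}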

External Application Load Balancer

External Application Load Balancers use the following primary metric types to capture latency:

  • https/total_latencies: a distribution of the latency calculated from when the request was received by the proxy until the proxy got an ACK from the client on the last response byte. Use this metric when overall user experience is of primary importance.

  • https/backend_latencies: a distribution of the latency calculated from when the request was sent by the proxy to the backend until the proxy received the last byte of the response from the backend. Use this metric to measure the latencies of specific backends serving traffic behind the load balancer.

These metrics are written against the https_lb_rule monitored-resource type.

Note: Both metrics have the value type DISTRIBUTION, which means that they contain histograms representing the distribution of values across buckets. The bucket boundaries for these metrics are created using powers of 1.4; that is, the first bucket is 0-1.4, the second is 1.4-1.96, and so forth. If you have very precise alerting requirements, you might consider this as you set your thresholds, but most of the time, relying on interpolation for thresholds that fall within a bucket is adequate.

Total latency

This example SLO expects that 99% of requests fall between 0 and 100 ms in total latency over a rolling one-hour period:

{"serviceLevelIndicator":{"requestBased":{"distributionCut":{"distributionFilter":"metric.type=\"loadbalancing.googleapis.com/https/total_latencies\"resource.type=\"https_lb_rule\"","range":{"min":0,"max":100}}}},"goal":0.99,"rollingPeriod":"3600s","displayName":"98% requests under 100 ms"}

Backend latency

This example SLO expects that 98% of requests to the "my-app-backend" backend target fall between 0 and 100 ms in latency over a rolling one-hour period:

{"serviceLevelIndicator":{"requestBased":{"distributionCut":{"distributionFilter":"metric.type=\"loadbalancing.googleapis.com/https/backend_latencies\"resource.type=\"https_lb_rule\"resource.label.\"backend_target_name\"=\"my-app-backend\"","range":{"min":0,"max":100}}}},"goal":0.98,"rollingPeriod":"3600s","displayName":"98% requests under 100 ms"}

Internal Application Load Balancer

Internal Application Load Balancers use two primary metric types to capture latency:

  • https/internal/total_latencies: a distribution of the latency calculated from when the request was received by the proxy until the proxy got an ACK from the client on the last response byte. Use this metric when overall user experience is of primary importance.

  • https/internal/backend_latencies: a distribution of the latency calculated from when the request was sent by the proxy to the backend until the proxy received the last byte of the response from the backend. Use this metric to measure the latencies of specific backends serving traffic behind the load balancer.

These metrics are written against the internal_http_lb_rule monitored-resource type.

Note: Both metrics have the value type DISTRIBUTION, which means that they contain histograms representing the distribution of values across buckets. The bucket boundaries for these metrics are created using powers of 2 with a scale of 1e-06. If you have very precise alerting requirements, you might consider this as you set your thresholds, but most of the time, relying on interpolation for thresholds that fall within a bucket is adequate.

Total latency

This example SLO expects that 99% of requests fall between 0 and 100 ms in total latency over a rolling one-hour period:

{"serviceLevelIndicator":{"requestBased":{"distributionCut":{"distributionFilter":"metric.type=\"loadbalancing.googleapis.com/https/internal/total_latencies\"resource.type=\"internal_http_lb_rule\"","range":{"min":0,"max":100}}}},"goal":0.99,"rollingPeriod":"3600s","displayName":"98% requests under 100 ms"}


Backend latency

This example SLO expects that 98% of requests to the "my-internal-backend" backend target fall between 0 and 100 ms in latency over a rolling one-hour period:

{"serviceLevelIndicator":{"requestBased":{"distributionCut":{"distributionFilter":"metric.type=\"loadbalancing.googleapis.com/https/internal/backend_latencies\"resource.type=\"https_lb_rule\"resource.label.\"backend_target_name\"=\"my-internal-backend\"","range":{"min":0,"max":100}}}},"goal":0.98,"rollingPeriod":"3600s","displayName":"98% requests under 100 ms"}

External layer 3 (TCP) load balancer

External TCP load balancers use a single metric type, l3/external/rtt_latencies, which records a distribution of the round-trip time (RTT) measured over TCP connections for external load-balancer flows.

This metric is written against the tcp_lb_rule resource.

This example SLO expects that 99% of RTT measurements fall between 0 and 100 ms over a rolling one-hour period:

{"serviceLevelIndicator":{"requestBased":{"distributionCut":{"distributionFilter":"metric.type=\"loadbalancing.googleapis.com/l3/external/rtt_latencies\"resource.type=\"tcp_lb_rule\"","range":{"min":0,"max":100}}}},"goal":0.99,"rollingPeriod":"3600s","displayName":"98% requests under 100 ms"}

Internal layer 3 (TCP) load balancer

Internal TCP load balancers use a single metric type, l3/internal/rtt_latencies, which records a distribution of the round-trip time (RTT) measured over TCP connections for internal load-balancer flows.

This metric is written against the internal_tcp_lb_rule resource.

This example SLO expects that 99% of RTT measurements fall between 0 and 100 ms over a rolling one-hour period:

{"serviceLevelIndicator":{"requestBased":{"distributionCut":{"distributionFilter":"metric.type=\"loadbalancing.googleapis.com/l3/internal/rtt_latencies\"resource.type=\"internal_tcp_lb_rule\"","range":{"min":0,"max":100}}}},"goal":0.99,"rollingPeriod":"3600s","displayName":"98% requests under 100 ms"}
