Custom metrics for Application Load Balancers
This page describes how to use custom metrics with your Application Load Balancers. Custom metrics let you configure your load balancer's traffic distribution behavior to be based on metrics specific to your application or infrastructure requirements, rather than Google Cloud's standard utilization or rate-based metrics. Defining custom metrics for your load balancer gives you the flexibility to route application requests to the backend instances and endpoints that are most optimal for your workload.
For GKE, you can also use custom metrics that come from the service or application that you are running. For details, see Expose custom metrics.
The load balancer uses the custom metric values to make the following decisions:
- Select which backend virtual machine (VM) instance group or network endpoint group receives traffic.
- Select which VM instance or endpoint receives traffic.
Here are some example use cases for custom metrics:
Maximize the use of your global compute capacity by making load balancing decisions based on custom metrics that are most relevant to your application, instead of the default criteria such as regional affinity or network latency.
If your applications often have backend processing latencies on the order of seconds, you can use your global compute capacity more efficiently by load balancing requests based on custom metrics rather than network latency.
Maximize compute efficiency by making load balancing decisions based on combinations of metrics unique to your deployment. For example, consider a scenario where your requests have highly variable processing times and compute requirements. In that scenario, load balancing based solely on the rate of requests per second results in an uneven load distribution. Instead, you might want to define a custom metric that balances load based on a combination of the request rate and CPU or GPU utilization, to use your compute fleet most efficiently.
Autoscale backends based on custom metrics that are most relevant to your application requirements. For example, you can define an autoscaling policy to autoscale your backend instances when your configured custom metric exceeds 80%. This is achieved by using traffic-based autoscaling metrics (autoscaling.googleapis.com|gclb-capacity-fullness). For more information, see Autoscaling based on load balancer traffic.
Supported load balancers and backends
Custom metrics are supported for the following Application Load Balancers:
- Global external Application Load Balancer
- Regional external Application Load Balancer
- Cross-region internal Application Load Balancer
- Regional internal Application Load Balancer
Custom metrics are supported with the following backend types:
- Managed instance groups
- Zonal NEGs (with GCE_VM_IP_PORT endpoints)
- Hybrid connectivity NEGs
How custom metrics work
To enable your load balancer to make traffic distribution decisions based on custom metrics, you must first determine the most relevant metrics for your specific application. When you know which metrics you want to use, you then configure your backends to report a steady stream of these metrics to your load balancer. Google Cloud lets you report metrics as part of the header of each HTTP response sent from the backends to your load balancer. These metrics are encapsulated in a custom HTTP response header and must follow the Open Request Cost Aggregation (ORCA) standard.
Metrics can be configured at two levels:
- At the backend service level, to influence backend (MIG or NEG) selection
- At the backend level, to influence VM instance or endpoint selection
The following sections describe how custom metrics work.
Determine which custom metrics influence load balancing decisions
Determining which custom metrics influence load balancing decisions is highly subjective and based on the needs of your applications. For example, if your applications have backend processing latencies on the order of seconds, you might want to load balance requests based on other custom metrics rather than standard network latencies.
After you have determined which metrics you want to use, you must also determine the maximum utilization threshold for each metric. For example, if you want to use memory utilization as a metric, you must also determine the maximum memory utilization threshold for each backend.
For example, if you configure a metric called example-custom-metric with its maximum utilization threshold set to 0.8, the load balancer dynamically adjusts traffic distribution across backends to keep the example-custom-metric value reported by each backend below 0.8, as much as possible.
There are two types of custom metrics you can use:
Reserved metrics. There are five reserved metric names; these names are reserved because they correspond to top-level predefined fields in the ORCA API:

- orca.cpu_utilization
- orca.mem_utilization
- orca.application_utilization
- orca.eps
- orca.rps_fractional

The mem_utilization, cpu_utilization, and application_utilization metrics expect values in the range of 0.0 - 1.0 but can exceed 1.0 for scenarios where resource utilization goes over budget.

Named metrics. These are metrics that are unique to your application that you specify by using the ORCA named_metrics field in the following format: orca.named_metrics.METRIC_NAME

All user-defined custom metrics are specified using this named_metrics map in the format of name, value pairs. Named metrics defined for the CUSTOM_METRICS balancing mode must include values in the 0 - 100 range. Named metrics defined for the WEIGHTED_ROUND_ROBIN load balancing locality policy have no expected range.
Required metrics
To enable your load balancer to use custom metrics for backend VM instance group or network endpoint group selection, you must specify one or more of the following utilization metrics in the ORCA load report sent to the load balancer. orca.named_metrics is a map of user-defined metrics in the form of name, value pairs.
- orca.cpu_utilization
- orca.application_utilization
- orca.mem_utilization
- orca.named_metrics
Additionally, to enable your load balancer to use custom metrics to further influence the selection of the backend VM instance or endpoint, you must provide all of the following metrics in the ORCA load report sent to the load balancer. The load balancer uses weights computed from these reported metrics to assign load to individual backends.
- orca.rps_fractional (requests per second)
- orca.eps (errors per second)
- a utilization metric, with the following order of precedence:
  - orca.application_utilization
  - orca.cpu_utilization
  - user-defined metrics in the orca.named_metrics map
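To build intuition for how these reports can drive per-endpoint weights, here is a minimal Go sketch that derives a weight from rps_fractional, eps, and a utilization value. The formula (successful throughput divided by utilization) is an illustrative assumption only, not the load balancer's published algorithm:

```
package main

import "fmt"

// endpointWeight sketches one way a per-endpoint weight could be
// derived from an ORCA load report: successful throughput divided by
// utilization. This formula is an assumption for illustration only.
func endpointWeight(rpsFractional, eps, utilization float64) float64 {
	if utilization <= 0 {
		return 0
	}
	return (rpsFractional - eps) / utilization
}

func main() {
	// Two endpoints serving the same 10 RPS with 1 error/s: the
	// endpoint at 25% utilization gets twice the weight of the one
	// at 50%, so weighted round robin sends it more traffic.
	fmt.Println(endpointWeight(10.0, 1.0, 0.25)) // 36
	fmt.Println(endpointWeight(10.0, 1.0, 0.5))  // 18
}
```

The key intuition carries over regardless of the exact formula: endpoints that report lower utilization for the same throughput receive proportionally more traffic.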
Limits and requirements
- There is a limit of two custom metrics per backend. However, you can perform dryRun tests with a maximum of three custom metrics.
- If two metrics are provided, the load balancer treats them independently. For example, if you define two metrics, custom-metric-util1 and custom-metric-util2, and a backend is running at a high utilization level in terms of custom-metric-util1, the load balancer avoids sending traffic to this backend. Generally, the load balancer tries to keep all backends running with roughly the same fullness. Fullness is computed as currentUtilization / maxUtilization. In this case, the load balancer uses the higher of the two fullness values reported by the two metrics to make load balancing decisions.
- There is a limit of two custom metrics per backend service. However, you can perform dryRun tests with a maximum of three custom metrics. This limit doesn't include the required orca.eps and orca.rps_fractional metrics. This limit is also independent of metrics configured at the backend level.
- Both reserved metrics and named metrics can be used together. For example, both orca.cpu_utilization = 0.5 and a custom metric such as orca.named_metrics.queue_depth_util = 0.2 can be provided in a single load report.
- Custom metric names must not contain regulated, sensitive, identifiable, or other confidential information that anyone external to your organization must not see.
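The fullness comparison described above can be sketched in Go. This is a conceptual illustration only (the load balancer's internal implementation isn't exposed), using the two hypothetical metric names from the example:

```
package main

import "fmt"

// backendLoad holds a backend's reported values for two hypothetical
// custom metrics together with their configured maximum utilizations.
type backendLoad struct {
	util1, maxUtil1 float64 // custom-metric-util1
	util2, maxUtil2 float64 // custom-metric-util2
}

// fullness returns the value compared across backends: the higher of
// currentUtilization / maxUtilization over the configured metrics.
func (b backendLoad) fullness() float64 {
	f1 := b.util1 / b.maxUtil1
	f2 := b.util2 / b.maxUtil2
	if f1 > f2 {
		return f1
	}
	return f2
}

func main() {
	// This backend is 50% full on metric 1 and 25% full on metric 2;
	// the load balancer would treat it as 50% full overall, because
	// the higher of the two fullness values wins.
	b := backendLoad{util1: 0.4, maxUtil1: 0.8, util2: 0.2, maxUtil2: 0.8}
	fmt.Println(b.fullness()) // 0.5
}
```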
Available encodings for custom metric specification
JSON

Sample JSON encoding of a load report:

```
endpoint-load-metrics-json: JSON {"cpu_utilization": 0.3, "mem_utilization": 0.8, "rps_fractional": 10.0, "eps": 1, "named_metrics": {"custom-metric-util": 0.4}}
```

Binary Protobuf

For Protocol Buffers-aware code, this is a binary serialized, base64-encoded OrcaLoadReport protobuf carried in either endpoint-load-metrics-bin or in endpoint-load-metrics: BIN.

Native HTTP

Comma-separated key-value pairs in endpoint-load-metrics. This is a flattened text representation of the OrcaLoadReport:

```
endpoint-load-metrics: TEXT cpu_utilization=0.3, mem_utilization=0.8, rps_fractional=10.0, eps=1, named_metrics.custom_metric_util=0.4
```

gRPC

The gRPC specification requires the metrics to be provided as trailing metadata using the endpoint-load-metrics-bin key.
Backend configuration to report custom metrics
After you determine the metrics you want the load balancer to use, you configure your backends to compile the required custom metrics in an ORCA load report and report their values in each HTTP response header sent to the load balancer.
For example, if you chose orca.cpu_utilization as a custom metric for a backend, that backend must report its current CPU utilization to the load balancer in each response. For instructions, see the Configure backends to report metrics to the load balancer section on this page.
Load balancer configuration to support custom metrics
To enable the load balancer to use the custom metric values reported by the backends to make traffic distribution decisions, you must set each backend's balancing mode to CUSTOM_METRICS and set the backend service load balancing locality policy to WEIGHTED_ROUND_ROBIN.
- CUSTOM_METRICS balancing mode. Each backend in a backend service must be configured to use the CUSTOM_METRICS balancing mode. When a backend is configured with the CUSTOM_METRICS balancing mode, the load balancer directs traffic to the backends according to the maximum utilization threshold configured for each custom metric. Each backend can specify a different set of metrics to report. If multiple custom metrics are configured per backend, the load balancer tries to distribute traffic such that all the metrics remain below the configured maximum utilization limits. Traffic is load balanced across backends based on the load balancing algorithm you choose; for example, the default WATERFALL_BY_REGION algorithm tries to keep all backends running with the same fullness.
- WEIGHTED_ROUND_ROBIN load balancing locality policy. The backend service's load balancing locality policy must be set to WEIGHTED_ROUND_ROBIN. With this configuration, the load balancer also uses the custom metrics to select the optimal instance or endpoint within the backend to serve the request.
Configure custom metrics
To enable your Application Load Balancers to use custom metrics, do the following:
- Determine the custom metrics you want to use.
- Configure the backends to report custom metrics to the load balancer. You must establish a stream of data that can be sent to the load balancer to be used for load balancing. These metrics must be compiled and encoded in an ORCA load report and then reported to the load balancer by using HTTP response headers.
- Configure the load balancer to use the custom metric values being reported by the backends.
Determine the custom metrics
This step is highly subjective and based on the needs of your applications. After you have determined which metrics you want to use, you must also determine the maximum utilization threshold for each metric. For example, if you want to use memory utilization as a metric, you must also determine the maximum memory utilization threshold for each backend.
Before you configure the load balancer, make sure that you have reviewed the types of custom metrics available to you (reserved and named) and the requirements for metric selection, both of which are described in the How custom metrics work section on this page.
Configure backends to report metrics to the load balancer
Custom metrics are reported to load balancers as part of each HTTP response from your application backends by using the ORCA standard.
When using Google Kubernetes Engine, you also have the option to use custom metrics for load balancers.
This section shows you how to compile the custom metrics in an ORCA load report and report these metrics in each HTTP response header sent to the load balancer.
For example, if you're using HTTP text encoding, the header must report the metrics in the following format:

```
endpoint-load-metrics: TEXT BACKEND_METRIC_NAME_1=BACKEND_METRIC_VALUE_1,BACKEND_METRIC_NAME_2=BACKEND_METRIC_VALUE_2
```
Regardless of the encoding format used, make sure that you remove the orca. prefix from the metric name when you build the load report.
Here is a code snippet that shows how to append two custom metrics (customUtilA and customUtilB) to your HTTP headers. This code snippet shows both native HTTP text encoding and base64 encoding. Note that this example hardcodes the values for customUtilA and customUtilB only for simplicity. Your load balancer receives the values for the metrics that you determined are to influence load balancing.
```
...

type OrcaReportType int

const (
	OrcaText OrcaReportType = iota
	OrcaBin
)

type HttpHeader struct {
	key   string
	value string
}

const (
	customUtilA = 0.2
	customUtilB = 0.4
)

func GetBinOrcaReport() HttpHeader {
	report := &pb.OrcaLoadReport{
		NamedMetrics: map[string]float64{
			"customUtilA": customUtilA,
			"customUtilB": customUtilB,
		},
	}
	out, err := proto.Marshal(report)
	if err != nil {
		log.Fatalf("failed to serialize the ORCA proto: %v", err)
	}
	return HttpHeader{"endpoint-load-metrics-bin", base64.StdEncoding.EncodeToString(out)}
}

func GetHttpOrcaReport() HttpHeader {
	return HttpHeader{
		"endpoint-load-metrics",
		fmt.Sprintf("TEXT named_metrics.customUtilA=%.2f,named_metrics.customUtilB=%.2f",
			customUtilA, customUtilB),
	}
}

func GetOrcaReport(t OrcaReportType) HttpHeader {
	switch t {
	case OrcaText:
		return GetHttpOrcaReport()
	case OrcaBin:
		return GetBinOrcaReport()
	default:
		return HttpHeader{"", ""}
	}
}

...
```

Configure the load balancer to use custom metrics
For the load balancer to use these custom metrics when selecting a backend, you need to set the balancing mode for each backend to CUSTOM_METRICS. Additionally, if you want the custom metrics to also influence endpoint selection, you set the load balancing locality policy to WEIGHTED_ROUND_ROBIN.
The steps described in this section assume that you have already deployed a load balancer with zonal NEG backends. However, you can use the same --custom-metrics flags demonstrated here to update any existing backend by using the gcloud compute backend-services update command.
You can set a backend's balancing mode to CUSTOM_METRICS when you add the backend to the backend service. Use the --custom-metrics flag to specify your custom metric and the threshold to be used for load balancing decisions.

```
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    --network-endpoint-group=NEG_NAME \
    --network-endpoint-group-zone=NEG_ZONE \
    [--global | --region=REGION] \
    --balancing-mode=CUSTOM_METRICS \
    --custom-metrics='name="BACKEND_METRIC_NAME_1",maxUtilization=MAX_UTILIZATION_FOR_METRIC_1' \
    --custom-metrics='name="BACKEND_METRIC_NAME_2",maxUtilization=MAX_UTILIZATION_FOR_METRIC_2'
```
Replace the following:
- BACKEND_SERVICE_NAME: the name of the backend service
- NEG_NAME: the name of the zonal or hybrid NEG
- NEG_ZONE: the zone where the NEG was created
- REGION: for regional load balancers, the region where the load balancer was created
- BACKEND_METRIC_NAME: the custom metric names used here must match the custom metric names being reported by the backend's ORCA report
- MAX_UTILIZATION_FOR_METRIC: the maximum utilization that the load balancing algorithms must target for each metric
For example, if your backends are reporting two custom metrics, customUtilA and customUtilB (as demonstrated in the Configure backends to report metrics to the load balancer section), you use the following command to configure your load balancer to use these metrics:

```
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    --network-endpoint-group=NEG_NAME \
    --network-endpoint-group-zone=NEG_ZONE \
    [--global | --region=REGION] \
    --balancing-mode=CUSTOM_METRICS \
    --custom-metrics='name="customUtilA",maxUtilization=0.8' \
    --custom-metrics='name="customUtilB",maxUtilization=0.9'
```
Alternatively, you can provide a list of custom metrics in a structured JSONfile:
```
{
  "name": "METRIC_NAME_1",
  "maxUtilization": MAX_UTILIZATION_FOR_METRIC_1,
  "dryRun": true
}
{
  "name": "METRIC_NAME_2",
  "maxUtilization": MAX_UTILIZATION_FOR_METRIC_2,
  "dryRun": false
}
```
Then attach the metrics file in JSON format to the backend as follows:
```
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    --network-endpoint-group=NEG_NAME \
    --network-endpoint-group-zone=NEG_ZONE \
    [--global | --region=REGION] \
    --balancing-mode=CUSTOM_METRICS \
    --custom-metrics-file='BACKEND_METRIC_FILE_NAME'
```
If you want to test whether the metrics are being reported without actually affecting the load balancer, you can set the dryRun flag to true when configuring the metric, as follows:

```
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    --network-endpoint-group=NEG_NAME \
    --network-endpoint-group-zone=NEG_ZONE \
    [--global | --region=REGION] \
    --balancing-mode=CUSTOM_METRICS \
    --custom-metrics='name="BACKEND_METRIC_NAME",maxUtilization=MAX_UTILIZATION_FOR_METRIC,dryRun=true'
```
When a metric is configured with dryRun set to true, the metric is reported to Monitoring but isn't actually used by the load balancer. To reverse this, update the backend service with the dryRun flag set to false.

```
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    --network-endpoint-group=NEG_NAME \
    --network-endpoint-group-zone=NEG_ZONE \
    [--global | --region=REGION] \
    --balancing-mode=CUSTOM_METRICS \
    --custom-metrics='name="BACKEND_METRIC_NAME",maxUtilization=MAX_UTILIZATION_FOR_METRIC,dryRun=false'
```
If all your custom metrics are configured with dryRun set to true, setting the balancing mode to CUSTOM_METRICS or the load balancing locality policy to WEIGHTED_ROUND_ROBIN has no effect on the load balancer.

Note: To see other flags supported by the gcloud compute backend-services update-backend command, such as --clear-custom-metrics, see the gcloud compute backend-services update-backend reference page.

To configure the load balancer to use the custom metrics to influence endpoint selection, you set the backend service load balancing locality policy to WEIGHTED_ROUND_ROBIN. For example, if you have a backend service that is already configured with the appropriate backends, you configure the load balancing locality policy as follows:
```
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    [--global | --region=REGION] \
    --custom-metrics='name=BACKEND_SERVICE_METRIC_NAME,dryRun=false' \
    --locality-lb-policy=WEIGHTED_ROUND_ROBIN
```
As demonstrated previously for the backend-level metrics, you can also provide a list of custom metrics in a structured JSON file at the backend service level. Use the --custom-metrics-file flag to attach the metrics file to the backend service.
What's next
- Troubleshoot issues with external Application Load Balancers
- Troubleshoot issues with internal Application Load Balancers
Last updated 2026-02-18 UTC.