Custom metrics for Application Load Balancers

This page describes how to use custom metrics with your Application Load Balancers. Custom metrics let you configure your load balancer's traffic distribution behavior to be based on metrics specific to your application or infrastructure requirements, rather than Google Cloud's standard utilization or rate-based metrics. Defining custom metrics for your load balancer gives you the flexibility to route application requests to the backend instances and endpoints best suited to your workload.

For GKE, you can also use custom metrics that come from the service or application that you are running. For details, see Expose custom metrics.

The load balancer uses the custom metric values to make the following decisions:

  1. Select which backend virtual machine (VM) instance group or network endpoint group is to receive traffic.
  2. Select which VM instance or endpoint is to receive traffic.
Figure: Load balancing with custom metrics.

Here are some example use cases for custom metrics:

  • Maximize the use of your global compute capacity by making load balancing decisions based on custom metrics that are most relevant to your application, instead of the default criteria such as regional affinity or network latency.

    For example, if your applications often have backend processing latencies on the order of seconds, you can use your global compute capacity more efficiently by load balancing requests based on custom metrics rather than network latency.

  • Maximize compute efficiency by making load balancing decisions based on combinations of metrics unique to your deployment. For example, consider a scenario where your requests have highly variable processing times and compute requirements. In such a scenario, load balancing based solely on the rate of requests per second results in an uneven load distribution. Instead, you might define a custom metric that balances load based on a combination of both the request rate and CPU or GPU utilization to most efficiently use your compute fleet.

  • Autoscale backends based on custom metrics that are most relevant to your application requirements. For example, you can define an autoscaling policy to autoscale your backend instances when your configured custom metric exceeds 80%. This is achieved by using traffic-based autoscaling metrics (autoscaling.googleapis.com|gclb-capacity-fullness). For more information, see Autoscaling based on load balancer traffic.

Supported load balancers and backends

Custom metrics are supported for the following Application Load Balancers:

  • Global external Application Load Balancer
  • Regional external Application Load Balancer
  • Cross-region internal Application Load Balancer
  • Regional internal Application Load Balancer

Custom metrics are supported with the following backend types:

  • Managed instance groups
  • Zonal NEGs (with GCE_VM_IP_PORT endpoints)
  • Hybrid connectivity NEGs

How custom metrics work

To enable your load balancer to make traffic distribution decisions based on custom metrics, you must first determine the most relevant metrics for your specific application. When you know which metrics you want to use, you then configure your backends to report a steady stream of these metrics to your load balancer. Google Cloud lets you report metrics as part of the header of each HTTP response sent from the backends to your load balancer. These metrics are encapsulated in a custom HTTP response header and must follow the Open Request Cost Aggregation (ORCA) standard.

Metrics can be configured at two levels:

  • At the backend service level, to influence backend (MIG or NEG) selection
  • At the backend level, to influence VM instance or endpoint selection

The following sections describe how custom metrics work.

Determine which custom metrics influence load balancing decisions

Determining which custom metrics influence load balancing decisions is highly subjective and based on the needs of your applications. For example, if your applications have backend processing latencies on the order of seconds, then you might want to load balance requests based on other custom metrics rather than standard network latencies.

After you have determined which metrics you want to use, you must also determine the maximum utilization threshold for each metric. For example, if you want to use memory utilization as a metric, you must also determine the maximum memory utilization threshold for each backend.

For example, if you configure a metric called example-custom-metric with its maximum utilization threshold set to 0.8, the load balancer dynamically adjusts traffic distribution across backends to keep the example-custom-metric value reported by each backend below 0.8, as much as possible.

There are two types of custom metrics you can use:

  • Reserved metrics. There are five reserved metric names; these names are reserved because they correspond to top-level predefined fields in the ORCA API.

    • orca.cpu_utilization
    • orca.mem_utilization
    • orca.application_utilization
    • orca.eps
    • orca.rps_fractional

    The mem_utilization, cpu_utilization, and application_utilization metrics expect values in the range 0.0 - 1.0 but can exceed 1.0 for scenarios where resource utilization goes over budget.

  • Named metrics. These are metrics that are unique to your application that you specify by using the ORCA named_metrics field in the following format:

    orca.named_metrics.METRIC_NAME

    All user-defined custom metrics are specified using this named_metrics map in the format of name, value pairs.

    Named metrics defined for the CUSTOM_METRICS balancing mode must include values in the 0 - 100 range. Named metrics defined for the WEIGHTED_ROUND_ROBIN load balancing locality policy have no expected range.

Required metrics

To enable your load balancer to use custom metrics for backend VM instance group or network endpoint group selection, you must specify one or more of the following utilization metrics in the ORCA load report sent to the load balancer. orca.named_metrics is a map of user-defined metrics in the form of name, value pairs.

  • orca.cpu_utilization
  • orca.application_utilization
  • orca.mem_utilization
  • orca.named_metrics

Additionally, to enable your load balancer to use custom metrics to further influence the selection of the backend VM instance or endpoint, you must provide all of the following metrics in the ORCA load report sent to the load balancer. The load balancer uses weights computed from these reported metrics to assign load to individual backends.

  • orca.rps_fractional (requests per second)
  • orca.eps (errors per second)
  • a utilization metric with the following order of precedence:
    1. orca.application_utilization
    2. orca.cpu_utilization
    3. user-defined metrics in the orca.named_metrics map

Limits and requirements

  • There is a limit of two custom metrics per backend. However, you can perform dryRun tests with a maximum of three custom metrics.

    If two metrics are provided, the load balancer treats them independently. For example, if you define two dimensions, custom-metric-util1 and custom-metric-util2, and a backend is running at a high utilization level in terms of custom-metric-util1, the load balancer avoids sending traffic to that backend. Generally, the load balancer tries to keep all backends running with roughly the same fullness. Fullness is computed as currentUtilization / maxUtilization. In this case, the load balancer uses the higher of the two fullness values reported by the two metrics to make load balancing decisions.

  • There is a limit of two custom metrics per backend service. However, you can perform dryRun tests with a maximum of three custom metrics. This limit doesn't include the required orca.eps and orca.rps_fractional metrics. This limit is also independent of metrics configured at the backend level.

  • Both reserved metrics and named metrics can be used together. For example, both orca.cpu_utilization = 0.5 and a custom metric such as orca.named_metrics.queue_depth_util = 0.2 can be provided in a single load report.

  • Custom metric names must not contain regulated, sensitive, identifiable, or other confidential information that anyone external to your organization must not see.

Available encodings for custom metric specification

  • JSON

    Sample JSON encoding of a load report:

    endpoint-load-metrics-json: JSON {"cpu_utilization": 0.3, "mem_utilization": 0.8, "rps_fractional": 10.0, "eps": 1, "named_metrics": {"custom-metric-util": 0.4}}
  • Binary Protobuf

    For Protocol Buffers-aware code, this is a binary-serialized, base64-encoded OrcaLoadReport protobuf in either endpoint-load-metrics-bin or in endpoint-load-metrics: BIN.

  • Native HTTP

    Comma-separated key-value pairs in endpoint-load-metrics. This is a flattened text representation of the OrcaLoadReport:

    endpoint-load-metrics: TEXT cpu_utilization=0.3, mem_utilization=0.8, rps_fractional=10.0, eps=1, named_metrics.custom_metric_util=0.4
  • gRPC

    The gRPC specification requires the metrics to be provided as trailing metadata by using the endpoint-load-metrics-bin key.

Backend configuration to report custom metrics

After you determine the metrics you want the load balancer to use, you configure your backends to compile the required custom metrics in an ORCA load report and report their values in each HTTP response header sent to the load balancer.

For example, if you chose orca.cpu_utilization as a custom metric for a backend, that backend must report its current CPU utilization to the load balancer in each response sent to the load balancer. For instructions, see the Configure backends to report metrics to the load balancer section on this page.

Load balancer configuration to support custom metrics

To enable the load balancer to use the custom metric values reported by the backends to make traffic distribution decisions, you must set each backend's balancing mode to CUSTOM_METRICS and set the backend service load balancing locality policy to WEIGHTED_ROUND_ROBIN.

Figure: How custom metrics work with Application Load Balancers.
  • CUSTOM_METRICS balancing mode. Each of your backends in a backend service must be configured to use the CUSTOM_METRICS balancing mode. When a backend is configured with the CUSTOM_METRICS balancing mode, the load balancer directs traffic to the backends according to the maximum utilization threshold configured for each custom metric.

    Each backend can specify a different set of metrics to report. If multiple custom metrics are configured per backend, the load balancer tries to distribute traffic such that all the metrics remain below the configured maximum utilization limits.

    Traffic is load balanced across backends based on the load balancing algorithm you choose; for example, the default WATERFALL_BY_REGION algorithm tries to keep all backends running with the same fullness.

  • WEIGHTED_ROUND_ROBIN load balancing locality policy. The backend service's load balancing locality policy must be set to WEIGHTED_ROUND_ROBIN. With this configuration, the load balancer also uses the custom metrics to select the optimal instance or endpoint within the backend to serve the request.

Configure custom metrics

To enable your Application Load Balancers to use custom metrics, do the following:

  1. Determine the custom metrics you want to use.
  2. Configure the backends to report custom metrics to the load balancer. You must establish a stream of data that can be sent to the load balancer to be used for load balancing. These metrics must be compiled and encoded in an ORCA load report and then reported to the load balancer by using HTTP response headers.
  3. Configure the load balancer to use the custom metric values being reported by the backends.

Determine the custom metrics

This step is highly subjective based on the needs of your applications. After you have determined which metrics you want to use, you must also determine the maximum utilization threshold for each metric. For example, if you want to use memory utilization as a metric, you must also determine the maximum memory utilization threshold for each backend.

Before you proceed to configuring the load balancer, make sure you have reviewed the types of custom metrics available to you (reserved and named) and the requirements for metric selection, which are described in the How custom metrics work section on this page.

Configure backends to report metrics to the load balancer

Custom metrics are reported to load balancers as part of each HTTP response from your application backends by using the ORCA standard.

When using Google Kubernetes Engine, you also have the option to use custom metrics for load balancers.

This section shows you how to compile the custom metrics in an ORCA load report and report these metrics in each HTTP response header sent to the load balancer.

For example, if you're using HTTP text encoding, the header must report the metrics in the following format:

endpoint-load-metrics: TEXT BACKEND_METRIC_NAME_1=BACKEND_METRIC_VALUE_1,BACKEND_METRIC_NAME_2=BACKEND_METRIC_VALUE_2

Regardless of the encoding format used, make sure that you remove the orca. prefix from the metric name when you build the load report.

Here is a code snippet that shows how to append two custom metrics (customUtilA and customUtilB) to your HTTP headers. This code snippet shows both native HTTP text encoding and base64 encoding. Note that this example hardcodes the values for customUtilA and customUtilB only for simplicity. In a real deployment, your backends report live values for the metrics that you determined are to influence load balancing.

...

type OrcaReportType int

const (
	OrcaText OrcaReportType = iota
	OrcaBin
)

type HttpHeader struct {
	key   string
	value string
}

const (
	customUtilA = 0.2
	customUtilB = 0.4
)

func GetBinOrcaReport() HttpHeader {
	report := &pb.OrcaLoadReport{
		NamedMetrics: map[string]float64{
			"customUtilA": customUtilA,
			"customUtilB": customUtilB,
		},
	}
	out, err := proto.Marshal(report)
	if err != nil {
		log.Fatalf("failed to serialize the ORCA proto: %v", err)
	}
	return HttpHeader{"endpoint-load-metrics-bin", base64.StdEncoding.EncodeToString(out)}
}

func GetHttpOrcaReport() HttpHeader {
	return HttpHeader{
		"endpoint-load-metrics",
		fmt.Sprintf("TEXT named_metrics.customUtilA=%.2f,named_metrics.customUtilB=%.2f", customUtilA, customUtilB),
	}
}

func GetOrcaReport(t OrcaReportType) HttpHeader {
	switch t {
	case OrcaText:
		return GetHttpOrcaReport()
	case OrcaBin:
		return GetBinOrcaReport()
	default:
		return HttpHeader{"", ""}
	}
}

...

Configure the load balancer to use custom metrics

For the load balancer to use these custom metrics when selecting a backend, you need to set the balancing mode for each backend to CUSTOM_METRICS. Additionally, if you want the custom metrics to also influence endpoint selection, you set the load balancing locality policy to WEIGHTED_ROUND_ROBIN.

The steps described in this section assume you have already deployed a load balancer with zonal NEG backends. However, you can use the same --custom-metrics flags demonstrated here to update any existing backend by using the gcloud compute backend-services update-backend command.

  1. You can set a backend's balancing mode to CUSTOM_METRICS when you add the backend to the backend service. You use the --custom-metrics flag to specify your custom metric and the threshold to be used for load balancing decisions.

    gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
        --network-endpoint-group=NEG_NAME \
        --network-endpoint-group-zone=NEG_ZONE \
        [--global | --region=REGION] \
        --balancing-mode=CUSTOM_METRICS \
        --custom-metrics='name="BACKEND_METRIC_NAME_1",maxUtilization=MAX_UTILIZATION_FOR_METRIC_1' \
        --custom-metrics='name="BACKEND_METRIC_NAME_2",maxUtilization=MAX_UTILIZATION_FOR_METRIC_2'

    Replace the following:

    • BACKEND_SERVICE_NAME: the name of the backend service
    • NEG_NAME: the name of the zonal or hybrid NEG
    • NEG_ZONE: the zone where the NEG was created
    • REGION: for regional load balancers, the region where the load balancer was created
    • BACKEND_METRIC_NAME: the custom metric names used here must match the custom metric names being reported by the backend's ORCA report
    • MAX_UTILIZATION_FOR_METRIC: the maximum utilization that the load balancing algorithms must target for each metric

    For example, if your backends are reporting two custom metrics, customUtilA and customUtilB (as demonstrated in the Configure backends to report metrics to the load balancer section), you use the following command to configure your load balancer to use these metrics:

    gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
        --network-endpoint-group=NEG_NAME \
        --network-endpoint-group-zone=NEG_ZONE \
        [--global | --region=REGION] \
        --balancing-mode=CUSTOM_METRICS \
        --custom-metrics='name="customUtilA",maxUtilization=0.8' \
        --custom-metrics='name="customUtilB",maxUtilization=0.9'

    Alternatively, you can provide a list of custom metrics in a structured JSON file:

    {"name": "METRIC_NAME_1", "maxUtilization": MAX_UTILIZATION_FOR_METRIC_1, "dryRun": true}
    {"name": "METRIC_NAME_2", "maxUtilization": MAX_UTILIZATION_FOR_METRIC_2, "dryRun": false}

    Then attach the metrics file in JSON format to the backend as follows:

    gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
        --network-endpoint-group=NEG_NAME \
        --network-endpoint-group-zone=NEG_ZONE \
        [--global | --region=REGION] \
        --balancing-mode=CUSTOM_METRICS \
        --custom-metrics-file='BACKEND_METRIC_FILE_NAME'

    If you want to test whether the metrics are being reported without actually affecting the load balancer, you can set the dryRun flag to true when configuring the metric as follows:

    gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
        --network-endpoint-group=NEG_NAME \
        --network-endpoint-group-zone=NEG_ZONE \
        [--global | --region=REGION] \
        --balancing-mode=CUSTOM_METRICS \
        --custom-metrics 'name="BACKEND_METRIC_NAME",maxUtilization=MAX_UTILIZATION_FOR_METRIC,dryRun=true'

    When a metric is configured with dryRun set to true, the metric is reported to Monitoring but isn't actually used by the load balancer.

    To reverse this, update the backend service with the dryRun flag set to false.

    gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
        --network-endpoint-group=NEG_NAME \
        --network-endpoint-group-zone=NEG_ZONE \
        [--global | --region=REGION] \
        --balancing-mode=CUSTOM_METRICS \
        --custom-metrics 'name="BACKEND_METRIC_NAME",maxUtilization=MAX_UTILIZATION_FOR_METRIC,dryRun=false'

    If all your custom metrics are configured with dryRun set to true, setting the balancing mode to CUSTOM_METRICS or the load balancing locality policy to WEIGHTED_ROUND_ROBIN has no effect on the load balancer.

    Note: To see other flags supported by the gcloud compute backend-services update-backend command, such as --clear-custom-metrics, see the gcloud compute backend-services update-backend reference page.
  2. To configure the load balancer to use the custom metrics to influence endpoint selection, you set the backend service load balancing locality policy to WEIGHTED_ROUND_ROBIN.

    For example, if you have a backend service that is already configured with the appropriate backends, you configure the load balancing locality policy as follows:

    gcloud compute backend-services update BACKEND_SERVICE_NAME \
        [--global | --region=REGION] \
        --custom-metrics='name=BACKEND_SERVICE_METRIC_NAME,dryRun=false' \
        --locality-lb-policy=WEIGHTED_ROUND_ROBIN

    As demonstrated previously for the backend-level metrics, you can also provide a list of custom metrics in a structured JSON file at the backend service level. Use the --custom-metrics-file flag to attach the metrics file to the backend service.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.