Deploy a multi-cluster Gateway for capacity-based load balancing

This document guides you through deploying a sample application across two GKE clusters in different regions, and shows how a multi-cluster Gateway intelligently routes traffic when it exceeds Service capacity limits.

Capacity-based load balancing is a feature of multi-cluster Gateways that helps you build highly reliable and resilient applications. By defining the capacity of your Services, you can protect them from being overloaded and help ensure a consistent experience for your users. When a Service in one cluster reaches its capacity, the load balancer automatically redirects traffic to another cluster with available capacity. For more information about traffic management, see GKE traffic management.

In this tutorial, you use a sample store application to simulate a real-world scenario where an online shopping service is owned and operated by separate teams and deployed across a fleet of shared GKE clusters.

Before you begin

Multi-cluster Gateways require some environmental preparation before they can be deployed. Before you proceed, follow the steps in Prepare your environment for multi-cluster Gateways:

  1. Deploy GKE clusters.

  2. Register your clusters to a fleet (if they aren't already).

  3. Enable the multi-cluster Service and multi-cluster Gateway controllers.

Finally, review the GKE Gateway controller limitations and known issues before you use the controller in your environment.

Deploy capacity-based load balancing

The exercise in this section demonstrates global load balancing and Service capacity concepts by deploying an application across two GKE clusters in different regions. Generated traffic is sent at various requests per second (RPS) levels to show how traffic is load balanced across clusters and regions.

The following diagram shows the topology that you will deploy and how traffic overflows between clusters and regions when traffic has exceeded Service capacity:

Traffic overflowing from one cluster to another

Prepare your environment

  1. Follow Prepare your environment for multi-cluster Gateways to prepare your environment.

  2. Confirm that the GatewayClass resources are installed on the config cluster:

    kubectl get gatewayclasses --context=gke-west-1

    The output is similar to the following:

    NAME                                  CONTROLLER                  ACCEPTED   AGE
    gke-l7-global-external-managed        networking.gke.io/gateway   True       16h
    gke-l7-global-external-managed-mc     networking.gke.io/gateway   True       14h
    gke-l7-gxlb                           networking.gke.io/gateway   True       16h
    gke-l7-gxlb-mc                        networking.gke.io/gateway   True       14h
    gke-l7-regional-external-managed      networking.gke.io/gateway   True       16h
    gke-l7-regional-external-managed-mc   networking.gke.io/gateway   True       14h
    gke-l7-rilb                           networking.gke.io/gateway   True       16h
    gke-l7-rilb-mc                        networking.gke.io/gateway   True       14h

Deploy an application

Deploy the sample web application server to both clusters:

kubectl apply --context gke-west-1 -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-traffic-deploy.yaml
kubectl apply --context gke-east-1 -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-traffic-deploy.yaml

The output is similar to the following:

namespace/store created
deployment.apps/store created

Deploy a Service, Gateway, and HTTPRoute

  1. Apply the following Service manifest to both the gke-west-1 and gke-east-1 clusters:

    cat << EOF | kubectl apply --context gke-west-1 -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: store
      namespace: traffic-test
      annotations:
        networking.gke.io/max-rate-per-endpoint: "10"
    spec:
      ports:
      - port: 8080
        targetPort: 8080
        name: http
      selector:
        app: store
      type: ClusterIP
    ---
    kind: ServiceExport
    apiVersion: net.gke.io/v1
    metadata:
      name: store
      namespace: traffic-test
    EOF

    cat << EOF | kubectl apply --context gke-east-1 -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: store
      namespace: traffic-test
      annotations:
        networking.gke.io/max-rate-per-endpoint: "10"
    spec:
      ports:
      - port: 8080
        targetPort: 8080
        name: http
      selector:
        app: store
      type: ClusterIP
    ---
    kind: ServiceExport
    apiVersion: net.gke.io/v1
    metadata:
      name: store
      namespace: traffic-test
    EOF

    The Service is annotated with max-rate-per-endpoint set to 10 requests per second. With 2 replicas per cluster, each Service has 20 RPS of capacity per cluster.

    For more information on how to choose a Service capacity level for your Service, see Determine your Service's capacity.

  2. Apply the following Gateway manifest to the config cluster, gke-west-1 in this example:

    cat << EOF | kubectl apply --context gke-west-1 -f -
    kind: Gateway
    apiVersion: gateway.networking.k8s.io/v1
    metadata:
      name: store
      namespace: traffic-test
    spec:
      gatewayClassName: gke-l7-global-external-managed-mc
      listeners:
      - name: http
        protocol: HTTP
        port: 80
        allowedRoutes:
          kinds:
          - kind: HTTPRoute
    EOF

    The manifest describes an external, global, multi-cluster Gateway that deploys an external Application Load Balancer with a publicly accessible IP address.

  3. Apply the following HTTPRoute manifest to the config cluster, gke-west-1 in this example:

    cat << EOF | kubectl apply --context gke-west-1 -f -
    kind: HTTPRoute
    apiVersion: gateway.networking.k8s.io/v1
    metadata:
      name: store
      namespace: traffic-test
      labels:
        gateway: store
    spec:
      parentRefs:
      - kind: Gateway
        namespace: traffic-test
        name: store
      rules:
      - backendRefs:
        - name: store
          group: net.gke.io
          kind: ServiceImport
          port: 8080
    EOF

    The manifest describes an HTTPRoute that configures the Gateway with a routing rule that directs all traffic to the store ServiceImport. The store ServiceImport groups the store Service Pods across both clusters and allows them to be addressed by the load balancer as a single Service.

    Note: It might take several minutes (up to 10) for the Gateway to fully deploy and serve traffic.

    You can check the Gateway's events after a few minutes to see if it has finished deploying:

    kubectl describe gateway store -n traffic-test --context gke-west-1

    The output is similar to the following:

    ...
    Status:
      Addresses:
        Type:   IPAddress
        Value:  34.102.159.147
      Conditions:
        Last Transition Time:  2023-10-12T21:40:59Z
        Message:               The OSS Gateway API has deprecated this condition, do not depend on it.
        Observed Generation:   1
        Reason:                Scheduled
        Status:                True
        Type:                  Scheduled
        Last Transition Time:  2023-10-12T21:40:59Z
        Message:
        Observed Generation:   1
        Reason:                Accepted
        Status:                True
        Type:                  Accepted
        Last Transition Time:  2023-10-12T21:40:59Z
        Message:
        Observed Generation:   1
        Reason:                Programmed
        Status:                True
        Type:                  Programmed
        Last Transition Time:  2023-10-12T21:40:59Z
        Message:               The OSS Gateway API has altered the "Ready" condition semantics and reserved it for future use. GKE Gateway will stop emitting it in a future update, use "Programmed" instead.
        Observed Generation:   1
        Reason:                Ready
        Status:                True
        Type:                  Ready
      Listeners:
        Attached Routes:  1
        Conditions:
          Last Transition Time:  2023-10-12T21:40:59Z
          Message:
          Observed Generation:   1
          Reason:                Programmed
          Status:                True
          Type:                  Programmed
          Last Transition Time:  2023-10-12T21:40:59Z
          Message:               The OSS Gateway API has altered the "Ready" condition semantics and reserved it for future use. GKE Gateway will stop emitting it in a future update, use "Programmed" instead.
          Observed Generation:   1
          Reason:                Ready
          Status:                True
          Type:                  Ready
        Name:                    http
        Supported Kinds:
          Group:  gateway.networking.k8s.io
          Kind:   HTTPRoute
    Events:
      Type    Reason  Age                  From                   Message
      ----    ------  ----                 ----                   -------
      Normal  ADD     12m                  mc-gateway-controller  traffic-test/store
      Normal  SYNC    6m43s                mc-gateway-controller  traffic-test/store
      Normal  UPDATE  5m40s (x4 over 12m)  mc-gateway-controller  traffic-test/store
      Normal  SYNC    118s (x6 over 10m)   mc-gateway-controller  SYNC on traffic-test/store was a success

    This output shows that the Gateway has deployed successfully. It might still take a few minutes for traffic to start passing after the Gateway has deployed. Take note of the IP address in this output, as it is used in a following step.
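As a sanity check on the numbers used in this exercise, per-cluster Service capacity is the number of serving Pods multiplied by the max-rate-per-endpoint annotation. The following is an illustrative sketch, not part of the deployment; the replica count of 2 and rate of 10 mirror the sample store deployment and Service manifest:

```shell
# Per-cluster Service capacity = replicas x max-rate-per-endpoint.
# These values mirror the sample deployment: 2 replicas, 10 RPS per endpoint.
REPLICAS=2
MAX_RATE_PER_ENDPOINT=10

CLUSTER_CAPACITY=$(( REPLICAS * MAX_RATE_PER_ENDPOINT ))
TOTAL_CAPACITY=$(( CLUSTER_CAPACITY * 2 ))   # two clusters in the fleet

echo "capacity per cluster: ${CLUSTER_CAPACITY} RPS"
echo "capacity across fleet: ${TOTAL_CAPACITY} RPS"
```

These are the 20 RPS per-cluster and 40 RPS fleet-wide figures that the load tests later in this tutorial push against.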

Confirm traffic

Confirm that traffic is passing to the application by testing the Gateway IP address with a curl command:

curl GATEWAY_IP_ADDRESS

The output is similar to the following:

{
  "cluster_name": "gke-west-1",
  "host_header": "34.117.182.69",
  "pod_name": "store-54785664b5-mxstv",
  "pod_name_emoji": "👳🏿",
  "project_id": "project",
  "timestamp": "2021-11-01T14:06:38",
  "zone": "us-west1-a"
}

This output shows the Pod metadata, which indicates the region where the request was served from.
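To see at a glance which cluster served a request, you can extract the cluster_name field from the response body. The following sketch uses a captured sample response so it is self-contained; with a live Gateway you would instead set RESPONSE from `curl -s GATEWAY_IP_ADDRESS`. The `sed` expression is just one way to pull the field without extra tooling:

```shell
# Sample response body from the store application. With a live Gateway,
# replace this with: RESPONSE=$(curl -s GATEWAY_IP_ADDRESS)
RESPONSE='{
  "cluster_name": "gke-west-1",
  "zone": "us-west1-a"
}'

# Extract the value of the cluster_name field from the JSON body.
CLUSTER=$(echo "$RESPONSE" | sed -n 's/.*"cluster_name": "\([^"]*\)".*/\1/p')
echo "served by: ${CLUSTER}"
```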

Verify traffic using load testing

To verify the load balancer is working, you can deploy a traffic generator in your gke-west-1 cluster. The traffic generator generates traffic at different levels of load to demonstrate the capacity and overflow capabilities of the load balancer. The following steps demonstrate three levels of load:

  • 10 RPS, which is under the capacity for the store Service in gke-west-1.
  • 30 RPS, which is over capacity for the gke-west-1 store Service and causes traffic overflow to gke-east-1.
  • 60 RPS, which is over capacity for the Services in both clusters.
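The expected split for each load level follows from simple overflow arithmetic against the 20 RPS per-cluster capacity configured earlier. The following is an illustrative model only, not the load balancer's actual algorithm, which also weighs client proximity and backend health; observed numbers will fluctuate around these values:

```shell
# Illustrative overflow model: traffic fills the closest cluster first,
# overflows to the next cluster, and once everything is over capacity
# it is split across clusters with each Service absorbing what it can.
CAPACITY=20   # per-cluster capacity (replicas x max-rate-per-endpoint)

for RATE in 10 30 60; do
  if [ "$RATE" -le "$CAPACITY" ]; then
    WEST=$RATE; EAST=0                              # fits in gke-west-1
  elif [ "$RATE" -le $(( 2 * CAPACITY )) ]; then
    WEST=$CAPACITY; EAST=$(( RATE - CAPACITY ))     # overflow to gke-east-1
  else
    WEST=$(( RATE / 2 )); EAST=$(( RATE / 2 ))      # both over capacity
  fi
  echo "${RATE} RPS -> gke-west-1: ${WEST} RPS, gke-east-1: ${EAST} RPS"
done
```

These predicted splits (10/0, 20/10, 30/30) match the dashboard readings described in the three tests that follow.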

Configure dashboard

  1. Get the name of the underlying URL map for your Gateway:

    kubectl get gateway store -n traffic-test --context=gke-west-1 -o=jsonpath="{.metadata.annotations.networking\.gke\.io/url-maps}"

    The output is similar to the following:

    /projects/PROJECT_NUMBER/global/urlMaps/gkemcg1-traffic-test-store-armvfyupay1t

  2. In the Google Cloud console, go to the Metrics explorer page.

    Go to Metrics explorer

  3. Under Select a metric, click CODE: MQL.

  4. Enter the following query to observe traffic metrics for the store Service across your two clusters:

    fetch https_lb_rule
    | metric 'loadbalancing.googleapis.com/https/backend_request_count'
    | filter (resource.url_map_name == 'GATEWAY_URL_MAP')
    | align rate(1m)
    | every 1m
    | group_by [resource.backend_scope],
        [value_backend_request_count_aggregate: aggregate(value.backend_request_count)]

    Replace GATEWAY_URL_MAP with the URL map name from the previous step.

  5. Click Run query. Wait at least 5 minutes after deploying the load generator in the next section for the metrics to display in the chart.

Test with 10 RPS

  1. Deploy a Pod to your gke-west-1 cluster:

    kubectl run --context gke-west-1 -i --tty --rm loadgen \
        --image=cyrilbkr/httperf \
        --restart=Never \
        -- /bin/sh -c 'httperf \
        --server=GATEWAY_IP_ADDRESS \
        --hog --uri="/zone" --port 80 --wsess=100000,1,1 --rate 10'

    Replace GATEWAY_IP_ADDRESS with the Gateway IP address from the previous step.

    The output is similar to the following, indicating that the traffic generator is sending traffic:

    If you don't see a command prompt, try pressing enter.

    The load generator continuously sends 10 RPS to the Gateway. Even though traffic is coming from inside a Google Cloud region, the load balancer treats it as client traffic coming from the US West Coast. To simulate a realistic client diversity, the load generator sends each HTTP request as a new TCP connection, which means traffic is distributed across backend Pods more evenly.

    The generator takes up to 5 minutes to generate traffic for the dashboard.

  2. View your Metrics explorer dashboard. Two lines appear, indicating how much traffic is load balanced to each of the clusters:

    Graph showing traffic load balanced to clusters

    You should see that us-west1-a is receiving approximately 10 RPS of traffic while us-east1-b is not receiving any traffic. Because the traffic generator is running in us-west1, all traffic is sent to the Service in the gke-west-1 cluster.

  3. Stop the load generator using Ctrl+C, then delete the Pod:

    kubectl delete pod loadgen --context gke-west-1

Test with 30 RPS

  1. Deploy the load generator again, but configured to send 30 RPS:

    kubectl run --context gke-west-1 -i --tty --rm loadgen \
        --image=cyrilbkr/httperf \
        --restart=Never \
        -- /bin/sh -c 'httperf \
        --server=GATEWAY_IP_ADDRESS \
        --hog --uri="/zone" --port 80 --wsess=100000,1,1 --rate 30'

    The generator takes up to 5 minutes to generate traffic for the dashboard.

  2. View your Cloud Ops dashboard.

    Graph showing traffic overflowing to gke-east-1

    You should see that approximately 20 RPS is being sent to us-west1-a and 10 RPS to us-east1-b. This indicates that the Service in gke-west-1 is fully utilized and is overflowing 10 RPS of traffic to the Service in gke-east-1.

  3. Stop the load generator using Ctrl+C, then delete the Pod:

    kubectl delete pod loadgen --context gke-west-1

Test with 60 RPS

  1. Deploy the load generator configured to send 60 RPS:

    kubectl run --context gke-west-1 -i --tty --rm loadgen \
        --image=cyrilbkr/httperf \
        --restart=Never \
        -- /bin/sh -c 'httperf \
        --server=GATEWAY_IP_ADDRESS \
        --hog --uri="/zone" --port 80 --wsess=100000,1,1 --rate 60'

  2. Wait 5 minutes and view your Cloud Ops dashboard. It should now show that both clusters are receiving roughly 30 RPS. Since all Services are overutilized globally, there is no traffic spillover and Services absorb all the traffic they can.

    Graph showing Services overutilized

  3. Stop the load generator using Ctrl+C, then delete the Pod:

    kubectl delete pod loadgen --context gke-west-1

Clean up

After completing the exercises in this document, follow these steps to remove the resources and avoid incurring unwanted charges to your account:

  1. Delete the clusters.

  2. Unregister the clusters from the fleet if they don't need to be registered for another purpose.

  3. Disable the multiclusterservicediscovery feature:

    gcloud container fleet multi-cluster-services disable
  4. Disable Multi Cluster Ingress:

    gcloud container fleet ingress disable
  5. Disable the APIs:

    gcloud services disable \
        multiclusterservicediscovery.googleapis.com \
        multiclusteringress.googleapis.com \
        trafficdirector.googleapis.com \
        --project=PROJECT_ID

Troubleshooting

No healthy upstream

Symptom:

The following issue might occur when you create a Gateway but cannot access the backend services (503 response code):

no healthy upstream

Reason:

This error message indicates that the health check prober cannot find healthy backend services. It is possible that your backend services are healthy but you might need to customize the health checks.

Workaround:

To resolve this issue, customize your health check based on your application's requirements (for example, /health) using a HealthCheckPolicy.
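For example, a HealthCheckPolicy that probes a /health endpoint could look like the following sketch. The path, port, and interval here are assumptions for the sample store application, not values from this tutorial; adjust them to match your own application's health endpoint:

```yaml
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: store-health-check    # illustrative name
  namespace: traffic-test
spec:
  default:
    checkIntervalSec: 15      # assumed probe interval
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080            # assumed serving port of the app
        requestPath: /health  # assumed health endpoint
  targetRef:
    group: net.gke.io
    kind: ServiceImport       # target the multi-cluster ServiceImport
    name: store
```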


Last updated 2025-12-15 UTC.