Deploy a multi-cluster Gateway for weighted traffic splitting

This document guides you through a blue-green deployment of a sample store application across two GKE clusters. Blue-green deployments are an effective strategy to migrate your applications to new GKE clusters with minimal risk. By gradually shifting traffic from the current cluster (blue) to the new cluster (green), you can validate the new environment in production before committing to a full cutover.

Multi-cluster Gateways provide a powerful way to manage traffic for services deployed across multiple GKE clusters. By using Google's global load-balancing infrastructure, you can create a single entry point for your applications, which simplifies management and improves reliability.

In this tutorial, you use a sample store application to simulate a real-world scenario where an online shopping service is owned and operated by separate teams and deployed across a fleet of shared GKE clusters.

Before you begin

Multi-cluster Gateways require some environmental preparation before they can be deployed. Before you proceed, follow the steps in Prepare your environment for multi-cluster Gateways:

  1. Deploy GKE clusters.

  2. Register your clusters to a fleet (if they aren't already).

  3. Enable the multi-cluster Service and multi-cluster Gateway controllers.

Finally, review the GKE Gateway controller limitations and known issues before you use the controller in your environment.

Blue-green, multi-cluster routing with Gateway

The gke-l7-global-external-managed-*, gke-l7-regional-external-managed-*, and gke-l7-rilb-* GatewayClasses have many advanced traffic routing capabilities, including traffic splitting, header matching, header manipulation, traffic mirroring, and more. This example demonstrates how to use weight-based traffic splitting to explicitly control the traffic proportion across two GKE clusters.

This example goes through some realistic steps that a service owner would take in moving or expanding their application to a new GKE cluster. The goal of blue-green deployments is to reduce risk through multiple validation steps which confirm that the new cluster is operating correctly. This example walks through four stages of deployment:

  1. 100%-Header-based canary: Use HTTP header routing to send only test or synthetic traffic to the new cluster.
  2. 100%-Mirror traffic: Mirror user traffic to the canary cluster. This tests the capacity of the canary cluster by copying 100% of the user traffic to this cluster.
  3. 90%-10%: Canary a traffic split of 10% to slowly expose the new cluster to live traffic.
  4. 0%-100%: Cut over fully to the new cluster with the option of switching back if any errors are observed.

Blue-green traffic splitting across two GKE clusters

This example is similar to the previous one, except that it deploys an internal multi-cluster Gateway instead. This deploys an internal Application Load Balancer which is only privately accessible from within the VPC. You use the same clusters and application that you deployed in the previous steps, except you deploy them through a different Gateway.

Note: The same configurations could be applied to Gateways using external GatewayClasses (gke-l7-global-external-managed-*, gke-l7-regional-external-managed-*), except for the gke-l7-gxlb GatewayClass, which does not support advanced traffic management capabilities. To learn more about the different features supported with each GatewayClass, see GatewayClass capabilities.

Prerequisites

The following example builds on some of the steps in Deploying an external multi-cluster Gateway. Ensure that you have done the following steps before proceeding with this example:

  1. Prepare your environment for multi-cluster Gateways

  2. Deploying a demo application

    This example uses the gke-west-1 and gke-west-2 clusters that you already set up. These clusters are in the same region because the gke-l7-rilb-mc GatewayClass is regional and only supports cluster backends in the same region.

  3. Deploy the Service and ServiceExports needed on each cluster. If you deployed Services and ServiceExports from the previous example, then you already deployed some of these. A minimal sketch of these manifests appears after this list.

    kubectl apply --context gke-west-1 -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/main/gateway/gke-gateway-controller/multi-cluster-gateway/store-west-1-service.yaml

    kubectl apply --context gke-west-2 -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/main/gateway/gke-gateway-controller/multi-cluster-gateway/store-west-2-service.yaml

    These commands deploy a similar set of resources to each cluster:

    service/store created
    serviceexport.net.gke.io/store created
    service/store-west-2 created
    serviceexport.net.gke.io/store-west-2 created
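The manifests referenced above define a Service and a corresponding ServiceExport in each cluster. As a point of reference, a minimal sketch of a ServiceExport (assuming the store namespace used throughout this tutorial; the full manifests in the gke-networking-recipes repository are authoritative) looks like the following:

    kind: ServiceExport
    apiVersion: net.gke.io/v1
    metadata:
      # Must match the name and namespace of the Service being exported
      name: store
      namespace: store

Exporting a Service in this way makes it available to the fleet as a ServiceImport, which is what the HTTPRoutes later in this example reference as backends.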

Configuring a proxy-only subnet

If you have not already done so, configure a proxy-only subnet for each region in which you are deploying internal Gateways. This subnet is used to provide internal IP addresses to the load balancer proxies and must be configured with --purpose set to REGIONAL_MANAGED_PROXY only.

You must create a proxy-only subnet before you create Gateways that manage internal Application Load Balancers. Each region of a Virtual Private Cloud (VPC) network in which you use internal Application Load Balancers must have a proxy-only subnet.

The gcloud compute networks subnets create command creates a proxy-only subnet.

gcloud compute networks subnets create SUBNET_NAME \
    --purpose=REGIONAL_MANAGED_PROXY \
    --role=ACTIVE \
    --region=REGION \
    --network=VPC_NETWORK_NAME \
    --range=CIDR_RANGE

Replace the following:

  • SUBNET_NAME: the name of the proxy-only subnet.
  • REGION: the region of the proxy-only subnet.
  • VPC_NETWORK_NAME: the name of the VPC network that contains the subnet.
  • CIDR_RANGE: the primary IP address range of the subnet. You must use a subnet mask no larger than /26 so that at least 64 IP addresses are available for proxies in the region. The recommended subnet mask is /23.
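For example, the following sketch fills in hypothetical values for the us-west1 region used in this tutorial; the subnet name, network, and IP range are assumptions and should be adapted to your environment:

    gcloud compute networks subnets create proxy-only-us-west1 \
        --purpose=REGIONAL_MANAGED_PROXY \
        --role=ACTIVE \
        --region=us-west1 \
        --network=default \
        --range=10.129.0.0/23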

Deploying the Gateway

The following Gateway is created from the gke-l7-rilb-mc GatewayClass, which is a regional internal Gateway that can target only GKE clusters in the same region.

  1. Apply the following Gateway manifest to the config cluster, gke-west-1 in this example:

cat << EOF | kubectl apply --context gke-west-1 -f -
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: internal-http
  namespace: store
spec:
  gatewayClassName: gke-l7-rilb-mc
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      kinds:
      - kind: HTTPRoute
EOF
    Note: It might take several minutes (up to 10) for the Gateway to fully deploy and serve traffic.
  2. Validate that the Gateway has come up successfully. You can filter for just the events from this Gateway with the following command:

    kubectl get events --field-selector involvedObject.kind=Gateway,involvedObject.name=internal-http --context=gke-west-1 --namespace store

    The Gateway deployment was successful if the output resembles the following:

    LAST SEEN   TYPE     REASON   OBJECT                  MESSAGE
    5m18s       Normal   ADD      gateway/internal-http   store/internal-http
    3m44s       Normal   UPDATE   gateway/internal-http   store/internal-http
    3m9s        Normal   SYNC     gateway/internal-http   SYNC on store/internal-http was a success
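If the events alone are not conclusive, you can also describe the Gateway to review its status conditions and assigned address. This is an optional check, not part of the original steps:

    kubectl describe gateways.gateway.networking.k8s.io internal-http --context gke-west-1 --namespace store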

Header-based canary

Header-based canarying lets the service owner match synthetic test traffic that does not come from real users. This is an easy way of validating that the basic networking of the application is functioning without exposing users directly.

  1. Apply the following HTTPRoute manifest to the config cluster, gke-west-1 in this example:

cat << EOF | kubectl apply --context gke-west-1 -f -
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: internal-store-route
  namespace: store
  labels:
    gateway: internal-http
spec:
  parentRefs:
  - kind: Gateway
    namespace: store
    name: internal-http
  hostnames:
  - "store.example.internal"
  rules:
  # Matches for env=canary and sends it to store-west-2 ServiceImport
  - matches:
    - headers:
      - name: env
        value: canary
    backendRefs:
    - group: net.gke.io
      kind: ServiceImport
      name: store-west-2
      port: 8080
  # All other traffic goes to store-west-1 ServiceImport
  - backendRefs:
    - group: net.gke.io
      kind: ServiceImport
      name: store-west-1
      port: 8080
EOF

    Once deployed, this HTTPRoute configures the following routing behavior:

    • Internal requests to store.example.internal without the env: canary HTTP header are routed to store Pods on the gke-west-1 cluster
    • Internal requests to store.example.internal with the env: canary HTTP header are routed to store Pods on the gke-west-2 cluster

    The HTTPRoute enables routing to different clusters based on the HTTP headers

    Validate that the HTTPRoute is functioning correctly by sending traffic to the Gateway IP address.

  2. Retrieve the internal IP address from internal-http.

    kubectl get gateways.gateway.networking.k8s.io internal-http -o=jsonpath="{.status.addresses[0].value}" --context gke-west-1 --namespace store

    Replace VIP in the following steps with the IP address you receive as output.

    Note: In the following steps, send all requests to VIP from a client that has internal VPC connectivity and is in the same region as the GKE cluster (unless you have configured global access on your Gateway). You can create a VM in us-west1 and SSH to it for this purpose. This is necessary because the internal-http Gateway is an internal, regional load balancer. Also set the host header with store.example.internal so that DNS does not have to be configured for this example to work.
  3. Send a request to the Gateway using the env: canary HTTP header. This will confirm that traffic is being routed to gke-west-2. Use a private client in the same VPC as the GKE clusters to confirm that requests are being routed correctly. The following command must be run on a machine that has private access to the Gateway IP address; otherwise, it will not work.

    curl-H"host: store.example.internal"-H"env: canary"http://VIP

    The output confirms that the request was served by a Pod from the gke-west-2 cluster:

    {"cluster_name":"gke-west-2","host_header":"store.example.internal","node_name":"gke-gke-west-2-default-pool-4cde1f72-m82p.c.agmsb-k8s.internal","pod_name":"store-5f5b954888-9kdb5","pod_name_emoji":"😂","project_id":"agmsb-k8s","timestamp":"2021-05-31T01:21:55","zone":"us-west1-a"}

Traffic mirror

This stage sends traffic to the intended cluster but also mirrors that traffic to the canary cluster.

Using mirroring is helpful to determine how traffic load will impact application performance without impacting responses to your clients in any way. It may not be necessary for all kinds of rollouts, but can be useful when rolling out large changes that could impact performance or load.

  1. Apply the following HTTPRoute manifest to the config cluster, gke-west-1 in this example:

cat << EOF | kubectl apply --context gke-west-1 -f -
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: internal-store-route
  namespace: store
  labels:
    gateway: internal-http
spec:
  parentRefs:
  - kind: Gateway
    namespace: store
    name: internal-http
  hostnames:
  - "store.example.internal"
  rules:
  # Sends all traffic to store-west-1 ServiceImport
  - backendRefs:
    - name: store-west-1
      group: net.gke.io
      kind: ServiceImport
      port: 8080
    # Also mirrors all traffic to store-west-2 ServiceImport
    filters:
    - type: RequestMirror
      requestMirror:
        backendRef:
          group: net.gke.io
          kind: ServiceImport
          name: store-west-2
          port: 8080
EOF
  2. Using your private client, send a request to the internal-http Gateway. Use the /mirror path so you can uniquely identify this request in the application logs in a later step.

    curl-H"host: store.example.internal"http://VIP/mirror
  3. The output confirms that the client received a response from a Pod in the gke-west-1 cluster:

    {"cluster_name":"gke-west-1","host_header":"store.example.internal","node_name":"gke-gke-west-1-default-pool-65059399-ssfq.c.agmsb-k8s.internal","pod_name":"store-5f5b954888-brg5w","pod_name_emoji":"🎖","project_id":"agmsb-k8s","timestamp":"2021-05-31T01:24:51","zone":"us-west1-a"}

    This confirms that the primary cluster is responding to traffic. You still need to confirm that the cluster you are migrating to is receiving mirrored traffic.

  4. Check the application logs of a store Pod on the gke-west-2 cluster. The logs should confirm that the Pod received mirrored traffic from the load balancer.

    kubectl logs deployment/store --context gke-west-2 -n store | grep /mirror
  5. This output confirms that Pods on the gke-west-2 cluster are also receiving the same requests; however, their responses to these requests are not sent back to the client. The IP addresses seen in the logs are the load balancer's internal IP addresses, which communicate with your Pods.

    Found 2 pods, using pod/store-5c65bdf74f-vpqbs
    [2023-10-12 21:05:20,805] INFO in _internal: 192.168.21.3 - - [12/Oct/2023 21:05:20] "GET /mirror HTTP/1.1" 200 -
    [2023-10-12 21:05:27,158] INFO in _internal: 192.168.21.3 - - [12/Oct/2023 21:05:27] "GET /mirror HTTP/1.1" 200 -
    [2023-10-12 21:05:27,805] INFO in _internal: 192.168.21.3 - - [12/Oct/2023 21:05:27] "GET /mirror HTTP/1.1" 200 -
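As an optional extra check that is not part of the original steps, you can confirm that the gke-west-1 Pods logged the same /mirror request, since they served the actual response:

    kubectl logs deployment/store --context gke-west-1 -n store | grep /mirror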

Traffic split

Traffic splitting is one of the most common methods of rolling out new code or deploying to new environments safely. The service owner sets an explicit percentage of traffic that is sent to the canary backends, typically a very small amount of the overall traffic, so that the success of the rollout can be determined with an acceptable amount of risk to real user requests.

Doing a traffic split with a minority of the traffic enables the service owner to inspect the health of the application and the responses. If all the signals look healthy, then they may proceed to the full cutover.

  1. Apply the following HTTPRoute manifest to the config cluster, gke-west-1 in this example:

cat << EOF | kubectl apply --context gke-west-1 -f -
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: internal-store-route
  namespace: store
  labels:
    gateway: internal-http
spec:
  parentRefs:
  - kind: Gateway
    namespace: store
    name: internal-http
  hostnames:
  - "store.example.internal"
  rules:
  - backendRefs:
    # 90% of traffic to store-west-1 ServiceImport
    - name: store-west-1
      group: net.gke.io
      kind: ServiceImport
      port: 8080
      weight: 90
    # 10% of traffic to store-west-2 ServiceImport
    - name: store-west-2
      group: net.gke.io
      kind: ServiceImport
      port: 8080
      weight: 10
EOF
  2. Using your private client, send a continuous curl request to the internal-http Gateway.

    while true; do curl -H "host: store.example.internal" -s VIP | grep "cluster_name"; sleep 1; done

    The output will be similar to this, indicating that a 90/10 traffic split is occurring.

    "cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-1","cluster_name": "gke-west-2","cluster_name": "gke-west-1","cluster_name": "gke-west-1",...

Traffic cut over

The last stage of the blue-green migration is to fully cut over to the new cluster and remove the old cluster. If the service owner were instead onboarding a second cluster alongside an existing cluster, this last step would be different, because the final state would have traffic going to both clusters. In that scenario, a single store ServiceImport is recommended that has Pods from both the gke-west-1 and gke-west-2 clusters. This allows the load balancer to make the decision of where traffic should go for an active-active application, based on proximity, health, and capacity.
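A minimal sketch of that active-active alternative, assuming both clusters run a Service named store in the store namespace (the resource names here are assumptions based on this tutorial's setup, not steps from this example):

    # Apply to both gke-west-1 and gke-west-2. Exporting a Service with the
    # same name and namespace from each cluster merges them into a single
    # fleet-wide "store" ServiceImport.
    kind: ServiceExport
    apiVersion: net.gke.io/v1
    metadata:
      name: store
      namespace: store

The HTTPRoute would then reference a single backendRef with kind ServiceImport and name store, instead of the per-cluster store-west-1 and store-west-2 imports.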

  1. Apply the following HTTPRoute manifest to the config cluster, gke-west-1 in this example:

cat << EOF | kubectl apply --context gke-west-1 -f -
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: internal-store-route
  namespace: store
  labels:
    gateway: internal-http
spec:
  parentRefs:
  - kind: Gateway
    namespace: store
    name: internal-http
  hostnames:
  - "store.example.internal"
  rules:
  - backendRefs:
    # No traffic to the store-west-1 ServiceImport
    - name: store-west-1
      group: net.gke.io
      kind: ServiceImport
      port: 8080
      weight: 0
    # All traffic to the store-west-2 ServiceImport
    - name: store-west-2
      group: net.gke.io
      kind: ServiceImport
      port: 8080
      weight: 100
EOF
  2. Using your private client, send a continuous curl request to the internal-http Gateway.

    while true; do curl -H "host: store.example.internal" -s VIP | grep "cluster_name"; sleep 1; done

    The output will be similar to this, indicating that all traffic is now going to gke-west-2.

    "cluster_name": "gke-west-2","cluster_name": "gke-west-2","cluster_name": "gke-west-2","cluster_name": "gke-west-2",...

This final step completes a full blue-green application migration from one GKE cluster to another GKE cluster.

Clean up

After completing the exercises in this document, follow these steps to remove the resources and prevent unwanted charges from being incurred on your account:

  1. Delete the clusters.

  2. Unregister the clusters from the fleet if they don't need to be registered for another purpose.

  3. Disable the multiclusterservicediscovery feature:

    gcloud container fleet multi-cluster-services disable
  4. Disable Multi Cluster Ingress:

    gcloud container fleet ingress disable
  5. Disable the APIs:

    gcloud services disable \
        multiclusterservicediscovery.googleapis.com \
        multiclusteringress.googleapis.com \
        trafficdirector.googleapis.com \
        --project=PROJECT_ID

Troubleshooting

No healthy upstream

Symptom:

The following issue might occur when you create a Gateway but cannot access the backend services (503 response code):

no healthy upstream

Reason:

This error message indicates that the health check prober cannot find healthy backend services. It is possible that your backend services are healthy but that you need to customize the health checks.

Workaround:

To resolve this issue, customize your health check based on your application's requirements (for example, /health) using a HealthCheckPolicy.
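As an illustration, a minimal HealthCheckPolicy sketch that points the health check at a /health path on port 8080 for the store-west-2 ServiceImport might look like the following; the field layout reflects the networking.gke.io/v1 HealthCheckPolicy API, but verify the exact schema and values against the HealthCheckPolicy documentation for your GKE version:

    apiVersion: networking.gke.io/v1
    kind: HealthCheckPolicy
    metadata:
      name: store-west-2-health-check
      namespace: store
    spec:
      default:
        config:
          type: HTTP
          httpHealthCheck:
            # Path and port are assumptions; match them to your application
            requestPath: /health
            port: 8080
      targetRef:
        group: net.gke.io
        kind: ServiceImport
        name: store-west-2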

What's next
