Troubleshoot Envoy deployments

This guide provides information to help you resolve configuration issues with Envoy clients when you run Cloud Service Mesh with Google APIs. For information about how to use the Client Status Discovery Service (CSDS) API to help you investigate issues with Cloud Service Mesh, see Understanding Cloud Service Mesh client status.

Note: This guide only supports Cloud Service Mesh with Google Cloud APIs and does not support Istio APIs. For more information, see Cloud Service Mesh overview.

Determining the version of Envoy installed on a VM

Use these instructions to verify which version of Envoy is running on a virtual machine (VM) instance.

To check the Envoy version, do one of the following:

Check the guest attributes of the VM under the path gce-service-proxy/proxy-version:

  gcloud compute --project cloud-vm-mesh-monitoring instances get-guest-attributes INSTANCE_NAME \
      --zone ZONE --query-path=gce-service-proxy/proxy-version

  NAMESPACE          KEY            VALUE
  gce-service-proxy  proxy-version  dc78069b10cc94fa07bb974b7101dd1b42e2e7bf/1.15.1-dev/Clean/RELEASE/BoringSSL

Check the Cloud Logging instance logs from the VM instance details Logging page in the Google Cloud console with a query such as this:

  resource.type="gce_instance"
  resource.labels.instance_id="3633122484352464042"
  jsonPayload.message:"Envoy version"

You receive a response such as this:

  {
    "insertId": "9zy0btf94961a",
    "jsonPayload": {
      "message": "Envoy Version: dc78069b10cc94fa07bb974b7101dd1b42e2e7bf/1.15.1-dev/Clean/RELEASE/BoringSSL",
      "localTimestamp": "2021-01-12T11:39:14.3991Z"
    },
    "resource": {
      "type": "gce_instance",
      "labels": {
        "zone": "asia-southeast1-b",
        "instance_id": "3633122484352464042",
        "project_id": "cloud-vm-mesh-monitoring"
      }
    },
    "timestamp": "2021-01-12T11:39:14.399200504Z",
    "severity": "INFO",
    "logName": "projects/cloud-vm-mesh-monitoring/logs/service-proxy-agent",
    "receiveTimestamp": "2021-01-12T11:39:15.407023427Z"
  }

Use SSH to connect to a VM and check the binary version:

YOUR_USER_NAME@backend-mig-5f5651e1-517a-4269-b457-f6bdcf3d98bc-m3wt:~$ /usr/local/bin/envoy --version

/usr/local/bin/envoy version: dc78069b10cc94fa07bb974b7101dd1b42e2e7bf/1.15.1-dev/Clean/RELEASE/BoringSSL

Use SSH to connect to a VM and query the admin interface as root:

  root@backend-mig-5f5651e1-517a-4269-b457-f6bdcf3d98bc-m3wt:~# curl localhost:15000/server_info
  {
    "version": "dc78069b10cc94fa07bb974b7101dd1b42e2e7bf/1.15.1-dev/Clean/RELEASE/BoringSSL",
    "state": "LIVE",
    "hot_restart_version": "disabled",
    ...
  }
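Each of the methods above returns the same slash-separated version string. As a minimal sketch, the following Python helper splits that string so that the release number can be compared numerically. The field names (commit, version, tree_status, build_type, ssl_library) are assumptions chosen for illustration, and the helper itself is hypothetical rather than part of any Google Cloud tooling.

```python
from typing import NamedTuple


class EnvoyBuild(NamedTuple):
    # Field names are illustrative guesses at what each segment means.
    commit: str
    version: str
    tree_status: str
    build_type: str
    ssl_library: str


def parse_envoy_version(raw: str) -> EnvoyBuild:
    """Split a version string of the form shown above into its five fields."""
    # Drop a leading label such as "/usr/local/bin/envoy version: ".
    if ": " in raw:
        raw = raw.rsplit(": ", 1)[1]
    parts = raw.strip().split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected version string: {raw!r}")
    return EnvoyBuild(*parts)


def release_tuple(version: str) -> tuple:
    """Turn a version such as '1.15.1-dev' into (1, 15, 1)."""
    return tuple(int(part) for part in version.split("-")[0].split("."))
```

For example, release_tuple(parse_envoy_version(line).version) >= (1, 24, 9) tests a proxy against the minimum release that this guide mentions for sidecar proxies.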

Envoy log locations

To troubleshoot some issues, you need to examine the Envoy proxy logs.

You can use SSH to connect to the VM instance to obtain the log file. The path is likely to be the following:

  /var/log/envoy/envoy.err.log

Proxies don't connect to Cloud Service Mesh

If your proxies don't connect to Cloud Service Mesh, do the following:

  • Check the Envoy proxy logs for any errors connecting to trafficdirector.googleapis.com.

  • If you set up netfilter (by using iptables) to redirect all traffic to the Envoy proxy, make sure that the user (UID) as whom you run the proxy is excluded from redirection. Otherwise, this causes traffic to continuously loop back to the proxy.

  • Make sure that you enabled the Cloud Service Mesh API for the project. Under APIs & services for your project, look for errors for the Cloud Service Mesh API.

  • Confirm that the API access scope of the VM is set to allow full access to the Google Cloud APIs by specifying the following when you create the VM:

    --scopes=https://www.googleapis.com/auth/cloud-platform
  • Confirm that the service account has the correct permissions. For more information, see Enable the service account to access the Traffic Director API.

  • Confirm that you can access trafficdirector.googleapis.com:443 from the VM. If there are issues with this access, possible reasons could be a firewall preventing access to trafficdirector.googleapis.com over TCP port 443 or DNS resolution issues for the trafficdirector.googleapis.com hostname.

  • If you're using Envoy for the sidecar proxy, confirm that the Envoy version is release 1.24.9 or later.
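The reachability check in the list above can be sketched in Python with only the standard library. The function name check_endpoint and its three return values are hypothetical. The sketch separates a DNS resolution failure from a TCP connection failure, which are the two causes named above; it does not perform a TLS handshake or authenticate.

```python
import socket


def check_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'ok', 'dns-failure', or 'connect-failure' for host:port."""
    try:
        # Step 1: resolve the hostname.
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            # Step 2: attempt a TCP handshake with each resolved address.
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect-failure"
```

On the VM, calling check_endpoint("trafficdirector.googleapis.com", 443) gives a first indication of whether to look at DNS or at firewall rules.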

Service configured with Cloud Service Mesh is not reachable

If a service configured with Cloud Service Mesh is not reachable, confirm that the sidecar proxy is running and able to connect to Cloud Service Mesh.

If you are using Envoy as a sidecar proxy, you can confirm this by running the following commands:

  1. From the command line, confirm that the Envoy process is running:

    ps aux | grep envoy
  2. Inspect Envoy's runtime configuration to confirm that Cloud Service Mesh configured dynamic resources. To see the config, run this command:

    curl http://localhost:15000/config_dump
  3. Ensure that traffic interception for the sidecar proxy is set up correctly. For the redirect setup with iptables, run the iptables command and then grep the output to ensure that your rules are there:

    sudo iptables -t nat -S | grep ISTIO

    The following is an example of the output for iptables intercepting the virtual IP address (VIP) 10.0.0.1/32 and forwarding it to an Envoy proxy running on port 15001 as UID 1006:

    -N ISTIO_IN_REDIRECT
    -N ISTIO_OUTPUT
    -N ISTIO_REDIRECT
    -A OUTPUT -p tcp -j ISTIO_OUTPUT
    -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15001
    -A ISTIO_OUTPUT -m owner --uid-owner 1006 -j RETURN
    -A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
    -A ISTIO_OUTPUT -d 10.0.0.1/32 -j ISTIO_REDIRECT
    -A ISTIO_OUTPUT -j RETURN

If the VM instance is created through the Google Cloud console, some IPv6-related modules are not installed and available before a restart. This causes iptables to fail because of missing dependencies. In this case, restart the VM and rerun the setup process, which should solve the problem. A Compute Engine VM that you created by using the Google Cloud CLI is not expected to have this problem.
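Reading the iptables output by eye is error-prone, so the following sketch checks the saved output of sudo iptables -t nat -S for the three properties that the example above illustrates: the proxy UID is excluded, a rule redirects to the proxy port, and the OUTPUT chain jumps to ISTIO_OUTPUT. The function check_nat_rules is hypothetical, and the regular expressions assume rules formatted exactly like the example.

```python
import re


def check_nat_rules(rules_text: str, proxy_uid: int, proxy_port: int) -> list:
    """Return a list of problems found in `iptables -t nat -S` output."""
    rules = [line.strip() for line in rules_text.splitlines() if line.strip()]
    problems = []

    # The proxy's own UID must be excluded, or its traffic loops back to it.
    uid_rule = re.compile(
        rf"-A ISTIO_OUTPUT .*--uid-owner {proxy_uid}\b.*-j RETURN"
    )
    if not any(uid_rule.search(rule) for rule in rules):
        problems.append(f"no RETURN rule for UID {proxy_uid}")

    # Some rule must redirect traffic to the proxy's listening port.
    redirect_rule = re.compile(rf"-j REDIRECT --to-ports {proxy_port}\b")
    if not any(redirect_rule.search(rule) for rule in rules):
        problems.append(f"no REDIRECT to port {proxy_port}")

    # Locally generated TCP traffic must enter the ISTIO_OUTPUT chain.
    if not any(
        rule.startswith("-A OUTPUT") and rule.endswith("-j ISTIO_OUTPUT")
        for rule in rules
    ):
        problems.append("OUTPUT chain does not jump to ISTIO_OUTPUT")

    return problems
```

An empty list means that none of these three checks failed; it does not prove that interception works for every VIP.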

Service stops being reachable when Envoy access logging is configured

If you used TRAFFICDIRECTOR_ACCESS_LOG_PATH to configure an Envoy access log as described in Configure Envoy bootstrap attributes for Cloud Service Mesh, make sure that the system user running Envoy proxy has permissions to write to the specified access log location.

Failure to provide necessary permissions results in listeners not being programmed on the proxy and can be detected by checking for the following error message in the Envoy proxy log:

gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) TRAFFICDIRECTOR_INTERCEPTION_PORT: unable to open file '/var/log/envoy.log': Permission denied

To solve the problem, change the permissions of the chosen file for the access log to be writable by the Envoy user.
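As a rough pre-check, the following sketch reports whether the user running it could open the access log for writing. The function can_write_access_log is hypothetical. The result only reflects the Envoy user's permissions if the script runs as that user, and it is a permissions check only, so it does not account for other causes such as a full disk.

```python
import os


def can_write_access_log(path: str) -> bool:
    """Report whether the current user could open `path` for writing."""
    if os.path.exists(path):
        # An existing log must be a regular file that is itself writable.
        return os.path.isfile(path) and os.access(path, os.W_OK)
    # A missing log can be created only if its directory is writable.
    parent = os.path.dirname(path) or "."
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)
```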

Error messages in the Envoy logs indicate a configuration problem

This section applies to deployments using the load balancing APIs.

If you are having difficulty with your Cloud Service Mesh configuration, you might see any of the following error messages in the Envoy logs:

  • warning envoy config    StreamAggregatedResources gRPC config stream closed: 5, Cloud Service Mesh configuration was not found for network "VPC_NAME" in project "PROJECT_NUMBER".
  • warning envoy upstream  StreamLoadStats gRPC config stream closed: 5, Cloud Service Mesh configuration was not found for network "VPC_NAME" in project "PROJECT_NUMBER".
  • warning envoy config    StreamAggregatedResources gRPC config stream closed: 5, Requested entity was not found.
  • warning envoy upstream  StreamLoadStats gRPC config stream closed: 5, Requested entity was not found.
  • Cloud Service Mesh configuration was not found.

The last error message (Cloud Service Mesh configuration was not found) generally indicates that Envoy is requesting configuration from Cloud Service Mesh, but no matching configuration can be found. When Envoy connects to Cloud Service Mesh, it presents a VPC network name (for example, my-network). Cloud Service Mesh then looks for forwarding rules that have the INTERNAL_SELF_MANAGED load-balancing scheme and reference the same VPC network name.

To fix this error, do the following:

  1. Make sure that there is a forwarding rule in your network that has the load-balancing scheme INTERNAL_SELF_MANAGED. Note the forwarding rule's VPC network name.

  2. If you're using Cloud Service Mesh with automatic Envoy deployments on Compute Engine, ensure that the value provided to the --service-proxy:network flag matches the forwarding rule's VPC network name.

  3. If you're using Cloud Service Mesh with manual Envoy deployments on Compute Engine, check the Envoy bootstrap file for the following:

    1. Ensure that the value for the TRAFFICDIRECTOR_NETWORK_NAME variable matches the forwarding rule's VPC network name.
    2. Ensure that the project number is set in the TRAFFICDIRECTOR_GCP_PROJECT_NUMBER variable.
  4. If you're deploying on GKE, and you are using the auto-injector, ensure that the project number and VPC network name are configured correctly, according to the directions in Cloud Service Mesh setup for GKE Pods with automatic Envoy injection.
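To compare what Envoy actually presented with what you expect, the network name and project number can be pulled out of the error message. The following is a minimal sketch: the function find_config_mismatches and its hint strings are hypothetical, and the regular expression assumes the message wording shown in the list of errors above.

```python
import re

# Matches the "not found" message and captures the two quoted values.
NOT_FOUND = re.compile(
    r'configuration was not found for network "(?P<network>[^"]+)" '
    r'in project "(?P<project>[^"]+)"'
)


def find_config_mismatches(log_lines, expected_network, expected_project):
    """Return (network, project, hint) for each 'not found' error line."""
    findings = []
    for line in log_lines:
        match = NOT_FOUND.search(line)
        if match is None:
            continue
        network, project = match["network"], match["project"]
        if network != expected_network:
            hint = "network name differs from the forwarding rule's network"
        elif project != expected_project:
            hint = "project number differs from the expected project"
        else:
            hint = "check for an INTERNAL_SELF_MANAGED forwarding rule"
        findings.append((network, project, hint))
    return findings
```

Pass the lines of the Envoy log together with the VPC network name noted in step 1 and your project number.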

Troubleshooting for Compute Engine

This section provides instructions for troubleshooting Envoy deployments for Compute Engine.

The Envoy and VM bootstrapping processes and further lifecycle management operations can fail for many reasons, including temporary connectivity issues, broken repositories, bugs in bootstrapping scripts and on-VM agents, and unexpected user actions.

Communication channels for troubleshooting

Google Cloud provides communications channels that you can use to help you understand the bootstrapping process and the current state of the components that reside on your VMs.

Virtual serial port output logging

A VM's operating system, BIOS, and other system-level entities typically write output to the serial ports. This output is useful for troubleshooting system crashes, failed boot-ups, start-up issues, and shutdown issues.

Compute Engine bootstrapping agents log all performed actions to serial port 1. This includes system events, starting with basic package installation through getting data from an instance's metadata server, iptables configuration, and Envoy installation status.

On-VM agents log Envoy process health status, newly discovered Cloud Service Mesh services, and any other information that might be useful when you investigate issues with VMs.

Cloud Monitoring logging

Data exposed in serial port output is also logged to Monitoring, which uses the Golang library and exports the logs to a separate log to reduce noise. Because this log is an instance-level log, you might find service proxy logs on the same page as other instance logs.

VM guest attributes

Guest attributes are a specific type of custom metadata that your applications can write to while running on your instance. Any application or user on your instances can read and write data to these guest attribute metadata values.

Compute Engine Envoy bootstrap scripts and on-VM agents expose attributes with information about the bootstrapping process and current status of Envoy. All guest attributes are exposed in the gce-service-proxy namespace:

gcloud compute instances get-guest-attributes INSTANCE_NAME \
    --query-path=gce-service-proxy/ \
    --zone=ZONE

If you find any issues, we recommend that you check the value of the guest attributes bootstrap-status and bootstrap-last-failure. Any bootstrap-status value other than FINISHED indicates that the Envoy environment is not configured yet. The value of bootstrap-last-failure might indicate what the problem is.
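The rule above (anything other than FINISHED means that Envoy is not configured yet) can be applied mechanically to the command's tabular output. The following is a minimal sketch that assumes the three-column NAMESPACE, KEY, VALUE layout shown earlier in this guide. The functions parse_guest_attributes and bootstrap_summary are hypothetical.

```python
def parse_guest_attributes(table_text: str) -> dict:
    """Parse NAMESPACE/KEY/VALUE rows into {key: value} for gce-service-proxy."""
    attrs = {}
    for line in table_text.splitlines():
        # Split into at most three fields; the value may contain spaces.
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0] == "gce-service-proxy":
            attrs[fields[1]] = fields[2].strip()
    return attrs


def bootstrap_summary(attrs: dict) -> str:
    """Summarize bootstrap state from the parsed guest attributes."""
    status = attrs.get("bootstrap-status")
    if status is None:
        return "bootstrap has not started (no bootstrap-status yet)"
    if status == "FINISHED":
        return "bootstrap finished"
    failure = attrs.get("bootstrap-last-failure", "not reported")
    return f"bootstrap not complete (status {status}); last failure: {failure}"
```

Because the header row does not start with gce-service-proxy, it is skipped without special handling.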

Unable to reach Cloud Service Mesh service from a VM created using a service-proxy-enabled instance template

To correct this problem, follow these steps:

  1. The installation of service proxy components on the VM might not have completed or might have failed. Use the following command to determine whether all components are properly installed:

    gcloud compute instances get-guest-attributes INSTANCE_NAME \
        --query-path=gce-service-proxy/ \
        --zone=ZONE

    The bootstrap-status guest attribute is set to one of the following:

    • [none] indicates that installation has not started yet. The VM might still be booting up. Check the status again in a few minutes.
    • IN PROGRESS indicates that the installation and configuration of the service proxy components are not yet complete. Repeat the status check for updates on the process.
    • FAILED indicates that the installation or configuration of a component failed. Check the error message by querying the gce-service-proxy/bootstrap-last-failure attribute.
    • FINISHED indicates that the installation and configuration processes finished without any errors. Use the following instructions to verify that traffic interception and the Envoy proxy are configured correctly.
  2. Traffic interception on the VM is not configured correctly for Cloud Service Mesh-based services. Sign in to the VM and check the iptables configuration:

    gcloud compute ssh INSTANCE_NAME \
        --zone=ZONE \
        --command="sudo iptables -L -t nat"

    Examine the chain SERVICE_PROXY_SERVICE_CIDRS for SERVICE_PROXY_REDIRECT entries such as these:

    Chain SERVICE_PROXY_SERVICE_CIDRS (1 references)
    target                   prot opt source         destination
     ...
    SERVICE_PROXY_REDIRECT   all  --  anywhere       10.7.240.0/20

    For each service, there should be a matching IP address or CIDR in the destination column. If there is no entry for the virtual IP address (VIP), then there is a problem with populating the Envoy proxy configuration from Cloud Service Mesh, or the on-VM agent failed.

  3. The Envoy proxies haven't received their configuration from Cloud Service Mesh yet. Sign in to the VM to check the Envoy proxy configuration:

    gcloud compute ssh INSTANCE_NAME \
        --zone=ZONE \
        --command="sudo curl localhost:15000/config_dump"

    Examine the listener configuration received from Cloud Service Mesh. For example:

    "dynamic_active_listeners": [
      ...
      "filter_chains": [{
        "filter_chain_match": {
          "prefix_ranges": [{
            "address_prefix": "10.7.240.20",
            "prefix_len": 32
          }],
          "destination_port": 80
        },
      ...
        "route_config_name": "URL_MAP/PROJECT_NUMBER.td-routing-rule-1"
      ...
    ]

    The address_prefix is the virtual IP address (VIP) of a Cloud Service Mesh service. It points to the URL map called td-routing-rule-1. Check whether the service that you want to connect to is already included in the listener configuration.

  4. The on-VM agent is not running. The on-VM agent automatically configures traffic interception when new Cloud Service Mesh services are created. If the agent is not running, all traffic to new services goes directly to VIPs, bypassing the Envoy proxy, and times out.

    1. Verify the status of the on-VM agent by running the following command:

      gcloud compute instances get-guest-attributes INSTANCE_NAME \
          --query-path=gce-service-proxy/ \
          --zone=ZONE
    2. Examine the attributes of the on-VM agent. The value of the agent-heartbeat attribute has the time that the agent last performed an action or check. If the value is more than five minutes old, the agent is stuck, and you should re-create the VM by using the following command:

      gcloud compute instance-groups managed recreate-instances
    3. The agent-last-failure attribute exposes the last error that occurred in the agent. This might be a transient issue that resolves by the next time the agent checks (for example, if the error is Cannot reach the Cloud Service Mesh API server), or it might be a permanent error. Wait a few minutes and then recheck the error.
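The five-minute rule for agent-heartbeat in step 4 can be sketched as follows. This guide does not show the timestamp format of the attribute, so the sketch assumes an ISO 8601 timestamp such as 2021-01-12T11:39:14Z; adjust the parsing if your value looks different. The function heartbeat_is_stale is hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)


def heartbeat_is_stale(heartbeat: str, now: datetime = None) -> bool:
    """Return True if the heartbeat is more than five minutes old."""
    # Assumption: ISO 8601 text; a trailing "Z" is rewritten as a UTC offset
    # so that older Python versions can parse it.
    last = datetime.fromisoformat(heartbeat.replace("Z", "+00:00"))
    if last.tzinfo is None:
        # Treat timestamps without an offset as UTC.
        last = last.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last > STALE_AFTER
```

The optional now argument exists so that the check can be exercised against a fixed reference time.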

Inbound traffic interception is configured to the workload port, but you cannot connect to the port from outside the VM

To correct this problem, follow these steps:

  1. The installation of service proxy components on the VM might not have completed or might have failed. Use the following command to determine whether all components are properly installed:

    gcloud compute instances get-guest-attributes INSTANCE_NAME \
        --query-path=gce-service-proxy/ \
        --zone=ZONE

    The bootstrap-status guest attribute is set to one of the following:

    • [none] indicates that installation has not started yet. The VM might still be booting up. Check the status again in a few minutes.
    • IN PROGRESS indicates that the installation and configuration of the service proxy components are not yet complete. Repeat the status check for updates on the process.
    • FAILED indicates that the installation or configuration of a component failed. Check the error message by querying the gce-service-proxy/bootstrap-last-failure attribute.
    • FINISHED indicates that the installation and configuration processes finished without any errors. Use the following instructions to verify that traffic interception and the Envoy proxy are configured correctly.
  2. Traffic interception on the VM is not configured correctly for inbound traffic. Sign in to the VM and check the iptables configuration:

    gcloud compute ssh INSTANCE_NAME \
        --zone=ZONE \
        --command="sudo iptables -L -t nat"

    Examine the chain SERVICE_PROXY_INBOUND for SERVICE_PROXY_IN_REDIRECT entries such as these:

    Chain SERVICE_PROXY_INBOUND (1 references)
    target                      prot opt source       destination
     ...
    SERVICE_PROXY_IN_REDIRECT   tcp  --  anywhere     anywhere  tcp dpt:mysql

    For each port that is defined in service-proxy:serving-ports, there should be a matching port in the destination column. If there is no entry for the port, all inbound traffic goes to this port directly, bypassing the Envoy proxy.

    Verify that there are no other rules that drop traffic to this port or all ports except one specific port.

  3. The Envoy proxies haven't received their configuration for the inbound port from Cloud Service Mesh yet. Sign in to the VM to check the Envoy proxy configuration:

    gcloud compute ssh INSTANCE_NAME \
        --zone=ZONE \
        --command="sudo curl localhost:15000/config_dump"

    Look for the inbound listener configuration received from Cloud Service Mesh:

    "dynamic_active_listeners": [
      ...
      "filter_chains": [{
        "filter_chain_match": {
          "prefix_ranges": [{
            "address_prefix": "10.0.0.1",
            "prefix_len": 32
          }],
          "destination_port": 80
        },
      ...
        "route_config_name": "inbound|default_inbound_config-80"
      ...
    ]

    The route_config_name, starting with inbound, indicates a special service created for inbound traffic interception purposes. Check whether the port that you want to connect to is already included in the listener configuration under destination_port.
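A full config_dump response can run to thousands of lines, so searching it programmatically is often easier than reading it. The following sketch walks the parsed JSON and collects every filter_chain_match it finds, using only the key names visible in the excerpts above. It makes no assumption about where in the dump those keys sit, because the surrounding layout can differ between Envoy versions. The functions listener_matches and has_listener_for are hypothetical.

```python
import json


def listener_matches(node):
    """Yield (address_prefix, prefix_len, destination_port) from a dump."""
    if isinstance(node, dict):
        match = node.get("filter_chain_match")
        if isinstance(match, dict):
            port = match.get("destination_port")
            for prefix in match.get("prefix_ranges", []):
                yield prefix.get("address_prefix"), prefix.get("prefix_len"), port
        # Keep descending: listeners can be nested at any depth.
        for value in node.values():
            yield from listener_matches(value)
    elif isinstance(node, list):
        for item in node:
            yield from listener_matches(item)


def has_listener_for(dump_text: str, vip: str, port: int) -> bool:
    """Check whether any filter chain matches exactly this VIP and port."""
    dump = json.loads(dump_text)
    return any(
        address == vip and dest_port == port
        for address, _prefix_len, dest_port in listener_matches(dump)
    )
```

The comparison is an exact string match on address_prefix, so a VIP that is covered only by a wider CIDR range is not detected by this sketch.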

Issues when connections use server-first protocols

Some applications, such as MySQL, use protocols where the server sends the first packet. This means that upon initial connection the server sends the first bytes. These protocols and applications are not supported by Cloud Service Mesh.

Troubleshoot the health of your service mesh

This guide provides information to help you resolve Cloud Service Mesh configuration issues.

Cloud Service Mesh behavior when most endpoints are unhealthy

For better reliability, when 99% of endpoints are unhealthy, Cloud Service Mesh configures the data plane to disregard the health status of the endpoints. Instead, the data plane balances traffic among all of the endpoints because it is possible that the serving port is still functional.
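The effect of this rule on endpoint selection can be illustrated with a small sketch. It is not Cloud Service Mesh's implementation: the function routable_endpoints is hypothetical, and reading "99% of endpoints" as "at least 99%" is an assumption.

```python
def routable_endpoints(health: dict, threshold_percent: int = 99) -> list:
    """Map of endpoint name -> healthy flag; return names eligible for traffic."""
    if not health:
        return []
    unhealthy = sum(1 for is_healthy in health.values() if not is_healthy)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if unhealthy * 100 >= threshold_percent * len(health):
        # Health data is disregarded: balance across every endpoint.
        return sorted(health)
    return sorted(name for name, is_healthy in health.items() if is_healthy)
```

With 100 endpoints of which 99 are unhealthy, all 100 are returned; with 50 unhealthy, only the 50 healthy ones are.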

Unhealthy backends cause suboptimal distribution of traffic

Cloud Service Mesh uses the information in the HealthCheck resource attached to a backend service to evaluate the health of your backends. Cloud Service Mesh uses this health status to route traffic to the closest healthy backend. If some of your backends are unhealthy, traffic might continue to be processed, but with suboptimal distribution. For example, traffic might flow to a region where healthy backends are still present, but which is much farther from the client, introducing latency. To identify and monitor the health status of your backends, try the following steps:


Last updated 2026-02-19 UTC.