Known issues
This page describes known issues that you might run into while using Compute Engine. For issues that specifically affect Confidential VMs, see Confidential VM limitations.
General issues
The following issues provide troubleshooting guidance or general information.
Local SSD disks attached to C4A, C4D, C4, and H4D instances might not capture all writes in case of power loss
Following a loss of power to a host server, if Compute Engine can recover the data on the Local SSD disks, then a compute instance running on that host server restarts with all disks attached, and the data includes all writes that were completed prior to the host error.
For C4A, C4D, C4, and H4D compute instances, the restored Local SSD disks might not include writes that were completed immediately prior to the power loss event. When the compute instance restarts, reading from an affected logical block address (LBA) returns an error indicating the LBA is unreadable. If your VM experiences an unexpected reboot, check the OS error logs for read or write failures after the instance restarts.
Hyperdisk Throughput and Hyperdisk Extreme capacity consumes Persistent Disk quotas simultaneously
When you create Hyperdisk Throughput or Hyperdisk Extreme disks, the disk capacity counts against two separate quotas at the same time: the specific Hyperdisk quota and a corresponding Persistent Disk quota.
- Hyperdisk Throughput Capacity (GB) (HDT-TOTAL-GB) also counts against your Persistent disk standard (GB) (DISKS-TOTAL-GB) quota.
- Hyperdisk Extreme Capacity (GB) (HDX-TOTAL-GB) also counts against your Persistent disk SSD (GB) (SSD-TOTAL-GB) quota.
If your Persistent Disk quota limit is lower than your Hyperdisk quota limit, you'll encounter QUOTA_EXCEEDED errors. You can't create additional disks once the Persistent Disk limit is reached, even if you have remaining Hyperdisk quota available.
To work around this issue, you must adjust both quotas whenever you request an increase. When you adjust your HDT-TOTAL-GB or HDX-TOTAL-GB quota, you must also adjust your DISKS-TOTAL-GB or SSD-TOTAL-GB quota, respectively.
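To check how much of each quota you are currently using in a region before requesting an increase, you can list the regional quotas with the gcloud CLI. The following is a minimal sketch; the region name is a placeholder, and the quota metric names in the output may appear with underscores rather than hyphens:

gcloud compute regions describe us-central1 --format="yaml(quotas)"

Look for the entries that correspond to the Hyperdisk and Persistent Disk quota names listed above and compare their limit and usage values.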
Workload interruptions on A4 VMs due to firmware issues for NVIDIA B200 GPUs
NVIDIA has identified two firmware issues for B200 GPUs, which are used by A4 VMs, that are causing workload interruptions. Specifically, if you notice workload interruptions on A4 VMs, then check if either of the following are true:
- The VM's uptime (lastStartTimestamp field) exceeds 65 days.
- Logs show an Xid 149 message that mentions 0x02a.
To mitigate this issue, we recommend resetting your GPUs. To help prevent the issue, we recommend resetting the GPUs on A4 VMs at least once every 60 days.
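As a rough sketch of how you might check for these conditions and perform the reset, assuming the NVIDIA driver utilities are installed in the guest and that VM_NAME and ZONE are placeholders for your instance:

# Check when the VM last started (uptime over 65 days indicates exposure).
gcloud compute instances describe VM_NAME --zone=ZONE --format="value(lastStartTimestamp)"

# Inside the guest, look for Xid 149 messages in the kernel log.
sudo dmesg | grep -i "Xid"

# Reset the GPUs after stopping GPU workloads; you might need to target specific GPUs with -i.
sudo nvidia-smi --gpu-reset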
Possible host errors during C4 VM creation on sole tenant nodes
C4 machine type virtual machine (VM) instances running on sole tenant nodes might encounter unexpected VM terminations due to host errors or VM creation failures.
To address this issue, Google has limited the maximum number of C4 VM instances allowed per sole tenant node to 26.
Canceling jobs on 32-node or larger HPC clusters exceeds the timeout
For large jobs on 32-node or larger clusters, the time it takes to cancel a job can exceed the default UnkillableStepTimeout value of 300 seconds. Exceeding this value causes the affected nodes to become unusable for future jobs.
To resolve this issue, use one of the following methods:
Update Cluster Toolkit to release 1.65.0 or later. Then redeploy the cluster using the following command:

gcluster deploy -w --force BLUEPRINT_NAME.yaml

If you can't update Cluster Toolkit or redeploy the cluster, then you can manually modify the UnkillableStepTimeout parameter by completing the following steps:

Warning: If you complete these steps but don't update Cluster Toolkit to release 1.65.0 or later, then these manual changes will get overwritten if the cluster is redeployed with the gcluster deploy command.

Use SSH to connect to the main controller node of your cluster.

gcloud compute ssh --project PROJECT_ID --zone ZONE DEPLOYMENT_NAME-controller

You can find the exact name and IP address for the main controller node by using the Google Cloud console and navigating to the VM instances page.
Create a backup of the current cloud.conf file. This file is usually located in /etc/slurm/.

sudo cp /etc/slurm/cloud.conf /etc/slurm/cloud.conf.backup-$(date +%Y%m%d)

Using sudo privileges, use a text editor to open the file /etc/slurm/cloud.conf. Add or modify the line that contains UnkillableStepTimeout. For example, set the timeout to 900 seconds (15 minutes) as follows:

UnkillableStepTimeout=900

Save the file.
Note: Cluster Toolkit deployments typically configure an NFS share from the controller node for directories like /etc/slurm and /home. This configuration means that any changes made to cloud.conf on the controller node are automatically visible to all compute nodes.

Use the command sudo scontrol reconfigure to apply the new setting across the cluster without needing a full restart.
Verify the fix
You can verify the setting has changed by running the following command:
scontrol show config | grep UnkillableStepTimeout

The output should reflect the new value you set, for example: UnkillableStepTimeout = 900.
Resolved: Modifying the IOPS or throughput on an Asynchronous Replication primary disk using the gcloud compute disks update command causes a false error
The following issue was resolved on June 1, 2025.
When you use the gcloud compute disks update command to modify the IOPS and throughput on an Asynchronous Replication primary disk, the gcloud CLI shows an error message even if the update was successful.
To accurately verify that an update was successful, use the gcloud CLI or the Google Cloud console to see if the disk properties show the new IOPS and throughput values. For more information, see View the provisioned performance settings for Hyperdisk.
The metadata server might display old physicalHost VM metadata
After a host error that moves an instance to a new host, the metadata server might still display the physicalHost metadata of the instance's previous host when you query it.
To work around this issue, do one of the following:
- Use the instances.get method or the gcloud compute instances describe command to retrieve the correct physicalHost information, as shown in the example after this list.
- Stop and then start your instance. This process updates the physicalHost information in the metadata server.
- Wait 24 hours for the impacted instance's physicalHost information to be updated.
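For example, the following gcloud command is a minimal sketch of the first option. It assumes the instance resource exposes the physicalHost value under resourceStatus, and INSTANCE_NAME and ZONE are placeholders:

gcloud compute instances describe INSTANCE_NAME --zone=ZONE --format="value(resourceStatus.physicalHost)"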
Long baseInstanceName values in managed instance groups (MIGs) can cause disk name conflicts
In a MIG, disk name conflicts can occur if the instance template specifies disks to be created upon VM creation and the baseInstanceName value exceeds 54 characters. This happens because Compute Engine generates disk names using the instance name as a prefix.
When generating disk names, if the resulting name exceeds the resource name limit of 63 characters, Compute Engine truncates the excess characters from the end of the instance name. This truncation can lead to the creation of identical disk names for instances that have similar naming patterns. In such a case, the new instance attempts to attach the existing disk. If the disk is already attached to another instance, the new instance creation fails. If the disk is not attached or is in multi-writer mode, the new instance attaches the disk, potentially leading to data corruption.
To avoid disk name conflicts, keep the baseInstanceName value to a maximum length of 54 characters.
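For example, when you create a MIG with the gcloud CLI, you can set a short base instance name explicitly. This is an illustrative sketch only; the group name, template, zone, and base instance name are placeholders:

gcloud compute instance-groups managed create example-mig --zone=ZONE --template=INSTANCE_TEMPLATE --size=3 --base-instance-name=short-name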
Creating reservations or future reservation requests using an instance template that specifies an A2, C3, or G2 machine type causes issues
If you use an instance template that specifies an A2, C3, or G2 machine type to create a reservation, or to create and submit a future reservation request for review, you encounter issues. Specifically:
Creating the reservation might fail. If it succeeds, then one of the following applies:
If you created an automatically consumed reservation (default), creating VMs with matching properties won't consume the reservation.
If you created a specific reservation, creating VMs to specifically target the reservation fails.
Creating the future reservation request succeeds. However, if you submit it for review, Google Cloud declines your request.
You can't replace the instance template used to create a reservation or future reservation request, or override the template's VM properties. If you want to reserve resources for A2, C3, or G2 machine types, do one of the following instead:
Create a new single-project or shared reservation by specifying properties directly.
Create a new future reservation request by doing the following:
If you want to stop an existing future reservation request from restricting the properties of the future reservation requests you can create in your current project—or in the projects the future reservation request is shared with—delete the future reservation request.
Create a single-project or shared future reservation request by specifying properties directly and submit it for review.
Limitations when using -lssd machine types with Google Kubernetes Engine
When using the Google Kubernetes Engine API, the node pool with Local SSD attached that you provision must have the same number of SSD disks as the selected C4, C3, or C3D machine type. For example, if you plan to create a VM that uses the c3-standard-8-lssd, there must be 2 SSD disks, whereas for a c3d-standard-8-lssd, just 1 SSD disk is required. If the disk number doesn't match, you get a Local SSD misconfiguration error from the Compute Engine control plane. See Machine types that automatically attach Local SSD disks to select the correct number of Local SSD disks based on the lssd machine type.
Using the Google Kubernetes Engine Google Cloud console to create a cluster or node pool with c4-standard-*-lssd, c4-highmem-*-lssd, c3-standard-*-lssd, and c3d-standard-*-lssd VMs results in node creation failure or a failure to detect Local SSDs as ephemeral storage.
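As an illustrative sketch, a node pool for a c3-standard-8-lssd machine type might be created with the gcloud CLI while specifying the matching Local SSD count explicitly. The cluster and pool names are placeholders, and the --local-nvme-ssd-block flag is assumed to be the appropriate way to set the count for your GKE version; check the gcloud reference for your version before relying on it:

gcloud container node-pools create example-lssd-pool --cluster=CLUSTER_NAME --machine-type=c3-standard-8-lssd --local-nvme-ssd-block count=2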
Single flow TCP throughput variability on C3D VMs
C3D VMs larger than 30 vCPUs might experience single flow TCP throughput variability and occasionally be limited to 20-25 Gbps. To achieve higher rates, use multiple TCP flows.
The CPU utilization observability metric is incorrect for VMs that use one thread per core
If your VM's CPU uses one thread per core, the CPU utilization Cloud Monitoring observability metric in the Compute Engine > VM instances > Observability tab only scales to 50%. Two threads per core is the default for all machine types, except Tau T2D. For more information, see Set number of threads per core.
To view your VM's CPU utilization normalized to 100%, view CPU utilization in Metrics Explorer instead. For more information, see Create charts with Metrics Explorer.
Google Cloud console SSH-in-browser connections might fail if you use custom firewall rules
If you use custom firewall rules to control SSH access to your VM instances, you might not be able to use the SSH-in-browser feature.
To work around this issue, do one of the following:
- Enable Identity-Aware Proxy for TCP to continue connecting to VMs using the SSH-in-browser Google Cloud console feature.
- Recreate the default-allow-ssh firewall rule to continue connecting to VMs using SSH-in-browser, as shown in the sketch after this list.
- Connect to VMs using the Google Cloud CLI instead of SSH-in-browser.
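The following gcloud command is a minimal sketch of recreating an allow-SSH ingress rule on the default network. It is illustrative only; adjust the network name and source ranges to match your own firewall policy before using it:

gcloud compute firewall-rules create default-allow-ssh --network=default --direction=INGRESS --action=ALLOW --rules=tcp:22 --source-ranges=0.0.0.0/0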
Temporary names for disks
During virtual machine (VM) instance updates initiated using the gcloud compute instances update command or the instances.update API method, Compute Engine might temporarily change the name of your VM's disks by adding one of the following suffixes to the original name:

- -temp
- -old
- -new
Compute Engine removes the suffix and restores the original disk names as the update completes.
Increased latency for some Persistent Disks caused by disk resizing
In some cases, resizing large Persistent Disks (~3 TB or larger) might be disruptive to the I/O performance of the disk. If you are impacted by this issue, your Persistent Disks might experience increased latency during the resize operation. This issue can impact Persistent Disks of any type.
Your automated processes might fail if they use API response data about your resource-based commitment quotas
Your automated processes that consume and use API response data about your Compute Engine resource-based commitment quotas might fail if all of the following things happen. Your automated processes can include any snippets of code, business logic, or database fields that use or store the API responses.
The response data is from any of the following Compute Engine API methods:

- compute.regions.list
- compute.regions.get
- compute.projects.get
You use an int instead of a number to define the field for your resource quota limit in your API response bodies. You can find the field in the following ways for each method:

- items[].quotas[].limit for the compute.regions.list method.
- quotas[].limit for the compute.regions.get method.
- quotas[].limit for the compute.projects.get method.
You have unlimited default quota available for any of your Compute Engine committed SKUs.
For more information about quotas for commitments and committed SKUs, see Quotas for commitments and committed resources.
Root cause
When you have limited quota, if you define the items[].quotas[].limit or quotas[].limit field as an int type, the API response data for your quota limits might still fall within the range for the int type and your automated process might not get disrupted. But when the default quota limit is unlimited, the Compute Engine API returns a value for the limit field that falls outside of the range defined by the int type. Your automated process can't consume the value returned by the API method and fails as a result.
How to work around this issue
You can work around this issue and continue generating your automated reports in the following ways:
Recommended: Follow the Compute Engine API reference documentation and use the correct data types for the API method definitions. Specifically, use the number type to define the items[].quotas[].limit and quotas[].limit fields for your API methods.

Decrease your quota limit to a value under 9,223,372,036,854,775,807. You must set quota caps for all projects that have resource-based commitments, across all regions. You can do this in one of the following ways:
- Follow the same steps that you would to request a quota adjustment, and request a lower quota limit.
- Create a quota override.
items[].quotas[].limit field for the compute.regions.list method and change it to the number type. To return the default quota limits for your committed SKUs back to their unlimited value, you must remove the quota limit caps.

Known issues for GPU instances
The following section describes the known issues for Compute Engine GPU instances.
Accelerator-optimized machine types that have Local SSD automatically attached might take hours to terminate and restart
Accelerator-optimized machine types have GPUs automatically attached. Most A-series accelerator-optimized machine types, with the exception of A2 Standard, have Local SSD automatically attached.
Accelerator-optimized machine types don't support live migration, and you must set their host maintenance policy to TERMINATE. These machine types can take up to one hour to terminate after failures or host errors. For the accelerator-optimized machine types that have Local SSD automatically attached, the termination process might take several hours.
Creation errors and decreased performance when using Dynamic NICs with GPU instances
Dynamic NICs aren't supported for use with GPU instances. If you create a GPU instance with Dynamic NICs, or add Dynamic NICs to an existing GPU instance, the following issues might occur:
The operation fails with an error such as the following:
Internal error. Please try again or contact Google Support. (Code: 'CODE')

The operation succeeds, but the instance experiences decreased performance, such as significantly lower network bandwidth.
These issues occur because the Dynamic NIC configuration leads to errors when Compute Engine attempts to distribute the instance's vNICs across physical NICs on the host server.
Known issues for bare metal instances
These are the known issues for Compute Engine bare metal instances.
C4D bare metal doesn't support SUSE Linux 15 SP6 images
C4D bare metal instances can't run the SUSE Linux Enterprise Server (SLES) version 15 SP6 OS.
Workaround
Use SLES 15 SP5 instead.
Simulating host maintenance doesn't work for C4 bare metal instances
The c4-standard-288-metal and c4-highmem-288-metal machine types don't support simulating host maintenance events.
Workaround
You can use VM instances created using other C4 machine types to simulate maintenance events.
- Create a VM instance using a C4 machine type that doesn't end in -metal.
- When creating the VM instance, configure the C4 VM to Terminate instead of using Live Migration during host maintenance events.
- Simulate a host maintenance event for this VM.
During a simulated host maintenance event, the behavior for VMs configured to Terminate is the same behavior as for C4 bare metal instances.
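As an illustrative sketch of these steps with the gcloud CLI, you might create the test VM with a TERMINATE maintenance policy and then trigger a simulated maintenance event. The VM name, machine type, and zone are placeholders:

gcloud compute instances create c4-test-vm --zone=ZONE --machine-type=c4-standard-8 --maintenance-policy=TERMINATE

gcloud compute instances simulate-maintenance-event c4-test-vm --zone=ZONE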
Lower than expected performance with Z3 bare metal instances on RHEL 8
When using Red Hat Enterprise Linux (RHEL) version 8 with a Z3 bare metal instance, the network performance is lower than expected.
Root cause
The Page Pool feature is missing from the Linux kernel version (4.18) used by RHEL 8.
Workaround
Use a more recent version of RHEL or a different operating system when you are working with Z3 bare metal instances.
Issues related to using Dynamic Network Interfaces
This section describes known issues related to using multiple network interfaces and Dynamic Network Interfaces.
Dropped packets when using Dynamic NICs with alias IP ranges, protocol forwarding, or Passthrough Network Load Balancers
The guest agent automatically adds local routes in the following scenarios for vNICs, but not for Dynamic NICs:
- When you configure an alias IP range, the guest agent creates a local route for the alias IP range.
- When you create a target instance that references a compute instance for protocol forwarding, the guest agent creates a local route for the associated forwarding rule IP address.
- When you add a backend to a Passthrough Network Load Balancer, the guest agent creates a local route for the associated forwarding rule IP address.
Because the local routes aren't added for Dynamic NICs, the Dynamic NIC might experience dropped packets.
To resolve this issue, add the IP addresses manually as follows:
Connect to the instance by using SSH.
If you are configuring an alias IP range, do the following. Otherwise, you can skip this step.
- In /etc/default/instance_configs.cfg, ensure that the ip_aliases setting is set to true. If the ip_aliases setting is set to false, modify the file to change it to true and then restart the guest agent:

systemctl restart google-guest-agent
Configure a local route for the alias IP range or the forwarding rule IP address by using the following command:

ip route add to local IP_ADDRESS dev DYNAMIC_NIC_DEVICE_NAME proto 66
Replace the following:
- IP_ADDRESS: the alias IP range or forwarding rule IP address that you want to add a local route for.
- DYNAMIC_NIC_DEVICE_NAME: the device name of the Dynamic NIC that you want to add a local route for. For example, a-gcp.ens4.3.
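For example, with a hypothetical alias IP address of 10.128.0.50 on the Dynamic NIC device a-gcp.ens4.3, the command would look like the following (the values are illustrative only):

sudo ip route add to local 10.128.0.50 dev a-gcp.ens4.3 proto 66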
Issues with installation and management of Dynamic NICs in guest agent versions 20250901.00 to 20251120.01
If you configure automatic management of Dynamic NICs and your instance is running the guest agent at a version from 20250901.00 to 20251120.01, you might encounter the following issues:
- The guest agent fails to install and manage Dynamic NICs in the guest OS of your instance.
- You might receive an error that includes Cannot find device when running commands in the guest OS that reference Dynamic NICs.
- Deleting multiple Dynamic NICs causes the metadata server to become inaccessible.
Root cause
Starting with version 20250901.00, the guest agent migrated to a new plugin-based architecture to improve modularity. The new architecture didn't initially support the automatic installation and management of Dynamic NICs.
Resolution
To resolve these issues, update your instance to use guest agent version 20251205.00 or later:
- To update the guest agent to the latest version, see Update the guest environment.
- To confirm the guest agent version that your instance is running, see View installed packages by operating system version.
If necessary, you can temporarily work around these issues for instances that are running guest agent versions 20250901.00 to 20251120.01 by following the instructions in Backward compatibility to revert to the previous guest agent architecture.
Packet interception can result in dropped packets due to missing VLAN tags in the Ethernet headers
Packet interception when using Dynamic NIC can result in dropped packets. Dropped packets can happen when the pipeline is terminated early. The issue affects both session-based and non-session-based modes.
Root cause
Dropped packets occur during packet interception when the pipeline is terminated early (ingress intercept and egress reinject). The early termination causes the VLAN ID to be missing from the ingress packet's Ethernet header. Because the egress packet is derived from the modified ingress packet, the egress packet also lacks the VLAN ID. This leads to incorrect endpoint index selection and subsequent packet drops.
Workaround
Don't use Google Cloud features that rely on packet intercept, such as firewall endpoints.
Known issues for Linux VM instances
These are the known issues for Linux VMs.
Package upgrade error on Rocky Linux 9.7
dnf update fails on Rocky Linux Accelerator Optimized images version v20251113 or earlier (for example, rocky-linux-9-optimized-gcp-nvidia-latest-v20251113) due to a package dependency conflict. You might see an error similar to the following:
[root@rockylinux9 ~]# dnf update
CIQ SIG/Cloud Next for Rocky Linux 9           37 MB/s |  49 MB  00:01
CIQ SIG/Cloud Next Nonfree for Rocky Linux 9  4.4 MB/s | 1.5 MB  00:00
NVIDIA DOCA 2.10.0 packages for EL 9.5        239 kB/s | 160 kB  00:00
Google Compute Engine                          38 kB/s | 8.2 kB  00:00
Google Cloud SDK                               59 MB/s | 154 MB  00:02
Rocky Linux 9 - BaseOS                         24 MB/s | 6.3 MB  00:00
Rocky Linux 9 - AppStream                      36 MB/s |  11 MB  00:00
Rocky Linux 9 - Extras                        124 kB/s |  16 kB  00:00
Error:
 Problem 1: package perftest-25.04.0.0.84-1.el9.x86_64 from baseos requires libhns.so.1(HNS_1.0)(64bit), but none of the providers can be installed
  - package perftest-25.04.0.0.84-1.el9.x86_64 from baseos requires libhns.so.1()(64bit), but none of the providers can be installed
  - cannot install both libibverbs-51.0-3.el9_5.cld_next.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-51.0-5.el9_5.cld_next.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-54.0-2.el9_6.cld_next.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-54.0-3.el9_6.cld_next.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-54.0-4.el9_6.cld_next.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-54.0-5.el9_6.cld_next.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-57.0-3.el9_7_ciq.x86_64 from ciq-sigcloud-next and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install both libibverbs-57.0-2.el9.x86_64 from baseos and libibverbs-2501mlnx56-1.2501060.x86_64 from @System
  - cannot install the best update candidate for package perftest-25.01.0-0.70.g759a5c5.2501060.x86_64
  - cannot install the best update candidate for package libibverbs-2501mlnx56-1.2501060.x86_64
 Problem 2: package ucx-ib-mlx5-1.18.0-1.2501060.x86_64 from @System requires ucx(x86-64) = 1.18.0-1.2501060, but none of the providers can be installed
  - cannot install both ucx-1.18.1-1.el9.x86_64 from appstream and ucx-1.18.0-1.2501060.x86_64 from @System
  - cannot install both ucx-1.18.1-1.el9.x86_64 from appstream and ucx-1.18.0-1.2501060.x86_64 from doca
  - cannot install the best update candidate for package ucx-ib-mlx5-1.18.0-1.2501060.x86_64
  - cannot install the best update candidate for package ucx-1.18.0-1.2501060.x86_64
...
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Root cause
A userspace package version conflict exists between DOCA OFED versions prior to 3.20 and Rocky Linux 9.7. Specifically, Rocky Linux 9.7 includes ucx and perftest packages that are a later version than the corresponding packages in the DOCA OFED repository. This version mismatch causes dnf update to fail with dependency resolution errors.
Resolution
Before you perform a full system upgrade, update the DOCA repository package:
sudo dnf update doca-repo
sudo dnf update
Rocky Linux Accelerator Optimized images built in December 2025 (for example, rocky-linux-9-optimized-gcp-nvidia-latest-v20251215) already include the updated doca-repo package, so this upgrade issue is not present on those builds or later.
OS Login isn't supported on SLES 16
An SSH configuration issue in SUSE Linux Enterprise Server (SLES) 16 prevents the use of the Google Cloud feature OS Login. However, metadata-managed SSH connections are unaffected and continue to function.
Supported URL formats for startup script
If your instance uses guest agent version 20251115.00, fetching a startup script using the startup-script-url metadata key fails if the URL uses the https://storage.googleapis.com/ format that is documented in the Use startup scripts on Linux VMs page.
To work around this issue, use one of the following supported URL formats:
- Authenticated URL: https://storage.cloud.google.com/BUCKET/FILE
- gcloud CLI storage URI: gs://BUCKET/FILE
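For example, the following command is an illustrative sketch of setting the metadata key to a gs:// URI on an existing instance; the instance, zone, bucket, and file names are placeholders:

gcloud compute instances add-metadata INSTANCE_NAME --zone=ZONE --metadata=startup-script-url=gs://BUCKET/FILE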
Debian 11 VMs that use an OS image version prior to v20250728 fail to run apt update
On July 22, 2025, the Debian community removed Debian 11 (Bullseye) backports from the main Debian upstream. This update causes sudo apt update to fail with the following error:
The repository 'https://deb.debian.org/debian bullseye-backports Release' does not have a Release file.

Root cause
Because the Debian community removed the backports repositories from the main upstream, there is no longer any reference to bullseye-backports Release.
Resolution
Use image version debian-11-bullseye-v20250728 or newer. These versions don't contain the backports repositories. Alternatively, you can update current instances by modifying /etc/apt/sources.list:
To update the repository URL and use the archive for bullseye-backports:

sudo sed -i 's/^deb https:\/\/deb.debian.org\/debian bullseye-backports main$/deb https:\/\/archive.debian.org\/debian bullseye-backports main/g; s/^deb-src https:\/\/deb.debian.org\/debian bullseye-backports main$/deb-src https:\/\/archive.debian.org\/debian bullseye-backports main/g' /etc/apt/sources.list

To delete the repository URL and discard bullseye-backports:

sudo sed -i '/^deb https:\/\/deb.debian.org\/debian bullseye-backports main$/d; /^deb-src https:\/\/deb.debian.org\/debian bullseye-backports main$/d' /etc/apt/sources.list
Installing the ubuntu-desktop package breaks VM network upon restarting
After installing the ubuntu-desktop package on an Ubuntu VM, execute the following workaround before restarting the VM:

echo -e 'network:\n version: 2\n renderer: networkd' | sudo tee /etc/netplan/99-gce-renderer.yaml

Otherwise, network interfaces might not be correctly configured upon restarting.
Root cause
The ubuntu-desktop package pulls ubuntu-settings as a dependency, which sets NetworkManager as the default "renderer" for netplan. More specifically, it inserts a new YAML configuration for netplan at /usr/lib/netplan/00-network-manager-all.yaml containing the following:
network:
  version: 2
  renderer: NetworkManager

This configuration conflicts with our networkd-based early provisioning using cloud-init.
Recovery
If the VM has been restarted and is inaccessible, then do the following:
- Follow the instructions on rescuing a VM.
- After mounting the inaccessible VM's Linux file system partition, run the following command (replacing /rescue with your mount point):

echo -e 'network:\n version: 2\n renderer: networkd' | sudo tee /rescue/etc/netplan/99-gce-renderer.yaml

- Continue with the instructions on booting the inaccessible VM back.
Ubuntu VMs that use OS Image version v20250530 show incorrect FQDN
You might see an incorrect Fully Qualified Domain Name (FQDN) with the addition of a .local suffix when you do one of the following:
- Update to version 20250328.00 of the google-compute-engine package.
- Launch instances from any Canonical offered Ubuntu image with the version suffix v20250530. For example, projects/ubuntu-os-cloud/global/images/ubuntu-2204-jammy-v20250530.
If you experience this issue, you might see a FQDN similar to the following:
[root@ubuntu2204 ~]# apt list --installed | grep google
...
google-compute-engine/noble-updates,now 20250328.00-0ubuntu2~24.04.0 all [installed]
...
[root@ubuntu2204 ~]# curl "http://metadata.google.internal/computeMetadata/v1/instance/image" -H "Metadata-Flavor: Google"
projects/ubuntu-os-cloud/global/images/ubuntu-2204-jammy-v20250530
[root@ubuntu2204 ~]# hostname -f
ubuntu2204.local

Root cause
On all Ubuntu images with version v20250530, the guest-config package version 20250328.00 adds local to the search path due to the introduction of a new configuration file: https://github.com/GoogleCloudPlatform/guest-configs/blob/20250328.00/src/etc/systemd/resolved.conf.d/gce-resolved.conf
[root@ubuntu2204 ~]# cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
...
nameserver 127.0.0.53
options edns0 trust-ad
search local ... google.internal

The presence of this local entry within the search path in the /etc/resolv.conf file results in a .local element being appended to the hostname when a FQDN is requested.
Note that version 20250501 of guest-configs already fixes the issue, but Canonical hasn't incorporated the fix into their images yet.
Workaround
- Modify the Network Name Resolution configuration file /etc/systemd/resolved.conf.d/gce-resolved.conf by changing Domains=local to Domains=~local.
- Run the following command to restart the systemd-resolved service:

systemctl restart systemd-resolved

- Check that local is removed from the search path in /etc/resolv.conf.
- Confirm the FQDN by using the command hostname -f:

[root@ubuntu2204 ~]# hostname -f
ubuntu2204.us-central1-a.c.my-project.internal
Missing mkfs.ext4 on openSUSE Images
The recent v20250724 release of openSUSE images (starting with opensuse-leap-15-6-v20250724-x86-64) from August 2025 is missing the e2fsprogs package, which provides utilities for managing file systems. A common symptom of this issue is that you see an error message such as command not found when you attempt to use the mkfs.ext4 command.
Workaround
If you encounter this issue, install the missing package manually by using the openSUSE package manager, zypper.
# Update the package index
user@opensuse:~> sudo zypper refresh
# Install the e2fsprogs package
user@opensuse:~> sudo zypper install e2fsprogs
# Verify the installation
user@opensuse:~> which mkfs.ext4

SUSE Enterprise VMs fail to boot after changing instance types
After changing a SUSE Linux Enterprise VM's instance type, it can fail to boot with the following error repeating in the serial console:
Starting [0;1;39mdracut initqueue hook[0m...
[  136.146065] dracut-initqueue[377]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[  136.164820] dracut-initqueue[377]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2fD3E2-0CEB.sh: "[ -e "/dev/disk/by-uuid/D3E2-0CEB" ]"
[  136.188732] dracut-initqueue[377]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2fe7b218a9-449d-4477-8200-a7bb61a9ff4d.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[  136.220738] dracut-initqueue[377]:     [ -e "/dev/disk/by-uuid/e7b218a9-449d-4477-8200-a7bb61a9ff4d" ]
[  136.240713] dracut-initqueue[377]: fi"

Root cause
SUSE creates its cloud images with a versatile initramfs (initial RAM filesystem) that supports various instance types. This is achieved by using the --no-hostonly --no-hostonly-cmdline -o multipath flags during the initial image creation. However, when a new kernel is installed or the initramfs is regenerated, which happens during system updates, these flags are omitted by default. This results in a smaller initramfs tailored specifically for the current system's hardware, potentially excluding drivers needed for other instance types.
For example, C3 instances use NVMe drives, which require specific modules to be included in the initramfs. If a system with an initramfs lacking these NVMe modules is migrated to a C3 instance, the boot process fails. This issue can also affect other instance types with unique hardware requirements.
Workaround
Before changing the machine type, regenerate the initramfs with all drivers:

dracut --force --no-hostonly

If the system is already impacted by the issue, create a temporary rescue VM. Use the chroot command to access the impacted VM's boot disk, then regenerate the initramfs using the following command:

dracut -f --no-hostonly

Lower IOPS performance for Local SSD on Z3 with SUSE 12 images
Z3 VMs on SUSE Linux Enterprise Server (SLES) 12 images have significantly less than expected performance for IOPS on Local SSD disks.
Root cause
This is an issue within the SLES 12 codebase.
Workaround
A patch from SUSE to fix this issue is not available or planned. Instead, you should use the SLES 15 operating system.
RHEL 7 and CentOS VMs lose network access after reboot
If your CentOS or RHEL 7 VMs have multiple network interface cards (NICs) and one of these NICs doesn't use the VirtIO interface, then network access might be lost on reboot. This happens because RHEL doesn't support disabling predictable network interface names if at least one NIC doesn't use the VirtIO interface.
Resolution
Network connectivity can be restored by stopping and starting the VM until the issue resolves. Network connectivity loss can be prevented from reoccurring by doing the following:
- Edit the /etc/default/grub file and remove the kernel parameters net.ifnames=0 and biosdevname=0.
- Regenerate the grub configuration, as shown in the sketch after this list.
- Reboot the VM.
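On RHEL 7 or CentOS 7, regenerating the grub configuration typically looks like the following. This is a minimal sketch and assumes a BIOS-booted system with the default grub.cfg location; adjust the output path for UEFI systems:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot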
repomd.xml signature couldn't be verified
On Red Hat Enterprise Linux (RHEL) or CentOS 7 based systems, you might see the following error when trying to install or update software using yum. This error shows that you have an expired or incorrect repository GPG key.
Sample log:
[root@centos7 ~]# yum update...
google-cloud-sdk/signature | 1.4 kB 00:00:01 !!!
https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-sdk
Trying other mirror.
...
failure: repodata/repomd.xml from google-cloud-sdk: [Errno 256] No more mirrors to try.
https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-sdk
To fix this, disable repository GPG key checking in the yum repository configuration by setting repo_gpgcheck=0. In supported Compute Engine base images this setting might be found in the /etc/yum.repos.d/google-cloud.repo file. However, your VM can have this set in different repository configuration files or automation tools.
Yum repositories don't usually use GPG keys for repository validation. Instead, the https endpoint is trusted.
To locate and update this setting, complete the following steps:
Look for the setting in your /etc/yum.repos.d/google-cloud.repo file.

cat /etc/yum.repos.d/google-cloud.repo
[google-compute-engine]
name=Google Compute Engine
baseurl=https://packages.cloud.google.com/yum/repos/google-compute-engine-el7-x86_64-stable
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
[google-cloud-sdk]
name=Google Cloud SDK
baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

Change all lines that say repo_gpgcheck=1 to repo_gpgcheck=0.

sudo sed -i 's/repo_gpgcheck=1/repo_gpgcheck=0/g' /etc/yum.repos.d/google-cloud.repo
Check that the setting is updated.

cat /etc/yum.repos.d/google-cloud.repo
[google-compute-engine]
name=Google Compute Engine
baseurl=https://packages.cloud.google.com/yum/repos/google-compute-engine-el7-x86_64-stable
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
[google-cloud-sdk]
name=Google Cloud SDK
baseurl=https://packages.cloud.google.com/yum/repos/cloud-sdk-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
Instances using OS Login return a login message after connection
On some instances that use OS Login, you might receive the following error message after the connection is established:
/usr/bin/id: cannot find name for group ID 123456789
Resolution
Ignore the error message.
Known issues for Windows VM instances
- Support for NVMe on Windows using the Community NVMe driver is in Beta, and the performance might not match that of Linux instances. The Community NVMe driver has been replaced with the Microsoft StorNVMe driver in Google Cloud public images. We recommend that you replace the NVMe driver on VMs created before May 2022 and use the Microsoft StorNVMe driver instead.
- After you create an instance, you cannot connect to it instantly. All new Windows instances use the System preparation (sysprep) tool to set up your instance, which can take 5–10 minutes to complete.
- Windows Server images cannot activate without a network connection to kms.windows.googlecloud.com and stop functioning if they don't initially authenticate within 30 days. Software activated by the KMS must reactivate every 180 days, but the KMS attempts to reactivate every 7 days. Make sure to configure your Windows instances so that they remain activated.
- Kernel software that accesses non-emulated model specific registers will generate general protection faults, which can cause a system crash depending on the guest operating system.
- The vioscsi driver, used for SCSI disks, sets the removable flag, causing the disks to be treated as removable storage. This causes unexpected access restrictions in Windows to disks that are subject to Group Policy Objects (GPOs) that target removable storage.
Guest agent fails to start
The Windows guest agent version 20251011.00 fails to start under certain load conditions.
Root cause

The Windows guest agent packaging for version 20251011.00 incorrectly sets the start mode of the Windows guest agent to auto on Windows Service Manager.
Resolution

To resolve this issue, update your instance to use guest agent version 20251120.01 or later.
Manual workaround

If installing version 20251120.01 is not an option, run the following command:
sc.exe config GCEAgent start=delayed-auto
Credential management features might fail for Windows VMs using non-English language names
The Windows guest agent identifies administrator accounts and groups using string matching. Therefore, credential management features only function correctly when you use English language names for user accounts and groups, for example, Administrators. If you use non-English language names, credential management features such as generating or resetting passwords might not function as expected.
Windows Server 2016 won't boot on C3D machine types with 180 or more vCPUs
Windows Server 2016 won't boot on C3D machine types that have 180 or more vCPUs attached (c3d-standard-180, c3d-standard-360, or larger). To work around this issue, choose one of the following options:
- If you need to use Windows Server 2016, use a smaller C3D machine.
- If you need to use the c3d-standard-180 or c3d-standard-360 machine types, use a later version of Windows Server.
Windows Server 2025 and Windows 11 24h2/25h2 - Suspend and resume support
Windows Server 2025, Windows 11 24h2, and Windows 11 25h2 are unable to resume when suspended. Until this issue is resolved, the suspend and resume feature won't be supported for Windows Server 2025, Windows 11 24h2, and Windows 11 25h2.
Errors when measuring NTP time drift using w32tm on Windows VMs
For Windows VMs on Compute Engine running VirtIO NICs, there is a known bug where measuring NTP drift produces errors when using the following command:
w32tm /stripchart /computer:metadata.google.internal

The errors appear similar to the following:
Tracking metadata.google.internal [169.254.169.254:123].
The current time is 11/6/2023 6:52:20 PM.
18:52:20, d:+00.0007693s o:+00.0000285s [ * ]
18:52:22, error: 0x80072733
18:52:24, d:+00.0003550s o:-00.0000754s [ * ]
18:52:26, error: 0x80072733
18:52:28, d:+00.0003728s o:-00.0000696s [ * ]
18:52:30, error: 0x80072733
18:52:32, error: 0x80072733

This bug only impacts Compute Engine VMs with VirtIO NICs. VMs that use gVNIC don't encounter this issue.
To avoid this issue, Google recommends using other NTP drift measuring tools, such as the Meinberg Time Server Monitor.
Inaccessible boot device after updating a VM from Gen 1 or 2 to a Gen 3+ VM
Windows Server binds the boot drive to its initial disk interface type upon first startup. To change an existing VM from an older machine series that uses a SCSI disk interface to a newer machine series that uses an NVMe disk interface, perform a Windows PnP driver sysprep before shutting down the VM. This sysprep only prepares device drivers and verifies that all disk interface types are scanned for the boot drive on the next start.
To update the machine series of a VM, do the following:
From a PowerShell prompt as Administrator, run:
PS C:\> start rundll32.exe sppnp.dll,Sysprep_Generalize_Pnp -wait
- Stop the VM.
- Change the VM to the new VM machine type.
- Start the VM.
If the new VM doesn't start correctly, change the VM back to the original machine type in order to get your VM running again. It should start successfully. Review the migration requirements to verify that you meet them. Then retry the instructions.
Limited disk count attachment for newer VM machine series
VMs running on Microsoft Windows with the NVMe disk interface, which includes T2A and all third-generation VMs, have a disk attachment limit of 16 disks. This limitation does not apply to fourth-generation VMs (C4, M4). To avoid errors, consolidate your Persistent Disk and Hyperdisk storage to a maximum of 16 disks per VM. Local SSD storage is excluded from this issue.
Replace the NVMe driver on VMs created before May 2022
If you want to use NVMe on a VM that uses Microsoft Windows, and the VM was created prior to May 1, 2022, you must update the existing NVMe driver in the guest OS to use the Microsoft StorNVMe driver.
You must update the NVMe driver on your VM before you change the machine type to a third generation machine series, or before creating a boot disk snapshot that will be used to create new VMs that use a third generation machine series.
Use the following commands to install the StorNVMe driver package and remove the community driver, if it's present in the guest OS.
googet update
googet install google-compute-engine-driver-nvme

Lower performance for Local SSD on Microsoft Windows with C3 and C3D VMs
Local SSD performance is limited for C3 and C3D VMs running Microsoft Windows.
Performance improvements are in progress.
Lower performance for Hyperdisk Extreme volumes attached to n2-standard-80 instances running Microsoft Windows
Microsoft Windows instances running on n2-standard-80 machine types can reach at most 80,000 IOPS across all Hyperdisk Extreme volumes that are attached to the instance.
Resolution
To reach up to 160,000 IOPS with N2 instances running Windows, choose one of the following machine types:

- n2-highmem-80
- n2-highcpu-80
- n2-standard-96
- n2-highmem-96
- n2-highcpu-96
- n2-highmem-128
- n2-standard-128
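As an illustrative sketch, you can switch a stopped instance to one of these machine types with the gcloud CLI. The instance name, zone, and chosen machine type are placeholders, and the instance must be stopped before you change its machine type:

gcloud compute instances stop INSTANCE_NAME --zone=ZONE
gcloud compute instances set-machine-type INSTANCE_NAME --zone=ZONE --machine-type=n2-standard-96
gcloud compute instances start INSTANCE_NAME --zone=ZONE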
Poor networking throughput when using gVNIC
Windows Server 2022 and Windows 11 VMs that use gVNIC driver GooGet package version 1.0.0@44 or earlier might experience poor networking throughput when using Google Virtual NIC (gVNIC).
To resolve this issue, update the gVNIC driver GooGet package to version 1.0.0@45 or later by doing the following:
Check which driver version is installed on your VM by running the following command from an administrator Command Prompt or PowerShell session:
googet installed
The output looks similar to the following:
Installed packages:
...
google-compute-engine-driver-gvnic.x86_64 VERSION_NUMBER
...
If the google-compute-engine-driver-gvnic.x86_64 driver version is 1.0.0@44 or earlier, update the GooGet package repository by running the following command from an administrator Command Prompt or PowerShell session:

googet update
Large C4, C4D, and C3D vCPU machine types don't support Windows OS images
C4 machine types with more than 144 vCPUs and C4D and C3D machine types with more than 180 vCPUs don't support Windows Server 2012 and 2016 OS images. Larger C4, C4D, and C3D machine types that use Windows Server 2012 and 2016 OS images will fail to boot. To work around this issue, select a smaller machine type or use another OS image.
C3D VMs created with 360 vCPUs and Windows OS images will fail to boot. To work around this issue, select a smaller machine type or use another OS image.
C4D VMs created with more than 255 vCPUs and Windows 2025 will fail to boot. To work around this issue, select a smaller machine type or use another OS image.
Generic disk error on Windows Server 2016 and 2012 R2 for M3, C3, C3D, and C4D VMs
Warning: Windows Server 2012 R2 is no longer supported and is not recommended for use. Upgrade to a supported version of Windows Server.

The ability to add or resize a Hyperdisk or Persistent Disk for a running M3, C3, C3D, or C4D VM doesn't work as expected on specific Windows guests at this time. Windows Server 2012 R2 and Windows Server 2016, and their corresponding non-server Windows variants, don't respond correctly to the disk attach and disk resize commands.
For example, removing a disk from a running M3 VM disconnects the disk from a Windows Server instance without the Windows operating system recognizing that the disk is gone. Subsequent writes to the disk return a generic error.
Resolution
You must restart the M3, C3, C3D, or C4D VM running on Windows after modifying a Hyperdisk or Persistent Disk for the disk modifications to be recognized by these guests.