Troubleshooting vCPU soft lockups


This document describes how to troubleshoot vCPU soft lockups. A soft lockup occurs when a virtual machine (VM) instance's vCPU is unable to run a new task for more than 20 seconds. Most soft lockups are caused by bugs in application software.

Soft lockups can cause VMs to become unresponsive for short periods of time, disrupt SSH access to VMs, and trigger application timeouts or failover. VMs that are experiencing a soft lockup might also have unusually high or unusually low CPU utilization, depending on the exact cause of the soft lockup.

Identify soft lockups

To identify whether your VM is experiencing a soft lockup, do one of the following:

Example soft lockup stack trace

watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
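As a rough illustration of spotting this message in saved console or kernel log text, the following Python sketch scans lines for the watchdog pattern shown above and extracts the CPU number and the stuck duration. The function name `find_soft_lockups` and the sample log lines are hypothetical. Real kernel messages might carry additional prefixes or fields, so treat the pattern as a starting point.

```python
import re

# Matches the message format shown in the example above.
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
)


def find_soft_lockups(lines):
    """Return a (cpu, seconds) tuple for each soft lockup message found."""
    hits = []
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            hits.append((int(match.group("cpu")), int(match.group("secs"))))
    return hits


# Hypothetical log lines for illustration only.
sample = [
    "[ 1234.5] watchdog: BUG: soft lockup - CPU#3 stuck for 22s!",
    "[ 1240.0] systemd[1]: Started Session 4 of user root.",
]
print(find_soft_lockups(sample))  # [(3, 22)]
```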

To detect future soft lockups, you can do the following:

  1. Enable serial port output logging.

  2. Create a log-based alerting policy for the following log:

    resource.type="gce_instance"
    log_id("serialconsole.googleapis.com/serial_port_1_output")
    textPayload=~"watchdog.*lockup"

    Note: When you test the query, it is likely that no logs appear. This is expected behavior.
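Because a test query might return no logs, one way to sanity-check the regular expression in the filter is to try it against a sample message locally. The sketch below uses Python's `re` module as a stand-in for the logging backend's regular expression engine, which is an assumption. The function name `matches_filter` and the second sample line are hypothetical.

```python
import re

# The same pattern used in the textPayload filter above.
PATTERN = re.compile(r"watchdog.*lockup")


def matches_filter(text_payload):
    """Return True if the pattern occurs anywhere in the payload."""
    return PATTERN.search(text_payload) is not None


print(matches_filter("watchdog: BUG: soft lockup - CPU#3 stuck for 22s!"))  # True
print(matches_filter("kernel: eth0: link up"))  # False
```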

Troubleshoot soft lockups

After you've identified that a soft lockup is occurring, try the following troubleshooting steps to resolve the issue:

  1. Check your OS vendor's site for known errors with your OS version. The stack trace sometimes references specific kernel modules, which can point to the function or operation that is involved.
  2. Identify whether the soft lockup repeats with any frequency, such as coinciding with high load or certain activities. If the soft lockups correlate with high load, you might need to reconfigure your workload, for example by using a larger VM or splitting the load across more VMs.
  3. Check if the soft lockups correlate with any changes to your runtime environment, such as new software deployments or OS image updates.
  4. Evaluate whether any maintenance events have taken place around the time of the soft lockup by reviewing the system event audit logs.
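For step 2, a quick tally of lockup timestamps can show whether the events cluster at particular times of day. The following Python sketch assumes that you have already collected the timestamps of soft lockup log entries as ISO 8601 strings. The function name `lockups_per_hour` and the sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def lockups_per_hour(timestamps):
    """Count soft lockup events per hour of day (0-23)."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


# Hypothetical timestamps of soft lockup log entries.
events = [
    "2025-01-10T02:15:00+00:00",
    "2025-01-11T02:40:12+00:00",
    "2025-01-12T14:05:30+00:00",
]

for hour, count in sorted(lockups_per_hour(events).items()):
    print(f"{hour:02d}:00  {count} lockup(s)")
```

If most events fall in the same hour, compare that window against scheduled jobs, load peaks, or deployments.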

If the preceding troubleshooting steps didn't resolve the issue, file a support case and include all of the information you gathered from troubleshooting.

Best practices to avoid soft lockups

To help prevent your VMs from experiencing soft lockups, we recommend implementing the following best practices:

  • Ensure that you have appropriate redundant components configured for your system, such as high availability clusters, to provide a failover capability if a particular VM experiences a prolonged soft lockup. For more information, see Designing resilient systems.
  • For compute-intensive workloads, consider using compute-optimized machine families.
  • Test your workload with simulated maintenance events to learn how your workload performs during live migration (if enabled), particularly under load testing.
  • If you're running a custom Linux kernel or custom modules in your VM, test new changes under load before deploying them to your production environment. Confirm that your custom changes don't disqualify you from receiving support from your OS vendor.
  • Keep your operating system up to date. For more information, see Operating system details.


Last updated 2025-10-02 UTC.