Monitor disk health

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

You can check the health of a Persistent Disk or Google Cloud Hyperdisk volume by reviewing the disk performance status metric. This metric indicates whether the disk's performance is potentially affected by adverse events within Compute Engine.

An issue affecting the disk performance status might also be visible in your project's Personalized Service Health (PSH) dashboard or the Google Cloud Service Health dashboard.

This document discusses the disk performance status and how to use it to troubleshoot performance issues.

When to check a disk's health

If you notice a performance issue with a disk, check the disk's health by reviewing the disk performance status metric. The disk performance status metric is updated every minute and represents disk performance over the entire previous minute. For steps to check the disk's health, see View a disk's health in Cloud Monitoring.

The following table summarizes the possible values of the disk performance status.

Status             Meaning
Healthy            Disk performance is as expected.
Degraded           You might temporarily observe higher than expected I/O latency.
Severely Degraded  High I/O latency or other errors are occurring.

If the performance status isn't Healthy, see Understand each status for next steps.

If the performance status is Healthy, the disk is functioning normally and you need to look for other causes of the performance issue. Check for application or operating system errors and make sure your disk is optimized correctly. For optimization guidelines, see Optimize Hyperdisk and Optimize Persistent Disk.

How the disk health relates to other disk performance metrics

The disk's health as indicated by the performance status metric shows the internal status of the disk from Google's perspective. If a disk's status is Degraded or Severely Degraded, the root cause is always within the Compute Engine infrastructure.

You generally can't change a disk's health by modifying the workload. However, in rare cases, a change to the workload might trigger an internal issue, so it might be possible to mitigate an issue by modifying the workload.

To learn about the other available disk performance metrics, see Review disk performance metrics.

Scenarios that don't affect the disk performance status

The disk performance status is unrelated to performance issues caused by the following factors:

  • Incomplete or insufficient disk optimization
  • Performance limit associated with the disk and machine type (if the chosen machine type can't meet the performance requirements of your workload)
  • Increased load on the disk due to workload traffic
  • User, application, or operating system error
  • Full or corrupted disks
  • For Hyperdisk and Extreme Persistent Disk volumes, insufficiently provisioned IOPS or throughput

In these situations, it is your responsibility to improve performance, for example by optimizing the disk, scaling up the workload, changing the machine type, or provisioning more capacity, IOPS, or throughput.

View a disk's health in Cloud Monitoring

To view a disk's health, create a chart in Metrics Explorer.

Required roles and permissions

To get the permissions that you need to check the disk performance status metric, ask your administrator to grant you the following IAM roles on the project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a chart in Metrics Explorer

To create a chart, build a query with either the menu-driven interface or PromQL.

Menu-driven interface

To view the health of one or more disks on a chart, follow these instructions.
  1. In the Google Cloud console, go to the Metrics explorer page:

    Go to Metrics explorer

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project or the app-enabled folder's management project.
  3. In the Metric element, expand the Select a metric menu, enter VM Instance in the filter bar, and then use the submenus to select a specific resource type and metric:
    1. In the Active resources menu, select VM Instance.
    2. In the Active metric categories menu, select Instance.
    3. In the Active metrics menu, select Disk performance status.
    4. Click Apply.
    The fully qualified name for this metric is compute.googleapis.com/instance/disk/performance_status.
  4. To add filters, which remove time series from the query results, use the Filter element.

  5. Configure how the data is viewed.
    Disable aggregation. Make sure that in the Aggregation element, the first menu is set to Unaggregated and the second menu is set to None.
    To view the health of a specific disk, filter by device_name.

    For more information about configuring a chart, see Select metrics when using Metrics Explorer.

PromQL

  1. Open the query editor: follow the steps in Write PromQL Queries.

  2. Enter your query in the query editor. For example, to view the performance status of a specific disk, enter the following query:

  last_over_time(
    compute_googleapis_com:instance_disk_performance_status{
      monitored_resource="gce_instance",
      project_id="PROJECT_ID",
      device_name="DISK_NAME"
    }[${__interval}]
  )

Replace PROJECT_ID with the ID of your project and DISK_NAME with the disk name, for example, disk-1.
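If you generate this query for several disks, a small helper can fill in the placeholders. The following is a minimal sketch, not part of any Google Cloud library: the function name `build_status_query` is hypothetical, and it assumes that the project ID and disk name contain no double quotes or backslashes that would need escaping.

```python
# Hypothetical helper: fills the PROJECT_ID and DISK_NAME placeholders
# of the PromQL query shown above. Not part of any Google Cloud library.
METRIC = "compute_googleapis_com:instance_disk_performance_status"


def build_status_query(project_id: str, device_name: str) -> str:
    """Return the PromQL query text for one disk's performance status."""
    labels = (
        ("monitored_resource", "gce_instance"),
        ("project_id", project_id),
        ("device_name", device_name),
    )
    # Assumption: label values need no escaping.
    selector = ", ".join(f'{name}="{value}"' for name, value in labels)
    # ${__interval} is left as-is for the query editor to substitute.
    return "last_over_time(" + METRIC + "{" + selector + "}[${__interval}])"


if __name__ == "__main__":
    print(build_status_query("my-project", "disk-1"))
```

The helper only produces the query text; you still paste the result into the query editor.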

If you view the results in a chart, then there are 3 lines for each disk, one for each possible status. Similarly, if you view the query result in a table, then the table has 3 rows for each disk.

If you built the query with PromQL, then each row or line has a value of 1 or 0. For queries built with the menus, the values are 100% or 0.

The disk's current health is represented by the row or line whose value is 100% or 1.

For example, the following screenshot shows the chart for a disk named a-test-VM, whose status is Healthy:

screenshot showing the chart where the disk's status is Healthy

If you view the query results as a table, the following table is an example of the results for a disk that's Healthy:

performance_status  value
Healthy             1
Degraded            0
Severely Degraded   0

The following screenshot shows the chart for a disk called replica-23509 whose status is Degraded:

screenshot showing the chart where the disk's status is Degraded

For information about what each performance status means, see Understand each status. After you create the chart, you can save the chart to a dashboard for future use.
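The rule for reading a query result (the current status is the row whose value is 1 or 100%) can be sketched in a few lines. This is a minimal illustration, assuming you have already copied the three values for one disk into a dictionary; the function name `current_status` and the dictionary layout are hypothetical and aren't part of any Cloud Monitoring API.

```python
# Hypothetical sketch: pick the current status from the three rows
# returned for one disk. Values are 0 or 1 for PromQL queries and
# 0 or 100 (percent) for queries built with the menus.
def current_status(rows: dict[str, float]) -> str:
    """Return the status whose value is 1 (PromQL) or 100 (menus)."""
    active = [status for status, value in rows.items() if value in (1, 100)]
    if len(active) != 1:
        # Fractional values mean the data was aggregated over time,
        # so no single current status can be read from the rows.
        raise ValueError(f"expected exactly one active status, got {active}")
    return active[0]


if __name__ == "__main__":
    rows = {"Healthy": 1, "Degraded": 0, "Severely Degraded": 0}
    print(current_status(rows))
```

The function raises an error rather than guessing when no row, or more than one row, carries the full value.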

Fractional results

If your query includes fractional results like those in the following table, this is typically because the selected display period was long. As a result, Cloud Monitoring aggregated the data over time. A value of 77% for the Healthy status means that the disk's status was Healthy for 77% of the selected display period.

performance_status  value
Healthy             77%
Degraded            23%
Severely Degraded   0

For a more granular view of a disk's health, use a shorter display period, such as a few hours or a few minutes.
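To see where such fractions come from, consider the per-minute samples inside one display period. The following sketch is only an illustration of the arithmetic, under the assumption that each minute contributes exactly one status sample; it doesn't reproduce how Cloud Monitoring aggregates data internally, and the name `status_fractions` is hypothetical.

```python
from collections import Counter

STATUSES = ("Healthy", "Degraded", "Severely Degraded")


def status_fractions(samples: list[str]) -> dict[str, float]:
    """Return the share of samples spent in each status (0.0 to 1.0)."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(samples)
    return {status: counts[status] / len(samples) for status in STATUSES}


if __name__ == "__main__":
    # Assumed input: 100 one-minute samples, 77 Healthy and 23 Degraded.
    samples = ["Healthy"] * 77 + ["Degraded"] * 23
    for status, share in status_fractions(samples).items():
        print(f"{status}: {share:.0%}")
```

With the assumed 100 samples, the Healthy share is 77/100, which matches the 77% row in the preceding table.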

Understand each status

This section discusses what each status means and when you might need to take further action.

Healthy

The Healthy status indicates that from Google's perspective, the disk is working normally.

If a Healthy disk has performance issues, don't contact support. Instead, troubleshoot the disk using some of the following suggestions:

  • Review disk performance metrics, such as latency and queue depth.
  • Check your workload's logs and metrics for anomalies and bottlenecks.
  • If you're using a Persistent Disk, make sure the provisioned capacity can meet the disk's performance needs. If you're using Hyperdisk or Extreme Persistent Disk volumes, verify you've provisioned enough IOPS and throughput.
  • Ensure you have followed the guidelines to optimize the disk. For more information, see Optimize Hyperdisk and Optimize Persistent Disk.

Degraded

You usually do not need to contact support if your disk's status is Degraded. A Degraded status is generally caused by normal internal maintenance on the Compute Engine infrastructure.

You might not notice any impact to the disk's performance while its status is Degraded. Even if a performance issue and the Degraded status correlate in time, the performance issue might still be unrelated to the Degraded status.

In the unlikely event that a performance issue is due to the Degraded status, the impact is usually temporary. The disk's status should revert to Healthy within a few minutes.

You can safely ignore the Degraded status if there are no performance issues with the disk.

What to do if there is a performance problem

If your disk's performance status is Degraded, and you're observing a performance issue, follow these steps:

  1. Check the PSH dashboard to see if there is an incident affecting the disk. If there is an incident, don't contact support, because Google is aware of the issue and working to resolve it.
  2. If there are no known issues, wait at least 5 minutes for the performance issue to resolve on its own.
  3. If the performance issue is unresolved after 5 minutes and the status is still Degraded, make sure that the performance issue isn't because the disk is insufficiently optimized. To do so, review the disk's metrics, such as latency and queue depth, and the performance optimization guidelines. It's possible that the performance issue and the Degraded status are unrelated and only coincidental.

  4. If the performance issues continue and all the following conditions are met, you can contact support for assistance:

    • The disk's status has been Degraded for more than 5 minutes
    • You are reasonably confident it's not a workload issue because you've optimized the disk and verified there are no other issues such as a bottleneck or an overloaded application
    • There are no alerts in the PSH dashboard

Google doesn't recommend creating an alert for the Degraded status directly. Instead, alert on higher-level application status and use this metric to debug problems.

Severely Degraded

A disk whose performance status is Severely Degraded is experiencing a performance issue. This issue can be due to an incident or error and might already be visible in the PSH dashboard or the Google Cloud Service Health dashboard.

What to do

If your disk's performance status is Severely Degraded, follow these steps:

  1. Check the PSH dashboard and the Google Cloud Service Health dashboard for an incident affecting the disk. If there is an incident, don't contact support, because Google is aware of the issue and working to resolve it.
  2. If there are no known issues in either dashboard, contact support for assistance.

Decision tree

The following diagram illustrates how to proceed if a disk has a performance issue and summarizes the information in the preceding sections.

Flowchart describing steps to take to interpret the disk performance status metric.

As shown in the flowchart, if the disk status is Severely Degraded, contact support only if there are no known incidents in the PSH and Google Cloud Service Health dashboards. If the disk is Degraded, contact support only if all of the following conditions have been met:

  • The disk has been Degraded for more than 5 minutes
  • You have ruled out a workload error or misconfiguration (such as networking issues)
  • No additional optimizations can be performed at the application, workload, or disk level
  • You've reviewed all the disk's metrics
  • You've examined your workload and virtual machine (VM) logs
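The decision process described in the preceding sections can also be written as a short function. This is a simplified sketch of the guidance on this page, not an official tool: the `DiskObservation` fields, the `next_step` function, and the returned strings are all hypothetical names chosen for illustration, and `workload_ruled_out` stands in for the full checklist (optimization, metrics, and logs reviewed).

```python
from dataclasses import dataclass

WAIT_MINUTES = 5  # the waiting period stated in this document


@dataclass
class DiskObservation:
    status: str                  # "Healthy", "Degraded", or "Severely Degraded"
    has_performance_issue: bool  # you observe a performance problem
    known_incident: bool         # an incident appears in a health dashboard
    minutes_in_status: int       # how long the status has lasted
    workload_ruled_out: bool     # optimization, metrics, and logs reviewed


def next_step(obs: DiskObservation) -> str:
    """Map an observation to the next action suggested on this page."""
    if obs.status == "Healthy":
        if obs.has_performance_issue:
            return "troubleshoot the workload and disk optimization"
        return "no action needed"
    if obs.status == "Severely Degraded":
        if obs.known_incident:
            return "wait for the incident to be resolved"
        return "contact support"
    if obs.status == "Degraded":
        if not obs.has_performance_issue:
            return "no action needed"
        if obs.known_incident:
            return "wait for the incident to be resolved"
        if obs.minutes_in_status <= WAIT_MINUTES:
            return "wait for the status to recover"
        if not obs.workload_ruled_out:
            return "troubleshoot the workload and disk optimization"
        return "contact support"
    raise ValueError(f"unknown status: {obs.status!r}")


if __name__ == "__main__":
    obs = DiskObservation("Degraded", True, False, 12, True)
    print(next_step(obs))
```

Keeping each status in its own branch mirrors the flowchart and makes it clear that contacting support is the last outcome reached for a Degraded disk.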

Last updated 2025-12-15 UTC.