Observe and monitor VMs

This document describes how to access and view virtual machine (VM) metrics. It also describes how to review VM metrics to learn more about your VMs or troubleshoot specific problems with a VM.

Monitoring virtual machine (VM) instances is essential to maintaining your VM resources. Compute Engine offers a high-level view of your VM metrics using the Observability tab in the Google Cloud console. This tab provides a predefined dashboard using telemetry data so you can monitor your VMs and make informed decisions about your Compute Engine resources. You can also customize the predefined dashboard to view only the specific metrics that you want.

All VMs have basic process utilization data available when they are created. However, installing the Ops Agent provides deeper insights into VM behavior.

Note: CPU utilization monitoring isn't available with bare metal instances. If you install the Ops Agent on a bare metal instance, you can instead use the metric "OS Reported CPU %".

For more information about creating a monitoring alerting policy, using the Metrics Explorer, or for general information on how monitoring and metrics work on Google Cloud, see Cloud Monitoring documents.

Before you begin

Optional: Install the Ops Agent to gather more detailed data from your Compute Engine instances.

To check which VM instances have the Ops Agent installed, do the following:

  1. In the Google Cloud console, go to Monitoring Dashboards.

    Go to Monitoring Dashboards

  2. Select VM instances from the dashboard list.

  3. Click List to view the VMs as a list.

    All the VMs in your project are displayed. The Agent column shows the status of Ops Agent installation. You can install or update the agent from this page.

  4. Optional: To update the Predefined dashboard to display events, such as those that indicate an update to a managed instance group, click Select Events, and then complete the dialog.

    For more information about events, see Event types.

Access VM observability metrics

Access information for single or multiple VMs using the Observability tab in the Google Cloud console. By default, a predefined dashboard displays the VM metrics. To view only specific metrics, you can create a customized dashboard.

View observability metrics for a single VM

Basic VM metrics, such as CPU utilization and network traffic, are available to you when you create your VM. Metrics for memory and process utilization are only available with the installation of the Ops Agent, which is the primary agent for collecting telemetry from your Compute Engine instances.

Note: CPU utilization monitoring isn't available with bare metal instances. If you install the Ops Agent on a bare metal instance, you can instead view the "OS Reported CPU %" metric.
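
The split between metrics that are available at creation and metrics that need the Ops Agent can be sketched as a small lookup. This is only an illustration: the metric type strings and the `available_metrics` helper are assumptions that are not taken from this page, so verify them against the Cloud Monitoring metrics list before relying on them.

```python
# Metric type strings below are assumptions for illustration only.
BASIC_METRICS = {
    "cpu_utilization": "compute.googleapis.com/instance/cpu/utilization",
    "network_received": "compute.googleapis.com/instance/network/received_bytes_count",
    "network_sent": "compute.googleapis.com/instance/network/sent_bytes_count",
}
AGENT_METRICS = {
    "memory_percent_used": "agent.googleapis.com/memory/percent_used",
    "process_cpu_time": "agent.googleapis.com/processes/cpu_time",
}


def available_metrics(agent_installed: bool, bare_metal: bool = False) -> dict:
    """Return the metrics you can expect to see for one instance."""
    metrics = dict(BASIC_METRICS)
    if bare_metal:
        # CPU utilization monitoring isn't available with bare metal instances.
        metrics.pop("cpu_utilization")
    if agent_installed:
        # Memory and process metrics require the Ops Agent.
        metrics.update(AGENT_METRICS)
    return metrics
```

For example, `available_metrics(agent_installed=False)` contains no memory or process entries, which matches the empty graphs you see for a VM without the agent.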

To view the metrics for a single VM, do the following:

  1. In the Google Cloud console, go to the VM instances page.

    Go to VM instances

  2. Select a VM to open the Details page.

  3. Click the Observability tab to display information about the VM.

  4. Optional: Reset the one-hour default timeframe to the timeframe you want to monitor.

  5. Optional: To update the Predefined dashboard to display events, such as those that indicate an update to a managed instance group, click Select Events, and then complete the dialog.

    For more information about events, see Event types.
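
If you later query the same data programmatically, the timeframe in step 4 becomes a start and end timestamp. The following is a minimal sketch of that calculation, assuming a one-hour default like the console uses. The `time_window` helper and `DEFAULT_LOOKBACK` constant are hypothetical names, not part of any Google Cloud library.

```python
from datetime import datetime, timedelta, timezone

# Mirrors the one-hour default timeframe described in the steps above.
DEFAULT_LOOKBACK = timedelta(hours=1)


def time_window(end=None, lookback=DEFAULT_LOOKBACK):
    """Return (start, end) as timezone-aware UTC datetimes."""
    if lookback <= timedelta(0):
        raise ValueError("lookback must be positive")
    if end is None:
        end = datetime.now(timezone.utc)
    return end - lookback, end
```

Calling `time_window(lookback=timedelta(hours=6))` widens the window to the last six hours, which is the programmatic equivalent of resetting the timeframe in the console.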

The information in Figure 1 displays VM details without the Ops Agent installed on the VM. Notice that the Memory and Disk Space Utilization graphs have no data.

Figure 1: The Observability tab for a single VM without the Ops Agent installed.

View observability metrics for multiple VMs

Observability at the fleet level displays the metrics for the top five VMs with the highest process utilization. The top five VMs listed vary by metric. You might not see the same five VMs for each process. Although there is more data available at the fleet level without installing the Ops Agent compared to the amount of data available for a single VM, installing the agent provides more data for future troubleshooting purposes.
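
To see why the top five VMs can differ from one metric to the next, consider this minimal sketch that ranks VMs independently for each metric. The input shape and the `top_vms_by_metric` helper are assumptions made for illustration. They do not describe how the console computes its ranking.

```python
import heapq


def top_vms_by_metric(samples, n=5):
    """Rank VMs separately for each metric.

    samples maps a metric name to {vm_name: latest_value}.
    Returns {metric_name: [(vm_name, value), ...]} sorted highest first.
    """
    return {
        metric: heapq.nlargest(n, per_vm.items(), key=lambda item: item[1])
        for metric, per_vm in samples.items()
    }
```

Because each metric is ranked on its own values, a VM that leads the CPU chart can be absent from the memory chart.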

Note: CPU utilization monitoring isn't available with bare metal instances. If you install the Ops Agent on a bare metal instance, you can instead view the "OS Reported CPU %" metric.

To view the metrics for multiple VMs, do the following:

  1. In the Google Cloud console, go to the VM instances page.

    Go to VM instances

  2. Click the Observability tab.

  3. Optional: Reset the one-hour default timeframe to the timeframe that you want to monitor.

  4. Filter the results by one or more of the following options:

    • ID
    • Name
    • Machine type
    • Zone
    • Region
    • Instance group
    • Labels
    • State
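
The filter options above combine so that a VM must match every selected value. A minimal sketch of that behavior over an in-memory list follows. The record fields and the `filter_vms` helper are hypothetical, and label matching as a subset is an assumption about how such a filter could work.

```python
def filter_vms(vms, **criteria):
    """Keep the VMs whose fields match every criterion.

    Plain fields (name, zone, state, ...) must be equal.
    The 'labels' criterion matches when every requested label is present
    with the requested value.
    """
    def matches(vm):
        for field, wanted in criteria.items():
            if field == "labels":
                labels = vm.get("labels", {})
                if any(labels.get(key) != value for key, value in wanted.items()):
                    return False
            elif vm.get(field) != wanted:
                return False
        return True

    return [vm for vm in vms if matches(vm)]
```

For example, `filter_vms(vms, zone="us-central1-a", labels={"env": "prod"})` keeps only the VMs in that zone that also carry the `env=prod` label.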

The information in Figure 2 displays an example of the Observability tab when multiple VMs in a project have the Ops Agent installed. Notice there are more metrics available about these VMs.

Figure 2: Multiple VM instances with the Ops Agent installed.

View detailed metrics for a VM

Each VM process metric is represented by a graph line on a chart. In the following example, the uptime-demo VM has the Ops Agent installed. Memory utilization data is available for troubleshooting purposes. If a VM is not listed on the card, filter by the VM name to find a specific VM.

To retrieve the information about this VM or another of the top five VMs from the Observability tab, do the following:

  1. Hold the pointer over the graph line of any VM. A card appears with a list of the top five VMs using the process, each displaying a metric.
  2. To learn more about the VM's behavior, click the VM graph line or a specific VM name on the list.

The uptime-demo VM displayed on the card in Figure 3 reveals some metrics that might require a review.

Figure 3: The graph line represents a VM. Click it to learn more about a specific VM.

Click the uptime-demo VM to open the VM Details page displayed in Figure 4, which provides the following information:

  • The Ops Agent status.
  • The in-context options to create Alerts, check for Events, or create Uptime Checks.
  • The option to view the details of the VM's configurations, metrics, and logs.

Figure 4: The VM Details page provides information about a specific VM.

Create a customized dashboard to view specific metrics

By default, the Observability tab in Compute Engine provides a predefined dashboard that displays basic VM metrics. To view only the metrics that you want to see, you can modify the predefined dashboard and save it as a customized dashboard. You can then continue to customize that dashboard.

To create a customized dashboard, do the following:

  1. In the Google Cloud console, go to the VM instances page.

    Go to VM instances

  2. Go to the Observability tab as follows:

    • For a single VM: In the VM instances page, click the VM name to open its Details page, and then click the Observability tab for that VM.
    • For multiple VMs: In the VM instances page, click the Observability tab.

  3. If the Dashboard drop-down is enabled, then customized dashboards are available. To modify a custom view, select a custom view from the drop-down, and then, in the dashboard toolbar, click.

  4. Otherwise, to customize the predefined dashboard, in the dashboard toolbar, click.

    Compute Engine creates a copy of the predefined dashboard, and then opens the copy in edit mode.

  5. In the editor, you can add, modify, delete, reposition, or resize the visualizations in the dashboard. The visualizations are collectively called widgets. For more information about the different widget types, see Dashboards overview.

    • To add a widget, in the dashboard toolbar, click Add widget and complete the configuration.

      For example, to view the logs with your metric data, click Add widget, select Logs, and then click Apply.

    • To modify a widget, place your pointer on the widget to activate the toolbar, click Edit widget, and then use the Configure widget dialog. To apply your changes to the dashboard, in the toolbar, click Apply. To discard your changes, click Cancel.

    • To delete a widget, place your pointer on the widget to activate the toolbar, click More chart options, and then select Delete.

    • To reposition a widget, use your pointer to drag the widget by its header to a new location.

    • To resize a widget, use your pointer to reposition the right-hand corner of the widget.

  6. After you finish modifying the dashboard, click Save.

  7. In the dialog confirming the changes, click View customized dashboard to go to the customized view.

    You can switch back to the predefined view by selecting Predefined from the Dashboard drop-down.
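
The copy-then-edit workflow in the steps above can be modeled in a few lines. The sketch below is a conceptual model only: the `Widget` and `Dashboard` classes, their fields, and the grid sizes are hypothetical, and they are not the Cloud Monitoring dashboard schema.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Widget:
    title: str
    x: int = 0        # column of the top-left corner (hypothetical grid units)
    y: int = 0        # row of the top-left corner
    width: int = 24
    height: int = 16


@dataclass
class Dashboard:
    name: str
    widgets: list = field(default_factory=list)

    def copy_as(self, name):
        """Customizing starts from a copy, so the predefined view is untouched."""
        return Dashboard(name, list(self.widgets))

    def add(self, widget):
        self.widgets.append(widget)

    def delete(self, title):
        self.widgets = [w for w in self.widgets if w.title != title]

    def update(self, title, **changes):
        """Reposition (x, y) or resize (width, height) the named widget."""
        self.widgets = [
            replace(w, **changes) if w.title == title else w for w in self.widgets
        ]
```

Editing the copy rather than the original is what lets you switch back to the predefined view at any time.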

Review resource metrics

To learn more about each resource metric, click each process within the Observability tab menu:

  • Explore CPU, Processes, Memory utilization, Network traffic, and Disk utilization.
  • View log data by searching Logs to identify and view System Events.
  • Add third-party Integrations and check for Configured existing integrations.

The rest of this section describes examples of how some processes might affect your workloads. This information assumes that the Ops Agent is installed on your VMs.

CPU utilization

An example of extreme CPU utilization might be when a server is under an unexpectedly heavy load, such as when a website experiences a sudden surge in traffic or when a large-scale data processing task is in progress. In such situations, the CPU might be running at 100% capacity for an extended period of time, which can cause the server to slow down or become unresponsive.

In this example, saturation is the concern. If your CPU utilization is at 100%, that might be fine for your workloads, but you might want to examine other metrics to learn if this requires intervention. In this case, you might want to create an alerting policy so you're notified when a VM's CPU utilization surges.
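
Because a brief spike to 100% might be harmless, an alert condition usually looks for utilization that stays high for some duration. The following minimal sketch shows that idea on a list of evenly spaced samples. The `sustained_breaches` helper, the 0.95 threshold, and the five-sample duration are illustrative assumptions, not recommended values or the alerting policy implementation.

```python
def sustained_breaches(samples, threshold=0.95, min_consecutive=5):
    """Find runs where utilization stays at or above the threshold.

    samples: utilization fractions (0.0 to 1.0) taken at a fixed interval.
    Returns (start_index, end_index) pairs, inclusive, for each run that
    lasts at least min_consecutive samples.
    """
    runs = []
    start = None
    for index, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_consecutive:
                runs.append((start, index - 1))
            start = None
    # A run that is still in progress at the end of the data also counts.
    if start is not None and len(samples) - start >= min_consecutive:
        runs.append((start, len(samples) - 1))
    return runs
```

With one sample per minute, `min_consecutive=5` corresponds to five minutes of sustained saturation, so shorter surges are ignored.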

With proper permissions, you can connect using SSH to your VMs to investigate the problem. However, if the Ops Agent is installed, you can see more historical data to help you troubleshoot.

Note: CPU utilization monitoring isn't available with bare metal instances. If you install the Ops Agent on a bare metal instance, you can instead view the "OS Reported CPU %" metric.

Process utilization

An example of extreme process behavior might be when a process consumes an excessive amount of resources, such as CPU, memory, or disk I/O, to the point where it degrades performance or even crashes the VM.

For example, if a process running on a VM is experiencing a memory leak, it might start consuming increasingly large amounts of memory over time, eventually causing the VM to run out of memory and crash. Similarly, if a process is using the disk heavily, it can cause the VM's disk I/O to become saturated, leading to slow response times for other processes.
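
A memory leak typically shows up as a steady upward trend rather than a single high reading. One way to sketch that check is to fit a least-squares line through evenly spaced memory samples and look at its slope. The helper names and the `min_slope` threshold below are assumptions chosen for illustration.

```python
def slope_per_sample(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def looks_like_leak(memory_percent, min_slope=0.5):
    """Flag a series whose memory use grows by at least min_slope points per sample."""
    return slope_per_sample(memory_percent) >= min_slope
```

A positive slope alone does not prove a leak, because caches and warm-up phases also grow memory use, so treat the result as a prompt to inspect the process.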

Memory utilization

Databases often require a large amount of memory to perform operations like indexing, sorting, and joining tables.

An example of high memory usage on a VM is when you run a database server, such as Cloud SQL for MySQL or Cloud SQL for PostgreSQL, with a large dataset. If the available memory of your VM is too small, reloading a dataset into memory can cause the database to run slowly or crash.

Network performance

Network performance issues can result from several factors: congestion, bandwidth limitations, hardware or software issues, and latency. To diagnose the problem, monitor your network performance metrics, troubleshoot hardware and software issues, and analyze network traffic patterns to identify and resolve the root cause.

Disk utilization

High disk utilization on a VM occurs when a large amount of data is being read from or written to the virtual disk, which delays disk access and can affect VM performance.

Monitoring disk utilization metrics like disk I/O operations per second (IOPS), disk queue length, and average disk response time can help identify and diagnose high disk utilization issues on a VM.
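
The three disk metrics named above are related, and all three can be derived from two readings of cumulative counters. The sketch below assumes a hypothetical `DiskCounters` record that holds a timestamp, the number of completed operations, and the summed latency of those operations. Real counters differ by operating system and agent, so treat the field names as placeholders.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    timestamp_s: float           # when the counters were read
    completed_ops: int           # cumulative completed read and write operations
    total_request_time_s: float  # cumulative sum of per-request latencies


def disk_rates(prev, curr):
    """Derive IOPS, average response time, and average queue length."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("curr must be read after prev")
    ops = curr.completed_ops - prev.completed_ops
    request_time = curr.total_request_time_s - prev.total_request_time_s
    return {
        "iops": ops / elapsed,
        "avg_response_ms": (request_time / ops) * 1000 if ops else 0.0,
        # Little's law: queue length = arrival rate x time in system.
        "avg_queue_length": request_time / elapsed,
    }
```

A rising queue length together with a rising response time at flat IOPS suggests that the disk is saturated rather than simply busy.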

Check logs and system events

The All Logs page provides log data about your resources. Sort by severity to identify problems and inspect the payload.
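
If you export log entries and want the same severity-first view offline, a minimal sketch looks like the following. The severity names follow the Cloud Logging severity levels as the author of this example understands them. The dictionary-based entry shape and the `most_severe_first` helper are assumptions.

```python
# Severity names ordered from least to most severe (assumed ordering).
SEVERITY_ORDER = [
    "DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
    "ERROR", "CRITICAL", "ALERT", "EMERGENCY",
]
SEVERITY_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


def most_severe_first(entries):
    """Sort log entries so the most severe come first.

    Entries with a missing or unknown severity are treated as DEFAULT.
    Entries with equal severity keep their original relative order.
    """
    return sorted(
        entries,
        key=lambda entry: SEVERITY_RANK.get(entry.get("severity", "DEFAULT"), 0),
        reverse=True,
    )
```

After sorting, the first few entries are the ones whose payloads are most worth inspecting.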

Audit logs record administrative events that occur in your resources. The logs can tell you what happened to trigger the event. Identical logs are grouped in a single row. For example, if you have 20 identical logs, the information is stored in one row rather than 20 separate rows.
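
The grouping of identical logs into one row can be sketched with a counter. In this illustration, two entries count as identical when their severity and message match. That rule and the `collapse_identical` helper are assumptions, because this page does not state how the console decides that logs are identical.

```python
from collections import Counter


def collapse_identical(entries):
    """Collapse identical entries into one row each, with an occurrence count.

    Rows appear in the order in which each distinct entry was first seen.
    """
    counts = Counter((entry["severity"], entry["message"]) for entry in entries)
    return [
        {"severity": severity, "message": message, "count": count}
        for (severity, message), count in counts.items()
    ]
```

Twenty copies of the same entry therefore produce a single row whose count is 20.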

You can think of System Events as an umbrella term for events that are occurring at a higher level, but might affect your Compute Engine resources. A system event occurs when an error unrelated to a planned event fires. System events are logged at the fleet level.

Use third-party integrations

Monitoring provides integrations with third-party applications. These integrations let you collect telemetry from applications such as Apache Web Server, Cloud SQL for MySQL, Memorystore for Redis, and others for deployments running on Compute Engine and GKE. When you use Compute Engine, third-party telemetry is collected by the Ops Agent.


Last updated 2025-12-15 UTC.