Cloud Monitoring overview

This document provides an overview of the services that Cloud Monitoring provides. These services can help you to understand the behavior, health, and performance of your applications and of other Google Cloud services.

Cloud Monitoring automatically collects and stores performance information for most Google Cloud services. You can collect Prometheus metrics by using Google Cloud Managed Service for Prometheus. If you install the Ops Agent on your Compute Engine virtual machines (VMs), then you can collect metrics and logs from your applications and from third-party applications.

The alerting, testing, and visualization services provided by Cloud Monitoring help you answer important questions like the following:

  • What is the load on my service?
  • Is my website responding correctly?
  • Is my service performing well?
  • What is the health of my App Hub application?

Cloud Monitoring provides both Google Cloud console and API support for most of its services. Some services also support the Google Cloud CLI or Terraform. The Cloud Monitoring API reference pages, such as the page for alertPolicies.list, let you experiment with API calls directly from the reference page.
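As a rough illustration of what an API call such as alertPolicies.list looks like, the following Python sketch only builds the request URL; it doesn't send anything. It assumes the public v3 REST endpoint of the Cloud Monitoring API. The helper name alert_policies_list_url and the project ID sampleproject are placeholders chosen for this example, and a real request would also need an OAuth 2.0 access token, which isn't shown.

```python
from urllib.parse import urlencode

# Assumed REST root for the Cloud Monitoring API (v3).
API_ROOT = "https://monitoring.googleapis.com/v3"


def alert_policies_list_url(project_id: str, page_size: int = 0) -> str:
    """Build the URL for an alertPolicies.list request (hypothetical helper)."""
    url = f"{API_ROOT}/projects/{project_id}/alertPolicies"
    if page_size > 0:
        # pageSize limits how many policies one response page contains.
        url += "?" + urlencode({"pageSize": page_size})
    return url


if __name__ == "__main__":
    # "sampleproject" is a placeholder project ID.
    print(alert_policies_list_url("sampleproject", page_size=10))
```

You could send the resulting URL with any HTTP client as a GET request after adding an Authorization header.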

Cloud Monitoring services

Cloud Monitoring provides several services that you can use to understand the health and performance of your applications, and of the other Google Cloud services that you use.

Incidents and notifications

To be notified when the value of a performance metric meets criteria that you define, create an alerting policy. The alerting policy includes the list of people or groups who are to receive notifications. Monitoring supports common notification channels, including email, Cloud Mobile App, and services such as PagerDuty or Slack. For example, you might create an alerting policy so that you are notified when the CPU utilization of a VM exceeds 80%.

Each notification includes relevant information about a failure, and it includes a link to an incident. An incident is a persistent record that stores information that you can use to troubleshoot the failure. Typically, a record lists the status of the incident, links to logs, a chart of the recorded metric data, labels, and duration.

The alerting service is integrated with many Google Cloud services. When these integrations exist, you might see a panel that lists recommended alerts, or you might see a button on a chart that lets you create an alerting policy. In both cases, the alerting policies are pre-configured; you only specify the list of people or groups to be notified.

You can create and manage alerting policies by using the Google Cloud console,the Cloud Monitoring API, the Google Cloud CLI, or Terraform.
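To make the 80% CPU example concrete, here is a minimal Python sketch that assembles a request body for an alerting policy as a plain dictionary. The field names follow the AlertPolicy resource of the Cloud Monitoring API as I understand it. The function name, display names, five-minute duration, alignment settings, and notification channel name are illustrative assumptions, and the threshold of 0.8 assumes that the metric reports utilization as a fraction between 0 and 1.

```python
import json


def cpu_alert_policy(notification_channels, threshold=0.8):
    """Return an AlertPolicy request body as a dict (illustrative sketch)."""
    return {
        "displayName": "VM CPU utilization above threshold",
        # With a single condition, the combiner has no practical effect.
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "CPU utilization > threshold",
                "conditionThreshold": {
                    "filter": (
                        'metric.type = "compute.googleapis.com/instance/cpu/utilization" '
                        'AND resource.type = "gce_instance"'
                    ),
                    "comparison": "COMPARISON_GT",
                    # Assumption: utilization is a fraction, so 0.8 means 80%.
                    "thresholdValue": threshold,
                    # Assumption: the condition must hold for five minutes.
                    "duration": "300s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
        # Resource names of the channels that receive notifications.
        "notificationChannels": list(notification_channels),
    }


if __name__ == "__main__":
    # CHANNEL_ID is a placeholder, not a real notification channel.
    channel = "projects/sampleproject/notificationChannels/CHANNEL_ID"
    print(json.dumps(cpu_alert_policy([channel]), indent=2))
```

A body like this could be sent through the API, or translated into the equivalent Google Cloud CLI or Terraform configuration.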

Proactive monitoring and validation

To test the availability, consistency, and performance of your services, applications, web pages, and APIs, create synthetic monitors. For example, you can probe HTTP, HTTPS, and TCP endpoints for responsiveness with uptime checks, and then get notified when an endpoint fails to respond. You can also create a broken-link checker to crawl a web page and then notify you when broken links are detected.

You can create and manage synthetic monitors by using the Google Cloud console,the Cloud Monitoring API, the Google Cloud CLI, or Terraform.
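As a sketch of what an uptime check definition might contain, the following Python function builds a request body for an HTTPS check. The field names follow the UptimeCheckConfig resource of the Cloud Monitoring API as I understand it. The function name, host, path, check period, and timeout are assumptions made for illustration.

```python
import json


def https_uptime_check(project_id, host, path="/"):
    """Return an UptimeCheckConfig request body as a dict (illustrative sketch)."""
    return {
        "displayName": f"Uptime check for {host}",
        # The "uptime_url" monitored resource identifies the endpoint to probe.
        "monitoredResource": {
            "type": "uptime_url",
            "labels": {"project_id": project_id, "host": host},
        },
        "httpCheck": {"path": path, "port": 443, "useSsl": True},
        # Assumption: probe once a minute and wait up to 10 seconds for a reply.
        "period": "60s",
        "timeout": "10s",
    }


if __name__ == "__main__":
    # The project ID and host are placeholders.
    print(json.dumps(https_uptime_check("sampleproject", "www.example.com"), indent=2))
```

To be notified when the endpoint stops responding, you would pair a check like this with an alerting policy.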

Data visualization

As you instantiate Google Cloud resources or register applications with App Hub, the dashboard service automatically creates Google Cloud-managed dashboards. These dashboards show curated information that helps you understand the health of your resources and applications. For example, for an App Hub application, dashboards are created for the application and for each of its services and workloads. These dashboards show information like an application's log or metric data, and the number of open alerts.

The dashboards created by Google Cloud might provide you enough information to complete an investigation. However, they might not provide the exact data you need to see trends, identify outliers, or view other details about your data. To complete these tasks, you can use the dashboard and charting services:

  • To control what data you view and the display format for that data, create a custom dashboard. For example, you might import a Grafana dashboard or install a dashboard from a template.

    Your custom dashboards can display the following:

    • Charts and tables that show metric data
    • Log data and error groups
    • Charts for alerting policies
    • Information about alerts
    • Text
    • Events, such as a reboot or a crash, that affect the operation of a system

    You can create and manage dashboards by using the Google Cloud console or the API.

  • The chart service, Metrics Explorer, lets you quickly visualize and explore time-series data. The chart settings let you compare current data to previous data, display outliers and percentiles, and display multiple metrics. You can also save charts to a custom dashboard.

Data collection and storage

Cloud Monitoring collects and stores the following types of metric data:

  • Log-based metrics that record numeric information about the logs written to Cloud Logging. Google-defined log-based metrics include counts of the errors that your service detects and the total number of log entries received by your Google Cloud project. You can also define log-based metrics.
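The idea behind a counter-style log-based metric can be sketched locally in Python: scan log entries and count those that match a criterion. This is only an illustration of the concept, not how Cloud Logging computes its metrics. The sample entries are invented, and only the severity field name is taken from the Cloud Logging entry format.

```python
from collections import Counter

# Invented sample log entries; real entries carry many more fields.
SAMPLE_ENTRIES = [
    {"severity": "INFO", "textPayload": "request handled"},
    {"severity": "ERROR", "textPayload": "upstream timeout"},
    {"severity": "ERROR", "textPayload": "disk full"},
    {"textPayload": "entry without an explicit severity"},
]


def count_by_severity(entries):
    """Count log entries per severity, treating a missing severity as DEFAULT."""
    return Counter(entry.get("severity", "DEFAULT") for entry in entries)


if __name__ == "__main__":
    counts = count_by_severity(SAMPLE_ENTRIES)
    print("errors:", counts["ERROR"], "total:", sum(counts.values()))
```

The error count and the total correspond to the two kinds of Google-defined counts mentioned above.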

Query languages

When you create an alerting policy or a chart, you must provide a query that describes the data that you want to monitor or chart:

  • Google Cloud console: You can build your query by making selections from menus, or you can write a query. A query editor is available for the Prometheus Query Language (PromQL); it provides syntax checks and suggestions. You can also write a Monitoring filter expression.

  • Cloud Monitoring API: The API supports Prometheus Query Language (PromQL) and Monitoring filter expressions.
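To show what the two query styles can look like, the following Python sketch builds a Monitoring filter expression and derives a PromQL-style metric name from a Cloud Monitoring metric type. Both helper functions are hypothetical. The name mapping (first "/" becomes ":", other special characters become "_") reflects my understanding of the documented convention, so treat it as an assumption and check the PromQL documentation before relying on it.

```python
import re


def monitoring_filter(metric_type, resource_type=None, metric_labels=None):
    """Build a Monitoring filter expression from its parts (hypothetical helper)."""
    parts = [f'metric.type = "{metric_type}"']
    if resource_type:
        parts.append(f'resource.type = "{resource_type}"')
    # Sort labels so that the generated filter is deterministic.
    for key, value in sorted((metric_labels or {}).items()):
        parts.append(f'metric.labels.{key} = "{value}"')
    return " AND ".join(parts)


def promql_metric_name(metric_type):
    """Map a metric type to a PromQL-style name (assumed convention)."""
    domain, _, path = metric_type.partition("/")

    def sanitize(text):
        # Replace every character that PromQL names can't contain.
        return re.sub(r"[^A-Za-z0-9_]", "_", text)

    return f"{sanitize(domain)}:{sanitize(path)}"


if __name__ == "__main__":
    metric = "compute.googleapis.com/instance/cpu/utilization"
    print(monitoring_filter(metric, "gce_instance", {"instance_name": "test"}))
    print(promql_metric_name(metric))
```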

Monitor large systems

This section describes how you can manage resources as a collection and how you can monitor metrics that are stored in multiple Google Cloud projects.

Manage resources as a collection

To manage your resources as a collection instead of individually, create a resource group. A resource group is a dynamic collection of resources that satisfy some criteria that you provide. As you add and remove resources, for example by adding Compute Engine VM instances to your Google Cloud project, the membership in the group automatically changes. The following are examples of resource groups:

  • Compute Engine instances whose names start with the string prod-.
  • Resources with the tag test-cluster.
  • Amazon EC2 instances in region A or region B.

After you define a resource group, you can monitor the group as if it were a single resource. For example, you can configure an uptime check to monitor a resource group. For charts and alerting policies, you can also filter based on the group name.

For more information, see Configure resource groups.
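The dynamic nature of group membership can be sketched in a few lines of Python: the group stores only its criteria, and membership is recomputed from the current set of resources. This is a local illustration of the concept, not the service's implementation; the resource names and the helper functions are invented for the example.

```python
def name_starts_with(prefix):
    """Return a criterion that matches resources whose name has the given prefix."""
    return lambda resource: resource["name"].startswith(prefix)


def group_members(resources, criterion):
    """Evaluate the group's criterion against the current resources."""
    return [resource["name"] for resource in resources if criterion(resource)]


if __name__ == "__main__":
    # Invented resources; the names are placeholders.
    resources = [{"name": "prod-web-1"}, {"name": "test-web-1"}]
    prod_group = name_starts_with("prod-")
    print(group_members(resources, prod_group))

    # Adding a VM that matches the criterion changes the membership
    # without any change to the group definition.
    resources.append({"name": "prod-web-2"})
    print(group_members(resources, prod_group))
```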

Monitor metrics for multiple Google Cloud projects

To view and monitor the time-series data for multiple Google Cloud projects and AWS accounts through a single interface, configure a multi-project metrics scope.

By default, Cloud Monitoring pages in the Google Cloud console provide access only to the time series stored in the scoping project. The scoping project is the project that you selected with the Google Cloud console project picker. The scoping project stores the alerts, synthetic monitors, dashboards, and monitoring groups that you configure.

The scoping project also hosts a metrics scope. The metrics scope defines the projects and accounts whose metrics are visible to the scoping project. You can configure the metrics scope to include time-series data from other Google Cloud projects and from AWS accounts. For information about how to modify a metrics scope, see Configure a metrics scope for multiple projects.
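The effect of a metrics scope on visibility can be modeled with a small Python sketch: the scoping project always sees its own time series, and it additionally sees the series of every project listed in its metrics scope. The project IDs and series names are invented, and the function is a conceptual model rather than an API call.

```python
def visible_time_series(scoping_project, monitored_projects, series_by_project):
    """Return the series visible from the scoping project (conceptual model)."""
    # The scoping project comes first; skip it if it is also listed explicitly.
    projects = [scoping_project] + [
        project for project in monitored_projects if project != scoping_project
    ]
    visible = []
    for project in projects:
        visible.extend(series_by_project.get(project, []))
    return visible


if __name__ == "__main__":
    # Invented project IDs and series names.
    stored = {
        "scoping-project": ["cpu/vm-a"],
        "project-b": ["cpu/vm-b"],
        "project-c": ["cpu/vm-c"],
    }
    # Default scope: only the scoping project's own data is visible.
    print(visible_time_series("scoping-project", [], stored))
    # Multi-project scope: data from project-b becomes visible as well.
    print(visible_time_series("scoping-project", ["project-b"], stored))
```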

Cloud Monitoring data model

This section introduces the Cloud Monitoring data model:

  • A metric type describes something that is measured. Examples of metric types include a VM's CPU utilization and the percentage of a disk that is used.

  • A time series is a data structure that contains time-stamped measurements of a metric and information about the source and meaning of those measurements.

Here are some details about what a time series contains:

  • The points array contains the time-stamped measurements.

    The following is an example of a points array with two values:

      "points": [
        {
          "interval": {
            "startTime": "2020-07-27T20:20:21.597143Z",
            "endTime": "2020-07-27T20:20:21.597143Z"
          },
          "value": {
            "doubleValue": 0.473005
          }
        },
        {
          "interval": {
            "startTime": "2020-07-27T20:19:21.597239Z",
            "endTime": "2020-07-27T20:19:21.597239Z"
          },
          "value": {
            "doubleValue": 0.473025
          }
        }
      ],

    To understand the meaning of a value, you need to refer to the other data included in the time series and to the definitions of that data.

  • The resource field describes the hardware or software component that is being monitored. In Cloud Monitoring, the hardware or software component is referred to as the monitored resource. Examples of monitored resources include Compute Engine instances and App Engine applications. For a list of monitored resources, see the Monitored resource list.

    The following is an example of a resource field:

      "resource": {
        "type": "gce_instance",
        "labels": {
          "instance_id": "2708613220420473591",
          "zone": "us-east1-b",
          "project_id": "sampleproject"
        }
      }

    • The type field lists the monitored resource as a gce_instance, which indicates that these measurements are taken on a Compute Engine VM instance.

    • The labels field contains key-value pairs that provide additional information about the monitored resource. For a gce_instance type, the labels identify the VM instance that is being monitored.

  • The metric field describes what is being measured.

    The following is an example of a metric field:

      "metric": {
        "labels": {
          "instance_name": "test"
        },
        "type": "compute.googleapis.com/instance/cpu/utilization"
      },

    • For Google Cloud services, the type field specifies the service and what is being monitored. In this example, the Compute Engine service is measuring the CPU utilization. When the type field begins with custom or external, the metric is either a custom metric or one defined by a third party.

    • The labels field contains key-value pairs that provide additional information about the measurement. These labels are defined as part of the MetricDescriptor, which is a data structure that defines the attributes of the measured data. The MetricDescriptor for the metric compute.googleapis.com/instance/cpu/utilization includes the label instance_name.

  • The metricKind field describes the relationship between adjacent measurements within a time series:

    • GAUGE metrics store the value of the thing being measured at a given moment in time, for example, an hourly temperature record.

    • CUMULATIVE metrics store the accumulated value of the thing being measured at a given moment in time, for example, an odometer in a vehicle.

    • DELTA metrics store the change in the value of the thing being measured over a specified period, for example, a stock summary that shows the stock's gains or losses.

  • The valueType field describes the data type for the measurement: INT64, DOUBLE, BOOL, STRING, or DISTRIBUTION.
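The fields described above can be read together with a short Python sketch. It assembles the points, resource, and metric fragments from this section into one time series and extracts the measurements in chronological order. The metricKind of GAUGE and the valueType of DOUBLE are assumptions that fit the sample data (identical start and end times, double values). The helper names are invented, and DISTRIBUTION values aren't handled.

```python
import json
from datetime import datetime

# Time series assembled from the fragments in this section.
# Assumption: metricKind and valueType are GAUGE and DOUBLE.
TIME_SERIES_JSON = """
{
  "metric": {
    "labels": {"instance_name": "test"},
    "type": "compute.googleapis.com/instance/cpu/utilization"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "instance_id": "2708613220420473591",
      "zone": "us-east1-b",
      "project_id": "sampleproject"
    }
  },
  "metricKind": "GAUGE",
  "valueType": "DOUBLE",
  "points": [
    {
      "interval": {
        "startTime": "2020-07-27T20:20:21.597143Z",
        "endTime": "2020-07-27T20:20:21.597143Z"
      },
      "value": {"doubleValue": 0.473005}
    },
    {
      "interval": {
        "startTime": "2020-07-27T20:19:21.597239Z",
        "endTime": "2020-07-27T20:19:21.597239Z"
      },
      "value": {"doubleValue": 0.473025}
    }
  ]
}
"""

# Which key of the "value" object holds the measurement for each valueType.
VALUE_KEYS = {
    "DOUBLE": "doubleValue",
    "INT64": "int64Value",
    "BOOL": "boolValue",
    "STRING": "stringValue",
}


def parse_timestamp(text):
    """Parse an RFC 3339 UTC timestamp such as 2020-07-27T20:20:21.597143Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def point_values(series):
    """Return (end time, value) pairs sorted from oldest to newest."""
    key = VALUE_KEYS[series["valueType"]]
    rows = []
    for point in series["points"]:
        value = point["value"][key]
        if series["valueType"] == "INT64":
            # JSON encodings often carry 64-bit integers as strings.
            value = int(value)
        rows.append((parse_timestamp(point["interval"]["endTime"]), value))
    return sorted(rows)


if __name__ == "__main__":
    series = json.loads(TIME_SERIES_JSON)
    print(series["metric"]["type"], "on", series["resource"]["type"])
    for end_time, value in point_values(series):
        print(end_time.isoformat(), value)
```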

Cloud Monitoring writes one time series for each combination of resource and metric label values. You can use these labels to group and to filter time series. For example, when a Google Cloud project contains multiple Compute Engine VM instances, the CPU utilization for each VM instance is a unique time series. Here are a few of the ways that you can display this data:

  • You can show the CPU utilization of every VM instance.
  • You can show the CPU utilization for a specific VM instance by filtering the time series for a single value of the instance_id label.
  • You can group the VM instances by the machine_type label, and then display the average CPU utilization. The following screenshot illustrates a chart with this configuration:

    [Screenshot: Average CPU utilization grouped by machine type.]
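The filtering and grouping options listed above can be sketched locally in Python. The example reduces each time series to a label dictionary and a single latest value, which is a simplification of the real data model; the instance IDs, machine types, and values are invented, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented, simplified time series: one latest value per VM instance.
SERIES = [
    {"labels": {"instance_id": "1001", "machine_type": "e2-medium"}, "value": 0.25},
    {"labels": {"instance_id": "1002", "machine_type": "e2-medium"}, "value": 0.75},
    {"labels": {"instance_id": "1003", "machine_type": "n1-standard-1"}, "value": 0.5},
]


def filter_by_label(series_list, label, wanted):
    """Keep only the series whose label has the wanted value."""
    return [s for s in series_list if s["labels"].get(label) == wanted]


def mean_by_label(series_list, label):
    """Group the series by a label and average the values in each group."""
    groups = defaultdict(list)
    for series in series_list:
        groups[series["labels"][label]].append(series["value"])
    return {group: mean(values) for group, values in groups.items()}


if __name__ == "__main__":
    print(filter_by_label(SERIES, "instance_id", "1002"))
    print(mean_by_label(SERIES, "machine_type"))
```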

Pricing

To learn about pricing for Cloud Monitoring, see the Google Cloud Observability pricing page.

What's next

  • For information about how to configure your Google Cloud project to view metrics for multiple Google Cloud projects and AWS accounts, see Metrics scopes overview.


Last updated 2025-12-15 UTC.