Configure metrics collection

This document describes how to configure Google Kubernetes Engine (GKE) to send metrics to Cloud Monitoring. Metrics in Cloud Monitoring can populate custom dashboards, generate alerts, create service-level objectives, or be fetched by third-party monitoring services using the Cloud Monitoring API.

GKE provides several sources of metrics:

  • System metrics: metrics from essential system components, describing low-level resources such as CPU, memory, and storage.
  • Google Cloud Managed Service for Prometheus: lets you monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.
  • Packages of observability metrics:

    • Control plane metrics: metrics exported from certain control plane components such as the API server and scheduler.
    • Kube state metrics: a curated set of metrics exported from the kube state service, used to monitor the state of Kubernetes objects like Pods, Deployments, and more. For the set of included metrics, see Use kube state metrics.

      The kube state package is a managed solution. If you need greater flexibility (for example, to collect additional metrics, manage scrape intervals, or scrape other resources), you can disable the package, if it is enabled, and deploy your own instance of the open source kube state metrics service. For more information, see the Google Cloud Managed Service for Prometheus exporter documentation for Kube state metrics.

    • cAdvisor/Kubelet: a curated set of cAdvisor and Kubelet metrics. For the set of included metrics, see Use cAdvisor/Kubelet metrics.

      The cAdvisor/Kubelet package is a managed solution. If you need greater flexibility (for example, to collect additional metrics, manage scrape intervals, or scrape other resources), you can disable the package, if it is enabled, and deploy your own instance of the open source cAdvisor/Kubelet metrics services.

    • NVIDIA Data Center GPU Manager (DCGM) metrics: metrics from DCGM that provide a comprehensive view of GPU health, performance, and utilization.

You can also configure automatic application monitoring for certain workloads.

System metrics

When a cluster is created, GKE by default collects certain metrics emitted by system components.

You can choose whether to send metrics from your GKE cluster to Cloud Monitoring. If you choose to send metrics to Cloud Monitoring, you must send system metrics.

All GKE system metrics are ingested into Cloud Monitoring with the prefix kubernetes.io.
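
As a rough illustration of the prefix rule, the following sketch classifies metric types by checking for the kubernetes.io prefix. The helper name is_system_metric and the two sample metric types are assumptions made for this example, as is the slash that follows the prefix.

```shell
# Hypothetical helper: succeeds if a metric type carries the
# kubernetes.io prefix used for GKE system metrics.
# Assumption: the prefix is followed by a slash.
is_system_metric() {
  case $1 in
    kubernetes.io/*) return 0 ;;
    *) return 1 ;;
  esac
}

# The metric types below are sample inputs for illustration.
for metric_type in \
  kubernetes.io/container/cpu/core_usage_time \
  prometheus.googleapis.com/kube_pod_status_ready/gauge
do
  if is_system_metric "$metric_type"; then
    echo "$metric_type: system metric"
  else
    echo "$metric_type: not a system metric"
  fi
done
```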

Pricing

Cloud Monitoring does not charge for the ingestion of GKE system metrics. For more information, see Cloud Monitoring pricing.

Configuring collection of system metrics

To enable system metric collection, pass the SYSTEM value to the --monitoring flag of the gcloud container clusters create or gcloud container clusters update commands.

To disable system metric collection, use the NONE value for the --monitoring flag. If system metric collection is disabled, basic information like CPU usage, memory usage, and disk usage is not available for a cluster when viewing observability metrics.
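
The two settings above can be sketched as follows. This snippet prints the update command instead of running it, so that you can review it first. The helper name print_monitoring_update, the cluster name example-cluster, and the location us-central1 are placeholders assumed for illustration; only the --monitoring flag and its SYSTEM and NONE values come from this document.

```shell
# Hypothetical helper (not part of gcloud): prints, but does not run,
# the update command for a given --monitoring value.
print_monitoring_update() {
  cluster_name=$1
  location=$2
  monitoring_value=$3
  printf 'gcloud container clusters update %s --location=%s --monitoring=%s\n' \
    "$cluster_name" "$location" "$monitoring_value"
}

# Enable system metric collection.
print_monitoring_update example-cluster us-central1 SYSTEM

# Disable metric collection (not supported for Autopilot clusters).
print_monitoring_update example-cluster us-central1 NONE
```

To apply a setting, copy the printed command into a terminal where the Google Cloud CLI is installed and authenticated.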

For GKE Autopilot clusters, you cannot disable the collection of system metrics.

Warning: If you disable Cloud Logging or Cloud Monitoring or apply exclusion filters, GKE customer support is offered on a best-effort basis and might require additional effort from your engineering team.

See Observability for GKE for more details about Cloud Monitoring integration with GKE.

To configure the collection of system metrics by using Terraform, see the monitoring_config block in the Terraform registry for google_container_cluster. For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.

List of system metrics

System metrics include metrics from essential system components important for Kubernetes. For a list of these metrics, see GKE system metrics.

If you enable Cloud Monitoring for your cluster, then you can't disable system monitoring (--monitoring=SYSTEM).

Troubleshooting system metrics

If system metrics are not available in Cloud Monitoring as expected, see Troubleshoot system metrics.

Package: Control plane metrics

You can configure a GKE cluster to send certain metrics emitted by the Kubernetes API server, Scheduler, and Controller Manager to Cloud Monitoring.

For more information, see Collect and view control plane metrics.

Package: Kube state metrics

You can configure a GKE cluster to send a curated set of kube state metrics in Prometheus format to Cloud Monitoring. This package of kube state metrics includes metrics for Pods, Deployments, StatefulSets, DaemonSets, HorizontalPodAutoscaler resources, Persistent Volumes, Persistent Volume Claims, and JobSets.

For more information, see Collect and view Kube state metrics.

Package: cAdvisor/Kubelet metrics

You can configure a GKE cluster to send a curated set of cAdvisor/Kubelet metrics in Prometheus format to Cloud Monitoring. The curated set of metrics is a subset of the large set of cAdvisor/Kubelet metrics built into every Kubernetes deployment by default. The curated cAdvisor/Kubelet package is designed to provide the most useful metrics, reducing ingestion volume and associated costs.

For more information, see Collect and view cAdvisor/Kubelet metrics.

Package: NVIDIA Data Center GPU Manager (DCGM) metrics

You can monitor GPU utilization, performance, and health by configuring GKE to send NVIDIA Data Center GPU Manager (DCGM) metrics to Cloud Monitoring.

For more information, see Collect and view NVIDIA Data Center GPU Manager (DCGM) metrics.

Disable metric packages

You can disable the use of metric packages in the cluster. You might want to disable certain packages to reduce costs or if you are using an alternate mechanism for collecting the metrics, like Google Cloud Managed Service for Prometheus and an exporter.

Console

To disable the collection of metrics from the Details tab for the cluster, do the following:

  1. In the Google Cloud console, go to the Kubernetes clusters page:

    Go to Kubernetes clusters

    If you use the search bar to find this page, then select the result whose subheading is Kubernetes Engine.

  2. Click your cluster's name.

  3. In the Features row labeled Cloud Monitoring, click the Edit icon.

  4. In the Components drop-down menu, clear the metric components that you want to disable.

  5. Click OK.

  6. Click Save Changes.

gcloud

  1. Open a terminal window with the Google Cloud SDK and the Google Cloud CLI installed. One way to do this is to use Cloud Shell.

  2. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  3. Call the gcloud container clusters update command and pass an updated set of values to the --monitoring flag. The set of values supplied to the --monitoring flag overrides any previous setting.

    For example, to turn off the collection of all metrics except system metrics, run the following command:

    gcloud container clusters update CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --enable-managed-prometheus \
        --monitoring=SYSTEM

    This command disables the collection of any previously configured metric packages.
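
Because the --monitoring flag overrides any previous setting, disabling a single package means passing every value that you want to keep. The following is a minimal sketch of that bookkeeping. The helper name drop_component and the starting list of values are assumptions for illustration, and the final command is printed for review rather than run.

```shell
# Hypothetical helper: removes one value from a comma-separated
# --monitoring list and prints the remaining values.
drop_component() (
  current=$1
  unwanted=$2
  result=
  IFS=,
  for component in $current; do
    [ "$component" = "$unwanted" ] && continue
    result=${result:+$result,}$component
  done
  printf '%s\n' "$result"
)

# Assumed current setting, for illustration only.
current="SYSTEM,API_SERVER,SCHEDULER,POD"
updated=$(drop_component "$current" SCHEDULER)

# Print the command for review rather than running it.
echo "gcloud container clusters update CLUSTER_NAME --location=COMPUTE_LOCATION --monitoring=$updated"
```

In this sketch, the printed --monitoring value keeps SYSTEM, API_SERVER, and POD, and omits SCHEDULER.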

Terraform

To configure the collection of metrics by using Terraform, see the monitoring_config block in the Terraform registry for google_container_cluster. For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.

Understanding your Monitoring bill

You can use Cloud Monitoring to identify the control plane or kube state metrics that are writing the largest numbers of samples. These metrics are contributing the most to your costs. After you identify the most expensive metrics, you can modify your scrape configs to filter these metrics appropriately.

The Cloud Monitoring Metrics Management page provides information that can help you control the amount you spend on billable metrics without affecting observability. The Metrics Management page reports the following information:

  • Ingestion volumes for both byte- and sample-based billing, across metric domains and for individual metrics.
  • Data about labels and cardinality of metrics.
  • Number of reads for each metric.
  • Use of metrics in alerting policies and custom dashboards.
  • Rate of metric-write errors.

You can also use the Metrics Management page to exclude unneeded metrics, eliminating the cost of ingesting them.

To view the Metrics Management page, do the following:

  1. In the Google Cloud console, go to the Metrics management page:

    Go to Metrics management

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. In the toolbar, select your time window. By default, the Metrics Management page displays information about the metrics collected in the previous day.

For more information about the Metrics Management page, see View and manage metric usage.

To identify which control plane or kube state metrics have the largest number of samples being ingested, do the following:

  1. In the Google Cloud console, go to the Metrics management page:

    Go to Metrics management

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. On the Billable samples ingested scorecard, click View charts.

  3. Locate the Namespace Volume Ingestion chart, and then click More chart options.

  4. In the Metric field, verify that the following resource and metric are selected:
    Metric Ingestion Attribution and Samples written by attribution id.

  5. In the Filters page, do the following:

    1. In the Label field, verify that the value is attribution_dimension.

    2. In the Comparison field, verify that the value is = (equals).

    3. In the Value field, select cluster.

  6. Clear the Group by setting.

  7. Optionally, filter for only certain metrics. For example, control plane API server metrics all include "apiserver" as part of the metric name, and kube state Pod metrics all include "kube_pod" as part of the metric name, so you can filter for metrics containing those strings:

    • Click Add Filter.

    • In the Label field, select metric_type.

    • In the Comparison field, select =~ (equals regex).

    • In the Value field, enter .*apiserver.* or .*kube_pod.*.

  8. Optionally, group the number of samples ingested by GKE region or project:

    • Click Group by.

    • Ensure metric_type is selected.

    • To group by GKE region, select location.

    • To group by project, select project_id.

    • Click OK.

  9. Optionally, group the number of samples ingested by GKE cluster name:

    • Click Group by.

    • To group by GKE cluster name, ensure both attribution_dimension and attribution_id are selected.

    • Click OK.

  10. To see the ingestion volume for each of the metrics, in the toggle labeled Chart Table Both, select Both. The table shows the ingested volume for each metric in the Value column.

    Click the Value column header twice to sort the metrics by descending ingestion volume.

These steps show the metrics with the highest rate of samples ingested into Cloud Monitoring. Because the metrics in the observability packages are charged by the number of samples ingested, pay attention to metrics with the greatest rate of samples being ingested.
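
To get a feel for what the optional regex filter in step 7 selects, the following sketch applies the same two patterns to a short list of metric types with grep -E. The sample metric types and the helper name filter_metrics are assumptions for illustration. The console's regex engine might differ from grep in other respects, although patterns wrapped in .* on both sides should match the same strings either way.

```shell
# Sample metric types, assumed for illustration.
sample_metrics='prometheus.googleapis.com/apiserver_request_total/counter
prometheus.googleapis.com/kube_pod_status_ready/gauge
prometheus.googleapis.com/scheduler_pending_pods/gauge'

# Hypothetical helper: prints the sample metric types that match
# the given extended regular expression.
filter_metrics() {
  printf '%s\n' "$sample_metrics" | grep -E "$1"
}

# Same patterns as the Value field in the optional filter step.
filter_metrics '.*apiserver.*'
filter_metrics '.*kube_pod.*'
```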

Other metrics

In addition to the system metrics and metric packages described in this document, Istio metrics are also available for GKE clusters. For pricing information, see Cloud Monitoring pricing.

Available metrics

The following table indicates supported values for the --monitoring flag for the create and update commands.

Source | --monitoring value | Metrics collected
None | NONE | No metrics sent to Cloud Monitoring; no metric collection agent installed in the cluster. This value isn't supported for Autopilot clusters.
System | SYSTEM | Metrics from essential system components required for Kubernetes. For a complete list of the metrics, see Kubernetes metrics.
API server | API_SERVER | Metrics from kube-apiserver. For a complete list of the metrics, see API server metrics.
Scheduler | SCHEDULER | Metrics from kube-scheduler. For a complete list of the metrics, see Scheduler metrics.
Controller Manager | CONTROLLER_MANAGER | Metrics from kube-controller-manager. For a complete list of the metrics, see Controller Manager metrics.
Persistent volume (Storage) | STORAGE | Storage metrics from kube-state-metrics. Includes metrics for Persistent Volumes and Persistent Volume Claims. For a complete list of the metrics, see Storage metrics.
Pod | POD | Pod metrics from kube-state-metrics. For a complete list of the metrics, see Pod metrics.
Deployment | DEPLOYMENT | Deployment metrics from kube-state-metrics. For a complete list of the metrics, see Deployment metrics.
StatefulSet | STATEFULSET | StatefulSet metrics from kube-state-metrics. For a complete list of the metrics, see StatefulSet metrics.
DaemonSet | DAEMONSET | DaemonSet metrics from kube-state-metrics. For a complete list of the metrics, see DaemonSet metrics.
HorizontalPodAutoscaler | HPA | HPA metrics from kube-state-metrics. For a complete list of the metrics, see HorizontalPodAutoscaler metrics.
cAdvisor | CADVISOR | cAdvisor metrics from the cAdvisor/Kubelet metrics package. For a complete list of the metrics, see cAdvisor metrics.
Kubelet | KUBELET | Kubelet metrics from the cAdvisor/Kubelet metrics package. For a complete list of the metrics, see Kubelet metrics.
NVIDIA Data Center GPU Manager (DCGM) metrics | DCGM | Metrics from NVIDIA Data Center GPU Manager (DCGM).
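
Before passing a comma-separated list to the --monitoring flag, it can help to check each value against the table above. The following is a minimal sketch of such a check. The helper name validate_monitoring_value is an assumption for illustration, and the check covers spelling only, not whether a combination of values is meaningful for your cluster.

```shell
# Values taken from the table above.
supported="NONE SYSTEM API_SERVER SCHEDULER CONTROLLER_MANAGER STORAGE POD DEPLOYMENT STATEFULSET DAEMONSET HPA CADVISOR KUBELET DCGM"

# Hypothetical helper: reports any unsupported value on stderr and
# returns non-zero if one is found.
validate_monitoring_value() (
  status=0
  IFS=,
  for component in $1; do
    case " $supported " in
      *" $component "*) ;;
      *) echo "unsupported value: $component" >&2; status=1 ;;
    esac
  done
  exit "$status"
)

# A list of supported values passes the check.
validate_monitoring_value "SYSTEM,API_SERVER,POD" && echo "ok"

# A misspelled value (PODS instead of POD) is rejected.
validate_monitoring_value "SYSTEM,PODS" || echo "rejected"
```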

You can also collect Prometheus-style metrics exposed by any GKE workload by using Google Cloud Managed Service for Prometheus, which lets you monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-10-24 UTC.