Configure metrics collection
This document describes how to configure Google Kubernetes Engine (GKE) to send metrics to Cloud Monitoring. Metrics in Cloud Monitoring can populate custom dashboards, generate alerts, create service-level objectives, or be fetched by third-party monitoring services using the Cloud Monitoring API.
GKE provides several sources of metrics:
- System metrics: metrics from essential system components, describing low-level resources such as CPU, memory, and storage.
- Google Cloud Managed Service for Prometheus: lets you monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.
- Packages of observability metrics:
  - Control plane metrics: metrics exported from certain control plane components such as the API server and scheduler.
  - Kube state metrics: a curated set of metrics exported from the kube state service, used to monitor the state of Kubernetes objects like Pods, Deployments, and more. For the set of included metrics, see Use kube state metrics.
    The kube state package is a managed solution. If you need greater flexibility (for example, to collect additional metrics, manage scrape intervals, or scrape other resources), you can disable the package, if it is enabled, and deploy your own instance of the open source kube state metrics service. For more information, see the Google Cloud Managed Service for Prometheus exporter documentation for Kube state metrics.
  - cAdvisor/Kubelet: a curated set of cAdvisor and Kubelet metrics. For the set of included metrics, see Use cAdvisor/Kubelet metrics.
    The cAdvisor/Kubelet package is a managed solution. If you need greater flexibility (for example, to collect additional metrics, manage scrape intervals, or scrape other resources), you can disable the package, if it is enabled, and deploy your own instance of the open source cAdvisor/Kubelet metrics services.
  - NVIDIA Data Center GPU Manager (DCGM) metrics: metrics from DCGM that provide a comprehensive view of GPU health, performance, and utilization.
You can also configure automatic application monitoring for certain workloads.
System metrics
When a cluster is created, GKE by default collects certain metrics emitted by system components.
You have a choice whether or not to send metrics from your GKE cluster to Cloud Monitoring. If you choose to send metrics to Cloud Monitoring, you must send system metrics.
All GKE system metrics are ingested into Cloud Monitoring with the prefix kubernetes.io.
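Because system metrics share the kubernetes.io prefix, a Cloud Monitoring filter on that prefix can select them. The following is a minimal sketch, not a definitive recipe: `prefix_filter` is a hypothetical helper, `PROJECT_ID` is a placeholder, and the commented-out request assumes an authenticated Google Cloud CLI and `curl`.

```shell
# Sketch: build a Cloud Monitoring filter that selects metric types by prefix.
# prefix_filter is a hypothetical helper name; PROJECT_ID is a placeholder.
prefix_filter() {
  printf 'metric.type = starts_with("%s")\n' "$1"
}

# Print the filter for GKE system metrics.
prefix_filter "kubernetes.io/"

# To list the matching metric descriptors, uncomment the following lines
# (assumes an authenticated gcloud CLI and curl are available):
# curl -s -G \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   --data-urlencode "filter=$(prefix_filter 'kubernetes.io/')" \
#   "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors"
```

The request is left commented out so that the sketch has no side effects until you have set the project and credentials.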
Pricing
Cloud Monitoring does not charge for the ingestion of GKE system metrics. For more information, see Cloud Monitoring pricing.
Configuring collection of system metrics
To enable system metric collection, pass the SYSTEM value to the --monitoring flag of the gcloud container clusters create or gcloud container clusters update commands.
To disable system metric collection, use the NONE value for the --monitoring flag. If system metric collection is disabled, basic information like CPU usage, memory usage, and disk usage is not available for a cluster when viewing observability metrics.
For GKE Autopilot clusters, you cannot disable the collection of system metrics.
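As a hedged illustration of the two settings described above, the following sketch maps a desired state to the --monitoring value and prints the resulting update command for review instead of running it. The helper names `monitoring_value` and `print_update_command` are hypothetical, and `CLUSTER_NAME` and `COMPUTE_LOCATION` are placeholders.

```shell
# Sketch: print the gcloud command that turns system metric collection on or off.
# Helper names are hypothetical; CLUSTER_NAME and COMPUTE_LOCATION are placeholders.

# Map a desired state ("on" or "off") to a --monitoring value.
monitoring_value() {
  case "$1" in
    on)  echo "SYSTEM" ;;
    off) echo "NONE" ;;   # NONE is not supported for Autopilot clusters
    *)   echo "usage: monitoring_value on|off" >&2; return 1 ;;
  esac
}

# Print (do not run) the update command so that it can be reviewed first.
print_update_command() {
  _value="$(monitoring_value "$1")" || return 1
  echo "gcloud container clusters update CLUSTER_NAME --location=COMPUTE_LOCATION --monitoring=${_value}"
}

print_update_command on
print_update_command off
```

After you replace the placeholders, you can run the printed command directly.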
Warning: If you disable Cloud Logging or Cloud Monitoring or apply exclusion filters, GKE customer support is offered on a best-effort basis and might require additional effort from your engineering team. See Observability for GKE for more details about Cloud Monitoring integration with GKE.
To configure the collection of system metrics by using Terraform, see the monitoring_config block in the Terraform registry for google_container_cluster. For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.
List of system metrics
System metrics include metrics from essential system components important for Kubernetes. For a list of these metrics, see GKE system metrics.
If you enable Cloud Monitoring for your cluster, then you can't disable system monitoring (--monitoring=SYSTEM).
Troubleshooting system metrics
If system metrics are not available in Cloud Monitoring as expected, see Troubleshoot system metrics.
Package: Control plane metrics
You can configure a GKE cluster to send certain metrics emitted by the Kubernetes API server, Scheduler, and Controller Manager to Cloud Monitoring.
For more information, see Collect and view control plane metrics.
Package: Kube state metrics
You can configure a GKE cluster to send a curated set of kube state metrics in Prometheus format to Cloud Monitoring. This package of kube state metrics includes metrics for Pods, Deployments, StatefulSets, DaemonSets, HorizontalPodAutoscaler resources, Persistent Volumes, Persistent Volume Claims, and JobSets.
For more information, see Collect and view Kube state metrics.
Package: cAdvisor/Kubelet metrics
You can configure a GKE cluster to send a curated set of cAdvisor/Kubelet metrics in Prometheus format to Cloud Monitoring. The curated set of metrics is a subset of the large set of cAdvisor/Kubelet metrics built into every Kubernetes deployment by default. The curated cAdvisor/Kubelet package is designed to provide the most useful metrics, reducing ingestion volume and associated costs.
For more information, see Collect and view cAdvisor/Kubelet metrics.
Package: NVIDIA Data Center GPU Manager (DCGM) metrics
You can monitor GPU utilization, performance, and health by configuring GKE to send NVIDIA Data Center GPU Manager (DCGM) metrics to Cloud Monitoring.
For more information, see Collect and view NVIDIA Data Center GPU Manager (DCGM) metrics.
Disable metric packages
You can disable the use of metric packages in the cluster. You might want to disable certain packages to reduce costs or if you are using an alternate mechanism for collecting the metrics, like Google Cloud Managed Service for Prometheus and an exporter.
Console
To disable the collection of metrics from the Details tab for the cluster, do the following:
In the Google Cloud console, go to the Kubernetes clusters page.
If you use the search bar to find this page, then select the result whose subheading is Kubernetes Engine.
Click your cluster's name.
In the Features row labelled Cloud Monitoring, click the Edit icon.
In the Components drop-down menu, clear the metric components that you want to disable.
Click OK.
Click Save Changes.
gcloud
Open a terminal window with Google Cloud SDK and the Google Cloud CLI installed. One way to do this is to use Cloud Shell.
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
Call the gcloud container clusters update command and pass an updated set of values to the --monitoring flag. The set of values supplied to the --monitoring flag overrides any previous setting. For example, to turn off the collection of all metrics except system metrics, run the following command:

    gcloud container clusters update CLUSTER_NAME \
        --location=COMPUTE_LOCATION \
        --enable-managed-prometheus \
        --monitoring=SYSTEM

This command disables the collection of any previously configured metric packages.
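Because the values passed to --monitoring override any previous setting, the flag has to list every source that you want to keep. The following sketch assembles such a list; `join_csv` is a hypothetical helper, the choice of SYSTEM, API_SERVER, and POD is only an example, and the command is printed rather than executed.

```shell
# Sketch: build a --monitoring value that keeps system metrics plus chosen packages.
# join_csv is a hypothetical helper; the selected values are illustrative.

# Join all arguments with commas, for example: join_csv SYSTEM POD -> SYSTEM,POD
# The function body runs in a subshell so that the IFS change does not leak.
join_csv() (
  IFS=','
  echo "$*"
)

# Every package that is not listed here would be disabled by the update.
monitoring_values="$(join_csv SYSTEM API_SERVER POD)"

# Print the command for review; CLUSTER_NAME and COMPUTE_LOCATION are placeholders.
echo "gcloud container clusters update CLUSTER_NAME --location=COMPUTE_LOCATION --monitoring=${monitoring_values}"
```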
Terraform
To configure the collection of metrics by using Terraform, see the monitoring_config block in the Terraform registry for google_container_cluster. For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.
Understanding your Monitoring bill
You can use Cloud Monitoring to identify the control plane or kube state metrics that are writing the largest numbers of samples. These metrics are contributing the most to your costs. After you identify the most expensive metrics, you can modify your scrape configs to filter these metrics appropriately.
The Cloud Monitoring Metrics Management page provides information that can help you control the amount you spend on billable metrics without affecting observability. The Metrics Management page reports the following information:
- Ingestion volumes for both byte- and sample-based billing, across metric domains and for individual metrics.
- Data about labels and cardinality of metrics.
- Number of reads for each metric.
- Use of metrics in alerting policies and custom dashboards.
- Rate of metric-write errors.
You can also use the Metrics Management page to exclude unneeded metrics, eliminating the cost of ingesting them.
To view the Metrics Management page, do the following:
In the Google Cloud console, go to the Metrics management page.
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the toolbar, select your time window. By default, the Metrics Management page displays information about the metrics collected in the previous day.
For more information about the Metrics Management page, see View and manage metric usage.
To identify which control plane or kube state metrics have the largest number of samples being ingested, do the following:
In the Google Cloud console, go to the Metrics management page.
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
On the Billable samples ingested scorecard, click View charts.
Locate the Namespace Volume Ingestion chart, and then click More chart options.
In the Metric field, verify that the following resource and metric are selected: Metric Ingestion Attribution and Samples written by attribution id.
In the Filters page, do the following:
- In the Label field, verify that the value is attribution_dimension.
- In the Comparison field, verify that the value is = (equals).
- In the Value field, select cluster.
Clear the Group by setting.
Optionally, filter for only certain metrics. For example, control plane API server metrics all include "apiserver" as part of the metric name, and kube state Pod metrics all include "kube_pod" as part of the metric name, so you can filter for metrics containing those strings:
- Click Add Filter.
- In the Label field, select metric_type.
- In the Comparison field, select =~ (equals regex).
- In the Value field, enter .*apiserver.* or .*kube_pod.*.
Optionally, group the number of samples ingested by GKE region or project:
- Click Group by.
- Ensure metric_type is selected.
- To group by GKE region, select location.
- To group by project, select project_id.
- Click OK.
Optionally, group the number of samples ingested by GKE cluster name:
- Click Group by.
- To group by GKE cluster name, ensure both attribution_dimension and attribution_id are selected.
- Click OK.
To see the ingestion volume for each of the metrics, in the toggle labeled Chart Table Both, select Both. The table shows the ingested volume for each metric in the Value column.
Click the Value column header twice to sort the metrics by descending ingestion volume.
These steps show the metrics with the highest rate of samples ingested into Cloud Monitoring. Because the metrics in the observability packages are charged by the number of samples ingested, pay attention to metrics with the greatest rate of samples being ingested.
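To see what the optional regex filter in the preceding steps selects, you can try the same patterns locally. The following sketch uses a small, hypothetical list of metric types for illustration only; `sample_metric_types` and `filter_metric_types` are hypothetical helpers. It assumes that the comparison must match the whole value, which is why `grep` is run with `-x` and why the patterns have `.*` on both sides.

```shell
# Sketch: try the regex patterns from the optional filter step on sample data.
# The metric types below are an illustrative sample, not output from a real cluster.
sample_metric_types() {
  cat <<'EOF'
prometheus.googleapis.com/apiserver_request_total/counter
prometheus.googleapis.com/kube_pod_status_ready/gauge
prometheus.googleapis.com/scheduler_pending_pods/gauge
EOF
}

# Keep only the lines that the extended regex in $1 matches in full (-x).
filter_metric_types() {
  grep -E -x "$1"
}

# API server metrics contain "apiserver"; kube state Pod metrics contain "kube_pod".
sample_metric_types | filter_metric_types '.*apiserver.*'
sample_metric_types | filter_metric_types '.*kube_pod.*'
```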
Other metrics
In addition to the system metrics and metric packages described in this document, Istio metrics are also available for GKE clusters. For pricing information, see Cloud Monitoring pricing.
Available metrics
The following table indicates supported values for the --monitoring flag for the create and update commands.
| Source | --monitoring value | Metrics Collected |
|---|---|---|
| None | NONE | No metrics sent to Cloud Monitoring; no metric collection agent installed in the cluster. This value isn't supported for Autopilot clusters. |
| System | SYSTEM | Metrics from essential system components required for Kubernetes. For a complete list of the metrics, see Kubernetes metrics. |
| API server | API_SERVER | Metrics from kube-apiserver. For a complete list of the metrics, see API server metrics. |
| Scheduler | SCHEDULER | Metrics from kube-scheduler. For a complete list of the metrics, see Scheduler metrics. |
| Controller Manager | CONTROLLER_MANAGER | Metrics from kube-controller-manager. For a complete list of the metrics, see Controller Manager metrics. |
| Persistent volume (Storage) | STORAGE | Storage metrics from kube-state-metrics. Includes metrics for Persistent Volume and Persistent Volume Claims. For a complete list of the metrics, see Storage metrics. |
| Pod | POD | Pod metrics from kube-state-metrics. For a complete list of the metrics, see Pod metrics. |
| Deployment | DEPLOYMENT | Deployment metrics from kube-state-metrics. For a complete list of the metrics, see Deployment metrics. |
| StatefulSet | STATEFULSET | StatefulSet metrics from kube-state-metrics. For a complete list of the metrics, see StatefulSet metrics. |
| DaemonSet | DAEMONSET | DaemonSet metrics from kube-state-metrics. For a complete list of the metrics, see DaemonSet metrics. |
| HorizontalPodAutoscaler | HPA | HPA metrics from kube-state-metrics. For a complete list of the metrics, see HorizontalPodAutoscaler metrics. |
| cAdvisor | CADVISOR | cAdvisor metrics from the cAdvisor/Kubelet metrics package. For a complete list of the metrics, see cAdvisor metrics. |
| Kubelet | KUBELET | Kubelet metrics from the cAdvisor/Kubelet metrics package. For a complete list of the metrics, see Kubelet metrics. |
| NVIDIA Data Center GPU Manager (DCGM) metrics | DCGM | Metrics from NVIDIA Data Center GPU Manager (DCGM). |
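The values in the preceding table can be checked before they are passed to the --monitoring flag. The following is a minimal sketch under stated assumptions: `validate_monitoring` is a hypothetical helper, the list of supported values is copied from the table, and the only rule enforced beyond membership is the one stated in this document, that SYSTEM is required whenever metrics are sent to Cloud Monitoring.

```shell
# Sketch: validate a comma-separated --monitoring value against the table above.
# validate_monitoring is a hypothetical helper, not part of the gcloud CLI.
SUPPORTED="NONE SYSTEM API_SERVER SCHEDULER CONTROLLER_MANAGER STORAGE POD DEPLOYMENT STATEFULSET DAEMONSET HPA CADVISOR KUBELET DCGM"

validate_monitoring() {
  # Reject an empty value.
  [ -n "$1" ] || { echo "no value given" >&2; return 1; }

  # Every component must appear in the table.
  for component in $(echo "$1" | tr ',' ' '); do
    case " $SUPPORTED " in
      *" $component "*) ;;
      *) echo "unsupported value: $component" >&2; return 1 ;;
    esac
  done

  # Either send nothing (NONE) or include SYSTEM.
  case ",$1," in
    ,NONE,) return 0 ;;
    *,SYSTEM,*) return 0 ;;
    *) echo "SYSTEM is required when sending metrics" >&2; return 1 ;;
  esac
}

validate_monitoring "SYSTEM,POD,DEPLOYMENT" && echo "--monitoring=SYSTEM,POD,DEPLOYMENT"
```

A check like this catches typos such as PODS before the update command is run. It does not verify cluster-specific constraints, such as NONE being unsupported on Autopilot clusters.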
You can also collect Prometheus-style metrics exposed by any GKE workload by using Google Cloud Managed Service for Prometheus, which lets you monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.
What's next
- Learn how to troubleshoot system metrics.
- Learn how to collect and view kube state metrics.
- Learn how to view observability metrics.
- Learn how to collect and view control plane metrics.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-10-24 UTC.