Troubleshoot monitoring dashboards

When you can't see your Google Kubernetes Engine (GKE) monitoring dashboards in Cloud Monitoring, or they appear to be missing data, it can obstruct your ability to observe and respond to issues in your clusters and workloads.

Use this document to diagnose and resolve issues with GKE monitoring dashboards. Find guidance on verifying whether Cloud Monitoring is enabled, checking Google Cloud console and account settings, and troubleshooting logs and alerting policies for GKE resources.

This information is important for Platform admins and operators and Application developers who use Cloud Monitoring dashboards to understand the health and performance of their GKE clusters and applications. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

For more information about how to use these dashboards to troubleshoot your clusters and workloads, see Assess cluster and workload health in the Google Cloud console.

GKE dashboards are not listed in Cloud Monitoring

By default, Monitoring is enabled when you create a cluster. If you don't see GKE dashboards when you view the provided Google Cloud dashboards in Monitoring, then Monitoring is not enabled for clusters in the selected Google Cloud project. Enable monitoring to view these dashboards.

Note: For GKE Autopilot clusters, you cannot disable the Cloud Monitoring and Cloud Logging integration.
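To confirm from the command line whether system metrics are being collected for a cluster, you can inspect its monitoring configuration. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The values my-cluster and us-central1 are placeholders, and the two function names are hypothetical helpers, not part of gcloud.

```shell
# Placeholder values (assumptions): replace with your cluster's name and location.
CLUSTER_NAME="my-cluster"
LOCATION="us-central1"

# Print the monitoring components that are enabled on the cluster.
# An empty result suggests that Monitoring is not enabled for it.
show_monitoring_components() {
  gcloud container clusters describe "$CLUSTER_NAME" \
    --location="$LOCATION" \
    --format="value(monitoringConfig.componentConfig.enableComponents)"
}

# Enable collection of system metrics on the cluster.
enable_system_monitoring() {
  gcloud container clusters update "$CLUSTER_NAME" \
    --location="$LOCATION" \
    --monitoring=SYSTEM
}
```

Call show_monitoring_components first, and call enable_system_monitoring only if the result is empty.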

No Kubernetes resources are in my dashboard

If you don't see any Kubernetes resources in your GKE dashboard, then check the following:

Selected Google Cloud project

Verify that you have selected the correct Google Cloud project from the drop-down list in the Google Cloud console menu bar. You must select the project whose data you want to see.

Cluster activity

If you just created your cluster, wait a few minutes for it to populate with data. See Configuring logging and monitoring for GKE for details.

Time range

The selected time range might be too narrow. You can use the Time menu in the dashboard toolbar to select other time ranges or define a Custom range.

Permissions to view the dashboard

If you see either of the following permission-denied error messages when viewing a service's deployment details or a Google Cloud project's metrics, you need to update your Identity and Access Management role to include roles/monitoring.viewer or roles/viewer:

  • You do not have sufficient permissions to view this page
  • You don't have permissions to perform the action on the selected resources

For more details, see Predefined roles.
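A project administrator can grant the viewer role from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and that the caller is allowed to change the project's IAM policy. The project ID and member are placeholder values, and grant_monitoring_viewer is a hypothetical helper name.

```shell
# Placeholder values (assumptions): replace with your project and account.
PROJECT_ID="my-project"
MEMBER="user:alex@example.com"

# Grant the Monitoring Viewer role on the project to the member.
grant_monitoring_viewer() {
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="$MEMBER" \
    --role="roles/monitoring.viewer"
}
```

If broader read access is intended, substitute roles/viewer for roles/monitoring.viewer.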

Cluster and node service account permissions to write data to Monitoring and Logging

If you see high error rates in the Enabled APIs and services page in the Google Cloud console, then your service account might be missing the following roles:

  • roles/logging.logWriter: In the Google Cloud console, this role is named Logs Writer. For more information on Logging roles, see the Logging access control guide.

  • roles/monitoring.metricWriter: In the Google Cloud console, this role is named Monitoring Metric Writer. For more information on Monitoring roles, see the Monitoring access control guide.

  • roles/stackdriver.resourceMetadata.writer: In the Google Cloud console, this role is named Stackdriver Resource Metadata Writer. This role permits write-only access to resource metadata, and it provides exactly the permissions needed by agents to send metadata. For more information on Monitoring roles, see the Monitoring access control guide.

To list your service accounts, in the Google Cloud console go to IAM and Admin, and then select Service Accounts.
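To compare the roles granted to a service account against the three roles listed earlier, a small filter can help. The following is a minimal sketch: missing_roles is a hypothetical helper that reads role names from standard input, one per line, and prints any of the three roles that are absent. The gcloud pipeline in the comment is one way to produce that input, and PROJECT_ID and SA_EMAIL are placeholders.

```shell
# The three roles that the node service account needs to write data.
REQUIRED_ROLES="roles/logging.logWriter roles/monitoring.metricWriter roles/stackdriver.resourceMetadata.writer"

# Read granted role names from stdin (one per line) and print each
# required role that does not appear in the input.
missing_roles() {
  granted="$(cat)"
  for role in $REQUIRED_ROLES; do
    printf '%s\n' "$granted" | grep -qxF "$role" || printf '%s\n' "$role"
  done
}

# Example usage (assumes gcloud is installed; PROJECT_ID and SA_EMAIL are placeholders):
#   gcloud projects get-iam-policy PROJECT_ID \
#     --flatten="bindings[].members" \
#     --filter="bindings.members:serviceAccount:SA_EMAIL" \
#     --format="value(bindings.role)" | missing_roles
```

Treat the output as a hint rather than a verdict: the check compares role names only, so a broader role that already includes the needed permissions is still reported as missing.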

Can't view logs

If you don't see your logs in dashboards, check the following:

Agent is running and healthy

GKE versions 1.17 and later use Fluent Bit to capture logs. Fluent Bit is the Logging agent that runs on Kubernetes nodes. To check if the agent is running correctly, perform the following steps:

  1. Check whether the agent is restarting by running the following command:

    kubectl get pods -l k8s-app=fluentbit-gke -n kube-system

    If there are no restarts, the output is similar to the following:

    NAME                  READY   STATUS    RESTARTS   AGE
    fluentbit-gke-6zr6g   2/2     Running   0          44d
    fluentbit-gke-dzh9l   2/2     Running   0          44d
  2. Check Pod status conditions by running the following command:

    JSONPATH='{range .items[*]};{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status},{end}{end};' \
      && kubectl get pods -l k8s-app=fluentbit-gke -n kube-system -o jsonpath="$JSONPATH" | tr ";" "\n"

    If the deployment is healthy, the output is similar to the following:

    fluentbit-gke-nj4qs:Initialized=True,Ready=True,ContainersReady=True,PodScheduled=True,
    fluentbit-gke-xtcvt:Initialized=True,Ready=True,ContainersReady=True,PodScheduled=True,
  3. Check the DaemonSet status, which can help determine whether the deployment is healthy, by running the following command:

    kubectl get daemonset -l k8s-app=fluentbit-gke -n kube-system

    If the deployment is healthy, the output is similar to the following:

    NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
    fluentbit-gke   2         2         2       2            2           kubernetes.io/os=linux   5d19h

    In this example output, the desired state matches the current state.

If the agent is running and healthy in these scenarios, and you still don't see all of your logs, the agent might be overloaded and dropping logs.
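On clusters with many nodes, scanning the RESTARTS column by eye is error-prone. The following is a minimal sketch of a filter that reads the table printed by kubectl get pods and lists only the Pods whose restart count is greater than zero. restarting_pods is a hypothetical helper name, and the sketch assumes the default column layout, in which RESTARTS is the fourth column.

```shell
# Read `kubectl get pods` table output from stdin, skip the header row,
# and print the name and restart count of each Pod that has restarted.
restarting_pods() {
  awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods -l k8s-app=fluentbit-gke -n kube-system | restarting_pods
```

No output means that none of the listed agent Pods has restarted.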

Agent overloaded and dropping logs

One possible reason you're not seeing all of your logs is that the node's log volume is overloading the agent. The default Logging agent configuration in GKE is tuned for a rate of 100 KiB per second for each node, and the agent might start dropping logs if the volume exceeds that limit.

To detect if you might be hitting this limit, look for any of the following indicators:

  • View the kubernetes.io/container/cpu/core_usage_time metric with the filter container_name=fluentbit-gke to see if the CPU usage of the Logging agent is near or at 100%.

  • View the logging.googleapis.com/byte_count metric grouped by metadata.system_labels.node_name to see if any node reaches 100 KiB per second.

If you see any of these conditions, you can reduce the log volume on each node by adding more nodes to the cluster. If all of the log volume comes from a single Pod, then you need to reduce the volume from that Pod.
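The byte_count metric reports bytes over an interval, so comparing it with the 100 KiB per second figure takes a small conversion. The following is a minimal sketch of that arithmetic: log_rate_status is a hypothetical helper, and the byte total and window length passed to it are example inputs that you would read from the chart for one node.

```shell
# 100 KiB per second, expressed in bytes per second.
LIMIT_BYTES_PER_SEC=$((100 * 1024))

# Usage: log_rate_status TOTAL_BYTES WINDOW_SECONDS
# Converts a byte total over a window to bytes per second (integer division)
# and reports how the rate compares with the limit.
log_rate_status() {
  bytes="$1"
  seconds="$2"
  rate=$(( bytes / seconds ))
  if [ "$rate" -ge "$LIMIT_BYTES_PER_SEC" ]; then
    echo "$rate B/s: at or above the 100 KiB/s tuning limit"
  else
    echo "$rate B/s: below the 100 KiB/s tuning limit"
  fi
}
```

For example, log_rate_status 9000000 60 reports 150000 B/s, which is above the limit of 102400 B/s, while log_rate_status 3000000 60 reports 50000 B/s, which is below it.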

For more information on investigating and resolving GKE logging-related issues, see Troubleshooting logging in GKE.

Incident isn't matched to a GKE resource

If you have an alerting policy condition that aggregates metrics across distinct GKE resources, you might need to edit the policy's condition to include more GKE hierarchy labels to associate incidents with specific entities.

For example, you might have two GKE clusters, one for production and one for staging, each with its own copy of the service lilbuddy-2. When the alerting policy condition aggregates a metric across containers in both clusters, the GKE Monitoring dashboard isn't able to associate this incident uniquely with the production service or the staging service.

To resolve this situation, target the alerting policy to a specific service by adding namespace, cluster, and location to the policy's Group By field. On the event card for the alert, click the Update alert policy link to open the Edit alerting policy page for the relevant alert policy. From here, you can update the alerting policy with the additional information so that the dashboard can find the associated resource.

After you update the alerting policy, the GKE Monitoring dashboard is able to associate all future incidents with a unique service in a particular cluster, giving you additional information to diagnose the problem.

Depending on your use case, you might want to filter on some of these labels in addition to adding them to the Group By field. For example, if you only want alerts for your production cluster, you can filter on cluster_name.
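If you manage alerting policies as JSON rather than through the console, the Group By field corresponds to the groupByFields list in a condition's aggregation settings. The following is a minimal sketch that prints such a fragment. It assumes that the condition monitors the k8s_container resource type, whose labels include cluster_name, location, and namespace_name. The alignment period, aligner, and reducer are illustrative assumptions, and print_aggregation is a hypothetical helper name.

```shell
# Print an aggregation fragment that groups time series by cluster,
# location, and namespace so incidents map to one service in one cluster.
# The alignment period, aligner, and reducer are illustrative assumptions.
print_aggregation() {
  cat <<'EOF'
{
  "alignmentPeriod": "60s",
  "perSeriesAligner": "ALIGN_RATE",
  "crossSeriesReducer": "REDUCE_SUM",
  "groupByFields": [
    "resource.label.cluster_name",
    "resource.label.location",
    "resource.label.namespace_name"
  ]
}
EOF
}
```

You can redirect the output to a file and merge it into the condition of an exported policy before re-importing it.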

Last updated 2026-02-18 UTC.