Observability in Google Cloud

Google Cloud Observability includes services that help you to understand the behavior, health, and performance of your applications. Visibility into how applications behave and how components are connected helps you to anticipate, identify, and respond to unexpected changes more quickly and effectively.


About observability

Observability is a holistic approach to gathering and analyzing telemetry data in order to understand the state of your environment. Telemetry data consists of metrics, logs, traces, and other data generated by your applications and the application infrastructure that provide information about application health and performance. Application-centric observability refers to tools that let you visualize and analyze the telemetry data from the perspective of an application.

Metrics

Metrics are numeric data about health or performance that you measure at regular intervals over time, such as CPU utilization and request latency. Unexpected changes to a metric might indicate an issue to investigate. Over time, you can also analyze metric patterns to better understand usage patterns and anticipate resource needs.

Logs

A log is a generated record of system or application activity over time. Each log is a collection of timestamped log entries, and each log entry describes an event at a specific point in time.

A log often contains rich, detailed information that helps you understand what happened with a specific part of your application. However, logs don't provide good information about how a change in one component of your application relates to activity in another component. Traces can help to bridge that gap.
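
As a rough illustration of a timestamped log entry, the following Python sketch builds one entry per event and writes it as a single line of JSON. The field names (`timestamp`, `severity`, `message`) and the helper functions are assumptions chosen for this example; they are not taken from this document, and whether a particular logging backend interprets them is not guaranteed here.

```python
import json
import sys
from datetime import datetime, timezone


def make_log_entry(message, severity="INFO", **context):
    """Build one log entry: a timestamped record of a single event.

    The field names are illustrative assumptions, not a required schema.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "severity": severity,
        "message": message,
    }
    # Extra context (for example, a customer ID) travels with the entry.
    entry.update(context)
    return entry


def write_log_entry(entry, stream=sys.stdout):
    """Write the entry as one JSON object per line."""
    stream.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    write_log_entry(
        make_log_entry("checkout failed", severity="ERROR", customer_id="c-123")
    )
```

Writing one JSON object per line keeps each entry machine-parseable, which makes it easier to filter and analyze entries later.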

Traces

Traces represent the path of a request across the parts of your distributed application. A metric or log entry in one application component that triggered an alert notification might be a symptom of a problem that originates in another component. Traces let you follow the flow of a request and examine latency data to help you to identify the root cause of an issue.
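
To make the idea of following a request concrete, the following Python sketch models a trace as a list of spans and computes how much time each span spends in its own work, excluding its children. The `Span` class, the component names, and the timings are all hypothetical. The sketch assumes that child spans run one after another and that span names are unique; it is not how any particular tracing product stores or analyzes traces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation within a trace (hypothetical, simplified model)."""
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_times(spans):
    """Return {span name: time spent in the span itself, excluding children}.

    Assumes that the children of a span do not overlap in time.
    """
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {s.name: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def slowest_component(spans):
    """Return the name of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


# A request enters "frontend", which calls "checkout", which calls "database".
EXAMPLE_TRACE = [
    Span("a", None, "frontend", 0, 500),
    Span("b", "a", "checkout", 20, 480),
    Span("c", "b", "database", 50, 450),
]

if __name__ == "__main__":
    print(slowest_component(EXAMPLE_TRACE))  # prints "database"
```

In this example, the frontend request takes 500 ms overall, but 400 ms of that time is spent in the database span, so a latency alert raised on the frontend is a symptom of a problem that originates elsewhere.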

Other data

You can gain additional insights by analyzing metrics, logs, and traces in the context of other data. For example, a label for the severity of an alert or the customer ID associated with a request in logs provides context that can be useful for troubleshooting and debugging.

Monitoring, debugging, and troubleshooting distributed applications can be difficult because there are many systems and software components involved, often with a mix of open source and commercial software.

Observability tools help you to navigate this complexity by collecting meaningful data and providing features to explore, analyze, and correlate the data. An observable environment helps you to:

  • Proactively detect issues before they impact your users
  • Troubleshoot both known and new issues
  • Debug applications during development
  • Plan for and understand the impacts of changes to your applications
  • Explore data to discover new insights

In short, an observable environment helps you to maintain application reliability. An application is reliable when it meets your current objectives for availability and resilience to failures.

To learn more about reliability practices, including principles and practices related to observability, read the book Site Reliability Engineering: How Google Runs Production Systems.

Google Cloud Observability

Services in Google Cloud Observability help you to collect, analyze, and correlate telemetry data, both from your applications and from the underlying infrastructure. These services also provide built-in defaults to help you get started faster, such as default dashboards for your App Hub applications and preconfigured alerting policies.

Cloud Monitoring, Cloud Logging, and Cloud Trace are among the services enabled by default when you create a Google Cloud project.

Monitoring: Use collected metrics to monitor health and performance, identify trends and issues, and get notified of changes in behavior.

  • View the health of your App Hub applications.
  • Automatically collect metrics for most Google Cloud services.
  • Collect system and application metrics from third-party applications.
  • Visualize and analyze metrics with default or customized dashboards.
  • Use synthetic monitoring to test the performance of your applications.
  • Define service level objectives (SLOs) to monitor service reliability.
  • Receive alerts when issues occur.
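
As a worked example of the arithmetic behind an availability SLO, the following Python sketch computes measured availability and the share of the error budget that remains for a given target. The term "error budget" comes from general site reliability engineering practice rather than from this document, and the function names, request counts, and 99.9% target are hypothetical.

```python
def availability(good_requests, total_requests):
    """Fraction of requests that succeeded (1.0 when there is no traffic)."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(good_requests, total_requests, slo_target):
    """Fraction of the error budget that is left.

    1.0 means untouched, 0.0 means fully spent, and a negative value
    means that the SLO target was missed.
    """
    allowed_bad = (1.0 - slo_target) * total_requests
    actual_bad = total_requests - good_requests
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 500 failures, 99.9% target.
    print(availability(999_500, 1_000_000))
    print(error_budget_remaining(999_500, 1_000_000, slo_target=0.999))
```

With a 99.9% target, 1,000,000 requests allow about 1,000 failures, so 500 failures consume roughly half of the budget.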

Logging: Use collected logs to debug, troubleshoot, and gain insights about your applications.

  • Automatically collect logs for most Google Cloud services.
  • Automatically collect audit logs for most Google Cloud services.
  • Collect logs from third-party software.
  • Explore and analyze logs.
  • Use Log Analytics to perform an analysis across your logs and other data with BigQuery. For example, you can use BigQuery to compare URLs in your logs with a public dataset of known malicious URLs.
  • Create metrics from logs.
  • Receive alerts when a specified message appears in a log.
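
The idea of creating a metric from logs can be sketched in a few lines of Python: count the log entries that match a condition in each one-minute window, and then flag the windows where the count reaches a threshold. This is only a conceptual model with hypothetical entry fields (`timestamp`, `severity`) and function names; it doesn't use or represent the Cloud Logging API.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(entries, severity="ERROR"):
    """Count entries with the given severity in each one-minute window.

    Each entry is a dict with an ISO 8601 "timestamp" and a "severity"
    (both field names are assumptions for this sketch).
    """
    counts = Counter()
    for entry in entries:
        if entry["severity"] == severity:
            ts = datetime.fromisoformat(entry["timestamp"])
            window = ts.replace(second=0, microsecond=0)
            counts[window.isoformat()] += 1
    return dict(counts)


def windows_over_threshold(counts, threshold):
    """Return the windows, in time order, whose count reaches the threshold."""
    return sorted(window for window, n in counts.items() if n >= threshold)


EXAMPLE_ENTRIES = [
    {"timestamp": "2026-01-01T12:00:05+00:00", "severity": "ERROR"},
    {"timestamp": "2026-01-01T12:00:40+00:00", "severity": "ERROR"},
    {"timestamp": "2026-01-01T12:01:10+00:00", "severity": "INFO"},
    {"timestamp": "2026-01-01T12:01:30+00:00", "severity": "ERROR"},
]

if __name__ == "__main__":
    counts = count_per_minute(EXAMPLE_ENTRIES)
    print(counts)
    print(windows_over_threshold(counts, threshold=2))
```

The resulting per-minute counts behave like any other metric: you can chart them or compare them against a threshold to decide when to raise an alert.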

Error Reporting: View and analyze errors from running cloud services.

  • Aggregate errors that Error Reporting detects in log entries, and view the associated logs.
  • Aggregate errors that your applications send to the Error Reporting API.

Trace: View and analyze the flow and latency of application requests when you are debugging and troubleshooting.

  • Track how requests propagate through your applications.
  • Collect latency data from your applications and view graphs of the data.
  • View latency reports that show performance degradations.
  • Receive alerts for changes in the latency profile for your applications.
  • Export traces to BigQuery so that you can explore them with other data.

Cloud Profiler: Analyze CPU and memory usage for your applications so that you can identify opportunities to improve performance.

  • Collect CPU usage and memory allocation data from your applications.
  • Identify the parts of an application that are consuming the most resources and gain insights about the application's overall performance.

Get started

This section describes steps you can take to get familiar with observability features in Google Cloud.

Try the quickstarts

Try the quickstarts to get familiar with the available services.

Look at automatically collected data

Most Google Cloud services automatically generate predefined metrics and logs. This means that you can start looking at some observability data for supported Google Cloud services without additional configuration.

  • Some Google Cloud services, such as Google Kubernetes Engine (GKE), Compute Engine, and Cloud SQL, provide default dashboards in the Google Cloud console to view observability data in the context of the service.
  • Compute Engine, GKE, and Cloud Run generate system metrics and logs by default, and you can configure collection of additional data.
  • Cloud Run functions and App Engine automatically generate metrics, logs, and traces.

You can also chart collected metrics in Metrics Explorer, view logs in Logs Explorer, or view traces in Trace. To review related data together, create custom dashboards. For example, you can create a dashboard that includes logs, performance metrics, and alerting policies for virtual machines.

Configure Compute Engine VMs to collect additional data

Without the Ops Agent, Compute Engine VMs collect only basic system metrics and logs by default.

Install the Ops Agent to collect additional telemetry data (logs, metrics, and traces) from your Compute Engine instances and applications for troubleshooting, performance monitoring, and alerting.

Configure GKE clusters to collect additional data

By default, GKE clusters send system logs and system metrics to Logging and Monitoring. Google Cloud Managed Service for Prometheus handles collection of third-party and user-defined metrics.

  • Use observability metrics packages to better understand the state of your applications and cluster resources. For example, control plane metrics are useful for creating SLOs to monitor service availability and latency.
  • Monitor third-party applications such as Postgres, MongoDB, and Redis. These integrations provide preconfigured dashboards and alerting policies.

Configure Cloud Run to collect custom data

If you have a Cloud Run service that writes Prometheus metrics, then you can use the Prometheus sidecar to send the metrics to Cloud Monitoring.

If your Cloud Run service writes OTLP metrics instead, then you can use an OpenTelemetry sidecar. For an example, see the tutorial for collecting OTLP metrics by using the sidecar.
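
To give a sense of what a service that writes Prometheus metrics produces, the following Python sketch renders samples in the Prometheus text exposition format. The `render_prometheus` helper and the metric name are hypothetical, and the sketch doesn't escape special characters in label values. In practice, a Prometheus client library would typically generate this output for you. The sketch doesn't show how either sidecar is configured.

```python
def render_prometheus(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a dict.
    Label values are not escaped in this simplified sketch.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_prometheus(
            "http_requests_total",
            "Total HTTP requests.",
            "counter",
            [({"code": "200"}, 1027), ({"code": "500"}, 3)],
        ),
        end="",
    )
```

A service usually serves text like this from an HTTP endpoint so that a collector can scrape it at regular intervals.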

Instrument your applications

Instrumentation is code that you add to an application to emit telemetry data. Several open-source instrumentation frameworks let you collect metrics, logs, and traces from your application and send that data to any vendor, including Google Cloud. However, you might not need to instrument your application. For example, Cloud Run, Cloud Run functions, and App Engine provide automatic tracing.

To instrument your application, we recommend that you use a vendor-neutral instrumentation framework that is open source, such as OpenTelemetry, instead of vendor- and product-specific APIs or client libraries. For information about instrumenting your application, see Instrumentation and observability.
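
To show what instrumentation means at the code level, the following Python sketch wraps a function so that every call records its name, outcome, and duration. This is a hand-rolled stand-in that uses only the standard library, not the OpenTelemetry API. The `instrumented` decorator, the `RECORDED` list, and `parse_order` are hypothetical names, and a real application would rely on an instrumentation framework to create and export this data.

```python
import functools
import time

# Hypothetical in-memory sink; a real framework would export the data instead.
RECORDED = []


def instrumented(name):
    """Decorator that records the outcome and duration of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                RECORDED.append({
                    "name": name,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@instrumented("parse_order")
def parse_order(raw):
    """Example application function: parse an order quantity."""
    return int(raw)


if __name__ == "__main__":
    parse_order("42")
    print(RECORDED[-1]["name"], RECORDED[-1]["status"])
```

Keeping the measurement in a decorator separates the telemetry code from the application logic, which is similar in spirit to how instrumentation frameworks let you add telemetry without rewriting your functions.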

For code samples that illustrate how to instrument your application to send telemetry to Google Cloud, see the following:


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.