Hybrid and multicloud monitoring and logging patterns
This document discusses monitoring and logging architectures for hybrid and multicloud deployments, and provides best practices for implementing them by using Google Cloud. With this document, you can identify which patterns and products are best suited for your environments.
Every enterprise has a unique portfolio of application workloads that place requirements and constraints on the architecture of a hybrid or multicloud setup. Although you must design and tailor your architecture to meet these constraints and requirements, you can rely on some common patterns.
The patterns covered in this document fall into two categories:
- In a single pane of glass architecture, all monitoring and logging is centralized, with the aim of providing a single point of access and control.
- In a separate application and operations architecture, sensitive application data is segregated from less sensitive operations data, with the aim of meeting compliance requirements for sensitive data.
Choosing your architecture pattern
You can use the decision tree in the following diagram to identify the best architecture for your use case.
Details of each architecture are discussed further in this document, but at a high level, your choices are as follows:
- Export from Monitoring to legacy solution.
- Export directly to legacy solution.
- Use Monitoring with Prometheus and Fluentd or Fluent Bit.
- Use Monitoring with observIQ BindPlane.
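The decision tree diagram is not reproduced here, so the following is only a rough sketch of how the four choices might be selected. The questions, the `Environment` fields, and the `choose_pattern` function are assumptions inferred from the later sections of this document, not the logic of the original diagram.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical inputs; the original diagram's questions may differ."""
    keeps_existing_tool: bool      # you want to keep a third-party or legacy tool
    tool_reads_from_google: bool   # that tool can ingest from Monitoring and Logging
    kubernetes_only: bool          # workloads run on GKE and Google Distributed Cloud


def choose_pattern(env: Environment) -> str:
    """Map an environment to one of the four choices listed above."""
    if env.keeps_existing_tool:
        if env.tool_reads_from_google:
            return "Export from Monitoring to legacy solution"
        return "Export directly to legacy solution"
    if env.kubernetes_only:
        return "Use Monitoring with Prometheus and Fluentd or Fluent Bit"
    # Assumed fallback: mixed VMs, on-premises hosts, or other cloud providers.
    return "Use Monitoring with observIQ BindPlane"
```

Treat the result as a starting point for reading the matching section, not as a substitute for evaluating your own compliance and operational constraints.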
Single pane of glass architecture
A common goal for a hybrid system is to integrate monitoring and logging information from various sources across multiple applications and environments into a single display. This type of display is called a single pane of glass.
The following diagram illustrates this pattern, where monitoring and logging data from all applications, both on-premises and in the cloud, is centralized into a single repository hosted in the cloud.
This architecture has the following advantages:
- You have a single, consistent view for all monitoring and logging.
- You have a single place to manage data storage and retention.
- You get centralized access control and auditing. However, you still need to ensure the security of data in transit to the central repository.
Monitoring as a single pane of glass
Cloud Monitoring is a Google-managed monitoring and management solution for services, containers, applications, and infrastructure. For a single pane of glass and a robust storage solution for metrics, logs, traces, and events, use Google Cloud Observability. The suite also provides observability tooling, such as dashboards, reporting, and alerting.
All Google Cloud products and services support integration with Monitoring. In addition, there are several integrated tools that you can use to extend Monitoring to hybrid and on-premises resources.
The following best practices apply to all architectures using Monitoring as a single pane of glass:
- To fulfill compliance requirements for log retention, set up log sinks for your organization.
- For fast analysis of log events, set up log exports to BigQuery for security and access analytics.
- To analyze logs that are stored in your log bucket, run SQL queries through Log Analytics.
- For projects containing sensitive data, consider enabling Data Access audit logs, so you can track who has accessed the data.
- To remove sensitive information, such as Social Security numbers, credit card numbers, and email addresses, you can filter log data. You can filter by using a custom Fluent Bit configuration or by ingesting with logs exclusions. You can also export unfiltered logs separately to meet compliance requirements.
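To make the last practice concrete, the following minimal sketch shows the kind of redaction that a log filter performs before data leaves your environment. It is plain Python, not Fluent Bit configuration, and the patterns and the `redact` function are illustrative assumptions. Production filters typically need stricter detection, such as checksum validation for card numbers.

```python
import re

# Illustrative patterns only (assumptions); they favor readability over accuracy.
_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def redact(message: str) -> str:
    """Replace each match of a sensitive pattern with a labeled placeholder."""
    for label, pattern in _PATTERNS.items():
        message = pattern.sub(f"[REDACTED_{label}]", message)
    return message


if __name__ == "__main__":
    print(redact("user a@b.com ssn 123-45-6789 card 4111 1111 1111 1111 ok"))
```

Applying redaction at the collection layer means that the sensitive values never reach the central repository. That is why the practice pairs it with a separate export of unfiltered logs when compliance requires retaining the originals.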
Hybrid monitoring and logging with Monitoring and BindPlane by observIQ
With BindPlane from Google's partner observIQ, you can import monitoring and logging data from both on-premises VMs and other cloud providers, such as Amazon Web Services (AWS), Microsoft Azure, Alibaba Cloud, and IBM Cloud, into Google Cloud. The following diagram shows how Monitoring and BindPlane can provide a single pane of glass for a hybrid cloud.
This architecture has the following advantages:
- In addition to monitoring resources like VMs, BindPlane has built-in deep integration for over 50 popular data sources.
- There are no additional licensing costs for using BindPlane. BindPlane metrics are imported into Monitoring as custom metrics, which are chargeable. Likewise, BindPlane logs are charged at the same rate as other Logging logs.
For more details about implementing this pattern, see Logging and monitoring on-premises resources with BindPlane.
Hybrid Google Kubernetes Engine monitoring with Prometheus and Monitoring
With Google Cloud Managed Service for Prometheus, a popular open source monitoring solution fully managed by Google Cloud, you can monitor applications running on multiple Kubernetes clusters with Monitoring. This architecture is useful when running Kubernetes workloads distributed across Google Kubernetes Engine (GKE) on Google Cloud and Google Distributed Cloud in your on-premises data center, because it provides a unified interface across both. The following diagram shows how to use Prometheus and the Monitoring collectors for data collection.
This architecture has the following advantages:
- Consistent Kubernetes metrics across cloud and on-premises environments.
- It lets you globally monitor and alert on your workloads by using Prometheus, without having to manually manage and operate Prometheus at scale.
- There are no additional licensing costs for using Prometheus. Prometheus metrics are imported into Monitoring. The imports are chargeable and priced by the number of samples ingested.
This architecture has the following disadvantages:
- Prometheus supports monitoring only, so logging has to be configured separately. The following section discusses a common option for logging using either Fluentd or Fluent Bit.
We recommend the following best practice:
- By default, Prometheus collects all exposed metrics, each of which becomes a chargeable metric. To avoid unexpected costs, consider implementing Monitoring cost controls.
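Because imports are priced by samples ingested, it can help to estimate sample volume before and after restricting which metrics you collect. The following back-of-the-envelope sketch assumes one sample per time series per scrape. The metric names, series counts, and helper functions are hypothetical, and no prices are included because rates vary.

```python
from typing import Dict, Set

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(series_count: int, scrape_interval_s: int) -> int:
    """Assume each time series yields one sample per scrape."""
    return series_count * (SECONDS_PER_DAY // scrape_interval_s)


def kept_series(series_by_metric: Dict[str, int], allowlist: Set[str]) -> int:
    """Count the time series that remain when only allowlisted metrics are kept."""
    return sum(n for name, n in series_by_metric.items() if name in allowlist)


if __name__ == "__main__":
    # Hypothetical workload: metric name -> number of time series it exposes.
    exposed = {
        "http_requests_total": 200,
        "go_gc_duration_seconds": 50,
        "process_cpu_seconds_total": 10,
    }
    everything = samples_per_day(sum(exposed.values()), 30)
    filtered = samples_per_day(kept_series(exposed, {"http_requests_total"}), 30)
    print(everything, filtered)  # 748800 576000
```

In this example, keeping only the one metric that is used for alerting cuts daily samples from 748,800 to 576,000. Lengthening the scrape interval reduces volume in the same proportion.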
Hybrid GKE logging with Fluentd or Fluent Bit and Cloud Logging
With Fluentd or Fluent Bit, both popular open source logging agents, you can ingest logs from applications running on multiple GKE clusters into Cloud Logging. This architecture is useful when running Kubernetes workloads distributed across GKE on Google Cloud and Google Distributed Cloud in your on-premises data center, because it provides a unified interface across both. The following diagram illustrates the flow of logs.
This architecture has the following advantages:
- You can have consistent Kubernetes logging across cloud and on-premises environments.
- You can customize Logging to filter out sensitive information.
- There are no additional licensing costs for using Fluentd or Fluent Bit. Logs that are imported into Logging by using Fluentd or Fluent Bit are chargeable.
This architecture has the following disadvantages:
- Fluentd and Fluent Bit support logging only, so monitoring has to be configured separately. The previous section discusses a common option for monitoring with Prometheus.
For more details about implementing this pattern, see Customizing Fluent Bit for Google Kubernetes Engine logs.
Partner services as single panes of glass
If you are already using a third-party monitoring or logging service such as Datadog or Splunk, you might not want to move to Logging. If so, you can export data from Google Cloud to many common monitoring and logging services. You can choose to use an integrated monitoring and logging service, or select separate monitoring and logging services that best fit your needs.
Export from Logging to partner services
In this pattern, you authorize the partner's monitoring service, such as Datadog, to connect to the Cloud Monitoring API. This authorization lets the service ingest all the metrics available to Monitoring, so Datadog can function as a single pane of glass for monitoring.
For logging data, Logging provides exports (log sinks) to Pub/Sub. These exports provide a performant and resilient method for partner logging services such as Elastic and Splunk to ingest large volumes of logs from Logging in real time, so these partner services can serve as a single pane of glass for logs.
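As a rough illustration of the consuming side, the following sketch decodes a log entry delivered through a Pub/Sub push subscription. It assumes the standard push envelope, in which `message.data` is base64-encoded and holds the JSON-serialized log entry. The function names are hypothetical, and partner services typically ship their own ingestion integrations rather than code like this.

```python
import base64
import json


def decode_log_entry(push_body: str) -> dict:
    """Extract the log entry from a Pub/Sub push request body."""
    envelope = json.loads(push_body)
    raw = base64.b64decode(envelope["message"]["data"])
    return json.loads(raw)


def summarize(entry: dict) -> str:
    """Build a one-line summary from common log entry fields."""
    payload = entry.get("textPayload")
    if payload is None:
        payload = json.dumps(entry.get("jsonPayload", {}), sort_keys=True)
    return f'{entry.get("severity", "DEFAULT")} {entry.get("logName", "")} {payload}'


if __name__ == "__main__":
    # Build a sample envelope the same way Pub/Sub would wrap an exported entry.
    entry = {"logName": "projects/p/logs/app", "severity": "ERROR", "textPayload": "boom"}
    data = base64.b64encode(json.dumps(entry).encode("utf-8")).decode("ascii")
    body = json.dumps({"message": {"data": data}})
    print(summarize(decode_log_entry(body)))
```

Pub/Sub decouples the export from the consumer, so a temporarily unavailable partner endpoint can catch up from the subscription backlog instead of losing entries.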
The combined architecture for logging and monitoring is shown in the following diagram.
This architecture has the following advantages:
- You can continue to use familiar existing tools.
- Google Cloud Support continues to have access to Logging logs for troubleshooting.
This architecture has the following disadvantages:
- Partner solutions are typically externally hosted, which means they might not be available or collect data if network connections are disrupted. Sometimes, you can mitigate this risk by self-hosting, but at the cost of having to maintain the infrastructure for the solution yourself.
- Externally hosted dashboards aren't directly available to Google Cloud Support. This lack of availability can slow down troubleshooting and mitigation.
- Commercial partner solutions might entail more licensing fees.
Some detailed example integrations include the following:
- Datadog: Monitoring Compute Engine metrics and Collect Logging Logs
- Elastic: Exporting Logging logs to Elastic Cloud
- Splunk: Scenarios for exporting Logging
Analyze metrics from Prometheus and Logging with Grafana
Grafana is a popular open source monitoring tool commonly paired with Prometheus for metrics collection. In this architecture, you use Prometheus as the on-premises collection layer and use Grafana as a single pane of glass for both Google Cloud and on-premises resources. The following diagram shows a sample architecture that analyzes metrics from Google Cloud and on-premises.
This architecture has the following advantages:
- It's suitable for hybrid environments with both VMs and containers.
- If your organization is already using Prometheus and Grafana, your users can continue to use them.
This architecture has the following disadvantages:
- Prometheus supports monitoring only, so logging has to be configured separately, for example, using Fluentd or the Cloud Logging plugin for Grafana.
- Prometheus is open source and extensible, but supports only a limited range of enterprise software integrations.
- Prometheus and Grafana are third-party tools and not official Google products. Google doesn't offer support for Prometheus or Grafana.
For more information, see Better troubleshooting with a Cloud Logging plugin for Grafana.
Export logs using Fluentd
An earlier pattern covered using Fluentd or Fluent Bit as a log collector for Logging. The same basic architecture can also be used for other logging or data analytics systems that support Fluentd or Fluent Bit, including BigQuery, Elastic, and Splunk. The following diagram illustrates this pattern.
This architecture has the following advantages:
- It's suitable for hybrid environments with both VMs and containers.
- Fluentd can read from many data sources, including system logs.
- Fluentd offers output plugins for many popular third-party logging and data analytics systems.
- Fluent Bit can also read from many inputs, including system logs.
- Fluent Bit offers outputs for many popular third-party logging and data analytics systems.
This architecture has the following disadvantages:
- Fluentd and Fluent Bit support logs only, so monitoring has to be configured separately. The previous section discusses common options for monitoring with Prometheus and Grafana.
- Fluentd and Fluent Bit are third-party tools and not official Google products. Google doesn't offer support for them.
- Exported logs are not available to Google Cloud Support for troubleshooting. In particular, Google does not offer support for Google Distributed Cloud clusters without Logging enabled.
Separate application and operations data
Single pane of glass architectures require streaming application monitoring and logging data to the cloud. However, you might have regulatory or compliance requirements that either require keeping customer data on-premises or place strict constraints on what data can be stored in the public cloud.
A useful pattern for these hybrid environments is to separate sensitive application data from lower-risk operations data, as illustrated in the following diagram.
Separate application and system data with a hybrid and multi-cloud architecture
To monitor on-premises clusters, you can use open source tools like Prometheus and Grafana. To collect and route telemetry data, you can use a solution like the OpenTelemetry Collector or observIQ BindPlane. These tools let you configure sensitive application data to be ingested and viewed entirely on-premises, such as in a self-hosted monitoring and logging solution. You can export less sensitive system data to Monitoring and Logging on Google Cloud. The following diagram illustrates this architecture.
This architecture has the following advantages:
- Sensitive application data is kept entirely on-premises.
- On-premises monitoring and logging have no cloud dependencies and remain available even if the network connection is interrupted.
- All GKE system data, both on-premises and Google Cloud, is centralized in Monitoring and Logging and is also accessible to Google Cloud Support as needed.
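The separation described in this section comes down to a routing decision made at the collection layer. The following minimal sketch shows that decision in plain Python. The `source` label, the sink names, and both functions are assumptions for illustration. In practice, you would express the same rule in your collector's own pipeline configuration.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, str]


def route(record: Record) -> str:
    """Return the sink name for a record.

    Assumption: the collector tags each record with a 'source' label.
    Anything not explicitly tagged as system data stays on-premises.
    """
    return "cloud" if record.get("source") == "system" else "on_prem"


def dispatch(records: Iterable[Record], sinks: Dict[str, Callable[[Record], None]]) -> None:
    """Send each record to the sink selected by route()."""
    for record in records:
        sinks[route(record)](record)


if __name__ == "__main__":
    on_prem, cloud = [], []
    dispatch(
        [
            {"source": "system", "msg": "kubelet restarted"},
            {"source": "application", "msg": "order placed"},
            {"msg": "unlabeled record"},
        ],
        {"on_prem": on_prem.append, "cloud": cloud.append},
    )
    print(len(on_prem), len(cloud))  # 2 1
```

Defaulting unlabeled records to the on-premises sink is a deliberate fail-closed choice: a missing or misspelled label keeps data local instead of exporting it.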
What's next
- Learn more about hybrid and multicloud best practices with the Hybrid and multicloud patterns and practices series, including architecture patterns and secure networking architecture patterns.
- Enroll in the Cloud Kubernetes Best Practice quest for hands-on exercises about observability and more on GKE.
- Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-06-11 UTC.