Monitoring

You can monitor Bigtable visually, using charts that are available in the Google Cloud console, or you can programmatically call the Cloud Monitoring API.

Note: If you use a Java client library to access Bigtable, you can enable client-side metrics.

In the Google Cloud console, monitoring data is available in the following places:

  • Bigtable system insights
  • Bigtable instance overview
  • Bigtable cluster overview
  • Bigtable table overview
  • Cloud Monitoring
  • Key Visualizer

The system insights and overview pages provide a high-level view of your Bigtable usage. You can use Key Visualizer to drill down into your access patterns by row key and troubleshoot specific performance issues.

Understand CPU and disk usage

No matter what tools you use to monitor your instance, it's essential to monitor the CPU and disk usage for each cluster in the instance. If a cluster's CPU or disk usage exceeds certain thresholds, the cluster won't perform well, and it might return errors when you try to read or write data.

CPU usage

The nodes in your clusters use CPU resources to handle reads, writes, and administrative tasks. We recommend that you enable autoscaling, which lets Bigtable automatically add nodes to and remove nodes from a cluster based on workload. To learn more about how the number of nodes affects a cluster's performance, see Performance for typical workloads.

Bigtable reports the following metrics for CPU usage:

Average CPU utilization

The average CPU utilization across all nodes in the cluster. Includes change stream activity if a change stream is enabled for a table in the instance.

In app profile charts, <system> indicates system background activities such as replication and compaction. System background activities are not client-driven.

The recommended maximum values provide headroom for brief spikes in usage.

CPU utilization of hottest node

CPU utilization for the busiest node in the cluster. This metric continues to be provided for continuity, but in most cases you should use the more accurate metric High-granularity CPU utilization of hottest node.

High-granularity CPU utilization of hottest node

A fine-grained measurement of CPU utilization for the busiest node in the cluster.

The hottest node is not necessarily the same node over time and can change rapidly, especially during large batch jobs or table scans.

If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. A query sketch at the end of this section shows one way to compare the hottest node with the cluster average.

  • Use the Key Visualizer tool to identify hotspots in your table that might be causing spikes in CPU utilization.
  • Check your schema design to make sure it supports an even distribution of reads and writes across each table.

Change stream CPU utilization

The average CPU utilization caused by change stream activity across all nodes in the cluster.

CPU utilization by app profile, method, and table

CPU utilization by app profile, method, and table.

If you observe higher than expected CPU usage for a cluster, use this metric to determine if the CPU usage of a particular app profile, API method, or table is driving the CPU load.

Note: If an instance has one or two clusters and at least one of the instance's app profiles uses multi-cluster routing, the CPU utilization chart shows a recommendation line that is calculated based on multi-cluster routing.
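
If you want to check these values programmatically, the following Python sketch compares the cluster-wide average with the hottest node by querying the Cloud Monitoring API. This is a minimal sketch: the project, instance, and cluster IDs are placeholders, the 30-point gap used as a trigger is illustrative, and the metric type names (especially the high-granularity hottest-node metric) should be confirmed against the published Bigtable metrics list.

```python
# Minimal sketch: compare average CPU with the hottest node's CPU for one cluster.
# Metric type names are assumptions; check the Bigtable Metrics list for exact names.
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"      # placeholder
INSTANCE_ID = "my-instance"    # placeholder
CLUSTER_ID = "my-cluster"      # placeholder

client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{PROJECT_ID}"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)


def latest_value(metric_type: str) -> float:
    """Return the most recent data point for the given metric on one cluster."""
    results = client.list_time_series(
        request={
            "name": project_name,
            "filter": (
                f'metric.type = "{metric_type}" '
                'AND resource.type = "bigtable_cluster" '
                f'AND resource.label.instance = "{INSTANCE_ID}" '
                f'AND resource.label.cluster = "{CLUSTER_ID}"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    series = next(iter(results))            # raises StopIteration if no data
    return series.points[0].value.double_value  # points are returned newest-first


avg = latest_value("bigtable.googleapis.com/cluster/cpu_load")
hottest = latest_value(
    "bigtable.googleapis.com/cluster/cpu_load_hottest_node_high_granularity"
)
print(f"average CPU: {avg:.0%}, hottest node: {hottest:.0%}")
if hottest - avg > 0.3:  # illustrative gap; tune for your workload
    print("Large gap between hottest node and average: check Key Visualizer for hotspots.")
```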

Disk usage

For each cluster in your instance, Bigtable stores a separate copy of all of the tables in that instance.

Bigtable tracks disk usage in binary units, such as binary gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB).
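
As a worked example, the following sketch converts a reported byte count into binary units and estimates utilization against an assumed per-node storage limit. The 5 TB-per-node figure is illustrative only; check the current Bigtable documentation for the real per-node limits for SSD and HDD clusters.

```python
# Worked example: bytes -> GiB/TiB, and a rough utilization estimate.
GIB = 2**30  # 1 binary gigabyte (GiB)
TIB = 2**40  # 1 binary terabyte (TiB)

stored_bytes = 3_298_534_883_328          # example value as reported by the metric
nodes = 4
limit_per_node_bytes = 5 * TIB            # assumed per-node limit, for illustration only

print(f"stored: {stored_bytes / GIB:.0f} GiB ({stored_bytes / TIB:.2f} TiB)")
utilization = stored_bytes / (nodes * limit_per_node_bytes)
print(f"utilization: {utilization:.0%} of capacity")  # aim to stay under ~70%
```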

Bigtable reports the following metrics for disk usage:

Storage utilization (bytes)

The amount of data stored in the cluster. Change stream usage is not included for this metric.

This value affects your costs. Also, as described below, you might need to add nodes to each cluster as the amount of data increases.

Storage utilization (% max)

The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. Change stream usage is not included for this metric.

In general, do not use more than 70% of the hard limit on total storage, so you have room to add more data. If you do not plan to add significant amounts of data to your instance, you can use up to 100% of the hard limit.

Important: If any cluster in an instance exceeds the hard limit on the amount of storage per node, writes to all clusters in that instance will fail until you add nodes to each cluster that is over the limit. Also, if you try to remove nodes from a cluster, and the change would cause the cluster to exceed the hard limit on storage, Bigtable will deny the request.

If you are using more than the recommended percentage of the storage limit, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs.

For details about how this value is calculated, see Storage utilization per node.

Change stream storage utilization (bytes)

The amount of storage consumed by change stream records for tables in the instance. This storage does not count toward the total storage utilization. You are charged for change stream storage, but it is not included in the calculation of storage utilization (% max).

Disk load

The percentage your cluster is using of the maximum possible bandwidth for HDD reads. Available only for HDD clusters.

If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage.

Compaction and replicated instances

Storage metrics reflect the data size on disk as of the last compaction. Because compaction happens on a rolling basis over the course of a week, storage usage metrics for a cluster might sometimes temporarily be different from metrics for other clusters in the instance. Observable impacts of this include the following:

  • A new cluster that has recently been added to an instance might temporarily show 0 bytes of storage even though all data has successfully been replicated to the new cluster.

  • A table might be a different size in each cluster, even when replication is working properly.

  • Storage usage metrics might be different in each cluster, even after replication has finished and no writes have been sent for a few days. The internal storage implementation, including how data is divided and stored in a distributed manner, can be different for each cluster, causing the actual usage of storage to differ.

Instance overview

The instance overview page shows the current values of several key metrics for each cluster:

CPU utilization average

The average CPU utilization across all nodes in the cluster. Includes change stream activity if a change stream is enabled for a table in the instance.

In app profile charts, <system> indicates system background activities such as replication and compaction. System background activities are not client-driven.

CPU utilization of hottest node

CPU utilization for the busiest node in the cluster. This metric continues to be provided for continuity, but in most cases you should use the more accurate metric High-granularity CPU utilization of hottest node.

High-granularity CPU utilization of hottest node

A fine-grained measurement of CPU utilization for the busiest node in the cluster.

The hottest node is not necessarily the same node over time and can change rapidly, especially during large batch jobs or table scans.

Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster.

Rows read

The number of rows read per second.

Rows written

The number of rows written per second.

Read throughput

The number of bytes per second of response data sent. This metric refers to the full amount of data that is returned after filters are applied.

Write throughput

The number of bytes per second that were received when data was written.

System error rate

The percentage of all requests that failed on the Bigtable server side.

Replication latency for input

The highest amount of time at the 99th percentile, in seconds, for a write to another cluster to be replicated to this cluster.

Replication latency for output

The highest amount of time at the 99th percentile, in seconds, for a write to this cluster to be replicated to another cluster.

To see an overview of these key metrics:

  1. Open the list of Bigtable instances in the Google Cloud console.

  2. Click the instance whose metrics you want to view. The Google Cloud console displays the current metrics for your instance's clusters.

Cluster overview

Use the cluster overview page to understand the current and past status of an individual cluster.

The cluster overview page displays charts showing the following metrics for each cluster:

Number of nodes

The number of nodes in use for the cluster at a given time.

Maximum node count target

The maximum number of nodes that Bigtable will scale the cluster up to when autoscaling is enabled. This metric is visible only when autoscaling is enabled for the cluster. You can change this value on the Edit cluster page.

Minimum node count target

The minimum number of nodes that Bigtable will scale the cluster down to when autoscaling is enabled. This metric is visible only when autoscaling is enabled for the cluster. You can change this value on the Edit cluster page.

Recommended number of nodes for CPU target

The number of nodes that Bigtable recommends for the cluster based on the CPU utilization target that you set. This metric is visible only when autoscaling is enabled for the cluster. If this number is higher than the maximum node count target, consider raising your CPU utilization target or increasing the maximum number of nodes for the cluster. If this number is lower than the minimum number of nodes, the cluster might be overprovisioned for your usage, and you should consider lowering the minimum.

Recommended number of nodes for storage target

The number of nodes that Bigtable recommends for the cluster based on the built-in storage utilization target. This metric is visible only when autoscaling is enabled for the cluster. If this number is higher than the maximum node count target, consider increasing the maximum number of nodes for the cluster.

CPU utilization

The average CPU utilization across all nodes in the cluster. Includes change stream activity if a change stream is enabled for a table in the instance.

In app profile charts, <system> indicates system background activities such as replication and compaction. System background activities are not client-driven.

Storage utilization

The amount of data stored in the cluster. Change stream usage is not included for this metric.

This metric reflects the fact that Bigtable compresses your data when it is stored.

To view a cluster's overview page, do the following:

  1. Open the list of Bigtable instances in the Google Cloud console.

  2. Click the instance whose metrics you want to view.

  3. Go to the section that follows the section showing the current status of some of the cluster's metrics.

  4. Click the cluster ID to open the cluster's Cluster overview page.

Logs

The Logs chart displays system event log entries for the cluster. System event logs are generated only for clusters that use autoscaling. To learn additional ways to view Bigtable audit logs, see Audit logging.

Table overview

Use the table overview page to understand the current and past status of an individual table.

The table overview page displays charts showing the following metrics for the table. Each chart shows a separate line for each cluster that the table is in.

Storage utilization (bytes)

The percentage of the cluster's storage capacity that is being used by the table. The capacity is based on the number of nodes in the cluster.

For details about how this value is calculated, see Storage utilization per node.

CPU utilization

The average CPU utilization across all nodes in the cluster. Includes change stream activity if a change stream is enabled for a table in the instance.

In app profile charts, <system> indicates system background activities such as replication and compaction. System background activities are not client-driven.

Read latency

The time for a read request to return a response.

Measurement of read latency begins when Bigtable receives the request and ends when the last byte of data is sent to the client. For requests for large amounts of data, read latency can be affected by the client's ability to consume the response.

Write latency

The time for a write request to return a response.

Rows read

The number of rows read per second.

This metric provides a more useful view of Bigtable's overall throughput than the number of read requests, because a single request can read a large number of rows.

Rows written

The number of rows written per second.

This metric provides a more useful view of Bigtable's overall throughput than the number of write requests, because a single request can write a large number of rows.

Read requests

The number of random reads and scan requests per second.

Write requests

The number of write requests per second.

Read throughput

The number of bytes per second of response data sent. This metric refers to the full amount of data that is returned after filters are applied.

Write throughput

The number of bytes per second that were received when data was written.

Automatic failovers

The number of requests that were automatically rerouted from one cluster to another due to a failover scenario, such as a brief outage or delay. Automatic rerouting can occur if an app profile uses multi-cluster routing.

This chart does not include manually rerouted requests.

The table overview page also shows the table's replication state in each cluster in the instance. For each cluster, the page displays the following:

  • Status
  • Cluster ID
  • Zone
  • The amount of cluster storage used by the table
  • Encryption key and key status
  • Date of the latest backup of the selected table
  • A link to the Edit cluster page.

To view a table's overview page, do the following:

  1. Open the list of Bigtable instances in the Google Cloud console.

  2. Click the instance whose metrics you want to view.

  3. In the left pane, click Tables. The Google Cloud console displays a list of all the tables in the instance.

  4. Click a table ID to open the table's Table overview page.

Monitor performance over time

Use your Bigtable instance's system insights page to understand the past performance of your instance. You can analyze the performance of each cluster, and you can break down the metrics for different types of Bigtable resources. Charts can display a period ranging from the past 1 hour to the past 6 weeks.

System insights charts for Bigtable resources

The Bigtable system insights page provides charts for the following types of Bigtable resources:

  • Instances
  • Tables
  • Application profiles
  • Replication

Charts on the system insights page show the following metrics. Each entry notes the Bigtable resource types (instances, tables, or app profiles) that the chart is available for.

CPU utilization
Available for: instances, tables, app profiles

The average CPU utilization across all nodes in the cluster. Includes change stream activity if a change stream is enabled for a table in the instance.

In app profile charts, <system> indicates system background activities such as replication and compaction. System background activities are not client-driven.

High-granularity CPU utilization (hottest node)
Available for: instances

A fine-grained measurement of CPU utilization for the busiest node in the cluster.

The hottest node is not necessarily the same node over time and can change rapidly, especially during large batch jobs or table scans.

Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster.

Data Boost serverless processing units (SPUs)
Available for: instances

Billable Data Boost compute usage measured in SPU-seconds.

Read latency
Available for: instances, tables, app profiles

The time for a read request to return a response.

Measurement of read latency begins when Bigtable receives the request and ends when the last byte of data is sent to the client. For requests for large amounts of data, read latency can be affected by the client's ability to consume the response.

SQL read latency
Available for: instances, app profiles

The time for a SQL read request to return a response.

Measurement of SQL read latency begins when Bigtable receives the request and ends when the last byte of data is sent to the client. For requests for large amounts of data, SQL read latency can be affected by the client's ability to consume the response.

Write latency
Available for: instances, tables, app profiles

The time for a write request to return a response.

Client-side read latency
Available for: instances, tables, app profiles

The total end-to-end latency across all RPC attempts associated with a Bigtable operation. Measures the operation's round trip from the client to Bigtable and back to the client and includes all retries.

Client-side SQL read latency
Available for: instances, tables, app profiles

The total end-to-end latency across all RPC attempts associated with a Bigtable operation.

Measures the operation's round trip from the client to Bigtable and back to the client and includes all retries. For ExecuteQuery requests, the operation latencies include the application processing time for each returned message.

Client-side write latency
Available for: instances, tables, app profiles

The total end-to-end latency across all RPC attempts associated with a Bigtable operation. Measures the operation's round trip from the client to Bigtable and back to the client and includes all retries.

Client-side read attempt latency
Available for: instances, tables, app profiles

The latencies of a client read RPC attempt. Under normal circumstances, this value is identical to operation_latencies. If the client receives transient errors, however, then operation_latencies is the sum of all attempt_latencies and the exponential delays.

Client-side SQL read attempt latency
Available for: instances, tables, app profiles

The latencies of a client SQL read RPC attempt. Under normal circumstances, this value is identical to operation_latencies. If the client receives transient errors, however, then operation_latencies is the sum of all attempt_latencies and the exponential delays.

Client-side write attempt latency
Available for: instances, tables, app profiles

The latencies of a client write RPC attempt. Under normal circumstances, this value is identical to operation_latencies. If the client receives transient errors, however, then operation_latencies is the sum of all attempt_latencies and the exponential delays.

User error rate
Available for: instances

The rate of errors caused by the content of a request, as opposed to errors on the Bigtable server side. The user error rate includes the following status codes:

  • INVALID_ARGUMENT
  • NOT_FOUND
  • PERMISSION_DENIED
  • RESOURCE_EXHAUSTED
  • OUT_OF_RANGE

User errors are typically caused by a configuration issue, such as a request that specifies the wrong cluster, table, or app profile.

Note: To view this chart, you must group the system insights data by instance. In the View metrics for drop-down list, select Instance. Then, under Group by, click Instance.

System error rate
Available for: instances

The percentage of all requests that failed on the Bigtable server side. The system error rate includes the following status codes:

  • UNKNOWN
  • ABORTED
  • UNIMPLEMENTED
  • INTERNAL
  • UNAVAILABLE

Automatic failovers
Available for: instances, tables, app profiles

The number of requests that were automatically rerouted from one cluster to another due to a failover scenario, such as a brief outage or delay. Automatic rerouting can occur if an app profile uses multi-cluster routing.

This chart does not include manually rerouted requests.

SQL automatic failovers
Available for: instances, tables, app profiles

The number of SQL requests that were automatically rerouted from one cluster to another due to a failover scenario, such as a brief outage or delay. Automatic rerouting can occur if an app profile uses multi-cluster routing.

This chart does not include manually rerouted requests.

Storage utilization (bytes)
Available for: instances, tables

The amount of data stored in the cluster. Change stream usage is not included for this metric.

This metric reflects the fact that Bigtable compresses your data when it is stored.

Storage utilization (% max)
Available for: instances

The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. Change stream usage is not included for this metric.

For details about how this value is calculated, see Storage utilization per node.

Disk load
Available for: instances

The percentage your cluster is using of the maximum possible bandwidth for HDD reads. Available only for HDD clusters.

Rows read
Available for: instances, tables, app profiles

The number of rows read per second.

This metric provides a more useful view of Bigtable's overall throughput than the number of read requests, because a single request can read a large number of rows.

Rows written
Available for: instances, tables, app profiles

The number of rows written per second.

This metric provides a more useful view of Bigtable's overall throughput than the number of write requests, because a single request can write a large number of rows.

Read requests
Available for: instances, tables, app profiles

The number of random reads and scan requests per second.

Write requests
Available for: instances, tables, app profiles

The number of write requests per second.

Read throughput
Available for: instances, tables, app profiles

The number of bytes per second of response data sent. This metric refers to the full amount of data that is returned after filters are applied.

Write throughput
Available for: instances, tables, app profiles

The number of bytes per second that were received when data was written.

Node count
Available for: instances

The number of nodes in the cluster.

Data Boost traffic eligibility
Available for: app profiles

Current Bigtable requests that are eligible and ineligible for Data Boost.

Data Boost traffic ineligible reasons
Available for: app profiles

Reasons that current traffic is ineligible for Data Boost.

To view metrics for these resources:

  1. Open the list of Bigtable instances in the Google Cloud console.

  2. Click the instance whose metrics you want to view.

  3. In the left pane, click System insights. The Google Cloud console displays a series of charts for the instance, as well as a tabular view of the instance's metrics. By default, the Google Cloud console shows metrics for the past hour, and it shows separate metrics for each cluster in the instance.

    To view all of the charts, scroll through the pane where the charts are displayed.

    To view metrics at the table level, click Tables.

    To view metrics for individual app profiles, click Application Profiles.

    To view combined metrics for the instance as a whole, find the Group by section above the charts, then click Instance.

    To view metrics for a longer period of time, click the arrow next to 1 Hour. Choose a pre-set time range or enter a custom time range, then click Apply.

Important: When you group by Cluster, you might sometimes see requests tagged with an <unspecified> cluster ID. This tag is used for requests that were unable to reach a cluster, and generally represents only a very small percentage of requests.

Charts for replication

The system insights page provides a chart that shows replication latency over time. You can view the average latency for replicating writes at the 50th, 99th, and 100th percentiles.
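
If you want the same numbers programmatically, the following Python sketch queries the Cloud Monitoring API for 99th-percentile replication latency. It is a sketch only: the metric type, monitored-resource type, and label names shown here are assumptions based on the published Bigtable metrics list, and the project and instance IDs are placeholders; verify all of them before relying on the output.

```python
# Minimal sketch: pull p99 replication latency for an instance over the past hour.
# Metric type and resource/label names are assumptions; confirm against the metrics list.
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"    # placeholder
INSTANCE_ID = "my-instance"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "bigtable.googleapis.com/replication/latency" '
            'AND resource.type = "bigtable_table" '
            f'AND resource.label.instance = "{INSTANCE_ID}"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
        ),
        # The raw metric is a latency distribution; align each series to its p99.
        "aggregation": monitoring_v3.Aggregation(
            {
                "alignment_period": {"seconds": 300},
                "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
            }
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    cluster = series.resource.labels.get("cluster", "?")
    latest = series.points[0].value.double_value  # points are returned newest-first
    # The value is in the metric's native unit; confirm the unit in the metrics list.
    print(f"{cluster}: p99 replication latency {latest:.0f}")
```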

To view the replication latency over time:

  1. Open the list of Bigtable instances in the Google Cloud console.

  2. Click the instance whose metrics you want to view.

  3. In the left pane, click System insights. The page opens with the Instance tab selected.

  4. Click the Replication tab. The Google Cloud console displays replication latency over time. By default, the Google Cloud console shows replication latency for the past hour.

    To toggle between latency charts grouped by table or by cluster, use the Group by menu.

    To change which percentile to view, use the Percentile menu.

    To view metrics for a longer period of time, click the arrow next to 1 Hour. Choose a pre-set time range or enter a custom time range, then click Apply.

Monitor with Cloud Monitoring

Bigtable exports usage metrics to Cloud Monitoring. You can use these metrics in a variety of ways:

  • Monitor programmatically using the Cloud Monitoring API.
  • Monitor visually in the Metrics Explorer.
  • Set up alerting policies.
  • Add Bigtable usage metrics to a custom dashboard.
  • Use a graphing library, such as Matplotlib for Python, to plot and analyze the usage metrics for Bigtable (see the sketch after this list).
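
For example, the following Python sketch pulls per-cluster average CPU utilization for the past six hours through the Cloud Monitoring API and plots it with Matplotlib. It is a minimal sketch: the project and instance IDs are placeholders, and the 70% reference line is illustrative only.

```python
# Minimal sketch: fetch cluster CPU utilization and plot it with Matplotlib.
import time

import matplotlib.pyplot as plt
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"    # placeholder
INSTANCE_ID = "my-instance"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "bigtable.googleapis.com/cluster/cpu_load" '
            'AND resource.type = "bigtable_cluster" '
            f'AND resource.label.instance = "{INSTANCE_ID}"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 6 * 3600}}
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    cluster = series.resource.labels["cluster"]
    points = list(reversed(series.points))  # points arrive newest-first; reverse for plotting
    times = [p.interval.end_time for p in points]
    values = [p.value.double_value for p in points]
    plt.plot(times, values, label=cluster)

plt.axhline(0.7, linestyle="--", label="70% reference line (illustrative)")
plt.ylabel("CPU utilization (fraction)")
plt.legend()
plt.show()
```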

To view usage metrics in the Metrics Explorer:

  1. Open the Monitoring page in the Google Cloud console.

    If you are prompted to choose an account, choose the account that you use to access Google Cloud.

  2. Click Resources, then click Metrics Explorer.

  3. Under Find resource type and metric, type bigtable. A list of Bigtable resources and metrics appears.

  4. Click a metric to view a chart for that metric.

For additional information about using Cloud Monitoring, see the Cloud Monitoring documentation.

For a complete list of Bigtable metrics, see Metrics.
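
You can also retrieve that list programmatically. The following Python sketch asks the Cloud Monitoring API for every metric descriptor in the bigtable.googleapis.com namespace that is visible to your project; the project ID is a placeholder.

```python
# Minimal sketch: list Bigtable metric descriptors visible to a project.
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder

client = monitoring_v3.MetricServiceClient()
descriptors = client.list_metric_descriptors(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": 'metric.type = starts_with("bigtable.googleapis.com/")',
    }
)

for descriptor in descriptors:
    print(descriptor.type)
    print(f"  {descriptor.description}")
```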

Create a storage utilization alert

You can set up an alert to notify you when your Bigtable cluster exceeds a specified threshold. For more information about determining your target storage utilization, see Disk usage.

To create an alerting policy that triggers when the storage utilization for your Bigtable cluster is above a recommended threshold, such as 70%, use the following settings.

To create an alerting policy, do the following:

  1. In the Google Cloud console, go to the Alerting page:

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. If you haven't created your notification channels and if you want to be notified, then click Edit Notification Channels and add your notification channels. Return to the Alerting page after you add your channels.
  3. From the Alerting page, select Create policy.
  4. To select the resource, metric, and filters, expand the Select a metric menu and then use the values in the New condition table:
    1. Optional: To limit the menu to relevant entries, enter the resource or metric name in the filter bar.
    2. Select a Resource type. For example, select VM instance.
    3. Select a Metric category. For example, select instance.
    4. Select a Metric. For example, select CPU Utilization.
    5. Select Apply.
  5. Click Next and then configure the alerting policy trigger. To complete these fields, use the values in the Configure alert trigger table.
  6. Click Next.
  7. Optional: To add notifications to your alerting policy, click Notification channels. In the dialog, select one or more notification channels from the menu, and then click OK.

    To be notified when incidents are opened and closed, check Notify on incident closure. By default, notifications are sent only when incidents are opened.

  8. Optional: Update the Incident autoclose duration. This field determines when Monitoring closes incidents in the absence of metric data.
  9. Optional: Click Documentation, and then add any information that you want included in a notification message.
  10. Click Alert name and enter a name for the alerting policy.
  11. Click Create Policy.
New condition

Resource and Metric: In the Resources menu, select Cloud Bigtable Cluster. In the Metric categories menu, select Cluster. In the Metrics menu, select Storage utilization. (The metric.type is bigtable.googleapis.com/cluster/storage_utilization.)

Filter: cluster = YOUR_CLUSTER_ID
Configure alert trigger

Condition type: Threshold
Condition triggers if: Any time series violates
Threshold position: Above threshold
Threshold value: 70
Retest window: 10 minutes
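
If you prefer to manage alerting policies in code, the following Python sketch creates an equivalent policy through the Cloud Monitoring API. It assumes the raw storage_utilization metric is expressed as a fraction, so 0.7 stands in for the 70% threshold shown in the console table; verify the metric's scale in Metrics Explorer before deploying, and note that the condition's duration approximates the console's retest window. Project and cluster IDs are placeholders.

```python
# Minimal sketch: create a storage-utilization alerting policy with the Monitoring API.
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"        # placeholder
CLUSTER_ID = "YOUR_CLUSTER_ID"   # placeholder

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="Bigtable storage utilization above 70%",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="Storage utilization (% max)",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type = "bigtable.googleapis.com/cluster/storage_utilization" '
                    'AND resource.type = "bigtable_cluster" '
                    f'AND resource.label.cluster = "{CLUSTER_ID}"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=0.7,            # assumed fraction scale (~70%)
                duration={"seconds": 600},      # roughly the 10-minute retest window
            ),
        )
    ],
)

created = client.create_alert_policy(
    name=f"projects/{PROJECT_ID}", alert_policy=policy
)
print(f"Created {created.name}")
```

To receive notifications, add notification channel resource names to the policy's notification_channels field before creating it.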
