Change streams overview
Bigtable provides change data capture (CDC) with its change streams feature. A change stream captures data changes to a Bigtable table as the changes happen, letting you stream them for processing or analysis.
This document provides an overview of Bigtable change streams. Before you read this document, you should be familiar with the Bigtable overview.
Change streams are valuable for CDC use cases including the following:
- Triggering downstream application logic when specified changes occur
- Integrating with a data analytics pipeline
- Supporting audit and archival requirements
What a change stream is
A change stream tracks changes at the table level that are made by a user or application, usually using one of the Cloud Bigtable client libraries. Garbage collection changes are also captured.
All changes applied to a change stream-enabled table are stored as data change records. Data change records include data changes applied by the following:
- Writes, deletions, and updates that are sent using the Cloud Bigtable API methods MutateRow, MutateRows, CheckAndMutateRow, and ReadModifyWriteRow
- Deletions that take place due to garbage collection
- Rows deleted using the Admin API's DropRowRange method
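For illustration, here is a minimal sketch that sends a write with the Cloud Bigtable client library for Java; the project, instance, table, row key, and column family names are placeholders. Because both cells are set in one MutateRow call, a change stream captures them together in a single data change record:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;

public class WriteForChangeStream {
  public static void main(String[] args) throws Exception {
    // Placeholder project, instance, table, and column family names.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // Both cells are applied atomically in one MutateRow call, so a
      // change stream records them in a single data change record.
      RowMutation mutation =
          RowMutation.create("my-table", "user#1234")
              .setCell("profile", "name", "Ada")
              .setCell("profile", "city", "London");
      client.mutateRow(mutation);
    }
  }
}
```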
For details about the types of changes that you can send to a Bigtable table, see Reads, Writes, Deletes, and Garbage collection overview.
Change streams don't track schema changes, such as adding or modifying a column family, or changes to replication topology, such as adding or removing a cluster.
Data change records for each row key and each cluster are in commit timestamp order. However, there is no ordering guarantee across different row keys or clusters.
You enable change streams on a table and specify a retention period of 1 to 7 days.
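As a sketch, the following enables a change stream on an existing table with the Java admin client, using placeholder project, instance, and table IDs and a one-day retention period; the addChangeStreamRetention method mirrors Google's configuration sample, but verify it against the client library version you use:

```java
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;
import com.google.cloud.bigtable.admin.v2.models.UpdateTableRequest;
import org.threeten.bp.Duration;

public class EnableChangeStream {
  public static void main(String[] args) throws Exception {
    // Placeholder project, instance, and table IDs.
    try (BigtableTableAdminClient adminClient =
        BigtableTableAdminClient.create("my-project", "my-instance")) {
      // Enable the change stream with a 1-day retention period
      // (any value from 1 to 7 days is allowed).
      adminClient.updateTable(
          UpdateTableRequest.of("my-table").addChangeStreamRetention(Duration.ofDays(1)));
    }
  }
}
```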
Note: In the Cloud Bigtable API (Data API), changes sent to a table are called mutations.
What's in a data change record
Each data change record contains all changes for a row that were appliedatomically as part of a single RPC call.
If a value is overwritten, the newly written value is recorded in the data change record. The data change record does not contain the old value.
A data change record receives its timestamp, called a commit timestamp, at the same time that the change is applied to the first cluster that receives it. For example, consider an instance with two clusters. If you send a write request to Table 1 on Cluster A, the data change record commit timestamp is assigned when the write is received by Cluster A, and the data change record on Cluster B for this write has the same commit timestamp.
Each data change record contains the following:
- Entries - changes made to the row, including one or more of the following:
  - Write
    - Column family
    - Column qualifier
    - Timestamp
    - Value
  - Deletion of cells
    - Column family
    - Column qualifier
    - Timestamp range
  - Deletion of a column family
    - Column family
  - Deletion from a row - a deletion from a row is converted to a list of deletions from column families for each column family that the row has data in.
- Row key - the identifier for the changed row
- Change type - either user-initiated or garbage collection
- ID of the cluster that received the change
- Commit timestamp - server-side time when the change was committed to the table
- Tie breaker - a value that lets the application that is reading the stream use Bigtable's built-in conflict resolution policy
- Token - used by the consuming application to resume the stream if it's interrupted
- Estimated low watermark - the estimated time since the record's partition caught up with replication across all clusters. For details, see Partitions and Watermarks.
For more information about the fields in a data change record, see the API reference for DataChange.
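To make these fields concrete, the following sketch prints them from a ChangeStreamMutation, the Java client library's representation of a data change record; the record is assumed to come from a change stream read such as those shown later in this document:

```java
import com.google.cloud.bigtable.data.v2.models.ChangeStreamMutation;
import com.google.cloud.bigtable.data.v2.models.DeleteCells;
import com.google.cloud.bigtable.data.v2.models.DeleteFamily;
import com.google.cloud.bigtable.data.v2.models.Entry;
import com.google.cloud.bigtable.data.v2.models.SetCell;

public class PrintDataChangeRecord {
  // Prints the main fields of a data change record.
  static void print(ChangeStreamMutation record) {
    System.out.println("Row key: " + record.getRowKey().toStringUtf8());
    System.out.println("Change type: " + record.getType()); // user or garbage collection
    System.out.println("Source cluster: " + record.getSourceClusterId());
    System.out.println("Commit timestamp: " + record.getCommitTimestamp());
    System.out.println("Tie breaker: " + record.getTieBreaker());
    System.out.println("Token: " + record.getToken());
    System.out.println("Estimated low watermark: " + record.getEstimatedLowWatermark());
    for (Entry entry : record.getEntries()) {
      if (entry instanceof SetCell) {
        SetCell write = (SetCell) entry;
        System.out.println("Write: " + write.getFamilyName() + ":"
            + write.getQualifier().toStringUtf8() + " = " + write.getValue().toStringUtf8());
      } else if (entry instanceof DeleteCells || entry instanceof DeleteFamily) {
        System.out.println("Deletion: " + entry);
      }
    }
  }
}
```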
Change stream storage
A table and its change stream share the same cluster-level resources, including nodes and storage. As a result, change stream data storage is part of a table's storage. In an instance that uses replication, a copy of a change stream's data is stored in every cluster of the instance that contains the change stream-enabled table.
The storage used for your change stream data doesn't count toward your total storage utilization (% max). As a result, you don't need to add nodes to handle the increased storage that change stream data consumes (although you might need to add nodes for additional compute power). However, you are charged for the storage that your change stream data consumes. For details, see Cost considerations.
Reading a change stream
To read (stream) a change stream, you must use an application profile configured for single-cluster routing, and if you stream using Dataflow, you must enable single-row transactions.
For more information about routing policies, see Routing options.
For more information about single-row transactions, see Single-row transactions.
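As a sketch, you can create such an application profile with the Java admin client; the project, instance, cluster, and app profile IDs below are placeholders, and the second argument to SingleClusterRoutingPolicy.of enables single-row transactions:

```java
import com.google.cloud.bigtable.admin.v2.BigtableInstanceAdminClient;
import com.google.cloud.bigtable.admin.v2.models.AppProfile.SingleClusterRoutingPolicy;
import com.google.cloud.bigtable.admin.v2.models.CreateAppProfileRequest;

public class CreateCdcAppProfile {
  public static void main(String[] args) throws Exception {
    // Placeholder project, instance, cluster, and app profile IDs.
    try (BigtableInstanceAdminClient adminClient =
        BigtableInstanceAdminClient.create("my-project")) {
      adminClient.createAppProfile(
          CreateAppProfileRequest.of("my-instance", "cdc-app-profile")
              .setDescription("Single-cluster routing for change stream reads")
              // Route all requests to one cluster; "true" allows
              // single-row transactions on this app profile.
              .setRoutingPolicy(SingleClusterRoutingPolicy.of("my-cluster", true)));
    }
  }
}
```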
Change stream methods are provided by the Cloud Bigtable API (Data API). We recommend that you use one of the following options instead of calling the API without using a client library or connector:
- Dataflow templates
- Bigtable Beam connector
- Java client library
All the options let you avoid the need to track and handle partition changes due to splits and merges.
For more information, see ReadChangeStream.
Dataflow templates
You can use one of the following Dataflow templates provided by Google:
Bigtable Beam connector
You can use the Bigtable Beam connector to build a pipeline that reads and processes data change records.
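The following minimal pipeline sketch reads data change records with the connector's BigtableIO.readChangeStream() source and formats each record as a string; the project, instance, table, and app profile IDs are placeholders:

```java
import com.google.cloud.bigtable.data.v2.models.ChangeStreamMutation;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ChangeStreamPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Read data change records; the connector tracks partition
        // splits and merges for you.
        .apply(
            "ReadChangeStream",
            BigtableIO.readChangeStream()
                .withProjectId("my-project")
                .withInstanceId("my-instance")
                .withTableId("my-table")
                .withAppProfileId("cdc-app-profile"))
        // Format each record as "<row key> @ <commit timestamp>".
        .apply(
            "FormatRecord",
            MapElements.into(TypeDescriptors.strings())
                .via(
                    (KV<ByteString, ChangeStreamMutation> record) ->
                        record.getKey().toStringUtf8()
                            + " @ "
                            + record.getValue().getCommitTimestamp()));
    pipeline.run();
  }
}
```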
If you don't want to build your own pipeline, you can use the code samples from the Bigtable tutorial or quickstart as a starting point for your code.
Java client library
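The Cloud Bigtable client library for Java lets you read a change stream partition by partition. A minimal sketch, with placeholder IDs, lists the initial partitions and then streams records from the first one; a production consumer reads every partition in parallel and handles splits and merges:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.BigtableDataSettings;
import com.google.cloud.bigtable.data.v2.models.ChangeStreamMutation;
import com.google.cloud.bigtable.data.v2.models.ChangeStreamRecord;
import com.google.cloud.bigtable.data.v2.models.Range.ByteStringRange;
import com.google.cloud.bigtable.data.v2.models.ReadChangeStreamQuery;

public class ReadOnePartition {
  public static void main(String[] args) throws Exception {
    // Placeholder IDs; the app profile must use single-cluster routing.
    BigtableDataSettings settings =
        BigtableDataSettings.newBuilder()
            .setProjectId("my-project")
            .setInstanceId("my-instance")
            .setAppProfileId("cdc-app-profile")
            .build();
    try (BigtableDataClient client = BigtableDataClient.create(settings)) {
      // List the partitions that currently make up the change stream.
      for (ByteStringRange partition : client.generateInitialChangeStreamPartitions("my-table")) {
        System.out.println("Partition: " + partition);
      }
      // Stream records from one partition. This call streams
      // indefinitely, so a real consumer reads partitions in parallel.
      ByteStringRange first =
          client.generateInitialChangeStreamPartitions("my-table").iterator().next();
      ReadChangeStreamQuery query =
          ReadChangeStreamQuery.create("my-table").streamPartition(first);
      for (ChangeStreamRecord record : client.readChangeStream(query)) {
        if (record instanceof ChangeStreamMutation) {
          System.out.println(((ChangeStreamMutation) record).getRowKey().toStringUtf8());
        }
      }
    }
  }
}
```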
Partitions
To maintain a high read throughput that matches a high write or change rate, Bigtable divides a change stream into multiple partitions that can be used to read the change stream in parallel. Each change stream partition is associated with a tablet. Tablets are subsections of a table that are redistributed as needed to help balance the table's request workload. To learn more, see Load balancing.
Note: You are not able to configure or monitor change stream partitions or tablets, but you need to understand them to read a change stream using the Java client library.
The Java client library lets you query each partition for changes and provides the information required to manage changes in partitions due to splits and merges.
Watermarks
A watermark is a timestamp that estimates how recently a partition has caught up with replication across all clusters. The watermark for the partition is continuously updated as replication occurs, advancing forward in time.
Each ChangeStreamMutation (data change record) includes an estimatedLowWatermark field, which is the watermark for the partition that is associated with the data change record. This estimatedLowWatermark is an estimate and doesn't guarantee that there isn't data that has yet to arrive on the stream.
Watermarks for replicated tables
A partition's estimatedLowWatermark (low watermark) doesn't advance if replication isn't fully caught up for the partition. The stream-wide low watermark, which is the lowest of all partition-level estimated low watermarks, stops advancing if any partition's watermark is not moving forward. A watermark that has stopped advancing is considered to be stalled. When this occurs, if you are streaming your change stream in a pipeline, the pipeline stalls.
Many factors can cause one or more partition-level watermarks to stall for some amount of time, including the following:
- Overloading a cluster with traffic that causes replication to fall behind for one or more partitions
- Network delays
- Cluster unavailability
The Bigtable Beam connector handles this by setting the output timestamp to zero for all data. For more information, see Grouping data without event times.
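For example, with the records collection produced by the earlier connector sketch, you can group by processing time instead of event time, using a repeated processing-time trigger in a global window; the one-minute delay is an arbitrary placeholder:

```java
import com.google.cloud.bigtable.data.v2.models.ChangeStreamMutation;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class ProcessingTimeWindowing {
  // Re-window connector output by processing time. Because every element
  // carries a zero event timestamp, event-time windows never fire; a
  // repeated processing-time trigger emits panes as data arrives instead.
  static PCollection<KV<ByteString, ChangeStreamMutation>> windowByProcessingTime(
      PCollection<KV<ByteString, ChangeStreamMutation>> records) {
    return records.apply(
        "ProcessingTimeWindow",
        Window.<KV<ByteString, ChangeStreamMutation>>into(new GlobalWindows())
            .triggering(
                Repeatedly.forever(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(1)))) // placeholder delay
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes());
  }
}
```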
Monitoring
To help you understand how enabling a change stream affects CPU and storage utilization for an instance that contains change stream-enabled tables, we provide two change stream-specific metrics. You can view the metrics on the Bigtable system insights page or by using the Cloud Monitoring suite of tools.
- Bytes used by the change stream records (change_stream_log_used_bytes)
- CPU utilization by change streams (uses cpu_load_by_app_profile_by_method_by_table)
For details on these metrics, see Monitoring.
Cost considerations
Enabling a change stream on a table results in increased costs for nodes and storage. In particular, you can expect to incur more storage costs.
Nodes
You usually need to add nodes to a cluster (or increase the maximum number of nodes if you use autoscaling) to handle the additional traffic of enabling and processing the data change records.
Enabling a change stream can increase CPU usage by around 10%, even before you start processing it. Processing a change stream, such as reading it using a Dataflow pipeline, can increase CPU utilization by around 20 to 30%, depending on the level of change activity and how the stream data is read.
Storage
You are charged the standard Bigtable storage rates to store your table's data change records. You are also charged to store the table that is created to track change stream metadata. The retention period that you specify directly affects storage costs.
As a general rule, a day's worth of data change records, reflecting only the mutations that occurred that day, takes up about 1.5 times as much storage as the data written that day consumes on disk. For example, if the writes applied on a given day consume 100 MB on disk, expect the corresponding data change records to use roughly 150 MB.
Network data transfer
If you read a change stream across regions, you can incur costs for that traffic. See the Network section on Bigtable pricing for a complete list of network data transfer rates.
Processing costs
Depending on how you read the data change records, additional costs for services other than Bigtable apply. For example, if you use Dataflow, you pay for the bytes that are processed and the worker machines that process the job. For details, see Dataflow pricing.
Dropping row ranges
If possible, avoid dropping a row range from a table that has a change stream enabled. If you must drop a row range, be aware that it might take a long time for Bigtable to complete the operation, and CPU usage increases during the operation.
What's next
- Complete a quickstart to learn how to enable a change stream and view changes.
- Configure change streams.
- Use the Bigtable Beam connector to read a change stream with Dataflow.
- Use the Cloud Bigtable client library for Java to read change streams.
- Work through a tutorial about processing a change stream.