Stream data from SQL Server databases

This page contains information about:

  • The behavior of how Datastream handles data that's being pulled froma source SQL Server database.
  • The change data capture (CDC) methods that Datastream supports.
  • The versions of SQL Server databases that Datastream supports.
  • Known limitations for using SQL Server database as a source.

Behavior

Datastream tracks data manipulation language (DML) changes using oneof the following CDC methods:

Change tables

The change tables CDC method lets users retain logs for a shorter time, and thussave storage space, but supports lower throughput compared to the transactionlogs method. The method has fewer limitations than transaction logs. For example,it eliminates the risk of log truncation leading to streams failing permanently,and supports replication of encrypted tables. For more information, seeKnown limitations.

When using this CDC method, changes to the source are tracked using dedicatedchange tables. Transaction logs are still used, but to a limited extent, and youdon't need to retain them for longer periods. As DML events are applied to thesource tables, the changes are replicated to the corresponding change tables.The change tables have the same structure as the source tables, but with extracolumns to include the metadata for the changes. Only committed transactions areadded to the change tables, along with the log sequence number (LSN) of thecommit operation.

Note: Datastream only replicates changes from the default capture instance whose name is<schema_name>_<table_name>.

How Datastream handles DDL changes to the source schema

When you use the change tables CDC method, capture instances are created foreach change table. Each capture instance is associated with a list of columns itcaptures and tracks. By default, when a data definition language (DDL) changehappens in the source after the capture instance is created, the instanceignores the change. However, you can configure your SQL Server stream toreplicate columns added to the source schema after the stream and the captureinstance are created.

Before you begin
  • Make sure that your Datastream user has thedb_owner permissionassigned.

Replicate columns added to the source schema

To have Datastream support replicating columns added to the sourceschema after a stream was created, you need to add theenable_ddl_support_for_ctlabel to your stream:

  1. Go to theStreams page in the Google Cloud Console.

    Go to the Streams page

  2. Click the SQL Server stream that you want to edit.

  3. On theStreams details page, clickPause.

  4. ClickEdit >Edit stream settings.

  5. ClickAdd label.

  6. In theKey field, typeenable_ddl_support_for_ct.

  7. In theValue field, typetrue.

  8. ClickSave.

  9. ClickStart to resume your stream.

Datastream checks thecdc.ddl_historytable for new DDLs every five minutes. If there's a new column added to a tableincluded in the stream configuration, Datastream verifies whether thetable has two capture instances:

Transaction logs

When using this CDC method, Datastream reads changes in the source directly from transaction logs. This method requires fewer resources and allowsfor faster data retrieval, but hasmore limitations.

To prevent data loss, it's important that the logs aren't truncated beforeDatastream reads them. On the other hand, if you keep the log files fortoo long, they take up storage space, which might eventually cause the databaseinstance to enter read-only mode.

To make sure that the CDC reader has enough time to read the logs while allowinglog truncation to free up storage space, you need to apply additional configurationsteps, such as changing polling intervals and setting up a truncation safeguard.These steps provide an additional layer of protection to ensure that Datastreamcan read the data even if there's downtime on the Datastream side or aconnectivity issue between the source database and Datastream.

For detailed instructions on how to apply these additional measures, see theConfigure a source SQL Server databasepage and select your database type.

Versions

Datastream supports the following versions and editions of SQL Serverdatabases:

Datastream doesn't support the following versions of SQL Server databases:

Known limitations

Known limitations for using SQL Server database as a source include:

Additional limitations when using the transaction logs method

If you use the transaction logs CDC method, the following additional limitationsapply:

  • Transparent Data Encryption (TDE) isn't supported.
  • Column Level Encryption isn't supported. Data in these columns is replaced withNULL values.
  • When using the transaction logs CDC method, Datastream doesn'tsupport replicating columns added to the source schema after a stream is created.The new columns aren't replicated to the destination.
  • Datastream doesn't support theROLLBACK TO SAVEPOINT statement.Such rollback events are ignored and aren't replicated to the destination.
  • Datastream doesn't support CDC for rows greater than 8 KB in thefollowing types of tables:
    • Tables without a unique index
    • Tables containing only a unique, non-clustered index with one or morevariable length columns (VARCHAR,VARBINARY,NVARCHAR)
  • Datastream doesn't support CDC for large object columns (TEXT,NTEXT,XML,IMAGE) in the following types of tables:

    • Tables without a unique index
    • Tables containing only a unique, non-clustered index with one or morevariable length columns (VARCHAR,VARBINARY,NVARCHAR)

    If large object columns aren't included in the stream, CDC is only supportedfor such tables if they have valid indexes.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.