Stream data from SQL Server databases
This page contains information about:
- How Datastream handles data that's being pulled from a source SQL Server database.
- The change data capture (CDC) methods that Datastream supports.
- The versions of SQL Server databases that Datastream supports.
- Known limitations for using a SQL Server database as a source.
Behavior
Datastream tracks data manipulation language (DML) changes using one of the following CDC methods:
Change tables
The change tables CDC method lets users retain logs for a shorter time, and thus save storage space, but supports lower throughput compared to the transaction logs method. The method has fewer limitations than transaction logs. For example, it eliminates the risk of log truncation leading to streams failing permanently, and supports replication of encrypted tables. For more information, see Known limitations.
When using this CDC method, changes to the source are tracked using dedicated change tables. Transaction logs are still used, but to a limited extent, and you don't need to retain them for longer periods. As DML events are applied to the source tables, the changes are replicated to the corresponding change tables. The change tables have the same structure as the source tables, but with extra columns to include the metadata for the changes. Only committed transactions are added to the change tables, along with the log sequence number (LSN) of the commit operation.
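To make the mechanism concrete, the following is a minimal sketch (not a Datastream requirement) that reads a change table directly on the source. It assumes a hypothetical source table dbo.orders whose default capture instance is dbo_orders, uses the pyodbc library, and uses placeholder connection details:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=sales;"
    "UID=datastream_user;PWD=<password>;TrustServerCertificate=yes"
)
cursor = conn.cursor()

# Read the committed changes recorded for the capture instance, together with
# the log sequence number (LSN) of each change and the operation code
# (1 = delete, 2 = insert, 3/4 = update before/after image).
cursor.execute("""
    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_orders');
    DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
    SELECT __$start_lsn, __$operation, *
    FROM cdc.fn_cdc_get_all_changes_dbo_orders(@from_lsn, @to_lsn, 'all');
""")
for row in cursor.fetchall():
    print(row)
```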
Note: Datastream only replicates changes from the default capture instance, whose name is <schema_name>_<table_name>.

How Datastream handles DDL changes to the source schema
When you use the change tables CDC method, capture instances are created for each change table. Each capture instance is associated with a list of columns it captures and tracks. By default, when a data definition language (DDL) change happens in the source after the capture instance is created, the instance ignores the change. However, you can configure your SQL Server stream to replicate columns added to the source schema after the stream and the capture instance are created.
Before you begin
- Make sure that your Datastream user has the db_owner permission assigned.
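For example, a minimal sketch of assigning the role with pyodbc; the user name datastream_user and the connection details are placeholders:

```python
import pyodbc

# Connect as a user that can manage role membership; all names are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=sales;UID=admin_user;PWD=<password>",
    autocommit=True,
)

# Add the Datastream user to the db_owner database role.
# On SQL Server 2008/2008 R2, use: EXEC sp_addrolemember 'db_owner', 'datastream_user';
conn.cursor().execute("ALTER ROLE db_owner ADD MEMBER datastream_user;")
```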
Replicate columns added to the source schema
To have Datastream support replicating columns added to the source schema after a stream was created, you need to add the enable_ddl_support_for_ct label to your stream:
1. Go to the Streams page in the Google Cloud console.
2. Click the SQL Server stream that you want to edit.
3. On the Streams details page, click Pause.
4. Click Edit > Edit stream settings.
5. Click Add label.
6. In the Key field, type enable_ddl_support_for_ct.
7. In the Value field, type true.
8. Click Save.
9. Click Start to resume your stream.
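If you prefer to apply the label programmatically, the following is a rough sketch using the google-cloud-datastream Python client. The project, location, and stream names are placeholders, and the exact client fields might differ between library versions, so verify them against the library you use rather than treating this as a documented procedure. As in the console steps, pause the stream before editing and resume it afterwards.

```python
from google.cloud import datastream_v1
from google.protobuf import field_mask_pb2

client = datastream_v1.DatastreamClient()
name = "projects/my-project/locations/us-central1/streams/my-sqlserver-stream"

# Fetch the stream, add the label, and update only the labels field.
stream = client.get_stream(name=name)
stream.labels["enable_ddl_support_for_ct"] = "true"

operation = client.update_stream(
    stream=stream,
    update_mask=field_mask_pb2.FieldMask(paths=["labels"]),
)
operation.result()  # update_stream returns a long-running operation
```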
Datastream checks the cdc.ddl_history table for new DDLs every five minutes. If there's a new column added to a table included in the stream configuration, Datastream verifies whether the table has two capture instances:
- If it doesn't, then Datastream creates a new capture instance, reads the original capture instance data until the moment when the DDL happened, and then starts reading from the new capture instance.
  Note: After Datastream creates the new capture instance, delete the original one. SQL Server supports a maximum of two capture instances per table, so deleting the old instance is required to allow for future DDL changes.
- If it does, a log entry is added stating that the DDL change can't be handled because the maximum number of capture instances is reached.
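For illustration, the following sketch runs roughly the same checks manually: it lists the DDL entries that SQL Server CDC has recorded for a hypothetical table dbo.orders and shows how many capture instances the table currently has. The table name and connection details are placeholders, and running this isn't required for Datastream to work:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=sales;UID=datastream_user;PWD=<password>"
)
cursor = conn.cursor()

# DDL statements that SQL Server CDC has recorded for the source table.
cursor.execute("""
    SELECT ddl_command, ddl_time
    FROM cdc.ddl_history
    WHERE source_object_id = OBJECT_ID('dbo.orders')
    ORDER BY ddl_time DESC;
""")
for ddl_command, ddl_time in cursor.fetchall():
    print(ddl_time, ddl_command)

# Capture instances currently defined for the table (SQL Server allows at most two).
cursor.execute("""
    SELECT capture_instance
    FROM cdc.change_tables
    WHERE source_object_id = OBJECT_ID('dbo.orders');
""")
print([row.capture_instance for row in cursor.fetchall()])
```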
Transaction logs
When using this CDC method, Datastream reads changes in the source directly from transaction logs. This method requires fewer resources and allows for faster data retrieval, but has more limitations.
To prevent data loss, it's important that the logs aren't truncated before Datastream reads them. On the other hand, if you keep the log files for too long, they take up storage space, which might eventually cause the database instance to enter read-only mode.
To make sure that the CDC reader has enough time to read the logs while allowing log truncation to free up storage space, you need to apply additional configuration steps, such as changing polling intervals and setting up a truncation safeguard. These steps provide an additional layer of protection to ensure that Datastream can read the data even if there's downtime on the Datastream side or a connectivity issue between the source database and Datastream.
For detailed instructions on how to apply these additional measures, see the Configure a source SQL Server database page and select your database type.
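As a rough illustration only (the authoritative steps are on the page linked above and depend on your database type), the following sketch shows the kind of settings involved: shortening the CDC capture job's polling interval and checking what is currently holding back log truncation. Connection details are placeholders:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=sales;UID=admin_user;PWD=<password>",
    autocommit=True,
)
cursor = conn.cursor()

# Poll the transaction log for changes every 5 seconds (the value is in seconds).
cursor.execute(
    "EXEC sys.sp_cdc_change_job @job_type = N'capture', @pollinginterval = 5;"
)

# Shows why log truncation is currently blocked for the database, for example
# REPLICATION when the log is being held for CDC or replication readers.
cursor.execute(
    "SELECT name, log_reuse_wait_desc FROM sys.databases WHERE name = DB_NAME();"
)
print(cursor.fetchone())
```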
Versions
Datastream supports the following versions and editions of SQL Server databases:
- Self-managed (on-premises or cloud-hosted) using the following versions:
- Enterprise: 2008 and later
- Standard: 2016 SP1 and later
- Developer: 2008 and later
- Amazon RDS for SQL Server
- Azure SQL Database (tier S3 and above)
  Note: Only supported when using the change tables CDC method.
- Cloud SQL for SQL Server
Datastream doesn't support the following versions of SQL Server databases:
- SQL Server Standard edition from version 2008 to 2014
- SQL Server Express
- SQL Server Web
Known limitations
Known limitations for using a SQL Server database as a source include:
- Streams are limited to 10,000 tables.
- A table that has more than 500 million rows can't be backfilled unless the following conditions are met:
- The table has a unique index.
- None of the index columns are nullable.
- All columns of the index are included in the stream.
- Databases with delayed durability or accelerated database recovery (ADR) enabled aren't supported.
- Streaming changes to system tables isn't supported.
- Windows Active Directory (AD) authentication isn't supported.
- The following data types aren't supported, and aren't replicated to the destination:
  - SQL_VARIANT
  - HIERARCHYID
  - GEOMETRY
  - GEOGRAPHY
- Datastream replicates user-defined data types; however, it's the base data type from which you derive your user-defined type that is stored in the destination. For example, if you define a USERNAME data type based on the VARCHAR(50) data type, the data is stored in the destination as VARCHAR(50).
- Datastream doesn't support CDC for large object columns (TEXT, NTEXT, XML, IMAGE) and maximum variable length columns (VARCHAR(MAX), VARBINARY(MAX), NVARCHAR(MAX)) in tables without a unique index. If large object columns aren't included in the stream, CDC is supported.
- Replicating the following source schema changes isn't supported when using the change tables CDC method, and might cause data corruption or failure to process events:
  - Dropping columns: data in these columns is replaced with NULL values.
  - Renaming columns: not supported for SQL Server when CDC is enabled.
  - Modifying data types: the SQL Server CDC capture job propagates only the following data type changes:
    - Changing the maximum length for strings: for example, changing the VARCHAR(50) data type to VARCHAR(100). The BigQuery STRING column doesn't have a fixed length and accepts the longer string.
    - Increasing the integer value range: for example, changing the INT data type to BIGINT. The BigQuery INT64 data type accommodates the new type without requiring a schema change.
    - Changing the precision and scale for numeric values: for example, changing the DECIMAL(10, 2) data type to DECIMAL(18, 4). The BigQuery NUMERIC type has a large fixed precision (38) and scale (9), and such a change is supported.
    For other data type modifications, Datastream tries to insert the data into the destination and raises an error if the data is rejected.
- Datastream doesn't support data masking. Data is replicated without masking.
- Datastream doesn't support replicating changes applied to the database using a data-tier application (DACPAC) package.
- Datastream doesn't replicate changes made using the WRITETEXT or UPDATETEXT statements.
- If you use the enable_ddl_support_for_ct label to replicate columns added to the source schema, then you can't have multiple streams attached to the same SQL Server source table. Because SQL Server supports a maximum of two capture instances per table, schema changes might cause race conditions where Datastream attempts to create capture instances, and this might cause streams to fail.
- Datastream doesn't support replicating computed columns unless the column is marked PERSISTED.
- Datastream doesn't support the PAGE, COLUMNSTORE, or COLUMNSTORE ARCHIVE compression types.
Additional limitations when using the transaction logs method
If you use the transaction logs CDC method, the following additional limitationsapply:
- Transparent Data Encryption (TDE) isn't supported.
- Column Level Encryption isn't supported. Data in these columns is replaced with NULL values.
- When using the transaction logs CDC method, Datastream doesn't support replicating columns added to the source schema after a stream is created. The new columns aren't replicated to the destination.
- Datastream doesn't support the ROLLBACK TO SAVEPOINT statement. Such rollback events are ignored and aren't replicated to the destination.
- Datastream doesn't support CDC for rows greater than 8 KB in the following types of tables:
  - Tables without a unique index
  - Tables containing only a unique, non-clustered index with one or more variable length columns (VARCHAR, VARBINARY, NVARCHAR)
- Datastream doesn't support CDC for large object columns (TEXT, NTEXT, XML, IMAGE) in the following types of tables:
  - Tables without a unique index
  - Tables containing only a unique, non-clustered index with one or more variable length columns (VARCHAR, VARBINARY, NVARCHAR)
  If large object columns aren't included in the stream, CDC is only supported for such tables if they have valid indexes.
What's next
- Learn how to configure a SQL Server source for use with Datastream.