Configure a self-managed SQL Server database for CDC

This page describes how to configure change data capture (CDC) to stream data from a self-managed SQL Server database to a supported destination, such as BigQuery or Cloud Storage.

  1. Enable CDC for your source database. To do this, connect to the database and run the following commands at a SQL prompt or in a terminal:

    USE [DATABASE_NAME]
    GO
    EXEC sys.sp_cdc_enable_db
    GO

    Replace DATABASE_NAME with the name of your source database.
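
    To confirm that the command took effect, one option is to query the sys.databases catalog view, which exposes an is_cdc_enabled flag. This check is a minimal sketch and not part of the original procedure. DATABASE_NAME is the same placeholder as above:

    ```sql
    -- is_cdc_enabled is 1 when CDC is enabled for the database.
    SELECT name, is_cdc_enabled
    FROM sys.databases
    WHERE name = N'DATABASE_NAME';
    ```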

  2. Enable CDC on the tables for which you need to capture changes:

    USE [DATABASE_NAME]
    EXEC sys.sp_cdc_enable_table
      @source_schema = N'SCHEMA_NAME',
      @source_name = N'TABLE_NAME',
      @role_name = NULL
    GO

    Note: You need to run the command for each table for which you want to enable CDC.

    Replace the following:

    • DATABASE_NAME: the name of your source database
    • SCHEMA_NAME: the name of the schema to which the tables belong
    • TABLE_NAME: the name of the table for which you want to enable CDC
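
    Because the command has to be repeated per table, it can help to list which tables are already tracked. The query below is a sketch that relies on the is_tracked_by_cdc column of the sys.tables catalog view. It is an optional check, not a required step:

    ```sql
    USE [DATABASE_NAME]
    GO
    -- is_tracked_by_cdc is 1 for tables that have CDC enabled.
    SELECT s.name AS schema_name, t.name AS table_name
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE t.is_tracked_by_cdc = 1
    ORDER BY s.name, t.name;
    ```
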
  3. Start the SQL Server Agent and make sure it's running at all times. If the SQL Server Agent remains down for an extended period, the logs might get truncated, leading to a permanent loss of the change data that wasn't read by Datastream.

    For information about running the SQL Server Agent, see Start, stop, or restart an instance of the SQL Server Agent.

    Note: We recommend turning on automatic restart for the SQL Server Agent service so that the agent doesn't stay down for long.
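
    As a rough way to check the agent from a SQL prompt, the sys.dm_server_services dynamic management view reports the state of the SQL Server services. This sketch assumes that your login has the VIEW SERVER STATE permission. It shows whether the service is running and its startup type, not the automatic restart setting itself:

    ```sql
    -- status_desc shows the current state of the service (for example, Running).
    SELECT servicename, status_desc, startup_type_desc
    FROM sys.dm_server_services
    WHERE servicename LIKE N'SQL Server Agent%';
    ```
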
  4. Enable snapshot isolation.

    When you backfill data from your SQL Server database, it's important to ensure consistent snapshots. If you don't apply the settings described in this section, changes made to the database during the backfill process might lead to duplicates or incorrect results, especially for tables without primary keys.

    Enabling snapshot isolation creates a temporary view of your database at the start of the backfill process. This ensures that the data being copied remains consistent, even if other users are making changes to the live tables at the same time. Enabling snapshot isolation might have a slight performance impact, but it's essential for reliable data extraction.

    To enable snapshot isolation:

    1. Connect to your database using a SQL Server client.
    2. Run the following command:

      ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON;

    Replace DATABASE_NAME with the name of your database.
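
    To check the result, you can read the snapshot isolation state from sys.databases. This is a sketch of an optional verification. The state can briefly report IN_TRANSITION_TO_ON while existing transactions finish:

    ```sql
    -- snapshot_isolation_state_desc reports ON once the setting is active.
    SELECT name, snapshot_isolation_state_desc
    FROM sys.databases
    WHERE name = N'DATABASE_NAME';
    ```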

  5. Create a Datastream user:

    1. Connect to the source database and enter the following command:

      USE DATABASE_NAME;
    2. Create a login to use while setting up the connection profile in Datastream.

      CREATE LOGIN YOUR_LOGIN WITH PASSWORD = 'PASSWORD';
    3. Create a user:

      CREATE USER USER_NAME FOR LOGIN YOUR_LOGIN;
    4. Assign the db_datareader role to them:

      EXEC sp_addrolemember 'db_datareader', 'USER_NAME';
    5. Grant the VIEW DATABASE STATE permission to them:

      GRANT VIEW DATABASE STATE TO USER_NAME;
    6. Add this user to the master database:

      USE master;
      CREATE USER USER_NAME FOR LOGIN YOUR_LOGIN;
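
    The role membership and permission from the preceding steps can be reviewed with the standard catalog views. The following is a sketch of an optional check. DATABASE_NAME and USER_NAME are the same placeholders used above:

    ```sql
    USE DATABASE_NAME;

    -- Database roles that the user belongs to (db_datareader is expected here).
    SELECT r.name AS role_name, m.name AS member_name
    FROM sys.database_role_members AS rm
    JOIN sys.database_principals AS r ON r.principal_id = rm.role_principal_id
    JOIN sys.database_principals AS m ON m.principal_id = rm.member_principal_id
    WHERE m.name = N'USER_NAME';

    -- Explicit database-level permissions granted to the user
    -- (VIEW DATABASE STATE is expected here).
    SELECT pr.name AS user_name, pe.permission_name, pe.state_desc
    FROM sys.database_permissions AS pe
    JOIN sys.database_principals AS pr ON pr.principal_id = pe.grantee_principal_id
    WHERE pr.name = N'USER_NAME';
    ```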

Additional steps required for the transaction logs CDC method

The steps described in this section are only required when you configure your source SQL Server database for use with the transaction logs CDC method.

  1. Connect to the source database and assign the db_owner and db_denydatawriter roles to your user:

    USE DATABASE_NAME;
    EXEC sp_addrolemember 'db_owner', 'USER_NAME';
    EXEC sp_addrolemember 'db_denydatawriter', 'USER_NAME';
  2. Grant SELECT permissions for the sys.fn_dblog function:

    USE master;
    GRANT SELECT ON sys.fn_dblog TO USER_NAME;
  3. Add your user to the msdb database and assign the following permissions to them:

    USE msdb;
    CREATE USER USER_NAME FOR LOGIN YOUR_LOGIN;
    GRANT SELECT ON dbo.sysjobs TO USER_NAME;
  4. Assign the following permissions to your user in the master database:

    USE master;
    GRANT VIEW SERVER STATE TO YOUR_LOGIN;
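
    VIEW SERVER STATE is a server-level permission, so it shows up in sys.server_permissions rather than in the database-level views. The query below is a sketch of an optional check, with YOUR_LOGIN being the same placeholder as above:

    ```sql
    USE master;

    -- Server-level permissions granted to the login
    -- (VIEW SERVER STATE is expected here).
    SELECT pr.name AS login_name, pe.permission_name, pe.state_desc
    FROM sys.server_permissions AS pe
    JOIN sys.server_principals AS pr ON pr.principal_id = pe.grantee_principal_id
    WHERE pr.name = N'YOUR_LOGIN';
    ```
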
  5. Set the polling interval for which you want the changes to be available on your source.

    USE [DATABASE_NAME]
    EXEC sys.sp_cdc_change_job @job_type = 'capture', @pollinginterval = 86399
    EXEC sp_cdc_stop_job 'capture'
    EXEC sp_cdc_start_job 'capture'

    The @pollinginterval parameter is measured in seconds with a recommended value set to 86399. This means that the transaction log retains changes for 86,399 seconds (just under one day). Executing the sp_cdc_start_job 'capture' procedure applies the settings.
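
    To see the value that the capture job is currently configured with, you can list the CDC jobs for the database. This is a sketch of an optional check. sys.sp_cdc_help_jobs returns one row per CDC job, including a pollinginterval column:

    ```sql
    USE [DATABASE_NAME]
    -- Lists the capture and cleanup jobs defined for this database;
    -- the capture row should show the polling interval set above.
    EXEC sys.sp_cdc_help_jobs
    ```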

  6. If there are any cleanup or capture jobs running on your database, stop them. For more information, see Administer and monitor change data capture.

  7. Set up a log truncation safeguard.

    To make sure that the CDC reader has enough time to read the logs while allowing log truncation to prevent using up the storage space, you can set up a log truncation safeguard:

    1. Connect to the database using a SQL Server client.
    2. Create a stored procedure that runs an active transaction for a period that you specify to prevent log truncation:

      CREATE PROCEDURE dbo.DatastreamLogTruncationSafeguard
        @transaction_logs_retention_time INT
      AS
      BEGIN
        DECLARE @transactionLog TABLE (beginLSN BINARY(10), endLSN BINARY(10))
        INSERT @transactionLog EXEC sp_repltrans

        DECLARE @currentDateTime DATETIME = GETDATE()
        DECLARE @cutoffDateTime DATETIME = DATEADD(MINUTE, -@transaction_logs_retention_time, @currentDateTime)

        DECLARE @firstValidLSN BINARY(10) = NULL
        DECLARE @lastValidLSN BINARY(10) = NULL
        DECLARE @firstTxnTime DATETIME = NULL
        DECLARE @lastTxnTime DATETIME = NULL

        SELECT TOP 1
          @lastTxnTime = t.logStartTime,
          @lastValidLSN = t.beginLSN
        FROM (
          SELECT
            beginLSN AS beginLSN,
            (SELECT TOP 1 [begin time]
             FROM fn_dblog(stuff(stuff(CONVERT(CHAR(24), beginLSN, 1), 19, 0, ':'), 11, 0, ':'), DEFAULT)) AS logStartTime
          FROM @transactionLog
        ) t
        ORDER BY t.beginLSN DESC

        -- If all transactions are before cutoff, clear everything
        IF (@lastTxnTime < @cutoffDateTime)
        BEGIN
          EXEC sp_repldone NULL, NULL, 0, 0, 1
        END
        ELSE
        BEGIN
          -- Find the earliest transaction
          SELECT TOP 1
            @firstTxnTime = t.logStartTime,
            @firstValidLSN = ISNULL(@firstValidLSN, t.beginLSN)
          FROM (
            SELECT
              beginLSN AS beginLSN,
              (SELECT TOP 1 [begin time]
               FROM fn_dblog(stuff(stuff(CONVERT(CHAR(24), beginLSN, 1), 19, 0, ':'), 11, 0, ':'), DEFAULT)) AS logStartTime
            FROM @transactionLog
          ) t
          ORDER BY t.beginLSN ASC

          IF (@firstTxnTime < @cutoffDateTime)
          BEGIN
            -- Identify the earliest and latest LSNs within VLogs before cutoff
            SELECT
              @firstValidLSN = SUBSTRING(MAX(t.lsnMarkers), 1, 10),
              @lastValidLSN = SUBSTRING(MAX(t.lsnMarkers), 11, 10)
            FROM (
              SELECT MIN(beginLSN + endLSN) AS lsnMarkers
              FROM @transactionLog
              GROUP BY SUBSTRING(beginLSN, 1, 4)
            ) t
            WHERE (
              SELECT TOP 1 [begin time]
              FROM fn_dblog(stuff(stuff(CONVERT(CHAR(24), t.lsnMarkers, 1), 19, 0, ':'), 11, 0, ':'), DEFAULT)
              WHERE Operation = 'LOP_BEGIN_XACT'
            ) < @cutoffDateTime

            EXEC sp_repldone @firstValidLSN, @lastValidLSN, 0, 0, 0
          END
        END
      END;
    3. Create another stored procedure. This time, you create a job that runs the stored procedure that you created in the previous step according to a specified cadence:

      CREATE PROCEDURE [dbo].[SetUpDatastreamJob]
        @transaction_logs_retention_time INT
      AS
      BEGIN
        DECLARE @database_name VARCHAR(MAX)
        SET @database_name = (SELECT DB_NAME());

        DECLARE @command_str VARCHAR(MAX);
        SET @command_str = CONCAT('Use ', @database_name, '; exec dbo.DatastreamLogTruncationSafeguard @transaction_logs_retention_time = ' + CAST(@transaction_logs_retention_time AS VARCHAR(10)));

        DECLARE @job_name VARCHAR(MAX);
        SET @job_name = CONCAT(@database_name, '_', 'DatastreamLogTruncationSafeguardJob1')

        DECLARE @current_time INT = CAST(FORMAT(GETDATE(), 'HHmmss') AS INT);

        -- Schedule the procedure to run after every 5 minutes.
        IF NOT EXISTS (SELECT * FROM msdb.dbo.sysjobs WHERE name = @job_name)
        BEGIN
          EXEC msdb.dbo.sp_add_job
            @job_name = @job_name,
            @enabled = 1,
            @description = N'Execute the procedure every 5 minutes.';

          EXEC msdb.dbo.sp_add_jobstep
            @job_name = @job_name,
            @step_name = N'Execute_DatastreamLogTruncationSafeguard',
            @subsystem = N'TSQL',
            @command = @command_str;

          DECLARE @schedule_name_1 VARCHAR(MAX);
          SET @schedule_name_1 = CONCAT(@database_name, '_', 'DatastreamEveryFiveMinutesSchedule')
          EXEC msdb.dbo.sp_add_schedule
            @schedule_name = @schedule_name_1,
            @freq_type = 4, -- daily start
            @freq_subday_type = 4, -- every X minutes daily
            @freq_interval = 1,
            @freq_subday_interval = 5,
            @active_start_time = @current_time;

          EXEC msdb.dbo.sp_attach_schedule
            @job_name = @job_name,
            @schedule_name = @schedule_name_1;

          -- Add a schedule that runs the stored procedure on the SQL Server Agent startup.
          DECLARE @schedule_name_agent_startup VARCHAR(MAX);
          SET @schedule_name_agent_startup = CONCAT(@database_name, '_', 'DatastreamSqlServerAgentStartupSchedule')
          EXEC msdb.dbo.sp_add_schedule
            @schedule_name = @schedule_name_agent_startup,
            @freq_type = 64, -- start on SQL Server Agent startup
            @active_start_time = @current_time;

          EXEC msdb.dbo.sp_attach_schedule
            @job_name = @job_name,
            @schedule_name = @schedule_name_agent_startup;

          EXEC msdb.dbo.sp_add_jobserver
            @job_name = @job_name,
            @server_name = @@servername;
        END
      END;
    4. Execute the stored procedure that creates the Datastream job.

      DECLARE @transaction_logs_retention_time INT = (INT)
      EXEC [dbo].[SetUpDatastreamJob] @transaction_logs_retention_time

      Replace the INT placeholder in parentheses with the number of minutes for which you want to retain the logs. For example:

      • The value of 60 sets the retention time to 1 hour
      • The value of 24 * 60 sets the retention time to 1 day
      • The value of 3 * 24 * 60 sets the retention time to 3 days
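
      As a worked example, the sketch below sets a one-day retention and then looks up the resulting SQL Server Agent job. It assumes that you run it while connected to the source database and that your user can read msdb.dbo.sysjobs. The job name is rebuilt the same way the SetUpDatastreamJob procedure builds it:

      ```sql
      -- Retain the logs for one day (24 * 60 = 1440 minutes).
      DECLARE @transaction_logs_retention_time INT = 24 * 60
      EXEC [dbo].[SetUpDatastreamJob] @transaction_logs_retention_time

      -- Confirm that the job exists and is enabled.
      SELECT name, enabled, date_created
      FROM msdb.dbo.sysjobs
      WHERE name = CONCAT(DB_NAME(), '_', 'DatastreamLogTruncationSafeguardJob1');
      ```

      Because SetUpDatastreamJob only creates the job when no job with that name exists, running it again with a different value doesn't change the retention time of an existing job.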

Last updated 2025-12-15 UTC.