Movatterモバイル変換


[0]ホーム

URL:


DOCS >REPLICATION >REPLICATION METHODS >KEY-BASED INCREMENTAL REPLICATION

Get started with a free trial today

Already have an account? Sign in

Key-based Incremental Replication

Key-based Incremental Replication is a method of replication that replicates new or updated data from a data source. In this guide, we’ll cover:

  1. How it works (with examples),
  2. When it should be used,
  3. Limitations of this Replication Method, and
  4. How to enable it for your integration

How Key-based Incremental Replication works

Key-based Incremental Replication basics

When using Key-based Incremental Replication, Stitch uses a column called aReplication Key - a column in a source table - to identify new and updated data in a table for replication.

When Stitch replicates a table using Key-based Incremental Replication, a few things will happen:

  1. During a replication job, Stitch stores themaximum value of a table’s Replication Key column.
  2. During thenext replication job, Stitch will compare saved value from the previous job to Replication Key column values in the source.
  3. Any rows in the table with a Replication Keygreater than or equal to the stored value are replicated.
  4. Stitch stores the new maximum value from the table’s Replication Key column.
  5. Repeat.

Let’s use a SQL query as an example:

SELECTreplication_key_column,column_you_selected_1,column_you_selected_2,[...]FROMschema.tableWHEREreplication_key_column>='last_saved_maximum_value'

Supported Replication Key data types

While this section touches on Replication Keys, a full walkthrough is outside the scope of this guide. Refer to theReplication Keys documentation to learn about Replication Key requirements and how to select appropriate Replication Key columns.

Replication Key columns must be one of the following data types:

Data typeAvailable forNotes
DATETIMEAll database integrations
INTEGERAll database integrations

IncludesBIGINT andMEDIUMINT

TIMESTAMPAll database integrations
FLOATMongoDB v1+ database integrations
INT64MongoDB v1+ database integrations
NUMBEROracle v1+ database integrations
OBJECTIDMongoDB v1+ database integrations
UUIDMongoDB v1+ database integrations

Replication Key type impact on data extraction

Below are examples of how different Replication Key types impact the data extracted using Key-based Incremental Replication.

Note: These examples only demonstrate how data isextracted from a data source, not how it will be loaded into your destination.

In this example, we’ll demonstrate how Key-based Incremental Replication works with aTIMESTAMP (orDATE-TIME) Replication Key column for initial and ongoing replication jobs.

Let’s assume the Replication Key column for the table in this example is a column namedupdated_at.

Initial replication job

During theinitial job:

  • In the source: A table in the source contains three records:1,2,3. Every time a record is created or modified, itsupdated_at value is updated.

  • During extraction: Depending on the integration type, the initial extraction for a table using Key-based Incremental Replication will vary slightly:

    • For adatabase integration:
      1. Stitch replicates the entire table in full.
      2. Stitch savesupdated_at: 2018-07-27 12:07:00 as the maximum Replication Key value.
    • For aSaaS integration:
      1. Stitch uses theStart Date defined in the Integration Settings as the initial Replication Key (updated_at) value.
      2. Records with anupdated_at value greater than or equal to theStart Date are replicated.
      3. Stitch savesupdated_at: 2018-07-27 12:07:00 as the maximum Replication Key value.
Source records
idupdated_atactive
1 2018-07-27 12:00:00 false
2 2018-07-27 12:05:00 false
3 2018-07-27 12:07:00 false
Extracted records
idupdated_atactive
1 2018-07-27 12:00:00 false
2 2018-07-27 12:05:00 false
3 2018-07-27 12:07:00 false

Second replication job

During thesecond job:

  • In the source: Before the next replication job, record2 is updated and a new record,4, is added to the table in the source. Theupdated_at values for these records are updated.

  • During extraction: Stitch replicates records3,2, and4, as the records’updated_at values are eithergreater than or equal to the last saved maximum Replication Key value ofupdated_at: 2018-07-27 12:07:00 from the previous replication job.

    Record3 is selected because itsupdated_at value is equal to the maximum Replication Key value from the previous replication job.

    If this were a SQL query, it would look like this:

    SELECT*FROMsample_tableWHEREupdated_at>='2018-07-27 12:07:00'
Source records
idupdated_atactive
1 2018-07-27 12:00:00 false
2 2018-07-28 09:12:00 true
3 2018-07-27 12:07:00 false
4 2018-07-28 06:53:00 false
Extracted records
idupdated_atactive
2 2018-07-28 09:12:00 true
3 2018-07-27 12:07:00 false
4 2018-07-28 06:53:00 false

In this example, we’ll demonstrate how Key-based Incremental Replication works with an auto-incrementingINTEGER Replication Key column.

For this table, the Replication Key is a column namedcustomer_id. The value of this column increments every time a new record is added to the table, but it is never updated after the record is created.

Initial replication job

During theinitial job:

  • In the source: A table in the source contains three records:1,2,3.

    Thecustomer_id value is auto-incremented when a new record is added to the table. This column is not updated, even if changes are made to the record.

  • During extraction:

    1. Stitch replicates the entire table in full.
    2. Stitch savescustomer_id: 3 as the maximum Replication Key value.
Source records
customer_idnameactive
1 Finn false
2 Jake false
3 Bubblegum false
Extracted records
customer_idnameactive
1 Finn false
2 Jake false
3 Bubblegum false

Second replication job

During thesecond job:

  • In the source: Before the next replication job, record2 is updated and a new record,4, is added to the table in the source.

  • During extraction:

    1. Stitch replicates records3 and4, as the records’customer_id values are eithergreater than or equal to the last saved maximum Replication Key value ofcustomer_id: 3 from the previous replication job. Note that:
      • Record3 is selected because itscustomer_id value is equal to the maximum Replication Key value from the previous replication job.
      • Record2 is not selected because itscustomer_id value wasn’t updated when the record was modified, and was therefore less than the last saved maximum Replication Key value.
    2. Stitch savescustomer_id: 4 as the new maximum Replication Key value.

    If this were a SQL query, it would look like this:

    SELECT*FROMsample_tableWHEREcustomer_id>=3
Source records
customer_idnameactive
1 Finn false
2 Jake true
3 Bubblegum false
4 Beamo false
Extracted records
customer_idnameactive
3 Bubblegum false
4 Beamo false

When Key-based Incremental Replication should be used

Aside fromLog-based Replication where it’s supported, Key-based Incremental Replication is the most efficient method for replicating your data. If Log-based Replication is unavailable for your source, Key-based Incremental Replication may be a good fit if:

  1. A table contains a modification timestamp column, which is updated when the record changes
  2. Records aren’t hard deleted from the source table. Refer to theLimitations section below for more info.

Note: In the case of SaaS integrations, Stitch will use Key-based Incremental Replication whenever possible. Refer to theSchema section of anyintegration’s documentation for the Replication Method and Replication Key(s) used by specific tables.


Limitations of Key-based Incremental Replication

Before you select Key-based Incremental Replication as the Replication Method for a table, you should be aware of the limitations this method can have. Being aware of these limitations can help prevent data discrepancies and ensure your data is replicated in the most efficient manner possible.

The limitations of Key-based Incremental Replication are:

Limitation 1: Works best with a modification timestamp column

While aninteger column can be used as a Replication Key, Key-based Incremental Replication functions best with a modification timestamp Replication Key. Unlike an auto-incrementing integer, a modification timestamp allows Stitch to identify both new and updated records for replication.

Limitation 2: Hard deletes aren't captured

Hard deletes aren’t able to be replicated with Key-based Incremental Replication. This is due to the usage of Replication Keys to identify data for replication.

When a record is hard deleted, or entirely removed from a source, its Replication Key value is also removed. Without a Replication Key value to check, Stitch can’t identify the change and update the record in the destination. This means that the record will remain in the destination.

Refer to theDeleted Record Handling guide for a more detailed explanation and examples.

Limitation 3: Duplication in replication

Due to the inclusive nature of Replication Keys, there will be some duplication during the extraction process. This is because Stitch checks for values that are greater thanor equal to the last saved maximum Replication Key value.

Because of this approach, the record or records with Replication Key values equal to the maximum value will be selected for extraction during subsequent jobs. Most of the time, the number of re-replicated rows will be small. If, however, a bulk update occurs and a large number of records all have the same Replication Key value, you could see a high amount of rows being replicated during every replication job until a greater Replication Key value is detected.

Example

In this example, we’ll use acustomers table with a Replication Key column namedupdated_at.

  1. In a database, you run a process that updates 100 records in thecustomers table.
  2. These records’updated_at values are updated to2018-11-01 00:00:00.
  3. During the next replication job - or Job 1 - Stitch extracts 101 records from the sourcecustomers table:
    • The 100 updated records withupdated_at values of2018-11-01 00:00:00, and
    • The one record with anupdated_at value equal to the last saved maximum value from the previous replication job.
  4. Stitch saves the new maximum Replication Key value asupdated_at: 2018-11-01 00:00:00.
  5. No records are updated between Job 1 and the next job.
  6. When Job 2 begins, Stitch again extracts the 100 records withupdated_at: 2018-11-01 00:00:00 because their Replication Key values areequal to the last saved maximum value.

Until a record with a greaterupdated_at value is added to thecustomers table, Stitch will continue to extract all records withupdated_at: 2018-11-01 00:00:00 values.

Bulk update handling

To avoid the above scenario, add a single record with a greater Replication Key value at the end of a bulk update. This will ensure that the maximum Replication Key value Stitch saves will only be equal to one record instead of many.

For example: If one of the 100 updated records in thecustomers table had had anupdated_at value of2018-11-01 00:00:01, Stitch would have saved this as the maximum Replication Key value. Then, during Job 2, only one record - instead of 100 - would have been re-replicated.


Enable Key-based Incremental Replication

Key-based Incremental Replication is available for use with the majority of Stitch integrations. Depending on the type of integration, enabling this Replication Method will vary:

  • Database integrations: To use Key-based Incremental Replication, a table must contain a column suitable for use as aReplication Key.Note: For MongoDB integrations, there are additional considerations for Replication Keys. Refer to theMongoDB Replication Keys guide for more info.

  • SaaS integrations: With the exception of Salesforce, no configuration is required on your part. Replication Methods are pre-defined for every table set to replicate. Stitch will use Key-based Incremental Replication whenever possible to ensure your data is replicated accurately and efficiently.

  • Webhook integrations: No configuration is required on your part. As webhook data is sent to Stitch in real-time, only new records are ever replicated from a webhook source. This can be thought of as using Key-based Incremental Replication with a Replication Key ofcreated_at.


RelatedTroubleshooting

Questions? Feedback?

Did this article help? If you have questions or feedback, feel free tosubmit a pull request with your suggestions,open an issue on GitHub, orreach out to us.


[8]ページ先頭

©2009-2026 Movatter.jp