Movatterモバイル変換


[0]ホーム

URL:


✨ As of November 2023, Arcion has become a part of Databricks.Learn more here
Applier Configuration

Applier Configuration#

The Applier configuration file contains all the parameters that Replicant uses while loading data on to the Destination database. While it is not necessary to modify most of the parameters, they can be adjusted to optimize the Applier job as necessary. The sample Applier configuration file is located in the Replicant release download folder. The path to the sample Applier configuration file in the release folder is:conf/dst/target_database_name.yaml The sample file is broken up into two sections- snapshot and realtime. The Applier configurations for the initial data capture must be specified undersnapshot and the configurations for realtime replication must be specified underrealtime

Snapshot Mode#

Replicant can run in snapshot mode using the default Applier configurations. Changing the snapshot configurations is therefore not necessary. Depending on the replication job, however, adjusting or specifying the parameters described below may aid in optimizing Replicant.

threads#

Maximum number of threads Replicant should use for writing to target.

batch-size-rows#

This configuration determines the size of a batch. We recommend that you both refer to the value provided in the sample extractor configuration file of the Arcion Replicant release folder and also experiment a bit on your deployment to find the appropriate value here. For example, a typical size of 10000 offers good performance for MongoDB, but you should tune it depending on your use case and target system.

txn-size-rows#

Determines the unit of the Applier-side job size. A transaction consists of multiple batches.

bulk-load#

Arcion Replicant can leverage the underlying support of a FILE/PIPE based bulk loading into the Target system. You can configure bulk loading using the following parameters:

Parameter nameAllowed valuesDescription
enabletrue orfalseWhether to enable or disable bulk loading.
typeFILE orPIPEThe type of bulk loading.
serializetrue orfalseEnables generated files to be applied in a serial/parallel fashion.
native-load-configs[v20.09.14.3]LOAD config stringLOAD config string from the user. These will be appended to the target specificLOAD SQL command.
enable-archive[v21.09.17.1]true orfalseWhether to archive data files of last iterations on disk. You can use this config when bulk loading is enabled andtype is set toFILE.
archive-dir[v21.09.17.1]Directory path, for example"/home/file-archive"Directory path where Replicant will archive data files. Without specifying, Replicant will use the default project data path as the archive directory ifenable-archive is set totrue.
compression-type[v21.09.17.1]"GZIP"The file compression type to compress the CSV files to be loaded on to Target.

handle-failed-opers#

true orfalse.

Whether to handle failure to load rows, tables, and/or operations on the Target.

Iftrue, Replicant will write to a version table or a CSV file when it fails to load rows, tables, and/or operations on the Target.

use-quoted-identifiers#

true orfalse.

Iffalse, Replicant will avoid quoting catalog, schema, table, view, and column names while creating and accessing them.

skip-tables-on-failures#

true orfalse.

Enabling this parameter will force Replicant to skip a table/collection that Replicant is unable to load on to the Target even after multiple attempts. Instead, Replicant will continue replicating other tables.

deferred-delete#

true orfalse.

When enabled, Replicant will create an additional column on each Target table to mark the deletes instead of actually deleting records from the tables.

Note: This parameter is relevant only for incremental replication. It’s irrelevant for full replication based on snapshot or CDC.

schema-dictionary#

Cntrols dumping of schemas. It can have any one of the following values:

Note: This is supported for selective targets only.

per-table-config#

Allows you to specify various properties for specific tables on the Target database. The can configure the following properties under this parameter:

You can also configure bulk loading for specific tables underper-table-config. To know about the available bulk loading parameters, seebulk-load.

user-role#

For creating user roles. It has the following parameters:

init-indexes#

true orfalse.

Enabling this allows Replicant to create indexes on Target database. Default value istrue.

init-indexes-post-snapshot#

true orfalse.

Enabling this allows Replicant to create indexes after the snapshot is complete. If disabled, Replicant creates indexes prior to the snapshot.

Default value istrue.

init-constraint-post-snapshot#

Enabling this allows Replicant to create constraints and indexes after the snapshot is complete.

denormalize[v21.04.06.1]#

Controls enabling denormalization in the database. It has only one parameter:

use-upsert-based-recovery[v22.11]#

true orfalse.

Whether to enable upsert-based recovery for snapshot mode.

This parameter controls Replicant’s behavior for recovery. By default, upon resuming from a crash, Replicant ignores the completed jobs and deletes any uncommitted or partially executed jobs. Then it goes on to execute those remaining jobs.

Delete operation is expensive and slows down operations in most databases. We can get past this by skipping the delete operation entirely, without violating primary or unique key constraints. Replicant achieves this by performing upsert operations to complete the uncommitted jobs.

Ifuse-upsert-based-recovery is set totrue, Replicant restarts the partially executed jobs on the Extractor side, while the Applier performs upsert operations on rows in the target tables.

Note: Arcion Replicant supports upsert-based recovery for the following target databases:

Upsert-based recovery doesn’t work in the following situations:

  • The target database doesn’t support upsert operation.
  • A table doesn’t have a candidate key (primary or unique key).

In these situations, Replicant falls back to the default behavior for recovery.

Realtime Mode#

Replicant can run in realtime mode using the default configurations. But changing certain parameters may improve real time replication performance depending on the use case. The following configuration parameters are available for realtime mode:

threads#

Maximum number of threads Replicant should use for writing to Target.

batch-size-rows#

In realtime operations, insert/update/deletes are batched together and applied to Target. This parameter determines the size of the batch. For example, typical size of10000 offers good performance for MongoDB, but you should tune it depending on your use case and Target system.

txn-size-rows#

Determines the unit of Applier side job size. A transaction consists of multiple batches.

handle-failed-opers#

true orfalse.

Whether to handle failure to load rows, tables, and/or operations on the Target.

Iftrue, Replicant will write to a version table or a CSV file when it fails to load rows, tables, and/or operations on the Target.

use-quoted-identifiers#

true orfalse.

Iffalse, Replicant will avoid quoting catalog, schema, table, view, and column names while creating and accessing them.

Important: Thesame parameter is available in snapshot mode. If you set this parameter in realtime mode as well, make sure they both have the same value (bothtrue or bothfalse).

before-image-format#

This indicates which columns should be included in the WHERE part of the UPDATE and DELETE operations. Allowed values are:

after-image-format#

This indicates which columns should be included in the SET part of the UPDATE operation. Allowed values are:

retry-transactions[v20.06.01.2]#

true orfalse.

Note: The default value of this parameter istrue for target databases supporting ACID transactions (except MongoDB). Otherwise, the default isfalse.

With this set totrue, Replicant tries each real-time transaction again in the event of failures. The number of retries and wait duration between each retry depends on the connection configuration of the Target system.

retry-failed-txn-idempotently[v20.09.14.8]#

true orfalse.

With this set andretry-transactions set totrue, Relicant retries each real-time transaction idempotently on failure.

replay-consistency[v20.09.14.1]#

Indicates the type of transactional consistency. The following values are possible:

GLOBAL
Real-time replication is performed with global transactional consistency.
EVENTUAL
Default value.

Real-time replication is performed with eventual consistency.

skip-upto-cursors[v20.09.14.1]#

Note: This parameter applies onlyifreplay-consitency is set toGLOBAL.

Here, you can specify a list of cursor positions up to which replication must be skipped. It may not always be possible to persist the operations of a failed transaction in thefailed_txn table of metadata storage (see--skip-failed-txn in Various Replication Options Explanation for more details). In that case, Replicant logs the operations in a text file. The text file is located in the following path:

<Replicant_home>/data/<Replicant ID>/bad_rows replicate_io_failed_txn_log_<tableID/extractorID>.

The user gets notified of this file location through notification mails and trace logs. The file contains the failed operations as well as theskip-upto-cursor(s) configuration. After resolving the failure, you can use this parameter to skip over the failed transactions.

replay-shard-key-update-as-delete-insert[v20.12.04.7]#

Allows replay of update operation that changes values of shard key columns as delete + inserts. The following values are possible for this parameter:

This isON by default for SingleStore as a Target.

per-table-config#

This configuration allows you to specify various properties for specific target tables.

skip-tables-on-failures#

true orfalse.

Enabling this parameter will force Replicant to skip a table/collection that Replicant is unable to load on to the Target even after multiple attempts. Instead, Replicant will continue replicating other tables.

replay-replace-as-upsert[v22.11]#

true orfalse.

Whether to enable upsert-based recovery whenoperating in full mode.

It controls how Replicant replays replace operations to address realtime changes infull mode replication.

For example, some insert operations can occur during the snapshot phase. Replicant’s default behavior in this case is replaying replace operation by performing delete and then insert. This slows down replication in many cases. If you setreplay-replace-as-upsert totrue, Replicant replays replace operation using upsert.

Note: Arcion Replicant supports upsert-based recovery for the following target databases:

Upsert-based recovery doesn’t work in the following situations:

  • The target database doesn’t support upsert operation.
  • A table doesn’t have a candidate key (primary or unique key).

In these situations, Replicant falls back to the default behavior for recovery.

cdc-metadata-type#

This parameter allows you to add extra information columns to the CDC row.

Support forcdc-metadata-type possesses the following limitations:

Arcion supports this parameter for the following targets:

You can setcdc-metadata-type to one of the following values:

NONE

Default value ofcdc-metadata-type. CDC works in the normal manner.

TYPE2_CDC

With this value, all operations become append only and Arcion uses Type-2 CDC. For more information, seeType-2 CDC.

ADD_METADATA_CDC

With this value, insert, update, and delete operations work in the normal manner but each row contains two additional fields:
  • ARCION_SOURCE_EXTRACTION_TIMESTAMP. Time when Replicant detects the DML from logs.
  • ARCION_SOURCE_OPERATION_TYPE. Type of the operation (INSERT, UPDATE, or DELETE). Delete operation deletes the row.

Each update operation updates the preceding fields and insert operation inserts a new row with the preceding fields.


[8]ページ先頭

©2009-2026 Movatter.jp