Movatterモバイル変換


[0]ホーム

URL:


✨ As of November 2023, Arcion has become a part of Databricks.Learn more here
Setup guide

Destination Snowflake setup guide#

The following steps referthe extracted Arcion self-hosted CLI download as the$REPLICANT_HOME directory.

Required permissions#

I. Set up connection configuration#

Specify our Snowflake connection details to Replicant with a connection configuration file. You can find a sample connection configuration filesnowflake.yaml in the$REPLICANT_HOME/conf/conn directory.

In the connection configuration file, you need to configure the following:

Configure target Snowflake server connection#

Note: All communications with Snowflake happens through port 443, the standard port for HTTPS. So all data is encrypted and secure with SSL by default.
To connect to target Snowflake server, you can choose between two methods for an authenticated connection:

Connect using username and password#

To connect to Snowflake with basic username and password authentication, you have two options:

Specify your credentials in plain text YAML format in the connection configuration file like the following sample:

type:SNOWFLAKEhost:SNOWFLAKE_HOSTNAMEport:PORT_NUMBERwarehouse:"WAREHOUSE_NAME"username:"USERNAME"password:"PASSWORD"max-connections:20#Specify the maximum number of connections replicant can open in Snowflakemax-retries:10retry-wait-duration-ms:1000

Replace the following:

  • SNOWFLAKE_HOSTNAME: The Snowflake hostname. To find your Snowflake hostname, follow these steps:
    1. Go to theSnowflake web interface and sign in into your account.
    2. Click theAccount selector toolbar inthe bottom of the left navigation menu.
    3. Hover the mouse over your account and click Copy account URL in the items list.
  • PORT_NUMBER: The port number of Snowflake host.
  • WAREHOUSE_NAME: The name of theSnowflake warehouse.
  • USERNAME: The username to connect to the Snowflake server.
  • PASSWORD: The password associated withUSERNAME. Omit this parameter if you useRSA key pair authentication.

Additional parameters#

credential-store#

Replicant supports consumingusername andpassword configurations from acredentials store rather than having users specify them in plain text configuration file. You can use KeyStores to store your credentials for Snowflake server connections.credential-store supports the following parameters to configure credentials store:

type
Type of the KeyStore.

Arcion supports the following types:

  • PKCS12
  • JKS
  • JCEKS.
path
Location of the KeyStore.
key-prefix
You must create entries in the credential store for your configs using a prefix and specify the prefix here. For example, if you create KeyStore entries with aliasessnowflake1_username andsnowflake1_password, then specify the prefix here assnowflake1_.
password

Optional field.

If you don’t specify the KeyStore password here, then you must use the UUID from your license file as the KeyStore password. Remember to keep your license file somewhere safe in order to keep this password secure.

You can store your connection credentials in a secrets management service and tell Replicant to retrieve the credentials. For more information, seeSecrets management.

Configure stage#

Stage configuration allows you to tune native or external staging area for bulk loading. For more information, seeStage configuration.

Use RSA key pair for authentication#

You can also choose to useSnowflake’s key pair authentication support for enhanced authentication security instead of using basic authentication via username and password.

To set up key pair authentication using RSA keys, follow these steps:

Generate the private key#

From your command line, execute the following command to generate an encrypted private key:

openssl genrsa2048 | openssl pkcs8 -topk8 -inform PEM -v1 PBE-SHA1-RC4-128 -out rsa_key.p8

The command generates a private key in PEM format:

-----BEGIN ENCRYPTED PRIVATE KEY-----MIIFHDBOBgkqhkiG9w0BBQ0wQTApBgkqhkiG9w0BBQwwHAQIK0h8dqI1n5sCAggAMAwGCCqGSIb3DQIJBQAwFAYIKoZIhvcNAwcECNDwqMf6Xx1pBIIEyNmf044S+pEQ...-----END ENCRYPTED PRIVATE KEY-----

Important: The preceding command to generate an encrypted key prompts for a passphrase to grant access to the key. We recommend using a passphrase that complies with PCI DSS standards to protect the generated private key. We also recommend storing the passphrase in a secure location. When using an encrypted key to connect to Snowflake, you need to input the passphrase during the initial connection. The passphrase only protects the private key and never reaches Snowflake servers.

To generate a long and complex passphrase based on PCI DSS standards, follow these steps:

  • Go to thePCI Security Standards Document Library.
  • ForPCI DSS, select the most recent version and your desired language.
  • Complete the form to access the document.
  • Search forPasswords/passphrases must meet the following: and follow the recommendations for password/passphrase requirements, testing, and guidance.

Generate a public key#

From the command line, generate the public key by referencing the private key. The following command references the private key from a filersa_key.p8 that you create in theprevious step:

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

The command generates a public key in PEM format:

-----BEGIN PUBLIC KEY-----MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAslwTa+Lj5SMI58GiDzWq0rwj4FBymfKzHT16RXecnMcx7uI6KsVpqfh9HH0FMb/3C6YEMeGPkaRmKvXYjM5s...-----END PUBLIC KEY-----

Store the private and public keys securely#

After following the preceding steps, the private and public key files are saved in a local directory of your system. Note down the path to those files. The private key is stored using the PKCS#8 (Public Key Cryptography Standards) format and is encrypted using the passphrase you specified in thefirst step.

However, maintain caution in protecting the file from unauthorized access using the file permission mechanism your operating system provides. You must take responsibility to secure the file when not in use.

Assign the public key to a Snowflake user#

Execute the following command to assign the public key to a Snowflake user.

alteruser jsmithset rsa_public_key='MIIBIjANBgkqh...';
  • Only security administrators, for example, users with the SECURITYADMIN role or higher, can alter a user.
  • Exclude the public key delimiters in the SQL statement.

Verify the user’s public key fingerprint#

Execute the following command to verify the user’s public key:

DESCUSER jsmith;

The command output resembles the following:

+---------------------+-----------------------------------------------------+---------+----------------------------------------------+| property            | value                                               | default | description                                  |+---------------------+-----------------------------------------------------+---------+----------------------------------------------+| NAME                | JSMITH                                              | null    | Name                                         |......| RSA_PUBLIC_KEY      | MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAslwT... | null    | RSA public key of the user                   || RSA_PUBLIC_KEY_FP   | SHA256:nvnONUsfiuycCLMXIEWG4eTp4FjhVUZQUQbNpbSHXiA= | null    | Fingerprint of user's RSA public key.        || RSA_PUBLIC_KEY_2    | null                                                | null    | Second RSA public key of the user            || RSA_PUBLIC_KEY_2_FP | null                                                | null    | Fingerprint of user's second RSA public key. |...+---------------------+-----------------------------------------------------+---------+----------------------------------------------+

Edit the connection configuration file#

You need to modifyReplicant’s connection configuration file for Snowflake and include RSA key information there. Specifically, add the following parameters to the connection configuration file:

private-key-path:"/PATH_TO_GENERATED_KEY/rsa_key.p8"private-key-passphrase:"PRIVATE_KEY_PASSPHRASE"

Replace the following:

Note: If you specify theprivate-key-path andprivate-key-passphrase parameters, you don’t need to specify thepassword parameter in theconnection configuration file.

II. Set up Applier configuration#

To configure replication mode according to your requirements, specify your configuration in the Applier configuration file. You can find a sample Applier configuration filesnowflake.yaml in the$REPLICANT_HOME/conf/dst directory.

You can configure Snowflake for operating in eithersnapshot orrealtime modes.

Configuresnapshot mode#

For operating in snapshot mode, specify your configuration under thesnapshot section of the conifiguration file. For example:

snapshot:threads:16#Specify the maximum number of threads Replicant should use for writing to the targetbatch-size-rows:100_000txn-size-rows:1_000_000#If bulk-load is used, Replicant will use the native bulk-loading capabilities of the target databasebulk-load:enable:true|false#Set to true if you want to enable bulk loadingtype:FILE|PIPE#Specify the type of bulk loading between FILE and PIPEserialize:true|false#Set to true if you want the generated files to be applied in serial/parallel fashion#For versions 20.09.14.3 and beyondnative-load-configs:#Specify the user-provided LOAD configuration string which will be appended to the s3 specific LOAD SQL command

For more information about the Applier parameters forsnapshot mode, seeSnapshot mode.

Enable clustering#

To improve performance, primary keys and unique keys need to be clustering keys on Snowflake side.Replicant achieves that by making primary and unique keys as clustering keys when Replicant creates the tables.

To enable clustered table creation, set theforce-use-clustered-key parameter totrue in your Applier configuration file.for-use-clustered-key defaults tofalse and Snowflake tables don’t have clustering keys designated to them.

Important: You must run Replicant with the--replace option for clustering to work.

For more information on Snowflake clustering, seeClustering Keys & Clustered Tables.

Configurerealtime mode#

For operating in realtime mode, specify your configuration under therealtime section of the conifiguration file. For example:

realtime:threads:8#Specify the maximum number of threads Replicant should use for writing to the targetmax-retries-per-op:30#Specify the maximum amount of retries for a failed operationretry-wait-duration-ms:5000#Specify the time in milliseconds Replicant should wait before re-trying a failed operationcdc-stage-type:FILE#Enter your cdc-stage-type

When operating in realtime mode, pay attention to the following details:

For more information about the configuration parameters forrealtime mode, seeRealtime mode.

Enable Type-2 CDC#

From version 22.07.19.3 onwards, Arcion supports Type-2 CDC for Snowflake as the target. For more information, seeType-2 CDC andcdc-metadata-type.


[8]ページ先頭

©2009-2026 Movatter.jp