Snowflake Table of Contents

Source Snowflake #

The following steps referthe extracted Arcion self-hosted CLI download as the$REPLICANT_HOME directory.

Prerequisites for CDC-based replication#

Make sure that you possess the following object privileges for CDC-based replication:

Object	Privilege
`DATABASE`	`USAGE`
`SCHEMA`	`USAGE`,`CREATE`
`TABLE`	`SELECT`,`CREATE STREAM`,`CREATE TABLE`

Limitations#

Real-time replication#

Streams may become stale over time. For more information, seeData Retention Period and Staleness.
Snowflake can extract data on a per-table basis. Therefore, you don’t need to create heartbeat table manually.

Warning: If a stream goes stale, Replicant drops and recreates the stream. This might cause data loss. So we highly recommend that you take necessary measures so that streams don’t become stale.

Native export#

Snowflake native export should only be used when the Applier supports file-based bulk loading.
We recommend that you use S3 as the stage only when the Applier utilizes S3 as the stage for bulk loading. Otherwise, replication performs similarly to theNATIVE stage type.
When using the CSV file format, make sure that the samenative-extract-options exist in both the Extractor and Applier configurations.
Parquet files might produce an error withTIMESTAMP_TZ orTIMESTAMP_LTZ data.
For all general limitations and notes, seeUsage notes for theCOPY INTO command.

I. Set up connection configuration#

From$REPLICANT_HOME, navigate to the sample connection configuration file:
```
vi conf/conn/snowflake.yaml
```
The configuration file has two parts:
- Parameters related to source Snowflake server connection.
- Parameters related to stage configuration.
Parameters related to Source Snowflake server connection#
Note: All communications with Snowflake happens through port 443, the standard port for HTTPS. So all data is encrypted and secure with SSL by default.
For connecting to Source Snowflake server, you can choose between two methods for an authenticated connection:
- RSA key pair authentication
- Basic username and password authentication
Basic username and password authentication#
To connect to Snowflake using basic username and password authentication, you have three options:
Secrets management service
You can store your connection credentials in a secrets management service and tell Replicant to retrieve the credentials. For more information, seeSecrets management.
Credentials in plain text
You can also specify your credentials in plain form in the connection configuration file like the sample below:
```
type:SNOWFLAKEhost:SNOWFLAKE_HOSTNAMEport:PORT_NUMBERwarehouse:"WAREHOUSE_NAME"username:"USERNAME"password:"PASSWORD"max-connections:20#Specify the maximum number of connections replicant can open in Snowflakemax-retries:10retry-wait-duration-ms:1000
```
Replace the following:
- SNOWFLAKE_HOSTNAME: The Snowflake hostname. To find your Snowflake hostname, follow these steps:
  1. Go to theSnowflake web interface and sign in into your account.
  2. Click theAccount selector toolbar inthe bottom of the left navigation menu.
  3. Hover the mouse over your account and click Copy account URL in the items list.
- PORT_NUMBER: The port number of Snowflake host.
- WAREHOUSE_NAME: The name of theSnowflake warehouse.
- USERNAME: The username to connect to the Snowflake server.
- PASSWORD: The password associated withUSERNAME.
Credentials store
Replicant supports consumingusername andpassword configurations from acredentials store rather than having users specify them in plain text configuration file. You can use KeyStores to store your credentials related to your Snowflake server connections. Use thecredential-store parameter for the credentials store details.
The following parameters are available undercredential-store:
- type: Type of the keystore. Allowed types arePKCS12,JKS, andJCEKS.
- path : Location of the key-store.
- key-prefix: You should create entries in the credential store for your configs using a prefix and specify the prefix here. For example, you can create keystore entries with aliasessnowflake1_username andsnowflake1_password. You can then specify the prefix here assnowflake1_.
- password: This field is optional. If you don’t specify the keystore password here, then you must use the UUID from your license file as the keystore password. Remember to keep your license file somewhere safe in order to keep the password secure.
Use RSA key pair for authentication#
You can also choose to useSnowflake’s key pair authentication support for enhanced authentication security instead of using basic authentication via username and password.
To set up key pair authentication using RSA keys, follow the steps below:
Generate the private key
From your command line, execute the following command to generate an encrypted private key:
```
openssl genrsa2048 | openssl pkcs8 -topk8 -inform PEM -v1 PBE-SHA1-RC4-128 -out rsa_key.p8
```
The command generates a private key in PEM format:
```
-----BEGIN ENCRYPTED PRIVATE KEY-----MIIFHDBOBgkqhkiG9w0BBQ0wQTApBgkqhkiG9w0BBQwwHAQIK0h8dqI1n5sCAggAMAwGCCqGSIb3DQIJBQAwFAYIKoZIhvcNAwcECNDwqMf6Xx1pBIIEyNmf044S+pEQ...-----END ENCRYPTED PRIVATE KEY-----
```
Important: The command above to generate an encrypted key prompts for a passphrase to grant access to the key. We recommend using a passphrase that complies with PCI DSS standards to protect the generated private key. Additionally, we recommend storing the passphrase in a secure location. When using an encrypted key to connect to Snowflake, you will need to input the passphrase during the initial connection. The use of the passphrase is only for protecting the private key; it’s never to sent to Snowflake servers.
To generate a long and complex passphrase based on PCI DSS standards, follow the steps below:
- Go to thePCI Security Standards Document Library.
- ForPCI DSS, select the most recent version and your desired language.
- Complete the form to access the document.
- Search forPasswords/passphrases must meet the following: and follow the recommendations for password/passphrase requirements, testing, and guidance.
Generate a public key
From the command line, generate the public key by referencing the private key. The following command references the private key contained in a file namedrsa_key.p8 created in theprevious step:
```
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```
The command generates a public key in PEM format:
```
-----BEGIN PUBLIC KEY-----MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAslwTa+Lj5SMI58GiDzWq0rwj4FBymfKzHT16RXecnMcx7uI6KsVpqfh9HH0FMb/3C6YEMeGPkaRmKvXYjM5s...-----END PUBLIC KEY-----
```
Store the private and public Keys securely
After following the above steps, you should find the private and public key files saved in a local directory of your system. Note down the path to those files. The private key is stored using the PKCS#8 (Public Key Cryptography Standards) format and is encrypted using the passphrase you specified in thefirst step.
However, maintain caution in protecting the file from unauthorized access using the file permission mechanism provided by your operating system. It’s your responsibility to secure the file when it’s not being used.
Assign the public key to a Snowflake user
Execute the following command to assign the public key to a Snowflake user.
```
alteruser jsmithset rsa_public_key='MIIBIjANBgkqh...';
```
- Only security administrators (i.e. users with the SECURITYADMIN role) or higher can alter a user.
- Exclude the public key delimiters in the SQL statement.
Verify the user's public key fingerprint
Execute the following command to verify the user’s public key:
```
DESCUSER jsmith;
```
The command output is similar to the following:
```
+---------------------+-----------------------------------------------------+---------+----------------------------------------------+| property            | value                                               | default | description                                  |+---------------------+-----------------------------------------------------+---------+----------------------------------------------+| NAME                | JSMITH                                              | null    | Name                                         |......| RSA_PUBLIC_KEY      | MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAslwT... | null    | RSA public key of the user                   || RSA_PUBLIC_KEY_FP   | SHA256:nvnONUsfiuycCLMXIEWG4eTp4FjhVUZQUQbNpbSHXiA= | null    | Fingerprint of user's RSA public key.        || RSA_PUBLIC_KEY_2    | null                                                | null    | Second RSA public key of the user            || RSA_PUBLIC_KEY_2_FP | null                                                | null    | Fingerprint of user's second RSA public key. |...+---------------------+-----------------------------------------------------+---------+----------------------------------------------+
```
Edit the connection configuration file
You need to modifyReplicant’s connection configuration file for Snowflake and include RSA key information there. Specifically, add the following parameters to the connection configuration file:
```
private-key-path:"/PATH_TO_GENERATED_KEY/rsa_key.p8"private-key-passphrase:"PRIVATE_KEY_PASSPHRASE"
```
Replace the following:
- PATH_TO_GENERATED_KEY: the local directory path to thersa_key.p8 keyfile
- PRIVATE_KEY_PASSPHRASE: the private key passphrase you specified in thefirst step
If you specify theprivate-key-path andprivate-key-passphrase parameters, you don’t need to specify thepassword parameter in the connection configuration file.
Parameters related to stage configuration#
ForCOPY extraction method, you need to specify a stage connection configuration. To specify the stage configuration, use thestage field in the connection configuration file. The following configuration options are available:
type[v21.06.14.1]
The stage type. The following stages are supported:
NATIVE
Snowflake’s native stage. This stage is created based on table name and job ID. You can specifyCSV orPARQUET as thefile-format.
S3
This specifies S3 as the external stage type, allowing Snowflake to export CSV or Parquet files directly to an S3 bucket.
To be able to connect to the S3 bucket, you need to provide the connection configuration using theroot-dir,conn-url,key-id, andsecret-key paramters. Keep in mind that you need to provide the same stage connection configuration in the target connection configuration file. This allows the Applier to pick up these files directly from S3.
We recommended S3 stage when both the Extractor and the Applier support S3.
file-format
The file format to use for the exported data. The following file foramts are supported:
- CSV
- PARQUET
root-dir
Specifies a directory on stage that can be used to stage bulk-load files.
conn-url[v21.06.14.1]
Specifies the URL for the stage. For example, forS3 stage, specify the S3 bucket name.
key-id
Specifies the access key ID for AWS account hosting S3.
secret-key[v21.06.14.1]
Specifies the secret access Key for AWS account hosting S3. Applies toS3 stage type only.
Example configuration forNATIVE stage#
```
stage:type:NATIVEfile-format:PARQUET
```
Example configuration forS3 stage#
```
stage:type:S3root-dir:"test_snowflake"conn-url:"replicate-stage"key-id:"AKIAIOSFODNN7EXAMPLE"secret-key:"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"file-format:PARQUET
```

II. Set up Extractor configuration#

To configure replication mode according to your requirements, specify your configuration in the Extractor configuration file. You can find a sample Extractor configuration filesnowflake.yaml in the$REPLICANT_HOME/conf/src directory. For example:

Configure`snapshot` replication#

Snowflake supports two types of extraction method:

QUERY: The default JDBC-based extraction method.
COPY: The file-based extraction method that uses Snowflake’s native export capability. For more information, see the following section.

Native export[v23.06.30.2]#

From version 23.06.30.2, Arcion Replicant supports exporting Snowflake data into CSV or Parquet files. You can store these files locally or in a remote directory like an S3 bucket. This feature is currently supported insnapshot mode.

To enable native export, follow these steps:

1. Set the extraction method and options in the Extractor configuration file

Set theextraction-method parameter toCOPY in the Extractor configuration file. This enables Snowflake native export by using theCOPY command to export data.
Extraction method defaults toQUERY and native export is disabled.
Set thenative-extract-options options in the Extractor configuration file.
This configuration only applies when you use CSV as the file format for Snowflake native export. This allows you to tune parameters such as the compression type, control characters, delimiter, escape character, and line ending. Make sure to specify similar configurations in theApplierbulk-load parameter to avoid compatibility issues.

You can specify these options both globally and forspecific tables.

2. Specify stage configuration in the connection configuration file

Snowflake dumps extracted files into a stage in CSV or Parquet format. To specify the stage configuration, seeParameters related to stage configuration.

Sample`snapshot` mode configuration#

With native export disabled

snapshot:threads:32fetch-size-rows:100000min-job-size-rows:1000000max-jobs-per-chunk:32_traceDBTasks:trueper-table-config:    -catalog:DEMO_DBschema:tpchtables:CUSTOMER:num-jobs:32split-key:C_CUSTKEYrow-identifier-key: [C_CUSTKEY ]ORDERS:num-jobs:32split-key:O_ORDERKEYrow-identifier-key: [O_ORDERKEY ]split-hints:row-count-estimate:15000

With native export enabled

snapshot:threads:32fetch-size-rows:100000min-job-size-rows:1000000max-jobs-per-chunk:32_traceDBTasks:trueextraction-method:COPYnative-extract-options:compression-type:"GZIP"control-chars:delimiter:','escape:"\\"line-end:"\n"

For more information about the configuration parameters forsnapshot mode, seeSnapshot mode.

Configure`realtime` replication#

Sample`realtime` mode configuration#

realtime:threads:32fetch-size-rows:100000_traceDBTasks:truefetch-interval-s:0

For more information about the configuration parameters forrealtime mode, seeRealtime mode.

Movatterモバイル変換

Source Snowflake #

Prerequisites for CDC-based replication#

Limitations#

Real-time replication#

Native export#

I. Set up connection configuration#

Parameters related to Source Snowflake server connection#

Basic username and password authentication#

Use RSA key pair for authentication#

Parameters related to stage configuration#

Example configuration for`NATIVE` stage#

Example configuration for`S3` stage#

II. Set up Extractor configuration#

Configure`snapshot` replication#

Native export[v23.06.30.2]#

Sample`snapshot` mode configuration#

Configure`realtime` replication#

Sample`realtime` mode configuration#

Movatterモバイル変換

Source Snowflake#

Prerequisites for CDC-based replication#

Limitations#

Real-time replication#

Native export#

I. Set up connection configuration#

Parameters related to Source Snowflake server connection#

Basic username and password authentication#

Use RSA key pair for authentication#

Parameters related to stage configuration#

Example configuration forNATIVE stage#

Example configuration forS3 stage#

II. Set up Extractor configuration#

Configuresnapshot replication#

Native export[v23.06.30.2]#

Samplesnapshot mode configuration#

Configurerealtime replication#

Samplerealtime mode configuration#

Source Snowflake #

Example configuration for`NATIVE` stage#

Example configuration for`S3` stage#

Configure`snapshot` replication#

Sample`snapshot` mode configuration#

Configure`realtime` replication#

Sample`realtime` mode configuration#