Source Google BigQuery
The extracted replicant-cli directory will be referred to as the $REPLICANT_HOME directory in the following steps.
I. Obtain the JDBC Driver for Google BigQuery
Replicant requires the JDBC driver for Google BigQuery as a dependency. To obtain the appropriate driver, follow the steps below:
- Go to the JDBC drivers for BigQuery page.
- From there, download the latest JDBC 4.2-compatible JDBC driver ZIP.
- From the downloaded ZIP, locate and extract the GoogleBigQueryJDBC42.jar file.
- Put the GoogleBigQueryJDBC42.jar file inside the $REPLICANT_HOME/lib directory.
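The last step of the list above can be scripted. The following is a minimal sketch, not part of Replicant: the function name install_bigquery_driver is hypothetical, and it assumes the JAR has already been extracted from the ZIP. The ZIP file name in the commented usage lines is a placeholder, since the actual name depends on the driver version you download.

```shell
# Copy an already-extracted BigQuery JDBC driver JAR into Replicant's lib directory.
# Arguments: $1 = path to GoogleBigQueryJDBC42.jar, $2 = Replicant home directory.
# (Hypothetical helper for illustration; not shipped with Replicant.)
install_bigquery_driver() {
  jar_path="$1"
  replicant_home="$2"

  # Fail early if the JAR was not extracted where we expect it.
  if [ ! -f "$jar_path" ]; then
    echo "driver JAR not found: $jar_path" >&2
    return 1
  fi

  # Create lib if it is missing, then copy the JAR into it.
  mkdir -p "$replicant_home/lib" || return 1
  cp "$jar_path" "$replicant_home/lib/" || return 1

  # Confirm the JAR landed in the lib directory.
  [ -f "$replicant_home/lib/$(basename "$jar_path")" ]
}

# Example usage (ZIP name is a placeholder, adjust to your download):
#   unzip -j <downloaded_driver>.zip '*GoogleBigQueryJDBC42.jar' -d /tmp/bq-driver
#   install_bigquery_driver /tmp/bq-driver/GoogleBigQueryJDBC42.jar "$REPLICANT_HOME"
```

The function returns a non-zero status when the JAR is missing or the copy fails, so it can be chained with other setup commands.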
II. Set up Connection Configuration
From $REPLICANT_HOME, navigate to the sample connection configuration file:

    vi conf/conn/bigquery_src.yaml

You can store your connection credentials in a secrets management service and tell Replicant to retrieve the credentials. For more information, see Secrets management.
Otherwise, you can specify credentials such as usernames and passwords in plain text, as in the sample below:

    type: BIGQUERY
    host: https://www.googleapis.com/bigquery/v2
    port: 443
    project-id: <bigquery_projectID>
    auth-type: 0
    o-auth-service-acc-email: <your_service_account@your_project.iam.gserviceaccount.com>
    o-auth-pvt-key-path: <path_to_oauth_private_key>
    location: US
    timeout: 500
    username: "<your_username>"
    password: "<your_password>"
    max-connections: 20
    max-retries: 10
    retry-wait-duration-ms: 1000

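Before starting Replicant, it can help to confirm that the edited file still contains the basic keys from the sample. The sketch below is an illustration only: check_bigquery_conn_keys is a hypothetical helper, and the list of keys it checks is an assumption taken from the sample above, not a statement of which keys Replicant actually requires.

```shell
# Report which of a few keys from the sample connection file are missing.
# Argument: $1 = path to the YAML file. Returns 0 only when every key is present.
# (Hypothetical helper; the key list is an assumption based on the sample.)
check_bigquery_conn_keys() {
  conf_file="$1"
  missing=0

  for key in type host port project-id auth-type; do
    # Match the key at the start of a line, allowing leading whitespace.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$conf_file"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done

  return "$missing"
}

# Example usage, run from $REPLICANT_HOME:
#   check_bigquery_conn_keys conf/conn/bigquery_src.yaml && echo "keys present"
```

This is a plain text match rather than a YAML parse, so it only detects that a key name appears on its own line; it does not validate the values.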
III. Set up Extractor Configuration
From $REPLICANT_HOME, navigate to the Extractor configuration file:

    vi conf/src/bigquery.yaml

Currently, Arcion only supports snapshot mode for BigQuery as Source. So make the necessary changes as follows in the snapshot section of the configuration file:

    snapshot:
      threads: 32
      fetch-size-rows: 10_000

      min-job-size-rows: 1_000_000
      # max-jobs-per-chunk: 32

      per-table-config:
      - schema: tpch
        tables:
          partsupp:
            split-key: ps_partkey
          supplier:
            split-key: s_suppkey
          orders:
            split-key: o_orderkey
          nation:
            split-key: n_regionkey

For a detailed explanation of configuration parameters in the Extractor file, read Extractor Reference.