Continuous data replication to BigQuery using Striim

Last reviewed 2024-02-13 UTC
By: Edward Bell, Solutions Architect,Striim, Inc.

This tutorial demonstrates how to migrate a MySQL database toBigQuery using Striim. Striim is a comprehensive streamingextract, transform, and load (ETL) platform that enables online database migrations and continuous streamingreplication from on-premises and cloud data sources to Google Cloud dataservices.

This tutorial focuses on the implementation of a continuous replication fromCloud SQL for MySQL to BigQuery. It is intended for database administrators, ITprofessionals, and data architects interested in taking advantage ofBigQuery capabilities.

Objectives

  • Launch the Stiim for BigQuery free trial.
  • Use Striim to continuously replicate from Cloud SQL for MySQL toBigQuery.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use thepricing calculator.

New Google Cloud users might be eligible for afree trial.

This tutorial also uses Striim, which includes a trial period. You can findStriim in theCloud Marketplace.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, seeClean up.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.create permission.Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  2. Verify that billing is enabled for your Google Cloud project.

  3. Enable the Compute Engine and BigQuery APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enable permission.Learn how to grant roles.

    Enable the APIs

  4. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  5. Set the default compute zone tous-central1-a:

    gcloudconfigsetcompute/zoneus-central1-aexportCOMPUTE_ZONE=us-central1-a

    This zone is where you deploy your database. For more information aboutzones, seeGeography and regions.

Create a Cloud SQL for MySQL instance

You create a Cloud SQL for MySQL instance that you later connect toStriim. In this case, the instance acts as the source transactional system thatyou later replicate. In a real-world scenario, the source database can be one ofmany transactional database systems.

  1. In Cloud Shell, create the environment variables to create theinstance:

    CSQL_NAME=striim-sql-srcCSQL_USERNAME=striim-userCSQL_USER_PWD=$(openssl rand -base64 18)CSQL_ROOT_PWD=$(openssl rand -base64 18)

    If you close the Cloud Shell session, you lose the variables.

  2. Make a note of theCSQL_USER_PWD andCSQL_ROOT_PWD passwordsgenerated by the following commands:

    echo $CSQL_USER_PWD and echo $CSQL_ROOT_PWD
  3. Create the Cloud SQL for MySQL instance:

    gcloud sql instances create $CSQL_NAME \    --root-password=$CSQL_ROOT_PWD --zone=$COMPUTE_ZONE \    --tier=db-n1-standard-2 --enable-bin-log
  4. Create a Cloud SQL for MySQL user that Striim can connect to:

    gcloud sql users create $CSQL_USERNAME --instance $CSQL_NAME \    --password $CSQL_USER_PWD --host=%

    The Cloud SQL for MySQL database is set up for Striim to read.

  5. Find the IP address of the Cloud SQL for MySQL instance and make anote of it:

    gcloud sql instances describe $CSQL_NAME --format='get(ipAddresses.ipAddress)'

Set up Striim

To set up an instance of the Striim server software, you use theCloud Marketplace.

  1. In the Google Cloud console, go to theStriim page in the Cloud Marketplace.

    Go to Striim in the Cloud Marketplace

  2. ClickLaunch.

  3. In theNew Striim Deployment window, complete the following fields:

    • Select the project that you created or selected to use for thistutorial.
    • In theZone drop-down menu, selectus-central1-a.
    • If you accept the terms for service, select theI accept theGoogle Cloud Marketplace Terms of Service checkbox. Terms of Service** checkbox.

      Cloud Marketplace solutions typically come with various resourcesthat launch to support the software. Review the monthly billingestimate before launching the solution.

    • Leave all other settings at their default values.

  4. ClickDeploy.

  5. In the Google Cloud console, go to theDeployments page.

    Go to Deployments

  6. To review the deployment details of the Striim instance, click the nameof the Striim instance. Make a note of the name of the deployment and thename of the VM that has deployed.

  7. To allow Striim to communicate with Cloud SQL for MySQL,add the Striim server's IP address to the Cloud SQL for MySQL instance's authorized networks:

    STRIIMVM_NAME=STRIIM_VM_NAMESTRIIMVM_ZONE=us-central1-agcloud sql instances patch $CSQL_NAME \    --authorized-networks=$(gcloud compute instances describe $STRIIM_VM_NAME \    --format='get(networkInterfaces[0].accessConfigs[0].natIP)' \    --zone=$STRIIMVM_ZONE)

    Replace the following:

    • STRIIM_VM_NAME: the name of the VM that youdeployed with Striim.
  8. In the Google Cloud console, on the deployment instance details page,clickVisit the site to open the Striim web UI.

  9. In the Striim configuration wizard, configure the following:

    • Review the end-user license agreement. If you accept the terms,clickAccept Striim EULA and Continue.
    • Enter your contact information.
    • Enter the Cluster Name, Admin, Sys, and Striim Key passwords ofyour choice. Make a note of these passwords. ClickSave and Continue.
    • Leave the key field blank to enable the trial, and then clickSave and Continue.
  10. ClickLaunch. It takes about a minute for Striim to be configured.When done, clickLog In.

  11. To log in to the Striim administrator console, log in with theadmin userand the administrator password that you previously set. Keep this windowopen because you return to it in a later step.

Set up Connector/J

UseMySQL Connector/J to connect Striim to your Cloud SQL for MySQL instance. As of this writing,5.1.49 is the latest version of Connector/J.

  1. In the Google Cloud console, go to theDeployments page.

    Go to Deployments

  2. For the Striim instance, clickSSH to automatically connect to theinstance.

  3. Download the Connector/J to the instance and extract it:

    wgethttps://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.49.tar.gztar-xvzfmysql-connector-java-5.1.49.tar.gz
  4. Copy the file to the Striim library path, allow it to be executable,and change ownership of the file that you downloaded:

    sudo cp ~/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49.jar /opt/striim/libsudo chmod +x /opt/striim/lib/mysql-connector-java-5.1.49.jarsudo chown striim /opt/striim/lib/mysql-connector-java-5.1.49.jar
  5. To recognize the new library, restart the Striim server:

    sudo systemctl stop striim-nodesudo systemctl stop striim-dbmssudo systemctl start striim-dbmssudo systemctl start striim-node
  6. Go back to the browser window with the administration console in it.Reload the page, and then log in using theadmin user credentials.

    It can take a couple minutes for the server to complete its restart fromthe previous step, so you might get a browser error during that time. If youencounter an error, reload the page and log in again.

Load sample transactions to Cloud SQL

Before you can configure your first Striim app, load transactions into theMySQL instance.

  1. In Cloud Shell, connect to the instance using theCloud SQL for MySQL instance credentials that you previously set:

    gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAME
  2. Create a sample database and load some transactions into it:

    CREATEDATABASEstriimdemo;USEstriimdemo;CREATETABLEORDERS(ORDER_IDInteger,ORDER_DATEVARCHAR(50),ORDER_MODEVARCHAR(8),CUSTOMER_IDInteger,ORDER_STATUSInteger,ORDER_TOTALFloat,SALES_REP_IDInteger,PROMOTION_IDInteger,PRIMARYKEY(ORDER_ID));INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1001,1568927976017,'In-Store',1001,9,34672.59,331,9404);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1002,1568928036017,'In-Store',1002,1,28133.14,619,2689);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1003,1568928096017,'CompanyB',1003,1,37367.95,160,30888);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1004,1568928156017,'CompanyA',1004,1,7737.02,362,89488);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1005,1568928216017,'CompanyA',1005,9,15959.91,497,78454);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1006,1568928276017,'In-Store',1006,1,82531.55,399,22488);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1007,1568928336017,'CompanyA',1007,7,52929.61,420,66256);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1008,1568928396017,'Online',1008,1,26912.56,832,7262);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1009,1568928456017,'CompanyA',1009,1,97706.08,124,12185);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1010,1568928516017,'CompanyB',1010,1,47539.16,105,17868);
  3. To check the upload, count the records to ensure that 10 records wereinserted:

    SELECT COUNT(*) FROM ORDERS;
  4. Leave the Cloud SQL for MySQL instance:

    Exit

Create a BigQuery target dataset

In this section, you create a BigQuery dataset, and load serviceaccount credentials so that Striim can write to the target database from theGoogle Cloud console.

  1. In Cloud Shell, create a BigQuery dataset:

    bq --location=US mk -d \--description "Test Target for Striim." striimdemo

    For this tutorial, you deploy BigQuery in the US.

  2. Create a new target table:

    bq mk \    --table \    --description "Striim Table" \    --label organization:striimlab striimdemo.orders order_id:INTEGER,order_date:STRING,order_mode:STRING,customer_id:INTEGER,order_status:INTEGER,order_total:FLOAT,sales_rep_id:INTEGER,promotion_id:INTEGER
  3. Create a service account for Striim to connect to BigQuery:

    gcloudiamservice-accountscreatestriim-bq\--display-namestriim-bqexportsa_striim_bq=$(gcloudiamservice-accountslist\--filter="displayName:striim-bq"--format='value(email)')exportPROJECT=$(gcloudinfo\--format='value(config.project)')gcloudprojectsadd-iam-policy-binding$PROJECT\--roleroles/bigquery.dataEditor\--memberserviceAccount:$sa_striim_bqgcloudprojectsadd-iam-policy-binding$PROJECT\--roleroles/bigquery.user--memberserviceAccount:$sa_striim_bqgcloudiamservice-accountskeyscreate~/striim-bq-key.json\--iam-account$sa_striim_bq

    A key calledstriim-bq-key.json is created in your home path.

  4. Move the newly generated key to the server:

    gcloud compute scp ~/striim-bq-key.json $STRIIM_VM_NAME:~ \    --zone=$COMPUTE_ZONE
  5. Move the key to the/opt/striim directory:

    gcloud compute ssh \    --zone=$COMPUTE_ZONE $STRIIM_VM_NAME \    -- 'sudo cp ~/striim-bq-key.json /opt/striim && sudo chown striim /opt/striim/striim-bq-key.json'

    You are now ready to create a Striim app.

Create an online database migration

An online database migration moves data from a source database (eitheron-premises or hosted on a cloud provider) to a target database or datawarehouse in Google Cloud. The source database remains fully accessible bythe business app and with minimal performance impact on the source databaseduring this time.

In an online migration, you perform an initial bulkload, and also continuouslycapture any changes. You then synchronize the two databases to ensure that dataisn't lost.

If you want to focus on creating a change data capture (CDC) pipeline, see theCreate a continuous Cloud SQL for MySQL to BigQuery data pipeline section.

Create the source connection

  1. In the Google Cloud console, on the instance details page, clickVisit the site to open the Striim web UI.
  2. In the Striim web UI, clickApps.

  3. ClickAdd App.

  4. ClickStart from Scratch.

  5. In theName field, enterMySQLToBigQuery_initLoad.

  6. In theNamespace drop-down menu, select the defaultAdminnamespace. This label is used to organize your apps.

  7. ClickSave.

  8. On theFlow Designer page, to do a one-time initial bulkload ofdata, from theSources pane, dragDatabase to the flow designpalette in the center of the screen and enter the following connectionproperties:

    • In theName field, entermysql_source.
    • Leave theAdapter field at the default value ofDatabaseReader.
    • In theConnection URL field, enterjdbc:mysql://PRIMARY_ADDRESS:3306/striimdemo.ReplacePRIMARY_ADDRESS with the IP address ofthe Cloud SQL instance that you created in theprevious section.
    • In theUsername field, enter the username that you set astheCSQL_USER environment variable,striim-user.
    • In thePassword field, enter theCSQL_USER_PWD value thatyou made a note of when youcreated a Cloud SQL for MySQL instance.
    • To see more configuration properties, clickShow optionalproperties.
    • In theTables field, enterstriimdemo.ORDERS.
    • ForOutput to, selectNew output.
    • In theNew output field, enterstream_CloudSQLMySQLInitLoad.
    • ClickSave.
  9. To test the configuration settings to make sure that Striim cansuccessfully connect to Cloud SQL for MySQL, clickCreated, andthen selectDeploy App.

  10. In theDeployment window, you can specify that you want to run partsof your app on some of your deployment topology. For this tutorial, selectDefault, and clickDeploy.

  11. To preview your data as it flows through the Striim pipeline, clickmysql_source DataBase reader, and then clickPreview on run.

  12. ClickDeployed, and then clickStart App.

    The Striim app starts running, and data flows through the pipeline. If thereare any errors, there is an issue connecting to the source database becausethere is only a source component in the pipeline. If you see your appsuccessfully run, but no data flows through, typically that means that youdon't have any data in your database.

  13. After you've successfully connected to your source database and testedthat it can read data, clickRunning, and then selectStop App.

  14. ClickStopped, and then selectUndeploy App. You are now readyto connect this flow to BigQuery.

Perform an initial load into BigQuery

  1. In the Striim web UI, clickmysql_source Database reader.

  2. ClickConnect to next component, selectConnect next Targetcomponent, and then complete the following fields:

    • In theName field, enterbq_target.
    • In theAdapter field, enterBigQueryWriter.
    • TheTables property is a source/target pair separated by commas. Itis in the format ofsrcSchema1.srcTable1,tgtSchema1.tgtTable1;srcSchema2.srcTable2,tgtSchema2.tgtTable2.For this tutorial, enterstriimdemo.ORDERS,striimdemo.orders.
    • TheService Account Key requires a fully qualified path and name ofthe key file that was previously generated. For this tutorial, enter/opt/striim/striim-bq-key.json.
    • In theProject ID field, enter your Google Cloud project ID.
  3. ClickSave.

  4. To deploy the app and preview the data flow, do the following:

    • ClickCreated, and then selectDeploy App.
    • In theDeployment window, selectDefault, and then clickDeploy.
    • To preview your data as it flows through the Striim pipeline,clickmysql_source Database reader, and then clickPreview on run.
    • ClickDeployed, and then clickStart App.
  5. In the Google Cloud console, go to theBigQuery page.

    Go to BigQuery

  6. Click thestriimdemo database.

  7. In the query editor, enterSELECT COUNT(*) AS ORDERS, AVG(ORDER_TOTAL)AS ORDERS_AVE, SUM(ORDER_TOTAL) AS ORDERS_SUM FROM striimdemo.orders; andthen clickRun. It can take up to 90 seconds for the transactions tofully replicate to BigQuery due to the default configurationsettings. After it's successfully replicated, the results table outputs theaverage order of43148.952 and the total size of the orders,431489.52.

    You have successfully set up your Striim environment and pipeline toperform a batch load.

Create a continuous data pipeline from Cloud SQL for MySQL to BigQuery

With an initial one-time bulkload in place, you can now set up a continuousreplication pipeline. This pipeline is similar to the bulk pipeline that youcreated, but with a different source object.

Create a CDC source

  1. In the Striim web UI, clickHome.
  2. ClickApps.
  3. ClickStart from Scratch.
  4. In theName field, enterMySQLToBigQuery_cdc.
  5. In theNamespace drop-down menu, selectAdmin namespace.
  6. On theFlow Designer page, drag aMySQL CDC source reader to thecenter of the design palette.
  7. Configure your new MySQL CDC source with the following information:

    • In theName field, entermysql_cdc_source.
    • Leave theAdapter field at the default value ofMysqlReader.
    • In theConnection URL field, enterjdbc:mysql://PRIMARY_ADDRESS:3306/striimdemo.
    • Enter the username and password that you used in the previous section.
    • To see more configuration properties, clickShow optionalproperties.
    • In theTables field, enterstriimdemo.ORDERS.
    • ForOutput to, selectNew output.
    • In theNew output field, enterstream_CloudSQLMySQLCDCLoad.
    • ClickSave.

Load new transactions into BigQuery

  1. In the Striim web UI, clickMysqlReader.
  2. ClickConnect to next component, and then selectConnect nextTarget component.

    • In theName field, enterbq_cdc_target.
    • In theAdapter field, enterBigQueryWriter.
    • TheTables property is a source/target pair separated bycommas. It is in the format ofsrcSchema1.srcTable1,tgtSchema1.tgtTable1;srcSchema2.srcTable2,tgtSchema2.tgtTable2.For this tutorial, usestriimdemo.ORDERS,striimdemo.orders.

    • TheService Account Key requires a fully qualified pathand name of the key file that was previously generated. For thistutorial, enter/opt/striim/striim-bq-key.json

    • In theProject ID field, enter your Google Cloud project ID.

  3. ClickSave.

  4. To deploy the app and preview the data flow, do the following:

    • ClickCreated, and then selectDeploy App.
    • In theDeployment window, selectDefault, and then clickDeploy.
    • To preview your data as it flows through the Striim pipeline, clickMysqlReader, and then clickPreview on Run.
    • ClickDeployed, and then clickStart App.
  5. In Cloud Shell, connect to your Cloud SQL for MySQL instance:

    gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAME
  6. Connect to your database and load new transactions into it:

    USEstriimdemo;INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1011,1568928576017,'In-Store',1011,9,13879.56,320,88252);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1012,1568928636017,'CompanyA',1012,1,19729.99,76,95203);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1013,1568928696017,'In-Store',1013,5,7286.68,164,45162);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1014,1568928756017,'Online',1014,1,87268.61,909,70407);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1015,1568928816017,'CompanyB',1015,1,69744.13,424,79401);
  7. In the Striim web UI, on theTransactions view page, transactionsnow populate the page and show that data is flowing.

  8. In the Google Cloud console, go to theBigQuery page.

    Go to BigQuery

  9. Click thestriimdemo database.

  10. To verify that your data is successfully replicated, in theQueryEditor enterSELECT COUNT(*) AS ORDERS, AVG(ORDER_TOTAL) AS ORDERS_AVE,SUM(ORDER_TOTAL) AS ORDERS_SUM FROM striimdemo.orders; and then clickRun. The results table outputs the average order of43148.952 and thetotal size of the orders,431489.52.

    It can take up to 90 seconds for the transactions to fullyreplicate to BigQuery due to the default configurationsettings.

Congratulations, you have successfully set up a streaming replication pipelinefrom Cloud SQL for MySQL to BigQuery.

Clean up

The easiest way to eliminate billing is to delete the Google Cloud project youcreated for the tutorial. Alternatively, you can delete the individualresources.

Delete the project

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as anappspot.com URL, delete selected resources inside the project instead of deleting the whole project.
  1. In the Google Cloud console, go to theManage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then clickDelete.
  3. In the dialog, type the project ID, and then clickShut down to delete the project.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-02-13 UTC.