Continuous data replication to BigQuery using Striim Stay organized with collections Save and categorize content based on your preferences.
This tutorial demonstrates how to migrate a MySQL database toBigQuery using Striim. Striim is a comprehensive streamingextract, transform, and load (ETL) platform that enables online database migrations and continuous streamingreplication from on-premises and cloud data sources to Google Cloud dataservices.
This tutorial focuses on the implementation of a continuous replication fromCloud SQL for MySQL to BigQuery. It is intended for database administrators, ITprofessionals, and data architects interested in taking advantage ofBigQuery capabilities.
Objectives
- Launch the Stiim for BigQuery free trial.
- Use Striim to continuously replicate from Cloud SQL for MySQL toBigQuery.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use thepricing calculator.
This tutorial also uses Striim, which includes a trial period. You can findStriim in theCloud Marketplace.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, seeClean up.
Before you begin
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission.Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the Compute Engine and BigQuery APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission.Learn how to grant roles.In the Google Cloud console, activate Cloud Shell.
Set the default compute zone to
us-central1-a:gcloudconfigsetcompute/zoneus-central1-aexportCOMPUTE_ZONE=us-central1-aThis zone is where you deploy your database. For more information aboutzones, seeGeography and regions.
Create a Cloud SQL for MySQL instance
You create a Cloud SQL for MySQL instance that you later connect toStriim. In this case, the instance acts as the source transactional system thatyou later replicate. In a real-world scenario, the source database can be one ofmany transactional database systems.
In Cloud Shell, create the environment variables to create theinstance:
CSQL_NAME=striim-sql-srcCSQL_USERNAME=striim-userCSQL_USER_PWD=$(openssl rand -base64 18)CSQL_ROOT_PWD=$(openssl rand -base64 18)If you close the Cloud Shell session, you lose the variables.
Make a note of the
CSQL_USER_PWDandCSQL_ROOT_PWDpasswordsgenerated by the following commands:echo $CSQL_USER_PWD and echo $CSQL_ROOT_PWDCreate the Cloud SQL for MySQL instance:
gcloud sql instances create $CSQL_NAME \ --root-password=$CSQL_ROOT_PWD --zone=$COMPUTE_ZONE \ --tier=db-n1-standard-2 --enable-bin-logCreate a Cloud SQL for MySQL user that Striim can connect to:
gcloud sql users create $CSQL_USERNAME --instance $CSQL_NAME \ --password $CSQL_USER_PWD --host=%The Cloud SQL for MySQL database is set up for Striim to read.
Find the IP address of the Cloud SQL for MySQL instance and make anote of it:
gcloud sql instances describe $CSQL_NAME --format='get(ipAddresses.ipAddress)'
Set up Striim
To set up an instance of the Striim server software, you use theCloud Marketplace.
In the Google Cloud console, go to theStriim page in the Cloud Marketplace.
ClickLaunch.
In theNew Striim Deployment window, complete the following fields:
- Select the project that you created or selected to use for thistutorial.
- In theZone drop-down menu, select
us-central1-a. If you accept the terms for service, select theI accept theGoogle Cloud Marketplace Terms of Service checkbox. Terms of Service** checkbox.
Cloud Marketplace solutions typically come with various resourcesthat launch to support the software. Review the monthly billingestimate before launching the solution.
Leave all other settings at their default values.
ClickDeploy.
In the Google Cloud console, go to theDeployments page.
To review the deployment details of the Striim instance, click the nameof the Striim instance. Make a note of the name of the deployment and thename of the VM that has deployed.
To allow Striim to communicate with Cloud SQL for MySQL,add the Striim server's IP address to the Cloud SQL for MySQL instance's authorized networks:
STRIIMVM_NAME=STRIIM_VM_NAMESTRIIMVM_ZONE=us-central1-agcloud sql instances patch $CSQL_NAME \ --authorized-networks=$(gcloud compute instances describe $STRIIM_VM_NAME \ --format='get(networkInterfaces[0].accessConfigs[0].natIP)' \ --zone=$STRIIMVM_ZONE)
Replace the following:
STRIIM_VM_NAME: the name of the VM that youdeployed with Striim.
In the Google Cloud console, on the deployment instance details page,clickVisit the site to open the Striim web UI.
In the Striim configuration wizard, configure the following:
- Review the end-user license agreement. If you accept the terms,clickAccept Striim EULA and Continue.
- Enter your contact information.
- Enter the Cluster Name, Admin, Sys, and Striim Key passwords ofyour choice. Make a note of these passwords. ClickSave and Continue.
- Leave the key field blank to enable the trial, and then clickSave and Continue.
ClickLaunch. It takes about a minute for Striim to be configured.When done, clickLog In.
To log in to the Striim administrator console, log in with the
adminuserand the administrator password that you previously set. Keep this windowopen because you return to it in a later step.
Set up Connector/J
UseMySQL Connector/J to connect Striim to your Cloud SQL for MySQL instance. As of this writing,5.1.49 is the latest version of Connector/J.
In the Google Cloud console, go to theDeployments page.
For the Striim instance, clickSSH to automatically connect to theinstance.
Download the Connector/J to the instance and extract it:
wgethttps://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.49.tar.gztar-xvzfmysql-connector-java-5.1.49.tar.gzCopy the file to the Striim library path, allow it to be executable,and change ownership of the file that you downloaded:
sudo cp ~/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49.jar /opt/striim/libsudo chmod +x /opt/striim/lib/mysql-connector-java-5.1.49.jarsudo chown striim /opt/striim/lib/mysql-connector-java-5.1.49.jarTo recognize the new library, restart the Striim server:
sudo systemctl stop striim-nodesudo systemctl stop striim-dbmssudo systemctl start striim-dbmssudo systemctl start striim-nodeGo back to the browser window with the administration console in it.Reload the page, and then log in using the
adminuser credentials.It can take a couple minutes for the server to complete its restart fromthe previous step, so you might get a browser error during that time. If youencounter an error, reload the page and log in again.
Load sample transactions to Cloud SQL
Before you can configure your first Striim app, load transactions into theMySQL instance.
In Cloud Shell, connect to the instance using theCloud SQL for MySQL instance credentials that you previously set:
gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAMECreate a sample database and load some transactions into it:
CREATEDATABASEstriimdemo;USEstriimdemo;CREATETABLEORDERS(ORDER_IDInteger,ORDER_DATEVARCHAR(50),ORDER_MODEVARCHAR(8),CUSTOMER_IDInteger,ORDER_STATUSInteger,ORDER_TOTALFloat,SALES_REP_IDInteger,PROMOTION_IDInteger,PRIMARYKEY(ORDER_ID));INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1001,1568927976017,'In-Store',1001,9,34672.59,331,9404);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1002,1568928036017,'In-Store',1002,1,28133.14,619,2689);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1003,1568928096017,'CompanyB',1003,1,37367.95,160,30888);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1004,1568928156017,'CompanyA',1004,1,7737.02,362,89488);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1005,1568928216017,'CompanyA',1005,9,15959.91,497,78454);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1006,1568928276017,'In-Store',1006,1,82531.55,399,22488);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1007,1568928336017,'CompanyA',1007,7,52929.61,420,66256);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1008,1568928396017,'Online',1008,1,26912.56,832,7262);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1009,1568928456017,'CompanyA',1009,1,97706.08,124,12185);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1010,1568928516017,'CompanyB',1010,1,47539.16,105,17868);To check the upload, count the records to ensure that 10 records wereinserted:
SELECT COUNT(*) FROM ORDERS;Leave the Cloud SQL for MySQL instance:
Exit
Create a BigQuery target dataset
In this section, you create a BigQuery dataset, and load serviceaccount credentials so that Striim can write to the target database from theGoogle Cloud console.
In Cloud Shell, create a BigQuery dataset:
bq --location=US mk -d \--description "Test Target for Striim." striimdemoFor this tutorial, you deploy BigQuery in the US.
Create a new target table:
bq mk \ --table \ --description "Striim Table" \ --label organization:striimlab striimdemo.orders order_id:INTEGER,order_date:STRING,order_mode:STRING,customer_id:INTEGER,order_status:INTEGER,order_total:FLOAT,sales_rep_id:INTEGER,promotion_id:INTEGERCreate a service account for Striim to connect to BigQuery:
gcloudiamservice-accountscreatestriim-bq\--display-namestriim-bqexportsa_striim_bq=$(gcloudiamservice-accountslist\--filter="displayName:striim-bq"--format='value(email)')exportPROJECT=$(gcloudinfo\--format='value(config.project)')gcloudprojectsadd-iam-policy-binding$PROJECT\--roleroles/bigquery.dataEditor\--memberserviceAccount:$sa_striim_bqgcloudprojectsadd-iam-policy-binding$PROJECT\--roleroles/bigquery.user--memberserviceAccount:$sa_striim_bqgcloudiamservice-accountskeyscreate~/striim-bq-key.json\--iam-account$sa_striim_bqA key called
striim-bq-key.jsonis created in your home path.Move the newly generated key to the server:
gcloud compute scp ~/striim-bq-key.json $STRIIM_VM_NAME:~ \ --zone=$COMPUTE_ZONEMove the key to the
/opt/striimdirectory:gcloud compute ssh \ --zone=$COMPUTE_ZONE $STRIIM_VM_NAME \ -- 'sudo cp ~/striim-bq-key.json /opt/striim && sudo chown striim /opt/striim/striim-bq-key.json'You are now ready to create a Striim app.
Create an online database migration
An online database migration moves data from a source database (eitheron-premises or hosted on a cloud provider) to a target database or datawarehouse in Google Cloud. The source database remains fully accessible bythe business app and with minimal performance impact on the source databaseduring this time.
In an online migration, you perform an initial bulkload, and also continuouslycapture any changes. You then synchronize the two databases to ensure that dataisn't lost.
If you want to focus on creating a change data capture (CDC) pipeline, see theCreate a continuous Cloud SQL for MySQL to BigQuery data pipeline section.
Create the source connection
- In the Google Cloud console, on the instance details page, clickVisit the site to open the Striim web UI.
In the Striim web UI, clickApps.
ClickAdd App.
ClickStart from Scratch.
In theName field, enter
MySQLToBigQuery_initLoad.In theNamespace drop-down menu, select the defaultAdminnamespace. This label is used to organize your apps.
ClickSave.
On theFlow Designer page, to do a one-time initial bulkload ofdata, from theSources pane, dragDatabase to the flow designpalette in the center of the screen and enter the following connectionproperties:
- In theName field, enter
mysql_source. - Leave theAdapter field at the default value ofDatabaseReader.
- In theConnection URL field, enter
jdbc:mysql://PRIMARY_ADDRESS:3306/striimdemo.ReplacePRIMARY_ADDRESSwith the IP address ofthe Cloud SQL instance that you created in theprevious section. - In theUsername field, enter the username that you set asthe
CSQL_USERenvironment variable,striim-user. - In thePassword field, enter the
CSQL_USER_PWDvalue thatyou made a note of when youcreated a Cloud SQL for MySQL instance. - To see more configuration properties, clickShow optionalproperties.
- In theTables field, enter
striimdemo.ORDERS. - ForOutput to, selectNew output.
- In theNew output field, enter
stream_CloudSQLMySQLInitLoad. - ClickSave.
- In theName field, enter
To test the configuration settings to make sure that Striim cansuccessfully connect to Cloud SQL for MySQL, clickCreated, andthen selectDeploy App.
In theDeployment window, you can specify that you want to run partsof your app on some of your deployment topology. For this tutorial, selectDefault, and clickDeploy.
To preview your data as it flows through the Striim pipeline, click
mysql_source DataBase reader, and then clickPreview on run.ClickDeployed, and then clickStart App.
The Striim app starts running, and data flows through the pipeline. If thereare any errors, there is an issue connecting to the source database becausethere is only a source component in the pipeline. If you see your appsuccessfully run, but no data flows through, typically that means that youdon't have any data in your database.
After you've successfully connected to your source database and testedthat it can read data, clickRunning, and then selectStop App.
ClickStopped, and then selectUndeploy App. You are now readyto connect this flow to BigQuery.
Perform an initial load into BigQuery
In the Striim web UI, click
mysql_source Database reader.ClickConnect to next component, selectConnect next Targetcomponent, and then complete the following fields:
- In theName field, enter
bq_target. - In theAdapter field, enter
BigQueryWriter. - TheTables property is a source/target pair separated by commas. Itis in the format of
srcSchema1.srcTable1,tgtSchema1.tgtTable1;srcSchema2.srcTable2,tgtSchema2.tgtTable2.For this tutorial, enterstriimdemo.ORDERS,striimdemo.orders. - TheService Account Key requires a fully qualified path and name ofthe key file that was previously generated. For this tutorial, enter
/opt/striim/striim-bq-key.json. - In theProject ID field, enter your Google Cloud project ID.
- In theName field, enter
ClickSave.
To deploy the app and preview the data flow, do the following:
- ClickCreated, and then selectDeploy App.
- In theDeployment window, selectDefault, and then clickDeploy.
- To preview your data as it flows through the Striim pipeline,click
mysql_source Database reader, and then clickPreview on run. - ClickDeployed, and then clickStart App.
In the Google Cloud console, go to theBigQuery page.
Click the
striimdemodatabase.In the query editor, enter
SELECT COUNT(*) AS ORDERS, AVG(ORDER_TOTAL)AS ORDERS_AVE, SUM(ORDER_TOTAL) AS ORDERS_SUM FROM striimdemo.orders;andthen clickRun. It can take up to 90 seconds for the transactions tofully replicate to BigQuery due to the default configurationsettings. After it's successfully replicated, the results table outputs theaverage order of43148.952and the total size of the orders,431489.52.You have successfully set up your Striim environment and pipeline toperform a batch load.
Create a continuous data pipeline from Cloud SQL for MySQL to BigQuery
With an initial one-time bulkload in place, you can now set up a continuousreplication pipeline. This pipeline is similar to the bulk pipeline that youcreated, but with a different source object.
Create a CDC source
- In the Striim web UI, clickHome.
- ClickApps.
- ClickStart from Scratch.
- In theName field, enter
MySQLToBigQuery_cdc. - In theNamespace drop-down menu, selectAdmin namespace.
- On theFlow Designer page, drag aMySQL CDC source reader to thecenter of the design palette.
Configure your new MySQL CDC source with the following information:
- In theName field, enter
mysql_cdc_source. - Leave theAdapter field at the default value ofMysqlReader.
- In theConnection URL field, enter
jdbc:mysql://PRIMARY_ADDRESS:3306/striimdemo. - Enter the username and password that you used in the previous section.
- To see more configuration properties, clickShow optionalproperties.
- In theTables field, enter
striimdemo.ORDERS. - ForOutput to, selectNew output.
- In theNew output field, enter
stream_CloudSQLMySQLCDCLoad. - ClickSave.
- In theName field, enter
Load new transactions into BigQuery
- In the Striim web UI, clickMysqlReader.
ClickConnect to next component, and then selectConnect nextTarget component.
- In theName field, enter
bq_cdc_target. - In theAdapter field, enter
BigQueryWriter. TheTables property is a source/target pair separated bycommas. It is in the format of
srcSchema1.srcTable1,tgtSchema1.tgtTable1;srcSchema2.srcTable2,tgtSchema2.tgtTable2.For this tutorial, usestriimdemo.ORDERS,striimdemo.orders.TheService Account Key requires a fully qualified pathand name of the key file that was previously generated. For thistutorial, enter
/opt/striim/striim-bq-key.jsonIn theProject ID field, enter your Google Cloud project ID.
- In theName field, enter
ClickSave.
To deploy the app and preview the data flow, do the following:
- ClickCreated, and then selectDeploy App.
- In theDeployment window, selectDefault, and then clickDeploy.
- To preview your data as it flows through the Striim pipeline, clickMysqlReader, and then clickPreview on Run.
- ClickDeployed, and then clickStart App.
In Cloud Shell, connect to your Cloud SQL for MySQL instance:
gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAMEConnect to your database and load new transactions into it:
USEstriimdemo;INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1011,1568928576017,'In-Store',1011,9,13879.56,320,88252);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1012,1568928636017,'CompanyA',1012,1,19729.99,76,95203);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1013,1568928696017,'In-Store',1013,5,7286.68,164,45162);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1014,1568928756017,'Online',1014,1,87268.61,909,70407);INSERTINTOORDERS(ORDER_ID,ORDER_DATE,ORDER_MODE,CUSTOMER_ID,ORDER_STATUS,ORDER_TOTAL,SALES_REP_ID,PROMOTION_ID)VALUES(1015,1568928816017,'CompanyB',1015,1,69744.13,424,79401);In the Striim web UI, on theTransactions view page, transactionsnow populate the page and show that data is flowing.
In the Google Cloud console, go to theBigQuery page.
Click the
striimdemodatabase.To verify that your data is successfully replicated, in theQueryEditor enter
SELECT COUNT(*) AS ORDERS, AVG(ORDER_TOTAL) AS ORDERS_AVE,SUM(ORDER_TOTAL) AS ORDERS_SUM FROM striimdemo.orders;and then clickRun. The results table outputs the average order of43148.952and thetotal size of the orders,431489.52.It can take up to 90 seconds for the transactions to fullyreplicate to BigQuery due to the default configurationsettings.
Congratulations, you have successfully set up a streaming replication pipelinefrom Cloud SQL for MySQL to BigQuery.
Clean up
The easiest way to eliminate billing is to delete the Google Cloud project youcreated for the tutorial. Alternatively, you can delete the individualresources.Delete the project
What's next
- Explore reference architectures, diagrams, and best practices about Google Cloud.Take a look at ourCloud Architecture Center.
- Look at theGoogle Cloud Data Migration content.
- To learn about Striim,visit the website,schedule a demo with a Striim technologist, and subscribe to theStriim blog.
- To learn how to set up continuous data movement from Oracle toBigQuery, seeOracle to Google BigQuery – Continuous Movement of On-Premises Data via CDC and theMove Oracle to Google BigQuery in Real Time video.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-02-13 UTC.