Deploy scalable BigQuery backup automation
This document describes how you deploy Scalable BigQuery backup automation.
This document is intended for cloud architects, engineers, and data governance officers who want to define and automate data policies in their organizations. Experience with Terraform is helpful.
Architecture
The following diagram shows the automated backup architecture:
Cloud Scheduler triggers the run. The dispatcher service, using the BigQuery API, lists the in-scope tables. Through a Pub/Sub message, the dispatcher service submits one request for each table to the configurator service. The configurator service determines the backup policies for the tables, and then submits one request for each table to the relevant Cloud Run service. The Cloud Run service then submits a request to the BigQuery API and runs the backup operations. Pub/Sub triggers the tagger service, which logs the results and updates the backup state in the Cloud Storage metadata layer.
For details about the architecture, see Scalable BigQuery backup automation.
Objectives
- Build Cloud Run services.
- Configure Terraform variables.
- Run the Terraform and manual deployment scripts.
- Run the solution.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery
- Pub/Sub
- Cloud Logging
- Cloud Run
- Cloud Storage
- Cloud Scheduler
- Firestore in Datastore mode (Datastore)
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
If you're redeploying the solution (for example, after new commits), you can skip this section.
In this section, you create one-time resources.
In the Google Cloud console, activate Cloud Shell.
If you want to create a new Google Cloud project to use as the host project for the deployment, use the gcloud projects create command:

gcloud projects create PROJECT_ID

Replace PROJECT_ID with the ID of the project you want to create.
Install Maven:
- Download Maven (for one way to do this in Cloud Shell, see the sketch after these steps).
In Cloud Shell, add Maven to PATH:

export PATH=/DOWNLOADED_MAVEN_DIR/bin:$PATH
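The following commands are a minimal sketch of one way to download and extract Maven directly in Cloud Shell. The Maven version and download URL are assumptions; substitute the release that you want to use.

# The version and mirror below are assumptions; use any current Maven 3.x release.
wget https://dlcdn.apache.org/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz
tar -xzf apache-maven-3.9.6-bin.tar.gz
export PATH=$PWD/apache-maven-3.9.6/bin:$PATH
# Confirm that Maven is on PATH.
mvn -version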
In Cloud Shell, clone the GitHub repository:
git clone https://github.com/GoogleCloudPlatform/bq-backup-manager.git

Set and export the following environment variables:
export PROJECT_ID=PROJECT_ID
export TF_SA=bq-backup-mgr-terraform
export COMPUTE_REGION=COMPUTE_REGION
export DATA_REGION=DATA_REGION
export BUCKET_NAME=${PROJECT_ID}-bq-backup-mgr
export BUCKET=gs://${BUCKET_NAME}
export DOCKER_REPO_NAME=docker-repo
export CONFIG=bq-backup-manager
export ACCOUNT=ACCOUNT_EMAIL

gcloud config configurations create $CONFIG
gcloud config set project $PROJECT_ID
gcloud config set account $ACCOUNT
gcloud config set compute/region $COMPUTE_REGION

gcloud auth login
gcloud auth application-default login

Replace the following:
- PROJECT_ID: the ID of the Google Cloud host project that you want to deploy the solution to.
- COMPUTE_REGION: the Google Cloud region where you want to deploy compute resources like Cloud Run and Identity and Access Management (IAM).
- DATA_REGION: the Google Cloud region you want to deploy data resources (such as buckets and datasets) to.
- ACCOUNT_EMAIL: the user account email address.
Enable the APIs:
./scripts/enable_gcp_apis.sh

The script enables the following APIs:
- Cloud Resource Manager API
- IAM API
- Data Catalog API
- Artifact Registry API
- BigQuery API
- Pub/Sub API
- Cloud Storage API
- Cloud Run Admin API
- Cloud Build API
- Service Usage API
- App Engine Admin API
- Serverless VPC Access API
- Cloud DNS API
Prepare the Terraform state bucket:
gcloud storage buckets create $BUCKET --project=$PROJECT_ID --location=$COMPUTE_REGION --uniform-bucket-level-access

Prepare the Terraform service account:
./scripts/prepare_terraform_service_account.sh

To publish the images that this solution uses, prepare a Docker repository:
gcloud artifacts repositories create $DOCKER_REPO_NAME --repository-format=docker \
    --location=$COMPUTE_REGION \
    --description="Docker repository for backups"
Deploy the infrastructure
Make sure that you've completed Before you begin at least once.

In this section, follow the steps to deploy or redeploy the latest codebase to the Google Cloud environment.
Activate the gcloud CLI configuration
In Cloud Shell, activate and authenticate the gcloud CLI configuration:

gcloud config configurations activate $CONFIG

gcloud auth login
gcloud auth application-default login
Build Cloud Run service images
In Cloud Shell, build and deploy the Docker images to be used by the Cloud Run services:
export DISPATCHER_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-dispatcher-service:latest
export CONFIGURATOR_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-configurator-service:latest
export SNAPSHOTER_BQ_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-bq-service:latest
export SNAPSHOTER_GCS_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-gcs-service:latest
export TAGGER_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-tagger-service:latest

./scripts/deploy_services.sh
Configure Terraform variables
This deployment uses Terraform for configurations and a deployment script.
In Cloud Shell, create a new Terraform TFVARS file in which you can override the variables in this section:

export VARS=FILENAME.tfvars

Replace FILENAME with the name of the variables file that you created (for example, my-variables). You can use the example-variables file as a reference.

In the TFVARS file, configure the project variables:
project = "PROJECT_ID"
compute_region = "COMPUTE_REGION"
data_region = "DATA_REGION"

You can use the default values that are defined in the variables.tf file or change the values.
Configure the Terraform service account, which you created and prepared earlier in Before you begin:

terraform_service_account = "bq-backup-mgr-terraform@PROJECT_ID.iam.gserviceaccount.com"

Make sure that you use the full email address of the account that you created.
Configure the Cloud Run services to use the container images that you built and deployed earlier:

dispatcher_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-dispatcher-service:latest"
configurator_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-configurator-service:latest"
snapshoter_bq_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-bq-service:latest"
snapshoter_gcs_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-gcs-service:latest"
tagger_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-tagger-service:latest"

This configuration instructs Terraform to use these published images in the Cloud Run services, which Terraform creates later.
Terraform only links a Cloud Run service to an existing image. It doesn't build the images from the codebase, because that was completed in a previous step.
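Optionally, before you run Terraform, you can confirm that the images were published. The following check is a minimal sketch that assumes the repository and image names used earlier in this guide:

# List the images that the deploy script pushed to the Artifact Registry repository.
gcloud artifacts docker images list ${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}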
In the schedulers variable, define at least one scheduler. The scheduler periodically lists and checks tables for required backups, based on their table-level backup cron schedules.

{
  name = "SCHEDULER_NAME"
  cron = "SCHEDULER_CRON"
  payload = {
    is_force_run = FORCE_RUN
    is_dry_run = DRY_RUN
    folders_include_list = [FOLDERS_INCLUDED]
    projects_include_list = [PROJECTS_INCLUDED]
    projects_exclude_list = [PROJECTS_EXCLUDED]
    datasets_include_list = [DATASETS_INCLUDED]
    datasets_exclude_list = [DATASETS_EXCLUDED]
    tables_include_list = [TABLES_INCLUDED]
    tables_exclude_list = [TABLES_EXCLUDED]
  }
}

Replace the following:
- SCHEDULER_NAME: the display name of the Cloud Scheduler.
- SCHEDULER_CRON: the frequency with which the scheduler checks whether a backup is due for the in-scope tables, based on their individual backup schedules. This can be any unix-cron compatible string. For example, 0 * * * * is an hourly frequency.
- FORCE_RUN: a boolean value. Set the value to false if you want the scheduler to use the tables' cron schedules. If set to true, all in-scope tables are backed up, regardless of their cron setting.
- DRY_RUN: a boolean value. When set to true, no actual backup operations take place. Only log messages are generated. Use true when you want to test and debug the solution without incurring backup costs.
- FOLDERS_INCLUDED: a list of numerical IDs for folders that contain BigQuery data (for example, 1234, 456). When set, the solution backs up the tables in the specified folders, and ignores the projects_include_list, datasets_include_list, and tables_include_list field settings.
- PROJECTS_INCLUDED: a list of project names (for example, "project1", "project2"). When set, the solution backs up the tables in the specified projects, and ignores the datasets_include_list and tables_include_list field settings. This setting is ignored if you set the folders_include_list field.
- PROJECTS_EXCLUDED: a list of project names or regular expressions (for example, "project1", "regex:^test_"). When set, the solution does not take backups of the tables in the specified projects. You can use this setting in combination with the folders_include_list field.
- DATASETS_INCLUDED: a list of datasets (for example, "project1.dataset1", "project1.dataset2"). When set, the solution backs up the tables in the specified datasets, and ignores the tables_include_list field setting. This setting is ignored if you set the folders_include_list or projects_include_list fields.
- DATASETS_EXCLUDED: a list of datasets or regular expressions (for example, "project1.dataset1", "regex:.*\\_landing$"). When set, the solution does not take backups of the tables in the specified datasets. You can use this setting in combination with the folders_include_list or projects_include_list fields.
- TABLES_INCLUDED: a list of tables (for example, "project1.dataset1.table 1", "project1.dataset2.table2"). When set, the solution backs up the specified tables. This setting is ignored if you set the folders_include_list, projects_include_list, or datasets_include_list fields.
- TABLES_EXCLUDED: a list of tables or regular expressions (for example, "project1.dataset1.table 1", "regex:.*\_test"). When set, the solution does not take backups of the specified tables. You can use this setting in combination with the folders_include_list, projects_include_list, or datasets_include_list fields.

All exclusion lists accept regular expressions in the form regex:REGULAR_EXPRESSION. If the fully qualified entry name (for example, "project.dataset.table") matches any of the supplied regular expressions, it's excluded from the backup scope.

The following are some common use cases:

- Exclude all dataset names that end with _landing: datasets_exclude_list = ["regex:.*\\_landing$"]
- Exclude all tables ending with _test, _tst, _bkp, or _copy: tables_exclude_list = ["regex:.*\_(test|tst|bkp|copy)"]
Define fallback policies
On each run, the solution needs to determine the backup policy of each in-scope table. For more information about the types of policies, see Backup policies. This section shows you how to define a fallback policy.
A fallback policy is defined with a default_policy variable and a set of exceptions or overrides on different levels (folder, project, dataset, and table). This approach provides granular flexibility without the need for an entry for each table.
There are additional sets of policy fields, depending on the backup method that you decide to use: BigQuery snapshots, exports to Cloud Storage, or both.
In the TFVARS file, for the default_policy variable, set the following common fields for the default policy:

fallback_policy = {
  "default_policy" : {
    "backup_cron" : "BACKUP_CRON",
    "backup_method" : "BACKUP_METHOD",
    "backup_time_travel_offset_days" : "OFFSET_DAYS",
    "backup_storage_project" : "BACKUP_STORAGE_PROJECT",
    "backup_operation_project" : "BACKUP_OPERATIONS_PROJECT",

Replace the following:
- BACKUP_CRON: a cron expression to set the frequency with which a table is backed up (for example, for backups every 6 hours, specify 0 0 */6 * * *). This must be a Spring-Framework compatible cron expression.
- BACKUP_METHOD: the method, which you specify as BigQuery Snapshot, GCS Snapshot (to use the export to Cloud Storage method), or Both. You need to provide the required fields for each chosen backup method, as shown later.
- OFFSET_DAYS: the number of days in the past that determines the point in time from which to back up the tables. Values can be a number between 0 and 7.
- BACKUP_STORAGE_PROJECT: the ID of the project where all snapshot and export operations are stored. This is the same project where the bq_snapshot_storage_dataset and gcs_snapshot_storage_location reside. Small deployments can use the host project, but large-scale deployments should use a separate project.
- BACKUP_OPERATIONS_PROJECT: an optional setting, where you specify the ID of the project where all snapshot and export operations run. Snapshot and export job quotas and limits are applicable to this project. This can be the same value as backup_storage_project. If not set, the solution uses the source table's project.
If you specified BigQuery Snapshot or Both as the backup_method, add the following fields after the common fields, in the default_policy variable:

"bq_snapshot_expiration_days" : "SNAPSHOT_EXPIRATION",
"bq_snapshot_storage_dataset" : "DATASET_NAME",

Replace the following:

- SNAPSHOT_EXPIRATION: the number of days to keep each snapshot (for example, 15).
- DATASET_NAME: the name of the dataset to store snapshots in (for example, backups). The dataset must already exist in the project specified for backup_storage_project.
If you specified GCS Snapshot (to use the export to Cloud Storage method) or Both as the backup_method, add the following fields to the default_policy variable:

"gcs_snapshot_storage_location" : "STORAGE_BUCKET",
"gcs_snapshot_format" : "FILE_FORMAT",
"gcs_avro_use_logical_types" : AVRO_TYPE,
"gcs_csv_delimiter" : "CSV_DELIMITER",
"gcs_csv_export_header" : CSV_EXPORT_HEADER

Replace the following:

- STORAGE_BUCKET: the Cloud Storage bucket in which to store the exported data, in the format gs://bucket/path/. For example, gs://bucket1/backups/.
- FILE_FORMAT: the file format and compression used to export a BigQuery table to Cloud Storage. Available values are CSV, CSV_GZIP, JSON, JSON_GZIP, AVRO, AVRO_DEFLATE, AVRO_SNAPPY, PARQUET, PARQUET_SNAPPY, and PARQUET_GZIP.
- AVRO_TYPE: a boolean value. If set to false, the BigQuery types are exported as strings. If set to true, the types are exported as their corresponding Avro logical type. This field is required when the gcs_snapshot_format is any Avro type format.
- CSV_DELIMITER: the delimiter used for the exported CSV files, and the value can be any ISO-8859-1 single-byte character. You can use \t or tab to specify tab delimiters. This field is required when the gcs_snapshot_format is any CSV type format.
- CSV_EXPORT_HEADER: a boolean value. If set to true, the column headers are exported to the CSV files. This field is required when the gcs_snapshot_format is any CSV type format.
For details and Avro type mapping, see the following table:
BigQuery type    Avro logical type
TIMESTAMP        timestamp-micros (annotates Avro LONG)
DATE             date (annotates Avro INT)
TIME             timestamp-micro (annotates Avro LONG)
DATETIME         STRING (custom named logical type datetime)
Add override variables for specific folders, projects, datasets, and tables:

},
"folder_overrides" : {
  "FOLDER_NUMBER" : {
  },
},
"project_overrides" : {
  "PROJECT_NAME" : {
  }
},
"dataset_overrides" : {
  "PROJECT_NAME.DATASET_NAME" : {
  }
},
"table_overrides" : {
  "PROJECT_NAME.DATASET_NAME.TABLE_NAME" : {
  }
}
}

Replace the following:
- FOLDER_NUMBER: specify the folder for which you want to set override fields.
- PROJECT_NAME: specify the project when you set override fields for a particular project, dataset, or table.
- DATASET_NAME: specify the dataset when you set override fields for a particular dataset or table.
- TABLE_NAME: specify the table for which you want to set override fields.
For each override entry, such as a specific project in the project_overrides variable, add the common fields and the required fields for the backup method that you specified earlier in default_policy. If you don't want to set overrides for a particular level, set that variable to an empty map (for example, project_overrides : {}).

In the following example, override fields are set for a specific table that uses the BigQuery snapshot method:

},
"project_overrides" : {
},
"table_overrides" : {
  "example_project1.dataset1.table1" : {
    "backup_cron" : "0 0 */5 * * *", # every 5 hours each day
    "backup_method" : "BigQuery Snapshot",
    "backup_time_travel_offset_days" : "7",
    "backup_storage_project" : "projectname",
    "backup_operation_project" : "projectname",
    # bq settings
    "bq_snapshot_expiration_days" : "14",
    "bq_snapshot_storage_dataset" : "backups2"
  },
}
}
For a full example of a fallback policy, see the example-variables file.
Configure additional backup operation projects
If you want to specify additional backup projects, such as those defined in external configurations (table-level backup policy) or the table source projects, configure the following variable:

additional_backup_operation_projects = [ADDITIONAL_BACKUPS]

Replace ADDITIONAL_BACKUPS with a comma-separated list of project names (for example, "project1", "project2"). If you're using only the fallback backup policy without table-level external policies, you can set the value to an empty list.

If you don't add this field, any projects that are specified in the optional backup_operation_project field are automatically included as backup projects.
Configure Terraform service account permissions
In the previous steps, you configured the backup projects where the backup operations run. Terraform needs to deploy resources to those backup projects.
The service account that Terraform uses must have the required permissions forthese specified backup projects.
In Cloud Shell, grant the service account permissions for all of the projects where backup operations run:
./scripts/prepare_backup_operation_projects_for_terraform.sh BACKUP_OPERATIONS_PROJECT DATA_PROJECTS ADDITIONAL_BACKUPS

Replace the following:

- BACKUP_OPERATIONS_PROJECT: any projects defined in the backup_operation_project fields in any of the fallback policies and table-level policies.
- DATA_PROJECTS: if no backup_operation_project field is defined in a fallback or table-level policy, include the projects for those source tables.
- ADDITIONAL_BACKUPS: any projects that are defined in the additional_backup_operation_projects Terraform variable.
Run the deployment scripts
In Cloud Shell, run the Terraform deployment script:
cd terraform

terraform init \
    -backend-config="bucket=${BUCKET_NAME}" \
    -backend-config="prefix=terraform-state" \
    -backend-config="impersonate_service_account=$TF_SA@$PROJECT_ID.iam.gserviceaccount.com"

terraform plan -var-file=$VARS

terraform apply -var-file=$VARS

Add the time to live (TTL) policy for Firestore:

gcloud firestore fields ttls update expires_at \
    --collection-group=project_folder_cache \
    --enable-ttl \
    --async \
    --project=$PROJECT_ID

The solution uses Datastore as a cache in some situations. To save costs and improve lookup performance, the TTL policy allows Firestore to automatically delete entries that are expired.
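Because the TTL update runs asynchronously, you can optionally check its status afterward. The following command is a sketch and assumes that your gcloud version includes the ttls list subcommand:

# Optional: confirm that the TTL policy on the expires_at field is active.
gcloud firestore fields ttls list --project=$PROJECT_ID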
Set up access to sources and destinations
In Cloud Shell, set the following variables for the service accounts used by the solution:

export SA_DISPATCHER_EMAIL=dispatcher@${PROJECT_ID}.iam.gserviceaccount.com
export SA_CONFIGURATOR_EMAIL=configurator@${PROJECT_ID}.iam.gserviceaccount.com
export SA_SNAPSHOTER_BQ_EMAIL=snapshoter-bq@${PROJECT_ID}.iam.gserviceaccount.com
export SA_SNAPSHOTER_GCS_EMAIL=snapshoter-gcs@${PROJECT_ID}.iam.gserviceaccount.com
export SA_TAGGER_EMAIL=tagger@${PROJECT_ID}.iam.gserviceaccount.com

If you've changed the default names in Terraform, update the service account emails.
If you've set the folders_include_list field, and want to set the scope of the BigQuery scan to include certain folders, grant the required permissions on the folder level:

./scripts/prepare_data_folders.sh FOLDERS_INCLUDED

To enable the application to execute the necessary tasks in different projects, grant the required permissions on each of these projects:

./scripts/prepare_data_projects.sh DATA_PROJECTS
./scripts/prepare_backup_storage_projects.sh BACKUP_STORAGE_PROJECT
./scripts/prepare_backup_operation_projects.sh BACKUP_OPERATIONS_PROJECT

Replace the following:
- DATA_PROJECTS: the data projects (or source projects) that contain the source tables that you want to back up (for example, project1 project2). Include the following projects:
  - Projects that are specified in the inclusion lists in the Terraform variable schedulers.
  - If you want to back up tables in the host project, include the host project.
- BACKUP_STORAGE_PROJECT: the backup storage projects (or destination projects) where the solution stores the backups (for example, project1 project2). You need to include the projects that are specified in the following fields:
  - The backup_storage_project fields in all of the fallback policies.
  - The backup_storage_project fields in all of the table-level policies.
  Include backup storage projects that are used in multiple fields or that are used as both the source and destination project.
- BACKUP_OPERATIONS_PROJECT: the data operation projects where the solution runs the backup operations (for example, project1 project2). You need to include the projects that are specified in the following fields:
  - The backup_operation_project fields in all of the fallback policies.
  - All inclusion lists in the scope of the BigQuery scan (if you don't set the backup_operation_project field).
  - The backup_operation_project fields in all of the table-level policies.
  Include backup operations projects that are used in multiple fields or that are used as both the source and destination project.
For tables that use column-level access control, identify all policy tag taxonomies that are used by your tables (if any), and grant the solution's service accounts access to the table data:

TAXONOMY="projects/TAXONOMY_PROJECT/locations/TAXONOMY_LOCATION/taxonomies/TAXONOMY_ID"

gcloud data-catalog taxonomies add-iam-policy-binding \
    $TAXONOMY \
    --member="serviceAccount:${SA_SNAPSHOTER_BQ_EMAIL}" \
    --role='roles/datacatalog.categoryFineGrainedReader'

gcloud data-catalog taxonomies add-iam-policy-binding \
    $TAXONOMY \
    --member="serviceAccount:${SA_SNAPSHOTER_GCS_EMAIL}" \
    --role='roles/datacatalog.categoryFineGrainedReader'

Replace the following:
- TAXONOMY_PROJECT: the project ID in the policy tag taxonomy
- TAXONOMY_LOCATION: the location specified in the policy tag taxonomy
- TAXONOMY_ID: the taxonomy ID of the policy tag taxonomy
Repeat the previous step for each policy tag taxonomy.
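If you have several taxonomies, you can loop over the same commands to reduce repetition. The following is a sketch; the taxonomy resource names are placeholders for your own values:

# Replace the placeholder taxonomy resource names with your own.
for TAXONOMY in \
  "projects/TAXONOMY_PROJECT_1/locations/TAXONOMY_LOCATION_1/taxonomies/TAXONOMY_ID_1" \
  "projects/TAXONOMY_PROJECT_2/locations/TAXONOMY_LOCATION_2/taxonomies/TAXONOMY_ID_2"; do
  for SA_EMAIL in "${SA_SNAPSHOTER_BQ_EMAIL}" "${SA_SNAPSHOTER_GCS_EMAIL}"; do
    # Grant each snapshoter service account fine-grained read access to the taxonomy.
    gcloud data-catalog taxonomies add-iam-policy-binding \
        "$TAXONOMY" \
        --member="serviceAccount:${SA_EMAIL}" \
        --role='roles/datacatalog.categoryFineGrainedReader'
  done
done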
Run the solution
After you deploy the solution, use the following sections to run and manage the solution.
Set table-level backup policies
In Cloud Shell, create a table-level policy with the required fields, and then store the policy in the Cloud Storage bucket for policies:

# Use the default backup policies bucket unless overwritten in the .tfvars
export POLICIES_BUCKET=${PROJECT_ID}-bq-backup-manager-policies

# set target table info
export TABLE_PROJECT='TABLE_PROJECT'
export TABLE_DATASET='TABLE_DATASET'
export TABLE='TABLE_NAME'

# Config Source must be 'MANUAL' when assigned this way
export BACKUP_POLICY="{
'config_source' : 'MANUAL',
'backup_cron' : 'BACKUP_CRON',
'backup_method' : 'BACKUP_METHOD',
'backup_time_travel_offset_days' : 'OFFSET_DAYS',
'backup_storage_project' : 'BACKUP_STORAGE_PROJECT',
'backup_operation_project' : 'BACKUP_OPERATION_PROJECT',
'gcs_snapshot_storage_location' : 'STORAGE_BUCKET',
'gcs_snapshot_format' : 'FILE_FORMAT',
'gcs_avro_use_logical_types' : 'AVRO_TYPE',
'bq_snapshot_storage_dataset' : 'DATASET_NAME',
'bq_snapshot_expiration_days' : 'SNAPSHOT_EXPIRATION'
}"

# File name MUST BE backup_policy.json
echo $BACKUP_POLICY >> backup_policy.json

gcloud storage cp backup_policy.json gs://${POLICIES_BUCKET}/policy/project=${TABLE_PROJECT}/dataset=${TABLE_DATASET}/table=${TABLE}/backup_policy.json

Replace the following:
- TABLE_PROJECT: the project in which the table resides
- TABLE_DATASET: the dataset of the table
- TABLE_NAME: the name of the table
Trigger backup operations
The Cloud Scheduler jobs that you configured earlier run automatically based on their cron expression.
You can also manually run the jobs in the Google Cloud console. For more information, see Run your job.
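You can also trigger a scheduler job from the command line. The following command is a sketch: SCHEDULER_NAME is a placeholder for one of the job names that you defined in the schedulers Terraform variable, and the --location value assumes that the jobs were created in your compute region.

# Manually trigger one Cloud Scheduler job outside of its cron schedule.
gcloud scheduler jobs run SCHEDULER_NAME --location=$COMPUTE_REGION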
Monitor and report
With your host project (PROJECT_ID) selected, you can run the following queries in BigQuery Studio to get reports and information.
Get progress statistics of each run (including in-progress runs):
SELECT * FROM `bq_backup_manager.v_run_summary_counts`

Get all fatal (non-retryable) errors for a single run:

SELECT * FROM `bq_backup_manager.v_errors_non_retryable` WHERE run_id = 'RUN_ID'

Replace RUN_ID with the ID of the run.
Get all runs on a table and their execution information:
SELECT * FROM `bq_backup_manager.v_audit_log_by_table` WHERE tablespec = 'project.dataset.table'

You can also specify a grouped version:

SELECT * FROM `bq_backup_manager.v_audit_log_by_table_grouped`, UNNEST(runs) r WHERE r.run_has_retryable_error = FALSE

For debugging, you can get detailed request and response information for each service invocation:
SELECT
  jsonPayload.unified_target_table AS tablespec,
  jsonPayload.unified_run_id AS run_id,
  jsonPayload.unified_tracking_id AS tracking_id,
  CAST(jsonPayload.unified_is_successful AS BOOL) AS configurator_is_successful,
  jsonPayload.unified_error AS configurator_error,
  CAST(jsonPayload.unified_is_retryable_error AS BOOL) AS configurator_is_retryable_error,
  CAST(JSON_VALUE(jsonPayload.unified_input_json, '$.isForceRun') AS BOOL) AS is_force_run,
  CAST(JSON_VALUE(jsonPayload.unified_output_json, '$.isBackupTime') AS BOOL) AS is_backup_time,
  JSON_VALUE(jsonPayload.unified_output_json, '$.backupPolicy.method') AS backup_method,
  CAST(JSON_VALUE(jsonPayload.unified_input_json, '$.isDryRun') AS BOOL) AS is_dry_run,
  jsonPayload.unified_input_json AS request_json,
  jsonPayload.unified_output_json AS response_json
FROM `bq_backup_manager.run_googleapis_com_stdout`
WHERE jsonPayload.global_app_log = 'UNIFIED_LOG'
-- 1= dispatcher, 2= configurator, 3=bq snapshoter, -3=gcs snapshoter and 4=tagger
AND jsonPayload.unified_component = "2"

Get the backup policies that are manually added or assigned by the system based on fallbacks:

SELECT * FROM `bq_backup_manager.ext_backup_policies`
Limitations
For more information about limits and quotas for each project that is specified in the backup_operation_project fields, see Limits.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this deployment, either delete the projects that contain the resources, or keep the projects and delete the individual resources.
Delete the projects
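If you created a new project specifically for this deployment, the simplest cleanup is to delete that project, which removes all of the resources it contains. The following command is a sketch of the standard approach; deleting a project is irreversible, so confirm the project ID first:

# Caution: this permanently deletes the project and everything in it.
gcloud projects delete PROJECT_ID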
Delete the new resources
As an alternative to deleting the projects, you can delete the resourcescreated during this procedure.
In Cloud Shell, delete the Terraform resources:
terraform destroy -var-file="${VARS}"

The command deletes almost all of the resources. Check to ensure that all the resources you want to delete are removed.
What's next
- Learn more about BigQuery:
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Karim Wadie | Strategic Cloud Engineer
Other contributors:
- Chris DeForeest | Site Reliability Engineer
- Eyal Ben Ivri | Cloud Solutions Architect
- Jason Davenport | Developer Advocate
- Jaliya Ekanayake | Engineering Manager
- Muhammad Zain | Strategic Cloud Engineer