Deploy scalable BigQuery backup automation
This document describes how you deploy Scalable BigQuery backup automation.
This document is intended for cloud architects, engineers, and data governance officers who want to define and automate data policies in their organizations. Experience with Terraform is helpful.
Architecture
The following diagram shows the automated backup architecture:
Cloud Scheduler triggers the run. The dispatcher service, using the BigQuery API, lists the in-scope tables. Through a Pub/Sub message, the dispatcher service submits one request for each table to the configurator service. The configurator service determines the backup policies for the tables, and then submits one request for each table to the relevant Cloud Run service. The Cloud Run service then submits a request to the BigQuery API and runs the backup operations. Pub/Sub triggers the tagger service, which logs the results and updates the backup state in the Cloud Storage metadata layer.
For details about the architecture, see Scalable BigQuery backup automation.
Objectives
- Build Cloud Run services.
- Configure Terraform variables.
- Run the Terraform and manual deployment scripts.
- Run the solution.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery
- Pub/Sub
- Cloud Logging
- Cloud Run
- Cloud Storage
- Cloud Scheduler
- Firestore in Datastore mode (Datastore)
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
If you're redeploying the solution (for example, after new commits), you can skip this section.
In this section, you create one-time resources.
In the Google Cloud console, activate Cloud Shell.
If you want to create a new Google Cloud project to use as the host project for the deployment, use the gcloud projects create command:

gcloud projects create PROJECT_ID

Replace PROJECT_ID with the ID of the project you want to create.
Install Maven:
- Download Maven (for one way to do this in Cloud Shell, see the sketch after these steps).
In Cloud Shell, add Maven to PATH:

export PATH=/DOWNLOADED_MAVEN_DIR/bin:$PATH
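The following commands are a minimal sketch of one way to download and extract Maven directly in Cloud Shell. The Maven version and download URL are assumptions; substitute the release that you want to use.

# The version and mirror below are assumptions; use any current Maven 3.x release.
wget https://dlcdn.apache.org/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz
tar -xzf apache-maven-3.9.6-bin.tar.gz
export PATH=$PWD/apache-maven-3.9.6/bin:$PATH
# Confirm that Maven is on PATH.
mvn -version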
In Cloud Shell, clone the GitHub repository:
git clone https://github.com/GoogleCloudPlatform/bq-backup-manager.git

Set and export the following environment variables:
export PROJECT_ID=PROJECT_ID
export TF_SA=bq-backup-mgr-terraform
export COMPUTE_REGION=COMPUTE_REGION
export DATA_REGION=DATA_REGION
export BUCKET_NAME=${PROJECT_ID}-bq-backup-mgr
export BUCKET=gs://${BUCKET_NAME}
export DOCKER_REPO_NAME=docker-repo
export CONFIG=bq-backup-manager
export ACCOUNT=ACCOUNT_EMAIL

gcloud config configurations create $CONFIG
gcloud config set project $PROJECT_ID
gcloud config set account $ACCOUNT
gcloud config set compute/region $COMPUTE_REGION

gcloud auth login
gcloud auth application-default login

Replace the following:
- PROJECT_ID: the ID of the Google Cloud host project that you want to deploy the solution to.
- COMPUTE_REGION: the Google Cloud region where you want to deploy compute resources like Cloud Run and Identity and Access Management (IAM).
- DATA_REGION: the Google Cloud region you want to deploy data resources (such as buckets and datasets) to.
- ACCOUNT_EMAIL: the user account email address.
Enable the APIs:
./scripts/enable_gcp_apis.sh

The script enables the following APIs:
- Cloud Resource Manager API
- IAM API
- Data Catalog API
- Artifact Registry API
- BigQuery API
- Pub/Sub API
- Cloud Storage API
- Cloud Run Admin API
- Cloud Build API
- Service Usage API
- App Engine Admin API
- Serverless VPC Access API
- Cloud DNS API
Prepare the Terraform state bucket:
gcloud storage buckets create $BUCKET --project=$PROJECT_ID --location=$COMPUTE_REGION --uniform-bucket-level-access

Prepare the Terraform service account:
./scripts/prepare_terraform_service_account.sh

To publish the images that this solution uses, prepare a Docker repository:
gcloud artifacts repositories create $DOCKER_REPO_NAME --repository-format=docker \
    --location=$COMPUTE_REGION \
    --description="Docker repository for backups"
Deploy the infrastructure
Make sure that you've completed Before you begin at least once.

In this section, follow the steps to deploy or redeploy the latest codebase to the Google Cloud environment.
Activate the gcloud CLI configuration
In Cloud Shell, activate and authenticate the gcloud CLI configuration:

gcloud config configurations activate $CONFIG

gcloud auth login
gcloud auth application-default login
Build Cloud Run service images
In Cloud Shell, build and deploy the Docker images to be used by the Cloud Run services:
export DISPATCHER_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-dispatcher-service:latest
export CONFIGURATOR_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-configurator-service:latest
export SNAPSHOTER_BQ_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-bq-service:latest
export SNAPSHOTER_GCS_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-gcs-service:latest
export TAGGER_IMAGE=${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-tagger-service:latest

./scripts/deploy_services.sh
Configure Terraform variables
This deployment uses Terraform for configurations and a deployment script.
In Cloud Shell, create a new Terraform TFVARS file in which you can override the variables in this section:

export VARS=FILENAME.tfvars

Replace FILENAME with the name of the variables file that you created (for example, my-variables). You can use the example-variables file as a reference.

In the TFVARS file, configure the project variables:
project = "PROJECT_ID"
compute_region = "COMPUTE_REGION"
data_region = "DATA_REGION"

You can use the default values that are defined in the variables.tf file or change the values.
Configure the Terraform service account, which you created and prepared earlier in Before you begin:

terraform_service_account = "bq-backup-mgr-terraform@PROJECT_ID.iam.gserviceaccount.com"

Make sure that you use the full email address of the account that you created.
Configure the Cloud Run services to use the container images that you built and deployed earlier:

dispatcher_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-dispatcher-service:latest"
configurator_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-configurator-service:latest"
snapshoter_bq_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-bq-service:latest"
snapshoter_gcs_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-snapshoter-gcs-service:latest"
tagger_service_image = "${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}/bqsm-tagger-service:latest"

This configuration instructs Terraform to use these published images in the Cloud Run services, which Terraform creates later.
Terraform only links a Cloud Run service to an existing image. It doesn't build the images from the codebase, because that was completed in a previous step.
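Optionally, before you run Terraform, you can confirm that the images were published. The following check is a minimal sketch that assumes the repository and image names used earlier in this guide:

# List the images that the deploy script pushed to the Artifact Registry repository.
gcloud artifacts docker images list ${COMPUTE_REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_REPO_NAME}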
In the schedulers variable, define at least one scheduler. The scheduler periodically lists and checks tables for required backups, based on their table-level backup cron schedules.

{
  name = "SCHEDULER_NAME"
  cron = "SCHEDULER_CRON"
  payload = {
    is_force_run = FORCE_RUN
    is_dry_run = DRY_RUN
    folders_include_list = [FOLDERS_INCLUDED]
    projects_include_list = [PROJECTS_INCLUDED]
    projects_exclude_list = [PROJECTS_EXCLUDED]
    datasets_include_list = [DATASETS_INCLUDED]
    datasets_exclude_list = [DATASETS_EXCLUDED]
    tables_include_list = [TABLES_INCLUDED]
    tables_exclude_list = [TABLES_EXCLUDED]
  }
}

Replace the following:
- SCHEDULER_NAME: the display name of the Cloud Scheduler.
- SCHEDULER_CRON: the frequency with which the scheduler checks whether a backup is due for the in-scope tables, based on their individual backup schedules. This can be any unix-cron compatible string. For example, 0 * * * * is an hourly frequency.
- FORCE_RUN: a boolean value. Set the value to false if you want the scheduler to use the tables' cron schedules. If set to true, all in-scope tables are backed up, regardless of their cron setting.
- DRY_RUN: a boolean value. When set to true, no actual backup operations take place. Only log messages are generated. Use true when you want to test and debug the solution without incurring backup costs.
- FOLDERS_INCLUDED: a list of numerical IDs for folders that contain BigQuery data (for example, 1234, 456). When set, the solution backs up the tables in the specified folders, and ignores the projects_include_list, datasets_include_list, and tables_include_list field settings.
- PROJECTS_INCLUDED: a list of project names (for example, "project1", "project2"). When set, the solution backs up the tables in the specified projects, and ignores the datasets_include_list and tables_include_list field settings. This setting is ignored if you set the folders_include_list field.
- PROJECTS_EXCLUDED: a list of project names or regular expressions (for example, "project1", "regex:^test_"). When set, the solution does not take backups of the tables in the specified projects. You can use this setting in combination with the folders_include_list field.
- DATASETS_INCLUDED: a list of datasets (for example, "project1.dataset1", "project1.dataset2"). When set, the solution backs up the tables in the specified datasets, and ignores the tables_include_list field setting. This setting is ignored if you set the folders_include_list or projects_include_list fields.
- DATASETS_EXCLUDED: a list of datasets or regular expressions (for example, "project1.dataset1", "regex:.*\\_landing$"). When set, the solution does not take backups of the tables in the specified datasets. You can use this setting in combination with the folders_include_list or projects_include_list fields.
- TABLES_INCLUDED: a list of tables (for example, "project1.dataset1.table 1", "project1.dataset2.table2"). When set, the solution backs up the specified tables. This setting is ignored if you set the folders_include_list, projects_include_list, or datasets_include_list fields.
- TABLES_EXCLUDED: a list of tables or regular expressions (for example, "project1.dataset1.table 1", "regex:.*\_test"). When set, the solution does not take backups of the specified tables. You can use this setting in combination with the folders_include_list, projects_include_list, or datasets_include_list fields.

All exclusion lists accept regular expressions in the form regex:REGULAR_EXPRESSION. If the fully qualified entry name (for example, "project.dataset.table") matches any of the supplied regular expressions, it's excluded from the backup scope.

The following are some common use cases:

- Exclude all dataset names that end with _landing: datasets_exclude_list = ["regex:.*\\_landing$"]
- Exclude all tables ending with _test, _tst, _bkp, or _copy: tables_exclude_list = ["regex:.*\_(test|tst|bkp|copy)"]
Define fallback policies
On each run, the solution needs to determine the backup policy of each in-scope table. For more information about the types of policies, see Backup policies. This section shows you how to define a fallback policy.
A fallback policy is defined with a default_policy variable and a set of exceptions or overrides on different levels (folder, project, dataset, and table). This approach provides granular flexibility without the need for an entry for each table.
There are additional sets of policy fields, depending on the backup method that you decide to use: BigQuery snapshots, exports to Cloud Storage, or both.
In the TFVARS file, for the default_policy variable, set the following common fields for the default policy:

fallback_policy = {
  "default_policy" : {
    "backup_cron" : "BACKUP_CRON",
    "backup_method" : "BACKUP_METHOD",
    "backup_time_travel_offset_days" : "OFFSET_DAYS",
    "backup_storage_project" : "BACKUP_STORAGE_PROJECT",
    "backup_operation_project" : "BACKUP_OPERATIONS_PROJECT",

Replace the following:
- BACKUP_CRON: a cron expression to set the frequency with which a table is backed up (for example, for backups every 6 hours, specify 0 0 */6 * * *). This must be a Spring-Framework compatible cron expression.
- BACKUP_METHOD: the method, which you specify as BigQuery Snapshot, GCS Snapshot (to use the export to Cloud Storage method), or Both. You need to provide the required fields for each chosen backup method, as shown later.
- OFFSET_DAYS: the number of days in the past that determines the point in time from which to back up the tables. Values can be a number between 0 and 7.
- BACKUP_STORAGE_PROJECT: the ID of the project where all snapshot and export operations are stored. This is the same project where the bq_snapshot_storage_dataset and gcs_snapshot_storage_location reside. Small deployments can use the host project, but large-scale deployments should use a separate project.
- BACKUP_OPERATIONS_PROJECT: an optional setting, where you specify the ID of the project where all snapshot and export operations run. Snapshot and export job quotas and limits are applicable to this project. This can be the same value as backup_storage_project. If not set, the solution uses the source table's project.
If you specified BigQuery Snapshot or Both as the backup_method, add the following fields after the common fields, in the default_policy variable:

"bq_snapshot_expiration_days" : "SNAPSHOT_EXPIRATION",
"bq_snapshot_storage_dataset" : "DATASET_NAME",

Replace the following:

- SNAPSHOT_EXPIRATION: the number of days to keep each snapshot (for example, 15).
- DATASET_NAME: the name of the dataset to store snapshots in (for example, backups). The dataset must already exist in the project specified for backup_storage_project.
If you specified GCS Snapshot (to use the export to Cloud Storage method) or Both as the backup_method, add the following fields to the default_policy variable:

"gcs_snapshot_storage_location" : "STORAGE_BUCKET",
"gcs_snapshot_format" : "FILE_FORMAT",
"gcs_avro_use_logical_types" : AVRO_TYPE,
"gcs_csv_delimiter" : "CSV_DELIMITER",
"gcs_csv_export_header" : CSV_EXPORT_HEADER

Replace the following:

- STORAGE_BUCKET: the Cloud Storage bucket in which to store the exported data, in the format gs://bucket/path/. For example, gs://bucket1/backups/.
- FILE_FORMAT: the file format and compression used to export a BigQuery table to Cloud Storage. Available values are CSV, CSV_GZIP, JSON, JSON_GZIP, AVRO, AVRO_DEFLATE, AVRO_SNAPPY, PARQUET, PARQUET_SNAPPY, and PARQUET_GZIP.
- AVRO_TYPE: a boolean value. If set to false, the BigQuery types are exported as strings. If set to true, the types are exported as their corresponding Avro logical type. This field is required when the gcs_snapshot_format is any Avro type format.
- CSV_DELIMITER: the delimiter used for the exported CSV files, and the value can be any ISO-8859-1 single-byte character. You can use \t or tab to specify tab delimiters. This field is required when the gcs_snapshot_format is any CSV type format.
- CSV_EXPORT_HEADER: a boolean value. If set to true, the column headers are exported to the CSV files. This field is required when the gcs_snapshot_format is any CSV type format.
For details and Avro type mapping, see the following table:
BigQuery type    Avro logical type
TIMESTAMP        timestamp-micros (annotates Avro LONG)
DATE             date (annotates Avro INT)
TIME             timestamp-micro (annotates Avro LONG)
DATETIME         STRING (custom named logical type datetime)
Add override variables for specific folders, projects, datasets, and tables:

},
"folder_overrides" : {
  "FOLDER_NUMBER" : {
  },
},
"project_overrides" : {
  "PROJECT_NAME" : {
  }
},
"dataset_overrides" : {
  "PROJECT_NAME.DATASET_NAME" : {
  }
},
"table_overrides" : {
  "PROJECT_NAME.DATASET_NAME.TABLE_NAME" : {
  }
}
}

Replace the following:
- FOLDER_NUMBER: specify the folder for which you want to set override fields.
- PROJECT_NAME: specify the project when you set override fields for a particular project, dataset, or table.
- DATASET_NAME: specify the dataset when you set override fields for a particular dataset or table.
- TABLE_NAME: specify the table for which you want to set override fields.
For each override entry, such as a specific project in the project_overrides variable, add the common fields and the required fields for the backup method that you specified earlier in default_policy. If you don't want to set overrides for a particular level, set that variable to an empty map (for example, project_overrides : {}).

In the following example, override fields are set for a specific table that uses the BigQuery snapshot method:

},
"project_overrides" : {
},
"table_overrides" : {
  "example_project1.dataset1.table1" : {
    "backup_cron" : "0 0 */5 * * *", # every 5 hours each day
    "backup_method" : "BigQuery Snapshot",
    "backup_time_travel_offset_days" : "7",
    "backup_storage_project" : "projectname",
    "backup_operation_project" : "projectname",
    # bq settings
    "bq_snapshot_expiration_days" : "14",
    "bq_snapshot_storage_dataset" : "backups2"
  },
}
}
For a full example of a fallback policy, see the example-variables file.
Configure additional backup operation projects
If you want to specify additional backup projects, such as those defined in external configurations (table-level backup policy) or the table source projects, configure the following variable:

additional_backup_operation_projects = [ADDITIONAL_BACKUPS]

Replace ADDITIONAL_BACKUPS with a comma-separated list of project names (for example, "project1", "project2"). If you're using only the fallback backup policy without table-level external policies, you can set the value to an empty list.

If you don't add this field, any projects that are specified in the optional backup_operation_project field are automatically included as backup projects.
Configure Terraform service account permissions
In the previous steps, you configured the backup projects where the backup operations run. Terraform needs to deploy resources to those backup projects.
The service account that Terraform uses must have the required permissions forthese specified backup projects.
In Cloud Shell, grant the service account permissions for all of the projects where backup operations run:
./scripts/prepare_backup_operation_projects_for_terraform.sh BACKUP_OPERATIONS_PROJECT DATA_PROJECTS ADDITIONAL_BACKUPS

Replace the following:

- BACKUP_OPERATIONS_PROJECT: any projects defined in the backup_operation_project fields in any of the fallback policies and table-level policies.
- DATA_PROJECTS: if no backup_operation_project field is defined in a fallback or table-level policy, include the projects for those source tables.
- ADDITIONAL_BACKUPS: any projects that are defined in the additional_backup_operation_projects Terraform variable.
Run the deployment scripts
In Cloud Shell, run the Terraform deployment script:
cd terraform

terraform init \
    -backend-config="bucket=${BUCKET_NAME}" \
    -backend-config="prefix=terraform-state" \
    -backend-config="impersonate_service_account=$TF_SA@$PROJECT_ID.iam.gserviceaccount.com"

terraform plan -var-file=$VARS

terraform apply -var-file=$VARS

Add the time to live (TTL) policy for Firestore:

gcloud firestore fields ttls update expires_at \
    --collection-group=project_folder_cache \
    --enable-ttl \
    --async \
    --project=$PROJECT_ID

The solution uses Datastore as a cache in some situations. To save costs and improve lookup performance, the TTL policy allows Firestore to automatically delete entries that are expired.
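Because the TTL update runs asynchronously, you can optionally check its status afterward. The following command is a sketch and assumes that your gcloud version includes the ttls list subcommand:

# Optional: confirm that the TTL policy on the expires_at field is active.
gcloud firestore fields ttls list --project=$PROJECT_ID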
Set up access to sources and destinations
In Cloud Shell, set the following variables for the service accounts used by the solution:

export SA_DISPATCHER_EMAIL=dispatcher@${PROJECT_ID}.iam.gserviceaccount.com
export SA_CONFIGURATOR_EMAIL=configurator@${PROJECT_ID}.iam.gserviceaccount.com
export SA_SNAPSHOTER_BQ_EMAIL=snapshoter-bq@${PROJECT_ID}.iam.gserviceaccount.com
export SA_SNAPSHOTER_GCS_EMAIL=snapshoter-gcs@${PROJECT_ID}.iam.gserviceaccount.com
export SA_TAGGER_EMAIL=tagger@${PROJECT_ID}.iam.gserviceaccount.com

If you've changed the default names in Terraform, update the service account emails.
If you've set the folders_include_list field, and want to set the scope of the BigQuery scan to include certain folders, grant the required permissions on the folder level:

./scripts/prepare_data_folders.sh FOLDERS_INCLUDED

To enable the application to execute the necessary tasks in different projects, grant the required permissions on each of these projects:

./scripts/prepare_data_projects.sh DATA_PROJECTS
./scripts/prepare_backup_storage_projects.sh BACKUP_STORAGE_PROJECT
./scripts/prepare_backup_operation_projects.sh BACKUP_OPERATIONS_PROJECT

Replace the following:
- DATA_PROJECTS: the data projects (or source projects) that contain the source tables that you want to back up (for example, project1 project2). Include the following projects:
  - Projects that are specified in the inclusion lists in the Terraform variable schedulers.
  - If you want to back up tables in the host project, include the host project.
- BACKUP_STORAGE_PROJECT: the backup storage projects (or destination projects) where the solution stores the backups (for example, project1 project2). You need to include the projects that are specified in the following fields:
  - The backup_storage_project fields in all of the fallback policies.
  - The backup_storage_project fields in all of the table-level policies.
  Include backup storage projects that are used in multiple fields or that are used as both the source and destination project.
- BACKUP_OPERATIONS_PROJECT: the data operation projects where the solution runs the backup operations (for example, project1 project2). You need to include the projects that are specified in the following fields:
  - The backup_operation_project fields in all of the fallback policies.
  - All inclusion lists in the scope of the BigQuery scan (if you don't set the backup_operation_project field).
  - The backup_operation_project fields in all of the table-level policies.
  Include backup operations projects that are used in multiple fields or that are used as both the source and destination project.
For tables that use column-level access control, identify all policy tag taxonomies that are used by your tables (if any), and grant the solution's service accounts access to the table data:

TAXONOMY="projects/TAXONOMY_PROJECT/locations/TAXONOMY_LOCATION/taxonomies/TAXONOMY_ID"

gcloud data-catalog taxonomies add-iam-policy-binding \
    $TAXONOMY \
    --member="serviceAccount:${SA_SNAPSHOTER_BQ_EMAIL}" \
    --role='roles/datacatalog.categoryFineGrainedReader'

gcloud data-catalog taxonomies add-iam-policy-binding \
    $TAXONOMY \
    --member="serviceAccount:${SA_SNAPSHOTER_GCS_EMAIL}" \
    --role='roles/datacatalog.categoryFineGrainedReader'

Replace the following:
- TAXONOMY_PROJECT: the project ID in the policy tag taxonomy
- TAXONOMY_LOCATION: the location specified in the policy tag taxonomy
- TAXONOMY_ID: the taxonomy ID of the policy tag taxonomy
Repeat the previous step for each policy tag taxonomy.
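If you have several taxonomies, you can loop over the same commands to reduce repetition. The following is a sketch; the taxonomy resource names are placeholders for your own values:

# Replace the placeholder taxonomy resource names with your own.
for TAXONOMY in \
  "projects/TAXONOMY_PROJECT_1/locations/TAXONOMY_LOCATION_1/taxonomies/TAXONOMY_ID_1" \
  "projects/TAXONOMY_PROJECT_2/locations/TAXONOMY_LOCATION_2/taxonomies/TAXONOMY_ID_2"; do
  for SA_EMAIL in "${SA_SNAPSHOTER_BQ_EMAIL}" "${SA_SNAPSHOTER_GCS_EMAIL}"; do
    # Grant each snapshoter service account fine-grained read access to the taxonomy.
    gcloud data-catalog taxonomies add-iam-policy-binding \
        "$TAXONOMY" \
        --member="serviceAccount:${SA_EMAIL}" \
        --role='roles/datacatalog.categoryFineGrainedReader'
  done
done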
Run the solution
After you deploy the solution, use the following sections to run and manage the solution.
Set table-level backup policies
In Cloud Shell, create a table-level policy with the required fields, and then store the policy in the Cloud Storage bucket for policies:

# Use the default backup policies bucket unless overwritten in the .tfvars
export POLICIES_BUCKET=${PROJECT_ID}-bq-backup-manager-policies

# set target table info
export TABLE_PROJECT='TABLE_PROJECT'
export TABLE_DATASET='TABLE_DATASET'
export TABLE='TABLE_NAME'

# Config Source must be 'MANUAL' when assigned this way
export BACKUP_POLICY="{
'config_source' : 'MANUAL',
'backup_cron' : 'BACKUP_CRON',
'backup_method' : 'BACKUP_METHOD',
'backup_time_travel_offset_days' : 'OFFSET_DAYS',
'backup_storage_project' : 'BACKUP_STORAGE_PROJECT',
'backup_operation_project' : 'BACKUP_OPERATION_PROJECT',
'gcs_snapshot_storage_location' : 'STORAGE_BUCKET',
'gcs_snapshot_format' : 'FILE_FORMAT',
'gcs_avro_use_logical_types' : 'AVRO_TYPE',
'bq_snapshot_storage_dataset' : 'DATASET_NAME',
'bq_snapshot_expiration_days' : 'SNAPSHOT_EXPIRATION'
}"

# File name MUST BE backup_policy.json
echo $BACKUP_POLICY >> backup_policy.json

gcloud storage cp backup_policy.json gs://${POLICIES_BUCKET}/policy/project=${TABLE_PROJECT}/dataset=${TABLE_DATASET}/table=${TABLE}/backup_policy.json

Replace the following:
- TABLE_PROJECT: the project in which the table resides
- TABLE_DATASET: the dataset of the table
- TABLE_NAME: the name of the table
Trigger backup operations
The Cloud Scheduler jobs that you configured earlier run automatically based on their cron expression.
You can also manually run the jobs in the Google Cloud console. For more information, see Run your job.
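You can also trigger a scheduler job from the command line. The following command is a sketch: SCHEDULER_NAME is a placeholder for one of the job names that you defined in the schedulers Terraform variable, and the --location value assumes that the jobs were created in your compute region.

# Manually trigger one Cloud Scheduler job outside of its cron schedule.
gcloud scheduler jobs run SCHEDULER_NAME --location=$COMPUTE_REGION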
Monitor and report
With your host project (PROJECT_ID) selected, you can run the following queries in BigQuery Studio to get reports and information.
Get progress statistics of each run (including in-progress runs):
SELECT * FROM `bq_backup_manager.v_run_summary_counts`

Get all fatal (non-retryable) errors for a single run:

SELECT * FROM `bq_backup_manager.v_errors_non_retryable` WHERE run_id = 'RUN_ID'

Replace RUN_ID with the ID of the run.
Get all runs on a table and their execution information:
SELECT * FROM `bq_backup_manager.v_audit_log_by_table` WHERE tablespec = 'project.dataset.table'

You can also specify a grouped version:

SELECT * FROM `bq_backup_manager.v_audit_log_by_table_grouped`, UNNEST(runs) r WHERE r.run_has_retryable_error = FALSE

For debugging, you can get detailed request and response information for each service invocation:
SELECT
  jsonPayload.unified_target_table AS tablespec,
  jsonPayload.unified_run_id AS run_id,
  jsonPayload.unified_tracking_id AS tracking_id,
  CAST(jsonPayload.unified_is_successful AS BOOL) AS configurator_is_successful,
  jsonPayload.unified_error AS configurator_error,
  CAST(jsonPayload.unified_is_retryable_error AS BOOL) AS configurator_is_retryable_error,
  CAST(JSON_VALUE(jsonPayload.unified_input_json, '$.isForceRun') AS BOOL) AS is_force_run,
  CAST(JSON_VALUE(jsonPayload.unified_output_json, '$.isBackupTime') AS BOOL) AS is_backup_time,
  JSON_VALUE(jsonPayload.unified_output_json, '$.backupPolicy.method') AS backup_method,
  CAST(JSON_VALUE(jsonPayload.unified_input_json, '$.isDryRun') AS BOOL) AS is_dry_run,
  jsonPayload.unified_input_json AS request_json,
  jsonPayload.unified_output_json AS response_json
FROM `bq_backup_manager.run_googleapis_com_stdout`
WHERE jsonPayload.global_app_log = 'UNIFIED_LOG'
-- 1= dispatcher, 2= configurator, 3=bq snapshoter, -3=gcs snapshoter and 4=tagger
AND jsonPayload.unified_component = "2"

Get the backup policies that are manually added or assigned by the system based on fallbacks:

SELECT * FROM `bq_backup_manager.ext_backup_policies`
Limitations
For more information about limits and quotas for each project that is specified in the backup_operation_project fields, see Limits.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this deployment, either delete the projects that contain the resources, or keep the projects and delete the individual resources.
Delete the projects
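If you created a new project specifically for this deployment, the simplest cleanup is to delete that project, which removes all of the resources it contains. The following command is a sketch of the standard approach; deleting a project is irreversible, so confirm the project ID first:

# Caution: this permanently deletes the project and everything in it.
gcloud projects delete PROJECT_ID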
Delete the new resources
As an alternative to deleting the projects, you can delete the resourcescreated during this procedure.
In Cloud Shell, delete the Terraform resources:
terraform destroy -var-file="${VARS}"

The command deletes almost all of the resources. Check to ensure that all the resources you want to delete are removed.
What's next
- Learn more about BigQuery:
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Karim Wadie | Strategic Cloud Engineer
Other contributors:
- Chris DeForeest | Site Reliability Engineer
- Eyal Ben Ivri | Cloud Solutions Architect
- Jason Davenport | Developer Advocate
- Jaliya Ekanayake | Engineering Manager
- Muhammad Zain | Strategic Cloud Engineer