Manage data quality rules as code with Terraform
This tutorial explains how to manage Dataplex Universal Catalog data quality rules as code with Terraform, Cloud Build, and GitHub.
Many different options for data quality rules are available to define and measure the quality of your data. When you automate the process of deploying data quality rules as a part of your larger infrastructure management strategy, you ensure that your data is consistently and predictably subjected to the rules that you assign to it.
If you have different versions of a dataset for multiple environments, such as dev and prod environments, Terraform provides a reliable way to assign data quality rules to environment-specific versions of datasets.
Version control is also an important DevOps best practice. Managing your data quality rules as code provides you with versions of your data quality rules that are available in your GitHub history. Terraform can also save its state to Cloud Storage, which can store earlier versions of the state file.
For more information about Terraform and Cloud Build, see Overview of Terraform on Google Cloud and Cloud Build.
Architecture
To understand how this tutorial uses Cloud Build for managing Terraform executions, consider the following architecture diagram. Note that it uses GitHub branches (dev and prod) to represent actual environments.
[Figure: architecture diagram of the dev and prod environments.]
You can extend this behavior to deploy to more environments and to create projects under your organization hierarchy if needed.
The process starts when you push Terraform code to either the dev or prod branch. In this scenario, Cloud Build triggers and then applies Terraform manifests to achieve the state you want in the respective environment. On the other hand, when you push Terraform code to any other branch (for example, to a feature branch), Cloud Build runs to execute terraform plan, but nothing is applied to any environment.
Ideally, developers or operators make infrastructure proposals to non-protected branches and then submit them through pull requests. The Cloud Build GitHub app, discussed later in this tutorial, automatically triggers the build jobs and links the terraform plan reports to these pull requests. This way, you can discuss and review the potential changes with collaborators and add follow-up commits before changes are merged into the base branch.
If no concerns are raised, you must first merge the changes to the dev branch. This merge triggers an infrastructure deployment to the dev environment, allowing you to test this environment. After you have tested and are confident about what was deployed, you must merge the dev branch into the prod branch to trigger the infrastructure installation to the production environment.
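The branch-based flow described above can be summarized as a small decision function. This is only an illustrative sketch: `pipeline_action` is a hypothetical helper that is not part of the repository, and it assumes the environment branches are exactly `dev` and `prod`, as in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tutorial repository): reports what the
# pipeline described above does for a given branch name.
# Assumption: the only environment branches are "dev" and "prod".
pipeline_action() {
  case "$1" in
    dev|prod) echo "plan and apply to $1" ;;
    *)        echo "plan only, nothing applied" ;;
  esac
}

# Example calls for an environment branch and a feature branch.
pipeline_action dev
pipeline_action feature/new-rule
```

A push to a feature branch therefore never changes infrastructure; only a merge into `dev` or `prod` does.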
Objectives
- Set up your GitHub repository.
- Configure Terraform to store state in a Cloud Storage bucket.
- Grant permissions to your Cloud Build service account.
- Connect Cloud Build to your GitHub repository.
- Establish Dataplex Universal Catalog data quality rules.
- Change your environment configuration in a feature branch and test.
- Promote changes to the development environment.
- Promote changes to the production environment.
Costs
In this document, you use the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- In Cloud Shell, get the ID of the project you just selected:
    gcloud config get-value project
  If this command doesn't return the project ID, configure Cloud Shell to use your project. Replace PROJECT_ID with your project ID.
    gcloud config set project PROJECT_ID
- Enable the required APIs:
    gcloud services enable bigquery.googleapis.com cloudbuild.googleapis.com compute.googleapis.com dataplex.googleapis.com
  This step might take a few minutes to finish.
- If you've never used Git in Cloud Shell, configure it with your name and email address:
    git config --global user.email "YOUR_EMAIL_ADDRESS"
    git config --global user.name "YOUR_NAME"
  Git uses this information to identify you as the author of the commits that you create in Cloud Shell.
Set up your GitHub repository
In this tutorial, you use a single Git repository to define your cloud infrastructure. You orchestrate this infrastructure by having different branches corresponding to different environments:
- The dev branch contains the latest changes that are applied to the development environment.
- The prod branch contains the latest changes that are applied to the production environment.
With this infrastructure, you can always reference the repository to know what configuration is expected in each environment and to propose new changes by first merging them into the dev environment. You then promote the changes by merging the dev branch into the subsequent prod branch.
To get started, fork the terraform-google-dataplex-auto-data-quality repository.
On GitHub, navigate to https://github.com/GoogleCloudPlatform/terraform-google-dataplex-auto-data-quality.git.
Click Fork.
Now you have a copy of the terraform-google-dataplex-auto-data-quality repository with source files.
In Cloud Shell, clone the following forked repository:
    cd ~
    git clone https://github.com/GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality.git
    cd ~/terraform-google-dataplex-auto-data-quality
Replace the following:
- GITHUB_USERNAME: your GitHub username
Create dev and prod branches:
    git checkout -b prod
    git checkout -b dev
The code in this repository is structured as follows:
The environments/ folder contains subfolders that represent environments, such as dev and prod, which provide logical separation between workloads at different stages of maturity, development and production, respectively.
The modules/ folder contains inline Terraform modules. These modules represent logical groupings of related resources and are used to share code across different environments. The modules/deploy/ module here represents a template for a deployment and is reused for different deployment environments.
Within modules/deploy/:
- The rule/ folder contains yaml files containing data quality rules. One file represents a set of data quality rules for one table. This file is used in dev and prod environments.
- The schemas/ folder contains the table schema for the BigQuery table deployed in this infrastructure.
- The bigquery.tf file contains the configuration for BigQuery tables created in this deployment.
- The dataplex.tf file contains a Dataplex Universal Catalog data scan for data quality. This file is used in conjunction with rules_file_parsing.tf to read data quality rules from a yaml file into the environment.
The cloudbuild.yaml file is a build configuration file that contains instructions for Cloud Build, such as how to perform tasks based on a set of steps. This file specifies a conditional execution depending on the branch Cloud Build is fetching the code from, for example:
- For dev and prod branches, the following steps are executed:
  - terraform init
  - terraform plan
  - terraform apply
- For any other branch, the following steps are executed:
  - terraform init for all environments subfolders
  - terraform plan for all environments subfolders
To ensure that the changes being proposed are appropriate for every environment, terraform init and terraform plan are run for all environments. Before merging the pull request, you can review the plans to make sure that access isn't being granted to an unauthorized entity, for example.
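To make the conditional execution above concrete, the following sketch lists which environment folders would be planned for a given branch. It is not the repository's build configuration: `envs_to_plan` is a hypothetical helper, and the demo runs against a throwaway directory tree that only mimics the `environments/` layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints the environment
# names that would be planned for a branch, given the repository root.
envs_to_plan() {
  branch="$1"
  root="$2"
  if [ -n "$branch" ] && [ -d "$root/environments/$branch/" ]; then
    # The branch name matches an environment folder: plan only that one.
    echo "$branch"
  else
    # Any other branch: plan every environment folder.
    for dir in "$root"/environments/*/; do
      [ -d "$dir" ] || continue   # skip the unexpanded glob if no folders exist
      env="${dir%/}"              # drop the trailing slash
      env="${env##*/}"            # keep only the last path component
      echo "$env"
    done
  fi
}

# Demo on a temporary directory tree that mimics the repository layout.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/environments/dev" "$demo_root/environments/prod"
envs_to_plan dev "$demo_root"
envs_to_plan feature/new-rule "$demo_root"
rm -rf "$demo_root"
```

The sketch uses `##*/` (longest match) so that it also works when the root is an absolute path.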
Configuring Terraform to store state in Cloud Storage buckets
By default, Terraform stores state locally in a file named terraform.tfstate. This default configuration can make Terraform usage difficult for teams, especially when many users run Terraform at the same time and each machine has its own understanding of the current infrastructure.
To help you avoid such issues, this section configures a remote state that points to a Cloud Storage bucket. Remote state is a feature of backends and, in this tutorial, is configured in the backend.tf file.
    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    terraform {
      backend "gcs" {
        bucket = "PROJECT_ID-tfstate-dev"
      }
    }
A separate backend.tf file exists in each of the dev and prod environments. It is considered best practice to use a different Cloud Storage bucket for each environment.
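The one-bucket-per-environment naming used in backend.tf can be sketched as a small helper. `tfstate_bucket` and the project ID `my-sample-project` are hypothetical and only illustrate the PROJECT_ID-tfstate-ENV pattern; the length check reflects the general Cloud Storage limit of 3 to 63 characters for bucket names.

```shell
#!/bin/sh
# Hypothetical helper: builds the state bucket URL for one environment,
# following the PROJECT_ID-tfstate-ENV pattern used in this tutorial.
tfstate_bucket() {
  project_id="$1"
  environment="$2"
  name="${project_id}-tfstate-${environment}"
  # Cloud Storage bucket names must be 3 to 63 characters long.
  if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 63 ]; then
    echo "invalid bucket name length: ${name}" >&2
    return 1
  fi
  echo "gs://${name}"
}

# "my-sample-project" is a placeholder project ID used only for illustration.
for e in dev prod; do
  tfstate_bucket my-sample-project "$e"
done
```

Deriving both names from the same function keeps the bucket in each backend.tf file consistent with the bucket you create.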
In the following steps, you create two Cloud Storage buckets for dev and prod and change a few files to point to your new buckets and your Google Cloud project.
In Cloud Shell, create the two Cloud Storage buckets:
    DEV_BUCKET=gs://PROJECT_ID-tfstate-dev
    gcloud storage buckets create ${DEV_BUCKET}
    PROD_BUCKET=gs://PROJECT_ID-tfstate-prod
    gcloud storage buckets create ${PROD_BUCKET}
To keep the history of your deployments, enable Object Versioning:
    gcloud storage buckets update ${DEV_BUCKET} --versioning
    gcloud storage buckets update ${PROD_BUCKET} --versioning
Enabling object versioning increases storage costs, which you can mitigate by configuring Object Lifecycle Management to delete old state versions.
In each environment, in the main.tf and backend.tf files, replace PROJECT_ID with the project ID. In the following commands, the first PROJECT_ID is the literal placeholder text to search for. Replace the second PROJECT_ID with your project ID:
    cd ~/terraform-google-dataplex-auto-data-quality
    sed -i s/PROJECT_ID/PROJECT_ID/g environments/*/main.tf
    sed -i s/PROJECT_ID/PROJECT_ID/g environments/*/backend.tf
On OS X or macOS, you might need to add two quotation marks ("") after sed -i, as follows:
    cd ~/terraform-google-dataplex-auto-data-quality
    sed -i "" s/PROJECT_ID/PROJECT_ID/g environments/*/main.tf
    sed -i "" s/PROJECT_ID/PROJECT_ID/g environments/*/backend.tf
Check whether all files were updated:
    git status
The following is a sample output:
    On branch dev
    Your branch is up-to-date with 'origin/dev'.
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   environments/dev/backend.tf
        modified:   environments/dev/main.tf
        modified:   environments/prod/backend.tf
        modified:   environments/prod/main.tf

    no changes added to commit (use "git add" and/or "git commit -a")
Commit and push your changes:
    git add --all
    git commit -m "Update project IDs and buckets"
    git push origin dev
Depending on your GitHub configuration, you might need to authenticate to push the preceding changes.
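Because git status only shows that files changed, not that every placeholder was replaced, one way to double-check before committing is to search for leftover PROJECT_ID strings. The following is a sketch under stated assumptions: `files_with_placeholder` is a hypothetical helper, and the demo uses a temporary directory with made-up file contents rather than the real repository.

```shell
#!/bin/sh
# Hypothetical helper: prints every file under a directory that still
# contains the literal PROJECT_ID placeholder. Prints nothing when all
# occurrences were replaced.
files_with_placeholder() {
  grep -rl 'PROJECT_ID' "$1" 2>/dev/null || true
}

# Demo on a temporary tree: one file still has the placeholder, one does not.
demo="$(mktemp -d)"
mkdir -p "$demo/environments/dev" "$demo/environments/prod"
echo 'bucket = "PROJECT_ID-tfstate-dev"' > "$demo/environments/dev/backend.tf"
echo 'bucket = "my-sample-project-tfstate-prod"' > "$demo/environments/prod/backend.tf"
files_with_placeholder "$demo/environments"
rm -rf "$demo"
```

In the cloned repository, you could run `files_with_placeholder environments` from the repository root. Empty output suggests that the sed commands replaced every occurrence.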
Grant permissions to your Cloud Build service account
To allow the Cloud Build service account to run Terraform scripts with the goal of managing Google Cloud resources, you need to grant it appropriate access to your project. For simplicity, project editor access is granted in this tutorial. However, because the project editor role has wide-ranging permissions, in production environments you must follow your company's IT security best practices, usually providing least-privileged access.
In Cloud Shell, retrieve the email for your project's Cloud Build service account. The commands use a PROJECT_ID shell variable, which the first line sets from your current gcloud configuration:
    PROJECT_ID="$(gcloud config get-value project)"
    CLOUDBUILD_SA="$(gcloud projects describe $PROJECT_ID \
        --format 'value(projectNumber)')@cloudbuild.gserviceaccount.com"
Grant the required access to your Cloud Build service account:
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member serviceAccount:$CLOUDBUILD_SA --role roles/editor
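The member string passed to the IAM binding above is built from the project number. As a minimal sketch, assuming the PROJECT_NUMBER@cloudbuild.gserviceaccount.com format shown in this section, a hypothetical helper named `cloudbuild_sa_member` can assemble and sanity-check it before you run the binding command. The project number in the example is made up.

```shell
#!/bin/sh
# Hypothetical helper: builds the IAM member string for the Cloud Build
# service account from a project number, using the email format shown in
# this tutorial. Rejects values that are not purely numeric, which usually
# means a project ID was passed instead of a project number.
cloudbuild_sa_member() {
  project_number="$1"
  case "$project_number" in
    ''|*[!0-9]*)
      echo "project number must be numeric: ${project_number}" >&2
      return 1
      ;;
  esac
  echo "serviceAccount:${project_number}@cloudbuild.gserviceaccount.com"
}

# 123456789012 is a made-up project number used only for illustration.
cloudbuild_sa_member 123456789012
```

The numeric check catches a common mix-up between the project ID (a string) and the project number.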
Directly connect Cloud Build to your GitHub repository
This section shows you how to install the Cloud Build GitHub app. This installation lets you connect your GitHub repository with your Google Cloud project so that Cloud Build can automatically apply your Terraform manifests each time you create a new branch or push code to GitHub.
The following steps provide instructions for installing the app only for the terraform-google-dataplex-auto-data-quality repository, but you can choose to install the app for more or all of your repositories.
In GitHub Marketplace, go to the Cloud Build app page.
- If this is your first time configuring an app in GitHub: Click Setup with Google Cloud Build at the bottom of the page. Then click Grant this app access to your GitHub account.
- If this is not the first time configuring an app in GitHub: Click Configure access. The Applications page of your personal account opens.
Click Configure in the Cloud Build row.
Select Only select repositories, then select terraform-google-dataplex-auto-data-quality to connect to the repository.
Click Save or Install (the button label changes depending on your workflow). You are redirected to Google Cloud to continue the installation.
Sign in with your Google Cloud account. If requested, authorize Cloud Build integration with GitHub.
On the Cloud Build page, select your project. A wizard appears.
In the Select repository section, select your GitHub account and the terraform-google-dataplex-auto-data-quality repository.
If you agree with the terms and conditions, select the checkbox, then click Connect.
In the Create a trigger section, click Create a trigger:
- Add a trigger name, such as push-to-branch. Note this trigger name because you will need it later.
- In the Event section, select Push to a branch.
- In the Source section, select .* in the Branch field.
- Click Create.
The Cloud Build GitHub app is configured, and your GitHub repository is linked to your Google Cloud project. Changes to the GitHub repository trigger Cloud Build executions, which report the results back to GitHub by using GitHub Checks.
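The .* value entered in the Branch field is a regular expression that matches every branch name, which is why feature branches also start a build. The following sketch illustrates the difference between .* and a stricter pattern. It uses grep -E as a stand-in for the trigger's regular expression matching, which is an assumption made for illustration; `branch_matches` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the whole branch name matches the
# extended regular expression. grep -E is only a local stand-in for the
# trigger's own pattern matching.
branch_matches() {
  printf '%s\n' "$1" | grep -Eqx -- "$2"
}

# With ".*", every branch matches, so every push starts a build.
for b in dev prod feature/new-rule; do
  if branch_matches "$b" '.*'; then
    echo "$b: build starts"
  fi
done

# A stricter pattern would start builds only for the environment branches.
if ! branch_matches feature/new-rule '^(dev|prod)$'; then
  echo "feature/new-rule: no build with the stricter pattern"
fi
```

This tutorial keeps .* so that pull requests from feature branches receive a terraform plan report.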
Change your environment configuration in a new feature branch
You have most of your environment configured. Make necessary code changes in your local environment:
On GitHub, navigate to the main page of your forked repository.
    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
Make sure you are on the dev branch.
To open the file for editing, go to the modules/deploy/dataplex.tf file.
On line 19, change the label the_environment to environment.
Add a commit message at the bottom of the page, such as "modifying label", and select Create a new branch for this commit and start a pull request.
Click Propose changes.
On the following page, click Create pull request to open a new pull request with your change to the dev branch.
After your pull request is open, a Cloud Build job is automatically initiated.
Click Show all checks and wait for the check to become green.
Don't merge your pull request yet. Merging is done in a later step of the tutorial.
Click Details to see more information, including the output of terraform plan at the View more details on Google Cloud Build link.
Note that the Cloud Build job ran the pipeline defined in the cloudbuild.yaml file. This pipeline has different behaviors depending on the branch being fetched. The build checks whether the $BRANCH_NAME variable matches any environment folder. If so, Cloud Build executes terraform plan for that environment. Otherwise, Cloud Build executes terraform plan for all environments to make sure that the proposed change is appropriate for all of them. If any of these plans fail to execute, the build fails.
    - id: 'tfplan'
      name: 'hashicorp/terraform:1.9.8'
      entrypoint: 'sh'
      args:
      - '-c'
      - |
          if [ -d "environments/$BRANCH_NAME/" ]; then
            cd environments/$BRANCH_NAME
            terraform plan
          else
            for dir in environments/*/
            do
              cd ${dir}
              env=${dir%*/}
              env=${env#*/}
              echo ""
              echo "*************** TERRAFORM PLAN ******************"
              echo "******* At environment: ${env} ********"
              echo "*************************************************"
              terraform plan || exit 1
              cd ../../
            done
          fi
Similarly, the terraform apply command runs for environment branches, but it is completely ignored in any other case. In this section, you have submitted a code change to a new branch, so no infrastructure deployments were applied to your Google Cloud project.
    - id: 'tfapply'
      name: 'hashicorp/terraform:1.9.8'
      entrypoint: 'sh'
      args:
      - '-c'
      - |
          if [ -d "environments/$BRANCH_NAME/" ]; then
            cd environments/$BRANCH_NAME
            terraform apply -auto-approve
          else
            echo "***************************** SKIPPING APPLYING *******************************"
            echo "Branch '$BRANCH_NAME' does not represent an official environment."
            echo "*******************************************************************************"
          fi
Enforce Cloud Build execution success before merging branches
To make sure merges can be applied only when respective Cloud Build executions are successful, follow these steps:
1. On GitHub, navigate to the main page of your forked repository.
    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
2. Under your repository name, click Settings.
3. In the left menu, click Branches.
4. Under Branch protection rules, click Add rule.
5. In Branch name pattern, type dev.
6. In the Protect matching branches section, select Require status checks to pass before merging.
7. Search for your Cloud Build trigger name created previously.
8. Click Create.
9. Repeat steps 3–7, setting Branch name pattern to prod.
This configuration is important to protect both the dev and prod branches. It means that commits must first be pushed to another branch, and only then can they be merged to the protected branch. In this tutorial, the protection requires that the Cloud Build execution be successful for the merge to be allowed.
Promote changes to the development environment
You have a pull request waiting to be merged. It's time to apply the state you want to your dev environment.
On GitHub, navigate to the main page of your forked repository.
    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
Under your repository name, click Pull requests.
Click the pull request you just created.
Click Merge pull request, and then click Confirm merge.
Check that a new Cloud Build job has been triggered.
Open the build and check the logs. The logs show all of the resources that Terraform is creating and managing.
Promote changes to the production environment
Now that you have your development environment fully tested, you can promote your code for data quality rules to production.
On GitHub, navigate to the main page of your forked repository.
    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
Under your repository name, click Pull requests.
Click New pull request.
For the base repository, select your just-forked repository.
For base, select prod from your own base repository. For compare, select dev.
Click Create pull request.
For title, enter a title such as Changing label name, and then click Create pull request.
Review the proposed changes, including the terraform plan details from Cloud Build, and then click Merge pull request.
Click Confirm merge.
In the Google Cloud console, open the Build History page to see your changes being applied to the production environment.
You have successfully configured data quality rules that are managed using Terraform and Cloud Build.
Clean up
After you've finished the tutorial, clean up the resources you created on Google Cloud so you won't be billed for them in the future.
Delete the project
Delete the GitHub repository
To avoid blocking new pull requests on your GitHub repository, you can delete your branch protection rules:
- In GitHub, navigate to the main page of your forked repository.
- Under your repository name, click Settings.
- In the left menu, click Branches.
- Under the Branch protection rules section, click the Delete button for both dev and prod rows.
Optionally, you can completely uninstall the Cloud Build app from GitHub:
In GitHub, go to the GitHub Applications page.
In the Installed GitHub Apps tab, click Configure in the Cloud Build row. Then, in the Danger zone section, click the Uninstall button in the Uninstall Google Cloud Builder row.
At the top of the page, you see a message saying "You're all set. A job has been queued to uninstall Google Cloud Build."
In the Authorized GitHub Apps tab, click the Revoke button in the Google Cloud Build row, then I understand, revoke access.
If you don't want to keep your GitHub repository, delete it:
- In GitHub, go to the main page of your forked repository.
- Under your repository name, click Settings.
- Go to Danger Zone.
- Click Delete this repository, and follow the confirmation steps.
What's next
- Learn about auto data quality.
- Learn more about DevOps and DevOps best practices.
- Explore the Cloud Foundation Toolkit for more Terraform templates.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.