Manage data quality rules as code with Terraform

This tutorial explains how to manage Dataplex Universal Catalog data quality rules as code with Terraform, Cloud Build, and GitHub.

Many different options for data quality rules are available to define and measure the quality of your data. When you automate the process of deploying data quality rules as a part of your larger infrastructure management strategy, you ensure that your data is consistently and predictably subjected to the rules that you assign to it.

If you have different versions of a dataset for multiple environments, such as dev and prod environments, Terraform provides a reliable way to assign data quality rules to environment-specific versions of datasets.

Version control is also an important DevOps best practice. Managing your data quality rules as code provides you with versions of your data quality rules that are available in your GitHub history. Terraform can also save its state to Cloud Storage, which can store earlier versions of the state file.

For more information about Terraform and Cloud Build, see Overview of Terraform on Google Cloud and Cloud Build.

Architecture

To understand how this tutorial uses Cloud Build for managing Terraform executions, consider the following architecture diagram. Note that it uses GitHub branches (dev and prod) to represent actual environments.

Note: For simplicity, this tutorial implements only dev and prod environments. You can extend this behavior to deploy to more environments and to create projects under your organization hierarchy if needed.

Infrastructure with dev and prod environments.

The process starts when you push Terraform code to either the dev or prod branch. In this scenario, Cloud Build triggers and then applies Terraform manifests to achieve the state you want in the respective environment. On the other hand, when you push Terraform code to any other branch (for example, to a feature branch), Cloud Build runs terraform plan, but nothing is applied to any environment.

Ideally, developers or operators make infrastructure proposals to non-protected branches and then submit them through pull requests. The Cloud Build GitHub app, discussed later in this tutorial, automatically triggers the build jobs and links the terraform plan reports to these pull requests. This way, you can discuss and review the potential changes with collaborators and add follow-up commits before changes are merged into the base branch.

If no concerns are raised, you must first merge the changes to the dev branch. This merge triggers an infrastructure deployment to the dev environment, allowing you to test this environment. After you have tested and are confident about what was deployed, you must merge the dev branch into the prod branch to trigger the infrastructure installation to the production environment.

Objectives

  • Set up your GitHub repository.
  • Configure Terraform to store state in a Cloud Storage bucket.
  • Grant permissions to your Cloud Build service account.
  • Connect Cloud Build to your GitHub repository.
  • Establish Dataplex Universal Catalog data quality rules.
  • Change your environment configuration in a feature branch and test.
  • Promote changes to the development environment.
  • Promote changes to the production environment.

Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  5. In Cloud Shell, get the ID of the project you just selected:
    gcloud config get-value project
    If this command doesn't return the project ID, configure Cloud Shell to use your project. Replace PROJECT_ID with your project ID.
    gcloud config set project PROJECT_ID
  6. Enable the required APIs:
    gcloud services enable bigquery.googleapis.com cloudbuild.googleapis.com compute.googleapis.com dataplex.googleapis.com
    This step might take a few minutes to finish.
  7. If you've never used Git in Cloud Shell, configure it with your name and email address:
    git config --global user.email "YOUR_EMAIL_ADDRESS"
    git config --global user.name "YOUR_NAME"
    Git uses this information to identify you as the author of the commits that you create in Cloud Shell.
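
Later commands in this tutorial reference a $PROJECT_ID shell variable. The following is a minimal sketch, not part of the tutorial repository, of a sanity check you could run before relying on that variable. The function name is_valid_project_id and the sample value are hypothetical, and the check assumes the standard project ID format (6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen).

```shell
# Hypothetical helper: succeeds when the argument looks like a project ID.
# The body runs in a subshell so that LC_ALL=C (byte-wise character ranges)
# doesn't leak into the calling shell.
is_valid_project_id() (
  LC_ALL=C
  id="$1"
  len=${#id}
  [ "$len" -ge 6 ] && [ "$len" -le 30 ] || return 1
  case "$id" in
    *[!a-z0-9-]*) return 1 ;;   # contains a disallowed character
    [!a-z]*)      return 1 ;;   # doesn't start with a lowercase letter
    *-)           return 1 ;;   # ends with a hyphen
  esac
  return 0
)

# Example with a placeholder value. In Cloud Shell you might instead use:
#   PROJECT_ID="$(gcloud config get-value project)"
PROJECT_ID="my-sample-project"
if is_valid_project_id "$PROJECT_ID"; then
  echo "PROJECT_ID looks valid: $PROJECT_ID"
else
  echo "PROJECT_ID is missing or malformed" >&2
fi
```

A check like this catches an empty variable early, before it is passed to gcloud commands that expect a project ID.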

Set up your GitHub repository

In this tutorial, you use a single Git repository to define your cloud infrastructure. You orchestrate this infrastructure by having different branches corresponding to different environments:

  • The dev branch contains the latest changes that are applied to the development environment.
  • The prod branch contains the latest changes that are applied to the production environment.

With this infrastructure, you can always reference the repository to know what configuration is expected in each environment and to propose new changes by first merging them into the dev environment. You then promote the changes by merging the dev branch into the subsequent prod branch.

To get started, fork the terraform-google-dataplex-auto-data-quality repository.

  1. On GitHub, navigate to https://github.com/GoogleCloudPlatform/terraform-google-dataplex-auto-data-quality.git.

  2. Click Fork.

    Now you have a copy of the terraform-google-dataplex-auto-data-quality repository with source files.

  3. In Cloud Shell, clone the following forked repository:

    cd ~
    git clone https://github.com/GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality.git
    cd ~/terraform-google-dataplex-auto-data-quality

    Replace the following:

    • GITHUB_USERNAME: your GitHub username
  4. Create dev and prod branches:

    git checkout -b prod
    git checkout -b dev

The code in this repository is structured as follows:

  • The environments/ folder contains subfolders that represent environments, such as dev and prod, which provide logical separation between workloads at different stages of maturity: development and production, respectively.

  • The modules/ folder contains inline Terraform modules. These modules represent logical groupings of related resources and are used to share code across different environments. The modules/deploy/ module here represents a template for a deployment and is reused for different deployment environments.

  • Within modules/deploy/:

    • The rule/ folder contains YAML files that contain data quality rules. One file represents a set of data quality rules for one table. This file is used in dev and prod environments.

    • The schemas/ folder contains the table schema for the BigQuery table deployed in this infrastructure.

    • The bigquery.tf file contains the configuration for BigQuery tables created in this deployment.

    • The dataplex.tf file contains a Dataplex Universal Catalog data scan for data quality. This file is used in conjunction with rules_file_parsing.tf to read data quality rules from a YAML file into the environment.

  • The cloudbuild.yaml file is a build configuration file that contains instructions for Cloud Build, such as how to perform tasks based on a set of steps. This file specifies a conditional execution depending on the branch Cloud Build is fetching the code from, for example:

    • For dev and prod branches, the following steps are executed:

      1. terraform init
      2. terraform plan
      3. terraform apply
    • For any other branch, the following steps are executed:

      1. terraform init for all environments subfolders
      2. terraform plan for all environments subfolders

To ensure that the changes being proposed are appropriate for every environment, terraform init and terraform plan are run for all environments. Before merging the pull request, you can review the plans to make sure that access isn't being granted to an unauthorized entity, for example.

Configuring Terraform to store state in Cloud Storage buckets

By default, Terraform stores state locally in a file named terraform.tfstate. This default configuration can make Terraform usage difficult for teams, especially when many users run Terraform at the same time and each machine has its own understanding of the current infrastructure.

To help you avoid such issues, this section configures a remote state that points to a Cloud Storage bucket. Remote state is a feature of backends and, in this tutorial, is configured in the backend.tf file.

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    terraform {
      backend "gcs" {
        bucket = "PROJECT_ID-tfstate-dev"
      }
    }

A separate backend.tf file exists in each of the dev and prod environments. It is considered best practice to use a different Cloud Storage bucket for each environment.
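
To make the per-environment layout concrete, the following sketch writes one backend.tf per environment into a throwaway directory. It is an illustration under stated assumptions, not the repository's own tooling: the helper name write_backend_tf and the project ID my-sample-project are hypothetical, and the prod bucket name is assumed to follow the same PROJECT_ID-tfstate-ENV pattern as the dev example.

```shell
# Hypothetical helper: writes a backend.tf for one environment.
# Arguments: target directory, project ID, environment name.
write_backend_tf() {
  target_dir="$1"
  project_id="$2"
  env_name="$3"
  mkdir -p "$target_dir"
  cat > "$target_dir/backend.tf" <<EOF
terraform {
  backend "gcs" {
    bucket = "${project_id}-tfstate-${env_name}"
  }
}
EOF
}

# Demonstration in a temporary directory; "my-sample-project" is a placeholder.
demo_root="$(mktemp -d)"
for e in dev prod; do
  write_backend_tf "$demo_root/environments/$e" "my-sample-project" "$e"
done

# Show the file generated for the prod environment.
cat "$demo_root/environments/prod/backend.tf"
```

Keeping the bucket name a function of the environment name means that a plan or apply in one environment can't read or overwrite the state of the other.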

In the following steps, you create two Cloud Storage buckets for dev and prod and change a few files to point to your new buckets and your Google Cloud project.

  1. In Cloud Shell, create the two Cloud Storage buckets:

    DEV_BUCKET=gs://PROJECT_ID-tfstate-dev
    gcloud storage buckets create ${DEV_BUCKET}
    PROD_BUCKET=gs://PROJECT_ID-tfstate-prod
    gcloud storage buckets create ${PROD_BUCKET}
  2. To keep the history of your deployments, enable Object Versioning:

    gcloud storage buckets update ${DEV_BUCKET} --versioning
    gcloud storage buckets update ${PROD_BUCKET} --versioning

    Enabling Object Versioning increases storage costs, which you can mitigate by configuring Object Lifecycle Management to delete old state versions.

  3. In each environment, in the main.tf and backend.tf files, replace PROJECT_ID with the project ID:

    cd ~/terraform-google-dataplex-auto-data-quality
    sed -i s/PROJECT_ID/PROJECT_ID/g environments/*/main.tf
    sed -i s/PROJECT_ID/PROJECT_ID/g environments/*/backend.tf

    In each sed expression, the first PROJECT_ID is the literal placeholder in the files. Replace the second PROJECT_ID with your project ID.

    On macOS, you might need to add two quotation marks ("") after sed -i, as follows:

    cd ~/terraform-google-dataplex-auto-data-quality
    sed -i "" s/PROJECT_ID/PROJECT_ID/g environments/*/main.tf
    sed -i "" s/PROJECT_ID/PROJECT_ID/g environments/*/backend.tf
  4. Check whether all files were updated:

    git status

    The following is a sample output:

    On branch dev
    Your branch is up-to-date with 'origin/dev'.
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)

            modified:   environments/dev/backend.tf
            modified:   environments/dev/main.tf
            modified:   environments/prod/backend.tf
            modified:   environments/prod/main.tf

    no changes added to commit (use "git add" and/or "git commit -a")
  5. Commit and push your changes:

    git add --all
    git commit -m "Update project IDs and buckets"
    git push origin dev

    Depending on your GitHub configuration, you might need to authenticate to push the preceding changes.
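
The placeholder substitution in step 3 can also be written in a form that behaves the same with GNU sed (Linux, Cloud Shell) and BSD sed (macOS). The following sketch demonstrates the idea on a throwaway copy of the directory layout. The file contents, the MY_PROJECT_ID variable, and the value my-sample-project are illustrative assumptions, not taken from the repository.

```shell
# Build a throwaway copy of the environments/ layout with placeholder content.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/environments/dev" "$work_dir/environments/prod"
for e in dev prod; do
  printf 'bucket = "PROJECT_ID-tfstate-%s"\n' "$e" \
    > "$work_dir/environments/$e/backend.tf"
done

# Placeholder value; in practice this would be your real project ID.
MY_PROJECT_ID="my-sample-project"

# "-i.bak" (suffix attached, no space) is accepted by both GNU and BSD sed.
# It edits in place and leaves a .bak copy, which is removed afterwards.
sed -i.bak "s/PROJECT_ID/${MY_PROJECT_ID}/g" "$work_dir"/environments/*/backend.tf
rm -f "$work_dir"/environments/*/backend.tf.bak

# Confirm that no placeholder is left behind.
if grep -R "PROJECT_ID" "$work_dir/environments" >/dev/null; then
  echo "placeholder still present"
else
  echo "all placeholders replaced"
fi
```

Running a check like the final grep before committing is a cheap way to catch a substitution that silently did nothing.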

Grant permissions to your Cloud Build service account

To allow the Cloud Build service account to run Terraform scripts that manage Google Cloud resources, you need to grant it appropriate access to your project. For simplicity, project editor access is granted in this tutorial. Because the project editor role has wide-ranging permissions, in production environments you must follow your company's IT security best practices, which usually means providing least-privileged access.

  1. In Cloud Shell, retrieve the email for your project's Cloud Build service account. This command assumes that the PROJECT_ID shell variable contains your project ID:

    CLOUDBUILD_SA="$(gcloud projects describe $PROJECT_ID \
        --format 'value(projectNumber)')@cloudbuild.gserviceaccount.com"
  2. Grant the required access to your Cloud Build service account:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member serviceAccount:$CLOUDBUILD_SA --role roles/editor
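
After granting the role, you might want to confirm that the binding exists. The following is a minimal sketch, assuming an authenticated gcloud CLI. Both helper names are hypothetical, and the project number in the example is made up. The second helper is only defined here, not called, because it needs access to a real project.

```shell
# Hypothetical helper: builds the Cloud Build service account email from a
# project number, in the same form as the CLOUDBUILD_SA assignment above.
cloudbuild_sa_email() {
  printf '%s@cloudbuild.gserviceaccount.com\n' "$1"
}

# Hypothetical helper: lists the roles bound to a member in a project.
# Arguments: project ID, member email. Requires an authenticated gcloud CLI.
list_roles_for_member() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$2" \
    --format="value(bindings.role)"
}

# Example with a made-up project number.
cloudbuild_sa_email 123456789012

# Usage sketch (not executed here):
#   list_roles_for_member "$PROJECT_ID" "$(cloudbuild_sa_email "$PROJECT_NUMBER")"
```

If the grant succeeded, the output of list_roles_for_member would be expected to include roles/editor.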

Directly connect Cloud Build to your GitHub repository

This section shows you how to install the Cloud Build GitHub app. This installation lets you connect your GitHub repository with your Google Cloud project so that Cloud Build can automatically apply your Terraform manifests each time you create a new branch or push code to GitHub.

The following steps provide instructions for installing the app only for the terraform-google-dataplex-auto-data-quality repository, but you can choose to install the app for more or all of your repositories.

  1. In GitHub Marketplace, go to the Cloud Build app page.

    • If this is your first time configuring an app in GitHub: Click Setup with Google Cloud Build at the bottom of the page. Then click Grant this app access to your GitHub account.
    • If this is not the first time configuring an app in GitHub: Click Configure access. The Applications page of your personal account opens.
  2. Click Configure in the Cloud Build row.

  3. Select Only select repositories, then select terraform-google-dataplex-auto-data-quality to connect to the repository.

  4. Click Save or Install (the button label changes depending on your workflow). You are redirected to Google Cloud to continue the installation.

  5. Sign in with your Google Cloud account. If requested, authorize Cloud Build integration with GitHub.

  6. On the Cloud Build page, select your project. A wizard appears.

  7. In the Select repository section, select your GitHub account and the terraform-google-dataplex-auto-data-quality repository.

  8. If you agree with the terms and conditions, select the checkbox, then click Connect.

  9. In the Create a trigger section, click Create a trigger:

    1. Add a trigger name, such as push-to-branch. Note this trigger name because you will need it later.
    2. In the Event section, select Push to a branch.
    3. In the Source section, select .* in the Branch field.
    4. Click Create.

The Cloud Build GitHub app is configured, and your GitHub repository is linked to your Google Cloud project. Changes to the GitHub repository trigger Cloud Build executions, which report the results back to GitHub by using GitHub Checks.

Change your environment configuration in a new feature branch

You have most of your environment configured. Make necessary code changes in your local environment:

  1. On GitHub, navigate to the main page of your forked repository.

    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
  2. Make sure you are on the dev branch.

  3. To open the file for editing, go to the modules/deploy/dataplex.tf file.

  4. On line 19, change the label the_environment to environment.

  5. Add a commit message at the bottom of the page, such as "modifying label", and select Create a new branch for this commit and start a pull request.

  6. Click Propose changes.

  7. On the following page, click Create pull request to open a new pull request with your change to the dev branch.

    After your pull request is open, a Cloud Build job is automatically initiated.

  8. Click Show all checks and wait for the check to become green. Don't merge your pull request yet. Merging is done in a later step of the tutorial.

  9. Click Details to see more information, including the output of terraform plan, which is available through the View more details on Google Cloud Build link.

Note that the Cloud Build job ran the pipeline defined in the cloudbuild.yaml file. This pipeline has different behaviors depending on the branch being fetched. The build checks whether the $BRANCH_NAME variable matches any environment folder. If so, Cloud Build executes terraform plan for that environment. Otherwise, Cloud Build executes terraform plan for all environments to make sure that the proposed change is appropriate for all of them. If any of these plans fail to execute, the build fails.

    - id: 'tfplan'
      name: 'hashicorp/terraform:1.9.8'
      entrypoint: 'sh'
      args:
      - '-c'
      - |
          if [ -d "environments/$BRANCH_NAME/" ]; then
            cd environments/$BRANCH_NAME
            terraform plan
          else
            for dir in environments/*/
            do
              cd ${dir}
              env=${dir%*/}
              env=${env#*/}
              echo ""
              echo "*************** TERRAFORM PLAN ******************"
              echo "******* At environment: ${env} ********"
              echo "*************************************************"
              terraform plan || exit 1
              cd ../../
            done
          fi

Similarly, the terraform apply command runs for environment branches, but it is completely ignored in any other case. In this section, you have submitted a code change to a new branch, so no infrastructure deployments were applied to your Google Cloud project.

    - id: 'tfapply'
      name: 'hashicorp/terraform:1.9.8'
      entrypoint: 'sh'
      args:
      - '-c'
      - |
          if [ -d "environments/$BRANCH_NAME/" ]; then
            cd environments/$BRANCH_NAME
            terraform apply -auto-approve
          else
            echo "***************************** SKIPPING APPLYING *******************************"
            echo "Branch '$BRANCH_NAME' does not represent an official environment."
            echo "*******************************************************************************"
          fi
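
The branch routing in these two steps can be tried out locally without running Terraform. The following sketch is a dry run under stated assumptions: the helper names env_name_from_dir and commands_for_branch are hypothetical, the directory layout is a temporary stand-in for the repository, and the helpers only print the Terraform subcommands that would run for each environment.

```shell
# Hypothetical helper: extracts "dev" from "environments/dev/" with the same
# parameter expansions that the tfplan step uses.
env_name_from_dir() {
  env_path="${1%*/}"         # strip the trailing slash  -> environments/dev
  env_path="${env_path#*/}"  # strip through first slash -> dev
  printf '%s\n' "$env_path"
}

# Hypothetical helper: prints which Terraform subcommands would run.
# Arguments: repository root, branch name.
commands_for_branch() {
  repo_root="$1"
  branch="$2"
  if [ -d "$repo_root/environments/$branch/" ]; then
    # The branch matches an environment folder: plan and apply it.
    echo "$branch: init plan apply"
  else
    # Any other branch: plan every environment, apply nothing.
    prefix="$repo_root/"
    for d in "$repo_root"/environments/*/; do
      rel=${d#"$prefix"}     # e.g. environments/dev/
      echo "$(env_name_from_dir "$rel"): init plan"
    done
  fi
}

# Temporary stand-in for the repository layout.
demo_repo="$(mktemp -d)"
mkdir -p "$demo_repo/environments/dev" "$demo_repo/environments/prod"

commands_for_branch "$demo_repo" dev
commands_for_branch "$demo_repo" feature-x
```

Because the decision depends only on whether a folder named after the branch exists under environments/, adding an environment in this design amounts to adding a folder and a branch with the same name.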

Enforce Cloud Build execution success before merging branches

To make sure merges can be applied only when the respective Cloud Build executions are successful, follow these steps:

  1. On GitHub, navigate to the main page of your forked repository.

    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
  2. Under your repository name, click Settings.

  3. In the left menu, click Branches.

  4. Under Branch protection rules, click Add rule.

  5. In Branch name pattern, type dev.

  6. In the Protect matching branches section, select Require status checks to pass before merging.

  7. Search for your Cloud Build trigger name created previously.

  8. Click Create.

  9. Repeat steps 3–8, setting Branch name pattern to prod.

This configuration protects both the dev and prod branches: commits must first be pushed to another branch, and only then can they be merged into the protected branch. In this tutorial, the protection requires that the Cloud Build execution be successful for the merge to be allowed.

Promote changes to the development environment

You have a pull request waiting to be merged. It's time to apply the state you want to your dev environment.

  1. On GitHub, navigate to the main page of your forked repository.

    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
  2. Under your repository name, click Pull requests.

  3. Click the pull request you just created.

  4. Click Merge pull request, and then click Confirm merge.

  5. Check that a new Cloud Build has been triggered:

    Go to the Cloud Build page

  6. Open the build and check the logs. The logs show all of the resources that Terraform is creating and managing.

Promote changes to the production environment

Now that you have your development environment fully tested, you can promote your code for data quality rules to production.

  1. On GitHub, navigate to the main page of your forked repository.

    https://github.com/YOUR_GITHUB_USERNAME/terraform-google-dataplex-auto-data-quality
  2. Under your repository name, click Pull requests.

  3. Click New pull request.

  4. For the base repository, select your just-forked repository.

  5. For base, select prod from your own base repository. For compare, select dev.

  6. Click Create pull request.

  7. For title, enter a title such as Changing label name, and then click Create pull request.

  8. Review the proposed changes, including the terraform plan details from Cloud Build, and then click Merge pull request.

  9. Click Confirm merge.

  10. In the Google Cloud console, open the Build History page to see your changes being applied to the production environment:

    Go to the Cloud Build page

You have successfully configured data quality rules that are managed usingTerraform and Cloud Build.

Clean up

After you've finished the tutorial, clean up the resources you created onGoogle Cloud so you won't be billed for them in the future.

Delete the project

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the GitHub repository

To avoid blocking new pull requests on your GitHub repository, you can delete your branch protection rules:

  1. In GitHub, navigate to the main page of your forked repository.
  2. Under your repository name, click Settings.
  3. In the left menu, click Branches.
  4. Under the Branch protection rules section, click the Delete button for both the dev and prod rows.

Optionally, you can completely uninstall the Cloud Build app from GitHub:

  1. In GitHub, go to the GitHub Applications page.

  2. In the Installed GitHub Apps tab, click Configure in the Cloud Build row. Then, in the Danger zone section, click the Uninstall button in the Uninstall Google Cloud Builder row.

    At the top of the page, you see a message saying "You're all set. A job has been queued to uninstall Google Cloud Build."

  3. In the Authorized GitHub Apps tab, click the Revoke button in the Google Cloud Build row, then click I understand, revoke access.

If you don't want to keep your GitHub repository, delete it:

  1. In GitHub, go to the main page of your forked repository.
  2. Under your repository name, click Settings.
  3. Go to Danger Zone.
  4. Click Delete this repository, and follow the confirmation steps.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.