Profile Vertex AI data in a single project

This page describes how to configure Vertex AI data discovery at the project level. If you want to profile an organization or folder, see Profile Vertex AI data in an organization or folder.

For more information about the discovery service, see Data profiles.

Before you begin

  1. If you have an organization-level discovery subscription—including one through Security Command Center—be aware that this project-level discovery configuration isn't included in your subscription and is billed separately. We recommend that you use an organization-level discovery configuration to profile the project. For more information, see Profile select projects or data assets in an organization or folder.

  2. Make sure the Cloud Data Loss Prevention API is enabled on your project:

    1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
    2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

      Roles required to select or create a project

      • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
      • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
      Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

      Go to project selector

    3. Verify that billing is enabled for your Google Cloud project.

    4. Enable the required API.

      Roles required to enable APIs

      To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

      Enable the API


  3. Confirm that you have the IAM permissions that are required to configure data profiles at the project level.

  4. You must have an inspection template in each region where you have data to be profiled. If you want to use a single template for multiple regions, you can use a template that is stored in the global region. If organizational policies prevent you from creating an inspection template in the global region, then you must set a dedicated inspection template for each region. For more information, see Data residency considerations.

    This task lets you create an inspection template in the global region only. If you need dedicated inspection templates for one or more regions, you must create those templates before performing this task. For a scripted alternative, see the first sketch after this list.

  5. You can configure Sensitive Data Protection to send notifications to Pub/Sub when certain events occur, such as when Sensitive Data Protection profiles a new dataset. If you want to use this feature, you must first create a Pub/Sub topic; a scripted example also follows this list.
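If you want to script the inspection template setup from item 4, the following is a minimal sketch that uses the google-cloud-dlp Python client to create a template in the global region. The project ID, display name, and infoType selection are placeholder assumptions; adjust them for your environment.

    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()

    # Define a minimal inspection configuration. The infoTypes listed here
    # are examples only; select the ones that matter for your data.
    inspect_template = dlp_v2.InspectTemplate(
        display_name="vertex-ai-discovery-template",  # placeholder name
        inspect_config=dlp_v2.InspectConfig(
            info_types=[
                dlp_v2.InfoType(name="EMAIL_ADDRESS"),
                dlp_v2.InfoType(name="PERSON_NAME"),
            ],
            min_likelihood=dlp_v2.Likelihood.POSSIBLE,
        ),
    )

    # Store the template in the global region so that a single template
    # can cover multiple regions, as described in item 4.
    response = client.create_inspect_template(
        parent="projects/my-project/locations/global",  # placeholder project ID
        inspect_template=inspect_template,
    )
    print(f"Created template: {response.name}")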
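Similarly, here is a minimal sketch of creating the Pub/Sub topic from item 5 with the google-cloud-pubsub Python client. The project and topic IDs are placeholders.

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()

    # Build the fully qualified topic name: projects/PROJECT_ID/topics/TOPIC_ID.
    topic_path = publisher.topic_path("my-project", "dlp-profile-events")  # placeholders

    # Create the topic that Sensitive Data Protection notifications are sent to.
    topic = publisher.create_topic(request={"name": topic_path})
    print(f"Created topic: {topic.name}")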

Create a scan configuration

  1. Go to the Create scan configuration page.

    Go to Create scan configuration

  2. Go to your project. On the toolbar, click the project selector and select your project.

The following sections provide more information about the steps in the Create scan configuration page. At the end of each section, click Continue.

Select a discovery type

Select Vertex AI.

Select scope

Do one of the following:

  • If you want to scan a single dataset, select Scan one dataset.

    For each dataset, you can have only one single-resource scan configuration. For more information, see Profile a single data resource.

    Fill in the details of the dataset that you want to profile.

  • If you want to perform standard project-level profiling, select Scan selected project.

Manage schedules

If the default profiling frequency suits your needs, you can skip this section of the Create scan configuration page.

Configure this section for the following reasons:

  • To make fine-grained adjustments to the profiling frequency of all your data or certain subsets of your data.
  • To specify the datasets that you don't want to profile.
  • To specify the datasets that you don't want profiled more than once.

To make fine-grained adjustments to profiling frequency, follow these steps:

  1. Click Add schedule.
  2. In the Filters section, you define one or more filters that specify which datasets are in the schedule's scope. A dataset is considered to be in the schedule's scope if it matches at least one of the defined filters.

    To configure a filter, specify a project ID or a regular expression that specifies one or more projects.

    Regular expressions must follow RE2 syntax.

    For example, if you want all datasets in a project to be included in the filter, enter the project ID in the Project ID field.

    If you want to add more filters, click Add filter and repeat this step.

  3. Click Frequency.

  4. In the Frequency section, specify whether the discovery service should profile the datasets that you selected and, if so, how often:

    • If you never want the datasets to be profiled, turn off Do profile this data.

    • If you want the datasets to be profiled at least once, leave Do profile this data on.

      In the succeeding fields in this section, you specify whether the system should reprofile your data and what events should trigger a reprofile operation. For more information, see Frequency of data profile generation.

      1. For On a schedule, specify how often you want the datasets to be reprofiled. The datasets are reprofiled regardless of whether they underwent any changes.
      2. For When inspect template changes, specify whether you want your data to be reprofiled when the associated inspection template is updated, and if so, how often. Note: You specify the inspection templates to use in the Select inspection template step on this page.

        An inspection template change is detected when either of the following occurs:

        • The name of an inspection template changes in your scan configuration.
        • The updateTime of an inspection template changes.

        For example, if you set an inspection template for the us-west1 region and you update that inspection template, then only data in the us-west1 region is reprofiled.

  5. Optional: Click Conditions.

    In the Conditions section, you specify any conditions that the datasets—defined in your filters—must meet before Sensitive Data Protection profiles them.

    If needed, set the following:

    • Minimum condition: If you want to delay profiling of a dataset until it reaches a certain age, turn on this option. Then, enter the minimum duration.

    • Time condition: If you don't want old datasets to ever be profiled, turn on this option. Then, use the date picker to select a date and time. Any dataset created on or before your selected timestamp is excluded from profiling.

    Example conditions

    Suppose that you have the following configuration:

    • Minimum conditions

      • Minimum duration: 24 hours
    • Time condition

      • Timestamp: May 4, 2022, 11:59 PM

    In this case, Sensitive Data Protection excludes any dataset that was created on or before May 4, 2022, 11:59 PM. Among the datasets that were created after that date and time, Sensitive Data Protection profiles only the datasets that are at least 24 hours old.

  6. Click Done.

  7. Optional: To add more schedules, click Add schedule and repeat the previous steps.

  8. To specify precedence between schedules, reorder them in the list.

    The order of the schedules specifies how conflicts between schedules are resolved. If a dataset matches the filters of two different schedules, the schedule that is higher in the list dictates the profiling frequency for that dataset.

    Note: If your discovery pricing mode is subscription mode, the rate at which Sensitive Data Protection profiles your data is affected by how much capacity you purchased. To determine your daily profiling capacity, see Monitoring utilization. If you have under-provisioned capacity, then the profiling frequencies that you set in your schedules might not be followed. If there is a backlog of data to be profiled, the schedule order doesn't dictate the order in which Sensitive Data Protection profiles the data in the backlog. Rather, all data resources in scope get a randomly assigned slot in the queue.
  9. Optional: Edit or turn off Catch-all schedule.

    The last schedule in the list is the catch-all schedule. This schedule covers the datasets in your selected scope that don't match any of the schedules that you created. The catch-all schedule follows the system default profiling frequency.

    • To adjust the catch-all schedule, click Edit schedule, and then adjust the settings as needed.
    • To prevent Sensitive Data Protection from profiling any resource that is covered by the catch-all schedule, turn off Profile the resources that don't match any custom schedule.

Select inspection template

Depending on how you want to provide an inspection configuration, choose one of the following options. Regardless of which option you choose, Sensitive Data Protection scans your data in the region where that data is stored. That is, your data doesn't leave its region of origin.

Option 1: Create an inspection template

Choose this option if you want to create a new inspection template in the global region.

  1. Click Create new inspection template.
  2. Optional: To modify the default selection of infoTypes, click Manage infoTypes.

    For more information about how to manage built-in and custom infoTypes, see Manage infoTypes through the Google Cloud console.

    You must have at least one infoType selected to continue.

  3. Optional: Configure the inspection template further by adding rulesets and setting a confidence threshold. For more information, see Configure detection.

When Sensitive Data Protection creates the scan configuration, it stores this new inspection template in the global region.

Option 2: Use an existing inspection template

Choose this option if you have existing inspection templates that you want to use.

  1. Click Select existing inspection template.
  2. Enter the full resource name of the inspection template that you want to use. The Region field is automatically populated with the name of the region where your inspection template is stored.

    The inspection template that you enter must be in the same region as the data to be profiled.

    To respect data residency, Sensitive Data Protection doesn't use an inspection template outside the region where that template is stored.

    To find the full resource name of an inspection template, follow these steps (a scripted alternative is sketched after this list):

    1. Go to your inspection templates list. This page opens on a separate tab.

      Go to inspection templates

    2. Select the project that contains the inspection template that you want to use.
    3. Select Configuration > Templates > Inspect, and then click the template ID of the template that you want to use.
    4. On the page that opens, copy the full resource name of the template. The full resource name follows this format:
      projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
    5. On the Create scan configuration page, in the Template name field, paste the full resource name of the template.
  3. To add an inspection template for another region, click Add inspection template and enter the template's full resource name. Repeat this for each region where you have a dedicated inspection template.
  4. Optional: Add an inspection template that's stored in the global region. Sensitive Data Protection automatically uses that template for data in regions where you don't have a dedicated inspection template.
Caution: If you don't include an inspection template that's stored in the global region, Sensitive Data Protection can't profile data in regions that don't have a dedicated inspection template. For more information, see Data residency considerations.
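If you manage many templates, you can also gather the full resource names programmatically. The following is a minimal sketch that lists inspection templates in one region with the google-cloud-dlp Python client; the project ID and region are placeholders.

    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()

    # List templates stored in one region. Repeat with "global" or other
    # regions as needed.
    parent = "projects/my-project/locations/us-west1"  # placeholders
    for template in client.list_inspect_templates(parent=parent):
        # template.name is the full resource name, in the format
        # projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
        print(template.name)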

Add actions

This section describes how to specify actions that you want Sensitive Data Protection to take after profiling a dataset. These actions are useful if you want to send insights gathered from data profiles to other Google Cloud services.

Note: For information about how other Google Cloud services may charge you for configuring actions, see Pricing for exporting data profiles.

Publish to Security Command Center

Findings from data profiles provide context when you triage and develop response plans for your vulnerability and threat findings in Security Command Center.

Note: You can also configure Security Command Center to automatically prioritize resources for the attack path simulation feature according to the calculated sensitivity of the data that the resources contain. For more information, see Set resource priority values automatically by data sensitivity. Before you can use this action, Security Command Center must be activated at the organization level. Turning on Security Command Center at the organization level enables the flow of findings from integrated services like Sensitive Data Protection. Sensitive Data Protection works with Security Command Center in all service tiers.

If Security Command Center isn't activated at the organization level, Sensitive Data Protection findings won't appear in Security Command Center. For more information, see Check the activation level of Security Command Center.

To send the results of your data profiles to Security Command Center, make sure the Publish to Security Command Center option is turned on.

For more information, see Publish data profiles to Security Command Center.

Save data profile copies to BigQuery

Sensitive Data Protection saves a copy of each generated data profile in a BigQuery table. If you don't provide the details of your preferred table, Sensitive Data Protection creates a dataset and table in the project. By default, the dataset is named sensitive_data_protection_discovery and the table is named discovery_profiles.

Important: The output table uses DataProfileBigQueryRowSchema as its schema. This schema can change as Sensitive Data Protection adds features. Make sure that your workflows can handle schema changes, for example, by ignoring unknown fields.

This action lets you keep a history of all of your generated profiles. This history can be useful for creating audit reports and visualizing data profiles. You can also load this information into other systems.

Also, this option lets you see all of your data profiles in a single view, regardless of which region your data resides in. Although you can also view the data profiles through the Google Cloud console, the console displays the profiles in only one region at a time.

When Sensitive Data Protection fails to profile a dataset, it periodically retries. To minimize noise in the exported data, Sensitive Data Protection exports only the successfully generated profiles to BigQuery.

Sensitive Data Protection starts exporting profiles from the time you turn on this option. Profiles that were generated before you turned on exporting aren't saved to BigQuery.

Note: The service agent associated with your project must have write access on the table where the profile copies will be saved.

For example queries that you can use when analyzing data profiles, see Analyze data profiles.
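As a starting point, the following is a minimal sketch that reads exported profiles with the google-cloud-bigquery Python client. It assumes the default dataset and table names mentioned earlier (sensitive_data_protection_discovery and discovery_profiles); replace them if you provided your own table, and select specific columns once you know which schema fields you need.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project ID

    # Inspect a few exported profile rows. SELECT * is used here because the
    # schema (DataProfileBigQueryRowSchema) can gain fields over time.
    query = """
        SELECT *
        FROM `my-project.sensitive_data_protection_discovery.discovery_profiles`
        LIMIT 10
    """
    for row in client.query(query).result():
        print(dict(row))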

Save sample discovery findings to BigQuery

Sensitive Data Protection can add sample findings to a BigQuery table of your choice. Sample findings represent a subset of all findings and might not represent all infoTypes that were discovered. Normally, the system generates around 10 sample findings per dataset, but this number can vary for each discovery run.

Each finding includes the actual string (also called the quote) that was detected and its exact location.

This action is useful if you want to evaluate whether your inspection configuration is correctly matching the type of information that you want to flag as sensitive. Using the exported data profiles and the exported sample findings, you can run queries to get more information about the specific items that were flagged, the infoTypes they matched, their exact locations, their calculated sensitivity levels, and other details.

Important: The output table uses DataProfileFinding as its schema. This schema can change as Sensitive Data Protection adds features. Make sure that your workflows can handle schema changes, for example, by ignoring unknown fields.

To save sample findings to a BigQuery table, follow these steps:

  1. Turn on Save sample discovery findings to BigQuery.

  2. Enter the details of the BigQuery table where you want to save the sample findings.

    The table that you specify for this action must be different from the table used for the Save data profile copies to BigQuery action.

    • For Project ID, enter the ID of the existing project where you want to export the findings.

    • For Dataset ID, enter the name of an existing dataset in the project.

    • For Table ID, enter the name of the BigQuery table where you want to save the findings. If this table doesn't exist, Sensitive Data Protection automatically creates it for you using the name that you provide.

Note: The service agent associated with your project must have write access on the table.

For information about the contents of each finding that is saved in the BigQuery table, see DataProfileFinding.

Publish to Pub/Sub

Turning on Publish to Pub/Sub lets you take programmatic actions based on profiling results. You can use Pub/Sub notifications to develop a workflow for catching and remediating findings with significant data risk or sensitivity.

To send notifications to a Pub/Sub topic, follow these steps:

  1. Turn on Publish to Pub/Sub.

    A list of options appears. Each option describes an event that causes Sensitive Data Protection to send a notification to Pub/Sub.

  2. Select the events that should trigger a Pub/Sub notification.

    If you select Send a Pub/Sub notification each time a profile is updated, Sensitive Data Protection sends a notification when there's a change in the sensitivity level, data risk level, detected infoTypes, public access, and other important metrics in the profile.

  3. For each event you select, follow these steps:

    1. Enter the name of the topic. The name must be in the following format:

      projects/PROJECT_ID/topics/TOPIC_ID

      Replace the following:

      • PROJECT_ID: the ID of the project associated with the Pub/Sub topic.
      • TOPIC_ID: the ID of the Pub/Sub topic.
    2. Specify whether to include the full dataset profile in the notification, or just the full resource name of the dataset that was profiled.

    3. Set the minimum data risk and sensitivity levels that must be met for Sensitive Data Protection to send a notification.

    4. Specify whether only one or both of the data risk and sensitivity conditions must be met. For example, if you choose AND, then both the data risk and the sensitivity conditions must be met before Sensitive Data Protection sends a notification.

Note: The service agent associated with your project must have publishing access on the Pub/Sub topic. An example of a role that has publishing access is the Pub/Sub Publisher role (roles/pubsub.publisher). If there are configuration or permission issues with the Pub/Sub topic, Sensitive Data Protection retries sending the Pub/Sub notification for up to two weeks. After two weeks, the notification is discarded.
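To consume these notifications, you can attach a subscription to the topic and pull messages. The following is a minimal sketch using the google-cloud-pubsub Python client; the project and subscription IDs are placeholders, and the payload format depends on whether you chose to send the full profile or only the resource name.

    from concurrent.futures import TimeoutError

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        "my-project", "dlp-profile-events-sub"  # placeholders
    )

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        # message.data carries the notification payload published by
        # Sensitive Data Protection: either the resource name of the profiled
        # dataset or the full profile, depending on your configuration.
        print(f"Received notification: {message.data!r}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    with subscriber:
        try:
            # Listen for one minute, then stop; in production, keep the
            # subscriber running.
            streaming_pull_future.result(timeout=60)
        except TimeoutError:
            streaming_pull_future.cancel()
            streaming_pull_future.result()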

Send to Dataplex Universal Catalog as aspects

This action lets you add Dataplex Universal Catalog aspects to profiled datasets based on insights from data profiles. This action is only applied to new and updated profiles. Existing profiles that aren't updated aren't sent to Dataplex Universal Catalog.

When you enable this action, Sensitive Data Protection attaches the Sensitive Data Protection profile aspect to the Dataplex Universal Catalog entry for each new or updated dataset that you profile. The generated aspects contain insights gathered from the data profiles. You can then search your organization and projects for entries with specific Sensitive Data Protection profile aspect values.

To send the data profiles to Dataplex Universal Catalog, make sure that the Send to Dataplex Catalog as aspects option is turned on.

For more information, see Add Dataplex Universal Catalog aspects based on insights from data profiles.

Set fallback processing locations for images

In general, Sensitive Data Protection processes your data in the location where the data is stored. However, images can only be processed in a multi-region or in the global region. If you set a fallback location, then Sensitive Data Protection uses your fallback location to process images that aren't in a multi-region or in the global region. If you skip this section, then those images aren't processed.

To set fallback locations for image processing, select one or both of the following:

  • Fall back to the multi-region: If an image can't be processed in its original location, then the image is processed in the multi-region that corresponds to the image's original location. If the image's original location has no corresponding multi-region, then the image is skipped.
  • Fall back to global: If an image can't be processed in its original location, then the image is processed in the global region.

If you select both options, Sensitive Data Protection chooses which location to use as a fallback location.

Set location to store configuration

Click the Resource location list, and select the region where you want to store this scan configuration. All scan configurations that you later create will also be stored in this location.

Where you choose to store your scan configuration doesn't affect the data to be scanned. Your data is scanned in the same region where that data is stored. For more information, see Data residency considerations.

Note: If you already have an existing scan configuration, you can't change the value set in this field. All scan configurations are stored in the same location. If you want to change the location of all your scan configurations, you must delete them, recreate them, and store them in the new location.

Review and create

  1. If you want to make sure that profiling doesn't start automatically after you create the scan configuration, select Create scan in paused mode.

    This option is useful in the following cases:

    • You opted to save data profiles to BigQuery and you want to make sure the service agent has write access to the BigQuery table where the data profile copies will be saved.
    • You opted to save sample discovery findings to BigQuery and you want to make sure that the service agent has write access to the BigQuery table where the sample findings will be saved.
    • You configured Pub/Sub notifications and you want to grant publishing access to the service agent.
  2. Review your settings and click Create.

    Sensitive Data Protection creates the scan configuration and adds it to the discovery scan configurations list.

To view or manage your scan configurations, see Manage scan configurations.
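If you want to automate this setup instead of using the console, recent versions of the google-cloud-dlp Python client expose a discovery configuration API. The following is a hedged sketch only: the Vertex-specific type names (VertexDatasetDiscoveryTarget, DiscoveryVertexDatasetFilter) are assumptions based on recent client-library versions, so verify them against the version that you have installed; the project ID, region, and template name are placeholders.

    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()

    # Assumed shape of a Vertex AI discovery configuration; check your
    # installed client library for the exact type and field names.
    discovery_config = dlp_v2.DiscoveryConfig(
        display_name="vertex-ai-profiling",  # placeholder
        status=dlp_v2.DiscoveryConfig.Status.RUNNING,
        inspect_templates=[
            "projects/my-project/locations/global/inspectTemplates/my-template"
        ],
        targets=[
            dlp_v2.DiscoveryTarget(
                vertex_dataset_target=dlp_v2.VertexDatasetDiscoveryTarget(
                    filter=dlp_v2.DiscoveryVertexDatasetFilter(
                        others=dlp_v2.AllOtherResources()  # profile all datasets in scope
                    )
                )
            )
        ],
    )

    response = client.create_discovery_config(
        parent="projects/my-project/locations/us-central1",  # placeholder location
        discovery_config=discovery_config,
    )
    print(f"Created scan configuration: {response.name}")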

Note: We regularly improve our detection algorithm. If we find that your organization or project would benefit from a new improvement that we implement, we might automatically regenerate your data profiles and redo the actions in your scan configuration. You won't incur Sensitive Data Protection charges for this operation. However, because we will redo the actions, you might incur charges for your use of other Google Cloud services. For example, if you configured Sensitive Data Protection to save the data profiles to BigQuery, you might incur BigQuery charges.

What's next

  • Learn how to manage data profiles.
  • Learn how to manage scan configurations.
  • Learn how to receive and parse Pub/Sub messages published by the data profiler.
  • Learn how to troubleshoot issues with data profiles.
  • Look through the data profiling limits.