Create de-identified copies of data stored in Cloud Storage using the Google Cloud console
This page describes how to inspect a Cloud Storage directory and create de-identified copies of the supported files, using Sensitive Data Protection in the Google Cloud console.
This operation helps to ensure that the files that you use in your business processes don't contain sensitive data, such as personally identifiable information (PII). Sensitive Data Protection can inspect files in a Cloud Storage bucket for sensitive data, and create de-identified copies of those files in a separate bucket. You can then use the de-identified copies in your business processes.
For more information about what happens when you de-identify data in storage, see De-identification of sensitive data in storage.
Before you begin
This page assumes the following:
You have enabled billing.
You have enabled Sensitive Data Protection.
You have a Cloud Storage bucket with data that you want to de-identify.
Learn about the limitations and points of consideration for this operation.
Storage inspection requires the following OAuth scope: https://www.googleapis.com/auth/cloud-platform. For more information, see Authenticating to the DLP API.
Required IAM roles
If all resources for this operation are in the same project, the DLP API Service Agent role (roles/dlp.serviceAgent) on the service agent is sufficient. With that role, you can do the following:
- Create the inspection job
- Read the files in the input directory
- Write the de-identified files in the output directory
- Write the transformation details in a BigQuery table
The relevant resources include the inspection job, de-identification templates, input bucket, output bucket, and transformation details table.
If you must have the resources in separate projects, make sure that the service agent of your project also has the following roles:
- The Storage Object Viewer role (roles/storage.objectViewer) on the input bucket or the project that contains it.
- The Storage Object Creator role (roles/storage.objectCreator) on the output bucket or the project that contains it.
- The BigQuery Data Editor role (roles/bigquery.dataEditor) on the transformation details table or the project that contains it.
To grant a role to the service agent, see Grant a single role. As the preceding list shows, you can grant each role on the individual resource (bucket or table) or on the project that contains it.
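If you script the cross-project setup, it can help to list which role goes on which resource before granting anything. The following minimal Python sketch only builds that list; it doesn't call the IAM API. The service agent address pattern (service-PROJECT_NUMBER@dlp-api.iam.gserviceaccount.com), the helper names, and the example bucket and table names are assumptions for illustration, so confirm the service agent's actual email on the IAM page of your project.

```python
from typing import List, NamedTuple


class RoleGrant(NamedTuple):
    resource: str  # resource on which the role is granted
    role: str      # predefined IAM role ID
    member: str    # principal that receives the role


def service_agent_member(project_number: str) -> str:
    """Hypothetical helper: IAM member string for the DLP API service agent.

    Assumes the service-PROJECT_NUMBER@dlp-api.iam.gserviceaccount.com
    pattern; check the real address on the IAM page of your project.
    """
    return f"serviceAccount:service-{project_number}@dlp-api.iam.gserviceaccount.com"


def cross_project_grants(project_number: str, input_bucket: str,
                         output_bucket: str, details_table: str) -> List[RoleGrant]:
    """Roles the service agent needs when resources sit in other projects."""
    member = service_agent_member(project_number)
    return [
        # Read the files in the input directory.
        RoleGrant(f"gs://{input_bucket}", "roles/storage.objectViewer", member),
        # Write the de-identified files to the output directory.
        RoleGrant(f"gs://{output_bucket}", "roles/storage.objectCreator", member),
        # Write transformation details to the BigQuery table.
        RoleGrant(details_table, "roles/bigquery.dataEditor", member),
    ]


if __name__ == "__main__":
    for grant in cross_project_grants("123456789012", "input-bucket",
                                      "output-bucket", "other-project.dataset.details"):
        print(grant.resource, grant.role, grant.member)
```

Each grant can be made on the resource itself or on its containing project; the sketch uses the narrower resource-level option.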
Overview
To create de-identified copies of your Cloud Storage files, you configure an inspection job that looks for sensitive data according to the criteria that you specify. Then, within the inspection job, you enable the Make a de-identified copy action. You can specify de-identify templates that dictate how Sensitive Data Protection transforms the findings. If you don't provide any de-identify template, Sensitive Data Protection transforms the findings as described in Default de-identification behavior.
Note: In Sensitive Data Protection, an action is something that occurs after a Sensitive Data Protection job completes successfully. For more information about actions, see the following:
- Actions conceptual topic
- Action reference documentation
- Retrieving inspection results
If you enable the Make a de-identified copy action, by default, Sensitive Data Protection transforms all supported file types included in the scan. However, you can configure the job to transform only a subset of the supported file types.
Optional: Create de-identify templates
If you want to control how the findings are transformed, create the following templates. These templates provide instructions about transforming findings in structured files, unstructured files, and images.
Note: If you choose a cryptographic method, you must first create a wrapped key using Cloud Key Management Service, and provide that key in your de-identification template. Transient (raw) keys aren't supported.
- De-identify template: a default de-identify template to be used for unstructured files, such as freeform text files. This type of de-identify template can't contain record transformations, which are only supported for structured content. If this template isn't present, Sensitive Data Protection uses the infoType replacement method to transform unstructured files.
- Structured de-identify template: a de-identify template to be used for structured files, such as CSV files. This de-identify template can contain record transformations. If this template isn't present, Sensitive Data Protection uses the default de-identify template that you created. If that is also not present, Sensitive Data Protection uses the infoType replacement method to transform structured files.
- Image redaction template: a de-identify template to be used for images. If this template isn't present, Sensitive Data Protection redacts all findings in images with a black box.
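The fallback order described in the preceding list can be summarized as a small lookup. The following Python sketch only models the documented rules; the function name and the file_kind labels ("unstructured", "structured", "image") are hypothetical and don't correspond to any API field.

```python
from typing import Optional

INFOTYPE_REPLACEMENT = "infoType replacement"
BLACK_BOX = "black-box redaction of all findings"


def choose_transformation(file_kind: str,
                          deidentify_template: Optional[str] = None,
                          structured_template: Optional[str] = None,
                          image_template: Optional[str] = None) -> str:
    """Return the template that applies, or the built-in fallback behavior.

    file_kind is a hypothetical label: "unstructured", "structured", or "image".
    """
    if file_kind == "unstructured":
        # Default template, else infoType replacement.
        return deidentify_template or INFOTYPE_REPLACEMENT
    if file_kind == "structured":
        # Structured template, else default template, else infoType replacement.
        return structured_template or deidentify_template or INFOTYPE_REPLACEMENT
    if file_kind == "image":
        # Image redaction template, else black boxes over every finding.
        return image_template or BLACK_BOX
    raise ValueError(f"unknown file kind: {file_kind!r}")
```

Note that the default de-identify template is a fallback for structured files but not for images.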
Learn how to create a de-identify template.
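For reference, a default template and a structured template might look roughly like the following request bodies for the DLP API v2 method projects.locations.deidentifyTemplates.create. The field names follow the API's JSON representation as we understand it; verify them against the API reference before use. The template IDs, display names, and the email column are hypothetical, and image redaction templates aren't shown.

```python
import json


def default_template_body(template_id: str) -> dict:
    """Request body for a default (unstructured) de-identify template.

    Replaces each finding with the name of its infoType.
    """
    return {
        "templateId": template_id,
        "deidentifyTemplate": {
            "displayName": "Default de-identify template",
            "deidentifyConfig": {
                "infoTypeTransformations": {
                    "transformations": [
                        {"primitiveTransformation": {"replaceWithInfoTypeConfig": {}}}
                    ]
                }
            },
        },
    }


def structured_template_body(template_id: str, column: str) -> dict:
    """Request body for a structured template that overwrites one column.

    The column name is a hypothetical example.
    """
    return {
        "templateId": template_id,
        "deidentifyTemplate": {
            "displayName": "Structured de-identify template",
            "deidentifyConfig": {
                "recordTransformations": {
                    "fieldTransformations": [
                        {
                            "fields": [{"name": column}],
                            "primitiveTransformation": {
                                "replaceConfig": {"newValue": {"stringValue": "[REDACTED]"}}
                            },
                        }
                    ]
                }
            },
        },
    }


def check_default_template(body: dict) -> None:
    """Reject record transformations, which only structured templates may contain."""
    config = body["deidentifyTemplate"]["deidentifyConfig"]
    if "recordTransformations" in config:
        raise ValueError("the default de-identify template can't contain record transformations")


if __name__ == "__main__":
    check_default_template(default_template_body("default-template"))
    print(json.dumps(structured_template_body("structured-template", "email"), indent=2))
```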
Create an inspection job that has a de-identification action
In the Google Cloud console, go to the Create job or job trigger page.
Enter the Sensitive Data Protection job information, and click Continue to complete each step.
The following sections describe how to fill in the relevant sections of thepage.
Choose input data
In the Choose input data section, do the following:
- Optional: For Name, enter an identifier for the inspection job.
- For Resource location, select Global or the region where you want to store the inspection job.
- For Location, select Google Cloud Storage.
- For URL, enter the path to the input directory. The input directory contains the data that you want to scan, for example, gs://input-bucket/folder1/folder1a. If you want to scan the input directory recursively, add a trailing slash to the URL, and then select Scan recursively.
- In the Sampling section, in the Sampling method list, select No sampling.
Sampling isn't supported on jobs and job triggers configured with de-identification.
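If you later move this configuration to the DLP API, the input settings map to the storageConfig field of the inspection job. The following Python sketch builds that fragment. The recursive flag and its translation to a trailing ** wildcard are our assumption about how the console's Scan recursively option maps to the API's FileSet URL; verify this against the FileSet reference. The sampling fields are left unset because sampling isn't supported with de-identification.

```python
def storage_config(url: str, recursive: bool = False) -> dict:
    """Build the storageConfig part of an inspection job for a gs:// directory."""
    if not url.startswith("gs://"):
        raise ValueError("the input URL must start with gs://")
    if recursive:
        # Assumption: a trailing "**" wildcard is the API counterpart of the
        # console's "Scan recursively" option. Verify in the FileSet reference.
        url = url.rstrip("/") + "/**"
    return {
        "cloudStorageOptions": {
            # Sampling fields (bytesLimitPerFile, filesLimitPercent,
            # sampleMethod) are deliberately left unset: sampling isn't
            # supported when the job de-identifies data.
            "fileSet": {"url": url},
        }
    }
```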
Configure detection
In the Configure detection section, choose the types of sensitive data to inspect for. These are called infoTypes. You can select from the list of predefined infoTypes, or you can select a template if one exists. For more details, see Configure detection.
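In the DLP API, the same choice corresponds to the inspectConfig.infoTypes and inspectTemplateName fields of the inspection job configuration. The following Python sketch builds those fields; the helper name and the example template name are hypothetical, while EMAIL_ADDRESS and PHONE_NUMBER are predefined infoTypes.

```python
from typing import Iterable, Optional


def detection_settings(info_types: Iterable[str] = (),
                       inspect_template_name: Optional[str] = None) -> dict:
    """Build the detection fields of an inspection job configuration."""
    settings: dict = {}
    names = list(info_types)
    if names:
        # Each predefined infoType is referenced by name.
        settings["inspectConfig"] = {"infoTypes": [{"name": n} for n in names]}
    if inspect_template_name:
        # Full resource name of an existing inspection template.
        settings["inspectTemplateName"] = inspect_template_name
    return settings
```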
Add actions
In the Add actions section, do the following:
- Turn on Make a de-identified copy.
- Optional: For De-identification template, enter the full resource name of the default de-identify template, if you created one.
- Optional: For Structured de-identification template, enter the full resource name of the de-identify template for structured files, if you created one. If you didn't, Sensitive Data Protection uses the default de-identify template, if you created one.
- Optional: For Image redaction template, enter the full resource name of the image redaction template, if you created one.
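The three template fields map to the transformationConfig of the API's deidentify action. The following Python sketch assembles that object and performs a basic sanity check that each value looks like a full resource name (projects/PROJECT_ID/locations/LOCATION/deidentifyTemplates/TEMPLATE_ID). The helper name and the check itself are illustrative assumptions, not an API requirement.

```python
from typing import Optional


def transformation_config(deidentify_template: Optional[str] = None,
                          structured_template: Optional[str] = None,
                          image_template: Optional[str] = None) -> dict:
    """Map the three optional console fields to a transformationConfig dict."""
    fields = {
        "deidentifyTemplate": deidentify_template,
        "structuredDeidentifyTemplate": structured_template,
        "imageRedactTemplate": image_template,
    }
    # Keep only the templates that were actually provided.
    config = {key: name for key, name in fields.items() if name}
    for name in config.values():
        # Full resource names look like
        # projects/PROJECT_ID/locations/LOCATION/deidentifyTemplates/TEMPLATE_ID.
        if "/deidentifyTemplates/" not in name:
            raise ValueError(f"not a full de-identify template resource name: {name}")
    return config
```

An empty result means no templates were provided, so the default de-identification behavior applies.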
Optional: If you want Sensitive Data Protection to store the transformation details in a BigQuery table, select Export transformation details to BigQuery, then fill in the following:
- Project ID: the project that contains the BigQuery table.
- Dataset ID: the dataset that contains the BigQuery table.
- Table ID: the table where Sensitive Data Protection must store details about each transformation. Sensitive Data Protection creates this table with the table ID that you provide. If you don't provide a table ID, the system automatically creates one.
This table does not store the actual de-identified content.
When data is written to a BigQuery table, the billing and quota usage are applied to the project that contains the destination table.
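The three BigQuery fields correspond to the transformationDetailsStorageConfig of the API's deidentify action. In the following minimal Python sketch, the helper name and example IDs are hypothetical; omitting tableId mirrors the console behavior described above, where the system chooses a table ID for you.

```python
from typing import Optional


def details_storage_config(project_id: str, dataset_id: str,
                           table_id: Optional[str] = None) -> dict:
    """Build transformationDetailsStorageConfig for the deidentify action."""
    table = {"projectId": project_id, "datasetId": dataset_id}
    if table_id:
        # Sensitive Data Protection creates the table with this ID.
        table["tableId"] = table_id
    # Without a tableId, the system chooses one.
    return {"table": table}
```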
For Cloud Storage output location, enter the URL of the Cloud Storage directory where you want to store the de-identified files. This directory must not be in the same Cloud Storage bucket as the input directory.
Optional: For File types, select the types of files that you want to transform.
For more information about other actions you can add, see Add actions.
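The output location and file type settings also belong to the deidentify action. The following Python sketch builds that action and rejects an output directory in the same bucket as the input directory. The helper names are hypothetical, and CSV and TEXT_FILE are only example values of the API's FileType enum; check the API reference for the types that de-identification supports.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def bucket_of(url: str) -> str:
    """Return the bucket name from a gs:// URL."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a Cloud Storage URL: {url}")
    return parsed.netloc


def deidentify_action(input_url: str, output_url: str,
                      file_types: Iterable[str] = (),
                      transformation_config: Optional[dict] = None) -> dict:
    """Build a deidentify action, enforcing the separate-bucket rule."""
    if bucket_of(input_url) == bucket_of(output_url):
        raise ValueError("the output directory must be in a different bucket than the input")
    deidentify: dict = {"cloudStorageOutput": output_url}
    if transformation_config:
        deidentify["transformationConfig"] = transformation_config
    types = list(file_types)
    if types:
        # Omit this field to transform all supported file types in the scan.
        deidentify["fileTypesToTransform"] = types
    return {"deidentify": deidentify}
```

Comparing bucket names, rather than URL prefixes, catches cases such as an output folder nested elsewhere in the input bucket.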
Schedule
In the Schedule section, specify whether you want to make this job a recurring job:
- To run the scan only once, keep the field set to None.
- To schedule scans to run periodically, click Create a trigger to run the job on a periodic schedule.
For more information, see Schedule.
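A recurring job corresponds to a job trigger in the DLP API (projects.locations.jobTriggers.create). The following Python sketch wraps an inspection job configuration in a trigger request body with a periodic schedule. The helper name and display name are hypothetical, and the 1-to-60-day range check reflects our understanding of the API's limits on recurrencePeriodDuration; confirm it in the Schedule reference.

```python
SECONDS_PER_DAY = 86_400


def job_trigger_body(inspect_job: dict, every_days: int, display_name: str) -> dict:
    """Wrap an inspection job in a jobTriggers.create request body."""
    # Assumption: the API accepts recurrence periods from 1 day to 60 days.
    if not 1 <= every_days <= 60:
        raise ValueError("recurrence period must be between 1 and 60 days")
    return {
        "jobTrigger": {
            "displayName": display_name,
            "inspectJob": inspect_job,
            "triggers": [
                # Durations use the JSON string form, for example "86400s".
                {"schedule": {"recurrencePeriodDuration": f"{every_days * SECONDS_PER_DAY}s"}}
            ],
            "status": "HEALTHY",
        }
    }
```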
Review
In the Review section, review the job configuration, and if needed, edit the job.
Click Create.
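For comparison, a one-time job roughly corresponds to a POST to the DLP API's projects.locations.dlpJobs.create method. The following Python sketch only builds the endpoint URL and a minimal request body; sending it requires an authenticated HTTP client with the cloud-platform scope, which isn't shown. The project ID, bucket paths, and helper name are placeholders. The Global resource location maps to the location ID global.

```python
import json
from typing import Iterable, Tuple


def create_job_request(project_id: str, location: str, input_url: str,
                       output_url: str, info_types: Iterable[str]) -> Tuple[str, dict]:
    """Return the dlpJobs.create endpoint and a minimal request body."""
    endpoint = (f"https://dlp.googleapis.com/v2/projects/{project_id}"
                f"/locations/{location}/dlpJobs")
    body = {
        "inspectJob": {
            "storageConfig": {"cloudStorageOptions": {"fileSet": {"url": input_url}}},
            "inspectConfig": {"infoTypes": [{"name": n} for n in info_types]},
            # No transformationConfig: default de-identification behavior applies.
            "actions": [{"deidentify": {"cloudStorageOutput": output_url}}],
        }
    }
    return endpoint, body


if __name__ == "__main__":
    url, payload = create_job_request("my-project", "global",
                                      "gs://input-bucket/folder1/folder1a",
                                      "gs://output-bucket/deidentified",
                                      ["EMAIL_ADDRESS", "PHONE_NUMBER"])
    print("POST", url)
    print(json.dumps(payload, indent=2))
```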
If you opted not to schedule the job, Sensitive Data Protection immediately starts running it. After the job completes, the system redirects you to the Job details page, where you can view the results of the inspection and de-identification operations.
If you opted to export the transformation details to a BigQuery table, Sensitive Data Protection populates the table with one row for each transformation that it made. For each transformation, the details include a description, a success or error code, any error details, the number of bytes transformed, the location of the transformed content, and the name of the inspection job in which Sensitive Data Protection made the transformation. This table does not contain the actual de-identified content.
Confirm that the files were de-identified
- On the Job details page, click the Configuration tab.
- To view the de-identified files in the output directory, click the link in the Output bucket for de-identified Cloud Storage data field.
- To view the BigQuery table that contains the transformation details, click the link in the Transformation Details field.
For information about how to query a BigQuery table, see Running interactive queries.
What's next
- Learn more about the process of de-identifying data in storage.
- Learn how to de-identify sensitive data stored in Cloud Storage using the DLP API.
- Work through the Creating a De-identified Copy of Data in Cloud Storage codelab.
- Learn more about de-identification transformations.
- Learn how to create and schedule inspection jobs.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.