Create de-identified copies of data stored in Cloud Storage using the API
This page describes how to inspect a Cloud Storage resource and create de-identified copies of the data using the Cloud Data Loss Prevention API.
This operation helps to ensure that the files that you use in your business processes don't contain sensitive data, such as personally identifiable information (PII). Sensitive Data Protection can inspect files in a Cloud Storage bucket for sensitive data, and create de-identified copies of those files in a separate bucket. You can then use the de-identified copies in your business processes.
For more information about this feature, see De-identification of sensitive data in Cloud Storage.
Before you begin
This page assumes the following:
- You have enabled billing.
- You have enabled Sensitive Data Protection.
- You have a Cloud Storage bucket with data that you want to de-identify.
- You know how to send an HTTP request to the DLP API. For more information, see Inspect sensitive text by using the DLP API.
Learn about the limitations and points of consideration for this operation.
Storage inspection requires the following OAuth scope: https://www.googleapis.com/auth/cloud-platform. For more information, see Authenticating to the DLP API.
Required IAM roles
If all resources for this operation are in the same project, the DLP API Service Agent role (roles/dlp.serviceAgent) on the service agent is sufficient. With that role, you can do the following:
- Create the inspection job
- Read the files in the input directory
- Write the de-identified files in the output directory
- Write the transformation details in a BigQuery table
The relevant resources include the inspection job, de-identification templates, input bucket, output bucket, and transformation details table.
If you must have the resources in separate projects, make sure that the service agent of your project also has the following roles:
- The Storage Object Viewer role (roles/storage.objectViewer) on the input bucket or the project that contains it.
- The Storage Object Creator role (roles/storage.objectCreator) on the output bucket or the project that contains it.
- The BigQuery Data Editor role (roles/bigquery.dataEditor) on the transformation details table or the project that contains it.
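The extra grants above can be listed as plain data before you apply them. The following Python is a minimal sketch, not an official tool: `Binding` and `cross_project_bindings` are hypothetical names, and the service agent address and resource names in the usage loop are placeholders. The `service-PROJECT_NUMBER@dlp-api.iam.gserviceaccount.com` address format is an assumption; confirm the actual service agent address on your project's IAM page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Binding:
    """One IAM grant: `role` for `member` on `resource` (illustrative type)."""

    resource: str
    role: str
    member: str


def cross_project_bindings(
    service_agent: str, input_bucket: str, output_bucket: str, details_table: str
) -> List[Binding]:
    """Return the extra roles the service agent needs when the input bucket,
    output bucket, and transformation details table sit in other projects.
    Each grant can alternatively be made on the project that contains the resource."""
    member = f"serviceAccount:{service_agent}"
    return [
        Binding(input_bucket, "roles/storage.objectViewer", member),
        Binding(output_bucket, "roles/storage.objectCreator", member),
        Binding(details_table, "roles/bigquery.dataEditor", member),
    ]


# Hypothetical placeholder names; replace with your own service agent and resources.
for grant in cross_project_bindings(
    "service-123456789@dlp-api.iam.gserviceaccount.com",
    "gs://raw-input-bucket",
    "gs://deidentified-output-bucket",
    "analytics-project.dlp_details.transformations",
):
    print(grant.role, "on", grant.resource)
```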
To grant a role to the service agent, see Grant a single role. You can also control access at the following levels:
API overview
To create de-identified copies of content stored in Cloud Storage, you configure an inspection job that looks for sensitive data according to the criteria that you specify. Then, within the inspection job, you provide de-identification instructions in the form of a Deidentify action. For more information, see the following:
- Actions conceptual topic
- Action reference documentation
- Retrieving inspection results
If you want to scan only a subset of the files in your bucket, you can limit the files that the job scans. The supported options for jobs with de-identification are file filtering by type (FileType) and by regular expression (FileSet).
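To sketch the two filtering options, the following Python builds a `cloud_storage_options` dictionary in the snake_case shape that the Python sample on this page passes to `create_dlp_job`. The `regex_file_set` field (with `bucket_name`, `include_regex`, and `exclude_regex`) and the `file_types` field are assumed from the `CloudStorageOptions` reference; check them there. `preview_selection` is a hypothetical local helper that assumes a path is selected when it fully matches an include pattern (or no include patterns are given) and matches no exclude pattern. The service uses RE2 syntax, which Python's `re` agrees with only for simple patterns.

```python
import re
from typing import Dict, List, Optional


def regex_file_set_options(
    bucket: str,
    include: List[str],
    exclude: Optional[List[str]] = None,
    file_types: Optional[List[str]] = None,
) -> Dict:
    """Build cloud_storage_options that limit the scan by regex and file type."""
    options: Dict = {
        "file_set": {
            "regex_file_set": {
                "bucket_name": bucket,
                "include_regex": include,
                "exclude_regex": exclude or [],
            }
        }
    }
    if file_types:
        # FileType enum names, for example "CSV", "TEXT_FILE", "IMAGE".
        options["file_types"] = file_types
    return options


def preview_selection(paths: List[str], include: List[str], exclude: List[str]) -> List[str]:
    """Local approximation (assumed semantics) of which object paths are scanned."""

    def hits(patterns: List[str], path: str) -> bool:
        return any(re.fullmatch(p, path) for p in patterns)

    return [p for p in paths if (not include or hits(include, p)) and not hits(exclude, p)]
```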
When you enable the Deidentify action, by default, Sensitive Data Protection creates de-identified (transformed) copies of all supported file types included in the scan. However, you can configure the job to transform only a subset of the supported file types.
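A minimal sketch of that choice, assuming the same dictionary shape as the Python sample on this page: omitting `file_types_to_transform` keeps the default of transforming every supported type in the scan, while passing a list narrows it. `deidentify_action` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Optional


def deidentify_action(output_url: str, transform_types: Optional[Iterable[str]] = None) -> Dict:
    """Build a Deidentify action dictionary for an inspect job's `actions` list."""
    deidentify: Dict = {"cloud_storage_output": output_url}
    # Deduplicate and sort for a stable request body.
    types = sorted(set(transform_types or []))
    if types:
        deidentify["file_types_to_transform"] = types
    return {"deidentify": deidentify}
```

For example, `deidentify_action("gs://my-output-bucket", ["CSV", "TEXT_FILE"])` would transform only CSV and text files, even if the scan also covers images.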
Optional: Create de-identify templates
If you want to control how the findings are transformed, create the following templates. These templates provide instructions about transforming findings in structured files, unstructured files, and images.
Note: If you choose a cryptographic method, you must first create a wrapped key using Cloud Key Management Service, and provide that key in your de-identification template. Transient (raw) keys aren't supported.
- De-identify template: a default DeidentifyTemplate to be used for unstructured files, such as freeform text files. This type of DeidentifyTemplate can't contain a RecordTransformations object, which is only supported for structured content. If this template isn't present, Sensitive Data Protection uses the ReplaceWithInfoTypeConfig method to transform unstructured files.
- Structured de-identify template: a DeidentifyTemplate to be used for structured files, such as CSV files. This DeidentifyTemplate can contain RecordTransformations. If this template isn't present, Sensitive Data Protection uses the default de-identify template that you created. If that is also not present, Sensitive Data Protection uses the ReplaceWithInfoTypeConfig method to transform structured files.
- Image redaction template: a DeidentifyTemplate to be used for images. This template must contain an ImageTransformations object. If this template isn't present, Sensitive Data Protection redacts all findings in images with a black box.
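The fallback order for the three templates can be expressed as a small lookup. This Python sketch only models the documented behavior so you can reason about a configuration; it doesn't call the API. The `kind` values and the two fallback labels are illustrative strings, not API values; the dictionary keys follow the `TransformationConfig` field names used in the Python sample on this page.

```python
from typing import Dict, Optional

# Illustrative labels for the built-in behavior (not API values).
REPLACE_WITH_INFO_TYPE = "ReplaceWithInfoTypeConfig"
BLACK_BOX = "black box redaction"


def effective_transformation(kind: str, config: Dict[str, Optional[str]]) -> str:
    """Return the template (or built-in fallback) applied to a file kind.

    `kind` is "unstructured", "structured", or "image".
    """
    default = config.get("deidentify_template")
    if kind == "unstructured":
        return default or REPLACE_WITH_INFO_TYPE
    if kind == "structured":
        # Structured template first, then the default template, then the built-in.
        return config.get("structured_deidentify_template") or default or REPLACE_WITH_INFO_TYPE
    if kind == "image":
        return config.get("image_redact_template") or BLACK_BOX
    raise ValueError(f"unknown file kind: {kind!r}")
```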
Learn more about creating a de-identify template.
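The following Python sketch shows two template bodies in the snake_case dictionary form used by the Python client, plus a check for the key requirement in the note about cryptographic methods: every `crypto_key` must be KMS-wrapped. The field names (`info_type_transformations`, `crypto_deterministic_config`, `kms_wrapped`, `image_transformations`, `redaction_color`) are assumed from the DLP API v2 reference; verify them there. The key name, wrapped key bytes, and display names are placeholders, and `find_unwrapped_keys` is a hypothetical helper, not part of the client library.

```python
from typing import Any, List

# Placeholder Cloud KMS key name; replace with your own.
KMS_KEY = "projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key"

# Default template for unstructured files: tokenize email addresses with a
# KMS-wrapped key (InfoTypeTransformations, no RecordTransformations).
unstructured_template = {
    "display_name": "Unstructured files",
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "EMAIL_ADDRESS"}],
                    "primitive_transformation": {
                        "crypto_deterministic_config": {
                            "crypto_key": {
                                "kms_wrapped": {
                                    "wrapped_key": b"WRAPPED_KEY_BYTES",  # placeholder
                                    "crypto_key_name": KMS_KEY,
                                }
                            },
                            "surrogate_info_type": {"name": "EMAIL_TOKEN"},
                        }
                    },
                }
            ]
        }
    },
}

# Image redaction template: contains image_transformations (black boxes).
image_template = {
    "display_name": "Images",
    "deidentify_config": {
        "image_transformations": {
            "transforms": [
                {"all_info_types": {}, "redaction_color": {"red": 0.0, "green": 0.0, "blue": 0.0}}
            ]
        }
    },
}


def find_unwrapped_keys(node: Any, path: str = "") -> List[str]:
    """Return the path of every crypto_key that is not KMS-wrapped."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "crypto_key" and not (isinstance(value, dict) and "kms_wrapped" in value):
                found.append(child)
            found.extend(find_unwrapped_keys(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_unwrapped_keys(item, f"{path}[{i}]"))
    return found
```

To store one of these bodies, the Python client's `create_deidentify_template` method accepts a request with `parent` and `deidentify_template` fields.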
Create an inspection job that has a de-identification action
The DlpJob object provides instructions on what to inspect, what types of data to flag as sensitive, and what to do with the findings. To de-identify sensitive data in a Cloud Storage directory, your DlpJob must define at least the following:
- A StorageConfig object, which specifies the Cloud Storage directory to inspect.
- An InspectConfig object, which contains the types of data to look for and additional inspection instructions for how to find the sensitive data.
- A Deidentify action that contains the following:
  - A TransformationConfig object, which specifies any templates you created for de-identifying data in structured and unstructured files. You can also include configuration for redacting sensitive data from images. If you don't include a TransformationConfig object, Sensitive Data Protection replaces sensitive data in text with its infoType. On images, it covers sensitive data with a black box.
  - A TransformationDetailsStorageConfig object, which specifies a BigQuery table where Sensitive Data Protection must store details about each transformation. For each transformation, details include a description, a success or error code, any error details, the number of bytes transformed, the location of the transformed content, and the name of the inspection job in which Sensitive Data Protection made the transformation. This table does not store the actual de-identified content.
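As a sketch of that minimum, the following Python assembles an `inspect_job` dictionary in the same shape as the Python sample on this page. `build_inspect_job` is a hypothetical helper. It also sets `cloud_storage_output`, which every sample on this page uses as the destination for the de-identified copies, and it treats the transformation details table as optional because the section later describes it as something you may include. Leaving out `transformation_config` relies on the default behavior described above.

```python
from typing import Dict, List, Optional


def build_inspect_job(
    input_url: str,
    info_types: List[str],
    output_url: str,
    transformation_config: Optional[Dict[str, str]] = None,
    details_table: Optional[Dict[str, str]] = None,
) -> Dict:
    """Assemble StorageConfig, InspectConfig, and a Deidentify action."""
    deidentify: Dict = {"cloud_storage_output": output_url}
    # Without a TransformationConfig, text findings are replaced with their
    # infoType and image findings are covered with a black box.
    if transformation_config:
        deidentify["transformation_config"] = transformation_config
    if details_table:
        deidentify["transformation_details_storage_config"] = {"table": details_table}
    return {
        "storage_config": {"cloud_storage_options": {"file_set": {"url": input_url}}},
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        "actions": [{"deidentify": deidentify}],
    }
```

You could then pass the result to the client as the Python sample does, for example `dlp.create_dlp_job(request={"parent": parent, "inspect_job": job})`.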
When data is written to a BigQuery table, the billing and quota usage are applied to the project that contains the destination table.
After the copied content is de-identified, the de-identification job finishes. The job contains a summary of how many times the specified transformations have been applied, which you can retrieve by calling the projects.dlpJobs.get method. The returned DlpJob includes both a DeidentifyDataSourceDetails object and an InspectDataSourceDetails object, which contain the results of the Deidentify action and of the inspection job, respectively.
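The following Python sketch reads both summaries from a job returned by `projects.dlpJobs.get`, in its REST JSON (camelCase) form. The path to the de-identification statistics (`actionDetails[].deidentifyDetails.deidentifyStats.transformationCount`) is an assumption based on the DLP API v2 reference; verify it against the `DlpJob` and `DeidentifyDataSourceDetails` pages before relying on it. `summarize_job` is a hypothetical helper. Counts are converted with `int()` because the JSON encoding returns 64-bit integers as strings.

```python
from typing import Any, Dict


def summarize_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Extract per-infoType finding counts and the total transformation count."""
    stats = job.get("inspectDetails", {}).get("result", {}).get("infoTypeStats", [])
    findings = {s["infoType"]["name"]: int(s["count"]) for s in stats}
    transformed = 0
    for details in job.get("actionDetails", []):
        deid_stats = details.get("deidentifyDetails", {}).get("deidentifyStats", {})
        transformed += int(deid_stats.get("transformationCount", 0))
    return {"findings": findings, "transformation_count": transformed}
```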
If you included a TransformationDetailsStorageConfig object in your DlpJob, a BigQuery table is created containing metadata about the transformation details. For each transformation that occurs, Sensitive Data Protection writes one row of metadata to the table. For more information about the contents of the table, see Transformation details reference.
Code examples
The following examples demonstrate how to use the DLP API to create de-identified copies of Cloud Storage files.
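The REST request in this section shows only its HTTP method and URL. As a hedged sketch of a complete call, the following standard-library Python prepares (but doesn't send) that POST request, with a camelCase JSON body mirroring the client-library samples and an OAuth 2.0 access token carrying the `https://www.googleapis.com/auth/cloud-platform` scope. `build_create_job_request` is a hypothetical helper; the project ID, bucket paths, and token are placeholders.

```python
import json
import urllib.request


def build_create_job_request(project_id: str, access_token: str, body: dict) -> urllib.request.Request:
    """Prepare POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs."""
    url = f"https://dlp.googleapis.com/v2/projects/{project_id}/dlpJobs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace before sending.
body = {
    "inspectJob": {
        "storageConfig": {"cloudStorageOptions": {"fileSet": {"url": "gs://my-input-bucket/dir"}}},
        "inspectConfig": {"infoTypes": [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}]},
        "actions": [
            {
                "deidentify": {
                    "cloudStorageOutput": "gs://my-output-bucket/dir",
                    "fileTypesToTransform": ["CSV", "IMAGE", "TEXT_FILE"],
                }
            }
        ],
    }
}
req = build_create_job_request("my-project", "ACCESS_TOKEN", body)
```

To send it, you could pass `req` to `urllib.request.urlopen`; for local testing, `gcloud auth print-access-token` prints a short-lived token.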
Note: The following example requires an OAuth 2.0 access token.
HTTP method and URL:
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs
C#
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
```csharp
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using System.Linq;

public class DeidentifyDataStoredInCloudStorage
{
    public static DlpJob Deidentify(
        string projectId,
        string gcsInputPath,
        string unstructuredDeidentifyTemplatePath,
        string structuredDeidentifyTemplatePath,
        string imageRedactionTemplatePath,
        string gcsOutputPath,
        string datasetId,
        string tableId)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct the storage config by specifying the input directory.
        var storageConfig = new StorageConfig
        {
            CloudStorageOptions = new CloudStorageOptions
            {
                FileSet = new CloudStorageOptions.Types.FileSet
                {
                    Url = gcsInputPath
                }
            }
        };

        // Construct the inspect config by specifying the type of info to be inspected.
        var inspectConfig = new InspectConfig
        {
            InfoTypes =
            {
                new InfoType[]
                {
                    new InfoType { Name = "PERSON_NAME" },
                    new InfoType { Name = "EMAIL_ADDRESS" }
                }
            },
            IncludeQuote = true
        };

        // Construct the actions to take after the inspection portion of the job is completed.
        // Specify how Cloud DLP must de-identify sensitive data in structured files, unstructured files and images
        // using Transformation config.
        // The de-identified files will be written to the GCS bucket path specified in gcsOutputPath and the details of
        // transformations performed will be written to the BigQuery table specified in datasetId and tableId.
        var actions = new Action[]
        {
            new Action
            {
                Deidentify = new Action.Types.Deidentify
                {
                    CloudStorageOutput = gcsOutputPath,
                    TransformationConfig = new TransformationConfig
                    {
                        DeidentifyTemplate = unstructuredDeidentifyTemplatePath,
                        ImageRedactTemplate = imageRedactionTemplatePath,
                        StructuredDeidentifyTemplate = structuredDeidentifyTemplatePath,
                    },
                    TransformationDetailsStorageConfig = new TransformationDetailsStorageConfig
                    {
                        Table = new BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                }
            }
        };

        // Construct the inspect job config using created storage config, inspect config and actions.
        var inspectJob = new InspectJobConfig
        {
            StorageConfig = storageConfig,
            InspectConfig = inspectConfig,
            Actions = { actions }
        };

        // Create the dlp job and call the API.
        DlpJob response = dlp.CreateDlpJob(new CreateDlpJobRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectJob = inspectJob
        });

        return response;
    }
}
```
Go
```go
import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

func deidentifyCloudStorage(w io.Writer, projectID, gcsUri, tableId, datasetId, outputDirectory, deidentifyTemplateId, structuredDeidentifyTemplateId, imageRedactTemplateId string) error {
	// projectID := "my-project-id"
	// gcsUri := "gs://" + "your-bucket-name" + "/path/to/your/file.txt"
	// tableId := "your-bigquery-table-id"
	// datasetId := "your-bigquery-dataset-id"
	// outputDirectory := "your-output-directory"
	// deidentifyTemplateId := "your-deidentify-template-id"
	// structuredDeidentifyTemplateId := "your-structured-deidentify-template-id"
	// imageRedactTemplateId := "your-image-redact-template-id"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Set path in Cloud Storage.
	cloudStorageOptions := &dlppb.CloudStorageOptions{
		FileSet: &dlppb.CloudStorageOptions_FileSet{
			Url: gcsUri,
		},
	}

	// Define the storage config options for cloud storage options.
	storageConfig := &dlppb.StorageConfig{
		Type: &dlppb.StorageConfig_CloudStorageOptions{
			CloudStorageOptions: cloudStorageOptions,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
	infoTypes := []*dlppb.InfoType{
		{Name: "PERSON_NAME"},
		{Name: "EMAIL_ADDRESS"},
	}

	// inspectConfig holds the configuration settings for data inspection and analysis
	// within the context of the Google Cloud Data Loss Prevention (DLP) API.
	inspectConfig := &dlppb.InspectConfig{
		InfoTypes:    infoTypes,
		IncludeQuote: true,
	}

	// Types of files to include for de-identification.
	fileTypesToTransform := []dlppb.FileType{
		dlppb.FileType_CSV,
		dlppb.FileType_IMAGE,
		dlppb.FileType_TEXT_FILE,
	}

	// Specify the BigQuery table that stores the transformation details.
	table := &dlppb.BigQueryTable{
		ProjectId: projectID,
		DatasetId: datasetId,
		TableId:   tableId,
	}

	// transformationDetailsStorageConfig holds configuration settings for storing transformation
	// details in the context of the Google Cloud Data Loss Prevention (DLP) API.
	transformationDetailsStorageConfig := &dlppb.TransformationDetailsStorageConfig{
		Type: &dlppb.TransformationDetailsStorageConfig_Table{
			Table: table,
		},
	}

	transformationConfig := &dlppb.TransformationConfig{
		DeidentifyTemplate:           deidentifyTemplateId,
		ImageRedactTemplate:          imageRedactTemplateId,
		StructuredDeidentifyTemplate: structuredDeidentifyTemplateId,
	}

	// Action to execute on the completion of a job.
	deidentify := &dlppb.Action_Deidentify{
		TransformationConfig:               transformationConfig,
		TransformationDetailsStorageConfig: transformationDetailsStorageConfig,
		Output: &dlppb.Action_Deidentify_CloudStorageOutput{
			CloudStorageOutput: outputDirectory,
		},
		FileTypesToTransform: fileTypesToTransform,
	}

	action := &dlppb.Action{
		Action: &dlppb.Action_Deidentify_{
			Deidentify: deidentify,
		},
	}

	// Configure the inspection job we want the service to perform.
	inspectJobConfig := &dlppb.InspectJobConfig{
		StorageConfig: storageConfig,
		InspectConfig: inspectConfig,
		Actions: []*dlppb.Action{
			action,
		},
	}

	// Construct the job creation request to be sent by the client.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: inspectJobConfig,
		},
	}

	// Send the request.
	resp, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		fmt.Fprintf(w, "error after resp: %v", err)
		return err
	}

	// Print the results.
	fmt.Fprint(w, "Job created successfully: ", resp.Name)
	return nil
}
```
Java
```java
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CloudStorageOptions;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FileType;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.ProjectDeidentifyTemplateName;
import com.google.privacy.dlp.v2.StorageConfig;
import com.google.privacy.dlp.v2.TransformationConfig;
import com.google.privacy.dlp.v2.TransformationDetailsStorageConfig;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DeidentifyCloudStorage {

  // Set the timeout duration in minutes.
  private static final int TIMEOUT_MINUTES = 15;

  public static void main(String[] args) throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // Specify the cloud storage directory that you want to inspect.
    String gcsPath = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";
    // Specify the big query dataset id to store the transformation details.
    String datasetId = "your-bigquery-dataset-id";
    // Specify the big query table id to store the transformation details.
    String tableId = "your-bigquery-table-id";
    // Specify the cloud storage directory to store the de-identified files.
    String outputDirectory = "your-output-directory";
    // Specify the de-identify template ID for unstructured files.
    String deidentifyTemplateId = "your-deidentify-template-id";
    // Specify the de-identify template ID for structured files.
    String structuredDeidentifyTemplateId = "your-structured-deidentify-template-id";
    // Specify the de-identify template ID for images.
    String imageRedactTemplateId = "your-image-redact-template-id";
    deidentifyCloudStorage(
        projectId,
        gcsPath,
        tableId,
        datasetId,
        outputDirectory,
        deidentifyTemplateId,
        structuredDeidentifyTemplateId,
        imageRedactTemplateId);
  }

  public static void deidentifyCloudStorage(
      String projectId,
      String gcsPath,
      String tableId,
      String datasetId,
      String outputDirectory,
      String deidentifyTemplateId,
      String structuredDeidentifyTemplateId,
      String imageRedactTemplateId)
      throws IOException, InterruptedException {

    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Set path in Cloud Storage.
      CloudStorageOptions cloudStorageOptions =
          CloudStorageOptions.newBuilder()
              .setFileSet(CloudStorageOptions.FileSet.newBuilder().setUrl(gcsPath))
              .build();

      // Set storage config indicating the type of cloud storage.
      StorageConfig storageConfig =
          StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName : new String[] {"PERSON_NAME", "EMAIL_ADDRESS"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();

      // Types of files to include for de-identification.
      List<FileType> fileTypesToTransform =
          Arrays.asList(
              FileType.valueOf("IMAGE"), FileType.valueOf("CSV"), FileType.valueOf("TEXT_FILE"));

      // Specify the big query table to store the transformation details.
      BigQueryTable table =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setTableId(tableId)
              .setDatasetId(datasetId)
              .build();

      TransformationDetailsStorageConfig transformationDetailsStorageConfig =
          TransformationDetailsStorageConfig.newBuilder().setTable(table).build();

      // Specify the de-identify templates used for the transformation.
      TransformationConfig transformationConfig =
          TransformationConfig.newBuilder()
              .setDeidentifyTemplate(
                  ProjectDeidentifyTemplateName.of(projectId, deidentifyTemplateId).toString())
              .setImageRedactTemplate(
                  ProjectDeidentifyTemplateName.of(projectId, imageRedactTemplateId).toString())
              .setStructuredDeidentifyTemplate(
                  ProjectDeidentifyTemplateName.of(projectId, structuredDeidentifyTemplateId)
                      .toString())
              .build();

      Action.Deidentify deidentify =
          Action.Deidentify.newBuilder()
              .setCloudStorageOutput(outputDirectory)
              .setTransformationConfig(transformationConfig)
              .setTransformationDetailsStorageConfig(transformationDetailsStorageConfig)
              .addAllFileTypesToTransform(fileTypesToTransform)
              .build();

      Action action = Action.newBuilder().setDeidentify(deidentify).build();

      // Configure the long-running job we want the service to perform.
      InspectJobConfig inspectJobConfig =
          InspectJobConfig.newBuilder()
              .setInspectConfig(inspectConfig)
              .setStorageConfig(storageConfig)
              .addActions(action)
              .build();

      // Construct the job creation request to be sent by the client.
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectJob(inspectJobConfig)
              .build();

      // Send the job creation request.
      DlpJob response = dlp.createDlpJob(createDlpJobRequest);

      // Get the current time.
      long startTime = System.currentTimeMillis();

      // Poll until the job is DONE or FAILED, or until the timeout expires.
      while (response.getState() != DlpJob.JobState.DONE
          && response.getState() != DlpJob.JobState.FAILED) {
        // Sleep for 30 seconds.
        Thread.sleep(30000);

        // Get the updated job status.
        response = dlp.getDlpJob(response.getName());

        // Check if the timeout duration has been exceeded.
        long elapsedTime = System.currentTimeMillis() - startTime;
        if (TimeUnit.MILLISECONDS.toMinutes(elapsedTime) >= TIMEOUT_MINUTES) {
          System.out.printf("Job did not complete within %d minutes.%n", TIMEOUT_MINUTES);
          break;
        }
      }

      // Print the results.
      System.out.println("Job status: " + response.getState());
      System.out.println("Job name: " + response.getName());
      InspectDataSourceDetails.Result result = response.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }
}
```
Node.js
```javascript
// Imports the Google Cloud client library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project';

// The Cloud Storage directory that needs to be inspected
// const inputDirectory = 'your-google-cloud-storage-path';

// The ID of the BigQuery dataset that stores the transformation details, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the BigQuery table that stores the transformation details, e.g. 'my_table'
// const tableId = 'my_table';

// The Cloud Storage directory that will be used to store the de-identified files
// const outputDirectory = 'your-output-directory';

// The full resource name of the default de-identify template
// const deidentifyTemplateId = 'your-deidentify-template-id';

// The full resource name of the de-identify template for structured files
// const structuredDeidentifyTemplateId = 'your-structured-deidentify-template-id';

// The full resource name of the image redaction template for images
// const imageRedactTemplateId = 'your-image-redact-template-id';

async function deidentifyCloudStorage() {
  // Specify storage configuration that uses file set.
  const storageConfig = {
    cloudStorageOptions: {
      fileSet: {
        url: inputDirectory,
      },
    },
  };

  // Specify the type of info the inspection will look for.
  const infoTypes = [{name: 'PERSON_NAME'}, {name: 'EMAIL_ADDRESS'}];

  // Construct inspect configuration
  const inspectConfig = {
    infoTypes: infoTypes,
    includeQuote: true,
  };

  // Types of files to include for de-identification (FileType enum values).
  const fileTypesToTransform = ['IMAGE', 'CSV', 'TEXT_FILE'];

  // Specify the big query table to store the transformation details.
  const transformationDetailsStorageConfig = {
    table: {
      projectId: projectId,
      tableId: tableId,
      datasetId: datasetId,
    },
  };

  // Specify the de-identify templates used for the transformation.
  const transformationConfig = {
    deidentifyTemplate: deidentifyTemplateId,
    structuredDeidentifyTemplate: structuredDeidentifyTemplateId,
    imageRedactTemplate: imageRedactTemplateId,
  };

  // Construct action to de-identify sensitive data.
  const action = {
    deidentify: {
      cloudStorageOutput: outputDirectory,
      transformationConfig: transformationConfig,
      transformationDetailsStorageConfig: transformationDetailsStorageConfig,
      fileTypesToTransform: fileTypesToTransform,
    },
  };

  // Construct the inspect job configuration.
  const inspectJobConfig = {
    inspectConfig: inspectConfig,
    storageConfig: storageConfig,
    actions: [action],
  };

  // Construct the job creation request to be sent by the client.
  const createDlpJobRequest = {
    parent: `projects/${projectId}/locations/global`,
    inspectJob: inspectJobConfig,
  };

  // Send the job creation request and process the response.
  const [response] = await dlp.createDlpJob(createDlpJobRequest);
  const jobName = response.name;

  // Wait a maximum of 15 minutes (30 checks, 30 seconds apart) for the job to complete.
  let job;
  let numOfAttempts = 30;
  while (numOfAttempts > 0) {
    // Fetch DLP Job status
    [job] = await dlp.getDlpJob({name: jobName});

    // Check if the job has completed.
    if (job.state === 'DONE') {
      break;
    }
    if (job.state === 'FAILED') {
      console.log('Job failed. Please check the configuration.');
      return;
    }
    // Sleep for a short duration before checking the job status again.
    await new Promise(resolve => {
      setTimeout(() => resolve(), 30000);
    });
    numOfAttempts -= 1;
  }
  if (job.state !== 'DONE') {
    console.log('Job did not complete within 15 minutes.');
    return;
  }

  // Print out the results.
  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`
      );
    });
  } else {
    console.log('No findings.');
  }
}

deidentifyCloudStorage().catch(console.error);
```
PHP
```php
<?php

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\Deidentify;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageOptions;
use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\FileType;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\TransformationConfig;
use Google\Cloud\Dlp\V2\TransformationDetailsStorageConfig;

/**
 * De-identify sensitive data stored in Cloud Storage using the API.
 * Create an inspection job that has a de-identification action.
 *
 * @param string $callingProjectId The project ID to run the API call under.
 * @param string $inputgcsPath The Cloud Storage directory that you want to de-identify.
 * @param string $outgcsPath The Cloud Storage directory where you want to store the
 *     de-identified files.
 * @param string $deidentifyTemplateName The ID of the default de-identify template, used for
 *     unstructured and structured files, if you created one. The sample expands it into a
 *     full resource name of the form `projects/PROJECT_ID/deidentifyTemplates/TEMPLATE_ID`.
 * @param string $structuredDeidentifyTemplateName The ID of the de-identify template for
 *     structured files, if you created one. It is expanded the same way.
 * @param string $imageRedactTemplateName The ID of the image redaction template for images,
 *     if you created one. It is expanded the same way.
 * @param string $datasetId The ID of the BigQuery dataset where you want to store
 *     the transformation details.
 * @param string $tableId The ID of the BigQuery table where you want to store the
 *     transformation details. If you don't provide a table ID, the system
 *     automatically creates one.
 */
function deidentify_cloud_storage(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $inputgcsPath = 'gs://YOUR_GOOGLE_STORAGE_BUCKET',
    string $outgcsPath = 'gs://YOUR_GOOGLE_STORAGE_BUCKET',
    string $deidentifyTemplateName = 'YOUR_DEIDENTIFY_TEMPLATE_NAME',
    string $structuredDeidentifyTemplateName = 'YOUR_STRUCTURED_DEIDENTIFY_TEMPLATE_NAME',
    string $imageRedactTemplateName = 'YOUR_IMAGE_REDACT_DEIDENTIFY_TEMPLATE_NAME',
    string $datasetId = 'YOUR_DATASET_ID',
    string $tableId = 'YOUR_TABLE_ID'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$callingProjectId/locations/global";

    // Specify the Cloud Storage path to de-identify.
    $cloudStorageOptions = (new CloudStorageOptions())
        ->setFileSet((new FileSet())
            ->setUrl($inputgcsPath));
    $storageConfig = (new StorageConfig())
        ->setCloudStorageOptions($cloudStorageOptions);

    // Specify the type of info the inspection will look for.
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes([
            (new InfoType())->setName('PERSON_NAME'),
            (new InfoType())->setName('EMAIL_ADDRESS')
        ]);

    // Specify the BigQuery table to store the transformation details.
    $transformationDetailsStorageConfig = (new TransformationDetailsStorageConfig())
        ->setTable((new BigQueryTable())
            ->setProjectId($callingProjectId)
            ->setDatasetId($datasetId)
            ->setTableId($tableId));

    // Specify the de-identify templates used for the transformation.
    $transformationConfig = (new TransformationConfig())
        ->setDeidentifyTemplate(
            DlpServiceClient::projectDeidentifyTemplateName($callingProjectId, $deidentifyTemplateName)
        )
        ->setStructuredDeidentifyTemplate(
            DlpServiceClient::projectDeidentifyTemplateName($callingProjectId, $structuredDeidentifyTemplateName)
        )
        ->setImageRedactTemplate(
            DlpServiceClient::projectDeidentifyTemplateName($callingProjectId, $imageRedactTemplateName)
        );

    $deidentify = (new Deidentify())
        ->setCloudStorageOutput($outgcsPath)
        ->setTransformationConfig($transformationConfig)
        ->setTransformationDetailsStorageConfig($transformationDetailsStorageConfig)
        ->setFileTypesToTransform([FileType::TEXT_FILE, FileType::IMAGE, FileType::CSV]);

    $action = (new Action())
        ->setDeidentify($deidentify);

    // Configure the inspection job we want the service to perform.
    $inspectJobConfig = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Send the job creation request and process the response.
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setInspectJob($inspectJobConfig);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll every 30 seconds, up to 10 times, until the job is done or has failed.
    $numOfAttempts = 10;
    do {
        printf('Waiting for job to complete' . PHP_EOL);
        sleep(30);
        $getDlpJobRequest = (new GetDlpJobRequest())
            ->setName($job->getName());
        $job = $dlp->getDlpJob($getDlpJobRequest);
        if ($job->getState() == JobState::DONE || $job->getState() == JobState::FAILED) {
            break;
        }
        $numOfAttempts--;
    } while ($numOfAttempts > 0);

    // Print finding counts.
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                printf('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            printf('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.' . PHP_EOL);
    }
}
```
Python
import time
from typing import List

import google.cloud.dlp


def deidentify_cloud_storage(
    project: str,
    input_gcs_bucket: str,
    output_gcs_bucket: str,
    info_types: List[str],
    deid_template_id: str,
    structured_deid_template_id: str,
    image_redact_template_id: str,
    dataset_id: str,
    table_id: str,
    timeout: int = 300,
) -> None:
    """Uses the Data Loss Prevention API to de-identify files in a Google Cloud
    Storage directory.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_gcs_bucket: The name of the Cloud Storage bucket to inspect.
        output_gcs_bucket: The name of the Cloud Storage bucket where
            de-identified files are stored.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        deid_template_id: The name of the de-identify template for
            unstructured and structured files.
        structured_deid_template_id: The name of the de-identify template
            for structured files.
        image_redact_template_id: The name of the image redaction template
            for images.
        dataset_id: The identifier of the BigQuery dataset where
            transformation details are stored.
        table_id: The identifier of the BigQuery table where
            transformation details are stored.
        timeout: The number of seconds to wait for a response from the API.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Specify the types of info the inspection looks for.
    # See https://cloud.google.com/dlp/docs/infotypes-reference for the complete list.
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct cloud_storage_options with the input bucket's URL.
    storage_config = {
        "cloud_storage_options": {"file_set": {"url": f"gs://{input_gcs_bucket}"}}
    }

    # Specify the BigQuery table that stores the transformation details.
    big_query_table = {
        "project_id": project,
        "dataset_id": dataset_id,
        "table_id": table_id,
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Reference the de-identify templates used for the transformation.
    transformation_config = {
        "deidentify_template": f"{parent}/deidentifyTemplates/{deid_template_id}",
        "structured_deidentify_template": f"{parent}/deidentifyTemplates/{structured_deid_template_id}",
        "image_redact_template": f"{parent}/deidentifyTemplates/{image_redact_template_id}",
    }

    # De-identify matches, write the copies to the output bucket, and record
    # transformation details in BigQuery.
    actions = [
        {
            "deidentify": {
                "cloud_storage_output": f"gs://{output_gcs_bucket}",
                "transformation_config": transformation_config,
                "transformation_details_storage_config": {"table": big_query_table},
                "file_types_to_transform": ["IMAGE", "CSV", "TEXT_FILE"],
            }
        }
    ]

    # Construct the job definition.
    inspect_job = {
        "inspect_config": inspect_config,
        "storage_config": storage_config,
        "actions": actions,
    }

    # Call the API.
    response = dlp.create_dlp_job(
        request={
            "parent": parent,
            "inspect_job": inspect_job,
        }
    )
    job_name = response.name
    print(f"Inspection Job started : {job_name}")

    # Wait for the job to complete.
    job = dlp.get_dlp_job(request={"name": job_name})
    # The sleep time is 30 seconds, so the number of checks is timeout // 30.
    no_of_attempts = timeout // 30
    while no_of_attempts != 0:
        # Check whether the job has finished.
        if job.state == google.cloud.dlp_v2.DlpJob.JobState.DONE:
            break
        if job.state == google.cloud.dlp_v2.DlpJob.JobState.FAILED:
            print("Job Failed, Please check the configuration.")
            return
        # Sleep for a short duration before checking the job status again.
        time.sleep(30)
        no_of_attempts -= 1
        # Get the DLP job status.
        job = dlp.get_dlp_job(request={"name": job_name})

    if job.state != google.cloud.dlp_v2.DlpJob.JobState.DONE:
        print(f"Job did not complete within {timeout} seconds.")
        return

    # Print out the results.
    print(f"Job name: {job.name}")
    result = job.inspect_details.result
    print(f"Processed Bytes: {result.processed_bytes}")
    if result.info_type_stats:
        for stats in result.info_type_stats:
            print(f"Info type: {stats.info_type.name}")
            print(f"Count: {stats.count}")
    else:
        print("No findings.")
REST
JSON input
{
  "inspect_job": {
    "storage_config": {
      "cloud_storage_options": {
        "file_set": {
          "url": "INPUT_DIRECTORY"
        }
      }
    },
    "inspect_config": {
      "info_types": [
        {
          "name": "PERSON_NAME"
        }
      ]
    },
    "actions": [
      {
        "deidentify": {
          "cloud_storage_output": "OUTPUT_DIRECTORY",
          "transformation_config": {
            "deidentify_template": "DEIDENTIFY_TEMPLATE_NAME",
            "structured_deidentify_template": "STRUCTURED_DEIDENTIFY_TEMPLATE_NAME",
            "image_redact_template": "IMAGE_REDACTION_TEMPLATE_NAME"
          },
          "transformation_details_storage_config": {
            "table": {
              "project_id": "TRANSFORMATION_DETAILS_PROJECT_ID",
              "dataset_id": "TRANSFORMATION_DETAILS_DATASET_ID",
              "table_id": "TRANSFORMATION_DETAILS_TABLE_ID"
            }
          },
          "fileTypesToTransform": ["IMAGE", "CSV", "TEXT_FILE"]
        }
      }
    ]
  }
}
Replace the following:
- PROJECT_ID: the ID of the project where you want to store the inspection job.
- INPUT_DIRECTORY: the Cloud Storage directory that you want to inspect, for example, gs://input-bucket/folder1/folder1a. If the URL ends in a trailing slash, any subdirectories inside INPUT_DIRECTORY aren't scanned.
- OUTPUT_DIRECTORY: the Cloud Storage directory where you want to store the de-identified files. This directory must not be in the same Cloud Storage bucket as INPUT_DIRECTORY.
- DEIDENTIFY_TEMPLATE_NAME: the full resource name of the default de-identify template for unstructured and structured files, if you created one. This value must be in the format projects/projectName/(locations/locationId)/deidentifyTemplates/templateName.
- STRUCTURED_DEIDENTIFY_TEMPLATE_NAME: the full resource name of the de-identify template for structured files, if you created one. This value must be in the format projects/projectName/(locations/locationId)/deidentifyTemplates/templateName.
- IMAGE_REDACTION_TEMPLATE_NAME: the full resource name of the image redaction template for images, if you created one. This value must be in the format projects/projectName/(locations/locationId)/deidentifyTemplates/templateName.
- TRANSFORMATION_DETAILS_PROJECT_ID: the ID of the project where you want to store the transformation details.
- TRANSFORMATION_DETAILS_DATASET_ID: the ID of the BigQuery dataset where you want to store the transformation details.
- TRANSFORMATION_DETAILS_TABLE_ID: the ID of the BigQuery table where you want to store the transformation details. If you don't provide a table ID, the system automatically creates one.
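Two of the constraints in the placeholder list can be checked locally before you submit a job: OUTPUT_DIRECTORY must be in a different bucket than INPUT_DIRECTORY, and each template name must follow the documented resource-name format. The following Python sketch checks both. The helper names bucket_of and check_job_inputs are hypothetical, and treating the locations/locationId segment as optional is an assumption based on the parentheses in the documented format.

```python
import re
from typing import Optional

# Documented format: projects/projectName/(locations/locationId)/deidentifyTemplates/templateName
# Assumption: the locations/locationId segment is optional.
_TEMPLATE_NAME = re.compile(
    r"^projects/[^/]+/(locations/[^/]+/)?deidentifyTemplates/[^/]+$"
)


def bucket_of(gcs_url: str) -> str:
    """Returns the bucket name from a gs://bucket/path URL (hypothetical helper)."""
    if not gcs_url.startswith("gs://"):
        raise ValueError(f"not a Cloud Storage URL: {gcs_url!r}")
    bucket = gcs_url[len("gs://"):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"missing bucket name: {gcs_url!r}")
    return bucket


def check_job_inputs(
    input_directory: str, output_directory: str, *template_names: Optional[str]
) -> None:
    """Raises ValueError if the directories or template names look invalid."""
    if bucket_of(input_directory) == bucket_of(output_directory):
        raise ValueError(
            "OUTPUT_DIRECTORY must not be in the same bucket as INPUT_DIRECTORY"
        )
    for name in template_names:
        # Templates are optional, so None is skipped.
        if name is not None and not _TEMPLATE_NAME.match(name):
            raise ValueError(f"unexpected template name format: {name!r}")
```

For example, check_job_inputs("gs://input-bucket/folder1/folder1a", "gs://input-bucket/out") raises ValueError because both directories are in input-bucket.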
Note the following objects:
- inspectJob: The configuration object for the job (DlpJob). This object contains the configuration for both the inspection and de-identification stages.
  - storageConfig: The location of the content to inspect (StorageConfig). This example specifies a Cloud Storage bucket (CloudStorageOptions).
  - inspectConfig: Information about the sensitive data you want to inspect for (InspectConfig). This example inspects for content matching the built-in infoType PERSON_NAME.
  - actions: The actions to take after the inspection portion of the job is complete (Action).
    - deidentify: Specifying this action tells Sensitive Data Protection to de-identify the matched sensitive data according to the configuration specified inside (Deidentify).
      - cloud_storage_output: Specifies the URL of the Cloud Storage directory where Sensitive Data Protection writes the de-identified files.
      - transformation_config: Specifies how Sensitive Data Protection must de-identify sensitive data in structured files, unstructured files, and images (TransformationConfig). If you don't include a TransformationConfig object, Sensitive Data Protection replaces sensitive data in text with its infoType. On images, it covers sensitive data with a black box.
      - transformation_details_storage_config: Specifies that Sensitive Data Protection must store metadata about each transformation that it performs for this job. It also specifies the location and name of the table where Sensitive Data Protection must store that metadata (TransformationDetailsStorageConfig).
      - fileTypesToTransform: Limits the de-identification operation to only the file types that you list. If you don't set this field, all supported file types included in the inspection operation are also included in the de-identification operation. In this example, Sensitive Data Protection de-identifies only image, CSV, and text files, even if you configured the DlpJob to inspect all supported file types.
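The nesting described above maps directly onto the request body. The following Python sketch assembles that body as a dictionary, using the same snake_case field names as the Python sample. The helper name build_deidentify_job_request is hypothetical. It leaves out transformation_config and file_types_to_transform when they aren't supplied, so the defaults described above apply.

```python
from typing import Dict, List, Optional


def build_deidentify_job_request(
    input_directory: str,
    output_directory: str,
    info_types: List[str],
    table: Dict[str, str],
    templates: Optional[Dict[str, str]] = None,
    file_types: Optional[List[str]] = None,
) -> dict:
    """Builds a projects.dlpJobs.create request body (hypothetical helper)."""
    deidentify = {
        # Where the de-identified copies are written.
        "cloud_storage_output": output_directory,
        # BigQuery table (project_id, dataset_id, table_id) for transformation metadata.
        "transformation_details_storage_config": {"table": table},
    }
    # Without a transformation config, matches in text are replaced with their
    # infoType and images get black boxes, so only set it when templates are given.
    if templates:
        deidentify["transformation_config"] = templates
    # Without this field, every inspected file type is also de-identified.
    if file_types:
        deidentify["file_types_to_transform"] = file_types
    return {
        "inspect_job": {
            "storage_config": {
                "cloud_storage_options": {"file_set": {"url": input_directory}}
            },
            "inspect_config": {"info_types": [{"name": t} for t in info_types]},
            "actions": [{"deidentify": deidentify}],
        }
    }
```

Serializing the result with json.dumps(body, indent=2) produces a file that can be passed to the cURL command in the next section.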
Create an inspection job through the REST API
To create the inspection job (DlpJob), send a projects.dlpJobs.create request. To send the request using cURL, save the previous REST example as a JSON file and run the following command:
curl -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs \
  -d @PATH_TO_JSON_FILE
Replace the following:
- PROJECT_ID: the ID of the project where you want to store the DlpJob.
- PATH_TO_JSON_FILE: the path to the JSON file that contains the request body.
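If you prefer not to use cURL, the same POST request can be assembled with Python's standard library. This sketch only builds the urllib.request.Request object; it does not send anything. The helper names access_token and create_job_request are hypothetical, and calling gcloud for the token simply mirrors the cURL command.

```python
import json
import subprocess
import urllib.request


def access_token() -> str:
    """Returns the output of `gcloud auth print-access-token` (requires gcloud)."""
    return subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()


def create_job_request(project_id: str, body: dict, token: str) -> urllib.request.Request:
    """Builds the same POST request that the cURL command sends."""
    return urllib.request.Request(
        url=f"https://dlp.googleapis.com/v2/projects/{project_id}/dlpJobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
            "X-Goog-User-Project": project_id,
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and parse the response with json.load; the name field of the parsed response holds the new job's resource name.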
Sensitive Data Protection returns the identifier of the newly created DlpJob, its status, and a snapshot of the inspection configuration that you set.
{
  "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
  "type": "INSPECT_JOB",
  "state": "PENDING",
  ...
}
Retrieve the results of the inspection job
To retrieve the results of the DlpJob, send a projects.dlpJobs.get request:
curl -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID
Replace the following:
- PROJECT_ID: the ID of the project where you stored the DlpJob.
- JOB_ID: the ID of the job that was returned when you created the DlpJob.
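The JOB_ID is the last segment of the name field returned by the create call, and the create response shows the job in the PENDING state, so callers typically poll projects.dlpJobs.get until the state changes, much like the Python sample does with its 30-second sleep. The following is a minimal sketch, assuming a hypothetical fetch_job callable that performs the GET request and returns the parsed JSON as a dict. The helper names job_id_from_name and wait_for_job are also hypothetical; the sketch treats DONE, FAILED, and CANCELED as final states.

```python
import time
from typing import Callable, Dict

# Job states after which polling stops.
_FINAL_STATES = ("DONE", "FAILED", "CANCELED")


def job_id_from_name(job_name: str) -> str:
    """Extracts JOB_ID from a name such as projects/PROJECT_ID/dlpJobs/JOB_ID."""
    prefix, sep, job_id = job_name.rpartition("/dlpJobs/")
    if not sep or not prefix.startswith("projects/") or not job_id or "/" in job_id:
        raise ValueError(f"unexpected job name: {job_name!r}")
    return job_id


def wait_for_job(
    fetch_job: Callable[[str], Dict],
    job_name: str,
    timeout: float = 300,
    interval: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Polls fetch_job until the job reaches a final state.

    Makes at most timeout // interval checks (at least one) and raises
    TimeoutError if the job is still pending or running after the last one.
    """
    attempts = max(1, int(timeout // interval))
    job: Dict = {}
    for attempt in range(attempts):
        if attempt:
            # Wait between checks, but not before the first one.
            sleep(interval)
        job = fetch_job(job_name)
        if job.get("state") in _FINAL_STATES:
            return job
    raise TimeoutError(f"{job_name} is still {job.get('state')} after {timeout} seconds")
```

The sleep parameter exists so the polling loop can be exercised without real delays; in normal use, the default time.sleep is used.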
If the operation is complete, you get a response similar to the following:
{
  "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
  "type": "INSPECT_JOB",
  "state": "DONE",
  "inspectDetails": {
    "requestedOptions": {
      "snapshotInspectTemplate": {},
      "jobConfig": {
        "storageConfig": {
          "cloudStorageOptions": {
            "fileSet": {
              "url": "INPUT_DIRECTORY"
            }
          }
        },
        "inspectConfig": {
          "infoTypes": [
            {
              "name": "PERSON_NAME"
            }
          ],
          "limits": {}
        },
        "actions": [
          {
            "deidentify": {
              "transformationDetailsStorageConfig": {
                "table": {
                  "projectId": "TRANSFORMATION_DETAILS_PROJECT_ID",
                  "datasetId": "TRANSFORMATION_DETAILS_DATASET_ID",
                  "tableId": "TRANSFORMATION_DETAILS_TABLE_ID"
                }
              },
              "transformationConfig": {
                "deidentifyTemplate": "DEIDENTIFY_TEMPLATE_NAME",
                "structuredDeidentifyTemplate": "STRUCTURED_DEIDENTIFY_TEMPLATE_NAME",
                "imageRedactTemplate": "IMAGE_REDACTION_TEMPLATE_NAME"
              },
              "fileTypesToTransform": ["IMAGE", "CSV", "TEXT_FILE"],
              "cloudStorageOutput": "OUTPUT_DIRECTORY"
            }
          }
        ]
      }
    },
    "result": {
      "processedBytes": "25242",
      "totalEstimatedBytes": "25242",
      "infoTypeStats": [
        {
          "infoType": {
            "name": "PERSON_NAME"
          },
          "count": "114"
        }
      ]
    }
  },
  "createTime": "2022-06-09T23:00:53.380Z",
  "startTime": "2022-06-09T23:01:27.986383Z",
  "endTime": "2022-06-09T23:02:00.443536Z",
  "actionDetails": [
    {
      "deidentifyDetails": {
        "requestedOptions": {
          "snapshotDeidentifyTemplate": {
            "name": "DEIDENTIFY_TEMPLATE_NAME",
            "createTime": "2022-06-09T17:46:34.208923Z",
            "updateTime": "2022-06-09T17:46:34.208923Z",
            "deidentifyConfig": {
              "infoTypeTransformations": {
                "transformations": [
                  {
                    "primitiveTransformation": {
                      "characterMaskConfig": {
                        "maskingCharacter": "*",
                        "numberToMask": 25
                      }
                    }
                  }
                ]
              }
            },
            "locationId": "global"
          },
          "snapshotStructuredDeidentifyTemplate": {
            "name": "STRUCTURED_DEIDENTIFY_TEMPLATE_NAME",
            "createTime": "2022-06-09T20:51:12.411456Z",
            "updateTime": "2022-06-09T21:07:53.633149Z",
            "deidentifyConfig": {
              "recordTransformations": {
                "fieldTransformations": [
                  {
                    "fields": [
                      {
                        "name": "Name"
                      }
                    ],
                    "primitiveTransformation": {
                      "replaceConfig": {
                        "newValue": {
                          "stringValue": "[redacted]"
                        }
                      }
                    }
                  }
                ]
              }
            },
            "locationId": "global"
          },
          "snapshotImageRedactTemplate": {
            "name": "IMAGE_REDACTION_TEMPLATE_NAME",
            "createTime": "2022-06-09T20:52:25.453564Z",
            "updateTime": "2022-06-09T20:52:25.453564Z",
            "deidentifyConfig": {},
            "locationId": "global"
          }
        },
        "deidentifyStats": {
          "transformedBytes": "3972",
          "transformationCount": "110"
        }
      }
    }
  ],
  "locationId": "global"
}
What's next
- Learn more about the process of de-identifying data in storage.
- Learn how to de-identify data in storage using the Google Cloud console.
- Work through the Creating a De-identified Copy of Data in Cloud Storage codelab.
- Learn more about de-identification transformations.
- Learn how to inspect storage for sensitive data.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.