Computing k-map for a dataset

k-map is very similar to k-anonymity, except that it assumes that the attacker most likely doesn't know who is in the dataset. Use k-map if your dataset is relatively small, or if the level of effort involved in generalizing attributes would be too high.

Just like k-anonymity, k-map requires you to determine which columns of your database are quasi-identifiers. In doing this, you are stating what data an attacker will most likely use to re-identify subjects. In addition, computing a k-map value requires a re-identification dataset: a larger table with which to compare rows in the original dataset.

This topic demonstrates how to compute k-map values for a dataset using Sensitive Data Protection. For more information about k-map or risk analysis in general, see the risk analysis concept topic before continuing.

Note: At this time, you can only compute k-map values using the DLP API or Sensitive Data Protection-supported client libraries. Sensitive Data Protection in the Google Cloud console doesn't support computing k-map values.

Note: Canceling an operation partway through a job still incurs costs for the portion of the job that was completed. For more information about billing, see Sensitive Data Protection pricing.

Before you begin

Before continuing, be sure you've done the following:

  1. Sign in to your Google Account.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.
  4. Enable Sensitive Data Protection.

  5. Select a BigQuery dataset to analyze. Sensitive Data Protection estimates the k-map metric by scanning a BigQuery table.
  6. Determine the types of datasets you want to use to model the attack dataset. For more information, see the reference page for the KMapEstimationConfig object, as well as Risk analysis terms and techniques.

Compute k-map estimates

You can estimate k-map values using Sensitive Data Protection, which uses a statistical model to estimate a re-identification dataset. This is in contrast to the other risk analysis methods, in which the attack dataset is explicitly known. Depending on the type of data, Sensitive Data Protection uses publicly available datasets (for example, from the US Census), a custom statistical model (for example, one or more BigQuery tables that you specify), or it extrapolates from the distribution of values in your input dataset. For more information, see the reference page for the KMapEstimationConfig object.

To compute a k-map estimate using Sensitive Data Protection, first configure the risk job. Compose a request to the projects.dlpJobs resource, where PROJECT_ID indicates your project identifier:

https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs

The request contains a RiskAnalysisJobConfig object, which is composed of the following:

  • A privacyMetric object containing a kMapEstimationConfig, which lists the quasi-identifier columns (quasiIds) to analyze and, optionally, a regionCode for the statistical model.
  • A sourceTable object identifying the BigQuery table to scan.
  • One or more actions to run when the job completes, such as publishing a notification to a Pub/Sub topic.
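For example, a request body along the following lines configures a k-map risk job. This is a minimal sketch assembled from the fields used in the code samples later in this topic; the column names, table identifiers, and Pub/Sub topic are placeholders, and it shows one quasi-identifier tagged with a built-in infoType and one left for Sensitive Data Protection to infer:

{
  "riskJob": {
    "privacyMetric": {
      "kMapEstimationConfig": {
        "quasiIds": [
          { "field": { "name": "age" }, "infoType": { "name": "AGE" } },
          { "field": { "name": "zip_code" }, "inferred": {} }
        ],
        "regionCode": "US"
      }
    },
    "sourceTable": {
      "projectId": "DATA_PROJECT_ID",
      "datasetId": "DATASET_ID",
      "tableId": "TABLE_ID"
    },
    "actions": [
      { "pubSub": { "topic": "projects/PROJECT_ID/topics/TOPIC_ID" } }
    ]
  }
}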

Code examples

Following is sample code in several languages that demonstrates how to use Sensitive Data Protection to compute a k-map value.

Important: The code on this page requires that you first set up a Sensitive Data Protection client. For more information about installing and creating a Sensitive Data Protection client, see Sensitive Data Protection client libraries. (Sending JSON to Sensitive Data Protection REST endpoints does not require a client library.)

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import("context""fmt""io""strings""time"dlp"cloud.google.com/go/dlp/apiv2""cloud.google.com/go/dlp/apiv2/dlppb""cloud.google.com/go/pubsub""github.com/golang/protobuf/ptypes/empty")// riskKMap runs K Map on the given data.funcriskKMap(wio.Writer,projectID,dataProject,pubSubTopic,pubSubSub,datasetID,tableID,regionstring,columnNames...string)error{// projectID := "my-project-id"// dataProject := "bigquery-public-data"// pubSubTopic := "dlp-risk-sample-topic"// pubSubSub := "dlp-risk-sample-sub"// datasetID := "san_francisco"// tableID := "bikeshare_trips"// region := "US"// columnNames := "zip_code"ctx:=context.Background()client,err:=dlp.NewClient(ctx)iferr!=nil{returnfmt.Errorf("dlp.NewClient: %w",err)}// Create a PubSub Client used to listen for when the inspect job finishes.pubsubClient,err:=pubsub.NewClient(ctx,projectID)iferr!=nil{returnerr}deferpubsubClient.Close()// Create a PubSub subscription we can use to listen for messages.// Create the Topic if it doesn't exist.t:=pubsubClient.Topic(pubSubTopic)topicExists,err:=t.Exists(ctx)iferr!=nil{returnerr}if!topicExists{ift,err=pubsubClient.CreateTopic(ctx,pubSubTopic);err!=nil{returnerr}}// Create the Subscription if it doesn't exist.s:=pubsubClient.Subscription(pubSubSub)subExists,err:=s.Exists(ctx)iferr!=nil{returnerr}if!subExists{ifs,err=pubsubClient.CreateSubscription(ctx,pubSubSub,pubsub.SubscriptionConfig{Topic:t});err!=nil{returnerr}}// topic is the PubSub topic string where messages should be sent.topic:="projects/"+projectID+"/topics/"+pubSubTopic// Build the QuasiID slice.varq[]*dlppb.PrivacyMetric_KMapEstimationConfig_TaggedFieldfor_,c:=rangecolumnNames{q=append(q,&dlppb.PrivacyMetric_KMapEstimationConfig_TaggedField{Field:&dlppb.FieldId{Name:c,},Tag:&dlppb.PrivacyMetric_KMapEstimationConfig_TaggedField_Inferred{Inferred:&empty.Empty{},},})}// Create a configured request.req:=&dlppb.CreateDlpJobRequest{Parent:fmt.Sprintf("projects/%s/locations/global",projectID),Job:&dlppb.CreateDlpJobRequest_RiskJob{RiskJob:&dlppb.RiskAnalysisJobConfig{// PrivacyMetric configures what to compute.PrivacyMetric:&dlppb.PrivacyMetric{Type:&dlppb.PrivacyMetric_KMapEstimationConfig_{KMapEstimationConfig:&dlppb.PrivacyMetric_KMapEstimationConfig{QuasiIds:q,RegionCode:region,},},},// SourceTable describes where to find the data.SourceTable:&dlppb.BigQueryTable{ProjectId:dataProject,DatasetId:datasetID,TableId:tableID,},// Send a message to PubSub using Actions.Actions:[]*dlppb.Action{{Action:&dlppb.Action_PubSub{PubSub:&dlppb.Action_PublishToPubSub{Topic:topic,},},},},},},}// Create the risk job.j,err:=client.CreateDlpJob(ctx,req)iferr!=nil{returnfmt.Errorf("CreateDlpJob: %w",err)}fmt.Fprintf(w,"Created job: %v\n",j.GetName())// Wait for the risk job to finish by waiting for a PubSub message.// This only waits for 10 minutes. 
For long jobs, consider using a truly// asynchronous execution model such as Cloud Functions.ctx,cancel:=context.WithTimeout(ctx,10*time.Minute)defercancel()err=s.Receive(ctx,func(ctxcontext.Context,msg*pubsub.Message){// If this is the wrong job, do not process the result.ifmsg.Attributes["DlpJobName"]!=j.GetName(){msg.Nack()return}msg.Ack()time.Sleep(500*time.Millisecond)j,err:=client.GetDlpJob(ctx,&dlppb.GetDlpJobRequest{Name:j.GetName(),})iferr!=nil{fmt.Fprintf(w,"GetDlpJob: %v",err)return}h:=j.GetRiskDetails().GetKMapEstimationResult().GetKMapEstimationHistogram()fori,b:=rangeh{fmt.Fprintf(w,"Histogram bucket %v\n",i)fmt.Fprintf(w,"  Anonymity range: [%v,%v]\n",b.GetMaxAnonymity(),b.GetMaxAnonymity())fmt.Fprintf(w,"  %v unique values total\n",b.GetBucketSize())for_,v:=rangeb.GetBucketValues(){varqvs[]stringfor_,qv:=rangev.GetQuasiIdsValues(){qvs=append(qvs,qv.String())}fmt.Fprintf(w,"    QuasiID values: %s\n",strings.Join(qvs,", "))fmt.Fprintf(w,"    Estimated anonymity: %v\n",v.GetEstimatedAnonymity())}}// Stop listening for more messages.cancel()})iferr!=nil{returnfmt.Errorf("Recieve: %w",err)}returnnil}

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.core.SettableApiFuture;
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.Action.PublishToPubSub;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KMapEstimationResult;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KMapEstimationResult.KMapEstimationHistogramBucket;
import com.google.privacy.dlp.v2.AnalyzeDataSourceRiskDetails.KMapEstimationResult.KMapEstimationQuasiIdValues;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrivacyMetric;
import com.google.privacy.dlp.v2.PrivacyMetric.KMapEstimationConfig;
import com.google.privacy.dlp.v2.PrivacyMetric.KMapEstimationConfig.TaggedField;
import com.google.privacy.dlp.v2.RiskAnalysisJobConfig;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.ProjectTopicName;
import com.google.pubsub.v1.PubsubMessage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

@SuppressWarnings("checkstyle:AbbreviationAsWordInName")
class RiskAnalysisKMap {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String datasetId = "your-bigquery-dataset-id";
    String tableId = "your-bigquery-table-id";
    String topicId = "pub-sub-topic";
    String subscriptionId = "pub-sub-subscription";
    calculateKMap(projectId, datasetId, tableId, topicId, subscriptionId);
  }

  public static void calculateKMap(
      String projectId, String datasetId, String tableId, String topicId, String subscriptionId)
      throws ExecutionException, InterruptedException, IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
      // Specify the BigQuery table to analyze
      BigQueryTable bigQueryTable =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setDatasetId(datasetId)
              .setTableId(tableId)
              .build();

      // These values represent the column names of quasi-identifiers to analyze
      List<String> quasiIds = Arrays.asList("Age", "Gender");

      // These values represent the info types corresponding to the quasi-identifiers above
      List<String> infoTypeNames = Arrays.asList("AGE", "GENDER");

      // Tag each of the quasiId column names with its corresponding infoType
      List<InfoType> infoTypes =
          infoTypeNames.stream()
              .map(it -> InfoType.newBuilder().setName(it).build())
              .collect(Collectors.toList());

      if (quasiIds.size() != infoTypes.size()) {
        throw new IllegalArgumentException("The numbers of quasi-IDs and infoTypes must be equal!");
      }

      List<TaggedField> taggedFields = new ArrayList<TaggedField>();
      for (int i = 0; i < quasiIds.size(); i++) {
        TaggedField taggedField =
            TaggedField.newBuilder()
                .setField(FieldId.newBuilder().setName(quasiIds.get(i)).build())
                .setInfoType(infoTypes.get(i))
                .build();
        taggedFields.add(taggedField);
      }

      // The k-map distribution region can be specified by any ISO-3166-1 region code.
      String regionCode = "US";

      // Configure the privacy metric for the job
      KMapEstimationConfig kmapConfig =
          KMapEstimationConfig.newBuilder()
              .addAllQuasiIds(taggedFields)
              .setRegionCode(regionCode)
              .build();
      PrivacyMetric privacyMetric =
          PrivacyMetric.newBuilder().setKMapEstimationConfig(kmapConfig).build();

      // Create action to publish job status notifications over Google Cloud Pub/Sub
      ProjectTopicName topicName = ProjectTopicName.of(projectId, topicId);
      PublishToPubSub publishToPubSub =
          PublishToPubSub.newBuilder().setTopic(topicName.toString()).build();
      Action action = Action.newBuilder().setPubSub(publishToPubSub).build();

      // Configure the risk analysis job to perform
      RiskAnalysisJobConfig riskAnalysisJobConfig =
          RiskAnalysisJobConfig.newBuilder()
              .setSourceTable(bigQueryTable)
              .setPrivacyMetric(privacyMetric)
              .addActions(action)
              .build();

      // Build the request to be sent by the client
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setRiskJob(riskAnalysisJobConfig)
              .build();

      // Send the request to the API using the client
      DlpJob dlpJob = dlpServiceClient.createDlpJob(createDlpJobRequest);

      // Set up a Pub/Sub subscriber to listen on the job completion status
      final SettableApiFuture<Boolean> done = SettableApiFuture.create();
      ProjectSubscriptionName subscriptionName =
          ProjectSubscriptionName.of(projectId, subscriptionId);
      MessageReceiver messageHandler =
          (PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) -> {
            handleMessage(dlpJob, done, pubsubMessage, ackReplyConsumer);
          };
      Subscriber subscriber = Subscriber.newBuilder(subscriptionName, messageHandler).build();
      subscriber.startAsync();

      // Wait for job completion semi-synchronously
      // For long jobs, consider using a truly asynchronous execution model such as Cloud Functions
      try {
        done.get(15, TimeUnit.MINUTES);
      } catch (TimeoutException e) {
        System.out.println("Job was not completed after 15 minutes.");
        return;
      } finally {
        subscriber.stopAsync();
        subscriber.awaitTerminated();
      }

      // Build a request to get the completed job
      GetDlpJobRequest getDlpJobRequest =
          GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();

      // Retrieve completed job status
      DlpJob completedJob = dlpServiceClient.getDlpJob(getDlpJobRequest);
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());

      // Get the result and parse through and process the information
      KMapEstimationResult kmapResult = completedJob.getRiskDetails().getKMapEstimationResult();
      for (KMapEstimationHistogramBucket result : kmapResult.getKMapEstimationHistogramList()) {
        System.out.printf(
            "\tAnonymity range: [%d, %d]\n", result.getMinAnonymity(), result.getMaxAnonymity());
        System.out.printf("\tSize: %d\n", result.getBucketSize());

        for (KMapEstimationQuasiIdValues valueBucket : result.getBucketValuesList()) {
          List<String> quasiIdValues =
              valueBucket.getQuasiIdsValuesList().stream()
                  .map(
                      value -> {
                        String s = value.toString();
                        return s.substring(s.indexOf(':') + 1).trim();
                      })
                  .collect(Collectors.toList());
          System.out.printf("\tValues: {%s}\n", String.join(", ", quasiIdValues));
          System.out.printf(
              "\tEstimated k-map anonymity: %d\n", valueBucket.getEstimatedAnonymity());
        }
      }
    }
  }

  // handleMessage injects the job and settableFuture into the message receiver interface
  private static void handleMessage(
      DlpJob job,
      SettableApiFuture<Boolean> done,
      PubsubMessage pubsubMessage,
      AckReplyConsumer ackReplyConsumer) {
    String messageAttribute = pubsubMessage.getAttributesMap().get("DlpJobName");
    if (job.getName().equals(messageAttribute)) {
      done.set(true);
      ackReplyConsumer.ack();
    } else {
      ackReplyConsumer.nack();
    }
  }
}

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client libraries
const DLP = require('@google-cloud/dlp');
const {PubSub} = require('@google-cloud/pubsub');

// Instantiates clients
const dlp = new DLP.DlpServiceClient();
const pubsub = new PubSub();

// The project ID to run the API call under
// const projectId = 'my-project';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const tableProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The name of the Pub/Sub topic to notify once the job completes
// TODO(developer): create a Pub/Sub topic to use for this
// const topicId = 'MY-PUBSUB-TOPIC'

// The name of the Pub/Sub subscription to use when listening for job
// completion notifications
// TODO(developer): create a Pub/Sub subscription to use for this
// const subscriptionId = 'MY-PUBSUB-SUBSCRIPTION'

// The ISO 3166-1 region code that the data is representative of
// Can be omitted if using a region-specific infoType (such as US_ZIP_5)
// const regionCode = 'USA';

// A set of columns that form a composite key ('quasi-identifiers'), and
// optionally their reidentification distributions
// const quasiIds = [{ field: { name: 'age' }, infoType: { name: 'AGE' }}];

async function kMapEstimationAnalysis() {
  const sourceTable = {
    projectId: tableProjectId,
    datasetId: datasetId,
    tableId: tableId,
  };

  // Construct request for creating a risk analysis job
  const request = {
    parent: `projects/${projectId}/locations/global`,
    riskJob: {
      privacyMetric: {
        kMapEstimationConfig: {
          quasiIds: quasiIds,
          regionCode: regionCode,
        },
      },
      sourceTable: sourceTable,
      actions: [
        {
          pubSub: {
            topic: `projects/${projectId}/topics/${topicId}`,
          },
        },
      ],
    },
  };

  // Create helper function for unpacking values
  const getValue = obj => obj[Object.keys(obj)[0]];

  // Run risk analysis job
  const [topicResponse] = await pubsub.topic(topicId).get();
  const subscription = await topicResponse.subscription(subscriptionId);
  const [jobsResponse] = await dlp.createDlpJob(request);
  const jobName = jobsResponse.name;
  console.log(`Job created. Job name: ${jobName}`);

  // Watch the Pub/Sub topic until the DLP job finishes
  await new Promise((resolve, reject) => {
    const messageHandler = message => {
      if (message.attributes && message.attributes.DlpJobName === jobName) {
        message.ack();
        subscription.removeListener('message', messageHandler);
        subscription.removeListener('error', errorHandler);
        resolve(jobName);
      } else {
        message.nack();
      }
    };

    const errorHandler = err => {
      subscription.removeListener('message', messageHandler);
      subscription.removeListener('error', errorHandler);
      reject(err);
    };

    subscription.on('message', messageHandler);
    subscription.on('error', errorHandler);
  });

  setTimeout(() => {
    console.log(' Waiting for DLP job to fully complete');
  }, 500);
  const [job] = await dlp.getDlpJob({name: jobName});

  const histogramBuckets =
    job.riskDetails.kMapEstimationResult.kMapEstimationHistogram;

  histogramBuckets.forEach((histogramBucket, histogramBucketIdx) => {
    console.log(`Bucket ${histogramBucketIdx}:`);
    console.log(
      `  Anonymity range: [${histogramBucket.minAnonymity}, ${histogramBucket.maxAnonymity}]`
    );
    console.log(`  Size: ${histogramBucket.bucketSize}`);

    histogramBucket.bucketValues.forEach(valueBucket => {
      const values = valueBucket.quasiIdsValues.map(value => getValue(value));
      console.log(`    Values: ${values.join(' ')}`);
      console.log(
        `    Estimated k-map anonymity: ${valueBucket.estimatedAnonymity}`
      );
    });
  });
}

await kMapEstimationAnalysis();

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Exception;
use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\PublishToPubSub;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\PrivacyMetric;
use Google\Cloud\Dlp\V2\PrivacyMetric\KMapEstimationConfig;
use Google\Cloud\Dlp\V2\PrivacyMetric\KMapEstimationConfig\TaggedField;
use Google\Cloud\Dlp\V2\RiskAnalysisJobConfig;
use Google\Cloud\PubSub\PubSubClient;

/**
 * Computes the k-map risk estimation of a column set in a Google BigQuery table.
 *
 * @param string   $callingProjectId  The project ID to run the API call under
 * @param string   $dataProjectId     The project ID containing the target Datastore
 * @param string   $topicId           The name of the Pub/Sub topic to notify once the job completes
 * @param string   $subscriptionId    The name of the Pub/Sub subscription to use when listening for job
 * @param string   $datasetId         The ID of the dataset to inspect
 * @param string   $tableId           The ID of the table to inspect
 * @param string   $regionCode        The ISO 3166-1 region code that the data is representative of
 * @param string[] $quasiIdNames      Array columns that form a composite key (quasi-identifiers)
 * @param string[] $infoTypes         Array of infoTypes corresponding to the chosen quasi-identifiers
 */
function k_map(
    string $callingProjectId,
    string $dataProjectId,
    string $topicId,
    string $subscriptionId,
    string $datasetId,
    string $tableId,
    string $regionCode,
    array $quasiIdNames,
    array $infoTypes
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();
    $pubsub = new PubSubClient();
    $topic = $pubsub->topic($topicId);

    // Verify input
    if (count($infoTypes) != count($quasiIdNames)) {
        throw new Exception('Number of infoTypes and number of quasi-identifiers must be equal!');
    }

    // Map infoTypes to quasi-ids
    $quasiIdObjects = array_map(function ($quasiId, $infoType) {
        $quasiIdField = (new FieldId())
            ->setName($quasiId);

        $quasiIdType = (new InfoType())
            ->setName($infoType);

        $quasiIdObject = (new TaggedField())
            ->setInfoType($quasiIdType)
            ->setField($quasiIdField);

        return $quasiIdObject;
    }, $quasiIdNames, $infoTypes);

    // Construct analysis config
    $statsConfig = (new KMapEstimationConfig())
        ->setQuasiIds($quasiIdObjects)
        ->setRegionCode($regionCode);

    $privacyMetric = (new PrivacyMetric())
        ->setKMapEstimationConfig($statsConfig);

    // Construct items to be analyzed
    $bigqueryTable = (new BigQueryTable())
        ->setProjectId($dataProjectId)
        ->setDatasetId($datasetId)
        ->setTableId($tableId);

    // Construct the action to run when job completes
    $pubSubAction = (new PublishToPubSub())
        ->setTopic($topic->name());

    $action = (new Action())
        ->setPubSub($pubSubAction);

    // Construct risk analysis job config to run
    $riskJob = (new RiskAnalysisJobConfig())
        ->setPrivacyMetric($privacyMetric)
        ->setSourceTable($bigqueryTable)
        ->setActions([$action]);

    // Listen for job notifications via an existing topic/subscription.
    $subscription = $topic->subscription($subscriptionId);

    // Submit request
    $parent = "projects/$callingProjectId/locations/global";
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setRiskJob($riskJob);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    // Poll Pub/Sub using exponential backoff until job finishes
    // Consider using an asynchronous execution model such as Cloud Functions
    $attempt = 1;
    $startTime = time();
    do {
        foreach ($subscription->pull() as $message) {
            if (
                isset($message->attributes()['DlpJobName']) &&
                $message->attributes()['DlpJobName'] === $job->getName()
            ) {
                $subscription->acknowledge($message);
                // Get the updated job. Loop to avoid race condition with DLP API.
                do {
                    $getDlpJobRequest = (new GetDlpJobRequest())
                        ->setName($job->getName());
                    $job = $dlp->getDlpJob($getDlpJobRequest);
                } while ($job->getState() == JobState::RUNNING);

                break 2; // break from parent do while
            }
        }
        print('Waiting for job to complete' . PHP_EOL);
        // Exponential backoff with max delay of 60 seconds
        sleep(min(60, pow(2, ++$attempt)));
    } while (time() - $startTime < 600); // 10 minute timeout

    // Print finding counts
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $histBuckets = $job->getRiskDetails()->getKMapEstimationResult()->getKMapEstimationHistogram();

            foreach ($histBuckets as $bucketIndex => $histBucket) {
                // Print bucket stats
                printf('Bucket %s:' . PHP_EOL, $bucketIndex);
                printf(
                    '  Anonymity range: [%s, %s]' . PHP_EOL,
                    $histBucket->getMinAnonymity(),
                    $histBucket->getMaxAnonymity()
                );
                printf('  Size: %s' . PHP_EOL, $histBucket->getBucketSize());

                // Print bucket values
                foreach ($histBucket->getBucketValues() as $percent => $valueBucket) {
                    printf(
                        '  Estimated k-map anonymity: %s' . PHP_EOL,
                        $valueBucket->getEstimatedAnonymity()
                    );
                    // Pretty-print quasi-ID values
                    print('  Values: ' . PHP_EOL);
                    foreach ($valueBucket->getQuasiIdsValues() as $index => $value) {
                        print('    ' . $value->serializeToJsonString() . PHP_EOL);
                    }
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            print('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            print('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import concurrent.futures
from typing import List

import google.cloud.dlp
from google.cloud.dlp_v2 import types
import google.cloud.pubsub


def k_map_estimate_analysis(
    project: str,
    table_project_id: str,
    dataset_id: str,
    table_id: str,
    topic_id: str,
    subscription_id: str,
    quasi_ids: List[str],
    info_types: List[str],
    region_code: str = "US",
    timeout: int = 300,
) -> None:
    """Uses the Data Loss Prevention API to compute the k-map risk estimation
        of a column set in a Google BigQuery table.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        table_project_id: The Google Cloud project id where the BigQuery table
            is stored.
        dataset_id: The id of the dataset to inspect.
        table_id: The id of the table to inspect.
        topic_id: The name of the Pub/Sub topic to notify once the job
            completes.
        subscription_id: The name of the Pub/Sub subscription to use when
            listening for job completion notifications.
        quasi_ids: A set of columns that form a composite key and optionally
            their re-identification distributions.
        info_types: Type of information of the quasi_id in order to provide a
            statistical model of population.
        region_code: The ISO 3166-1 region code that the data is representative
            of. Can be omitted if using a region-specific infoType (such as
            US_ZIP_5)
        timeout: The number of seconds to wait for a response from the API.

    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Create helper function for unpacking values
    def get_values(obj: types.Value) -> int:
        return int(obj.integer_value)

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into full resource ids.
    topic = google.cloud.pubsub.PublisherClient.topic_path(project, topic_id)
    parent = f"projects/{project}/locations/global"

    # Location info of the BigQuery table.
    source_table = {
        "project_id": table_project_id,
        "dataset_id": dataset_id,
        "table_id": table_id,
    }

    # Check that numbers of quasi-ids and info types are equal
    if len(quasi_ids) != len(info_types):
        raise ValueError(
            """Number of infoTypes and number of quasi-identifiers
                            must be equal!"""
        )

    # Convert quasi id list to Protobuf type
    def map_fields(quasi_id: str, info_type: str) -> dict:
        return {"field": {"name": quasi_id}, "info_type": {"name": info_type}}

    quasi_ids = map(map_fields, quasi_ids, info_types)

    # Tell the API where to send a notification when the job is complete.
    actions = [{"pub_sub": {"topic": topic}}]

    # Configure risk analysis job
    # Give the name of the numeric column to compute risk metrics for
    risk_job = {
        "privacy_metric": {
            "k_map_estimation_config": {
                "quasi_ids": quasi_ids,
                "region_code": region_code,
            }
        },
        "source_table": source_table,
        "actions": actions,
    }

    # Call API to start risk analysis job
    operation = dlp.create_dlp_job(request={"parent": parent, "risk_job": risk_job})

    def callback(message: google.cloud.pubsub_v1.subscriber.message.Message) -> None:
        if message.attributes["DlpJobName"] == operation.name:
            # This is the message we're looking for, so acknowledge it.
            message.ack()

            # Now that the job is done, fetch the results and print them.
            job = dlp.get_dlp_job(request={"name": operation.name})
            print(f"Job name: {job.name}")
            histogram_buckets = (
                job.risk_details.k_map_estimation_result.k_map_estimation_histogram
            )
            # Print bucket stats
            for i, bucket in enumerate(histogram_buckets):
                print(f"Bucket {i}:")
                print(
                    "   Anonymity range: [{}, {}]".format(
                        bucket.min_anonymity, bucket.max_anonymity
                    )
                )
                print(f"   Size: {bucket.bucket_size}")
                for value_bucket in bucket.bucket_values:
                    print(
                        "   Values: {}".format(
                            map(get_values, value_bucket.quasi_ids_values)
                        )
                    )
                    print(
                        "   Estimated k-map anonymity: {}".format(
                            value_bucket.estimated_anonymity
                        )
                    )
            subscription.set_result(None)
        else:
            # This is not the message we're looking for.
            message.drop()

    # Create a Pub/Sub client and find the subscription. The subscription is
    # expected to already be listening to the topic.
    subscriber = google.cloud.pubsub.SubscriberClient()
    subscription_path = subscriber.subscription_path(project, subscription_id)
    subscription = subscriber.subscribe(subscription_path, callback)

    try:
        subscription.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        print(
            "No event received before the timeout. Please verify that the "
            "subscription provided is subscribed to the topic provided."
        )
        subscription.close()

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Cloud.PubSub.V1;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using static Google.Cloud.Dlp.V2.Action.Types;
using static Google.Cloud.Dlp.V2.PrivacyMetric.Types;
using static Google.Cloud.Dlp.V2.PrivacyMetric.Types.KMapEstimationConfig.Types;

public class RiskAnalysisCreateKMap
{
    public static object KMap(
        string callingProjectId,
        string tableProjectId,
        string datasetId,
        string tableId,
        string topicId,
        string subscriptionId,
        IEnumerable<FieldId> quasiIds,
        IEnumerable<InfoType> infoTypes,
        string regionCode)
    {
        var dlp = DlpServiceClient.Create();

        // Construct + submit the job
        var kmapEstimationConfig = new KMapEstimationConfig
        {
            QuasiIds =
            {
                quasiIds.Zip(
                    infoTypes,
                    (Field, InfoType) => new TaggedField
                    {
                        Field = Field,
                        InfoType = InfoType
                    }
                )
            },
            RegionCode = regionCode
        };

        var config = new RiskAnalysisJobConfig()
        {
            PrivacyMetric = new PrivacyMetric
            {
                KMapEstimationConfig = kmapEstimationConfig
            },
            SourceTable = new BigQueryTable
            {
                ProjectId = tableProjectId,
                DatasetId = datasetId,
                TableId = tableId
            },
            Actions =
            {
                new Google.Cloud.Dlp.V2.Action
                {
                    PubSub = new PublishToPubSub
                    {
                        Topic = $"projects/{callingProjectId}/topics/{topicId}"
                    }
                }
            }
        };

        var submittedJob = dlp.CreateDlpJob(new CreateDlpJobRequest
        {
            ParentAsProjectName = new ProjectName(callingProjectId),
            RiskJob = config
        });

        // Listen to pub/sub for the job
        var subscriptionName = new SubscriptionName(callingProjectId, subscriptionId);
        var subscriber = SubscriberClient.CreateAsync(subscriptionName).Result;

        // SimpleSubscriber runs your message handle function on multiple
        // threads to maximize throughput.
        var done = new ManualResetEventSlim(false);
        subscriber.StartAsync((PubsubMessage message, CancellationToken cancel) =>
        {
            if (message.Attributes["DlpJobName"] == submittedJob.Name)
            {
                Thread.Sleep(500); // Wait for DLP API results to become consistent
                done.Set();
                return Task.FromResult(SubscriberClient.Reply.Ack);
            }
            else
            {
                return Task.FromResult(SubscriberClient.Reply.Nack);
            }
        });

        done.Wait(TimeSpan.FromMinutes(10)); // 10 minute timeout; may not work for large jobs
        subscriber.StopAsync(CancellationToken.None).Wait();

        // Process results
        var resultJob = dlp.GetDlpJob(new GetDlpJobRequest
        {
            DlpJobName = DlpJobName.Parse(submittedJob.Name)
        });

        var result = resultJob.RiskDetails.KMapEstimationResult;

        for (var histogramIdx = 0; histogramIdx < result.KMapEstimationHistogram.Count; histogramIdx++)
        {
            var histogramValue = result.KMapEstimationHistogram[histogramIdx];
            Console.WriteLine($"Bucket {histogramIdx}");
            Console.WriteLine($"  Anonymity range: [{histogramValue.MinAnonymity}, {histogramValue.MaxAnonymity}].");
            Console.WriteLine($"  Size: {histogramValue.BucketSize}");

            foreach (var datapoint in histogramValue.BucketValues)
            {
                // 'UnpackValue(x)' is a prettier version of 'x.toString()'
                Console.WriteLine($"    Values: [{String.Join(',', datapoint.QuasiIdsValues.Select(x => UnpackValue(x)))}]");
                Console.WriteLine($"    Estimated k-map anonymity: {datapoint.EstimatedAnonymity}");
            }
        }

        return 0;
    }

    public static string UnpackValue(Value protoValue)
    {
        var jsonValue = JsonConvert.DeserializeObject<Dictionary<string, object>>(protoValue.ToString());
        return jsonValue.Values.ElementAt(0).ToString();
    }
}

Viewing k-map job results

To retrieve the results of the k-map risk analysis job using the REST API, send the following GET request to the projects.dlpJobs resource. Replace PROJECT_ID with your project ID and JOB_ID with the identifier of the job you want to obtain results for. The job ID was returned when you started the job, and can also be retrieved by listing all jobs.

GET https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID

The request returns a JSON object containing an instance of the job. The results of the analysis are inside the "riskDetails" key, in an AnalyzeDataSourceRiskDetails object. For more information, see the API reference for the DlpJob resource.
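As a rough orientation, the k-map portion of that response is shaped like the following excerpt. The structure mirrors the fields that the code samples above read from the result; the numeric values shown here are illustrative placeholders only:

{
  "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
  "state": "DONE",
  "riskDetails": {
    "kMapEstimationResult": {
      "kMapEstimationHistogram": [
        {
          "minAnonymity": "1",
          "maxAnonymity": "1",
          "bucketSize": "1",
          "bucketValues": [
            {
              "quasiIdsValues": [
                { "integerValue": "35" },
                { "stringValue": "engineer" }
              ],
              "estimatedAnonymity": "1"
            }
          ]
        }
      ]
    }
  }
}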

What's next

  • Learn how to calculate the k-anonymity value for a dataset.
  • Learn how to calculate the l-diversity value for a dataset.
  • Learn how to calculate the δ-presence value for a dataset.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.