InfoTypes and infoType detectors
Sensitive Data Protection uses information types, or infoTypes, to define what it scans for. An infoType is a type of sensitive data, such as a name, email address, telephone number, identification number, credit card number, and so on. An infoType detector is the corresponding detection mechanism that matches on an infoType's matching criteria.
Note: If you're looking for an up-to-date list of all built-in infoType detectors, see InfoType detector reference.
Best practices for selecting infoTypes
Understanding your data is one of the first critical steps in protecting it. As a best practice, you should collect, store, and process only the information that you have a business need for. By identifying the data you are handling, you can make informed decisions for your business, users, and data security and privacy posture.
Some of your business use cases might require certain sensitive information, and others might not. There is no single solution that supports all use cases. For this reason, Sensitive Data Protection offers flexible control over the types of data to scan for. If you're using infoTypes for de-identification or masking, you also have control of when and how data is transformed.
Important: Built-in infoType detectors are not a perfectly accurate detection method. For example, they can't guarantee compliance with regulatory requirements. You must decide what data is sensitive and how to best protect it. Google recommends that you test your settings to make sure that your configuration meets your requirements.
General guidelines
Consider the following general guidelines when selecting infoTypes.
Use general infoTypes in place of specific infoTypes
If you don't need your scan results to show the specific infoTypes that were detected, then consider using general infoTypes instead of specific infoTypes in your inspection configurations. For information about the advantages of using general infoType detectors in your requests, see General and specific infoType detectors on this page.
For a complete list of general infoTypes and the specific infoTypes that they include, see General infoTypes.
Sensitive information that you don't need to collect
Each service in your business should collect only the data that the service needs. For example, certain services in your business don't need to collect financial information. For those services, consider enabling infoType detectors like CREDIT_CARD_NUMBER, FINANCIAL_ACCOUNT_NUMBER, and other infoTypes in the industry category FINANCE, so that you can find financial data that those services shouldn't be collecting.
Information that you need to collect but don't want to share broadly with your team
There might be valid use cases for collecting personal information, but you shouldn't share it broadly with your team. For example, a customer who files a support ticket might give you contact information so that you can contact them to resolve any issues. You don't want everyone on the team who views the ticket to see the personally identifiable information (PII). Consider enabling infoType detectors like PHONE_NUMBER, EMAIL_ADDRESS, and other infoTypes in the type category PII.
Categories of sensitive data that are under industry, data privacy, or jurisdictional regulations
Certain information types are considered sensitive because of how they are issued or what they can be used for. In other cases, contextual and demographic information are considered a protected category. These types of information might have additional restrictions on how they are collected, used, and managed. Consider enabling infoType detectors in the following categories:
- Type category: SPII, GOVERNMENT_ID, and DEMOGRAPHIC
- Industry category: HEALTH
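To find every detector in these categories, you can filter the output of the infoTypes.list method instead of copying names by hand. The following Python sketch assumes that each infoType description has the JSON shape returned by that method, with a categories list whose entries carry a typeCategory or industryCategory key; the function name names_in_categories is hypothetical, and you should check the field names against the API reference.

```python
from typing import Collection, Iterable


def names_in_categories(
    descriptions: Iterable[dict],
    type_categories: Collection[str] = (),
    industry_categories: Collection[str] = (),
) -> list[str]:
    """Return the names of infoTypes tagged with any of the given categories.

    Each description is assumed to look like an entry of the infoTypes.list
    JSON response, for example:
        {"name": "...", "categories": [{"typeCategory": "PII"}]}
    """
    selected = []
    for description in descriptions:
        for category in description.get("categories", []):
            if (
                category.get("typeCategory") in type_categories
                or category.get("industryCategory") in industry_categories
            ):
                selected.append(description["name"])
                break  # One matching category is enough.
    return selected
```

For the categories above, you would call names_in_categories(descriptions, {"SPII", "GOVERNMENT_ID", "DEMOGRAPHIC"}, {"HEALTH"}) and pass the resulting names to your inspection configuration.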
Choosing between similar infoTypes
Consider the following when choosing between similar infoType detectors.
Passports
If you don't need to scan for passport identifiers from a specific country, then choose the generalized detector: PASSPORT.
Certain country-specific passport detectors, like UK_PASSPORT, are available. However, some country-specific passport detectors can only identify passports with specific formats or with the presence of contextual clues.
Person names
When scanning for people's names, use PERSON_NAME for most use cases instead of FIRST_NAME or LAST_NAME.
PERSON_NAME is a detector for people's names. It includes single-word names and full names. This detector attempts to detect, for example, names like Jane, Jane Smith, and Jane Marie Smith using various technologies, including natural language understanding. FIRST_NAME and LAST_NAME are subsets of this detector that attempt to identify parts of a name. Findings from these detectors are always subsets of findings from PERSON_NAME.
Dates and times
If you don't need to scan for all dates, consider using a targeted date detector like DATE_OF_BIRTH. This detector attempts to identify context indicating that the date is a person's date of birth.
The DATE detector attempts to find all dates regardless of context. It also flags relative dates, like today or yesterday. Similarly, TIME attempts to find all timestamps.
Locations
If you don't need to scan for all locations, consider using STREET_ADDRESS instead of the LOCATION detector. The STREET_ADDRESS detector attempts to find fully qualified addresses, which are usually more precise than generic locations and can be considered more sensitive.
The LOCATION infoType detector attempts to find any location regardless of context, for example, Paris or Canada.
InfoType detectors that require context
Many infoType detectors require contextual clues to be present before they identify a match. If a built-in infoType detector isn't flagging items that you expect to be flagged, because no contextual clues occur in close proximity to those items, then consider using GENERIC_ID or a custom infoType detector instead.
Information types lacking a common industry definition
Some information types lack a common industry definition. Examples are medical record numbers, account numbers, PINs, and security codes. For these types, consider using infoTypes like GENERIC_ID, FINANCIAL_ACCOUNT_NUMBER, and MEDICAL_RECORD_NUMBER. These detectors use a combination of entity detection and context to find potentially sensitive elements.
If you set the minimum likelihood for GENERIC_ID to a value that is lower than POSSIBLE, you might flag most numeric and alphanumeric entities in your data.
Higher-latency infoType detectors
Avoid enabling infoType detectors that you don't need. Although the following are useful in certain scenarios, these infoTypes can make requests run much more slowly than requests that don't include them:
- PERSON_NAME
- FEMALE_NAME
- MALE_NAME
- FIRST_NAME
- LAST_NAME
- DATE_OF_BIRTH
- LOCATION
- STREET_ADDRESS
- ORGANIZATION_NAME
Always specify infoType detectors explicitly. Don't use an empty infoTypeslist.
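The guidance in this section can be turned into a small guard that runs before you send a request. The following Python sketch assumes the dictionary request format used by the Python client library sample on this page (an inspect_config with info_types and include_quote keys). The helper name build_inspect_config is hypothetical, and the higher-latency list and the 150-detector limit are copied from this page, so check them against the current reference.

```python
import warnings

# Detectors that this page lists as higher latency.
HIGHER_LATENCY_INFO_TYPES = frozenset({
    "PERSON_NAME", "FEMALE_NAME", "MALE_NAME", "FIRST_NAME", "LAST_NAME",
    "DATE_OF_BIRTH", "LOCATION", "STREET_ADDRESS", "ORGANIZATION_NAME",
})

# Per-request limit on infoType detectors stated on this page.
MAX_INFO_TYPES_PER_REQUEST = 150


def build_inspect_config(info_type_names: list[str], include_quote: bool = True) -> dict:
    """Build an inspect_config dict with an explicit, non-empty infoTypes list."""
    if not info_type_names:
        # An empty list makes the service fall back to its default infoTypes,
        # which are intended for testing only.
        raise ValueError("Specify at least one infoType explicitly.")
    unique_names = list(dict.fromkeys(info_type_names))  # Drop duplicates, keep order.
    if len(unique_names) > MAX_INFO_TYPES_PER_REQUEST:
        raise ValueError(
            f"{len(unique_names)} infoTypes exceeds the limit of {MAX_INFO_TYPES_PER_REQUEST}."
        )
    slow = sorted(HIGHER_LATENCY_INFO_TYPES.intersection(unique_names))
    if slow:
        warnings.warn(f"Higher-latency detectors requested: {', '.join(slow)}")
    return {
        "info_types": [{"name": name} for name in unique_names],
        "include_quote": include_quote,
    }
```

The warning doesn't block the request, because the higher-latency detectors are still the right choice in some scenarios.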
How to use infoTypes
Sensitive Data Protection uses infoType detectors in the configuration for its scans to determine what to inspect for and how to transform findings. InfoType names are also used when displaying or reporting scan results.
For example, if you wanted to look for email addresses in a block of text, you would specify the EMAIL_ADDRESS infoType detector in the inspection configuration. If you wanted to redact email addresses from the text block, you would specify EMAIL_ADDRESS in both the inspection configuration and the de-identification configuration to indicate how to redact or transform that type.
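As a sketch of the redaction case, the following Python function builds a request for the Python client's deidentify_content method as a plain dictionary, naming EMAIL_ADDRESS in both configurations. The function name is hypothetical, and the replace_with_info_type_config transformation is only one of several possible choices; you would pass the result as request= to DlpServiceClient.deidentify_content.

```python
def build_redact_email_request(project_id: str, text: str) -> dict:
    """Build a deidentify_content request that replaces email addresses."""
    email = {"name": "EMAIL_ADDRESS"}
    return {
        "parent": f"projects/{project_id}/locations/global",
        # The inspection configuration says what to look for...
        "inspect_config": {"info_types": [email]},
        # ...and the de-identification configuration says how to transform it.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": [email],
                        # Replace each finding with the name of its infoType.
                        "primitive_transformation": {"replace_with_info_type_config": {}},
                    }
                ]
            }
        },
        "item": {"value": text},
    }
```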
Further, you could use a combination of built-in and custom infoType detectors to exclude a subset of email addresses from scan findings. First, create a custom infoType called INTERNAL_EMAIL_ADDRESS and configure it to match internal test email addresses. Then, you can set up your scan to include findings for EMAIL_ADDRESS, but include an exclusion rule that excludes any findings that match INTERNAL_EMAIL_ADDRESS. For more information about exclusion rules and other features of custom infoType detectors, see Creating custom infoType detectors.
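A minimal Python sketch of that configuration, written as the dictionary form of an InspectConfig that the Python client accepts. The test-address pattern (anything at the hypothetical domain test.example.com) is an assumption, and so is the choice of MATCHING_TYPE_FULL_MATCH; see Creating custom infoType detectors for the full set of exclusion rule options.

```python
import re

# Hypothetical pattern for internal test addresses; adjust to your own domain.
INTERNAL_TEST_EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@test\.example\.com"

inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}],
    # Custom infoType that matches internal test addresses.
    "custom_info_types": [
        {
            "info_type": {"name": "INTERNAL_EMAIL_ADDRESS"},
            "regex": {"pattern": INTERNAL_TEST_EMAIL_PATTERN},
        }
    ],
    # Drop EMAIL_ADDRESS findings that an INTERNAL_EMAIL_ADDRESS finding
    # matches in full.
    "rule_set": [
        {
            "info_types": [{"name": "EMAIL_ADDRESS"}],
            "rules": [
                {
                    "exclusion_rule": {
                        "exclude_info_types": {
                            "info_types": [{"name": "INTERNAL_EMAIL_ADDRESS"}]
                        },
                        "matching_type": "MATCHING_TYPE_FULL_MATCH",
                    }
                }
            ],
        }
    ],
}

# Quick local sanity check of the pattern with Python's re module
# (not the RE2 engine that the service uses).
assert re.fullmatch(INTERNAL_TEST_EMAIL_PATTERN, "qa.bot@test.example.com")
```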
Sensitive Data Protection provides a set of built-in infoType detectors that you specify by name, each of which is listed in InfoType detector reference. These detectors use a variety of techniques to discover and classify each type. For example, some types will require a pattern match, some may have mathematical checksums, some have special digit restrictions, and others may have specific prefixes or context around the findings.
Examples
When you set up Sensitive Data Protection to scan your content, you include the infoType detectors to use in the scan configuration.
For example, the following JSON and code samples demonstrate a simple scan request to the DLP API. Notice that the PHONE_NUMBER detector is specified in inspectConfig, which instructs Sensitive Data Protection to scan the given string for a phone number.
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
C#
```csharp
using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectPhoneNumber
{
    public static InspectContentResponse Inspect(
        string projectId,
        string text,
        Likelihood minLikelihood = Likelihood.Possible)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        // Set content item.
        var contentItem = new ContentItem { Value = text };

        // Construct inspect config.
        var inspectConfig = new InspectConfig
        {
            InfoTypes = { new InfoType { Name = "PHONE_NUMBER" } },
            IncludeQuote = true,
            MinLikelihood = minLikelihood
        };

        // Construct a request.
        var request = new InspectContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectConfig = inspectConfig,
            Item = contentItem,
        };

        // Call the API.
        var response = dlp.InspectContent(request);

        // Inspect the results.
        var resultFindings = response.Result.Findings;
        Console.WriteLine($"Findings: {resultFindings.Count}");
        foreach (var f in resultFindings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}
```
Go
```go
import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// inspectPhoneNumber demonstrates a simple scan request to the Cloud DLP API.
// Notice that the PHONE_NUMBER detector is specified in inspectConfig,
// which instructs Cloud DLP to scan the given string for a phone number.
func inspectPhoneNumber(w io.Writer, projectID, textToInspect string) error {
	// projectID := "my-project-id"
	// textToInspect := "My phone number is (123) 555-6789"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}
	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Create and send the request.
	req := &dlppb.InspectContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: textToInspect,
			},
		},
		InspectConfig: &dlppb.InspectConfig{
			// Specify the type of info the inspection will look for.
			// See https://cloud.google.com/dlp/docs/infotypes-reference
			// for complete list of info types
			InfoTypes: []*dlppb.InfoType{
				{Name: "PHONE_NUMBER"},
			},
			IncludeQuote: true,
		},
	}

	// Send the request.
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		fmt.Fprintf(w, "receive: %v", err)
		return err
	}

	// Process the results.
	result := resp.Result
	fmt.Fprintf(w, "Findings: %d\n", len(result.Findings))
	for _, f := range result.Findings {
		fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}
```
Java
```java
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.LocationName;
import java.io.IOException;

public class InspectPhoneNumber {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    // The sample text must contain a phone number for PHONE_NUMBER to match.
    String textToInspect = "My name is Gary and my phone number is (415) 555-0890";
    inspectString(projectId, textToInspect);
  }

  // Inspects the provided text.
  public static void inspectString(String projectId, String textToInspect) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ContentItem item = ContentItem.newBuilder().setValue(textToInspect).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("PHONE_NUMBER").build();

      // Construct the configuration for the Inspect request.
      InspectConfig config =
          InspectConfig.newBuilder()
              .setIncludeQuote(true)
              .setMinLikelihood(Likelihood.POSSIBLE)
              .addInfoTypes(infoType)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}
```
Node.js
```javascript
// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// TODO(developer): Replace these values before running the sample.
// The project ID to run the API call under
const projectId = 'my-project';

// The string to inspect
const string = 'My email is gary@example.com and my phone number is (223) 456-7890.';

// The minimum likelihood required before returning a match
const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
const maxFindings = 0;

// The infoTypes of information to match
// See https://cloud.google.com/dlp/docs/concepts-infotypes for more information
// about supported infoTypes.
const infoTypes = [{name: 'PHONE_NUMBER'}];

// The customInfoTypes of information to match (optional), for example:
// [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//  { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}]
const customInfoTypes = [];

// Whether to include the matching string
const includeQuote = true;

async function inspectPhoneNumber() {
  // Construct item to inspect
  const item = {value: string};

  // Construct request
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customInfoTypes,
      minLikelihood: minLikelihood,
      includeQuote: includeQuote,
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
    },
    item: item,
  };

  // Run request
  const [response] = await dlp.inspectContent(request);
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log('Findings:');
    findings.forEach(finding => {
      if (includeQuote) {
        console.log(`\tQuote: ${finding.quote}`);
      }
      console.log(`\tInfo type: ${finding.infoType.name}`);
      console.log(`\tLikelihood: ${finding.likelihood}`);
    });
  } else {
    console.log('No findings.');
  }
}

inspectPhoneNumber();
```
PHP
```php
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectContentRequest;
use Google\Cloud\Dlp\V2\Likelihood;

/**
 * Inspect data for phone numbers
 * Demonstrates a simple scan request to the Cloud DLP API. Notice that the
 * PHONE_NUMBER detector is specified in inspectConfig, which instructs
 * Cloud DLP to scan the given string for a phone number.
 *
 * @param string $projectId     The Google Cloud project id to use as a parent resource.
 * @param string $textToInspect The string to inspect.
 */
function inspect_phone_number(
    // TODO(developer): Replace sample parameters before running the code.
    string $projectId,
    string $textToInspect = 'My name is Gary and my phone number is (415) 555-0890'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$projectId/locations/global";

    // Specify what content you want the service to Inspect.
    $item = (new ContentItem())
        ->setValue($textToInspect);

    $inspectConfig = (new InspectConfig())
        // The infoTypes of information to match
        ->setInfoTypes([
            (new InfoType())->setName('PHONE_NUMBER'),
        ])
        // Whether to include the matching string
        ->setIncludeQuote(true)
        ->setMinLikelihood(Likelihood::POSSIBLE);

    // Run request
    $inspectContentRequest = (new InspectContentRequest())
        ->setParent($parent)
        ->setInspectConfig($inspectConfig)
        ->setItem($item);
    $response = $dlp->inspectContent($inspectContentRequest);

    // Print the results
    $findings = $response->getResult()->getFindings();
    if (count($findings) == 0) {
        printf('No findings.' . PHP_EOL);
    } else {
        printf('Findings:' . PHP_EOL);
        foreach ($findings as $finding) {
            printf('  Quote: %s' . PHP_EOL, $finding->getQuote());
            printf('  Info type: %s' . PHP_EOL, $finding->getInfoType()->getName());
            printf('  Likelihood: %s' . PHP_EOL, Likelihood::name($finding->getLikelihood()));
        }
    }
}
```
Python
```python
import google.cloud.dlp


def inspect_phone_number(
    project: str,
    content_string: str,
) -> None:
    """Uses the Data Loss Prevention API to analyze strings for protected data.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect phone number from.
    """

    # Instantiate a client (use the module that was actually imported).
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    info_types = [{"name": "PHONE_NUMBER"}]

    # Construct the configuration dictionary.
    inspect_config = {
        "info_types": info_types,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
```
REST
JSON input:
```
POST https://dlp.googleapis.com/v2/projects/[PROJECT-ID]/content:inspect?key={YOUR_API_KEY}

{
  "item": {
    "value": "My phone number is (415) 555-0890"
  },
  "inspectConfig": {
    "includeQuote": true,
    "minLikelihood": "POSSIBLE",
    "infoTypes": [
      {
        "name": "PHONE_NUMBER"
      }
    ]
  }
}
```
When you send the preceding request to the specified endpoint, Sensitive Data Protection returns the following:
JSON output:
```json
{
  "result": {
    "findings": [
      {
        "quote": "(415) 555-0890",
        "infoType": {
          "name": "PHONE_NUMBER"
        },
        "likelihood": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": "19",
            "end": "33"
          },
          "codepointRange": {
            "start": "19",
            "end": "33"
          }
        },
        "createTime": "2018-10-29T23:46:34.535Z"
      }
    ]
  }
}
```
You must specify particular infoTypes listed in the reference in your inspection configuration. If you don't specify any infoTypes, Sensitive Data Protection uses a default infoTypes list that is intended for testing purposes only. The default list might not be suitable for your use cases.
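Because byteRange offsets count bytes in the UTF-8 encoded input, you can use them to slice the original content. The following Python sketch parses the JSON output from the REST example with the standard json module and recovers the matched text. It assumes the response has the shape shown, and that, as in the standard JSON mapping for protocol buffers, 64-bit integer fields arrive as strings and zero values might be omitted.

```python
import json

# The input string and response body from the REST example.
request_text = "My phone number is (415) 555-0890"
response_body = """
{"result": {"findings": [{"quote": "(415) 555-0890",
  "infoType": {"name": "PHONE_NUMBER"}, "likelihood": "VERY_LIKELY",
  "location": {"byteRange": {"start": "19", "end": "33"},
               "codepointRange": {"start": "19", "end": "33"}},
  "createTime": "2018-10-29T23:46:34.535Z"}]}}
"""

data = request_text.encode("utf-8")
for finding in json.loads(response_body)["result"].get("findings", []):
    byte_range = finding["location"]["byteRange"]
    # 64-bit integers are serialized as JSON strings, and a zero start
    # offset might be omitted entirely, so default it to 0.
    start, end = int(byte_range.get("start", 0)), int(byte_range["end"])
    matched = data[start:end].decode("utf-8")
    print(finding["infoType"]["name"], finding["likelihood"], repr(matched))
```

For the sample response, this prints PHONE_NUMBER VERY_LIKELY '(415) 555-0890'.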
For more information on how to use infoType detectors to scan your content, see one of the how-to topics about inspecting, redacting, or de-identifying.
Certainty and testing
Findings are reported with a certainty score called likelihood. The likelihood score indicates how likely it is that a finding matches the corresponding type. For example, a type may return a lower likelihood if it only matches the pattern and return a higher likelihood if it matches the pattern and has positive context around it. For this reason, you may notice that a single finding could match several types at lower likelihood. Also, a finding may not appear or might have lower certainty if it doesn't match properly, or if it has negative context around it. For example, a finding might not be reported if it matches the structure for the specified infoType but fails the infoType's checksum. Or a finding could match more than one infoType but have context that boosts one of them, and thus only get reported for that type.
Note: Positive context is when the inclusion of certain characters, words, or phrases in proximity to a potentially matched pattern indicates to Sensitive Data Protection that a match to the pattern is more likely. Similarly, negative context is when the inclusion of certain characters, words, or phrases in proximity to a pattern indicates that a match is less likely.
If you are testing various detectors, you may notice that fake or sample data does not get reported because that fake or sample data doesn't pass enough checks to be reported.
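Likelihood values are ordered buckets rather than numeric scores, so client code that post-filters findings has to compare them by rank. The following Python sketch mirrors the ordering of the API's Likelihood enum in a local IntEnum. It is a client-side illustration only, not how the service computes likelihood, and setting minLikelihood in the request, as the samples on this page do, is usually the simpler option.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """Local mirror of the likelihood buckets, lowest to highest."""
    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


def at_least(findings: list[dict], minimum: str) -> list[dict]:
    """Keep findings whose likelihood is at or above `minimum`."""
    threshold = Likelihood[minimum]
    return [f for f in findings if Likelihood[f["likelihood"]] >= threshold]
```

For example, with findings at VERY_LIKELY, UNLIKELY, and POSSIBLE, at_least(findings, "POSSIBLE") keeps the first and the third.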
Kinds of infoType detectors
Sensitive Data Protection includes several kinds of infoType detectors, all of which are summarized here:
- Built-in infoType detectors are built into Sensitive Data Protection. They include detectors for country-specific or region-specific sensitive data types as well as globally applicable data types. General infoTypes are also available to help you simplify your configurations.
- Custom infoType detectors are detectors that you create yourself. There are three kinds of custom infoType detectors:
  - Small custom dictionary detectors are simple word lists that Sensitive Data Protection matches on. Use small custom dictionary detectors when you have a list of up to several tens of thousands of words or phrases. Small custom dictionary detectors are preferred if you don't anticipate your word list changing significantly.
  - Large custom dictionary detectors are generated by Sensitive Data Protection using large lists of words or phrases stored in either Cloud Storage or BigQuery. Use large custom dictionary detectors when you have a large list of words or phrases, up to tens of millions.
  - Regular expression (regex) detectors enable Sensitive Data Protection to detect matches based on a regular expression pattern.
In addition, Sensitive Data Protection includes the concept of inspection rules, which enable you to fine-tune scan results using the following:
- Exclusion rules enable you to decrease the number of findings returned by adding rules to a built-in or custom infoType detector.
- Hotword rules enable you to increase the quantity or change the likelihood value of findings returned by adding rules to a built-in or custom infoType detector.
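As an illustration of a hotword rule, the following Python sketch writes an InspectConfig in the dictionary form that the Python client accepts. It pairs a hypothetical custom regex detector named C_EMPLOYEE_ID, which reports bare matches as POSSIBLE, with a hotword rule that raises matches to VERY_LIKELY when the word "employee" appears within 20 characters before them. The detector name, ID format, hotword pattern, and window size are assumptions chosen for the example.

```python
# Hypothetical custom detector for employee IDs such as E123456.
EMPLOYEE_ID_PATTERN = r"E[0-9]{6}"

inspect_config = {
    "custom_info_types": [
        {
            "info_type": {"name": "C_EMPLOYEE_ID"},
            "regex": {"pattern": EMPLOYEE_ID_PATTERN},
            # Bare pattern matches are reported as POSSIBLE.
            "likelihood": "POSSIBLE",
        }
    ],
    "rule_set": [
        {
            # The rule applies to findings of this detector.
            "info_types": [{"name": "C_EMPLOYEE_ID"}],
            "rules": [
                {
                    "hotword_rule": {
                        # Case-insensitive hotword.
                        "hotword_regex": {"pattern": "(?i)employee"},
                        # Look for the hotword up to 20 characters before a match.
                        "proximity": {"window_before": 20},
                        # Matches near the hotword are raised to VERY_LIKELY.
                        "likelihood_adjustment": {"fixed_likelihood": "VERY_LIKELY"},
                    }
                }
            ],
        }
    ],
}
```

If you also set min_likelihood to VERY_LIKELY, only matches that have the hotword nearby would be returned.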
Built-in infoType detectors
Built-in infoType detectors are built into Sensitive Data Protection, and include detectors for country- or region-specific sensitive data types such as the French Numéro d'Inscription au Répertoire (NIR) (FRANCE_NIR), UK driver's license number (UK_DRIVERS_LICENSE_NUMBER), and US Social Security number (US_SOCIAL_SECURITY_NUMBER). They also include globally applicable data types such as a person name (PERSON_NAME), telephone numbers (PHONE_NUMBER), email addresses (EMAIL_ADDRESS), and credit card numbers (CREDIT_CARD_NUMBER).
The list of built-in infoType detectors is always being updated. For a complete list of currently supported built-in infoType detectors, see InfoType detector reference.
You can also view a complete list of all built-in infoType detectors by calling Sensitive Data Protection's infoTypes.list method.
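In the Python client library, infoTypes.list is exposed as the list_info_types method of DlpServiceClient. The following sketch wraps that call. The parent value locations/global and the language_code field are assumptions to verify against the ListInfoTypesRequest reference, and the helper name is hypothetical. The client is passed in as a parameter so that the function can be exercised without credentials.

```python
from typing import Any


def list_builtin_info_type_names(client: Any, language_code: str = "en-US") -> list[str]:
    """Return the names of all built-in infoTypes, sorted alphabetically.

    `client` is expected to be a google.cloud.dlp.DlpServiceClient, or any
    object with a compatible list_info_types method, such as a test double.
    """
    response = client.list_info_types(
        request={"parent": "locations/global", "language_code": language_code}
    )
    return sorted(info_type.name for info_type in response.info_types)


# Example usage (requires credentials):
#   import google.cloud.dlp
#   names = list_builtin_info_type_names(google.cloud.dlp.DlpServiceClient())
#   print(len(names), names[:5])
```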
Detection techniques
To detect content that corresponds to built-in infoTypes, Sensitive Data Protection uses various techniques including pattern matching, checksum validation, machine learning, and context analysis. For example, to detect the CREDIT_CARD_NUMBER infoType, Sensitive Data Protection checks for known issuer prefixes, validates checksums, analyzes character lengths, and considers the context in which the potential credit card number appears.
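Payment card numbers conventionally carry a Luhn (mod 10) check digit, so a minimal Python sketch of this kind of checksum validation looks like the following. It illustrates the general technique only: the exact rules that CREDIT_CARD_NUMBER applies aren't documented here, and a Luhn pass alone says nothing about issuer prefixes, length, or context.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn (mod 10) check.

    Spaces and hyphens are ignored; any other non-digit character fails.
    """
    digits = candidate.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0
```

For example, the widely used test number 4111 1111 1111 1111 passes, while changing its last digit to 2 makes it fail. This is also why made-up sample numbers often go unreported during testing.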
Sensitive Data Protection Demo is a web-based application that you can use to test built-in infoType detectors.
Language support
Country-specific infoTypes support the English language and the respective country's languages. Most global infoTypes work with multiple languages. Test Sensitive Data Protection with your data to verify that it meets your requirements.
General and specific infoType detectors
Preview: This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
A general infoType detector is a detector that is defined more broadly than typical infoType detectors and can match a wide range of potentially sensitive information types. General infoType detectors are supersets of specific infoType detectors that share a common attribute or purpose. For example, the DRIVERS_LICENSE_NUMBER infoType detector can detect content that matches the GERMANY_DRIVERS_LICENSE_NUMBER and CANADA_DRIVERS_LICENSE_NUMBER infoTypes.
In many cases, general infoType detectors can also find matches that specific infoType detectors can't. For example, the PASSPORT detector is better at finding passport numbers than country-specific passport detectors, which sometimes require the presence of contextual clues or specifically formatted content.
In your inspection configuration, you can use a general infoType detector in place of a specific infoType detector. Sensitive Data Protection presents the results based on which detector you specified in your request. For example, if a string that you inspect matches the GERMANY_DRIVERS_LICENSE_NUMBER infoType and you scanned for both DRIVERS_LICENSE_NUMBER and GERMANY_DRIVERS_LICENSE_NUMBER in your request, then you get two findings for the same string: one for DRIVERS_LICENSE_NUMBER and one for GERMANY_DRIVERS_LICENSE_NUMBER. However, if you scanned for only DRIVERS_LICENSE_NUMBER in your request, then the inspection result shows only the finding for DRIVERS_LICENSE_NUMBER.
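The reporting behavior described above can be modeled as a small lookup. The following Python sketch is a client-side model only, not how the service is implemented. It assumes the inspected string already matched a specific detector, and its membership table is deliberately partial: it contains just the two specific detectors that this page names under DRIVERS_LICENSE_NUMBER.

```python
# Illustrative, partial membership: only the specific detectors named on
# this page. See General infoTypes for the real lists.
GENERAL_MEMBERS = {
    "DRIVERS_LICENSE_NUMBER": {
        "GERMANY_DRIVERS_LICENSE_NUMBER",
        "CANADA_DRIVERS_LICENSE_NUMBER",
    },
}


def reported_info_types(matched_specific: str, requested: set[str]) -> set[str]:
    """Return the requested detectors that report a finding for a string
    that matches the specific infoType `matched_specific`."""
    reported = set()
    if matched_specific in requested:
        reported.add(matched_specific)
    for general, members in GENERAL_MEMBERS.items():
        if general in requested and matched_specific in members:
            reported.add(general)
    return reported
```

Requesting both detectors yields two findings for the same string; requesting only DRIVERS_LICENSE_NUMBER yields one.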
Using a general infoType detector has the following advantages:
- In many cases, general infoType detectors have higher recall than specific infoType detectors. Recall is the fraction of relevant instances that a detector finds: the number of true positives divided by the total number of relevant instances (true positives plus false negatives).
- You can simplify your Sensitive Data Protection requests because you don't have to specify each specific infoType that you need to scan for. For example, the GOVERNMENT_ID infoType detector alone includes more than 100 different infoType detectors.
- You are less likely to reach the limit of 150 infoType detectors per request.
- If Sensitive Data Protection releases a new infoType and adds it to a general infoType that is already specified in your existing configurations, then Sensitive Data Protection automatically includes the new infoType in its scans. You don't have to manually add newly released infoTypes to your existing configurations.
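To make the recall comparison concrete: if a labeled test set contains 50 passport numbers and a detector finds 45 of them, its recall is 45 / 50 = 0.9, regardless of how many false positives it also returns. The counts are hypothetical; you would get real ones by labeling a sample of your own data and comparing it with scan results. A minimal Python version of the formula:

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of relevant instances that a detector found: TP / (TP + FN)."""
    relevant = true_positives + false_negatives
    if relevant == 0:
        raise ValueError("Recall is undefined when there are no relevant instances.")
    return true_positives / relevant
```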
General and specific infoTypes have a many-to-many relationship. That is, a general infoType can include many specific infoTypes, and a specific infoType can belong to many general infoTypes. For a complete list of general infoTypes and the specific infoTypes that they include, see General infoTypes.
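Because the relationship is many-to-many, it can help to invert a general-to-specific table when you want to know every general infoType that would report a given specific match. In the following Python sketch, only the DRIVERS_LICENSE_NUMBER entry comes from this page; the GOVERNMENT_ID entry is a hypothetical membership included only to show one specific infoType belonging to two general infoTypes.

```python
from collections import defaultdict

# Partial, illustrative memberships. The GOVERNMENT_ID entry is hypothetical;
# see General infoTypes for the real lists.
GENERAL_TO_SPECIFIC = {
    "DRIVERS_LICENSE_NUMBER": ["GERMANY_DRIVERS_LICENSE_NUMBER", "CANADA_DRIVERS_LICENSE_NUMBER"],
    "GOVERNMENT_ID": ["GERMANY_DRIVERS_LICENSE_NUMBER", "UK_PASSPORT"],
}


def specific_to_general(mapping: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert a general-to-specific mapping into specific-to-general."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for general, specifics in mapping.items():
        for specific in specifics:
            inverted[specific].add(general)
    return dict(inverted)
```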
Custom infoType detectors
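There are three kinds of custom infoType detectors:
- Small custom dictionary detectors
- Large custom dictionary detectors
- Regular expression (regex) detectors
In addition, Sensitive Data Protection includes inspection rules, which enable you to fine-tune scan results by adding the following to existing detectors:
- Exclusion rules
- Hotword rules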
Small custom dictionary detectors
Use small custom dictionary detectors (also referred to as "regular custom dictionary detectors") to match a list of up to several tens of thousands of words or phrases. A small custom dictionary can act as its own unique detector.
Custom dictionary detectors are useful when you want to scan for a list of words or phrases that are not easily matched by a regular expression or a built-in detector. For example, suppose you want to scan for conference rooms that are commonly referred to by their assigned room names rather than their room numbers, such as state or region names, landmarks, fictional characters, and so on. You can make a small custom dictionary detector that contains a list of these room names. Sensitive Data Protection can scan your content for each of the room names and return a match when it encounters one of them in context. Learn more about how Sensitive Data Protection matches dictionary words and phrases in the "Dictionary matching specifics" section of Creating a regular custom dictionary detector.
For more details about how small dictionary custom infoType detectors work, as well as examples in action, see Creating a regular custom dictionary detector.
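A minimal Python sketch of the conference-room example, written as the dictionary form of an InspectConfig for the Python client. The CONFERENCE_ROOM name and the room names are hypothetical. Because the word list travels inside the inspection configuration of every request, this approach suits lists that stay small and change rarely.

```python
# Hypothetical room names; replace them with your own list.
ROOM_NAMES = ["Everest", "Kilimanjaro", "Sherlock Holmes", "Tasmania"]

inspect_config = {
    "custom_info_types": [
        {
            "info_type": {"name": "CONFERENCE_ROOM"},
            # A small (regular) custom dictionary: the words are sent inline
            # as part of the request.
            "dictionary": {"word_list": {"words": ROOM_NAMES}},
        }
    ],
    "include_quote": True,
}
```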
Large custom dictionary detectors
Use large custom dictionary detectors (also referred to as "stored custom dictionary detectors") when you have more than several tens of thousands of words or phrases to scan for, or if your list of words or phrases changes frequently. Large custom dictionary detectors can match on up to tens of millions of words or phrases.
Large custom dictionary detectors are created differently from both regular expression custom detectors and small custom dictionary detectors. Each large custom dictionary has two components:
- A list of phrases that you create and define. The list is stored as either a text file within Cloud Storage or a column in a BigQuery table.
- The generated dictionary files, which are built by Sensitive Data Protection based on your phrase list. The dictionary files are stored in Cloud Storage and include a copy of the source phrase data plus bloom filters, which aid searching and matching. You can't edit these files directly.
Once you've created a word list and then used Sensitive Data Protection to generate a custom dictionary, you initiate or schedule a scan using a large custom dictionary detector in a similar way as other infoType detectors.
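The two steps can be sketched in Python as plain request dictionaries. The sketch assumes the StoredInfoTypeConfig and CustomInfoType field names from the DLP API (large_custom_dictionary, cloud_storage_file_set, output_path, stored_type) and a phrase file with one phrase per line; the project, bucket paths, and IDs are hypothetical. With the Python client, you would pass create_request as request= to DlpServiceClient.create_stored_info_type, and building the dictionary files can take some time, so wait until the stored infoType is ready before scanning.

```python
PROJECT_ID = "my-project"  # Hypothetical project ID.
STORED_INFO_TYPE_ID = "conference-rooms"  # Hypothetical stored infoType ID.
PARENT = f"projects/{PROJECT_ID}/locations/global"

# Step 1: request to create a stored infoType whose large custom dictionary
# is built from a phrase list in Cloud Storage (assumed: one phrase per line).
create_request = {
    "parent": PARENT,
    "stored_info_type_id": STORED_INFO_TYPE_ID,
    "config": {
        "display_name": "Conference room names",
        "large_custom_dictionary": {
            "cloud_storage_file_set": {"url": "gs://my-bucket/rooms/room_names.txt"},
            # Where Sensitive Data Protection writes the generated dictionary files.
            "output_path": {"path": "gs://my-bucket/rooms/dictionary/"},
        },
    },
}

# Step 2: once the dictionary is built, refer to the stored infoType by its
# resource name in the inspection configuration, like any other custom infoType.
inspect_config = {
    "custom_info_types": [
        {
            "info_type": {"name": "CONFERENCE_ROOM"},
            "stored_type": {"name": f"{PARENT}/storedInfoTypes/{STORED_INFO_TYPE_ID}"},
        }
    ]
}
```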
For more details about how large custom dictionary detectors work, as well as examples in action, see Creating a stored custom dictionary detector.
Regular expressions
A regular expression (regex) custom infoType detector lets you create your own infoType detectors that enable Sensitive Data Protection to detect matches based on a regex pattern. For example, suppose that you have medical record numbers in the form ###-#-#####. You could define a regex pattern such as the following:
[1-9]{3}-[1-9]{1}-[1-9]{5}
Sensitive Data Protection then matches items like this:
123-4-56789
You can also specify a likelihood to assign to each custom infoType match. That is, when Sensitive Data Protection matches the sequence you specify, it assigns the likelihood that you have indicated. This is useful because if your regex is general enough to match unrelated sequences, you would not want Sensitive Data Protection to label every match as VERY_LIKELY. Doing so would erode confidence in scan results and potentially cause the wrong information to be matched or de-identified.
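The following Python sketch pairs the pattern above with an explicit likelihood in a config fragment modeled on the DLP v2 REST fields `regex` and `likelihood`, then checks the pattern locally. The infoType name `MEDICAL_RECORD_NUMBER` and the choice of POSSIBLE are assumptions for illustration. Sensitive Data Protection uses RE2 syntax; this simple pattern behaves the same in Python's `re` module.

```python
import re

# Pattern from the text above. [1-9] accepts only the digits 1 through 9,
# so a record number that contains a 0 does not match.
MRN_PATTERN = r"[1-9]{3}-[1-9]{1}-[1-9]{5}"

# Request body fragment modeled on the DLP v2 REST API. "MEDICAL_RECORD_NUMBER"
# is a hypothetical infoType name, and POSSIBLE is an arbitrary choice.
inspect_config = {
    "customInfoTypes": [
        {
            "infoType": {"name": "MEDICAL_RECORD_NUMBER"},
            "regex": {"pattern": MRN_PATTERN},
            "likelihood": "POSSIBLE",
        }
    ]
}

# Quick local check of which strings the pattern matches in full.
for candidate in ["123-4-56789", "103-4-56789", "123-45-6789"]:
    print(candidate, bool(re.fullmatch(MRN_PATTERN, candidate)))
# 123-4-56789 True
# 103-4-56789 False
# 123-45-6789 False
```

If your real record numbers can contain zeros, `[0-9]` (or `\d`) would be the appropriate character class.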
For more information about regular expression custom infoType detectors, and to see them in action, see Creating a custom regex detector.
Inspection rules
You use inspection rules to refine the results returned by existing infoType detectors, either built-in or custom. Inspection rules are useful when the results that Sensitive Data Protection returns need to be augmented in some way, either by adding to or excluding from the findings of the existing infoType detector.
The two types of inspection rules are:
- Exclusion rules
- Hotword rules
For more information about inspection rules, see Modifying infoType detectors to refine scan results.
Exclusion rules
Exclusion rules let you decrease the number of findings returned, and so increase their precision, by adding rules to a built-in or custom infoType detector. Exclusion rules can help keep noise and other unwanted findings from being returned by an infoType detector.
For example, if you scan a database for email addresses, you can add an exclusion rule in the form of a custom regex that instructs Sensitive Data Protection to exclude any findings ending in "@example.com".
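A minimal Python sketch of that email example follows. The config fragment is modeled on the DLP v2 REST fields `ruleSet`, `exclusionRule`, and `matchingType`; the `apply_exclusion` helper and the sample findings are hypothetical and only approximate a full-match exclusion locally.

```python
import re

# Excludes any finding whose entire text ends in "@example.com".
EXCLUDE_PATTERN = r".+@example\.com"

inspect_config = {
    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
    "ruleSet": [
        {
            "infoTypes": [{"name": "EMAIL_ADDRESS"}],
            "rules": [
                {
                    "exclusionRule": {
                        "regex": {"pattern": EXCLUDE_PATTERN},
                        "matchingType": "MATCHING_TYPE_FULL_MATCH",
                    }
                }
            ],
        }
    ],
}


def apply_exclusion(findings: list[str], pattern: str) -> list[str]:
    """Local approximation of a full-match exclusion: drop findings whose
    entire text matches the pattern."""
    return [f for f in findings if not re.fullmatch(pattern, f)]


findings = ["alice@example.com", "bob@corp.example.org", "carol@gmail.com"]
print(apply_exclusion(findings, EXCLUDE_PATTERN))
# ['bob@corp.example.org', 'carol@gmail.com']
```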
Exclusion rules can't be applied to object infoTypes.
For more information about exclusion rules, see Modifying infoType detectors to refine scan results.
Hotword rules
Hotword rules enable you to increase the quantity or accuracy of findings returned by adding rules to a built-in or custom infoType detector. Hotword rules can effectively help you loosen an existing infoType detector's rules.
For example, suppose you want to scan a medical database for patient names. You can use Sensitive Data Protection's built-in PERSON_NAME infoType detector, but that causes Sensitive Data Protection to match on all names of people, not just names of patients. To fix this, you can add a hotword rule with a regex that looks for the word "patient" within a certain number of characters of the first character of each potential match. You can then assign findings that match this pattern a likelihood of VERY_LIKELY, since they correspond to your special criteria.
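The patient-name example can be sketched as follows. The config fragment is modeled on the DLP v2 REST fields `hotwordRule`, `hotwordRegex`, `proximity.windowBefore`, and `likelihoodAdjustment.fixedLikelihood`. The window of 20 characters, the sample text, and the `adjust_likelihood` helper are assumptions; the helper is a local approximation, not the service's implementation.

```python
import re

WINDOW_BEFORE = 20  # arbitrary: characters to look back from a finding's start
HOTWORD = r"(?i)patient"

inspect_config = {
    "infoTypes": [{"name": "PERSON_NAME"}],
    "ruleSet": [
        {
            "infoTypes": [{"name": "PERSON_NAME"}],
            "rules": [
                {
                    "hotwordRule": {
                        "hotwordRegex": {"pattern": HOTWORD},
                        "proximity": {"windowBefore": WINDOW_BEFORE},
                        "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
                    }
                }
            ],
        }
    ],
}


def adjust_likelihood(text: str, start: int, base: str) -> str:
    """Local approximation: if the hotword appears within WINDOW_BEFORE
    characters before the finding's first character, fix the likelihood."""
    window = text[max(0, start - WINDOW_BEFORE):start]
    return "VERY_LIKELY" if re.search(HOTWORD, window) else base


text = "Patient: Jane Doe. Attending physician: John Roe."
for name in ["Jane Doe", "John Roe"]:
    start = text.index(name)
    print(name, adjust_likelihood(text, start, "LIKELY"))
# Jane Doe VERY_LIKELY
# John Roe LIKELY
```

The window size is a trade-off: too small and real patient names are missed, too large and unrelated names near the word "patient" are boosted as well.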
For more information about hotword rules, see Modifying infoType detectors to refine scan results.
Examples
To get a better idea of how infoTypes match on findings, look at the following examples of matching on a series of digits to determine whether they constitute a US Social Security number or a US Individual Taxpayer Identification Number. Keep in mind that these examples are for built-in infoType detectors. When you create a custom infoType detector, you specify the criteria that determine the likelihood of a scan match.
Example 1
"SSN 222-22-2222"
Reports a high likelihood score of VERY_LIKELY for US_SOCIAL_SECURITY_NUMBER because:
- It is in the standard Social Security number format, which raises the certainty.
- It has context nearby ("SSN") that boosts the certainty for US_SOCIAL_SECURITY_NUMBER.
Example 2
"999-99-9999"
Reports a low likelihood score of VERY_UNLIKELY for US_SOCIAL_SECURITY_NUMBER because:
- It is in the standard format, which raises the certainty.
- It starts with a 9, which is not allowed in Social Security numbers and so lowers the certainty.
- It lacks context, which lowers the certainty.
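To make the "starts with a 9" point concrete, the following Python sketch checks the publicly documented structural rules for US Social Security numbers (the first three digits are never 000, 666, or 900 through 999; the middle group is never 00; the last four digits are never 0000). The `ssn_structure_ok` helper is hypothetical and is not the built-in detector's logic, which also weighs context such as nearby hotwords.

```python
import re

SSN_SHAPE = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def ssn_structure_ok(value: str) -> bool:
    """Simplified check of publicly documented SSN structure rules.

    Illustration only; not the built-in detector's logic, which also
    weighs surrounding context.
    """
    m = SSN_SHAPE.fullmatch(value)
    if m is None:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


for value in ["222-22-2222", "999-99-9999", "999-98-9999"]:
    print(value, ssn_structure_ok(value))
# 222-22-2222 True
# 999-99-9999 False
# 999-98-9999 False
```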
Example 3
"999-98-9999"
Reports a likelihood score of POSSIBLE for US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER and VERY_UNLIKELY for US_SOCIAL_SECURITY_NUMBER because:
- It has the standard format for both US_SOCIAL_SECURITY_NUMBER and US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER.
- It starts with a 9 and passes an additional digit check, which boosts certainty for US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER.
- It lacks any context, which lowers the certainty for both.
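The likelihood values in these examples are ordered, and the inspection setting `minLikelihood` returns only findings at or above a threshold (POSSIBLE by default). The following Python sketch mimics that filtering locally on the findings from Examples 1 to 3. The `Likelihood` class and `filter_findings` helper are hypothetical stand-ins, not client-library types; only the ordering of the values follows the DLP v2 `Likelihood` enum.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    # Ordering mirrors the Likelihood enum in the DLP v2 API.
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


# Findings reported in Examples 1-3 (value, infoType, likelihood).
findings = [
    ("222-22-2222", "US_SOCIAL_SECURITY_NUMBER", Likelihood.VERY_LIKELY),
    ("999-99-9999", "US_SOCIAL_SECURITY_NUMBER", Likelihood.VERY_UNLIKELY),
    ("999-98-9999", "US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER", Likelihood.POSSIBLE),
    ("999-98-9999", "US_SOCIAL_SECURITY_NUMBER", Likelihood.VERY_UNLIKELY),
]


def filter_findings(items, min_likelihood: Likelihood):
    """Keep findings at or above the threshold, as minLikelihood does."""
    return [f for f in items if f[2] >= min_likelihood]


for value, info_type, likelihood in filter_findings(findings, Likelihood.POSSIBLE):
    print(value, info_type, likelihood.name)
# 222-22-2222 US_SOCIAL_SECURITY_NUMBER VERY_LIKELY
# 999-98-9999 US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER POSSIBLE
```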
What's next
The Sensitive Data Protection team releases new infoType detectors and groups periodically. To learn how to get the latest list of built-in infoTypes, see Listing built-in infoType detectors.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.