Text anonymization in many languages using Faker


hal9ai/anonymization

 
 


Text anonymization in many languages for Python 3.6+ using Faker.

Install

pip install anonymization

Example

Replace emails and named entities in English

This example uses NamedEntitiesAnonymizer, which requires spaCy and a spaCy model.

pip install spacy
python -m spacy download en_core_web_lg

>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com\n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> anon.anonymize(text)
'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at shanestevenson@gmail.com\n Ariel the best program!'

Or make it reversible with pseudonymize:

>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com\n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> clean_text, patch = anon.pseudonymize(text)
>>> clean_text
'Christopher,\nthanks for you for subscribing to Audrey, feel free to ask me any question at colemanwesley@hotmail.com\n Audrey the best program!'
>>> revert_text = anon.revert(clean_text, patch)
>>> print(text == revert_text)
True
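The idea behind a reversible replacement can be sketched without the library. The block below is a stdlib-only illustration, not the package's implementation: the function names `pseudonymize` and `revert` are borrowed from the example above, but their signatures, the `make_fake` callback, the email regex, and the shape of `patch` (a dict from fake value to original) are all assumptions made for the sketch.

```python
import re
from typing import Callable, Dict, Tuple

# Assumption: a deliberately simple email pattern, good enough for the sketch.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text: str, make_fake: Callable[[int], str]) -> Tuple[str, Dict[str, str]]:
    """Replace every email with a fake one and return (clean_text, patch).

    patch maps each fake value back to the original it replaced.
    """
    forward: Dict[str, str] = {}  # original -> fake, so repeats get the same fake

    def substitute(match: "re.Match[str]") -> str:
        original = match.group(0)
        if original not in forward:
            forward[original] = make_fake(len(forward))
        return forward[original]

    clean_text = EMAIL_PATTERN.sub(substitute, text)
    patch = {fake: original for original, fake in forward.items()}
    return clean_text, patch


def revert(clean_text: str, patch: Dict[str, str]) -> str:
    """Undo pseudonymize by substituting each fake value with its original."""
    for fake, original in patch.items():
        clean_text = clean_text.replace(fake, original)
    return clean_text


text = "Write to secret.mail@Superprogram.com or secret.mail@Superprogram.com"
# Hypothetical fake generator; the real package draws fake values from Faker.
clean_text, patch = pseudonymize(text, lambda i: f"user{i}@example.org")
restored = revert(clean_text, patch)
print(clean_text)
print(restored == text)
```

Reverting by plain string replacement only works if no fake value already occurs elsewhere in the text, so a sketch like this needs fake values that cannot collide with real content.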

Replace a French phone number with a fake one

Our solution supports many languages along with their specific information formats.

For example, we can generate a French phone number:

>>> from anonymization import Anonymization, PhoneNumberAnonymizer
>>>
>>> text = "C'est bien le 0611223344 ton numéro ?"
>>> anon = Anonymization('fr_FR')
>>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
>>> phoneAnonymizer.anonymize(text)
"C'est bien le 0144939332 ton numéro ?"
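For readers who want to see what locale-specific replacement involves, here is a rough stdlib-only sketch. It does not use the package or Faker: it assumes a simplified French format (a 0, a digit from 1 to 9, then eight digits, with no separators), and the names `FR_PHONE_PATTERN`, `fake_fr_phone` and `anonymize_fr_phones` are hypothetical.

```python
import random
import re

# Simplified assumption: a French national number is 0, a digit 1-9, then 8 digits,
# written without separators. Formats with spaces, dots or +33 are not handled.
FR_PHONE_PATTERN = re.compile(r"(?<!\d)0[1-9]\d{8}(?!\d)")


def fake_fr_phone(rng: random.Random) -> str:
    """Return a random 10-digit string with the same shape as a French number."""
    prefix = "0" + str(rng.randint(1, 9))
    return prefix + "".join(str(rng.randint(0, 9)) for _ in range(8))


def anonymize_fr_phones(text: str, rng: random.Random) -> str:
    """Replace each French-looking phone number in text with a random one."""
    return FR_PHONE_PATTERN.sub(lambda match: fake_fr_phone(rng), text)


text = "C'est bien le 0611223344 ton numéro ?"
result = anonymize_fr_phones(text, random.Random(0))
print(result)
```

Passing a seeded `random.Random` keeps runs repeatable; the package itself presumably gets realistic numbers for each locale from Faker rather than from uniform random digits.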

More examples in /examples

Included anonymizers

Files

| name | lang |
| --- | --- |
| FilePathAnonymizer | - |

Internet

| name | lang |
| --- | --- |
| EmailAnonymizer | - |
| UriAnonymizer | - |
| MacAddressAnonymizer | - |
| Ipv4Anonymizer | - |
| Ipv6Anonymizer | - |

Phone numbers

| name | lang |
| --- | --- |
| PhoneNumberAnonymizer | 47+ |
| msisdnAnonymizer | 47+ |

Date

| name | lang |
| --- | --- |
| DateAnonymizer | - |

Other

| name | lang |
| --- | --- |
| NamedEntitiesAnonymizer | 7+ |
| DictionaryAnonymizer | - |
| SignatureAnonymizer | 7+ |

Custom anonymizers

Custom anonymizers can be easily created to fit your needs:

from anonymization import Anonymization

class CustomAnonymizer():
    def __init__(self, anonymization: Anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        # return the text after applying your own modifications
        return modified_text
        # or replace by regex patterns in text using a faker provider
        return self.anonymization.regex_anonymizer(text, pattern, provider)
        # or replace all occurrences using a faker provider
        return self.anonymization.replace_all(text, matchs, provider)

You may also add a new Faker provider with the helper Anonymization.add_provider(FakerProvider), or access the Faker instance directly through Anonymization.faker.
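As a concrete illustration of the pattern, the sketch below builds a custom anonymizer for made-up ticket ids. To stay runnable without the package, it uses `StubAnonymization`, a hypothetical stand-in whose `replace_all` only mimics the call shape shown above; here the provider is a plain callable, which is an assumption and may differ from what the real `Anonymization.replace_all` expects.

```python
import re
from typing import Callable, Dict, Iterable


class StubAnonymization:
    """Hypothetical stand-in for anonymization.Anonymization (not the real class)."""

    def __init__(self) -> None:
        self.seen: Dict[str, str] = {}  # original -> replacement, reused on repeats

    def replace_all(self, text: str, matches: Iterable[str], provider: Callable[[], str]) -> str:
        for match in matches:
            if match not in self.seen:
                self.seen[match] = provider()
            text = text.replace(match, self.seen[match])
        return text


class TicketIdAnonymizer:
    """Custom anonymizer for made-up ticket ids such as TCK-1234."""

    PATTERN = re.compile(r"\bTCK-\d{4}\b")

    def __init__(self, anonymization: StubAnonymization) -> None:
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        matches = self.PATTERN.findall(text)
        return self.anonymization.replace_all(text, matches, self._next_id)

    def _next_id(self) -> str:
        # Deterministic placeholder instead of a Faker provider (stdlib-only sketch)
        return "TCK-X{:03d}".format(len(self.anonymization.seen))


anonymizer = TicketIdAnonymizer(StubAnonymization())
result = anonymizer.anonymize("TCK-1234 blocks TCK-5678; see TCK-1234.")
print(result)
```

Keeping a mapping of already replaced values means a repeated id gets the same replacement each time, in the same way "Superprogram" becomes "Ariel" at every occurrence in the first example.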

Benchmark

This module is benchmarked on synth_dataset from presidio-research and reaches a higher accuracy (0.79) than Microsoft's solution (0.75).

You can run the benchmark using docker:

docker build . -f ./benchmark/dockerfile -t anonbench
docker run -it --rm --name anonbench anonbench

License

MIT
