Text anonymization in many languages using Faker
hal9ai/anonymization
Text anonymization in many languages for Python 3.6+ using Faker.

    pip install anonymization
This example uses NamedEntitiesAnonymizer, which requires spaCy and a spaCy model.

    pip install spacy
    python -m spacy download en_core_web_lg
    >>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
    >>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com\n Superprogram the best program!"
    >>> anon = AnonymizerChain(Anonymization('en_US'))
    >>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
    >>> anon.anonymize(text)
    'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at shanestevenson@gmail.com\n Ariel the best program!'
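In the output above, every occurrence of Superprogram maps to the same fake value (Ariel). The following is a minimal standard-library sketch of that idea, not the library's implementation: `replace_consistently` and `fake_email` are hypothetical names, and the fake values come from a simple counter rather than from Faker.

```python
import re
from itertools import count


def replace_consistently(text, pattern, make_fake):
    """Replace every regex match, reusing one fake value per distinct original."""
    mapping = {}

    def substitute(match):
        original = match.group(0)
        # Generate a fake value only the first time an original is seen.
        if original not in mapping:
            mapping[original] = make_fake()
        return mapping[original]

    return re.sub(pattern, substitute, text), mapping


# Hypothetical fake-value generator standing in for a Faker provider.
_counter = count(1)


def fake_email():
    return f"user{next(_counter)}@example.com"


text = "Write to a@corp.com or b@corp.com; a@corp.com answers faster."
clean, mapping = replace_consistently(text, r"[\w.+-]+@[\w-]+\.[\w.-]+", fake_email)
print(clean)
```

Keeping one mapping per text is what lets repeated mentions of the same entity stay recognisable as the same entity after anonymization.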
Or make it reversible with pseudonymize:
    >>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
    >>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com\n Superprogram the best program!"
    >>> anon = AnonymizerChain(Anonymization('en_US'))
    >>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
    >>> clean_text, patch = anon.pseudonymize(text)
    >>> print(clean_text)
    'Christopher,\nthanks for you for subscribing to Audrey, feel free to ask me any question at colemanwesley@hotmail.com\n Audrey the best program!'
    >>> revert_text = anon.revert(clean_text, patch)
    >>> print(text == revert_text)
    True
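To illustrate what "reversible" means here, the sketch below records where each replacement landed and uses that record to restore the original. It is an assumption-laden illustration using only the standard library: the function names mirror `pseudonymize` and `revert`, but the patch format (a list of position, fake, original tuples) is hypothetical and is not claimed to match what the library returns.

```python
import re


def pseudonymize(text, pattern, make_fake):
    """Replace matches and return (clean_text, patch) so the change can be undone."""
    pieces, patch = [], []
    cursor = 0      # position in the original text
    clean_len = 0   # length of the clean text built so far
    for match in re.finditer(pattern, text):
        unchanged = text[cursor:match.start()]
        fake = make_fake(match.group(0))
        pieces += [unchanged, fake]
        # Hypothetical patch entry: (start in clean text, fake value, original value).
        patch.append((clean_len + len(unchanged), fake, match.group(0)))
        clean_len += len(unchanged) + len(fake)
        cursor = match.end()
    pieces.append(text[cursor:])
    return "".join(pieces), patch


def revert(clean_text, patch):
    """Restore the original text; undo right to left so earlier offsets stay valid."""
    for start, fake, original in reversed(patch):
        end = start + len(fake)
        if clean_text[start:end] != fake:
            raise ValueError("patch does not match the given text")
        clean_text = clean_text[:start] + original + clean_text[end:]
    return clean_text


text = "Call 0611223344 or 0699887766 today."
clean_text, patch = pseudonymize(text, r"\d{10}", lambda original: "<PHONE>")
print(clean_text)
print(revert(clean_text, patch) == text)
```

Because the patch stores positions rather than relying on the fake values being unique, two different originals can share the same placeholder and still be restored correctly.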
Our solution supports many languages along with their specific information formats.
For example, we can generate a French phone number:
    >>> from anonymization import Anonymization, PhoneNumberAnonymizer
    >>>
    >>> text = "C'est bien le 0611223344 ton numéro ?"
    >>> anon = Anonymization('fr_FR')
    >>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
    >>> phoneAnonymizer.anonymize(text)
    "C'est bien le 0144939332 ton numéro ?"
More examples in /examples.
name | lang |
---|---|
FilePathAnonymizer | - |

name | lang |
---|---|
EmailAnonymizer | - |
UriAnonymizer | - |
MacAddressAnonymizer | - |
Ipv4Anonymizer | - |
Ipv6Anonymizer | - |

name | lang |
---|---|
PhoneNumberAnonymizer | 47+ |
msisdnAnonymizer | 47+ |

name | lang |
---|---|
DateAnonymizer | - |

name | lang |
---|---|
NamedEntitiesAnonymizer | 7+ |
DictionaryAnonymizer | - |
SignatureAnonymizer | 7+ |
Custom anonymizers can be easily created to fit your needs:
    class CustomAnonymizer():
        def __init__(self, anonymization: Anonymization):
            self.anonymization = anonymization

        def anonymize(self, text: str) -> str:
            return modified_text
            # or replace by regex patterns in text using a faker provider
            return self.anonymization.regex_anonymizer(text, pattern, provider)
            # or replace all occurrences using a faker provider
            return self.anonymization.replace_all(text, matchs, provider)
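The template above can be filled in as follows. This is a sketch under stated assumptions: `StubAnonymization` is a hypothetical stand-in so the example runs without Faker or the `anonymization` package, its `regex_anonymizer` is assumed to take a text, a regex and a provider name as the template suggests, and `SsnAnonymizer` is an illustrative name that is not part of the library.

```python
import re


class StubAnonymization:
    """Hypothetical stand-in for Anonymization; returns fixed values instead of calling Faker."""

    FAKES = {"ssn": "000-00-0000"}  # provider name -> canned fake value

    def regex_anonymizer(self, text, pattern, provider):
        # Assumed behaviour: replace every match with a value from the named provider.
        return re.sub(pattern, lambda match: self.FAKES[provider], text)


class SsnAnonymizer:
    """Custom anonymizer following the template: keep the anonymization object, expose anonymize()."""

    PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def __init__(self, anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        return self.anonymization.regex_anonymizer(text, self.PATTERN, "ssn")


result = SsnAnonymizer(StubAnonymization()).anonymize("SSN: 123-45-6789")
print(result)
```

With the real package, the stub would be replaced by an `Anonymization` instance for the desired locale; the custom class itself only depends on the `anonymize(text)` method shape shown in the template.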
You may also add a new Faker provider with the helper Anonymization.add_provider(FakerProvider), or access the Faker instance directly through Anonymization.faker.
This module is benchmarked on synth_dataset from presidio-research and returns a better accuracy (0.79) than Microsoft's solution (0.75).
You can run the benchmark using docker:
    docker build . -f ./benchmark/dockerfile -t anonbench
    docker run -it --rm --name anonbench anonbench
License: MIT