microsoft/presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images

License

MIT license


Presidio - Data Protection and De-identification SDK

Context aware, pluggable and customizable PII de-identification service for text and images.

Presidio Analyzer
Presidio Anonymizer
Presidio Image-Redactor
Presidio Structured

What is Presidio

Presidio (from the Latin praesidium, ‘protection, garrison’) helps ensure that sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text, such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.
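
The split between identification and anonymization described above can be sketched in plain Python. This is an illustration of the idea only, not Presidio's API or implementation: `Finding`, `find_entities`, `anonymize`, the entity labels and the two regular expressions are all hypothetical names chosen for this sketch, and the patterns are far simpler than a real detector would need.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A detected span: what kind of entity it is and where it sits in the text."""
    entity_type: str
    start: int
    end: int


# Hypothetical, deliberately simple patterns (assumptions for this sketch only).
PATTERNS = {
    "US_PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_entities(text: str) -> list[Finding]:
    """Identification step: return every pattern match, ordered by position."""
    findings = [
        Finding(entity_type, match.start(), match.end())
        for entity_type, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def anonymize(text: str, findings: list[Finding]) -> str:
    """Anonymization step: replace each span with a placeholder.

    Spans are processed from the end of the text backwards so that the
    offsets of earlier findings stay valid after each replacement.
    """
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


sample = "Call 212-555-0123 about SSN 078-05-1120."
findings = find_entities(sample)
redacted = anonymize(sample, findings)
print(redacted)
```

Keeping the two steps separate means the findings can be reviewed, filtered or logged before any text is changed, and the replacement strategy can be swapped without touching detection.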

📘 Full documentation

❓ Frequently Asked Questions

💭 Demo

🛫 Examples

Are you using Presidio? We'd love to know how

Please help us improve by taking this short anonymous survey.

Goals

Allow organizations to preserve privacy in a simpler way by democratizing de-identification technologies and introducing transparency in decisions.
Embrace extensibility and customizability to a specific business need.
Facilitate both fully automated and semi-automated PII de-identification flows on multiple platforms.

Main features

Predefined or custom PII recognizers leveraging Named Entity Recognition, regular expressions, rule-based logic and checksum with relevant context in multiple languages.
Options for connecting to external PII detection models.
Multiple usage options, from Python or PySpark workloads through Docker to Kubernetes.
Customizability in PII identification and de-identification.
Module for redacting PII text in images (standard image types and DICOM medical images).
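
To make the first feature more concrete, here is a minimal sketch of how a regular expression, a checksum and surrounding context words might be combined into a single recognizer. It is not Presidio's code: `CARD_PATTERN`, `CONTEXT_WORDS`, `luhn_valid`, `recognize_cards`, the 30-character window and the scores 0.5 and 0.9 are assumptions made up for this example. The Luhn checksum itself is the standard check used for payment card numbers.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

# Hypothetical context words that raise confidence when they precede a match.
CONTEXT_WORDS = ("credit", "card", "visa", "mastercard", "amex")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def recognize_cards(text: str, window: int = 30) -> list[tuple[int, int, float]]:
    """Return (start, end, score) for each candidate that passes the checksum.

    The scores are arbitrary illustration values: 0.5 for a checksum-valid
    match, raised to 0.9 when a context word appears shortly before it.
    """
    results = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            continue  # the pattern matched, but the checksum rules it out
        score = 0.5
        before = text[max(0, match.start() - window):match.start()].lower()
        if any(word in before for word in CONTEXT_WORDS):
            score = 0.9
        results.append((match.start(), match.end(), score))
    return results


print(recognize_cards("My credit card is 4111111111111111."))
print(recognize_cards("Order ref 4111111111111111"))
print(recognize_cards("card 4111111111111112"))
```

The pattern alone would flag any long run of digits; the checksum removes most accidental matches, and the context words separate a likely card number from an order reference that happens to pass the checksum.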

⚠️ Presidio can help identify sensitive/PII data in structured and unstructured text. However, because it uses automated detection mechanisms, there is no guarantee that Presidio will find all sensitive information. Consequently, additional systems and protections should be employed.

Installing Presidio

Running Presidio

Support

Before you submit an issue, please go over the documentation.
For general discussions, please use the GitHub repo's discussion board.
If you have a usage question, found a bug or have a suggestion for improvement, please file a GitHub issue.
For other matters, please email presidio@microsoft.com.

Contributing

For details on contributing to this repository, see the contributing guide.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.