- Notifications
You must be signed in to change notification settings - Fork17
Dev/steven/remove anonymizer#47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Pull Request Overview
This PR removes the dependency onpresidio-anonymizer and implements custom text masking functionality to prevent package conflicts. The changes include a new lightweight anonymizer module, improved BIC/SWIFT detection patterns to reduce false positives, and comprehensive test coverage for both the anonymizer and BIC detection.
Key changes:
- Implemented custom anonymizer in
src/guardrails/utils/anonymizer.pyto replacepresidio-anonymizer - Enhanced BIC/SWIFT detection with context-aware patterns and bank code whitelisting to reduce false positives
- Added baseline tests to ensure the custom anonymizer matches expected behavior
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/guardrails/utils/anonymizer.py | New module implementing custom PII masking functionality with overlap resolution |
| src/guardrails/checks/text/pii.py | Updated to use custom anonymizer and improved BIC/SWIFT detection patterns |
| src/guardrails/_base_client.py | Migrated frompresidio-anonymizer to custom anonymizer for structured content masking |
| tests/unit/checks/test_pii.py | Added tests for BIC/SWIFT detection including false positive prevention |
| tests/unit/checks/test_anonymizer_baseline.py | New baseline tests ensuring custom anonymizer produces expected results |
| pyproject.toml | Removedpresidio-anonymizer dependency |
💡Add Copilot custom instructions for smarter, more guided reviews.Learn how to get started.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
💡Add Copilot custom instructions for smarter, more guided reviews.Learn how to get started.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡Add Copilot custom instructions for smarter, more guided reviews.Learn how to get started.
| "ANZB|NATA|WPAC|CTBA|"# Australia | ||
| "BKCH|MHCB|BOTK|"# Japan | ||
| "ICBK|ABOC|PCBC|"# China | ||
| "HSBC|SCBL|"# Hong Kong |
CopilotAINov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
"HSBC" is duplicated in the known_bank_codes string. It already appears in line 176 as part of the "Major international" banks. Consider removing the duplicate "HSBC" from the Hong Kong section to avoid redundancy.
| "HSBC|SCBL|"# Hong Kong | |
| "SCBL|"# Hong Kong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
[nit]
| # Check that at least the first format is detected | ||
| assert"<PHONE_NUMBER>"inresult1.info["checked_text"]# noqa: S101 | ||
CopilotAINov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thetexts_and_results list is created and appended to but never used. Consider removing this unused variable or adding assertions that use it.
| # Check that at least the first format is detected | |
| assert"<PHONE_NUMBER>"inresult1.info["checked_text"]# noqa: S101 | |
| # Check that all formats are detected | |
| fororiginal,maskedintexts_and_results: | |
| assert"<PHONE_NUMBER>"inmasked# noqa: S101 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is a fine unit test for now
| "ICBK|ABOC|PCBC|"# China | ||
| "HSBC|SCBL|"# Hong Kong | ||
| "DBSS|OCBC|UOVB|"# Singapore | ||
| "CZNB|SHBK|KOEX|HVBK|NACF|IBKO|KODB|HNBN|CITI"# South Korea |
CopilotAINov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
"CITI" is duplicated in the known_bank_codes string. It already appears in line 176 as part of the "Major international" banks. Consider removing the duplicate "CITI" from the South Korea section to avoid redundancy.
| "CZNB|SHBK|KOEX|HVBK|NACF|IBKO|KODB|HNBN|CITI"# South Korea | |
| "CZNB|SHBK|KOEX|HVBK|NACF|IBKO|KODB|HNBN"# South Korea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
[nit]
bf65130 intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Remove dependency on
presidio-anonymizerto prevent package clashpresidio-anonymizeris being used to mask detected entities. This PR implements that functionality ourselves without depending on an external libraryBIC_SWIFTdetection which was resulting in too many false positives