🐯 Profanity filter, based on "Shutterstock" dictionary
jojoee/leo-profanity
Profanity filter, based on "Shutterstock" dictionary. Demo page, API document page
    // npm
    npm install leo-profanity
    npm install leo-profanity --no-optional # install only English bad word dictionary

    // yarn
    yarn add leo-profanity
    yarn add leo-profanity --ignore-optional # install only English bad word dictionary

    // Bower
    bower install leo-profanity
    // dictionary/default.json

    // githack
    <script src="https://raw.githack.com/jojoee/bahttext/master/src/index.js"></script>
    const filter = LeoProfanity
    filter.clearList()
    filter.add(["boobs", "butt"])
    // supported languages
    // - en
    // - fr
    // - ru
    var filter = require('leo-profanity');

    // output: I have ****, etc.
    filter.clean('I have boob, etc.');

    // replace the current dictionary with the French one
    filter.loadDictionary('fr');

    // create a new dictionary
    filter.addDictionary('th', ['หนึ่ง', 'สอง', 'สาม', 'สี่', 'ห้า'])
See more here: LeoProfanity - Documentation
This project splits the problem into 2 parts, Sanitize and Filter. The approaches considered for each part are described below.
Sanitize

Attempt 1 (1.1): Convert everything to a lowercase string

Example:
- "SomeThing" to "something"

Advantage:
- Simple to understand
- Simple to implement

Disadvantage or caution:
- Ignores case-sensitive words

Attempt 2 (1.2): Turn look-alike symbols into letters

Example:
- "@" to "a"
- "5" or "$" to "s"
- "@ss" to "ass"
- "b00b" to "boob"
- "a$$a$$in" to "assassin"

Advantage:
- Detects some trick words

Disadvantage or caution:
- False positives
- Subjective: it depends on how each person reads the symbol
- Limits the user's imagination (the user cannot play with words)
  e.g. "joe@ssociallife.com"
  e.g. a user who wants to try something funny like "a$$a$$in"

Attempt 3 (1.3): Replace "." and "," with a space to separate words

In some sentences, people use "." and "," to connect words or to end the sentence.

Example:
- "I like a55,b00b.t1ts" to "I like a55 b00b t1ts"

Advantage:
- Increases the chance of finding a word, e.g. in "I like a55,b00b.t1ts"

Disadvantage or caution:
- Disconnects some words, e.g. "john.doe@gmail.com"
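The three sanitize attempts above can be sketched in a few lines of JavaScript. This is only an illustration: the function names and the look-alike table are hypothetical and are not taken from leo-profanity's source.

```javascript
// Hypothetical helper names; none of these come from leo-profanity itself.

// 1.1: lowercase everything.
function toLower(text) {
  return text.toLowerCase();
}

// 1.2: map look-alike symbols to letters (an illustrative, subjective table).
const LOOKALIKES = { '@': 'a', '$': 's', '5': 's', '0': 'o', '1': 'i' };
function replaceLookalikes(text) {
  return text.replace(/[@$501]/g, (ch) => LOOKALIKES[ch]);
}

// 1.3: turn "." and "," into spaces so joined words are separated.
function splitOnPunctuation(text) {
  return text.replace(/[.,]/g, ' ');
}

console.log(toLower('SomeThing'));                       // something
console.log(replaceLookalikes('b00b'));                  // boob
console.log(replaceLookalikes('a$$a$$in'));              // assassin
console.log(replaceLookalikes('joe@ssociallife.com'));   // joeassociallife.com
console.log(splitOnPunctuation('I like a55,b00b.t1ts')); // I like a55 b00b t1ts
console.log(splitOnPunctuation('john.doe@gmail.com'));   // john doe@gmail com
```

The last three calls show the cautions listed above: 1.2 rewrites an e-mail address into something containing "ass", and 1.3 breaks the address apart.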
Filter

Attempt 1 (2.1): Split into an array (or use a regex)

Use spaces to split the "word string" into a "word array", then check each word against the profanity word list.

Example:
- "I like ass boob" to ["I", "like", "ass", "boob"]

Advantage:
- Simple to implement

Disadvantage:
- Needs a proper list of profanity words
- Some false positives, e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)

Attempt 2 (2.2): Filter words inside other text (with or without spaces)

Detect any run of letters that contains a profanity word.

Example:
- "thistextisfunnyboobsanda55", which contains the suspicious words "boobs" and "a55"

Advantage:
- Can detect un-spaced profanity words

Disadvantage:
- Many false positives, e.g. http://www.morewords.com/contains/ass/, Clbuttic mistake (filter mistake)
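To make the trade-off between 2.1 and 2.2 concrete, here is a minimal sketch that runs both on the same input. The two-word list and the function names are assumptions made for the example, not the library's dictionary or API.

```javascript
// Illustrative word list only; the real dictionary is much larger.
const BAD_WORDS = ['ass', 'boob'];

// 2.1: split on spaces and compare whole words against the list.
function findWholeWords(text) {
  return text.split(' ').filter((word) => BAD_WORDS.includes(word));
}

// 2.2: report every listed word that occurs anywhere inside the text.
function findSubstrings(text) {
  return BAD_WORDS.filter((word) => text.includes(word));
}

console.log(findWholeWords('I like ass boob'));          // [ 'ass', 'boob' ]
console.log(findWholeWords('a classic assassin movie')); // []
console.log(findSubstrings('thistextisfunnyboobs'));     // [ 'boob' ]
console.log(findSubstrings('a classic assassin movie')); // [ 'ass' ]
```

The substring search catches the un-spaced input, but it also flags "classic" and "assassin", which is the Clbuttic-style false positive mentioned above.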
- We don't know every way a profanity word can be written (e.g. how many different ways can you enter a55?)
- There is no algorithm-based approach that achieves this (yet)
- People will always find a way to connect with each other (e.g. Leet)
So this project goes with 1.1, 1.3 and 2.1.
(Note: you can find other attempts in the "Reference" section.)
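As a rough idea of how 1.1, 1.3 and 2.1 might combine, the sketch below masks whole-word matches while leaving the rest of the text untouched. It is not the library's implementation: `clean` here is a hypothetical stand-in written for this example, and the one-word list is illustrative.

```javascript
// Tiny illustrative word list; not the Shutterstock dictionary.
const BAD_WORDS = new Set(['boob']);

// Hypothetical clean(): tokens are runs of characters other than
// whitespace, "." and "," (1.3 + 2.1); each token is lowercased (1.1)
// before the whole-word lookup, and matches are masked with "*".
function clean(text) {
  return text.replace(/[^\s.,]+/g, (token) =>
    BAD_WORDS.has(token.toLowerCase()) ? '*'.repeat(token.length) : token
  );
}

console.log(clean('I have boob, etc.'));    // I have ****, etc.
console.log(clean('I like a55,BOOB.t1ts')); // I like a55,****.t1ts
console.log(clean('b00b'));                 // b00b (1.2 is not applied)
```

Because 1.2 is left out, look-alike spellings such as "b00b" pass through unless they are added to the word list explicitly.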
    npm run test.watch
    npm run validate
    npm run doc.generate

    # test npm publish
    npm publish --dry-run

    # mutation test
    npm install -g stryker-cli
    stryker init
    export STRYKER_DASHBOARD_API_KEY=<the_project_api_token>
    echo $STRYKER_DASHBOARD_API_KEY
    npx stryker run
- JavaScript on npmjs.com/package/leo-profanity
- PHP on packagist.org/packages/jojoee/leo-profanity
- Python on pypi.org/project/leoprofanity
- Java on Maven
- WordPress on wordpress.org
- Inspired by jwils0n/profanity-filter
- Algorithm / Discussion
- "similar-like" symbol to alphabet
- Replace Bad words using Regex
- Clbuttic
- The Clbuttic Mistake
- The Clbuttic Mistake: When obscenity filters go wrong
- Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?
- How do you implement a good profanity filter?
- The Untold History of Toontown’s SpeedChat (or BlockChat™ from Disney finally arrives)
- Profanity Filter Performance in Java
- Resource bad-word list
- Tool