🐯 Profanity filter, based on "Shutterstock" dictionary
jojoee/leo-profanity
Profanity filter, based on "Shutterstock" dictionary. Demo page, API document page
Install with npm, yarn, or Bower:

    # npm
    npm install leo-profanity
    npm install leo-profanity --no-optional # install only English bad word dictionary

    # yarn
    yarn add leo-profanity
    yarn add leo-profanity --ignore-optional # install only English bad word dictionary

    # Bower
    bower install leo-profanity
    # dictionary/default.json

In the browser, via githack:

    <script src="https://raw.githack.com/jojoee/bahttext/master/src/index.js"></script>
    <script>
      const filter = LeoProfanity
      filter.clearList()
      filter.add(["boobs", "butt"])
    </script>
In Node.js:

    // supported languages
    // - en
    // - fr
    // - ru
    var filter = require('leo-profanity');

    // output: I have ****, etc.
    filter.clean('I have boob, etc.');

    // replace the current dictionary with the French one
    filter.loadDictionary('fr');

    // create a new dictionary
    filter.addDictionary('th', ['หนึ่ง', 'สอง', 'สาม', 'สี่', 'ห้า'])
See more here: LeoProfanity - Documentation
This project splits the algorithm into 2 parts, Sanitize (attempts 1.x) and Filter (attempts 2.x). Some interesting approaches to each part are listed below.
Attempt 1 (1.1): Convert everything to a lowercase string

Example:
- "SomeThing" to "something"

Advantage:
- Simple to understand
- Simple to implement

Disadvantage or Caution:
- Ignores case-sensitive words

Attempt 2 (1.2): Turn look-alike symbols into letters

Example:
- "@" to "a"
- "5" or "$" to "s"
- "@ss" to "ass"
- "b00b" to "boob"
- "a$$a$$in" to "assassin"

Advantage:
- Detects some trick words

Disadvantage or Caution:
- False positives
- Subjective: it depends on how each person reads the symbol
- Limits the user's imagination (the user cannot play with words)
  - e.g. "joe@ssociallife.com"
  - e.g. the user wants to try something funny like "a$$a$$in"

Attempt 3 (1.3): Replace "." and "," with a space to separate words

People usually use "." and "," to connect or end sentences.

Example:
- "I like a55,b00b.t1ts" to "I like a55 b00b t1ts"

Advantage:
- Increases the chance of finding profanity, e.g. in "I like a55,b00b.t1ts"

Disadvantage or Caution:
- Disconnects some words, e.g. "john.doe@gmail.com"
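The three sanitize attempts can be sketched as small string transforms. This is only an illustration, not leo-profanity's implementation: the function names and the `LOOKALIKES` table are assumptions made for this example, and the table covers only the symbols used in the examples above.

```javascript
// Hypothetical helpers for illustration; none of these names come from leo-profanity.

// 1.1: convert everything to lowercase.
function toLower(text) {
  return text.toLowerCase();
}

// 1.2: map look-alike symbols to letters (the mapping itself is subjective).
const LOOKALIKES = { '@': 'a', '$': 's', '5': 's', '0': 'o' };
function replaceLookalikes(text) {
  // A replacer function is used so "$" is never treated as a replacement pattern.
  return text.replace(/[@$50]/g, (ch) => LOOKALIKES[ch]);
}

// 1.3: treat "." and "," as word separators.
function splitOnPunctuation(text) {
  return text.replace(/[.,]/g, ' ');
}

console.log(toLower('SomeThing'));                       // "something"
console.log(replaceLookalikes('a$$a$$in'));              // "assassin"
console.log(splitOnPunctuation('I like a55,b00b.t1ts')); // "I like a55 b00b t1ts"

// The cautions listed above show up immediately on ordinary input:
console.log(replaceLookalikes('joe@ssociallife.com'));   // "joeassociallife.com"
console.log(splitOnPunctuation('john.doe@gmail.com'));   // "john doe@gmail com"
```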
Attempt 1 (2.1): Split into an array (or use a regex)

Use spaces to split the "word string" into a "word array", then check each word against the profanity word list.

Example:
- "I like ass boob" to ["I", "like", "ass", "boob"]

Advantage:
- Simple to implement

Disadvantage:
- Needs a proper list of profanity words
- Some false positives, e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)

Attempt 2 (2.2): Filter words inside other text (with or without spaces)

Detect any run of characters that contains a profanity word.

Example:
- "thistextisfunnyboobsanda55", which contains the suspicious words "boobs" and "a55"

Advantage:
- Can detect "un-spaced" profanity words

Disadvantage:
- Many false positives, e.g. http://www.morewords.com/contains/ass/, the Clbuttic mistake (filter mistake)
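A minimal sketch of the two filter strategies makes the trade-off concrete. The word list and the function names `findByWord` and `findInside` are hypothetical and chosen only for this example; the library's own matching may differ.

```javascript
// Hypothetical word list and helpers, for illustration only.
const BAD_WORDS = ['ass', 'boob', 'boobs', 'tit'];

// 2.1: split on spaces and compare whole words against the list.
function findByWord(text) {
  return text.split(' ').filter((word) => BAD_WORDS.includes(word));
}

// 2.2: report every list entry that occurs anywhere inside the text.
function findInside(text) {
  return BAD_WORDS.filter((bad) => text.includes(bad));
}

console.log(findByWord('i like ass boob'));      // [ 'ass', 'boob' ]
console.log(findByWord('great tit'));            // [ 'tit' ]  (false positive: a bird)
console.log(findByWord('classic'));              // []
console.log(findInside('thistextisfunnyboobs')); // [ 'boob', 'boobs' ]
console.log(findInside('classic'));              // [ 'ass' ]  (false positive)
```

Whole-word matching misses un-spaced text, while substring matching flags harmless words such as "classic", which is the source of the Clbuttic mistake.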
- We don't know all the methods that can produce a profanity word (e.g. how many different ways can you enter a55?)
- There is no algorithm-based approach that achieves it (yet)
- People will always find a way to connect with each other (e.g. Leet)
So, this project goes with 1.1, 1.3 and 2.1.
(note - you can find other attempts in the "Reference" section)
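Combining 1.1, 1.3 and 2.1 could look roughly like the sketch below. It is not the library's actual code: the `clean` function here is a hypothetical standalone helper that takes its word list as a parameter, and it assumes that lowercasing does not change the string length (true for ASCII input), so that positions in the sanitized copy line up with the original text.

```javascript
// Hypothetical standalone sketch; not leo-profanity's implementation.
function clean(text, badWords, mask = '*') {
  // Sanitize (1.1 + 1.3): lowercase, and turn "." and "," into spaces.
  // Both steps keep the length unchanged for ASCII input.
  const sanitized = text.toLowerCase().replace(/[.,]/g, ' ');

  let result = '';
  let offset = 0;
  // Filter (2.1): split on spaces and check each whole word against the list.
  for (const word of sanitized.split(' ')) {
    const original = text.slice(offset, offset + word.length);
    result += badWords.includes(word) ? mask.repeat(word.length) : original;
    // Put back the separator (space, "." or ",") that followed the word.
    result += text.charAt(offset + word.length);
    offset += word.length + 1;
  }
  return result;
}

const words = ['boob', 'butt'];
console.log(clean('I have boob, etc.', words)); // "I have ****, etc."
console.log(clean('Great BUTT', words));        // "Great ****"
console.log(clean('button', words));            // "button" (whole words only)
```

Matching is done on the sanitized copy while the output is rebuilt from the original text, so punctuation and the casing of clean words are preserved.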
    npm run test.watch
    npm run validate
    npm run doc.generate

    # test npm publish
    npm publish --dry-run

    # mutation test
    npm install -g stryker-cli
    stryker init
    export STRYKER_DASHBOARD_API_KEY=<the_project_api_token>
    echo $STRYKER_DASHBOARD_API_KEY
    npx stryker run
- JavaScript on npmjs.com/package/leo-profanity
- PHP on packagist.org/packages/jojoee/leo-profanity
- Python on pypi.org/project/leoprofanity
- Java on Maven
- WordPress on wordpress.org
- Inspired by jwils0n/profanity-filter
- Algorithm / Discussion
- "similar-like" symbol to alphabet
- Replace Bad words using Regex
- Clbuttic
- The Clbuttic Mistake
- The Clbuttic Mistake: When obscenity filters go wrong
- Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?
- How do you implement a good profanity filter?
- The Untold History of Toontown’s SpeedChat (or BlockChat™ from Disney finally arrives)
- Profanity Filter Performance in Java
- Resource bad-word list
- Tool