
🐯 Profanity filter, based on "Shutterstock" dictionary


jojoee/leo-profanity


Profanity filter, based on the "Shutterstock" dictionary. See also: Demo page, API document page.

Installation

    // npm
    npm install leo-profanity
    npm install leo-profanity --no-optional # install only English bad word dictionary

    // yarn
    yarn add leo-profanity
    yarn add leo-profanity --ignore-optional # install only English bad word dictionary

    // Bower
    bower install leo-profanity
    // dictionary/default.json

    // githack
    <script src="https://raw.githack.com/jojoee/bahttext/master/src/index.js"></script>
    const filter = LeoProfanity
    filter.clearList()
    filter.add(["boobs", "butt"])

Example usage for npm

    // supported languages
    // - en
    // - fr
    // - ru
    var filter = require('leo-profanity');

    // output: I have ****, etc.
    filter.clean('I have boob, etc.');

    // replace the current dictionary with the French one
    filter.loadDictionary('fr');

    // create a new dictionary
    filter.addDictionary('th', ['หนึ่ง', 'สอง', 'สาม', 'สี่', 'ห้า'])

See more here: LeoProfanity - Documentation

Algorithm

This project splits the problem into two parts, Sanitize and Filter. The attempts considered for each part are described below.

Sanitize

Attempt 1 (1.1): Convert everything to lowercase

Example:
- "SomeThing" to "something"

Advantage:
- Simple to understand
- Simple to implement

Disadvantage or Caution:
- Ignores case-sensitive words

Attempt 2 (1.2): Turn look-alike symbols into letters

Example:
- "@" to "a"
- "5" or "$" to "s"
- "@ss" to "ass"
- "b00b" to "boob"
- "a$$a$$in" to "assassin"

Advantage:
- Detects some trick words

Disadvantage or Caution:
- False positives
- Subjective: it depends on how each person reads the symbol
- Limits user imagination (users cannot play with words)
  e.g. "joe@ssociallife.com"
  e.g. a user wants to try something funny like "a$$a$$in"

Attempt 3 (1.3): Replace "." and "," with a space to separate words

People often use "." and "," to connect or end sentences.

Example:
- "I like a55,b00b.t1ts" to "I like a55 b00b t1ts"

Advantage:
- Increases the chance of finding a word, e.g. "I like a55,b00b.t1ts"

Disadvantage or Caution:
- Splits some words apart, e.g. "john.doe@gmail.com"
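The three sanitize attempts can be sketched as small string transforms. This is only an illustration: the function names and the `LOOKALIKES` table are hypothetical and are not part of the leo-profanity API, and the "0" to "o" mapping is inferred from the "b00b" example above.

```javascript
// Hypothetical helpers for illustration; not the library's implementation.

// 1.1: convert everything to lowercase
function toLower(text) {
  return text.toLowerCase();
}

// 1.2: map look-alike symbols to letters (the table is subjective)
const LOOKALIKES = { '@': 'a', '$': 's', '5': 's', '0': 'o' };
function replaceLookalikes(text) {
  return text.replace(/[@$50]/g, (ch) => LOOKALIKES[ch]);
}

// 1.3: treat "." and "," as word separators
function splitPunctuation(text) {
  return text.replace(/[.,]/g, ' ');
}

console.log(toLower('SomeThing'));                      // something
console.log(replaceLookalikes('a$$a$$in'));             // assassin
console.log(splitPunctuation('I like a55,b00b.t1ts'));  // I like a55 b00b t1ts
console.log(splitPunctuation('john.doe@gmail.com'));    // john doe@gmail com
```

The last call shows the caution listed for 1.3: an e-mail address is broken into pieces along with everything else.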

Filter

Attempt 1 (2.1): Split into an array (or use a regex)

Use spaces to split the "word string" into a "word array", then check each word against the profanity word list.

Example:
- "I like ass boob" to ["I", "like", "ass", "boob"]

Advantage:
- Simple to implement

Disadvantage:
- Needs a proper list of profanity words
- Some false positives, e.g. Great tit (https://en.wikipedia.org/wiki/Great_tit)

Attempt 2 (2.2): Filter words inside the text (with or without spaces)

Detect any run of characters that contains a profanity word.

Example:
- "thistextisfunnyboobsanda55", which contains the suspicious words "boobs" and "a55"

Advantage:
- Can detect "un-spaced" profanity words

Disadvantage:
- Many false positives, e.g. http://www.morewords.com/contains/ass/, Clbuttic mistake (filter mistake)
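A minimal sketch of the difference between the two filter attempts, assuming a tiny hypothetical word list. The names `WORDS`, `findByWord` and `findInside` are made up for this example and do not come from the library.

```javascript
// Hypothetical word list and helpers for illustration only.
const WORDS = ['ass', 'boob', 'boobs', 'a55'];

// 2.1: split on whitespace and compare whole words against the list
function findByWord(text, words) {
  return text.split(/\s+/).filter((token) => words.includes(token));
}

// 2.2: report every listed word that appears anywhere inside the text
function findInside(text, words) {
  return words.filter((word) => text.includes(word));
}

console.log(findByWord('I like ass boob', WORDS));
// [ 'ass', 'boob' ]

console.log(findInside('thistextisfunnyboobsanda55', WORDS));
// [ 'boob', 'boobs', 'a55' ]

// 2.2 flags harmless words that merely contain a listed word; 2.1 does not
console.log(findInside('a classic example', WORDS)); // [ 'ass' ]
console.log(findByWord('a classic example', WORDS)); // []
```

The last two calls illustrate why 2.2 produces many false positives: "classic" contains "ass" as a substring but is not matched as a whole word.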

In Summary

  • We don't know every way a profanity word can be written (e.g. how many different ways can you type "a55"?)
  • There is no purely algorithmic approach that fully achieves it (yet)
  • People will always find a way to connect with each other (e.g. Leet)

So, this project goes with 1.1, 1.3 and 2.1.

(Note: you can find other attempts in the "Reference" section.)
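As a rough sketch of how 1.1, 1.3 and 2.1 can be combined, the function below masks whole-word matches while keeping the original punctuation. It is an assumption-laden illustration, not the library's actual code: the name `cleanSketch`, its parameters and the one-word list are all hypothetical.

```javascript
// Hypothetical combination of 1.1 + 1.3 + 2.1; not the leo-profanity source.
function cleanSketch(text, words, mask = '*') {
  return text
    // 1.3 + 2.1: split on spaces, "." and ","; the capture group keeps
    // the separators in the array so the text can be rebuilt unchanged
    .split(/([ .,]+)/)
    .map((token) =>
      // 1.1: compare in lowercase, then mask whole-word matches only
      words.includes(token.toLowerCase()) ? mask.repeat(token.length) : token
    )
    .join('');
}

console.log(cleanSketch('I have boob, etc.', ['boob']));
// I have ****, etc.

console.log(cleanSketch('I like a55,BOOB.t1ts', ['boob']));
// I like a55,****.t1ts

console.log(cleanSketch('classic', ['ass']));
// classic
```

Because 1.2 is left out, look-alike spellings such as "a55" pass through untouched unless they are added to the word list themselves.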

CMD

    npm run test.watch
    npm run validate
    npm run doc.generate

    # test npm publish
    npm publish --dry-run

    # mutation test
    npm install -g stryker-cli
    stryker init
    export STRYKER_DASHBOARD_API_KEY=<the_project_api_token>
    echo $STRYKER_DASHBOARD_API_KEY
    npx stryker run

Other languages

Reference

