Correct commonly misspelled English words in source files
client9/misspell
Correct commonly misspelled English words... quickly.
If you just want a binary and to start using misspell:
curl -L -o ./install-misspell.sh https://git.io/misspell
sh ./install-misspell.sh
The script installs the binary as ./bin/misspell. You can adjust the download location using the -b flag. File a ticket if you want another platform supported.
If you use Go, the best way to run misspell is by using gometalinter. Otherwise, install misspell the old-fashioned way:
go get -u github.com/client9/misspell/cmd/misspell
and misspell will be in your GOPATH.
Also, if you like to live dangerously, you could run:
curl -L https://git.io/misspell | bash
$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"
# ^ file, line, column
$ misspell -help
Usage of misspell:
  -debug
        Debug matching, very slow
  -error
        Exit with 2 if misspelling found
  -f string
        'csv', 'sqlite3' or custom Golang template for output
  -i string
        ignore the following corrections, comma separated
  -j int
        Number of workers, 0 = number of CPUs
  -legal
        Show legal information and exit
  -locale string
        Correct spellings using locale perferances for US or UK. Default is to use a neutral variety of English. Setting locale to US will correct the British spelling of 'colour' to 'color'
  -o string
        output file or [stderr|stdout|] (default "stdout")
  -q    Do not emit misspelling output
  -source string
        Source mode: auto=guess, go=golang source, text=plain or markdown-like text (default "auto")
  -w    Overwrite file with corrections (default is just to display)
- Automatic Corrections
- Converting UK spellings to US
- Using pipes and stdin
- Golang special support
- gometalinter support
- CSV Output
- Using SQLite3
- Changing output format
- Checking a folder recursively
- Performance
- Known Issues
- Debugging
- False Negatives and missing words
- Origin of Word Lists
- Software License
- Problem statement
- Other spelling correctors
- Other ideas
To make the corrections automatically, just add the -w flag!
$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"
# ^ File is rewritten only if a misspelling is found
To convert UK spellings to US, add the -locale US flag!
$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"
To convert US spellings to UK, add the -locale UK flag!
$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"
Help is appreciated as I'm neither British nor an expert in the English language.
To check a folder recursively, just list a directory you'd like to check:
misspell .
misspell aDirectory anotherDirectory aFile
You can also run misspell recursively using the following shell tricks:
misspell directory/**/*
or
find . -type f | xargs misspell
You can select a type of file as well. The following example selects all .txt files that are not in the vendor directory:
find . -type f -name '*.txt' | grep -v vendor/ | xargs misspell -error
misspell also works with pipes and stdin.
Print messages to stderr only:
$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"
Print messages to stderr, and corrected text to stdout:
$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra
Only print the corrected text to stdout:
$ echo "zeebra" | misspell -w -q
zebra
misspell has special support for Golang: if the file ends in .go, then misspell will only check spelling in comments.
If you want to force a file to be checked as a golang source, use -source=go on the command line. Conversely, you can check a golang source as if it were pure text by using -source=text. You might want to do this since many variable names have misspellings in them!
I'm told that using -source=go works well for Ruby, JavaScript, Java, C and C++.
It doesn't work well for Python and Bash.
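As a rough illustration of what "only check spelling in comments" can mean for a Go file, here is a minimal sketch that pulls the comments out of a source file with the standard library's go/scanner package. The helper name comments and the sample source are made up for this example, and misspell's own handling of Go sources may well work differently.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// comments returns the text of every comment found in a Go source file.
// Hypothetical helper for illustration; not taken from misspell itself.
func comments(src []byte) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))

	var s scanner.Scanner
	// ScanComments makes the scanner return comments as COMMENT tokens
	// instead of silently skipping them.
	s.Init(file, src, nil, scanner.ScanComments)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.COMMENT {
			out = append(out, lit)
		}
	}
	return out
}

func main() {
	src := []byte("package demo\n\n// langauge is a typo here\nvar langauge = 1 /* and here */\n")
	for _, c := range comments(src) {
		fmt.Println(c)
	}
}
```

Only the two comments are returned, so a spell check run over this result would never see the variable name langauge.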
gometalinter runs multiple golang linters. Starting on 2016-06-12 gometalinter supports misspell natively, but it is disabled by default.
# update your copy of gometalinter
go get -u github.com/alecthomas/gometalinter
# install updates and misspell
gometalinter --install --update
To use, just enable misspell:
gometalinter --enable misspell ./...
Note that gometalinter only checks golang files, and uses the default options of misspell.
You may wish to run this on your plaintext (.txt) and/or markdown files too.
Using -f csv, the output is standard comma-separated values with headers in the first row.
misspell -f csv *
file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language
Using -f sqlite, the output is a sqlite3 dump-file.
$ misspell -f sqlite * > /tmp/misspell.sql
$ cat /tmp/misspell.sql
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE misspell(
  "file" TEXT,
  "line" INTEGER,
  "column" INTEGER,
  "typo" TEXT,
  "corrected" TEXT
);
INSERT INTO misspell VALUES("install.txt",202,31,"immediatly","immediately");
# etc...
COMMIT;
$ sqlite3 -init /tmp/misspell.sql :memory: 'select count(*) from misspell'
1
With some tricks you can directly pipe output to sqlite3 by using -init /dev/stdin:
misspell -f sqlite * | sqlite3 -init /dev/stdin -column -cmd '.width 60 15' ':memory:' \
  'select substr(file,35),typo,count(*) as count from misspell group by file, typo order by count desc;'
Using the -i "comma,separated,rules" flag you can specify corrections to ignore.
For example, if you were to run misspell -w -error -source=text against a document that contains the string Guy Finkelshteyn Braswell, misspell would change the text to Guy Finkelstheyn Bras well. You can then determine the rules to ignore by reverting the change and running misspell with the -debug flag. You can then see that the corrections were htey -> they and aswell -> as well. To ignore these two rules, you add -i "htey,aswell" to your command. With debug mode on, you can see it print the corrections, but it will no longer make them.
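The effect of ignoring rules can be sketched in a few lines of Go, assuming the corrections are typo/fix pairs handed to strings.NewReplacer. The rule type, the buildReplacer helper and the three-entry rule list are hypothetical and only stand in for misspell's real word list; the htey and aswell rules are the ones from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical typo -> correction pair.
type rule struct{ typo, fix string }

// buildReplacer returns a replacer for every rule whose typo is not named in
// ignore, a comma-separated string shaped like the argument to -i.
func buildReplacer(rules []rule, ignore string) *strings.Replacer {
	skip := make(map[string]bool)
	for _, name := range strings.Split(ignore, ",") {
		skip[strings.TrimSpace(name)] = true
	}
	var pairs []string
	for _, r := range rules {
		if !skip[r.typo] {
			pairs = append(pairs, r.typo, r.fix)
		}
	}
	return strings.NewReplacer(pairs...)
}

func main() {
	rules := []rule{{"htey", "they"}, {"aswell", "as well"}, {"langauge", "language"}}
	text := "Guy Finkelshteyn Braswell speaks every langauge"

	// No rules ignored: the name is mangled.
	fmt.Println(buildReplacer(rules, "").Replace(text))
	// With the two rules ignored, only the real typo is corrected.
	fmt.Println(buildReplacer(rules, "htey,aswell").Replace(text))
}
```

The first line printed contains Finkelstheyn Bras well; the second keeps the name intact while still correcting langauge.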
Using the -f template flag you can pass in a golang text template to format the output.
One can use printf "%q" VALUE to safely quote a value.
The default template is compatible with gometalinter:
{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}
To just print probable misspellings:
-f '{{ .Original }}'
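To see how such a format string behaves, here is a small sketch using Go's text/template package. The diff struct and the render helper are assumptions made for this example; only the field names (Filename, Line, Column, Original, Corrected) are taken from the templates above, and the sample values are invented.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// diff is a hypothetical record carrying the fields the templates refer to.
type diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// render executes a text/template format string against one record.
func render(format string, d diff) (string, error) {
	tmpl, err := template.New("out").Parse(format)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	d := diff{"your.txt", 9, 21, "langauge", "language"}
	formats := []string{
		`{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}`,
		`{{ .Original }}`,
	}
	for _, format := range formats {
		out, err := render(format, d)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(out)
	}
}
```

Because printf "%q" already adds the double quotes, the format string does not need literal quotes around the value.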
This corrects commonly misspelled English words in computer source code and other text-based formats (.txt, .md, etc).
It is designed to run quickly so it can be used as a pre-commit hook with minimal burden on the developer.
It does not work with binary formats (e.g. Word).
It is not a complete spell-checking program nor a grammar checker.
Some other misspelling correctors:
- https://github.com/vlajos/misspell_fixer
- https://github.com/lyda/misspell-check
- https://github.com/lucasdemarchi/codespell
They all work but had problems that prevented me from using them at scale:
- slow: all of the above check one misspelling at a time (i.e. linear) using regexps
- not MIT/Apache2 licensed (or equivalent)
- have dependencies that don't work for me (python3, bash, linux sed, etc)
- don't understand American vs. British English and sometimes make unwelcome "corrections"
That said, they might be perfect for you and many have more features than this project!
Misspell is easily 100x to 1000x faster than other spelling correctors. You should be able to check and correct 1000 files in under 250ms.
This uses the mighty power of golang's strings.Replacer, which is an implementation or variation of the Aho–Corasick algorithm. This makes multiple substring matches simultaneously.
In addition, this uses multiple CPU cores to work on multiple files.
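The two ideas above can be sketched together in Go: one strings.Replacer holding many corrections, shared by several goroutines. This is an illustrative sketch, not misspell's actual code. The three-word list and the correctAll helper are made up, and in-memory strings stand in for files.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// replacer holds a few illustrative corrections. All pairs are matched in a
// single pass over the input rather than one pass per word.
var replacer = strings.NewReplacer(
	"langauge", "language",
	"zeebra", "zebra",
	"immediatly", "immediately",
)

// correctAll corrects every document in its own goroutine. A Replacer is
// safe for concurrent use, so all goroutines can share the same one.
func correctAll(docs []string) []string {
	out := make([]string, len(docs))
	var wg sync.WaitGroup
	for i, doc := range docs {
		wg.Add(1)
		go func(i int, doc string) {
			defer wg.Done()
			out[i] = replacer.Replace(doc) // each goroutine writes its own slot
		}(i, doc)
	}
	wg.Wait()
	return out
}

func main() {
	docs := []string{
		"a zeebra learns a langauge immediatly",
		"nothing to fix here",
	}
	for _, fixed := range correctAll(docs) {
		fmt.Println(fixed)
	}
}
```

A real tool would typically cap the number of goroutines, which is what a worker-count option such as -j is for; the sketch starts one per document for brevity.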
Unlike the other projects, this doesn't know what a "word" is. There may be more false positives and false negatives due to this. On the other hand, it sometimes catches things others don't.
Either way, please file bugs and we'll fix them!
Since it operates in parallel to make corrections, it can be non-obvious to determine exactly what word was corrected.
Run using the -debug flag on the file you want. It should then print what word it is trying to correct. Then file a bug describing the problem. Thanks!
The matching function is case-sensitive, so variable names that are multiple words either in all-upper or all-lower case can sometimes cause false positives. For instance, a variable named bodyreader could trigger a false positive since yrea is in the middle and could be corrected to year. Other problems happen if the variable name uses an English contraction that should use an apostrophe. The best way of fixing this is to use the Effective Go naming conventions and use camelCase for variable names. You can check your code using golint.
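The bodyreader case is easy to reproduce with a single case-sensitive substring rule. The sketch below uses strings.NewReplacer with only the yrea rule; the variable name yreaRule is made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// yreaRule is the single rule from the example: "yrea" -> "year".
var yreaRule = strings.NewReplacer("yrea", "year")

func main() {
	// The all-lower-case name contains "yrea" and gets mangled; the
	// camelCase name contains "yRea" instead and is left alone.
	for _, name := range []string{"bodyreader", "bodyReader"} {
		fmt.Printf("%s -> %s\n", name, yreaRule.Replace(name))
	}
}
```

This prints bodyreader -> bodyearder and bodyReader -> bodyReader, which is why camelCase names avoid this class of false positive.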
The main code is MIT.
Misspell also makes use of the Golang standard library and contains a modified version of Golang's strings.Replacer, which are covered under a BSD License. Type misspell -legal for more details or see legal.go.
It started with a word list from Wikipedia. Unfortunately, this list had to be highly edited as many of the words are obsolete or based on mistakes on mechanical typewriters (I'm guessing).
Additional words were added based on actual mistakes seen in the wild (meaning self-generated).
Variations of UK and US spellings are based on many sources including:
- http://www.tysto.com/uk-us-spelling-list.html (with heavy editing, many are incorrect)
- http://www.oxforddictionaries.com/us/words/american-and-british-spelling-american (excellent site but incomplete)
- Diffing US and UK scowl dictionaries
American English is more accepting of spelling variations than is British English, so "what is American or not" is subject to opinion. Corrections and help welcome.
Here are some ideas for enhancements:
- Capitalization of proper nouns could be done (e.g. weekday and month names, country names, language names).
- Opinionated US spellings: US English has a number of words with alternate spellings. Think adviser vs. advisor. While "advisor" is not wrong, the opinionated US locale would correct "advisor" to "adviser".
- Versioning: some type of versioning is needed so reporting mistakes and errors is easier.
- Feedback: mistakes would be sent to some server for aggregation and feedback review.
- Contractions and apostrophes: this would optionally correct "isnt" to "isn't", etc.