Cgo binding for Snowball C library
goodsign/snowball
Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction functionality. For more detailed info see http://snowball.tartarus.org/
    go get github.com/goodsign/snowball
    go test github.com/goodsign/snowball

The tests must pass. Done! Use it in your Go files (import "github.com/goodsign/snowball").
    stemmer, err := NewWordStemmer(algorithm, encoding)
    if nil != err { /*...handle error...*/ }
    defer stemmer.Close()

    wordStem, err := stemmer.Stem(word)
    if nil != err { /*...handle error...*/ }
    /* Use wordStem */
According to Snowball documentation:
Creating a stemmer is a relatively expensive operation - the expected usage pattern is that a new stemmer is created when needed, used to stem many words, and deleted after some time.
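The usage pattern quoted above (create once, stem many words, then release) can be sketched as follows. This is a minimal sketch, not the binding's API: the `Stemmer` interface, the `stemAll` helper, and the `fakeStemmer` stand-in are all hypothetical names introduced so the example runs without cgo, and the string-based `Stem` signature is an assumption.

```go
package main

import "fmt"

// Stemmer is a hypothetical interface for illustration only; the real
// binding's method signatures may differ.
type Stemmer interface {
	Stem(word string) (string, error)
	Close()
}

// stemAll creates a single stemmer through newStemmer, reuses it for every
// word, and releases it before returning.
func stemAll(newStemmer func() (Stemmer, error), words []string) ([]string, error) {
	s, err := newStemmer()
	if err != nil {
		return nil, err
	}
	defer s.Close()

	stems := make([]string, 0, len(words))
	for _, w := range words {
		stem, err := s.Stem(w)
		if err != nil {
			return nil, err
		}
		stems = append(stems, stem)
	}
	return stems, nil
}

// fakeStemmer is a stand-in that returns each word unchanged and records
// whether Close was called. It performs no real stemming.
type fakeStemmer struct{ closed *bool }

func (f fakeStemmer) Stem(word string) (string, error) { return word, nil }
func (f fakeStemmer) Close()                           { *f.closed = true }

func main() {
	created, closed := 0, false
	factory := func() (Stemmer, error) {
		created++ // counts how many stemmers were constructed
		return fakeStemmer{&closed}, nil
	}

	stems, err := stemAll(factory, []string{"running", "runs", "ran"})
	fmt.Println(stems, err, created, closed)
}
```

The point of the sketch is only the shape of the loop: the expensive constructor runs once per batch rather than once per word.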
The file modules.txt lists the main algorithms for each language, in UTF-8 and also in the most commonly used encoding for that language.
    Language     Encodings          Algorithms
    danish       UTF_8,ISO_8859_1   danish,da,dan
    dutch        UTF_8,ISO_8859_1   dutch,nl,dut,nld
    english      UTF_8,ISO_8859_1   english,en,eng
    finnish      UTF_8,ISO_8859_1   finnish,fi,fin
    french       UTF_8,ISO_8859_1   french,fr,fre,fra
    german       UTF_8,ISO_8859_1   german,de,ger,deu
    hungarian    UTF_8,ISO_8859_1   hungarian,hu,hun
    italian      UTF_8,ISO_8859_1   italian,it,ita
    norwegian    UTF_8,ISO_8859_1   norwegian,no,nor
    portuguese   UTF_8,ISO_8859_1   portuguese,pt,por
    romanian     UTF_8,ISO_8859_2   romanian,ro,rum,ron
    russian      UTF_8,KOI8_R       russian,ru,rus
    spanish      UTF_8,ISO_8859_1   spanish,es,esl,spa
    swedish      UTF_8,ISO_8859_1   swedish,sv,swe
    turkish      UTF_8              turkish,tr,tur
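As an illustration of how the table above can be used, the sketch below checks an algorithm name and encoding pair before a stemmer is created. The `language` type and the `supported` helper are hypothetical and not part of the binding; only four rows of the table are copied in, and matching is assumed to be exact.

```go
package main

import "fmt"

// language mirrors one row of the table: a language, the encodings listed
// for it, and the algorithm names that select it.
type language struct {
	name      string
	encodings []string
	aliases   []string
}

// A few rows copied from the table; the remaining rows have the same shape.
var languages = []language{
	{"english", []string{"UTF_8", "ISO_8859_1"}, []string{"english", "en", "eng"}},
	{"romanian", []string{"UTF_8", "ISO_8859_2"}, []string{"romanian", "ro", "rum", "ron"}},
	{"russian", []string{"UTF_8", "KOI8_R"}, []string{"russian", "ru", "rus"}},
	{"turkish", []string{"UTF_8"}, []string{"turkish", "tr", "tur"}},
}

// supported reports whether the rows above list the given algorithm name
// together with the given encoding. Matching is exact (an assumption).
func supported(algorithm, encoding string) bool {
	for _, l := range languages {
		for _, a := range l.aliases {
			if a != algorithm {
				continue
			}
			for _, e := range l.encodings {
				if e == encoding {
					return true
				}
			}
			return false // algorithm found, encoding not listed for it
		}
	}
	return false // algorithm name not found
}

func main() {
	fmt.Println(supported("ru", "KOI8_R"))     // true
	fmt.Println(supported("tr", "ISO_8859_1")) // false: turkish lists UTF_8 only
}
```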
The original Snowball documentation says:
Stemmers are re-entrant, but not threadsafe. In other words, if you wish to access the same stemmer object from multiple threads, you must ensure that all access is protected by a mutex or similar device.
For this reason, the Go wrapper guards each stem operation with a sync.Mutex, so a single stemmer can be shared safely between goroutines.
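The general technique, serializing access to a non-threadsafe object with a `sync.Mutex`, can be sketched as below. This is not the wrapper's actual source: `rawStemmer`, `safeStemmer`, and `stemConcurrently` are hypothetical names, and the "stemming" here just trims a trailing "s" as a placeholder for the C library call.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// rawStemmer stands in for a stemmer that is not safe for concurrent use.
// It mutates internal state on every call and does no real stemming.
type rawStemmer struct {
	calls int
}

func (r *rawStemmer) stem(word string) string {
	r.calls++ // unsynchronized write; a data race without the mutex below
	return strings.TrimSuffix(word, "s")
}

// safeStemmer serializes every access to the wrapped stemmer with a mutex.
type safeStemmer struct {
	mu  sync.Mutex
	raw rawStemmer
}

// Stem holds the lock for the duration of one stem operation.
func (s *safeStemmer) Stem(word string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.raw.stem(word)
}

// Calls returns how many words have been stemmed so far.
func (s *safeStemmer) Calls() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.raw.calls
}

// stemConcurrently stems each word in its own goroutine, sharing one
// stemmer, and waits for all of them to finish.
func stemConcurrently(s *safeStemmer, words []string) {
	var wg sync.WaitGroup
	for _, w := range words {
		wg.Add(1)
		go func(word string) {
			defer wg.Done()
			s.Stem(word)
		}(w)
	}
	wg.Wait()
}

func main() {
	var s safeStemmer
	words := make([]string, 100)
	for i := range words {
		words[i] = "cats"
	}
	stemConcurrently(&s, words)
	fmt.Println(s.Calls()) // 100
}
```

Locking per operation keeps the caller's code simple at the cost of contention; callers with heavy parallel workloads may prefer one stemmer per goroutine instead.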
The Snowball library is released under the BSD Licence.
The goodsign/snowball binding is released under the BSD Licence.