Segments short text sequences, such as hashtags, into separate words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
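
The unigram step described above can be illustrated with a small dynamic-programming sketch in base R. This is not the package's implementation and does not call any NUSS function. The helper name `segment_unigram` and the toy word counts are assumptions made up for the example, and the n-gram step is not covered.

```r
# Toy unigram counts; these words and counts are illustrative assumptions,
# not data shipped with NUSS.
toy_counts <- c(this = 50, is = 80, a = 100, test = 20, at = 30, his = 10)

# Hypothetical helper (not part of the NUSS API): return the most probable
# split of `text` into dictionary words under a unigram model, or NULL if
# the text cannot be fully covered by dictionary words.
segment_unigram <- function(text, counts) {
  logp <- log(counts / sum(counts))      # log-probability of each word
  n <- nchar(text)
  max_len <- max(nchar(names(counts)))   # longest word worth trying
  best <- c(0, rep(-Inf, n))             # best[i + 1]: best score for the first i characters
  back <- integer(n + 1)                 # back[i + 1]: start offset of the last word used
  for (end in seq_len(n)) {
    for (start in max(0, end - max_len):(end - 1)) {
      word <- substr(text, start + 1, end)
      if (word %in% names(logp) && is.finite(best[start + 1])) {
        score <- best[start + 1] + logp[[word]]
        if (score > best[end + 1]) {
          best[end + 1] <- score
          back[end + 1] <- start
        }
      }
    }
  }
  if (!is.finite(best[n + 1])) return(NULL)  # no full segmentation exists
  # Walk the back-pointers from the end of the text to recover the words.
  words <- character(0)
  end <- n
  while (end > 0) {
    start <- back[end + 1]
    words <- c(substr(text, start + 1, end), words)
    end <- start
  }
  words
}
```

With the toy dictionary above, `segment_unigram("thisisatest", toy_counts)` yields the words `this`, `is`, `a`, `test`. Summing log-probabilities rather than multiplying raw probabilities avoids numeric underflow on longer inputs.
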
| Version: | 0.1.0 |
| Depends: | R (≥ 3.5) |
| Imports: | dplyr, magrittr, Rcpp, stringr, text2vec, textclean, utils |
| LinkingTo: | BH, Rcpp |
| Suggests: | testthat (≥ 3.0.0) |
| Published: | 2024-08-19 |
| DOI: | 10.32614/CRAN.package.NUSS |
| Author: | Oskar Kosch |
| Maintainer: | Oskar Kosch <contact at oskarkosch.com> |
| BugReports: | https://github.com/theogrost/NUSS/issues |
| License: | GPL (≥ 3) |
| URL: | https://github.com/theogrost/NUSS |
| NeedsCompilation: | yes |
| Language: | en |
| Materials: | README |
| CRAN checks: | NUSS results |
| Reference manual: | NUSS.html, NUSS.pdf |
| Package source: | NUSS_0.1.0.tar.gz |
| Windows binaries: | r-devel: NUSS_0.1.0.zip, r-release: NUSS_0.1.0.zip, r-oldrel: NUSS_0.1.0.zip |
| macOS binaries: | r-release (arm64): NUSS_0.1.0.tgz, r-oldrel (arm64): NUSS_0.1.0.tgz, r-release (x86_64): NUSS_0.1.0.tgz, r-oldrel (x86_64): NUSS_0.1.0.tgz |
Please use the canonical form https://CRAN.R-project.org/package=NUSS to link to this page.