Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
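The lookup-then-fallback flow described above can be illustrated with a minimal sketch. This is not the package's R implementation: the names `tokenize`, `wordpiece_fallback`, `TOY_LOOKUP`, and `TOY_VOCAB` are hypothetical, the data is made up, and the fallback shown is plain greedy longest-match-first wordpiece with the common `##` continuation prefix, without the modifications morphemepiece applies.

```python
# Toy data (assumptions for illustration only, not the package's real tables).
TOY_LOOKUP = {"unhappy": ["un", "happy"], "cats": ["cat", "s"]}
TOY_VOCAB = {"un", "happy", "cat", "s", "dog", "##s", "##gy"}


def wordpiece_fallback(word, vocab, unk_token="[UNK]"):
    """Split `word` greedily, always taking the longest piece found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, lookup, vocab, unk_token="[UNK]"):
    """Use the lookup table when the word is listed, otherwise fall back."""
    tokens = []
    for word in text.lower().split():
        if word in lookup:
            tokens.extend(lookup[word])
        else:
            tokens.extend(wordpiece_fallback(word, vocab, unk_token))
    return tokens


print(tokenize("Unhappy cats dogs", TOY_LOOKUP, TOY_VOCAB))
```

Here "unhappy" and "cats" are resolved by the lookup table, while "dogs" is absent from it and is split by the fallback into pieces from the vocabulary.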
| Version: | 1.2.3 |
| Imports: | dlr (≥ 1.0.0), fastmatch, magrittr, memoise (≥ 2.0.0), morphemepiece.data, piecemaker (≥ 1.0.0), purrr (≥ 0.3.4), readr, rlang, stringr (≥ 1.4.0) |
| Suggests: | dplyr, fs, ggplot2, here, knitr, remotes, rmarkdown, testthat (≥ 3.0.0), utils |
| Published: | 2022-04-16 |
| DOI: | 10.32614/CRAN.package.morphemepiece |
| Author: | Jonathan Bratt |
| Maintainer: | Jonathan Bratt <jonathan.bratt at macmillan.com> |
| BugReports: | https://github.com/macmillancontentscience/morphemepiece/issues |
| License: | Apache License (≥ 2) |
| URL: | https://github.com/macmillancontentscience/morphemepiece |
| NeedsCompilation: | no |
| Materials: | README,NEWS |
| CRAN checks: | morphemepiece results |
| Reference manual: | morphemepiece.html, morphemepiece.pdf |
| Vignettes: | Testing the fall-through algorithm (source, R code); Generating a Vocabulary and Lookup (source, R code) |
| Package source: | morphemepiece_1.2.3.tar.gz |
| Windows binaries: | r-devel: morphemepiece_1.2.3.zip, r-release: morphemepiece_1.2.3.zip, r-oldrel: morphemepiece_1.2.3.zip |
| macOS binaries: | r-release (arm64): morphemepiece_1.2.3.tgz, r-oldrel (arm64): morphemepiece_1.2.3.tgz, r-release (x86_64): morphemepiece_1.2.3.tgz, r-oldrel (x86_64): morphemepiece_1.2.3.tgz |
| Old sources: | morphemepiece archive |
Please use the canonical form https://CRAN.R-project.org/package=morphemepiece to link to this page.