Apply 'Wordpiece' (<doi:10.48550/arXiv.1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<doi:10.48550/arXiv.1810.04805>) tokenization conventions are used by default.
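The core idea of WordPiece tokenization is a greedy longest-match-first split of each word against a fixed vocabulary. The sketch below illustrates that idea using BERT-style conventions: pieces that continue a word carry a `##` prefix, and a word that cannot be fully matched becomes `[UNK]`. It is written in Python for illustration only and is not this package's R API. The function names `wordpiece_word` and `tokenize`, the toy vocabulary, and the `max_chars` cutoff are hypothetical. Pre-tokenization is also simplified to lowercasing plus whitespace and punctuation splitting, with no accent stripping or CJK handling.

```python
import re


def wordpiece_word(word, vocab, unk_token="[UNK]", max_chars=100):
    """Greedy longest-match-first split of one word (hypothetical helper)."""
    if len(word) > max_chars:
        return [unk_token]  # overly long words are mapped straight to unknown
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # BERT-style continuation prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab, lowercase=True):
    """Simplified pre-tokenization followed by WordPiece (hypothetical)."""
    if lowercase:
        text = text.lower()
    # Split into runs of word characters and single punctuation marks.
    words = re.findall(r"\w+|[^\w\s]", text)
    tokens = []
    for w in words:
        tokens.extend(wordpiece_word(w, vocab))
    return tokens


vocab = {"un", "##aff", "##able", "play", "##ing", "!", "the"}  # toy vocabulary
print(tokenize("The unaffable playing!", vocab))
# ['the', 'un', '##aff', '##able', 'play', '##ing', '!']
```

Because the matcher always takes the longest vocabulary entry available at each position, the result depends entirely on the vocabulary supplied. This is why the tokenizer is only useful when paired with the vocabulary that the downstream model was trained with.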
| Version: | 2.1.3 |
| Depends: | R (≥ 3.3.0) |
| Imports: | dlr (≥ 1.0.0), fastmatch (≥ 1.1), memoise (≥ 2.0.0), piecemaker (≥ 1.0.0), rlang, stringi (≥ 1.0), wordpiece.data (≥ 1.0.2) |
| Suggests: | covr, knitr, rmarkdown, testthat (≥ 3.0.0) |
| Published: | 2022-03-03 |
| DOI: | 10.32614/CRAN.package.wordpiece |
| Author: | Jonathan Bratt |
| Maintainer: | Jonathan Bratt <jonathan.bratt at macmillan.com> |
| BugReports: | https://github.com/macmillancontentscience/wordpiece/issues |
| License: | Apache License (≥ 2) |
| URL: | https://github.com/macmillancontentscience/wordpiece |
| NeedsCompilation: | no |
| Materials: | README, NEWS |
| CRAN checks: | wordpiece results |
| Reference manual: | wordpiece.html, wordpiece.pdf |
| Vignettes: | Using wordpiece (source, R code) |
| Package source: | wordpiece_2.1.3.tar.gz |
| Windows binaries: | r-devel: wordpiece_2.1.3.zip, r-release: wordpiece_2.1.3.zip, r-oldrel: wordpiece_2.1.3.zip |
| macOS binaries: | r-release (arm64): wordpiece_2.1.3.tgz, r-oldrel (arm64): wordpiece_2.1.3.tgz, r-release (x86_64): wordpiece_2.1.3.tgz, r-oldrel (x86_64): wordpiece_2.1.3.tgz |
| Old sources: | wordpiece archive |
| Reverse suggests: | textrecipes |
Please use the canonical form https://CRAN.R-project.org/package=wordpiece to link to this page.