
tokenizers.bpe: Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
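The sketch below illustrates what "learning" a BPE model means. It follows the merge-learning loop described in the cited Sennrich et al. paper. Starting from words split into characters, it repeatedly counts adjacent symbol pairs, weighted by word frequency, and merges the most frequent pair into a new symbol. This is a plain, single-threaded illustration of the technique only. It is not YouTokenToMe's optimized C++ implementation, and it is not this R package's API. The function names (`learn_bpe`, `apply_bpe`) and the `</w>` end-of-word marker are assumptions chosen for the example.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with the merged symbol 'ab'."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge operations from a {word: count} dict.

    Hypothetical helper for illustration; '</w>' marks the end of a word.
    """
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a new word by applying the learned merges in learned order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(counts, 10)
    print(merges[:3])
    print(apply_bpe("lowest", merges))
```

On the paper's toy corpus, the first three merges are `('e', 's')`, `('es', 't')` and `('est', '</w>')`, because the suffix "est" appears in both "newest" and "widest". An unseen word such as "lowest" is then segmented into subword units built from those merges, so rare words never fall completely out of vocabulary.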

Version: 0.1.4
Depends: R (≥ 2.10)
Imports: Rcpp (≥ 0.11.5)
LinkingTo: Rcpp
Published: 2025-09-05
DOI: 10.32614/CRAN.package.tokenizers.bpe
Author: Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), VK.com [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))
Maintainer: Jan Wijffels <jwijffels at bnosac.be>
License: MPL-2.0
URL: https://github.com/bnosac/tokenizers.bpe
NeedsCompilation: yes
Materials: README, NEWS
In views: NaturalLanguageProcessing
CRAN checks: tokenizers.bpe results

Documentation:

Reference manual: tokenizers.bpe.html, tokenizers.bpe.pdf

Downloads:

Package source: tokenizers.bpe_0.1.4.tar.gz
Windows binaries: r-devel: tokenizers.bpe_0.1.4.zip, r-release: tokenizers.bpe_0.1.4.zip, r-oldrel: tokenizers.bpe_0.1.4.zip
macOS binaries: r-release (arm64): tokenizers.bpe_0.1.4.tgz, r-oldrel (arm64): tokenizers.bpe_0.1.4.tgz, r-release (x86_64): tokenizers.bpe_0.1.4.tgz, r-oldrel (x86_64): tokenizers.bpe_0.1.4.tgz
Old sources: tokenizers.bpe archive

Reverse dependencies:

Reverse suggests: doc2vec, sentencepiece, textrecipes

Linking:

Please use the canonical form https://CRAN.R-project.org/package=tokenizers.bpe to link to this page.

