Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarities. The package provides a source-agnostic streaming API that lets researchers analyze document collections larger than available RAM. All core functions are parallelized to benefit from multicore machines.
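As a rough illustration of the vectorization workflow described above, the sketch below builds a document-term matrix from a tiny in-memory corpus. The two sample sentences and the ids `doc1`/`doc2` are made up for the example. A real larger-than-RAM workflow would feed the iterator from files or another external source rather than a character vector.

```r
library(text2vec)

# Toy corpus (hypothetical data, for illustration only)
docs <- c(
  "Text vectorization turns documents into numbers",
  "Topic models summarize collections of documents"
)

# Token iterator: lower-cases each document and splits it into words.
# Downstream functions consume this iterator instead of the raw text.
it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             ids = c("doc1", "doc2"),
             progressbar = FALSE)

# Collect the vocabulary, then map documents onto it
vocab <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocab)

# Sparse document-term matrix: one row per document, one column per term
dtm <- create_dtm(it, vectorizer)
dim(dtm)
```

The resulting sparse matrix can then be passed to the package's topic models or similarity functions.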
| Version: | 0.6.6 |
| Depends: | R (≥ 3.6.0), methods |
| Imports: | Matrix (≥ 1.5-2), Rcpp (≥ 1.0.3), R6 (≥ 2.3.0), data.table (≥ 1.9.6), rsparse (≥ 0.3.3.4), stringi (≥ 1.1.5), mlapi (≥ 0.1.0), lgr (≥ 0.2), digest (≥ 0.6.8) |
| LinkingTo: | Rcpp, digest (≥ 0.6.8) |
| Suggests: | magrittr, udpipe (≥ 0.6), glmnet, testthat, covr, knitr, rmarkdown, proxy, LDAvis |
| Published: | 2025-12-01 |
| DOI: | 10.32614/CRAN.package.text2vec |
| Author: | Dmitriy Selivanov [aut, cre, cph], Manuel Bickel [aut, cph] (Coherence measures for topic models), Qing Wang [aut, cph] (Author of the WarpLDA C++ code) |
| Maintainer: | Dmitriy Selivanov <selivanov.dmitriy at gmail.com> |
| BugReports: | https://github.com/dselivanov/text2vec/issues |
| License: | GPL-2 | GPL-3 | file LICENSE [expanded from: GPL (≥ 2) | file LICENSE] |
| URL: | http://text2vec.org |
| NeedsCompilation: | yes |
| Materials: | README, NEWS |
| In views: | NaturalLanguageProcessing |
| CRAN checks: | text2vec results |