Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
dselivanov/text2vec
text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).
The goals we aimed to achieve in developing text2vec:
- Concise - expose as few functions as possible
- Consistent - expose unified interfaces, so there is no need to learn a new interface for each task
- Flexible - make complex tasks easy to solve
- Fast - maximize efficiency per single thread, transparently scale to multiple threads on multicore machines
- Memory efficient - use streams and iterators, and avoid keeping data in RAM where possible
See the API section for details.
This package is efficient because it is carefully written in C++, which also means that text2vec is memory friendly. Some parts are fully parallelized using OpenMP.
Other embarrassingly parallel tasks (such as vectorization) can use any fork-based parallel backend on UNIX-like machines. They can achieve near-linear scalability with the number of available cores.
Finally, a streaming API means that users do not have to load all the data into RAM.
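As a rough illustration of the iterator-based workflow described above, here is a minimal sketch. It assumes the `movie_review` sample data set shipped with the package and, for brevity, builds the iterator over an in-memory character vector; the lowercasing preprocessor and the word tokenizer are illustrative choices, not requirements.

```r
library(text2vec)

# Sample data bundled with text2vec (assumed here purely for illustration)
data("movie_review")

# Build an iterator over tokens; documents are processed in chunks
it <- itoken(movie_review$review,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             ids = movie_review$id,
             progressbar = FALSE)

# First pass over the iterator: collect the vocabulary
vocab <- create_vocabulary(it)

# Second pass: map each document to a sparse document-term matrix
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)

dim(dtm)
```

The same iterator is consumed by both `create_vocabulary()` and `create_dtm()`, which is the pattern that lets downstream steps work on chunks of input rather than on one fully materialized corpus.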
The package has an issue tracker on GitHub where I file feature requests and notes for future work. Any ideas are appreciated.
Contributors are welcome. You can help by:
- testing and leaving feedback on the GitHub issue tracker (preferably) or directly by e-mail
- forking and contributing (check our code style guide). Vignettes, docs, tests, and use cases are very welcome
- giving me a star on the project page :-)
License: GPL (>= 2)
