Download, parse, store, and load text datasets instead of storing them in packages
License
Unknown, MIT licenses found
EmilHvitfeldt/textdata
The goal of textdata is to provide easy access to text-related datasets without bundling them inside a package. Some text datasets are too large to store within an R package or are licensed in a way that prevents them from being included in an OSS-licensed package. Instead, this package provides a framework to download, parse, and store the datasets on disk and load them when needed.
You can install the released version of textdata from CRAN with:

    install.packages("textdata")

And the development version from GitHub with:

    # install.packages("remotes")
    remotes::install_github("EmilHvitfeldt/textdata")
The first time you use one of the functions for accessing an included text dataset, such as `lexicon_afinn()` or `dataset_sentence_polarity()`, the function will prompt you to agree that you understand the dataset’s license or terms of use and then download the dataset to your computer.
After the first use, each time you use a function like `lexicon_afinn()`, the function will load the dataset from disk.
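The download-then-cache behavior described above can be sketched as follows. This is a minimal illustration, not an excerpt from the package documentation: it assumes an interactive R session with network access (the license prompt needs a reply), and it assumes the returned AFINN table has `word` and `value` columns.

```r
library(textdata)

# First call: in an interactive session this asks you to confirm that you
# understand the dataset's license, then downloads and stores the lexicon.
afinn <- lexicon_afinn()

# Later calls skip the prompt and the download and read the stored copy
# from disk instead.
afinn <- lexicon_afinn()

# Inspect the first rows (assumed columns: `word` and `value`).
head(afinn)
```

Because the prompt requires a reply, the first call is best made interactively before the function is used in a script.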
As of today, the datasets included in textdata are:
| Dataset | Function |
|---|---|
| v1.0 sentence polarity dataset | dataset_sentence_polarity() |
| AFINN-111 sentiment lexicon | lexicon_afinn() |
| Hu and Liu’s opinion lexicon | lexicon_bing() |
| NRC word-emotion association lexicon | lexicon_nrc() |
| NRC Emotion Intensity Lexicon | lexicon_nrc_eil() |
| The NRC Valence, Arousal, and Dominance Lexicon | lexicon_nrc_vad() |
| Loughran and McDonald’s opinion lexicon for financial documents | lexicon_loughran() |
| AG’s News | dataset_ag_news() |
| DBpedia ontology | dataset_dbpedia() |
| Trec-6 and Trec-50 | dataset_trec() |
| IMDb Large Movie Review Dataset | dataset_imdb() |
| Stanford NLP GloVe pre-trained word vectors | embedding_glove6b() |
| | embedding_glove27b() |
| | embedding_glove42b() |
| | embedding_glove840b() |
Check out each function’s documentation for detailed information (including citations) for the relevant dataset.
Note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. Feedback, bug reports (and fixes!), and feature requests are welcome; file issues or seek support here. For details on how to add a new dataset to this package, check out the vignette!

