Download, parse, store, and load text datasets instead of storing them in packages


EmilHvitfeldt/textdata


The goal of textdata is to provide access to text-related datasets without bundling them inside a package. Some text datasets are too large to store within an R package, or are licensed in a way that prevents them from being included in an OSS-licensed package. Instead, this package provides a framework to download, parse, and store the datasets on disk and load them when needed.

Installation

You can install the released version of textdata from CRAN with:

install.packages("textdata")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("EmilHvitfeldt/textdata")

Example

The first time you use one of the functions for accessing an included text dataset, such as lexicon_afinn() or dataset_sentence_polarity(), the function will prompt you to agree that you understand the dataset’s license or terms of use and then download the dataset to your computer.

After the first use, each time you use a function like lexicon_afinn(), the function will load the dataset from disk.
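The download-then-load behaviour described above can be sketched as follows. This is a minimal illustration, not output from the package's own documentation: it assumes an interactive R session with network access, and the `interactive()` guard is an addition here so that the script does nothing when run non-interactively (the license prompt needs a person to answer it).

```r
library(textdata)

if (interactive()) {
  # First call: shows the dataset's license or terms of use, asks for
  # confirmation, then downloads and stores the dataset on disk.
  afinn <- lexicon_afinn()

  # Later calls skip the prompt and the download and read the stored copy.
  afinn_again <- lexicon_afinn()

  # Inspect the first few rows of the lexicon.
  head(afinn)
}
```

The same pattern should apply to the other accessors, such as dataset_sentence_polarity(); only the function name changes.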

Included text datasets

As of today, the datasets included in textdata are:

| Dataset | Function |
| --- | --- |
| v1.0 sentence polarity dataset | dataset_sentence_polarity() |
| AFINN-111 sentiment lexicon | lexicon_afinn() |
| Hu and Liu’s opinion lexicon | lexicon_bing() |
| NRC word-emotion association lexicon | lexicon_nrc() |
| NRC Emotion Intensity Lexicon | lexicon_nrc_eil() |
| The NRC Valence, Arousal, and Dominance Lexicon | lexicon_nrc_vad() |
| Loughran and McDonald’s opinion lexicon for financial documents | lexicon_loughran() |
| AG’s News | dataset_ag_news() |
| DBpedia ontology | dataset_dbpedia() |
| Trec-6 and Trec-50 | dataset_trec() |
| IMDb Large Movie Review Dataset | dataset_imdb() |
| Stanford NLP GloVe pre-trained word vectors | embedding_glove6b(), embedding_glove27b(), embedding_glove42b(), embedding_glove840b() |

Check out each function’s documentation for detailed information (including citations) for the relevant dataset.
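As a small sketch of how to reach that documentation, the standard R help system can be used once the package is installed. lexicon_afinn is used here only as an example topic; any function from the table works the same way.

```r
library(textdata)

# Look up the help page for one of the dataset functions.
# At the console, ?lexicon_afinn is the equivalent shorthand.
afinn_help <- help("lexicon_afinn", package = "textdata")

# Printing the result opens or displays the help page.
print(afinn_help)
```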

Community Guidelines

Note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. Feedback, bug reports (and fixes!), and feature requests are welcome; file issues or seek support here. For details on how to add a new dataset to this package, check out the vignette!
