EmilHvitfeldt/textdata


Download, parse, store, and load text datasets instead of storing them in packages

License

Unknown, MIT licenses found


textdata

The goal of textdata is to provide easy access to text-related datasets without bundling them inside a package. Some text datasets are too large to store within an R package, or are licensed in a way that prevents them from being included in an OSS-licensed package. Instead, this package provides a framework to download, parse, and store the datasets on disk and load them when needed.
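The download, parse, store, and load cycle described above is essentially a disk cache. The sketch below illustrates that pattern in base R under stated assumptions: `load_cached()`, `fetch`, and `parse` are hypothetical names invented for this illustration, they are not part of textdata's API, and this is not the package's internal implementation (textdata additionally asks you to accept each dataset's license before downloading, as described in the Example section).

```r
# Hypothetical helper illustrating the download -> parse -> store -> load
# pattern. It is NOT textdata's internal implementation.
load_cached <- function(name, fetch, parse, dir = tempdir()) {
  # One .rds file per dataset inside the cache directory
  path <- file.path(dir, paste0(name, ".rds"))

  # Load: if the parsed dataset is already on disk, just read it back
  if (file.exists(path)) {
    return(readRDS(path))
  }

  # Download: `fetch` returns the raw data (here, lines of a text file)
  raw <- fetch()

  # Parse: turn the raw data into a data frame
  data <- parse(raw)

  # Store: save the parsed result so later calls skip download and parsing
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  saveRDS(data, path)
  data
}

# --- Usage with a fake, offline "download" -------------------------------
calls <- 0
fake_fetch <- function() {
  # Stands in for a real download; counts how often it is invoked
  calls <<- calls + 1
  c("good\t3", "bad\t-3")
}

parse_tsv <- function(lines) {
  # Split each "word<TAB>value" line into two columns
  parts <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    word  = vapply(parts, `[`, character(1), 1),
    value = as.integer(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# Start from an empty cache directory so the first call really "downloads"
cache_dir <- file.path(tempdir(), "load_cached_demo")
unlink(cache_dir, recursive = TRUE)

first  <- load_cached("toy_lexicon", fake_fetch, parse_tsv, dir = cache_dir)
second <- load_cached("toy_lexicon", fake_fetch, parse_tsv, dir = cache_dir)
```

In this sketch the second call reads the stored `.rds` file, so `fake_fetch()` runs only once. Caching the parsed result rather than the raw download means later loads skip both the network and the parsing step.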

Installation

You can install the released version of textdata from CRAN with:

install.packages("textdata")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("EmilHvitfeldt/textdata")

Example

The first time you use one of the functions for accessing an included text dataset, such as `lexicon_afinn()` or `dataset_sentence_polarity()`, the function will prompt you to agree that you understand the dataset’s license or terms of use and then download the dataset to your computer.

After the first use, each time you use a function like `lexicon_afinn()`, the function will load the dataset from disk.
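A minimal usage sketch, assuming textdata is installed from CRAN. It relies on the `return_path` argument of the dataset functions, which, as we understand the package documentation, returns the cache folder for a dataset without prompting or downloading. Because the license prompt needs a person to answer it, the calls that load data are wrapped in `if (interactive())`.

```r
library(textdata)

# Folder where the AFINN lexicon is (or will be) cached. With
# return_path = TRUE the function returns this path without prompting
# or downloading anything.
afinn_dir <- lexicon_afinn(return_path = TRUE)
print(afinn_dir)

# The calls below need someone to answer the license prompt, so they
# only run in an interactive session.
if (interactive()) {
  # First call: asks you to confirm the license/terms of use, then
  # downloads the dataset and stores it under afinn_dir
  afinn <- lexicon_afinn()

  # Later calls: read the stored copy from disk, with no prompt or download
  afinn <- lexicon_afinn()
  print(head(afinn))
}
```

The same pattern applies to the other functions listed below; each one manages its own dataset in the cache.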

Included text datasets

The datasets currently included in textdata are:

| Dataset | Function |
|---|---|
| v1.0 sentence polarity dataset | `dataset_sentence_polarity()` |
| AFINN-111 sentiment lexicon | `lexicon_afinn()` |
| Hu and Liu’s opinion lexicon | `lexicon_bing()` |
| NRC word-emotion association lexicon | `lexicon_nrc()` |
| NRC Emotion Intensity Lexicon | `lexicon_nrc_eil()` |
| NRC Valence, Arousal, and Dominance Lexicon | `lexicon_nrc_vad()` |
| Loughran and McDonald’s opinion lexicon for financial documents | `lexicon_loughran()` |
| AG’s News | `dataset_ag_news()` |
| DBpedia ontology | `dataset_dbpedia()` |
| TREC-6 and TREC-50 | `dataset_trec()` |
| IMDb Large Movie Review Dataset | `dataset_imdb()` |
| Stanford NLP GloVe pre-trained word vectors | `embedding_glove6b()`, `embedding_glove27b()`, `embedding_glove42b()`, `embedding_glove840b()` |

Check out each function’s documentation for detailed information (including citations) about the relevant dataset.

Community Guidelines

Note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. Feedback, bug reports (and fixes!), and feature requests are welcome; file issues or seek support in this repository’s GitHub issue tracker. For details on how to add a new dataset to this package, check out the vignette!

About


emilhvitfeldt.github.io/textdata/

Topics

r rstats text-datasets


Releases: 8 (latest: textdata 0.4.5, May 28, 2024)


Contributors: 6
