| Title: | Word Factor Vectors |
| Version: | 0.0.1 |
| Description: | A user-friendly factor-like interface for converting strings of text into numeric vectors and rectangular data structures. |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | xgboost, tokenizers, text2vec, R6, utils, tibble, ggplot2, stats, Matrix |
| URL: | https://github.com/mkearney/wactor |
| BugReports: | https://github.com/mkearney/wactor/issues |
| RoxygenNote: | 7.0.2 |
| License: | MIT + file LICENSE |
| Suggests: | testthat (≥ 2.1.0), covr |
| NeedsCompilation: | no |
| Packaged: | 2019-12-13 05:40:20 UTC; kmw |
| Author: | Michael W. Kearney |
| Maintainer: | Michael W. Kearney <kearneymw@missouri.edu> |
| Repository: | CRAN |
| Date/Publication: | 2019-12-18 15:30:02 UTC |
A wactor object
Description
A factor-like class for word vectors
Methods
Public methods
Method new()
Usage
Wactr$new(text = character(), tokenizer = NULL, max_words = 1000, doc_prop_max = 1, doc_prop_min = 0)
Arguments
max_words | Maximum number of words in the vocabulary |
doc_prop_max | Maximum proportion of docs for terms in the dictionary |
doc_prop_min | Minimum proportion of docs for terms in the dictionary |
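To illustrate what these three arguments control, the following is a minimal base-R sketch of vocabulary pruning on a small count matrix. The helper `prune_vocab()` is hypothetical and not part of wactor; the order in which the filters are applied, and the use of total term counts to rank terms, are assumptions made for illustration only.

```r
# Hypothetical helper (not part of wactor): prune the columns (terms) of a
# document-by-term count matrix using the same three knobs as Wactr$new().
prune_vocab <- function(counts, max_words = 1000,
                        doc_prop_max = 1, doc_prop_min = 0) {
  # Proportion of documents in which each term appears at least once
  doc_prop <- colSums(counts > 0) / nrow(counts)
  keep <- doc_prop >= doc_prop_min & doc_prop <= doc_prop_max
  counts <- counts[, keep, drop = FALSE]
  # Assumption: rank the remaining terms by total count, keep the top max_words
  ord <- order(colSums(counts), decreasing = TRUE)
  ord <- ord[seq_len(min(max_words, length(ord)))]
  counts[, ord, drop = FALSE]
}

# Three documents, three terms
counts <- matrix(
  c(2, 1, 0,
    0, 1, 1,
    0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# Keep terms that occur in at least half of the documents, at most one term
pruned <- prune_vocab(counts, max_words = 1, doc_prop_min = 0.5)
colnames(pruned)
```

Here "a" is dropped because it appears in only one of three documents, and `max_words = 1` then keeps the more frequent of the two remaining terms.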
Method clone()
The objects of this class are cloneable with this method.
Usage
Wactr$clone(deep = FALSE)
Arguments
deep | Whether to make a deep clone. |
As wactor
Description
Convert data into object of type 'wactor'
Usage
as_wactor(.x, ...)
Arguments
.x | Input text vector |
... | Other args passed to Wactr$new(...) |
Value
An object of type wactor
Document term frequency
Description
Converts a character vector into a document term matrix (dtm)
Usage
dtm(object, .x = NULL)
Arguments
object | Input object containing dictionary (column), e.g., wactor |
.x | Text from which the document term matrix will be created |
Value
A c-style matrix
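To make the idea of a document term matrix concrete, here is a minimal base-R sketch. It is independent of wactor's internals: the helper `simple_dtm()` is hypothetical, it returns a dense matrix, and it splits on whitespace, which is an assumption made for illustration rather than the package's tokenizer.

```r
# Hypothetical helper (not part of wactor): count how often each vocabulary
# term occurs in each document. Rows are documents, columns are terms.
simple_dtm <- function(docs, vocab = NULL) {
  tokens <- strsplit(tolower(docs), "\\s+")
  if (is.null(vocab)) {
    vocab <- sort(unique(unlist(tokens)))
  }
  m <- matrix(0, nrow = length(docs), ncol = length(vocab),
              dimnames = list(NULL, vocab))
  for (i in seq_along(tokens)) {
    # Tokens outside the vocabulary become NA and are not counted
    m[i, ] <- as.numeric(table(factor(tokens[[i]], levels = vocab)))
  }
  m
}

# Build the vocabulary from the initial documents
docs <- c("a b a", "b c", "c c c")
m <- simple_dtm(docs)

# Reuse that vocabulary on new text; the unseen term "z" is ignored
m_new <- simple_dtm("a z", vocab = colnames(m))
```

Fixing the vocabulary up front is what lets new text be mapped onto the same columns as the original data.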
Examples
## create wactor
w <- wactor(letters)
## use wactor to create dtm of same vector
dtm(w, letters)
## using the initial data is the default; so you don't actually have to
## respecify it
dtm(w)
## use wactor to create dtm on new vector
dtm(w, c("a", "e", "i", "o", "u"))
## apply directly to character vector
dtm(letters)
Split into test and train data sets
Description
Randomly partition input into a list of train and test data sets
Usage
split_test_train(.data, .p = 0.8, ...)
Arguments
.data | Input data. If atomic (numeric, integer, character, etc.), the input is first converted to a data frame with a column named "x". |
.p | Proportion of data that should be used for the train data set. |
... | Optional. The response (outcome) variable. Uses tidy evaluation (quotes are not necessary). This is only relevant if the identified variable is categorical (i.e., character, factor, or logical), in which case it is used to ensure a uniform distribution for the train data set. |
Value
A list with train and test tibbles (data.frames)
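As a rough illustration of partitioning with a categorical response, the base-R sketch below samples the training rows separately within each level of the label. The helper `simple_split()` is hypothetical, takes the label as a column name string rather than using tidy evaluation, and uses plain stratified sampling; wactor's own balancing of response levels may work differently.

```r
# Hypothetical helper (not part of wactor): split a data frame into train
# and test parts, optionally sampling within each level of a label column.
simple_split <- function(data, p = 0.8, label = NULL) {
  n <- nrow(data)
  if (is.null(label)) {
    train_idx <- sample.int(n, size = round(n * p))
  } else {
    # Row indices grouped by response level
    groups <- split(seq_len(n), data[[label]])
    train_idx <- unlist(lapply(groups, function(idx) {
      idx[sample.int(length(idx), size = round(length(idx) * p))]
    }), use.names = FALSE)
  }
  in_train <- seq_len(n) %in% train_idx
  list(train = data[in_train, , drop = FALSE],
       test = data[!in_train, , drop = FALSE])
}

set.seed(1)
d <- data.frame(
  x = rnorm(100),
  z = c(rep("a", 80), rep("b", 20))
)
s <- simple_split(d, p = 0.8, label = "z")
table(s$train$z)
```

With 80 "a" rows and 20 "b" rows, sampling 80 percent within each level puts 64 "a" rows and 16 "b" rows in the training part, so both parts keep the original 4:1 ratio.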
Examples
## example data frame
d <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  z = c(rep("a", 80), rep("b", 20))
)
## split using defaults
split_test_train(d)
## split 0.60/0.40
split_test_train(d, 0.60)
## split with equal response level obs
split_test_train(d, 0.80, label = z)
## apply to atomic data
split_test_train(letters)
Term frequency inverse document frequency
Description
Converts a character vector into a term frequency inverse document frequency (TFIDF) matrix
Usage
tfidf(object, .x = NULL)
Arguments
object | Input object containing dictionary (column), e.g., wactor |
.x | Text from which the tfidf matrix will be created |
Value
A c-style matrix
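For readers unfamiliar with the weighting, here is a minimal base-R sketch of the textbook TFIDF calculation on a small count matrix. The helper `simple_tfidf()` is hypothetical and uses the unsmoothed definition (term share within a document times the log of documents over document frequency); the values returned by wactor may differ if the underlying implementation applies smoothing or a different normalization.

```r
# Hypothetical helper (not part of wactor): textbook TFIDF weights from a
# document-by-term count matrix.
simple_tfidf <- function(counts) {
  # Term frequency: each count divided by its document's total count
  tf <- counts / pmax(rowSums(counts), 1)
  # Document frequency: number of documents containing each term
  df <- colSums(counts > 0)
  idf <- log(nrow(counts) / pmax(df, 1))
  # Multiply every column by that term's idf
  sweep(tf, 2, idf, `*`)
}

counts <- matrix(
  c(2, 1, 0,
    0, 1, 1,
    0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
w <- simple_tfidf(counts)
```

Term "a" occurs in only one of the three documents, so it gets the largest idf, log(3), while "b" and "c" each occur in two documents and get log(1.5).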
Examples
## create wactor
w <- wactor(letters)
## use wactor to create tfidf of same vector
tfidf(w, letters)
## using the initial data is the default; so you don't actually have to
## respecify it
tfidf(w)
## use wactor to create tfidf on new vector
tfidf(w, c("a", "e", "i", "o", "u"))
## apply directly to character vector
tfidf(letters)
Create wactor
Description
Create an object of type 'wactor'
Usage
wactor(.x, ...)
Arguments
.x | Input text vector |
... | Other args passed to Wactr$new(...) |
Value
An object of type wactor
Examples
## create
w <- wactor(c("a", "a", "a", "b", "b", "c"))
## summarize
summary(w)
## plot
plot(w)
## predict
predict(w)
## use on NEW data
dtm(w, letters[1:5])
## dtm() is the same as predict()
predict(w, letters[1:5])
## works if you specify 'newdata' too
predict(w, newdata = letters[1:5])
xgb matrix
Description
Simple wrapper for creating an xgboost matrix
Usage
xgb_mat(x, ..., y = NULL, split = NULL)
Arguments
x | Input data |
... | Other data to cbind |
y | Label vector |
split | Optional number between 0 and 1 indicating the desired split between train and test |
Value
An xgb.DMatrix
Examples
xgb_mat(data.frame(x = rnorm(20), y = rnorm(20)))