Title: Word Factor Vectors
Version: 0.0.1
Description: A user-friendly factor-like interface for converting strings of text into numeric vectors and rectangular data structures.
Encoding: UTF-8
LazyData: true
Imports: xgboost, tokenizers, text2vec, R6, utils, tibble, ggplot2, stats, Matrix
URL: https://github.com/mkearney/wactor
BugReports: https://github.com/mkearney/wactor/issues
RoxygenNote: 7.0.2
License: MIT + file LICENSE
Suggests: testthat (≥ 2.1.0), covr
NeedsCompilation:
Packaged: 2019-12-13 05:40:20 UTC; kmw
Author: Michael W. Kearney [aut, cre], Lingshu Hu [ctb]
Maintainer: Michael W. Kearney <kearneymw@missouri.edu>
Repository: CRAN
Date/Publication: 2019-12-18 15:30:02 UTC

A wactor object

Description

A factor-like class for word vectors

Methods

Public methods

Method `new()`

Usage

Wactr$new(
  text = character(),
  tokenizer = NULL,
  max_words = 1000,
  doc_prop_max = 1,
  doc_prop_min = 0
)

Arguments

max_words: Maximum number of words in vocabulary
doc_prop_max: Maximum proportion of docs for terms in dictionary.
doc_prop_min: Minimum proportion of docs for terms in dictionary.
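
The three arguments above describe limits on the dictionary. As a rough illustration of what such limits mean, here is a minimal base R sketch. It is not the package's implementation: `build_vocab` is a hypothetical helper, and the whitespace tokenizer and the "most widespread terms first" ordering are assumptions made for this example.

```r
# Hypothetical helper (not part of wactor): builds a pruned vocabulary in base R.
build_vocab <- function(text, max_words = 1000, doc_prop_max = 1, doc_prop_min = 0) {
  # Assumption: lowercase and split on whitespace; the package's tokenizer may differ
  tokens <- strsplit(tolower(text), "\\s+")
  # Count each term at most once per document
  doc_terms <- unlist(lapply(tokens, unique))
  doc_terms <- doc_terms[nzchar(doc_terms)]
  counts <- table(doc_terms)
  # Proportion of documents that contain each term
  doc_prop <- setNames(as.numeric(counts), names(counts)) / length(text)
  # Drop terms that are too rare or too common
  keep <- doc_prop[doc_prop >= doc_prop_min & doc_prop <= doc_prop_max]
  # Assumption: keep the most widespread terms first, up to max_words
  keep <- sort(keep, decreasing = TRUE)
  names(keep)[seq_len(min(max_words, length(keep)))]
}

docs <- c("the cat sat", "the dog sat", "the bird flew")
vocab_mid <- build_vocab(docs, doc_prop_max = 0.7, doc_prop_min = 0.5)
vocab_top <- build_vocab(docs, max_words = 2)
```

In this toy corpus "the" appears in every document and "sat" in two of three, so the 0.5 to 0.7 window keeps only "sat", while `max_words = 2` keeps the two most widespread terms.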

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Wactr$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

As wactor

Description

Convert data into object of type 'wactor'

Usage

as_wactor(.x, ...)

Arguments

.x

Input text vector

...

Other args passed to Wactr$new(...)

Value

An object of type wactor

Document term frequency

Description

Converts character vector into document term matrix (dtm)

Usage

dtm(object, .x = NULL)

Arguments

object

Input object containing dictionary (column), e.g., wactor

.x

Text from which the document term matrix will be created

Value

A c-style matrix

Examples

## create wactor
w <- wactor(letters)

## use wactor to create dtm of same vector
dtm(w, letters)

## using the initial data is the default; so you don't actually have to
## respecify it
dtm(w)

## use wactor to create dtm on new vector
dtm(w, c("a", "e", "i", "o", "u"))

## apply directly to character vector
dtm(letters)
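
To make the idea of a document term matrix concrete, the following base R sketch counts how often each dictionary term occurs in each document. It is only an illustration under stated assumptions: `simple_dtm` is a hypothetical helper, tokens are split on whitespace, and the result is a dense integer matrix, which may differ from what `dtm()` returns.

```r
# Hypothetical helper (not part of wactor): one row per document, one column per term.
simple_dtm <- function(vocab, text) {
  # Assumption: lowercase and split on whitespace
  tokens <- strsplit(tolower(text), "\\s+")
  rows <- lapply(tokens, function(tok) {
    idx <- match(tok, vocab)
    # Tokens outside the vocabulary are dropped before counting
    tabulate(idx[!is.na(idx)], nbins = length(vocab))
  })
  m <- do.call(rbind, rows)
  colnames(m) <- vocab
  m
}

m <- simple_dtm(c("a", "b", "c"), c("a a b", "c", "d"))
```

The third document contains only the out-of-vocabulary token "d", so its row is all zeros. This mirrors the situation in the examples above where a dictionary built from one vector is applied to a new vector.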

Split into test and train data sets

Description

Randomly partition input into a list of train and test data sets

Usage

split_test_train(.data, .p = 0.8, ...)

Arguments

.data

Input data. If atomic (numeric, integer, character, etc.), the input is first converted to a data frame with a column named "x".

.p

Proportion of data that should be used for the train data set output. The default value is 0.80, meaning the train output will include roughly 80 percent of the input cases while the test output will include roughly 20 percent.

...

Optional. The response (outcome) variable. Uses tidy evaluation (quotes are not necessary). This is only relevant if the identified variable is categorical (i.e., character, factor, or logical), in which case it is used to ensure a uniform distribution of response levels in the train output data set. If a value is supplied, uniformity in response level observations is prioritized over the .p (train proportion) value.

Value

A list with train and test tibbles (data.frames)

Examples

## example data frame
d <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  z = c(rep("a", 80), rep("b", 20))
)

## split using defaults
split_test_train(d)

## split 0.60/0.40
split_test_train(d, 0.60)

## split with equal response level obs
split_test_train(d, 0.80, label = z)

## apply to atomic data
split_test_train(letters)
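
The balancing behaviour described for the response variable can be sketched in base R as follows. This is an assumption-laden illustration, not the package's algorithm: `balanced_split` is a hypothetical helper, the label is passed as a string rather than through tidy evaluation, and the per-level training count is assumed to be `p` times the size of the smallest level.

```r
# Hypothetical helper (not part of wactor): equal training rows per response level.
balanced_split <- function(data, label, p = 0.8) {
  # Row indices grouped by response level
  groups <- split(seq_len(nrow(data)), data[[label]])
  # Assumption: per-level count is p times the size of the smallest level
  n_per_level <- floor(p * min(lengths(groups)))
  train_idx <- unlist(lapply(groups, function(i) {
    i[sample.int(length(i), n_per_level)]
  }))
  # Every row not sampled for training goes to the test set
  test_idx <- setdiff(seq_len(nrow(data)), train_idx)
  list(
    train = data[train_idx, , drop = FALSE],
    test = data[test_idx, , drop = FALSE]
  )
}

set.seed(1)
d <- data.frame(x = rnorm(100), z = c(rep("a", 80), rep("b", 20)))
s <- balanced_split(d, "z")
train_counts <- table(s$train$z)
```

Because every level contributes the same number of training rows, the train set here is much smaller than 80 percent of the input. This shows how balance can take priority over the requested proportion.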

Term frequency inverse document frequency

Description

Converts character vector into a term frequency inverse document frequency (TFIDF) matrix

Usage

tfidf(object, .x = NULL)

Arguments

object

Input object containing dictionary (column), e.g., wactor

.x

Text from which the tfidf matrix will be created

Value

A c-style matrix

Examples

## create wactor
w <- wactor(letters)

## use wactor to create tfidf of same vector
tfidf(w, letters)

## using the initial data is the default; so you don't actually have to
## respecify it
tfidf(w)

## use wactor to create tfidf on new vector
tfidf(w, c("a", "e", "i", "o", "u"))

## apply directly to character vector
tfidf(letters)
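
For readers unfamiliar with the weighting itself, here is a minimal base R sketch of one common textbook variant of TF-IDF: term frequency as a row proportion, multiplied by log(N / document frequency). The exact formula, smoothing, and normalization used by `tfidf()` are not stated in this manual, so treat `simple_tfidf` as a hypothetical helper rather than a description of the package.

```r
# Hypothetical helper (not part of wactor): weights a count matrix by TF-IDF.
simple_tfidf <- function(counts) {
  # Term frequency: each row divided by its total (empty rows stay zero)
  row_totals <- rowSums(counts)
  tf <- counts / ifelse(row_totals == 0, 1, row_totals)
  # Document frequency: number of documents containing each term
  df <- colSums(counts > 0)
  # Assumption: unsmoothed idf = log(N / df); pmax() avoids division by zero
  idf <- log(nrow(counts) / pmax(df, 1))
  # Multiply every column by its idf weight
  sweep(tf, 2, idf, `*`)
}

counts <- rbind(c(2, 1, 0), c(1, 0, 1), c(1, 0, 0))
colnames(counts) <- c("a", "b", "c")
weights <- simple_tfidf(counts)
```

Term "a" occurs in every document, so its inverse document frequency is log(1) = 0 and its whole column is zero, while the rarer terms "b" and "c" keep positive weights.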

Create wactor

Description

Create an object of type 'wactor'

Usage

wactor(.x, ...)

Arguments

.x

Input text vector

...

Other args passed to Wactr$new(...)

Value

An object of type wactor

Examples

## create
w <- wactor(c("a", "a", "a", "b", "b", "c"))

## summarize
summary(w)

## plot
plot(w)

## predict
predict(w)

## use on NEW data
dtm(w, letters[1:5])

## dtm() is the same as predict()
predict(w, letters[1:5])

## works if you specify 'newdata' too
predict(w, newdata = letters[1:5])

xgb matrix

Description

Simple wrapper for creating an xgboost matrix

Usage

xgb_mat(x, ..., y = NULL, split = NULL)

Arguments

x

Input data

...

Other data to cbind

y

Label vector

split

Optional number between 0 and 1 indicating the desired split between train and test

Value

An xgb.DMatrix

Examples

xgb_mat(data.frame(x = rnorm(20), y = rnorm(20)))
