Title: Word Factor Vectors
Version: 0.0.1
Description: A user-friendly factor-like interface for converting strings of text into numeric vectors and rectangular data structures.
Encoding: UTF-8
LazyData: true
Imports: xgboost, tokenizers, text2vec, R6, utils, tibble, ggplot2, stats, Matrix
URL: https://github.com/mkearney/wactor
BugReports: https://github.com/mkearney/wactor/issues
RoxygenNote: 7.0.2
License: MIT + file LICENSE
Suggests: testthat (≥ 2.1.0), covr
NeedsCompilation: no
Packaged: 2019-12-13 05:40:20 UTC; kmw
Author: Michael W. Kearney [aut, cre], Lingshu Hu [ctb]
Maintainer: Michael W. Kearney <kearneymw@missouri.edu>
Repository: CRAN
Date/Publication: 2019-12-18 15:30:02 UTC

A wactor object

Description

A factor-like class for word vectors

Methods

Public methods


Method new()

Usage
Wactr$new(
  text = character(),
  tokenizer = NULL,
  max_words = 1000,
  doc_prop_max = 1,
  doc_prop_min = 0
)
Arguments
max_words

Maximum number of words in vocabulary

doc_prop_max

Maximum proportion of docs for terms in dictionary.

doc_prop_min

Minimum proportion of docs for terms in dictionary.
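
The three arguments above describe limits on the dictionary. The following base R sketch shows one plausible reading of them: terms are kept only if the share of documents containing them lies between doc_prop_min and doc_prop_max, and the dictionary is then capped at max_words terms. The toy documents, the whitespace tokenizer, and the tie-breaking rule are assumptions made for illustration; the package's own implementation may differ.

```r
## Hypothetical toy documents (not taken from the package)
docs <- c("a b c", "a b", "a d", "a")

## Assumption: a plain whitespace tokenizer; the package default may differ
tokens <- strsplit(docs, " ", fixed = TRUE)

## Share of documents in which each term appears at least once
all_terms <- sort(unique(unlist(tokens)))
doc_prop <- vapply(all_terms, function(term) {
  mean(vapply(tokens, function(doc) term %in% doc, logical(1)))
}, numeric(1))

## Assumed meaning of the three arguments
max_words <- 2
doc_prop_max <- 0.9
doc_prop_min <- 0.25

## Drop terms that are too common or too rare
keep <- doc_prop[doc_prop <= doc_prop_max & doc_prop >= doc_prop_min]

## Rank by document share (ties broken alphabetically) and cap the size
ranked <- keep[order(-keep, names(keep))]
vocab <- head(names(ranked), max_words)
vocab
```

In this sketch the term "a" is dropped because it appears in every document, which exceeds the assumed doc_prop_max of 0.9.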


Method clone()

The objects of this class are cloneable with this method.

Usage
Wactr$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


As wactor

Description

Convert data into object of type 'wactor'

Usage

as_wactor(.x, ...)

Arguments

.x

Input text vector

...

Other args passed to Wactr$new(...)

Value

An object of type wactor


Document term frequency

Description

Converts character vector into document term matrix (dtm)

Usage

dtm(object, .x = NULL)

Arguments

object

Input object containing dictionary (column), e.g., wactor

.x

Text from which the document term matrix will be created

Value

A c-style matrix

Examples

## create wactor
w <- wactor(letters)

## use wactor to create dtm of same vector
dtm(w, letters)

## using the initial data is the default; so you don't actually have to
## respecify it
dtm(w)

## use wactor to create dtm on new vector
dtm(w, c("a", "e", "i", "o", "u"))

## apply directly to character vector
dtm(letters)
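
To make the idea of a document term matrix concrete, here is a minimal base R sketch that counts how often each dictionary term occurs in each document. It does not call the package; the toy documents, the fixed vocabulary, and the dense matrix are assumptions for illustration, and the matrix returned by dtm() may be stored differently.

```r
## Hypothetical dictionary and documents (not taken from the package)
vocab <- c("a", "b", "c")
docs <- c("a a b", "c", "b d")

## Assumption: a plain whitespace tokenizer
tokens <- strsplit(docs, " ", fixed = TRUE)

## One row per document, one column per dictionary term;
## tokens outside the dictionary (here "d") are ignored
dtm_sketch <- t(vapply(tokens, function(doc) {
  as.numeric(table(factor(doc, levels = vocab)))
}, numeric(length(vocab))))
colnames(dtm_sketch) <- vocab
dtm_sketch
```

The column set is fixed by the dictionary, which is why the same dictionary can be applied to new text and still produce a matrix with the same columns.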

Split into test and train data sets

Description

Randomly partition input into a list of train and test data sets

Usage

split_test_train(.data, .p = 0.8, ...)

Arguments

.data

Input data. If atomic (numeric, integer, character, etc.), the input is first converted to a data frame with a column name of "x".

.p

Proportion of data that should be used for the train data set output. The default value is 0.80, meaning the train output will include roughly 80 pct. of the input cases while the test output will include roughly 20 pct.

...

Optional. The response (outcome) variable. Uses tidy evaluation (quotes are not necessary). This is only relevant if the identified variable is categorical (i.e., character, factor, or logical), in which case it is used to ensure a uniform distribution of response levels in the train output data set. If a value is supplied, uniformity in response level observations is prioritized over the .p (train proportion) value.

Value

A list with train and test tibbles (data.frames)

Examples

## example data frame
d <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  z = c(rep("a", 80), rep("b", 20))
)

## split using defaults
split_test_train(d)

## split 0.60/0.40
split_test_train(d, 0.60)

## split with equal response level obs
split_test_train(d, 0.80, label = z)

## apply to atomic data
split_test_train(letters)
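
The trade-off between a uniform response distribution and the .p value can be seen in a small base R sketch. It assumes one possible strategy: draw the same number of rows from every response level, sized by the smallest level. That rule, and the name n_per_level, are assumptions for illustration only; the package may pick the per-level count differently.

```r
set.seed(1)

## Same shape as the example data frame above: 80 "a" rows, 20 "b" rows
d <- data.frame(
  x = rnorm(100),
  z = c(rep("a", 80), rep("b", 20))
)
p <- 0.8

## Assumption: every level contributes the same number of train rows,
## sized by the smallest level
n_per_level <- floor(p * min(table(d$z)))

## Sample row indices within each level
rows_by_level <- split(seq_len(nrow(d)), d$z)
train_rows <- unlist(lapply(rows_by_level, function(rows) {
  rows[sample.int(length(rows), n_per_level)]
}))

split_sketch <- list(
  train = d[train_rows, , drop = FALSE],
  test = d[-train_rows, , drop = FALSE]
)
table(split_sketch$train$z)
```

Under this assumed rule the train set holds 32 of the 100 rows rather than 80, which illustrates how balancing the response levels can take priority over the requested train proportion.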

Term frequency inverse document frequency

Description

Converts character vector into a term frequency inverse document frequency (TFIDF) matrix

Usage

tfidf(object, .x = NULL)

Arguments

object

Input object containing dictionary (column), e.g., wactor

.x

Text from which the tfidf matrix will be created

Value

A c-style matrix

Examples

## create wactor
w <- wactor(letters)

## use wactor to create tfidf of same vector
tfidf(w, letters)

## using the initial data is the default; so you don't actually have to
## respecify it
tfidf(w)

## use wactor to create tfidf on new vector
tfidf(w, c("a", "e", "i", "o", "u"))

## apply directly to character vector
tfidf(letters)
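
For readers unfamiliar with the weighting itself, the base R sketch below computes one common textbook variant of TFIDF: term frequency (count divided by document length) multiplied by log(N / number of documents containing the term). The toy documents and the choice of variant are assumptions; the exact weighting and normalization used by tfidf() may differ.

```r
## Hypothetical dictionary and documents (not taken from the package)
vocab <- c("a", "b", "c")
docs <- c("a a b", "a c", "b c c")
tokens <- strsplit(docs, " ", fixed = TRUE)

## Raw term counts: one row per document, one column per term
counts <- t(vapply(tokens, function(doc) {
  as.numeric(table(factor(doc, levels = vocab)))
}, numeric(length(vocab))))
colnames(counts) <- vocab

## Term frequency: counts scaled by document length
tf <- counts / rowSums(counts)

## Inverse document frequency: log(N / number of docs containing the term)
idf <- log(nrow(counts) / colSums(counts > 0))

## Multiply each column of tf by the idf of its term
tfidf_sketch <- sweep(tf, 2, idf, `*`)
tfidf_sketch
```

A term that appears in every document gets an idf of log(1) = 0 under this variant, so it contributes nothing to the weighted matrix.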

Create wactor

Description

Create an object of type 'wactor'

Usage

wactor(.x, ...)

Arguments

.x

Input text vector

...

Other args passed to Wactr$new(...)

Value

An object of type wactor

Examples

## create
w <- wactor(c("a", "a", "a", "b", "b", "c"))

## summarize
summary(w)

## plot
plot(w)

## predict
predict(w)

## use on NEW data
dtm(w, letters[1:5])

## dtm() is the same as predict()
predict(w, letters[1:5])

## works if you specify 'newdata' too
predict(w, newdata = letters[1:5])

xgb matrix

Description

Simple wrapper for creating an xgboost matrix

Usage

xgb_mat(x, ..., y = NULL, split = NULL)

Arguments

x

Input data

...

Other data to cbind

y

Label vector

split

Optional number between 0 and 1 indicating the desired split between train and test

Value

An xgb.DMatrix

Examples

xgb_mat(data.frame(x = rnorm(20), y = rnorm(20)))
