Extra recipes for Text Processing

License

Unknown (LICENSE), MIT (LICENSE.md)

tidymodels/textrecipes


R-CMD-check · Codecov test coverage · CRAN status · Downloads · Lifecycle: maturing

Introduction

textrecipes contains extra steps for the recipes package for preprocessing text data.

Installation

You can install the released version of textrecipes from CRAN with:

install.packages("textrecipes")

Install the development version from GitHub with:

# install.packages("pak")
pak::pak("tidymodels/textrecipes")

Example

In the following example we go through the steps needed to convert a character variable to the TF-IDF of its tokenized words, after removing stop words and limiting ourselves to the 10 most used words. The preprocessing is applied to the variables medium and artist.

library(recipes)
library(textrecipes)
library(modeldata)

data("tate_text")

okc_rec <- recipe(~ medium + artist, data = tate_text) |>
  step_tokenize(medium, artist) |>
  step_stopwords(medium, artist) |>
  step_tokenfilter(medium, artist, max_tokens = 10) |>
  step_tfidf(medium, artist)

okc_obj <- okc_rec |>
  prep()

str(bake(okc_obj, tate_text))
#> tibble [4,284 × 20] (S3: tbl_df/tbl/data.frame)
#>  $ tfidf_medium_colour     : num [1:4284] 2.31 0 0 0 0 ...
#>  $ tfidf_medium_etching    : num [1:4284] 0 0.86 0.86 0.86 0 ...
#>  $ tfidf_medium_gelatin    : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_medium_lithograph : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_medium_paint      : num [1:4284] 0 0 0 0 2.35 ...
#>  $ tfidf_medium_paper      : num [1:4284] 0 0.422 0.422 0.422 0 ...
#>  $ tfidf_medium_photograph : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_medium_print      : num [1:4284] 0 0 0 0 0 ...
#>  $ tfidf_medium_screenprint: num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_medium_silver     : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_akram      : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_beuys      : num [1:4284] 0 0 0 0 0 ...
#>  $ tfidf_artist_ferrari    : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_john       : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_joseph     : num [1:4284] 0 0 0 0 0 ...
#>  $ tfidf_artist_león       : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_richard    : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_schütte    : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_thomas     : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ tfidf_artist_zaatari    : num [1:4284] 0 0 0 0 0 0 0 0 0 0 ...
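To make the four stages of the recipe more concrete, here is a minimal base R sketch of the same sequence (tokenize, drop stop words, keep the most frequent tokens, compute TF-IDF) on a tiny made-up corpus. The toy documents, the stop word list, and the variable names are all assumptions for illustration. The weighting is a textbook formulation (term frequency as a within-document proportion, IDF as log(N / df)) and is not claimed to match the defaults of step_tfidf().

```r
# Toy corpus; these strings are illustrative, not taken from tate_text.
docs <- c(
  "Etching on paper",
  "Oil paint on canvas",
  "Etching and aquatint on paper",
  "Oil paint on paper"
)

# 1. Tokenize: lowercase, then split on runs of non-letters.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# 2. Remove stop words (a tiny hand-picked list, assumed for this sketch).
stop_words <- c("on", "and")
tokens <- lapply(tokens, function(x) x[nzchar(x) & !x %in% stop_words])

# 3. Keep only the `max_tokens` most frequent tokens across the corpus.
max_tokens <- 4
counts <- sort(table(unlist(tokens)), decreasing = TRUE)
vocab <- sort(names(counts)[seq_len(min(max_tokens, length(counts)))])
tokens <- lapply(tokens, function(x) x[x %in% vocab])

# 4. TF-IDF: one row per document, one column per retained token.
tf <- t(sapply(tokens, function(x) {
  as.numeric(table(factor(x, levels = vocab))) / max(length(x), 1)
}))
colnames(tf) <- vocab

doc_freq <- colSums(tf > 0)              # number of documents containing each token
idf <- log(length(docs) / doc_freq)      # textbook IDF, no smoothing
tfidf <- sweep(tf, 2, idf, `*`)
colnames(tfidf) <- paste0("tfidf_", vocab)

round(tfidf, 3)
```

The result has the same general shape as the baked output: one numeric column per retained token, prefixed with tfidf_, and zeros wherever a document does not contain that token.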

Breaking changes

As of version 0.4.0, step_lda() no longer accepts character variables and instead takes tokenlist variables.

The following recipe

recipe(~ text_var, data = data) |>
  step_lda(text_var)

can be replaced with the following recipe to achieve the same results:

lda_tokenizer <- function(x) text2vec::word_tokenizer(tolower(x))

recipe(~ text_var, data = data) |>
  step_tokenize(
    text_var,
    custom_token = lda_tokenizer
  ) |>
  step_lda(text_var)
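The custom_token argument expects a function that takes a character vector and returns a list of character vectors, one element per input string. As a sketch of that contract without depending on text2vec, a hypothetical base R tokenizer might look like the following. The name simple_tokenizer is made up here, and its splitting rule is not claimed to reproduce text2vec::word_tokenizer() exactly.

```r
# Hypothetical tokenizer: lowercase the text, then split on any run of
# characters that are not letters or digits.
simple_tokenizer <- function(x) {
  pieces <- strsplit(tolower(x), "[^[:alnum:]]+")
  # Drop empty strings, which appear when a text starts with punctuation.
  lapply(pieces, function(p) p[nzchar(p)])
}

# Returns a list with one character vector of tokens per input string.
simple_tokenizer(c("Oil paint, on canvas.", "Screenprint on paper"))
```

A function with this shape could then be supplied as custom_token = simple_tokenizer in step_tokenize() ahead of step_lda().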

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
