
Title:

Extra Recipes Steps for Dealing with Omics Data

Version:

0.0.3

Description:

Omics data (e.g. transcriptomics, proteomics, metagenomics...) offer a detailed and multi-dimensional perspective on the molecular components and interactions within complex biological (eco)systems. Analyzing these data requires adapted procedures, which are implemented as steps according to the 'recipes' package.

License:

GPL (≥ 3)

URL:

https://github.com/abichat/scimo

BugReports:

https://github.com/abichat/scimo/issues

Depends:

R (≥ 2.10), recipes (≥ 1.1)

Imports:

dplyr, generics, magrittr, rlang, stats, tibble, tidyr

Suggests:

ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0)

VignetteBuilder:

knitr

Config/testthat/edition:

Encoding:

UTF-8

LazyData:

false

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-07-24 18:38:45 UTC; antoinebichat

Author:

Antoine BICHAT [aut, cre], Julie AUBERT [ctb]

Maintainer:

Antoine BICHAT <antoine.bichat@proton.me>

Repository:

CRAN

Date/Publication:

2025-07-24 19:20:02 UTC

scimo: Extra Recipes Steps for Dealing with Omics Data

Description


Author(s)

Maintainer: Antoine BICHAT <antoine.bichat@proton.me> (ORCID)

Other contributors:

Julie AUBERT <julie.aubert@inrae.fr> (ORCID) [contributor]

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).

Abundance of Fungal Communities in Cheese

Description

Fungal community abundance of 74 ASVs sampled from the surface of three different French cheeses.

Usage

data("cheese_abundance", package = "scimo")
data("cheese_taxonomy", package = "scimo")

Format

For cheese_abundance, a tibble with columns:

sample: Sample ID.
cheese: Appellation of the cheese. One of Saint-Nectaire, Livarot or Epoisses.
rind_type: One of Natural or Washed.
other columns: Count of the ASV.

For cheese_taxonomy, a tibble with columns:

asv: Amplicon Sequence Variant (ASV) ID.
lineage: Character corresponding to a standard concatenation of taxonomic clades.
other columns: Clade to which the ASV belongs.

Source

This dataset comes from doi:10.24072/pcjournal.321.

Examples

data("cheese_abundance", package = "scimo")
cheese_abundance

data("cheese_taxonomy", package = "scimo")
cheese_taxonomy

Coefficient of variation

Description

Coefficient of variation

Usage

cv(x, na.rm = TRUE)

Arguments

x

A numeric vector.

na.rm

Logical indicating whether NA values should be stripped before the computation proceeds. Defaults to TRUE.

Value

The coefficient of variation of x.

Author(s)

Antoine Bichat

Examples

scimo:::cv(1:10)
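
As a rough illustration of the quantity involved, the sketch below computes the textbook coefficient of variation (standard deviation divided by mean) in base R. The helper name cv_sketch is hypothetical, and whether the internal cv() takes the absolute value of the mean or treats zero means specially is not stated here, so this is an assumption rather than the package's implementation.

```r
# Hypothetical helper: textbook coefficient of variation, sd / mean.
# Assumption: this follows the general definition, not necessarily scimo's cv().
cv_sketch <- function(x, na.rm = TRUE) {
  stats::sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

cv_sketch(1:10)
cv_sketch(c(2, 4, NA, 6))  # the NA is dropped because na.rm = TRUE
```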

Gene Expression of Pediatric Cancer

Description

Gene expression of 108 CCLE cell lines from 5 different pediatric cancers.

Usage

data("pedcan_expression", package = "scimo")

Format

A tibble with columns:

cell_line: Cell line name.
sex: One of Male, Female or Unknown.
event: One of Primary, Metastasis or Unknown.
disease: One of Neuroblastoma, Ewing Sarcoma, Rhabdomyosarcoma, Embryonal Tumor or Osteosarcoma.
other columns: Expression of the gene, given in log2(TPM + 1).

Source

This dataset is generated from DepMap Public 23Q4 primary files. https://depmap.org/portal/download/all/.

Examples

data("pedcan_expression", package = "scimo")
pedcan_expression

S3 methods for tracking which additional packages are needed for steps.

Description

Recipe-adjacent packages always list themselves as a required package so that the steps can function properly within parallel processing schemes.

Usage

## S3 method for class 'step_aggregate_hclust'
required_pkgs(x, ...)

## S3 method for class 'step_aggregate_list'
required_pkgs(x, ...)

## S3 method for class 'step_rownormalize_tss'
required_pkgs(x, ...)

## S3 method for class 'step_select_background'
required_pkgs(x, ...)

## S3 method for class 'step_select_cv'
required_pkgs(x, ...)

## S3 method for class 'step_select_kruskal'
required_pkgs(x, ...)

## S3 method for class 'step_select_wilcoxon'
required_pkgs(x, ...)

## S3 method for class 'step_taxonomy'
required_pkgs(x, ...)

Arguments

x

A recipe step

Value

A character vector

Feature aggregation step based on a hierarchical clustering

Description

Aggregate variables according to hierarchical clustering.

Usage

step_aggregate_hclust(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  n_clusters,
  fun_agg,
  dist_metric = "euclidean",
  linkage_method = "complete",
  res = NULL,
  prefix = "cl_",
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("aggregate_hclust")
)

## S3 method for class 'step_aggregate_hclust'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

n_clusters

Number of clusters to create.

fun_agg

Aggregation function like sum or mean.

dist_metric

Defaults to euclidean. See stats::dist() for more details.

linkage_method

Defaults to complete. See stats::hclust() for more details.

res

This parameter is only produced after the recipe has been trained.

prefix

A character string for the prefix of the resulting new variables.

keep_original_cols

A logical to keep the original variables in the output. Defaults to FALSE.

skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_aggregate_hclust object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_aggregate_hclust(all_numeric_predictors(),
                        n_clusters = 2, fun_agg = sum) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
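
To make the idea concrete, here is a minimal base-R sketch of clustering variables and then aggregating the members of each cluster. It assumes that the variables (columns) are what gets clustered, so the matrix is transposed before distances are computed; the step's internals may differ, and the object names are illustrative only.

```r
# Base-R sketch of clustering variables and aggregating within clusters.
# Assumption: columns are clustered, hence the transpose before dist().
x <- as.matrix(iris[, 1:4])

tree <- stats::hclust(stats::dist(t(x), method = "euclidean"),
                      method = "complete")
clusters <- stats::cutree(tree, k = 2)  # one cluster label per column

# Sum the columns that fall in the same cluster, row by row.
cluster_ids <- sort(unique(clusters))
aggregated <- sapply(cluster_ids, function(k) {
  rowSums(x[, clusters == k, drop = FALSE])
})
colnames(aggregated) <- paste0("cl_", cluster_ids)

clusters
head(aggregated)
```

Because the aggregation function here is a sum, the row totals of the aggregated matrix equal the row totals of the original one, which is a convenient sanity check.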

Feature aggregation step based on a defined list

Description

Aggregate variables according to prior knowledge.

Usage

step_aggregate_list(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  list_agg = NULL,
  fun_agg = NULL,
  others = "discard",
  name_others = "others",
  res = NULL,
  prefix = "agg_",
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("aggregate_list")
)

## S3 method for class 'step_aggregate_list'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

list_agg

Named list of aggregated variables.

fun_agg

Aggregation function like sum or mean.

others

Behavior for the selected variables in ... that are not present in list_agg. If discard (the default), they are not kept. If asis, they are kept without modification. If aggregate, they are aggregated in a new variable.

name_others

If others is set to aggregate, name of the aggregated variable. Not used otherwise.

res

This parameter is only produced after the recipe has been trained.

prefix

A character string for the prefix of the resulting new variables that are not named in list_agg.

keep_original_cols

A logical to keep the original variables in the output. Defaults to FALSE.

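skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.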

id

A character string that is unique to this step to identify it.

x

A step_aggregate_list object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

list_iris <- list(sepal.size = c("Sepal.Length", "Sepal.Width"),
                  petal.size = c("Petal.Length", "Petal.Width"))

rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_aggregate_list(all_numeric_predictors(),
                      list_agg = list_iris, fun_agg = prod) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
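
For readers who want to see the aggregation spelled out, the following base-R sketch applies a function row-wise to each group of columns in a named list. It assumes the aggregation function is applied across the columns of a group for each row; the variable names groups and aggregated are illustrative and not part of the package.

```r
# Base-R sketch of aggregating columns according to a named list.
# Assumption: the aggregation function is applied row-wise within each group.
groups <- list(sepal.size = c("Sepal.Length", "Sepal.Width"),
               petal.size = c("Petal.Length", "Petal.Width"))

aggregated <- as.data.frame(lapply(groups, function(cols) {
  apply(iris[, cols], 1, prod)  # product of the group's columns, per row
}))

head(aggregated)
```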

Feature normalization step using total sum scaling

Description

Normalize a set of variables by converting them to proportions, so that they sum to 1. Also known as simplex projection.

Usage

step_rownormalize_tss(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  res = NULL,
  skip = FALSE,
  id = rand_id("rownormalize_tss")
)

## S3 method for class 'step_rownormalize_tss'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

res

This parameter is only produced after the recipe has been trained.

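skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.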

id

A character string that is unique to this step to identify it.

x

A step_rownormalize_tss object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

rec <-
  recipe(Species ~ ., data = iris) %>%
  step_rownormalize_tss(all_numeric_predictors()) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
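
Total sum scaling itself is a one-line operation, sketched below in base R: each row is divided by its own total. The sketch assumes no row sums to zero, and the object names are illustrative only.

```r
# Base-R sketch of total sum scaling: divide each row by its total.
# Assumption: rows with a total of zero are not handled specially here.
x <- as.matrix(iris[, 1:4])

# Recycling runs down the columns, so element [i, j] is divided by rowSums(x)[i].
x_tss <- x / rowSums(x)

head(x_tss)
range(rowSums(x_tss))  # every row sums to 1, up to floating-point error
```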

Feature selection step using background level

Description

Select features that exceed a background level in at least a defined number of samples.

Usage

step_select_background(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  background_level = NULL,
  n_samples = NULL,
  prop_samples = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_background")
)

## S3 method for class 'step_select_background'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

background_level

Background level to exceed.

n_samples, prop_samples

Count or proportion of samples in which a feature must exceed background_level to be retained.

res

This parameter is only produced after the recipe has been trained.

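skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.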

id

A character string that is unique to this step to identify it.

x

A step_select_background object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_background(all_numeric_predictors(),
                         background_level = 4, prop_samples = 0.5) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
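
The filtering rule can be sketched in base R on a toy count matrix. Two points are assumptions here, since they are not spelled out above: "exceed" is read as strictly greater than, and the sample threshold is treated as inclusive. The matrix and its feature names are made up for illustration.

```r
# Base-R sketch of background filtering on a toy count matrix.
# Assumptions: "exceed" means strictly greater than; the sample threshold is inclusive.
counts <- cbind(f1 = c(0, 0, 1, 9),
                f2 = c(5, 6, 0, 7),
                f3 = c(8, 9, 7, 6))
background_level <- 4
prop_samples <- 0.5

n_above <- colSums(counts > background_level)  # samples above background, per feature
keep <- n_above >= prop_samples * nrow(counts)

n_above
counts[, keep, drop = FALSE]
```

Here f1 is above the background in only one of four samples, so it is dropped, while f2 and f3 are retained.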

Feature selection step using the coefficient of variation

Description

Select variables with the highest coefficient of variation.

Usage

step_select_cv(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_cv")
)

## S3 method for class 'step_select_cv'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

n_kept

Number of variables to keep.

prop_kept

A numeric value between 0 and 1 representing the proportion of variables to keep. n_kept and prop_kept are mutually exclusive.

cutoff

Threshold beyond which (below or above) the variables are discarded.

res

This parameter is only produced after the recipe has been trained.

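skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.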

id

A character string that is unique to this step to identify it.

x

A step_select_cv object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

rec <-
  recipe(Species ~ ., data = iris) %>%
  step_select_cv(all_numeric_predictors(), n_kept = 2) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
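
A base-R sketch of the same selection logic follows: compute a coefficient of variation per column, then keep the two largest. It assumes the textbook definition (standard deviation over mean) and a simple rank-based cut; the step itself may handle ties or missing values differently.

```r
# Base-R sketch: rank variables by coefficient of variation and keep the top two.
# Assumption: CV is sd / mean, and ties are not treated specially.
cvs <- sapply(iris[, 1:4], function(x) stats::sd(x) / mean(x))
keep <- rank(-cvs) <= 2

round(cvs, 3)
names(cvs)[keep]
```

On iris, the two petal measurements vary much more relative to their mean than the sepal measurements, so they are the ones retained.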

Feature selection step using Kruskal test

Description

Select variables with the lowest (adjusted) p-value of a Kruskal-Wallis test against an outcome.

Usage

step_select_kruskal(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  correction = "none",
  res = NULL,
  skip = FALSE,
  id = rand_id("select_kruskal")
)

## S3 method for class 'step_select_kruskal'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

outcome

Name of the variable to perform the test against.

n_kept

Number of variables to keep.

prop_kept

A numeric value between 0 and 1 representing the proportion of variables to keep. n_kept and prop_kept are mutually exclusive.

cutoff

Threshold beyond which (below or above) the variables are discarded.

correction

Multiple testing correction method. One of p.adjust.methods. Defaults to "none".

res

This parameter is only produced after the recipe has been trained.

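skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.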

id

A character string that is unique to this step to identify it.

x

A step_select_kruskal object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_kruskal(all_numeric_predictors(), outcome = "Species",
                      correction = "fdr", prop_kept = 0.5) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
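
The underlying computation can be approximated in base R with stats::kruskal.test() and stats::p.adjust(). The sketch below is not the step's implementation: how prop_kept is rounded to a number of variables is an assumption (floor() is used here), and ties between adjusted p-values are not treated specially.

```r
# Base-R sketch: Kruskal-Wallis p-value per variable, FDR-adjusted,
# then the half of the variables with the lowest adjusted p-values is kept.
# Assumption: prop_kept is converted to a count with floor().
p_raw <- sapply(iris[, 1:4], function(x) {
  stats::kruskal.test(x, iris$Species)$p.value
})
p_adj <- stats::p.adjust(p_raw, method = "fdr")

n_keep <- floor(0.5 * length(p_adj))
keep <- names(sort(p_adj))[seq_len(n_keep)]

p_adj
keep
```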

Feature selection step using Wilcoxon test

Description

Select variables with the lowest (adjusted) p-value of a Wilcoxon-Mann-Whitney test against an outcome.

Usage

step_select_wilcoxon(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  correction = "none",
  res = NULL,
  skip = FALSE,
  id = rand_id("select_wilcoxon")
)

## S3 method for class 'step_select_wilcoxon'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

outcome

Name of the variable to perform the test against.

n_kept

Number of variables to keep.

prop_kept

A numeric value between 0 and 1 representing the proportion of variables to keep. n_kept and prop_kept are mutually exclusive.

cutoff

Threshold beyond which (below or above) the variables are discarded.

correction

Multiple testing correction method. One of p.adjust.methods. Defaults to "none".

res

This parameter is only produced after the recipe has been trained.

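skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.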

id

A character string that is unique to this step to identify it.

x

A step_select_wilcoxon object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

rec <-
  iris %>%
  dplyr::filter(Species != "virginica") %>%
  recipe(formula = Species ~ .) %>%
  step_select_wilcoxon(all_numeric_predictors(), outcome = "Species",
                       correction = "fdr", prop_kept = 0.5) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)
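
A comparable base-R sketch uses stats::wilcox.test() on the two remaining species. It is an approximation under stated assumptions: the normal approximation is requested explicitly (exact = FALSE) because the data contain ties, and the step may set these options differently.

```r
# Base-R sketch: two-group Wilcoxon-Mann-Whitney p-value per variable, FDR-adjusted.
# Assumption: exact = FALSE, since ties rule out an exact p-value.
two_species <- iris[iris$Species != "virginica", ]
is_setosa <- two_species$Species == "setosa"

p_raw <- sapply(two_species[, 1:4], function(x) {
  stats::wilcox.test(x[is_setosa], x[!is_setosa], exact = FALSE)$p.value
})
p_adj <- stats::p.adjust(p_raw, method = "fdr")

p_adj
```

The Wilcoxon test compares exactly two groups, which is why the third species is filtered out before testing.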

Taxonomic clades feature generator

Description

Extract clades from a lineage, as defined in the {yatah} package.

Usage

step_taxonomy(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  rank = NULL,
  res = NULL,
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("taxonomy")
)

## S3 method for class 'step_taxonomy'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this step. See recipes::selections() for more details.

role

For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

rank

The desired ranks, a combination of "kingdom", "phylum", "class", "order", "family", "genus", "species", or "strain". See yatah::get_clade() for more details.

res

This parameter is only produced after the recipe has been trained.

keep_original_cols

A logical to keep the original variables in the output. Defaults to FALSE.

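skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake()? While all operations are baked when recipes::prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.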

id

A character string that is unique to this step to identify it.

x

A step_taxonomy object.

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Author(s)

Antoine Bichat

Examples

data("cheese_taxonomy")

rec <-
  cheese_taxonomy %>%
  select(asv, lineage) %>%
  recipe(~ .) %>%
  step_taxonomy(lineage, rank = c("order", "genus")) %>%
  prep()

rec
tidy(rec, 1)
bake(rec, new_data = NULL)

Decide which variable to keep

Description

Decide which variable to keep

Usage

var_to_keep(
  values,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  maximize = TRUE
)

Arguments

values

A numeric vector, with one value per variable to keep or discard.

n_kept

Number of variables to keep.

prop_kept

A numeric value between 0 and 1 representing the proportion of variables to keep. n_kept and prop_kept are mutually exclusive.

cutoff

Threshold beyond which (below or above) the variables are discarded.

maximize

Whether to minimize (FALSE) or maximize (TRUE, the default) the quantity given by values.

Value

A logical vector indicating if variables are kept or discarded.

Author(s)

Antoine Bichat

Examples

scimo:::var_to_keep(1:5, n_kept = 3, maximize = TRUE)
scimo:::var_to_keep(1:10, cutoff = 8, maximize = FALSE)
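
The n_kept case can be pictured with a small rank-based helper in base R. The function keep_top_n below is hypothetical and only illustrates the idea of returning a logical vector; it does not cover prop_kept or cutoff, and its handling of ties (ties.method = "min", which can keep more than n_kept values when scores are tied) is an assumption rather than the documented behaviour of var_to_keep().

```r
# Hypothetical helper: keep the n best values, returning a logical vector.
# Assumption: ties share the lowest rank, so tied values are kept together.
keep_top_n <- function(values, n_kept, maximize = TRUE) {
  scores <- if (maximize) -values else values
  rank(scores, ties.method = "min") <= n_kept
}

keep_top_n(1:5, n_kept = 3)                    # FALSE FALSE TRUE TRUE TRUE
keep_top_n(1:5, n_kept = 3, maximize = FALSE)  # TRUE TRUE TRUE FALSE FALSE
```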
