| Title: | Extra Recipes Steps for Dealing with Omics Data |
| Version: | 0.0.3 |
| Description: | Omics data (e.g. transcriptomics, proteomics, metagenomics...) offer a detailed and multi-dimensional perspective on the molecular components and interactions within complex biological (eco)systems. Analyzing these data requires adapted procedures, which are implemented as steps according to the 'recipes' package. |
| License: | GPL (≥ 3) |
| URL: | https://github.com/abichat/scimo |
| BugReports: | https://github.com/abichat/scimo/issues |
| Depends: | R (≥ 2.10), recipes (≥ 1.1) |
| Imports: | dplyr, generics, magrittr, rlang, stats, tibble, tidyr |
| Suggests: | ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| LazyData: | false |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2025-07-24 18:38:45 UTC; antoinebichat |
| Author: | Antoine BICHAT |
| Maintainer: | Antoine BICHAT <antoine.bichat@proton.me> |
| Repository: | CRAN |
| Date/Publication: | 2025-07-24 19:20:02 UTC |
scimo: Extra Recipes Steps for Dealing with Omics Data
Description

Omics data (e.g. transcriptomics, proteomics, metagenomics...) offer a detailed and multi-dimensional perspective on the molecular components and interactions within complex biological (eco)systems. Analyzing these data requires adapted procedures, which are implemented as steps according to the 'recipes' package.
Author(s)
Maintainer: Antoine BICHAT <antoine.bichat@proton.me> (ORCID)
Other contributors:
Julie AUBERT <julie.aubert@inrae.fr> (ORCID) [contributor]
See Also
Useful links:
- https://github.com/abichat/scimo
- Report bugs at https://github.com/abichat/scimo/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs | A value or the magrittr placeholder. |
rhs | A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Abundance of Fungal Communities in Cheese
Description
Fungal community abundance of 74 ASVs sampled from the surface of three different French cheeses.
Usage
data("cheese_abundance", package = "scimo")
data("cheese_taxonomy", package = "scimo")
Format
For cheese_abundance, a tibble with columns:
- sample
Sample ID.
- cheese
Appellation of the cheese. One of Saint-Nectaire, Livarot or Epoisses.
- rind_type
One of Natural or Washed.
- other columns
Count of the ASV.
For cheese_taxonomy, a tibble with columns:
- asv
Amplicon Sequence Variant (ASV) ID.
- lineage
Character corresponding to a standard concatenation of taxonomic clades.
- other columns
Clade to which the ASV belongs.
Source
This dataset comes from doi:10.24072/pcjournal.321.
Examples
data("cheese_abundance", package = "scimo")
cheese_abundance
data("cheese_taxonomy", package = "scimo")
cheese_taxonomy
Coefficient of variation
Description
Coefficient of variation
Usage
cv(x, na.rm = TRUE)
Arguments
x | A numeric vector. |
na.rm | Logical indicating whether NA values should be stripped before the computation proceeds. Defaults to TRUE. |
Value
The coefficient of variation of x.
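For reference, the coefficient of variation is conventionally the standard deviation divided by the mean. The sketch below uses that textbook definition in base R. `cv_sketch` is a hypothetical name, and the package's internal `cv()` may treat edge cases (for instance a zero or negative mean) differently.

```r
# Hypothetical helper: textbook coefficient of variation (sd / mean).
# This is an illustration, not the package's own implementation.
cv_sketch <- function(x, na.rm = TRUE) {
  stats::sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# NA values are dropped by default, mirroring the documented na.rm argument
cv_sketch(c(1, 2, 3, NA))
```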
Author(s)
Antoine Bichat
Examples
scimo:::cv(1:10)
Gene Expression of Pediatric Cancer
Description
Gene expression of 108 CCLE cell lines from 5 different pediatric cancers.
Usage
data("pedcan_expression", package = "scimo")
Format
A tibble with columns:
- cell_line
Cell line name.
- sex
One of Male, Female or Unknown.
- event
One of Primary, Metastasis or Unknown.
- disease
One of Neuroblastoma, Ewing Sarcoma, Rhabdomyosarcoma, Embryonal Tumor or Osteosarcoma.
- other columns
Expression of the gene, given in log2(TPM + 1).
Source
This dataset is generated from DepMap Public 23Q4 primary files. https://depmap.org/portal/download/all/.
Examples
data("pedcan_expression", package = "scimo")
pedcan_expression
S3 methods for tracking which additional packages are needed for steps.
Description
Recipe-adjacent packages always list themselves as a required package so that the steps can function properly within parallel processing schemes.
Usage
## S3 method for class 'step_aggregate_hclust'
required_pkgs(x, ...)

## S3 method for class 'step_aggregate_list'
required_pkgs(x, ...)

## S3 method for class 'step_rownormalize_tss'
required_pkgs(x, ...)

## S3 method for class 'step_select_background'
required_pkgs(x, ...)

## S3 method for class 'step_select_cv'
required_pkgs(x, ...)

## S3 method for class 'step_select_kruskal'
required_pkgs(x, ...)

## S3 method for class 'step_select_wilcoxon'
required_pkgs(x, ...)

## S3 method for class 'step_taxonomy'
required_pkgs(x, ...)
Arguments
x | A recipe step |
Value
A character vector
Feature aggregation step based on a hierarchical clustering
Description
Aggregate variables according to hierarchical clustering.
Usage
step_aggregate_hclust(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  n_clusters,
  fun_agg,
  dist_metric = "euclidean",
  linkage_method = "complete",
  res = NULL,
  prefix = "cl_",
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("aggregate_hclust")
)

## S3 method for class 'step_aggregate_hclust'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
n_clusters | Number of clusters to create. |
fun_agg | Aggregation function like |
dist_metric | Defaults to "euclidean". |
linkage_method | Defaults to "complete". |
res | This parameter is only produced after the recipe has been trained. |
prefix | A character string for the prefix of the resulting new variables. |
keep_original_cols | A logical to keep the original variables in the output. Defaults to FALSE. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
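To make the idea concrete, here is a minimal base R sketch of aggregating variables by hierarchical clustering. It assumes that the variables (columns) are the objects being clustered and that each cluster is collapsed row by row with a sum. It is an illustration of the technique, not the step's actual implementation.

```r
x <- as.matrix(iris[, 1:4])

# Cluster the variables, so distances are computed between columns (hence t())
tree <- stats::hclust(stats::dist(t(x), method = "euclidean"),
                      method = "complete")
clusters <- stats::cutree(tree, k = 2)

# Collapse the columns of each cluster into one aggregated column
agg <- sapply(split(colnames(x), clusters),
              function(cols) rowSums(x[, cols, drop = FALSE]))
colnames(agg) <- paste0("cl_", colnames(agg))

head(agg, 3)
```

Because the aggregation function here is a sum, the row totals of the aggregated matrix match those of the original variables.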
Author(s)
Antoine Bichat
Examples
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_aggregate_hclust(all_numeric_predictors(),
                        n_clusters = 2, fun_agg = sum) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Feature aggregation step based on a defined list
Description
Aggregate variables according to prior knowledge.
Usage
step_aggregate_list(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  list_agg = NULL,
  fun_agg = NULL,
  others = "discard",
  name_others = "others",
  res = NULL,
  prefix = "agg_",
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("aggregate_list")
)

## S3 method for class 'step_aggregate_list'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
list_agg | Named list of aggregated variables. |
fun_agg | Aggregation function like |
others | Behavior for the selected variables in |
name_others | If |
res | This parameter is only produced after the recipe has been trained. |
prefix | A character string for the prefix of the resulting new variables that are not named in |
keep_original_cols | A logical to keep the original variables in the output. Defaults to FALSE. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
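The aggregation itself can be pictured with a few lines of base R. The sketch below assumes that each element of the named list gives the columns to combine and that the aggregation function is applied row by row. It ignores the `others` and `prefix` options and is not the step's actual code.

```r
x <- iris[, 1:4]
list_agg <- list(sepal.size = c("Sepal.Length", "Sepal.Width"),
                 petal.size = c("Petal.Length", "Petal.Width"))

# For each named group, apply the aggregation function across its columns
agg <- sapply(list_agg,
              function(cols) apply(x[, cols, drop = FALSE], 1, prod))

head(agg, 3)
```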
Author(s)
Antoine Bichat
Examples
list_iris <- list(sepal.size = c("Sepal.Length", "Sepal.Width"),
                  petal.size = c("Petal.Length", "Petal.Width"))
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_aggregate_list(all_numeric_predictors(),
                      list_agg = list_iris, fun_agg = prod) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Feature normalization step using total sum scaling
Description
Normalize a set of variables by converting them to proportions, making them sum to 1. Also known as simplex projection.
Usage
step_rownormalize_tss(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  res = NULL,
  skip = FALSE,
  id = rand_id("rownormalize_tss")
)

## S3 method for class 'step_rownormalize_tss'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
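Total sum scaling amounts to dividing each value by the total of its row. The base R sketch below shows that arithmetic on the numeric columns of `iris`. It is meant only to illustrate the transformation, not to reproduce the step's internals.

```r
x <- as.matrix(iris[, 1:4])

# Dividing a matrix by a vector of length nrow(x) recycles down the columns,
# so every entry is divided by the total of its own row
tss <- x / rowSums(x)

head(tss, 3)
rowSums(tss)[1:3]
```

After the transformation every row sums to 1, which is why the values can be read as proportions.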
Author(s)
Antoine Bichat
Examples
rec <-
  recipe(Species ~ ., data = iris) %>%
  step_rownormalize_tss(all_numeric_predictors()) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Feature selection step using background level
Description
Select features that exceed a background level in at least a defined number of samples.
Usage
step_select_background(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  background_level = NULL,
  n_samples = NULL,
  prop_samples = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_background")
)

## S3 method for class 'step_select_background'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
background_level | Background level to exceed. |
n_samples, prop_samples | Count or proportion of samples in which a feature exceeds |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
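The selection rule can be sketched in base R as follows. Two assumptions are made here: "exceed" is read as strictly greater than the background level, and the proportion threshold is inclusive. The step itself may use different conventions.

```r
x <- iris[, 1:4]
background_level <- 4
prop_samples <- 0.5

# Proportion of samples in which each feature is above the background level
prop_above <- colMeans(x > background_level)

# Keep the features that are above background in enough samples
keep <- prop_above >= prop_samples

prop_above
names(keep)[keep]
```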
Author(s)
Antoine Bichat
Examples
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_background(all_numeric_predictors(),
                         background_level = 4, prop_samples = 0.5) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Feature selection step using the coefficient of variation
Description
Select variables with the highest coefficient of variation.
Usage
step_select_cv(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  res = NULL,
  skip = FALSE,
  id = rand_id("select_cv")
)

## S3 method for class 'step_select_cv'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
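As a rough picture of what selecting by coefficient of variation means, the base R sketch below computes sd / mean for each numeric column of `iris` and keeps the two largest. The sd / mean definition is an assumption about how the coefficient is computed, and the code is not taken from the package.

```r
x <- iris[, 1:4]

# Coefficient of variation of each variable (textbook sd / mean definition)
cv_values <- sapply(x, function(v) stats::sd(v) / mean(v))

# Keep the n_kept variables with the largest coefficient of variation
n_kept <- 2
kept <- names(sort(cv_values, decreasing = TRUE))[seq_len(n_kept)]

round(cv_values, 3)
kept
```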
Author(s)
Antoine Bichat
Examples
rec <-
  recipe(Species ~ ., data = iris) %>%
  step_select_cv(all_numeric_predictors(), n_kept = 2) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Feature selection step using Kruskal test
Description
Select variables with the lowest (adjusted) p-value of a Kruskal-Wallis test against an outcome.
Usage
step_select_kruskal(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  correction = "none",
  res = NULL,
  skip = FALSE,
  id = rand_id("select_kruskal")
)

## S3 method for class 'step_select_kruskal'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
outcome | Name of the variable to perform the test against. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
correction | Multiple testing correction method. One of |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
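The underlying computation can be approximated with `stats::kruskal.test()` and `stats::p.adjust()`. In this sketch each numeric column of `iris` is tested against `Species`, the p-values are adjusted with the "fdr" method, and the half with the lowest adjusted p-values is kept. Rounding the kept count up with `ceiling()` is an assumption, and the step may order or break ties differently.

```r
x <- iris[, 1:4]

# One Kruskal-Wallis test per variable, against the outcome
pvals <- sapply(x, function(v) {
  stats::kruskal.test(x = v, g = iris$Species)$p.value
})

# Multiple testing correction
padj <- stats::p.adjust(pvals, method = "fdr")

# Keep the proportion of variables with the lowest adjusted p-values
prop_kept <- 0.5
n_kept <- ceiling(prop_kept * length(padj))  # rounding rule is an assumption
kept <- names(sort(padj))[seq_len(n_kept)]

padj
kept
```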
Author(s)
Antoine Bichat
Examples
rec <-
  iris %>%
  recipe(formula = Species ~ .) %>%
  step_select_kruskal(all_numeric_predictors(), outcome = "Species",
                      correction = "fdr", prop_kept = 0.5) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Feature selection step using Wilcoxon test
Description
Select variables with the lowest (adjusted) p-value of a Wilcoxon-Mann-Whitney test against an outcome.
Usage
step_select_wilcoxon(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  correction = "none",
  res = NULL,
  skip = FALSE,
  id = rand_id("select_wilcoxon")
)

## S3 method for class 'step_select_wilcoxon'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
outcome | Name of the variable to perform the test against. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
correction | Multiple testing correction method. One of |
res | This parameter is only produced after the recipe has been trained. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
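A Wilcoxon-Mann-Whitney test compares two groups, so the outcome needs exactly two levels. The base R sketch below restricts `iris` to two species, runs `stats::wilcox.test()` on each numeric column and adjusts the p-values. Using `exact = FALSE` (to sidestep warnings about ties) is a choice made for this illustration, not something taken from the package.

```r
# Keep two species only, and drop the unused factor level
two <- droplevels(iris[iris$Species != "virginica", ])
groups <- split(two[, 1:4], two$Species)

# One two-sample test per variable
pvals <- sapply(names(two)[1:4], function(v) {
  stats::wilcox.test(groups[[1]][[v]], groups[[2]][[v]],
                     exact = FALSE)$p.value
})

padj <- stats::p.adjust(pvals, method = "fdr")
padj
```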
Author(s)
Antoine Bichat
Examples
rec <-
  iris %>%
  dplyr::filter(Species != "virginica") %>%
  recipe(formula = Species ~ .) %>%
  step_select_wilcoxon(all_numeric_predictors(), outcome = "Species",
                       correction = "fdr", prop_kept = 0.5) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Taxonomic clades feature generator
Description
Extract clades from a lineage, as defined in the {yatah} package.
Usage
step_taxonomy(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  rank = NULL,
  res = NULL,
  keep_original_cols = FALSE,
  skip = FALSE,
  id = rand_id("taxonomy")
)

## S3 method for class 'step_taxonomy'
tidy(x, ...)
Arguments
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
... | One or more selector functions to choose variables for this step. See |
role | For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
rank | The desired ranks, a combination of |
res | This parameter is only produced after the recipe has been trained. |
keep_original_cols | A logical to keep the original variables in the output. Defaults to FALSE. |
skip | A logical. Should the step be skipped when the recipe is baked by |
id | A character string that is unique to this step to identify it. |
x | A |
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
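To give a feel for what extracting a clade from a lineage involves, here is a base R sketch. It assumes lineages are written as rank-prefixed clades joined by "|" (for example "k__...|p__..."). Both the lineage string and the helper `get_clade` are made up for this illustration, and the actual parsing is delegated to {yatah} by the package.

```r
# Illustrative lineage, assuming the "letter__clade" form joined by "|"
lineage <- paste("k__Fungi", "p__Ascomycota", "c__Saccharomycetes",
                 "o__Saccharomycetales", "f__Debaryomycetaceae",
                 "g__Debaryomyces", sep = "|")

# Hypothetical helper: return the clade found at a given rank letter
get_clade <- function(lineage, rank_letter) {
  clades <- strsplit(lineage, "|", fixed = TRUE)[[1]]
  hit <- clades[startsWith(clades, paste0(rank_letter, "__"))]
  if (length(hit) == 0) NA_character_ else sub("^[a-z]__", "", hit[1])
}

get_clade(lineage, "o")
get_clade(lineage, "g")
```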
Author(s)
Antoine Bichat
Examples
data("cheese_taxonomy")
rec <-
  cheese_taxonomy %>%
  select(asv, lineage) %>%
  recipe(~ .) %>%
  step_taxonomy(lineage, rank = c("order", "genus")) %>%
  prep()
rec
tidy(rec, 1)
bake(rec, new_data = NULL)
Decide which variable to keep
Description
Decide which variable to keep
Usage
var_to_keep(
  values,
  n_kept = NULL,
  prop_kept = NULL,
  cutoff = NULL,
  maximize = TRUE
)
Arguments
values | A numeric vector, with one value per variable to keep or discard. |
n_kept | Number of variables to keep. |
prop_kept | A numeric value between 0 and 1 representing the proportion of variables to keep. |
cutoff | Threshold beyond which (below or above) the variables are discarded. |
maximize | Whether to minimize ( |
Value
A logical vector indicating if variables are kept or discarded.
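One plausible reading of this selection logic is sketched below in base R. `keep_sketch` is a hypothetical stand-in, not the package's function. It assumes that `prop_kept` is rounded up, that ties are broken by position, and that the cutoff is inclusive. The real `var_to_keep()` may differ on each of these points.

```r
# Hypothetical re-implementation of the keep/discard decision
keep_sketch <- function(values, n_kept = NULL, prop_kept = NULL,
                        cutoff = NULL, maximize = TRUE) {
  if (!is.null(prop_kept)) {
    n_kept <- ceiling(prop_kept * length(values))  # assumed rounding rule
  }
  if (!is.null(n_kept)) {
    n_kept <- min(n_kept, length(values))
    # Order so that the most desirable values come first
    ord <- order(values, decreasing = maximize)
    keep <- rep(FALSE, length(values))
    keep[ord[seq_len(n_kept)]] <- TRUE
    return(keep)
  }
  if (is.null(cutoff)) {
    stop("Supply one of n_kept, prop_kept or cutoff.")
  }
  # Assumed inclusive threshold on the side selected by maximize
  if (maximize) values >= cutoff else values <= cutoff
}

keep_sketch(1:5, n_kept = 3, maximize = TRUE)
keep_sketch(1:10, cutoff = 8, maximize = FALSE)
```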
Author(s)
Antoine Bichat
Examples
scimo:::var_to_keep(1:5, n_kept = 3, maximize = TRUE)
scimo:::var_to_keep(1:10, cutoff = 8, maximize = FALSE)