mrIML: Multi-Response (Multivariate) Interpretable Machine Learning

Overview

This package aims to enable users to build and interpret multivariate machine learning models harnessing the tidyverse (tidy model syntax in particular). This package builds off ideas from Gradient Forests (Ellis et al., 2012), ecological genomic approaches (Fitzpatrick & Keller, 2015), and multi-response stacking algorithms (Xing et al., 2020).

This package can be of use for any multi-response machine learning problem, but was designed to handle data common to community ecology (site by species data) and ecological genomics (individual or population by SNP loci).

How to Install

You can install mrIML with install.packages(), or install the development version using devtools:

    install.packages("mrIML")

    # Install development version
    devtools::install_github('nickfountainjones/mrIML')

Using mrIML

To get started, load mrIML and tidymodels:

    library(mrIML)
    library(tidymodels)
    #> ── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
    #> ✔ broom        1.0.8     ✔ recipes      1.3.0
    #> ✔ dials        1.4.0     ✔ rsample      1.3.0
    #> ✔ dplyr        1.1.4     ✔ tibble       3.2.1
    #> ✔ ggplot2      3.5.2     ✔ tidyr        1.3.1
    #> ✔ infer        1.0.8     ✔ tune         1.3.0
    #> ✔ modeldata    1.4.0     ✔ workflows    1.2.0
    #> ✔ parsnip      1.3.1     ✔ workflowsets 1.1.0
    #> ✔ purrr        1.0.4     ✔ yardstick    1.3.2
    #> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
    #> ✖ purrr::discard() masks scales::discard()
    #> ✖ dplyr::filter()  masks stats::filter()
    #> ✖ dplyr::lag()     masks stats::lag()
    #> ✖ recipes::step()  masks stats::step()

Many functions in mrIML benefit from parallel processing, which can be enabled by setting a future plan before calling them:

    future::plan("multisession", workers = 2)
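If the number of available cores is not known in advance, the plan can be set more defensively. The following is a minimal sketch using only functions from the future package; the cap of two workers simply mirrors the example above, and the variable names are illustrative.

```r
library(future)

# Never request more workers than the machine reports as available
n_workers <- min(2, availableCores())

# plan() returns the previous plan, so it can be restored later
old_plan <- plan(multisession, workers = n_workers)

# Confirm how many workers the active plan provides
active_workers <- nbrOfWorkers()
print(active_workers)

# ... run mrIML functions here ...

# Restore the previous plan once the parallel work is done
plan(old_plan)
```

Restoring the earlier plan at the end avoids leaving background R sessions running for the rest of the session.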

The core function of mrIML is mrIMLpredicts(), which is a wrapper around the tidymodels workflow that fits a provided model to each response variable in a multi-response data set.

    # Load example multi-response data
    data <- MRFcov::Bird.parasites

    # Split into response and predictor data
    Y <- data %>%
      select(-c("scale.prop.zos"))
    X <- data %>%
      select(scale.prop.zos)

    # Define tidymodel
    model <- rand_forest(
      trees = 100,
      mode = "classification",
      mtry = tune(),
      min_n = tune()
    ) %>%
      set_engine("randomForest")

    # Fit multi-response model
    mrIML_model <- mrIMLpredicts(
      X = X,
      Y = Y,
      Model = model,
      prop = 0.7,
      k = 5
    )
    #>   |======================================================================| 100%
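The idea of fitting one model per response variable can be illustrated without mrIML at all. The sketch below is a conceptual illustration in base R, not mrIML's implementation: it simulates a single predictor and three binary responses (all names and effect sizes are hypothetical) and fits a separate logistic regression to each response column.

```r
# Conceptual sketch only: this is NOT how mrIMLpredicts() is implemented.
set.seed(42)

# Simulated data: one predictor and three binary responses (hypothetical)
n <- 200
X <- data.frame(env = rnorm(n))
Y <- data.frame(
  sp_a = rbinom(n, 1, plogis(1.5 * X$env)),   # positive response to env
  sp_b = rbinom(n, 1, plogis(-1.0 * X$env)),  # negative response to env
  sp_c = rbinom(n, 1, 0.5)                    # unrelated to env
)

# Fit one logistic regression per response column
fits <- lapply(names(Y), function(resp) {
  dat <- data.frame(y = Y[[resp]], X)
  glm(y ~ ., data = dat, family = binomial())
})
names(fits) <- names(Y)

# Collect the estimated effect of `env` on each response
slopes <- vapply(fits, function(f) unname(coef(f)["env"]), numeric(1))
print(round(slopes, 2))
```

Keeping the fitted models in a named list, one element per response, makes it straightforward to apply the same summary or interpretation step to every response in turn.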

The object mrIML_model can be investigated using:

Two multi-response models can be compared using mrPerformance().

Bootstrapping can be implemented using mrBootstrap(), which can then be used to quantify uncertainty around partial dependence plots, mrPdPlotBootstrap(), and variable importance, mrvipBootstrap(), as well as build co-occurrence networks using mrCoOccurNet().
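For readers unfamiliar with how bootstrapping quantifies uncertainty, the following base R sketch shows the general technique on a single simulated response. It does not use or reproduce mrBootstrap(); the data, the logistic model, and the number of resamples are all assumptions chosen for illustration.

```r
# Conceptual sketch only: a generic nonparametric bootstrap, not mrBootstrap().
set.seed(1)

# Simulated data: one predictor and one binary response (hypothetical)
n <- 150
dat <- data.frame(env = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.2 * dat$env))

# Resample rows with replacement, refit, and keep the slope each time
n_boot <- 200
boot_slopes <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)
  fit <- glm(y ~ env, data = dat[idx, ], family = binomial())
  unname(coef(fit)["env"])
})

# Percentile interval summarising uncertainty in the slope estimate
ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
print(ci)
```

The same resample-and-refit loop applies to any model summary, such as a variable importance score or a partial dependence curve, by swapping the quantity extracted from each refit.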

Recent mrIML publications

  1. Fountain-Jones, N. M., Kozakiewicz, C. P., Forester, B. R., Landguth, E. L., Carver, S., Charleston, M., Gagne, R. B., Greenwell, B., Kraberger, S., Trumbo, D. R., Mayer, M., Clark, N. J., & Machado, G. (2021). MrIML: Multi-response interpretable machine learning to model genomic landscapes. Molecular Ecology Resources, 21, 2766–2781. https://doi.org/10.1111/1755-0998.13495

  2. Sykes, A. L., Silva, G. S., Holtkamp, D. J., Mauch, B. W., Osemeke, O., Linhares, D. C. L., & Machado, G. (2021). Interpretable machine learning applied to on-farm biosecurity and porcine reproductive and respiratory syndrome virus. Transboundary and Emerging Diseases, 00, 1–15. https://doi.org/10.1111/tbed.14369

  3. Fountain-Jones, N. M., Appaw, R., Alkhamis, M., Baker, S., Clark, N., Powell-Romero, F., Mayer, M., Machado, G., & Videvall, E. (2024). Advancing ecological community analysis with MrIML 2.0: Unravelling taxa associations through interpretable machine learning. Authorea [preprint]. https://doi.org/10.22541/au.172676147.77148600/v1

References

Ellis, N., Smith, S. J., & Pitcher, C. R. (2012). Gradient forests: calculating importance gradients on physical predictors. Ecology, 93, 156–168. https://doi.org/10.1890/11-0252.1

Fitzpatrick, M. C., & Keller, S. R. (2015). Ecological genomics meets community-level modelling of biodiversity: Mapping the genomic landscape of current and future environmental adaptation. Ecology Letters, 18, 1–16. https://doi.org/10.1111/ele.12376

Xing, L., Lesperance, M. L., & Zhang, X. (2020). Simultaneous prediction of multiple outcomes using revised stacking algorithms. Bioinformatics, 36, 65–72. https://doi.org/10.1093/bioinformatics/btz531

