Algorithms package for generating model metagenomes with specified properties


ctlab/samovar


Automated re-profiling & benchmarking of metagenomic tools based on artificial data generation


There is a fundamental problem in modern metagenomics: methodological approaches differ substantially and strongly influence the results, yet these differences usually escape researchers' attention.

Adopting gold-standard practices and open code makes analyses reproducible, but it also locks researchers into a single, far-from-perfect approach with its own biases.

Therefore, we propose SamovaR, an approach based on de novo generation of artificial metagenomes.

Installation

Quick Installation

Warning: beta

Use the installation script:

git clone https://github.com/ctlab/samovar
cd samovar
chmod +x install.sh
./install.sh

Attention: the script automatically detects custom R library paths from .Renviron (R_LIBS) or .Rprofile (.libPaths()).
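If the script picks up an unexpected library path, it can help to see which paths your dotfiles actually declare. The sketch below is a hypothetical Python helper (not part of install.sh or samovar) that applies the same idea: it reads R_LIBS assignments from .Renviron and quoted paths passed to .libPaths() in .Rprofile. It uses simple line-based parsing, so it is an approximation of what R itself does, not a replacement for it.

```python
import os
import re
from pathlib import Path
from typing import List, Optional

# R_LIBS=... assignment in .Renviron, optionally quoted
_RENVIRON_RE = re.compile(r'^\s*R_LIBS\s*=\s*["\']?([^"\'\n]+)["\']?\s*$', re.MULTILINE)
# Any single- or double-quoted string
_QUOTED_RE = re.compile(r'["\']([^"\']+)["\']')


def r_lib_paths_from_text(renviron: str = "", rprofile: str = "") -> List[str]:
    """Collect library paths declared in .Renviron / .Rprofile contents (hypothetical helper)."""
    paths: List[str] = []
    for match in _RENVIRON_RE.finditer(renviron):
        # R_LIBS may list several directories separated by the platform path separator
        paths.extend(p.strip() for p in match.group(1).split(os.pathsep) if p.strip())
    for line in rprofile.splitlines():
        code = line.split("#", 1)[0]  # ignore R comments
        start = code.find(".libPaths(")
        if start != -1:
            # Take every quoted string passed to .libPaths(), e.g. c("~/R", .libPaths())
            paths.extend(_QUOTED_RE.findall(code[start:]))
    return list(dict.fromkeys(paths))  # de-duplicate, keep first-seen order


def r_lib_paths(home: Optional[Path] = None) -> List[str]:
    """Read ~/.Renviron and ~/.Rprofile if present and return the declared paths."""
    home = Path.home() if home is None else home

    def read(name: str) -> str:
        f = home / name
        return f.read_text(encoding="utf-8") if f.is_file() else ""

    return r_lib_paths_from_text(read(".Renviron"), read(".Rprofile"))
```

Calling r_lib_paths() with no arguments inspects the files in your home directory; project-level .Renviron/.Rprofile files would need to be passed explicitly via home.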

Manual Installation

Install the R package:

devtools::install_github("https://github.com/ctlab/samovar/")

Attention: check that samovar can be loaded with Rscript -e 'library(samovar)', especially if several R versions are installed.
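When several R versions are installed, the same check can be run against each Rscript binary in turn. The following is a hypothetical Python helper (not part of samovar) that wraps exactly the command above; the rscript argument lets you point it at a specific installation, and a missing binary is reported as "not loadable" rather than raising. R package names are case-sensitive, so pass the name exactly as it is used in library().

```python
import subprocess


def r_package_loads(package: str, rscript: str = "Rscript", timeout: float = 120) -> bool:
    """Return True if `<rscript> -e 'library(<package>)'` exits with status 0."""
    try:
        result = subprocess.run(
            [rscript, "-e", f"library({package})"],
            capture_output=True,  # keep R's startup messages out of the console
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Rscript not found at that path (or it hung): treat as not loadable
        return False
    return result.returncode == 0
```

For example, r_package_loads("samovar", rscript="/opt/R/4.3/bin/Rscript") checks one specific installation; that path is only an illustration.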

Install the Python package:

git clone https://github.com/ctlab/samovar
cd samovar
pip install -e .

Attention: most samovar commands require a properly configured build/config.json file.
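A quick way to fail early, before launching a long run, is to confirm that build/config.json exists and parses. The sketch below is a hypothetical pre-flight check: it only verifies that the file is valid JSON with an object at the top level, because the expected keys are not documented here (see the wiki for those).

```python
import json
from pathlib import Path


def load_samovar_config(root: Path = Path(".")) -> dict:
    """Load <root>/build/config.json, raising a readable error if it is missing or malformed."""
    path = root / "build" / "config.json"
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; most samovar commands expect it")
    with path.open(encoding="utf-8") as fh:
        config = json.load(fh)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(config, dict):
        # Assumption: the configuration is a JSON object at the top level
        raise ValueError(f"{path} should contain a JSON object at the top level")
    return config
```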

Usage

Cross-validation and re-profiling

Example usage:

# Generate reads for benchmarking (skip for real data)
samovar generate \
    --genome_dir $SAMOVAR/data/test_genomes/meta \
    --host_genome $SAMOVAR/data/test_genomes/host/9606.fna \
    --output_dir samovar

# Generate pipeline (for example, kraken2 + kaiju)
## specify --input_dir for real data
samovar preprocess \
    --output_dir samovar \
    --kraken2-test "kraken2 $DB_KRAKEN2" \
    --kaiju-test "kaiju $DB_KAIJU"

# Run the pipeline(s)
samovar exec --output-dir samovar
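The three steps above can also be driven from a script, for example to benchmark several tool combinations in a loop. The sketch below is a hypothetical wrapper (not part of the samovar package) that only assembles the command lines shown in the example and runs them in order. Two things are assumptions: the --&lt;tool&gt;-test flag pattern is generalized from the kraken2 and kaiju examples, and --input_dir is placed on preprocess as the comment in the example suggests. Flag spellings (including --output-dir for exec) are copied verbatim.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def samovar_commands(output_dir: str,
                     tests: Dict[str, str],
                     genome_dir: Optional[str] = None,
                     host_genome: Optional[str] = None,
                     input_dir: Optional[str] = None) -> List[List[str]]:
    """Build the samovar invocations from the example as argument lists."""
    commands: List[List[str]] = []
    if input_dir is None:
        # Benchmarking on artificial data: simulate reads first
        if genome_dir is None or host_genome is None:
            raise ValueError("genome_dir and host_genome are needed when input_dir is not given")
        commands.append(["samovar", "generate",
                         "--genome_dir", genome_dir,
                         "--host_genome", host_genome,
                         "--output_dir", output_dir])
    preprocess = ["samovar", "preprocess", "--output_dir", output_dir]
    if input_dir is not None:
        # Real data: skip generation and point preprocess at the reads
        preprocess += ["--input_dir", input_dir]
    for tool, spec in tests.items():
        # Assumed pattern: --<tool>-test "<command> <database>"
        preprocess += [f"--{tool}-test", spec]
    commands.append(preprocess)
    # Spelled --output-dir here, exactly as in the example above
    commands.append(["samovar", "exec", "--output-dir", output_dir])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute them in order unless dry_run is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failure
```

For instance, run_all(samovar_commands("samovar", {"kraken2": "kraken2 /path/to/db"}, genome_dir="genomes", host_genome="host.fna")) prints the commands without running them; pass dry_run=False to execute. Using argument lists instead of a shell string avoids quoting problems with database paths.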

Results and flexibility of the tool can be improved by specifying config files. Please follow the wiki, or see {samovar_function} -h.

Manual example:

cd samovar
bash workflow/pipeline.sh
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'primaryColor': '#fff', 'primaryTextColor': '#000', 'primaryBorderColor': '#000', 'lineColor': '#000', 'secondaryColor': '#fff', 'tertiaryColor': '#fff'}}}%%
graph TD
    subgraph Input
        subgraph Metagenomes
            A1[FastQ files]
            A2([InSilicoSeq config])
        end
        A3([SAMOVAR config])
    end
    subgraph Processing
        Metagenomes --> C[Initial annotation]
        A3 --> C
        A3 --> F
        A3 --> E[Metagenome generation]
        C --> E
        E --> F[Re-annotation]
    end
    subgraph Results
        F --> G1[Annotators scores]
        F --> ML
        subgraph Re-profiling
            C --> R
            ML --> R[Corrected results]
        end
        C --> C1[Cross-validation]
    end
    style Input fill:#90ee9020,stroke:#333,stroke-width:2px
    style Metagenomes fill:#b2ee9020,stroke:#333,stroke-width:2px
    style Processing fill:#ee90bf20,stroke:#333,stroke-width:2px
    style Results fill:#90d8ee20,stroke:#333,stroke-width:2px
    style Re-profiling fill:#90a4ee20,stroke:#333,stroke-width:2px
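Because the generated metagenomes have a known composition, each annotator's re-annotation can be scored against that ground truth. The diagram does not specify which metrics SamovaR reports as "Annotators scores", so the sketch below uses two common choices as an assumption: Bray-Curtis dissimilarity between the true and predicted relative-abundance profiles, and precision/recall/F1 of taxon presence calls. Profiles are plain dictionaries mapping a taxon identifier to its abundance.

```python
from typing import Dict


def bray_curtis(truth: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared abundance."""
    taxa = truth.keys() | predicted.keys()
    # Sum of the lesser abundance for every taxon (missing taxa count as 0)
    shared = sum(min(truth.get(t, 0.0), predicted.get(t, 0.0)) for t in taxa)
    total = sum(truth.values()) + sum(predicted.values())
    return 1.0 - 2.0 * shared / total if total > 0 else 0.0


def detection_scores(truth: Dict[str, float], predicted: Dict[str, float],
                     min_abundance: float = 0.0) -> Dict[str, float]:
    """Precision / recall / F1 of taxa called present above min_abundance."""
    true_taxa = {t for t, a in truth.items() if a > min_abundance}
    called = {t for t, a in predicted.items() if a > min_abundance}
    tp = len(true_taxa & called)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(true_taxa) if true_taxa else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For a true profile {"A": 0.5, "B": 0.3, "C": 0.2} and a prediction {"A": 0.6, "B": 0.3, "D": 0.1}, the Bray-Curtis dissimilarity is approximately 0.2, and precision and recall are both 2/3 (C is missed, D is a false positive).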

Artificial metagenome generation

Basic usage is described in the vignettes and the wiki.

You can also try the generator with the web Shiny app.

R generation

See the description, or source a vignette:

library(samovaR)
library(magrittr) # provides %>%

# download data
teatree <- GMrepo_type2data(number_to_process = 2000)

# filter
tealeaves <- teatree %>%
  teatree_trim(treshhold_species = 3, treshhold_samples = 3, treshhold_amount = 10^(-3))

# normalizing
teabag <- tealeaves %>%
  tealeaves_pack()

# clustering
concotion <- teabag %>%
  teabag_brew(min_cluster_size = 4, max_cluster_size = 6)

# building samovar
samovar <- concotion %>%
  concotion_pour()

# generating new data
new_data <- samovar %>%
  samovar_boil(n = 100)
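To make the filtering and normalizing steps more concrete, here is a rough Python analogue of teatree_trim() followed by tealeaves_pack(). It is not a port of the R code: the meaning of each threshold is our reading of the parameter names (treshhold_amount as the minimum relative abundance kept, treshhold_samples as the minimum number of samples a species must occur in, treshhold_species as the minimum number of species a sample must retain), and the package documentation remains the authority. The abundance table is modeled as a dictionary of samples, each mapping species to abundance.

```python
from collections import Counter
from typing import Dict

Table = Dict[str, Dict[str, float]]  # sample -> {species: abundance}


def trim(table: Table, threshold_species: int = 3, threshold_samples: int = 3,
         threshold_amount: float = 1e-3) -> Table:
    """Hypothetical analogue of teatree_trim(); threshold meanings are assumptions."""
    # 1. Drop individual abundances below threshold_amount
    table = {s: {sp: a for sp, a in prof.items() if a >= threshold_amount}
             for s, prof in table.items()}
    # 2. Keep species observed in at least threshold_samples samples
    prevalence = Counter(sp for prof in table.values() for sp in prof)
    kept = {sp for sp, n in prevalence.items() if n >= threshold_samples}
    table = {s: {sp: a for sp, a in prof.items() if sp in kept}
             for s, prof in table.items()}
    # 3. Keep samples that still contain at least threshold_species species
    #    (single pass: prevalence is not recomputed after samples are dropped)
    return {s: prof for s, prof in table.items() if len(prof) >= threshold_species}


def pack(table: Table) -> Table:
    """Hypothetical analogue of tealeaves_pack(): rescale each sample to sum to 1."""
    packed: Table = {}
    for sample, prof in table.items():
        total = sum(prof.values())
        if total > 0:  # skip empty samples instead of dividing by zero
            packed[sample] = {sp: a / total for sp, a in prof.items()}
    return packed
```

Filtering before normalizing means that rare species removed in step 1 do not count toward each sample's total, so the remaining abundances are rescaled to fill their share.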

Documentation for the R package

Pipeline

Components

  • R package samova.R for artificial abundance table generation
  • Pipeline for the automated benchmarking and re-profiling

Project Structure

%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'primaryColor': '#fff', 'primaryTextColor': '#000', 'primaryBorderColor': '#000', 'lineColor': '#000', 'secondaryColor': '#fff', 'tertiaryColor': '#fff'}}}%%
graph LR
    A[SamovaR] --> G1[Abundance table generation]
    G1 --> B[R Package]
    A --> G2[Automated re-profiling]
    G2 --> C[snakemake + Python Pipeline]
    G1 --> G[Shiny App]
    B --> B1[R/]
    B --> B2[man/]
    B --> B3[vignettes/]
    C --> C1[workflow/]
    C --> C2[src/]
    G --> H[shiny/]

References

  • Chechenina A., Vaulin N., Ivanov A., Ulyantsev V. Development of in-silico models of metagenomic communities with given properties and a pipeline for their generation. Bioinformatics Institute, 2022/23. URL: https://elibrary.ru/item.asp?id=60029330

Dependencies

%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'primaryColor': '#fff', 'primaryTextColor': '#000', 'primaryBorderColor': '#000', 'lineColor': '#000', 'secondaryColor': '#fff', 'tertiaryColor': '#fff'}}}%%
graph LR
    subgraph "R Package Dependencies"
        subgraph "Main"
            direction LR
            tidyverse
            scclust
            Matrix
            methods
        end
        subgraph "Visualization"
            direction LR
            ggplot
            plotly
            ggnewscale
        end
        subgraph "API"
            direction LR
            httr
            jsonlite
            xml2
        end
    end
    subgraph "Automated Benchmarking"
        subgraph "Major"
            direction LR
            samova.R
            R::yaml
            SnakeMake
            InSilicoSeq
        end
        subgraph "Python packages"
            direction LR
            numpy
            pandas
            requests
            ete3
            scikit-learn
        end
    end
    linkStyle default stroke:#000
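Before running the benchmarking pipeline, it can save time to confirm that the Python packages and command-line tools listed above are available. The sketch below is a hypothetical check, not something shipped with samovar: it maps each pip name to its import name (they differ for scikit-learn, which is imported as sklearn) and looks up executables on PATH. It assumes Snakemake is available as snakemake, InSilicoSeq as its iss command, and R as Rscript.

```python
import importlib.util
import shutil
from typing import Dict, Iterable, List

# pip name -> import name (they differ for scikit-learn)
PYTHON_DEPENDENCIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "requests": "requests",
    "ete3": "ete3",
    "scikit-learn": "sklearn",
}

# Executables assumed to be on PATH: Snakemake, InSilicoSeq (CLI `iss`), R
EXECUTABLES = ("snakemake", "iss", "Rscript")


def missing_python_dependencies(deps: Dict[str, str] = PYTHON_DEPENDENCIES) -> List[str]:
    """Return pip names of dependencies whose modules cannot be located."""
    return [pip_name for pip_name, module in deps.items()
            if importlib.util.find_spec(module) is None]


def missing_executables(names: Iterable[str] = EXECUTABLES) -> List[str]:
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("missing Python packages:", missing_python_dependencies() or "none")
    print("missing executables:", missing_executables() or "none")
```

find_spec only locates a module without importing it, so the check is fast even for heavy packages such as pandas.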
