Algorithms package for generating model metagenomes with specified properties
ctlab/samovar
There is a fundamental problem in modern metagenomics: methodological approaches differ widely, and these differences strongly influence the results while remaining outside the attention of researchers.
Relying on gold-standard practices and open code allows data to be analyzed reproducibly, but it also locks scientists into a single, far from perfect approach with its own bias.
Therefore, we propose an approach based on de novo generation of artificial metagenomes: SamovaR.
Warning: beta
Use the installation script:

```bash
git clone https://github.com/ctlab/samovar
cd samovar
chmod +x install.sh
./install.sh
```

Attention: the script automatically detects custom R library paths from `.Renviron` (`R_LIBS`) or `.Rprofile` (`.libPaths()`)
Install the R package:

```r
devtools::install_github("https://github.com/ctlab/samovar/")
```
Attention: check that samovar can be loaded with `Rscript -e 'library(samovar)'`, especially if several R versions are installed
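When several R installations coexist, the load check above can be repeated for each `Rscript` binary. The following is a minimal sketch, not part of samovar: the helper name `samovar_loads` and the list of candidate binaries are assumptions made for illustration. It runs `library(samovar)` under a given `Rscript` and reports whether the call succeeded.

```python
import shutil
import subprocess


def samovar_loads(rscript: str = "Rscript", package: str = "samovar") -> bool:
    """Return True if `library(<package>)` succeeds under the given Rscript.

    `samovar_loads` is a hypothetical helper, not a samovar function.
    """
    executable = shutil.which(rscript)
    if executable is None:
        # The binary is not on PATH, or the given path is not executable.
        return False
    result = subprocess.run(
        [executable, "-e", f"library({package})"],
        capture_output=True,
        text=True,
    )
    # Rscript exits with a non-zero status when library() fails.
    return result.returncode == 0


if __name__ == "__main__":
    # Assumed list: replace with the Rscript binaries present on your system.
    for candidate in ["Rscript"]:
        status = "OK" if samovar_loads(candidate) else "cannot load samovar"
        print(f"{candidate}: {status}")
```

If one interpreter fails while another succeeds, the package was probably installed into a library path that only the second one sees.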
Install the Python package:

```bash
git clone https://github.com/ctlab/samovar
cd samovar
pip install -e .
```
Attention: most samovar commands require a properly configured file at `build/config.json`
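Because a missing or malformed `build/config.json` can cause failures later on, it may help to check the file before running anything. The sketch below is an illustration only: `load_config` is a hypothetical helper, and the assumption that the file holds a JSON object at the top level is ours. It does not check any samovar-specific keys, since the schema is not described here.

```python
import json
from pathlib import Path


def load_config(path: str = "build/config.json") -> dict:
    """Read the config file and fail early with a readable message.

    Hypothetical helper; assumes the top level of the file is a JSON object.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Config not found: {config_path}")
    try:
        with config_path.open(encoding="utf-8") as handle:
            config = json.load(handle)
    except json.JSONDecodeError as error:
        raise ValueError(f"{config_path} is not valid JSON: {error}") from error
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} should contain a JSON object")
    return config
```

Calling `load_config()` from the repository root either returns the parsed settings or raises an error that names the offending file.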
Example usage:

```bash
# Generate reads for benchmarking (skip for real data)
samovar generate \
    --genome_dir $SAMOVAR/data/test_genomes/meta \
    --host_genome $SAMOVAR/data/test_genomes/host/9606.fna \
    --output_dir samovar

# Generate pipeline (for example, kraken2 + kaiju)
## specify --input_dir for real data
samovar preprocess \
    --output_dir samovar \
    --kraken2-test "kraken2 $DB_KRAKEN2" \
    --kaiju-test "kaiju $DB_KAIJU"

# Run the pipeline(s)
samovar exec --output-dir samovar
```
The results and flexibility of the tool can be improved by specifying config files. Please follow the wiki, or see `{samovar_function} -h`
Manual example:

```bash
cd samovar
bash workflow/pipeline.sh
```

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'primaryColor': '#fff', 'primaryTextColor': '#000', 'primaryBorderColor': '#000', 'lineColor': '#000', 'secondaryColor': '#fff', 'tertiaryColor': '#fff'}}}%%
graph TD
    subgraph Input
        subgraph Metagenomes
            A1[FastQ files]
            A2([InSilicoSeq config])
        end
        A3([SAMOVAR config])
    end
    subgraph Processing
        Metagenomes --> C[Initial annotation]
        A3 --> C
        A3 --> F
        A3 --> E[Metagenome generation]
        C --> E
        E --> F[Re-annotation]
    end
    subgraph Results
        F --> G1[Annotators scores]
        F --> ML
        subgraph Re-profiling
            C --> R
            ML --> R[Corrected results]
        end
        C --> C1[Cross-validation]
    end
    style Input fill:#90ee9020,stroke:#333,stroke-width:2px
    style Metagenomes fill:#b2ee9020,stroke:#333,stroke-width:2px
    style Processing fill:#ee90bf20,stroke:#333,stroke-width:2px
    style Results fill:#90d8ee20,stroke:#333,stroke-width:2px
    style Re-profiling fill:#90a4ee20,stroke:#333,stroke-width:2px
```

Basic usage is described in the vignettes and the wiki
You can also try the generator with the web Shiny app

See the description or source a vignette

```r
library(samovaR)

# download data
teatree <- GMrepo_type2data(number_to_process = 2000)

# filter
tealeaves <- teatree %>%
  teatree_trim(treshhold_species = 3, treshhold_samples = 3, treshhold_amount = 10^(-3))

# normalizing
teabag <- tealeaves %>%
  tealeaves_pack()

# clustering
concotion <- teabag %>%
  teabag_brew(min_cluster_size = 4, max_cluster_size = 6)

# building samovar
samovar <- concotion %>%
  concotion_pour()

# generating new data
new_data <- samovar %>%
  samovar_boil(n = 100)
```
Documentation for the R package

- R package `samova.R` for the artificial abundance table generation
- Pipeline for the automated benchmarking and re-profiling
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'primaryColor': '#fff', 'primaryTextColor': '#000', 'primaryBorderColor': '#000', 'lineColor': '#000', 'secondaryColor': '#fff', 'tertiaryColor': '#fff'}}}%%
graph LR
    A[SamovaR] --> G1[Abundance table generation]
    G1 --> B[R Package]
    A --> G2[Automated re-profiling]
    G2 --> C[snakemake + Python Pipeline]
    G1 --> G[Shiny App]
    B --> B1[R/]
    B --> B2[man/]
    B --> B3[vignettes/]
    C --> C1[workflow/]
    C --> C2[src/]
    G --> H[shiny/]
```

- Chechenina А., Vaulin N., Ivanov A., Ulyantsev V. Development of in-silico models of metagenomic communities with given properties and a pipeline for their generation. Bioinformatics Institute 2022/23. URL: https://elibrary.ru/item.asp?id=60029330
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '16px', 'fontFamily': 'arial', 'primaryColor': '#fff', 'primaryTextColor': '#000', 'primaryBorderColor': '#000', 'lineColor': '#000', 'secondaryColor': '#fff', 'tertiaryColor': '#fff'}}}%%
graph LR
    subgraph "R Package Dependencies"
        subgraph "Main"
            direction LR
            tidyverse
            scclust
            Matrix
            methods
        end
        subgraph "Visualization"
            direction LR
            ggplot
            plotly
            ggnewscale
        end
        subgraph "API"
            direction LR
            httr
            jsonlite
            xml2
        end
    end
    subgraph "Automated Benchmarking"
        subgraph "Major"
            direction LR
            samova.R
            R::yaml
            SnakeMake
            InSilicoSeq
        end
        subgraph "Python packages"
            direction LR
            numpy
            pandas
            requests
            ete3
            scikit-learn
        end
    end
    linkStyle default stroke:#000
```

