SortMeRNA: next-generation sequence filtering and alignment tool

License

GPL-3.0 (see LICENSE.txt and COPYING)

sortmerna/sortmerna


SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

SortMeRNA is also available through QIIME v1.9.1 and the nf-core RNA-Seq pipeline v3.9.


Getting Started

SortMeRNA 4 is C++17 compliant, and mostly uses standard libraries. It uses CMake as the build system, and can be run/built on all major OS including Linux, Windows, and Mac, on AMD64 and ARM64 processors.

Using Conda package

Install conda (see the official docs):

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh

The conda packages before SortMeRNA 4.3.7 were hosted on Bioconda. Starting with 4.3.7, the packages are hosted on conda-forge. An empty 4.3.7 package erroneously made its way to Bioconda and should be ignored until it is removed from Bioconda.

Currently the build on conda-forge is still waiting to be merged. Until it is ready, the local installation package can be used:

    # == only for 4.3.7 until ready on conda-forge ==
    # download the conda-build package into a directory of your choice e.g. Downloads/
    wget https://github.com/sortmerna/sortmerna/releases/download/v4.3.7/sortmerna-4.3.7-conda-linux-64.tar.bz2 -P ~/Downloads/
    # create a new environment and install SortMeRNA in it
    conda create --name sortmerna
    conda activate sortmerna
    conda install ~/Downloads/sortmerna-4.3.7-conda-linux-64.tar.bz2
    which sortmerna  # check the installed binary e.g. miniforge3/envs/sortmerna/bin/sortmerna
    sortmerna -h

For versions older than 4.3.7, add the following conda channels, per the Bioconda guidelines:

    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda config --set channel_priority strict
    conda search sortmerna
      Loading channels: done
      # Name                       Version           Build  Channel
      sortmerna                        2.0               0  bioconda
      ...
      sortmerna                      4.3.4               0  bioconda
      ...
      sortmerna                      4.3.6               0  bioconda
      ...
      sortmerna                      4.3.7      hdbdd923_1  bioconda <- (!) ignore - corrupt, see instructions above
    # create a new environment and install SortMeRNA in it
    conda create --name sortmerna_env
    conda activate sortmerna_env
    conda install sortmerna
    which sortmerna
      /home/biocodz/miniconda3/envs/sortmerna_env/bin/sortmerna
    # test the installation
    sortmerna --version
      SortMeRNA version 4.3.6
      Build Date: Aug 16 2022
      sortmerna_build_git_sha:@db8c1983765f61986b46ee686734749eda235dcc@
      sortmerna_build_git_date:@2022/08/16 11:42:59@
    # view help
    sortmerna -h

Using GitHub release binaries on Linux

Visit SortMeRNA GitHub Releases.

The Linux distribution is a shell script with an embedded installation archive.

Issue the following bash commands:

    pushd ~
    # get the distro
    wget https://github.com/biocore/sortmerna/releases/download/v4.3.6/sortmerna-4.3.6-Linux.sh
    # view the installer usage
    bash sortmerna-4.3.6-Linux.sh --help
        Options: [defaults in brackets after descriptions]
          --help            print this message
          --version         print cmake installer version
          --prefix=dir      directory in which to install
          --include-subdir  include the sortmerna-4.3.6-Linux subdirectory
          --exclude-subdir  exclude the sortmerna-4.3.6-Linux subdirectory
          --skip-license    accept license
    # run the installer
    bash sortmerna-4.3.6-Linux.sh --skip-license
      sortmerna Installer Version: 4.3.6, Copyright (c) Clarity Genomics
      This is a self-extracting archive.
      The archive will be extracted to: $HOME/sortmerna
        Using target directory: /home/biocodz/sortmerna
      Extracting, please wait...
        Unpacking finished successfully
    # check the installed binaries
    ls -lrt /home/biocodz/sortmerna/bin/sortmerna
    # set PATH
    export PATH=$HOME/sortmerna/bin:$PATH
    # test the installation
    sortmerna --version
      SortMeRNA version 4.3.6
      Build Date: Jul 17 2021
      sortmerna_build_git_sha:@921fa40256760ea2d44c49b21eb326afda748d5e@
      sortmerna_build_git_date:@2022/08/16 10:59:31@
    # view help
    sortmerna -h

Running

  • The only required options are --ref and --reads
  • Any option can also be specified using a single dash, e.g. -ref and -reads
  • Both plain fasta/fastq and archived fasta.gz/fastq.gz files are accepted
  • File extensions (.fastq, .fastq.gz, .fq, .fq.gz, .fasta, ...) are optional; the format and compression are recognized automatically
  • Relative paths are accepted

For example:

    # single reference and single reads file
    sortmerna --ref REF_PATH --reads READS_PATH
    # for multiple references use multiple '--ref'
    sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH
    # for paired reads use '--reads' twice
    sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH_1 --reads READS_PATH_2

More examples can be found in test.jinja and run.py.
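When the number of reference files varies, for instance inside a pipeline script, the repeated --ref and --reads options can be assembled programmatically. The following is a minimal sketch, not part of SortMeRNA: build_sortmerna_cmd is a hypothetical helper, it uses only the two options named above, and it prints the command as a dry run instead of executing it. It assumes paths without spaces.

```shell
# Hypothetical helper (not part of SortMeRNA): print a sortmerna command line.
# Usage: build_sortmerna_cmd N_REFS REF... READS...
# The first N_REFS paths become --ref options, the rest become --reads options.
build_sortmerna_cmd() {
    n_refs=$1
    shift
    cmd="sortmerna"
    i=0
    while [ "$i" -lt "$n_refs" ]; do
        cmd="$cmd --ref $1"
        shift
        i=$((i + 1))
    done
    for reads in "$@"; do
        cmd="$cmd --reads $reads"
    done
    printf '%s\n' "$cmd"
}

# Dry run for two references and paired reads (placeholder paths).
build_sortmerna_cmd 2 REF_PATH_1 REF_PATH_2 READS_PATH_1 READS_PATH_2
```

Reviewing the printed line before running it makes it easy to confirm that every reference and reads file ended up under the intended option.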

Execution trace

Here is a sample execution trace.

IMPORTANT

  • A progressing execution trace, showing the number of reads processed so far, indicates a normally running program.
  • A trace that stops progressing indicates a problem. Please kill the process (there is no need to wait for two days) and file an issue here.
  • Please provide the execution trace when filing issues.

Sample execution statistics are provided to give an idea of what the execution time might be.
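One way to keep the trace for an issue report is to save it to a file while still watching it on screen. The sketch below is an assumption about workflow, not a SortMeRNA feature: run_with_trace is a hypothetical helper that captures both stdout and stderr, so it works regardless of which stream the trace is written to. An echo command stands in for sortmerna so the sketch runs without SortMeRNA installed.

```shell
# Hypothetical helper (not part of SortMeRNA): run a command, show its
# output live, and save stdout and stderr to a log file.
# Usage: run_with_trace LOG_FILE COMMAND [ARGS...]
run_with_trace() {
    log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Stand-in command for demonstration; in practice replace it with e.g.
#   sortmerna --ref REF_PATH --reads READS_PATH
trace_dir=$(mktemp -d)
run_with_trace "$trace_dir/sortmerna_trace.log" echo "reads processed: 1000"
```

The resulting log file can then be attached to the issue as is.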

Building from sources

Build instructions

User Manual

See the SortMeRNA Read The Docs project.

If you need a PDF, any modern browser can print web pages to PDF.

Databases

Please use database.tar.gz from release 4.3.4.

We recommend using smr_v4.3_default_db.fasta.

Original source databases (clustering parameters given below):

  • Silva 138 SSURef NR99 (16S, 18S)
  • Silva 132 LSURef (23S, 28S)
  • RFAM v14.1 (5S, 5.8S)

The databases differ in the % identity used to cluster the sequences for each kingdom and rRNA component.

Specifically,

  • smr_v4.3_fast_db.fasta
    • bac-16S 85%, 5S & 5.8S seeds, rest 90% (benchmark accuracy: 99.888%)
  • smr_v4.3_default_db.fasta
    • bac-16S 90%, 5S & 5.8S seeds, rest 95% (benchmark accuracy: 99.899%)
  • smr_v4.3_sensitive_db.fasta
    • all 97% (benchmark accuracy: 99.907%)
  • smr_v4.3_sensitive_db_rfam_seeds.fasta
    • all 97%, except RFAM database which includes the full seed database sequences

The accuracy (based on sensitivity and selectivity) is very good for all databases; however, the "sensitive" databases will run at least 2x slower.
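Because a lower clustering threshold generally leaves fewer representative sequences, counting the sequences in each extracted database file is a quick sanity check after download. The sketch below is illustrative only: count_fasta_records is a hypothetical helper, and the tiny FASTA file is made up so the example runs without the real databases.

```shell
# Hypothetical helper: count sequences in a FASTA file by counting
# header lines, which start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Made-up two-sequence FASTA file standing in for a real database such as
# smr_v4.3_default_db.fasta.
demo_dir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n' > "$demo_dir/demo_db.fasta"

count_fasta_records "$demo_dir/demo_db.fasta"   # prints 2
```

Running the same function over the fast, default and sensitive files would show how the database size changes with the clustering threshold.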

Taxonomies

The archive data/rRNA_databases/silva_ids_acc_tax.tar.gz contains SILVA taxonomy strings (extracted from the XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number, and the taxonomy.
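Given the three-column layout described above, a taxonomy string can be looked up by reference sequence ID with a short awk filter. This is a minimal sketch: lookup_taxonomy is a hypothetical helper, and the IDs, accessions and taxonomy strings in the demo file are made up for illustration rather than taken from the SILVA files.

```shell
# Hypothetical helper: print the taxonomy (column 3) for a reference
# sequence ID (column 1) in a tab-separated file.
# Usage: lookup_taxonomy TAX_FILE SEQUENCE_ID
lookup_taxonomy() {
    awk -F '\t' -v id="$2" '$1 == id { print $3 }' "$1"
}

# Made-up demo file with the same layout: ID, accession, taxonomy.
tax_dir=$(mktemp -d)
tax_file="$tax_dir/demo_tax.txt"
printf 'ref_001\tAB000001\tBacteria;Proteobacteria\n' > "$tax_file"
printf 'ref_002\tAB000002\tArchaea;Euryarchaeota\n' >> "$tax_file"

lookup_taxonomy "$tax_file" ref_002   # prints Archaea;Euryarchaeota
```

Splitting on tabs only keeps taxonomy strings that contain spaces in one piece.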

Citation

If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Contributors

See AUTHORS for a list of contributors to this project.

Support

For questions and comments, feel free to file an issue, or start a discussion.

