tseemann/sixessPublic

NotificationsYou must be signed in to change notification settings
Fork2
Star24

🔬🐛 Rapid 16s rRNA identification from isolate FASTQ files

License

GPL-3.0 license

24 stars 2 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
bin		bin
db		db
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
test.sh		test.sh

Repository files navigation

⚠️THIS SOFTWARE IS STILL UNDER DEVELOPMENT - USE AT OWN RISK

sixess

Rapid 16s rDNA from isolate FASTQ files

Introduction

sixess is a command-line software tool to identifybacterial species based on 16S rDNA sequence directlyfrom WGS FASTQ data. It includes databases fromNCBI (default), RDP and SILVA.

Quick start

# just give it sequences!% sixess R1.fastq.gzStaphylococcus epidermidis# sometimes there is no match% sixess /dev/nullNo matches# give it as many sequence files as needed% sixess R1.fq R2.fqEnterococcus faecium# we provide different databases you can choose% sixess -d RDP contigs.faBacillus cereus# you can pipe to stdin too% bzcat chernobyl.fq.bz2 | sixess -Deinococcus radiodurans

Installation

Source

cd $HOMEgit clone https://github.com/tseemann/sixessexport PATH=$HOME/sixess/bin:$PATH

Homebrew

brew install brewsci/bio/sixess  # COMING SOON

Bioconda

conda install -c bioconda -c conda-forge sixess  # COMING SOON

Usage

Input

The input can be one or more sequence files, or- denotingstdin.The input data can be FASTQ or FASTA, and may be.gz compressed.Any read length is accepted, even whole chromosomes.

Output

The output is asingle line tostdout.If a match was found, it will beGenus species.If no prediction could be made, it will beNo matches.

Options

  -q        Quiet mode, no output  -p DIR    Database folder (/home/tseemann/git/sixess/db)  -d FILE   Database {NCBI RDP SILVA.gz} (NCBI)  -t NUM    CPU threads (1)  -m FILE   Save alignments to FILE in PAF format  -V        Print version and exit

-q enables "quiet mode" which only prints to stderr for errors
-p is the location of the sequence databases
-d selects the database; they can be.gz compressed (seeDatabases
-t increases threads; 3 is the suggested value forminimap2
-m allows you to save the PAF output ofminimap2
-V prints the version and exitse.g.sixess 1.0

Databases

NCBI (bundled, default)

TheNCBI 16S ribosomal RNA projectcontains curated 16S ribosomal RNA bacteria and archaea RefSeq entries.It has ~20,000 entries.

esearch -db nucleotide -query '33175[BioProject] OR 33317[BioProject]' \  | efetch -db nuccore -format fasta \  > $(which sixess)/../db/NCBI

RDP (bundled)

Bacterial 16S rDNA sequences for "type strains"from theRDP databaseare included. These are denoted with(T) in theFASTA headers. It contains ~10,000 entries.

wget --no-check-certificate https://rdp.cme.msu.edu/download/current_Bacteria_unaligned.fa.gzgunzip -c current_Bacteria_unaligned.fa.gz \  | bioawk -cfastx '/\(T\)/{print ">" $name " " $comment "\n" toupper($seq)}' \  > $(which sixess)/../db/RDP

SILVA (bundled)

SILVAis a comprehensive on-line resource for quality checked andaligned ribosomal RNA sequence data.The filtered version of the aligned 16S/18S/SSU databasecontains ~100,000 entries.

# replace "132" with latest version as neededwget https://www.arb-silva.de/fileadmin/silva_databases/release_132/Exports/SILVA_132_SSURef_Nr99_tax_silva.fasta.gzgunzip -v SILVA_132_SSURef_Nr99_tax_silva.fasta.gz \  | bioawk -cfastx \    '$comment ~ /^Bacteria;|^Archaea;/ \    && $comment !~ /(;unidentified|Mitochondria;|;Chloroplast|;uncultured| sp\.)/ \    { sub(/^.*;/,"",$comment);      gsub("U","T",$seq);      print ">" $name " " $comment "\n" $seq }' \  | seqtk seq -l 60 -U \  > SILVA.tmp1cd-hit-est -i SILVA.tmp1 -o SILVA.tmp2 -c 1.0 -T 0 -M 2000 -d 250cp SILVA.tmp2 $(which sixess)/../db/SILVArm -f SILVA.tmp1 SILVA.tmp2 SILVA.tmp2.clstr

Custom databases

Assuming you have a FASTA file of 16S DNA sequencescalled/home/alex/GG.fa say, you can do this:

Global installaion

cp /home/alex/GG.fa $(which sixess)/../db/GGsixess -d GG R1.fastq.gz

Local installaion

sixess -p /home/alex/data -d GG.fa R1.fastq.gz

Algorithm

Identify reads which look like 16S (minimap2)
Count up how many reads hit each 16S sequence (possibly weighted)
Choose the top hit and report it

Feedback

Report bugs and give suggesions on theIssues page

License

GPL Version 3

Author

Torsten Seemann

About

🔬🐛 Rapid 16s rRNA identification from isolate FASTQ files

Releases

No releases published

Packages

No packages published

Languages

Shell100.0%

Movatterモバイル変換

License

tseemann/sixess

Folders and files

Latest commit

History

Repository files navigation

sixess

Introduction

Quick start

Installation

Source

Homebrew

Bioconda

Usage

Input

Output

Options

Databases

NCBI (bundled, default)

RDP (bundled)

SILVA (bundled)

Custom databases

Global installaion

Local installaion

Algorithm

Feedback

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Languages

Packages