Ultra-fast de novo assembler using long noisy reads
ruanjue/smartdenovo
    # Download sample PacBio from the PBcR website
    wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
    awk 'NR%4==1||NR%4==2' selfSampleData/pacbio_filtered.fastq | sed 's/^@/>/g' > reads.fa
    # Install SMARTdenovo
    git clone https://github.com/ruanjue/smartdenovo.git && (cd smartdenovo; make)
    # Assemble (raw unitigs in wtasm.lay.utg; consensus unitigs: wtasm.cns)
    smartdenovo/smartdenovo.pl -c 1 reads.fa > wtasm.mak
    make -f wtasm.mak
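The awk and sed step above converts FASTQ to FASTA by keeping the first two lines of every four-line record and rewriting the header marker. The sketch below isolates that step on a tiny made-up input. The function name `fastq_to_fasta` and the sample reads are illustrative only and are not part of SMARTdenovo; it assumes a FASTQ file with exactly four lines per record (no wrapped sequences).

```sh
# fastq_to_fasta is a hypothetical helper name, not a SMARTdenovo tool.
# Assumes strict 4-line FASTQ records: header, sequence, "+", quality.
fastq_to_fasta() {
    # Keep line 1 (header) and line 2 (sequence) of each record,
    # then replace the leading "@" of every header with ">".
    awk 'NR%4==1||NR%4==2' "$1" | sed 's/^@/>/'
}

# Made-up two-record input; the second quality string starts with "@"
# to show that selection is by line position, not by leading character.
sample=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n@III\n' > "$sample"
fastq_to_fasta "$sample"
rm -f "$sample"
```

Because quality lines are dropped by position before sed runs, a quality string that happens to begin with `@` is never mistaken for a header.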
SMARTdenovo is a de novo assembler for PacBio and Oxford Nanopore (ONT) data. It produces an assembly from all-vs-all raw read alignments without an error correction stage. It also provides tools to generate accurate consensus sequences, though platform-dependent consensus polishing tools (e.g. Quiver for PacBio or Nanopolish for ONT) are still required for higher accuracy.
SMARTdenovo consists of several separate command line tools: wtzmo for read overlapping, wtgbo to rescue missing overlaps, wtclp for identifying low-quality regions and chimaeras, and wtcns or wtmsa to produce a better unitig consensus. The smartdenovo.pl script provides a convenient interface to call these programs in one go. If you do not care about the internals of SMARTdenovo, you may simply run:
    /path/to/smartdenovo/smartdenovo.pl -p prefix -c 1 reads.fa > prefix.mak
    make -f prefix.mak
It calls other SMARTdenovo executables in the same directory containing smartdenovo.pl. After assembly, the raw unitigs are reported in file prefix.lay.utg and consensus unitigs in prefix.cns. If you want to know more about how SMARTdenovo works in detail, please see README-tools.md.
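As a quick sanity check after the run, one might count the consensus unitigs that were written. The sketch below assumes that prefix.cns is a FASTA file with one `>` header line per unitig; the README does not state the file format, so treat that as an assumption. `count_fasta_records` is a hypothetical helper, not a SMARTdenovo tool.

```sh
# count_fasta_records is a hypothetical helper, not part of SMARTdenovo.
# Assumption: the file is FASTA, with one ">" header line per record.
count_fasta_records() {
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, so "|| true" keeps the helper safe under "set -e".
    grep -c '^>' "$1" || true
}

# Report the count only if the consensus file exists and is non-empty.
if [ -s prefix.cns ]; then
    echo "consensus unitigs: $(count_fasta_records prefix.cns)"
fi
```

The same helper applies to reads.fa, which is useful for confirming that the input conversion kept every read.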
Most of the assembly time is spent on Smith-Waterman alignment, which may not be necessary for long-read assembly. We are developing a novel algorithm, called dot matrix alignment, that is Smith-Waterman free.
wtzmo now supports dot matrix alignment when run with the options -U -1 -m 0.1. run_dmo.sh works well on E. coli, the yeast PacBio dataset, bacteria ERS554120, and Drosophila.