forked from lh3/bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)


tsnorri/bwa

 
 


Note: minimap2 has replaced BWA-MEM for PacBio and Nanopore read alignment. It retains all major BWA-MEM features, but is ~50 times as fast, more versatile, more accurate and produces better base-level alignment. A beta version of BWA-MEM2 has been released for short-read mapping. BWA-MEM2 is about twice as fast as BWA-MEM and outputs near identical alignments.

Getting started

      git clone https://github.com/lh3/bwa.git
      cd bwa; make
      ./bwa index ref.fa
      ./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
      ./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz

Introduction

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to a few megabases. BWA-MEM and BWA-SW share similar features such as support for long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the index command). Alignment algorithms are invoked with different sub-commands: aln/samse/sampe for BWA-backtrack, bwasw for BWA-SW and mem for the BWA-MEM algorithm.

Availability

BWA is released under GPLv3. The latest source code is freely available at GitHub. Released packages can be downloaded at SourceForge. After you acquire the source code, simply use make to compile and copy the single executable bwa to the destination you want. The only dependency required to build BWA is zlib.

Since 0.7.11, a precompiled binary for x86_64-linux is available in bwakit. In addition to BWA, this self-contained package also comes with bwa-associated and 3rd-party tools for proper BAM-to-FASTQ conversion, mapping to ALT contigs, adapter trimming, duplicate marking, HLA typing and associated data files.

Seeking help

The detailed usage is described in the man page available together with the source code. You can use man ./bwa.1 to view the man page in a terminal. The HTML version of the man page can be found at the BWA website. If you have questions about BWA, you may sign up for the mailing list and then send your questions to bio-bwa-help@sourceforge.net. You may also ask questions in forums such as BioStar and SEQanswers.

Citing BWA

  • Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]. (if you use the BWA-backtrack algorithm)

  • Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]. (if you use the BWA-SW algorithm)

  • Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]. (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)

Please note that the last reference is a preprint hosted at arXiv.org. I do not plan to submit it to a peer-reviewed journal in the near future.

Frequently asked questions (FAQs)

  1. What types of data does BWA work with?
  2. Why does a read appear multiple times in the output SAM?
  3. Does BWA work on reference sequences longer than 4GB in total?
  4. Why can one read in a pair have a high mapping quality but the other has zero?
  5. How can a BWA-backtrack alignment stand out of the end of a chromosome?
  6. Does BWA work with ALT contigs in the GRCh38 release?
  7. Can I just run BWA-MEM against GRCh38+ALT without post-processing?

1. What types of data does BWA work with?

BWA works with a variety of types of DNA sequence data, though the optimal algorithm and settings may vary. The following list gives the recommended settings:

  • Illumina/454/IonTorrent single-end reads longer than ~70bp or assembly contigs up to a few megabases mapped to a closely related reference genome:

      bwa mem ref.fa reads.fq > aln.sam
  • Illumina single-end reads shorter than ~70bp:

      bwa aln ref.fa reads.fq > reads.sai; bwa samse ref.fa reads.sai reads.fq > aln-se.sam
  • Illumina/454/IonTorrent paired-end reads longer than ~70bp:

      bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
  • Illumina paired-end reads shorter than ~70bp:

      bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
      bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam
  • PacBio subreads or Oxford Nanopore reads to a reference genome:

      bwa mem -x pacbio ref.fa reads.fq > aln.sam
      bwa mem -x ont2d ref.fa reads.fq > aln.sam

BWA-MEM is recommended for query sequences longer than ~70bp for a variety of error rates (or sequence divergence). Generally, BWA-MEM is more tolerant of errors given longer query sequences, as the chance of missing all seeds is small. As is shown above, with non-default settings, BWA-MEM works with Oxford Nanopore reads with a sequencing error rate over 20%.
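The ~70bp rule of thumb above can be sketched as a small shell helper. This is an illustration only: the function names `mean_read_length` and `suggest_bwa_algorithm` are hypothetical and not part of BWA, the hard cutoff of 70 is an assumed reading of the approximate "~70bp" guideline, and the input is assumed to be an uncompressed FASTQ file with exactly four lines per record.

```shell
# Print the mean read length of an uncompressed FASTQ file.
# Assumes four lines per record, with the sequence on the second line.
mean_read_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) printf "%d\n", total / n; else print 0 }' "$1"
}

# Print "mem" for reads of at least 70bp on average, otherwise "aln".
# The cutoff is an assumption; the guideline itself is only approximate.
suggest_bwa_algorithm() {
    _len=$(mean_read_length "$1")
    if [ "$_len" -ge 70 ]; then
        echo "mem"
    else
        echo "aln"
    fi
}

# Usage (hypothetical file name):
#   suggest_bwa_algorithm reads.fq
```

Reads close to the cutoff may be worth trying with both algorithms, since the recommendation also depends on error rate and platform.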

2. Why does a read appear multiple times in the output SAM?

BWA-SW and BWA-MEM perform local alignments. If there is a translocation, a gene fusion or a long deletion, a read bridging the break point may have two hits, occupying two lines in the SAM output. With the default setting of BWA-MEM, one and only one line is primary and is soft clipped; other lines are tagged with the 0x800 SAM flag (supplementary alignment) and are hard clipped.
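To see which reads occupy more than one line, the SAM flag bits can be tallied per read name. The sketch below is not part of BWA; the function name `sam_hit_summary` is hypothetical, and it assumes an uncompressed, tab-separated SAM file. It uses the standard SAM flag bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary), decoded with integer arithmetic so that it does not depend on awk bitwise extensions.

```shell
# Count primary, secondary and supplementary records per read name.
# Output columns (tab-separated): name, primary, secondary, supplementary.
# Unmapped records (0x4) are skipped.
sam_hit_summary() {
    awk -F '\t' '
        /^@/ { next }                                  # skip header lines
        {
            flag = $2
            if (int(flag / 4) % 2) next                # 0x4: unmapped
            if (int(flag / 2048) % 2)     supp[$1]++   # 0x800: supplementary
            else if (int(flag / 256) % 2) sec[$1]++    # 0x100: secondary
            else                          prim[$1]++   # primary line
            seen[$1] = 1
        }
        END {
            for (name in seen)
                printf "%s\t%d\t%d\t%d\n", name, prim[name], sec[name], supp[name]
        }
    ' "$1" | sort
}

# Usage (hypothetical file name):
#   sam_hit_summary aln.sam
```

For paired-end data both mates share a read name, so a pair without split alignments is expected to show two primary lines.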

3. Does BWA work on reference sequences longer than 4GB in total?

Yes. Since 0.6.x, all BWA algorithms work with a genome with total length over 4GB. However, an individual chromosome should not be longer than 2GB.
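A reference can be checked against the per-sequence limit before indexing. The following is a rough sketch, not a BWA feature: the function name `check_reference_lengths` is hypothetical, the input is assumed to be an uncompressed FASTA file, and the default limit of 2147483648 bases (2^31) is an assumed interpretation of "2GB" that can be overridden with a second argument.

```shell
# Report the total reference length and any sequence above a length limit.
# $1: uncompressed FASTA file
# $2: optional per-sequence limit in bases (default 2^31, an assumption)
# Output: "too_long<TAB>name<TAB>length" per offending sequence,
#         then "total<TAB>length".
check_reference_lengths() {
    awk -v limit="${2:-2147483648}" '
        function report() {
            total += len
            if (len > limit) printf "too_long\t%s\t%.0f\n", name, len
        }
        /^>/ {
            if (name != "") report()       # finish the previous sequence
            name = substr($1, 2)           # name up to the first whitespace
            len = 0
            next
        }
        { len += length($0) }
        END {
            if (name != "") report()
            printf "total\t%.0f\n", total
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   check_reference_lengths ref.fa
```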

4. Why can one read in a pair have a high mapping quality but the other has zero?

This is expected behavior. Mapping quality is assigned to each individual read, not to a read pair. One read may be mapped unambiguously while its mate falls in a tandem repeat, so the mate's exact position cannot be determined.
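Such pairs can be listed from the SAM output with a short awk filter. This is an illustrative sketch: the function name `discordant_mapq_pairs` and the default threshold of 30 for "high" mapping quality are assumptions, not BWA conventions. It assumes an uncompressed SAM file and looks only at primary records of mapped, paired reads, using the standard flag bits 0x1 (paired), 0x40 (first in pair) and 0x80 (second in pair).

```shell
# List pairs where one mate has MAPQ >= threshold and the other has MAPQ 0.
# $1: uncompressed SAM file
# $2: optional threshold for "high" MAPQ (default 30, an assumption)
# Output columns (tab-separated): name, MAPQ of read 1, MAPQ of read 2.
discordant_mapq_pairs() {
    awk -F '\t' -v min="${2:-30}" '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (!(flag % 2)) next                  # 0x1 not set: unpaired
            if (int(flag / 4) % 2) next            # 0x4: unmapped
            if (int(flag / 256) % 2) next          # 0x100: secondary
            if (int(flag / 2048) % 2) next         # 0x800: supplementary
            if (int(flag / 64) % 2)  q1[$1] = $5   # 0x40: first in pair
            if (int(flag / 128) % 2) q2[$1] = $5   # 0x80: second in pair
        }
        END {
            for (name in q1)
                if (name in q2) {
                    a = q1[name] + 0
                    b = q2[name] + 0
                    if ((a >= min && b == 0) || (b >= min && a == 0))
                        printf "%s\t%d\t%d\n", name, a, b
                }
        }
    ' "$1" | sort
}

# Usage (hypothetical file name):
#   discordant_mapq_pairs aln-pe.sam
```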

5. How can a BWA-backtrack alignment stand out of the end of a chromosome?

Internally BWA concatenates all reference sequences into one long sequence. A read may be mapped to the junction of two adjacent reference sequences. In this case, BWA-backtrack will flag the read as unmapped (0x4), but you will still see a position, CIGAR and all the tags. A similar issue may occur with BWA-SW alignments as well. BWA-MEM does not have this problem.
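Records of this kind can be spotted by looking for the unmapped flag combined with a real CIGAR string. The sketch below is an assumption-laden illustration rather than a BWA tool: the function name `unmapped_with_cigar` is hypothetical, the input is assumed to be an uncompressed SAM file, and a CIGAR other than `*` on a 0x4 record is used as the signal (an ordinary unmapped read normally has `*` in the CIGAR column).

```shell
# Print records flagged as unmapped (0x4) that still carry a CIGAR string.
# Output columns (tab-separated): name, reference, position, CIGAR.
unmapped_with_cigar() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        int($2 / 4) % 2 && $6 != "*" {
            print $1 "\t" $3 "\t" $4 "\t" $6
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   unmapped_with_cigar aln-se.sam
```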

6. Does BWA work with ALT contigs in the GRCh38 release?

Yes, since 0.7.11, BWA-MEM officially supports mapping to GRCh38+ALT. BWA-backtrack and BWA-SW don't properly support ALT mapping as of now. Please see README-alt.md for details. Briefly, it is recommended to use bwakit, the binary release of BWA, for generating the reference genome and for mapping.

7. Can I just run BWA-MEM against GRCh38+ALT without post-processing?

If you are not interested in hits to ALT contigs, it is okay to run BWA-MEM without post-processing. The alignments produced this way are very close to alignments against GRCh38 without ALT contigs. Nonetheless, applying post-processing helps to reduce false mappings caused by reads from the diverged part of ALT contigs and also enables HLA typing. It is recommended to run the post-processing script.
