I am writing a software tool to which I would like to add the ability to compute alignments using the efficient Burrows-Wheeler Transform (BWT) approach made popular by tools such as BWA and Bowtie. As far as I can tell, though, both of these tools and their derivatives are invoked strictly via a command-line interface. Are there any libraries that implement BWT-based read alignment with a C/C++ API?
Python bindings would also be great, but probably too much to expect.
- $\begingroup$Not C/C++, but just to mention that rust-bio seems to have BWT-related capabilities: <rust-bio.github.io>$\endgroup$bli– bli2017-06-07 08:51:12 +00:00CommentedJun 7, 2017 at 8:51
4 Answers4
First, let us remark that there exist several hundred read mappers, most of which have been even published (see, e.g., pages 25-29 ofthis thesis). Developing a new mapper probably makes sense only as a programming exercise. Whereas developing a quick proof-of-concept read mapper is usually easy, turning it into a real competitor of existing and well-tuned mappers can last for years.
It is not clear from the provided description how long is your reference, how many alignments you need to compute, etc. In certain situations, it may be useful to write a wrapper over existing mappers (e.g., usingsubprocess.Popen for running the mapper andPySam for parsing the output); while in some other situations, standard dynamic programming may be sufficient (e.g., usingSSW with its Python binding).
Let assume that you want to develop a toy read mapper. Most of read mappers are based on a so called seed-and-extend paradigm. Simply speaking, first you detect candidates for alignments, usually as exact matches between a read and the reference (using either a hash table or some full-text index – e.g., BWT-index). Then you would need to compute alignments for these candidates, typically using some algorithm based on dynamic programming, and report the best obtained alignments (e.g., the ones with the highest alignment score).
There exist two big, powerful and well debugged libraries implementing BWT-indexes which can be easily used for building a read mapper:
SeqAn. See theFMIndex tutorial and thePairwise Sequence Alignment tutorial for quick examples of how to detect exact matches and how to do pairwise alignments. Also, they provide a tutorial about aquick development of a read mapper, but the resulting mapper seems to be hash-table based, not BWT-based. SeqAn is used, e.g., inYARA.
SDSL-Lite. This is a general library for succinct data structures. See thetutorial slides for an idea of how it works. For instance,GramTools are based on SDSL.
- $\begingroup$Never heard of YARA … better-known mappers also use SeqAn, e.g. both TopHat and Bowtie (though not for the mapping itself, as far as I know).$\endgroup$Konrad Rudolph– Konrad Rudolph2017-06-06 21:55:16 +00:00CommentedJun 6, 2017 at 21:55
- $\begingroup$Sigh, I was hoping to avoid writing BWA's SAM output to disk, but PySAM treats input strings as file names and explicitly doesn't support StringIO objects. Guess I'll have to look into named pipes or something like that.$\endgroup$Daniel Standage– Daniel Standage2017-06-12 20:12:35 +00:00CommentedJun 12, 2017 at 20:12
- $\begingroup$PySAM supports also file objects – seepysam.readthedocs.io/en/latest/api.html#pysam.AlignmentFile. AlignmentFile(filepath_or_object,....: "If filepath_or_object is a python File object, the already opened file will be used."$\endgroup$Karel Břinda– Karel Břinda2017-06-12 20:15:33 +00:00CommentedJun 12, 2017 at 20:15
- $\begingroup$Yeah, but running
proc.communicategives me BWA's SAM output as a string. It looks like none of the usual string trickery will work, and somehow I'll have to go through the file system to get PySAM to read it.$\endgroup$Daniel Standage– Daniel Standage2017-06-12 20:17:45 +00:00CommentedJun 12, 2017 at 20:17
BWA-MEM can be used as a library. Filebwa/example.c shows the basic functionality for single-end mapping. It should give identical mapping to the bwa-mem command line. Headerbwa/bwamem.h contains basic documentation. Paired-end mapping is doable, but not well exposed. Several teams, including GATK and MS genomics, have been using bwa-mem as a library.
- $\begingroup$That is extremely cool.$\endgroup$Konrad Rudolph– Konrad Rudolph2017-06-07 13:22:27 +00:00CommentedJun 7, 2017 at 13:22
SeqAn supports BWT tables for use with their parametrisable alignment algorithms.
To use it,follow the general outline for building a SeqAn short-read aligner, and use theFMIndex specialisation instead of — as in the example — theIndexQGram.
If you're looking for a python API, nanopore have created one (although I am not sure how "maintained" it is)https://github.com/nanoporetech/bwapy
Explore related questions
See similar questions with these tags.

