14
$\begingroup$

I am writing a software tool to which I would like to add the ability to compute alignments using the efficient Burrows-Wheeler Transform (BWT) approach made popular by tools such as BWA and Bowtie. As far as I can tell, though, both of these tools and their derivatives are invoked strictly via a command-line interface. Are there any libraries that implement BWT-based read alignment with a C/C++ API?

Python bindings would also be great, but probably too much to expect.

terdon's user avatar
terdon
11.4k6 gold badges26 silver badges51 bronze badges
askedJun 6, 2017 at 19:19
Daniel Standage's user avatar
$\endgroup$
1
  • $\begingroup$Not C/C++, but just to mention that rust-bio seems to have BWT-related capabilities: <rust-bio.github.io>$\endgroup$CommentedJun 7, 2017 at 8:51

4 Answers4

12
$\begingroup$

First, let us remark that there exist several hundred read mappers, most of which have been even published (see, e.g., pages 25-29 ofthis thesis). Developing a new mapper probably makes sense only as a programming exercise. Whereas developing a quick proof-of-concept read mapper is usually easy, turning it into a real competitor of existing and well-tuned mappers can last for years.

It is not clear from the provided description how long is your reference, how many alignments you need to compute, etc. In certain situations, it may be useful to write a wrapper over existing mappers (e.g., usingsubprocess.Popen for running the mapper andPySam for parsing the output); while in some other situations, standard dynamic programming may be sufficient (e.g., usingSSW with its Python binding).

Let assume that you want to develop a toy read mapper. Most of read mappers are based on a so called seed-and-extend paradigm. Simply speaking, first you detect candidates for alignments, usually as exact matches between a read and the reference (using either a hash table or some full-text index – e.g., BWT-index). Then you would need to compute alignments for these candidates, typically using some algorithm based on dynamic programming, and report the best obtained alignments (e.g., the ones with the highest alignment score).

There exist two big, powerful and well debugged libraries implementing BWT-indexes which can be easily used for building a read mapper:

answeredJun 6, 2017 at 20:17
Karel Břinda's user avatar
$\endgroup$
4
  • $\begingroup$Never heard of YARA … better-known mappers also use SeqAn, e.g. both TopHat and Bowtie (though not for the mapping itself, as far as I know).$\endgroup$CommentedJun 6, 2017 at 21:55
  • $\begingroup$Sigh, I was hoping to avoid writing BWA's SAM output to disk, but PySAM treats input strings as file names and explicitly doesn't support StringIO objects. Guess I'll have to look into named pipes or something like that.$\endgroup$CommentedJun 12, 2017 at 20:12
  • $\begingroup$PySAM supports also file objects – seepysam.readthedocs.io/en/latest/api.html#pysam.AlignmentFile. AlignmentFile(filepath_or_object,....: "If filepath_or_object is a python File object, the already opened file will be used."$\endgroup$CommentedJun 12, 2017 at 20:15
  • $\begingroup$Yeah, but runningproc.communicate gives me BWA's SAM output as a string. It looks like none of the usual string trickery will work, and somehow I'll have to go through the file system to get PySAM to read it.$\endgroup$CommentedJun 12, 2017 at 20:17
9
$\begingroup$

BWA-MEM can be used as a library. Filebwa/example.c shows the basic functionality for single-end mapping. It should give identical mapping to the bwa-mem command line. Headerbwa/bwamem.h contains basic documentation. Paired-end mapping is doable, but not well exposed. Several teams, including GATK and MS genomics, have been using bwa-mem as a library.

answeredJun 6, 2017 at 23:07
user172818's user avatar
$\endgroup$
1
  • $\begingroup$That is extremely cool.$\endgroup$CommentedJun 7, 2017 at 13:22
4
$\begingroup$

SeqAn supports BWT tables for use with their parametrisable alignment algorithms.

To use it,follow the general outline for building a SeqAn short-read aligner, and use theFMIndex specialisation instead of — as in the example — theIndexQGram.

gringer's user avatar
gringer
15.7k5 gold badges27 silver badges84 bronze badges
answeredJun 6, 2017 at 19:44
Konrad Rudolph's user avatar
$\endgroup$
1
$\begingroup$

If you're looking for a python API, nanopore have created one (although I am not sure how "maintained" it is)https://github.com/nanoporetech/bwapy

answeredApr 29, 2019 at 0:39
Michael Hall's user avatar
$\endgroup$

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.