Original author(s) | Heng Li |
---|---|
Developer(s) | John Marshall and Petr Danecek et al.[1] |
Initial release | 2009 |
Stable release | |
Repository | |
Written in | C |
Operating system | Unix-like |
Type | Bioinformatics |
License | BSD, MIT |
Website | www |
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA. Both simple and advanced tools are provided, supporting complex tasks like variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion.[3]
SAM files can be very large (tens of gigabytes is common), so compression is used to save space. SAM files are human-readable text files, and BAM files are simply their binary equivalent, whilst CRAM files are a restructured column-oriented binary container format. BAM files are typically compressed and more efficient for software to work with than SAM. SAMtools makes it possible to work directly with a compressed BAM file, without having to uncompress the whole file. Additionally, since the format of a SAM/BAM file is somewhat complex (containing reads, references, alignments, quality information, and user-specified annotations), SAMtools reduces the effort needed to use SAM/BAM files by hiding low-level details.
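To make the "human-readable text" point concrete, the following minimal sketch splits one SAM alignment line into its eleven mandatory tab-separated fields, using only standard Unix tools rather than SAMtools itself. The read name, coordinates and sequence are invented for illustration and do not come from a real data set.

```shell
# One made-up SAM alignment record: eleven mandatory tab-separated fields.
record=$(printf 'read1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII')

# Print each mandatory field next to its name from the SAM specification.
fields=$(printf '%s\n' "$record" | awk -F'\t' '{
    split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", name, " ")
    for (i = 1; i <= 11; i++) printf "%s=%s\n", name[i], $i
}')
printf '%s\n' "$fields"
```

Real records usually carry optional tagged fields after the eleventh column as well; these hold the user-specified annotations mentioned above and are ignored by this sketch.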
As third-party projects were trying to use code from SAMtools despite it not being designed to be embedded in that way, the decision was taken in August 2014 to split the SAMtools package into a stand-alone software library with a well-defined API (HTSlib),[4] a project for variant calling and manipulation of variant data (BCFtools), and the stand-alone SAMtools package for working with sequence alignment data.[5]
Like many Unix commands, SAMtools commands follow a stream model, where data runs through each command as if carried on a conveyor belt. This allows combining multiple commands into a data processing pipeline. Although the final output can be very complex, only a limited number of simple commands are needed to produce it. If no input or output file is specified, the standard streams (stdin, stdout, and stderr) are assumed. Data sent to stdout are printed to the screen by default but are easily redirected to a file using the normal Unix redirectors (> and >>), or to another command via a pipe (|).
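As a sketch of the stream model described above, the function below chains two SAMtools commands with a pipe so that no intermediate BAM file is written to disk. The function name sam_to_sorted_bam and its argument names are hypothetical, and passing - as the input of samtools sort to make it read from standard input assumes a SAMtools 1.x release.

```shell
# Hypothetical helper: convert a SAM file to BAM and sort it in one pipeline.
# Usage: sam_to_sorted_bam input.sam output_sorted.bam
sam_to_sorted_bam() {
    in_sam=$1
    out_bam=$2
    # "view -b" writes BAM to stdout; "sort" reads it from stdin ("-")
    # and writes the sorted result to the file given with -o.
    samtools view -b "$in_sam" | samtools sort -o "$out_bam" -
}
```

It could be called as sam_to_sorted_bam sample.sam sample_sorted.bam. Any diagnostics from either command still go to stderr, so they appear on the screen without being mixed into the piped data.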
SAMtools provides the following commands, each invoked as "samtools some_command".
samtools view sample.bam > sample.sam
Convert a bam file into a sam file.
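samtools view -bS sample.sam > sample.bam
Convert a sam file into a bam file. The -b option writes the output in the compressed bam format. The -S option was required by older versions for sam input; current versions detect the input format automatically and ignore it.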
samtools view sample_sorted.bam "chr1:10-13"
Extract all the reads aligned to the range specified, which are those that are aligned to the reference element named chr1 and cover its 10th, 11th, 12th or 13th base. The results are printed to standard output as sam text, without the header. An index of the input file, as created by samtools index, is required for extracting reads according to their mapping position in the reference genome.
samtools view -h -b sample_sorted.bam "chr1:10-13" > tiny_sorted.bam
Extract the same reads as above, but instead of displaying them, write them to a new bam file, tiny_sorted.bam. The -b option makes the output compressed and the -h option causes the SAM headers to be output also. These headers include a description of the reference that the reads in sample_sorted.bam were aligned to and will be needed if the tiny_sorted.bam file is to be used with some of the more advanced SAMtools commands. The order of extracted reads is preserved.
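The region extraction described above can be wrapped in a small helper that also takes care of the index requirement. This is only a sketch: the function name extract_region is hypothetical, it assumes the input is already sorted by position, and it only looks for an index named after the input with a .bai suffix.

```shell
# Hypothetical helper: write the reads overlapping a region to a new BAM file.
# Usage: extract_region sorted.bam "chr1:10-13" out.bam
extract_region() {
    in_bam=$1
    region=$2
    out_bam=$3
    # Region queries need an index; build sorted.bam.bai if it is missing.
    # (Assumes the input is already coordinate-sorted.)
    [ -e "$in_bam.bai" ] || samtools index "$in_bam" || return 1
    # -h keeps the header, -b writes compressed BAM instead of SAM text.
    samtools view -h -b "$in_bam" "$region" > "$out_bam"
}
```

For example, extract_region sample_sorted.bam "chr1:10-13" tiny_sorted.bam reproduces the command shown above, indexing sample_sorted.bam first if no index is found.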
samtools tview sample_sorted.bam
Start an interactive viewer to visualize a small region of the reference, the reads aligned, and mismatches. Within the view, the user can jump to a new location by typing g: and a location, like g:chr1:10,000,000. If the reference element name and following colon are replaced with =, the current reference element is used, i.e. if g:=10,000,200 is typed after the previous "goto" command, the viewer jumps to the region 200 base pairs down on chr1. Typing ? brings up help information for scroll movement, colors, views, etc.
samtools tview -p chrM:1 sample_chrM.bam UCSC_hg38.fa
Set the start position and compare the reads against the reference sequence.
samtools tview -d T -p chrY:10,000,000 sample_chrY.bam UCSC_hg38.fa >> save.txt
samtools tview -d H -p chrY:10,000,000 sample_chrY.bam UCSC_hg38.fa >> save.html
Save the screen as a .txt or .html file.
samtools sort -o sorted_out unsorted_in.bam
Read the specified unsorted_in.bam as input, sort it by aligned read position, and write it out to sorted_out. The output type can be sam, bam, or cram, and is determined automatically from the file extension of sorted_out.
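samtools sort -m 5000000 unsorted_in.bam sorted_out
Read the specified unsorted_in.bam as input and sort it while limiting memory use with the -m option. In SAMtools 1.x the value is the approximate maximum memory per thread, given in bytes unless a K, M or G suffix is used, so 5000000 corresponds to about 5 MB rather than 5 GB. When the data do not fit within that limit, sorted blocks are written to temporary bam files named sorted_out.0000.bam, sorted_out.0001.bam, etc., which are then merged into the final sorted output. Giving the output prefix sorted_out as the last argument is the legacy syntax of older SAMtools versions.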
samtools index sorted.bam
Creates an index file, sorted.bam.bai, for the sorted.bam file.
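Sorting and indexing are usually run back to back, since the index is only useful for a position-sorted file. The following minimal sketch combines the two commands shown above. The function name sort_and_index is hypothetical, and the -o form of samtools sort assumes a SAMtools 1.x release.

```shell
# Hypothetical helper: coordinate-sort a BAM file, then index the result.
# Usage: sort_and_index unsorted_in.bam sorted.bam
sort_and_index() {
    in_bam=$1
    out_bam=$2
    # Only build the index if sorting succeeded.
    samtools sort -o "$out_bam" "$in_bam" && samtools index "$out_bam"
}
```

Calling sort_and_index unsorted_in.bam sorted.bam would leave sorted.bam and sorted.bam.bai side by side, ready for region queries with samtools view.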