BMFtools (BarcodedMolecularFamilies tools) is a suite of tools for barcoded reads that takes advantage of PCR redundancy for error reduction/elimination. The core functionality consists of molecular demultiplexing at the fastq stage, producing a unique observation for each sequenced founding template molecule. Accessory tools provide postprocessing, filtering, quality control, and summary statistics.

Requirements:
- gcc 4.9+
- samtools 1.2+

Installation:

    git clone https://github.com/ARUP-NGS/BMFtools --recursive
    cd BMFtools
    make

Name | Use |
---|---|
bmftools cap | Postprocess a tagged BAM for BMF-agnostic tools. |
bmftools depth | Calculates depth of coverage over a set of bed intervals. |
bmftools collapse | Collapse initial fastq records by barcode. |
bmftools err | Calculate error rates based on cycle, base call, and quality score. |
bmftools famstats | Calculate family size statistics for a bam alignment file. |
bmftools filter | Filter or split a bam file by a set of filters. |
bmftools mark | Add tags for rsq. |
bmftools stack | A maximally-permissive variant caller using molecular barcode metadata analogous to samtools mpileup. |
bmftools rsq | Rescue reads using positional inference to collapse to unique observations in spite of errors in the barcode sequence. |
bmftools sort | Sort for bam rescue. |
bmftools target | Calculates on-target rate. |
bmftools vet | Curate variant calls from another variant caller (.bcf) and a bam alignment. |
These tools are divided into four categories:
- Core functionality
- Manipulation
- Analysis
#### bmftools collapse

bmftools collapse combines reads sharing a barcode into a single observation.

First, the barcodes are added to the comment fields of the fastq records, and the records are split into subsets based on the first characters of the barcode. Then, reads with exactly matching barcodes are collapsed, with a meta-analysis performed on each base call.

bmftools collapse inline collapses templates where both strands were sequenced, whereas collapse secondary lacks strand information.
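
The grouping-and-consensus idea behind collapsing can be illustrated with a minimal Python sketch. It is not the BMFtools implementation: the function name `collapse_by_barcode`, the input layout, and the consensus rule (pick the base with the highest summed quality) are assumptions made for illustration, standing in for the per-base meta-analysis that bmftools actually performs. The `FM` and `FA` keys mirror the tag meanings documented in the tag table.

```python
from collections import defaultdict


def collapse_by_barcode(reads):
    """Collapse reads with exactly matching barcodes into one consensus each.

    reads: iterable of (barcode, sequence, qualities); all sequences in a
    family are assumed to have the same length (an illustrative simplification).
    """
    families = defaultdict(list)
    for barcode, seq, quals in reads:
        families[barcode].append((seq, quals))

    consensus = {}
    for barcode, members in families.items():
        length = len(members[0][0])
        bases, agreed = [], []
        for i in range(length):
            support = defaultdict(int)  # summed quality per candidate base
            counts = defaultdict(int)   # number of reads per candidate base
            for seq, quals in members:
                support[seq[i]] += quals[i]
                counts[seq[i]] += 1
            # Ties are broken alphabetically so the result is deterministic.
            best = max(sorted(support), key=lambda b: support[b])
            bases.append(best)
            agreed.append(counts[best])
        consensus[barcode] = {
            "seq": "".join(bases),
            "FM": len(members),  # family size
            "FA": agreed,        # reads agreeing with the consensus per base
        }
    return consensus
```

A family of three reads that disagree at one position yields a single record with `FM` of 3 and an `FA` entry of 2 at the disagreeing base.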
#### bmftools rsq

bmftools rsq uses positional information to collapse reads sharing alignment signatures with close barcodes, under the assumption that they came from the same original founding molecule but with errors in reading the barcode.
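
As a rough illustration of barcode rescue, the sketch below groups records by an alignment signature and then merges barcodes that lie within a small Hamming distance of each other. The signature tuple, the `max_mismatches` threshold, and the greedy grouping strategy are all assumptions chosen for brevity; they are not taken from the bmftools rsq source.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def rescue(records, max_mismatches=2):
    """Group barcodes that share an alignment signature and are close.

    records: iterable of (signature, barcode), where signature is any
    hashable description of the alignment, e.g. (contig, position, strand).
    Returns {signature: [[barcode, ...], ...]}; each inner list is one
    inferred founding molecule.
    """
    by_signature = defaultdict(list)
    for signature, barcode in records:
        by_signature[signature].append(barcode)

    merged = {}
    for signature, barcodes in by_signature.items():
        groups = []
        for barcode in barcodes:
            for group in groups:
                # Compare against the first barcode seen in the group.
                if hamming(group[0], barcode) <= max_mismatches:
                    group.append(barcode)
                    break
            else:
                groups.append([barcode])
        merged[signature] = groups
    return merged
```

Reads at different positions are never merged, even with identical barcodes, because the alignment signature is compared before the barcode.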
#### bmftools cap

Caps quality scores using barcode metadata to facilitate working with barcode-agnostic tools.
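
One way to picture capping: meta-analysis Phred values (the `PV` tag) can exceed 93, the largest score that Phred+33 ASCII encoding can hold, so they must be bounded before barcode-agnostic tools read them. The sketch below is an assumption-laden illustration, not the bmftools cap algorithm; the `min_fa` and `floor` parameters are hypothetical.

```python
MAX_PHRED33 = 93  # highest score representable in Phred+33 ASCII ('~')


def cap_qualities(pv, fa, min_fa=1, cap=MAX_PHRED33, floor=2):
    """Bound per-base qualities to [floor, cap].

    pv: per-base Phred values after meta-analysis (PV tag).
    fa: per-base count of agreeing reads (FA tag).
    Bases supported by fewer than min_fa reads are set to floor.
    """
    if len(pv) != len(fa):
        raise ValueError("PV and FA arrays must have equal length")
    capped = []
    for phred, agree in zip(pv, fa):
        if agree < min_fa:
            capped.append(floor)
        else:
            capped.append(max(floor, min(phred, cap)))
    return capped


def to_phred33(quals):
    """Encode integer qualities as a Phred+33 quality string."""
    return "".join(chr(q + 33) for q in quals)
```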
#### bmftools filter

Filters or splits a bam file based on a set of filters. These can be inverted with -v (analogous to grep).

Filters:
- Fail reads with insufficient mapping quality.
- Fail reads with insufficient family size.
- Fail read pairs by aligned fraction.
- Fail reads outside of a bed region.
- Fail reads without all bits in given parameter in the sam flag field.
- Fail reads with any bits in given parameter in the sam flag field.
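
The two flag-based filters and the inversion behave like simple bitmask tests. The following Python sketch shows that logic only; the function and parameter names are hypothetical and do not correspond to bmftools filter options.

```python
def passes_flag_filters(flag, required_bits=0, forbidden_bits=0):
    """Return True if a SAM flag has every required bit and no forbidden bit."""
    has_all_required = (flag & required_bits) == required_bits
    has_no_forbidden = (flag & forbidden_bits) == 0
    return has_all_required and has_no_forbidden


def keep_read(flag, required_bits=0, forbidden_bits=0, invert=False):
    """Apply the flag filters, optionally inverted (like grep -v)."""
    return passes_flag_filters(flag, required_bits, forbidden_bits) != invert
```

For example, flag 99 (paired, proper pair, mate reverse, first in pair) passes a filter requiring bits `0x3`, while the same read marked as a duplicate (flag 1123) fails a filter forbidding `0x400` and is kept only when the filter is inverted.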
#### bmftools vet

Curates SNV calls from a tumor/normal variant call file using barcode metadata from the bams used to produce the variant call file.

#### bmftools depth

Calculates depth of coverage across a bed file using barcode metadata.

#### bmftools target

Calculates on-target fraction for bed file using barcode metadata.
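
An on-target fraction is the share of reads overlapping the bed intervals. The sketch below computes it two ways: per collapsed read, and weighted by family size (`FM`). The weighting is an assumption about what "using barcode metadata" could mean here, and the data layout and function names are hypothetical.

```python
def overlaps(start, end, intervals):
    """True if half-open [start, end) overlaps any half-open interval."""
    return any(start < iv_end and iv_start < end for iv_start, iv_end in intervals)


def on_target_fraction(reads, bed):
    """Return (fraction of reads on target, FM-weighted fraction on target).

    reads: iterable of (contig, start, end, fm) with fm >= 1.
    bed: {contig: [(start, end), ...]} with half-open, 0-based intervals.
    """
    total = on_target = 0
    total_fm = on_target_fm = 0
    for contig, start, end, fm in reads:
        total += 1
        total_fm += fm
        if overlaps(start, end, bed.get(contig, [])):
            on_target += 1
            on_target_fm += fm
    if total == 0:
        return 0.0, 0.0
    return on_target / total, on_target_fm / total_fm
```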
#### bmftools err

Calculates error rates by a variety of parameters. Additionally, pre-computes the quality score recalibration for the optional collapse recalibration step.
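
To make the idea concrete, the sketch below tabulates mismatch rates by cycle and by reported quality, then converts an observed error rate into an empirical Phred score, which is the basic quantity a recalibration table would hold. The input format, function names, and the `floor` parameter are assumptions for illustration and do not reflect the bmftools err output format.

```python
import math
from collections import defaultdict


def error_rates(observations):
    """Tabulate error rates by cycle and by reported quality score.

    observations: iterable of (cycle, called_base, reference_base, quality).
    Returns (rate_by_cycle, rate_by_quality) as dictionaries.
    """
    by_cycle = defaultdict(lambda: [0, 0])  # key -> [errors, total]
    by_quality = defaultdict(lambda: [0, 0])
    for cycle, called, reference, quality in observations:
        is_error = int(called != reference)
        for table, key in ((by_cycle, cycle), (by_quality, quality)):
            table[key][0] += is_error
            table[key][1] += 1

    def to_rates(table):
        return {key: errors / total for key, (errors, total) in table.items()}

    return to_rates(by_cycle), to_rates(by_quality)


def empirical_phred(error_rate, floor=1e-6):
    """Convert an observed error rate to a Phred score, avoiding log(0)."""
    return round(-10 * math.log10(max(error_rate, floor)))
```

Comparing `empirical_phred` of each quality bin against the reported quality shows whether the sequencer over- or under-states its confidence.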
#### bmftools famstats

Calculates summary statistics related to family size and demultiplexing.
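
A few of the statistics one might derive from per-read `FM` values are sketched below in Python. The particular statistics chosen and the function name are illustrative assumptions, not the set reported by bmftools famstats.

```python
from collections import Counter


def family_size_stats(fm_values):
    """Summarize family sizes given one FM value per collapsed read."""
    fm_values = list(fm_values)
    if not fm_values:
        raise ValueError("at least one family is required")
    histogram = Counter(fm_values)
    families = len(fm_values)
    raw_reads = sum(fm_values)
    return {
        "families": families,
        "raw_reads": raw_reads,
        "mean_family_size": raw_reads / families,
        "singleton_fraction": histogram.get(1, 0) / families,
        "histogram": dict(sorted(histogram.items())),
    }
```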
#### bmftools stack

A maximally-permissive variant caller using molecular barcode metadata analogous to samtools mpileup.

Tag | Content | Format |
---|---|---|
DR | Whether the read was sequenced from both strands. Only valid for inline chemistry. | Integer [0, 1] |
FA | Number of reads in Family which Agreed with final sequence at each base | uint32_t array |
FM | Size of family (number of reads sharing a barcode), i.e., "Family Members" | Integer |
FP | Read Passes Filter related to barcoding. Determines QC fail flag in bmftools mark (without -q). | Integer [0, 1] |
NF | Mean number of differences between reads and consensus per read in family | Float |
PV | Phred Values for a base call after meta-analysis | uint32_t array |
RV | Number of reversed reads in consensus. Only for inline chemistry. | Integer |