genome/bam-readcountPublic

NotificationsYou must be signed in to change notification settings
Fork98
Star319

Count bases in BAM/CRAM files

License

MIT license

319 stars 98 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 287 Commits
.github/workflows		.github/workflows
build-common		build-common
cmake		cmake
debian		debian
integration-test		integration-test
src		src
test-data		test-data
test/lib/bamrc		test/lib/bamrc
tutorial		tutorial
vendor		vendor
version		version
.gitignore		.gitignore
.gitmodules		.gitmodules
.zenodo.json		.zenodo.json
BUILD.md		BUILD.md
CMakeLists.txt		CMakeLists.txt
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PackageInfo.cmake		PackageInfo.cmake
README.md		README.md

Repository files navigation

bam-readcount

bam-readcount is a utility that runs on aBAM orCRAM file and generates low-level information aboutsequencing data at specific nucleotide positions. Its outputs include observed bases,readcounts, summarized mapping and base qualities, strandedness information,mismatch counts, and position within the reads. (see "Output" section below)

Originally designed to help filter genomic mutation calls, the metricsbam-readcount outputsare also useful as input for variant detection tools and for resolving ambiguity betweenvariant callers.

If you findbam-readcount useful in your work, please cite ourpaper:

Khanna et al., (2022). Bam-readcount - rapid generation of basepair-resolution sequence metrics.Journal of Open Source Software, 7(69), 3722.https://doi.org/10.21105/joss.03722

Installation

Docker

The latest release version ofbam-readcount is available as a Docker imageonDockerHub

docker pull mgibio/bam-readcount

For details see thedocker-bam-readcountrepository.

Build

Requires a C++ toolchain andcmake. For details seeBUILD.md.

git clone https://github.com/genome/bam-readcount cd bam-readcountmkdir buildcd buildcmake ..make# Executable isbin/bam-readcount

Usage

Run with no arguments for command-line help:

$ bam-readcountUsage: bam-readcount [OPTIONS] <bam_file> [region]Generate metrics for bam_file at single nucleotide positions.Example: bam-readcount -f ref.fa some.bamAvailable options:  -h [ --help ]                         produce this message  -v [ --version ]                      output the version number  -q [ --min-mapping-quality ] arg (=0) minimum mapping quality of reads used                                        for counting.  -b [ --min-base-quality ] arg (=0)    minimum base quality at a position to                                        use the read for counting.  -d [ --max-count ] arg (=10000000)    max depth to avoid excessive memory                                        usage.  -l [ --site-list ] arg                file containing a list of regions to                                        report readcounts within.  -f [ --reference-fasta ] arg          reference sequence in the fasta format.  -D [ --print-individual-mapq ] arg    report the mapping qualities as a comma                                        separated list.  -p [ --per-library ]                  report results by library.  -w [ --max-warnings ] arg             maximum number of warnings of each type                                        to emit. -1 gives an unlimited number.  -i [ --insertion-centric ]            generate indel centric readcounts.                                        Reads containing insertions will not be                                        included in per-base counts

The optional[region] should be in the same format assamtools:

chromosome:start-stop

The optional-l (--site-list) file should be tab-separated, noheader, one region per line:

chromosomestartend

CRAM support

When using CRAM files as input, if a reference is specified with-f, it will override whatever is inthe CRAM header. Otherwise, the reference(s) encoded in the CRAM header or a lookup byMD5 at ENA will be used.

Wrappers/Parsers

Add bam-readcount counts to VCF

VAtools allows you to add read-counts to VCF from modern variant callers.Additional detailsCreate csv file
brc-parser parser to convert bam-readcount output to comma seperated long format file.

Output

Output is tab-separated with no header toSTDOUT, one line perposition:

chrpositionreference_basedepthbase:count:avg_mapping_quality:avg_basequality:avg_se_mapping_quality:num_plus_strand:num_minus_strand:avg_pos_as_fraction:avg_num_mismatches_as_fraction:avg_sum_mismatch_qualities:num_q2_containing_reads:avg_distance_to_q2_start_in_q2_reads:avg_clipped_length:avg_distance_to_effective_3p_end   ...

There is one set of:-separated fields for each reportedbase withstatistics on the set of reads containing that base:

Field	Description
base	The base, eg`C`
count	Number of reads
avg_mapping_quality	Mean mapping quality
avg_basequality	Mean base quality
avg_se_mapping_quality	Mean single ended mapping quality
num_plus_strand	Number of reads on the plus/forward strand
num_minus_strand	Number of reads on the minus/reverse strand
avg_pos_as_fraction	Average position on the read as a fraction, calculated with respect to the length after clipping. This value is normalized to the center of the read: bases occurring strictly at the center of the read have a value of 1, those occurring strictly at the ends should approach a value of 0
avg_num_mismatches_as_fraction	Average number of mismatches on these reads per base
avg_sum_mismatch_qualities	Average sum of the base qualities of mismatches in the reads
num_q2_containing_reads	Number of reads with q2 runs at the 3’ end
avg_distance_to_q2_start_in_q2_reads	Average distance of position (as fraction of unclipped read length) to the start of the q2 run
avg_clipped_length	Average clipped read length
avg_distance_to_effective_3p_end	Average distance to the 3’ prime end of the read (as fraction of unclipped read length)

Per-library output

With the-p option, each output line will have a set of{}-delimitedresults, one for each library:

chrpositionreference_basedepthlibrary_1_name{base:count:avg_mapping_quality:avg_basequality:avg_se_mapping_quality:num_plus_strand:num_minus_strand:avg_pos_as_fraction:avg_num_mismatches_as_fraction:avg_sum_mismatch_qualities:num_q2_containing_reads:avg_distance_to_q2_start_in_q2_reads:avg_clipped_length:avg_distance_to_effective_3p_end}   ...   library_N_name{base:count:avg_mapping_quality:avg_basequality:avg_se_mapping_quality:num_plus_strand:num_minus_strand:avg_pos_as_fraction:avg_num_mismatches_as_fraction:avg_sum_mismatch_qualities:num_q2_containing_reads:avg_distance_to_q2_start_in_q2_reads:avg_clipped_length:avg_distance_to_effective_3p_end}

Tutorial

For those who learn best by example, abrief tutorial is available here that uses bam-readcount to identify the Omicron SARS-CoV-2 variant of concern from raw sequence data.

Support

For support, pleasesearchbam-readcount onBiostars as many of the most frequently askedquestions aboutbam-readcount have been answered there. For problems not addressed there,please open an github issue or make a BioStar post.

Contributing

We welcome contributions! SeeContributing for more details

About

Count bases in BAM/CRAM files

Releases7

v1.0.1 Release Latest

Jan 19, 2022

+ 6 releases

Packages

No packages published

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

bam-readcount

Contents

Installation

Docker

Build

Usage

CRAM support

Wrappers/Parsers

Output

Per-library output

Tutorial

Support

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases7

Packages

Uh oh!

Contributors13

Uh oh!

Languages

Movatterモバイル変換

License

genome/bam-readcount

Folders and files

Latest commit

History

Repository files navigation

bam-readcount

Contents

Installation

Docker

Build

Usage

CRAM support

Wrappers/Parsers

Output

Per-library output

Tutorial

Support

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases7

Packages0

Uh oh!

Contributors13

Uh oh!

Languages

Packages