- Notifications
You must be signed in to change notification settings - Fork3
New version of JACUSA -> 2.0
License
dieterich-lab/JACUSA2
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
JAVA framework for accurate Variant assessment (JACUSA2) is a one-stop solution to detect singlenucleotide variants (SNVs) and reverse transcriptase induced arrest events in Next-generationsequencing (NGS) data.
JACUSA2 is a direct successor ofJACUSA1 --- JACUSA1 is hereby deprecated and won't becontinued. All methods (call-1,call-2, andpileup) from JACUSA1 are available in JACUSA2.The new release of JACUSA2 features great performance enhancements (~3 faster) for existing methodsand adds new methodsrt-arrest andlrt-arrest (EXPERIMENTAL) that enable identification of read arrest events.
Check themanual for further details.
JACUSA2 does not require any configuration but needs a correctly configured Java environment.We developed and tested JACUSA2 with Java v1.8. If you encounter any Java related problems pleaseconsider to change to Java v1.8.
The latest version of JACUSA2 can be obtained fromall releases.
Available methods in JACUSA2:
$ java -jar jacusa2.jarusage: JACUSA2 <METHOD> <METHOD-OPTIONs> <BAMs> METHOD DESCRIPTION call-1 Call variants - 1 condition call-2 Call variants - 2 conditions pileup SAMtools like mpileup - 2 conditions rt-arrest Reverse Transcription Arrest - 2 conditions lrt-arrest Linkage arrest to base substitution - 2 conditions [...]
Get supported options for a method (e.g.: call-1):
$ java -jar jacusa2.jar call-1usage: JACUSA call-1 [OPTIONS] BAM1_1[,BAM1_2,...] -A Show all sites - including sites without variants -a <FEATURE-FILTER> [...] Use -h to see extended help -B <READ-TAG> Tag reads by base substitution. Count non-reference base substitution per read and stratify. Requires stranded library type. (Format for T to C mismatch: T2C; use ',' to separate substitutions) Default: none -b <BED> BED file to scan for variants -c <MIN-COVERAGE> filter positions with coverage < MIN-COVERAGE default: 5 [...]
Replicates or multiple bam files are separated by ",":
java -jar jacusa.jar call-2 -r JACUSA.out -a H:1 \ gDNA.bam \ cDNA_replicate_1.bam,cDNA_replicate_2,bam,cDNA_replicate_3.bam
Checkmanual for detailed method specific options.
JACUSA2 requires indexed BAM files.In order to create a BAM file index for an existing filealign.bam
, usesamtools and execute the following:
$ samtools index align.bam
For further details and sam->bam conversion, please check thesamtools howtos.
Some methods and options require the "MD"-field of a BAM file to be correctly populated.The "MD"-field stores information on mismatched and deleted reference bases.It allows to reconstruct the original reference sequence from alignments stored in a BAM file.
Given the reference sequencereference.fasta
from the mapping step,use the following command to populate the "MD"-field of an existingalign.bam
:
$ samtools calmd -b align.bam reference.fasta > align_md.bam
Checksamtools calmd for more details.
JACUSA2 writes its output to a user specified file. When using multiple threads, JACUSA2 willcreate a temporary file for each allocated thread in the temp directory that is provided by theJAVA Virtual Machine. Check the manual of your JAVA Virtual machine on how to change the temp directory.
Chosen command line parameters and current genomic position are printed to the command prompt andserve as a status guard.
Output format of JACUSA2 is controlled by the-f <FORMAT>
command line option. Support for outputformats depends on the used method.
A "##" prefixed header that contains JACUSA2 runtime specific data such as version info and command line options is added to the default output format.The default output format is a combination ofBED6 withJACUSA2 methods specific columns and common info columns: "info", "filter", and "ref".The actual number of columns depends on the JACUSA2 method and the number of provided BAM files.
Columns | 1-6 | 7 - (N-3) | (N-2) - N |
---|---|---|---|
Description | BED6 | Method specific | (General) info, filter, and ref(erence) specific |
Robust identification of variants has proven to be a daunting task due to artefacts specific forNGS-data and employed mapping strategies.We implement various artefact/feature filters (checkmanual for "-a [...])that reduce the number of false positives.
JACUSA2 supports two modes of sample setups of variant calling:
- single (call-1) or
- paired samples (call-2).
The methodcall-1 identifies variants against the reference sequence.BAM files with a properly populated "MD"-field are required - checkJACUSA2 - Required input andSAM Tags specification.
$ java -jar jacusa2.jar call-1 results_call1.out alignments.bam
The methodcall-2 identifies variants in 2 conditions.
$ java -jar jacusa2.jar call-2 results_call2.out condition1.bam condition2.bam
JACUSA2 supports two methods to identify arrest events by means of comparing counts of arrest and throughreads:rt-arrest andlrt-arrest. Beyond read counts, JACUSA2 shows base counts from arrest and through reads.This allows to inspect arrest events and variant calling simultaneously.It is mandatory to provide the library type by "-P" or "-P1" and "-P2"!
Check the section on arrest events in themanual.
In this method, base call counts of arrest and read through reads are modelled by a Beta-Binomial distribution anddifferences between conditions are to be identified by means of a likelihood ratio test. Subsequent approximiationwith
Sites are considered candidate arrest sites, if in all BAM files there is at least one read through AND one
read arrest event. Otherwise, there would be no difference between the conditions.Furthermore, coverage filter and minBASQ of base call apply that will affect the output.
$ java -jar jacusa2.jar rt-arrest -P FR-SECONDSTRAND -o results_rt_arrest.out condition1_1.bam,...,condition1_N.bam condition2_1.bam,...,condition2_M.bam
lrt-arrest allows to link pileups to their arrest position. Output consists of read arrest and read through counts anda references to the associated arrest positions. There are cases, where currently an arrest position cannot be defined,e.g.: non properly paired reads.
There is also a new version ofJACUSA2helperto support downstream analysis of JACUSA2 output.Additonally, some artefact filters have been removed from JACUSA1 in favour of the rewritten R helper package.The old version of JACUSAhelper has been declared deprecated and won't be maintained anymore.
Java 1.8 and Maven 3.0+ are required to compile JACUSA2.JACUSA2 JAR will be available intarget/JACUSA2-*.jar
.
# get srcgit clone git@github.com:dieterich-lab/JACUSA2.git# build JACUSA2cd JACUSA2mvn clean install
- JACUCA1 to JACUSA2
- General
- Complete code rework - ~3x faster than JACUSA2
- Added "##" prefixed header line that captures CLI arguments
- CLI changes
- ALL two dash options "--option ..." have been removed
- Use "-filterNH" and "-filterNM" instead of "--filterNH" and "--filterNM"
- Library type option has changed: JACUSA1: "-P Lib1,Lib2", JACUSA2: "-P1 Lib1 -P2 Lib2"
- New methods and options
- Addedrt-arrest method - Reverse Transcription Arrest - 2 conditions
- Addedlrt-arrest method - Linkage arrest to base substitution - 2 conditions
- Added "-B " option that to partition reads based on base substitution
- Added "-I" or "-D" options to add insertion or deletion counts
- Artefact filter
- Added Exclude Site Filter (option: E)
- Moved some filter toJACUSA2helper
- Library changes
- Upgradedcommons-cli v1.2 to v1.4
- Upgradedcommons-math3 v3.3 to v3.6.1
- Ueplaced sam v1.92 withhtsjdk v2.12.0
- Removed ssj dependence
- General