Annotation Pipeline for NLR genes

NLR-Parser README

NLR-Parser is a tool to rapidly annotate the NLR complement from sequenced plant genomes.

The NLR-Parser refines the output of MAST and reliably annotates disease resistance genes encoding for nucleotide-binding leucine-rich repeat (NLR) proteins.

Prerequisites

MEME suite version 4.9.1

The MEME suite is available at http://meme-suite.org/index.html

Please note that the most recent version of MEME is not compatible with NLR-Parser. Use MEME 4.9.1.

Don't worry about setting up the Apache webserver. You just need MAST, so the quick install is sufficient.

JRE 1.6

Make sure you have the Java Runtime Environment 1.6 or higher. Download it from http://java.com

NLR motif definitions

Download the meme.xml that contains the definitions from here. The motifs were published by Jupe et al. (2012). The downloaded meme.xml is an input argument for MAST.

6Frame translator

If you intend to screen nucleotide sequences for NLRs, it might make sense to translate your sequence in all 6 reading frames. To ensure the full functionality of the NLR-Parser, make sure the names of the 6 amino acid sequences differ only by a suffix and end with:

_frame+0
_frame+1
_frame+2
_frame-0
_frame-1
_frame-2

For this you can use the TranslateSequence.jar, which is part of this software.
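The naming convention above can be illustrated with a small Python sketch. This is not the TranslateSequence.jar implementation. The function names, the standard genetic code, the use of '*' for stop codons and 'X' for codons with unknown bases are all assumptions made for illustration.

```python
# Illustrative six-frame translation that follows the _frame+0 ... _frame-2
# naming convention. NOT the TranslateSequence.jar implementation.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(name, dna):
    """Return a dict mapping '<name>_frame<strand><offset>' to a protein string."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{name}_frame{strand}{offset}"] = translate(seq[offset:])
    return frames
```

Writing each entry of the returned dict as one fasta record gives six sequences per input whose names differ only by the frame suffix.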

Installation

Just download NLR-Parser.jar from the latest release. Run it from the command line.

java -jar NLR-Parser.jar -i <mast.xml> -o <output.mast.txt> [-s <splitpattern>] [-p <pvalue>] [-b <blastfile>] [-gh] [-a <sequence>]

If you want to build it from source you will need the Apache Commons CLI.

Input parameters

parameter	argument	description
-i	STR	The location of the xml output of MAST.
-o	STR	Location and name of the output file that will be generated by the NLR-Parser. Note that an existing file will be overwritten.
-s	STR	The splitpattern to combine 6-frame-translated nucleotide sequences to one output. Default: "_frame"
-p	float	P-value threshold. Motifs with a p-value above this threshold will be ignored by the NLR-Parser. Default: 1E-5
-a	STR	Location of an optional amino acid sequence file. This file should be the same as the one subjected to MAST. Providing this file allows extraction of the NB-ARC domain of the NLR, e.g. for phylogenetic studies. The file has to be in fasta format.
-g		Output gff format instead of a tsv.
-h		Print help.
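If you call the NLR-Parser from a script, the parameters in the table can be assembled as in the following Python sketch. The helper name build_nlr_parser_command and its keyword arguments are hypothetical. Only the flags -i, -o, -s, -p, -a and -g come from the table above.

```python
# Hypothetical helper that assembles an NLR-Parser call from the documented flags.

def build_nlr_parser_command(mast_xml, output, jar="NLR-Parser.jar",
                             split_pattern=None, p_value=None,
                             aa_fasta=None, gff=False):
    """Return the command as an argument list (no shell quoting needed)."""
    cmd = ["java", "-jar", jar, "-i", mast_xml, "-o", output]
    if split_pattern is not None:
        cmd += ["-s", split_pattern]   # e.g. "_frame" (the default) or "$$"
    if p_value is not None:
        cmd += ["-p", str(p_value)]    # pass a string such as "1E-5"
    if aa_fasta is not None:
        cmd += ["-a", aa_fasta]        # same fasta file that was given to MAST
    if gff:
        cmd.append("-g")               # gff output instead of tsv
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True). Because no shell is involved, a split pattern such as "$$" is passed through literally.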

-s splitpattern

If a nucleotide sequence is to be annotated, it should first be translated into its 6 reading frames. The NLR-Parser can assume that the names of the 6 amino acid sequences consist of a shared sequence name followed by the splitpattern and a frame suffix (for example, a name ending in _frame+0). In that case it reports the combined result on one line, with the shared sequence name in the first column. This makes sense if you annotate genomic sequence and introns cause a "frameshift". It relies on the assumption that a sequence is highly unlikely to have motifs on a forward strand and on the reverse strand at the same time.

This is of course a pitfall if your sequence of interest contains two NLRs on different strands. In those cases, please use the workaround -s $$, assuming that none of your identifiers contains a "$$". In POSIX shells an unquoted $$ expands to the process ID, so quote it as '$$'.
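The effect of the splitpattern can be pictured with the following Python sketch. It is an assumption about the grouping behaviour described above, not the NLR-Parser source code, and the function name group_by_split_pattern is hypothetical.

```python
from collections import defaultdict


def group_by_split_pattern(sequence_names, split_pattern="_frame"):
    """Group sequence names by the part that precedes the split pattern.

    Names that do not contain the pattern form a group of their own.
    """
    groups = defaultdict(list)
    for name in sequence_names:
        base = name.split(split_pattern, 1)[0]
        groups[base].append(name)
    return dict(groups)
```

With the default pattern, contig1_frame+0 and contig1_frame-2 end up in the same group, so they would be reported together. With a pattern such as "$$" that occurs in no identifier, every translated frame stays separate, which is what the workaround relies on.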

-g

Generate a gff file rather than a tsv table with the NLR-Parser results. This option is under development. Feel free to try and send us comments.

-a aminoacidfile.fasta

One column of the NLR-Parser output is the amino acid sequence of the NB-ARC domain. This is usually the most conserved part of the NLR and can be used for phylogenetic studies. If you do not provide the complete amino acid sequences of the genes, this column is empty.

-p pvalue

This is the threshold of the p-values of the individual motifs. Motifs with a p-value above this threshold are ignored by the NLR-Parser. The default is 1E-5.
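As a minimal sketch of this rule, the following Python snippet drops motif hits whose p-value exceeds the threshold. The MotifHit record and its fields are hypothetical and do not reflect the MAST xml schema or the NLR-Parser internals.

```python
from typing import List, NamedTuple


class MotifHit(NamedTuple):
    """Hypothetical record for one motif occurrence reported by MAST."""
    motif: str
    position: int
    p_value: float


def filter_motif_hits(hits: List[MotifHit],
                      p_value_threshold: float = 1e-5) -> List[MotifHit]:
    """Keep hits at or below the threshold; hits above it are ignored."""
    return [hit for hit in hits if hit.p_value <= p_value_threshold]
```

Lowering the threshold makes the annotation stricter, because fewer motif hits remain for the classification.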

Tips

MAST has an E-value threshold. Sequences with an E-value above that threshold are not displayed. This E-value depends on the number of input sequences. If you run MAST on a really large file, add the parameter -ev 10000000 to your call.
If you want to annotate large files like genomes, it makes sense to chop them into overlapping fragments.
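Chopping a long sequence into overlapping fragments could look like the following Python sketch. The function name and the default fragment length and overlap are assumptions chosen for illustration, not values recommended by the NLR-Parser authors. Choose an overlap large enough that a gene cut at one fragment boundary is contained in the neighbouring fragment.

```python
def chop_into_fragments(sequence, fragment_length=20000, overlap=5000):
    """Split a sequence into overlapping fragments.

    Returns a list of (start, fragment) tuples, where start is the 0-based
    offset of the fragment in the original sequence.
    """
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be >= 0 and smaller than fragment_length")
    if not sequence:
        return []
    step = fragment_length - overlap
    fragments = []
    start = 0
    while True:
        end = min(start + fragment_length, len(sequence))
        fragments.append((start, sequence[start:end]))
        if end == len(sequence):  # last fragment reaches the sequence end
            break
        start += step
    return fragments
```

Keeping the start offset makes it possible to map motif positions in a fragment back to coordinates in the original sequence.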

Citation

For using the NLR motifs, please cite Jupe et al. (2012).
For NLR-Parser, please cite Steuernagel et al. (2015).

Contact

If there are any issues with the tool or if you would like to collaborate with us, please don't hesitate to contact us.

Releases

1.0 (latest), Jan 21, 2015

Languages

Java 100.0%

steuernb/NLR-Parser