Annotation Pipeline for NLR genes


steuernb/NLR-Parser


NLR-Parser is a tool to rapidly annotate the NLR complement from sequenced plant genomes.

The NLR-Parser refines the output of MAST and reliably annotates disease resistance genes that encode nucleotide-binding leucine-rich repeat (NLR) proteins.

Prerequisites

MEME suite version 4.9.1

The MEME suite is available at http://meme-suite.org/index.html

Please note that the latest version of MEME is not compatible with the NLR-Parser. Use MEME 4.9.1.

Don't worry about setting up the Apache webserver. You just need MAST, so the quick install is sufficient.

JRE 1.6

Make sure you have the Java Runtime Environment 1.6 or higher. Download from http://java.com

NLR motif definitions

Download the meme.xml that contains the definitions from here. The motifs were published by Jupe et al. (2012). The downloaded meme.xml is an input argument for MAST.

6Frame translator

If you intend to screen nucleotide sequences for NLRs, translate each sequence in all 6 reading frames. To ensure the full functionality of the NLR-Parser, make sure the names of the 6 amino acid sequences differ only by a suffix and end with:

  • _frame+0
  • _frame+1
  • _frame+2
  • _frame-0
  • _frame-1
  • _frame-2

For this you can use the TranslateSequence.jar, which is part of this software.
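
The naming convention above can be illustrated with a minimal Python sketch. It is not the TranslateSequence.jar implementation. The function names `translate` and `six_frames` are hypothetical, the standard genetic code is assumed, and it is an assumption that `-0`, `-1` and `-2` denote offsets on the reverse complement.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


def six_frames(name, dna):
    """Return a dict mapping '<name>_frame<strand><offset>' to a protein string.

    Assumption: the minus frames are offsets 0-2 on the reverse complement.
    Stop codons are kept as '*'.
    """
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{name}_frame{strand}{offset}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for header, protein in six_frames("seq1", "ATGGCCATTGTAATGGGCCGC").items():
        print(f">{header}\n{protein}")
```

All six headers share the prefix `seq1` and differ only in the suffix, which is what the default split pattern `_frame` relies on.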

Installation

Just download NLR-Parser.jar from the latest release. Run it from the command line.

java -jar NLR-Parser.jar -i <mast.xml> -o <output.mast.txt> [-s <splitpattern>] [-p <pvalue>] [-b <blastfile>] [-gh] [-a <sequence>]

If you want to build it from source, you will need the Apache Commons CLI library.

Input parameters

| Parameter | Argument | Description |
| --- | --- | --- |
| -i | STR | Location of the XML output of MAST. |
| -o | STR | Location and name of the output file that will be generated by the NLR-Parser. Note that an existing file will be overwritten. |
| -s | STR | The split pattern used to combine 6-frame-translated nucleotide sequences into one output. Default: "_frame" |
| -p | float | P-value threshold. Motifs with a p-value above this threshold are ignored by the NLR-Parser. Default: 1E-5 |
| -a | STR | Location of an optional amino acid sequence file. This file should be the same as the one subjected to MAST. Providing this file allows extraction of the NB-ARC domain of the NLR, e.g. for phylogenetic studies. The file has to be in FASTA format. |
| -g | | Output GFF format instead of a TSV. |
| -h | | Print help. |
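
As a sketch of how the parameters above combine, the following Python helper assembles the argument list for a call. The helper name `build_command` and the file names are hypothetical. Only flags listed in the table are used.

```python
def build_command(mast_xml, output, split_pattern=None, pvalue=None,
                  aa_fasta=None, gff=False, jar="NLR-Parser.jar"):
    """Assemble the NLR-Parser call as an argument list (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "-i", mast_xml, "-o", output]
    if split_pattern is not None:
        cmd += ["-s", split_pattern]
    if pvalue is not None:
        cmd += ["-p", str(pvalue)]
    if aa_fasta is not None:
        cmd += ["-a", aa_fasta]
    if gff:
        cmd.append("-g")
    return cmd


if __name__ == "__main__":
    # File names are placeholders.
    print(build_command("mast.xml", "output.mast.txt",
                        pvalue="1E-5", aa_fasta="proteins.fasta"))
```

The list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string means no shell is involved, so an argument such as `$$` reaches the program unchanged.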

-s splitpattern

If a nucleotide sequence has to be annotated, it should be translated into its 6 reading frames. The NLR-Parser can assume that the names of the 6 amino acid sequences consist of a shared sequence name followed by the split pattern and a frame suffix (for example, a name ending in _frame+0). In that case it reports the combined result on one line, with the shared sequence name in the first column. Combining frames is useful when you annotate genomic sequence, where introns can cause an apparent "frameshift" within a single gene. It is highly unlikely that a sequence will have motifs on the forward strand and on the reverse strand at the same time.

This is of course a pitfall if your sequence of interest contains two NLRs on different strands. In those cases, please use the workaround -s $$, assuming that none of your identifiers contains a "$$".
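
The effect of the split pattern can be sketched in Python as follows. This illustrates the grouping idea only and is not the NLR-Parser's code. The function name `group_by_split_pattern` is hypothetical, and treating the pattern as a literal string (not a regular expression) is an assumption.

```python
from collections import defaultdict


def group_by_split_pattern(names, split_pattern="_frame"):
    """Group sequence names by the part before the split pattern.

    Assumption: the pattern is matched literally. Names that do not
    contain the pattern form their own group.
    """
    groups = defaultdict(list)
    for name in names:
        base = name.split(split_pattern, 1)[0]
        groups[base].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["contig1_frame+0", "contig1_frame-2", "contig2_frame+1"]
    # Default pattern: all frames of a contig are merged into one group.
    print(group_by_split_pattern(names))
    # A pattern absent from every identifier keeps each frame separate.
    print(group_by_split_pattern(names, "$$"))
```

With the default pattern, both frames of `contig1` end up in one group. With `$$`, every translated frame stays on its own, which is the behaviour the workaround relies on.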

-g

Generate a gff file rather than a tsv table with the NLR-Parser results. This option is under development. Feel free to try and send us comments.

-a aminoacidfile.fasta

One column of the NLR-Parser output is the amino acid sequence of the NB-ARC domain. This is usually the most conserved part of the NLR and can be used for phylogenetic studies. If you do not provide the complete amino acid sequences of the genes, this column is empty.

-p pvalue

This is the threshold for the p-values of the individual motifs. Motifs with a p-value above this threshold are ignored by the NLR-Parser. The default is 1E-5.
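
The thresholding rule can be written as a short Python sketch. The function name `filter_hits` and the representation of a motif hit as a `(motif_name, p_value)` tuple are assumptions made for illustration. The NLR-Parser's internal data structures may differ.

```python
def filter_hits(hits, threshold=1e-5):
    """Keep motif hits whose p-value is not above the threshold.

    `hits` is an iterable of (motif_name, p_value) tuples, a hypothetical
    representation chosen for this example.
    """
    return [hit for hit in hits if hit[1] <= threshold]


if __name__ == "__main__":
    hits = [("motif_1", 1e-8), ("motif_2", 1e-3), ("motif_3", 1e-5)]
    print(filter_hits(hits))
```

Lowering the threshold makes the annotation stricter, while raising it lets weaker motif matches contribute.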

Tips

  • MAST has an E-value threshold. Sequences with an E-value above it are not displayed. The E-value depends on the number of input sequences. If you run MAST on a really large file, add the parameter -ev 10000000 to your call.
  • If you want to annotate large files like genomes, it makes sense to chop them into overlapping fragments.
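
Chopping a long sequence into overlapping fragments, as suggested in the last tip, could look like the Python sketch below. The function name `overlapping_fragments` is hypothetical, and the fragment size and overlap are left to the user. In practice the overlap would presumably need to be longer than the genes you expect to find, so that each gene lies completely inside at least one fragment.

```python
def overlapping_fragments(sequence, size, overlap):
    """Yield (start, fragment) pairs covering `sequence`.

    Consecutive fragments share `overlap` characters; the last fragment
    may be shorter than `size`.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not sequence:
        return
    step = size - overlap
    start = 0
    while True:
        yield start, sequence[start:start + size]
        if start + size >= len(sequence):
            break
        start += step


if __name__ == "__main__":
    for start, fragment in overlapping_fragments("ABCDEFGHIJ", size=4, overlap=2):
        print(start, fragment)
```

Keeping the start coordinate with each fragment makes it possible to map annotations back to positions in the original sequence.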

Citation

Contact

If there are any issues with the tool or if you would like to collaborate with us, please don't hesitate to contact us.
