Workflow for identifying and classifying homologous gene/protein sequences
laelbarlow/amoebae
Analysis of MOlecular Evolution with BAtch Entry (AMOEBAE) is a customizable bioinformatics workflow for identifying homologues (and potential orthologues) of genes of interest among a mid-size sampling of genomes. This workflow is designed to be run on high-performance computing (HPC) clusters and is executed via the SnakeMake workflow management system. Code for steps in this workflow is written primarily in Python3, relies heavily on the Biopython library, and applies bioinformatics packages including BLAST+ and HMMER3 to input data files. AMOEBAE is open-source, and all dependencies are freely available. Lael D. Barlow is the author.
AMOEBAE is useful for certain mid-scale comparative genomics studies that might otherwise require time-intensive and repetitive manual/visual manipulation of data. Web services such as those provided by NCBI and EMBL-EBI provide a means to readily investigate the evolution of one or a few genes via similarity searching, and large-scale analysis workflows such as OrthoMCL and OrthoFinder attempt to rapidly perform orthology prediction for all genes among several genomes. AMOEBAE addresses analyses which are too cumbersome to be performed via web services or simple scripts and yet require a level of detail and flexibility not offered by large-scale analysis workflows. AMOEBAE is useful for analyzing the distribution of homologues of up to approximately 30 genes/proteins among a sampling of no more than approximately 100 eukaryotic genomes, especially when follow-up with custom phylogenetic analysis is planned.
AMOEBAE serves this purpose by providing several unique features. The core functionality of AMOEBAE is to run sequence similarity searches with multiple algorithms, multiple queries, and multiple databases simultaneously and to allow highly customizable implementation of reciprocal-best-hit search strategies. The output includes detailed summaries of results in the form of a spreadsheet and presence/absence plots. A particular advantage of AMOEBAE compared to other workflows is its functionality for parsing the results of TBLASTN searches (which search nucleotide sequences with peptide sequence queries). This allows rapid identification of High-scoring Segment Pair (HSP) clusters at separate gene loci, automatic checking of those loci against information in genome annotation files, and systematic use of the Exonerate package where possible for obtaining exon predictions. In addition, AMOEBAE provides many options which can be tailored to the specific genes/proteins being analyzed. Despite the complexity of this workflow, analyses performed using AMOEBAE can be reproduced via SnakeMake.
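As a rough illustration of the reciprocal-best-hit idea mentioned above, here is a minimal Python sketch. It is not AMOEBAE's own code: the function names, the dictionary layout, and the sequence identifiers are all hypothetical, and it assumes that search results have already been reduced to a bit score per query/subject pair.

```python
from typing import Dict, Optional

# Hypothetical layout (an assumption, not AMOEBAE's format):
# query ID -> {subject ID: bit score}
Hits = Dict[str, Dict[str, float]]


def best_hit(hits_for_query: Dict[str, float]) -> Optional[str]:
    """Return the subject ID with the highest score, or None if there are no hits."""
    if not hits_for_query:
        return None
    return max(hits_for_query, key=hits_for_query.get)


def reciprocal_best_hits(forward: Hits, reverse: Hits) -> Dict[str, str]:
    """Keep a query/subject pair only when each is the other's top-scoring hit."""
    pairs = {}
    for query, hits in forward.items():
        subject = best_hit(hits)
        if subject is None:
            continue
        # Reverse search: the subject sequence is searched back against the queries.
        if best_hit(reverse.get(subject, {})) == query:
            pairs[query] = subject
    return pairs


# Made-up example scores for two queries and two proteins from one genome.
forward = {
    "queryA": {"genome1_p1": 250.0, "genome1_p2": 80.0},
    "queryB": {"genome1_p2": 120.0},
}
reverse = {
    "genome1_p1": {"queryA": 240.0, "queryB": 30.0},
    "genome1_p2": {"queryA": 90.0, "queryB": 85.0},
}

rbh = reciprocal_best_hits(forward, reverse)
print(rbh)
```

In this made-up example only `queryA` is paired, because the best reverse hit of `genome1_p2` is `queryA` rather than `queryB`. A real analysis would typically add further criteria, such as E-value thresholds, which this sketch leaves out.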
The output files include a plot of the number of identified homologues (potential orthologues) of several genes across several genomes, as well as a spreadsheet in CSV format providing a detailed summary of search results.
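To show how a CSV summary of this kind relates to per-genome counts, the following Python sketch tallies accepted hits per query and genome. The column names (`query_title`, `genome`, `hit_id`, `accepted`) and the example rows are assumptions made for illustration; the actual AMOEBAE spreadsheet has its own columns, described in the workflow protocol.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content; the column names are assumptions, not AMOEBAE's real headers.
EXAMPLE_CSV = """query_title,genome,hit_id,accepted
gene1,GenomeA,a1,yes
gene1,GenomeA,a2,yes
gene1,GenomeB,b1,no
gene2,GenomeB,b2,yes
"""


def count_homologues(csv_text: str) -> Counter:
    """Count accepted hits for each (query, genome) combination."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["accepted"] == "yes":
            counts[(row["query_title"], row["genome"])] += 1
    return counts


counts = count_homologues(EXAMPLE_CSV)

# Print a small count table; combinations with no accepted hit report 0.
for query in ("gene1", "gene2"):
    for genome in ("GenomeA", "GenomeB"):
        print(query, genome, counts[(query, genome)])
```

A count of zero corresponds to an absence in a presence/absence plot, and any count above zero corresponds to a presence.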
Here's a diagram of the steps in the overall workflow:
Here's an example Coulson plot output by the workflow:
See the workflow protocol for instructions and guidelines for running the AMOEBAE workflow.
Please use the issue tracker on the GitHub webpage to report any problems you encounter while using AMOEBAE.
Please cite the AMOEBAE GitHub repository (or alternative permanent repositories if relevant).
AMOEBAE was initially developed at the Dacks Laboratory at the University of Alberta, and was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants RES0021028, RES0043758, and RES0046091 awarded to Joel B. Dacks, as well as an NSERC Postgraduate Scholarship-Doctoral awarded to Lael D. Barlow.
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
Cette recherche a été financée par le Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG).
Also, help with testing AMOEBAE has been kindly provided by numerous members of the Dacks laboratory.
Copyright 2018-2024 Lael D. Barlow
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.