Genome Annotation for the Masses
wurmlab/afra
Genomes of emerging model organisms are now being sequenced at low cost. However, obtaining accurate gene predictions remains challenging. Even the best gene prediction algorithms make substantial errors, leading to further erroneous analysis. Therefore, many predicted genes need to be visually inspected and manually curated (Yandell & Ence); this can be infeasible when working with thousands of genes from multiple organisms.
Inspired by crowdsourcing approaches and platforms including Foldit, Galaxy Zoo and Crowdflower, we are developing Afra to recruit additional gene feature curators. This should help dramatically increase the quality of gene curations available for newly sequenced genomes. In the long term we aim to recruit contributors among members of the general public. However, gene curation requires large amounts of specialist knowledge and overcoming a steep learning curve. While we are working to reduce the steepness of the learning curve via interactive tutorials and support forums, genome curation is not yet easily accessible to all. Thus, in the first instance, we are recruiting curators among biology students. They perform curations as part of their courses, aiming to understand gene structure and/or the challenges of gene identification and gene prediction.
Users log in to their dashboard using their Facebook account. On the dashboard they are presented with documentation, guided tutorial exercises, and curation challenges which include "Curate" buttons. Each curation challenge invites the user to contribute towards a different curation project.
Clicking 'Curate' sends the user to a JBrowse-derived, WebApollo-like curation interface focusing on a single gene model and showing all available tracks of evidence for this gene model. The user starts by dragging one of these models (typically the consensus gene model) to the edit track and can then edit this gene model.
Users may refer to the tutorials or seek help on our forum using the 'Help & Support' link at the top. A simple step-by-step guide to curation is always available in a sidebar that folds to the right.
Afra imports a GFF file of predicted gene models and creates a prioritized list of "curation tasks" based on expected curation difficulty; the administrator can additionally prioritize specific genes for a specific curation project. Each gene prediction is presented to four independent users/curators. Each curator independently examines the gene model and may propose revisions or add comments (e.g., if there is insufficient evidence to curate).
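The task creation step described above can be illustrated with a minimal Python sketch. It is not Afra's actual implementation: the function names, the use of exon count as a stand-in for "expected curation difficulty", the choice to list fewer-exon models first, and the `boosted` parameter for administrator priorities are all assumptions made for illustration. Only the redundancy of four curators per gene prediction comes from the text.

```python
from collections import defaultdict

REDUNDANCY = 4  # from the text: each gene prediction goes to four curators


def parse_attributes(column):
    """Parse a GFF3 attribute column such as 'ID=e1;Parent=mRNA_A'."""
    pairs = (item.split("=", 1) for item in column.strip().split(";") if "=" in item)
    return dict(pairs)


def exon_counts(gff_text):
    """Count exon features per parent gene model (transcript ID)."""
    counts = defaultdict(int)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon rows feed the difficulty estimate
        parent = parse_attributes(fields[8]).get("Parent")
        if parent:
            counts[parent] += 1
    return dict(counts)


def build_tasks(gff_text, boosted=()):
    """Return curation tasks, highest priority first.

    Assumption: fewer exons means an easier model, and easier models
    come first. Models named in `boosted` (hypothetical administrator
    priorities) are moved ahead of everything else.
    """
    counts = exon_counts(gff_text)
    ordered = sorted(counts, key=lambda m: (m not in boosted, counts[m], m))
    return [{"model": m, "exons": counts[m], "slots": REDUNDANCY} for m in ordered]


# Made-up example data, not taken from any real genome.
SAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "chr1\tpred\tmRNA\t1\t900\t.\t+\t.\tID=mRNA_A;Parent=gene_A",
    "chr1\tpred\texon\t1\t100\t.\t+\t.\tID=a1;Parent=mRNA_A",
    "chr1\tpred\texon\t300\t400\t.\t+\t.\tID=a2;Parent=mRNA_A",
    "chr1\tpred\texon\t700\t900\t.\t+\t.\tID=a3;Parent=mRNA_A",
    "chr1\tpred\texon\t2000\t2500\t.\t-\t.\tID=b1;Parent=mRNA_B",
    "chr1\tpred\texon\t4000\t4100\t.\t+\t.\tID=c1;Parent=mRNA_C",
    "chr1\tpred\texon\t4300\t4500\t.\t+\t.\tID=c2;Parent=mRNA_C",
])

if __name__ == "__main__":
    for task in build_tasks(SAMPLE_GFF, boosted={"mRNA_A"}):
        print(task)
```

Each task carries a `slots` counter so that a scheduler could hand the same gene model to four different curators before retiring it; how Afra tracks this internally is not described here.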
For each gene prediction, submitted gene models are then automatically compared: if all users propose the same changes to a gene model, these changes are considered to be correct. If gene models proposed by different curators disagree, the different gene predictions are shown to several more experienced curators, who submit their curations in turn. If gene models proposed by the more experienced curators also disagree, all predictions are shown to an even more senior curator, who gives the final verdict.
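The comparison and escalation logic above might look roughly like the following Python sketch. The representation of a gene model (a strand plus a list of exon coordinates), the function names, and the decision to compare the resulting models rather than the individual edits are assumptions for illustration, not a description of Afra's code.

```python
def normalize(model):
    """Reduce a gene model to a hashable, order-independent form.

    Assumed model shape: {"strand": "+", "exons": [(start, end), ...]}.
    """
    return (model["strand"], tuple(sorted(tuple(exon) for exon in model["exons"])))


def consensus(models):
    """Return the shared model if every submission is identical, else None."""
    distinct = {normalize(m) for m in models}
    return next(iter(distinct)) if len(distinct) == 1 else None


def resolve(student_models, experienced_models, senior_model):
    """Walk the three review tiers and report which tier settled the model.

    In a real system the later tiers would only be asked to curate after
    an earlier tier disagreed; here their submissions are passed in up
    front to keep the sketch short.
    """
    agreed = consensus(student_models)
    if agreed is not None:
        return "students", agreed
    agreed = consensus(experienced_models)
    if agreed is not None:
        return "experienced", agreed
    return "senior", normalize(senior_model)


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    model_a = {"strand": "+", "exons": [(1, 100), (200, 300)]}
    model_b = {"strand": "+", "exons": [(1, 100), (220, 300)]}
    print(resolve([model_a, model_a, model_b, model_a], [model_b, model_b], model_a))
```

Normalizing before comparing means two curators who mark the same exons in a different order still count as agreeing.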
- Annotation editing.
- Prioritized redundant task distribution.
- Basic user dashboard.
- Simple, non-interactive tutorials.
- Obtain curations from eight QMUL MSc students.
- Obtain contributions from 20 undergraduate students.
- December 2014: Simple editor synchronization between two tabs/windows.
- December 2014: Improve annotation editing experience. Make it more intuitive.
- December 2014: Basic automated testing of annotation editing functionality.
Todos:
- Improve page load times.
- Genome dashboard (partially done): overview of contributions per genome, including how many curations have been made and how many pass the auto-check.
- Comments on curations.
- Extensive automated testing of annotation editing functionality.
- Improve annotation editing performance.
- Interactive tutorial.
- Roll out to 200 first-year students learning about gene structure ... and the inadequacies of bioinformatics algorithms.
We welcome contributions of code, curations, or documentation. Find us on Gitter to discuss how you could best help.
Our Wiki details how to set up a development environment using Docker.
Please email us if you:
- would like a demo
- would like to use Afra in your institution to help teach students
- have any other questions
Afra is Copyright (©) 2013 Queen Mary, University of London.
Parts of Afra are a derivative work of JBrowse and WebApollo, which are respectively copyright (c) 2000-2006 The Perl Foundation and copyright (c) 2010 Regents of the University of California.