sinc-lab/Comparison-of-Protein-learning
This repository contains the data and code used in the review of protein sequence embeddings entitled "Transfer learning in proteomics: comparison of novel learned representations for protein sequences," by E. Fenoy, A. Edera and G. Stegmayer (under review). Research Institute for Signals, Systems and Computational Intelligence, sinc(i).
In the figure above, points depict 2D non-linear projections calculated from the 12 protein sequence embeddings studied. Orange points highlight protein sequences having the Immunoglobulin C1-set domain (PF07654).
The figures above show the performance of the 12 embeddings used for predicting the GO terms annotating protein sequences. Performance is measured with the F1 score and predictions are grouped according to the three sub-ontologies of the GO terms: Biological Process (BP), Cellular Component (CC) and Molecular Function (MF).
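As a rough illustration of how an F1 score can be reported per sub-ontology, the sketch below counts true positives, false positives and false negatives for each group of GO terms and derives a micro-averaged F1 from them. The averaging scheme, the function name `f1_by_subontology`, and the toy protein and term identifiers are all assumptions made for this example; the review may compute its scores differently.

```python
from collections import defaultdict


def f1_by_subontology(true_terms, predicted_terms, term_to_subontology):
    """Micro-averaged F1 per sub-ontology (hypothetical helper).

    true_terms / predicted_terms: dict mapping protein id -> set of term ids.
    term_to_subontology: dict mapping term id -> "BP", "CC" or "MF".
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for protein in set(true_terms) | set(predicted_terms):
        truth = true_terms.get(protein, set())
        predicted = predicted_terms.get(protein, set())
        for term in truth | predicted:
            group = counts[term_to_subontology[term]]
            if term in truth and term in predicted:
                group["tp"] += 1
            elif term in predicted:
                group["fp"] += 1
            else:
                group["fn"] += 1

    scores = {}
    for name, c in counts.items():
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when nothing was counted.
        denominator = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[name] = 2 * c["tp"] / denominator if denominator else 0.0
    return scores


# Toy data: placeholder identifiers, not real GO terms or proteins.
term_to_subontology = {"bp_a": "BP", "bp_b": "BP", "cc_a": "CC", "mf_a": "MF"}
true_terms = {"P1": {"bp_a", "cc_a"}, "P2": {"bp_b", "mf_a"}}
predicted_terms = {"P1": {"bp_a", "bp_b", "cc_a"}, "P2": {"mf_a"}}

scores = f1_by_subontology(true_terms, predicted_terms, term_to_subontology)
for name in sorted(scores):
    print(name, round(scores[name], 3))
```

Grouping the counts before computing F1 keeps each sub-ontology's score independent of how many terms the other two contain.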
Recently, representation learning techniques have been proposed for encoding different types of protein information (sequence, domains, interactions, etc.) as low-dimensional vectors. In this review, we performed a detailed experimental comparison of several protein sequence embeddings on several bioinformatics tasks:
- determining similarities between proteins in the projected embedding space.
- inferring protein domains.
- predicting GO ontology-based protein functions.
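
To make the first task more concrete, here is a minimal sketch of comparing proteins by the similarity of their embedding vectors. It assumes cosine similarity as the measure and uses made-up three-dimensional vectors; the function names, the protein identifiers and the choice of measure are illustrative assumptions, not the procedure used in the review.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def nearest_neighbours(query_id, embeddings, k=1):
    """Return the ids of the k proteins most similar to `query_id`."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), protein_id)
        for protein_id, vector in embeddings.items()
        if protein_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [protein_id for _, protein_id in scored[:k]]


# Toy embeddings: hypothetical ids and values, far smaller than real ones.
toy_embeddings = {
    "prot_a": [1.0, 0.0, 0.0],
    "prot_b": [0.9, 0.1, 0.0],
    "prot_c": [0.0, 1.0, 0.0],
}

print(nearest_neighbours("prot_a", toy_embeddings, k=1))
```

The same neighbour lookup could serve as a simple baseline for the other two tasks, by transferring the domain or GO annotations of the closest annotated protein.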
This notebook reproduces the visual comparative analysis of the 12 embeddings, which evaluates how well protein sequence embeddings capture protein domain information.
The review used 9,479 human protein sequences to build embeddings with 12 embedding methods.
Note: Click the method name below to download the embeddings used in this review.
Embedding | Dimensionality | Reference |
---|---|---|
 | 512 | |
 | 8,192 | |
 | 1,280 | |
 | 64 | |
 | 1,024 | |
 | 1,024 | |
 | 300 | |
 | 50 | |
 | 100 | |
 | 1,024 | |
 | 768 | |
 | 1,900 | |