Protein superfamily

From Wikipedia, the free encyclopedia

Grouping of proteins

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment^[1] and mechanistic similarity, even if no sequence similarity is evident.^[2] Sequence homology can then be deduced even if not apparent (due to low sequence similarity). Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies based on the MEROPS and CAZy classification systems.^[2]^[3]

Identification

Above, secondary structural conservation of 80 members of the PA protease clan (superfamily). H indicates α-helix, E indicates β-sheet, L indicates loop. Below, sequence conservation for the same alignment. Arrows indicate catalytic triad residues. Aligned on the basis of structure by DALI.

Superfamilies of proteins are identified using a number of methods. Closely related members can be identified by different methods to those needed to group the most evolutionarily divergent members.

Sequence similarity

A sequence alignment of mammalian histone proteins. The similarity of the sequences implies that they evolved by gene duplication. Residues that are conserved across all sequences are highlighted in grey. Below the protein sequences is a key denoting:^[4]
* conserved sequence,
: conservative mutations,
. semi-conservative mutations, and
␣ (blank) non-conservative mutations.

Main article: Sequence homology

Historically, the similarity of different amino acid sequences has been the most common method of inferring homology.^[5] Sequence similarity is considered a good predictor of relatedness, since similar sequences are more likely the result of gene duplication and divergent evolution, rather than the result of convergent evolution. Amino acid sequence is typically more conserved than DNA sequence (due to the degenerate genetic code), so it is a more sensitive detection method. Since some of the amino acids have similar properties (e.g., charge, hydrophobicity, size), conservative mutations that interchange them are often neutral to function. The most conserved sequence regions of a protein often correspond to functionally important regions like catalytic sites and binding sites, since these regions are less tolerant to sequence changes.
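The distinction between identical and conservatively substituted residues can be made concrete with a small sketch. The property groups below are a simplified assumption chosen for illustration, not the groups used by any particular alignment program, and the two aligned sequences are invented.

```python
# Simplified physicochemical groups (an illustrative assumption only).
GROUPS = [
    set("DE"),     # acidic
    set("KRH"),    # basic
    set("ST"),     # small, hydroxyl-bearing
    set("NQ"),     # amide
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FYW"),    # aromatic
]


def is_conservative(a, b):
    """True if two different residues fall within the same property group."""
    return a != b and any(a in g and b in g for g in GROUPS)


def identity_and_similarity(seq1, seq2):
    """Percent identity and percent similarity over aligned, gap-free columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = identical + sum(is_conservative(a, b) for a, b in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


# Invented toy alignment: "-" marks a gap.
ident, sim = identity_and_similarity("MKV-LDE", "MRVALEE")
print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

Here the K/R and D/E columns count towards similarity but not identity, which is why similarity scores usually exceed identity scores for related proteins.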

Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many insertions and deletions can also be difficult to align, which makes it hard to identify the homologous sequence regions. In the PA clan of proteases, for example, not a single residue is conserved through the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.

Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since the number of known sequences vastly outnumbers the number of known tertiary structures.^[6] In the absence of structural information, sequence similarity constrains the limits of which proteins can be assigned to a superfamily.^[6]
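The contrast between a family (defined by its sequence alignment) and a whole superfamily (where no residue need be invariant) can be illustrated with a minimal sketch. The sequences are invented toy fragments, not a real alignment of PA clan members, and `invariant_columns` is a hypothetical helper name.

```python
def invariant_columns(alignment):
    """Return indices of gap-free columns in which all sequences agree."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if "-" not in column and len(set(column)) == 1
    ]


# Toy data: three closely related "family" members and one divergent member.
family = ["GDSGGP", "GDSGGA", "GDSGGP"]
superfamily = family + ["AECTNV"]

within_family = invariant_columns(family)
across_superfamily = invariant_columns(superfamily)
print(within_family, across_superfamily)
```

In this toy case the family retains several invariant columns, while adding a single divergent member leaves none, which is the situation where structural or mechanistic evidence is needed instead.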

Structural similarity

Structural homology in the PA superfamily (PA clan). The double β-barrel that characterises the superfamily is highlighted in red. Shown are representative structures from several families within the PA superfamily. Note that some proteins show partially modified structures. The structures are chymotrypsin (1gg6), tobacco etch virus protease (1lvm), calicivirin (1wqs), West Nile virus protease (1fp7), exfoliatin toxin (1exf), HtrA protease (1l1j), snake venom plasminogen activator (1bqy), chloroplast protease (4fln) and equine arteritis virus protease (1mbm).

Main article: Structural alignment

Structure is much more evolutionarily conserved than sequence, such that proteins with highly similar structures can have entirely different sequences.^[7] Over very long evolutionary timescales, very few residues show detectable amino acid sequence conservation; however, secondary structural elements and tertiary structural motifs are highly conserved. Some protein dynamics^[8] and conformational changes of the protein structure may also be conserved, as is seen in the serpin superfamily.^[9] Consequently, protein tertiary structure can be used to detect homology between proteins even when no evidence of relatedness remains in their sequences. Structural alignment programs, such as DALI, use the 3D structure of a protein of interest to find proteins with similar folds.^[10] However, on rare occasions, related proteins may evolve to be structurally dissimilar^[11] and relatedness can only be inferred by other methods.^[12]^[13]^[14]
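One way to compare two structures without first superimposing them is to compare their internal distance matrices, since these are unchanged by rotation and translation. The sketch below shows only that idea; it is not DALI's actual scoring or alignment procedure, it assumes the residue equivalences are already known (residue i of one structure matches residue i of the other), and the coordinates are invented.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_matrix_difference(coords_a, coords_b):
    """Mean absolute difference between two intramolecular distance matrices."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    n = len(coords_a)
    if n < 2:
        return 0.0
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


# Invented toy coordinates for four residues.
fold = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
# The same shape rotated 90 degrees about the z axis and translated.
moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in fold]
# A genuinely different shape: stretched along x.
stretched = [(2 * x, y, z) for x, y, z in fold]

same_shape = distance_matrix_difference(fold, moved)
different_shape = distance_matrix_difference(fold, stretched)
print(same_shape, different_shape)
```

The rotated copy scores (numerically) zero while the stretched copy does not, so the measure reflects shape rather than orientation. Real structural alignment must additionally discover which residues are equivalent.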

Mechanistic similarity

Main article: Enzyme mechanism

The catalytic mechanism of enzymes within a superfamily is commonly conserved, although substrate specificity may be significantly different.^[15] Catalytic residues also tend to occur in the same order in the protein sequence.^[16] For the families within the PA clan of proteases, although there has been divergent evolution of the catalytic triad residues used to perform catalysis, all members use a similar mechanism to perform covalent, nucleophilic catalysis on proteins, peptides or amino acids.^[17] However, mechanism alone is not sufficient to infer relatedness. Some catalytic mechanisms have evolved convergently multiple times, and so occur in separate superfamilies,^[18]^[19]^[20] and some superfamilies display a range of different (though often chemically similar) mechanisms.^[15]^[21]
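The point about residue order can be sketched in a few lines. The residue numbers used here are the commonly quoted catalytic triad positions for chymotrypsin and subtilisin; they are not taken from this article and are used only as illustrative input. `catalytic_order` is a hypothetical helper name.

```python
def catalytic_order(positions):
    """Catalytic residue types sorted by their position in the sequence."""
    return tuple(
        residue for residue, _ in sorted(positions.items(), key=lambda item: item[1])
    )


# Commonly quoted triad positions (illustrative input, not from the article).
chymotrypsin = {"His": 57, "Asp": 102, "Ser": 195}
subtilisin = {"Asp": 32, "His": 64, "Ser": 221}

order_chymotrypsin = catalytic_order(chymotrypsin)
order_subtilisin = catalytic_order(subtilisin)
print(order_chymotrypsin, order_subtilisin)
print("same order:", order_chymotrypsin == order_subtilisin)
```

Both enzymes use the same three residue types, yet the order along the chain differs. That is consistent with the mechanism having arisen independently rather than by descent, and it shows why a shared mechanism on its own does not establish membership of the same superfamily.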

Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry.^[22] They are the largest evolutionary grouping based on direct evidence that is currently possible. They therefore trace back to some of the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was in the last universal common ancestor of all life (LUCA).^[23]

Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).
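A common way to make the orthology/paralogy distinction operational is to ask what kind of event sits at the last common ancestor of two genes in a gene tree: a speciation (orthologs) or a duplication (paralogs). The sketch below assumes that convention, which is broader than the wording above because it also labels genes in different species as paralogs when a duplication separates them. The tree, gene names and node names are hypothetical.

```python
# Hypothetical gene tree: child -> parent, and the event at each internal node.
PARENT = {
    "human_A": "spec_A", "mouse_A": "spec_A",
    "human_B": "spec_B", "mouse_B": "spec_B",
    "spec_A": "dup_root", "spec_B": "dup_root",
}
EVENT = {"spec_A": "speciation", "spec_B": "speciation", "dup_root": "duplication"}


def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def relationship(gene1, gene2):
    """Classify two distinct genes by the event at their last common ancestor."""
    if gene1 == gene2:
        raise ValueError("need two different genes")
    seen = set(ancestors(gene1))
    lca = next(node for node in ancestors(gene2) if node in seen)
    return "orthologs" if EVENT[lca] == "speciation" else "paralogs"


print(relationship("human_A", "mouse_A"))  # separated by a speciation
print(relationship("human_A", "human_B"))  # separated by a duplication
```

Under this convention `human_A` and `mouse_B` are also classed as paralogs, because the duplication predates the speciation that separated the two species.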

Diversification

A majority of proteins contain multiple domains. Between 66 and 80% of eukaryotic proteins have multiple domains while about 40-60% of prokaryotic proteins have multiple domains.^[5] Over time, many of the superfamilies of domains have mixed together. In fact, it is very rare to find "consistently isolated superfamilies".^[5]^[1] When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations.^[5]
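The quantities mentioned above (the share of multidomain proteins, recurring domain architectures, and observed versus possible domain combinations) can be tallied with a short sketch. The proteins and their domain assignments are invented for illustration, and counting only adjacent ordered pairs is a simplifying assumption.

```python
from collections import Counter

# Hypothetical proteins, each an N- to C-terminal list of domain superfamilies.
proteins = {
    "p1": ["kinase"],
    "p2": ["SH3", "SH2", "kinase"],
    "p3": ["SH2", "kinase"],
    "p4": ["SH3", "SH2", "kinase"],
    "p5": ["Ig", "Ig"],
}


def multidomain_fraction(proteins):
    """Fraction of proteins containing more than one domain."""
    return sum(len(domains) > 1 for domains in proteins.values()) / len(proteins)


def architecture_counts(proteins):
    """How often each ordered domain architecture occurs."""
    return Counter(tuple(domains) for domains in proteins.values())


def pair_usage(proteins):
    """Observed adjacent ordered domain pairs versus all possible ordered pairs."""
    superfamilies = {d for domains in proteins.values() for d in domains}
    observed = {
        (a, b) for domains in proteins.values() for a, b in zip(domains, domains[1:])
    }
    return len(observed), len(superfamilies) ** 2


fraction = multidomain_fraction(proteins)
counts = architecture_counts(proteins)
observed, possible = pair_usage(proteins)
print(fraction, counts.most_common(1), f"{observed} of {possible} pairs observed")
```

Even in this toy set only a few of the possible ordered pairs occur, and the same architecture recurs in a fixed N- to C-terminal order.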

Examples

α/β hydrolase superfamily: Members share an α/β sheet, containing 8 strands connected by helices, with catalytic triad residues in the same order.^[24] Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases.^[25]
Alkaline phosphatase superfamily: Members share an αβα sandwich structure^[26] and perform common promiscuous reactions by a common mechanism.^[27]
Globin superfamily: Members share an 8-alpha helix globular globin fold.^[28]^[29]
Immunoglobulin superfamily: Members share a sandwich-like structure of two sheets of antiparallel β strands (Ig-fold), and are involved in recognition, binding, and adhesion.^[30]^[31]
LYRM superfamily: Members share a conserved LYR motif (leucine–tyrosine–arginine) embedded within a three α-helix structure and function as adaptor proteins essential for mitochondrial Fe–S cluster assembly and oxidative phosphorylation complex assembly.^[32]^[33]
PA clan: Members share a chymotrypsin-like double β-barrel fold and similar proteolysis mechanisms but sequence identity of <10%. The clan contains both cysteine and serine proteases (different nucleophiles).^[2]^[34]
Ras superfamily: Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices.^[35]
RSH superfamily: Members share the capability to hydrolyze and/or synthesize ppGpp alarmones in the stringent response.^[36]
Serpin superfamily: Members share a high-energy, stressed fold which can undergo a large conformational change, which is typically used to inhibit serine and cysteine proteases by disrupting their structure.^[9]
TIM barrel superfamily: Members share a large α₈β₈ barrel structure. It is one of the most common protein folds and the monophyly of this superfamily is still contested.^[37]^[38]

Protein superfamily resources

Several biological databases document protein superfamilies and protein folds, for example:

Pfam - Protein families database of alignments and HMMs
PROSITE - Database of protein domains, families and functional sites
PIRSF - SuperFamily Classification System
PASS2 - Protein Alignment as Structural Superfamilies v2
SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
SCOP and CATH - Classifications of protein structures into superfamilies, families and domains

Similarly, there are algorithms that search the PDB for proteins with structural homology to a target structure, for example:

DALI - Structural alignment based on a distance alignment matrix method

References

[1] Holm L, Rosenström P (July 2010). "Dali server: conservation mapping in 3D". Nucleic Acids Research. 38 (Web Server issue): W545–9. doi:10.1093/nar/gkq366. PMC 2896194. PMID 20457744.
[2] Rawlings ND, Barrett AJ, Bateman A (January 2012). "MEROPS: the database of proteolytic enzymes, their substrates and inhibitors". Nucleic Acids Research. 40 (Database issue): D343–50. doi:10.1093/nar/gkr987. PMC 3245014. PMID 22086950.
[3] Henrissat B, Bairoch A (June 1996). "Updating the sequence-based classification of glycosyl hydrolases". The Biochemical Journal. 316 (Pt 2): 695–6. doi:10.1042/bj3160695. PMC 1217404. PMID 8687420.
[4] "Clustal FAQ #Symbols". Clustal. Archived from the original on 24 October 2016. Retrieved 8 December 2014.
[5] Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J (April 2007). "The folding and evolution of multidomain proteins". Nature Reviews Molecular Cell Biology. 8 (4): 319–30. doi:10.1038/nrm2144. PMID 17356578. S2CID 13762291.
[6] Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS, Sowdhamini R, Srinivasan N (January 2002). "SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes". Nucleic Acids Research. 30 (1): 289–93. doi:10.1093/nar/30.1.289. PMC 99061. PMID 11752317.
[7] Orengo CA, Thornton JM (2005). "Protein families and their evolution-a structural perspective". Annual Review of Biochemistry. 74 (1): 867–900. doi:10.1146/annurev.biochem.74.082803.133029. PMID 15954844.
[8] Liu Y, Bahar I (September 2012). "Sequence evolution correlates with structural dynamics". Molecular Biology and Evolution. 29 (9): 2253–63. doi:10.1093/molbev/mss097. PMC 3424413. PMID 22427707.
[9] Silverman GA, Bird PI, Carrell RW, Church FC, Coughlin PB, Gettins PG, Irving JA, Lomas DA, Luke CJ, Moyer RW, Pemberton PA, Remold-O'Donnell E, Salvesen GS, Travis J, Whisstock JC (September 2001). "The serpins are an expanding superfamily of structurally similar but functionally diverse proteins. Evolution, mechanism of inhibition, novel functions, and a revised nomenclature". The Journal of Biological Chemistry. 276 (36): 33293–6. doi:10.1074/jbc.R100016200. PMID 11435447.
[10] Holm L, Laakso LM (July 2016). "Dali server update". Nucleic Acids Research. 44 (W1): W351–5. doi:10.1093/nar/gkw357. PMC 4987910. PMID 27131377.
[11] Pascual-García A, Abia D, Ortiz ÁR, Bastolla U (2009). "Cross-Over between Discrete and Continuous Protein Structure Space: Insights into Automatic Classification and Networks of Protein Structures". PLOS Computational Biology. 5 (3) e1000331. Bibcode:2009PLSCB...5E0331P. doi:10.1371/journal.pcbi.1000331. PMC 2654728. PMID 19325884.
[12] Li D, Zhang L, Yin H, Xu H, Satkoski Trask J, Smith DG, Li Y, Yang M, Zhu Q (June 2014). "Evolution of primate α and θ defensins revealed by analysis of genomes". Molecular Biology Reports. 41 (6): 3859–66. doi:10.1007/s11033-014-3253-z. PMID 24557891. S2CID 14936647.
[13] Krishna SS, Grishin NV (April 2005). "Structural drift: a possible path to protein fold change". Bioinformatics. 21 (8): 1308–10. doi:10.1093/bioinformatics/bti227. PMID 15604105.
[14] Bryan PN, Orban J (August 2010). "Proteins that switch folds". Current Opinion in Structural Biology. 20 (4): 482–8. doi:10.1016/j.sbi.2010.06.002. PMC 2928869. PMID 20591649.
[15] Dessailly, Benoit H.; Dawson, Natalie L.; Das, Sayoni; Orengo, Christine A. (2017), "Function Diversity within Folds and Superfamilies", From Protein Structure to Function with Bioinformatics, Springer Netherlands, pp. 295–325, doi:10.1007/978-94-024-1069-3_9, ISBN 978-94-024-1067-9
[16] Echave J, Spielman SJ, Wilke CO (February 2016). "Causes of evolutionary rate variation among protein sites". Nature Reviews. Genetics. 17 (2): 109–21. doi:10.1038/nrg.2015.18. PMC 4724262. PMID 26781812.
[17] Shafee T, Gatti-Lafranconi P, Minter R, Hollfelder F (September 2015). "Handicap-Recover Evolution Leads to a Chemically Versatile, Nucleophile-Permissive Protease". ChemBioChem. 16 (13): 1866–1869. doi:10.1002/cbic.201500295. PMC 4576821. PMID 26097079.
[18] Buller AR, Townsend CA (February 2013). "Intrinsic evolutionary constraints on protease structure, enzyme acylation, and the identity of the catalytic triad". Proceedings of the National Academy of Sciences of the United States of America. 110 (8): E653–61. Bibcode:2013PNAS..110E.653B. doi:10.1073/pnas.1221050110. PMC 3581919. PMID 23382230.
[19] Coutinho PM, Deleury E, Davies GJ, Henrissat B (April 2003). "An evolving hierarchical family classification for glycosyltransferases". Journal of Molecular Biology. 328 (2): 307–17. doi:10.1016/S0022-2836(03)00307-3. PMID 12691742.
[20] Zámocký M, Hofbauer S, Schaffner I, Gasselhuber B, Nicolussi A, Soudi M, Pirker KF, Furtmüller PG, Obinger C (May 2015). "Independent evolution of four heme peroxidase superfamilies". Archives of Biochemistry and Biophysics. 574: 108–19. doi:10.1016/j.abb.2014.12.025. PMC 4420034. PMID 25575902.
[21] Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E.; Barber, Alan E.; Custer, Ashley F.; Hicks, Michael A.; Huang, Conrad C.; Lauck, Florian; Mashiyama, Susan T. (2013-11-23). "The Structure–Function Linkage Database". Nucleic Acids Research. 42 (D1): D521–D530. doi:10.1093/nar/gkt1130. ISSN 0305-1048. PMC 3965090. PMID 24271399.
[22] Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E (March 2005). "Protein structure and evolutionary history determine sequence space topology". Genome Research. 15 (3): 385–92. arXiv:q-bio/0404040. doi:10.1101/gr.3133605. PMC 551565. PMID 15741509.
[23] Ranea JA, Sillero A, Thornton JM, Orengo CA (October 2006). "Protein superfamily evolution and the last universal common ancestor (LUCA)". Journal of Molecular Evolution. 63 (4): 513–25. Bibcode:2006JMolE..63..513R. doi:10.1007/s00239-005-0289-7. hdl:10261/78338. PMID 17021929. S2CID 25258028.
[24] Carr PD, Ollis DL (2009). "Alpha/beta hydrolase fold: an update". Protein and Peptide Letters. 16 (10): 1137–48. doi:10.2174/092986609789071298. PMID 19508187.
[25] Nardini M, Dijkstra BW (December 1999). "Alpha/beta hydrolase fold enzymes: the family keeps growing". Current Opinion in Structural Biology. 9 (6): 732–7. doi:10.1016/S0959-440X(99)00037-8. PMID 10607665.
[26] "SCOP". Archived from the original on 29 July 2014. Retrieved 28 May 2014.
[27] Mohamed MF, Hollfelder F (January 2013). "Efficient, crosswise catalytic promiscuity among enzymes that catalyze phosphoryl transfer". Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 1834 (1): 417–24. doi:10.1016/j.bbapap.2012.07.015. PMID 22885024.
[28] Branden C, Tooze J (1999). Introduction to protein structure (2nd ed.). New York: Garland Pub. ISBN 978-0-8153-2305-1.
[29] Bolognesi M, Onesti S, Gatti G, Coda A, Ascenzi P, Brunori M (February 1989). "Aplysia limacina myoglobin. Crystallographic analysis at 1.6 A resolution". Journal of Molecular Biology. 205 (3): 529–44. doi:10.1016/0022-2836(89)90224-6. PMID 2926816.
[30] Bork P, Holm L, Sander C (September 1994). "The immunoglobulin fold. Structural classification, sequence patterns and common core". Journal of Molecular Biology. 242 (4): 309–20. doi:10.1006/jmbi.1994.1582. PMID 7932691.
[31] Brümmendorf T, Rathjen FG (1995). "Cell adhesion molecules 1: immunoglobulin superfamily". Protein Profile. 2 (9): 963–1108. PMID 8574878.
[32] Angerer, Heike (2015-02-12). "Eukaryotic LYR Proteins Interact with Mitochondrial Protein Complexes". Biology. 4 (1): 133–150. doi:10.3390/biology4010133. ISSN 2079-7737. PMC 4381221. PMID 25686363.
[33] Dohnálek, Vít; Doležal, Pavel (May 2024). "Installation of LYRM proteins in early eukaryotes to regulate the metabolic capacity of the emerging mitochondrion". Open Biology. 14 (5). doi:10.1098/rsob.240021. ISSN 2046-2441. PMC 11293456. PMID 38772414.
[34] Bazan JF, Fletterick RJ (November 1988). "Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications". Proceedings of the National Academy of Sciences of the United States of America. 85 (21): 7872–6. Bibcode:1988PNAS...85.7872B. doi:10.1073/pnas.85.21.7872. PMC 282299. PMID 3186696.
[35] Vetter IR, Wittinghofer A (November 2001). "The guanine nucleotide-binding switch in three dimensions". Science. 294 (5545): 1299–304. Bibcode:2001Sci...294.1299V. doi:10.1126/science.1062023. PMID 11701921. S2CID 6636339.
[36] Atkinson, Gemma C.; Tenson, Tanel; Hauryliuk, Vasili (2011-08-09). "The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life". PLOS ONE. 6 (8) e23479. Bibcode:2011PLoSO...623479A. doi:10.1371/journal.pone.0023479. ISSN 1932-6203. PMC 3153485. PMID 21858139.
[37] Nagano N, Orengo CA, Thornton JM (August 2002). "One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions". Journal of Molecular Biology. 321 (5): 741–65. doi:10.1016/s0022-2836(02)00649-6. PMID 12206759.
[38] Farber G (1993). "An α/β-barrel full of evolutionary trouble". Current Opinion in Structural Biology. 3 (3): 409–412. doi:10.1016/S0959-440X(05)80114-9.

External links

Media related to Protein superfamilies at Wikimedia Commons
