Genome Biol Evol. 2025 Feb 3;17(2):evaf013.
doi: 10.1093/gbe/evaf013.

Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures

Erik S Wright. Genome Biol Evol.

Abstract

Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabled large-scale homology inference beyond the limits of accurate sequence alignment. In this regime, it is possible to observe nearly identical protein structures lacking detectable sequence similarity. In the absence of a robust statistical framework for structure comparison, it is largely assumed that similar structures are homologous. However, it is conceivable that matching structures could arise through convergent evolution, resulting in analogous proteins without shared ancestry. Large databases of predicted structures offer a means of determining whether analogs are present among structure matches. Here, I find that a small subset (∼2.6%) of Foldseek clusters lacks sequence-level support for homology, including ∼1% of strong structure matches with template modeling score (TM-score) ≥ 0.5. This result by itself does not imply these structure pairs are nonhomologous, since their sequences could have diverged beyond the limits of recognition. Yet, strong matches without sequence-level support for homology are enriched in structures with predicted repeats that could induce spurious matches. Some of these structural repeats are underpinned by sequence-level tandem repeats in both matching structures. I show that many of these tandem repeat units have genealogies inconsistent with their corresponding structures sharing a common ancestor, implying these highly similar structure pairs are analogous rather than homologous. This result suggests caution is warranted when inferring homology from structural resemblance alone in the absence of sequence-level support for homology.

Keywords: TM-score; analogy; homology; protein structure search.

© The Author(s) 2025. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.

Figures

Fig. 1.
Discerning homology from analogy in structure matches. a) Birds share homologous beaks that descended from a common ancestor through divergence. In contrast, parrots and parrot fish have analogous beak morphology as a result of convergent evolution. b) Genotypes can be used to determine whether similar traits are the result of shared ancestry (homology) or resulted from convergent evolution (analogy). Asking whether structures share a common ancestor is equivalent to testing if the time to their most recent common ancestor (tMRCA) is less than infinity. To this end, a structural match's substitution score is compared to that of bootstrapped sequence alignments drawn from the match's background sequence distribution. Sequences with support for homology will have substitution scores greater than the random expectation. This approach can distinguish homology from nonhomology but not analogy from nonhomology. c) Tandem repeat units must exist before they diverge in order for the repeat units to be homologous. The process of descent from a common ancestor is expected to result in intermixed repeats on a phylogenetic tree created from the alignment of repeat units. However, tandem repeat units underlying analogous proteins are expected to segregate into separate clades on a phylogenetic tree. Therefore, the support for the branch partitioning the two clades on the phylogenetic tree is a measure of analogy.
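The comparison described in panel b can be illustrated with a minimal Python sketch. It is not the paper's implementation: the match/mismatch scoring, the resampling of each sequence from its own residue composition, and the function names `alignment_score` and `homology_support` are all assumptions made for illustration. The idea shown is only that an observed substitution score is ranked against scores of resampled sequence pairs.

```python
import random


def alignment_score(a, b, match=1, mismatch=-1):
    """Score a gap-free pairwise alignment column by column.

    The +1/-1 scheme is a stand-in (assumption) for a real substitution matrix.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def homology_support(a, b, n_boot=1000, seed=0):
    """Fraction of resampled alignments scoring below the observed alignment.

    Each replicate redraws every position, with replacement, from the residue
    composition of the same sequence (an assumed background distribution).
    """
    rng = random.Random(seed)
    observed = alignment_score(a, b)
    below = 0
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        if alignment_score(resampled_a, resampled_b) < observed:
            below += 1
    return below / n_boot


if __name__ == "__main__":
    seq = "ACDEFGHIKLMNPQRSTVWY" * 2
    print(homology_support(seq, seq))
```

Under this sketch, a value near 1 means the observed score exceeds nearly all resampled scores. As the caption notes, a low value indicates only a lack of support for homology; it does not by itself distinguish analogy from nonhomology.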
Fig. 2.
Some strong structure matches lack support for homology. Points represent Foldseek structure matches between high confidence AlphaFold Database structures. Average support for homology increased with structural similarity (TM-score). However, 1% of strong structure matches (TM-score ≥ 0.5) lacked support for homology (i.e. points below 0.99 to the right of the vertical dashed line). This subset was depleted of multidomain proteins and enriched in structural repeats. The curve represents a logistic regression fit and 95% confidence intervals (shaded area). Red points represent the presence of a structural repeat with high repeat unit similarity (average repeat unit TM-score ≥ 0.5).
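The subset highlighted in this figure (points to the right of the TM-score cutoff and below the support cutoff) could be selected as in the sketch below. The two thresholds come from the caption; the `Match` record, its field names, and the helper functions are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

TM_STRONG = 0.5        # TM-score cutoff for a "strong" match (from the caption)
SUPPORT_CUTOFF = 0.99  # support-for-homology cutoff (from the caption)


@dataclass(frozen=True)
class Match:
    """One structure match; the field names are illustrative assumptions."""
    query: str
    target: str
    tm_score: float
    support: float


def strong_without_support(matches):
    """Return strong matches (TM-score >= 0.5) whose support is below 0.99."""
    return [
        m for m in matches
        if m.tm_score >= TM_STRONG and m.support < SUPPORT_CUTOFF
    ]


def fraction_unsupported(matches):
    """Fraction of strong matches that lack support for homology."""
    strong = [m for m in matches if m.tm_score >= TM_STRONG]
    if not strong:
        return 0.0
    return len(strong_without_support(matches)) / len(strong)
```

Applied to a full table of matches, `fraction_unsupported` corresponds to the quantity the caption reports as 1%.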
Fig. 3.
Representative structural repeat proteins. AlphaFold Database structures clustered by Foldseek were investigated for matches containing structural repeats with high TM-scores (≥0.5) but low support for homology (<0.99). Matches were filtered to the subset of 90 structures with detectable structural repeats and sequence-level tandem repeats. Selected structural alignments from different repeat types are shown. Pairs of structures with the highest TM-scores tended to be β-solenoids or α-helical coils. However, more complex repeat structures were also observed. UniProt accessions are listed in the color corresponding to each structure.
Fig. 4.
Strong structure matches likely due to analogy. The repeat units underlying a subset of 30 matching structures with very high structural similarity (TM-score ≥ 0.78) underwent structure-guided alignment. All balanced minimum evolution trees constructed from the repeat unit alignments had strong bootstrap support for topologies consistent with the repeat units arising after splitting from a hypothetical common ancestor. This tree topology implies the structures are analogous, because the repeats did not exist in the same ancestor. Some tandem repeats had different periodicity between structures, further indicating the sequences are nonhomologous. Trees are mid-point rooted with bootstrap support listed above the root. UniProt accessions are listed in the color corresponding to the leaves. Repeat unit alignments are depicted as a set of boxes colored by amino acid, with gaps (i.e. “–”) in black.
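The topology test behind this figure can be sketched as a check for a branch that separates the repeat units of one protein from those of the other. The sketch below is an illustration under stated assumptions, not the paper's pipeline: trees are written as nested tuples, leaves are named `<protein>_<unit>`, and the functions `segregates` and `partition_support` are hypothetical. Tree construction and bootstrapping are assumed to have been done elsewhere.

```python
def leaf_labels(tree):
    """Return the leaf names under a nested-tuple tree (a str is a leaf)."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree:
        leaves.extend(leaf_labels(child))
    return leaves


def clades(tree):
    """Yield the leaf set of every subtree, including single leaves."""
    yield frozenset(leaf_labels(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def protein_of(leaf):
    """Protein label of a leaf named '<protein>_<unit>' (assumed convention)."""
    return leaf.split("_")[0]


def segregates(tree):
    """True if some branch splits the two proteins' repeat units apart.

    A branch separating the proteins exists exactly when one subtree holds
    all repeat units of one protein and no units of the other.
    """
    leaves = leaf_labels(tree)
    proteins = {protein_of(leaf) for leaf in leaves}
    if len(proteins) != 2:
        raise ValueError("expected repeat units from exactly two proteins")
    targets = {
        frozenset(leaf for leaf in leaves if protein_of(leaf) == p)
        for p in proteins
    }
    return any(clade in targets for clade in clades(tree))


def partition_support(bootstrap_trees):
    """Fraction of bootstrap trees in which the two proteins segregate."""
    trees = list(bootstrap_trees)
    return sum(segregates(t) for t in trees) / len(trees)
```

A tree such as `((("A_1", "A_2"), "A_3"), (("B_1", "B_2"), "B_3"))` segregates, which is the pattern the caption interprets as analogy, whereas a tree that pairs units across proteins, such as `((("A_1", "B_1"), ("A_2", "B_2")), ("A_3", "B_3"))`, does not.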