RNA foundation models (FMs) have been extensively used to interpret genomic sequences and address a wide range of in-silico genomic tasks. However, current RNA FMs often omit secondary structures from pretraining, which limits their effectiveness in various genomic tasks. To address this problem, we leverage filtered high-fidelity structure annotations for structure pretraining to enhance the modeling ability of FMs in single-nucleotide resolution tasks. Experimental evaluations across four comprehensive genomic benchmarks demonstrate that our RNA FM consistently outperforms existing RNA FMs, achieving a 40% improvement in RNA secondary structure prediction and obtaining top-tier results on DNA genomic benchmarks even though it has not been pretrained on any DNA genome. We release the code and models to encourage further research to bridge the gap between in-silico predictions and biological reality.
@inproceedings{yang-li-2024-mp,
    title = "{MP}-{RNA}: Unleashing Multi-species {RNA} Foundation Model via Calibrated Secondary Structure Prediction",
    author = "Yang, Heng and Li, Ke",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.304/",
    doi = "10.18653/v1/2024.findings-emnlp.304",
    pages = "5278--5296",
    abstract = "RNA foundation models (FMs) have been extensively used to interpret genomic sequences and address a wide range of in-silico genomic tasks. However, current RNA FMs often overlook the incorporation of secondary structures in the pretraining of FMs, which impedes the effectiveness in various genomic tasks. To address this problem, we leverage filtered high-fidelity structure annotations for structure pretraining to enhance the modeling ability of FMs in single nucleotide resolution tasks. Experimental evaluations across four comprehensive genomic benchmarks demonstrate that our RNA FM consistently outperforms existing RNA FMs, achieving a 40{\%} improvement in RNA secondary structure prediction and obtaining top-tier results on DNA genomic benchmarks even though it has not been pretrained on any DNA genome. We release the code and models to encourage further research to bridge the gap between in-silico predictions and biological reality."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="yang-li-2024-mp">
    <titleInfo>
        <title>MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Heng</namePart>
        <namePart type="family">Yang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ke</namePart>
        <namePart type="family">Li</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2024-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Findings of the Association for Computational Linguistics: EMNLP 2024</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Yaser</namePart>
            <namePart type="family">Al-Onaizan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mohit</namePart>
            <namePart type="family">Bansal</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yun-Nung</namePart>
            <namePart type="family">Chen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Miami, Florida, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>RNA foundation models (FMs) have been extensively used to interpret genomic sequences and address a wide range of in-silico genomic tasks. However, current RNA FMs often overlook the incorporation of secondary structures in the pretraining of FMs, which impedes the effectiveness in various genomic tasks. To address this problem, we leverage filtered high-fidelity structure annotations for structure pretraining to enhance the modeling ability of FMs in single nucleotide resolution tasks. Experimental evaluations across four comprehensive genomic benchmarks demonstrate that our RNA FM consistently outperforms existing RNA FMs, achieving a 40% improvement in RNA secondary structure prediction and obtaining top-tier results on DNA genomic benchmarks even though it has not been pretrained on any DNA genome. We release the code and models to encourage further research to bridge the gap between in-silico predictions and biological reality.</abstract>
    <identifier type="citekey">yang-li-2024-mp</identifier>
    <identifier type="doi">10.18653/v1/2024.findings-emnlp.304</identifier>
    <location>
        <url>https://aclanthology.org/2024.findings-emnlp.304/</url>
    </location>
    <part>
        <date>2024-11</date>
        <extent unit="page">
            <start>5278</start>
            <end>5296</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction
%A Yang, Heng
%A Li, Ke
%Y Al-Onaizan, Yaser
%Y Bansal, Mohit
%Y Chen, Yun-Nung
%S Findings of the Association for Computational Linguistics: EMNLP 2024
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F yang-li-2024-mp
%X RNA foundation models (FMs) have been extensively used to interpret genomic sequences and address a wide range of in-silico genomic tasks. However, current RNA FMs often overlook the incorporation of secondary structures in the pretraining of FMs, which impedes the effectiveness in various genomic tasks. To address this problem, we leverage filtered high-fidelity structure annotations for structure pretraining to enhance the modeling ability of FMs in single nucleotide resolution tasks. Experimental evaluations across four comprehensive genomic benchmarks demonstrate that our RNA FM consistently outperforms existing RNA FMs, achieving a 40% improvement in RNA secondary structure prediction and obtaining top-tier results on DNA genomic benchmarks even though it has not been pretrained on any DNA genome. We release the code and models to encourage further research to bridge the gap between in-silico predictions and biological reality.
%R 10.18653/v1/2024.findings-emnlp.304
%U https://aclanthology.org/2024.findings-emnlp.304/
%U https://doi.org/10.18653/v1/2024.findings-emnlp.304
%P 5278-5296
[MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction](https://aclanthology.org/2024.findings-emnlp.304/) (Yang & Li, Findings 2024)