PLoS Comput Biol. 2020 Oct 30;16(10):e1008387.

doi: 10.1371/journal.pcbi.1008387. eCollection 2020 Oct.

RNA structure prediction using positive and negative evolutionary information

Elena Rivas¹

PMID: 33125376
PMCID: PMC7657543
DOI: 10.1371/journal.pcbi.1008387

Elena Rivas. PLoS Comput Biol. 2020.

Affiliation

¹ Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA.

Abstract

Knowing the structure of conserved structural RNAs is important to elucidate their function and mechanism of action. However, predicting a conserved RNA structure remains unreliable, even when using a combination of thermodynamic stability and evolutionary covariation information. Here we present a method to predict a conserved RNA structure that combines the following three features. First, it uses significant covariation due to RNA structure and removes spurious covariation due to phylogeny. Second, it uses negative evolutionary information: basepairs that have variation but no significant covariation are prevented from occurring. Lastly, it uses a battery of probabilistic folding algorithms that incorporate all positive covariation into one structure. The method, named CaCoFold (Cascade variation/covariation Constrained Folding algorithm), predicts a nested structure guided by a maximal subset of positive basepairs, and recursively incorporates all remaining positive basepairs into alternative helices. The alternative helices can be compatible with the nested structure such as pseudoknots, or overlapping such as competing structures, base triplets, or other 3D non-antiparallel interactions. We present evidence that CaCoFold predictions are consistent with structures modeled from crystallography.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. The CaCoFold algorithm.**
(a) Toy alignment of five sequences. (b) The statistical analysis identifies five significantly covarying position pairs in the alignment (E-value < 0.05). Column pairs that significantly covary are marked with green arches; compensatory pairwise substitutions, including G:U pairs, are marked green relative to the consensus (black). (c) The maxCov algorithm requires two layers to explain all five covariations. In the first (C0) layer, three positive basepairs depicted in green are grouped together. In successive layers (C+), positive basepairs already taken into account (depicted in red) are excluded. (d) At each layer, a dynamic programming algorithm produces the most probable fold constrained by the assigned positive basepairs (green parentheses), to the exclusion of all negative basepairs and other positive basepairs (red arches). (This toy alignment does not include any negative basepairs.) Residues forming a red arch can pair to other bases. Basepairs that do not significantly covary are depicted by black parentheses. (e) The S+ alternative structures without positive basepairs that overlap in more than half of their residues with the S0 structure are removed. Alternative helices with positive basepairs are always kept. (f) The final consensus structure, combining the nested S0 structure with the filtered alternative helices from all other layers, is displayed automatically using a modified version of the program R2R. Positive basepairs are depicted in green.
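The filtering rule in panel (e) can be illustrated with a minimal Python sketch. It assumes that each alternative helix is a list of (i, j) position pairs and that the rule is applied helix by helix; the function name `filter_alternative_helices` and the data layout are hypothetical and are not taken from the R-scape implementation.

```python
def filter_alternative_helices(helices, s0_pairs, positive_pairs):
    """Keep alternative helices according to the rule in Fig 1(e).

    Assumed inputs (illustrative only):
      helices        -- list of helices, each a list of (i, j) basepairs
      s0_pairs       -- basepairs of the nested S0 structure
      positive_pairs -- significantly covarying (positive) basepairs
    """
    s0_residues = {pos for pair in s0_pairs for pos in pair}
    positive = set(positive_pairs)
    kept = []
    for helix in helices:
        residues = {pos for pair in helix for pos in pair}
        if not residues:
            continue  # skip empty helices
        # Helices with at least one positive basepair are always kept.
        has_positive = any(pair in positive for pair in helix)
        # Fraction of the helix residues already used by S0.
        overlap = len(residues & s0_residues) / len(residues)
        if has_positive or overlap <= 0.5:
            kept.append(helix)
    return kept


s0 = [(1, 20), (2, 19), (3, 18)]
positive = [(1, 20), (2, 19), (2, 18)]
alternatives = [[(1, 19)], [(2, 18), (3, 17)], [(8, 30), (9, 29)]]
print(filter_alternative_helices(alternatives, s0, positive))
```

In this toy input, the first helix has no positive basepair and lies entirely on S0 residues, so it is dropped; the second is kept because it contains a positive basepair, and the third because it does not touch S0.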

**Fig 2. RNA models used by the CaCoFold algorithm.**
(a) The Nussinov grammar implemented by the maxCov algorithm uses the R-scape E-values of the significantly covarying pairs, and maximizes the sum of -log(E-value). (b) The RBG model used by the first layer of the folding algorithm. (c) The G6X model used by the rest of the layers completing the non-nested part of the RNA structure. For the RBG and G6X models, the F nonterminal is a shorthand for 16 different non-terminals that represent stacked basepairs. The three models are unambiguous, that is, given any nested structure, there is always one possible and unique way in which the structure can be formulated by following the rules of the grammar.
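The layering performed by maxCov can be sketched as a weighted Nussinov-style dynamic program applied repeatedly. The Python sketch below is an illustration under stated assumptions, not the R-scape code: it takes E-values below 1 as a dictionary keyed by (i, j), uses a plain recurrence rather than the unambiguous grammar of panel (a), and the names `max_cov_layer` and `cascade_layers` are hypothetical.

```python
import math


def max_cov_layer(evalues):
    """Nested subset of covarying pairs maximizing the sum of -log(E-value).

    evalues: dict mapping (i, j), i < j, to an E-value (assumed < 1).
    Each position is used in at most one selected pair.
    """
    weight = {pair: -math.log(e) for pair, e in evalues.items()}
    positions = sorted({pos for pair in weight for pos in pair})
    n = len(positions)
    # score[a][b] / picks[a][b]: best solution on positions[a:b] (half-open).
    score = [[0.0] * (n + 1) for _ in range(n + 1)]
    picks = [[()] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for a in range(n - length + 1):
            b = a + length
            # Option 1: leave positions[a] unpaired.
            best, chosen = score[a + 1][b], picks[a + 1][b]
            # Option 2: pair positions[a] with positions[k].
            for k in range(a + 1, b):
                pair = (positions[a], positions[k])
                if pair in weight:
                    total = weight[pair] + score[a + 1][k] + score[k + 1][b]
                    if total > best:
                        best = total
                        chosen = (pair,) + picks[a + 1][k] + picks[k + 1][b]
            score[a][b], picks[a][b] = best, chosen
    return sorted(picks[0][n])


def cascade_layers(evalues):
    """Assign every covarying pair to a layer, removing used pairs each round."""
    remaining = dict(evalues)
    layers = []
    while remaining:
        layer = max_cov_layer(remaining)
        if not layer:  # guard: nothing with positive weight is left
            break
        layers.append(layer)
        for pair in layer:
            del remaining[pair]
    return layers


toy = {(1, 10): 1e-3, (2, 9): 1e-3, (3, 8): 1e-3, (5, 12): 1e-2, (6, 11): 1e-2}
print(cascade_layers(toy))
```

On this toy input, which mimics the five covarying pairs of Fig 1, the three mutually nested pairs form the first layer and the two crossing pairs form the second.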

**Fig 3. The CaCoFold algorithm applied to the transfer-messenger RNA (tmRNA).**
Steps (a) to (f) refer to the same methods as described in Fig 1. (a) Characteristics of the input alignment. (b) The statistical test considers all possible pairs equally, resulting in the assignment of 121 significantly covarying positive basepairs. The Rfam consensus structure is not used in the analysis. The whole analysis is performed using the single command R-scape --fold on the input alignment. The analysis takes 25 seconds (30 s including drawing all the figures) on a 3.3 GHz Intel Core i7 MacBook Pro. (c) The maxCov algorithm requires 6 layers to incorporate all 121 positive basepairs. (d) The cascade constrained folding completes the structure with a total of 139 basepairs. (e) After filtering, there are five pseudoknotted helices, three triplets and 10 other mRNA-induced covariations. The structural display in (f) has been modified by hand to match the standard depiction of the tmRNA secondary structure in (g). The thick line in (g) marked with an asterisk indicates the C-C triplet interaction proposed in Ref. . Details of the mRNA-induced covariations are given in S6(c) Fig.

**Fig 4. CaCoFold structures confirmed by known 3D structures (part 1/7).**
Structural elements with covariation support introduced by CaCoFold relative to the Rfam annotation and corroborated by 3D structures are annotated in blue. (a) Relative to the Rfam structure, the A-type RNase P RNA CaCoFold structure includes one more helix (P6) and two significant covariations, named tr_1 and tr_2. Blue arrows show the placement of these three covarying motifs relative to the 3D structure [46]. The display of the crystal structure has been modified to indicate with back shaded boxes five regions with tertiary interactions labeled “1” to “5” [68]. “tr_1” occurs in region “3” between P8 and the hairpin loop of P14, and “tr_2” in region “4”, representing the interaction between P8 and the hairpin loop of P18. The display of the CaCoFold structure has been modified by hand to match the standard depiction of the structure. (b) Relative to the Rfam structure, the SAM-I riboswitch CaCoFold structure shows one more helix forming a pseudoknot and an A-U pair stacking on helix P1, both confirmed by the 2.9 Å resolution crystal structure of the T. tengcongensis SAM-I riboswitch [47]. CaCoFold also identifies additional pairs with covariation support for helices P2a, P3 and P4. (c) The U4 snRNA CaCoFold structure identifies one more internal loop and one more helix than the Rfam structure, both confirmed by the 3D structure [48]. The new U4 internal loop, flanked by covarying Watson-Crick basepairs, includes a kink turn (UAG-AG). The non Watson-Crick pairs in a kink turn (A-G, G-A) are generally conserved (> 97% in this alignment) and do not covary.

**Fig 5. CaCoFold structures confirmed by known 3D structures (part 2/7).**
Structural elements with covariation support introduced by CaCoFold relative to the Rfam annotation and corroborated by 3D structures are annotated in blue. (a) Relative to the Rfam structure, the Cobalamin riboswitch CaCoFold structure adds one pseudoknot and one Watson-Crick basepair defining a four-way junction between helices P1, P2, and P3, both confirmed by the S. thermophilum crystal structure [49]. It also adds more covariation support for helices P1 and P2. (b) In CaCoFold structures, alternative helices that do not overlap with the nested structure are annotated as pseudoknots (pk); otherwise they are annotated as triplets (tr). For structures obtained from a crystal structure, non Watson-Crick basepairs are annotated as non-canonical (nc) regardless of whether or not they overlap with the nested structure. The tRNA CaCoFold structure has been re-annotated manually to match the labeling of the S. cerevisiae phenylalanine tRNA 1EHZ crystal structure (1.93 Å) for all common basepairs [51]. Of the covarying pairs in the CaCoFold structure but not in the Rfam tRNA structure, five (depicted in blue) are confirmed by the 1EHZ structure as analyzed by RNAView. The sequence of the 1EHZ tRNA does not include the V loop, which appears in 16% of the 954 sequences in the Rfam tRNA seed alignment. Two covarying pairs (depicted in orange) appear to be the result of constraints other than RNA structure. The remaining six covarying pairs are labeled in black. Four basepairs identified in the 3D structure but not incorporated in the CaCoFold structure are depicted in brown. The annotation of the non Watson-Crick pairs with at least two H-bonds follows the nomenclature of [34], which reports the two edges of the nucleotides involved in the plane of the H-bonds. “W” stands for the Watson-Crick edge, “S” for the Sugar edge, and “H” for the Hoogsteen face; “c” and “t” stand for cis and trans respectively. WWc is a standard Watson-Crick basepair. (c) In the U2 spliceosomal RNA, Stem IIa and Stem IIc, both with covariation support, are two alternative helices that compete to promote different splicing steps [53].
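The pk/tr labeling convention described in panel (b) amounts to a residue-overlap test. A minimal Python sketch follows, assuming helices and the nested structure are given as lists of (i, j) position pairs; the function name `annotate_alternative_helix` is hypothetical and does not correspond to an R-scape function.

```python
def annotate_alternative_helix(helix, nested_pairs):
    """Label an alternative helix following the convention in Fig 5(b).

    Returns "pk" when the helix shares no residue with the nested
    structure, and "tr" when at least one residue is shared.
    """
    nested_residues = {pos for pair in nested_pairs for pos in pair}
    helix_residues = {pos for pair in helix for pos in pair}
    return "tr" if helix_residues & nested_residues else "pk"


nested = [(1, 20), (2, 19)]
print(annotate_alternative_helix([(5, 30), (6, 29)], nested))
print(annotate_alternative_helix([(2, 40)], nested))
```

The first call uses only residues outside the nested structure and is labeled as a pseudoknot; the second reuses residue 2 and is labeled as a triplet.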

References

1. Mironov AS, Gusarov I, Rafikov R, Lopez LE, Shatalin K, Kreneva RA, et al. Sensing small molecules by nascent RNA: a mechanism to control transcription in bacteria. Cell. 2002;111(5):747–756. doi: 10.1016/S0092-8674(02)01134-0
2. Winkler WC, Nahvi A, Breaker RR. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature. 2002;419:952–956. doi: 10.1038/nature01145
3. Babitzke P, Romeo T. CsrB sRNA family: sequestration of RNA-binding regulatory proteins. Current Opinion in Microbiology. 2007;10(2):156–163. doi: 10.1016/j.mib.2007.03.007
4. Chen J, Wassarman KM, Feng S, Leon K, Feklistov A, Winkelman JT, et al. 6S RNA mimics B-form DNA to regulate Escherichia coli RNA polymerase. Mol Cell. 2017;68(2):388–397.e6. doi: 10.1016/j.molcel.2017.09.006
5. Holley RW, Apgar J, Everett GA, Madison JT, Marquisee M, Merrill SH, et al. Structure of a ribonucleic acid. Science. 1965;147(3664):1462–1465. doi: 10.1126/science.147.3664.1462
