The number of non-coding RNAs within the human genome is unknown; however, recent transcriptomic and bioinformatic studies suggest that there are thousands of non-coding transcripts.[1][2][3][4][5][6][7] Many of the newly identified ncRNAs have unknown functions, if any.[8] There is no consensus on how much of non-coding transcription is functional: some believe most ncRNAs to be non-functional "junk RNA", the product of spurious transcription,[9][10] while others expect that many non-coding transcripts have functions yet to be discovered.[11][12]
The cloverleaf structure of Yeast tRNAPhe (inset) and the 3D structure determined by X-ray analysis.
The first non-coding RNA to be characterised was an alanine tRNA found in baker's yeast; its structure was published in 1965.[16] To produce a purified alanine tRNA sample, Robert W. Holley et al. used 140 kg of commercial baker's yeast to give just 1 g of purified tRNAAla for analysis.[17] The 80-nucleotide tRNA was sequenced by first being digested with pancreatic ribonuclease (producing fragments ending in cytosine or uridine) and then with takadiastase ribonuclease T1 (producing fragments ending in guanosine). Chromatography and identification of the 5' and 3' ends then helped arrange the fragments to establish the RNA sequence.[17] Of the three structures originally proposed for this tRNA,[16] the 'cloverleaf' structure was independently proposed in several following publications.[18][19][20][21] The cloverleaf secondary structure was finalised following X-ray crystallography analysis performed by two independent research groups in 1974.[22][23]
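The base specificity of the two ribonucleases described above can be illustrated with a minimal sketch. It assumes each enzyme cuts immediately after every occurrence of its preferred residue and treats the two digests independently; the function name `digest` and the toy sequence are hypothetical and are not taken from the original study.

```python
def digest(rna: str, cut_after: str) -> list[str]:
    """Split an RNA string after every residue listed in cut_after."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:
        # The 3'-terminal fragment need not end on a cut residue.
        fragments.append(current)
    return fragments


# Hypothetical toy sequence, NOT the real alanine tRNA sequence.
toy_rna = "GGAUCGAAUCCG"

pancreatic_fragments = digest(toy_rna, "CU")  # fragments ending in C or U
t1_fragments = digest(toy_rna, "G")           # fragments ending in G

print(pancreatic_fragments)
print(t1_fragments)
```

Because the two enzymes cut at different residues, their fragment sets overlap, and it is these overlaps, together with knowledge of the 5' and 3' ends, that allow the fragments to be ordered.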
Noncoding RNAs belong to several groups and are involved in many cellular processes.[26] These range from ncRNAs of central importance that are conserved across all or most cellular life through to more transient ncRNAs specific to one or a few closely related species. The more conserved ncRNAs are thought to be molecular fossils or relics from the last universal common ancestor and the RNA world, and their current roles remain mostly in regulation of information flow from DNA to protein.[27][28][29]
Atomic structure of the 50S subunit from Haloarcula marismortui. Proteins are shown in blue and the two RNA strands in orange and yellow.[30] The small patch of green in the center of the subunit is the active site.
Many of the conserved, essential and abundant ncRNAs are involved in translation. Ribonucleoprotein (RNP) particles called ribosomes are the 'factories' where translation takes place in the cell. The ribosome consists of more than 60% ribosomal RNA, made up of 3 ncRNAs in prokaryotes and 4 ncRNAs in eukaryotes. Ribosomal RNAs catalyse the translation of nucleotide sequences to protein. Another set of ncRNAs, transfer RNAs, form an 'adaptor molecule' between mRNA and protein. The H/ACA box and C/D box snoRNAs are ncRNAs found in archaea and eukaryotes. RNase MRP is restricted to eukaryotes. Both groups of ncRNA are involved in the maturation of rRNA. The snoRNAs guide covalent modifications of rRNA, tRNA and snRNAs; RNase MRP cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs. The ubiquitous ncRNA, RNase P, is an evolutionary relative of RNase MRP.[31] RNase P generates the mature 5'-ends of tRNAs by cleaving the 5'-leader elements of precursor-tRNAs. Another ubiquitous RNP called SRP recognizes and transports specific nascent proteins to the endoplasmic reticulum in eukaryotes and the plasma membrane in prokaryotes. In bacteria, transfer-messenger RNA (tmRNA) is an RNP involved in rescuing stalled ribosomes, tagging incomplete polypeptides and promoting the degradation of aberrant mRNA.[citation needed]
Electron microscopy images of the yeast spliceosome. Note the bulk of the complex is in fact ncRNA.
In eukaryotes, the spliceosome performs the splicing reactions essential for removing intron sequences; this process is required for the formation of mature mRNA. The spliceosome is another RNP, often known as the snRNP or tri-snRNP. There are two different forms of the spliceosome, the major and minor forms. The ncRNA components of the major spliceosome are U1, U2, U4, U5, and U6. The ncRNA components of the minor spliceosome are U11, U12, U5, U4atac and U6atac.[citation needed]
Another group of introns can catalyse their own removal from host transcripts; these are called self-splicing RNAs. There are two main groups of self-splicing RNAs: group I catalytic introns and group II catalytic introns. These ncRNAs catalyze their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms.[citation needed]
The expression of many thousands of genes is regulated by ncRNAs. This regulation can occur in trans or in cis. There is increasing evidence that a special type of ncRNA called enhancer RNAs, transcribed from the enhancer region of a gene, act to promote gene expression.[citation needed]
In higher eukaryotes, microRNAs regulate gene expression. A single miRNA can reduce the expression levels of hundreds of genes. Mature miRNA molecules act through partial complementarity to one or more messenger RNA (mRNA) molecules, generally in 3' UTRs. The main function of miRNAs is to down-regulate gene expression.
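Partial complementarity between a miRNA and a 3' UTR can be sketched as a simple string search. The example below assumes, as a common simplification not stated in the text above, that pairing is driven by a "seed" spanning nucleotides 2 to 8 of the miRNA; the function name `seed_sites` and both sequences are hypothetical toy inputs.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = str.maketrans("AUGC", "UACG")


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based positions in utr that are complementary to the miRNA seed.

    Assumption: the seed is nucleotides 2-8 of the miRNA (slice [1:8]).
    """
    seed = mirna[1:8]
    # The matching site on the mRNA, read 5'->3', is the reverse complement.
    site = seed.translate(COMPLEMENT)[::-1]
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


# Hypothetical toy sequences for illustration only.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGCUACCUCA"

print(seed_sites(toy_mirna, toy_utr))
```

A short seed will match many transcripts by chance, which is consistent with a single miRNA affecting hundreds of genes; real target prediction also weighs site context and conservation, which this sketch ignores.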
The ncRNA RNase P has also been shown to influence gene expression. In the human nucleus, RNase P is required for the normal and efficient transcription of various ncRNAs transcribed by RNA polymerase III. These include tRNA, 5S rRNA, SRP RNA, and U6 snRNA genes. RNase P exerts its role in transcription through association with Pol III and chromatin of active tRNA and 5S rRNA genes.[39]
The bacterial ncRNA, 6S RNA, specifically associates with RNA polymerase holoenzyme containing the sigma70 specificity factor. This interaction represses expression from a sigma70-dependent promoter during stationary phase.[citation needed]
Another bacterial ncRNA, OxyS RNA, represses translation by binding to Shine-Dalgarno sequences, thereby occluding ribosome binding. OxyS RNA is induced in response to oxidative stress in Escherichia coli.[citation needed]
The B2 RNA is a small noncoding RNA polymerase III transcript that represses mRNA transcription in response to heat shock in mouse cells. B2 RNA inhibits transcription by binding to core Pol II. Through this interaction, B2 RNA assembles into preinitiation complexes at the promoter and blocks RNA synthesis.[40]
A recent study has shown that the act of transcribing an ncRNA sequence can itself influence gene expression. RNA polymerase II transcription of ncRNAs is required for chromatin remodelling in Schizosaccharomyces pombe. Chromatin is progressively converted to an open configuration as several species of ncRNAs are transcribed.[41]
RNA leader sequences are found upstream of the first gene of amino acid biosynthetic operons. These RNA elements form one of two possible structures in regions encoding very short peptide sequences that are rich in the end-product amino acid of the operon. A terminator structure forms when there is an excess of the regulatory amino acid and ribosome movement over the leader transcript is not impeded. When there is a deficiency of the charged tRNA of the regulatory amino acid, the ribosome translating the leader peptide stalls and the antiterminator structure forms. This allows RNA polymerase to transcribe the operon. Known RNA leaders are the histidine operon leader, leucine operon leader, threonine operon leader and tryptophan operon leader.[citation needed]
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are repeats found in the DNA of many bacteria and archaea. The repeats are separated by spacers of similar length. It has been demonstrated that these spacers can be derived from phage and subsequently help protect the cell from infection.
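The repeat-spacer layout described above can be illustrated with a short sketch that recovers the spacers from an array and checks whether any of them also occurs in a phage sequence. It assumes the repeat sequence is already known and perfectly conserved; the function name `extract_spacers` and all sequences are hypothetical toy data.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of repeat."""
    parts = array.split(repeat)
    # parts[0] and parts[-1] are flanking sequence outside the array.
    return [p for p in parts[1:-1] if p]


# Hypothetical toy sequences for illustration only.
repeat = "GTTTTAGAGC"
spacer_1 = "AAACCCGGGT"
spacer_2 = "TTGACCATGA"
crispr_array = "AT" + repeat + spacer_1 + repeat + spacer_2 + repeat + "GC"
phage_genome = "CCC" + spacer_2 + "TTT"

spacers = extract_spacers(crispr_array, repeat)
# Spacers that also occur in the phage are candidates for phage-derived immunity.
phage_matches = [s for s in spacers if s in phage_genome]

print(spacers)
print(phage_matches)
```

Real arrays are usually located by searching for the repeats themselves and tolerate small repeat variations, which this exact-match sketch does not attempt.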
Telomerase is an RNP enzyme that adds specific DNA sequence repeats ("TTAGGG" in vertebrates) to telomeric regions, which are found at the ends of eukaryotic chromosomes. The telomeres contain condensed DNA material, giving stability to the chromosomes. The enzyme is a reverse transcriptase that carries telomerase RNA, which it uses as a template to elongate telomeres, which are shortened after each replication cycle.
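The templating role of telomerase RNA can be sketched as reverse transcription of a short RNA template into a DNA repeat. The example assumes a simplified six-nucleotide template, 5'-CCCUAA-3', chosen so that its reverse complement is the vertebrate repeat; the real template region is longer, and the function names are hypothetical.

```python
# Pair each RNA template base with the DNA base synthesised opposite it.
RNA_TO_DNA = str.maketrans("AUGC", "TACG")


def reverse_transcribe(template_rna: str) -> str:
    """Return the DNA strand (5'->3') copied from an RNA template given 5'->3'."""
    return template_rna.translate(RNA_TO_DNA)[::-1]


def extend_telomere(telomere: str, template_rna: str, cycles: int) -> str:
    """Append one templated repeat to the 3' end per cycle."""
    return telomere + reverse_transcribe(template_rna) * cycles


# Simplified, illustrative template; not the full telomerase RNA template region.
toy_template = "CCCUAA"
extended = extend_telomere("TTAGGG", toy_template, cycles=3)

print(reverse_transcribe(toy_template))
print(extended)
```

The sketch models only repeat addition; it does not represent the shortening that occurs at each replication cycle.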
Bifunctional RNAs, or dual-function RNAs, are RNAs that have two distinct functions.[43][44] The majority of the known bifunctional RNAs are mRNAs that encode both a protein and ncRNAs. However, a growing number of ncRNAs fall into two different ncRNA categories; e.g., H/ACA box snoRNA and miRNA.[45][46]
Two well-known examples of bifunctional RNAs are SgrS RNA and RNAIII. However, a handful of other bifunctional RNAs are known to exist (e.g., steroid receptor activator/SRA,[47] VegT RNA,[48][49] Oskar RNA,[50] ENOD40,[51] p53 RNA,[52] SR1 RNA,[53] and Spot 42 RNA[54]). Bifunctional RNAs were the subject of a 2011 special issue of Biochimie.[55]
There is an important link between certain non-coding RNAs and the control of hormone-regulated pathways. In Drosophila, hormones such as ecdysone and juvenile hormone can promote the expression of certain miRNAs. Furthermore, this regulation occurs at distinct temporal points within Caenorhabditis elegans development.[56] In mammals, miR-206 is a crucial regulator of estrogen-receptor-alpha.[57]
Non-coding RNAs are crucial in the development of several endocrine organs, as well as in endocrine diseases such as diabetes mellitus.[58] Specifically in the MCF-7 cell line, addition of 17β-estradiol increased global transcription of the noncoding RNAs called long noncoding RNAs (lncRNAs) near estrogen-activated coding genes.[59]
A rare SNP (rs11614913) that overlaps hsa-mir-196a-2 has been reported to be associated with non-small cell lung carcinoma.[71] Likewise, a screen of 17 miRNAs that have been predicted to regulate a number of breast cancer associated genes found variations in the microRNAs miR-17 and miR-30c-1 of patients; these patients were noncarriers of BRCA1 or BRCA2 mutations, raising the possibility that familial breast cancer may be caused by variation in these miRNAs.[72]
The p53 tumor suppressor is arguably the most important agent in preventing tumor formation and progression. The p53 protein functions as a transcription factor with a crucial role in orchestrating the cellular stress response. In addition to its crucial role in cancer, p53 has been implicated in other diseases including diabetes, cell death after ischemia, and various neurodegenerative diseases such as Huntington's, Parkinson's, and Alzheimer's disease. Studies have suggested that p53 expression is subject to regulation by non-coding RNA.[5]
Another example of non-coding RNA dysregulated in cancer cells is the long non-coding RNA Linc00707. Linc00707 is upregulated and sponges miRNAs in human bone marrow-derived mesenchymal stem cells,[73] hepatocellular carcinoma, gastric cancer[74] and breast cancer,[75][76] where it respectively promotes osteogenesis, contributes to hepatocellular carcinoma progression, promotes proliferation and metastasis, and indirectly regulates expression of proteins involved in cancer aggressiveness.
The deletion of the 48 copies of the C/D box snoRNA SNORD116 has been shown to be the primary cause of Prader–Willi syndrome.[77][78][79][80] Prader–Willi is a developmental disorder associated with over-eating and learning difficulties. SNORD116 has potential target sites within a number of protein-coding genes, and could have a role in regulating alternative splicing.[81]
The chromosomal locus containing the small nucleolar RNA SNORD115 gene cluster has been duplicated in approximately 5% of individuals with autistic traits.[82][83] A mouse model engineered to have a duplication of the SNORD115 cluster displays autistic-like behaviour.[84] A recent small study of post-mortem brain tissue demonstrated altered expression of long non-coding RNAs in the prefrontal cortex and cerebellum of autistic brains as compared to controls.[85]
Mutations within RNase MRP have been shown to cause cartilage–hair hypoplasia (CHH), a disease that is frequent among the Amish and Finnish populations and is associated with an array of symptoms such as short stature, sparse hair, skeletal abnormalities and a suppressed immune system.[86][87][88] The best characterised variant is an A-to-G transition at nucleotide 70, which lies in a loop region two bases 5' of a conserved pseudoknot. However, many other mutations within RNase MRP also cause CHH.
The antisense RNA, BACE1-AS, is transcribed from the opposite strand to BACE1 and is upregulated in patients with Alzheimer's disease.[89] BACE1-AS regulates the expression of BACE1 by increasing BACE1 mRNA stability and generating additional BACE1 through a post-transcriptional feed-forward mechanism. By the same mechanism it also raises concentrations of beta amyloid, the main constituent of senile plaques. BACE1-AS concentrations are elevated in subjects with Alzheimer's disease and in amyloid precursor protein transgenic mice.
Variation within the seed region of mature miR-96 has been associated with autosomal dominant, progressive hearing loss in humans and mice. The homozygous mutant mice were profoundly deaf, showing no cochlear responses. Heterozygous mice and humans progressively lose the ability to hear.[90][91][92]
Scientists have started to distinguish functional RNA (fRNA) from ncRNA, to describe regions functional at the RNA level that may or may not be stand-alone RNA transcripts.[97][98][99] This implies that fRNA (such as riboswitches, SECIS elements, and other cis-regulatory regions) is not ncRNA. Yet fRNA could also include mRNA, as this is RNA coding for protein, and hence is functional. Additionally, artificially evolved RNAs also fall under the fRNA umbrella term. Some publications[24] state that ncRNA and fRNA are nearly synonymous; however, others have pointed out that a large proportion of annotated ncRNAs likely have no function.[9][10] It has also been suggested to simply use the term RNA, since the distinction from a protein-coding RNA (messenger RNA) is already given by the qualifier mRNA.[100] This eliminates the ambiguity when addressing a gene "encoding a non-coding" RNA. In addition, there may be a number of ncRNAs that are misannotated in published literature and datasets.[101][102][103]
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, et al. (November 2021). "Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology". Briefings in Bioinformatics. 22 (6). doi:10.1093/bib/bbab259. PMID 34329375.
Zachau HG, Dütting D, Feldmann H, Melchers F, Karau W (1966). "Serine specific transfer ribonucleic acids. XIV. Comparison of nucleotide sequences and secondary structure models". Cold Spring Harbor Symposia on Quantitative Biology. 31: 417–424. doi:10.1101/SQB.1966.031.01.054. PMID 5237198.
Lerner MR, Boyle JA, Hardin JA, Steitz JA (January 1981). "Two novel classes of small ribonucleoproteins detected by antibodies associated with lupus erythematosus". Science. 211 (4480): 400–402. Bibcode:1981Sci...211..400L. doi:10.1126/science.6164096. PMID 6164096.
Espinoza CA, Allen TA, Hieb AR, Kugel JF, Goodrich JA (September 2004). "B2 RNA binds directly to RNA polymerase II to repress transcript synthesis". Nature Structural & Molecular Biology. 11 (9): 822–829. doi:10.1038/nsmb812. PMID 15300239. S2CID 22199826.
Zhang J, King ML (December 1996). "Xenopus VegT RNA is localized to the vegetal cortex during oogenesis and encodes a novel T-box transcription factor involved in mesodermal patterning". Development. 122 (12): 4119–4129. doi:10.1242/dev.122.12.4119. PMID 9012531. S2CID 28462527.
Pibouin L, Villaudy J, Ferbus D, Muleris M, Prospéri MT, Remvikos Y, Goubin G (February 2002). "Cloning of the mRNA of overexpression in colon carcinoma-1: a sequence overexpressed in a subset of colon carcinomas". Cancer Genetics and Cytogenetics. 133 (1): 55–60. doi:10.1016/S0165-4608(01)00634-3. PMID 11890990.
Fu X, Ravindranath L, Tran N, Petrovics G, Srivastava S (March 2006). "Regulation of apoptosis by a prostate-specific and prostate cancer-associated noncoding gene, PCGEM1". DNA and Cell Biology. 25 (3): 135–141. doi:10.1089/dna.2006.25.135. PMID 16569192.
Xie M, Ma T, Xue J, Ma H, Sun M, Zhang Z, et al. (February 2019). "The long intergenic non-protein coding RNA 707 promotes proliferation and metastasis of gastric cancer by interacting with mRNA stabilizing protein HuR". Cancer Letters. 443: 67–79. doi:10.1016/j.canlet.2018.11.032. PMID 30502359. S2CID 54611497.
Yuan RX, Bao D, Zhang Y (May 2020). "Linc00707 promotes cell proliferation, invasion, and migration via the miR-30c/CTHRC1 regulatory loop in breast cancer". European Review for Medical and Pharmacological Sciences. 24 (9): 4863–4872. doi:10.26355/eurrev_202005_21175. PMID 32432749. S2CID 218759508.
Ding F, Prints Y, Dhar MS, Johnson DK, Garnacho-Montero C, Nicholls RD, Francke U (June 2005). "Lack of Pwcr1/MBII-85 snoRNA is critical for neonatal lethality in Prader-Willi syndrome mouse models". Mammalian Genome. 16 (6): 424–431. doi:10.1007/s00335-005-2460-2. PMID 16075369. S2CID 12256515.
Bolton PF, Veltman MW, Weisblatt E, Holmes JR, Thomas NS, Youings SA, et al. (September 2004). "Chromosome 15q11-13 abnormalities and other medical conditions in individuals with autism spectrum disorders". Psychiatric Genetics. 14 (3): 131–137. doi:10.1097/00041444-200409000-00002. PMID 15318025. S2CID 37344935.
Mencía A, Modamio-Høybjør S, Redshaw N, Morín M, Mayo-Merino F, Olavarrieta L, et al. (May 2009). "Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss". Nature Genetics. 41 (5): 609–613. doi:10.1038/ng.355. PMID 19363479. S2CID 11113852.
Housman G, Ulitsky I (January 2016). "Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs". Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 1859 (1): 31–40. doi:10.1016/j.bbagrm.2015.07.017. PMID 26265145.