Several methods of RNA splicing occur in nature; the type of splicing depends on the structure of the spliced intron and the catalysts required for splicing to occur.
The word intron is derived from the terms intragenic region,[1] and intracistron,[2] that is, a segment of DNA that is located between two exons of a gene. The term intron refers to both the DNA sequence within a gene and the corresponding sequence in the unprocessed RNA transcript. As part of the RNA processing pathway, introns are removed by RNA splicing either shortly after or concurrently with transcription.[3] Introns are found in the genes of most organisms and many viruses. They can be located in a wide range of genes, including those that generate proteins, ribosomal RNA (rRNA), and transfer RNA (tRNA).[4]
Within introns, a donor site (5' end of the intron), a branch site (near the 3' end of the intron) and an acceptor site (3' end of the intron) are required for splicing. The splice donor site includes an almost invariant sequence GU at the 5' end of the intron, within a larger, less highly conserved region. The splice acceptor site at the 3' end of the intron terminates the intron with an almost invariant AG sequence. Upstream (5'-ward) from the AG there is a region high in pyrimidines (C and U), or polypyrimidine tract. Further upstream from the polypyrimidine tract is the branchpoint, which includes an adenine nucleotide involved in lariat formation.[5][6] The consensus sequence for an intron (in IUPAC nucleic acid notation) is: G-G-[cut]-G-U-R-A-G-U (donor site) ... intron sequence ... Y-U-R-A-C (branch sequence 20-50 nucleotides upstream of acceptor site) ... Y-rich-N-C-A-G-[cut]-G (acceptor site).[7] However, the specific sequence of intronic splicing elements and the number of nucleotides between the branchpoint and the nearest 3' acceptor site affect splice site selection.[8][9] Also, point mutations in the underlying DNA or errors during transcription can activate a cryptic splice site in part of the transcript that usually is not spliced. This results in a mature messenger RNA with a missing section of an exon. In this way, a point mutation, which might otherwise affect only a single amino acid, can manifest as a deletion or truncation in the final protein.[citation needed]
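The consensus notation above can be turned into a simple pattern search. The following is a minimal sketch, assuming an RNA alphabet and only the IUPAC codes that appear in the consensus (R, Y, N); the function names `iupac_to_regex`, `find_motif` and `donor_cut_sites` are hypothetical and not part of any library. A match only marks a candidate site, since real splice site selection also depends on the other elements described in this section.

```python
import re

# IUPAC codes used in the consensus above, expressed as regex character classes.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any nucleotide
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC_RNA[code] for code in consensus.upper())


def find_motif(rna, consensus):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead keeps matches zero-width so overlapping hits are all reported.
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(rna.upper())]


def donor_cut_sites(rna, consensus="GGGURAGU", exon_bases=2):
    """Return positions of the first intron base for each donor-site match.

    The default consensus is G-G-[cut]-G-U-R-A-G-U, so the cut lies after
    the first two (exonic) bases of the match.
    """
    return [start + exon_bases for start in find_motif(rna, consensus)]


if __name__ == "__main__":
    transcript = "AACGGGUAAGUCCC"  # made-up sequence for illustration
    for cut in donor_cut_sites(transcript):
        print("candidate 5' splice site before index", cut, transcript[cut:cut + 2])
```

The same `find_motif` helper can be used with the branch sequence `YURAC` to list candidate branch sites.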
Splicing is catalyzed by the spliceosome, a large RNA-protein complex composed of five small nuclear ribonucleoproteins (snRNPs). Assembly and activity of the spliceosome occur during transcription of the pre-mRNA. The RNA components of snRNPs interact with the intron and are involved in catalysis. Two types of spliceosomes have been identified (major and minor), which contain different snRNPs.
The major spliceosome splices introns containing GU at the 5' splice site and AG at the 3' splice site. It is composed of the U1, U2, U4, U5, and U6 snRNPs and is active in the nucleus. In addition, a number of proteins including U2 small nuclear RNA auxiliary factor 1 (U2AF35), U2AF2 (U2AF65)[10] and SF1 are required for the assembly of the spliceosome.[6][11] The spliceosome forms different complexes during the splicing process:[12]
Complex E
The U1 snRNP binds to the GU sequence at the 5' splice site of an intron;
Complex A (pre-spliceosome)
The U2 snRNP displaces SF1 and binds to the branch point sequence, and ATP is hydrolyzed;
Complex B (pre-catalytic spliceosome)
The U5/U4/U6 snRNP trimer binds, and the U5 snRNP binds exons at the 5' site, with U6 binding to U2;
Complex B*
The U1 snRNP is released, U5 shifts from exon to intron, and the U6 binds at the 5' splice site;
Complex C (catalytic spliceosome)
U4 is released; U6/U2 catalyzes transesterification, ligating the 5' end of the intron to the branchpoint A within the intron and cleaving the 5' site, which forms the lariat; U5 binds the exon at the 3' splice site;
Complex C* (post-spliceosomal complex)
U2/U5/U6 remain bound to the lariat, and the 3' site is cleaved and exons are ligated using ATP hydrolysis. The spliced RNA is released, the lariat is released and degraded,[14] and the snRNPs are recycled.
This type of splicing is termed canonical splicing or the lariat pathway, and it accounts for more than 99% of splicing. By contrast, when the intronic flanking sequences do not follow the GU-AG rule, noncanonical splicing is said to occur (see "minor spliceosome" below).[15]
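As a small illustration of the GU-AG rule, the sketch below labels an intron by its terminal dinucleotides only. The function name `classify_intron` is hypothetical, and the check is a deliberate simplification: it applies the rule exactly as stated above and says nothing about which spliceosome actually processes the intron.

```python
def classify_intron(intron):
    """Label an intron 'canonical' if it starts with GU and ends with AG.

    Accepts RNA or DNA letters; T is treated as U. This only tests the
    terminal dinucleotides, as in the GU-AG rule described above.
    """
    seq = intron.upper().replace("T", "U")
    if len(seq) < 4:
        raise ValueError("intron too short to hold both terminal dinucleotides")
    if seq.startswith("GU") and seq.endswith("AG"):
        return "canonical"
    return "noncanonical"


if __name__ == "__main__":
    # Made-up sequences for illustration.
    print(classify_intron("GUAAGUCCUUUUCAG"))
    print(classify_intron("AUAUCCUUUUCCAC"))
```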
The minor spliceosome is very similar to the major spliceosome, but it splices out rare introns with different splice site sequences. While the minor and major spliceosomes contain the same U5 snRNP, the minor spliceosome has different but functionally analogous snRNPs for U1, U2, U4, and U6, which are respectively called U11, U12, U4atac, and U6atac.[16]
In most cases, splicing removes introns as single units from precursor mRNA transcripts. However, in some cases, especially in mRNAs with very long introns, splicing happens in steps: part of an intron is removed first, and the remaining intron is spliced out in a following step. This was first found in the Ultrabithorax (Ubx) gene of the fruit fly, Drosophila melanogaster, and a few other Drosophila genes, but cases in humans have been reported as well.[17][18]
Trans-splicing is a form of splicing that removes introns or outrons, and joins two exons that are not within the same RNA transcript.[19] Trans-splicing can occur between two different endogenous pre-mRNAs or between an endogenous RNA and an exogenous (such as viral) or artificial RNA.[20]
Self-splicing occurs for rare introns that form a ribozyme, performing the functions of the spliceosome by RNA alone. There are three kinds of self-splicing introns, Group I, Group II and Group III. Group I and II introns perform splicing similar to the spliceosome without requiring any protein. This similarity suggests that Group I and II introns may be evolutionarily related to the spliceosome. Self-splicing may also be very ancient, and may have existed in an RNA world present before protein.[citation needed]
tRNA (also tRNA-like) splicing is another rare form of splicing that usually occurs in tRNA. The splicing reaction involves a different biochemistry than the spliceosomal and self-splicing pathways.
In the yeast Saccharomyces cerevisiae, a tRNA splicing endonuclease heterotetramer, composed of TSEN54, TSEN2, TSEN34, and TSEN15, cleaves pre-tRNA at two sites in the anticodon loop to form a 5'-half tRNA, terminating at a 2',3'-cyclic phosphodiester group, and a 3'-half tRNA, terminating at a 5'-hydroxyl group, along with a discarded intron.[21] Yeast tRNA kinase then phosphorylates the 5'-hydroxyl group using adenosine triphosphate. Yeast tRNA cyclic phosphodiesterase cleaves the cyclic phosphodiester group to form a 2'-phosphorylated 3' end. Yeast tRNA ligase adds an adenosine monophosphate group to the 5' end of the 3'-half and joins the two halves together.[22] NAD-dependent 2'-phosphotransferase then removes the 2'-phosphate group.[23][24]
Splicing occurs in all the kingdoms or domains of life; however, the extent and types of splicing can be very different between the major divisions. Eukaryotes splice many protein-coding messenger RNAs and some non-coding RNAs. Prokaryotes, on the other hand, splice rarely and mostly non-coding RNAs. Another important difference between these two groups of organisms is that prokaryotes completely lack the spliceosomal pathway.
Because spliceosomal introns are not conserved in all species, there is debate concerning when spliceosomal splicing evolved. Two models have been proposed: the intron late and intron early models (see intron evolution).
Diagram illustrating the two-step biochemistry of splicing
Spliceosomal splicing and self-splicing involve a two-step biochemical process. Both steps involve transesterification reactions that occur between RNA nucleotides. tRNA splicing, however, is an exception and does not occur by transesterification.[25]
Spliceosomal splicing and self-splicing proceed via two sequential transesterification reactions. First, the 2'OH of a specific branchpoint nucleotide within the intron, defined during spliceosome assembly, performs a nucleophilic attack on the first nucleotide of the intron at the 5' splice site, forming the lariat intermediate. Second, the 3'OH of the released 5' exon then performs a nucleophilic attack at the first nucleotide following the last nucleotide of the intron at the 3' splice site, thus joining the exons and releasing the intron lariat.[26]
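The two steps above can be mimicked on a plain string to make the products concrete. This is only a bookkeeping sketch under stated assumptions: the function name `splice` and its index arguments are hypothetical, the splice sites and branchpoint are supplied by the caller rather than recognised, and no chemistry is modelled. The 2'-5' bond of the lariat is represented by splitting the intron into a closed loop (5' end through the branchpoint) and a linear tail.

```python
def splice(pre_mrna, intron_start, intron_end, branch):
    """Simulate the two transesterification steps on a string.

    pre_mrna[intron_start:intron_end] is the intron (0-based, end exclusive);
    branch is the index of the branchpoint nucleotide in pre_mrna.
    """
    if not intron_start < branch < intron_end:
        raise ValueError("branchpoint must lie inside the intron")

    # Step 1: the branchpoint 2'OH attacks the first intron nucleotide.
    # The 5' exon is released; the intron stays attached to the 3' exon
    # as the lariat intermediate.
    exon1 = pre_mrna[:intron_start]
    intermediate = pre_mrna[intron_start:]

    # Step 2: the 3'OH of the released 5' exon attacks the first nucleotide
    # after the intron, joining the exons and releasing the lariat.
    exon2 = pre_mrna[intron_end:]
    intron = pre_mrna[intron_start:intron_end]
    loop_len = branch - intron_start + 1  # loop closes at the branchpoint

    return {
        "intermediate": intermediate,
        "mrna": exon1 + exon2,
        "lariat_loop": intron[:loop_len],
        "lariat_tail": intron[loop_len:],
    }


if __name__ == "__main__":
    # Made-up sequences for illustration.
    exon1, intron, exon2 = "AUGG", "GUAAGUUACUAACUUUCAG", "GCUAA"
    branch_in_intron = intron.index("UACUAAC") + 5  # assumed branchpoint A
    products = splice(exon1 + intron + exon2, len(exon1),
                      len(exon1) + len(intron), len(exon1) + branch_in_intron)
    print(products["mrna"])
    print(products["lariat_loop"], products["lariat_tail"])
```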
In many cases, the splicing process can create a range of unique proteins by varying the exon composition of the same mRNA. This phenomenon is called alternative splicing. Alternative splicing can occur in many ways. Exons can be extended or skipped, or introns can be retained. It is estimated that 95% of transcripts from multiexon genes undergo alternative splicing, some instances of which occur in a tissue-specific manner and/or under specific cellular conditions.[27] High-throughput mRNA sequencing technology can help quantify the expression levels of alternatively spliced isoforms. Differential expression levels across tissues and cell lineages have allowed computational approaches to be developed to predict the functions of these isoforms.[28][29]
Given this complexity, alternative splicing of pre-mRNA transcripts is regulated by a system of trans-acting proteins (activators and repressors) that bind to cis-acting sites or "elements" (enhancers and silencers) on the pre-mRNA transcript itself. These proteins and their respective binding elements promote or reduce the usage of a particular splice site. The binding specificity comes from the sequence and structure of the cis-elements. In HIV-1, for example, there are many donor and acceptor splice sites. Among them, ssA7, which is a 3' acceptor site, folds into three stem-loop structures: the intronic splicing silencer (ISS), the exonic splicing enhancer (ESE), and the exonic splicing silencer (ESSE3). The solution structure of the intronic splicing silencer and its interaction with the host protein hnRNPA1 give insight into specific recognition.[30]
However, adding to the complexity of alternative splicing, the effects of regulatory factors are often position-dependent. For example, a splicing factor that serves as a splicing activator when bound to an intronic enhancer element may serve as a repressor when bound to its splicing element in the context of an exon, and vice versa.[31] In addition to the position-dependent effects of enhancer and silencer elements, the location of the branchpoint (i.e., distance upstream of the nearest 3' acceptor site) also affects splicing.[8] The secondary structure of the pre-mRNA transcript also plays a role in regulating splicing, such as by bringing together splicing elements or by masking a sequence that would otherwise serve as a binding element for a splicing factor.[32][33]
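To illustrate how exon skipping alone multiplies the number of possible transcripts, the following sketch enumerates every isoform obtainable from a gene in which some exons are optional ("cassette" exons). The function name `exon_skipping_isoforms` and the example exons are hypothetical; regulation, exon extension and intron retention are not modelled.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, cassette):
    """List all isoforms produced by skipping any subset of cassette exons.

    exons is the ordered list of exon sequences; cassette holds the indices
    of exons that may be skipped. Exon order is always preserved.
    """
    optional = sorted(cassette)
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            kept = [seq for i, seq in enumerate(exons) if i not in skipped]
            isoforms.append("".join(kept))
    return isoforms


if __name__ == "__main__":
    # Made-up exons; exons 1 and 2 are treated as cassette exons.
    for isoform in exon_skipping_isoforms(["AUG", "GCC", "UUU", "UAA"], {1, 2}):
        print(isoform)
```

With n independent cassette exons this yields 2^n isoforms, which gives a sense of why a single gene can encode many distinct products.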
Pre-mRNA splicing takes place throughout the nucleus, and once mature mRNA is generated, it is transported to the cytoplasm for translation. In both plant and animal cells, nuclear speckles are regions with high concentrations of splicing factors. These speckles were once thought to be mere storage centers for splicing factors. However, it is now understood that nuclear speckles help concentrate splicing factors near genes that are physically located close to them. Genes located farther from speckles can still be transcribed and spliced, but their splicing is less efficient than that of genes closer to speckles. Cells can vary the genomic positions of genes relative to nuclear speckles as a mechanism to modulate the expression of genes via splicing.[34]
Role of splicing/alternative splicing in HIV-integration
DNA damage affects splicing factors by altering their post-translational modification, localization, expression and activity.[36] Furthermore, DNA damage often disrupts splicing by interfering with its coupling to transcription. DNA damage also has an impact on the splicing and alternative splicing of genes intimately associated with DNA repair.[36] For instance, DNA damage modulates the alternative splicing of the DNA repair genes Brca1 and Ercc1.
Splicing events can be experimentally altered[37][38] by binding steric-blocking antisense oligos, such as Morpholinos or peptide nucleic acids, to snRNP binding sites, to the branchpoint nucleotide that closes the lariat,[39] or to splice-regulatory element binding sites.[40]
The use of antisense oligonucleotides to modulate splicing has shown great promise as a therapeutic strategy for a variety of genetic diseases caused by splicing defects.[41]
Recent studies have shown that RNA splicing can be regulated by a variety of epigenetic modifications, including DNA methylation and histone modifications.[42]
It has been suggested that one third of all disease-causing mutations impact on splicing.[31] Common errors include:
Mutation of a splice site resulting in loss of function of that site. Results in exposure of a premature stop codon, loss of an exon, or inclusion of an intron.
Mutation of a splice site reducing specificity. May result in variation in the splice location, causing insertion or deletion of amino acids, or most likely, a disruption of the reading frame.
Displacement of a splice site, leading to inclusion or exclusion of more RNA than expected, resulting in longer or shorter exons.
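Whether a shifted splice site inserts or deletes amino acids, or instead disrupts the reading frame, comes down to whether the change in exon length is a multiple of three. A minimal sketch of that arithmetic follows; the function name `frame_effect` is hypothetical, and the check ignores other consequences such as a premature stop codon arising within the inserted sequence.

```python
def frame_effect(length_change):
    """Classify a change in coding length (in nucleotides).

    Positive values mean extra RNA was included, negative values mean RNA
    was excluded. Only multiples of three preserve the reading frame.
    """
    if length_change == 0:
        return "unchanged"
    if length_change % 3 != 0:
        return "frameshift"
    return "in-frame insertion" if length_change > 0 else "in-frame deletion"


if __name__ == "__main__":
    for delta in (6, -9, 4, -1):
        print(delta, frame_effect(delta))
```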
Although many splicing errors are safeguarded by a cellular quality control mechanism termed nonsense-mediated mRNA decay (NMD),[43] a number of splicing-related diseases also exist, as suggested above.[44]
Allelic differences in mRNA splicing are likely to be a common and important source of phenotypic diversity at the molecular level, in addition to their contribution to genetic disease susceptibility. Indeed, genome-wide studies in humans have identified a range of genes that are subject to allele-specific splicing.
In plants, variation for flooding stress tolerance correlated with stress-induced alternative splicing of transcripts associated with gluconeogenesis and other processes.[45]
In addition to RNA, proteins can undergo splicing. Although the biomolecular mechanisms are different, the principle is the same: parts of the protein, called inteins instead of introns, are removed. The remaining parts, called exteins instead of exons, are fused together. Protein splicing has been observed in a wide range of organisms, including bacteria, archaea, plants, yeast and humans.[46]
The existence of backsplicing was first suggested in 2012.[47] This backsplicing explains the genesis of circular RNAs resulting from the exact junction between the 3' boundary of an exon with the 5' boundary of an exon located upstream.[48] In these exonic circular RNAs, the junction is a classic 3'-5' link.
The exclusion of intronic sequences during splicing can also leave traces, in the form of circular RNAs.[49] In some cases, the intronic lariat is not destroyed and the circular part remains as a lariat-derived circRNA.[50] In these lariat-derived circular RNAs, the junction is a 2'-5' link.
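The backsplice junction described above joins sequence in an order that never occurs in the linear transcript, which is what makes it recognisable. The sketch below builds the circular RNA sequence and its junction from a list of exons; the function name `backsplice_junction`, the `flank` parameter and the example exons are hypothetical, chosen only for illustration.

```python
def backsplice_junction(exons, donor_exon, acceptor_exon, flank=5):
    """Return (circular RNA sequence, junction sequence) for a backsplice.

    The 3' boundary of exons[donor_exon] is joined to the 5' boundary of
    exons[acceptor_exon], which must be the same exon or one upstream.
    The circular sequence is written starting at the acceptor exon.
    """
    if acceptor_exon > donor_exon:
        raise ValueError("acceptor exon must not lie downstream of the donor exon")
    circle = "".join(exons[acceptor_exon:donor_exon + 1])
    junction = exons[donor_exon][-flank:] + exons[acceptor_exon][:flank]
    return circle, junction


if __name__ == "__main__":
    exons = ["AAAAA", "CCGGU", "UUACG", "GGGGG"]  # made-up exons
    circle, junction = backsplice_junction(exons, donor_exon=2,
                                           acceptor_exon=1, flank=3)
    linear = "".join(exons)
    print(circle, junction, junction in linear)
```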
Roy SW, Gilbert W (March 2006). "The evolution of spliceosomal introns: patterns, puzzles and progress". Nature Reviews. Genetics. 7 (3): 211–221. doi:10.1038/nrg1807. PMID 16485020. S2CID 33672491.
Matlin AJ, Clark F, Smith CW (May 2005). "Understanding alternative splicing: towards a cellular code". Nature Reviews. Molecular Cell Biology. 6 (5): 386–398. doi:10.1038/nrm1645. PMID 15956978. S2CID 14883495.
Cheng Z, Menees TM (December 2011). "RNA splicing and debranching viewed through analysis of RNA lariats". Molecular Genetics and Genomics. 286 (5–6): 395–410. doi:10.1007/s00438-011-0635-y. PMID 22065066. S2CID 846297.
Patel AA, Steitz JA (December 2003). "Splicing double: insights from the second spliceosome". Nature Reviews. Molecular Cell Biology. 4 (12): 960–970. doi:10.1038/nrm1259. PMID 14685174. S2CID 21816910.
Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ (December 2008). "Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing". Nature Genetics. 40 (12): 1413–1415. doi:10.1038/ng.259. PMID 18978789. S2CID 9228930.
Morcos PA (June 2007). "Achieving targeted and quantifiable alteration of mRNA splicing with Morpholino oligos". Biochemical and Biophysical Research Communications. 358 (2): 521–527. doi:10.1016/j.bbrc.2007.04.172. PMID 17493584.
Salzman J, Gawad C, Wang PL, et al. (2012). "Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types". PLoS One. 7 (2): e30733.
Jeck WR, Sorrentino JA, Wang K, et al. (2013). "Circular RNAs are abundant, conserved, and associated with ALU repeats". RNA. 19 (2): 141–157.
Zhang Y, Zhang XO, Chen T, et al. (2013). "Circular intronic long noncoding RNAs". Molecular Cell. 51 (6): 792–806.
Talhouarne GJ, Gall JG (2014). "Lariat intronic RNAs in the cytoplasm of Xenopus tropicalis oocytes". RNA. 20 (9): 1476–1487.