The transcriptome is the set of all RNA molecules (transcripts) in a cell or a population of cells. It includes all of the functional RNA molecules and all other transcripts that may arise by spurious transcription or transcription of non-functional regions such as pseudogenes or virus fragments. A major goal of modern molecular biology is to determine which transcripts are functional and which ones are junk RNA.
The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription. The functional part of the transcriptome is dynamic: it changes with cell type, developmental stage, environment, and stimuli, and therefore represents the active gene expression state rather than the static DNA sequence (genome).
Eukaryotic transcriptomes tend to be more complex than bacterial transcriptomes, and the transcriptomes of multicellular eukaryotes are even more complex than those of unicellular eukaryotes.
The word transcriptome is a portmanteau of the words transcript and genome. It appeared along with other neologisms formed using the suffixes -ome and -omics to denote all studies conducted on a genome-wide scale in the fields of life sciences and technology. As such, transcriptome and transcriptomics were among the first words to emerge along with genome and proteome.[1] The first study to present a cDNA library collection for silk moth mRNA was published in 1979.[2] The first seminal study to mention and investigate the transcriptome of an organism was published in 1997, and it described 60,633 transcripts expressed in S. cerevisiae using serial analysis of gene expression (SAGE).[3] With the rise of high-throughput technologies and bioinformatics and the subsequent increase in computational power, it became increasingly efficient and easy to characterize and analyze enormous amounts of data.[1] Attempts to characterize the transcriptome became more prominent with the advent of automated DNA sequencing during the 1980s.[4] During the 1990s, expressed sequence tag sequencing was used to identify genes and their fragments.[5] This was followed by techniques such as serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE), and massively parallel signature sequencing (MPSS).
The transcriptome encompasses all the ribonucleic acid (RNA) transcripts present in a given organism or experimental sample.[6] The functional component of the transcriptome includes RNAs that carry the genetic information responsible for converting DNA into an organism's phenotype. A gene gives rise to a single-stranded RNA molecule through a molecular process known as transcription; this RNA is complementary to the strand of DNA it originated from.[4] The enzyme RNA polymerase attaches to the template DNA strand and catalyzes the addition of ribonucleotides to the 3' end of the growing sequence of the RNA transcript.[7]
In order to initiate its function, RNA polymerase needs to recognize a promoter sequence, located near the transcription start site that defines the beginning of the gene. This process is usually mediated and regulated by transcription factors. Transcription ends at a terminator site that defines the other end of the gene. The terminator site is often identified by termination sequences.
Almost all functional transcripts are derived from known genes. The only exceptions are a small number of transcripts that might play a direct role in regulating gene expression near the promoters of known genes. (See Enhancer RNA.)
Genes occupy most of prokaryotic genomes, so most of their genomes are transcribed. Many eukaryotic genomes are very large and known genes may take up only a fraction of the genome. In mammals, for example, known genes only account for 40-50% of the genome.[8] Nevertheless, identified transcripts often map to a much larger fraction of the genome, suggesting that the transcriptome contains spurious transcripts that do not come from genes. Some of these transcripts are known to be non-functional because they map to transcribed pseudogenes or degenerate transposons and viruses. Others map to unidentified regions of the genome that may be junk DNA.
Spurious transcription is very common in eukaryotes, especially those with large genomes that might contain a lot of junk DNA.[9][10][11][12] Some scientists claim that if a transcript has not been assigned to a known gene then the default assumption must be that it is junk RNA until it has been shown to be functional.[9][13] This would mean that much of the transcriptome in species with large genomes is probably junk RNA. (See Non-coding RNA.)
The transcriptome includes the transcripts of protein-coding genes (mRNA plus introns) as well as the transcripts of non-coding genes (functional RNAs plus introns).
In the human genome, all genes get transcribed into RNA because that is how the molecular gene is defined. (See Gene.) The transcriptome consists of coding regions of mRNA plus non-coding untranslated regions (UTRs), introns, non-coding RNAs, and spurious non-functional transcripts.
Several factors render the content of the transcriptome difficult to establish. These include alternative splicing, RNA editing and alternative transcription, among others.[15] Additionally, transcriptome techniques capture the transcription occurring in a sample at a specific time point, although the content of the transcriptome can change during differentiation.[4] The main aims of transcriptomics are to "catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions".[16]
The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation. The study of transcriptomics (which includes expression profiling, splice variant analysis, etc.) examines the expression level of RNAs in a given cell population, often focusing on mRNA, but sometimes including others such as tRNAs and sRNAs.
Transcriptomics is the quantitative science that encompasses the assignment of a list of strings ("reads") to objects ("transcripts" in the genome). To calculate the expression strength, the density of reads corresponding to each object is counted.[17] Initially, transcriptomes were analyzed and studied using expressed sequence tag libraries and serial and cap analysis of gene expression (SAGE and CAGE).
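The idea of a read density per transcript can be illustrated with a short sketch. It assumes reads have already been assigned to transcripts, and it uses transcripts per million (TPM) as the normalization; TPM is one common convention chosen here for illustration, not a method named in the text above. The function names and the example transcript identifiers are hypothetical.

```python
def reads_per_kilobase(counts, lengths_bp):
    """Read density: reads assigned to a transcript per kilobase of its length."""
    return {tx: counts[tx] / (lengths_bp[tx] / 1000) for tx in counts}


def tpm(counts, lengths_bp):
    """Scale read densities so that they sum to one million across transcripts."""
    rpk = reads_per_kilobase(counts, lengths_bp)
    total = sum(rpk.values())
    if total == 0:
        # No reads at all: every transcript has zero expression.
        return {tx: 0.0 for tx in rpk}
    return {tx: value / total * 1_000_000 for tx, value in rpk.items()}


# Hypothetical example: two transcripts with equal read counts but different lengths.
example_counts = {"tx_a": 100, "tx_b": 100, "tx_c": 0}
example_lengths = {"tx_a": 1000, "tx_b": 2000, "tx_c": 500}
for name, value in tpm(example_counts, example_lengths).items():
    print(name, round(value, 1))
```

Dividing by length before scaling means that a long transcript does not appear more highly expressed merely because it yields more fragments.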
Currently, the two main transcriptomics techniques are DNA microarrays and RNA-Seq. Both techniques require RNA isolation through RNA extraction techniques, followed by its separation from other cellular components and enrichment of mRNA.[18][19]
There are two general methods of inferring transcriptome sequences. One approach maps sequence reads onto a reference genome, either of the organism itself (whose transcriptome is being studied) or of a closely related species. The other approach, de novo transcriptome assembly, uses software to infer transcripts directly from short sequence reads and is used in organisms with genomes that are not sequenced.[20]
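The reference-based approach can be sketched in a highly simplified form. The example below indexes every k-mer of a reference sequence and reports the positions where a read matches exactly. This is a toy illustration under strong assumptions (exact matches only, no splicing, no sequencing errors, a single strand); production aligners are far more elaborate, and the function names and sequences here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for start in range(len(reference) - k + 1):
        index[reference[start:start + k]].append(start)
    return index


def map_read(read, reference, index, k):
    """Return reference positions where the whole read matches exactly."""
    if len(read) < k:
        return []
    hits = []
    # Use the first k bases of the read as a seed, then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Hypothetical reference and reads.
reference = "ACGTACGTTAGC"
k = 4
index = build_kmer_index(reference, k)
for read in ["ACGTT", "ACGT", "GGGG"]:
    print(read, map_read(read, reference, index, k))
```

A read that maps to several positions, such as the second one here, illustrates why repeated sequence makes assignment of reads to transcripts ambiguous.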

The first transcriptome studies were based on microarray techniques (also known as DNA chips). Microarrays consist of thin glass layers with spots on which oligonucleotides, known as "probes", are arrayed; each spot contains a known DNA sequence.[21]
When performing microarray analyses, mRNA is collected from a control and an experimental sample, the latter usually representative of a disease. The RNA of interest is converted to cDNA to increase its stability and marked with fluorophores of two colors, usually green and red, for the two groups. The cDNA is spread onto the surface of the microarray, where it hybridizes with oligonucleotides on the chip, and a laser is used to scan it. The fluorescence intensity on each spot of the microarray corresponds to the level of gene expression and, based on the color of the fluorophores selected, it can be determined which of the samples exhibits higher levels of the mRNA of interest.[5]
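The comparison of the two fluorescence channels is commonly summarized as a log2 ratio per spot. The sketch below assumes background-corrected intensities are already available for each probe; the pseudocount, the function name and the probe names are illustrative assumptions rather than part of any specific microarray platform.

```python
import math


def log2_ratios(experimental, control, pseudocount=1.0):
    """Per-probe log2(experimental / control) fluorescence intensity.

    Positive values indicate higher signal in the experimental sample,
    negative values higher signal in the control. The pseudocount (an
    assumption of this sketch) avoids division by zero on empty spots.
    """
    return {
        probe: math.log2(
            (experimental[probe] + pseudocount) / (control[probe] + pseudocount)
        )
        for probe in experimental
    }


# Hypothetical intensities for two probes.
experimental = {"probe_1": 7.0, "probe_2": 1.0}
control = {"probe_1": 1.0, "probe_2": 7.0}
print(log2_ratios(experimental, control))
```

Using a logarithm makes a fourfold increase and a fourfold decrease symmetric around zero, which is why the ratio is rarely reported on a linear scale.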
One microarray usually contains enough oligonucleotides to represent all known genes; however, data obtained using microarrays does not provide information about unknown genes. During the 2010s, microarrays were almost completely replaced by next-generation techniques that are based on DNA sequencing.
RNA sequencing is a next-generation sequencing technology; as such it requires only a small amount of RNA and no previous knowledge of the genome.[1] It allows for both qualitative and quantitative analysis of RNA transcripts, the former allowing discovery of new transcripts and the latter a measure of relative quantities for transcripts in a sample.[14]
The three main steps of sequencing the transcriptome of any biological sample are RNA purification, the synthesis of an RNA or cDNA library, and sequencing of the library.[14] The RNA purification process is different for short and long RNAs.[14] This step is usually followed by an assessment of RNA quality, with the purpose of avoiding contaminants such as DNA or technical contaminants related to sample processing. RNA quality is measured using UV spectrometry, with an absorbance peak at 260 nm.[22] RNA integrity can also be analyzed quantitatively by comparing the ratio and intensity of 28S RNA to 18S RNA, reported in the RNA Integrity Number (RIN) score.[22] Since mRNA is the species of interest and it represents only 3% of the total RNA content, the RNA sample should be treated to remove rRNA, tRNA and tissue-specific RNA transcripts.[22]
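The absorbance measurement can be turned into numbers with a small calculation. The sketch below uses two widely quoted conventions that are not stated in the text above and should be treated as assumptions: an absorbance of 1.0 at 260 nm corresponds to roughly 40 µg/mL of RNA in a 1 cm path length, and an A260/A280 ratio near 2.0 is usually taken to indicate RNA free of protein contamination. Function names are hypothetical.

```python
# Conventional conversion factor (assumption): 1.0 A260 unit ~ 40 ug/mL of RNA.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration from absorbance at 260 nm (1 cm path length)."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 2.0 are commonly read as clean RNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical reading from a sample diluted tenfold before measurement.
print(rna_concentration_ug_per_ml(0.5, dilution_factor=10))
print(purity_ratio(1.0, 0.5))
```

These figures describe quantity and purity only; integrity, as captured by the 28S to 18S comparison, requires a separate measurement.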
Library preparation, which aims to produce short cDNA fragments, begins with fragmentation of the RNA into pieces between 50 and 300 base pairs in length. Fragmentation can be enzymatic (RNA endonucleases), chemical (tris-magnesium salt buffer, chemical hydrolysis) or mechanical (sonication, nebulisation).[23] Reverse transcription is used to convert the RNA templates into cDNA, and three priming methods can be used to achieve it: oligo-dT priming, random primers, or ligation of special adaptor oligos.
Transcription can also be studied at the level of individual cells by single-cell transcriptomics. Single-cell RNA sequencing (scRNA-seq) is a recently developed technique that allows the analysis of the transcriptome of single cells, including bacteria.[24] With single-cell transcriptomics, subpopulations of cell types that constitute the tissue of interest are also taken into consideration.[25] This approach allows one to identify whether changes in experimental samples are due to phenotypic cellular changes as opposed to proliferation, with which a specific cell type might be overexpressed in the sample.[26] Additionally, when assessing cellular progression through differentiation, average expression profiles are only able to order cells by time rather than their stage of development and are consequently unable to show trends in gene expression levels specific to certain stages.[27] Single-cell transcriptomic techniques have been used to characterize rare cell populations such as circulating tumor cells, cancer stem cells in solid tumors, and embryonic stem cells (ESCs) in mammalian blastocysts.[28]
Although there are no standardized techniques for single-cell transcriptomics, several steps need to be undertaken. The first step includes cell isolation, which can be performed using low- and high-throughput techniques. This is followed by a qPCR step and then single-cell RNA-seq, where the RNA of interest is converted into cDNA. Newer developments in single-cell transcriptomics allow for tissue and sub-cellular localization preservation through cryo-sectioning thin slices of tissues and sequencing the transcriptome in each slice. Another technique allows the visualization of single transcripts under a microscope while preserving the spatial information of each individual cell where they are expressed.[28]
A number of organism-specific transcriptome databases have been constructed and annotated to aid in the identification of genes that are differentially expressed in distinct cell populations.
As of 2013, RNA-seq was emerging as the method of choice for measuring transcriptomes of organisms, though the older technique of DNA microarrays was still used.[16] RNA-seq measures the transcription of a specific gene by converting long RNAs into a library of cDNA fragments. The cDNA fragments are then sequenced using high-throughput sequencing technology and aligned to a reference genome or transcriptome, which is then used to create an expression profile of the genes.[16]
The number of protein-coding RNA sequences expressed by each organ varies significantly between organs, but also depends on the definitions and methodology used. In general, the brain, testes and lymphatic system show the highest activity, and the endometrium, gallbladder, seminal vesicle and smooth muscle show the lowest.[29]
Mammals
The transcriptomes of stem cells and cancer cells are of particular interest to researchers who seek to understand the processes of cellular differentiation and carcinogenesis. Transcriptomic profiling is being studied as a tool for cancer treatment, by matching genetic profiles to recommended treatments.[30][31] This demonstrates transcriptomics as a powerful tool for the development of personalized cancer treatment. A pipeline using RNA-seq or gene array data can be used to track genetic changes occurring in stem and precursor cells; it requires at least three independent gene expression datasets from the former cell type and from mature cells.[32]
Analysis of the transcriptomes of human oocytes and embryos is used to understand the molecular mechanisms and signaling pathways controlling early embryonic development, and could theoretically be a powerful tool in making proper embryo selection in in vitro fertilization.[citation needed] Analyses of the transcriptome content of the placenta in the first trimester of pregnancy following in vitro fertilization and embryo transfer (IVF-ET) revealed differences in gene expression which are associated with a higher frequency of adverse perinatal outcomes. Such insight can be used to optimize the practice.[33] Transcriptome analyses can also be used to optimize cryopreservation of oocytes, by lowering injuries associated with the process.[34]
Transcriptomics is an emerging and continually growing field in biomarker discovery for use in assessing the safety of drugs or chemical risk assessment.[35]
Transcriptomes may also be used to infer phylogenetic relationships among individuals or to detect evolutionary patterns of transcriptome conservation.[36]
Transcriptome analyses were used to discover the incidence of antisense transcription, its role in gene expression through interaction with surrounding genes, and its abundance in different chromosomes.[37] RNA-seq was also used to show how RNA isoforms, transcripts stemming from the same gene but with different structures, can produce complex phenotypes from limited genomes.[20]
Transcriptome analyses have been used to study the evolution and diversification process of plant species. In 2014, the 1000 Plant Genomes Project was completed, in which the transcriptomes of 1,124 plant species from the clades Viridiplantae, Glaucophyta and Rhodophyta were sequenced. The protein-coding sequences were subsequently compared to infer phylogenetic relationships between plants and to characterize the time of their diversification in the process of evolution.[38] Transcriptome studies have been used to characterize and quantify gene expression in mature pollen. Genes involved in cell wall metabolism and the cytoskeleton were found to be overexpressed. Transcriptome approaches also made it possible to track changes in gene expression through different developmental stages of pollen, ranging from microspore to mature pollen grains; additionally, such stages could be compared across different plant species, including Arabidopsis, rice and tobacco.[39]

Similar to other -ome based technologies, analysis of the transcriptome allows for an unbiased approach when validating hypotheses experimentally. This approach also allows for the discovery of novel mediators in signaling pathways.[17] As with other -omics based technologies, the transcriptome can be analyzed within the scope of a multiomics approach. It is complementary to metabolomics but, in contrast to proteomics, a direct association between a transcript and a metabolite cannot be established.
There are several -ome fields that can be seen as subcategories of the transcriptome. The transcriptome differs from the exome in that it includes only those RNA molecules found in a specified cell population, and usually includes the amount or concentration of each RNA molecule in addition to the molecular identities. The transcriptome also differs from the translatome, which is the set of RNAs undergoing translation.
The term meiome is used in functional genomics to describe the meiotic transcriptome, or the set of RNA transcripts produced during the process of meiosis.[40] Meiosis is a key feature of sexually reproducing eukaryotes, and involves the pairing of homologous chromosomes, synapsis and recombination. Since meiosis in most organisms occurs in a short time period, meiotic transcript profiling is difficult due to the challenge of isolating (or enriching) meiotic cells (meiocytes). As with transcriptome analyses, the meiome can be studied at a whole-genome level using large-scale transcriptomic techniques.[41] The meiome has been well characterized in mammal and yeast systems and somewhat less extensively characterized in plants.[42]
The thanatotranscriptome consists of all RNA transcripts that continue to be expressed or that start getting re-expressed in internal organs of a dead body 24–48 hours following death. These include transcripts of some genes that are inhibited after fetal development. If the thanatotranscriptome is related to the process of programmed cell death (apoptosis), it can be referred to as the apoptotic thanatotranscriptome. Analyses of the thanatotranscriptome are used in forensic medicine.[43]
eQTL mapping can be used to complement genomics with transcriptomics: it relates genetic variants at the DNA level to gene expression measures at the RNA level.[44]
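The association between a variant and an expression measure can be illustrated with an ordinary least-squares fit. The sketch assumes genotypes are coded as allele dosages (0, 1 or 2 copies of the alternative allele) and that one expression value is available per individual; real eQTL analyses also adjust for covariates and test many variant and gene pairs, which is omitted here. The function name and data are hypothetical.

```python
def eqtl_fit(dosages, expression):
    """Least-squares slope and intercept of expression regressed on allele dosage.

    The slope estimates the change in expression per additional copy of the
    alternative allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need matching lists with at least two individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation among individuals")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: expression rises by one unit per alternative allele.
dosages = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(eqtl_fit(dosages, expression))
```

A slope far from zero suggests that the variant is associated with expression of the gene, although establishing significance requires a statistical test that this sketch does not perform.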
The transcriptome can be seen as a precursor of the proteome, that is, the entire set of proteins expressed by a genome.
However, the analysis of relative mRNA expression levels can be complicated by the fact that relatively small changes in mRNA expression can produce large changes in the total amount of the corresponding protein present in the cell. One analysis method, known as gene set enrichment analysis, identifies coregulated gene networks rather than individual genes that are up- or down-regulated in different cell populations.[1]
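The core statistic behind this kind of analysis can be sketched as a running sum over a ranked gene list. The version below is the unweighted form, in which every gene in the set contributes equally; published implementations typically weight genes by their correlation with the phenotype and assess significance by permutation, neither of which is shown. The function name and gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a gene set.

    Walk down the ranked list, stepping up for genes in the set and down for
    genes outside it; return the largest deviation from zero. A positive score
    means the set is concentrated near the top of the ranking, a negative
    score near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking from most up-regulated to most down-regulated.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranking, {"g1", "g2"}))
print(enrichment_score(ranking, {"g5", "g6"}))
```

Because the score depends on where the whole set falls in the ranking, a set can be detected as enriched even when no single gene in it changes dramatically.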
Although microarray studies can reveal the relative amounts of different mRNAs in the cell, levels of mRNA are not directly proportional to the expression level of the proteins they code for.[45] The number of protein molecules synthesized using a given mRNA molecule as a template is highly dependent on translation-initiation features of the mRNA sequence; in particular, the translation initiation sequence is a key determinant in the recruiting of ribosomes for protein translation.