Cis-regulatory elements (CREs) or cis-regulatory modules (CRMs) are regions of non-coding DNA which regulate the transcription of neighboring genes. CREs are vital components of genetic regulatory networks, which in turn control morphogenesis, the development of anatomy, and other aspects of embryonic development, studied in evolutionary developmental biology.
CREs are found in the vicinity of the genes that they regulate. CREs typically regulate gene transcription by binding to transcription factors. A single transcription factor may bind to many CREs, and hence control the expression of many genes (pleiotropy). The Latin prefix cis means "on this side", i.e. on the same molecule of DNA as the gene(s) to be transcribed.
CRMs are stretches of DNA, usually 100–1000 DNA base pairs in length,[1] where a number of transcription factors can bind and regulate the expression and transcription rates of nearby genes. They are labeled as cis because they are typically located on the same DNA molecule as the genes they control, as opposed to trans, which refers to effects on genes not located on the same molecule or farther away, such as those of transcription factors.[1] One cis-regulatory element can regulate several genes,[2] and conversely, one gene can have several cis-regulatory modules.[3] Cis-regulatory modules carry out their function by integrating the active transcription factors and the associated co-factors at a specific time and place in the cell, where this information is read and an output is given.[4]
CREs are often, but not always, upstream of the transcription start site. CREs contrast with trans-regulatory elements (TREs). TREs code for transcription factors.[citation needed]
The genome of an organism contains anywhere from a few hundred to thousands of different genes, each encoding one or more products. For numerous reasons, including organizational maintenance, energy conservation, and generating phenotypic variance, it is important that genes are only expressed when they are needed. The most efficient way for an organism to regulate gene expression is at the transcriptional level. CREs function to control transcription by acting nearby or within a gene. The most well characterized types of CREs are enhancers and promoters. Both of these sequence elements are structural regions of DNA that serve as transcriptional regulators.[citation needed]
Cis-regulatory modules are one of several types of functional regulatory elements. Regulatory elements are binding sites for transcription factors, which are involved in gene regulation.[1] Cis-regulatory modules perform a large amount of developmental information processing.[1] Cis-regulatory modules are non-random clusters at their specified target site that contain transcription factor binding sites.[1]
The original definition presented cis-regulatory modules as enhancers of cis-acting DNA, which increased the rate of transcription from a linked promoter.[4] However, this definition has changed to define cis-regulatory modules as a DNA sequence with transcription factor binding sites which are clustered into modular structures, including, but not limited to, locus control regions, promoters, enhancers, silencers, boundary control elements and other modulators.[4]
Cis-regulatory modules can be divided into three classes: enhancers, which regulate gene expression positively;[1] insulators, which work indirectly by interacting with other nearby cis-regulatory modules;[1] and silencers, which turn off expression of genes.[1]
The design of cis-regulatory modules is such that transcription factors and epigenetic modifications serve as inputs, and the output of the module is the command given to the transcription machinery, which in turn determines the rate of gene transcription or whether it is turned on or off.[1] There are two types of transcription factor inputs: those that determine when the target gene is to be expressed and those that serve as functional drivers, which come into play only in specific situations during development.[1] These inputs can come from different time points, can represent different signal ligands, or can come from different domains or lineages of cells. However, much remains unknown.[citation needed]
Additionally, the regulation of chromatin structure and nuclear organization also plays a role in determining and controlling the function of cis-regulatory modules.[4] Thus gene-regulation functions (GRFs) provide a unique characteristic of a cis-regulatory module (CRM), relating the concentrations of transcription factors (input) to the promoter activities (output). Predicting GRFs remains an unsolved challenge. In general, gene-regulation functions do not use Boolean logic,[2] although in some cases the Boolean approximation is still very useful.[citation needed]
Under the assumption of Boolean logic, the principles guiding the operation of these modules include the design of the module, which determines the regulatory function. In relation to development, these modules can generate both positive and negative outputs. The output of each module is a product of the various operations performed on it. Common operations include the OR gate, in which an output is given when either input is given,[3] and the AND gate, in which two different regulatory factors are both necessary for a positive output.[1] In the "toggle switch" design, when the signal ligand is absent while the transcription factor is present, the transcription factor acts as a dominant repressor. However, once the signal ligand is present, the transcription factor's role as repressor is eliminated and transcription can occur.[1]
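The three designs described above can be written as small truth functions. The following Python sketch is illustrative only: the function names are hypothetical, and the behaviour of the toggle switch when the transcription factor is absent is an assumption (the module is simply treated as not repressed), since the text does not specify that case.

```python
def or_gate(factor_a_bound: bool, factor_b_bound: bool) -> bool:
    """OR design: either regulatory input alone gives an output."""
    return factor_a_bound or factor_b_bound


def and_gate(factor_a_bound: bool, factor_b_bound: bool) -> bool:
    """AND design: both regulatory factors are needed for a positive output."""
    return factor_a_bound and factor_b_bound


def toggle_switch(factor_present: bool, ligand_present: bool) -> bool:
    """Toggle-switch design.

    Without the signal ligand, the transcription factor acts as a dominant
    repressor. With the ligand, repression is lifted and transcription can
    occur. Assumption: with no factor at all, nothing represses the module.
    """
    repressed = factor_present and not ligand_present
    return not repressed


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, or_gate(a, b), and_gate(a, b), toggle_switch(a, b))
```

In this sketch the toggle switch returns whether transcription is permitted, not whether it actually happens; other inputs to the module would still have to drive expression.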
Other Boolean logic operations can occur as well, such as sequence-specific transcriptional repressors, which lead to an output of zero when they bind to the cis-regulatory module. Besides the influence of the different logic operations, the output of a cis-regulatory module will also be influenced by prior events.[1] Cis-regulatory modules must interact with other regulatory elements. For the most part, even when the cis-regulatory modules of a gene overlap in function, the modules' inputs and outputs tend not to be the same.[1]
While the assumption of Boolean logic is important for systems biology, detailed studies show that in general the logic of gene regulation is not Boolean.[2] This means, for example, that in the case of a cis-regulatory module regulated by two transcription factors, experimentally determined gene-regulation functions cannot be described by the 16 possible Boolean functions of two variables. Non-Boolean extensions of the gene-regulatory logic have been proposed to correct for this issue.[2]
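The count of 16 follows from there being four input combinations for two on/off factors, each of which can map to 0 or 1 (2^4 = 16). The Python sketch below enumerates these truth tables and checks whether a set of promoter activities, normalised to the range 0 to 1, sits close to any of them. The activity values and the tolerance are invented for illustration and are not measurements; the function names are hypothetical.

```python
from itertools import product

# The four on/off combinations of two transcription factors.
INPUTS = list(product([0, 1], repeat=2))


def all_boolean_functions():
    """Return every two-input Boolean function as a truth table (dict)."""
    return [dict(zip(INPUTS, outputs))
            for outputs in product([0, 1], repeat=len(INPUTS))]


def matches_some_boolean_function(activity, tolerance=0.1):
    """True if the normalised activities lie within `tolerance` of a truth table."""
    return any(
        all(abs(activity[pair] - table[pair]) <= tolerance for pair in INPUTS)
        for table in all_boolean_functions()
    )


# Invented example values (not experimental data).
and_like = {(0, 0): 0.02, (0, 1): 0.05, (1, 0): 0.03, (1, 1): 0.97}
graded = {(0, 0): 0.0, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 1.0}

if __name__ == "__main__":
    print(len(all_boolean_functions()))
    print(matches_some_boolean_function(and_like))
    print(matches_some_boolean_function(graded))
```

The second example has intermediate outputs for single-factor inputs, so no Boolean truth table fits it. This is the kind of behaviour the non-Boolean extensions are meant to capture.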
Cis-regulatory modules can be characterized by the information processing that they encode and the organization of their transcription factor binding sites. Additionally, cis-regulatory modules are also characterized by the way they affect the probability, proportion, and rate of transcription.[4] Highly cooperative and coordinated cis-regulatory modules are classified as enhanceosomes.[4] The architecture and the arrangement of the transcription factor binding sites are critical because disruption of the arrangement could cancel out the function.[4] Functionally flexible cis-regulatory modules are called billboards. Their transcriptional output is the summation effect of the bound transcription factors.[4] Enhancers affect the probability of a gene being activated, but have little or no effect on rate.[4] The binary response model acts like an on/off switch for transcription. This model will increase or decrease the number of cells that transcribe a gene, but it does not affect the rate of transcription.[4] The rheostatic response model describes cis-regulatory modules as regulators of the initiation rate of transcription of their associated gene.[4]
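The difference between the binary and rheostatic response models can be made concrete with a toy calculation over a population of cells. The sketch below is a simplification under stated assumptions: the class, the function names, the fold-change parameter and the numbers are all hypothetical, and units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class PopulationOutput:
    active_fraction: float  # share of cells transcribing the gene
    rate_per_cell: float    # transcription rate in each active cell (arbitrary units)

    @property
    def total(self) -> float:
        """Average output per cell across the whole population."""
        return self.active_fraction * self.rate_per_cell


def binary_response(baseline: PopulationOutput, fold_change: float) -> PopulationOutput:
    """Binary model: more (or fewer) cells switch on; the rate is unchanged."""
    fraction = min(1.0, baseline.active_fraction * fold_change)
    return PopulationOutput(fraction, baseline.rate_per_cell)


def rheostatic_response(baseline: PopulationOutput, fold_change: float) -> PopulationOutput:
    """Rheostatic model: the same cells are on; the initiation rate changes."""
    return PopulationOutput(baseline.active_fraction,
                            baseline.rate_per_cell * fold_change)


if __name__ == "__main__":
    baseline = PopulationOutput(active_fraction=0.2, rate_per_cell=10.0)
    print(binary_response(baseline, 2.0))
    print(rheostatic_response(baseline, 2.0))
```

In this sketch both models can yield the same population-wide total while differing in how many cells are active and how fast each one transcribes.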
Promoters are CREs consisting of relatively short sequences of DNA which include the site where transcription is initiated and the region approximately 35 base pairs (bp) upstream or downstream from the initiation site.[5] In eukaryotes, promoters usually have the following four components: the TATA box, a TFIIB recognition site, an initiator, and the downstream core promoter element.[5] It has been found that a single gene can contain multiple promoter sites.[6] In order to initiate transcription of the downstream gene, a host of DNA-binding proteins called transcription factors (TFs) must bind sequentially to this region.[5] Only once this region has been bound with the appropriate set of TFs, and in the proper order, can RNA polymerase bind and begin transcribing the gene.
Enhancers are CREs that influence (enhance) the transcription of genes on the same molecule of DNA and can be found upstream, downstream, within the introns, or even relatively far away from the gene they regulate. Multiple enhancers can act in a coordinated fashion to regulate transcription of one gene.[7] A number of genome-wide sequencing projects have revealed that enhancers are often transcribed to long non-coding RNA (lncRNA) or enhancer RNA (eRNA), whose changes in levels frequently correlate with those of the target gene mRNA.[8]
Silencers are CREs that can bind transcription regulation factors (proteins) called repressors, thereby preventing transcription of a gene. The term "silencer" can also refer to a region in the 3' untranslated region of messenger RNA that binds proteins which suppress translation of that mRNA molecule, but this usage is distinct from its use in describing a CRE.[citation needed]
Operators are CREs in prokaryotes and some eukaryotes that exist within operons, where they can bind proteins called repressors to affect transcription.[citation needed]
CREs have an important evolutionary role. The coding regions of genes are often well conserved among organisms; yet different organisms display marked phenotypic diversity. It has been found that polymorphisms occurring within non-coding sequences have a profound effect on phenotype by altering gene expression.[7] Mutations arising within a CRE can generate expression variance by changing the way TFs bind. Tighter or looser binding of regulatory proteins will lead to up- or down-regulated transcription.
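One way to picture how a mutation in a CRE changes TF binding is to score a site against a position frequency matrix, a common approximation that is an assumption here and is not described in the text. The matrix, the background frequency and the sequences below are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical base frequencies at each position of a 4-bp binding site.
SITE_MODEL = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(site: str) -> float:
    """Sum of per-position log2 odds; higher means a better match to the model."""
    if len(site) != len(SITE_MODEL):
        raise ValueError("site length must match the model length")
    return sum(math.log2(SITE_MODEL[i][base] / BACKGROUND)
               for i, base in enumerate(site.upper()))


if __name__ == "__main__":
    print(log_odds_score("ACGT"))  # matches the preferred base at every position
    print(log_odds_score("ACGA"))  # single-base change at the last position
```

A lower score for the mutated site is read in this sketch as looser binding. Relating a score to actual transcription levels would need experimental calibration.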
The function of a gene regulatory network depends on the architecture of the nodes, whose function is dependent on the multiple cis-regulatory modules.[1] The layout of cis-regulatory modules can provide enough information to generate spatial and temporal patterns of gene expression.[1] During development, each domain of gene expression, where each domain represents a different spatial region of the embryo, will be under the control of different cis-regulatory modules.[1] The design of regulatory modules helps in producing feedback, feed-forward, and cross-regulatory loops.[9]
Cis-regulatory modules can regulate their target genes over large distances. Several models have been proposed to describe the way that these modules may communicate with their target gene promoter.[4] These include the DNA scanning model, the DNA sequence looping model and the facilitated tracking model. In the DNA scanning model, the transcription factor and cofactor complex forms at the cis-regulatory module and then moves along the DNA sequence until it finds the target gene promoter.[4] In the looping model, the transcription factor binds to the cis-regulatory module, which then causes the looping of the DNA sequence and allows for the interaction with the target gene promoter. The complex of transcription factor and cis-regulatory module slowly loops the DNA sequence towards the target promoter and forms a stable looped configuration.[4] The facilitated tracking model combines parts of the two previous models.
Besides experimentally determining CRMs, there are various bioinformatics algorithms for predicting them. Most algorithms try to search for significant combinations of transcription factor binding sites (DNA binding sites) in promoter sequences of co-expressed genes.[10] More advanced methods combine the search for significant motifs with correlation in gene expression datasets between transcription factors and target genes.[11] Both methods have been implemented, for example, in ModuleMaster. Other programs created for the identification and prediction of cis-regulatory modules include:
INSECT 2.0[12] is a web server that allows genome-wide searches for cis-regulatory modules. The program relies on the definition of strict restrictions among the transcription factor binding sites (TFBSs) that compose the module in order to decrease the false-positive rate. INSECT is designed to be user-friendly: it allows automatic retrieval of sequences and offers several visualizations and links to third-party tools to help users find the instances that are more likely to be true regulatory sites. The INSECT 2.0 algorithm and the theory behind it were previously published.[13]
Stubb uses hidden Markov models to identify statistically significant clusters of transcription factor combinations. It also uses a second related genome to improve the prediction accuracy of the model.[14]
Bayesian Networks use an algorithm that combines site predictions and tissue-specific expression data for transcription factors and target genes of interest. This model also uses regression trees to depict the relationship between the identified cis-regulatory module and the possible binding set of transcription factors.[15]
CRÈME examines clusters of target sites for transcription factors of interest. This program uses a database of confirmed transcription factor binding sites that were annotated across the human genome. A search algorithm is applied to the data set to identify possible combinations of transcription factors which have binding sites that are close to the promoter of the gene set of interest. The possible cis-regulatory modules are then statistically analyzed and the significant combinations are graphically represented.[16]
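The tools listed above share the idea of looking for clusters of transcription factor binding sites. The Python sketch below shows only that basic idea as a sliding-window search over predicted site positions. It is not the algorithm of INSECT, Stubb, CRÈME or any other named program: the function name, the window size and the minimum site count are hypothetical, and the statistical significance testing that real tools perform is omitted.

```python
def find_site_clusters(site_positions, window=200, min_sites=3):
    """Group predicted binding-site positions (in bp) into candidate clusters.

    A cluster is a run of at least `min_sites` sites whose positions all lie
    within `window` bp of the first site in the run. Returns a list of
    (first_position, last_position, site_count) tuples.
    """
    positions = sorted(site_positions)
    clusters = []
    i = 0
    while i < len(positions):
        j = i
        # Extend the run while sites stay within the window of the first site.
        while j < len(positions) and positions[j] - positions[i] <= window:
            j += 1
        if j - i >= min_sites:
            clusters.append((positions[i], positions[j - 1], j - i))
            i = j  # continue the search after this cluster
        else:
            i += 1
    return clusters


if __name__ == "__main__":
    # Invented positions for illustration, not real annotations.
    predicted_sites = [100, 150, 220, 1000, 1500, 1540, 1600, 1650]
    print(find_site_clusters(predicted_sites))
```

With the invented positions above, the isolated site at 1000 bp is not reported, while the two dense runs are returned as candidate modules.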
Active cis-regulatory modules in a genomic sequence have been difficult to identify. Problems in identification arise because scientists often have only a small set of known transcription factors, which makes it harder to identify statistically significant clusters of transcription factor binding sites.[14] Additionally, high costs limit the use of large whole-genome tiling arrays.[15]
An example of a cis-acting regulatory sequence is the operator in the lac operon. This DNA sequence is bound by the lac repressor, which, in turn, prevents transcription of the adjacent genes on the same DNA molecule. The lac operator is, thus, considered to "act in cis" on the regulation of the nearby genes. The operator itself does not code for any protein or RNA.
In contrast, trans-regulatory elements are diffusible factors, usually proteins, that may modify the expression of genes distant from the gene that was originally transcribed to create them. For example, a transcription factor that regulates a gene on chromosome 6 might itself have been transcribed from a gene on chromosome 11. The term trans-regulatory is constructed from the Latin root trans, which means "across from".
There are cis-regulatory and trans-regulatory elements. Cis-regulatory elements are often binding sites for one or more trans-acting factors.
To summarize, cis-regulatory elements are present on the same molecule of DNA as the gene they regulate whereas trans-regulatory elements can regulate genes distant from the gene from which they were transcribed.
Type | Abbr. | Function | Distribution | Ref. |
---|---|---|---|---|
Frameshift element | | Regulates alternative frame use with messenger RNAs | Archaea, bacteria, Eukaryota, RNA viruses | [17][18][19] |
Internal ribosome entry site | IRES | Initiates translation in the middle of a messenger RNA | RNA virus, Eukaryota | [20] |
Iron response element | IRE | Regulates the expression of iron-associated genes | Eukaryota | [21] |
Leader peptide | | Regulates transcription of associated genes and/or operons | Bacteria | [22] |
Riboswitch | | Gene regulation | Bacteria, Eukaryota | [23] |
RNA thermometer | | Gene regulation | Bacteria | [24] |
Selenocysteine insertion sequence | SECIS | Directs the cell to translate UGA stop-codons as selenocysteines | Metazoa | [25] |