An E-box (enhancer box) is a DNA response element found in some eukaryotes that acts as a protein-binding site and has been found to regulate gene expression in neurons, muscles, and other tissues.[1] Its specific DNA sequence, CANNTG (where N can be any nucleotide), with a palindromic canonical sequence of CACGTG,[2] is recognized and bound by transcription factors to initiate gene transcription. Once the transcription factors bind to the promoters through the E-box, other enzymes can bind to the promoter and facilitate transcription from DNA to mRNA.
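As an illustration of the CANNTG consensus, the following minimal sketch scans a DNA string for E-box hexamers and checks whether a site is palindromic, in the sense that it equals its own reverse complement. The function names are hypothetical and chosen for this example only, and the scan covers only the strand that is passed in.

```python
import re

# CANNTG consensus. The lookahead lets overlapping sites be reported too.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_e_boxes(seq):
    """Return (start, hexamer) for every CANNTG site on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq)]


def is_palindromic(site):
    """A site is palindromic when it reads the same on both strands."""
    return site.upper() == reverse_complement(site)


print(find_e_boxes("GGCCACGTGACC"))  # [(3, 'CACGTG')]
print(is_palindromic("CACGTG"))      # True
print(is_palindromic("CACCTG"))      # False
```

Not every CANNTG site is palindromic: CACGTG and CAGCTG are, whereas a site such as CACCTG reads CAGGTG on the opposite strand.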
The E-box was discovered in a collaboration between Susumu Tonegawa's and Walter Gilbert's laboratories in 1985 as a control element in the immunoglobulin heavy-chain enhancer.[3][4] They found that a region of 140 base pairs in the tissue-specific transcriptional enhancer element was sufficient for different levels of transcription enhancement in different tissues and sequences. They suggested that proteins made by specific tissues acted on these enhancers to activate sets of genes during cell differentiation.
In 1989, David Baltimore's lab discovered the first two E-box binding proteins, E12 and E47.[5] These immunoglobulin enhancers could bind as heterodimers to proteins through bHLH domains. In 1990, another E-protein, ITF-2A (later renamed E2-2Alt), was discovered that can bind to immunoglobulin light chain enhancers.[6] Two years later, the third E-box binding protein, HEB, was discovered by screening a cDNA library from HeLa cells.[7] A splice variant of E2-2 was discovered in 1997 and was found to inhibit the promoter of a muscle-specific gene.[8]
Since then, researchers have established that the E-box affects gene transcription in several eukaryotes and found E-box binding factors that identify E-box consensus sequences.[9] In particular, several experiments have shown that the E-box is an integral part of the transcription-translation feedback loop that comprises the circadian clock.
E-box binding proteins play a major role in regulating transcriptional activity. These proteins usually contain the basic helix-loop-helix (bHLH) protein structural motif, which allows them to bind as dimers.[10] This motif consists of two amphipathic α-helices, separated by a small sequence of amino acids, that form one or more β-turns. The hydrophobic interactions between these α-helices stabilize dimerization. In addition, each bHLH monomer has a basic region, which helps mediate recognition between the bHLH monomer and the E-box (the basic region interacts with the major groove of the DNA). Depending on the DNA motif ("CAGCTG" versus "CACGTG"), the bHLH protein has a different set of basic residues.
Figure: Relative position of the CTRR and the E-box.
E-box binding is modulated by Zn2+ in mice. The CT-rich region (CTRR), located about 23 nucleotides upstream of the E-box, is important in E-box binding, transactivation (increased rate of genetic expression), and transcription of circadian genes by the BMAL1/NPAS2 and BMAL1/CLOCK complexes.[11]
The binding specificity of different E-boxes is found to be essential in their function. E-boxes with different functions have a different number and type of binding factor.[12]
The consensus sequence of the E-box is usually CANNTG; however, there exist other E-boxes of similar sequences called noncanonical E-boxes. These include, but are not limited to:
CACGTT, a sequence 20 bp upstream of the mouse Period2 (PER2) gene that regulates its expression[13]
CAGCTT, a sequence found within the MyoD core enhancer[14]
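The distinction between canonical and noncanonical sites can be sketched as a simple classifier. This is an illustrative sketch only: the function name and the "unclassified" label are hypothetical, and the noncanonical table holds just the two examples listed above, which the text notes are not exhaustive.

```python
import re

# Canonical consensus: CANNTG, where N is any nucleotide.
CANONICAL = re.compile(r"CA[ACGT]{2}TG")

# Only the two noncanonical examples named in the text; not an exhaustive set.
NONCANONICAL_EXAMPLES = {
    "CACGTT": "upstream of mouse Period2 (PER2)",
    "CAGCTT": "MyoD core enhancer",
}


def classify_hexamer(hexamer):
    """Label a 6-mer as 'canonical', 'noncanonical', or 'unclassified'."""
    h = hexamer.upper()
    if len(h) != 6:
        raise ValueError("expected a 6-nucleotide sequence")
    if CANONICAL.fullmatch(h):
        return "canonical"
    if h in NONCANONICAL_EXAMPLES:
        return "noncanonical"
    return "unclassified"


for site in ("CACGTG", "CACGTT", "CAGCTT", "GGGGGG"):
    print(site, classify_hexamer(site))
```

Both listed noncanonical sites differ from the consensus only at the final position (T instead of G), which is why they fail the CANNTG pattern.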
The link between E-box-regulated genes and the circadian clock was discovered in 1997, when Hao, Allen, and Hardin (Department of Biology at Texas A&M University) analyzed rhythmicity in the period (per) gene in Drosophila melanogaster.[16] They found a circadian transcriptional enhancer upstream of the per gene within a 69 bp DNA fragment. Depending upon PER protein levels, the enhancer drove high levels of mRNA transcription in both LD (light-dark) and DD (constant darkness) conditions. The enhancer was found to be necessary for high-level gene expression but not for circadian rhythmicity. It also works independently as a target of the BMAL1/CLOCK complex.
The E-box plays an important role in circadian genes; so far, nine E/E'-box-controlled circadian genes have been identified: PER1, PER2, BHLHB2, BHLHB3, CRY1, DBP, Nr1d1, Nr1d2, and RORC.[17] As the E-box is connected to several circadian genes, it is possible that the genes and proteins associated with it are "crucial and vulnerable points in the (circadian) system."[18]
E-box-like CLOCK-related elements (EL-box; GGCACGAGGC) are also important in maintaining circadian rhythmicity in clock-controlled genes. Like the E-box, the EL-box can also induce transcription of BMAL1/CLOCK, which can then lead to expression in other EL-box-containing genes (Ank, DBP, Nr1d1).[20] However, there are differences between the EL-box and the regular E-box. Suppressing DEC1 and DEC2 has a stronger effect on the E-box than on the EL-box. Furthermore, HES1, which can bind to a different consensus sequence (CACNAG, known as the N-box), shows a suppressive effect on the EL-box, but not on the E-box.
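The three sequence patterns mentioned here (E-box CANNTG, N-box CACNAG, and the EL-box GGCACGAGGC) can be compared with a small scanner. This is a hedged sketch: the dictionary and function names are hypothetical, and only the given strand is searched.

```python
import re

# Patterns as quoted in the text; N is any nucleotide.
MOTIFS = {
    "E-box": "CA[ACGT]{2}TG",
    "N-box": "CAC[ACGT]AG",
    "EL-box": "GGCACGAGGC",
}


def motif_positions(seq):
    """Map each motif name to the start positions of its matches in seq."""
    seq = seq.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in MOTIFS.items()
    }


print(motif_positions("GGCACGAGGC"))
# {'E-box': [], 'N-box': [2], 'EL-box': [0]}
```

Run on the EL-box sequence itself, the scan finds no CANNTG site but does find a CACNAG match (CACGAG at offset 2). That is a purely sequence-level observation and does not by itself establish how HES1 acts on the element.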
Both noncanonical E-boxes and E-box-like sequences are crucial for circadian oscillation. Recent research has led to the hypothesis that either a canonical or a noncanonical E-box, followed by an E-box-like sequence after a 6 base pair interval, is a necessary combination for circadian transcription.[21] In silico analysis also suggests that such an interval exists in other known clock-controlled genes.
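The spacing hypothesis can be expressed as a search for two sites separated by a fixed gap. The sketch below is illustrative only. The text does not define the E-box-like sequence, so the second pattern is left as a caller-supplied argument, and the value used in the example (CACGTT) is a stand-in assumption, not a claim about the actual element.

```python
import re

# First site: canonical CANNTG or one of the noncanonical examples
# mentioned elsewhere in the text (CACGTT, CAGCTT).
FIRST_SITE = "CA[ACGT]{2}TG|CACGTT|CAGCTT"


def find_spaced_pairs(seq, second_site, first_site=FIRST_SITE, spacer=6):
    """Find first_site followed by second_site after exactly `spacer` bases.

    Returns (first_start, first_seq, second_start, second_seq) tuples.
    """
    seq = seq.upper()
    pattern = re.compile(
        rf"(?=({first_site})[ACGT]{{{spacer}}}({second_site}))"
    )
    return [
        (m.start(1), m.group(1), m.start(2), m.group(2))
        for m in pattern.finditer(seq)
    ]


# Synthetic test sequence: E-box, 6 bp spacer, assumed E-box-like site.
example = "TT" + "CACGTG" + "AAAAAA" + "CACGTT" + "GG"
print(find_spaced_pairs(example, second_site="CACGTT"))
# [(2, 'CACGTG', 14, 'CACGTT')]
```

Changing `spacer` to any other value returns no pairs for this synthetic sequence, which mirrors the idea that the interval length itself is part of the requirement.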
The CLOCK-ARNTL (BMAL1) complex is an integral part of the mammalian circadian cycle and vital in maintaining circadian rhythmicity.
Knowing that binding activates transcription of the per gene in the promoter region, researchers discovered in 2002 that DEC1 and DEC2 (bHLH transcription factors) repressed the CLOCK-BMAL1 complex through direct interaction with BMAL1 and/or competition for E-box elements. They concluded that DEC1 and DEC2 were regulators of the mammalian molecular clock.[22]
In 2006, Ripperger and Schibler discovered that the binding of this complex to the E-box drove circadian DBP transcription and chromatin transitions (a change from chromatin to facultative heterochromatin).[23] It was concluded that CLOCK regulates DBP expression by binding to E-box motifs in enhancer regions located in the first and second introns.
In 1991, researchers tested whether c-Myc could bind to DNA by dimerizing it to E12. Dimers of E6, the chimeric protein, were able to bind to an E-box element (GGCCACGTGACC) which was recognized by other HLH proteins.[24] Expression of E6 suppressed the function of c-Myc, which showed a link between the two.
In 1996, it was found that Myc heterodimerizes with MAX and that this heterodimeric complex could bind to the CAC(G/A)TG E-box sequence and activate transcription.[25]
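A degenerate site such as CAC(G/A)TG can be written with the standard IUPAC code R (A or G) as CACRTG and expanded into its concrete sequences. The sketch below is a minimal illustration; the function name is hypothetical, and the lookup table covers only the symbols needed here rather than the full IUPAC alphabet.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes: R = A or G, N = any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",
    "N": "ACGT",
}


def expand_motif(motif):
    """List every concrete sequence described by a degenerate motif."""
    choices = (IUPAC[symbol] for symbol in motif.upper())
    return ["".join(bases) for bases in product(*choices)]


print(expand_motif("CACRTG"))       # ['CACATG', 'CACGTG']
print(len(expand_motif("CANNTG")))  # 16
```

Both expansions of CACRTG are members of the 16-sequence CANNTG family, so the Myc/MAX site is a more restrictive subset of the general E-box consensus.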
In 1998, it was concluded that the function of c-Myc depends upon activating transcription of particular genes through E-box elements.[26]
MyoD comes from the MRF bHLH family and its main role is myogenesis, the formation of muscle.[9] Other members in this family include myogenin, Myf5, and Myf6 (MRF4).
When MyoD binds to the E-box motif CANNTG, muscle differentiation and expression of muscle-specific proteins are initiated.[27] The researchers ablated various parts of the recombinant MyoD sequence and concluded that MyoD used encompassing elements to bind the E-box and the tetraplex structure of the promoter sequence of the muscle-specific genes α7 integrin and sarcomeric sMtCK.
MyoG belongs to the MyoD transcription factor family. MyoG binding to the E-box is necessary for neuromuscular synapse formation, as an HDAC-Dach2-myogenin signaling pathway in skeletal muscle gene expression has been identified.[29] Decreased MyoG expression has been shown in patients with muscle-wasting symptoms.[30]
MyoG and MyoD have also been shown to be involved in myoblast differentiation.[31] They act by transactivating cathepsin B promoter activity and inducing its mRNA expression.
E47 is produced by alternative splicing of E2A in E47-specific bHLH-encoding exons. Its role is to regulate tissue-specific gene expression and differentiation. Many kinases have been associated with E47, including 3pk and MK2. These two proteins form a complex with E47 and reduce its transcriptional activity.[32] CKII and PKA have also been shown to phosphorylate E47 in vitro.[33][34][35]
Like other E-box binding proteins, E47 binds to the CANNTG sequence in the E-box. In homozygous E2A knock-out mice, B cell development stops before the DJ rearrangement stage and the B cells fail to mature.[36] E47 has been shown to bind either as a heterodimer (with E12)[37] or as a homodimer (but more weakly).[38]
Although the structural basis for how BMAL1/CLOCK interact with the E-box is unknown, recent research has shown that the bHLH protein domains of BMAL1/CLOCK are highly similar to other bHLH-containing proteins, e.g. Myc/Max, which have been crystallized with E-boxes.[39] It is surmised that specific bases are necessary to support this high-affinity binding. Furthermore, the sequence constraints on the region around the circadian E-box are not fully understood: it is believed to be necessary but not sufficient for E-boxes to be randomly spaced from each other in the genetic sequence in order for circadian transcription to occur. Recent research involving the E-box has been aimed at trying to find more binding proteins as well as discovering more mechanisms for inhibiting binding.
Researchers at the Medical School of Nanjing University found that the amplitude of FBXL3 (F-box/Leucine rich-repeat protein) is expressed via an E-box.[40] They studied mice with FBXL3 deficiency and found that it regulates feedback loops in circadian rhythms by affecting circadian period length.
A study published April 4, 2013 by researchers at Harvard Medical School found that the nucleotides on either side of an E-box influence which transcription factors can bind to the E-box itself.[41] These nucleotides determine the 3-D spatial arrangement of the DNA strand and restrict the size of binding transcription factors. The study also found differences in binding patterns between in vivo and in vitro strands.
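To examine flanking nucleotides computationally, each E-box hit can be reported together with the bases on either side. The following is a minimal sketch under stated assumptions: the function name and the default flank width of 3 are arbitrary choices for illustration, not parameters taken from the study.

```python
import re

# CANNTG consensus; the lookahead allows overlapping sites.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def e_boxes_with_flanks(seq, flank=3):
    """Return (left_flank, core, right_flank) for each CANNTG site.

    Flanks are truncated when a site lies near either end of the sequence.
    """
    seq = seq.upper()
    sites = []
    for m in E_BOX.finditer(seq):
        start = m.start(1)
        end = start + 6
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        sites.append((left, m.group(1), right))
    return sites


print(e_boxes_with_flanks("GGCCACGTGACC"))
# [('GGC', 'CACGTG', 'ACC')]
```

Grouping sites by their flanks would then allow sites with an identical core but different surroundings to be compared.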
Murre, C; McCaw, P S; Vaessin, H; et al. (Aug 1989). "Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence". Cell. 58 (3): 537–544. doi:10.1016/0092-8674(89)90434-0. PMID 2503252. S2CID 29339773.
Ueshima, T; Kawamoto T; Honda KK; Noshiro M; Fujimoto K; Nakao S; Ichinose N; Hashimoto S; Gotoh O; Kato Y (December 2012). "Identification of a new clock-related element EL-box involved in circadian regulation by BMAL1/CLOCK and HES1". Gene. 510 (2): 118–125. doi:10.1016/j.gene.2012.08.022. PMID 22960268.
Jane, D.T.; Morvay, L.C.; Koblinski, J.; et al. (2002). "Evidence that E-box promoter elements and MyoD transcription factors play a role in the induction of cathepsin B gene expression during human myoblast differentiation". Biol. Chem. 383 (12): 1833–1844. doi:10.1515/BC.2002.207. PMID 12553720. S2CID 26010667.
Bain, Gretchen; Maandag, Els C.; Izon, David J.; Amsen, Derk; Kruisbeek, Ada M.; Weintraub, Bennett C.; Krop, Ian; Schlissel, Mark S.; Feeney, Ann J.; van Roon, Marian; van der Valk, Martin; te Riele, Hein P.J.; Berns, Anton; Murre, Cornelius (2 December 1994). "E2A proteins are required for proper B cell development and initiation of immunoglobulin gene rearrangements". Cell. 79 (5): 885–92. doi:10.1016/0092-8674(94)90077-9. PMID 8001125. S2CID 34325904.
Lassar, Andrew B.; Davis, Robert L.; Wright, Woodring E.; Kadesch, Tom; Murre, Cornelius; Voronova, Anna; Baltimore, David; Weintraub, Harold (26 July 1991). "Functional activity of myogenic HLH proteins requires hetero-oligomerization with E12/E47-like proteins in vivo". Cell. 66 (2): 305–15. doi:10.1016/0092-8674(91)90620-E. PMID 1649701. S2CID 25957022.
Murre, Cornelius; McCaw, Patrick Schonleber; Vaessin, H.; Caudy, M.; Jan, L.Y.; Jan, Y.N.; Cabrera, Carlos V.; Buskin, Jean N.; Hauschka, Stephen D.; Lassar, Andrew B.; Weintraub, Harold; Baltimore, David (11 August 1989). "Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence". Cell. 58 (3): 537–44. doi:10.1016/0092-8674(89)90434-0. PMID 2503252. S2CID 29339773.