A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity to DNA.[1] Some DNA-binding domains may also include nucleic acids in their folded structure.

One or more DNA-binding domains are often part of a larger protein consisting of further protein domains with differing functions. The extra domains often regulate the activity of the DNA-binding domain. The function of DNA binding is either structural or involves transcription regulation, with the two roles sometimes overlapping.[citation needed]
DNA-binding domains with functions involving DNA structure have biological roles in DNA replication, repair, storage, and modification, such as methylation.[citation needed]
Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation.[citation needed]
The DBD interacts with the nucleotides of DNA in a sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some sort of molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate DNA backbone (see the structure of DNA). Each specific type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNase I cuts DNA almost randomly and so must bind to DNA in a non-sequence-specific manner. Even so, DNase I recognizes a certain 3-D DNA structure, yielding a somewhat specific DNA cleavage pattern that can be useful for studying DNA recognition by a technique called DNA footprinting.[citation needed]
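In DNase I footprinting, cleavage of free DNA is compared with cleavage of DNA pre-incubated with the protein; positions shielded by the bound protein are cut less often. The sketch below assumes the band intensities for each position have already been quantified into two equal-length lists, and reports runs where the bound sample drops below a fraction of the control. The function name `protected_regions`, the 0.5 ratio cut-off and the 3-position minimum run are illustrative assumptions, not a standard analysis protocol.

```python
from typing import List, Tuple


def protected_regions(free: List[float], bound: List[float],
                      ratio_cutoff: float = 0.5,
                      min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for runs of positions
    where cleavage with protein falls below ratio_cutoff times the
    free-DNA control. Both cut-offs are arbitrary illustrative choices."""
    if len(free) != len(bound):
        raise ValueError("both lanes must cover the same positions")
    regions: List[Tuple[int, int]] = []
    start = None
    for i, (f, b) in enumerate(zip(free, bound)):
        # A position counts as protected only if the control lane was cut there.
        protected = f > 0 and b / f < ratio_cutoff
        if protected and start is None:
            start = i
        elif not protected and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(free) - start >= min_len:
        regions.append((start, len(free)))
    return regions


# Hypothetical intensities for ten positions along the DNA.
free_lane = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
bound_lane = [10, 11, 2, 1, 1, 2, 11, 10, 9, 10]
print(protected_regions(free_lane, bound_lane))  # [(2, 6)]
```

Requiring a minimum run length keeps isolated low-intensity bands, which may simply be noise, from being reported as footprints.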
Many DNA-binding domains must recognize specific DNA sequences. Examples include the DBDs of transcription factors that activate specific genes, and those of enzymes that modify DNA at specific sites, such as restriction enzymes and telomerase. The hydrogen-bonding pattern in the DNA major groove is less degenerate than that of the minor groove, which makes the major groove a more attractive site for sequence-specific DNA recognition.[citation needed]
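The difference in degeneracy can be made concrete with the usual textbook shorthand for the chemical groups each base pair exposes in each groove: A for a hydrogen-bond acceptor, D for a donor, M for a methyl group and H for a non-polar hydrogen. The pattern strings below follow that shorthand and are a simplification added here, not taken from this article. The sketch ignores indirect readout through DNA shape and water-mediated contacts.

```python
# Groups presented by each base pair, read across the pair in a fixed direction.
# A = H-bond acceptor, D = H-bond donor, M = methyl, H = non-polar hydrogen.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}

# Map a base on the read strand to its Watson-Crick pair.
PAIR_OF = {"A": "A:T", "T": "T:A", "G": "G:C", "C": "C:G"}


def distinguishable(patterns: dict) -> int:
    """Number of base pairs a protein could tell apart from this groove alone."""
    return len(set(patterns.values()))


def read_groove(seq: str, patterns: dict) -> list:
    """Pattern a protein would 'see' base pair by base pair along the groove."""
    return [patterns[PAIR_OF[base]] for base in seq]


print(distinguishable(MAJOR_GROOVE))  # 4
print(distinguishable(MINOR_GROOVE))  # 2
# In the minor groove, AT and TA look identical; in the major groove they do not.
print(read_groove("AT", MINOR_GROOVE) == read_groove("TA", MINOR_GROOVE))  # True
print(read_groove("AT", MAJOR_GROOVE) == read_groove("TA", MAJOR_GROOVE))  # False
```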
The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, DNA mutation, protein structure mutation or modification, nuclear magnetic resonance, X-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking, and microscale thermophoresis (MST).
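Several of these methods (for example calorimetry, surface plasmon resonance and MST) yield a dissociation constant, K_d, and specificity is often expressed by comparing K_d for a specific site with K_d for non-specific DNA. The sketch below uses the simple single-site binding model, in which the fraction of sites occupied is [P] / (K_d + [P]). It assumes the protein is in excess, so free and total protein concentrations are treated as equal. The two K_d values are hypothetical and chosen only to illustrate the arithmetic.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of DNA sites occupied in a single-site binding model,
    assuming free protein concentration ~= total protein concentration."""
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("need protein >= 0 and Kd > 0")
    return protein_nM / (kd_nM + protein_nM)


def specificity_ratio(kd_specific_nM: float, kd_nonspecific_nM: float) -> float:
    """How many times more tightly the specific site is bound."""
    return kd_nonspecific_nM / kd_specific_nM


# Hypothetical dissociation constants, for illustration only.
KD_SPECIFIC_NM = 1.0
KD_NONSPECIFIC_NM = 1000.0

for protein in (1.0, 10.0, 100.0):
    specific = fraction_bound(protein, KD_SPECIFIC_NM)
    nonspecific = fraction_bound(protein, KD_NONSPECIFIC_NM)
    print(f"[P] = {protein} nM: specific {specific:.3f}, non-specific {nonspecific:.3f}")

print(specificity_ratio(KD_SPECIFIC_NM, KD_NONSPECIFIC_NM))  # 1000.0
```

At a protein concentration near the specific K_d, the specific site is substantially occupied while non-specific DNA remains largely free, which is the regime in which a 1000-fold difference in K_d is easiest to observe.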
A large fraction of the genes in each genome encode DNA-binding proteins (see Table). However, these proteins belong to a relatively small number of protein families. For instance, more than 2000 of the ~20,000 human proteins are "DNA-binding", including about 750 zinc-finger proteins.[3]
| Species | DNA-binding proteins[4] | DNA-binding families[4] |
|---|---|---|
| Arabidopsis thaliana (thale cress) | 4471 | 300 |
| Saccharomyces cerevisiae (yeast) | 720 | 243 |
| Caenorhabditis elegans (worm) | 2028 | 271 |
| Drosophila melanogaster (fruit fly) | 2620 | 283 |
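
Dividing the two columns of the table gives the average number of DNA-binding proteins per DNA-binding family in each species, a rough measure of how strongly a limited set of families has expanded. The sketch uses only the counts in the table; the dictionary and function names are illustrative.

```python
# (DNA-binding proteins, DNA-binding families), copied from the table.
COUNTS = {
    "Arabidopsis thaliana": (4471, 300),
    "Saccharomyces cerevisiae": (720, 243),
    "Caenorhabditis elegans": (2028, 271),
    "Drosophila melanogaster": (2620, 283),
}


def proteins_per_family(counts: dict) -> dict:
    """Average DNA-binding proteins per family, rounded to one decimal place."""
    return {species: round(proteins / families, 1)
            for species, (proteins, families) in counts.items()}


for species, ratio in proteins_per_family(COUNTS).items():
    print(f"{species}: {ratio} proteins per family")
```

With these counts, the averages range from about 3 proteins per family in yeast to about 15 in Arabidopsis thaliana.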

Originally discovered in bacteria, the helix-turn-helix motif is commonly found in repressor proteins and is about 20 amino acids long. In eukaryotes, the homeodomain comprises three helices, the third of which recognizes the DNA (the recognition helix). Homeodomains are common in proteins that regulate developmental processes.[5]
The helix-hairpin-helix motif is found in proteins that interact with DNA in a non-sequence-specific manner.[6] It consists of two antiparallel alpha-helices connected by a short hairpin loop. The two helices are packed at an acute angle of ~25–50°, which dictates the characteristic pattern of hydrophobicity in their sequences. This packing distinguishes the motif from other two-helix DNA-binding structures such as the helix-turn-helix motif, whose helices pack at almost a right angle.[7]
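The packing-angle criterion can be sketched as a small classifier that takes the axis vectors of two helices, for example fitted from atomic coordinates. The angle is folded into 0–90° so that parallel and antiparallel helices are treated alike; this is a simplification of the signed interhelical angles used in structural analyses. The 25–50° window comes from the text, while the 80° threshold standing in for "almost a right angle" is an assumption made for illustration.

```python
import math
from typing import Sequence


def interhelix_angle(axis1: Sequence[float], axis2: Sequence[float]) -> float:
    """Angle in degrees between two helix axis vectors, folded into 0-90."""
    dot = sum(a * b for a, b in zip(axis1, axis2))
    norm1 = math.sqrt(sum(a * a for a in axis1))
    norm2 = math.sqrt(sum(b * b for b in axis2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    angle = math.degrees(math.acos(cos_angle))
    return min(angle, 180.0 - angle)


HHH_RANGE = (25.0, 50.0)  # from the text
HTH_MIN = 80.0            # assumed threshold for "almost a right angle"


def classify_packing(axis1: Sequence[float], axis2: Sequence[float]) -> str:
    """Label a helix pair as helix-hairpin-helix-like or helix-turn-helix-like."""
    angle = interhelix_angle(axis1, axis2)
    if HHH_RANGE[0] <= angle <= HHH_RANGE[1]:
        return "HhH-like"
    if angle >= HTH_MIN:
        return "HTH-like"
    return "unassigned"


print(classify_packing((1, 0, 0), (-1, -1, 0)))  # antiparallel, 45 degrees: HhH-like
print(classify_packing((1, 0, 0), (0, 1, 0)))    # 90 degrees: HTH-like
```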

The zinc finger domain is mostly found in eukaryotes, but some examples have been found in bacteria.[8] The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by coordinating zinc ions with regularly spaced zinc-coordinating residues (either histidines or cysteines). The most common class of zinc finger (Cys2His2) coordinates a single zinc ion and consists of a recognition helix and a two-stranded beta-sheet.[9] In transcription factors these domains are often found in arrays (usually separated by short linker sequences), and adjacent fingers are spaced at 3-base-pair intervals when bound to DNA.
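Because adjacent fingers are spaced 3 base pairs apart, an array of n fingers spans roughly 3n base pairs, and a target site can be divided into one 3-bp subsite per finger. The sketch below assumes the canonical Cys2His2 arrangement in which the N-terminal finger (finger 1) contacts the 3'-most triplet of the target strand; that orientation is added here and is not stated in the text. It also ignores contacts that real fingers often make with a base outside their own triplet.

```python
def finger_subsites(target: str, n_fingers: int) -> dict:
    """Split a 5'->3' target into 3-bp subsites, one per finger.

    Assumes finger 1 (N-terminal) binds the 3'-most triplet and finger n
    (C-terminal) binds the 5'-most triplet."""
    if len(target) != 3 * n_fingers:
        raise ValueError("target length must be 3 bp per finger")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return {f"finger {k}": triplets[n_fingers - k]
            for k in range(1, n_fingers + 1)}


# Hypothetical 9-bp target for a three-finger array.
print(finger_subsites("AAACCCGGG", 3))
# {'finger 1': 'GGG', 'finger 2': 'CCC', 'finger 3': 'AAA'}
```

Each added finger extends the recognized site by about 3 bp, which is one reason arrays of fingers can specify longer sites than a single finger could.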
The basic leucine zipper (bZIP) domain is found mainly in eukaryotes and to a limited extent in bacteria. The bZIP domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact like the teeth of a zipper, allowing dimerization of two proteins. When binding to DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major grooves. Proteins containing bZIP domains regulate gene expression.
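The leucine-every-7th-residue rule lends itself to a simple sequence scan. The sketch below reports runs of leucines spaced exactly 7 residues apart. The default minimum of four leucines and the example sequence are hypothetical choices; real zipper prediction usually also considers the other positions of each heptad.

```python
from typing import List, Tuple


def leucine_heptads(seq: str, min_leucines: int = 4) -> List[Tuple[int, int]]:
    """Return (0-based start, leucine count) for each run of at least
    min_leucines leucines spaced exactly 7 residues apart."""
    hits: List[Tuple[int, int]] = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines that continue a run already counted from an earlier start.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count = 0
        pos = start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_leucines:
            hits.append((start, count))
    return hits


# Hypothetical sequence: a short N-terminal stretch followed by four heptads.
example = "MKR" + "LAEKVEE" * 4 + "GS"
print(leucine_heptads(example))  # [(3, 4)]
```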
Consisting of about 110 amino acids, the winged helix (WH) domain has four helices and a two-stranded beta-sheet.
The winged helix-turn-helix (wHTH) domain (SCOP 46785) is typically 85–90 amino acids long. It is formed by a three-helix bundle and a four-stranded beta-sheet (the wing).
The basic helix-loop-helix (bHLH) domain is found in some transcription factors and is characterized by two alpha helices connected by a loop. One helix is typically smaller and, because the loop is flexible, can fold back and pack against another helix, allowing dimerization. The larger helix typically contains the DNA-binding regions.
HMG-box domains are found in high mobility group proteins, which are involved in a variety of DNA-dependent processes such as replication and transcription. They also alter the flexibility of DNA by inducing bends.[10][11] The domain consists of three alpha helices separated by loops.
Wor3 domains, named after White–Opaque Regulator 3 (Wor3) in Candida albicans, arose more recently in evolutionary time than most previously described DNA-binding domains and are restricted to a small number of fungi.[12]
The OB-fold is a small structural motif originally named for its oligonucleotide/oligosaccharide-binding properties. OB-fold domains range between 70 and 150 amino acids in length.[13] OB-folds bind single-stranded DNA and hence are single-stranded DNA-binding proteins.[13]
OB-fold proteins have been identified as critical for DNA replication, DNA recombination, DNA repair, transcription, translation, the cold shock response, and telomere maintenance.[14]
The immunoglobulin domain (InterPro: IPR013783) consists of a beta-sheet structure with large connecting loops, which serve to recognize either DNA major grooves or antigens. Although usually found in immunoglobulin proteins, these domains are also present in the Stat proteins of the cytokine pathway. This is likely because the cytokine pathway evolved relatively recently and made use of systems that were already functional, rather than creating its own.
The B3 DBD (InterPro: IPR003340, SCOP 117343) is found only in transcription factors from higher plants and in the restriction endonucleases EcoRII and BfiI, and typically consists of 100–120 residues. It includes seven beta strands and two alpha helices, which form a DNA-binding pseudobarrel protein fold.
TAL effectors are found in bacterial plant pathogens of the genus Xanthomonas and are involved in regulating the genes of the host plant in order to facilitate bacterial virulence, proliferation, and dissemination.[15] They contain a central region of tandem 33–35-residue repeats, and each repeat encodes a single DNA base in the TALE's binding site.[16][17] Within the repeat, residue 13 alone directly contacts the DNA base, determining sequence specificity, while other positions make contacts with the DNA backbone, stabilising the DNA-binding interaction.[18] Each repeat within the array takes the form of paired alpha-helices, while the whole repeat array forms a right-handed superhelix that wraps around the DNA double helix. TAL effector repeat arrays have been shown to contract upon DNA binding, and a two-state search mechanism has been proposed in which the elongated TALE begins to contract around the DNA after a successful thymine recognition by a unique repeat unit N-terminal to the core TAL effector repeat array.[19]
Related proteins are found in the bacterial plant pathogen Ralstonia solanacearum,[20] the fungal endosymbiont Burkholderia rhizoxinica[21] and two as-yet unidentified marine microorganisms.[22] The DNA-binding code and the structure of the repeat array are conserved between these groups, referred to collectively as the TALE-likes.
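Since each repeat specifies one base through residue 13, a TALE binding site can be predicted by reading residue 13 from each repeat in order and prepending the thymine recognized by the N-terminal unit. The residue-to-base table below (I→A, D→C, G→T, N→A or G) reflects the commonly cited TALE code and is an assumption added here, since the text does not list it. The sketch also assumes repeats are read N- to C-terminally against the binding site 5'→3'. The helper `make_repeat` builds hypothetical placeholder repeats and does not represent a real TALE sequence.

```python
from typing import List

# Residue 13 (1-based) -> base it is generally reported to specify.
# "R" is the IUPAC code for A or G. Assumed table, not given in the text.
RESIDUE13_TO_BASE = {"I": "A", "D": "C", "G": "T", "N": "R"}


def predict_tale_target(repeats: List[str]) -> str:
    """Predict a 5'->3' binding site from repeats ordered N- to C-terminal.
    Residues not in the table are reported as 'N' (unassigned)."""
    bases = []
    for repeat in repeats:
        if not 33 <= len(repeat) <= 35:
            raise ValueError(f"repeat of length {len(repeat)} is not 33-35 residues")
        bases.append(RESIDUE13_TO_BASE.get(repeat[12], "N"))  # index 12 = residue 13
    # Prepend the thymine recognized by the unit N-terminal to the core array.
    return "T" + "".join(bases)


def make_repeat(residue13: str, length: int = 34) -> str:
    """Hypothetical placeholder repeat: 'X' everywhere except residue 13."""
    return "X" * 12 + residue13 + "X" * (length - 13)


array = [make_repeat(r) for r in "IDGN"]
print(predict_tale_target(array))  # TACTR
```

Because the code is modular, one repeat per base, the same lookup run in reverse is what makes TALE arrays attractive for designing proteins that bind a chosen sequence.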