



Base pair

Two nucleobases bound by hydrogen bonds
The chemical structure of DNA base-pairs

A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. Base pairs form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" (or "Watson–Crick–Franklin") base pairs (guanine–cytosine and adenine–thymine)[1] allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence.[2] The complementary nature of this base-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes.

Intramolecular base pairs can occur within single-stranded nucleic acids. This is particularly important in RNA molecules (e.g., transfer RNA), where Watson–Crick base pairs (guanine–cytosine and adenine–uracil) permit the formation of short double-stranded helices, and a wide variety of non–Watson–Crick interactions (e.g., G–U or A–A) allow RNAs to fold into a vast range of specific three-dimensional structures. In addition, base-pairing between transfer RNA (tRNA) and messenger RNA (mRNA) forms the basis for the molecular recognition events that result in the nucleotide sequence of mRNA becoming translated into the amino acid sequence of proteins via the genetic code.

The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of total base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion base pairs long and to contain 20,000–25,000 distinct protein-coding genes.[3][4][5][6] A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA.[7] The total number of DNA base pairs on Earth is estimated at 5.0×10³⁷ with a weight of 50 billion tonnes.[8] In comparison, the total mass of the biosphere has been estimated to be as much as 4 TtC (trillion tons of carbon).[9]
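As a rough consistency check on the two Earth-wide figures above, the following Python sketch converts a base-pair count into a mass. It assumes an average of about 618 daltons per DNA base pair (the approximate value quoted under Length measurements) and the standard dalton-to-kilogram conversion; the function name is illustrative only.

```python
# Rough sanity check: approximate mass of a given number of DNA base pairs.
# Assumption: ~618 Da per base pair on average (an approximate figure).
DALTON_KG = 1.66053906660e-27   # mass of one dalton in kilograms
DA_PER_DNA_BP = 618.0           # approximate average mass of one DNA base pair

def dna_mass_tonnes(base_pairs: float) -> float:
    """Return the approximate mass, in metric tonnes, of `base_pairs` of DNA."""
    kilograms = base_pairs * DA_PER_DNA_BP * DALTON_KG
    return kilograms / 1000.0

earth_bp = 5.0e37  # estimated number of DNA base pairs on Earth
print(f"{dna_mass_tonnes(earth_bp) / 1e9:.0f} billion tonnes")
```

Under these assumptions the result comes out at roughly 51 billion tonnes, which is in line with the quoted estimate of 50 billion tonnes.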

Hydrogen bonding and stability

Top, a G.C base pair with three hydrogen bonds. Bottom, an A.T base pair with two hydrogen bonds. Non-covalent hydrogen bonds between the bases are shown as dashed lines. The wiggly lines stand for the connection to the pentose sugar and point in the direction of the minor groove.

Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content. Crucially, however, stacking interactions are primarily responsible for stabilising the double-helical structure; Watson–Crick base pairing's contribution to global structural stability is minimal, but its role in the specificity underlying complementarity is, by contrast, of maximal importance, as this underlies the template-dependent processes of the central dogma (e.g. DNA replication).[10]

The bigger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine–pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine–purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. Purine–pyrimidine base-pairing of AT or GC or UA (in RNA) results in proper duplex structure. The only other purine–pyrimidine pairings would be AC and GT and UG (in RNA); these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).
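The pairing rules in this paragraph can be summarised as a small lookup. The Python sketch below is only an illustration: the function name and the category labels are assumptions made for this example, and it encodes nothing beyond the purine/pyrimidine classes and the pairs named above.

```python
# Classify a pair of bases according to the rules described in the text.
PURINES = {"A", "G"}            # double-ringed bases
PYRIMIDINES = {"C", "T", "U"}   # single-ringed bases
WATSON_CRICK = {frozenset("AT"), frozenset("GC"), frozenset("AU")}
WOBBLE = {frozenset("GU")}      # G-U occurs fairly often in RNA

def classify_pair(x: str, y: str) -> str:
    """Return a label describing how bases `x` and `y` pair."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    pair = frozenset((x, y))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    if x in PURINES and y in PURINES:
        return "purine-purine mismatch"
    if x in PYRIMIDINES and y in PYRIMIDINES:
        return "pyrimidine-pyrimidine mismatch"
    return "purine-pyrimidine mismatch"   # e.g. A-C or G-T

for a, b in [("G", "C"), ("A", "U"), ("G", "U"), ("A", "C"), ("A", "G")]:
    print(a, b, classify_pair(a, b))
```

The sketch does not distinguish DNA from RNA contexts, so it will label A–U as Watson–Crick even when the input is meant to be DNA.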

Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing (if any), and the GC content. Higher GC content results in higher melting temperatures; it is, therefore, unsurprising that the genomes of extremophile organisms such as Thermus thermophilus are particularly GC-rich. Conversely, regions of a genome that need to separate frequently (for example, the promoter regions of often-transcribed genes) are comparatively GC-poor (see, for example, the TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR reactions.[citation needed]
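To make the link between GC content and melting temperature concrete, the following Python sketch computes the GC fraction of a sequence and a rule-of-thumb melting temperature. The estimate uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is not taken from this article; it is a common approximation for short oligonucleotides such as primers and ignores salt concentration, strand concentration, and mispairing. The function names and example sequences are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius (Wallace rule).

    Only a rough guide, and only for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Hypothetical AT-rich and GC-rich sequences of equal length.
for seq in ("TATAAAAG", "GCGGCCGC"):
    print(seq, f"GC={gc_content(seq):.3f}", f"Tm~{wallace_tm(seq)} C")
```

For equal-length sequences the GC-rich one receives the higher estimate, matching the qualitative trend described above.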



The following DNA sequences illustrate paired double-stranded patterns. By convention, the top strand is written from the 5′-end to the 3′-end; thus, the bottom strand (complementary strand) is written 3′ to 5′.

A base-paired DNA sequence:
The corresponding RNA sequence, in which uracil is substituted for thymine in the RNA strand:
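The strand convention can be sketched in Python as follows. The sequence used here is hypothetical and chosen only for illustration, and the helper names are assumptions; the code prints a top strand 5′ to 3′, its complementary bottom strand 3′ to 5′, and the same pair with uracil in place of thymine.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(top: str) -> str:
    """Return the base-by-base complement, i.e. the bottom strand read 3'->5'."""
    return top.upper().translate(DNA_COMPLEMENT)

def to_rna(dna: str) -> str:
    """Substitute uracil for thymine."""
    return dna.upper().replace("T", "U")

top = "ATCGATTG"            # hypothetical top strand, written 5'->3'
bottom = complement(top)    # complementary strand, written 3'->5'

print(f"5'-{top}-3'")
print(f"3'-{bottom}-5'")
print()
print(f"5'-{to_rna(top)}-3'")
print(f"3'-{to_rna(bottom)}-5'")
```

Reversing the bottom strand (`bottom[::-1]`) would give the same strand written in the 5′ to 3′ direction.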

Base analogs and intercalators

Main article: Nucleic acid analogue

Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors (mostly point mutations) in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form.[11]

Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine.[12][citation needed]
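The effect of a single inserted nucleotide on downstream reading can be illustrated with a short Python sketch. It assumes the sequence is read in consecutive three-base codons from its first base; the sequence and the insertion site are hypothetical, and the helper name is an assumption.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

original = "ATGGCCAAGTTC"                        # hypothetical coding sequence
inserted = original[:4] + "T" + original[4:]     # one extra base after the 4th position

print(codons(original))
print(codons(inserted))
```

Every triplet after the insertion site differs from the original, which is why a one-base insertion or deletion is far more disruptive than a substitution at the same position.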

Mismatch repair


Mismatched base pairs can be generated by errors of DNA replication and as intermediates during homologous recombination. The process of mismatch repair ordinarily must recognize and correctly repair a small number of base mispairs within a long sequence of normal DNA base pairs. To repair mismatches formed during DNA replication, several distinctive repair processes have evolved to distinguish between the template strand and the newly formed strand so that only the newly inserted incorrect nucleotide is removed (in order to avoid generating a mutation).[13] The proteins employed in mismatch repair during DNA replication, and the clinical significance of defects in this process, are described in the article DNA mismatch repair. The process of mispair correction during recombination is described in the article gene conversion.
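The recognition step (finding a few mispairs within a long run of normal pairs) can be mimicked in a few lines of Python. This is a toy sketch under stated assumptions, not a model of the repair machinery: the two strands are given already aligned base by base, the template is simply taken to be correct, and the function name and sequences are hypothetical.

```python
# Ordered (template base, new-strand base) pairs that count as correct in DNA.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

def find_mismatches(template: str, new_strand: str) -> list[int]:
    """Return 0-based positions where the aligned strands are not A-T or G-C.

    `new_strand` is written base-by-base opposite `template`, i.e. 3'->5'
    when the template is written 5'->3'.
    """
    if len(template) != len(new_strand):
        raise ValueError("strands must have the same length")
    aligned = zip(template.upper(), new_strand.upper())
    return [i for i, pair in enumerate(aligned) if pair not in WATSON_CRICK]

template = "ATCGATTG"     # hypothetical template strand
new_strand = "TAGTTAAC"   # hypothetical copy containing one misincorporated base

print(find_mismatches(template, new_strand))
```

Treating the template as the reference stands in for the strand discrimination described above, so that a correction would change the new strand rather than the template.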

Length measurements

Schematic karyogram of a human. The blue scale to the left of each nuclear chromosome pair (as well as the mitochondrial genome at bottom left) shows its length in terms of mega–base-pairs.
Further information: Karyotype

The following abbreviations are commonly used to describe the length of a DNA/RNA molecule:

  • bp = base pair; one bp corresponds to approximately 3.4 Å (340 pm)[14] of length along the strand, and to roughly 618 or 643 daltons for DNA and RNA respectively.
  • kb (= kbp) = kilo–base-pair = 1,000 bp
  • Mb (= Mbp) = mega–base-pair = 1,000,000 bp
  • Gb (= Gbp) = giga–base-pair = 1,000,000,000 bp

For single-stranded DNA/RNA, units of nucleotides are used, abbreviated nt (or knt, Mnt, Gnt), as they are not paired. To distinguish between units of computer storage and bases, kbp, Mbp, Gbp, etc. may be used for base pairs.
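The unit definitions above translate directly into simple conversions. The following Python sketch uses only the approximate values given in the list (3.4 Å and roughly 618 or 643 daltons per base pair); the function and constant names are assumptions made for illustration.

```python
ANGSTROM_PER_BP = 3.4                      # approximate rise per base pair
DA_PER_BP = {"DNA": 618.0, "RNA": 643.0}   # approximate mass per base pair
UNIT_TO_BP = {
    "bp": 1,
    "kb": 10**3, "kbp": 10**3,
    "Mb": 10**6, "Mbp": 10**6,
    "Gb": 10**9, "Gbp": 10**9,
}

def to_base_pairs(value: float, unit: str) -> float:
    """Convert a length such as (3.2, 'Gb') into a number of base pairs."""
    return value * UNIT_TO_BP[unit]

def contour_length_m(base_pairs: float) -> float:
    """Approximate length along the strand, in metres (1 angstrom = 1e-10 m)."""
    return base_pairs * ANGSTROM_PER_BP * 1e-10

def mass_daltons(base_pairs: float, kind: str = "DNA") -> float:
    """Approximate mass in daltons for double-stranded DNA or RNA."""
    return base_pairs * DA_PER_BP[kind]

genome_bp = to_base_pairs(3.2, "Gb")       # haploid human genome estimate
print(f"{contour_length_m(genome_bp):.2f} m")
print(f"{mass_daltons(genome_bp):.3e} Da")
```

With these figures, a 3.2 Gb genome corresponds to a little over one metre of DNA when measured along the strand.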

The centimorgan is also often used to indicate distance along a chromosome, but the number of base pairs it corresponds to varies widely. In the human genome, one centimorgan corresponds to about 1 million base pairs on average.[15][16]

Unnatural base pair (UBP)

See also: Artificial gene synthesis, Expanded genetic code, Nucleic acid analogue, and Synthetic genomics

An unnatural base pair (UBP) is a designed subunit (or nucleobase) of DNA which is created in a laboratory and does not occur in nature. DNA sequences have been described which use newly created nucleobases to form a third base pair, in addition to the two base pairs found in nature, A-T (adenine–thymine) and G-C (guanine–cytosine). A few research groups have been searching for a third base pair for DNA, including teams led by Steven A. Benner, Philippe Marliere, Floyd E. Romesberg and Ichiro Hirao.[17] Some new base pairs based on alternative hydrogen bonding, hydrophobic interactions and metal coordination have been reported.[18][19][20][21]

In 1989 Steven Benner (then working at the Swiss Federal Institute of Technology in Zurich) and his team incorporated modified forms of cytosine and guanine into DNA molecules in vitro.[22] The nucleotides, which encoded RNA and proteins, were successfully replicated in vitro. Since then, Benner's team has been trying to engineer cells that can make foreign bases from scratch, obviating the need for a feedstock.[23]

In 2002, Ichiro Hirao's group in Japan developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in transcription and translation, for the site-specific incorporation of non-standard amino acids into proteins.[24] In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription.[25] Afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) were identified as a high-fidelity pair in PCR amplification.[26][27] In 2013, they applied the Ds-Px pair to DNA aptamer generation by in vitro selection (SELEX) and demonstrated that genetic alphabet expansion significantly augments DNA aptamer affinities to target proteins.[28]

In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, reported that they had designed an unnatural base pair (UBP).[20] The two new artificial nucleotides were named d5SICS and dNaM. More technically, these artificial nucleotides, bearing hydrophobic nucleobases, feature two fused aromatic rings that form a (d5SICS–dNaM) complex or base pair in DNA.[23][29] The team designed a variety of in vitro or "test tube" templates containing the unnatural base pair and confirmed that it was efficiently replicated with high fidelity in virtually all sequence contexts using the modern standard in vitro techniques, namely PCR amplification of DNA and PCR-based applications.[20] Their results show that for PCR and PCR-based applications, the d5SICS–dNaM unnatural base pair is functionally equivalent to a natural base pair, and when combined with the other two natural base pairs used by all organisms, A–T and G–C, it provides a fully functional and expanded six-letter "genetic alphabet".[29]

In 2014 the same team from the Scripps Research Institute reported that they had synthesized a stretch of circular DNA known as a plasmid, containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and inserted it into cells of the common bacterium E. coli, which successfully replicated the unnatural base pairs through multiple generations.[17] The transfection did not hamper the growth of the E. coli cells, and the cells showed no sign of losing the unnatural base pairs to their natural DNA repair mechanisms. This is the first known example of a living organism passing along an expanded genetic code to subsequent generations.[29][30] Romesberg said he and his colleagues created 300 variants to refine the design of nucleotides that would be stable enough and would be replicated as easily as the natural ones when the cells divide. This was in part achieved by the addition of a supportive algal gene that expresses a nucleotide triphosphate transporter which efficiently imports the triphosphates of both d5SICSTP and dNaMTP into E. coli bacteria.[29] Then, the natural bacterial replication pathways use them to accurately replicate a plasmid containing d5SICS–dNaM. Other researchers were surprised that the bacteria replicated these human-made DNA subunits.[31]

The successful incorporation of a third base pair is a significant breakthrough toward the goal of greatly expanding the number of amino acids which can be encoded by DNA, from the existing 20 amino acids to a theoretically possible 172, thereby expanding the potential for living organisms to produce novel proteins.[17] The artificial strings of DNA do not encode for anything yet, but scientists speculate they could be designed to manufacture new proteins which could have industrial or pharmaceutical uses.[32] Experts said the synthetic DNA incorporating the unnatural base pair raises the possibility of life forms based on a different DNA code.[31][32]
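The scale of this expansion can be seen from a simple count of possible codons. The Python sketch below assumes three-letter codons, as in the standard genetic code, and compares a four-letter with a six-letter alphabet; it does not attempt to derive the figure of 172 amino acids quoted above, which depends on further assumptions not stated here. The function name is illustrative.

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of a given length over a given alphabet."""
    return alphabet_size ** codon_length

natural = codon_count(4)    # A, C, G, T
expanded = codon_count(6)   # natural letters plus one unnatural base pair

print(natural)              # 64
print(expanded)             # 216
print(expanded - natural)   # 152 additional codons
```

A third base pair therefore more than triples the number of available triplets, which is the headroom that any additional amino acid assignments would draw on.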

Non-canonical base pairing

Main article: Non-canonical base pairing
Wobble base pairs
Comparison of Hoogsteen to Watson–Crick base pairs.[33]

In addition to the canonical pairing, some conditions can also favour base-pairing with alternative base orientation, and number and geometry of hydrogen bonds. These pairings are accompanied by alterations to the local backbone shape.[citation needed]

The most common of these is the wobble base pairing that occurs between tRNAs and mRNAs at the third base position of many codons during translation[34] and during the charging of tRNAs by some tRNA synthetases.[35] Wobble base pairs have also been observed in the secondary structures of some RNA sequences.[36]

Additionally, Hoogsteen base pairing (typically written as A•U/T and G•C) can exist in some DNA sequences (e.g. CA and TA dinucleotides) in dynamic equilibrium with standard Watson–Crick pairing.[33] Hoogsteen pairs have also been observed in some protein–DNA complexes.[37]

In addition to these alternative base pairings, a wide range of base-base hydrogen bonding is observed in RNA secondary and tertiary structure.[38] These bonds are often necessary for the precise, complex shape of an RNA, as well as its binding to interaction partners.[38]

References



  1. Spencer M (10 January 1959). "The stereochemistry of deoxyribonucleic acid. II. Hydrogen-bonded pairs of bases". Acta Crystallographica. 12 (1): 66–71. doi:10.1107/S0365110X59000160. ISSN 0365-110X.
  2. Zhurkin VB, Tolstorukov MY, Xu F, Colasanti AV, Olson WK (2005). "Sequence-Dependent Variability of B-DNA". DNA Conformation and Transcription. pp. 18–34. doi:10.1007/0-387-29148-2_2. ISBN 978-0-387-25579-8.
  3. Moran LA (2011-03-24). "The total size of the human genome is very likely to be ~3,200 Mb". Retrieved 2012-07-16.
  4. "The finished length of the human genome is 2.86 Gb". 2006-06-12. Retrieved 2012-07-16.
  5. "One copy of the human genome consists of approximately 3 billion base pairs of DNA". National Human Genome Research Institute. 2024-08-24.
  6. International Human Genome Sequencing Consortium (October 2004). "Finishing the euchromatic sequence of the human genome". Nature. 431 (7011): 931–945. Bibcode:2004Natur.431..931H. doi:10.1038/nature03001. PMID 15496913.
  7. Cockburn AF, Newkirk MJ, Firtel RA (December 1976). "Organization of the ribosomal RNA genes of Dictyostelium discoideum: mapping of the nontranscribed spacer regions". Cell. 9 (4 Pt 1): 605–613. doi:10.1016/0092-8674(76)90043-X. PMID 1034500. S2CID 31624366.
  8. Nuwer R (18 July 2015). "Counting All the DNA on Earth". The New York Times. New York. ISSN 0362-4331. Archived from the original on 2022-01-01. Retrieved 2015-07-18.
  9. "The Biosphere: Diversity of Life". Aspen Global Change Institute. Basalt, CO. Archived from the original on 2014-11-10. Retrieved 2015-07-19.
  10. Yakovchuk P, Protozanova E, Frank-Kamenetskii MD (2006-01-30). "Base-stacking and base-pairing contributions into thermal stability of the DNA double helix". Nucleic Acids Research. 34 (2): 564–574. doi:10.1093/nar/gkj454. PMC 1360284. PMID 16449200.
  11. Trautner TA, Swartz MN, Kornberg A (March 1962). "Enzymatic synthesis of deoxyribonucleic acid. X. Influence of bromouracil substitutions on replication". Proceedings of the National Academy of Sciences of the United States of America. 48 (3): 449–455. doi:10.1073/pnas.48.3.449. PMC 220799. PMID 13922323.
  12. Krebs JE, Goldstein ES, Kilpatrick ST, Lewin B (2018). "Genes are DNA and Encode RNAs and Polypeptides". Lewin's genes XII (12th ed.). Burlington, Mass: Jones & Bartlett Learning. p. 12. ISBN 978-1-284-10449-3. "Each mutagenic event in the presence of an acridine results in the addition or removal of a single base pair."
  13. Putnam CD (September 2021). "Strand discrimination in DNA mismatch repair". DNA Repair. 105: 103161. doi:10.1016/j.dnarep.2021.103161. PMC 8785607. PMID 34171627.
  14. Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P (December 2014). Molecular Biology of the Cell (6th ed.). New York/Abingdon: Garland Science, Taylor & Francis Group. p. 177. ISBN 978-0-8153-4432-2.
  15. "NIH ORDR – Glossary – C". Archived from the original on 2012-07-17. Retrieved 2012-07-16.
  16. Scott MP, Matsudaira P, Lodish H, Darnell J, Zipursky L, Kaiser CA, Berk A, Krieger M (2004). Molecular Cell Biology (Fifth ed.). San Francisco: W. H. Freeman. p. 396. "... humans 1 centimorgan on average represents a distance of about 7.5×10⁵ base pairs."
  17. Fikes BJ (May 8, 2014). "Life engineered with expanded genetic code". San Diego Union Tribune. Archived from the original on 9 May 2014. Retrieved 8 May 2014.
  18. Yang Z, Chen F, Alvarado JB, Benner SA (September 2011). "Amplification, mutation, and sequencing of a six-letter synthetic genetic system". Journal of the American Chemical Society. 133 (38): 15105–15112. doi:10.1021/ja204910n. PMC 3427765. PMID 21842904.
  19. Yamashige R, Kimoto M, Takezawa Y, Sato A, Mitsui T, Yokoyama S, Hirao I (March 2012). "Highly specific unnatural base pair systems as a third base pair for PCR amplification". Nucleic Acids Research. 40 (6): 2793–2806. doi:10.1093/nar/gkr1068. PMC 3315302. PMID 22121213.
  20. Malyshev DA, Dhami K, Quach HT, Lavergne T, Ordoukhanian P, Torkamani A, Romesberg FE (July 2012). "Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet". Proceedings of the National Academy of Sciences of the United States of America. 109 (30): 12005–12010. Bibcode:2012PNAS..10912005M. doi:10.1073/pnas.1205176109. PMC 3409741. PMID 22773812.
  21. Takezawa Y, Müller J, Shionoya M (2017-05-05). "Artificial DNA Base Pairing Mediated by Diverse Metal Ions". Chemistry Letters. 46 (5): 622–633. doi:10.1246/cl.160985. ISSN 0366-7022.
  22. Switzer C, Moroney SE, Benner SA (1989). "Enzymatic incorporation of a new base pair into DNA and RNA". J. Am. Chem. Soc. 111 (21): 8322–8323. doi:10.1021/ja00203a067.
  23. Callaway E (May 7, 2014). "Scientists Create First Living Organism With 'Artificial' DNA". Nature News. Huffington Post. Retrieved 8 May 2014.
  24. Hirao I, Ohtsuki T, Fujiwara T, Mitsui T, Yokogawa T, Okuni T, et al. (February 2002). "An unnatural base pair for incorporating amino acid analogs into proteins". Nature Biotechnology. 20 (2): 177–182. doi:10.1038/nbt0202-177. PMID 11821864. S2CID 22055476.
  25. Hirao I, Kimoto M, Mitsui T, Fujiwara T, Kawai R, Sato A, et al. (September 2006). "An unnatural hydrophobic base pair system: site-specific incorporation of nucleotide analogs into DNA and RNA". Nature Methods. 3 (9): 729–735. doi:10.1038/nmeth915. PMID 16929319. S2CID 6494156.
  26. Kimoto M, Kawai R, Mitsui T, Yokoyama S, Hirao I (February 2009). "An unnatural base pair system for efficient PCR amplification and functionalization of DNA molecules". Nucleic Acids Research. 37 (2): e14. doi:10.1093/nar/gkn956. PMC 2632903. PMID 19073696.
  27. Yamashige R, Kimoto M, Takezawa Y, Sato A, Mitsui T, Yokoyama S, Hirao I (March 2012). "Highly specific unnatural base pair systems as a third base pair for PCR amplification". Nucleic Acids Research. 40 (6): 2793–2806. doi:10.1093/nar/gkr1068. PMC 3315302. PMID 22121213.
  28. Kimoto M, Yamashige R, Matsunaga K, Yokoyama S, Hirao I (May 2013). "Generation of high-affinity DNA aptamers using an expanded genetic alphabet". Nature Biotechnology. 31 (5): 453–457. doi:10.1038/nbt.2556. PMID 23563318. S2CID 23329867.
  29. Malyshev DA, Dhami K, Lavergne T, Chen T, Dai N, Foster JM, et al. (May 2014). "A semi-synthetic organism with an expanded genetic alphabet". Nature. 509 (7500): 385–388. Bibcode:2014Natur.509..385M. doi:10.1038/nature13314. PMC 4058825. PMID 24805238.
  30. Sample I (May 7, 2014). "First life forms to pass on artificial DNA engineered by US scientists". The Guardian. Retrieved 8 May 2014.
  31. "Scientists create first living organism containing artificial DNA". The Wall Street Journal. Fox News. May 8, 2014. Retrieved 8 May 2014.
  32. Pollack A (May 7, 2014). "Scientists Add Letters to DNA's Alphabet, Raising Hope and Fear". New York Times. Retrieved 8 May 2014.
  33. Nikolova EN, Kim E, Wise AA, O'Brien PJ, Andricioaei I, Al-Hashimi HM (February 2011). "Transient Hoogsteen base pairs in canonical duplex DNA". Nature. 470 (7335): 498–502. Bibcode:2011Natur.470..498N. doi:10.1038/nature09775. PMC 3074620. PMID 21270796.
  34. Murphy FV, Ramakrishnan V (December 2004). "Structure of a purine-purine wobble base pair in the decoding center of the ribosome". Nature Structural & Molecular Biology. 11 (12): 1251–1252. doi:10.1038/nsmb866. PMID 15558050. S2CID 27022506.
  35. Vargas-Rodriguez O, Musier-Forsyth K (June 2014). "Structural biology: wobble puts RNA on target". Nature. 510 (7506): 480–481. doi:10.1038/nature13502. PMID 24919145. S2CID 205239383.
  36. Garg A, Heinemann U (February 2018). "A novel form of RNA double helix based on G·U and C·A+ wobble base pairing". RNA. 24 (2): 209–218. doi:10.1261/rna.064048.117. PMC 5769748. PMID 29122970.
  37. Aishima J, Gitti RK, Noah JE, Gan HH, Schlick T, Wolberger C (December 2002). "A Hoogsteen base pair embedded in undistorted B-DNA". Nucleic Acids Research. 30 (23): 5244–5252. doi:10.1093/nar/gkf661. PMC 137974. PMID 12466549.
  38. Leontis NB, Westhof E (June 2003). "Analysis of RNA motifs". Current Opinion in Structural Biology. 13 (3): 300–308. doi:10.1016/S0959-440X(03)00076-9. PMID 12831880.



External links

  • DAN: webserver version of the EMBOSS tool for calculating melting temperatures

