Chloroplast DNA (cpDNA), also known as plastid DNA (ptDNA), is the DNA located in chloroplasts, which are photosynthetic organelles located within the cells of some eukaryotic organisms. Chloroplasts, like other types of plastid, contain a genome separate from that in the cell nucleus. The existence of chloroplast DNA was identified biochemically in 1959,[1] and confirmed by electron microscopy in 1962.[2] The discoveries that the chloroplast contains ribosomes[3] and performs protein synthesis[4] revealed that the chloroplast is genetically semi-autonomous. The first complete chloroplast genome sequences were published in 1986: Nicotiana tabacum (tobacco) by Sugiura and colleagues and Marchantia polymorpha (liverwort) by Ozeki et al.[5][6] Since then, tens of thousands of chloroplast genomes from various species have been sequenced.
Chloroplast DNAs are circular, and are typically 120,000–170,000 base pairs long.[7][8][9] They can have a contour length of around 30–60 micrometers, and have a mass of about 80–130 million daltons.[10]
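As a rough consistency check on these figures, contour length and mass can be estimated from the base-pair count. The sketch below assumes a rise of about 0.34 nm per base pair (B-form DNA) and an average of about 650 daltons per base pair. Neither constant comes from the text above, and the function names are illustrative only.

```python
# Rough estimates only: both constants are textbook approximations,
# not values taken from this article.
NM_PER_BP = 0.34    # assumed rise per base pair in B-form DNA (nanometers)
DA_PER_BP = 650.0   # assumed average mass of one base pair (daltons)


def contour_length_um(base_pairs: int) -> float:
    """Approximate contour length in micrometers."""
    return base_pairs * NM_PER_BP / 1000.0


def mass_megadaltons(base_pairs: int) -> float:
    """Approximate mass in millions of daltons."""
    return base_pairs * DA_PER_BP / 1e6


for bp in (120_000, 170_000):
    print(bp, round(contour_length_um(bp), 1), round(mass_megadaltons(bp), 1))
```

Under these assumptions, the 120,000–170,000 base pair range gives values that fall in or close to the quoted 30–60 micrometer and 80–130 million dalton ranges.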
Most chloroplasts have their entire chloroplast genome combined into a single large ring, though those of dinophyte algae are a notable exception: their genome is broken up into about forty small plasmids, each 2,000–10,000 base pairs long.[11] Each minicircle contains one to three genes,[11] but blank plasmids, with no coding DNA, have also been found.
Chloroplast DNA has long been thought to have a circular structure, but some evidence suggests that chloroplast DNA more commonly takes a linear shape.[12] Over 95% of the chloroplast DNA in corn chloroplasts has been observed to be in branched linear form rather than individual circles.[11]
Many chloroplast DNAs contain two inverted repeats, which separate a long single copy section (LSC) from a short single copy section (SSC).[9]
The inverted repeats vary widely in length, ranging from 4,000 to 25,000 base pairs each.[11] Inverted repeats in plants tend to be at the upper end of this range, each being 20,000–25,000 base pairs long.[9][13] The inverted repeat regions usually contain three ribosomal RNA and two tRNA genes, but they can be expanded or reduced to contain as few as four or as many as over 150 genes.[11] While a given pair of inverted repeats is rarely completely identical, the two are always very similar to each other, apparently as a result of concerted evolution.[11]
The inverted repeat regions are highly conserved among land plants, and accumulate few mutations.[9][13] Similar inverted repeats exist in the genomes of cyanobacteria and the other two chloroplast lineages (glaucophyta and rhodophyceæ), suggesting that they predate the chloroplast,[11] though some chloroplast DNAs, like those of peas and a few red algae,[11] have since lost the inverted repeats.[13][14] Others, like the red alga Porphyra, flipped one of their inverted repeats (making them direct repeats).[11] It is possible that the inverted repeats help stabilize the rest of the chloroplast genome, as chloroplast DNAs which have lost some of the inverted repeat segments tend to get rearranged more.[14]
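The four-part layout described above (LSC, one inverted repeat, SSC, the second inverted repeat) can be illustrated with a toy sequence. This is a minimal sketch, not a model of any real plastome: the sequences are invented, the two repeat copies are assumed to be exactly identical, and the function names are hypothetical.

```python
# Toy illustration of the LSC / IR / SSC / IR layout; all sequences are made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_quadripartite(lsc: str, ir: str, ssc: str) -> str:
    """Join LSC + IRa + SSC + IRb, where IRb is IRa read in the opposite
    orientation on the other strand (that is, its reverse complement)."""
    return lsc + ir + ssc + reverse_complement(ir)


lsc, ir, ssc = "ATGGCATTC", "GGATCA", "TTAC"
genome = build_quadripartite(lsc, ir, ssc)

# The second repeat copy is the reverse complement of the first.
second_copy = genome[-len(ir):]
print(second_copy == reverse_complement(ir))

# Total length is LSC + SSC + two repeat copies.
print(len(genome) == len(lsc) + len(ssc) + 2 * len(ir))
```

A direct repeat, as reported for Porphyra, would correspond to appending `ir` itself rather than its reverse complement.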
Each chloroplast contains around 100 copies of its DNA in young leaves, declining to 15–20 copies in older leaves.[15] They are usually packed into nucleoids, which can contain several identical chloroplast DNA rings. Many nucleoids can be found in each chloroplast.[10]
Though chloroplast DNA is not associated with true histones,[16] a histone-like chloroplast protein (HC) has been found in red algae; it is coded by the chloroplast DNA and tightly packs each chloroplast DNA ring into a nucleoid.[17]
In primitive red algae, the chloroplast DNA nucleoids are clustered in the center of a chloroplast, while in green plants and green algae, the nucleoids are dispersed throughout the stroma.[17]
More than 5000 chloroplast genomes have been sequenced and are accessible via the NCBI organelle genome database.[18] The first chloroplast genomes were sequenced in 1986, from tobacco (Nicotiana tabacum)[19] and liverwort (Marchantia polymorpha).[20] Comparison of the gene sequences of the cyanobacterium Synechocystis to those of the chloroplast genome of Arabidopsis provided confirmation of the endosymbiotic origin of the chloroplast.[21][22] It also demonstrated the significant extent of gene transfer from the cyanobacterial ancestor to the nuclear genome.
In most plant species, the chloroplast genome encodes approximately 120 genes.[23][24] The genes primarily encode core components of the photosynthetic machinery and factors involved in their expression and assembly.[25] Across species of land plants, the set of genes encoded by the chloroplast genome is fairly conserved. This includes four ribosomal RNAs, approximately 30 tRNAs, 21 ribosomal proteins, and 4 subunits of the plastid-encoded RNA polymerase complex that are involved in plastid gene expression.[25] The large Rubisco subunit and 28 photosynthetic thylakoid proteins are encoded within the chloroplast genome.[25]
Over time, many parts of the chloroplast genome were transferred to the nuclear genome of the host,[7][8][26] a process called endosymbiotic gene transfer. As a result, the chloroplast genome is heavily reduced compared to that of free-living cyanobacteria. Chloroplasts may contain 60–100 genes whereas cyanobacteria often have more than 1500 genes in their genome.[27] The parasitic Pilostyles have even lost their plastid genes for tRNA.[28] Conversely, there are only a few known instances where genes have been transferred to the chloroplast from various donors, including bacteria.[29][30][31]
Endosymbiotic gene transfer is how we know about the lost chloroplasts in many chromalveolate lineages. Even if a chloroplast is eventually lost, the genes it donated to the former host's nucleus persist, providing evidence for the lost chloroplast's existence. For example, while diatoms (a heterokontophyte) now have a red algal derived chloroplast, the presence of many green algal genes in the diatom nucleus provides evidence that the diatom ancestor (probably the ancestor of all chromalveolates too) had a green algal derived chloroplast at some point, which was subsequently replaced by the red chloroplast.[32]
In land plants, some 11–14% of the DNA in their nuclei can be traced back to the chloroplast,[33] up to 18% in Arabidopsis, corresponding to about 4,500 protein-coding genes.[34] There have been a few recent transfers of genes from the chloroplast DNA to the nuclear genome in land plants.[8]
Of the approximately three thousand proteins found in chloroplasts, some 95% are encoded by nuclear genes. Many of the chloroplast's protein complexes consist of subunits from both the chloroplast genome and the host's nuclear genome. As a result, protein synthesis must be coordinated between the chloroplast and the nucleus. The chloroplast is mostly under nuclear control, though chloroplasts can also give out signals regulating gene expression in the nucleus, a process called retrograde signaling.[35]
Protein synthesis within chloroplasts relies on an RNA polymerase coded by the chloroplast's own genome, which is related to RNA polymerases found in bacteria. Chloroplasts also contain a mysterious second RNA polymerase that is encoded by the plant's nuclear genome. The two RNA polymerases may recognize and bind to different kinds of promoters within the chloroplast genome.[36] The ribosomes in chloroplasts are similar to bacterial ribosomes.[37]
RNA editing is the insertion, deletion, and substitution of nucleotides in an mRNA transcript prior to translation to protein. The highly oxidative environment inside chloroplasts increases the rate of mutation, so post-transcriptional repairs are needed to conserve functional sequences. The chloroplast editosome substitutes C → U and U → C at very specific locations on the transcript. This can change the codon for an amino acid or restore a non-functional pseudogene by adding an AUG start codon or removing a premature UAA stop codon.[38]
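The two kinds of substitution described above can be sketched on toy transcripts. The sequences and function names below are invented for illustration and do not correspond to real chloroplast genes or editing sites.

```python
# Toy transcripts only; the positions edited here are hypothetical.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def substitute(transcript: str, index: int, expected: str, replacement: str) -> str:
    """Replace one nucleotide, checking that the expected base is present."""
    if transcript[index] != expected:
        raise ValueError(
            f"expected {expected} at index {index}, found {transcript[index]}"
        )
    return transcript[:index] + replacement + transcript[index + 1:]


def codons(transcript: str) -> list:
    """Split a transcript into consecutive triplets, reading from index 0."""
    return [transcript[i:i + 3] for i in range(0, len(transcript) - 2, 3)]


# C -> U creates a start codon: ACG becomes AUG.
with_start = substitute("ACGGCUUAA", 1, "C", "U")

# U -> C removes a premature stop codon: UAA becomes CAA.
without_stop = substitute("AUGUAAGCU", 3, "U", "C")

print(codons(with_start))
print(codons(without_stop))
print(any(c in STOP_CODONS for c in codons(without_stop)))
```

The `expected` argument mirrors the site specificity of editing: the substitution is refused if the base at that position is not the one the edit is meant to change.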
The editosome recognizes and binds to a cis sequence upstream of the editing site. The distance between the binding site and the editing site varies by gene and by the proteins involved in the editosome. Hundreds of different PPR proteins from the nuclear genome are involved in the RNA editing process. These proteins consist of repeated 35-amino-acid motifs, the sequence of which determines the cis binding site for the edited transcript.[38]
Basal land plants such as liverworts, mosses and ferns have hundreds of different editing sites while flowering plants typically have between thirty and forty. Parasitic plants such as Epifagus virginiana show a loss of RNA editing resulting in a loss of function for photosynthesis genes.[39]
The mechanism for chloroplast DNA (cpDNA) replication has not been conclusively determined, but two main models have been proposed. Scientists have attempted to observe chloroplast replication via electron microscopy since the 1970s.[40][41] The results of the microscopy experiments led to the idea that chloroplast DNA replicates using a double displacement loop (D-loop). As the D-loop moves through the circular DNA, it adopts a theta intermediary form, also known as a Cairns replication intermediate, and completes replication with a rolling circle mechanism.[40][12] Replication starts at specific points of origin. Multiple replication forks open up, allowing replication machinery to replicate the DNA. As replication continues, the forks grow and eventually converge. The new cpDNA structures separate, creating daughter cpDNA chromosomes.
In addition to the early microscopy experiments, this model is also supported by the amounts of deamination seen in cpDNA.[40] Deamination occurs when an amino group is lost and is a mutation that often results in base changes. When adenine is deaminated, it becomes hypoxanthine (H). Hypoxanthine can bind to cytosine, and when the HC base pair is replicated, it becomes a GC (thus, an A → G base change).[42]
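The A → G change can be traced through two rounds of copying in a small sketch. It is a simplification under stated assumptions: strands are written position by position, so strand orientation is ignored; H stands for hypoxanthine; and the four-base sequence and function names are invented.

```python
# H = hypoxanthine, which pairs with cytosine. Strand orientation is ignored:
# each copy is written position by position against its template.
PAIRS_WITH = {"A": "T", "T": "A", "G": "C", "C": "G", "H": "C"}


def deaminate_adenine(strand: str, index: int) -> str:
    """Convert the adenine at the given index into hypoxanthine (H)."""
    if strand[index] != "A":
        raise ValueError("only adenine is deaminated in this sketch")
    return strand[:index] + "H" + strand[index + 1:]


def copy_strand(template: str) -> str:
    """Return the partner strand synthesized against the template."""
    return "".join(PAIRS_WITH[base] for base in template)


original = "GATC"
damaged = deaminate_adenine(original, 1)  # A becomes H
first_copy = copy_strand(damaged)         # C is placed opposite H
second_copy = copy_strand(first_copy)     # G is placed opposite that C

print(original, damaged, first_copy, second_copy)
```

After the second round, the position that started as A reads G, which is the base change described above.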
In cpDNA, there are several A → G deamination gradients. DNA becomes susceptible to deamination events when it is single stranded. When replication forks form, the strand not being copied is single stranded, and thus at risk for A → G deamination. Therefore, gradients in deamination indicate that replication forks were most likely present and the direction that they initially opened (the highest gradient is most likely nearest the start site because it was single stranded for the longest amount of time).[40] This mechanism is still the leading theory today; however, a second theory suggests that most cpDNA is actually linear and replicates through homologous recombination. It further contends that only a minority of the genetic material is kept in circular chromosomes while the rest is in branched, linear, or other complex structures.[40][12]
One of the main competing models for cpDNA replication asserts that most cpDNA is linear and participates in homologous recombination and replication structures similar to bacteriophage T4.[12] It has been established that some plants have linear cpDNA, such as maize, and that more still contain complex structures that scientists do not yet understand;[12] however, the predominant view today is that most cpDNA is circular. When the original experiments on cpDNA were performed, scientists did notice linear structures; however, they attributed these linear forms to broken circles.[12] If the branched and complex structures seen in cpDNA experiments are real and not artifacts of concatenated circular DNA or broken circles, then a D-loop mechanism of replication is insufficient to explain how those structures would replicate.[12] At the same time, homologous recombination does not explain the multiple A → G gradients seen in plastomes.[40] This shortcoming is one of the biggest for the linear structure theory.
The movement of so many chloroplast genes to the nucleus means that many chloroplast proteins that were supposed to be translated in the chloroplast are now synthesized in the cytoplasm. This means that these proteins must be directed back to the chloroplast, and imported through at least two chloroplast membranes.[43]
Curiously, around half of the protein products of transferred genes aren't even targeted back to the chloroplast. Many became exaptations, taking on new functions like participating in cell division, protein routing, and even disease resistance. A few chloroplast genes found new homes in the mitochondrial genome; most became nonfunctional pseudogenes, though a few tRNA genes still work in the mitochondrion.[27] Some transferred chloroplast DNA protein products get directed to the secretory pathway[27] (though many secondary plastids are bounded by an outermost membrane derived from the host's cell membrane, and are therefore topologically outside of the cell, because to reach the chloroplast from the cytosol, a protein has to cross the cell membrane, just as if it were headed for the extracellular space. In those cases, chloroplast-targeted proteins do initially travel along the secretory pathway).[44]
Because the cell acquiring a chloroplast already had mitochondria (and peroxisomes, and a cell membrane for secretion), the new chloroplast host had to develop a unique protein targeting system to avoid having chloroplast proteins being sent to the wrong organelle.[43]
Polypeptides, the precursors of proteins, are chains of amino acids. The two ends of a polypeptide are called the N-terminus, or amino end, and the C-terminus, or carboxyl end.[45] For many (but not all)[46] chloroplast proteins encoded by nuclear genes, cleavable transit peptides are added to the N-termini of the polypeptides, which are used to help direct the polypeptide to the chloroplast for import[43][47] (N-terminal transit peptides are also used to direct polypeptides to plant mitochondria).[48] N-terminal transit sequences are also called presequences[43] because they are located at the "front" end of a polypeptide: ribosomes synthesize polypeptides from the N-terminus to the C-terminus.[45]
Chloroplast transit peptides exhibit huge variation in length and amino acid sequence.[47] They can be from 20 to 150 amino acids long,[43] an unusually long length, suggesting that transit peptides are actually collections of domains with different functions.[47] Transit peptides tend to be positively charged,[43] rich in the hydroxylated amino acids serine and threonine as well as in proline, and poor in acidic amino acids like aspartic acid and glutamic acid.[47] In an aqueous solution, the transit sequence forms a random coil.[43]
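The compositional tendencies listed above (positive charge, many serine and threonine residues, few acidic residues) are easy to tabulate for a given sequence. The sketch below uses an invented 20-residue sequence, not a real transit peptide, and a deliberately crude charge count that considers only lysine, arginine, aspartate, and glutamate; the function names are hypothetical.

```python
# One-letter amino acid codes. The charge estimate is simplified:
# K and R count as +1, D and E as -1, everything else as 0.
HYDROXYLATED = set("ST")  # serine, threonine
ACIDIC = set("DE")        # aspartic acid, glutamic acid
BASIC = set("KR")         # lysine, arginine


def fraction(sequence: str, residues: set) -> float:
    """Fraction of positions in the sequence occupied by the given residues."""
    return sum(aa in residues for aa in sequence) / len(sequence)


def net_charge(sequence: str) -> int:
    """Crude net charge: basic residues minus acidic residues."""
    return sum(aa in BASIC for aa in sequence) - sum(aa in ACIDIC for aa in sequence)


toy_peptide = "MASSTLSSRAKTPSLRSTSA"  # invented example, not a real sequence

print("Ser/Thr fraction:", fraction(toy_peptide, HYDROXYLATED))
print("acidic fraction:", fraction(toy_peptide, ACIDIC))
print("net charge:", net_charge(toy_peptide))
```

A sequence with a high Ser/Thr fraction, no acidic residues, and a positive net charge matches the general profile described above, though composition alone does not establish that a sequence functions as a transit peptide.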
Not all chloroplast proteins include an N-terminal cleavable transit peptide though.[43] Some include the transit sequence within the functional part of the protein itself.[43] A few have their transit sequence appended to their C-terminus instead.[49] Most of the polypeptides that lack N-terminal targeting sequences are the ones that are sent to the outer chloroplast membrane, plus at least one sent to the inner chloroplast membrane.[43]
After a chloroplast polypeptide is synthesized on a ribosome in the cytosol, ATP energy can be used to phosphorylate, or add a phosphate group to, many (but not all) of them in their transit sequences.[43] Serine and threonine (both very common in chloroplast transit sequences, making up 20–30% of the sequence)[50] are often the amino acids that accept the phosphate group.[48][50] The enzyme that carries out the phosphorylation is specific for chloroplast polypeptides, and ignores ones meant for mitochondria or peroxisomes.[50]
Phosphorylation changes the polypeptide's shape,[50] making it easier for 14-3-3 proteins to attach to the polypeptide.[43][51] In plants, 14-3-3 proteins only bind to chloroplast preproteins.[48] The polypeptide is also bound by the heat shock protein Hsp70, which keeps it from folding prematurely.[43] This is important because it prevents chloroplast proteins from assuming their active form and carrying out their chloroplast functions in the wrong place, the cytosol.[48][51] At the same time, they have to keep just enough shape so that they can be recognized and imported into the chloroplast.[48]
The heat shock protein and the 14-3-3 proteins together form a cytosolic guidance complex that makes it easier for the chloroplast polypeptide to get imported into the chloroplast.[43]
Alternatively, if a chloroplast preprotein's transit peptide is not phosphorylated, a chloroplast preprotein can still attach to a heat shock protein or Toc159. These complexes can bind to the TOC complex on the outer chloroplast membrane using GTP energy.[43]
The TOC complex, or translocon on the outer chloroplast membrane, is a collection of proteins that imports preproteins across the outer chloroplast envelope. Five subunits of the TOC complex have been identified: two GTP-binding proteins, Toc34 and Toc159; the protein import tunnel Toc75; and the proteins Toc64[43] and Toc12.[46]
The first three proteins form a core complex that consists of one Toc159, four to five Toc34s, and four Toc75s that form four holes in a disk 13 nanometers across. The whole core complex weighs about 500 kilodaltons. The other two proteins, Toc64 and Toc12, are associated with the core complex but are not part of it.[46]
Toc34 is an integral protein in the outer chloroplast membrane that's anchored into it by its hydrophobic[53] C-terminal tail.[43][51] Most of the protein, however, including its large guanosine triphosphate (GTP)-binding domain, projects out into the cytosol.[51]
Toc34's job is to catch some chloroplast preproteins in the cytosol and hand them off to the rest of the TOC complex.[43] When GTP, an energy molecule similar to ATP, attaches to Toc34, the protein becomes much more able to bind to many chloroplast preproteins in the cytosol.[43] The chloroplast preprotein's presence causes Toc34 to break GTP into guanosine diphosphate (GDP) and inorganic phosphate. This loss of GTP makes the Toc34 protein release the chloroplast preprotein, handing it off to the next TOC protein.[43] Toc34 then releases the depleted GDP molecule, probably with the help of an unknown GDP exchange factor. A domain of Toc159 might be the exchange factor that carries out the GDP removal. The Toc34 protein can then take up another molecule of GTP and begin the cycle again.[43]
Toc34 can be turned off through phosphorylation. A protein kinase drifting around on the outer chloroplast membrane can use ATP to add a phosphate group to the Toc34 protein, preventing it from being able to receive another GTP molecule, inhibiting the protein's activity. This might provide a way to regulate protein import into chloroplasts.[43][51]
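The Toc34 cycle and its phosphorylation switch, as described above, can be summarized as a small state machine. This is a sketch of the narrative only, not a kinetic or structural model; the state names, event names, and the `Toc34Model` class are all invented for illustration.

```python
from enum import Enum, auto


class State(Enum):
    EMPTY = auto()             # no nucleotide bound
    GTP_BOUND = auto()         # ready to bind a preprotein
    PREPROTEIN_BOUND = auto()  # holding a preprotein, GTP still intact
    GDP_BOUND = auto()         # GTP hydrolyzed, preprotein handed off


# event name -> (state required before, state after)
TRANSITIONS = {
    "bind_gtp": (State.EMPTY, State.GTP_BOUND),
    "bind_preprotein": (State.GTP_BOUND, State.PREPROTEIN_BOUND),
    "hydrolyze_and_release": (State.PREPROTEIN_BOUND, State.GDP_BOUND),
    "release_gdp": (State.GDP_BOUND, State.EMPTY),
}

CYCLE = ["bind_gtp", "bind_preprotein", "hydrolyze_and_release", "release_gdp"]


class Toc34Model:
    """Hypothetical bookkeeping model of the cycle described in the text."""

    def __init__(self) -> None:
        self.state = State.EMPTY
        self.phosphorylated = False
        self.handed_off = 0

    def step(self, event: str) -> None:
        required, following = TRANSITIONS[event]
        if self.state is not required:
            raise ValueError(f"{event} is not possible in state {self.state.name}")
        if event == "bind_gtp" and self.phosphorylated:
            raise ValueError("phosphorylated Toc34 cannot take up GTP")
        if event == "hydrolyze_and_release":
            self.handed_off += 1  # preprotein passed to the next TOC protein
        self.state = following


receptor = Toc34Model()
for event in CYCLE:
    receptor.step(event)
print(receptor.state.name, receptor.handed_off)
```

Setting `phosphorylated = True` on an empty receptor blocks the `bind_gtp` step, which corresponds to the kinase-mediated switch-off described above.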
Arabidopsis thaliana has two homologous proteins, AtToc33 and AtToc34 (the At stands for Arabidopsis thaliana),[43][51] which are each about 60% identical in amino acid sequence to Toc34 in peas (called psToc34).[51] AtToc33 is the most common in Arabidopsis,[51] and it is the functional analogue of Toc34 because it can be turned off by phosphorylation. AtToc34, on the other hand, cannot be phosphorylated.[43][51]
Toc159 is another GTP-binding TOC subunit, like Toc34. Toc159 has three domains. At the N-terminal end is the A-domain, which is rich in acidic amino acids and takes up about half the protein length.[43][53] The A-domain is often cleaved off, leaving an 86 kilodalton fragment called Toc86.[53] In the middle is its GTP-binding domain, which is very similar to the homologous GTP-binding domain in Toc34.[43][53] At the C-terminal end is the hydrophilic M-domain,[43] which anchors the protein to the outer chloroplast membrane.[53]
Toc159 probably works a lot like Toc34, recognizing proteins in the cytosol using GTP. It can be regulated through phosphorylation, but by a different protein kinase than the one that phosphorylates Toc34.[46] Its M-domain forms part of the tunnel that chloroplast preproteins travel through, and seems to provide the force that pushes preproteins through, using the energy from GTP.[43]
Toc159 is not always found as part of the TOC complex; it has also been found dissolved in the cytosol. This suggests that it might act as a shuttle that finds chloroplast preproteins in the cytosol and carries them back to the TOC complex. There isn't a lot of direct evidence for this behavior though.[43]
A family of Toc159 proteins, Toc159, Toc132, Toc120, and Toc90, has been found in Arabidopsis thaliana. They vary in the length of their A-domains, which is completely gone in Toc90. Toc132, Toc120, and Toc90 seem to have specialized functions, such as importing nonphotosynthetic preproteins, and can't replace Toc159.[43]
Toc75 is the most abundant protein on the outer chloroplast envelope. It is a transmembrane tube that forms most of the TOC pore itself. Toc75 is a β-barrel channel lined by 16 β-pleated sheets.[43] The hole it forms is about 2.5 nanometers wide at the ends, and shrinks to about 1.4–1.6 nanometers in diameter at its narrowest point, wide enough to allow partially folded chloroplast preproteins to pass through.[43]
Toc75 can also bind to chloroplast preproteins, but is a lot worse at this than Toc34 or Toc159.[43]
Arabidopsis thaliana has multiple isoforms of Toc75 that are named by the chromosomal positions of the genes that code for them. AtToc75 III is the most abundant of these.[43]
The TIC translocon, or translocon on the inner chloroplast membrane,[43] is another protein complex that imports proteins across the inner chloroplast envelope. Chloroplast polypeptide chains probably often travel through the two complexes at the same time, but the TIC complex can also retrieve preproteins lost in the intermembrane space.[43]
Like the TOC translocon, the TIC translocon has a large core complex surrounded by some loosely associated peripheral proteins like Tic110, Tic40, and Tic21.[54] The core complex weighs about one million daltons and contains Tic214, Tic100, Tic56, and Tic20 I, possibly three of each.[54]
Tic20 is an integral protein thought to have four transmembrane α-helices.[43] It is found in the 1 million dalton TIC complex.[54] Because it is similar to bacterial amino acid transporters and the mitochondrial import protein Tim17[43] (translocase on the inner mitochondrial membrane),[55] it has been proposed to be part of the TIC import channel.[43] There is no in vitro evidence for this though.[43] In Arabidopsis thaliana, it is known that for about every five Toc75 proteins in the outer chloroplast membrane, there are two Tic20 I proteins (the main form of Tic20 in Arabidopsis) in the inner chloroplast membrane.[54]
Unlike Tic214, Tic100, or Tic56, Tic20 has homologous relatives in cyanobacteria and nearly all chloroplast lineages, suggesting it evolved before the first chloroplast endosymbiosis. Tic214, Tic100, and Tic56 are unique to chloroplastidan chloroplasts, suggesting that they evolved later.[54]
Tic214 is another TIC core complex protein, named because it weighs just under 214 kilodaltons. It is 1786 amino acids long and is thought to have six transmembrane domains on its N-terminal end. Tic214 is notable for being coded for by chloroplast DNA, more specifically the first open reading frame ycf1. Tic214 and Tic20 together probably make up the part of the one million dalton TIC complex that spans the entire membrane. Tic20 is buried inside the complex while Tic214 is exposed on both sides of the inner chloroplast membrane.[54]
Tic100 is a nuclear encoded protein that's 871 amino acids long. The 871 amino acids collectively weigh slightly less than 100 thousand daltons, and since the mature protein probably doesn't lose any amino acids when it is itself imported into the chloroplast (it has no cleavable transit peptide), it was named Tic100. Tic100 is found at the edges of the 1 million dalton complex on the side that faces the chloroplast intermembrane space.[54]
Tic56 is also a nuclear encoded protein. The preprotein its gene encodes is 527 amino acids long, weighing close to 62 thousand daltons; the mature form probably undergoes processing that trims it down to something that weighs 56 thousand daltons when it gets imported into the chloroplast. Tic56 is largely embedded inside the 1 million dalton complex.[54]
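The lengths and masses quoted for Tic214, Tic100, and the Tic56 preprotein can be cross-checked with simple arithmetic. The sketch below treats the quoted masses as round figures (the text gives them as "just under", "slightly less than", and "close to"), and the estimate of how many residues are trimmed from Tic56 assumes the removed segment has the same average residue mass as the whole preprotein, which is an assumption made here, not a claim from the source.

```python
# (name, residues, approximate mass in daltons) as quoted in the text above
PROTEINS = [
    ("Tic214", 1786, 214_000),
    ("Tic100", 871, 100_000),
    ("Tic56 preprotein", 527, 62_000),
]


def daltons_per_residue(residues: int, mass_da: float) -> float:
    """Average mass per amino acid residue implied by the quoted figures."""
    return mass_da / residues


for name, residues, mass in PROTEINS:
    print(f"{name}: {daltons_per_residue(residues, mass):.1f} Da per residue")

# Rough size of the segment removed from Tic56 (62 kDa down to 56 kDa),
# assuming the trimmed part has the same average residue mass.
per_residue = daltons_per_residue(527, 62_000)
trimmed_residues = round((62_000 - 56_000) / per_residue)
print("Tic56 residues trimmed (estimate):", trimmed_residues)
```

All three proteins come out at a similar average mass per residue, which suggests the quoted lengths and masses are mutually consistent.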
Tic56 and Tic100 are highly conserved among land plants, but they don't resemble any protein whose function is known. Neither has any transmembrane domains.[54]