Proceedings of the National Academy of Sciences of the United States of America

Rolling-circle transposons in eukaryotes

Vladimir V Kapitonov1,*, Jerzy Jurka1
1Genetic Information Research Institute, 2081 Landings Drive, Mountain View, CA 94043

*To whom reprint requests should be addressed. E-mail: vladimir@charon.girinst.org.

Communicated by Margaret G. Kidwell, University of Arizona, Tucson, AZ

Received 2001 Apr 10; Accepted 2001 May 29; Issue date 2001 Jul 17.

Copyright © 2001, The National Academy of Sciences
PMCID: PMC37501  PMID: 11447285

Abstract

All eukaryotic DNA transposons reported so far belong to a single category of elements transposed by the so-called “cut-and-paste” mechanism. Here, we report a previously unknown category of eukaryotic DNA transposons, Helitron, which transpose by rolling-circle replication. Autonomous Helitrons encode a 5′-to-3′ DNA helicase and nuclease/ligase similar to those encoded by known rolling-circle replicons. Helitron-like transposons have conserved 5′-TC and CTRR-3′ termini and do not have terminal inverted repeats. They contain 16- to 20-bp hairpins separated by 10–12 nucleotides from the 3′-end and transpose precisely between the 5′-A and T-3′, with no modifications of the AT target sites. Together with their multiple diverged nonautonomous descendants, Helitrons constitute ≈2% of both the Arabidopsis thaliana and Caenorhabditis elegans genomes and also colonize the Oryza sativa genome. Sequence conservation suggests that Helitrons continue to be transposed.


Eukaryotic and prokaryotic genomes are populated by transposable elements (TEs) that are capable of intragenomic multiplication by transferring a DNA segment from one genomic site to another (1–3). On the basis of their transposition mechanisms, TEs can be divided into two classes: retrotransposons, which proliferate via reverse transcription, and DNA transposons, which are transposed without RNA intermediates. DNA transposons found so far in eukaryotic genomes have characteristic structural hallmarks, including terminal inverted repeats (TIRs) and 2- to 10-bp flanking direct repeats, generated by target site duplications (TSDs) on insertion of the transposons in the genome. Transposition of known DNA transposons in eukaryotes is described by the “cut-and-paste” model (4), according to which transposases encoded by transposons perform both the DNA cleavage and transfer reactions that are necessary to cut a transposon at both its termini and insert it into a new position. The majority of eukaryotic transposases belong to the DDE class, named after the highly conserved Asp (D), Asp (D), and Glu (E) amino acid residues of the catalytic core (4, 5). It has been suggested (6–8) that the bacterial transposons IS91, IS801, and IS1294 form an exotic family of prokaryotic rolling-circle (RC) transposons that do not belong to the DDE class, as they transpose via RC replication (RCR). These transposons share common 5′-AY and GTTC-3′ termini, do not possess TIRs, and do not generate TSDs. They encode only one protein, similar to the replication initiator proteins (Rep) from known RC replicons (6–9). There are three groups of episomal replicons using RCR: circular single-stranded DNA (ssDNA) bacteriophages (10), plasmids of bacteria or archaea (11), and geminiviruses (circular ssDNA viruses replicating in plant cells) (12). Usually, replication of RC replicons is catalyzed by the nuclease/ligase activity of Rep and is assisted by host DNA helicases and ssDNA-binding proteins (SSBs) (6–12).

Here, we report a previously unknown category of eukaryotic DNA transposons, named Helitron, that transpose as RC replicons in the Arabidopsis thaliana, Oryza sativa, and Caenorhabditis elegans genomes. Most TEs reported in this manuscript were active in recent evolutionary history and were reconstructed from their inactive copies accumulated in the respective genomes. Typically, the reconstruction process produces a consensus sequence without the insertions, deletions, and false stop codons accumulated in the inactive copies. This approach is well known and best illustrated by a recent study of Sleeping Beauty, a Tc1-like transposon from fish (13), reconstructed from its inactive copies and demonstrated to be transpositionally active in a test tube. Another, much more ancient example is a PiggyBac-like DNA transposon, Looper, discovered in the human genome [V.V.K. and J.J., Repbase Update (1998) www.girinst.org/Repbase_Update.html], whose consensus sequence is based on a multiple alignment of the inactive copies, which are ≈100 million years old. All genomic copies of Looper are mutated to the extent that no traces of its transposase could be detected at the sequence level. However, the transposase re-emerged from the virtual background noise after reconstructing the consensus sequence.

Materials and Methods

Computational Analysis.

TEs reported in the manuscript were identified by running DNA sequences of prospective TEs against GenBank by using the National Center for Biotechnology Information BLAST server (14), followed by CENSOR analysis at the Genetic Information Research Institute (GIRI) (15). CENSOR is much more sensitive than BLAST and was applied to determine the precise locations of sequences similar to the query and to identify distantly related DNA sequences (≈60% identical to each other).

We built consensus sequences of the transposons on the basis of a simple majority rule applied to their multiply aligned copies. Additional copies of transposons obtained because of redundant sequencing or chromosomal duplications unrelated to transposition were discarded on the basis of the identity of extended flanking regions.
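The majority rule mentioned above can be illustrated with a minimal Python sketch. It is not the authors' program (the alignments were produced with VMALN2 and PALN2 and edited by hand); the function name, the gap symbol, the tie-break, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(symbol.upper() for symbol in column)
        # Most frequent symbol wins. Ties prefer a base over a gap, then
        # alphabetical order (an arbitrary, hypothetical tie-break).
        winner = min(counts, key=lambda s: (-counts[s], s == gap, s))
        if winner != gap:  # a gap majority means the column is dropped
            consensus.append(winner)
    return "".join(consensus)


# Toy alignment of three hypothetical copies (not real Helitron data).
copies = ["ACGT-A", "ACGTTA", "ACCT-A"]
consensus = majority_consensus(copies)
```

In this toy case the single-copy substitution (C in the third column) and the single-copy insertion (T in the fifth column) are both voted out, which is the sense in which a consensus removes mutations accumulated by individual inactive copies.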

Distantly related proteins were identified by using the position-specific iterated PSI-BLAST search (16). Multiple alignments of protein sequences were produced by CLUSTAL W (17) and edited manually by using GENEDOC (18). Alignments of nucleotide sequences were performed by using VMALN2 and PALN2, programs developed at GIRI. Manual editing of nucleotide alignments was done by using MASE, a UNIX-based sequence editor (19). We used the GENSCAN (ref. 20; http://genes.mit.edu/GENSCAN.html) and FGENESH (ref. 21; http://genomic.sanger.ac.uk/gf/gf.html) programs to predict the exon/intron structures of genes encoded by Helitrons.

Monte Carlo Simulation.

We applied computer-assisted simulations to address whether conservation of the ssDNA-binding replication protein A (RPA)-like proteins in highly diverged families of Helitron indicates functional significance of those proteins for transposition. The premise behind the simulation is that a protein-coding sequence, free of functional constraints, will lose its coding capacity because of the accumulation of stop codons and the breakup of splice sites, leading to elimination of the protein-coding region. As a result, one would expect little protein-coding capacity to be preserved in two sequences 40% diverged from each other, as in the case of the ATRPA1H1 and ATRPA2H genes (Fig. 1), which encode ≈500-aa proteins 44% identical to each other. To test the impact of unselected mutations, we used the 2,600-bp ATRPA2H DNA sequence, containing both exons and introns, as a query to generate 100 random sequences. Every random sequence was generated by random mutations at N random positions in the query, without insertions and deletions (the simulation program written in Perl is available on request). N equals DL/100, where D stands for the sequence divergence (in percent) and L for the length of the query. Using GENSCAN, we obtained a distribution of lengths for proteins predicted in the set of 100 random DNA sequences D% divergent from the query. Assuming no functional role of RPA-like proteins in Helitrons, one should expect that the lengths of these proteins in two D% diverged Helitrons follow the random distribution.
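The mutation step of this simulation can be sketched in Python as follows. The original program was written in Perl and is not reproduced here, so this is a reconstruction under stated assumptions: the N = DL/100 positions are taken to be distinct, every mutation is a substitution to a different base, and the placeholder query stands in for the real ATRPA2H sequence. The gene-prediction step (GENSCAN) is outside the scope of the sketch.

```python
import random


def mutate(query, divergence_pct, rng):
    """Substitute N = D*L/100 positions of `query`; no insertions or deletions."""
    length = len(query)
    n_mutations = round(divergence_pct * length / 100)
    seq = list(query.upper())
    # Assumption: positions are distinct, so the result is exactly D% divergent.
    for pos in rng.sample(range(length), n_mutations):
        alternatives = [base for base in "ACGT" if base != seq[pos]]
        seq[pos] = rng.choice(alternatives)
    return "".join(seq)


def simulate(query, divergence_pct, n_sequences=100, seed=0):
    """Generate `n_sequences` independently mutated copies of `query`."""
    rng = random.Random(seed)
    return [mutate(query, divergence_pct, rng) for _ in range(n_sequences)]


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Placeholder 2,600-bp query (hypothetical; the real query is ATRPA2H).
query = "ACGT" * 650
# 61% identity to the query corresponds to D = 39.
random_set = simulate(query, divergence_pct=39)
```

With L = 2,600 and D = 39, each simulated sequence carries N = 1,014 substitutions. Each sequence would then be passed to a gene predictor to collect the lengths of the predicted proteins.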

Figure 1.

Reconstruction of the Helitron1 (A), Helitron2 (B), and Helitron1_CE (C) consensus sequences. The consensus sequences are schematically depicted as rectangles. Contiguous copies of Helitrons that we used for reconstruction of the consensus sequences are shown as bold lines beneath the rectangles. Gaps in the lines mark deletions of corresponding regions of the consensus sequences. GenBank accession nos. and sequence coordinates are indicated. Genes and their coordinates in the consensus sequences are indicated above the rectangles. Genes coding for proteins composed of the Rep and helicase domains are shaded in gray. The AT target sites are encircled.

Databases.

Sequences of TEs reported in the manuscript are deposited in the A. thaliana and C. elegans sections of Repbase Update (ref. 22; www.girinst.org/Repbase_Update.html) and are also included in supplemental information, which is available at www.girinst.org/~vladimir/RC/S.html.

Results and Discussion

Helitrons in the A. thaliana Genome.

During computational identification of DNA repeats, we found that the A. thaliana genome harbors multiple dispersed ≈10-kb units that encode two proteins similar to the yeast Pif1p DNA helicase (23, 24) and RPA (25, 26). On the basis of pairwise nucleotide identity, these DNA units can be divided into several groups of 2–10 sequences each. Typically, sequences from any particular group are ≈90% identical to each other, whereas they are only 60–70% identical to sequences from separate groups. We have derived consensus sequences for four groups, called hereafter Helitron1–4, of which the first two are discussed in detail in this paper (Table 1). The 15,809-bp Helitron1 consensus sequence harbors three genes (Fig. 1A) composed of multiple exons (20, 21), named ATRPA1H1, ATHEL1, and ATRPA1H2, respectively. ATRPA1H1 encodes an 842-aa protein, ATRPA1H1p, similar to RPA70, the largest subunit of RPA. RPA70 is conserved in human, fly, rice, frog, worm, and yeast and is composed of three domains, RPA70_1, RPA70_2, and RPA70_3 (25, 26). ATRPA1H1p (Fig. 1A) is composed of two divergent RPA70_2-like domains separated by an RPA70_3-like domain (data not shown). ATRPA1H2 is another gene that encodes an RPA70_2-like protein. This gene resides in the 3′-terminal segment of Helitron1 (Fig. 1A) and encodes a 262-aa protein (ATRPA1H2p), 20 and 39% identical to the first and second RPA70_2-like domains of ATRPA1H1p, respectively. It is known that both RPA70_2 and RPA70_3 bind ssDNA (25, 26). Therefore, both ATRPA1H1p and ATRPA1H2p are expected to be SSBs. ATHEL1 is the largest gene, residing in a central portion of Helitron1, between the ATRPA1H1 and ATRPA1H2 genes, and encodes a 1,697-aa protein (ATHEL1p). A ≈500-aa C-terminal portion of ATHEL1p is similar to numerous DNA helicases that unwind the DNA duplex in a 5′-to-3′ direction (10). These helicases belong to the SF1 superfamily, encompassing a broad spectrum of eukaryotic, prokaryotic, and viral proteins characterized by a specific set of seven conserved motifs (27). As shown in Fig. 2, all these motifs are present in ATHEL1p.

Table 1.

RC transposons in the A. thaliana, O. sativa, and C. elegans genomes

| Family | Length, bp | Rep-helicase | RPA | Nonautonomous families |
|---|---|---|---|---|
| A. thaliana | | | | |
| Helitron1 (7) | 15,809 | ATHEL1p | ATRPA1H1p, ATRPA1H2p | Helitrony1A (1348, 10), Helitrony1B (1311, 10), Helitrony1C (3058, 10), Helitrony2 (11114, 10), Atrepx1 (2432, 20), Atrep1 (888, 100), Atrep2 (564, 150), Atrep3 (2097, 150), Atrep4 (2240, 50), Atrep5 (2386, 50), Atrep6 (1189, 30), Atrep7 (940, 50), Atrep8 (1077, 40), Atrep9 (899, 10), Atrep10 (899, 50), Atrep10A (1380, 20), Atrep10B (1821, 20), Atrep10C (653, 20), Atrep11 (1053, 50), Atrep12 (1342, 10), Atrep13 (648, 30) |
| Helitron2 (6) | 11,435 | ATHEL2p | ATRPA2Hp | |
| Helitron3 (2) | 15,333 | ATHEL3p | ATRPA3H1p, ATRPA3H2p, ATRPA3H3p | |
| Helitron4 (5) | 17,261 | ATHEL4p | ATRPA4H1p, ATRPA4H2p, ATRPA4Hp3 | |
| O. sativa | | | | |
| Helitron1_OS (1) | 10,182 | OSHEL1p | OSRPA1Hp | |
| Helitron2_OS (2) | 15,167 | OSHEL2p | OSRPA2Hp | |
| Helitron3_OS (1) | 12,693 | OSHEL3p | OSRPA3Hp | |
| C. elegans | | | | |
| Helitron1_CE (7) | 8,484 | CEHEL1p | | Helitrony1_CE (2593, 50), Helitrony1A_CE (3023, 50), Helitrony2_CE (245, 200), Helitrony3_CE (193, 100), Helitrony4_CE (1855, 100), NDNAX1 (2085, 100), NDNAX2 (2844, 100), NDNAX3 (1591, 100) |
| Helitron2_CE (1) | 5,514 | CEHEL2p | | |

Rep-helicase, proteins composed of the RCR Rep and DNA helicase domains. Numbers of copies per haploid genome are shown in parentheses. Length of the consensus sequence and numbers of copies per genome are shown together in parentheses for nonautonomous Helitrons.

Figure 2.

Multiple alignment of helicases encoded by the Helitron1 and Helitron1_CE transposons with a set of eukaryotic and prokaryotic DNA helicases. Domains I–VI that are conserved in DNA helicases from the SF1 superfamily (20) and distances between these domains are indicated. Domain IV/V has not been reported previously. Invariable positions are shaded in black, and those conserved in more than 60% of the sequences are shaded in gray. The following are names of helicases: PIF1 (GenBank protein identification no. 130196, yeast), BACULOVIRUS (7460536, the dsDNA Lymantria dispar nucleopolyhedrovirus), CHILO (5725645, the dsDNA chilo iridescent virus), TRAA_RHISN (2499024, a Ti-like plasmid from Rhizobium), TRAI_EC (136208, the F plasmid from Escherichia coli), EXOV_EC (2507018, the RecD subunit of the E. coli exodeoxyribonuclease V), TRWC (1084124, the R388 conjugative plasmid from E. coli), and HEL_T4 (416895, the dsDNA T4 bacteriophage).

The 11,435-bp Helitron2 consensus sequence is ≈96% identical to seven identified Helitron2 copies (Fig. 1B) and carries two genes called ATRPA2H and ATHEL2. ATRPA2H is composed of 10 exons (20, 21) and encodes a 518-aa RPA70-like protein, 41% identical to ATRPA1H1p from Helitron1. There is only 60% nucleotide identity between the corresponding DNA segments that code for the two proteins. ATHEL2 is composed of 11 exons encoding a 1,743-aa protein (ATHEL2p), whose ≈500-aa C-terminal portion is similar to the SF1 helicases. ATHEL1p and ATHEL2p have different 300-aa N termini, and their remaining ≈1,400-aa portions are 55% identical. There is only 64% nucleotide identity between the DNA segments encoding these portions. Overall, about 10 families of 8- to 15-kb-long Helitrons are present in the A. thaliana genome (not shown). Despite the high nucleotide divergence between different Helitrons (≈40%), each has conserved structural hallmarks, which include 5′-TC and CTAR-3′ termini, the AT target sites, and an ≈18-bp hairpin separated by ≈11 nucleotides from the 3′ end (Fig. 3).

Figure 3.

Termini of Helitrons. Conserved 5′ and 3′ termini are in bold capital letters, 3′ terminal hairpins are shaded in gray, and inverted repeats are underlined.
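The structural hallmarks described in the text (5′-TC and CTRR-3′ termini, and a 16- to 20-bp hairpin lying 10–12 nucleotides from the 3′ end) can be turned into a rough screening check. The Python sketch below is illustrative only: the function names, the minimum stem length, the maximum loop size, and the requirement of a perfectly paired stem are assumptions, not criteria stated in the paper, and the example element is synthetic.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def has_helitron_termini(element):
    """True if the element starts with 5'-TC and ends with CTRR-3' (R = A or G)."""
    return re.fullmatch(r"TC[ACGT]*CT[AG][AG]", element.upper()) is not None


def find_3prime_hairpin(element, min_stem=6, max_loop=6):
    """Look for a 16-20 nt hairpin that ends 10-12 nt before the 3' end.

    Returns (start, end, stem_length) with 0-based, half-open coordinates,
    or None. min_stem and max_loop are hypothetical thresholds.
    """
    element = element.upper()
    best = None
    for spacer in (10, 11, 12):
        end = len(element) - spacer
        for size in range(16, 21):
            start = end - size
            if start < 0:
                continue
            hairpin = element[start:end]
            # Count perfectly paired bases, working from the outside in.
            stem = 0
            while (stem < size // 2
                   and hairpin[stem] == COMPLEMENT.get(hairpin[size - 1 - stem])):
                stem += 1
            loop = size - 2 * stem
            if stem >= min_stem and loop <= max_loop:
                if best is None or stem > best[2]:
                    best = (start, end, stem)
    return best


# Synthetic element: 5'-TC, filler, a 19-nt hairpin (8-bp stem, 3-nt loop),
# then an 11-nt spacer that ends in CTAG (an instance of CTRR).
hairpin = "GCACGTCA" + "AAA" + "TGACGTGC"
example = "TC" + "A" * 20 + hairpin + "ATATATACTAG"
termini_ok = has_helitron_termini(example)
hairpin_hit = find_3prime_hairpin(example)
```

Real Helitron hairpins need not be perfectly paired, so a screen on genomic data would likely need to tolerate mismatches in the stem.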

Helitrons in the O. sativa Genome.

Arabidopsis is not the only plant species harboring Helitron-like transposons. We found three families of Helitrons, named Helitron1_OS, Helitron2_OS, and Helitron3_OS, in the rice genome (Table 1; also see supplemental information at www.girinst.org/~vladimir/RC/S.html). They encode RPAs as well as the Rep/DNA helicases and share the same structural hallmarks with the Helitrons from Arabidopsis (Fig. 2). Helitrons have been transposed recently in the rice genome, where they are represented by just a few copies (not shown). Their recent origin is indicated in the case of Helitron2_OS, represented by two ≈15-kb copies, which are 99% identical (GenBank accession nos. AP001278, positions 56948–41572, and AP001800, 115543–87503).

Helitrons in the C. elegans Genome.

The C. elegans genome also contains multiple copies of DNA helicases related to those encoded by the plant Helitrons. We extracted five DNA fragments coding for the Helitron-like helicase, which were ≈98% identical to each other. After an iterative series of expansions of the original DNA fragments and multiple alignments of the expanded fragments, we derived an 8,484-bp consensus sequence (Fig. 1C). Analysis of the consensus sequence revealed a single gene, named CEHEL1, composed of nine exons (20, 21), which encode a 1,466-aa protein (CEHEL1p), 33% identical to ATHEL1p (excluding their 125- and 308-aa N-terminal portions, respectively). This consensus sequence, named Helitron1_CE, shares common structural hallmarks with the plant Helitrons, including 5′-TC and CTGG-3′ termini and the 3′-hairpin (Fig. 3). On the basis of protein and nucleotide similarities to Helitron1_CE, we found another 5,514-bp element, Helitron2_CE, whose seven putative exons (20, 21) encode a protein 60% identical to CEHEL1p (67% nucleotide identity). Again, despite the high divergence, Helitron2_CE shares structural hallmarks with the plant Helitrons. However, nematode Helitrons do not encode RPA-like proteins.

Interestingly, five previously reported families of 200- to 400-bp minisatellite-like nematode repetitive DNA (RcA1, RcC9, RcD1, Rc35, and Rc123) have been found frequently adjacent to one another in the same order and orientation (28). It has been suggested that gene conversion and molecular drive are responsible for the conservation of these repetitive elements in different clusters located far apart in the genome (28). We found that these repeats are different fragments of Helitrons, and the “clusters” observed earlier are just different copies of Helitrons. Overall, Helitrons carry and propagate multiple minisatellites in the C. elegans genome. For example, the internal portion of the ≈3-kb nonautonomous element Helitrony1_CE is built predominantly of different 15-, 35-, and 40-bp minisatellites. The number of minisatellite units present in some copies of the nematode Helitrons is close to 300 (not shown).

Nonautonomous Helitrons.

Most Helitrons are “nonautonomous” elements. They share common termini and other structural hallmarks with “autonomous” Helitrons, but they do not encode a complete set of the proteins encoded by the autonomous elements. This phenomenon is common for other known DNA transposons and indicates that structural hallmarks, present in both 5′ and 3′ termini, ensure transposition of the nonautonomous elements because of the interaction between the termini and the transposase expressed by the autonomous elements. It has been reported recently (29, 30) that the A. thaliana genome harbors multiple families of 1- to 3-kb-long repetitive elements reported under the names ATREP (29) and AthE1 (30), which constitute more than 1% of the haploid genome. We found that these elements share the same structural hallmarks with Helitrons and are 60–80% similar to other nonautonomous Helitrons, like Helitrony1A–Helitrony1C (not shown); therefore, they are also classified here as nonautonomous Helitrons.

Multiple nonautonomous derivatives of Helitrons are present in the C. elegans genome (Table 1). For example, we found two families of short nonautonomous transposons, called Helitrony2_CE and Helitrony3_CE, which share similar terminal regions with Helitron1_CE and Helitron2_CE. The ≈200 copies of Helitrony2_CE in the genome are ≈5% divergent from their 249-bp consensus sequence and ≈10% divergent from each other. Helitrony3_CE copies are only 1% divergent from their 195-bp consensus sequence. Moreover, several copies of Helitrony3_CE, inserted in different places, are identical (GenBank accession nos. AF106583, positions 13953–13789; AL023831, 213–407; and U97016, 7488–17294). The last observation suggests that Helitrons are transpositionally active in C. elegans.

Approximately 2% of both the A. thaliana and C. elegans DNA sequences deposited in GenBank (over 90% of both genomes) are composed of repetitive elements significantly similar to Helitrons present currently in Repbase Update. These elements were identified by CENSOR, and their coordinates are listed in maps of TEs from A. thaliana and C. elegans (www.girinst.org/Repbase_Update.html). The proportion of Helitrons in these genomes is slightly underestimated, because most Helitrons reside in heterochromatin regions, underrepresented in the available sequence data. Moreover, multiple highly divergent families of nonautonomous Helitrons are represented by one or two copies per genome, and only some of them can be detected on the basis of distant similarity to known TEs.

Target Sites.

Numerous nonautonomous Helitrons were found inserted into copies of other well-known mobile elements in the A. thaliana genome (29, 30), and it has been suggested that those elements do not produce target site duplications on their integration in the genome. On the basis of our analysis of Helitron insertions into other well-defined TEs in both A. thaliana and C. elegans (Fig. 4), we also conclude that Helitrons transpose specifically into host AT target sites. The integration occurs precisely between the host A and T nucleotides, without duplications or deletions of the target sites, consistent with the RC mechanism discussed below.
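What "precise integration between A and T" means at the sequence level can be shown with a toy model in Python. This is only a string-level illustration under assumed names (`at_sites`, `insert_at_AT`) and made-up sequences; it says nothing about the enzymology.

```python
def at_sites(host):
    """0-based positions i where host[i] is A and host[i + 1] is T."""
    host = host.upper()
    return [i for i in range(len(host) - 1) if host[i:i + 2] == "AT"]


def insert_at_AT(host, element, site):
    """Insert `element` between the A at `site` and the T that follows it."""
    if host[site:site + 2].upper() != "AT":
        raise ValueError("site does not point at an AT dinucleotide")
    # The host is split exactly between A and T; nothing is duplicated or lost.
    return host[:site + 1] + element + host[site + 1:]


# Hypothetical host fragment and a minimal Helitron-like element (5'-TC ... CTAG-3').
host = "GGCATGCC"
element = "TCAAAACTAG"
sites = at_sites(host)
inserted = insert_at_AT(host, element, sites[0])
```

In the result the element is flanked by the host A on its 5′ side and the host T on its 3′ side, and deleting the element restores the original host sequence exactly. A cut-and-paste transposon that generates a TSD would instead leave a duplicated stretch of host sequence on both flanks.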

Figure 4.

Precise integration of Helitrons into the host AT target sites. Six insertion cases of Helitrons (red) into different transposons (green) are shown separately. Two copies of ATREP1, two copies of Helitrony3_CE, and single copies of Helitrony2_CE and ATREP9 are inserted into copies of the ATTIRX1D, ATHATN1, PAL5A_CE, LTR2_CE, PALTTAA1_CE, and ATREP10 transposons, respectively. The consensus sequences of the elements harboring Helitrons are marked by bold black letters and are described in the A. thaliana and C. elegans sections of Repbase Update at www.girinst.org/Repbase_Update.html. Consensus sequences of the corresponding Helitrons are marked in blue. Asterisks, semicolons, and dots indicate identical nucleotide positions, transitions, and transversions, respectively. Only termini of Helitrons are shown. Black, green, and blue numbers show positions in the consensus sequences of the harboring transposons, GenBank, and Helitron consensus sequences, respectively.

RPA-Like Proteins Are Functional Components of Helitrons.

We generated a set of 100 random DNA sequences 61% identical to the 2,600-bp ATRPA2H gene from Helitron2, encoding the 518-aa RPA-like protein ATRPA2Hp. These random sequences imitated a ≈2,767-bp portion of ATRPA1H1 from Helitron1 encoding a 475-aa portion of ATRPA1H1p 44% identical to ATRPA2Hp at the protein level and 60% at the DNA level. After alignment by VMALN2, the average identity between the random sequences and ATRPA2H was even higher, 63%, because of gaps introduced by the alignment. Fig. 5 shows the distribution of lengths of proteins predicted by GENSCAN in the randomly mutated sequences (proteins encoded by the minus strand have been discarded). The average length was 46 aa with a standard deviation of 40 aa. No proteins longer than 152 aa were found in the random set of DNA sequences. Overall, only 71 random sequences encoded proteins. Moreover, only 31 sequences encoded proteins similar to ATRPA2Hp (BLASTP, E < 0.05). The average length of the “random” proteins similar to ATRPA2Hp was only 15 aa, with a standard deviation of 25 aa (Fig. 5). These random proteins contrast strongly with ATRPA1H1p, whose 475-aa length differs from the average “random” length by ≈20 standard deviations. Therefore, we conclude that the RPA protein is a functional component of Helitrons, evolving under selective pressures. The same conclusion is well supported by the observation of RPA-like proteins encoded by the rice Helitrons, which do not show any significant nucleotide identity to Helitrons from A. thaliana.
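The size of the contrast can be checked with simple arithmetic on the figures quoted in this paragraph. The short Python sketch below assumes that the comparison is a plain z-score, (observed length − mean) / standard deviation; the text does not say which of the two reported distributions the authors used, so both are computed.

```python
def z_score(observed, mean, sd):
    """Distance of an observation from a mean, in units of standard deviation."""
    return (observed - mean) / sd


observed_length = 475  # aa, the ATRPA1H1p portion compared with ATRPA2Hp

# "Random" proteins similar to ATRPA2Hp: mean 15 aa, SD 25 aa.
z_similar = z_score(observed_length, mean=15, sd=25)

# All proteins predicted in the random set: mean 46 aa, SD 40 aa.
z_all = z_score(observed_length, mean=46, sd=40)
```

Under this assumption the first comparison gives 18.4 standard deviations, roughly in line with the figure quoted in the text, and the second gives about 10.7. Either way the observed protein lies far outside the simulated distribution.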

Figure 5.

Distributions of protein lengths predicted in 100 random DNA sequences. Every random sequence was 61% identical to ATRPA2H, without insertions or deletions. Black marks all proteins predicted by GENSCAN; gray marks proteins similar to ATRPA2Hp.

Replication Initiator Motifs in Helitrons.

Despite their DNA transposon-like characteristics, Helitrons do not code for any proteins similar to known transposases. However, iterative screening by PSI-BLAST shows that ATHEL1p contains an 11-aa motif (Fig. 6) similar to the “two-His” motif conserved in the Reps encoded by a diverse set of plasmids and ssDNA viruses that use RC DNA replication (9). Most importantly, Reps perform both cleavage and ligation of DNA, reactions that initiate and terminate RCR, and contain three conserved motifs (9). The first, most variable motif is of unknown function and lies 30–80 aa upstream of the “two-His” motif, which putatively functions as a ligand for the Mg²⁺ and Mn²⁺ ions required for RCR. The third Rep motif lies 30–90 aa downstream of the “two-His” motif and catalyzes DNA cleavage and ligation, two key reactions of RCR. This last motif is also conserved in Helitrons, where it follows the “two-His” motif at an invariant distance of 114 aa (Fig. 6). It includes two conserved tyrosine residues, which are central to DNA nick formation and ligation by the Rep proteins during RCR (9, 31–33).

Figure 6.

Alignment of the RC motifs in the Helitrons. Following is a list of the RCR initiator-like proteins: SVTS2 (GenBank accession no. AAF18310, the SVTS2 ssDNA spiroplasma plectrovirus); Rep_SC (BAA34784, the pSA1.1 conjugative plasmid from Streptomyces cyaneus); Rep_BB (BAA07788, the pHT926 Bacillus borstelensis cryptic plasmid); Rep_AA (AAC37125, the pVT736-1 RCR plasmid from Actinobacillus actinomycetemcomitans); and Pf3 (AAA88392, the Pf3 ssDNA bacteriophage from Pseudomonas aeruginosa). Color shading shows different physicochemical properties of conserved amino acids (17). Conserved tyrosines, corresponding to the RCR nicking/ligation catalytic center, are marked by dots.

Helitrons Are RC DNA Transposons.

The current model for RCR involves several basic stages (11). Replication starts with site-specific nicking of the replicon plus strand by the Rep protein. The free 3′-OH end of the nicked plus strand serves as a primer for leading-strand DNA synthesis and is elongated by several host replication proteins, such as DNA helicase, DNA Pol, and SSB. The newly synthesized leading plus strand remains covalently linked to the 3′-OH end of the parent plus strand during the continuous displacement of its 5′-OH end. When the leading strand makes a complete turn, Rep catalyzes a strand transfer reaction followed by release of an ssDNA intermediate, the parent minus strand, and a double-stranded DNA (dsDNA) replicon composed of both the parental plus and a newly synthesized strand. The RCR model (6–8) explains why there are no target site duplications during integration of Helitrons in the genome. Presumably, the 3′ terminal hairpin, conserved in Helitrons, serves as a terminator of RCR.

Evolutionary Implications.

It has been suggested that geminiviruses have evolved from prokaryotic circular ssDNA replicons (9, 12). If this scenario is correct, one would expect to observe geminiviruses in different eukaryotic kingdoms. However, no geminiviruses are found outside plant species. Our finding of eukaryotic RC transposons suggests that geminiviruses might have evolved from plant RC transposons rather than from prokaryotic RC replicons.

Given the characteristics of prokaryotic and eukaryotic RC transposons, the hypothesis that they evolved from common ancestral elements seems the most parsimonious. The prokaryotic RC transposons encode the Rep proteins only, and their transposition presumably depends on the host DNA helicase and SSB (8). As shown here, in addition to Rep, the eukaryotic RC transposons encode their own helicase and SSB (the latter in plants only). Given their exon–intron structure and similarities to known proteins, the Helitron SSB and helicase are both likely to have evolved from former host proteins recruited by ancestral eukaryotic RC transposons. Apparently, RC transposons form the only class of eukaryotic mobile elements that depend so critically on proteins related to those directly involved in host DNA replication.

Two alternative scenarios describe the most likely fate of a host gene captured by a transposon: (i) the captured gene would be destroyed by multiple mutations if it did not provide any selective advantage to the transposon; (ii) it would be kept as a gene related to the original host gene if its capture is beneficial for the transposon, which is tolerated by the host. Given the conservation of the RPA and helicase proteins in different families of Helitron, DNA sequences of which are sometimes as much as 40–50% divergent, one has to presume that both proteins are functional parts of the transposition machinery. Strikingly, Helitrons, like most other mobile elements in the A. thaliana and C. elegans genomes (ref. 29; V.V.K. and J.J., www.girinst.org/Repbase_Update.html), are represented in the genomes by multiple highly diverged families. Given the young age of these families and the extent of protein conservation, it is highly unlikely that the divergence observed is a result of mutations accumulated by the transposons integrated in the host genome. In that case, Helitron transposons work as a powerful tool of evolution. They have recruited host genes, modified them to an extent that is unreachable by the Mendelian process, and multiplied them in the host genomes.

Acknowledgments

We thank Dr. Gabor Toth for helpful discussions and Jolanta Walichiewicz, Dr. Zelek Herman, Alison McCormack, and Michael Jurka for help with editing the manuscript. This work was supported by National Institutes of Health Grant 2 P41 LM06252–04A1 (to J.J.).

Abbreviations

RC, rolling-circle
RCR, RC replication
TE, transposable element
RPA, replication protein A
RPA70, the largest subunit of RPA
Rep, replication initiator protein
dsDNA, double-stranded DNA
ssDNA, single-stranded DNA
SSB, ssDNA-binding protein
GIRI, Genetic Information Research Institute

References

1. Berg D E, Howe M M, editors. Mobile DNA. Washington, DC: Am. Soc. Microbiol.; 1989.
2. Kidwell M G, Lish D. Proc Natl Acad Sci USA. 1997;94:7704–7711. doi: 10.1073/pnas.94.15.7704.
3. Fedoroff N V. Ann NY Acad Sci. 1999;870:251–264. doi: 10.1111/j.1749-6632.1999.tb08886.x.
4. Craig N L. Science. 1995;270:253–254. doi: 10.1126/science.270.5234.253.
5. Turlan C, Chandler M. Trends Microbiol. 2000;8:268–274. doi: 10.1016/s0966-842x(00)01757-1.
6. Mendiola M V, Bernales I, de la Cruz F. Proc Natl Acad Sci USA. 1994;91:1922–1926. doi: 10.1073/pnas.91.5.1922.
7. Richter G Y, Björklöf F, Romantschuk M, Mills D. Mol Gen Genet. 1998;5:381–387. doi: 10.1007/s004380050907.
8. Tavakoli N, Comanducci A, Dodd H M, Lett M-C, Albiger B, Bennett P. Plasmid. 2000;44:66–84. doi: 10.1006/plas.1999.1460.
9. Koonin E V, Ilyina T V. BioSystems. 1993;30:241–268. doi: 10.1016/0303-2647(93)90074-m.
10. Kornberg A, Baker T A. DNA Replication. New York: Freeman; 1992.
11. Khan S A. Mol Microbiol. 2000;37:477–484. doi: 10.1046/j.1365-2958.2000.02001.x.
12. Rigden J E, Dry I B, Krake L R, Rezaian M A. Proc Natl Acad Sci USA. 1996;93:10280–10284. doi: 10.1073/pnas.93.19.10280.
13. Ivics Z, Hackett P B, Plasterk R H, Izsvak Z. Cell. 1997;14:501–510. doi: 10.1016/s0092-8674(00)80436-5.
14. Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
15. Jurka J, Klonowski P, Dagman V, Pelton P. Comput Chem. 1996;20:119–121. doi: 10.1016/s0097-8485(96)80013-1.
16. Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
17. Thompson J D, Higgins D G, Gibson T J. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
18. Nicholas K B, Nicholas H B, Jr, Deerfield D W, II. EMBNETNEWS. 1997;4:1–4.
19. Faulkner D, Jurka J. Trends Biochem Sci. 1988;13:321–322. doi: 10.1016/0968-0004(88)90129-6.
20. Burge C, Karlin S. J Mol Biol. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.
21. Salamov A A, Solovyev V V. Genome Res. 2000;10:516–522. doi: 10.1101/gr.10.4.516.
22. Jurka J. Trends Genet. 2000;16:418–420. doi: 10.1016/s0168-9525(00)02093-x.
23. Lahaye A, Stahl H, Thines-Sempoux D, Foury F. EMBO J. 1991;10:997–1007. doi: 10.1002/j.1460-2075.1991.tb08034.x.
24. Zhou J-Q, Monson E K, Teng S-C, Schultz V P, Zakian V A. Science. 2000;289:771–774. doi: 10.1126/science.289.5480.771.
25. Wold M S. Annu Rev Biochem. 1997;66:61–92. doi: 10.1146/annurev.biochem.66.1.61.
26. Bochkarev A, Pfuetzner R A, Edwards A M, Frappier L. Nature (London) 1997;385:176–181. doi: 10.1038/385176a0.
27. Gorbalenya A E, Koonin E V. Curr Opin Struct Biol. 1993;3:419–429.
28. Naclerio G, Cangiano G, Coulason A, Levitt A, Ruvolo V, La Volpe A. J Mol Biol. 1992;226:159–168. doi: 10.1016/0022-2836(92)90131-3.
29. Kapitonov V V, Jurka J. Genetica. 1999;107:27–37.
30. Surzycki S A, Belknap W R. J Mol Evol. 1999;48:684–691. doi: 10.1007/pl00006512.
31. Noirot-Gros M-F, Erlich S D. Science. 1996;274:777–780. doi: 10.1126/science.274.5288.777.
32. Novick R P. Trends Biochem Sci. 1998;23:434–438. doi: 10.1016/s0968-0004(98)01302-4.
33. Sherman J A, Matson S W. J Biol Chem. 1994;269:26220–26226.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
