2020 Jun 22;4(9):1220–1231. doi:10.1038/s41559-020-1221-7

The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants

Linzhou Li¹,²,#, Sibo Wang¹,³,#, Hongli Wang¹,⁴, Sunil Kumar Sahu¹, Birger Marin⁵, Haoyuan Li¹, Yan Xu¹,⁴, Hongping Liang¹,⁴, Zhen Li⁶, Shifeng Cheng¹, Tanja Reder⁵, Zehra Çebi⁵, Sebastian Wittek⁵, Morten Petersen³, Barbara Melkonian⁵,⁷, Hongli Du⁸, Huanming Yang¹, Jian Wang¹, Gane Ka-Shu Wong¹,⁹, Xun Xu¹,¹⁰, Xin Liu¹, Yves Van de Peer⁶,¹¹,¹²,✉, Michael Melkonian⁵,⁷,✉, Huan Liu¹,³,✉

¹State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, China

²Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark

³Department of Biology, University of Copenhagen, Copenhagen, Denmark

⁴BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China

⁵Institute for Plant Sciences, Department of Biological Sciences, University of Cologne, Cologne, Germany

⁶Department of Plant Biotechnology and Bioinformatics (Ghent University) and Center for Plant Systems Biology, Ghent, Belgium

⁷Central Collection of Algal Cultures, Faculty of Biology, University of Duisburg-Essen, Essen, Germany

⁸School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China

⁹Department of Biological Sciences and Department of Medicine, University of Alberta, Edmonton, Alberta, Canada

¹⁰Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China

¹¹College of Horticulture, Nanjing Agricultural University, Nanjing, China

¹²Centre for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa

✉Corresponding author.

#Contributed equally.

Received 2019 Jun 12; Accepted 2020 May 12; Issue date 2020.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

PMCID: PMC7455551 PMID: 32572216

Abstract

Genome analysis of the pico-eukaryotic marine green alga Prasinoderma coloniale CCMP 1413 unveils the existence of a novel phylum within green plants (Viridiplantae), the Prasinodermophyta, which diverged before the split of Chlorophyta and Streptophyta. Structural features of the genome and gene family comparisons revealed an intermediate position of the P. coloniale genome (25.3 Mb) between the extremely compact, small genomes of picoplanktonic Mamiellophyceae (Chlorophyta) and the larger, more complex genomes of early-diverging streptophyte algae. Reconstruction of the minimal core genome of Viridiplantae allowed identification of an ancestral toolkit of transcription factors and flagellar proteins. Adaptations of P. coloniale to its deep-water, oligotrophic environment involved expansion of light-harvesting proteins, reduction of early light-induced proteins, evolution of a distinct type of C₄ photosynthesis and carbon-concentrating mechanism, synthesis of the metal-complexing metabolite picolinic acid, and vitamin B₁, B₇ and B₁₂ auxotrophy. The P. coloniale genome provides first insights into the dawn of green plant evolution.

Subject terms: Evolutionary genetics, Molecular evolution, Phylogenetics, Taxonomy, Evolutionary biology

Main

One of the most important biological events in the history of life was the successful colonization of the terrestrial landscape by green plants (Viridiplantae), which paved the way for terrestrial animal evolution and altered geomorphology and the Earth’s climate^1–3. The Viridiplantae comprise perhaps 500,000 species, ranging from the smallest to the largest eukaryotes^4,5. Divergence time estimates from molecular data suggest that Viridiplantae may be close to 1 billion years old^6,7. All extant green plants are classified in either of two divisions/phyla, Chlorophyta and Streptophyta, which differ structurally, biochemically and molecularly^8–12. The Streptophyta contain the land plants (embryophytes) and a paraphyletic assemblage of algae known as the streptophyte algae, whereas all other green algae comprise the Chlorophyta. The reconstruction of phylogenetic relationships across green plants using transcriptomic or genomic data provided evidence that unicellular, often scaly, flagellate organisms were positioned near the base of the radiation in both phyla^13–16, corroborating earlier proposals based on ultrastructural analyses that the common ancestor of all green plants may have been a scaly flagellate^17,18. The search for an extant relative of such a flagellate, however, has been in vain, although an initial report suggested that Mesostigma viride diverged before the split of Chlorophyta and Streptophyta¹⁹, a result not corroborated by later studies²⁰. M. viride is now recognized as an early-diverging member of the Streptophyta^21,22. While the majority of the early-diverging lineages in the Chlorophyta consisted of (mostly marine) scaly flagellates, some lineages were represented by very small, non-flagellate unicells often surrounded by cell walls^23,24. One of these lineages, provisionally termed ‘Prasinococcales’²³ (clade VI), could not be reliably positioned in phylogenetic trees^24,25.
A major step forward was made when it was discovered that an enigmatic, non-cultured group of deep-water, oceanic macroscopic algae of palmelloid organization comprising the genera Verdigellas and Palmophyllum formed a deeply diverging lineage of Viridiplantae that included the Prasinococcales²⁶. Later, the class Palmophyllophyceae was established for these organisms as the first divergence in Chlorophyta, that is, sister to all other Chlorophyta²⁷. Phylogenies based on nuclear-encoded ribosomal RNA genes (4,579 positions), however, placed Palmophyllophyceae as the earliest divergence in Viridiplantae, but monophyly of Chlorophyta + Streptophyta to the exclusion of Palmophyllophyceae received no support in these analyses²⁷.

To date, genomic resources for the Palmophyllophyceae have been limited to organelle genomes. Here we present the first nuclear genome sequence of a unicellular member of this lineage, Prasinoderma coloniale (Fig. 1a). Based on phylogenomic analyses, we establish a new phylum for this group, the Prasinodermophyta, with two classes, as the earliest divergence of the Viridiplantae. The genome of P. coloniale provided new insights into pico-eukaryotic biology near the dawn of green plant evolution.

Fig. 1 — a, Light micrograph of P. coloniale. b, The phylogenetic tree was constructed using maximum likelihood (RAxML) and Bayesian inference (MrBayes) based on a concatenated sequence alignment of 256 single-copy genes (500 bootstraps). c, The basal divergence of the new phylum Prasinodermophyta, as revealed by analyses of complete nuclear- and plastid-encoded rRNA operons from 109 Archaeplastida. The rRNA dataset comprised 8,818 aligned positions and contained representatives of all major lineages of Rhodoplantae (seven classes), Glaucoplantae (four genera) and Viridiplantae (three divisions with several classes) including embryophytes. Shown is the RAxML phylogeny (GTRGAMMA model); the three support values at branches are RAxML/IQ-TREE bootstrap percentages/Bayesian posterior probabilities. Bold branches received maximal support (100/100/1).

Results and discussion

Genome sequencing and characteristics

The genome size of P. coloniale was estimated to be about 26.04 Mb. After read filtering (7.4 Gb PacBio data) and self-correction, a 25.3 Mb genome was assembled de novo, consisting of 22 chromosomes, including the complete chloroplast and mitochondrial genomes (Supplementary Figs. 1 and 2). The sizes of the individual chromosomes varied from 0.45 to 3.60 Mb. BUSCO analysis showed a high degree of completeness of the genome, with 282 out of 303 (93.1%) complete eukaryotic universal genes (Supplementary Table 1). Additionally, 99.38% (Supplementary Table 2) of the transcriptome could be mapped to the assembled genome. P. coloniale has a GC content of 69.8%, while 6.51% of the genome consists of repeats (Supplementary Fig. 3, Supplementary Table 3 and Extended Data Fig. 1). A total of 7,139 protein-coding genes were annotated, of which 6,996 were supported by the transcriptome. Additionally, 6,759 (94.7%) genes were annotated from known protein databases (Supplementary Table 4).
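The completeness and annotation percentages quoted above follow directly from the reported counts. As a minimal sketch (Python is used here purely for illustration; the variable names are hypothetical and are not part of the study's pipeline), the figures can be recomputed as follows. The last two ratios are derived here and are not stated in the text.

```python
# Counts as reported in the text; all names are illustrative assumptions.
busco_complete, busco_total = 282, 303
genes_total = 7139
genes_transcript_supported = 6996
genes_db_annotated = 6759
assembly_mb, estimated_mb = 25.3, 26.04


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


stats = {
    "BUSCO completeness (%)": percent(busco_complete, busco_total),
    "genes annotated from databases (%)": percent(genes_db_annotated, genes_total),
    # Derived here, not quoted in the text:
    "genes with transcript support (%)": percent(genes_transcript_supported, genes_total),
    "assembly / estimated genome size (%)": percent(assembly_mb, estimated_mb),
}

for label, value in stats.items():
    print(f"{label}: {value}")
```

The first two values reproduce the 93.1% and 94.7% given in the text.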

Extended Data Fig. 1 — Outer ring: The 22 chromosomes were labeled from Chr1 to Chr22. Inner rings 2–5 (from outside to inside): Illumina sequencing depth colored in light green (y-axis min-max: 0–592). PacBio sequencing depth colored in light purple (y-axis min-max: 0–67). GC content of P. coloniale chromosomes in light blue (y-axis min-max: 0–80.0). The gene number distribution of P. coloniale colored in red (y-axis min-max: 0–38). The sliding window of inner rings 2–5 is 5,000 bp. Inner rings 6–15: Genes shared between P. coloniale and other early-diverging viridiplant genomes, from outside to inside: M. viride, C. atmophyticus, K. nitens, C. braunii and M. endlicherianum (green), M. commoda, M. pusilla, B. prasinos, O. lucimarinus and O. tauri (blue).

Phylogenetic analyses and Prasinodermophyta div. nov.

Phylogenetic analyses of P. coloniale were performed with two different taxon samplings and datasets. (1) Both maximum-likelihood and Bayesian trees were constructed from an alignment of 256 orthologues of single-copy nuclear genes from 28 taxa of Archaeplastida, showing that P. coloniale (and the related Prasinococcus capsulatus) diverged before the split of Streptophyta and Chlorophyta (Fig. 1b). All internal branches in the tree received maximal/nearly maximal support, and the monophyly of Streptophyta + Chlorophyta to the exclusion of P. coloniale and P. capsulatus received 88% bootstrap and 1.0 posterior probability support. A phylogenetic tree constructed from 31 mitochondrial genes of 19 taxa of Archaeplastida also revealed P. coloniale as the earliest divergence in Viridiplantae (Supplementary Fig. 4). (2) We increased the taxon sampling to 109 taxa of Archaeplastida, including six sequences of Palmophyllophyceae²⁷ comprising nuclear- and plastid-encoded rRNA operons. The phylogeny corroborated the multi-protein phylogeny because Palmophyllophyceae again diverged before the split of Chlorophyta and Streptophyta (Fig. 1c). Separate phylogenies of nuclear- and plastid-encoded rRNA operons gave congruent results, although support values were generally lower (Supplementary Figs. 5 and 6). The summary coalescent method ASTRAL gave inconclusive results (Supplementary Table 5 and Supplementary Figs. 7 and 8), and tree topology was sensitive to taxon sampling because of long-branch attraction²⁸ (Supplementary Fig. 9 and Extended Data Fig. 2). The former tree is corroborated by a recent phylotranscriptomic analysis of 1,090 viridiplant species in which the placement of three Palmophyllophyceae was unstable in ASTRAL trees but resolved as the basal divergence of Viridiplantae in concatenated trees²⁹. Previous plastome phylogenies placed Palmophyllophyceae either as the earliest divergence within Chlorophyta, sister to all other Chlorophyta²⁷, or in an unresolved position among Chlorophyta¹⁵.
Plastome phylogenies are limited by the dataset (70–80 plastid-encoded genes) but also suffer from introgression of the plastid from one species to another, recombination and gene conversion, as well as differential selective pressures acting on protein-coding plastid genes, which may also introduce biases and lead to incongruent gene and species trees^30–33. For example, unlike nuclear trees, some studies have failed to recover Ulvophyceae, Trebouxiophyceae and Pedinophyceae as monophyletic groups²⁷ or Mesostigmatophyceae within Streptophyta¹⁵. Based on their phylogenetic positions (Fig. 1b,c, Supplementary Figs. 4–6 and Extended Data Fig. 2), gene family comparisons and molecular synapomorphies, we here propose a new division/phylum for the Palmophyllophyceae sensu ref. ²⁷, the Prasinodermophyta div. nov., with two classes, Prasinodermophyceae class. nov. and Palmophyllophyceae emend. (Supplementary Data 1).

Extended Data Fig. 2 — (a) RAxML phylogeny of 23 Archaeplastida/Plantae for which genome sequences have been determined. As an exception, the Gonium genome project did not cover both rRNA operons, and thus Gonium was replaced by the closely related Yamagishiella. For similar reasons, Micromonas pusilla was replaced by M. bravo. The Prasinodermophyta, represented only by Prasinoderma coloniale, was resolved as sister to the Mamiellales (Mamiellophyceae) with maximal support. This artificial placement (that is, Prasinoderma coloniale diverging within the Chlorophyta) gained high support by bootstrapping (numbers in red color). (b) Splitting the long branch of Prasinoderma coloniale by addition of Prasinococcus capsulatus did not change the artificial placement of the Prasinodermophyta, but reduced the bootstrap support for the artificial branches (numbers in red color). (c) When the long branch of the Mamiellales was subdivided by addition of Monomastix sp. and Pyramimonas parkeae, the Mamiellophyceae/Pyramimonadophyceae-clade diverged independently, and the Prasinodermophyta attained a basally diverging position within Viridiplantae. However, the support for the basal divergence of the Prasinodermophyta was relatively low (numbers in blue color). (d) Further addition of only two taxa, Dolichomastix tenuilepis and Cymbomonas tetramitiformis, was sufficient to raise the bootstrap support for the monophyly of the Chlorophyta (to the exclusion of Prasinodermophyta; 94%), and the monophyly of Chlorophyta+Streptophyta (again to the exclusion of Prasinodermophyta; 89%) to high values (numbers in blue color), comparable to the 109-taxa rRNA phylogeny (Fig. 1c), and the genome/transcriptome tree (Fig. 1b). Taxon sampling for resolving the phylogenetic position of the Prasinodermophyta is thus saturated with only 28 sequences of Archaeplastida/Plantae.

Comparison of gene families among Archaeplastida

The phylogenetic placement of Prasinodermophyta as a sister group to all other Viridiplantae provided a unique opportunity to reconstruct the minimum core genome of Viridiplantae, and to compare the genome of P. coloniale to those of early-diverging Streptophyta, Chlorophyta and the Glaucoplantae, to identify plesiomorphic and apomorphic traits. In total, 4,052 orthogroups are shared among Chlorophyta and Streptophyta, of which 3,292 are also shared with P. coloniale. If the orthogroups shared uniquely by P. coloniale with either Micromonas commoda (621) or Chlorokybus atmophyticus (179) are added, 4,092 orthogroups represent the minimal core genome of Viridiplantae (Fig. 2a). A total of 1,356 unique orthogroups were found in P. coloniale, mainly involved in biological process categories such as photosynthesis-antenna proteins, plant–pathogen interaction and plant hormone signal transduction (Supplementary Table 6). Thus, it is reasonable to expect that these unique biological traits reflect adaptations of P. coloniale to its deep-water/low-light, oligotrophic habitat.
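The size of the minimal core genome given above is a simple sum of three orthogroup counts. The following short sketch (illustrative Python with hypothetical variable names, not the authors' analysis code) makes the arithmetic explicit. The number of Chlorophyta + Streptophyta orthogroups not shared with P. coloniale is derived here by subtraction and is not stated in the text.

```python
# Orthogroup counts as reported in the text (Fig. 2a); names are illustrative.
shared_chloro_strepto = 4052      # shared among Chlorophyta and Streptophyta
shared_with_p_coloniale = 3292    # subset also shared with P. coloniale
only_with_m_commoda = 621         # shared uniquely with Micromonas commoda
only_with_c_atmophyticus = 179    # shared uniquely with Chlorokybus atmophyticus

# Minimal core genome: orthogroups P. coloniale shares with at least one
# of the two other viridiplant representatives.
minimal_core = (
    shared_with_p_coloniale + only_with_m_commoda + only_with_c_atmophyticus
)

# Derived value (an assumption of this sketch, not reported in the text):
# orthogroups shared by Chlorophyta and Streptophyta but not by P. coloniale.
absent_from_p_coloniale = shared_chloro_strepto - shared_with_p_coloniale

print(minimal_core)             # 4092
print(absent_from_p_coloniale)  # 760
```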

Fig. 2 — a, Venn diagram showing unique and shared orthogroups among P. coloniale, C. atmophyticus, M. commoda and C. paradoxa. Gene numbers are given in parentheses. b, Percentages of proteins found among Viridiplantae (red), Chlorophyta-specific (blue) and Streptophyta-specific (green) based on the classification given in OrthoFinder. Species abbreviations are listed in Supplementary Table 32. c, PCA of the type and number of Pfam domains of all genes across the Viridiplantae. d, Box-and-whisker plots depicting distributions of the lengths of exons and introns in selected Viridiplantae.

Comparative genomics of P. coloniale with early-diverging Viridiplantae

About 38.5% of the P. coloniale genes gave best hits with Chlorophyta, while a similar percentage (33.9%) gave best hits with Streptophyta, supporting an equidistant relationship between P. coloniale and Streptophyta and Chlorophyta (Supplementary Fig. 10). P. coloniale, along with some representative early-diverging Viridiplantae, showed a very similar percentage of Viridiplantae genes (commonly shared). The remaining proteins of P. coloniale were equally distributed among Streptophyta- and Chlorophyta-specific genes (Fig. 2b). Principal component analysis (PCA) showed that early-diverging Chlorophyta (Mamiellophyceae), streptophyte algae (Mesostigmatophyceae, Klebsormidiophyceae and Charophyceae), Glaucoplantae and Rhodoplantae form four separate clusters with P. coloniale in an isolated position, which further supports its classification as a new and independent clade, that is, the Prasinodermophyta div. nov. (Fig. 2c).

Furthermore, a comparative analysis of structural genomic features showed a trend of gradually increasing average intron length and decreasing average exon length from P. coloniale to early-diverging streptophytes, and the opposite trend from P. coloniale to early-diverging Chlorophyta (Fig. 2d). In addition, the genome size, gene size, gene spacing distance and total and average exon numbers exhibited a pattern similar to that of early-diverging Chlorophyta (Extended Data Fig. 3 and Supplementary Table 7). However, the P. coloniale genome contains 41% coding sequences, higher than the early-diverging streptophytes but considerably lower than early-diverging Chlorophyta. In summary, the structural characteristics of the P. coloniale genome revealed its intermediate position between the extremely compact and small genomes of picoplanktonic early-diverging Chlorophyta³⁴ and the larger and structurally more complex genomes of early-diverging streptophytes^35,36.

Extended Data Fig. 3 — Genome size, average gene size, the percentage of the coding sequence, average gene density, average exon number per gene and total exon number among early-diverging lineages of Chlorophyta and Streptophyta compared to P. coloniale.

Analysis of transcription factors in P. coloniale

In total, 55 of 114 types of transcription factor/transcription regulator (TF/TR) genes were identified in the P. coloniale genome (Supplementary Table 8). Although all 55 TF/TR types of P. coloniale were also found in Chlorophyta and early-diverging streptophyte algae, considerably fewer TF/TR genes (201) were identified in P. coloniale than in Chlorophyta and Streptophyta (Supplementary Table 9).

Among the 55 types of TF/TR genes, the majority (50) are also present in Glaucoplantae and/or Rhodoplantae, suggesting that these constituted the basic TF/TR toolbox in the common ancestor of Archaeplastida. However, five TF/TR types of P. coloniale (C2C2-Dof, WRKY, SBP, GARP_ARR-B and TAZ) were presumably gained in the common ancestor of the Viridiplantae since they are absent in both Glaucoplantae and Rhodoplantae (Supplementary Table 8). WRKY proteins are key regulators of development, carbohydrate synthesis, senescence and responses to biotic and abiotic stresses in embryophytes³⁷. Using newly retrieved WRKY sequences, we confirmed the presence of eight well-supported WRKY domain subgroups in Viridiplantae (Extended Data Fig. 4). The number of gene copies with WRKY domains and the divergent sequences of the N-terminal WRKY domains in P. coloniale may be related to its picoplanktonic lifestyle and/or low-light environment (the picoplanktonic Mamiellophyceae generally also display more than one WRKY gene copy; Supplementary Table 8).

Extended Data Fig. 4 — Prasinoderma’s WRKY domain is marked in light green color. WRKY domains I CTD and I NTD represent the C- and N-terminal domains of a single WRKY gene; each domain is monophyletic, comprising both Streptophyta and Chlorophyta. This suggests that the common ancestor of Chlorophyta and Streptophyta had this configuration. Interestingly, P. coloniale has four gene copies with a total of six WRKY domains (Supplementary Fig. 13). Two of the gene copies display both N- and C-terminal WRKY domains; the other two have only N-terminal WRKY domains. The phylogenetic tree (Supplementary Fig. 13) placed three WRKY domains in clade I CTD (two C-terminal and one N-terminal WRKY domain); the other N-terminal WRKY domains of P. coloniale could not be positioned in one of the eight WRKY domain subfamilies. We suggest that the I CTD subfamily is ancestral in the Viridiplantae and the N-terminal WRKY domains originated by domain duplication and shuffling.

The type-B phospho-accepting response regulator (GARP_ARR-B) family modulates plastid biogenesis, circadian clock oscillation, cytokinin signalling and control of the phosphate starvation response in plants³⁸. Since many genes of the cytokinin biosynthesis and signalling pathways are lacking in P. coloniale (Supplementary Table 10), these response regulators may be involved in other functions. Finally, the evolution of the SQUAMOSA promoter-binding protein (SBP)-box TF was previously suggested to predate the split of Streptophyta and Chlorophyta³⁹. SBP-box TFs have diverse specialized functions in embryophytes, but in green algae they may be involved in more basic functions such as regulation of trace metal homeostasis⁴⁰. The C2C2-Dof (DNA binding with one finger) TFs have been implicated in light control of zygote germination in Chlamydomonas reinhardtii⁴¹ and apparently also originated in the common ancestor of Viridiplantae. Besides the five TF/TR gains, expansion in gene copy numbers was observed in only one TF (Jumonji_Other) in the common ancestor of Viridiplantae when compared with Glaucoplantae and Rhodoplantae. Plant JmjC domain-containing proteins have important functions in both histone modification⁴² and regulation of development and environmental responses⁴³.

In contrast to gains and expansion of TFs/TRs, P. coloniale also exhibited loss of six TFs/TRs (C2H2, C3H, CCAAT_HAP2, MADS_MIKC, MBF1 and Zinc Finger MIZ type) that are present in Chlorophyta, Streptophyta, Glaucoplantae and Rhodoplantae. Mapping of TFs/TRs on the phylogeny (Fig. 1b) also allowed tentative conclusions about gains of TFs/TRs in the common ancestor of Chlorophyta + Streptophyta (five: ABI3/VP1, Dicer, HD_DDT, Pseudo ARR-B and Whirly) and the common ancestor of Streptophyta (seven: HD-ZIP_I_II, HD-ZIP_III, HD-PLINC, GRF, LUG, SRS and Trihelix).

Light-harvesting complex (LHC) and LHC-like proteins in P. coloniale

Archaeplastida produce metabolic energy by collecting solar energy and transferring it to photosynthetic reaction centres, facilitated by two types of light-harvesting complexes (LHC I and LHC II), composed of LHC proteins that interact with light-harvesting pigments^44–47. We identified 41 LHC and LHC-like proteins in P. coloniale (Supplementary Table 11).

Phylogenetic analysis of LHC proteins from P. coloniale showed them to be widely distributed in seven of the ten LHC clades, namely LHCA, LHCB, LHCX, PSBS, OHP, Ferrochelatase and ELIPs (Fig. 3). The P. coloniale genome has 19 Lhcb genes (six of which apparently originated from three successive gene duplications in the Prasinoderma lineage). P. coloniale also displayed nine Lhca genes, whereas in most of the investigated early-diverging Chlorophyta/Mamiellophyceae and in Mesostigma (Streptophyta) there are only six Lhca genes (Supplementary Table 11). As in the early-diverging Mamiellophyceae, P. coloniale displayed two LHCX proteins. There are three helix proteins in P. coloniale, as in Cyanophora paradoxa. Other types of LHC-like proteins, such as RedCAP, SEP (SEP apparently originated in streptophyte algae) and LHCL, are missing in P. coloniale. The relatively large number of gene copies of chlorophyll-a/b-binding proteins (Lhca, Lhcb) in P. coloniale could reflect adaptation to the low-light environment from which this strain was isolated (150 m depth), requiring larger LHC antennae. This is corroborated by two other observations: first, the relatively low Chl-a/b ratio (1.13) reported for this organism and the related Palmophyllum⁴⁸ (0.64), and second, the lower number (six) of ELIPs in P. coloniale compared to core Chlorophyta (nine in Ulva lactuca and ten in C. reinhardtii) or early-diverging subaerial/terrestrial streptophyte algae (13/9 in C. atmophyticus and Klebsormidium nitens, respectively).

Fig. 3 — The tree is derived from a MAFFT alignment and was constructed using IQ-TREE (see Methods) with the model of sequence evolution suggested by the programme. Bootstrap values (500 replicates) ≥50% are shown. The LHC superfamily can be divided into ten clades, marked by different colours; the LHC genes of P. coloniale are highlighted in red. The coloured circles on the outer ring denote the distribution of the different LHC subfamilies in the respective taxa. The TUC clade comprises Trebouxiophyceae, Ulvophyceae and Chlorophyceae (all in Chlorophyta). ELIP, early light-induced protein; SEP, two-helix stress-enhanced protein; OHP, one-helix protein; PSBS, the photosystem II subunit S; RedCAP, red lineage chlorophyll a/b-binding (CAB)-like protein.

Carbon-concentrating mechanisms (CCMs)

Previous studies of picoplanktonic Mamiellophyceae suggested that these algae might possess a C₄-like carbon fixation pathway to alleviate low CO₂ affinity⁴⁹. A C₄-like CCM has also been reported in photosynthetic stramenopiles^50–53. CCMs mainly rely on carbonic anhydrases (CAs) that catalyse the reversible conversion of carbon dioxide to bicarbonate. Four CA genes belonging to the delta- and gamma-type CAs were identified in the genome of P. coloniale, while alpha- and beta-type CAs were absent (Fig. 4a). Among Viridiplantae, only Mamiellophyceae were found to encode delta-type CAs (Supplementary Table 12). Whereas alpha-, beta- and gamma-type CAs apparently evolved in the common ancestor of Archaeplastida (alpha- and beta-type CAs were lost in P. coloniale, and alpha-type CAs in the later-diverging Mamiellophyceae, perhaps related to cell miniaturization in both groups), delta-type CAs apparently evolved in the common ancestor of Viridiplantae and were independently lost in the core Chlorophyta and in Streptophyta.

Fig. 4 — a, Maximum-likelihood phylogeny of CA proteins in P. coloniale. b, Proposed CCMs in which inorganic carbon is assimilated by P. coloniale based on predicted protein localizations. A brown arrow denotes that a reaction occurs only in P. coloniale, and a grey dotted arrow denotes a reaction that exists in Mamiellophyceae. MA, malic acid; MDH, malate dehydrogenase; ME, malic enzyme; Pyr, pyruvate; 3-PGA, 3-phosphoglyceric acid; PPDK, pyruvate, phosphate dikinase; RuBisCO, ribulose-1,5-bisphosphate carboxylase oxygenase; TCA, tricarboxylic acid cycle.

Here we propose a putative model of CCMs in P. coloniale, based on the targets of the genes that are necessary for inorganic carbon assimilation (Fig. 4b). As a potential C₄-like CCM, malate dehydrogenase catalyses the reaction to yield malate in the cytosol, mitochondrion and chloroplast (Fig. 4b). Malic enzymes could be transported into the mitochondrion and chloroplast, where they release CO₂. Previous studies of CCMs in Micromonas and Ostreococcus suggested that these algae might perform cytosol- and chloroplast-based C₄-like CCMs⁴⁹. P. coloniale, however, potentially harbours cytosol-, chloroplast- and mitochondrion-based CCMs to enhance the ability to concentrate CO₂ in a low-CO₂ environment. Interestingly, the P. coloniale genome encoded phosphoenolpyruvate carboxykinase (PEPCK) but not pyruvate carboxylase (PC), opposite to the situation in the genomes of Mamiellophyceae^49,54–56. This result suggests that a distinct CCM might exist in P. coloniale that uses phosphoenolpyruvate (PEP) as a substrate from the glycolytic pathway to produce oxaloacetate (OAA) by PEPCK, instead of PC as in the Mamiellophyceae (Supplementary Tables 12 and 13).

Analysis of carbohydrate-active enzymes (CAZymes) and peptidoglycan biosynthesis

The P. coloniale genome encoded 34 glycoside hydrolases (GHs) and 83 glycosyltransferases (GTs) belonging to 16 GH and 33 GT families (Supplementary Table 14). The total number of CAZymes was lower than in the early-diverging Chlorophyta and Streptophyta, and even lower than in Ostreococcus spp., the smallest eukaryotes (Supplementary Table 15), which probably reflects the simple chemical structure of the P. coloniale cell wall (cells are enclosed within thick cell walls⁵⁷). P. coloniale, however, harbours all genes involved in the biosynthesis and metabolism of starch (Supplementary Table 16 and Supplementary Fig. 11). Surprisingly, we could not find any enzymes involved in the synthesis or remodelling of the major components of the primary cell wall in embryophytes, such as enzymes of cellulose, mannan, xyloglucan and xylan biosynthesis and degradation. Chlorella spp. have been reported to contain a cell wall with components of glucosamine polymers such as chitin and chitosan⁵⁸. However, very few chitosan-related genes were identified in the genome of P. coloniale (Supplementary Table 17). Interestingly, a few bacteria/archaea-specific protein glycosylation genes could be detected in the P. coloniale genome, such as low-salt glycan biosynthesis protein Agl12, low-salt glycan biosynthesis reductase Agl14 and the GT AglE, which are involved in S-layer and cell surface structure biogenesis in bacteria and archaea⁵⁹. Furthermore, seven copies of regulatory response/sensor proteins, homologous to bacterial proteins, could be identified in P. coloniale, which might respond to environmental signals. Further studies are needed to biochemically explore the main components of the cell wall of P. coloniale.

Peptidoglycan is the main component of cell walls in bacteria⁶⁰. Peptidoglycan biosynthesis requires several enzymes to participate in the conversion of UDP-N-acetyl-d-glucosamine (GlcNAc) to GlcNAc-N-acetylmuramyl-pentapeptide-pyrophosphoryl-undecaprenol⁶¹. All ten core enzymes involved in peptidoglycan biosynthesis were identified in the P. coloniale genome (Fig. 5a). Consistent with previous results, Glaucoplantae (C. paradoxa) and Micromonas pusilla (Mamiellophyceae), as well as all streptophyte algae, bryophytes and ferns, encoded all the core enzymes⁶². We conclude that peptidoglycan was present in the ancestor of Archaeplastida, completely lost in Rhodoplantae but retained in the common ancestor of Viridiplantae and Glaucoplantae, and then independently lost (to different degrees) in the later-diverging Mamiellophyceae, the core Chlorophyta and the vast majority of vascular seed plants.

Fig. 5 — a, Distribution of proteins involved in the peptidoglycan biosynthetic pathway across Archaeplastida. b, Distribution of key flagellar proteins across Viridiplantae and Glaucoplantae.

Evolutionary analysis of flagella and sexual reproduction in P. coloniale

Prasinoderma coloniale and other members of the recently described class Palmophyllophyceae²⁷ have been reported to lack flagella^57,63–65. We performed a comparative analysis of flagellar proteins, and found that non-flagellate species (three species of Trebouxiophyceae and four of Ostreococcus and Bathycoccus) have ≤26 core flagellar proteins and a total number of flagellar proteins of ≤140 (average 117, n = 7), whereas flagellate species display ≥40 core flagellar proteins and a total of ≥192 flagellar proteins (average 272, n = 13) (Fig. 5b and Supplementary Table 18). This is corroborated by recent analyses among non-flagellate organisms (angiosperms, Rhodoplantae and pennate diatoms) that yielded on average 77 flagellar proteins²¹ (n = 10). Furthermore, non-flagellate species completely lack central pair proteins, dynein heavy chains (DHC1–7), and most of the intraflagellar transport (IFT) and radial spoke proteins (RSP)²¹. RSP3, which binds to the inner dynein arm of the axonemal doublet microtubule and is required for axonemal sliding and flagellar motility, is also absent in non-flagellate organisms^66,67. The number of flagellar proteins in P. coloniale (50 core proteins, 217 total flagellar proteins) and the presence of IFT (12) and DHC proteins (6), as well as RSP3, strongly suggest that P. coloniale can produce flagellate cells. The absence of the PF6 protein of the central pair microtubule apparatus may indicate that flagellate cells in P. coloniale are short-lived, like the spermatozoids of centric diatoms⁶⁸ or Chara braunii that also lack this protein.
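The count ranges reported above for flagellate and non-flagellate species do not overlap, so they can be read as simple cut-offs. The sketch below (illustrative Python; the class, function and label names are hypothetical, and treating the observed ranges as thresholds is an assumption of this example rather than a method described in the study) shows where the P. coloniale counts fall.

```python
from typing import NamedTuple


class FlagellarProfile(NamedTuple):
    core: int   # number of core flagellar proteins detected
    total: int  # total number of flagellar proteins detected


def classify(profile: FlagellarProfile) -> str:
    """Apply the count ranges reported in the text as simple cut-offs."""
    if profile.core >= 40 and profile.total >= 192:
        return "flagellate-like"
    if profile.core <= 26 and profile.total <= 140:
        return "non-flagellate-like"
    # Counts between the two observed ranges are left unassigned.
    return "indeterminate"


# Counts for P. coloniale as given in the text.
p_coloniale = FlagellarProfile(core=50, total=217)
print(classify(p_coloniale))  # flagellate-like
```

Such a cut-off only summarizes protein counts; the argument in the text additionally rests on the presence of specific proteins (IFT, DHC and RSP3).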

Since sexual reproduction has not been observed in P. coloniale and Palmophyllophyceae in general, we searched for genes participating in sexual reproduction. Thirty-one out of 40 meiosis-related genes were identified in the P. coloniale genome, and 8 out of 11 meiosis-specific genes were found (Supplementary Table 19). These numbers are higher than reported for meiosis-related genes in Symbiodinium⁶⁹ (25) and Trichomonas vaginalis⁷⁰ (27), and for meiosis-specific genes in some other protists (five genes in Giardia⁷¹ and diatoms⁷² and four in the trebouxiophytes Auxenochlorella and Helicosporidium⁷³), but similar to the number of meiosis-specific genes in Micromonas (7)⁴⁹. Interestingly, P. coloniale seems to lack DMC1, the loss of which correlates with the adaptation of recombination-independent mechanisms for pairing and synapsis in both Drosophila and Caenorhabditis⁷⁴. We tentatively conclude that P. coloniale retains the capacity for meiotic recombination and thus sexual reproduction.

De novo NAD⁺ and quinolinate biosynthesis in P. coloniale

Nicotinamide adenine dinucleotide (NAD) and its phosphate (NADP) are essential redox co-factors in all living systems. All eukaryotic organisms have the ability to synthesize NAD by one of two de novo pathways, the aspartate pathway⁷⁵ or the kynurenine pathway starting with tryptophan⁷⁶. To date, no eukaryotic organism had been found that contains both pathways. P. coloniale is the first eukaryotic organism to display both pathways (Supplementary Table 20, Fig. 6a and Supplementary Figs. 12–21): the (presumably) ancestral eukaryotic kynurenine pathway and the aspartate pathway. It has been hypothesized that the latter was acquired through primary endosymbiosis from cyanobacteria⁷⁷. This is corroborated by the fact that both aspartate oxidase (AO) and quinolinate synthase (QS) of Glaucoplantae branch within cyanobacteria in phylogenetic analyses. Rhodoplantae have apparently lost the aspartate pathway (and instead retained the kynurenine pathway for NAD biosynthesis⁷⁷). In Viridiplantae, however, the original cyanobacterial AO and QS genes were replaced by those acquired through horizontal gene transfer (HGT) from other bacteria (Bacteroidetes and Deltaproteobacteria, respectively⁷⁷). We can now develop a hypothetical evolutionary scenario for both pathways in Archaeplastida: with the introduction of the aspartate pathway from cyanobacteria during primary endosymbiosis, the ancestral eukaryotic kynurenine pathway for NAD biosynthesis was lost in Glaucoplantae and Viridiplantae (but not in Rhodoplantae, which apparently lost the aspartate pathway). While Glaucoplantae essentially retained the cyanobacterial aspartate pathway, the ancestor of the Viridiplantae replaced the cyanobacterial AO and QS genes by nuclear-encoded genes obtained from other bacteria through HGT, thus compensating their function and representing a new synapomorphy for Viridiplantae. This recalls the situation in Paulinella chromatophora, in which loss of genes from the chromatophore genome was compensated by bacterial genes obtained through HGT and encoded on the nuclear genome⁷⁸. We suggest that P. coloniale retained the kynurenine pathway, not for NAD synthesis but for synthesis (and possible excretion) of picolinic acid. Additionally, a gene fusion between KYU and HAAO of P. coloniale was observed (Fig. 6b). The metabolite picolinic acid, a tryptophan catabolite, can potentially form metal complexes with limiting trace elements such as iron, an important property in oligotrophic environments.

Fig. 6 — a, Distribution of genes related to the de novo NAD⁺ and quinolinate biosynthetic pathways in P. coloniale (orange) as compared with Rhodoplantae (red), Glaucoplantae (purple), early-diverging Chlorophyta (blue), early-diverging Streptophyta (green) and bacteria (brown). Solid circles denote the presence of homologues in each clade. TDO/IDO, tryptophan-/indoleamine 2,3-dioxygenase; AFM, arylformamidase; KMO, kynurenine 3-monooxygenase; KYU, kynureninase; HAAO, 3-hydroxyanthranilate 3,4-dioxygenase; ACMSD, 2-amino-3-carboxymuconate-6-semialdehyde decarboxylase; AO, l-aspartate oxidase; QS, quinolinate synthase. b, A gene fusion architecture between KYU and HAAO of P. coloniale; the left and right parts are the KYU and HAAO genes, respectively. A comparison of sequence similarity of various KYU and HAAO genes from different organisms is shown.

Vitamin auxotrophy and selenocysteine-containing proteins in P. coloniale

Previous studies found that some Mamiellophyceae (Ostreococcus, Micromonas) need to acquire the vitamins thiamine (B₁) and cobalamin (B₁₂) from the extracellular environment for growth, because they lack either enzymes needed for their biosynthesis (B₁) or vitamin-independent isoforms of essential enzymes (for example, methionine synthase) that require the vitamin as a co-enzyme (B₁₂)^79,80. P. coloniale shares these features with the picoplanktonic Mamiellophyceae (Supplementary Table 21). The absence of any enzymes involved in the biosynthesis of thiamine in P. coloniale (except for the final phosphorylation step) demonstrates (1) the lack of bacterial contamination in the genome assembly of the axenic strain used, and (2) that thiamine must be provided by the extracellular environment. It has recently been shown that, in Ostreococcus tauri, B₁ and B₁₂ auxotrophy can be alleviated by co-cultivation with the bacterium Dinoroseobacter shibae, a member of the Rhodobacteraceae⁸¹, suggesting that similar algal/bacterial partnerships may exist in P. coloniale. Most importantly, P. coloniale is also a vitamin B₇ (biotin) auxotroph, lacking all four genes involved in the biosynthesis of this vitamin, and apparently the only biotin auxotroph currently known among Archaeplastida⁸² (Supplementary Table 21). Many genomes of marine bacteria contain full sets of biotin biosynthesis genes^83,84 and these bacteria could be a source of biotin for P. coloniale. Genome compaction in these picoplanktonic eukaryotes may have facilitated evolution of such symbiotic interactions in an oligotrophic marine environment.

The Mamiellophyceae genomes encode a large number of selenocysteine-containing proteins compared to C. reinhardtii^49,54. The number of selenoproteins, their homologues and selenocysteine insertion sequences have recently been investigated across selected Archaeplastida genomes⁸⁵. P. coloniale also displayed a high number of selenocysteine-containing proteins, similar to picoplanktonic Mamiellophyceae but unlike core Chlorophyta, early-diverging streptophyte algae, Glaucoplantae and Rhodoplantae (Supplementary Table 22). Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and small cells may therefore require fewer of those proteins⁵⁴.

Conclusion

The picoplanktonic eukaryote P. coloniale is a member of a new division/phylum of Viridiplantae, the Prasinodermophyta, that in phylogenomic analyses diverges before the split of Viridiplantae into Chlorophyta and Streptophyta. Its genome revealed both ancestral and derived characteristics that correspond to its unique phylogenetic position, equidistant from Chlorophyta and Streptophyta. The genome of strain CCMP 1413 showed adaptations to a low-light (deep-water), oligotrophic oceanic environment. In such an environment, metabolic coupling and horizontal gene transfer from bacteria may have facilitated adaptation. In the latter, it resembles the genomes of the picoplanktonic Mamiellophyceae (Chlorophyta), although a number of apomorphic features in the genome of P. coloniale suggest that the picoplanktonic lifestyle in the two groups evolved independently.

Methods

Cultivation of algae, nucleic acid extraction and light microscopy

Cultures of P. coloniale (CCMP 1413) were obtained from the National Center for Marine Algae and Microbiota (https://ncma.bigelow.org/ccmp1413#.XqP0zGgzYdU). Axenic cultures were prepared by streaking out algae on agar and picking single cell-derived clones from the plates. Algae were grown in a modified ASP12 culture medium⁸⁶ (http://www.ccac.uni-koeln.de/) in a 14/10 h light/dark cycle at 20 µmol photons m⁻² s⁻¹ and 23 °C. During all steps of culture scale-up until nucleic acid extraction, axenicity was monitored by both sterility tests and light microscopy. Total RNA was extracted from P. coloniale using the CTAB-PVP method as described in ref.⁸⁷ (appendix S1 therein). Total DNA was extracted using a modified CTAB protocol⁸⁸. Light microscopy was performed with a Leica DMLB light microscope using a PL-APO ×100/1.40 numerical aperture (NA) objective, an immersed condenser (NA 1.4) and a Metz Mecablitz 32 Ct3 flash system.

Genome sequencing and assembly

Long-read libraries were constructed using standard library preparation protocols and sequenced on the PacBio Sequel platform. NextDenovo (https://github.com/Nextomics/NextDenovo) was used to generate the draft assembly. The draft assembly was first polished with PacBio reads using Arrow, then NextPolish was used to perform a second round of polishing using short reads generated on the Illumina sequencing platform. To eliminate putative bacterial contamination, contigs were searched against the NCBI non-redundant database.

K-mer analysis was performed to survey genome size, heterozygosity and repeat content before genome assembly. The peak of K-mer frequency (M) was determined by the real sequencing depth of the genome (N), read length (L) and the length of the K-mer (K) following the formula: M = N × (L – K + 1)/L. This formula enables accurate estimation of N, and hence an estimation of the genome size for homozygous diploid or haploid genomes. All these analyses indicated homozygosity of the genome and gave similar estimations of genome size. The final genome size was estimated (~26.04 Mb) using 17-mer analysis.
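As a worked illustration of the formula above, the following Python sketch inverts M = N × (L – K + 1)/L to recover the sequencing depth N and then divides the total number of sequenced bases by N to obtain a genome size. The function names and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from this study.

```python
def depth_from_kmer_peak(peak: float, read_length: int, k: int) -> float:
    """Invert M = N * (L - K + 1) / L to obtain the base-level depth N."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return peak * read_length / (read_length - k + 1)


def genome_size_from_bases(total_bases: float, peak: float,
                           read_length: int, k: int) -> float:
    """Estimate genome size as (total sequenced bases) / (sequencing depth N)."""
    depth = depth_from_kmer_peak(peak, read_length, k)
    return total_bases / depth


# Hypothetical inputs: 100-bp reads, 17-mers, K-mer depth peak at 42.
depth = depth_from_kmer_peak(peak=42, read_length=100, k=17)
size = genome_size_from_bases(total_bases=1_000_000_000, peak=42,
                              read_length=100, k=17)
print(f"N = {depth:.1f}x, genome size = {size / 1e6:.2f} Mb")
```

The same estimate follows from dividing the total number of K-mers by the peak depth M, since each read of length L contributes L – K + 1 K-mers.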

The quality of the assembly was evaluated in four ways: (1) We used BUSCO v.3 to determine the proportion of a core set of 303 highly conserved eukaryotic genes present in the assembly. (2) SOAP (v.2.21) was used to map the short reads to the assemblies to evaluate the DNA read mapping rate in both species. At the same time, sequence depth and genomic copy content distribution were calculated. (3) We used BLAT (v.36) to compare the draft assemblies with the transcripts assembled by Bridger. (4) We mapped the RNA reads to the draft assemblies using Tophat2 to evaluate the RNA read mapping rate.

Transcriptome sequencing and assembly

Two methods of library construction were used. The rRNA-depleted RNA library was constructed using the Ribo-Zero rRNA removal kit (plant) (Illumina) following the manufacturer’s protocol, while the poly(A)-selected RNA library was constructed using the ScriptSeq Library Prep kit (Plant leaf) (Illumina) following the manufacturer’s protocol. A total of 12.09 Gb of PE-100 RNA-seq data was generated using the Illumina HiSeq 4000 sequencing platform. SOAPfilter (v.2.2) was used to filter out reads with N > 10 bp and to remove duplicates and adaptors. After filtering, 5.58 Gb of clean reads were obtained; Bridger was then used to assemble the clean data into a transcriptome, which was used for gene annotation and genome evaluation.

Repeat annotation

A pipeline combining de novo and library-based approaches was used to identify repeat elements. For the de novo approach, MITE-hunter and LTRharvest were used to annotate transposons and retrotransposons, respectively, then RepeatModeler (v.1.0.8) was run to annotate the other repeat elements. For the library-based approach, RepeatMasker was used with the custom library Repbase 22.01 to identify repeat elements.

Gene prediction and preliminary functional annotation

Three methods were combined to predict gene models: an ab initio prediction method, a homologue search method and an RNA-seq data-aided method. For the first method, PASApipeline-2.1.0 was run to predict gene structure using transcripts assembled by Bridger, which were further used in AUGUSTUS (v.3.2.3) to train gene models. GeneMark (v.1.0) was used to construct a hidden Markov model (HMM) profile for further annotation. For the homologue search method, gene sets of homologue species and public proteins of Prasinoderma were downloaded from the NCBI database. For the RNA-aided method, transcripts assembled by Bridger were used as evidence. All predictions were combined using two rounds of MAKER (v.2.31.8) to yield the consensus gene sets. The final gene set was evaluated by mapping against the eukaryotic BUSCO v.3 dataset and by RNA read mapping with Tophat2. Coverage depth was calculated with Samtools (v.0.1.19).

Preliminary gene function annotation was performed by BLASTP (E-value <10 × 10^–5) against several known databases, including SwissProt, TrEMBL, KEGG, COG and NR. InterProScan (using data from Pfam, PRINTS, SMART, ProDom and PROSITE) was used to identify protein motifs and protein domains of the predicted gene set. Gene Ontology information was obtained through Blast2Go (v.2.5.0). For certain key functional genes we used a stricter functional annotation method by the addition of some known query genes, as described in ‘Detection of key candidate functional genes’.

Whole-genome phylogenetic analysis

For whole-genome phylogenetic analysis, both genome data downloaded from public databases (NCBI Refseq or JGI) and transcriptome data downloaded from the 1000 Plants Project (1KP, https://sites.google.com/a/ualberta.ca/onekp/) were used. First, OrthoFinder (v.1.1.8) was used to infer orthogroups (gene families) among the 28 selected organisms. Single-copy orthogroups (gene families with only one gene copy per species) were collected, since every single-copy gene in each gene family could be an orthologue among the 28 organisms. We used multiple alignment with fast Fourier transform (MAFFT v.7.310) to perform multiple sequence alignment for each single-copy gene orthogroup, followed by gap-position filtering (removing only positions at which 50% or more of the sequences have a gap). We constructed multiple phylogenetic trees using different tree construction methods (concatenated and coalescent methods) based on different taxon samplings (that is, number of species). In concatenated tree reconstruction, the single-copy gene alignments were linked in order to establish a super-gene, which was used to construct a concatenated maximum-likelihood phylogenetic tree with either RAxML (amino acid substitution model: CAT + GTR, with 500 bootstrap replicates) or IQ-TREE (amino acid substitution model inferred by ModelFinder, with 500 bootstrap replicates). In addition, we used MrBayes (v.3.2.6) to construct a Bayesian phylogenetic tree, with the Markov chain Monte Carlo set to run for 1,000,000 generations and sampled every 1,000 generations, the first 25% of which were discarded as burn-in. In the coalescent method, a maximum-likelihood phylogenetic tree was constructed for each single-copy orthogroup. We then used ASTRAL to combine all single-copy gene trees into a species tree with the multi-species coalescent model. Finally, we compared and summarized phylogenetic trees using different methods or different datasets. For a general discussion on concatenated versus coalescent methods for phylogenetic reconstruction, see ref.²⁸.
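The gap filtering and concatenation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: alignments are represented as plain dictionaries mapping a taxon name to an aligned sequence, the function names are hypothetical, and the toy sequences are invented for the example.

```python
from typing import Dict, List


def drop_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Remove columns in which the fraction of gaps is >= max_gap_fraction."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        gaps = sum(1 for name in names if alignment[name][col] == "-")
        if gaps / len(names) < max_gap_fraction:
            keep.append(col)
    return {name: "".join(alignment[name][c] for c in keep) for name in names}


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Link single-copy gene alignments in order into one super-gene per taxon."""
    taxa = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("every alignment must contain the same taxa")
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}


# Toy example with four hypothetical taxa and two single-copy genes.
gene1 = {"A": "MK-L", "B": "MK-L", "C": "M-QL", "D": "MKQL"}
gene2 = {"A": "GG", "B": "GA", "C": "GG", "D": "AG"}
supergene = concatenate([drop_gappy_columns(gene1), drop_gappy_columns(gene2)])
print(supergene["A"])  # MKLGG
```

In the toy example, the third column of the first gene has gaps in two of four sequences (50%) and is therefore removed, whereas the second column (one gap in four) is kept.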

Phylogenetic analyses of complete nuclear and plastid-encoded rRNA operon sequences of 109 Archaeplantae

New sequence data of rRNA operons were generated for several taxa (see Supplementary Table 33, and as described previously³⁰). For other taxa, data were either retrieved from annotated entries in sequence databases (https://www.ncbi.nlm.nih.gov/nucleotide/) or assembled from non-annotated transcriptome sequence data (MMETSP and ONE_KP; see Supplementary Table 31). All new rRNA sequences, as well as newly assembled transcriptome data, were submitted to GenBank (https://www.ncbi.nlm.nih.gov/genbank/; bold accession numbers in Supplementary Table 31). Sequences were manually aligned, guided by rRNA/transfer RNA secondary structures using SeaView 4.3.0 (http://pbil.univ-lyon1.fr/software/seaview.html). For phylogenetic analyses, only those positions were selected that could be unambiguously aligned among the Rhodoplantae, Glaucoplantae and Viridiplantae (in total, 8,818 nucleotides (nt)). For all phylogenetic analyses, the 8,818 positions were subdivided into four sections: nuclear 18S rDNA (1,621 nt), nuclear 5.8S and 28S rDNA (3,025 nt), plastid 16S rDNA and two tRNA genes (1,535 nt) and plastid 23S rDNA (2,637 nt). Tree reconstructions were performed at the CIPRES Science Gateway (http://www.phylo.org/sub_sections/portal/) using three methods: maximum-likelihood with RAxML (v.8.2.10), maximum-likelihood with IQ-TREE (v.1.6.10) and Bayesian tree reconstruction with MrBayes (v.3.2.6).
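The four sections listed above add up to the 8,818 aligned positions. The Python sketch below derives coordinate ranges for them and prints lines in the style of a RAxML partition file. The partition names are hypothetical, and the order of the sections within the alignment is assumed to follow the order in which they are listed in the text.

```python
from typing import List, Tuple

# Section lengths (nt) as given in the text; names and order are assumptions.
SECTIONS = [
    ("nuclear_18S", 1621),
    ("nuclear_5_8S_28S", 3025),
    ("plastid_16S_tRNA", 1535),
    ("plastid_23S", 2637),
]


def partition_ranges(sections: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Convert consecutive section lengths into 1-based inclusive ranges."""
    ranges = []
    start = 1
    for name, length in sections:
        end = start + length - 1
        ranges.append((name, start, end))
        start = end + 1
    return ranges


ranges = partition_ranges(SECTIONS)
for name, start, end in ranges:
    print(f"DNA, {name} = {start}-{end}")
```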

RAxML analyses were performed with 1,000 bootstrap replicates, each with 100 starting trees, using either the GTRGAMMA model (for all trees shown here) or the GTRCAT model (GTRCAT trees were almost identical to GTRGAMMA trees; not shown). In likelihood analyses using IQ-TREE, the best-fitting model was identified by ModelFinder and the bootstrap analysis again involved 1,000 replicates. For Bayesian analysis, 1,000,000 generations were calculated under the GTR + I + G model, and generations 1–250,000 were discarded as burn-in. Bootstrap percentages <50% and Bayesian posterior probabilities <0.9 were regarded as ‘unsupported’. Phylogenetic trees were also constructed using nuclear- and plastid-encoded rRNA operons separately (Supplementary Figs. 5 and 6).
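The support thresholds and burn-in stated above can be expressed compactly. The following Python sketch uses hypothetical function names and only encodes the numbers given in this paragraph (bootstrap ≥50%, posterior probability ≥0.9, and 250,000 of 1,000,000 generations discarded).

```python
# Thresholds from the text: values below these are regarded as 'unsupported'.
SUPPORT_THRESHOLDS = {"bootstrap": 50.0, "posterior": 0.9}


def is_supported(value: float, kind: str) -> bool:
    """Return True if a branch support value meets the stated threshold."""
    return value >= SUPPORT_THRESHOLDS[kind]


def burn_in_fraction(burn_in_generations: int, total_generations: int) -> float:
    """Fraction of the MCMC run discarded as burn-in."""
    return burn_in_generations / total_generations


print(is_supported(49.0, "bootstrap"))        # False
print(is_supported(0.95, "posterior"))        # True
print(burn_in_fraction(250_000, 1_000_000))   # 0.25
```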

Search for unique rRNA synapomorphies

To find unique molecular synapomorphies (Supplementary Files_Taxonomic Acts and Revisions)—that is, rare mutations that characterize a given clade—we performed tree-based synapomorphy searches as previously described⁸⁹. To identify genuine non-homoplasious synapomorphies (flagged as NHS), and to find homoplasious changes (parallelisms and reversals), the synapomorphy search must cover as much diversity as possible. Therefore, all synapomorphies that resulted from the initial search procedure (using only 109 Plantae) were controlled for homoplasies by (1) a taxon-rich alignment containing nuclear rRNA operons from about 1,300 Archaeplastida/Plantae, (2) an alignment with plastid rRNA operons from about 1,600 Archaeplastida/Plantae and (3) BLAST searches (https://blast.ncbi.nlm.nih.gov/Blast.cgi).

Genome composition of P. coloniale

We examined the composition of the P. coloniale genome in three main ways: (1) gene family clustering was first performed on gene sets of the species C. atmophyticus (Streptophyta), M. commoda (Chlorophyta), C. paradoxa (Glaucoplantae) and P. coloniale. Commonly shared and unique gene families were displayed on a Venn diagram. (2) The P. coloniale gene set was aligned to the NCBI non-redundant database (NR), and the best alignment results (Best-hit) were obtained for each gene. The NCBI taxonomy database was then used to classify the P. coloniale gene set. (3) We selected early-diverging Streptophyta (five species), early-diverging Chlorophyta (five species) and P. coloniale to perform gene family clustering, and then divided all gene families into three categories: early Chlorophyta gene families, early Streptophyta gene families and gene families shared by both early Chlorophyta and early Streptophyta³⁵. First, we removed outlier gene families in which the gene number of some species was over tenfold larger than the average gene number of the other species. We also removed gene families that include only one species. Then, the average gene numbers in early Chlorophyta and early Streptophyta were determined for each gene family. If the average gene number of early Chlorophyta in a gene family was more than twice the number of early Streptophyta, that gene family was designated as an early Chlorophyta gene family. Conversely, if the average gene number of early Streptophyta in a gene family was more than twice the number of early Chlorophyta, that gene family was designated as an early Streptophyta gene family. The remaining gene families were considered shared between early Chlorophyta and early Streptophyta.
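The classification rules in step (3) can be sketched as follows. This Python example is a minimal illustration, assuming that each gene family is given as a dictionary of per-species gene counts (including zeros); the function name, the species labels and the counts are hypothetical.

```python
from typing import Dict, Optional, Set


def classify_family(counts: Dict[str, int],
                    chlorophyta: Set[str],
                    streptophyta: Set[str]) -> Optional[str]:
    """Classify one gene family; return None if the family is filtered out."""
    # Remove families that are present in only one species.
    if sum(1 for n in counts.values() if n > 0) <= 1:
        return None
    # Remove families in which one species has over tenfold more genes
    # than the average of the other species.
    for species, n in counts.items():
        others = [m for other, m in counts.items() if other != species]
        if others and n > 10 * (sum(others) / len(others)):
            return None
    chl = sum(counts.get(sp, 0) for sp in chlorophyta) / len(chlorophyta)
    stre = sum(counts.get(sp, 0) for sp in streptophyta) / len(streptophyta)
    if chl > 2 * stre:
        return "early Chlorophyta"
    if stre > 2 * chl:
        return "early Streptophyta"
    return "shared"


# Hypothetical species labels: two chlorophytes, two streptophytes, P. coloniale.
CHL, STR = {"c1", "c2"}, {"s1", "s2"}
print(classify_family({"c1": 4, "c2": 6, "s1": 1, "s2": 2, "pc": 2}, CHL, STR))
# early Chlorophyta
```

How the study handled families with a zero average in one group is not stated; in this sketch such a family is assigned to the other group whenever that group's average is greater than zero.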

Detection of key candidate functional genes

All candidate genes were screened based on the following conditions: (1) candidate gene sequences should be similar to the query genes collected from previous studies or databases (BLAST <10 × 10^–5); and (2) the function of the candidate genes should be consistent with the query genes according to online NR functional annotation or Swissprot functional annotation.

Regarding the detection of flagellar genes, we mainly referenced the flagellar genes from refs.^49,90. After elimination of redundancy, we obtained 397 flagellar genes as our query set. We used the reciprocal best hits method to identify flagellar genes.
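The reciprocal best hits criterion mentioned above can be sketched in Python as follows. The sketch assumes that similarity search results are already available as (query, subject, bit score) tuples for both search directions; the function names, gene identifiers and scores are hypothetical, and ranking hits by bit score is an assumption rather than a detail stated in the text.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query, subject, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical hits: query flagellar proteins vs. genome, and genome vs. queries.
forward = [("IFT88", "g1", 300.0), ("IFT88", "g2", 120.0), ("RSP3", "g2", 200.0)]
reverse = [("g1", "IFT88", 290.0), ("g2", "IFT88", 130.0), ("g2", "RSP3", 110.0)]
print(reciprocal_best_hits(forward, reverse))  # [('IFT88', 'g1')]
```

In the toy data, RSP3 finds g2 as its best hit, but the best hit of g2 is IFT88, so that pair is not reciprocal and is excluded.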

For cell wall-related gene annotation we used the CAZyme database as query, then the web meta-server dbCAN2 (http://bcb.unl.edu/dbCAN2/index.php) was used to detect CAZymes. dbCAN2 integrates three tools/databases for automated CAZyme annotation: (1) HMMER, for annotation of the CAZyme domain against the dbCAN CAZyme domain HMM database; (2) DIAMOND for fast blast hits in the CAZy database; and (3) Hotpep for short conserved motifs in the Peptide Pattern Recognition (PPR) library.

For TFs we used the HMMER search method. We downloaded the HMMER model of the domain structure of each transcription factor from the Pfam website (https://pfam.xfam.org/) while referring to the TAPscan v.2 transcription factor database⁹¹ (https://plantcode.online.uni-marburg.de/tapscan/). Preliminary candidates were collected by searching the profile HMM for each species (<10 × 10^–5), then we filtered those genes that did not match the SwissProt functional annotation (<10 × 10^–5). Finally, we filtered genes containing a wrong domain according to the domain rules of the TAPscan v.2 transcription factor database. Most TFs/TRs were confirmed by phylogenetic tree analysis.

Subcellular localization

To predict the subcellular localization of key proteins (for example, certain enzymes related to carbon-concentrating mechanisms), we used online tools including WoLF_PSORT (https://www.genscript.com/wolf-psort.html?src=leftbar), TargetP (http://www.cbs.dtu.dk/services/TargetP/), Hectar (https://webtools.sb-roscoff.fr/) and LocSigDB (http://genome.unmc.edu/LocSigDB/index.html). The localization was estimated by combining the results of the four tools.
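The text does not specify how the results of the four tools were combined. One simple possibility, shown here purely as an assumption, is a majority vote that reports a compartment only when it is the unique most frequent prediction and is supported by at least two tools. The function name, the "ambiguous" label and the example predictions are hypothetical.

```python
from collections import Counter
from typing import Dict


def consensus_localization(predictions: Dict[str, str], min_votes: int = 2) -> str:
    """Majority vote over per-tool predictions (an assumed combination rule)."""
    ranked = Counter(predictions.values()).most_common()
    top_label, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < min_votes or tied:
        return "ambiguous"
    return top_label


# Hypothetical predictions for one protein from the four tools named in the text.
example = {
    "WoLF_PSORT": "chloroplast",
    "TargetP": "chloroplast",
    "Hectar": "mitochondrion",
    "LocSigDB": "chloroplast",
}
print(consensus_localization(example))  # chloroplast
```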

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Supplementary Information (12.3 MB, pdf): Supplementary Figs. 1–21.

Reporting Summary (70.4 KB, pdf)

Supplementary Tables (280.1 KB, xlsx): Supplementary Tables 1–33.

Supplementary Data 1 (353 KB, zip): Taxonomic Acts and Revisions.

Acknowledgements

We thank G. Günther (http://www.mikroskopia.de/index.html) for microscopic images of P. coloniale. Financial support was provided by the Shenzhen Municipal Government of China (grant no. JCYJ20151015162041454) and the Guangdong Provincial Key Laboratory of Genome Read and Write (grant no. 2017B030301011). This work is part of the 10KP project, and is supported by China National GeneBank.

Author contributions

H. Liu, M.M. and Y.V.P. conceived, designed and supervised the project. M.M., H. Liu, X.X., J.W., G.K.-S.W. and H.Y. provided resources and materials. Z.C. and S.K.S. developed the protocol for DNA extraction. S. Wittek, T.R. and B. Melkonian grew the organisms to quantity and extracted DNA. Samples were sequenced by BGI. L.L. and S. Wang generated the draft genome and performed the annotation. L.L., S. Wang, H.W., S.K.S., B. Marin, H.Y., Y.X., H. Li, H. Liang, Z.L., S.C. and M.P. analysed data. S. Wang, L.L., S.K.S. and M.M. wrote the paper. J.W., H.Y., X.L., H. Liu, M.M. and Y.V.P. revised the manuscript. All authors read and revised the final version of the manuscript.

Data availability

Whole-genome assemblies, annotation and raw data for P. coloniale in this study are deposited at the CNGB Nucleotide Sequence Archive⁹² (CNSA: http://db.cngb.org/cnsa, accession no. CNP0000924).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Linzhou Li, Sibo Wang.

Change history

2/15/2021

A Correction to this paper has been published: 10.1038/s41559-020-1268-5

Contributor Information

Yves Van de Peer, Email: yves.vandepeer@psb.vib-ugent.be.

Michael Melkonian, Email: michael.melkonian@uni-koeln.de.

Huan Liu, Email: liuhuan@genomics.cn.

Extended data is available for this paper at 10.1038/s41559-020-1221-7.

Supplementary information is available for this paper at 10.1038/s41559-020-1221-7.

References

1. Niklas, K. J. The Evolutionary Biology of Plants (Univ. of Chicago Press, 1997).
2. Kenrick P, Crane P. The origin and early evolution of plants on land. Nature. 1997;389:33–39.
3. Willis, K. & McElwain, J. The Evolution of Plants (Oxford Univ. Press, 2014).
4. Judd, W. S., Campbell, C. S., Kellogg, E. A., Stevens, P. F. & Donoghue, M. J. Plant Systematics: A Phylogenetic Approach (Sinauer, 2008).
5. Courties C, et al. Smallest eukaryotic organism. Nature. 1994;370:255.
6. Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol. 2004;21:809–818. doi: 10.1093/molbev/msh075.
7. Morris JL, et al. The timescale of early land plant evolution. Proc. Natl Acad. Sci. USA. 2018;115:E2274–E2283. doi: 10.1073/pnas.1719588115.
8. Bremer K. Summary of green plant phylogeny and classification. Cladistics. 1985;1:369–385. doi: 10.1111/j.1096-0031.1985.tb00434.x.
9. Melkonian M, Surek B. Phylogeny of the Chlorophyta: congruence between ultrastructural and molecular evidence. Bull. Soc. Zool. Fr. 1995;120:191–208.
10. Lewis LA, McCourt RM. Green algae and the origin of land plants. Am. J. Bot. 2004;91:1535–1556. doi: 10.3732/ajb.91.10.1535.
11. Becker B, Marin B. Streptophyte algae and the origin of embryophytes. Ann. Bot. 2009;103:999–1004. doi: 10.1093/aob/mcp044.
12. Leliaert F, et al. Phylogeny and molecular evolution of the green algae. Crit. Rev. Plant Sci. 2012;31:1–46.
13. Wickett NJ, et al. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl Acad. Sci. USA. 2014;111:E4859–E4868. doi: 10.1073/pnas.1323926111.
14. Ruhfel BR, Gitzendanner MA, Soltis PS, Soltis DE, Burleigh JG. From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol. Biol. 2014;14:23. doi: 10.1186/1471-2148-14-23.
15. Gitzendanner MA, Soltis PS, Wong GKS, Ruhfel BR, Soltis DE. Plastid phylogenomic analysis of green plants: a billion years of evolutionary history. Am. J. Bot. 2018;105:291–301. doi: 10.1002/ajb2.1048.
16. Turmel M, Lemieux C. Evolution of the plastid genome in green algae. Adv. Bot. Res. 2018;85:157–193.
17. Stewart KD, Mattox KR. Structural evolution in the flagellated cells of green algae and land plants. BioSystems. 1978;10:145–152. doi: 10.1016/0303-2647(78)90036-9.
18. Melkonian, M. Structural and evolutionary aspects of the flagellar apparatus in green algae and land plants. Taxon 31, 255–265 (1982).
19. Lemieux C, Otis C, Turmel M. Ancestral chloroplast genome in Mesostigma viride reveals an early branch of green plant evolution. Nature. 2000;403:649–652. doi: 10.1038/35001059.
20. Wang S, et al. Genomes of early-diverging streptophyte algae shed light on plant terrestrialization. Nat. Plants. 2020;6:95–106. doi: 10.1038/s41477-019-0560-3.
21. Rodríguez-Ezpeleta N, Philippe H, Brinkmann H, Becker B, Melkonian M. Phylogenetic analyses of nuclear, mitochondrial, and plastid multigene data sets support the placement of Mesostigma in the Streptophyta. Mol. Biol. Evol. 2007;24:723–731. doi: 10.1093/molbev/msl200.
22. Marin B, Melkonian M. Mesostigmatophyceae, a new class of streptophyte green algae revealed by SSU rRNA sequence comparisons. Protist. 1999;150:399–417. doi: 10.1016/S1434-4610(99)70041-6.
23. Guillou L, et al. Diversity of picoplanktonic prasinophytes assessed by direct nuclear SSU rDNA sequencing of environmental samples and novel isolates retrieved from oceanic and coastal marine ecosystems. Protist. 2004;155:193–214. doi: 10.1078/143446104774199592.
24. Marin B, Melkonian M. Molecular phylogeny and classification of the Mamiellophyceae class. nov. (Chlorophyta) based on sequence comparisons of the nuclear- and plastid-encoded rRNA operons. Protist. 2010;161:304–336. doi: 10.1016/j.protis.2009.10.002.
25. Lemieux C, Otis C, Turmel M. Six newly sequenced chloroplast genomes from prasinophyte green algae provide insights into the relationships among prasinophyte lineages and the diversity of streamlined genome architecture in picoplanktonic species. BMC Genomics. 2014;15:857. doi: 10.1186/1471-2164-15-857.
26. Zechman FW, et al. An unrecognized ancient lineage of green plants persists in deep marine waters. J. Phycol. 2010;46:1288–1295.
27. Leliaert F, et al. Chloroplast phylogenomic analyses reveal the deepest-branching lineage of the Chlorophyta, Palmophyllophyceae class. nov. Sci. Rep. 2016;6:25367. doi: 10.1038/srep25367.
28. Molloy, E. & Warnow, T. Large-scale species tree estimation. Preprint at arXiv https://arxiv.org/abs/1904.02600 (2019).
29. Leebens-Mack JH, et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature. 2019;574:679–685. doi: 10.1038/s41586-019-1693-2.
30. Marin B. Nested in the Chlorellales or independent class? Phylogeny and classification of the Pedinophyceae (Viridiplantae) revealed by molecular phylogenetic analyses of complete nuclear and plastid-encoded rRNA operons. Protist. 2012;163:778–805. doi: 10.1016/j.protis.2011.11.004.
31. Shen X-X, Hittinger CT, Rokas A. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat. Ecol. Evol. 2017;1:0126. doi: 10.1038/s41559-017-0126.
32. Walker JF, Walker-Hale N, Vargas OM, Larson DA, Stull GW. Characterizing gene tree conflict in plastome-inferred phylogenies. PeerJ. 2019;7:e7747. doi: 10.7717/peerj.7747.
33. Gonçalves DJP, Simpson BB, Ortiz EM, Shimizu GH, Jansen RK. Incongruence between gene trees and species trees and phylogenetic signal variation in plastid genes. Mol. Phylogenet. Evol. 2019;138:219–232. doi: 10.1016/j.ympev.2019.05.022.
34. Grimsley, N., Yau, S., Piganeau, G. & Moreau, H. in Marine Protists (eds. Ohtsuka, S. et al.) 107–127 (Springer, 2015).
35. Hori K, et al. Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nat. Commun. 2014;5:3978. doi: 10.1038/ncomms4978.
36. Nishiyama T, et al. The Chara genome: secondary complexity and implications for plant terrestrialization. Cell. 2018;174:448–464. doi: 10.1016/j.cell.2018.06.033.
37. Rinerson CI, Rabara RC, Tripathi P, Shen QJ, Rushton PJ. The evolution of WRKY transcription factors. BMC Plant Biol. 2015;15:66. doi: 10.1186/s12870-015-0456-y.
38. Safi A, et al. The world according to GARP transcription factors. Curr. Opin. Plant Biol. 2017;39:159–167. doi: 10.1016/j.pbi.2017.07.006.
39. Guo A-Y, et al. Genome-wide identification and evolutionary analysis of the plant-specific SBP-box transcription factor family. Gene. 2008;418:1–8. doi: 10.1016/j.gene.2008.03.016.
40. Kropat J, et al. A regulator of nutritional copper signaling in Chlamydomonas is an SBP domain protein that recognizes the GTAC core of copper response element. Proc. Natl Acad. Sci. USA. 2005;102:18730–18735. doi: 10.1073/pnas.0507693102.
41. Moreno-Risueno MÁ, Martínez M, Vicente-Carbajosa J, Carbonero P. The family of DOF transcription factors: from green unicellular algae to vascular plants. Mol. Genet. Genomics. 2007;277:379–390. doi: 10.1007/s00438-006-0186-9.
42. Crevillén P, et al. Epigenetic reprogramming that prevents transgenerational inheritance of the vernalized state. Nature. 2014;515:587–590. doi: 10.1038/nature13722.
43. Liu C, Lu F, Cui X, Cao X. Histone methylation in higher plants. Annu. Rev. Plant Biol. 2010;61:395–420. doi: 10.1146/annurev.arplant.043008.091939.
44. Croce, R., Van Grondelle, R., Van Amerongen, H. & Van Stokkum, I. Light Harvesting in Photosynthesis (CRC Press, 2018).
45. Grossman AR, Bhaya D, Apt KE, Kehoe DM. Light-harvesting complexes in oxygenic photosynthesis: diversity, control, and evolution. Annu. Rev. Genet. 1995;29:231–288. doi: 10.1146/annurev.ge.29.120195.001311.
46.Dreyfuss BW, Thornber JP. Assembly of the light-harvesting complexes (LHCs) of photosystem II (monomeric LHC Iib complexes are intermediates in the formation of oligomeric LHC IIb complexes) Plant Physiol. 1994;106:829–839. doi: 10.1104/pp.106.3.829. [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Schmid VHR. Light-harvesting complexes of vascular plants. Cell. Mol. Life Sci. 2008;65:3619–3639. doi: 10.1007/s00018-008-8333-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Kunugi M, et al. Evolution of green plants accompanied changes in light-harvesting systems. Plant Cell Physiol. 2016;57:1231–1243. doi: 10.1093/pcp/pcw071. [DOI] [PubMed] [Google Scholar]
49.Worden AZ, et al. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science. 2009;324:268–272. doi: 10.1126/science.1167222. [DOI] [PubMed] [Google Scholar]
50.Cock JM, et al. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature. 2010;465:617–621. doi: 10.1038/nature09016. [DOI] [PubMed] [Google Scholar]
51.Roberts K, Granum E, Leegood RC, Raven JA. Carbon acquisition by diatoms. Photosynth. Res. 2007;93:79–88. doi: 10.1007/s11120-007-9172-2. [DOI] [PubMed] [Google Scholar]
52.Kroth PG, et al. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis. PLoS ONE. 2008;3:e1426. doi: 10.1371/journal.pone.0001426. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Radakovits R, et al. Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropis gaditana. Nat. Commun. 2012;3:686. doi: 10.1038/ncomms1688. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Palenik B, et al. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc. Natl Acad. Sci. USA. 2007;104:7705–7710. doi: 10.1073/pnas.0611046104. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Derelle E, et al. Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl Acad. Sci. USA. 2006;103:11647–11652. doi: 10.1073/pnas.0604795103. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Moreau H, et al. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 2012;13:R74. doi: 10.1186/gb-2012-13-8-r74. [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Jouenne F, et al. Prasinoderma singularis sp. nov. (Prasinophyceae, Chlorophyta), a solitary coccoid prasinophyte from the South-East Pacific Ocean. Protist. 2011;162:70–84. doi: 10.1016/j.protis.2010.04.005. [DOI] [PubMed] [Google Scholar]
58.Blanc G, et al. The Chlorella variabilis NC64A genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic sex. Plant Cell. 2010;22:2943–2955. doi: 10.1105/tpc.110.076406. [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Fagan RP, Fairweather NF. Biogenesis and functions of bacterial S-layers. Nat. Rev. Microbiol. 2014;12:211–222. doi: 10.1038/nrmicro3213. [DOI] [PubMed] [Google Scholar]
60.Seltmann, G. & Holst, O.The Bacterial Cell Wall (Springer Science & Business Media, 2013).
61.Lovering AL, Safadi SS, Strynadka NCJ. Structural perspective of peptidoglycan biosynthesis and assembly. Annu. Rev. Biochem. 2012;81:451–478. doi: 10.1146/annurev-biochem-061809-112742. [DOI] [PubMed] [Google Scholar]
62.van Baren MJ, et al. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants. BMC Genomics. 2016;17:267. doi: 10.1186/s12864-016-2585-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
63.Miyashita H, Ikemoto H, Kurano N, Miyachi S, Chihara M. Prasinococcus capsulatus gen. et sp. nov., a new marine coccoid prasinophyte. J. Gen. Appl. Microbiol. 1993;39:571–582. [Google Scholar]
64.Hasegawa T, et al. Prasinoderma coloniale gen. et sp. nov., a new pelagic coccoid prasinophyte from the Western Pacific Ocean. Phycologia. 1996;35:170–176. [Google Scholar]
65.Sieburth JM, Keller MD, Johnson PW, Myklestad SM. Widespread occurrence of the oceanic ultraplankter, Prasinococcus capsulatus (Prasinophyceae), the diagnostic “Golgi‐decapore complex” and the newly described polysaccharide “capsulan”. J. Phycol. 1999;35:1032–1043. [Google Scholar]
66.Yang, P. & Smith, E. F. inThe Chlamydomonas Sourcebook (ed. Witman, G. B.) 209–234 (Elsevier, 2009).
67.Jivan A, Earnest S, Juang YC, Cobb MH. Radial spoke protein 3 is a mammalian protein kinase A-anchoring protein that binds ERK1/2. J. Biol. Chem. 2009;284:29437–29445. doi: 10.1074/jbc.M109.048181. [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Armbrust EV, et al. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 2004;306:79–86. doi: 10.1126/science.1101156. [DOI] [PubMed] [Google Scholar]
69.Chi J, Parrow MW, Dunthorn M. Cryptic sex in Symbiodinium (Alveolata, Dinoflagellata) is supported by an inventory of meiotic genes. J. Eukaryot. Microbiol. 2014;61:322–327. doi: 10.1111/jeu.12110. [DOI] [PubMed] [Google Scholar]
70.Malik S-B. An expanded inventory of conserved meiotic genes provides evidence for sex in Trichomonas vaginalis. PLoS ONE. 2008;3:e2879. doi: 10.1371/journal.pone.0002879. [DOI] [PMC free article] [PubMed] [Google Scholar]
71.Ramesh MA, Malik SB, Logsdon JM., Jr A phylogenomic inventory of meiotic genes: evidence for sex in Giardia and an early eukaryotic origin of meiosis. Curr. Biol. 2005;15:185–191. doi: 10.1016/j.cub.2005.01.003. [DOI] [PubMed] [Google Scholar]
72.Patil S, et al. Identification of the meiotic toolkit in diatoms and exploration of meiosis-specific SPO11 and RAD51 homologs in the sexual species Pseudo-nitzschia multistriata and Seminavis robusta. BMC Genomics. 2015;16:930. doi: 10.1186/s12864-015-1983-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
73.Fučíková K, Pažoutová M, Rindi F. Meiotic genes and sexual reproduction in the green algal class Trebouxiophyceae (Chlorophyta) J. Phycol. 2015;51:419–430. doi: 10.1111/jpy.12293. [DOI] [PubMed] [Google Scholar]
74.Villeneuve AM, Hillers KJ. Whence meiosis? Cell. 2001;106:647–650. doi: 10.1016/s0092-8674(01)00500-1. [DOI] [PubMed] [Google Scholar]
75.Griffith GR, Chandler JL, Gholson RK. Studies on the de novo biosynthesis of NAD in Escherichia coli: the separation of the nadB gene product from the nadA gene product and its purification. Eur. J. Biochem. 1975;54:239–245. doi: 10.1111/j.1432-1033.1975.tb04133.x. [DOI] [PubMed] [Google Scholar]
76.Gaertner FH, Shetty AS. Kynureninase-type enzymes and the evolution of the aerobic tryptophan-to-nicotinamide adenine dinucleotide pathway. Biochim. Biophys. Acta Enzymol. 1977;482:453–460. doi: 10.1016/0005-2744(77)90259-5. [DOI] [PubMed] [Google Scholar]
77.Ternes CM, Schönknecht G. Gene transfers shaped the evolution of de novo NAD+ biosynthesis in eukaryotes. Genome Biol. Evol. 2014;6:2335–2349. doi: 10.1093/gbe/evu185. [DOI] [PMC free article] [PubMed] [Google Scholar]
78.Nowack ECM, et al. Gene transfers from diverse bacteria compensate for reductive genome evolution in the chromatophore of Paulinella chromatophora. Proc. Natl Acad. Sci. USA. 2016;113:12214–12219. doi: 10.1073/pnas.1608016113. [DOI] [PMC free article] [PubMed] [Google Scholar]
79.Croft MT, Lawrence AD, Raux-Deery E, Warren MJ, Smith AG. Algae acquire vitamin B12 through a symbiotic relationship with bacteria. Nature. 2005;438:90–93. doi: 10.1038/nature04056. [DOI] [PubMed] [Google Scholar]
80.Helliwell KE. The roles of B vitamins in phytoplankton nutrition: new perspectives and prospects. New Phytol. 2017;216:62–68. doi: 10.1111/nph.14669. [DOI] [PubMed] [Google Scholar]
81.Cooper MB, et al. Cross-exchange of B-vitamins underpins a mutualistic interaction between Ostreococcus tauri and Dinoroseobacter shibae. ISME J. 2019;13:334–345. doi: 10.1038/s41396-018-0274-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
82.Croft MT, Warren MJ, Smith AG. Algae need their vitamins. Eukaryot. Cell. 2006;5:1175–1183. doi: 10.1128/EC.00097-06. [DOI] [PMC free article] [PubMed] [Google Scholar]
83.Cho, S. H. et al. Elucidation of the biosynthetic pathway of vitamin B groups and potential secondary metabolite gene clusters via genome analysis of a marine bacteriumPseudoruegeria sp. M32A2M.J. Microbiol. Biotechnol.30, 505–514 (2020). [DOI] [PMC free article] [PubMed]
84.Karimi E, et al. Genome sequences of 72 bacterial strains isolated from Ectocarpus subulatus: a resource for algal microbiology. Genome Biol. Evol. 2020;12:3647–3655. doi: 10.1093/gbe/evz278. [DOI] [PMC free article] [PubMed] [Google Scholar]
85.Liang H, et al. Phylogenomics provides new insights into gains and losses of selenoproteins among Archaeplastida. Int. J. Mol. Sci. 2019;20:3020. doi: 10.3390/ijms20123020. [DOI] [PMC free article] [PubMed] [Google Scholar]
86.McFadden GI, Melkonian M. Use of Hepes buffer for microalgal culture media and fixation for electron microscopy. Phycologia. 1986;25:551–557. [Google Scholar]
87.Johnson MTJ, et al. Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. PLoS ONE. 2012;7:e50226. doi: 10.1371/journal.pone.0050226. [DOI] [PMC free article] [PubMed] [Google Scholar]
88.Sahu SK, Thangaraj M, Kathiresan K. DNA extraction protocol for plants with high levels of secondary metabolites and polysaccharides without using liquid nitrogen and phenol. ISRN Mol. Biol. 2012;2012:205049. doi: 10.5402/2012/205049. [DOI] [PMC free article] [PubMed] [Google Scholar]
89.Marin B, Palm A, Klingberg M, Melkonian M. Phylogeny and taxonomic revision of plastid-containing euglenophytes based on SSU rDNA sequence comparisons and synapomorphic signatures in the SSU rRNA secondary structure. Protist. 2003;154:99–145. doi: 10.1078/143446103764928521. [DOI] [PubMed] [Google Scholar]
90.Nevers Y, et al. Insights into ciliary genes and evolution from multi-level phylogenetic profiling. Mol. Biol. Evol. 2017;34:2016–2034. doi: 10.1093/molbev/msx146. [DOI] [PMC free article] [PubMed] [Google Scholar]
91.De Clerck O, et al. Insights into the evolution of multicellularity from the sea lettuce genome. Curr. Biol. 2018;28:2921–2933. doi: 10.1016/j.cub.2018.08.015. [DOI] [PubMed] [Google Scholar]
92.Cheng S, et al. 10KP: a phylodiverse genome sequencing plan. Gigascience. 2018;7:giy013. doi: 10.1093/gigascience/giy013. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary Information (12.3 MB, pdf): Supplementary Figs. 1–21.

Reporting Summary (70.4 KB, pdf)

Supplementary Tables (280.1 KB, xlsx): Supplementary Tables 1–33.

Supplementary Data 1 (353 KB, zip): Taxonomic Acts and Revisions.

Data Availability Statement

Whole-genome assemblies, annotation and raw data for P. coloniale in this study are deposited at the CNGB Nucleotide Sequence Archive⁹² (CNSA: http://db.cngb.org/cnsa, accession no. CNP0000924).

