  • Published online: 11 January 2024
      Data Release
      Near chromosome-level and highly repetitive genome assembly of the snake pipefish Entelurus aequoreus (Syngnathiformes: Syngnathidae)
      Magnus Wolf *
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
        Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
      Bruno Lopes da Silva Ferrette
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
      Raphael T. F. Coimbra
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Menno de Jong
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
      Marcel Nebenführ
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      David Prochotta
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Yannis Schöneberg
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Konstantin Zapf
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Jessica Rosenbaum
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Hannah A. Mc Intyre
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Julia Maier
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Clara C. S. de Souza
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Lucas M. Gehlhaar
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Melina J. Werner
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Henrik Oechler
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Marie Wittekind
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
      Moritz Sonnewald
        Senckenberg Research Institute, Department of Marine Zoology, Section Ichthyology, Frankfurt am Main, Germany
      Maria A. Nilsson
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        LOEWE-Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
      Axel Janke
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
        LOEWE-Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
      Sven Winter *
        Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
        Institute for Ecology, Evolution, and Diversity, Goethe University, Frankfurt am Main, Germany
        Research Institute of Wildlife Ecology, University of Veterinary Medicine, Vienna, Austria
      * Corresponding authors. E-mail: Magnus.Wolf@senckenberg.de; Sven.Winter@senckenberg.de

      Cite this article as... 

      Magnus Wolf, Bruno Lopes da Silva Ferrette, Raphael T. F. Coimbra, Menno de Jong, Marcel Nebenführ, David Prochotta, Yannis Schöneberg, Konstantin Zapf, Jessica Rosenbaum, Hannah A. Mc Intyre, Julia Maier, Clara C. S. de Souza, Lucas M. Gehlhaar, Melina J. Werner, Henrik Oechler, Marie Wittekind, Moritz Sonnewald, Maria A. Nilsson, Axel Janke, Sven Winter, Near chromosome-level and highly repetitive genome assembly of the snake pipefish Entelurus aequoreus (Syngnathiformes: Syngnathidae), Gigabyte, 2024, https://doi.org/10.46471/gigabyte.105

      Article Information
      Journal title: Gigabyte
      Publisher name: GigaScience Press
      Publisher location: Sha Tin, New Territories, Hong Kong SAR
      Received: 20 October 2023
      Accepted: 08 January 2024
      Published: 11 January 2024
      Copyright © The Author(s) 2024.
      https://creativecommons.org/licenses/by/4.0/
      ISSN: 2709-4715
      Manuscript ID: DRR-202310-01
      Article number: 105
      DOI: 10.46471/gigabyte.105
      Preprint: https://doi.org/10.1101/2023.12.12.571260
      Article type: Data Release
      Subjects: Genetics and Genomics; Evolutionary Biology; Marine Biology
      Pages: 1–13
      Author ORCIDs and contributions
      Magnus Wolf (https://orcid.org/0000-0001-9212-9861): Formal analysis; Writing - original draft; Writing - review & editing; Investigation
      Bruno Lopes da Silva Ferrette (https://orcid.org/0000-0002-3108-9867): Writing - original draft; Formal analysis; Investigation; Methodology; Visualization
      Raphael T. F. Coimbra (https://orcid.org/0000-0002-6075-7203): Investigation; Formal analysis; Writing - review & editing
      Menno de Jong (https://orcid.org/0000-0003-2131-9048): Formal analysis; Writing - review & editing; Investigation
      Marcel Nebenführ (https://orcid.org/0000-0001-8802-2105): Formal analysis; Investigation; Writing - review & editing
      David Prochotta (https://orcid.org/0009-0000-6275-7752): Formal analysis; Investigation; Writing - review & editing
      Yannis Schöneberg (https://orcid.org/0000-0003-1113-973X): Formal analysis; Investigation; Writing - review & editing
      Konstantin Zapf: Formal analysis; Investigation; Writing - review & editing
      Jessica Rosenbaum (https://orcid.org/0009-0008-2306-9015): Formal analysis; Investigation; Writing - review & editing
      Hannah A. Mc Intyre (https://orcid.org/0009-0002-3275-5048): Formal analysis; Investigation; Writing - review & editing
      Julia Maier: Formal analysis; Investigation; Writing - review & editing
      Clara C. S. de Souza (https://orcid.org/0009-0000-9560-8905): Formal analysis; Investigation; Writing - review & editing
      Lucas M. Gehlhaar: Formal analysis; Investigation; Writing - review & editing
      Melina J. Werner (https://orcid.org/0009-0007-1081-009X): Formal analysis; Investigation; Writing - review & editing
      Henrik Oechler (https://orcid.org/0009-0001-0413-8731): Formal analysis; Investigation; Writing - review & editing
      Marie Wittekind (https://orcid.org/0009-0001-2443-7552): Formal analysis; Investigation; Writing - review & editing
      Moritz Sonnewald (https://orcid.org/0000-0003-3042-8107): Resources; Writing - review & editing
      Maria A. Nilsson (https://orcid.org/0000-0002-8136-7263): Formal analysis; Investigation; Writing - original draft; Writing - review & editing
      Axel Janke (https://orcid.org/0000-0002-9394-1904): Funding acquisition; Conceptualization; Resources; Writing - review & editing; Writing - original draft; Project administration
      Sven Winter (https://orcid.org/0000-0002-1890-0977): Conceptualization; Data curation; Formal analysis; Investigation; Project administration; Writing - original draft; Writing - review & editing; Methodology
      This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
      The snake pipefish, Entelurus aequoreus (Linnaeus, 1758), is a northern Atlantic fish inhabiting open seagrass environments that recently expanded its distribution range. Here, we present a highly contiguous, near chromosome-scale genome of E. aequoreus. The final assembly spans 1.6 Gbp in 7,391 scaffolds, with a scaffold N50 of 62.3 Mbp and L50 of 12. The 28 largest scaffolds (>21 Mbp) span 89.7% of the assembly length. A BUSCO completeness score of 94.1% and a mapping rate above 98% suggest a high assembly completeness. Repetitive elements cover 74.93% of the genome, one of the highest proportions identified in vertebrates. Our demographic modeling identified a peak in population size during the last interglacial period, suggesting the species might benefit from warmer water conditions. Our updated snake pipefish assembly is essential for future analyses of the morphological and molecular changes unique to the Syngnathidae.
      Introduction
      The snake pipefish Entelurus aequoreus (Linnaeus 1758) is a member of the family Syngnathidae, which currently includes over 300 species of seahorses and pipefishes [1]. E. aequoreus shares typical features with other pipefishes, such as the unique, elongated body plan and fused jaws [2]. However, unlike most pipefishes, which are found in benthic habitats, the snake pipefish inhabits more open and deeper seagrass environments and occurs even in pelagic waters [2]. They are ambush predators of small crustaceans and other invertebrates, thereby indirectly contributing to the overall biodiversity and stability of these fragile habitats [3]. Adult snake pipefishes are poor swimmers equipped with small fins. They rely on their elongated, thin bodies for crypsis in eelgrass habitats [4–6].
      The snake pipefish historically ranged from the waters of the Azores northwards to the waters of Norway and Iceland and eastward to the Baltic Sea [7]. However, since 2003, the species has expanded its distribution [8] into the Arctic waters of Spitsbergen [9], the Barents Sea, and the Greenland Sea [10]. Simultaneously, population sizes seem to have increased within its former range, as indicated by substantially increased catch rates [11,12]. Several factors have been proposed to cause this expansion and population growth, including rising sea temperatures, an increased potential for long-distance dispersal of juveniles via ocean currents [4,7], and an increased reproductive success facilitated by the dispersal of invasive seaweeds [6,8–10,13]. The latter explanation is supported by local field experiments in the northern Wadden Sea, which suggest a mutual co-occurrence of the invasive Japanese seaweed (Sargassum muticum) and the snake pipefish [5]. Studies based on mtDNA marker regions have not discerned any population structure thus far and suggest a previous population expansion in the Pleistocene, around 50–100 thousand years ago (kya) [6]. Yet, a comprehensive analysis of demographic events is better conducted using genomic data, thus requiring a high-quality reference genome, ideally of the same species or at least a closely related one.
      Previously, genomes of Syngnathidae have been used to study the evolution of highly specialized morphologies and life-history traits unique to pipefishes and seahorses [14–16]. For instance, the transition to male pregnancy was associated with major genomic restructuring events and parallel modifications of the adaptive immune system. There is a remarkable variability in genome sizes within the family, with estimates ranging from 350 Mbp to 1.8 Gbp [14]. The major shifts in body shape are assumed to be related to gene-family loss and expansion events, along with higher rates of protein and nucleotide evolution [16]. Genomic data obtained using a direct sequencing approach of ultra-conserved elements improved the understanding of the phylogeny of pipefishes [15] and identified a likely radiation of the group in the waters of the modern Indo-Pacific Ocean. Nevertheless, high-quality genomes of Syngnathidae are only available for a few species. According to the NCBI genome database, only 7% of the known species diversity has genome sequences available.
      A draft genome of the snake pipefish was previously assembled using a combination of paired-end and mate-pair sequencing techniques, yielding an assembly with low continuity (N50 3.5 kbp, BUSCO C: 21%) and a large difference between the estimated and assembled genome sizes (1.8 Gbp vs. 557 Mbp) [14]. To obtain a higher quality, near chromosome-scale genome assembly for the snake pipefish, essential for future population, conservation, and evolutionary genomics studies of fish, we used long-read sequencing technologies. This allowed us to gain insights into the genetic properties of the species and to perform demographic analyses based on the Pairwise Sequentially Markovian Coalescent (PSMC) framework [17]. The data generation and analyses presented here were conducted during a six-week master course in 2021 at the Goethe University, Frankfurt am Main, Germany. The concept of high-quality genome sequencing in a course setting has so far yielded three reference-quality fish genomes and has proven to be a successful approach to introducing the technology to a new generation of scientists [18–21].
      Results and discussion
      Genome sequencing and assembly
      PacBio’s continuous long read (CLR) technology generated 401 Gbp of long-read data in ∼60 million reads with an N50 of 7.9 kb (Table 1). Illumina sequencing yielded 38 Gbp of standard short-read data in approximately 257 million reads with a mean length of 148 bp after filtering. Sequencing of the Omni-C library generated 54.7 Gbp of raw short-read data.
      Table 1
      Summary statistics of the snake pipefish reference genome. The table includes information for (A) the raw read sequencing, (B) the scaffold- and contig-level de novo assembly, and (C) the BUSCO completeness statistics.
      (A) Raw read statistics
      No. short reads                     264,111,731
      Mapped short reads (%)              99.53
      Mean short-read coverage (x)        23
      No. long reads                      130,590,372
      Mapped long reads (%)               98.61
      Mean long-read coverage (x)         205.2
      (B) Assembly statistics             Scaffold         Contig
      No. scaffolds/contigs               7,387            7,473
      No. scaffolds/contigs (>50 kbp)     466              526
      Scaffold/contig L50                 12               14
      Scaffold/contig N50 (bp)            62,341,166       45,010,074
      Total length (bp)                   1,662,053,046    1,662,035,846
      GC (%)                              38.87            38.87
      No. of N’s per 100 kb               1.03             0.0
      Heterozygosity (%)                  0.387
      Total interspersed repeats (bp)     1,237,929,559 (74.93%)
      (C) BUSCO completeness
      Clade: Actinopterygii               C: 94.1% [S: 92.6%, D: 1.5%], F: 2.0%, M: 3.9%, n: 3,640
      BUSCO: Benchmarking Universal Single-Copy Orthologs (65); C, complete; S, single copy; D, duplicated; F, fragmented; M, missing.
      The snake pipefish’s genome was assembled de novo to a total size of 1.7 Gbp. It consisted of 2,204 scaffolds, with a scaffold N50 of 62 Mbp and an L50 of 12 (Table 1, Figure 1A). The finalized assembly has 1.0 Ns per 100 kbp and a GC content of 38.87%. Our BUSCO (RRID:SCR_015008) completeness assessment resulted in 94.1% complete core genes, based on the actinopterygii_odb10 set, showing the high completeness of the assembly. Both long- and short-read data mapped onto the assembly with high mapping rates of 98.6% and 99.5%, respectively. Hi-C mapping resulted in 28 larger scaffolds (Figure 1B), indicating the near-chromosome level of the de novo assembly. This result aligns with past karyotype estimations of other pipefish and seahorses, predicting 22 and 22–24 chromosomes, respectively [22–24]. The rest of the genome comprises only smaller scaffolds and contigs, which may result from the high amounts of repetitive regions, as detailed in the following section. Our Blobtools (RRID:SCR_017618) analysis of both long- and short-read data (Figure 1C and D) found no apparent signs of contamination. However, we detected and removed background noise of unknown origin in both datasets.
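The contiguity metrics quoted above follow a simple definition: sort the sequences from longest to shortest and walk down the list until half of the total assembly length is covered; the length reached is the N50 and the number of sequences needed is the L50. A minimal Python sketch of that definition follows. The function name and the toy lengths are illustrative assumptions, not part of the study's pipeline, which used QUAST.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths in bp."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that brings the running sum to half the
        # total length defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Toy scaffold lengths (hypothetical, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```

Applied to the scaffold lengths of a real assembly (for example, read from a FASTA index), the same function should reproduce the N50 and L50 reported by an assembly-statistics tool.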
      Figure 1.
      Assembly characteristics and quality assessments of the de novo Entelurus aequoreus genome. (A) The snail plot summarizes different assembly properties. Scaffold statistics are depicted in the innermost circle, and colors red to orange represent the longest scaffold N50 and N90, respectively. GC composition is shown in the outer blue circle. BUSCO completeness statistics are depicted in the small green circle. (B) Omni-C contact density map indicating 28 larger scaffolds and the near-chromosome level of the assembly. (C,D) The BlobPlot analysis compares GC content (x-axis), assembly coverage (y-axis), and taxonomic BLAST assignments of contigs (color) for both the Omni-C short reads (C) and PacBio long reads (D).
      Variant calling identified ∼301 million sites (including monomorphic sites), with ∼1.3 million found to be biallelic. Genome-wide heterozygosity was determined to be 0.387%, which is in line with other fish species [25,26]. The GenomeScope (RRID:SCR_017014) results based on short reads suggested a haploid genome size of 1.3 Gbp and an expected genome-wide heterozygosity of 1%. These estimates were around 362 Mbp shorter and 0.57% more heterozygous than the final assembly. This, again, might be explained by the high repeat content of the genome.
      Annotation
      In total, 1.2 Gbp, or 74.93%, of the entire assembly were identified as repetitive during our de novo repeat-modeling (using RepeatModeler, RRID:SCR_015027) and repeat-masking (using RepeatMasker, RRID:SCR_012954), as shown in Figure 2. This high repeat content contrasts with other fish genomes [27]. However, it is similar, although at a smaller scale, to the closest relative, Nerophis ophidion (65.7%) [14], and other syngnathid fish genomes, such as seadragons [28]. The first draft assembly of the snake pipefish had a repeat content of 57.2% [14], and our improved long-read assembly identified 17.7% additional repeats that were missing from the previous assembly [14]. So far, among vertebrates, only the lungfish Neoceratodus forsteri [29] has more transposable elements (TEs) than the snake pipefish.
      Figure 2.
      Repeat landscape of the de novo Entelurus aequoreus genome. Colors represent different types of repetitive elements, and gray areas indicate unclassified types of repetitive regions.
      The annotation of the genome, featuring de novo and homology-based identification approaches, resulted in 33,202 genes with an average length of 13,828 bp. Each gene had, on average, 7.32 exons and 6.25 introns with average lengths of 188 bp and 2,240 bp, respectively. In total, we identified 243,038 exons and 207,467 introns within our annotation. The total number of genes is ∼30% higher compared to other annotated genomes in the order Syngnathiformes, such as 23,458 for the tiger tail seahorse (Hippocampus comes) [16] or 24,927 for the greater pipefish (Syngnathus acus) [30], both annotated by the NCBI Eukaryotic Genome Annotation pipeline. Notably, as these two genomes are considerably smaller (492 Mbp and 324 Mbp, respectively), we can assume that the large-scale genome increase in this species included many coding sequences. The high content of repetitive regions and the lack of transcriptomic data might also have increased the number of false positive gene-calls; however, our BUSCO completeness analysis of the predicted proteins resulted in 82.6% complete sequences, with only 6.8% duplicated ones. Additionally, 5.3% of the coding sequences appeared fragmented, and 12.1% were missing from the actinopterygii_odb10 OrthoDB set. Our functional annotation resulted in hits for 89% of the predicted proteins.
      Demographic inference
      The demographic inference analysis of the snake pipefish genome (Figure 3) using the PSMC framework [17] traced population changes over the past 1 million years. Given the chosen substitution rate and generation time, there was a steady increase in the effective population size (Ne), starting at 15 thousand individuals 1 million years ago, which peaked at an Ne of 250 thousand individuals 100 kya. Thereafter, Ne decreased until reaching 30 thousand individuals at 10 kya and stagnated until the end of the model. The previously suggested population expansion during the Pleistocene (50–100 kya) was therefore confirmed by this model. However, the population expansion was followed by another population decline that was not resolved by Braga Goncalves et al. [6]. Our result may therefore point to a conclusion different from that drawn by the authors: the snake pipefish might have persisted as a comparatively small population during the Holocene and only recently expanded its distribution. This expansion resulted in a large population with a high degree of homogenization, as observed by Braga Goncalves and colleagues [6]. Given that the presented peak in population size coincides with the last interglacial period between the Penultimate Glacial Period (135–192 kya [31]) and the last glacial period (present – 20 kya [32]), we assume that the snake pipefish largely benefitted from the warmer water conditions during the interglacial period, as seen in the present range expansion.
      Figure 3.
      Demographic history of the snake pipefish estimated using the PSMC framework. Using a generation time of 2.5 years [72] and a substitution rate of 1.7 × 10⁻⁸ per site per generation [71], a model was created covering the last 10 kya to 1 Mya. The x-axis represents time in number of years ago and the y-axis shows the effective population size (Ne) in tens of thousands of individuals. The model indicates a peak in Ne of 250 thousand individuals during the Pleistocene around 100 thousand years ago.
      Material and methods
      Sampling, DNA extraction, and sequencing
      A single individual of Entelurus aequoreus (Linnaeus 1758) (NCBI:txid42861, marinespecies.org:taxname:127379) was caught by trawling during an annual monitoring expedition to the Dogger Bank in the North Sea in July 2021 (trawl start coordinates 54.993633, 2.940833; end coordinates 55.0077, 2.929867) with the permission of the Maritime Policy Unit of the UK Foreign and Commonwealth Office. The study complied with the ‘Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from Their Utilization’. The sample was initially frozen at −20 °C and later stored at −80 °C.
      High-molecular-weight genomic DNA was extracted from muscle tissue, following the protocol by Mayjonade et al. [33], with the addition of Proteinase K. We evaluated the quantity and quality of the DNA with the Genomic DNA ScreenTape on the Agilent 2200 TapeStation system (Agilent Technologies), as well as with the Qubit® dsDNA BR Assay Kit.
      For long-read sequencing, a PacBio SMRTbell CLR library was prepared using the SMRTbell Express Prep kit v3.0 (Pacific Biosciences – PacBio, Menlo Park, CA, USA) and sequenced on the PacBio Sequel IIe platform. A proximity-ligation library was prepared from muscle tissue following the Dovetail™ Omni-C protocol (Dovetail Genomics, Santa Cruz, California, USA). In addition, a standard whole-genome 150 bp paired-end Illumina library was prepared using the NEBNext Ultra II library preparation kit (New England Biolabs Inc., Ipswich, USA). Finally, the proximity-ligation and paired-end libraries were shipped to Novogene (UK) for sequencing on the Illumina NovaSeq 6000 platform (RRID:SCR_016387).
      Pre-processing and genome size estimation
      The PacBio subreads were converted from BAM into FASTQ format using the PacBio Secondary Analysis Tool BAM2fastx v.1.3.0 [34]. Quality control, trimming, and filtering of the Illumina reads were performed using fastp v0.23.1 (RRID:SCR_016962) [35] with the settings “-g -3 -l 40 -y -Y 30 -q 15 -u 40 -c -p -j -h -R -w N”. To estimate the genome size of the snake pipefish, we performed k-mer profiling using the standard short-read Illumina data. We first ran Jellyfish v2.3.0 (RRID:SCR_005491) [36] to generate a histogram of k-mers with a length of 21 bp. Subsequently, we used this data to obtain a genome profile using GenomeScope v2.0 (RRID:SCR_017014) [37]. We further tested alternative k-mer lengths between 17- and 25-mers. No significant differences in the estimated genome size were detected except for the 17-mer, which resulted in a smaller genome size estimate of ∼500 Mbp.
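The idea behind k-mer profiling can be illustrated with a deliberately naive Python sketch: count every k-mer in the reads, tabulate how many distinct k-mers occur at each depth, and divide the total number of non-error k-mer occurrences by the depth of the main peak. This is a simplification under stated assumptions, not the mixture model fitted by GenomeScope; it ignores heterozygosity and reverse complements, and the function names, the error cutoff, and the toy histogram are hypothetical.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map depth -> number of distinct k-mers observed at that depth."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive size estimate: k-mer occurrences above an error cutoff / peak depth."""
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers is found.
    peak_depth = max(kept, key=kept.get)
    total_occurrences = sum(depth * n for depth, n in kept.items())
    return round(total_occurrences / peak_depth)


# Hypothetical histogram: many depth-1 error k-mers and a main peak at 20x.
toy_histogram = {1: 1000, 19: 5, 20: 100, 21: 5, 40: 2}
print(estimate_genome_size(toy_histogram))  # 114
```

In a repeat-rich genome, high-copy k-mers sit far to the right of the main peak and are easily under-counted, which is one reason a k-mer-based estimate can fall short of the assembled length.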
      Genome assembly and polishing
      We assembled the genome from the PacBio long-read data using WTDBG v.2.5 (RRID:SCR_017225) [38]. The resulting assembly was first polished using the PacBio data with Flye v.2.9 (RRID:SCR_017016) [39], using Minimap2 v.2.17 [40] for mapping. Afterwards, we conducted two rounds of short-read polishing by mapping reads onto the assembly with BWA-MEM v.0.7.17 (RRID:SCR_010910) [41], followed by error correction with Pilon v1.23 (RRID:SCR_014731) [42].
      Assembly quality control and scaffolding
      The polished assembly contigs were anchored into chromosome-scale scaffolds utilizing the generated proximity-ligation Omni-C data. First, the data were mapped to the assembly and filtered following the Arima Hi-C mapping pipeline used by the Vertebrate Genome Project [43]. In brief, reads were mapped using BWA-MEM v.0.7.17 [41], the mapped reads were filtered with samtools v.1.14 (RRID:SCR_002105) [44], and the duplicated reads were removed with “MarkDuplicates” in Picard v.2.26.10 (RRID:SCR_006525) [45]. The filtered mapped reads were then used for proximity-ligation scaffolding in YaHS v.1.1 [46]. Gaps in the scaffolded assembly were closed with TGS-GapCloser v.1.1.1 (RRID:SCR_017633) [47] using a subset (25%) of the PacBio subreads due to computational constraints. To further improve the assembly’s contiguity, scaffolding and gap-closing were performed a second time using a different subset of PacBio reads for gap-closing. The PacBio read subsets were generated with Seqtk v.1.3 (RRID:SCR_018927) [48] using the random number generator seeds 11 and 18. Gene set completeness was analyzed with BUSCO v.5.4.7 [49] using the Actinopterygii set of core genes (actinopterygii_odb10). Assembly continuity was evaluated using QUAST v5.0.2 (RRID:SCR_001228) [50], and mapping rates were assessed by QualiMap v2.2.1 (RRID:SCR_001209) [51]. Finally, BlobToolKit v.4.0.6 [52] performed contamination screening.
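The role of the two random seeds in the gap-closing step can be shown with a small Python sketch: a fixed seed makes a 25% subsample reproducible, while a different seed yields a different subset of reads for the second round. This illustrates seeded subsampling in general and is not Seqtk's implementation; the function name and read identifiers are hypothetical.

```python
import random


def subsample(records, fraction, seed):
    """Keep each record with probability `fraction`, reproducibly for a given seed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # private generator, independent of global state
    return [record for record in records if rng.random() < fraction]


# Hypothetical read names standing in for PacBio subreads.
read_ids = [f"read_{i}" for i in range(1000)]
subset_a = subsample(read_ids, 0.25, seed=11)
subset_b = subsample(read_ids, 0.25, seed=18)

print(len(subset_a), len(subset_b))
print(subset_a == subsample(read_ids, 0.25, seed=11))  # same seed, same subset
```

Because each read is kept independently, the subset size is only approximately 25% of the input; the benefit of the second seed is that reads missed in the first round get another chance to span a gap.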
      Repeat landscape analysis and genome annotation
      The TE annotation was done in three steps. First, we used RepeatMasker v4.1.5 [53] to annotate and hard-mask known Actinopterygii repeats from Repbase (RRID:SCR_021169), which comprises a database of eukaryotic repetitive DNA element sequences [54]. Secondly, a de novo library of TEs was created from the hard-masked genome assembly using RepeatModeler v2.0.4 [55], which includes RECON v1.08 (RRID:SCR_021170) [56], RepeatScout v1.0.6 (RRID:SCR_014653) [57], as well as LTRharvest and LTR_retriever (RRID:SCR_018970 and RRID:SCR_017623, respectively) [58,59]. Finally, predicted repeats were annotated with a second run of RepeatMasker on the hard-masked assembly obtained in the first run. The results of both RepeatMasker runs were then combined. A summary of TEs and the relative abundance of repeat classes in the genome are shown in Table 2 and Figure 2.
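How the two masking runs were combined is not described in detail. One plausible way to obtain a total masked length without double-counting is to take the union of the masked intervals from both runs, as in the Python sketch below. The interval representation (half-open start and end coordinates on one sequence), the function names, and the toy coordinates are assumptions made for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def masked_summary(run1, run2, sequence_length):
    """Return (masked bp, masked fraction) for the union of two masking runs."""
    merged = merge_intervals(list(run1) + list(run2))
    masked_bp = sum(end - start for start, end in merged)
    return masked_bp, masked_bp / sequence_length


# Hypothetical masked regions on a 1,000 bp sequence.
known_repeats = [(0, 100), (200, 300)]
de_novo_repeats = [(50, 150), (300, 350)]
print(masked_summary(known_repeats, de_novo_repeats, 1000))  # (300, 0.3)
```

Since the second run operated on an assembly that was already hard-masked, overlaps between the two runs should be rare in practice; taking the union simply guards against counting any base twice.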
      Table 2
      Repeat content of the genome assembly. Class: class of the repetitive regions. Count: number of occurrences of the repetitive region. bpMasked: number of base pairs masked; %Masked: percentage of base pairs masked. LINE: Long Interspersed Nuclear Elements (including retroposons); LTR: Long Terminal Repeat elements (including retroposons); SINE: Short Interspersed Nuclear Elements; RC: Rolling Circle.
      Class             Count        bpMasked         %Masked
      ARTEFACT          4            84               0.00%
      DNA               2,765,297    372,407,739      22.40%
      LINE              850,222      167,337,419      10.06%
      LTR               177,214      55,439,687       3.33%
      PLE               1            0                0.00%
      RC                32,348       3,385,084        0.20%
      SINE              435,464      32,709,572       1.95%
      Unknown           3,628,328    534,216,084      32.14%
      Low complexity    127,733      3,095,322        0.19%
      Satellite         21,221       7,145,469        0.43%
      Simple repeat     1,437,090    61,077,339       3.67%
      rRNA              4,394        534,599          0.03%
      scRNA             5            504              0.00%
      snRNA             695          46,845           0.00%
      tRNA              6,029        533,812          0.03%
      Total             9,486,045    1,237,929,559    74.93%
      The genome was annotated using the BRAKER3 pipeline (RRID:SCR_018964) [60–65], combining a de novo gene calling and a homology-based gene annotation. For protein references, we combined the vertebrate-specific protein collection from OrthoDB (RRID:SCR_011980) and the protein collection of the greater pipefish (Syngnathus acus) genome [30] made by the NCBI (see: GCF_901709675.1, last accessed 12th October 2023). To further filter genes based on the support of introns and using extrinsic homology evidence, we used TSEBRA [66] with an “intron_support=0.1”. The resulting set of proteins was tested for completeness using BUSCO v.5.4.7 [49] in “protein mode” and run against the Actinopterygii-specific set of core genes. Finally, functional annotation was done using InterProScan v5 (RRID:SCR_005829) [67].
      Variant calling and demographic inference
      The preprocessed short reads were mapped to the final assembly using BWA-MEM v.0.7.17 [41], followed by the removal of duplicate reads with “MarkDuplicates” in Picard v.2.26.10 [45] and the evaluation of the mapping quality using Qualimap v2.2.1 [51]. Indels in the BAM files were first identified and then realigned with “RealignerTargetCreator” and “IndelRealigner” as part of the Genome Analysis Toolkit v3.8-1 [68]. Subsequently, samtools v.1.14 [44] was used to check and remove unmapped, secondary, QC-failed, duplicated, and supplementary reads, keeping only reads mapped in proper pairs in non-repetitive regions of the 28 chromosome-scale scaffolds.
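The read categories named in the filtering step correspond to bits of the SAM FLAG field. The Python sketch below shows how such a filter can be expressed as one "require" mask and one "exclude" mask. The bit values are those defined in the SAM specification; the function name is hypothetical, and the exact samtools command used in the study is not given (with samtools view, such masks are typically passed through the -f and -F options). The restriction to non-repetitive regions of the 28 scaffolds is a separate, region-based step that is not covered here.

```python
# SAM FLAG bits as defined in the SAM specification.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Any read carrying one of these bits is discarded (combined value: 3844).
EXCLUDE = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY


def keep_read(flag):
    """True if the read is in a proper pair and carries none of the excluded bits."""
    return (flag & PROPER_PAIR) != 0 and (flag & EXCLUDE) == 0


print(EXCLUDE)          # 3844
print(keep_read(99))    # True: paired, proper pair, mate reverse, first in pair
print(keep_read(1123))  # False: same as 99 but marked as duplicate
```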
      Sambamba v1.0.0 (RRID:SCR_024328) [69] was used to estimate site depth statistics. Minimum and maximum thresholds for the global site depth were set to d ± (5 × MAD), where d is the global site depth distribution median and MAD is the median absolute deviation. Variant calling was performed using the bcftools v1.17 (RRID:SCR_005227) [70] commands “mpileup” and “call” [-m]. Variants were then filtered with bcftools “filter” [-e “DP < d − (5 × MAD) || DP > d + (5 × MAD) || QUAL < 30”], thus removing sites with low quality and out-of-range depth. Finally, bcftools was used to estimate the genome-wide heterozygosity as the proportion of heterozygous sites using the “stats” command.
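As a worked illustration of the depth thresholds and the heterozygosity estimate, the following sketch computes d ± (5 × MAD) from a list of per-site depths and applies the same exclusion rule as the filter expression above. The function names, the toy depth values, and the choice of all supplied sites as the heterozygosity denominator are assumptions made for this example; the study itself used Sambamba and bcftools.

```python
from statistics import median


def depth_thresholds(depths, k=5):
    """Return (min, max) depth as median -/+ k * MAD."""
    d = median(depths)
    mad = median([abs(x - d) for x in depths])
    return d - k * mad, d + k * mad


def passes_filter(dp, qual, lo, hi, min_qual=30):
    """Mirror of the exclusion rule: drop if DP < lo, DP > hi, or QUAL < 30."""
    return not (dp < lo or dp > hi or qual < min_qual)


def heterozygosity(genotypes):
    """Proportion of sites whose two alleles differ (assumed denominator:
    every site passed in)."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


if __name__ == "__main__":
    depths = [28, 30, 31, 29, 30, 32, 90, 2, 30, 31]  # toy values
    lo, hi = depth_thresholds(depths)
    print(lo, hi)
    print(passes_filter(90, 50, lo, hi))
    print(heterozygosity([(0, 0), (0, 1), (1, 1), (0, 1)]))
```

Because both the centre and the spread are medians, the two outlying depths in the toy data (2 and 90) do not widen the accepted range.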
      Long-term changes in the effective population size (Ne) over time were estimated using the PSMC model [17]. This analysis used the diploid consensus genome sequences generated by bcftools v1.17 [70] with the script “vcfutils.pl” from the processed BAM files, as described above. Sites with read depth up to a third of the average depth or above twice each sample’s median depth and with a consensus base quality < 30 were removed. PSMC was executed using 25 iterations, employing a maximum 2N0-scaled coalescent time of 15, an initial θ/ρ ratio of 5, and 64 atomic time intervals (4 + 25 × 2 + 4 + 6) to infer the scaled mutation rate, the recombination rate, and the free population size parameters, respectively. We performed 100 bootstrap replicates by randomly sampling with replacement 1 Mb blocks from the consensus sequence for all individuals. A mutation rate μ of 1.7 × 10⁻⁹ per site per generation [71] and a generation length of 2.5 years [72] were employed for plotting.
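Two pieces of arithmetic in this paragraph can be made explicit. The sketch below first expands the interval pattern 4 + 25 × 2 + 4 + 6 to confirm that it covers 64 atomic intervals, and then applies the usual PSMC scaling, N0 = θ0 / (4 μ s), time in years = 2 N0 t g, and Ne = λ N0, with the mutation rate and generation length given above. The function names are hypothetical, and the bin size s = 100 bp is an assumption (the commonly used PSMC default), since the text does not state it.

```python
def parse_pattern(pattern):
    """Expand a pattern such as "4+25*2+4+6" into the number of atomic
    intervals spanned by each free population size parameter."""
    spans = []
    for term in pattern.split("+"):
        if "*" in term:
            n_params, width = (int(x) for x in term.split("*"))
            spans.extend([width] * n_params)
        else:
            spans.append(int(term))
    return spans


def scale_psmc(t_k, lambda_k, theta0, mu=1.7e-9, gen_years=2.5, bin_size=100):
    """Convert scaled PSMC output to years and effective population size.

    bin_size is an assumed value, not taken from the text.
    """
    n0 = theta0 / (4 * mu * bin_size)
    years = 2 * n0 * t_k * gen_years
    ne = lambda_k * n0
    return years, ne


if __name__ == "__main__":
    spans = parse_pattern("4+25*2+4+6")
    print(sum(spans), len(spans))  # atomic intervals, free parameters
    print(scale_psmc(t_k=1.0, lambda_k=2.0, theta0=0.0068))
```

With this pattern, the 64 atomic intervals are grouped into 28 free population size parameters, so neighbouring intervals at the recent and ancient ends share one estimate.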
      Data availability
      The de novo genome and all underlying raw data were uploaded to NCBI under the BioProject PRJNA1005573, BioSample SAMN36988691, and genome assembly GCA_034508595. All other data, including the repeat and gene annotation, are available in the GigaDB repository [73].
      Abbreviations
      CLR, continuous long reads; kya, thousand years ago; MAD, median absolute deviation; PSMC, Pairwise Sequentially Markovian Coalescent; TE, transposable element.
      Declarations
      Ethics approval and consent for publication
      Not Applicable.
      Competing interests
      The authors declare that they have no competing interests.
      Author contributions
      MW, BF, MS, AJ, and SW designed the study. SW, JR, HMI, JM, CDS, LG, MJW, HO, and MWI performed laboratory procedures and sequencing. MW, BF, RC, MDJ, MN, DP, YS, KZ, JR, HMI, JM, CDS, LG, MJW, HO, MWI, MAN, and SW conducted bioinformatic processing and analyses. All authors contributed to writing this manuscript.
      Acknowledgements
      The present study is a result of the Centre for Translational Biodiversity Genomics (LOEWE-TBG) and was supported through the program ‘LOEWE-Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz’ of Hesse’s Ministry of Higher Education, Research, and the Arts.
      References
      1. Froese R, Pauly D. FishBase. 2023; www.fishbase.org. Accessed 9 August 2023.
      2. Dawson C. Syngnathidae. In: Smith M, Heemstra P (eds), Smiths’ Sea Fishes. Berlin: Springer-Verlag, 1986; pp. 445–458.
      3. O’Gorman EJ. Multitrophic diversity sustains ecological complexity by dampening top-down control of a shallow marine benthic food web. Ecology, 2021; 102: e03274. doi:10.1002/ecy.3274.
      4. Vincent ACJ, Berglund A, Ahnesjö I. Reproductive ecology of five pipefish species in one eelgrass meadow. Environ. Biol. Fish, 1995; 44: 347–361. doi:10.1007/BF00008250.
      5. Polte P, Buschbaum C. Native pipefish Entelurus aequoreus are promoted by the introduced seaweed Sargassum muticum in the northern Wadden Sea, North Sea. Aquat. Biol., 2008; 3: 11–18. doi:10.3354/ab00071.
      6. Braga Goncalves I, Cornetti L, Couperus AS et al. Phylogeography of the snake pipefish, Entelurus aequoreus (Family: Syngnathidae) in the northeastern Atlantic Ocean. Biol. J. Linn. Soc., 2017; 122: 787–800. doi:10.1093/biolinnean/blx112.
      7. Wheeler A. Key to the Fishes of Northern Europe: A Guide to the Identification of More than 350 Species. London: Frederick Warne & Co. Ltd, 1978. ISBN-10: 0723220646.
      8. Harris MP, Beare D, Toresen R et al. A major increase in snake pipefish (Entelurus aequoreus) in northern European seas since 2003: potential implications for seabird breeding success. Mar. Biol., 2007; 151: 973–983. doi:10.1007/s00227-006-0534-7.
      9. Fleischer D, Schaber M, Piepenburg D. Atlantic snake pipefish (Entelurus aequoreus) extends its northward distribution range to Svalbard (Arctic Ocean). Polar Biol., 2007; 30: 1359–1362. doi:10.1007/s00300-007-0322-y.
      10. Rusyaev SM, Dolgov AV, Karamushko OV. Captures of snake pipefish Entelurus aequoreus in the Barents and Greenland Seas. J. Ichthyol., 2007; 47: 544–546. doi:10.1134/S0032945207070090.
      11. Kloppmann MHF, Ulleweit J. Off-shelf distribution of pelagic snake pipefish, Entelurus aequoreus (Linnaeus, 1758), west of the British Isles. Mar. Biol., 2007; 151: 271–275. doi:10.1007/s00227-006-0480-4.
      12. van Damme CJ, Couperus AS. Mass occurrence of snake pipefish in the Northeast Atlantic: result of a change in climate? J. Sea Res., 2008; 60: 117–125. doi:10.1016/j.seares.2008.02.009.
      13. Lindley J, Kirby R, Johns D et al. Exceptional abundance of the snake pipefish (Entelurus aequoreus) in the north-eastern North Atlantic Ocean. ICES Document. 2006.
      14. Roth O, Solbakken MH, Tørresen OK et al. Evolution of male pregnancy associated with remodeling of canonical vertebrate immunity in seahorses and pipefishes. Proc. Natl. Acad. Sci. USA, 2020; 117: 9431–9439. doi:10.1073/pnas.1916251117.
      15. Stiller J, Short G, Hamilton H et al. Phylogenomic analysis of Syngnathidae reveals novel relationships, origins of endemic diversity and variable diversification rates. BMC Biol., 2022; 20: 75. doi:10.1186/s12915-022-01271-w.
      16. Lin Q, Fan S, Zhang Y et al. The seahorse genome and the evolution of its specialized morphology. Nature, 2016; 540: 395–399. doi:10.1038/nature20595.
      17. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature, 2011; 475: 493–496. doi:10.1038/nature10231.
      18. Prost S, Winter S, de Raad J et al. Education in the genomics era: generating high-quality genome assemblies in university courses. GigaScience, 2020; 9(6): giaa058. doi:10.1093/gigascience/giaa058.
      19. Prost S, Petersen M, Grethlein M et al. Improving the chromosome-level genome assembly of the siamese fighting fish (Betta splendens) in a University Master’s course. G3 (Bethesda), 2020; 10: 2179–2183. doi:10.1534/g3.120.401205.
      20. Winter S, Prost S, de Raad J et al. Chromosome-level genome assembly of a benthic associated Syngnathiformes species: the common dragonet, Callionymus lyra. GigaByte, 2020; 2020: gigabyte6. doi:10.46471/gigabyte.6.
      21. Winter S, de Raad J, Wolf M et al. A chromosome-scale reference genome assembly of the great sand eel, Hyperoplus lanceolatus. J. Hered., 2023; 114: 189–194. doi:10.1093/jhered/esad003.
      22. Vitturi R, Catalano E. Karyotypes in two species of the genus Hippocampus (Pisces: Syngnatiformes). Mar. Biol., 1988; 99: 119–121. doi:10.1007/BF00644985.
      23. Vitturi R, Libertini A, Campolmi M et al. Conventional karyotype, nucleolar organizer regions and genome size in five Mediterranean species of Syngnathidae (Pisces, Syngnathiformes). J. Fish Biol., 1998; 52: 677–687. doi:10.1111/j.1095-8649.1998.tb00812.x.
      24. Small CM, Bassham S, Catchen J et al. The genome of the Gulf pipefish enables understanding of evolutionary innovations. Genome Biol., 2016; 17: 258. doi:10.1186/s13059-016-1126-6.
      25. Tigano A, Jacobs A, Wilder AP et al. Chromosome-level assembly of the Atlantic silverside genome reveals extreme levels of sequence diversity and structural genetic variation. Genome Biol. Evol., 2021; 13(6): evab098. doi:10.1093/gbe/evab098.
      26. Barry P, Broquet T, Gagnaire P-A. Age-specific survivorship and fecundity shape genetic diversity in marine fishes. Evol. Lett., 2022; 6: 46–62. doi:10.1002/evl3.265.
      27. Shao F, Han M, Peng Z. Evolution and diversity of transposable elements in fish genomes. Sci. Rep., 2019; 9: 15399. doi:10.1038/s41598-019-51888-1.
      28. Small CM, Healey HM, Currey MC et al. Leafy and weedy seadragon genomes connect genic and repetitive DNA features to the extravagant biology of syngnathid fishes. Proc. Natl. Acad. Sci. USA, 2022; 119: e2119602119. doi:10.1073/pnas.2119602119.
      29. Meyer A, Schloissnig S, Franchini P et al. Giant lungfish genome elucidates the conquest of land by vertebrates. Nature, 2021; 590: 284–289. doi:10.1038/s41586-021-03198-8.
      30. Scott-Somme K, McTierney S, Brittain R et al. The genome sequence of the greater pipefish, Syngnathus acus (Linnaeus, 1758). Wellcome Open Res., 2023; 8: 274. doi:10.12688/wellcomeopenres.19528.1.
      31. Obrochta SP, Crowley TJ, Channell JE et al. Climate variability and ice-sheet dynamics during the last three glaciations. Earth Planet. Sci. Lett., 2014; 406: 198–212. doi:10.1016/j.epsl.2014.09.004.
      32. Armstrong E, Hopcroft PO, Valdes PJ. A simulated Northern Hemisphere terrestrial climate dataset for the past 60,000 years. Sci. Data, 2019; 6: 265. doi:10.1038/s41597-019-0277-1.
      33. Mayjonade B, Gouzy J, Donnadieu C et al. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. Biotechniques, 2016; 61: 203–205. doi:10.2144/000114460.
      34. PacBio secondary analysis tool BAM2fastx v.1.3.0, GitHub. 2023; https://github.com/PacificBiosciences/pbbioconda.
      35. Chen S, Zhou Y, Chen Y et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 2018; 34: i884–i890. doi:10.1093/bioinformatics/bty560.
      36. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 2011; 27: 764–770. doi:10.1093/bioinformatics/btr011.
      37. Vurture GW, Sedlazeck FJ, Nattestad M et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics, 2017; 33: 2202–2204. doi:10.1093/bioinformatics/btx153.
      38. Ruan J, Li H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods, 2020; 17: 155–158. doi:10.1038/s41592-019-0669-3.
      39. Kolmogorov M, Yuan J, Lin Y et al. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol., 2019; 37: 540–546. doi:10.1038/s41587-019-0072-8.
      40. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018; 34: 3094–3100. doi:10.1093/bioinformatics/bty191.
      41. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013; https://doi.org/10.48550/arXiv.1303.3997.
      42. Walker BJ, Abeel T, Shea T et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One, 2014; 9: e112963. doi:10.1371/journal.pone.0112963.
      43. Vertebrate Genome Project. Arima Genomics Mapping Pipeline 042817. https://github.com/VGP/vgp-assembly/blob/master/pipeline/salsa/arima_mapping_pipeline.sh.
      44. Danecek P, Bonfield JK, Liddle J et al. Twelve years of SAMtools and BCFtools. GigaScience, 2021; 10(2): giab008. doi:10.1093/gigascience/giab008.
      45. Picard toolkit. 2019; Broad Institute, GitHub Repository. https://broadinstitute.github.io/picard/.
      46. Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics, 2023; 39(1): btac808. doi:10.1093/bioinformatics/btac808.
      47. Xu M, Guo L, Gu S et al. TGS-GapCloser: a fast and accurate gap closer for large genomes with low coverage of error-prone long reads. GigaScience, 2020; 9(9): giaa094. doi:10.1093/gigascience/giaa094.
      49. Manni M, Berkeley MR, Seppey M et al. BUSCO: assessing genomic data quality and beyond. Curr. Protoc., 2021; 1: e323. doi:10.1002/cpz1.323.
      50. Mikheenko A, Prjibelski A, Saveliev V et al. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 2018; 34: i142–i150. doi:10.1093/bioinformatics/bty266.
      51. Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 2016; 32: 292–294. doi:10.1093/bioinformatics/btv566.
      52. Challis R, Richards E, Rajan J et al. BlobToolKit – Interactive quality assessment of genome assemblies. G3 (Bethesda), 2020; 10(4): 1361–1374. doi:10.1534/g3.119.400908.
      53. Smit A, Hubley R, Green P. RepeatMasker Open-4.0. 2013; http://www.repeatmasker.org.
      54. Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA, 2015; 6: 11. doi:10.1186/s13100-015-0041-9.
      55. Flynn JM, Hubley R, Goubert C et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA, 2020; 117: 9451–9457. doi:10.1073/pnas.1921046117.
      56. Bao Z, Eddy SR. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res., 2002; 12: 1269–1276. doi:10.1101/gr.88502.
      57. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics, 2005; 21(Suppl 1): i351–i358. doi:10.1093/bioinformatics/bti1018.
      58. Ou S, Jiang N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol., 2018; 176: 1410–1422. doi:10.1104/pp.17.01310.
      59. Ellinghaus D, Kurtz S, Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform., 2008; 9: 18. doi:10.1186/1471-2105-9-18.
      60. Bruna T, Lomsadze A, Borodovsky M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. bioRxiv. 2023; https://doi.org/10.1101/2023.01.13.524024.
      61. Brůna T, Hoff KJ, Lomsadze A et al. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform., 2021; 3: lqaa108. doi:10.1093/nargab/lqaa108.
      62. Kovaka S, Zimin AV, Pertea GM et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol., 2019; 20: 278. doi:10.1186/s13059-019-1910-1.
      63. Hoff KJ, Lange S, Lomsadze A et al. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, 2016; 32: 767–769. doi:10.1093/bioinformatics/btv661.
      64. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods, 2015; 12: 59–60. doi:10.1038/nmeth.3176.
      65. Gabriel L, Brůna T, Hoff KJ et al. BRAKER3: fully automated genome annotation using RNA-Seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv. 2023; https://doi.org/10.1101/2023.06.10.544449.
      66. Gabriel L, Hoff KJ, Brůna T et al. TSEBRA: transcript selector for BRAKER. BMC Bioinform., 2021; 22: 566. doi:10.1186/s12859-021-04482-0.
      67. Jones P, Binns D, Chang H-Y et al. InterProScan 5: genome-scale protein function classification. Bioinformatics, 2014; 30: 1236–1240. doi:10.1093/bioinformatics/btu031.
      68. Genome analysis toolkit v3.8-1. Broad Institute. 2022; https://gatk.broadinstitute.org/.
      69. Tarasov A, Vilella AJ, Cuppen E et al. Sambamba: fast processing of NGS alignment formats. Bioinformatics, 2015; 31: 2032–2034. doi:10.1093/bioinformatics/btv098.
      70. Danecek P, Bonfield JK, Liddle J et al. Twelve years of SAMtools and BCFtools. GigaScience, 2021; 10(2): giab008. doi:10.1093/gigascience/giab008.
      71. He L, Long X, Qi J et al. Genome and gene evolution of seahorse species revealed by the chromosome-level genome of Hippocampus abdominalis. Mol. Ecol. Resour., 2022; 22: 1465–1477. doi:10.1111/1755-0998.13541.
      72. Schultz J. Entelurus aequoreus: IUCN red list of threatened species. e.T18258072A44775951. 2014.
      73. Wolf M, Ferrette BLS, Coimbra RTF et al. Supporting data for “Near chromosome-level and highly-repetitive genome assembly of the snake pipefish Entelurus aequoreus (Syngnathiformes: Syngnathidae)”. GigaScience Database, 2023; http://dx.doi.org/10.5524/102493.
