Mollusca represents the second largest animal phylum but remains poorly explored from a genomic perspective. While the recent increase in genomic resources holds great promise for a deep understanding of molluscan biology and evolution, access and utilization of these resources still pose a challenge. Here, we present the first comprehensive molluscan genomics database, MolluscDB (http://mgbase.qnlm.ac), which compiles and integrates current molluscan genomic/transcriptomic resources and provides convenient tools for multi-level integrative and comparative genomic analyses. MolluscDB enables a systematic view of genomic information from various aspects, such as genome assembly statistics, genome phylogenies, fossil records, gene information, expression profiles, gene families, transcription factors, transposable elements and mitogenome organization information. Moreover, MolluscDB offers valuable customized datasets or resources, such as gene coexpression networks across various developmental stages and adult tissues/organs, core gene repertoires inferred for major molluscan lineages, and macrosynteny analysis for chromosomal evolution. MolluscDB presents an integrative and comprehensive genomics platform that will allow the molluscan community to cope with ever-growing genomic resources and will expedite new scientific discoveries for understanding molluscan biology and evolution.
Mollusca, commonly known as shellfish, is the second largest phylum in the animal kingdom, with over 100 000 extant species. It also represents the largest marine phylum, containing ∼23% of all named marine organisms (1–3). Molluscs are globally distributed and play vital roles in the structure and functioning of marine, freshwater and terrestrial ecosystems. They are among the first bilaterians to appear in the fossil record and mark the extraordinary Cambrian explosion of animals ∼540 million years ago (2). With tremendous diversity in morphologies, behaviours and lifestyles, they have survived several mass extinction events and are recognized as one of the most ancient and evolutionarily successful groups of invertebrates. Molluscs exhibit fascinating biological and evolutionary innovations, including a diversity of body plans and highly specialized structures (e.g. bivalve shells for defence and cephalopod arms for predation), adaptive life-history characters (e.g. a life span of up to 507 years for the bivalve Arctica islandica (4)) and extraordinary developmental flexibility (e.g. an egg-brooding period of up to 4.4 years for the deep-sea octopus Graneledone boreopacifica (5)). Molluscs have been employed as excellent models for over 100 years in studies of developmental and cell biology, neurobiology, physiology, behaviour, evolution, population genetics and materials science. Moreover, many molluscs are important fishery and aquaculture species, accounting for ∼22% of the total world aquaculture production (6). They therefore present an important source of food throughout the world and provide significant economic benefits to humans.
Despite their remarkable biological, evolutionary and ecological significance, molluscs have long been neglected from a genomic perspective (7,8). The rapid development of high-throughput sequencing technologies has pushed molluscan research into the genomics era. Decoding several molluscan genomes and transcriptomes has led to several major discoveries or breakthroughs, including heat shock protein and immune-related gene expansion for stressful intertidal zone and deep-sea adaptation (9–11), near-perfect preservation of bilaterian ancestor-like karyotypes (12,13), neural novelty evolution by extensive RNA editing (14,15), a single intercalation origin of metazoan larvae (16), and a deeply resolved molluscan phylogeny (1,17). While current molluscan genomic/transcriptomic resources have been accumulated and are rapidly increasing, the access and utilization of these scattered genomic resources pose a great challenge for the molluscan research community. There is an urgent need to establish a Mollusca genomics platform or database by integrating extensive genomic resources and developing convenient tools for comprehensive analysis of these data.
Towards this goal, we constructed the first comprehensive genomics database specifically for molluscs (named MolluscDB, http://mgbase.qnlm.ac) by integrating current molluscan genomic/transcriptomic resources and providing convenient tools for multi-level integrative and comparative analyses. MolluscDB enables a systematic view of genomic and transcriptomic information from various aspects and provides highly valuable, unique custom datasets or resources that are not available elsewhere. The database is compatible with computers, tablets, and mobile devices, and all data in MolluscDB can be freely accessed and downloaded.
MolluscDB represents the most comprehensive collection of 558 molluscan genomic/transcriptomic datasets (including 20 high-quality assembled genomes, 314 reference genome-profiled transcriptomes and 224 de novo-profiled transcriptomes) and 409 mitochondrial genomic resources (Figure 1, Table 1). These resources show outstandingly high taxonomic coverage, spanning all seven classes and ∼87% of the 53 orders (according to the NCBI Taxonomy Database) in Mollusca. MolluscDB provides various genomic information, including genome assembly statistics, a genome phylogeny, fossil records, gene sequence, structure, functional annotations, expression profiles, gene families, transcription factors and transposable elements. Convenient visualization of genomic information is compiled and integrated into a customized genome browser. MolluscDB also offers highly valuable, special-featured customized datasets or resources, including gene coexpression networks across various developmental stages and adult tissues/organs, the core gene repertoires inferred for Mollusca and descendent ancestors, and genome-by-genome macrosynteny analysis for inferring molluscan karyotype evolution. Moreover, MolluscDB provides useful and convenient tools for user-defined search of genes of interest, BLAST- and BLAT-based sequence comparison and PCR primer design. MolluscDB is implemented on the Linux operating system, using J2EE as the framework, MySQL as the back-end database and Apache Tomcat as the server. Web user interfaces were developed based on JavaServer Pages (JSP), HTML5 and CSS3.
Overview of MolluscDB database structure and web interface features.
Data | Statistics |
---|---|
Class /order/species | 3/46/123 |
Protein-coding genes | 563 593 |
Transcriptomic data/expression profiles | 538 |
Mitogenomic data | 409 |
Taxonomic categories with paleobiological records | 241 |
Types of functional annotation database | 6 |
Swiss-Prot/NR/GO/KEGG/Pfam/Panther annotation | 347 623/508 505/277 773/165 238/411 647/455 626 |
Transposable elements/associated genes | 72 640 596/522 372 |
Gene families/associated genes | 29 151/513 684 |
Groups of Pan-geneset | 38 |
Core gene families | 122 434 |
Dispensable gene families | 169 392 |
Core genes | 513 684 |
Unclustered genes | 49 909 |
Transcription factors/TF families | 26 441/71 |
Co-expressed gene networks | 18 |
Synteny gene pairs | 363 152 |
The phylum Mollusca is commonly divided into seven classes: Gastropoda, Bivalvia, Cephalopoda, Scaphopoda, Monoplacophora, Polyplacophora and Aplacophora. Comprehensive genomics resources offered by MolluscDB cover all seven molluscan classes. At the genome level, 20 high-quality molluscan genomes with well-annotated gene information (e.g. gene sequence, structure and function) are presented in MolluscDB (Table 2); these are derived from the Bivalvia, Gastropoda and Cephalopoda. A phylogenetic tree of the 20 molluscan genomes based on single-copy genes is shown on the MolluscDB homepage. Users can click on species names in the tree or names in the ‘Taxonomy’ module to view a brief biological introduction to each species and its genomic features, or switch to frequently used modules through quick links at the bottom (Figure 2A, B).
Taxonomy | Species | Genome size (Mb) | Number of protein-coding genes | Contig N50 (kb) | Scaffold N50 (kb) | GC content (%) | Repeat rate (%) | References/Resources |
---|---|---|---|---|---|---|---|---|
Bivalvia | Patinopecten yessoensis | 988 | 24 738 | 38 | 804 | 36.52 | 27.85 | (13) |
Bivalvia | Chlamys farreri | 780 | 28 602 | 22 | 602 | 35.49 | 27.73 | (11) |
Bivalvia | Argopecten purpuratus | 725 | 26 256 | 80 | 1 020 | 35.40 | 32.04 | (18) |
Bivalvia | Crassostrea gigas | 559 | 28 072 | 19 | 401 | 33.44 | 34.71 | (9) |
Bivalvia | Crassostrea virginica | 685 | 34 596 | 1 971 | 75 944 | 34.83 | 39.69 | (19) |
Bivalvia | Saccostrea glomerata | 788 | 29 738 | 40 | 804 | 33.31 | 45.39 | (20) |
Bivalvia | Pinctada fucata | 1 024 | 31 477 | 21 | 167 | 35.03 | 43.35 | (21) |
Bivalvia | Pinctada fucata martensii | 991 | 30 815 | 21 | 324 | 35.32 | 48.01 | (22) |
Bivalvia | Bathymodiolus platifrons | 1 660 | 33 584 | 13 | 343 | 34.17 | 47.25 | (13) |
Bivalvia | Modiolus philippinarum | 2 630 | 36 549 | 20 | 100 | 33.96 | 59.66 | (13) |
Bivalvia | Scapharca broughtonii | 885 | 24 045 | 1 798 | 4 500 | 33.70 | 46.41 | (23) |
Bivalvia | Sinonovacula constricta | 1 332 | 26 273 | 679 | 57 990 | 35.45 | 36.65 | (24) |
Cephalopoda | Octopus bimaculoides | 2 372 | 33 609 | 5 | 470 | 36.04 | 50.43 | (14) |
Cephalopoda | Octopus minor | 5 090 | 30 010 | 197 | 3 020 | 36.34 | 75.62 | (25) |
Gastropoda | Lottia gigantea | 360 | 23 818 | 96 | 1 870 | 33.28 | 23.73 | (12) |
Gastropoda | Haliotis discus hannai | 1 865 | 29 449 | 14 | 211 | 40.51 | 36.07 | (26) |
Gastropoda | Elysia chlorotica | 558 | 24 980 | 29 | 422 | 37.65 | 29.25 | (27) |
Gastropoda | Biomphalaria glabrata | 916 | 25 550 | 19 | 48 | 35.99 | 43.79 | (28) |
Gastropoda | Aplysia californica | 927 | 19 944 | 10 | 917 | 40.35 | 39.70 | NCBI Genome (AplCal3.0) |
Gastropoda | Pomacea canaliculata | 440 | 21 533 | 1 073 | 31 530 | 40.62 | 20.72 | (29) |
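The contig and scaffold N50 columns of the genome table summarize assembly contiguity. N50 is conventionally defined as the length L such that sequences of length at least L together contain half or more of the assembled bases. The following minimal sketch illustrates that definition; the function name `n50` and the toy lengths are assumptions for illustration and are not part of MolluscDB.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kb (illustrative values only, not from a real assembly).
toy_contigs_kb = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs_kb)
```

Because the statistic depends only on the sorted lengths, the same function applies to contigs and scaffolds alike.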
Screenshots for (A) overview of species information, (B) summary of genome assembly, (C) summary of transcriptomic data, (D) overview of mitogenomic information and (E) summary of paleobiological records.
Compared with genomic data, transcriptomic data are much more abundant and show much wider taxonomic coverage (particularly for taxa whose genomes are poorly investigated). All molluscan transcriptomic data deposited in the NCBI SRA database were searched, collected and filtered. In total, 314 reference genome-profiled transcriptomes derived from 12 species were chosen for further expression and network analysis, and 224 transcriptomes from 103 species without reference genomes (covering all seven molluscan classes) were de novo assembled and stored in the ‘Download’ module for free download. Users can browse detailed statistics of all the transcriptomes or download sequencing reads through related SRA links for further customized analysis in the ‘Transcriptomic Data’ module (Figure 2C).
Mitochondria, existing in almost all eukaryotic cells, are key components participating in many important biological processes. Compared with nuclear genomic data, mitogenomic data are much easier to obtain and have been an important resource for investigating molluscan phylogeny and evolution (30). We collected 409 molluscan mitochondrial genomes, covering 42 orders and seven classes. For each species, a Circos graph showing mitochondrial gene information and an associated table with detailed genomic positions are presented in the ‘Mitogenomic Data’ module (Figure 2D). Considering that some bivalve lineages exhibit doubly uniparental inheritance (DUI; (31)), haplotype information for each mitogenome and sex information for each sequenced individual are also provided. The sequences and annotations of each mitochondrial genome are also available for download.
With an evolutionary history of ∼540 million years and the possession of hardened mineralized exoskeletons, molluscs have been a well-characterized animal group with rich fossil records (32). These molluscan fossils provide crucial information for understanding molluscan phylogenetics and evolution. We collected fossil records derived from the Paleobiology Database (PBDB; (33)) for each Mollusca taxon. We organized all the retrieved records into a taxonomy tree presented in the ‘Paleobiological Records’ module (Figure 2E) and linked the fossil record of each species with its relevant genomic/transcriptomic data. In total, 241 taxa distributed in seven Mollusca classes were labelled and linked with fossil records. Clicking on a labelled taxon name links to the external PBDB database and provides related paleobiology information, such as the morphology, dating and collection locations of fossils.
Functional annotation by homology comparison against public databases is crucial for understanding the possible functions of protein-coding genes. To comprehensively annotate the 563 593 molluscan protein-coding genes, the ‘Gene Annotation’ module was set up, which compiles and integrates functional annotation information from six mainstream databases (Figure 3A): NR (34), Swiss-Prot (35), KEGG (36,37), GO (38,39), Pfam (40) and Panther (41). In total, 504 210 genes were annotated with at least one type of annotation. The detailed annotation information can be accessed by searching gene IDs for exact matching or key words in annotation descriptions for fuzzy matching. Links to the ‘Gene Search’ and ‘Gbrowse’ modules in MolluscDB and to external annotation databases are attached to each gene ID and annotation ID, respectively. Download options are also provided for user-defined downloading of annotation information for selected genes or all the protein-coding genes of selected species.
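The two search modes described for the annotation module can be illustrated with a minimal sketch. It assumes that exact matching compares gene IDs verbatim and that fuzzy matching is a case-insensitive substring search over annotation descriptions; the actual matching rules of MolluscDB are not specified here. The record type, function names, gene IDs and descriptions below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_id: str
    source: str        # annotation database, e.g. "Swiss-Prot" or "Pfam"
    description: str


def search_by_id(records, gene_id):
    """Exact matching: return every annotation whose gene ID equals gene_id."""
    return [r for r in records if r.gene_id == gene_id]


def search_by_keyword(records, keyword):
    """Fuzzy matching (assumed here to be a case-insensitive substring search)."""
    needle = keyword.lower()
    return [r for r in records if needle in r.description.lower()]


# Hypothetical records; the IDs and descriptions are made up for illustration.
records = [
    Annotation("gene_0001", "Swiss-Prot", "Heat shock 70 kDa protein"),
    Annotation("gene_0001", "Pfam", "Hsp70 protein"),
    Annotation("gene_0002", "Swiss-Prot", "Toll-like receptor 4"),
]

by_id = search_by_id(records, "gene_0001")
by_keyword = search_by_keyword(records, "heat shock")
```

A single gene can carry one record per annotation database, which is why the ID search returns a list rather than a single entry.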
Screenshots for (A) gene annotation, (B) transposable elements, (C) transcription factors, (D) Gbrowse, (E) gene search and (F) gene family.
Transposable elements (TEs) are major components of eukaryotic genomes, with significant impacts on genome evolution, function and disease (42). To ensure the consistency of TE identification across various genome datasets, we developed a uniform pipeline to re-annotate all 20 molluscan genomes for TE identification by referring to previously published methods (10,29). In MolluscDB, all annotated TEs were associated with protein-coding genes to allow convenient exploration of relationships between TEs and potential target genes (Figure 3B), with associations between 72 640 596 TEs and 522 372 genes characterized. Users can search by genomic interval, TE subfamily type or gene ID to obtain and download full annotation information. Gbrowse links are also provided for visualization of each TE and its related gene.
Transcription factors (TFs), functioning as ‘master regulators’ and ‘selector genes’, exert control over biological processes that regulate growth, development and response to the external environment (43,44). We identified TF genes and classified them into gene families according to the AnimalTFDB database (version 3.0; (45)). In total, 26 441 TF genes were obtained from the 20 molluscan genomes and then classified into 71 gene families. In the ‘Transcription Factors’ module, users can search for the TF family of a species, a class or even all classes by family name to obtain TF gene family member information (Figure 3C). Links are also provided in the ‘Gene Search’ module for TF genes, and download options are provided for TFs of interest to users.
Basic genomic features and annotated functional elements for 20 high-quality molluscan genomes in MolluscDB are visualized using a customized ‘Gbrowse’ module (46). Users can quickly browse any selected genomic region through the genome browser and obtain a convenient view of related genomic annotations, including GC content, sequence and structure of protein-coding genes, and types of transposable elements (Figure 3D). Clicking on any element embedded in the browser will display detailed information in a new page. Users can also create custom tracks by uploading genomic files in the prescribed formats.
The ‘Gene Search’ module, which is cross-linked to other modules through the gene ID, integrates basic gene information from multiple aspects for the whole gene sets of 20 molluscan genomes. Users can search for specific genes through three types of key words, namely, genomic region, gene ID and gene name. The search results contain downloadable information on gene location, gene size, transcription direction, gene structure, functional annotations and genomic/CDS/protein sequence (Figure 3E). Links to Gbrowse and functional databases are also provided in this module for deep gene mining.
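A search by genomic region amounts to an interval-overlap query over gene coordinates. The sketch below shows one way such a query could work, assuming 1-based inclusive coordinates as in GFF-style annotations; the `Gene` record, the function `genes_in_region` and the toy coordinates are hypothetical and do not describe the MolluscDB back end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    seqid: str    # scaffold or chromosome name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"


def genes_in_region(genes, seqid, start, end):
    """Return genes on seqid that overlap [start, end], sorted by start position."""
    hits = [
        g for g in genes
        if g.seqid == seqid and g.start <= end and g.end >= start
    ]
    return sorted(hits, key=lambda g: g.start)


# Toy gene models (made-up IDs and coordinates).
genes = [
    Gene("g1", "chr1", 100, 500, "+"),
    Gene("g2", "chr1", 450, 900, "-"),
    Gene("g3", "chr1", 1000, 1500, "+"),
    Gene("g4", "chr2", 100, 500, "+"),
]

hits = genes_in_region(genes, "chr1", 480, 1000)
```

A linear scan is adequate for a toy list; a production database would more plausibly rely on indexed coordinate columns or an interval tree.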
In addition to the basic gene information of sequence, structure, and function, MolluscDB also provides gene expression profiles in various developmental stages or major adult tissues/organs. We retrieved 314 reference genome-profiled RNA-Seq datasets belonging to 12 molluscan species from the NCBI SRA database to calculate gene expression profiles in the ‘Expression Visualization’ module (Figure 4A) based on a uniform processing pipeline (16). Users input gene IDs for a specific species to view expression profiles in selected developmental stages or adult tissues/organs. The expression profiles are presented as a heatmap or as a table of transcripts per million (TPM) values; the two views can be switched by clicking on the ‘Display Heatmap/TPM’ button.
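For readers unfamiliar with the TPM unit, the sketch below applies the standard definition: read counts are first divided by transcript length, and the resulting rates are rescaled so that they sum to one million within a sample. It is a generic illustration with made-up counts and lengths, not the processing pipeline of (16); the function name `tpm` is an assumption.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts      read counts per gene in one sample
    lengths_bp  transcript (or effective) lengths in base pairs, same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Length-normalized rates: reads per base of transcript.
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that the values of one sample sum to one million.
    return [rate / total * 1_000_000 for rate in rates]


# Toy sample with three genes (illustrative numbers only).
toy_tpm = tpm(counts=[100, 200, 300], lengths_bp=[1000, 2000, 1000])
```

In this toy sample the first two genes receive equal TPM values despite different read counts, because the second transcript is twice as long.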
Screenshots for (A) expression visualization and (B) gene coexpression network.
Co-expressed genes, reflecting possible relationships in expression regulation and important for elucidating gene interactions, can be displayed in the format of a gene co-expression network according to the similarity of gene expression patterns (47). Co-expressed gene networks for 12 species were constructed based on Pearson's correlation coefficient (PCC) values between pairs of genes (Figure 4B) and visualized using the JavaScript library Cytoscape.js (48). In total, we filtered and acquired 61 500 highly correlated co-expressed gene pairs. For a given query gene, displayed as a red dot, we show the network of the top 20 target genes with the highest correlation values, displayed as black dots. Users can click on any co-expressed gene in the network to view its co-expression network. In addition, a summary table of all co-expressed genes (also with links to the ‘Gene Search’ module) and corresponding functional annotations is provided below the network.
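The ranking of co-expression partners by PCC can be sketched as follows. The example computes Pearson's correlation between a query profile and every other profile and keeps the k best partners (k = 20 in the network view described above). The filtering threshold used to obtain the highly correlated pairs is not stated here, so none is applied; the function names and the toy expression profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant profile; treated as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_coexpressed(query, profiles, k=20):
    """Return the k genes most positively correlated with the query gene."""
    query_profile = profiles[query]
    scored = [
        (gene, pearson(query_profile, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy expression profiles across four samples (made-up values).
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 3, 2, 4],
}
partners = top_coexpressed("geneA", profiles, k=2)
```

Sorting by signed PCC keeps only positively correlated partners at the top; ranking by absolute value would also surface anti-correlated genes such as `geneC`.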
Identification and comparison of gene families are critical for understanding evolution and adaptation (49). Previous studies illustrated that the expansion of specific gene families is characteristic of molluscan genomes, which possibly corresponds to molluscan evolutionary success in terms of ecological adaptation and morphological diversity (9–11,14). In the ‘Gene Family’ module, we clustered and annotated gene families of 20 molluscan genomes based on OrthoMCL software (v2.0.9; (50)) and the Panther database (40), which resulted in 29 151 gene clusters containing 513 684 genes (Figure 3F). Users can search key words in annotation descriptions to obtain gene families of interest. Clicking on the number of each cluster will display the genes with information on species, Panther annotation ID and description. To enable comparative analysis of gene families with other model organisms (e.g. fruit fly, mouse and zebrafish), Panther IDs in clustered gene families of molluscs were externally linked to the Panther database.
In an effort to define and characterize the pan-gene set for Mollusca at different phylogenetic levels, we set up the ‘Pan-geneset’ module, which provides information on core gene sets that are common to all species at a certain molluscan phylogenetic level and potentially dispensable gene sets that show presence/absence variations across species at the same phylogenetic level. Based on the gene family clustering results described above, we identified core/dispensable gene sets at 38 molluscan phylogenetic levels in the 20-mollusc phylogenetic tree (Figure 5A). To enable a view of the distribution of core/dispensable gene sets in individual genomes, we classify and visualize all protein-coding genes of each species according to their commonness at certain phylogenetic levels (i.e. phylum/class/order/family/genus/species). By clicking on a bar in the graph, the user can download the gene IDs of the corresponding gene set.
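The core/dispensable distinction can be made concrete with a small sketch. Given the species in which each gene family occurs and the species belonging to one phylogenetic level (clade), a family is called core if it is present in every species of the clade and dispensable if it is present in some but not all of them. The function name, family labels and species labels below are hypothetical, and the sketch ignores any additional criteria the module may apply.

```python
def classify_families(family_to_species, clade_species):
    """Split gene families into core and dispensable sets for one clade.

    family_to_species  dict: family ID -> iterable of species containing it
    clade_species      iterable of species at the phylogenetic level of interest
    """
    clade = set(clade_species)
    if not clade:
        raise ValueError("the clade must contain at least one species")
    core, dispensable = [], []
    for family, species in family_to_species.items():
        present = set(species) & clade
        if present == clade:
            core.append(family)          # found in every species of the clade
        elif present:
            dispensable.append(family)   # presence/absence variation in the clade
        # Families absent from the whole clade are ignored.
    return sorted(core), sorted(dispensable)


# Toy clustering result (made-up family and species labels).
families = {
    "fam1": {"sp_a", "sp_b", "sp_c"},
    "fam2": {"sp_a", "sp_b"},
    "fam3": {"sp_c"},
    "fam4": {"sp_a"},
}

core_all, dispensable_all = classify_families(families, ["sp_a", "sp_b", "sp_c"])
core_ab, dispensable_ab = classify_families(families, ["sp_a", "sp_b"])
```

Note that `fam2` is dispensable for the three-species clade but core for the narrower clade of `sp_a` and `sp_b`, which is why the classification is reported per phylogenetic level.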
Screenshots for specially customized modules for (A) Pan-geneset analysis and (B) macrosynteny analysis.
Macrosynteny analysis enables deep phylogenetic comparisons and an understanding of karyotype evolution by investigating conserved linkages between orthologous genes that are independent of intra-chromosomal rearrangements (12). Our previous macrosynteny analysis of 19 scallop chromosomes revealed that scallops may have a karyotype close to that of the bilaterian ancestor (13). Consistently, a recent study supported the 19 presumed ancestral linkage groups (ALGs) of the bilaterian ancestor (51). To comprehensively investigate the evolution of molluscan karyotypes, we analysed the macrosyntenic relationships of 20 molluscan genomes with ancestral linkage groups represented by three conserved genomes (Patinopecten yessoensis, Branchiostoma floridae and Nematostella vectensis) by adopting the approach described by Simakov et al. (12) and Wang et al. (13). In this module (Figure 5B), users can view and compare the conservation level among 20 molluscan genomes according to different reference ALGs or focus on particular species by clicking on the dot plot to investigate detailed synteny relationships in an enlarged view. A download option is provided for users to obtain macrosynteny dot plots, homologous gene pairs and related gene sequences.
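Because macrosynteny ignores gene order within a chromosome, its basic ingredient is a tally of orthologous gene pairs for every pair of chromosomes (or linkage groups) in two genomes. The sketch below builds that tally and reports, for each chromosome of the first genome, the partner sharing the most orthologues. It is a simplified illustration only: the approach of (12,13) involves further steps, such as assessing the significance of enrichment, that are not reproduced here. All function names, gene IDs and chromosome labels are hypothetical.

```python
from collections import Counter


def synteny_counts(ortholog_pairs, location_a, location_b):
    """Count orthologous gene pairs for each (chromosome A, chromosome B) pair.

    ortholog_pairs  iterable of (gene in genome A, gene in genome B)
    location_a/b    dict: gene ID -> chromosome or linkage group
    """
    counts = Counter()
    for gene_a, gene_b in ortholog_pairs:
        # Skip pairs with a gene lacking a chromosomal assignment.
        if gene_a in location_a and gene_b in location_b:
            counts[(location_a[gene_a], location_b[gene_b])] += 1
    return counts


def best_match(counts):
    """For each chromosome of genome A, the genome B chromosome sharing the most orthologues."""
    best = {}
    for (chrom_a, chrom_b), n in counts.items():
        # On ties, the pair encountered first is kept.
        if chrom_a not in best or n > best[chrom_a][1]:
            best[chrom_a] = (chrom_b, n)
    return best


# Toy data (made-up gene IDs and chromosome labels).
location_a = {"a1": "A1", "a2": "A1", "a3": "A1", "a4": "A2", "a5": "A2"}
location_b = {"b1": "B3", "b2": "B3", "b3": "B7", "b4": "B7", "b5": "B7"}
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")]

counts = synteny_counts(pairs, location_a, location_b)
matches = best_match(counts)
```

Each non-zero cell of `counts` corresponds to a populated block of a macrosynteny dot plot.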
MolluscDB also provides users with several convenient online tools. Using the ‘primer design’ tool, users can choose a genomic region or directly input a sequence to design primers for PCR experiments. Users can use ‘Blast’ or ‘Blat’ to search for targeted genes by entering user-supplied sequences that are aligned against the genome, CDSs, protein sequences or de novo assembled transcripts.
Currently, high-quality genomes are largely biased towards the bivalves, gastropods and cephalopods, but the situation is expected to change quickly as the rapid increase in genomic resources eventually covers all molluscan lineages. In the future, we will continuously update MolluscDB as new molluscan genomes and omics data become available and will add more annotations and functionalities to the database, such as the incorporation of multi-omics data (e.g. epigenome, proteome, metabolome, phenome and microbiome), developmental transcriptome age-based analysis for evo-devo research (16), molecular marker resources (e.g. SNPs and microsatellites) for genomic breeding (52) and new machine learning-based tools for deep mining of multi-omics data (53) for understanding molluscan biology and evolution.
We wish to thank all researchers who have generated invaluable molluscan genomic resources that are gathered in the MolluscDB database. We thank Biomarker Technologies Corporation and Wuhan Gooalgene Technology Co., Ltd. for assisting in MolluscDB construction. We also thank the Center for High Performance Computing and System Simulation (Qingdao Pilot National Laboratory for Marine Science and Technology) for the support of hardware resources and network services.
National Key Research and Development Program of China [2018YFC0310802]; National Natural Science Foundation of China [31871499, 31702330]; Major basic research projects of Shandong Natural Science Foundation [ZR2018ZA0748]; Fundamental Research Funds for the Central Universities [201841001, 202064008]; Taishan Scholar Project Fund of Shandong Province of China.
Conflict of interest statement. None declared.