A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon
- PMID: 33986770
- PMCID: PMC8110904
- DOI: 10.3389/fgene.2021.656334
Abstract
Atlantic salmon (Salmo salar) is a major species in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome-sequence-based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows full-length transcript sequencing from single RNA molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads) followed by error-correction of the HQ reads with short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remainder were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, assuring a high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms; few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation and may provide long-read linkage information useful for improving the genome assembly.
More than 80% of transcripts were assigned GO terms, and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner, demonstrating that hybrid error-corrected long-read transcriptomes may be applied to study genes and splice-variants expressed in certain organs or conditions (e.g., challenge materials). In conclusion, this is the single largest contribution of full-length mRNAs in Atlantic salmon. The results will be of great value to salmon genomics research, and the pipeline outlined may be applied to generate additional de novo transcriptomes in Atlantic salmon or in similar projects in other species.
Keywords: Atlantic salmon; Illumina sequencing; PacBio Iso-seq; full-length mRNA; hybrid error correction; transcriptome.
Copyright © 2021 Ramberg, Høyheim, Østbye and Andreassen.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Similar articles
- A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. Hoang NV, Furtado A, Mason PJ, Marquardt A, Kasirajan L, Thirugnanasambandam PP, Botha FC, Henry RJ. BMC Genomics. 2017 May 22;18(1):395. doi: 10.1186/s12864-017-3757-8. PMID: 28532419. Free PMC article.
- PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. Zhang G, Sun M, Wang J, Lei M, Li C, Zhao D, Huang J, Li W, Li S, Li J, Yang J, Luo Y, Hu S, Zhang B. Plant J. 2019 Jan;97(2):296-305. doi: 10.1111/tpj.14120. Epub 2018 Dec 3. PMID: 30288819.
- Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing. Piriyapongsa J, Kaewprommal P, Vaiwsri S, Anuntakarun S, Wirojsirasak W, Punpee P, Klomsa-Ard P, Shaw PJ, Pootakham W, Yoocha T, Sangsrakru D, Tangphatsornruang S, Tongsima S, Tragoonrung S. PeerJ. 2018 Oct 30;6:e5818. doi: 10.7717/peerj.5818. eCollection 2018. PMID: 30397543. Free PMC article.
- PacBio Sequencing and Its Applications. Rhoads A, Au KF. Genomics Proteomics Bioinformatics. 2015 Oct;13(5):278-89. doi: 10.1016/j.gpb.2015.08.002. Epub 2015 Nov 2. PMID: 26542840. Free PMC article. Review.
- Methodologies for Transcript Profiling Using Long-Read Technologies. Oikonomopoulos S, Bayega A, Fahiminiya S, Djambazian H, Berube P, Ragoussis J. Front Genet. 2020 Jul 7;11:606. doi: 10.3389/fgene.2020.00606. eCollection 2020. PMID: 32733532. Free PMC article. Review.
Cited by
- Full-length transcriptome from different life stages of cobia (Rachycentron canadum, Rachycentridae). Ebeneezar S, Krupesha Sharma SR, Vijayagopal P, Sebastian W, Sajina KA, Tamilmani G, Sakthivel M, Rameshkumar P, Anikuttan KK, Varghese E, Linga Prabu D, Jeena NS, Sumithra TG, Gayathri S, Iyyapparaja Narasimapallavan G, Gopalakrishnan A. Sci Data. 2023 Feb 16;10(1):97. doi: 10.1038/s41597-022-01907-0. PMID: 36797271. Free PMC article.
- Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos). Tseng E, Underwood JG, Evans Hutzenbiler BD, Trojahn S, Kingham B, Shevchenko O, Bernberg E, Vierra M, Robbins CT, Jansen HT, Kelley JL. G3 (Bethesda). 2022 Mar 4;12(3):jkab422. doi: 10.1093/g3journal/jkab422. PMID: 35100340. Free PMC article.
- Differential Expression of miRNAs and Their Predicted Target Genes Indicates That Gene Expression in Atlantic Salmon Gill Is Post-Transcriptionally Regulated by miRNAs in the Parr-Smolt Transformation and Adaptation to Sea Water. Shwe A, Krasnov A, Visnovska T, Ramberg S, Østbye TK, Andreassen R. Int J Mol Sci. 2022 Aug 8;23(15):8831. doi: 10.3390/ijms23158831. PMID: 35955964. Free PMC article.
- PacBio Iso-Seq Improves the Rainbow Trout Genome Annotation and Identifies Alternative Splicing Associated With Economically Important Phenotypes. Ali A, Thorgaard GH, Salem M. Front Genet. 2021 Jul 15;12:683408. doi: 10.3389/fgene.2021.683408. eCollection 2021. PMID: 34335690. Free PMC article.
- MicroSalmon: A Comprehensive, Searchable Resource of Predicted MicroRNA Targets and 3'UTR Cis-Regulatory Elements in the Full-Length Sequenced Atlantic Salmon Transcriptome. Ramberg S, Andreassen R. Noncoding RNA. 2021 Sep 22;7(4):61. doi: 10.3390/ncrna7040061. PMID: 34698276. Free PMC article.