- Review Article
Transcriptomics in the era of long-read sequencing
- Carolina Monzó (ORCID 0000-0002-5043-8145; equal contribution),
- Tianyuan Liu (ORCID 0000-0002-8561-6239; equal contribution) &
- Ana Conesa (ORCID 0000-0001-9597-311X)
Nature Reviews Genetics (2025)
Abstract
Transcriptome sequencing revolutionized the analysis of gene expression, providing an unbiased approach to gene detection and quantification that enabled the discovery of novel isoforms, alternative splicing events and fusion transcripts. However, although short-read sequencing technologies have surpassed the limited dynamic range of previous technologies such as microarrays, they have limitations, for example, in resolving full-length transcripts and complex isoforms. Over the past 5 years, long-read sequencing technologies have matured considerably, with improvements in instrumentation and analytical methods, enabling their application to RNA sequencing (RNA-seq). Benchmarking studies are beginning to identify the strengths and limitations of long-read RNA-seq, although there remains a need for comprehensive resources to guide newcomers through the intricacies of this approach. In this Review, we provide a comprehensive overview of the long-read RNA-seq workflow, from library preparation and sequencing challenges to core data processing, downstream analyses and emerging developments. We present an extensive inventory of experimental and analytical methods and discuss current challenges and prospects.
Introduction
Studying the transcriptome, the collection of RNA transcripts expressed under specific conditions, is instrumental to characterize cellular responses, infer gene regulatory networks, unveil disease mechanisms and discover diagnostic biomarkers1. For the past 20 years, short-read RNA sequencing (srRNA-seq) has been the gold standard for transcriptomics. However, srRNA-seq falls short of fully capturing the complexity and dynamics of the transcriptome2. Despite its widespread utility, srRNA-seq is constrained by its read length of 100–150 bp (~300 bp per fragment in paired-end sequencing3), which makes it difficult to resolve the connectivity between distant exons, as they are never represented on the same sequenced fragment4. Moreover, srRNA-seq cannot directly detect epitranscriptomic modifications and struggles to map repetitive regions, which limits its capacity to distinguish between similar transcripts. As a result, it is particularly challenging to study highly polymorphic genes, such as HLA genes5, and recombinant genes, such as immunoglobulin genes6.
By contrast, long-read sequencing (LRS) methods such as nanopore sequencing, developed by Oxford Nanopore Technologies (ONT), and single-molecule real-time (SMRT) sequencing, developed by Pacific Biosciences (PacBio) (Fig. 1a), sequence entire RNA or cDNA molecules, which enables the recovery of full-length transcripts without assembly steps7,8,9. Features that distinguish very similar transcripts, such as differences at start and polyadenylation (poly(A)) sites, alternative exons, small in-phase nucleotide variants or modifications of the RNA molecule, can thus be captured in the same read (Fig. 1b). Consequently, long-read RNA sequencing (lrRNA-seq) has revealed new insights into transcriptome biology across diverse contexts. Illustrative examples include the identification of alternative splicing events in different brain regions and their association with neuropsychiatric and neurodegenerative diseases10,11, gene fusions in cancer cell lines12, alternative transcription start sites (TSSs) in thymus development, cell-state transitions and different cancer types and stages13,14,15, and alternative transcription termination sites (TTSs) in neurodevelopment and evolution16,17. Additionally, LRS has driven the discovery of thousands of novel isoforms, even in well-annotated organisms18,19, and can be used to study allele-specific expression20 and RNA modifications21,22, among other applications.
a, In long-read sequencing using Oxford Nanopore Technologies (ONT), cDNA or RNA molecules (in red and yellow) are tagged with sequencing adapters (in light blue) preloaded with a motor protein, and combined with a tether that localizes the molecules close to the flow cell. Sequencing begins when the sequencing adapter enters a nanoscale pore and the motor protein unwinds the double-stranded molecule to drive it through the nanopore. As the molecule passes through the pore, the disruptions in the electrical current crossing the pore (squiggles) are measured and recorded in FAST5 or POD5 files, and these changes in current are used to identify the molecule sequence. In single-molecule, real-time (SMRT) long-read sequencing using Pacific Biosciences (PacBio), cDNA molecules (forward strand in yellow and reverse strand in red) are ligated to hairpin adapters (in light blue) to form circular molecules known as SMRTbells. SMRTbells are bound to polymerases (in purple) and immobilized in zero-mode waveguides. The polymerase synthesizes a new strand of cDNA using fluorescently labelled deoxynucleoside triphosphates (dNTPs) that are excited by light pulses. The fluorescent wavelengths are recorded, and the fluorophores are cleaved from the nucleotides to prevent fluorescent interference during the subsequent dNTP incorporation and light pulse. Multiple sequencing passes of the circularized cDNA molecule (subreads) allow the generation of a highly accurate consensus sequence.
b, Core applications of long-read sequencing. Unlike short-read sequencing, long-read sequencing can capture full-length transcript isoforms in a single read, preserving exon continuity and identifying phased polymorphisms, which allows alleles to be distinguished and quantified and resolves the expression of highly similar transcripts. Additionally, whereas short-read sequencing could detect RNA modifications only indirectly, by enriching modified fragments with modification-specific antibodies or by chemically altering nucleotide bases, ONT directly detects epitranscriptomic modifications in the RNA molecule, using unsupervised, supervised or semisupervised methods to identify specific disruption signatures in the electrical current.
c, The long-read RNA sequencing (lrRNA-seq) workflow: adequate experimental and sequencing design is necessary to obtain a data set that effectively addresses the biological questions of interest. The key steps for experimental and sequencing design, core data processing, downstream analysis and emerging applications are listed above the lines. The key considerations of each step are listed below the lines. Key factors include RNA quality, technology selection, library preparation methods, sequencing depth and the use of multiplexing and internal controls. Core data processing involves steps for transcriptome profiling and quality control. Downstream analyses encompass genome annotation, differential gene expression and functional interpretation. Emerging lrRNA-seq applications are expanding long-read sequencing to single-cell resolution and enabling the exploration of other aspects of RNA biology. AI, artificial intelligence; bp, base pairs; CCS, circular consensus sequencing; GC, guanine–cytosine; GPU, graphics processing unit; LR, long read; poly(A), polyadenylation; scIso-Seq, single-cell RNA isoform sequencing; SNP, single nucleotide polymorphism; SR, short read.
Initially, LRS technologies were limited by high sequencing error rates and low throughput23,24. However, improvements in platforms and library preparation methods over the past 5 years have enhanced both accuracy and throughput, with tens of millions of reads now generated in a single experiment25,26. The resulting increase in sample sizes has enabled the investigation of differential expression of full-length transcript models14,27, profiling of genotypes18,28, analyses in single cells11,29 and spatial isoform expression10,30. Algorithms are now available for transcript reconstruction31,32,33, quantification34,35, differential expression36,37 and RNA modification detection21,22, alongside advanced library preparation methods25,38. Additionally, benchmarking efforts have provided a comparative analysis of these experimental and computational methods39,40, enhancing our understanding of their strengths and weaknesses (Box 1).
As LRS becomes more popular and accessible, new users face challenges in navigating the complexities of these technologies. This Review showcases the potential of lrRNA-seq and provides essential guidance for navigating both experimental and computational workflows. We provide a comprehensive overview of lrRNA-seq techniques, covering experimental design, library preparation, data processing, transcript detection, quantification and downstream analyses (Fig. 1c). We also address current challenges and future directions, aiming to help researchers effectively use LRS technologies for transcriptomic studies.
Experimental and sequencing design
Designing a robust experiment is critical to effectively answer biological questions. In lrRNA-seq, the factors to consider are RNA integrity, choice of sequencing platform, library preparation protocol, sequencing depth, multiplexing strategies and inclusion of external controls (Fig. 1c).
RNA quality
To effectively sequence full-length transcripts, high-quality RNA is essential. Degraded samples can produce incomplete transcript sequences, making it difficult to differentiate alternative TSSs and TTSs and misrepresenting the complexity of the transcriptome. RNA degradation often results in biased coverage, with a tendency to over-represent the more stable 3′ ends of transcripts, leading to inaccurate isoform quantification. Additionally, degraded RNA compromises the ability to measure poly(A) tail lengths and introduces ambiguity in spliced alignment owing to shorter read lengths41,42,43. The RNA integrity number (RIN) measures the quality of RNA in a sample by analysing the ratio of intact ribosomal RNA peaks to degraded RNA fragments in an electropherogram; higher values indicate better RNA preservation44. In ONT sequencing, samples with a RIN > 9.5 were found to be undegraded, although samples with a RIN > 7 could be used for lrRNA-seq provided that correction steps, such as NanoCount filtering based on the best alignment per read, were implemented45. Alternatively, the TSS and TTS biases introduced by RNA degradation can be mitigated by validating these sites with orthogonal data46, or by excluding isoforms that differ in TSS and TTS from differential isoform usage analyses, for example, by using transcript identification algorithms that rely heavily on the reference genome32,33. Incorporating RIN as a covariate in statistical models aimed at identifying differentially expressed transcripts between conditions can account for degradation-related confounders43. Both ONT and PacBio recommend using RNA with a RIN > 8 to ensure accurate transcriptome representation47,48. Therefore, it is crucial to implement protocols for sample storage and RNA extraction that are tailored to the tissue or cell type under study and preserve RNA integrity as much as possible.
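The covariate adjustment mentioned above can be illustrated with a deliberately simple model. The sketch below assumes that the log-expression of a single transcript depends linearly on the experimental condition and on RIN, and fits that model by ordinary least squares in plain Python. The function names and data are hypothetical, and dedicated differential expression tools typically use count models (for example, negative binomial) rather than least squares on log values.

```python
def solve_linear_system(a, b):
    """Solve a small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_with_rin_covariate(log_expr, condition, rin):
    """Fit log_expr = b0 + b_cond * condition + b_rin * RIN (hypothetical model).

    Returns [intercept, condition_effect, rin_effect] from the normal
    equations X'X b = X'y, where each row of X is [1, condition, RIN].
    """
    design = [[1.0, float(c), float(q)] for c, q in zip(condition, rin)]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, log_expr)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical example: three control (0) and three treated (1) samples.
condition = [0, 0, 0, 1, 1, 1]
rin = [9.0, 7.5, 8.2, 8.8, 7.2, 9.5]
log_expr = [1.0 + 2.0 * c + 0.5 * q for c, q in zip(condition, rin)]
coefs = fit_with_rin_covariate(log_expr, condition, rin)
```

Because RIN enters the model as its own term, the condition coefficient is estimated after accounting for integrity differences, so a degradation effect that happens to differ between groups is less likely to be misread as a condition effect.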
Sequencing platform and library preparation
When selecting a sequencing platform, read length and accuracy are crucial (Fig. 1c and Table 1). LRS methods offer full-length molecule sequencing, but they differ markedly in both read length distributions and read quality9,40. Both platforms can produce highly accurate sequences and capture full-length transcripts, but neither fully spans the range of transcript lengths in the transcriptome. PacBio’s commercial, accessible protocols are generally preferred when read accuracy and a broad length range are priorities39,40,49, whereas ONT’s protocols are usually preferred when the priority is studying RNA modifications simultaneously with full-length transcripts50,51,52.
ONT processes single DNA or RNA molecules in real time as they pass through a nanoscale pore embedded in a polymer membrane and disrupt an ionic current in ways indicative of their molecular sequence, using a motor protein to regulate the passage speed26 (Fig. 1a). PacBio uses fluorescence signals for basecalling but differs from short-read sequencing methods in its continuous DNA synthesis. It uses zero-mode waveguides (ZMWs), small wells with a polymerase at the bottom, to confine the observation volume to a single circular cDNA molecule; novel phospho-linked nucleotides to ensure high accuracy; and multiple-pass sequencing of the circular cDNA molecule to achieve high fidelity. Engineered DNA polymerases synthesize DNA without interruption, producing reads that span thousands of bases53 (Fig. 1a). Owing to these differences in chemistry, PacBio typically produces cDNA reads with higher accuracy than ONT40,54. Generally, ONT protocols can potentially sequence longer transcripts but are biased towards short fragments (Table 1) and have historically struggled with read accuracy26,55. However, read quality is strongly influenced by the library preparation method, which is platform-specific (Fig. 2).
a, Retrotranscription of mRNA into cDNA for sequencing using either Oxford Nanopore Technologies (ONT) or Pacific Biosciences (PacBio).
b, cDNA library preparation for ONT.
c, cDNA library preparation for PacBio, including both Iso-Seq and Kinnex/MAS-seq.
d, R2C2 cDNA library preparation to improve read accuracy in ONT.
e, TEQUILA-seq library preparation to increase sequencing depth of target molecules.
f, Nano3P-seq library preparation to capture non-poly(A) tailed RNAs in ONT.
g, CapTrap-seq library preparation to conserve 5′ ends.
h, Direct RNA sequencing by ONT for simultaneous detection of transcripts and their modifications.
RCA, rolling circle amplification.
Library preparation methods for ONT and PacBio usually begin with cDNA synthesis from poly(A)-selected RNA through an initial reverse transcription (RT) step, template switching and a second synthesis step to obtain double-stranded cDNA (Fig. 2a). After synthesis, ONT ligates a tether to the cDNA, which is then targeted and bound to the nanopore for sequencing26 (Fig. 2b). By contrast, PacBio ligates universal hairpin adapters to the double-stranded cDNA fragments to create single-stranded circular cDNAs (to generate a so-called SMRTbell library) and anneals the sequencing primer to the hairpin adapters for sequencing in ZMWs53 (Fig. 2c). Although PacBio sequencing produces high fidelity transcriptomes (>99% accuracy)53, its lrRNA-seq library preparation protocols, Iso-Seq and Kinnex, include a size selection step that uses magnetic purification beads to generate the SMRTbell templates, which yield a consensus sequence from multiple sequencing passes; this size selection results in a selection bias towards a specific molecule length9 (Table 1). The most recent PacBio protocol, Kinnex47, the commercial brand of the MAS-seq protocol25, concatenates cDNA molecules into larger fragment libraries (Fig. 2c), increasing the number of transcripts read in each sequencing pass, thereby increasing throughput without affecting accuracy25 (Table 1). However, Kinnex/MAS-seq concatenation generates large circularized molecules that must fit into ZMWs. Therefore, it can produce shorter and more truncated transcripts compared with Iso-Seq, the previous standard PacBio library preparation method for RNA sequencing25,47.
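To make the throughput gain from concatenation concrete: a single long Kinnex/MAS-seq read carries several transcripts that must be split apart (deconcatenated) before transcript analysis. The sketch below assumes hypothetical marker sequences between segments and error-free markers; on real data, PacBio's skera tool performs this step using the actual adapter design, and this example is not a reimplementation of it.

```python
import re

# Hypothetical segmentation markers, for illustration only (not real Kinnex adapters).
MARKERS = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]


def deconcatenate(read, markers=MARKERS, min_length=1):
    """Split a concatenated read at any marker and return the transcript segments.

    Assumes markers occur exactly (no sequencing errors) and never occur
    inside transcript sequence; segments shorter than min_length are dropped.
    """
    pattern = "|".join(re.escape(m) for m in markers)
    segments = re.split(pattern, read)
    return [s for s in segments if len(s) >= min_length]


read = "TTT" + "AAAACCCC" + "GATTACA" + "GGGGTTTT" + "CCGG"
segments = deconcatenate(read)
```

Here one sequenced molecule yields three transcript segments, which is the source of the per-run read gain; the same design explains why large concatemers that must fit into ZMWs can yield more truncated segments.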
Non-commercial library preparation protocols have been developed to address specific technology issues or biological questions. The ONT-specific R2C2 method creates a circular cDNA sequence that is amplified before ONT sequencing to mimic PacBio’s multiple-pass strategy and improve read accuracy56 (Fig. 2d). The R2C2 method reaches 94–99% read accuracy, making it 1.08–1.14 times more accurate than sequencing the same samples with the same ONT chemistry and basecaller but without R2C2 circularization. However, this comes at the cost of reduced sequencing depth56,57. TEQUILA-seq and Capture-seq introduce probe or open reading frame capture to increase throughput of target molecules58,59 (Fig. 2e). FLAM-seq tails poly(A)-selected RNA with guanosines and inosines for full poly(A) tail capture, before priming for retrotranscription and PCR amplification60. Nanopore 3′-end-capture sequencing (Nano3P-seq) uses template switching to initiate RT and capture deadenylated molecules and RNAs with other types of tails61 (Fig. 2f). To avoid poly(A) annealing biases enriching 5′ degradation products62, CapTrap-seq modifies the 5′ cap of intact RNA molecules with biotin and uses streptavidin with oligo(dT) priming to detect 5′ capped full-length transcripts38 (Fig. 2g).
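The accuracy gain from R2C2 comes from reading the same insert several times within one rolling-circle product. The sketch below illustrates only that idea under strong simplifying assumptions that real R2C2 pipelines do not make: the repeat unit length is known, and copies contain substitutions but no insertions or deletions, so copies can be compared position by position without alignment. The function names are hypothetical.

```python
from collections import Counter


def split_repeats(read, unit_length):
    """Cut a rolling-circle product into complete copies of the insert (partial copies are dropped)."""
    return [read[i:i + unit_length]
            for i in range(0, len(read) - unit_length + 1, unit_length)]


def majority_consensus(copies):
    """Call each position by majority vote across the aligned copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Hypothetical insert read three times, each copy carrying a different substitution error.
copies = ["ACGTTGCA", "ACCTTGCA", "ACGTTGAA"]
read = "".join(copies) + "ACG"  # trailing partial copy
consensus = majority_consensus(split_repeats(read, 8))
```

Each copy contains an error at a different position, but the majority vote recovers the original insert, which is why more passes generally mean higher consensus accuracy at the cost of fewer distinct molecules per run.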
In addition to cDNA, ONT can directly sequence RNA molecules. Although most lrRNA-seq experiments rely on cDNA libraries owing to their high sequencing throughput and accuracy, RT can introduce errors and mis-priming. These issues are often driven by sequence-specific elements in the primary structure of RNA, which can lead to the generation of erroneous cDNA molecules that fail to accurately represent structural variation63. Sequencing RNA directly can potentially overcome these issues while enabling the simultaneous detection of epitranscriptomic modifications, which are preserved because the RNA does not undergo RT64. Direct RNA library preparation involves adding a primer to the 3′ end of RNA molecules and ligating a sequencing adapter50 (Fig. 2h). Although the sequencing principle is simple (the electrical current measured in the nanopore is altered when a modified base passes through), correct assignment of epitranscriptomic modifications is challenging, as it requires machine-learning models pre-trained on ground-truth data65.
Sequencing depth
The sequencing depth of LRS transcriptomes, measured by the number of reads per sample, has substantially increased in recent years, owing to advances in sequencing platforms, ONT flow cells and PacBio SMRTcells, their chemistries — including the recent RNA004 in ONT and SPRQ in PacBio — as well as more efficient library preparation methods (Table 1). ONT’s PromethION and PacBio’s Revio instruments can generate up to 130 million and 100 million reads per run, respectively47,66,67 (Table 1), which has improved the detection and discovery of novel and lowly expressed transcripts18,25. The use of adaptive sampling — ONT’s method for selectively sequencing transcripts of interest while ejecting non-target transcripts from the pores — has also enabled the enrichment of specific transcripts, improving the detection of known transcripts with low expression levels68.
The required sequencing depth for an lrRNA-seq experiment depends on the biological question and the diversity of the organism’s transcriptome. For example, in Drosophila, increasing depth from 250,000 to 2 million ONT long reads did not strongly affect gene detection, but it increased the discovery of novel transcripts69. This finding was also reproduced in a benchmarking experiment that compared the recall of ground-truth synthetic DNA spike-in controls (sequins) in a full data set versus a down-sampled data set70.
To study known transcripts in human cell lines, brain and heart tissues (that is, in an organism with a more complex transcriptome than Drosophila), PacBio recommends generating 10 million high-quality long reads to detect ~80% of known isoforms; beyond this saturation point, gains in isoform detection diminish47,71. However, transcriptomic benchmarking of the recently developed ONT R10.4 flow cells, whose read accuracy differs from that of PacBio, is still pending.
Because the number of identified transcripts keeps increasing with the number of reads generated, detecting rare or unannotated isoforms and accurately quantifying transcripts require deeper sequencing: >20 million high-quality reads or >40× depth72,73. More complex transcriptomes may require even higher depths, calculated with adjusted formulas such as \(\mathrm{Depth}_{\mathrm{sample}}=\frac{\sum \mathrm{Depth}_{\mathrm{isoform}}}{N_{\mathrm{expressed\ isoforms\ in\ sample}}}\) (refs. 39,40), that is, the summed depth of all isoforms divided by the number of isoforms expressed in the sample. Therefore, independent analyses using saturation curves in organisms with varying transcriptome complexity are needed to evaluate transcriptome coverage exhaustively and to determine optimal library sizes for novel isoform detection, quantification and expression profiling.
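Both ideas above, the sample-level depth formula and saturation curves, can be sketched in a few lines. The example below assumes that each read has already been assigned to a single isoform ID and that an isoform counts as detected when it has at least `min_reads` reads; the function names, thresholds and subsampling fractions are hypothetical choices for illustration, not recommendations.

```python
import random
from collections import Counter


def sample_depth(reads_per_isoform):
    """Depth_sample = sum of isoform depths / number of expressed isoforms (depth > 0)."""
    expressed = [d for d in reads_per_isoform.values() if d > 0]
    return sum(expressed) / len(expressed)


def saturation_curve(read_assignments, fractions=(0.1, 0.25, 0.5, 0.75, 1.0),
                     min_reads=1, seed=0):
    """Return (fraction, detected isoforms) pairs after random read subsampling.

    read_assignments is a list with one isoform ID per read. A curve that
    flattens at high fractions suggests that additional reads would add few
    newly detected isoforms.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    curve = []
    for fraction in fractions:
        subset = rng.sample(read_assignments, int(len(read_assignments) * fraction))
        counts = Counter(subset)
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((fraction, detected))
    return curve


# Hypothetical read-to-isoform assignments.
assignments = ["iso1"] * 50 + ["iso2"] * 30 + ["iso3"] * 20
curve = saturation_curve(assignments)
```

In practice, the subsampling would be repeated several times per fraction and averaged, and the detection threshold matters: a stricter `min_reads` shifts the curve to the right.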
Replication and multiplexing
Increased throughput in LRS technologies has enabled the inclusion of biological replicates in lrRNA-seq experiments. Although recent lrRNA-seq studies report high correlation (>0.6) across replicates11,29,40, replicability tends to be lower for transcripts with medium-to-low expression levels. For instance, the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) benchmark identified numerous lowly expressed transcripts in single samples only, which were therefore not replicated40. Replication follows a bimodal distribution: most transcripts are either present in multiple samples or confined to single samples74, the latter typically novel and lowly expressed RNA molecules. Therefore, replication is necessary both for statistical power and to reduce the inclusion of spurious transcripts in downstream analyses.
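One simple way to act on this bimodal pattern is to keep only transcripts supported in a minimum number of replicates before downstream analysis. The sketch below assumes per-sample read counts per transcript are already available; the function name and both thresholds are hypothetical and should be tuned to the design and depth of each experiment.

```python
from collections import Counter


def replicated_transcripts(counts_by_sample, min_samples=2, min_count=1):
    """Return transcripts with >= min_count reads in at least min_samples samples.

    counts_by_sample maps sample name -> {transcript ID: read count}.
    """
    support = Counter()
    for sample_counts in counts_by_sample.values():
        for transcript, count in sample_counts.items():
            if count >= min_count:
                support[transcript] += 1  # one vote per sample that detects it
    return {t for t, n in support.items() if n >= min_samples}


# Hypothetical counts: T3 is detected in a single sample only.
counts = {
    "s1": {"T1": 5, "T2": 1},
    "s2": {"T1": 3, "T3": 2},
    "s3": {"T1": 4, "T2": 2},
}
kept = replicated_transcripts(counts)
```

Such a filter trades sensitivity for specificity: genuinely sample-specific transcripts are removed along with spurious ones, so it suits analyses where reproducibility matters more than discovery.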
The previously high costs and low throughput of lrRNA-seq experiments required allocating one or more sequencing runs per sample, potentially confounding sequencing run biases with the experimental condition75,76,77,78. The increased yield of current sequencing platforms facilitates sample multiplexing, and ONT and PacBio provide up to 24 and 12 barcodes, respectively, to combine samples in a single run (Fig. 1c). Multiplexing thus enables sequencing strategies that distribute experimental conditions across several runs69. However, this approach should be implemented carefully, including re-quantifying cDNA after library preparation and adjusting the pooled amounts to ensure equimolar concentrations of all samples in the pool.
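Equimolar pooling is a short calculation once each library's concentration and mean fragment length have been measured. The sketch below uses the common approximation of ~660 g/mol per base pair of double-stranded DNA and the identity 1 nM = 1 fmol/µl; the function names, sample values and target amount are hypothetical, and loading amounts should come from the relevant vendor protocol.

```python
def molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Approximate dsDNA molarity (nM), assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660 * mean_length_bp)


def pooling_volumes(samples, fmol_per_sample):
    """Volume (ul) of each library that contributes the same fmol to the pool.

    samples maps sample name -> (concentration in ng/ul, mean fragment length in bp).
    Uses 1 nM = 1 fmol/ul, so volume = fmol / nM.
    """
    return {name: fmol_per_sample / molarity_nM(conc, length)
            for name, (conc, length) in samples.items()}


# Hypothetical libraries with different concentrations and fragment lengths.
libraries = {"ctrl_1": (6.6, 2000), "treat_1": (3.3, 1000)}
volumes = pooling_volumes(libraries, fmol_per_sample=50)
```

Note that the two libraries need the same volume here despite a twofold difference in mass concentration, because the longer fragments carry proportionally more mass per molecule; pooling by mass alone would under-represent libraries with shorter fragments.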
Ground-truth spike-ins
To assess library preparation and sequencing experiment quality, synthetic spike-in RNA variants (SIRVs)79, external RNA controls developed by the External RNA Controls Consortium (ERCC)80 and sequins81 are used. These preparations vary in RNA length and the presence of alternative isoforms. SIRVs range between ~200 bp and 12 kb and contain alternative transcripts at different concentrations, whereas sequins span from 280 bp to 7,000 bp. Spike-ins provide valuable information about captured transcript length distribution and expected concentrations, but strategies to correct detection and quantification biases have yet to be proposed.
Core data processing of lrRNA-seq data
Core data processing of lrRNA-seq involves raw signal acquisition, basecalling, transcript identification and quantification and quality checks at both long-read and reconstructed-transcript levels (Fig. 1c and Table 2).
Long-read basecalling
Basecalling converts continuous signals (current fluctuations for ONT or light pulses for PacBio) into nucleotide sequences. Advances in software and hardware have improved basecalling accuracy and speed for both technologies. PacBio’s DeepConsensus, a six-layer gap-aware transformer, corrects insertion/deletion errors and achieves 99.90% (Q30) basecall accuracy, calculated as mean per-base accuracy82, whereas ONT’s Dorado achieves 99.26% (Q20) basecall accuracy, calculated as median per-read accuracy83. GPU acceleration shortens processing times, and both technologies support NVIDIA A100 GPUs as compatible hardware. Using this hardware, PacBio reduced circular consensus sequencing times from 15 h to 2.5 h, according to NVIDIA. Further improvements in accuracy and speed are expected as the technologies evolve.
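Q values are Phred-scaled error probabilities, \(Q = -10\log_{10} P_{\mathrm{error}}\), so Q20 corresponds to 99% and Q30 to 99.9% accuracy. The sketch below shows these conversions and why the summary statistic matters when comparing accuracy figures such as those above: per-base Q scores should be averaged as error probabilities rather than as Q values, and a median per-read accuracy is not the same quantity as a mean per-base accuracy. The function names and example qualities are hypothetical.

```python
import math
import statistics


def phred_to_error(q):
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def read_error(base_qualities):
    """Mean per-base error probability of one read (averaging P, not Q)."""
    return statistics.fmean(phred_to_error(q) for q in base_qualities)


def read_quality(base_qualities):
    """Read-level Q derived from the mean per-base error probability."""
    return error_to_phred(read_error(base_qualities))


def median_read_accuracy(reads):
    """Median per-read accuracy; each read is a list of per-base Q scores."""
    return statistics.median(1 - read_error(r) for r in reads)
```

For example, a read with per-base qualities of 10 and 30 has a naive mean Q of 20 but a read-level Q of about 13, because the low-quality base dominates the expected number of errors.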
Mapping long reads to reference genomes
Mapping long reads or transcripts to a reference genome or transcriptome is challenging because reads span splice junctions and contain sequencing errors, which requires mappers capable of handling split alignments. Minimap2, the recommended and most widely used mapper for long and noisy sequences, uses an efficient seed-and-chain framework84: it extracts k-mers from the reference genome and indexes them to identify exact matches (anchors) with query sequences. However, Minimap2 has limitations when aligning small exons and handling complex split alignments across variable splice junctions85.
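The seed-and-chain idea can be sketched in plain Python. This is a simplified illustration, not minimap2's algorithm: minimap2 indexes minimizers rather than every k-mer, scores chains with gap-aware costs and performs a dedicated spliced-alignment step, whereas here every k-mer is indexed and the chain is simply the longest set of anchors that increase in both query and reference coordinates. All names and sequences are hypothetical.

```python
from collections import defaultdict


def index_kmers(reference, k):
    """Map every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_anchors(query, index, k):
    """Exact k-mer matches between query and reference as (query_pos, ref_pos) pairs."""
    return [(q, r) for q in range(len(query) - k + 1)
            for r in index.get(query[q:q + k], [])]


def best_chain(anchors):
    """O(n^2) dynamic programming for the longest collinear chain of anchors."""
    anchors = sorted(anchors)
    if not anchors:
        return []
    best = [1] * len(anchors)   # chain length ending at each anchor
    prev = [-1] * len(anchors)  # predecessor index for backtracking
    for i, (qi, ri) in enumerate(anchors):
        for j in range(i):
            qj, rj = anchors[j]
            if qj < qi and rj < ri and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    i = max(range(len(anchors)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(anchors[i])
        i = prev[i]
    return chain[::-1]


# Hypothetical two-exon gene; the read is the spliced transcript.
exon1, intron, exon2 = "ATGCCGTAAC", "GT" + "T" * 11 + "AG", "CTGACCGATG"
reference = exon1 + intron + exon2
query = exon1 + exon2
chain = best_chain(find_anchors(query, index_kmers(reference, 4), 4))
```

Along the chain, a jump in reference coordinates that is much larger than the corresponding jump in query coordinates marks a candidate intron; in this example the jump equals the 15-bp intron length. A real spliced aligner would then refine the exact junction position using splice-site signals.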
To address these specific issues in studies focused on small exons or complex split alignments, specialized aligners have been developed: uLTRA improves splice alignment accuracy, especially for small exons and complex junctions, by using a two-pass collinear chaining algorithm that leverages exon annotations for precise boundary detection85; Graphmap2 optimizes anchor selection and refines alignments at exon boundaries86; deSALT uses a de Bruijn graph-based approach with a two-pass alignment to generate spliced reference sequences and produce refined alignments87; 2passtools and Splam incorporate machine learning to filter spurious splice junctions88,89; and Magic-BLAST chains collinear local alignments to maximize scores and maintain splice signal consistency, mapping reads to genome and transcriptome without a two-pass process90.
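Filtering spurious splice junctions can be illustrated with a simple rule-based stand-in. The sketch below keeps a junction only if its intron carries a canonical or common non-canonical splice motif (GT–AG, GC–AG or AT–AC) and is supported by a minimum number of reads. It is not the machine-learning approach of 2passtools or Splam; the function names and read-support threshold are hypothetical, and only plus-strand introns are handled.

```python
from collections import Counter

# Donor/acceptor dinucleotide pairs accepted by this sketch (plus strand only).
CANONICAL_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def intron_motif(genome, start, end):
    """Donor and acceptor dinucleotides of an intron spanning genome[start:end]."""
    return genome[start:start + 2], genome[end - 2:end]


def filter_junctions(junctions, genome, min_reads=2):
    """Keep junctions with a recognized motif and at least min_reads supporting reads.

    junctions is a list of (start, end) intron coordinates, one entry per
    supporting read; returns {junction: read support} for kept junctions.
    """
    support = Counter(junctions)
    kept = {}
    for (start, end), n in support.items():
        if intron_motif(genome, start, end) in CANONICAL_MOTIFS and n >= min_reads:
            kept[(start, end)] = n
    return kept


# Hypothetical genome with one GT-AG intron [3, 11) and one CT-TA intron [14, 22).
genome = "AAA" + "GTCCCCAG" + "TTT" + "CTTTTTTA" + "AAA"
observed = [(3, 11)] * 3 + [(14, 22)] * 2
kept = filter_junctions(observed, genome)
```

The non-canonical junction is discarded despite having two supporting reads, reflecting the assumption that motif-less junctions are more often alignment artefacts from noisy reads.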
Transcript identification and quantification
Transcript identification and quantification are the primary goals of lrRNA-seq, which can be achieved through two main strategies: inferring transcript models from the long reads, with or without the support of a reference, and quantifying them based on the model; or quantifying transcripts based on existing genome annotations by mapping or pseudo-aligning reads to a reference transcriptome34. Transcript model quantification enables novel transcript discovery but is usually computationally intensive, whereas quantification based on a reference transcriptome by definition does not include novel transcript identification but is computationally efficient. The approach to choose depends on the goal of the study (Box 1), but it should be noted that lrRNA-seq is characterized by the detection of a large number of novel transcripts (albeit of low expression)40, implying that reference annotations are expected to be, in general, incomplete, which can affect read assignment to reference transcripts91.
Algorithms for transcript identification followed by quantification can be classified into cluster-based, graph-based and class-based methods (Fig. 3a), further distinguished by their dependence on a reference genome or annotation into reference annotation-free and fully reference-free methods. Cluster-based methods such as FLAIR92 and Mandalorion31 align reads to a reference genome, grouping them based on shared splice junctions or chains. Graph-based approaches, such as IsoQuant32, IsoTools37, Isosceles93 and StringTie294, construct splice graphs that represent the splicing landscape of the transcriptome. IsoQuant emphasizes alignment correction and de novo transcript discovery; IsoTools focuses on alternative splicing analysis; Isosceles is tailored for single-cell isoform discovery; and StringTie2 can use short reads to handle errors. Class-based methods, such as ESPRESSO95, TALON96, FLAMES97 and Bambu33, classify reads by splice junction patterns to identify known, novel or uncertain isoforms. ESPRESSO corrects splice junctions before classification; TALON validates novel transcripts across replicates; FLAMES analyses incongruent reads for novel splice sites; and Bambu uses machine learning to predict and rank novel transcripts while controlling false positives.
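The core of the cluster-based strategy can be sketched as grouping aligned reads by their intron chain, that is, the ordered list of introns implied by their aligned blocks. Real tools such as FLAIR and Mandalorion additionally correct splice sites, resolve transcript ends and filter clusters by read support; the function names and coordinates below are hypothetical.

```python
from collections import defaultdict


def intron_chain(exons):
    """Convert sorted (start, end) aligned blocks into a tuple of (intron_start, intron_end)."""
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def cluster_by_intron_chain(reads):
    """Group read IDs that share an identical intron chain.

    reads maps read ID -> list of (start, end) aligned blocks; each key of
    the result is one candidate transcript model.
    """
    clusters = defaultdict(list)
    for read_id, exons in reads.items():
        clusters[intron_chain(sorted(exons))].append(read_id)
    return dict(clusters)


# Hypothetical reads: r1 and r2 differ only in their ends; r3 skips the middle exon.
reads = {
    "r1": [(100, 200), (300, 400), (500, 600)],
    "r2": [(120, 200), (300, 400), (500, 580)],
    "r3": [(100, 200), (500, 600)],
}
clusters = cluster_by_intron_chain(reads)
```

Grouping on introns rather than exact exon coordinates makes r1 and r2, which differ only in their 5′ and 3′ ends, collapse into one model; this is also why end heterogeneity (for example, from degraded RNA) needs separate handling.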
a, Reference-based inference. This method creates an experiment-defined transcriptome by mapping reads to a reference genome. Transcript identification can apply cluster-based, graph-based or class-based algorithms and may or may not use an existing reference annotation to guide analysis. The approach allows the discovery of novel transcripts and results in a long-read-defined transcriptome that is quantified for secondary analysis.
b, Reference-free inference. This approach relies entirely on long-read RNA sequencing data for transcript reconstruction and quantification and is suitable for species without reference information. A mapping step may be used to join transcripts as alternative isoforms if a genome sequence is available.
c, Annotation-driven. In this case, the reference annotation or transcriptome is used for direct transcript quantification by (pseudo)mapping, resulting in a simpler and faster process but missing any novel isoforms. EM, expectation-maximization algorithm for isoform quantification.
Annotation-free methods, such as Freddie98 and LyRic, detect and quantify transcript isoforms without existing annotations (Fig. 3a). Freddie optimizes clustering and error correction for isoform reconstruction, whereas LyRic returns fewer but high-confidence transcript models supported by short reads40. Fully reference-free methods are valuable when both reference genomes and annotations are unavailable (Fig. 3b); they use clustering techniques to group similar reads, detect alternative splicing events and refine transcript models: IsoSplitter validates splice junctions with short reads99; RNA-Bloom2 (ref. 100) and rnaSPAdes101 use de Bruijn graphs for assembly; IsoSeq3 (ref. 102) refines PacBio HiFi reads to call transcripts that are then mapped to the genome to identify alternative isoforms of the same gene; RATTLE uses k-mer-based error correction for ONT data103; and isONform uses a directed acyclic graph to preserve actual splicing variations104.
Direct quantification tools face challenges in uniquely assigning reads to specific isoforms owing to sequencing errors, length variability and RNA transcript complexity. To address these limitations, they rely on reference annotations and the expectation-maximization algorithm, an approach to perform maximum likelihood estimation with latent variables, to iteratively refine isoform assignments based on abundances (Fig. 3c). For example, Oarfish uses coverage-based probabilistic modelling to adjust fragment assignment probabilities, addressing uneven coverage in long-read data sets105. Similarly, lr-kallisto adapts the pseudo-alignment strategy to accommodate longer reads and higher error rates, focusing on reducing the complexity of transcript compatibility graphs for speed and resource efficiency34. LIQA introduces a bias correction model to handle truncated reads and 3′-end biases by fitting them into a Kaplan–Meier estimator, enhancing isoform quantification accuracy35. NanoCount uses stringent alignment scores and 3′-end position filtering to improve read assignment reliability for ONT direct RNA sequencing45.
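The expectation-maximization step shared by these tools can be sketched in its most basic form. The example assumes that each read is summarized by the set of isoforms it is compatible with and that, apart from abundance, every compatible isoform is equally likely to have produced the read; it deliberately omits the length, coverage and truncation models that distinguish Oarfish, lr-kallisto and LIQA. The function name and data are hypothetical.

```python
def em_abundances(compat_sets, n_iter=100):
    """Estimate isoform read fractions from read-isoform compatibility sets by EM."""
    isoforms = sorted(set().union(*compat_sets))
    theta = {t: 1.0 / len(isoforms) for t in isoforms}  # uniform starting abundances
    for _ in range(n_iter):
        expected = dict.fromkeys(isoforms, 0.0)
        # E-step: split each read across its compatible isoforms in proportion to theta.
        for compat in compat_sets:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: new abundances are the expected fractions of reads per isoform.
        theta = {t: expected[t] / len(compat_sets) for t in isoforms}
    return theta


# Hypothetical reads: 6 unique to A, 2 unique to B, 4 compatible with both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
abundances = em_abundances(reads)
```

The ambiguous reads end up distributed in the same 3:1 ratio as the uniquely assigned ones, giving A = 0.75 and B = 0.25, rather than being split evenly or discarded.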
Quality control
Despite improvements in ONT and PacBio sequencing platforms, lrRNA-seq data quality is still influenced by library preparation, instrument performance and data processing choices. Comprehensive quality control (QC) is essential to assess potential biases in the sequencing output and reliability of the inferred transcript models.
At the read level, tools such as LongQC106, MinKNOW, SQANTI-reads69 and pycoQC107 obtain QC metrics including read counts, length distribution, basecalling quality scores, adapter contamination, GC content and coverage. Furthermore, at the read alignment level, NanoPack2 offers modules for read-level QC and for assessing the quality of reads mapped to the reference genome108. MultiQC provides an overview of these read-level and alignment-level metrics109, whereas SQANTI-reads integrates assessments of transcriptome QC across multiple samples to detect outliers and estimate discovery power using the SQANTI3 framework. SQANTI3 is a widely adopted tool for the QC of LRS-based reconstructed transcriptomes, as it defines structural categories of transcript novelty and integrates complementary data (short reads, cap analysis of gene expression peaks, poly(A) motifs and poly(A) sites) to assess confidence and curate transcripts46.
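Several of the read-level metrics listed above are simple to compute directly. The sketch below derives read count, mean length, N50 and GC fraction from a list of sequences, of the kind these QC tools summarize; it is not the implementation of any of them, and the function names and sequences are hypothetical. Reading FASTQ files and handling per-base qualities are left out.

```python
def n50(lengths):
    """Length L such that reads of length >= L together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def read_qc(sequences):
    """Basic QC summary for a non-empty list of read sequences."""
    lengths = [len(s) for s in sequences]
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return {
        "reads": len(sequences),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "gc_fraction": gc / sum(lengths),
    }


# Hypothetical reads.
summary = read_qc(["GC", "ATG", "GGCC", "ATATA", "GCGCGC"])
```

N50 weights reads by their bases, so it is more sensitive than the mean to the long-read tail that matters for full-length transcript recovery.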
If synthetic spike-ins have been included in the library preparation, they can be used to evaluate data quality and quantification in lrRNA-seq. ERCC controls mimic mono-exonic RNAs of varying lengths and concentrations, facilitating assessment of RNA length biases and gene-level quantification accuracy. Additionally, SIRVs and sequins mimic alternatively spliced transcripts and are helpful for evaluating the identification of splice sites, alternative splicing and transcript-level quantification. Furthermore, SIRVs are provided in different mixes with varying relative concentrations, offering a reference for quantification assessment40.
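When spike-ins with known input amounts are available, quantification accuracy is commonly summarized as the agreement between expected and observed abundances on a log scale. The sketch below computes a Pearson correlation on log2 values and a Spearman rank correlation. It assumes that per-spike-in read counts have already been obtained and that the expected amounts come from the manufacturer's concentration table. The spike-in names and values in the example are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spike_in_agreement(expected, observed, pseudocount=1.0):
    """expected: spike-in -> known input amount (> 0); observed: spike-in -> read count."""
    names = sorted(set(expected) & set(observed))
    x = [math.log2(expected[n]) for n in names]
    y = [math.log2(observed[n] + pseudocount) for n in names]
    return {"pearson_log2": pearson(x, y), "spearman": pearson(ranks(x), ranks(y))}


expected = {"spk1": 1.0, "spk2": 4.0, "spk3": 16.0, "spk4": 64.0}
observed = {"spk1": 9, "spk2": 39, "spk3": 159, "spk4": 639}
metrics = spike_in_agreement(expected, observed)
```

In this toy example, observed counts plus the pseudocount are exactly proportional to the input amounts, so both correlations are (numerically) 1. With real data, the slope of the log-log fit and the spread around it are also informative.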
To assess how completely the transcriptome has been captured in the experiment, benchmarking tools such as BUSCO compare a genome annotation file or transcript set against a set of conserved single-copy orthologues, providing a quantitative measure of completeness110. However, BUSCO was initially designed to evaluate genome assemblies and focuses on single-copy genes. This limits its ability to assess alternative isoforms, which are often reported as duplicated genes.
Finally, algorithms that simulate ONT and PacBio lrRNA-seq data111,112 or novel transcripts113 are valuable for evaluating whether sample-specific data processing and analysis strategies are adequate. They facilitate benchmarking of alignment and quantification tools and of the sensitivity of transcript identification, among other metrics. However, simulators may not reproduce all sources of error and bias, such as RNA degradation, library preparation artefacts and limited capture. Therefore, new methods are needed for comprehensive ground-truth assessment of lrRNA-seq that account for these biases and for transcriptome complexity.
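With simulated data, the ground truth is known, so transcript identification can be scored directly. This sketch assumes a common convention: a predicted multi-exon transcript counts as correct when its intron chain exactly matches a true transcript on the same chromosome and strand. Small differences at the TSS and TTS are ignored, and mono-exonic transcripts are skipped for simplicity. The input layout is hypothetical.

```python
def intron_chain(exons):
    """Tuple of (intron start, intron end) pairs from (start, end) exon coordinates."""
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def benchmark(predicted, truth):
    """Score predicted against true transcripts by exact intron-chain match.

    predicted, truth: dict transcript_id -> (chrom, strand, exon list).
    Mono-exonic transcripts are ignored.
    """
    def keys(transcripts):
        return {(chrom, strand, intron_chain(exons))
                for chrom, strand, exons in transcripts.values() if len(exons) > 1}

    pred_keys, true_keys = keys(predicted), keys(truth)
    tp = len(pred_keys & true_keys)
    precision = tp / len(pred_keys) if pred_keys else 0.0
    recall = tp / len(true_keys) if true_keys else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}
```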
Downstream computational analysis
The downstream analysis of lrRNA-seq data includes approaches to obtain biological insights from the data. The most frequent applications are differential expression analysis, genome annotation and RNA modification discovery (Fig. 1c and Table 2).
Differential isoform expression and splicing
Recent increases in sequencing throughput have made lrRNA-seq suitable for differential gene and isoform expression analyses and uniquely position LRS technologies for differential isoform usage studies (Fig. 4a). A recent benchmark70 evaluated the suitability of srRNA-seq tools using synthetic lrRNA-seq data sets. Although all tools effectively controlled for false discoveries, DESeq2 (ref. 114), edgeR115 and limma-voom116,117 outperformed other tools for differential isoform expression using LRS. However, LRS presents unique challenges, including transcript selection biases, sequencing error rates and mapping accuracy. New tools have been proposed to address these challenges. DELongSeq detects differential isoform expression with a random-effects regression model that uses isoform expression estimates and their uncertainty, rather than raw read counts36. TAGET uses an exact two-sided binomial test on the observed transcript counts to conduct differential isoform expression analysis118.
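An exact binomial test of the kind used by TAGET can be sketched as follows. The test setup is our assumption, not TAGET's documented formulation. For one transcript, the reads observed in condition 1, out of the reads observed in both conditions, are compared with the proportion expected from library sizes alone. The two-sided p-value sums the probabilities of all outcomes no more likely than the observed one, which is the convention used by R's `binom.test`. The helper `isoform_count_test` is hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """Binomial probability computed in log space to avoid overflow for large n."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test (requires 0 < p < 1)."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that outcomes tied with the observed one
    # are not dropped because of floating-point rounding.
    tail = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
               if q <= observed * (1 + 1e-7))
    return min(1.0, tail)


def isoform_count_test(count1, count2, libsize1, libsize2):
    """Test whether a transcript's read split between two conditions departs
    from the split expected from library sizes alone (hypothetical helper)."""
    n = count1 + count2
    p0 = libsize1 / (libsize1 + libsize2)
    return binom_test_two_sided(count1, n, p0)


p_value = isoform_count_test(30, 10, 1_000_000, 1_000_000)
```

In practice, such per-transcript p-values would then be corrected for multiple testing across all transcripts.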
a, Differential expression and splicing: long-read RNA-seq can quantify gene-level expression and detect differential isoform usage, revealing complex alternative splicing events (for example, exon skipping or intron retention). b, Genome annotation: full-length reads refine annotations by accurately defining exons, introns, transcription start and end sites and novel isoforms, supporting both reference improvements and de novo genome assemblies. c, Functional analysis: isoform-resolved functional profiling links splicing changes to protein domains, RNA motifs and regulatory pathways, illuminating the biological impact of alternative splicing. d, Transcript visualization: specialized tools (for example, genome browsers and sashimi plots) highlight splicing patterns, read alignments and quality metrics at the gene or transcript level. e, Single-cell long-reads: single-cell protocols adapted for long-read sequencing capture isoform-level heterogeneity in individual cells, exposing cell-type-specific expression and splicing. f, Spatial long-reads: spatial transcriptomics with long-read sequencing preserves tissue context while capturing full-length transcripts, enabling isoform-level analysis of distinct anatomical regions. g, Nascent RNA long-read transcriptomics: metabolic labelling of newly synthesized RNA, followed by long-read sequencing, uncovers real-time transcription kinetics and co-transcriptional splicing events. Chr, chromosome; DIU, differential isoform usage; FC, fold change; FSM, full-splice match; ISM, incomplete splice match; NIC, novel in catalogue; NNC, novel not in catalogue.
Differential splicing analysis can detect changes in splicing patterns that contribute to transcript diversity and potentially affect gene function (Fig. 4a). Tools such as IsoTools, tappAS, HBA-DEALS and ScisorseqR can perform both differential isoform expression and splicing analysis37,119,120. IsoTools uses the method by Joglekar et al.29. It applies a χ2 test on an isoform quantification dataframe to identify genes with differential isoform expression between samples, and a likelihood ratio test for differential splicing37. tappAS uses DEXSeq121 for differential splicing and introduced the ‘total isoform usage change’, a measure of how much isoform expression is redistributed between conditions. Comparing this measure with the gene fold change indicates the relative contributions of gene-level expression changes and splicing regulation. HBA-DEALS models differential expression and splicing simultaneously by using hierarchical Bayesian analysis120, whereas ScisorseqR analyses differential poly(A), TSS and exon usage using a χ2 test on a feature quantification dataframe10 (Table 2).
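The χ2 test that IsoTools applies to an isoform quantification table can be illustrated for a single gene and two samples. The gene's isoforms form the rows of a contingency table and the samples form the columns. A significant statistic indicates that the relative usage of the isoforms differs between samples. The sketch below is a generic Pearson χ2 test of independence, not IsoTools' own code. The p-value uses the closed-form recurrence for the χ2 upper tail with integer degrees of freedom, so no statistics library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    # Raise the shape parameter in unit steps up to df / 2 using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        if y > 0:
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return min(1.0, q)


def isoform_chi2(counts_a, counts_b):
    """Pearson chi-square test of isoform usage for one gene in two samples.

    counts_a, counts_b: dict isoform -> read count. Returns (statistic, df, p).
    """
    isoforms = sorted(set(counts_a) | set(counts_b))
    table = [[counts_a.get(t, 0), counts_b.get(t, 0)] for t in isoforms]
    table = [row for row in table if sum(row) > 0]      # drop isoforms with no reads
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    if len(table) < 2 or 0 in col_totals:
        return 0.0, 0, 1.0                               # nothing to test
    grand = sum(col_totals)
    stat = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / grand
            stat += (row[j] - expected) ** 2 / expected
    df = len(table) - 1                                  # (rows - 1) * (2 - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = isoform_chi2({"T1": 50, "T2": 50}, {"T1": 90, "T2": 10})
```

Here, isoform T1 accounts for half of the gene's reads in one sample and 90% in the other. This yields a large statistic on one degree of freedom. As with any χ2 test, expected counts that are very small make the approximation unreliable.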
Considerable progress has been made both in evaluating short-read methods on lrRNA-seq data and in developing new methods for differential isoform expression and splicing analysis. Even so, technical and biological biases that affect transcript quantification can markedly influence analysis outcomes. Until these ambiguities are resolved, we recommend performing the quantification and differential expression/splicing steps with more than one tool and using the intersection of their results.
Genome annotation
lrRNA-seq provides full-length transcript information and is therefore a valuable source of data for genome annotation efforts. SQANTI3 provides a primary annotation framework that compares transcripts with reference annotations and classifies them into the following structural categories46 (Fig. 4b):
- full-splice match: matching all splice junctions of a reference transcript;
- incomplete-splice match: matching consecutive, but not all, junctions of a reference transcript, for example, fragments or transcripts with alternative TSS or TTS;
- novel in catalogue: new combinations of annotated splice sites, that is, alternative transcripts of current annotations;
- novel not in catalogue: transcripts with at least one novel splice site;
- other categories.

lrRNA-seq is currently used by the two main reference human gene and transcript annotation resources, NCBI’s RefSeq122,123 and EMBL-EBI’s Ensembl/GENCODE124,125, to complete reference genome annotations, including condition-specific rare transcripts74. It is also used by multiple international initiatives that aim to sequence all species on Earth (for example, the Earth BioGenome Project126, the Vertebrate Genomes Project127 and the Darwin Tree of Life128). Genome annotation approaches using lrRNA-seq data are evidence-based or evidence-driven.
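The logic behind the main SQANTI3 structural categories can be illustrated by comparing intron chains. This simplified Python sketch assumes that the query and reference transcripts belong to the same gene and strand. It ignores mono-exonic transcripts, intron retention (which SQANTI3 reports as a NIC subcategory), and fusion, antisense, genic and intergenic categories. It also omits all of SQANTI3's additional QC attributes and is not SQANTI3's implementation.

```python
def introns(exons):
    """Intron chain of a transcript given as (start, end) exon coordinates."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def is_contiguous_subchain(query, ref):
    """True if `query` appears as consecutive introns inside `ref`."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def classify(query_exons, reference):
    """Simplified SQANTI-like category of a multi-exon query transcript.

    reference: dict transcript_id -> exon list, all on the same gene and strand.
    """
    q = introns(query_exons)
    if not q:
        return "mono-exon (not handled here)"
    ref_chains = [introns(exons) for exons in reference.values()]
    if any(q == r for r in ref_chains):
        return "FSM"  # every splice junction matches one reference transcript
    if any(len(r) > len(q) and is_contiguous_subchain(q, r) for r in ref_chains):
        return "ISM"  # consecutive, but not all, junctions match
    # Left and right intron boundaries; which one is the donor depends on strand,
    # but comparing left-with-left and right-with-right makes that irrelevant here.
    left_sites = {s for r in ref_chains for s, _ in r}
    right_sites = {e for r in ref_chains for _, e in r}
    if all(s in left_sites and e in right_sites for s, e in q):
        return "NIC"  # novel combination of annotated splice sites
    return "NNC"      # at least one novel splice site
```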
Evidence-based methods identify genes by mapping experimental data, such as RNA-seq reads or protein sequences, to the genome. Because lrRNA-seq captures full-length transcripts, it is the preferred choice for high-confidence genome annotation. This approach has improved annotations for model organisms, such as Arabidopsis thaliana19, and for less well-characterized species, such as the horse (Equus ferus caballus)129, the tea plant (Camellia sinensis)130 and the Indian jumping ant (Harpegnathos saltator)131.
Evidence-driven strategies combine experimental data with computational models, such as support vector machines or hidden Markov models, that predict genes from genomic sequence132,133. Tools including Augustus134,135, SNAP136, MAKER137 and BRAKER138 can leverage lrRNA-seq data for genome annotation. A recent strategy used to annotate the Florida manatee (Trichechus manatus latirostris) genome first collapsed reads into transcript models and then applied evidence-driven methods. This reduced gene redundancy and defined TSS, exon and TTS boundaries more accurately139.
Gene fusion discovery
Gene fusions, which result from cis-splicing, trans-splicing or genomic rearrangements, are important cancer drivers and diagnostic markers140,141,142. lrRNA-seq detects transcripts that span two genes and enables sequencing of full-length fusion transcripts. Gene fusion discovery tools (Table 2) use splice-aware aligners to map reads to reference genomes or transcriptomes and identify reads spanning multiple genes. SQANTI3 identifies fusion transcripts as those mapping to two genes after Minimap2 alignment46. LongGF143 and JAFFAL144 determine fusion breakpoints from these alignments and assign a confidence score based on the number of supporting reads. Genion145 and FusionSeeker12 refine fusion detection by clustering reads that suggest potential fusions. FUGAREC146 and CTAT-LR-fusion147 validate fusion positions by re-aligning reads to synthetic gap sequences or collinear gene contigs. Finally, FLAIR-fusion148 identifies isoform-level fusions by annotating the transcriptome after performing the genomic alignment and refining the isoforms using FLAIR92. In general, the ability of lrRNA-seq to sequence full-length transcripts greatly increases the confidence in gene fusion discovery.
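These tools share a first step: flagging reads whose alignments fall in two different genes and counting the supporting reads. That step can be sketched as follows. The input format is an assumption: aligned segments per read (for example, primary plus supplementary alignments reported by Minimap2), already converted to coordinates. The sketch does no breakpoint refinement, read-through filtering or re-alignment, which is where the tools above differ. Gene names and coordinates in the test are made up.

```python
from collections import Counter


def genes_hit(chrom, start, end, genes):
    """Names of genes whose interval overlaps the half-open segment [start, end)."""
    return {name for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end}


def fusion_candidates(read_segments, genes, min_support=2):
    """read_segments: dict read_id -> list of (chrom, start, end) aligned segments.
    genes: dict gene name -> (chrom, start, end).
    Returns {(gene_a, gene_b): number of supporting reads} for well-supported pairs.
    """
    support = Counter()
    for segments in read_segments.values():
        hit = set()
        for chrom, start, end in segments:
            hit |= genes_hit(chrom, start, end, genes)
        if len(hit) == 2:  # keep reads spanning exactly two genes
            support[tuple(sorted(hit))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}
```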
Functional annotation
Functional annotation assigns biological meaning to sequences to understand the impact of gene expression regulation. It involves identifying protein-coding transcripts, predicting functional domains and annotating them using orthologue information from databases such as SwissProt149, Pfam150 and InterPro151. Gene Ontology terms and biological pathways can also be mapped152,153,154. Databases such as Rfam155 and miRBase156 are used to annotate RNA families. These resources support similarity-based annotation. However, such gene-centric approaches do not account for the functional consequences of differential splicing.
Databases such as APPRIS157, FunctionAnnotator158 and ISOGO159 provide functional annotations with isoform resolution. However, these annotations are often based on protein-level features, overlooking post-transcriptional regulatory mechanisms. To address these limitations, IsoAnnot119 and IsoAnnotLite46 use long-read-defined transcriptomes to build an isoform-resolved database of functional annotations at both RNA and protein levels, enabling the annotation of novel isoforms. Despite advances in functional annotation, profiling non-coding transcripts such as long non-coding RNAs remains challenging owing to their lower conservation across species.
Functional profiling
Transcriptomic studies typically aim to characterize the molecular functions and pathways of differentially expressed genes (Fig. 4c). srRNA-seq achieves this goal by identifying over-represented functions among differentially expressed genes160. However, such gene-level analyses overlook alternative splicing, which has a substantial impact on tissue identity and cell development. Several tools address the functional consequences of alternative splicing. Exon Ontology and DIGGER assess the impact of exon skipping events on protein structure and function161,162, and IRFinder-S leverages lrRNA-seq to identify intron retention events163. Other tools focus on transcript-level analysis. For example, IsoformSwitchAnalyzeR statistically tests for genome-wide changes in alternative splicing patterns and predicts the consequences of isoform switches164. tappAS uses isoform-resolved annotation of coding and non-coding functional domains, motifs and sites to evaluate the functional impact of transcript variants and of post-transcriptional regulation on isoform expression. Moreover, tappAS introduces ‘differential feature inclusion analysis’, which assesses the inclusion or exclusion of a wide range of functional motifs resulting from alternative splicing119. DoChaP and NEASE use exon–protein domain associations and conservation to explore the structural effects of alternative splicing on protein domains165,166. Regardless of the approach, accurate functional profiling critically depends on well-annotated isoforms.
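Over-representation analysis of this kind is typically a one-sided hypergeometric (Fisher's exact) test. It asks whether a functional category is enriched among the differentially expressed features relative to a background. The sketch below works equally with gene or isoform identifiers. In practice, the background should be the set of features actually tested, and p-values should be corrected for the number of categories examined, which this sketch does not do.

```python
from math import comb


def hypergeom_enrichment(de_features, category, universe):
    """One-sided over-representation test of `category` among `de_features`.

    Returns (overlap, p) where p = P(X >= overlap) under the hypergeometric
    distribution defined by the universe (background) of tested features.
    """
    universe = set(universe)
    de = set(de_features) & universe
    cat = set(category) & universe
    N, K, n = len(universe), len(cat), len(de)
    k = len(de & cat)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, upper / comb(N, n)
```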
Allele-specific expression
Differences in the expression of allelic variants can contribute to phenotypic variation18,167. For many years, linear or single reference genomes complicated the study of how common and rare genetic variants influence regulatory effects on the transcriptome, introducing mapping biases because reads could not be accurately aligned to the specific haplotypes from which they originated. However, with the growing capabilities of LRS, efforts have intensified to develop pan-genome and pan-transcriptome references74,168,169,170.
By capturing full-length transcripts, lrRNA-seq improves the mapping of heterozygous genetic variants and enables the detection of allele-specific expression to analyse allelic imbalance and alternative splicing171 (Fig. 1b). LORALS and RPVG use pan-genomes as mapping references to phase haplotypes18,172. LORALS statistically tests whether the observed expression is higher than expected noise levels18. RPVG instead infers the most probable underlying haplotype pairs and estimates allele-specific expression using an expectation-maximization approach172. IsoLaser and FLAIR2 analyse allele-specific alternative splicing by generating phased haplotypes from variant calling within lrRNA-seq reads20,173. IsoLaser tests for linkage between the phased haplotypes and alternatively spliced exon segments173, whereas FLAIR2 aims to detect haplotype-specific isoforms and generates transcript models with accurate TSS and TTS20. Together, these methods exemplify how leveraging haplotype-aware references and lrRNA-seq can uncover allele-specific expression and splicing dynamics.
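A basic ingredient of long-read allele-specific analyses is assigning each read to a haplotype using the phased heterozygous variants it covers. This is possible because a single long read often spans several variants. The sketch below assumes that phased SNPs and per-read observed bases are already available and assigns reads by majority vote. It does not reproduce the pan-genome alignment, statistical testing or EM steps of LORALS, RPVG, IsoLaser or FLAIR2. The resulting per-isoform haplotype counts could then be tested for imbalance, for example, with an exact binomial test against 0.5.

```python
from collections import Counter


def assign_haplotype(read_alleles, phased_snps, min_margin=1):
    """Assign a read to 'H1' or 'H2' by majority vote over phased SNPs.

    read_alleles: dict position -> base observed in the read.
    phased_snps: dict position -> (haplotype-1 base, haplotype-2 base).
    Returns None when the vote is tied or no phased SNP is covered.
    """
    h1 = h2 = 0
    for pos, base in read_alleles.items():
        if pos not in phased_snps:
            continue
        a1, a2 = phased_snps[pos]
        if base == a1:
            h1 += 1
        elif base == a2:
            h2 += 1
    if h1 - h2 >= min_margin:
        return "H1"
    if h2 - h1 >= min_margin:
        return "H2"
    return None


def haplotype_counts_per_isoform(reads, phased_snps):
    """reads: list of (isoform_id, read_alleles). Returns isoform -> Counter of haplotypes."""
    table = {}
    for isoform, alleles in reads:
        hap = assign_haplotype(alleles, phased_snps)
        if hap is not None:
            table.setdefault(isoform, Counter())[hap] += 1
    return table
```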
Visualization
Visualization of lrRNA-seq data is commonly done using the UCSC genome browser174 or the Integrative Genomics Viewer (IGV)175 (Fig. 4d). However, numerous specialized tools have been designed to visualize different elements of interest throughout the data processing pipeline (Table 2). ONT and PacBio’s proprietary software, MinKNOW and SMRT Link, respectively, provide real-time run monitoring and summary plots of read quality. Other tools include BulkVis, SquiggleKit and BoardION to plot electric current squiggles of ONT runs176,177,178; LongQC to plot quality metrics of FASTQ files106; and NanoPack2 for both FASTQ and BAM files108. SQANTI3 offers plots to inspect QC features of the data, for example, the distance to TSS or TTS, the incidence of non-canonical splicing and intrapriming levels46. Intrapriming is an artefact in which the oligo(dT) primer hybridizes to adenine stretches within transcripts, rather than only to poly(A) tails; it can inflate mRNA counts for genes containing such stretches. SQANTI-reads provides comparative charts of quality features across multiple samples69.
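Intrapriming is usually flagged by inspecting the genomic sequence immediately downstream of each transcript's 3′ end. A high adenine content there suggests that the oligo(dT) primer annealed to a genome-encoded stretch rather than to a true poly(A) tail. SQANTI3 reports a comparable percentage of adenines downstream of the TTS. The 20-nt window and the 0.6 threshold below are assumptions chosen for illustration, and the coordinate convention is hypothetical.

```python
def perc_a_downstream(genome_seq, tts, strand, window=20):
    """Fraction of A (on the transcript strand) in the `window` nt after the TTS.

    genome_seq: plus-strand chromosome sequence (0-based indexing).
    tts: 0-based coordinate of the transcript's 3'-most base.
    """
    if strand == "+":
        segment = genome_seq[tts + 1: tts + 1 + window].upper()
        target = "A"
    else:
        # On the minus strand, downstream lies to the left; an A on the
        # transcript strand appears as T on the plus strand.
        segment = genome_seq[max(0, tts - window): tts].upper()
        target = "T"
    return segment.count(target) / len(segment) if segment else 0.0


def flag_intrapriming(genome_seq, tts, strand, threshold=0.6, window=20):
    """Flag a transcript as a likely intrapriming artefact (illustrative threshold)."""
    return perc_a_downstream(genome_seq, tts, strand, window) >= threshold
```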
Other tools focus on gene-level visualizations. Swan179 creates gene and transcript path graphs, whereas ggtranscript highlights differences from the reference transcript180. ScisorWiz visualizes gene-level differential isoform expression181, and R2Dtool and NanoBlot highlight isoform-resolved RNA features182,183. Moreover, IsoTools37, ESPRESSO95, IsoQuant32 and tappAS119 provide a range of plots, from data QC, including saturation plots and transcript quality, to gene-level splicing visualizations using sashimi plots. Finally, specialized packages for analysing ONT direct RNA sequencing data generate plots related to differential isoform expression, RNA modifications184,185,186, poly(A) tails and gene fusions187 (Table 2). As the transcriptomics community continues to grow, new tools with enhanced features are expected to emerge.
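A saturation (rarefaction) plot shows how many isoforms are detected as increasing fractions of the reads are used. A curve that is still rising at 100% suggests that deeper sequencing would uncover more isoforms. The sketch below computes the points of such a curve from per-read isoform assignments. It uses nested random subsets so that the curve is monotone. The fractions, the minimum read support and the seed are arbitrary choices.

```python
import random
from collections import Counter


def saturation_curve(read_isoforms, fractions=(0.1, 0.25, 0.5, 0.75, 1.0),
                     min_reads=1, seed=0):
    """Return (fraction, isoforms detected) pairs from nested random subsamples.

    read_isoforms: the isoform assigned to each read. `fractions` must be increasing.
    """
    rng = random.Random(seed)
    shuffled = list(read_isoforms)
    rng.shuffle(shuffled)  # one shuffle; its prefixes give nested subsamples
    curve = []
    for f in fractions:
        subset = shuffled[: round(f * len(shuffled))]
        counts = Counter(subset)
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((f, detected))
    return curve
```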
RNA modification detection
Chemical modifications of RNA, collectively known as the epitranscriptome, have crucial roles in various regulatory processes (reviewed elsewhere188). ONT direct RNA sequencing enables the detection of characteristic signatures of these RNA modifications in individual RNA molecules50 (Fig. 1b). Algorithms for detecting epitranscriptomic modifications from ONT data can be classified into three groups: unsupervised, supervised and semisupervised methods. Unsupervised or comparative methods, such as ELIGOS22, xPore189 and DRUMMER190, identify modifications by statistically testing whether the data differ between modified and unmodified samples. ELIGOS and DRUMMER rely on basecalling error signals, whereas xPore directly assesses the ONT ion current signal. Supervised methods, including nanom6A21, m6Anet191, DENA192, Penguin193 and mAFiA194, use machine-learning models trained on ground-truth, in vitro-generated or labelled data sets containing known RNA modifications. Some supervised tools, such as NanoSpa195 and TandemMod196, can detect multiple types of RNA modifications within the same molecule. Semisupervised methods such as Xron197, m6ABasecaller198 and Dorado are trained on both immunoprecipitation and in vitro transcription data. This training enables de novo identification of m6A modifications in single reads with single-nucleotide resolution at the basecalling step. Many RNA modification detection methods require a pre-processing step called ‘segmentation’, in which the raw ONT ion current signal is aligned to a reference sequence to define event boundaries accurately. Segmentation can be performed using tools such as Nanopolish199, Tombo200, f5c201 or Uncalled4 (ref. 202).
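The comparative strategy can be illustrated with a generic per-position test. At each reference position, basecalling error counts in the sample are compared with those in a modification-depleted control (for example, in vitro-transcribed RNA) using Fisher's exact test. This is a simplified stand-in for the idea behind error-based tools such as ELIGOS and DRUMMER, not their actual statistics. It assumes that error and match counts per position have already been tallied from alignments, and it applies no multiple-testing correction.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def differential_error_sites(sample, control, alpha=0.01):
    """sample/control: dict position -> (n_error, n_match) from aligned reads.

    Returns (position, p_value) for sites with significantly more errors
    in the sample than in the control.
    """
    hits = []
    for pos in sorted(set(sample) & set(control)):
        s_err, s_match = sample[pos]
        c_err, c_match = control[pos]
        if s_err + s_match == 0 or c_err + c_match == 0:
            continue  # no coverage in one of the conditions
        p = fisher_exact_two_sided(s_err, s_match, c_err, c_match)
        higher_in_sample = s_err / (s_err + s_match) > c_err / (c_err + c_match)
        if p < alpha and higher_in_sample:
            hits.append((pos, p))
    return hits
```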
Despite the advances made by these three types of RNA modification callers, many signal modulations have not yet been associated with specific, lesser-known RNA modifications. This is owing to technical biases51 and the difficulty of generating synthetic data sets that capture broader modification diversity203.
Although direct RNA sequencing has expanded our ability to study the epitranscriptome, few comprehensive end-to-end pipelines are available, the exceptions being MasterOfPores204 and nf-core/nanoseq. Some benchmarking studies have compared m6A profiling methods205. However, a comprehensive assessment of tools for single and simultaneous detection of other chemical RNA modifications using robust ground-truth data sets remains a marked unmet need.
In addition to epitranscriptomic modifications, other post-transcriptional modifications, such as poly(A) tails, can be analysed using lrRNA-seq60,64. Shorter poly(A) tails are thought to be associated with more stable mRNA, whereas longer poly(A) tails are associated with increased translational efficiency206. Poly(A) tails can be directly extracted from basecalled PacBio cDNA sequencing data60 and can also be inferred from ONT direct RNA sequencing64. However, in ONT experiments, basecallers struggle to call homopolymeric regions accurately. Specialized tools have therefore been developed and are widely used to measure poly(A) tail lengths, such as Nanopolish-polyA199 and the more recent tailfindr207. These tools rely on a segmentation step that compares the raw ONT ion current signal with a poly(A) reference.
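For basecalled reads in which the tail is represented reliably, such as PacBio cDNA reads, tail length can be approximated by scanning from the 3′ end. The sketch below tolerates short runs of non-A bases (a stand-in for sequencing errors) and assumes that adapters have been trimmed and the read is in sense orientation. The tolerance parameter is an illustrative assumption. The sketch is not suitable for ONT direct RNA reads, for which the signal-level tools mentioned above are preferable.

```python
def poly_a_length(seq, max_consecutive_non_a=1):
    """Length of the 3'-terminal poly(A) stretch of a basecalled read.

    Scans from the 3' end and tolerates runs of up to `max_consecutive_non_a`
    non-A bases inside the tail. The reported tail starts and ends on an A,
    so stray non-A bases at the very 3' end are not counted.
    """
    seq = seq.upper()
    tail_start = tail_end = None
    non_a_run = 0
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "A":
            if tail_end is None:
                tail_end = i  # rightmost A of the tail
            tail_start = i
            non_a_run = 0
        else:
            non_a_run += 1
            if non_a_run > max_consecutive_non_a:
                break
    return 0 if tail_end is None else tail_end - tail_start + 1
```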
Emerging lrRNA-seq applications
lrRNA-seq has rapidly expanded its range of applications with the development of protocols that address transcript expression at diverse resolutions and cellular compartments. Here, we discuss long-read single-cell methods and LRS-based transcription and translation analyses (Figs. 1c and 4e–g). These examples illustrate the power of LRS to be combined with various biochemical assays to investigate the biogenesis, processing and metabolism of RNA with isoform resolution.
Long-read single-cell RNA-seq
Long-read single-cell RNA-seq (LR-scRNA-seq), for example, using the 10× Genomics platform, leverages droplet-based methods with cell barcodes and unique molecular identifiers (UMIs)208 (Fig. 4e). In brief, the strategy is to direct 10× cDNA library preparations to LRS, rather than to srRNA-seq, to obtain reads of full-length transcripts (reviewed elsewhere209). However, high error rates in ONT and low sequencing depth in PacBio can lead to inaccurate barcode and UMI assignments and limited detection of low-abundance transcripts. Sicelore and RAGE-seq combine LR-scRNA-seq with srRNA-seq to improve barcode and UMI assignment accuracy, but at increased time and cost210,211. scTaILoR-seq enriches genes via targeted hybridization capture212, and computational tools such as scNanoGPS213 and BLAZE214 improve read assignment and barcode verification. PacBio’s Kinnex kit and Revio system boost throughput to 80–100 million reads per SMRT cell47 (Table 1), whereas HIT-scISOseq reduces sequencing artefacts using biotinylated primers and magnetic beads215. LR-scRNA-seq has revealed complex isoform dynamics in different tissues. These include brain-specific differential isoform expression events in 395 genes across 45 cell types between the mouse hippocampus and prefrontal cortex10.
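Because ONT reads contain insertions and deletions, barcode assignment is often framed as finding the closest barcode in a whitelist under edit distance and discarding ambiguous matches. The whitelist may be the list of valid 10× barcodes or, as with BLAZE, one derived from the data. The sketch below assumes that the barcode sequence has already been located in the read, and the one-edit tolerance is an illustrative choice. A brute-force search like this would be too slow for a whitelist of millions of barcodes without an index.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions and deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def correct_barcode(observed, whitelist, max_dist=1):
    """Return the unique closest whitelist barcode within max_dist edits, else None."""
    if observed in whitelist:
        return observed
    best, best_dist, tied = None, max_dist + 1, False
    for bc in whitelist:
        d = edit_distance(observed, bc)
        if d < best_dist:
            best, best_dist, tied = bc, d, False
        elif d == best_dist:
            tied = True  # another barcode is equally close: ambiguous
    if best is None or tied:
        return None
    return best
```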
Long-read spatial transcriptomics
Long-read spatial transcriptomics uses methods such as 10× Genomics’ Visium, Slide-seq216 and Stereo-seq217 for in situ spatial barcode capture and integration with LRS platforms208 (Fig. 4f), with similar challenges and solutions as proposed for LR-scRNA-seq218. Further developments in LRS-based spatial transcriptomics focus on enhancing in situ capture by balancing the trade-off between spatial resolution and field of view and advancing sequencing depth and accuracy to enable srRNA-seq-free barcode and UMI assignment30. Interestingly, the technology has identified region-specific isoform variations in key genes219 and transcriptional heterogeneity within glioma microenvironments, revealing stem-like cells in invasive glioma niches218.
Nascent lrRNA-seq
Nascent RNA sequencing offers insights into gene regulation, RNA polymerase II (Pol II) dynamics and transcription-coupled splicing (reviewed elsewhere220) (Fig. 4g). Chromatin-associated RNA isolation exploits stable nascent RNA–Pol II interactions and sequences both nascent and stable RNAs. However, it lacks Pol II immunoprecipitation and relies on RT and PCR, which hinders RNA modification detection and may affect 5′ ends221,222,223,224. POINT-nano uses Pol II-associated immunoprecipitation for specific nascent RNA enrichment225, whereas dNET-seq and Nano-COP isolate new RNA via biotinylation226,227, and Nano-ID uses neural networks for nascent RNA identification228. These techniques revealed that, in yeast, splicing occurs near active Pol II, within tens of nucleotides downstream of the 3′ splice site222,224. In mammals, this approach showed that splicing can happen as early as 15 nt downstream of Pol II, extending up to 300 nt (ref. 225). Splicing can therefore be completed before Pol II reaches the next splice site or delayed until more intron is transcribed225,226,227. Furthermore, LRS has advanced understanding of splicing kinetics. It showed that faster transcription favours exon skipping, whereas slower Pol II activity favours exon inclusion, which aligns with the kinetic competition model224. Precise Pol II mapping suggests that splicing decisions occur after transcribing the downstream exon, supporting the exon definition model224.
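Measurements such as these rely on the 3′ end of a nascent RNA read marking the position of Pol II at the time of isolation. The sketch below handles plus-strand genes only. For a given intron, it collects the distance between the 3′ splice site and the read 3′ end, separately for reads in which the intron is already spliced and for reads in which it is still present. Comparing the two distributions indicates how far Pol II typically travels before splicing. The coordinate convention and input format are assumptions.

```python
def read_introns(exons):
    """Introns implied by a read's aligned blocks, as (start, end) pairs."""
    exons = sorted(exons)
    return {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}


def splicing_vs_pol2_distance(reads, intron):
    """reads: list of aligned-block lists [(start, end), ...] on the + strand.
    intron: (start, end), where end is the 3' splice site coordinate.

    Returns (spliced, unspliced): distances from the 3' splice site to the
    read 3' end (taken as the Pol II position) for reads that extend past it.
    """
    i_start, i_end = intron
    spliced, unspliced = [], []
    for exons in reads:
        pol2 = max(end for _, end in exons)
        if pol2 <= i_end:
            continue  # Pol II has not yet transcribed past the 3' splice site
        distance = pol2 - i_end
        if intron in read_introns(exons):
            spliced.append(distance)
        elif any(s <= i_start and e >= i_end for s, e in exons):
            unspliced.append(distance)  # intron still present in the read
    return spliced, unspliced
```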
Long-read ribosomal sequencing
Translational profiling bridges mRNA transcription and protein synthesis, and recent LRS-based methods have been proposed to study isoform-level translation. Specifically, the long-read Ribo-STAMP (lr-Ribo-STAMP) method transfects cells with a construct encoding ribosomal protein S2 (RPS2) fused to APOBEC1. APOBEC1 is an RNA editing enzyme that catalyses cytidine (C)-to-uridine (U) editing on translating transcripts. Subsequent LRS and identification of edited transcripts enable simultaneous measurement of mRNA translation and mRNA levels229. This approach was applied to a triple-negative breast cancer cell line under normal oxygen levels and hypoxia. It identified transcriptional and translational differences between the two conditions, as well as regulatory elements that mediate translational differences at the isoform level229.
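RPS2–APOBEC1 edits ribosome-bound transcripts, so a read-level readout is the number of C-to-U changes (seen as C-to-T in cDNA) relative to the reference. The sketch below counts such changes per read and reports, for each isoform, the fraction of reads with at least a minimum number of edits. The input format and the two-edit threshold are assumptions. A real analysis would also need to exclude genomic variants and background errors, for example, by comparison with cells lacking the fusion construct. This is not the published lr-Ribo-STAMP pipeline.

```python
from collections import Counter


def c_to_u_edits(aligned_pairs):
    """aligned_pairs: iterable of (ref_base, read_base) at aligned positions of a
    sense-strand read. Returns (number of C-to-T/U changes, number of reference Cs)."""
    n_ct = n_c = 0
    for ref_b, read_b in aligned_pairs:
        if ref_b.upper() == "C":
            n_c += 1
            if read_b.upper() in ("T", "U"):
                n_ct += 1
    return n_ct, n_c


def edited_fraction_per_isoform(reads, min_edits=2):
    """reads: list of (isoform_id, aligned_pairs). Returns isoform -> fraction of
    reads carrying at least `min_edits` C-to-U changes (a crude translation proxy)."""
    totals, edited = Counter(), Counter()
    for isoform, pairs in reads:
        n_ct, _ = c_to_u_edits(pairs)
        totals[isoform] += 1
        if n_ct >= min_edits:
            edited[isoform] += 1
    return {iso: edited[iso] / totals[iso] for iso in totals}
```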
Conclusions and future perspectives
lrRNA-seq is quickly becoming the preferred method for transcriptome analysis, and improvements in sequencing accuracy, throughput and cost efficiency are expected to soon allow it to surpass srRNA-seq. Emerging sequencing platforms with long-read capabilities (for example, MGI’s CycloneSEQ) represent an important addition to the field. However, precise control over RNA degradation is crucial to fully harness the benefits of capturing an entire RNA molecule in a single read. Hence, the technology must shift its focus to developing more efficient and robust library preparation protocols.
lrRNA-seq holds immense potential for the discovery of novel transcripts and contributes to genome annotation of biodiversity, but clear guidelines for the most effective experimental and bioinformatics pipelines have yet to be established. lrRNA-seq is also a valuable resource for expanding gene and transcript catalogues in well-studied organisms. It can have a key role in pan-transcriptome projects that catalogue variants associated with specific populations, tissues or cell types. However, there is a risk of capturing spurious RNA molecules, leading to enormous, unwieldy RNA catalogues74. The genome annotation community must balance comprehensiveness and manageable data outputs. For quantification, the focus is shifting from whether LRS technologies generate sufficient reads for quantification to understanding the inherent biases in quantification and how these can be corrected. Length bias, which affects both transcript detection and quantification, is one such challenge that requires investigation. Moreover, the field still requires systematic studies on sequencing depth, replication and experimental design, as well as tailored methods for normalization and differential isoform usage analysis.
Emerging applications of lrRNA-seq represent opportunities to advance genome research, including full-length isoforms as biomarkers of disease or to better understand disease mechanisms, algorithms to profile RNA modifications beyond m6A to fully include the epitranscriptome layer in transcriptome analysis, and integration with other sequencing and mass spectrometry methods for long-read multi-omics approaches that can better describe RNA regulatory dynamics230. Realizing these possibilities and developing new tools tailored to bulk, single-cell and spatial lrRNA-seq will require addressing the considerable computational demands of lrRNA-seq. Given the already high environmental cost of bioinformatics231, optimizing data size and computational efficiency should be a top priority for the community.
References
Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13 (2016). This review provides an overview of short-read transcriptomics experimental design and bioinformatic analysis.
Tilgner, H. et al. Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events. Nat. Biotechnol. 33, 736–742 (2015).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
Park, E., Pan, Z., Zhang, Z., Lin, L. & Xing, Y. The expanding landscape of alternative splicing variation in human populations. Am. J. Hum. Genet. 102, 11–26 (2018).
Duke, J. L. et al. Determining performance characteristics of an NGS-based HLA typing method for clinical applications. HLA 87, 141–152 (2016).
Sirupurapu, V., Safonova, Y. & Pevzner, P. A. Gene prediction in the immunoglobulin loci. Genome Res. 32, 1152–1169 (2022).
Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).
Soneson, C. et al. A comprehensive examination of nanopore native RNA sequencing for characterization of complex transcriptomes. Nat. Commun. 10, 3359 (2019).
Weirather, J. L. et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Research 6, 100 (2017).
Joglekar, A. et al. A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nat. Commun. 12, 463 (2021).
Patowary, A. et al. Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms. Science 384, eadh7688 (2024).
Chen, Y. et al. Gene fusion detection and characterization in long-read cancer transcriptome sequencing data with FusionSeeker. Cancer Res. 83, 28–33 (2023).
Hou, R., Hon, C.-C. & Huang, Y. CamoTSS: analysis of alternative transcription start sites for cellular phenotypes and regulatory patterns from 5’ scRNA-seq data. Nat. Commun. 14, 7240 (2023).
Wright, D. J. et al. Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes. BMC Genomics 23, 42 (2022).
Anvar, S. Y. et al. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing. Genome Biol. 19, 46 (2018).
Zhang, Z., Bae, B., Cuddleston, W. H. & Miura, P. Coordination of alternative splicing and alternative polyadenylation revealed by targeted long read sequencing. Nat. Commun. 14, 5506 (2023).
Zhang, S.-J. et al. Isoform evolution in primates through independent combination of alternative RNA processing events. Mol. Biol. Evol. 34, 2453–2468 (2017).
Glinos, D. A. et al. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 608, 353–359 (2022). This study reported the discovery of thousands of novel isoforms in a well-annotated organism and GTEx benchmarking.
Zhang, R. et al. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol. 23, 149 (2022).
Tang, A. D. et al. Detecting haplotype-specific transcript variation in long reads with FLAIR2. Genome Biol. 25, 173 (2024).
Gao, Y. et al. Quantitative profiling of N6-methyladenosine at single-base resolution in stem-differentiating xylem of Populus trichocarpa using Nanopore direct RNA sequencing. Genome Biol. 22, 22 (2021).
Jenjaroenpun, P. et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 49, e7 (2021).
Method of the year 2022: long-read sequencing. Nat. Methods 20, 1 (2023).
Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).
Al’Khafaji, A. M. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat. Biotechnol. 42, 582–586 (2024).
Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365 (2021).
Dong, X. et al. The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools. NAR Genom. Bioinform. 3, lqab028 (2021).
Mahmoud, M. et al. Utility of long-read sequencing for All of Us. Nat. Commun. 15, 837 (2024).
Joglekar, A. et al. Single-cell long-read sequencing-based mapping reveals specialized splicing patterns in developing and adult mouse and human brain. Nat. Neurosci. 27, 1051–1063 (2024).
Fu, Y. et al. Single cell and Spatial Alternative Splicing Analysis with Long Read Sequencing. Preprint at https://www.researchsquare.com/article/rs-2674892/v1 (2023).
Volden, R. et al. Identifying and quantifying isoforms from accurate full-length transcriptome sequencing reads with Mandalorion. Genome Biol. 24, 167 (2023).
Prjibelski, A. D. et al. Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 41, 915–918 (2023).
Chen, Y. et al. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat. Methods 20, 1187–1195 (2023).
Loving, R. et al. Long-read sequencing transcriptome quantification with lr-kallisto. Preprint at bioRxiv https://doi.org/10.1101/2024.07.19.604364 (2024).
Hu, Y. et al. LIQA: long-read isoform quantification and analysis. Genome Biol. 22, 182 (2021).
Hu, Y., Gouru, A. & Wang, K. DELongSeq for efficient detection of differential isoform expression from long-read RNA-seq data. NAR Genom. Bioinform. 5, lqad019 (2023).
Lienhard, M. et al. IsoTools: a flexible workflow for long-read transcriptome sequencing analysis. Bioinformatics 39, btad364 (2023).
Carbonell-Sala, S. et al. CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing. Nat. Commun. 15, 5278 (2024).
Su, Y. et al. Comprehensive assessment of mRNA isoform detection methods for long-read sequencing data. Nat. Commun. 15, 3972 (2024).
Pardo-Palacios, F. J. et al. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nat. Methods 21, 1349–1363 (2024). This publication is a benchmark study of library preparation, sequencing platform, transcript identification and quantification for long-read RNA sequencing.
Wang, L. et al. Measure transcript integrity using RNA-seq data. BMC Bioinform. 17, 58 (2016).
Gallego Romero, I., Pai, A. A., Tung, J. & Gilad, Y. RNA-seq: impact of RNA degradation on transcript quantification. BMC Biol. 12, 42 (2014).
Prawer, Y. D. J., Gleeson, J., De Paoli-Iseppi, R. & Clark, M. B. Pervasive effects of RNA degradation on nanopore direct RNA sequencing. NAR Genom. Bioinform. 5, lqad060 (2023).
Schroeder, A. et al. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol. Biol. 7, 3 (2006).
Gleeson, J. et al. Accurate expression quantification from nanopore direct RNA sequencing with NanoCount. Nucleic Acids Res. 50, e19 (2022).
Pardo-Palacios, F. J. et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat. Methods 21, 793–797 (2024).
Pacific Biosciences. Application Note: Kinnex Full-Length RNA Kit for Isoform Sequencing. Available at https://www.pacb.com/wp-content/uploads/Application-note-Kinnex-full-length-RNA-kit-for-isoform-sequencing.pdf (2023).
Oxford Nanopore Technologies. The Value of Full-Length Transcripts Without Bias. Available at https://a.storyblok.com/f/196663/x/8badf93497/rna-sequencing-white-paper.pdf (2019).
Chen, Y. et al. A systematic benchmark of nanopore long read RNA sequencing for transcript level analysis in human cell lines. Preprint at bioRxiv https://doi.org/10.1101/2021.04.21.440736 (2021). This preprint benchmarks different methods for ONT library preparation, isoform detection and quantification.
Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206 (2018). This study showed that nanopore RNA sequencing detects both the RNA molecules and their modifications.
Begik, O., Mattick, J. S. & Novoa, E. M. Exploring the epitranscriptome by native RNA sequencing. RNA 28, 1430–1439 (2022).
Lucas, M. C. & Novoa, E. M. Long-read sequencing in the era of epigenomics and epitranscriptomics. Nat. Methods 20, 25–29 (2023).
Rhoads, A. & Au, K. F. PacBio sequencing and its applications. Genomics Proteom. Bioinform. 13, 278–289 (2015).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
Liu-Wei, W. et al. Sequencing accuracy and systematic errors of nanopore direct RNA sequencing. BMC Genomics 25, 528 (2024).
Volden, R. et al. Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proc. Natl Acad. Sci. USA 115, 9726–9731 (2018).
Volden, R. & Vollmers, C. Single-cell isoform analysis in human immune cells. Genome Biol. 23, 47 (2022).
Wang, F. et al. TEQUILA-seq: a versatile and low-cost method for targeted long-read RNA sequencing. Nat. Commun. 14, 4760 (2023).
Sheynkman, G. M. et al. ORF Capture-Seq as a versatile method for targeted identification of full-length isoforms. Nat. Commun. 11, 2326 (2020).
Legnini, I., Alles, J., Karaiskos, N., Ayoub, S. & Rajewsky, N. FLAM-seq: full-length mRNA sequencing reveals principles of poly(A) tail length control. Nat. Methods 16, 879–886 (2019).
Begik, O. et al. Nano3P-seq: transcriptome-wide analysis of gene expression and tail dynamics using end-capture nanopore cDNA sequencing. Nat. Methods 20, 75–85 (2023).
Kuo, R. I. et al. Illuminating the dark side of the human transcriptome with long read transcript sequencing. BMC Genomics 21, 751 (2020).
Verwilt, J., Mestdagh, P. & Vandesompele, J. Artifacts and biases of the reverse transcription reaction in RNA sequencing. RNA 29, 889–897 (2023). This review discusses the impact of RT on RNA-seq experiments.
Workman, R. E. et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods 16, 1297–1305 (2019).
Kong, Y., Mead, E. A. & Fang, G. Navigating the pitfalls of mapping DNA and RNA modifications. Nat. Rev. Genet. 24, 363–381 (2023).
Bamford, R. A. et al. An atlas of expressed transcripts in the prenatal and postnatal human cortex. Preprint at bioRxiv https://doi.org/10.1101/2024.05.24.595768 (2024).
Aguzzoli Heberle, B. et al. Mapping medically relevant RNA isoform diversity in the aged human frontal cortex with deep long-read RNA-seq. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02245-9 (2024).
Wang, J. et al. Direct RNA sequencing coupled with adaptive sampling enriches RNAs of interest in the transcriptome. Nat. Commun. 15, 481 (2024).
Keil, N., Monzó, C., McIntyre, L. & Conesa, A. SQANTI-reads: a tool for the quality assessment of long read data in multi-sample lrRNA-seq experiments. Genome Res. https://doi.org/10.1101/gr.280021.124 (2025).
Dong, X. et al. Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nat. Methods 20, 1810–1821 (2023).
Gonzaludo, N. et al. Assessment of read depth requirements for gene and isoform discovery: a comparative study of long-read and short-read RNA sequencing data in human heart and brain. Available at https://www.pacb.com/wp-content/uploads/2024-eshg-RNA-isoform-human-heart-brain-short-and-long-read-sequencing-poster.pdf (2024).
Wang, L. et al. Iso-Seq enables discovery of novel isoform variants in human retina at single cell resolution. Preprint at bioRxiv https://doi.org/10.1101/2024.08.08.607267 (2024).
Zhang, W. et al. Full-length RNA transcript sequencing traces brain isoform diversity in house mouse natural populations. Genome Res. 34, 2118–2132 (2024).
Monzó, C., Frankish, A. & Conesa, A. Notable challenges posed by long-read sequencing for the study of transcriptional diversity and genome annotation. Genome Res. https://doi.org/10.1101/gr.279865.124 (2025).
Karin, B. R. et al. Highly-multiplexed and efficient long-amplicon PacBio and Nanopore sequencing of hundreds of full mitochondrial genomes. BMC Genomics 24, 229 (2023).
Gordon, S. P. et al. Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing. PLoS ONE 10, e0132628 (2015).
Hoang, N. V. et al. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics 18, 395 (2017).
Ding, C. et al. Short-read and long-read full-length transcriptome of mouse neural stem cells across neurodevelopmental stages. Sci. Data 9, 69 (2022).
Paul, L. et al. SIRVs: spike-in RNA variants as external isoform controls in RNA-sequencing. Preprint at bioRxiv https://doi.org/10.1101/080747 (2016).
External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).
Hardwick, S. A. et al. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat. Methods 13, 792–798 (2016).
Baid, G. et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nat. Biotechnol. 41, 232–238 (2023).
Hall, M. B. et al. Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data. eLife 13, RP98300 (2024).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Sahlin, K. & Mäkinen, V. Accurate spliced alignment of long RNA sequencing reads. Bioinformatics 37, 4643–4651 (2021).
Marić, J., Sovic, I., Križanović, K., Nagarajan, N. & Šikić, M. Graphmap2 — splice-aware RNA-seq mapper for long reads. Preprint at bioRxiv https://doi.org/10.1101/720458 (2019).
Liu, B. et al. deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index. Genome Biol. 20, 274 (2019).
Parker, M. T., Knop, K., Barton, G. J. & Simpson, G. G. 2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing. Genome Biol. 22, 72 (2021).
Chao, K.-H., Mao, A., Salzberg, S. L. & Pertea, M. Splam: a deep-learning-based splice site predictor that improves spliced alignments. Genome Biol. 25, 243 (2024).
Boratyn, G. M., Thierry-Mieg, J., Thierry-Mieg, D., Busby, B. & Madden, T. L. Magic-BLAST, an accurate RNA-seq aligner for long and short reads. BMC Bioinform. 20, 405 (2019).
Newman, J. R. B., Concannon, P., Tardaguila, M., Conesa, A. & McIntyre, L. M. Event Analysis: using transcript events to improve estimates of abundance in RNA-seq data. G3 8, 2923–2940 (2018).
Tang, A. D. et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 11, 1438 (2020).
Kabza, M. et al. Accurate long-read transcript discovery and quantification at single-cell, pseudo-bulk and bulk resolution with Isosceles. Nat. Commun. 15, 7316 (2024).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Gao, Y. et al. ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci. Adv. 9, eabq5072 (2023).
Wyman, D. et al. A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification. Preprint at bioRxiv https://doi.org/10.1101/672931 (2019).
Tian, L. et al. Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing. Genome Biol. 22, 310 (2021).
Orabi, B. et al. Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing. Nucleic Acids Res. 51, e11 (2023).
Wang, Y., Hu, Z., Ye, N. & Yin, H. IsoSplitter: identification and characterization of alternative splicing sites without a reference genome. RNA 27, 868–875 (2021).
Nip, K. M. et al. Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2. Nat. Commun. 14, 2940 (2023).
Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 8, giz100 (2019).
Pacific Biosciences. Iso-Seq: Scalable De Novo Isoform Discovery from PacBio HiFi Reads. Available at https://isoseq.how/ (2023).
de la Rubia, I. et al. RATTLE: reference-free reconstruction and quantification of transcriptomes from Nanopore sequencing. Genome Biol. 23, 153 (2022).
Petri, A. J. & Sahlin, K. isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics 39, i222–i231 (2023).
Jousheghani, Z. Z. & Patro, R. Oarfish: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification. Preprint at bioRxiv https://doi.org/10.1101/2024.02.28.582591 (2024).
Fukasawa, Y., Ermini, L., Wang, H., Carty, K. & Cheung, M.-S. LongQC: a quality control tool for third generation sequencing long read data. G3 10, 1193–1196 (2020).
Leger, A. & Leonardi, T. pycoQC, interactive quality control for Oxford Nanopore Sequencing. J. Open Source Softw. 4, 1236 (2019).
De Coster, W. & Rademakers, R. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics 39, btad311 (2023).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Wang, Y. IsoSeqSim: Iso-Seq reads simulator for PacBio and ONT full-length isoform sequencing technologies. GitHub https://github.com/yunhaowang/IsoSeqSim (2022).
Hafezqorani, S. et al. Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data. Gigascience 9, giaa061 (2020).
Mestre-Tomás, J., Liu, T., Pardo-Palacios, F. & Conesa, A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. Genome Biol. 24, 286 (2023).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Xia, Y. et al. TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing. Nat. Commun. 14, 5935 (2023).
de la Fuente, L. et al. tappAS: a comprehensive computational framework for the analysis of the functional impact of differential splicing. Genome Biol. 21, 119 (2020).
Karlebach, G. et al. HBA-DEALS: accurate and simultaneous identification of differential expression and splicing using hierarchical Bayesian analysis. Genome Biol. 21, 171 (2020).
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).
Maglott, D. R., Katz, K. S., Sicotte, H. & Pruitt, K. D. NCBI’s LocusLink and RefSeq. Nucleic Acids Res. 28, 126–128 (2000).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Harrow, J. et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 7, S4.1–S4.9 (2006).
Frankish, A. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).
Lawniczak, M. K. N. et al. Standards recommendations for the Earth BioGenome Project. Proc. Natl Acad. Sci. USA 119, e2115639118 (2022).
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
Lewin, H. A. et al. Earth BioGenome Project: sequencing life for the future of life. Proc. Natl Acad. Sci. USA 115, 4325–4333 (2018).
Peng, S. et al. Long-read RNA sequencing improves the annotation of the equine transcriptome. Preprint at bioRxiv https://doi.org/10.1101/2022.06.07.495038 (2022).
Xia, E. et al. The tea plant reference genome and improved gene annotation using long-read and paired-end sequencing data. Sci. Data 6, 122 (2019).
Shields, E. J. et al. Genome annotation with long RNA reads reveals new patterns of gene expression and improves single-cell analyses in an ant brain. BMC Biol. 19, 254 (2021).
Yandell, M. & Ence, D. A beginner’s guide to eukaryotic genome annotation. Nat. Rev. Genet. 13, 329–342 (2012).
Scalzitti, N., Jeannin-Girardon, A., Collet, P., Poch, O. & Thompson, J. D. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics 21, 293 (2020).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 (2003).
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinform. 7, 62 (2006).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinform. 48, 4.11.1–4.11.39 (2014).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Paniagua, A. et al. Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq. Genome Res. https://doi.org/10.1101/gr.279864.124 (2024).
Ren, T. et al. The clinical implication of SS18–SSX fusion gene in synovial sarcoma. Br. J. Cancer 109, 2279–2285 (2013).
Wang, Z. et al. Significance of the TMPRSS2:ERG gene fusion in prostate cancer. Mol. Med. Rep. 16, 5450–5458 (2017).
Sundaresh, A. & Williams, O. Mechanism of ETV6-RUNX1 Leukemia. Adv. Exp. Med. Biol. 962, 201–216 (2017).
Liu, Q. et al. LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing. BMC Genomics 21, 793 (2020).
Davidson, N. M. et al. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol. 23, 10 (2022).
Karaoglanoglu, F., Chauve, C. & Hach, F. Genion, an accurate tool to detect gene fusion from long transcriptomics reads. BMC Genomics 23, 129 (2022).
Masuda, K., Sota, Y. & Matsuda, H. Detecting fusion genes in long-read transcriptome sequencing data with FUGAREC. IPSJ Trans. Bioinform. 17, 1–9 (2024).
Qin, Q. et al. CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution. Preprint at bioRxiv https://doi.org/10.1101/2024.02.24.581862 (2024).
Felton, C. A., Tang, A. D., Knisbacher, B. A., Wu, C. J. & Brooks, A. N. Detection of alternative isoforms of gene fusions from long-read RNA-seq with FLAIR-fusion. Preprint at bioRxiv https://doi.org/10.1101/2022.08.01.502364 (2022).
Boutet, E. et al. UniProtKB/Swiss-Prot, the manually annotated section of the Uniprot KnowledgeBase: how to use the entry view. Methods Mol. Biol. 1374, 23–54 (2016).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Res. 51, D418–D427 (2023).
Milacic, M. et al. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res. 52, D672–D678 (2024).
Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031 (2023).
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51, D587–D592 (2023).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 47, D155–D162 (2019).
Rodriguez, J. M. et al. APPRIS: selecting functionally important isoforms. Nucleic Acids Res. 50, D54–D59 (2022).
Chen, T.-W. et al. FunctionAnnotator, a versatile and efficient web tool for non-model organism annotation. Sci. Rep. 7, 10430 (2017).
Ferrer-Bonsoms, J. A. et al. ISOGO: functional annotation of protein-coding splice variants. Sci. Rep. 10, 1069 (2020).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Tranchevent, L.-C. et al. Identification of protein features encoded by alternative exons using Exon Ontology. Genome Res. 27, 1087–1097 (2017).
Louadi, Z. et al. DIGGER: exploring the functional role of alternative splicing in protein interactions. Nucleic Acids Res. 49, D309–D318 (2021).
Lorenzi, C. et al. IRFinder-S: a comprehensive suite to discover and explore intron retention. Genome Biol. 22, 307 (2021).
Vitting-Seerup, K. & Sandelin, A. IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences. Bioinformatics 35, 4469–4471 (2019).
Gal-Oz, S. T., Haiat, N., Eliyahu, D., Shani, G. & Shay, T. DoChaP: the domain change presenter. Nucleic Acids Res. 49, W162–W168 (2021).
Louadi, Z. et al. Functional enrichment of alternative splicing events with NEASE reveals insights into tissue identity and diseases. Genome Biol. 22, 327 (2021).
Tilgner, H., Grubert, F., Sharon, D. & Snyder, M. P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl Acad. Sci. USA 111, 9869–9874 (2014).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
de Jong, T. V. et al. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genom. 4, 100527 (2024).
Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).
Sibbesen, J. A. et al. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nat. Methods 20, 239–247 (2023).
Quinones-Valdez, G., Amoah, K. & Xiao, X. Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing. Preprint at bioRxiv https://doi.org/10.1101/2024.06.14.599101v1 (2024).
Nassar, L. R. et al. The UCSC Genome Browser database: 2023 update. Nucleic Acids Res. 51, D1188–D1195 (2023).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Payne, A., Holmes, N., Rakyan, V. & Loose, M. BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files. Bioinformatics 35, 2193–2198 (2019).
Ferguson, J. M. & Smith, M. A. SquiggleKit: a toolkit for manipulating nanopore signal data. Bioinformatics 35, 5372–5373 (2019).
Bruno, A., Aury, J.-M. & Engelen, S. BoardION: real-time monitoring of Oxford Nanopore sequencing instruments. BMC Bioinform. 22, 245 (2021).
Reese, F. & Mortazavi, A. Swan: a library for the analysis and visualization of long-read transcriptomes. Bioinformatics 37, 1322–1323 (2021).
Gustavsson, E. K., Zhang, D., Reynolds, R. H., Garcia-Ruiz, S. & Ryten, M. ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2. Bioinformatics 38, 3844–3846 (2022).
Stein, A. N., Joglekar, A., Poon, C.-L. & Tilgner, H. U. ScisorWiz: visualizing differential isoform expression in single-cell long-read data. Bioinformatics 38, 3474–3476 (2022).
Sethi, A. J., Acera Mateos, P., Hayashi, R., Shirokikh, N. & Eyras, E. R2Dtool: integration and visualization of isoform-resolved RNA features. Bioinformatics 40, btae495 (2024).
DeMario, S., Xu, K., He, K. & Chanfreau, G. F. Nanoblot: an R-package for visualization of RNA isoforms from long-read RNA-sequencing data. RNA 29, 1099–1107 (2023).
Razaghi, R. et al. Modbamtools: analysis of single-molecule epigenetic data for long-range profiling, heterogeneity, and clustering. Preprint at bioRxiv https://doi.org/10.1101/2022.07.07.499188 (2022).
Cheetham, S. W., Kindlova, M. & Ewing, A. D. Methylartist: tools for visualizing modified bases from nanopore sequence data. Bioinformatics 38, 3109–3112 (2022).
Su, S. et al. NanoMethViz: an R/Bioconductor package for visualizing long-read methylation data. PLoS Comput. Biol. 17, e1009524 (2021).
Yang, L. et al. NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with Nanopore direct RNA sequencing. J. Genet. Genomics 51, 1300–1309 (2024).
Delaunay, S., Helm, M. & Frye, M. RNA modifications in physiology and disease: towards clinical applications. Nat. Rev. Genet. 25, 104–122 (2024).
Pratanwanich, P. N. et al. Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat. Biotechnol. 39, 1394–1402 (2021).
Abebe, J. S. et al. DRUMMER-rapid detection of RNA modifications through comparative nanopore sequencing. Bioinformatics 38, 3113–3115 (2022).
Hendra, C. et al. Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nat. Methods 19, 1590–1598 (2022).
Qin, H. et al. DENA: training an authentic neural network model using Nanopore sequencing data of Arabidopsis transcripts for detection and quantification of N6-methyladenosine on RNA. Genome Biol. 23, 25 (2022).
Hassan, D., Acevedo, D., Daulatabad, S. V., Mir, Q. & Janga, S. C. Penguin: a tool for predicting pseudouridine sites in direct RNA nanopore sequencing data. Methods 203, 478–487 (2022).
Chan, A., Naarmann-de Vries, I. S., Scheitl, C. P. M., Höbartner, C. & Dieterich, C. Detecting m6A at single-molecular resolution via direct RNA sequencing and realistic training data. Nat. Commun. 15, 3323 (2024).
Huang, S., Wylder, A. C. & Pan, T. Simultaneous nanopore profiling of mRNA m6A and pseudouridine reveals translation coordination. Nat. Biotechnol. 42, 1831–1835 (2024).
Wu, Y. et al. Transfer learning enables identification of multiple types of RNA modifications using nanopore direct RNA sequencing. Nat. Commun. 15, 4049 (2024).
Teng, H., Stoiber, M., Bar-Joseph, Z. & Kingsford, C. Detecting m6A RNA modification from nanopore sequencing using a semisupervised learning framework. Genome Res. 34, 1987–1999 (2024).
Cruciani, S. et al. De novo basecalling of RNA modifications at single molecule and nucleotide resolution. Genome Biol. 26, 38 (2025).
Loman, N. J., Quick, J. & Simpson, J. T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12, 733–735 (2015).
Stoiber, M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. Preprint at bioRxiv https://doi.org/10.1101/094672 (2016).
Gamaarachchi, H. et al. GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis. BMC Bioinform. 21, 343 (2020).
Kovaka, S. et al. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. Preprint at bioRxiv https://doi.org/10.1101/2024.03.05.583511 (2024).
Alfonzo, J. D. et al. A call for direct sequencing of full-length RNAs to identify all modifications. Nat. Genet. 53, 1113–1116 (2021).
Cozzuto, L., Delgado-Tejedor, A., Hermoso Pulido, T., Novoa, E. M. & Ponomarenko, J. Nanopore direct RNA sequencing data processing and analysis using masterofpores. Methods Mol. Biol. 2624, 185–205 (2023).
Maestri, S. et al. Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing. Brief. Bioinform. 25, 2 (2024).
Eckmann, C. R., Rammelt, C. & Wahle, E. Control of poly(A) tail length. Wiley Interdiscip. Rev. RNA 2, 348–361 (2011).
Krause, M. et al. tailfindr: alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing. RNA 25, 1229–1241 (2019).
10× Genomics and Oxford Nanopore Technologies. Application Note: Alternative Transcript Isoform Detection with Single Cell and Spatial Resolution. Available at https://nanoporetech.com/resource-centre/alternative-transcript-isoform-detection-with-single-cell-and-spatial-resolution (2022).
Joglekar, A., Foord, C., Jarroux, J., Pollard, S. & Tilgner, H. U. From words to complete phrases: insight into single-cell isoforms using short and long reads. Transcription 14, 92–104 (2023).
Lebrigand, K., Magnone, V., Barbry, P. & Waldmann, R. High throughput error corrected Nanopore single cell transcriptome sequencing. Nat. Commun. 11, 4025 (2020).
Singh, M. et al. High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nat. Commun. 10, 3120 (2019).
Byrne, A. et al. Single-cell long-read targeted sequencing reveals transcriptional variation in ovarian cancer. Nat. Commun. 15, 6916 (2024).
Shiau, C.-K. et al. High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nat. Commun. 14, 4124 (2023).
You, Y. et al. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol. 24, 66 (2023).
Shi, Z.-X. et al. High-throughput and high-accuracy single-cell RNA isoform analysis using PacBio circular consensus sequencing. Nat. Commun. 14, 2631 (2023).
Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792.e21 (2022).
Ren, Y. et al. Spatial transcriptomics reveals niche-specific enrichment and vulnerabilities of radial glial stem-like cells in malignant gliomas. Nat. Commun. 14, 1028 (2023).
Lebrigand, K. et al. The spatial landscape of gene expression isoforms in tissue sections. Nucleic Acids Res. 51, e47 (2023).
Wissink, E. M., Vihervaara, A., Tippens, N. D. & Lis, J. T. Nascent RNA analyses: tracking transcription and its regulation. Nat. Rev. Genet. 20, 705–723 (2019).
Long, Y., Jia, J., Mo, W., Jin, X. & Zhai, J. FLEP-seq: simultaneous detection of RNA polymerase II position, splicing status, polyadenylation site and poly(A) tail length at genome-wide scale by single-molecule nascent RNA sequencing. Nat. Protoc. 16, 4355–4381 (2021).
Oesterreich, F. C. et al. Splicing of nascent RNA coincides with intron exit from RNA polymerase II. Cell 165, 372–381 (2016).
Reimer, K. A., Mimoso, C. A., Adelman, K. & Neugebauer, K. M. Co-transcriptional splicing regulates 3′ end cleavage during mammalian erythropoiesis. Mol. Cell 81, 998–1012.e7 (2021).
Herzel, L., Straube, K. & Neugebauer, K. M. Long-read sequencing of nascent RNA reveals coupling among RNA processing events. Genome Res. 28, 1008–1019 (2018).
Sousa-Luís, R. et al. POINT technology illuminates the processing of polymerase-associated intact nascent transcripts. Mol. Cell 81, 1935–1950.e6 (2021).
Prudêncio, P., Savisaar, R., Rebelo, K., Martinho, R. G. & Carmo-Fonseca, M. Transcription and splicing dynamics during early Drosophila development. RNA 28, 139–161 (2022).
Drexler, H. L., Choquet, K. & Churchman, L. S. Splicing kinetics and coordination revealed by direct nascent RNA sequencing through nanopores. Mol. Cell 77, 985–998.e8 (2020).
Maier, K. C., Gressel, S., Cramer, P. & Schwalb, B. Native molecule sequencing by nano-ID reveals synthesis and stability of RNA isoforms. Genome Res. 30, 1332–1344 (2020).
Jagannatha, P. et al. Long-read Ribo-STAMP simultaneously measures transcription and translation with isoform resolution. Genome Res. https://doi.org/10.1101/gr.279176.124 (2024).
Adamopoulos, P. G. et al. Hybrid-seq deciphers the complex transcriptional profile of the human DNA repair associated gene. RNA Biol. 20, 281–295 (2023).
Grealey, J. et al. The carbon footprint of bioinformatics. Mol. Biol. Evol. 39, msac034 (2022).
Hewel, C. et al. Direct RNA sequencing enables improved transcriptome assessment and tracking of RNA modifications for medical applications. Preprint at bioRxiv https://doi.org/10.1101/2024.07.25.605188 (2024).
Reese, F. et al. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. Preprint at bioRxiv https://doi.org/10.1101/2023.05.15.540865 (2023).
Acknowledgements
The authors’ work has been funded by the European Union Marie Skłodowska-Curie Actions Postdoctoral Fellowship (HORIZON-MSCA-2023-PF-01-01 grant agreement project 101149931), the European Union Marie Skłodowska-Curie Actions Doctoral Network project LongTREC (HORIZON-MSCA-2021-DN-01 grant agreement project 101072892) and the Spanish Science Ministry, grant number PID2020-119537RB-I00.
Author information
These authors contributed equally: Carolina Monzó, Tianyuan Liu.
Authors and Affiliations
Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
Carolina Monzó, Tianyuan Liu & Ana Conesa
Contributions
The authors contributed equally to all aspects of the article.
Corresponding authors
Correspondence to Carolina Monzó or Ana Conesa.
Ethics declarations
Competing interests
A.C. has received in-kind funding from Pacific Biosciences for library preparation and sequencing. A.C. and T.L. collaborate with Oxford Nanopore in the Marie Skłodowska-Curie Actions Doctoral Network project LongTREC. C.M. declares no competing interests.
Peer review
Peer review information
Nature Reviews Genetics thanks Jonathan Göke and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
CycloneSEQ: https://en.mgitech.cn/
Dorado: https://github.com/nanoporetech/dorado
LyRic: https://github.com/guigolab/LyRic
nf-core/nanoseq: https://nf-co.re/nanoseq/3.1.0/
NVIDIA: https://nvidia.com/en-us/case-studies/long-read-sequencing/
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Monzó, C., Liu, T. & Conesa, A. Transcriptomics in the era of long-read sequencing. Nat Rev Genet (2025). https://doi.org/10.1038/s41576-025-00828-z