

US20150302144A1 - Hierarchical genome assembly method using single long insert library - Google Patents

Hierarchical genome assembly method using single long insert library

Publication number
US20150302144A1
Authority
US
United States
Prior art keywords
sequence
reads
sequencing
consensus
alignment
Prior art date
2012-07-13
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/716,617
Inventor
Chen-Shan CHIN
Stephen Turner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pacific Biosciences of California Inc
Original Assignee
Pacific Biosciences of California Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-07-13
Filing date
2015-05-19
Publication date
2015-10-22
Application filed by Pacific Biosciences of California Inc
Priority to US14/716,617
Publication of US20150302144A1
Current legal status: Abandoned


Abstract

The present invention is generally directed to a hierarchical genome assembly process for producing high-quality de novo genome assemblies. The method uses a single, long-insert, shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT®) DNA sequencing, and obviates the additional sample preparation and sequencing data sets required by previously described hybrid assembly strategies. Efficient de novo assembly from genomic DNA to a finished genome sequence is demonstrated for several microorganisms using as few as three SMRT® Cells, and for bacterial artificial chromosomes (BACs) using sequencing data from a single SMRT® Cell. The workflow includes a new consensus algorithm that takes advantage of SMRT® sequencing primary quality values to produce a highly accurate de novo genome sequence, exceeding 99.999% (QV 50) accuracy. The methods are typically performed on a computer and comprise an algorithm that constructs sequence alignment graphs from pairwise alignments of sequence reads to a common reference.
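The accuracy figure above is a Phred-scaled quality value, QV = -10 · log10(error rate), so 99.999% accuracy (an error rate of 10^-5) corresponds to QV 50. The short Python sketch below shows the conversion in both directions; the helper names `accuracy_to_qv`, `qv_to_accuracy` and `expected_errors` are illustrative and do not come from the patent.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Phred-scaled quality value for a per-base accuracy strictly between 0 and 1."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def qv_to_accuracy(qv: float) -> float:
    """Inverse conversion: per-base accuracy implied by a QV."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def expected_errors(length_bp: int, qv: float) -> float:
    """Expected number of residual base errors in a sequence of length_bp at a given QV."""
    return length_bp * 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    print(round(accuracy_to_qv(0.99999), 3))          # QV for 99.999% accuracy
    print(round(expected_errors(1_000_000, 50), 3))   # expected errors per megabase at QV 50
```

By the same formula, QV 50 means one expected error per 100,000 bases (about 10 per megabase), and a seed read at 90% accuracy, the threshold mentioned in claim 6, sits at QV 10.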


Claims (20)

What is claimed:
1. A computer-implemented method to determine a consensus sequence from a set of polynucleotide sequence reads without using a previously known reference sequence, the method comprising:
a) providing a set of polynucleotide sequence reads that comprise errors introduced by a sequencing reaction, wherein said polynucleotide sequence reads in the set comprise overlapping polynucleotide sequences that are alignable to each other;
b) choosing a seed read from the set of polynucleotide sequence reads;
c) performing pairwise alignment of all other polynucleotide sequence reads in the set to the seed read to generate a set of sequence alignments;
d) constructing a multiple sequence alignment from the set of sequence alignments, wherein the errors in the set of polynucleotide sequence reads are present in the resulting multiple sequence alignment and further wherein the multiple sequence alignment is constructed without the use of a previously known reference sequence;
e) performing an error correction step on the multiple sequence alignment by applying a consensus algorithm to the multiple sequence alignment, wherein the consensus algorithm reduces the number of errors in the seed sequence using information within the multiple sequence alignment and generates a consensus sequence for the set of polynucleotide sequence reads, thereby determining a consensus sequence from a set of polynucleotide sequence reads.
2. The method of claim 1, wherein the set of polynucleotide sequence reads comprises raw sequencing data.
3. The method of claim 1, wherein the seed read is greater than 1000 base pairs in length.
4. The method of claim 1, wherein the seed read has a length between 1000 and 10,000 base pairs.
5. The method of claim 1, wherein the seed read is at least 2, 3, 5, 10, 15, or 20 kb in length.
6. The method of claim 1, wherein the seed read has an accuracy of less than 90%.
7. The method of claim 1, further comprising normalizing the set of sequence alignments prior to constructing the multiple sequence alignment.
8. The method of claim 1, wherein the consensus algorithm uses a dynamic programming process to generate the consensus sequence.
9. The method of claim 1, wherein the seed read is generated using a single-molecule sequencing technology.
10. The method of claim 1, wherein each read in the set of polynucleotide sequence reads contains at least a portion of a region of interest.
11. The method of claim 1, wherein the set of sequence alignments is generated by a method comprising a pairwise local alignment algorithm.
12. The method of claim 1, wherein the consensus algorithm is performed iteratively, with each iteration reducing the number of errors in a resulting consensus sequence.
13. The method of claim 12, wherein at the end of each iteration, the resulting consensus sequence is used as the seed read for the subsequent iteration.
14. The method of claim 1, wherein the polynucleotide sequence reads are genomic DNA sequence reads.
15. The method of claim 1, wherein the polynucleotide sequence reads comprise replicate sequence information.
16. A method of determining a consensus sequence for a region of interest, the method comprising:
a) providing a mixed population of nucleic acid sequence reads from the region of interest;
b) choosing a sequence read from the mixed population of nucleic acid sequence reads as a seed sequence;
c) aligning the nucleic acid sequence reads to the seed sequence to generate a set of sequence alignments;
d) constructing a multiple sequence alignment using the set of sequence alignments; and
e) based upon the multiple sequence alignment, determining a consensus sequence for the mixed population without the use of a reference sequence, thereby determining a consensus sequence for the region of interest.
17. The method of claim 16, wherein the aligning comprises subjecting the set of sequence alignments to normalization prior to constructing the multiple sequence alignment.
18. The method of claim 17, wherein the normalization comprises at least one of the group consisting of: changing mismatches to indels and moving gaps to right-most equivalent positions.
19. The method of claim 16, wherein said constructing the multiple sequence alignment comprises constructing a multigraph and merging nodes in the multigraph.
20. The method of claim 16, wherein the seed sequence is at least 2, 3, 5, 10, 15, or 20 kb in length.
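Claims 1 and 16 describe a seed-based consensus: align the other reads to a seed read, build a multiple alignment anchored on that seed, call a consensus, and optionally repeat with the consensus as the new seed (claims 12 and 13). The Python sketch below is a minimal illustration of that loop under stated simplifying assumptions, not the patented implementation: global Needleman-Wunsch alignment with hypothetical linear scores stands in for the pairwise local alignment of claim 11, every read is assumed to span the whole seed region, and a per-column majority vote (a simple pileup) stands in for the multigraph node merging of claim 19 and for the quality-value-aware consensus mentioned in the abstract. Traceback ties are broken toward gaps, which tends to place equally scoring gaps at the right-most equivalent position, loosely echoing one normalization named in claim 18. All names (`global_align`, `consensus`, `iterative_consensus`, `MATCH`, `MISMATCH`, `GAP`) are hypothetical.

```python
from collections import Counter

# Hypothetical linear scores chosen for illustration; not taken from the patent.
MATCH, MISMATCH, GAP = 1, -1, -1


def global_align(seed: str, read: str) -> tuple[str, str]:
    """Needleman-Wunsch alignment of one read to the seed; '-' marks a gap."""
    n, m = len(seed), len(read)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * GAP
    for j in range(1, m + 1):
        H[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if seed[i - 1] == read[j - 1] else MISMATCH
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
    # Trace back from the bottom-right cell, trying gap moves before the diagonal
    # on ties, so equally scoring gaps (e.g. in homopolymers) tend to end up right-most.
    seed_aln, read_aln = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and H[i][j] == H[i - 1][j] + GAP:  # seed base missing from read
            seed_aln.append(seed[i - 1])
            read_aln.append("-")
            i -= 1
        elif j > 0 and H[i][j] == H[i][j - 1] + GAP:  # extra base in read
            seed_aln.append("-")
            read_aln.append(read[j - 1])
            j -= 1
        else:  # match or mismatch
            seed_aln.append(seed[i - 1])
            read_aln.append(read[j - 1])
            i -= 1
            j -= 1
    return "".join(reversed(seed_aln)), "".join(reversed(read_aln))


def consensus(seed: str, reads: list[str]) -> str:
    """Majority vote over seed-anchored alignments (a simple pileup consensus)."""
    if not reads:
        return seed
    n = len(seed)
    base_votes = [Counter() for _ in range(n)]     # what each read shows at seed base i
    ins_votes = [Counter() for _ in range(n + 1)]  # bases inserted before seed base k
    for read in reads:
        seed_aln, read_aln = global_align(seed, read)
        inserted = [""] * (n + 1)
        k = 0  # number of seed bases consumed so far
        for s_ch, r_ch in zip(seed_aln, read_aln):
            if s_ch == "-":
                inserted[k] += r_ch
            else:
                base_votes[k][r_ch] += 1
                k += 1
        for pos, ins in enumerate(inserted):
            ins_votes[pos][ins] += 1  # the empty string is a vote for "no insertion"
    out = [ins_votes[0].most_common(1)[0][0]]
    for i in range(n):
        base = base_votes[i].most_common(1)[0][0]
        if base != "-":  # a majority of '-' drops a spurious seed base
            out.append(base)
        out.append(ins_votes[i + 1].most_common(1)[0][0])
    return "".join(out)


def iterative_consensus(seed: str, reads: list[str], max_rounds: int = 5) -> str:
    """Re-seed with each round's consensus until it stops changing (cf. claims 12-13)."""
    for _ in range(max_rounds):
        new_seed = consensus(seed, reads)
        if new_seed == seed:
            break
        seed = new_seed
    return seed


if __name__ == "__main__":
    reads = ["CATGCTAG", "CGTGCTAG", "CATGCTG", "CATGCTAG"]  # toy reads with errors
    print(iterative_consensus("CATGTAG", reads))  # expected: CATGCTAG
```

In the toy run the seed read lacks a C, one read carries a substitution and another drops an A; the majority vote restores the missing C and keeps the A, and the second round reproduces the same sequence, so the loop stops. Seed choice is left to the caller here; claims 3 to 5 suggest picking a long read.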

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US14/716,617 (US20150302144A1) | 2012-07-13 | 2015-05-19 | Hierarchical genome assembly method using single long insert library

Applications Claiming Priority (4)

Application Number | Priority Date | Filing Date | Title
US201261671554P | 2012-07-13 | 2012-07-13 | -
US201361784219P | 2013-03-14 | 2013-03-14 | -
US13/941,442 (US10777301B2) | 2012-07-13 | 2013-07-12 | Hierarchical genome assembly method using single long insert library
US14/716,617 (US20150302144A1) | 2012-07-13 | 2015-05-19 | Hierarchical genome assembly method using single long insert library

Related Parent Applications (1)

Application Number | Relation | Priority Date | Filing Date | Title
US13/941,442 (US10777301B2) | Continuation | 2012-07-13 | 2013-07-12 | Hierarchical genome assembly method using single long insert library

Publications (1)

Publication Number | Publication Date
US20150302144A1 | 2015-10-22

Family

Family ID: 49947258

Family Applications (2)

Application Number | Status | Priority Date | Filing Date | Title
US13/941,442 (US10777301B2) | Active, anticipated expiration 2035-06-11 | 2012-07-13 | 2013-07-12 | Hierarchical genome assembly method using single long insert library
US14/716,617 (US20150302144A1) | Abandoned | 2012-07-13 | 2015-05-19 | Hierarchical genome assembly method using single long insert library

Family Applications Before (1)

Application Number | Status | Priority Date | Filing Date | Title
US13/941,442 (US10777301B2) | Active, anticipated expiration 2035-06-11 | 2012-07-13 | 2013-07-12 | Hierarchical genome assembly method using single long insert library

Country Status (1)

Country | Link
US (2) | US10777301B2

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9618474B2 | 2014-12-18 | 2017-04-11 | Edico Genome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9857328B2 | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US9859394B2 | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US20180157787A1 * | 2016-10-19 | 2018-06-07 | Pacific Biosciences Of California, Inc. | Coding genome reconstruction from transcript sequences
US10006910B2 | 2014-12-18 | 2018-06-26 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10020300B2 | 2014-12-18 | 2018-07-10 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10429342B2 | 2014-12-18 | 2019-10-01 | Edico Genome Corporation | Chemically-sensitive field effect transistor
US10811539B2 | 2016-05-16 | 2020-10-20 | Nanomedical Diagnostics, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids

Families Citing this family (44)

Publication number | Priority date | Publication date | Assignee | Title
US9898575B2 | 2013-08-21 | 2018-02-20 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences
US9116866B2 | 2013-08-21 | 2015-08-25 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants
WO2015058120A1 | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences in the presence of repeating elements
CN105849279B | 2013-10-18 | 2020-02-18 | 七桥基因公司 | Methods and systems for identifying disease-induced mutations
EP3058332B1 | 2013-10-18 | 2019-08-28 | Seven Bridges Genomics Inc. | Methods and systems for genotyping genetic samples
US10832797B2 | 2013-10-18 | 2020-11-10 | Seven Bridges Genomics Inc. | Method and system for quantifying sequence alignment
US9092402B2 | 2013-10-21 | 2015-07-28 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure
WO2015094854A1 * | 2013-12-18 | 2015-06-25 | Pacific Biosciences Inc. | Iterative clustering of sequence reads for error correction
JP2017510871A * | 2014-01-10 | 2017-04-13 | セブン ブリッジズ ジェノミクス インコーポレイテッド | System and method for use of known alleles in read mapping
US9817944B2 | 2014-02-11 | 2017-11-14 | Seven Bridges Genomics Inc. | Systems and methods for analyzing sequence data
US10192026B2 | 2015-03-05 | 2019-01-29 | Seven Bridges Genomics Inc. | Systems and methods for genomic pattern analysis
CN107615283B * | 2015-05-26 | 2022-07-05 | 加利福尼亚太平洋生物科学股份有限公司 | Methods, software and systems for diploid genome assembly and haplotype sequence reconstruction
WO2016205767A1 * | 2015-06-18 | 2016-12-22 | Pacific Biosciences Of California, Inc | String graph assembly for polyploid genomes
US10793895B2 | 2015-08-24 | 2020-10-06 | Seven Bridges Genomics Inc. | Systems and methods for epigenetic analysis
US10724110B2 | 2015-09-01 | 2020-07-28 | Seven Bridges Genomics Inc. | Systems and methods for analyzing viral nucleic acids
US10584380B2 | 2015-09-01 | 2020-03-10 | Seven Bridges Genomics Inc. | Systems and methods for mitochondrial analysis
US11347704B2 * | 2015-10-16 | 2022-05-31 | Seven Bridges Genomics Inc. | Biological graph or sequence serialization
US20170199960A1 | 2016-01-07 | 2017-07-13 | Seven Bridges Genomics Inc. | Systems and methods for adaptive local alignment for graph genomes
US10364468B2 | 2016-01-13 | 2019-07-30 | Seven Bridges Genomics Inc. | Systems and methods for analyzing circulating tumor DNA
US10262102B2 | 2016-02-24 | 2019-04-16 | Seven Bridges Genomics Inc. | Systems and methods for genotyping with graph reference
US10790044B2 | 2016-05-19 | 2020-09-29 | Seven Bridges Genomics Inc. | Systems and methods for sequence encoding, storage, and compression
US11289177B2 | 2016-08-08 | 2022-03-29 | Seven Bridges Genomics, Inc. | Computer method and system of identifying genomic mutations using graph-based local assembly
US11250931B2 | 2016-09-01 | 2022-02-15 | Seven Bridges Genomics Inc. | Systems and methods for detecting recombination
US10650621B1 | 2016-09-13 | 2020-05-12 | Iocurrents, Inc. | Interfacing with a vehicular controller area network
US10241970B2 | 2016-11-14 | 2019-03-26 | Microsoft Technology Licensing, Llc | Reduced memory nucleotide sequence comparison
US10319465B2 | 2016-11-16 | 2019-06-11 | Seven Bridges Genomics Inc. | Systems and methods for aligning sequences to graph references
CN108460245B * | 2017-02-21 | 2020-11-06 | 深圳华大基因科技服务有限公司 | Method and apparatus for optimizing second generation assembly results using third generation sequences
US11347844B2 | 2017-03-01 | 2022-05-31 | Seven Bridges Genomics, Inc. | Data security in bioinformatic sequence analysis
US10726110B2 | 2017-03-01 | 2020-07-28 | Seven Bridges Genomics, Inc. | Watermarking for data security in bioinformatic sequence analysis
CN107229839B * | 2017-05-25 | 2020-05-22 | 西安电子科技大学 | An Indel detection method based on next-generation sequencing data
US11447818B2 * | 2017-09-15 | 2022-09-20 | Illumina, Inc. | Universal short adapters with variable length non-random unique molecular identifiers
US12046325B2 | 2018-02-14 | 2024-07-23 | Seven Bridges Genomics Inc. | System and method for sequence identification in reassembly variant calling
CN110310702B * | 2018-03-16 | 2021-03-23 | 深圳华大基因科技服务有限公司 | Method, device and storage medium for repairing genome sequencing assembly result
CN108776749B * | 2018-06-05 | 2022-05-03 | 北京诺禾致源科技股份有限公司 | Sequencing data processing method and device
WO2020154630A1 * | 2019-01-25 | 2020-07-30 | Pacific Biosciences Of California, Inc. | Systems and methods for graph based mapping of nucleic acid fragments
CN113767438B * | 2019-02-28 | 2025-06-13 | 加利福尼亚太平洋生物科学股份有限公司 | Improving alignment using homopolymer folding of sequencing reads
CN111564182B * | 2020-05-12 | 2024-02-09 | 西藏自治区农牧科学院水产科学研究所 | High-weight recovery of fish of the genus of Glehnian chromosome-level assembly of (2)
US11527307B2 * | 2020-11-05 | 2022-12-13 | Illumina, Inc. | Quality score compression
CN112349350B * | 2020-11-09 | 2022-07-19 | 山西大学 | A method for strain identification based on a Dunaliella core genome sequence
WO2022125754A1 * | 2020-12-10 | 2022-06-16 | The Regents Of The University Of California | Computational method and system for compression of genetic information
TWI835203B * | 2021-07-20 | 2024-03-11 | 奧義智慧科技股份有限公司 | Log categorization device and related computer program product with adaptive clustering function
CN114694755B * | 2022-03-28 | 2023-01-24 | 中山大学 | Genome assembly method, apparatus, device and storage medium
CN118737269B * | 2024-08-30 | 2024-11-19 | 墨卓生物科技(浙江)有限公司 | Method for distinguishing strains in single-cell microorganism genome sequencing result
CN119339780B * | 2024-12-18 | 2025-04-22 | 深圳瑞吉生物科技有限公司 | Method for predicting binding affinity of human leukocyte antigen class I molecules and peptide fragments

Family Cites Families (10)

Publication number | Priority date | Publication date | Assignee | Title
US7056661B2 | 1999-05-19 | 2006-06-06 | Cornell Research Foundation, Inc. | Method for sequencing nucleic acid molecules
US7995202B2 | 2006-02-13 | 2011-08-09 | Pacific Biosciences Of California, Inc. | Methods and systems for simultaneous real-time monitoring of optical signals from multiple sources
AU2008261935B2 | 2007-06-06 | 2013-05-02 | Pacific Biosciences Of California, Inc. | Methods and processes for calling bases in sequence by incorporation methods
CA2693979A1 | 2007-07-26 | 2009-02-05 | Pacific Biosciences Of California, Inc. | Molecular redundant sequencing
US8370079B2 | 2008-11-20 | 2013-02-05 | Pacific Biosciences Of California, Inc. | Algorithms for sequence determination
US9165109B2 | 2010-02-24 | 2015-10-20 | Pacific Biosciences Of California, Inc. | Sequence assembly and consensus sequence determination
US20130138358A1 | 2010-02-24 | 2013-05-30 | Pacific Biosciences Of California, Inc. | Algorithms for sequence determination
US20110257889A1 | 2010-02-24 | 2011-10-20 | Pacific Biosciences Of California, Inc. | Sequence assembly and consensus sequence determination
US20120015825A1 | 2010-07-06 | 2012-01-19 | Pacific Biosciences Of California, Inc. | Analytical systems and methods with software mask
US8465922B2 | 2010-08-26 | 2013-06-18 | Pacific Biosciences Of California, Inc. | Methods and systems for monitoring reactions

Cited By (11)

Publication number | Priority date | Publication date | Assignee | Title
US9618474B2 | 2014-12-18 | 2017-04-11 | Edico Genome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US9857328B2 | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same
US9859394B2 | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10006910B2 | 2014-12-18 | 2018-06-26 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10020300B2 | 2014-12-18 | 2018-07-10 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10429342B2 | 2014-12-18 | 2019-10-01 | Edico Genome Corporation | Chemically-sensitive field effect transistor
US10429381B2 | 2014-12-18 | 2019-10-01 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same
US10494670B2 | 2014-12-18 | 2019-12-03 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10607989B2 | 2014-12-18 | 2020-03-31 | Nanomedical Diagnostics, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US10811539B2 | 2016-05-16 | 2020-10-20 | Nanomedical Diagnostics, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids
US20180157787A1 * | 2016-10-19 | 2018-06-07 | Pacific Biosciences Of California, Inc. | Coding genome reconstruction from transcript sequences

Also Published As

Publication number | Publication date
US20140025312A1 | 2014-01-23
US10777301B2 | 2020-09-15

Similar Documents

Publication | Title
US10777301B2 | Hierarchical genome assembly method using single long insert library
JP7284849B2 | Methods and systems for generation and error correction of unique molecular index sets with non-uniform molecular lengths
US20240120021A1 | Methods and systems for large scale scaffolding of genome assemblies
US20210269875A1 | Sequence assembly
US9165109B2 | Sequence assembly and consensus sequence determination
EP3304383B1 | De novo diploid genome assembly and haplotype sequence reconstruction
US10839940B2 | Method, computer-accessible medium and systems for score-driven whole-genome shotgun sequence assemble
US20110257889A1 | Sequence assembly and consensus sequence determination
US20190244678A1 | Methods, systems and processes of de novo assembly of sequencing reads
Kremer et al. | Approaches for in silico finishing of microbial genome sequences
US20150169823A1 | String graph assembly for polyploid genomes
Bickhart et al. | Generation of lineage-resolved complete metagenome-assembled genomes by precision phasing
Sakoparnig et al. | Whole genome phylogenies reflect long-tailed distributions of recombination rates in many bacterial species
Jenike et al. | k-mer approaches for biodiversity genomics
Harris et al. | Whole-genome sequencing for rapid and accurate identification of bacterial transmission pathways
Shaw et al. | Floria: fast and accurate strain haplotyping in metagenomes
Lapidus | Genome sequence databases (overview): sequencing and assembly
Sheikh et al. | Base-calling for bioinformaticians
Narzisi et al. | Lancet: genome-wide somatic variant calling using localized colored DeBruijn graphs
Anderson et al. | Amira: gene-space de Bruijn graphs to improve the detection of AMR genes from bacterial long reads
Smith et al. | Considerations of Depth, Coverage, and Other Read Quality Metrics
이선호 | New Methods for SNV/InDel Calling and Haplotyping from Next Generation Sequencing Data
Söylev | Algorithms for Structural Variation Discovery Using Multiple Sequence Signatures
Lapidus | Genome sequence databases (overview): Sequencing and
Kizina et al. | In silico detection of taxon-unrelated contigs and reassembling of taxon-specific reads improve draft genomes of strains

Legal Events

Code | Title | Description
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

