CN110819621A

Info

Publication number: CN110819621A
Application number: CN201910597751.9A
Authority: CN
Inventors: 弗朗索瓦·维尼奥尔特; 威廉·多纳休
Original assignee: F Hoffmann La Roche AG
Current assignee: F Hoffmann La Roche AG
Priority date: 2014-02-11
Filing date: 2015-02-10
Publication date: 2020-02-21
Anticipated expiration: 2035-02-10
Also published as: US20190360045A1; CN110819621B; US20160201124A1; EP3105349A1; ES2819277T3; EP3105349B1; WO2015121236A1; CA2938910A1; CN106029962A; US10421999B2; JP2017511121A; JP6494045B2; US10731212B2

Abstract

Provided herein are methods, compositions, and kits for targeted sequencing of polynucleotides with high accuracy and low amplification and sequencing error.

Description

Translated from Chinese

Targeted sequencing and UID filtering

This application is a divisional of Chinese national application No. 201580007216.3, entitled "Targeted sequencing and UID filtering", which is the Chinese national phase (entered August 4, 2016) of international application No. PCT/EP2015/052723, filed February 10, 2015.

Background

Many current next-generation sequencing (NGS) technologies use a form of sequencing by synthesis (SBS). NGS technologies can sequence millions of DNA templates in a massively parallel fashion. To achieve high throughput, millions of single-stranded templates are arrayed on a chip, and the sequence of each template is read independently. Second-generation NGS platforms clonally amplify DNA templates on a solid support and then sequence them cyclically. Third-generation NGS platforms use single-molecule, PCR-free workflows and cycle-free chemistry (Schadt et al., Hum Mol Genet., 19(R2):R227-40 (2010)).

The major limitations of NGS and other high-throughput sequencing methods include sequencing and amplification errors and biases. Because of these errors and biases, such sequencing technologies deviate from the ideal uniform distribution of reads and can compromise many scientific and medical applications. For clinical use, a laboratory must verify the accuracy of mutation or single nucleotide polymorphism (SNP) calls before applying them to patients. Typically, sequences are verified by preparing a Sanger library of the target after the sequence has been obtained and "Sanger validating" the NGS results. To overcome the higher error rate of NGS platforms relative to traditional Sanger sequencing, a high level of redundancy, or sequence coverage, is needed to call bases accurately. Accurate base calling typically requires 30-50x coverage, although this may vary with the accuracy of the sequencing platform, the variant detection method, and the material being sequenced (Koboldt DC et al., Brief Bioinform., 11:484-98 (2010)). In general, all second-generation platforms produce data of similar accuracy (98-99.5%) and rely on sufficient sequencing depth (i.e., coverage) to make higher-accuracy base calls.
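The link between per-base accuracy and the coverage needed for a reliable call can be illustrated with a simple binomial model. The sketch below is not taken from the application: it assumes that errors in different reads are independent and, as a worst case, that every erroneous read reports the same wrong base, so a position is miscalled whenever more than half of the reads covering it are wrong. The function name is hypothetical.

```python
from math import comb


def majority_miscall_probability(depth: int, per_base_error: float) -> float:
    """Probability that more than half of `depth` independent reads are wrong.

    Worst-case assumption (ours, not the source's): all erroneous reads
    agree on the same incorrect base, so a strict majority of erroneous
    reads produces a wrong call at that position.
    """
    threshold = depth // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(depth, k) * per_base_error**k * (1 - per_base_error) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


if __name__ == "__main__":
    # 2% per-base error corresponds to the 98% end of the quoted accuracy range.
    for depth in (1, 3, 10, 30, 50):
        print(depth, majority_miscall_probability(depth, 0.02))
```

Under these assumptions the miscall probability falls steeply as depth grows, which is one way to read the 30-50x guideline. Real variant callers also weigh base quality, strand, and allele frequency, none of which this model covers.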

Sequencing bias can manifest as coverage bias (deviation from a uniform distribution of reads) and error bias (deviation from uniform mismatch, insertion, and deletion rates). Current sequencing technologies are limited because the chemistries used in high-throughput sequencing methods are inherently biased. Some nucleotide sequences are read more frequently than others and have inherent error rates. Depending on several factors, including the sequencing platform used, read errors (mostly bases misidentified because of low-quality base calls) can occur at rates anywhere from one error per 100 bases to one error per 2000 bases. Although coverage bias is an important sequencing metric, variation in sequence accuracy is also important.
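To make the quoted range of one error per 100 to 2000 bases concrete, the following sketch computes the expected number of errors in a read and the chance that a read contains at least one error. The read length of 150 bases and the assumption of a uniform, independent per-base error rate are illustrative choices, not values from the application.

```python
def expected_errors(read_length: int, bases_per_error: float) -> float:
    """Expected number of erroneous bases in one read."""
    return read_length / bases_per_error


def prob_at_least_one_error(read_length: int, bases_per_error: float) -> float:
    """Chance that a read has one or more errors, assuming independent errors."""
    per_base_error = 1.0 / bases_per_error
    return 1.0 - (1.0 - per_base_error) ** read_length


if __name__ == "__main__":
    read_length = 150  # hypothetical read length
    for bases_per_error in (100, 2000):
        print(
            bases_per_error,
            expected_errors(read_length, bases_per_error),
            prob_at_least_one_error(read_length, bases_per_error),
        )
```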

Another major limitation is PCR amplification bias, because the conditions used to construct libraries of nucleotide templates for sequencing can significantly affect sequencing bias. PCR amplification for library construction has been shown to be a source of error in sequencing data (Keohavong P et al., PNAS 86:9253-9257 (1989); Cariello et al., Nucleic Acids Res., 19:4193-4198 (1991); Cline et al., Nucleic Acids Res., 24:3546-3551 (1996)). The library construction method can affect the uniformity of coverage. For example, PCR amplification is also a known source of undercoverage of GC-extreme regions during library construction (Aird et al., Genome Biol., 12:R18 (2011); Oyola et al., BMC Genomics, 13:1; 22 (2012); Benjamini et al., Nucleic Acids Res., 40:e72 (2012)). Similar biases can be introduced during bridge PCR for cluster amplification, and on some NGS platforms strand-specific errors can cause coverage bias by impairing aligner performance (Nakamura et al., Nucleic Acids Res., 39:e90 (2011)). Other platforms that use terminator-free chemistry may be limited in their ability to sequence long homopolymers accurately, and may also be sensitive to coverage bias introduced by emulsion PCR during library construction (Rothberg et al., Nature, 475:348-352 (2011); Margulies et al., Nature 2005, 437:376-380 (2005); Huse et al., Genome Biol., 8:R143 (2007); Merriman et al., Electrophoresis, 33:3397-3417 (2012)).

Overview

In one aspect, provided is a method of generating a polynucleotide library, comprising: (a) generating a first complement sequence (CS) of a target polynucleotide from a sample using a first primer, the first primer comprising a target-specific sequence; (b) appending to the first CS an adaptor comprising a first primer binding sequence (PBS) or a portion thereof, thereby forming a modified complement sequence (MCS); (c) extending a second primer hybridized to the MCS, thereby forming a second CS, wherein the second primer comprises (i) a target-specific region and (ii) a second PBS; and (d) amplifying the second CS using primers that hybridize to the first PBS and the second PBS, respectively, wherein the first or second primer comprises a unique identification (UID) sequence.

In some embodiments, the first primer comprises a UID.

In some embodiments, the second primer comprises a UID.

In one aspect, provided is a method of generating a polynucleotide library, the method comprising: (a) extending a target-specific first primer hybridized to a target polynucleotide to form a first CS; (b) appending an adaptor to the first CS to form an MCS; (c) extending a second primer hybridized to the MCS to form a second CS; and (d) amplifying the second CS; wherein (a) or (c) does not comprise exponential amplification, and wherein the first or second primer comprises a UID.

In one aspect, provided is a method of generating a polynucleotide library, the method comprising: (a) generating a first CS, or a modified form thereof (MCS), from a target polynucleotide; (b) generating a second CS from a polynucleotide comprising the sequence of the first CS, wherein the second CS is generated by a non-exponential amplification reaction; and (c) amplifying the second CS; wherein the first CS or the second CS comprises a UID.

In some embodiments, the first CS comprises a UID.

In some embodiments, the second CS comprises a UID.

In one aspect, provided is a method of accurately determining the sequence of a target polynucleotide, the method comprising: (a) generating a second CS from a first CS generated from the target polynucleotide, or from a modified form thereof (MCS), wherein the first CS, the second CS, or the MCS comprises a UID, and wherein the first and second CSs are each independently generated by (i) a primer extension reaction or (ii) a linear amplification reaction; (b) amplifying the second CS; (c) sequencing at least one of the amplified second CSs; (d) aligning at least two sequences from (c) that comprise the same UID; and (e) determining a consensus sequence based on (d), wherein the consensus sequence accurately represents the target polynucleotide sequence.
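Steps (d) and (e) of the method above, grouping reads that share a UID and deriving a consensus, can be sketched as follows. This is a minimal illustration rather than the applicants' implementation: it assumes the UID has already been extracted from each read, that reads within a UID family are already aligned and of equal length, and that a simple per-position majority vote is an acceptable consensus rule. The function name and the `min_reads` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_uid(reads, min_reads=2):
    """Collapse (uid, sequence) pairs into one consensus sequence per UID.

    Assumes sequences sharing a UID are pre-aligned and equal in length.
    Families with fewer than `min_reads` members are dropped, mirroring the
    requirement that at least two sequences with the same UID be aligned.
    """
    families = defaultdict(list)
    for uid, sequence in reads:
        families[uid].append(sequence)

    consensus = {}
    for uid, sequences in families.items():
        if len(sequences) < min_reads:
            continue
        if len({len(s) for s in sequences}) != 1:
            raise ValueError(f"reads for UID {uid} are not equal length")
        # Majority vote over each aligned column.
        consensus[uid] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus


if __name__ == "__main__":
    reads = [
        ("AAC", "ACGT"),
        ("AAC", "ACGT"),
        ("AAC", "ACTT"),  # one read carries an error at position 3
        ("GGT", "TTTT"),  # singleton family, filtered out
    ]
    print(consensus_by_uid(reads))
```

Because every read in a UID family descends from the same tagged molecule, an error introduced during amplification or sequencing appears in only some of the reads and is outvoted. A production pipeline would also need to handle ties, base qualities, and sequencing errors within the UID itself.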

In some embodiments, (a) comprises generating the first CS by hybridizing the first primer to the target polynucleotide and extending the hybridized first primer.

In some embodiments, (a) comprises generating the first CS by extending a first primer hybridized to the target polynucleotide.

In some embodiments, the first primer hybridizes to the target polynucleotide via a target-specific sequence.

In some embodiments, (a) comprises performing a primer extension reaction or a reverse transcription reaction.

In some embodiments, (a) comprises a primer extension reaction.

In some embodiments, the target polynucleotide is DNA.

In some embodiments, (a) is performed using a DNA polymerase.

In some embodiments, (a) comprises a reverse transcription reaction.

In some embodiments, the target polynucleotide is RNA.

In some embodiments, (a) is performed using a reverse transcriptase.

In some embodiments, the adaptor comprises a first PBS.

In some embodiments, the MCS comprises a first PBS.

In some embodiments, the second primer comprises a target-specific region.

In some embodiments, the second primer comprises a second PBS.

In some embodiments, the first CS comprises a first PBS.

In some embodiments, the method further comprises appending an adaptor to the first CS to form an MCS.

In some embodiments, the polynucleotide comprising the sequence of the first CS is the MCS.

In some embodiments, the appending is performed after (a).

In some embodiments, the appending is performed before (b).

In some embodiments, generating the second CS comprises extending a second primer hybridized to the first CS.

In some embodiments, generating the second CS comprises extending a second primer hybridized to the MCS.

In some embodiments, the second CS is generated from the first CS.

In some embodiments, the second CS is generated from the MCS.

In some embodiments, the MCS is generated by appending an adaptor to the first CS.

In some embodiments, the MCS comprises the first PBS.

In some embodiments, generating the second CS comprises extending a second primer hybridized to the MCS.

In some embodiments, the first primer comprises a universal ligation sequence (ULS).

In some embodiments, the adaptor comprises a single-stranded region comprising a sequence complementary to the ULS.

In some embodiments, the sequence complementary to the ULS is at the 5' end of the single-stranded region of the adaptor.

In some embodiments, the first primer further comprises a phosphorylated 5' end.

In some embodiments, the method comprises generating the phosphorylated 5' end before appending the adaptor.

In some embodiments, the first primer further comprises a first part of a primer binding site, wherein the complete primer binding site comprises two parts.

In some embodiments, the adaptor comprises a second part of the primer binding site.

In some embodiments, the complete primer binding site is formed by appending the adaptor to the first CS.

In some embodiments, the second primer further comprises a universal priming sequence (UPS).

In some embodiments, the adaptor further comprises a UPS.

In some embodiments, the adaptor comprises a single-stranded polynucleotide.

In some embodiments, the method further comprises extending a first primer hybridized to the adaptor, wherein the extended portion of the first primer comprises a region complementary to the adaptor or a portion thereof.

In some embodiments, the adaptor comprises a double-stranded polynucleotide.

In some embodiments, the adaptor further comprises an overhang region.

In some embodiments, the overhang region comprises a sequence complementary to a portion of the first CS.

In some embodiments, the portion of the first CS that is complementary to the overhang region of the adaptor is an end of the first CS.

In some embodiments, the adaptor further comprises a region that is not complementary to the first CS.

In some embodiments, the adaptor further comprises a sample barcode (SBC) sequence.

In some embodiments, the adaptor also comprises an SBC sequence.

In some embodiments, the region that is not complementary to the first CS comprises an SBC sequence.

In some embodiments, the adaptor further comprises an affinity molecule or a capture sequence.

In some embodiments, the adaptor comprises an affinity molecule, wherein the affinity molecule is biotin.

In some embodiments, the MCS further comprises an affinity molecule or a capture sequence.

In some embodiments, the MCS comprises an affinity molecule, wherein the affinity molecule is biotin.

In some embodiments, the method comprises binding the affinity molecule or capture sequence to a solid surface.

In some embodiments, the solid surface is a bead.

In some embodiments, the method comprises separating the target polynucleotide or non-target polynucleotides from the bound MCS.

In some embodiments, the sequence complementary to a portion of the first CS is 5' of the SBC.

In some embodiments, the sequence complementary to a portion of the first CS is 3' or 5' of the UPS.

In some embodiments, the MCS comprises an adaptor.

In some embodiments, the MCS comprises one strand of a double-stranded adaptor.

In some embodiments, the MCS comprises a UPS.

In some embodiments, the first PBS of the MCS comprises a UPS.

In some embodiments, the first PBS of the MCS does not comprise a UPS.

In some embodiments, the second primer comprises a UPS.

In some embodiments, the second PBS of the second primer comprises a UPS.

In some embodiments, the second PBS of the second primer does not comprise a UPS.

In some embodiments, the MCS comprises a first UPS and the second primer comprises a second UPS.

In some embodiments, the first PBS of the MCS comprises a first UPS.

In some embodiments, the second PBS of the second primer comprises a second UPS.

In some embodiments, the second CS comprises the first PBS, the MCS, the second PBS, the target sequence, complements thereof, or any combination thereof.

In some embodiments, the second CS comprises a sequence complementary to the first PBS.

In some embodiments, the second CS comprises a sequence complementary to the MCS.

In some embodiments, the second CS comprises a second PBS.

In some embodiments, the second CS comprises the target sequence.

In some embodiments, the second CS comprises a UPS.

In some embodiments, the second CS comprises a sequence complementary to the first UPS.

In some embodiments, the second CS comprises a second UPS.

In some embodiments, the second CS is generated by a non-exponential amplification reaction.

In some embodiments, the second CS is generated from a single second primer.

In some embodiments, the second CS is generated by a primer extension reaction.

In some embodiments, the second CS is generated by a linear amplification reaction.

In some embodiments, the amplification reaction comprises a single round of amplification.

In some embodiments, the amplification reaction comprises two or more rounds of amplification.

In some embodiments, the amplification reaction comprises 10 or more rounds of amplification.

In some embodiments, the second CS is generated before an exponential amplification reaction is performed.

In some embodiments, the target polynucleotide comprises a plurality of target polynucleotides.

In some embodiments, each of the plurality of target polynucleotides comprises a different sequence.

In some embodiments, each of the plurality of target polynucleotides comprises the same sequence.

In some embodiments, the first primer comprises a plurality of first primers, each first primer comprising a target-specific region.

In some embodiments, the target-specific region of each of the plurality of first primers is different.

In some embodiments, the target-specific region of each of the plurality of first primers is the same.

In some embodiments, the second primer comprises a plurality of second primers, each second primer comprising a sequence complementary to a target-specific region.

In some embodiments, the target-specific region of each of the plurality of first primers is different.

In some embodiments, the target-specific region of each of the plurality of first primers is the same.

In some embodiments, the first primer hybridizes to the 3' end, the 5' end, or an internal region of the target polynucleotide.

In some embodiments, the second primer hybridizes to the 3' end, the 5' end, or an internal region of the first CS or MCS.

In some embodiments, the first CS comprises a plurality of first CSs.

In some embodiments, each of the plurality of first CSs comprises a different sequence.

In some embodiments, each of the plurality of first CSs comprises the same sequence.

In some embodiments, the adaptor sequence comprises a plurality of adaptors.

In some embodiments, each of the plurality of adaptors comprises a different sequence.

In some embodiments, each of the plurality of adaptors comprises the same sequence.

In some embodiments, the MCS comprises a plurality of MCSs.

In some embodiments, each of the plurality of MCSs comprises a different sequence.

In some embodiments, each of the plurality of MCSs comprises the same sequence.

In some embodiments, the second CS comprises a plurality of second CSs.

In some embodiments, each of the plurality of second CSs comprises a different sequence.

In some embodiments, each of the plurality of second CSs comprises the same sequence.

In some embodiments, the UID is unique to each first primer.

In some embodiments, the UID is not unique to each first primer.

In some embodiments, each first primer comprises the same UPS, the same first PBS, or both.

In some embodiments, each first CS comprises the same UPS, the same first PBS, or both.

In some embodiments, each adaptor comprises the same UPS, the same first PBS, the same SBC, or a combination thereof.

In some embodiments, each MCS comprises the same UPS, the same first PBS, the same SBC, or a combination thereof.

In some embodiments, each second primer comprises the same UPS, the same second PBS, or both.

In some embodiments, each second CS comprises the same UPS, the same first UPS, the same second UPS, the same SBC, the same first PBS, the same second PBS, or a combination thereof.

In some embodiments, each adaptor comprises a different UPS, a different first PBS, a different SBC, or a combination thereof.

In some embodiments, each MCS comprises a different UPS, a different first PBS, a different SBC, or a combination thereof.

In some embodiments, each first primer of a first plurality of first primers is extended simultaneously, is extended in the same reaction chamber, is hybridized to the target polynucleotide simultaneously, or is hybridized to the target polynucleotide in the same reaction chamber.

In some embodiments, each first CS or MCS of a first plurality of first CSs or MCSs is generated simultaneously, is generated in the same reaction chamber, is amplified simultaneously, or is amplified in the same reaction chamber.

In some embodiments, each second primer of a first plurality of second primers is extended simultaneously, is extended in the same reaction chamber, is hybridized to the first CS or MCS simultaneously, or is hybridized to the first CS or MCS in the same reaction chamber.

In some embodiments, each second CS of a first plurality of second CSs is generated simultaneously, is generated in the same reaction chamber, is amplified simultaneously, or is amplified in the same reaction chamber.

In some embodiments, the sample is a biological sample.

In some embodiments, the sample is a biological sample from a subject.

In some embodiments, the subject is a subject having a disease or disorder.

In some embodiments, the subject is a subject without a disease or disorder.

In some embodiments, the subject is an animal.

In some embodiments, the subject is a human.

In some embodiments, the sample is a blood sample.

In some embodiments, the target polynucleotide is isolated from the sample.

In some embodiments, the target polynucleotide is amplified directly from the sample.

In some embodiments, the sample comprises a plurality of samples, including a first sample and a second sample.

In some embodiments, the plurality of samples comprises at least 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 or more samples.

In some embodiments, the plurality of samples comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more samples.

In some embodiments, the plurality of samples comprises at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 100,000, or 1,000,000 or more samples.

In some embodiments, the plurality of samples comprises at least about 10,000 samples.

In some embodiments, the first sample is from a first subject and the second sample is from a second subject.

In some embodiments, the first subject is a subject having a disease or disorder.

In some embodiments, the second subject is a subject without the disease or disorder.

In some embodiments, each first primer of a first plurality of first primers is contacted with the first sample, and each first primer of a second plurality of first primers is contacted with the second sample.

In some embodiments, each first primer of the second plurality of first primers is extended simultaneously, is extended in the same reaction chamber, is hybridized to the target polynucleotide simultaneously, or is hybridized to the target polynucleotide in the same reaction chamber.

In some embodiments, the first plurality of first primers and the second plurality of first primers are extended simultaneously or are hybridized to the target polynucleotide simultaneously.

In some embodiments, each second primer of a first plurality of second primers is contacted with the first sample, and each second primer of a second plurality of second primers is contacted with the second sample.

In some embodiments, each second primer of the second plurality of second primers is extended simultaneously, is extended in the same reaction chamber, is hybridized to the target polynucleotide simultaneously, or is hybridized to the target polynucleotide in the same reaction chamber.

In some embodiments, the first plurality of second primers and the second plurality of second primers are extended simultaneously, are extended in the same reaction chamber, are hybridized to the first CS or MCS simultaneously, or are hybridized to the first CS or MCS in the same reaction chamber.

In some embodiments, each first CS or MCS of a first plurality of first CSs or MCSs is generated from a target polynucleotide in the first sample, and each first CS or MCS of a second plurality of first CSs or MCSs is generated from a target polynucleotide in the second sample.

In some embodiments, each first CS or MCS of the second plurality of first CSs or MCSs is generated simultaneously, is generated in the same reaction chamber, is amplified simultaneously, or is amplified in the same reaction chamber.

In some embodiments, the first plurality of first CSs and the second plurality of first CSs are generated simultaneously, are generated in the same reaction chamber, are amplified simultaneously, or are amplified in the same reaction chamber.

In some embodiments, each second CS of a first plurality of second CSs is generated from a target polynucleotide in the first sample, and each second CS of a second plurality of second CSs is generated from a target polynucleotide in the second sample.

In some embodiments, each second CS of the second plurality of second CSs is generated simultaneously, is generated in the same reaction chamber, is amplified simultaneously, or is amplified in the same reaction chamber.

In some embodiments, the first plurality of second CSs and the second plurality of second CSs are generated simultaneously, are generated in the same reaction chamber, are amplified simultaneously, or are amplified in the same reaction chamber.

In some embodiments, the method further comprises combining the first sample with the second sample.

In some embodiments, the combining is performed after the first plurality of first CSs or MCSs is generated.

In some embodiments, the one or more target polynucleotides or the plurality of target polynucleotides comprise a variant sequence.

在一些实施方案中，所述变体序列包含突变、多态性、缺失或插入。In some embodiments, the variant sequence comprises a mutation, polymorphism, deletion or insertion.

在一些实施方案中，所述多态性是单核苷酸多态性。In some embodiments, the polymorphism is a single nucleotide polymorphism.

在一些实施方案中，一种或多种靶标多核苷酸来自病原体。In some embodiments, the one or more target polynucleotides are from a pathogen.

在一些实施方案中，所述病原体是病毒、细菌或真菌。In some embodiments, the pathogen is a virus, bacteria or fungus.

在一些实施方案中，所述UID包含至少2个核苷酸。In some embodiments, the UID comprises at least 2 nucleotides.

在一些实施方案中，所述UID包含至少10个核苷酸。In some embodiments, the UID comprises at least 10 nucleotides.

在一些实施方案中，所述UID包含至少15个核苷酸。In some embodiments, the UID comprises at least 15 nucleotides.

在一些实施方案中，所述UID包含至多50个核苷酸。In some embodiments, the UID comprises up to 50 nucleotides.

在一些实施方案中，所述UID包含10-30个核苷酸。In some embodiments, the UID comprises 10-30 nucleotides.

In some embodiments, the UID comprises a degenerate sequence.

在一些实施方案中，所述UID包含完全或部分的简并序列。In some embodiments, the UID comprises a fully or partially degenerate sequence.

在一些实施方案中，所述UID包含序列NNNNNNNNNNNNNNN (SEQ ID NO：1)，其中N是任意核酸。In some embodiments, the UID comprises the sequence NNNNNNNNNNNNNNN (SEQ ID NO: 1), wherein N is any nucleic acid.

在一些实施方案中，所述UID包含序列 NNNNNWNNNNNWNNNNN(SEQ ID NO：2)，其中N是任意核酸，并且 W是腺嘌呤或胸腺嘧啶。In some embodiments, the UID comprises the sequence NNNNNWNNNNNWNNNNN (SEQ ID NO: 2), wherein N is any nucleic acid, and W is adenine or thymine.
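As an illustration of the two degenerate UID patterns above, the following minimal Python sketch counts how many distinct UIDs each pattern can encode and draws one random UID from a pattern. The function names and the `IUPAC` lookup table are assumptions introduced for this example; only the two patterns and the meanings of N and W come from the text.

```python
import random

# Degenerate codes used by the two UID patterns in the text:
# N is any of the four bases, W is adenine or thymine.
IUPAC = {"N": "ACGT", "W": "AT"}

SEQ_ID_NO_1 = "NNNNNNNNNNNNNNN"      # 15 fully degenerate positions
SEQ_ID_NO_2 = "NNNNNWNNNNNWNNNNN"    # 15 N positions plus 2 W positions

def uid_diversity(pattern):
    """Number of distinct sequences that a degenerate pattern can encode."""
    total = 1
    for code in pattern:
        total *= len(IUPAC[code])
    return total

def random_uid(pattern, rng):
    """Draw one concrete UID that satisfies the degenerate pattern."""
    return "".join(rng.choice(IUPAC[code]) for code in pattern)

def matches_pattern(pattern, sequence):
    """True if the sequence is a valid instance of the pattern."""
    return len(sequence) == len(pattern) and all(
        base in IUPAC[code] for code, base in zip(pattern, sequence)
    )

rng = random.Random(0)               # fixed seed so the sketch is repeatable
example_uid = random_uid(SEQ_ID_NO_2, rng)
```

Under these assumptions the 15-mer pattern allows 4^15 distinct UIDs, and the two W positions in the 17-mer multiply that count by a further factor of four.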

在一些实施方案中，所述附加包括连接(ligating)。In some embodiments, the attaching includes ligating.

In some embodiments, the attaching includes amplifying.

在一些实施方案中，第二CS(s)在指数扩增反应中扩增。In some embodiments, the second CS(s) is amplified in an exponential amplification reaction.

在一些实施方案中，第二CS(s)通过PCR扩增。In some embodiments, the second CS(s) is amplified by PCR.

在一些实施方案中，第二CS(s)使用包含针对第一PBS的引物和针对第二PBS的引物的引物组进行扩增。In some embodiments, the second CS(s) is amplified using a primer set comprising primers for the first PBS and primers for the second PBS.

In some embodiments, the second CS(s) is amplified using a UPS.

在一些实施方案中，第二CS(s)使用包含针对第一UPS的引物和针对第二UPS的引物的引物组进行扩增。In some embodiments, the second CS(s) is amplified using a primer set comprising primers for the first UPS and primers for the second UPS.

In some embodiments, the method comprises sequencing products amplified from one or more of the one or more CSs or of the plurality of second CSs.

在一些实施方案中，测序同时进行。In some embodiments, the sequencing is performed concurrently.

在一些实施方案中，测序是高通量测序。In some embodiments, the sequencing is high-throughput sequencing.

在一些实施方案中，所述方法还包括对确定的序列进行分析。In some embodiments, the method further comprises analyzing the determined sequence.

在一些实施方案中，分析使用计算机进行。In some embodiments, the analysis is performed using a computer.

在一些实施方案中，所述方法还包括确定扩增出错率。In some embodiments, the method further comprises determining an amplification error rate.

在一些实施方案中，所述方法还包括确定测序出错率。In some embodiments, the method further comprises determining a sequencing error rate.

在一些实施方案中，所述方法还包括确定一种或多种靶标多核苷酸的频率。In some embodiments, the method further comprises determining the frequency of one or more target polynucleotides.

在一些实施方案中，所述方法还包括确定在一种或多种靶标多核苷酸中存在或不存在变异。In some embodiments, the method further comprises determining the presence or absence of variation in one or more target polynucleotides.

In some embodiments, the method further comprises determining whether a subject is homozygous or heterozygous for an allele.

在一些实施方案中，所述方法还包括诊断、预测或治疗患有疾病或病症的受试者。In some embodiments, the method further comprises diagnosing, prognosing or treating a subject having a disease or disorder.

在一些实施方案中，所述方法还包括校正扩增错误。In some embodiments, the method further comprises correcting for amplification errors.

在一些实施方案中，所述方法还包括校正测序错误。In some embodiments, the method further comprises correcting sequencing errors.

在一些实施方案中，所述方法还包括划分(binning)或分组包含相同的UID的序列。In some embodiments, the method further comprises binning or grouping sequences comprising the same UID.

In some embodiments, the method further comprises binning or grouping sequences comprising the same UID using a computer or algorithm.
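The binning step can be sketched in a few lines of Python. This is a minimal illustration, not the method of the specification: it assumes, hypothetically, that the UID occupies the first 15 bases of every read and that the remainder of the read is the target sequence. The name `bin_by_uid` and the example reads are invented for the sketch.

```python
from collections import defaultdict

UID_LENGTH = 15  # assumption: the UID is the first 15 bases of each read

def bin_by_uid(reads, uid_length=UID_LENGTH):
    """Group the target portion of each read under the UID it carries."""
    bins = defaultdict(list)
    for read in reads:
        uid, target = read[:uid_length], read[uid_length:]
        bins[uid].append(target)
    return dict(bins)

# Hypothetical reads: two share one UID, the third carries a different UID.
reads = [
    "AAAAACCCCCGGGGG" + "ACGTACGT",
    "AAAAACCCCCGGGGG" + "ACGTACGA",
    "TTTTTGGGGGCCCCC" + "ACGTACGT",
]
bins = bin_by_uid(reads)

# Number of sequence reads observed for each UID.
read_counts = {uid: len(targets) for uid, targets in bins.items()}
```

The per-UID read counts fall out of the same grouping, which is why the counting embodiments are usually implemented together with the binning step.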

在一些实施方案中，所述方法还包括聚簇具有至少约90％、95％或 99％的序列同源性的序列。In some embodiments, the method further comprises clustering sequences having at least about 90%, 95% or 99% sequence homology.

在一些实施方案中，所述方法还包括比对具有至少约90％、95％或 99％的序列同源性的序列。In some embodiments, the method further comprises aligning sequences having at least about 90%, 95%, or 99% sequence homology.

在一些实施方案中，所述聚簇或比对在计算机或算法的辅助下进行。In some embodiments, the clustering or alignment is performed with the aid of a computer or algorithm.
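One simple way to realize the thresholded clustering described above is a greedy pass that compares each sequence with the first member of every existing cluster. The sketch below is an assumption-laden illustration: it uses position-by-position identity of equal-length, pre-aligned sequences as a stand-in for "sequence homology", and the names `identity` and `greedy_cluster` are hypothetical. A production pipeline would normally use a real aligner.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(sequences, threshold=0.90):
    """Assign each sequence to the first cluster whose representative it matches."""
    clusters = []  # each cluster is a list; its first element is the representative
    for seq in sequences:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters

# Hypothetical 20-base sequences: the second differs from the first at one position.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACGTACGTACGA"
s3 = "TTTTTTTTTTTTTTTTTTTT"

clusters_90 = greedy_cluster([s1, s2, s3], threshold=0.90)
clusters_99 = greedy_cluster([s1, s2, s3], threshold=0.99)
```

Because `s1` and `s2` agree at 19 of 20 positions, they fall into one cluster at the 90% threshold and into separate clusters at 99%.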

在一些实施方案中，所述方法还包括确定包含相同的UID的序列读数的数目。In some embodiments, the method further comprises determining the number of sequence reads comprising the same UID.

在一些实施方案中，所述方法还包括确定包含相同的UID和具有至少约90％、95％或99％的序列同源性的靶标序列二者的序列读数的数目。In some embodiments, the method further comprises determining the number of sequence reads comprising both the same UID and the target sequence having at least about 90%, 95% or 99% sequence homology.

在一些实施方案中，所述方法还包括确定一种或多种样品中的一种或多种靶标多核苷酸的量。In some embodiments, the method further comprises determining the amount of one or more target polynucleotides in one or more samples.

In some embodiments, the method further comprises forming a consensus sequence from two or more sequences, sequence reads, amplicon sequences, binned sequences, aligned sequences, clustered sequences, or amplicon set sequences comprising the same UID.
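A consensus over reads that share a UID can be illustrated with a per-position majority vote. The sketch below assumes the reads are already aligned and of equal length, and it reports `N` where no base has a strict majority; both choices are assumptions made for the example and are not taken from the specification.

```python
from collections import Counter

def consensus(sequences):
    """Per-position majority vote over two or more equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need two or more sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must have the same length")
    called = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        # Call the base only when it has a strict majority; otherwise emit N.
        called.append(base if count > len(sequences) / 2 else "N")
    return "".join(called)

# Hypothetical reads sharing one UID; the second read has an error at the last base.
same_uid_reads = ["ACGT", "ACGA", "ACGT"]
family_consensus = consensus(same_uid_reads)

# Two reads that disagree at a position cannot resolve it.
tied_consensus = consensus(["AC", "AG"])
```

An error that appears in only a minority of the reads for a given UID is voted out, which is the intuition behind using UID families to correct amplification and sequencing errors.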

In some embodiments, the method further comprises determining the target polynucleotide sequence with an accuracy or confidence of at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 99.99%, or 100%.

在一些实施方案中，测序和PCR出错被最小化，消除，或小于 0.01％，0.001％，0.0001％，0.00001％，0.000001％，或0.0000001％。In some embodiments, sequencing and PCR errors are minimized, eliminated, or less than 0.01%, 0.001%, 0.0001%, 0.00001%, 0.000001%, or 0.0000001%.

在一些实施方案中，扩增第一CSs或MCSs限制扩增偏差。In some embodiments, amplification of the first CSs or MCSs limits amplification bias.

在一些实施方案中，测序的出错率小于或等于0.00001％， 0.0001％，0.001％，0.01％，或0％。In some embodiments, the sequencing error rate is less than or equal to 0.00001%, 0.0001%, 0.001%, 0.01%, or 0%.

在一些实施方案中，测序的出错率不是0。In some embodiments, the error rate of sequencing is not zero.

In some embodiments, at least 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 1000,000, 500,000, or 7x10⁷, 8x10⁷, 9x10⁷, 1x10⁸, 2x10⁸, 3x10⁸, 4x10⁸, 5x10⁸, 6x10⁸, 7x10⁸, 8x10⁸, 9x10⁸, 1x10⁹, 2x10⁹, 3x10⁹, 4x10⁹, 5x10⁹, 6x10⁹, 7x10⁹, 8x10⁹, 9x10⁹, 1x10¹⁰, 2x10¹⁰, 3x10¹⁰, 4x10¹⁰, 5x10¹⁰, 6x10¹⁰, 7x10¹⁰, 8x10¹⁰, 9x10¹⁰, 1x10¹¹, 2x10¹¹, 3x10¹¹, 4x10¹¹, 5x10¹¹, 6x10¹¹, 7x10¹¹, 8x10¹¹, 9x10¹¹, 1x10¹², 2x10¹², 3x10¹², 4x10¹², 5x10¹², 6x10¹², 7x10¹², 8x10¹², or 9x10¹² polynucleotides are sequenced.

In some embodiments, the method is performed in a positive amount of time less than or equal to 4 weeks, 3 weeks, 2 weeks, 1 week, 6 days, 5 days, 4 days, 3 days, 2 days, 1 day, 18 hours, 12 hours, 9 hours, 6 hours, or 3 hours.

In some embodiments, the number of reads used to obtain a particular confidence or base-calling accuracy is at least about 1.1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 times fewer than the number of reads used to achieve the same, similar, or higher confidence or base-calling accuracy using a similar method without UIDs.

In some embodiments, the number of reads used to obtain a particular confidence or base-calling accuracy is at least about 1, 2, 3, 4, 5, 5.5, 6, 6.5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1x10⁶, 2x10⁶, 3x10⁶, 4x10⁶, 5x10⁶, 6x10⁶, 7x10⁶, 8x10⁶, 9x10⁶, 1x10⁷, 2x10⁷, 3x10⁷, 4x10⁷, 5x10⁷, 6x10⁷, 7x10⁷, 8x10⁷, 9x10⁷, 1x10⁸, 2x10⁸, 3x10⁸, 4x10⁸, 5x10⁸, 6x10⁸, 7x10⁸, 8x10⁸, 9x10⁸, 1x10⁹, 2x10⁹, 3x10⁹, 4x10⁹, 5x10⁹, 6x10⁹, 7x10⁹, 8x10⁹, 9x10⁹, 1x10¹⁰, 2x10¹⁰, 3x10¹⁰, 4x10¹⁰, 5x10¹⁰, 6x10¹⁰, 7x10¹⁰, 8x10¹⁰, 9x10¹⁰, 1x10¹¹, 2x10¹¹, 3x10¹¹, 4x10¹¹, 5x10¹¹, 6x10¹¹, 7x10¹¹, 8x10¹¹, 9x10¹¹, 1x10¹², 2x10¹², 3x10¹², 4x10¹², 5x10¹², 6x10¹², 7x10¹², 8x10¹², or 9x10¹² reads fewer than the number of reads used to achieve the same, similar, or higher confidence or base-calling accuracy using a similar method without UIDs.

在一方面中，提供一种试剂盒，其包括本文所述的任一种方法的一种或多种引物、试剂、酶或底物。In one aspect, a kit is provided that includes one or more primers, reagents, enzymes or substrates for any of the methods described herein.

在一方面中，提供一组第一引物，其中该组内的第一引物中的每一条包含靶标特异性序列和UID。In one aspect, a set of first primers is provided, wherein each of the first primers within the set comprises a target-specific sequence and a UID.

In some embodiments, the set comprises at least about 2, 3, 4, 5, 10, 50, 100, 500, 1000, 5000, 10,000, 100,000, 2500,000 or more first primers comprising different target-specific sequences.

In one aspect, provided is a polynucleotide library comprising a plurality of polynucleotides, wherein each polynucleotide of the plurality comprises a UID, and wherein each polynucleotide of the plurality is a product from a different non-exponentially amplified template polynucleotide.

In one aspect, provided is a polynucleotide library comprising a plurality of polynucleotides, wherein each polynucleotide of the plurality comprises a PCR product of one or more polynucleotides from any of the libraries described herein.

In one aspect, provided is a method of generating a polynucleotide library, the method comprising: (a) generating a first complementary sequence (CS) of a target polynucleotide from a sample using a first primer, the first primer comprising a target-specific sequence; (b) attaching an adaptor comprising a first primer binding sequence (PBS), or a portion thereof, to the first CS, thereby forming a modified complementary sequence (MCS); (c) extending a second primer hybridized to the MCS, thereby forming a second CS, wherein the second primer comprises (i) a target-specific region and (ii) a second PBS; and (d) amplifying the second CS using primers that hybridize to the first PBS and the second PBS, respectively.

In one aspect, provided is a method of generating a polynucleotide library, the method comprising: (a) extending a target-specific first primer hybridized to a target polynucleotide to form a first CS; (b) attaching an adaptor to the first CS to form an MCS; (c) extending a second primer hybridized to the second CS to form a second CS; and (d) amplifying the second CS; wherein (a) or (c) does not include exponential amplification.

In one aspect, provided is a method of generating a polynucleotide library, the method comprising: (a) generating a first CS, or a modified form thereof (MCS), from a target polynucleotide; (b) generating a second CS from a polynucleotide comprising the sequence of the first CS, wherein the second CS is generated by a non-exponential amplification reaction; and (c) amplifying the second CS.

In one aspect, provided is a method of accurately determining the sequence of a target polynucleotide, the method comprising: (a) generating a second CS from a first CS or a modified form thereof (MCS), the first CS or MCS having been generated from the target polynucleotide, wherein the first and second CSs are each independently generated by (i) a primer extension reaction or (ii) a linear amplification reaction; (b) amplifying the second CS; (c) sequencing at least one of the amplified second CSs; (d) aligning at least two sequences from (c) that have at least 10% sequence identity; and (e) determining a consensus sequence based on (d), wherein the consensus sequence accurately represents the target polynucleotide sequence.

在一些实施方案中，(a)包括通过使第一引物杂交到靶标多核苷酸上并且延伸所杂交的第一引物而产生第一CS。In some embodiments, (a) comprises generating the first CS by hybridizing the first primer to the target polynucleotide and extending the hybridized first primer.

在一些实施方案中，(a)包括通过延伸杂交到靶标多核苷酸上的第一引物而产生第一CS。In some embodiments, (a) comprises generating the first CS by extending the first primer hybridized to the target polynucleotide.

在一些实施方案中，第一引物通过靶标特异性序列杂交到靶标多核苷酸上。In some embodiments, the first primer hybridizes to the target polynucleotide via a target-specific sequence.

在一些实施方案中，衔接子包含第一PBS。In some embodiments, the adaptor comprises the first PBS.

在一些实施方案中，所述方法还包括将衔接子附加到第一CS上形成 MCS。In some embodiments, the method further comprises attaching an adaptor to the first CS to form the MCS.

在一些实施方案中，所述附加在(a)之后进行。In some embodiments, the appending occurs after (a).

在一些实施方案中，所述附加在(b)之前进行。In some embodiments, the appending occurs before (b).

在一些实施方案中，第二CS是从第一CS产生的。In some embodiments, the second CS is generated from the first CS.

在一些实施方案中，第一CS包含第一PBS.In some embodiments, the first CS comprises the first PBS.

在一些实施方案中，第二CS是从MCS产生的。In some embodiments, the second CS is generated from the MCS.

在一些实施方案中，MCS是通过将衔接子附加到第一CS上形成 MCS而产生的。In some embodiments, the MCS is generated by attaching an adaptor to the first CS to form the MCS.

In some embodiments, the first primer comprises a universal ligation sequence (ULS).

在一些实施方案中，衔接子包含含有与ULS互补的序列的单链区。In some embodiments, the adaptor comprises a single-stranded region containing sequence complementary to the ULS.

在一些实施方案中，与ULS互补的序列位于衔接子单链区的5’末端。In some embodiments, the sequence complementary to the ULS is located at the 5' end of the single-stranded region of the adaptor.

在一些实施方案中，第一引物还包含磷酸化的5’末端。In some embodiments, the first primer further comprises a phosphorylated 5' end.

在一些实施方案中，所述方法还包括在附加衔接子之前产生磷酸化的5’末端。In some embodiments, the method further comprises generating the phosphorylated 5' end prior to attaching the adaptor.

In some embodiments, the adaptor comprises a second portion of a partial primer binding site.

在一些实施方案中，完整的引物结合位点是通过附加衔接子到第一 CS形成的。In some embodiments, the complete primer binding site is formed by attaching an adaptor to the first CS.

在一些实施方案中，第二引物还包含通用引物序列(UPS)。In some embodiments, the second primer further comprises a universal primer sequence (UPS).

在一些实施方案中，所述方法还包括使杂交到衔接子上的第一引物延伸，其中第一引物的延伸部分包含与所述衔接子或其部分互补的区域。In some embodiments, the method further comprises extending the first primer hybridized to the adaptor, wherein the extended portion of the first primer comprises a region complementary to the adaptor or a portion thereof.

在一些实施方案中，衔接子包含双链多核苷酸。In some embodiments, the adaptor comprises a double-stranded polynucleotide.

在一些实施方案中，衔接子还包含突出区。In some embodiments, the adaptor further comprises an overhang.

在一些实施方案中，第一CS与衔接子的突出区互补的部分是第一 CS的一个末端。In some embodiments, the portion of the first CS that is complementary to the overhang region of the adaptor is an end of the first CS.

在一些实施方案中，衔接子还包含不与第一CS互补的区域。In some embodiments, the adaptor further comprises a region that is not complementary to the first CS.

在一些实施方案中，衔接子还包含样品条形码(SBC)序列。In some embodiments, the adaptor further comprises a sample barcode (SBC) sequence.

在一些实施方案中，所述方法还包括从结合的MCS分离靶标多核苷酸或非靶标多核苷酸。In some embodiments, the method further comprises isolating the target polynucleotide or the non-target polynucleotide from the bound MCS.

在一些实施方案中，第二CS包含第一PBS、MCS、第二PBS、靶标序列、其互补体、或它们的任意组合。In some embodiments, the second CS comprises the first PBS, the MCS, the second PBS, the target sequence, its complement, or any combination thereof.

在一些实施方案中，第一多个第一引物中的每个第一引物同时延伸，在同一个反应室中延伸，同时杂交到靶标多核苷酸上，或者在同一个反应室中杂交到靶标多核苷酸上。In some embodiments, each first primer in the first plurality of first primers is extended simultaneously, in the same reaction chamber, and hybridizes to the target polynucleotide simultaneously, or hybridizes to the target in the same reaction chamber on polynucleotides.

在一些实施方案中，第一多个第一CSs或MCSs中的每个第一CS或 MCS同时产生，在同一个的反应室中产生，同时扩增，或在同一个反应室中扩增。In some embodiments, each first CS or MCS of the first plurality of first CSs or MCSs is simultaneously produced, produced in the same reaction chamber, amplified simultaneously, or amplified in the same reaction chamber.

在一些实施方案中，第一多个第二CSs中的每一个第二CS同时产生，在同一个的反应室中产生，同时扩增，或在同一个反应室中扩增。In some embodiments, each of the second CSs in the first plurality of second CSs are generated simultaneously, in the same reaction chamber, amplified simultaneously, or amplified in the same reaction chamber.

In some embodiments, the plurality of samples comprises at least 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 or more samples.

在一些实施方案中，多种样品包含至少约1000，2000，3000， 4000，5000，6000，7000，8000，9000，10,000，100,000，或1,000,000 种以上的样品。In some embodiments, the plurality of samples comprises at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 100,000, or 1,000,000 or more samples.

在一些实施方案中，第二CS(s)使用UPS进行扩增。In some embodiments, the second CS(s) is amplified using a UPS.

在一些实施方案中，所述方法还包括对由一个或多个CSs或多个第二CSs中的一个或多个扩增的产物进行测序。In some embodiments, the method further comprises sequencing products amplified from one or more of the one or more CSs or the one or more of the plurality of second CSs.

在一些实施方案中，所述方法还包括划分具有至少约90％、95％或 99％的序列同源性的序列。In some embodiments, the method further comprises partitioning sequences having at least about 90%, 95% or 99% sequence homology.

在一些实施方案中，所述方法还包括分组具有至少约90％、95％或 99％的序列同源性的序列。In some embodiments, the method further comprises grouping sequences having at least about 90%, 95% or 99% sequence homology.

在一些实施方案中，所述方法还包括聚簇(clustering)具有至少约 90％、95％或99％的序列同源性的序列。In some embodiments, the method further comprises clustering sequences having at least about 90%, 95% or 99% sequence homology.

在一些实施方案中，所述方法还包括确定具有至少约90％、95％或 99％的序列同源性的序列读数的数目。In some embodiments, the method further comprises determining the number of sequence reads having sequence homology of at least about 90%, 95%, or 99%.

在一些实施方案中，所述方法还包括由两种以上的序列、序列读数、扩增子序列、划分的序列、比对的序列、或聚簇的序列形成共有序列。In some embodiments, the method further comprises forming a consensus sequence from two or more sequences, sequence reads, amplicon sequences, partitioned sequences, aligned sequences, or clustered sequences.

In some embodiments, amplifying the first CSs or MCSs limits amplification bias.

In some embodiments, at least 1,000, 5,000, 10,000, 20,000, 30,000, 40,000, 50,000, 1000,000, 500,000, or 1x10⁶, 2x10⁶, 3x10⁶, 4x10⁶, 5x10⁶, 6x10⁶, 7x10⁶, 8x10⁶, 9x10⁶, 1x10⁷, 2x10⁷, 3x10⁷, 4x10⁷, 5x10⁷, 6x10⁷, 7x10⁷, 8x10⁷, 9x10⁷, 1x10⁸, 2x10⁸, 3x10⁸, 4x10⁸, 5x10⁸, 6x10⁸, 7x10⁸, 8x10⁸, 9x10⁸, 1x10⁹, 2x10⁹, 3x10⁹, 4x10⁹, 5x10⁹, 6x10⁹, 7x10⁹, 8x10⁹, 9x10⁹, 1x10¹⁰, 2x10¹⁰, 3x10¹⁰, 4x10¹⁰, 5x10¹⁰, 6x10¹⁰, 7x10¹⁰, 8x10¹⁰, 9x10¹⁰, 1x10¹¹, 2x10¹¹, 3x10¹¹, 4x10¹¹, 5x10¹¹, 6x10¹¹, 7x10¹¹, 8x10¹¹, 9x10¹¹, 1x10¹², 2x10¹², 3x10¹², 4x10¹², 5x10¹², 6x10¹², 7x10¹², 8x10¹², or 9x10¹² polynucleotides are sequenced.

在一些实施方案中，样品是全血样品。In some embodiments, the sample is a whole blood sample.

在一些实施方案中，样品是FFPE样品。In some embodiments, the sample is an FFPE sample.

In some embodiments, the percentage of amplicons containing 10 or more UIDs is equal to the percentage of amplicons containing 10 or more UIDs generated from a purified polynucleotide sample.

In some embodiments, the percentage of amplicons containing 10 or more UIDs is lower than the percentage of amplicons containing 10 or more UIDs generated from a purified polynucleotide sample by less than about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10%.

在一些实施方案中，中靶的特异性(on target specificity)等于从纯化的多核苷酸样品观察到的中靶的特异性。In some embodiments, the on target specificity is equal to the on target specificity observed from the purified polynucleotide sample.

In some embodiments, the on-target specificity is lower than the on-target specificity observed from a purified polynucleotide sample by less than about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10%.

在一些实施方案中，覆盖的均匀性等于从纯化的多核苷酸样品观察到的覆盖的均匀性。In some embodiments, the uniformity of coverage is equal to the uniformity of coverage observed from purified polynucleotide samples.

In some embodiments, the uniformity of coverage is lower than the uniformity of coverage observed from a purified polynucleotide sample by less than about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10%.

In some embodiments, the method includes slow ramping rates during linear amplification.

在一些实施方案中，所述方法在延伸过程中包括缓慢的升降温速率。In some embodiments, the method includes a slow ramp rate during extension.

In some embodiments, the extension comprises holding the reaction at about 90°C-99°C for a first period, lowering the temperature to about 60°C at about 0.1°C/s, holding the reaction at about 55°C-60°C for a second period, adding a DNA polymerase, raising the temperature to about 65°C at about 0.1°C/s, holding the reaction at about 65°C for a third period, raising the temperature to about 80°C at about 0.1°C/s, and holding the reaction at about 80°C for a fourth period.

In some embodiments, the extension comprises holding the reaction at about 90°C-99°C for a first period, lowering the temperature to about 68°C at about 0.1°C/s, holding the reaction at about 68°C for a second period, lowering the temperature to about 55°C at about 0.1°C/s, holding the reaction at about 55°C for a third period, adding a DNA polymerase, raising the temperature to about 65°C at about 0.1°C/s, holding the reaction at about 65°C for a fourth period, raising the temperature to about 80°C at about 0.1°C/s, and holding the reaction at about 80°C for a fifth period.

In some embodiments, the linear amplification comprises holding the reaction at about 90°C-99°C for a first period, lowering the temperature to about 60°C at about 0.1°C/s, holding the reaction at about 60°C for a second period, raising the temperature to about 72°C at about 0.1°C/s, and holding the reaction at about 72°C for a third period.

在一些实施方案中，延伸包括以约0.1℃/s的速度降低和/或升高温度。In some embodiments, extending includes decreasing and/or increasing the temperature at a rate of about 0.1°C/s.

在一些实施方案中，线性扩增包括以约0.1℃/s的速度降低和/或升高温度。In some embodiments, linear amplification includes decreasing and/or increasing the temperature at a rate of about 0.1°C/s.
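To give a sense of what a ramp rate of about 0.1°C/s means in practice, the following sketch computes the time spent ramping between the setpoints of the linear-amplification profile described above. The 95°C starting point is an assumed value inside the stated 90°C-99°C range, the hold durations are left out because the text does not specify them, and the names used are hypothetical.

```python
RAMP_RATE_C_PER_S = 0.1  # ramp rate stated in the text, in degrees C per second

def ramp_seconds(start_c, end_c, rate=RAMP_RATE_C_PER_S):
    """Seconds needed to move between two setpoints at a constant ramp rate."""
    return abs(end_c - start_c) / rate

# Setpoints of the linear-amplification profile. 95 is an assumed value within
# the stated 90-99 degree range; 60 and 72 are taken from the text.
setpoints_c = [95.0, 60.0, 72.0]

# One ramp per consecutive pair of setpoints.
ramp_times_s = [ramp_seconds(a, b) for a, b in zip(setpoints_c, setpoints_c[1:])]
total_ramp_s = sum(ramp_times_s)
```

With these assumed setpoints the two ramps take roughly 350 s and 120 s, so ramping alone adds close to eight minutes to each pass through the profile before any hold time is counted.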

在一些实施方案中，第一引物、第二引物或二者处于固定的浓度。In some embodiments, the first primer, the second primer, or both are at a fixed concentration.

在一些实施方案中，延伸、扩增或二者在存在氯化镁、硫酸铵、D-(+)- 海藻糖、甜菜碱或它们的组合的条件下进行。In some embodiments, extension, amplification, or both are performed in the presence of magnesium chloride, ammonium sulfate, D-(+)-trehalose, betaine, or a combination thereof.

在一些实施方案中，第一引物中的每一个、第二引物或二者包括 60℃-68℃的解链温度。In some embodiments, each of the first primer, the second primer, or both comprises a melting temperature of 60°C-68°C.

在一些实施方案中，第一引物中的每一个、第二引物或二者包含 21-32个核苷酸的长度。In some embodiments, each of the first primers, the second primer, or both comprise a length of 21-32 nucleotides.

In some embodiments, each of the first primers, the second primers, or both does not contain 4 or more pyrimidines in the last 5 nucleotides of its 3' end.

在一些实施方案中，第一引物中的每一个、第二引物或二者设计为产生含有30％-70％GC含量的扩增子。In some embodiments, each of the first primers, the second primer, or both are designed to produce amplicons that contain 30%-70% GC content.

在一些实施方案中，第一引物中的每一个、第二引物或二者设计为产生长度为225-300个碱基对的扩增子。In some embodiments, each of the first primers, the second primer, or both are designed to produce amplicons that are 225-300 base pairs in length.

In some embodiments, each of the first primers, the second primers, or both exclude the primers from an initial primer set that have the highest number of misreads during extension, amplification, or both.

在一些实施方案中，第一引物中的每一个、第二引物或二者排除来自初始引物组的普遍形成二聚体的引物。In some embodiments, each of the first primer, the second primer, or both excludes prevalent dimer-forming primers from the initial primer set.

In some embodiments, each of the first primers, the second primers, or both exclude one or more primers from an initial primer set that are responsible for producing the highest total number of reads for one or more of the target polynucleotides.

In one aspect, provided is a method of selecting primers for a primer set comprising a plurality of first primers and a plurality of second primers, the method comprising: a first pass, in which the selected primers have a melting temperature of 60°C-68°C, a length of 21-32 nucleotides, and 3 or fewer pyrimidines in the last 5 nucleotides of the 3' end, and produce sequence reads having 30%-70% GC content and a length of 225-300 nucleotides; a second pass, in which the selected primers exclude the one or more primers that produce the highest number of misreads during extension or amplification, primers that produce a plurality of sequence reads comprising more than 1% primer-dimer sequences, and primers that produce, during extension or amplification, a plurality of sequence reads comprising 1% or more misreads and more than 0.3% primer-dimer sequences; and a third pass, in which the selected primers exclude the one or more primers that produce the highest total number of sequence reads.

In one aspect, provided is a method of excluding primers from a primer set comprising a plurality of first primers and a plurality of second primers, the method comprising: a first pass, in which the excluded primers include primers having a melting temperature below 60°C or above 68°C, primers having a length of fewer than 21 or more than 32 nucleotides, primers having 4 or more pyrimidines in the last 5 nucleotides of the 3' end, primers that produce sequence reads having less than 30% or more than 70% GC content, and primers that produce sequence reads shorter than 225 nucleotides or longer than 300 nucleotides; a second pass, in which the excluded primers include the one or more primers that produce the highest number of misreads during extension or amplification, primers that produce a plurality of sequence reads comprising more than 1% primer-dimer sequences, and primers that produce, during extension or amplification, a plurality of sequence reads comprising 1% or more misreads and more than 0.3% primer-dimer sequences; and a third pass, in which the excluded primers include the one or more primers that produce the highest total number of sequence reads.
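The first-pass criteria can be expressed as a single predicate over a primer candidate. The sketch below is only an illustration under stated assumptions: the `PrimerCandidate` record and its field names are hypothetical, the melting temperature is supplied by the caller because the text does not say how it is calculated, and the amplicon GC content and length are taken as predicted values for the reads that the primer would produce.

```python
from dataclasses import dataclass

PYRIMIDINES = set("CT")

@dataclass
class PrimerCandidate:
    sequence: str          # primer sequence written 5' to 3'
    tm_c: float            # melting temperature in degrees C (caller-supplied)
    amplicon_gc: float     # predicted GC fraction of the reads, between 0 and 1
    amplicon_length: int   # predicted read length in nucleotides

def passes_first_pass(primer):
    """Apply the first-pass thresholds quoted in the text."""
    last_five = primer.sequence[-5:].upper()
    pyrimidine_count = sum(base in PYRIMIDINES for base in last_five)
    return (
        60.0 <= primer.tm_c <= 68.0
        and 21 <= len(primer.sequence) <= 32
        and pyrimidine_count <= 3            # 4 or more pyrimidines is excluded
        and 0.30 <= primer.amplicon_gc <= 0.70
        and 225 <= primer.amplicon_length <= 300
    )

# Hypothetical candidates: the second fails only on its pyrimidine-rich 3' end.
good = PrimerCandidate("ACGTACGTACGTACGTACGGAGCA", 64.0, 0.50, 250)
bad_3prime = PrimerCandidate("ACGTACGTACGTACGTACGCTCTC", 64.0, 0.50, 250)
too_short = PrimerCandidate("ACGTACGTACGGAGCA", 64.0, 0.50, 250)

first_pass = [p for p in (good, bad_3prime, too_short) if passes_first_pass(p)]
```

Writing the selection and exclusion aspects as one predicate reflects that they describe the same thresholds from opposite sides: a primer is excluded exactly when the predicate is false.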

In one aspect, provided herein is a primer set comprising a plurality of primers, wherein each primer of the plurality has a melting temperature of 60°C-68°C, has a length of 21-32 nucleotides, has 3 or fewer pyrimidines in the last 5 nucleotides at its 3' end, and produces sequence reads with 30%-70% GC content and a length of 225-300 nucleotides.

In some embodiments, the primer set does not include: one or more primers that produce the highest number of misreads during extension or amplification; primers that produce a plurality of sequence reads comprising greater than 1% primer-dimer sequences; primers that produce, during an extension or amplification reaction, a plurality of sequence reads comprising 1% or more misreads and greater than 0.3% primer-dimer sequences; and one or more primers that produce the highest number of total sequence reads.

Brief Description of Drawings

The novel features described herein are set forth with particularity in the appended claims. A better understanding of the features and advantages described herein will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the features described herein are utilized, and to the accompanying drawings, of which:

Figure 1 shows a schematic of an exemplary targeted sequencing method described herein.

Figure 2 shows a schematic of an exemplary targeted sequencing method described herein.

Figure 3 shows a schematic of an exemplary workflow for producing an improved targeted sequencing method. Processing times are shown.

Figure 4 shows a table of the percent on-target specificity obtained with the indicated primer panels from the unimproved and improved targeted sequencing methods described herein, compared with other primer panels known in the art (Other #1 and Other #2). The Tex-1 panel is a carrier panel for 23 genes (all exons). CS-23 is an rsSNP-focused primer panel for 18 genes.

Figure 5 shows a graph of target read coverage for the indicated reaction conditions. The fraction of genes above a given coverage is shown relative to read depth for the indicated reaction conditions. Conditions with a positive effect on sequence coverage are shown in bold.

Figure 6A shows a schematic of the ramping and annealing conditions for the indicated steps of an exemplary targeted sequencing method under less stringent conditions, for a panel of 30 primers.

Figure 6B shows the primer concentrations in a panel of about 350 primers used under the ramping and annealing conditions of Figure 6A, which were insufficient to yield adequate target product.

Figure 7 shows schematics of an exemplary targeted sequencing method under less stringent (top) and more stringent (bottom) ramping and annealing conditions. Stringency was increased by slowing the ramp rate for the second primer extension step. Stringency was increased by adding a 68°C hold step to the first primer extension step. Stringency was increased by lowering the minimum annealing temperature to 55°C.

Figure 8 shows primer concentrations and the results obtained when all, one half, one quarter, or a small number of the primers from a panel of about 350 primers were used under the less stringent conditions shown in Figure 8, with the concentration of each primer held fixed (Group 1) or the total primer concentration held fixed (Group 2).

Figure 9A shows products on an agarose gel after the indicated PCR cycles of an exemplary targeted sequencing method, alongside a 100 base pair (bp) ladder, without additives. Target products and dimer products are indicated.

Figure 9B shows products on an agarose gel after the indicated PCR cycles of an exemplary targeted sequencing method, alongside a 100 bp ladder, with betaine as the additive. Target products and dimer products are indicated.

Figure 9C shows products on an agarose gel after the indicated PCR cycles of an exemplary targeted sequencing method, alongside a 100 bp ladder, with trehalose as the additive. Target products and dimer products are indicated.

Figure 9D shows products on an agarose gel after the indicated PCR cycles of an exemplary targeted sequencing method, alongside a 100 bp ladder, with magnesium chloride as the additive. Target products and dimer products are indicated.

Figure 9E shows products on an agarose gel after the indicated PCR cycles of an exemplary targeted sequencing method, alongside a 100 bp ladder, with ammonium sulfate as the additive. Target products and dimer products are indicated.

Figure 10 shows products on an agarose gel under the indicated conditions after 33 PCR cycles of an exemplary targeted sequencing method, alongside a 100 base pair (bp) ladder.

Figure 11 shows a graph of dimer sequence analysis plotted against sequence length. The corresponding dimer products that were sequenced are shown on the agarose gel at right.

Figure 12 shows a diagram of the proposed mechanism of unwanted product formation during the second primer extension step, as determined by dimer sequence analysis. Dimer formation is driven by primers with high melting temperatures at low annealing temperatures. Dimer formation is driven by primers with high GC content interacting with the UID. The figure discloses SEQ ID NOS 90-91, 92, 91 and 93, respectively, in order of appearance.

Figure 13 shows a chart of genes and associated diseases, the number of exons, and the number of probe sets for the exemplary primer panel CS-350. The list of exons without primers at right indicates exons for which primer design methods other than those described herein produced no primer sequence.

Figure 14 shows a chart of the exclusion criteria used to generate primer subsets from a primer panel of about 350 primers.

Figure 15A shows a graph of on-target specificity and coverage uniformity at a 100x cap for the indicated primer panel of about 350 primers and the subsets generated from it using the exclusion criteria of Figure 14.

Figure 15B shows a chart of on-target specificity, coverage uniformity at a 100x cap, and mean read depth for the indicated primer panel of about 350 primers and the subsets generated from it using the exclusion criteria of Figure 14.

Figure 16 shows a graph of coverage uniformity across the indicated in silico caps for the indicated primer panel of about 350 primers and the subsets generated from it using the exclusion criteria of Figure 14.

Figure 17A shows a graph of the on-target specificity of an exemplary targeted sequencing method described herein using three different UIDs (BC_01, BC_02, BC_03).

Figure 17B shows the post-PCR products of an exemplary targeted sequencing method described herein using three different UIDs.

Figure 17C shows a chart with the values corresponding to Figure 17A.

Figure 18 shows a graph of the percentage of amplicons greater than 20% of the mean across the indicated in silico caps, illustrating the coverage uniformity of an exemplary targeted sequencing method described herein using three different UIDs.
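The uniformity metric named in the Figure 18 description can be illustrated with a short calculation. This is a sketch under explicit assumptions, since the text does not define the computation: an "in silico cap" is taken to mean truncating each amplicon's read count at the cap value before averaging, and uniformity is taken to be the percentage of amplicons whose capped count exceeds 20% of the capped mean. The function name and parameters are hypothetical.

```python
from typing import Sequence


def coverage_uniformity(read_counts: Sequence[int], cap: int, threshold: float = 0.20) -> float:
    """Percentage of amplicons whose capped depth exceeds `threshold` x the capped mean.

    Assumption: the in silico cap truncates each amplicon's read count at `cap`.
    """
    if not read_counts:
        raise ValueError("read_counts must not be empty")
    capped = [min(count, cap) for count in read_counts]
    mean_depth = sum(capped) / len(capped)
    above = sum(depth > threshold * mean_depth for depth in capped)
    return 100.0 * above / len(capped)
```

Under this reading, capping prevents a few very deep amplicons from inflating the mean, so the metric reflects how many amplicons are adequately covered rather than how skewed the deepest ones are.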

Figure 19 shows a graph comparing the accuracy of raw reads (no UID) with UID-enhanced accuracy.

Figure 20 shows a schematic of a SNP detection and sequence analysis pipeline using an exemplary targeted sequencing method.

Figure 21 shows a graph and corresponding chart of the relative percentage of SNP calls that match between samples, using an exemplary targeted sequencing method described herein with the indicated UIDs. Reduced cycling with BC_6 resulted in a higher number of unique molecules.

Figure 22A shows a graph of the percentage of reads per amplicon versus amplicon %GC content using an exemplary targeted sequencing method described herein. There is a large number of low performers in group A.

Figure 22B shows a graph of the percentage of reads per amplicon versus amplicon %GC content using an exemplary targeted sequencing method described herein.

Figure 23A shows a graph of low-performing amplicons plotted by their respective primer melting temperatures.

Figure 23B shows a graph of low-performing amplicons plotted by their respective primer melting temperatures.

Figure 24 shows a graph of low-, medium-, and high-performing amplicons plotted by their respective primer melting temperatures.

Figure 25 shows a chart summarizing the settings for improved primer design for targeted sequencing methods.

Figure 26 shows a schematic of improved off-target hit calling criteria for targeted sequencing methods.

Figure 27 shows a schematic of improved primer design for use in targeted sequencing methods. One design improvement is the addition of an intronic buffer sequence of about 20 nucleotides. Another design improvement is evenly split exons, for better coverage and increased flexibility.

Figure 28 shows a chart comparing an improved primer panel (v.3.0) designed using the primer design methods described herein with a primer panel designed using a prior art method (v.1.0). The improved panel yields higher amplification efficiency and a larger number of unique molecules detected, which improves SNP calling, reduces the sequence coverage required, and lowers sample input requirements.

Figure 29 shows a graph comparing the coverage uniformity of primers in the indicated subsets that meet the improved primer design criteria with that of primers in the same subsets that do not meet the criteria. Primers in the subsets that meet the improved primer design criteria show higher coverage uniformity, higher on-target specificity, and higher read counts.

Figure 30A shows a graph of the performance of whole blood samples with respect to uniformity, coverage, and on-target specificity, compared with DNA samples extracted from whole blood.

Figure 30B shows products on an agarose gel after the indicated PCR cycles of an exemplary targeted sequencing method using the indicated volumes of whole blood sample, alongside a 100 base pair (bp) ladder. As little as 1 μL of whole blood can be used.

Figure 31 shows a graph (top) and corresponding table (bottom) comparing the number of amplicons with more than 10 unique molecules from whole blood samples and from DNA samples extracted from whole blood. The 3x whole blood samples combine three first primer extension reactions prior to adaptor ligation.

Figure 32 shows tables of the differences in SNP calls between whole blood samples and DNA samples extracted from whole blood. The top table shows DNA calls missed using whole blood samples. The top table shows SNP calls missed using DNA samples extracted from whole blood. The figure discloses SEQ ID NOS 94-95, 94-96 and 96-97, respectively, in order of appearance.

Figure 33 shows the performance of FFPE prostate tissue samples with respect to uniformity, coverage, and on-target specificity, compared with whole blood samples and DNA samples extracted from whole blood.

Figure 34 shows products on an agarose gel from an exemplary targeted sequencing method using various samples, alongside a 100 bp ladder. The methods described herein can accommodate a variety of samples, including whole blood or saliva input directly into the first primer extension reaction (without prior nucleic acid extraction), buccal samples, and FFPE samples.

Figure 35 shows a graph (top) and corresponding table (bottom) comparing the number of amplicons with more than 10 unique molecules from FFPE samples, whole blood samples, and DNA samples extracted from whole blood. The 3x whole blood samples combine three first primer extension reactions prior to adaptor ligation.

Figure 36 shows a graph of the number of unique molecules detected using the indicated numbers of PCR cycles and the indicated UIDs. The graph indicates that reducing the number of PCR cycles prevents formation of over-amplified larger products, reduces PCR duplicates, can allow the required sequencing depth to be reduced, can improve data for low-input samples, and that linear amplification can be balanced to compensate for the reduced PCR cycles.

Figure 37 shows a graph of the quality of sequencing data for libraries generated using an exemplary targeted sequencing method that were gel purified, compared with Ampure-purified libraries.

Figure 38 shows a graph of an in silico read titration, showing the percentage of amplicons covered by 10 or more unique molecules. Sequencing at a mean read depth of 500x per amplicon provided sufficient unique molecule coverage for 95% of the amplicons in the Tex_01 primer panel (336 amplicons). This can allow multiplexing of 90 samples per run (336 x 500 = 168,000 reads).
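The read budget stated for Figure 38 can be checked with a short calculation. The sketch below assumes that 168,000 is the per-sample read requirement (336 amplicons at a mean depth of 500x) and that a 90-sample run therefore needs 90 times that number of reads; the function names are illustrative only.

```python
def reads_per_sample(n_amplicons: int, mean_depth: int) -> int:
    """Reads needed for one sample at the given mean depth per amplicon."""
    return n_amplicons * mean_depth


def reads_per_run(n_amplicons: int, mean_depth: int, n_samples: int) -> int:
    """Total reads needed to multiplex n_samples in a single run."""
    return reads_per_sample(n_amplicons, mean_depth) * n_samples


per_sample = reads_per_sample(336, 500)   # 168,000 reads, as stated in the text
per_run = reads_per_run(336, 500, 90)     # 15,120,000 reads for 90 samples
```

On these assumptions, a 90-plex run needs a sequencing output of roughly 15 million reads.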

Figure 39 shows a table of the predicted and actual number of sequences per barcoded sample and the percentage of total sequence reads per barcoded sample.

Figure 40 shows a graph of copy number quantification, showing the proportion of unique molecules captured for each gene. The proportion of unique (UID-filtered) reads for a given gene is compared between genes on autosomes and genes on the X chromosome. The ratio of reads between a male reference patient and three test patients is shown. This demonstrates the quantitative capability of targeted sequencing with UID analysis.

Figure 41 shows a schematic of an exemplary RNA-based method for primer extension targeted sequencing, with a demonstrated ability to amplify products more than 700 bp in length.

Figure 42A shows products on an agarose gel from an exemplary RNA-targeted sequencing method after the indicated PCR cycles using the indicated amounts of RNA input, alongside a 100 base pair (bp) ladder.

Figure 42B shows an exemplary list of targets to which the exemplary RNA-targeted sequencing method described herein has been successfully applied.

Figure 43 shows products on an agarose gel from an exemplary targeted sequencing method performed as technical replicates, alongside a 100 bp ladder. This demonstrates the reproducibility of the methods described herein.

Figure 44 shows a schematic of exemplary primer design software developed for generating primer panels.

Figure 45A shows the percentage of amplicons greater than 20% of the mean across the indicated in silico caps, illustrating the coverage uniformity of an exemplary targeted sequencing method described herein using the indicated primer subsets (expressed as fold read coverage normalized to a 100x median).

Figure 45B shows a graph comparing the coverage uniformity of the indicated primer panel of about 350 primers with that of the subsets generated from it using the exclusion criteria shown in Figure 15 and other methods described herein.

Figure 46 shows a chart summarizing quality metrics of the methods described herein.

Figure 47 shows products of a DNA-targeted sequencing method on a 2% agarose gel, alongside a 100 base pair (bp) ladder. Samples from 2 patients (B1 and B2) are shown after the indicated cycles of PCR amplification.

Figure 48 shows products of an RNA-targeted sequencing method on a 2% agarose gel, alongside a 100 bp ladder. Samples from 2 patients (B1 and B2) are shown after the indicated cycles of PCR amplification. For each patient, the starting RNA input material was titrated from 1000 ng down to 1 ng.

Figure 49 shows bar graphs of the results of a post-next-generation sequencing (NGS) data filtering procedure applied using a targeted sequencing method. The bar graph in Figure 49B is a log-scale version of Figure 49A. NGS was performed using paired-end reads (R1 and R2), yielding a total of approximately 6 million reads for the indicated samples. Sequences with a phred Q score of 30 or above were analyzed further (passed quality, R1 and R2). The sequence data were then queried for the presence of the expected primer panel used in the DNA-targeted library workflow (passed primer, R1 and R2). Any sequencing read that did not begin with one of the expected primer sequences was discarded. For each read with a known or expected primer sequence on R1, the expected primer on R2 was checked (paired R1 and R2). Thus, when a known R1 primer is paired with a primer for a different target on R2 (or vice versa), the read pair corresponds to a non-specific amplification product (shown in light grey). When a known R1 primer is not paired with a primer for a different target on R2 (or vice versa), the read pair corresponds to a specific amplification product (shown in dark grey).
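The three filtering steps described for Figure 49 (quality, primer presence, and R1/R2 primer pairing) can be sketched as a classifier for a single read pair. This is an illustrative sketch and not the applicants' pipeline: the function names are hypothetical, the quality test is assumed to be a mean phred score of at least 30 per read, and primers are assumed to be supplied as dictionaries mapping a target name to the primer sequence expected at the start of R1 or R2.

```python
from typing import Dict, Optional, Sequence


def mean_phred(qualities: Sequence[int]) -> float:
    """Mean phred score of a read (assumed quality statistic)."""
    return sum(qualities) / len(qualities)


def find_primer(read: str, primers: Dict[str, str]) -> Optional[str]:
    """Return the target whose primer the read starts with, or None."""
    for target, primer in primers.items():
        if read.startswith(primer):
            return target
    return None


def classify_pair(
    r1_seq: str,
    r1_qual: Sequence[int],
    r2_seq: str,
    r2_qual: Sequence[int],
    r1_primers: Dict[str, str],
    r2_primers: Dict[str, str],
    min_q: float = 30.0,
) -> str:
    """Classify one read pair as low_quality, no_primer, non_specific, or specific."""
    # Step 1: both reads must pass the quality threshold.
    if not r1_qual or not r2_qual:
        return "low_quality"
    if mean_phred(r1_qual) < min_q or mean_phred(r2_qual) < min_q:
        return "low_quality"
    # Step 2: both reads must begin with an expected primer sequence.
    target_r1 = find_primer(r1_seq, r1_primers)
    target_r2 = find_primer(r2_seq, r2_primers)
    if target_r1 is None or target_r2 is None:
        return "no_primer"
    # Step 3: R1 and R2 primers must belong to the same target.
    return "specific" if target_r1 == target_r2 else "non_specific"
```

Counting the four labels over all read pairs gives the kind of per-category totals that a bar graph of filtering results would display.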

Figures 50A-50C show graphs of sequencing reads for a DNA-targeted panel. Each indicated gene was targeted by a specific primer pair used to prepare the DNA sample. Figure 50A shows graphs of sequencing read counts for the DNA-targeted panel using a first primer pair (BC3) without UID (left) and with UID (right). Figure 50B shows graphs of sequencing read counts for the DNA-targeted panel using the targeted sequencing method with a second primer pair (BC1) without UID (left) and with UID (right). Figure 50C shows a graph of sequencing read counts for the DNA-targeted panel using the targeted sequencing method after UID filtering.

Figures 51A-51B show sequencing read counts for an RNA-targeted panel using the targeted sequencing method with UID filtering. Each indicated gene transcript was targeted by a specific primer pair used to prepare the RNA sample. Figure 51A shows a graph of sequencing read counts (left) and a graph of sequencing read frequencies (right) for the RNA-targeted panel. Figure 51B is a control scale version of the graph of sequencing read frequency shown in Figure 51A (left). The data shown here represent filtered read/expression counts.

Figure 52 shows a graph of the results of target specificity analysis for the indicated targets and conditions using the targeted sequencing method. Various workflow conditions (number of cycles, buffers, annealing conditions, etc.) were tested. As shown, 99.2% target specificity was obtained under some conditions (for example, 99.2% of sequence reads were the desired targets, with minimal non-specific amplification).

Figure 53 shows a graph of the UID distribution for the indicated targets and conditions using the targeted sequencing method with UID filtering. Various workflow conditions (number of cycles, buffers, annealing conditions, etc.) were tested. The number of raw sequences per UID may vary with the conditions used.

Figure 54 shows a graph of the putative increase in the sequencing accuracy phred score (Q) versus the number of reads per UID sequence using the targeted sequencing method with UID filtering.

Figure 55 shows a graph of the increase in accuracy for each indicated target when UID filtering is used with the targeted sequencing method.

Figure 56 shows a graph of UID consensus analysis and the accuracy of SNP genotyping analysis using the targeted sequencing method with UID filtering. Different DNA target regions (y-axis) were examined under different experimental conditions (x-axis). The consensus sequence for each indicated target is shown in grey, and mutations/SNPs are shown in white. Homozygous genes are predominantly grey. Heterozygous genes are shown in white for about 50% of their sequences (the remainder showing the consensus sequence). PCR-induced mutations and indels, or sequencing errors, are shown in black.
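The idea behind UID consensus analysis can be illustrated with a minimal majority-vote sketch. It is offered under stated assumptions rather than as the method used to produce Figure 56: reads are assumed to be already trimmed and aligned so that position i means the same base in every read sharing a UID, `uid_consensus` and `min_reads` are hypothetical names, and UID families with fewer than `min_reads` reads are simply dropped.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def uid_consensus(reads: Iterable[Tuple[str, str]], min_reads: int = 2) -> Dict[str, str]:
    """Collapse reads that share a UID into one majority-vote consensus sequence.

    `reads` yields (uid, sequence) pairs. Assumes sequences within a UID family
    are aligned position by position.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for uid, sequence in reads:
        families[uid].append(sequence)

    consensus: Dict[str, str] = {}
    for uid, sequences in families.items():
        if len(sequences) < min_reads:
            continue  # too few reads to outvote an error
        length = min(len(s) for s in sequences)
        consensus[uid] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus
```

Because each UID family derives from a single original molecule, a base seen in only a minority of the family's reads is more likely a PCR or sequencing error than a true variant, which is why collapsing to a consensus can raise accuracy.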

Figure 57 shows sequence analysis of the GBA gene using the targeted sequencing method with UID filtering. The two alleles of the GBA gene from a patient sample were aligned using Clustal W. The patient showed heterozygosity after UID filtering. Both alleles were compared with the human genome reference. The absence of "*" indicates a mismatch in the alignment in one of the 3 sequences. The GBA gene of the patient presented here has one allele identical to the human reference genome, and the second allele has 6 observed sequence polymorphisms/mutations. The figure discloses SEQ ID NOS 98-100, respectively, in order of appearance.

Figure 58 shows SNP genotyping analysis of the GBA gene in the same patient examined in Figure 12. The data presented here show the location of a pathogenic SNP present in the GBA gene target (rs1064644), found using the targeted sequencing method with UID filtering described herein.

Figure 59 is a block diagram illustrating a first example architecture of a computer system that can be used in connection with example embodiments of the present invention.

Figure 60 is a diagram illustrating a computer network that can be used in connection with example embodiments of the present invention.

Detailed Description

As used herein, amplifying includes performing an amplification reaction. The product of a primer extension reaction can include the primer sequence together with the complement of the template produced during primer extension. In some embodiments, an amplification reaction includes extension of two primers, each hybridized to a complementary strand of a polynucleotide. Amplification of polynucleotides can be performed by any means known in the art. Polynucleotides can be amplified by polymerase chain reaction (PCR) or by isothermal DNA amplification.

An amplification reaction can contain one or more additives. In some embodiments, the one or more additives are dimethyl sulfoxide (DMSO), glycerol, betaine (mono)hydrate (N,N,N-trimethylglycine = [carboxymethyl]trimethylammonium), trehalose, 7-deaza-2'-deoxyguanosine triphosphate (dC7GTP or 7-deaza-2'-dGTP), BSA (bovine serum albumin), formamide (methanamide), ammonium sulfate, magnesium chloride, tetramethylammonium chloride (TMAC), other tetraalkylammonium derivatives (for example, tetraethylammonium chloride (TEA-Cl) and tetrapropylammonium chloride (TPrA-Cl)), non-ionic detergents (for example, Triton X-100, Tween 20, Nonidet P-40 (NP-40)), or PREXCEL-Q. In some embodiments, an amplification reaction can include 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 different additives. In other cases, an amplification reaction can include at least 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 different additives. In some embodiments, an extension, reverse transcription, or amplification reaction that includes one or more additives is characterized by an increase.

As used herein, polymerase chain reaction (PCR) includes an in vitro amplification reaction of a specific polynucleotide sequence by simultaneous primer extension of the complementary strands of a double-stranded polynucleotide. A PCR reaction produces multiple copies of the template polynucleotide flanked by the primer binding sites. With two primers, the copy number of both strands of the template polynucleotide increases exponentially with each cycle, because both strands are replicated in every cycle. The resulting polynucleotide duplexes have ends corresponding to the ends of the primers used. PCR can comprise one or more repetitions of denaturing the template polynucleotide, annealing primers to the primer binding sites, and extending the primers with a DNA or RNA polymerase in the presence of nucleotides. The particular temperatures, the duration of each step, and the rates of change between steps depend on many factors well known to those of ordinary skill in the art (McPherson et al., IRL Press, Oxford (1991 and 1995)). For example, in conventional PCR using Taq DNA polymerase, a double-stranded template polynucleotide can be denatured at a temperature >90°C, primers can be annealed at a temperature in the range of 50-75°C, and primers can be extended at a temperature in the range of 72-78°C. In some embodiments, PCR includes reverse transcription PCR (RT-PCR), real-time PCR, nested PCR, quantitative PCR, multiplex PCR, and the like. In some embodiments, PCR does not include RT-PCR. (U.S. Patent Nos. 5,168,038, 5,210,015, 6,174,670, 6,569,627, and 5,925,517; Mackay et al., Nucleic Acids Research, 30: 1292-1305 (2002)). RT-PCR comprises a PCR reaction preceded by a reverse transcription reaction, in which the resulting cDNA is amplified. Nested PCR comprises a two-stage PCR in which the amplicon of a first PCR reaction using a first primer set becomes the sample for a second PCR reaction using a second primer set, at least one primer of which binds to an interior location of the amplicon of the first PCR reaction. Multiplex PCR comprises a PCR reaction in which a plurality of polynucleotide sequences are subjected to PCR simultaneously in the same reaction mixture. PCR reaction volumes can be anywhere from 0.2 nL to 1000 μL. Quantitative PCR comprises a PCR reaction designed to measure the absolute or relative amount, abundance, or concentration of one or more sequences in a sample. Quantitative measurements can include comparing one or more reference sequences or standards to a polynucleotide sequence of interest. (Freeman et al., Biotechniques, 26: 112-126 (1999); Becker-Andre et al., Nucleic Acids Research, 17: 9437-9447 (1989); Zimmerman et al., Biotechniques, 21: 268-279 (1996); Diviacco et al., Gene, 122: 3013-3020 (1992); Becker-Andre et al., Nucleic Acids Research, 17: 9437-9446 (1989)).

As used herein, an allele can be a specific genetic sequence within a cell, an individual, or a population that differs from other sequences of the same gene at at least one variant site within the gene sequence. The sequence at a variant site that differs between alleles can be a variant, such as a polymorphism or a mutation. Variants can include point mutations, polymorphisms, single nucleotide polymorphisms (SNPs), single nucleotide variations (SNVs), translocations, insertions, deletions, amplifications, inversions, internal deletions, copy number variations (CNVs), loss of heterozygosity, or any combination thereof. A sample is "heterozygous" at a chromosomal locus if it has two different alleles at that locus. A sample is "homozygous" at a chromosomal locus if it has two identical alleles at that locus.

In some embodiments, variants can include changes that affect a polypeptide, such as changes in expression level, sequence, function, localization, binding partners, or any combination thereof. In some embodiments, a genetic variation can be a frameshift mutation, a nonsense mutation, a missense mutation, a neutral mutation, or a silent mutation. For example, when compared to a reference nucleotide sequence, sequence differences can include the insertion or deletion of a single nucleotide or of more than one nucleotide, resulting in a frameshift; a change of at least one nucleotide, resulting in a change in the encoded amino acid; a change of at least one nucleotide, resulting in a premature stop codon; a deletion of several nucleotides, resulting in the deletion of one or more amino acids encoded by those nucleotides; an insertion of one or several nucleotides, such as by unequal recombination or gene conversion, resulting in an interruption of the coding sequence of a reading frame; duplication of all or part of a sequence; transposition; or a rearrangement of a nucleotide sequence. Such sequence changes can alter the polypeptide encoded by the nucleic acid. For example, if a change in the nucleic acid sequence causes a frameshift, the frameshift can result in a change in the encoded amino acids and/or in the generation of a premature stop codon, causing production of a truncated polypeptide. In some embodiments, a variant can be a synonymous change in one or more nucleotides, e.g., a change that does not alter the amino acid sequence. Such a polymorphism can, for example, alter splice sites, affect the stability or transport of mRNA, or otherwise affect the transcription or translation of the encoded polypeptide. In some embodiments, a synonymous mutation can result in a polypeptide product with an altered structure because of the use of rare codons that affect polypeptide folding during translation; in some cases, if the polypeptide is a drug target, this may alter its function and/or drug-binding properties. In some embodiments, changes that can alter DNA increase the likelihood that structural changes, such as amplifications or deletions, occur at the somatic level.
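The variant categories above (frameshift, nonsense, missense, synonymous) can be illustrated with a minimal Python sketch. It is not part of the described methods. It assumes the standard genetic code, in-frame coding sequences that start at the first codon position, and hypothetical function names (`translate`, `classify_variant`); a change that removes a stop codon is simply reported as missense in this simplified version.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def classify_variant(ref_cds: str, alt_cds: str) -> str:
    """Classify the coding effect of alt_cds relative to ref_cds (hypothetical helper)."""
    if (len(alt_cds) - len(ref_cds)) % 3 != 0:
        return "frameshift"            # indel length not a multiple of three
    if len(alt_cds) != len(ref_cds):
        return "in-frame indel"        # whole codons inserted or deleted
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    for ref_aa, alt_aa in zip(ref_protein, alt_protein):
        if ref_aa != alt_aa:
            # First changed residue decides: a new stop codon is a nonsense change.
            return "nonsense" if alt_aa == "*" else "missense"
    return "synonymous"                # nucleotides may differ, protein does not


if __name__ == "__main__":
    reference = "ATGGAAGTT"  # Met-Glu-Val
    for alt in ("ATGGAGGTT", "ATGGATGTT", "ATGTAAGTT", "ATGGAGTT"):
        print(alt, classify_variant(reference, alt))
```

A synonymous call here only means that the amino acid sequence is unchanged; as the text notes, such a change may still affect splicing, mRNA stability, or folding, which this sketch does not model.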

As used herein, a polymorphism can be the presence in a population of two or more genetically determined alternative sequences or alleles. A polymorphic marker or site includes the locus at which the divergence occurs. In some embodiments, a polymorphism occurs at a frequency of less than 0.5%, 1%, 2%, or 5%. In some embodiments, a polymorphism occurs at a frequency of greater than 1%, 5%, 10%, 20%, or 30%. In some embodiments, a biomarker has at least two alleles, each occurring at a frequency of greater than 1%, 5%, 10%, or 20% in a selected population. In some embodiments, a polymorphism comprises a viral or bacterial sequence and occurs at a frequency of less than 0.5%, 1%, 2%, or 5% in a selected population. A polymorphism can include one or more variants, including base changes, insertions, duplications, or deletions of one or more bases. Polymorphisms can include single nucleotide polymorphisms (SNPs). Copy number variants (CNVs), transversions, and other rearrangements are also forms of variants. Polymorphisms include restriction fragment length polymorphisms, variable number tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements. The most common allelic sequence in a selected population can be the wild-type allele. A diploid organism can be homozygous or heterozygous for an allele.
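As an illustration of the frequency thresholds above, the following Python sketch computes allele frequencies from a list of observed alleles and checks whether at least two alleles each exceed a chosen frequency. The function names and the default 1% threshold are illustrative assumptions, not definitions from this document.

```python
from collections import Counter
from typing import Dict, Iterable


def allele_frequencies(alleles: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of observations carrying each allele."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def has_two_common_alleles(alleles: Iterable[str], min_frequency: float = 0.01) -> bool:
    """True if at least two alleles each occur at a frequency above min_frequency."""
    frequencies = allele_frequencies(alleles)
    return sum(f > min_frequency for f in frequencies.values()) >= 2


if __name__ == "__main__":
    observed = ["A"] * 95 + ["G"] * 5      # made-up population sample
    print(allele_frequencies(observed))
    print(has_two_common_alleles(observed, min_frequency=0.01))
```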

As used herein, genotyping includes determining the genetic sequence of a subject at one or more genomic positions. For example, genotyping can include determining which allele or alleles a subject has for a single SNP or for two or more SNPs. A diploid subject can be homozygous for either of two possible alleles, or heterozygous. Normal cells that are heterozygous at one or more loci can give rise to tumor cells that are homozygous at those loci. This loss of heterozygosity (LOH) can result from deletion of the normal gene, loss of the chromosome carrying the normal gene, mitotic recombination, or loss of the chromosome carrying the normal gene together with duplication of the chromosome carrying the deleted or inactivated gene. LOH can be copy-neutral or can result from a deletion or an amplification.
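The zygosity and LOH definitions above can be sketched in Python as follows. This is a simplified illustration with hypothetical names (`Genotype`, `zygosity`, `shows_loh`): each locus is represented only by its two called alleles, and copy number is not modeled, so the sketch cannot distinguish copy-neutral LOH from LOH caused by a deletion.

```python
from typing import Tuple

Genotype = Tuple[str, str]  # the two alleles called at one diploid locus


def zygosity(genotype: Genotype) -> str:
    """Classify a locus as homozygous or heterozygous from its two alleles."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def shows_loh(normal: Genotype, tumor: Genotype) -> bool:
    """True when a locus heterozygous in normal cells is homozygous in tumor cells."""
    return zygosity(normal) == "heterozygous" and zygosity(tumor) == "homozygous"


if __name__ == "__main__":
    print(shows_loh(normal=("A", "G"), tumor=("G", "G")))
    print(shows_loh(normal=("A", "G"), tumor=("A", "G")))
```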

As used herein, subjects, individuals, and patients include living organisms, including mammals. Examples of subjects or hosts include, but are not limited to, horses, cows, camels, sheep, pigs, goats, dogs, cats, rabbits, guinea pigs, mice (e.g., humanized mice), gerbils, non-human primates (e.g., macaques), humans, and the like, as well as non-mammals, including, for example, non-mammalian vertebrates such as birds (e.g., chickens or ducks), fish (e.g., sharks), or frogs (e.g., Xenopus), and non-mammalian invertebrates, and transgenic species thereof. In certain aspects, a subject refers to a single organism (e.g., a human). In certain aspects, a group of individuals is instead provided, forming a small cohort that shares a common immune factor under study and/or a disease, and/or a cohort of individuals without the disease (e.g., negative/normal controls). A subject from whom a sample is obtained can have a disease and/or disorder (e.g., one or more allergies, infections, cancers, or autoimmune disorders) and can be compared against a negative control subject who is not affected by the disease.

Targeted Sequencing Methods in General

The methods described herein can be used to generate polynucleotide libraries for sequencing. Sequences of polynucleotides in a sample can be determined with high accuracy and with high confidence in base calls. The methods can include specifically targeting, uniquely barcoding, modifying, amplifying, sequencing, and/or quantifying DNA or RNA sequences present in a sample. These methods allow the addition of sequences that can format a library of polynucleotide amplicons for sequencing or other molecular analyses. Sequencing libraries generated by these methods can incorporate a UID that allows sequence reads derived from the same initial RNA or DNA molecule in the sample to be grouped together. These methods can make it possible to determine whether an observed variant present in a population of RNA or DNA molecules is a true polymorphism or mutation, or whether the observed sequence variant results from an amplification artifact (such as an amplification error or bias). In any of the methods described herein, the UID is contemplated to be optional. Thus, any reference to a "UID" refers to an optional UID.
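The idea of grouping reads by UID to separate true variants from amplification or sequencing errors can be sketched in Python as follows. This is an illustrative sketch, not the filtering procedure of this document: the minimum family size, the majority fraction, the use of "N" for unresolved positions, and all function names are assumptions, and reads within a family are assumed to be aligned and of equal length.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_uid(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect read sequences into families keyed by their UID."""
    families: Dict[str, List[str]] = defaultdict(list)
    for uid, sequence in reads:
        families[uid].append(sequence)
    return dict(families)


def consensus(sequences: List[str], min_fraction: float = 0.6) -> str:
    """Per-position majority base; 'N' where no base reaches min_fraction."""
    called = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) >= min_fraction else "N")
    return "".join(called)


def uid_consensus(
    reads: Iterable[Tuple[str, str]],
    min_family_size: int = 3,
    min_fraction: float = 0.6,
) -> Dict[str, str]:
    """One consensus sequence per UID family that has enough supporting reads."""
    return {
        uid: consensus(sequences, min_fraction)
        for uid, sequences in group_by_uid(reads).items()
        if len(sequences) >= min_family_size
    }


if __name__ == "__main__":
    reads = [
        ("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"),  # one read carries an error
        ("GGT", "ACTT"), ("GGT", "ACTT"), ("GGT", "ACTT"),  # variant seen in every read
        ("TTA", "ACGT"),                                     # family too small to trust
    ]
    print(uid_consensus(reads))
```

In this toy input, a base seen in only one read of a family is outvoted, whereas a base shared by every read of a family survives into the consensus, which is the distinction between an artifact and a variant present in the original molecule.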

These include methods of preparing polynucleotide libraries, generated using target-specific primers, to be sequenced on an NGS platform. A variety of biological targets, such as targets from a biological patient sample, can be analyzed from an NGS-compatible library after sequencing. The methods allow the frequency of targets to be determined (e.g., gene expression or allele distribution). The methods also allow mutations or SNPs to be identified in a genome or transcriptome (such as from a diseased or non-diseased subject), from which accurate sequencing information can be obtained. The methods also allow the presence or absence of a contamination or infection in a biological sample from a subject to be determined, such as by using target-specific primers directed to exogenous organisms or viruses (such as bacteria or fungi).

The methods described herein provide a favorable balance of sensitivity and specificity, together with the advantages conferred by linear primer extension reactions and/or UID tagging. In some embodiments, the methods are designed for smaller panel sizes, such as panels of clinical interest. These methods can have very low up-front costs, can be performed quickly, and are amenable to RNA or DNA targets. Furthermore, designing primers for these methods is not cumbersome and is about as easy as designing primers for a standard PCR reaction. The methods can be used to format polynucleotide libraries for a variety of sequencing and other molecular analyses. In addition, multiple applications can be carried out separately or simultaneously. For example, the target sequencing required for cancer mutation profiling, analysis of SNPs and mutations, detection of carriers, detection of infections, diagnosis of disease, and analysis of gene expression can be performed separately or simultaneously.

Initial targeting: forming UID-tagged polynucleotides complementary to target polynucleotides

Depending on the type of polynucleotide target to be analyzed, the methods can use reverse transcription (RT) or primer extension (PE). The primer extension reaction can be a single primer extension step. The primer extension reaction can include extending one or more individual primers once. The primer extension reaction can include extending one or more individual primers in one step. In some embodiments, a polynucleotide complementary to a DNA target can be generated by performing a primer extension reaction. For example, a UID-tagged polynucleotide complementary to a DNA target can be generated by performing a primer extension reaction. In some embodiments, a target polynucleotide complement, such as a UID-tagged polynucleotide complementary to an RNA target, can be generated by performing a reverse transcription reaction. Target polynucleotides include polynucleotides initially present in the sample.

As used herein, a "target polynucleotide complement" is a polynucleotide comprising a sequence complementary to a target sequence, or the complement of that sequence (i.e., the complement of the sequence complementary to the target sequence). In some embodiments, a target polynucleotide complement comprises a first complement sequence. A "first complement sequence" is a polynucleotide reverse transcribed from a target polynucleotide, or a polynucleotide formed by a primer extension reaction on a target polynucleotide. In some embodiments, a target polynucleotide complement comprises a modified complement sequence. A "modified complement sequence" is a polynucleotide reverse transcribed from a target polynucleotide, or formed by a primer extension reaction on a target polynucleotide, that comprises an adaptor. In some embodiments, a target polynucleotide complement comprises a second complement sequence. A "second complement sequence" is a polynucleotide comprising a sequence complementary to a first complement sequence or to a modified complement sequence. In some embodiments, a target polynucleotide complement comprises a UID. For example, a first complement sequence can comprise a UID. For example, a modified complement sequence can comprise a UID. For example, a second complement sequence can comprise a UID. For example, a second complement sequence can comprise a sequence complementary to the UID of a first complement sequence or a modified complement sequence. In some embodiments, a target polynucleotide complement does not comprise a UID. For example, a first complement sequence may not comprise a UID. For example, a modified complement sequence may not comprise a UID. For example, a second complement sequence may not comprise a UID.

The methods can include an RT or PE reaction in a first step. The methods can include a linear primer extension reaction in a subsequent step. A linear primer extension reaction results in linear amplification, as opposed to exponential amplification. In targeted sequencing of multiple polynucleotides, each individual target-specific primer can show some degree of variation in efficiency, caused by variation in extension by the various enzymes or by differences in the efficiency with which the primer anneals to its respective target. This can create a bias that PCR can amplify exponentially. The methods described herein can use linear primer extension to reduce or avoid this bias, thereby reducing or avoiding variation in the frequency of one target relative to another, and can yield improved confidence and accuracy in frequency analysis and base calling. The methods described herein have been found to avoid these bias problems and to maintain the true frequency representation of the starting target pool. In some embodiments, the only exponential amplification reaction performed in the methods, such as a PCR reaction, occurs in the final stage of library generation and can use a universal primer set. In these embodiments, all targets can be amplified uniformly in the exponential amplification step, without introducing gene-specific variation or bias.
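The difference between linear and exponential propagation of a primer-efficiency bias can be made concrete with a small Python sketch. The model is a deliberate simplification and an assumption, not taken from this document: each target has a fixed per-cycle efficiency; in linear extension only the original templates are copied in each cycle, whereas in exponential amplification every existing copy serves as a template. All function names are hypothetical.

```python
def linear_products(templates: float, efficiency: float, cycles: int) -> float:
    """Products of linear extension: only the original templates are copied each cycle."""
    return templates * efficiency * cycles


def exponential_copies(templates: float, efficiency: float, cycles: int) -> float:
    """Copies after exponential amplification: every copy is a template in the next cycle."""
    return templates * (1.0 + efficiency) ** cycles


def representation_bias(copies_a: float, copies_b: float) -> float:
    """Ratio of target A to target B; 1.0 means the starting ratio is preserved."""
    return copies_a / copies_b


if __name__ == "__main__":
    start, cycles = 1000.0, 20
    eff_a, eff_b = 1.0, 0.9  # two targets present in equal amounts, unequal primer efficiency
    linear = representation_bias(
        linear_products(start, eff_a, cycles), linear_products(start, eff_b, cycles)
    )
    exponential = representation_bias(
        exponential_copies(start, eff_a, cycles), exponential_copies(start, eff_b, cycles)
    )
    print(f"linear bias: {linear:.2f}, exponential bias: {exponential:.2f}")
```

Under this model the linear bias stays at the ratio of the two efficiencies regardless of cycle number, while the exponential bias compounds with every cycle, which is consistent with confining exponential amplification to a final step that uses universal primers.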

Reverse transcription (RT of target polynucleotides to form complementary UID-tagged polynucleotides)

Using the primers described herein, RNA polynucleotides can be reverse transcribed using suitable reagents known in the art. The RNA can include mRNA.

In some embodiments, a method comprises reverse transcribing a target RNA polynucleotide using one or more primers (RT primers) to form cDNA. In some embodiments, an RT primer comprises an oligo-dT primer or a sequence-specific primer. In some embodiments, a plurality of RT primers comprises one or more oligo-dT primers or one or more sequence-specific primers. In some embodiments, the reverse transcription reaction is the first step in generating a polynucleotide library from a sample comprising target polynucleotides. In some embodiments, the target polynucleotide is not subjected to RT-PCR. In some embodiments, the target polynucleotide is not subjected to exponential amplification. In some embodiments, exponential amplification is not performed in the next step after reverse transcription. In some embodiments, exponential amplification is not performed in the next 2 steps after reverse transcription. In some embodiments, exponential amplification is not performed in the next 3 steps after reverse transcription. In some embodiments, the cDNA of the target polynucleotide produced by the reverse transcription step is not further amplified in that step. In some embodiments, the method comprises only one cycle of reverse transcription. In other embodiments, the method comprises repeatedly reverse transcribing the target RNA molecule to generate a plurality of cDNA molecules, such as first complement sequences that may comprise a UID.

An RT primer can further comprise a region that is not complementary to a region of the RNA. In some embodiments, an RT primer can further comprise a UID. For example, each RT primer of a plurality of RT primers can comprise a different UID. This allows each of the cDNAs copied from the reverse-transcribed RNA molecules to be uniquely barcoded. In some embodiments, the region of an RT primer that is not complementary to a region of the target RNA can comprise a UID. In some embodiments, the region of each RT primer of a plurality of RT primers that is not complementary to a region of the target RNA can comprise a UID. In some embodiments, an RT primer can further comprise a known sequence, such as a universal primer binding site or a sequence complementary to a universal primer binding site. In some embodiments, an RT primer can further comprise a phosphorylated 5' end. In some embodiments, an RT primer can further comprise a known sequence at the 5' end, such as a universal primer binding site or a sequence complementary to a universal primer binding site. In some embodiments, the region that is not complementary to a region of the RNA is 5' to the region of the primer that is complementary to the RNA. In some embodiments, the region that is not complementary to a region of the RNA is a 5' overhang region. In some embodiments, the region that is not complementary to a region of the target RNA comprises a priming site for an amplification and/or sequencing reaction.

In some embodiments, an RT primer can comprise a universal ligation sequence. In some embodiments, the universal ligation sequence is 5' to the UID. In some embodiments, the universal ligation sequence is 5' to the target-specific region. In some embodiments, the universal ligation sequence is 5' to the UID and 5' to the target-specific region. In some embodiments, the universal ligation sequence is at the 5' end of the RT primer. In some embodiments, a plurality of RT primers can comprise a first RT primer having a first universal ligation sequence and one or more second RT primers comprising at least a second universal primer sequence.
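One primer layout consistent with the embodiments above (a universal sequence at the 5' end, followed by a UID, followed by the target-specific region at the 3' end) can be sketched in Python. The sketch is illustrative only: the universal sequence shown is an arbitrary placeholder, the 12-nucleotide random UID and the 18-mer oligo-dT are assumptions, and the function names are hypothetical.

```python
import random
from typing import Optional, Tuple


def random_uid(length: int, rng: random.Random) -> str:
    """Draw a random UID of the given length from the four DNA bases."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_primer(
    universal_site: str,
    target_specific: str,
    uid_length: int = 12,
    rng: Optional[random.Random] = None,
) -> str:
    """Assemble a primer 5'->3': universal site, UID, target-specific region."""
    rng = rng or random.Random()
    return universal_site + random_uid(uid_length, rng) + target_specific


def split_primer(primer: str, universal_site: str, uid_length: int) -> Tuple[str, str]:
    """Recover (UID, target-specific region) from a primer built by build_primer."""
    if not primer.startswith(universal_site):
        raise ValueError("primer does not start with the expected universal site")
    start = len(universal_site)
    return primer[start:start + uid_length], primer[start + uid_length:]


if __name__ == "__main__":
    universal = "GATTACAGATTACAGATTAC"  # placeholder, not a real platform sequence
    oligo_dt = "T" * 18                 # assumed oligo-dT length
    primer = build_primer(universal, oligo_dt, uid_length=12, rng=random.Random(0))
    print(primer)
    print(split_primer(primer, universal, 12))
```

Because each primer molecule carries its own random UID, each cDNA copied from an RNA molecule inherits a distinct tag, which is what later allows reads to be grouped by molecule of origin.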

Primer extension of single-stranded or double-stranded DNA target polynucleotides to form complementary UID-tagged polynucleotides

Using the primers described herein, DNA polynucleotides can be hybridized to the primers, and primer extension (gPE or PE) can be performed using suitable reagents known in the art. In some embodiments, primer extension comprises a single primer extension. In some embodiments, primer extension does not comprise multiple primer extensions. In some embodiments, primer extension does not comprise a single primer extension. In some embodiments, primer extension comprises multiple primer extensions. In some embodiments, a method comprises performing primer extension on a target DNA polynucleotide using one or more primers (PE primers) to form a target polynucleotide complement, such as a first complement sequence. In some embodiments, a PE primer comprises a sequence-specific primer. In some embodiments, a plurality of PE primers comprises one or more sequence-specific primers. In some embodiments, the primer extension reaction is the first step in generating a polynucleotide library from a sample comprising target polynucleotides. In some embodiments, the target polynucleotide is not subjected to PCR. In some embodiments, the target polynucleotide is not subjected to exponential amplification. In some embodiments, exponential amplification is not performed in the next step after primer extension. In some embodiments, exponential amplification is not performed in the next 2 steps after primer extension. In some embodiments, exponential amplification is not performed in the next 3 steps after primer extension. In some embodiments, the complementary polynucleotide of the target polynucleotide produced by the primer extension step is not further amplified in that step. In some embodiments, the method comprises only one cycle of primer extension. In other embodiments, the method comprises repeated extension, or linear amplification, of primers hybridized to the target DNA molecule to generate multiple copies of a DNA molecule, such as target polynucleotide complements that may comprise a UID.

The one or more PE primers can comprise a region complementary to a region or sequence of the target DNA, such as a target-specific region that hybridizes to a target polynucleotide, such as a biomarker. The one or more PE primers can comprise a region complementary or substantially complementary to a region of the target DNA. In some embodiments, the one or more PE primers can comprise a first PE primer having a region complementary to a sequence of a first target polynucleotide and a second PE primer having a region complementary to a sequence of a second target polynucleotide. For example, the first target polynucleotide can be a first DNA molecule, and the second target polynucleotide can be a second DNA molecule. In some embodiments, the one or more PE primers can comprise a first PE primer having a region complementary to a first DNA, and one or more second PE primers each having a region complementary to a sequence of one or more second DNAs. In some embodiments, the first and second target sequences are the same. In some embodiments, the first and second target sequences are different.

A PE primer can further comprise a region that is not complementary to a region of the DNA. A PE primer can further comprise a UID. For example, each PE primer of a plurality of PE primers can comprise a different UID. This allows each of the complementary DNAs copied from the DNA molecules subjected to the primer extension reaction to be uniquely barcoded. In some embodiments, the region of a PE primer that is not complementary to a region of the target DNA can comprise a UID. In some embodiments, the region of each PE primer of a plurality of PE primers that is not complementary to a region of the target DNA can comprise a UID. In some embodiments, a PE primer can further comprise a known sequence, such as a universal primer binding site or a sequence complementary to a universal primer binding site. In some embodiments, a PE primer can further comprise a phosphorylated 5' end. In some embodiments, a PE primer can further comprise a known sequence at the 5' end, such as a universal primer binding site or a sequence complementary to a universal primer binding site. In some embodiments, the region that is not complementary to a region of the DNA is 5' to the region of the primer that is complementary to the DNA. In some embodiments, the region that is not complementary to a region of the DNA is a 5' overhang region. In some embodiments, the region that is not complementary to a region of the target DNA comprises a priming site for an amplification and/or sequencing reaction.

In some embodiments, a library of PE primers can be used in the primer extension step.

In some embodiments, a PE primer can comprise a universal ligation sequence.

In some embodiments, the universal ligation sequence is 5' to the UID. In some embodiments, the universal ligation sequence is 5' to the target-specific region. In some embodiments, the universal ligation sequence is 5' to the UID and 5' to the target-specific region. In some embodiments, the universal ligation sequence is at the 5' end of the PE primer. In some embodiments, a plurality of PE primers can comprise a first PE primer having a first universal ligation sequence and one or more second PE primers comprising at least a second universal primer sequence.

In some embodiments, an annealing temperature of 55°C is used to accommodate lower primer melting temperatures. In some embodiments, a hold step at 68°C is used for the initial PE step. In some embodiments, the overall concentration of primers is fixed at a certain concentration. In some embodiments, magnesium chloride, ammonium sulfate, D-(+)-trehalose, betaine, or a combination thereof is used in the primer extension step.

Partial formatting of UID-tagged polynucleotides complementary to the target

After a target polynucleotide complement (e.g., a first complement sequence) has been generated, a polynucleotide adaptor sequence can be added to the first complement sequence. A target polynucleotide complement (such as a first complement sequence that may comprise a UID) to which an adaptor sequence has been added can be a modified complement sequence (MCS). In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complement (such as a first complement sequence that may comprise a UID) in the next step after the target polynucleotide complement is generated. In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complement (such as a first complement sequence that may comprise a UID) in the second step after the target polynucleotide complement comprising UIDs is generated. In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complement (such as a first complement sequence that may comprise a UID) in the third step after the target polynucleotide complement comprising UIDs is generated. In some embodiments, the polynucleotide adaptor sequence does not comprise a UID.

In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) by ligation (US Pat. Nos. 4,883,750, 5,476,930, 5,593,826, 5,426,180, 5,871,921; and US Patent Publication No. 2004/0110213). Ligation techniques can include blunt-end ligation and sticky-end ligation. The ligation reaction can include a DNA ligase, such as DNA ligase I, DNA ligase III, DNA ligase IV, or T4 DNA ligase. The ligation reaction can include an RNA ligase, such as T4 RNA ligase I or T4 RNA ligase II. Methods include using: T4 DNA ligase, which catalyzes the formation of a phosphodiester bond between juxtaposed 5' phosphate and 3' hydroxyl termini in duplex DNA or RNA with blunt and sticky ends; Taq DNA ligase, which catalyzes the formation of a phosphodiester bond between the juxtaposed 5' phosphate and 3' hydroxyl termini of two adjacent oligonucleotides hybridized to a complementary target DNA; E. coli DNA ligase, which catalyzes the formation of a phosphodiester bond between juxtaposed 5' phosphate and 3' hydroxyl termini in duplex DNA containing sticky ends; and T4 RNA ligase, which catalyzes the ligation of a 5' phosphoryl-terminated nucleic acid donor to a 3' hydroxyl-terminated nucleic acid acceptor through the formation of a 3'→5' phosphodiester bond, with substrates including single-stranded RNA and DNA as well as dinucleoside pyrophosphates.

In some embodiments, the polynucleotide adaptor sequence is not added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) by ligation. In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) by an amplification reaction. In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) by an amplification reaction using one or more primers comprising the adaptor sequence. In some embodiments, the polynucleotide adaptor sequence is not added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) by an amplification reaction. In some embodiments, the polynucleotide adaptor sequence is not added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) by an amplification reaction using one or more primers comprising the adaptor sequence. In some embodiments, the polynucleotide adaptor sequence can be added to the target polynucleotide complementary sequence (such as a second complementary sequence that may comprise a UID) during the PCR enrichment step described below.

In some embodiments, in the next step after the target polynucleotide complementary sequence is generated, the polynucleotide adaptor sequence can be added by ligation to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID). In some embodiments, the adaptor can be a single-stranded polynucleotide. In some embodiments, the adaptor can be a double-stranded polynucleotide. In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region (such as an overhang). In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region, wherein the strand comprising the single-stranded region is not ligated to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID). In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region, wherein the strand that does not comprise the single-stranded region is ligated to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID). In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region, wherein the strand that does not comprise a region complementary to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) is ligated to the target polynucleotide complementary sequence. In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region, wherein the strand that comprises a region complementary to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) is not ligated to the target polynucleotide complementary sequence. In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region, wherein the strand that comprises a region complementary to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) hybridizes to the target polynucleotide complementary sequence. In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region, wherein the strand that does not comprise a region complementary to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) does not hybridize to the target polynucleotide complementary sequence.

In some embodiments, the 5' overhang can be complementary to one or more target polynucleotide complementary sequences (such as those comprising UIDs). In some embodiments, the 5' overhang can be complementary to the 5' region of one or more polynucleotide complementary sequences (such as those comprising UIDs). In some embodiments, the 5' overhang can comprise a sequence complementary to a universal ligation sequence (such as the universal ligation sequence of an RT primer or PE primer). In some embodiments, the 5' overhang can be complementary to the 5' region of one or more target polynucleotide complementary sequences (such as those comprising UIDs), wherein the 5' region is 5' of the UID. In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region (such as a 5' overhang or end). In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and a single-stranded region (such as a 3' overhang or end). In some embodiments, the adaptor can be a bridging polynucleotide comprising a double-stranded region and two single-stranded regions (such as a 5' overhang or end and a 3' overhang or end). In some embodiments, the 5' overhang can be complementary to the 5' region of one or more target polynucleotide complementary sequences (such as those comprising UIDs), wherein, upon hybridization, the adaptor can be ligated to the one or more target polynucleotide complementary sequences (such as those comprising UIDs). In some embodiments, the 5' overhang can be complementary to the 5' region of one or more target polynucleotide complementary sequences (such as those comprising UIDs), wherein, upon hybridization, the adaptor can be in close proximity to or adjacent to the 5' terminus of the one or more target polynucleotide complementary sequences comprising UIDs. In some embodiments, the 5' overhang can be complementary to the 5' region of one or more target polynucleotide complementary sequences (such as those comprising UIDs), wherein, upon hybridization, the adaptor can be in close proximity to or adjacent to the 5' phosphate terminus of the one or more target polynucleotide complementary sequences (such as those comprising UIDs). In some embodiments, the 5' overhang can be the same length, or substantially the same length, as the sequence to which it is complementary on the one or more target polynucleotide complementary sequences (such as those comprising UIDs).
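The complementarity condition described above can be sketched in a few lines of Python. This is only an illustrative check, not part of the described method: the function names and the example sequences are hypothetical, all sequences are written 5' to 3', and the check assumes perfect Watson-Crick pairing over the full length of the overhang.

```python
# Illustrative sketch only: names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_matches_five_prime_region(overhang: str, target_complement: str) -> bool:
    """True if the adaptor overhang can pair, over its full length,
    with the 5'-most bases of the target complementary sequence."""
    region = target_complement[: len(overhang)].upper()
    # The target must be at least as long as the overhang.
    if len(region) != len(overhang):
        return False
    return reverse_complement(overhang) == region


# Hypothetical first complementary sequence: universal 5' region, then UID, then target.
first_complement = "GATCCGTA" + "ACGTACGT" + "TTGACCGGA"
overhang = "TACGGATC"  # reverse complement of the universal 5' region
print(overhang_matches_five_prime_region(overhang, first_complement))
```

Because the overhang here has the same length as the region it pairs with, the example also corresponds to the embodiment in which the two lengths are the same.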

In some embodiments, a polynucleotide adaptor sequence comprising a primer binding site or a complement of a primer binding site can be added to the target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID). In some embodiments, a target polynucleotide complementary sequence (such as a first complementary sequence that may comprise a UID) comprising a first primer binding site of a primer binding set (such as for exponential amplification or sequencing) can be a partially formatted target polynucleotide complementary sequence, such as a modified complementary sequence that may comprise a UID. In some embodiments, a target polynucleotide complementary sequence (such as a first complementary sequence) comprising a first primer binding site of a first primer set and a first primer binding site of a second primer set (such as for exponential amplification or sequencing) can be a fully formatted target polynucleotide complementary sequence, such as a modified complementary sequence that may comprise a UID. In some embodiments, a primer binding site or complement thereof is added to each of a plurality of target polynucleotide complementary sequences (such as first complementary sequences that may comprise a UID). In some embodiments, the primer binding site or complement thereof added to each of the plurality of target polynucleotide complementary sequences (such as first complementary sequences that may comprise a UID) is the same sequence. In some embodiments, the primer binding site or complement thereof added to each of the plurality of target polynucleotide complementary sequences (such as first complementary sequences that may comprise a UID) is a different sequence. In some embodiments, the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences in a first amplicon or amplicon set is the same sequence as the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences in a second amplicon or amplicon set. As used herein, an amplicon comprises the polynucleotide product of an amplification reaction. An amplicon set comprises a clonal population of polynucleotides produced by an amplification reaction. In some embodiments, an amplicon set is formed by amplifying a single starting sequence. In some embodiments, an amplicon set comprises a population of polynucleotides derived from a single polynucleotide in an amplification reaction. In some embodiments, an amplicon set comprises a population of polynucleotides derived from a single polynucleotide, or from amplicons of that polynucleotide, in an amplification reaction. Amplicons can be produced by a variety of amplification reactions. An amplicon can comprise multiple copies of one or more nucleic acids. In some embodiments, the amplicon or amplicon set is produced by PCR. In some embodiments, the amplicon or amplicon set is not produced by PCR.

In some embodiments, the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences in a first amplicon or amplicon set is a different sequence from the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences in a second amplicon or amplicon set. In some embodiments, the primer binding site or complement thereof added to each of a plurality of UID-containing polynucleotides from a first sample is a different sequence from the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences from a second sample. In some embodiments, the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences from a first sample is the same sequence as the primer binding site or complement thereof added to each of a plurality of target polynucleotide complementary sequences from a second sample. In some embodiments, the primer binding site or complement thereof comprises a known sequence. In some embodiments, the primer binding site or complement thereof comprises a primer binding site for amplification. In some embodiments, the primer binding site or complement thereof comprises a universal priming sequence. In some embodiments, the primer binding site or complement thereof comprises a first primer binding site for a first primer of a primer set. In some embodiments, the primer binding site or complement thereof comprises a first primer binding site for performing an exponential amplification reaction (such as PCR), for example, for use in the PCR enrichment step described below. In some embodiments, the primer binding site or complement thereof comprises a first primer binding site for performing a non-exponential amplification reaction. In some embodiments, the primer binding site or complement thereof comprises a primer binding site for sequencing. In some embodiments, the primer binding site or complement thereof comprises a primer binding site for analysis.

In some embodiments, the polynucleotide adaptor sequence further comprises a sample barcode sequence (SBC). In the methods described, a sample barcode on a universal adaptor sequence can eliminate the need for multiple probe sets for each UID used. As used herein, a sample barcode (SBC) on a polynucleotide comprises a sequence that can be used to identify the source from which the polynucleotide was derived. For example, a nucleic acid sample can be a pool of polynucleotides derived from a plurality of different samples (e.g., polynucleotides derived from different individuals, different tissues or cells, or polynucleotides isolated at different time points), wherein the polynucleotides from each different sample are tagged with a unique SBC. The SBC thus provides a correlation between a polynucleotide and its source (US Pat. Nos. 7,537,897, 7,544,473 and 7,393,665). In some embodiments, the same SBC can be used to tag different samples processed in different experiments. In some embodiments, a different SBC can be used to tag each different sample, or subset of samples, processed in one experiment. For example, samples from one or more subjects with a disease or condition can have a first SBC, and samples from one or more subjects without the disease or condition can have a second, different SBC. For example, different samples derived from the same sample can be tagged with different SBCs.

In some embodiments, the polynucleotide adaptor sequence further comprises an SBC or complement thereof located between the primer binding site sequence of the adaptor, or complement thereof, and a region of the adaptor (such as a 5' overhang complementary to a sequence of the one or more target polynucleotide complementary sequences). In some embodiments, the polynucleotide adaptor sequence further comprises an SBC, wherein the SBC is within the duplex region of the adaptor. In some embodiments, the polynucleotide adaptor sequence further comprises an SBC, wherein the SBC is not within the duplex region of the adaptor. In some embodiments, the polynucleotide adaptor sequence further comprises an SBC, wherein the SBC is within the single-stranded region of the adaptor. In some embodiments, the polynucleotide adaptor sequence further comprises an SBC, wherein the SBC is on a different strand from the strand comprising the region (such as a 5' overhang) complementary to the one or more target polynucleotide complementary sequences. In some embodiments, the polynucleotide adaptor sequence further comprises an SBC, wherein the SBC is on the same strand as the strand comprising the region (such as a 5' overhang) complementary to the one or more target polynucleotide complementary sequences (such as first complementary sequences). In some embodiments, the polynucleotide adaptor sequence further comprises an SBC, wherein the SBC is on the strand that does not comprise a region (such as a 5' overhang) complementary to the one or more target polynucleotide complementary sequences (such as first complementary sequences). In some embodiments, the primer binding site or complement thereof added to the plurality of target polynucleotide complementary sequences (such as first complementary sequences) is 5' of the SBC sequence of the adaptor. In some embodiments, the primer binding site or complement thereof added to the plurality of target polynucleotide complementary sequences (such as first complementary sequences) is 3' of the SBC sequence of the adaptor.

The method can further comprise combining the first and second samples before performing any of the one or more reactions. In some embodiments, the method further comprises combining polynucleotides generated from the first and second samples. In some embodiments, the method further comprises combining polynucleotides generated from the first and second samples after performing the primer extension reaction. In some embodiments, the method further comprises combining polynucleotides generated from the first and second samples after attaching an adaptor to the polynucleotides in the first or second sample. In some embodiments, the method further comprises combining polynucleotides generated from the first and second samples after ligating an SBC-containing adaptor to the polynucleotides in the first or second sample. In some embodiments, the method further comprises combining target polynucleotide complementary sequences generated from the first and second samples. In some embodiments, the method further comprises combining polynucleotides generated from the first and second samples that comprise one or more primer binding sites (such as one or more universal primer binding sites). In some embodiments, the method further comprises combining polynucleotides generated from the first and second samples after performing exponential amplification of the polynucleotides in the first and/or second sample. In some embodiments, the sample origin of polynucleotides derived from the first sample and the second sample can be determined using the SBC. In some embodiments, the sample origin of polynucleotides derived from the first sample and the second sample can be determined using the UID. The sample origin of polynucleotides derived from the first sample and the second sample can be determined using the primer binding site sequence. The sample origin of polynucleotides derived from the first sample and the second sample can be determined using a target-specific sequence.
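As a rough illustration of how sample origin might be recovered from an SBC after samples are pooled, the sketch below assigns each read to a sample by looking up a fixed-position barcode. It is a minimal example under stated assumptions, not the procedure of the described method: the function name, the barcode position and length, and the example barcodes and reads are all hypothetical, and only exact barcode matches are handled.

```python
from collections import defaultdict


def demultiplex(reads, sbc_to_sample, sbc_start, sbc_length):
    """Group reads by sample using an exact match on the sample barcode (SBC).

    Assumes (hypothetically) that the SBC sits at a fixed offset in every read.
    Reads whose barcode is not in the lookup table go to "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        sbc = read[sbc_start : sbc_start + sbc_length]
        sample = sbc_to_sample.get(sbc, "unassigned")
        by_sample[sample].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
sbc_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAACCC", "GGGGTTTTAA"]

for sample, sample_reads in demultiplex(reads, sbc_to_sample, 0, 4).items():
    print(sample, len(sample_reads))
```

A practical pipeline would usually also tolerate sequencing errors in the barcode; that is left out here to keep the sketch short.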

Optional cleanup

In some embodiments, the method further comprises optionally purifying one or more adaptor-tagged polynucleotides, such as modified complementary sequences that may comprise a UID. In some embodiments, the adaptor added to the plurality of target polynucleotide complementary sequences (such as first complementary sequences) comprises an affinity tag. The affinity tag can be bound to a binding partner, and molecules that do not bind the binding partner (e.g., molecules without the affinity tag) can be washed away, or the affinity-tagged molecules can be separated from molecules without the affinity tag. In some embodiments, the affinity tag can be a first molecule that specifically binds a second molecule. In some embodiments, the affinity tag can be a known nucleotide sequence. In some embodiments, the affinity tag can be a chemical moiety. In some embodiments, the affinity tag can be biotin or streptavidin. In some embodiments, the affinity tag can be a peptide or protein, such as an antibody. Thus, the adaptor can comprise a protein-nucleic acid complex. Any affinity tag known in the art can be used. In some embodiments, the affinity tag can be used to purify adaptor-modified (e.g., ligated or amplified) target polynucleotide complementary sequences, such as modified complementary sequences that may comprise a UID, from one or more other polynucleotides. A support or surface comprising one or more immobilized polynucleotides, chemical molecules, or proteinaceous molecules that bind the affinity tag can be used. For example, the affinity tag can be used to purify adaptor-modified target polynucleotide complementary sequences, such as modified complementary sequences that may comprise a UID, from one or more other polynucleotides by binding the biotin of the adaptor-modified target polynucleotide complementary sequence to a surface or substrate comprising a streptavidin moiety. As used herein, immobilized includes attached directly or indirectly to a solid support through one or more covalent or non-covalent bonds. In some embodiments, immobilization comprises direct or indirect attachment to a solid support by hybridization. In some embodiments, the affinity tag can be used to purify adaptor-modified target polynucleotide complementary sequences, such as modified complementary sequences that may comprise a UID, from one or more polynucleotide sequences that are not of interest (such as non-target polynucleotides). In some embodiments, the affinity tag can be used to purify adaptor-modified target polynucleotide complementary sequences, such as modified complementary sequences that may comprise a UID, from one or more primers used in a previous reaction or method step. In some embodiments, the affinity tag can be used to purify adaptor-modified target polynucleotide complementary sequences, such as modified complementary sequences that may comprise a UID, from one or more primers used in a previous reaction or method step, or from one or more polynucleotide sequences that are not of interest (such as non-target polynucleotides). In some embodiments, no affinity tag is used in the method. For example, in some embodiments, the adaptor does not comprise an affinity tag. For example, in some embodiments, no affinity tag is used in the method when the target molecule is RNA.

Linear primer extension/linear amplification

The method can further comprise performing a second single round of primer extension, or linear primer extension (also referred to as linear amplification). In some embodiments, the one or more primers used for linear extension/amplification are kept separate, in one or more separate reactions, from the one or more RT or PE primers used for the reverse transcription or primer extension step. By preparing primer pairs in this way, unwanted primer interactions can be reduced. As used herein, linear amplification or linear extension refers to a process in which product copy number increases non-exponentially. In some embodiments, only the template strand is copied during each cycle of linear amplification. In some embodiments, the primer extension product itself is not copied during linear amplification. When a single unpaired primer is used instead of two primers, the copy number of the extension product grows linearly, rather than both strands growing exponentially as in PCR.
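The difference between linear and exponential growth can be made concrete with a small calculation. The sketch below is an idealised model offered only for illustration: it assumes every template is copied in every cycle (100% efficiency), and the function names are hypothetical.

```python
def linear_copies(templates: int, cycles: int) -> int:
    """Extension products made with a single unpaired primer.

    Only the original templates are copied each cycle, so each cycle
    adds one new product per template.
    """
    return templates * cycles


def exponential_copies(templates: int, cycles: int) -> int:
    """Molecules present after PCR with a primer pair.

    Products serve as templates in later cycles, so the population
    doubles every cycle (idealised, 100% efficiency).
    """
    return templates * 2**cycles


# Compare both models for 100 starting templates.
for n in (1, 5, 10):
    print(n, linear_copies(100, n), exponential_copies(100, n))
```

Under these assumptions, ten cycles turn 100 templates into 1,000 linear extension products, whereas ten PCR cycles would give 102,400 molecules.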

Using the primers described herein, DNA polynucleotides produced by one or more of the methods or method steps described above can be hybridized to a primer (an LPE primer), and linear primer extension can be performed using suitable reagents known in the art. For example, one or more target polynucleotide complementary sequences, such as modified complementary sequences that may comprise a UID, can be hybridized to an LPE primer, and linear primer extension can be performed. For example, one or more target polynucleotide complementary sequences to which an adaptor has been added, such as first complementary sequences that may comprise a UID, can be hybridized to an LPE primer, and linear primer extension can be performed. In some embodiments, the LPE primer comprises a UID. In some embodiments, the LPE primer comprises a UID, and the RT or PE primer does not comprise a UID. In some embodiments, the LPE primer comprises a UID, and the RT or PE primer comprises a UID. In some embodiments, the LPE and RT primers comprise a UID, and the PE primer does not comprise a UID. In some embodiments, the LPE and PE primers comprise a UID, and the RT primer does not comprise a UID.

In some embodiments, linear primer extension comprises multiple extensions of an LPE primer. In some embodiments, linear primer extension comprises multiple extensions of each LPE primer of a plurality of LPE primers. In some embodiments, linear primer extension comprises multiple extensions of each LPE primer of a plurality of LPE primers, wherein each LPE primer of the plurality targets a different polynucleotide. In some embodiments, linear primer extension comprises multiple extensions of each LPE primer of a plurality of LPE primers, wherein each LPE primer of the plurality targets the same polynucleotide. In some embodiments, the second round of primer extension comprises a single extension of the LPE primer. In some embodiments, linear primer extension does not comprise multiple extensions of a primer. In some embodiments, the method comprises performing linear primer extension, using one or more primers (LPE primers), on one or more target polynucleotide complements that comprise an adaptor (such as modified complements that may comprise a UID) to form complementary polynucleotides, such as DNA. In some embodiments, the method comprises performing linear primer extension on one or more target polynucleotide complements that do not comprise an adaptor, such as first complements that may comprise a UID. In some embodiments, the LPE primers comprise sequence-specific primers. In some embodiments, the plurality of LPE primers comprises one or more sequence-specific primers.

In some embodiments, the linear primer extension reaction is the first, second, third, or fourth step in generating a polynucleotide library from a sample comprising the target polynucleotide. In some embodiments, the linear primer extension reaction is the third step in generating a polynucleotide library from a sample comprising the target polynucleotide. In some embodiments, the linear primer extension reaction is the fourth step in generating a polynucleotide library from a sample comprising the target polynucleotide. In some embodiments, the linear primer extension reaction is performed after the RT or PE reaction. In some embodiments, the linear primer extension reaction is performed after the reaction that adds an adaptor to the target polynucleotide complement (such as a first complement that may comprise a UID). In some embodiments, the linear primer extension reaction is performed after the RT or PE reaction and after the reaction that adds an adaptor to the target polynucleotide complement (such as a first complement that may comprise a UID). In some embodiments, the linear primer extension reaction is performed before an exponential amplification reaction (such as PCR) is performed. In some embodiments, exponential amplification is performed in the next step after linear primer extension. In some embodiments, exponential amplification is not performed in the next step after linear primer extension. In some embodiments, exponential amplification is not performed in the next 2 steps after linear primer extension. In some embodiments, exponential amplification is not performed in the next 3 steps after linear primer extension. In some embodiments, the polynucleotide complementary to the target polynucleotide complement that is produced by the linear extension step (such as a second complement that may comprise a UID) is not further amplified after that step.

In some embodiments, the method comprises only one cycle of linear primer extension. In other embodiments, the method comprises repeatedly extending a primer hybridized to the target polynucleotide complement to produce multiple copies of the target polynucleotide complement, such as second complements that may comprise a UID. The method can comprise performing at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 linear primer extension reactions or linear primer extension cycles. In some embodiments, less sample input may be used in the methods described herein that employ linear amplification/extension than in similar methods that employ a non-linear amplification step. In some embodiments, fewer PCR cycles may be used in the methods described herein that employ linear amplification/extension than in similar methods that employ a non-linear amplification step. For example, 20 PCR cycles may be sufficient for a method using linear amplification/extension, whereas 24 PCR cycles may be required for a method using a non-linear amplification step.
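The contrast between linear extension and exponential amplification described above can be illustrated with a short calculation. The following sketch is illustrative only and is not part of the disclosed methods; the function names and the assumption of perfect per-cycle efficiency are hypothetical simplifications. With a single LPE primer, each cycle copies only the original templates, so the number of new strands grows in proportion to the number of cycles, whereas in PCR the products themselves serve as templates.

```python
def linear_copies(templates: int, cycles: int) -> int:
    """New strands made by linear primer extension.

    Idealized assumption: every template is extended once per cycle, and
    the new strands are not templates for the same primer.
    """
    return templates * cycles


def exponential_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification of idealized PCR: (1 + efficiency) ** cycles."""
    return (1.0 + efficiency) ** cycles


# 100 templates carried through 16 linear extension cycles.
copies = linear_copies(100, 16)

# Ratio between 24 and 20 idealized PCR cycles (compare the
# 20-versus-24-cycle example in the text).
saved_fold = exponential_fold(24) / exponential_fold(20)
```

Under these idealized assumptions, four PCR cycles correspond to a 16-fold difference in product, which is of the same order as the material that a modest number of linear extension cycles can supply. Actual efficiencies are lower and depend on the reaction conditions.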

The one or more LPE primers can comprise a sequence complementary to a sequence, or to the complement of a sequence, of a target polynucleotide complement (such as a first complement or a modified complement). For example, the one or more LPE primers can comprise a sequence complementary to a sequence, or to the complement of a sequence, of a target polynucleotide complement (such as a first complement or a modified complement) or of the target polynucleotide in the initial sample. For example, the one or more LPE primers can comprise a sequence complementary to a sequence, or to the complement of a sequence, of a target polynucleotide complement (such as a first complement or a modified complement that is the product of an amplification reaction, a ligation reaction, primer extension, or a combination thereof).

In some embodiments, the one or more LPE primers comprise a sequence complementary to the complement of the target polynucleotide. In some embodiments, the one or more LPE primers comprise a sequence complementary to a sequence of a target polynucleotide complement (such as a first complement or a modified complement). In some embodiments, the one or more LPE primers comprise a first sequence complementary to the complement of the target polynucleotide and a second sequence complementary to a sequence of a target polynucleotide complement (such as a first complement or a modified complement). In some embodiments, the first and second sequences are the same sequence. In some embodiments, the first and second sequences are different sequences. In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first complement or a modified complement) is not complementary to the target sequence. In some embodiments, the sequence of the one or more LPE primers that is complementary to a polynucleotide comprising a UID is not complementary to any polynucleotide that does not comprise a UID. In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first complement or a modified complement) is not complementary to any other polynucleotide in the sample.

In some embodiments, the target polynucleotide complement is a single-stranded polynucleotide. In some embodiments, the target polynucleotide complement is a double-stranded polynucleotide. In some embodiments, the target polynucleotide complement, such as a first complement, is an extension product from a PE or RT reaction. In some embodiments, the target polynucleotide complement further comprises an adaptor sequence, such as a ligated adaptor sequence or a modified complement. In some embodiments, the target polynucleotide complement is an extension product from a PE or RT reaction that further comprises an adaptor sequence, such as a modified complement. In some embodiments, the target polynucleotide complement is an extension product from a PE or RT reaction that further comprises a first primer site, such as a PCR, sequencing, or universal priming site. In some embodiments, the target polynucleotide complement, such as a first, second, or modified complement, is immobilized on a substrate or surface. In some embodiments, the target polynucleotide complement, such as a first or modified complement, comprises an SBC.

In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first or modified complement) is not a sequence complementary to the first strand of any target polynucleotide. In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first or modified complement) is complementary to a sequence generated in the RT or PE reaction. In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first or modified complement) is complementary to a complement of the target polynucleotide that is capable of hybridizing to a sequence of the target polynucleotide located 5' of the target polynucleotide sequence that is complementary to the RT or PE primer. In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement is complementary to a complement of the target polynucleotide that hybridizes to a target sequence located 3' of the sequence of the target polynucleotide that is complementary to the RT or PE primer. In some embodiments, the sequence of the target polynucleotide that comprises a variant or region to be analyzed by any of the methods described herein can be located between the sequence of the target polynucleotide that is complementary to the one or more RT or PE primers and the sequence of the target polynucleotide whose complement is complementary to the one or more LPE primers.

In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first or modified complement) is not a sequence complementary to the sequence of the one or more PE or RT primers. In some embodiments, the sequence of the one or more LPE primers that is complementary to the target polynucleotide complement (such as a first or modified complement) is not a sequence complementary to the target-specific sequence of the one or more PE or RT primers.

In some embodiments, the one or more LPE primers comprise a first LPE primer having a region complementary to a sequence of a first template polynucleotide and a second LPE primer having a region complementary to a sequence of a second template polynucleotide. For example, the first template polynucleotide can be a first DNA molecule, and the second template polynucleotide can be a second DNA molecule. For example, the first template polynucleotide can be a first DNA molecule derived from a first target polynucleotide in the sample, and the second template polynucleotide can be a second DNA molecule derived from a second target polynucleotide in the sample. In some embodiments, the one or more LPE primers comprise a first LPE primer having a region complementary to a sequence of a first DNA, and one or more second LPE primers each having a region complementary to a sequence of a respective one of one or more second DNAs. In some embodiments, the sequences of the first and second DNAs are the same. In some embodiments, the sequences of the first and second DNAs are different. In some embodiments, the first and second template sequences are the same. In some embodiments, the first and second template sequences are different. In some embodiments, the first and second target sequences are the same. In some embodiments, the first and second target sequences are different.

The LPE primer can further comprise a region that is not complementary to a region of the template. In some embodiments, the LPE primer can further comprise a known sequence, such as a universal primer binding site or a sequence complementary to a universal priming site. In some embodiments, the LPE primer can further comprise a known sequence at its 5' end, such as a universal primer binding site or a sequence complementary to a universal priming site. In some embodiments, the region that is not complementary to a region of the template is 5' of the region of the primer that is complementary to the template. In some embodiments, the region that is not complementary to a region of the template is a 5' overhang region or a 3' overhang region. In some embodiments, the region that is not complementary to a region of the template comprises a priming site for an amplification and/or sequencing reaction. In some embodiments, the region that is not complementary to a region of the template comprises a priming site for a second primer of a primer set for an amplification and/or sequencing reaction. In some embodiments, the region that is not complementary to a region of the template comprises a priming site for a second primer of a primer set for an amplification and/or sequencing reaction, such as a PCR reaction or a PCR enrichment step. Optionally, the region that is not complementary to a region of the template comprises a universal sequence for clustering on a high-throughput sequencing platform. In some embodiments, the region that is not complementary to a region of the template comprises a priming site for a second primer of a primer set for an amplification and/or sequencing reaction, wherein the priming site for the first primer of the primer set is contained within the LPE template. In some embodiments, the priming site for the first primer of the primer set that is contained within the LPE template is added in a preceding RT, PE, LPE, or adaptor addition (e.g., ligation) reaction. In some embodiments, the LPE reaction is performed using a DNA polymerase.
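The primer layout described above, a non-complementary 5' region followed by a template-complementary 3' region, can be sketched as a simple string assembly. This is a minimal illustration under stated assumptions: the function names `reverse_complement` and `build_lpe_primer` are hypothetical, the sequences are invented placeholders rather than sequences from this disclosure, and an optional UID is placed in the non-complementary region as one possible arrangement.

```python
# Complement table for unambiguous DNA bases plus N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_lpe_primer(template_site: str, universal_tail: str, uid: str = "") -> str:
    """Assemble a primer, 5'->3': universal tail, optional UID, annealing region.

    template_site is the stretch of the template (5'->3') to which the
    primer should anneal; the annealing region is its reverse complement.
    """
    return universal_tail.upper() + uid.upper() + reverse_complement(template_site)


# Placeholder sequences, chosen only to make the layout visible.
primer = build_lpe_primer(template_site="AAGC", universal_tail="GGTT", uid="AC")
```

In this sketch only the 3' portion of the primer anneals to the template, so the tail and any UID are carried into the extension product, where the tail can later serve as a priming site for a second primer of a primer set.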

In some embodiments, the LPE primer can further comprise a second UID. For example, each LPE primer of a plurality of LPE primers can comprise a different second UID. This can allow each DNA copied from a DNA molecule subjected to the linear primer extension reaction to be labeled with a second UID. In some embodiments, the second UID is the same as the UID on the DNA molecule subjected to the linear primer extension reaction. In some embodiments, the second UID is different from the UID on the DNA molecule subjected to the linear primer extension reaction. In some embodiments, the region of the LPE primer that is not complementary to a region of the template comprises the second UID. In some embodiments, the region of each LPE primer of the plurality of LPE primers that is not complementary to a region of the target DNA comprises the second UID.
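The effect of a second UID on the copies made by linear primer extension can be sketched with a small simulation. The sketch is illustrative only: the names `random_uid` and `tag_lpe_copies`, the UID lengths, and the use of uniformly random bases are assumptions made for the example and are not taken from this disclosure.

```python
import random


def random_uid(length: int, rng: random.Random) -> str:
    """Draw a random UID of the given length from the four DNA bases."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def tag_lpe_copies(first_uid: str, cycles: int, uid_length: int,
                   rng: random.Random) -> list[tuple[str, str]]:
    """Simulate linear extension of one template that carries first_uid.

    Each cycle is primed by an LPE primer bearing its own second UID, so
    every copy is labeled with the pair (first UID, second UID).
    """
    return [(first_uid, random_uid(uid_length, rng)) for _ in range(cycles)]


rng = random.Random(0)  # fixed seed so the example is reproducible
copies = tag_lpe_copies("ACGTACGT", cycles=5, uid_length=12, rng=rng)

# Number of distinct second UIDs available at this length.
uid_space = 4 ** 12
```

In such a scheme all copies of one template share the first UID, while the second UID can distinguish individual extension events, provided the UID is long enough that collisions among the expected number of copies are rare.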

In some embodiments, a slow ramp rate is used for the linear extension/amplification step. In some embodiments, the linear extension/amplification primers are used at a fixed overall concentration. In some embodiments, magnesium chloride, ammonium sulfate, D-(+)-trehalose, betaine, or a combination thereof is used in the linear amplification/extension step.

PCR enrichment

The method can further comprise performing an exponential amplification reaction. In some embodiments, the method can further comprise performing PCR. For example, the exponential amplification reaction can use a plurality of forward/reverse primers and reverse primers. In some embodiments, the exponential amplification reaction can comprise two or more exponential amplifications. In some embodiments, a first and/or second PCR reaction can use a plurality of forward/reverse primers and a plurality of reverse primers. A first and/or second primer of the plurality of forward/reverse primers can be a forward/reverse primer comprising a region complementary to a template polynucleotide, such as a DNA or cDNA molecule. In some embodiments, the plurality of forward/reverse primers comprises one or more forward/reverse primers, wherein each forward/reverse primer of the plurality comprises a region complementary to one or more upstream or downstream primer binding sites, such as a universal primer binding site.

In some embodiments, the exponential amplification reaction is not performed before the primer extension or reverse transcription reaction. In some embodiments, the exponential amplification reaction is not performed before the target polynucleotide complement (such as a first complement) is generated, or is performed after the target polynucleotide complement (such as a first complement) is generated.

In some embodiments, the exponential amplification reaction is not performed before an adaptor is attached to the template polynucleotide, or is performed after an adaptor is attached to the template polynucleotide. In some embodiments, the exponential amplification reaction is not performed before an adaptor is attached to the template polynucleotide by ligation, or is performed after an adaptor is attached to the template polynucleotide by ligation. In some embodiments, the exponential amplification reaction is not performed before an adaptor is attached to the target polynucleotide complement (such as a first complement), or is performed after an adaptor is attached to the target polynucleotide complement (such as a first complement). In some embodiments, the exponential amplification reaction is not performed before an adaptor is attached to the target polynucleotide complement (such as a first complement) by ligation, or is performed after an adaptor is attached to the target polynucleotide complement (such as a first complement) by ligation.

In some embodiments, the exponential amplification reaction is not performed before a first priming site for exponential amplification is attached to the template sequence or its complement, or is performed after the first priming site for exponential amplification is attached to the template sequence or its complement. For example, the exponential amplification reaction may not be performed before, or may be performed after, the priming site for a first primer of a primer set is attached to the template sequence or its complement. In some embodiments, the exponential amplification reaction is not performed before the first priming site is attached to the target polynucleotide complement (such as a first complement), or is performed after the first priming site is attached to the target polynucleotide complement (such as a first complement). In some embodiments, the exponential amplification reaction is not performed before the first priming site is attached by ligation to a polynucleotide comprising the target polynucleotide complement (such as a first complement), or is performed after the first priming site is attached by ligation to a polynucleotide comprising the target polynucleotide complement (such as a first complement).

In some embodiments, the exponential amplification reaction is not performed before an SBC is attached to the template sequence or its complement, or is performed after an SBC is attached to the template sequence or its complement. In some embodiments, the exponential amplification reaction is not performed before an SBC is attached to the target polynucleotide complement (such as a first complement), or is performed after an SBC is attached to the target polynucleotide complement (such as a first complement). In some embodiments, the exponential amplification reaction is performed while an SBC is introduced into the template sequence or its complement by amplification. In some embodiments, the exponential amplification reaction is performed while an SBC is introduced into the target polynucleotide complement (such as a second complement) by amplification.

In some embodiments, the exponential amplification reaction is not performed before a universal priming sequence is attached to the template sequence or its complement, or is performed after a universal priming sequence is attached to the template sequence or its complement. In some embodiments, the exponential amplification reaction is not performed before a universal priming sequence is attached to the target polynucleotide complement, or is performed after a universal priming sequence is attached to the target polynucleotide complement. In some embodiments, the exponential amplification reaction is not performed before a universal priming sequence is attached to the template sequence or its complement by ligation, or is performed after a universal priming sequence is attached to the template sequence or its complement by ligation. In some embodiments, the exponential amplification reaction is not performed before a universal priming sequence is attached to the target polynucleotide complement by ligation, or is performed after a universal priming sequence is attached to the target polynucleotide complement by ligation.

In some embodiments, exponential amplification is not performed before the linear amplification reaction, or is performed after the linear amplification reaction. In some embodiments, the exponential amplification reaction is not performed before an adaptor is attached to the linearly amplified template sequence, or is performed after an adaptor is attached to the linearly amplified template sequence. In some embodiments, the exponential amplification reaction is not performed before an adaptor is attached to the linearly amplified template sequence by ligation, or is performed after an adaptor is attached to the linearly amplified template sequence by ligation. In some embodiments, the exponential amplification reaction is not performed before the first priming site is attached to the linearly amplified template sequence, or is performed after the first priming site is attached to the linearly amplified template sequence. In some embodiments, the exponential amplification reaction is not performed before an SBC is attached to the linearly amplified template sequence, or is performed after an SBC is attached to the linearly amplified template sequence. In some embodiments, the exponential amplification reaction is not performed before a universal priming sequence is attached to the linearly amplified template sequence, or is performed after a universal priming sequence is attached to the linearly amplified template sequence.
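The ordering constraints recited in the preceding paragraphs can be expressed as a simple check on a list of workflow steps. The sketch below covers only the embodiments in which exponential amplification follows the named steps; the step labels and the function name `exponential_follows` are hypothetical and are introduced purely for illustration.

```python
def exponential_follows(workflow: list[str], prerequisites: list[str],
                        exponential: str = "PCR") -> bool:
    """Return True if every prerequisite step occurs before the first
    exponential amplification step in the workflow.

    A workflow with no exponential step trivially satisfies the check,
    because no exponential amplification precedes any step.
    """
    if exponential not in workflow:
        return True
    first_exponential = workflow.index(exponential)
    earlier_steps = workflow[:first_exponential]
    return all(step in earlier_steps for step in prerequisites)


# Hypothetical step labels for one embodiment: RT or PE, adaptor
# ligation, linear primer extension, then PCR enrichment.
workflow = ["RT", "adaptor_ligation", "LPE", "PCR"]
ok = exponential_follows(workflow, ["RT", "adaptor_ligation", "LPE"])

# A workflow that amplifies exponentially before reverse transcription.
bad = exponential_follows(["PCR", "RT"], ["RT"])
```

Other embodiments described herein permit a different order, for example exponential amplification before linear primer extension, so a check of this kind applies only to the particular ordering it encodes.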

For example, the exponential amplification reaction can be performed before the linear primer extension reaction. For example, the exponential amplification reaction can be performed after the linear primer extension reaction. In some embodiments, the exponential amplification reaction is not performed before one or more copies of a target polynucleotide complement (such as a second complement) are generated, or is performed after one or more copies of a target polynucleotide complement (such as a second complement) are generated. In some embodiments, the exponential amplification reaction is not performed before one or more copies of a target polynucleotide complement (such as a second complement) are generated using an LPE primer, or is performed after one or more copies of a target polynucleotide complement (such as a second complement) are generated using an LPE primer. In some embodiments, the exponential amplification reaction is not performed before one or more copies of a target polynucleotide complement (such as a second complement) are generated using a plurality of LPE primers, or is performed after one or more copies of a target polynucleotide complement (such as a second complement) are generated using a plurality of LPE primers.

In some embodiments, the exponential amplification reaction is not performed before first and second priming sites for exponential amplification are attached, or is performed after the first and second priming sites for exponential amplification are attached. For example, the exponential amplification reaction may not be performed before, or may be performed after, a first priming site for a first primer of a primer set and a second priming site for a second primer of the primer set are attached. In some embodiments, the exponential amplification reaction is not performed before a first priming site and a second priming site for exponential amplification are attached by ligation, or is performed after the first priming site and the second priming site for exponential amplification are attached by ligation. In some embodiments, the exponential amplification reaction is not performed before a first priming site and a second priming site for exponential amplification are attached by a linear primer extension reaction, or is performed after the first priming site and the second priming site for exponential amplification are attached by a linear primer extension reaction. In some embodiments, the exponential amplification reaction is not performed before a first priming site or its complement is attached by ligation and a second priming site is attached by a linear primer extension reaction for exponential amplification, or is performed after the first priming site or its complement is attached by ligation and the second priming site is attached by a linear primer extension reaction for exponential amplification. For example, the first and second priming sites can be the priming sites for a pair of primers used in the exponential amplification reaction. For example, the first and second priming sites can be universal priming sites. For example, the first and second priming sites can be priming sites for sequencing.

In some embodiments, the exponential amplification reaction is not performed before the polynucleotide is immobilized on a surface or support, or is performed after the polynucleotide is immobilized on the surface or support. In some embodiments, the exponential amplification reaction is performed on one copy of a polynucleotide immobilized on a surface or support. In some embodiments, the exponential amplification reaction is performed on one copy of a polynucleotide that was produced by a linear primer extension reaction and is immobilized on a surface or support.

In some embodiments, the exponential amplification reaction is not performed before one or more target polynucleotide complementary sequences are immobilized on a surface or support, or is performed after the one or more target polynucleotide complementary sequences are immobilized on the surface or support. In some embodiments, the exponential amplification reaction is not performed before a linear primer reaction is performed on one or more immobilized target polynucleotide complementary sequences, or is performed after the linear primer reaction is performed on the one or more immobilized target polynucleotide complementary sequences. In some embodiments, the exponential amplification reaction is performed on a polynucleotide copied from a polynucleotide immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide complementary sequence (such as a second complementary sequence) copied from a target polynucleotide complementary sequence (such as a first complementary sequence or a modified complementary sequence that may comprise a UID) immobilized on a surface or solid support.

In some embodiments, the exponential amplification reaction is performed on an SBC-containing polynucleotide copied from a UID-containing polynucleotide immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide that comprises a first primer binding site, a second primer binding site, or both, and that is copied from a UID-containing polynucleotide immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide that comprises a first primer binding site, a second primer binding site, a first universal priming site, a second universal priming site, or any combination thereof, and that is copied from a UID-containing polynucleotide immobilized on a surface or solid support.

In some embodiments, the exponential amplification reaction is performed on an SBC-containing polynucleotide copied from an SBC-containing polynucleotide immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a UID-containing polynucleotide copied from an SBC-containing polynucleotide immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide that comprises a first primer binding site, a second primer binding site, or both, and that is copied from an SBC-containing polynucleotide immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide that comprises a first primer binding site, a second primer binding site, a first universal priming site, a second universal priming site, or any combination thereof, and that is copied from an SBC-containing polynucleotide immobilized on a surface or solid support.

In some embodiments, the exponential amplification reaction is performed on a polynucleotide comprising first and/or second primer sites that is copied from a polynucleotide comprising first and/or second primer sites immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on an SBC-containing polynucleotide copied from a polynucleotide comprising first and/or second primer sites immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a UID-containing polynucleotide copied from a polynucleotide comprising first and/or second primer sites immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide that comprises a first universal primer binding site, a second universal primer binding site, or both, and that is copied from a polynucleotide comprising first and/or second primer sites immobilized on a surface or solid support. In some embodiments, the exponential amplification reaction is performed on a polynucleotide that comprises a first primer binding site, a second primer binding site, a first universal priming site, a second universal priming site, or any combination thereof, and that is copied from a polynucleotide comprising first and/or second primer sites immobilized on a surface or solid support.

Using the primers described herein, a DNA polynucleotide produced by one or more of the methods or method steps described above can be hybridized to a primer set (e.g., a PCR primer set or an exponential amplification primer set), and exponential amplification can be performed using appropriate reagents known in the art. For example, one or more second complementary sequences can be hybridized to a first primer of the primer set (such as a reverse primer), and primer extension can be performed; a second primer of the primer set (such as a forward primer) can then be hybridized to the product of that extension reaction, and primer extension can be performed.

In some embodiments, exponential amplification comprises multiple cycles. In some embodiments, the same first and second primers of a primer set are used for exponential amplification reactions of multiple template polynucleotides. In some embodiments, one or more exponential amplification primers are not target-specific primers. In some embodiments, neither primer of the exponential amplification primer set is a target-specific primer. In some embodiments, the same first and second primers of a primer set are used for exponential amplification reactions of multiple template polynucleotides in the same reaction vessel. In some embodiments, the same first and second primers of a primer set are used for exponential amplification reactions of multiple template polynucleotides in the same reaction. In some embodiments, the same first and second primers of a primer set are used simultaneously for exponential amplification reactions of multiple template polynucleotides. For example, the same first and second primers of a primer set can be used to exponentially amplify multiple target polynucleotide complementary sequences, such as multiple second complementary sequences, derived from different target sequences. For example, the same first and second primers of a primer set can be used to exponentially amplify multiple target polynucleotide complementary sequences, such as multiple second complementary sequences, that comprise the same target sequence or its complement. For example, the same first and second primers of a primer set can be used to exponentially amplify multiple target polynucleotide complementary sequences of an amplicon. For example, the same first and second primers of a primer set can be used to exponentially amplify multiple target polynucleotide complementary sequences, such as multiple second complementary sequences, of an amplicon set. For example, the same first and second primers of a primer set can be used to exponentially amplify each of a plurality of target polynucleotide complementary sequences produced using any of the methods described herein. For example, the same first and second primers of a primer set can be used to exponentially amplify each of a plurality of target polynucleotide complementary sequences comprising an adaptor sequence. For example, the same first and second primers of a primer set can be used to exponentially amplify each of a plurality of target polynucleotide complementary sequences comprising an SBC. For example, the same first and second primers of a primer set can be used to exponentially amplify each of a plurality of target polynucleotide complementary sequences comprising first and second universal priming sites.

In some embodiments, the first and second primers of a primer set can be used to exponentially amplify a UID, an SBC, a target region, any complement thereof, or any combination thereof. For example, the first and second primer binding sites can hybridize 5' and 3', respectively, of the UID, SBC, target region, any complement thereof, or any combination thereof.

In some embodiments, the exponential amplification reaction is the second, third, fourth, or fifth step in producing a polynucleotide library from a sample comprising a target polynucleotide. In some embodiments, the exponential amplification reaction is not the second step in producing a polynucleotide library from a sample comprising a target polynucleotide. In some embodiments, the exponential amplification reaction is not the first amplification reaction performed in a method of producing a polynucleotide library from a sample comprising a target polynucleotide. In some embodiments, the exponential amplification reaction is the third step in producing a polynucleotide library from a sample comprising a target polynucleotide. In some embodiments, the exponential amplification reaction is the fourth step in producing a polynucleotide library from a sample comprising a target polynucleotide. In some embodiments, the exponential amplification reaction is the fifth step in producing a polynucleotide library from a sample comprising a target polynucleotide. In some embodiments, the exponential amplification reaction is performed after an RT or PE reaction. In some embodiments, the exponential amplification reaction is performed after a reaction that adds an adaptor to a target polynucleotide complementary sequence (such as a first complementary sequence). In some embodiments, the exponential amplification reaction is performed after an RT or PE reaction and after a reaction that adds an adaptor to a target polynucleotide complementary sequence (such as a first complementary sequence). In some embodiments, the exponential amplification reaction is performed before a second exponential amplification reaction (such as PCR) is performed. In some embodiments, exponential amplification is performed in the next step after linear primer extension. In some embodiments, exponential amplification is not performed in the next step after linear primer extension. In some embodiments, exponential amplification is not performed in the next step after an RT or PE reaction. In some embodiments, exponential amplification is not performed in the next 2 steps after an RT or PE reaction. In some embodiments, exponential amplification is not performed in the next 3 steps after an RT or PE reaction. In some embodiments, the library of polynucleotide sequences that may comprise UIDs, produced by the exponential amplification step, is not amplified further after that step. In some embodiments, the method comprises only one cycle of exponential amplification. In some embodiments, the method comprises repeatedly extending both primers of a primer set to produce multiple copies of a polynucleotide sequence that may comprise a UID.

An exponential amplification primer can comprise a sequence complementary to a sequence of a target polynucleotide complementary sequence or to the complement of that sequence. For example, one or more exponential amplification primers can comprise a sequence complementary to a sequence, or the complement of a sequence, of a target polynucleotide complementary sequence or of a target polynucleotide in the initial sample. For example, one or more exponential amplification primers can comprise a sequence complementary to a sequence, or the complement of a sequence, of a target polynucleotide complementary sequence that is the product of an amplification reaction, a ligation reaction, primer extension, linear primer extension, or a combination thereof. For example, one or more exponential amplification primers can comprise a sequence complementary to a sequence, or the complement of a sequence, of the first, second, or modified sequence.

In some embodiments, one or more exponential amplification primers do not comprise a sequence complementary to a sequence of the target polynucleotide or to its complement. In some embodiments, one or more exponential amplification primers do not comprise a sequence complementary to a sequence of the target polynucleotide complementary sequence or to its complement. In some embodiments, one or more exponential amplification primers do not comprise a sequence complementary to a sequence of the target polynucleotide or to its complement, and do not comprise a sequence complementary to a sequence of the target polynucleotide complementary sequence or to its complement.

In some embodiments, one or more exponential amplification primers comprise a sequence complementary to a sequence of the target polynucleotide or to its complement. In some embodiments, one or more exponential amplification primers comprise a sequence complementary to a sequence of a UID-containing polynucleotide or to its complement. In some embodiments, one or more exponential amplification primers comprise a sequence complementary to a sequence of the target polynucleotide or to its complement, and comprise a sequence complementary to a sequence of a UID-containing polynucleotide or to its complement.

In some embodiments, the sequence of one or more exponential amplification primers that is complementary to a UID-containing polynucleotide is not complementary to the target sequence. In some embodiments, the sequence of one or more exponential amplification primers that is complementary to a UID-containing polynucleotide is not complementary to any polynucleotide that does not comprise a UID. In some embodiments, the sequence of one or more exponential amplification primers that is complementary to a UID-containing polynucleotide is not complementary to any other polynucleotide in the sample.

In some embodiments, the exponentially amplified target polynucleotide complementary sequence is a single-stranded polynucleotide. In some embodiments, the exponentially amplified target polynucleotide complementary sequence is a double-stranded polynucleotide. In some embodiments, the exponentially amplified target polynucleotide complementary sequence is a one-copy extension product from a PE or RT reaction. In some embodiments, the exponentially amplified target polynucleotide complementary sequence further comprises an adaptor sequence, such as a ligated adaptor sequence. In some embodiments, the exponentially amplified target polynucleotide complementary sequence is the complement of an extension product from a PE or RT reaction and further comprises an adaptor sequence. In some embodiments, the exponentially amplified target polynucleotide complementary sequence is the complement of the complementary sequence of an extension product from a PE or RT reaction and further comprises first and/or second primer binding sites, such as PCR, sequencing, or universal priming sites. In some embodiments, the exponentially amplified target polynucleotide complementary sequence is immobilized on a substrate or surface. In some embodiments, the exponentially amplified target polynucleotide complementary sequence comprises an SBC.

In some embodiments, the sequence of one or more exponential amplification primers that is complementary to the exponentially amplified target polynucleotide complementary sequence is not a sequence in the target polynucleotide. In some embodiments, the sequence of one or more exponential amplification primers that is complementary to the exponentially amplified target polynucleotide complementary sequence is complementary to the complement of a sequence produced during the RT or PE reaction. In some embodiments, the sequence of one or more exponential amplification primers that is complementary to the exponentially amplified target polynucleotide complementary sequence is complementary to a sequence of the target polynucleotide that hybridizes to a sequence of the target located 5' of the sequence of the target polynucleotide that is complementary to the RT or PE primer. In some embodiments, the sequence of one or more exponential amplification primers that is complementary to the exponentially amplified target polynucleotide complementary sequence is complementary to a sequence of the target polynucleotide that hybridizes to a sequence of the target located 3' of the sequence of the target polynucleotide that is complementary to the RT or PE primer. In some embodiments, a sequence of the target polynucleotide that comprises a variant or region to be analyzed by any of the methods described herein can be located between the sequence of the target polynucleotide that is complementary to one or more RT or PE primers and the sequence of the target polynucleotide that is complementary to one or more exponential amplification primers.

In some embodiments, the sequence of one or more exponential amplification primers that is complementary to the exponentially amplified target polynucleotide complementary sequence is not a sequence complementary to the sequence of one or more PE or RT primers. In some embodiments, the sequence of one or more exponential amplification primers that is complementary to the exponentially amplified target polynucleotide complementary sequence is not a sequence complementary to the target-specific sequence of one or more PE or RT primers.

In some embodiments, the one or more exponential amplification primers include a first exponential amplification primer having a region complementary to a sequence of a first template polynucleotide, and a second exponential amplification primer having a region complementary to a sequence of a second template polynucleotide. For example, the first template polynucleotide can be a first DNA molecule, and the second template polynucleotide can be a second DNA molecule. For example, the first template polynucleotide can be a first DNA molecule derived from a first target polynucleotide in a sample, and the second template polynucleotide can be a second DNA molecule derived from a second target polynucleotide in the sample. In some embodiments, the one or more exponential amplification primers include a first exponential amplification primer having a region complementary to a sequence of a first DNA, and one or more second exponential amplification primers each having a region complementary to a sequence of one or more second DNAs. In some embodiments, the sequences of the first and second DNAs are the same. In some embodiments, the sequences of the first and second DNAs are different. In some embodiments, the first and second template sequences are the same. In some embodiments, the first and second template sequences are different. In some embodiments, the first and second target sequences are the same. In some embodiments, the first and second target sequences are different.

Sequencing

After one or more of the methods or method steps described herein have been performed, the resulting polynucleotide library can be sequenced.

Sequencing can be performed by any sequencing method known in the art. In some embodiments, sequencing can be performed at high throughput. Suitable next-generation sequencing technologies include the 454 Life Sciences platform (Roche, Branford, CT) (Margulies et al., Nature, 437, 376-380 (2005)); Illumina's Genome Analyzer, GoldenGate Methylation Assay, or Infinium Methylation Assays, i.e., the Infinium Human Methylation 27K Bead Array or VeraCode GoldenGate methylation array (Illumina, San Diego, CA; Bibikova et al., Genome Res. 16, 383-393 (2006); and U.S. Pat. Nos. 6,306,597, 7,598,035, and 7,232,656); DNA Sequencing by Ligation, SOLiD System (Applied Biosystems/Life Technologies; U.S. Pat. Nos. 6,797,470, 7,083,917, 7,166,434, 7,320,865, 7,332,285, 7,364,858, and 7,429,453); Helicos True Single Molecule DNA sequencing technology (Harris et al., Science, 320, 106-109 (2008); and U.S. Pat. Nos. 7,037,687, 7,645,596, 7,169,560, and 7,769,400); the single-molecule, real-time (SMRT™) technology of Pacific Biosciences; and sequencing (Soni et al., Clin. Chem. 53, 1996-2001 (2007)). The method can further comprise sequencing one or more polynucleotides in the library. The method can further comprise aligning one or more polynucleotide sequences, sequence reads, amplicon sequences, or amplicon set sequences in the library to each other.

As used herein, aligning includes comparing a test sequence (such as a sequence read) to one or more other test sequences, reference sequences, or a combination thereof. In some embodiments, aligning can be used to determine a consensus sequence from multiple sequences or multiple aligned sequences. In some embodiments, aligning includes determining a consensus sequence from a plurality of sequences that each have the same UID. In some embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the length of the reference sequence. The actual comparison of two or more sequences can be accomplished by well-known methods, for example, using a mathematical algorithm. A non-limiting example of such a mathematical algorithm is described in Karlin, S. and Altschul, S., Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993). That algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0), as described in Altschul, S. et al., Nucleic Acids Res., 25:3389-3402 (1997). When the BLAST and Gapped BLAST programs are used, any relevant parameters of the respective programs (e.g., NBLAST) can be used. For example, the parameters for sequence comparison can be set to score=100 and word length=12, or can be varied (e.g., W=5 or W=20). Other examples include the algorithm of Myers and Miller, CABIOS (1989), ADVANCE, ADAM, BLAT, and FASTA. In some embodiments, the percent identity between two amino acid sequences can be determined, for example, using the GAP program in the GCG software package (Accelrys, Cambridge, UK).

In some aspects, determining the number of polynucleotides, amplicons, or amplicon sets having different sequences can include determining the sequences of the polynucleotides, amplicons, or amplicon sets. In some aspects, determining the number of polynucleotides, amplicons, or amplicon sets comprising different UIDs can include determining the sequences of the polynucleotides, amplicons, or amplicon sets comprising the UIDs. Determining the sequence of a polynucleotide can include performing a sequencing reaction to determine the sequence of at least a portion of the target region, the UID, the SBC, at least a portion of the polynucleotide, its complement, its reverse complement, or any combination thereof. In some embodiments, only the UID or a portion of the UID is sequenced. In some embodiments, only the SBC or a portion of the SBC is sequenced. In some embodiments, only the target region or a portion of the target region is sequenced. In some embodiments, the sequencing reaction can take place on a support as described herein, in continuous flow, in a dilution, or in one or more physically separate volumes. Sequencing can include at least about 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more sequencing reads per run. As used herein, a sequence read includes a sequence of nucleotides determined from a sequence or stream of data generated by a sequencing technology.
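
As an illustration of determining the number of polynucleotides comprising different UIDs, the following minimal Python sketch tallies reads per UID. It assumes, purely for the example, that the UID occupies the first `uid_length` bases of each read; the function name, the read layout, and the sample reads are hypothetical and are not taken from this description.

```python
from collections import Counter


def count_reads_per_uid(reads, uid_length):
    """Tally reads by UID.

    Assumption for illustration only: the UID is the first
    `uid_length` bases of each read. Reads shorter than the UID
    are ignored.
    """
    return Counter(read[:uid_length] for read in reads if len(read) >= uid_length)


# Hypothetical reads: a 4-base UID followed by target sequence.
reads = ["ACGTTTGACC", "ACGTTTGACC", "GGCATTGACC", "ACGTTTGATC"]
uid_counts = count_reads_per_uid(reads, uid_length=4)

print(len(uid_counts))  # number of distinct UIDs observed
print(dict(uid_counts))  # reads supporting each UID
```

The number of keys in `uid_counts` is the number of distinct UIDs, and each value is the number of raw reads that carry that UID.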

In some embodiments, sequencing comprises at least about 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, or more sequencing reads per run. Sequencing can include more than, fewer than, or equal to about 1,000,000,000 sequencing reads per run. Sequencing can include more than, fewer than, or equal to about 200,000,000 reads per run.

The method can include determining the sequence of a target polynucleotide by determining a consensus sequence from two or more sequence reads. In some embodiments, an average of 5-50 or 20-30 raw reads per UID provides an ideal balance between consensus sequence accuracy and sufficient sequencing depth (higher raw read counts may require greater sequencing depth). In some embodiments, accuracy can be improved (e.g., clustered normal distribution) when UID information is used to align sequence reads and collapse them into a consensus sequence. UID consensus accuracy is characterized by an improved ability to accurately determine the presence or absence of a mutation or SNP on a second allele, which leads to accurate determination of heterozygosity in a patient with a detected SNP.
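
Collapsing reads that share a UID into a consensus sequence can be sketched as a per-position majority vote. The Python example below is a simplified illustration, not the method of this description: it assumes the reads in each UID group are already aligned to a common start and contain no insertions or deletions, and the function name, the `min_reads` parameter, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_uid(tagged_reads, min_reads=2):
    """Collapse (uid, sequence) pairs into one consensus per UID.

    Assumptions for illustration only: reads sharing a UID start at
    the same position and have no indels, so position i in one read
    corresponds to position i in every other read of the group.
    UID groups with fewer than `min_reads` reads are skipped.
    """
    groups = defaultdict(list)
    for uid, seq in tagged_reads:
        groups[uid].append(seq)

    consensus = {}
    for uid, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        # Only compare positions covered by every read in the group.
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            column = Counter(s[i] for s in seqs)
            bases.append(column.most_common(1)[0][0])  # majority base
        consensus[uid] = "".join(bases)
    return consensus


# Hypothetical data: one read in the first group carries an error (G>C).
tagged_reads = [
    ("AAAC", "ACGTAC"),
    ("AAAC", "ACGTAC"),
    ("AAAC", "ACCTAC"),
    ("GGTT", "TTGACA"),
    ("GGTT", "TTGACA"),
    ("CCCC", "ACGTAC"),  # single read: no consensus is called
]
consensus = consensus_by_uid(tagged_reads)
print(consensus)
```

In this toy input the single discordant base in the `AAAC` group is outvoted by the two concordant reads, which is the intuition behind the accuracy gain from UID-based collapsing. A production pipeline would typically also weigh base qualities and handle ties and indels.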

The method can include generating a consensus sequence from one or more alignments, such as one or more alignments of one or more polynucleotide sequences, sequence reads, amplicon sequences, or amplicon group sequences in a library with each other. Consensus sequences determined using the methods described herein and the libraries produced can improve the accuracy of base calling. For example, a determined consensus sequence can have an improved quality score compared to other methods in the art. As used herein, a quality score includes a measure of the likelihood that a base assignment at a particular sequence position is correct. Therefore, the quality score value can be related to the probability of a correct base call. The methods described herein can be used to determine target polynucleotide sequences with a quality score of about or at least about 10. The methods described herein can reduce the number of sequence reads needed, or use a small number of sequence reads, to achieve the same or a higher confidence in sequence accuracy. In some embodiments, the methods described herein that use UIDs determine sequences with similar or the same confidence or base-calling accuracy using fewer sequence reads than similar methods that do not use UIDs.
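The relationship between a quality score and the probability of a correct base call can be sketched as follows. The text does not name a scale, so this example assumes the widely used Phred convention, Q = -10 log10(p), where p is the probability that the base call is wrong. The function names are illustrative.

```python
import math

def phred_to_error_probability(q):
    """Probability that a base call is wrong, given a Phred-scaled score q."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Phred-scaled quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# Under the Phred assumption, a quality score of 10 corresponds to a 10%
# chance of error, i.e. a 90% chance that the base assignment is correct.
p_q10 = phred_to_error_probability(10)
accuracy_q10 = 1 - p_q10
q_for_one_in_thousand = error_probability_to_phred(0.001)
```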

In some embodiments, sequence reads that do not have two exponential amplification priming sites or their complements, an adaptor sequence, an SBC, an optional UID, two universal priming sequences, or any combination thereof may be misreads. Methods may include sequencing misreads. Methods may include determining the number of misreads, such as for determining reaction conditions or designing primer sequences. Comparing the number of misreads produced under one or more first conditions or sets of conditions can be used to determine a preferred condition or set of conditions. For example, a first method may be performed at a high salt concentration in a PCR reaction, and a second method may be performed at a low salt concentration in the PCR reaction, wherein the first and second methods are performed substantially identically apart from the salt concentration. If the first method results in a higher number of misreads, such as a higher number of misreads for a particular target polynucleotide sequence or primer, then low-salt reaction conditions can be determined to be preferred for that particular target polynucleotide sequence or primer.

In some embodiments, only sequence reads that have two exponential amplification priming sites or their complements, an adaptor sequence, an SBC, an optional UID, two universal priming sequences, or any combination thereof are used for alignment or to determine a consensus sequence. In some embodiments, sequence reads that do not have two exponential amplification priming sites or their complements, an adaptor sequence, an SBC, an optional UID, two universal priming sequences, or any combination thereof are not used for alignment or to determine a consensus sequence.

In some embodiments, one or more sequence reads that do not have two exponential amplification priming sites or their complements are not used for alignment or to determine a consensus sequence. In some embodiments, one or more sequence reads that do not have a single exponential amplification priming site (e.g., a PCR priming site) or its complement are not used for alignment or to determine a consensus sequence. In some embodiments, one or more sequence reads comprising two exponential amplification priming sites or their complements are not used for alignment or to determine a consensus sequence when the two priming sites are not the corresponding exponential amplification priming sites of a primer pair that was used (such as a primer pair used in a PCR reaction).

In some embodiments, only sequence reads that have two exponential amplification priming sites or their complements are used for alignment or to determine a consensus sequence. In some embodiments, only sequence reads that have two exponential amplification priming sites or their complements, where the priming sites or their complements correspond to the exponential amplification priming sites of a primer pair that was used (such as a primer pair used in a PCR reaction), are used for alignment or to determine a consensus sequence. In some embodiments, one or more sequence reads that do not have an SBC are not used for alignment or to determine a consensus sequence. In some embodiments, only sequence reads that have an SBC are used for alignment or to determine a consensus sequence. In most embodiments, one or more sequence reads that do not have a UID are not used for alignment or to determine a consensus sequence. In most embodiments, only sequence reads that have a UID are used for alignment or to determine a consensus sequence. In some embodiments, one or more sequence reads that do not have an adaptor sequence are not used for alignment or to determine a consensus sequence. In some embodiments, only sequence reads that have an adaptor sequence are used for alignment or to determine a consensus sequence. In some embodiments, one or more sequence reads that do not have two universal priming sequences are not used for alignment or to determine a consensus sequence. In some embodiments, only sequence reads that have two universal priming sequences are used for alignment or to determine a consensus sequence.
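A read filter of the kind described above might be sketched as follows. This is a simplified illustration: it assumes exact substring matching, a hypothetical read layout in which the forward priming site precedes the complement of the reverse priming site, and made-up element sequences. Practical pipelines usually tolerate mismatches and check element positions more strictly.

```python
def has_required_elements(read, forward_site, reverse_site_rc, sbc=None):
    """Return True if the read contains both priming sites (and the SBC, if given).

    Assumptions: exact matching; forward_site appears before reverse_site_rc.
    """
    i = read.find(forward_site)
    j = read.rfind(reverse_site_rc)
    if i == -1 or j == -1 or j < i + len(forward_site):
        return False
    if sbc is not None and sbc not in read:
        return False
    return True

# Hypothetical element sequences, for illustration only.
FORWARD, REVERSE_RC, SBC = "GATTACA", "TTGGCC", "ACGT"

reads = [
    "ACGTGATTACACCCCGGGGTTGGCC",  # SBC + both priming sites: kept
    "ACGTGATTACACCCCGGGG",        # second priming site missing: dropped
    "TTTTGATTACACCCCGGGGTTGGCC",  # SBC missing: dropped
]
kept = [r for r in reads if has_required_elements(r, FORWARD, REVERSE_RC, SBC)]
```

Only the reads in `kept` would then be passed on to alignment or consensus determination.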

In some embodiments, a sequence can be determined to be accurate when it is present in at least 5% of the sequences comprising the same UID, the sequences in an amplicon, or the sequences in a group of amplicons. For example, a sequence can be determined to be accurate when it is present in at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 92%, 95%, 97%, 98%, 99%, or more of the sequences comprising the same UID, the sequences in an amplicon, or the sequences in a group of amplicons. For example, a sequence can be determined to be accurate when it is present in at least about 75% to about 99% of the sequences comprising the same UID, the sequences in an amplicon, or the sequences in a group of amplicons. For example, a sequence can be determined to be accurate when it is present in at least about 85% to about 99% of the sequences comprising the same UID, the sequences in an amplicon, or the sequences in a group of amplicons. For example, a sequence can be determined to be accurate when it is present in at least about 92% to about 99% of the sequences comprising the same UID, the sequences in an amplicon, or the sequences in a group of amplicons.
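One possible reading of this threshold is that a sequence is accepted when the fraction of reads in its UID group (or amplicon group) carrying that exact sequence meets a chosen cutoff. The sketch below follows that reading. The function name, the exact-match comparison, and the example cutoffs are assumptions made for illustration.

```python
from collections import Counter

def accurate_sequences(seqs, min_fraction=0.75):
    """Return {sequence: fraction} for sequences whose share of the group
    is at least min_fraction. seqs holds the reads sharing one UID."""
    total = len(seqs)
    counts = Counter(seqs)
    return {s: n / total for s, n in counts.items() if n / total >= min_fraction}

# Ten reads sharing one UID; nine agree and one differs at the last base.
group = ["ACGT"] * 9 + ["ACGA"]

strict = accurate_sequences(group, min_fraction=0.85)   # only the majority passes
lenient = accurate_sequences(group, min_fraction=0.05)  # both sequences pass
```

A stricter cutoff trades sensitivity for specificity: at 85% only the majority sequence is retained, while at 5% the minority sequence also passes.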

In some embodiments, a sequencing chemistry with a relatively high error rate is used. In such embodiments, the average quality score produced by the chemistry is a monotonically decreasing function of sequence read length. In one embodiment, the decrease corresponds to: 0.5% of sequence reads having at least one error in positions 1-75; 1% of sequence reads having at least one error in positions 76-100; and 2% of sequence reads having at least one error in positions 101-125.
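To see how these window-level figures translate into per-base error rates, the following worked calculation may help. It assumes, purely for illustration, that errors at different positions within a window are independent and equally likely, so that a fraction f of reads with at least one error over n positions implies a per-base error rate p = 1 - (1 - f)^(1/n). The text makes no such independence claim.

```python
def per_base_error_rate(fraction_with_error, window_length):
    """Per-base error rate implied by the fraction of reads with >= 1 error
    in a window, assuming independent, identically distributed errors."""
    return 1.0 - (1.0 - fraction_with_error) ** (1.0 / window_length)

# (first position, last position), fraction of reads with at least one error
windows = [((1, 75), 0.005), ((76, 100), 0.01), ((101, 125), 0.02)]

rates = []
for (start, end), fraction in windows:
    n = end - start + 1
    rates.append(per_base_error_rate(fraction, n))
    print(f"positions {start}-{end}: per-base error rate ~ {rates[-1]:.2e}")
```

Under this assumption the implied per-base error rate rises from window to window, which is consistent with a quality score that decreases along the read.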

Target polynucleotides

The methods described herein can be used to generate a polynucleotide library for sequencing from one or more target polynucleotides. A target polynucleotide includes any polynucleotide of interest that is not a product of an amplification reaction. For example, a target polynucleotide can include a polynucleotide in a biological sample. For example, a target polynucleotide does not include the product of a PCR reaction. For example, a target polynucleotide can include a polynucleotide template used to generate the product of an amplification reaction, but not the amplification product itself. For example, a target polynucleotide includes a polynucleotide of interest that can be subjected to a reverse transcription reaction or a primer extension reaction. For example, a target polynucleotide includes RNA or DNA. In some embodiments, the target RNA polynucleotide is mRNA. In some embodiments, the target RNA polynucleotide is polyadenylated. In some embodiments, the RNA polynucleotide is not polyadenylated. In some embodiments, the target polynucleotide is a DNA polynucleotide. The DNA polynucleotide can be genomic DNA. The DNA polynucleotide can comprise exons, introns, untranslated regions, or any combination thereof.

In some embodiments, a library can be generated from two or more regions of a target polynucleotide. In some embodiments, a library can be generated from two or more target polynucleotides. In some embodiments, the target polynucleotide is a genomic nucleic acid or DNA derived from a chromosome. In some embodiments, the target polynucleotide includes a sequence comprising a variation, such as a polymorphism or mutation. In some embodiments, the target polynucleotide includes DNA and does not include RNA. In some embodiments, the target polynucleotide includes RNA and does not include DNA. In some embodiments, the target polynucleotide includes DNA and RNA. In some embodiments, the target polynucleotide is an mRNA molecule. In some embodiments, the target polynucleotide is a DNA molecule. In some embodiments, the target polynucleotide is a single-stranded polynucleotide. In some embodiments, the target polynucleotide is a double-stranded polynucleotide. In some embodiments, the target polynucleotide is a single strand of a double-stranded polynucleotide.

Target polynucleotides can be obtained from any biological sample and prepared using methods known in the art. In some embodiments, the target polynucleotide is isolated directly, without amplification. Methods for direct isolation are known in the art. Non-limiting examples include extraction of genomic DNA or mRNA from a biological sample, organism, or cell.

In some embodiments, one or more target polynucleotides are purified from a biological sample. In some embodiments, the target polynucleotide is not purified from the biological sample containing it. In some embodiments, the target polynucleotide is isolated from a biological sample. In some embodiments, the target polynucleotide is not isolated from the biological sample containing it. For example, in some embodiments, the target polynucleotide is not extracted or purified from the sample. For example, in some embodiments, the target mRNA is not purified from the sample (such as by a poly-A purification method). In some embodiments, the target polynucleotide can be a cell-free nucleic acid. In some embodiments, the target polynucleotide can be a fragmented nucleic acid. In some embodiments, the target polynucleotide can be a transcribed nucleic acid. In some embodiments, the target polynucleotide is a modified polynucleotide. In some embodiments, the target polynucleotide is an unmodified polynucleotide.

In some embodiments, the target polynucleotide is a polynucleotide from a single cell. In some embodiments, the target polynucleotide is from an individual cell. In some embodiments, the target polynucleotide is a polynucleotide from a sample comprising a plurality of cells.

In some embodiments, the target polynucleotide encodes a biomarker sequence. In some embodiments, the target polynucleotide encodes two or more biomarker sequences. In some embodiments, a plurality of target polynucleotides encode a biomarker sequence. In some embodiments, a plurality of target polynucleotides encode two or more biomarker sequences.

Diagnosis

In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject. In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject based on the presence, absence, or level of a target polynucleotide. In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject based on the presence, absence, or level of one or more target polynucleotides.

In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject based on the presence, absence, level, or sequence of one or more sequences obtained using the methods of the invention. For example, a diagnosis of a disease can be made based on the presence, absence, level, or sequence of a variant sequence obtained using the methods of the invention. In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject based on the presence, absence, level, or sequence of one or more sequence reads obtained using the methods described herein. In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject based on the presence, absence, level, or sequence of one or more consensus sequences obtained using the methods described herein. In some embodiments, the method can further comprise diagnosing, prognosing, monitoring, treating, ameliorating, and/or preventing a disease, disorder, symptom, and/or condition in a subject based on a determination of the level (e.g., amount or concentration) of a target polynucleotide in a sample. The level of a target polynucleotide in a sample can be determined based on one or more sequence reads, sequences, consensus sequences, or any combination thereof. The level of each of a plurality of target polynucleotides in a sample can be determined using the methods described herein. The level of each of a plurality of target polynucleotides in a sample can be determined based on the number of sequence reads, sequences, consensus sequences, or any combination thereof, for each target polynucleotide in the plurality. For example, the level of a first target polynucleotide and the level of a second target polynucleotide can be determined using the methods described herein.
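As a rough illustration of determining levels from sequencing output, the sketch below counts distinct UIDs per target, so that PCR duplicates of the same starting molecule are counted once. It assumes each read has already been assigned to a target and reduced to a hypothetical `(target, uid)` pair, and it ignores UID sequencing errors and UID collisions.

```python
from collections import defaultdict

def molecules_per_target(reads):
    """Count distinct UIDs observed for each target polynucleotide."""
    uids = defaultdict(set)
    for target, uid in reads:
        uids[target].add(uid)
    return {target: len(seen) for target, seen in uids.items()}

reads = [
    ("geneA", "AAGT"), ("geneA", "AAGT"),  # two reads, one starting molecule
    ("geneA", "CCTA"),
    ("geneB", "GGGG"), ("geneB", "GGGG"), ("geneB", "GGGG"),
]
levels = molecules_per_target(reads)
raw_read_counts = {"geneA": 3, "geneB": 3}  # what counting reads alone would give
```

Here both targets have three raw reads, yet the UID counts suggest two starting molecules for `geneA` and one for `geneB`, showing how UID-based counting can differ from read counting.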

In some embodiments, the first and second target polynucleotides in a plurality of target polynucleotides are the same. For example, a first target polynucleotide can comprise a first copy of an mRNA molecule, and a second target polynucleotide can comprise a second copy of the mRNA molecule. In some embodiments, the first and second target polynucleotides are different. For example, a first target polynucleotide can comprise a first mRNA molecule, and a second target polynucleotide can comprise a second mRNA molecule transcribed from a different gene than the first mRNA molecule. For example, a first target polynucleotide can comprise a first allele, and a second target polynucleotide can comprise a second allele. For example, a first target polynucleotide can comprise a wild-type sequence, and a second target polynucleotide can comprise a variant sequence.

A set of target polynucleotides can comprise a plurality of biomarkers. A panel of biomarkers can comprise a plurality of target polynucleotides. In some embodiments, a panel of biomarkers comprises a sequence from each of a plurality of different target polynucleotides. For example, a panel of biomarkers can comprise sequences of different first and second target polynucleotides. For example, a set of target polynucleotides can comprise a plurality of biomarkers, such as variant sequences, that are known to be associated with a disease or known not to be associated with a disease. For example, a set of target polynucleotides can include at least one biomarker for each of a plurality of loci. In some embodiments, two or more target polynucleotides in a set of target polynucleotides are of different types. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is an mRNA molecule and a second target that is a DNA molecule. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is RNA and a second target that is DNA. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is mRNA and a second target that is genomic DNA. In some embodiments, two or more target polynucleotides in a set of target polynucleotides are of the same type. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is RNA and a second target that is RNA. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is mRNA and a second target that is mRNA. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is mRNA and a second target that is miRNA. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is DNA and a second target that is DNA. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is genomic DNA and a second target that is genomic DNA. For example, a set of target polynucleotides can include a plurality of target polynucleotides comprising a first target that is cellular DNA and a second target that is circulating DNA.

In some embodiments, the biomarkers of two or more target polynucleotides in a set of target polynucleotides are of different types. For example, a set of target polynucleotides can include a plurality of biomarkers, including a first biomarker for a locus and a second biomarker for a variant sequence. For example, a set of target polynucleotides can include a plurality of biomarkers, including a first biomarker for a SNP and a second biomarker for a mutation. In some embodiments, the biomarkers of two or more target polynucleotides in a set of target polynucleotides are of the same type. For example, a set of target polynucleotides can include a plurality of biomarkers, including a first biomarker for a locus and a second biomarker for another locus. For example, a set of target polynucleotides can include a plurality of biomarkers, including a first biomarker for a SNP and a second biomarker for a SNP.

In some embodiments, the method can further comprise diagnosing or prognosing a subject as having a disease, disorder, symptom, and/or condition with at least 50% confidence. In some embodiments, the presence, absence, level, sequence, or any combination thereof, of a target polynucleotide (such as a biomarker) in a subject can be determined with at least 50% confidence. In some embodiments, the presence, absence, level, sequence, or any combination thereof, of a target polynucleotide in a subject can be determined with 50%-100% confidence.

Sample

As used herein, a sample includes a biological, environmental, medical, or patient source or sample comprising a polynucleotide, such as a target polynucleotide. Any biological sample comprising polynucleotides can be used in the methods described herein. For example, the sample can be a biological sample from a subject comprising RNA or DNA. The polynucleotides can be extracted from the biological sample, or the sample can be subjected to the methods directly, without extraction of the polynucleotides. The sample can be extracted or isolated DNA or RNA. The sample can also be total DNA or RNA extracted from a biological sample, a cDNA library, or viral or genomic DNA. In one embodiment, polynucleotides are isolated from a biological sample that contains a variety of other components, such as proteins, lipids, and non-template nucleic acids. Nucleic acid template molecules can be obtained from any cellular material obtained from an animal, plant, bacterium, fungus, or any other cellular organism. In certain embodiments, the polynucleotides are obtained from a single cell. Polynucleotides can be obtained directly from an organism or from a biological sample obtained from an organism. Any tissue or body fluid sample can be used as a source of nucleic acid for use in the invention. Polynucleotides can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can be infected with a virus or other intracellular pathogen.

Methods of DNA extraction are well known in the art. A classical DNA isolation protocol is based on extraction using organic solvents, such as a mixture of phenol and chloroform, followed by precipitation with ethanol (J. Sambrook et al., "Molecular Cloning: A Laboratory Manual," 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Other methods include: salting-out DNA extraction (P. Sunnucks et al., Genetics, 1996, 144: 747-756; S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25: 4692-4693), trimethylammonium bromide salt DNA extraction (S. Gustincich et al., BioTechniques, 1991, 11: 298-302), and guanidinium thiocyanate DNA extraction (J. B. W. Hammond et al., Biochemistry, 1996, 240: 298-300). A variety of kits are commercially available for extracting DNA from biological samples (e.g., BD Biosciences Clontech (Palo Alto, CA); Epicentre Technologies (Madison, WI); Gentra Systems, Inc. (Minneapolis, MN); MicroProbe Corp. (Bothell, WA); Organon Teknika (Durham, NC); and Qiagen Inc. (Valencia, CA)).

Methods of RNA extraction are also well known in the art (see, e.g., J. Sambrook et al., "Molecular Cloning: A Laboratory Manual," 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York), and several kits for RNA extraction from bodily fluids are commercially available (e.g., Ambion, Inc. (Austin, TX); Amersham Biosciences (Piscataway, NJ); BD Biosciences Clontech (Palo Alto, CA); BioRad Laboratories (Hercules, CA); Dynal Biotech Inc. (Lake Success, NY); Epicentre Technologies (Madison, WI); Gentra Systems, Inc. (Minneapolis, MN); GIBCO BRL (Gaithersburg, MD); Invitrogen Life Technologies (Carlsbad, CA); MicroProbe Corp. (Bothell, WA); Organon Teknika (Durham, NC); Promega, Inc. (Madison, WI); and Qiagen Inc. (Valencia, CA)).

One or more samples can be from one or more sources. One or more of the samples can be from two or more sources. One or more of the samples can be from one or more subjects. One or more of the samples can be from two or more subjects. One or more of the samples can be from the same subject. One or more subjects can be from different species. One or more subjects can be healthy. One or more subjects can be affected by a disease, disorder, or condition.

In some embodiments, the sample is a fluid, such as blood, saliva, lymph, urine, cerebrospinal fluid, semen, sputum, stool, or a tissue homogenate.

A sample can be taken from a subject with a condition. In some embodiments, the subject from whom the sample is taken can be a patient, e.g., a cancer patient or a patient suspected of having cancer. The subject can be a mammal, e.g., a human, and can be male or female. In some embodiments, the female is pregnant. The sample can be a tumor tissue biopsy. For example, a tissue biopsy can be performed by a health care provider, including a physician, physician's assistant, nurse, veterinarian, dentist, massage technician, emergency physician, dermatologist, oncologist, gastroenterologist, or surgeon.

In some embodiments, the disease or condition is a pathogenic infection. The target polynucleotide can be from a pathogen. The pathogen can be a virus, bacterium, fungus, or protozoan. In some embodiments, the pathogen can be a protozoan, such as Acanthamoeba (e.g., A. astronyxis, A. castellanii, A. culbertsoni, A. hatchetti, A. polyphaga, A. rhysodes, A. healyi, A. divionensis), Brachiola (e.g., B. connori, B. vesicularum), Cryptosporidium (e.g., C. parvum), Cyclospora (e.g., C. cayetanensis), Encephalitozoon (e.g., E. cuniculi, E. hellem, E. intestinalis), Entamoeba (e.g., E. histolytica), Enterocytozoon (e.g., E. bieneusi), Giardia (e.g., G. lamblia), Isospora (e.g., I. belli), Microsporidium (e.g., M. africanum, M. ceylonensis), Naegleria (e.g., N. fowleri), Nosema (e.g., N. algerae, N. ocularum), Pleistophora, Trachipleistophora (e.g., T. anthropophthera, T. hominis), and Vittaforma (e.g., V. corneae). The pathogen can be a fungus, such as Candida, Aspergillus, Cryptococcus, Histoplasma, Pneumocystis, and Stachybotrys. The pathogen can be a bacterium. Exemplary bacteria include, but are not limited to, Bordetella, Borrelia, Brucella, Campylobacter, Chlamydia, Chlamydophila, Clostridium, Corynebacterium, Enterococcus, Escherichia, Francisella, Haemophilus, Helicobacter, Legionella, Leptospira, Listeria, Mycobacterium, Mycoplasma, Neisseria, Pseudomonas, Rickettsia, Salmonella, Shigella, Staphylococcus, Streptococcus, Treponema, Vibrio, or Yersinia. The virus can be a reverse-transcribing virus. Examples of reverse-transcribing viruses include, but are not limited to, single-stranded RNA-RT (ssRNA-RT) viruses and double-stranded DNA-RT (dsDNA-RT) viruses. Non-limiting examples of ssRNA-RT viruses include retroviruses, alpharetroviruses, betaretroviruses, gammaretroviruses, deltaretroviruses, epsilonretroviruses, lentiviruses, spumaviruses, metaviruses, and pseudoviruses. Non-limiting examples of dsDNA-RT viruses include hepadnaviruses and caulimoviruses. The virus can be a DNA virus. The virus can be an RNA virus. The DNA virus can be a double-stranded DNA (dsDNA) virus. In some embodiments, the dsDNA virus can be an adenovirus, a herpesvirus, or a poxvirus. Examples of adenoviruses include, but are not limited to, adenovirus and infectious canine hepatitis virus. Examples of herpesviruses include, but are not limited to, herpes simplex virus, varicella-zoster virus, cytomegalovirus, and Epstein-Barr virus. Non-limiting examples of poxviruses include smallpox virus, cowpox virus, sheeppox virus, monkeypox virus, and vaccinia virus. The DNA virus can be a single-stranded DNA (ssDNA) virus. The ssDNA virus can be a parvovirus. Examples of parvoviruses include, but are not limited to, parvovirus B19, canine parvovirus, mouse parvovirus, porcine parvovirus, feline panleukopenia virus, and mink enteritis virus.

The virus can be an RNA virus. The RNA virus can be a double-stranded RNA (dsRNA) virus, a (+) sense single-stranded RNA ((+)ssRNA) virus, or a (-) sense single-stranded RNA ((-)ssRNA) virus. A non-limiting list of dsRNA viruses includes reovirus, orthoreovirus, cypovirus, rotavirus, bluetongue virus, and phytoreovirus. Examples of (+)ssRNA viruses include, but are not limited to, picornaviruses and togaviruses. Examples of picornaviruses include, but are not limited to, enterovirus, rhinovirus, hepatovirus, cardiovirus, aphthovirus, poliovirus, parechovirus, erbovirus, kobuvirus, teschovirus, and coxsackievirus. In some embodiments, the togavirus is rubella virus, Sindbis virus, Eastern equine encephalitis virus, Western equine encephalitis virus, Venezuelan equine encephalitis virus, Ross River virus, O'nyong'nyong virus, Chikungunya virus, or Semliki Forest virus. A non-limiting list of (-)ssRNA viruses includes orthomyxoviruses and rhabdoviruses. Examples of orthomyxoviruses include, but are not limited to, influenzavirus A, influenzavirus B, influenzavirus C, isavirus, and thogotovirus. Examples of rhabdoviruses include, but are not limited to, cytorhabdovirus, dichorhabdovirus, ephemerovirus, lyssavirus, novirhabdovirus, and vesiculovirus.

The sample can be a biological sample from an organism or a virus. Samples for use in the present invention include viral particles or preparations. In some embodiments, the starting material can be a nucleic acid-containing sample from an organism, from which genetic material can be obtained. One or more of the samples can be from a mammal, bacterium, virus, fungus, or plant. One or more of the samples can be from a human, horse, cow, chicken, pig, rat, mouse, monkey, rabbit, guinea pig, sheep, goat, dog, cat, bird, fish, frog, or fruit fly.

In some embodiments, the polynucleotides are bound to other target molecules, such as proteins, enzymes, substrates, antibodies, binding agents, beads, small molecules, peptides, or any other molecule. In general, nucleic acids can be extracted from a biological sample by a variety of techniques (Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor, N.Y. (2001)).

In some embodiments, the sample is saliva. In some embodiments, the sample is whole blood. In some embodiments, to obtain a sufficient amount of polynucleotides for detection, a blood volume of at least about 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 3, 4, 5, 10, 20, 25, 30, 35, 40, 45, or 50 mL is drawn. In some embodiments, blood can be collected into a device containing a magnesium chelator (including, but not limited to, EDTA) and stored at 4°C. Optionally, a calcium chelator can be added, including, but not limited to, EGTA.

In some embodiments, a cell lysis inhibitor is added to the blood, including, but not limited to, formaldehyde, formaldehyde derivatives, formalin, glutaraldehyde, glutaraldehyde derivatives, protein cross-linkers, nucleic acid cross-linkers, protein and nucleic acid cross-linkers, primary amine reactive cross-linkers, sulfhydryl reactive derivatives, sulfhydryl addition or disulfide reduction, carbohydrate reactive cross-linkers, carboxyl reactive cross-linkers, photoreactive cross-linkers, or cleavable cross-linkers. In some embodiments, enzymatic treatment (such as protease digestion) can be used to remove non-nucleic acid material from the starting material.

In some embodiments, the starting material can be a tissue sample comprising tissue, non-limiting examples of which include brain, liver, lung, kidney, prostate, ovary, spleen, lymph node (including tonsil), thyroid, pancreas, heart, skeletal muscle, intestine, larynx, esophagus, stomach, bone, thymus, artery, blood vessel, muscle, gallbladder, adrenal gland, breast, testis, skin, fat, or eye. In other cases, the starting material can be cells comprising nucleic acids. Tissue can include infected tissue, diseased tissue, malignant tissue, calcified tissue, or healthy tissue. A sample can include at least one cell from one or more biological samples. For example, a sample can include one or more malignant cells.

The one or more malignant cells can be derived from a tumor, carcinoma, sarcoma, or leukemia. Sarcomas are cancers of the bone, cartilage, fat, muscle, blood vessels, or other connective or supportive tissue. Sarcomas include, but are not limited to, bone cancer, fibrosarcoma, chondrosarcoma, Ewing's sarcoma, malignant hemangioendothelioma, malignant schwannoma, bilateral vestibular schwannoma, osteosarcoma, and soft tissue sarcomas (e.g., alveolar soft part sarcoma, angiosarcoma, cystosarcoma phyllodes, dermatofibrosarcoma, desmoid tumor, epithelioid sarcoma, extraskeletal osteosarcoma, fibrosarcoma, hemangiopericytoma, hemangiosarcoma, Kaposi's sarcoma, leiomyosarcoma, liposarcoma, lymphangiosarcoma, lymphosarcoma, malignant fibrous histiocytoma, neurofibrosarcoma, rhabdomyosarcoma, and synovial sarcoma). Carcinomas are cancers that begin in the epithelial cells, which are cells that cover the surface of the body, produce hormones, and make up glands. By way of non-limiting example, carcinomas include breast cancer, pancreatic cancer, lung cancer, colon cancer, colorectal cancer, rectal cancer, kidney cancer, bladder cancer, stomach cancer, prostate cancer, liver cancer, ovarian cancer, brain cancer, vaginal cancer, vulvar cancer, uterine cancer, oral cancer, penile cancer, testicular cancer, esophageal cancer, skin cancer, fallopian tube cancer, head and neck cancer, gastrointestinal stromal cancer, adenocarcinoma, cutaneous or intraocular melanoma, cancer of the anal region, cancer of the small intestine, cancer of the endocrine system, thyroid cancer, parathyroid cancer, renal adenocarcinoma, urethral cancer, renal pelvis cancer, ureteral cancer, endometrial cancer, cervical cancer, pituitary cancer, central nervous system (CNS) tumors, primary CNS lymphoma, brain stem glioma, and spinal axis tumors. In some embodiments, the cancer is a skin cancer, such as basal cell carcinoma, squamous cell carcinoma, melanoma, non-melanoma, or actinic (solar) keratosis. In some embodiments, the cancer is lung cancer. Lung cancer can start in the airways that branch off to supply the lungs (bronchi) or in the small air sacs of the lungs (alveoli). Lung cancers include non-small cell lung cancer (NSCLC), small cell lung cancer, and mesothelioma. Examples of NSCLC include squamous cell carcinoma, adenocarcinoma, and large cell carcinoma. Mesothelioma can be a cancerous tumor of the lining of the lung and chest cavity (pleura) or the lining of the abdomen (peritoneum). Mesothelioma may be caused by asbestos exposure. The cancer can be a brain cancer, such as glioblastoma. In some embodiments, the cancer can be a central nervous system (CNS) tumor. CNS tumors can be classified as gliomas or nongliomas. The glioma can be malignant glioma, high-grade glioma, or diffuse intrinsic pontine glioma. Examples of gliomas include astrocytomas, oligodendrogliomas (or mixtures of oligodendroglioma and astrocytoma elements), and ependymomas. Astrocytomas include, but are not limited to, low-grade astrocytomas, anaplastic astrocytomas, glioblastoma multiforme, pilocytic astrocytoma, pleomorphic xanthoastrocytoma, and subependymal giant cell astrocytoma. Oligodendrogliomas include low-grade oligodendrogliomas (or oligoastrocytomas) and anaplastic oligodendrogliomas. Nongliomas include meningiomas, pituitary adenomas, primary CNS lymphomas, and medulloblastomas. In some embodiments, the cancer is a meningioma. The leukemia can be acute lymphocytic leukemia, acute myelocytic leukemia, chronic lymphocytic leukemia, or chronic myelocytic leukemia. Additional types of leukemia include hairy cell leukemia, chronic myelomonocytic leukemia, and juvenile myelomonocytic leukemia. Lymphomas are cancers of the lymphocytes and may develop from either B or T lymphocytes. The two major types of lymphoma are Hodgkin's lymphoma (previously known as Hodgkin's disease) and non-Hodgkin's lymphoma. Hodgkin's lymphoma is marked by the presence of the Reed-Sternberg cell. Non-Hodgkin's lymphomas are all lymphomas that are not Hodgkin's lymphoma. Non-Hodgkin's lymphomas can be indolent lymphomas or aggressive lymphomas. Non-Hodgkin's lymphomas include, but are not limited to, diffuse large B cell lymphoma, follicular lymphoma, mucosa-associated lymphatic tissue (MALT) lymphoma, small cell lymphocytic lymphoma, mantle cell lymphoma, Burkitt's lymphoma, mediastinal large B cell lymphoma, macroglobulinemia, nodal marginal zone B cell lymphoma (NMZL), splenic marginal zone lymphoma (SMZL), extranodal marginal zone B cell lymphoma, intravascular large B cell lymphoma, primary effusion lymphoma, and lymphomatoid granulomatosis.

The plurality of samples can include at least 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 or more samples. The plurality of samples can include at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more samples. The plurality of samples can include at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 100,000, or 1,000,000 or more samples. The plurality of samples can include at least about 10,000 samples.

One or more polynucleotides in a first sample can be different from one or more polynucleotides in a second sample. One or more polynucleotides in a first sample can be different from one or more polynucleotides in the plurality of samples. One or more polynucleotides in a sample can have at least about 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity. In some embodiments, one or more polynucleotides in a sample can differ by fewer than about 100, 90, 80, 70, 60, 50, 40, 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotides or base pairs. The plurality of polynucleotides in one or more of the plurality of samples can include two or more identical sequences. At least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 100% of the total polynucleotides in one or more of the plurality of samples can comprise the same sequence. The plurality of polynucleotides in one or more of the plurality of samples can include at least two different sequences. At least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% of the total polynucleotides in one or more of the plurality of samples can comprise at least two different sequences. In some embodiments, the one or more polynucleotides are variants of each other. For example, the one or more polynucleotides can contain single nucleotide polymorphisms or other types of mutations. In another embodiment, the one or more polynucleotides are splice variants.
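The sequence-identity and base-difference thresholds above can be illustrated with a minimal Python sketch. It assumes the two polynucleotide sequences have already been aligned to equal length, so that a position-by-position comparison is meaningful; the function names and example sequences are hypothetical and are not taken from the specification.

```python
from collections import Counter
from typing import Sequence


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two pre-aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions that are identical (assumed definition)."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = len(seq_a) - count_differences(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


def fraction_with_most_common_sequence(sequences: Sequence[str]) -> float:
    """Fraction of polynucleotides that share the single most frequent sequence."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    most_common_count = Counter(sequences).most_common(1)[0][1]
    return most_common_count / len(sequences)


# Hypothetical example: two 10-nt sequences differing at the last base.
reference = "ACGTACGTAC"
variant = "ACGTACGTAA"
print(count_differences(reference, variant))   # number of differing bases
print(percent_identity(reference, variant))    # identity as a percentage
```

Under these assumptions, a pair of polynucleotides would meet an "at least about 90% sequence identity" threshold when `percent_identity` returns 90.0 or more.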

A first sample can comprise one or more cells, and a second sample can comprise one or more cells. The one or more cells of the first sample can be of the same cell type as the one or more cells of the second sample. The one or more cells of the first sample can be of a different cell type than one or more different cells of the plurality of samples.

The plurality of samples can be obtained concurrently. The plurality of samples can be obtained at the same time. The plurality of samples can be obtained sequentially. The plurality of samples can be obtained within a period of years, for example 100 years, 10 years, 5 years, 4 years, 3 years, 2 years, or 1 year, of obtaining one or more different samples. One or more samples can be obtained within about one year of obtaining one or more different samples. One or more samples can be obtained within 12 months, 11 months, 10 months, 9 months, 8 months, 7 months, 6 months, 4 months, 3 months, 2 months, or 1 month of obtaining one or more different samples. One or more samples can be obtained within 30 days, 28 days, 26 days, 24 days, 21 days, 20 days, 18 days, 17 days, 16 days, 15 days, 14 days, 13 days, 12 days, 11 days, 10 days, 9 days, 8 days, 7 days, 6 days, 5 days, 4 days, 3 days, 2 days, or 1 day of obtaining one or more different samples. One or more samples can be obtained within about 24 hours, 22 hours, 20 hours, 18 hours, 16 hours, 14 hours, 12 hours, 10 hours, 8 hours, 6 hours, 4 hours, 2 hours, or 1 hour of obtaining one or more different samples. One or more samples can be obtained within about 60 seconds, 45 seconds, 30 seconds, 20 seconds, 10 seconds, 5 seconds, 2 seconds, or 1 second of obtaining one or more different samples. One or more samples can be obtained within less than one second of obtaining one or more different samples.

The different polynucleotides of a sample can be present in the sample at different concentrations or amounts. For example, the concentration or amount of one polynucleotide in the sample can be greater than the concentration or amount of another polynucleotide. In some embodiments, the concentration or amount of at least one polynucleotide in the sample is at least about 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more times greater than the concentration or amount of at least one other polynucleotide in the sample. In another example, the concentration or amount of one polynucleotide is less than the concentration or amount of another polynucleotide in the sample. The concentration or amount of at least one polynucleotide in the sample can be at least about 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more times less than the concentration or amount of at least one other polynucleotide in the sample.

In some embodiments, two or more samples can contain different amounts or concentrations of polynucleotides. In some embodiments, the concentration or amount of one polynucleotide in one sample can be greater than the concentration or amount of the same polynucleotide in a different sample. For example, a blood sample may contain a higher amount of a particular polynucleotide than a urine sample. Alternatively, a single sample can be divided into two or more subsamples. The subsamples can contain different amounts or concentrations of the same polynucleotide. The concentration or amount of at least one polynucleotide in one sample can be at least about 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more times greater than the concentration or amount of the same polynucleotide in another sample. Alternatively, the concentration or amount of one polynucleotide in one sample can be less than the concentration or amount of the same polynucleotide in a different sample. For example, the concentration or amount of at least one polynucleotide in one sample can be at least about 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 or more times less than the concentration or amount of the same polynucleotide in another sample.

Whole blood samples

In some embodiments, the sample is whole blood. In some embodiments, the percentage of amplicons comprising 10 or more UIDs generated from a whole blood sample is equal to the percentage of amplicons comprising 10 or more UIDs generated from a purified polynucleotide sample. In some embodiments, the percentage of amplicons comprising 10 or more UIDs generated from a whole blood sample is only about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10% less than the percentage of amplicons comprising 10 or more UIDs generated from a purified polynucleotide sample. In some embodiments, the on-target specificity observed with a whole blood sample is equal to the on-target specificity observed with a purified polynucleotide sample. In some embodiments, the on-target specificity observed with a whole blood sample is only about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10% less than the on-target specificity observed with a purified polynucleotide sample. In some embodiments, the coverage uniformity observed with a whole blood sample is equal to the coverage uniformity observed with a purified polynucleotide sample. In some embodiments, the coverage uniformity observed with a whole blood sample is only about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10% less than the coverage uniformity observed with a purified polynucleotide sample.
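The comparison of UID-rich amplicons between a whole blood sample and a purified polynucleotide sample can be sketched as follows. This is only an illustration under stated assumptions: each amplicon is represented by the list of UID strings observed for it, the "percent less" is read as a difference in percentage points, and the function name, amplicon names, and UID values are all hypothetical.

```python
from typing import Iterable, Mapping


def percent_amplicons_with_min_uids(
    uids_by_amplicon: Mapping[str, Iterable[str]], min_uids: int = 10
) -> float:
    """Percent of amplicons for which at least `min_uids` distinct UIDs were seen."""
    if not uids_by_amplicon:
        raise ValueError("at least one amplicon is required")
    passing = sum(
        1 for uids in uids_by_amplicon.values() if len(set(uids)) >= min_uids
    )
    return 100.0 * passing / len(uids_by_amplicon)


def make_uids(n: int) -> list:
    """Build n distinct placeholder UID labels (illustrative only)."""
    return [f"UID{i}" for i in range(n)]


# Hypothetical observations for four amplicons in each sample type.
whole_blood = {
    "amp1": make_uids(12),
    "amp2": make_uids(10),
    "amp3": make_uids(4),
    "amp4": make_uids(15),
}
purified = {
    "amp1": make_uids(14),
    "amp2": make_uids(11),
    "amp3": make_uids(10),
    "amp4": make_uids(16),
}

wb_pct = percent_amplicons_with_min_uids(whole_blood)
pure_pct = percent_amplicons_with_min_uids(purified)
# Shortfall of the whole blood sample relative to the purified sample,
# expressed in percentage points (an assumed interpretation).
shortfall = pure_pct - wb_pct
print(wb_pct, pure_pct, shortfall)
```

Duplicate UIDs within an amplicon are collapsed with `set` before counting, since the text counts distinct UIDs rather than reads.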

FFPE samples

In some embodiments, the sample is a formalin-fixed, paraffin-embedded (FFPE) sample. In some embodiments, the percentage of amplicons comprising 10 or more UIDs generated from an FFPE sample is equal to the percentage of amplicons comprising 10 or more UIDs generated from a purified polynucleotide sample. In some embodiments, the percentage of amplicons comprising 10 or more UIDs generated from an FFPE sample is only about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10% less than the percentage of amplicons comprising 10 or more UIDs generated from a purified polynucleotide sample. In some embodiments, the on-target specificity observed with an FFPE sample is equal to the on-target specificity observed with a purified polynucleotide sample. In some embodiments, the on-target specificity observed with an FFPE sample is only about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10% less than the on-target specificity observed with a purified polynucleotide sample. In some embodiments, the coverage uniformity observed with an FFPE sample is equal to the coverage uniformity observed with a purified polynucleotide sample. In some embodiments, the coverage uniformity observed with an FFPE sample is only about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or at most 10% less than the coverage uniformity observed with a purified polynucleotide sample.
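On-target specificity and coverage uniformity are not defined in this passage, so the following Python sketch adopts two working definitions purely as assumptions: on-target specificity is taken as the percentage of reads that map to a targeted region, and coverage uniformity as the percentage of targets whose read depth is at least a chosen fraction (here 0.2) of the mean depth. The function names, the 0.2 cutoff, and the example numbers are hypothetical.

```python
from typing import Sequence


def on_target_specificity(on_target_reads: int, total_reads: int) -> float:
    """Percent of reads mapping to targeted regions (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= on_target_reads <= total_reads:
        raise ValueError("on_target_reads must lie between 0 and total_reads")
    return 100.0 * on_target_reads / total_reads


def coverage_uniformity(
    depths: Sequence[float], min_fraction_of_mean: float = 0.2
) -> float:
    """Percent of targets covered at >= min_fraction_of_mean x mean depth.

    The 0.2 default is an illustrative cutoff, not a value from the text.
    """
    if not depths:
        raise ValueError("at least one target depth is required")
    mean_depth = sum(depths) / len(depths)
    threshold = min_fraction_of_mean * mean_depth
    covered = sum(1 for depth in depths if depth >= threshold)
    return 100.0 * covered / len(depths)


# Hypothetical FFPE run: 950 of 1000 reads on target, four targets sequenced.
print(on_target_specificity(950, 1000))
print(coverage_uniformity([100, 120, 80, 10]))
```

The same two functions could be applied to a purified polynucleotide sample, and the resulting percentages compared to check whether the FFPE values fall within the stated margin.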

Libraries

The libraries disclosed herein can be used in a variety of applications. As used herein, a library comprises a plurality of molecules. In some embodiments, a library comprises a plurality of polynucleotides. In some embodiments, a library comprises a plurality of primers. In some embodiments, a library comprises a plurality of RT primers. In some embodiments, a library comprises a plurality of PE primers. In some embodiments, a library comprises a plurality of linear primer extension (LPE) primers. In some embodiments, a library comprises a plurality of adaptors. In some embodiments, a library comprises a plurality of primers for non-exponential amplification, such as linear amplification. In some embodiments, a library comprises a plurality of primers for exponential amplification, such as PCR. In some embodiments, a library comprises a plurality of polynucleotides for sequencing. For example, a library can be used in sequencing applications. In some embodiments, a library comprises a plurality of sequence reads from one or more polynucleotides, amplicons, or amplicon sets. A library can be stored and used multiple times to generate samples for analysis. Some applications include, for example, genotyping polymorphisms, studying RNA processing, and selecting clonal representatives for sequencing according to the methods provided herein.

A library comprising a plurality of polynucleotides (such as primers), or a library for sequencing or amplification, can be generated, wherein the plurality of polynucleotides comprises at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, or 900 UIDs or unique polynucleotides. In some embodiments, a polynucleotide library comprises a plurality of at least about 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 50,000,000, 100,000,000 or more unique polynucleotides, wherein each unique polynucleotide comprises a UID. In some embodiments, a polynucleotide library comprises a plurality of amplicon sets, wherein an amplicon set comprises a plurality of polynucleotides having the same UID.

In some embodiments, a polynucleotide library comprises a plurality of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000 or more amplicons, wherein each of one or more of the amplicons comprises a plurality of polynucleotides having the same UID. In some embodiments, a polynucleotide library comprises a plurality of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000 or more amplicon sets, wherein each amplicon set comprises a plurality of polynucleotides or amplicons having the same UID.

In some embodiments, a polynucleotide library comprises a plurality of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000 or more polynucleotides, amplicons, or amplicon sets, wherein each polynucleotide, amplicon, or amplicon set comprises a plurality of polynucleotides, amplicons, or amplicon sets having the same template sequence or a portion thereof. In some embodiments, a polynucleotide library comprises a plurality of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000 or more polynucleotides, amplicons, or amplicon sets, wherein each polynucleotide, amplicon, or amplicon set comprises a plurality of polynucleotides, amplicons, or amplicon sets having a template sequence or a portion thereof that differs from that of one or more other polynucleotides, amplicons, or amplicon sets by one or more bases as a result of amplification or sequencing errors or bias.
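As an illustration of an amplicon set whose members share a UID but differ by occasional amplification or sequencing errors, the following minimal Python sketch groups reads by UID and takes a column-wise majority vote within each set. It assumes, purely for illustration, that the UID occupies a fixed number of bases at the start of each read and that all reads are the same length; the function names and the majority-vote rule are hypothetical and are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def group_by_uid(reads, uid_len):
    """Group reads into amplicon sets keyed by their leading UID (assumed layout)."""
    sets = defaultdict(list)
    for read in reads:
        uid, insert = read[:uid_len], read[uid_len:]
        sets[uid].append(insert)
    return dict(sets)


def consensus(seqs):
    """Column-wise majority vote over equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


reads = [
    "ACGT" + "TTGACCA",  # UID ACGT
    "ACGT" + "TTGACCA",  # UID ACGT
    "ACGT" + "TTGATCA",  # UID ACGT, one base differs (simulated error)
    "GGCA" + "TTGACCA",  # UID GGCA
]

amplicon_sets = group_by_uid(reads, uid_len=4)
consensus_by_uid = {uid: consensus(seqs) for uid, seqs in amplicon_sets.items()}
print(consensus_by_uid)
```

Within the ACGT set, the single discordant base is outvoted by the two concordant reads, so the set collapses to one template sequence.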

Primers

One or more reactions carried out in the methods disclosed herein can include the use of one or more primers. As used herein, a primer comprises a double-stranded, single-stranded, or partially single-stranded oligonucleotide that is sufficiently complementary to hybridize to a template polynucleotide. A primer can be single-stranded DNA prior to binding to a template polynucleotide. In some embodiments, a primer initially comprises a double-stranded sequence. A primer site includes the region of the template to which the primer hybridizes. In some embodiments, a primer is capable of serving as an initiation site for template-directed nucleic acid synthesis. For example, a primer can initiate template-directed nucleic acid synthesis in the presence of four different nucleotides and a polymerizing agent or enzyme, such as a DNA or RNA polymerase or a reverse transcriptase. A primer pair includes two primers: a first primer with a 5' upstream region that hybridizes to the 5' end of a template sequence, and a second primer with a 3' downstream region that hybridizes to the complement of the 3' end of the template sequence. In some embodiments, a primer comprises a target-specific sequence and a UID sequence. In some embodiments, a primer comprises a barcode sequence. In some embodiments, a primer comprises a UID sequence. In some embodiments, a primer comprises a sample barcode sequence. In some embodiments, a primer comprises a universal priming sequence. In some embodiments, a primer comprises a PCR priming sequence. In some embodiments, a primer comprises a PCR priming sequence used to initiate amplification of a polynucleotide (Dieffenbach, PCR Primer: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, New York (2003)).

A universal primer binding site or sequence allows a universal primer to attach to a polynucleotide and/or amplicon. Universal primers are well known in the art and include, but are not limited to, -47F (M13F), alfaMF, AOX3', AOX5', BGHr, CMV-30, CMV-50, CVMf, LACrmt, λgt10F, λgt10R, λgt11F, λgt11R, M13 rev, M13 Forward (-20), M13 Reverse, male, p10SEQPpQE, pA-120, pet4, pGAP Forward, pGLRVpr3, pGLpr2R, pKLAC14, pQEFS, pQERS, pucU1, pucU2, reversA, seqIREStam, seqIRESzpet, seqori, seqPCR, seqpIRES-, seqpIRES+, seqpSecTag, seqpSecTag+, seqretro+PSI, SP6, T3-prom, T7-prom, and T7-termInv. As used herein, attach can refer to both or either of covalent and non-covalent interactions. Attachment of a universal primer to a universal primer binding site can be used for amplification, detection, and/or sequencing of the polynucleotide and/or amplicon.

A universal primer binding site can comprise at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides or base pairs. In another example, a universal primer binding site comprises at least about 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, or 10000 nucleotides or base pairs. In some embodiments, a universal primer binding site comprises 1-10, 10-20, 10-30, or 10-100 nucleotides or base pairs. In some embodiments, a universal primer binding site comprises about 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 2-90, 2-80, 2-70, 2-60, 2-50, 2-40, 2-30, 2-20, 2-10, 1-900, 1-800, 1-700, 1-600, 1-500, 1-400, 1-300, 1-200, 1-100, 2-900, 2-800, 2-700, 2-600, 2-500, 2-400, 2-300, 2-200, 2-100, 5-90, 5-80, 5-70, 5-60, 5-50, 5-40, 5-30, 5-20, 5-10, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 10-10, 5-900, 5-800, 5-700, 5-600, 5-500, 5-400, 5-300, 5-200, 5-100, 10-900, 10-800, 10-700, 10-600, 10-500, 10-400, 10-300, 10-200, 10-100, 25-900, 25-800, 25-700, 25-600, 25-500, 25-400, 25-300, 25-200, 25-100, 100-1000, 100-900, 100-800, 100-700, 100-600, 100-500, 100-400, 100-300, 100-200, 200-1000, 200-900, 200-800, 200-700, 200-600, 200-500, 200-400, 200-300, 300-1000, 300-900, 300-800, 300-700, 300-600, 300-500, 300-400, 400-1000, 400-900, 400-800, 400-700, 400-600, 400-500, 500-1000, 500-900, 500-800, 500-700, 500-600, 600-1000, 600-900, 600-800, 600-700, 700-1000, 700-900, 700-800, 800-1000, 800-900, or 900-1000 nucleotides or base pairs.

A primer can have a length compatible with its use in the synthesis of primer extension products. A primer can be a polynucleotide 8-200 nucleotides in length. The length of a primer can depend on the sequence of the template polynucleotide and the template locus. For example, the length and/or melting temperature (Tm) of a primer or primer set can be optimized. In some cases, a primer can be about, more than about, or less than about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 nucleotides in length. In some embodiments, a primer is about 8-100 nucleotides in length, for example, 10-75, 15-60, 15-40, 18-30, 20-40, 21-50, 22-45, 25-40, 7-9, 12-15, 15-20, 15-25, 15-30, 15-45, 15-50, 15-55, 15-60, 20-25, 20-30, 20-35, 20-45, 20-50, 20-55, or 20-60 nucleotides in length, and any length in between. In some embodiments, a primer is at most about 10, 12, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides in length.

Typically, one or more pairs of primers are used in an exponential amplification reaction; one primer of a primer pair can be a forward primer and the other primer of the pair can be a reverse primer. In some embodiments, a first pair of primers can be used in the exponential amplification reaction; one primer of the first pair can be a forward primer complementary to a first sequence of a first template polynucleotide molecule, and one primer of the first pair can be a reverse primer complementary to a second sequence of the first template polynucleotide molecule, and a first template locus can reside between the first sequence and the second sequence. In some embodiments, a second pair of primers can be used in the amplification reaction; one primer of the second pair can be a forward primer complementary to a first sequence of a second target polynucleotide molecule, and one primer of the second pair can be a reverse primer complementary to a second sequence of the second target polynucleotide molecule, and a second target locus can reside between the first sequence and the second sequence. In some embodiments, the second target locus comprises a variable light chain antibody sequence. In some embodiments, a third pair of primers can be used in the amplification reaction; one primer of the third pair can be a forward primer complementary to a first sequence of a third template polynucleotide molecule, and one primer of the third pair can be a reverse primer complementary to a second sequence of the third template polynucleotide molecule, and a third template locus can reside between the first sequence and the second sequence. In some embodiments, the first, second, or third template locus comprises a barcode, such as a UID.
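The relationship between a primer pair and the locus that resides between the two primer-complementary sequences can be sketched in a few lines of Python. This is a simplified illustration only: it assumes exact string matches on a single written strand, with the forward primer matching that strand directly and the reverse primer matching its reverse complement. The sequences and the `locus_between` helper are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def locus_between(template, forward, reverse):
    """Return the template segment between the two primer sites, or None."""
    start = template.find(forward)
    if start < 0:
        return None
    # The reverse primer anneals to the written strand, so its site on that
    # strand reads as the reverse complement of the primer.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return template[start + len(forward):end]


# Hypothetical template: forward site + locus + reverse site.
template = "GGATCC" + "ACGTTGCA" + "CTTAACCGG"
forward_primer = "GGATCC"
reverse_primer = reverse_complement("CTTAACCGG")

locus = locus_between(template, forward_primer, reverse_primer)
print(locus)
```

Real primer-template hybridization tolerates mismatches and depends on reaction conditions, which exact string matching does not model.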

One or more primers can anneal to at least a portion of a plurality of template polynucleotides. One or more primers can anneal to the 3' end and/or the 5' end of the plurality of template polynucleotides. One or more primers can anneal to an internal region of the plurality of template polynucleotides. The internal region can be at least about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 100, 150, 200, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 650, 700, 750, 800, 850, 900, or 1000 nucleotides from the 3' end or the 5' end of the plurality of polynucleotides. The one or more primers can comprise a fixed panel of primers. The one or more primers can comprise at least one or more custom primers. The one or more primers can comprise at least one or more control primers. The one or more primers can comprise at least one or more housekeeping gene primers. The one or more primers can comprise a universal primer. A universal primer can anneal to a universal primer binding site. In some embodiments, the one or more custom primers do not anneal to a UID. In some embodiments, the one or more custom primers anneal to an SBC, a target-specific region, complements thereof, or any combination thereof. The one or more primers can comprise a universal primer and a UID-containing primer. The one or more primers can be designed to amplify one or more target or template polynucleotides, or to subject them to primer extension, reverse transcription, linear extension, non-exponential amplification, exponential amplification, PCR, or any other amplification method.

A target-specific region can comprise at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 100, 150, 200, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 650, 700, 750, 800, 850, 900, or 1000 nucleotides or base pairs. In another example, a target-specific region comprises at least about 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, or 10000 nucleotides or base pairs. In some embodiments, a target-specific region comprises about 5-10, 10-15, 10-20, 10-30, 15-30, 10-75, 15-60, 15-40, 18-30, 20-40, 21-50, 22-45, 25-40, 7-9, 12-15, 15-20, 15-25, 15-30, 15-45, 15-50, 15-55, 15-60, 20-25, 20-30, 20-35, 20-45, 20-50, 20-55, 20-60, 2-900, 2-800, 2-700, 2-600, 2-500, 2-400, 2-300, 2-200, 2-100, 25-900, 25-800, 25-700, 25-600, 25-500, 25-400, 25-300, 25-200, 25-100, 100-1000, 100-900, 100-800, 100-700, 100-600, 100-500, 100-400, 100-300, 100-200, 200-1000, 200-900, 200-800, 200-700, 200-600, 200-500, 200-400, 200-300, 300-1000, 300-900, 300-800, 300-700, 300-600, 300-500, 300-400, 400-1000, 400-900, 400-800, 400-700, 400-600, 400-500, 500-1000, 500-900, 500-800, 500-700, 500-600, 600-1000, 600-900, 600-800, 600-700, 700-1000, 700-900, 700-800, 800-1000, 800-900, or 900-1000 nucleotides or base pairs.

Primers can be designed according to known parameters for avoiding secondary structure and self-hybridization. In some embodiments, different primer pairs can anneal and melt at about the same temperature, for example, within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10°C of another primer pair. In some embodiments, one or more primers of a plurality of primers can anneal and melt at about the same temperature, for example, within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10°C of another primer of the plurality. In some embodiments, one or more primers of a plurality of primers can anneal and melt at a temperature different from that of another primer of the plurality.
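A check that a group of primers melts within a chosen window can be sketched as follows. The sketch estimates Tm with the Wallace rule (2°C per A or T, 4°C per G or C), a rough approximation suited only to short oligonucleotides; the text does not specify how Tm is calculated, so the choice of formula, the function names, and the example sequences are all assumptions made for illustration.

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    at = sum(primer.count(base) for base in "AT")
    gc = sum(primer.count(base) for base in "GC")
    return 2 * at + 4 * gc


def within_tolerance(primers, max_delta_c):
    """True if all estimated Tm values lie within max_delta_c of one another."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms) <= max_delta_c


# Hypothetical 18-nt primers.
balanced_a = "ACGTACGTACGTACGTAC"  # 9 A/T, 9 G/C
balanced_b = "AGCTAGCTAGCTAGCTAG"  # 9 A/T, 9 G/C
gc_rich = "GGGGCCCCGGGGCCCCGG"     # 18 G/C

print(wallace_tm(balanced_a), wallace_tm(gc_rich))
print(within_tolerance([balanced_a, balanced_b], max_delta_c=5))
print(within_tolerance([balanced_a, balanced_b, gc_rich], max_delta_c=5))
```

For design work, a nearest-neighbor thermodynamic model with salt and concentration corrections would normally replace the Wallace estimate.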

A plurality of primers used in one or more steps of the methods described herein can comprise about, at most about, or at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 50,000,000, or 100,000,000 different primers. For example, each primer of the plurality can comprise a UID. For example, each primer of the plurality can comprise a different target- or template-specific region or sequence. For example, each primer of the plurality can comprise a different UID and a different target- or template-specific region or sequence. For example, each primer of the plurality can comprise a different UID and the same target- or template-specific region or sequence.
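The last case, in which every primer carries a different UID but the same target-specific region, can be illustrated with a short Python sketch. It assumes, for illustration only, that the UID is a random A/C/G/T sequence placed 5' of the target-specific region; the `make_uid_primers` helper and the example target sequence are hypothetical, and real primers may carry additional elements such as universal priming sequences.

```python
import random


def make_uid_primers(target_region, uid_len, n, seed=0):
    """Build n primers with distinct random UIDs and a shared target-specific region."""
    if n > 4 ** uid_len:
        raise ValueError("not enough distinct UIDs of this length")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    uids = set()
    while len(uids) < n:
        uids.add("".join(rng.choice("ACGT") for _ in range(uid_len)))
    # Assumed layout: 5'-UID-target-specific region-3'
    return [uid + target_region for uid in sorted(uids)]


target = "GATTACAGATTACAGATTACA"  # hypothetical 21-nt target-specific region
primers = make_uid_primers(target, uid_len=8, n=100)
print(len(primers), primers[0])
```

Drawing UIDs into a set guarantees that the primers differ from one another only in the UID portion.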

Primer sets

In some embodiments, a primer set for use in the methods described herein comprises or consists of primers with a melting temperature range of 60°C-68°C. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers 21-32 nucleotides in length. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers that do not contain 4 or more pyrimidines in the last 5 nucleotides at the 3' end. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers designed to produce amplicons with 30%-70% GC content. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers designed to produce amplicons 225-300 base pairs in length. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers from an initial set from which the primers with the highest number of misreads (caused by mispriming) in the initial RT/PE step or the linear extension/amplification step have been excluded. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers from an initial set from which primers that are prevalently present as dimers have been excluded. In some embodiments, a primer set for use in the methods described herein comprises or consists of primers from an initial set from which the primers responsible for the one or more highest numbers of total on-target reads (over-amplifiers) have been excluded. Any one or any combination of the above considerations can be used to generate a primer set for use in the methods described.
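The sequence-based criteria above (Tm range, primer length, 3' pyrimidine content, amplicon GC content, and amplicon length) lend themselves to a simple filter, sketched below in Python. The sketch takes Tm as a precomputed input because the text does not say how it is calculated. It reads the 3' rule as "at most three pyrimidines in the last five nucleotides", exposed as a parameter in case a different reading is intended. The function name, parameter names, and example sequences are hypothetical.

```python
def passes_design_rules(primer, tm_c, amplicon, max_3p_pyrimidines=3):
    """Check one primer and its expected amplicon against the listed criteria."""
    length_ok = 21 <= len(primer) <= 32
    tm_ok = 60 <= tm_c <= 68
    # Count C/T among the last five nucleotides at the 3' end.
    pyrimidines_3p = sum(base in "CT" for base in primer[-5:])
    three_prime_ok = pyrimidines_3p <= max_3p_pyrimidines
    gc_fraction = (amplicon.count("G") + amplicon.count("C")) / len(amplicon)
    gc_ok = 0.30 <= gc_fraction <= 0.70
    size_ok = 225 <= len(amplicon) <= 300
    return all([length_ok, tm_ok, three_prime_ok, gc_ok, size_ok])


amplicon = "ACGT" * 60                      # hypothetical 240-bp amplicon, 50% GC
good_primer = "ACGTACGTACGTACGTACGTAGA"     # 23 nt, 3' end GTAGA (1 pyrimidine)
bad_primer = "ACGTACGTACGTACGTACCTTCC"      # 23 nt, 3' end CTTCC (5 pyrimidines)

print(passes_design_rules(good_primer, tm_c=64.0, amplicon=amplicon))
print(passes_design_rules(bad_primer, tm_c=64.0, amplicon=amplicon))
```

The empirical exclusions (misreads, dimers, over-amplifiers) depend on sequencing data from an initial primer set and are not modeled here.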

UIDs

In some embodiments, barcodes, such as SBCs or UIDs, can each have a length in the range of 4-36 nucleotides, 6-30 nucleotides, or 8-20 nucleotides. In certain aspects, the melting temperatures of the barcodes within a set are within 10°C of one another, within 5°C of one another, or within 2°C of one another. In other aspects, barcodes are members of a minimally cross-hybridizing set. For example, the nucleotide sequence of each member of such a set can be sufficiently different from that of every other member of the set that no member can form a stable duplex with the complement of any other member under stringent hybridization conditions. In some embodiments, the nucleotide sequence of each member of a minimally cross-hybridizing set differs from that of every other member by at least two nucleotides. Barcode technologies are described in Winzeler et al. (1999) Science 285:901; Brenner (2000) Genome Biol. 1:1; Kumar et al. (2001) Nature Rev. 2:302; Giaever et al. (2004) Proc. Natl. Acad. Sci. USA 101:793; Eason et al. (2004) Proc. Natl. Acad. Sci. USA 101:11046; and Brenner (2004) Genome Biol. 5:240.
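The requirement that every barcode in a set differ from every other by at least two nucleotides can be checked with a pairwise Hamming-distance calculation, sketched below in Python. The sketch assumes equal-length barcodes and uses mismatch count only as a proxy; it does not assess duplex stability under stringent hybridization conditions. The helper names and barcode sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


barcodes = ["AACC", "AAGG", "CCAA", "GGTT"]
print(min_pairwise_distance(barcodes))             # every pair differs at >= 2 positions
print(min_pairwise_distance(barcodes + ["AACG"]))  # AACG is one mismatch from AACC
```

A minimum distance of two means that a single substitution error cannot convert one barcode into another member of the set.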

As used herein, a unique identification tag (UID) comprises information that is unique to a single molecule, or to two or more molecules, of a plurality or library of molecules. A barcode can be a UID. In some embodiments, the unique information comprises a unique sequence of nucleotides. For example, the sequence of a UID can be determined by determining the identity and order of the unique or random sequence of nucleotides that make up the UID. In some embodiments, the unique information cannot be used to identify the sequence of a target polynucleotide. In some embodiments, the unique information is not a known sequence linked to the identity of the sequence of a target polynucleotide. For example, a UID can be attached to one or more target polynucleotides, but the UID cannot be used to determine which of the one or more target polynucleotides it is attached to. In some embodiments, the unique information comprises a random sequence of nucleotides. In some embodiments, the unique information comprises one or more unique sequences of nucleotides on a polynucleotide. In some embodiments, the unique information comprises a degenerate nucleotide sequence or degenerate barcode. A degenerate barcode can comprise a variable nucleotide base composition or sequence. For example, a degenerate barcode can be a random sequence. In some embodiments, the complement of a UID sequence is also a UID sequence.

A UID can comprise any number of nucleotides. For example, a UID can comprise at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides. For example, a UID can comprise at most about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides. In some embodiments, a UID has a particular length in nucleotides. For example, a UID can be about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides in length.

In some embodiments, each UID of a plurality of UIDs has at least about 2 nucleotides. For example, each UID of a plurality of UIDs can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides in length. In some embodiments, each UID of a plurality of UIDs has at most about 1000 nucleotides. For example, each UID of a plurality of UIDs can be at most about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides in length.

In some embodiments, each UID of a plurality of UIDs has the same length in nucleotides. For example, each UID of a plurality of UIDs can be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides in length. In some embodiments, one or more UIDs of a plurality of UIDs have different lengths in nucleotides. For example, one or more first UIDs of a plurality of UIDs can have about, or at least about, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides, and one or more second UIDs of the plurality can have about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 200, 500, or 1000 nucleotides, wherein the number of nucleotides of the one or more first UIDs differs from that of the one or more second UIDs.

The number of UIDs can be in excess of the number of molecules to be labeled. In some embodiments, the number of UIDs is at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 times the number of molecules to be labeled.

The number of different UIDs can be in excess of the number of different molecules to be labeled. In some embodiments, the number of different UIDs is at least about 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 times the number of different molecules to be labeled.
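As a rough illustration of how a desired excess of different UIDs relates to UID length, the following sketch assumes that every UID position is fully degenerate (any of four bases), so that a UID of length n can take 4^n distinct sequences. That assumption, and the function names `uid_space` and `min_uid_length`, are illustrative only and are not taken from the passage above.

```python
def uid_space(length: int) -> int:
    """Number of distinct UIDs of a given length, assuming 4 possible bases per position."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def min_uid_length(num_molecules: int, fold_excess: float) -> int:
    """Smallest UID length (>= 1) whose sequence space is at least
    fold_excess times the number of different molecules to be labeled."""
    if num_molecules < 1 or fold_excess <= 0:
        raise ValueError("num_molecules must be >= 1 and fold_excess must be > 0")
    required = num_molecules * fold_excess
    length = 1
    while uid_space(length) < required:
        length += 1
    return length
```

Under these assumptions, `min_uid_length(1_000_000, 10)` evaluates to 12, because 4^11 (about 4.2 million) is below the required 10 million distinct UIDs while 4^12 (about 16.8 million) exceeds it.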

In some embodiments, at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 100% of the different UIDs have the same concentration. In some embodiments, at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 100% of the different UIDs have different concentrations.

The UIDs in a population of UIDs can have at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more different sequences. For example, the UIDs in a population can have at least 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, or more different sequences. Thus, a plurality of UIDs can be used to generate at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more different sequences from one or more polynucleotides (target polynucleotides). For example, a plurality of UIDs can be used to generate at least 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1x10⁶, 2x10⁶, 3x10⁶, 4x10⁶, 5x10⁶, 6x10⁶, 7x10⁶, 8x10⁶, 9x10⁶, 1x10⁷, 2x10⁷, 3x10⁷, 4x10⁷, 5x10⁷, 6x10⁷, 7x10⁷, 8x10⁷, 9x10⁷, 1x10⁸, 2x10⁸, 3x10⁸, 4x10⁸, 5x10⁸, 6x10⁸, 7x10⁸, 8x10⁸, 9x10⁸, 1x10⁹, 2x10⁹, 3x10⁹, 4x10⁹, 5x10⁹, 6x10⁹, 7x10⁹, 8x10⁹, 9x10⁹, 1x10¹⁰, 2x10¹⁰, 3x10¹⁰, 4x10¹⁰, 5x10¹⁰, 6x10¹⁰, 7x10¹⁰, 8x10¹⁰, 9x10¹⁰, 1x10¹¹, 2x10¹¹, 3x10¹¹, 4x10¹¹, 5x10¹¹, 6x10¹¹, 7x10¹¹, 8x10¹¹, 9x10¹¹, 1x10¹², 2x10¹², 3x10¹², 4x10¹², 5x10¹², 6x10¹², 7x10¹², 8x10¹², 9x10¹², or more different sequences from one or more polynucleotides (target polynucleotides). For example, a plurality of UIDs can be used to generate at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1x10⁶, 2x10⁶, 3x10⁶, 4x10⁶, 5x10⁶, 6x10⁶, 7x10⁶, 8x10⁶, 9x10⁶, 1x10⁷, 2x10⁷, 3x10⁷, 4x10⁷, 5x10⁷, 6x10⁷, 7x10⁷, 8x10⁷, 9x10⁷, 1x10⁸, 2x10⁸, 3x10⁸, 4x10⁸, 5x10⁸, 6x10⁸, 7x10⁸, 8x10⁸, 9x10⁸, 1x10⁹, 2x10⁹, 3x10⁹, 4x10⁹, 5x10⁹, 6x10⁹, 7x10⁹, 8x10⁹, 9x10⁹, 1x10¹⁰, 2x10¹⁰, 3x10¹⁰, 4x10¹⁰, 5x10¹⁰, 6x10¹⁰, 7x10¹⁰, 8x10¹⁰, 9x10¹⁰, 1x10¹¹, 2x10¹¹, 3x10¹¹, 4x10¹¹, 5x10¹¹, 6x10¹¹, 7x10¹¹, 8x10¹¹, 9x10¹¹, 1x10¹², 2x10¹², 3x10¹², 4x10¹², 5x10¹², 6x10¹², 7x10¹², 8x10¹², 9x10¹², or more different sequences from at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1x10⁶, 2x10⁶, 3x10⁶, 4x10⁶, 5x10⁶, 6x10⁶, 7x10⁶, 8x10⁶, 9x10⁶, 1x10⁷, 2x10⁷, 3x10⁷, 4x10⁷, 5x10⁷, 6x10⁷, 7x10⁷, 8x10⁷, 9x10⁷, 1x10⁸, 2x10⁸, 3x10⁸, 4x10⁸, 5x10⁸, 6x10⁸, 7x10⁸, 8x10⁸, 9x10⁸, 1x10⁹, 2x10⁹, 3x10⁹, 4x10⁹, 5x10⁹, 6x10⁹, 7x10⁹, 8x10⁹, 9x10⁹, 1x10¹⁰, 2x10¹⁰, 3x10¹⁰, 4x10¹⁰, 5x10¹⁰, 6x10¹⁰, 7x10¹⁰, 8x10¹⁰, 9x10¹⁰, 1x10¹¹, 2x10¹¹, 3x10¹¹, 4x10¹¹, 5x10¹¹, 6x10¹¹, 7x10¹¹, 8x10¹¹, 9x10¹¹, 1x10¹², 2x10¹², 3x10¹², 4x10¹², 5x10¹², 6x10¹², 7x10¹², 8x10¹², 9x10¹², or more target polynucleotides.

In some embodiments, one or more UIDs can be used to group or partition sequences. In some embodiments, one or more UIDs can be used to group or partition sequences, wherein the sequences in each partition contain the same UID. In some embodiments, one or more UIDs can be used to group or partition sequences, wherein the sequences in each partition comprise an amplicon set. In some embodiments, one or more UIDs can be used to group or partition sequences, wherein the sequences in each partition comprise a plurality of sequences, and wherein the polynucleotides from which the plurality of sequences were generated are derived from the same polynucleotide in an amplification reaction. For example, one or more UIDs can be used to group or partition the sequences in an amplicon, an amplicon set, or both. In some embodiments, one or more UIDs are not used to align sequences.

In some embodiments, one or more UIDs are not used to align sequences. In some embodiments, one or more UIDs are not used to align sequences and are instead used to group or partition sequences. In some embodiments, one or more UIDs are not used to align sequences, and a target-specific region is used to align sequences. In some embodiments, one or more UIDs are used to group or partition sequences, and a target-specific region is used to align sequences. In some embodiments, one or more UIDs are not used to align sequences, one or more UIDs are used to group or partition sequences, and a target-specific region is used to align sequences.
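The division of labor described above, in which UIDs are used only to group sequences and the target-specific region is what gets aligned, can be sketched as follows. This is a minimal illustration rather than the method of the disclosure: it assumes that the UID occupies the first `uid_length` bases of each read and that the remainder is the target-specific region, and the function name `group_by_uid` is hypothetical.

```python
from collections import defaultdict


def group_by_uid(reads, uid_length):
    """Partition reads so that all reads in a group carry the same UID.

    Assumption for illustration: the UID is the first uid_length bases of a
    read, and the rest of the read is the target-specific region.
    Returns {uid: [target_region, ...]}.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= uid_length:
            continue  # too short to hold a UID plus any target sequence
        uid, target = read[:uid_length], read[uid_length:]
        groups[uid].append(target)
    return dict(groups)


reads = ["ACGTTTGACC", "ACGTTTGACC", "GGCATTGACC", "ACGTTTGTCC"]
groups = group_by_uid(reads, uid_length=4)
```

Here the three reads that begin with `ACGT` fall into one group and the read that begins with `GGCA` into another. Only the stripped target-specific regions stored in each group would then be passed to an aligner, so the UID itself never takes part in alignment.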

In some embodiments, one or more UIDs are used to align sequences. In some embodiments, one or more UIDs are used to align sequences, wherein the aligned sequences contain the same UID. In some embodiments, one or more UIDs are used to align sequences, wherein the aligned sequences comprise two or more sequences from an amplicon set. In some embodiments, one or more UIDs are used to align sequences, wherein the aligned sequences comprise a plurality of sequences, and wherein the polynucleotides from which the plurality of sequences were generated are derived from the same polynucleotide in an amplification reaction.

Enzymes

The methods and kits disclosed herein can include one or more enzymes. Examples of enzymes include, but are not limited to, ligases, reverse transcriptases, polymerases, and restriction nucleases.

In some embodiments, ligating an adaptor to a polynucleotide includes the use of one or more ligases. Examples of ligases include, but are not limited to, DNA ligases, such as DNA ligase I, DNA ligase III, DNA ligase IV, and T4 DNA ligase, and RNA ligases, such as T4 RNA ligase I and T4 RNA ligase II.

The methods and kits disclosed herein can further include the use of one or more reverse transcriptases. In some embodiments, the reverse transcriptase is HIV-1 reverse transcriptase, M-MLV reverse transcriptase, AMV reverse transcriptase, or telomerase reverse transcriptase. In some embodiments, the reverse transcriptase is M-MLV reverse transcriptase.

In some embodiments, the methods and kits disclosed herein include the use of one or more polymerases. Examples of polymerases include, but are not limited to, DNA polymerases and RNA polymerases. In some embodiments, the DNA polymerase is DNA polymerase I, DNA polymerase II, DNA polymerase III holoenzyme, or DNA polymerase IV. Commercially available DNA polymerases include, but are not limited to, Bst 2.0 DNA polymerase, Bst 2.0 WarmStart™ DNA polymerase, Bst DNA polymerase, Sulfolobus DNA polymerase IV, Taq DNA polymerase, 9°N™m DNA polymerase, Deep VentR™ (exo-) DNA polymerase, Deep VentR™ DNA polymerase, Hemo KlenTaq™, […] Taq DNA polymerase, […] DNA polymerase, […] DNA polymerase, Q5™ High-Fidelity DNA polymerase, Therminator™ γ DNA polymerase, Therminator™ DNA polymerase, Therminator™ II DNA polymerase, Therminator™ III DNA polymerase, […] DNA polymerase, […] (exo-) DNA polymerase, Bsu DNA polymerase, phi29 DNA polymerase, T4 DNA polymerase, T7 DNA polymerase, terminal transferase, Taq polymerase, KAPA Taq DNA polymerase, and KAPA Taq hot-start DNA polymerase.

In some embodiments, the polymerase is an RNA polymerase, such as RNA polymerase I, RNA polymerase II, RNA polymerase III, E. coli poly(A) polymerase, phi6 RNA polymerase (RdRP), poly(U) polymerase, SP6 RNA polymerase, or T7 RNA polymerase.

Additional reagents

The methods and kits disclosed herein can include the use of one or more reagents. Examples of reagents include, but are not limited to, PCR reagents, ligation reagents, reverse transcription reagents, enzyme reagents, hybridization reagents, sample preparation reagents, affinity capture reagents, solid supports such as beads, and reagents for nucleic acid purification and/or isolation.

A solid support can comprise virtually any insoluble or solid material, and a solid support composition that is insoluble in water is often selected. For example, a solid support can comprise or consist essentially of silica gel, glass (e.g., controlled pore glass (CPG)), nylon, […] cellulose, a metal surface (e.g., steel, gold, silver, aluminum, silicon, and copper), a magnetic material, a plastic material (e.g., polyethylene, polypropylene, polyamide, polyester, polyvinylidene difluoride (PVDF)), and the like. Examples of beads for use according to the embodiments can include an affinity moiety that allows the bead to interact with a nucleic acid molecule. A solid phase (e.g., a bead) can comprise a member of a binding pair (e.g., avidin, streptavidin, or a derivative thereof). For example, the bead can be a streptavidin-coated bead, and a nucleic acid molecule to be immobilized on the bead can include a biotin moiety. In some cases, each polynucleotide molecule can include two affinity moieties, such as biotin, to further stabilize the polynucleotide. Beads can include additional features for use in immobilizing nucleic acids or that can be used in a downstream screening or selection process. For example, the bead can include a binding moiety, a fluorescent label, or a fluorescent quencher. In some cases, the bead can be magnetic. In some cases, the solid support is a bead. Examples of beads include, but are not limited to, streptavidin beads, agarose beads, magnetic beads, […] microbeads, antibody-conjugated beads (e.g., anti-immunoglobulin microbeads), protein A-conjugated beads, protein G-conjugated beads, protein A/G-conjugated beads, protein L-conjugated beads, oligo-dT-conjugated beads, silica beads, silica-like beads, anti-biotin microbeads, anti-fluorochrome microbeads, and BcMag™ carboxy-terminated magnetic beads. Beads or particles can be swellable (e.g., polymeric beads such as Wang resin) or non-swellable (e.g., CPG). In some embodiments, the solid phase is substantially hydrophilic. In some embodiments, the solid phase (e.g., a bead) is substantially hydrophobic. In some embodiments, the solid phase comprises a member of a binding pair (e.g., avidin, streptavidin, or a derivative thereof) and is substantially hydrophobic or substantially hydrophilic. In some embodiments, the solid phase comprises a member of a binding pair (e.g., avidin, streptavidin, or a derivative thereof) and has a binding capacity greater than about 1350 picomoles of free capture agent (e.g., free biotin) per mg of solid support. In some embodiments, the binding capacity of the solid phase comprising a member of a binding pair is greater than 800, 900, 1000, 1100, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1600, 1800, or 2000 picomoles of free capture agent per mg of solid support. Other examples of beads that are suitable for the invention are colloidal gold or beads such as polystyrene beads or silica beads. Essentially any bead radius can be used. Examples of beads can include beads having a diameter ranging from 150 nanometers to 10 micrometers. Other sizes can also be used.

The methods and kits disclosed herein can include the use of one or more buffers. Examples of buffers include, but are not limited to, wash buffers, ligation buffers, hybridization buffers, amplification buffers, and reverse transcription buffers. In some embodiments, the hybridization buffer is a commercially available buffer, such as TMAC Hyb solution, SSPE hybridization solution, or ECONO™ hybridization buffer. The buffers disclosed herein can comprise one or more detergents.

The methods and reagents disclosed herein can include the use of one or more carriers. A carrier can increase or improve the efficiency of one or more of the reactions disclosed herein (e.g., ligation, reverse transcription, amplification, hybridization). A carrier can reduce or prevent nonspecific loss of a molecule or its product (e.g., a polynucleotide and/or an amplicon). For example, a carrier can reduce nonspecific loss of a polynucleotide through adsorption to surfaces. A carrier can decrease the affinity of a polynucleotide for a surface or substrate (e.g., a container, an Eppendorf tube, a pipette tip). Alternatively, a carrier can increase the affinity of a polynucleotide for a surface or substrate (e.g., a bead, array, glass, slide, or chip). A carrier can protect a polynucleotide from degradation. For example, a carrier can protect an RNA molecule from degradation by ribonucleases. Alternatively, a carrier can protect a DNA molecule from degradation by DNases. Examples of carriers include, but are not limited to, polynucleotides, such as DNA and/or RNA, and polypeptides. Examples of DNA carriers include plasmids, vectors, polyadenylated DNA, and DNA oligonucleotides. Examples of RNA carriers include polyadenylated RNA, phage RNA, phage MS2 RNA, E. coli RNA, yeast RNA, yeast tRNA, mammalian RNA, mammalian tRNA, short polyadenylated synthetic ribonucleotides, and RNA oligonucleotides. The RNA carrier can be a polyadenylated RNA. Alternatively, the RNA carrier can be a non-polyadenylated RNA. In some embodiments, the carrier is from a bacterium, yeast, or virus. For example, the carrier can be a polynucleotide or polypeptide derived from a bacterium, yeast, or virus. For example, the carrier is a protein from Bacillus subtilis. In another example, the carrier is a polynucleotide from Escherichia coli. Alternatively, the carrier is a polynucleotide or peptide from a mammal (e.g., human, mouse, goat, rat, cow, sheep, pig, dog, or rabbit), bird, amphibian, or reptile.

The methods and kits disclosed herein can include the use of one or more control reagents. Control reagents can include control polynucleotides, inactivated enzymes, and nonspecific competitors. Alternatively, control reagents include bright hybridization, bright probe controls, nucleic acid templates, spike-in controls, and PCR amplification controls. A PCR amplification control can be a positive control. In other examples, the PCR amplification control is a negative control. A nucleic acid template control can be of known concentration. Control reagents can comprise one or more labels.

A spike-in control can be a template that is added to a reaction or sample. For example, a spike-in template can be added to an amplification reaction. The spike-in template can be added to the amplification reaction at any time after the first amplification cycle. In some embodiments, the spike-in template is added to the amplification reaction after 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, or 50 cycles. The spike-in template can be added to the amplification reaction at any time before the last amplification cycle. The spike-in template can comprise one or more nucleotides or nucleic acid base pairs. The spike-in template can comprise DNA, RNA, or any combination thereof. The spike-in template can comprise one or more labels.

Computer-implemented aspects

As will be understood by those of ordinary skill in the art, the methods and information described herein can be implemented, in whole or in part, as computer-executable instructions on known computer-readable media. For example, the methods described herein can be implemented in hardware. Alternatively, the methods can be implemented in software stored in, for example, one or more memories or other computer-readable media and executed on one or more processors. As is known, the processors can be associated with one or more controllers, calculation units, and/or other units of a computer system, or implemented in firmware as desired. If implemented in software, the routines can be stored in any computer-readable memory, such as RAM, ROM, flash memory, a magnetic disk, an optical disk, or another storage medium, as is also known. Likewise, the software can be delivered to a computing device via any known delivery method, including, for example, over a communication channel such as a telephone line, the Internet, or a wireless connection, or via a transportable medium such as a computer-readable disk or flash drive. More generally, and as will be understood by those of ordinary skill in the art, the various steps described above can be implemented as various blocks, operations, tools, modules, and techniques, which in turn can be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software. When implemented in hardware, some or all of the blocks, operations, techniques, etc. can be implemented in, for example, a custom integrated circuit (IC), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic array (PLA), etc.

Results from sequencing data can be stored in a data storage unit, such as a data carrier, including a computer database or a data storage disk, or by other convenient data storage means. In certain embodiments, the computer database is an object database, a relational database, or a post-relational database. Data can be retrieved from the data storage unit using any convenient data query method.
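As one possible illustration of storing sequencing-derived results in a relational database and retrieving them with a query, the sketch below uses Python's built-in `sqlite3` module. The table name `uid_counts`, its columns, and the helper functions are hypothetical choices made for this example; the disclosure does not prescribe a schema or a database engine.

```python
import sqlite3


def store_uid_counts(conn, counts):
    """Persist a {uid: read_count} mapping in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uid_counts ("
        "uid TEXT PRIMARY KEY, read_count INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO uid_counts (uid, read_count) VALUES (?, ?)",
        counts.items(),
    )
    conn.commit()


def uids_with_min_reads(conn, min_reads):
    """Query the stored results: UIDs supported by at least min_reads reads."""
    rows = conn.execute(
        "SELECT uid FROM uid_counts WHERE read_count >= ? ORDER BY uid",
        (min_reads,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_uid_counts(conn, {"ACGT": 3, "GGCA": 1})
frequent = uids_with_min_reads(conn, 2)
```

With the example data, `frequent` contains only `"ACGT"`. An in-memory database keeps the sketch self-contained; a file path passed to `sqlite3.connect` would give persistent storage instead.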

When implemented in software, the software can be stored in any known computer-readable medium, such as on a magnetic disk, an optical disk, or other storage medium, or in the RAM, ROM, or flash memory of a computer, a processor, a hard disk drive, an optical disk drive, a tape drive, etc. Likewise, the software can be delivered to a user or a computing system via any known delivery method, including, for example, on a computer-readable disk or other transportable computer storage mechanism.

The steps of the claimed methods can be operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that can be suitable for use with the claimed methods or systems include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The steps of the claimed methods can be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, and/or data structures that perform particular tasks or implement particular abstract data types. The methods can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In both integrated and distributed computing environments, program modules can be located in both local and remote computer storage media, including memory storage devices. Numerous alternative environments could be implemented using either current technology or technology developed after the filing date of this application, which would still fall within the scope of the claims defining the present disclosure.

Although the methods and other elements have been described as preferably being implemented in software, they can be implemented in hardware, firmware, etc., and can be executed by any other processor. Thus, the elements described herein can be implemented on a standard multipurpose CPU or on specifically designed hardware or firmware, such as an application-specific integrated circuit (ASIC) or other hard-wired device, as desired. When implemented in software, the software routines can be stored in any computer-readable memory, such as on a magnetic disk, an optical disk, or other storage medium, in the RAM or ROM of a computer or processor, in any database, etc. Likewise, the software can be delivered to a user or a playback system via any known or desired delivery method, including, for example, on a computer-readable disk or other transportable computer storage medium, or over a communication channel such as a telephone line, the Internet, or wireless communication. Modifications and variations can be made in the techniques and structures described and illustrated herein without departing from the spirit and scope of the present disclosure.

FIG. 58 is a block diagram illustrating a first example architecture of a computer system 100 that can be used in connection with example embodiments of the present invention. As depicted in FIG. 58, the example computer system can include a processor 102 for processing instructions. Non-limiting examples of processors include: an Intel Xeon™ processor, AMD Opteron™ processor, Samsung 32-bit RISC ARM 1176JZ(F)-S v1.0™ processor, ARM Cortex-A8 Samsung S5PC100™ processor, ARM Cortex-A8 Apple A4™ processor, Marvell PXA 930™ processor, or a functionally equivalent processor. Multiple threads of execution can be used for parallel processing. In some embodiments, multiple processors or processors with multiple cores can also be used, whether in a single computer system, in a cluster, or distributed across systems over a network comprising a plurality of computers, cell phones, and/or personal data assistant devices.
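To make the idea of using multiple threads of execution for parallel processing concrete, the sketch below splits a list of reads into chunks, counts UIDs in each chunk on a worker thread, and merges the partial counts. It is an illustration under stated assumptions, not part of the disclosed system: the UID is assumed to be the first `uid_length` bases of each read, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_uids(reads, uid_length):
    """Count how many reads carry each UID (assumed to be the read prefix)."""
    return Counter(read[:uid_length] for read in reads if len(read) >= uid_length)


def count_uids_parallel(reads, uid_length, workers=2):
    """Count UIDs chunk by chunk on a thread pool and merge the partial counts."""
    chunk_size = max(1, -(-len(reads) // workers))  # ceiling division
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda chunk: count_uids(chunk, uid_length), chunks):
            total.update(partial)  # Counter.update adds counts together
    return total
```

In standard CPython builds, threads mainly help when the per-chunk work is I/O-bound; for CPU-bound counting, a process pool (`concurrent.futures.ProcessPoolExecutor`) could be substituted with the same chunk-and-merge structure.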

As illustrated in FIG. 59, a high-speed cache 104 can be connected to, or incorporated in, the processor 102 to provide high-speed memory for instructions or data that have been recently, or are frequently, used by the processor 102. The processor 102 is connected to a north bridge 106 by a processor bus 108. The north bridge 106 is connected to random access memory (RAM) 110 by a memory bus 112 and manages access to the RAM 110 by the processor 102. The north bridge 106 is also connected to a south bridge 114 by a chipset bus 116. The south bridge 114 is, in turn, connected to a peripheral bus 118. The peripheral bus can be, for example, PCI, PCI-X, PCI Express, or another peripheral bus. The north bridge and south bridge are often referred to as a processor chipset and manage data transfer between the processor, RAM, and peripheral components on the peripheral bus 118. In some alternative architectures, the functionality of the north bridge can be incorporated into the processor instead of using a separate north bridge chip.

In some embodiments, the system 100 can include an accelerator card 122 attached to the peripheral bus 118. The accelerator can include field programmable gate arrays (FPGAs) or other hardware for accelerating certain processing. For example, an accelerator can be used for adaptive data restructuring or to evaluate algebraic expressions used in extended set processing.

Software and data can be stored in external storage 124 and loaded into the RAM 110 and/or cache 104 for use by the processor. The system 100 includes an operating system for managing system resources; non-limiting examples of operating systems include Linux, Windows™, MACOS™, BlackBerry OS™, iOS™, and other functionally equivalent operating systems. The system also includes application software running on top of the operating system for managing data storage and optimization in accordance with example embodiments of the present invention.

In this example, the system 100 also includes network interface cards (NICs) 120 and 121 connected to the peripheral bus for providing network interfaces to external storage, such as network attached storage (NAS), and to other computer systems that can be used for distributed parallel processing.

FIG. 59 is a diagram showing a network 200 with a plurality of computer systems 202a and 202b, a plurality of cell phones and personal data assistants 202c, and network attached storage (NAS) 204a and 204b. In example embodiments, systems 202a, 202b, and 202c can manage data storage and optimize access to data stored in NAS 204a and 204b. A mathematical model can be applied to the data and evaluated using distributed parallel processing across computer systems 202a and 202b and cell phone and personal data assistant systems 202c. Computer systems 202a and 202b and cell phone and personal data assistant systems 202c can also provide parallel processing for adaptive data restructuring of the data stored in NAS 204a and 204b. FIG. 59 illustrates only one example, and a wide variety of other computer architectures and systems can be used in connection with the various embodiments of the present invention. For example, a blade server can be used to provide parallel processing. Processor blades can be connected through a backplane to provide parallel processing. Storage can also be connected to the backplane or provided as NAS through a separate network interface.

In some example embodiments, processors can maintain separate memory spaces and transmit data through network interfaces, a backplane, or other connectors for parallel processing by other processors. In other embodiments, some or all of the processors can use a shared virtual address memory space.

FIG. 60 is a block diagram of a multiprocessor computer system 300 using a shared virtual address memory space in accordance with an example embodiment. The system includes a plurality of processors 302a-f that can access a shared memory subsystem 304. The system incorporates a plurality of programmable hardware memory algorithm processors (MAPs) 306a-f in the memory subsystem 304. Each MAP 306a-f can comprise a memory 308a-f and one or more field programmable gate arrays (FPGAs) 310a-f. The MAP provides a configurable functional unit, and particular algorithms or portions of algorithms can be provided to the FPGAs 310a-f for processing in close coordination with a respective processor. For example, in example embodiments, the MAPs can be used to evaluate algebraic expressions regarding the data model and to perform adaptive data restructuring. In this example, each MAP is globally accessible by all of the processors for these purposes. In one configuration, each MAP can use direct memory access (DMA) to access its associated memory 308a-f, allowing it to execute tasks independently of, and asynchronously from, the respective microprocessor 302a-f. In this configuration, a MAP can feed results directly to another MAP for pipelined and parallel execution of algorithms.

The above computer architectures and systems are examples only, and a wide variety of other computer, cell phone, and personal data assistant architectures and systems can be used in connection with example embodiments, including systems using any combination of general processors, co-processors, FPGAs and other programmable logic devices, systems on chips (SOCs), application specific integrated circuits (ASICs), and other processing and logic elements. In some embodiments, all or part of the data management and optimization system can be implemented in software or hardware, and any variety of data storage media can be used in connection with example embodiments, including random access memory, hard drives, flash memory, tape drives, disk arrays, network attached storage (NAS), and other local or distributed data storage devices and systems.

In example embodiments, the data management and optimization system can be implemented using software modules executing on any of the above or other computer architectures and systems. In other embodiments, the functions of the system can be implemented partially or completely in firmware, programmable logic devices, systems on chips (SOCs), application specific integrated circuits (ASICs), or other processing and logic elements. For example, the Set Processor and Optimizer can be implemented with hardware acceleration through the use of a hardware accelerator card.

Those skilled in the art will understand that although only one of each of the above components is depicted in the figures, any number of any of these components can be provided. Furthermore, one of ordinary skill in the art will recognize that one or more components of any of the disclosed systems can be combined with, or incorporated into, another component shown in the figures. One or more of the components depicted in the figures can be implemented in software on one or more computing systems. For example, they can comprise one or more applications, which can comprise one or more computer units of computer-readable instructions that, when executed by a processor, cause a computer to perform the steps of a method. The computer-readable instructions can be stored on a computer-readable medium, such as a memory card or disk. Such media typically provide non-transitory storage. Alternatively, one or more of the components depicted in the figures can be hardware components or combinations of hardware and software, such as, for example, special purpose computers or general purpose computers. A computer or computer system can also comprise an internal or external database. The components of a computer or computer system can be connected through a local bus interface. Those skilled in the art will understand that the above-described stages can be implemented in distinct software modules. Although the disclosed components have been described above as separate units, one of ordinary skill in the art will recognize that the functionality provided by one or more units can be combined. One of ordinary skill in the art will also understand that one or more units can be optional and can be omitted from implementations in certain embodiments.

Kits

Kits useful in the methods of the present disclosure comprise components useful in any of the methods described herein, including, for example, primers for nucleic acid amplification, hybridization probes for detecting genetic variation or for other marker detection, restriction enzymes, nucleic acid probes optionally labeled with suitable labels, allele-specific oligonucleotides, antibodies that bind to an altered polypeptide encoded by a nucleic acid of the disclosure as described herein or to a wild-type polypeptide encoded by a nucleic acid of the disclosure as described herein, means for amplifying genetic variations or fragments thereof, means for analyzing the nucleic acid sequence of nucleic acids comprising genetic variations as described herein, means for analyzing the amino acid sequence of a polypeptide encoded by a genetic variation, or a nucleic acid associated with a genetic variation. For example, a kit can include the necessary buffers, nucleic acid primers for amplifying nucleic acids, a solid support, and reagents for allele-specific detection of the fragments amplified using such primers and the necessary enzymes (e.g., DNA polymerase), such as any of those described herein. Additionally, kits can provide reagents for assays to be used in combination with the methods of the present disclosure, for example, reagents for use with screening assays for other diseases or conditions.

In some embodiments, the disclosure pertains to a kit for assaying a nucleic acid sample from a subject to detect the presence of a genetic variation, wherein the kit comprises the reagents necessary for selectively detecting at least one particular genetic variation in the genome of the individual. In some embodiments, the disclosure pertains to a kit for assaying a nucleic acid sample from a subject to detect the presence of at least one particular allele of at least one polymorphism associated with a genetic variation in the genome of the subject. In some embodiments, the reagents comprise at least one contiguous oligonucleotide that hybridizes to a fragment of the genome of the individual comprising at least the genetic variation. In some embodiments, the reagents comprise at least one pair of oligonucleotides that hybridize to opposite strands of a genomic segment obtained from a subject, wherein each oligonucleotide primer pair is designed to selectively amplify a fragment of the genome of the individual that includes at least one genetic variation or a fragment of a genetic variation. Such oligonucleotides or nucleic acids can be designed using the methods described herein. In some embodiments, the kit comprises one or more labeled nucleic acids capable of allele-specific detection of one or more specific polymorphic markers or haplotypes with a genetic variation, and reagents for detection of the label. In some embodiments, a kit for detecting SNP markers can comprise a detection oligonucleotide probe that hybridizes to a segment of template DNA containing the SNP polymorphism to be detected, an enhancer oligonucleotide probe, a detection probe, primers, and/or an endonuclease, for example, as described by Kutyavin et al. (Nucleic Acid Res. 34:e128 (2006)).

In some embodiments, the DNA template is amplified by any means of the present disclosure before the presence of the specific genetic variations described herein is assessed. Standard methods well known to the skilled person for performing these methods can be utilized and are within the scope of this disclosure. In one such embodiment, reagents for performing these methods can be included in the reagent kit. In a further aspect of the present disclosure, a pharmaceutical pack (kit) is provided, the pack comprising a therapeutic agent and a set of instructions for administration of the therapeutic agent to humans screened for one or more variants of the present disclosure, as disclosed herein. The therapeutic agent can be a small molecule drug, an antibody, a peptide, an antisense or RNAi molecule, or another therapeutic molecule as described herein. In some embodiments, an individual identified as a carrier of at least one variant of the present disclosure is instructed to take a prescribed dose of the therapeutic agent. In some embodiments, an individual identified as a non-carrier of at least one variant of the present disclosure is instructed to take a prescribed dose of the therapeutic agent.

Also provided herein are articles of manufacture comprising a probe that hybridizes to a region of a human chromosome as described herein and that can be used to detect a polymorphism described herein. For example, any of the probes for detecting polymorphisms described herein can be combined with packaging material to generate articles of manufacture or kits. The kit can include one or more other elements, including: instructions for use; and other reagents, such as a label or an agent useful for attaching a label to the probe. Instructions for use can include instructions for screening applications of the probe for making a diagnosis, prognosis, or theranosis of a disease or condition by a method described herein. Other instructions can include instructions for attaching a label to the probe, instructions for performing in situ analysis with the probe, and/or instructions for obtaining a nucleic acid sample to be analyzed from a subject. In some cases, the kit can include a labeled probe that hybridizes to a region of a human chromosome as described herein.

The kit can also include one or more additional reference or control probes that hybridize to the same chromosome, or to another chromosome or portion thereof, that can have an abnormality associated with a particular endophenotype. A kit that includes additional probes can further include labels, e.g., one or more of the same or different labels for the probes. In other embodiments, the additional probe or probes provided with the kit can be a labeled probe or probes. When the kit further includes one or more additional probes, the kit can further provide instructions for the use of the additional probe or probes. Kits for use in self-testing can also be provided. Such test kits can include devices and instructions that a subject can use to obtain a nucleic acid sample (e.g., buccal cells, blood) without the aid of a health care provider. For example, buccal cells can be obtained using a buccal swab or brush, or using mouthwash.

Kits as provided herein can also include a mailer (e.g., a postage-paid envelope or mailing pack) that can be used to return the nucleic acid sample for analysis, e.g., to a laboratory. The kit can include one or more containers for the nucleic acid sample, or the nucleic acid sample can be stored in a standard blood collection vial. The kit can also include one or more of an informed consent form, a test requisition form, and instructions on how to use the kit in a method described herein. Methods for using such kits are also included herein. One or more of the forms (e.g., the test requisition form) and the container holding the nucleic acid sample can be coded, for example, with a barcode for identifying the subject who provided the nucleic acid sample.

In some embodiments, an in vitro screening test can comprise one or more devices, tools, and equipment configured to collect a nucleic acid sample from an individual. In some embodiments of an in vitro screening test, tools to collect a nucleic acid sample can include one or more of a swab, a scalpel, a syringe, a scraper, a container, and other devices and reagents designed to facilitate the collection, storage, and transport of a nucleic acid sample. In some embodiments, an in vitro screening test can include reagents or solutions for collecting, stabilizing, storing, and processing a nucleic acid sample.

Such reagents and solutions for nucleotide collection, stabilization, storage, and processing are well known to those skilled in the art and can be dictated by the specific methods used by an in vitro screening test as described herein. In some embodiments, an in vitro screening test as disclosed herein can comprise a microarray apparatus and reagents, a flow cell apparatus and reagents, a multiplex nucleotide sequencer and reagents, and the additional hardware and software necessary to assay a nucleic acid sample for certain genetic markers and to detect and visualize those genetic markers.

Examples

Example 1: RNA Targeted Sequencing Protocol

cDNA synthesis

1 ng to 1000 ng of RNA was combined with 5 μl of the following primer mix containing 5 pmol of each primer (SEQ ID NOS 3-7, respectively, in order of appearance):

The 12 μl reaction was heated at 95°C for 1 minute, then at 65°C for 1 minute, and held at 4°C. Then 4 μl of 5x First Strand Buffer (Life Technologies, Carlsbad, CA), 1 μl of 10 mM dNTPs, 1 μl of 0.1 M DTT, 1 μl of RNase inhibitor (Enzymatics, Beverly, MA), and 1 μl of Superscript III (Life Technologies, Carlsbad, CA) were added to the reaction. The reaction was incubated at 55°C for 45 minutes and then at 85°C for an additional 5 minutes. After addition of 1 μl of RNase H (Enzymatics, Beverly, MA), the reaction was incubated at 37°C. The reaction was purified with Ampure (Beckman Coulter Genomics, Danvers, MA).

Adaptor ligation

3 μl of cDNA was combined with 2 μl of 10 μM P7/C7 adaptor, 1 μl of T4 DNA ligase (Enzymatics, MA), 2 μl of rapid ligase buffer, and 2 μl of nuclease-free dH₂O. The reaction was incubated at room temperature for 1 hour. The reaction was then inactivated by incubation at 65°C for 10 minutes and purified with Ampure XP (Beckman Coulter Genomics, Danvers, MA).

Primer extension reaction

10 μl of adaptor-ligated DNA was added to 8.4 μl of dH₂O, 0.3 μl of 10 mM dNTPs, 5 μl of Phusion HF buffer, 0.3 μl of Phusion Hot Start II polymerase (Thermo Fisher, Chicago, IL), and 0.5 pmol of each of the following primers in a volume of 1 μl:

The reaction was incubated at 98°C for 1 minute, followed by 5 cycles of 98°C, 60°C for 20 sec, and 72°C for 30 sec, and then held at 4°C. The reaction was then purified with Ampure.

PCR amplification

5 μl of the purified primer extension product was combined with 10 μl of 5x Phusion Hot Start buffer, 0.6 μl of 10 mM dNTP, 2 μl of 12.5 μM C5 PCR primer (AATGATACGGCGACCACCGAGATCT) (SEQ ID NO: 28), 2 μl of 12.5 μM C7 PCR primer (CAAGCAGAAGACGGCATACGAGAT) (SEQ ID NO: 29), 29.8 μl of dH₂O, and 0.6 μl of Phusion Hot Start II polymerase. The reaction was incubated at 98°C for 1 minute, followed by 25 cycles of 98°C for 10 sec, 60°C for 20 sec, and 72°C for 30 sec.
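The component volumes listed above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the described protocol: it sums the stated volumes and applies C1·V1 = C2·V2 to the components whose stock concentrations are given. The names `PCR_MIX`, `total_volume`, and `final_concentrations` are illustrative only.

```python
# Components of the Example 1 PCR as (volume in μl, stock concentration, unit).
# Volumes and stock concentrations are taken from the text; components without
# a stated stock concentration carry None.
PCR_MIX = {
    "primer extension product": (5.0, None, None),
    "5x Phusion buffer": (10.0, 5.0, "x"),
    "dNTP": (0.6, 10.0, "mM"),
    "C5 PCR primer": (2.0, 12.5, "μM"),
    "C7 PCR primer": (2.0, 12.5, "μM"),
    "dH2O": (29.8, None, None),
    "Phusion Hot Start II polymerase": (0.6, None, None),
}


def total_volume(mix):
    """Return the sum of all component volumes in μl."""
    return sum(volume for volume, _, _ in mix.values())


def final_concentrations(mix):
    """Return {name: (final concentration, unit)} using C1*V1 = C2*V2."""
    total = total_volume(mix)
    return {
        name: (stock * volume / total, unit)
        for name, (volume, stock, unit) in mix.items()
        if stock is not None
    }


if __name__ == "__main__":
    print(f"total volume: {total_volume(PCR_MIX):.1f} μl")
    for name, (conc, unit) in final_concentrations(PCR_MIX).items():
        print(f"{name}: {conc:.2f} {unit}")
```

Under these figures the reaction totals 50 μl, with the buffer at 1x, dNTP at 0.12 mM, and each PCR primer at 0.5 μM.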

Pooling of reactions

The PCR products were separated on an agarose gel. Gel bands were excised and purified with the Qiagen MinElute gel purification kit. Prior to sequencing on the Illumina MiSeq platform, the purified samples were analyzed on an Agilent TapeStation, diluted according to the amount of library band, and pooled.

Example 2: DNA Targeted Sequencing Preparation

Genomic primer extension

4 μg of human genomic DNA extracted from patient blood was combined with 0.6 μl of 10 mM dNTP, 1 μl of BST 2.0 polymerase (New England Biolabs, Ipswich, MA), 5 μl of 10x isothermal amplification buffer (NEB), and 1 μl of a 0.5 μM CS-30 primer mix comprising the following sequences.

Adaptor ligation

20 μl of the eluted primer extension reaction was combined with 1.5 μl of 5 μM P7/C7 adaptor (a duplex of one top-strand and one bottom-strand oligo, paired with the correct barcodes and annealed as described above), 1 μl of T4 DNA ligase, 6 μl of 5x rapid ligase buffer (New England Biolabs, Ipswich, MA), and 1.5 μl of nuclease-free dH₂O. The reaction was incubated at room temperature for 1 hour. The reaction was then inactivated by incubation at 68°C for 10 minutes and purified with Ampure XP (Beckman).

Bead capture

180 μl of MyOne C1 SA beads (Dynal, Lifetech) were washed with 1 ml of 1x B&W. The beads were then washed with 2 additional 200 μl washes of 1x B&W. The total elution volume of the Ampure-purified adaptor ligation was 65 μl. An equal volume (65 μl) of 2x B&W and an additional 100 μl of 1x B&W were added, for a total volume of 230 μl per binding. The reaction was placed on an incubator shaker for 20 minutes. After sample binding, the beads were washed with 200 μl of NSX and the liquid was removed.
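The binding volumes above can be verified with a volume-weighted average. This is a minimal sketch that assumes the 65 μl Ampure eluate contributes no B&W buffer of its own (an assumption; the text does not state the elution buffer). The function name `mix_buffer` is illustrative.

```python
def mix_buffer(parts):
    """Combine (volume_ul, strength_x) parts.

    Returns (total volume in μl, final buffer strength in 'x'), where the
    final strength is the volume-weighted average of the part strengths.
    """
    total = sum(volume for volume, _ in parts)
    strength = sum(volume * x for volume, x in parts) / total
    return total, strength


BINDING_PARTS = [
    (65.0, 0.0),   # Ampure eluate, assumed to contain no B&W buffer
    (65.0, 2.0),   # equal volume of 2x B&W
    (100.0, 1.0),  # additional 1x B&W
]

total_ul, final_x = mix_buffer(BINDING_PARTS)
```

With these inputs the binding reaction comes to 230 μl at 1x B&W, which matches the stated total volume and suggests why the eluate is diluted with an equal volume of 2x buffer.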

The samples were then resuspended in 200 μl of 0.1 N NaOH and rotated at room temperature for 20 minutes. The NaOH was removed, and a second wash was performed with an additional 200 μl of 0.1 N NaOH. After removal of the NaOH, the beads were washed twice with 600 μl of TE. The beads were then washed twice with NSX. The beads were placed in 100 μl of Tex (TE with 0.01% Triton X) and stored overnight at 4°C. Before primer extension, the beads were washed twice with 200 μl of 1x Phusion HF (with 0.01 Triton X) and once with 1x HF without Triton X.

Primer extension reaction

The bead mixture was resuspended in 21.1 μl of dH₂O, 0.6 μl of 10 mM dNTPs, 6 μl of Phusion HF buffer, 0.3 μl of Phusion Hot Start II polymerase (Thermo Fisher, Chicago, IL), and 0.5 pmol of each of the following primers in a volume of 2 μl:

The reaction was incubated at 98°C for 1 minute, followed by 5 cycles of 98°C, 60°C for 20 sec, and 72°C for 30 sec, and then held at 4°C. The reaction was then purified with Ampure.

PCR amplification

5 μl of the purified primer extension product was combined with 10 μl of 5x Phusion Hot Start buffer (HF), 0.6 μl of 10 mM dNTP, 2 μl of 12.5 μM C5 PCR primer (AATGATACGGCGACCACCGAGATCT) (SEQ ID NO: 28), 2 μl of 12.5 μM C7 PCR primer (CAAGCAGAAGACGGCATACGAGAT) (SEQ ID NO: 29), 29.8 μl of dH₂O, and 0.6 μl of Phusion Hot Start II polymerase. The reaction was incubated at 98°C for 1 minute, followed by 25 cycles of 98°C for 10 seconds, 60°C for 20 seconds, and 72°C for 30 seconds.
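The cycling program above can be written down as data, which makes it easy to estimate the programmed hold time. This is a minimal sketch; the `Step` class and `hold_time_seconds` function are illustrative names, and the estimate ignores ramp time between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermal cycler hold: a temperature in °C and a duration in seconds."""
    temp_c: int
    seconds: int


# Program as stated in the text: 98°C for 1 min, then 25 cycles of
# 98°C for 10 s, 60°C for 20 s, and 72°C for 30 s.
INITIAL_DENATURATION = Step(98, 60)
CYCLE = (Step(98, 10), Step(60, 20), Step(72, 30))
N_CYCLES = 25


def hold_time_seconds(initial, cycle, n_cycles):
    """Total programmed hold time in seconds, excluding temperature ramps."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


PCR_HOLD_SECONDS = hold_time_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES)
```

Each cycle holds for 60 seconds, so the program holds for 1560 seconds (26 minutes) in total before any ramping is counted.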

Gel bands were excised and purified with the Qiagen MinElute gel purification kit according to the supplier's instructions. The purified samples were analyzed on an Agilent TapeStation, then diluted according to the amount of library band and pooled prior to sequencing on the Illumina MiSeq.

Example 3: Improved Primer Set Generation: Analysis of Primer Dimer Formation

To generate primer sets for the targeted sequencing methods described, the stability and robustness of the amplified targets were evaluated. In addition, uniformity of coverage and sequence accuracy were evaluated in order to generate primer sets and improve assay performance.

To improve these parameters, multiple metrics were evaluated, including the quality of the final amplified targets, the number of amplification cycles required, the purity of the amplification products, and the yield of the amplification products. Sequence analysis of the amplification products was also performed to improve target specificity, uniformity of coverage, sequencing depth, and SNP calling accuracy. Assay performance was improved through iterative cycles of protocol refinement, product formation analysis, and sequence quality assessment.

Using sequence analysis, the undesired 75 bp product was determined to be associated with the primers used in the linear extension/amplification step. Larger doublet or triplet products of 125-200 bp were determined to be associated with the C7/P7 adaptor and the primers used in the linear extension/amplification step. Larger dimer products of >150 bp were determined to be associated with the primers used in the initial RT/PE step.

The major dimer products detected by sequence analysis were 143, 155, and 160 bp in length. Sequence analysis revealed that the 143 bp product was associated with the MCOLN1_11_1_f_PE2_5 primer (which occurred 132 times) and the GAA_14_1_o_PE2_7 primer (which occurred 660 times). Sequence analysis revealed that the 155 bp product was associated with the GAA_14_1_o_PE2_7 primer (which occurred 1146 times). Sequence analysis revealed that the 160 bp product was associated with the IKBKAP_32_1_f_PE2_6 primer (which occurred 464 times). As a result of this analysis, these primers were removed from the primer set.
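The tally described above can be sketched as follows. This is an illustrative Python example only, not the analysis pipeline actually used: the observation tuples reproduce the counts given in the text, while the function names and the `min_count` cut-off are hypothetical.

```python
from collections import defaultdict

# (dimer product length in bp, primer name, number of occurrences), from the text.
DIMER_OBSERVATIONS = [
    (143, "MCOLN1_11_1_f_PE2_5", 132),
    (143, "GAA_14_1_o_PE2_7", 660),
    (155, "GAA_14_1_o_PE2_7", 1146),
    (160, "IKBKAP_32_1_f_PE2_6", 464),
]


def dimer_counts_by_primer(observations):
    """Sum dimer occurrences per primer across all product lengths."""
    totals = defaultdict(int)
    for _length, primer, count in observations:
        totals[primer] += count
    return dict(totals)


def primers_to_remove(observations, min_count=1):
    """Return, sorted by name, the primers seen in dimers at least min_count times.

    min_count is a hypothetical cut-off; the text does not state one.
    """
    totals = dimer_counts_by_primer(observations)
    return sorted(primer for primer, n in totals.items() if n >= min_count)
```

Summed over both product lengths, GAA_14_1_o_PE2_7 accounts for 1806 dimer occurrences, more than the other two primers combined.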

From these analyses, it was found that primers with a high melting temperature (e.g., a T_M of 70°C) and a low annealing temperature (e.g., 60°C), and primers with high GC content, promoted the formation of unwanted dimers through interaction with the primer/UID region and through the 3′ exonuclease activity of some DNA polymerases (e.g., Phusion). As a result of these analyses and conclusions, primer sets were generated using primers that did not have a high GC% in the last 5 nucleotides of their 3′ end. Dimer product formation was thereby greatly reduced compared with the initial primer set, and this improvement avoided the need for gel purification of the target products.
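A check of the 3′ terminal GC content could look like the sketch below. The text does not define what counts as "high GC%", so the `max_fraction` default of 0.6 is purely a placeholder assumption, and both function names are illustrative.

```python
def three_prime_gc_fraction(primer, window=5):
    """Fraction of G/C bases in the last `window` nucleotides of the 3' end.

    The primer is given 5'->3', so the 3' end is the end of the string.
    """
    seq = primer.upper()
    if len(seq) < window:
        raise ValueError("primer is shorter than the 3' window")
    tail = seq[-window:]
    return sum(base in "GC" for base in tail) / window


def has_gc_rich_three_prime_end(primer, window=5, max_fraction=0.6):
    """True if the 3' window exceeds max_fraction GC (hypothetical threshold)."""
    return three_prime_gc_fraction(primer, window) > max_fraction
```

As an input example only, the C5 PCR primer AATGATACGGCGACCACCGAGATCT ends in GATCT, which gives a 3′ GC fraction of 0.4 and passes under the placeholder threshold.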

Several primer exclusion criteria were derived from the above experiments and used to generate a subset of the CS-350 panel. The subset was generated using one or a combination of these exclusion parameters. First, primers with the highest number of misreads (caused by mispriming) in the initial RT/PE step or the linear extension/amplification step were excluded. Second, primers shown by sequence analysis to be commonly present in dimers were excluded. Third, primers responsible for one or more of the highest total on-target read counts (overamplification) were excluded.
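The three exclusion criteria above can be sketched as a small filter. This is only an illustration under stated assumptions: the `PrimerStats` record, the `select_subset` function, and the cut-offs (`top_misprimed`, `dimer_threshold`, `top_overamplified`) are hypothetical names and values chosen for the example; the description does not specify how many primers were excluded under each criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerStats:
    """Per-primer read counts (hypothetical record layout)."""
    name: str
    misprimed_reads: int   # reads attributed to mispriming by this primer
    dimer_reads: int       # dimer-product reads containing this primer
    on_target_reads: int   # total on-target reads produced by this primer


def select_subset(stats, top_misprimed=1, dimer_threshold=100, top_overamplified=1):
    """Return the names of primers kept after applying the three criteria.

    All three cut-offs are assumptions made for illustration only.
    """
    excluded = set()
    # Criterion 1: primers with the highest number of misprimed reads.
    by_misprime = sorted(stats, key=lambda s: s.misprimed_reads, reverse=True)
    excluded.update(s.name for s in by_misprime[:top_misprimed])
    # Criterion 2: primers commonly present in dimer products.
    excluded.update(s.name for s in stats if s.dimer_reads >= dimer_threshold)
    # Criterion 3: primers with the highest total on-target reads.
    by_reads = sorted(stats, key=lambda s: s.on_target_reads, reverse=True)
    excluded.update(s.name for s in by_reads[:top_overamplified])
    # Preserve the input order of the surviving primers.
    return [s.name for s in stats if s.name not in excluded]
```

Because the criteria may be used singly or in combination, any of them can be switched off in this sketch by setting its cut-off to 0 (for the two "top" parameters) or to a very large number (for `dimer_threshold`).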

Example 4 - Improved Primer Set - Analysis of Amplicon %GC Content and Primer Melting Temperature

To generate primer sets for the described targeted sequencing method, the stability and robustness of target amplification were evaluated against the %GC content of the amplicons and the melting temperature of the primers used. The number of reads generated for a particular primer was used as the measure of primer performance. In addition, uniformity of coverage and sequence accuracy were evaluated in order to generate primer sets and improve assay performance.

Most of the poor performers (fewest reads) had linear extension/amplification primers with T_M < 60 °C and were derived from AT-rich amplicons. A second cluster of poor performers consisted of amplicons with a higher GC percentage and primers with high melting temperatures. As a result of these experiments and analyses, several criteria for amplicons and primers were established. First, primer melting temperatures should be in the range of 60 °C-68 °C. Second, primers may be 21-32 nucleotides long. Third, primers should not contain 4 or more pyrimidines in the last 5 nucleotides at the 3' end. Fourth, amplicons should have a GC content of 30%-70%. Finally, amplicons should be 225-300 base pairs long.
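The five criteria above lend themselves to a simple screening function. The following is a minimal sketch, not the procedure actually used: the `Candidate` record and `failed_criteria` function are hypothetical names, the melting temperature is taken as an input because the description does not say how T_M was calculated, and the pyrimidine limit is exposed as a parameter (`max_3prime_pyrimidines`, default 3, i.e. "4 or more" fails) because the wording of that criterion can be read either as "4 or more" or as "more than 4".

```python
from dataclasses import dataclass

PYRIMIDINES = frozenset("CT")


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


@dataclass(frozen=True)
class Candidate:
    """One primer/amplicon pair (hypothetical record layout)."""
    primer: str        # gene-specific primer, written 5'->3'
    primer_tm: float   # melting temperature in deg C, computed elsewhere
    amplicon: str      # expected amplicon sequence


def failed_criteria(c: Candidate, max_3prime_pyrimidines: int = 3) -> list:
    """Return the names of the criteria a candidate fails (empty list = pass)."""
    failures = []
    if not 60.0 <= c.primer_tm <= 68.0:          # T_M of 60-68 deg C
        failures.append("primer_tm")
    if not 21 <= len(c.primer) <= 32:            # 21-32 nucleotides
        failures.append("primer_length")
    tail = c.primer.upper()[-5:]                 # last 5 nt at the 3' end
    if sum(base in PYRIMIDINES for base in tail) > max_3prime_pyrimidines:
        failures.append("3prime_pyrimidines")
    if not 0.30 <= gc_fraction(c.amplicon) <= 0.70:   # 30%-70% GC
        failures.append("amplicon_gc")
    if not 225 <= len(c.amplicon) <= 300:        # 225-300 bp
        failures.append("amplicon_length")
    return failures
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to see which rule removes the most candidates when a panel is redesigned.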

Example 5 - Improved Reaction Conditions

To improve the reaction conditions for the described targeted sequencing method, the stability and robustness of target amplification were evaluated. In addition, coverage uniformity and sequence accuracy were evaluated in order to improve the reaction conditions and assay performance.

To improve these parameters, several metrics were evaluated, including the quality of the final amplified target, the amplification cycle requirements, the purity of the amplified product, and the yield of the amplified product. Assay performance was improved through iterative rounds of protocol refinement, product formation analysis, and sequence quality assessment.

Initial primer titration experiments were not sufficient to allow target generation with the existing amplification ramping and annealing conditions. Based on evaluation of the parameters and metrics above, more stringent ramping conditions were proposed for highly complex primer pools.

Under the initial ramping conditions used for the CS-30 primer set, the 30 targets did not work with the more complex primer sets. Stringency was increased by slowing the ramp rate of the linear extension/amplification step (PE2) and by increasing the hold at 68 °C in the initial RT/PE step. The minimum annealing temperature hold was lowered to 55 °C to accommodate lower primer melting temperatures. Fixing the overall concentration of the primer pool gave better primer formation with panel sizes ranging from 24 to 346 amplicons. Combining stringent RT/PE and linear extension/amplification ramping conditions with a fixed overall primer pool concentration improved on the same method run under different conditions.

In addition, further experiments using different additives in the RT/PE and linear extension/amplification steps were performed to improve product formation. Several additive conditions were tested and their effects on product formation were evaluated. The data show that the optimized reaction conditions improve read coverage. Ammonium sulfate and additional MgCl₂ had the most pronounced effect on read depth. These experiments were performed with the complete CS-350 panel, before panel optimization, to help explain the mechanism of dimer formation and to identify the primers involved.

Example 5 - Targeted Sequencing Workflow

The methods described herein are used to specifically target, amplify, sequence and/or quantify DNA or RNA sequences present in a sample. The methods allow the addition of further sequences that format the targeted sequences for sequencing or other molecular analyses. The methods have been used to add unique identification sequences (UIDs), which allow reads originating from the same RNA or DNA molecule to be grouped, and so allow a determination of whether a given sequence polymorphism is present in the population of RNA or DNA molecules or results from an amplification artifact. RNA or DNA is used as the template/starting material. The sample can come from any organism or virus. The methods are used to format the targeted molecules for a variety of sequencing instruments and other molecular analysis instruments.

The library preparation workflow is used for targeted sequencing on NGS platforms. In this assay, multiple specific biological targets (from one to several thousand) from a patient's biological sample are converted into an NGS-compatible library and sequenced. This allows target frequencies (gene expression) and mutations or SNPs in the patient's genome or transcriptome to be identified, from which clinical information is obtained. The assay is also used to identify the presence or absence, and the frequency, of various infections by targeting viral, bacterial or fungal RNA or DNA in patient samples.

Multiple applications have been run separately or simultaneously, for example sequencing the targets needed for cancer mutation profiling, SNP and mutation analysis, carrier detection, infection diagnosis, and gene expression analysis.

For RNA, reverse transcription (RT) is performed with a reverse transcriptase to generate cDNA complementary to the target of interest. For DNA, primer extension (PE) is performed with a DNA polymerase to generate DNA complementary to the target of interest. In both cases, the oligomer used for the RT or PE consists of a gene-specific primer for the target of interest, a unique identification (UID) tag (a long, fully or partially degenerate barcode of 15 or more degenerate bases: NNNNNNNNNNNNNNN (SEQ ID NO: 1) or NNNNNWNNNNNWNNNNN (SEQ ID NO: 2)), and a universal tag of known sequence (referred to as the P7 forward primer, P7f'), and has a phosphorylated 5' end. The UID barcodes any single RNA or DNA molecule and is used at the sequence analysis stage to identify the absolute number of starting molecules in the biological sample, to derive consensus sequences for the targets, and to remove PCR and sequencing errors, thereby increasing sequencing accuracy. To capture many different genes, a pool of such oligomers is used in which the gene-specific portion of each oligomer is the complement of one of the targets to be captured.
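The use of UIDs at the sequence analysis stage can be illustrated with a short sketch. It assumes that reads have already been split into a UID and a target sequence, that reads sharing a UID are aligned and of comparable length, and that a simple per-position majority vote is an acceptable consensus rule. The description does not specify the consensus algorithm, so the majority vote and the names `uid_regex` and `consensus_by_uid` are illustrative assumptions. The degenerate codes follow the IUPAC convention (N = any base, W = A or T).

```python
import re
from collections import Counter, defaultdict

# IUPAC codes used by the two UID designs: N = any base, W = A or T.
IUPAC = {"N": "[ACGT]", "W": "[AT]"}


def uid_regex(pattern: str):
    """Compile a degenerate UID pattern such as 'NNNNNWNNNNNWNNNNN'."""
    return re.compile("".join(IUPAC[ch] for ch in pattern))


def consensus_by_uid(reads, pattern="NNNNNWNNNNNWNNNNN"):
    """Group (uid, sequence) pairs by UID and take a per-position majority vote.

    Returns {uid: consensus_sequence}. Reads whose UID does not match the
    degenerate pattern are discarded. The number of keys in the result
    estimates the number of distinct starting molecules.
    """
    matcher = uid_regex(pattern)
    groups = defaultdict(list)
    for uid, seq in reads:
        if matcher.fullmatch(uid):
            groups[uid].append(seq)
    consensus = {}
    for uid, seqs in groups.items():
        # Only positions covered by every read in the group are voted on.
        length = min(len(s) for s in seqs)
        consensus[uid] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus
```

In this sketch, a base that appears in only a minority of the reads sharing a UID is treated as a PCR or sequencing error and is outvoted, whereas a variant present in all reads of a UID group is retained as a property of the starting molecule.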

Formatting/Adapter Ligation - In this step, additional sequence needed for amplification/analysis is added to the newly synthesized nucleic acid. The additional sequence can be added by ligation (the preferred method), either single-stranded or using a bridging oligomer. Alternatively, the sequence is added by amplification in a subsequent step. The sequence serves as a universal priming sequence for amplifying large populations of formatted sequences. It contains a barcode for sample identification and also contains a purification tag, such as biotin. In one approach, the adaptor used for ligation consists of a top strand, which is a bridging oligomer complementary to the P7f' region, and a bottom-strand oligomer that is ligated to the product generated in the RT or PE step. The resulting product has gained the remainder of the P7 region (for sequencing), a sample barcode (SBC) (needed if multiple patient samples are processed in parallel), and optionally a C7 region, which is used for clustering on the NGS platform.

Bead capture (optional) - In this step, the partially formatted nucleic acids are captured via the affinity tag or sequence added above. The capture is used to separate target sequences from template/sample sequences that are not of interest.

Primer extension/linear amplification - Linear amplification (or linear primer extension, LPE) is performed using a DNA polymerase and a pool of oligomers, each consisting of a gene-specific region for one of the targets to be captured, a sequencing primer tag (P5), and a universal tag used for clustering on the NGS platform (C5). The pool of oligomers is used to perform LPE of many targets at once in a single reaction. The extension takes place in solution or on templates attached to beads or arrays. LPE is performed as a single cycle or as multiple cycles (up to several hundred), which avoids the amplification bias that arises in standard PCR.
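The difference in amplification bias between LPE and standard PCR can be illustrated with an idealized textbook model, which is not taken from this description: in linear extension only the original template is copied, so each cycle adds a number of copies equal to the per-cycle efficiency, whereas in PCR the products are themselves copied, so each cycle multiplies the copy number by (1 + efficiency). The function names and the efficiencies used below are hypothetical.

```python
def linear_copies(cycles: int, efficiency: float) -> float:
    """Copies per template after linear extension (idealized model).

    Only the original template is copied, so each cycle adds
    `efficiency` copies on average.
    """
    return cycles * efficiency


def exponential_copies(cycles: int, efficiency: float) -> float:
    """Copies per template after PCR (idealized model).

    Products serve as templates, so each cycle multiplies the
    copy number by (1 + efficiency).
    """
    return (1.0 + efficiency) ** cycles


def representation_ratio(copies_fn, cycles: int, eff_a: float, eff_b: float) -> float:
    """Ratio of target A to target B after amplification, starting from 1:1."""
    return copies_fn(cycles, eff_a) / copies_fn(cycles, eff_b)
```

Under this model, two targets with hypothetical per-cycle efficiencies of 0.9 and 0.8 stay at a fixed ratio of 0.9/0.8 = 1.125 regardless of the number of linear cycles, while the same difference compounds to (1.9/1.8)^20, roughly a 2.9-fold distortion, after 20 cycles of exponential amplification.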

PCR enrichment - The targets of interest are amplified simultaneously by PCR using a forward primer consisting of any portion of the LPE oligomer, preferably C5 (or optionally P5C5, or P5 alone), and a reverse primer complementary to any portion of the universal adaptor, preferably C7 (or optionally P7-BC-C7, or P7 alone).

Final library - The final library consists of the pool of all captured targets carrying the tags.

It is intended that the following claims define the scope of the methods, compositions and kits described herein, and that methods and compositions within the scope of these claims and their equivalents be covered thereby. The claims may be drafted to exclude any optional element. Accordingly, this statement is intended to serve as antecedent basis for the use of exclusive terminology such as "solely", "only" and the like in connection with the recitation of claim elements, or for the use of a "negative" limitation.

Claims

1. A method of generating a polynucleotide library from a sample, wherein the library has a true frequency representation of target polynucleotides found in the sample, the method comprising:

(a) generating a first complementary sequence CS from each target polynucleotide by extending a first gene-specific primer comprising a UID and a Universal Linker Sequence (ULS);

(b) ligating an adaptor comprising a first Primer Binding Sequence (PBS) to the first CS, wherein the adaptor is a bridged polynucleotide comprising a double-stranded region and a single-stranded overhang region;

(c) generating a second CS from the first CS by extending a second gene-specific primer comprising a second Primer Binding Sequence (PBS);

(d) amplifying the second CS, without amplification bias, using primers that hybridize to the first and second PBS, thereby forming a polynucleotide library with a true frequency representation of the target polynucleotides.

2. The method of claim 1, wherein amplifying the second CS is achieved by linear amplification.

3. The method of claim 1, wherein amplifying the second CS is achieved by exponential amplification.

4. The method of claim 1, wherein the adaptor further comprises a Sample Barcode (SBC) sequence.

5. The method of claim 1, wherein the adaptor further comprises an affinity molecule or a capture sequence.

6. The method of claim 1, wherein the UID comprises the sequence NNNNNNNNNNNNNNN (SEQ ID NO: 1), wherein N is any nucleic acid residue.

7. The method of claim 1, wherein the UID comprises the sequence NNNNNWNNNNNWNNNNN (SEQ ID NO: 2), wherein N is any nucleic acid residue and W is adenine or thymine.