CN110628890A


Info

Publication number: CN110628890A
Application number: CN201911084920.5A
Authority: CN
Inventors: 倪铭; 刘红洁; 李鹏; 林彦锋; 宋宏彬
Original assignee: Chinese PLA Center for Disease Control & Prevention; Institute of Pharmacology and Toxicology of AMMS
Current assignee: Chinese PLA Center for Disease Control & Prevention; Institute of Pharmacology and Toxicology of AMMS
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2019-12-31
Anticipated expiration: 2039-11-07
Also published as: CN110628890B

Abstract


The invention relates to the field of quality control for biological sequencing, and specifically provides a sequencing quality control standard together with its applications and products. The standard comprises, in order, primer region 1, specific region 1, a homologous region, specific region 2 and primer region 2. It is 1000-10000 bp long, has zero homology with natural sequences, has a GC content of 40%-60%, contains no single-base repeat element of ≥5 bp, and contains no GC or CG dinucleotide repeat of ≥4 units. Primer region 1 and primer region 2 are each 30-300 bp long, and the homologous region accounts for 30-40% of the combined length of specific region 1, the homologous region and specific region 2. The standard is broadly applicable and can be spiked directly into a sample to be sequenced without affecting downstream analysis. It also meets the needs of sample tracking and cross-contamination assessment.

Description


Sequencing quality control standard and its applications and products

Technical Field

The invention relates to the field of quality control for biological sequencing, and in particular to a sequencing quality control standard and its applications and products.

Background

High-throughput sequencing and single-molecule sequencing of DNA/RNA play a major role in scientific research and practical applications. However, sample mix-ups or cross-contamination can occur during the sequencing workflow and cause losses. There are three main reasons:

(1) Samples to be sequenced must go through complex DNA/RNA extraction and library construction. These steps involve many molecular biology operations, each of which carries a risk of sample mix-up (e.g., mislabeled intermediate tubes) or cross-contamination (e.g., contaminated pipette tips or aerosols);

(2) Current sequencers have very high throughput, so multiple samples are often pooled on one sequencing chip and the data are then split by bioinformatic methods. Errors in the entered sample information can scramble this demultiplexing;

(3) Sequencers and reagents have inherent limitations; for example, some Illumina sequencing platforms suffer from relatively high cross-contamination during the sequencing run.

One current sample-tracking approach is to sequence forensic human identification loci at the same time (e.g., the commercial xGen Human ID Research Panel v1.0, http://www.nanodigmbio.com/product-item-17.html). This links the sequencing data to an individual, and the approach can also be extended to other species. Its major limitation, however, is that it cannot detect sample mix-ups directly. A possible mix-up is only suggested by other evidence (e.g., disagreement with clinical indicators), after which the original sample must be re-tested for confirmation.

Another, more effective approach is to spike a small amount of standard into the sample to be sequenced. The spike-in standards are a series of standard DNAs, each with a fixed identifier. During sample preparation, a small amount of standard DNA is added to the extracted sample nucleic acid. After library construction and sequencing, the data analysis checks whether the standard sequence detected in each sample matches the expected one. If it does not, a mix-up occurred during the experiment or during data demultiplexing, and the experiment must be repeated.

Spike-in standards are a common means of sample tracking on gene microarrays. Examples are the CytoChip™ Oligo Spike-in Controls supplied by Illumina for its gene chips, and the chip detection standards described by Agilent in its application note (Sampth & Kishawi, Use of Spike-ins for Sample Tracking in Agilent Array CGH, Agilent Technologies, Inc., published in the USA, March 15, 2016, 5991-6661EN). However, these standards are short (about 400 bp) and carry fluorescent groups, so they cannot be used in sequencing experiments. Chen et al. (Chen K, Hu Z, Xia Z, Zhao D, Li W, Tyler JK. 2016. The overlooked fact: fundamental need for spike-in control for virtually all genome-wide analyses. Mol Cell Biol 36:662–667. doi:10.1128/MCB.00970-14) discussed why spike-in standards are needed in sequencing and other genome-wide experiments, and set out design principles for them, but mainly with regard to high-throughput sequencing.

On the one hand, single-molecule sequencing produces reads far longer than high-throughput sequencing, handles short sequences poorly, and has a high error rate (which can exceed 10%). It therefore remains an open problem how to make a spike-in standard that can be sequenced and identified reliably despite sequencing errors, and then use it for sample tracking and cross-contamination assessment. Moreover, prior-art spike-in standards have narrow application scenarios and poor versatility. Different standards must be designed for different sequencing approaches, and no spike-in standard yet serves both high-throughput and single-molecule sequencing.

On the other hand, assessing the accuracy of mutation detection is another important aspect of quality control, alongside sample tracking and cross-contamination assessment. Current practice runs independent reference samples in parallel experiments. Examples include the many independent standards carrying various mutations from Horizon Discovery, and the independent standards for genetic testing (e.g., BRCA gene mutations) provided by the National Institutes for Food and Drug Control of China. Independent standards are generally expensive and each supports only a few experiments. They are therefore typically used to evaluate the mutation-detection capability (sensitivity and specificity) of a laboratory or testing institution, and it is impractical to apply them as quality control for every sample tested.

The present invention is proposed in view of the above.

Summary of the Invention

The purpose of the present invention is to solve at least one of the above technical problems.

The first object of the present invention is to provide a sequencing quality control standard that addresses two prior-art problems: standards with narrow application scenarios and poor versatility, and the need to design different standards for different sequencing approaches.

The second object of the present invention is to provide the use of the above sequencing quality control standard in single-molecule sequencing and/or high-throughput sequencing.

The third object of the present invention is to provide a single-molecule sequencing quality control kit or a high-throughput sequencing quality control kit, addressing the lack of effective quality control products in the prior art.

To achieve the above objects, the present invention adopts the following technical solutions:

A sequencing quality control standard, comprising, in order, primer region 1, specific region 1, a homologous region, specific region 2 and primer region 2;

the sequencing quality control standard is 1000-10000 bp long;

the sequencing quality control standard has zero homology with natural sequences;

the GC content of the sequencing quality control standard is 40%-60%;

the sequencing quality control standard contains no single-base repeat element of ≥5 bp;

the sequencing quality control standard contains no GC or CG dinucleotide repeat of ≥4 units;

primer region 1 is 30-300 bp long;

primer region 2 is 30-300 bp long;

the length of the homologous region is 30-40% of the combined length of specific region 1, the homologous region and specific region 2.
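The region-length constraints above reduce to simple arithmetic on the five region lengths. The Python sketch below checks them. The `StandardLayout` class and `layout_violations` function are hypothetical names introduced here for illustration. The example layout uses the 200/600/600/600/200 bp region lengths that the text later gives for SEQ ID NO.1-6.

```python
from dataclasses import dataclass


@dataclass
class StandardLayout:
    """Region lengths in bp, listed 5'->3' as in the claims."""
    primer1: int
    specific1: int
    homologous: int
    specific2: int
    primer2: int

    @property
    def total(self) -> int:
        return self.primer1 + self.specific1 + self.homologous + self.specific2 + self.primer2

    @property
    def homologous_fraction(self) -> float:
        # Homologous region as a share of specific region 1 + homologous region + specific region 2.
        core = self.specific1 + self.homologous + self.specific2
        return self.homologous / core


def layout_violations(layout: StandardLayout) -> list:
    """List every length rule the layout breaks (an empty list means it passes)."""
    problems = []
    if not 1000 <= layout.total <= 10000:
        problems.append(f"total length {layout.total} bp outside 1000-10000 bp")
    for name, length in (("primer region 1", layout.primer1), ("primer region 2", layout.primer2)):
        if not 30 <= length <= 300:
            problems.append(f"{name} length {length} bp outside 30-300 bp")
    if not 0.30 <= layout.homologous_fraction <= 0.40:
        problems.append(f"homologous fraction {layout.homologous_fraction:.1%} outside 30-40%")
    return problems


example = StandardLayout(200, 600, 600, 600, 200)
print(example.total, f"{example.homologous_fraction:.1%}", layout_violations(example))
# prints: 2200 33.3% []
```

The sketch checks only the length rules. Sequence-content rules such as GC content and repeats need the actual bases.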

Further, the homologous region contains 1-4 mutation sites per 100 bp.

Further, the sequencing quality control standard is 1000-5000 bp long, preferably 1000-3000 bp.

Further, homology between the sequencing quality control standard and natural sequences is tested with MEGABLAST, which returns no hits;

preferably, the databases searched include the NCBI nucleotide database (nt), the NCBI human genome database and the NCBI mouse genome database.

Further, the GC content of the sequencing quality control standard is 45%-55%, preferably 50%.

Further, the sequencing quality control standard comprises at least one of SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6;

preferably, the PCR primers for the sequencing quality control standard include:

F: 5'-GTGTGCAACCTATGGCGACAG-3' (SEQ ID NO.7)

R: 5'-CACATAGCTCTCAGAGTCGCGG-3' (SEQ ID NO.8).
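Both primer regions are shared by all six standards, so one primer pair amplifies every standard. Below is a minimal in-silico PCR sketch. It assumes exact primer matches (no mismatches), with the forward primer on the top strand and the reverse primer's reverse complement downstream of it. `predicted_amplicon` is a hypothetical helper, and the template in the usage example is synthetic, not one of the patented sequences.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

FORWARD = "GTGTGCAACCTATGGCGACAG"   # SEQ ID NO.7
REVERSE = "CACATAGCTCTCAGAGTCGCGG"  # SEQ ID NO.8


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str = FORWARD, rev: str = REVERSE) -> Optional[str]:
    """Return the product spanning both primer sites, or None if either site is absent.

    Exact string matching only: real primers tolerate some mismatches, which this sketch ignores.
    """
    template = template.upper()
    start = template.find(fwd)
    if start < 0:
        return None
    # The reverse primer's binding site appears on the top strand as its reverse complement, downstream of F.
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return template[start:rev_site + len(rev)]


# Synthetic template: flanks + F site + 40 bp insert + reverse-primer site.
template = "TTTT" + FORWARD + "ACGT" * 10 + reverse_complement(REVERSE) + "AAAA"
product = predicted_amplicon(template)
print(len(product))  # 83
```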

Use of the above sequencing quality control standard in single-molecule sequencing and/or high-throughput sequencing.

Further, the sequencing quality control standard is added in an amount of 0.5-10 w/w% of the sample to be tested.

A single-molecule sequencing quality control kit, comprising the sequencing quality control standard of the present invention.

A high-throughput sequencing quality control kit, comprising the sequencing quality control standard of the present invention.

Compared with the prior art, the present invention has the following beneficial effects:

The invention provides a sequencing quality control standard whose specific constraints let it meet the requirements of both single-molecule and high-throughput sequencing, giving it broad applicability. Because the standard sequence has no homology with natural sequences, it can be spiked directly into the sample to be sequenced without affecting downstream analysis. Each standard has a defined sequence, which meets the needs of sample tracking and cross-contamination assessment with accurate and reliable results, and the standards are cheap enough for wide use. In addition, suitable limits on the lengths of the primer, specific and homologous regions allow the standard to be prepared quickly. They also let it serve multiple purposes: broad applicability, sample tracking, cross-contamination assessment and mutation assessment.

The single-molecule and high-throughput sequencing quality control kits of the invention use the sequencing quality control standard provided by the present application and therefore share these advantages. They can also be mass-produced at low cost for wide use.

Description of the Drawings

To explain the specific embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in the description are briefly introduced below. These drawings show some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the structure of the sequencing quality control standard provided by the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the invention and should not be taken as limiting its scope. Where the examples do not state specific conditions, conventional conditions or those recommended by the manufacturer are used.

Unless otherwise specified, technical and scientific terms used herein have the meanings familiar to those skilled in the art. In addition, any method or material similar or equivalent to those described can also be used in the present invention.

A sequencing quality control standard, comprising, in order, primer region 1, specific region 1, a homologous region, specific region 2 and primer region 2;

wherein the sequencing quality control standard is 1000-10000 bp long, has zero homology with natural sequences, has a GC content of 40%-60%, contains no single-base repeat element of ≥5 bp, and contains no GC or CG dinucleotide repeat of ≥4 units;

in addition, primer region 1 is 30-300 bp long, primer region 2 is 30-300 bp long, and the length of the homologous region is 30-40% of the combined length of specific region 1, the homologous region and specific region 2.

Through these specific constraints, the sequencing quality control standard meets the requirements of both single-molecule and high-throughput sequencing and is broadly applicable. Because the standard sequence has no homology with natural sequences, it can be spiked directly into the sample to be sequenced without affecting downstream analysis. Each standard has a defined sequence, which meets the needs of sample tracking and cross-contamination assessment with accurate and reliable results, and the standards are cheap enough for wide use. The sequencing quality control standard provided by the present invention must be added to the sample to be tested before fragmentation.

In the present invention, the length of the standard is constrained to improve its versatility. Zero homology with natural sequences ensures that every spiked-in sequence in a test sample can be identified without interfering with the analysis the sample was meant to undergo. The GC content is kept within a suitable range to avoid sequencing bias. Sequencing technologies have elevated error rates at long single-base repeats and GC or CG dinucleotide repeats. Excluding single-base repeat elements of ≥5 bp and dinucleotide repeats of ≥4 units therefore keeps such errors from reducing the accuracy of the quality control analysis. In addition, suitable limits on the lengths of the primer, specific and homologous regions allow the standard to be prepared quickly. They also let it serve multiple purposes: broad applicability, sample tracking, cross-contamination assessment and mutation assessment.

Note that the above standard can also serve mutation assessment. For example, but without limitation, the homologous region of the standard may contain 1-4 mutation sites per 100 bp. Following this principle, two or more highly homologous standard sequences can be obtained. Mixing them and spiking the mixture into the sample to be sequenced then meets the needs of mutation-detection quality control, in addition to sample tracking and cross-contamination assessment.

It can be understood that the length of the standard may be, without limitation, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10000 bp. Its GC content may be, without limitation, 40%, 42%, 44%, 46%, 48%, 50%, 52%, 54%, 56%, 58% or 60%. A natural sequence is a nucleic acid sequence carried by an organism under natural conditions. "Contains no single-base repeat element of ≥5 bp" means that the standard sequence may contain, for example, A, AA, AAA or AAAA, but not AAAAA, AAAAAA or any other run of 5 or more identical bases. "Contains no GC or CG dinucleotide repeat of ≥4 units" means that the standard sequence may contain, for example, GC, GCGC, GCGCGC, CG, CGCG or CGCGCG, but not GCGCGCGC, CGCGCGCG or any other run of 4 or more consecutive GC or CG units. Primer region 1 may be, without limitation, 30, 50, 100, 150, 200, 250 or 300 bp long, and the same values apply to primer region 2. The homologous region may be, without limitation, 30%, 32%, 34%, 36%, 38% or 40% of the combined length of specific region 1, the homologous region and specific region 2.
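The two repeat rules defined above can be checked with regular expressions. The sketch below reports any run of five or more identical bases, and any run of four or more consecutive GC or CG units. The function name is hypothetical, and the test strings mirror the allowed and forbidden examples given in the text.

```python
import re

# Five or more identical bases in a row (e.g. AAAAA) violates the homopolymer rule.
HOMOPOLYMER = re.compile(r"A{5,}|C{5,}|G{5,}|T{5,}")
# Four or more consecutive GC units (GCGCGCGC) or CG units (CGCGCGCG) violates the dinucleotide rule.
DINUCLEOTIDE = re.compile(r"(?:GC){4,}|(?:CG){4,}")


def repeat_violations(seq: str) -> list:
    """Return every forbidden repeat found in seq (an empty list means the sequence passes)."""
    seq = seq.upper()
    hits = [m.group(0) for m in HOMOPOLYMER.finditer(seq)]
    hits += [m.group(0) for m in DINUCLEOTIDE.finditer(seq)]
    return hits


for s in ("TTAAAATT", "TTAAAAATT", "TAGCGCGCTA", "TAGCGCGCGCTA"):
    print(s, repeat_violations(s))
```

AAAA and GCGCGC (three units) pass. AAAAA and GCGCGCGC (four units) are reported.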

In a preferred embodiment, the homologous region contains 1-4 mutation sites per 100 bp.
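The density rule can be checked by comparing a standard's homologous region with a shared reference version of that region, 100 bp at a time. This sketch assumes the two regions are aligned, have equal length, and differ only by substitutions (no indels). The reference sequence and mutation positions in the usage example are hypothetical.

```python
def mutations_per_100bp(reference: str, variant: str, block: int = 100) -> list:
    """Count substitutions between two aligned, equal-length regions in consecutive blocks."""
    if len(reference) != len(variant):
        raise ValueError("this sketch assumes equal-length, ungapped homologous regions")
    counts = []
    for start in range(0, len(reference), block):
        ref_block = reference[start:start + block]
        var_block = variant[start:start + block]
        counts.append(sum(a != b for a, b in zip(ref_block, var_block)))
    return counts


def density_ok(counts: list, low: int = 1, high: int = 4) -> bool:
    """True if every block carries between low and high mutation sites."""
    return all(low <= c <= high for c in counts)


reference = "ACGT" * 150          # hypothetical 600 bp homologous region
variant = list(reference)
for pos in (10, 150, 260, 330, 420, 550):   # one substitution in each 100 bp block
    variant[pos] = "A" if variant[pos] != "A" else "C"
variant = "".join(variant)

counts = mutations_per_100bp(reference, variant)
print(counts, density_ok(counts))  # [1, 1, 1, 1, 1, 1] True
```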

In a preferred embodiment, the standard is 1000-5000 bp long, preferably 1000-3000 bp. A standard that is too short cannot support multiple application scenarios or multi-standard quality control, while one that is too long needlessly raises production cost. The inventors determined a suitable range experimentally and give a further preferred range.

In a preferred embodiment, homology between the standard and natural sequences is tested with MEGABLAST, which returns no hits. Specifically, searches were run with the online BLAST tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi) using the megablast program. Searches against the Human genomic + transcript, Mouse genomic + transcript and Nucleotide collection (nr/nt) databases all returned no alignments. None of the standard sequences is homologous to natural sequences in these databases, so every spiked-in sequence can be identified without affecting the analysis the sample was meant to undergo.

In a preferred embodiment, the databases for the homology test include the NCBI nucleotide database (nt), the NCBI human genome database and the NCBI mouse genome database.

In a preferred embodiment, the GC content of the standard is 45%-55%, preferably 50%. The GC content of the standard sequences is highly balanced: both the whole sequence and each of its parts have a GC content at or close to 50%, which avoids sequencing bias.
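The text requires each part of the sequence, not just the whole, to be near 50% GC, because a correct overall value can hide local imbalance. The sketch below computes the overall GC fraction and a sliding-window profile. The 100 bp window, the 50 bp step, and applying the 45%-55% range to every window are illustrative assumptions, not values from the source.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int = 100, step: int = 50) -> list:
    """GC fraction of each full-length window, moving `step` bases at a time."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def gc_balanced(seq: str, low: float = 0.45, high: float = 0.55,
                window: int = 100, step: int = 50) -> bool:
    """True if the whole sequence and every window fall within [low, high]."""
    windows = gc_profile(seq, window, step)
    return low <= gc_fraction(seq) <= high and all(low <= f <= high for f in windows)


balanced = "ACGT" * 50             # 200 bp, 50% GC in every window
skewed = "GC" * 50 + "AT" * 50     # 200 bp, 50% GC overall but not locally
print(gc_balanced(balanced), gc_balanced(skewed))  # True False
print(gc_profile(skewed))                          # [1.0, 0.5, 0.0]
```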

In a preferred embodiment, the standard comprises at least one of SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6.

The six specific standard sequences provided by the present invention share the structure shown in Fig. 1. It has three parts: a common primer region, specific regions, and a homologous region carrying specific mutation sites. Each standard sequence contains 400 bp of common primer region (200 bp primer region 1 and 200 bp primer region 2). It also contains 1200 bp of specific region (600 bp specific region 1 and 600 bp specific region 2) and a 600 bp homologous region carrying specific mutation sites, for a total of 2200 bp. The homologous region thus makes up one third (600 of 1800 bp) of the combined specific and homologous length, within the 30-40% range. The primer regions are shared by all six sequences, so all of them can be prepared with the same primers. The standards have no homology with any natural sequence and can therefore be identified with high confidence for sample tracking and cross-contamination assessment. Across the 600 bp homologous region, the six standards are highly homologous to one another but carry different designed mutation sites. Mixing at least two of them produces mutation sites at a target frequency, and spiking the mixed standards into a sample provides quality control for mutation detection.
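The six standards share primer regions, length and homologous-region backbone, and differ only at designed mutation sites. The expected variant frequency at each site of a mixture therefore follows from the mixing proportions. The sketch below assumes that all mixed standards have equal length (2200 bp here), so that mass fraction equals molecule fraction. The standard names and site positions are hypothetical.

```python
from collections import defaultdict


def expected_site_frequencies(mix: dict, mutations: dict) -> dict:
    """Expected variant frequency at each mutation site of a mixture of standards.

    mix:       standard name -> amount added (e.g. ng); only the proportions matter.
    mutations: standard name -> set of homologous-region positions where that
               standard carries a designed variant.
    Assumes equal-length standards, so mass fraction equals molecule fraction.
    """
    total = sum(mix.values())
    freqs = defaultdict(float)
    for name, amount in mix.items():
        for pos in mutations.get(name, set()):
            freqs[pos] += amount / total
    return dict(sorted(freqs.items()))


# Hypothetical mix: 10 ng of standard "S1" with 30 ng of standard "S2".
mix = {"S1": 10, "S2": 30}
mutations = {"S1": {25, 130}, "S2": {130, 410}}
print(expected_site_frequencies(mix, mutations))  # {25: 0.25, 130: 1.0, 410: 0.75}
```

A site carried by only one standard in the mix (positions 25 and 410) reports that standard's share of the mixture. Choosing the mixing ratio therefore sets the target frequency, for example 25%.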

In a preferred embodiment, the PCR primers for the standard include F: 5'-GTGTGCAACCTATGGCGACAG-3' (SEQ ID NO.7) and R: 5'-CACATAGCTCTCAGAGTCGCGG-3' (SEQ ID NO.8).

In a preferred embodiment, the standard is added in an amount of 0.5-10 w/w% of the sample to be tested, where w/w% denotes mass percentage. Typical but non-limiting amounts are 0.5, 2, 4, 6, 8 or 10 w/w% of the sample.
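The addition amount is a percentage of the sample mass, so the mass of standard to add follows directly. This sketch assumes the percentage is taken relative to the mass of sample nucleic acid (the text says only "of the sample to be tested"). It also reports the standard's share of the combined DNA, which is slightly lower than the nominal percentage. The function names and the 1000 ng sample are hypothetical.

```python
def spike_in_mass(sample_mass_ng: float, percent_w_w: float) -> float:
    """Mass of standard (ng) to add for a given w/w% of the sample mass."""
    if not 0.5 <= percent_w_w <= 10:
        raise ValueError("addition amount outside the 0.5-10 w/w% range given in the text")
    return sample_mass_ng * percent_w_w / 100


def share_of_total(sample_mass_ng: float, spike_mass_ng: float) -> float:
    """Fraction of the combined DNA mass contributed by the standard."""
    return spike_mass_ng / (sample_mass_ng + spike_mass_ng)


spike = spike_in_mass(1000, 4)                      # 4 w/w%, the ratio used in Example 3
print(spike, f"{share_of_total(1000, spike):.2%}")  # 40.0 3.85%
```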

A single-molecule sequencing quality control kit, comprising the above sequencing quality control standard.

A high-throughput sequencing quality control kit, comprising the above sequencing quality control standard.

The present invention is further illustrated by the following specific examples. They are provided for more detailed description only and should not be construed as limiting the invention in any way.

Example 1

The sequences were designed to meet three needs: sample tracking, cross-contamination assessment and mutation assessment. They also meet a set of technical specifications suited to sequencing technology and data analysis, as follows:

(1) The sequence structure (see Fig. 1) has three parts: a common primer region, specific regions, and a homologous region carrying specific mutation sites. The common primer region totals 400 bp, the specific regions 1200 bp, and the homologous region 600 bp. The primer regions are shared by all sequences, so all of them can be prepared with the same primers. The standards have no homology with any natural sequence and can be identified with high confidence for sample tracking and cross-contamination assessment. In the 600 bp homologous region, the different spike-in sequences are highly homologous but carry different designed mutation sites. Mixing different sequences produces mutation sites at a target frequency, and spiking the mixed standards into a sample provides quality control for mutation-site detection.

(2) None of the sequences is homologous to natural sequences in the databases. Specifically, MEGABLAST searches against the NCBI nucleotide database (nt) and the human and mouse genomes found no homology. Thus every spiked-in sequence can be identified without affecting the analysis the sample was meant to undergo.
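Zero homology with natural sequences is what makes unambiguous read assignment possible: a read that matches a standard's specific region can only have come from that standard. The sketch below shows one simple way to do this; it is not the method used in the examples. It indexes k-mers unique to each standard's specific regions on both strands and assigns a read by majority vote. The k-mer size (15), the minimum hit count (3) and the random test sequences are illustrative assumptions. At 90% per-base accuracy only about one 15-mer in five is error-free, so a real single-molecule pipeline would more likely use an aligner.

```python
import random
from collections import Counter
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(regions: dict, k: int = 15) -> dict:
    """Map k-mer -> standard name, keeping only k-mers unique to one standard (either strand)."""
    owner, shared = {}, set()
    for name, region in regions.items():
        region = region.upper()
        for km in kmers(region, k) | kmers(reverse_complement(region), k):
            if owner.get(km, name) != name:
                shared.add(km)  # seen in another standard: ambiguous, drop later
            owner[km] = name
    for km in shared:
        owner.pop(km, None)
    return owner


def assign_read(read: str, index: dict, k: int = 15, min_hits: int = 3) -> Optional[str]:
    """Return the standard with the most k-mer hits, or None if below min_hits."""
    votes = Counter(index[km] for km in kmers(read.upper(), k) if km in index)
    if not votes:
        return None
    name, hits = votes.most_common(1)[0]
    return name if hits >= min_hits else None


# Random 600 bp sequences stand in for the specific regions of two hypothetical standards.
rng = random.Random(1)
regions = {name: "".join(rng.choice("ACGT") for _ in range(600)) for name in ("S1", "S2")}
index = build_index(regions)
print(assign_read(regions["S1"][100:300], index))                      # S1
print(assign_read(reverse_complement(regions["S2"][50:250]), index))  # S2
```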

(3) The GC content of the sequences is highly balanced: the whole sequence and each of its parts have a GC content at or close to 50%, which avoids sequencing bias.

(4) The sequences contain no long single-base repeat elements (≥5 bp) and no GC or CG dinucleotide repeats of ≥4 units. This keeps the elevated error rates of sequencing technologies at such sites from affecting the quality control analysis.

实施例2Example 2

利用软件设计完测序质控标准品序列后，得到6条序列分别为SEQ ID NO.1、SEQID NO.2、SEQ ID NO.3、SEQ ID NO.4、SEQ ID NO.5和SEQ ID NO.6的标准品，采用从头合成的方式来分别合成6个测序质控标准品序列(SEQ ID NO.1-SEQ ID NO.6序列)，将合成的序列分别连接于Puc57质粒中得到6个重组质粒，然后将6个重组质粒再分别转入大肠杆菌TOP10菌株中，重组质粒随菌株的繁殖不断扩增，经过DNA提取后，可以分别获得6种大量重组质粒。然后，采用PCR的方式，将嵌入重组质粒中的6个(SEQ ID NO.1-SEQ ID NO.6)序列分别扩增出来，然后通过琼脂糖凝胶电泳将spike-in序列与其他杂质分开，割取属于spike-in序列大小的条带，利用琼脂糖凝胶DNA回收试剂盒对DNA进行回收，最后利用Nuclease-Free Water将DNA洗脱。通过sanger测序进行质检后得到6个目标测序质控标准品，定量后存于-20度冰箱备用。After using the software to design the sequencing quality control standard sequence, the six sequences obtained are SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO .6 standard products, using de novo synthesis to synthesize 6 sequencing quality control standard sequences (SEQ ID NO.1-SEQ ID NO.6 sequences), and linking the synthesized sequences to the Puc57 plasmid to obtain 6 Recombined plasmids, and then the 6 recombinant plasmids were transferred into E. coli TOP10 strains, the recombinant plasmids continued to amplify with the propagation of the strains, and after DNA extraction, a large number of 6 recombinant plasmids could be obtained respectively. Then, PCR was used to amplify the 6 (SEQ ID NO.1-SEQ ID NO.6) sequences embedded in the recombinant plasmid respectively, and then the spike-in sequence was separated from other impurities by agarose gel electrophoresis , cut out the band belonging to the spike-in sequence size, use the agarose gel DNA recovery kit to recover the DNA, and finally use Nuclease-Free Water to elute the DNA. After quality inspection by sanger sequencing, six target sequencing quality control standards were obtained, which were quantified and stored in a -20°C refrigerator for later use.

The PCR primers were as follows:

PCR conditions: 30 cycles of denaturation at 94°C for 30 s, annealing at 60°C for 30 s and extension at 72°C for 30 s.
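The primer list itself did not survive extraction. The sequence listing does contain two short sequences, SEQ ID NO.7 (21 nt) and SEQ ID NO.8 (22 nt). The check below uses only bases copied from the listing. It shows that SEQ ID NO.7 matches the 5' end of SEQ ID NO.1 and that SEQ ID NO.8 is the reverse complement of its 3' end. This fits their use as the forward and reverse primers, but that pairing is an inference. The helper name `revcomp` is illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# First 30 and last 40 bases of SEQ ID NO.1, copied from the sequence listing.
seq1_5p = "gtgtgcaacctatggcgacagtctgtcgaa"
seq1_3p = "ctaagagcgcaagcctatccgcgactctgagagctatgtg"

primer_f = "gtgtgcaacctatggcgacag"    # SEQ ID NO.7, 21 nt
primer_r = "cacatagctctcagagtcgcgg"   # SEQ ID NO.8, 22 nt

# The forward primer should match the top strand at the 5' end ...
assert seq1_5p.startswith(primer_f)
# ... and the reverse primer should anneal to the top strand at the 3' end.
assert seq1_3p.endswith(revcomp(primer_r))
print("SEQ ID NO.7 and NO.8 flank SEQ ID NO.1")
```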

Example 3

Each of the six prepared standards was added to one of six DNA samples to be sequenced at 4% w/w, and the assignment was recorded. After library construction, the six samples were pooled and sequenced on a single flow cell of a MinION sequencer (Oxford Nanopore Technologies).
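Claim 8 gives the spike-in amount as 0.5-10% w/w of the sample. If this is read as a fraction of the sample's DNA mass (an interpretation), the mass to add follows directly. The copy-number conversion below also assumes an average of about 650 g/mol per base pair of double-stranded DNA and uses the 2200 bp length from the sequence listing. The 1000 ng input is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one dsDNA base pair


def spike_in_mass_ng(sample_ng: float, pct_w_w: float) -> float:
    """Mass of standard equal to pct_w_w percent of the sample's DNA mass."""
    if not 0.5 <= pct_w_w <= 10:
        raise ValueError("claim 8 covers 0.5-10 % w/w")
    return sample_ng * pct_w_w / 100.0


def copy_number(mass_ng: float, length_bp: int) -> float:
    """Approximate number of double-stranded molecules in mass_ng of DNA."""
    moles = mass_ng * 1e-9 / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


spike_ng = spike_in_mass_ng(1000, 4)        # hypothetical 1000 ng sample at 4 % w/w
print(spike_ng)                              # 40.0
print(f"{copy_number(spike_ng, 2200):.2e}")  # 1.68e+10
```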

After sequencing, the data were demultiplexed with the software recommended by the sequencer manufacturer, and the standard-derived reads in each sample were analyzed separately. Across the six samples, standard sequences made up 3.7% of the data on average, with a variance of 0.9%. In five samples, more than 99% of the standard-derived reads came from a single standard. In the remaining sample, 97.6% came from a single standard. In every case this was the standard that had been added to that sample. This shows that the standards served their sample-tracking purpose and that no sample mix-up occurred in this experiment. The same result also gives the cross-contamination rate directly: below 1% for five samples and 2.4% for the sixth. Moreover, only a small amount of standard is added and the standards have no homology to natural sequences, so the routine analysis of these samples (genome assembly) was not affected.
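The tracking and cross-contamination figures can be derived from per-standard read counts in each demultiplexed sample. The sketch below takes the cross-contamination rate to be the share of standard-derived reads that come from any standard other than the one added to that sample. This definition fits the reported 97.6% / 2.4% pair but is not spelled out in the text. The read counts and standard labels are hypothetical.

```python
def spike_in_summary(counts: dict, total_reads: int, expected: str) -> dict:
    """counts maps standard name -> number of reads assigned to it in one sample."""
    n_spike = sum(counts.values())
    if n_spike == 0:
        return {"spike_fraction": 0.0, "dominant": None,
                "matches_expected": False, "cross_contamination": None}
    dominant = max(counts, key=counts.get)
    return {
        # share of all reads in the sample that come from any standard
        "spike_fraction": n_spike / total_reads,
        # the standard contributing the most reads identifies the sample
        "dominant": dominant,
        "matches_expected": dominant == expected,
        # share of standard reads from standards not added to this sample
        "cross_contamination": 1 - counts.get(expected, 0) / n_spike,
    }


# Hypothetical counts for one sample that received the SEQ3 standard.
summary = spike_in_summary({"SEQ3": 976, "SEQ1": 15, "SEQ6": 9},
                           total_reads=27000, expected="SEQ3")
print(f"{summary['spike_fraction']:.1%}")                 # 3.7%
print(summary["dominant"], summary["matches_expected"])   # SEQ3 True
print(f"{summary['cross_contamination']:.1%}")            # 2.4%
```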

These results show that the standards of the invention meet the requirements and provide effective quality control for sequencing samples.

Example 4

The spike-in standards SEQ ID NO.1, SEQ ID NO.3 and SEQ ID NO.2 were mixed at a ratio of 1:1:2 to form a first mixed standard. It contains 18 mutations in total, and their expected frequencies fall into two classes. The spike-in standards SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6 were mixed at 50:45:5 to form a second mixed standard. It also contains 18 mutations, and their expected frequencies fall into three classes. The two mixed standards were added to sample 1 and sample 2, respectively, at an intended 5% w/w. Libraries were built separately for the two samples, which were then sequenced together on a single flow cell of a MinION sequencer (Oxford Nanopore Technologies).
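The frequency classes follow from the mixing ratios. Assume that each mutation in the homologous region is carried by only one standard in a mix. Assume also that the ratios are effectively molar: all six standards have the same listed length of 2200 bp, so mass and molar ratios approximately coincide. Under these assumptions, a mutation's expected frequency is its standard's share of the mix. This sketches that reasoning; it is not a procedure taken from the source.

```python
from fractions import Fraction


def expected_frequencies(ratio: dict) -> dict:
    """Expected frequency of a mutation private to each standard in a mix."""
    total = sum(ratio.values())
    return {name: Fraction(part, total) for name, part in ratio.items()}


mix1 = expected_frequencies({"SEQ1": 1, "SEQ3": 1, "SEQ2": 2})
mix2 = expected_frequencies({"SEQ4": 50, "SEQ5": 45, "SEQ6": 5})

print({k: float(v) for k, v in mix1.items()})  # {'SEQ1': 0.25, 'SEQ3': 0.25, 'SEQ2': 0.5}
print({k: float(v) for k, v in mix2.items()})  # {'SEQ4': 0.5, 'SEQ5': 0.45, 'SEQ6': 0.05}
print(len(set(mix1.values())), len(set(mix2.values())))  # 2 3
```

Under this model, the first mix gives two classes (0.25 and 0.5) and the second gives three (0.5, 0.45 and 0.05), matching the stated class counts. With a 0.2 threshold, the 0.05 class falls below the limit. If the 18 mutations were split evenly at six per standard (an assumption), 12 would remain detectable in the second mix. That matches the 12/12 denominator reported for sample 2.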

After sequencing, the data were demultiplexed with the software recommended by the sequencer manufacturer, and the composition and proportions of the standards in the two samples were analyzed. The mixed standards made up 4.6% and 5.07% of the two samples, respectively. In each sample, the composition and ratio of the individual standards matched expectations.

Next, the mutations in the homologous regions of all standards in the two samples, and their frequencies, were analyzed. With the minimum mutation-frequency threshold set at 0.2, mutation sites were detected with high accuracy. The detection rate for mutations with frequency ≥0.2 was 94.4% (17/18) in sample 1 and 100% (12/12) in sample 2. There was one false-positive site in sample 1 and none in sample 2. With thresholds below 0.2, large numbers of false-positive sites appeared, indicating that mutation calls below this threshold are unreliable.
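The detection rate and false-positive count at a given frequency threshold can be computed by comparing variant calls in the homologous regions with the mutations expected from the mix. This is a minimal sketch. It assumes calls are keyed by (position, alternate base) and carry an observed frequency. The sites and frequencies are hypothetical and only show how lowering the threshold admits more noise.

```python
def evaluate_calls(calls: dict, truth: dict, min_freq: float = 0.2) -> dict:
    """calls: {(pos, alt): observed frequency}; truth: {(pos, alt): expected frequency}."""
    reported = {site for site, f in calls.items() if f >= min_freq}
    # only mutations expected at or above the threshold count toward the detection rate
    expected = {site for site, f in truth.items() if f >= min_freq}
    detected = reported & expected
    false_pos = reported - set(truth)
    return {
        "detected": len(detected),
        "expected": len(expected),
        "detection_rate": len(detected) / len(expected) if expected else None,
        "false_positives": len(false_pos),
    }


truth = {(101, "T"): 0.25, (205, "G"): 0.5, (330, "A"): 0.25}
calls = {(101, "T"): 0.27, (205, "G"): 0.48, (330, "A"): 0.18,
         (412, "C"): 0.22, (500, "G"): 0.06, (611, "A"): 0.09}

print(evaluate_calls(calls, truth, 0.2))   # 2 of 3 detected, 1 false positive
print(evaluate_calls(calls, truth, 0.05))  # 3 of 3 detected, 3 false positives
```

For comparison with the reported figures, 17 detected out of 18 expected gives 17/18 ≈ 94.4%.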

These results show that the standards of the invention can be used to evaluate mutation detection, including detection rate, false-positive sites and noise distribution. They provide a quality-control reference for detecting and analyzing mutations in sequencing samples.

While the invention has been illustrated and described with specific embodiments, many other changes and modifications can be made without departing from its spirit and scope. The appended claims are therefore intended to cover all such changes and modifications within the scope of the invention.

SEQUENCE LISTING

<110> Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese People's Liberation Army

Chinese People's Liberation Army Center for Disease Control and Prevention

<120> Sequencing quality control standards and their applications and products

<160> 8

<170> PatentIn version 3.5

<210> 1

<211> 2200

<212> DNA

<213> Artificial sequence

<400> 1

gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60

aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120

cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180

gcgtggcgca gtcagcgtaa tctatgacgg tgataggcac tggcgcaaga tacggtcgca 240

atgtgacttc atgactctgg cgcgacacta tgattgcagt tgatactgcg agcgttatta 300

gatcatatat gcatgctcca atcagtatgc cgctagacgc ctatcgactg tatcggaccg 360

cgccgtgcca tgtcgagtaa tcgaatagag tctggcatct ccgagcggat caacgtacat 420

tctgcacgca taacgacgat atacgtgact atacaatagt ccattgttga atgatcgtct 480

tcagtgcgta acaacgagat gctcgatcgg tcctaataac gccagccgct ctagatgtgg 540

cgcataatac acactacgtt cgacgccgaa gctaatccgc tcggcggtgc ctaggagatg 600

tactattgtt ctactcatac taacaatgcg caccgtgccg ctagacaatg gcgaggatcg 660

agtcagagac ctgtgcatag ctacggcgaa taacgcttac tccatagcgg taacctgcgg 720

aacttatacc agattatgaa gcggcaatac ttagagccac tagttgtgcc agcgagtata 780

ttcggaccat ccaccaggag gtagctacga acgctagcgc tgagactgat acgcgtccta 840

cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900

ttagctacca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960

cgtacttatg tgagttcgcc actaagtctc tcttaggata gactgaatgc taccatacag 1020

gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080

gtgttggtac cacaggacga cacgagcagg attcttgacg ctgcgatgcg ttcgcttgta 1140

gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200

gatcaacaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260

gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta accgaacaat cgcgttgtat 1320

gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380

gacagtgagt cgtgctatac cagtggcata ctgtgcgcgt tatcgtgact gtagagctat 1440

cgagactgga acacggtaga gtatatccag ccactaatct cggtgcagcc gcggattctc 1500

atcttagcct gcgacctcta gctataatcc ttacttgagt ggtatacgtc atacgagtta 1560

gacaagtatc acgcgaatag catactcgaa taccgcggac acgcctcgct acatatatca 1620

gtgtatggct agctaggttg tagaacgcgc ctgtccgctg tagatgacag cctcgtgctc 1680

atggaagatc caaggcgaag gcttagcacg tgcactacaa ccgcatccgt acctatccga 1740

ttagataact agtccgcttg gtcctattgc taaggagtag ttggagtact ggttcaatag 1800

cgaaccgcta tcctcagcta ctcagtacgc aagcctgccg ttacgtgtcg acgtcatgtg 1860

tgctatgcgt catgaataag cattgaactg aagataatca gttagcgcat tgagctctaa 1920

tggaaccact ggtacgtctc catacttatt cgtgatgata gcatgccagc aggcgccatg 1980

actcgacagc cagacacgtt aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040

tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100

tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160

ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200

<210> 2

<211> 2200

<212> DNA

<213> Artificial sequence

<400> 2

gcgtggcgca gtcagcgtaa ggtaagcagt ctctgtggtg gtgtatacgc tgcatgacat 240

cgtactgcac ctattgacgt gttctccgtg aactgcagta tatacgcggt aacggacaag 300

cgctctatcg catgttgcgg tagacgccac tatattatgg caacgccatg tattggcata 360

gcgataccag tacagcttct ccgactgtac actatccgcc ggcagaatca tatattatct 420

gcgaagtact tgtcgctagt catcgcctcg gaatcgtagt gtagcctggt tggttcgtac 480

tatccgtgac ctatgaccta ttggcgaggc ggtagcaaca cgaacggtat gcagcaatgc 540

acgcttatct gcacaagcct attatgagat ctggtcgtta tcgacttcgc gacgcgacga 600

gctggtgtta ggctggacag ccgacctacc tgatggagtc agcggacgct tggtgagcta 660

ttcgtacact gcttgatatt atgcgatatt acattatgct cctcgtaaca cacgcatacc 720

ttccaggatc acgtgtagcg tcgaggacgg agttctatac ctcaagatag cgtcagcgac 780

agaatcattg gtgaacatac gaacctacga acgctagcgc tgagactgat acgcgtccta 840

attgctacca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960

cgtacttatg tgagttcgcc actaagtctc tcttaggata cagtgaatgc taccatacag 1020

gtgttggtac cacaggacgt ctcgagcagg attcttgacg ctgcgatgcg ttcgcttgta 1140

caacaacaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260

gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tcggaacaat cgcgttgtat 1320

gacagtgagt cgtgctatac ggacaacgtg tgtggtattg gtgagacaag tattactcgc 1440

gcttgaggac ggcgcagata ctgcaatcaa gtgcagcagc gcgtacggtt gcgatgaact 1500

tccgtgcctg atcctgacga tgtcgttata tccgaagaca cacttatcgg tcaacagttc 1560

gacttgtcac tgtcgtcgca caggactatc atgaatgcaa cgtcaatgcg gattcctcgc 1620

acggcataat ccataatgta gctcatggcg gtgcggctag gctagtaagt cgcatcgcct 1680

gttatatcct tggcggtcat gattgtatcg tacaataaga ggtggttaga gcgcgagcac 1740

attctgctat ggctgatcct taccttctaa gtcctctgcg gctgaagtta gactgcggca 1800

acgcttgatg ataaccgcct acgagatact cctgaacggt gtataggctc ataatcctcg 1860

atggctcgag ctcgttcggc ggatacgaag ccattatcgt gcatagcgtc ctctatggtg 1920

cgatagagca cttatccaga ctcagcgaac aatggttcgt gacgagatac cagtgaacag 1980

atcgccatcg gacactctac aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040

<210> 3

<211> 2200

<212> DNA

<213> Artificial sequence

<400> 3

gcgtggcgca gtcagcgtaa ctgataatcc atggcgtgcc gacgaagtat ggtacagtgc 240

agcttattat accgactgag ctaaggactg gaggataggt tgtgtgcaga aggacaagga 300

atagacgccg catcgccgcc gtcatacctc agtatcttga agatagccgt gctcaacgca 360

ataatctgga gcaatctagt cgtatctcca gttatggtca gttgcgatca gctcaggact 420

cggactgcta tctatggaag agctacctgc gctcttagct attgaacaat cactaacact 480

cctcaccaca aggatacggt atcggagcga tggaccgcac tatattactt ccaactatgc 540

ggctacggaa ggctctattg cgacatgcgg atacttcgct caggttcgcc gatacacatt 600

ccaataacta atacaaggtg gtcgatactg tgcgagcgag gacacttatc atggctcgaa 660

taccgcggct cattcggctt gctgtcagtg gtcgtcgtcc tatcgagaag cgacaggagc 720

aacactgtat tcgagtatac ctctgtctgc cacctatcca ggtggaatat agccatatgt 780

gcgagaactt cgaggataag gaagcaacga acgctagcgc tgagactgat acgcgtccta 840

ataggtacca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960

cgtacttatg tgagttcgcc actaagtctc tcttaggata cactcaatgc taccatacag 1020

gtgttggtac cacaggacgt caccagcagg attcttgacg ctgcgatgcg ttcgcttgta 1140

catctacaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260

gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgtacaat cgcgttgtat 1320

gacagtgagt cgtgctatac gattgagtag cctcgcgctc aagagagact agagtaagac 1440

ttccatcacg agcgatctct tactggacgc cgtattgaca cctgcatatg gaatcacatc 1500

gccgttggat agtgcagtaa tatcactgcg tgcaacttgt gcacagagcc gcgtactatc 1560

gtgtctatga gaccttacgt ccgacgctct acggtccata tatcgtatcg tatatcgcct 1620

ctcacgatac ataagttctc tctatcgcac actggtactc gaccgtctcc gtgcgtataa 1680

gcgagtactc ctaaccaagt atattcgctc gcaacgcgcc tggacatcgc gatcgttatc 1740

tggagcgctc ggagtgcgca tgcaagatta caaccgcatt ggatagactc cattgtgtcc 1800

gtcggtgcgc agtgcgctac tcttgctagc gctaagacca gagacacgaa ggctatagta 1860

atagtggacg cctatcaact caacatgcga agaagagcag tggtatactg ttctcgtgta 1920

ggtacgcaat cgataccgta gttctgcgct gttgtaccga tacacaccta cataggcgct 1980

tgccaatacg atggttggtc aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040

<210> 4

<211> 2200

<212> DNA

<213> Artificial sequence

<400> 4

gcgtggcgca gtcagcgtaa cgtgcggtgc acgcgtcatg atagacgata taacggccgc 240

gtctcaggtc tcaagtaaga ttgcgttggt cagacaatac gctcgaaggc gcagtcatat 300

accattaata gcacgtgtag agcgcactac tatgaggtat caggtgagag tatgatatca 360

tagagtcctt gagtgcgtct tatacgcgtt cctagattga gcgtgtatcg cacaagacgc 420

tatatatgaa tacatgcgtc tcgagattgt ataactcgtc agctagccgt catatgcctt 480

ctcaagtgcg ttatgtcgca caacgtagac tgtgagtgac gcgtgctgtg aggtctatat 540

aagtcatcac gcacaacgcc tatcaagcca cttgtggacg ctagcgtgct gcacagcgag 600

tagctcgcgg cagagacaca tcgagtatac ctaggatagt cttgatactc cacgtggtat 660

gcggcactat cttacacata tcaggcgtcc tggaagcgct accaattagc gtcgctgcgt 720

tactgcaagc agcgaccagg caactcatat gccggcacgc gctatcgcgt aaggcggtaa 780

cgctaacata ttgatattat gaagctagga acgctagcgc tgagactgat acgcgtccta 840

atagcttcca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960

cgtacttatg tgagttcgcc actaagtctc tcttaggata cactgattgc taccatacag 1020

gtgttggtac cacaggacgt cacgaccagg attcttgacg ctgcgatgcg ttcgcttgta 1140

catcaagaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260

gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgaagaat cgcgttgtat 1320

gacagtgagt cgtgctatac aactgcgtat actagagatc cgcgtctcat gctatctcgg 1440

cgccttcgcg cagtgctacg taacggcgct atgccatgct aacatagttg cgtatcctat 1500

gatcctgcat aacgtcacgc gtgacctccg ttctacttcg cgatgcgtat cgctatatcg 1560

tgaagtctat atggaatata ggaacagcat tagcgcagcg gaggtaatca catacagtat 1620

atcgtgcggc atacgtcata ttgcactcag tcgccgcata tatcggtaga aggcagtacc 1680

gtgcgcatat tgctgtgctg cagttataca gacgagtact gtcgaggtat ggcgcagtcg 1740

ctataattca atccgtatat atgatgccat atgcgccgac agctactcgc catctgtgtg 1800

gtaggtggcg gtgagttgca ttcatccaga gtgcggaatt catgatatag cgtcgtagat 1860

ctgacgcacc accgaaccac attgagacgc caactgtgcg catcatatca ctgcatatat 1920

tacctctagg actgctccag aacgcgtatg tcattggagc ctgtcggcca attcacaccg 1980

agccatacac gcgcttgatt aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040

<210> 5

<211> 2200

<212> DNA

<213> Artificial sequence

<400> 5

gcgtggcgca gtcagcgtaa catgtaatgt atcgtaacta gacatgattc tgcatatcgc 240

tactcgtggt cgcttgctcc agcgcttcat ctctggagca tagtcttgac tagtatatac 300

gagactgatc tcagtccgta tggccgatca cactgccagc ggagacgagc acaacgacag 360

cgtcgcggac tcgcatatct cagactatat tcattccgta tgtatatctc cgacaaggag 420

ctgaaggatc atgttctcac tcaccattac tgctgacaat aggcgcacat accagtatgc 480

gcggccggca cttctacaca cattgctgct aacatatgta gtcgaaccta tcttcaagca 540

tctcgctgta gcgaacgcgt cgcacgtagc gagctaatac gcgtccagcg cgaattgtat 600

actattatat attgcgctga gcgccagccg acgcgctctg cttatattat aatattgatg 660

gtcggtgctc aagcgtgcac agtgaagttc cttcataccg tgatgcgcgg cgcgtacgtc 720

gacgcctata tgttagaagg ccaatgtcgc attgttatct tccagcttgg taagatcctt 780

ggcagcgtca tatgaactcg gaagctacgt acgctagcgc tgagactgat acgcgtccta 840

atagctacga ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960

cgtacttatg tgagttcgcc actaagtctc tcttaggata cactgaatcc taccatacag 1020

gtgttggtac cacaggacgt cacgagctgg attcttgacg ctgcgatgcg ttcgcttgta 1140

catcaacatg agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260

gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgaacatt cgcgttgtat 1320

gacagtgagt cgtgctatac tcagttagcg caattaatgt gacctaatca caccagtcag 1440

gcacatatga ctataagcgc atgctgcgaa gtctagacat cctgacaact cgtacgcagg 1500

cgtgtatata ccgtatataa gaatcttcgg acgcatagcg actgcaacct acagcatcat 1560

gcagcctcga ggcgtgcagc gcacatatat ccgcggatat gcaataagca gcgtgccgtc 1620

ctggtggtgg ctgctggtat acagcattct tatattcaat gacgtcagcc ttcctcgccg 1680

cgtgaattag agacggtcct tgcttaggtc ctcctggttg acggtcatag taactataag 1740

gtgacagcgc ggttcagaag cgcgactata tccgacgaga tatattaacg cctatcaaca 1800

tagaatgcaa gaggtacagg tccatggtcg cgtacgacga taagcgtgcg agaacgtgcc 1860

gtcatatacc gaggatatac tcgcagctgg cggcgaccag gtatatcgtc ttatctgata 1920

tcatggactt acatactata tcagcgtgtt atggcgcgag cacgacagct gtatactgag 1980

gaggcgcaat gccgtataac aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040

<210> 6

<211> 2200

<212> DNA

<213> Artificial sequence

<400> 6

gcgtggcgca gtcagcgtaa gtgtatcaac ctaagtacgc tataccacaa tattctgagc 240

gttgcgacgg tctagccgtg cactatcacc tacagtgccg tccgaaggcg gaagcgacca 300

tcagcttgtc aatacagaac ctaagacgct gcatgccgat cgatacgacc atgcgaataa 360

ggtagcgaca tactgtcgca actgatatgt ggtataagtg cgcagtctaa cgcatatgtc 420

ttcgtacaca ccttaggagt accgccagcg tgctatgcac gcgagagcgc acaggtataa 480

tatagatcgc gcacattcaa gaagtcagcg cgtatagccg agtatatgta tatctcaccg 540

ctactgcaag ttgcgtgcgc ttacggtacg ctatatgagc ttggtatatt gatggacgct 600

tggacgatgc atatgagacc gcgacctgtt cgctctgcca agatgaagga gctcctcact 660

tcaagtcgac cggacgcctc gcgctgtcct tgtcaacaac atagtatcac tcgcaggtca 720

tataagcgtg ccttctagcc tcaagagcca tatgacctga ctctagctct tatatacgca 780

gatgcttaac taggtacatt gaagctacga aggctagcgc tgagactgat acgcgtccta 840

atagctacca atgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960

cgtacttatg tgagttcgcc actaagtctc tcttaggata cactgaatgc aaccatacag 1020

gtgttggtac cacaggacgt cacgagcagc attcttgacg ctgcgatgcg ttcgcttgta 1140

catcaacaag tgccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260

gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgaacaat ggcgttgtat 1320

gacagtgagt cgtgctatac aacgaagtaa gcggagctat ccttatcacc acgatcacta 1440

gtcgcgcaag cacctcaatc tattcgatcg acgcgcatta gcgcacgctc ttcgcaggca 1500

tgattgcata cctagttcct gcgacatgat atatgaacat caggtcaaca gtgaaggata 1560

tgtgcggacg cacacgcgtt gtagcgcggc acatattgag tatgaacttc catgatagca 1620

acgcgtgtgt gtgacaccgc tacaatccac atgacagtat gctcggcctt gtggctggtc 1680

ttgtggacgc atatatcgac tgtgccatat agtcatatca gagtgcgcat ccgataatcc 1740

ttgctgacac cacacgcgcc ttagcgctat tagatgccgt tacctaggca tgtaatggtc 1800

tctacgtcaa gttgataggc ttgacgccga atccgaatta attgcagacg attgcgtcct 1860

cgcgtatatg gattgactac gcggccacct atgtgtcata gcagacacgc atagtacgcc 1920

tggaggcgag ttaccgcgtc atagacgtct gttaatcaga tatgctcctc attatgcaca 1980

tcagcacttg cgcttcgtgg aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040

<210> 7

<211> 21

<212> DNA

<213> Artificial sequence

<400> 7

gtgtgcaacc tatggcgaca g 21

<210> 8

<211> 22

<212> DNA

<213> Artificial sequence

<400> 8

cacatagctc tcagagtcgc gg 22

Claims

1. A sequencing quality control standard, characterized in that the sequencing quality control standard comprises, in order, primer region 1, specific region 1, a homologous region, specific region 2 and primer region 2;

2. The sequencing quality control standard according to claim 1, characterized in that the homologous region contains 1-4 mutation sites per 100 bp.

3. The sequencing quality control standard according to claim 1, characterized in that the length of the sequencing quality control standard is 1000-5000 bp, preferably 1000-3000 bp.

4. The sequencing quality control standard according to claim 1, characterized in that the homology between the sequencing quality control standard and natural sequences, as determined by MEGABLAST, is 0;

5. The sequencing quality control standard according to claim 1, characterized in that the GC content of the sequencing quality control standard is 45%-55%, preferably 50%.

6. The sequencing quality control standard according to any one of claims 1-5, characterized in that it comprises at least one of SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6;

7. Use of the sequencing quality control standard according to any one of claims 1-6 in single-molecule sequencing and/or high-throughput sequencing.

8. The use according to claim 7, characterized in that the sequencing quality control standard is added in an amount of 0.5-10% w/w of the sample to be tested.

9. A single-molecule sequencing quality control kit, characterized in that it comprises the sequencing quality control standard according to any one of claims 1-6.

10. A high-throughput sequencing quality control kit, characterized in that it comprises the sequencing quality control standard according to any one of claims 1-6.