CN110438121A


Info

Publication number: CN110438121A
Application number: CN201810414797.8A
Authority: CN
Inventors: 邵迪; 叶明芝; 宋炎
Original assignee: Guangzhou Huada Gene Medical Laboratory Co Ltd; Shenzhen Huada Clinical Laboratory Center; BGI Shenzhen Co Ltd
Current assignee: Guangzhou Huada Gene Medical Laboratory Co Ltd; Shenzhen Huada Clinical Laboratory Center; BGI Shenzhen Co Ltd
Priority date: 2018-05-03
Filing date: 2018-05-03
Publication date: 2019-11-12

Abstract

The invention provides an adapter. The adapter comprises a single-stranded nucleic acid segment, formed from single-stranded nucleic acid and comprising a sample tag sequence, a random sequence, and a primer binding sequence, and a double-stranded nucleic acid segment formed from double-stranded nucleic acid. Because the random sequence is placed in the single-stranded region of the adapter, annealing of the adapter's upstream and downstream primers is favored, and both the ligation efficiency of the adapter and the utilization of the library template DNA are significantly higher. The random sequence introduced into the adapter serves as a molecular tag sequence that specifically labels each library template DNA molecule, so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, and genes with a mutation abundance as low as 1‰ can be detected.

Description

Adapters, adapter libraries and their applications

Technical field

The present invention relates to the field of biotechnology. Specifically, it relates to adapters, adapter libraries and their applications, and more specifically to adapters, adapter libraries, sequencing libraries, methods for constructing sequencing libraries, and gene sequence analysis methods.

Background

Conventional tumor tissue biopsy suffers from sampling difficulty and from spatial and temporal heterogeneity, which limits its use in detecting tumor-specific gene mutations. Liquid biopsy, a non-invasive detection technology, has developed rapidly in recent years and has attracted wide attention. It can be widely used for early tumor screening, diagnosis and typing, detection of drug-sensitivity genes during treatment, and real-time monitoring of therapeutic efficacy.

By detection target, liquid biopsy is mainly divided into ctDNA (circulating tumor DNA) at the nucleic acid level and circulating tumor cells (CTC) at the cellular level. Because the concentration of ctDNA in the bloodstream is higher than that of CTCs, basic and translational research on ctDNA in clinical oncology has attracted great attention in recent years. ctDNA refers to DNA fragments derived from the tumor genome that circulate continuously in the human blood and carry certain features (including mutations, deletions, insertions, rearrangements, copy number abnormalities, methylation, etc.). The main sources of ctDNA are: 1. necrotic tumor cells; 2. apoptotic tumor cells; 3. circulating tumor cells; 4. exosomes secreted by tumor cells. However, because the amount of ctDNA in plasma is extremely small, ctDNA is still of limited use for early detection and is currently used mainly for detecting advanced tumors.

High-throughput sequencing is considered to have the potential to overcome this limitation, so that ctDNA can be applied to early tumor screening. How to apply ctDNA to early tumor screening by high-throughput sequencing, however, remains a key problem awaiting solution.

Summary of the invention

This application is based on the inventors' discovery and recognition of the following facts and problems:

In current high-throughput sequencing, PCR introduces base errors during sequencing library construction, DNA cluster formation, and parallel sequencing. These errors mask the true base mutations of ctDNA molecules and make low-frequency somatic variants difficult to detect.

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, in a first aspect, the present invention provides an adapter. According to an embodiment of the present invention, the adapter comprises: a single-stranded nucleic acid segment, formed from single-stranded nucleic acid and comprising a sample tag sequence, a random sequence, and a primer binding sequence; and a double-stranded nucleic acid segment formed from double-stranded nucleic acid. In the adapter according to embodiments of the present invention, the random sequence is placed in the single-stranded region, which favors annealing of the adapter's upstream and downstream primers; the primer annealing rate can reach 88%. When the adapter is used to construct a sequencing library, the ligation efficiency between the adapter and the library template DNA is significantly higher than in the prior art (up to 45.8%), and a higher utilization of the library template DNA is achieved. In addition, the random sequence introduced into the adapter serves as a molecular tag sequence (UID) that specifically labels each library template DNA molecule, so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

In a second aspect, the present invention provides an adapter library. According to an embodiment of the present invention, the adapter library comprises a plurality of adapters, each of which is the adapter described above. The adapter library introduces a random sequence into the single-stranded region of each adapter as a tag sequence. On the one hand, this favors annealing of the adapter's upstream and downstream primers and improves the ligation efficiency between the adapter and the library template DNA. On the other hand, it allows each library template DNA molecule to be specifically labeled, so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

In a third aspect, the present invention provides a sequencing library. According to an embodiment of the present invention, the sequencing library comprises a plurality of inserts, each of which is linked to one adapter from the adapter library described above, wherein the 3' end of the insert is linked to the 5' end of the nucleotide strand bearing the 3' non-sticky end of the double-stranded nucleic acid segment, and the inserts correspond one-to-one with the random sequences. Because the inserts in the sequencing library are linked to the adapters described above, sequencing the library makes it possible to distinguish true variants from base errors introduced during library construction or sequencing, for example during PCR amplification, and to detect genes with a mutation abundance as low as 1‰.

In a fourth aspect, the present invention provides a method for constructing a sequencing library. According to an embodiment of the present invention, the method comprises: ligating target fragments to the adapter described above, the adapter being provided in the form of the adapter library described above; and amplifying the ligation product, the amplification product constituting the sequencing library. In this method, the ligation efficiency between the adapter and the target fragments is high, and the utilization of the target fragments is high. Because the target fragments in the resulting sequencing library are linked to the adapters described above, true variants can be distinguished from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

In a fifth aspect, the present invention provides a sequencing library. According to an embodiment of the present invention, the sequencing library is constructed by the method described above. In this sequencing library, the ligation efficiency between target fragments and adapters is high, and the utilization of the target fragments is high. When the library is sequenced, true variants can be distinguished from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

In a sixth aspect, the present invention provides a gene sequence analysis method. According to an embodiment of the present invention, the method comprises: constructing a sequencing library from the nucleic acid sample to be tested according to the method described above; sequencing the library to obtain a sequencing result comprising a plurality of sequencing reads; aligning the sequencing result to a reference sequence to identify reads that carry a mutation site; clustering the reads that carry a mutation site, wherein reads having the same random sequence are assigned to the same cluster; and, for each cluster, determining the origin of the mutation site from its position on the reads. Within a cluster, a mutation site at the same position on the reads indicates a mutation native to the nucleic acid sample, whereas a mutation site at differing positions on the reads indicates an error arising from the sequencing process. Before the alignment, the sequencing result is first sorted by sample tag sequence, and reads having the same sample tag sequence are classified as coming from the same nucleic acid sample. The method can be used to distinguish mutations native to the nucleic acid sample from base errors introduced during the sequencing process, for example during PCR amplification, and to detect genes with a mutation abundance as low as 1‰.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.

Description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the drawings, in which:

Fig. 1 is a schematic diagram of the structure of a UID adapter according to an embodiment of the present invention;

Fig. 2 shows the Agilent Bioanalyzer 2100 results for the UID adapter after annealing according to an embodiment of the present invention;

Fig. 3 shows the Agilent Bioanalyzer 2100 results for a sequencing library constructed with UID adapters according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the sequencing data analysis according to an embodiment of the present invention;

Fig. 5 is an example of a true mutation in a UID read cluster according to an embodiment of the present invention, in which g or G in the box indicated by the arrow is the normal base and A or a is the mutant base; and

Fig. 6 is an example of a PCR-introduced error in a UID read cluster according to an embodiment of the present invention, in which the C in the box is the base error introduced by PCR.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

Unless otherwise specified, the terms "insert", "target fragment", and "nucleic acid sample to be tested" used in this application all refer to "library template DNA" or "library template DNA molecules", that is, the DNA molecules that are ligated directly to the adapter described in this application.

Adapter

In one aspect, the present invention provides an adapter. According to an embodiment of the present invention, the adapter comprises: a single-stranded nucleic acid segment, formed from single-stranded nucleic acid and comprising a sample tag sequence, a random sequence, and a primer binding sequence; and a double-stranded nucleic acid segment formed from double-stranded nucleic acid. The inventors found that if the random sequence is designed into the double-stranded region of the adapter, the probability that the upstream and downstream primers pair into a complementary duplex during annealing is relatively low, which hinders ligation of the adapter to the library template DNA. In the adapter according to embodiments of the present invention, the random sequence is placed in the single-stranded region, which favors annealing of the adapter's upstream and downstream primers (the annealing rate can be as high as 88%) and greatly increases the probability of forming a complementary duplex during annealing. When the adapter is used to construct a sequencing library, the ligation efficiency between the adapter and the library template DNA (up to 45.8%) and the utilization of the library template DNA are significantly higher. The random sequence introduced into the adapter serves as a molecular tag sequence that specifically labels each library template DNA molecule, so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

According to an embodiment of the present invention, the double-stranded nucleic acid segment has a 3' sticky end, preferably a sticky T end, and the 3' non-sticky end of the double-stranded nucleic acid segment is linked to the 5' end of the single-stranded nucleic acid segment. With this structure, the adapter can be joined to the library template DNA by T-A ligation, which further improves the ligation efficiency between the adapter and the library template DNA.

According to an embodiment of the present invention, the sample tag sequence is 4 to 10 nt long, preferably 8 nt, and the random sequence is 6 to 10 nt long, preferably 6 nt. The sample tag sequence is used to distinguish different samples, and 8 nt is enough to distinguish 4⁸ different samples. The random sequence serves as a molecular tag sequence that distinguishes different DNA molecules, and 6 nt can distinguish 4⁶ different molecules. The inventors found that a sample tag sequence of 4 to 10 nt and a random sequence of 6 to 10 nt are sufficient to distinguish true variants from errors introduced by library construction in a target gene region of interest (100 kb to 500 kb), and even across the whole genome.

According to an embodiment of the present invention, the random sequence is 6 nt long. The inventors found that a sample tag sequence of 8 nt and a random sequence of 6 nt meet the need to distinguish true variants from errors introduced by library construction in a target gene region of interest (100 kb to 500 kb).
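The tag capacities quoted above follow from counting the sequences of a given length over the four bases. A minimal Python sketch of that arithmetic is shown below; the function name `tag_space` is hypothetical and chosen only for illustration.

```python
def tag_space(length_nt: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of the given length over A/C/G/T."""
    return alphabet_size ** length_nt


# Capacities for the preferred lengths and for the ends of the stated ranges.
for name, length in [("sample tag, 8 nt", 8),
                     ("random sequence (UID), 6 nt", 6),
                     ("upper bound, 10 nt", 10),
                     ("sample tag lower bound, 4 nt", 4)]:
    print(f"{name}: {tag_space(length)} possible sequences")
```

With the preferred lengths this gives 4⁸ = 65536 possible sample tags and 4⁶ = 4096 possible random sequences.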

According to an embodiment of the present invention, in the 5' to 3' direction, the single-stranded nucleic acid segment comprises the sample tag sequence, the random sequence, and the primer binding sequence. The inventors found that distinguishing samples affects sequencing quality more strongly than distinguishing molecules does, and that sequencing quality declines as the number of sequencing cycles increases. Arranging the sample tag sequence, the random sequence, and the primer binding sequence in this order therefore helps to further improve sequencing quality.

According to an embodiment of the present invention, the 5' end of the nucleotide strand bearing the 3' non-sticky end of the double-stranded nucleic acid segment is phosphorylated, so that it can be ligated to the insert or target fragment.

According to an embodiment of the present invention, the sample tag sequence has the nucleotide sequence shown in SEQ ID NO: 1.

TCATAAAT (SEQ ID NO: 1).

According to an embodiment of the present invention, the primer binding sequence has the nucleotide sequence shown in SEQ ID NO: 2.

ACACTCGGTTCCTCAAC (SEQ ID NO: 2).
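To illustrate the 5' to 3' ordering of the single-stranded segment, the sketch below enumerates every combination of the sample tag (SEQ ID NO: 1), a 6-nt random sequence, and the primer binding sequence (SEQ ID NO: 2). It assumes the three elements are directly contiguous with no spacer bases, which the text does not state, and it does not attempt to reproduce the full adapter strands. The function name `single_stranded_segments` is hypothetical.

```python
from itertools import product

SAMPLE_TAG = "TCATAAAT"             # SEQ ID NO: 1
PRIMER_SITE = "ACACTCGGTTCCTCAAC"   # SEQ ID NO: 2
UID_LENGTH = 6                      # preferred random-sequence length


def single_stranded_segments(sample_tag=SAMPLE_TAG,
                             primer_site=PRIMER_SITE,
                             uid_length=UID_LENGTH):
    """Yield sample tag + UID + primer site (5' to 3') for every possible UID.

    Assumption: the elements are joined without spacers.
    """
    for bases in product("ACGT", repeat=uid_length):
        yield sample_tag + "".join(bases) + primer_site


segments = list(single_stranded_segments())
print(len(segments))   # one segment per possible UID
print(segments[0])     # the segment carrying UID AAAAAA
```

Each enumerated segment differs only in its UID, which mirrors how a single sample tag is shared by all adapters used for one sample while the random sequence varies from adapter to adapter.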

Adapter library

In another aspect, the present invention provides an adapter library. According to an embodiment of the present invention, the adapter library comprises a plurality of adapters, each of which is as described above. While developing the adapter library, the inventors found that introducing a random sequence as a molecular tag into the single-stranded region of the adapter favors annealing of the adapter's upstream and downstream primers, which in turn improves the ligation rate between the adapter and the template DNA and the utilization of the template DNA. In addition, because the adapter library introduces a random sequence at each adapter as a tag sequence, it is suitable for specifically labeling the library template DNA molecules of a target gene region of interest (100 kb to 500 kb), so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, for example during PCR amplification.

According to an embodiment of the present invention, the adapter library contains at least 4⁶ different random sequences. Because the library introduces a random sequence at each adapter as a tag sequence and contains at least 4⁶ different random sequences, it is suitable for specifically labeling the library template DNA molecules of a target gene region of interest (100 kb to 500 kb), so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

Adapter design method

In yet another aspect, the present invention provides a method for designing the adapter. According to an embodiment of the present invention, the method comprises: 1) designing a predetermined bottom adapter strand containing a sample tag sequence and a 6-nt random sequence, which yields at least 4⁶ different molecular random sequences and thus meets the needs of sequencing and mutation detection in a target gene region of interest (100 kb to 500 kb); 2) ordering the sequence obtained in step 1) and subjecting it to 5' phosphorylation; 3) diluting the product obtained in step 2) and annealing it with the predetermined up adapter strand to obtain the adapter library. The predetermined bottom adapter strand has the nucleotide sequence shown in SEQ ID NO: 3, and the predetermined up adapter strand has the nucleotide sequence shown in SEQ ID NO: 4.

Here, the double-underlined portion is one of the 8-nt sample tag sequences (barcode), and the single-underlined NNNNNN is the random sequence (UID).

TTGTCTTCCTAAGGAACGACATGGCTACGATCCGACTT (SEQ ID NO: 4).

The adapter library obtained by this method introduces a random sequence into the single-stranded region of each adapter as a tag sequence. On the one hand, this favors annealing of the adapter's upstream and downstream primers. On the other hand, it allows the adapters to specifically label library template DNA molecules, so that true variants can be distinguished in the sequencing reads from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

Sequencing library

In yet another aspect, the present invention provides a sequencing library. According to an embodiment of the present invention, the sequencing library comprises a plurality of inserts, each of which is linked to one adapter from the adapter library described above, wherein the 3' end of the insert is linked to the 5' end corresponding to the 3' non-sticky end of the double-stranded nucleic acid segment, and the inserts correspond one-to-one with the random sequences. Because the inserts are linked to the adapters described above, each insert can be labeled with any of at least 4⁶ different random sequences. When the library is sequenced, true variants can be distinguished from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

According to an embodiment of the present invention, the 3' end of the insert has an A overhang, which allows the insert and the adapter to be joined efficiently by T-A ligation.

According to an embodiment of the present invention, the insert is 50 to 500 bp long. The inventors found that keeping the insert length within 50 to 500 bp further improves the success rate of library construction and makes the sequencing results more reliable.

Method for constructing a sequencing library

In yet another aspect, the present invention provides a method for constructing a sequencing library. According to an embodiment of the present invention, the method comprises: 1) subjecting plasma cell-free DNA to blunt-end repair and 3' A-tailing; 2) ligating the product obtained in step 1) to the adapter described above, the adapter being provided in the form of the adapter library described above; 3) amplifying the ligation product, the amplification product constituting the sequencing library. In this method, the ligation efficiency between the adapter and the product of step 1) is high, and the utilization of the product of step 1) is high. The nucleic acid sample to be tested (plasma cell-free DNA) in the resulting sequencing library is linked to the adapters described above at a high ligation rate (up to 45.8%). When the library is sequenced, true variants can be distinguished from base errors introduced during library construction or sequencing, for example during PCR amplification, and genes with a mutation abundance as low as 1‰ can be detected.

According to an embodiment of the present invention, adapters with the same sample tag sequence are used for samples from the same source. When the resulting library is sequenced, reads with the same sample tag sequence are therefore attributed to the same source.
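Attributing reads to their source by sample tag can be sketched as follows. This is a simplified illustration, not the patent's pipeline: it assumes each read is a plain string whose first 8 bases are the sample tag, and the second tag `GGCATTAC`, the sample names, and the function name `demultiplex` are all hypothetical. On a real platform the tag may be read separately from the insert.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_length=8):
    """Group reads by sample tag; reads with an unknown tag are dropped.

    Assumption: the sample tag occupies the first `tag_length` bases.
    Returns {sample name: [read with the tag removed, ...]}.
    """
    by_sample = defaultdict(list)
    for read in reads:
        sample = tag_to_sample.get(read[:tag_length])
        if sample is not None:
            by_sample[sample].append(read[tag_length:])
    return dict(by_sample)


tag_to_sample = {"TCATAAAT": "sample_A",   # SEQ ID NO: 1
                 "GGCATTAC": "sample_B"}   # hypothetical second tag
reads = ["TCATAAATACGTACGGTA",
         "GGCATTACTTGACACCAT",
         "TCATAAATACGTACGGTT",
         "AAAAAAAACCCCCCGGGG"]             # unknown tag, discarded
print(demultiplex(reads, tag_to_sample))
```

Exact matching is used here for simplicity; tolerating a mismatch in the tag would be a further design choice that the text does not discuss.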

According to an embodiment of the present invention, the method further comprises subjecting the ligation product to a first purification before the amplification. This removes interference from the enzymes used in ligation and from buffer ions in the subsequent amplification reaction, improving the success rate and efficiency of amplification.

According to yet another embodiment of the present invention, the method further comprises subjecting the amplification product to a second purification after the amplification. This removes interference from the enzymes and buffer ions used in amplification during sequencing, improving the success rate and accuracy of sequencing.

According to a specific embodiment of the present invention, the purification is carried out with magnetic beads, which further improves purification efficiency and gives a high product recovery.

再一方面，本发明也提出了一种测序文库。根据本发明的实施例，该测序文库是通过前面所述的构建测序文库的方法获得的。如前所述，根据本发明实施例的测序文库中接头与目的片段(血浆游离DNA)的连接率高，测序文库应用于测序，可区分真实变异和建库或测序中，如PCR扩增中引入的碱基错误，可用于对突变丰度低到1‰的基因进行检测。In another aspect, the present invention also provides a sequencing library. According to an embodiment of the present invention, the sequencing library is obtained by the aforementioned method for constructing a sequencing library. As mentioned above, the connection rate of adapters and target fragments (plasma free DNA) in the sequencing library according to the embodiment of the present invention is high, and the sequencing library is applied to sequencing, which can distinguish real variation from library construction or sequencing, such as in PCR amplification The introduced base errors can be used to detect genes with a mutation abundance as low as 1‰.

Gene sequence analysis method

Finally, the present invention provides a gene sequence analysis method. According to an embodiment of the present invention, the method comprises: constructing a sequencing library from the nucleic acid sample to be tested according to the method described above; sequencing the library to obtain a sequencing result comprising a plurality of sequencing reads; aligning the sequencing result to a reference sequence to identify the reads that carry a mutation site; clustering the reads that carry a mutation site, wherein reads with the same random sequence are assigned to the same cluster; and, for each cluster, determining the origin of the mutation site from its position on the reads. Within a cluster, a mutation site at the same position on the reads indicates a mutation native to the nucleic acid sample, whereas mutation sites at different positions on the reads indicate errors arising from the sequencing process. Before the alignment, the sequencing result is first sorted by sample tag sequence, wherein sequencing reads with the same sample tag sequence are classified as coming from the same nucleic acid sample. The method according to the embodiment of the present invention can distinguish mutations native to the nucleic acid sample from base errors introduced during the sequencing process, for example during PCR amplification, and can be used to detect genes with a mutation abundance as low as 1‰.
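The cluster-based decision described above can be sketched in Python. This is only an illustration under stated assumptions, not the implementation used in the invention: the input format (a UID plus the set of variants observed on each read), the function name `classify_variants`, and the parameters `min_reads` and `min_fraction` are all hypothetical. A variant seen at the same position on every read of a cluster is labeled as native to the sample; a variant seen on only some reads is labeled as introduced.

```python
from collections import defaultdict


def classify_variants(reads, min_reads=2, min_fraction=1.0):
    """Label each variant in each UID cluster.

    reads: iterable of (uid, variants), where variants is a set of
           (reference_position, alt_base) tuples seen on that read.
    min_reads and min_fraction are illustrative assumptions.
    """
    # Group reads that share the same random sequence (UID).
    clusters = defaultdict(list)
    for uid, variants in reads:
        clusters[uid].append(set(variants))

    calls = {}
    for uid, members in clusters.items():
        # Count how many reads in the cluster support each variant.
        support = defaultdict(int)
        for variants in members:
            for variant in variants:
                support[variant] += 1

        labels = {}
        for variant, count in support.items():
            if len(members) < min_reads:
                labels[variant] = "unresolved"  # too few reads to decide
            elif count / len(members) >= min_fraction:
                labels[variant] = "sample"      # same site on all reads
            else:
                labels[variant] = "introduced"  # site differs between reads
        calls[uid] = labels
    return calls


example = [
    ("AACGTT", {(100, "A")}),
    ("AACGTT", {(100, "A")}),
    ("AACGTT", {(100, "A"), (120, "C")}),
    ("GGTACC", {(130, "T")}),
]
result = classify_variants(example)
```

In this toy input, the variant at position 100 is carried by all three reads of cluster `AACGTT` and is labeled `sample`, the variant at position 120 appears on one read only and is labeled `introduced`, and the single-read cluster `GGTACC` is left `unresolved`.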

The present invention is described below with reference to specific examples. These examples are illustrative only and should not be construed as limiting the present invention. Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art and can be carried out with reference to "Molecular Cloning: A Laboratory Manual", third edition, or the instructions of the relevant products; the reagents and products used are all commercially available. Processes and methods not described in detail are conventional methods well known in the art. The source and trade name of each reagent, and its composition where this needs to be listed, are indicated at first mention; unless otherwise stated, later uses of the same reagent are as first indicated.

The following example uses detection of the PIK3CA E545K mutation in the Horizon plasma cell-free DNA reference standard (HD778) to evaluate the effect of the random sequence (referred to as the UID molecular tag in the example below) on mutation detection and on the correction of PCR errors. The specific steps and results are as follows.

Example

1) Adapter sequence design, ordering and annealing

The bottom strand of the adapter (sequence shown in SEQ ID NO: 3) was designed to contain the sample tag sequence shown in SEQ ID NO: 1 and a 6-nt UID molecular tag sequence. The SEQ ID NO: 3 sequence was sent to Takara for synthesis, with PAGE purification and a phosphorylation modification at the 5' end of the synthesized sequence.

After the phosphorylated, UID-containing adapter sequence was obtained, it was diluted with DNA-free deionized water according to Takara's instructions.

The diluted sequence was annealed to the up strand of the adapter (sequence shown in SEQ ID NO: 4); the final concentration of the adapter was 10 nM.

Sample tag sequence: TCATAAAT (SEQ ID NO: 1).

Bottom strand of the adapter: 5'-AGTCGGAGGCCAAGCGGTCTTAGGAAGACAATCATAAATNNNNNNCAACTCCTTGGCTCACA-3' (SEQ ID NO: 3).

In the bottom strand, TCATAAAT (double-underlined in the original) is one of the 8-nt sample tag sequences (barcode), and NNNNNN (single-underlined in the original) is the UID.
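As a quick check of the layout described above, the following Python sketch locates the sample tag and the UID within the bottom strand, using the SEQ ID NO: 3 sequence as given in the sequence listing, and counts how many distinct 6-nt UIDs are possible. The helper name `locate` is illustrative only.

```python
BOTTOM = "AGTCGGAGGCCAAGCGGTCTTAGGAAGACAATCATAAATNNNNNNCAACTCCTTGGCTCACA"
SAMPLE_TAG = "TCATAAAT"  # SEQ ID NO: 1
UID = "NNNNNN"           # 6-nt random sequence


def locate(sequence, segment):
    """Return the 1-based (start, end) of the first occurrence of segment."""
    start = sequence.find(segment)
    if start < 0:
        raise ValueError(f"{segment} not found")
    return start + 1, start + len(segment)


tag_span = locate(BOTTOM, SAMPLE_TAG)
uid_span = locate(BOTTOM, UID)
uid_diversity = 4 ** len(UID)  # four possible bases at each random position

print(len(BOTTOM), tag_span, uid_span, uid_diversity)
# 62 (32, 39) (40, 45) 4096
```

The UID span of positions 40 to 45 agrees with the `misc_feature` annotation for SEQ ID NO: 3 in the sequence listing, and a 6-nt random sequence allows 4096 distinct tags.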

Up strand of the adapter: 5'-TTGTCTTCCTAAGGAACGACATGGCTACGATCCGACTT-3' (SEQ ID NO: 4).

The structure of the adapter formed after annealing is shown in Figure 1.

The quality of the annealed adapter was checked with an Agilent Bioanalyzer 2100 instrument and the Agilent High Sensitivity DNA kit. The results are shown in Figure 2; the annealing rate of the adapter strands calculated from the figure reaches 88%.

2) Cell-free DNA was extracted from plasma samples according to the instructions of the QIAamp Circulating Nucleic Acid Kit (Qiagen, Cat No.: 55114).

3) End repair and A-tailing

Using the KAPA Hyper Prep Kit (KAPA BIOSYSTEMS, KK8505*), the HD778 standard (1% Multiplex I cfDNA Reference Standard, Horizon) was end-repaired and a single A base was added at the 3' position of both ends. Reaction mixtures were prepared in PCR tubes from the components listed in Table 1 below; two tubes were prepared, labeled UID and non-UID.

Table 1: End repair and A-tailing reaction system

| Component | Volume |
| --- | --- |
| HD778 standard (40 ng) | 50 μL |
| End repair and A-tailing buffer | 7 μL |
| End repair and A-tailing enzyme mix | 3 μL |
| Total volume | 60 μL |

The mixture described in Table 1 was vortexed until homogeneous, centrifuged briefly, and placed in a PCR instrument running the program shown in Table 2.

Table 2: PCR reaction program

| Temperature | Time |
| --- | --- |
| 20°C | 30 minutes |
| 65°C | 30 minutes |
| 4°C | Hold |

4) Adapter ligation

Using the Hyper Prep Kit (KAPA BIOSYSTEMS), the ligation reagents shown in Table 3 below were added to the PCR tubes from the previous step, and the annealed UID adapter was added to each reaction product from the previous step.

Table 3: Adapter ligation reaction system

| Component | Volume |
| --- | --- |
| End repair and A-tailing product | 60 μL |
| Annealed adapter (10 μM) | 5 μL |
| ddH2O | 5 μL |
| Ligation buffer | 30 μL |
| DNA ligase | 10 μL |
| Total volume | 110 μL |

Mix well and centrifuge briefly.

Incubate at 20°C for 15 minutes.

After ligation, the product was purified with magnetic beads as follows:

a) Equilibrate the magnetic beads (Agencourt AMPure XP) at room temperature for 30 min in advance, then vortex for 15 s to mix thoroughly before use;

b) Transfer the reaction from the previous step to a new 1.5 mL centrifuge tube, add 88 μL (0.8x) of magnetic beads to each tube, mix well, and let stand at room temperature for 10 min so that the DNA binds to the beads;

c) Centrifuge briefly, place the tube on a magnetic stand for 2-3 min until the beads are fully captured, then carefully aspirate and discard the supernatant without disturbing the beads;

d) Keeping the tube on the magnetic stand, add 500 μL of freshly prepared 80% ethanol to each tube, rotate the tube 2-3 full turns on the stand to wash the beads, and discard the supernatant without aspirating the beads; with a 96-well magnetic stand, 150 μL of ethanol per tube is sufficient;

e) Repeat step d) once;

f) Use a small-volume pipette to remove as much residual ethanol as possible without touching the beads. Keep the tube on the magnetic stand with the lid open and dry at room temperature until the bead surface no longer reflects liquid and looks matte; slight cracking is ideal, but do not over-dry the beads;

g) Resuspend the beads in 40 μL of DNA elution solution, remove the tube from the magnetic stand, mix by pipetting, let stand at room temperature for 5 min, and centrifuge briefly. Place the tube back on the magnetic stand for 1-2 min until the liquid is clear, then carefully transfer the supernatant to a new centrifuge tube.
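The bead volume in step b) follows from the bead-to-sample ratio: 0.8x of the 110 μL ligation reaction is 88 μL. A minimal Python sketch of that arithmetic is given below; the function name `bead_volume_ul` is illustrative and not part of the original protocol.

```python
def bead_volume_ul(reaction_volume_ul: float, ratio: float) -> float:
    """Volume of bead suspension (in μL) for a given bead-to-sample ratio."""
    if reaction_volume_ul <= 0 or ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    # Round to 0.1 μL to avoid floating-point noise such as 88.00000000000001.
    return round(reaction_volume_ul * ratio, 1)


print(bead_volume_ul(110, 0.8))  # 88.0, the post-ligation cleanup
print(bead_volume_ul(50, 1.0))   # 50.0, a 1x cleanup of a 50 μL PCR reaction
```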

5) Library PCR amplification

Using the kit (KK2601, KAPA BIOSYSTEMS), the PCR reaction was prepared according to the system shown in Table 4 below.

Table 4: PCR reaction mixture

| Component | Volume |
| --- | --- |
| KAPA HiFi HotStart mix (2X) | 25 μL |
| KAPA library amplification primer mix (10X) | 5 μL |
| Ligation product | 20 μL |
| Total volume | 50 μL |

The KAPA library amplification primer mix was prepared by mixing primers 1 and 2 in equimolar amounts to give a 10 μM primer mix in which each of the two primers is at 10 μM.

Primer 1: 5'-GAACGACATGGCTACGA-3' (SEQ ID NO: 5);

Primer 2: 5'-TGTGAGCCAAGGAGTTG-3' (SEQ ID NO: 6).
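The relationship between the two amplification primers and the adapter strands can be checked directly from the listed sequences. The Python sketch below, offered as an illustration only, confirms that primer 1 is identical to an internal segment of the up strand (SEQ ID NO: 4) and that primer 2 is the reverse complement of the 3' end of the bottom strand (SEQ ID NO: 3). The helper `reverse_complement` is written here for the example.

```python
UP = "TTGTCTTCCTAAGGAACGACATGGCTACGATCCGACTT"  # SEQ ID NO: 4
BOTTOM = "AGTCGGAGGCCAAGCGGTCTTAGGAAGACAATCATAAATNNNNNNCAACTCCTTGGCTCACA"  # SEQ ID NO: 3
PRIMER_1 = "GAACGACATGGCTACGA"  # SEQ ID NO: 5
PRIMER_2 = "TGTGAGCCAAGGAGTTG"  # SEQ ID NO: 6

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Primer 1 has the same sequence as part of the up strand.
primer_1_offset = UP.find(PRIMER_1)  # 0-based start within the up strand

# Primer 2 anneals to the 3' end of the bottom strand.
primer_2_site = reverse_complement(PRIMER_2)

print(primer_1_offset, primer_2_site, BOTTOM.endswith(primer_2_site))
# 13 CAACTCCTTGGCTCACA True
```

Because primer 2 binds downstream of the UID on the bottom strand, amplification products retain both the sample tag and the UID of the original adapter.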

The PCR reaction mixture was mixed thoroughly and centrifuged briefly.

Amplification was performed under the PCR reaction conditions shown in Table 5 below.

Table 5: PCR reaction conditions

After PCR amplification, the product was purified as follows:

b) Transfer the reaction from the previous step to a new 1.5 mL centrifuge tube, add 50 μL (1x) of magnetic beads to each tube, mix well, and let stand at room temperature for 10 min so that the DNA binds to the beads;

c) Centrifuge briefly, place the tube on a magnetic stand for 2-3 min until the beads are fully captured, then carefully aspirate and discard the supernatant without disturbing the beads;

d) Keeping the tube on the magnetic stand, add 500 μL of freshly prepared 80% ethanol to each tube, rotate the tube 2-3 full turns on the stand to wash the beads, and discard the supernatant without aspirating the beads; with a 96-well magnetic stand, 150 μL of ethanol per tube is sufficient;

e) Repeat step d) once;

Library QC: the library concentration was measured with a Qubit 3 fluorometer and a high-sensitivity quantification kit; it was 14 ng/μL, in line with the expected concentration. The annealed adapter was analyzed with an Agilent Bioanalyzer 2100 instrument and the Agilent High Sensitivity DNA kit, and the results are shown in Figure 3. From the figure and the amounts of template and adapter used, the ligation efficiency between the adapter and the library template DNA was calculated to reach 45.8%.

Library circularization and BGISEQ-500 sequencing: the library was circularized and sequenced in PE50+14 mode according to the instructions of the BGISEQ-500 sequencer.

6) Data analysis

Data analysis follows the workflow shown in Figure 4. Reads are first demultiplexed by the sample tag (TCATAAAT): reads carrying TCATAAAT are assigned to the sample. These reads are then aligned to the reference genome, and reads aligned to the same position are grouped by UID, with read 1 and read 2 sharing the same UID forming one cluster. PCR errors occur at random positions, whereas a mutation in the cell-free DNA (ctDNA) of a single cluster should sit at the same position. The position of a mutation within a cluster therefore indicates whether it was introduced by PCR or is a real somatic mutation (Figures 5 and 6). In other words, analysis of the sequencing data shows that using adapters with UID molecular tags during library construction makes it possible to distinguish real mutations from PCR-introduced errors. Figure 5 shows an example of a real mutation in a UID read cluster: in the box indicated by the arrow, g or G is the normal base and A or a is the mutant base. Figure 6 shows an example of a PCR-introduced error in a UID read cluster: the C in the box is a base error introduced by PCR.
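The first two steps of this workflow, sorting reads by sample tag and grouping them by alignment position and UID, can be sketched in Python as follows. This is a hedged illustration, not the pipeline used in the example: it assumes that each read comes with a 14-nt tag read consisting of the 8-nt sample tag followed by the 6-nt UID, and that an alignment position is already available. The record layout and the names `split_tag` and `group_reads` are hypothetical.

```python
from collections import defaultdict

SAMPLE_TAG = "TCATAAAT"  # SEQ ID NO: 1
UID_LENGTH = 6


def split_tag(tag_read):
    """Return the UID if the tag read carries the sample tag, else None.

    Assumes the tag read is laid out as sample tag + UID (an assumption).
    """
    if len(tag_read) != len(SAMPLE_TAG) + UID_LENGTH:
        return None
    if not tag_read.startswith(SAMPLE_TAG):
        return None
    uid = tag_read[len(SAMPLE_TAG):]
    if set(uid) - set("ACGT"):
        return None  # discard UIDs containing uncalled bases
    return uid


def group_reads(records):
    """Group read names by (aligned position, UID).

    records: iterable of (read_name, tag_read, aligned_position).
    """
    clusters = defaultdict(list)
    for name, tag_read, position in records:
        uid = split_tag(tag_read)
        if uid is not None:
            clusters[(position, uid)].append(name)
    return dict(clusters)


records = [
    ("r1", "TCATAAATAACGTT", 1000),
    ("r2", "TCATAAATAACGTT", 1000),
    ("r3", "TCATAAATGGTACC", 1000),
    ("r4", "GGGGGGGGAACGTT", 1000),  # different sample tag, skipped
]
clusters = group_reads(records)
```

Here `r1` and `r2` fall into the same cluster, `r3` forms its own cluster at the same position because its UID differs, and `r4` is excluded because it does not carry the sample tag.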

In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that the specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such references do not necessarily relate to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is to be understood that these embodiments are exemplary and should not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

SEQUENCE LISTING

<110> Shenzhen Huada Clinical Laboratory Center
      Guangzhou Huada Gene Medical Laboratory Co., Ltd.
      Shenzhen Huada Gene Co., Ltd.

<120> Adapters, adapter libraries and their applications

<130> PIDC3181389

<160> 6

<170> PatentIn version 3.3

<210> 1
<211> 8
<212> DNA
<213> Artificial
<220>
<223> Sample tag sequence
<400> 1
tcataaat                                                            8

<210> 2
<211> 17
<212> DNA
<213> Artificial
<220>
<223> Primer binding sequence
<400> 2
acactcggtt cctcaac                                                 17

<210> 3
<211> 62
<212> DNA
<213> Artificial
<220>
<223> Designed bottom strand of the adapter
<220>
<221> misc_feature
<222> (40)..(45)
<223> n is a, c, g, or t
<400> 3
agtcggaggc caagcggtct taggaagaca atcataaatn nnnnncaact ccttggctca  60
ca                                                                 62

<210> 4
<211> 38
<212> DNA
<213> Artificial
<220>
<223> Designed up strand of the adapter
<400> 4
ttgtcttcct aaggaacgac atggctacga tccgactt                          38

<210> 5
<211> 17
<212> DNA
<213> Artificial
<220>
<223> Primer 1
<400> 5
gaacgacatg gctacga                                                 17

<210> 6
<211> 17
<212> DNA
<213> Artificial
<220>
<223> Primer 2
<400> 6
tgtgagccaa ggagttg                                                 17