CN115851664B


Info

Publication number: CN115851664B
Application number: CN202211136496.6A
Authority: CN
Inventors: 肖易倍; 陆美玲; 俞晨霖; 张钰雯
Original assignee: China Pharmaceutical University
Current assignee: Shenzhen Adibek Biotechnology Co ltd
Priority date: 2022-09-19
Filing date: 2022-09-19
Publication date: 2023-08-25
Anticipated expiration: 2042-09-19
Also published as: CN115851664A

Abstract

The invention relates to a type I-B CRISPR-Cascade-Cas3 gene editing system and its application. The gene editing system consists of a Cascade complex and a Cas3 protein, wherein the Cascade complex is assembled from a Cmx8 protein, a Cas8 protein, a Cas5 protein, a Cas6 protein, a Cas11 protein and a crRNA. The system is applied to recognizing, binding and editing prokaryotic or eukaryotic genes. The type I-B CRISPR-Cascade-Cas3 gene editing system can generate long-fragment deletions of varying extent from a single CRISPR target site, thereby addressing the relatively limited ability of existing CRISPR-Cas9 systems to generate long-fragment deletions.

Description

Translated from Chinese

A type I-B CRISPR-Cascade-Cas3 gene editing system and its application

Technical Field

The present invention relates to a type I-B CRISPR-Cascade-Cas3 gene editing system and its application, and belongs to the technical field of biomedicine.

Background Art

The CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) system is an RNA-mediated "adaptive immune system" found in the genomes of bacteria and archaea that defends against invading foreign nucleic acids^[1]. A CRISPR-Cas gene cluster contains a CRISPR locus, which stores sequence information from foreign nucleic acids, and cas genes, which encode proteins with different functions. The CRISPR locus comprises a leader sequence, repeat sequences and spacer sequences^[2].

The CRISPR-Cas system functions in three main stages (as shown in Figure 1): 1. Adaptation: the bacterium uses Cas proteins to recognize and capture fragments of invading foreign nucleic acid and integrates them into the CRISPR locus as new spacer sequences; 2. CRISPR RNA (crRNA) maturation: the CRISPR locus, which stores the foreign nucleic acid information, is transcribed into precursor CRISPR RNA (pre-crRNA), which is processed into mature crRNA by Cas proteins and certain nucleases and then assembles with Cas proteins into a crRNA/Cas protein complex; 3. Interference: the crRNA in the crRNA/Cas protein complex base-pairs with the target nucleic acid adjacent to the PAM (protospacer adjacent motif) sequence, and the Cas protein with cleavage activity specifically cuts the target gene at the target site^[3].

CRISPR-Cas systems are divided into two classes: Class 1 and Class 2 (Figure 2). Class 2 systems comprise three types (type II, type V and type VI) and use a single multi-domain protein, such as Cas9 or Cas12, to interfere with the target nucleic acid. Class 1 systems account for 90% of all CRISPR-Cas systems and also comprise three types (type I, type III and type IV); type I is further divided into seven subtypes (I-A to I-F and I-U). Class 1 systems perform their function through the effector complex Cascade (CRISPR-associated complex for antiviral defense), which is composed of multiple Cas proteins and crRNA^[4]. Class 1 systems include Cas3 (sometimes fused to Cas2), Cas5, Cas6, Cas7, Cas8, Cas10 and Cas11, in different combinations in different subtypes^[5]. Cas5 and Cas7 form the backbone of Cascade: Cas7 is present as 6-7 copies that bind and support the crRNA and influence how the crRNA pairs with DNA; Cas6 processes the pre-crRNA; Cas5 has a relatively low molecular weight and is involved in binding the substrate nucleic acid; Cas8, Cas10 and related proteins recognize the PAM sequence during binding of the substrate DNA; and Cas3, which has helicase and nuclease activities, performs the final cleavage of the target nucleic acid and further degrades the DNA^[6].

Gene editing technology based on CRISPR-Cas9, a Class 2 system, is now mature and widely used. It has been developed into efficient gene editing and gene detection tools and plays an important role in basic and applied biological research. The system uses a single guide RNA (sgRNA) to recognize and bind the target DNA and to guide the Cas9 protein to cut at the target site, forming a DNA double-strand break (DSB)^[7]. The cell then repairs the break by non-homologous end joining (NHEJ) or homology-directed repair (HDR), which introduces base insertions or deletions (indels) at the target site and thereby edits the genome^[8].

The mechanism of action of the type I CRISPR-Cas systems in Class 1 differs from that of Cas9. It works as follows:

1. Cascade recognizes the target double-stranded DNA (dsDNA) adjacent to the PAM sequence and drives the crRNA to form a heteroduplex with the target strand (the strand complementary to the crRNA), displacing the non-target strand and forming an R-loop structure;

2. Cas3, which has helicase and nuclease activities, is specifically recruited to the Cascade/R-loop complex and cuts the non-target strand (the strand carrying the PAM sequence), preferentially nicking it 7 or 9 nucleotides from the PAM sequence. The nicked DNA strand may then thread through the helicase domain of Cas3, initiating the subsequent DNA unwinding and degradation^[9].

Although the CRISPR-Cas9 system is simple and easy to use, it still suffers from problems such as off-target effects. In addition, even after CRISPR-Cas9 has cut the target DNA, exon skipping or translation reinitiation can allow some target proteins to be partially expressed and remain active, which reduces the efficiency of gene editing^[10]. The ability of CRISPR-Cas9 to produce long-fragment deletions is also relatively limited, which further restricts its application^[11].

Compared with the CRISPR-Cas9 system, the effector complex of Class 1 systems is more complex and more finely organized. The crRNA in Cascade is generally more than 30 nt long, which gives higher specificity than the roughly 20 nt of targeting sequence used by Cas9 and makes off-target events less likely. The Cascade complex must form a complete R-loop structure with the target DNA before it can recruit Cas3^[12]. This property prevents Cas3 from binding DNA prematurely and causing non-specific cleavage. Most importantly, cleavage of the target DNA by Cas3 produces long-fragment deletions at the target site (from a few hundred bp to 100 kb), a capability that current CRISPR-Cas9 systems lack. Class 1 CRISPR-Cas systems are therefore expected to become more effective and more powerful gene editing tools.

Because the effector complex of Class 1 systems is complex, a type I CRISPR-Cas system was not reported for gene editing in mammalian cells until 2019. That study used the type I-E CRISPR-Cas system from Thermobifida fusca in human embryonic stem cells (hESC) and a human near-haploid cell line (HAP1), achieving knockout efficiencies of 13% and 60%, and found that a specific crRNA could produce a large number of genomic deletions of different lengths (Figure 3, left), demonstrating that type I gene editing tools are well suited to generating long-fragment deletions^[9]. Another study used a type I-C CRISPR-Cas system and also achieved good knockout efficiencies in hESC and HAP1 cells^[13] (Figure 3, right). Both studies indicate that the development of type I gene editing tools holds great promise.

At present, the type I-E and I-F subtypes of Class 1 CRISPR-Cas systems have been studied fairly thoroughly, but the structures, functional characteristics and mechanisms of the other subtypes remain insufficiently studied. This prevents a comprehensive understanding of Class 1 CRISPR-Cas systems and hinders their development and application in gene editing and related fields. Studies have shown that the arrangement of Cas proteins typical of type I CRISPR-Cas systems is retained in type I-B, whereas the other subtypes show varying degrees of gene loss or rearrangement, so type I-B may be evolutionarily more primitive. Developing and using gene editing tools based on the type I-B CRISPR-Cas system would therefore help build a broader and more complete understanding of Class 1 CRISPR-Cas systems and lay a foundation for their wider application.

Synechocystis sp. PCC 6714 is a unicellular cyanobacterium closely related to the widely studied model organism Synechocystis sp. PCC 6803. Both strains were isolated by R. Kunisawa from the same freshwater pond in Oakland, California. Their 16S rRNA homology is as high as 99.4%, and their gene content and predicted proteomes are highly conserved, indicating a common evolutionary origin^[14]. In earlier work, both Synechocystis sp. PCC 6714 and Synechocystis sp. PCC 6803 were used for the preparation of chromosomal DNA^[15], and both are suitable for genetic manipulation in the laboratory.

The references cited above are as follows:

[1] Marraffini LA, Sontheimer EJ. CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea. Nat Rev Genet. 2010, 11(3): 181-190.

[2] Gupta D, Bhattacharjee O, Mandal D, Sen MK, Dey D, Dasgupta A, Kazi TA, Gupta R, Sinharoy S, Acharya K, Chattopadhyay D, Ravichandiran V, Roy S, Ghosh D. CRISPR-Cas9 system: A new-fangled dawn in gene editing. Life Sci. 2019, 232: 116636.

[3] Minkenberg B, Wheatley M, Yang Y. CRISPR/Cas9-enabled multiplex genome editing and its application. Prog Mol Biol Transl Sci. 2017, 149: 111-132.

[4] Makarova KS, Wolf YI, Iranzo J, et al. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nat Rev Microbiol. 2020, 18(2): 67-83.

[5] Makarova KS, Koonin EV. Annotation and classification of CRISPR-Cas systems. Methods Mol Biol. 2015: 47-75.

[6] Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, Barrangou R, Brouns SJ, Charpentier E, Haft DH. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol. 2015, 13(11): 722-736.

[7] Minkenberg B, Wheatley M, Yang Y. CRISPR/Cas9-enabled multiplex genome editing and its application. Prog Mol Biol Transl Sci. 2017, 149: 111-132.

[8] Tuladhar R, Yeu Y, Tyler Piazza J, Tan Z, Rene Clemenceau J, Wu X, Barrett Q, Herbert J, Mathews DH, Kim J, Hyun Hwang T, Lum L. CRISPR-Cas9-based mutagenesis frequently provokes on-target mRNA misregulation. Nat Commun. 2019, 10(1): 4056.

[9] Dolan AE, Hou Z, Xiao Y, Gramelspacher MJ, Heo J, Howden SE, Freddolino PL, Ke A, Zhang Y. Introducing a spectrum of long-range genomic deletions in human embryonic stem cells using type I CRISPR-Cas. Mol Cell. 2019, 74(5): 936-950.e5.

[10] Xiao Y, Luo M, Dolan AE, Liao M, Ke A. Structure basis for RNA-guided DNA degradation by Cascade and Cas3. Science. 2018, 361(6397).

[11] Smits AH, Ziebell F, Joberty G, Zinn N, Mueller WF, Clauder-Munster S, Eberhard D, Falth Savitski M, Grandi P, Jakob P, Michon AM, Sun H, Tessmer K, Burckstummer T, Bantscheff M, Steinmetz LM, Drewes G, Huber W. Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods. 2019, 16(11): 1087-1093.

[12] Xiao Y, Luo M, Hayes RP, Kim J, Ng S, Ding F, Liao M, Ke A. Structure basis for directional R-loop formation and substrate handover mechanisms in type I CRISPR-Cas system. Cell. 2017, 170(1): 48-60.e11.

[13] Tan R, Krueger RK, Gramelspacher MJ, Zhou X, Xiao Y, Ke A, Hou Z, Zhang Y. Cas11 enables genome engineering in human cells with compact CRISPR-Cas3 systems. Mol Cell. 2022 Jan 13: S1097-2765(21)01137-0.

[14] Kopf M, Klähn S, Pade N, et al. Comparative genome analysis of the closely related Synechocystis strains PCC 6714 and PCC 6803. DNA Res. 2014, 21(3): 255-266.

[15] Joset F. Transformation in Synechocystis PCC 6714 and 6803: preparation of chromosomal DNA. Methods Enzymol. 1988, 167: 712-714.

Summary of the Invention

The main purpose of the present invention is to overcome the problems of the prior art by providing a type I-B CRISPR-Cascade-Cas3 gene editing system and its application. The resulting gene editing technique can generate long-fragment deletions of varying extent from a single CRISPR target site, thereby addressing the relatively limited ability of current CRISPR-Cas9 systems to produce long-fragment deletions and enriching the gene editing toolbox.

The technical solution by which the present invention solves this technical problem is as follows:

A type I-B CRISPR-Cascade-Cas3 gene editing system, characterized in that it consists of a Cascade complex and a Cas3 protein; the Cascade complex is assembled from a Cmx8 protein, a Cas8 protein, a Cas5 protein, a Cas6 protein, a Cas11 protein and a crRNA; the amino acid sequence of the Cmx8 protein is SEQ ID NO.2; the amino acid sequence of the Cas8 protein is SEQ ID NO.4; the amino acid sequence of the Cas5 protein is SEQ ID NO.6; the amino acid sequence of the Cas6 protein is SEQ ID NO.8; the amino acid sequence of the Cas11 protein is SEQ ID NO.10; the amino acid sequence of the Cas3 protein is SEQ ID NO.12; the DNA fragment sequence expressing the crRNA consists of mutually identical repeat sequences alternating with spacer sequences that are identical to or different from one another, and the DNA fragment sequence both begins and ends with a repeat sequence; the repeat sequence is 5'-gtgtccaaaccattgatgccgtaaggcgttgagcac-3', and the spacer sequence is designed according to the target gene.

This type I-B CRISPR-Cascade-Cas3 gene editing system performs recognition and cleavage through the Cascade complex and the Cas3 protein, which makes targeting more stringent. It can generate long-fragment deletions of varying extent from a single CRISPR target site, thereby addressing the relatively limited ability of current CRISPR-Cas9 systems to produce long-fragment deletions and enriching the gene editing toolbox.

Preferably, a nuclear localization signal (NLS) is attached to the 3' end of the amino acid sequence of the Cas8 protein; an NLS is attached to the 5' end of the amino acid sequence of the Cas3 protein; the amino acid sequence of the NLS is SEQ ID NO.14; and the DNA fragment sequence expressing the crRNA has the structure 5'-repeat sequence-spacer sequence-repeat sequence-spacer sequence-repeat sequence-3'.

More preferably, the coding gene sequence of the Cmx8 protein is SEQ ID NO.1; the coding gene sequence of the Cas8 protein is SEQ ID NO.3; the coding gene sequence of the Cas5 protein is SEQ ID NO.5; the coding gene sequence of the Cas6 protein is SEQ ID NO.7; the coding gene sequence of the Cas11 protein is SEQ ID NO.9; the coding gene sequence of the Cas3 protein is SEQ ID NO.11; the coding gene sequence of the NLS is SEQ ID NO.13; and the PAM DNA sequence most preferred by the type I-B CRISPR-Cascade-Cas3 gene editing system is 5'-atg-3'.

Adopting the above preferred schemes further refines the specific features and yields better gene editing results.
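As an illustration of how the preferred PAM 5'-atg-3' could be used when choosing spacers, the following minimal Python sketch scans one strand of a target sequence for that PAM and reports the bases immediately 3' of each hit as candidate protospacers. It rests on assumptions that the text above does not state: the function name `find_protospacers` is hypothetical, the 35 nt spacer length is a placeholder (the text only says that Cascade crRNAs are generally longer than 30 nt), and the protospacer is assumed to lie directly 3' of the PAM on the PAM-containing (non-target) strand, which is the usual convention for type I systems. Only the strand passed in is scanned.

```python
def find_protospacers(seq, pam="atg", spacer_len=35):
    """Return (pam_start, protospacer) pairs for every PAM found in seq.

    Assumption: the protospacer is the spacer_len bases immediately 3' of
    the PAM on the strand supplied (the PAM-containing, non-target strand).
    """
    seq = seq.lower()
    pam = pam.lower()
    hits = []
    start = seq.find(pam)
    while start != -1:
        proto_start = start + len(pam)
        proto = seq[proto_start:proto_start + spacer_len]
        # Keep only full-length candidates; hits too close to the end are skipped.
        if len(proto) == spacer_len:
            hits.append((start, proto))
        start = seq.find(pam, start + 1)
    return hits


# Hypothetical demo sequence, not taken from the patent.
demo = "cc" + "atg" + "acgt" * 9
demo_hits = find_protospacers(demo)
```

To cover both strands, the same function could be applied to the reverse complement of the target sequence.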

The present invention also provides:

A method for preparing the type I-B CRISPR-Cascade-Cas3 gene editing system described above, characterized in that it comprises the following steps:

Step 1: construct the plasmids for the Cascade complex and the plasmid for the Cas3 protein;

Step 2: co-transform the Cascade complex plasmids into E. coli prokaryotic expression cells, and transform the Cas3 protein plasmid separately into E. coli prokaryotic expression cells; then induce expression in each and purify to obtain the purified Cascade complex and Cas3 protein;

The type I-B CRISPR-Cascade-Cas3 gene editing system is thereby obtained.

Preferably, in Step 1, in the Cascade complex plasmids, the coding gene sequence of the nuclear localization signal NLS is attached to the 3' end of the coding gene sequence of the Cas8 protein; in the Cas3 protein plasmid, the coding gene sequence of the NLS is attached to the 5' end of the coding gene sequence of the Cas3 protein; the coding gene sequence of the NLS is SEQ ID NO.13;

In Step 2, the E. coli prokaryotic expression cells are E. coli BL21(DE3); for purification, the expression product is first subjected to affinity chromatography to obtain crude protein, and the crude protein is then subjected to molecular sieve (size-exclusion) chromatography to obtain the purified target protein.

More preferably, Step 1 is carried out as follows:

The coding gene sequences of the Cmx8 protein, the nuclear localization signal NLS, the Cas8 protein and the Cas5 protein are constructed into a first plasmid; the coding gene sequences of the Cas6 protein and the Cas11 protein are constructed into a second plasmid; and the DNA fragment sequence expressing the crRNA is constructed into a third plasmid. The first, second and third plasmids are all Cascade complex plasmids. The coding gene sequences of the NLS and the Cas3 protein are constructed into a fourth plasmid, which is the Cas3 protein plasmid;

The vector of the first plasmid is pCDF-Duet-1, the vector of the second plasmid is pRSF-Duet-1, the vector of the third plasmid is pUC19, and the vector of the fourth plasmid is pET-28a.
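The four-plasmid layout described above can be recorded as a small data structure, which makes it easy to check which plasmids must be co-transformed for each component. The Python sketch below is only an illustrative bookkeeping aid: the class name `Plasmid`, the field names and the helper `plasmids_for` are hypothetical, and the order of inserts follows the order in which the text lists them, not a verified physical orientation on the vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str        # "first" to "fourth", as named in the text
    vector: str      # backbone vector stated in the text
    inserts: tuple   # inserts in the order the text lists them
    component: str   # "Cascade" or "Cas3"


PLASMIDS = (
    Plasmid("first", "pCDF-Duet-1", ("cmx8", "NLS", "cas8", "cas5"), "Cascade"),
    Plasmid("second", "pRSF-Duet-1", ("cas6", "cas11"), "Cascade"),
    Plasmid("third", "pUC19", ("CRISPR array",), "Cascade"),
    Plasmid("fourth", "pET-28a", ("NLS", "cas3"), "Cas3"),
)


def plasmids_for(component):
    """Return the plasmids that together express the given component."""
    return [p for p in PLASMIDS if p.component == component]


cascade_vectors = [p.vector for p in plasmids_for("Cascade")]
cas3_vectors = [p.vector for p in plasmids_for("Cas3")]
```

Here `cascade_vectors` lists the three plasmids that are co-transformed to express the Cascade complex, and `cas3_vectors` the single plasmid that is transformed on its own.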

The above preparation method produces highly pure and active Cascade complex and Cas protein quickly, efficiently and in high yield. With the E. coli prokaryotic expression system followed by affinity-column and molecular sieve purification, a large amount of protein can be obtained within two days.

The present invention also provides:

Use of the type I-B CRISPR-Cascade-Cas3 gene editing system described above for recognizing, binding and editing prokaryotic or eukaryotic genes.

The present invention also provides:

A method for knocking out a gene in a cell, characterized in that it uses the type I-B CRISPR-Cascade-Cas3 gene editing system described above and comprises the following steps:

S1. Electroporate the Cascade complex and the Cas3 protein into the target cells to knock out the target gene;

S2. Determine the knockout effect on the target gene by detection and analysis.

Preferably, in S1, electroporation is performed with the Neon cell nucleofection system;

In S2, detection and analysis are performed by flow cytometry, or by long-range PCR together with NGS sequencing.

The above cell gene knockout method can knock out the target gene in the target cells with high knockout efficiency.
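Because the system is expected to produce deletions of many different lengths from a single target site, deletions called from long-range PCR and NGS data are naturally summarized as a size spectrum. The Python sketch below shows one possible way to do this. It is not the analysis used in the patent: the function name `summarize_deletions`, the coordinate convention (start inclusive, end exclusive, in bp) and the bin edges are all assumptions made for illustration.

```python
from collections import Counter


def summarize_deletions(deletions, bins=(100, 1_000, 10_000, 100_000)):
    """Count deletions per size class.

    deletions: iterable of (start, end) coordinates in bp, end exclusive.
    bins: ascending upper bounds (hypothetical) for the size classes.
    """
    counts = Counter()
    for start, end in deletions:
        size = end - start
        if size <= 0:
            raise ValueError(f"invalid deletion interval: ({start}, {end})")
        # Pick the first bin whose upper bound exceeds the deletion size.
        label = next((f"<{b} bp" for b in bins if size < b), f">={bins[-1]} bp")
        counts[label] += 1
    return dict(counts)


# Hypothetical deletion calls, not experimental data.
example = summarize_deletions([(0, 500), (100, 5_100), (0, 50)])
```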

The present invention also provides:

A cell line or cell strain containing the type I-B CRISPR-Cascade-Cas3 gene editing system described above.

Compared with the prior art, the type I-B CRISPR-Cascade-Cas3 gene editing system of the present invention can generate long-fragment deletions of varying extent from a single CRISPR target site, thereby addressing the relatively limited ability of current CRISPR-Cas9 systems to produce long-fragment deletions. The preparation method of the present invention produces highly pure and active Cascade complex and Cas protein quickly, efficiently and in high yield; with the E. coli prokaryotic expression system followed by affinity-column and molecular sieve purification, a large amount of protein can be obtained within two days. The cell gene knockout method of the present invention can knock out the target gene in the target cells with high knockout efficiency.

Brief Description of the Drawings

Figure 1 is a simplified diagram of the immune mechanism of the CRISPR-Cas system referred to in the Background Art.

Figure 2 is a classification diagram of Class 1 and Class 2 CRISPR-Cas systems referred to in the Background Art.

Figure 3 is a schematic diagram of the gene knockout experiments in human cells using CRISPR-Cas systems referred to in the Background Art; the left panel shows the type I-E CRISPR-Cas system and the right panel shows the type I-C CRISPR-Cas system.

Figures 4 to 7 are, in order, the plasmid maps of pCDF-Duet-1-cmx8-NLS-cas8-cas5, pUC19-CRISPR array, pRSF-Duet-1-cas6-cas11 and pET-28a-Cas3 constructed in Example 1 of the present invention.

Figure 8 is a schematic diagram of the crRNA sequence design in Example 1 of the present invention, with the tdTomato gene as the target gene.

Figure 9 shows the molecular sieve chromatogram and SDS-PAGE gel of the Cascade complex in Example 2 of the present invention.

Figure 10 shows the molecular sieve chromatogram and SDS-PAGE gel of the Cas3 protein in Example 2 of the present invention.

Figures 11 and 12 show the EMSA results for DNA containing a single PAM sequence and the Cascade complex in Example 3 of the present invention.

Figure 13 shows the molecular sieve chromatogram and SDS-PAGE gel of the Cascade-DNA-Cas3 ternary complex reconstituted in vitro by molecular sieve chromatography in Example 4 of the present invention.

Figure 14 shows the gene knockout efficiency in the hESC cell line in Example 5 of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and the examples. The present invention is not, however, limited to the examples given.

Example 1

In this example, plasmids are constructed; these plasmids are used subsequently to prepare the components of the gene editing system of the present invention.

PCR was used to amplify the coding gene sequences of the Cmx8 protein, Cas8 protein, Cas5 protein, Cas6 protein, Cas11 protein, Cas3 protein and the nuclear localization signal NLS (SEQ ID NO.1, 3, 5, 7, 9, 11 and 13, respectively), as well as the CRISPR array. The recombinant plasmids were assembled by restriction digestion and ligation and then introduced into DH5α competent cells by chemical transformation. Plasmids were extracted with a plasmid extraction kit, and the correct recombinant plasmids were confirmed by Sanger sequencing.

The structures of the plasmids constructed in this way, pCDF-Duet-1-cmx8-NLS-cas8-cas5, pUC19-CRISPR array, pRSF-Duet-1-cas6-cas11 and pET-28a-Cas3, are shown in Figures 4 to 7.

本实施例设计了表达CRISPR array的DNA片段序列(本实施例以tdTomato基因作为靶标基因)(SEQ ID NO.15)，构建在pUC19载体上。表达crRNA的DNA片段序列的结构为：5’-repeat序列-spacer序列-repeat序列-spacer序列-repeat序列-3’，其中，repeat序列来源于Synechocystis sp.PCC 6714的原始type I-B基因簇，spacer序列来源于tdTomato基因(如图8所示)。This embodiment designs a DNA fragment sequence expressing CRISPR array (the tdTomato gene is used as the target gene in this embodiment) (SEQ ID NO.15), which is constructed on a pUC19 vector. The structure of the DNA fragment sequence expressing crRNA is: 5'-repeat sequence-spacer sequence-repeat sequence-spacer sequence-repeat sequence-3', wherein the repeat sequence is derived from the original type I-B gene cluster of Synechocystis sp. PCC 6714, and the spacer sequence is derived from the tdTomato gene (as shown in Figure 8).
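The 5'-repeat-spacer-repeat-spacer-repeat-3' layout described above can be sketched in a few lines of Python. This is only an illustration of the alternating structure: the function name `build_crispr_array` and the short placeholder repeat and spacer strings are assumptions made for the example and are not the sequences of SEQ ID NO. 15.

```python
def build_crispr_array(repeat: str, spacers: list[str]) -> str:
    """Return 5'-repeat-(spacer-repeat)n-3' for the given spacers."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat]
    for spacer in spacers:
        # Every spacer is flanked by a repeat on both sides.
        parts.append(spacer)
        parts.append(repeat)
    return "".join(parts).lower()


# Placeholder sequences for illustration only (NOT the patent's sequences).
REPEAT = "gtttcaatcc"
SPACERS = ["aaaacccc", "ggggtttt"]

array = build_crispr_array(REPEAT, SPACERS)
print(array)
```

With two spacers the function yields three repeats, matching the repeat-spacer-repeat-spacer-repeat arrangement of the array in this example.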

The sequences are as follows:

Coding gene sequence of the Cmx8 protein (SEQ ID NO. 1):

atgggcagcagccatcaccatcatcaccaccaccacagccagtggagccatccgcagtttgaaaaaggtggtggtagcggtggtggttcaggtggtagtgcatggtcacaccctcagtttgagaaactggaagtgctgttccagggtccgggatccatgccgaaaacccaagcggagatcctgaccctggacttcaacctggcggaactgccgagcgcgcaacaccgtgcgggtctggcgggtctgatcctgatgattcgtgagctgaagaaatggccgtggtttaagatccgtcaaaaggagaaagacgtgctgctgagcattgaaaacctggatcagtacggtgcgagcatccaactgaacctggaaggcctgattgcgctgttcgatctggcgtatctgagctttaccgaggagcgtaagagcaaaagcaagatcaaagacttcaaacgtgttgatgagatcgaaattgaggaaaacggcaagaacaagatccagaagtactacttctacgacgtgattaccccgcaaggtggctttctggcgggttgggacaaaagcgatggccagatctggctgcgtatttggcgtgatatgttctggagcatcattaagggcgttccggcgacccgtaacccgtttaacaaccgttgcggtctgaacctgaacgcgggcgacagcttcagcaaggatgttgagagcgtgtggaaaagcctgcagaacgcggaaaagaccaccggtcaaagcggcgcgttttacctgggtgcgatggcggttaacgcggaaaacgtgagcaccgacgatctgatcaaatggcagttcctgctgcacttctgggcgtttgttgcgcaagtgtactgcccgtatattctggacaaggatggtaaacgtaactttaacggctatgtgatcgttattccggacatcgcgaacctggaggacttctgcgatattctgccggatgtgctgagcaaccgtaacagcaaagcgttcggttttcgtccgcaggaaagcgttatcgacgtgccggagcaaggcgcgctggaactgctgaacctgatcaagcagcgtattgcgaagaaagcgggtagcggcctgctgagcgatctgatcgtgggtgttgaggtgatccacgcggaaaagcagggcaacagcatcaaactgcacagcgttagctacctgcaaccgaacgaggaaagcgtggacgattataacgcgattaagaacagctactattgcccgtggttccgtcgtcagctgctgctgaacctggttaacccgaaatttgacctggcgagccaaagctggctgaagcgtcacccgtggtacggttttggcgatctgctgagccgtatcccgcagcgttggctgaaagagaacaacagctatttcagccacgacgcgcgtcagctgttcacccaaaagggtgactttgatatgaccgtggcgaccaccaaaacccgtgagtacgcggaaatcgtttataagattgcgcagggtttcgtgctgagcaagctgagcagcaaacacgacctgcaatggagcaagtgcaaaggcaacccgaaactggagcgtgaatacaacgataagaaagagaaggtggttaacgaagcgtttctggcgatccgtagccgtaccgaaaaacaggcgttcattgactactttgttagcaccctgtatccgcacgttcgtcaagacgagttcgtggattttgcgcagaaactgttccaagacaccgatgaaatccgtagcctgaccctgctggcgctgagcagccagtatccgattaagcgtcaaggcgagaccgaataa

Amino acid sequence of the Cmx8 protein (SEQ ID NO. 2):

MPKTQAEILTLDFNLAELPSAQHRAGLAGLILMIRELKKWPWFKIRQKEKDVLLSIENLDQYGASIQLNLEGLIALFDLAYLSFTEERKSKSKIKDFKRVDEIEIEENGKNKIQKYYFYDVITPQGGFLAGWDKSDGQIWLRIWRDMFWSIIKGVPATRNPFNNRCGLNLNAGDSFSKDVESVWKSLQNAEKTTGQSGAFYLGAMAVNAENVSTDDLIKWQFLLHFWAFVAQVYCPYILDKDGKRNFNGYVIVIPDIANLEDFCDILPDVLSNRNSKAFGFRPQESVIDVPEQGALELLNLIKQRIAKKAGSGLLSDLIVGVEVIHAEKQGNSIKLHSVSYLQPNEESVDDYNAIKNSYYCPWFRRQLLLNLVNPKFDLASQSWLKRHPWYGFGDLLSRIPQRWLKENNSYFSHDARQLFTQKGDFDMTVATTKTREYAEIVYKIAQGFVLSKLSSKHDLQWSKCKGNPKLEREYNDKKEKVVNEAFLAIRSRTEKQAFIDYFVSTLYPHVRQDEFVDFAQKLFQDTDEIRSLTLLALSSQYPIKRQGETE

Coding gene sequence of the Cas8 protein (SEQ ID NO. 3):

atgagcaacctgaacctgttcgcgaccatcctgacctatccggcgccggcgagcaactatcgtggcgagagcgaggaaaaccgtagcgtgatccagaagattctgaaagacggtcaaaaatacgcgatcattagcccggaaagcatgcgtaacgcgctgcgtgagatgctgattgaactgggccagccgaacaaccgtacccgtctgcacagcgaggaccaactggcggtggagttcaaagaatacccgaacccggataagtttgcggacgatttcctgtttggttatatggttgcgcagaccaacgacgcgaaagaaatgaagaaactgaaccgtccggcgaagcgtgatagcatcttccgttgcaacatggcggtggcggttaacccgtacaaatatgacaccgtgttttaccaaagcccgctgaacgcgggtgatagcgcgtggaagaacagcaccagcagcgcgctgctgcaccgtgaggttacccacaccgcgttccagtatccgttcgcgctggcgggcaaggactgcgcggcgaaaccggagtgggtgaaggcgctgctgcaagcgattgcggaactgaacggtgttgcgggtggccatgcgcgtgcgtactatgaatttgcgccgcgtagcgtggttgcgcgtctgaccccgaaactggtggcgggttaccagacctatggctttgatgcggagggtaactggctggaactgagccgtctgaccgcgaccgacagcgataacctggacctgccggcgaacgagttttggctgggtggcgaactggttcgtaaaatggatcaggagcaaaaggcgcaactggaagcgatgggtgcgcacctgtatgcgaacccggagaagttgtttgccgacttagcagatagttttctgggggtaccgaagaagaagcgtaaggtgtaa

Amino acid sequence of the Cas8 protein (SEQ ID NO. 4):

MSNLNLFATILTYPAPASNYRGESEENRSVIQKILKDGQKYAIISPESMRNALREMLIELGQPNNRTRLHSEDQLAVEFKEYPNPDKFADDFLFGYMVAQTNDAKEMKKLNRPAKRDSIFRCNMAVAVNPYKYDTVFYQSPLNAGDSAWKNSTSSALLHREVTHTAFQYPFALAGKDCAAKPEWVKALLQAIAELNGVAGGHARAYYEFAPRSVVARLTPKLVAGYQTYGFDAEGNWLELSRLTATDSDNLDLPANEFWLGGELVRKMDQEQKAQLEAMGAHLYANPEKLFADLADSFLGV

Coding gene sequence of the Cas5 protein (SEQ ID NO. 5):

atggcgcagctggcgctggcgctggacaccgtgacccgttacctgcgtctgaaggcgccgttcgcggcgtttcgtccgttccaaagcggtagctttcgtagcaccaccccggtgccgagcttcagcgcggtttatggtctgctgctgaacctggcgggcatcgagcagcgtcaagaggtggagggtaaagttaccctgattaagccgaaagcggaactgccgaagctggcgatcgcgattggccaggtgaaaccgagcagcaccagcctgatcaaccagcaactgcacaactacccggttggtaacagcggcaaggagtttgcgagccgtaccttcggtagcaaatattggattgcgccggtgcgtcgtgaagtgctggttaacctggacctgatcattggcctgcaaagcccggtggagttttggcagaagctggatcaaggtctgaaaggcgaaaccgttatcaaccgttacggtctgccgttcgcgggcgacaacaacttcctgtttgatgagatctacccgattgaaaagccggacctggcgagctggtattgcccgctggagccggatacccgtccgaaccagggtgcgtgccgtctgaccctgtggatcgaccgtgagaacaacacccaaaccaccattaaggtttttagcccgagcgatttccgtctggaaccgccggcgaaagcgtggcagcaactgccgggctaa

Amino acid sequence of the Cas5 protein (SEQ ID NO. 6):

MAQLALALDTVTRYLRLKAPFAAFRPFQSGSFRSTTPVPSFSAVYGLLLNLAGIEQRQEVEGKVTLIKPKAELPKLAIAIGQVKPSSTSLINQQLHNYPVGNSGKEFASRTFGSKYWIAPVRREVLVNLDLIIGLQSPVEFWQKLDQGLKGETVINRYGLPFAGDNNFLFDEIYPIEKPDLASWYCPLEPDTRPNQGACRLTLWIDRENNTQTTIKVFSPSDFRLEPPAKAWQQLPG

Coding gene sequence of the Cas6 protein (SEQ ID NO. 7):

atgaacttcatcgacctggcgtttccggtgaagggcaccgttctgaacgcggatcacaactactatctgtacagcgcgattgcgaaagagtttccgatcctgcacgacctgccggatctggcggtgaacaccatcagcggcaagccggaccgtgaaggcaaaattctgctggttccgggcagcaagctgtggatgcgtctgccgatcgataacattacccacatctaccagctggcgggtaagaaactgcgtattggccaatatagcatcgaactgggtaacccgagcctgcacccgctggagccggttgaaagcctgaaggcgcgtatcattaccattaaaggtcacaccgagccgatcagcttcctggaagcggtgaagcgtcagctgtttgcgctggagattaccgaaggtgacgttggcatcccggcgaaccacgagggtattccgaaacgtctgaccctgcaaatcaagaaaccggaacgtacctacagcattgtgggctatagcgttctgctgagcaacctgagcgcggaggatagcctgaagattcagcaagtgggtatcggtggcaaacgtcgtctgggttgcggcgtgttctatccggcggttaagaaaagcaccaacagcggtaacaagaaaaacgttgaagcgaccctgggctaa

Amino acid sequence of the Cas6 protein (SEQ ID NO. 8):

MNFIDLAFPVKGTVLNADHNYYLYSAIAKEFPILHDLPDLAVNTISGKPDREGKILLVPGSKLWMRLPIDNITHIYQLAGKKLRIGQYSIELGNPSLHPLEPVESLKARIITIKGHTEPISFLEAVKRQLFALEITEGDVGIPANHEGIPKRLTLQIKKPERTYSIVGYSVLLSNLSAEDSLKIQQVGIGGKRRLGCGVFYPAVKKSTNSGNKKNVEATLG

Coding gene sequence of the Cas11 protein (SEQ ID NO. 9):

atgaccgtggcgaccaccaaaacccgtgagtacgcggaaatcgtttataagattgcgcagggtttcgtgctgagcaagctgagcagcaaacacgacctgcaatggagcaagtgcaaaggcaacccgaaactggagcgtgaatacaacgataagaaagagaaggtggttaacgaagcgtttctggcgatccgtagccgtaccgaaaaacaggcgttcattgactactttgttagcaccctgtatccgcacgttcgtcaagacgagttcgtggattttgcgcagaaactgttccaagacaccgatgaaatccgtagcctgaccctgctggcgctgagcagccagtatccgattaagcgtcaaggcgagaccgaataa

Amino acid sequence of the Cas11 protein (SEQ ID NO. 10):

MTVATTKTREYAEIVYKIAQGFVLSKLSSKHDLQWSKCKGNPKLEREYNDKKEKVVNEAFLAIRSRTEKQAFIDYFVSTLYPHVRQDEFVDFAQKLFQDTDEIRSLTLLALSSQYPIKRQGETE

Coding gene sequence of the Cas3 protein (SEQ ID NO. 11):

atgctgaaacaactgctggcgaagagcctgccgaccgacccgcagaagaaaccgctgagcctggaacaacacctgctggataccgagaccgcggcgctggtgatctttaagggtcgtatgctggacaactggtgccgtttctttaaggttaaagacccggatgaattcctgctgcacctgcgtgtggcggcgctgtttcacgatctgggcaaagcgaaccacgagttcattgaagcggttaccgcgaaaggttttgtgccgcagaccctgcgtcacgaatggatcagcgcgctggttctgcacctgccggaagtgcgtcaatggctgggcaaaagcaacctgaacctggaagtggttaccgcggcggttctgagccatcacctgaaagcgagcccggatggtgattacaagtgggacgaaccgcagaagagcggtgataaagttgagaccaagctgtatttcaaccacgaggaagtggaccgtatcctgaacaaaattgcgaacctgctggacgtggatagcaagctgccggaactgccgaagaaatggatcaaaggcgacattttcctggagaacatctacaaagatgcgaaccagattggtcgtaagtttacccgtcaagcgaagaaagacgatagcctgaaaggcctgctgctggcggttaaagcgggtctgattgcgagcgacagcgtggcgagcggtatttaccgtacccaggatagcgaagcgatcgcgaactgggttaaccaaaccctgcacaccaacagcattaccccggaggaaatcgaggaaaagattctgcacccgcgttatcgtcaggtggagaaaagcatcaacgaaccgttccagctgaaacgttttcaagagaaggcggaaaccctgagcagccgtctgctgctgatgagcggttgcggcagcggtaaaaccattttcgcgtacaagtggatgcagggcgttctgaacaagcaccaagcgggtcgtgcgatcttcctgtatccgacccgtggcaccgcgaccgaaggttttaaagactatgtgagctggtgcccggaggcggatgcgagcctgctgaccggtaccgcgacctacgagctgcaggcgattgcgaaaaacccgaccgaggcgaacgaaggcaaggactatcaagcggatgaacgtctgtacgcgctgggctattggggcaagcgtttctttagcgcgaccgttgaccagttcctgagctttctgacccacaactacaaaagcatctgcctgctgccggtgctggcggacagcgtggttgtgatcgatgaaattcacagcttcagcccggagatgtttgacagcctggtttgcttcctgaagacctttgatgttccggtgctgtgcatgaccgcgaccctgccgcagacccgtattgaggacctgaccattcaactggacaaggataaagacggcctgggtctggaagttttcccgaccagcgatcgtagcgagctggcggagctggaaaaagcggagggcatggaacgttacctgattgcgcacaccaacgaggaagcggcgctggacctggcggtgaaagcgtatcaggatagcaagcgtgttctgtgggttgtgaacaccgtggaccgttgccgtgagaaggcgcgtaaactggaatgcctgctgaagaccgaggttctgacctaccacagccgtttcaaactggcggatcgtcaaaaccgtcaccgtgagaccgtggaagcgtttgcgctgcaccaggcgcaaggtgaaaagaaagcggcgatcgcggttaccacccaggtgtgcgagatgagcctggatctggacgcggatgttctgatcaccgaactggcgccgattagcagcctggtgcaacgtttcggccgtagcaaccgtggtgacaagaacgataaaaccgagccgagcaaaatttacgtttataagccgccgaaggacaaaccgtataagcagaaagacgatctggacccggcggaaaagttcatcaacgatgtgct
gggtcgtgcgagccaaaaactgctggcggagaagctgaaagagcatagcccgccgggccgttacagcgatggtagcgcgccgtttgtgacccagggctattgggcgagcagcgatgagccgttccgtaagattgacgattttgcggttaacgcggtgctgaccgaggacctgggtgaaatcacccaatacctgaacagcaacccgccgaaaccgatcgatggctttattgttccggtgccgaagaaatataagttccagggttttagccaccgtccgccgcaactgccgaaatacctggaaatcgcggacagcaaattctatagcagcaagcgtggctttggtgacgatgcg

Amino acid sequence of the Cas3 protein (SEQ ID NO. 12):

MLKQLLAKSLPTDPQKKPLSLEQHLLDTETAALVIFKGRMLDNWCRFFKVKDPDEFLLHLRVAALFHDLGKANHEFIEAVTAKGFVPQTLRHEWISALVLHLPEVRQWLGKSNLNLEVVTAAVLSHHLKASPDGDYKWDEPQKSGDKVETKLYFNHEEVDRILNKIANLLDVDSKLPELPKKWIKGDIFLENIYKDANQIGRKFTRQAKKDDSLKGLLLAVKAGLIASDSVASGIYRTQDSEAIANWVNQTLHTNSITPEEIEEKILHPRYRQVEKSINEPFQLKRFQEKAETLSSRLLLMSGCGSGKTIFAYKWMQGVLNKHQAGRAIFLYPTRGTATEGFKDYVSWCPEADASLLTGTATYELQAIAKNPTEANEGKDYQADERLYALGYWGKRFFSATVDQFLSFLTHNYKSICLLPVLADSVVVIDEIHSFSPEMFDSLVCFLKTFDVPVLCMTATLPQTRIEDLTIQLDKDKDGLGLEVFPTSDRSELAELEKAEGMERYLIAHTNEEAALDLAVKAYQDSKRVLWVVNTVDRCREKARKLECLLKTEVLTYHSRFKLADRQNRHRETVEAFALHQAQGEKKAAIAVTTQVCEMSLDLDADVLITELAPISSLVQRFGRSNRGDKNDKTEPSKIYVYKPPKDKPYKQKDDLDPAEKFINDVLGRASQKLLAEKLKEHSPPGRYSDGSAPFVTQGYWASSDEPFRKIDDFAVNAVLTEDLGEITQYLNSNPPKPIDGFIVPVPKKYKFQGFSHRPPQLPKYLEIADSKFYSSKRGFGDDA

Coding gene sequence of the nuclear localization signal NLS (SEQ ID NO. 13):

ccgaagaagaagcgtaaggtg

Amino acid sequence of the nuclear localization signal NLS (SEQ ID NO. 14):

PKKKRKV
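The correspondence between a coding sequence and its amino acid sequence can be checked by translating the DNA with the standard genetic code. The sketch below is a minimal, dependency-free illustration (the helper name `translate` is an assumption for this example); it uses the 21-nt NLS coding unit ccgaagaagaagcgtaaggtg as input because it is short enough to verify by hand.

```python
# Standard genetic code, codons ordered t, c, a, g at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = "".join(dna.lower().split())  # tolerate whitespace and upper case
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    protein = []
    for pos in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


NLS_DNA = "ccgaagaagaagcgtaaggtg"
print(translate(NLS_DNA))  # PKKKRKV
```

The same function can be applied to the longer coding sequences listed above to compare each translation against its stated amino acid sequence, bearing in mind that some constructs carry tag or NLS codons in addition to the protein itself.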

DNA fragment sequence expressing the CRISPR array (SEQ ID NO. 15):

Note: in SEQ ID NO. 15 above, the boxed portions are spacer sequences and the remainder is repeat sequence.

Example 2

This example builds on Example 1 and describes the preparation and purification of each protein.

(1) Cascade purification

The three plasmids constructed in Example 1 (pCDF-Duet-1-cmx8-NLS-cas8-cas5, pRSF-Duet-1-cas6-cas11 and pUC19-CRISPR array) were co-transformed into E. coli BL21(DE3). A single colony of the resulting strain was inoculated, and the culture was transferred into 1 L of LB medium containing the antibiotics Amp, Kan and Strep, each at a working concentration of 50 μg/ml. The culture was shaken at 37°C and 180 rpm for about 3 h; when the OD value reached 0.6-0.8, the temperature was lowered to 18°C and 0.5 mM IPTG was added to induce expression for 20 h.

The cells were resuspended in 20 mM Tris-HCl pH 7.5, 500 mM NaCl and lysed by sonication. The Strep-tagged protein was bound to a Strep column; contaminating proteins were washed away with 20 mM Tris-HCl pH 7.5, 500 mM NaCl, and the crude protein was eluted with 20 mM Tris-HCl pH 7.5, 500 mM NaCl, 5 mM d-desthiobiotin. A homogeneous sample was then obtained by size-exclusion chromatography on a Superdex 200 10/300 column, with 20 mM HEPES pH 7.5, 150 mM NaCl as the running buffer.

This yielded the purified Cascade complex; the corresponding size-exclusion chromatogram and electrophoresis gel are shown in FIG. 9.

(2) Cas3 purification

The pET-28a-Cas3 plasmid was transformed on its own into E. coli BL21(DE3), and expression was induced at 18°C for 20 h in 1 L of LB containing only Kan at 50 μg/ml. The cells were resuspended in a buffer of 20 mM HEPES pH 7.5, 500 mM NaCl, 20 mM imidazole, 5% glycerol and lysed by sonication. The His-tagged Cas3 was bound to Ni-NTA, and the target protein was eluted stepwise with the same buffer (20 mM HEPES pH 7.5, 500 mM NaCl, 5% glycerol) containing imidazole at final concentrations of 50 mM, 100 mM, 200 mM and 500 mM, giving the crude protein. A homogeneous sample was then obtained by size-exclusion chromatography on a Superdex 200 10/300 column, with 20 mM HEPES pH 7.5, 150 mM NaCl as the running buffer.
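The imidazole steps (50, 100, 200 and 500 mM) can be prepared in several ways; the text does not say how the buffers were made up. As a worked arithmetic sketch only, the code below assumes the steps are produced by mixing two otherwise identical buffers, one containing 20 mM imidazole and one containing 500 mM imidazole, and computes the volumes needed for a given final volume. The function name `mix_volumes` and the 10 mL batch size are hypothetical.

```python
def mix_volumes(target_mM: float, total_mL: float,
                low_mM: float = 20.0, high_mM: float = 500.0):
    """Return (low_buffer_mL, high_buffer_mL) giving target_mM imidazole.

    Assumes the two buffers differ only in imidazole concentration, so the
    final concentration is the volume-weighted average of the two.
    """
    if not low_mM <= target_mM <= high_mM:
        raise ValueError("target must lie between the two stock buffers")
    high_fraction = (target_mM - low_mM) / (high_mM - low_mM)
    high_mL = total_mL * high_fraction
    return total_mL - high_mL, high_mL


# Hypothetical 10 mL batch for each elution step named in the text.
for step_mM in (50, 100, 200, 500):
    low_mL, high_mL = mix_volumes(step_mM, total_mL=10.0)
    print(f"{step_mM} mM: {low_mL:.3f} mL low + {high_mL:.3f} mL high")
```

For the 50 mM step, for instance, the high-imidazole buffer makes up (50 - 20) / (500 - 20) = 6.25% of the final volume.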

This yielded the purified Cas3 protein; the corresponding size-exclusion chromatogram and electrophoresis gel are shown in FIG. 10.

Example 3

This example builds on Example 2 and describes the screening of PAM sequences for the type I-B system.

In this example, a PAM library was constructed to identify the PAM sequence most preferred by the type I-B system of the present invention, which was developed from the bacterium Synechocystis sp. PCC 6714.

Two primers, Mix PAM-F and Mix PAM-R, each containing the protospacer sequence and a randomized PAM sequence, were designed. Based on the crRNA sequence, the protospacer was designed as "tttatcaccgtgtccccaatctggatattttgtgt", and three randomized bases "nnn" were placed at its 5' end to form the PAM library. The PAM library was ligated into the pET-28a vector by restriction digestion and ligation to construct the PAM library plasmid.
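Three randomized positions give 4^3 = 64 possible PAM variants. The following sketch enumerates them and prepends each to the protospacer given above; the function name `pam_library` and the dictionary layout are assumptions chosen for illustration, and flanking vector sequence is deliberately left out.

```python
from itertools import product

# Protospacer sequence as stated in this example (35 nt).
PROTOSPACER = "tttatcaccgtgtccccaatctggatattttgtgt"


def pam_library(protospacer: str, pam_length: int = 3) -> dict[str, str]:
    """Map every possible PAM to the PAM + protospacer target sequence."""
    return {
        "".join(pam): "".join(pam) + protospacer
        for pam in product("acgt", repeat=pam_length)
    }


library = pam_library(PROTOSPACER)
print(len(library))      # 64
print(library["atg"])    # atg followed by the protospacer
```

Each entry places the PAM immediately 5' of the protospacer, in line with the design described for the library.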

Note: the crRNA sequence is guguccaaaccauugaugccguaaggcguugagcac.

Using the PAM library plasmid as template, a 161 bp double-stranded DNA of the PAM library was amplified by PCR, with the upstream primer carrying the T7 promoter sequence. This PCR product was then used as template for a second PCR with a T7 promoter primer labeled with 6-FAM at its 5' end, so that the product carried the FAM fluorescent label. After the two rounds of PCR, 97 bp double-stranded DNAs carrying a CY5 fluorescent label at the 3' end were obtained, each containing a single PAM sequence; they were named synPAM 1-30.

The primer sequences used to build the PAM library are listed in the table below; the boxed portions are the PAM sequences.

The Cascade complex purified in Example 2 was incubated with the PAM library DNA at 25°C for 1 h, and the reaction products were then separated by native gel electrophoresis. Once a complex has formed with Cascade, the fluorescent band visibly migrates more slowly in the EMSA gel. The DNA bands bound to the Cascade complex were excised under fluorescence and sequenced, using the universal T7 promoter and T7 terminator sequences as sequencing primers. The sequencing results were aligned against the pET-28a-PAM library plasmid sequence.

Note: the T7 promoter sequence used above is 6-FAM T7 promoter: taatacgactcactatagg (fluorescently labeled at the 5' end); the T7 terminator sequence is CY5-T7 terminator: gctagttattgctcagcgg (fluorescently labeled at the 3' end).

First, PAM sequences of the form ann were screened. By the definition of the dissociation constant in molecular interactions, the protein concentration at which half of the DNA substrate is bound equals the dissociation constant. The DNA concentration in the reaction was therefore lowered to 10 nM, and a Cascade complex gradient from 0 nM to 200 nM was set up to observe the binding of the DNA to the Cascade complex.
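The reasoning above corresponds to the single-site binding relation, fraction bound = [P] / (Kd + [P]), which holds when the DNA concentration is well below Kd so that the free protein concentration is close to the total. The sketch below shows how a half-saturation point could be read from a titration by linear interpolation. It is an illustration only: the function names are assumptions, and the titration points and the Kd of 40 nM are made-up values, not measurements from this example.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Single-site binding isotherm, valid when DNA is far below Kd."""
    return protein_nM / (kd_nM + protein_nM)


def estimate_kd(concentrations_nM, fractions):
    """Interpolate the protein concentration at which half the DNA is bound.

    Returns None if the titration never crosses 50% bound.
    """
    points = list(zip(concentrations_nM, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return c0 + (0.5 - f0) * (c1 - c0) / (f1 - f0)
    return None


# Hypothetical titration across the 0-200 nM range used in the text,
# simulated with an assumed Kd of 40 nM (not a measured value).
cascade_nM = [0, 12.5, 25, 50, 100, 200]
bound = [fraction_bound(c, kd_nM=40.0) for c in cascade_nM]
print(estimate_kd(cascade_nM, bound))
```

Because the interpolation is linear between sparse points, the estimate only approximates the true Kd; a denser gradient or a curve fit would narrow the gap.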

As shown in FIG. 11, PAM sequences No. 1-8 are aan and agn; even when the Cascade complex concentration reached 100 nM and 200 nM, a substantial amount of DNA remained free, whereas DNAs No. 9-16, which carry the acn and atn PAM sequences, were completely bound at high protein concentrations. This result indicates that, at the second position of the PAM sequence, the Cascade complex prefers cytosine (c) or thymine (t).

Within the PAM sequence, the preference at the third position is more pronounced than at the second. All aan sequences bound poorly; among the agn sequences, the one with g at the third position bound better; among the acn sequences, aca and acg bound comparably; and among the atn sequences, atg bound markedly better than the other three. Overall, therefore, the third position of the PAM sequence prefers g.

Next, the second and third bases of the PAM sequence were fixed and the first base was varied to observe its pattern. As shown in FIG. 12, comparing the ncg and ntg groups at a protein concentration of 50 nM, acg bound best within ncg and atg likewise bound more efficiently within ntg, indicating that the first position of the PAM sequence does prefer base a. Comparing ntg with ncg, the bound fraction of ntg was consistently somewhat higher than that of ncg, leading to the conclusion that the second position of the PAM sequence prefers t.

Note: FIG. 11 and FIG. 12 show representative results only; the results for the remaining sequences are not shown but are all consistent with the above conclusions.

The optimal PAM sequence for the type I-B system of the present invention is therefore 5'-atg-3'.
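Given a preferred PAM of 5'-atg-3' located at the 5' end of the protospacer, candidate target sites in a sequence can be listed by a simple scan. The sketch below is an assumption-laden illustration: the function name `find_target_sites` is hypothetical, the 35-nt protospacer length is taken from the protospacer used in this example, only the given strand is scanned, and no off-target or efficiency scoring is attempted.

```python
def find_target_sites(sequence: str, pam: str = "atg",
                      protospacer_length: int = 35):
    """Return (pam_position, protospacer) for each PAM with a full-length
    protospacer immediately 3' of it on the given strand."""
    seq = sequence.lower()
    sites = []
    start = seq.find(pam)
    while start != -1:
        proto_start = start + len(pam)
        protospacer = seq[proto_start:proto_start + protospacer_length]
        if len(protospacer) == protospacer_length:
            sites.append((start, protospacer))
        start = seq.find(pam, start + 1)  # allow overlapping PAM hits
    return sites


# Toy input: the Example 3 protospacer behind an atg PAM, with arbitrary
# two-base flanks added purely for the demonstration.
PROTOSPACER = "tttatcaccgtgtccccaatctggatattttgtgt"
toy_sequence = "cc" + "atg" + PROTOSPACER + "gg"
print(find_target_sites(toy_sequence))
```

To cover both strands, the same scan could be repeated on the reverse complement of the input.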

Example 4

This example builds on Example 3 and describes the in vitro assembly of the Cascade-DNA-Cas3 ternary complex.

The Cascade complex purified in Example 2 was incubated with the PAM-DNA identified in Example 3 at a molar ratio of 1:3 at 25°C for 1 h and then centrifuged at 15000 rpm for 10 min, and the Cascade-DNA complex was isolated by size-exclusion chromatography. This Cascade-DNA complex was incubated with the Cas3 protein of Example 2 at a molar ratio of 1:3 at 25°C for 1 h and centrifuged at 15000 rpm for 10 min, after which the Cascade-DNA-Cas3 complex was isolated by size-exclusion chromatography and verified by SDS-PAGE (results shown in FIG. 13).
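The 1:3 molar ratios translate into amounts as follows. This is a worked arithmetic sketch under stated assumptions: the 100 pmol starting amount of Cascade is hypothetical, the DNA is taken to be the 97 bp duplex of Example 3, and its molecular weight is approximated with the common rule of thumb of about 650 g/mol per base pair.

```python
def partner_amount_pmol(cascade_pmol: float, ratio: float = 3.0) -> float:
    """Amount of the second component for a 1:ratio molar mix."""
    return cascade_pmol * ratio


def pmol_to_ug(amount_pmol: float, mw_g_per_mol: float) -> float:
    """Convert picomoles to micrograms: pmol x (g/mol) / 1e6."""
    return amount_pmol * mw_g_per_mol / 1e6


# Assumed values for illustration only.
CASCADE_PMOL = 100.0
DNA_MW = 97 * 650.0  # ~650 g/mol per bp is an approximation

dna_pmol = partner_amount_pmol(CASCADE_PMOL)
print(dna_pmol)                      # 300.0
print(pmol_to_ug(dna_pmol, DNA_MW))  # micrograms of the 97 bp duplex
```

The same two functions apply to the second incubation, with the Cascade-DNA complex as the 1 part and Cas3 as the 3 parts, once a molecular weight for Cas3 is supplied.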

The Cascade-DNA-Cas3 ternary complex obtained by the above method confirms that the Cascade complex of the present invention has binding activity toward the Cas3 protein.

实施例5Example 5

本实施例基于实施例4。本实施例为检测本发明I-B型CRISPR-Cas系统在hESC细胞系中的基因敲除效率。This example is based on Example 4. This example is to detect the gene knockout efficiency of the type I-B CRISPR-Cas system of the present invention in hESC cell lines.

构建hESC-EGFP-tdTomato双报告细胞系的具体过程如下：The specific process of constructing the hESC-EGFP-tdTomato dual reporter cell line is as follows:

首先构建hESC-EGFR报告细胞系。野生型hESC细胞用TrypLE Express(Gibco)进行消化后，用OptiMem重悬，调整细胞密度为5×10⁶cells/mL。将500μl细胞悬液与30μg线性化DNMT3B-EGFP质粒混合后，加入到0.4cm电转杯中进行电穿孔，之后将其置于10cm培养皿中，培养皿中预先加入含有10μm Y-27632的E8培养基。在培养3天后，加入0.5μg/ml的嘌呤霉素进行筛选，期间每日更换培养基，若长满则正常传代。培养7天后，使用荧光显微镜鉴定出表达EGFP的耐药单克隆，经过单克隆扩增后得到hESC-EGFP细胞系。First, the hESC-EGFR reporter cell line was constructed. Wild-type hESC cells were digested with TrypLE Express (Gibco), resuspended with OptiMem, and the cell density was adjusted to 5×10⁶ cells/mL. 500μl of cell suspension was mixed with 30μg of linearized DNMT3B-EGFP plasmid, added to a 0.4cm electroporation cup for electroporation, and then placed in a 10cm culture dish, in which E8 medium containing 10μm Y-27632 was pre-added. After 3 days of culture, 0.5μg/ml of puromycin was added for screening, and the culture medium was changed daily during the period. If it was full, it was passaged normally. After 7 days of culture, the drug-resistant monoclonal clone expressing EGFP was identified using a fluorescence microscope, and the hESC-EGFP cell line was obtained after monoclonal amplification.

Next, the hESC-EGFP-tdTomato reporter cell line was constructed from the hESC-EGFP cell line by the same method, using the linearized DNMT3B-tdTomato plasmid. EGFP+/tdTomato+ double-positive cells were selected and clonally expanded to obtain the hESC-EGFP-tdTomato dual reporter cell line.

The hESC-EGFP-tdTomato dual reporter cell line described above was used. A crRNA targeting tdTomato was designed, with the sequence SEQ ID NO.15 (the same as in Example 1), and a crRNA targeting EGFP was designed, with the sequence:

In this sequence, the boxed portion is the spacer sequence and the remainder is repeat sequence. The plasmids were constructed by the method of Example 1, and the Cascade complex and Cas3 protein were prepared by the method of Example 2. These were then electroporated into the hESC-EGFP-tdTomato dual reporter cell line using the Neon transfection system (ThermoFisher), and the editing efficiency was determined by FACS.
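The repeat/spacer layout of the crRNA-expressing DNA fragment can be sketched in code. This is an illustrative sketch, not the construction procedure of the invention: the repeat is the sequence stated in claim 1, while the function name and the placeholder spacers are hypothetical and do not correspond to any sequence in the sequence listing.

```python
# Repeat sequence as stated in claim 1 (written 5' to 3').
REPEAT = "gtgtccaaaccattgatgccgtaaggcgttgagcac"


def build_crrna_array(spacers, repeat=REPEAT):
    """Assemble 5'-repeat-(spacer-repeat)n-3', so that the fragment
    starts and ends with a repeat and spacers sit between repeats."""
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [repeat]
    for spacer in spacers:
        spacer = spacer.lower()
        if not spacer or set(spacer) - set("acgt"):
            raise ValueError(f"spacer must contain only A/C/G/T: {spacer!r}")
        parts.append(spacer)
        parts.append(repeat)
    return "".join(parts)


# Hypothetical placeholder spacers; real spacers are designed from the target gene.
placeholder_spacers = ["a" * 35, "c" * 35]
array = build_crrna_array(placeholder_spacers)
print(len(array))
```

With two spacers, the result has the repeat-spacer-repeat-spacer-repeat structure described in claim 2; the spacers may be identical or different.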

About 4 to 5 days after electroporation, the cells were dissociated with TrypLE Express (Gibco) and resuspended in IMDM supplemented with 10% FBS. The hESC-EGFP-tdTomato dual reporter cells were analyzed by flow cytometry on an LSR Fortessa (BD) using the 488 nm laser. FACS data were analyzed with FlowJo v10.4.1.

Using the above method, this example examines the gene editing efficiency of the type I-B system developed from Synechocystis sp. PCC 6714, that is, the gene editing system of the present invention. Editing efficiency is expressed as the proportion of tdTomato-negative or EGFP-negative cells.
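The efficiency metric defined above is a simple proportion, which can be written out as follows. This is a minimal sketch: the function name and the event counts are hypothetical, and it assumes the counts come from an already gated population of live single cells.

```python
def editing_efficiency(negative_cells, total_cells):
    """Fraction of analyzed cells that have lost the reporter signal
    (tdTomato-negative or EGFP-negative)."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= negative_cells <= total_cells:
        raise ValueError("negative_cells must lie between 0 and total_cells")
    return negative_cells / total_cells


# Hypothetical counts for illustration only, not data from this example.
print(f"{editing_efficiency(2500, 10000):.1%}")  # prints "25.0%"
```

In practice, the reporter-negative fraction of an unedited control sample would usually be measured alongside to account for background loss of fluorescence.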

Note: long-range PCR and NGS sequencing can also be used for detection and analysis.

The results are shown in Figure 14. They indicate that the type I-B gene editing system of the present invention is more efficient than the gene editing systems reported in the literature (such as the type I-C system from N. lactamica ATCC 23970), reaching a targeting efficiency of 39% for tdTomato and 55% for EGFP, which confirms the superiority of the type I-B gene editing system of the present invention. Note: the literature referred to here is: Tan R, Krueger RK, Gramelspacher MJ, Zhou X, Xiao Y, Ke A, Hou Z, Zhang Y. Cas11 enables genome engineering in human cells with compact CRISPR-Cas3 systems. Mol Cell. 2022 Jan 13:S1097-2765(21)01137-0.

Taken together, the above examples show that the preparation method of the present invention produces highly pure and active Cascade complex and Cas protein quickly, efficiently and in high yield. With the E. coli prokaryotic expression system, followed by affinity column and molecular sieve purification, a large amount of protein can be obtained within two days.

The gene editing system of the present invention was developed from Synechocystis sp. PCC 6714 and was assigned to type I-B by bioinformatic analysis. Its knockout efficiency for different genes is higher than that of other type I systems, indicating that the system is superior and has strong potential for research and development.

In addition to the above examples, the present invention may have other embodiments. Any technical solution formed by equivalent replacement or equivalent transformation falls within the scope of protection claimed by the present invention.

Claims

1. A type I-B CRISPR-Cascade-Cas3 gene editing system, characterized in that it consists of a Cascade complex and a Cas3 protein; the Cascade complex is formed by complexing a Cmx8 protein, a Cas8 protein, a Cas5 protein, a Cas6 protein, a Cas11 protein and crRNA; the amino acid sequence of the Cmx8 protein is SEQ ID NO.2; the amino acid sequence of the Cas8 protein is SEQ ID NO.4; the amino acid sequence of the Cas5 protein is SEQ ID NO.6; the amino acid sequence of the Cas6 protein is SEQ ID NO.8; the amino acid sequence of the Cas11 protein is SEQ ID NO.10; the amino acid sequence of the Cas3 protein is SEQ ID NO.12; the DNA fragment sequence expressing the crRNA consists of mutually identical repeat sequences alternating with spacer sequences that are identical to or different from one another, and the DNA fragment sequence both begins and ends with a repeat sequence; the repeat sequence is 5'-gtgtccaaaccattgatgccgtaaggcgttgagcac-3', and the spacer sequence is designed according to the target gene.

2. The type I-B CRISPR-Cascade-Cas3 gene editing system according to claim 1, characterized in that a nuclear localization signal (NLS) is attached to the 3' end of the amino acid sequence of the Cas8 protein; a nuclear localization signal (NLS) is attached to the 5' end of the amino acid sequence of the Cas3 protein; the amino acid sequence of the nuclear localization signal (NLS) is SEQ ID NO.14; and the DNA fragment sequence expressing the crRNA has the structure: 5'-repeat sequence-spacer sequence-repeat sequence-spacer sequence-repeat sequence-3'.

3. The type I-B CRISPR-Cascade-Cas3 gene editing system according to claim 1, characterized in that the coding gene sequence of the Cmx8 protein is SEQ ID NO.1; the coding gene sequence of the Cas8 protein is SEQ ID NO.3; the coding gene sequence of the Cas5 protein is SEQ ID NO.5; the coding gene sequence of the Cas6 protein is SEQ ID NO.7; the coding gene sequence of the Cas11 protein is SEQ ID NO.9; the coding gene sequence of the Cas3 protein is SEQ ID NO.11; the coding gene sequence of the nuclear localization signal (NLS) is SEQ ID NO.13; and the PAM-DNA sequence corresponding to the type I-B CRISPR-Cascade-Cas3 gene editing system is 5'-atg-3'.

4. A method for preparing the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3, characterized in that it comprises the following steps:

Step 1: constructing the plasmids of the Cascade complex and constructing the plasmid of the Cas3 protein;

Step 2: co-transforming the plasmids of the Cascade complex into E. coli prokaryotic expression cells, and separately transforming the plasmid of the Cas3 protein into E. coli prokaryotic expression cells; then inducing expression in each and purifying to obtain the purified Cascade complex and Cas3 protein;

whereby the type I-B CRISPR-Cascade-Cas3 gene editing system is obtained.

5. The preparation method according to claim 4, characterized in that, in Step 1, in the plasmids of the Cascade complex, the coding gene sequence of the nuclear localization signal (NLS) is attached to the 3' end of the coding gene sequence of the Cas8 protein; in the plasmid of the Cas3 protein, the coding gene sequence of the nuclear localization signal (NLS) is attached to the 5' end of the coding gene sequence of the Cas3 protein; the coding gene sequence of the nuclear localization signal (NLS) is SEQ ID NO.13;

in Step 2, the E. coli prokaryotic expression cells are E. coli BL21(DE3); during purification, the expression product is first subjected to affinity chromatography to obtain a crude protein extract, and the crude extract is then subjected to molecular sieve chromatography to obtain the purified target protein.

6. The preparation method according to claim 5, characterized in that Step 1 is carried out as follows:

the coding gene sequences of the Cmx8 protein, the nuclear localization signal (NLS), the Cas8 protein and the Cas5 protein are constructed into a first plasmid; the coding gene sequences of the Cas6 protein and the Cas11 protein are constructed into a second plasmid; the DNA fragment sequence expressing the crRNA is constructed into a third plasmid; the first, second and third plasmids are all plasmids of the Cascade complex; the coding gene sequences of the nuclear localization signal (NLS) and the Cas3 protein are constructed into a fourth plasmid, the fourth plasmid being the plasmid of the Cas3 protein.

7. Use of the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3 for recognizing, binding and editing prokaryotic genes or eukaryotic genes.

8. A method for knocking out a gene in a cell, characterized in that it uses the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3 and comprises the following step:

S1: electroporating the Cascade complex and the Cas3 protein into the target cells to knock out the target gene.

9. The method for knocking out a gene in a cell according to claim 8, characterized in that, in S1, electroporation is performed using the Neon transfection system.

10. A cell line or cell strain containing the type I-B CRISPR-Cascade-Cas3 gene editing system according to any one of claims 1 to 3.