Composition and method for prime editing technique

Publication number
WO2025130907A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleotides
protein
sequence
reverse transcriptase
editing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/140239
Other languages
French (fr)
Chinese (zh)
Inventor
段志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Shunfeng Biotechnology Co Ltd
Original Assignee
Shandong Shunfeng Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The present invention belongs to the field of nucleic acid editing, in particular to the technical field of clustered regularly interspaced short palindromic repeats (CRISPRs). Specifically, provided in the present invention is a fusion protein. The fusion protein can be used for performing prime editing on a target nucleic acid. The fusion protein of the present invention expands a prime editing system, thereby enabling the system to be compatible with the Type V CRISPR system and allowing for precise editing.

Description

Translated from Chinese
A composition and method for prime editing

This application claims priority to Chinese patent application CN202311777306.3, filed on December 22, 2023, the entire contents of which are incorporated herein by reference.

Technical Field

The present invention relates to the field of gene editing, in particular to prime editing. Specifically, the present invention relates to a fusion protein and uses thereof, and more particularly to a type V Cas protein fused to a reverse transcriptase and uses thereof.

Background Art

CRISPR/Cas technology is a widely used gene editing technology. Under RNA guidance, it binds specifically to a target sequence in the genome and cleaves the DNA to produce a double-strand break; site-specific editing is then achieved through the cell's non-homologous end joining (NHEJ) or homology-directed repair (HDR) pathways. Conventional HDR is usually very inefficient, especially in non-dividing cells, and the competing NHEJ pathway dominates, producing insertion-deletion (indel) byproducts.

Prime editing can introduce base substitutions or fragment insertions at specific sites. However, its editing efficiency still needs improvement, and prime editing systems have so far been limited to type II CRISPR/Cas proteins, such as Cas9 from Streptococcus pyogenes and Staphylococcus aureus.

Therefore, the present invention provides a type V Cas protein that can be used for prime editing. It can be fused to a reverse transcriptase to form a fusion protein for prime editing, thereby expanding the range of applications of prime editing.

Summary of the Invention

In one aspect, the present invention provides a fusion protein comprising a Cas protein and a reverse transcriptase.

In one embodiment, the Cas protein is a type V Cas protein or a variant thereof.

In a preferred embodiment, the Cas protein is selected from Cas9, Cas9n, dCas9, CasX, CasY, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12c, Cas12e, Cas12d, Cas12g, Cas12h, Cas12i, Cas12j, Cas13a, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, Cas9-NG, LbCas12a, enAsCas12a, Cas9-KKH, circularly permuted Cas9, an Argonaute (Ago) domain, SmacCas9, Spy-macCas9, SpCas9-NRRH, SpaCas9-NRTH, SpaCas9-NRCH, Cas9-NG-CP1041, Cas9-NG-VRQR, dCas12i, nCas12i, or Argonaute, and variants thereof; preferably, the Cas protein is Cas12i.

In one embodiment, the amino acid sequence of the Cas protein has at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to SEQ ID NO.1.
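As an illustration of how an identity threshold of this kind might be checked, the following is a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and that identity is counted over columns where neither sequence has a gap; the text does not fix the denominator, so that convention, like the function names, is an assumption made for this example only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator.
    Other conventions (e.g. dividing by alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the identity reaches the given threshold (70% is the lowest listed)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For the toy pair "MKTAY" and "MKSAY", four of five compared positions match, so the pair passes the 70% and 80% thresholds but not the 85% threshold.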

In one embodiment, the amino acid sequence of the Cas protein has one or more amino acid substitutions, deletions or additions compared with SEQ ID NO.1, for example 1-20 amino acid substitutions, deletions or additions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acid substitutions, deletions or additions, while the biological function of the Cas protein is substantially retained.

The biological functions of the Cas protein include those of the parent type V Cas protein, for example, guide RNA binding activity, endonuclease activity, or the activity of binding to and cleaving a specific site of the target sequence under the guidance of the guide RNA (including but not limited to cis-cleavage and trans-cleavage activity).

In one embodiment, the amino acid sequence of the Cas protein is as shown in SEQ ID NO.1.

In one embodiment, the reverse transcriptase is a naturally occurring reverse transcriptase sequence from a retrovirus or a retrotransposon, or a variant thereof.

In some embodiments, the reverse transcriptase is selected from any one or more of M-MLV reverse transcriptase (Moloney murine leukemia virus reverse transcriptase) and AMV reverse transcriptase (avian myeloblastosis virus reverse transcriptase).

In a preferred embodiment, the reverse transcriptase is M-MLV reverse transcriptase.

In some embodiments, the amino acid sequence of the reverse transcriptase has at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% sequence identity to SEQ ID NO.2.

In one embodiment, the amino acid sequence of the reverse transcriptase has one or more amino acid substitutions, deletions or additions compared with SEQ ID NO.2, for example 1-20 amino acid substitutions, deletions or additions, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acid substitutions, deletions or additions, while the biological function of the reverse transcriptase is substantially retained.

In a preferred embodiment, the amino acid sequence of the reverse transcriptase is as shown in SEQ ID NO.2.

It is clear to those skilled in the art that the structure of a protein can be changed without adversely affecting its activity and functionality. For example, one or more conservative amino acid substitutions can be introduced into the amino acid sequence of a protein without adversely affecting the activity and/or three-dimensional structure of the protein molecule. Examples and implementations of conservative amino acid substitutions are clear to those skilled in the art. Specifically, an amino acid residue can be replaced with another residue belonging to the same group: a non-polar residue with another non-polar residue, a polar uncharged residue with another polar uncharged residue, a basic residue with another basic residue, and an acidic residue with another acidic residue. Such substituting residues may or may not be encoded by the genetic code. As long as the substitution does not inactivate the biological activity of the protein, a conservative substitution in which one amino acid is replaced by another amino acid of the same group falls within the scope of the present invention. Therefore, the Cas protein or reverse transcriptase of the present invention may contain one or more conservative substitutions in its amino acid sequence, and these conservative substitutions are preferably made according to Table 1. In addition, the present invention also encompasses proteins that further comprise one or more non-conservative substitutions, as long as the non-conservative substitutions do not significantly affect the desired function and biological activity of the protein of the present invention.
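The same-group rule described above can be sketched in a few lines of Python. The group memberships below follow a common textbook classification and are an assumption for illustration; they are not taken from Table 1, whose contents are not reproduced in this text, and other classifications place some residues (for example glycine or histidine) differently.

```python
# Assumed textbook grouping of the 20 standard amino acids (one-letter codes).
# This is NOT Table 1 of the document; it only illustrates the same-group rule.
GROUPS = {
    "nonpolar": set("AVLIPFWM"),
    "polar_uncharged": set("GSTCYNQ"),
    "basic": set("KRH"),
    "acidic": set("DE"),
}


def group_of(residue: str) -> str:
    """Return the name of the group that contains the residue."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if a residue is replaced by a different residue of the same group."""
    return original != replacement and group_of(original) == group_of(replacement)
```

Under this assumed grouping, lysine to arginine (K to R) counts as conservative, while aspartic acid to lysine (D to K) does not.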

Conservative amino acid substitutions can be made at one or more predicted non-essential amino acid residues. A "non-essential" amino acid residue is one that can be altered (deleted or substituted) without changing biological activity, whereas an "essential" amino acid residue is required for biological activity. A "conservative amino acid substitution" is one in which an amino acid residue is replaced by a residue with a similar side chain. Amino acid substitutions can be made in the non-conserved regions of the engineered type V Cas protein described above. In general, such substitutions are not made to conserved amino acid residues or to residues located within conserved motifs, where such residues are required for protein activity. However, those skilled in the art will understand that functional variants may have a small number of conservative or non-conservative changes in conserved regions.

Table 1

It is well known in the art that one or more amino acid residues can be altered (substituted, deleted, truncated or inserted) at the N- and/or C-terminus of a protein while its functional activity is retained. Therefore, proteins in which one or more amino acid residues at the N- and/or C-terminus of the Cas protein or reverse transcriptase have been altered while the desired functional activity is retained are also within the scope of the present invention. These alterations may include those introduced by modern molecular methods such as PCR, including PCR amplification that alters or extends the protein coding sequence by including amino acid coding sequences in the oligonucleotides used for amplification.

It should be recognized that proteins can be altered in various ways, including amino acid substitutions, deletions, truncations and insertions, and methods for such manipulations are generally known in the art. For example, amino acid sequence variants of the above proteins can be prepared by mutating the DNA. Variants can also be produced by other forms of mutagenesis and/or by directed evolution, for example by using known mutagenesis, recombination and/or shuffling methods in combination with suitable screening methods to introduce single or multiple amino acid substitutions, deletions and/or insertions.

Those skilled in the art will appreciate that such minor amino acid changes in the Cas protein or reverse transcriptase of the present invention can occur (e.g., as naturally occurring mutations) or be generated (e.g., using recombinant DNA technology) without loss of protein function or activity. If the mutations occur in the catalytic domain, active site or other functional domain of the protein, the properties of the polypeptide may change, but the polypeptide may still retain its activity. If the mutations are not close to the catalytic domain, active site or other functional domain, a smaller effect can be expected.

Those skilled in the art can identify the essential amino acids of the Cas protein or reverse transcriptase of the present invention by methods known in the art, such as site-directed mutagenesis, protein evolution or bioinformatic analysis. The catalytic domain, active site or other functional domains of the protein can also be determined by physical analysis of the structure, for example by nuclear magnetic resonance, crystallography, electron diffraction or photoaffinity labeling, combined with mutation of putative key-site amino acids.

In the present invention, amino acid residues may be denoted by either one-letter or three-letter codes, for example: alanine (Ala, A), valine (Val, V), glycine (Gly, G), leucine (Leu, L), glutamine (Gln, Q), phenylalanine (Phe, F), tryptophan (Trp, W), tyrosine (Tyr, Y), aspartic acid (Asp, D), asparagine (Asn, N), glutamic acid (Glu, E), lysine (Lys, K), methionine (Met, M), serine (Ser, S), threonine (Thr, T), cysteine (Cys, C), proline (Pro, P), isoleucine (Ile, I), histidine (His, H), arginine (Arg, R).

Specific amino acid positions (numbering) within the proteins of the present invention are determined by aligning the amino acid sequence of the protein of interest with a reference amino acid sequence (e.g., SEQ ID No.1) using standard sequence alignment tools, for example by aligning the two sequences with the Smith-Waterman algorithm or the CLUSTALW2 algorithm, the sequences being considered aligned when the alignment score is highest. The alignment score can be calculated according to the method described in Wilbur, W.J. and Lipman, D.J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA, 80:726-730. In the ClustalW2 (1.82) algorithm, the default parameters are preferably used: protein gap open penalty = 10.0; protein gap extension penalty = 0.2; protein matrix = Gonnet; protein/DNA ENDGAP = -1; protein/DNA GAPDIST = 4. Preferably, the AlignX program (part of the Vector NTI suite) is used with default parameters suitable for multiple alignment (gap open penalty: 10.0; gap extension penalty: 0.05) to determine the position of a specific amino acid within a protein of the present invention by aligning the amino acid sequence of the protein with SEQ ID No.1.
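Once an alignment has been produced by a tool such as those named above, transferring position numbers from the reference to the protein of interest is a simple walk along the alignment columns. The sketch below assumes the alignment is supplied as two equal-length gapped strings ("-" for a gap); the function name and the toy sequences are hypothetical and are not part of the original text.

```python
def map_positions(aligned_ref: str, aligned_target: str) -> dict[int, int]:
    """Map 1-based reference positions to 1-based target positions.

    Only columns where both sequences carry a residue are mapped;
    reference residues aligned to a gap have no counterpart and are omitted.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    ref_pos = 0
    tgt_pos = 0
    for r, t in zip(aligned_ref, aligned_target):
        if r != "-":
            ref_pos += 1
        if t != "-":
            tgt_pos += 1
        if r != "-" and t != "-":
            mapping[ref_pos] = tgt_pos
    return mapping


# Toy example: the target has an insertion after reference position 2
# and lacks a counterpart for reference position 4.
example = map_positions("MK-TAY", "MKQT-Y")
```

In the toy example, reference position 3 corresponds to target position 4, and reference position 4 has no counterpart in the target.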

In one embodiment, the fusion protein further comprises a linker connecting the Cas protein and the reverse transcriptase.

In the present invention, a linker can be used to connect any peptide or protein domain of the present invention. In certain embodiments, the linker is a polypeptide. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is the carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer or polymer of an aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, acetic acid, alanine, β-alanine, 3-aminopropionic acid, 4-aminobutyric acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer or polymer of aminohexanoic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol (PEG) moiety. In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a benzene ring. The linker may comprise a functionalized moiety to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker.

In some embodiments, the linker may be a GS linker. In some embodiments, the linker may comprise the amino acid sequence (GGS)n, GS, SG, GSSG, S(GGS)n, SGGS or (GGGGS)n, where n is an integer from 1 to 20 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20). In some embodiments, the linker may comprise the amino acid sequence SGGSGGSGGS. In some embodiments, the linker may comprise the amino acid sequence SGSETPGTSESATPES, also known as the XTEN linker. In some embodiments, the linker may comprise the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS, also known as the GS-XTEN-GS linker.
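The repeat-based linker formulas above can be expanded mechanically. The following Python sketch builds (GGS)n, S(GGS)n and (GGGGS)n style linkers for n from 1 to 20; the helper name `repeat_linker` is hypothetical, and the two constants simply restate the XTEN and GS-XTEN-GS sequences given in the text.

```python
def repeat_linker(unit: str, n: int, prefix: str = "") -> str:
    """Build a repeat linker such as (GGS)n, S(GGS)n or (GGGGS)n.

    n is restricted to 1-20, the range stated in the text.
    """
    if not 1 <= n <= 20:
        raise ValueError("n must be an integer from 1 to 20")
    return prefix + unit * n


# Sequences as given in the text.
XTEN = "SGSETPGTSESATPES"
GS_XTEN_GS = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"

ggs3 = repeat_linker("GGS", 3)                 # (GGS)3
s_ggs3 = repeat_linker("GGS", 3, prefix="S")   # S(GGS)3
g4s2 = repeat_linker("GGGGS", 2)               # (GGGGS)2
```

Note that S(GGS)3 expands to the SGGSGGSGGS sequence listed above, and that the GS-XTEN-GS sequence is the XTEN sequence flanked on each side by SGGSSGGS.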

In a preferred embodiment, the linker is an XTEN linker having the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO.3).

In one embodiment, the fusion protein further comprises a modification moiety selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain (e.g., VP64), a transcriptional repression domain (e.g., a KRAB domain or a SID domain), a nuclease domain (e.g., FokI), and a domain having an activity selected from the following: nucleotide deaminase activity (e.g., adenosine deaminase or cytidine deaminase), methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combination thereof. NLS sequences are well known to those skilled in the art, and examples include, but are not limited to, those of the SV40 large T antigen, EGL-13, c-Myc and TUS protein.

In one embodiment, the fusion protein of the present invention further comprises a nuclear localization sequence (NLS). In some embodiments, the NLS is fused to the N-terminus of the fusion protein. In some embodiments, the NLS is fused to the C-terminus of the fusion protein. In other embodiments, an NLS is attached to both the N-terminus and the C-terminus of the fusion protein.

In some embodiments, the NLS is fused to the N-terminus of the Cas protein. In some embodiments, the NLS is fused to the C-terminus of the Cas protein. In some embodiments, the NLS is fused to the N-terminus of the reverse transcriptase. In some embodiments, the NLS is fused to the C-terminus of the reverse transcriptase. In some embodiments, the NLS is fused to the fusion protein via one or more linkers. In some embodiments, the NLS is fused to the fusion protein without a linker. Nuclear localization sequences are known in the art and will be apparent to the skilled person. In some embodiments, the NLS comprises the amino acid sequence KRPAATKKAGQAKKKK (SEQ ID No.6) or PKKKRKV (SEQ ID No.7).

Epitope tags are well known to those skilled in the art and include, but are not limited to, His, V5, FLAG, HA, Myc, VSV-G, Trx, etc., and those skilled in the art can select other suitable epitope tags according to the intended purpose (for example, purification, detection or tracking).

Reporter gene sequences are well known to those skilled in the art, and examples include, but are not limited to, GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP, etc.

In one embodiment, the fusion protein of the present invention comprises a domain capable of binding to a DNA molecule or an intracellular molecule, such as maltose binding protein (MBP), the DNA binding domain (DBD) of LexA, the DBD of GAL4, etc.

In one embodiment, the fusion protein of the present invention comprises a detectable label, for example a fluorescent dye such as FITC or DAPI.

In one embodiment, one end of the Cas protein is linked to the reverse transcriptase, and the other end may additionally be linked to a single-stranded DNA-binding protein or a ubiquitin-like modifier protein.

In one embodiment, the reverse transcriptase is linked to the N-terminus or C-terminus of the Cas protein; in a preferred embodiment, the reverse transcriptase is linked to the N-terminus of the Cas protein.

In one embodiment, the single-stranded DNA-binding protein or ubiquitin-like modifier protein is linked to the N-terminus or C-terminus of the Cas protein.

In one embodiment, one end of the reverse transcriptase is linked to the Cas protein, and the other end may additionally be linked to a single-stranded DNA-binding protein or a ubiquitin-like modifier protein.

In one embodiment, the single-stranded DNA-binding protein or ubiquitin-like modifier protein is linked to the N-terminus or C-terminus of the reverse transcriptase.

In one embodiment, the single-stranded DNA-binding protein is capable of stabilizing the ssDNA exposed at a double-strand break, and is selected from Brex27, EcRecA (a single-stranded DNA-binding protein from Escherichia coli), BsRecA (a single-stranded DNA-binding protein from Bacillus subtilis), T4SSB (a single-stranded DNA-binding protein from T4 phage), etc. In a preferred embodiment, the single-stranded DNA-binding protein is Brex27; preferably, the Brex amino acid sequence is as shown in SEQ ID NO.5.

In one embodiment, the ubiquitin-like modifier protein is capable of promoting precise DNA repair. Preferably, the amino acid sequence of the ubiquitin-like modifier protein is as shown in SEQ ID NO.15.

In some embodiments, the linkage is either direct or via a linker.

In a preferred embodiment, the amino acid sequence of the linker is as shown in SEQ ID NO.3.
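The domain arrangements described above (reverse transcriptase at the N- or C-terminus of the Cas protein, an optional linker, and optional NLS sequences at either end) can be expressed as simple string concatenation in N-to-C order. The following is a minimal sketch only: the function `build_fusion` and its parameters are hypothetical, the linker and NLS constants restate SEQ ID NO.3 and SEQ ID No.7 as printed in the text, and the Cas and reverse transcriptase sequences are toy placeholders because SEQ ID NO.1 and SEQ ID NO.2 are not reproduced here.

```python
# Linker as printed for SEQ ID NO.3 and NLS as printed for SEQ ID No.7.
LINKER_SEQ3 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGSS"
NLS_SEQ7 = "PKKKRKV"


def build_fusion(
    cas: str,
    rt: str,
    rt_at_n_terminus: bool = True,   # the preferred arrangement in the text
    linker: str = LINKER_SEQ3,       # pass "" for a direct connection
    n_nls: str = "",
    c_nls: str = "",
) -> str:
    """Concatenate the parts of a fusion protein in N-to-C order."""
    if rt_at_n_terminus:
        core = rt + linker + cas
    else:
        core = cas + linker + rt
    return n_nls + core + c_nls


# Toy placeholders standing in for the real Cas and reverse transcriptase sequences.
toy_cas = "MCASCAS"
toy_rt = "MRTRT"
fusion = build_fusion(toy_cas, toy_rt, n_nls=NLS_SEQ7, c_nls=NLS_SEQ7)
```

The additional partners mentioned above (a single-stranded DNA-binding protein or a ubiquitin-like modifier protein) could be handled in the same way by concatenating them at the free end of either domain.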

Protein-nucleic acid complexes/compositions

In one aspect, the present invention also provides a complex for prime editing, the complex comprising:

(i) a protein component, which is the above-mentioned fusion protein comprising a Cas protein and a reverse transcriptase;

(ii) a nucleic acid component, which is a prime editing guide RNA (PEgRNA), wherein the PEgRNA comprises a guide RNA (gRNA) and a nucleic acid extension arm, and the nucleic acid extension arm comprises a primer binding site sequence (PBS) and a reverse transcription template sequence (RTT); the gRNA comprises a guide sequence and a scaffold sequence; the guide sequence is capable of pairing with a target nucleic acid, and the scaffold sequence is capable of interacting with the Cas protein;

wherein the protein component and the nucleic acid component bind to each other to form the complex.
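The composition of the PEgRNA described above can be modeled as a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, the sequences are toy values, and the 5'-to-3' ordering used here (scaffold before guide, RTT before PBS) is an assumption made for the example, since the text states only which parts are present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEgRNA:
    """Toy model of a PEgRNA: gRNA (scaffold + guide) plus an extension arm (RTT + PBS)."""

    scaffold: str  # interacts with the Cas protein
    guide: str     # pairs with the target nucleic acid
    pbs: str       # primer binding site
    rtt: str       # reverse transcription template

    def grna(self) -> str:
        # Assumed order: scaffold followed by guide.
        return self.scaffold + self.guide

    def extension_arm(self) -> str:
        # Assumed order: RTT followed by PBS.
        return self.rtt + self.pbs

    def sequence(self, extension_at: str = "3prime") -> str:
        """Full sequence with the extension arm at the 3' or 5' end of the gRNA."""
        if extension_at == "3prime":
            return self.grna() + self.extension_arm()
        if extension_at == "5prime":
            return self.extension_arm() + self.grna()
        raise ValueError("extension_at must be '3prime' or '5prime'")


toy = PEgRNA(scaffold="AAUU", guide="GGCC", pbs="CC", rtt="UU")
```

Placing the extension arm at an internal position of the guide RNA is not modeled in this sketch.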

In a preferred embodiment, the scaffold sequence is as shown in SEQ ID NO.4.

In one embodiment, the complex or composition is non-naturally occurring or modified. In one embodiment, at least one component of the complex or composition is non-naturally occurring or modified. In one embodiment, the first component is non-naturally occurring or modified; and/or the second component is non-naturally occurring or modified.

In one embodiment, the reverse transcription template carries the editing information for the target site, and the fusion protein is capable of targeting the target sequence when bound to the prime editing guide RNA.

In one embodiment, the target sequence comprises a target strand and a complementary non-target strand.

In one embodiment, the gRNA is capable of hybridizing to the complementary non-target strand (non-PAM strand).

In one embodiment, the nucleic acid extension arm is located at the 3' or 5' end of the guide RNA, or at an internal position within the guide RNA, and the nucleic acid extension arm is DNA or RNA.

In one embodiment, the nucleic acid extension arm further comprises a homology arm sequence.

在一个实施方式中,所述同源臂是至少1个核苷酸、至少2个核苷酸、至少3个核苷酸、至少4个核苷酸、至少5个核苷酸、至少6个核苷酸、至少7个核苷酸、至少8个核苷酸、至少9个核苷酸、至少10个核苷酸、至少11个核苷酸、至少12个核苷酸、至少13个核苷酸、至少14个核苷酸、至少15个核苷酸、至少16个核苷酸、至少17个核苷酸、至少18个核苷酸、至少19个核苷酸、至少20个核苷酸、至少21个核苷酸、至少22个核苷酸、至少23个核苷酸、至少24个核苷酸、至少25个核苷酸、至少26个核苷酸、至少27个核苷酸、至少28个核苷酸、至少29个核苷酸或至少30个核苷酸,优选的,所述同源臂为17个核苷酸。In one embodiment, the homology arm is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides or at least 30 nucleotides. Preferably, the homology arm is 17 nucleotides.

In one embodiment, the nucleic acid extension arm is 5-200 bp in length; preferably 17-159 bp, preferably 20-150 bp, preferably 30-140 bp, preferably 40-135 bp, preferably 50-130 bp, preferably 60-120 bp, preferably 70-120 bp, preferably 80-115 bp, preferably 90-110 bp, and preferably 100 bp.

In one embodiment, the PBS sequence is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides in length; preferably, the PBS sequence is 9-79 bp in length, preferably 8-48 bp, preferably 48 bp, and preferably 40 bp.

In one embodiment, the RTT sequence is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides in length; preferably 8-80 bp, preferably 52-72 bp, and preferably 36-72 bp.

In one embodiment, the PEgRNA further comprises at least one additional structure selected from the group consisting of a linker, a stem-loop, a hairpin, a toeloop, an aptamer, and an RNA-protein recruitment domain.

In one embodiment, the PBS sequence and the guide sequence each pair with the same strand (the non-PAM strand) of the target nucleic acid.

In one embodiment, the PBS sequence and the guide sequence pair with the two strands of the target nucleic acid, respectively.

In one embodiment, the complementary homology of the RTT sequence to the target nucleic acid sequence is less than 100%, or less than 95%, or less than 90%, or less than 85%, or less than 80%, or less than 70%, or less than 60%, or less than 50%, or less than 40%, or less than 30%, or less than 20%, or less than 10%, or less than 5%.

In one embodiment, the RTT sequence is not complementary to the target nucleic acid sequence.

In one embodiment, the RTT sequence is complementary to the target nucleic acid sequence but contains one or more unpaired nucleotides, i.e., at least 1%, at least 5%, at least 10%, at least 20%, at least 40%, at least 60%, at least 80%, at least 90%, or at least 95% of its nucleotides are unpaired.
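The percentages in the preceding embodiments can be made concrete with a small calculation. The following is a minimal sketch, not part of the disclosure: it assumes that the RTT and the target segment have equal length, that both are written 5'->3' and aligned antiparallel, and that the RTT is written with the DNA alphabet (T in place of U). The function names `unpaired_fraction` and `complementarity_percent` are hypothetical.

```python
# Illustrative sketch only; the alphabet and alignment rules are assumptions.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def unpaired_fraction(rtt: str, target: str) -> float:
    """Fraction of RTT positions that do not pair with the target.

    Both sequences are written 5'->3' and must have equal length. The RTT is
    aligned antiparallel to the target, so rtt[i] faces target[-1 - i].
    """
    if len(rtt) != len(target) or not rtt:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(
        1
        for r, t in zip(rtt.upper(), reversed(target.upper()))
        if _COMPLEMENT.get(t) != r
    )
    return mismatches / len(rtt)


def complementarity_percent(rtt: str, target: str) -> float:
    """Percentage of RTT positions that pair with the target."""
    return 100.0 * (1.0 - unpaired_fraction(rtt, target))
```

Under these assumptions, an RTT that encodes a single-base change against a 6-nt target segment has an unpaired fraction of 1/6, which falls within the "at least 10%" embodiment.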

In one embodiment, the PEgRNA comprises, in the 5' to 3' direction, the gRNA, the RTT sequence, and the PBS sequence.

In one embodiment, the PEgRNA comprises, in the 3' to 5' direction, the gRNA, the RTT sequence, and the PBS sequence.
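The two orderings above can be sketched as a simple concatenation. This is a hypothetical helper, not the applicant's construct: `assemble_pegrna` and its `extension_end` parameter are illustrative names, each part is assumed to be supplied as a 5'->3' string, and no real sequences (such as those of the SEQ ID NOs) are used.

```python
def assemble_pegrna(grna: str, rtt: str, pbs: str, extension_end: str = "3'") -> str:
    """Return the PEgRNA written 5'->3' from parts that are each written 5'->3'.

    extension_end="3'": 5'-gRNA-RTT-PBS-3'
    extension_end="5'": 5'-PBS-RTT-gRNA-3' (gRNA, RTT, PBS when read 3'->5')
    """
    for name, part in (("grna", grna), ("rtt", rtt), ("pbs", pbs)):
        if not part:
            raise ValueError(f"{name} must not be empty")
    if extension_end == "3'":
        return grna + rtt + pbs
    if extension_end == "5'":
        return pbs + rtt + grna
    raise ValueError("extension_end must be \"3'\" or \"5'\"")
```

Keeping the parts separate until the final step makes it easy to vary the RTT or PBS length independently of the gRNA.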

In a preferred embodiment, the sequence of the PEgRNA is as shown in SEQ ID NO.9, SEQ ID NO.10, or SEQ ID NO.11.

In one embodiment, the PEgRNA further comprises a termination signal.

In a preferred embodiment, one end of the PEgRNA is further linked to a DNA guanine quadruplex (G-quadruplex); preferably, the nucleotide sequence of the G-quadruplex is as shown in SEQ ID NO.8.

In one embodiment, the RTT can be used by the reverse transcriptase as a template sequence for synthesizing a corresponding single-stranded DNA flap having a 5' end, wherein the DNA flap is complementary to a strand of the endogenous target DNA sequence adjacent to the nick site, and wherein the single-stranded DNA flap comprises the nucleotide modification encoded by the editing template.

In one embodiment, the single-stranded DNA flap displaces the endogenous single-stranded DNA having a 3' end in the nicked target DNA sequence.

In one embodiment, cellular repair of the single-stranded DNA flap results in installation of the nucleotide modification, thereby forming the desired product.

In one embodiment, the nucleotide modification is a single-nucleotide substitution, deletion, or insertion, or a deletion, insertion, or replacement of a large fragment.

In one embodiment, the result of the gene editing is a single-nucleotide substitution, deletion, or insertion, or a deletion, insertion, or replacement of a large fragment.

In one embodiment, the nick site of the complex is located in the protospacer (also called the spacer) of the PAM strand. In one embodiment, the position of the nucleotide modification or gene editing is a phosphodiester bond between any bases within the protospacer, or within 5 bp, 10 bp, 20 bp, 30 bp, 40 bp, 50 bp, 60 bp, 70 bp, 80 bp, 90 bp, 100 bp, 150 bp, 200 bp, 250 bp, or 300 bp upstream and/or downstream of the protospacer.

In one embodiment, the position of the nucleotide modification or gene editing is from position -3 to position 17, or from position -3 to position 23, where the first base of the protospacer on the PAM strand is position 1.

In a preferred embodiment, the single-nucleotide substitution is a conversion or a replacement. Preferably, the substitution is selected from the following: G to T, G to A, G to C, T to G, T to A, T to C, C to G, C to T, C to A, A to T, A to G, or A to C. Preferably, the conversion is selected from the following base-pair conversions: G:C to T:A, G:C to A:T, G:C to C:G, T:A to G:C, T:A to A:T, T:A to C:G, C:G to G:C, C:G to T:A, C:G to A:T, A:T to T:A, A:T to G:C, or A:T to C:G.
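The twelve substitutions and their base-pair conversions listed above follow mechanically from Watson-Crick pairing. The sketch below regenerates them as a consistency check; the function name is hypothetical and the output strings are only an illustrative format.

```python
from itertools import permutations

# Watson-Crick partner of each base.
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def single_nucleotide_substitutions() -> list[tuple[str, str]]:
    """Return (substitution, base-pair conversion) for every ordered pair of distinct bases."""
    result = []
    for ref, alt in permutations("GTCA", 2):
        substitution = f"{ref} to {alt}"
        conversion = f"{ref}:{_PAIR[ref]} to {alt}:{_PAIR[alt]}"
        result.append((substitution, conversion))
    return result
```

Four bases give 4 x 3 = 12 ordered pairs, matching the length of both lists in the embodiment.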

In a preferred embodiment, the nucleotide modification is a nucleotide deletion, and the deletion is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length, or at least 100 nucleotides in length.

In a preferred embodiment, the nucleotide modification is a nucleotide insertion; preferably, the insertion is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length, or at least 100 nucleotides in length. In a preferred embodiment, the insertion is a sequence encoding a polypeptide.

In a preferred embodiment, the nucleotide modification is a nucleotide replacement; preferably, the replaced nucleotides number at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides, or at least 100 nucleotides.

Nucleic Acids

In another aspect, the present invention provides an isolated polynucleotide comprising:

(a) a polynucleotide sequence encoding the engineered fusion protein or complex of the present invention;

or a polynucleotide complementary to the polynucleotide of (a).

In one embodiment, the nucleotide sequence is codon-optimized for expression in prokaryotic cells. In one embodiment, the nucleotide sequence is codon-optimized for expression in eukaryotic cells.

In one embodiment, the cell is an animal cell, e.g., a mammalian cell.

In one embodiment, the cell is a human cell.

In one embodiment, the cell is a plant cell, such as a cell of a cultivated plant (e.g., cassava, corn, sorghum, wheat, or rice), an alga, a tree, or a vegetable.

In one embodiment, the polynucleotide is preferably single-stranded or double-stranded.

Prime editing guide RNA (PEgRNA)

In another aspect, the present invention provides a PEgRNA comprising a guide RNA (gRNA) and a nucleic acid extension arm, wherein the nucleic acid extension arm comprises a DNA binding template, and the DNA binding template comprises a primer binding site sequence (PBS) and a reverse transcription template sequence (RTT); the gRNA comprises a guide sequence and a backbone sequence; the guide sequence is capable of pairing with a target nucleic acid, and the backbone sequence is capable of interacting with the Cas protein.

Prime editing relies on nucleic acid programmability: the protein-binding sequence of the guide RNA can interact with the DNA-binding protein (Cas protein) in the fusion protein of the present invention, so that the DNA-binding protein (Cas protein) and the guide RNA form a complex.

In one embodiment, the reverse transcription template stores the editing information for the target site, and the fusion protein is capable of targeting the target sequence when bound to the prime editing guide RNA.

In one embodiment, the target sequence comprises a target strand and a complementary non-target strand.

In one embodiment, the gRNA hybridizes to the complementary non-target strand (non-PAM strand) to form an RNA-DNA hybrid and an R-loop.

In one embodiment, the nucleic acid extension arm is located at the 3' or 5' end of the guide RNA, or at an intramolecular position within the guide RNA, and the nucleic acid extension arm is DNA or RNA.

In a preferred embodiment, the backbone sequence is as shown in SEQ ID NO.4, and the guide sequence is as shown in SEQ ID NO.14.

Vector

The present invention also provides a vector comprising the fusion protein, complex, isolated nucleic acid molecule, or polynucleotide described above; preferably, the vector further comprises a regulatory element operably linked thereto.

In one embodiment, the regulatory element is one or more selected from the group consisting of an enhancer, a transposon, a promoter, a terminator, a leader sequence, a polyadenylation sequence, and a marker gene.

In one embodiment, the vector includes a cloning vector, an expression vector, a shuttle vector, or an integration vector.

In some embodiments, the vector included in the system is a viral vector (e.g., a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated virus vector, or a herpes simplex virus vector); it may also be a plasmid, a virus, a cosmid, a phage, or another type well known to those skilled in the art.

In one embodiment, the fusion protein and the PEgRNA are located on the same vector.

In one embodiment, the fusion protein and the PEgRNA are located on different vectors.

Host cells

The present invention also relates to an in vitro, ex vivo, or in vivo cell or cell line, or progeny thereof, wherein the cell, cell line, or progeny comprises the fusion protein, nucleic acid molecule, protein-nucleic acid complex, vector, or delivery composition of the present invention.

In certain embodiments, the cell is a prokaryotic cell.

In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, such as a cell of a non-human primate, cow, sheep, pig, dog, monkey, rabbit, or rodent (e.g., rat or mouse). In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of a poultry bird (e.g., chicken), fish, or shellfish (e.g., clam or shrimp). In certain embodiments, the cell is a plant cell, such as a cell of a monocotyledonous or dicotyledonous plant, a cell of a cultivated plant or food crop such as cassava, corn, sorghum, soybean, wheat, oat, or rice, or a cell of an alga, a tree, or a production plant, fruit, or vegetable (e.g., trees such as citrus trees and nut trees; nightshade plants, cotton, tobacco, tomato, grape, coffee, cocoa, etc.).

In certain embodiments, the cell is a stem cell or a stem cell line.

In certain cases, the host cell of the present invention comprises a gene or genome modification that is not present in its wild type.

Delivery and delivery compositions

The fusion proteins, PEgRNAs, nucleic acid molecules, vectors, systems, complexes, and compositions of the present invention can be delivered by any method known in the art. Such methods include, but are not limited to, electroporation, lipofection, nucleofection, microinjection, sonoporation, gene gun, calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, magnetofection, impalefection, optical transfection, agent-enhanced nucleic acid uptake, and delivery via liposomes, immunoliposomes, viral particles, artificial virions, and the like.

Accordingly, in another aspect, the present invention provides a delivery composition comprising a delivery vehicle and one or more selected from the following: the fusion protein, PEgRNA, nucleic acid molecule, vector, system, complex, and composition of the present invention.

In one embodiment, the delivery vehicle is a particle.

In one embodiment, the delivery vehicle is selected from lipid particles, sugar particles, metal particles, protein particles, liposomes, exosomes, microvesicles, gene guns, and viral vectors (e.g., replication-defective retroviruses, lentiviruses, adenoviruses, or adeno-associated viruses).

Gene Editing Methods and Applications

The fusion protein, nucleic acid, complex, CRISPR/Cas system, vector system, delivery composition, or host cell of the present invention can be used for any one or more of the following purposes: targeting and/or editing a target nucleic acid; specifically editing a double-stranded nucleic acid; base editing a double-stranded nucleic acid; and base editing a single-stranded nucleic acid. In other embodiments, it can also be used to prepare a reagent or kit for any one or more of the above purposes.

The present invention also provides use of the above fusion protein, nucleic acid, complex, CRISPR/Cas system, vector system, delivery composition, or host cell in gene editing, gene targeting, or gene cleavage, or in the preparation of a reagent or kit for gene editing, gene targeting, or gene cleavage.

In one embodiment, the gene editing, gene targeting, or gene cleavage is performed inside and/or outside a cell.

The present invention also provides a method for gene editing, gene targeting, or gene cleavage, the method comprising contacting a target nucleic acid with the above fusion protein, nucleic acid, complex, CRISPR/Cas system, vector system, delivery composition, or host cell. In one embodiment, the method edits, targets, or cleaves the target nucleic acid inside or outside a cell.

The gene editing or editing of the target nucleic acid includes modifying a gene, knocking out a gene, inserting a gene, altering the expression of a gene product, repairing a mutation, inserting a polynucleotide, and/or mutating a gene.

The editing can be performed in prokaryotic cells and/or eukaryotic cells.

In certain embodiments, the cell is a prokaryotic cell.

In certain embodiments, the cell is a eukaryotic cell. In certain embodiments, the cell is a mammalian cell. In certain embodiments, the cell is a human cell. In certain embodiments, the cell is a non-human mammalian cell, such as a cell of a non-human primate, cow, sheep, pig, dog, monkey, rabbit, or rodent (e.g., rat or mouse). In certain embodiments, the cell is a non-mammalian eukaryotic cell, such as a cell of a poultry bird (e.g., chicken), fish, or shellfish (e.g., clam or shrimp). In certain embodiments, the cell is a plant cell, such as a cell of a monocotyledonous or dicotyledonous plant, a cell of a cultivated plant or food crop such as cassava, corn, sorghum, soybean, wheat, oat, or rice, or a cell of an alga, a tree, or a production plant, fruit, or vegetable (e.g., trees such as citrus trees and nut trees; nightshade plants, cotton, tobacco, tomato, grape, coffee, cocoa, etc.).

In another aspect, the present invention provides use of the above fusion protein, nucleic acid, composition, CRISPR/Cas system, vector system, delivery composition, or host cell in the preparation of a formulation or kit, wherein the formulation or kit is used for:

(i) gene or genome editing;

(ii) target nucleic acid detection and/or diagnosis;

(iii) editing a target sequence in a target locus to modify an organism;

(iv) treatment of a disease;

(v) targeting a target gene;

(vi) cleaving a gene of interest.

Preferably, the gene or genome editing is performed inside or outside a cell.

Preferably, the target nucleic acid detection and/or diagnosis is performed in vitro.

Preferably, the treatment of a disease is treatment of a condition caused by a defect in the target sequence in the target locus.

Method for specifically modifying a target nucleic acid

In another aspect, the present invention also provides a method for specifically modifying a target nucleic acid, the method comprising contacting the target nucleic acid with the above fusion protein, nucleic acid, composition, CRISPR/Cas system, vector system, or delivery composition.

The specific modification can occur in vivo or in vitro.

The specific modification can occur inside or outside a cell.

In some cases, the cell is selected from a prokaryotic cell or a eukaryotic cell, for example, an animal cell, a plant cell, or a microbial cell.

In one embodiment, the nucleotide modification is a single-nucleotide substitution, deletion, or insertion, or a deletion or insertion of a large fragment.

In a preferred embodiment, the single-nucleotide substitution is a conversion or a replacement. Preferably, the substitution is selected from the following: G to T, G to A, G to C, T to G, T to A, T to C, C to G, C to T, C to A, A to T, A to G, or A to C. Preferably, the conversion is selected from the following base-pair conversions: G:C to T:A, G:C to A:T, G:C to C:G, T:A to G:C, T:A to A:T, T:A to C:G, C:G to G:C, C:G to T:A, C:G to A:T, A:T to T:A, A:T to G:C, or A:T to C:G.

In a preferred embodiment, the nucleotide modification is a nucleotide deletion, and the deletion is at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length, or at least 100 nucleotides in length.

In a preferred embodiment, the nucleotide modification is a nucleotide insertion; preferably, the length of the insertion is at least n nucleotides, where n is any integer from 1 to 40 (at least 1 nucleotide, at least 2 nucleotides, and so on through at least 40 nucleotides), or at least 100 nucleotides.

In a preferred embodiment, the insertion is a sequence encoding a polypeptide.

In another aspect, the present invention provides a method for introducing a desired nucleotide change into a double-stranded DNA sequence, the method comprising: contacting the double-stranded DNA sequence with a complex comprising the above fusion protein and a PEgRNA, wherein the fusion protein comprises a Cas protein and a reverse transcriptase, and the PEgRNA comprises a DNA synthesis template containing the desired nucleotide change (i.e., a reverse transcription template sequence, RTT) and a primer binding site; thereby nicking the double-stranded DNA sequence to generate a free single-stranded DNA having a 3' end on the targeted strand; hybridizing the 3' end of the free single-stranded DNA to the primer binding site, thereby activating the reverse transcriptase to polymerize a DNA strand from the hybridized 3' end using the RTT as a template and producing a single-stranded DNA that comprises the desired nucleotide change and is complementary to the RTT; and replacing the endogenous DNA strand adjacent to the nick site with the single-stranded DNA through the cell's endogenous DNA repair mechanisms, thereby installing the desired nucleotide change in the double-stranded DNA sequence.

In one embodiment, the desired nucleotide change is installed in an editing window between -3 and +17, or between -3 and +23, of the protospacer (the first base of the protospacer on the PAM strand being position 1). In other embodiments, the desired nucleotide change is installed within the protospacer and in an editing window between -5 and +5, or between -10 and +10, or between -20 and +20, or between -30 and +30, or between -40 and +40, or between -50 and +50, or between -60 and +60, or between -70 and +70, or between -80 and +80, or between -90 and +90, or between -100 and +100, or between -200 and +200 of the protospacer.

In one embodiment, the desired nucleotide change is installed in an editing window between about -7 and +23 of the PAM sequence. In other embodiments, the desired nucleotide change is installed in an editing window between about -5 and +5 of the nick site, or between about -10 and +10 of the nick site, or between about -20 and +20 of the nick site, or between about -30 and +30 of the nick site, or between about -40 and +40 of the nick site, or between about -50 and +50 of the nick site, or between about -60 and +60 of the nick site, or between about -70 and +70 of the nick site, or between about -80 and +80 of the nick site, or between about -90 and +90 of the nick site, or between about -100 and +100 of the nick site, or between about -200 and +200 of the nick site.
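The editing windows described above are inclusive ranges of signed positions measured from a reference point (the protospacer, the PAM sequence, or the nick site). As a minimal sketch, the following Python checks whether a position falls inside such a window. The function names, the default window of -3 to +17, and the treatment of positions as plain signed integers are illustrative assumptions; the text does not state, for example, whether a position 0 exists.

```python
# Illustrative sketch only: names and defaults are assumptions, not part of the patent.

# Half-widths of the symmetric windows listed in the text (-5..+5 up to -200..+200).
SYMMETRIC_HALF_WIDTHS = (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200)


def in_editing_window(position, window=(-3, 17)):
    """Return True if a signed position lies inside the inclusive window."""
    low, high = window
    if low > high:
        raise ValueError("window lower bound must not exceed upper bound")
    return low <= position <= high


def smallest_symmetric_window(position):
    """Return the narrowest listed symmetric window containing the position, or None."""
    for half_width in SYMMETRIC_HALF_WIDTHS:
        if in_editing_window(position, (-half_width, half_width)):
            return (-half_width, half_width)
    return None
```

For instance, `in_editing_window(18)` is false for the -3 to +17 window but `in_editing_window(18, (-3, 23))` is true for the wider one.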

In one embodiment, the desired nucleotide change is a single-nucleotide substitution, deletion, or insertion, or the deletion, insertion, or replacement of a large fragment.

In a preferred embodiment, the single-nucleotide substitution is a conversion or a replacement. Preferably, the substitution is selected from the following: a G to T substitution, a G to A substitution, a G to C substitution, a T to G substitution, a T to A substitution, a T to C substitution, a C to G substitution, a C to T substitution, a C to A substitution, an A to T substitution, an A to G substitution, or an A to C substitution. Preferably, the conversion is selected from the following: conversion of a G:C base pair to a T:A base pair, a G:C base pair to an A:T base pair, a G:C base pair to a C:G base pair, a T:A base pair to a G:C base pair, a T:A base pair to an A:T base pair, a T:A base pair to a C:G base pair, a C:G base pair to a G:C base pair, a C:G base pair to a T:A base pair, a C:G base pair to an A:T base pair, an A:T base pair to a T:A base pair, an A:T base pair to a G:C base pair, or an A:T base pair to a C:G base pair.
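The twelve substitutions and the twelve base-pair conversions listed above correspond one to one: substituting base X with base Y on one strand converts the pair X:X' into Y:Y', where the prime denotes the Watson-Crick complement. The short Python sketch below makes that correspondence explicit; the names `base_pair_conversion` and `ALL_CONVERSIONS` are illustrative assumptions.

```python
# Illustrative sketch only: names are assumptions, not part of the patent.
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def base_pair_conversion(ref, alt):
    """Map a single-strand substitution ref -> alt to its base-pair conversion."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("bases must be one of A, T, C, G")
    if ref == alt:
        raise ValueError("a substitution requires two different bases")
    return (f"{ref}:{COMPLEMENT[ref]}", f"{alt}:{COMPLEMENT[alt]}")


# All twelve possible substitutions, keyed by (ref, alt).
ALL_CONVERSIONS = {
    (ref, alt): base_pair_conversion(ref, alt)
    for ref, alt in permutations("GTCA", 2)
}
```

For example, `base_pair_conversion("G", "T")` returns `("G:C", "T:A")`, matching the first conversion in the list.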

In a preferred embodiment, the desired nucleotide change is a nucleotide deletion, and the length of the deletion is at least n nucleotides, where n is any integer from 1 to 40 (at least 1 nucleotide, at least 2 nucleotides, and so on through at least 40 nucleotides), or at least 100 nucleotides.

In a preferred embodiment, the desired nucleotide change is a nucleotide insertion; preferably, the length of the insertion is at least n nucleotides, where n is any integer from 1 to 40 (at least 1 nucleotide, at least 2 nucleotides, and so on through at least 40 nucleotides), or at least 100 nucleotides.

It should be understood that, within the scope of the present invention, the technical features described above and the technical features specifically described below (e.g., in the examples) can be combined with one another to form new or preferred technical solutions. For brevity, these combinations are not enumerated here.

DETAILED DESCRIPTION

Unless defined otherwise, technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art.

As used herein, the terms "polynucleotide", "nucleotide sequence", "nucleic acid sequence", "nucleic acid molecule", and "nucleic acid" are used interchangeably and include DNA, RNA, or hybrids thereof, which may be double-stranded or single-stranded.

The terms "homology" and "identity" refer to the degree of sequence matching between two polypeptides or between two nucleic acids. When a position in both of the compared sequences is occupied by the same base or amino acid monomer subunit (for example, a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The comparison between two sequences is usually made after the sequences have been aligned to give maximum identity. Alignment is performed by conventional methods known to those skilled in the art, such as the BLAST algorithm.
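As a rough illustration of position-by-position identity, the sketch below computes a percent identity for two sequences that have already been aligned to equal length. The formula used (identical positions divided by aligned columns, times 100) is a common convention assumed here, not a definition given in the text; the `-` gap character and the function name `percent_identity` are likewise assumptions.

```python
# Illustrative sketch only: the formula and names are assumptions, not part of the patent.

def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences.

    Inputs must already be aligned to equal length; '-' marks a gap and
    never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

Producing the alignment itself (for example with BLAST) is outside the scope of this sketch.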

The terms "complementary" and "complementary pairing" refer to the specific matching relationship between nucleic acid molecules under the principle of complementary base pairing. For example, DNA molecules contain four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). A pairs with T, and C pairs with G. Therefore, if a base on one strand of a DNA molecule is A, the corresponding base on the complementary strand is T; if a base on one strand is C, the corresponding base on the complementary strand is G.
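The pairing rule above (A with T, C with G) can be written as a small lookup. The following Python sketch returns the complement and the reverse complement of a DNA string; the function names are illustrative assumptions, and only the four unambiguous DNA bases are handled.

```python
# Illustrative sketch only: names are assumptions, not part of the patent.

_PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq):
    """Return the base-by-base complement of a DNA sequence (same orientation)."""
    try:
        return "".join(_PAIRING[base] for base in seq.upper())
    except KeyError as err:
        raise ValueError(f"unexpected base: {err.args[0]!r}") from None


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return complement(seq)[::-1]
```

Because the two strands of a duplex are antiparallel, the reverse complement is usually the form needed when the complementary strand is written 5' to 3'.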

The term "genetic engineering" refers to technology that modifies and uses, through artificial intervention, the nucleotides controlling an organism's genetic information in order to obtain new genetic traits, new varieties, or new products. It includes all genetic modification techniques disclosed in the art, such as mutagenesis, transgenesis, and gene editing. Mutagenesis methods include, but are not limited to, physical mutagenesis (e.g., ultraviolet mutagenesis), chemical mutagenesis (e.g., acridine dyes), and biological mutagenesis (e.g., virus or bacteriophage mutagenesis).

The term "encoding" refers to the inherent property of a specific nucleotide sequence in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as a template for the synthesis of other polymers and macromolecules in biological processes, those polymers having either a defined nucleotide sequence (i.e., rRNA, tRNA, and mRNA) or a defined amino acid sequence, together with the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of the mRNA corresponding to that gene produce the protein in a cell or other biological system.

The term "amino acid" refers to a carboxylic acid containing an amino group. The proteins in living organisms are composed of 20 basic amino acids.

The terms "protein", "polypeptide", and "peptide" are used interchangeably herein to refer to polymers of amino acid residues, including polymers in which one or more amino acid residues are chemical analogs of naturally occurring amino acid residues. The proteins and polypeptides of the present invention can be produced recombinantly or by chemical synthesis.

In the present invention, amino acid residues may be represented by one-letter or three-letter codes, for example: alanine (Ala, A), valine (Val, V), glycine (Gly, G), leucine (Leu, L), glutamine (Gln, Q), phenylalanine (Phe, F), tryptophan (Trp, W), tyrosine (Tyr, Y), aspartic acid (Asp, D), asparagine (Asn, N), glutamic acid (Glu, E), lysine (Lys, K), methionine (Met, M), serine (Ser, S), threonine (Thr, T), cysteine (Cys, C), proline (Pro, P), isoleucine (Ile, I), histidine (His, H), and arginine (Arg, R).
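The code table above translates directly into a lookup. The sketch below stores the twenty three-letter to one-letter pairs exactly as listed and converts a list of three-letter residues into a one-letter string; the names `THREE_TO_ONE` and `to_one_letter` are illustrative assumptions.

```python
# Illustrative sketch only: names are assumptions; the code pairs follow the list above.

THREE_TO_ONE = {
    "Ala": "A", "Val": "V", "Gly": "G", "Leu": "L", "Gln": "Q",
    "Phe": "F", "Trp": "W", "Tyr": "Y", "Asp": "D", "Asn": "N",
    "Glu": "E", "Lys": "K", "Met": "M", "Ser": "S", "Thr": "T",
    "Cys": "C", "Pro": "P", "Ile": "I", "His": "H", "Arg": "R",
}


def to_one_letter(residues):
    """Convert an iterable of three-letter residue codes to a one-letter string."""
    try:
        return "".join(THREE_TO_ONE[residue] for residue in residues)
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]!r}") from None
```

A mutation such as H840A (mentioned later for nCas9) reads, in this notation, as histidine at position 840 replaced by alanine.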

The term "regulatory element", also called "control element", as used herein is intended to include promoters, terminator sequences, leader sequences, polyadenylation sequences, signal peptide coding regions, marker genes, enhancers, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences); for a detailed description, see Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, California (1990). In some cases, regulatory elements include sequences that direct constitutive expression of a nucleotide sequence in many types of host cells and sequences that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, a specific organ (e.g., liver, pancreas), or a particular cell type (e.g., lymphocytes). In some cases, regulatory elements may also direct expression in a temporally dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not also be tissue- or cell type-specific. In some cases, the term "regulatory element" encompasses enhancer elements such as WPRE; the CMV enhancer; the R-U5' segment in the LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), pp. 466-472, 1988); the SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA, Vol. 78(3), pp. 1527-31, 1981).

The term "promoter" has the meaning well known to those skilled in the art and refers to a non-coding nucleotide sequence located upstream of a gene that can initiate expression of the downstream gene. A constitutive promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, causes the gene product to be produced in a cell under most or all physiological conditions of the cell. An inducible promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, causes the gene product to be produced in a cell essentially only when an inducer corresponding to the promoter is present in the cell. A tissue-specific promoter is a nucleotide sequence that, when operably linked to a polynucleotide encoding or defining a gene product, causes the gene product to be produced in a cell essentially only when the cell is of the tissue type corresponding to the promoter.

The term "nuclear localization signal" or "nuclear localization sequence" (NLS) refers to an amino acid sequence that "tags" a protein for import into the cell nucleus by nuclear transport; that is, a protein bearing an NLS is transported to the nucleus. Typically, an NLS comprises positively charged Lys or Arg residues exposed on the protein surface. Exemplary nuclear localization sequences include, but are not limited to, NLSs from the SV40 large T antigen, EGL-13, c-Myc, and the TUS protein.

The term "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a host cell when the vector is introduced into the host cell).

The nucleic acid sequence, nucleic acid construct, or expression vector of the present invention can be introduced into host cells by a variety of techniques, including transformation, transfection, transduction, viral infection, gene gun or Ti-plasmid-mediated gene delivery, as well as calcium phosphate transfection, DEAE-dextran-mediated transfection, lipofection, or electroporation.

CRISPR system

As used herein, the terms "clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system" and "CRISPR system" are used interchangeably and have the meaning generally understood by those skilled in the art. Such a system generally comprises transcripts or other elements involved in the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of the Cas genes.

Cas protein

A Cas protein, or CRISPR-associated protein, refers to a nuclease suitable for use in a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system. Preferably, the Cas protein is a CRISPR enzyme, including but not limited to a Cas9 protein, Cas12 protein, Cas13 protein, Cas14 protein, Csm1 protein, or FDK1 protein. Cas proteins may differ in structure depending on their source, such as SpCas9 from Streptococcus pyogenes and SaCas9 from Staphylococcus aureus; they may also be further subclassified by structural features (such as domains), for example the Cas12 family includes Cas12a (also known as Cpf1), Cas12b, Cas12c, Cas12i, and others. The Cas protein may have double-strand cleavage activity, single-strand cleavage activity, or no cleavage activity. The Cas protein of the present invention may be wild-type or a mutant thereof; the mutations include amino acid replacement, substitution, or deletion, and may or may not alter the cleavage activity of the Cas protein. For example, nCas9 refers to the Cas9 (H840A) mutant, which has single-strand nucleic acid cleavage activity. As those skilled in the art know, many Cas proteins with nucleic acid cleavage activity have been reported in the prior art; these known proteins and their engineered variants can all achieve the functions of the present invention and are included in the scope of protection herein by reference.

gRNA

As used herein, the term "gRNA" or "CRISPR RNA" refers to a guide RNA suitable for use in a CRISPR system, comprising a guide sequence (also called a spacer sequence) and a backbone region. The backbone region can interact with the CRISPR protein (i.e., the Cas protein) so that the Cas protein and the gRNA form a complex, and it directs the complex to bind the target nucleic acid; the guide sequence (spacer sequence) is complementary to the target nucleic acid sequence.

Prime Editing technology

Prime Editing technology, as described in patents CN113891936A, CN113891937A, CN114127285A, or CN114729365A, refers to a method of gene editing that uses a nucleic acid programmable DNA binding protein (such as Cas9), a polymerase (such as a reverse transcriptase), and a prime editing guide RNA (PEgRNA). The PEgRNA comprises a guide RNA (composed of a guide sequence and a backbone sequence) and an extension arm; the extension arm comprises a DNA synthesis template (the RTT, which includes an editing template and a homology arm) and a primer binding site (PBS). The principle of prime editing is as follows: the nucleic acid programmable DNA binding protein (such as Cas9) cuts one strand (the non-target strand) of the target nucleic acid; the nicked strand interacts with the extension arm of the PEgRNA, which primes polymerization; the polymerase (such as a reverse transcriptase) synthesizes a single-stranded DNA (ssDNA) containing the sequence of interest; because it contains the homology arm sequence, this DNA strand can hybridize with the endogenous target nucleic acid strand and replace the original DNA strand, so that the sequence of interest is finally incorporated into the target nucleic acid.
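To make the sequence of events above concrete, the following Python sketch models a substitution-type prime edit on a single strand as plain string operations. It is a toy model under stated assumptions, not the method of the invention: it ignores PAM recognition, Cas12i-specific cleavage geometry, flap resolution, and DNA repair; it assumes the PBS hybridizes to the bases immediately 5' of the nick and that the newly synthesized DNA (the reverse complement of the RTT) replaces an equal-length stretch 3' of the nick. The function names and the example sequences are invented for illustration.

```python
# Toy model only: names, sequences, and simplifications are assumptions,
# not the patent's method or any library's API.

_PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA string, returned as DNA letters."""
    dna = seq.upper().replace("U", "T")
    return "".join(_PAIRING[base] for base in reversed(dna))


def simulate_prime_edit(nicked_strand, nick_index, pbs_rna, rtt_rna):
    """Return the edited strand (5' to 3') after a substitution-type prime edit.

    nicked_strand: DNA strand that is nicked, written 5' to 3'.
    nick_index:    number of bases 5' of the nick.
    pbs_rna:       primer binding site of the PEgRNA, written 5' to 3'.
    rtt_rna:       reverse transcription template of the PEgRNA, written 5' to 3'.
    """
    primer = nicked_strand[:nick_index]  # free 3' end exposed by the nick
    # The PBS must be complementary to the 3' end of the nicked strand.
    if not primer.endswith(reverse_complement(pbs_rna)):
        raise ValueError("PBS does not hybridize to the 3' end at the nick")
    # The reverse transcriptase copies the RTT into DNA from the primed 3' end.
    new_dna = reverse_complement(rtt_rna)
    # Simplification: the new DNA replaces an equal-length stretch 3' of the nick.
    return primer + new_dna + nicked_strand[nick_index + len(new_dna):]
```

With the invented inputs `simulate_prime_edit("ACGTTGCAAGGCTTACGATC", 10, "CUUGCA", "GUUAGC")`, the PBS pairs with `TGCAAG` upstream of the nick and the RTT templates `GCTAAC`, so a single T is replaced by A downstream of the nick.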

The sequences involved in the present invention are as follows:


The main advantages of the present invention:

The present invention provides a fusion protein that can be used for prime editing. For the first time, the Type V Cas protein Cas12i is combined with a reverse transcriptase to perform prime editing, which has broad application prospects.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. Schematic diagrams of PE vector construction. Figure 1a shows vector SF01-PE, in which G-quadruplex is the guanine quadruplex (SEQ ID NO.8); tgcrRNA is the PEgRNA (SEQ ID NO.9, SEQ ID NO.10, or SEQ ID NO.11); U6 is the PEgRNA promoter (SEQ ID NO.12); CMV is the fusion protein promoter (SEQ ID NO.13); RT is the reverse transcriptase (SEQ ID NO.2); and SF01 is the Cas protein (SEQ ID NO.1). Figure 1b shows vector SF01-Brex, which contains the same elements as Figure 1a and, in addition, Brex, the Brex27 single-stranded binding protein (SEQ ID NO.5). Figure 1c shows vector SF01-LinkerBrex, which contains the same elements as Figure 1b and, in addition, Linker, the XTEN linker (SEQ ID NO.3), which connects the single-stranded binding protein to the Cas protein.

Figure 2. Precise replacement editing efficiency of the PE vector system.

Figure 3. Schematic of the precise editing principle based on the PE technology of the present invention.

Figure 4. Schematic diagram of PE vector construction.

Figure 5. Precise editing efficiency and indel efficiency of the PE vectors.

Figure 6. Schematic diagram of the editing positions of the PE vectors.

Figure 7. Efficiency of base replacement by the PE vectors at different positions.

Figure 8. Length range of bases replaced by the PE vectors.

Figure 9. Position and length results for base deletion by the PE vectors.

Embodiments

The following examples are intended only to illustrate the present invention, not to limit it. Unless otherwise specified, the experiments and methods described in the examples are carried out essentially according to conventional methods well known in the art and described in various references. For example, for the conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA used in the present invention, see Sambrook, Fritsch, and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et al., eds. (1987)); the METHODS IN ENZYMOLOGY series (Academic Press); PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D. Hames, and G.R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988), ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).

In addition, where specific conditions are not indicated in the examples, the procedures are carried out under conventional conditions or the conditions recommended by the manufacturer. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the present invention by way of illustration and are not intended to limit the scope of protection claimed. All publications and other references mentioned herein are incorporated herein by reference in their entirety.

Example 1. Construction of the PE vector system

For a known Cas protein (Cas12f.4 in CN111757889B, referred to as Cas12i in this example), the applicant used bioinformatics to predict key amino acid sites that may affect its biological function and mutated those sites, obtaining a Cas mutant protein with improved editing activity, whose amino acid sequence is shown in SEQ ID NO.1. The mutant protein was linked to a reverse transcriptase, whose amino acid sequence is shown in SEQ ID NO.2, to form a fusion protein. Three target sites were selected in the DNMT1 gene, and the corresponding PEgRNA sequences were designed as shown in Table 2 below (bold lowercase: gRNA backbone sequence; italics: gRNA guide sequence; underlined: RTT sequence, in which the bold uppercase nucleotides are those replaced by PE precise editing; uppercase: PBS sequence; the remainder: the homology arm sequence of the nucleic acid extension arm).

Table 2. PEgRNA sequence information

The vector pcDNA3.3 was modified to carry the EGFP fluorescent protein and the PuroR resistance gene. The SV40 NLS-Cas-XX fusion protein was inserted via the XbaI and PstI restriction sites, and the U6 promoter and PEgRNA sequence were inserted via the MfeI restriction site. The CMV promoter drives expression of the fusion protein SV40 NLS-Cas-XX-NLS-GFP, in which Cas-XX-NLS and GFP are joined by the T2A linker peptide. The EF-1α promoter drives expression of the puromycin resistance gene.

The constructed PE vectors are shown in Figure 1: Figure 1a shows the vector without the Brex27 single-strand binding protein, Figure 1b shows the Cas protein linked directly to the Brex27 single-strand binding protein, and Figure 1c shows the Cas protein linked to the Brex27 single-strand binding protein through a linker.

As shown in Figure 3, the complex comprising the fusion protein and the PEgRNA contacts the double-stranded DNA and nicks it, exposing free single-stranded DNA at the 3' end of the target strand. The 3' end of this free single-stranded DNA hybridizes with the primer binding site (PBS), which activates the reverse transcriptase to polymerize a DNA strand from the hybridized 3' end using the RTT as template. This produces a single-stranded DNA that contains the substituted nucleotides and is complementary to the RTT. The cell's endogenous DNA repair machinery then replaces the endogenous DNA strand adjacent to the cleavage site with this single-stranded DNA, producing a precise edit in the double-stranded DNA sequence.
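
The priming and template-copying steps described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the PBS anneals to the 3'-terminal bases of the nicked strand and that the reverse transcriptase then extends that 3' end by copying the RTT. The function names and the example sequences are hypothetical and are not taken from Table 2 or the sequence listing.

```python
# Toy model of PBS annealing and RTT-templated DNA synthesis.
# All sequences are written 5'->3'. Names and sequences are illustrative only.

# Complement of each template base, expressed as DNA.
_TEMPLATE_TO_DNA = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(template: str) -> str:
    """Return the DNA strand (5'->3') complementary to a template (5'->3')."""
    return "".join(_TEMPLATE_TO_DNA[base] for base in reversed(template.upper()))


def pbs_anneals(pbs_rna: str, nicked_strand: str) -> bool:
    """True if the PBS is fully complementary to the 3' end of the nicked strand."""
    return nicked_strand.upper().endswith(reverse_transcribe(pbs_rna))


def extend_nicked_strand(nicked_strand: str, pbs_rna: str, rtt_rna: str) -> str:
    """Extend the free 3' end with DNA copied from the RTT (assumed model)."""
    if not pbs_anneals(pbs_rna, nicked_strand):
        raise ValueError("PBS does not hybridize to the 3' end of the nicked strand")
    return nicked_strand.upper() + reverse_transcribe(rtt_rna)


if __name__ == "__main__":
    # Made-up sequences: the new 3' flap is the reverse complement of the RTT.
    print(extend_nicked_strand("GGATCCAAGC", "GCUUG", "ACAGU"))
```

The sketch stops at the synthesized flap; strand replacement by the cell's repair machinery is not modeled.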

Example 2. Verification of the editing activity of the PE vector system

The editing activity of the PE vector system was verified in 293T cells by transfecting each constructed vector separately. Plating: 293T cells were passaged at 70-80% confluence and seeded in 12-well plates at 8 × 10^4 cells/well. Transfection: 24 h after plating, 6.25 μl of Hieff Trans™ liposomal nucleic acid transfection reagent was added to 100 μl of Opti-MEM and mixed; separately, 2.5 μg of plasmid was added to 100 μl of Opti-MEM and mixed. The diluted reagent and the diluted plasmid were combined, mixed well, and incubated at room temperature for 20 min, and the mixture was then added to the culture medium of the plated cells. 48 h after transfection, the cells were digested with trypsin-EDTA (0.05%), and GFP-positive cells were sorted by flow cytometry (FACS).
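
When several wells are transfected at once, the per-well amounts above scale linearly. The following Python sketch shows that arithmetic. The per-well values come from this example; the `overage` parameter (extra volume to cover pipetting loss) and all names are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerWell:
    """Per-well amounts for a 12-well plate, as stated in Example 2."""
    cells: int = 80_000                    # 8 x 10^4 cells/well
    reagent_ul: float = 6.25               # Hieff Trans reagent
    optimem_for_reagent_ul: float = 100.0  # Opti-MEM used to dilute the reagent
    plasmid_ug: float = 2.5                # plasmid DNA
    optimem_for_plasmid_ul: float = 100.0  # Opti-MEM used to dilute the plasmid


def master_mix(n_wells: int, overage: float = 0.0, per_well: PerWell = PerWell()) -> dict:
    """Scale the per-well amounts to n_wells.

    `overage` is a hypothetical fractional excess (e.g. 0.1 for 10%) and is
    not part of the original protocol. Cells are not scaled by overage.
    """
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1.0 + overage)
    return {
        "cells": n_wells * per_well.cells,
        "reagent_ul": per_well.reagent_ul * factor,
        "optimem_for_reagent_ul": per_well.optimem_for_reagent_ul * factor,
        "plasmid_ug": per_well.plasmid_ug * factor,
        "optimem_for_plasmid_ul": per_well.optimem_for_plasmid_ul * factor,
    }


if __name__ == "__main__":
    for name, amount in master_mix(4).items():
        print(f"{name}: {amount}")
```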

DNA extraction, PCR amplification of the region around the edit site, and hiTOM sequencing: the cells were collected after trypsin digestion, and genomic DNA was extracted with a cell/tissue genomic DNA extraction kit (百泰克, BioTeke). The region around the target site was amplified from the genomic DNA by PCR, and the PCR products were submitted for hiTOM sequencing. In the sequencing data analysis, the sequence types and their proportions were counted within a window from 15 nt upstream to 10 nt downstream of the target site, and the frequencies of precise substitution and of large-fragment deletion were determined.
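
The window-based counting described above can be sketched as follows. This is a simplified illustration, not the hiTOM analysis pipeline: it assumes reads have already been aligned to the amplicon so that every read shares the reference coordinates, with `-` marking deleted bases. The function name, the category names, and the `large_deletion_min` threshold are hypothetical, since the text does not define how large a deletion must be to count as a large-fragment deletion.

```python
from collections import Counter


def classify_reads(aligned_reads, reference, edited, target_pos,
                   upstream=15, downstream=10, large_deletion_min=10):
    """Return the proportion of each read category within the analysis window.

    aligned_reads: reads in reference coordinates, '-' for deleted bases (assumed).
    reference / edited: unedited and precisely edited amplicon sequences.
    target_pos: 0-based index of the target site in the amplicon.
    """
    lo = max(0, target_pos - upstream)
    hi = target_pos + downstream
    ref_window, edit_window = reference[lo:hi], edited[lo:hi]

    counts = Counter()
    for read in aligned_reads:
        window = read[lo:hi]
        if window == edit_window:
            counts["precise_replacement"] += 1
        elif window == ref_window:
            counts["unedited"] += 1
        elif window.count("-") >= large_deletion_min:
            counts["large_deletion"] += 1
        else:
            counts["other"] += 1

    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up amplicon: only the TGTC -> ACAG change is taken from the text.
    ref = "A" * 15 + "TGTC" + "G" * 10
    edit = "A" * 15 + "ACAG" + "G" * 10
    deleted = "A" * 10 + "-" * 12 + "G" * 7
    print(classify_reads([edit, edit, ref, deleted], ref, edit, target_pos=15))
```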

hiTOM sequencing primers were designed for each target site:

The results for each target site in 293T cells are shown in Figure 2. TGTC-ACAG denotes target 1: when precise editing occurs, TGTC at target 1 is replaced by ACAG. ACCT-TGCA denotes target 2: when precise editing occurs, ACCT at target 2 is replaced by TGCA. CGG-GCA denotes target 3: when precise editing occurs, CGG at target 3 is replaced by GCA.
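
The three expected substitutions can be written as a small lookup table, which is convenient for building the expected edited allele from a reference sequence. In this Python sketch only the original and replacement motifs come from the text; the key names, the function, and the flanking bases in the usage example are hypothetical.

```python
# Expected precise substitutions per target site: (original motif, replacement motif).
EXPECTED_EDITS = {
    "target1": ("TGTC", "ACAG"),
    "target2": ("ACCT", "TGCA"),
    "target3": ("CGG", "GCA"),
}


def apply_expected_edit(reference: str, offset: int, target: str) -> str:
    """Return the reference with the expected substitution applied at `offset`.

    `offset` is the 0-based position of the original motif in `reference`.
    Raises ValueError if the original motif is not found at that position.
    """
    original, replacement = EXPECTED_EDITS[target]
    end = offset + len(original)
    if reference[offset:end].upper() != original:
        raise ValueError(f"{original} not found at offset {offset}")
    return reference[:offset] + replacement + reference[end:]


if __name__ == "__main__":
    # Flanking bases are invented for the example.
    print(apply_expected_edit("AATGTCGG", 2, "target1"))
```

Each replacement has the same length as the motif it replaces, so the edited allele keeps the reference coordinates.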

As shown in Figure 2, the constructed PE editing systems all showed clear editing activity at the different target sites. At targets 1-3, the precise-substitution editing efficiency of the SF01-PE vector system was markedly higher than that of the SF01-Brex and SF01-LinkerBrex vector systems.

Example 3. Construction and activity verification of the PE vector system

The Cas mutant protein of Example 1 (Cas-SF01, amino acid sequence shown in SEQ ID NO.1) was linked to the reverse transcriptase M-MLV (amino acid sequence shown in SEQ ID NO.2) to form a fusion protein. Target 1 of Example 1 in the DNMT1 gene was selected, and the experiment was carried out with the PEgRNA1 sequence of Example 1. The experimental method was the same as in Example 1. Four PE vectors were constructed, as shown in Figure 4. Figures 4a and 4c are PE vectors without Ubv (a ubiquitin-like modifier protein). In Figure 4a, the N-terminus of the Cas protein is linked directly to the reverse transcriptase (MMLV-SF01 in Figure 5). In Figure 4b, the N-terminus of the Cas protein is linked directly to the reverse transcriptase and the C-terminus of the Cas protein is linked directly to Ubv (MMLV-SF01-Ubv in Figure 5). In Figure 4c, the C-terminus of the Cas protein is linked directly to the reverse transcriptase (SF01-MMLV in Figure 5). In Figure 4d, the C-terminus of the Cas protein is linked directly to the reverse transcriptase and the C-terminus of the reverse transcriptase is linked directly to Ubv (SF01-MMLV-Ubv in Figure 5). In Figure 4, SF01 denotes the Cas mutant protein Cas-SF01, RT denotes the reverse transcriptase M-MLV, and Ubv denotes the ubiquitin-like modifier protein (shown in SEQ ID No.15); the other elements are the same as in Figure 1. Ubv is a ubiquitin-like modifier protein that acts mainly on the 53BP1 protein, inhibiting its recruitment to DNA double-strand break sites and thereby promoting precise DNA repair.

The editing activity of these PE vectors was verified by the method described in Example 2; the results are shown in Figure 5. Among the PE vectors without Ubv, the vector in which the N-terminus of the Cas-SF01 protein is linked to the reverse transcriptase MMLV (Figure 4a) gave better precise repair than the vector in which the C-terminus of the Cas-SF01 protein is linked to MMLV (Figure 4c). Fusing Ubv to the PE vector (Figures 4b and 4d) significantly increased precise-repair efficiency. In particular, the vector in which the N-terminus of the Cas protein is linked to the reverse transcriptase and the C-terminus of the Cas protein is linked to Ubv (Figure 4b) gave the best precise repair. In Figure 5, SF01 denotes the Cas mutant protein Cas-SF01, MMLV denotes the reverse transcriptase, and Ubv denotes the ubiquitin-like modifier protein.

Example 4. Editing range of the PE vector system

Using the PE vector constructed in Example 1 and shown in Figure 1a, target 1 of Example 1 in the DNMT1 gene was selected to determine the editing range of the PE vector. The experimental methods were the same as in Examples 1-2. As shown in Figure 6, the first base of the spacer is defined as position 1, and the PAM occupies positions -3 to -1.

The base-substitution range results are shown in Figure 7. PE (MMLV-SF01) edits across positions -3 to 17. Editing efficiency is highest when the substituted bases lie at positions 10-13, with an average efficiency above 30%, significantly higher than the control group. In Figure 7, the x-axis labels P-3,1, P2,5, P6,9, P10,13 and P14,17 give the start and end positions of the four substituted bases at the target site; MMLV-SF01 is the control group, in which the substituted bases are at positions 12-15.

The substitution-length results are shown in Figure 8. PE (MMLV-SF01) can substitute up to 24 bp. Precise editing efficiency is highest for a substitution length of 8 bp (group P10,17), exceeding 30% and significantly higher than the control group. Beyond 8 bp, substitution efficiency declines as the substitution length increases. In Figure 8, the x-axis labels P10,17 (8 bp substituted), P8,19 (12 bp), P6,21 (16 bp), P4,23 (20 bp) and P2,25 (24 bp) give the start and end positions of the substituted bases at the target site; MMLV-SF01 is the control group, in which the four bases at positions 12-15 are substituted.

PE (MMLV-SF01) was also designed to produce base deletions; the range and length results are shown in Figure 9. Deletions by PE (MMLV-SF01) span positions -7 to 23, and the deletion length can reach 30 bp. Precise editing efficiency is highest when the deleted bases lie at positions -1 to 23, a deletion length of 24 bp (group P-1,23). For deletions of ≤24 bp, precise editing efficiency increases with deletion length. In Figure 9, the x-axis labels P15,18 (4 bp deleted), P9,16 (8 bp), P10,21 (12 bp), P8,23 (16 bp), P-1,23 (24 bp) and P-7,23 (30 bp) give the start and end positions of the deleted bases at the target site; MMLV-SF01 is the control group, in which the four bases at positions 12-15 are substituted.
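
The position labels used in Figures 7 to 9 can be checked with a short Python sketch. It relies on one inference from the text: because the spacer starts at position 1 and the PAM occupies positions -3 to -1, the numbering appears to have no position 0, which is consistent with every length quoted above (for example, P-1,23 spans 24 bases). The function names are hypothetical.

```python
import re

_LABEL = re.compile(r"^P(-?\d+),(-?\d+)$")


def parse_label(label: str) -> tuple:
    """Parse a label such as 'P-3,1' into (start, end) positions."""
    match = _LABEL.match(label.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start == 0 or end == 0 or start > end:
        raise ValueError(f"invalid positions in label: {label!r}")
    return start, end


def span_length(start: int, end: int) -> int:
    """Number of bases from start to end inclusive, assuming no position 0."""
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the numbering jumps from -1 directly to 1
    return length


if __name__ == "__main__":
    for label in ("P-3,1", "P10,13", "P10,17", "P2,25", "P-1,23", "P-7,23"):
        print(label, span_length(*parse_label(label)))
```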

Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and changes may be made to the details, and that such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.

Claims (10)

1. A fusion protein comprising a Cas protein and a reverse transcriptase, characterized in that the Cas protein is a Type V Cas protein or a variant thereof; preferably, the Cas protein is Cas12i; more preferably, the amino acid sequence of the Cas protein is as shown in SEQ ID NO.1; the reverse transcriptase is selected from M-MLV reverse transcriptase, AMV reverse transcriptase and others; preferably, the reverse transcriptase is M-MLV reverse transcriptase; preferably, the amino acid sequence of the reverse transcriptase is as shown in SEQ ID NO.2; preferably, the fusion protein comprises a linker connecting the Cas protein and the reverse transcriptase.

2. The fusion protein according to claim 1, characterized in that one end of the Cas protein is linked to the reverse transcriptase and the other end may additionally be linked to a single-strand binding protein or a ubiquitin-like modifier protein; or, one end of the reverse transcriptase is linked to the Cas protein and the other end may additionally be linked to a single-strand binding protein or a ubiquitin-like modifier protein;
preferably, the single-strand binding protein is selected from Brex27, EcRecA, BsRecA or T4SSB, and the amino acid sequence of the ubiquitin-like modifier protein is as shown in SEQ ID No.15; more preferably, the single-strand binding protein is Brex27;
preferably, the reverse transcriptase is linked to the N-terminus or the C-terminus of the Cas protein; preferably, the reverse transcriptase is linked to the N-terminus of the Cas protein;
preferably, the linkage may be direct or through a linker.

3. A complex for prime editing, characterized in that the complex comprises:
(i) a protein component selected from: the fusion protein according to any one of claims 1-2;
(ii) a nucleic acid component, which is a prime editing guide RNA (PEgRNA), the PEgRNA comprising a guide RNA (gRNA) and a nucleic acid extension arm, the nucleic acid extension arm comprising a primer binding site sequence (PBS) and a reverse transcription template sequence (RTT); preferably, the nucleic acid extension arm may further comprise a homology arm sequence, and the gRNA comprises a guide sequence and a backbone sequence;
the protein component and the nucleic acid component bind to each other to form the complex.

4. An isolated polynucleotide, characterized in that the polynucleotide is a polynucleotide sequence encoding the fusion protein according to any one of claims 1-2, or a polynucleotide sequence encoding the complex according to claim 3.

5. A vector, characterized in that the vector comprises the polynucleotide according to claim 4 and a regulatory element operably linked thereto.

6. An engineered host cell, characterized in that the host cell comprises the fusion protein according to any one of claims 1-2, or the complex according to claim 3, or the polynucleotide according to claim 4, or the vector according to claim 5.

7. Use of the fusion protein according to any one of claims 1-2, or the complex according to claim 3, or the polynucleotide according to claim 4, or the vector according to claim 5, or the host cell according to claim 6 in gene editing, gene targeting or gene cleavage; or, use in the preparation of a reagent or kit for gene editing, gene targeting or gene cleavage.

8. Use of the fusion protein according to any one of claims 1-2, or the complex according to claim 3, or the polynucleotide according to claim 4, or the vector according to claim 5, or the host cell according to claim 6 in the preparation of a formulation or kit, the formulation or kit being used for:
(i) gene or genome editing;
(ii) target nucleic acid detection and/or diagnosis;
(iii) editing a target sequence in a target locus to modify an organism;
(iv) treatment of disease;
(v) targeting a target gene;
(vi) cleaving a gene of interest.

9. A method for gene editing, gene targeting or gene cleavage, the method comprising: contacting a target nucleic acid sequence with the fusion protein according to any one of claims 1-2, or the complex according to claim 3, or the polynucleotide according to claim 4, or the vector according to claim 5, or the host cell according to claim 6.

10. A method for introducing a desired nucleotide change into a double-stranded DNA sequence, characterized in that the method comprises: contacting the double-stranded DNA sequence with a complex comprising the fusion protein according to claim 1 or 2 and a PEgRNA, wherein the fusion protein comprises a Cas protein and a reverse transcriptase, and the PEgRNA comprises a DNA synthesis template (reverse transcription template sequence, RTT) containing the desired nucleotide change and a primer binding site; thereby nicking the double-stranded DNA sequence and producing free single-stranded DNA at the 3' end of the target strand; thereby hybridizing the 3' end of the free single-stranded DNA with the primer binding site, which activates the reverse transcriptase to polymerize a DNA strand from the hybridized 3' end using the RTT as template, producing a single-stranded DNA that contains the desired nucleotide change and is complementary to the RTT sequence; and thereby using the cell's endogenous DNA repair machinery to replace the endogenous DNA strand adjacent to the cleavage site with the single-stranded DNA, installing the desired nucleotide change in the double-stranded DNA sequence.
Application number: PCT/CN2024/140239. Priority date: 2023-12-22. Filing date: 2024-12-18. Title: Composition and method for prime editing technique. Status: Pending. Publication: WO2025130907A1 (en).

Applications Claiming Priority (2)

Application number, priority date:
CN202311777306, 2023-12-22
CN202311777306.3, 2023-12-22

Publications (1)

WO2025130907A1 (en), published 2025-06-26

Family ID: 95040672

Family Applications (1)

PCT/CN2024/140239 (Pending, WO2025130907A1 (en)): Composition and method for prime editing technique; priority date 2023-12-22; filing date 2024-12-18

Country Status (2)

CN (1): CN119685290A (en)
WO (1): WO2025130907A1 (en)


Also Published As

CN119685290A (en), published 2025-03-25


Legal Events

Code 121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 24906386; country of ref document: EP; kind code of ref document: A1.

