CN117083378A

Movatterモバイル変換

Info

Publication number: CN117083378A
Application number: CN202180092668.1A
Authority: CN
Inventors: K·巴尼; A·西多尔; C·福图尼; M·阿迪尔; A·莱特; B·T·斯塔赫; S·希金斯; B·奥克斯; S·马基亚; S·丹尼; M·莫尔
Original assignee: Scribe Therapy
Current assignee: Scribe Therapy
Priority date: 2020-12-09
Filing date: 2021-12-09
Publication date: 2023-11-17

Abstract

Provided herein are polynucleotides configured for incorporation into recombinant adeno-associated virus (AAV) vectors. These polynucleotides encode CRISPR proteins, gRNA and helper components of AAV vectors for modification of target nucleic acids. The system can also be used to introduce cells, such as eukaryotic cells having mutations in the target nucleic acid of a gene. Methods of using such AAV vectors to modify cells having such mutations are also provided.

Description

AAV vectors for gene editing

Cross Reference to Related Applications

The present application claims priority from U.S. provisional patent application Ser. No. 63/123,112, filed on Ser. No. 12/9, 2020, and U.S. provisional patent application Ser. No. 63/235,638, filed on 8/20, 2021, the contents of which are incorporated herein by reference in their entirety.

Incorporation by reference of sequence Listing

The present sequence listing paragraph application contains a sequence listing submitted in ASCII format via EFS-WEB and is hereby incorporated by reference in its entirety. The ASCII copy was created at 2021, 12 months 9, under the name SCRB-028_02WO_SeqList_ST25.Txt, and was 13MB in size.

Background

Gene editing is very promising for the treatment or prevention of many genetic diseases. However, safe and targeted delivery of CRISPR gene editing mechanisms into desired cells is necessary to achieve therapeutic benefit. There remains a need in the art for compositions and methods for delivering CRISPR gene editing mechanisms to cells in vitro and/or in vivo.

Disclosure of Invention

The present disclosure relates to AAV vectors for delivering CRISPR nucleases to cells to modify target nucleic acids.

In some embodiments, the disclosure provides polynucleotides for producing AAV transgenes (e.g., transgenic plasmids) and for producing recombinant adeno-associated virus (AAV) vectors. In some embodiments, the disclosure provides polynucleotides encoding a first adeno-associated virus (AAV) 5 'Inverted Terminal Repeat (ITR) sequence, a second AAV 3' ITR sequence, a CRISPR nuclease, a first guide RNA (gRNA), one or more promoters, and optionally auxiliary elements; all of these are contained in a single expression cassette that can be incorporated into a single AAV particle. In other embodiments, the polynucleotide comprises sequences encoding a first 5'aav ITR sequence, a second 3' aav ITR sequence, a CRISPR nuclease, a first gRNA, a first promoter, a second promoter, and optionally one or more accessory elements. In other embodiments, the polynucleotide comprises a sequence encoding a first 5'aav ITR sequence, a second 3' aav ITR sequence, a CRISPR nuclease, a first gRNA, a second gRNA, a first promoter, a second promoter, a third promoter, and optionally one or more accessory elements.

In some embodiments, the combined length of the sequence encoding the CRISPR protein and the gRNA sequence is less than about 3100, less than about 3090, less than about 3080, less than about 3070, less than about 3060, less than about 3050, or less than about 3040 nucleotides. In other embodiments, the combined length of the polynucleotides encoding the CRISPR protein sequence and the gRNA sequence is less than about 3040 to about 3100 nucleotides.

In some embodiments, the combined length of the polynucleotide sequences of the first promoter and the at least one auxiliary element is greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides. In other embodiments, the combined length of the polynucleotide sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than at least about 1300 to at least about 1900 nucleotides. In some embodiments, the combined length of the polynucleotide sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1314 nucleotides. In other embodiments, the first promoter, the second promoter, and the polynucleotide sequences of two or more auxiliary elements are greater than 1381 nucleotides in combined length. In one embodiment, the polynucleotide sequences of the first promoter, the second promoter, and the two or more auxiliary elements comprise at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34% or at least 35% or more of the total polynucleotide sequence length.

In some embodiments, the auxiliary element of the polynucleotide is selected from the group consisting of poly (a) signals, gene enhancer elements, introns, post-transcriptional regulatory elements, nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, stimulators of CRISPR-mediated homology directed repair, and activators or repressors of transcription. In some embodiments, the helper element enhances the expression, binding, activity or performance of the CRISPR protein as compared to the CRISPR protein in the absence of the helper element. In particular embodiments, the enhanced performance is an increase in editing of the target nucleic acid of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 1500%, at least about 200%, or at least about 300% after expression of the CRISPR component in an in vitro assay.

In some embodiments, the present disclosure provides a polynucleotide encoding a CRISPR protein that is a class 2V CRISPR protein. In some embodiments, the class 2V CRISPR protein is CasX. In some embodiments, casX comprises a sequence selected from the group consisting of sequences of SEQ ID NOS: 1-3 and SEQ ID NOS: 49-160, 40208-40369 and 40828-40912, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto. In some embodiments, the present disclosure provides a polynucleotide encoding a class 2V CRISPR protein, wherein the encoded CRISPR protein comprises the sequence of SEQ ID NO:145, the sequence comprising at least one modification in one or more domains, wherein the one or more modifications are selected from the modifications shown in tables 30-33, wherein the one or more modifications result in improved characteristics of the CRISPR protein relative to SEQ ID NO: 145.

In some embodiments, the polynucleotide encodes a first gRNA and a second gRNA, wherein the encoded grnas each comprise a sequence selected from the sequences of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in table 2, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto. In some embodiments, the encoded first and second grnas comprise a scaffold sequence having one or more modifications relative to SEQ ID No. 2238, wherein the one or more modifications result in improved characteristics of the expressed first and second grnas, wherein the one or more modifications comprise one or more nucleotide substitutions, insertions, and/or deletions as shown in table 28, wherein the one or more modifications result in improved characteristics of the expressed first and second grnas. In another embodiment, the encoded first and second grnas comprise a scaffold sequence having one or more modifications relative to SEQ ID NO:2239, wherein the one or more modifications result in improved characteristics of the expressed first and second grnas, wherein the one or more modifications comprise one or more nucleotide substitutions, insertions, and/or deletions as shown in table 28, wherein the one or more modifications result in improved characteristics of the expressed first and second grnas.

In some embodiments, the polynucleotide comprises 5 'and 3' ITRs, wherein the ITRs are derived from serotypes AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

In some embodiments, the polynucleotide comprises one or more sequences selected from the group consisting of the sequences of tables 8-10, 12, 13 and 17-22 and 24-27, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

In other embodiments, the disclosure provides a recombinant adeno-associated virus (rAAV) comprising an AAV capsid protein and a polynucleotide of any of the embodiments disclosed herein. In some embodiments, the AAV capsid protein is derived from serotype AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

In some embodiments, the present disclosure provides a method of preparing a recombinant AAV vector comprising providing a population of cells; and transfecting the population of cells with a vector comprising the polynucleotide of any one of the embodiments disclosed herein. In some embodiments, the cell population expresses AAV rep and cap proteins.

In some embodiments, the disclosure provides AAV vectors wherein one or more component sequences selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, an auxiliary element, and a poly (a) are substantially depleted of CpG dinucleotides, wherein the component sequences retain their functional characteristics (e.g., the ability to drive expression or the ability to retain the editing potential of a target nucleic acid). In some embodiments, AAV vectors that are substantially depleted of CpG dinucleotides exhibit reduced immunogenic properties (e.g., reduced ability to elicit antibodies to inflammatory cytokines or AAV components), e.g., upon administration.

In some embodiments, the disclosure provides a method for modifying a target nucleic acid in a population of mammalian cells, comprising contacting a plurality of cells with an effective amount of a rAAV of any of the embodiments disclosed herein, wherein the target nucleic acid of a gene of the cell targeted by the expressed gRNA is modified by the expressed CRISPR protein.

In some embodiments, the disclosure provides a method for treating a disease in a subject (e.g., a human) caused by one or more mutations in a gene of the subject, comprising administering a therapeutically effective dose of a rAAV of any of the embodiments disclosed herein.

In some embodiments, the disclosure provides a method of reducing immunogenicity of a rAAV comprising deleting all or a portion of a CpG dinucleotide selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence of a CRISPR nuclease, a coding sequence of a gRNA, a helper element, and an AAV component sequence of a poly (a).

Incorporated by reference

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

The contents of PCT/US2021/061673 filed on Ser. No. 12/10 of 2020 and Ser. No. 12/2021 are incorporated herein by reference in their entirety.

Drawings

The novel features believed characteristic of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description and drawings that set forth an illustrative embodiment in which the principles of the disclosure are utilized, in which:

FIG. 1 shows a schematic diagram of an AAV construct described in example 1.

Fig. 2 shows the results of an editing assay using AAV transgenic plasmids nuclear transfected into mNPC as described in example 1, which shows that CasX and targeting guide in three different vectors (constructs 1, 2 and 3) were efficiently edited on target (tdtomo) compared to non-targeted control (NT). Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 3 shows the results of an editing assay using AAV transgenic plasmids transfected into mNPC at four different dose levels, as described in example 1. CasX delivered as AAV transgenic plasmid to mNPC was edited efficiently on target in a dose-dependent manner compared to non-targeted control (NT). The CasX variants 491 in three different vectors (constructs 1, 2 and 3) were nuclear transfected with scaffold variant 174 and tdTomato-targeting spacer in mNPC and edited by FACS evaluation 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

Fig. 4 shows the results of an editing assay using AAV vector construct 3 transduced into mNPC at 3-fold dilution, assessed five days post transduction by FACS, as described in example 1. Data are expressed as mean ± SEM of n=3 replicates. MOI: fold infection.

Fig. 5 is a scanning transmission micrograph showing AAV particles with packaged CasX variants 438, gRNA scaffold 174, and spacer region 12.7, as described in example 2. AAV was negatively stained with 1% uranyl acetate. Empty particles were identified by dark electron circles at the center of the capsid.

FIG. 6 shows immunohistochemical staining results of mouse coronal brain sections, as described in example 3. Mice received 1X 10 packaging CasX 491, gRNA scaffold 174 with spacer 12.7¹¹ ICV injection of AAV (upper panel) capable of editing tdTom locus in Ai9 mice (edited cells appear white). The lower panel shows CasX 491 administered as AAV ICV injection and scaffold 174 with non-targeting spacer was not edited at the tdTom locus. Immunohistochemical analysis of the tissues was performed 1 month after injection.

FIG. 7 shows the results of an edit-out assay for the tdTom locus in mNPC using AAV transgenic plasmids with constructs with CasX promoter changes, as described in example 4. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 8 shows the results of an edit-out assay for the tdTom locus in mNPC using AAV transgenic plasmids with constructs with CasX promoter changes, as described in example 4. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 9 shows the results of an edit-out assay for tdTom loci in mNPC using AAV transgenic plasmids with a CasX promoter and a transgene size changing construct (see Table insert), as described in example 4. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 10 shows the same start-up as shown in FIG. 9 using incorporationThe AAV vector of the son gave the result of the editing assay for the tdTom locus in mNPC as described in example 4. The left panel is the result of a 3-fold dilution of the test construct, while the right panel is the use of 2X 10⁵ Results of editing vg/cell MOI. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

Fig. 11 shows the results of an editing assay for tdTom locus in mNPC using AAV vectors with protein promoter variants designed to reduce transgene size compared to AAV with the first 4 protein promoter variants previously identified (aav.3, aav.4, aav.5, and aav.6), as described in example 4. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates. The dashed line shows the editing level of aav.4, which AAV construct was used in this experiment as a baseline for comparison across variants.

Figure 12 is a graph of percent editing versus transgene size for all constructs with different promoters tested in this study. Constructs circled with dashes were identified as having the average edits described above while minimizing transgene size. The dashed line shows the editing level of aav.4, which AAV construct was used in this experiment as a baseline for comparison across variants.

FIG. 13 shows the results of an edit assay for mNPC using AAV transgenic plasmids with variation in the strength of the gRNA promoter, as described in example 5. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 14 shows the results of an editing assay for mNPC using three different AAV vectors with varying strength of the gRNA promoter, as described in example 5. The left graph is in the range of 1×10⁴ Up to 5X 10⁵ Test results for 3-fold dilutions of vg/cell construct, while the right panel is using 3×10⁵ Results of editing vg/cell MOI. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 15 is a bar graph showing the percent editing of the tdTom locus in mNPC in experiments aimed at assessing the use of truncated U6 RNA promoters in constructs when delivered in AAV transgenic plasmids designed to minimize the footprint of the Pol III promoter in the delivered transgene, as described in example 5. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

Fig. 16 is a bar graph showing the percent editing of tdTom loci in mNPC comparing base construct 53 and construct 85 when delivered as an AAV vector designed to minimize the footprint of the Pol III promoter in the delivered transgene, as described in example 5.

Fig. 17 is a bar graph showing the results of editing of the tdTom locus in experiments aimed at assessing the effect of constructs with engineered U6 RNA promoters when delivered to mNPC in AAV vectors designed to minimize the footprint of the Pol III promoter in AAV transgenes, as described in example 5. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 18 is a scatter plot depicting transgene size versus percentage of mNPC edited on the Y axis for all AAV variants tested with an engineered U6 RNA promoter on the X axis, as described in example 5. The dashed line represents construct 53 with the largest promoter tested, while the dotted line represents construct 89 with the smallest promoter tested.

Fig. 19 shows the results of an editing assay of the tdTom locus in mNPC in experiments aimed at assessing the effect of constructs with engineered Pol III RNA promoters when delivered in AAV vectors designed to minimize the footprint of the Pol III promoter in AAV transgenes, as described in example 5. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 20 is a bar graph showing AAV-mediated editing levels in mNPC at an MOI of 3.0E+5 vg/cell using the indicator construct, as described in example 5.

FIG. 21 is a scatter plot depicting transgene size versus percentage of mNPC edited on the Y axis for all variants tested on the X axis, as described in example 5.

FIG. 22 shows the results of an edit assay for tdTom locus in mNPC using AAV transgenic plasmids with poly (A) signal variation, as described in example 6. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 23 shows the results of an edit assay for tdTom locus in mNPC using two AAV vectors with top poly (A) signal, as described in example 6. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 24 is a schematic of an AAV plasmid construct containing guide RNA transcription units (a stack of gRNA scaffold-spacers driven by the U6 promoter) in different orientations relative to the protein promoter transcription units, as described in example 7. The cone points depict the orientation of the transcriptional units of the protein or guide RNA.

FIG. 25 shows the results of an edit assay for tdTom loci in mNPC using AAV transgenic plasmids with differences in regulatory element orientation, as described in example 7. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 26 shows the results of an editing assay for NPC using AAV vectors, the plasmid construct containing guide RNA transcription units (stacks of gRNA scaffold-spacers driven by the U6 promoter) in different orientations relative to the protein promoter transcription units, as described in example 7. The left graph shows a range of 1×10⁴ Up to 2X 10⁶ 3-fold dilution of vg/cell construct. The bar graph on the right shows the percentage of AAV mediated editing in mNPC at MOI of 3.0e+5vg/cell. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 27 is a bar graph of the results of an edit assay for tdTom loci in mNPC using AAV transgenic plasmid constructs with different post-transcriptional regulatory elements compared to constructs without post-transcriptional regulatory elements, as described in example 8. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 28 is a bar graph showing AAV-mediated editing levels (grey bars) of mNPC at 3.0E+5 viral MOI compared to nuclear transfection editing using 150ng AAV-cis plasmid expressing CasX protein 491 (dark bars) under control of the top promoter without (constructs 4, 5, 6) or in combination with different post-transcriptional regulatory element sequences (constructs 35-37 for base plasmid 4, constructs 38-39 for base plasmid 5, and constructs 42-43 for base plasmid 6), as described in example 8. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 29 is a bar graph showing AAV-mediated editing levels of mNPC of constructs at 3.0E+5 viral MOI without (constructs 58, 59, 53) or in combination with different post-transcriptional regulatory element sequences (constructs 72-74 of base plasmid 58 containing the Jet promoter, constructs 75-77 of base plasmid 59 containing the jet+USP promoter, and constructs 80-81 of base plasmid 53 containing the UbC promoter, respectively), as described in example 8. Editing was assessed by FACS 5 days after transfection. Data (n=3) are expressed as mean ± SEM.

FIG. 30 is a scatter plot comparing the transgene size (in bp from ITR to ITR) of each construct evaluated with AAV-mediated editing levels in mNPC at a MOI of 3.0e+5vg/cell, as described in example 8. Circled data points represent the highest constructs identified in terms of editing level for the selected transgene size. The horizontal gray line shows the editing level of the reference vector aav.53 for comparison purposes. The vertical gray lines define vectors that exceed or fall below the 4.9kb transgene size.

Fig. 31 is a violin graph showing the inclusion of AAV-mediated fold improvement of the indicated PTRE element relative to its base (transgene with the same promoter but without PTRE, indicated by grey dashed line) in the transgenic plasmid, as described in example 8.

FIG. 32 is a bar graph showing the results of editing of constructs with different neuronal enhancers delivered as AAV transgenic plasmids to mNPC, as described in example 8. The gray line shows the level of editing of reference plasmid 64 containing the CMV enhancer + core promoter. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 33 shows a schematic of an AAV construct with an alternative gRNA configuration for constructs with multiple gRNAs, as described in example 9. The top schematic is architecture 1 and the bottom schematic is architecture 2. The cone points depict the orientation of the transcriptional units of the protein or guide RNA.

Figure 34 shows a schematic of an AAV construct with an alternative gRNA configuration with multiple gRNA constructs, as described in example 9. The cone points depict the orientation of the transcriptional units of the protein or guide RNA.

FIG. 35 shows a schematic of the guide RNA stack (Pol III promoter, scaffold, spacer) architecture tested with nuclear transfection and AAV transduction, as described in example 9. The transgene comprises a double stack of different orientations with spacers 12.7, 12.2 and non-target spacers NT. The cone points depict the orientation of the transcriptional units of the protein or guide RNA.

Figure 36 shows the results of an editing assay for constructs with guide RNA stacks delivered to mNPC by plasmid transfection, showing constructs with RNA stack editing with enhanced potency compared to non-targeted control (NT), as described in example 9. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

Figure 37 shows the results of an editing assay for mNPC using AAV transgenic plasmid constructs (see figure 35) with multiple grnas of different architecture and with different spacer combinations compared to construct 3 with a single gRNA and non-targeting construct, as described in example 9. Editing was assessed by FACS 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

FIG. 38 shows the results of an editing assay for mNPC using AAV vector constructs 45-48 (see FIG. 35) with a plurality of gRNAs of different architecture and with different spacer combinations compared to construct 3, as described in example 9. The left diagram shows the use range of 1×10⁴ Up to 3X 10⁵ Edit results of 3-fold MOI dilution of vg/cell, while the right panel shows 3X 10⁵ Editing results at the MOI of vg/cell. Editing was assessed by FACS 5 days after transfection. Data are expressed as n = Mean ± SEM of 3 replicates.

FIG. 39 is a bar graph of percent editing in mNPC using AAV transgenic plasmid constructs having different 5'NLS combinations (2, 7, and 9 in Table 15) and 3' NLS1, 8, and 9 in mNPC, as described in example 10.

FIG. 40 is a bar graph of percent editing in mNPC using AAV vectors with different 5'NLS combinations and 3' NLS1, 8, and 9 in mNPC, as described in example 10.

Figure 41 is a bar graph of percent editing in mNPC using AAV vectors with different NLS combinations when delivered in vectors designed to minimize the footprint of the Pol III promoter in the transgene.

FIG. 42 is a schematic diagram showing the organization of components of an exemplary AAV transgene between 5 'and 3' ITRs, as described in example 12.

FIG. 43A shows the results of an editing assay in mNPC transfected with 1000 AAV-cis plasmids expressing the CasX protein 491 of CMV and guide variants 174, 229-237 having a spacer region 11.30 targeting the mouse RHO exon 1 locus, which shows an improvement in activity at mouse RHO exon 1 in a dose-dependent manner, as described in example 12. Triplicate wells were pooled together for gDNA extraction and thus treated as n=1.

FIG. 43B is a bar graph showing the fold change in editing level of each engineering scaffold (229-237) relative to guide 174 with spacer 11.30 (set to a value of 1.0) at both plasmid nuclear transfection doses of AAV-cis plasmid of 1000ng and 500ng, as described in example 12. Triplicate wells were pooled together for gDNA extraction and thus treated as n=1.

FIG. 44A shows the editing of engineered guide 235 with increased activity at the human RHO locus for the variant of 235, with increased on-target activity at the WT exogenous RHO and no off-target cleavage at the mutant RHO reporter gene, as described in example 12, compared to 174 with spacer 11.1 targeting the RHO at the exogenous RHO-GFP locus (with GFP as the reporter gene) under expression of the Pol III hU6 promoter in ARPE-19 cells. Data (n=3) are expressed as mean ± SD.

FIG. 44B is a bar graph showing fold change in the editing level of engineered guide 235 compared to 174 at the human RHO locus, where p59.491.235.11.1 was normalized to the baseline p59.491.174.11.1 level (set to a value of 1.0) in cells nuclear transfected with 1000ng of each plasmid, as described in example 12. Data (n=3) are expressed as mean ± SD.

FIG. 45A shows the increased levels of editing at the endogenous mouse Rho exon 1 locus without off-target loci at 3 different MOI levels compared to guide scaffold 174 with spacer 11.30, demonstrated by the levels of editing in mNPC of AAV-mediated expression of the CasX molecule and engineered guide-variant 235, as described in example 12.

FIG. 45B is a bar graph showing fold change in editing levels in mNPC by AAV mediated expression of CasX molecules and engineered guide variants 235, as described in example 12, compared to guide scaffold 174 with spacer 11.30 in cells infected at 5.0e+5MOI. Data are expressed as an average of n=3.

FIG. 46A shows the results of editing at the human RHO locus in mNPC nuclear transfected with 1000ng and 500ng of AAV-cis plasmid expressing CasX protein 491 and sgRNA-scaffolds 174 with different lengths of the on-target spacer, showing improved on-target editing at the mouse RHO locus, as described in example 12. The spacer variants were: 11.30 (20 nt WT RHO), 11.38 (18 nt WT RHO) and 11.39 (19 nt WT RHO). No-target (NT) control spacer designed to not recognize any sequence across the mouse and human genomes was also tested as a negative control to ensure that there was no non-specific targeting caused by CasX protein expression alone. Triplicate wells were pooled together for gDNA extraction and thus treated as n=1.

FIG. 46B is a bar graph showing the level of editing at the human RHO locus in mNPC nuclear transfected with 1000ng of AAV-cis plasmid expressing CasX protein 491 and sgRNA scaffold 174 with an indicated off-target spacer, as described in example 12.

Fig. 46C is a bar graph showing fold-changes in editing levels at the human RHO locus in nuclear transfected mNPC for each sgRNA-scaffold 174 with spacer variants 11.38 and 11.39 normalized to the level of parent sgRNA-scaffold-spacer 174.11.30 as described in example 12. The data show the mean +sd of 3 different biological replicates.

Fig. 47A is a cassette diagram showing the editing results of RHO in a mouse model comparing AAV-mediated delivery of sgRNA scaffold variants and optimized spacers to a reference construct, as described in example 13. Each point represents a retina (n=8-16). One-way ANOVA statistical test was performed with =p <0.001.

Fig. 47B is a cassette graph showing the relative fold change of RHO editing in a mouse model comparing AAV-mediated delivery of sgRNA scaffold variants 174 and 235 and optimized spacer and reference constructs, as described in example 13. Values were relative to the reference vector aav.rho.174.11.30 (set to value 1). Each point represents a retina (n=8-16).

FIG. 48A is a bar graph showing the levels of CTC-PAM editing (indel rates) at the mouse RHO locus in mNPC transfected with 1000ng and 500ng of AAV-cis plasmid and sgRNA-scaffold 235.11.37 (on target) expressing CasX protein variants 491, 515, 527, 528, 535, 536 or 537, respectively, as described in example 14. No-target (NT) control spacer designed to not recognize any sequence across the mouse and human genomes was also tested as a negative control to ensure that there was no non-specific targeting caused by CasX protein expression alone. Triplicate wells were pooled together for gDNA extraction and thus treated as n=1.

FIG. 48B is a bar graph showing the level of CTC-PAM editing (indel rate) at the RHO locus in mNPC mice transfected with AAV-cis plasmid and sgRNA-scaffold 235.11.39 (off-target) expressing CasX protein variants 491, 515, 527, 528, 535, 536 or 537, respectively, as described in example 14.

FIG. 48C shows a bar graph with guides 235 and spacer regions 11.39 each indicating fold change in editing level of CasX protein variants, where the results were normalized to the level of parent CasX protein 491, as described in example 14.

FIG. 49A shows a bar graph of editing levels in ARPE-19mNPC transfected with 1000ng of AAV-cis plasmid expressing CasX protein variant 491, 515, 527, 528, 535, 536 or 537 and guide variant 235 with spacer region 11.41 or 11.43, as described in example 14. Data (n=3) are expressed as mean ± SD.

FIG. 49B shows a bar graph showing fold change in editing level in ARPE-19mNPC transfected with 1000ng of AAV-cis plasmid expressing CasX protein variant 515, 527, 528, 535, 536 or 537 and guide variant 235 with spacer region 11.41 or 11.43 relative to baseline p59.491.235.11.41 level (set to value 1.0), as described in example 14. Data (n=3) are expressed as mean ± SD.

FIG. 50A shows a bar graph of AAV-mediated editing levels at the endogenous mouse Rho exon 1 locus in mNPC, as described in example 14. Using 3.0e+5 and 1.0e+5 vg/cell MOI, mNPC was infected with AAV vectors expressing the indicated CasX proteins 491, 515, 527, 528, 535 or 537 and the sgRNA-scaffold variant 235.11.39 as described in example 14. Data (n=3) are expressed as average values.

Fig. 50B is a bar graph showing fold change in indicated CasX variants with guide scaffold 235 relative to the level of editing of guide 174 with spacer 11.39 in cells infected with indicated MOI, as described in example 14.

FIG. 51 is a diagram referencing the mRHO exon 1 locus and the target amino acid residue P23 (CCC) sequence (highlighted in bold), showing the spacer 11.30 target sequence and the expected CasX-mediated cleavage, as described in example 15. The most common predictive edits quantified in CRISPResso edits (substitutions/deletions) are shown under the reference genome.

Fig. 52A shows AAV-CasX mediated editing of the mRHO P23 locus in vivo in the retina of a C57BL6J mouse (n=6-8; quantified as a percentage of total indel rate detected by NGS), as described in example 15.

Fig. 52B shows AAV-CasX mediated frameshift editing fractions (%) of the mRHO P23 locus in the retina of C57BL6J (n=6-8; quantified as a percentage of total indels detected by NGS), as described in example 15.

Fig. 53A-53F show representative fluorescent imaging of stained retinas from AAV-CasX treated mice or negative controls, as described in example 15. Nuclei were counterstained with DAPI (upper row; fig. 53A-53C) to observe retinal layers, and stained with HA tag (lower row; fig. 53D-53F) antibodies to detect CasX expression in photoreceptor cells (ONL) and other retinal layers (INL; GCL). Legend: ONL = outer nuclear layer; INL = inner nuclear layer, GCL = ganglion cell layer.

Fig. 54A is a box plot showing median, minimum and highest edit values of AAV-mediated CasX 491 expression detected by NGS 3 weeks after injection in wild-type retina injected with 5.0e+9vg/eye aav.x.491.174.11.30 vector, wherein 491 protein is driven by a promoter variant designed for selective expression in rod cells (x=rp 1-RP 5) or a ubiquitous promoter (x=cmv), as described in example 16. The gray line is at the editing level obtained for aav.rp1.491.174.11.30 for comparison with other viral vectors tested.

FIG. 54B is a graph showing the level of editing achieved by AAV vectors in wild-type retinas injected with 5.0e+9vg/eye AAV.X.491.174.11.30 vectors, as compared to total transgene size (bp), as described in example 16. The gray lines define transgenes below or above 4.9kb in size.

FIG. 55 shows in vivo editing results of AAV-mediated expression of CasX 491 and sgRNA spacer 174.4.76 in rod cells in a dose-dependent manner resulting in detectable levels of editing at the integrated Nrl-GFP locus, as described in example 16. The bar graph shows the levels of editing detected by NGS at the integrated GFP locus 4 and 12 weeks post injection in heterozygous Nrl-GFP mice injected with the indicated dose of aav.rp1.491.174.4.76 vector in one eye and vehicle control in the opposite eye.

FIG. 56A shows Western blots of retinal lysates from positive (C1, uninjected homozygous Nrl-GFP retina) and negative (N, uninjected C57BL/6J retina), vehicle group (V, AAV-injected with AAV formulation buffer), or medium dose 1.9e+9 (M) or high dose 1.0e+10vg (H group treated AAV-CasX 491, sgRNA 174 and spacer 4.76. Blots show bands for HA protein (CasX protein, upper panel), GFP protein (middle panel) and GAPDH (lower panel) used as loading controls, respectively.

FIG. 56B is a box plot of the scatter plot representing the levels of GFP protein detected in the Western blot of FIG. 56A (ratio of optical density values of GFP bands to total protein, normalized to vehicle group level), as described in example 16. One-way ANOVA statistical analysis was performed (=p < 0.5).

FIG. 56C is a graph correlating GFP protein fractions with the levels of editing achieved in the mouse retina of AAV-treated mice for the 1.0e+9 and 1.0e+10 dose groups, as described in example 16.

Fig. 57A is a bar graph showing the ratio of GFP fluorescence levels (above the lower retinal average gray value) detected by fundus imaging 4 weeks after injection to 12 weeks after injection in mice injected with two dose levels of AAV construct, as described in example 16.

Fig. 57B shows representative images of fluorescent fundus imaging of GFP in the retinas of mice injected with AAV constructs at 1.0e+9vg (# 13) or 1.0e+10vg (# 34) at 4 weeks (left panel) and/or 12 weeks (right panel), as described in example 16.

Fig. 58A-58L show histological images or retinas of mice stained with various immunochemical reagents as described in example 16, demonstrating effective knockdown of GFP in photoreceptor cells in an AAV dose dependent manner. Images are representative confocal images of cross-sectional retinas injected with vehicle (fig. 58A, 58B, 58C, 58D), AAV-CasX at a dose of 1.0e+9vg (fig. 58E, 58F, 58G, and 58H), and AAV-CasX at a dose of 1.0e+10vg (fig. 58I, 58J, 58K, and 58L). Structural imaging showed GFP expression of rod cells in the outer segment (images in fig. 58A, 58E, 58I and 58C, 58G and 58K magnified 20X and 40X, respectively). Nuclei were counterstained with Hoechst (FIG. 58B, FIG. 58F and FIG. 58J), cells were stained with anti-HA to correlate HA levels (CasX transgene levels; FIGS. 58D, 58H and 58L;40X magnification), and GFP was expressed in photoreceptor cells. The outline of the white boxes in B and F represent the retinal areas analyzed at 40X magnification in fig. 58C and 58G. Legend: RPE = retinal pigment epithelium, OS = outer segment, ONL = outer nuclear layer, INL = inner nuclear layer, GCL = ganglion.

Fig. 59A shows immunohistochemical staining results of mouse liver sections, showing that CasX 491 administered as AAV IV injection and scaffold 174 with spacer 12.7 were able to edit the in vivo tdTom locus of Ai9 mice, as described in example 3. Images represent n=3 animals.

Fig. 59B shows immunohistochemical staining results of mouse heart sections, showing that CasX 491 administered as AAV IV injection and scaffold 174 with spacer 12.7 were able to edit the in vivo tdTom locus of Ai9 mice, as described in example 3. Images represent n=3 animals.

Figure 60 is a quantitative plot of percent editing at the B2M locus 5 days after AAV transduction to human NPC in a series of triple dilution MOIs, as described in example 17. The level of editing was determined by NGS as the rate of insertion loss, by flow cytometry as the population of cells that did not express HLA proteins due to successful editing at the B2M locus.

Fig. 61 shows the results of the editing assay, measured as the rate of indels detected by NGS at the human AAVs1 locus in human induced neurons (iNs) using three indicator AAVs, each containing CasX 491 and gRNA with a specific spacer targeting AAVs1, as described in example 17.

Fig. 62 is a bar graph showing percent editing at the B2M locus in human iNs 14 days after AAV transduction of expression CasX 491 driven by various protein promoters at MOI of 2E4 or 6.67E3, as described in example 17.

FIG. 63 shows the results of an edit assay using AAV transgenic plasmids infected into hNPC using nuclei as described in example 18, showing the comparison with original CpG⁺ Editing by AAV vector (construct ID 177), U1a promoter (construct IDs 178 and 179), U6 promoter (construct IDs 180 and 181) or bGH poly (a)) The CpG reduction or deletion within (construct ID 182) did not significantly reduce CasX-mediated editing at the B2M locus. Controls used for this experiment were non-targeted (NT) spacers and no treatment (NTx).

Fig. 64 is a bar graph showing the results of editing of tdbitmap loci in an experiment aimed at assessing the effect of AAV constructs with engineered Pol III promoter hybridization variants when delivered to mNPC in AAV vectors, as described in example 5. Editing was assessed by FACS 5 days after nuclear transfection.

FIG. 65 is a schematic representation of regions and domains of guide RNAs used to design a scaffold library, as described in example 20.

FIG. 66 is a pie chart of relative distribution and design of scaffold libraries, where unbiased mutations (double and single mutations) and targeted mutations (toward triplex, scaffold blebs, pseudoknots, and extended stems and loops) are indicated, as described in example 20.

FIG. 67 is a schematic representation of triplex mutagenesis designed to specifically incorporate an alternative triplex base pair into a triplex, as described in example 20. The solid line represents Watson-Crick pairs in triplex; the third strand nucleotides are indicated by dashed lines, indicating atypical interactions with double-stranded purines. In the library, each of the 5 positions shown was replaced with all possible triplex motifs (G: GC, T: AT, G: GC) =243 sequences. ACUGGCGCUUUUUUUUUUGAGGCCAUCANNNAUCAAAG sequence (SEQ ID NO: 41829).

FIG. 68 is a bar graph of the enrichment value results for reference guide brackets 174 and 175 in each screen, as described in example 20.

FIG. 69 is a log showing each measured single nucleotide substitution, deletion or insertion₂ A scatter plot of enrichment values, as measured in each of the two independent screens for the mutant libraries of guide brackets 174 and 175, as described in example 20.

FIG. 70 is a heat map of a single mutant in guide scaffolds 174 and 175 showing specific mutable regions in the cross-sequence scaffolds, as described in example 20. The yellow shading reflects a value with similar enrichment as the reference scaffold; the red shading indicates an increase in enrichment relative to the reference scaffold, and thus an increase in activity; blue shading indicates loss of activity relative to wild-type scaffold; white indicates missing data (or substitutions that would result in wild-type sequences).

FIG. 71 is a graph comparing the log of single nucleotide mutations on reference guide scaffolds 174 and 175₂ An enriched scatter plot as described in example 20. Only those mutations that are similar in position between 174 and 175 are shown. The results indicate that, overall, the guide brackets 174 are more tolerant of changes than 175.

FIG. 72 is a graph showing the average (and 95% confidence interval) log for a set of stents₂ A histogram of enrichment values, in which pairs of pseudo-junctions have been reorganized such that each new pseudo-junction has the same base pair composition, but in a different order within the stem, as described in example 20. Each column represents a set of brackets indicating the position of the G: a (or a: G) pair (see right). 291 pseudotubers were tested; the numbers on the columns represent the number of stems with G: A (or A: G) pairs at each location.

FIG. 73 is a schematic diagram of the pseudo-junction sequences of FIGS. 55 and 56, given 5 'to 3', wherein the two strand sequences are separated by an underline.

FIG. 74 is a graph showing the mean (and 95% confidence interval) log of stents₂ The enrichment value is divided by a histogram of predicted secondary structural stability of the pseudonode stem region, as described in example 20. Has a very stable stem (e.g.. DELTA.G<-7 kcal/mol) of scaffolds had on average a high enrichment value, whereas scaffolds with unstable stems (. DELTA.G.gtoreq.5 kcal/mol) had on average a low enrichment value.

FIG. 75 is a heat map of all double mutants at positions 7 and 29 in scaffold 175, as described in example 20. The pseudojunction sequence is given 5 'to 3' on the right.

FIG. 76 is a graph of a survival assay to determine the selective stringency of CcdB selection for different spacers when targeted by CasX protein 515 and scaffold 174, as described in example 21.

Detailed Description

While exemplary embodiments have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention as claimed herein. It should be understood that various alternatives to the embodiments described herein may be employed in practicing the embodiments of the disclosure. The claims are intended to define the scope of the invention and the methods and structures within the scope of these claims and their equivalents are covered thereby.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments herein, suitable methods and materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention.

Definition of the definition

The terms "polynucleotide" and "nucleic acid" are used interchangeably herein to refer to polymeric forms of nucleotides of any length (ribonucleotides or deoxyribonucleotides). Thus, the terms "polynucleotide" and "nucleic acid" include single-stranded DNA; double-stranded DNA; a multiplex DNA; single-stranded RNA; double-stranded RNA; a multi-stranded RNA; genomic DNA; a cDNA; DNA-RNA hybrids; and polymers comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural or derivatized nucleotide bases.

"hybridizable" or "complementary" is used interchangeably, and means that a nucleic acid (e.g., RNA, DNA) comprises a sequence of nucleotides that enable it to non-covalently bind (i.e., form watson-crick base pairs and/or G/U base pairs), "anneal" or "hybridize" to another nucleic acid in a sequence-specific, antiparallel manner (i.e., the nucleic acid specifically binds to the complementary nucleic acid) under appropriate in vitro and/or in vivo temperature and solution ionic strength conditions. It will be appreciated that the sequence of the polynucleotide need not be 100% complementary to its target nucleic acid sequence to which it is specifically hybridizable; the sequence of the polynucleotide can have at least about 70%, at least about 80%, or at least about 90%, or at least about 95% sequence identity and still hybridize to the target nucleic acid sequence. In addition, polynucleotides may hybridize over one or more fragments such that intervening or adjacent fragments do not participate in a hybridization event (e.g., loop or hairpin structures, "bulge," "bubble," etc.).

For the purposes of this disclosure, "gene" includes DNA regions encoding a gene product (e.g., protein, RNA), as well as all DNA regions that regulate the production of a gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Thus, a gene may include helper element sequences including, but not necessarily limited to, promoter sequences, terminators, translational regulatory sequences (such as ribosome binding sites and internal ribosome entry sites), enhancers, silencers, insulators, border elements, origins of replication, matrix attachment sites, and locus control regions. The coding sequence encodes a gene product upon transcription or transcription and translation; the coding sequences of the present disclosure may comprise fragments and need not contain a full-length open reading frame. A gene may include both a transcribed strand and a complementary strand containing an anticodon.

The term "downstream" refers to a nucleotide sequence located 3' of a reference nucleotide sequence. In certain embodiments, the downstream nucleotide sequence relates to a sequence following the start of transcription. For example, the translation initiation codon of a gene is located downstream of the transcription initiation site.

The term "upstream" refers to a nucleotide sequence located 5' to a reference nucleotide sequence. In certain embodiments, the upstream nucleotide sequence relates to a sequence located 5' to the coding region or transcription start point. For example, most promoters are located upstream of the transcription initiation site.

The term "adjacent to … …" in relation to a polynucleotide or amino acid sequence refers to sequences that are adjacent or contiguous to each other in the polynucleotide or polypeptide. The skilled person will appreciate that two sequences may be considered adjacent to each other and still contain a limited number of inserted sequences, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides or amino acids.

The term "helper element" is used interchangeably herein with the term "helper sequence" and is intended to include, inter alia, polyadenylation signals (poly (a) signals), enhancer elements, introns, post-transcriptional regulatory elements (PTREs), nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, additional promoters, factors that stimulate CRISPR-mediated homology directed repair (e.g., cis or trans), transcriptional activators or repressors, self-cleaving sequences, and fusion domains, such as fusion domains fused to CRISPR proteins. It will be appreciated that the selection of the appropriate auxiliary element will depend on the encoded component (e.g., protein or RNA) to be expressed, or on whether the nucleic acid comprises multiple components that require different polymerases or are not intended to be expressed as a fusion protein.

The term "promoter" refers to a DNA sequence that contains a transcription initiation site and additional sequences that promote polymerase binding and transcription. Exemplary eukaryotic promoters include elements such as TATA boxes and/or B Recognition Elements (BREs), and assist or promote transcription and expression of related transcribable polynucleotide sequences and/or genes (or transgenes). The promoter may be synthetically produced, or may be derived from a known or naturally occurring promoter sequence or another promoter sequence. The promoter may be located proximal or distal to the gene to be transcribed. Promoters may also include chimeric promoters that comprise a combination of two or more heterologous sequences to impart certain characteristics. Promoters of the present disclosure may include variants of promoter sequences that are similar in composition but not identical to other promoter sequences known or provided herein. Promoters may be classified according to criteria related to the expression pattern of the relevant coding or transcribable sequence or gene operably linked to the promoter, such as constitutive, developmental, tissue-specific, inducible, etc. Promoters may also be classified according to their strength. As used in the context of a promoter, "strength" refers to the rate of transcription of a gene controlled by the promoter. A "strong" promoter means a high transcription rate, while a "weak" promoter means a relatively low transcription rate.

The promoter of the present disclosure may be a polymerase II (Pol II) promoter. Polymerase II transcribes all protein-encoding genes and many non-encoding genes. Representative Pol II promoters include core promoters, which are sequences of about 100 base pairs surrounding the transcription initiation site, and serve as a binding platform for Pol II polymerase and related general transcription factors. Promoters may contain one or more core promoter elements, such as TATA box, BRE, initiator (INR), motif 10 element (MTE), downstream core promoter element (DPE), downstream Core Element (DCE), although core promoters lacking these elements are known in the art.

The promoter of the present disclosure may be a polymerase III (Pol III) promoter. Pol III transcribes DNA to synthesize small ribosomal RNAs, such as 5S rRNA, trnas, and other small RNAs. Representative Pol III promoters use internal control sequences (sequences within the transcribed portion of the gene) to support transcription, although upstream elements such as TATA boxes are sometimes used as well. All Pol III promoters are considered to be within the scope of the present disclosure.

The term "enhancer" refers to a regulatory DNA sequence that, when bound by a specific protein called a transcription factor, regulates expression of a related gene. Enhancers may be located in introns of a gene, or 5 'or 3' of the coding sequence of a gene. Enhancers may be located proximal to the gene (i.e., within tens or hundreds of base pairs (bp) of the promoter) or may be located distal to the gene (i.e., thousands, hundreds of thousands, or even millions of bp away from the promoter). A single gene may be regulated by more than one enhancer, all of which are considered to be within the scope of the present disclosure.

As used herein, a "post-transcriptional regulatory element (PRE)", such as hepatitis PRE, refers to a DNA sequence that, when transcribed, produces a tertiary structure capable of exhibiting post-transcriptional activity to enhance or promote expression of a gene of interest to which it is operably linked.

As used herein, a "post-transcriptional regulatory element (PTRE)", such as hepatitis PTRE, refers to a DNA sequence that, when transcribed, produces a tertiary structure capable of exhibiting post-transcriptional activity to enhance or promote expression of a gene of interest to which it is operably linked.

As used herein, "recombinant" means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction and/or ligation steps, resulting in a construct having a structurally encoded or non-encoded sequence that is distinguishable from endogenous nucleic acids found in natural systems. In general, the DNA sequence encoding the structural coding sequence may be assembled from cDNA fragments and short oligonucleotide adaptors, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid capable of being expressed from recombinant transcription units contained in a cell or in a cell-free transcription and translation system. Such sequences may be provided in open reading frame form uninterrupted by internal untranslated sequences or introns (which are typically present in eukaryotic genes). Genomic DNA comprising the relevant sequences may also be used to form recombinant genes or transcriptional units. Sequences of the non-translated DNA may be present at the 5 'or 3' of the open reading frame, where such sequences do not interfere with the operation or expression of the coding region, and indeed may regulate the production of the desired product by various mechanisms (see "enhancers" and "promoters" above).

The term "recombinant polynucleotide" or "recombinant nucleic acid" refers to a polynucleotide or nucleic acid that does not occur in nature, e.g., one that has been prepared by human intervention in the artificial combination of two otherwise separate sequence fragments. Such artificial combination is typically accomplished by chemical synthesis methods or by manually manipulating isolated fragments of the nucleic acid (e.g., by genetic engineering techniques). This is typically done to replace codons with redundant codons encoding the same or conserved amino acids, while sequence recognition sites are typically introduced or removed. Alternatively, nucleic acid fragments having the desired functions are ligated together to produce the desired combination of functions. Such artificial combination is typically accomplished by chemical synthesis methods or by manually manipulating isolated fragments of the nucleic acid (e.g., by genetic engineering techniques).

As used herein, the term "contacting" refers to establishing a physical connection between two or more entities. For example, contacting the target nucleic acid with the guide nucleic acid means that the target nucleic acid and the guide nucleic acid share a physical linkage; for example, if these sequences share sequence similarity, hybridization may occur.

"dissociation constant" or "K_d "interchangeably used and refers to the affinity between the ligand" L "and the protein" P "; i.e., how tightly the ligand binds to a particular protein. Affinity can be determined using formula K_d ＝[L][P]/[LP]To calculate, wherein [ P ]]、[L]And [ LP ]]The molar concentrations of protein, ligand and complex are indicated, respectively.

The present disclosure provides systems and methods for editing a target nucleic acid sequence. As used herein, "editing" is used interchangeably with "modifying" and includes, but is not limited to, cutting, nicking, deleting, typing, knocking out, and the like.

"cleavage" refers to cleavage of the covalent backbone of a target nucleic acid molecule (e.g., RNA, DNA). Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of the phosphodiester bond. Both single strand cleavage and double strand cleavage are possible, and double strand cleavage may occur due to two different single strand cleavage events.

The term "knockout" refers to the elimination of a gene or expression of a gene. For example, a gene may be knocked out by deleting or adding a nucleotide sequence that causes disruption of the reading frame. For another example, a gene may be knocked out by replacing a portion of the gene with an unrelated sequence. The term "knockdown" as used herein refers to reducing the expression of a gene or gene product thereof. Protein activity or function may be reduced or protein levels may be reduced or eliminated as a result of gene knockdown.

As used herein, "homology directed repair" (HDR) refers to a form of DNA repair that occurs during double strand break repair in a cell. This process requires nucleotide sequence homology and uses a donor template to repair or knock out target DNA and results in transfer of genetic information from the donor to the target. If the donor template is different from the target DNA sequence and part or all of the sequence of the donor template is incorporated into the target DNA, homology directed repair may result in a change in the target sequence by an insertion, deletion or mutation.

As used herein, "non-homologous end joining" (NHEJ) refers to repair of double-stranded breaks in DNA by directly joining the broken ends to one another without the need for a homology template (as opposed to homology directed repair, which requires a homology sequence to direct repair). NHEJ typically results in a loss (deletion) of nucleotide sequence near the double strand break site.

As used herein, "microhomology-mediated end ligation" (MMEJ) refers to a mutagenized DSB repair mechanism that is always associated with deletions flanking the cleavage site, without the need for a homology template (as opposed to homology-directed repair that requires a homology sequence to direct repair). MMEJ typically results in a loss (deletion) of nucleotide sequence near the double strand break site. A polynucleotide or polypeptide has a certain percentage of "sequence similarity" or "sequence identity" with another polynucleotide or polypeptide, meaning that when aligned, the percentage of bases or amino acids is the same and in the same relative position when the two sequences are compared. Sequence similarity (sometimes referred to as percent similarity, percent identity, or homology) can be determined in a number of different ways. To determine sequence similarity, sequences, including BLAST, can be aligned using methods and computer programs known in the art, and accessed via the world Wide Web as ncbi.nlm.nih.gov/BLAST. The percent complementarity between fragments of a particular nucleic acid sequence within a nucleic acid can be determined using any convenient method. Exemplary methods include BLAST programs (local sequence alignment search basic tool) and PowerBLAST programs (Altschul et al, J.mol. Biol.,1990,215,403-410; zhang and Madden, genome Res.,1997,7,649-656), or by using the Gap program (Wisconsin sequence analysis software package, version 8 for Unix, from university research institute, madison, wis.), using default settings, for example, using the algorithms of Smith and Waterman (adv. Appl. Math.,1981,2,482-489).

The terms "polypeptide" and "protein" are used interchangeably herein and refer to polymeric forms of amino acids of any length, which may include encoded and non-encoded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including but not limited to fusion proteins having heterologous amino acid sequences.

A "vector" or "expression vector" is a replicon, such as a plasmid, phage, virus-like particle, or cosmid, to which another DNA segment (i.e., an "insert") may be attached, thereby causing replication or expression of the attached segment in a cell.

As used herein, the term "naturally occurring" or "unmodified" or "wild-type" as applied to a nucleic acid, polypeptide, cell or organism refers to a nucleic acid, polypeptide, cell or organism that is found in nature.

As used herein, "mutation" refers to an insertion, deletion, substitution, duplication, or inversion of one or more amino acids or nucleotides as compared to a wild-type or reference amino acid sequence or a wild-type or reference nucleotide sequence.

As used herein, the term "isolated" is meant to describe a polynucleotide, polypeptide, or cell in an environment different from the environment in which the polynucleotide, polypeptide, or cell naturally occurs. The isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.

As used herein, "host cell" refers to a eukaryotic cell, a prokaryotic cell, or a cell from a multicellular organism (e.g., in a cell line), and includes the progeny of a primordial cell that has been genetically modified with a nucleic acid, such eukaryotic cell or prokaryotic cell serving as a recipient for the nucleic acid (e.g., an expression vector). It will be appreciated that the progeny of a single cell may not necessarily be identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation. A "recombinant host cell" (also referred to as a "genetically modified host cell") is a host cell into which a heterologous nucleic acid (e.g., an expression vector) has been introduced.

"target cell marker" refers to a molecule expressed by a target cell, including but not limited to a cell surface receptor, cytokine receptor, antigen, tumor-associated antigen, glycoprotein, oligonucleotide, enzyme substrate, epitope or binding site, which may be present on the surface of a target tissue or cell, may act as a ligand for an antibody fragment or glycoprotein tropism factor.

The term "conservative amino acid substitution" refers to the interchangeability of amino acid residues in proteins having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine and isoleucine; a group of amino acids with aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids with aromatic side chains consists of phenylalanine, tyrosine and tryptophan; a group of amino acids with basic side chains consists of lysine, arginine and histidine; and a group of amino acids with sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitutions are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine and asparagine-glutamine.

The term "antibody" as used herein encompasses a variety of antibody structures, including but not limited to monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), nanobodies, single domain antibodies such as VHH antibodies and antibody fragments, so long as they exhibit the desired antigen-binding activity or immune activity. Antibodies represent a large class of molecules, including several types of molecules, such as IgD, igG, igA, igM and IgE.

An "antibody fragment" refers to a molecule other than an intact antibody that comprises a portion of the intact antibody and binds to an antigen to which the intact antibody binds. Examples of antibody fragments include, but are not limited to Fv, fab, fab ', fab ' -SH, F (ab ') 2, diabodies, single chain diabodies, linear antibodies, single domain camelid antibodies, single chain variable fragment (scFv) antibody molecules, and multispecific antibodies formed from antibody fragments.

As used herein, "therapy" or "treatment" are used interchangeably herein and refer to a method of achieving a beneficial or desired result, including but not limited to therapeutic benefit and/or prophylactic benefit. Therapeutic benefit refers to eradication or amelioration of the underlying disorder or disease being treated. Therapeutic benefits may also be achieved by: eradication or amelioration of one or more symptoms, or amelioration of one or more clinical parameters associated with the underlying disease, results in an improvement being observed in the subject, although the subject may still have the underlying disorder.

As used herein, the terms "therapeutically effective amount" and "therapeutically effective dose" refer to the amount of a drug or biological agent (alone or as part of a composition) that, when administered to a subject (such as a human or experimental animal) in a single dose or in repeated doses, is capable of having any detectable beneficial effect on any symptom, aspect, measured parameter or feature of a disease state or disorder. Such effects need not be absolutely beneficial.

As used herein, "administering" means a method of administering a dose of a compound (e.g., a composition of the present disclosure) or composition (e.g., a pharmaceutical composition) to a subject.

A "subject" is a mammal. Mammals include, but are not limited to, domesticated animals, non-human primates, humans, dogs, rabbits, mice, rats, and other rodents.

I. General procedure

The practice of the present invention employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are found in standard textbooks such as "Molecular Cloning: A Laboratory Manual", 3 rd edition (Sambrook et al, harbor Laboratory Press, 2001); "Short Protocols in Molecular Biology", 4 th edition (Ausubel et al, john Wiley & Sons, 1999); "Protein Methods" (Bollag et al, john Wiley & Sons, 1996); "Nonviral Vectors for Gene Therapy" (edited by Wagner et al, academic Press, 1999); "visual Vectors" (Kaplift and Loewy editions, academic Press, 1995); "Immunology Methods Manual" (edited by Lefkovits, academic Press, 1997); and "Cell and Tissue Culture: laboratory Procedures in Biotechnology" (Doyle and Griffiths, john Wiley & Sons, 1998), the disclosures of which are incorporated herein by reference.

Where a numerical range is provided, it is understood to include the endpoints, and every intermediate value between the upper and lower limits of the range (to one tenth of the unit of the lower limit unless the context clearly dictates otherwise) is included as well as any other specified or intermediate value in that specified range. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also included in any explicitly excluded limit in the stated range. When a specified range includes one or both of the limits, ranges excluding either or both of those included limits are also included.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.

It must be noted that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.

It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. In other cases, the various features of the disclosure that are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. All combinations of embodiments related to the present disclosure are intended to be specifically encompassed by the present disclosure and disclosed herein as if each and every combination were individually and specifically disclosed. In addition, all subcombinations of the various embodiments and elements thereof are also specifically contemplated by the present disclosure and disclosed herein as if each and every such subcombination was individually and specifically disclosed herein.

AAV vectors

In a first aspect, the present disclosure relates to AAV vectors optimized for CRISPR nuclease expression and delivery to target cells and/or tissues for genetic editing.

Wild-type AAV is a small single-stranded DNA virus belonging to the parvovirus family. The wild-type AAV genome consists of two genes encoding four replication proteins and three capsid proteins, respectively, and is flanked on either side by an Inverted Terminal Repeat (ITR) of 130 to 145 nucleotides that folds into a hairpin shape important for replication. Virosomes are composed of three capsid proteins Vp1, vp2 and Vp3, produced from the same open reading frame but from different splicing (Vp 1) and alternative translation initiation sites (Vp 2 and Vp3, respectively) in a 1:1:10 ratio. The cap gene produces another non-structural protein called Assembly Activating Protein (AAP). This protein is produced by ORF2 and is essential for the capsid assembly process. The capsid forms a supermolecule of about 60 individual capsid protein subunits assembled into a non-enveloped T-1 icosahedral lattice capable of protecting the AAV genome.

AAV, being naturally replication-defective and capable of transducing almost every cell type in humans, represents a suitable vector for therapeutic use in gene therapy or vaccine delivery. Typically, when producing a recombinant AAV vector, the sequence between two ITRs is substituted with one or more sequences of interest (e.g., transgenes), and the Rep and Cap sequences are provided in trans, making the ITRs the only viral DNA that remains in the vector. The resulting recombinant AAV vector genome construct comprises two cis-acting 130 to 145 nucleotide ITRs flanking an expression cassette encoding the transgene sequence of interest, providing at least 4.7kb or more for packaging of exogenous DNA that may include the transgene, one or more promoters and auxiliary elements, such that the overall size of the vector is less than 5kb to 5.2kb, which is compatible with packaging within the AAV capsid (it will be appreciated that when the size of the construct exceeds this threshold, the packaging efficiency of the vector is reduced). Transgenes may be used to correct or ameliorate gene defects in cells of a subject. However, in the context of CRISPR-mediated gene editing, the size limitation of expression cassettes is a challenge for most CRISPR systems in view of the large size of nucleases.

The present invention provides polynucleotides for producing AAV transgenic plasmids and for producing AAV viral vectors. In some embodiments, the polynucleotide comprises a sequence encoding a first adeno-associated virus (AAV) 5 'Inverted Terminal Repeat (ITR) sequence, a second AAV 3' ITR sequence, a CRISPR nuclease, a first guide RNA (gRNA), one or more promoters, and optionally auxiliary elements; all of these are contained in a single expression cassette encoded by a single polynucleotide that is capable of being incorporated into a single AAV viral particle. In other embodiments, the polynucleotide comprises sequences encoding a first 5'aav ITR sequence, a second 3' aav ITR sequence, a CRISPR nuclease, a first gRNA, a first promoter, a second promoter, and optionally one or more accessory elements.

The promoter and auxiliary elements can be operably linked to a transgene, such as a CRISPR protein and/or a gRNA, in a manner that allows for transcription, translation, and/or expression thereof in cells transfected with the AAV vector of an embodiment. As used herein, an "operably linked" sequence includes a sequence of helper elements that border a gene of interest and a sequence of helper elements at a distance that controls the gene of interest. In some embodiments, the CRISPR protein and the first gRNA are under the control of and operably linked to a first promoter. In other embodiments, the CRISPR protein is under the control of and operably linked to a first promoter and the first gRNA is under the control of and operably linked to a second promoter.

In some embodiments, the disclosure provides helper elements included in AAV vectors, including but not limited to sequences that control transcription initiation, termination, promoters, enhancer elements, RNA processing signal sequences, enhancer elements, sequences that stabilize cytoplasmic mRNA, sequences that enhance translational efficiency (i.e., kozak consensus sequences), introns, post-transcriptional regulatory elements (PTREs), nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, second guide RNAs, stimulators of CRISPR-mediated homology directed repair, and activators or repressors of transcription.

"adeno-associated virus inverted terminal repeat" or "AAV ITR" refers to a region recognized in the art found at each end of the AAV genome that functions in cis as both a DNA replication origin and as a packaging signal for the virus. AAV ITRs provide efficient excision and rescue together with AAV rep coding regions, and integrate nucleotide sequences inserted between the two flanking ITRs into the mammalian cell genome.

The nucleotide sequence of the AAV ITR region is known. See, e.g., kotin, r.m. (1994) Human Gene Therapy 5:793-801; berns, K.I. "Parvoviridae and their Replication" in Fundamental Virology, version 2 (B.N.fields and D.M.Knipe). As used herein, AAV ITRs do not have to have the wild type nucleotide sequence, but can be altered, for example, by insertion, deletion, or substitution of nucleotides. In addition, AAV ITRs can be derived from any of a number of AAV serotypes, including, but not limited to, AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, and AAVRh10, as well as modified capsids of these serotypes. Furthermore, the 5 'and 3' itrs flanking the selected nucleotide sequence in an AAV vector need not be identical or derived from the same AAV serotype or isolate, so long as they function as intended, i.e., allowing excision and rescue of the sequence of interest from the host cell genome or vector, and allowing integration of the heterologous sequence into the recipient cell genome when the AAV Rep gene product is present in the cell. The use of AAV serotypes for integrating heterologous sequences into host cells is known in the art (see, e.g., WO2018195555A1 and US20180258424A1, which are incorporated herein by reference). In a specific embodiment, the ITRs are derived from serotype AAV1. In another specific embodiment, the ITRs are derived from serotype AAV2, including 5 'ITRs having sequence CCTGCAGGCAGCTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCGTCGGGCGACCTTTGGTCGCCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCT (SEQ ID NO: 40557) and 3' ITRs having sequence AGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAGCTGCCTGCAGG (SEQ ID NO: 40576).

"AAV Rep coding region" refers to the region of the AAV genome that encodes replication proteins Rep 78, rep 68, rep 52, and Rep 40. These Rep expression products have been shown to have a number of functions, including recognition, binding and nicking of AAV origins of DNA replication, DNA helicase activity, and regulation of transcription of AAV (or other heterologous) promoters. Rep expression products are a common requirement for replication of the AAV genome.

"AAV cap coding region" refers to a region of the AAV genome encoding capsid proteins VP1, VP2, and VP3, or functional homologs thereof. These Cap expression products provide packaging functions that are commonly required for packaging viral genomes.

In some embodiments, the AAV vector is serotype 9 or serotype 6, which has been demonstrated to efficiently deliver polynucleotides to motor neurons and glia throughout the spinal cord in preclinical models of Amyotrophic Lateral Sclerosis (ALS) (fost, KD. et al, therapeutic AAV9-mediated suppression of mutant RHO slows disease progression and extends survival in models of inherited als.mol ter.21 (12): 2148 (2013)). In some embodiments, the method provides for the use of AAV9 or AAV6 for targeting neurons via intraparenchymal injection. In some embodiments, the method provides for the use of AAV9 for intravenous administration of a vector, wherein AAV9 has the ability to penetrate the blood-brain barrier and drive gene expression in the nervous system via both neuronal and glial tropism of the vector. In other embodiments, the AAV vector is serotype 8, which has been demonstrated to be effective in delivering the polynucleotide to retinal cells.

In some embodiments, the one or more auxiliary elements are selected from: poly (a) signals, gene enhancer elements, introns, post-transcriptional regulatory elements (PTREs), nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, third promoters, second guide RNAs (targeting different or overlapping fragments of a target nucleotide), stimulators of CRISPR-mediated homology directed repair, and activators or repressors of transcription. In some cases, the PTRE is selected from the group consisting of cytomegalovirus immediate/early intron a, hepatitis b virus PRE (HPRE), woodchuck (Woodchuck) hepatitis virus PRE (WPRE), and the 5' untranslated region (UTR) of human heat shock protein 70mRNA (Hsp 70).

In the above, one or more auxiliary elements are operably linked to the CRISPR protein. It has been found that the inclusion of an auxiliary element in a polynucleotide of an AAV construct can enhance the expression, binding, activity or performance of a CRISPR protein as compared to a CRISPR protein in which the auxiliary element is not present in the AAV construct. In one embodiment, the inclusion of one or more auxiliary elements results in an increase in CRISPR protein editing of a target nucleic acid in an in vitro assay of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 1500%, at least about 200%, or at least about 300% compared to a CRISPR protein in which the auxiliary element is not present in the AAV construct.

In a feature of the AAV vectors of the present disclosure, it has been found that utilizing certain class 2 CRISPR systems of smaller size allows for the inclusion of additional sequence space in the polynucleotide used to make the AAV vector that is available for the remaining components of the transgene, as described herein. In some embodiments, the class 2 CRISPR system comprises a V-type protein selected from Cas12a, cas12b, cas12c, cas12d (CasY), cas12j, and CasX, and the relevant guide RNAs of the respective systems. In particular embodiments, the CRISPR protein is CasX, wherein CasX comprises a sequence selected from the group consisting of SEQ ID NOs 1-3 and 49-160, 40208-40369 and 40828-40912 as set forth in table 3, or a sequence having at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity thereto. In a specific embodiment, the CRISPR protein is CasX, wherein CasX comprises sequences selected from the group consisting of SEQ ID NOS 1-3 and SEQ ID NOS 49-160 and 40208-40369 and 40828-40912 as set forth in Table 3. In some embodiments, the gRNA comprises a scaffold sequence selected from the group consisting of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in Table 2, or a sequence having at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98% identity thereto. In a specific embodiment, the gRNA comprises a sequence selected from the group consisting of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958 and 41817 as set forth in Table 2. In the foregoing embodiments, the gRNA further comprises a targeting sequence complementary to the target nucleic acid to be modified, wherein the targeting sequence has at least 15 to 20 nucleotides. Embodiments of CasX protein and gRNA components incorporated into AAV vectors of the present disclosure are contemplated as described more fully below.

In some embodiments, the disclosure provides a polynucleotide comprising a first adeno-associated virus (AAV) Inverted Terminal Repeat (ITR) sequence, a second AAV ITR sequence, a first promoter sequence, a sequence encoding a CRISPR protein, a second promoter sequence, a sequence encoding at least a first guide RNA (gRNA), and one or more helper element sequences, wherein at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, or 35% or more of the nucleotides of the polynucleotide sequence comprise the first promoter and the second promoter and the one or more helper element sequences in a combined length. In other embodiments, the disclosure provides a polynucleotide comprising a first adeno-associated virus (AAV) Inverted Terminal Repeat (ITR) sequence, a second AAV ITR sequence, a first promoter sequence, a sequence encoding a CRISPR protein, a second promoter sequence, a sequence encoding a first guide RNA (gRNA), a third promoter sequence, a sequence encoding a second gRNA, and one or more helper element sequences, wherein at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, or 35% or more of the nucleotides of the polynucleotide sequences comprise the first, second, and third promoters and the one or more helper element sequences in a combined length. As detailed in the examples, it has been found that the ability to use more total polynucleotide of the AAV transgenic expression cassette for the promoter, second gRNA and/or auxiliary element results in enhanced expression and/or performance of CRISPR proteins and grnas when expressed in the target host cell; in an in vitro or in vivo assay of a subject. In some embodiments, the use of alternative or longer promoters and/or auxiliary elements (e.g., poly (a) signals, gene enhancer elements, introns, post-transcriptional regulatory elements (PTREs), nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, stimulators of CRISPR-mediated homology directed repair, and activators or repressors of transcription) in AAV polynucleotides and resulting AAV vectors results in an increase in editing of the target nucleic acid of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 1500%, at least about 200%, or at least about 300% when compared to constructs without alternative or longer promoters and/or auxiliary elements as evaluated in an in vitro assay. In one embodiment, the first promoter sequence of the polynucleotide has at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, or at least about 800 nucleotides. In another embodiment, the second promoter sequence of the polynucleotide has at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, or at least about 800 nucleotides. Embodiments of promoters are described more fully below.

In some embodiments, the present disclosure provides a polynucleotide, wherein the polynucleotide comprises one or more sequences selected from the sequences set forth in tables 8-10, 12, 13, 17-22, and 24-27, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. In another embodiment, the present disclosure provides a polynucleotide wherein the polynucleotide comprises a sequence selected from the group consisting of the sequences set forth in tables 8-10, 12, 13, and 17-23 and 24-27. In some embodiments, the polynucleotide sequences differ from those shown in tables 8-10, 12, 13 and 17-22, and 24-26 only in the selection of a targeting sequence for the gRNA or grnas encoded by the polynucleotide, wherein the targeting sequence is a sequence having 15 to 30 nucleotides that is capable of hybridizing to the sequence of the target nucleic acid. In the foregoing embodiments, the targeting sequence is selected from the sequences shown in table 27. In some embodiments, the present disclosure provides a polynucleotide of any one of the embodiments described herein, wherein the polynucleotide has the configuration of the construct of fig. 24, 33-35, or 42.

In some embodiments, the disclosure provides a polynucleotide for use in preparing an AAV vector, wherein the polynucleotide comprises one or more sequences selected from the sequences shown in tables 8-10, 12, 13, and 17-22 and 24-27, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. In another embodiment, the present disclosure provides a polynucleotide for use in preparing an AAV vector, wherein the polynucleotide comprises a sequence selected from the group consisting of the sequences set forth in tables 8-10, 12, 13, 17-22, and 24-27. In some embodiments, polynucleotide sequences differ from those shown in tables 8-10, 12, 13, 17-22, and 24-26 only in the selection of a targeting sequence for a gRNA encoded by a polynucleotide, wherein the targeting sequence is a sequence having 15 to 30 nucleotides and capable of hybridizing to a sequence of a target nucleic acid to be modified. In the foregoing embodiments, the targeting sequence is selected from the group of sequences shown in table 27. In some embodiments, the disclosure provides a polynucleotide for use in preparing an AAV vector of any of the embodiments described herein, wherein the polynucleotide has the configuration of the construct of fig. 24, 33-35, or 42.

Guide nucleic acid for AAV systems

In some embodiments, the disclosure relates to specifically designed guide ribonucleic acids (grnas) for use in AAV systems, which have utility in genome editing of target nucleic acids in cells. The present disclosure provides specifically designed grnas having a targeting sequence that is complementary to (and thus capable of hybridizing to) a target nucleic acid that is a component of a gene editing AAV system. It is contemplated that in some embodiments, multiple grnas (e.g., multiple grnas) are delivered in an AAV system for modifying a target nucleic acid. For example, when each is complexed with a CRISPR nuclease, a pair of grnas having targeting sequences for different or overlapping regions of the target nucleic acid sequence can be used so as to bind and cleave at two different or overlapping sites within the gene, and then be edited by non-homologous end joining (NHEJ), homology Directed Repair (HDR), homology Independent Targeted Integration (HITI), micro-homology mediated end joining (MMEJ), single Strand Annealing (SSA), or Base Excision Repair (BER).

In some embodiments, the disclosure provides grnas for use in a system that have utility in genome editing genes in eukaryotic cells. In particular embodiments, the gRNA of the system is capable of forming a complex with a CRISPR nuclease; ribonucleoprotein (RNP) complexes, described more fully below.

a. Reference gRNA and gRNA variants

In some embodiments, the grnas of the disclosure comprise the sequence of a naturally-occurring guide RNA ("reference gRNA"). In some embodiments, the reference grnas of the present disclosure may be subjected to one or more mutagenesis methods, such as the mutagenesis methods described herein, which may include Deep Mutant Evolution (DME), deep Mutant Scanning (DMS), error-prone PCR, cassette mutagenesis, random mutagenesis, staggered-extension PCR, gene shuffling, or domain swapping (as described herein and in WO2020247883A2, incorporated herein by reference) in order to produce one or more gRNA variants (referred to herein as "gRNA variants") having enhanced or altered properties relative to the reference grnas. gRNA variants also include variants that comprise one or more exogenous sequences, such as fusion to the 5 'or 3' end, or insertion into the interior. The activity of a reference gRNA or variant derived therefrom can be used as a baseline for comparing the activity of the gRNA variants, thereby measuring improvements in the function or other characteristics of the gRNA variants. In other embodiments, the reference gRNA or gRNA variant can undergo one or more deliberate, specifically targeted mutations in order to produce a gRNA variant; such as a rationally designed variant.

In some embodiments, the guide is a ribonucleic acid molecule ("gRNA"), and in other embodiments, the guide is a chimera and comprises both DNA and RNA.

The gRNA of the present disclosure comprises two fragments; targeting sequences and protein binding fragments. Targeting fragments of grnas include nucleotide sequences (interchangeably referred to as guide sequences, spacer regions, targeting sequences, or targeting regions) that are complementary to (and thus can hybridize to) a particular sequence (target site) within a target nucleic acid (e.g., target ssRNA, target ssDNA, complementary strand of double-stranded target DNA, etc.), described more fully below. The targeting sequence of the gRNA is capable of binding to a target nucleic acid sequence comprising a coding sequence, a complement of the coding sequence, a non-coding sequence, and a helper element. The protein binding fragment (or "protein binding sequence") interacts (e.g., binds) with CasX protein as a complex, forming RNP (described more fully below). The protein binding fragments are also referred to herein as "scaffolds" and consist of several regions, described more fully below.

In the case of bi-directional guide RNAs (dgrnas), the targeting and activator moieties each have a duplex-forming fragment, wherein the duplex-forming fragments of the targeting and activator are complementary to each other and hybridize to each other to form a double-stranded duplex (dsRNA duplex for gRNA). When the gRNA is a gRNA, the term "targeting" or "targeting RNA" as used herein refers to the crRNA-like molecule of the CasX double-guide RNA (crRNA: "CRISPR RNA") (and thus the crRNA-like molecule of the CasX single-guide RNA when the "activator" and "targeting" are linked together, e.g., by insertion of nucleotides). The crRNA has a 5' region that anneals to the tracrRNA, followed by nucleotides of the targeting sequence. Thus, for example, a guide RNA (dgRNA or sgRNA) comprises a guide sequence and a duplex-forming fragment of a crRNA, which duplex-forming fragment may also be referred to as a crRNA repeat. The corresponding tracrRNA-like molecule (activator) also comprises a duplex-forming fragment of nucleotides that forms the other half of the dsRNA duplex of the protein-binding fragment of the guide RNA. Thus, the targeting agent and activator act as corresponding pairs, hybridizing to form a two-way guide RNA, referred to herein as "two-way gRNA", "dgRNA", "two-way guide RNA" or "two-way guide RNA". Site-specific binding and/or cleavage of a target nucleic acid sequence (e.g., genomic DNA) by a CasX protein can occur at one or more positions (e.g., the sequence of the target nucleic acid) determined by base pairing complementarity between the targeting sequence of the gRNA and the target nucleic acid sequence. Thus, for example, a gRNA of the present disclosure has a sequence complementary to, and thus can hybridize to, a target nucleic acid adjacent to a sequence complementary to a TC PAM motif or PAM sequence, such as ATC, CTC, GTC or TTC. Because the targeting sequence of the guide sequence hybridizes to the sequence of the target nucleic acid sequence, the user can modify the targeting agent to hybridize to a particular target nucleic acid sequence, as long as the location of the PAM sequence is considered. Thus, in some cases, the sequence of the targeting agent may be a complement of a non-naturally occurring sequence. In other cases, the sequence of the targeting agent may be a naturally occurring sequence derived from the complement of the gene sequence to be edited. In other embodiments, the activator and the target of the gRNA are covalently linked to each other (rather than hybridized to each other) and comprise a single molecule, referred to herein as a "single molecule gRNA", "single guide RNA", "single molecule guide RNA" or "sgRNA". In some embodiments, the sgrnas include "activators" or "targets", and thus may be "activator-RNAs" and "targets-RNAs", respectively. In some embodiments, the gRNA is a ribonucleic acid molecule ("gRNA"), and in other embodiments, the gRNA is a chimera and comprises both DNA and RNA. As used herein, the term gRNA encompasses naturally occurring molecules as well as sequence variants (e.g., non-naturally occurring modified nucleotides).

In general, the assembled grnas of the present disclosure comprise four distinct regions or domains: RNA triplexes, scaffold stems, extension stems, and targeting sequences, which in embodiments of the present disclosure are specific for a target nucleic acid and are located at the 3' end of the gRNA. Together, the RNA triplex, the scaffold stem and the extension stem are referred to as the "scaffold" of gRNA (gRNA scaffold). The gRNA scaffold of the disclosure can comprise RNA, or RNA and DNA.

RNA triplexes

In some embodiments of the guide NA provided herein (including the reference sgRNA), there is an RNA triplex, and the RNA triplex comprises the sequence of the UU- -nX (-4-15) - - -UU (SEQ ID NO: 19) stem loop ending with AAAG (SEQ ID NO: 40786) after 2 insertion of the stem loop (scaffold stem loop and extension stem loop), thereby forming a pseudoknot that can also extend beyond the triplex into a duplex pseudoknot. The triplex UU-AAA (SEQ ID NO: 40787) sequence forms the binding between the targeting sequence, the scaffold stem and the extension stem. In an exemplary CasX sgRNA, the UUU-loop-UUU region is encoded first, then the scaffold stem loop is encoded, then the extended stem loop (which is linked by a four-membered loop) is encoded, then the triplex is blocked with AAAG (SEQ ID NO: 40786) before becoming the targeting sequence.

c. Bracket stem ring

In some embodiments of the CasX sgrnas of the present disclosure, the triplex region is followed by a scaffold stem loop. The scaffold stem loop is the region in the gRNA that binds to CasX proteins (such as CasX variant proteins). In some embodiments, the scaffold stem loop is a relatively short and stable stem loop. In some cases, the scaffold stem loop does not allow for many changes and requires some form of RNA vesicle. In some embodiments, the scaffold stem is necessary for CasX sgRNA function. Although this scaffold stem may resemble the binding stem of Cas9 as a critical stem loop, in some embodiments, the scaffold stem of CasX sgRNA has the necessary projections (RNA bubbles) that are different from many other stem loops present in the CRISPR/Cas system. In some embodiments, the presence of the bulge is conserved in sgrnas that interact with different CasX proteins. Exemplary sequences for the scaffold stem loop sequence of gRNA include sequence CCAGCGACUAUGUCGUAUGG (SEQ ID NO: 14).

d. Extended stem loop

In some embodiments of the CasX sgrnas of the present disclosure, the scaffold stem loop is followed by an extended stem loop. In some embodiments, the extension stem comprises a majority of synthetic tracr and crRNA fusions that do not bind to CasX protein. In some embodiments, the extended stem loop may be highly malleable. In some embodiments, a single guide gRNA is prepared between tracr and crRNA in the extended stem loop with a GAAA (SEQ ID NO: 40788) four-membered ring linker or a GAGAAA (SEQ ID NO: 40789) linker. In some cases, the targeting and activating factors of the CasX sgrnas are linked to each other by intervening nucleotides, and the linker may have a length of 3 to 20 nucleotides. In some embodiments of the CasX sgrnas of the present disclosure, the extension stem is a large 32-bp loop that is located outside of the CasX protein in the ribonucleoprotein complex. Exemplary sequences of the extended stem-loop sequence of sgRNA include sequence GCGCUUAUUUAUCGGAGAGAAAUCCGAUAAAUAAGAAGC (SEQ ID NO: 15).

e. Targeting sequences

In some embodiments of the grnas of the disclosure, the extended stem loop is followed by a region that forms part of a triplex, followed by a targeting sequence (or "spacer") at the 3' end of the gRNA. Targeting sequences target the CasX ribonucleoprotein full complex to specific regions of the target nucleic acid sequence of the gene to be modified. Thus, for example, when either the TC PAM motif or PAM sequence TTC, ATC, GTC or CTC is located 1 nucleotide 5' of a non-target strand sequence complementary to the target sequence, the gRNA targeting sequences of the present disclosure have a sequence that is complementary to, and thus hybridizable to, a portion of a gene in a target nucleic acid (e.g., eukaryotic chromosome, chromosomal sequence, etc.) of a eukaryotic cell. The targeting sequence of the gRNA can be modified so that the gRNA can target the desired sequence of any desired target nucleic acid sequence, provided that PAM sequence positions are considered. In some embodiments, the gRNA scaffold is the 5 'end of the targeting sequence, wherein the targeting sequence is at the 3' end of the gRNA. In some embodiments, the PAM motif sequence recognized by the nuclease of RNP is TC. In other embodiments, the PAM sequence recognized by the nuclease of RNP is NTC; i.e. ATC, CTC, GTC or TTC.

In some embodiments, the targeting sequence is specific for a regulatory element that modulates expression of the gene product. Such regulatory elements include, but are not limited to, promoter regions, enhancer regions, intergenic regions, 5 'untranslated regions (5' UTRs), 3 'untranslated regions (3' UTRs), conserved elements, and regions comprising cis-regulatory elements. The promoter region is intended to include nucleotides within 5kb of the start point of the coding sequence, or in the case of gene enhancer elements or conserved elements, may be thousands, hundreds of thousands or even millions of bp from the coding sequence of the target nucleic acid gene. In the above, a target is one in which the coding gene of the target is knocked out or knocked down such that the gene product is not expressed or expressed at a lower level in the cell.

In some embodiments, the CasX: gRNA system comprises a first gRNA and further comprises a second (and optionally third, fourth, fifth, or more) gRNA, wherein the second gRNA or additional gRNA has a targeting sequence that is complementary to a different or overlapping portion of the target nucleic acid sequence as compared to the targeting sequence of the first gRNA, such that multiple points in the target nucleic acid are targeted, and multiple breaks are introduced in the target nucleic acid, e.g., by CasX. It will be appreciated that in this case the second or further gRNA is complexed with a further copy of the CasX protein. By selecting a targeting sequence for a gRNA, the CasX: gRNA system described herein can be used to modify or edit defined regions of a target nucleic acid sequence that contain mutations, including facilitating insertion of a donor template or excision of DNA between cleavage sites, for example where mutation duplication or removal of exons containing mutations still results in expression of a functional gene product.

gRNA scaffolds

The remaining components of the gRNA, except the targeting sequence domain, are referred to herein as scaffolds. In some embodiments, the gRNA scaffold is derived from a naturally occurring sequence, described below as a reference gRNA. In other embodiments, the gRNA scaffold is a variant of a reference gRNA in which mutations, insertions, deletions, or domain substitutions are introduced to confer desired properties to the gRNA.

In some embodiments, the CasX reference gRNA comprises a sequence isolated from or derived from delta-proteobacteria (deltaproteobacteria). In some embodiments, the sequence is a CasX tracrRNA sequence. Exemplary CasX reference tracrRNA sequences isolated from or derived from delta-proteobacteria (deltaproteobacteria) may include:

ACAUCUGGCGCGUUUAUUCCAUUACUUUGGAGCCAGUCCCAGCGACUAUGUCGUAUGGACGAAGCGCUUAUUUAUCGGAGA (SEQ ID NO: 22) and

ACAUCUGGCGCGUUUAUUCCAUUACUUUGGAGCCAGUCCCAGCGACUAUGUCGUAUGGACGAAGCGCUUAUUUAUCGG (SEQ ID NO: 23). An exemplary crRNA sequence isolated from or derived from delta-Protebacter (DeltaProtebacter) may include sequence CCGAUAAGUAAAACGCAUCAAAG (SEQ ID NO: 24). In some embodiments, the CasX reference gRNA comprises a sequence identical to a sequence isolated from or derived from delta-proteobacteria (deltaproteobacteria).

In some embodiments, the CasX reference guide RNA comprises a sequence isolated from or derived from phylum superficial mycotes (Planctomycetes). In some embodiments, the sequence is a CasX tracrRNA sequence. Exemplary CasX reference tracrRNA sequences isolated or derived from phylum of phylum pumilus (Planctomycetes) may include:

UACUGGCGCUUUUAUCUCAUUACUUUGAGAGCCAUCACCAGCGACUAUGUCGUAUGGGUAAAGCGCUUAUUUAUCGGAGA (SEQ ID NO: 25) and

UACUGGCGCUUUUAUCUCAUUACUUUGAGAGCCAUCACCAGCGACUAUGUCGUAUGGGUAAAGCGCUUAUUUAUCGG (SEQ ID NO: 26). Exemplary crRNA sequences isolated or derived from the phylum Plactomycetes (Plactomycetes) can include sequences

UCUCCGAUAAAUAAGAAGCAUCAAAG (SEQ ID NO: 27). In some embodiments, the CasX reference gRNA comprises a sequence identical to a sequence isolated from or derived from phylum superficial mycotes.

In some embodiments, the CasX reference gRNA comprises a sequence isolated or derived from Candidatus Sungbacteria. In some embodiments, the sequence is a CasX tracrRNA sequence. Exemplary CasX reference tracrRNA sequences isolated or derived from Candidatus Sungbacteria may include the following sequences: GUUUACACACUCCCUCUCAUAGGGU (SEQ ID NO: 28), GUUUACACACUCCCUCUCAUGAGGU (SEQ ID NO: 11), UUUUACAUACCCCCUCUCAUGGGAU (SEQ ID NO: 12) and GUUUACACACUCCCUCUCAUGGGGG (SEQ ID NO: 13). In some embodiments, the CasX reference guide RNA comprises a sequence identical to a sequence isolated from or derived from Candidatus Sungbacteria.

Table 1 provides the sequences of the reference gRNA tracr, cr and scaffold sequences. In some embodiments, the disclosure provides a gRNA variant sequence, wherein the gRNA has a scaffold comprising the sequence: a sequence having at least one nucleotide modification relative to a reference gRNA sequence having the sequence of any one of SEQ ID NOs 4-16 of table 1. It will be appreciated that in those embodiments in which the vector comprises a DNA coding sequence for a gRNA, or in those embodiments in which the gRNA is a chimera of RNA and DNA, thymine (T) bases may be substituted for uracil (U) bases in any of the gRNA sequence embodiments described herein.

Table 1: reference gRNA tracr and scaffold sequences

gRNA variants

In another aspect, the disclosure relates to a gRNA variant comprising one or more modifications relative to a reference gRNA scaffold or derived from another gRNA variant. As used herein, "scaffold" refers to all portions of the gRNA necessary for gRNA function, except for spacer sequences.

In some embodiments, a gRNA variant comprises a region having one or more nucleotide substitutions, insertions, deletions, or exchanges or substitutions relative to a reference gRNA sequence of the disclosure. In some embodiments, mutations can occur in any region of the reference gRNA scaffold to produce a gRNA variant. In some embodiments, the scaffold of the gRNA variant sequence has at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, or at least 70%, at least 80%, at least 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to the sequence of SEQ ID NO. 4 or SEQ ID NO. 5. In other embodiments, a gRNA variant comprises a region having one or more nucleotide substitutions, insertions, deletions, or exchanges or substitutions relative to the gRNA variant sequences of the disclosure. In some embodiments, the scaffold of the gRNA variant sequence has at least 50%, at least 60%, or at least 70%, at least 80%, at least 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to the sequence of SEQ ID NO:2238 or SEQ ID NO: 2239.

In some embodiments, a gRNA variant comprises one or more nucleotide changes within one or more regions of a reference gRNA scaffold, which gRNA variant improves the characteristics of the reference gRNA. Exemplary regions include RNA triplexes, pseudoknots, stent stem loops, and extended stem loops. In some cases, the variant scaffold stem further comprises a bleb. In other cases, the variant scaffold further comprises a triplex loop region. In other cases, the variant scaffold further comprises a 5' unstructured region. In some embodiments, the gRNA variant scaffold comprises a scaffold stem loop having at least 60% sequence identity, at least 70% sequence identity, at least 80% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or at least 99% sequence identity with SEQ ID NO. 14. In some embodiments, the gRNA variant scaffold comprises a scaffold stem loop having at least 60% sequence identity to SEQ ID NO. 14. In other embodiments, the gRNA variants comprise a scaffold stem loop having the sequence CCAGCGACUAUGUCGUAGUGG (SEQ ID NO: 32). In other embodiments, the present disclosure provides a gRNA scaffold comprising a C18G substitution, a G55 insertion, a U1 deletion, and a modified extended stem loop relative to SEQ ID NO. 5, wherein the initial 6nt loop and 13 base pairs closest to the loop (32 nucleotides total) are replaced with a Uvsx hairpin (4 nt loop and 5 base pairs closest to the loop; 14 nucleotides total) and the loop distal base of the extended stem is converted to a fully base paired stem contiguous with the new Uvsx hairpin by deletion of A99 and substitution of G65U. In the previous embodiments, the gRNA scaffold 174 comprises the sequence

ACUGGCGCUUUUAUCUGAUUACUUUGAGAGCCAUCACCAGCGACUAUGUCGUAGUGGGUAAAGCUCCCUCUUCGGAGGGAGCAUCAAAG(SEQ ID NO:2238)。

All gRNA variants having one or more improved features or added one or more new functions are considered to be within the scope of the present disclosure when comparing the variant gRNA to the reference gRNA described herein. A representative example of such a gRNA variant is guide 174 (SEQ ID NO: 2238), the design of which is described in the examples, and guide 235 (SEQ ID NO: 39987). In some embodiments, the gRNA variant adds a new function to the RNP comprising the gRNA variant. In some embodiments, the gRNA variant has an improved feature selected from the group consisting of: increased stability; increased transcription of the gRNA; increased resistance to nuclease activity; increased folding rate of gRNA; reduced formation of byproducts during folding; increased productive folding; increased binding affinity to CasX protein; increased binding affinity to a target nucleic acid when complexed with a CasX protein; increased gene editing when complexed with CasX proteins; increased editing specificity of the target nucleic acid when complexed with CasX protein; reduced off-target editing when complexed with CasX protein; and when complexed with CasX proteins and any combination thereof, exploit the increased capacity of a broader spectrum of one or more PAM sequences (including ATC, CTC, GTC or TTC) in the editing of target nucleic acids. In some cases, one or more improved characteristics of the gRNA variant are increased by at least about 1.1-fold to about 100,000-fold as compared to the reference gRNA of SEQ ID No. 4 or SEQ ID No. 5 or as compared to gRNA variants 174 or 175. In other cases, one or more improved characteristics of the gRNA variant are increased by at least about 1.1-fold, at least about 10-fold, at least about 100-fold, at least about 1000-fold, at least about 10,000-fold, at least about 100,000-fold, or more as compared to the reference gRNA of SEQ ID No. 4 or SEQ ID No. 5 or as compared to gRNA variant 174 or 175. In other cases, the sequence corresponding to SEQ ID NO:4 or SEQ ID NO:5 or compared to the gRNA variants 174 or 175, one or more improved characteristics of the gRNA variant are increased by about 1.1-100,00-fold, about 1.1-10,00-fold, about 1.1-1,000-fold, about 1.1-500-fold, about 1.1-100-fold, about 1.1-50-fold, about 1.1-20-fold, about 10-100,00-fold, about 10-10,00-fold, about 10-1,000-fold, about 10-500-fold, about 10-100-fold, about 10-50-fold, about 10-20-fold, about 2-70-fold, about 2-50-fold, about 2-30-fold, about 2-20-fold, about 2-10-fold, about 5-50-fold, about 5-30-fold, about 5-10-fold, about 100-100,00-fold, about 100-00-fold, about 100-1,000-fold, about 100-500-fold, about 500-100-000-fold, about 10-100-fold, about 10-500-fold, about 10-100-fold, about 10-20-fold, about 2-5-fold, about 2-fold-5-fold, about 20-fold, about 2-fold, about 5-fold, about 5-fold-5-fold, about 10-5-fold, about 10-fold-5-fold, about 20-fold-5-fold, about 20-fold-5-fold-10-fold-20-fold-10-20-fold-20-fold-20-about-20-fold-20-fold-20-. In other cases, one or more improved characteristics of a gRNA variant are increased by about 1.1-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, 1.9-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 11-fold, 12-fold, 13-fold, 14-fold, 15-fold, 16-fold, 17-fold, 18-fold, 19-fold, 20-fold, 25-fold, 30-fold, 40-fold, 45-fold, 50-fold, 55-fold, 60-fold, 70-fold, 80-fold, 90-fold, 100-fold, 110-fold, 120-fold, 130-fold, 140-fold, 150-fold, 160-fold, 170-fold, 180-fold, 190-fold, 210-fold, 220-fold, 230-fold, 240-fold, 250-fold, 260, 270-fold, 300, 340, 400-fold, 400, 340-fold, 400, 425-fold, 400-fold, or 475-fold compared to a reference gRNA of SEQ ID NO 4 or 5 or a gRNA variant of SEQ ID No. 174 or 175.

In some embodiments, the gRNA variants can be produced by subjecting a reference gRNA or gRNA variant to one or more mutagenesis methods, such as the mutagenesis methods described below, which can include Deep Mutation Evolution (DME), deep Mutation Scanning (DMS), error-prone PCR, cassette mutagenesis, random mutagenesis, staggered-extension PCR, gene shuffling, or domain swapping, in order to produce the gRNA variants of the disclosure. The activity of the reference gRNA or the gRNA variant can be used as a baseline for comparing the activity of the gRNA variant, thereby measuring the improvement in the function of the gRNA variant. In other embodiments, the reference gRNA or gRNA variant can undergo one or more deliberate, targeted mutations, substitutions, or domain exchanges in order to produce a gRNA variant, e.g., a rationally designed variant. Exemplary gRNA variants produced by such methods are described in the examples, and representative sequences of the gRNA scaffolds are shown in table 2.

In some embodiments, the gRNA variant comprises one or more modifications as compared to a reference guide nucleic acid scaffold sequence or a gRNA variant scaffold sequence, wherein the one or more modifications are selected from the group consisting of: at least one nucleotide substitution in a region of the gRNA variant; at least one nucleotide deletion in the region of the gRNA; at least one nucleotide insertion in a region of the gRNA; substitution of all or part of the region of the gRNA; deletion of all or part of the region of the gRNA; or any combination of the foregoing. In some cases, the modification is a substitution of 1 to 15 contiguous or non-contiguous nucleotides in one or more regions of the gRNA. In other cases, the modification is a deletion of 1 to 10 contiguous or non-contiguous nucleotides in one or more regions of the gRNA. In other cases, the modification is the insertion of 1 to 10 contiguous or non-contiguous nucleotides in one or more regions of the gRNA. In other cases, the modification is a substitution of a scaffold stem loop or an extended stem loop with an RNA stem loop sequence from a heterologous RNA source having proximal 5 'and 3' ends. In some cases, a gRNA variant of the disclosure comprises two or more modifications in one region relative to the gRNA. In other cases, the gRNA variants of the disclosure comprise modifications in two or more regions. In other cases, the gRNA variants comprise any combination of the foregoing modifications described in this paragraph. In some embodiments, exemplary modifications of the grnas of the disclosure include modifications of table 2.

In some embodiments, 5' G is added to the gRNA variant sequence relative to a reference gRNA for in vivo expression, because transcription from the U6 promoter is more efficient and consistent in the start site when nucleotide +1 is G. In other embodiments, two 5' G are added to generate a gRNA variant sequence for in vitro transcription to increase production efficiency, as T7 polymerase strongly favors purines at the G and +2 positions of the +1 position. In some cases, a 5' g base is added to the reference scaffold of table 1. In other cases, a 5' g base is added to the variant scaffold of table 2.

Table 2 provides exemplary gRNA variant scaffold sequences. In some embodiments, the gRNA variant scaffold comprises any one of the sequences SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, or 41817 as set forth in table 2, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity thereto. In some embodiments, the gRNA variant scaffold comprises any of the sequences SEQ ID NO 2238-2285, 39981-40026, 40913-40958, or 41817, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity thereto. In some embodiments, the gRNA variant scaffold comprises any one of the sequences SEQ ID NOs 2281-2285, 39981-40026, 40913-40958, or 41817, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity thereto. It will be appreciated that in those embodiments in which the vector comprises a DNA coding sequence for a gRNA, or in those embodiments in which the gRNA is a chimera of RNA and DNA, thymine (T) bases may be substituted for uracil (U) bases in any of the gRNA sequence embodiments described herein.

Table 2: exemplary gRNA scaffold sequences

In some embodiments, the sgRNA variant comprises one or more additional modifications to the sequences of SEQ ID NO:2238, SEQ ID NO:2239, SEQ ID NO:2240, SEQ ID NO:2241, SEQ ID NO:2243, SEQ ID NO:2256, SEQ ID NO:2274, SEQ ID NO:2275, SEQ ID NO:2279, SEQ ID NO:2281, SEQ ID NO:2285, SEQ ID NO:39984, SEQ ID NO:39987, or SEQ ID NO:40003 of Table 2.

In some embodiments of the gRNA variants of the disclosure, the gRNA variants comprise at least one modification compared to the reference guide scaffold of SEQ ID NO:5, wherein the at least one modification is selected from one or more of: (a) a C18G substitution in the triplex ring; (b) insertion of G55 in the bleb; (c) U1 is deleted; (d) Modification of an extended stem loop wherein (i) the 6nt loop and 13 loop-proximal base pairs are replaced with a Uvsx hairpin; and (ii) the deletion of A99 and the substitution of G65U results in a fully base-paired loop-distal base.

In some embodiments, the gRNA variant comprises an exogenous stem loop with long non-coding RNA (lncRNA). As used herein, lncRNA refers to non-coding RNAs longer than about 200bp in length. In some embodiments, the 5 'end and the 3' end of the exogenous stem loop are base paired; i.e., a region that interacts to form duplex RNA. In some embodiments, the 5 'end and the 3' end of the exogenous stem loop are base paired, and one or more regions between the 5 'end and the 3' end of the exogenous stem loop are not base paired, thereby forming a loop.

In some embodiments, the disclosure provides a gRNA variant having a nucleotide modification relative to a reference gRNA, the gRNA variant having: (a) Substitutions of 1 to 15 contiguous or non-contiguous nucleotides in one or more regions of the gRNA variant; (b) Deletions of 1 to 10 contiguous or non-contiguous nucleotides in one or more regions of the gRNA variant; (c) Insertion of 1 to 10 contiguous or non-contiguous nucleotides in one or more regions of a gRNA variant; (d) Substitution of a scaffold stem loop or an extended stem loop with an RNA stem loop sequence from a heterologous RNA source having proximal 5 'and 3' ends; or any combination of (a) to (d). Any of the substitutions, insertions, and deletions described herein can be combined to produce a gRNA variant of the disclosure. For example, a gRNA variant can comprise at least one substitution and at least one deletion relative to a reference gRNA, at least one substitution and at least one insertion relative to a reference gRNA, at least one insertion and at least one deletion relative to a reference gRNA, or at least one substitution, one insertion and one deletion relative to a reference gRNA.

In some embodiments, the sgRNA variants of the present disclosure comprise one or more modifications to the sequence of a previously generated variant that itself serves as the sequence to be modified. In some cases, one or more modifications are introduced into the pseudoknot region of the stent. In other cases, one or more modifications are introduced into the triplex region of the scaffold. In other cases, one or more modifications are introduced into the stent bulb. In other cases, one or more modifications are introduced into the extended stem region of the scaffold. In other cases, one of the modifications is introduced into two or more of the aforementioned regions. Such modifications may include insertions, deletions, or substitutions of one or more nucleotides in the aforementioned regions, or any combination thereof. An exemplary method of generating and evaluating modifications is described in example 20.

In some embodiments, the sgRNA variant comprises one or more modifications to the sequence of SEQ ID NO:2238, SEQ ID NO:2239, SEQ ID NO:2240, SEQ ID NO:2241, SEQ ID NO:2274, SEQ ID NO:2275, SEQ ID NO:2279 or SEQ ID NO:2285, SEQ ID NO:39984, SEQ ID NO:39987, or SEQ ID NO: 40003.

In exemplary embodiments, the gRNA variant comprises one or more modifications relative to the gRNA scaffold variant 174 (SEQ ID NO: 2238), wherein the resulting gRNA variant exhibits improved functional characteristics compared to the parent 174 when assessed in vitro or in vivo assays under comparable conditions. In other exemplary embodiments, the gRNA variant comprises one or more modifications relative to the gRNA scaffold variant 175 (SEQ ID NO: 2239), wherein the resulting gRNA variant exhibits improved functional characteristics compared to the parent 175 when assessed in vitro or in vivo assays under comparable conditions. For example, variants with modifications to the triplex loop of the gRNA variant 175 show high enrichment relative to the 175 scaffold, particularly mutations to C15 or C17. In addition, the changes in either member of the predicted pair in the pseudotubers between G7 and A29 are highly enriched relative to the 175 scaffold, converting A29 to C or T to form the typical Watson-Crick pairing (G7: C29), and the second of these would form the GU wobble pair (G7: U29), both of which are expected to increase the stability of the helix relative to the G: A pair. In addition, insertion of C at position 54 of guide stent 175 results in an enriched modification.

In some embodiments, the present disclosure provides a gRNA variant comprising one or more modifications selected from the group consisting of modified gRNA scaffold variants 174 of table 28 (SEQ ID NO: 2238), wherein the resulting gRNA variant exhibits improved functional characteristics compared to parent 174 when assessed in vitro or in vivo assays under comparable conditions. In some embodiments, the improved functional characteristic is one or more functional characteristics selected from the group consisting of: increased editing activity, increased pseudoknot stem stability, increased triplex region stability, increased scaffold stem stability, extended stem stability, reduced off-target folding intermediates, and increased binding affinity to class 2V CRISPR proteins. In the foregoing embodiments, in an in vitro assay, the one or more modified grnas comprising the modification selected from table 28 (having an attached targeting sequence and complexed with a class 2V CRISPR protein) to the gRNA scaffold variant 174 exhibit an enhanced enrichment score (log) of at least about 2.0, at least about 2.5, at least about 3, or at least about 3.5 compared to the score of the gRNA scaffold of SEQ ID No. 2238₂ )。

In some embodiments, the disclosure provides a gRNA variant comprising a selectionOne or more modifications from modified gRNA scaffold variant 175 (SEQ ID NO: 2239) of table 29, wherein the resulting gRNA variant exhibits improved functional characteristics compared to parent 175 when assessed in vitro or in vivo assays under comparable conditions. In some embodiments, the improved functional characteristic is one or more functional characteristics selected from the group consisting of: increased editing activity, increased pseudoknot stem stability, increased triplex region stability, increased scaffold stem stability, extended stem stability, reduced off-target folding intermediates, and increased binding affinity to class 2V CRISPR proteins. In the foregoing embodiments, one or more modified grnas comprising a modification selected from table 29 (having a linked targeting sequence and complexed with a class 2V CRISPR protein) to a gRNA scaffold variant 175 exhibit an enhanced enrichment score (log) of at least about 1.2, at least about 1.5, at least about 2.0, at least about 2.5, at least about 3, or at least about 3.5 in comparison to the score of a gRNA scaffold of SEQ ID NO:2239 in an in vitro assay₂ )。

In particular embodiments, one or more modifications of the gRNA scaffold variant 174 are selected from the group consisting of nucleotide positions U11, U24, a29, U65, C66, C68, a69, U76, G77, a79, and a87. In a specific embodiment, the modification of the gRNA scaffold variant 174 is U11C, U24C, A C, U65C, C66G, C U, ACGGA inserted at position 69, UCCGU inserted at position 76, G77A, GA inserted at position 79, a87G. In another specific embodiment, the modification of the gRNA scaffold variant 175 is selected from the group consisting of nucleotide positions C9, U11, C17, U24, a29, G54, C65, a89, and a96. In a specific embodiment, the modification of the gRNA scaffold variant 174 is C9U, U11C, C G, U C, A C, insertion of G at position 54, insertion of C, A89G and a96G at position 65.

In exemplary embodiments, the gRNA variants comprise one or more modifications relative to the gRNA scaffold variant 215 (SEQ ID NO: 2275), wherein the resulting gRNA variants exhibit improved functional characteristics compared to the parent 215 when assessed in vitro or in vivo assays under comparable conditions.

In exemplary embodiments, the gRNA variant comprises one or more modifications relative to the gRNA scaffold variant 221 (SEQ ID NO: 2281), wherein the resulting gRNA variant exhibits improved functional characteristics compared to the parent 221 when assessed in an in vitro or in vivo assay under comparable conditions.

In exemplary embodiments, the gRNA variants comprise one or more modifications relative to the gRNA scaffold variant 225 (SEQ ID NO: 2285), wherein the resulting gRNA variants exhibit improved functional characteristics compared to the parent 225 when assessed in vitro or in vivo assays under comparable conditions.

In exemplary embodiments, the gRNA variant comprises one or more modifications relative to the gRNA scaffold variant 235 (SEQ ID NO: 39987), wherein the resulting gRNA variant exhibits improved function compared to the parent 225 when assessed in an in vitro or in vivo assay under comparable conditions.

In exemplary embodiments, the gRNA variant comprises one or more modifications relative to the gRNA scaffold variant 251 (SEQ ID NO: 40003), wherein the resulting gRNA variant exhibits improved functional characteristics compared to the parent 251 when assessed in vitro or in vivo assays under comparable conditions.

In the foregoing embodiments, the improved functional features include, but are not limited to, one or more of the following: increased stability, increased gRNA transcription, increased resistance to nuclease activity, increased gRNA folding rate, reduced byproduct formation during folding, increased productive folding, increased binding affinity to CasX protein, increased binding affinity to target nucleic acid when complexed with CasX protein, increased editing specificity when complexed with CasX protein, reduced off-target editing when complexed with CasX protein, and the ability to utilize a broader spectrum of one or more PAM sequences (including ATC, CTC, GTC or TTC) in the modification of target nucleic acid when complexed with CasX protein. In some cases, the one or more improved characteristics of the gRNA variant are improved by at least about 1.1-fold to about 100,000-fold as compared to the gRNA derived therefrom. In other cases, one or more improved characteristics of the gRNA variant are improved by at least about 1.1-fold, at least about 10-fold, at least about 100-fold, at least about 1000-fold, at least about 10,000-fold, at least about 100,000-fold, or more as compared to the gRNA derived therefrom. In other cases, the target cell, as compared to the gRNA derived therefrom, one or more improved characteristics of the gRNA variants are improved by about 1.1-100, 00-fold, about 1.1-10, 00-fold, about 1.1-1,000-fold, about 1.1-500-fold, about 1.1-100-fold, about 1.1-50-fold, about 1.1-20-fold, about 10-100, 00-fold, about 10-10, 00-fold, about 10-1,000-fold, about 10-500-fold, about 10-100-fold, about 10-50-fold, about 10-20-fold, about 2-70-fold, about 2-50-fold, about 2-30-fold, about 2-20-fold, about 2-10-fold, about 5-50-fold, about 5-30-fold, about about 5 to 10 times, about 100 to 100,00 times, about 100 to 10,00 times, about 100 to 1,000 times, about 100 to 500 times, about 500 to 100,00 times, about 500 to 10,00 times, about 500 to 1,000 times, about 500 to 750 times, about 1,000 to 100,00 times, about 10,000 to 100,00 times, about 20 to 500 times, about 20 to 250 times, about 20 to 200 times, about 20 to 100 times, about 20 to 50 times, about 50 to 10,000 times, about 50 to 1,000 times, about 50 to 500 times, about 50 to 200 times, or about 50 to 100 times. In other cases, the target cell, as compared to the gRNA derived therefrom, one or more improved characteristics of the gRNA variant are improved by about 1.1-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, 1.9-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 11-fold, 12-fold, 13-fold, 14-fold, 15-fold, 16-fold, 17-fold, 18-fold, 19-fold, 20-fold, 25-fold, 30-fold, 40-fold, 45-fold, 50-fold, 55-fold, 60-fold, 70-fold, 80-fold, 90-fold, 100-fold, 110-fold, 120-fold, 130-fold, 140-fold, 150-fold, 160-fold, 170-fold, 180-fold, 190-fold, 200-fold, 210-fold, 220-fold, 230-fold, 240-fold, 250-fold, 270-fold, 280-fold, 290-fold, 300-fold, 310-fold, 320-fold, 340-fold, 350-fold, 360-fold, 370-fold, 380-fold, 400-fold, 425-fold, 500-fold or 475-fold.

In some embodiments, the gRNA variant comprises an exogenously extended stem loop, wherein such differences from a reference gRNA are as described below. In some embodiments, the exogenously extended stem loop has little or NO identity to the reference stem loop region disclosed herein (e.g., SEQ ID NO: 15). In some embodiments, the exogenous stem loop is at least 10bp, at least 20bp, at least 30bp, at least 40bp, at least 50bp, at least 60bp, at least 70bp, at least 80bp, at least 90bp, at least 100bp, at least 200bp, at least 300bp, at least 400bp, at least 500bp, at least 600bp, at least 700bp, at least 800bp, at least 900bp, at least 1,000bp, at least 2,000bp, at least 3,000bp, at least 4,000bp, at least 5,000bp, at least 6,000bp, at least 7,000bp, at least 8,000bp, at least 9,000bp, at least 10,000bp, at least 12,000bp, at least 15,000bp, or at least 20,000bp. In some embodiments, the gRNA variant comprises an extended stem loop region comprising at least 10, at least 100, at least 500, at least 1000, or at least 10,000 nucleotides. In some embodiments, the heterologous stem loop increases stability of the gRNA. In some embodiments, the heterologous RNA stem loop is capable of binding a protein, RNA structure, DNA sequence, or small molecule. In some embodiments, the exogenous stem loop region replacing the stem loop comprises an RNA stem loop or hairpin, wherein the resulting gRNA has increased stability and can interact with certain cellular proteins or RNAs depending on the choice of loop. Such exogenously extended stem loops may comprise, for example, thermostable RNAs such as MS2 hairpin (ACAUGAGGAUCACCCAUGU (SEQ ID NO: 35)), qβ hairpin (UGCAUGUCUAAGACAGCA (SEQ ID NO: 36)), U1 hairpin II (AAUCCAUUGCACUCCGGAUU (SEQ ID NO: 37)), uvsx (CCUCUUCGGAGG (SEQ ID NO: 38)), PP7 hairpin (AGGAGUUUCUAUGGAAACCCU (SEQ ID NO: 39)), phage replication loop (AGGUGGGACGACCUCUCGGUCGUCCUAUCU (SEQ ID NO: 40)), kis loop_a (UGCUCGCUCCGUUCGAGCA (SEQ ID NO: 41)), kis loop_b1 (UGCUCGACGCGUCCUCGAGCA (SEQ ID NO: 42)), kis loop_b2 (UGCUCGUUUGCGGCUACGAGCA (SEQ ID NO: 43)), G quadruple M3Q (AGGGAGGGAGGGAGAGG (SEQ ID NO: 44)), G quadruple telomere basket (GGUUAGGGUUAGGGUUAGG (SEQ ID NO: 45)), sarcosin-ricin loop (CUGCUCAGUACGAGAGGAACCGCAG (SEQ ID NO: 46)) or pseudoknot (UACACUGGGAUCGCUGAAUUAGAGAUCGGCGUCCUUUCAUUCUAUAUACUUUGGAGUUUUAAAAUGUCUCUAAGUACA (SEQ ID NO: 47)). In some embodiments, the extended stem loop comprises UGGGCGCAGCGUCAAUGACGCUGACGGUACA (stem IIB; SEQ ID NO: 41843), GCACUAUGGGCGCAGCGUCAAUGACGCUGACGGUACAGGCCAGACAAUUAUUGUCUGGUAUAGUGC (stem II; SEQ ID NO: 41844), CAGGAAGCACUAUGGGCGCAGCGUCAAUGACGCUGACGGUACAGGCCAGACAAUUAUUGUCUGGUAUAGUGCAGCAGCAGAACAAUUUGCUGAGGGCUAUUGAGGCGCAACAGCAUCUGUUGCAACUCACAGUCUGGGGCAUCAAGCAGCUCCAGGCAAGAAUCCUG (stem II-V SEQ ID NO: 41845), GCUGACGGUACAGGC (RBE; SEQ ID NO: 41846), and AGGAGCUUUGUUCCUUGGGUUCUUGGGAGCAGCAGGAAGCACUAUGGGCGCAGCGUCAAUGACGCUGACGGUACAGGCCAGACAAUUAUUGUCUGGUAUAGUGCAGCAGCAGAACAAUUUGCUGAGGGCUAUUGAGGCGCAACAGCAUCUGUUGCAACUCACAGUCUGGGGCAUCAAGCAGCUCCAGGCAAGAAUCCUGGCUGUGGAAAGAUACCUAAAGGAUCAACAGCUCCU (full length RRE; SEQ ID NO: 41847).

In some embodiments, the gRNA variant comprises a terminal fusion partner. The term "gRNA variant" includes variants that contain exogenous sequences such as terminal fusions or internal insertions. Exemplary terminal fusions can include fusions of a gRNA with a self-cleaving ribozyme or protein binding motif. As used herein, "ribozyme" refers to an RNA or fragment thereof that has one or more catalytic activities similar to a protease. Exemplary ribozyme catalytic activities may include, for example, cleavage and/or ligation of RNA, cleavage and/or ligation of DNA, or peptide bond formation. In some embodiments, such fusion may improve scaffold folding or recruit DNA repair mechanisms. For example, in some embodiments, the gRNA may be fused to a hepatitis c virus (HDV) anti-genomic ribozyme, an HDV genomic ribozyme, a knife-bending ribozyme (from metagenomic data), an env25 pistol ribozyme (from the representation of Aliistipes putredinis), a HH15 minimal hammerhead ribozyme, a tobacco ringspot virus (TRSV) ribozyme, a WT virus hammerhead ribozyme (and rational variants), or a twist master 1 or RBMX recruitment motif. Hammerhead ribozymes are RNA motifs that catalyze reversible cleavage and ligation reactions at specific sites within an RNA molecule. Hammerhead ribozymes include type I, type II, and type III hammerhead ribozymes. HDV, pistol and knife ribozymes have self-cleaving activity. A gRNA variant comprising one or more ribozymes can allow for extended gRNA function as compared to a gRNA reference. For example, in some embodiments, a gRNA comprising a self-cleaving ribozyme can be transcribed and processed into a mature gRNA as part of a polycistronic transcript. Such fusion may occur at the 5 'or 3' end of the gRNA. In some embodiments, the gRNA variant comprises fusions at both the 5 'and 3' ends, wherein each fusion is independently as described herein.

h. Complex formation with CasX protein

In some embodiments, the gRNA variants have improved ability to form RNP complexes with type 2V proteins comprising CasX variant proteins comprising any of the sequences SEQ ID NOs 49-160, 40208-40369, or 40828-40912 of Table 3, or sequences having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity thereto. In some embodiments, after expression, the gRNA variant is complexed as RNP with a CasX variant protein comprising any of the sequences SEQ ID NOs 49-160, 40208-40369, or 40828-40912 of Table 3, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity thereto. In some embodiments, after expression, the gRNA variant is complexed as RNP with a CasX variant protein comprising any of the sequences SEQ ID NOs 85-160, 40208-40369, or 40828-40912, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity thereto.

In some embodiments, the gRNA variant has an improved ability to form complexes with CasX variant proteins when compared to a reference gRNA, thereby improving its ability to form Ribonucleoprotein (RNP) complexes with CasX proteins with cleavage ability, as described in the examples. In some embodiments, improving the formation of ribonucleoprotein complexes may increase the efficiency of assembly of functional RNPs. In some embodiments, greater than 90%, greater than 93%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, or greater than 99% of the RNPs (which comprise the gRNA variant and its targeting sequence) can be used for gene editing of the target nucleic acid.

In some embodiments, exemplary nucleotide changes that may improve the ability of the gRNA variant to form complexes with CasX protein may include replacing the scaffold stem with a thermostable stem loop. Without wishing to be bound by any theory, replacing the scaffold stem with a thermostable stem loop may increase the overall binding stability of the gRNA variant to the CasX protein. Alternatively, or in addition, removing a substantial portion of the stem loop can alter the folding kinetics of the gRNA variant and make functional folding of the gRNA easier and faster for structural assembly, e.g., by reducing the extent to which the gRNA variant itself can "tangle". In some embodiments, the choice of scaffold stem loop sequence may vary with the different spacers used for the gRNA. In some embodiments, the scaffold sequences can be tailored for the spacer region and thus for the target sequence. Biochemical assays can be used to evaluate the binding affinity of CasX proteins for the formation of RNPs by gRNA variants, including the assays of the examples. For example, one of ordinary skill can measure a change in the amount of fluorescently labeled gRNA bound to the immobilized CasX protein as a response to an increase in the concentration of additional unlabeled "cold competitor" gRNA. Alternatively, or in addition, the fluorescent signal can be monitored or how the fluorescent signal changes when different amounts of fluorescently labeled gRNA flow through the immobilized CasX protein. Alternatively, the ability to form RNPs can be assessed using an in vitro cleavage assay for a defined target nucleic acid sequence.

CRISPR proteins of aav systems

The present disclosure provides AAV systems encoding CRISPR nucleases that have utility in genome editing of eukaryotic cells and are integral components of the self-inactivating characteristics of constructs. In some embodiments, the CRISPR nuclease employed in the genome editing system is a class 2V nuclease. Although members of the class 2V CRISPR Cas system have differences, they share some common features that distinguish them from Cas9 systems. First, class 2V nucleases have a single RNA-guided effector (which contains RuvC domain but no HNH domain) and these nucleases recognize the target region on the T-rich PAM 5 'upstream to the non-targeting strand, unlike Cas9 systems that rely on G-rich PAM on the 3' side of the target sequence. Unlike Cas9 which creates a blunt end near the proximal site of PAM, V-type nucleases create staggered double strand breaks at the distal end of PAM sequence. Furthermore, when activated by the target dsDNA or ssDNA bound in cis, the V-nuclease degrades ssDNA in trans. In some embodiments, the V-type nucleases of embodiments recognize the 5' -TC PAM motif and produce staggered ends that are cut only by RuvC domains. In some embodiments, the V-type nuclease is selected from Cas12a, cas12b, cas12C, cas12d (CasY), cas12j, cas12k, casPhi, C C4, C2C8, C2C5, C2C10, C2C9, casZ, and CasX. In some embodiments, the disclosure provides AAV systems encoding CasX variant proteins and one or more gRNA acids that are capable of forming RNP complexes upon expression in transfected cells and are specifically designed to modify target nucleic acid sequences in eukaryotic cells, as well as cleave self-inactivating fragments incorporated into transgenic polynucleotides comprising AAV constructs.

As used herein, the term "CasX protein" refers to a family of proteins and includes all naturally occurring CasX proteins, proteins having at least 50% identity to naturally occurring CasX proteins, and CasX variants having one or more improved characteristics relative to a naturally occurring reference CasX protein, described more fully below.

The CasX protein of the present disclosure comprises at least one of the following domains: non-target binding (NTSB) domains, target loading (TSL) domains, helix I domains (which are further divided into helix I-I and I-II subdomains), helix II domains, oligonucleotide binding domains (OBD, which are further divided into OBD-I and OBD-II subdomains), and RuvC DNA cleavage domains (which are further divided into RuvC-I and II subdomains). The RuvC domain may be modified or deleted in a catalytic death CasX variant, described more fully below.

In some embodiments, casX variant proteins can bind to and/or modify (e.g., nick, catalyze double strand breaks, methylation, demethylation, etc.) a target nucleic acid at a particular sequence targeted by a related gRNA that hybridizes to a sequence within the target nucleic acid sequence.

a. Reference CasX protein

The present disclosure provides naturally occurring CasX proteins (referred to herein as "reference CasX proteins") that are subsequently modified to produce the CasX variants of the present disclosure. For example, the reference CasX protein may be isolated from a naturally occurring prokaryote such as delta-proteobacteria (Deltaproteobacteria), phylum pumilus (Planctomycetes), or Candidatus Sungbacteria species. The reference CasX protein is a type II CRISPR/Cas endonuclease belonging to the CasX (interchangeably referred to as Cas12 e) protein family that interacts with guide RNAs to form Ribonucleoprotein (RNP) complexes.

In some cases, the reference CasX protein is isolated from or derived from delta-proteobacteria (deltaproteobacteria). In some embodiments, the reference CasX protein comprises a sequence identical to the sequence:

in some cases, the reference CasX protein is isolated or derived from phylum superficial, having the following sequence: in some embodiments, the reference CasX protein comprises a sequence identical to the sequence:

in some cases, the reference CasX protein is isolated or derived from Candidatus Sungbacteria. In some embodiments, the reference CasX protein comprises a sequence identical to the sequence:

class 2V CasX variant proteins

The present disclosure provides class 2V-type CasX variants of a reference CasX protein or variants derived from other CasX variants (interchangeably referred to herein as "class 2V-type CasX variants", "CasX variants" or "CasX variant proteins"), wherein the class 2V-type CasX variants comprise at least one modification in at least one domain relative to the reference CasX protein (including but not limited to the sequences of SEQ ID NOs: 1-3) or at least one modification relative to another CasX variant. Any change in the amino acid sequence of the reference CasX protein or another CasX variant protein that results in an improvement in the characteristics of the CasX protein is considered a CasX variant protein of the present disclosure. For example, a CasX variant may comprise one or more amino acid substitutions, insertions, deletions, or exchange domains, or any combination thereof, relative to a reference CasX protein sequence.

In some embodiments, the modification of the CasX variant is a mutation in one or more amino acids of the reference CasX. In other embodiments, the modification is the insertion or substitution of part or all of a domain from a different CasX protein. In a specific embodiment, the CasX variants of SEQ ID NOS 144-160, 40208-40369, 40828-40912 have the NTSB and helix 1B domains of SEQ ID NO. 1, while other domains are derived from SEQ ID NO. 2, except for the individual modifications of the selection domains described herein. Mutations may be introduced into any one or more domains of the reference CasX protein or CasX variants to produce CasX variants, and may include, for example, deletions of part or all of one or more domains, or one or more amino acid substitutions, deletions, or insertions in any domain of the reference CasX protein or CasX variants derived therefrom. Domains of CasX proteins include non-target binding (NTSB) domains, target loading (TSL) domains, helix I domains, helix II domains, oligonucleotide Binding Domains (OBD), and RuvC DNA cleavage domains. Without being bound by theory or mechanism, the NTSB domain in CasX allows binding to non-target nucleic acid strands and can aid in the unwinding of non-target and target strands. The NTSB domain is presumed to be responsible for unwinding or capturing the non-target nucleic acid strand in its unwound state. Exemplary NTSB domains comprise amino acids 100-190 of SEQ ID NO. 1 or amino acids 102-191 of SEQ ID NO. 2. In some embodiments, the NTSB domain of the reference CasX protein comprises a four-chain β -sheet. In some embodiments, the TSL is used to place or capture the target strand in a folded state that places the frangible phosphate of the target strand DNA backbone in the RuvC active site. Exemplary TSLs comprise amino acids 824-933 of SEQ ID NO. 1 or amino acids 811-920 of SEQ ID NO. 2. Without wishing to be bound by theory, it is believed that in some cases, the helical I domain may contribute to binding of the Protospacer Adjacent Motif (PAM). In some embodiments, the helix I domain of the reference CasX protein comprises one or more alpha helices. Exemplary helix I-I and I-II domains comprise amino acids 56-99 and 191-331, respectively, of SEQ ID NO. 1, or amino acids 58-101 and 192-332, respectively, of SEQ ID NO. 2. The helical II domain is responsible for binding to the guide RNA scaffold stem loop and bound DNA. Exemplary helix II domains comprise amino acids 332-508 of SEQ ID NO. 1 or amino acids 333-500 of SEQ ID NO. 2. OBD binds mainly to the RNA triplex of the guide RNA scaffold. OBD may also be responsible for binding to the Protospacer Adjacent Motif (PAM). Exemplary OBD I and II domains comprise amino acids 1-55 and 509-659 of SEQ ID NO. 1, respectively, or amino acids 1-57 and 501-646 of SEQ ID NO. 2, respectively. RuvC has a DED motif active site that is responsible for cleaving both strands of DNA (one after the other, most likely first cleaving a non-target strand into a target sequence at 11 to 14 nucleotides (nt), then cleaving the target strand at 2 to 4 nucleotides after the target sequence, resulting in staggered cleavage). Particularly in CasX, the RuvC domain is unique in that it is also responsible for binding to the guide RNA scaffold stem loop critical to CasX function. Exemplary RuvC I and II domains comprise amino acids 660-823 and 934-986 of SEQ ID No. 1, respectively, or amino acids 647-810 and 921-978 of SEQ ID No. 2, respectively, while CasX variants may comprise mutations at positions I658 and a708 relative to SEQ ID No. 2, or CasX 515 described below.

In some embodiments, the CasX variant protein comprises at least one modification in at least 1 domain, each domain of at least 2 domains, each domain of at least 3 domains, each domain of at least 4 domains, or each domain of at least 5 domains of a reference CasX protein (including the sequences of SEQ ID NOS: 1-3). In some embodiments, the CasX variant protein comprises two or more modifications in at least one domain of the reference CasX protein. In some embodiments, the CasX variant protein comprises at least two modifications in at least one domain of the reference CasX protein, at least three modifications in at least one domain of the reference CasX protein, or at least four or more modifications in at least one domain of the reference CasX protein. In some embodiments, wherein the CasX variant comprises two or more modifications as compared to the reference CasX protein, and each modification is made in a domain independently selected from the group consisting of an NTSB, a TSL, a helix I domain, a helix II domain, an OBD, and a RuvC DNA cleavage domain. In some embodiments, wherein the CasX variant comprises two or more modifications, the modifications being made in two or more domains, as compared to a reference CasX protein. In some embodiments, at least one modification of the CasX variant protein comprises a deletion of at least a portion of one domain of the reference CasX protein of SEQ ID NOs 1-3. In some embodiments, the deletion is in an NTSB domain, a TSL domain, a helix I domain, a helix II domain, an OBD, or a RuvC DNA cleavage domain.

In some cases, casX variants of the present disclosure comprise modifications in a structural region that may comprise one or more domains. In some embodiments, the CasX variant comprises at least one modification of a discontinuous amino acid residue region of the CasX variant that forms a channel in which the complexing of the gRNA: target nucleic acid with the CasX variant occurs. In other embodiments, the CasX variant comprises at least one modification of a region of non-contiguous amino acid residues of the CasX variant that forms an interface with the gRNA. In other embodiments, the CasX variant comprises at least one modification of a region of non-contiguous amino acid residues of the CasX variant that forms a channel for binding to non-target strand DNA. In other embodiments, the CasX variant comprises at least one modification of a discontinuous amino acid residue region of the CasX variant that forms an interface with a Protospacer Adjacent Motif (PAM) of a target nucleic acid. In other embodiments, the CasX variant comprises at least one modification of a region of the CasX variant where the non-contiguous surface exposes amino acid residues. In other embodiments, the CasX variant comprises at least one modification of a region of non-contiguous amino acid residues that form the core by hydrophobic packing in the domain of the CasX variant. In the foregoing embodiments of this paragraph, the modification of the region may include one or more of a deletion, insertion, or substitution of one or more amino acids of the region; or 2 to 15 amino acid residues of the CasX variant region are substituted with charged amino acids; or 2 to 15 amino acid residues of the CasX variant region are substituted with polar amino acids; or 2 to 15 amino acid residues of the region of the CasX variant are substituted with amino acids that are stacked or have affinity for DNA or RNA bases.

In other embodiments, the present disclosure provides CasX variants, wherein the CasX variants comprise at least one modification relative to another CasX variant; for example, casX variants 515 and 527 are variants of CasX variant 491, and CasX variants 668 and 672 are variants of CasX 535. In some embodiments, at least one modification is selected from amino acid insertions, deletions, or substitutions. All variants that improve one or more functions or features of CasX variant proteins are considered to be within the scope of the present disclosure when compared to the reference CasX proteins or variants derived therefrom described herein. As described in the examples, a CasX variant can be mutagenized to produce another CasX variant. In particular embodiments, in example 21, table 30, the disclosure provides variants of CasX 515 (SEQ ID NO: 145) produced by introducing modifications to the coding sequence that result in amino acid substitutions, deletions, or insertions at one or more positions in one or more domains.

Suitable mutagenesis methods for producing the CasX variant proteins of the present disclosure may include, for example, deep Mutagenesis Evolution (DME), deep Mutation Scanning (DMS), error-prone PCR, cassette mutagenesis, random mutagenesis, staggered-extension PCR, gene shuffling, or domain swapping (described in PCT/US20/36506 and WO2020247883A2, incorporated herein by reference). In some embodiments, casX variants are designed, for example, by selecting a plurality of desired mutations in the identified CasX variants, for example, using the assays described in the examples. In certain embodiments, the activity of a reference CasX or CasX variant protein prior to mutagenesis is used as a baseline for comparing the activity of one or more of the resulting CasX variants, thereby measuring the functional improvement of the new CasX variants.

In some embodiments of the CasX variants described herein, the at least one modification comprises: (a) Substitution of 1 to 100 contiguous or non-contiguous amino acids in a CasX variant as compared to the reference CasX, casX variant 491 (SEQ ID NO: 138), or CasX variant 515 (SEQ ID NO: 145) of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO: 3; (b) Deletions of 1 to 100 contiguous or non-contiguous amino acids in the CasX variant as compared to a reference CasX or variant derived therefrom; (c) Insertion of 1 to 100 contiguous or non-contiguous amino acids in CasX compared to a reference CasX or variant derived therefrom; or (d) any combination of (a) to (c). In some embodiments, the at least one modification comprises: (a) Substitution of 1 to 10 consecutive or non-consecutive amino acids in the CasX variant compared to the reference CasX of SEQ ID No. 1, SEQ ID No. 2, SEQ ID No. 3; (b) Deletions of 1 to 5 contiguous or non-contiguous amino acids in the CasX variant as compared to a reference CasX or variant derived therefrom; (c) Insertion of 1 to 5 contiguous or non-contiguous amino acids in CasX compared to a reference CasX or variant derived therefrom; or (d) any combination of (a) to (c).

In some embodiments, the CasX variant protein comprises or consists of the following sequence compared to the sequence of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, casX 491 or CasX 515: the sequence has at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 changes. In some embodiments, the CasX variant protein comprises one or more substitutions relative to CasX 491 or SEQ ID NO: 138. In some embodiments, the CasX variant protein comprises one or more substitutions relative to CasX 515 or SEQ ID NO: 145. These changes may be amino acid insertions, deletions, substitutions or any combination thereof. These changes may be located in one domain or any combination of domains of the CasX variant. In the substitutions described herein, any amino acid may be substituted with any other amino acid. The substitution may be a conservative substitution (e.g., one basic amino acid is substituted with another basic amino acid). The substitution may be a non-conservative substitution (e.g., a basic amino acid is substituted with an acidic amino acid, or vice versa). For example, the proline in the reference CasX protein may be substituted with any of arginine, histidine, lysine, aspartic acid, glutamic acid, serine, threonine, asparagine, glutamine, cysteine, glycine, alanine, isoleucine, leucine, methionine, phenylalanine, tryptophan, tyrosine, or valine to produce a CasX variant protein of the disclosure.

Any arrangement of substitution, insertion, and deletion embodiments described herein can be combined to produce the CasX variant proteins of the present disclosure. For example, a CasX variant protein may comprise at least one substitution and at least one deletion relative to a reference CasX protein sequence or a sequence of CasX491 or CasX 515, at least one substitution and at least one insertion relative to a reference CasX protein sequence or a sequence of CasX491 or CasX 515, at least one insertion and at least one deletion relative to a reference CasX protein sequence or a sequence of CasX491 or CasX 515, or at least one substitution, one insertion and one deletion relative to a reference CasX protein sequence or a sequence of CasX491 or CasX 515.

In some embodiments, the CasX variant protein comprises 400 to 2000 amino acids, 500 to 1500 amino acids, 700 to 1200 amino acids, 800 to 1100 amino acids, or 900 to 1000 amino acids.

c. CasX variant proteins having domains from multiple source proteins

In certain embodiments, the disclosure provides chimeric CasX proteins for use in AAV systems comprising protein domains from two or more different CasX proteins (such as two or more reference CasX proteins, or two or more CasX variant protein sequences as described herein). As used herein, "chimeric CasX protein" refers to CasX that contains at least two domains isolated or derived from different sources (such as two naturally occurring proteins), which in some embodiments may be isolated from different species. For example, in some embodiments, a chimeric CasX protein comprises a first domain from a first CasX protein and a second domain from a second, different CasX protein. In some embodiments, the first domain may be selected from the group consisting of an NTSB domain, a TSL domain, a helix I domain, a helix II domain, an OBD domain, and a RuvC domain. In some embodiments, the second domain is selected from the group consisting of an NTSB domain, a TSL domain, a helix I domain, a helix II domain, an OBD domain, and a RuvC domain, wherein the second domain is different from the first domain described previously. For example, a chimeric CasX protein may comprise the NTSB, TSL, helix I, helix II, OBD domain of the CasX protein from SEQ ID NO. 2 and the RuvC domain of the CasX protein from SEQ ID NO. 1, or vice versa. As another example, a chimeric CasX protein may comprise the NTSB, TSL, helix II, OBD, and RuvC domains of the CasX protein from SEQ ID No. 2 and the helix I domain of the CasX protein from SEQ ID No. 1, or vice versa. Thus, in certain embodiments, a chimeric CasX protein may comprise the NTSB, TSL, helix II, OBD, and RuvC domains from a first CasX protein, and the helix I domain from a second CasX protein. In some embodiments of the chimeric CasX proteins, the domain of the first CasX protein is derived from the sequence of SEQ ID No. 1, SEQ ID No. 2 or SEQ ID No. 3 and the domain of the second CasX protein is derived from the sequence of SEQ ID No. 1, SEQ ID No. 2 or SEQ ID No. 3 and the first CasX protein and the second CasX protein are not identical. In some embodiments, the domain of the first CasX protein comprises a sequence derived from SEQ ID No. 1 and the domain of the second CasX protein comprises a sequence derived from SEQ ID No. 2. In some embodiments, the domain of the first CasX protein comprises a sequence derived from SEQ ID No. 1 and the domain of the second CasX protein comprises a sequence derived from SEQ ID No. 3. In some embodiments, the domain of the first CasX protein comprises a sequence derived from SEQ ID No. 2 and the domain of the second CasX protein comprises a sequence derived from SEQ ID No. 3.

In some embodiments, the CasX variant protein comprises at least one chimeric domain comprising a first portion from a first CasX protein and a second portion from a second, different CasX protein. As used herein, "chimeric domain" refers to a portion of a single domain containing at least two portions isolated or derived from different sources, such as two naturally occurring proteins or domains from two reference CasX proteins. The at least one chimeric domain may be any of the NTSB, TSL, helix I, helix II, OBD, or RuvC domains as described herein. In some embodiments, the first portion of the CasX domain comprises the sequence of SEQ ID NO. 1 and the second portion of the CasX domain comprises the sequence of SEQ ID NO. 2. In some embodiments, the first portion of the CasX domain comprises the sequence of SEQ ID NO:1 and the second portion of the CasX domain comprises the sequence of SEQ ID NO: 3. In some embodiments, the first portion of the CasX domain comprises the sequence of SEQ ID NO. 2 and the second portion of the CasX domain comprises the sequence of SEQ ID NO. 3. In some embodiments, at least one chimeric domain comprises a chimeric RuvC domain. As an example of the foregoing, the chimeric RuvC domain comprises amino acids 661 to 824 of SEQ ID NO. 1 and amino acids 922 to 978 of SEQ ID NO. 2. As an alternative example of the foregoing, the chimeric RuvC domain comprises amino acids 648 to 812 of SEQ ID NO. 2 and amino acids 935 to 986 of SEQ ID NO. 1. In some embodiments, the CasX protein comprises a first domain from a first CasX protein and a second domain from a second CasX protein, and at least one chimeric domain comprises at least two portions isolated from different CasX proteins using the methods of the embodiments described in this paragraph.

Table 3: casX variant sequences

d. Protein affinity for gRNA

In some embodiments, the CasX variant proteins of the present disclosure used in AAV systems have improved affinity for gRNA relative to a reference CasX protein, resulting in the formation of ribonucleoprotein complexes. Increased affinity of CasX variant proteins for gRNA may, for example, result in lower K that results in RNP complexes_d This may in some cases lead to more stable ribonucleoprotein complex formation. In some embodiments, the increased affinity of the CasX variant protein for gRNA results in increased stability of the ribonucleoprotein complex when delivered to human cells. This increased stability can affect the function and utility of the complex in the cells of the subject, as well as lead to improved pharmacokinetic properties in the blood when delivered to the subject. In some embodiments, the increased affinity of the CasX variant protein and the increased stability of the resulting ribonucleoprotein complex allow for the delivery of lower doses of the CasX variant protein to a subject or to a subjectCells while still having the desired activity, such as in vivo or in vitro gene editing. In some embodiments, when both the CasX variant protein and the gRNA remain in the RNP complex, the higher affinity (tighter binding) of the CasX variant protein to the gRNA allows for a greater number of editing events. Increased editing events can be assessed using editing assays, such as the EGFP disruption assay described herein.

In some embodiments, increased affinity of CasX variant proteins for gRNA results in increased stability of ribonucleoprotein complexes upon delivery to mammalian cells (including in vivo delivery to a subject). This increased stability can affect the function and utility of the complex in the cells of the subject, as well as lead to improved pharmacokinetic properties in the blood when delivered to the subject. In some embodiments, the increased affinity of the CasX variant protein and the increased stability of the resulting ribonucleoprotein complex allow for lower doses of the CasX variant protein to be delivered to a subject or cell while still having the desired activity; such as in vivo or in vitro gene editing. The increased ability to form RNPs and hold them in a stable form can be assessed using assays (such as the in vitro cleavage assays described in the examples herein). In some embodiments, when the complex is RNP, it comprises the present disclosure as compared to the RNP comprising reference CasX of SEQ ID NOs 1-3RNP of CasX variants of (c) is capable of achieving at least 2-fold, at least 5-fold or at least 10-fold higher k_Cutting Rate.

In some embodiments, when both the CasX variant protein and the gRNA remain in the RNP complex, the higher affinity (tighter binding) of the CasX variant protein to the gRNA allows for a greater number of editing events. The increased editing event may be assessed using editing assays (such as the assays described herein).

Without wishing to be bound by theory, in some embodiments, amino acid changes in the helix I domain may increase the binding affinity of the CasX variant protein to the gRNA targeting sequence, while changes in the helix II domain may increase the binding affinity of the CasX variant protein to the gRNA scaffold stem loop, and changes in the Oligonucleotide Binding Domain (OBD) increase the binding affinity of the CasX variant protein to the gRNA triplex.

Methods for determining the binding affinity of CasX proteins for gRNA include in vitro methods using purified CasX proteins and gRNA. If the gRNA or CasX protein is labeled with a fluorophore, the binding affinity of the reference CasX and the variant protein can be measured by fluorescence polarization. Alternatively, or in addition, binding affinity may be measured by biofilm interference techniques, electrophoretic Mobility Shift Analysis (EMSA), or filtration binding methods. Other standard techniques for quantifying the absolute affinity of RNA binding proteins (such as reference CasX and variant proteins of the present disclosure) for a particular gRNA (such as reference gRNA and variants thereof) include, but are not limited to, isothermal calorimetry (ITC) and Surface Plasmon Resonance (SPR) and the methods of the examples.

e. Affinity for target nucleic acid

In some embodiments, the CasX variant proteins used in AAV systems have improved binding affinity to non-target strands of target nucleic acids. As used herein, the term "non-target strand" refers to a strand of a DNA target nucleic acid sequence that does not form watson and crick base pairs with a targeting sequence in a gRNA and is complementary to the target strand. In some embodiments, the CasX variant protein has about 1.1-fold to about 100-fold increased binding affinity to a non-target sequence of a target nucleic acid as compared to the reference protein of SEQ ID NO. 1, SEQ ID NO. 2, or SEQ ID NO. 3 or as compared to CasX variant 119 (SEQ ID NO. 72) and CasX491 (SEQ ID NO. 138).

Methods of measuring affinity of CasX proteins (such as references or variants) for target nucleic acid molecules can include Electrophoretic Mobility Shift Analysis (EMSA), filter binding methods, isothermal calorimetry (ITC) and Surface Plasmon Resonance (SPR), fluorescence polarization and biofilm interference techniques (BLI). Other methods of measuring affinity of CasX proteins for targets include in vitro biochemical assays that measure DNA cleavage events over time.

In some embodiments, the CasX variant protein used in the AAV system is catalytic dead (dCasX). In some embodiments, the disclosure provides RNPs comprising a catalytic death CasX protein that retains the ability to bind to target DNA. Exemplary catalytic death CasX variant proteins comprise one or more mutations in the active site of the RuvC domain of the CasX protein. In some embodiments, the catalytic death CasX variant protein comprises substitutions at residues 672, 769, and/or 935 of SEQ ID NO. 1. In some embodiments, the catalytic death CasX variant protein comprises a substitution of D672A, E769A and/or D935A in the reference CasX protein of SEQ ID NO: 1. In some embodiments, the catalytic death CasX protein comprises substitutions at amino acids 659, 765, and/or 922 of SEQ ID NO. 2. In some embodiments, the catalytic death CasX protein comprises a D659A, E756A and/or D922A substitution in the reference CasX protein of SEQ ID NO. 2. In a further embodiment, the catalytic death reference CasX protein comprises a deletion of all or part of the RuvC domain of the reference CasX protein. Exemplary dCAsX sequences are provided as SEQ ID NOS 40808-40827, 41006-41009 in Table 7.

In some embodiments, the improved affinity of the CasX variant protein for DNA also improves the function of the catalytically inactive form of the CasX variant protein. In some embodiments, the catalytically inactive form of the CasX variant protein comprises one or more mutations in the DED motif in RuvC. In some embodiments, the catalytic death CasX variant protein may be used for base editing or epigenetic modification. In some embodiments, where there is a higher affinity for DNA, the catalytically dead CasX variant protein may find its target DNA faster than catalytically active CasX, remain bound to the target DNA longer, bind to the target DNA in a more stable manner, or a combination thereof, thereby improving the function of the catalytically dead CasX variant protein.

f. Improved specificity for target sites

In some embodiments, the CasX variant protein used in the AAV system has improved specificity for the target nucleic acid sequence relative to a reference CasX protein. As used herein, "specificity" is used interchangeably with "target specificity" and refers to the degree to which a CRISPR/Cas system ribonucleoprotein complex cleaves off-target sequences that are similar to, but not identical to, a target nucleic acid sequence; for example, a CasX variant RNP with a higher degree of specificity will exhibit reduced off-target cleavage of the sequence relative to the reference CasX protein. The specificity of CRISPR/Cas system proteins and reduced potential detrimental off-target effects may be critical in order to achieve an acceptable therapeutic index for use in mammalian subjects.

Without wishing to be bound by theory, it is possible that amino acid changes in the helix I and II domains (which increase the specificity of the CasX variant protein for the target nucleic acid strand) may increase the specificity of the CasX variant protein for the target nucleic acid sequence as a whole. In some embodiments, amino acid changes (which increase the specificity of the CasX variant protein for the target nucleic acid sequence) can also result in reduced affinity of the CasX variant protein for DNA.

Methods for testing the target specificity of CasX proteins, such as variants or references, may include priming and circularization for in vitro reporting of cleavage effects (CIRCLE-seq) by sequencing, or similar methods. Briefly, in the CIRCLE-seq technique, genomic DNA is sheared and circularized by ligation of stem-loop adaptors that are nicked at the stem-loop regions to expose 4 nucleotide palindromic projections. Intramolecular ligation and degradation of the remaining linear DNA is then performed. Circular DNA molecules containing CasX cleavage sites are then linearized with CasX and adaptor adaptors are ligated to the exposed ends, followed by high throughput sequencing to generate paired end reads containing information about the off-target sites. Other assays that can be used to detect off-target events and thus detect CasX protein specificity include assays for detecting and quantifying indels (insertions and deletions) formed at those selected off-target sites, such as mismatch detection nuclease assays and Next Generation Sequencing (NGS). Exemplary mismatch detection assays include nuclease assays in which genomic DNA from cells treated with CasX and sgrnas is PCR amplified, denatured, and re-hybridized to form heteroduplex DNA containing one wild-type strand and one strand with an indel. Mismatches are recognized and cleaved by a mismatch detection nuclease (such as a Surveyor nuclease or T7 endonuclease I).

g. Primordial spacer sequence and PAM sequence

In this context, a primordial spacer is defined as a DNA sequence complementary to a targeting sequence of a guide RNA (referred to as a target strand) and DNA complementary to the DNA sequence (referred to as a non-target strand). As used herein, PAM is a nucleotide sequence adjacent to a primordial spacer that, along with a targeting sequence for gRNA, aids in the orientation and positioning of CasX to potentially cleave the primordial spacer strand.

PAM sequences may be degenerate and specific RNP constructs may have different preferred and tolerant PAM sequences that support different cleavage efficiencies. Conventionally, the disclosure relates to both PAM and primordial spacer sequences and their directionality according to the orientation of the non-target strand, unless otherwise specified. This does not mean that PAM sequences that are not target strands (but not target strands) are determinants of cleavage or are involved in target recognition by mechanisms. For example, when referring to TTC PAM, it may actually be the complementary GAA sequence required for target cleavage, or it may be some combination of nucleotides from both strands. In the case of the CasX proteins disclosed herein, PAM is located 5' to the original spacer, with a single nucleotide separating PAM from the first nucleotide of the original spacer. Thus, in the case of reference to CasX, where typical PAM is TTC, PAM is understood to mean a sequence following the formula 5'- … NNTTCN (primordial spacer) NNNNNN …', where "N" is any DNA nucleotide and "(primordial spacer)" is a DNA sequence having identity to the targeting sequence of the guide RNA. In the case of CasX variants with extended PAM recognition, TTC, CTC, GTC or ATC PAM should be understood to mean a sequence following the formula:

5'- … NNTTCN (original spacer) NNNNNNNN …';

5'- … NNCTCN (original spacer) NNNNNN …';

5'- … NNGTCN (original spacer) NNNNNN …'; or (b)

5'- … NNATCN (original spacer) NNNNNN …'. Alternatively, TC PAM should be understood to mean a sequence following the formula 5'- … NNNTCN (original spacer sequence) NNNNNN …'.

In addition, the CasX variant proteins of the present disclosure have enhanced ability to efficiently edit and/or bind target nucleic acids (in 5 'to 3' orientation) using PAM TC motifs (including PAM sequences selected from TTC, ATC, GTC or CTCs) when complexed with gRNA as RNPs as compared to RNPs of reference CasX proteins and reference gRNA, or compared to RNPs of another CasX variant derived therefrom (such as CasX 491 and gRNA 174). In the above, the PAM sequence is located at least 1 nucleotide 5' of the non-target strand of the protospacer sequence that has identity to the targeting sequence of the gRNA in the assay system, as compared to the editing efficiency and/or binding of RNPs comprising the reference CasX protein and the reference gRNA in a comparable assay system. In one embodiment, the RNP of the CasX variant and the gRNA variant exhibits higher editing efficiency and/or binding to a target sequence in a target nucleic acid, wherein the PAM sequence of the target DNA is TTC, than the RNP comprising the reference CasX protein and the reference gRNA (or another CasX variant derived therefrom, such as CasX 491 and gRNA 174) in a comparable assay system. In another embodiment, the RNP of the CasX variant and the gRNA variant exhibits higher editing efficiency and/or binding to a target sequence in a target nucleic acid, wherein the PAM sequence of the target DNA is ATC, than the RNP comprising the reference CasX protein and the reference gRNA (or another CasX variant derived therefrom, such as CasX 491 and gRNA 174) in a comparable assay system. In the preceding embodiments, wherein the CasX variant exhibits enhanced editing with ATC PAM, the CasX variant is 528 (SEQ ID NO: 157). In another embodiment, the RNP of the CasX variant and the gRNA variant exhibits higher editing efficiency and/or binding to a target sequence in a target nucleic acid, wherein the PAM sequence of the target DNA is CTC, than the RNP comprising the reference CasX protein and the reference gRNA (or another CasX variant derived therefrom, such as CasX 491 and gRNA 174) in a comparable assay system. In another embodiment, the RNP of the CasX variant and the gRNA variant exhibits greater editing efficiency and/or binding to a target sequence in a target nucleic acid, wherein the PAM sequence of the target DNA is a GTC, than the RNP comprising the reference CasX protein and the reference gRNA (or the RNP of another CasX variant and gRNA 174 derived therefrom) in a comparable assay system. In the foregoing embodiments, the increased efficiency of editing and/or binding affinity for one or more PAM sequences is at least 1.5-fold, at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold or more higher than the efficiency of editing and/or binding affinity for PAM sequences by RNPs of any one of the CasX proteins of SEQ ID NOs 1-3 and the grnas comprising the sequences of table 1. Exemplary assays demonstrating improved editing are described in the examples herein. In some embodiments, the CasX protein can bind to and/or modify (e.g., cleave, nick, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with the target nucleic acid (e.g., methylation or acetylation of a histone tail). In some embodiments, the CasX protein is catalytic dead (dCasX), but retains the ability to bind to a target nucleotide.

h. Affinity for target RNA

In some embodiments, variants of the reference CasX proteins used in AAV systems of the present disclosure have increased specificity for and increased activity with respect to a target RNA when compared to the reference CasX protein. For example, the CasX variant protein may exhibit increased binding affinity for the target RNA, or increased cleavage of the target RNA, when compared to a reference CasX protein. In some embodiments, the ribonucleoprotein complex comprising the CasX variant protein binds to and does not cleave the target RNA. In some embodiments, the CasX variant has at least about a two to about a ten fold increase in binding affinity to the target RNA as compared to the reference protein of SEQ ID NO. 1, SEQ ID NO. 2, or SEQ ID NO. 3.

i. CasX variant proteins having domains from multiple source proteins

In some embodiments, the disclosure provides AAV encoding a chimeric CasX variant protein comprising protein domains from two or more different CasX proteins (such as two or more naturally occurring CasX proteins, or two or more CasX variant protein sequences as described herein). As used herein, "chimeric CasX protein" refers to CasX that contains at least two domains isolated or derived from different sources (such as two naturally occurring proteins), which in some embodiments may be isolated from different species. For example, in some embodiments, a chimeric CasX protein comprises a first domain from a first CasX protein and a second domain from a second, different CasX protein. In some embodiments, the first domain may be selected from the group consisting of an NTSB domain, a TSL domain, a helix I domain, a helix II domain, an OBD domain, and a RuvC domain. In some embodiments, the second domain is selected from the group consisting of an NTSB domain, a TSL domain, a helix I domain, a helix II domain, an OBD domain, and a RuvC domain, wherein the second domain is different from the first domain described previously. For example, a chimeric CasX protein may comprise the NTSB, TSL, helix I, helix II, OBD domain of the CasX protein from SEQ ID NO. 2 and the RuvC domain of the CasX protein from SEQ ID NO. 1, or vice versa. As another example, a chimeric CasX protein may comprise the NTSB, TSL, helix II, OBD, and RuvC domains of the CasX protein from SEQ ID No. 2 and the helix I domain of the CasX protein from SEQ ID No. 1, or vice versa. Thus, in certain embodiments, a chimeric CasX protein may comprise the NTSB, TSL, helix II, OBD, and RuvC domains from a first CasX protein, and the helix I domain from a second CasX protein. In some embodiments of the chimeric CasX proteins, the domain of the first CasX protein is derived from the sequence of SEQ ID No. 1, SEQ ID No. 2 or SEQ ID No. 3 and the domain of the second CasX protein is derived from the sequence of SEQ ID No. 1, SEQ ID No. 2 or SEQ ID No. 3 and the first CasX protein and the second CasX protein are not identical. In some embodiments, the domain of the first CasX protein comprises a sequence derived from SEQ ID No. 1 and the domain of the second CasX protein comprises a sequence derived from SEQ ID No. 2. In some embodiments, the domain of the first CasX protein comprises a sequence derived from SEQ ID No. 1 and the domain of the second CasX protein comprises a sequence derived from SEQ ID No. 3. In some embodiments, the domain of the first CasX protein comprises a sequence derived from SEQ ID No. 2 and the domain of the second CasX protein comprises a sequence derived from SEQ ID No. 3. As an example of the foregoing, the chimeric RuvC domain comprises amino acids 660 through 823 of SEQ ID NO. 1 and amino acids 921 through 978 of SEQ ID NO. 2. As an alternative example of the foregoing, the chimeric RuvC domain comprises amino acids 647 to 810 of SEQ ID NO. 2 and amino acids 934 to 986 of SEQ ID NO. 1. In some embodiments, at least one chimeric domain comprises a chimeric helix I domain, wherein the chimeric helix I domain comprises amino acids 56-99 of SEQ ID NO. 1 and amino acids 192-332 of SEQ ID NO. 2. In some embodiments, the chimeric CasX variant is further modified, including a CasX variant of a sequence selected from the group consisting of SEQ ID NO:40959, SEQ ID NO:40960, SEQ ID NO:40968, SEQ ID NO:40977, SEQ ID NO:40969, SEQ ID NO:40970, SEQ ID NO:40971, SEQ ID NO:40972, SEQ ID NO:40973, SEQ ID NO:40961, SEQ ID NO:40978, SEQ ID NO:40962, SEQ ID NO:40979, SEQ ID NO:40963, SEQ ID NO:40980, SEQ ID NO:40964, SEQ ID NO:40981, SEQ ID NO:40965, SEQ ID NO:40982, SEQ ID NO:40966, SEQ ID NO:40983, SEQ ID NO:40967, SEQ ID NO:40974, SEQ ID NO:40975, SEQ ID NO:40976, SEQ ID NO:40984, and SEQ ID NO: 40985. In some embodiments, one or more additional modifications include insertions, substitutions, or deletions as described herein.

In the case of split or non-contiguous domains such as helix I, ruvC and OBD, a portion of the non-contiguous domain may be replaced with a corresponding portion from any other source. For example, the helix I-I domain in SEQ ID NO. 2 (sometimes referred to as helix I-a) may be replaced by the corresponding helix I-I sequence from SEQ ID NO. 1, and so forth. The domain sequences from the reference CasX protein and their coordinates are shown in table 4. Representative examples of chimeric CasX proteins include variants of CasX 472-483, 485-491 and 515, the sequences of which are shown in Table 3.

Table 4: reference domain coordinates in CasX proteins

* OBD I and II, helices I-I and I-II, and RuvC I and II are also referred to herein as OBD a and b, helices I a and b, and RuvC a and b.

Exemplary domain sequences are provided in table 5 below.

Table 5: exemplary domain sequences

Another exemplary helix II domain sequence is provided as SEQ ID NO:41004, and another exemplary RuvC a domain sequence is provided as SEQ ID NO: 41005.

In other embodiments, the CasX variant protein comprises the sequence of SEQ ID NOs 49-160, 40208-40286, or 40828-40912 as shown in table 3, and further comprises one or more NLS disclosed herein at or near the N-terminus, the C-terminus, or both. In other embodiments, the CasX variant protein comprises the sequence of SEQ ID NO:72-160, 40208-40286 or 40828-40912, and further comprises one or more NLS disclosed herein at or near the N-terminus, the C-terminus or both. In other embodiments, the CasX variant protein comprises the sequence of SEQ ID NO:144-160, 40208-40286, or 40828-40912, and further comprises one or more NLS disclosed herein at or near the N-terminus, the C-terminus, or both. It will be appreciated that in some cases, the N-terminal methionine of the CasX variants in the tables is removed from the expressed CasX variants during post-translational modification. One of ordinary skill in the art will appreciate that an NLS near the N or C terminus of a protein may be within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, or 20 amino acids of the N or C terminus.

j. CasX variants derived from other CasX variants

In further iterations of producing variant proteins, variant proteins may be used to produce additional CasX variants of the present disclosure. For example, casX 119 (SEQ ID NO: 72), casX 491 (SEQ ID NO: 138), and CasX515 (SEQ ID NO: 145) are exemplary variant proteins that are modified to produce additional CasX variants of the present disclosure having improved or additional properties relative to a reference CasX or CasX variant derived therefrom. CasX 119 contains a substitution of L379R, a substitution of A708K and a deletion of P at position 793 of SEQ ID NO. 2. CasX 491 contains the NTSB from SEQ ID NO. 1 and helix 1B exchange. CasX515 was derived from CasX 491 by inserting P at position 793 (relative to SEQ ID NO: 2) and was used to generate the CasX variant described in example 21. For example, relative to CasX515, casX 668 has an insertion of R and a substitution of G223S at position 26. Relative to CasX515, casX 672 has substitutions of L169K and G223S. Relative to CasX515, casX 676 has substitutions of L169K and G223S and an insertion of R at position 26.

Exemplary characteristics that may be improved in CasX variant proteins relative to the same characteristics in a reference CasX protein or relative to CasX variants derived therefrom include, but are not limited to, improved variant folding, increased binding affinity for gRNA, increased binding affinity for target nucleic acid, improved ability to utilize a broader spectrum of PAM sequences in editing and/or binding of target nucleic acid, improved unwinding of target DNA, increased editing activity, improved editing efficiency, improved editing specificity for target nucleic acid, reduced off-target editing or cleavage, increased percentage of eukaryotic genome that is available for editing, increased nuclease activity, increased target strand loading for double strand cleavage, reduced target strand loading for single strand nicks, increased binding of DNA non-target strands, improved protein stability, improved protein: gRNA (RNP) complex stability, and improved fusion characteristics. In particular embodiments, such improved features may include, but are not limited to, improved cleavage activity in target nucleic acids having TTC, ATC, and CTC PAM sequences, increased cleavage specificity of the target nucleic acid sequence, and reduced off-target cleavage of the target nucleic acid, as described in the examples.

Table 6: casX (CasX)515 domain sequence

The CasX variants of the embodiments described herein have the ability to form RNP complexes with the grnas disclosed herein. In some embodiments, an RNP comprising a CasX variant protein and a gRNA of the disclosure at a concentration of 20pM or less is capable of cleaving double stranded DNA targets with an efficiency of at least 80%. In some embodiments, RNP at a concentration of 20pM or less is capable of cleaving double stranded DNA targets with an efficiency of at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90% or at least 95%. In some embodiments, RNPs at a concentration of 50pM or less, 40pM or less, 30pM or less, 20pM or less, 10pM or less, or 5pM or less are capable of cleaving double stranded DNA targets with an efficiency of at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, or at least 95%. These improved features are described in more detail below.

k. Catalytic death of CasX variants

Table 7: catalytic death CasX variant proteins

CasX fusion proteins

In some embodiments, the disclosure provides AAV encoding CasX proteins comprising a heterologous protein fused to CasX. In some cases, casX is a CasX variant of any one of the embodiments described herein. In some embodiments, the CasX variant comprises any one of the sequences as shown in table 3 fused to one or more proteins or domains thereof having an activity of interest.

In some embodiments, the CasX fusion protein comprises any of the variants SEQ ID NOs 49-160, 40208-40369 or 40828-40912 as shown in Table 3 fused to one or more proteins or domains thereof having different activities of interest, thereby producing a fusion protein. For example, in some embodiments, the CasX variant protein is fused to a protein (or domain thereof) that inhibits transcription, modifies a target nucleic acid, or modifies a polypeptide associated with a nucleic acid (e.g., histone modification).

In some embodiments, a heterologous polypeptide (or heterologous amino acid such as a cysteine residue or unnatural amino acid) can be inserted at one or more positions within the CasX protein to produce a CasX fusion protein. In other embodiments, cysteine residues may be inserted at one or more positions within the CasX protein, followed by conjugation of a heterologous polypeptide as described below. In some alternative embodiments, a heterologous polypeptide or heterologous amino acid may be added to the N-terminus or C-terminus of the CasX variant protein. In other embodiments, a heterologous polypeptide or heterologous amino acid may be inserted within the sequence of the CasX protein.

In some embodiments, the CasX variant fusion protein retains RNA-guided sequence-specific target nucleic acid binding and cleavage activity. In some cases, the CasX variant fusion protein has (retains) 50% or more of the activity (e.g., cleavage and/or binding activity) of the corresponding CasX variant protein (which has no heterologous protein inserted). In some cases, the CasX variant fusion protein retains at least about 60%, or at least about 70% or more, at least about 80%, or at least about 90%, or at least about 92%, or at least about 95%, or at least about 98%, or at least about 100% of the activity (e.g., cleavage and/or binding activity) of the corresponding CasX protein (which has no heterologous protein inserted).

In some cases, the CasX variant fusion protein retains (has) target nucleic acid binding activity relative to the activity of a CasX protein without heterologous amino acids or heterologous polypeptide insertions. In some cases, the CasX variant fusion protein retains at least about 60%, or at least about 70% or more, at least about 80%, or at least about 90%, or at least about 92%, or at least about 95%, or at least about 98%, or at least about 100% of the binding activity of the corresponding CasX protein (which has no heterologous protein inserted).

In some cases, the CasX variant fusion protein retains (has) target nucleic acid binding and/or cleavage activity relative to the activity of a parent CasX protein without heterologous amino acid or heterologous polypeptide insertion. For example, in some cases, the CasX variant fusion protein has (retains) 50% or more of the binding and/or cleavage activity of the corresponding parent CasX protein (without the inserted CasX protein). For example, in some cases, the CasX variant fusion protein has (retains) 60% or more (70% or more, 80% or more, 90% or more, 92% or more, 95% or more, 98% or more, or 100%) of the binding and/or cleavage activity of the corresponding CasX parent protein (with no inserted CasX protein). Methods of measuring cleavage and/or binding activity of CasX proteins and/or CasX fusion proteins will be known to those of ordinary skill in the art and any convenient method may be used.

A variety of heterologous polypeptides are suitable for inclusion in the reference CasX or CasX variant fusion proteins of the present disclosure. In some cases, the fusion partner may modulate transcription of the target DNA (e.g., inhibit transcription, increase transcription). For example, in some cases, the fusion partner is a protein (or domain from a protein) that inhibits transcription (e.g., a transcription repressor protein, a protein that functions via recruitment of transcription inhibitor proteins, modification of target DNA (such as methylation), recruitment of DNA modification genes, regulation of histones associated with target DNA, recruitment of histone modification genes (such as those that modify acetylation and/or methylation of histones), and the like). In some cases, the fusion partner is a protein that increases transcription (or a domain from a protein) (e.g., a transcriptional activator, a protein that functions via recruitment of transcriptional activator proteins, modification of target DNA (such as methylation), recruitment of DNA modification genes, modulation of histones associated with target DNA, recruitment of histone modification genes (such as those that modify acetylation and/or methylation of histones), and the like).

In some cases, the fusion partner has an enzymatic activity that modifies the target nucleic acid sequence. Such as nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, disproportionation enzyme activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolytic enzyme activity or glycosylase activity. In some embodiments, the CasX variant comprises any one of SEQ ID NOs 49-160, 40208-40369, or 40828-40912 as shown in table 3, and a polypeptide having the following activity: methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitination activity, adenylation activity, deadenylation activity, sumoylation activity, desumoylation activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity.

Examples of proteins (or fragments thereof) that can be used as fusion partners to increase transcription include, but are not limited to: transcriptional activators, such as VP16, VP64, VP48, VP160, p65 subdomains (e.g., from NFkB), and activation domains of EDLL and/or TAL activation domains (e.g., for activity in plants); histone lysine methyltransferases such as SET1A, SET1B, MLL1 to 5, ASH1, SYMD2, NSD1, etc.; histone lysine demethylases such as JHDM2a/b, UTX, JMJD3 and the like; histone acetyltransferases such as GCN5, PCAF, CBP, P, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, SRC1, ACTR, P160, CLOCK and the like; DNA demethylases such as ten-eleven translocation (TET) dioxygenase 1 (TET 1 CD), TET1, DME, DML1, DML2, ROS1, and the like.

Examples of proteins (or fragments thereof) that can be used as fusion partners to reduce transcription include, but are not limited to: transcription repressors such as Kruppel-related cassettes (KRAB or SKD); KOX1 inhibitory domain; madmsin 3 interaction domain (SID); ERF Repressor Domains (ERD), SRDX repressor domains (e.g., for repression in plants), etc.; histone lysine methyltransferases such as Pr-SET7/8, SUV4-20H1, RIZ1, etc.; histone lysine demethylases such as JMJD 2A/JMM 3A, JMJD2B, JMJD C/GASC1, JMJD2D, JARID A/RBP2, JARID1B/PLU-1, JARID 1C/SMCX, JARID1D/SMCY and the like; histone lysine deacetylases such as HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like; DNA methylases such as HhaIDNA m5 c-methyltransferase (m.hhai), DNA methyltransferase 1 (DNMT 1), DNA methyltransferase 3a (DNMT 3 a), DNA methyltransferase 3b (DNMT 3 b), METI, DRM3 (plant), ZMET2, CMT1, CMT2 (plant), and the like; and peripheral recruitment elements such as lamin a, lamin B, and the like.

In some cases, the fusion partner has an enzymatic activity that modifies the target nucleic acid sequence (e.g., ssRNA, dsRNA, ssDNA, dsDNA). Examples of enzymatic activities that may be provided by fusion partners include, but are not limited to: nuclease activity, such as provided by a restriction enzyme (e.g., fokl nuclease); methyltransferase activity such as provided by methyltransferases (Hhal DNA m5 c-methyltransferase (m.hhal), DNA methyltransferase 1 (DNMT 1), DNA methyltransferase 3a (DNMT 3 a), DNA methyltransferase 3b (DNMT 3 b), METI, DRM3 (plant), ZMET2, CMT1, CMT2 (plant), etc.; demethylase activity, such as provided by a demethylase (e.g., ten-eleven translocation (TET) dioxygenase 1 (TET 1 CD), TET1, DME, DML1, DML2, ROS1, etc.); DNA repair activity; DNA damaging activity; deamination activity, such as provided by deaminase (e.g., cytosine deaminase, e.g., APOBEC protein such as rat APOBEC); a disproportionation enzyme activity; alkylation activity; depurination activity; oxidation activity; pyrimidine dimer formation activity; integrase activity, such as provided by integrase and/or a dissociase (e.g., gin convertases, such as high activity mutants of Gin convertases, ginH106Y; human immunodeficiency virus type 1 Integrase (IN); tn3 dissociase; etc.); transposase activity; recombinase activity, such as provided by a recombinase (e.g., a catalytic domain of Gin recombinase); polymerase activity; ligase activity; helicase activity; photolytic and glycosylase activity).

In some cases, a CasX variant protein of the disclosure for use in an AAV system is fused to a polypeptide selected from the group consisting of: a domain for increasing transcription (e.g., VP16 domain, VP64 domain), a domain for decreasing transcription (e.g., KRAB domain, e.g., from Kox1 protein), a core catalytic domain of histone acetyltransferase (e.g., histone acetyltransferase p 300), a protein/domain providing a detectable signal (e.g., a fluorescent protein such as GFP), a nuclease domain (e.g., fokl nuclease), or a base editor (e.g., a cytidine deaminase such as apodec 1).

In some cases, the fusion partner has an enzymatic activity that modifies a protein (e.g., histone, RNA binding protein, DNA binding protein, etc.) associated with the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA). Examples of enzymatic activities (modifying proteins associated with a target nucleic acid) that can be provided by fusion partners include, but are not limited to: methyltransferase activity such as provided by Histone Methyltransferases (HMT) (e.g., stain 3-9 inhibitor homolog 1 (SUV 39H1, also known as KMT 1A), euchromatin lysine methyltransferase 2 (G9A, also known as KMT1C and EHMT 2), SUV39H2, ESET/SETDB 1, etc., SET1A, SET1B, MLL1 to 5, ASH1, SYMD2, NSD1, DOT1L, pr-SET7/8, SUV4-20H1, EZH2, RIZ 1); demethylase activity such as provided by histone demethylases (e.g., lysine demethylase 1A (KDM 1A, also known as LSD 1), JHDM2A/B, JMJD2A/JHDM3A, JMJD2B, JMJD C/GASC1, JMJD2D, JARID a/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, UTX, JMJD3, etc.); acetyltransferase activity, such as provided by histone acetyltransferase transferase (e.g., catalytic core/fragment of human acetyltransferase P300, GCN5, PCAF, CBP, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, HB01/MYST2, HMOF/MYST1, SRC1, ACTR, P160, CLOCK, etc.); deacetylase activity, such as provided by histone deacetylases (e.g., HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, etc.); kinase activity; phosphatase activity; ubiquitin ligase activity; deubiquitination activity; adenylation activity; deadenylation activity; SUMOylating activity; desupenylating activity; ribosylating activity; a deglycosylation activity; myristoylation activity and dimyristoylation activity.

Other examples of suitable fusion partners for CasX variants are (i) a dihydrofolate reductase (DHFR) destabilizing domain (e.g., to produce a chemically controllable target RNA-directed polypeptide or a conditionally active RNA-directed polypeptide), and (ii) a chloroplast transit peptide.

In some embodiments, the CasX variant comprises any one of SEQ ID NOs 49-160, 40208-40369, or 40828-40912 as shown in table 3, and a chloroplast transit peptide, including but not limited to:

MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGR VKCMQVWPPIGKKKFETLSYLPPLTRDSRA (SEQ ID NO: 40790); MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS (SEQ ID NO: 39980); MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQV WPPIEKKKFETLSYLPDLTDSGGRVNC (SEQ ID NO: 39968); MAQVSRICNGVQNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIG SELRPLKVMSSVSTAC (SEQ ID NO: 39969); MAQVSRICNGVWNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIG SELRPLKVMSSVSTAC (SEQ ID NO: 39970); MAQINNMAQGIQTLNPNSNFHKPQVPKSSSFLVFGSKKLKNSANSMLVLKKDSIFMQLF CSFRISASVATAC (SEQ ID NO: 39971); MAALVTSQLATSGTVLSVTDRFRRPGFQGLRPRNPADAALGMRTVGASAAPKQSRKPH RFDRRCLSMVV (SEQ ID NO: 39972); MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQ QRSVQRGSRRFPSVVVC (SEQ ID NO: 39973); MASSVLSSAAVATRSNVAQANMVAPFTGLKSAASFPVSRKQNLDITSIASNGGRVQC (SEQ ID NO: 39974); MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVKCSAAVTPQASPVIS RSAAAA (SEQ ID NO: 39975); and MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRTVKCCASSWNSTINGAAATTNGASAASS (SEQ ID NO: 39976).

In some cases, a CasX variant protein of the present disclosure for use in an AAV system may include an endosomal escape peptide. In some cases, the endosomal escape polypeptide comprises the amino acid sequence GLFXallLXSLWXLLXa (SEQ ID NO: 39977), wherein each X is independently selected from lysine, histidine and arginine. In some cases, the endosomal escape polypeptide comprises amino acid sequence GLFHALLHLLHSLWHLLLHA (SEQ ID NO: 39978) or HHHHHHHHH (SEQ ID NO: 39979).

Non-limiting examples of fusion partners for use with CasX variants when targeting ssRNA target nucleic acid sequences include (but are not limited to): splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, extension, and/or release factors; e.g., eIF 4G); an RNA methylase; RNA editing enzymes (e.g., RNA deaminase, e.g., adenosine Deaminase (ADAR) acting on RNA, including a to I and/or C to U editing enzymes); a helicase; an RNA-binding protein; etc. It will be appreciated that the heterologous polypeptide may comprise the entire protein, or in some cases may comprise a fragment (e.g., a functional domain) of the protein.

Some RNA splicing factors (either in whole or as fragments thereof) that can be used as fusion partners for CasX variants have modular organization with separate sequence-specific RNA binding modules and splicing effector domains. For example, members of the serine/arginine (SR) rich protein family contain an N-terminal RNA Recognition Motif (RRM) that binds to an Exon Splicing Enhancer (ESE) in the pre-mRNA, and a C-terminal RS domain that facilitates exon inclusion. As another example, hnRNP protein hnRNP Al binds to an Exon Splice Silencer (ESS) through its RRM domain and inhibits exon inclusion through a C-terminal glycine-rich domain. Alternative uses of splice elements (ss) may be regulated by binding to regulatory sequences between two alternative sites. For example, ASF/SF2 may recognize ESEs and facilitate the use of proximal sites for introns, while hnRNP Al may bind to ESS and shift splicing to the use of distal sites for introns. One application of such factors is the generation of ESFs that regulate alternative splicing of endogenous genes, particularly disease-related genes. For example, bcl-x pre-mRNA produces two splice isoforms with two alternative 5' splice sites to encode proteins with opposite functions. Long splicing isoforms Bcl-xL are potent inhibitors of apoptosis, which are expressed in long-lived postmitotic cells and up-regulated in many cancer cells, protecting the cells from apoptotic signals. The short isoform Bcl-xS is a pro-apoptotic isoform and is expressed at high levels in cells with turnover (e.g., developing lymphocytes). The proportion of the two Bcl-x splice isoforms is regulated by multiple cis-elements located in the core exon region or exon extension region (i.e., between two alternative 5' splice sites). For further examples see WO2010075303, which is hereby incorporated by reference in its entirety.

Other suitable fusion partners for use with CasX variants include, but are not limited to: as proteins (or fragments thereof) of the boundary element (e.g. CTCF), peripherally recruited proteins and fragments thereof (e.g. lamin a, lamin B, etc.) and protein docking elements (e.g. FKBP/FRB, hill/abl, etc.) are provided.

In some cases, the heterologous polypeptide (fusion partner) used with the CasX variant provides subcellular localization, i.e., the heterologous polypeptide contains subcellular localization sequences (e.g., nuclear Localization Signal (NLS) targeting the nucleus, sequence that retains the fusion protein outside the nucleus (e.g., nuclear Export Sequence (NES)), sequence that retains the fusion protein within the cytoplasm, mitochondrial localization signal targeting mitochondria, chloroplast localization signal targeting chloroplast, ER retention signal, etc.). In some embodiments, the target RNA-guided polypeptide or conditionally active RNA-guided polypeptide and/or the target CasX fusion protein does not include an NLS, such that the protein is not targeted to the nucleus (which may be advantageous, for example, when the target nucleic acid sequence is RNA present in the cytosol). In some embodiments, the fusion partner may provide a tag (i.e., the heterologous polypeptide is a detectable label) to facilitate tracking and/or purification (e.g., a fluorescent protein, such as Green Fluorescent Protein (GFP), yellow Fluorescent Protein (YFP), red Fluorescent Protein (RFP), cyan Fluorescent Protein (CFP), mCherry, tdTomato, etc., a histidine tag, such as a 6XHis tag, a Hemagglutinin (HA) tag, a FLAG tag, a Myc tag, etc.).

In some cases, casX variant proteins used in AAV systems include (fused to) a Nuclear Localization Signal (NLS) for targeting CasX/gRNA to the nucleus. In some cases, the CasX variant protein is fused to 2 or more, 3 or more, 4 or more, or 5 or more, 6 or more, 7 or more, 8 or more NLS. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near the N-terminus and/or C-terminus (e.g., within 50 amino acids). In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near the N-terminus (e.g., within 50 amino acids). In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near the C-terminus (e.g., within 50 amino acids). In some cases, the NLS is at the N-terminus and the NLS is at the C-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are located at or near both the N-terminus and the C-terminus (e.g., within 50 amino acids). In some cases, casX variant proteins include (are fused to) between 1 and 10 NLSs (e.g., 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 2 to 10, 2 to 9, 2 to 8, 2 to 7, 2 to 6, or 2 to 5 NLSs). In some cases, the CasX variant protein includes (is fused to) between 2 and 5 NLSs (e.g., 2 to 4 or 2 to 3 NLSs). Non-limiting examples of NLS suitable for use with CasX variants include sequences having at least about 80%, at least about 90%, or at least about 95% identity or identity to sequences derived from: NLS of the SV40 virus large T antigen having the amino acid sequence PKKKRKV (SEQ ID NO: 196); NLS from nucleoplasmin (e.g., a dual-typed nucleoplasmin NLS having the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 197); c-myc NLS having amino acid sequence PAAKRVKLD (SEQ ID NO: 248) or RQRRNELKRSP (SEQ ID NO: 161); hRNPAl M9 NLS with sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 162); the sequence RMRIZFKNKGKDTARRRRRVEVSLRKAKDEQILKRRNV (SEQ ID NO: 163) from the IBB domain of the import protein-alpha; sequence VSRKRPRP (SEQ ID NO: 164) and PPKKARED (SEQ ID NO: 165) of the myosarcoma T protein; the sequence PQPKKPL of human p53 (SEQ ID NO: 166); sequence SALIKKKKKMAP of mouse c-abl IV (SEQ ID NO: 167); the DRLRR (SEQ ID NO: 168) and PKQKKRK sequences (SEQ ID NO: 169) of influenza virus NS 1; RKLKKKIKKL sequence of hepatitis virus delta antigen (SEQ ID NO: 170); sequence REKKKFLKRR of mouse Mxl protein (SEQ ID NO: 171); sequence KRKGDEVDGVDEVAKKKSKK of human poly (ADP-ribose) polymerase (SEQ ID NO: 172); sequence RKCLQAGMNLEARKTKK of steroid hormone receptor (human) glucocorticoid (SEQ ID NO: 173); the sequence PRPRKIPR (SEQ ID NO: 174) of the P protein (BDV-P1) of the Borna disease virus; the sequence PPRKKRTVV (SEQ ID NO: 175) of the hepatitis C virus nonstructural protein (HCV-NS 5A); sequence NLSKKKKRKREK of LEF1 (SEQ ID NO: 176); sequence RRPSRPFRKP of ORF57 simirae (SEQ ID NO: 177); the sequence KRPSPSS of EBV LANA (SEQ ID NO: 178); sequence KRGINDRNFWRGENERKTR of influenza A virus protein (SEQ ID NO: 179); sequence PRPPKMARYDN of human RNA Helicase A (RHA) (SEQ ID NO: 180); the nucleolus RNA helicase II sequence KRGSFSKAF (SEQ ID NO: 181); TUS-protein sequence KLKIKRPVK (SEQ ID NO: 182); sequence PKKKRKVPPPPAAKRVKLD associated with import protein- α (SEQ ID NO: 183); sequence PKTRRRPRRSQRKRPPT from the Rex protein in HTLV-1 (SEQ ID NO: 184); sequence MSRRRKANPTKLSENAKKLAKEVEN (SEQ ID NO: 185) of EGL-13 protein from caenorhabditis elegans (Caenorhabditis elegans); and the sequences KTRRRPRRSQRKRPPT (SEQ ID NO: 186), RRKKRRPRRKKRR (SEQ ID NO: 187), PKKKSRKPKKKSRK (SEQ ID NO: 188), HKKKHPDASVNFSEFSK (SEQ ID NO: 189), QRPGPYDRPQRPGPYDRP (SEQ ID NO: 190), LSPSLSPLLSPSLSPL (SEQ ID NO: 191), RGKGGKGLGKGGAKRHRK (SEQ ID NO: 192), PKRGRGRPKRGRGR (SEQ ID NO: 193), PKKKRKVPPPPAAKRVKLD (SEQ ID NO: 194) and PKKKRKVPPPPKKKRKV (SEQ ID NO: 195), PAKRARRGYKC (SEQ ID NO: 40188), KLGPRKATGRW (SEQ ID NO: 40189), PRRRREE (SEQ ID NO: 40190), PYRGRKE (SEQ ID NO: 40191), PLRKRPRR (SEQ ID NO: 40192), PLRKRPRRGSPLRKRPRR (SEQ ID NO: 40193), PAAKRVKLDGGKRTADGSEFESPKKKRKV (SEQ ID NO: 40194), PAAKRVKLDGGKRTADGSEFESPKKKRKVGIHGVPAA (SEQ ID NO: 40195), PAAKRVKLDGGKRTADGSEFESPKKKRKVAEAAAKEAAAKEAAAKA (SEQ ID NO: 40196), PAAKRVKLDGGKRTADGSEFESPKKKRKVPG (SEQ ID NO: 40197), KRKGSPERGERKRHW (SEQ ID NO: 40198), KRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 40199) and PKKKRKVGGSKRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 40200). Additional NLSs for incorporation into the AAV systems of the present disclosure are provided in tables 15 and 16, indicating the NLS for connecting the N or C terminus of CasX. In some embodiments, the one or more NLS is linked to CasX or to an adjacent NLS by a linker peptide, wherein the linker peptide is selected from RS, (G) n (SEQ ID NO: 40201), (GS) n (SEQ ID NO: 40202), (GSGGS) n (SEQ ID NO: 208), (GGSGGS) n (SEQ ID NO: 209), (GGGS) n (SEQ ID NO: 210), GGSG (SEQ ID NO: 211), GGSGG (SEQ ID NO: 212), GSGSGSG (SEQ ID NO: 213), GSGGG (SEQ ID NO: 214), GGGSG (SEQ ID NO: 215), GSSSG (SEQ ID NO: 216), GPGPP (SEQ ID NO: 217), GGP, PPP, PPAPPA (SEQ ID NO: 218), PPPG (SEQ ID NO: 40207), PPPGPPP (SEQ ID NO: 219), PPP (GGGS) n (SEQ ID NO: 40203), (GGGS) n (SEQ ID NO: 40204), AEAAAKEAAAKEAAAKA (SEQ ID NO: 40205) and 40206), wherein PPP n is 1 to 35 PPP. In some embodiments, AAV constructs of the disclosure comprise polynucleic acids encoding the NLS and linker peptides of any of the preceding embodiments of the paragraph and the NLS of tables 15 and 16, and in some cases may be constructed as depicted in any of fig. 24, 33-35, or 42 relative to other components of the construct.

Typically, the NLS (or NLS) has sufficient strength to drive accumulation of the CasX variant fusion protein in the nucleus of eukaryotic cells. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable label may be fused to a CasX variant fusion protein such that the position of the latter within the cell may be visualized. The nuclei may also be isolated from the cells, the contents of which may then be analyzed by any suitable method for detecting proteins, such as immunohistochemistry, western blotting, or enzymatic activity assays. Accumulation in the nucleus can also be measured indirectly.

In some cases, casX variant fusion proteins used in AAV systems include a "protein transduction domain" or PTD (also referred to as CPP-cell penetrating peptide), which refers to a protein, polynucleotide, carbohydrate, or organic or inorganic compound that promotes penetration through a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. PTDs (which may range from small polar molecules to macromolecules and/or nanoparticles) attached to another molecule facilitate the passage of the molecule across a membrane, for example from the extracellular space to the intracellular space, or from the cytosol into an organelle. In some embodiments, the PTD is covalently linked to the amino terminus of the CasX variant fusion protein. In some embodiments, the PTD is covalently linked to the carboxy terminus of the CasX variant fusion protein. In some cases, the PTD is inserted within the sequence of the CasX variant fusion protein at a suitable insertion site. In some cases, the CasX variant fusion protein includes (is conjugated to, fused to) one or more PTDs (e.g., two or more, three or more, four or more PTDs). In some cases, the PTD includes one or more Nuclear Localization Signals (NLS). Examples of PTDs include, but are not limited to, peptide transduction domains of HIV TAT comprising: YGRKKRRQRRR (SEQ ID NO: 198), RKKRRQRRR (SEQ ID NO: 199); YARAAARQARA (SEQ ID NO: 200); THRLPRRRRRR (SEQ ID NO: 201); GGRRARRRRRR (SEQ ID NO: 202); a polyarginine sequence comprising an amount of arginine sufficient to directly enter a cell, e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10 to 50 arginines (SEQ ID NO: 203); VP22 domain (Zender et al, (2002) Cancer Gene Ther.9 (6): 489-96); drosophila antennal protein transduction domains (Noguchi et al, (2003) Diabetes 52 (7): 1732-1737); truncated human calcitonin peptide (Trehin et al, (2004) pharm.research 21:1248-1256); polylysine (Wender et al, (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO: 204); transporter GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 205); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 206); RQIKIWFQNRRMKWKK (SEQ ID NO: 207). In some embodiments, the PTD is an Activatable CPP (ACPP) (Aguilera et al, (2009) Integr Biol (Camb) June;1 (5-6): 371-381). ACPP comprises a polycationic CPP (e.g., arg9 or "R9") linked to a matching polyanion (e.g., glu9 or "E9") via a cleavable linker, which reduces the net charge to almost zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally exposing the polyarginine and its inherent adhesiveness, thereby "activating" the ACPP to cross the membrane.

In some embodiments, a CasX variant fusion protein may comprise a CasX protein linked via a linker polypeptide (e.g., one or more linker polypeptides) to a CasX protein having inserted therein a heterologous amino acid or heterologous polypeptide (heterologous amino acid sequence). In some embodiments, the CasX variant fusion protein may be linked to a heterologous polypeptide (fusion partner) at the C-terminus and/or N-terminus via a linker polypeptide (e.g., one or more linker polypeptides). The linker polypeptide may have any of a variety of amino acid sequences. Proteins may be linked by spacer peptides, which typically have flexible properties, although other chemical bonds are not excluded. Suitable linkers include polypeptides from 4 amino acids to 40 amino acids in length, or from 4 amino acids to 25 amino acids in length. These linkers are typically produced by coupling proteins using synthetic oligonucleotides that encode the linkers. Peptide linkers with a degree of flexibility may be used. The linker peptide may have virtually any amino acid sequence, bearing in mind that the preferred linker will have a sequence that results in a generally flexible peptide. The use of small amino acids (such as glycine and alanine) can be used to produce flexible peptides. The generation of such sequences is routine to those skilled in the art. A variety of different linkers are commercially available and are considered suitable for use. Exemplary linker polypeptides include glycine polymer (G) n, glycine-serine polymer, glycine-alanine polymer, alanine-serine polymer, glycine-proline polymer, and proline-alanine polymer. Exemplary linkers can comprise amino acid sequences including, but not limited to: (G) n (SEQ ID NO: 40201), (GS) n (SEQ ID NO: 40202), (GSGGS) n (SEQ ID NO: 208), (GGSGGS) n (SEQ ID NO: 209), (GGGS) n (SEQ ID NO: 210), GGSG (SEQ ID NO: 211), GGSGG (SEQ ID NO: 212), GSGSGSG (SEQ ID NO: 213), GSGGG (SEQ ID NO: 214), GGGSG (SEQ ID NO: 215), GSSSG (SEQ ID NO: 216), GPGPGP (SEQ ID NO: 217), GGP, PPP, PPAPPA (SEQ ID NO: 218), PPPG (SEQ ID NO: 40207), PPPGPPP (SEQ ID NO: 219), PPP (GGGS) n (SEQ ID NO: 40203), (GGGS) nPPP (SEQ ID NO: 40204), AEAAAKEAAAKEAAAKA (SEQ ID NO: 40205) and TPPKTKRKVEFE (SEQ ID NO: 40206), wherein n is 1 to 5. One of ordinary skill will recognize that the design of the peptide conjugated to any of the elements described above may include a linker that is wholly or partially flexible, such that the linker may include a flexible linker as well as one or more portions that impart a less flexible structure.

AAV systems and methods for modifying target nucleic acids

AAV provided herein can be used in a variety of applications, including as a therapeutic agent, diagnostic agent, and for research. To implement the methods of the present disclosure for gene editing, programmable AAV systems are provided herein. The programmable nature of the CasX and gRNA components of the AAV systems provided herein allows for precise targeting to achieve a desired effect (nicking, cleavage, etc.) at one or more predetermined regions of interest in a target nucleic acid sequence. In some embodiments, the AAV systems provided herein comprise sequences encoding CasX proteins and grnas, wherein the targeting sequence of the gRNA is complementary to the target nucleic acid sequence and is therefore capable of hybridizing to the target nucleic acid sequence. In some cases, the AAV system further comprises a donor template nucleic acid.

In some embodiments of the present disclosure, provided herein are methods of modifying a target nucleic acid sequence. In some embodiments, the method comprises contacting a cell comprising a target nucleic acid sequence with an AAV encoding a CasX protein of the present disclosure and a gRNA of the present disclosure comprising a targeting sequence, wherein the targeting sequence of the gRNA has a sequence complementary to and hybridizable to the sequence of the target nucleic acid. After hybridization to a target nucleic acid by CasX and gRNA, casX introduces one or more single-strand breaks or double-strand breaks within or near the target nucleic acid, which may include sequences that contain regulatory elements or non-coding regions of the gene, which result in permanent insertions (deletions or insertions) or mutations in the target nucleic acid, as described herein, accompanied by corresponding modulation of expression of the gene product or alteration of function, thereby producing an editing cell. In other embodiments, the method comprises contacting a cell comprising a target nucleic acid sequence with an AAV encoding a plurality of grnas targeting different or overlapping portions of the target nucleic acid, wherein the CasX protein introduces a plurality of breaks in the target nucleic acid resulting in permanent indels or mutations in the target nucleic acid, as described herein, accompanied by a corresponding modulation or alteration of function of expression of the gene product, thereby producing an edited cell.

In some embodiments, modification of the target nucleic acid results in reduced expression of a gene product of a gene comprising the target nucleic acid, wherein expression is reduced by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, or at least about 90% as compared to an unmodified cell.

In some embodiments of the method of modifying a target nucleic acid sequence, the gRNA of the AAV vector is a guide DNA (gDNA). In other embodiments, the gRNA is a guide RNA (gRNA). In some embodiments, the gRNA is a single molecule gRNA (sgRNA). In other embodiments of the method, the gRNA is a dual molecule gRNA (dgRNA) in which the activator and the targeting agent component are linked together by an intervening nucleotide. In some embodiments, the gRNA is a chimeric gRNA-gDNA. In some embodiments, the method comprises contacting the target nucleic acid sequence with an AAV encoding a plurality of grnas targeting different or overlapping regions of the target nucleic acid. In some embodiments, the gRNA scaffold comprises any of the sequences of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958 and 41817 as shown in Table 2.

In some embodiments, modification of the target nucleic acid sequence is performed ex vivo. In some embodiments, modification of the target nucleic acid sequence occurs outside the interior of the cell. In some embodiments of modifying a target nucleic acid sequence in a cell, the cell is a eukaryotic cell selected from the group consisting of a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell. In particular embodiments, the eukaryotic cell is a human cell. In some embodiments, modification of the target nucleic acid sequence is performed in the subject. In some embodiments, the subject is selected from the group consisting of mice, rats, pigs, non-human primates, and humans.

In some embodiments, the method of modifying a target nucleic acid sequence comprises contacting the target nucleic acid with an AAV vector encoding a CasX protein and gRNA pair and further comprising a donor template. The donor template can be inserted into the target nucleic acid such that all, a portion, or none of the gene product is expressed. The donor template may be a short single-stranded or double-stranded oligonucleotide, or may be a long single-stranded or double-stranded oligonucleotide, depending on whether the system is used to knock-down/knock-out or knock-in the protein coding sequence. For knockdown/knockdown, the donor template sequence need not be identical to the genomic sequence it replaces, and may contain one or more single base changes, insertions, deletions, inversions or rearrangements relative to the genomic sequence. If there are a sufficient number of arms ("homology arms") of nucleotides flanking (i.e., 5 'and 3' of) the cleavage site of the target nucleic acid sequence targeted by the CasX: gRNA with sufficient homology to support homology directed repair, the use of such donor templates can result in frame shifts or other mutations such that the gene product is not expressed or expressed at a lower level. In some embodiments, the homology arms comprise between 10 and 100 nucleotides. The upstream homology arm sequence and the downstream homology arm sequence share at least about 80%, 85%, 90%, 95% or 100% homology with nucleotide sequences within 1 to 50 bases flanking either side of the cleavage site of the CasX cleavage target nucleic acid sequence, thereby facilitating insertion of the donor template sequence by HDR. In some embodiments, the donor template sequence comprises a non-homologous or heterologous sequence flanking the two homology arms such that homology directed repair between the target DNA region and the two flanking arm sequences results in insertion of the non-homologous or heterologous sequence at the target region, results in a knockdown or knockdown of the target gene, and results in a reduction or elimination of expression of the gene product. In such knockdown cases, expression of the gene product is reduced by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, or at least about 90% as compared to the unmodified target nucleic acid. In other cases, the exogenous donor template can comprise the rectification sequence to be integrated and flank an upstream homology arm and a downstream homology arm each having homology to the target nucleic acid sequence introduced into the cell. The use of such donor templates may result in expression of the functional protein or expression of a physiologically normal level of the functional protein after gene editing. In other cases, exogenous donor templates, which may comprise mutations, heterologous sequences, or corrective sequences, are inserted between the ends resulting from CasX cleavage by a homology-independent targeted integration (HITI) mechanism. The foreign sequence inserted by the HITI may be any relatively short sequence, for example between 1 and 50 nucleotides in length, or a longer sequence of about 50 to 1000 nucleotides in length. The lack of homology may be, for example, no more than 20% to 50% sequence identity, and/or lack of specific hybridization at low stringency. In other cases, the lack of homology may also include criteria having an identity of no more than 5bp, 6bp, 7bp, 8bp, or 9 bp.

In some embodiments, the AAV vector comprises a donor template sequence, wherein the sequence may comprise certain sequence differences compared to the target nucleic acid sequence, e.g., restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), etc., which can be used to assess successful insertion of the donor nucleic acid at the cleavage site, or in some cases, for other purposes (e.g., to represent expression at a targeted genomic locus). Alternatively, these sequence differences may include flanking recombination sequences, such as FLP, loxP sequences, etc., which are activated at a later time to remove the marker sequence. In some embodiments of the method, the donor polynucleotide comprises at least about 10, at least about 50, at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700 nucleotides. In other embodiments, the donor polynucleotide comprises at least about 10 to about 700 nucleotides, at least about 20 to about 600 nucleotides, at least about 40 to about 400 nucleotides. In some embodiments, the donor template is a single-stranded DNA template or a single-stranded RNA template.

In some cases, these methods do not include contacting the target nucleic acid sequence with a donor template, and the target nucleic acid sequence is modified such that nucleotides within the target nucleic acid sequence are deleted or inserted according to the repair pathway of the cell itself; for example, the cell repair pathway may be NHEJ.

In other embodiments, the method provides an AAV encoding CasX comprising one or more Nuclear Localization Signals (NLS) of any one or more of the embodiments described herein for targeting CasX/gRNA to the nucleus of a cell. NLS can be fused at or near the N-terminus, the C-terminus, or both of the CasX protein.

The introduction of a recombinant AAV vector comprising sequences encoding the transgenic components of the disclosure (e.g., casX, gRNA, promoter and helper components, and optionally a donor template sequence) into a cell under in vitro conditions can be performed in any suitable medium and under any suitable culture conditions that promote cell survival and CasX: gRNA production. The introduction of the recombinant AAV vector into the target cell may be performed in vivo, in vitro, or ex vivo. In some embodiments of the method, the vector may be provided directly to the target host cell. For example, the cells may be contacted with vectors having nucleic acids encoding CasX and gRNA of any of the embodiments described herein and optionally donor template sequences such that the vectors are taken up by the cells. Methods for contacting cells with nucleic acid vectors as plasmids include electroporation, calcium chloride transfection, microinjection, transduction, and liposome infection, as are well known in the art. In some embodiments, the AAV is selected from AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

AAV vectors for providing nucleic acids encoding gRNA and CasX proteins to target host cells can include suitable promoters or other auxiliary elements for driving expression (i.e., transcriptional activation) of the nucleic acid of interest. In some cases, the encoded nucleic acid of interest will be operably linked to a promoter. This may include a randomly acting promoter, for example the CMV- β -actin promoter, or an inducible promoter, such as a promoter that is active in a particular cell population or responsive to the presence of a drug such as tetracycline or kanamycin. By transcriptional activation, it is expected that transcription will increase from basal levels by at least about 10-fold, at least about 100-fold, more typically at least about 1000-fold in a target host cell comprising the vector. In addition, the vector used to provide the nucleic acid encoding the gRNA and/or the CasX protein to the cell may comprise a nucleic acid sequence encoding a selectable marker in the target cell in order to identify cells that have ingested the CasX protein and/or the gRNA.

AAV vectors

In other embodiments, the disclosure provides recombinant AAV vectors comprising polynucleotides encoding the CasX proteins, grnas, and regulatory and accessory elements described herein.

In some embodiments, the present disclosure provides a recombinant adeno-associated virus (rAAV) comprising a): AAV capsid protein and b): the polynucleotide of any of the embodiments described herein. In the foregoing embodiments, the polynucleotide may comprise a sequence of a component selected from the group consisting of: a first adeno-associated virus (AAV) Inverted Terminal Repeat (ITR) sequence; a second AAV ITR sequence; the first promoter sequence of any one of the embodiments described herein; a second promoter sequence of any one of the embodiments described herein; a sequence encoding a CRISPR protein of any of the embodiments described herein; a sequence encoding at least a first guide RNA (gRNA) of any one of the embodiments described herein; and one or more accessory element sequences of any of the embodiments described herein. In some embodiments, the polynucleotide comprises one or more sequences selected from the sequences set forth in tables 8-10, 12, 13 and 17-22 and 24-27, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto. In another embodiment, the polynucleotide comprises a sequence selected from the sequences set forth in tables 8-10, 12, 13 and 17-22 and 24-27. In some embodiments, the polynucleotide sequences differ from those shown in tables 8-10, 12, 13 and 17-22, and 24-26 only in the selection of the targeting sequence of the gRNA or grnas encoded by the polynucleotide, wherein the targeting sequence is a sequence having 15 to 30 nucleotides that is capable of hybridizing to the sequence of the target nucleic acid. In a specific embodiment, the targeting sequence of the polynucleotide is selected from the sequences shown in table 27. In some embodiments, the present disclosure provides polynucleotides of any of the embodiments described herein, wherein the polynucleotides have the configuration of the construct of any of fig. 24, 33-35, or 42.

In some embodiments, the AAV capsid protein is derived from serotype AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10. In some embodiments, the AAV capsid proteins and the 5 'and 3' itrs are derived from the same serotype AAV. In other embodiments, the AAV capsid proteins and the 5 'and 3' itrs are derived from AAV of different serotypes. In a specific embodiment, the 5 'and 3' itrs are derived from AAV1. In another specific embodiment, the 5 'and 3' itrs are derived from AAV2. In some embodiments, the polynucleotide comprises a sequence encoding a reference CasX of SEQ ID NOs 1-3. In other embodiments, the polynucleotide comprises a sequence encoding a CasX variant of any of the embodiments described herein, including CasX protein variants of SEQ ID NOs 49-160, 40208-40369, and 40828-40912, as shown in table 3, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity thereto. In some embodiments, the polynucleotide encodes a gRNA scaffold sequence selected from the group consisting of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in table 2, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% sequence identity thereto. In some embodiments, the gRNA comprises a targeting sequence of 15 to 30 nucleotides that is complementary to and thus hybridizes to a target nucleic acid in a cell and is linked to the 3' end of the gRNA scaffold sequence.

In other aspects, the disclosure relates to methods of producing polynucleotide sequences encoding AAV vectors of any of the embodiments described herein, and methods of expressing and recovering AAV. In general, these methods comprise generating polynucleotide sequences encoding the components of the expression cassette plus flanking ITRs of any of the embodiments described herein, and incorporating the encoding genes into expression vectors suitable for use in host cells. To produce the AAV vectors of any of the embodiments described herein, the methods comprise transforming a suitable host cell with an expression vector comprising a coding polynucleotide and trans-provided Rep and Cap sequences, and culturing the host cell under conditions that result in or allow production of the resulting AAV, which is recovered by the methods described herein or by standard purification methods known in the art. Rep and Cap can be provided as plasmids to packaging host cells. Alternatively, the host cell genome may comprise stably integrated Rep and Cap genes. Suitable packaging cell lines are known to those of ordinary skill in the art. See, e.g., www.cellbiolabs.com/aav-expression-and-packaging. Methods of purifying AAV produced by a host cell line are known to those of ordinary skill in the art and include, but are not limited to, affinity chromatography, gradient centrifugation, and ion exchange chromatography. The polynucleotides and AAV vectors of the present disclosure are prepared using standard recombinant techniques in molecular biology and the methods of the examples.

According to the present disclosure, nucleic acid sequences encoding reference CasX, casX variants, or grnas (or their complements) of any of the embodiments described herein are used to produce recombinant DNA molecules that direct expression in a suitable host cell. Several cloning strategies are suitable for performing the present disclosure, many of which are used to generate constructs comprising genes encoding the compositions of the present disclosure or their complements. In some embodiments, cloning strategies are used to generate genes encoding constructs comprising nucleotides encoding reference CasX, casX variants, or for transforming host cells to express grnas of the composition.

In some methods, constructs containing DNA sequences encoding components of AAV vectors and transgenes are first prepared. Exemplary methods of making such constructs are described in the examples. The construct is then used to generate an expression vector suitable for transforming a host packaging cell, such as a eukaryotic host cell, for expression and recovery of the AAV vector comprising the transgene. Eukaryotic host packaging cells may be selected from BHK cells, HEK293T cells, NS0 cells, SP2/0 cells, YO myeloma cells, P3X63 mouse myeloma cells, PER cells, per.c6 cells, hybridoma cells, NIH3T3 cells, COS cells, heLa cells, CHO cells, or other eukaryotic cells known in the art as suitable for the production of recombinant AAV. Many transfection techniques are known in the art; see, e.g., sambrook et al, (1989), "Molecular Cloning, a laboratory manual", cold spring harbor laboratory, new york. Particularly suitable transfection methods include calcium phosphate co-precipitation, direct microinjection into cultured cells, electroporation, liposome-mediated gene transfer, lipid-mediated transduction, and nucleic acid delivery using high-speed microparticles. Exemplary methods for producing expression vectors, transforming host cells, and expressing and recovering nucleic acids and AAV vectors are described in the examples.

Genes encoding AAV vectors may be prepared in one or more steps, synthesized entirely or by combination with enzymatic methods, such as restriction enzyme-mediated cloning, PCR, and overlap extension, including the methods described more fully in the examples. The methods disclosed herein can be used, for example, to ligate sequences of polynucleotides encoding various components of a desired sequence (e.g., ITR, casX and gRNA, promoters and auxiliary elements) to produce expression vectors.

In some embodiments, host cells transfected with the above AAV expression vectors are capable of providing AAV helper functions to replicate and encapsidate nucleotide sequences flanking AAV ITRs, thereby producing rAAV viral particles. AAV helper functions are typically AAV-derived coding sequences that can be expressed to provide AAV gene products that in turn function in a trans-form for productive AAV replication. AAV helper functions are used herein to complement the essential AAV functions deleted in an AAV expression vector. Thus, AAV helper functions include one or both of the major AAV ORFs (open reading frames), i.e., encoding rep and cap coding regions or functional homologs thereof. The helper functions may be introduced into the host cell and then expressed in the host cell using methods known to those skilled in the art. Typically, helper functions are provided by infecting host cells with an unrelated helper virus. In some embodiments, an ancillary function carrier is used to provide the ancillary function. Any of a number of suitable transcriptional and translational control elements (including constitutive and inducible promoters, transcriptional enhancer elements, transcriptional terminators, and the like) may be used in the expression vector, depending on the host/vector system used.

In some embodiments, the nucleotide sequence encoding a component of an AAV vector is codon optimized. This type of optimization may require mutations in the coding nucleotide sequence to mimic the codon bias of the intended host organism or cell, while encoding the same CasX protein or other protein components. Thus, the codons may be varied, but the encoded protein remains unchanged. For example, if the host cell is intended to be a human cell, a nucleotide sequence encoded by CasX that is optimized for human codons may be used. Genetic design may be performed using algorithms that optimize codon usage and amino acid composition suitable for use in host cells used in the production of AAV vectors. In one method of the present disclosure, a library of polynucleotides encoding components of a construct is generated and then assembled as described above. The resulting genes are then assembled and used to transform host cells and to produce and recover AAV vector compositions to evaluate their properties, as described herein. In some embodiments, as described more fully below, the nucleotide sequences encoding components of the AAV vector are engineered to remove CpG dinucleotides in order to reduce the immunogenicity of the components, while preserving their functional characteristics.

In some embodiments, the nucleotide sequence encoding the gRNA is operably linked to a regulatory element. In some embodiments, the nucleotide sequence encoding the CasX protein is operably linked to regulatory elements. In other cases, the nucleotides encoding CasX and gRNA are linked and operably linked to a single regulatory element. Exemplary auxiliary elements include transcription promoters, transcription enhancer elements, transcription termination signals, internal Ribosome Entry Sites (IRES) or P2A peptides that allow for translation of multiple genes from a single transcript, polyadenylation sequences that facilitate downstream transcription termination, sequences for optimizing translation initiation, and translation termination sequences. In some cases, the promoter is a constitutively active promoter. In some cases, the promoter is a regulatable promoter. In some cases, the promoter is an inducible promoter. In some cases, the promoter is a tissue-specific promoter. In some cases, the promoter is a cell type specific promoter. In some cases, transcription assistance elements (e.g., promoters) function in targeting cell types or targeted cell populations. For example, in some cases, the transcription assistance element may function in a eukaryotic cell (e.g., a packaging host cell for producing an AAV vector). In some cases, the helper element is a transcriptional activator that cooperates with the promoter to initiate transcription. By transcriptional activation, it is expected that transcription will increase 10-fold, 100-fold, more typically 1000-fold over basal levels in target cells.

Non-limiting examples of eukaryotic promoters (promoters that function in eukaryotic cells) include EF-1 alpha, EF-1 alpha core promoters, promoters from the immediate early phase of Cytomegalovirus (CMV), herpes Simplex Virus (HSV) thymidine kinase, early and late SV40, long Terminal Repeat (LTR) from retroviruses, and mouse metallothionein-I. Other non-limiting examples of eukaryotic promoters include the CMV promoter full length promoter, the minimal CMV promoter, the chicken beta-actin promoter, the RSV promoter, the HIV-Ltr promoter, the hPGK promoter, the HSV TK promoter, the Mini-TK promoter, the human synapsin I promoter conferring neuronal specific expression, the Mecp2 promoter selectively expressed in neurons, the minimal IL-2 promoter, the Rous sarcoma virus enhancer/promoter (RSV), the spleen focus forming virus Long Terminal Repeat (LTR) promoter, the SV40 enhancer, the TBG promoter from human thyroxine binding globulin (liver specific) promoter, the PGK promoter, the human ubiquitin C promoter, the UCOE promoter (HNRPA 2B1-CBX 3), the histone H2 promoter, the histone H3 promoter, the U1a1 microRNA promoter (226 nt), the U1B2 microRNA promoter (246 nt), the TTR minimal enhancer/26, the PDH 3 promoter, the PDH 3-PDH-dehydrogenase promoter and the PDH-PDH promoter. In some embodiments, the promoter operably linked to the sequence encoding the first and/or second gRNA is U6 (Kunkel, GR et al, U6 small nuclear RNA is transcribed by RNA polymerase III, proc Natl Acad Sci U S A, volume 83, 22: 8575 (1986)).

Non-limiting examples of pol II promoters suitable for use in AAV constructs of the present disclosure include, but are not limited to, polyubiquitin C (UBC), cytomegalovirus (CMV), simian Virus 40 (SV 40), chicken beta-actin promoter and rabbit beta-globin splice acceptor site fusion (CAG), chicken beta-actin promoter with cytomegalovirus enhancer (CB 7), PGK, jens Tonooe (JeT), GUSB, CBA hybrid (CBh), elongation factor-1 alpha (EF-1 alpha), beta-actin, rous Sarcoma Virus (RSV), silencing-prone Spleen Focus Forming Virus (SFFV), CMVd1 promoter, truncated human CMV (tCMVd 2), minimal CMV promoter, chicken beta-actin promoter with cytomegalovirus enhancer (CB 7), HSV TK promoter, mini-TK promoter, minimal IL-2 promoter, GRP94 promoter, super core 1, super core 35, sync 1, rpa 1, and Rp 1 promoters such as the like, hRpl30 and hRps 18), CMV53 promoter, minimal SV40 promoter, CMV53 promoter, SFCp promoter, pJB42CAT5 promoter, MLP promoter, EFS promoter, meP426 promoter, mecP2 promoter, MHCK7 promoter, beta-Glucuronidase (GUSB), CK7 promoter and CK8e promoter. In some embodiments, an AAV construct of the disclosure comprises a pol II promoter comprising a sequence as shown in table 8, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. In particular embodiments, the pol II promoter is EF-1 a, wherein the promoter enhances transfection efficiency, enhances transgene transcription or expression of the CRISPR nuclease, increases the proportion of expression positive clones, and increases the copy number of ionophores in long term culture. In another specific embodiment, the pol II promoter is JeT, wherein the promoter enhances transfection efficiency, enhances transgene transcription or expression of a CRISPR nuclease, increases the proportion of expression positive clones, and increases the copy number of the ionophore in long term culture. In some embodiments, the pol II promoter is a truncated form of the foregoing promoters. In some embodiments, the pol II promoter in an AAV construct of the disclosure has less than about 400 nucleotides, less than about 350 nucleotides, less than about 300 nucleotides, less than about 200 nucleotides, less than about 150 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 40 nucleotides. In some embodiments, the pol II promoter in an AAV construct of the disclosure has about 40 to about 585 nucleotides, about 100 to about 400 nucleotides, or about 150 to about 300 nucleotides. In some embodiments, an AAV construct of the disclosure comprises a polynucleic acid encoding the pol II promoter of any of the preceding embodiments of this paragraph and the promoter of table 8, and in some cases can be configured as shown in any of fig. 24, 33-35, or 42 relative to the other components of the construct.

In some embodiments, the AAV constructs of the disclosure comprise a pol II promoter with an attached intron, wherein the intron enhances the ability of the promoter to increase transfection efficiency, enhances transgene transcription or expression of a CRISPR nuclease, increases the proportion of expression positive clones, and increases the copy number of the ionophore in long term culture. Exemplary embodiments of such promoter-intron combinations are described in the examples.

Non-limiting examples of pol III promoters suitable for use in AAV constructs of the present disclosure include, but are not limited to, U6, mini U6, 7SK and H1 variants, biH1 (Bi-directional H1 promoter), biU6, bi7SK, biH1 (Bi-directional U6, 7SK and H1 promoters), gorilla U6, rhesus U6, human 7SK and human H1 promoters. In the foregoing embodiments, the pol III promoter enhances transcription of the gRNA encoded by the AAV. In some embodiments, an AAV construct of the disclosure comprises a pol III promoter comprising a sequence as shown in table 9, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. In some embodiments, the pol III promoter is a truncated form of the foregoing promoters. In some embodiments, the pol III promoter in an AAV construct of the disclosure has less than about 250 nucleotides, less than about 220 nucleotides, less than about 200 nucleotides, less than about 160 nucleotides, less than about 140 nucleotides, less than about 130 nucleotides, less than about 120 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 70 nucleotides. In some embodiments, the pol III promoter in an AAV construct of the disclosure has about 70 to about 245 nucleotides, about 100 to about 220 nucleotides, or about 120 to about 160 nucleotides. In some embodiments, an AAV construct of the disclosure comprises a polynucleic acid encoding the pol III promoter of any of the preceding embodiments of this paragraph and the promoter of table 9, and in some cases can be configured as shown in any of figures 24, 33-35, or 42 relative to the other components of the construct.

The selection of an appropriate promoter is well within the level of one of ordinary skill in the art, as the selection is relevant to controlling expression, e.g., for modifying a gene or other target nucleic acid. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also contain appropriate sequences for amplified expression. The expression vector may also comprise a nucleotide sequence encoding a protein tag (e.g., a 6xHis tag, a hemagglutinin tag, a fluorescent protein, etc.) that can be fused to the CasX protein, thereby producing a chimeric CasX protein for purification or detection.

In some embodiments, the present disclosure provides polynucleotide sequences encoding a gRNA and/or CasX protein operably linked to an inducible promoter, a constitutively active promoter, a spatially restricted promoter (i.e., transcriptional control elements, enhancers, tissue specific promoters, cell type specific promoters, etc.), or a temporally restricted promoter.

In certain embodiments, suitable promoters may be derived from viruses, and thus may be referred to as viral promoters, or they may be derived from any organism (including prokaryotic or eukaryotic organisms). Suitable promoters can be used to drive expression of any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter, the mouse mammary tumor virus Long Terminal Repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); herpes Simplex Virus (HSV) promoters, cytomegalovirus (CMV) promoters such as CMV immediate early promoter region (CMVIE), rous Sarcoma Virus (RSV) promoters, human U6 micronucleus promoter (U6), enhanced U6 promoters, human HI promoter (HI), pol II promoters, 7SK promoters, tRNA promoters, and the like. In some embodiments, the disclosure provides polynucleotide sequences in which two grnas of a transgene are operably linked to a single bi-directional promoter (e.g., a bi-directional H1 promoter or a bi-directional U6 promoter) disposed between the two encoded gRNA sequences, wherein the promoter is capable of initiating transcription of both gRNA sequences. In other embodiments, the disclosure provides AAV constructs comprising a promoter in a reverse orientation (i.e., 3 'to 5'). Exemplary reverse and bi-directional promoters are described in the examples and table 8 and are schematically depicted in fig. 24 and 34.

In some cases, the promoter is a spatially restricted promoter (i.e., a cell type specific promoter, a tissue specific promoter, etc.), such that in a multicellular organism, the promoter is active (i.e., "effective") in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcription assistance elements, control sequences, and the like. Any convenient spatially constrained promoter may be used, so long as the promoter functions in targeting a host cell (e.g., eukaryotic cells; prokaryotic cells).

In some cases, the promoter is a reversible promoter. Suitable reversible promoters (including reversibly inducible promoters) are known in the art. Such reversible promoters can be isolated and derived from a number of organisms, e.g., eukaryotes and prokaryotes. Modification of a reversible promoter derived from a first organism for use in a second organism (e.g., a first prokaryote and a second eukaryote, a first eukaryote and a second prokaryote, etc.) is well known in the art. Such reversible promoters and systems based on such reversible promoters but also comprising additional control proteins include, but are not limited to, alcohol-regulated promoters (e.g., alcohol dehydrogenase I (alcA) gene promoters, promoters responsive to alcohol transactivator (AlcR, etc.), tetracycline-regulated promoters (e.g., promoter systems comprising Tet activator, tetON, tetOFF, etc.), steroid-regulated promoters (e.g., rat glucocorticoid receptor promoter systems, human estrogen receptor promoter systems, retinoid promoter systems, thyroid promoter systems, ecdysone promoter systems, mifepristone promoter systems, etc.), metal-regulated promoters (e.g., metallothionein promoter systems, etc.), pathogenesis-related promoters (e.g., salicylic acid-regulated promoters, ethylene-regulated promoters, benzothiadiazole-regulated promoters, etc.), temperature-regulated promoters (e.g., heat shock-inducible promoters (e.g., HSP-70, HSP-90, soybean heat shock promoters, etc.), light-regulated promoters, etc.), and the like.

The recombinant expression vectors of the present disclosure may also comprise elements that facilitate the robust expression components of the present disclosure (e.g., casX or gRNA). For example, a recombinant expression vector used in an AAV construct of the present disclosure may comprise one or more of a polyadenylation signal (poly (a)), an intron sequence, or a post-transcriptional auxiliary element (PTRE), such as woodchuck hepatitis post-transcriptional auxiliary element (WPRE). Non-limiting examples of PTREs suitable for use in AAV constructs of the present disclosure include the sequences of table 12, or sequences having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. Exemplary poly (a) sequences suitable for inclusion in the expression vectors of the present disclosure include hGH poly (a) signal (short), HSV TK poly (a) signal, synthetic polyadenylation signal, SV40 poly (a) signal, SV40 late poly a signal, β -globin poly (a) short, and the like. Non-limiting examples of poly (a) signals suitable for use in AAV constructs of the present disclosure include the sequences of table 10, or sequences having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. Non-limiting examples of introns suitable for use in AAV constructs of the present disclosure include the sequences of table 17, or sequences having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto. One of ordinary skill in the art will be able to select appropriate elements to include in the recombinant expression vectors described herein.

Polynucleotides encoding the transgenic components can be cloned separately into AAV expression vectors. In some embodiments, the polynucleotide is a recombinant expression vector comprising a nucleotide sequence encoding a CasX protein. In other embodiments, the present disclosure provides a recombinant expression vector comprising a polynucleotide sequence encoding a CasX protein and a nucleotide sequence encoding a first gRNA and optionally a second gRNA. In some cases, the nucleotide sequence encoding the CasX protein variant and/or the nucleotide sequence encoding the gRNA are each operably linked to a promoter operable in the selected cell type. In other embodiments, the nucleotide sequence encoding the CasX protein variant and the nucleotide sequence encoding the gRNA are provided in separate vectors.

Nucleic acid sequences encoding the transgenic components are inserted into vectors by a variety of methods. Typically, DNA is inserted into the appropriate restriction endonuclease site using techniques known in the art. The vector component typically includes, but is not limited to, one or more of a signal sequence, an origin of replication, one or more marker genes, an enhancer element, a promoter, and a transcription termination sequence. Construction of suitable vectors containing one or more of these components employs standard ligation techniques known to the skilled artisan. Such techniques are well known in the art and well described in the scientific and patent literature. Various vectors are publicly available.

The recombinant expression vector may be delivered to the target host cell by a variety of methods, as described more fully below and in the examples. Such methods include, for example, viral infection, transfection, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI) -mediated transfection, DEAE-dextran-mediated transfection, liposome-mediated transfection, particle gun technology, nuclear transfection, electroporation, cell extrusion, calcium phosphate precipitation, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like. Many transfection techniques are known in the art; see, e.g., sambrook et al, (1989), "Molecular Cloning, a laboratory manual", cold spring harbor laboratory, new york. Packaging cells are commonly used to form viral particles; such cells include HEK293 cells or HEK293T cells (as well as other cells known in the art) that package adenovirus.

In some embodiments, host cells transfected with the above AAV expression vectors are capable of providing AAV helper functions to replicate and encapsidate nucleotide sequences flanking AAV ITRs, thereby producing rAAV viral particles. AAV helper functions are typically AAV-derived coding sequences that can be expressed to provide AAV gene products that in turn function in a trans-form for productive AAV replication. In some embodiments, packaging cells are transfected with a plasmid comprising AAV helper functions to complement the essential AAV functions deleted in the AAV expression vector. Thus, the AAV helper function plasmid comprises one or both of a primary AAV ORF (open reading frame) encoding the rep and cap coding regions or functional homologs thereof, and an adenovirus helper gene comprising the E2A, E4 and VA genes operably linked to a promoter. The helper functions may be introduced into the host cell and then expressed in the host cell using methods known to those skilled in the art. Typically, helper functions are provided by infecting host cells with an unrelated helper virus. In some embodiments, an ancillary function carrier is used to provide the ancillary function. Any of a number of suitable transcription and translation auxiliary elements (including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc.) may be used in the expression vector, depending on the host/vector system used.

VII application of

AAV systems provided herein can be used in methods of modifying target nucleic acid sequences in a variety of applications, including therapeutic, diagnostic, and research.

In the methods of modifying a target nucleic acid sequence in a cell described herein, the methods utilize any of the embodiments of the AAV systems described herein. In some cases, these methods knock down the expression of the mutant gene product. In other cases, these methods knock out

Expression of the mutated gene product. In other cases, these methods result in the expression of functional proteins of the gene product.

In some embodiments, the methods comprise contacting the target nucleic acid sequence with an AAV encoding a CasX protein and a guide nucleic acid comprising a targeting sequence, wherein the contacting results in modification of the target nucleic acid sequence by the CasX protein of the RNP. In some embodiments, the methods comprise introducing into a cell an AAV encoding a CasX protein and a gRNA, wherein the targeting sequence of the gRNA comprises a sequence complementary to a portion of a target nucleic acid, wherein the contacting results in modification of the target nucleic acid of the RNP. In some embodiments, the scaffold encoding the gRNA comprises a sequence selected from the group consisting of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in table 2, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity thereto, and the CasX protein encoded is a reference CasX protein SEQ ID NO 1, SEQ ID NO 2, or SEQ ID NO 3, or a CasX variant comprising a sequence selected from the group consisting of SEQ ID NOs 49-160, 40208-40369, and 40828-40912 as shown in table 3, or a sequence having at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity thereto.

In some embodiments, the modified target nucleic acid comprises a single strand break, resulting in a mutation, insertion, or deletion through the repair mechanism of the cell. In other embodiments, the modified target nucleic acid comprises a double strand break, resulting in a mutation, insertion or deletion through the repair mechanism of the cell. For example, the CasX: gRNA system encoded by AAV can introduce indels, such as frameshift mutations, into cells at or near the start of the gene. In other embodiments, the modified target nucleic acid of the cell has been modified by insertion of a donor template, wherein the gene comprising the target nucleic acid has been knocked down or knocked out.

In other embodiments, the method comprises contacting the target nucleic acid sequence with an AAV encoding a plurality (e.g., two or more) grnas targeting different or overlapping regions of a target nucleic acid having one or more mutations or replications. In the above, the resulting modification may be an insertion, deletion, substitution, duplication or inversion of one or more nucleotides as compared to the target nucleic acid sequence.

VIII method of treatment

The present disclosure provides methods of treating a disease in a subject in need thereof. In some embodiments, the methods of the present disclosure can prevent, treat, and/or ameliorate a disease in a subject by administering to the subject an AAV composition of the present disclosure. In some embodiments, the composition administered to the subject further comprises a pharmaceutically acceptable carrier, diluent, or excipient.

In some embodiments, the disclosure provides methods of treating a disease in a subject in need thereof, comprising modifying a target nucleic acid in a cell of the subject, the modification comprising administering to the subject a therapeutically effective dose of an AAV vector of any of the embodiments described herein, wherein the targeting sequence of the encoded gRNA has a sequence that hybridizes to the target nucleic acid, thereby resulting in modification of the target nucleic acid by the CasX protein.

In other embodiments, a method of treating a disease in a subject in need thereof comprises administering to the subject a therapeutically effective dose of an AAV vector of any of the embodiments described herein, wherein the targeting sequence of the encoded gRNA has a sequence that hybridizes to a target nucleic acid, and wherein the AAV further comprises a donor template comprising one or more mutations or a heterologous sequence inserted into or replacing the target nucleic acid sequence to knock down or knock out a gene comprising the target nucleic acid. In the above, the insertion of a donor template is used to disrupt the expression of the gene and the resulting gene product. In some embodiments of the foregoing methods, the donor DNA template ranges in size from 10 to 15,000 nucleotides. In other embodiments of the foregoing methods, the donor template ranges in size from 100 to 1,000 nucleotides. In some cases, the donor template is a single-stranded RNA or DNA template.

The modified cells of the subject to be treated may be eukaryotic cells selected from rodent cells, mouse cells, rat cells, primate cells, non-human primate cells, and human cells. In some embodiments, the eukaryotic cells of the subject being treated are human cells.

In some embodiments, the method comprises administering an AAV vector of embodiments described herein to a subject via an administration route selected from the group consisting of subcutaneous, intradermal, intraneural, intranodular, intramedullary, intramuscular, intravitreal, intrathecal, subarachnoid, intraventricular, intracapsular, intravenous, intralymphatic, intraocular, or intraperitoneal routes, wherein the administration method is injection, infusion, or implantation. In some embodiments of the method of treating a disease in a subject, the subject is selected from the group consisting of mice, rats, pigs, non-human primates, and humans. In a specific embodiment, the subject is a human.

A number of therapeutic strategies have been used to design compositions for use in methods of treating subjects suffering from a disease. In some embodiments, the invention provides methods of treating a subject having a disease, the method comprising administering to the subject an AAV vector of any of the embodiments disclosed herein using a therapeutically effective dose according to a treatment regimen comprising one or more consecutive doses. In some embodiments of this treatment regimen, a therapeutically effective dose of AAV vector is administered as a single dose. In other embodiments of this treatment regimen, the therapeutically effective dose is administered to the subject in two or more doses over a period of at least two weeks, or at least one month, or at least two months, or at least three months, or at least four months, or at least five months, or at least six months. In some embodiments of this treatment regimen, the effective dose is administered by a route selected from subcutaneous, intradermal, intraneural, intranodular, intramedullary, intramuscular, intravitreal, intrathecal, subarachnoid, intraventricular, intracapsular, intravenous, intralymphatic, intraocular, subretinal, intravitreal, or intraperitoneal routes, wherein the method of administration is injection, infusion, or implantation.

In some embodiments, administration of a therapeutically effective amount of an AAV vector to knock down or knock out expression of a gene having one or more mutations results in preventing or ameliorating a potential disease such that an improvement is observed in a subject, although the subject may still have the potential disease. In some embodiments, administration of a therapeutically effective amount of an AAV vector results in an improvement in at least one clinically relevant parameter of the disease. In some embodiments of the method of treatment, the subject is selected from the group consisting of mice, rats, pigs, dogs, non-human primates, and humans.

In some embodiments, the disclosure provides a composition of any of the AAV embodiments described herein for use as a medicament for treating a human in need thereof. In some embodiments, the medicament is administered to the subject according to a treatment regimen comprising one or more consecutive doses with a therapeutically effective dose.

AAV engineered to reduce immunogenicity and retain editing properties

AAV-associated pathogen-associated molecular patterns (PAMPs) that contribute to immune responses in mammalian hosts include: i) Ligands present on the rAAV viral capsid that bind to toll-like receptor 2 (TLR 2), a cell surface PRR on non-parenchymal cells in the liver; and ii) unmethylated CpG dinucleotides in viral DNA that bind TLR9, TLR9 being an endosomal PRR in plasmacytoid dendritic cells (pDC) and B cells (Faust, SM et al, cpG-amplified adeno-associated virus vectors evade immune detection, j.clinical invest., volume 123: page 2294 (2013)). In particular, cpG dinucleotide motifs (CpG PAMPs) in AAV vectors are immunostimulatory in that they are highly hypomethylated relative to mammalian CpG motifs that are highly methylated. Thus, reducing the frequency of unmethylated CpG in the AAV vector genome to a level below the threshold for activating human TLR9 is expected to reduce the immune response to an externally administered AAV-based biologic. Similarly, methylation of CpG PAMPs in AAV constructs is expected to similarly reduce immune responses to AAV-based biologicals.

In some embodiments, the disclosure provides AAV vectors in which one or more components of the transgene are codon optimized to deplete CpG dinucleotides by substituting homologous nucleotide sequences from mammalian species, wherein the one or more components substantially retain their functional properties after expression in the transduced cells; for example, the ability to drive expression of a CRISPR nuclease, the ability to drive expression of a gRNA, the ability to enhance expression of a CRISPR nuclease and/or a gRNA, and the ability to edit a target nucleic acid sequence. In some embodiments, the disclosure provides AAV vectors wherein one or more AAV transgene component sequences selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, an auxiliary element, and a poly (a) are codon optimized to deplete all or a portion of CpG dinucleotides, wherein the resulting AAV vector transgene is substantially free of CpG dinucleotides. In some embodiments, the disclosure provides AAV vectors wherein one or more AAV transgene component sequences selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, a poly (a), and an accessory element comprise less than about 10%, less than about 5%, or less than about 1% CpG dinucleotides. In some embodiments, the disclosure provides AAV vectors wherein one or more AAV transgene component sequences selected from the group consisting of a 5' ITR, a 3ITR, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, and a poly (a) are free of CpG dinucleotides. In some embodiments, the disclosure provides AAV vectors wherein the transgene comprises less than about 10%, less than about 5%, or less than about 1% CpG dinucleotides. In some embodiments, the disclosure provides AAV vectors wherein the one or more AAV component sequences that are codon optimized for the depletion of CpG dinucleotides are selected from the group consisting of the sequences shown in table 25, 41045-41055, or sequences having at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity thereto. In some embodiments, the disclosure provides AAV vectors having one or more components of a transgene codon optimized for the depletion of CpG dinucleotides, wherein the expressed CRISPR nucleases and grnas retain at least about 60%, at least about 70%, at least about 80% or at least about 90% of the target nucleic acid editing potential when assayed under comparable conditions in an in vitro assay, as compared to AAV vectors in which the transgene has not been codon optimized for the depletion of CpG dinucleotides. In a particular embodiment, the present disclosure provides AAV vectors wherein the one or more AAV component sequences maintaining editing potential that are codon optimized for the depletion of CpG dinucleotides are selected from the group consisting of the sequences of SEQ ID NOs 41045-41055, or sequences having at least about 80%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% sequence identity thereto, as shown in table 25.

As an improved feature, embodiments of AAV vectors comprising one or more components of a transgene codon optimized for the depletion of CpG dinucleotides have lower potential to induce an immune response, whether in vivo (when administered to a subject) or in vitro mammalian cell assays designed to detect inflammatory response markers. In some embodiments, administration of a therapeutically effective dose of an AAV vector comprising one or more components of a transgene that is codon optimized for consumption of CpG dinucleotides to a subject results in a reduced immune response compared to the immune response of a comparable AAV vector in which the transgene has not been codon optimized for consumption of CpG dinucleotides, wherein the reduced response is determined by measuring one or more parameters, such as production of antibodies or delayed-type hypersensitivity to AAV components, or inflammatory cytokines and markers such as, but not limited to, TLR9, interleukin-1 (IL-1), IL-6, IL-12, IL-18, tumor necrosis factor α (TNF- α), interferon γ (ifny), and granulocyte-macrophage colony stimulating factor (GM-CSF). In some embodiments, an AAV vector comprising one or more components of a transgene that are substantially free of CpG dinucleotides causes a reduction in production of one or more inflammatory markers selected from TLR9, interleukin-1 (IL-1), IL-6, IL-12, IL-18, tumor necrosis factor alpha (TNF-alpha), interferon gamma (ifnγ), and granulocyte-macrophage colony stimulating factor (GM-CSF) of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 80%, or at least about 90% when assayed in a cell-based in vitro assay using cells known in the art to be suitable for such assays (e.g., monocytes, macrophages, T-cells, B-cells, etc.), as compared to a comparable AAV that is not depleted of CpG. In a specific embodiment, an AAV vector comprising one or more components of a transgene that is codon optimized for CpG dinucleotide depletion exhibits at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 80%, or at least about 90% reduced activation of TLR9 in hNPC in an in vitro assay, as compared to a comparable AAV that is not CpG depleted.

X, kit and article of manufacture

In other embodiments, provided herein are kits comprising an AAV vector of any of the embodiments of the disclosure and a suitable container (e.g., tube, vial, or plate).

In some embodiments, the kit further comprises a buffer, a nuclease inhibitor, a protease inhibitor, a liposome, a therapeutic agent, a label visualization agent, or any combination of the foregoing. In some embodiments, the kit further comprises a pharmaceutically acceptable carrier, diluent, or excipient.

In some embodiments, the kit includes suitable control compositions for use in genetic modification applications and instructions for use.

XI. Exemplified embodiment

The group of embodiments listed below is included for illustrative purposes and is not intended to limit the scope of the present invention.

Group I：

The embodiments of group I relate to the tables provided in U.S. provisional application 63/123,112 and the sequence listing filed on day 9 of month 12 in 2020 with U.S. provisional application 63/123,112.

Embodiment I-1: a polynucleotide comprising

a. A first adeno-associated virus (AAV) Inverted Terminal Repeat (ITR) sequence;

b. a second AAV ITR sequence;

c. A first promoter sequence;

d. a sequence encoding a CRISPR protein;

e. a sequence encoding at least a first guide RNA (gRNA); and optionally

f. At least one sequence of auxiliary elements.

Embodiment I-2: the polynucleotide of embodiment I-1 wherein the CRISPR protein sequence and the sequence encoding at least the first gRNA are less than about 3100, less than about 3090, less than about 3080, less than about 3070, less than about 3060, less than about 3050, or less than about 3040 nucleotides in length.

Embodiment I-3: the polynucleotide of embodiment I-1 or I-2, wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment I-4: the polynucleotide according to embodiment I-1 or I-2 wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than 1314 nucleotides.

Embodiment I-5: the polynucleotide according to embodiment I-1 or I-2, wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of more than 1381 nucleotides.

Embodiment I-6: the polynucleotide of any one of the preceding embodiments, wherein the first promoter sequence and the sequence encoding the CRISPR protein are operably linked.

Embodiment I-7: the polynucleotide of any one of the preceding embodiments, wherein the sequence encoding the CRISPR protein and at least the first guide RNA is operably linked to a first promoter.

Embodiment I-8: the polynucleotide of any one of the preceding embodiments, wherein at least one auxiliary element is operably linked to a CRISPR protein.

Embodiment I-9: the polynucleotide according to any one of embodiments I-1 to I-6, further comprising a second promoter.

Embodiment I-10: the polynucleotide of embodiments I-9 wherein the second promoter sequence and the sequence encoding the gRNA are operably linked.

Embodiment I-11: the polynucleotide of embodiment I-9 or I-10, wherein the combined length of the sequences of the first promoter, the second promoter, and the at least one auxiliary element is greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment I-12: the polynucleotide of embodiment I-9 or I-10 wherein the combined length of the sequences of the first promoter, the second promoter and the at least one auxiliary element is greater than 1314 nucleotides.

Embodiment I-13: the polynucleotide according to embodiment I-9 or I-10, wherein the combined length of the sequences of the first promoter, the second promoter and the at least one auxiliary element is greater than 1381 nucleotides.

Embodiment I-14: the polynucleotide according to any one of embodiments I-1 to I-13 comprising two or more auxiliary elements.

Embodiment I-15: the polynucleotide of embodiments I-14, wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment I-16: the polynucleotide of embodiment I-14 wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1314 nucleotides.

Embodiment I-17: the polynucleotide of embodiments I-14 wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1381 nucleotide.

Embodiment I-18: the polynucleotide according to any one of embodiments I-1 to I-17, wherein the polynucleotide comprises a second promoter, wherein at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34% or at least 35% or more of the length of the polynucleotide sequence comprises the sequences of the first and second promoters and the at least one auxiliary element in combination length.

Embodiment I-19: the polynucleotide according to any one of the preceding embodiments, wherein the at least one auxiliary element is selected from the group consisting of: poly (a) signals, gene enhancer elements, introns, post-transcriptional regulatory elements, nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, third promoters, second guide RNAs, stimulators of CRISPR-mediated homology directed repair, activators or repressors of transcription, and self-cleaving sequences.

Embodiment I-20: the polynucleotide of any one of the preceding embodiments, wherein the helper element enhances the expression, binding, activity or performance of the CRISPR protein compared to the CRISPR protein in the absence of the helper element.

Embodiment I-21: the polynucleotide of embodiments I-20, wherein the enhanced property is an increase in editing of the target nucleic acid in an in vitro assay of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 1500%, at least about 200%, or at least about 300%.

Embodiment I-22: the polynucleotide of any one of the preceding embodiments, wherein the CRISPR protein is a class 2 CRISPR protein.

Embodiment I-23: the polynucleotide of embodiment I-22 wherein the CRISPR protein is a class 2V CRISPR protein.

Embodiment I-24: the polynucleotide of embodiment I-23 wherein the class 2V CRISPR protein is CasX.

Embodiment I-25: the polynucleotide according to embodiments I-24, wherein CasX comprises a sequence selected from the group consisting of SEQ ID NOs 1-3 and 49-160 as set forth in Table 3, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment I-26: the polynucleotide according to embodiment I-24 wherein CasX comprises a sequence selected from the sequences of SEQ ID NOS 1-3 and 49-160 as shown in Table 3.

Embodiment I-27: the polynucleotide according to any one of the preceding embodiments, wherein the first gRNA comprises a sequence selected from the sequences of SEQ ID NOs 2101-2285 as set forth in table 2, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiment I-28: the polynucleotide according to any one of the preceding embodiments, wherein the first gRNA comprises a sequence selected from the sequences of SEQ ID NOs 2101-2285 as shown in table 2.

Embodiment I-29: the polynucleotide of embodiments I-28, wherein the first gRNA comprises a targeting sequence that is complementary to the target nucleic acid sequence, wherein the targeting sequence has at least 15 to 20 nucleotides.

Embodiment I-30: the polynucleotide according to any one of embodiments I-19 to I-29, wherein the second gRNA comprises a sequence selected from the sequences of SEQ ID NOS 2101-2285, as shown in Table 2.

Embodiment I-31: the polynucleotide of embodiment I-30, wherein the second gRNA comprises a targeting sequence complementary to a target nucleic acid sequence of a target nucleic acid different from embodiment I-28, wherein the targeting sequence has at least 15 to 20 nucleotides.

Embodiment I-32: the polynucleotide according to any one of the preceding embodiments comprising the sequences of tables 4, 5, 6, 7, 9, 10 and 12, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment I-33: the polynucleotide according to any one of embodiments I-1 to I-31 comprising the sequences of tables 4, 5, 6, 7, 9, 10 and 12.

Embodiment I-34: the polynucleotide according to any one of the preceding embodiments, wherein the auxiliary element is a post-transcriptional regulatory element (PTRE) selected from the group consisting of: the cytomegalovirus immediate/early intron a, hepatitis b virus PRE (HPRE), woodchuck hepatitis virus PRE (WPRE), and the 5' untranslated region (UTR) of human heat shock protein 70mRNA (Hsp 70).

Embodiment I-35: the polynucleotide of any one of the preceding embodiments, wherein the first promoter sequence has at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, or at least about 800 nucleotides.

Embodiment I-36: the polynucleotide according to any one of embodiments I-9 to I-35, wherein the second promoter sequence has at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, or at least about 800 nucleotides.

Embodiment I-37: the polynucleotide according to any one of the preceding embodiments, wherein the polynucleotide has the configuration of the construct of figure 15, figure 21 or figure 22.

Embodiment I-38: the polynucleotide of any one of the preceding embodiments, wherein the 5 'and 3' itrs are derived from serotypes AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

Embodiment I-39: a recombinant adeno-associated virus (rAAV), the recombinant adeno-associated virus comprising: a) AAV capsid protein and b) a polynucleotide according to any one of embodiments I-1 to I-38.

Embodiment I-40: the rAAV of embodiments I-39, wherein the AAV capsid protein is derived from serotype AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

Embodiment I-41: the rAAV of embodiment I-40, wherein the AAV capsid proteins and the 5 'and 3' itrs are derived from the same serotype AAV.

Embodiment I-42: the rAAV of embodiment I-40, wherein the AAV capsid proteins and the 5 'and 3' itrs are derived from AAV of different serotypes.

Embodiment I-43: a pharmaceutical composition comprising a rAAV of any one of embodiments I-39 to I-42 and a pharmaceutically acceptable carrier, diluent, or excipient.

Embodiment I-44: a method of modifying a target nucleic acid in a population of mammalian cells, the method comprising contacting a plurality of cells with an effective amount of the rAAV of any one of embodiments I-39 to I-42 or the pharmaceutical composition of embodiment I-43, wherein the target nucleic acid of the cells targeted by the gRNA is modified by a CRISPR protein.

Embodiment I-45: the method according to embodiments I-44, wherein the modification comprises introducing one or more nucleotide insertions, deletions, substitutions, replications or inversions in the target nucleic acid of the cells of the population.

Embodiment I-46: a method of making a rAAV vector, the method comprising:

i) Providing a population of cells; and is also provided with

ii) transfecting the population of cells with a vector comprising the polynucleotide of any one of embodiments I-1 to I-38.

Embodiment I-47: the method of embodiments I-46, wherein the population of cells expresses an AAV rep gene and an AAV cap gene.

Embodiment I-48: the method of embodiments I-46, further comprising transfecting the cell with one or more vectors encoding an AAV rep gene and an AAV cap gene.

Embodiment I-49: the method of any one of embodiments I-46 to I-48, further comprising recovering the rAAV vector.

Group II：

The embodiment of group II relates to the tables provided in U.S. provisional application 63/235,638 and the sequence listing filed on day 20 8 of 2021 with U.S. provisional application 63/235,638.

Embodiment II-1: a polynucleotide comprising

b. a second AAV ITR sequence;

c. a first promoter sequence;

d. a sequence encoding a CRISPR protein;

e. a sequence encoding at least a first guide RNA (gRNA); and

f. optionally at least one auxiliary element sequence.

Embodiment II-2: the polynucleotide of embodiment II-1 wherein the sequence encoding the CRISPR protein and the sequence encoding the at least first gRNA are less than about 3100, less than about 3090, less than about 3080, less than about 3070, less than about 3060, less than about 3050, or less than about 3040 nucleotides in length.

Embodiment II-3: the polynucleotide of embodiment II-1 or II-2 wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment II-4: the polynucleotide of embodiment II-1 or II-2 wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than 1314 nucleotides.

Embodiment II-5: the polynucleotide of embodiment II-1 or II-2 wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than 1381 nucleotides.

Embodiment II-6: the polynucleotide of any one of the preceding embodiments, wherein the first promoter sequence and the sequence encoding a CRISPR protein are operably linked.

Embodiment II-7: the polynucleotide according to embodiment II-6 wherein the first promoter is a pol II promoter.

Embodiment II-8: the polynucleotide of embodiment II-6 or II-7, wherein the promoter is selected from the group consisting of polyubiquitin C (UBC), cytomegalovirus (CMV), simian virus 40 (SV 40), chicken β -actin promoter and rabbit β -globin splice acceptor site fusion (CAG), chicken β -actin promoter with cytomegalovirus enhancer (CB 7), PGK, jens Tornoe (JeT), GUSB, CBA hybrid (CBh), elongation factor-1A (EF-1A), β -actin, rous Sarcoma Virus (RSV), silenced Spleen Focus Forming Virus (SFFV), CMVd1 promoter, truncated human CMV (tcmv2), minimal CMV promoter, chicken β -actin promoter, TK promoter, mini-TK promoter, minimal IL-2 promoter, GRP94 promoter, super core promoter 1, super core promoter 2, MLC, MCK, GRK protein promoter, rho, CAR protein promoter, U b protein promoter, U1A, synopsis, CAT 1, and 8, mch rpp, rpp 5, rpl, and 8, rpl, 8, and 8.

Embodiment II-9: the polynucleotide of embodiment II-8, wherein the promoter is UBC, CMV, SV, CAG, CB7, PGK, jeT, GUSB, CB, EF-1 a, β -actin, RSV, SFFV, CMVd1, tcmdvd 2, minimal CMV, chicken β -actin, HSV TK, mini-TK, minimal IL-2, GRP94, super core promoter 1, super core promoter 2, MLC, MCK, GRK1 protein Rho, CAR protein, hSyn, U1A r, ribosomal Rpl and Rps (e.g., hRpl30 and hRps 18), CMV53, SV40 promoter, CMV53, SFCp, pJB42CAT5, MLP, EFS, meP426, mecP2, MHCK7, (GUSB, CK7, or truncated variants of CK8e promoters.

Embodiment II-10: the polynucleotide of embodiment II-8 or II-9 wherein the promoter has less than about 400 nucleotides, less than about 350 nucleotides, less than about 300 nucleotides, less than about 200 nucleotides, less than about 150 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 40 nucleotides.

Embodiment II-11: the polynucleotide of embodiment II-8 or II-9 wherein the promoter has from about 40 to about 585 nucleotides, from about 100 to about 400 nucleotides, or from about 150 to about 300 nucleotides.

Embodiment II-12: the polynucleotide according to any one of the preceding embodiments, wherein the promoter is selected from the group consisting of SEQ ID NOs 40370-40400 as shown in table 4, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment II-13: the polynucleotide of any one of the preceding embodiments, wherein the at least one auxiliary element is operably linked to the CRISPR protein.

Embodiment II-14: the polynucleotide of any one of embodiments II-1 to II-6, further comprising a second promoter.

Embodiment II-15: the polynucleotide of embodiments II-14 wherein the second promoter sequence and the sequence encoding the gRNA are operably linked.

Embodiment II-16: the polynucleotide according to embodiment II-14 or II-15 wherein the second promoter is a pol III promoter.

Embodiment II-17: the polynucleotide of any one of embodiments II-10 to II-12, wherein the second promoter is selected from the group consisting of U6, mini U61, mini U62, mini U63, biH1 (bi-directional H1 promoter), biU6 (bi-directional U6 promoter), gorilla U6, rhesus U6, human 7sk, and human H1 promoter.

Embodiment II-18: the polynucleotide of embodiments II-17 wherein the promoter is a truncated variant of the U6, mini U61, mini U62, mini U63, biH1, biU6, gorilla U6, rhesus U6, human 7sk or human H1 promoter.

Embodiment II-19: the polynucleotide of embodiment II-17 or II-18 wherein the promoter has less than about 250 nucleotides, less than about 220 nucleotides, less than about 200 nucleotides, less than about 160 nucleotides, less than about 140 nucleotides, less than about 130 nucleotides, less than about 120 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 70 nucleotides.

Embodiment II-20: the polynucleotide of embodiment II-17 or II-18 wherein the promoter has from about 70 to about 245 nucleotides, from about 100 to about 220 nucleotides, or from about 120 to about 160 nucleotides.

Embodiment II-21: the polynucleotide of any one of embodiments II-14 to II-20, wherein the promoter is selected from the group consisting of SEQ ID NOs 40401-40400 as set forth in table 5, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment II-22: the polynucleotide of any one of embodiments II-14 to II-21, wherein the second promoter enhances transcription of the gRNA.

Embodiment II-23: the polynucleotide of any one of embodiments II-14 to II-22, wherein the combined length of the sequences of the first promoter and the second promoter is greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment II-24: the polynucleotide of any one of embodiments II-14 to II-23, wherein the combined length of the sequences of the first promoter, the second promoter, and the at least one auxiliary element is greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment II-25: the polynucleotide of any one of embodiments II-14 to II-24, wherein the combined length of the sequences of the first promoter, the second promoter, and the at least one auxiliary element is greater than 1314 nucleotides.

Embodiment II-26: the polynucleotide of any one of embodiments II-14 to II-24, wherein the combined length of the sequences of the first promoter, the second promoter, and the at least one auxiliary element is greater than 1381 nucleotide.

Embodiment II-27: the polynucleotide according to any one of the preceding embodiments comprising two or more accessory elements.

Embodiment II-28: the polynucleotide of embodiments II-27, wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or greater than at least about 1900 nucleotides.

Embodiment II-29: the polynucleotide of embodiments II-27 wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1314 nucleotides.

Embodiment II-30: the polynucleotide of embodiments II-27 wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1381 nucleotide.

Embodiment II-31: the polynucleotide of any one of embodiments II-14 to II-30, wherein at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34% or at least 35% or more of the length of the polynucleotide sequence comprises the sequences of the first and second promoters and the at least one auxiliary element in a combined length.

Embodiment II-32: the polynucleotide according to any one of the preceding embodiments, wherein the auxiliary element is selected from the group consisting of: poly (a) signals, gene enhancer elements, introns, post-transcriptional regulatory elements (PTRE), nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, third promoters, second guide RNAs, stimulators of CRISPR-mediated homology directed repair, or transcriptional activators or repressors.

Embodiment II-33: the polynucleotide of any one of the preceding embodiments, wherein the helper element enhances transcription, transcription termination, expression, binding, activity or performance of the CRISPR protein as compared to an otherwise identical polynucleotide lacking the helper element.

Embodiment II-34: the polynucleotide of embodiments II-33 wherein the enhanced property is an increase in target nucleic acid by editing of the CRISPR protein in an in vitro assay of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 150%, at least about 200%, or at least about 300%.

Embodiment II-35: the polynucleotide of any one of the preceding embodiments, wherein the CRISPR protein is a class 2 CRISPR protein.

Embodiment II-36: the polynucleotide of embodiments II-35 wherein the CRISPR protein is a class 2V CRISPR protein.

Embodiment II-37: the polynucleotide of embodiments II-36 wherein the class 2V CRISPR protein is CasX.

Embodiment II-38: the polynucleotide of embodiments II-37 wherein the encoded CasX comprises a sequence selected from the group consisting of SEQ ID NOs 1-3, 49-160 and 40208-40369 as shown in table 3 and SEQ ID NOs 40808-40827 as shown in table 21, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment II-39: the polynucleotide according to embodiments II-37 wherein the encoded CasX comprises a sequence selected from the group consisting of SEQ ID NOS 1-3, 49-160 and 40208-40369 as shown in Table 3 and SEQ ID NOS 40808-40827 as shown in Table 21.

Embodiment II-40: the polynucleotide of any one of embodiments II-35 to II-39, wherein the polynucleotide encodes one or more NLS linked to a sequence encoding CasX.

Embodiment II-41: the polynucleotide of embodiments II-40 wherein the sequence encoding one or more NLS is positioned at or near the 5' end of the sequence encoding CasX protein.

Embodiment II-42: the polynucleotide of embodiment II-40 or II-41 wherein the sequence encoding one or more NLS is positioned at or near the 3' end of the sequence encoding CasX protein.

Embodiment II-43: the polynucleotide of embodiment II-41 or II-42, wherein the polynucleotide encodes at least two NLSs, wherein the sequences encoding the at least two NLSs are positioned at or near the 5 'and 3' ends of the sequence encoding the CasX protein.

Embodiment II-44: the polynucleotide of any one of embodiments II-40 to II-43, wherein the one or more encoded NLS is selected from the group consisting of: PKKKRKV (SEQ ID NO: 196), KRPAATKKAGQAKKKK (SEQ ID NO: 197), PAAKRVKLD (SEQ ID NO: 248), RQRRNELKRSP (SEQ ID NO: 161), NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 162), RMRIZFKKKKGTARDRRVERRVERVELRKAKKKKKKRNV (SEQ ID NO: 163), VSRKRPRP (SEQ ID NO: 164), PPKKARED (SEQ ID NO: 165), PQPKKKKPL (SEQ ID NO: 166), SALIKKKKKMAP (SEQ ID NO: 167), DRLRR (SEQ ID NO: 168), PKQKKRK (SEQ ID NO: 169), RKLKKKIKKL (SEQ ID NO: 170), REKKKFLKRR (SEQ ID NO: 171), KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 172), RKCLQAGMNLEARKTKK (SEQ ID NO: 173), PRPRKIPR (SEQ ID NO: 174), PPRKKRVV (SEQ ID NO: 175), NLSKKKKRKREK (SEQ ID NO: 176), RRPSRPFRKP (SEQ ID NO: 177), KRPSPSS (SEQ ID NO: 178), KRGINDRNFWRGENERKTR (SEQ ID NO: 179), PRPPKMARYDN (SEQ ID NO: 180), KRRAF (SEQ ID NO: 3868), 5635 (SEQ ID NO: 192), KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 172), RKCLQAGMNLEARKTKK (SEQ ID NO: 192), 6575 (SEQ ID NO: 192), PPRKKRPKKKKKRNV (SEQ ID NO: 35 (SEQ ID NO: 17), roll-35 (SEQ ID NO: 35), roll-75), roll-1, roll-35 (SEQ ID NO: 35), roll-1, roll-stand, 35 (SEQ ID NO: 35) PKRGRGRPKRGRGR (SEQ ID NO: 193), PKKKRKVPPPPKKKRKV (SEQ ID NO: 195), PAKRARRGYKC (SEQ ID NO: 40408), KLGPRKATGRW (SEQ ID NO: 40809), PRRRKEE (SEQ ID NO: 40810), PYRGRKE (SEQ ID NO: 40811), PLRKRPRR (SEQ ID NO: 40812), PLRKRPRRGSPLRKRPRR (SEQ ID NO: 40813), PAAKRVKLDGGKRTADGSEFESPKKKRKV (SEQ ID NO: 40814), PAAKRVKLDGGKRTADGSEFESPKKKRKVGIHGVPAA (SEQ ID NO: 40815), PAAKRVKLDGGKRTADGSEFESPKKKRKVAEAAAKEAAAKEAAAKA (SEQ ID NO: 40816), PAAKRVKLDGGKRTADGSEFESPKKKRKVPG (SEQ ID NO: 40452), KRKGSPERGERKRHW, KRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 40817) and PKKKRKVGGSKRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 40818), wherein one or more NLS are linked to a CasX variant or an adjacent NLS having a linker peptide, wherein the linker peptide is selected from the group consisting of (G) n (SEQ ID NO: 40201), (GS) n (SEQ ID NO: 40202), (GSGGS) n (SEQ ID NO: 208), (GGSGGS) n (SEQ ID NO: 209), (GGGGGGS) n (SEQ ID NO: 210), SG (SEQ ID NO: 40817) and PKKKRKVGGSKRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 52216), GPGGPPG (GSGG ID NO: 215), GPIG (GSG (GSGG) n (GSG) and GP ID NO: 215) PPP (GGGS) n (SEQ ID NO: 40203), (GGGS) nPPP (SEQ ID NO: 40204), AEAAAKEAAAKEAAAKA (SEQ ID NO: 40205) and TPPKTKRKVEFE (SEQ ID NO: 40206), wherein n is 1 to 5.

Embodiment II-45: the polynucleotide of any one of embodiments II-40 to II-44, wherein one or more encoded NLS is selected from SEQ ID NOs 40443-40501 as shown in table 11 and table 12, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiment II-46: the polynucleotide of any one of embodiments II-40 to II-43, wherein the one or more encoded NLS is selected from the group consisting of: SEQ ID NOS 40443-40501 as shown in Table 11 and Table 12.

Embodiment II-47: the polynucleotide according to any one of the preceding embodiments, wherein the first gRNA comprises a sequence selected from SEQ ID NOs 2101-2285 and 39981-40026 as shown in table 2, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiment II-48: the polynucleotide according to any one of the preceding embodiments, wherein the first gRNA comprises a sequence selected from the group consisting of SEQ ID NOs 2101-2285 and 39981-40026 as shown in table 2.

Embodiment II-49: the polynucleotide of embodiments II-48, wherein the first gRNA comprises a targeting sequence that is complementary to a target nucleic acid sequence, wherein the targeting sequence has at least 15 to 30 nucleotides.

Embodiment II-50: the polynucleotide of embodiments II-49 wherein the targeting sequence has 18, 19, or 20 nucleotides.

Embodiment II-51: the polynucleotide of any one of embodiments II-32 to II-50, wherein the second gRNA comprises a sequence selected from SEQ ID NOs 2101-2285 and 39981-40026 as shown in table 2, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiment II-52: the polynucleotide according to any one of embodiments II-32 to II-51, wherein the second gRNA comprises a sequence selected from the group consisting of SEQ ID NOs 2101-2285 and 39981-40026 as shown in table 2.

Embodiment II-53: the polynucleotide of embodiment II-51 or II-52, wherein the second gRNA comprises a targeting sequence complementary to a target nucleic acid sequence of a target nucleic acid that is different from embodiment II-49 or II-50, wherein the targeting sequence has at least 15 to 30 nucleotides.

Embodiment II-54: the polynucleotide of embodiments II-53 wherein the targeting sequence has 18, 19, or 20 nucleotides.

Embodiment II-55: the polynucleotide according to any one of the preceding embodiments, wherein the helper element is a post-transcriptional regulatory element (PTRE) selected from the group consisting of cytomegalovirus immediate/early intron a, hepatitis b virus PRE (HPRE), woodchuck hepatitis virus PRE (WPRE), and the 5' untranslated region (UTR) of human heat shock protein 70mRNA (Hsp 70).

Embodiment II-56: the polynucleotide of any one of embodiments II-1 to II-55, wherein the auxiliary element is a PTRE selected from SEQ ID NOs 40431-40442 as shown in table 8, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiment II-57: the polynucleotide of any one of the preceding embodiments, wherein the 5 'and 3' itrs are derived from serotypes AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

Embodiment II-58: the polynucleotide of any one of the preceding embodiments, wherein the 5 'and 3' itrs are derived from serotype AAV2.

Embodiment II-59: the polynucleotide according to any one of the preceding embodiments, comprising one or more sequences selected from the group consisting of: the sequences of tables 4, 5, 6, 8, 9, 13-16 and 20, or sequences having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment II-60: the polynucleotide according to any one of the preceding embodiments comprising one or more sequences selected from the sequences of tables 4, 5, 6, 8, 9, 13-16 and 20.

Embodiment II-61: the polynucleotide according to any one of the preceding embodiments, wherein the polynucleotide has the configuration of a construct depicted in any one of figures 24, 33-35 or 42.

Embodiment II-62: a recombinant adeno-associated virus (rAAV), the recombinant adeno-associated virus comprising: a) An AAV capsid protein, and b) a polynucleotide according to any one of embodiments II-1 to II-58.

Embodiment II-63: the rAAV of embodiments II-62, wherein the AAV capsid protein is derived from serotype AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

Embodiment II-64: the rAAV of embodiments II-63, wherein the AAV capsid proteins and the 5 'and 3' itrs are derived from the same serotype AAV.

Embodiment II-65: the rAAV of embodiments II-63, wherein the AAV capsid proteins and the 5 'and 3' itrs are derived from AAV of different serotypes.

Embodiment II-66: the rAAV of embodiments II-65, wherein the 5 'and 3' itrs are derived from AAV serotype 2.

Embodiment II-67: a pharmaceutical composition comprising a rAAV according to any one of embodiments II-62 and a pharmaceutically acceptable carrier, diluent or excipient.

Embodiment II-68: a method of modifying a target nucleic acid in a population of mammalian cells, the method comprising contacting a plurality of cells with an effective amount of the rAAV of any one of embodiments II-62 to 66 or the pharmaceutical composition of embodiment II-67, wherein the target nucleic acid of the cell targeted by the gRNA is modified by a CRISPR protein.

Embodiment II-69: the method according to embodiments II-68, wherein the modification comprises introducing an insertion, deletion, substitution, duplication, or inversion of one or more nucleotides in the BCL11A gene of the cells of the population.

Embodiment II-71: the method of embodiment II-68 or II-69, wherein the rAAV is present in an amount of at least about 1X 10⁵ vg/kg to about 1X 10¹⁶ vg/kg, at least about 1X 10⁶ vg/kg to about 1X 10¹⁵ vg/kg or at least about 1X 10⁷ vg/kg to about 1X 10¹⁴ A dose of vg/kg is administered to a subject.

Embodiment II-72: the method of any one of embodiments II-68 to II-71, wherein the rAAV is administered to the subject by an administration route selected from subcutaneous, intradermal, intraneural, intranodular, intramedullary, intramuscular, intravitreal, intrathecal, subarachnoid, intraventricular, intracapsular, intravenous, intralymphatic, intraocular, or intraperitoneal routes, and wherein the administration method is injection, infusion, or implantation.

Embodiment II-73: the method of any one of embodiments II-68 to II-72, wherein the subject is selected from the group consisting of a mouse, a rat, a pig, and a non-human primate.

Embodiment II-74: the method according to any one of embodiments II-68 to II-72, wherein the subject is a human.

Embodiment II-75: a method of making a rAAV vector, the method comprising:

a. providing a population of packaging cells; and

b. transfecting a population of cells with:

i) A vector comprising a polynucleotide according to any one of embodiments II-1 to II-57;

ii) a vector comprising the aap (assembly) gene; and

iii) A vector comprising rep and cap genomes.

Embodiments II-76: the method of embodiments II-70, further comprising recovering the rAAV vector.

Group III:

the embodiments of group III refer to the tables provided in this specification and the sequence listing filed therewith.

Embodiment III-1: a polynucleotide comprising the following component sequences:

a. a first AAV Inverted Terminal Repeat (ITR) sequence;

b. a second AAV ITR sequence;

c. a first promoter sequence;

d. a sequence encoding a CRISPR protein;

e. a sequence encoding a first guide RNA (gRNA); the method comprises the steps of,

f. optionally a sequence of at least one auxiliary element,

wherein the polynucleotide is configured for incorporation into a recombinant adeno-associated virus (AAV).

Embodiment III-2: the polynucleotide of embodiment III-1, wherein the combined length of the sequences encoding the CRISPR protein and the first gRNA is less than about 3100, less than about 3090, less than about 3080, less than about 3070, less than about 3060, less than about 3050, or less than about 3040 nucleotides.

Embodiment III-3: the polynucleotide of embodiment III-1 or III-2, wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment III-4: the polynucleotide of embodiment III-1 or III-2, wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than 1314 nucleotides.

Embodiment III-5: the polynucleotide of embodiment III-1 or III-2, wherein the sequence of the first promoter and the at least one auxiliary element has a combined length of greater than 1381 nucleotides.

Embodiment III-6: the polynucleotide of any one of embodiments III-1 to III-5, wherein the first promoter sequence and the sequence encoding the CRISPR protein are operably linked.

Embodiment III-7: the polynucleotide of embodiment III-6, wherein the first promoter is a pol II promoter.

Embodiment III-8: the polynucleotide according to embodiment III-6 or III-7, wherein the first promoter is selected from the group consisting of a polyubiquitin C (UBC) promoter, a Cytomegalovirus (CMV) promoter, a Simian Virus 40 (SV 40) promoter, a chicken beta-actin promoter and a rabbit beta-globin splice acceptor site fusion (CAG), a chicken beta-actin promoter with a cytomegalovirus enhancer (CB 7), a PGK promoter, a Jens Tornoe (JeT) promoter, a GUSB promoter, a CBA hybrid (CBh) promoter, an elongation factor-1 alpha (EF-1 alpha) promoter, a beta-actin promoter, a Rous Sarcoma Virus (RSV) promoter, a silencing Spleen Focus Forming Virus (SFFV) promoter, a CMVd1 promoter, a truncated CMV (tCMVd 2), a minimal CMV promoter, a hepB promoter chicken beta-actin promoter, HSV TK promoter, mini-TK promoter, minimal IL-2 promoter, GRP94 promoter, supercore promoter 1, supercore promoter 2, supercore promoter 3, adenovirus major late (AdML) promoter, MLC promoter, MCK promoter, GRK1 protein promoter, rho promoter, CAR protein promoter, hSyn promoter, U1a promoter, ribosomal protein large subunit 30 (Rpl 30) promoter, ribosomal protein small subunit 18 (Rps 18) promoter, CMV53 promoter, minimal SV40 promoter, CMV53 promoter, SFCp promoter, mecp2 promoter, pJB42CAT5 promoter, MLP promoter, EFs promoter, meP426 promoter, mecp2 promoter, MHCK7 promoter, β -Glucuronidase (GUSB) promoter, CK7 promoter and CK8e promoter.

Embodiment III-9: the polynucleotide of embodiment III-8, wherein the first promoter is a truncated variant of UBC, CMV, SV, CAG, CB7, PGK, jeT, GUSB, CB, EF-1 alpha, beta-actin, RSV, SFFV, CMVd1, tcmdvd 2, minimal CMV, chicken beta-actin, HSV TK, mini-TK, minimal IL-2, GRP94, super core promoter 1, super core promoter 2, MLC, MCK, GRK1 protein Rho, CAR protein, hSyn, U1a, ribosomal protein large subunit 30 (Rpl 30), ribosomal protein small subunit 18 (Rps 18), CMV53, minimal SV40, CMV53, SFCp, pJB42CAT5, MLP, EFS, meP426, mecP2, MHCK7, CK7, or CK8e promoter.

Embodiment III-10: the polynucleotide of embodiment III-7 or III-8 wherein the first promoter sequence has less than about 400 nucleotides, less than about 350 nucleotides, less than about 300 nucleotides, less than about 200 nucleotides, less than about 150 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 40 nucleotides.

Embodiment III-11: the polynucleotide of embodiment III-7 or III-8, wherein the first promoter sequence has between about 40 and about 585 nucleotides, between about 100 and about 400 nucleotides, or between about 150 and about 300 nucleotides.

Embodiment III-12: the polynucleotide of any one of embodiments III-1 to III-11, wherein the first promoter is selected from the group consisting of SEQ ID NOs 40370-40400 as shown in table 8, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-13: the polynucleotide of any one of embodiments III-1 to III-12, wherein the first promoter is selected from SEQ ID NOs 41030-41044, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto, as set forth in table 24.

Embodiment III-14: the polynucleotide of any one of embodiments III-1 to III-13, wherein at least one auxiliary element is operably linked to the sequence encoding the CRISPR protein.

Embodiment III-15: the polynucleotide of any one of embodiments III-1 to III-14, further comprising a second promoter.

Embodiment III-16: the polynucleotide of embodiments III-15, wherein the second promoter sequence and the sequence encoding the first gRNA are operably linked.

Embodiment III-17: the polynucleotide of embodiment III-15 or III-16, wherein the second promoter is a pol III promoter.

Embodiment III-18: the polynucleotide of any one of embodiments III-15 to III-17, wherein the second promoter is selected from the group consisting of U6, mini U61, mini U62, mini U63, biH1 (bi-directional H1 promoter), biU6 (bi-directional U6 promoter), gorilla U6, rhesus U6, human 7sk, and human H1 promoter.

Embodiment III-19: the polynucleotide of embodiments III-18, wherein the second promoter is a truncated variant of the U6, mini U61, mini U62, mini U63, biH1, biU6, gorilla U6, rhesus U6, human 7sk, or human H1 promoter.

Embodiment III-20: the polynucleotide of embodiment III-18 or III-19, wherein the second promoter sequence has less than about 250 nucleotides, less than about 220 nucleotides, less than about 200 nucleotides, less than about 160 nucleotides, less than about 140 nucleotides, less than about 130 nucleotides, less than about 120 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 70 nucleotides.

Embodiment III-21: the polynucleotide of embodiment III-18 or III-19, wherein the second promoter sequence has between about 70 and about 245 nucleotides, between about 100 and about 220 nucleotides, or between about 120 and about 160 nucleotides.

Embodiment III-22: the polynucleotide of any one of embodiments III-15 to III-21, wherein the second promoter sequence is selected from SEQ ID NOs 40401-40420 and 41010-41029, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto, as set forth in table 9.

Embodiment III-23: the polynucleotide of any one of embodiments III-15 to III-22, wherein the second promoter enhances transcription of the first gRNA.

Embodiment III-24: the polynucleotide of any one of embodiments III-15 to III-23, wherein the sequences of the first promoter and the second promoter have a combined length of greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment III-25: the polynucleotide of any one of embodiments III-15 to III-24, wherein the sequence of the first promoter, the second promoter, and the at least one auxiliary element has a combined length of greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600 nucleotides, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or at least about 1900 nucleotides.

Embodiment III-26: the polynucleotide of any one of embodiments 15 to III-25, wherein the combined length of the sequences of the first promoter, the second promoter, and the at least one auxiliary element is greater than 1314 nucleotides.

Embodiment III-27: the polynucleotide of any one of embodiments III-15 to III-26, wherein the combined length of the sequences of the first promoter, the second promoter, and the at least one auxiliary element is greater than 1381 nucleotide.

Embodiment III-28: the polynucleotide according to any one of embodiments III-1 to III-27 comprising two or more helper element sequences.

Embodiments III-29: the polynucleotide of embodiments III-28, wherein the sequence of the first promoter, the second promoter, and two or more auxiliary elements has a combined length of greater than at least about 1300, at least about 1350, at least about 1360, at least about 1370, at least about 1380, at least about 1390, at least about 1400, at least about 1500, at least about 1600, at least 1650, at least about 1700, at least about 1750, at least about 1800, at least about 1850, or greater than at least about 1900 nucleotides.

Embodiment III-30: the polynucleotide of embodiments III-28, wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1314 nucleotides.

Embodiment III-31: the polynucleotide of embodiments III-28, wherein the combined length of the sequences of the first promoter, the second promoter, and the two or more auxiliary elements is greater than 1381 nucleotide.

Embodiment III-32: the polynucleotide of any one of embodiments III-15 to III-31, wherein at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34% or at least 35% or more of the length of the polynucleotide sequence comprises the sequences of the first and second promoters and at least one auxiliary element.

Embodiment III-33: the polynucleotide according to any one of embodiments III-1 to III-32, wherein the auxiliary element is selected from the group consisting of: poly (a) signals, gene enhancer elements, introns, post-transcriptional regulatory elements (PTREs), nuclear Localization Signals (NLS), deaminase, DNA glycosylase inhibitors, stimulators of CRISPR-mediated homology directed repair, activators of transcription and repressor proteins of transcription.

Embodiments III-34: the polynucleotide of any one of embodiments III-1 to III-32, wherein the helper element enhances transcription, transcription termination, expression, binding of the target nucleic acid, editing of the target nucleic acid, or performance of the CRISPR protein as compared to an otherwise identical polynucleotide lacking the helper element.

Embodiments III-35: the polynucleotide of embodiments III-34, wherein the enhanced property is an increase in the editing of the CRISPR protein and the first gRNA expressed in the target nucleic acid in an in vitro assay of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 100%, at least about 150%, at least about 200%, or at least about 300%.

Embodiment III-36: the polynucleotide of any one of embodiments III-1 to III-35, wherein the encoded CRISPR protein is a class 2 CRISPR protein.

Embodiment III-37: the polynucleotide of embodiments III-36, wherein the encoded CRISPR protein is a class 2V CRISPR protein.

Embodiment III-38: the polynucleotide of embodiments III-37, wherein the encoded class 2V CRISPR protein comprises:

an ntsb domain comprising a sequence of QPASKKIDQNKLKPEMDEKGNLTTAGFACSQCGQPLFVYKLEQVSEKGKAYTNYFGRCNVAEHEKLILLAQLKPEKDSDEAVTYSLGKFGQ (SEQ ID NO: 41818) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto;

b. a helix I-II domain comprising the sequence of RALDFYSIHVTKESTHPVKPLAQIAGNRYASGPVGKALSDACMGTIASFLSKYQDIIIEHQKVVKGNQKRLESLRELAGKENLEYPSVTLPPQPHTKEGVDAYNEVIARVRMWVNLNLWQKLKLSRDDAKPLLRLKGFPSF (SEQ ID NO: 41819) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto;

c. a helix II domain comprising the sequence of PLVERQANEVDWWDMVCNVKKLINEKKEDGKVFWQNLAGYKRQEALRPYLSSEEDRKKGKKFARYQLGDLLLHLEKKHGEDWGKVYDEAWERIDKKVEGLSKHIKLEEERRSEDAQSKAALTDWLRAKASFVIEGLKEADKDEFCRCELKLQKWYGDLRGKPFAIEAE (SEQ ID NO: 41820) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto; and

a RuvC-I domain comprising a sequence of SSNIKPMNLIGVDRGENIPAVIALTDPEGCPLSRFKDSLGNPTHILRIGESYKEKQRTIQAKKEVEQRRAGGYSRKYASKAKNLADDMVRNTARDLLYYAVTQDAMLIFENLSRGFGRQGKRTFMAERQYTRMEDWLTAKLAYEGLPSKTYLSKTLAQYTSKTC (seq id NO: 41821) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-39: the polynucleotide of embodiments III-38 wherein the encoded class 2V CRISPR protein comprises an OBD-I domain comprising the sequence of QEIKRINKIRRRLVKDSNTKKAGKTGPMKTLLVRVMTPDLRERLENLRK KPENIPQ (SEQ ID NO: 41822) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-40: the polynucleotide of embodiment III-38 or III-39, wherein the encoded class 2V CRISPR protein comprises an OBD-II domain comprising

NSILDISGFSKQYNCAFIWQKDGVKKLNLYLIINYFKGGKLRFKKIKPEAFEANRFYTVINKKSGEIVPMEVNFNFDDPNLIILPLAFGKRQGREFIWNDLLSLETGSLKLANGRVIEKTLYNRRTRQDEPALFVALTFERREVLD (SEQ ID NO: 41823) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-41: the polynucleotide of any one of embodiments III-38 to III-40, wherein the encoded class 2V CRISPR protein comprises a helical I-I domain comprising

PISNTSRANLNKLLTDYTEMKKAILHVYWEEFQKDPVGLMSRVA (SEQ ID NO: 41824) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-42: the polynucleotide of any one of embodiments III-38 to III-41, wherein the encoded class 2V CRISPR protein comprises a TSL domain comprising

SNCGFTITSADYDRVLEKLKKTATGWMTTINGKELKVEGQITYYNRYKRQNVVKDLSVELDRLSEESVNNDISSWTKGRSGEALSLLKKRFSHRPVQEKFVCLNCGFETH (SEQ ID NO: 41825) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-43: the polynucleotide of any one of embodiments III-38 to III-42, wherein the encoded class 2V CRISPR protein comprises a RuvC-II domain comprising

ADEQAALNIARSWLFLRSQEYKKYQTNKTTGNTDKRAFVETWQSFYRK KLKEVWKPAV (SEQ ID NO: 41826) or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiment III-44: the polynucleotide of any one of embodiments III-38 to III-43, wherein the encoded class 2V CRISPR protein comprises the sequence of SEQ ID No. 145 or a sequence having at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiments III-45: the polynucleotide of any one of embodiments III-38 to III-44, wherein the encoded class 2V CRISPR protein comprises at least one modification in one or more domains.

Embodiments III-46: the polynucleotide of embodiments III-45, wherein the at least one modification comprises:

a. at least one amino acid substitution in the domain;

b. at least one amino acid deletion in the domain;

c. at least one amino acid insertion in a domain; or (b)

d. (a) Any combination of (c).

Embodiment III-47: the polynucleotide of embodiment III-45 or III-46 comprising a modification at one or more amino acid positions in the NTSB domain relative to SEQ ID NO 41818, said modification selected from the group consisting of P2, S4, Q9, E15, G20, G33, L41, Y51, F55, L68, A70, E75, K88, and G90.

Embodiments III-48: the polynucleotide according to embodiments III-47 wherein the modification (S) at one or more amino acid positions in the NTSB domain relative to SEQ ID NO 41818 is selected from the group consisting of an insertion of G at position 2, an insertion of I at position 4, an insertion of L at position 4, Q9P, E15S, G D, a deletion of S at position 30, G33T, L41A, Y51T, F V, L68D, L68E, L5468K, A70 6270S, E75A, E6275 8389Q and G90Q.

Embodiment III-49: the polynucleotide of any one of embodiments III-45 to III-48 comprising a modification at one or more amino acid positions in the helix I-II domain relative to SEQ ID NO. 41819, said modification selected from the group consisting of I24, A25, Y29G 32, G44, S48, S51, Q54, I56, V63, S73, L74, K97, V100, M112, L116, G137, F138 and S140.

Embodiment III-50: the polynucleotide according to embodiments III-49 wherein the sequence relative to SEQ ID NO:41819 the number of times the current of the current, one or more modifications at one or more amino acid positions in the helix I-II domain are selected from the group consisting of an insertion of T at position 24, an insertion of C at position 25, Y29F, G32Y, G32 5232N, G32N, G32N, G32N, G V, a deletion of G at position 32 the insertion of G32N, G44N, G5248N, G5251N, G5273N, G73N, G97N, G5297N, G5297N, G52112N, G137N, Q at position 138 and S140Q.

Embodiment III-51: the polynucleotide of any one of embodiments III-45 to III-50, which is relative to SEQ ID NO:41820 comprises a modification at one or more amino acid positions in the helix II domain selected from L2, V3, E4, R5, Q6, A7, E9, V10, D11, W12, W13, D14, M15, V16, C17, N18, V19, K20, L22, I23, E25, K26, K31, Q35, L37, a38, K41, R42, Q43, E44, L46, K57, Y65, G68, L70, L71, L72, E75, G79, D81, W82, K84, V85, Y86, D87, I93, K95, K96, E98, L100, K102, I104, K105, E109, R110, D114, K118, a120, L121, W124, L125, R126, a127, a129, I133, E134, G135, L136, K138, K142, C152, E150, L152, C152, L158, Q150, L158 and Q158.

Embodiment III-52: the polynucleotide of embodiments III-51, wherein the sequence relative to SEQ ID NO: one or more modifications at one or more amino acid positions in the helix II domain are selected from the group consisting of an insertion at position 2 a, an insertion at position 2H, a deletion at position 2L and a deletion at position 3V, V3 3F, a deletion at position 3V, an insertion at position 3D, V3P, a deletion at position 4E, E44 44 5 6V, an insertion at position 6Q, an insertion at position 7G, an insertion at position 9H, an insertion at position 9 a, VD10, an insertion at position 0T 1, a deletion at position 10V, an insertion at position 10F, an insertion at position 11D, a deletion at position 11S, a deletion at position 12W, W12H, an insertion at position 12P, an insertion at position 13Q the insertion of G at position 12, the insertion of R at position 13, W13D, the insertion of D at position 13, W13L, the insertion of P at position 14, the insertion of D at position 14, the deletion of D at position 14 and the deletion of M at position 15, the insertion of T at position 16, the insertion of P at position 17, N18 19 19 20 22 23 25P, the insertion of G at position 25, the insertion of K26 27 31 35P at position 37, the insertion of S at position 37, the deletion of L at position 37 and the deletion of A at position 38, the insertion of K41L, the insertion of R at position 42, the deletion of Q at position 43 and the deletion of E at position 44, the insertion of L46 57 65 68 70 72 72 72 75 75 75 75 79P, the insertion of E at position 79, the insertion of T at position 81, the insertion of R at position 81, the insertion of W at position 81, the insertion of Y at position 81, the insertion of W at position 82, the insertion of Y at position 82, the insertion of W82 84 84 84 84 85A at position 82, the insertion of L at position 85, the insertion of Y86 87 87 87 87 87 93 95 96 98 102 104 104D at position 85, the insertion of K at position 109, the insertion of E109D, the deletion of R at position 110, D114E, the insertion of D at position 114, K118 120 121 127 127 129 133E, the insertion of C at position 133, the insertion of S at position 134, the insertion of G at position 134, the insertion of R at position 135, the insertion of G135 136 136H, the deletion of E at position 138, D140R, the insertion of D at position 140D, the insertion of P at position 141, the insertion of D at position 142, the deletion of E at position 143+the deletion of F at position 144, the insertion of Q at position 143, F144K, the deletion of F at position 144 and the deletion of C at position 145, the insertion of C145R, the insertion of G at position 145, the insertion of C147D, the insertion of G at position 148D, the deletion of E148, and the insertion of F at position 158L 150.

Embodiment III-53: the polynucleotide of any one of embodiments III-45 to III-52 comprising a modification at one or more amino acid positions in the RuvC-I domain relative to SEQ ID NO 41821, said modification selected from the group consisting of I4, K5, P6, M7, N8, L9, V12, G49, K63, K80, N83, R90, M125 and L146.

Embodiment III-54: the polynucleotide of embodiments III-53, wherein the sequence relative to SEQ ID NO:41821, one or more modifications at one or more amino acid positions in the RuvC-I domain are selected from the group consisting of an insertion of I at position 4, an insertion of S at position 5, an insertion of T at position 6, an insertion of N at position 6, an insertion of R at position 7, an insertion of K at position 7, an insertion of H at position 8, an insertion of S at position 8, V12L, G49W, G R, S51R, S51K, K62S, K62 5235 62E, V62A, K80E, N6583G, R90H, R90G, M125 35125 125A, L137Y, an insertion of P at position 137, an insertion of L at position 141, an insertion of L141R, L141D, an insertion of Q at position 142 insert R at position 143, insert N position 143, insert P position 146 at E144N, insert P position 146 at L146F, P147A, K149Q, T V, insert R position 152, insert H153 at T155Q, insert H position 155, insert R position 155, insert L position 156, miss L position 156, insert W position 156, insert A position 157, insert F position 157, miss Y position 159 at A157S, Q K, miss Y position 159 at T160Y, T160F, insert I position 161, insert S161P, T163P, insert N position 163, C164K, and C164M.

Embodiment III-55: the polynucleotide of any one of embodiments III-45 to III-54 comprising a modification at one or more amino acid positions in the OBD-I domain relative to SEQ ID NO 41822 selected from the group consisting of I3, K4, R5, I6, N7, K8, K15, D16, N18, P27, M28, V33, R34, M36, R41, L47, R48, E52, P55 and Q56.

Embodiment III-56: the polynucleotide of embodiments III-55, wherein the one or more modifications at one or more amino acid positions in the OBD-I domain relative to SEQ ID NO:41822 are selected from the group consisting of an insertion of G at position 3, an insertion of I3G, I E, an insertion of G at position 4, an insertion of K4G, K4P, K4S, K4W, K4W, R P, an insertion of P at position 5, an insertion of G at position 5, an insertion of R5S, an insertion of S at position 5, an insertion of R5A, R5P, R5G, R5L, I6L, an insertion of G at position 6, an insertion of N7Q, N3835, S, K8, F, D W, an insertion of F18 at position 16, an insertion of P at position 27, an insertion of M28 6328 3T, R34 7936Y, R, 35676845P at position 48P, an insertion of E52P at position 55P, a deletion of P at position 55 and a deletion of Q at position 56, a deletion of P at position 56, a P at position 56, and an insertion of Q56, and a Q at position 56.

Embodiments III-57: the polynucleotide of any one of embodiments III-45 to III-56 comprising a modification at one or more amino acid positions in the OBD-II domain relative to SEQ ID NO:41823 selected from the group consisting of: s2, I3, L4, K11, V24, K37, R42, a53, T58, K63, M70, I82, Q92, G93, K110, L121, R124, R141, E143, V144, and L145.

Embodiments III-58: the polynucleotide according to embodiments III-57 wherein the modification (S) at one or more amino acid positions in the OBD-II domain relative to SEQ ID NO:41823 is selected from the group consisting of a deletion of S at position 2, I3R, I K, a deletion of I and L4 at position 3, a deletion of L at position 4, K11T, an insertion of P at position 24, K37G, R E, an insertion of S at position 53, an insertion of R at position 58, a deletion of K at position 63, M70T, I82T, Q I, Q92 92 92 5493A, an insertion of A at position 93, K110Q, R115Q, L T, an insertion of A at position 124, an insertion of R at position 141, an insertion of D at position 143, an insertion of A at position 143, an insertion of W at position 144, and an insertion of A at position 145.

Embodiments III-59: the polynucleotide of any one of embodiments III-45 to III-58 comprising a modification at one or more amino acid positions in the TSL domain relative to SEQ ID NO. 41825 selected from the group consisting of S1, N2, C3, G4, F5, I7, K18, V58, S67, T76, G78, S80, G81, E82, S85, V96 and E98.

Embodiment III-60: the polynucleotide of embodiments III-59, wherein the modification at one or more amino acid positions in the OBD-II domain relative to SEQ ID NO 41825 is selected from the group consisting of an insertion of M at position 1, a deletion of N at position 2, an insertion of V at position 2, C3S, an insertion of G at position 4, an insertion of W at position 4, F5P, an insertion of W at position 7, K18G, V D, an insertion of A at position 67, T76E, T76D, T76N, G D, a deletion of S at position 80, a deletion of G at position 81, an insertion of E at position 82, an insertion of N at position 82, S85I, V96C, V T and E98D.

Embodiment III-61: the polynucleotide of any one of embodiments III-45 to III-60, wherein the expressed class 2V CRISPR protein exhibits improved characteristics relative to SEQ ID No. 2 or SEQ ID No. 145, wherein the improved characteristics comprise increased binding affinity to gRNA, increased binding affinity to the target nucleic acid, improved ability to utilize a broader spectrum of PAM sequences in the editing of the target nucleic acid, improved unwinding of the target nucleic acid, increased editing activity, improved editing efficiency, improved editing specificity for cleavage of the target nucleic acid, reduced off-target editing or cleavage of the target nucleic acid, increased percentage of editable eukaryotic genome, increased activity of the nuclease, increased target strand loading for double strand cleavage, reduced target strand loading for single strand nicks, increased binding of the non-target strand of DNA, improved protein stability, increased protein: gRNA (RNP) complex stability, and improved fusion characteristics.

Embodiment III-62: the polynucleotide of embodiments III-61, wherein the improved feature comprises increased cleavage activity at a target nucleic acid sequence comprising a TTC, ATC, GTC or CTC PAM sequence.

Embodiment III-63: the polynucleotide of embodiments III-62, wherein said improved feature comprises increased cleavage activity at a target nucleic acid sequence comprising an ATC or CTC PAM sequence relative to the cleavage activity of the sequence of SEQ ID NO: 145.

Embodiments III-64: the polynucleotide of embodiments III-63, wherein the improved cleavage activity in an in vitro assay is an enrichment score (log) of at least about 1.5, at least about 2.0, at least about 2.5, at least about 3, at least about 3.5, at least about 4, at least about 4.5, at least about 5, at least about 6, at least about 7, at least about 8, or more as compared to the score of the sequence of SEQ ID NO:145₂ )。

Embodiment III-65: the polynucleotide of embodiments III-63, wherein said improved feature comprises increased cleavage activity of a target nucleic acid sequence comprising a CTC PAM sequence as compared to the sequence of SEQ ID NO: 145.

Embodiment III-66: the multiple cores according to embodiments III-65 A nucleotide, wherein in an in vitro assay the improved cleavage activity is an enrichment score (log) of at least about 2, at least about 2.5, at least about 3, at least about 3.5, at least about 4, at least about 4.5, at least about 5, or at least about 6 or more as compared to the score of the sequence of SEQ ID NO:145₂ )。

Embodiment III-67: the polynucleotide of embodiments III-62 wherein the improved feature comprises increased cleavage activity of a target nucleic acid sequence comprising a TTC PAM sequence as compared to the sequence of SEQ ID NO: 145.

Embodiment III-68: the polynucleotide of embodiments III-67 wherein the improved cleavage activity in an in vitro assay is at least about 1.5, at least about 2.0, at least about 2.5, at least about 3, at least about 3.5, at least about 4, at least about 4.5, at least about 5, or at least about 6log higher than the sequence of SEQ ID NO:145₂ Or more enrichment scores.

Embodiments III-69: the polynucleotide of embodiments III-61 wherein the improved feature comprises increased specificity for cleavage of the target nucleic acid sequence relative to the sequence of SEQ ID NO: 145.

Embodiment III-70: the polynucleotide of embodiments III-69 wherein the increased specificity in an in vitro assay is at least about 2.0, at least about 2.5, at least about 3, at least about 3.5, at least about 4, at least about 4.5, at least about 5, or at least about 6log higher than the sequence of SEQ ID NO:145₂ Or more enrichment scores.

Embodiment III-71: the polynucleotide of embodiments III-61, wherein the improved feature comprises reduced off-target cleavage of the target nucleic acid sequence.

Embodiments III-72: the polynucleotide of embodiments III-37, wherein the encoded class 2V CRISPR protein is selected from Cas12f, cas12j (CasPhi) and CasX.

Embodiment III-73: the polynucleotide of embodiments III-72, wherein the encoded CasX comprises a sequence selected from the group consisting of SEQ ID NOs 1-3, 49-160, and 40208-40369, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.

Embodiment III-74: the polynucleotide of embodiments III-72, wherein the encoded CasX comprises a sequence selected from the group consisting of sequences SEQ ID NOs 1-3, 49-160, 40208-40369 and 40828-40912.

Embodiment III-75: the polynucleotide of embodiments III-72, wherein the CasX sequence of the polynucleotide comprises a sequence selected from the group consisting of SEQ ID NOs 40577-40588, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto, as set forth in table 21.

Embodiments III-76: the polynucleotide of embodiments III-72, wherein the CasX sequence of the polynucleotide comprises a sequence selected from the group consisting of SEQ ID NOs 40577-40588 as shown in table 21.

Embodiments III-77: the polynucleotide of any one of embodiments III-1 to III-76, wherein the polynucleotide encodes one or more NLS linked to the sequence encoding the CRISPR protein.

Embodiment III-78: the polynucleotide of embodiments III-77 wherein the sequence encoding one or more NLS is positioned at or near the 5' end of the sequence encoding the CRISPR protein.

Embodiment III-79: the polynucleotide of embodiment III-78 or III-79, wherein the sequence encoding the one or more NLS is positioned at or near the 3' end of the sequence encoding the CRISPR protein.

Embodiment III-80: the polynucleotide of embodiment III-78 or III-79, wherein the polynucleotide encodes at least two NLSs, wherein the sequences encoding the at least two NLSs are positioned at or near the 5 'and 3' ends of the sequence encoding the CRISPR protein.

Embodiment III-81: the polynucleotide of any one of embodiments III-77 to III-80, wherein the one or more encoded NLS is selected from the group consisting of: PKKKRKV (SEQ ID NO: 196), KRPAATKKAGQAKKKK (SEQ ID NO: 197), PAAKRVKLD (SEQ ID NO: 248), RQRRNELKRSP (SEQ ID NO: 161), NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 162), RMRIZFKKGKDTARRRRRVEVELRVEKKKKKKKRRV (SEQ ID NO: 163), VSRKRPRP (SEQ ID NO: 164), PPKKARED (SEQ ID NO: 165), PQPKKKPL (SEQ ID NO: 166), SALIKKKKKMAP (SEQ ID NO: 167), DRLRR (SEQ ID NO: 168), PKQKKKKRKR (SEQ ID NO: 169), RKLKKKIKKL (SEQ ID NO: 170), REKKKFLKRR (SEQ ID NO: 171), KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 172), RKCLQAGMNLEARKTKK (SEQ ID NO: 173), PRPRKIPR (SEQ ID NO: 174), PPKRRKRRV (SEQ ID NO: 175), NLSKKKKRKREK (SEQ ID NO: 176), RRPSRPFRKP (SEQ ID NO: 177), KRPPRSPSS (SEQ ID NO: 178), KRGINDRNFWRGENERKTR (SEQ ID NO: 179), PRPPKMARYDN (SEQ ID NO: 180), KRR (SEQ ID NO: 180), REKKKFLKRR (SEQ ID NO: 170), REKKKFLKRR (SEQ ID NO: 174), REKKKFLKRR (SEQ ID NO: 17), KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 172), 35 (SEQ ID NO: 37), 35 (SEQ ID NO: 170), TKKKKKKKKKKKKKRR (SEQ ID NO:192 (SEQ ID NO: 37), 37 (SEQ ID NO: 35, 37) and TKKKKKKKKKKRRV (SEQ ID NO: 35 (SEQ ID NO: 37) PKRGRGRPKRGRGR (SEQ ID NO: 193), PKKKRKVPPPPKKKRKV (SEQ ID NO: 195), PAKRARRGYKC (SEQ ID NO: 40188), KLGPRKATGRW (SEQ ID NO: 40189), PRRRREE (SEQ ID NO: 40190), PYRGRKE (SEQ ID NO: 40191), PLRKRPRR (SEQ ID NO: 40192), PLRKRPRRGSPLRKRPRR (SEQ ID NO: 40193), PAAKRVKLDGGKRTADGSEFESPKKKRKV (SEQ ID NO: 40194), PAAKRVKLDGGKRTADGSEFESPKKKRKVGIHGVPAA (SEQ ID NO: 40195), PAAKRVKLDGGKRTADGSEFESPKKKRKVAEAAAKEAAAKEAAAKA (SEQ ID NO: 40196), PAAKRVKLDGGKRTADGSEFESPKKKRKVPG (SEQ ID NO: 40710), KRKGSPERGERKRHW (SEQ ID NO: 40198), KRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 41828) and PKKKRKVGGSKRTADSQHSTPPKTKRKVEFEPKKKRKV (SEQ ID NO: 40200), wherein said one or more NLS is linked to said CRISPR variant or to an adjacent NLS linker peptide, wherein said linker peptide is selected from the group consisting of: RS, (G) n (SEQ ID NO: 40201), (GS) n (SEQ ID NO: 40202), (GSGGS) n (SEQ ID NO: 208), (GGSGGS) n (SEQ ID NO: 209), (GGGS) n (SEQ ID NO: 210), GGSG (SEQ ID NO: 211), GGSGG (SEQ ID NO: 212), GSGSGSG (SEQ ID NO: 213), GSGGG (SEQ ID NO: 214), GGGSG (SEQ ID NO: 215), GSSSG (SEQ ID NO: 216), GPGPGP (SEQ ID NO: 217), GGP, PPP, PPAPPA (SEQ ID NO: 218), PPPG (SEQ ID NO: 40207), PPPGPPP (SEQ ID NO: 219), PPP (GGGS) n (SEQ ID NO: 40203), (GGGS) nPPP (SEQ ID NO: 40204), AEAAAKEAAAKEAAAKA (SEQ ID NO: 40205) and TPPKTKRKVEFE (SEQ ID NO: 40206), wherein n is 1 to 5.

Embodiment III-82: the polynucleotide of any one of embodiments III-77 to III-80, wherein the one or more encoded NLS is selected from SEQ ID NOs 40443-40501 as shown in table 15 and table 16, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiment III-83: the polynucleotide of any one of embodiments III-77 to III-80, wherein the one or more encoded NLS is selected from the group consisting of: SEQ ID NOS 40443-40501 as shown in Table 15 and Table 16.

Embodiment III-84: the polynucleotide of any one of embodiments III-1 to III-83, wherein the encoded first gRNA comprises a sequence selected from or having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity to the sequences of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in table 2.

Embodiment III-85: the polynucleotide of any one of embodiments III-1 to III-84, wherein the encoded first gRNA comprises a sequence selected from the group consisting of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in table 2.

Embodiments III-86: the polynucleotide of embodiments III-85, wherein the encoded first gRNA comprises a targeting sequence that is complementary to a target nucleic acid sequence, wherein the targeting sequence has at least 15 to 30 nucleotides.

Embodiments III-87: the polynucleotide of embodiments III-86, wherein the targeting sequence has 18, 19, or 20 nucleotides.

Embodiments III-88: the polynucleotide of any one of embodiments III-1 to III-87, comprising a sequence encoding a second gRNA and a third promoter operably linked to the second gRNA.

Embodiment III-89: the polynucleotide of embodiments III-88, wherein the third promoter is a pol III promoter.

Embodiments III-90: the polynucleotide of embodiment III-88 or III-89, wherein the third promoter is selected from the group consisting of U6, mini U61, mini U62, mini U63, biH1 (bi-directional H1 promoter), biU (bi-directional U6 promoter), gorilla U6, rhesus U6, human 7sk, and human H1 promoter.

Embodiment III-91: the polynucleotide of embodiments III-90, wherein the third promoter is U6, mini U61, mini U62, mini U63, biH1, biU6, gorilla U6, rhesus U6, human 7sk, or a truncated variant of a human H1 promoter.

Embodiments III-92: the polynucleotide of any one of embodiments III-88 to III-91, wherein the third promoter has less than about 250 nucleotides, less than about 220 nucleotides, less than about 200 nucleotides, less than about 160 nucleotides, less than about 140 nucleotides, less than about 130 nucleotides, less than about 120 nucleotides, less than about 100 nucleotides, less than about 80 nucleotides, or less than about 70 nucleotides.

Embodiment III-93: the polynucleotide of any one of embodiments III-88 to III-91, wherein the third promoter has about 70 to about 245 nucleotides, about 100 to about 220 nucleotides, or about 120 to about 160 nucleotides.

Embodiment III-94: the polynucleotide of any one of embodiments III-88 to III-93, wherein the third promoter is selected from SEQ ID NOs 40401-40420 and 41010-41029, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto, as set forth in table 9.

Embodiment III-95: the polynucleotide of any one of embodiments III-88 to III-94, wherein the third promoter enhances transcription of the second gRNA.

Embodiments III-96: the polynucleotide of any one of embodiments III-88 to III-95, wherein the encoded second gRNA comprises a sequence selected from or having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity to sequences of SEQ ID NOs 2101-2285 and 39981-40026, 40913-40958 and 41817 as shown in table 9.

Embodiment III-97: the polynucleotide of any one of embodiments III-88 to III-95, wherein the encoded second gRNA comprises a sequence selected from the group consisting of SEQ ID NOs 2101-2285, 39981-40026, 40913-40958, and 41817 as shown in table 2.

Embodiment III-98: the polynucleotide of any one of embodiments III-89 to III-97, wherein the encoded second gRNA comprises a targeting sequence complementary to a target nucleic acid sequence that is different from the target nucleic acid of embodiment III-86 or embodiment III-87, wherein the targeting sequence has at least 15 to 30 nucleotides.

Embodiment III-99: the polynucleotide of embodiments III-98, wherein the targeting sequence has 18, 19, or 20 nucleotides.

Embodiment III-100: the polynucleotide of any one of embodiments III-86 to III-99, wherein the targeting sequence is selected from SEQ ID NOs 41056-41776 as set forth in table 27, or a sequence having at least 80%, at least 90% or at least 95% sequence identity thereto.

Embodiments III-101: the polynucleotide of any one of embodiments III-86 to III-99, wherein the targeting sequence is selected from the group consisting of SEQ ID NOs 41056-41776 as shown in table 27.

Embodiments III-102: the polynucleotide of any one of embodiments III-86 to III-101, wherein the encoded first and second grnas comprise a scaffold sequence having one or more modifications relative to SEQ ID No. 2238, wherein the one or more modifications result in improved characteristics of the expressed first and second grnas.

Embodiments III-103: the polynucleotide of embodiments III-102, wherein the one or more modifications comprise one or more nucleotide substitutions, insertions, and/or deletions as set forth in table 28.

Embodiments III-104: the polynucleotide of embodiments III-102 or III-103, wherein optionally in an in vitro assay, the improvement is characterized by one or more functional properties selected from the group consisting of: increased editing activity, increased pseudoknot stem stability, increased triplex region stability, increased scaffold stem stability, extended stem stability, reduced off-target folding intermediates, and increased binding affinity to class 2V CRISPR proteins.

Embodiments III-105: the polynucleotide of any one of embodiments III-102 to III-104, wherein the expressed gRNA scaffold exhibits an improved enrichment score (log) of at least about 2.0, at least about 2.5, at least about 3, or at least about 3.5 as compared to the score of the gRNA scaffold of SEQ ID No. 2238 in an in vitro assay₂ )。

Embodiments III-106: the polynucleotide of embodiments III-84 to III-101, wherein the encoded first and second grnas comprise a scaffold sequence having one or more modifications relative to SEQ ID No. 2239, wherein the one or more modifications result in improved characteristics of the expressed first and second grnas.

Embodiments III-107: the polynucleotide of embodiments III-106, wherein the one or more modifications comprise one or more nucleotide substitutions, insertions, and/or deletions as set forth in table 29.

Embodiments III-108: the polynucleotide of embodiments III-106 or III-107, wherein optionally in an in vitro assay, the improvement is characterized by one or more functional properties selected from the group consisting of: increased editing activity, increased pseudoknot stem stability, increased triplex region stability, increased scaffold stem stability, extended stem stability, reduced off-target folding intermediates, and increased binding affinity to class 2V CRISPR proteins.

Embodiments III-109: the polynucleotide of any one of embodiments III-106 to III-108, wherein the expressed gRNA scaffold exhibits an improved enrichment score (log) of at least about 1.2, at least about 1.5, at least about 2.0, at least about 2.5, at least about 3, or at least about 3.5 as compared to the score of the gRNA scaffold of SEQ ID No. 2239 in an in vitro assay₂ )。

Embodiments III-110: the polynucleotide of any one of embodiments III-106 to III-109 comprising one or more modifications at a position relative to the sequence of SEQ ID No. 2239 selected from the group consisting of C9, U11, C17, U24, a29, U54, G64, a88, and a95.

Embodiments III-111: the polynucleotide according to embodiments III-110 comprising one or more modifications to the sequence of SEQ ID NO. 2239 selected from the group consisting of C9U, U11C, C17G, U C, A C, insertion of G at position 54, insertion of C at position 64, A88G and A95G.

Embodiments III-112: the polynucleotide of embodiments III-111 comprising a modification of the sequence of SEQ ID NO. 2239 consisting of C9U, U11C, C3517G, U C, A C, G insertion at position 54, C insertion at position 64, A88G and A95G.

Embodiment III-113: the polynucleotide of any one of embodiments III-106 to III-112, wherein the improved feature is selected from pseudoknot stem stability, triplex region stability, scaffold vesicle stability, extended stem stability, and binding affinity to a class 2V CRISPR protein.

Embodiments III-114: the polynucleotide of embodiments III-112 wherein said insertion of C and said A88G substitution at position 64 relative to the sequence of SEQ ID NO. 2239 breaks down the asymmetric protruding elements of the extension stem, thereby enhancing the stability of the extension stem of the gRNA scaffold.

Embodiment III-115: the polynucleotide of embodiments III-112 wherein the substitution of U11C, U C and a95G increases the stability of the triplex region of the gRNA scaffold.

Embodiments III-116: the polynucleotide of embodiments III-112, wherein substitution of a29C increases the stability of the pseudoknot stem.

Embodiment III-117: the polynucleotide of any one of embodiments III-1 to III-116, wherein the auxiliary element is a post-transcriptional regulatory element (PTRE) selected from the group consisting of: the cytomegalovirus immediate/early intron a, hepatitis b virus PRE (HPRE), woodchuck (Woodchuck) hepatitis virus PRE (WPRE), and the 5' untranslated region (UTR) of human heat shock protein 70mRNA (Hsp 70).

Embodiments III-118: the polynucleotide of embodiments III-117, wherein the auxiliary element is a PTRE selected from the group consisting of: SEQ ID NOS.40431-40442 as shown in Table 12, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% identity thereto.

Embodiments III-119: the polynucleotide of any one of embodiments III-1 to III-118, wherein the 5 'and 3' itrs are derived from serotype AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

Embodiments III-120: the polynucleotide of embodiments III-119, wherein the 5 'and 3' itrs are derived from serotype AAV2.

Embodiments III-121: the polynucleotide of any one of embodiments III-1 to III-120 comprising one or more sequences selected from the group consisting of the sequences of table 8 to table 10, table 12, table 13, table 17 to table 22, and table 24 to table 27, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identity thereto.

Embodiments III-122: the polynucleotide of any one of embodiments III-1 to III-121 comprising one or more sequences selected from the group consisting of the sequences of tables 8 to 10, 12, 13, 17 to 22, and 24 to 27.

Embodiment III-123: the polynucleotide of any one of embodiments III-1 to III-122 comprising one or more sequences selected from the sequences of table 26, or a sequence having at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.

Embodiments III-124: the polynucleotide of any one of embodiments III-1 to III-123, comprising one or more sequences selected from the sequences of table 26.

Embodiments III-125: the polynucleotide according to embodiments III-124 comprising the sequence of a construct selected from the group consisting of constructs 1-174, 177-186 and 188-198 as shown in Table 26.

Embodiments III-126: the polynucleotide of any one of embodiments III-123 to III-125, wherein said sequence further comprises a targeting sequence selected from the group consisting of the sequences of SEQ ID NOs 41056-41776 as set forth in table 27, wherein said targeting sequence is linked to the 3' end of said polynucleotide sequence encoding said gRNA.

Embodiment III-127: the polynucleotide of any one of embodiments III-1 to III-126, wherein one or more AAV component sequences selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, an auxiliary element, and a poly (a) are modified to deplete all or a portion of the CpG dinucleotide of the sequence

Embodiments III-128: the polynucleotide of embodiments III-127, wherein the one or more AAV component sequences selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, and a poly (a) and a helper element comprise less than about 10%, less than about 5%, or less than about 1% CpG dinucleotides.

Embodiments III-129: the polynucleotide of embodiments III-127, wherein one or more AAV component sequences selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, and a poly (a) and a helper element are free of CpG dinucleotides.

Embodiments III-130: the polynucleotide of any one of embodiments III-127 to III-129, wherein the one or more AAV component sequences that are codon optimized to deplete all or a portion of the CpG dinucleotides are selected from SEQ ID NOs 41045-41055 as shown in table 25.

Embodiment III-131: the polynucleotide of any one of embodiments III-1 to III-130, wherein the polynucleotide has the configuration of a construct as described in any one of figures 24, 33 to 35 or 42.

Embodiments III-132: a recombinant adeno-associated viral vector (rAAV), the rAAV comprising: a) AAV capsid protein, and b) the polynucleotide according to any one of embodiments III-1 to III-131.

Embodiments III-133: the rAAV of embodiments III-132, wherein the AAV capsid protein is derived from serotype AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV 44.9, AAV-Rh74, or AAVRh10.

Embodiments III-134: the rAAV of embodiments III-133, wherein the AAV capsid proteins and the 5 'and 3' itrs are derived from the same serotype AAV.

Embodiments III-135: the rAAV of embodiments III-133, wherein the AAV capsid proteins and the 5 'and 3' itrs are derived from AAV of different serotypes.

Embodiments III-136: the rAAV of embodiments III-135, wherein the 5 'and 3' itrs are derived from AAV serotype 2.

Embodiments III-137: the rAAV of any one of embodiments III-132-III-136, wherein the CRISPR protein and gRNA are capable of being expressed after transduction of a cell with the rAAV.

Embodiment III-138: the rAAV of embodiments III-137, wherein upon expression the gRNA is capable of forming a Ribonucleoprotein (RNP) complex with the CRISPR protein.

Embodiments III-139: the rAAV of embodiments III-137 or III-138, wherein the AAV polynucleotide component sequences modified to deplete all or a portion of the CpG dinucleotides substantially retain their functional properties upon expression.

Embodiments III-140: the rAAV of embodiments III-137 or III-138, wherein the AAV polynucleotide component sequence modified to deplete all or a portion of the CpG dinucleotide exhibits a lower potential to induce an immune response compared to an rAAV of the AAV polynucleotide in which the CpG dinucleotide is not modified to deplete.

Embodiments III-141: the rAAV of embodiments III-140, wherein the lower potential to induce an immune response is exhibited in an in vitro mammalian cell assay designed to detect the production of one or more markers selected from the group consisting of TLR9, interleukin-1 (IL-1), IL-6, IL-12, IL-18, tumor necrosis factor alpha (TNF- α), interferon gamma (ifny), and granulocyte-macrophage colony stimulating factor (GM-CSF).

Embodiment III-142: the rAAV of embodiments III-141, wherein the rAAV comprising the AAV polynucleotide component sequences modified to deplete all or a portion of the CpG dinucleotides results in a reduction in the production of one or more inflammatory markers by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 80%, or at least about 90% compared to a comparable rAAV that is not depleted of CpG.

Embodiments III-143: the rAAV of embodiments III-140, wherein administration of a dose of the rAAV comprising the AAV polynucleotide component sequences modified to deplete all or a portion of the CpG dinucleotides to a subject results in a reduced immune response compared to the dose of the comparable rAAV that is not depleted of CpG.

Embodiment III-144: the rAAV of embodiments III-143, wherein the reduced immune response is a reduction in the production of an anti-rAAV antibody or is a delayed-type hypersensitivity reaction to a rAAV component in the subject.

Embodiments III-145: the rAAV of embodiments III-143, wherein the reduced immune response is determined by measuring one or more inflammatory markers in the blood of the subject selected from the group consisting of TLR9, interleukin-1 (IL-1), IL-6, IL-12, IL-18, tumor necrosis factor α (TNF- α), interferon gamma (ifnγ), and granulocyte-macrophage colony-stimulating factor (GM-CSF), wherein the one or more markers are reduced by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 80%, or at least about 90% as compared to the comparable rAAV that is not depleted of CpG.

Embodiments III-146: the rAAV of any one of embodiments III-143 to III-145, wherein the subject is selected from the group consisting of a mouse, a rat, a pig, a dog, and a non-human primate. WO 2022/125843A1

Embodiment III-147: the rAAV of any one of embodiments III-143 to III-145, wherein the subject is a human.

Embodiment III-148: a pharmaceutical composition comprising the rAAV of any one of embodiments III-132 and a pharmaceutically acceptable carrier, diluent, or excipient.

Embodiment III-149: a method for modifying a target nucleic acid in a population of mammalian cells, the method comprising contacting a plurality of the cells with an effective amount of a rAAV according to any one of embodiments III-132 to III-147, wherein the target nucleic acid of a gene of the cell targeted by the expressed gRNA is modified by the expressed CRISPR protein.

Embodiments III-150: the method of embodiments III-149, wherein the gene of the cell comprises one or more mutations.

Embodiment III-151: the method of any one of embodiments III-149 or III-150, wherein the modification comprises introducing an insertion, deletion, substitution, duplication, or inversion of one or more nucleotides in the target nucleic acid of the cells of the population.

Embodiments III-152: the method of any one of embodiments III-149 to III-151, wherein the gene is knocked down or knocked out.

Embodiment III-153: the method of any one of embodiments III-149 to III-151, wherein the gene is modified such that the functional gene product is capable of expression.

Embodiments III-154: a method of treating a disease in a subject caused by one or more mutations in a gene of the subject, the method comprising administering to the subject a therapeutically effective dose of a rAAV according to any one of embodiments III-132 to III-145.

Embodiments III-156: the method of embodiments III-154, wherein the rAAV is present in an amount of at least about 1X 10⁵ vg/kg to about 1X 10¹⁶ vg/kg, at least about 1X 10⁶ vg/kg to about 1X 10¹⁵ vg/kg or at least about 1X 10⁷ vg/kg to about 1X 10¹⁴ A dose of vg/kg is administered to a subject.

Embodiments III-157: the method of any one of embodiments III-154 to III-156, wherein the rAAV is administered to the subject by an administration route selected from subcutaneous, intradermal, intraneural, intranodular, intramedullary, intramuscular, intravitreal, intrathecal, subarachnoid, intraventricular, intracapsular, intravenous, intralymphatic, intraocular, or intraperitoneal routes, and wherein the administration method is injection, infusion, or implantation.

Embodiments III-158: the method according to any one of embodiments III-149 to III-157, wherein the subject is selected from the group consisting of a mouse, a rat, a pig, and a non-human primate.

Embodiments III-159: the method according to any one of embodiments III-149 to III-157, wherein the subject is a human.

Embodiments III-160: a method of making a rAAV vector, the method comprising:

a. providing a population of packaging cells; and

b. transfecting a population of cells with:

i) A vector comprising a polynucleotide according to any one of embodiments III-1 to III-131;

ii) a vector comprising the aap (assembly) gene; and

iii) A vector comprising rep and cap genomes.

Embodiment III-161: the method of embodiments III-160, wherein the packaging cell is selected from the group consisting of BHK cells, HEK293T cells, NS0 cells, SP2/0 cells, YO myeloma cells, P3X63 mouse myeloma cells, PER cells, per.c6 cells, hybridoma cells, NIH3T3 cells, COS cells, heLa cells, and CHO cells.

Embodiments III-162: the method of embodiment III-160 or III-161, further comprising recovering the rAAV vector.

Embodiments III-163: the method of any one of embodiments III-160 to III-162, wherein the component sequences of the AAV polynucleotide are contained in a single rAAV particle.

Embodiment III-164: a method of reducing immunogenicity of a rAAV, the method comprising deleting all or part of the CpG dinucleotide of the sequence of the AAV component sequence selected from the group consisting of a 5'itr, a 3' itr, a pol III promoter, a pol II promoter, a coding sequence for a CRISPR nuclease, a coding sequence for a gRNA, an accessory element, and a poly (a).

Embodiment III-165: the method of embodiments III-164, wherein the one or more AAV polynucleotide component sequences comprise less than about 10%, less than about 5%, or less than about 1% CpG dinucleotides.

Embodiment III-166: the method of embodiments III-165, wherein the one or more AAV polynucleotide component sequences are free of CpG dinucleotides.

Embodiments III-167: the method of any one of embodiments III-164 to III-166, wherein the one or more AAV polynucleotide component sequences are selected from SEQ ID NOs 41045-41055 as shown in table 25.

Embodiments III-168: the method of any one of embodiments III-164 to III-167, wherein the rAAV exhibits a lower potential to induce the production of one or more markers of an inflammatory response in an in vitro mammalian cell assay as compared to a comparable rAAV in which the CpG dinucleotide has not been deleted, wherein the one or more inflammatory markers are selected from the group consisting of TLR9, interleukin-1 (IL-1), IL-6, IL-12, IL-18, tumor necrosis factor alpha (TNF-alpha), interferon gamma (ifnγ), and granulocyte-macrophage colony stimulating factor (GM-CSF).

Embodiments III-169: the method of embodiments III-168, wherein the rAAV results in a reduction in production of the one or more inflammatory markers by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 80%, or at least about 90% as compared to the comparable rAAV that is not depleted of CpG.

Embodiments III-170: the method of any one of embodiments III-164 to III-167, wherein administering a dose of the rAAV comprising the AAV polynucleotide component sequence modified to deplete all or a portion of the CpG dinucleotides to a subject results in a reduced immune response compared to the dose of the comparable rAAV that is not depleted of CpG.

Embodiments III-171: the method of embodiments III-170, wherein the reduced immune response is a reduction in the production of an anti-rAAV antibody or a delayed-type hypersensitivity reaction to a rAAV component in the subject.

Embodiments III-172: the method of embodiments III-170, wherein the reduced immune response is determined by measuring one or more inflammatory markers in the blood of the subject selected from the group consisting of TLR9, interleukin-1 (IL-1), IL-6, IL-12, IL-18, tumor necrosis factor alpha (TNF- α), interferon gamma (ifnγ), and granulocyte-macrophage colony-stimulating factor (GM-CSF), wherein the one or more markers are reduced by at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 80%, or at least about 90% as compared to the comparable rAAV that is not depleted of CpG.

Embodiments III-173: the method of any one of embodiments III-164 to III-172, wherein the subject is selected from the group consisting of a mouse, a rat, a pig, a dog, and a non-human primate.

Embodiments III-174: the method of any one of embodiments III-164 to III-172, wherein the subject is a human.

Embodiments III-175: the composition of any one of embodiments III-132 to III-147 for use as a medicament for treating a human in need thereof.

This specification sets forth a number of exemplary configurations, methods, parameters, and the like. However, it should be recognized that such description is not intended as a limitation on the scope of the present disclosure, but is instead provided as a description of exemplary embodiments. The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.

Examples

Example 1: when expressed in vitro by AAV episomes, the small class 2V-type CRISPR proteins can edit the genome

Experiments were performed to demonstrate that small class 2V CRISPR proteins can edit genomes when expressed in vitro from AAV plasmids or AAV vectors.

Materials and methods：

AAV transgenes between ITRs are conceptually divided into distinct parts, which consist of therapeutic cargo and accessory elements related to the expression of the therapeutic cargo in mammalian cells. AAV vectorization consists of identifying a list of parts and then designing, constructing and testing plasmids and AAV-form vectors in mammalian cells. A schematic of its components and one configuration are shown in fig. 1.

In this first example, three plasmids were constructed (construct 1, construct 2 and construct 3; see Table 26 for component sequences), where the only difference in plasmid sequence between ITRs was in the region of the affinity tag.

Cloning and QC：

AAV vectors are cloned using a 4-piece Golden Gate module consisting of a pre-digested AAV backbone, DNA encoding a small CRISPR protein, and flanking 5 'and 3' DNA sequences. The 5 'sequence contains the enhancer, protein promoter and N-terminal NLS, while the 3' sequence contains the C-terminal NLS, WPRE, poly (A) signal, RNA promoter and guide RNA containing a spacer 12.7 targeting tDTomato (DNA sequence: CTGCATTCTAGTTGTGGTTT (SEQ ID NO: 40800)). The 5 'and 3' portions were sequenced as gene fragments from Twist, PCR amplified, and assembled into AAV vectors by a circulating golden channel reaction using T4 ligase and BbsI.

The assembled AAV vector was then transformed into chemically competent E.coli (Stbl 3 s). Transformed cells were allowed to recover for 1 hour in a shaking incubator at 37℃and inoculated on kanamycin LB-agar plates, and allowed to grow at 37℃for 12 to 16 hours. Colony PCR was performed to determine clones containing the complete transgene. The correct clones were inoculated into 50mL of LB medium containing kanamycin and grown overnight. The plasmid was then prepared in medium and sequence verified the following day. To assess quality of the medium preparations, constructs were treated with XmaI (which cut in each ITR) and XhoI (which cut once in AAV genome) in restriction digests. The digests and uncleaved constructs were then run on a 1% agarose gel and imaged on ChemiDoc. If the plasmid is >90% supercoiled, correctly sized and ITR intact, the construct is tested via nuclear transfection and/or transduction

Method for plasmid nuclear transfection：

Plasmids containing AAV genomes were transfected into mouse immortalized Neural Progenitor Cell (NPC) lines isolated from Ai9-tdTomato mice (tdTomato mNPC) using the Lonza P3 primary cell 96 well Nucleofector kit. Briefly, ai9 is a Cre reporter gene tool strain designed with loxP flanking STOP cassettes that prevents transcription of tdmamio markers driving the CAG promoter. Ai9 mice or Ai9 mNPC express tdmamto after Cre-mediated recombination to remove the STOP cassette. The sequence verified plasmid was diluted to concentrations of 200 ng/. Mu.l, 100 ng/. Mu.l, 50 ng/. Mu.l and 25 ng/. Mu.l and added to the P3 solution containing 200,000tdTomato mNPC at 5. Mu.l (1000 ng, 500ng, 250ng and 125 ng) of each concentration. The pooled solutions were subjected to nuclear transfection using the Lonza 4D Nucleofector system according to procedure EH-100. After nuclear transfection, the solution was quenched with pre-equilibrated mNPC medium (DMEM/F12 containing GlutaMax, 10mM HEPES, 1 XMEM nonessential amino acids, 1 Xpenicillin/streptomycin, 1:1000-mercaptoethanol, 1X B-27 supplement, 1 XN 2 minus vitamin A with supplemented growth factors bFGF and EGF (20 ng/mL final concentration) the solution was then aliquoted (approximately 67,000 cells/well) in triplicate in 96-well plates coated with PLF (1X poly-DL-ornithine hydrobromide, 10mg/mL in sterile diH20, 1X laminin and 1X fibronectin) after 48 hours after transfection the treated cells were supplemented with fresh mNPC medium containing growth factors.

AAV production：

Suspension HEK293T cells were derived from parental HEK293T and grown in FreeStyle 293 medium. For screening purposes, small-scale cultures (20 mL to 30mL in 125mL Erlenmeyer flasks and stirred at 110 rpm) were diluted to a density of 1.5e+6 cells/mL on the day of transfection. The transgenic endotoxin-free pAAV plasmid with flanking ITR repeats was co-transfected with a plasmid providing an adenovirus helper gene for replication and AAV rep/cap genome using PEIMax (Polysciences) in serum-free optmem medium. Cultures were supplemented with 10% CDM4HEK293 (HyClone) 3 hours after transfection. After three days, the culture was centrifuged at 1000rpm for 10 minutes to separate the supernatant from the cell pellet. The supernatant was mixed with 40% PEG 2.5M NaCl (8% final concentration) and incubated on ice for at least 2 hours to precipitate AAV viral particles. Cell pellets containing most of the AAV vector were resuspended in lysis medium (0.15M NaCl, 50mM Tris HCl, 0.05% Tween, pH 8.5), sonicated on ice (15 seconds, 30% amplitude) and treated with Benzonase (250U/. Mu.L, novagen) for 30 minutes at 37 ℃. The crude lysate and PEG-treated supernatant were then centrifuged at 4000rpm for 20 minutes at 4 ℃ to resuspend PEG-precipitated AAV (precipitate), and the crude lysate (supernatant) with no cellular residues therein was subsequently clarified further using a 0.45 μm filter.

To determine viral genome titres, 1 μl from crude lysate virus was digested with DNase and ProtK followed by quantitative PCR. mu.L of digested virus was used in 25. Mu.L of qPCR reaction consisting of IDT prime mix and a set of primers, 6' FAM/Zen/IBFQ probe (IDT) designed to amplify CMV promoter region (Fwd 5'-CATCTACGTATTAGTCATCGCTATTACCA-3' (SEQ ID NO: 40801)); rev 5'-GAAATCCCCGTGAGTCAAACC-3' (SEQ ID NO: 40802)), probe 5'-TCAATGGGCGTGGATAG-3' (SEQ ID NO: 40803)), or a 62 nucleotide fragment located in AAV2-ITR (Fwd 5'-GGAACCCCTAGTGATGGAGTT-3' (SEQ ID NO: 40804); rev 5'-CGGCCTCAGTGAGCGA-3' (SEQ ID NO: 40805), probe 5'-CACTCCCTCTCTGCGCGCTCG-3'). Titers of virus samples (viral genome (vg)/mL) were calculated using ten-fold serial dilutions of AAV ITR plasmids (2e+9 to 2e+4dna copies/mL each 5 μl) as reference standards. The QPCR procedure was set to: an initial denaturation step of 5 min at 95℃followed by 40 cycles of denaturation at 95℃for 1 min and annealing/extension at 60℃for 1 min.

AAV transduction：

At 48 hours prior to AAV transduction, 10,000 cells/well of mNPC were seeded in PLF coated wells in 96-well plates. All viral infection conditions were performed in triplicate with a normalized number of vg in the experimental vector in a series of 3-fold dilutions of the multiplicity of infection (MOI) ranging from about 1.0e+6vg/cell to 1.0e+4vg/cell. A final volume of 50. Mu.L of AAV vector diluted in pre-equilibrated mNPC medium supplemented with bFGF/EGF growth factor (20 ng/ml final concentration) was applied to each well. 48 hours after transfection, complete medium exchange was performed with fresh medium supplemented with growth factors. Editing activity (tdT + cell quantification) was assessed by FACS 5 days after transfection.

Methods for assessing Activity by FACS：

Five days after transfection, the treated tdTomato mNPC was washed with dPBS in 96-well plates and treated with 50. Mu.L of TrypLE for 15 min. After cell dissociation, the treated wells were quenched with medium containing DMEM, 10% FBS and 1X penicillin/streptomycin. Resuspended cells were transferred to a round bottom 96-well plate and centrifuged at 1000Xg for 5 min. The cell pellet was then resuspended in dPBS containing 1 XPPI and the plate loaded into an Attune NxT flow cytometer autosampler. The Attune NxT flow cytometer was run using the following gating parameters: FSC-A XSSC-A to select cells, FSC-H XSSC-A to select single cells, FSC-A XSVL 1-A to select DAPI negative living cells, FSC-A XSYL 1-A to select tdTomato positive cells.

Results：

The graph in fig. 2 shows that CasX variant 491 and guide variant 174 with spacer 12.7 targeting tdmamio termination cassette are able to edit the target termination cassette in mNPC (measured as percentage of tdtom+ cells by FACS) when delivered by nuclear transfection of AAV transgenic plasmids. Among the vectors tested, casx491.174 delivered in construct 3 (with 80% tdmamato+ cells) was superior to the other vectors. Fig. 3 shows that all three vectors tested were edited at the tdmamato locus in a dose dependent manner. Fig. 4 shows the results of editing using construct 3 in AAV vectors, which demonstrates the dose-dependent response, enabling high editing.

Experiments demonstrated that small class 2V-type CRISPR proteins (such as CasX) and targeting can edit genomes when expressed in vitro by AAV transgenic plasmids or episomes.

Example 2: packaging of small class 2V-type CRISPR systems in AAV vectors

Experiments were conducted to demonstrate that systems of small class 2V CRISPR proteins (such as CasX and gRNA) can be encoded and efficiently packaged within a single AAV vector.

Materials and methods：

For this experiment, AAV vectors were generated with transgenes packaging CasX variant 438, gRNA scaffold 174, and spacer 12.7 using the methods for AAV production, purification, and characterization as described in example 1. For characterization, AAV viral genomes were titrated by qPCR and the space-to-full ratio was quantified using Scanning Transmission Electron Microscopy (STEM). AAV was negatively stained with 1% uranyl acetate and observed. Empty particles are identified by the presence of dark electron closed loops at the center of the capsid.

Results：

The genomic DNA titer of AAV preparations (by qPCR) was measured as 6e12vg/mL, produced by 1L HEK293T cell culture. FIG. 5 is an image from a Scanning Transmission Electron Microscope (STEM) micrograph showing that an estimated 90% of the particles in the AAV formulation contain viral genomes; for example full. Under experimental conditions, the results demonstrate that CasX variant proteins and grnas can be efficiently packaged in a single AAV vector particle, resulting in high titers and high packaging efficiency.

Example 3: in vivo editing of genomes with small class 2V-type CRISPR proteins expressed by AAV episomes

Experiments were conducted to demonstrate that small class 2V-type CRISPR proteins (such as CasX) are capable of editing genomes when expressed in vivo by AAV episomes.

Materials and methods：

For this experiment, AAV vectors were generated using the methods for AAV production, purification and characterization as described in example 1.

In vivo AAV administration and tissue treatment：

P0-P1 pups from Ai9 mice were injected with AAV having a transgene encoding CasX variant 491 and guide variant 174 having spacer 12.7. Briefly, mice were cryoanesthetized and unilaterally injected into the ventricular (ICV) space using a Hamilton syringe (10 μl, model 1701RN SYR Cat No:7653-01) fitted with a 33 gauge needle (small hub RN ndl—custom length 0.5 inch, point 4 (45 degrees)) with 1 μl to 2 μl of AAV vector (about 1e11 viral genome (vg)) after injection, pups were recovered on a hot heat pad and then placed back into their cages, animals were final anesthetized by intraperitoneal injection of ketamine/xylazine 1 month after ICV injection, and further postfixed with saline and fixative (4% paraformaldehyde) in 4% pfa, followed by infiltration with 30% sucrose solution, and embedding in compound.

In a subsequent experiment evaluating the editing in peripheral tissues, particularly in the liver and in the heart, P0-P1 pups from Ai9 mice were cryoanesthetized and injected intravenously with about 1e12 viral genome (vg) of the same AAV construct in a volume of 40 μl. Following injection, the pups were recovered on a hot pad and then returned to their cages. 1 month after administration, animals were subjected to final anesthesia, heart and liver tissues were necropsied and treated as described above.

Results：

Fig. 6 provides comparative Immunohistochemical (IHC) images of treated brain tissue from Ai9 mice receiving ICV-injected packaged CasX variant 491 and AAV with guide scaffold 174 (top) with spacer 12.7 versus ICV-injected packaged CasX variant 491 and AAV with guide 174 with spacer 12.7 and stained with 4', 6-diamidino-2-phenylindole. The signal from the cells in the tdTom channel indicated that the tdTom locus within these cells was successfully edited. tdtom+ cells (represented in white) were evenly distributed throughout all areas of the brain, indicating that ICV administered AAV packaging encoded CasX, guide and spacer were able to reach and edit these cells compared to non-targeted controls (lower panel). The image of fig. 6 represents images obtained from 3 mice of each group. In addition, the results presented in fig. 59A (liver) and 59B (heart) demonstrate that AAV is capable of distributing and editing the genome in the liver and heart (edited cells in white) when expressed in vivo by a single AAV episome.

The results indicate that AAV encoding small CRISPR proteins (such as CasX) and targeting guides can be distributed within tissues when delivered locally (brain) or systemically, and can edit the genome when expressed in vivo by a single AAV episome.

Example 4: small CRISPR protein efficacy is enhanced by AAV vector protein promoter selection

Experiments were performed to demonstrate that expression of small CRISPR proteins (such as CasX) can be enhanced by utilizing different promoters for the encoded proteins in AAV constructs. The load space in AAV transgenes can be maximized by using a short promoter in combination with CasX. In addition, experiments were conducted to demonstrate that if promoters were combined with a larger CRISPR protein (such as Cas 9), expression could be enhanced by using promoters that would otherwise be too long to be efficiently packaged in AAV vectors. The use of long cell type specific promoters to enhance small CRISPR proteins is an advantage of the AAV systems described herein and is not possible in conventional CRISPR systems due to the size of other CRISPR proteins.

Materials and methods：

Cloning and QC were performed as described in example 1. The promoter variant was cloned upstream of the CasX protein in AAV-cis plasmid. In addition to the sequences encoding CasX (table 21) and one or more grnas (tables 18 and 19), the sequences of the additional components of the AAV constructs are listed in table 26.

Table 8: promoter variant sequences

Method for plasmid nuclear transfection：

Immortalized neural progenitor cells are subjected to nuclear transfection as described in example 1. The sequence verified plasmid was diluted to concentrations of 200 ng/. Mu.l, 100 ng/. Mu.l, 50 ng/. Mu.l and 25 ng/. Mu.l and added to the P3 solution containing 200,000tdTomato mNPC at 5. Mu.l (1000 ng, 500ng, 250ng and 125 ng) of each concentration.

AAV viral production and QC were performed as described in example 1, and AAV transduction and editing level assessment was performed in mNPTC-tdT cells by FACS.

Results：

The results of FIG. 7 demonstrate that several different promoters with the CasX protein 438, the scaffold variant 174 and the spacer targeting the tdTomato termination cassette (spacer 12.7, with sequence CTGCATTCTAGTTGTGGTTT (SEQ ID NO: 40800)) edit the target termination cassette in mNPC at a dose of 1000ng when delivered by nuclear transfection of AAV transgenic plasmids. These promoters range in length from over 700 nucleotides to as short as 81 nucleotides (Table 8). Among the promoters tested, constructs 7 and 14 showed considerable editing efficacy.

The results of fig. 8 demonstrate that several short promoters in combination with CasX variant 491, scaffold variant 174 and spacer 12.7 edit the target termination cassette in mNPC at a dose of 500ng when delivered by nuclear transfection of AAV transgenic plasmids. All constructs have promoters less than 250 nucleotides in length, except construct 2 which has a 584 nucleotide promoter. Among the protein promoters tested, construct 15 showed considerable editing efficacy, especially considering its short length (81 nucleotides).

The results of fig. 9 demonstrate that when delivered by nuclear transfection of AAV transgenic plasmids, four leader promoters with CasX variant 491 and scaffold variant 174 (which has spacer 12.7) edit the target termination cassette in mNPC at doses of 125ng and 62.5 ng. Constructs 4, 5 and 6 have a promoter length of less than or equal to 400 nucleotides, thus maximizing editing efficacy while minimizing AAV loading capacity.

The results of fig. 10 demonstrate that the use of four promoter variants in AAV also results in robust editing. Briefly, AAV (aav.3, aav.4, aav.5, and aav.6) were produced with transgenic constructs 3-6, respectively. Each construct showed dose-dependent editing at the target locus (fig. 10, left panel). At the MOI of 2e5, aav.4 showed 38% ± 3% editing at the target locus, which is superior to the other constructs (fig. 10, right panel).

In the experiment depicted in fig. 11, several new protein promoters were compared to the first 4 protein promoter variants previously identified (aav.3, aav.4, aav.5, and aav.6). Briefly, AAV was produced with the corresponding transgenic construct and transduced in tdTomato mNPC. At 5 days post transduction, multiple promoters showed improved editing at MOI of 3e5 (fig. 11). In particular, constructs 58 and 59 had editing activity higher than 30% while minimizing transgene size (fig. 12). Constructs 58 and 59 contained promoters that were 420bp and 258bp smaller, respectively, than construct 3, but both resulted in similar or improved editing of the target locus. In particular, inclusion of an intron in the promoter of construct 59 results in increased editing compared to construct 58 lacking an intron, indicating that inclusion of an intron in the AAV construct promoter is beneficial.

The results demonstrate that expression of small CRISPR proteins (such as CasX) can be enhanced by the use of long promoters that would otherwise not be available for traditional CRISPR proteins due to the size limitations of AAV genomes. Furthermore, combining a short promoter with a small CRISPR protein (such as CasX) allows for significant reduction of AAV transgene load capacity without compromising expression efficiency. This preservation of space allows for the inclusion of additional auxiliary elements in the transgene, such as enhancers and regulatory elements, which will enable increased editing potential.

Example 5: small CRISPR system efficacy is enhanced by AAV vector RNA promoter selection

Experiments were conducted to demonstrate that if certain promoters were selected for expression of guide RNAs (which recognize target DNA for editing) in AAV vectors, the editing efficacy of a small CRISPR system (such as CasX) could be enhanced. By using RNA promoters with different intensities, guide RNA expression can be regulated, thereby affecting editing efficacy. AAV platforms based on CasX systems provide sufficient load space in AAV to include at least 2 independent promoters for expression of two guide RNAs. By combining promoters with different expression levels, expression of multiple guide RNAs can be regulated within a single AAV transgene. The engineered shorter version of the RNA promoter that results in retained editing efficacy can also increase the engineering space for adding other auxiliary elements in AAV transgenes.

Materials and methods：

The method of example 1 was used not only for cloning and quality control of constructs, but also for plasmid nuclear transfection and AAV production, transduction and FACS analysis. The sequence of the Pol III promoter is shown in Table 9. In addition to the sequences encoding CasX (table 21) and one or more grnas (tables 18 and 19), the sequences of the additional components of the AAV constructs are listed in table 26.

Table 9: construct RNA promoter sequences

Results：

The results depicted in fig. 13 demonstrate that three different RNA promoters with protein 491, scaffold variant 174 and spacer 12.7 edit the target termination cassette in mNPC at doses of 250ng and 125ng when delivered by nuclear transfection of AAV transgenic plasmids. Constructs 3 and 32 had similar activity and were edited at the target locus with 42% efficiency. Construct 33 exhibited about 56% activity of constructs 3 and 32.

The results depicted in fig. 14 demonstrate that the same three different promoters with protein 491, scaffold variant 174 and spacer 12.7 edit the target termination cassette in mNPC when delivered as AAV. AAV.3, AAV.32, AAV.33 were produced using transgene constructs 3, 32, and 33, respectively. Each vector showed dose-dependent editing at the target locus (fig. 14, left panel). At the MOI of 3e5, aav.32 and aav.33 have 50% -60% potency of aav.3 (fig. 14, right panel).

The results of fig. 15 demonstrate that constructs with one of four different truncations of the U6 promoter (with protein 491, scaffold variant 174 and spacer 12.7) were each able to edit the target termination cassette at different levels in mNPC at doses of 250ng and 125ng when delivered by nuclear transfection of AAV transgenic plasmids. Construct 85 had 33% efficacy of base construct 53 compared to the non-targeted control, while constructs 86, 87 and 88 did not show any editing and were comparable to the non-targeted control.

Fig. 16 shows experimental results comparing edits in mNPC between base construct 53 and construct 85 when delivered as AAV. At a MOI of 3e5, aav.85 was able to edit at 7% as compared to aav.53 being able to edit at 15%, which is consistent with the results of fig. 15.

The results of fig. 17 demonstrate that constructs with engineered U6 promoters designed to minimize the size of the promoter relative to base U6 of construct 53 (which has encoded CasX protein 491, scaffold variant 174 and spacer 12.7) can edit the target termination cassette at different levels in mNPC at doses of 250ng and 125ng when delivered by nuclear transfection of AAV transgenic plasmids. A set of construct clusters (89, 90, 92, 93, 96, 97, 98 and 99) were all edited in the range of 15% -20% compared to construct 53 being edited in 55%. Other Pol II variants (constructs 94, 95 and 100) all exhibited a higher level of editing, about 32% editing, while construct 101 resulted in 48% editing. These promoters were all smaller than the Pol III promoter in the base construct 53, as shown in the scatter plot of fig. 18, the X-axis describes the transgene size of all AAV variants tested with engineered U6 RNA promoters, while the Y-axis describes the edited mNPC percentages.

The results of fig. 19 demonstrate that constructs with an engineered U6 promoter (with protein 491, scaffold variant 174 and spacer 12.7) were able to edit the target termination cassette in mNPC in a dose-dependent manner when delivered as AAV. Different rates of editing were observed for AAV versus constructs aav.94, aav.95, aav.100 and aav.101, all between base constructs aav.53 and aav.89, with the same Pol III promoter as aav.85 from fig. 15 and 16.

The results of fig. 20 demonstrate that constructs with an engineered U6 promoter (with CasX protein 491, scaffold variant 174 and spacer 12.7) were able to edit the target termination cassette in mNPC when delivered as AAV. Different rates of editing were observed for AAV versus constructs aav.94, aav.95, aav.100 and aav.101, all between base constructs aav.53 and aav.89, with the same Pol III promoter as aav.85 from fig. 15 and 16. Fig. 21 shows the results as a scatter plot of the compiled versus transgene size.

The results depicted in fig. 64 demonstrate that constructs with a rationally engineered Pol III promoter (with sequences encoding CasX protein 491, scaffold variant 174 and spacer 12.7) were able to edit the target tdbitmap termination cassette at different efficiencies when transfected into mouse NPCs as AAV transgenic plasmids at doses of 250ng and 125 ng. Constructs 159 to 174 were designed to minimize the size of the promoter opposite the base U6 (construct ID 157) or H1 (construct ID 158) promoters, and constructs 160 to 174 were engineered as short hybrid variants based on the core region of the H1 promoter (construct 159) with variants from domain exchanges of the 7SK and/or U6 promoters. FIG. 64 shows that most of these promoter variants (which are substantially shorter than the base U6 and H1 promoters) can be used as Pol III promoters to drive adequate gRNA transcription and editing at the tdTomato locus. In particular, constructs 159, 161, 162, 165 and 167 were able to achieve at least 30% editing at the higher dose of 250 ng. These variants are used as promoter substitutes in AAV construct designs that would allow for significant reduction of AAV load capacity in the event that sufficient gRNA expression for targeted editing is driven.

Experimental results demonstrate that expression of small CRISPR systems (such as CasX and guides) can be regulated in a selective manner by using alternative RNA promoters. While most other CRISPR systems do not have enough room to include a separate promoter to express the guide RNA, the CRISPR systems described herein are capable of differentially controlling expression and editing using several possible gRNA promoters of different lengths in the transgene. This data also supports shorter versions of Pol III promoters that can be engineered to retain the ability to promote transcription of functional guides. This quality is an important feature of the AAV systems described herein to save transgene space for additional engineering or inclusion of additional promoters and/or auxiliary elements. Furthermore, modulating other elements in our system may allow for a combination of multiple gRNA promoters, including promoters with different potency.

Example 6: small CRISPR proteinsEfficacy is enhanced by selection of poly (A) in AAV vectors

Experiments were conducted to demonstrate that small CRISPR proteins, such as CasX, can be expressed in AAV genomes using a variety of polyadenylation (poly (a)) signals. Specifically, smaller CRISPR systems can contain larger poly (a) signals. Furthermore, experiments were conducted to demonstrate that inclusion of shorter synthetic poly (a) signals in the construct may allow for further reduction of AAV transgene load capacity.

Materials and methods：

Cloning and QC：

The poly (a) signals within the AAV genome are separated by restriction enzyme sites for modular cloning. These parts were purchased as gene fragments from Twist, amplified by PCR, digested with the corresponding restriction enzymes, washed and then ligated into vectors which were also digested with the same enzymes.

The method of example 1 was used not only for cloning and quality control of constructs, but also for plasmid nuclear transfection and FACS analysis. The sequences of the poly (A) sequences are shown in Table 10. In addition to the sequences encoding CasX (table 21) and one or more grnas (tables 18 and 19), the sequences of the additional components of the AAV constructs are listed in table 26.

Table 10: poly (A) sequences

Plasmid nuclear transfection and activity assessment by FACS were performed as described in example 1.

Results：

The results depicted in figure 22 demonstrate that constructs with several alternative poly (a) signals (which have CasX variant 491, scaffold variant 174 and spacer 12.7) are able to edit the target termination cassette in mNPC at doses of 250ng and 125ng when delivered by nuclear transfection of AAV transgenic plasmids. Of the three constructs tested in this experiment, construct 3 showed the highest efficacy, editing the target locus at 60% efficiency (250 ng dose). Constructs 28 and 29 had poly (a) sequences of 59% and 39% of the poly (a) sequence of construct 3, respectively (see table 11), edited at 21% and 24%, respectively (250 ng dose).

Table 11: poly (A) construct variants

Construct ID	Poly (A) signal size (bp)	AAV transgene size (bp)
			3	208	4550
25	477	4795
			26	49	4367
27	49	4367
			28	122	4440
29	82	4400
			30	395	4713
31	56	4374
			34	186	4565
37	208	4619

The results depicted in fig. 23 demonstrate that two different poly (a) signals with protein 491, scaffold variant 174 and spacer 12.7 are capable of editing the target termination cassette in mNPC when delivered as an AAV vector. Aav.34 and aav.37 were produced with transgene constructs 34 (which had a total transgene length of 186 nucleotides of poly (a) and 4565 nucleotides) and 37 (which had a total transgene length of 208 nucleotides of poly (a) and 4619 nucleotides), respectively. Each vector showed dose-dependent editing at the target locus, and aav.34, which contained a shorter poly (a) signal, had about 75% editing efficacy of aav.37 for both doses.

Under experimental conditions, the results demonstrate that expression of small CRISPR proteins, such as CasX, can be regulated by poly (a) signals of different lengths. Longer poly (a) sequences may be used in AAV constructs to enhance CasX activity, while shorter poly (a) sequences may be used in AAV constructs to make more sequence space available for additional helper elements within the AAV transgene.

Example 7: small CRISPR protein potency is modulated by the location of regulatory elements in AAV vectors

The orientation (forward or reverse) and position (upstream or downstream of the CRISPR gene) of regulatory elements such as the gRNA promoter and guide scaffold complex can regulate the potential expression of small CRISPR proteins and the overall editing efficiency of the CRISPR system in AAV vectors. The objective of these experiments was to assess the optimal orientation and position of regulatory elements within the AAV genome to enhance the efficacy of small CRISPR proteins and guide RNAs.

Materials and methods：

AAV vector production and QC, nuclear transfection, AAV virus production were performed as described in example 1, and editing level evaluation was performed in mNSC-tdT cells by FACS.

Results：

Construct 44 (the configuration shown in fig. 24, second from the top) contains a Pol III promoter that drives expression of vector scaffold 174 and spacer 12.7 in the reverse direction of construct 3 (the top configuration in fig. 15). FIG. 25 demonstrates that construct 44 modifies the target termination cassette in mNPC in a dose-dependent manner similar to construct 3 when delivered by nuclear transfection of AAV transgenic plasmids.

FIG. 26 shows that construct 44 delivered as an AAV vector edits the target termination cassette in mNPC, further supporting the utility of the construct. Aav.3 and aav.44 were produced with transgene constructs 3 and 44, respectively. Each vector showed dose-dependent editing at the target locus (fig. 26, left panel, where 3-fold dilutions were used to determine the vector). FIG. 26, right, shows the position at 3X 10⁵ Is edited at an MOI of (a), wherein aav.44 has an editing efficacy of 60% of the original configuration of vector aav.3.

This experiment demonstrates that although the orientation of the portions within the AAV genome can vary, sufficient expression of CRISPR proteins and guide RNAs results. This suggests that the particular orientation or position of the regulatory elements relative to the encoded protein or RNA component may allow for controlled modulation of expression of AAV constructs (which contain one or more guides) packaging CasX.

Example 8: the efficacy of small CRISPR proteins is enhanced by the inclusion of additional regulatory elements in AAV vectors, a pair ofIs not possible with larger proteins。

The objective of these experiments was to demonstrate that the level of transcription mediated by AAV vectors delivering small CRISPR proteins (such as CasX) can be enhanced by the inclusion of different regulatory elements (intron sequences, enhancers, etc.), which typically do not fit in AAV vectors expressing large transgenic (e.g., spCas 9) plasmids.

Materials and methods：

Cloning and QC: AAV-cis plasmids were generated using a 4-piece Golden Gate module consisting of a pre-digested AAV backbone, DNA encoding a small CRISPR protein, and flanking 5 'and 3' DNA sequences, as described in example 1. The 5 'sequence contains the enhancer, protein promoter and N-terminal NLS, while the 3' sequence contains the C-terminal NLS, WPRE, poly (A) signal, RNA promoter and guide RNA containing spacer 12.7. The 5 'and 3' portions were purchased as gene fragments from Twist, amplified by PCR, and assembled into AAV vectors. Cloning and plasmid QC, nuclear transfection and FACS methods were performed as described in example 1.

Editing was enhanced by including post-translational regulatory elements (PTRE) 1, 2, 3 in AAV cis plasmid 3, which was tested in a combination of different promoters driving expression of CasX. A first set of promoters was tested; transgenic plasmids 4, 35, 36, 37, transgenic plasmids 5, 38, 39, 40 and transgenic plasmids 6, 42, 43 have CasX protein expression driven by the CMV, ubC, EFS, CMV-s promoter, respectively. The second set of constructs tested included PTRE between protein and poly (A) sequences and were produced with promoters Jet, jetUsp compared to the UbC promoter driving expression of CasX (transgenes 58, 72, 73, 74; transgenes 59, 75, 76, 77, and transgenes 53, 80, and 81, respectively). The sequence of PTRE is listed in Table 12 and the enhancer plus promoter sequence is listed in Table 13. In addition to the sequences encoding CasX (table 21) and one or more grnas (tables 18 and 19), the sequences of the additional components of the AAV constructs are listed in table 26.

Table 12: constructs and sequences of post-transcriptional elements tested on base constructs ID 4, 5, 6, 53, 58 and 59

Table 13: enhancer elements and sequences tested in combination with CMV core promoter

Results：

The effect of PTRE on transgene expression was assessed by cloning 3 enhancer sequences (PTRE 1, PTRE2 and PTR 3) into AAV-cis plasmid (construct 3) and constructs containing shorter protein promoters (constructs 4, 5, 6, 53, 57 and 58 containing 400bp, 234bp, 335bp, 400bp, 164bp and 326bp promoter sequences, respectively).

AAV-cis plasmid activity was first confirmed by nuclear transfection in mNPC-tdt cells. For each vector, the addition of PTRE enhanced editing activity at different levels (fig. 27). Table 14 provides the lengths of promoters and PTRE. Addition of PTRE2 to the transgene cassette showed the highest enhancement of CasX editing activity, 2-fold (58.5% versus 25%) increase in editing level of construct 36 compared to construct 4; the level of editing for construct 39 was increased 1.5-fold (35.4% versus 22.9%) compared to construct 5; the level of editing for construct 42 was increased 3-fold (30.5% versus 12%) compared to construct 6. The shortest enhancer sequence PTRE3 in constructs 37 and 43 also increased protein activity at different levels compared to other vectors.

An improvement in the level of editing was also observed when the construct was packaged into AAV. Inclusion of PTRE2 in the transgene increased editing in AAV vectors in a similar manner. The trend of editing on target observed in mNPC with AAV infection was generally correlated with AAV plasmid nuclear transfection dataset (fig. 28).

This trend was confirmed by testing another set of promoters containing these enhancer sequences therein. Among all AAV vectors tested, constructs comprising PTRE1 and PTRE2 in the genome produced an average 1.5-fold increase compared to the base vector (fig. 29). The unique combination of short promoters and these post-transcriptional sequences resulted in the identification of vectors with increased levels of editing of shorter promoters (e.g., aav.74), which is an advantage for AAV production, both below the load-bearing capacity limitations of AAV and allowing for the inclusion of more regulatory elements and CRISPR elements (e.g., guides) (fig. 30).

The results also demonstrate that inclusion of PTRE1 in the transgenic plasmid improved the level of editing of all promoters evaluated (fig. 31), with less variability, while PTRE2 produced the highest transgenic improvement, but with greater variability in the promoters tested.

Several constructs with tissue-specific neuronal enhancers upstream of a single constitutive promoter were tested. In this assay, 7 neuronal enhancer sequences (constructs 65 to 72) were cloned into a single AAV-cis plasmid (64) containing the core CMV promoter, and all of these sequences demonstrated superior editing via nuclear transfection over the base construct 64 (fig. 32). These constructs were also superior to construct 53, which contained the UbC promoter, but not to construct 3, which contained the complete CMV promoter (CMV enhancer+cmv core promoter).

Table 14: constructs with or without PTRE, and showing sequence length

The results demonstrate that the use of a small promoter in an AAV transgene construct allows for the inclusion of additional auxiliary elements. These additional auxiliary elements (such as post-transcriptional regulatory elements of AAV transgenes expressing CasX under the control of short and strong promoter sequences) can increase CasX expression and on-target editing while reducing load size so that all components can be incorporated into a single AAV vector.

Example 9: small CRISPR protein efficacy is enhanced by inclusion and combination of additional regulatory elements in AAV vectors

The objective of these experiments was to demonstrate that CRISPR protein and gRNA complex mediated editing can be enhanced in an integrated single AAV vector that can contain more than one guide RNA. In addition, experiments were conducted to show that inclusion and combination of many regulatory elements can enhance efficacy, and that larger AAV genomes with more regulatory sequences produce greater editing activity. The length of the auxiliary and regulatory sequences that can be included in AAV transgenes by CasX systems exceeds that which can be included in conventionally used CRISPR proteins, which are limited by the length of larger Cas proteins (such as Cas 9).

Materials and methods：

Plasmid cloning, QC and nuclear transfection were performed as described in example 1.

The orientation of multiple RNA transcription unit blocks called "guide RNA stacks" (each consisting of sgRNA scaffold-spacer 174.12.7, 1.74.12.2 or 174.Nt, driven by the U6 promoter) (fig. 35) was studied by: the two guide RNA stacks were cloned on the 3' end of poly (a) in tail-to-tail orientation (plasmid ID 45-49), or still in the same transcriptional orientation as the CasX protein/promoter, one on each side of the protein (plasmid id=50-52). Pentagonal boxes of the protein promoter and Pol III promoter describe the orientation of transcription (cone point; 5 'to 3' or 3 'to 5' orientation). The spacer sequence is 12.2 (TATAGCATACATTATACGAA (SEQ ID NO: 40807)); 12.7 (CTGCATTCTAGTTGTGGTTT (SEQ ID NO: 40800)); and NT (GGGTCTTCGAGAAGACCC (SEQ ID NO: 40505)). AAV vector production and titration were performed as described in example 1. AAV transduction and editing assessment via FACs sorting were performed as described in example 1.

Results：

FIG. 33 is a schematic diagram of the structure of a construct showing how the guide RNA components are combined in the various constructs (constructs 1 and 2). Fig. 34 shows a further configuration. The results of the editing assay depicted in fig. 36 demonstrate that constructs delivered as AAV transgenic plasmids to mNPC in architecture 1 were edited with enhanced efficacy. Different combinations of spacers and non-targeting spacers demonstrated that each individual guide RNA was active, although the architecture with one targeting spacer and one non-targeting spacer (constructs 45 and 46) produced a lower editing level of about 18%. Certain combinations of targeting spacers produce increased efficacy. The combination of spacer 12.7 with sequence CTGCATTCTAGTTGTGGTTT (SEQ ID NO: 40800) and spacer 12.2 with sequence TATAGCATACATTATACGAA (SEQ ID NO: 40807) (construct 48) was edited with significant potency in guide RNA architecture 1, whereas the editing potency of both sets 12.7 (construct 47) was 10% higher than that of construct 3 alone. 125 and 62.5ng of each CasX construct were nuclear transfected in mNPCs and edited by FACS evaluation 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

The results of fig. 37 show that guide RNA stack architecture 2 (see fig. 33) delivered as AAV transgene plasmid to mNPC also edited the target nucleotide. 125 and 62.5ng of each CasX construct were nuclear transfected in mNPCs and edited by FACS evaluation 5 days after transfection. Data are expressed as mean ± SEM of n=3 replicates.

The results of fig. 38 demonstrate that constructs 3, 45, 46, 47 and 48 delivered as AAV edit the target termination cassette in mNPC in guide RNA architecture 1. Aav.3, aav.45, aav.46, aav.47 and aav.48 were produced with transgene constructs 3 and 45, 46, 47 and 48, respectively. Each vector showed dose-dependent editing at the target locus (fig. 38, left panel). At the MOI of 3e5, aav.47 had <5% lower potency than the original orientation vector aav.3 (fig. 38, right panel).

These experiments demonstrate the feasibility of using multiple guide RNAs in combination with a complete Cas protein sequence in one AAV genome, which was previously not possible with larger CRISPR proteins (such as Cas 9) due to packaging limitations of AAV capsids. Furthermore, these experiments also demonstrate that multiple guide RNAs in an integrated vector also retain the ability to edit target nucleic acids.

Example 10: small CRISPR protein potency is enhanced by Nuclear Localization Sequence (NLS) selection。

Experiments were performed to determine if changes in the Nuclear Localization Sequences (NLS) used in the constructs could modulate the editing results in the AAV environment. In the larger context of optimizing AAV for editing using CasX proteins, this initial screening is the first attempt to determine which NLS should be used to advance the construct.

Materials and methods：

Cloning and QC: AAV vectors are cloned using a 4-piece Golden Gate module consisting of a pre-digested AAV backbone, DNA encoding a small CRISPR protein, and flanking 5 'and 3' DNA sequences. The 5 'sequence contains the enhancer, protein promoter and N-terminal NLS, while the 3' sequence contains the C-terminal NLS, WPRE, poly (A) signal, RNA promoter and guide RNA containing spacer 12.7. The 5 'and 3' portions were sequenced as gene fragments from Twist, PCR amplified, and assembled into AAV vectors by a circulating golden channel reaction using T4 ligase and BbsI. NLS sequences are shown in tables 15 and 16.

The assembly of AAV vectors and QC and nuclear transfection were performed as described in example 1. In addition to the sequences encoding CasX (table 21) and one or more grnas (tables 18 and 19), the sequences of the additional components of the AAV constructs are listed in table 26.

Table 15:5' NLS sequence

* The bolded sequence is NLS, while the non-bolded sequence is linker.

Table 16:3' NLS sequence

AAV transduction and editing level assessment was performed in mNSCT-tdT cells by FACS as described in example 1.

Results：

Initial plasmid nuclear transfection showed that many NLS permutations exhibited improved editing when compared to the control (1 x sv40 NLS on both N-and C-termini). In particular, N-terminal variants containing Cmyc or nucleoplasmin NLS were significantly better than SV40 NLS combinations (fig. 39). This trend of N-terminal NLS changes was replicated in AAV transduction, with Cmyc and nucleoplasmin NLS variants again being superior to SV40 NLS variants (fig. 40). Finally, the variation to keep Cmyc constant was tested (fig. 41), and the results demonstrated that the best construct contained Cmyc NLS at both the N-and C-termini.

The data indicate that selecting the amino acid sequence of the NLS enhances editing results in the AAV environment. In particular, NLS variants containing the N-terminal Cmyc show a significant improvement compared to the N-terminal SV40 NLS variants. Furthermore, the C-terminal Cmyc and Nuc variants improved editing compared to the SV40 NLS variants. Repetition of SV40 NLS appears to be detrimental to editing efficiency on both the N-and C-termini.

Example 11: small CRISPR protein expression is enhanced by the addition of introns in the 5' UTR。

The objective of this experiment was to demonstrate that the level of transcription mediated by AAV vectors delivering small CRISPR proteins (such as CasX) can be enhanced by the inclusion of different regulatory elements (such as intronic sequences taken from the viral, mouse or human genome) which typically would not fit in AAV vectors expressing large transgenic (e.g. spCas 9) plasmids.

Method：

AAV-cis plasmids will be generated using a 4-piece Golden Gate module consisting of a pre-digested AAV backbone, DNA encoding a small CRISPR protein, and flanking 5 'and 3' DNA sequences. The 5 'sequence will contain a protein promoter, including UbC, jeT, CMV, CAG, CBH, hSyn or other Pol2 promoter, intron region and N-terminal NLS, while the 3' sequence will contain a C-terminal NLS, poly a signal, RNA promoter and guide RNA containing spacer region 12.7. The 5 'and 3' portions will be PCR amplified and assembled as described in example 1. Cloning and plasmid QC, AAV virus production were performed as described in example 1 and editing level assessment was performed in mNPTC-tdT cells by FACS. Non-limiting examples of intron sequences to be incorporated into the constructs are listed in table 17.

Enhanced editing by inclusion of intron 36 (transgenic plasmid 59) will be tested against transgenic plasmid 58, which transgenic plasmid 58 is a basic construct without introns. The remaining introns are hypothetical intron sequences that can be used to encode future constructs of CasX and are derived from viral, mouse, and human sources.

Table 17: intron sequences for incorporation into base construct 58

Results：

The effect of introns on transgene expression was assessed by: 50 different introns were cloned into AAV-trans plasmid and then the edits were determined in a tdmamato assay.

The addition of an intron sequence generally increases the overall editing efficiency of AAV transgenes when compared to a base construct without an intron.

The expected results support that the addition of introns to AAV transgenes expressing CasX under the control of short and strong promoter sequences will be able to increase CasX expression and on-target editing while reducing load size, thereby further optimizing the AAV system.

Example 12: improved guide variants exhibit enhanced in vitro on-target activity

Experiments were performed to identify engineered guide RNA variants with increased activity at different genomic targets, including treatment-related mouse and human Rho exon 1. Previous assays identified a number of different "hot spot" regions (e.g., stem loops) within the scaffold sequence that had the potential to significantly increase editing efficiency as well as specificity. In addition, screening was performed to identify scaffold variants that would increase the overall activity of our CRISPR system in AAV vectors spanning multiple different PAM-spacer combinations without triggering off-target or non-specific editing. Achieving increased editing efficiency compared to current reference vectors would allow for the use of reduced viral vector doses in vivo studies, thereby improving the safety of AAV-mediated CasX-guidance systems.

Method：

New gRNA scaffolds and spacer variants were inserted into AAV transgene constructs for plasmid and viral vector validation (encoding the sequences in tables 18 and 19). The CasX 491 variant protein was used for all constructs evaluated in this experiment, however the present disclosure contemplates the use of any CasX variant, including those of table 3 and the coding sequences of table 21. We conceptually split AAV transgenes between ITRs into different parts consisting of our therapeutic cargo and accessory elements related to expression in mammalian cells, and our nuclease-guide RNA complex (protein nucleases, scaffolds, spacers). A schematic and conceptual portions thereof are shown in fig. 42. The nucleic acid sequences of the remaining components common to the various constructs are shown in table 26, the coding sequences for the guides are shown in tables 18 and 19, and the coding sequences for CasX are shown in table 21, to enable elucidation of the various permutations of the transgenes.

Cloning: each portion of the AAV genome is separated by a restriction enzyme site for modular cloning. These parts were purchased as gene fragments from Twist, amplified by PCR and digested with the corresponding restriction enzymes, washed and then ligated into vectors also digested with the same enzymes. The new AAV constructs were then transformed into chemically competent E.coli (Turbos or Stbl3 s). Transformed cells were allowed to recover for 1 hour in a shaking incubator at 37℃and then inoculated on kanamycin LB-agar plates, and allowed to grow at 37℃for 12 to 16 hours. Colonies were picked into 6mL kanamycin treated 2xyt and allowed to grow for 7 to 14 hours before miniprep and Sanger sequencing. The transformation and miniprep protocol was then repeated and sequence verification was again performed on the spacer cloning vector. Verified constructs were prepared in large quantities. To assess the quality of mass production, constructs were treated with XmaI (which cut at several sites per ITR) and XhoI (which cut once in AAV genome) in two separate digests. These digests and uncleaved constructs were then run on a 1% agarose gel and imaged on ChemiDoc. If the plasmid is >90% supercoiled, correctly sized and ITR complete, the construct continues to be tested via nuclear transfection and subsequently used for AAV vector production.

Table 18: guide sequence cloned into p59.491.u6.X.y. Plasmid (x=guide; y=spacer)

Table 19: the guide sequence cloned into the p59.491.u6.x.y. Plasmid (x=guide; y=spacer) has a spacerCompartment length variants

Table 20: sequences of AAV vector components shared by plasmids

Table 21: sequences of CasX for AAV

CasX	SEQ ID NO:	Nucleic acid sequences
			438	40577	ND
491	40578	ND
			527	40579	ND
535	40580	ND
			536	40581	ND
537	40582	ND
			583	40583	ND
668	40584	ND
			672	40585	ND
669	40586	ND
			670	40587	ND
676	40588	ND

* Nd=the sequences provided in the sequence listing are not described.

Reporter gene cell line: the neural progenitor cell line isolated from Ai9-tdTomato was cultured in pre-equilibrated mNPC medium (DMEM/F12 containing GlutaMax, 10mM HEPES, 1 XMEM nonessential amino acids, 1 Xpenicillin/streptomycin, 1:1000-mercaptoethanol, 1X B-27 supplement, vitamin A minus, 1 XN 2 with supplemented growth factors bFGF and EGF). Prior to testing, cells were dissociated using accutase, gently resuspended, and the complete separation of neurospheres monitored. The cells were then quenched with medium, centrifuged and resuspended in fresh medium. Cells were counted and used directly for nuclear transfection or 10,000 cells were seeded in 96-well plates coated with PLF (1X poly-DL-ornithine hydrobromide, 10mg/mL in sterile diH20, 1X laminin and 1X fibronectin) 2 days prior to AAV transduction.

HEK293T dual reporter cell lines were generated by knocking two transgene cassettes into HEK293T cells, which constitutively express exon 1 of the human RHO gene linked to GFP and exon 1 of the human p23h.rho gene linked to mscarlet. The modified cells were expanded by serial passage every 3-5 days and maintained in Fibroblast (FB) medium consisting of Du's modified Eagle Medium (DMEM; corning Cellgro, # 10-013-CV) supplemented with 10% fetal bovine serum (FBS; seraidigm, # 1500-500)) and 100 units/mL penicillin and 100mg/mL streptomycin (100 x-Pen-Strep; GIBCO # 15140-122), and may additionally include sodium pyruvate (100 x, thermofisher # 11360070), nonessential amino acids (100x ThermoFisher#11140050), HEPES buffer (100x ThermoFisher#15630080) and 2-mercaptoethanol (1000x ThermoFisher#21985023). Cells were incubated at 37℃and 5% CO 2. After 1-2 weeks, GFP+/mscarlet+ cells were mass-sorted into FB medium. The reporter cell line was expanded every 3-5 days by serial passage and maintained in FB medium in an incubator at 37 ℃ and 5% CO 2. Reporter gene clones were generated by limiting dilution. Clonal cell lines are characterized by flow cytometry, genomic sequencing, and functional modification of the RHO locus using previously validated RHO-targeted CasX molecules. The best reporter cell line was identified as the following: i) Wtrho.gfp and mutho.mscarlet with a single copy that integrates correctly within each cell; ii) maintaining a doubling time equivalent to that of the unmodified cells; and iii) results in a reduction in GFP and mscarlet fluorescence after disruption of the RHO gene when measured using the method described below.

Plasmid nuclear transfection: AAV cis-plasmids driving expression of the CasX scaffold guide system were nuclear transfected in mNPC using the Lonza P3 primary cell 96-well Nucleofector kit. For the ARPE-19 line, lonza SF solution and supplement were used. The plasmid was diluted to a concentration of 200 ng/. Mu.L, 100 ng/. Mu.L. mu.L of DNA from each construct was added to P3 or SF solutions containing 200,000tdTomato mNPC or ARPE-19 cells, respectively. The pooled solutions were subjected to nuclear transfection using the Lonza 4D Nucleofector system according to manufacturer's guidelines. Following nuclear transfection, the solution is quenched with a suitable medium. The solution was then aliquoted in triplicate (approximately 67,000 cells/well) in 96-well plates. 48 hours after transfection, the treated mNPC was supplemented with fresh mNPC medium containing growth factors and the treated ARPE-19 cells were supplemented with fresh FB medium. 5 days after transfection, tdTomato mNPC and ARPE-19 cells were isolated and activity assessed by FACS.

AAV vector production: suspension HEK293T cells were derived from parental HEK293T and grown in FreeStyle 293 medium. For screening purposes, small-scale cultures (20 mL to 30mL in 125mL Erlenmeyer flasks and stirred at 110 rpm) were diluted to a density of 1.5e+6 cells/mL on the day of transfection. The transgenic endotoxin-free pAAV plasmid with flanking ITR repeats was co-transfected with a plasmid providing an adenovirus helper gene for replication and AAV rep/cap genome using PEIMax (Polysciences) in serum-free optmem medium. Cultures were supplemented with 10% CDM4HEK293 (HyClone) 3 hours post-transfection. After three days, the culture was centrifuged at 1000rpm for 10 minutes to separate the supernatant from the cell pellet. The supernatant was mixed with 40% PEG 2.5M NaCl (8% final concentration) and incubated on ice for at least 2 hours to precipitate AAV viral particles. Cell pellets containing most of the AAV vector were resuspended in lysis medium (0.15M NaCl, 50mM Tris HCl, 0.05% Tween, pH 8.5), sonicated on ice (15 seconds, 30% amplitude) and treated with Benzonase (250U/. Mu.L, novagen) for 30 minutes at 37 ℃. The crude lysate and PEG-treated supernatant were then spun at 4000rpm for 20 minutes at 4 ℃ to resuspend PEG-precipitated AAV (precipitate), wherein the crude lysate (supernatant) without cellular residues was further clarified using a 0.45 μm filter.

To determine viral genome titres, 1 μl from crude lysate virus was digested with DNase and ProtK followed by quantitative PCR. mu.L of digested virus was used in 25. Mu.L of qPCR reaction consisting of IDT prime mix and a set of primers and 6' FAM/Zen/IBFQ probe (IDT) designed to amplify CMV promoter region (Fwd 5'-CATCTACGTATTAGTCATCGCTATTACCA-3' (SEQ ID NO: 40801), rev 5'-GAAATCCCCGTGAGTCAAACC-3' (SEQ ID NO: 40802), probe 5'-TCAATGGGCGTGGATAG-3' (SEQ ID NO: 40803) or 62bp fragment in AAV2-ITR (Fwd 5'-GGAACCCCTAGTGATGGAGTT-3' (SEQ ID NO: 40804), rev5'-CGGCCTCAGTGAGCGA-3' (SEQ ID NO: 40805), probe 5'-CACTCCCTCTCTGCGCGCTCG-3' (SEQ ID NO: 40806)) A ten-fold dilution of AAV ITR plasmid (2e+9 to 2e+4DNA copies/mL each 5. Mu.L) was used as reference standard to calculate the titer of the virus sample (viral genome (vg)/mL.) the QPCR procedure was set to an initial denaturation step of 5 minutes at 95℃followed by 1 minute cycles of denaturation at 40℃and 1 minute extension at 60 ℃.

AAV transduction: at 48 hours prior to AAV transduction, 10,000 cells/well of mNPC were seeded in PLF coated wells in 96-well plates. All viral infection conditions were performed in triplicate with a normalized number of vg in the experimental vector in a series of 3-fold dilutions of the multiplicity of infection (MOI) ranging from about 1.0e+6vg/cell to 1.0e+4vg/cell. The calculation was based on an estimated number of 20,000 cells per well at the time of transfection. A final volume of 50. Mu.L of AAV vector diluted in pre-equilibrated mNPC medium supplemented with bFGF/EGF growth factor (20 ng/ml final concentration) was applied to each well. 48 hours after transfection, complete medium exchange was performed with fresh medium supplemented with growth factors. Editing activity (tdT + cell quantification) was assessed by FACS 5 days after transfection.

edit-Activity was assessed by FACS: 5 days after transfection, tdTomato mNPC or ARPE-19 cells treated in 96-well plates were washed with dPBS and treated with 50. Mu.L of TrypLE and trypsin (0.25%) for 15 min and 5 min, respectively. After cell dissociation, the treated wells were quenched with medium containing DMEM, 10% FBS and 1X penicillin/streptomycin. Resuspended cells were transferred to a round bottom 96-well plate and centrifuged at 1000Xg for 5 min. The cell pellet was then resuspended in dPBS containing 1 XPPI and the plate loaded into an Attune NxT flow cytometer autosampler. The Attune NxT flow cytometer was run using the following gating parameters: FSC-A XSSC-A to select cells, FSC-H XSSC-A to select single cells, FSC-A XSVL 1-A to select DAPI negative living cells, FSC-A XSYL 1-A to select tdTomato positive cells.

NGS analysis of indels at the mRHO exon 1 locus: 5 days after transfection, tdTomato mNPC treated in 96-well plates was washed with dPBS and treated with 50. Mu.L of TrypLE and trypsin (0.25%) for 15 min and 5 min, respectively. After cell dissociation, the treated wells were quenched with medium containing DMEM, 10% fbs and 1X penicillin/streptomycin. The cells were then centrifuged and the resulting cell pellet washed with PBS and then treated for gDNA extraction using zymomini DNA kit according to the manufacturer's instructions. To assess the level of editing occurring at the mouse RHO exon 1 locus, a set of primers (Fwd 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNGCAGCCTT GGTCTCTGTCTACG-3' (SEQ ID NO: 40595); in particular, these primers contained additional sequences at the 5 'end to introduce Illumina reads and 2 sequences and 16nt random sequences serving as Unique Molecular Identifiers (UMI) the quality and quantification of the amplicon was assessed using Fragment Analyzer DNA analysis kit (Agilent, dsDNA 35-1500 bp) the amplicon was sequenced on Illumina Miseq according to manufacturer's instructions, the original fastq file from the sequencing was processed by (1) trimming the sequence with the program cutadapt (v.2.1) to introduce Illumina reads and 2 sequences and 16nt random sequences serving as Unique Molecular Identifiers (UMI), and (3) combining sequences from reads 1 and 2 into a single insert sequence using the program sh2 (v.2.00) and (3) allowing the consensus insert sequence to run along with the sequence at any of the 35-1500bp intervals in the 35-3 'read window of the 35-35 percent of the 35-1500bp spacer sequence and the sequence to be deleted at any of the 3' end of the 35-30 bp spacer sequence.

Results：

Different editing experiments were performed to quantify the on-target cleavage mediated by CasX 491 (guide 174 and 229-237) paired with the gRNA scaffold variant, with different spacers targeting multiple genomic loci of interest. The construct was cloned into AAV backbone p59 flanked by ITR2 sequences, driving expression of protein Cas 491 under the control of the CMV promoter, and scaffold-spacer under the control of the human U6 promoter.

The mNPC-tdT reporter cell line was used to evaluate single cleavage efficiency at the endogenous mouse RHO exon 1 locus (spacer 11.30, CTC PAM). The dual reporter gene system integrated in the ARPE-19 derived cell line was also used to evaluate target editing at the exogenously expressed human WT Rho locus (spacer 11.1, ctc PAM).

Scaffold variants with spacer 11.30 were tested by nuclear transfection in mouse NPC cell lines at two different doses of 1000ng and 500 ng. The construct was compared for activity to the current baseline gRNA scaffold 174. Constructs expressing scaffold variants 231, 233, 234, 235 exhibited higher levels than constructs with scaffold 174.11.30 (fig. 43A and 43B). Scaffold 235 showed a 2-fold increase in activity at the mRHO exon 1 locus compared to gRNA scaffold 174. We further confirmed that by transfecting the dual reporter ARPE-19 cell line with constructs p59.491.174.11.1 and p59.491.235.11.1 and the non-target spacer control nuclei, scaffold 235 continued to increase activity without increasing off-target cleavage. Spacer 11.1 targets the exogenously expressed hRHO-GFP gene. Scaffold 235 showed 3-fold increased activity compared to 174 (9% versus 3% Rho-GFP cells, fig. 44A and 44B, respectively). Allele specificity was assessed by observing the frequency of populations of hP23H-RHO-Scarlett cells whose sequences differ from the wild type by 1 bp.

We also attempted to demonstrate that these scaffold variants are packaged efficiently in AAV and remain effective when delivered virally. In comparison to mNPC infected with aav.491.174.11.30 at 3.0e+5moi, mNPC transduced with AAV vector expressing guide scaffold 235 with spacer 11.30 (on target, mouse WT RHO) showed increased activity (> 5-fold increase at the on target locus, fig. 45A and 45B), where no significant off-target indels were detected with both aav.491.174.11.31 and aav.491.235.11.31 vectors targeting the P23H-RHO SNP, respectively.

The effect of spacer length was evaluated: another set of experiments was performed to test whether spacer length variants could increase on-target activity. Spacer 11.39, 11.38 and spacer 11.37 (19 nt P23H RHO), 11.36 (18 nt P23H RHO) were designed from parental spacer 11.30 (20 nt WT RHO) and 11.31 (20 nt P23H RHO), respectively, containing a 1bp or 2bp truncation at the 3' end of the sequence. mfNPC-tdT cells were nuclear transfected with 1000ng and 500ng constructs p59.491.174.11.30 (20 nt WT RHO), p59.491.174.11.39 (19 nt WT RHO), p49.491.174.11.38 (18 nt WT RHO) and the level of editing was assessed after 5 days. All truncated spacer versions increased the level of editing (fig. 46A and 46C), with the highest increase observed with the p59.491.11.39 construct (about a 2-fold increase achieved with the 19bp spacer relative to the 20bp spacer length construct). No increase in off-target cleavage was observed with the truncated spacer variant targeting the 11.31 spacer of the mouse P23H-RHO locus (fig. 46B).

These results support that scaffold variants with structural mutations can be engineered to have increased activity in dual reporter gene systems studying treatment-related genomic targets such as the mouse and human RHO exon 1 loci. Furthermore, although the newly characterized scaffolds showed a > 2-fold overall increase in activity, no off-target cleavage with a 1bp mismatch spacer was detected. This is associated with allele-specific treatment strategies such as adRP 23H Rho, where the mutant allele differs from the WT sequence by 1 nucleotide, targeted by spacer 11.31. This study further demonstrates the use of the guide scaffold 235 in AAV vectors designed for P23H RHO rescue and genotoxicity studies, as well as other therapeutic targets.

Example 13: improved scaffolds and guide variants exhibit enhanced on-target activity in vivo。

Experiments were performed to demonstrate that engineered CasX and sgRNA guide and spacer variants comprising structural mutations with increased selectivity and on-target activity lead to increased editing when delivered in vivo to photoreceptors in mouse retina, with the spacer targeting therapeutically relevant levels of P23 residues in WT. Here we assessed whether the vector expressing CasX variant 491, guide variant 235 and spacer 11.39 increased the level of editing in vivo compared to the parental CasX 491, guide variant 174 and spacer 11.30.

Materials and methods：

Production of AAV plasmids and viral vectors: casX variant 491 under the control of the RHO promoter and sgRNA guide variant 174 with spacer 11.30 and spacer 11.31 (AAGTGGCTCCGCACCACGCC (SEQ ID NO: 40503)) or sgRNA guide variant 235 with spacer 11.39 (AAGGGGCTCCGCACCACGCC (SEQ ID NO: 40531)) and 11.37 (AAGTGGCTCCGCACCACGC (SEQ ID NO: 40535) targeting the P23 residue of mouse RHO exon 1) and under the U6 promoter were cloned into P59 plasmids flanking AAV2 ITR.

Cloning: each portion of the AAV genome is separated by a restriction enzyme site for modular cloning. These parts were purchased as gene fragments from Twist, amplified by PCR and digested with the corresponding restriction enzymes, washed and then ligated into vectors also digested with the same enzymes. Cas X variant 491 under the control of the RHO promoter and scaffold variants 174 and 235 under the control of the human U6 promoter were cloned into the AAV backbone flanking AAV2 ITRs. The spacers 11.30, 11.31 and variants 11.39, 11.37 were cloned into pAAV. RHO.491.174 and pAAV. RHO.491.235, respectively, using the Golden Gate clone. The new AAV constructs were then transformed into chemically competent E.coli (Stbl 3 s). Verified constructs were prepared in large quantities. To assess the quality of mass production, constructs were treated with XmaI (which cut at several sites per ITR) and XhoI (which cut once in AAV genome) in two separate digests. If the plasmid is > 90% supercoiled, correctly sized and ITR complete, the construct is then used for AAV vector production.

AAV vector production: suspension HEK293T cells were derived from parental HEK293T and grown in FreeStyle 293 medium. 500mL of culture (1L Erlenmeyer flask, stirring at 110 rpm) was diluted to a density of 2e+6 cells/mL on the day of transfection. The transgenic endotoxin-free pAAV plasmid with flanking ITR repeats was co-transfected with a plasmid providing an adenovirus helper gene for replication and AAV rep/cap genome using PEIMax (Polysciences) in serum-free optmem medium. Cultures were supplemented with 10% CDM4HEK293 (HyClone) 3 hours post-transfection. After three days, the culture was centrifuged at 1000rpm for 10 minutes to separate the supernatant from the cell pellet. The supernatant was mixed with 40% PEG 2.5M NaCl (8% final concentration) and incubated on ice for at least 2 hours to precipitate AAV viral particles. Cell pellets containing most of the AAV vector were resuspended in lysis medium (0.15M NaCl, 50mM Tris HCl, 0.05% Tween, pH 8.5), sonicated on ice (15 seconds, 30% amplitude) and treated with Benzonase (250U/. Mu.L, novagen) for 30 minutes at 37 ℃. The crude lysate and PEG-treated supernatant were then spun at 4000rpm for 20 minutes at 4 ℃ to resuspend PEG-precipitated AAV (precipitate), wherein the crude lysate (supernatant) without cellular residues was further clarified using a 0.45 μm filter. AAV lysates were purified using affinity chromatography (POROS CaptureSelect AAVX, thermoFisher). The eluate was buffer exchanged and concentrated in PBS +200mm nacl +0.001% Pluronic.

To determine viral genome titres, 1 μl from crude lysate virus was digested with DNase and ProtK followed by quantitative PCR. mu.L of digested virus was used in 25. Mu.L of qPCR reaction consisting of IDT prime mix and a set of primers and 6' FAM/Zen/IBFQ probe (IDT) designed to amplify a 62bp fragment located in AAV2-ITR (Fwd 5'-GGAACCCCTAGTGATGGAGTT-3' (SEQ ID NO: 40804); rev 5'-CGGCCTCAGTGAGCGA-3' (SEQ ID NO: 40805), probe 5'-CACTCCCTCTCTGCGCGCTCG-3' (SEQ ID NO: 40806)). The titer (viral genome (vg)/mL) of the viral sample was calculated using AAV ITR plasmid as a reference standard. The QPCR procedure was set to: an initial denaturation step of 5 min at 95℃followed by 40 cycles of denaturation at 95℃for 1 min and annealing/extension at 60℃for 1 min.

Subretinal injection C57BL6J mice were obtained from jackson laboratory (Jackson Laboratories) and maintained in a normal 12 hour light/dark cycle. Subretinal injection was performed on 3-4 week old mice. Anesthetized mice were inhaled with isoflurane. Promecaine (0.5%) was topically applied to the cornea and the eye was dilated with drops of topiramate (1%) and phenylephrine (2.5%). During surgery, general gel was used to maintain eye lubrication. Under a surgical microscope, an ultrafine 30/2 gauge disposable needle was passed through the sclera, equator, and vicinity of the limbus to form a small hole in the vitreous cavity. Using a blunt needle, 1. Mu.L to 1.5. Mu.L of virus was injected directly into the subretinal space between the RPE and the retinal layer. Each mouse of the experimental group was injected with 1.5.0e+9 viral genome (vg)/eye.

NGS analysis: 3 weeks after injection, animals were sacrificed and eyes were removed in fresh PBS. The whole retina was isolated from the eye cup and gDNA extraction treatment was performed using dnasy blood and tissue kit (Qiagen) according to the manufacturer's instructions. The amplicon was amplified from 200ng gDNA using a set of primers (Fwd 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNGCAGCCTT GGTCTCTGTCTACG-3' (SEQ ID NO: 40595); rev 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCCCAGTCTCTCT GCTCATACC-3' (SEQ ID NO: 40596)) targeting the mouse RHO exon 1 locus, bead purified (Beckman coulter, agencourt Ampure XP), and then amplified to incorporate the illuminena linker sequence. Specifically, these primers contain additional sequences at the 5' end to introduce Illumina reads and 2 sequences and a 16nt random sequence that serves as a Unique Molecular Identifier (UMI). The quality and quantification of the amplicons was assessed using the Fragment Analyzer DNA assay kit (Agilent, dsDNA 35-1500 bp). Amplicons were sequenced on an Illumina Miseq according to the manufacturer's instructions. The original fastq file from sequencing was processed as follows: (1) Trimming the sequence for quality and adaptor sequence using the program cutadapt (v.2.1); (2) Combining sequences from reads 1 and 2 into a single insert sequence using program flash2 (v 2.2.00); and (3) running the consensus insert sequence along with the desired amplicon sequence and spacer sequence through the program CRISPResso2 (v 2.0.29). The procedure quantifies the percentage of reads modified in the window around the 3 'end of the spacer (30 bp window centered at-3 bp from the 3' end of the spacer). The activity of CasX molecules is quantified as the total percentage of reads containing insertions, substitutions and/or deletions anywhere within the window.

Results：

The reference vector aav.491.174.11.30 (on target) achieved about 8% editing in all samples (fig. 47a; n=8 retinas). Similar vectors with spacer 11.31 (off-target, 1bp mismatch with 11.30 targeting the P23H-RHO SNP) showed background editing levels (about 0.4%). AAV vectors expressing the scaffold variant 235 and spacer 11.39 achieved a more than 2-fold improvement over aav.491.174.11.30 parental vector (fig. 47B), averaging 16% editing, and up to 25% in some retinas. This increase in editing on target was still selective, as there was no increase in the level of off-target of spacer 11.37 (targeting the P23H-RHO SNP with a 1bp mismatch to spacer 11.39) compared to the AAV.491.174.11.31 parental vector.

These experiments demonstrate the proof of concept that expression of CasX 491 driven by a rod photoreceptor selective promoter with a scaffold 174 and a spacer targeting the mouse P23 RHO locus can achieve editing of therapeutically relevant levels at the P23 mouse locus when subretinally delivered by AAV in the mouse retina. These results also support that the editing level achieved by engineered sgRNA guide variants (235) and spacer variants (11.39) screened from previous in vitro are also translatable in vivo and retain allele-specific selectivity. This study further demonstrates the use of the guide scaffold 235 in AAV vectors designed for P23H RHO rescue and genotoxicity studies, as well as other therapeutic targets.

The results of examples 11 and 12 support that scaffold variants with structural mutations can be engineered to have increased activity in dual reporter gene systems that study treatment-related genomic targets, such as the mouse and human RHO exon 1 loci. Furthermore, although the newly characterized 235 scaffold showed a > 2-fold increase in overall activity, no off-target cleavage with a 1bp mismatch spacer was detected. This is associated with allele-specific treatment strategies such as adRP 23H Rho, where the mutant allele differs from the WT sequence by 1 nucleotide, targeted by spacer 11.31. The present study was conducted to further demonstrate the use of the guide scaffold 235 in AAV vectors designed for mouse P23H RHO rescue and genotoxicity studies, as well as other therapeutic targets.

Example 14: improved CasX variants exhibit enhanced in vitro on-target activity

The CasX protospacer adjacent motif allows for precise genome targeting, which is necessary for various genome editing therapeutic applications, such as autosomal dominant RHO, which require allele-specific targeting of the P23H mutation without altering the wild type sequence.

Experiments were conducted to investigate whether a rationally designed engineered CasX nuclease with an introduced mutation that would be expected to increase CTC-PAM mediated on-target activity while maintaining high fidelity, and with reduced off-target events, increased the level of editing at the endogenous mouse RHO locus when delivered to rod photoreceptor cells in vivo.

In addition, experiments were performed to further demonstrate the use of the guide scaffold 235 in AAV vectors designed for mouse P23H RHO rescue and genotoxicity studies, as well as other therapeutic targets.

Method：

The CasX protein variants identified in the different assays observing PAM activity were selected for their increased activity at CTC PAM. The CasX protein was cloned into AAV transgene constructs for plasmid and viral vector validation. We conceptually split AAV transgenes between ITRs into different parts consisting of our therapeutic cargo and accessory elements associated with expression in mammalian cells, as well as our nuclease-guide RNA complex (protein, scaffold, spacer).

Cloning: each portion of the AAV genome is separated by a restriction enzyme site for modular cloning. These parts were purchased as gene fragments from Twist, amplified by PCR and digested with the corresponding restriction enzymes, washed and then ligated into vectors also digested with the same enzymes. The new AAV constructs were then transformed into chemically competent E.coli (Stbl 3 s). Verified constructs were prepared in large quantities. To assess the quality of mass production, constructs were treated with XmaI (which cut at several sites per ITR) and XhoI (which cut once in AAV genome) in two separate digests. These digests and uncleaved constructs were then electrophoresed on a 1% agarose gel. If the plasmid is > 90% supercoiled, correctly sized and ITR complete, the construct continues to be tested via nuclear transfection and subsequently used for AAV vector production.

Reporter gene cell line: immortalized neural progenitor cell lines isolated from Ai9-tdTomato were cultured in pre-equilibrated mNPC medium (DMEM/F12 containing GlutaMax, 10mM HEPES, 1 XMEM nonessential amino acids, 1 Xpenicillin/streptomycin, 1:1000-mercaptoethanol, 1X B-27 supplement, vitamin A minus, 1 XN 2 with supplemented growth factors bFGF and EGF) before testing cells were isolated using accutase, gently resuspended, and complete isolation of neurospheres monitored.

HEK293T dual reporter cell lines were generated by knocking two transgene cassettes into HEK293T cells, which constitutively express exon 1 of the human RHO gene linked to GFP and exon 1 of the human p23h.rho gene linked to mscarlet. The modified cells were expanded by serial passage every 3-5 days and maintained in Fibroblast (FB) medium consisting of Du's modified Eagle Medium (DMEM; corning Cellgro, # 10-013-CV) supplemented with 10% fetal bovine serum (FBS; seraidigm, # 1500-500)) and 100 units/mL penicillin and 100mg/mL streptomycin (100 x-Pen-Strep; GIBCO # 15140-122), and may additionally include sodium pyruvate (100 x, thermofisher # 11360070), nonessential amino acids (100x ThermoFisher#11140050), HEPES buffer (100x ThermoFisher#15630080) and 2-mercaptoethanol (1000x ThermoFisher#21985023). Cells were incubated at 37℃and 5% CO 2. After 1-2 weeks, GFP+/mscarlet+ cells were mass-sorted into FB medium. The reporter cell line was expanded every 3-5 days by serial passage and maintained in FB medium in an incubator at 37 ℃ and 5% CO 2. Reporter gene clones were generated by limiting dilution. Clonal cell lines are characterized by flow cytometry, genomic sequencing, and functional modification of the RHO locus using previously validated RHO-targeted CasX molecules. The best reporter cell line was identified as the following: i) WT-rho.gfp and P23H-rho.mscarlet with a single copy that integrates correctly within each cell; ii) maintaining a doubling time equivalent to that of the unmodified cells; and iii) results in a reduction in GFP and mscarlet fluorescence after disruption of the RHO gene when measured using the method described below.

Plasmid nuclear transfection: AAV cis-plasmids driving expression of the CasX scaffold guide system were nuclear transfected in mNPC using the Lonza P3 primary cell 96-well Nucleofector kit. For the ARPE-19 line, lonza SF solution and supplement were used. The plasmid was diluted to a concentration of 200 ng/. Mu.L, 100 ng/. Mu.L. mu.L of DNA from each construct was added to P3 or SF solutions containing 200,000tdTomato mNPC or ARPE-19 cells, respectively. The pooled solutions were subjected to nuclear transfection using the Lonza 4D Nucleofector system according to manufacturer's guidelines. Following nuclear transfection, the solution is quenched with a suitable medium. The solution was then aliquoted in triplicate (approximately 67,000 cells/well) in 96-well plates. 48 hours after transfection, the treated cells were supplemented with fresh mNPC medium containing growth factors. 5 days after transfection, tdTomato mNPC was isolated and activity assessed by FACS.

NGS analysis of indels at the mRHO exon 1 locus: 5 days after transfection, tdTomato mNPC treated in 96-well plates was washed with dPBS and treated with 50. Mu.L of TrypLE and trypsin (0.25%) for 15 min and 5 min, respectively. After cell dissociation, the treated wells were quenched with medium containing DMEM, 10% FBS and 1X penicillin/streptomycin. The cells were then centrifuged and the resulting cell pellet washed with PBS and then treated for gDNA extraction using zymomini DNA kit according to the manufacturer's instructions. To assess the level of editing that occurs at the mouse RHO exon 1 locus, amplicons were amplified from 200ng gDNA using a set of primers (Fwd 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNGCAGCCTT GGTCTCTGTCTACG-3' (SEQ ID NO: 40595); rev 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCCCAGTCTCTCT GCTCATACC-3' (SEQ ID NO: 40596)), bead purified (Beckman coulter, agencourt Ampure XP), and then amplified again to incorporate the illuminea linker sequence. Specifically, these primers contain additional sequences at the 5' end to introduce Illumina reads and 2 sequences and a 16nt random sequence that serves as a Unique Molecular Identifier (UMI). The quality and quantification of the amplicons was assessed using the Fragment Analyzer DNA assay kit (Agilent, dsDNA 35-1500 bp). Amplicons were sequenced on an Illumina Miseq according to the manufacturer's instructions. The original fastq file from sequencing was processed as follows: (1) Trimming the sequence for quality and adaptor sequence using the program cutadapt (v.2.1); (2) Combining sequences from reads 1 and 2 into a single insert sequence using program flash2 (v 2.2.00); and (3) running the consensus insert sequence along with the desired amplicon sequence and spacer sequence through the program CRISPResso2 (v 2.0.29). The procedure quantifies the percentage of reads modified in the window around the 3 'end of the spacer (30 bp window centered at-3 bp from the 3' end of the spacer). The activity of the CasX molecule is quantified as the total percentage of reads containing insertions, substitutions and/or deletions anywhere within the window

Results：

Engineered mutations in existing assays identify CasX variants that have the ability to increase the overall activity, specificity, and activity using spacers that target CTC-PAM sites of nucleases. These mutations of CasX 491 protein resulted in CasX variant proteins 515, 527, 528, 535, 536 and 537 (see table 3 for sequences).

Multiple editing screens were performed to quantify the level of on-target editing mediated by these CasX variant proteins paired with gRNA scaffold 174 or 235 and different spacers targeting multiple genomic loci of interest (the coding sequences for the guide and spacers are given in tables 18 and 19). The construct was cloned into AAV backbone p59 flanked by ITR2 sequences, driving expression of Cas X under the control of the CMV promoter, and scaffold-spacer under the control of the human U6 promoter. The mNPC-tdT reporter cell line was used to evaluate the single cleavage efficiency at the endogenous mouse RHO exon 1 locus (spacer 11.39, CTC PAM, FIG. 48A). The dual reporter gene system integrated in the ARPE-19 derived cell line was also used to evaluate target editing at the exogenously expressed human WT Rho locus (spacer 11.41, CTC PAM) or at the P23H-RHO locus (spacer 11.43, CTC PAM, FIG. 48B).

The CasX protein variant with spacer 11.39 was tested by nuclear transfection in a mouse NPC cell line at two different doses of 1000ng and 500 ng. The construct was compared for activity with the parent CasX 491. AAV constructs expressing CasX 535 and 537 with scaffold 174 and spacer 11.30 exhibited the greatest editing activity at the mRHO exon 1 locus of any CasX variant (as percent edited, fig. 48A), which increased 1.5 fold relative to CasX 491 (fig. 48C, normalized to 1) without increased off-target cleavage, as shown by nuclear transfection of the protein variant with spacer 11.37 (targeting mutant P23H-Rho allele, fig. 48B).

Experiments were then performed to determine if the improvement observed at the mouse RHO locus with mutant variants was translated at the human RHO locus, which is more clinically relevant. Dual reporter ARPE-19 cell lines were nuclear transfected with constructs expressing CasX variant proteins paired with sgRNA-scaffolds 235 with either spacer 11.41 or spacer 11.43, targeting human RHO. CasX 535 and 537 also showed more than a 1.5-fold increase in editing activity compared to CasX 491 when targeting exogenous WT-RHO-GFP loci (approximately 4.3% and 4.1% editing compared to 2.4% editing of RHO-GFP cells, respectively, fig. 49A and 49B). Constructs expressing CasX variants 515, 527, and 536 were edited at a similar level as CasX 491. Interestingly, all variant proteins showed improved editing compared to CasX 491 when using a spacer targeting the P23H-RHO-mscarlet locus. Constructs expressing CasX 527 (up to 2-fold) and CasX 535 (up to 1.8-fold) achieved the highest level of activity.

Finally, we attempted to demonstrate that these protein variants are packaged efficiently in AAV and remain effective when delivered in viral form. The increased activity at the on-target locus was shown by mNPC (on-target, mouse WT RHO) transduced with AAV vectors expressing CasX 527, 535 and 537 and guide scaffold 235 with spacer 11.39 relative to AAV CasX 491 transduced with 3.0e+5moi and guide scaffold 235 with spacer 11.39 (> 2-fold addition, fig. 50A and 50B). Fold increases in activity were observed in a dose-dependent manner.

These results support that CasX variants with structural mutations can be engineered for increased editing activity in dual reporter gene systems studying therapeutically relevant genomic targets, such as the mouse and human RHO exon 1 loci. Furthermore, while the newly characterized variants showed an overall increase in activity of 1.5-2 fold, they retained allele-specific targeting, but no off-target cleavage with a 1bp mismatch spacer was detected. This is relevant to allele-specific treatment strategies such as editing at adRP 23H Rho, where the mutant allele differs from the WT sequence by 1 nucleotide (targeted by spacer 11.37). This study further demonstrated the use of CasX variants 527, 535, 536 with scaffold 235 in AAV vectors designed for P23H RHO rescue and genotoxicity studies, as well as other therapeutic targets.

Example 15: AAV constructs with CasX and targetingSeat for seat

Experiments were performed to demonstrate the ability of CasX to edit endogenous RHO loci in mouse retina in vivo, with the spacer region targeting the P23 residues at therapeutically relevant levels, to generate proof of concept data that would demonstrate and inform experiments in P23H mouse disease models. Here, we evaluated whether CasX variant 491 and guide variant 174 and the spacer targeting the P23 locus of the mouse RHO gene could produce significant, detectable in the retina when injected subretinally, and evaluated the efficacy and safety of two different viral doses (1.0e+9 and 1.0e+10 vg). In the case of AdRP, rescue of 10% of rod photoreceptors restores vision. Thus, editing 10% of the RHO loci in rod photoreceptors in the retina can provide therapeutic benefit in a disease setting by reducing the level of mutant rhodopsin and preventing rod photoreceptor degeneration.

Materials and methods：

AAV plasmid and viral vector production

The CasX variant 491 under the control of the CMV promoter and the RNA guide variant 174/spacer 11.30 under the U6 promoter (AAGGGGCTCCGCACCACGCC (SEQ ID NO: 40502), the P23 residue of the target mouse RHO exon 1) were cloned into the pAAV plasmid flanked by AAV2 ITRs. Aav.491.174.11.30 vector was produced in HEK293 cells using the triple transfection method.

Subretinal injection

C57BL/6J mice were obtained from jackson laboratories and maintained in the normal 12 hour light/dark cycle. Subretinal injection was performed on 5-6 week old mice. Anesthetized mice were inhaled with isoflurane. Promecaine (0.5%) was topically applied to the cornea and the eye was dilated with drops of topiramate (1%) and phenylephrine (2.5%). During surgery, general gel was used to maintain eye lubrication. Under a surgical microscope, an ultrafine 30/2 gauge disposable needle was passed through the sclera, equator, and vicinity of the limbus to form a small hole in the vitreous cavity. 1-1.5. Mu.L of virus was injected directly into the subretinal space between the RPE and the retinal layer using a blunt needle. Each experimental group (n=5) was injected with 1e+9vg or 1e+10 viral genome (vg) per eye in one eye and AAV formulation buffer in the opposite eye.

NGS analysis

3 weeks after injection, animals were sacrificed and eyes were removed in fresh PBS. The whole retina was isolated from the eye cup and gDNA extraction treatment was performed using DNeasy Blood Tissue Kit (Qiagen) according to the manufacturer's instructions. The amplicon was amplified from 200ng gDNA using a set of primers (Fwd 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNGCAGCCTTGGTCTCTGTCTACG-3' (SEQ ID NO: 40595); rev 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCCCCAGTCTCTCTGCTCATACC-3' (SEQ ID NO: 40596)) targeting the mouse RHO exon 1 locus, bead purified (Beckman coulter, agencourt Ampure XP), and then amplified to incorporate the illuminena linker sequence. Specifically, these primers contain additional sequences at the 5' end to introduce Illumina reads and 2 sequences and a 16nt random sequence that serves as a Unique Molecular Identifier (UMI). The quality and quantification of the amplicons was assessed using the Fragment Analyzer DNA assay kit (Agilent, dsDNA 35-1500 bp). Amplicons were sequenced on an Illumina Miseq according to the manufacturer's instructions. The original fastq file from sequencing was processed as follows: (1) Trimming the sequence for quality and adaptor sequence using the program cutadapt (v.2.1); (2) Combining sequences from reads 1 and 2 into a single insert sequence using program flash2 (v 2.2.00); and (3) running the consensus insert sequence along with the desired amplicon sequence and spacer sequence through the program CRISPResso2 (v 2.0.29). The procedure quantifies the percentage of reads modified in the window around the 3 'end of the spacer (30 bp window centered at-3 bp from the 3' end of the spacer). The activity of CasX molecules is quantified as the total percentage of reads containing insertions, substitutions and/or deletions anywhere within the window.

Immunohistology：

Mice were euthanized 3-4 weeks after injection. The removed eyes were placed in 10% formalin overnight at 4 ℃. Retinas were excised from the eye cups, rinsed thoroughly in PBS, and immersed in a 15% -30% sucrose gradient. Tissues were embedded at Optimal Cutting Temperature (OCT), frozen on dry ice, and then transferred to-80 ℃ for storage. A20. Mu.M section was cut using a cryostat. Prior to antibody labelling, the sections were blocked in blocking buffer (2% normal goat serum, 1% BSA, 0.1% Triton-X100) for > 1 hour at room temperature. Antibodies used were anti-mouse HA (abcam, 1:500) and Alexa Fluor488 rabbit anti-mouse (Invitrogen, 1:2000). Sections were counterstained with DAPI to label nuclei, fixed on slides and imaged on fluorescence microscopy.

Results：

We assessed the ability of CasX to edit the P23 RHO locus in the mouse retina. Two treatment-related doses of 1.0e+9 and 1.0e+10vg of AAV-casx.491.174.11.30 were administered in the subretinal space of 5-6 week old C57BL/6J mice. Three weeks after injection, retinas were harvested and levels of editing were quantified by NGS and CRISPResso analysis procedures. Spacer 11.30 targets the WT P23 genomic locus located at the beginning of the first exon of RHO (figure 51). Overexpression of CasX-491.174.11.30 resulted in a significant, dose-dependent editing of the mRHO exon 1 locus in the treated retina compared to the sham injected retina (fig. 52A-52B). The left panel (fig. 52A) shows the quantification (%) of total indels detected by NGS at the mouse P23 RHO locus in AAV-CasX or sham injected retinas compared to the mouse reference genome. The right panel (fig. 52B) shows the fraction (%) of edits predicted to result in frame shift mutations in the RHO protein. Data are presented as averages of NGS readings from the total retinal edits, with 6 to 8 animals per experimental group. The highest AAV dose (1e+10 vg/eye) increased the insertion/deletion ratio by a factor of 4 compared to the 1.0e+9vg dose, with 40.3±22% versus 12.3±5% RHO editing, respectively, detected. Most of the indels generated by casx.491 were deletions (left panel), predicted to translate into high frequency frameshift mutations (64.7 versus 76.9% for 1.0e+9 and 1.0e+10 vg/dose, respectively), and assumed high levels of RHO protein knockdown. These results indicate that CasX can efficiently edit 10% of rod photoreceptors in the P23H +/-mouse model using spacers driving allele-specific targets of the mutant P23H locus, with most edits converting to knock-down mutant P23H Rho and significantly delaying photoreceptor degeneration.

Immunohistochemistry on transected injected retinas confirmed expression of CasX in the photoreceptor layer, but also showed virus spreading to the inner layer, as shown in fig. 53A-53F. AAV-CasX in the treatment group was 1.0e+9vg (FIGS. 53B and 53E); 1.0e+10vg AAV-CasX (FIGS. 53C and 53F); or PBS (fig. 53A and 53D). In retinas injected with both 1E9vg (fig. 53B and 53E) and 1E10vg (fig. 53C and 53F), the level of HA-labeled CasX was assessed by anti-HA antibody staining in photoreceptor cell bodies located in the Outer Nuclear Layer (ONL) as well as in the outer segments (lower panels of fig. 53E and 53F). Control retinas receiving only sham (fig. 53A and 53C) injections showed background levels of HA staining signal in RPE/sclera (fig. 53D) and no detectable levels in ONL/INL layers. In addition, total histological analysis showed that retinal structure was maintained following subretinal administration of the AAV-packaged CasX construct.

Under experimental conditions, the results demonstrate that CasX 491, scaffold 174, and the spacer region targeting the mouse P23 RHO locus can achieve therapeutically relevant levels of editing at the P23 mouse locus when subretinally delivered by AAV in the mouse retina.

Example 16: AAV-mediated selective expression of CasX in photoreceptors results inStrong on-target activity in vivo。

Experiments were performed to demonstrate the ability of CasX to selectively edit photoreceptors in the mouse retina by restricting its expression with a selective photoreceptor promoter, with a spacer region targeting the P23 residue in the wild-type retina at therapeutically relevant levels. We further demonstrate a strong correlation between edit level and proteomic level in a transgenic reporter model expressing GFP in rod photoreceptors only. Here, we evaluated whether CasX variant 491 and guide variant 174, which have a spacer targeting the integrated GFP locus when injected subretinally, produced significant, detectable levels of editing in the retina, and evaluated the efficacy of two different viral doses (1.0e+9 and 1.0e+10 vg/eye).

Method：

Production of AAV plasmids and viral vectors: casX variant 491 under the control of various photoreceptor-specific promoters (RP 1, RP2, RP3 based on the endogenous rhodopsin RHO promoter and RP4, RP5 based on the endogenous G-coupled retinal kinase GRK1 promoter; sequences in Table 22) and the CMV promoter, and sgRNA guide variant 174/spacer 11.30 under the control of the U6 promoter (AAGGGGCTCCGCACCACGCC (SEQ ID NO: 40502), the P23 residue of targeted mouse RHO exon 1) were cloned into the pAAV plasmid flanked by AAV2 ITRs. The WPRE sequences amplified with EcoRI restriction sites on each side were inserted into EcoRI digested p59.RP4.491.174.11.30 and p59.RP5.491.174.11.30 plasmids. For efficacy studies in the Nrl-GFP model, GFP-targeting spacer 4.76 (TGTGGTCGGGGTAGCGGCTG (SEQ ID NO: 17)) was cloned into AAV cis plasmid p59.RP1.491.174 using the Golden Gate clone with bbsI restriction sites flanking the spacer.

Table 22: rho promoter sequence

Promoters	PR construct	SEQ ID NO:	DNA sequence
				RHO	RP1	40589	ND
RHO535-CAG	RP2	40590	ND
				RHO-intron	RP3	40591	ND
GRK	RP4	40592	ND
				GRK-SV40	RP5	40593	ND
GRK-CAG	RP6	40594	ND

* Nd=the sequences provided in the sequence listing are not described.

AAV vector production: suspension HEK293T cells were derived from parental HEK293T and grown in FreeStyle 293 medium. 500mL of culture (1L Erlenmeyer flask, stirring at 110 rpm) was diluted to a density of 2e+6 cells/mL on the day of transfection. The transgenic endotoxin-free pAAV plasmid with flanking ITR repeats was co-transfected with a plasmid providing an adenovirus helper gene for replication and AAV rep/cap genome using PEIMax (Polysciences) in serum-free optmem medium. Cultures were supplemented with 10% CDM4HEK293 (HyClone) 3 hours post-transfection. After three days, the culture was centrifuged at 1000rpm for 10 minutes to separate the supernatant from the cell pellet. The supernatant was mixed with 40% PEG 2.5M NaCl (8% final concentration) and incubated on ice for at least 2 hours to precipitate AAV viral particles. Cell pellets containing most of the AAV vector were resuspended in lysis medium (0.15M NaCl, 50mM Tris HCl, 0.05% Tween, pH 8.5), sonicated on ice (15 seconds, 30% amplitude) and treated with Benzonase (250U/. Mu.L, novagen) for 30 minutes at 37 ℃. The crude lysate and PEG-treated supernatant were then spun at 4000rpm for 20 minutes at 4 ℃ to resuspend PEG-precipitated AAV (precipitate), wherein the crude lysate (supernatant) without cellular residues was further clarified using a 0.45 μm filter. AAV lysates were purified using affinity chromatography (POROS CaptureSelect AAVX, thermoFisher). The eluate was buffer exchanged and concentrated in PBS +200mm nacl +0.001% Pluronic. To determine viral genome titres, 1 μl from crude lysate virus was digested with DNase and ProtK followed by quantitative PCR. mu.L of digested virus was used in 25. Mu.L of qPCR reaction consisting of IDT prime mix and a set of primers and 6' FAM/Zen/IBFQ probe (IDT) designed to amplify a 62bp fragment located in AAV2-ITR (Fwd 5'-GGAACCCCTAGTGATGGAGTT-3' (SEQ ID NO: 40804); rev 5'-CGGCCTCAGTGAGCGA-3' (SEQ ID NO: 40805), probe 5'-CACTCCCTCTCTGCGCGCTCG-3' (SEQ ID NO: 40806)). The titer (viral genome (vg)/mL) of the viral sample was calculated using AAV ITR plasmid as a reference standard. The QPCR procedure was set to: an initial denaturation step of 5 min at 95℃followed by 40 cycles of denaturation at 95℃for 1 min and annealing/extension at 60℃for 1 min.

AAV Vector aav.rp1.491.174.4.76 was generated in HEK239T at Vector Core at University of North Carolina (UNC) using a triple transfection method.

Subretinal injection: c57BL/6J mice and heterozygous Nrl-GFP/C57BL/5J mice (Jackson laboratories) were maintained in the normal 12-hour light/dark cycle. Subretinal injection was performed on 4-5 week old mice. Anesthetized mice were inhaled with isoflurane. Promecaine (0.5%) was topically applied to the cornea and the eye was dilated with drops of topiramate (1%) and phenylephrine (2.5%). During surgery, general gel was used to maintain eye lubrication. Under a surgical microscope, an ultrafine 301/2 gauge disposable needle was passed through the sclera, equator, and vicinity of the limbus to form a small hole in the vitreous cavity. Using a blunt needle, 1. Mu.L to 1.5. Mu.L of virus was injected directly into the subretinal space between the RPE and the retinal layer. One eye of each mouse from the experimental group was injected with 1.0e+9, 5.0e+9, or 1.0e+10 genome (vg)/eye, and the opposite eye was injected with AAV formulation buffer.

Western blotting: to generate protein lysates, eyes were freshly enucleated and dissected in ice cold PBS, snap frozen in dry ice, and resuspended in fresh RIPA buffer (150 mM NaCl, 1% NP40, 0.5% deoxycholic acid, 0.1% SDS, 50mM Tris pH8.0, dH 20) supplemented with protease inhibitor (5 mg/mL final concentration), DTT and PMSF (final concentration 1mM, respectively) in 1.5mL Eppendorf tubes per retina. The retinal tissue was further homogenized into small pieces using a disposable particle pestle (Fisher scientific, # 12-141-364) without RNA and incubated on ice for 30 minutes, with occasional flipping of the tube for gentle mixing. The samples were then centrifuged at 4℃for 20 min at full speed to pellet the genomic DNA. The protein extract and gDNA cell pellet were then isolated. For protein extracts, the supernatant was collected. Protein concentration was determined by BCA assay and read on a Tecan plate reader. Mu.g of mouse total retinal protein lysate was separated by SDS-PAGE (Bio-Rad TGX gel) and transferred onto polyvinylidene fluoride membranes using Transblot Turbo. Membranes were blocked with 5% nonfat milk powder for 1 hour at room temperature and incubated overnight with primary antibody at 4 ℃. The blots were then washed three times with Tris buffered saline containing Tween-20 (137 mM sodium chloride, 20mM Tris, 0.1% Tween-20, pH 7.6) and incubated with horseradish peroxidase conjugated anti-rabbit or anti-mouse secondary antibodies for 1 hour at room temperature. After three washes, the film was developed using chemiluminescent substrate ECL and imaged on chemicoc (X). The blot image was treated with ImageLab.

NGS analysis: animals were sacrificed and eyes were removed in fresh PBS. The whole retina was isolated from the eye cup and subjected to gDNA extraction treatment as described previously in the western blot section. Genomic gDNA pellet was treated using dnasy blood and tissue kit (Qiagen) according to the manufacturer's instructions. Amplicons were amplified from 200ng gDNA with a set of primers targeting the genomic region of interest (Table 23). The amplicon was bead purified (Beckman coulter, agencourt Ampure XP) and then reamplified to incorporate the illuminea linker sequence. Specifically, these primers contain additional sequences at the 5' end to introduce Illumina reads and 2 sequences and a 16nt random sequence that serves as a Unique Molecular Identifier (UMI). The quality and quantification of the amplicons was assessed using the Fragment Analyzer DNA assay kit (Agilent, dsDNA 35-1500 bp). Amplicons were sequenced on an Illumina Miseq according to the manufacturer's instructions. The original fastq file from sequencing was processed as follows: (1) Trimming the sequence for quality and adaptor sequence using the program cutadapt (v.2.1); (2) Combining sequences from reads 1 and 2 into a single insert sequence using program flash2 (v 2.2.00); and (3) running the consensus insert sequence along with the desired amplicon sequence and spacer sequence through the program CRISPResso2 (v 2.0.29). The procedure quantifies the percentage of reads modified in the window around the 3 'end of the spacer (30 bp window centered at-3 bp from the 3' end of the spacer). The activity of CasX molecules is quantified as the total percentage of reads containing insertions, substitutions and/or deletions anywhere within the window.

Table 23: NGS primer sequences

Immunohistology: the removed eyes were placed in 10% formalin overnight at 4 ℃. Retinas were excised from the eye cups, rinsed thoroughly in PBS, and immersed in a 15% -30% sucrose gradient. Tissues were embedded at Optimal Cutting Temperature (OCT), frozen on dry ice, and then transferred to-80 ℃ for storage. A20. Mu.M section was cut using a cryostat. Prior to antibody labelling, the sections were blocked in blocking buffer (2% normal goat serum, 1% BSA, 0.1% Triton-X100) for > 1 hour at room temperature. The antibodies used were: anti-mouse HA (abcam, 1:500); alexa Fluor 488 rabbit anti-mouse (Invitrogen, 1:2000). Slides were counterstained with Hoechst 33342 (Thermo Fisher Scientific, hemel Hempstead, UK) and blocked with Prolong Diamond anti-quench blocking tablets (Thermo Fisher Scientific, hemel Hempstead, UK). Confocal fluorescence imaging was then performed using an LSM-710 inverted confocal microscope system (Carl Zeiss, cambridge, UK).

Results：

The level of editing was quantified at the mRHO exon locus of 3 week old C57BL/6J injected subretinally with AAV vectors that expressed CasX 491 under the control of a variety of engineered retina and ubiquitous promoters to identify promoters that drive strong editing levels in photoreceptors with spacer 11.30. The rod-specific RP1, RP2, RP3, RP4 promoters mediate very similar levels of editing (20%). Vectors aav.rp5.491.174.11.30 and AAV rp5.491.wpre.174.11.30 resulted in lower expression levels (10% and 8%, respectively, fig. 54A). We identified the optimized vector aav.rp1.491.174.11.30 as the most effective vector for further functional and distribution studies with the aim of achieving high levels of in vivo editing in photoreceptors and making transgenic plasmids significantly smaller in size for packaging within AAV (100-400 bp shorter than other constructs with similar activity levels (fig. 54B.) we observed further the optimized constructs by performing efficacy studies in transgenic models expressing GFP in rod photoreceptors, which are convenient models in the art for verifying rod-specific or protein knockdown.aav.rp 1.491.174.4.76 vectors were injected at 2 different doses to study efficacy, 4 weeks and 12 weeks after injection, we quantified the editing level at the integrated GFP locus by NGS and observed detectable editing levels for the 1.0e+9 vg/eye dose group, we observed a detectable editing level of 10% in the increased dose group of 1.e+10 vg at 4 weeks followed by a 12-week increase in editing level at the following point of 2 weeks (fig. 55).

The level of editing was confirmed by structural and proteomic analysis. Western blot analysis of retinal lysates 12 weeks after injection showed a strong correlation between the editing and reduced levels of GFP protein (fig. 56A and 56C), protein knockdown was detected with as low as 5% editing in the whole retina. GFP protein levels were significantly lower in AAV-CasX treated retinas at 1.0e+10 vg/eye dose than in vehicle group (fig. 56B).

These results were also confirmed by in vivo fundus imaging with GFP fluorescence. By week 12, the ratio of the upper retinal average gray value to the lower retinal average gray value showed 20% and 50% decrease in GFP fluorescence (fig. 57A). Complete reduction in GFP fluorescence over time was seen in the injected retinas only in the quadrant receiving subretinal injection compared to vehicle group (fig. 57B).

Immunochemical staining confirmed (FIGS. 58A to 58L) a decrease in GFP protein expression in rod photoreceptors. Representative confocal images showed strong GFP expression in retinas injected with AAV formulation buffer alone. GFP was expressed throughout the retina, matched to nuclear staining (fig. 58A-58C). No HA expression was detected, as was a readout of AAV-mediated CasX transgene expression (fig. 58D).

Injection of 1.0e+9 and 1.0e+10 retinas showed a strong decrease in GFP expression in dose-dependent manner throughout the retinal sections (fig. 58E-58L), which correlated with detectable levels of only HA rod Outer Segment (OS) and Outer Nuclear Layer (ONL), confirming the selectivity of promoter RP1 for rod photoreceptors. High dose treatment resulted in complete knockdown of injected retina (about 50% of GFP knockdown in whole retina, since injection was limited to higher gradients), whereas the 1.0e+9vg dose reduced GFP expression in local areas by about 50% compared to control (fig. 58C) (fig. 58G and 58K).

The results demonstrate that CasX 491, scaffold 174 and the spacer region targeting the mouse P23 RHO locus can achieve editing of therapeutically relevant levels at the P23 mouse locus via AAV-mediated subretinal delivery when expressed only in rod photoreceptors (therapeutic cellular targets). Furthermore, the specificity and efficacy of the vector was confirmed by conducting subsequent studies targeting the GFP locus integrated into a reporter gene model that overexpresses GFP in photoreceptors, with the results showing a strong correlation between editing levels and protein knockdown assessed by western blotting, fundus imaging and histology.

Example 17: demonstrating that the CasX: gNA system can efficiently edit human neural progenitor fines when packaged and delivered by AAVCells and induced neurons

Experiments were performed to demonstrate the efficacy of the AAV-expressed CasX: gNA system iN editing human neural progenitor cells (hNPC) and induced neurons (iN) iN vitro.

Materials and methods：

AAV construct cloning：

The CasX variant 491 and the guide bracket variant 235 were used in these experiments.

To assess the editability of the AAV expressed CasX: gNA system in hNPC, standard molecular cloning techniques were used to generate AAV constructs containing the UbC promoter driving expression of CasX and the Pol III promoter scaffold driving expression of gRNA, with scaffold variant 235 and spacer 7.37 (GGCCGAGAUGUCUCGCUCCG; SEQ ID NO:379; integration into construct ID 183), which targets the endogenous B2M locus. The cloned and sequence verified constructs were prepared in large quantities and quality assessed prior to transfection for AAV production.

For experiments assessing the editability of the AAV expressed CasX: gNA system iN human iN, AAV constructs encoding CasX proteins and gRNAs were similarly generated as described, with spacer region 31.12 (UUCUCGGCGCUGCACCACGU; SEQ ID NO:41830; integrated into construct ID 188), 31.63 (CAAGAGGAGAAGCAGUUUGG; SEQ ID NO:41831; integrated into construct ID 189) or 31.82 (GGGGCCUGUGCCAUCUCUCG; SEQ ID NO:41832; construct ID 190) targeting AAVS 1. Non-targeting spacer 0.1 (AGGGGUCUUCGAGAAGACCC; SEQ ID NO: 41833) was also used in these experiments. For experiments evaluating various protein promoters driving CasX 491 expression with gRNA spacer 7.37 to edit the B2M locus iN human iN, AAV constructs containing these protein promoter variants were similarly generated as described (see table 24 for sequences of protein promoter variants). In addition to the sequences encoding the CasX proteins (table 21), the sequences of the additional components of the AAV constructs are listed in table 26.

Table 24: construction of AAV constructs comprising sequences of protein promoter variants and each corresponding protein promoter variantSEQ ID and sequence of each protein promoter variantIDNO。

* Nd=not described.

AAV products：

On the day of transfection, suspension-adapted HEK293T cells maintained in FreeStyle 293 medium were seeded at 1.5e6 cells/mL in 20mL-30mL medium. The transgenic endotoxin-free pAAV plasmid with flanking ITR repeats was co-transfected with a plasmid providing an adenovirus helper gene for replication and AAV rep/cap genome using PEI Max (Polysciences) in serum-free Opti-MEM medium. Cultures were supplemented with 10% CDM4HEK293 (HyClone) 3 hours post-transfection. After three days, the culture was centrifuged to separate the supernatant from the cell pellet. The supernatant was mixed with 40% PEG 2.5M NaCl and incubated on ice to precipitate AAV viral particles. Cell pellets containing most of the AAV vector were resuspended in lysis medium (0.15M NaCl, 50mM Tris HCl, 0.05% Tween, pH 8.5), sonicated on ice and treated with Benzonase (250U/. Mu.L, novagen) for 30 min at 37 ℃. The PEG-treated supernatant was centrifuged to pellet the precipitated AAV, while the crude lysate was centrifuged to remove cellular debris from the virus-containing supernatant, and the collected viruses were then pooled for further clarification using a 0.45 μm filter. AAV lysates were purified using affinity chromatography (POROS CaptureSelect AAVX, thermoFisher), the eluate was buffer exchanged, and concentrated in pbs+200mm nacl+0.001% Pluronic.

To determine viral genome (vg) titres, 1 μl from crude lysate virus was digested with DNase and ProtK followed by quantitative PCR. mu.L of digested virus was used in 25. Mu.L of qPCR reaction consisting of IDT prime mix and a set of primers and 6' FAM/Zen/IBFQ probe (IDT) designed to amplify a 62bp fragment located in AAV 2-ITR. The titer (vg/mL) of the virus samples was calculated using AAV ITR plasmid as a reference standard. The qPCR procedure was set to: an initial denaturation step of 5 min at 95℃followed by 40 cycles of denaturation at 95℃for 1 min and annealing/extension at 60℃for 1 min.

In vitro culture of hNPC：

Immortalized hNPC were cultured in hNPC medium (DMEM/F12 containing GlutaMax, 10mM HEPES, 1 XNEAA, 1X B-27 without vitamin A, 1 XN 2 supplemented with growth factors hFGF and EGF, pen/Strep and 2-mercaptoethanol). Prior to testing, cells were isolated with TrypLE, gently resuspended to dissociate neurospheres, quenched with medium, centrifuged and resuspended in fresh medium. Cells were counted and plated directly onto 96-well plates coated with PLF (poly-DL-ornithine hydrobromide, laminin and fibronectin) at a density of about 10,000 cells/well 24 hours prior to AAV transduction.

AAV transduction of hNPC followed by HLA immunostaining and flow cytometry：

Approximately 7000 cells/well of hNPC were plated on PLF coated 96-well plates. 24 hours later, the inoculated cells were treated with AAV expressing the CasX: gRNA system. All viral infection conditions were performed in at least duplicate, with a normalized number of viral genomes (vg) in the experimental vector, in a series of triple serial dilutions of MOI in the range of 1E4 to 1E6 vg/cell. AAV-treated hNPC was isolated with TrypLE 5 days after transduction. After cell dissociation, the cells were quenched with staining buffer (3% fetal bovine serum in pbs). Dissociated cells were transferred to round bottom 96-well plates, followed by centrifugation and resuspension of the cell pellet with staining buffer. After another centrifugation, the cell pellet was resuspended in staining buffer containing antibodies (BioLegend) that would detect B2M-dependent HLA proteins expressed on the cell surface. After HLA immunostaining, cells were stained with DAPI to label nuclei. HLA+hNPC was determined using an Attune NxT flow cytometer. A decrease or lack of HLA protein expression would indicate successful editing at the B2M locus in these hnpcs. Subsets of transduced hNPC were also isolated for genomic DNA extraction and editing analysis by Next Generation Sequencing (NGS).

NGS processing and analysis：

Genomic DNA (gDNA) was extracted from the harvested cells using the Zymo Quick-DNA Miniprep Plus kit according to the manufacturer's instructions. Target amplicons were formed by amplifying the region of interest from 200ng of extracted gDNA with a set of primers specific for the target locus, such as the human B2M locus. These gene specific primers contain additional sequences at the 5' end to introduce Illumina linkers and a unique molecular identifier of 16 nucleotides. The amplified DNA product was purified using the Ampure XP DNA purification kit. The quality and quantification of the amplicons was assessed using the Fragment Analyzer DNA assay kit (Agilent, dsDNA 35-1500 bp). Amplicons were sequenced on an Illumina Miseq according to the manufacturer's instructions. The original fastq files from sequencing were quality controlled and processed using cutadapt v2.1, flash2 v2.2.00 and CRISPResso2 v 2.0.29. In a window around the 3 '-end of the spacer (30 bp window centered at-3 bp from the 3' -end of the spacer), quantification of the inclusion of an insertion or a deletion (indel) with respect to the reference sequence was performed for each sequence. CasX activity is quantified as the total percentage of reads containing insertions, substitutions and/or deletions anywhere within the window for each sample.

Reprogramming of induced pluripotent stem cells (ipscs)：

Fibroblasts from patients were obtained from the Coriell cell bank (Coriell Cell Repository). ipscs were generated from these cell lines by additional body weight programming and genetically engineered to ectopically express neuronal element 2 (Neurog 2) to accelerate neuronal differentiation. Three iPSC clones were selected for downstream experiments.

Neuronal cell culture：

All neuronal cell cultures were performed using N2B27 based medium. To induce neuronal differentiation, iPSCs were inoculated in neuronal plating medium (N2B 27 basal medium containing 1. Mu.g/mL doxycycline, 200. Mu. M L-ascorbic acid, 1. Mu.M dibutyryl cAMP sodium salt, 10. Mu.M CultereOne, 100ng/mL BDNF, 100ng/mL GDNF). Three days after differentiation (DIV 3), the iN was dissociated, aliquoted and frozen for long term storage. DIV3 iN was thawed and seeded at 30,000 cells/well iN 96-well plates. iN plated medium for one week, after which half medium changes were performed weekly using feed medium (N2B 27 basal medium containing 200. Mu. M L-ascorbic acid, 1. Mu.M dibutyryl cAMP sodium salt, 200ng/ml BDNF, 200ng/ml GDNF).

AAV transduction of iN vitro iN：

24 hours prior to transduction, about 30,000-50,000 iN/well were seeded on Matrigel coated 96 well plates. AAV expressing CasX: gRNA system was then diluted in neuronal plating medium and added to cells, using six wells per condition as parallel. Cells were transduced at various MOI (1E 4 or 1E5 vg/cell for FIG. 61; 2E4 or 6.67E3 for FIG. 62). 7 days after transduction, iN was supplemented with feed medium. 14 days after transduction, cells were isolated using lysis buffer, 6-well replicates were pooled, gDNA harvested and prepared for editing analysis at human AAVS1 or B2M loci using NGS.

Results：

Figure 60 shows quantification of percent editing at B2M loci (e.g., percent indels quantified by NGS genotypes and phenotypically read B2M cell populations detected by flow cytometry) measured in human NPCs via two different assessments after 5 days of transduction with AAV at various MOIs. Efficient editing of the human B2M locus was observed, reaching the highest level of editing at an MOI of about 3E 5: about 50% of the indels and about 13% of the cells exhibit a B2M protein knockout phenotype. Fig. 61 also shows the effective editing at the AAVS1 locus iN human iN, with construct ID 189 achieving about 90% editing at a higher MOI of 1E 5. As expected, no editing was observed at the AAVS1 locus with non-targeted spacers.

Fig. 62 shows that robust editing at the B2M locus was achieved for several of the multiple protein promoters used to drive expression of CasX variant 491. Briefly, AAV is produced with the transgene construct shown and transduced into human iN at a MOI of 2E4 or 6.67E3. AAV constructs 177 and 183 contained the promoters exhibiting the highest editing activity, with at least 80% efficiency at either MOI.

The results of these experiments demonstrate that CasX variants 491 and guide scaffolds 235 having a spacer region targeting the human B2M locus or the human AAVS1 locus can be efficiently targeted for editing when packaged and delivered into human NPC or iN by AAV iN vitro.

Example 18: cpG-depleted AAV showed potent CasX-mediated editing in vitro and induced less TLR9Mediated immune response

Pathogen-associated molecular patterns (PAMPs) such as unmethylated CpG motifs are a class of small molecule motifs that are conserved among microorganisms. They are recognized by toll-like receptors (TLRs) and other pattern recognition receptors in eukaryotes and often induce nonspecific immune activation. In the context of gene therapy, PAMPs-containing therapeutic agents are generally not well tolerated and are rapidly cleared from the patient due to the strong immune response triggered, which ultimately results in reduced therapeutic efficiency. CpG motifs are short single stranded DNA sequences containing dinucleotide CG. When these CpG motifs are unmethylated, they function as PAMPs and thus effectively stimulate an immune response.

Experiments were performed to deplete CpG motifs in AAV constructs encoding CasX variant 491, guide scaffold variant 235 and spacer 7.37 targeting the endogenous B2M locus (construct ID 183), and demonstrated that CpG-depleted AAV vectors were able to be efficiently edited in vitro. In addition, experiments will be performed to assess the effect of CpG depletion on activation of TLR 9-mediated immune responses in vitro. The individual elements of the AAV genome and their corresponding CpG reduced forms were first evaluated in vitro for editing activity and immunogenicity to identify sequences that produced optimal CpG depletion that effectively edited but reduced undesired TLR9 activation, and then combined to produce AAV genomes with dramatic reduction in CpG presence for further evaluation.

Materials and methods：

CpG-depleted AAV plasmid production：

Nucleotide substitutions were designed on the computer to replace the natural CpG motif based on homologous nucleotide sequences from the following elements of the relevant species: mouse U1a snRNA (micronuclear RNA) gene promoter, human UbC (polyubiquitin C) gene promoter, bGHpA (bovine growth hormone polyadenylation) sequence, and human U6 promoter. The coding sequence of CasX 491 was codon optimized for CpG depletion, and AAV2 ITRs had been CpG depleted as previously described (Pan X, yue Y, boftsi M. Et al, 2021, "Rational engineering of a functional CpG-free ITR for AAV Gene treatment." Gene Ther. Https:// doi. Org/10.1038/s 41434-021-00296-0). All the resulting sequences (table 25) were sequenced into gene fragments with appropriate projections for cloning and isothermal assembly to replace the corresponding elements of the existing basal AAV plasmid (construct ID 183) individually. The spacer 7.37 (GGCCGAGAUGUCUCGCUCCG; SEQ ID NO: 379) of the endogenous gene-targeted beta-2-microglobulin (B2M) was used in the relevant experiments discussed in this example. After isothermal assembly, AAV constructs were transformed into chemically competent e.coli cells (Stbl 3) and, after 1 hour recovery at 37 ℃, inoculated on kanamycin LB-agar plates. Single colonies were picked for colony PCR and Sanger sequencing. The sequence verified constructs were prepared in medium amounts for subsequent nuclear transfection and AAV vector production. In addition to the sequences encoding CasX (table 21), the sequences of additional components of the non-CpG-depleted AAV construct are listed in table 26. Based on demonstration of robust expression of CRISPR components and retention of editing activity, AAV constructs with the remaining unchanged components in table 26 would be modified to deplete CpG motifs and evaluated using the method described in example 17.

Table 25: sequences of CpG-depleted AAV elements。

AAV vector production was performed as previously described in example 17.

Viral genome titers were determined as previously described in example 17.

In vitro culture of human neural progenitor cells (hNPC)：

Immortalized hNPC were cultured in hNPC medium (DMEM/F12 containing GlutaMax, 10mM HEPES, 1 XNEAA, 1X B-27 without vitamin A, 1 XN 2 supplemented with growth factors hFGF and EGF, pen/Strep and 2-mercaptoethanol). Prior to testing, cells were isolated with TrypLE, gently resuspended to dissociate neurospheres, quenched with medium, centrifuged and resuspended in fresh medium. Cells were counted and used directly for nuclear transfection or plated at a density of about 10,000 cells/well 48 hours prior to AAV transduction on 96-well plates coated with PLF (poly-DL-ornithine hydrobromide, laminin and fibronectin).

Plasmid Nuclear transfection into human neural progenitor cells (hNPC)：

AAV plasmids encoding the CasX: gRNA system (with or without CpG depletion of the individual elements of the AAV genome) were nuclear transfected into hNPC using the Lonza P3 primary cell 96-well Nucleofector kit. The plasmid was diluted to two concentrations: 50 ng/. Mu.L and 25 ng/. Mu.L. mu.L of DNA was mixed with 20. Mu.L of 200,000hNPC in Lonza P3 solution supplemented with 18% V/V P3 supplement. The pooled solutions were subjected to nuclear transfection using the Lonza 4D Nucleofector system according to procedure EH-100. The nuclear transfected solution was then quenched with the appropriate medium and then dispensed into three wells of a 96-well plate coated with PLF. 7 days after nuclear transfection, hNPC was isolated and analyzed for B2M protein expression by HLA immunostaining and subsequent flow cytometry. Subsequently, a stack of individual CpG-depleted elements will be performed to generate a combined AAV genome with a large number of CpG depletion, and similarly tested for editing assessment at the B2M locus in vitro.

Editing activity assessment by HLA immunostaining and flow cytometry：

AAV-treated hNPC was isolated with TrypLE 7 days after nuclear transfection. After cell dissociation, the cells were quenched with staining buffer (3% fetal bovine serum in pbs). Dissociated cells were transferred to round bottom 96-well plates, followed by centrifugation and resuspension of the cell pellet with staining buffer. After another centrifugation, the cell pellet was resuspended in staining buffer containing antibodies (BioLegend) that would detect B2M-dependent HLA proteins expressed on the cell surface. After HLA immunostaining, cells were stained with DAPI to label nuclei. HLA+hNPC was determined using an Attune NxT flow cytometer.

AAV transduction of hNPC in vitro：

About 10,000 cells/well of hNPC will be plated on PLF coated 96-well plates. After 48 hours, the vaccinated cells will be treated with AAV expressing the CasX: gRNA system, with or without CpG depletion of individual elements of the AAV genome. All viral infection conditions will be performed at least in duplicate. 5-7 days after transduction, hNPC was isolated as described by HLA immunostaining and subsequent evaluation of editing activity by flow cytometry. Subsequently, a stack of individual CpG-depleted elements will be performed to generate a combined AAV genome with a large number of CpG depletion, and similarly tested for editing assessment at the B2M locus in vitro.

Human TLR9 reporter gene HEK293 cell (HEK-Blue)^TM hTLR 9) in the use of CpG-containing (CpG⁺ ) Or CpG depleted (CpG)^- ) Use of AAV after transduction for in vitro immunogenicity assessment:

HEK-Blue^TM the hTLR9 cell line (invitogen) was derived from HEK293 cells and was specifically designed for the study of TLR 9-induced NF- κb signaling. These HEK-Blue^TM hTLR9 cells overexpress the human TLR9 gene and SEAP (secreted embryonic alkaline phosphatase) reporter gene under the control of NF- κB inducible promoters. SEAP levels in cell culture supernatant report activation of TLR9, which levels can be quantified using a colorimetric assay.

For this experiment, 5,000 HEK-Blue^TM hTLR9 cells will be seeded in DMEM medium containing 10% FBS and Pen/Strep in each well of a 96-well plate. The next day, cpG was used to express the CasX-gRNA system⁺ Or CpG^- AAV transduces the vaccinated cells. All viral infection conditions will be performed at least in duplicate, with a normalized number of viral genomes (vg) in the experimental vector, in a series of three-fold serial dilutions of the MOI starting from the effective MOI of 1E6 vg/cell. HEK-Blue was used 1, 2, 3 and 4 days after transduction^TM The assay kit evaluates the level of secreted SEAP in the cell culture medium supernatant according to the manufacturer's instructions.

Results：

FIG. 63 shows the results of evaluation of CpG-containing (CpG⁺ ) Or CpG depleted (CpG)^- ) Determination of editing activity at B2M locus in AAV vector nuclear transfected hNPC. Editing activity was measured as editing at the B2M locus, resulting in reduced/absent B2M expression (B2M^- ) Is a percentage of hNPC. FIG. 63 shows the reduction or depletion of CpG motifs within the sequence of the U1a promoter (construct IDs 178 and 179), pol III U6 promoter (construct IDs 180 and 181) or bGH poly (A) (construct ID 182) versus the original CpG⁺ The AAV construct (construct ID 177) achieved no significant reduction in editing activity compared to the editing level. Specifically, cpG^- U1a、CpG^- U6 or CpG^- basic CpG for bGH acquisition⁺ About 80%, about 94%, or about 83% of the editing level achieved by the AAV construct. However, and with the use of a baseReducing or depleting CpG motifs within the UbC promoter sequences (construct IDs 184, 185 and 186) significantly reduces editing activity, highlights the background dependent impact of CpG depletion on AAV editing activity, and highlights the importance of screening individual CpG depleted AAV elements that will produce effective editing, compared to the levels observed for the basal UbC construct (construct ID 183). These findings will be related to the use of CpG⁺ Or CpG^- The experiment of hNPC transduction of AAV was verified. Each CpG^- Elements will also be stacked to produce a combined AAV genome with maximum CpG depletion, which will be evaluated for editing activity in vitro.

Using HEK-Blue^TM Experiments in hTLR9 cells assessing TLR 9-mediated immune responses are expected to show that they are expected to be identical to those from the use of unmodified CpG⁺ Levels of AAV-treated cells compared to those from CpG-treated cells^- The level of secreted SEAP was reduced in AAV-treated cells. Reduced SEAP levels would indicate reduced TLR 9-mediated immune activation.

Example 19: in vivo administration of AAV vectors with or without CpG-depleted genomes to assess inflammatory cell responsesFactor production and CasX-mediated effects of editing

Experiments will be performed to assess the effect of in vivo administration of AAV vectors with or without CpG-depleted genomes. Briefly, AAV particles expressing the CasX: gRNA system (with or without CpG depletion) will be administered into C57BL/6J mice. In these experiments, the combined AAV genome with a large CpG depletion will be used for evaluation. Following AAV administration, mice will be bled at various time points to collect blood samples. The production of inflammatory cytokines such as IL-1β, IL-6, IL-12 and TNF- α will be measured using ELISA.

Materials and methods：

CpGDepletion ofAAV plasmid production：

To assess the yield of transgenic specific T cells, SIINFEKL peptides were cloned into AAV transgenic plasmids on the N-terminus of CasX protein. SIINFEKL peptides are ovalbumin-derived peptides that are well characterized and have widely available reagents to detect T cells specific for this peptide epitope. The nucleic acid sequence encoding the peptide will be cloned as an N-terminal fusion of CasX in AAV constructs with ROSA26 targeting spacers.

Production of AAV vectors will be performed as previously described in example 17.

Viral genome titers will be determined as previously described in example 17.

Measurement of inflammatory cytokines to assess humoral immune activation：

About 1E12vg AAV was injected intravenously into C57BL/6J mice. Blood was drawn from the tail vein or saphenous vein daily following AAV injection for 7 days. The levels of inflammatory cytokines such as IL-1β, IL-6, IL-12 and TNF- α in the collected serum were assessed using a commercially available ELISA kit according to manufacturer's recommendations for mouse blood samples (Abcam). Briefly, 50. Mu.L of standard, control buffer and sample were loaded into wells of ELISA plates pre-coated with specific antibodies to IL-1. Beta., IL-6, IL-12 or TNF-alpha, incubated for 2 hours at Room Temperature (RT), washed, and incubated with horseradish peroxidase (HRP) for 2 hours at RT, followed by additional washing. Wells were treated with TMB ELISA substrate and incubated for 30 min at RT in the dark, followed by H₂ SO₄ Quenching. Absorbance at 450nm was measured using a TECAN spectrophotometer and wavelength correction was performed at 570 nm.

Assessment of transgenic specific T cell populations：

10 days after intravenous injection of AAV, blood was collected from mice and EasySep was used^TM The mouse T cell separation kit separates T cells. Incubating the isolated T cells with: FITC mouse anti-human CD4 antibody (BD Biosciences), APC mouse anti-human CD8 antibody (BD Biosciences) and BV421 ovalbumin SIINFEKL MHC Tetramer (tetra Shop). The percentage of cd4+ and cd8+ T cells specific for SIINFEKL MHC tetramers will be quantified using flow cytometry. FITC, APC and BV421 will be excited by 488nm, 561nm and 405nm lasers and the signals will be quantified with band pass filters 440/50, 530/30 and 780/60, respectively.

Determination of CasX-specific antibodiesMeasuring amount：

Recombinantly produced and purified CasX variant 491 (the method of production and purification is described in WO2020247882A1, which is incorporated by reference in its entirety) will be directly attached to the wells of a polystyrene 96-well plate by passive adsorption using a carbonate/bicarbonate buffer with pH > 9. The serum samples were then assessed for the presence of CasX 491 specific antibody using standard ELISA techniques, using a commercially available HRP conjugated secondary antibody kit, according to the manufacturer's (Bethyl Laboratories) recommendations. Absorbance at 450nm was measured using a TECAN spectrophotometer and wavelength correction was performed at 570 nm.

Quantification of AAV-mediated genome editing at ROSA26 locus：

To confirm CpG^- AAV versus CpG⁺ AAV exhibits enhanced CasX editing activity in vivo, about 1E12 AAV particles containing CasX protein 491 and gRNA targeting the ROSA26 locus will be administered intravenously via the facial vein of C57BL/6J newborns. The animals will then be treated according to the experimental animal use protocol of Scribe. Four weeks after injection, mice were euthanized and liver and/or muscle tissue was harvested for gDNA extraction using the Zymo Quick DNA/RNA miniprep kit according to the manufacturer's instructions. Target amplicons were amplified from 200ng of extracted gDNA with a set of primers targeting the mouse ROSA26 locus of interest and treated as previously described in example 17.

Results：

In vivo experiments measuring serum inflammatory cytokine levels are expected to show that CpG^- AAV will significantly inhibit the production of inflammatory cytokines such as IL-1 beta, IL-6, IL-12 and TNF-alpha, thereby reducing immunogenicity and toxicity. Furthermore, cpG is^- AAV may cause less TLR9 activation, resulting in reduced expansion of T cells against SIINFEKL peptides fused to CasX. Thus, compared to the level of an AAV construct from CpG elements, the CpG is used^- AAV injection is expected to produce reduced SIINFEKL-specific cd4+ and cd8+ T cell levels.

Due to CpG^- AAV may result in less humoral immune activation and non-specific inflammation, and less T cell mediationImmunization is conducted, and thus the titer of CasX-reactive antibodies is also expected to decrease (i.e., ELISA signals for quantitative CasX antibodies are expected to be lower).

Finally, cpG^- The editing capacity of AAV will be assessed by harvesting muscle and/or liver tissue for genomic DNA extraction and NGS to determine the level of editing at the ROSA26 locus. In view of CpG^- AAV is expected to elicit less humoral immune response in vivo, so they are expected to enhance CasX editing activity at the ROSA26 locus.

Table 26: AAV constructs and component sequences

* The table lists the component sequences except for the sequences encoding the nucleases, guide RNAs and connecting peptides; nd=the sequences provided in the sequence listing are not described.

Table 27: coded targeting sequences integrated into AAV constructs

Target(s)	SEQ ID NO of targeting sequence
		Huntington (HTT) spacers	41056-41289
PCSK9 spacer	41290-41319
		B2M spacer	41320-41477
SOD1 spacer	41478-41571
		Rho spacer	41572-41611
TRAC spacer	41612-41653
		DMD spacer	41654-41736
BCL11A spacer	41737-41738
		C9Orf72 spacer	411739-41740
PTBP1 spacer	41741-41776

Example 20: evolution of guide RNA to guide scaffold platform

Experiments were performed to identify guide RNA guide scaffold variants that exhibit improved activity for double-stranded DNA (dsDNA) cleavage. To achieve this, a large library of scaffold variants was designed and tested in pooled fashion for functional knockdown of reporter genes in human cells. The scaffold variants that resulted in improved knockouts were determined by sequencing the functional elements within the pool and subsequent computational analysis.

Materials and methods

Library design

Assessment of RNA secondary Structure stability

RNAfold (v2.4.14) (Lorenz R et al ViennaRNA Package 2.0.Algorithms Mol Biol.6:26 (2011)) was used to predict secondary structural stabilization of RNA sequences, similar to that done by Jarmoskaite i et al: "A quantitative and predictive model for RNA binding by human pumilio proteins", mol cell.74 (5): 966 (2019). To evaluate the ΔΔg_bc value, the ensemble free energy (Δg) of the unconstrained ensemble is calculated, and then the ensemble free energy (Δg) of the constrained ensemble is calculated. ΔΔΔG/u BC is subject to constraint ΔG values the difference between unconstrained Δg values. Restriction strings are used that reflect base pairing of pseudocorms, stent stems and extension stems and require base unpairement of triplexes.

Calculation of stability of secondary structure of pseudonode stem

The stability of the pseudo-junction structure across the entire stem-loop at positions 3-33 was calculated using the triplex loop sequence from guide bracket 175. In addition, a constrained string is generated that ensures base pairing and base unpairement of pseudoknot bases in the triplex loop. Thus, the stability change may be due to the difference in pseudoknot sequences only. For example, the pseudoknot sequence aaaacg_cgtttt is converted to a stem-loop sequence by insertion of the triplex loop sequence CUUUAUCUCAUUACUUUGA (SEQ ID NO: 41834) such that the final sequence will be AAAACGCUUUAUCUCAUUACUUUGACGTTTT (SEQ ID NO: 41835) and the constraint string is: "((((((xxxxxxxxxxxxxxxxxxx)))))" (SEQ ID NO:41836, wherein x=n).

Molecular biology

Molecular biology of library construction

The library of guide RNA scaffold variants designed was synthesized and obtained from Twist Biosciences and then amplified by PCR using primers specific to the library. These primers amplify additional sequences at the 5 'and 3' ends of the library to introduce the sequence recognition sites for the restriction enzyme SapI. PCR was performed with Q5 DNA polymerase (New England Biolabs) and according to the manufacturer's instructions. Typical PCR conditions are: 10ng template library DNA, 1 XQ 5 DNA polymerase buffer, 300nM dNTPs, 300nM each primer, 0.25. Mu. l Q5 DNA polymerase in 50. Mu.l of reaction. On a thermal cycler, a typical procedure would be: cycling at 95℃for 5 minutes; then, the reaction is carried out for 20 cycles at 98 ℃ for 15s, 65 ℃ for 20s and 72 ℃ for 1 min; wherein the final extension is 2min at 72 ≡c. The amplified DNA product was purified using a DNA cleaning and concentration kit (Zymo Research). The PCR amplicon and plasmid pKB4 were then digested with restriction enzyme SapI (New England Biolabs) and both were independently gel purified by agarose gel electrophoresis followed by gel extraction (Zymo) according to the manufacturer's instructions. The library was then ligated using T4 DNA ligase (New England Biolabs), purified with DNA cleaning and concentration kit (Zymo), and transformed into MegaX DH10B T R Electrocomp cells (ThermoFisher Scientific), all according to the manufacturer's instructions. The transformed library was recovered in SOC medium for 1 hour and then grown overnight in 5mL of 2xyt medium with shaking at 37 ℃. Plasmid DNA was then prepared from the culture in small amounts (QIAGEN). The plasmid DNA was then further cloned by digestion with restriction enzyme Esp3I (New England Biolabs), followed by ligation with an annealing oligonucleotide having complementary single stranded DNA protrusions and the desired spacer sequence for GFP targeting. These oligonucleotides have 5' phosphorylation modifications and are annealed by heating to 95℃for 1min, followed by a decrease in temperature of 2℃per minute until a final temperature of 25℃is reached. Ligation was performed as a Golden Gate assembly reaction, with typical reaction conditions consisting of 1 μg predigested plasmid library, 1 μM annealed oligonucleotides, 2 μ L T4 DNA ligase, 2 μL of Esp3I and 1x T4 DNA ligase buffer in a total volume of 40 μl water. The reaction was cycled between 37℃for 3 minutes and 16℃for 5 minutes for 25 times. The library was purified, transformed, grown overnight and prepared in small quantities as described above. The resulting plasmid library was then used for lentivirus production.

Library screening

LV production

Lentiviral particles were generated by transfection of LentiX HEK293T cells, inoculated 24 hours earlier, and confluency was 70% -90%. In serum-free medium, the plasmids containing the pooled library were introduced into a second generation lentiviral system containing the VSV-G envelope plasmid packaged and having polyethylenimine. For particle production, the medium was changed 12 hours after transfection and virus was harvested 36-48 hours after transfection. The virus supernatant was filtered using a 0.45 μm PES membrane filter and diluted in cell culture medium as appropriate before being added to the target cells.

72 hours after filtration, an aliquot of lentiviral supernatant was titrated by TaqMan qPCR. Viral genomic RNA was isolated by phenol-chloroform extraction (TRIzol) followed by ethanol precipitation. The quality and quantity of extraction was assessed by nanodrop readings. Any remaining plasmid DNA was then digested with dnase I prior to cDNA production by ThermoFischer SuperScript IV reverse transcriptase. Viral cDNA was serially diluted 1:1000 and combined with WPRE-based primers and TaqMan master mix, followed by qPCR by Bio-Rad CFX 96. All sample dilutions were added in duplicate and averaged before titer was calculated according to the known plasmid-based standard curve. Water was always measured as a negative control.

LV screening (transduction, maintenance, gating, sorting, gDNA isolation)

The target reporter cells were passaged 24-48 hours prior to transduction to ensure cell division occurred. At transduction, cells were treated with trypsin, counted, and diluted to the appropriate density. Cells were resuspended in untreated, library-containing or control pure lentiviral supernatant at low MOI (0.1-5, depending on the viral genome) to minimize double lentiviral integration. Lentiviral-cell mixtures were inoculated at 40% -60% confluency, then at 37 ℃, 5% CO₂ And (5) incubating. After 48 hours of transduction with puromycin from 1 μg/ml to 3 μg/ml, successfully transduced cells were selected for 4-6 days and then recovered in HEK or Fb medium.

After selection, the cells were suspended in 4', 6-diamidino-2-phenylindole (DAPI) and Phosphate Buffered Saline (PBS). Cells were then filtered using a Corning Filter cap FACS tube (product 352235) and sorted on Sony MA 900. In addition to gating on individual living cells by standard methods, cells are sorted to knock down fluorescent reporter genes. Cells from the experiments were lysed and the genome was extracted using Zymo Quick-DNA Miniprep Plus according to the manufacturer's protocol.

Next Generation Sequencing (NGS) process

Genomic DNA is amplified by PCR using primers specific for DNA encoding the guide RNA to form target amplicons. These primers contained additional sequences at the 5' end to introduce Illumina reads and 2 sequences. Typical PCR conditions will be: mu.g of gDNA, 1 XKapa Hifi buffer, 300nM dNTPs, 300nM each primer, 0.75. Mu. l Kapa Hifi Hotstart DNA polymerase in 50. Mu.l of the reaction. Circulating for 5min at 95 ℃ on a thermal cycler; then 15 cycles are carried out at 98℃for 15s, at 62℃for 20s and at 72℃for 1 min; wherein the final extension is carried out at 72℃for 2min. The amplified DNA product was purified using the Ampure XP DNA removal kit. A second PCR step was performed with an index adapter to allow multiplexing on the Illumina platform. In 50. Mu.l of the reaction, 20. Mu.l of purified product from the previous step was mixed with 1 XKapa GC buffer, 300nM dNTPs, 200nM of each primer, 0.75. Mu. l Kapa Hifi Hotstart DNA polymerase. Circulating for 5min at 95 ℃ on a thermal cycler; then 5-16 cycles of 98 ℃ for 15 seconds, 65 ℃ for 15 seconds and 72 ℃ for 30 seconds; wherein the final extension is carried out at 72℃for 2min. The amplified DNA product was purified using the Ampure XP DNA removal kit. The quality and quantification of the amplicons was assessed using the Fragment Analyzer DNA assay kit (Agilent, dsDNA 35-1500 bp). Amplicons were sequenced on an Illumina Miseq (v 3, 150 cycles of single-ended sequencing) according to the manufacturer's instructions.

NGS analysis (sample processing and data analysis)

Reads were trimmed using cutadapt (version 2.1) to obtain linker sequences and guide sequences (including scaffold sequences and spacer sequences) were extracted for each read (the sequence between the upstream and downstream amplicon sequences was also extracted using the cutadapt v 2.1 ligated linker). Unique guide RNA sequences were counted and each scaffold sequence was then compared to a list of designed sequences and the sequences of guide scaffolds 174 (SEQ ID NO: 2238) and 175 (SEQ ID NO: 2239) to determine identity of each.

The read counts for each unique guide RNA sequence were normalized using an average normalization method to determine sequencing depth. Enrichment for each sequence was calculated by dividing the normalized read count in each GFP sample by the normalized read count in the relevant native sample. For both selections (R2 and R4), GFP populations and the initial populations were NGS treated on three separate days to form triplicate enrichment values for each scaffold. After summing the read counts of triplicate initial and GFP samples, the total enrichment score for each scaffold was calculated.

Two enrichment scores from different selections pass through a single log₂ The weighted average combination of enrichment scores is weighted by their relative representation in the initial population.

Log estimation by calculating 95% confidence intervals for average enrichment scores of triplicate samples₂ Error in the enrichment score. These errors are propagated when combining two separately selected enrichment values.

Results and discussion

Library design, ordering and cloning

Libraries of guide RNA variants were designed to test for variation of the RNA scaffold in an unbiased manner and in a targeted manner that focused on key modules within the RNA scaffold.

In the unbiased portion of the library, all single nucleotide substitutions, insertions, and deletions were designed for each residue of guide scaffold 174 (SEQ ID NO: 2238) and 175 (SEQ ID NO: 2239) (about 2800 separate sequences). Double mutants were designed to focus exclusively on the regions of possible interactions; thus if in the CryoEM structure (PDBid: 6NY 2) two residues are involved in canonical or non-canonical base pairing interactions, or if the two residues are predicted to pair in the lowest energy structure predicted by RNAfold (v2.4.14), then the corresponding residues and 174 in the guide scaffold 174 are mutated (including all possible substitutions, insertions and deletions of the two residues). Residues adjacent to these "interacting" residues are also mutated; however, for these, only substitution of each of the two residues is included. In the final library, about 27K sequences were designed with two mutations relative to guide scaffold 174 or 175.

In the library portion dedicated to specific mutagenesis of key regions of the RNA scaffold, modifications were designed to: pseudoknot region, triplex region, scaffold blebs and extension stem (region identification see figure 65). In each of these targeting moieties of the library, the entire domain was mutagenized in a hypothetical driven manner (fig. 66). For example, for triplex regions, each base triplex that will contain a triplex is mutagenized to a different triplex forming motif (see FIG. 67). This type of mutagenesis is different from the mutagenesis employed in the stent stem blebs, in that all possible substitutions of bases surrounding the blebs are mutagenized (i.e., having up to 5 mutations relative to guide sequence 174 or 175). Again, in contrast, 5 base pairs comprising pseudobulb were completely replaced with a substituted watson-crick pairing sequence (up to 10 different bases were mutagenized).

The final target portion of the library is intended to optimize sequences that are more likely to form secondary structures that are prone to binding proteins. Briefly, the secondary structural stability of the sequences was predicted under two conditions: 1) Without any constraint, 2) is constrained such that critical secondary structural elements such as pseudoknot stems, scaffold stems, and extension stems are formed (see materials and methods). Our hypothesis is that the stability difference between these two conditions (referred to herein as ΔΔg_bc) will be minimal for sequences that bind more readily to proteins, so we should find a sequence where such difference is minimal.

Libraries were designed ordered from Twist (about 40K different sequences) and synthesized to include the golden gate site for cloning into the lentiviral plasmid backbone that also expressed protein STX119 (see materials and methods). The spacer sequence targeting the GFP gene is cloned into a library vector, effectively producing a single guide RNA targeting the GFP gene from each RNA scaffold variant. The representation of the designed library variants was evaluated with next generation sequencing (see materials and methods).

Library screening and evaluation

Preparing a plasmid library comprising guide RNA variants and a single CasX protein (version 119) into lentiviral particles (see materials and methods); the particles were titrated based on copy number of the viral genome using qPCR assays (see materials and methods). Cell lines stably expressing GFP were transduced with a lentiviral particle library at low multiplicity of infection (MOI) to ensure that each cell was at most integrated with one library member. The cell pool is selected to retain only cells with genomic integration. Finally, GFP expression sorting is performed on the cell population to obtain a GFP negative cell population. These GFP-negative cells contain library members that effectively target CasX RNPs to GFP protein, resulting in indels and subsequent loss of function.

Genomic DNA from both unsorted cell populations ("naive") and GFP-negative populations were treated to isolate sequences of guide RNA library members in each cell. To determine the representation of guide RNAs in the initial population and GFP-negative population, next generation sequencing was performed. The enrichment score for each library member was calculated by dividing its representation in the GFP population by its representation in the initial population: a high enrichment score indicates that library members occur much more frequently in active GFP negative populations than in the initial pool and are therefore active variants that are able to efficiently generate indels within the GFP gene (enrichment value>1，log₂ Enrichment>0). A low enrichment score indicates that library members of the active GFP population are depleted compared to the initial population and thus are not effective in forming indels (enrichment value<1，log₂ Enrichment<0). As a final statistic for the comparison, the relative enrichment value was calculated as the enrichment of library members (in GFP negative versus initial population) divided by the enrichment of reference scaffold sequences (in GFP negative versus initial population). The enrichment values of the reference scaffold sequences (in log space, only these values need to be subtracted.) are shown in figure 68.

Multiple screens were performed to independently generate lentiviral particles, transduce cells, select and sort to obtain an initial population and GFP-negative population, and sequenced to learn the enrichment value of each library member. These screens were designated R2 and R4 and largely reproduced the enrichment values obtained for the single nucleotide variants on guide brackets 174 and 175 (fig. 69). This screen enables the identification of many possible combinations of mutations that are enriched in the functional GFP population and thus can produce functional RNPs. In contrast, no guide containing non-targeted spacers was enriched, confirming that enrichment was a selective cut-off (data not shown). The complete set of mutations on the enriched guide brackets 174 and 175 are given in tables 28 and 29, respectively. These lists reveal the sequence diversity of functional RNPs that are still able to achieve targeting.

Single nucleotide mutations indicate a mutable region of the scaffold：

To determine the scaffold mutations that resulted in similar or improved activity relative to guide scaffolds 174 and 175, enrichment values for single nucleotide substitutions, insertions, or deletions were plotted (FIG. 70). In general, the guide scaffold 174 is more tolerant to single nucleotide changes thereon than the guide scaffold 175, which may reflect the higher activity of the guide scaffold 174 in this case, and thus the higher tolerance to mutations that inhibit activity (fig. 68 and 71). In most cases, the favorable single nucleotide mutation at 175 is also favorable in the context of guide scaffold 174 (fig. 71), so the mutation value at guide scaffold 175 is considered a more stringent reading of the mutation effect. The key mutable regions were revealed by this analysis, as described in the following paragraphs:

the most notable feature is the extension stem, which shows similar enrichment values to the reference sequence of scaffold 174 or 175, indicating that the scaffold can tolerate changes in this region, similar to that observed in the past, and would be predicted by structural analysis of CasX RNP, where little contact of the extension stem with protein was observed.

The triplex loop is another region that shows high enrichment relative to the reference scaffold, particularly when prepared in the guide scaffold 175 (e.g., particularly for mutations to C15 or C17). Notably, the C17 position in 175 has been mutated to G in scaffold 174, which is one of two highly enriched mutations at that position of scaffold 175.

The change in either member of the predicted pair in the pseudonode between G7 and a29 is highly enriched with respect to the reference, particularly in the guide scaffold 175. In both guide brackets 174 and 175, the pairing is a non-canonical G: A pairing. Mutations that are most strongly enriched at these positions convert a29 to C or T in guide scaffold 175: the first of these would form a canonical Watson-Crick pairing (G7: C29) and the second of these would form a GU wobble pair (G7: U29), both of which would be expected to increase the stability of the helix relative to the G: A pair. Conversion of G7 to T is also highly enriched, which will form a canonical pair at this position (U7: A29). Obviously, these positions facilitate a more stable pairing. Typically, the 5' end is mutable, with few changes resulting in de-enrichment.

Finally, the insertion of C at position 54 in guide bracket 175 is highly enriched, while the deletion of A or inserted G at a similar position in guide bracket 174 has an enrichment value similar to the reference. In summary, a guide stent may prefer to have two nucleotides in the stent bulb, but it may not be a strong preference. These results are further studied in the following sections.

Pseudobulb stability is an integral part of stent activity

To further explore the effect of pseudobulb on stent activity, pseudobulb was modified in the following manner: (1) Shuffling of base pairs within the stem results in each new pseudoknot having the same base pair composition, but a different order within the stem: (2) the base pairs are completely replaced by random WC-pairing sequences. Two hundred ninety-one (291) pseudotubers were tested. Ext> analysisext> ofext> theext> firstext> setext> ofext> sequencesext> showedext> aext> strongext> preferenceext> ofext> Gext> -ext> Aext> forext> theext> firstext> positionext> ofext> theext> pseudoknotext> stemext> overext> otherext> possibleext> positionsext> (ext> positionsext> 2ext> -ext> 6ext>;ext> inext> wildext> -ext> typeext> sequencesext>,ext> positionext> 5ext>;ext> FIG.ext> 72ext>)ext>,ext> whileext> theext> resultsext> demonstratedext> thatext> havingext> GAext> atext> eachext> ofext> positionsext> 2ext> -ext> 6ext> ofext> theext> pseudoknotext> stemext> wasext> generallyext> unfavorableext> withext> lowext> averageext> enrichmentext>.ext> Ext> havingext> aext> Gext> -ext> aext> baseext> atext> positionext> 1ext> mayext> stabilizeext> theext> pseudoknotext> stemext> byext> allowingext> theext> remainderext> ofext> theext> helixext> toext> beext> formedext> solelyext> fromext> stackedext> watsonext> -ext> crickext> pairsext>.ext> The results further support that the scaffold prefers perfectly paired pseudotuberosities.

A large number of pseudoknot sequences have positive logs₂ Enrichment, indicating that substitution of the sequence with alternative base pairs is generally tolerated (pseudo-junction structure in FIG. 73). To further test the hypothesis that a more stable helix in the pseudonode will produce a more active scaffold, the secondary structural stability (materials and methods) of each pseudonode was calculated. A strong relationship between pseudoknot stability and enrichment and thus activity was observed (FIG. 74: more active scaffolds have stable pseudoknot stems) with stable pseudoknot stems [ ] <7 kcal/mol) has a high enrichment, whereas guide scaffolds with unstable pseudotubers (. Gtoreq. -3 kcal/mol) have a very low enrichment.

Double mutation indicates mutable region of guide scaffold：

Double mutations of each reference guide scaffold were examined to further identify the mutable regions within the scaffold, as well as potential mutations that improve scaffold activity. Focusing only on a single pair of positions-positions 7 and 29, which were predicted to form non-canonical G: A pairs in pseudotubers and support mutagenesis (see section above) -we mapped all 64 double mutations for this pair of positions (FIG. 75). The canonical pair is advantageous at both positions (e.g., substitution of C at position 7 and G at position 29 yields a G: C pair and is enriched; substitution of C at position 7 and insertion of G at position 29 similarly yields a G: C pair, substitution of A at position 7 and substitution of U at position 29 yields a A: U pair). The insertion pair is not enriched, possibly because it is considered that the G: A pair is shifted up one position in the helix without being completely removed, where inserting a canonical pair is insufficient to stabilize the helix. Surprisingly, several enriched double mutations did not form the following canonical pairs; for example, the substitution of U at position 7 and C at position 29 (which forms a non-canonical U: C pair), the substitution of U at position 7 and U at position 29 (which forms a U: U pair), and some other substitutions (FIG. 75). It is possible that purine-purine is substantially more able to disrupt the helix than other non-canonical pairs. In fact, the substitution of A at position 7 and G at position 29 again forms the A: G pair, which is not enriched at that position.

The enrichment value for the double substitution within each key structural element of guide bracket 175 is determined by the heat map, where each position may have up to three substitutions. The stent stem was determined to be the worst resistant to the mutation, indicating that the sequence of this region was severely limited.

The results demonstrate that when used in editing assays, substantial changes can be made to the guide scaffold that still result in functional gene knockouts. In particular, the results demonstrate critical positions that can be used to enhance activity by modification in guide scaffolds, including increased secondary structural stability of the pseudotuberosities within the scaffold.

Table 28: mutation and relative enrichment of guide 174

* Mutant sequences were used "; "separate", and multiple mutations of each sequence are used "," separate

Table 29: relative enrichment of guide 175 mutations and production

Example 21: ccdB selection assay identifies dsDNA cleavage with improved TTC, ATC and CTC PAM sequencesOr improved spacer-specific CasX protein variants。

Experiments were performed to identify a set of variants derived from CasX 515 (SEQ ID NO: 145) that have biochemical cleavage capacity for double-stranded DNA (dsDNA) cleavage at target DNA sequences related to TTC or PAM sequences of ATC or CTC and exhibit improved activity or improved spacer specificity compared to CasX 515. To achieve this, first, a set of spacers that survive above background levels were identified in a CcdB selection experiment using CasX 515 and guide-stent 174. Second, ccdB selection was performed with these spacers to determine the set of variants derived from CasX 515 that were biochemically cleavable for dsDNA cleavage at the canonical "wild-type" PAM sequence TTC. Third, ccdB selection experiments were performed to determine the set of variants of CasX 515 that were able to improve dsDNA cleavage at PAM sequences of ATC type or CTC type. Fourth, plasmid counter selection experiments were performed to determine the set of variants derived from CasX 515, which resulted in improved spacer specificity.

Materials and methods：

For the CcdB selection experiment, 300ng of plasmid DNA (p 73) expressing the CasX protein (or library) and sgRNA shown was electroporated into the escherichia coli strain BW25113 carrying plasmids expressing the CcdB toxic protein. After transformation, the culture was allowed to recover in glucose rich medium with shaking at 37 ℃ for 20 min, after which IPTG was added to a final concentration of 1mM and the culture was further incubated for an additional 40 min. The recovered cultures were then titrated on LB agar plates (Teknova catalog number L9315) containing antibiotics selective for the plasmids. Cells were titrated on plates containing glucose (not expressing the CcdB toxin) or arabinose (expressing the CcdB toxin), relative viability was calculated and plotted as shown in figure 76. Next, the cultures were electroporated and recovered as described above, and the recovered fractions were saved for titration. The remaining recovery cultures were isolated after the recovery period and grown in glucose or arabinose containing medium to collect samples of the pooled library without selection or strong selection, respectively. These cultures were harvested and surviving plasmid pools were extracted using a plasmid miniprep kit (QIAGEN) according to the manufacturer's instructions. The entire procedure was repeated for a total of three rounds of selection.

The final plasmid pool was isolated and PCR amplification of the p73 plasmid was performed using primers specific for Unique Molecular Identifiers (UMI). These UMI sequences have been designed such that each specific UMI is associated with one and only one single mutation of the CasX 515 protein. Typical PCR conditions are used for amplification. The pool of variants of CasX 515 contains many possible amino acid substitutions, possible insertions and single amino acid deletions in a process called Deep Mutant Evolution (DME). The amplified DNA product was purified using the Ampure XP DNA purification kit and eluted in 30. Mu.l of water. Amplicons were then prepared for a second PCR sequencing to add a Next Generation Sequencing (NGS) compatible linker sequence on a MiSeq instrument or a NextSeq instrument (Illumina) according to the manufacturer's instructions. NGS was performed on the prepared samples. The returned original data file is processed as follows: (1) Quality and sequence of sequencesTrimming of the linker sequence; (2) Combining sequences from reads 1 and 2 into a single insertion sequence; and (3) quantifying each sequence containing UMI associated with the mutation relative to a reference sequence of CasX 515. The incidence of single mutations relative to CasX 515 was counted. Dividing the mutation count after selection by the mutation count before selection and using a false count of 10 to generate an "enrichment score". The base 2 logarithm (log) of the score was calculated₂ ) And plotted as a heat map, wherein the biological parallel-like enrichment score of a single spacer is determined at each amino acid position of an insertion, deletion or substitution (not shown). The library was passaged through CcdB selection with two TTC PAM spacers (spacer 23.2AGAGCGTGATATTACCCTGT,SEQ ID NO:41837; and 23.13CCCTTTGACGTTGGAGTCCA,SEQ ID NO:41838) in triplicate and one TTC PAM spacer (spacer 23.11TCCCCGATATGCACCACCGG,SEQ ID NO:41839) in triplicate, and the average of the triplicate measurements was plotted on log₂ On the enrichment scale, a heat map was taken as a measurement variant of CasX 515. The CasX515 variant, which retains full cleavage capacity compared to CasX515, exhibits a log of about 0₂ Enriching the value; variants that lost cleavage function exhibited a log of less than 0 compared to the value of CasX515₂ Values, whereas variants with improved cleavage using this selection yield a log greater than 0₂ Values. Experiments to generate additional heat maps (not shown) were performed using the following single spacers (11.2AAGTGGCTGCGTACCACACC,SEQ ID NO:41840;23.27GTACATCCACAAACAGACGA,SEQ ID NO:41840; and 23.19CCGATATGCACCACCGGGTA,SEQ ID NO:41842, respectively) for selectivity.

For plasmid counter selection experiments, additional rounds of bacterial selection were performed on the final plasmid pool generated by CcdB selection with TTC PAM spacer. The overall scheme of counter-selection is to allow replication of only those E.coli cells that contain both plasmid populations. The first plasmid (p 73) expresses CasX protein (under the induction of ATc) and sgRNA (constitutive expression), as well as an antibiotic resistance gene (chloramphenicol). It is noted that this plasmid can also be used in standard forward selection assays such as CcdB, and that the spacer sequence is freely variable, entirely according to the needs of the experimenter. The second plasmid (p 74) was used only to express the antibiotic resistance gene (kanamycin), but was modified to contain (or not contain) a target site matching the spacer encoded in p 73. Furthermore, these target sites can be designed to incorporate "mismatches" with respect to the spacer sequence, consisting of non-canonical Watson-Crick base pairing between the RNA of the spacer and the DNA of the target site. If the RNP expressed from p73 is able to cleave the target site in p74, the cell will only remain resistant to chloramphenicol. In contrast, if the RNP is unable to cleave the target site, the cell will remain resistant to both chloramphenicol and kanamycin. Finally, the above described dual plasmid replication system can be implemented in two ways. In a sequential approach, either plasmid may be delivered first into the cell, then the strain rendered electrically competent, and a second plasmid delivered (all by electroporation). Previous work has shown that either order of plasmid delivery is sufficient to successfully counter-select, and two approaches have been followed: in an experiment called "screen 5", p73 was electroporated into competent cells carrying p74, whereas in screen 6 the opposite was true. Cultures were electroporated, recovered, titrated and grown for one round under selective conditions as described above, and plasmid recovery was also performed as described above, followed by amplification, NGS and enrichment calculations.

Finally, additional CcdB selections were made in a similar manner, but using the guide scaffold 235 and the surrogate promoters WGAN45, ran2 and Ran4, all of which target the virulent CcdB plasmid with spacer 23.2. These promoters are expected to express guide RNAs less strongly than the CcdB selection described above, and are therefore expected to reduce the total concentration of CasX RNPs in bacterial cells. This physiological effect should reduce the overall viability of the bacterial cells in the selective assay, thereby increasing the dynamic range of the enrichment score and correlating more precisely with RNP nuclease activity at TTC PAM spacer 23.2. Three rounds of selection were performed in triplicate as above for each promoter, and each round of experiments resulted in enrichment data as described above. These experiments are hereinafter referred to as screen 7.

Results：

The results of library screening heatmaps demonstrate that CasX515 complexed with guide scaffold 174 is able to cleave the CcdB expression plasmid when targeted using a spacer region that targets DNA sequences associated with TTC PAM sequences (as listed below). In contrast, the spacer region using the alternative PAM sequence exhibited more variable survival. The survival rate of ATC PAM spacer (as listed below) ranges from a few percent to well below 0.1%, while CTC PAM spacer (as listed below) allows survival rates ranging from >50% to below 1%. Finally, GTC PAM spacers (listed below) only enable survival rates at or below 0.1%. These baseline data support the experimental design of this selection procedure and demonstrate the powerful selection capacity of the CcdB bacterial assay. Specifically, casX proteins that are unable to cleave double-stranded DNA are de-enriched by at least four orders of magnitude, whereas CasX proteins with biochemical cleavage capacity will survive the assay.

The heat map was used to identify a set of CasX 515 variants that were biochemically cleavage-capable for dsDNA cleavage at target DNA sequences associated with TTC PAM sequences, as well as those variants that exhibited improved dsDNA cleavage at target DNA sequences associated with CTC (spacers 11.2 and 23.27) and ATC (spacer (23.19)).

These three data sets (alone or in combination) represent potential biochemical differences between variants and identify regions of interest for future engineering of improved CasX therapeutics for human genome editing. As evidence of this, internal controls are consistently included as part of the initial library, such as the presence of stop codons at each position throughout the protein. In multiple rounds of selection, these stop codons were always observed to be lost, consistent with the expectation that the partially truncated CasX 515 should not result in dsDNA cleavage. Similarly, the loss of activity variants reflected in the heat map data were observed to have been depleted during selection and thus severely lost the fitness for double stranded DNA cleavage in this assay. However, enrichment values of 1 or greater (and corresponding log₂ An enrichment value of 0 or greater) is neutral at least in terms of biochemical cleavage. Importantly, if one or more mutations identified in this particular subset of variants exhibit the desired properties of the therapeutic molecule, then these mutations establish a structure-function relationship that exhibits compatibility with biochemical functions. More specifically, these protrusions Variants can affect properties such as CasX protein transcription, translation, folding, stabilization, ribonucleoprotein (RNP) formation, PAM recognition, double-stranded DNA unwinding, non-target strand cleavage, and target strand cleavage.

For those variants that have cleavage ability at sequences related to CTC and ATC PAM sequences, variants enriched in these datasets (enriched>1, corresponding to a log of values for about 0₂ Enrichment) represents a mutation that specifically improves cleavage of CTC or ATC PAM target sites. Mutations meeting these criteria can be further subdivided in two general ways: mutations increase the cleavage rate by increasing PAM recognition (type 1), or mutations increase the overall cleavage rate of the molecule independent of PAM sequence (type 2).

As an example of the first type, the substitution mutation at position 223 was found to be several hundred-fold enriched in all test samples. This position encodes a glycine in wild-type reference CasX proteins CasX 1 and 2 measured 6.34 angstroms from the-4 nucleotide position of the DNA non-target strand in the disclosed CryoEM structure of CasX 1 (PDB ID:6NY 2). Thus, these substitution mutations at position 223 are physically close to the altered nucleotides of the new PAM and may interact directly with DNA. Further supporting this conclusion, many of the enriched substitutions encode amino acids that are capable of forming additional hydrogen bonds relative to the substituted amino acid (glycine). These findings demonstrate that improved recognition of new PAM sequences in CasX proteins can be achieved by introducing mutations that interact with one or both DNA strands, especially when the PAM DNA sequences are physically close (within 10 angstroms). Other features of the heatmaps of ATC and CTC spacers represent mutations that increase recognition of non-canonical PAM sequences, but their mechanism of action has not been studied.

As an example of the second type of mutation, the results of the heat map were used to identify mutations that increase the overall cleavage rate compared to CasX 515, but do not necessarily specifically recognize the PAM sequence of DNA. For example, in the selection using spacer 11.2 (CTC PAM) and spacer 23.19 (ATC PAM), the CasX 515 variant consisting of arginine inserted at position 27 was measured to have an enrichment value of greater than 1. The variant has previously been identified by a comparable selection on CTC PAM spacer, wherein the mutation was enriched by several orders of magnitude (data not shown). The amino acid mutation is located physically close (9.29 angstroms) to the DNA target strand at position-1 in the structural model described above. These findings suggest a mechanism in which the mature R-loop formed by CasX RNP with double-stranded DNA is stabilized by the side chain of arginine, possibly through ionic interactions of the positively charged side chain of the DNA target strand with the negatively charged backbone. This interaction benefits the overall cleavage kinetics without altering PAM specificity. These data support the following conclusions: some enriched mutations represent variants that increase the overall cleavage activity of CasX 515 by physically interacting with either or both of the DNA strands when they are physically close (within 10 angstroms).

This data supports the following conclusions: many mutations measured to improve cleavage at sequences related to CTC or ATC PAM sequences identified from the heatmap can be classified as either of the two types of mutations described above. For the type 1 mutations, the variants consisting of the mutation at position 223 with a large enrichment score in at least one spacer tested at CTC PAM, and the associated maximum enrichment score are listed in table 30. For type 2 mutations, a smaller list of mutations was systematically selected from thousands of enriched variants. In order to identify those mutations that are highly likely to increase overall cleavage activity compared to CasX 515, the following procedure was used. First, those mutations that are most consistently enriched in CTC or ATM PAM spacers were filtered out. A lower Limit (LB) is defined for the enrichment score for each mutation for each spacer. LB was defined as log of pooled triplicate biological replicates₂ Log of each parallel was subtracted from the enrichment score₂ Standard deviation of enrichment scores. Second, a subset of these mutations is taken, where LB for at least two of the three independent experimental data sets (one ATC PAM selection and two CTC PAM selections)>1. Third, by excluding those that measured negative logs in any of the three TTC PAM selections₂ Enriched mutations to further reduce the subset of mutations. Finally, a single mutation is manually selected in at least one experiment based on a combination of structural features and strong enrichment scores. The resulting 274 mutations meeting these criteria are listed in Table 31, and the resulting heatmap showsIs the maximum log observed from two CTC or one ATC PAM experiments₂ Enrichment score, and domain where mutation is located.

In contrast to class I mutations, there is another class of mutations that increases the ability of CasX RNPs to discriminate between on-target and off-target sites in genomic DNA, as determined by the spacer sequence, referred to as class II, which increases the spacer specificity of nuclease activity of CasX proteins. Two additional experiments were performed to specifically identify class II mutations, where these experiments consisted of plasmid counter-selection and resulted in enrichment scores representing the sensitivity of the resulting variants to single mismatches between the spacer sequence of the guide RNA and the expected target DNA compared to CasX 515. The resulting enrichment scores for all observed mutations in the experimental data were ranked and the following analysis was performed to identify a subset of mutations that might increase spacer specificity of the CasX protein without substantially decreasing nuclease activity at the desired on-target site. First, mutations from screen 5 were ranked by mean enrichment scores of three technical replicates using spacer 23.2. As inferred from the published model of CasX RNP binding to the target site (PDB ID:6NY 2), those mutations that are physically close to the nucleotide mismatch were removed to discard those class II mutations that might confer a specific improvement only at spacer 23.2, rather than generally across the spacer. Finally, these class II mutations are discarded if their cleavage activity at the TTC PAM site on the target is negatively affected by the mutation and if their average log2 enrichment in the selection from three TTC PAM CcdB is less than zero. The resulting mutations meeting these criteria, as well as the maximum log observed from screen 5, are listed in table 32₂ Enrichment score, and domain where mutation is located. In addition, class II mutations were identified from counter-selection experimental screen 6. These mutations were similarly ranked in terms of their average enrichment score, but different filtering steps were applied. In particular, mutations are identified from each of the following categories: those mutations with the highest average enrichment score from spacer 23.2, spacer 23.11 or spacer 23.13; with the highest combined average enrichment score from spacer 23.2 and spacer 23.11Those of (a); those mutations with the highest combined average enrichment score from spacer 23.11 and spacer 23.13; or those with the highest combined average enrichment score from spacer 23.2 in screen 5 and spacer 23.2 in screen 6. These resulting mutations are listed in table 32, along with the maximum log2 enrichment score observed from screen 6, as well as the domain in which the mutation was located.

In addition to class I or II mutations, there is another class of mutations that have been directly observed to increase dsDNA editing activity at TTC PAM sequences. These mutations (referred to as class III mutations) demonstrated increased nuclease activity by exhibiting higher enrichment scores than CasX515 when the CcdB plasmid was targeted using spacer 23.2 in screen 7. Computational filtering steps are used to identify a subset of these enrichment mutations of particular interest. Specifically, mutations were identified with an average enrichment value of greater than 0 for each of the three promoters tested. Finally, the features of the enrichment score for the entire amino acid sequence are used to identify additional mutations at the enrichment location. Exemplary features of interest include the following: insertions or deletions at protein domain junctions to facilitate topology changes; amino acid substitutions to proline to kink the polypeptide backbone; amino acid substitutions to positively charged amino acids to increase ionic bonding between the protein and either the negatively charged nucleic acid backbone of the guide RNA or any strand of the target DNA; deletion of amino acids, wherein successive deletions are highly enriched; substitution of a position containing a number of highly enriched substitutions; amino acid substitutions at the N-terminus of the protein for highly enriched amino acids. These generated mutations are listed in table 33 along with the maximum log2 enrichment score observed from screen 6 and the domain in which the mutation was located.

Table 30: mutation of CasX 515 (SEQ ID NO: 145) by physical interaction with the PAM nucleotide of DNAFor increasing cleavage Activity at CTC PAM sequences

Table 31: casX 515 (SEQID NO: 145) to improveCleavage Activity at ATC and CTC PAM sequences

Table 32: casX 515 systematically identified from all data setsSEQID NO: 145) to increase the spaceCompartment specificity

Table 33: casX 515 (SEQIDNo. 145) to improveCleavage Activity at TTCPAM sequence