HK40027119A


Info

Publication number: HK40027119A
Application number: HK62020016676.2A
Authority: HK
Inventors: 梁锡权; R·波特; L·彭
Original assignee: Life Technologies Corporation (生命技术公司)
Priority date: 2017-09-08
Filing date: 2018-09-07
Publication date: 2021-01-15

Description

Methods and compositions for enhancing homologous recombination

Technical Field

The present disclosure relates to methods, kits, and compositions for improving the efficiency of homologous recombination. In particular, the present disclosure relates to methods of cloning DNA molecules directly into the genome by using promoter trapping in combination with short homology arms and nuclear localization signals, and to methods of binding one or more DNA binding agents (e.g., TAL effector domains, or Cas9 bound to a truncated guide RNA) to specific sites, thereby displacing or reconfiguring chromatin at a target locus and/or enhancing the accessibility of the target locus to further enzymatic modification. The methods and compositions provided herein are particularly useful for genome editing and for enhancing the enzymatic processes involved therein.

Background

Recent advances in TALEN- and CRISPR-mediated genome editing tools have enabled researchers to efficiently introduce double-strand breaks (DSBs) into mammalian genomes. In mammalian cells, most DSBs are then repaired through either the non-homologous end joining (NHEJ) pathway or the homology-directed repair (HDR) pathway. The NHEJ pathway is predominant and error-prone, whereas the HDR pathway allows precise genome editing by using sister chromatids or exogenous DNA molecules as templates; however, HDR efficiency remains low. For example, simultaneously blocking expression of the KU70 and DNA ligase IV genes with siRNA increases HDR efficiency 4- to 5-fold (see Chu et al., Nature Biotechnology 33 (2015)), and using a Cas9 nickase with a long DNA donor template raises the HDR efficiency in human embryonic stem cells (hESCs) to about 5% (see Rooming et al.).

Traditionally, long homology arms (500 bp to 2 kb) have been used to integrate relatively large DNA fragments into mammalian genomes. Because integration is inefficient and often random, this approach requires constructing targeting vectors and screening large populations of single-cell clones. It is therefore generally slow (about 4 to 6 months) and tedious, which hinders the use of mammalian cells for recombinant protein expression. To speed up protein production, transient gene expression is often used to skip the colony screening step. Although transient expression yields high levels of protein, the transgene is expressed only for a limited period, which makes recombinant protein production in mammalian systems expensive. To meet future market demand for biopharmaceutical recombinant proteins, cost-effective methods for rapidly and efficiently selecting high-productivity clones are needed.

The present disclosure relates, in part, to compositions and methods for editing nucleic acid molecules. There is a clear need for efficient systems and techniques for modifying genomes, and the present disclosure addresses this need and provides related advantages. Some embodiments provide a method for cloning relatively large DNA molecules directly into a mammalian genome by using promoter trapping in combination with short homology arms. Because of the high efficiency and specificity of this approach, the clonal cell isolation step can be bypassed and a stable cell pool can be used to produce recombinant proteins.

Disclosure of Invention

The compositions and methods set forth herein are directed to improving gene editing. As described elsewhere herein, a number of compositions and methods have been identified that increase gene editing efficiency.

Described herein is a method for homologous recombination in an initial nucleic acid molecule, the method comprising generating a double-strand break in the initial nucleic acid molecule to produce a cleaved nucleic acid molecule, and contacting the cleaved nucleic acid molecule with a donor nucleic acid molecule, wherein the initial nucleic acid molecule comprises a promoter and a gene, and wherein the donor nucleic acid molecule comprises: (i) matching ends of 12 bp to 250 bp in length at its 5' and 3' ends, (ii) a promoterless selection marker, (iii) a reporter gene, (iv) either a self-cleaving peptide linking the promoterless selection marker and the reporter gene, or LoxP sites located on either side of the promoterless selection marker, and (v) optionally, a linker between the promoterless selection marker and the reporter gene.
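The donor layout described above can be sketched as a simple sequence-assembly check. This is a minimal illustration, not the disclosed protocol: the `DonorParts` and `assemble_donor` names are hypothetical, the 5'-to-3' order (marker, linker, self-cleaving peptide, reporter) is one assumed arrangement, and the requirement that the coding block be a multiple of 3 nt assumes an in-frame (promoter-trap) insertion.

```python
from dataclasses import dataclass

# Matching-end (homology arm) length range from the text, in bp.
MIN_ARM_BP, MAX_ARM_BP = 12, 250


@dataclass
class DonorParts:
    """Hypothetical container for the donor components, each given 5'->3'."""
    left_arm: str   # 5' matching end (identical to sequence on one side of the break)
    marker: str     # promoterless selection marker coding sequence
    linker: str     # optional in-frame linker; may be ""
    cleavage: str   # self-cleaving peptide coding sequence (e.g. a 2A peptide)
    reporter: str   # reporter coding sequence (e.g. a fluorescent protein)
    right_arm: str  # 3' matching end (identical to sequence on the other side)


def assemble_donor(p: DonorParts) -> str:
    """Check the arm-length and reading-frame assumptions, then concatenate."""
    for name, arm in (("left_arm", p.left_arm), ("right_arm", p.right_arm)):
        if not MIN_ARM_BP <= len(arm) <= MAX_ARM_BP:
            raise ValueError(
                f"{name} is {len(arm)} bp; expected {MIN_ARM_BP}-{MAX_ARM_BP} bp")
    # Assumption: the coding block is inserted in frame with the endogenous gene,
    # so its length must be a multiple of 3 nt.
    coding = p.marker + p.linker + p.cleavage + p.reporter
    if len(coding) % 3:
        raise ValueError("coding block length is not a multiple of 3 nt")
    return (p.left_arm + coding + p.right_arm).upper()
```

With toy parts (35-bp arms and short placeholder coding sequences totalling 18 nt), the function returns an 88-nt donor; real marker, 2A, and reporter sequences would be substituted in practice.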

In some embodiments, the double-strand break in the nucleic acid molecule is located (i) within 250 bp of the ATG start codon, for N-terminal tagging of the cleaved nucleic acid molecule; or (ii) within 250 bp of the stop codon, for C-terminal tagging of the cleaved nucleic acid molecule.
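As a hedged illustration of the 250-bp rule, the helper below decides whether a candidate break can be used for N- or C-terminal tagging. The coordinate convention (0-based positions on the coding strand, with the stop-codon position taken as its first base) and the tie-break in favour of the nearer codon are assumptions made for this sketch.

```python
MAX_DISTANCE_BP = 250  # limit stated in the text


def tagging_mode(cut_pos: int, start_codon_pos: int, stop_codon_pos: int) -> str:
    """Return 'N-terminal', 'C-terminal', or 'none' for a double-strand break.

    All positions are 0-based coordinates on the coding strand (assumed convention):
    start_codon_pos is the A of ATG and stop_codon_pos is the first base of the stop codon.
    """
    d_start = abs(cut_pos - start_codon_pos)
    d_stop = abs(cut_pos - stop_codon_pos)
    # Prefer the nearer codon if the break is within range of both.
    if d_start <= MAX_DISTANCE_BP and d_start <= d_stop:
        return "N-terminal"
    if d_stop <= MAX_DISTANCE_BP:
        return "C-terminal"
    return "none"


print(tagging_mode(1100, 1000, 5000))  # N-terminal (break 100 bp from the ATG)
```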

In some embodiments, the double-strand breaks are induced by at least one nucleic acid cleavage entity or by electroporation. In some embodiments, the at least one nucleic acid cleavage entity comprises a nuclease comprising one or more zinc finger proteins, one or more transcription activator-like effectors (TALEs), one or more CRISPR complexes, one or more Argonaute protein-nucleic acid complexes, or one or more meganucleases. In some embodiments, the at least one nucleic acid cleavage entity is delivered using an expression vector, a plasmid, a ribonucleoprotein complex (RNC), or mRNA.

In some embodiments, the promoter-free selectable marker comprises a protein, an antibiotic resistance selectable marker, a cell surface protein, a metabolite, or an active fragment thereof. In some embodiments, the promoter-free selectable marker is a protein. In some embodiments, the protein is Focal Adhesion Kinase (FAK), angiopoietin-related growth factor (AGF) receptor, or Epidermal Growth Factor Receptor (EGFR).

In some embodiments, the promoter-free selectable marker is an antibiotic resistance selectable marker. In some embodiments, the antibiotic resistance selectable marker is a recombinant antibody. In some embodiments, the antibiotic resistance selectable marker is a human IgG antibody.

In some embodiments, the reporter gene comprises a fluorescent protein reporter. In some embodiments, the fluorescent protein reporter is an Emerald Green Fluorescent Protein (EmGFP) reporter or an Orange Fluorescent Protein (OFP) reporter.

In some embodiments, the promoter-free selectable marker is (i) linked to the 5' end of the reporter gene, for N-terminal tagging of the cleaved nucleic acid molecule; or (ii) linked to the 3' end of the reporter gene, for C-terminal tagging of the cleaved nucleic acid molecule.

In some embodiments, the donor nucleic acid molecule comprises a linker between the promoterless selection marker and the reporter gene. In some embodiments, the distance between the promoterless selection marker and the reporter gene is less than or equal to 300 nt, 240 nt, 180 nt, 150 nt, 120 nt, 90 nt, 60 nt, 30 nt, 15 nt, 12 nt, or 9 nt. In some embodiments, the distance is 6 nucleotides. In some embodiments, the linker is a polyglycine linker (e.g., about 2 to about 5 glycine residues).
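To relate the linker lengths above to the polyglycine option: a linker of n glycine residues occupies 3n nucleotides, so a 6-nt linker corresponds to two glycines, and the 2-to-5-residue range spans 6 to 15 nt while preserving the reading frame. The sketch below builds such a linker; the function name and the choice of a single repeated codon (`GGC` by default) are assumptions, and in practice codon usage would be chosen for the host.

```python
GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}  # the four codons for glycine


def polyglycine_linker(n_residues: int, codon: str = "GGC") -> str:
    """Return a polyglycine linker of n_residues glycines as DNA (3 nt per residue)."""
    codon = codon.upper()
    if codon not in GLY_CODONS:
        raise ValueError(f"{codon} does not encode glycine")
    if not 2 <= n_residues <= 5:
        raise ValueError("this sketch follows the ~2-5 glycine range in the text")
    return codon * n_residues


linker = polyglycine_linker(2)
print(linker, len(linker))  # GGCGGC 6
```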

In some embodiments, the self-cleaving peptide is a self-cleaving 2A peptide.

In some embodiments, the matching ends are added to the 5' and 3' ends of the donor nucleic acid molecule by PCR amplification.

In some embodiments, the matching ends share greater than or equal to 95% sequence identity.
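A minimal way to check the 95% identity criterion is an ungapped comparison between a matching end and the genomic window it is meant to match. This is a simplification assumed for illustration (the function names are hypothetical): it ignores insertions and deletions, which a real design pipeline would handle with a proper alignment.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped identity of two equal-length sequences, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def arm_matches(arm: str, genomic_window: str, threshold: float = 95.0) -> bool:
    """True if the matching end meets the identity threshold against the target window."""
    return percent_identity(arm, genomic_window) >= threshold
```

For a 35-bp matching end, one mismatch gives about 97.1% identity and passes, while two mismatches give about 94.3% and fail.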

In some embodiments, the matching end comprises single-stranded DNA or double-stranded DNA.

In some embodiments, the matching ends on the 5' and 3' ends of the donor nucleic acid molecule have a length of 12 bp to 200 bp, 12 bp to 150 bp, 12 bp to 100 bp, 12 bp to 50 bp, or 12 bp to 40 bp. In some embodiments, the matching end has a length of 35 bp.
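Because the matching ends can be added by PCR and may be as short as 35 bp, they fit within the 5' tails of ordinary primers. The sketch below builds such tailed primers; the 20-nt annealing length and the function names are assumptions for illustration, and melting temperature, secondary structure, and synthesis length limits are not checked.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primers(upstream_arm: str, downstream_arm: str, cassette: str,
                   anneal_len: int = 20) -> Tuple[str, str]:
    """Forward/reverse primers whose 5' tails carry the two matching ends.

    The PCR product (top strand) is upstream_arm + cassette + downstream_arm.
    """
    if anneal_len > len(cassette):
        raise ValueError("annealing region longer than the cassette")
    fwd = upstream_arm + cassette[:anneal_len]
    # The reverse primer is the reverse complement of the product's 3' end,
    # so its 5' tail is the reverse complement of the downstream arm.
    rev = reverse_complement(cassette[-anneal_len:] + downstream_arm)
    return fwd, rev
```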

In some embodiments, the initial nucleic acid molecule is present in a cell or plasmid.

In some embodiments, the donor nucleic acid molecule has a length of less than or equal to 1 kb, 2 kb, 3 kb, 5 kb, 10 kb, 15 kb, 20 kb, 25 kb, or 30 kb.

In some embodiments, the donor nucleic acid molecule is integrated into the cleaved nucleic acid molecule by homology-directed repair (HDR). In some embodiments, the HDR efficiency is greater than or equal to 10%, 25%, 50%, 75%, 90%, 95%, 98%, 99%, or 100%. In some embodiments, the HDR efficiency is 100%.

In some embodiments, the integration efficiency of the donor nucleic acid molecule is greater than or equal to 50%, 75%, 90%, 95%, 98%, 99%, or 100%. In some embodiments, the integration efficiency of the donor nucleic acid molecule is 100%.

In some embodiments, the method further comprises modifying the donor nucleic acid molecule at the 5' terminus, the 3' terminus, or both the 5' and 3' termini. In some embodiments, the donor nucleic acid molecule is modified at the 5' and 3' ends. In some embodiments, the donor nucleic acid molecule is modified in at least one strand of at least one end with one or more nuclease-resistant groups. In some embodiments, the one or more nuclease-resistant groups comprise one or more phosphorothioate groups, one or more amino groups, 2'-O-methyl nucleotides, 2'-deoxy-2'-fluoro nucleotides, 2'-deoxy nucleotides, 5-C-methyl nucleotides, or a combination thereof.

In some embodiments, the method further comprises treating the donor nucleic acid molecule with at least one non-homologous end joining (NHEJ) inhibitor. In some embodiments, the at least one NHEJ inhibitor is an inhibitor of DNA-dependent protein kinase (DNA-PK), DNA ligase IV, poly(ADP-ribose) polymerase 1 or 2 (PARP-1 or PARP-2), or a combination thereof. In some embodiments, the DNA-PK inhibitor is NU7026 (2-(4-morpholinyl)-4H-naphtho[1,2-b]pyran-4-one), NU7441 (8-(4-dibenzothienyl)-2-(4-morpholinyl)-4H-1-benzopyran-4-one), KU-0060648 (4-ethyl-N-[4-[2-(4-morpholinyl)-4-oxo-4H-1-benzopyran-8-yl]-1-dibenzothienyl]-1-piperazineacetamide), Compound 401 (2-(4-morpholinyl)-4H-pyrimido[2,1-a]isoquinolin-4-one), DMNB (4,5-dimethoxy-2-nitrobenzaldehyde), ETP45658 (3-[1-methyl-4-(4-morpholinyl)-1H-pyrazolo[3,4-d]pyrimidin-6-yl]phenol), LTURM 34 (8-(4-dibenzothienyl)-2-(4-morpholinyl)-4H-1,3-benzoxazin-4-one), or PI-103 hydrochloride (3-[4-(4-morpholinyl)pyrido[3',2':4,5]furo[3,2-d]pyrimidin-2-yl]phenol hydrochloride).

In some embodiments, the mammal is a human, a mammalian laboratory animal, a mammalian farm animal, a mammalian sport animal, or a mammalian pet. In some embodiments, the mammal is a human.

In some embodiments, the cell or plasmid is prepared by any of the homologous recombination methods described herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the eukaryotic cell is a mammalian cell.

Also described herein is a method of cell therapy comprising administering to a subject in need thereof an effective amount of any of the cells described herein.

In some embodiments, the cell is a T cell and the promoterless selectable marker is a Chimeric Antigen Receptor (CAR).

Also described herein is a method of producing a promoter-free selectable marker, the method comprising activating a promoter of a cell or plasmid prepared by any of the homologous recombination methods described herein to produce the promoter-free selectable marker.

Also described herein is a composition comprising a promoter-free selection marker produced by any of the methods for producing a promoter-free selection marker described herein.

Also described herein is a method of treating a subject in need thereof, the method comprising administering an effective amount of a promoter-free selection marker produced by any of the methods for producing a promoter-free selection marker described herein.

Also described herein are drug screening assays comprising a promoter-free selection marker produced by any of the methods described herein for producing a promoter-free selection marker.

Also described herein are kits for generating a promoter-free selection marker comprising a promoter-free selection marker linked to a reporter gene by a self-cleaving peptide or LoxP located on either side of the selection marker. In some embodiments, the reporter gene is GFP or OFP. In some embodiments, the kit further comprises at least one nucleic acid cleavage entity. In some embodiments, the kit further comprises at least one NHEJ inhibitor. In some embodiments, the kit further comprises one or more nuclease resistant groups.

Also described herein are recombinant antibody expression cassettes comprising: matching ends located at the 5' and 3' ends of the cassette, wherein the length of the matching ends is less than or equal to 250 bp; a promoter-free selectable marker; a reporter gene; a self-cleaving peptide linking the promoter-free selectable marker and the reporter gene; and optionally a linker between the promoterless selection marker and the reporter gene, wherein the promoterless selection marker is linked to the 5' end of the reporter gene for N-terminal tagging of the cleaved nucleic acid molecule or to the 3' end of the reporter gene for C-terminal tagging of the cleaved nucleic acid molecule.

Also described herein are compositions and methods for altering an endogenous nucleic acid molecule present within a cell, the method comprising introducing a donor nucleic acid molecule (e.g., a donor DNA molecule) into the cell, wherein the donor nucleic acid molecule is operably linked to one or more intracellular targeting moieties that are capable of localizing the donor nucleic acid molecule at the location of the endogenous nucleic acid molecule in the cell.

In some embodiments, the location of the endogenous nucleic acid molecule in the cell is in the nucleus, mitochondria, or chloroplast.

In some aspects, gene editing proteins and related methods are provided that allow efficient site-specific cleavage of nucleic acid molecules within cells, even when the proteins are introduced into cells in small quantities. Thus, compositions and methods are provided that allow high levels of site-specific cleavage even at low concentrations. Many factors may influence the amount of intracellular nucleic acid cleavage that occurs, including (1) the amount of active gene editing reagent contacted with the predetermined locus to be cleaved; (2) the level of cleavage activity exhibited by the gene editing reagent; and (3) the amount of donor nucleic acid in close proximity to the cleavage site. More generally, for diploid cells, the amount of editing that occurs at a locus across a population of cells is determined as the percentage of cells in which at least one allele of the locus is disrupted.
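The percentage described in the last sentence can be sketched as follows, assuming each diploid cell has been genotyped (for example, from single-cell clones) and scored as a pair of per-allele disrupted/not-disrupted flags. How alleles are scored is outside this sketch, the function names are hypothetical, and the input is a toy example chosen only to show the arithmetic.

```python
from typing import Sequence, Tuple

# (allele 1 disrupted?, allele 2 disrupted?) for one diploid cell
Genotype = Tuple[bool, bool]


def percent_edited(cells: Sequence[Genotype]) -> float:
    """Percentage of cells with at least one disrupted allele."""
    if not cells:
        raise ValueError("no cells scored")
    edited = sum(1 for a1, a2 in cells if a1 or a2)
    return 100.0 * edited / len(cells)


def percent_biallelic(cells: Sequence[Genotype]) -> float:
    """Percentage of cells in which both alleles are disrupted."""
    if not cells:
        raise ValueError("no cells scored")
    return 100.0 * sum(1 for a1, a2 in cells if a1 and a2) / len(cells)


toy = [(True, False), (True, True), (False, False), (False, False)]
print(percent_edited(toy), percent_biallelic(toy))  # 50.0 25.0
```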

In some embodiments, the one or more intracellular targeting moieties are nuclear localization signals. In some embodiments, the nuclear localization signal is operably linked to the 5' terminus of the donor nucleic acid molecule.

In some embodiments, the donor nucleic acid molecule is operably linked to at least one nucleic acid cleavage entity. In some embodiments, the at least one nucleic acid cleavage entity comprises a nuclease comprising one or more zinc finger proteins, one or more transcription activator-like effectors (TALEs), one or more CRISPR complexes, one or more Argonaute-nucleic acid complexes, one or more meganucleases, or one or more megabase meganucleases.

In some embodiments, the donor DNA molecule is not linked to a nucleic acid cleavage entity.

In some embodiments, the donor nucleic acid molecule (e.g., donor DNA molecule) is about 25 to about 8,000 nucleotides in length (e.g., about 25 to about 8,000 nucleotides, about 25 to about 5,000 nucleotides, about 25 to about 3,000 nucleotides, about 25 to about 2,000 nucleotides, about 25 to about 1,500 nucleotides, about 30 to about 100 nucleotides, about 30 to about 200 nucleotides, about 50 to about 500 nucleotides, about 50 to about 2,000 nucleotides, about 50 to about 8,000 nucleotides, about 75 to about 2,000 nucleotides, about 250 to about 5,000 nucleotides, etc.). One example where a short donor nucleic acid molecule may be desired is SNP insertion or correction. For example, in such a case, the donor nucleic acid molecule can have two homology arms of 15 nucleotides each and a single nucleotide for altering the target locus.
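For the SNP case, the donor is the reference sequence around the target base with that base swapped, flanked by two 15-nt arms, giving a 31-nt molecule. The helper name and its arguments are hypothetical; the sketch assumes a 0-based index into a plain reference string and does not consider strand choice or PAM disruption.

```python
def snp_donor(reference: str, snp_index: int, new_base: str, arm_len: int = 15) -> str:
    """Return left arm + new base + right arm, replacing reference[snp_index]."""
    if len(new_base) != 1 or new_base.upper() not in "ACGT":
        raise ValueError("new_base must be a single A, C, G, or T")
    if snp_index - arm_len < 0 or snp_index + arm_len >= len(reference):
        raise ValueError("not enough flanking sequence for the requested arms")
    left = reference[snp_index - arm_len:snp_index]
    right = reference[snp_index + 1:snp_index + 1 + arm_len]
    return left + new_base.upper() + right
```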

Furthermore, the donor nucleic acid molecule may be single-stranded, double-stranded, linear or circular.

In addition, the donor nucleic acid molecule can have one or more nuclease-resistant groups within 50 nucleotides of at least one terminus. These nuclease-resistant groups may be phosphorothioate groups. For example, two phosphorothioate groups may be located within 50 nucleotides of at least one of the termini.
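One common way to write such end-protected oligonucleotides is to mark each phosphorothioate linkage with an asterisk between the two bases it joins (a notation used by several oligonucleotide vendors). The sketch below places n modified linkages at each end and checks that they fall within the 50-nucleotide window; the function name and the default of two linkages at each end are assumptions.

```python
def add_terminal_ps(seq: str, n_per_end: int = 2, window: int = 50) -> str:
    """Mark n_per_end phosphorothioate linkages ('*') at each end of seq."""
    n_links = len(seq) - 1  # linkage i joins base i and base i + 1
    if n_per_end > window:
        raise ValueError("modified linkages would fall outside the terminal window")
    if 2 * n_per_end > n_links:
        raise ValueError("sequence too short for the requested modifications")
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        # Linkages 0..n-1 sit at the 5' end; the last n linkages sit at the 3' end.
        if i < n_links and (i < n_per_end or i >= n_links - n_per_end):
            out.append("*")
    return "".join(out)


print(add_terminal_ps("ACGTACGT"))  # A*C*GTAC*G*T
```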

In some embodiments, the donor nucleic acid molecule comprises a positive selection marker and/or a negative selection marker. In addition, the negative selection marker may be herpes simplex virus thymidine kinase.

In certain embodiments, the donor nucleic acid molecule has two regions that are complementary to a target locus sequence present in the cell. In addition, the positive selection marker, when present, may be located between two regions of sequence complementarity of the donor nucleic acid molecule. In addition, the negative selection marker, when present, may not be located between two regions of sequence complementarity of the donor nucleic acid molecule. In other words, the negative selection marker may be located outside of the two regions of sequence complementarity.
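The layout rule above matches the classic positive-negative selection design: homology-directed repair copies only the sequence between the two homology regions, so the positive marker is kept while a negative marker such as HSV thymidine kinase placed outside them is lost, whereas random integration of the whole molecule tends to retain it. A minimal layout check, using hypothetical feature labels, might look like this:

```python
from typing import List


def selection_layout_ok(features: List[str]) -> bool:
    """Check a 5'->3' list of feature labels ('arm', 'pos', 'neg', or anything else).

    Positive markers must lie between the two homology regions ('arm'),
    and negative markers must lie outside them.
    """
    arms = [i for i, f in enumerate(features) if f == "arm"]
    if len(arms) != 2:
        raise ValueError("expected exactly two homology regions")
    left, right = arms
    pos_inside = all(left < i < right for i, f in enumerate(features) if f == "pos")
    neg_outside = all(i < left or i > right for i, f in enumerate(features) if f == "neg")
    return pos_inside and neg_outside


print(selection_layout_ok(["arm", "pos", "arm", "neg"]))  # True
```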

In some embodiments, donor nucleic acid molecules operably linked to one or more intracellular targeting moieties capable of localizing the donor DNA molecule to the location of the endogenous nucleic acid molecule in the cell can be used in conjunction with the other compositions and methods set forth herein. Thus, further provided herein are methods of additionally contacting a cell with one or more of: (1) one or more nucleic acid cleavage entities, (2) one or more nucleic acid molecules encoding at least one component of a nucleic acid cleavage entity, (3) one or more DNA binding modulation enhancers, (4) one or more nucleic acid molecules encoding at least one component of a DNA binding modulation enhancer, or (5) one or more non-homologous end-joining (NHEJ) inhibitors.

As described elsewhere herein, it has been found that the use of non-homologous end joining (NHEJ) inhibitors can enhance the efficiency of homologous recombination. Accordingly, further provided herein are methods of contacting a cell with one or more NHEJ inhibitors, in particular methods wherein the one or more NHEJ inhibitors are DNA-dependent protein kinase inhibitors. Other NHEJ inhibitors that may be used include one or more compounds selected from the group consisting of: (1) NU7026, (2) NU7441, (3) KU-0060648, (4) DMNB, (5) ETP45658, (6) LTURM 34, and (7) PI-103 hydrochloride.

In addition, donor nucleic acid molecules operably linked to one or more intracellular targeting moieties can be introduced into cells by using gene editing reagents designed to cleave intracellular DNA at a target locus. Thus, at least one of the one or more nucleic acid cleaving entities may be selected from the group consisting of: (1) zinc finger nucleases, (2) TAL effector nucleases, and (3) CRISPR complexes. Similarly, the invention includes the use of at least one of one or more DNA binding modulation enhancers selected from the group consisting of: (1) zinc finger nucleases, (2) TAL effector nucleases, and (3) CRISPR complexes. In addition, at least one of the one or more DNA binding modulation enhancers, when used, can be designed to bind within 50 nucleotides of the target locus.

The invention further includes, in part, methods of homologous recombination in a eukaryotic cell, comprising contacting the cell with: (1) a donor nucleic acid molecule (e.g., a donor DNA molecule) and (2) (i) a nucleic acid cleavage entity, (ii) a nucleic acid encoding the nucleic acid cleavage entity, or (iii) at least one component of the nucleic acid cleavage entity and a nucleic acid encoding at least one component of the nucleic acid cleavage entity, wherein the donor nucleic acid molecule is bound to an intracellular targeting moiety capable of localizing the donor nucleic acid molecule to the location of the endogenous nucleic acid molecule in the cell.

Such methods further comprise contacting the cell with one or more of: (1) one or more non-homologous end-joining (NHEJ) inhibitors, (2) one or more DNA-binding modulation enhancers, (3) one or more nucleic acids encoding a DNA-binding modulation enhancer, and (4) at least one component of one or more DNA-binding modulation enhancers and a nucleic acid encoding at least one component of one or more DNA-binding modulation enhancers.

The invention also includes, in part, compositions comprising a nucleic acid molecule (e.g., a DNA molecule), wherein the nucleic acid molecule is covalently linked to one or more intracellular targeting moieties, and wherein the nucleic acid molecule is from about 25 nucleotides to about 8,000 nucleotides (e.g., from about 25 to about 8,000 nucleotides, from about 25 to about 5,000 nucleotides, from about 25 to about 3,000 nucleotides, from about 25 to about 2,000 nucleotides, from about 25 to about 1,500 nucleotides, from about 30 to about 100 nucleotides, from about 30 to about 200 nucleotides, from about 50 to about 500 nucleotides, from about 50 to about 2,000 nucleotides, from about 50 to about 8,000 nucleotides, from about 75 to about 2,000 nucleotides, from about 250 to about 5,000 nucleotides, etc.) in length. In some cases, the nucleic acid molecule is a donor nucleic acid molecule (e.g., a donor DNA molecule). In some cases, the one or more intracellular targeting moieties are nuclear localization signals. In other cases, two or more intracellular targeting moieties (e.g., nuclear localization signals, chloroplast targeting signals, mitochondrial targeting signals, etc.) are covalently linked to the nucleic acid molecule.

In one aspect, a method of increasing the accessibility of a target locus in a cell is provided. The method comprises the following steps: (1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus, thereby enhancing the accessibility of the target locus relative to its accessibility in the absence of the first DNA binding modulation enhancer.

In one aspect, a method of displacing chromatin of a target locus in a cell is provided. The method comprises the following steps: (1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence of the target locus, thereby displacing chromatin of the target locus.

In one aspect, a method of reconstituting chromatin of a target locus in a cell is provided. The method comprises the following steps: (1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence of the target locus, thereby reconstituting chromatin of the target locus.

In one aspect, a method of increasing the accessibility of a target locus in a cell is provided. The method comprises (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first DNA binding modulation enhancer, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (ii) a second DNA binding modulation enhancer, wherein the second DNA binding modulation enhancer is not endogenous to the cell; (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus; and (3) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus, thereby enhancing the accessibility of the target locus relative to its accessibility in the absence of the first or second DNA binding modulation enhancer.

In one aspect, a method of displacing chromatin of a target locus in a cell is provided. The method comprises the following steps: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first DNA binding modulation enhancer, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (ii) a second DNA binding modulation enhancer, wherein the second DNA binding modulation enhancer is not endogenous to the cell; (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus; and (3) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus, thereby displacing chromatin of the target locus.

In one aspect, a method of reconstituting chromatin of a target locus in a cell is provided. The method comprises the following steps: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first DNA binding modulation enhancer, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (ii) a second DNA binding modulation enhancer, wherein the second DNA binding modulation enhancer is not endogenous to the cell; (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus; and (3) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus, thereby reconstituting chromatin of the target locus.

In one aspect, a method of enhancing the activity of a regulatory protein or regulatory complex at a target locus in a cell is provided. The method comprises (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a regulator binding sequence of the target locus, wherein the regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding to a first enhancer binding sequence of the target locus; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence, thereby enhancing the activity of the first regulatory protein or the first regulatory complex at the target locus in the cell.

In one aspect, a method of modulating a target locus in a cell is provided. The method comprises (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a regulator binding sequence of the target locus, wherein the regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding to a first enhancer binding sequence of the target locus; and (2) allowing the first regulatory protein or the first regulatory complex to act on the regulatory site, thereby modulating the locus of interest in the cell.

In embodiments, the method comprises introducing a second DNA binding modulation enhancer capable of binding a second enhancer binding sequence of the target locus.

In embodiments, the first regulatory protein or first regulatory complex is not endogenous to the cell.

In embodiments, the rate of homologous recombination at the target locus is increased relative to the absence of the first DNA binding-modulating enhancer.

In embodiments, the second enhancer binding sequence is linked to the first enhancer binding sequence by a regulator binding sequence.

In embodiments, the method further comprises introducing a second regulatory protein or a second regulatory complex capable of binding to the regulator binding sequence.

In embodiments, the first regulatory protein or the second regulatory protein comprises a DNA binding protein or a DNA-modifying enzyme. In embodiments, the DNA binding protein is a transcriptional repressor or a transcriptional activator. In embodiments, the DNA-modifying enzyme is a nuclease, deaminase, methylase, or demethylase.

In embodiments, the first regulatory protein or the second regulatory protein comprises a histone-modifying enzyme. In embodiments, the histone-modifying enzyme is a deacetylase or an acetylase.

In embodiments, the first regulatory protein is a first DNA-binding nuclease conjugate. In embodiments, the second regulatory protein is a second DNA-binding nuclease conjugate. In embodiments, the first DNA-binding nuclease conjugate comprises a first nuclease and the second DNA-binding nuclease conjugate comprises a second nuclease. In embodiments, the first nuclease and the second nuclease form a dimer. In embodiments, the first nuclease and the second nuclease are independently a transcription activator-like effector nuclease (TALEN).

In embodiments, the first DNA-binding nuclease conjugate comprises a first transcription activator-like (TAL) effector domain operably linked to a first nuclease (TALEN). In embodiments, the first DNA-binding nuclease conjugate comprises a first TAL effector domain operably linked to a first FokI nuclease. In embodiments, the second DNA-binding nuclease conjugate comprises a second TAL effector domain operably linked to a second nuclease (TALEN). In embodiments, the second DNA-binding nuclease conjugate comprises a second TAL effector domain operably linked to a second FokI nuclease. In embodiments, the first DNA-binding nuclease conjugate comprises a first zinc finger nuclease. In embodiments, the second DNA-binding nuclease conjugate comprises a second zinc finger nuclease.

In embodiments, the first regulatory complex is a first ribonucleoprotein complex. In embodiments, the second regulatory complex is a second ribonucleoprotein complex. In embodiments, the first ribonucleoprotein complex comprises a CRISPR-associated protein 9 (Cas9) domain bound to a gRNA or an Argonaute domain bound to a guide DNA (gDNA). In embodiments, the second ribonucleoprotein complex comprises a Cas9 domain bound to a gRNA or an Argonaute domain bound to a guide DNA (gDNA).

In embodiments, the first regulatory protein, first regulatory complex, second regulatory protein, or second regulatory complex is not endogenous to the cell. In embodiments, the first regulatory protein and the second regulatory protein are not endogenous to the cell. In embodiments, the first regulatory complex and the second regulatory complex are not endogenous to the cell. In embodiments, the first DNA binding modulation enhancer or the second DNA binding modulation enhancer is not endogenous to the cell. In embodiments, the first DNA binding modulation enhancer and the second DNA binding modulation enhancer are not endogenous to the cell.

In embodiments, the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid. In embodiments, the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

In embodiments, the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid. In embodiments, the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

In embodiments, the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein. In embodiments, the first DNA binding modulation enhancer is a TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA. In embodiments, the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA. In embodiments, the first DNA binding modulation enhancer is a truncated gRNA and the second DNA binding modulation enhancer is a TAL effector protein.

In embodiments, the first regulatory protein is a first DNA-binding nuclease conjugate and the second regulatory protein is a second DNA-binding nuclease conjugate. In embodiments, the first regulatory protein is a DNA-binding nuclease conjugate and the second regulatory complex is a ribonucleoprotein complex. In an embodiment, the first regulatory complex is a first ribonucleoprotein complex and the second regulatory complex is a second ribonucleoprotein complex. In embodiments, the first regulatory complex is a ribonucleoprotein complex and the second regulatory protein is a DNA-binding nuclease conjugate.

In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence is independently separated from the regulator binding sequence by less than 200 nucleotides (e.g., about 5 to about 180, about 10 to about 180, about 20 to about 180, about 5 to about 90, about 5 to about 70, about 5 to about 60, about 5 to about 50, about 5 to about 40, about 5 to about 30, about 15 to about 80, about 15 to about 60, about 15 to about 50, about 15 to about 40, about 20 to about 40 nucleotides, etc.). In embodiments, the first enhancer binding sequence is independently separated from the regulator binding sequence by less than 150 nucleotides. In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence is separated from the regulator binding sequence by less than 100 nucleotides. In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence are independently separated from the regulator binding sequence by less than 50 nucleotides. In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence are independently separated from the regulator binding sequence by 4 to 30 nucleotides. In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence are independently separated from the regulator binding sequence by 7 to 30 nucleotides. In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence is separated from the regulator binding sequence by 4 nucleotides, 7 nucleotides, 12 nucleotides, 20 nucleotides, or 30 nucleotides.
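To make the spacing language concrete, the following minimal Python sketch computes the number of intervening nucleotides between an enhancer binding sequence and a regulator binding sequence and checks it against one of the ranges above. It assumes (this is not stated in the text) that both sites are given as 0-based, half-open intervals on the same strand, and that "separated by" counts only the bases between the nearest ends of the two sites, which is how FIG. 24 describes TAL-to-TALEN spacing. The helper names and coordinates are hypothetical.

```python
from typing import Tuple

# A binding sequence as a half-open interval [start, end) in 0-based
# coordinates on one strand of the target locus (hypothetical convention).
Interval = Tuple[int, int]


def spacing_nt(site_a: Interval, site_b: Interval) -> int:
    """Return the number of nucleotides lying between two non-overlapping sites."""
    (_, first_end), (second_start, _) = sorted([site_a, site_b])
    if second_start < first_end:
        raise ValueError("binding sequences overlap")
    return second_start - first_end


def in_range(spacing: int, low: int, high: int) -> bool:
    """Inclusive range check, e.g. 4 to 30 nucleotides."""
    return low <= spacing <= high


# Hypothetical layout: an 18-nt enhancer binding sequence, a 7-nt gap,
# then a 20-nt regulator binding sequence.
enhancer_site = (100, 118)
regulator_site = (125, 145)

gap = spacing_nt(enhancer_site, regulator_site)
print(gap)                    # 7
print(in_range(gap, 4, 30))   # True
print(gap < 200)              # True
```

Because the helper sorts the two intervals first, the result is the same whichever site lies upstream of the other.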

In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence are independently separated from the regulatory site by 10 to 40 nucleotides. In embodiments, the first enhancer binding sequence and/or the second enhancer binding sequence are independently separated from the regulatory site by 33 nucleotides.

In embodiments, the first enhancer binding sequence has the sequence of SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, or SEQ ID NO: 40. In embodiments, the second enhancer binding sequence has the sequence of SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 39, or SEQ ID NO: 41.

In embodiments, the first DNA binding modulation enhancer or the second DNA binding modulation enhancer enhances the activity of the first regulatory protein, the first regulatory complex, the second regulatory protein, or the second regulatory complex at the regulatory site.

In one aspect, a cell is provided that comprises a nucleic acid encoding a target locus regulatory complex. The complex comprises: (i) a target locus comprising a first enhancer binding sequence and a regulator binding sequence comprising a regulatory site; (ii) a first regulatory protein or first regulatory complex that binds to the regulator binding sequence; and (iii) a first DNA binding modulation enhancer associated with the first enhancer binding sequence.

In embodiments, the target locus further comprises a second enhancer binding sequence linked to the first enhancer binding sequence by a regulator binding sequence.

In embodiments, the cell comprises a second DNA binding modulation enhancer associated with the second enhancer binding sequence.

In one aspect, a cell comprising a nucleic acid encoding a complex of a locus of interest is provided. The complex comprises (i) a locus of interest comprising a first enhancer binding sequence; and (ii) a first DNA binding modulation enhancer associated with a first enhancer binding sequence, wherein the first DNA binding modulation enhancer is not endogenous to the cell, and wherein the first DNA binding modulation enhancer is capable of enhancing accessibility to the target locus relative to the absence of the first DNA binding modulation enhancer.

In one aspect, a cell comprising a nucleic acid encoding a complex of a locus of interest is provided. The complex comprises (1) a target locus comprising (i) a first enhancer binding sequence and (ii) a second enhancer binding sequence; (2) a first DNA binding modulation enhancer associated with the first enhancer binding sequence of the target locus, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (3) a second DNA binding modulation enhancer associated with the second enhancer binding sequence of the target locus, wherein the second DNA binding modulation enhancer is not endogenous to the cell, and wherein the first DNA binding modulation enhancer and the second DNA binding modulation enhancer are capable of enhancing accessibility of the target locus relative to the absence of the first DNA binding modulation enhancer and the second DNA binding modulation enhancer.

In certain aspects, kits are provided. The kits provided herein can include one or more of the following: (i) a first regulatory protein or a first regulatory complex; (ii) a first DNA binding modulation enhancer; (iii) one or more nucleic acid molecules; (iv) one or more intracellular targeting moieties; and (v) one or more non-homologous end-joining inhibitors.

Also provided herein are gene-editing reagents, such as Cas9 proteins, and nucleic acids encoding such reagents, comprising two or more (e.g., about two to about twelve, about three to about twelve, about four to about twelve, about five to about twelve, about two to about seven, about three to about seven, etc.) nuclear localization signals (NLSs) (e.g., non-classical, monopartite, and/or bipartite NLSs). Exemplary Cas9 proteins are those comprising two or more bipartite NLSs. In addition, all or some of the two or more bipartite NLSs can be located within twenty amino acids of at least one terminus, such as the N-terminus and/or the C-terminus, of the Cas9 protein. Position here refers to the portion of the NLS closest to the terminus. Thus, if the C-terminal amino acid of an NLS is followed by ten other amino acids, the last of which is the C-terminal amino acid of the protein, the NLS is positioned eleven amino acids from the C-terminus. In other words, the position count is determined by the last amino acid of the NLS.
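The counting rule above can be expressed as a short Python sketch. It locates an NLS in a protein sequence and reports its distance from the C-terminus using the last NLS residue, so that an NLS followed by ten residues is at position eleven. The N-terminal function applies the same "closest residue" rule to the first NLS residue; that symmetric case is an extrapolation, because the text only works through the C-terminal example. The protein is made up, the SV40-type sequence PKKKRKV serves only as a placeholder NLS, and the function names are hypothetical.

```python
def position_from_c_terminus(protein: str, nls: str) -> int:
    """Distance of the NLS's last residue from the C-terminus (1 = terminal residue)."""
    start = protein.rfind(nls)  # copy of the NLS nearest the C-terminus
    if start == -1:
        raise ValueError("NLS not found in protein")
    last_residue = start + len(nls) - 1             # 0-based index of last NLS residue
    residues_after = len(protein) - 1 - last_residue
    return residues_after + 1


def position_from_n_terminus(protein: str, nls: str) -> int:
    """Distance of the NLS's first residue from the N-terminus (1 = first residue)."""
    start = protein.find(nls)  # copy of the NLS nearest the N-terminus
    if start == -1:
        raise ValueError("NLS not found in protein")
    return start + 1


def within_twenty(position: int) -> bool:
    """True if the NLS lies within twenty amino acids of the terminus."""
    return position <= 20


# Made-up protein: 51 residues, then the NLS, then ten trailing residues.
nls = "PKKKRKV"
protein = "M" + "A" * 50 + nls + "G" * 10

c_pos = position_from_c_terminus(protein, nls)
print(c_pos, within_twenty(c_pos))              # 11 True
print(position_from_n_terminus(protein, nls))   # 52
```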

Furthermore, the gene-editing reagent (e.g., Cas9 protein) may comprise NLSs that differ in amino acid sequence or that have the same amino acid sequence. In addition, the gene-editing reagent (e.g., Cas9 protein) may comprise one or more (e.g., about one to about five, about one to about four, etc.) affinity tags. NLSs used in combination with a gene-editing reagent may comprise one or more of the following amino acid sequences: (A) KRTAD GSEFE SPKKK RKVE (SEQ ID NO: 48), (B) KRTAD GSEFE SPKKA RKVE (SEQ ID NO: 49), (C) KRTAD GSEFE SPKKKAKVE (SEQ ID NO: 50), (D) KRPAA TKKAG QAKKK K (SEQ ID NO: 51), and (E) KRTAD GSEFEP AAKRV KLDE (SEQ ID NO: 52). The NLSs used in conjunction with the gene-editing reagent may also comprise one or more amino acid sequences that fall within the scope of one or more of the following formulae: (A) KRX₅₋₁₅KKN₁N₂KV (SEQ ID NO: 53), (B) KRX₅₋₁₅K(K/R)(K/R)₁₋₂ (SEQ ID NO: 54), and (C) KRX₅₋₁₅K(K/R)X(K/R)₁₋₂ (SEQ ID NO: 55), wherein X₅₋₁₅ is an amino acid sequence of 5 to 15 amino acids in length, N₁ is L or A, and N₂ is L, A, or R. In addition, specific Cas9 proteins that can be used in the compositions and methods set forth herein comprise the amino acid sequences shown in FIG. 41 and FIG. 42.
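One way to screen candidate peptides against formulae (A) to (C) is to translate each formula into a regular expression, as in the Python sketch below. Assumptions: X₅₋₁₅ is any run of 5 to 15 standard amino acids, the unsubscripted X in formula (C) is any single amino acid (the text does not define it), and a peptide qualifies if the motif occurs anywhere within it. The `matching_formulas` helper is hypothetical. Applied to the listed sequences, it shows, for example, that SEQ ID NO: 49 satisfies formula (A) and SEQ ID NO: 51 satisfies formula (B).

```python
import re
from typing import List

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

NLS_FORMULAS = {
    # (A) KR X(5-15) K K N1 N2 K V, with N1 = L/A and N2 = L/A/R
    "A": re.compile(rf"KR[{AA}]{{5,15}}KK[LA][LAR]KV"),
    # (B) KR X(5-15) K (K/R) (K/R)(1-2)
    "B": re.compile(rf"KR[{AA}]{{5,15}}K[KR][KR]{{1,2}}"),
    # (C) KR X(5-15) K (K/R) X (K/R)(1-2); single X assumed to be any residue
    "C": re.compile(rf"KR[{AA}]{{5,15}}K[KR][{AA}][KR]{{1,2}}"),
}


def matching_formulas(peptide: str) -> List[str]:
    """Return the labels of all formulae whose motif occurs in the peptide."""
    peptide = peptide.replace(" ", "").upper()  # tolerate spaced groups of five
    return sorted(label for label, pattern in NLS_FORMULAS.items()
                  if pattern.search(peptide))


listed = {
    48: "KRTAD GSEFE SPKKK RKVE",
    49: "KRTAD GSEFE SPKKA RKVE",
    51: "KRPAA TKKAG QAKKK K",
}
for seq_id, peptide in listed.items():
    print(seq_id, matching_formulas(peptide))
# 48 ['B', 'C']
# 49 ['A', 'C']
# 51 ['B', 'C']
```

A peptide can satisfy more than one formula, because (B) and (C) overlap whenever the residues after X₅₋₁₅ form a run of basic amino acids.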

Further provided herein are TALE proteins comprising one or more (e.g., about two to about six, about two to about five, about two to about four, about two to about three, about three to about five, etc.) heterologous nuclear localization signals (e.g., a monopartite NLS, a bipartite NLS, etc.). In some aspects, provided herein are TALE proteins comprising amino acids 811-830 of FIG. 46, wherein the amino acids at positions 815-816 and 824-825 are Gly-Ser or Gly-Gly, and TALE proteins comprising amino acids 810-1029 of FIG. 46, wherein the amino acids at positions 1022-1023 are Gly-Ser or Gly-Gly. In addition, the TALE proteins provided herein can comprise amino acids 752-1021 of FIG. 46.

In some aspects, provided herein are TALE proteins comprising amino acids 20-165 of FIG. 47, wherein the amino acids at positions 28-29 are Gly-Ser or Gly-Gly and wherein the amino acids at positions 108-110 and 823-824 are Arg-Gly-Ala or Gln-Trp-Ser. In addition, the TALE proteins provided herein may comprise amino acids 821-840 of FIG. 47, wherein the amino acids at positions 827-828 are Gly-Ser or Gly-Gly. The TALE protein may also comprise amino acids corresponding to FIG. 46.
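Statements such as "the amino acids at positions 815-816 are Gly-Ser or Gly-Gly" amount to checking residues at 1-based positions of a sequence numbered against a reference figure. The Python sketch below shows such a check. FIG. 46 is not reproduced here, so the test sequence is synthetic, and the `check_positions` helper and its rule format are hypothetical; a real check would first number the candidate sequence against the figure.

```python
from typing import Dict, Set, Tuple

# (first_position, last_position), 1-based and inclusive -> allowed residues.
Rules = Dict[Tuple[int, int], Set[str]]

# Rules taken from the text for amino acids 811-830 of FIG. 46.
FIG46_811_830: Rules = {
    (815, 816): {"GS", "GG"},  # Gly-Ser or Gly-Gly
    (824, 825): {"GS", "GG"},
}


def check_positions(protein: str, rules: Rules) -> Dict[Tuple[int, int], bool]:
    """Report, per rule, whether the residues at those positions are allowed."""
    results = {}
    for (first, last), allowed in rules.items():
        if last > len(protein):
            results[(first, last)] = False    # sequence too short for these positions
            continue
        segment = protein[first - 1:last]      # 1-based inclusive -> Python slice
        results[(first, last)] = segment in allowed
    return results


# Synthetic 830-residue sequence with Gly-Ser at 815-816 and Gly-Gly at 824-825.
residues = ["A"] * 830
residues[814:816] = "GS"
residues[823:825] = "GG"
synthetic = "".join(residues)

print(check_positions(synthetic, FIG46_811_830))
# {(815, 816): True, (824, 825): True}
```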

TAL proteins provided herein in various aspects can comprise a repeat region comprising 4 to 25 (e.g., about 5 to about 22, about 6 to about 22, about 8 to about 22, about 10 to about 22, about 12 to about 26, about 13 to about 20, etc.) repeat units.

Also provided herein are methods of engineering intracellular nucleic acids in a cell, the methods comprising introducing into the cell one or more TALE proteins (e.g., one or more TALE proteins mentioned above) or nucleic acids encoding the one or more TALE proteins, wherein the one or more TALE proteins are designed to bind to a locus of interest within the cell. In some aspects, such methods further comprise introducing into the cell one or more donor nucleic acid molecules, wherein the one or more donor nucleic acid molecules have one or more regions of sequence homology to a nucleic acid within 50 (e.g., about 0 to about 50, about 0 to about 40, about 0 to about 30, about 0 to about 20, about 6 to about 40, etc.) nucleotides of the target locus.

Further provided herein is a method of homologous recombination of intracellular nucleic acid molecules within a population of cells at a cleavage site, the method comprising: (a) generating one or more double-strand breaks in the intracellular nucleic acid molecule at the cleavage site to produce a cleaved nucleic acid molecule; and (b) contacting the cleaved nucleic acid molecule with one or more donor nucleic acid molecules, wherein the one or more donor nucleic acid molecules have at least ten (e.g., about 10 to about 500, about 10 to about 400, about 10 to about 300, about 10 to about 250, about 20 to about 300, about 25 to about 300, about 30 to about 350, etc.) nucleotides or base pairs that are homologous to nucleic acid within 100 base pairs on each side of the cleavage site, and wherein at least 95% (e.g., about 95% to about 100%, about 95% to about 99%, about 96% to about 99%, about 95% to about 98%, etc.) of cells within the population of cells undergo homology-directed repair at the cleavage site in the presence of at least one of the one or more donor nucleic acid molecules. In some aspects, the one or more donor nucleic acid molecules contain one or more selectable markers or one or more reporter genes that are operably linked to a promoter present in the intracellular nucleic acid molecule following homology-directed repair. Furthermore, the one or more donor nucleic acid molecules may be linked to one or more nuclear localization signals that allow the donor nucleic acid molecules to localize to the nucleus of cells in the population.
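The homology requirement in step (b) can be checked computationally before a donor is ordered. The Python sketch below tests whether each homology arm is at least ten nucleotides long and occurs within the 100 bp on its side of the cleavage site. It simplifies "homologous" to an exact, same-strand match, which real donors need not satisfy; the function name, the cut-position convention (the break lies between `genome[cut - 1]` and `genome[cut]`), and the random test sequence are assumptions for illustration.

```python
import random


def arm_in_window(genome: str, cut: int, arm: str, side: str,
                  window: int = 100, min_len: int = 10) -> bool:
    """True if `arm` is >= min_len nt and lies within `window` bp on `side` of the cut."""
    if len(arm) < min_len:
        return False
    if side == "left":
        region = genome[max(0, cut - window):cut]
    elif side == "right":
        region = genome[cut:cut + window]
    else:
        raise ValueError("side must be 'left' or 'right'")
    return arm in region  # exact match stands in for sequence homology


# Synthetic 400-bp locus with a double-strand break at position 200.
rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(400))
cut = 200

left_arm = genome[cut - 35:cut]    # 35-nt arm ending at the break
right_arm = genome[cut:cut + 35]   # 35-nt arm starting at the break

print(arm_in_window(genome, cut, left_arm, "left"))          # True
print(arm_in_window(genome, cut, right_arm, "right"))        # True
print(arm_in_window(genome, cut, genome[191:200], "left"))   # False: only 9 nt
```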

In some aspects, the population of cells can be contacted with one or more of: (1) one or more nucleic acid cleavage entities, (2) one or more nucleic acid molecules encoding at least one component of a nucleic acid cleavage entity, (3) one or more DNA binding modulation enhancers, (4) one or more nucleic acid molecules encoding at least one component of a DNA binding modulation enhancer, and/or (5) one or more non-homologous end joining (NHEJ) inhibitors. Furthermore, one or more of the one or more donor nucleic acid molecules may be single stranded.

In other aspects, the population of cells can be contacted with one or more nucleic acid cleavage entities, or with one or more nucleic acid molecules encoding the one or more nucleic acid cleavage entities, and then contacted with one or more donor nucleic acid molecules. Alternatively, the population of cells can be contacted with one or more donor nucleic acid molecules and then contacted with one or more nucleic acid cleavage entities or one or more nucleic acid molecules encoding them. For example, the population of cells can be contacted with the one or more donor nucleic acid molecules 1 to 60 minutes after being contacted with the one or more nucleic acid cleavage entities (or nucleic acid molecules encoding them). Conversely, the population of cells can be contacted with the one or more nucleic acid cleavage entities (or nucleic acid molecules encoding them) 1 to 60 minutes after being contacted with the one or more donor nucleic acid molecules. In some cases, the population of cells can be contacted simultaneously with the one or more nucleic acid cleavage entities (or nucleic acid molecules encoding them) and the one or more donor nucleic acid molecules.

In other aspects, which may be combined with the above, the one or more nucleic acid cleavage entities (or one or more nucleic acid molecules encoding them) and the one or more donor nucleic acid molecules can be introduced into the cells together or separately by electroporation. For example, the one or more nucleic acid cleavage entities (or one or more nucleic acid molecules encoding them) may first be introduced into the cells, followed by electroporation of the one or more donor nucleic acid molecules; or the one or more donor nucleic acid molecules may first be introduced into the cells, followed by electroporation of the one or more nucleic acid cleavage entities (or one or more nucleic acid molecules encoding them).

Additional objects and advantages will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice. The objects and advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments and together with the description, serve to explain the principles described herein.

Drawings

For a more complete understanding of the principles disclosed herein and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIGS. 1A and 1B show protein tagging by promoter capture with short homology arms. FIG. 1A shows an N-terminal tag. The promoterless puromycin selection marker was linked to either an emerald green fluorescent protein (EmGFP) reporter or an orange fluorescent protein (OFP) reporter via a self-cleaving 2A peptide, and 35-nt homology arms were then added at the 5' and 3' ends by PCR. The endogenous promoter drives expression of puromycin, the reporter gene, and the endogenous gene. TALENs or CRISPRs induce double-strand breaks (DSBs) near the translation initiation site. FIG. 1B shows a C-terminal tag. The EmGFP or OFP reporter was linked to the promoterless puromycin selection marker via a self-cleaving 2A peptide, and 35-nt homology arms were then added at the 5' and 3' ends. The endogenous promoter drives expression of the endogenous gene, the reporter gene, and puromycin. TALENs or CRISPRs induce DSBs near the translation termination site. The stop codon between the endogenous gene and the reporter gene is eliminated. In FIGS. 1A and 1B, the donor DNA is inserted into the genome by homologous recombination. The 5' and 3' junctions were analyzed by PCR using the F1/R1 and F2/R2 primer sets, respectively.
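Adding 35-nt homology arms by PCR is commonly done with tailed primers whose 5' ends carry the arm sequences. The Python sketch below derives such primers and the expected donor product for a given cut site. The 20-nt annealing length, the `tailed_primers` name, and all sequences are assumptions rather than parameters from the source, and practical primer design would also check melting temperature and secondary structure.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(genome: str, cut: int, cassette: str,
                   arm_len: int = 35, anneal_len: int = 20) -> Tuple[str, str, str]:
    """Return (forward primer, reverse primer, expected PCR product).

    The break is assumed to lie between genome[cut - 1] and genome[cut].
    """
    left_arm = genome[cut - arm_len:cut]
    right_arm = genome[cut:cut + arm_len]
    forward = left_arm + cassette[:anneal_len]                         # 5' tail = left arm
    reverse = reverse_complement(cassette[-anneal_len:] + right_arm)   # 5' tail = right arm (rc)
    product = left_arm + cassette + right_arm
    return forward, reverse, product


# Hypothetical locus and an arbitrary placeholder insert cassette.
genome = "GATTACA" * 20                # 140 bp
cassette = "ATG" + "GCT" * 30 + "TAA"  # 96 bp
fwd, rev, donor = tailed_primers(genome, 70, cassette)

print(len(fwd), len(rev), len(donor))  # 55 55 166
```

Each primer is arm length plus annealing length, and the product is the cassette flanked by one 35-nt arm on each side.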

FIGS. 2A to 2D show the effects of donor form, donor dose, and homology arm length on HDR efficiency. In FIG. 2A, Cas9 RNP and different amounts of donor DNA with 35-nt homology arms were delivered into 293FT cells by electroporation. Samples lacking gRNA were used as controls. 48 hours after transfection, cells were analyzed by flow cytometry to determine the percentage of OFP-positive cells without puromycin selection (-). Alternatively, cells were treated with puromycin for 7 days before flow cytometry analysis (+). In FIG. 2B, homology arms of various lengths were added to the insert cassette by PCR amplification, and the products were co-transfected with Cas9 RNP into 293FT cells. Cells were analyzed by flow cytometry as described for FIG. 2A. In FIGS. 2C and 2D, Cas9 RNP and donor plasmids with homology arms of about 500 nt, or single-stranded (ss) or double-stranded (ds) DNA donors with 35-nt homology arms, were transfected into 293FT cells or human primary T cells by electroporation. 48 hours after transfection, cells were analyzed by flow cytometry.

FIGS. 3A to 3E show the characterization of clonal cells with OFP integrated into the β-actin locus. Cas9 RNP and donor DNA with 35-nt homology arms were delivered by electroporation, and the resulting PCR products were analyzed by sequencing. FIGS. 3A and 3B show the N- and C-terminal junctions, respectively, with exact HDR (1) or HDR with indels (2). In the exact HDR (1) panels of FIGS. 3A and 3B, arrows indicate the junctions between genomic DNA and donor DNA or the Cas9 cleavage sites, sequences shown in bold indicate the 35-nt homology arms, and the italic ATG indicates the start codon of β-actin. The HDR with indels (2) panels of FIGS. 3A and 3B show an example of indel formation around the junction. FIG. 3C shows characterization of the zygosity of the clonal cells: about 68% showed exact HDR at both junctions, and 32% showed indels at the C- or N-terminal junction. In FIGS. 3D and 3E, either TALEN mRNA alone or TALEN mRNA with donor DNA was transfected into HEK293FT cells by electroporation (Thermo Fisher Scientific, catalog number MPK5000). FIG. 3D shows genome editing efficiency (% indels), and FIG. 3E shows the percentage of OFP-positive cells (-) and of puromycin-treated OFP-positive cells (+), as analyzed by flow cytometry.

FIGS. 4A, 4B, and 4C show N-terminal EmGFP tagging of LRRK2 in A549 cells. Cas9 RNP and donor DNA containing a promoterless puromycin-P2A-EmGFP fragment with about 35-nt homology arms were delivered into cells by electroporation. At 48 hours post-transfection, clonal cells were isolated. After expansion, the clonal cells were lysed and analyzed by junction PCR using one inner and one outer primer at either the N-terminal junction (FIG. 4A) or the C-terminal junction (FIG. 4B). Alternatively, a pair of outer primers was used to analyze the genomic modifications of both alleles (FIG. 4C). The resulting PCR products were analyzed by sequencing. Sequences in bold in FIGS. 4A and 4B indicate homology arms. The bottom arrow indicates the Cas9 cleavage site or the junction between the genomic DNA and the donor DNA. In FIG. 4C, "Δ7nt, no HDR" indicates that HDR did not occur but the allele carried a 7-nt deletion.

In FIGS. 5A (SEQ ID NOS: 56-62), 5B (SEQ ID NOS: 63-69), and 5C, the C-terminus of FAK was tagged with EmGFP. Cas9 RNP and donor DNA with short homology arms were transfected into 293FT cells by electroporation. After puromycin selection, clonal cells were isolated. The junctions were amplified by PCR, and the N-terminal junctions (FIG. 5A) or C-terminal junctions (FIG. 5B) were analyzed by sequencing. Arrows indicate double-strand breaks (DSBs) or junctions between genomic DNA and donor DNA. For exact HDR, the short homology arm (bold) and the stop codon (underlined) are also shown. Examples of HDR with indels are also shown in FIGS. 5A and 5B. FIG. 5C shows the analysis of genomic modifications on both alleles.

In FIGS. 6A, 6B, and 6C, the C-terminus of EGFR is tagged with EmGFP. One gRNA was designed to target the EGFR genomic locus near the stop codon. Cas9 RNP complex and donor DNA were delivered into 293FT cells by electroporation. Clonal cells were analyzed by junction PCR and sequencing. FIG. 6A shows the N-terminal junction analysis (SEQ ID NO: 70), and FIG. 6B shows the C-terminal junction analysis (SEQ ID NO: 71). FIG. 6C shows the genomic modifications on each allele. In FIG. 6C, the no-HDR allele with an insertion refers to a single "A" insertion without integration of the insert sequence.

FIG. 7A shows the effect of terminal modifications of the DNA donor on HDR efficiency, and FIG. 7B shows the effect of NHEJ inhibitors on HDR efficiency. In FIG. 7A, terminally modified DNA primers were chemically synthesized and used to prepare donor DNA by PCR amplification. Cas9 RNP and donor DNA were transfected into primary T cells by electroporation, and 48 hours after transfection, the efficiency of insertion of the puromycin-P2A-OFP DNA fragment into the β-actin locus was monitored by flow cytometry. In FIG. 7B, NHEJ inhibitors were added to the medium immediately after electroporation. "F" refers to the forward primer, "R" refers to the reverse primer, "PS" refers to phosphorothioate, "NH2" refers to amine modifications, and "ssDNA" refers to single-stranded DNA.

FIGS. 8A, 8B, 8C, and 8D show cloning and expression of recombinant antibodies in the mammalian genome. FIG. 8A shows an antibody expression cassette comprising a promoterless puromycin selection marker followed by a self-cleaving 2A peptide (SEQ ID NO: 5). Expression of the IgG heavy chain (HC) and light chain (LC) was driven by the CMV promoter. 35-nt homology arms were added by PCR. FIG. 8B (SEQ ID NOS: 72-76) and FIG. 8C (SEQ ID NOS: 77-82) show the N-terminal and C-terminal junction analyses, respectively. Arrows indicate double-strand breaks (DSBs) and junctions between genomic DNA and donor DNA. The 35-nt homology arms and some additional sequences are highlighted in bold. The WPRE (woodchuck hepatitis virus post-transcriptional regulatory element) and the stop codon are shown in FIG. 8C. FIG. 8D shows the relative percentage of antibody-producing (+) and non-producing (-) clonal cells as determined by ELISA.

FIG. 9 shows the nuclear localization signal (NLS)-donor DNA design (SEQ ID NOS: 83-84). The conjugation chemistry used to attach the NLS peptide is succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC) or

FIG. 10 (SEQ ID NOS: 85-92) shows HEK293 cell lines modified with NLS-donor DNA constructs. On the left, the GFP gene is disrupted by deletion of the 6 nucleotides that constitute the fluorophore; addition of a donor containing the 6 bases restored GFP fluorescence. On the right is an analogous BFP gene system: addition of a donor carrying a single nucleotide polymorphism (SNP) converts the BFP coding sequence to a GFP coding sequence.

FIG. 11 shows the dose response of phosphorothioate (PS) oligonucleotide donor DNA compared with NLS-modified donor DNA when a 6-base sequence was added to restore GFP activity. Dose response of PS oligonucleotide donor DNA compared with ligated NLS-modified donor DNA.

FIG. 12 shows flow cytometry analysis of cells edited with equal concentrations of PS or NLS oligonucleotide donor DNA.

FIG. 13 shows the dose response of PS oligonucleotide donor DNA compared with NLS-modified donor DNA for editing a single base, thereby converting BFP-expressing cells to GFP-expressing cells. Dose response of PS oligonucleotide donor DNA compared with SMCC-linked NLS-modified donor DNA.

FIG. 14 is a schematic showing exemplary architectures of TALEN and TAL-Buddy (nuclease-free) constructs. The TAL-Buddy construct lacks a nuclease domain. In some cases, a nuclease domain may be present but inactivated (e.g., by insertions, deletions, and/or substitutions).

FIG. 15 shows that indel formation at the CMPK1-C target improved about 2-fold when a "TAL-Buddy" designed with 7-nt spacing relative to the TALEN binding sequence was added.

FIG. 16 shows "TAL-Buddy" proteins tested at spacings of up to 100 nt from the TALEN binding sequences for their ability to increase TALEN cleavage.

FIG. 17 shows that a "TAL-Buddy" designed with 20-nt spacing relative to the CRISPR sgRNA binding sequence increases CRISPR-RNP indel formation at the UFSP2-SNP target by 20-fold or more.

FIG. 18 shows that "TAL-Buddy" enhances indel formation together with RNPs formed by sgRNAs and SpCas9-HF1 or eSpCas9.

FIG. 19 is a schematic representation of the templates used to make sgRNAs and "CR-PAL" gRNAs.

FIG. 20 is a graphical representation of "CR-PAL" function. Black indicates "CR-PAL" gRNAs with 15-nt binding capacity; gray indicates sgRNAs with 20-nt binding capacity.

FIG. 21 shows that indel formation increased more than 60-fold when "CR-PAL" was used with Cas9-RNP at the UFSP2-SNP target.

FIG. 22 shows the preparation of the FokI-free C-terminal fragment of "TAL-Buddy".

FIG. 23 shows testing of "Buddy TAL" in 293FT cells. Target: CMPK1-C (SEQ ID NO: 19); TALEN mRNA: 100 ng each; TAL-Buddy: 7-nt spacing (SEQ ID NO: 18); electroporation: 1300/20/2 (pulse voltage/pulse width/pulse number); six replicates.

FIG. 24 is a graph showing testing of Buddy TAL spacing relative to TALENs. Target: CMPK1; cells: 293FT; electroporation: 1300/20/2. Spacing is important: the TAL should not be placed directly next to the TALEN. The spacings (0, 4, 7, and 20 nt) represent the distance between the 18-base recognition sequence of the TAL and the nearest TALEN of the pair.

FIG. 25 is a graph showing testing of Buddy TAL for increasing TALEN and CRISPR efficiency. Spacing affects cleavage efficiency. The TAL (gray hexagons) is separated from the TALEN (dark gray arrows) by 7 nt to 20 nt. When the TAL is used with CRISPR targets (black circle segments), a spacing of 20 nt from the TAL is preferred.

FIG. 26 shows replicate experiments of TAL-Buddy-assisted CRISPR editing. Cells: 293FT; CRISPR target: UFSP2; TAL-Buddy: 20-nt spacing; electroporation: 1150/20/2; six replicates.

FIG. 27 shows testing of TAL-Buddy for restoring the activity of low-performing high-fidelity Cas9 mutants. Target: UFSP2; cells: 293FT; electroporation: 1150/20/2; TAL-Buddy: 20-nt spacing. The activity of HF-Cas9 was undetectable, and our analysis showed that it performed worse than eCas9. In the absence of TAL, the activity of eSpCas9 1.1 was not detectable; with TAL, wild-type activity levels were obtained. This is important because the resulting high-fidelity activity is localized only to the desired target site (ultra-high fidelity).

FIG. 28 shows CRISPR-PAL testing using standard active Cas9 and truncated gRNAs. Target: UFSP2; cells: 293FT; electroporation: 1150/20/2; CR-PAL: 15-mer gRNA; CR-PAL left spacing: 36 nt; CR-PAL right spacing: 15 nt. Cas9 complexed with a truncated gRNA (15-mer) binds to, but does not cleave, the target DNA ((Ch)uch et al., 2014; Kiani et al., "Cas9 gRNA engineering for genome editing, activation and suppression," Nat. Methods, doi: 10.1038 (9/7/2015)). The cleavage site is bracketed by truncated gRNAs that open the DNA so that the standard gRNA (20-mer) can cleave more efficiently, increasing editing from 5% (20-mer alone) to more than 50% (with the 15-mers). Conditions: Cas9 v2 + 20-mer gRNA + left/right 15-mer gRNAs.
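The CR-PAL design pairs a standard 20-mer spacer with 15-mer truncated guides flanking the cut site. The Python sketch below finds 20-nt SpCas9 protospacers followed by an NGG PAM on the top strand and derives a 15-mer by dropping nucleotides from the 5' (PAM-distal) end. Keeping the PAM-proximal bases is an assumption consistent with common truncated-guide designs; the text does not state which end is trimmed. Only the top strand is scanned, and the function names and test sequence are hypothetical.

```python
import re
from typing import List, Tuple


def find_spacers(seq: str, spacer_len: int = 20) -> List[Tuple[int, str]]:
    """Return (start, spacer) for each top-strand protospacer followed by NGG."""
    # Zero-width lookahead so overlapping protospacers are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


def truncate_guide(spacer: str, length: int = 15) -> str:
    """Keep the PAM-proximal `length` nt (3' end of the spacer)."""
    if length > len(spacer):
        raise ValueError("truncated length exceeds spacer length")
    return spacer[-length:]


# Hypothetical target: 5 nt, a 20-nt protospacer, a TGG PAM, 4 nt.
target = "TTTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "TTTT"
hits = find_spacers(target)
print(hits)                         # [(5, 'ACGTACGTACGTACGTACGT')]
print(truncate_guide(hits[0][1]))   # CGTACGTACGTACGT
```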

FIG. 29 shows the Buddy TAL activator concept. A TAL fused to an activation domain (e.g., VP64) can promote active gene expression, thereby opening the DNA to enhance editing by nucleases (TALEN, Cas9, etc.).

FIG. 30 shows HDR (sequence-verified) in U2OS cells. The donor carries a HindIII site insertion; electroporation: 1300/20/2.

FIG. 31 shows the effect of small molecules/additives on TALEN editing in A549 cells. Target: HTR2A-N; the donor carries a HindIII site insertion; conditions: 1200/20/4; the medium was changed at 24 hours; HindIII cleavage is shown. NU7441: DNA-PK inhibitor; B18R: immune response suppressor.

FIG. 32 shows an example of the relative positions of TALEN and TAL-Buddy. The TALENs of the pair are each positioned 8 bases from the target site, one on each side. In this example, the TAL-Buddy is 7 nt from the TALEN. The upper strand is SEQ ID NO: 20, and the lower strand is SEQ ID NO: 21.

FIG. 33 shows "TAL-Buddy" designed close to the CRISPR cleavage site in the UFSP2-SNP target. 100 ng Lt and Rt "TAL-Buddy" mRNA and CRISPR-RNP (1000 ng Cas9 protein and 200 ng sgRNA) were transfected into approximately 50,000 293 human embryonic kidney cells (293FT) using electroporation equipment (Thermo Fisher Scientific, catalog number MPK5000) with a pulse voltage of 1150, a pulse width of 20, and 2 pulses. Cells were harvested 48 to 72 hours after transfection and lysed. Indel formation was analyzed using the GENEART™ Genomic Cleavage Detection Kit (Thermo Fisher Scientific, catalog No. A24372). The upper strand is SEQ ID NO: 42, and the lower strand is SEQ ID NO: 43.

FIG. 34 shows "CR-PAL" designed adjacent to the CRISPR cleavage site in the UFSP2-SNP target. 200 ng CR-PAL_Lt and CR-PAL_Rt were incubated with wild-type Cas9-RNP and transfected into approximately 50,000 human embryonic kidney cells (293FT) using electroporation equipment (Thermo Fisher Scientific, catalog number MPK5000) with a pulse voltage of 1150, a pulse width of 20, and 2 pulses. Cells were harvested 48 to 72 hours after transfection and lysed. Indel formation was analyzed using the GENEART™ Genomic Cleavage Detection Kit (Thermo Fisher Scientific, catalog No. A24372). The upper strand is SEQ ID NO: 44, and the lower strand is SEQ ID NO: 45.

FIG. 35 shows testing of "Buddy TAL" in 293FT cells. The upper strand is SEQ ID NO: 46, and the lower strand is SEQ ID NO: 47.

FIG. 36 is a schematic representation of the use of a pair of TAL-Buddy proteins (also referred to herein as first and second DNA binding modulation enhancers) in conjunction with a pair of TAL-FokI nuclease fusions (also referred to herein as first and second DNA-binding nuclease conjugates). Left TAL-Buddy binding and right TAL-Buddy binding are indicated on the left and right sides of the figure, respectively. The long solid line represents a portion of a nucleic acid molecule (e.g., a chromosome; also referred to herein as a target locus) within the cell. Region A (shown on the left and right) of the depicted nucleic acid molecule is the binding site for the two TAL-Buddy proteins (also referred to herein as first and second enhancer binding sequences). Region B represents the distance between each TAL-Buddy binding site (e.g., the first and second enhancer binding sequences) and the adjacent TAL-FokI fusion protein binding site (also referred to herein as the first and second binding sequences). Region D represents the nucleic acid segment between the two TAL-FokI fusion protein binding sites. The white box in region D represents the site (also referred to herein as the regulatory site) at which the TAL-FokI fusion protein pair cleaves the nucleic acid. Region E represents a portion of the nucleic acid molecule where accessibility is potentially enhanced.

Fig. 37 is a schematic similar to fig. 36, except that a single TAL-VP16 fusion (also referred to herein as a regulatory protein) is used in place of a pair of TAL-FokI nuclease fusions. The unlabeled circles represent components of the transcriptional complex recruited by VP16. Because a single TAL-VP16 fusion is used, there is only one region C. Region B consists of the intervening base pairs between region A (also referred to herein as the first and second enhancer binding sequences) and region C (also referred to herein as the regulatory binding sequence).

FIG. 38 shows various forms of donor nucleic acid molecules that can be used in the embodiments set forth herein. Open circles at the ends represent nuclease-resistant groups; two circles indicate two such groups. The black regions represent regions homologous/complementary to one or more locus sequences of another nucleic acid molecule (e.g., chromosomal DNA). The cross-hatched regions represent nucleic acid located between the regions of sequence homology/complementarity in the nucleic acid fragments.

Fig. 39 is a schematic diagram of exemplary Cas9 formats based on the Streptococcus pyogenes Cas9 protein. This 1368-amino-acid protein is represented by the filled top line in the figure. The Cas9 proteins designated V1-V5 are fusion proteins that include a Nuclear Localization Signal (NLS) as a component. Dashed boxes represent monopartite NLSs, and open boxes represent bipartite NLSs. Grey boxes represent affinity tags (e.g., hexahistidine tags).
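The formats in fig. 39 are ordered arrangements of parts (tag, NLS, Cas9, NLS). The following minimal Python sketch represents such a fusion as a list of parts and assembles its sequence. It assumes the well-known SV40 large T antigen NLS (monopartite) and nucleoplasmin NLS (bipartite, the "NP" of fig. 43) purely for illustration; the actual V1-V5 layouts and NLS choices are not specified here, so the example format and the names `Part`, `assemble`, `describe`, and `count_nls` are hypothetical. The Cas9 sequence is replaced by a 1368-residue placeholder.

```python
"""Hypothetical sketch: representing Cas9-NLS fusion formats as ordered parts."""
from dataclasses import dataclass
from typing import List

# Well-known sequences used for illustration only; which NLSs the V1-V5
# formats actually carry is an assumption, not taken from the figure.
SV40_NLS = "PKKKRKV"                    # monopartite NLS (SV40 large T antigen)
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKK"  # bipartite NLS (nucleoplasmin, "NP")
HIS6 = "HHHHHH"                         # hexahistidine affinity tag
CAS9_PLACEHOLDER = "X" * 1368           # stands in for the 1368-aa SpCas9


@dataclass(frozen=True)
class Part:
    name: str
    kind: str  # "cas9", "nls_mono", "nls_bi", or "tag"
    seq: str


def assemble(parts: List[Part]) -> str:
    """Concatenate parts N- to C-terminus (linkers omitted for simplicity)."""
    return "".join(p.seq for p in parts)


def describe(parts: List[Part]) -> str:
    """Human-readable layout string, e.g. 'His6-SV40-Cas9-NP'."""
    return "-".join(p.name for p in parts)


def count_nls(parts: List[Part]) -> int:
    """Number of NLS parts (monopartite or bipartite) in the fusion."""
    return sum(p.kind.startswith("nls") for p in parts)


# A hypothetical format: N-terminal tag and monopartite NLS, C-terminal bipartite NLS.
example_format = [
    Part("His6", "tag", HIS6),
    Part("SV40", "nls_mono", SV40_NLS),
    Part("Cas9", "cas9", CAS9_PLACEHOLDER),
    Part("NP", "nls_bi", NUCLEOPLASMIN_NLS),
]

if __name__ == "__main__":
    print(describe(example_format))       # His6-SV40-Cas9-NP
    print(len(assemble(example_format)))  # 1397
    print(count_nls(example_format))      # 2
```

Keeping parts as data rather than a single string makes it easy to compare formats that differ only in NLS type, number, or position.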

Fig. 40 is a schematic similar to fig. 36, with the addition of a more detailed view of the TAL cleavage locus and a schematic of a donor DNA molecule. The lower left shows a linear schematic of the donor DNA. The straight solid lines indicate regions of homology to the target locus. The circular dotted line indicates the insertion cassette. The "X" symbols mark regions of sequence homology. The upward and downward dashed arrows indicate two phosphorothioate linkages in the 5' and 3' strands of the donor DNA homology arms. These phosphorothioate linkages are positioned so that nuclease digestion creates 5' overhangs ten nucleotides in length. The open boxes on the left and right represent bipartite NLSs. To the right of the linear diagram are two example insertion cassettes. The upper insertion cassette is designed both to disrupt function at the insertion site and to express a puromycin resistance marker. The lower insertion cassette is similar to the upper insertion cassette but is also designed to insert a gene of interest into the locus, operably linked to a tissue-specific promoter.

FIG. 41 shows the amino acid sequence of Cas9 V1 (SEQ ID NO: 93). The NLS and His tag are labeled.

FIG. 42 shows the amino acid sequence of Cas9 V2 (SEQ ID NO: 94). The NLS is labeled.

Figure 43 shows a series of Cas9-NLS fusion protein formats. "NP" refers to nucleoplasmin NLS.

Figures 44A and 44B show genomic cleavage detection (GCD) data obtained using different Cas9-NLS fusions in two different cell types.

Fig. 45 is a schematic diagram showing a common TALE structural format. Sites 1, 2 and 3 are located outside the TALE region thought to be involved in DNA recognition and binding.

Figure 46 shows the amino acid sequence of a TALEN protein (SEQ ID NO: 95). This form of TALEN is referred to herein as "TALEN V3". The N-terminal region comprises the V5 epitope and a "G-G" linker, followed by a 136-amino-acid region, followed by the repeat region. The 136-amino-acid region contains (1) a series of repeat units (labeled "R-3", "R-2", "R-1", and "R0") with some sequence homology to the individual repeats of the repeat region and (2) a "T-less box", the amino acid sequence "RGA", which can be altered to reduce the requirement for a 5' T in the nucleic acid to which the TALEN binds. The repeat region contains sixteen repeats of thirty-four amino acids each. A half repeat (labeled "R1/2") lies immediately C-terminal to the repeat region. Two nuclear localization signals (labeled "NLS") are located further toward the C-terminus of the protein, one before and one after the FokI nuclease domain.
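The repeat arithmetic for TALEN V3 can be made concrete with a short Python sketch. The sixteen 34-amino-acid repeats and the half repeat come from the description above; the assumptions that each full repeat and the half repeat each specify one base, that the repeat-variable diresidue (RVD) sits at residues 12 and 13, and the RVD-to-base table are general TALE background, not statements from this figure. The example repeat is a typical TALE repeat used only for illustration, and all function names are hypothetical.

```python
"""Hypothetical sketch of TALEN V3 repeat-region arithmetic and RVD readout."""
from typing import List

REPEAT_LEN = 34      # amino acids per full repeat (stated for TALEN V3)
N_FULL_REPEATS = 16  # full repeats in the TALEN V3 repeat region

# Commonly used RVD -> base assignments (general TALE code, assumed here).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def repeat_region_length(n_repeats: int = N_FULL_REPEATS,
                         repeat_len: int = REPEAT_LEN) -> int:
    """Length in amino acids of the full-repeat region (half repeat excluded)."""
    return n_repeats * repeat_len


def bases_specified(n_repeats: int = N_FULL_REPEATS) -> int:
    """Each full repeat plus the C-terminal half repeat specifies one base."""
    return n_repeats + 1


def rvd(repeat: str) -> str:
    """Return the repeat-variable diresidue (residues 12 and 13, 1-based)."""
    if len(repeat) != REPEAT_LEN:
        raise ValueError(f"expected a {REPEAT_LEN}-residue repeat, got {len(repeat)}")
    return repeat[11:13]


def predicted_target(repeats: List[str]) -> str:
    """Map each full repeat's RVD to its commonly assigned base ('N' if unknown)."""
    return "".join(RVD_TO_BASE.get(rvd(r), "N") for r in repeats)


if __name__ == "__main__":
    example = "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG"  # a typical NG repeat
    print(repeat_region_length())           # 544
    print(bases_specified())                # 17
    print(rvd(example))                     # NG
    print(predicted_target([example] * 3))  # TTT
```

Under these assumptions, a 16.5-repeat TALEN specifies 17 bases downstream of the 5' T whose requirement the T-less box modification is intended to relax.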

FIG. 47 shows the amino acid sequences of TALEN proteins (SEQ ID NOS: 96 and 97), with the amino acid sequence of the repeat region removed to simplify the drawing. In addition, the proteins shown in this figure have three NLSs.

FIG. 48 shows genomic cleavage detection assay data for three different genomic loci in three different cell types, generated as described in example 8 below.

Figure 49 shows genomic cleavage detection and homology directed repair data for three different genomic loci in two different cell types, generated as described in example 8 below.

Figure 50 shows genomic cleavage detection assay data for three different genomic loci in A549 cells, generated as described in example 8 below.

Fig. 51 is a schematic of some uses of TALs for opening chromatin and keeping it open. The upper part of the schematic shows an intracellular nucleic acid region bound to histone octamers, forming chromatin. Each octamer is wrapped by about 145 base pairs of DNA, in about 1.6 turns. Histone H1 is not shown in this schematic. The dashed arrow indicates the Buddy-TAL binding site. The boxes marked with a vertical line and "TBS" indicate TAL binding sites. "RNApol" refers to an RNA polymerase molecule that transcribes nucleic acid downstream from the "promoter".

FIG. 52 shows the amino acid sequence of Buddy-TAL (SEQ ID NO: 119), which is encoded by the nucleotide sequence set forth in SEQ ID NO: 120. The Buddy-TAL has two NLSs (boxed), one at each of the N- and C-termini of the protein. In addition, the transcriptional activation domain usually present near the C-terminus has been deleted. The underlined central region of the protein is the repeat region. Two linkers (GS and GG) are also boxed.

Detailed Description

SUMMARY

The compositions and methods set forth herein are directed to improving gene editing. By way of example, these improvements include the following:

i. Inserting a nucleic acid molecule (e.g., a donor DNA molecule) into a nucleic acid molecule in a cell, wherein the inserted nucleic acid molecule is operably linked to a promoter present in the nucleic acid molecule in the cell.

ii. Facilitating gene editing using a non-homologous end joining inhibitor.

iii. Use of a DNA binding molecule (e.g., a DNA binding protein or DNA binding protein/nucleic acid complex) that binds at or near a target locus within a cell, wherein the DNA binding protein facilitates enhanced accessibility of other DNA binding molecules to the target locus.

iv. Delivery of donor DNA to gene editing loci within the cell, and delivery of other DNA molecules to different locations within the cell (e.g., a linear DNA molecule containing an open reading frame operably linked to a promoter, delivered to the mitochondria).

The above improvements may be used individually or in combination with one another and with additional methods.

The present disclosure relates in part to the following discovery: the combined use of promoter capture for selection markers and short homology arms for recombination achieves near 100% integration efficiency and up to 100% accurate HDR.

Unlike traditional methods that use targeting vectors with homology arms of 0.5 kb to 2 kb, the use of short homology arms appears to minimize random integration of the foreign DNA of interest into the genome. Most importantly, promoter capture of a selection marker enables selection of correctly integrated cells, since a promoterless selection marker is expressed only when the DNA molecule is precisely inserted into the genomic locus. In some embodiments, terminal modification of the donor DNA (e.g., with phosphorothioate or amino groups) and/or treatment with NHEJ inhibitors further improves HDR efficiency. The accuracy of donor DNA integration is sequence dependent; at some loci, 100% integration efficiency and 100% accurate HDR can be achieved.

The present disclosure also relates in part to compositions and methods for enhancing the accessibility of intracellular nucleic acid regions to molecules or molecular complexes that interact with those regions.

The present disclosure is further directed, in part, to compositions and methods for intracellular localization of nucleic acid molecules. In some cases, the nucleic acid molecule is a donor DNA molecule.

The present disclosure also relates to various combinations of the above to facilitate processes such as gene editing, gene activation, gene suppression, DNA methylation, and the like.

The present invention relates in part to compositions and methods for enhancing gene editing. Many variables affect the efficiency of gene editing. With respect to Homology Directed Repair (HDR), these factors include:

(1) the amounts of (i) donor DNA and (ii) site-specific nuclease located in the nucleus, and the amount of site-specific nuclease activity in the nucleus,

(2) the degree of accessibility of the site-specific nuclease to the target locus,

(3) timing aspects related to the presence of donors and nucleases in the nucleus,

(4) the efficiency of cleavage of the target locus,

(5) HDR efficiency (including HDR: NHEJ ratio), and

(6) donor DNA structure and composition.
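Factor (5) refers to HDR efficiency and the HDR:NHEJ ratio. A minimal Python sketch of how these two quantities might be computed from classified allele counts is shown below. It assumes alleles have already been sorted into unmodified, indel (NHEJ-type), and donor-templated (HDR-type) classes; the class and function names and the counts in the usage example are hypothetical and are not data from this disclosure.

```python
"""Hypothetical sketch: HDR efficiency and HDR:NHEJ ratio from allele counts."""
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    unmodified: int  # sequence unchanged relative to the pre-cleavage locus
    indel: int       # insertions/deletions without donor insertion (NHEJ-type)
    hdr: int         # donor-templated edits (HDR-type)

    @property
    def total(self) -> int:
        return self.unmodified + self.indel + self.hdr


def hdr_efficiency(c: AlleleCounts) -> float:
    """Fraction of all analysed alleles carrying the HDR edit."""
    return c.hdr / c.total if c.total else 0.0


def hdr_to_nhej_ratio(c: AlleleCounts) -> float:
    """HDR:NHEJ ratio; infinite when HDR events occur but no indels are seen."""
    if c.indel == 0:
        return float("inf") if c.hdr else 0.0
    return c.hdr / c.indel


if __name__ == "__main__":
    counts = AlleleCounts(unmodified=500, indel=300, hdr=200)  # made-up numbers
    print(hdr_efficiency(counts))     # 0.2
    print(hdr_to_nhej_ratio(counts))  # about 0.67
```

Reporting both values separates how often editing succeeds overall from how strongly repair is biased toward HDR.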

It is expected that in some cases, especially in terms of HDR, gene editing efficiencies approaching 100% can be achieved.

Localization of gene editing reagents in the nucleus: because many factors affecting gene editing efficiency are believed to act through concentration-dependent mechanisms, the higher the amount of site-specific nuclease activity in the nucleus (the combination of the level of nuclease activity and the number of nucleases present) and the higher the concentration of donor DNA in the nucleus, the more HDR is expected to predominate over NHEJ.

While nucleic acid molecules and proteins can be produced in cells (e.g., using vector-based systems), in many cases components of gene editing systems (e.g., donor DNA, site-specific nucleases, DNA binding regulation enhancers, etc.) are introduced into cells. Such introduction can be accomplished using methods such as transfection and electroporation.

After introduction of gene editing system components into a cell, efficient localization to the nucleus is often required. This is because the efficiency with which gene editing system components reach the nucleus is believed to be at least partially related to their cytoplasmic degradation ((i) the degradation activity combined with (ii) the amount of time spent in the cytoplasm). In addition, many factors can affect nuclear localization efficiency, including (1) the association of gene editing system components with one or more NLSs, (2) the choice of NLS used, and (3) chemical modification of one or more gene editing system components (e.g., donor DNA).

In many cases, the nucleic acid molecules used in the methods described herein can be chemically modified. Chemical modifications include nuclease-resistant groups such as phosphorothioate groups, amino groups, 2'-O-methyl nucleotides, 2'-deoxy-2'-fluoro nucleotides, 2'-deoxy nucleotides, 5-C-methyl nucleotides, and combinations thereof. For example, the three terminal nucleotides at the 5' and 3' ends of gRNA molecules may contain phosphorothioate linkages and/or may be 2'-O-methyl nucleotides. Amine-terminal modification of donor DNA has also been found to enhance HDR (see, e.g., figs. 7A and 7B). In both cases, this is believed to be due at least in part to increased stability of the modified molecules in the cytoplasm. Binding to Cas9 protein is also thought to stabilize gRNAs; it is therefore believed that binding to Cas9 protein increases the cytoplasmic half-life of the gRNA.
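The end-modification pattern described above for gRNAs (three terminal nucleotides at each end carrying 2'-O-methyl groups and/or phosphorothioate linkages) can be written out programmatically. The following Python sketch uses an assumed shorthand, "m" before a base for a 2'-O-methyl nucleotide and "*" between two bases for a phosphorothioate linkage; this notation and the function name `annotate_end_mods` are illustrative assumptions, not a format defined in this disclosure or by any particular vendor.

```python
"""Hypothetical sketch: annotating terminal gRNA modifications in a shorthand."""


def annotate_end_mods(seq: str, n: int = 3, ome: bool = True, ps: bool = True) -> str:
    """Mark the n terminal nucleotides at each end.

    Assumed notation: 'm' precedes a 2'-O-methyl nucleotide and '*' marks a
    phosphorothioate linkage to the next nucleotide. The first n and the last
    n internucleotide linkages are made phosphorothioate.
    """
    seq = seq.upper()
    length = len(seq)
    if length < 2 * n + 1:
        raise ValueError("sequence too short for the requested end modifications")
    out = []
    for i, base in enumerate(seq):
        terminal = i < n or i >= length - n
        out.append(("m" if ome and terminal else "") + base)
        # Linkage i joins nucleotide i to nucleotide i + 1.
        if ps and i < length - 1 and (i < n or i >= length - 1 - n):
            out.append("*")
    return "".join(out)


if __name__ == "__main__":
    print(annotate_end_mods("ACGUACGUAC"))  # mA*mC*mG*UACG*mU*mA*mC
```

Separating the two flags mirrors the "and/or" in the text: phosphorothioate linkages and 2'-O-methyl nucleotides can be applied independently or together.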

The data indicate, and it is believed, that gene editing efficiency is increased when gene editing system components are stable with respect to cytoplasmic degradation and rapidly "shuttle" through the cytoplasm to the nucleus. Rapid movement of gene editing system components through the cytoplasm has another effect that would be beneficial in many cases: it allows transient high concentrations of active gene editing system components in the nucleus with a smaller cytoplasmic pool of these components. Thus, once the activity of the high nuclear concentration of gene editing system components is depleted, there is little or no cytoplasmic reserve for additional gene editing activity.

Site-specific target locus cleavage activity: the efficiency of target locus cleavage is determined by a number of factors, some of which are described above. These factors include: (1) the cleavage activity of the gene editing system, (2) the amount of cleavage-mediating gene editing system components present at or near the target locus, and (3) the accessibility of the target locus to the cleavage activity of the gene editing system, as described herein.

The accessibility of a target locus to the cleavage activity of a gene editing system varies naturally: depending on the genome or the particular cell type, the locus may be accessible, inaccessible, or somewhere in between. Inducing transcriptional activation of a target locus before it is cleaved may make the locus more accessible to the cleavage activity. Another method of increasing the accessibility of a particular target locus is the use of DNA binding regulatory enhancers.

One consideration regarding site-specific target locus cleavage activity is "off-target" effects. Off-target effects can be minimized by the use, alone or in combination, of DNA binding regulatory enhancers, gRNAs with high target locus specificity, and high fidelity gene editing reagents (e.g., high fidelity Cas9).

Alteration of a target locus: two main types of gene editing are commonly performed: insertion of a nucleic acid molecule into a target locus, and alteration of the nucleotide sequence of a target locus without insertion of a nucleic acid molecule. In addition, there are three possible results of cleavage and "repair" of a target locus. The target locus may be (1) unchanged compared to its nucleotide sequence prior to cleavage, (2) modified by deletion or addition of one or more bases without donor nucleic acid insertion, or (3) altered by insertion of donor DNA at or near the cleavage site. The first two possibilities usually result from NHEJ-based repair mechanisms, and the third usually results from HDR-based mechanisms. In many cases, particularly where insertion of donor DNA into a target locus is desired, the third possibility is preferred. Thus, provided herein are compositions and methods for enhancing HDR efficiency and/or favoring HDR over NHEJ.
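The three possible repair outcomes can be illustrated with a minimal Python sketch that bins a sequenced locus by exact comparison with the reference and with the expected donor-insertion product. This is a deliberate simplification under stated assumptions: real analyses align reads and tolerate sequencing errors, and here any imprecise donor insertion would be lumped into the indel class. The sequences, cut position, and function names are hypothetical.

```python
"""Hypothetical sketch: binning a sequenced target locus into the three outcomes."""
from enum import Enum


class Outcome(Enum):
    UNCHANGED = 1        # (1) same sequence as before cleavage
    INDEL = 2            # (2) bases deleted/added, no donor insertion (NHEJ-type)
    DONOR_INSERTION = 3  # (3) donor DNA inserted at the cleavage site (HDR-type)


def expected_insertion(reference: str, cut: int, insert: str) -> str:
    """Reference sequence with `insert` placed exactly at the cut position."""
    if not 0 <= cut <= len(reference):
        raise ValueError("cut position outside reference")
    return reference[:cut] + insert + reference[cut:]


def classify(observed: str, reference: str, expected: str) -> Outcome:
    """Exact-match classification; any other change is lumped into INDEL."""
    if observed == expected:
        return Outcome.DONOR_INSERTION
    if observed == reference:
        return Outcome.UNCHANGED
    return Outcome.INDEL


if __name__ == "__main__":
    ref = "AAACCCGGGTTT"
    hdr = expected_insertion(ref, 6, "GAATTC")  # AAACCCGAATTCGGGTTT
    for seq in (ref, "AAACCGGGTTT", hdr):
        print(seq, classify(seq, ref, hdr).name)
```

Counts produced this way are the kind of input needed to compare outcome (3) against outcomes (1) and (2).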

A number of factors have been found to contribute to efficient HDR and insertion of the donor nucleic acid molecule at the cleavage site. Some of these factors relate to characteristics of the donor nucleic acid molecule, one of which is the length of the donor DNA homology arms. In many cases, the donor DNA molecule will have two homology arms that independently range in length from about 20 to about 2,000 nucleotides or base pairs, depending on whether the donor DNA is single-stranded or double-stranded. In addition, double-stranded donor DNA can have 3' overhangs at one or both ends, and these overhangs (as well as 5' overhangs) can range in length from about 10 to about 40 nucleotides. Likewise, one or both strands of one or both homology arms of the donor DNA molecule may comprise one or more nuclease-resistant groups located at the ends or elsewhere within the arms (as discussed elsewhere herein).

There are many ways to favor HDR over NHEJ repair. One approach is to treat the cells to be gene edited with one or more NHEJ inhibitors (see fig. 7B). Another is to "knock down" intracellular NHEJ activity. This can be achieved using, for example, antisense microRNAs and/or RNAi agents designed to inhibit the expression of one or more components of the NHEJ repair pathway (e.g., the catalytic subunit of DNA-dependent protein kinase, Ku70, and/or Ku80).

Definitions
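The donor design parameters above (two homology arms of about 20 to about 2,000 nucleotides or base pairs, and overhangs of about 10 to about 40 nucleotides on double-stranded donors) lend themselves to a simple design check. The Python sketch below treats the "about" bounds as hard limits, which is an assumption made only for illustration; `DonorDesign` and `check_donor` are hypothetical names.

```python
"""Hypothetical sketch: checking a donor design against the stated ranges."""
from dataclasses import dataclass
from typing import List, Tuple

ARM_RANGE = (20, 2000)     # "about 20 to about 2,000" nt or bp (treated as hard limits)
OVERHANG_RANGE = (10, 40)  # "about 10 to about 40" nt (treated as hard limits)


@dataclass
class DonorDesign:
    left_arm: int                    # homology arm length (nt or bp)
    right_arm: int
    double_stranded: bool = True
    overhangs: Tuple[int, ...] = ()  # 3' or 5' overhang lengths (nt)


def check_donor(d: DonorDesign) -> List[str]:
    """Return a list of problems; an empty list means within the stated ranges."""
    problems = []
    for name, length in (("left_arm", d.left_arm), ("right_arm", d.right_arm)):
        if not ARM_RANGE[0] <= length <= ARM_RANGE[1]:
            problems.append(f"{name} = {length} outside {ARM_RANGE}")
    if d.overhangs and not d.double_stranded:
        problems.append("overhangs are described for double-stranded donors")
    for oh in d.overhangs:
        if not OVERHANG_RANGE[0] <= oh <= OVERHANG_RANGE[1]:
            problems.append(f"overhang = {oh} nt outside {OVERHANG_RANGE}")
    return problems


if __name__ == "__main__":
    print(check_donor(DonorDesign(35, 35, True, (10, 10))))  # []
    print(check_donor(DonorDesign(15, 35)))  # ['left_arm = 15 outside (20, 2000)']
```

Returning a list of problems rather than raising lets several design issues be reported at once.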

As used in accordance with this disclosure, the following terms, unless otherwise indicated, shall be understood to have the following meanings:

"nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers thereof, or the complement thereof, in single-, double-, or multi-stranded form. The term "polynucleotide" refers to a linear sequence of nucleotides. The term "nucleotide" generally refers to a single unit of a polynucleotide, i.e., a monomer. The nucleotide may be a ribonucleotide, a deoxyribonucleotide or a modified form thereof. Examples of polynucleotides encompassed herein include single-and double-stranded DNA, single-and double-stranded RNA (including siRNA), and hybrid molecules having a mixture of single-and double-stranded DNA and RNA. The nucleic acid may be linear or branched. For example, the nucleic acid may be a linear chain of nucleotides, or the nucleic acid may be branched, e.g., such that the nucleic acid comprises one or more nucleotide arms or branches. Optionally, the branched nucleic acids are repeatedly branched to form higher order structures, such as dendrimers and the like.

The term also includes nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, or non-naturally occurring, which have binding properties similar to those of the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, but are not limited to, phosphodiester derivatives including, for example, phosphoramidate, phosphorodiamidate, phosphorothioate (also known as phosphothioate), phosphorodithioate, phosphonocarboxylic acid, phosphonocarboxylate, phosphonoacetic acid, phosphonoformic acid, methylphosphonate, borophosphonate, or O-methyl phosphoramidate linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press); and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones, non-ionic backbones, modified sugars, and non-ribose backbones (e.g., phosphorodiamidate morpholino oligonucleotides or Locked Nucleic Acids (LNAs)), including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and in Chapters 6 and 7 of ASC Symposium Series 580, Carbohydrate Modifications in Antisense Research, Sanghui and Cook, eds. Nucleic acids containing one or more carbocyclic sugars are also included within one definition of nucleic acids. Modifications of the ribose-phosphate backbone may be made for a variety of reasons, for example, to increase the stability and half-life of such molecules in physiological environments or for use as probes on a biochip. Mixtures of naturally occurring nucleic acids and analogs can be made; alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs, may be made.
In embodiments, the internucleotide linkage in the DNA is a phosphodiester, phosphodiester derivative, or a combination of both.

The nucleic acid may include a non-specific sequence. As used herein, the term "non-specific sequence" refers to a nucleic acid sequence containing a series of residues that are not designed to be complementary, or are only partially complementary, to any other nucleic acid sequence. For example, a non-specific nucleic acid sequence is a sequence of nucleic acid residues that does not function as an inhibitory nucleic acid when contacted with a cell or organism. An "inhibitory nucleic acid" is a nucleic acid (e.g., DNA, RNA, or a polymer of nucleotide analogs) that is capable of binding to a target nucleic acid and reducing its transcription (e.g., of mRNA from DNA), reducing its translation (e.g., of mRNA), or altering splicing of a transcript (e.g., single-stranded morpholino oligonucleotides).

As used herein, the term "nucleic acid molecule" refers to a covalently linked sequence of nucleotides or bases (e.g., ribonucleotides of RNA and deoxyribonucleotides of DNA, but also including DNA/RNA hybrids in which the DNA and RNA are in separate strands or in the same strand) in which the 3' position of the pentose of one nucleotide is joined to the 5' position of the pentose of the next nucleotide by a phosphodiester linkage. The nucleic acid molecule may be single-stranded, double-stranded, or partially double-stranded. Nucleic acid molecules may be linear or circular, may have blunt or sticky ends, may be supercoiled or relaxed, and may contain "nicks". The nucleic acid molecule may be composed of fully complementary single strands or of partially complementary single strands forming at least one base mismatch. The nucleic acid molecule may additionally comprise two self-complementary sequences that may form a double-stranded stem region, optionally separated at one end by a loop sequence. The two regions of the nucleic acid molecule comprising the double-stranded stem region are substantially complementary to each other, allowing self-hybridization. However, the stem may comprise one or more mismatches, insertions, or deletions. As described above, a nucleic acid molecule may include chemically, enzymatically, or metabolically modified forms of nucleic acid molecules, or combinations thereof. Chemically synthesized nucleic acid molecules typically refer to nucleic acids that are less than or equal to 150 nucleotides long (e.g., 5 to 150, 10 to 100, or 15 to 50 nucleotides long), while enzymatically synthesized nucleic acid molecules can encompass smaller as well as larger nucleic acid molecules, as described elsewhere herein. Enzymatic synthesis of nucleic acid molecules can include stepwise methods using enzymes (e.g., polymerases, ligases, exonucleases, endonucleases, etc., or combinations thereof).
The term "genome editing" or "gene editing" as provided herein refers to a stepwise process involving enzymes such as polymerases, ligases, exonucleases, endonucleases, and the like, or combinations thereof. For example, gene editing may include the following steps: cleaving the nucleic acid molecule, excising nucleotides at or adjacent to the cleavage site, synthesizing new nucleotides, and ligating the cleaved strands.

The term nucleic acid molecule also refers to short nucleic acid molecules, often referred to as, for example, "primers" or "probes". Primers generally refer to single-stranded starting nucleic acid molecules for enzymatic assembly reactions, while probes are generally used to detect at least partially complementary nucleic acid molecules. Nucleic acid molecules have a "5' end" and a "3' end" because the phosphodiester linkages of a nucleic acid molecule occur between the 5' carbon and the 3' carbon of the pentose rings of the constituent mononucleotides. The end of a nucleic acid molecule at which a new linkage would be to a 5' carbon is its 5' terminal nucleotide. The end of a nucleic acid molecule at which a new linkage would be to a 3' carbon is its 3' terminal nucleotide. As used herein, a terminal nucleotide or base is the nucleotide at the terminal position of the 3' or 5' end. A nucleic acid sequence, even if located within a larger nucleic acid molecule (e.g., a sequence region within a nucleic acid molecule), can be referred to as having a 5' end and a 3' end.

As used herein, a "vector" is a nucleic acid molecule that can be used as a vehicle to transfer genetic material into a cell. The vector may be a plasmid, virus or phage, cosmid, or artificial chromosome, e.g., a Yeast Artificial Chromosome (YAC) or a Bacterial Artificial Chromosome (BAC), or another sequence capable of replicating or being replicated in vitro or in a host cell, or of delivering a desired nucleic acid segment to a desired location in a host cell. In embodiments, a vector refers to a DNA molecule having at least one origin of replication, a Multiple Cloning Site (MCS), and one or more selectable markers. A vector typically consists of a backbone region and at least one insertion or transgene region, or a region designed for insertion of a DNA fragment or transgene (e.g., an MCS). The backbone region typically contains an origin of replication for propagation in at least one host and one or more selectable markers. The vector may have one or more restriction endonuclease recognition sites (e.g., two, three, four, five, seven, ten, etc.) at which the sequence may be cut in a determinable fashion without loss of an essential biological function of the vector, and into which nucleic acid fragments may be spliced to effect their replication and cloning. The vector may further provide primer sites (e.g., for PCR), transcriptional and/or translational initiation and/or regulatory sites, recombination signals, replicons, selectable markers, etc. Clearly, methods of inserting a desired nucleic acid fragment that do not require the use of recombination, transposition, or restriction enzymes (such as, but not limited to, uracil N-glycosylase (UDG) cloning of PCR fragments (U.S. Pat. Nos. 5,334,575 and 5,888,795, both of which are incorporated herein by reference in their entirety), T:A cloning, and the like) can also be applied to clone a fragment into a cloning vector to be used according to the invention. In embodiments, the vector comprises additional features.
Such additional features may include natural or synthetic promoters, genetic markers, antibiotic resistance cassettes or selection markers (e.g., toxins such as ccdB or tse2), epitopes or tags for detection, manipulation, or purification (e.g., V5 epitope, c-myc, hemagglutinin (HA), FLAG™, polyhistidine (His), glutathione-S-transferase (GST), maltose binding protein (MBP)), scaffold attachment regions (SARs), or reporter genes (e.g., green fluorescent protein (GFP), red fluorescent protein (RFP), luciferase, β-galactosidase, etc.).

As used herein, "cloning vector" includes any vector that can be used to delete, insert, replace, or assemble one or more nucleic acid molecules. In embodiments, the cloning vector may comprise a counter-selectable marker gene (e.g., ccdB or tse2) that can be removed or replaced by another transgene or DNA fragment. In embodiments, the cloning vector may be referred to as a donor vector, an entry vector, a shuttle vector, a destination vector, a target vector, a functional vector, or a capture vector. Cloning vectors typically contain a series of unique restriction enzyme cleavage sites (e.g., type II or type IIS) for removal, insertion, or replacement of DNA fragments. Alternatively, DNA fragments may be replaced or inserted by cloning or recombination methods, e.g., the cloning or recombination employed in cloning systems provided by Invitrogen/Life Technologies (Carlsbad, Calif.) and described in more detail elsewhere herein. Cloning vectors that can be used to express transgenes in a target host may also be referred to as expression vectors. In embodiments, cloning vectors are engineered to obtain TAL effector binders.

An "expression vector" is designed for expression of a transgene and typically comprises at least one promoter sequence that drives expression of the transgene. As used herein, expression refers to transcription of a transgene or transcription and translation of an open reading frame, and may occur in a cell-free environment (e.g., a cell-free expression system) or in a host cell. In embodiments, expression of the open reading frame or gene results in production of a polypeptide or protein. Expression vectors are typically designed to contain one or more regulatory sequences, such as enhancer, promoter and terminator regions that control the expression of the inserted transgene. Suitable expression vectors include, but are not limited to, plasmids and viral vectors. Vectors and expression systems for various applications are available from commercial suppliers, such as Novagen (Madison, Wis.), Clontech (Palo Alto, Calif.), Stratagene (La Jolla, Calif.), and Life Technologies Corp (Carlsbad, Calif.). In embodiments, the expression vector is engineered to express TAL effector fusions.

"viral vector" generally refers to a genetically engineered non-infectious virus comprising a modified viral nucleic acid sequence. In embodiments, the viral vector comprises at least one viral promoter and is designed for insertion of one or more transgenes or DNA fragments. In embodiments, the viral vector is delivered to the target host along with a helper virus that provides packaging or other functions. In embodiments, the viral vector is used to stably integrate the transgene into the genome of the host cell. The viral vector may be used for delivery and/or expression of the transgene.

Viral vectors may be derived from bacteriophage, baculovirus, tobacco mosaic virus, vaccinia virus, retrovirus (avian leukosarcoma, mammalian type C, type B, type D, HTLV-BLV, lentivirus, foamy virus), adenovirus, parvovirus (e.g., adeno-associated virus), coronavirus, negative-strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), and paramyxovirus (e.g., measles and Sendai virus), positive-strand RNA viruses such as picornavirus and alphavirus (e.g., Semliki Forest virus), and double-stranded DNA viruses including adenovirus, herpesvirus (e.g., herpes simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowlpox, and canarypox). Other viruses include, but are not limited to, Norwalk virus, togavirus, flavivirus, reovirus, papovavirus, hepadnavirus, and hepatitis virus. For example, lentiviral vectors are commonly used to deliver genes because of their relatively large packaging capacity, reduced immunogenicity, and ability to efficiently and stably transduce many different cell types. Such lentiviral vectors may be "integrating" (i.e., capable of integrating into the genome of the target cell) or "non-integrating" (i.e., not integrated into the genome of the target cell). Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, such as SV40 vectors, papillomavirus vectors, and vectors derived from Epstein-Barr virus.
Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector that allows expression of a protein under the direction of the SV40 early promoter, the SV40 late promoter, the metallothionein promoter, the murine mammary tumor virus promoter, the Rous sarcoma virus promoter, the polyhedrin promoter, or other promoters shown to be effective for expression in eukaryotic cells.

A "labeled nucleic acid or oligonucleotide" is a nucleic acid or oligonucleotide that is covalently bound to a label via a linker or chemical bond, or non-covalently bound via ionic, van der Waals, electrostatic, or hydrogen bonding, such that the presence of the nucleic acid can be detected by detecting the presence of a detectable label bound to the nucleic acid. Alternatively, the same result can be obtained using a method of high affinity interaction, where one of a pair of binding partners binds to the other, e.g., biotin, streptavidin. In embodiments, the phosphorothioate nucleic acid or phosphorothioate polymer backbone comprises a detectable label as disclosed herein and generally known in the art.

The term "probe" or "primer" as used herein is defined as one or more nucleic acid fragments whose specific hybridization to a sample can be detected. A probe or primer may be of any length, depending on the particular technique in which it will be used. For example, PCR primers are typically between 10 and 40 nucleotides in length, while nucleic acid probes used in, for example, Southern blotting can be more than a hundred nucleotides in length. A probe may be unlabeled or labeled as described below so that its binding to the target or sample can be detected. Probes may be generated from nucleic acid derived from one or more particular (preselected) portions of a chromosome, e.g., one or more clones, an isolated whole chromosome or chromosome fragment, or a collection of polymerase chain reaction (PCR) amplification products. The length and complexity of the nucleic acid immobilized on the target element are not critical to the invention. The skilled artisan can adjust these factors to provide optimal hybridization and signal production for a given hybridization protocol, and to provide the required resolution among different genes or genomic locations.

A probe may also be an isolated nucleic acid immobilized on a solid surface (e.g., nitrocellulose, glass, quartz, or fused silica slides), as in an array. In some embodiments, the probe may be a member of a nucleic acid array, for example as described in WO 96/17958. Techniques capable of producing high-density arrays can also be used for this purpose (see, e.g., Fodor, Science 251:767-773 (1991); Johnston, Curr. Biol. 8:R171-R174 (1998); Schummer, BioTechniques 23:1087-1092 (1997); Kern, BioTechniques 23:120-124 (1997); U.S. Pat. No. 5,143,854).

The terms "complementary" or "complementarity" refer to the ability of a nucleotide in a polynucleotide to form a base pair with another nucleotide in a second polynucleotide. For example, the sequence A-G-T is complementary to the sequence T-C-A. Complementarity may be partial, in which only some of the nucleotides match according to base pairing, or complete, in which all of the nucleotides match according to base pairing.
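As an illustration only, the base-pairing rule above can be sketched in Python. The function names are hypothetical and the sketch handles DNA bases only. It pairs the two strands position by position in the same order used in the A-G-T / T-C-A example and returns the fraction of positions that pair (1.0 for complete complementarity, less for partial); a reverse-complement helper is included because the complementary strand is conventionally written 5' to 3'.

```python
# Watson-Crick base pairs for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Pair each base in turn, keeping the original order (A-G-T -> T-C-A)."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """The complementary strand written 5'->3', i.e. the complement reversed."""
    return complement(seq)[::-1]


def fraction_complementary(a: str, b: str) -> float:
    """Share of aligned positions where b pairs with a: 1.0 is complete, < 1.0 partial."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(PAIR[x] == y for x, y in zip(a.upper(), b.upper())) / len(a)
```

For instance, A-G-T against T-C-G pairs at two of three positions, i.e. it is partially complementary.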

The term "isolated," when applied to a nucleic acid or protein, means that the nucleic acid or protein is substantially free of other cellular components with which it is associated in its native state. It may, for example, be in a homogeneous state and may be in either a dry or an aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high-performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified.

The term "purified" means that the nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. In some embodiments, the nucleic acid or protein is at least 50% pure, optionally at least 65% pure, optionally at least 75% pure, optionally at least 85% pure, optionally at least 95% pure, and optionally at least 99% pure.

The term "isolated" may also refer to a cell or a sample of cells. An isolated cell or sample of cells is a single cell type that is substantially free of many of the components that normally accompany the cells in their native state or when they are first removed from their native state. In certain embodiments, an isolated cell sample retains those components from its natural state that are required to maintain the cells in a desired state. In some embodiments, an isolated (e.g., purified or separated) cell is a cell that is substantially the only cell type in the sample. A purified cell sample may contain at least 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of one cell type. An isolated cell sample can be obtained using a cell marker, or a combination of cell markers, that is unique to one cell type in the unpurified cell sample. In some embodiments, the cells are isolated using a cell sorter. In some embodiments, the cells are isolated using antibodies directed against cellular proteins.

As used herein, "wild-type sequence" refers to any given sequence (e.g., an isolated sequence) that can serve as a template for subsequent reactions or modifications. As the skilled person will understand, a wild-type sequence may comprise a nucleic acid sequence (e.g., DNA, RNA, or a combination thereof) or an amino acid sequence, or may be composed of other chemical entities. In some embodiments, a wild-type sequence may refer to an in silico sequence, i.e., sequence information or sequence data stored on a computer-readable medium in machine-readable and/or editable form. The wild-type sequence (a string of nucleotide or amino acid symbols in a given order) can be entered, for example, into a customer portal via a web interface. In embodiments, a sequence initially provided by a customer is treated as the wild-type sequence for the downstream processes based on it, regardless of whether that sequence is native, modified (i.e., relative to another wild-type sequence), or entirely artificial.

In embodiments, a wild-type sequence may also refer to a physical molecule, such as a nucleic acid molecule (e.g., RNA, DNA, or a combination thereof) or a protein, polypeptide, or peptide composed of amino acids. Methods for obtaining physical wild-type sequences chemically, enzymatically, or otherwise are known in the art. In one example, a physical nucleic acid wild-type sequence can be obtained by PCR amplification of the corresponding template region, or can be synthesized de novo by assembling synthetic oligonucleotides. Wild-type sequences as used herein may encompass naturally occurring as well as artificial (e.g., chemically or enzymatically modified) portions or building blocks. A wild-type sequence may be composed of two or more sequence portions. A wild-type sequence may be, for example, a coding region, an open reading frame, an expression cassette, an effector domain, a repeat domain, a promoter/enhancer or terminator region, or an untranslated region (UTR), but may also be a defined sequence motif, such as a binding, recognition, or cleavage site within a given sequence. A wild-type sequence may be DNA or RNA of any length; it may be linear, circular, or branched, and single-stranded or double-stranded.

As used herein, the term "conjugate" refers to an association between atoms or molecules. The association may be direct or indirect. For example, a conjugate between a first moiety (e.g., a nuclease moiety) and a second moiety (e.g., a DNA-binding moiety) provided herein can be direct, e.g., through a covalent bond, or indirect, e.g., through a non-covalent bond (e.g., electrostatic interactions (e.g., ionic bonds, hydrogen bonds, halogen bonds), van der Waals interactions (e.g., dipole-dipole, dipole-induced dipole, London dispersion), ring stacking (pi effects), hydrophobic interactions, and the like). In embodiments, conjugates are formed using conjugation chemistry, including, but not limited to, nucleophilic substitutions (e.g., reactions of amines and alcohols with acyl halides and active esters), electrophilic substitutions (e.g., enamine reactions), and additions to carbon-carbon and carbon-heteroatom multiple bonds (e.g., Michael reactions, Diels-Alder additions). These and other useful reactions are discussed, for example, in March, ADVANCED ORGANIC CHEMISTRY, 3rd ed., John Wiley & Sons, New York, 1985; Hermanson, BIOCONJUGATE TECHNIQUES, Academic Press, San Diego, 1996; and Feeney et al., MODIFICATION OF PROTEINS; Advances in Chemistry Series, Vol. 198, American Chemical Society, Washington, D.C., 1982. In embodiments, the first moiety (e.g., a nuclease moiety) and the second moiety (e.g., a DNA-binding moiety) are linked non-covalently through a non-covalent interaction between a component of the first moiety and a component of the second moiety. In other embodiments, the first moiety (e.g., a polyamine moiety) comprises one or more reactive moieties, such as a covalent reactive moiety as described herein (e.g., an alkyne, azide, maleimide, or thiol-reactive moiety). 
In other embodiments, the first moiety (e.g., nuclease moiety) comprises a linker having one or more reactive moieties, such as a covalent reactive moiety as described herein (e.g., an alkyne, azide, maleimide, or thiol reactive moiety). In other embodiments, the second moiety (DNA binding moiety) comprises one or more reactive moieties, such as a covalent reactive moiety as described herein (e.g., an alkyne, azide, maleimide, or thiol reactive moiety). In other embodiments, the second moiety (DNA binding moiety) comprises a linker having one or more reactive moieties, such as a covalent reactive moiety as described herein (e.g., an alkyne, azide, maleimide, or thiol reactive moiety).

As used herein, the term "about" refers to a range of values that includes the specified value and that one of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, "about" means within one standard deviation, using measurements generally accepted in the art. In embodiments, "about" refers to a range extending to +/-10% of the specified value. In embodiments, "about" refers to the specified value itself.
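A minimal sketch of the +/-10% reading of "about," assuming a purely relative tolerance around the specified value with inclusive bounds; `about_range` and `is_about` are hypothetical helper names, not part of the disclosure.

```python
def about_range(specified: float, rel_tol: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) bounds of 'about specified' under a relative tolerance."""
    delta = abs(specified) * rel_tol
    return specified - delta, specified + delta


def is_about(value: float, specified: float, rel_tol: float = 0.10) -> bool:
    """True if value lies within +/- rel_tol of the specified value (bounds inclusive)."""
    low, high = about_range(specified, rel_tol)
    return low <= value <= high
```

Under this reading, "about 100 nucleotides" would cover 90 to 110 nucleotides.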

The terms "polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acid residues, wherein the polymer may be conjugated to a moiety that does not consist of amino acids. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical mimetics of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers. The terms also apply to macrocyclic peptides, peptides that have been modified with non-peptide functional groups, peptidomimetics, polyamides, and macrolactams. A "fusion protein" refers to a chimeric protein comprising two or more separate protein sequences that are recombinantly expressed as a single moiety.

The terms "peptidyl", "peptide moiety", "protein moiety" and "peptidyl moiety" refer to a monovalent peptide or protein.

The term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.

Amino acids may be referred to herein by their commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.

An amino acid or nucleotide base "position" is denoted by a number that sequentially identifies each amino acid (or nucleotide base) in the reference sequence based on its position relative to the N-terminus (or 5' end). Because deletions, insertions, truncations, fusions, and the like must be taken into account when determining an optimal alignment, the number of an amino acid residue in a test sequence, determined by simply counting from the N-terminus, is not necessarily the same as the number of its corresponding position in the reference sequence. For example, where a variant has a deletion relative to an aligned reference sequence, no amino acid in the variant corresponds to the deleted position in the reference sequence. Where a variant has an insertion relative to an aligned reference sequence, the inserted residue does not correspond to any numbered amino acid position in the reference sequence. In the case of truncations or fusions, there can be stretches of amino acids in either the reference or the aligned sequence that do not correspond to any amino acid in the other sequence.
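To make the numbering rule concrete, here is a hypothetical Python helper. It assumes a pairwise alignment is already available as two equal-length strings with "-" marking gaps, and maps each residue of the test sequence (numbered by simple counting from the N-terminus) to the numbered reference position it corresponds to. Inserted residues map to None, and reference positions deleted in the variant simply receive no test residue.

```python
from typing import Dict, Optional


def map_positions(ref_aln: str, test_aln: str) -> Dict[int, Optional[int]]:
    """Map 1-based test-sequence positions to 1-based reference positions."""
    if len(ref_aln) != len(test_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    ref_pos = test_pos = 0
    for r, t in zip(ref_aln, test_aln):
        if r != "-":
            ref_pos += 1  # count reference residues only
        if t != "-":
            test_pos += 1  # count test residues only
            # A test residue opposite a reference gap is an insertion: no reference number.
            mapping[test_pos] = ref_pos if r != "-" else None
    return mapping
```

With a deletion (reference MKTALV aligned to MK-ALV), test residue 3 corresponds to reference position 4, illustrating why simple counting and reference numbering diverge.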

The terms "numbered with reference to" or "corresponding to," when used in the context of the numbering of a given amino acid or polynucleotide sequence, refer to the numbering of the residues of a specified reference sequence when the given amino acid or polynucleotide sequence is compared to that reference sequence.

"Conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids that encode identical or essentially identical amino acid sequences, or, where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG, and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one species of conservatively modified variation. Every nucleic acid sequence herein that encodes a polypeptide also describes every possible silent variation of the nucleic acid. The skilled artisan will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid that encodes a polypeptide is implicit in each described sequence with respect to the expression product, but not with respect to actual probe sequences.
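The silent-variation idea can be illustrated with a short Python sketch. The codon table is deliberately partial (only the standard RNA codons for alanine, lysine, methionine, and tryptophan, so tryptophan appears as UGG rather than TGG), and the helper names are hypothetical. It enumerates every coding sequence that translates to the same peptide: each alanine position multiplies the number of variants by four, while methionine and tryptophan positions admit no alternative.

```python
from itertools import product

# Partial standard RNA codon table, restricted to the residues used in this sketch.
CODONS = {
    "A": ["GCU", "GCC", "GCA", "GCG"],  # alanine: fourfold degenerate
    "K": ["AAA", "AAG"],                # lysine: twofold degenerate
    "M": ["AUG"],                       # methionine: single codon
    "W": ["UGG"],                       # tryptophan: single codon
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def translate(rna: str) -> str:
    """Translate an RNA coding sequence using the partial table above."""
    if len(rna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def silent_variants(rna: str) -> list[str]:
    """All sequences encoding the same peptide as `rna` (its silent variations)."""
    peptide = translate(rna)
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]
```

For example, GCAAUG (Ala-Met) has four silent variants, whereas AUGUGG (Met-Trp) has only itself.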

As to amino acid sequences, the skilled artisan will recognize that individual substitutions, deletions, or additions to a nucleic acid, peptide, polypeptide, or protein sequence that alter, add, or delete a single amino acid or a small percentage of amino acids in the encoded sequence are "conservatively modified variants" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to, and do not exclude, polymorphic variants, interspecies homologs, and alleles of the invention.

The following eight groups each contain amino acids that are conservative substitutions for one another: 1) alanine (A), glycine (G); 2) aspartic acid (D), glutamic acid (E); 3) asparagine (N), glutamine (Q); 4) arginine (R), lysine (K); 5) isoleucine (I), leucine (L), methionine (M), valine (V); 6) phenylalanine (F), tyrosine (Y), tryptophan (W); 7) serine (S), threonine (T); and 8) cysteine (C), methionine (M).
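The eight groups above can be encoded directly as sets. In this Python sketch (helper names hypothetical), two residues count as a conservative substitution when they share at least one group, which matters for methionine because it appears in both group 5 and group 8. A second helper checks whether two equal-length, ungapped sequences differ only by such substitutions.

```python
# The eight conservative-substitution groups listed above (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"A", "G"}, {"D", "E"}, {"N", "Q"}, {"R", "K"},
    {"I", "L", "M", "V"}, {"F", "Y", "W"}, {"S", "T"}, {"C", "M"},
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in at least one common group."""
    a, b = a.upper(), b.upper()
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def differs_only_conservatively(seq1: str, seq2: str) -> bool:
    """True if every position is identical or a conservative substitution (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return all(x == y or is_conservative(x, y)
               for x, y in zip(seq1.upper(), seq2.upper()))
```

Methionine can thus be exchanged with cysteine or with valine, although cysteine and valine are not conservative substitutions for each other.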

"Percent sequence identity" is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
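A minimal Python sketch of the calculation just described, assuming the two sequences have already been optimally aligned and padded with "-" for gaps so that both strings span the same comparison window; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an alignment window: matches / window length * 100."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    # A matched position has the same residue (not a gap) in both sequences.
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    # Every column of the window, including gap columns, counts toward the denominator.
    return 100.0 * matches / len(aligned_a)
```

For "ACGT-A" against "ACGTTA", five of the six window positions match, giving about 83.3% identity.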

The terms "identical" or "percent identity," in the context of two or more nucleic acid or polypeptide sequences, refer to two or more sequences or subsequences that are the same or that have a specified percentage of amino acid residues or nucleotides that are the same (e.g., at least 60% identity, optionally 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% identity over a specified region, e.g., the entire polypeptide sequence of the invention or individual domains of the polypeptide of the invention) when compared and aligned for maximum correspondence over a comparison window or designated region, as measured using one of the sequence comparison algorithms described below or by manual alignment and visual inspection. Such sequences are then said to be "substantially identical." This definition also refers to the complement of a test sequence. Optionally, the identity exists over a region that is at least about 50 nucleotides in length, or more preferably over a region that is 100 to 500 or 1000 or more nucleotides in length.

When comparing sequences, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters may be used, or alternative parameters may be specified. The sequence comparison algorithm then calculates the percent sequence identity of the test sequence relative to the reference sequence based on the program parameters.

A "comparison window," as used herein, includes reference to a segment of any one of a number of contiguous positions, e.g., the full-length sequence or from 20 to 600, from about 50 to about 200, or from about 100 to about 150 amino acids or nucleotides, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of aligning sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, by the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, by the search-for-similarity method of Pearson and Lipman (1988) Proc. Nat'l. Acad. Sci. USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection (see, e.g., Ausubel et al., Current Protocols in Molecular Biology (1995 supplement)).
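The Needleman-Wunsch global alignment cited above can be sketched in Python with dynamic programming. The scoring used here (match +1, mismatch -1, linear gap -1) is a simplifying assumption; practical tools such as those listed typically use substitution matrices and affine gap penalties. The function name is hypothetical, and it returns one optimal alignment together with its score.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str, int]:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Under these parameters, aligning ACGT with AGT gives ACGT / A-GT with a score of 2 (three matches, one gap).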

Examples of algorithms suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, described in Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402 and Altschul et al. (1990) J. Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: (i) the cumulative alignment score falls off by the quantity X from its maximum achieved value; (ii) the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments; or (iii) the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. 
For amino acid sequences, the BLASTP program uses as defaults a word length of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)), alignments (B) of 50, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
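The word-seeding and extension steps described above can be illustrated with a deliberately simplified Python sketch for nucleotide sequences. It is not the NCBI implementation. It uses exact word matches only (no neighborhood words or threshold T, which matter for protein scoring matrices), performs only ungapped extension, and ignores the reverse strand. W=11, M=5, and N=-4 follow the BLASTN defaults quoted above; the drop-off X=20 and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def seed_hits(query: str, subject: str, w: int = 11) -> list[tuple[int, int]]:
    """Return (query_offset, subject_offset) pairs where identical words of length w occur."""
    index = defaultdict(list)  # word -> offsets in the subject
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return [(i, j) for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def _extend(q, s, qi, sj, step, start, match, mismatch, x_drop):
    """Extend from (qi, sj) in direction `step` (+1 right, -1 left).

    Stops when the running score falls x_drop below its best value, drops to
    zero or below, or an end of either sequence is reached.
    Returns (score gained at the best point, residues kept)."""
    score = best = start
    k = best_k = 0
    while 0 <= qi < len(q) and 0 <= sj < len(s):
        score += match if q[qi] == s[sj] else mismatch
        k += 1
        if score > best:
            best, best_k = score, k
        if best - score >= x_drop or score <= 0:
            break
        qi += step
        sj += step
    return best - start, best_k


def extend_hit(q, s, i, j, w=11, match=5, mismatch=-4, x_drop=20):
    """Ungapped extension of the word hit at (i, j); returns (score, q_start, q_end)."""
    seed = sum(match if a == b else mismatch for a, b in zip(q[i:i + w], s[j:j + w]))
    gain_r, n_r = _extend(q, s, i + w, j + w, 1, seed, match, mismatch, x_drop)
    gain_l, n_l = _extend(q, s, i - 1, j - 1, -1, seed + gain_r, match, mismatch, x_drop)
    return seed + gain_r + gain_l, i - n_l, i + w + n_r  # q_end is exclusive
```

In use, every seed from `seed_hits` would be passed to `extend_hit`, and overlapping extensions of the same region (as produced by adjacent seeds) would be merged into a single HSP.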

The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which indicates the probability that a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
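For orientation only, the Python sketch below uses the standard Karlin-Altschul statistics for a single ungapped HSP, E = K*m*n*exp(-lambda*S) and P = 1 - exp(-E), rather than the sum statistic P(N), which aggregates several HSPs. The function names are hypothetical, and any values of lambda and K passed in are placeholders rather than parameters from the disclosure or from a particular scoring system. The cutoffs mirror the thresholds stated above.

```python
import math


def expected_hsps(score: float, m: int, n: float, lam: float, k: float) -> float:
    """Expected number of chance HSPs scoring >= score: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def p_value(e: float) -> float:
    """Probability of at least one chance HSP (Poisson approximation): P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e)


def is_similar(p: float, cutoff: float = 0.2) -> bool:
    """Apply one of the thresholds above (about 0.2, 0.01, or 0.001)."""
    return p < cutoff
```

Because E falls exponentially with the score S, a modest increase in alignment score sharply lowers the chance-match probability; for small E, P is approximately equal to E.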

An indication that two nucleic acid sequences or polypeptides are substantially identical is that the polypeptide encoded by the first nucleic acid is immunologically cross-reactive with antibodies raised against the polypeptide encoded by the second nucleic acid, as described below. Thus, a polypeptide is typically substantially identical to a second polypeptide, e.g., where the two peptides differ only by conservative substitutions. Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent conditions, as described below. Yet another indication that two nucleic acid sequences are substantially identical is that the same primers can be used to amplify the sequences.

"Contacting" is used in accordance with its plain and ordinary meaning and refers to the process of allowing at least two distinct species to become sufficiently close to react, interact, or physically touch. It should be appreciated, however, that the resulting reaction product can be produced directly from a reaction between the added reagents or from an intermediate, derived from one or more of the added reagents, that can be produced in the reaction mixture. In embodiments, contacting includes, for example, allowing a ribonucleic acid as described herein to interact with an endonuclease and an enhancer element.

A "control" sample or value refers to a sample that is used as a reference (typically a known reference for comparison to a test sample). For example, a test sample can be taken from a test condition, e.g., in the presence of a test compound (e.g., a first or second DNA binding modulation enhancer), and compared to a sample from a known condition, e.g., in the absence of the test compound (negative control) or in the presence of a known compound (positive control). The control may also represent an average value collected from a number of tests or results. One skilled in the art will recognize that controls can be designed to evaluate any number of parameters. One skilled in the art will understand which standard controls are most appropriate in a given situation and will be able to analyze the data based on comparison to the standard control values. Standard controls are also valuable for determining the significance (e.g., statistical significance) of the data. For example, if the value of a given parameter varies greatly in a standard control, the variation in the test sample is not considered significant.

A "label" or "detectable moiety" is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include ³²P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins or other entities that can be made detectable, e.g., by incorporating a radiolabel into a peptide or antibody specifically reactive with a target peptide. Any appropriate method known in the art for conjugating an antibody to the label may be employed, e.g., using methods described in Hermanson, Bioconjugate Techniques, 1996, Academic Press, Inc., San Diego.

A "labeled protein or polypeptide" is a protein or polypeptide that is bound to a label, either covalently through a linker or chemical bond or non-covalently through ionic, van der Waals, electrostatic, or hydrogen bonding, such that the presence of the labeled protein or polypeptide can be detected by detecting the presence of the label bound to it. Alternatively, the same result can be obtained using a high-affinity interaction in which one member of a binding pair binds to the other, e.g., biotin and streptavidin.

"biological sample" or "sample" refers to a substance obtained from a subject or patient. Biological samples include tissue sections, such as biopsy samples and autopsy samples, as well as frozen sections taken for histological purposes. Such samples include bodily fluids such as blood and blood fractions or products (e.g., serum, plasma, platelets, red blood cells, etc.), sputum, tissue, cultured cells (e.g., primary cultures, explants, and transformed cells), stool, urine, synovial fluid, joint tissue, synovial cells, fibroblast-like synovial cells, macrophage-like synovial cells, immune cells, hematopoietic cells, fibroblasts, macrophages, T cells, and the like. The biological sample is typically obtained from a eukaryotic organism, e.g., a mammal, such as a primate, e.g., a chimpanzee or a human; cattle; a dog; a cat; rodents, e.g., guinea pigs, rats, mice; a rabbit; or a bird; a reptile; or fish.

As used herein, "cell" refers to a cell that performs a metabolic or other function sufficient to preserve or replicate its genomic DNA. Cells can be identified by methods well known in the art, including, for example, the presence of an intact membrane, staining with a particular dye, the ability to produce progeny, or, in the case of a gamete, the ability to produce viable progeny in combination with a second gamete. Cells may include prokaryotic cells and eukaryotic cells. Prokaryotic cells include, but are not limited to, bacteria. Eukaryotic cells include, but are not limited to, yeast cells and cells derived from plants and animals, such as mammalian, insect (e.g., Spodoptera), and human cells.

The term "gene" refers to a segment of DNA involved in producing a protein; it includes regions preceding and following the coding region (enhancer, promoter, leader, and trailer sequences) as well as intervening sequences (introns) between individual coding segments (exons). Enhancers, promoters, leaders, trailers, and introns include regulatory elements necessary during the transcription and translation of a gene. Further, a "protein gene product" is a protein expressed from a particular gene.

The word "expression" or "expressed" as used herein with respect to a gene refers to the transcriptional and/or translational product of that gene. The level of expression of a DNA molecule in a cell may be determined on the basis of either the amount of corresponding mRNA present within the cell or the amount of protein encoded by that DNA produced by the cell (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 18.1-18.88).

Expression of the transfected gene may occur transiently or stably in the cell. During "transient expression", transfected genes are not transferred to daughter cells during cell division. Expression of the gene is lost over time because its expression is restricted to the transfected cells. In contrast, stable expression of a transfected gene can occur when the gene is co-transfected with another gene that confers a selective advantage on the transfected cells. Such a selective advantage may be resistance to certain toxins presented to the cells.

The term "plasmid" refers to a nucleic acid molecule that encodes a gene and/or regulatory elements necessary for gene expression. Expression of genes from plasmids can occur in cis or trans. If the gene is expressed in cis, the gene and the regulatory elements are encoded by the same plasmid. Trans-expression refers to the situation where the genes and regulatory elements are encoded by separate plasmids.

The term "episome" refers to the extrachromosomal state of the plasmid in the cell. Episomal plasmids are nucleic acid molecules that are not part of the chromosomal DNA and replicate independently thereof.

The term "exogenous" refers to a molecule or substance (e.g., a nucleic acid or protein) that is derived from outside a given cell or organism. Conversely, the term "endogenous" is a molecule or substance that is native to or derived within a given cell or organism.

A "cell culture" is a population of cells maintained in vitro. Cell cultures can be established from primary cells isolated from a cell bank or an animal, or from passaged cells that are derived from one of these sources and immortalized for long-term in vitro culture. Cell culture as provided herein further refers to an environment that includes suitable cell nutrients and is capable of maintaining cells in vitro. The environment may be a liquid, solid, and/or semi-solid environment (e.g., agar, gel) in a suitable container (e.g., a cell culture dish). Cell culture media may be used. As used herein, "cell culture medium" is used according to its commonly accepted meaning in the art. Cell culture media (also referred to in the art and herein as "media") include liquids or gels designed to support cell growth (e.g., division, differentiation, maintenance), which contain, e.g., growth factors, minerals, and vitamins. In embodiments, the compositions provided herein (including the examples) further comprise a physiologically acceptable solution. A "physiologically acceptable solution" as provided herein refers to any acceptable aqueous solution (e.g., a buffer) in which a composition provided herein can be contained without losing its biological properties. In an embodiment, the physiologically acceptable solution is a cell culture medium.

The terms "transfection," "transduction," "transfecting," and "transducing" may be used interchangeably and are defined as the process of introducing nucleic acid molecules and/or proteins into a cell. Nucleic acids may be introduced into cells using non-viral or viral methods. The nucleic acid molecule may be a sequence encoding a complete protein or a functional portion thereof. Typically, a nucleic acid vector comprises the elements necessary for protein expression (e.g., a promoter, a transcription start site). Non-viral methods of transfection include any appropriate method of introducing nucleic acid molecules into cells without using viral DNA or viral particles as a delivery system. Exemplary non-viral transfection methods include calcium phosphate transfection, lipofection, nucleofection, sonoporation, transfection by heat shock, magnetofection, and electroporation. For viral methods, any useful viral vector can be used in the methods described herein. Examples of viral vectors include, but are not limited to, retroviral, adenoviral, lentiviral, and adeno-associated viral vectors. In some aspects, the nucleic acid molecules are introduced into a cell using a retroviral vector following standard procedures well known in the art. The terms "transfection" and "transduction" also refer to introducing proteins into a cell from the external environment. Typically, transduction or transfection of a protein relies on attaching to the protein of interest a peptide or protein capable of crossing the cell membrane. See, e.g., Ford et al., Gene Therapy 8:1-4 (2001) and Prochiantz, Nat. Methods 4:119-120 (2007).

As used herein, the terms "specific binding" and "specifically binds" refer to the formation by two molecules of a complex (e.g., a DNA binding enhancer and an enhancer binding sequence) that is relatively stable under physiological conditions.

As provided herein, a "ribonucleoprotein complex", "ribonucleoprotein particle", "deoxyribonucleoprotein complex" or "deoxyribonucleoprotein particle" refers to a complex or particle comprising a nucleoprotein and a ribonucleic acid or deoxyribonucleic acid. A "nucleoprotein" as provided herein refers to a protein capable of binding to a nucleic acid (e.g., RNA, DNA). A nucleoprotein bound to ribonucleic acid is called a "ribonucleoprotein", and a nucleoprotein bound to deoxyribonucleic acid is called a "deoxyribonucleoprotein". The interaction between a ribonucleoprotein and ribonucleic acid, or between a deoxyribonucleoprotein and deoxyribonucleic acid, may be direct, e.g., through covalent bonds, or indirect, e.g., through non-covalent bonds (e.g., electrostatic interactions (e.g., ionic bonds, hydrogen bonds, halogen bonds), van der Waals interactions (e.g., dipole-dipole, dipole-induced dipole, London dispersion), ring stacking (pi effects), hydrophobic interactions, etc.). In embodiments, the ribonucleoprotein comprises an RNA binding motif that non-covalently binds to ribonucleic acid. In embodiments, the deoxyribonucleoprotein comprises a DNA binding motif that non-covalently binds to deoxyribonucleic acid. For example, a positively charged amino acid residue (e.g., a lysine residue) of an RNA binding motif or a DNA binding motif can form an electrostatic interaction with the negatively charged phosphate backbone of RNA or DNA, thereby forming a ribonucleoprotein complex or a deoxyribonucleoprotein complex (e.g., the Argonaute complexes mentioned herein). Non-limiting examples of ribonucleoproteins include ribosomes, telomerase, RNase P, hnRNP, CRISPR-associated protein 9 (Cas9), and small nuclear RNPs (snRNPs). An example of a deoxyribonucleoprotein is Argonaute. The ribonucleoprotein or deoxyribonucleoprotein may be an enzyme. In embodiments, the ribonucleoprotein or deoxyribonucleoprotein is an endonuclease.
Thus, in embodiments, the ribonucleoprotein complex comprises an endonuclease and a ribonucleic acid. In embodiments, the endonuclease is CRISPR-associated protein 9. In embodiments, the deoxyribonucleoprotein complex comprises an endonuclease and a deoxyribonucleic acid. In embodiments, the endonuclease is an Argonaute nuclease.

A "guide RNA" or "gRNA" as provided herein refers to a ribonucleotide sequence capable of binding to a nucleoprotein, thereby forming a ribonucleoprotein complex. Likewise, a "guide DNA" or "gDNA" as provided herein refers to a deoxyribonucleotide sequence capable of binding to a nucleoprotein, thereby forming a deoxyribonucleoprotein complex. In embodiments, the guide RNA comprises one or more RNA molecules. In embodiments, the guide DNA comprises one or more DNA molecules. In embodiments, the gRNA includes a nucleotide sequence that is complementary to the target site (e.g., a regulator binding sequence). In embodiments, the gDNA includes a nucleotide sequence that is complementary to the target site (e.g., a regulator binding sequence). The complementary nucleotide sequence may mediate binding of the ribonucleoprotein complex or deoxyribonucleoprotein complex to the target site, thereby conferring sequence specificity on the complex. Thus, in embodiments, the guide RNA or guide DNA is complementary to the target nucleic acid (e.g., a regulator binding sequence). In embodiments, the guide RNA binds to a target nucleic acid sequence (e.g., a regulator binding sequence). In embodiments, the guide DNA binds to a target nucleic acid sequence (e.g., a regulator binding sequence). In embodiments, the guide RNA is complementary to a CRISPR nucleic acid sequence. In embodiments, the complement of the guide RNA or guide DNA has about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to the target nucleic acid (e.g., a regulator binding sequence). A target nucleic acid sequence as provided herein is a nucleic acid sequence expressed by a cell. In embodiments, the target nucleic acid sequence is an exogenous nucleic acid sequence. In embodiments, the target nucleic acid sequence is an endogenous nucleic acid sequence.
In embodiments, the target nucleic acid sequence (e.g., a regulator binding sequence) forms part of a cellular gene. Thus, in embodiments, the guide RNA or guide DNA is complementary to a cellular gene or a fragment thereof. In embodiments, the guide RNA or guide DNA is about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% complementary to the target nucleic acid sequence (e.g., a regulator binding sequence). In embodiments, the guide RNA or guide DNA is about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% complementary to the cellular gene sequence. In embodiments, the guide RNA or guide DNA binds to a cellular gene sequence. The term "target nucleic acid sequence" refers to the regulator binding sequences provided herein.
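
The percent-complementarity language above can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that both sequences are written 5'→3', that they have equal length, and that only Watson-Crick pairs count (U is read as T, and wobble pairs are ignored).

```python
# Illustrative sketch only; function names are hypothetical.
# Assumptions: sequences are written 5'->3', guide and target have equal
# length, and only Watson-Crick pairs count (U is read as T; no wobble pairs).

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA or RNA sequence (U read as T)."""
    seq = seq.upper().replace("U", "T")
    return "".join(PAIR[base] for base in reversed(seq))


def percent_complementarity(guide: str, target: str) -> float:
    """Percentage of guide positions that base-pair with the target strand.

    The guide pairs antiparallel with the target, so each guide base is
    compared with the corresponding base of the target's reverse complement.
    """
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    guide = guide.upper().replace("U", "T")
    expected = reverse_complement(target)
    matches = sum(g == e for g, e in zip(guide, expected))
    return 100.0 * matches / len(guide)


target = "GATTACAGATTACAGATTAC"          # hypothetical 20-nt target strand
perfect = reverse_complement(target)      # fully complementary guide
one_mismatch = "T" + perfect[1:]          # first base changed from G to T
print(percent_complementarity(perfect, target))       # 100.0
print(percent_complementarity(one_mismatch, target))  # 95.0
```

A single mismatch in a 20-nucleotide guide therefore corresponds to 95% complementarity, one of the values enumerated above.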

In embodiments, the guide RNA or guide DNA is a single-stranded nucleic acid. In embodiments, the guide RNA or guide DNA is about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more nucleic acid residues in length. In embodiments, the guide RNA or guide DNA is about 10 to about 30 nucleic acid residues in length. In embodiments, the guide RNA or guide DNA is about 20 nucleic acid residues in length. In embodiments, the guide RNA or guide DNA may be at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 or more nucleic acid residues or sugar residues in length. In embodiments, the length of the guide RNA or guide DNA is 5 to 50, 10 to 50, 15 to 50, 20 to 50, 25 to 50, 30 to 50, 35 to 50, 40 to 50, 45 to 50, 5 to 75, 10 to 75, 15 to 75, 20 to 75, 25 to 75, 30 to 75, 35 to 75, 40 to 75, 45 to 75, 50 to 75, 55 to 75, 60 to 75, 65 to 75, 70 to 75, 5 to 100, 10 to 100, 15 to 100, 20 to 100, 25 to 100, 30 to 100, 35 to 100, 40 to 100, 45 to 100, 50 to 100, 55 to 100, 60 to 100, 65 to 100, 70 to 100, 75 to 100, 80 to 100, 85 to 100, 90 to 100, 95 to 100, or more residues in length. In embodiments, the guide RNA or guide DNA is 10 to 15, 10 to 20, 10 to 30, 10 to 40, or 10 to 50 residues in length.

PAM refers to a "protospacer adjacent motif". PAM sites are typically 2-6 base pair DNA sequences adjacent to the DNA sequence bound by Cas9. Thus, in some cases, a DNA-binding modulation enhancer other than Cas9 may be used, while in other cases, a single Cas9/RNA complex may be used as a DNA-binding modulation enhancer (either alone or in combination with different DNA-binding modulation enhancers).
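
As an illustration of how a PAM constrains where a Cas9/guide RNA complex can bind, the sketch below scans one DNA strand for candidate target sites. It assumes the Streptococcus pyogenes Cas9 PAM, 5'-NGG-3', located immediately 3' of a 20-nucleotide protospacer; other Cas9 variants and orthologs recognize different PAMs, and the function name is hypothetical.

```python
import re
from typing import Iterator, Tuple


def find_spcas9_sites(seq: str, protospacer_len: int = 20) -> Iterator[Tuple[int, str, str]]:
    """Yield (start, protospacer, pam) for each NGG PAM on the given strand.

    Assumes the S. pyogenes Cas9 PAM (5'-NGG-3') lying immediately 3' of the
    protospacer. Only the strand passed in is scanned; the opposite strand can
    be scanned by passing its reverse complement.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping PAMs (e.g. in "AGGG") all match.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start < 0:
            continue  # not enough upstream sequence for a full protospacer
        yield start, seq[start:pam_start], match.group(1)


example = "GATTACAGATTACAGATTACTGGCC"  # hypothetical sequence
for site in find_spcas9_sites(example):
    print(site)  # (0, 'GATTACAGATTACAGATTAC', 'TGG')
```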

For the specific proteins described herein (e.g., Cas9, Argonaute), the proteins include naturally occurring forms, and variants or homologs of any such protein that maintain the activity of the protein (e.g., activity within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of that of the native protein). In some embodiments, a variant or homolog has at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity over the entire sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 contiguous amino acid portion) compared to the naturally occurring form. In other embodiments, the protein is a protein identified by its NCBI sequence reference. In other embodiments, the protein is a protein identified by its NCBI sequence reference, or a functional fragment or homolog thereof.

Thus, reference herein to "CRISPR-associated protein 9", "Cas9", or "Cas9 protein" includes any recombinant or naturally occurring form of the Cas9 endonuclease, or a variant or homolog thereof that maintains Cas9 endonuclease activity (e.g., activity within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% compared to Cas9). In some aspects, a variant or homolog has at least 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity over the entire sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 contiguous amino acid portion) compared to a naturally occurring Cas9 protein. In embodiments, the Cas9 protein is substantially identical to the protein identified by UniProt reference Q99ZW2, or a variant or homolog having substantial identity thereto. Cas9 also refers to proteins known in the art as "nickases." In embodiments, Cas9 binds to CRISPR (clustered regularly interspaced short palindromic repeats) nucleic acid sequences. In embodiments, the CRISPR nucleic acid sequence is a prokaryotic nucleic acid sequence. Examples of Cas9 proteins suitable for use in the present invention include, but are not limited to, Cas9 muteins, such as the high-fidelity Cas9 described by Kleinstiver, Benjamin P., et al. ("High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects," Nature (2016), PubMed PMID: 26735016); Cas9 proteins that bind modified PAMs; and orthologous Cas proteins, such as Cpf1 (CRISPR from Prevotella and Francisella 1). Any mutant form of Cas9 commonly known and described in the art can be used in the methods and compositions provided herein.
Non-limiting examples of mutant Cas9 proteins contemplated for use in the methods and compositions provided herein are described in Slaymaker, Ian M., et al. ("Rationally engineered Cas9 nucleases with improved specificity," Science (2015): aad5227, PubMed PMID: 26628643) and Kleinstiver, Benjamin P., et al. ("High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects," Nature (2016), PubMed PMID: 26735016).

As referred to herein, the terms "Argonaute (AGO) protein", "NgAgo", "Natronobacterium gregoryi Argonaute" and "Natronobacterium gregoryi SP2 Argonaute" include recombinant or naturally occurring forms of NgAgo, or variants or homologs thereof, that maintain NgAgo endonuclease activity (e.g., activity within at least 50%, 80%, 90%, 95%, 96%, 97%, 98%, 99% or 100% compared to wild-type NgAgo). In embodiments, the variant or homolog has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% amino acid sequence identity over the entire sequence or a portion of the sequence (e.g., a 50, 100, 150 or 200 contiguous amino acid portion) compared to a naturally occurring NgAgo protein. In embodiments, the NgAgo protein is substantially identical to the protein identified by National Center for Biotechnology Information (NCBI) protein identifier AFZ73749.1, or a variant or homolog having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% amino acid sequence identity thereto. In embodiments, the Argonaute may also include a nuclease domain (i.e., a deoxyribonuclease or ribonuclease domain), additional DNA binding domains, helicase domains, protein-protein interaction domains, dimerization domains, and other domains.

Argonaute also refers to proteins that form complexes that bind nucleic acid molecules. Thus, one Argonaute protein may bind, e.g., a guide RNA, while another protein has endonuclease activity. These proteins are all considered to be Argonautes, since they act as part of a complex that performs the same function as a single protein (e.g., NgAgo).

As used herein, the term "Argonaute system" refers to a set of Argonaute proteins and nucleic acids that, when combined, produce at least one Argonaute-associated activity (e.g., target locus-specific double-stranded cleavage of double-stranded DNA).

As used herein, the term "Argonaute complex" refers to an Argonaute and a nucleic acid (e.g., RNA) that bind to each other to form a functionally active aggregate. An example of an Argonaute complex is the Natronobacterium gregoryi Argonaute protein (NgAgo) bound to a guide DNA specific for a target locus.

As used herein, the term "transcriptional regulatory sequence" refers to a functional nucleotide fragment of any configuration or geometry contained on a nucleic acid molecule that regulates the transcription of (1) one or more structural genes (e.g., two, three, four, five, seven, ten, etc.) into messenger RNA or (2) one or more genes into untranslated RNA. Examples of transcriptional regulatory sequences include, but are not limited to, promoters, enhancers, repressors, and the like.

As used herein, the term "nucleic acid targeting ability" refers to the ability of a molecule or molecular complex to recognize and/or bind to a nucleic acid in a sequence-specific manner. For example, nucleic acid targeting ability may be conferred on a regulatory protein or regulatory complex by its binding to a regulator binding sequence, or on an Argonaute complex by the hybridizing region of a guide DNA (gDNA) molecule.

As used herein, a "TAL effector" or "TAL effector protein" refers to a protein that includes more than one TAL repeat and is capable of binding a nucleic acid in a sequence-specific manner. In embodiments, the TAL effector protein comprises at least six (e.g., at least 8, at least 10, at least 12, at least 15, at least 17, about 6 to about 25, about 6 to about 35, about 8 to about 25, about 10 to about 25, about 12 to about 25, about 8 to about 22, about 10 to about 22, about 12 to about 22, about 6 to about 20, about 8 to about 20, about 10 to about 22, about 12 to about 20, about 6 to about 18, about 10 to about 18, about 12 to about 18, etc.) TAL repeats. In embodiments, the TAL effector protein comprises 18, 24, 17.5, or 23.5 TAL nucleic acid binding cassettes. In embodiments, the TAL effector protein comprises 15.5, 16.5, 18.5, 19.5, 20.5, 21.5, 22.5, or 24.5 TAL nucleic acid binding cassettes. TAL effector proteins comprise at least one polypeptide region flanking the region containing the TAL repeats. In embodiments, the flanking regions are present at the amino terminus and/or the carboxy terminus of the TAL repeats.
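
The half-integer repeat counts above (e.g., 17.5) reflect a truncated final half repeat at the C-terminal end of the repeat array; each full repeat and the final half repeat specify one base, so 17.5 repeats address an 18-base target. The Python sketch below assumes the commonly used repeat-variable diresidue (RVD) code (NI for A, HD for C, NN for G, NG for T); the function names are hypothetical, practical designs may use other RVDs, and the sketch does not model the 5' T that typically precedes natural TAL effector binding sites.

```python
from typing import List

# Commonly used RVD code (an assumption; other RVDs, e.g. NH or NK for G, exist).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str) -> List[str]:
    """Return one RVD per target base, ordered from the N-terminal repeat.

    The last entry corresponds to the final half repeat.
    """
    return [RVD_FOR_BASE[base] for base in target.upper()]


def repeat_count(target: str) -> float:
    """Repeats needed for a target: (n - 1) full repeats plus one half repeat."""
    if not target:
        raise ValueError("target must be non-empty")
    return len(target) - 0.5


site = "TGACATCGATCGAATCGA"   # hypothetical 18-base target
print(repeat_count(site))      # 17.5
print(design_rvds(site[:4]))   # ['NG', 'NN', 'NI', 'HD']
```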

As used herein, "control sequences" refer to nucleic acid sequences that affect the initiation of transcription and/or translation, as well as the rate, stability, and/or mobility of a transcript or polypeptide product. Such regulatory sequences include, but are not limited to, promoters or control elements, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, transcription initiation sites, termination sequences, polyadenylation sequences, introns, 5' and 3' untranslated regions (UTRs), and other regulatory sequences that may be present within a coding sequence, such as splice sites, inhibitory sequence elements (commonly referred to as CNS or INS, as known from certain viruses), secretion signals, nuclear localization signal (NLS) sequences, translational coupling sequences, and protease cleavage sites, as described in more detail elsewhere herein. The 5' UTR is transcribed but not translated; it lies between the transcription start site and the translation initiation codon and may include the +1 nucleotide. The 3' UTR may be located between the translation stop codon and the end of the transcript. UTRs can have specific functions, such as increasing mRNA stability or attenuating translation. Examples of 3' UTRs include, but are not limited to, polyadenylation signals and transcription termination sequences. The control sequences may be universal or host- or tissue-specific.
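
The positional definitions of the 5' and 3' UTRs can be illustrated with a simplified Python sketch that splits a transcript into 5' UTR, coding sequence, and 3' UTR. It assumes (this is not stated in the text above) that the first ATG is the start codon and that the coding sequence ends at the first in-frame stop codon; real transcripts may use other start contexts, alternative start sites, or upstream open reading frames. The function name is hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_utrs(transcript: str) -> Optional[Tuple[str, str, str]]:
    """Split a transcript into (5' UTR, coding sequence, 3' UTR).

    Simplifying assumptions: the first ATG is the start codon, and the coding
    sequence runs to the first in-frame stop codon (stop codon included).
    Returns None when no start codon or no in-frame stop codon is found.
    """
    seq = transcript.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3
            return seq[:start], seq[start:end], seq[end:]
    return None


print(split_utrs("GCCACCATGGCTTAAGCAAAAAA"))
# ('GCCACC', 'ATGGCTTAA', 'GCAAAAAA')
```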

As used herein, a "promoter" is a transcriptional regulatory sequence that, when operably linked to a nucleic acid segment (e.g., a transgene comprising an open reading frame), is capable of directing transcription of that segment. A promoter is a nucleotide sequence located upstream of the transcription initiation site (usually near the initiation site of RNA polymerase II). A promoter typically comprises at least one core or basal motif and may include or cooperate with one or more control elements, such as upstream elements (e.g., upstream activation regions (UARs)) or other regulatory sequences or synthetic elements. The basal motif constitutes the minimal sequence required to assemble the transcription complex needed for transcription initiation. In embodiments, such minimal sequences include a "TATA box" element, which may be located about 15 to about 35 nucleotides upstream of the transcription start site. The basal promoter may also include a "CCAAT box" element (typically the sequence CCAAT) and/or a GGGGG sequence, which may be located about 40 to about 200 nucleotides, typically about 60 to about 120 nucleotides, upstream of the transcription start site. Transcription of the adjacent nucleic acid segment initiates in the promoter region. The transcription rate of a repressible promoter decreases in response to a repressor. The transcription rate of an inducible promoter increases in response to an inducing agent. The transcription rate of a constitutive promoter is not specifically regulated, although it can vary under the influence of general metabolic conditions.
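
The positional ranges given above (a TATA box about 15 to 35 nucleotides upstream of the transcription start site, a CCAAT box about 40 to 200 nucleotides upstream) can be applied as a simple scan. The Python sketch below is illustrative: the function name is hypothetical, the TATA consensus TATAWAW (W = A or T) is an assumption not taken from this document, and a pure pattern match will miss degenerate elements and report chance matches.

```python
import re
from typing import List, Tuple

TATA = "TATA[AT]A[AT]"  # assumed TATA-box consensus TATAWAW
CCAAT = "CCAAT"


def find_motif_upstream(promoter: str, tss: int, pattern: str,
                        min_up: int, max_up: int) -> List[Tuple[int, str]]:
    """Return (distance, motif) for matches starting min_up..max_up nt upstream.

    `tss` is the 0-based index of the transcription start site (+1) in
    `promoter`; distance is the number of bases from the motif's first base
    to the start site.
    """
    hits = []
    # Lookahead so that overlapping matches are not skipped.
    for m in re.finditer(f"(?=({pattern}))", promoter.upper()):
        distance = tss - m.start()
        if min_up <= distance <= max_up:
            hits.append((distance, m.group(1)))
    return hits


# Hypothetical promoter: CCAAT at -80 and a TATA box at -30 relative to +1.
upstream = CCAAT + "G" * 45 + "TATAAAT" + "G" * 23
tss = len(upstream)
promoter = upstream + "ACGT"
print(find_motif_upstream(promoter, tss, TATA, 15, 35))    # [(30, 'TATAAAT')]
print(find_motif_upstream(promoter, tss, CCAAT, 40, 200))  # [(80, 'CCAAT')]
```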

The choice of promoters included in expression vectors depends on several factors, including, but not limited to, efficiency, selectivity, inducibility, desired expression levels, and cell or tissue specificity. For example, tissue-, organ- and cell-specific promoters, which confer transcription only or mainly in particular tissues, organs and cell types, respectively, may be used. In embodiments, promoters that are substantially specific for seeds ("seed-preferred promoters") may be useful. In embodiments, constitutive promoters, which can promote transcription in most or all tissues of a particular species, are used. Other classes of promoters include, but are not limited to, inducible promoters, such as promoters that confer transcription in response to external stimuli such as chemical agents, developmental stimuli, or environmental stimuli. Inducible promoters may be induced by pathogens or by stresses such as cold, heat, ultraviolet light, or high ion concentrations, or may be induced by chemical substances. Examples of inducible promoters include the eukaryotic metallothionein promoter, which is induced by elevated heavy metal levels; the lac promoter, which is induced in response to isopropyl-β-D-thiogalactopyranoside (IPTG); and heat-inducible promoters. Expression systems for prokaryotic hosts (e.g., Escherichia coli and Bacillus subtilis) and eukaryotic hosts (e.g., yeast) are well known in the art, and many are commercially available.

Common promoters for prokaryotic protein expression include, for example, the lac promoter or the trc and tac promoters (IPTG-inducible), the tetA promoter/operator (anhydrotetracycline-inducible), the PBAD promoter (L-arabinose-inducible), the rhaPBAD promoter (L-rhamnose-inducible), and phage promoters such as pL (temperature-sensitive), T7, T3, SP6 or T5.

Common promoters for mammalian protein expression are, for example, the Cytomegalovirus (CMV) promoter, the SV40 promoter/enhancer, the vaccinia virus promoter, the viral LTR (MMTV, RSV, HIV, etc.), the E1B promoter, the promoter of a constitutively expressed gene (actin, GAPDH), the promoter of a gene expressed in a tissue-specific manner (albumin, NSE), the promoter of an inducible gene (metallothionein, steroid hormones).

Many promoters for expressing nucleic acids in plants are known and can be used in the practice of the present invention. Such promoters may be constitutive, regulatable, and/or tissue-specific (e.g., seed-specific, stem-specific, leaf-specific, root-specific, fruit-specific, etc.). Exemplary promoters that may be used for plant expression include the cauliflower mosaic virus 35S promoter and the promoters of the following genes: the ACT11 and CAT3 genes from Arabidopsis thaliana, the gene encoding stearoyl-acyl carrier protein desaturase from Brassica napus (GenBank accession number X74782), and the genes encoding GPC1 (GenBank accession number X15596) and GPC2 (GenBank accession number U45855) from maize. Other promoters include the tobacco mosaic virus subgenomic promoter, the cassava vein mosaic virus (CVMV) promoter (which exhibits high transcriptional activity in vascular elements, mesophyll cells and root tips), the drought-inducible promoter of maize, and the cold-, drought- and high salt-inducible promoters of potato. Many other promoters suitable for plant expression are described in U.S. Patent No. 8,067,222, the disclosure of which is incorporated herein by reference.

Heterologous expression in the chloroplasts of microalgae (e.g., Chlamydomonas reinhardtii) can be achieved using, for example, the psbA promoter/5' untranslated region (UTR) in a psbA-deficient genetic background (owing to psbA/D1-dependent autoattenuation), or by fusing the strong 16S rRNA promoter to the 5' UTR of the psbA or atpA gene in an expression cassette, as disclosed, for example, in Rasala et al., "Improved heterologous protein expression in the chloroplast of Chlamydomonas reinhardtii through promoter and 5' untranslated region optimization," Plant Biotechnology Journal 9(6): 674-683 (2011). The promoter used to direct expression of a TAL effector-encoding nucleic acid depends on the particular application. For example, strong constitutive promoters are commonly used for expression and purification of TAL effector fusion proteins. In contrast, when TAL effector nuclease fusion proteins are administered in vivo for gene regulation, a constitutive or inducible promoter may be desirable, depending on the particular use of the fusion protein and other factors. In addition, a promoter suitable for administration of TAL effector nuclease fusion proteins can be a weak promoter, such as the HSV thymidine kinase promoter or a promoter with similar activity. Promoters may also include elements responsive to transactivation, such as hypoxia-responsive elements, Gal4-responsive elements, and lac repressor-responsive elements, and small molecule control systems, such as the tet control system and the RU-486 system (see, e.g., Gossen and Bujard, "Tight control of gene expression in mammalian cells by tetracycline-responsive promoters," 89: 5547 (1992); Oligino et al., "Drug inducible transgene expression in brain using a herpes simplex virus vector," Gene Therapy 5: 491-496 (1998); "Positive and negative regulation of gene expression in eukaryotic cells with an inducible transcriptional regulator," Gene Therapy 4: 432-441 (1997); Neering et al., "Transduction of primitive human hematopoietic cells with recombinant adenovirus vectors," Blood 88: 1147-1155 (1996); and Rendahl et al., "Regulation of gene expression in vivo following transduction by two separate rAAV vectors," Nat. Biotechnol. 16: 151-161 (1998)). The MNDU3 promoter may also be used and is preferentially active in CD34+ hematopoietic stem cells.

A "host" refers to a cell or organism that supports replication of a vector or expression of a protein or polypeptide encoded by a vector sequence. The host cell may be a prokaryotic cell, such as E. coli, or a eukaryotic cell, such as a yeast, fungal, protozoan, higher plant, insect or amphibian cell, or a mammalian cell such as a CHO, HeLa, 293, or COS-1 cell. Host cells include cultured cells (in vitro), explants and primary cultures (in vitro and ex vivo), and cells in vivo.

As used herein, the phrase "recombination protein" includes excisive or integrative proteins, enzymes, cofactors or associated proteins involved in recombination reactions involving one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, etc.), which may be wild-type proteins (see Landy, Current Opinion in Biotechnology 3: 699-707 (1993)). Examples of recombination proteins include Cre, Int, IHF, Xis, Flp, Fis, Hin, Gin, Phi-C31, Cin, Tn3 resolvase, TndX, XerC, XerD, TnpX, Hjc, SpCCE1, and ParA.

As used herein, the phrase "recombination site" refers to a recognition sequence on a nucleic acid molecule that participates in an integration/recombination reaction mediated by a recombination protein. Recombination sites are discrete sections or segments of nucleic acid on the participating nucleic acid molecules that are recognized and bound by site-specific recombination proteins during the initial stages of integration or recombination. For example, the recombination site for Cre recombinase is loxP, a 34 base pair sequence comprising two 13 base pair inverted repeats (serving as recombinase binding sites) flanking an 8 base pair core sequence (see FIG. 1 of Sauer, B., "Site-specific recombination: developments and applications," Current Opinion in Biotechnology 5: 521-527 (1994)). Other examples of recognition sequences include the attB, attP, attL and attR sequences described herein, and mutants, fragments, variants and derivatives thereof, which are recognized by the recombination protein lambda integrase (Int) and by the accessory proteins integration host factor (IHF), FIS and excisionase (lambda Xis).
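
The loxP architecture described above (two 13 base pair inverted repeats flanking an 8 base pair core, 34 base pairs in total) can be checked directly. The Python sketch below uses the commonly cited loxP sequence, which does not appear in this document and should be verified against a primary source before use; the helper names are hypothetical.

```python
from typing import Tuple

# Commonly cited loxP sequence (not given in this document; verify before use).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_lox(site: str, arm: int = 13, core: int = 8) -> Tuple[str, str, str]:
    """Split a lox-type site into (left repeat, core spacer, right repeat)."""
    if len(site) != 2 * arm + core:
        raise ValueError(f"expected {2 * arm + core} bp, got {len(site)}")
    return site[:arm], site[arm:arm + core], site[arm + core:]


left, spacer, right = split_lox(LOXP)
print(len(LOXP))                             # 34
print(right == reverse_complement(left))     # True: the 13 bp arms are inverted repeats
print(spacer == reverse_complement(spacer))  # False: the 8 bp core is asymmetric
```

Because the core is not its own reverse complement, a loxP site has a defined orientation, which is why the relative orientation of two sites matters in Cre-mediated recombination.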

As used herein, the phrase "recognition sequence" refers to a specific sequence that is recognized and bound by a protein, compound, DNA, or RNA molecule (e.g., a restriction endonuclease, a modification methylase, or a recombinase). In the present invention, the recognition sequence is generally a recombination site. For example, the recognition sequence for Cre recombinase is loxP, a 34 base pair sequence comprising two 13 base pair inverted repeats (serving as recombinase binding sites) flanking an 8 base pair core sequence (see FIG. 1 of Sauer, B., Current Opinion in Biotechnology 5: 521-527 (1994)). Other examples of recognition sequences are the attB, attP, attL and attR sequences recognized by the recombinase lambda integrase. attB is an approximately 25 base pair sequence containing two 9 base pair core-type Int binding sites and a 7 base pair overlap region. attP is an approximately 240 base pair sequence containing core-type and arm-type Int binding sites, as well as sites for the accessory proteins integration host factor (IHF), FIS, and excisionase (lambda Xis) (see Landy, Current Opinion in Biotechnology 3: 699-707 (1993)).

Throughout this document, unless the context requires otherwise, the words "comprise", "comprises" and "comprising", or "contain", "contains" and "containing", will be understood to imply the inclusion of a stated step or element or group of steps or elements, but not the exclusion of any other step or element or group of steps or elements.

As used herein, the term "homologous recombination" refers to a mechanism of genetic recombination in which two DNA strands comprising similar nucleotide sequences exchange genetic material. Cells use homologous recombination during meiosis to rearrange DNA and produce unique haploid sets of chromosomes; homologous recombination is also used to repair damaged DNA, in particular double-strand breaks. The mechanisms of homologous recombination are well known to the skilled worker and have been described, for example, by Paques and Haber (Paques F., Haber J.E., Microbiol. Mol. Biol. Rev. 63: 349-404 (1999)). In the methods of the invention, homologous recombination is achieved by placing first and second flanking elements upstream (5') and downstream (3'), respectively, of the donor DNA sequence, each of which is homologous to a contiguous DNA sequence within the target sequence.

As used herein, the term "non-homologous end joining" (NHEJ) refers to a cellular process that joins the two ends of a double-strand break (DSB) by a process largely independent of homology. DSBs arise naturally and spontaneously during DNA synthesis when replication forks encounter damaged templates, and during certain specialized cellular processes, including V(D)J recombination, class switch recombination at the immunoglobulin heavy chain (IgH) locus, and meiosis. In addition, DSBs are produced by exposure of cells to ionizing radiation (X-rays and gamma rays), ultraviolet light, topoisomerase poisons, or radiomimetic drugs. Depending on the specific sequence and chemical modifications produced at the DSB, NHEJ may be precise or mutagenic (Lieber M.R., "The mechanism of double-strand DNA break repair by the nonhomologous DNA end-joining pathway," Annu. Rev. Biochem. 79: 181-211).

As used herein, the term "donor DNA" or "donor nucleic acid" refers to a nucleic acid designed for introduction into a locus by homologous recombination. The donor nucleic acid has at least one region of homology to the locus sequence. In embodiments, the donor nucleic acid has two regions of homology to the locus sequence. These homologous regions may be located at either end or may be located within the donor nucleic acid. In embodiments, a nucleic acid "insertion" region that is desired to be introduced into a nucleic acid molecule present in a cell will be located between two homologous regions.
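
The donor layout described above, an insertion region between two regions homologous to the locus, can be sketched as simple string assembly. The Python sketch below is illustrative: `build_donor` is a hypothetical helper, the homology arms are assumed to be taken immediately 5' and 3' of a chosen insertion position, and practical donor design involves further considerations not shown here.

```python
def build_donor(locus: str, cut: int, insert: str, arm_len: int) -> str:
    """Return left homology arm + insert + right homology arm (hypothetical helper).

    `cut` is the 0-based position between locus[cut - 1] and locus[cut] where
    the insert should go; each arm is the `arm_len` bases on that side of it.
    """
    if not 0 < arm_len <= cut <= len(locus) - arm_len:
        raise ValueError("locus too short for the requested homology arms")
    left_arm = locus[cut - arm_len:cut]
    right_arm = locus[cut:cut + arm_len]
    return left_arm + insert + right_arm


locus = "AAAACCCCGGGGTTTT"  # hypothetical target locus
donor = build_donor(locus, cut=8, insert="GAATTC", arm_len=4)
print(donor)  # CCCCGAATTCGGGG
# The intended product of homology-directed insertion contains the donor intact.
edited = locus[:8] + "GAATTC" + locus[8:]
print(donor in edited)  # True
```

Exposing the arm length as a parameter makes it easy to compare short and long homology arms for the same insertion.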

The donor nucleic acid molecule (e.g., donor DNA molecule) can be double-stranded, single-stranded, or partially double-stranded and partially single-stranded, and thus can have overhangs (e.g., two 5' overhangs, two 3' overhangs, one 5' and one 3' overhang, a single 3' overhang, or a single 5' overhang) at one or both ends. Furthermore, the nucleic acid molecule may be linear or circular (closed circular or nicked).

As used herein, the term "homologous recombination system" or "HR system" refers to a component of the systems set forth herein that can be used to alter a cell by homologous recombination. Examples include zinc finger nucleases, TAL effector nucleases, CRISPR endonucleases, homing endonucleases, and Argonaute editing systems.

As used herein, the term "nucleic acid cleavage entity" refers to a single molecule or a complex of molecules having nucleic acid cleavage activity (e.g., double-stranded nucleic acid cleavage activity). Exemplary nucleic acid cleavage entities include Argonaute complexes, zinc finger proteins, transcription activator-like effectors (TALEs), CRISPR complexes, and homing meganucleases. In embodiments, the nucleic acid cleavage entities will have activity that allows them to be localized to the nucleus (e.g., will comprise a nuclear localization signal (NLS)).

As used herein, the term "double strand break site" refers to a location in a nucleic acid molecule at which a double strand break occurs. In embodiments, this will be produced by cleaving a nucleic acid molecule at two proximate locations (e.g., within a range of about 3 to about 50 base pairs, about 5 to about 50 base pairs, about 10 to about 50 base pairs, about 15 to about 50 base pairs, about 20 to about 50 base pairs, about 3 to about 40 base pairs, about 5 to about 40 base pairs, about 10 to about 40 base pairs, about 15 to about 40 base pairs, about 20 to about 40 base pairs, etc.). In general, the nick spacing can be further increased in nucleic acid regions comprising higher AT content as compared to nucleic acid regions comprising higher GC content.
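The paired-nick definition above can be expressed as a simple rule. The following is a minimal sketch, assuming nick positions are 0-based coordinates on the top (reference) strand and that a DSB is expected when two nicks on opposite strands lie within a chosen spacing window (3 to 50 bp by default, one of the ranges listed above). The `Nick` class and `forms_dsb` function are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nick:
    """A single-strand nick: position on the reference and which strand is cut."""
    position: int  # 0-based coordinate on the top (reference) strand
    strand: str    # "+" = top strand, "-" = bottom strand


def forms_dsb(a: Nick, b: Nick, min_spacing: int = 3, max_spacing: int = 50) -> bool:
    """Return True if two nicks are expected to yield a double-strand break.

    Illustrative rule: the nicks must lie on opposite strands, and their
    spacing must fall within [min_spacing, max_spacing] base pairs.
    """
    if a.strand not in {"+", "-"} or b.strand not in {"+", "-"}:
        raise ValueError("strand must be '+' or '-'")
    if a.strand == b.strand:
        return False  # both nicks on one strand leave the other strand intact
    spacing = abs(a.position - b.position)
    return min_spacing <= spacing <= max_spacing


if __name__ == "__main__":
    print(forms_dsb(Nick(100, "+"), Nick(120, "-")))  # True  (20 bp apart)
    print(forms_dsb(Nick(100, "+"), Nick(180, "-")))  # False (80 bp apart)
```

Because the text notes that nick spacing can be larger in AT-rich regions, a caller could pass a wider `max_spacing` there; the appropriate value is not specified in the source and would have to be determined empirically.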

As used herein, the term "matched end" refers to an end of a nucleic acid molecule having greater than 90% sequence identity. The matching end of the DS break at the target locus can be double-stranded or single-stranded. The matching end of the donor nucleic acid molecule is typically single stranded.

As used herein, "homology directed repair" or "HDR" refers to a mechanism for repairing DNA double-strand breaks (DSBs) in cells. In some embodiments, the HDR efficiency is greater than or equal to 10%, 25%, 50%, 75%, 90%, 95%, 98%, 99%, or 100%.

A common form of HDR is "homologous recombination," which refers to a genetic recombination mechanism in which two DNA strands comprising similar nucleotide sequences exchange genetic material. Cells use homologous recombination during meiosis, not only to rearrange DNA to produce a unique set of haploid chromosomes, but also to repair damaged DNA, in particular double-strand breaks. The mechanisms of homologous recombination are well known to the skilled person and have been described, for example, in Paques F., Haber J.E., Microbiol. Mol. Biol. Rev. 63: 349-404 (1999). In some embodiments, homologous recombination is achieved through matched ends located upstream (5') and downstream (3'), respectively, of the donor nucleic acid molecule, each of which is homologous to a contiguous DNA sequence within the cleaved nucleic acid molecule.

Some embodiments include compositions and methods designed to increase the efficiency of homologous recombination in cells (e.g., eukaryotic cells, such as plant cells and animal cells, such as insect cells and mammalian cells, including mouse, rat, hamster, rabbit, and human cells). In some embodiments, the efficiency of homologous recombination is such that greater than 20% of the cells in the population will undergo homologous recombination at the desired target locus or loci. In some embodiments, homologous recombination may occur within a range of 10% to 65%, 15% to 65%, 20% to 65%, 30% to 65%, 35% to 65%, 10% to 55%, 20% to 55%, 30% to 55%, 35% to 55%, 40% to 55%, 10% to 45%, 20% to 45%, 30% to 45%, 40% to 45%, 30% to 50% of the cells in a population, and the like.

In addition, some embodiments include compositions and methods for increasing the efficiency of homologous recombination within a cell. For example, if homologous recombination occurs in 10% of the cell population under one set of conditions and in 40% of the cell population under another set of conditions, the efficiency of homologous recombination is increased by 300%. In some embodiments, homologous recombination efficiency can be increased by 100% to 500% (e.g., 100% to 450%, 100% to 400%, 100% to 350%, 100% to 300%, 200% to 500%, 200% to 400%, 250% to 500%, 250% to 400%, 250% to 350%, 300% to 500%, etc.).
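The worked example above (10% of cells versus 40% of cells corresponding to a 300% increase) follows from the relative-change formula increase = (E_new - E_base) / E_base x 100%. A minimal sketch of that arithmetic; the function name is hypothetical:

```python
def percent_increase(baseline_pct: float, improved_pct: float) -> float:
    """Relative increase in homologous recombination efficiency, in percent.

    Both arguments are the percentage of cells in the population that
    underwent homologous recombination under each set of conditions.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline efficiency must be positive")
    return (improved_pct - baseline_pct) / baseline_pct * 100.0


if __name__ == "__main__":
    # Example from the text: 10% of cells vs. 40% of cells.
    print(percent_increase(10.0, 40.0))  # 300.0
```

Note that a 300% increase corresponds to a 4-fold efficiency (40 / 10), so the "100% to 500%" ranges above correspond to 2-fold to 6-fold improvements.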

As used herein, "double-strand break" or "DSB" refers to a break in both strands of a nucleic acid molecule. In many embodiments, a DSB will be generated by cleaving a nucleic acid molecule at two proximate locations (e.g., within a range of 3 to 50 base pairs, 5 to 50 base pairs, 10 to 50 base pairs, 15 to 50 base pairs, 20 to 50 base pairs, 3 to 40 base pairs, 5 to 40 base pairs, 10 to 40 base pairs, 15 to 40 base pairs, 20 to 40 base pairs, etc.). The spacing of the nicks can be further increased in nucleic acid regions with higher AT content compared with regions with higher GC content. In some embodiments, the double-strand break is less than or equal to 250 bp from the ATG start codon for N-terminal tagging of the nucleic acid molecule, or less than or equal to 250 bp from the stop codon for C-terminal tagging of the nucleic acid molecule.
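The 250 bp placement criterion for tagging can be checked programmatically. This is a minimal sketch, assuming a forward-strand sequence in which the first ATG is the start codon, and measuring distance as the absolute difference between the 0-based cut position and the first base of the start or stop codon. Both the ORF-finding simplification and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_start_and_stop(seq: str) -> tuple[int, int]:
    """Return 0-based positions of the first ATG and of the first in-frame stop
    codon downstream of it (simplified: forward strand, first ATG only)."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        raise ValueError("no ATG start codon found")
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i
    raise ValueError("no in-frame stop codon found")


def cut_site_ok_for_tagging(seq: str, cut_pos: int, tag_end: str,
                            max_distance: int = 250) -> bool:
    """Check the <= max_distance rule for N-terminal ("N") or C-terminal ("C")
    tagging, measuring from the first base of the start or stop codon."""
    if tag_end not in ("N", "C"):
        raise ValueError("tag_end must be 'N' or 'C'")
    start, stop = find_start_and_stop(seq)
    anchor = start if tag_end == "N" else stop
    return abs(cut_pos - anchor) <= max_distance


if __name__ == "__main__":
    orf = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"  # toy locus: ATG at 2, TAA at 35
    print(find_start_and_stop(orf))                  # (2, 35)
    print(cut_site_ok_for_tagging(orf, 10, "N"))     # True
```

Real loci would require annotated coordinates rather than a first-ATG scan; the toy sequence only illustrates the distance check.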

As used herein, "donor nucleic acid molecule" or "donor DNA" refers to a nucleic acid designed for introduction into a cleaved nucleic acid molecule by homologous recombination. The donor nucleic acid molecule will have at least one region of homology to the sequence of the cleaved nucleic acid molecule. In many embodiments, the donor nucleic acid molecule will have two regions of homology to the locus sequence. These homologous regions may be located at one or both ends, or may be located internally within the donor nucleic acid molecule.

As used herein, "integration efficiency" refers to the frequency with which a foreign DNA segment of interest is incorporated into an initial nucleic acid molecule. In some embodiments, the integration efficiency of the donor nucleic acid molecule is greater than or equal to 50%, 75%, 90%, 95%, 98%, 99%, or 100%.

Table 1 shows that near 100% integration efficiency and accurate HDR up to 100% were found at four different genomic loci in three different mammalian cell lines. At certain loci, deletions and insertions at the junction or Cas9 cleavage site were observed.

a: determined by flow cytometry; b: determined by ELISA; c: could not be determined by flow cytometry because of the low expression level of the chimeric protein.

In some embodiments, modifying the ends of the donor DNA with phosphorothioate or amino groups and/or treating with non-homologous end joining (NHEJ) inhibitors may further improve HDR efficiency.

As used herein, "matched end" refers to an end of a nucleic acid molecule having greater than or equal to 90% sequence identity to the corresponding sequence at the target locus. In some embodiments, the matched ends at the 5' and 3' ends have a length of 12 bp to 250 bp, 12 bp to 200 bp, 12 bp to 150 bp, 12 bp to 100 bp, 12 bp to 50 bp, or 12 bp to 40 bp. In some embodiments, the matched end has a length of 35 bp. In some embodiments, the matched ends will share greater than or equal to 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 99% sequence identity, or 100% sequence identity. The matched end of the double-strand break at the target locus can be double-stranded or single-stranded DNA. In some embodiments, the matched end of the donor nucleic acid molecule will be single-stranded.

The greater the amount of sequence identity that the matched end shares with the nucleic acid at the locus of interest, the greater the efficiency of homologous recombination. A high level of sequence identity is particularly desirable when the region of homology is relatively short (e.g., 50 bases). In some embodiments, the amount of sequence identity between the locus of interest and the matched end is greater than 90% (e.g., 90% to 100%, 90% to 99%, 90% to 98%, 95% to 100%, 95% to 99%, 95% to 98%, 97% to 100%, etc.).

As used herein, "percent sequence identity" means a value determined by comparing two optimally aligned nucleotide sequences over a comparison window, wherein the portion of the nucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps in the sequence alignment) compared with the reference sequence (which does not comprise additions or deletions) in order to achieve optimal alignment of the two sequences. In other words, sequence alignment gaps are removed for quantification purposes. Percent sequence identity is calculated as follows: the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences is determined to yield the number of matched positions; the number of matched positions is divided by the total number of positions in the window of comparison; and the result is multiplied by 100 to yield the percentage of sequence identity.
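The calculation described above can be sketched in a few lines. This minimal example assumes the two sequences have already been optimally aligned (for example by BLAST, which is not reproduced here), uses "-" as the gap character, and treats the comparison window as the full alignment length with gap columns never counting as matches. That window convention is an assumption; other conventions exclude gap columns from the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent sequence identity of two pre-aligned sequences.

    Counts positions where both sequences carry the same base (gap columns
    never count as matches), divides by the number of positions in the
    comparison window (here: the whole alignment), and multiplies by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # One mismatch in a 10-bp matched end: 9 / 10 positions identical.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```

A matched end would meet the ">= 90%" criterion above if this value, computed against the target locus, is at least 90.0.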

One method of determining sequence identity values is to use the BLAST 2.0 suite of programs with default parameters (Altschul et al., Nucleic Acids Res. 25: 3389-3402 (1997)). Software for performing BLAST analysis is publicly available, for example, through the National Center for Biotechnology Information.

In some embodiments, the termini can differ in one or more characteristics associated with homologous recombination. For example, the length of the terminal "matching" region complementary to the target locus sequence can vary. Thus, one end may have forty nucleotides with sequence complementarity, while the other end may have only fifteen nucleotides with sequence complementarity. In some embodiments, one or both ends of the donor nucleic acid molecule will be partially or fully single stranded.
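Asymmetric matched ends, as described above, can be illustrated with a simple string-level donor design. This minimal sketch assumes the target locus is given as a top-strand string and that the cut falls between `locus[cut_pos - 1]` and `locus[cut_pos]`; it copies a 40-nt 5' matched end and a 15-nt 3' matched end (the example lengths from the text) around a hypothetical insert. Single-stranded ends and chemical modifications are not modeled, and `design_donor` is a hypothetical name.

```python
def design_donor(locus: str, cut_pos: int, insert: str,
                 left_len: int = 40, right_len: int = 15) -> str:
    """Assemble a linear donor: 5' matched end + insert + 3' matched end.

    The 5' matched end copies the left_len bases immediately upstream of the
    cut; the 3' matched end copies the right_len bases immediately downstream.
    Only the top strand is returned.
    """
    if not (left_len <= cut_pos <= len(locus) - right_len):
        raise ValueError("matched ends extend beyond the supplied locus")
    left_arm = locus[cut_pos - left_len:cut_pos]
    right_arm = locus[cut_pos:cut_pos + right_len]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    locus = "A" * 50 + "C" * 50        # toy locus, cut between positions 49 and 50
    donor = design_donor(locus, 50, "GGGG")
    print(len(donor))                  # 59 = 40 + 4 + 15
```

After precise HDR, the repaired locus (`locus[:cut_pos] + insert + locus[cut_pos:]`) contains the full donor sequence, which is a quick sanity check on arm placement.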

As used herein, "promoter-free selectable marker" refers to a foreign gene of interest that lacks a promoter, so that it is expressed only after insertion into a genomic locus that contains a promoter. In some embodiments, the promoterless selection marker is a protein, an antibiotic resistance selection marker, a cell surface protein, a metabolite, or an active fragment thereof. In some embodiments, the promoterless selection marker is a fluorescent marker (e.g., EmGFP or OFP). In one embodiment, the promoterless selection marker is a puromycin resistance marker, dihydrofolate reductase, or glutamine synthetase.

The promoterless selection marker can be linked directly to the reporter gene, or alternatively, the donor nucleic acid molecule can encode an additional amino acid sequence that serves as a "linker" between the promoterless selection marker and the reporter gene. The linker may be a polypeptide or any other suitable linker known in the art. In some embodiments, the linker comprises greater than or equal to 2, 3, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, or 90 amino acids. In some embodiments, the linker comprises 100 amino acids. In some embodiments, the linker comprises two or more amino acids selected from the group consisting of glycine, serine, alanine, and threonine. In some embodiments, the linker is a polyglycine linker. In some embodiments, the polyglycine linker comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 glycine residues. In one embodiment, the linker is a 6-residue polyglycine. In some embodiments, the distance between the promoterless selection marker and the reporter gene is less than or equal to 300 nt, 240 nt, 180 nt, 150 nt, 120 nt, 90 nt, 60 nt, 30 nt, 15 nt, 12 nt, or 9 nt.
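Because each amino acid is encoded by three nucleotides, a polyglycine linker of n residues occupies 3n nt, which is how the residue counts relate to the nucleotide distances listed above. A minimal sketch, assuming the linker is the only sequence between the marker and reporter open reading frames and using GGC for every glycine (any GGN codon would do); both function names are hypothetical.

```python
GLYCINE_CODONS = {"GGT", "GGC", "GGA", "GGG"}  # every GGN codon encodes glycine


def polyglycine_linker(n_residues: int, codon: str = "GGC") -> str:
    """Return in-frame DNA encoding a polyglycine linker of n_residues."""
    codon = codon.upper()
    if codon not in GLYCINE_CODONS:
        raise ValueError(f"{codon} does not encode glycine")
    if n_residues < 1:
        raise ValueError("linker must contain at least one residue")
    return codon * n_residues


def spacing_ok(intervening_dna: str, max_nt: int = 300) -> bool:
    """True if the DNA between marker and reporter is <= max_nt and a multiple
    of three, so the downstream reporter stays in frame."""
    return len(intervening_dna) <= max_nt and len(intervening_dna) % 3 == 0


if __name__ == "__main__":
    linker = polyglycine_linker(6)   # the 6-residue polyglycine embodiment
    print(linker, len(linker))       # GGCGGCGGCGGCGGCGGC 18
    print(spacing_ok(linker, 30))    # True
    print(spacing_ok(linker, 15))    # False
```

A 6-residue polyglycine linker (18 nt) therefore satisfies every distance limit listed above except 15, 12, and 9 nt. If a self-cleaving peptide is also placed between marker and reporter, its coding sequence would count toward the distance as well.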

As used herein, a "reporter gene" refers to a gene whose product is readily analyzed and can be used as a marker to screen successfully modified cells, to study regulation of gene expression, or as a control to normalize recombination efficiency.

As used herein, "self-cleaving peptide" refers to a peptide that dissociates into component proteins upon translation. In some embodiments, the self-cleaving peptide links the promoterless selection marker and the reporter gene and, upon recombination into the initial nucleic acid molecule, enables dissociation of the promoterless selection marker from the reporter gene during translation. In some embodiments, the self-cleaving peptide is a self-cleaving 2A peptide or other self-cleaving peptide known to the skilled artisan.

In some embodiments, loxP ("locus of X-over P1") sites are located on either side of the promoterless selection marker, either in place of or in addition to a self-cleaving peptide. loxP sites can be used as part of a Cre-lox recombination strategy that allows the promoterless selection marker to be removed and reused. The Cre-lox strategy requires at least two components: 1) Cre recombinase, an enzyme that catalyzes recombination between two loxP sites; and 2) loxP sites (e.g., a specific 34 base pair sequence consisting of an 8 bp core sequence, where recombination occurs, flanked by two 13 bp inverted repeats) or mutant lox sites (see, e.g., Araki et al., PNAS 92: 160-4 (1995); Nagy, A. et al., Genesis 26: 99-109 (2000); Araki et al., Nucleic Acids Res. 30(19): e103 (2002); and US20100291626A1, all of which are incorporated herein by reference). Exemplary lox sites include, but are not limited to, wild-type loxP, lox511, lox5171, lox2272, M2, M3, M7, M11, lox71, and lox66. loxP sites allow the promoterless selection marker to be removed later. Thus, the edited population can be selected and the promoter-free selectable marker then removed, which allows the marker to be reused if further editing is required.
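Marker removal by Cre can be illustrated with a string-level model. The sketch below uses the wild-type 34-bp loxP sequence (13-bp inverted repeat, 8-bp asymmetric spacer, 13-bp inverted repeat) and assumes two loxP sites in the same orientation flanking the marker, in which case Cre excises the intervening DNA and leaves a single loxP copy. Inversion between oppositely oriented sites and the mutant lox variants listed above are not modeled; `cre_excise` is a hypothetical name.

```python
# Wild-type loxP: 13-bp repeat + 8-bp spacer + 13-bp inverted repeat (34 bp).
LOXP = "ATAACTTCGTATA" "GCATACAT" "TATACGAAGTTAT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cre_excise(seq: str) -> str:
    """Simulate Cre-mediated excision between the first two directly repeated
    loxP sites: the intervening DNA and one loxP copy are removed."""
    first = seq.find(LOXP)
    if first == -1:
        return seq
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq  # fewer than two same-orientation sites: no excision modeled
    return seq[:first] + LOXP + seq[second + len(LOXP):]


if __name__ == "__main__":
    marker = "GATTACAGATTACA"  # hypothetical stand-in for the selection marker
    construct = "AAAA" + LOXP + marker + LOXP + "CCCC"
    print(cre_excise(construct) == "AAAA" + LOXP + "CCCC")  # True
```

Because the 8-bp spacer is asymmetric, a loxP site on the opposite strand is not matched by the `find` calls, so a construct with inverted sites is returned unchanged here rather than inverted.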

As used herein, "non-homologous end joining" (NHEJ) refers to a cellular process that joins the two ends of a double-strand break (DSB) by a process largely independent of homology. Naturally occurring DSBs arise spontaneously during DNA synthesis when replication forks encounter damaged templates, and during certain specialized cellular processes, including V(D)J recombination, class switch recombination at the immunoglobulin heavy chain (IgH) locus, and meiosis. DSBs are also produced when cells are exposed to ionizing radiation (X-rays and gamma rays), ultraviolet light, topoisomerase poisons, or radiomimetic drugs. Depending on the specific sequence and chemical modifications produced at the DSB, NHEJ can be either precise or mutagenic (Lieber, M.R., Annu. Rev. Biochem. 79: 181-211 (2010)).

As used herein, "non-homologous end joining inhibitor" or "NHEJ inhibitor" refers to a molecule that inhibits the non-homologous end joining process. In some embodiments, the donor nucleic acid molecule is treated with at least one NHEJ inhibitor. Examples of NHEJ inhibitors include, but are not limited to, inhibitors of DNA-dependent protein kinase (DNA-PK), DNA ligase IV, poly(ADP-ribose) polymerase 1 or 2 (PARP-1 or PARP-2), or combinations thereof. Exemplary DNA-PK inhibitors include NU7026 (2-(4-morpholinyl)-4H-naphtho[1,2-b]pyran-4-one), NU7441 (8-(4-dibenzothienyl)-2-(4-morpholinyl)-4H-1-benzopyran-4-one), KU-0060648 (4-ethyl-N-[4-[2-(4-morpholinyl)-4-oxo-4H-1-benzopyran-8-yl]-1-dibenzothienyl]-1-piperazineacetamide), Compound 401 (2-(4-morpholinyl)-4H-pyrimido[2,1-a]isoquinolin-4-one), DMNB (4,5-dimethoxy-2-nitrobenzaldehyde), ETP45658 (3-[1-methyl-4-(4-morpholinyl)-1H-pyrazolo[3,4-d]pyrimidin-6-yl]phenol), LTURM34 (8-(4-dibenzothienyl)-2-(4-morpholinyl)-4H-1,3-benzoxazin-4-one), and PI-103 hydrochloride (3-[4-(4-morpholinyl)pyrido[3',2':4,5]furo[3,2-d]pyrimidin-2-yl]phenol hydrochloride).

As used herein, "target locus" refers to a site within a nucleic acid molecule that is recognized and cleaved by a nucleic acid cleavage entity. When, for example, a single CRISPR complex is designed to cleave a double-stranded nucleic acid, the target locus is the cleavage site and the surrounding region recognized by the CRISPR complex. When, for example, two CRISPR complexes are designed to cleave a double-stranded nucleic acid in close proximity to create a double-strand break, the region recognized by both CRISPR complexes, including the breakpoint, is referred to as the target locus.

As used herein, "nuclease resistant group" refers to a chemical group that can be incorporated into a nucleic acid molecule and that is capable of inhibiting degradation of the nucleic acid molecule containing the group by enzymes (exonucleases and/or endonucleases). Examples of such groups are phosphorothioate groups, amino groups, 2'-O-methyl nucleotides, 2'-deoxy-2'-fluoro nucleotides, 2'-deoxy nucleotides, and 5-C-methyl nucleotides. Nuclease-resistant groups can be located at a number of positions in the donor nucleic acid molecule. In some embodiments, cellular nucleases will digest a portion of the donor nucleic acid molecule; these nucleases will be stopped or slowed by the nuclease-resistant group, thereby stabilizing the structure of the donor nucleic acid molecule.
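Terminal phosphorothioate placement can be written out explicitly when specifying a donor oligonucleotide. This minimal sketch marks the first and last n internucleotide linkages with "*", a notation used by some oligonucleotide vendors for phosphorothioate bonds (treated here as an assumption). The default of two linkages per end is illustrative only and is not taken from the source; amino and other modifications are not modeled.

```python
def add_terminal_phosphorothioates(seq: str, n_linkages: int = 2) -> str:
    """Mark the first and last n_linkages internucleotide linkages of a donor
    oligo with '*' (phosphorothioate); all other linkages are left unmodified."""
    n_total = len(seq) - 1  # number of internucleotide linkages
    if n_linkages < 0 or 2 * n_linkages > n_total:
        raise ValueError("too many modified linkages for this sequence length")
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        # linkage i joins seq[i] and seq[i + 1]; there are n_total of them
        if i < n_total and (i < n_linkages or i >= n_total - n_linkages):
            out.append("*")
    return "".join(out)


if __name__ == "__main__":
    print(add_terminal_phosphorothioates("ACGTACGT"))  # A*C*GTAC*G*T
```

Protecting both termini reflects the idea above that exonucleases trimming the donor from either end are stopped or slowed at the modified linkages.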

Embodiments of the invention include compositions comprising donor nucleic acid molecules comprising one or more (e.g., one, two, three, four, five, six, seven, etc.) nuclease-resistant groups, and methods of making and using such donor nucleic acid molecules. In many embodiments, the nuclease-resistant groups will be located at one or both ends of the donor nucleic acid molecule. The donor nucleic acid molecule may also comprise such groups at internal positions relative to one or both termini. In many embodiments, some or all of such donor nucleic acid molecules will be processed within the cell to generate ends that match the DS break site.

As used herein, the term "intracellular targeting moiety" refers to a chemical entity (e.g., a polypeptide) that facilitates localization to an intracellular location. Examples of intracellular targeting moieties include nuclear localization signals, chloroplast targeting signals, and mitochondrial targeting signals.

As used herein, "subject" refers to a human or non-human animal (e.g., a mammal) or a plant.

As used herein, "treating" refers to alleviating at least one symptom of a disease, disorder, or condition in a subject by administering to the subject an effective amount of a promoter-free selectable marker.

As used herein, a "nucleic acid cleavage entity" refers to one or more molecules, enzymes, or molecular complexes having nucleic acid cleavage activity (e.g., double-stranded nucleic acid cleavage activity). In most embodiments, the components of the nucleic acid cleavage entity will be proteins, nucleic acids, or a combination of both, but they may be combined with cofactors and/or other molecules. The nucleic acid cleavage entity will typically be selected based on a number of factors, such as the efficiency of generating a DS break at the target locus, the ability to generate a DS break at a suitable location at or near the target locus, a low potential for generating DS breaks at off-target loci, low toxicity, and cost. Many of these factors will vary depending on the cell used and the locus of interest. Many nucleic acid cleavage entities are known in the art. For example, in some embodiments, the nucleic acid cleavage entity comprises one or more zinc finger proteins, transcription activator-like effectors (TALEs), CRISPR complexes (e.g., Cas9 or Cpf1), homing endonucleases or meganucleases, or Argonaute-nucleic acid complexes. In some embodiments, the nucleic acid cleavage entities will have activity that allows them to be localized to the nucleus (e.g., will comprise a nuclear localization signal (NLS)). In some embodiments, a single-stranded DNA donor may function with a single nick or a combination of nicks.

Zinc Finger Protein (ZFP)

As used herein, a "zinc finger protein" (ZFP) refers to a chimeric protein comprising a nuclease domain and a zinc-stabilized nucleic acid (e.g., DNA) binding domain. The individual DNA binding domains are often referred to as "fingers," such that a zinc finger protein or polypeptide has at least one finger, more usually two or three fingers, or even four, five, six, or more fingers. In some embodiments, the ZFP will comprise three or four zinc fingers. Each finger typically binds two to four base pairs of DNA. Each finger is a zinc-chelating DNA binding region that can comprise about 30 amino acids (see, e.g., U.S. Patent Publication No. 2012/0329067 A1, the disclosure of which is incorporated herein by reference).

An example of a nuclease domain is the non-specific cleavage domain from the type IIS restriction endonuclease FokI (Kim, Y.G. et al., Proc. Natl. Acad. Sci. USA 93: 1156-60 (1996)); the binding sites of a ZFN pair are usually separated by a 5-7 base pair spacer sequence. A pair of FokI cleavage domains is usually required, allowing the domains to dimerize and cleave the non-palindromic target sequence from opposite strands. The DNA-binding domain of an individual Cys₂His₂ ZFN typically comprises 3 to 6 zinc finger repeats and can recognize 9 to 18 base pairs.
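The 9 to 18 bp recognition range follows from each finger contacting roughly 3 bp (3 fingers x 3 bp = 9 bp; 6 fingers x 3 bp = 18 bp). A minimal sketch of the resulting footprint of a ZFN pair, assuming 3 bp per finger (the text gives 2 to 4 bp as the general range) and a spacer of 5 to 7 bp between the two half-sites; the function names are hypothetical.

```python
def zf_site_length(n_fingers: int, bp_per_finger: int = 3) -> int:
    """Approximate length of the DNA half-site bound by one zinc finger array."""
    if n_fingers < 1:
        raise ValueError("need at least one finger")
    return n_fingers * bp_per_finger


def zfn_pair_target_length(n_fingers_left: int, n_fingers_right: int,
                           spacer_bp: int) -> int:
    """Total footprint of a ZFN pair: two half-sites plus the spacer in which
    the dimerized FokI cleavage domains cut."""
    if not 5 <= spacer_bp <= 7:
        raise ValueError("spacer outside the typical 5-7 bp range")
    return zf_site_length(n_fingers_left) + spacer_bp + zf_site_length(n_fingers_right)


if __name__ == "__main__":
    print(zf_site_length(3), zf_site_length(6))  # 9 18
    print(zfn_pair_target_length(3, 3, 6))       # 24
```

Longer arrays lengthen each half-site, which is one reason ZFPs with more fingers are expected to be more specific.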

One problem associated with ZFPs is the possibility of off-target cleavage, which may cause random integration of donor DNA, chromosomal rearrangements, or even cell death, and which continues to raise concerns about their suitability for use in higher organisms (Radecke, S. et al., Mol. Ther. 18: 743-.

Transcription activator-like effectors (TALE)

As used herein, a "transcription activator-like effector" (TALE) refers to a protein that consists of more than one TAL repeat sequence and is capable of binding nucleic acids in a sequence-specific manner. TALEs are a class of DNA binding proteins secreted by plant pathogenic bacterial species (e.g., Xanthomonas and Ralstonia) via their type III secretion system upon infection of plant cells. Natural TALEs have been shown to bind to plant promoter sequences, thereby regulating gene expression and activating effector-specific host genes that promote bacterial propagation (Römer, P. et al., Science 318: 645-648 (2007); Boch, J. et al., Annu. Rev. Phytopathol. 48: 419-; Kay, S. et al., Science 318: 648-651 (2007); Kay, S. et al., Curr. Opin. Microbiol. 12: 37-43 (2009)).

Natural TALEs are typically characterized by a central repeat domain, a carboxy-terminal nuclear localization signal (NLS), and a transcriptional activation domain (AD). The central repeat domain typically consists of 1.5 to 33.5 amino acid repeats, each usually 33-35 residues in length, with the carboxy-terminal repeat (referred to as the half-repeat) usually being shorter. The repeats are mostly identical but differ at certain hypervariable residues. The DNA recognition specificity of TALEs is mediated by these hypervariable residues, the so-called repeat variable diresidues (RVDs), typically located at positions 12 and 13 of each repeat, where each RVD targets a specific nucleotide in the DNA sequence. Thus, the order of the repeats in a TAL protein tends to correspond to a defined linear sequence of nucleotides in a given DNA sequence. The underlying RVD code of some naturally occurring TALEs has been identified, allowing prediction of the order of repeats needed to bind a given DNA sequence (Boch, J. et al., Science 326: 1509-1512 (2009); Moscou, M.J. et al., Science 326: 1501 (2009)). In addition, TAL effectors generated with new combinations of repeats have been shown to bind to target sequences predicted from this code. The target DNA sequence typically starts with a 5' thymine base that is recognized by the TAL protein.
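The RVD code described above makes TALE design a lookup from nucleotides to repeats. This minimal sketch uses the commonly used RVD assignments (NI for A, HD for C, NN for G, NG for T), treats them as a simplified one-to-one code (NN, for example, can also bind A), and skips the 5' T, which is recognized outside the repeat array. The function name is hypothetical.

```python
# Commonly used RVD assignments; a simplified one-to-one code.
RVD_CODE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_tale_rvds(target: str) -> list[str]:
    """Return the RVD for each repeat needed to bind target (5'->3').

    The leading 5' T is recognized outside the repeat array and is skipped;
    the final RVD in the list would be carried by the C-terminal half-repeat.
    """
    target = target.upper()
    if not target.startswith("T"):
        raise ValueError("target should begin with a 5' T")
    return [RVD_CODE[base] for base in target[1:]]


if __name__ == "__main__":
    print(design_tale_rvds("TACGT"))  # ['NI', 'HD', 'NN', 'NG']
```

Under this counting, n RVDs correspond to n - 0.5 repeats in the nomenclature used below (e.g., 18 RVDs correspond to 17.5 repeats: 17 full repeats plus one half-repeat).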

TAL modular structures allow DNA binding domains to be combined with effector molecules (e.g., nucleases). Specifically, TALE nucleases allow the development of new genome engineering tools.

TALEs used in some embodiments can generate DS breaks or can act in combination to generate DS breaks. For example, TAL-FokI nuclease fusions can be designed to bind at or near a target locus and form double-stranded nucleic acid cleavage activity through dimerization of two FokI domains.

In some embodiments, the TALE will comprise greater than or equal to 6 TAL repeats (e.g., greater than or equal to 8, 10, 12, 15, or 17, or 6 to 25, 6 to 35, 8 to 25, 10 to 25, 12 to 25, 8 to 22, 10 to 22, 12 to 22, 6 to 20, 8 to 20, 10 to 22, 12 to 20, 6 to 18, 10 to 18, 12 to 18, etc.). In some embodiments, the TALE may comprise 18, 24, 17.5, or 23.5 TAL nucleic acid binding cassettes. In further embodiments, the TALE may comprise 15.5, 16.5, 18.5, 19.5, 20.5, 21.5, 22.5, or 24.5 TAL nucleic acid binding cassettes. A TALE will typically have at least one polypeptide region flanking the region comprising the TAL repeats. In many embodiments, flanking regions will be present at both the amino terminus and the carboxy terminus of the TAL repeats. Exemplary TALEs are described in U.S. Patent Publication No. 2013/0274129 A1, the disclosure of which is incorporated herein by reference, and can be modified forms of naturally occurring proteins found in Burkholderia, Xanthomonas, and Ralstonia bacteria.

In some embodiments, the TALE protein will contain a Nuclear Localization Signal (NLS) that allows it to be transported to the nucleus.

CRISPR-based systems

The term "CRISPR" or "clustered regularly interspaced short palindromic repeats" is a generic term that applies to three types of systems and their subtypes. In general, the term CRISPR refers to a repetitive region that encodes components of the CRISPR system (e.g., crRNAs). Three types of CRISPR systems have been identified, each with different characteristics (see Table 2).

As used herein, the term "CRISPR complex" refers to a CRISPR protein and a nucleic acid (e.g., RNA) that bind to each other to form functionally active aggregates. One example of a CRISPR complex is a wild-type Cas9 (sometimes referred to as Csn1) protein that binds to a guide RNA specific for a target locus.

As used herein, the term "CRISPR protein" refers to a protein comprising a nucleic acid (e.g., RNA) binding domain and an effector domain (e.g., Cas9, such as Streptococcus pyogenes Cas9, or Cpf1 (CRISPR from Prevotella and Francisella 1)). The nucleic acid binding domain interacts with or allows binding to a first nucleic acid molecule having a region capable of hybridizing to a desired target nucleic acid (e.g., a guide RNA) or to a second nucleic acid having a region capable of hybridizing to a desired target nucleic acid (e.g., a crRNA). CRISPR proteins can also comprise nuclease domains (i.e., DNase or RNase domains), additional DNA binding domains, helicase domains, protein-protein interaction domains, dimerization domains, and other domains.

CRISPR proteins also include proteins that form a complex that binds to the first nucleic acid molecule described above. Thus, one CRISPR protein may bind to, e.g., a guide RNA while another protein has endonuclease activity. These are all considered CRISPR proteins because they act as parts of a complex that performs the same function as a single protein (such as Cas9 or Cpf1).

In some embodiments, the CRISPR protein will contain a Nuclear Localization Signal (NLS) that allows its transport to the nucleus.

CRISPR systems used in some embodiments may generate DS breaks or may act in combination to generate DS breaks. For example, mutations can be introduced into CRISPR components that prevent CRISPR complexes from making DS breaks but still allow these complexes to nick DNA. Mutations have been identified in the Cas9 protein that yield Cas9 proteins that nick a single DNA strand rather than cleaving both strands. Thus, some embodiments include the use of Cas9 proteins with mutations in the RuvC and/or HNH domains that limit the nuclease activity of such proteins to nicking activity.

The term "dCas9" as provided herein refers to nuclease-inactivated Cas9. In embodiments, the DNA binding modulation enhancer may be a guide RNA that binds to a dCas9 domain. In other embodiments, the regulatory complex is a Cas9 domain bound to a gRNA, wherein the regulatory complex further comprises a VP16 transcriptional activation domain operably linked to the Cas9 domain. Such systems can be used, for example, to induce expression of endogenous genes in mammalian cells. One of ordinary skill in the art will immediately recognize that the type of DNA binding modulation enhancer used will vary depending on the cell type and the particular application.

In many cases, the dCas9 protein has at least one mutation in each of the RuvC and HNH domains that inactivates the nuclease activity of the protein.

The CRISPR systems that can be used vary widely. These systems typically have the functional activity of forming a complex comprising a protein and a first nucleic acid, wherein the complex recognizes a second nucleic acid. The CRISPR system may be a type I, type II, or type III system. Non-limiting examples of suitable CRISPR proteins include Cas, Cas5 (or CasE d), Cas6, Cas8a, Cas8, Cas Od, CasF, cassg, CasE, Csh, Csy, Cse (or CasA), Cse (or CasB), Cse (or CasE), Cse (or CasC), Csc, Csa, Csn, Csm, Cmm, Cmr, Csb, Csx, CsaX, Csx, Csf, and Cu1966.

In some embodiments, the CRISPR protein (e.g., Cas9) is derived from a type II CRISPR system. In particular embodiments, the CRISPR system is designed to act as an oligonucleotide (e.g., DNA or RNA)-guided endonuclease derived from a Cas9 protein. Cas9 proteins useful for this and other functions described herein may be derived from Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicellulosiruptor bescii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsonii, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, or Acaryochloris marina.

Argonaute gene editing system

The Argonaute family consists of endonucleases that use 5'-phosphorylated single-stranded nucleic acids as guides to cleave nucleic acid targets. Like Cas9, these proteins are believed to function in gene silencing and in defense against foreign nucleic acids.

Argonaute differs from Cas9 in many ways. Cas9 is found only in prokaryotes, whereas Argonaute is evolutionarily conserved and present in almost all organisms. Some Argonaute proteins have been found to bind single-stranded DNA and cleave target DNA molecules. Furthermore, Argonaute binding does not require a specific consensus guide secondary structure, and it does not require a sequence such as the PAM site of the CRISPR system. It has been shown that Argonaute from Natronobacterium gregoryi (NgAgo) can be programmed with a single-stranded DNA guide and used for genome editing in mammalian cells (Gao, F., et al., Nat. Biotechnol. 34: 768-73 (2016)).

Argonaute requires a 5'-phosphorylated single-stranded guide DNA molecule of about 24 nucleotides in length. See, e.g., SEQ ID NO: 6.
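The guide requirements above can be expressed as a simple screening check. This is a minimal sketch only. The function name `check_ago_guide` and the ±2 nt length tolerance are illustrative assumptions. The 5'-phosphate status is passed in as a flag because it cannot be read from a sequence string.

```python
# Hypothetical screening helper for Argonaute guide DNAs, based on the stated
# requirements: single-stranded DNA, about 24 nt, 5'-phosphorylated.
DNA_BASES = set("ACGT")


def check_ago_guide(seq: str, five_prime_phosphate: bool,
                    target_len: int = 24, tolerance: int = 2) -> list:
    """Return a list of problems found; an empty list means the guide passes."""
    problems = []
    seq = seq.upper()
    if not seq or set(seq) - DNA_BASES:
        problems.append("guide must be DNA (A, C, G, T only)")
    if abs(len(seq) - target_len) > tolerance:
        problems.append(f"length {len(seq)} nt is not about {target_len} nt")
    if not five_prime_phosphate:
        problems.append("guide lacks the required 5' phosphate")
    return problems


print(check_ago_guide("ACGT" * 6, five_prime_phosphate=True))   # []
print(check_ago_guide("ACGU" * 6, five_prime_phosphate=False))  # two problems
```

The example sequences are arbitrary placeholders, not SEQ ID NO: 6.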

Introduction of materials into cells

Introduction of various molecules into cells can be performed by a variety of methods, including those described in many standard laboratory manuals, such as Davis, L. et al., "Basic Methods in Molecular Biology" (1986), and Sambrook, J. et al., "Molecular Cloning: A Laboratory Manual", Vol. 1, 2nd edition, Cold Spring Harbor Laboratory Press, N.Y. (1989). Examples include, but are not limited to, calcium phosphate transfection, DEAE-dextran-mediated transfection, microinjection, cationic lipid-mediated transfection, electroporation, transduction, scrape loading, ballistic introduction, nucleofection, hydrodynamic shock, and infection.

The different components of the nucleic acid cleavage entity and/or the donor nucleic acid molecule can be introduced into the cell in different ways. In some embodiments, one type of nucleic acid cleavage entity molecule may be introduced into a cell while other nucleic acid cleavage entity molecules are expressed within the cell. One example is the use of two zinc finger-FokI fusions to create a double-strand break in intracellular nucleic acid. In some cases, only one zinc finger-FokI fusion may be introduced into a cell, while the other zinc finger-FokI fusion is produced intracellularly.

Suitable transfection agents include those that facilitate the introduction of RNA, DNA, and proteins into cells. Exemplary transfection reagents include, but are not limited to, TurboFect Transfection Reagent (Thermo Fisher Scientific), Pro-Ject Reagent (Thermo Fisher Scientific), TRANSPASS™ P Protein Transfection Reagent (New England Biolabs), CHARIOT™ Protein Delivery Reagent (Active Motif), PROTEOJUICE™ Protein Transfection Reagent (EMD Millipore), 293fectin, LIPOFECTAMINE™ 2000, LIPOFECTAMINE™ 3000 (Thermo Fisher Scientific), LIPOFECTAMINE™ (Thermo Fisher Scientific), LIPOFECTIN™ (Thermo Fisher Scientific), DMRIE-C, CELLFECTIN™ (Thermo Fisher Scientific), Oligofectamine™ (Thermo Fisher Scientific), LIPOFECTACE™, Fugene™ (Roche, Basel, Switzerland), Fugene™ HD (Roche), Transfectam™ (Transfectam, Promega, Madison, Wis.), Tfx-10™ (Promega), Tfx-20™ (Promega), Tfx-50™ (Promega), Transfectin™ (Bio-Rad, Hercules, Calif.), SilentFect™ (Bio-Rad), Effectene™ (Qiagen, Valencia, Calif.), DC-chol (Avanti Polar Lipids), GENEPORTER™ (Gene Therapy Systems, San Diego, Calif.), DHARMAFECT 1™ (Dharmacon, Lafayette, Colo.), DHARMAFECT 2™ (Dharmacon), DHARMAFECT 3™ (Dharmacon), DHARMAFECT 4™ (Dharmacon), ESCORT™ III (Sigma, St. Louis, Mo.), and ESCORT™ IV (Sigma Chemical Co.).

The compositions and methods of the invention include methods that can be used for high-throughput screening. One example of such a method is reverse transfection. For the purpose of illustration, assume that a library of gRNA molecules and corresponding NLS-linked donor DNA molecules has been generated. Assume further that each library composition comprises (1) a gRNA molecule having sequence homology to a particular locus in the genome of the cell and (2) an NLS-linked donor DNA molecule having regions of homology flanking the expected genomic cleavage site. Assume also that three hundred such library compositions have been generated and that each composition is spotted onto a separate location on a slide. Finally, a 293FT cell line expressing Cas9 protein is overlaid on the slide under conditions that allow (1) uptake of the library compositions and (2) gene editing at the target loci designated by the gRNAs. Of course, many variations of such methods are possible. These include variations that use different gene editing reagents (e.g., TAL-FokI mRNA rather than gRNA) and variations that use a different array format (e.g., wells of a 96-well plate rather than the surface of a slide).
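To make the arrayed format concrete, the sketch below assigns each of the three hundred hypothetical library compositions to a well in standard 8 × 12 (96-well) plates, the alternative array format mentioned above. The row-by-row filling order and the function name `plate_layout` are assumptions for illustration.

```python
import math
import string


def plate_layout(n_compositions: int, rows: int = 8, cols: int = 12) -> dict:
    """Map each library composition index to a (plate number, well name) pair,
    filling each plate row by row (A1, A2, ..., A12, B1, ...)."""
    wells_per_plate = rows * cols
    layout = {}
    for idx in range(n_compositions):
        plate, offset = divmod(idx, wells_per_plate)
        row, col = divmod(offset, cols)
        layout[idx] = (plate + 1, f"{string.ascii_uppercase[row]}{col + 1}")
    return layout


layout = plate_layout(300)
n_plates = math.ceil(300 / (8 * 12))
print(n_plates)      # 4 (the last plate is only partly filled)
print(layout[0])     # (1, 'A1')
print(layout[299])   # (4, 'A12')
```

Each index would correspond to one gRNA plus donor pair, so the layout table doubles as the key for reading out which locus was targeted in which well.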

Thus, the invention includes libraries of gene editing agents (e.g., gRNAs, TAL mRNAs, donor nucleic acid molecules, etc.) and high-throughput methods for modifying various target loci in a cell.

Nucleic acid localization and gene editing efficiency

The invention also includes compositions and methods for increasing gene editing efficiency. In some embodiments, such compositions and methods involve a nucleic acid molecule linked to one or more intracellular targeting moieties that localize the nucleic acid molecule to an intracellular location (e.g., nucleus, mitochondria, chloroplast, etc.) where gene editing is desired. Some embodiments will employ an intracellular targeting moiety to facilitate an increase in the local concentration of nucleic acid molecules at one or more intracellular locations. While not wishing to be bound by theory, it is believed that increased gene editing efficiency is due to an increased concentration of donor nucleic acid at the location where gene editing is desired.

One embodiment is shown in fig. 9. This figure shows a Nuclear Localization Signal (NLS) (an example of an intracellular targeting moiety) linked to a single stranded donor DNA molecule by two different linkers. Constructs of this type may be used to facilitate nucleic acid delivery to the nucleus. Many variations of this construct are possible.

As shown in fig. 11 and 13 and discussed in the nuclear localization examples below, constructs such as those shown in fig. 9 have been found to significantly improve the efficiency of gene editing within a cell. They also allow less donor nucleic acid to be used. In particular, these data demonstrate that NLS-modified donor DNA can improve the efficiency of genome engineering at a nucleic acid cleavage site (e.g., a chromosomal locus cleaved with gRNA/Cas9).

The data in fig. 11 and 13 show gene editing efficiencies of nearly 80%. Further, it was found that as little as 0.03 picomoles of the NLS-donor DNA conjugate per approximately 2×10⁵ cells could be used. Thus, embodiments include compositions and methods for intracellular genetic engineering in which a specific target locus is modified in at least 75% of cells (e.g., at least 80%, at least 85%, at least 90%, at least 95%, 50% to 75%, 50% to 80%, 50% to 85%, 50% to 95%, 60% to 95%, 70% to 90%, 75% to 90%, 80% to 99%, 80% to 97%, 80% to 96%, 88% to 98%, etc.). In addition, some embodiments include compositions and methods in which at least 50% of the transfected cells in the reaction mixture are modified at the locus of interest. This applies when the cells are contacted with 0.3 picomoles or less of donor DNA per 2×10⁵ cells (e.g., 0.001 to 0.3, 0.005 to 0.3, 0.01 to 0.3, 0.05 to 0.3, 0.001 to 0.2, 0.005 to 0.2, 0.001 to 0.15, 0.001 to 0.1 picomoles, etc.). These figures assume 100% transfection. For example, if only 50% of the cells are transfected, only about half of the total cells can be modified, and the stated modification levels (e.g., at least 75%) apply to the transfected cells.
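The dose and transfection arithmetic in the preceding paragraph can be checked with a short calculation. This sketch reads the dose as picomoles of donor DNA per 2×10⁵ cells. It also assumes that only transfected cells can be modified. Both function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def donor_molecules_per_cell(pmol: float, n_cells: float) -> float:
    """Convert a donor dose in picomoles per reaction into molecules per cell."""
    return pmol * 1e-12 * AVOGADRO / n_cells


def modified_fraction_of_all_cells(transfection_eff: float,
                                   edit_rate_in_transfected: float) -> float:
    """Fraction of all cells edited, assuming only transfected cells are edited."""
    return transfection_eff * edit_rate_in_transfected


print(round(donor_molecules_per_cell(0.03, 2e5)))  # about 9.0e4 molecules per cell
print(modified_fraction_of_all_cells(0.5, 0.75))   # 0.375
```

Under these assumptions, 75% editing of transfected cells at 50% transfection corresponds to about 37.5% of all cells.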

Some embodiments also relate to compositions and methods for increasing the local concentration of donor nucleic acid at a location (e.g., the nucleus) where an intracellular nucleic acid molecule to be altered is present. Some embodiments use an intracellular targeting moiety to increase concentration at an intracellular location. In these embodiments, the local nucleic acid concentration is increased at least 10-fold compared to when the intracellular targeting moiety is not used (e.g., 10 to 1,000, 10 to 800, 10 to 600, 10 to 400, 50 to 1,000, 50 to 600, 100 to 1,000, 100 to 700-fold, etc.). The fold increase in intracellular localization of a nucleic acid molecule can be measured using, for example, a fluorescently labeled nucleic acid molecule. To illustrate, NLS-linked and unlinked nucleic acid molecules can be compared in such assays.
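One way to turn the fluorescence comparison described above into a fold-increase value is sketched below. It assumes that background-corrected nuclear fluorescence intensity is proportional to local nucleic acid concentration. The intensity values and the function name are illustrative, not measured data.

```python
from statistics import mean


def fold_increase(with_moiety, without_moiety, background=0.0):
    """Ratio of mean background-corrected nuclear intensity with vs. without
    an intracellular targeting moiety (e.g., NLS-linked vs. unlinked donor)."""
    signal_with = mean(with_moiety) - background
    signal_without = mean(without_moiety) - background
    if signal_without <= 0:
        raise ValueError("control signal must exceed background")
    return signal_with / signal_without


# Illustrative intensities (arbitrary units), not measured data.
ratio = fold_increase([520, 480, 500], [25, 30, 20], background=10)
print(f"{ratio:.1f}-fold")  # 32.7-fold
```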

Variations of the construct shown in figure 9 include those in which the NLS is located at the 3' end (rather than the 5' end), at both ends, in the middle of the donor DNA molecule, etc. In addition, there may be more than one NLS at one or both ends. Furthermore, the nucleic acid molecule may be:

(1) DNA or RNA;
(2) single-stranded or double-stranded;
(3) linear, circular, or contain a stem or hairpin loop; and/or
(4) chemically modified (e.g., comprising phosphorothioate linkages, 2'-O-methyl bases, etc.).

In addition, as described below, the NLS can be replaced with an intracellular targeting moiety that directs localization to a cellular compartment other than the nucleus (e.g., mitochondria, chloroplasts, etc.). Thus, some embodiments include nucleic acid molecules operably linked to one or more intracellular targeting moieties that localize them to intracellular locations where gene editing is desired. They also include methods of using such nucleic acid molecules (e.g., for genome engineering). Table 3 lists the amino acid sequences of some exemplary intracellular targeting moieties that may be used in some embodiments.

Table 3: exemplary subcellular/organelle localization sequences

In addition, in many cases where intracellular targeting moieties (e.g., polypeptides) are used, these targeting moieties can be designed to localize the nucleic acid molecule to a location where the intracellular target nucleic acid is expected to be present (e.g., the nucleus, the chloroplast stroma, the mitochondrial matrix, etc.). In other words, in many cases it may be desirable to localize a nucleic acid to a particular subcompartment within a cell. Some embodiments include compositions and methods for localizing a donor nucleic acid molecule at a location within a cell. They also include methods for enhancing the efficiency of a genome engineering reaction at the location where the donor nucleic acid molecule is localized.

In the practice of the present invention, various methods can be used to attach the intracellular targeting moiety to the nucleic acid molecule. Two methods described in the examples are the succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC) linker and a conjugation system from Thermo Fisher Scientific. Regardless of the method, linkers used to attach intracellular targeting moieties to nucleic acid molecules typically have certain characteristics, including:

(1) low cytotoxicity;
(2) promotion of cellular uptake, or at least little interference with it; and
(3) low molecular weight (e.g., a molecular weight of less than 500).

The intracellular targeting moiety can also be attached to the nucleic acid molecule by PCR amplification using NLS-conjugated DNA oligonucleotides as primers. In addition, NLS-conjugated DNA oligonucleotides can be used as universal primers, with the nucleic acid portion joined to a gene-specific region for PCR amplification. Furthermore, NLS-conjugated DNA oligonucleotides can be annealed non-covalently (by hybridization rather than covalent linkage) to single-stranded DNA donors or to double-stranded DNA donors with single-stranded overhangs. The donors are thereby associated with the intracellular targeting moiety and carried into the intracellular compartment.

The size, type, and other characteristics of the nucleic acid molecule component of the conjugate often vary with the application. For example, donors that introduce SNP changes are often shorter than those used for coding region insertions. The length of the region homologous to the endogenous nucleic acid (when present) will also vary with the application. The nucleic acid molecule component (e.g., donor DNA) of the conjugate can be 1 to 2000 or more nucleotides or base pairs in length, depending on whether it is single-stranded or double-stranded (e.g., less than or equal to 1500, 1000, 750, 500, 300, 250, 200, 150, 100, 75, 50, 40, 35, 30, 25, 20, 15, 10, 5, 4, 3, 2, or 1). In some embodiments, the nucleic acid molecule component is 1 to 500, 10 to 400, 20 to 300, 30 to 250, 30 to 200, or 30 to 100 nucleotides or base pairs in length.

Also included are compositions and methods comprising a gene-editing protein (e.g., a Cas9 protein, a TAL protein, etc.) operably linked to one or more intracellular targeting moieties. The one or more intracellular targeting moieties associated with the gene-editing protein are capable of localizing a donor nucleic acid molecule to the location in the cell where an endogenous nucleic acid molecule is located.

Nuclear localization signals that may be used in the practice of the present invention may have a variety of structures and may be, for example, monopartite or bipartite. A monopartite NLS typically consists of a single cluster of basic residues. A bipartite NLS typically consists of two clusters of basic residues separated by 10-12 residues. Exemplary NLS amino acid sequences are listed in tables 4, 5, and 6 below.
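The distinction above between NLSs with one cluster of basic residues and those with two clusters can be illustrated with a small motif scanner. The patterns are deliberately simplified consensus rules commonly used for classical NLSs, adopted here as assumptions. They are not the sequences of tables 4, 5, and 6, and a real NLS prediction tool would be more discriminating. The example inputs are the well-known SV40 large T antigen NLS (PKKKRKV) and the nucleoplasmin bipartite NLS.

```python
import re

# Simplified consensus motifs (assumptions for illustration only):
#   monopartite NLS: K-(K/R)-X-(K/R)
#   bipartite NLS:   two basic residues, a 10-12 residue spacer, then a
#                    5-residue window containing at least 3 basic residues
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")  # lookahead allows overlapping hits
BASIC = set("KR")


def find_monopartite(seq):
    """Return 0-based start positions of monopartite-like motifs."""
    return [m.start() for m in MONOPARTITE.finditer(seq)]


def find_bipartite(seq, spacers=range(10, 13), min_basic=3):
    """Return (start, spacer_length) pairs for bipartite-like motifs."""
    hits = []
    for i in range(len(seq) - 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for spacer in spacers:
                start = i + 2 + spacer
                window = seq[start:start + 5]
                if sum(aa in BASIC for aa in window) >= min_basic:
                    hits.append((i, spacer))
    return hits


print(find_monopartite("PKKKRKV"))          # [1, 2]
print(find_bipartite("KRPAATKKAGQAKKKK"))   # [(0, 10), (0, 11)]
```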

Figure 39 shows a series of schematic diagrams of Cas9 protein operably linked to NLS. Any number and type of NLS can be used. Their location in a protein or nucleic acid molecule will vary with the specific molecule to which the NLS is attached and with its intended purpose. For proteins such as Cas9 or TAL effectors, it is often desirable to:

(1) introduce large amounts of the protein into the cell;
(2) keep the time the protein spends in the cytoplasm relatively short; and
(3) localize most of the protein to the nucleus.

This is because it is believed that the longer the protein is retained in the cytoplasm, the more it is degraded. It is also believed that the higher the concentration of Cas9 in the nucleus, the higher the cleavage efficiency. This assumes, of course, that the Cas9 is cleavage-competent, e.g., complexed with a gRNA. Thus, the amount of NLS-linked protein (and other molecules) that accumulates in the nucleus depends on (1) the amount of protein introduced into the cell and (2) the rate at which the protein moves to the nucleus.
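The dependence on dose and import rate can be made explicit with a simple kinetic model. This model is an illustrative assumption, not one taken from this disclosure. It assumes first-order nuclear import and first-order cytoplasmic degradation, with no nuclear turnover. Under these assumptions, the nuclear amount approaches dose × k_import / (k_import + k_deg), so a larger dose and faster import both increase nuclear accumulation.

```python
import math


def nuclear_amount(dose, k_import, k_deg, t):
    """Protein in the nucleus at time t under a first-order model (an assumption):
    cytoplasmic protein C is lost by import and degradation,
        dC/dt = -(k_import + k_deg) * C,  C(0) = dose,
    and the nucleus gains dN/dt = k_import * C (nuclear turnover ignored).
    """
    k_total = k_import + k_deg
    return dose * (k_import / k_total) * (1.0 - math.exp(-k_total * t))


# Hypothetical rates (per hour): a faster-importing NLS variant vs. a slower one,
# with the same cytoplasmic degradation rate.
for label, k_imp in [("fast import", 3.0), ("slow import", 1.0)]:
    print(label, round(nuclear_amount(100.0, k_imp, 1.0, t=24.0), 1))
# fast import 75.0
# slow import 50.0
```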

Fig. 45 is a schematic diagram showing a common TALE structural format. In many cases, TALEs act on DNA present at specific locations within the cell (e.g., nucleus, mitochondria, chloroplasts, etc.). In many cases, disrupting the TALE protein region involved in DNA recognition and binding will reduce or eliminate DNA recognition and/or binding activity. Sites 1, 2, and 3 are located outside the TALE region thought to be involved in DNA recognition and binding. Thus, when high levels of target DNA binding are required, these are suitable sites for NLS placement.

Using FIG. 45 as a reference point, at position 1 the NLS can be located anywhere to the left (N-terminal direction) of amino acid 25. At positions 2 and 3, the NLS can be located anywhere to the right (C-terminal direction) of amino acid 814. This includes cases in which longer, naturally occurring TALE protein regions extend beyond amino acid 25 on the left or amino acid 814 on the right of figure 45. Furthermore, position 3 is located at the C-terminus of the effector domain.

One or more NLS can be located at one or more of positions 1, 2 and/or 3. Furthermore, when there are multiple NLS (at one or more of these sites), they can be of the same type or different types.

In many cases, the location (e.g., positions 1, 2, and/or 3 in fig. 45) and type of NLS will be selected so as to produce (1) high levels of gene-editing reagent localized to the nucleus and/or (2) high levels of functional activity in the nucleus. The two are generally correlated, although the level of nuclear functional activity is generally lower than the level of nuclear localization. This is because, in many cases, not all gene-editing reagents that enter the nucleus bind to their specific target locus. Those that do bind may not always act on the target locus nucleic acid as designed (e.g., nucleic acid cleavage, transcriptional activation, etc.). One exemplary reason is that nuclear nucleic acid may be more accessible in one cell type than in another. Furthermore, even among cells of the same type, variation may make a locus of interest more or less accessible in one cell of a population than in another cell of the same population.

Both nuclear localization and functional assays (e.g., genomic cleavage assays) are described elsewhere herein. To correct for differences in target loci and cell types, the same target loci and cell types will typically be used in a comparative analysis.

Furthermore, gene editing efficiency will often vary with the locus and cell type being edited. This is due to a number of factors, including the accessibility of the target locus to gene editing reagents and the efficiency of Homology Directed Repair (HDR) in the cell type. With respect to HDR, cells with higher HDR efficiency (e.g., 293FT and U2OS cells) typically exhibit higher gene editing rates than cells with lower HDR efficiency (e.g., A549 cells).

Various forms of TALE proteins are shown below in table 7 with respect to the location and type of NLS.

The exemplary TALE/NLS formats listed in table 7 differ in the type of NLS and in the position of the NLS within the TALE protein. In some cases, the TALE protein will contain about 1 to about 15 NLS (e.g., about 2 to about 14, about 3 to about 14, about 4 to about 14, about 2 to about 10, about 2 to about 8, about 2 to about 6, about 3 to about 5, about 3 to about 4, etc.). Further, when multiple NLS are present in the TALE protein, these NLS can be monopartite or bipartite.

In addition, two or more NLS can be located in the same region (e.g., the N-terminal region) of the TALE protein. When more than one NLS is located within a TALE protein (e.g., within the same region of the TALE protein), two or more of the NLS can be located within about 1 to about 50 amino acids of each other (e.g., about 2 to about 50, about 3 to about 50, about 5 to about 50, about 10 to about 50, about 15 to about 50, about 2 to about 30, about 5 to about 25, etc.). In some cases, two NLS can be separated from each other by two amino acids. These amino acids may be of a type intended to form flexible linkers (e.g., Gly-Gly, Gly-Ser, etc.).

Table 7 lists six specific TALE/NLS formats, each specifying the region and amino acid sequence of the NLS. It also lists two more general TALE/NLS formats. Many other formats are possible. For example, each region of the TALE protein can independently comprise about 1 to about 5 NLS.

Fig. 46 and 47 show the amino acid sequences of two different TALEN proteins with NLS in different positions. Cas9 and TALEN proteins may differ in the number, type, and location of NLS present in the molecule. In the amino acid sequence shown in FIG. 46, an NLS located N-terminal to the repeat region is positioned closer to the N-terminus than the R-3 region. An NLS located C-terminal to the repeat region is generally positioned C-terminal to the amino acids HRVA (amino acids 811-814 in FIG. 46). In that case, it lies between the repeat region and the effector domain (FokI in the amino acid sequence shown in FIG. 46). Thus, using the amino acid sequence in fig. 46 for reference, the NLS can be located at three general positions:

(1) N-terminal to the repeat region;
(2) between the repeat region and the effector domain; and
(3) after the effector domain.

In some cases, using the amino acid sequence shown in figure 46 for reference, one or more NLS can be located in the region of amino acids 768 to 814. For example, in fig. 45, one or more NLS can immediately follow one or more of the following amino acids: 768, 777, 779, 788, and/or 789.
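Placing NLS peptides after specific residue numbers, as described above, can be sketched as string operations on a protein sequence. The function names, the Gly-Ser linker default, and the toy sequence are hypothetical. The fig. 46 TALEN sequence is not reproduced here, so the example uses a placeholder. The SV40 NLS (PKKKRKV) serves as a representative monopartite NLS.

```python
def nls_cassette(nls_peptides, linker="GS"):
    """Join NLS peptides, placing a short flexible linker between each pair."""
    return linker.join(nls_peptides)


def insert_after_residue(protein, residue, insert):
    """Insert a peptide immediately after a 1-based residue number."""
    if not 0 <= residue <= len(protein):
        raise ValueError("residue number lies outside the protein")
    return protein[:residue] + insert + protein[residue:]


def insert_at_sites(protein, residues, insert):
    """Insert the same peptide after several residues. Sites are processed from
    the C-terminus backwards so earlier insertions do not shift later numbering."""
    for residue in sorted(residues, reverse=True):
        protein = insert_after_residue(protein, residue, insert)
    return protein


cassette = nls_cassette(["PKKKRKV", "PKKKRKV"])  # two NLSs joined by Gly-Ser
placeholder = "M" + "A" * 19                     # stand-in for a real TALE sequence
print(cassette)                                  # PKKKRKVGSPKKKRKV
print(insert_at_sites(placeholder, [5, 12], "[NLS]"))
```

With a real TALEN sequence, the residue list would be, for example, `[768, 777, 779, 788, 789]`. Processing the sites in reverse order keeps those numbers referring to the original, unmodified sequence.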

Specifically, fig. 46 shows, inter alia, the TAL protein region located N-terminal to the repeat region (amino acids 18 to 153 in fig. 46). This TAL protein region is generally conserved among Xanthomonas species, with over 90% identity at the amino acid level. In addition, this region contains four regions (R0, R-1, R-2, and R-3) that have some sequence homology to TAL repeats. The NLS is typically located outside this region, closer to the N-terminus of the TAL protein.

In addition, the amino acid sequence shown in fig. 46 contains, for example, only 153 amino acids N-terminal to the repeat region. The N-terminal region can have various lengths. For example, it can be about 140 to about 400 amino acids in length (e.g., about 150 to about 350, about 150 to about 300, about 150 to about 250, about 150 to about 200, about 180 to about 350, about 185 to about 300, about 200 to about 350, about 200 to about 300, etc.).

In addition, again using the amino acid sequence shown in FIG. 46, the region C-terminal to the amino acid repeat region is also generally conserved among Xanthomonas species. Likewise, the NLS will generally be located outside of this region, and will generally be located closer to the C-terminus of the TAL protein.

Depending on the desired intracellular level of a gene-editing molecule (e.g., TAL protein, CRISPR protein, gRNA, etc.) and the desired duration of gene-editing activity, the gene-editing molecule can be introduced into the cell as RNA/mRNA or by DNA encoding an RNA or protein gene-editing agent. Furthermore, when the nucleic acid encoding the gene-editing molecule is located in a cell, the coding region will typically be operably linked to an expression control sequence, such as a promoter (e.g., a constitutive promoter, an inducible promoter, a repressible promoter, etc.).

Provided herein are various forms of TAL proteins (as well as other gene editing molecules), nucleic acid molecules encoding these proteins, and methods of using these proteins to modify the genome of a cell.

Assays for measuring nuclear uptake of proteins and other molecules are known. Such assays may be based on measurements of functional activity in the nucleus (e.g., the GCD assay described in example 1). Other assays measure molecular uptake directly and include fluorescence-based assays. Such assays typically require that the molecule being tested be fluorescent. The fluorescence may be intrinsic to the molecule or may result from linkage to a fluorescent moiety (e.g., GFP, OFP, or a chemical label such as a dye).

Wu et al., Biophysical Journal 96: 3840-3849 (2009), describe a suitable assay in which two-photon fluorescence correlation microscopy is used to measure nuclear import. These methods measure the mean fluorescence intensity at multiple points in the cytoplasm and nucleus by microscopy and then determine the ratio. Some gene-editing reagents may be trapped in membranes and endosomes. Even so, cytoplasmic fluorescence levels may be compared to nuclear fluorescence levels to determine the rate at which the gene-editing reagent enters the nucleus. The same comparison gives the amount of gene-editing reagent present in the nucleus at one or more time points.

The methods set forth by Wu et al. can be used to measure the uptake and location-dependent concentration of fluorescently labeled gene-editing reagents. In one exemplary application, two-photon fluorescence correlation microscopy is used to measure the nuclear localization of Cas9 protein. In this illustration, a series of different Cas9-NLS-GFP fusion protein/gRNA complexes were introduced into a cell line. The cells were then subjected to 50-point fluorescence measurements, half in the cytoplasm and half in the nucleus. The steady-state nucleus-to-cytoplasm ratio was then determined for each Cas9-NLS-GFP fusion protein/gRNA complex. Provided herein are compositions and methods for producing cells in which the nucleus-to-cytoplasm ratio of a gene-editing reagent averages about 5 to about 120 (e.g., about 5 to about 100, about 15 to about 100, about 20 to about 100, about 25 to about 100, about 30 to about 100, about 35 to about 100, about 40 to about 100, about 50 to about 100, about 60 to about 100, about 70 to about 100, about 40 to about 120, about 50 to about 120, etc.).
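The ratio calculation described above can be sketched as follows, with 25 cytoplasmic and 25 nuclear readings per measurement. This is a simplified, assumption-laden version of a microscopy workflow. It averages raw point intensities after optional background subtraction and does not model the fluorescence correlation analysis of Wu et al. Function names and intensities are illustrative.

```python
from statistics import mean


def nc_ratio(nuclear_points, cytoplasmic_points, background=0.0):
    """Nucleus-to-cytoplasm ratio for one cell from point intensity readings."""
    nuc = mean(nuclear_points) - background
    cyt = mean(cytoplasmic_points) - background
    if cyt <= 0:
        raise ValueError("cytoplasmic signal must exceed background")
    return nuc / cyt


def mean_nc_ratio(cells, background=0.0):
    """Average the per-cell ratios; cells is a list of (nuclear, cytoplasmic) readings."""
    return mean(nc_ratio(n, c, background) for n, c in cells)


# Two hypothetical cells, 25 nuclear and 25 cytoplasmic points each.
cells = [([400.0] * 25, [20.0] * 25), ([300.0] * 25, [30.0] * 25)]
print(mean_nc_ratio(cells))  # 15.0
```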

Also provided herein are compositions and methods that allow a population of diploid cells to be generated in which at least one of the two copies of a target locus is cleaved in at least 90% of the members of the population (e.g., about 90% to about 100%, about 90% to about 98%, about 90% to about 96%, about 93% to about 100%, about 95% to about 100%, about 92% to about 96%, etc.). In some cases, these cleavage percentages apply when 50,000 (±10%) cells are contacted with about 0.5 to about 200 ng of Cas9/gRNA complex under the conditions described in example 7 (e.g., about 0.5 to about 150, about 0.5 to about 100, about 0.5 to about 90, about 0.5 to about 75, about 1 to about 200, about 1.5 to about 200, about 3 to about 200, about 1 to about 50 ng, about 10 to about 45, about 12 to about 60 ng, etc.).
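The criterion above counts a diploid cell as cleaved when at least one of its two alleles is cut, so the population-level percentage depends on the per-allele cleavage rate. The sketch below assumes the two alleles are cleaved independently and with equal probability. This is a simplifying assumption not stated in this disclosure, and the function names are hypothetical.

```python
def frac_any_allele_cut(p_allele):
    """Fraction of diploid cells with at least one of two alleles cleaved,
    assuming each allele is cleaved independently with probability p_allele."""
    return 1.0 - (1.0 - p_allele) ** 2


def per_allele_rate_needed(target_fraction):
    """Per-allele cleavage probability needed to reach target_fraction of cells
    with at least one allele cleaved, under the same independence assumption."""
    return 1.0 - (1.0 - target_fraction) ** 0.5


print(round(frac_any_allele_cut(0.7), 3))     # 0.91
print(round(per_allele_rate_needed(0.9), 3))  # 0.684
```

Under this model, a 90% population-level figure corresponds to a per-allele cleavage rate of only about 68%.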

Compositions and methods employing intracellular targeting moieties can be used to alter endogenous nucleic acid molecules in a number of ways. For example, they can be used to facilitate homologous recombination at a location where the endogenous nucleic acid is "intact", meaning that it has not been cleaved by a gene-editing agent (e.g., CRISPR, TAL, zinc finger-FokI fusion, etc.). However, in some cases, the site to be genetically altered will be cleaved or will have a double-strand break.

Methods

The methods and compositions provided herein are particularly useful for modulating a locus of interest (e.g., a gene, a genomic region, or a transcriptional regulatory sequence such as a promoter or enhancer). The locus may include chromatin (DNA bound to histones), DNA-binding proteins, or a combination thereof. As used herein, the term "target locus" refers to a region within the genome of a cell. The target locus includes one or more binding sequences that bind to a protein or nucleic acid, and this binding results in structural and/or chemical modification of the target locus. Using the methods and compositions provided herein, a target locus can be structurally or chemically modified by binding one or more DNA binding agents (e.g., a first or second DNA binding modulation enhancer) to a specific site that forms part of the target locus. Binding of the DNA binding agent may cause, for example, chromatin displacement or remodeling at the target locus. It may also make the target locus more accessible to further modification by other endogenous or exogenous modulators. For example, the methods provided herein are useful for increasing the efficiency and specificity of nucleases (e.g., TALENs, Cas9) at genomic loci. They do so by enhancing the accessibility of the cleavage site and surrounding sequences at the locus. Thus, the methods and compositions provided herein are particularly useful for genome editing and for enhancing the enzymatic processes involved therein.

Thus, in one aspect, a method of enhancing accessibility of a target locus in a cell is provided. The method comprises the following steps:

(1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and
(2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus, thereby enhancing accessibility of the target locus relative to the absence of the first DNA binding modulation enhancer.

Accessibility of a target locus may be enhanced upstream or downstream of the enhancer binding sequences provided herein. Thus, after the DNA binding modulation enhancer binds to the enhancer binding sequence, chromatin located 5' and 3' to that sequence may be more accessible than in the absence of the DNA binding modulation enhancer.

In embodiments, a plurality of DNA binding modulation enhancers bind to a plurality of enhancer binding sequences of the target locus (e.g., 2, 4, 6, 8, or 10 enhancer binding sequences). Each of the enhancer binding sequences may be separated from the next by a sequence of 20-60 nucleotides. In an embodiment, the target locus comprises first, second, third, fourth, fifth, and sixth enhancer binding sequences arranged in that order. The first enhancer binding sequence is linked to the third by the second, the third is linked to the fifth by the fourth, and the fourth is linked to the sixth by the fifth. Each pair of adjacent enhancer binding sequences (first and second, second and third, third and fourth, fourth and fifth, and fifth and sixth) may be separated by 20-50 nucleotides. In an embodiment, each of these adjacent pairs is separated by 50 nucleotides.
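The arrangement of six enhancer binding sequences separated by 50 nucleotides can be laid out in coordinates as shown below. The 20-nt binding-site length is a hypothetical placeholder because the binding sequence length is not specified here. The function names are illustrative.

```python
def site_layout(n_sites=6, site_len=20, spacer=50):
    """0-based, end-exclusive (start, end) coordinates of tandem binding sites,
    each separated from the next by `spacer` nucleotides."""
    pitch = site_len + spacer
    return [(i * pitch, i * pitch + site_len) for i in range(n_sites)]


def spacings_ok(sites, lo=20, hi=60):
    """Check that every gap between consecutive sites lies within [lo, hi] nt."""
    return all(lo <= nxt[0] - cur[1] <= hi for cur, nxt in zip(sites, sites[1:]))


sites = site_layout()
print(sites[-1])           # (350, 370)
print(spacings_ok(sites))  # True
```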

In another aspect, a method of displacing chromatin of a target locus in a cell is provided. The method comprises the following steps:

(1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and
(2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence of the target locus, thereby displacing chromatin of the target locus.

In another aspect, a method of remodeling chromatin of a target locus in a cell is provided. The method comprises the following steps:

(1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and
(2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence of the target locus, thereby remodeling chromatin of the target locus.

As described above, the methods and compositions provided herein can include the binding of one or more DNA binding agents (e.g., first or second DNA binding modulation enhancers) to accomplish modulation of a target locus. Thus, in another aspect, a method of enhancing the accessibility of a target locus in a cell is provided. The method comprises:

(1) introducing into a cell comprising a nucleic acid encoding a locus of interest (i) a first DNA binding modulation enhancer that is not endogenous to the cell and (ii) a second DNA binding modulation enhancer that is not endogenous to the cell;
(2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus; and
(3) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus.

Accessibility of the target locus is thereby enhanced relative to the absence of the first DNA binding modulation enhancer or the second DNA binding modulation enhancer. As provided herein, enhancing (increasing) accessibility of a target locus refers to structural modulation of the target locus that results in enhanced functional activity of a regulatory protein or complex at the target locus (e.g., an enzyme such as a nuclease). This may involve clearing chromatin from the target locus and/or remodeling DNA at the target locus, so that regulatory proteins bind better and/or are more active. Thus, the term enhancing (increasing) the accessibility of a target locus includes modulating the structure of the target locus so that the activity of the regulatory protein is increased. This activity includes, for example, enzymatic activity, DNA binding activity, or transcriptional activity.

As described above, the methods and compositions provided herein can enhance accessibility of a target locus and thereby can allow recruitment of regulatory activity to the target locus. Thus, a method of modulating a target locus in a cell is provided. The method comprises (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a regulator binding sequence of a target locus, wherein the regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding to a first enhancer binding sequence of the target locus; and (2) allowing the first regulatory protein or the first regulatory complex to regulate the regulatory site, thereby modulating the locus of interest in the cell.

As one or more DNA binding agents (e.g., first or second DNA binding modulation enhancers) bind to the target locus, the target locus becomes more accessible, thereby allowing enhanced efficiency and/or specificity of the regulatory protein or regulatory complex at the target locus. For example, the efficiency of a gene editing reaction, such as homologous recombination, can be enhanced using the methods and compositions provided herein. In embodiments, the activity of a nuclease at the locus of interest is enhanced due to the presence of one or more DNA binding agents (e.g., the first or second DNA binding modulation enhancer).

Thus, in one aspect, a method of enhancing the activity of a regulatory protein or regulatory complex at a target locus in a cell is provided. The method comprises (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a regulator binding sequence of a target locus, wherein the regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding to a first enhancer binding sequence of the target locus; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence, thereby enhancing the activity of the first regulatory protein or the first regulatory complex at the target locus in the cell.

Also provided herein are compositions and methods for generating regions of chromatin structure accessible to gene editing reagents using DNA binding protein-transcriptional activator fusion proteins. In some aspects, provided herein are uses of DNA binding protein-transcriptional activator fusion proteins and methods of remodeling chromatin using such fusion proteins to allow for enhanced site-directed nucleic acid cleavage. A variation of some aspects of this method is shown in fig. 51.

Transcriptional activation is known to remodel chromatin, often disrupting a defined pattern of nucleosomes at specific loci (see, e.g., Gilbert and Ramsahoye, "The relationship between chromatin structure and transcriptional activity in mammalian genomes," Briefings in Functional Genomics and Proteomics 4: 129-).

The upper part of FIG. 51 shows a schematic of an intracellular nucleic acid region in which the nucleic acid is in the form of a 10 nm chromatin fiber, including the promoter, nucleosomes, the desired editing site, and potential Buddy-TAL binding sites. The location of the nucleosomes can vary with the nucleic acid region and/or the particular cell in which that region is located. For example, in any particular cell, the promoter may be located wholly or partially within a nucleosome or wholly outside of it. In addition, the location of a particular nucleic acid region (e.g., an editing site) relative to the nucleosomes in a particular cell may vary depending on factors such as the time point, the transcriptional state, and the cell cycle phase.

For illustrative purposes, using the schematic of FIG. 51, a TAL-transcriptional activator fusion protein binds to a TAL binding site ("TBS"), causing transcriptional activation. This allows chromatin remodeling in the transcribed nucleic acid region as well as in the surrounding local region. This chromatin remodeling enhances the accessibility of the nucleic acid to gene editing reagents with nucleic acid cleavage activity (e.g., TAL-FokI fusion proteins). The net result is an increase in nucleic acid cleavage by the gene editing reagent.

FIG. 51 also shows an "editing site". As used herein, "editing site" refers to a nucleic acid site at which one or more gene editing reagents are designed to cleave in order to alter a locus at the nucleotide sequence level (e.g., by deletion, insertion, and/or substitution). In this schematic, transcription is used to increase the accessibility of the editing site to gene editing reagents.

A Buddy-TAL, also known as a DNA binding modulation enhancer, can also be used in combination with a DNA binding protein-transcriptional activator fusion protein. Buddy-TALs can be used to enhance the binding of DNA binding protein-transcriptional activator fusion proteins to TAL binding sites ("TBS") and/or to enhance the accessibility of editing sites to gene editing reagents with nucleic acid cleavage activity. Thus, provided herein are compositions and methods for enhancing nucleic acid cleavage using DNA binding protein-transcriptional activator fusion proteins, alone or in combination with Buddy-TALs.

In some aspects, provided herein are methods for editing a first nucleic acid locus in a cell, the method comprising: (A) contacting a second nucleic acid locus with a DNA binding protein-transcriptional activator fusion protein under conditions that allow transcription of the nucleic acid; and (B) contacting the first nucleic acid locus with one or more gene editing reagents having nucleic acid cleavage activity under conditions that allow cleavage of the nucleic acid at the first nucleic acid locus, wherein transcription of the nucleic acid alters the chromatin structure of the first nucleic acid locus. In some cases, one or more DNA binding modulation enhancers designed to bind to one or more nucleic acid positions within two hundred base pairs (e.g., about 30 to about 200, about 50 to about 200, about 60 to about 200, about 30 to about 180, about 30 to about 130, about 45 to about 150, etc.) of (a) the first nucleic acid locus and/or (b) the second nucleic acid locus may also be used. In some cases, the one or more DNA binding modulation enhancers are designed to bind to nucleic acid positions within two hundred base pairs upstream of the second nucleic acid locus and/or downstream of the first nucleic acid locus. In some cases, the DNA binding protein-transcriptional activator fusion protein can be a fusion of a TAL with a transcriptional activation domain (e.g., p53, NFAT, NF-κB, VP16, VP32, VP64, etc.). In some cases, at least one of the one or more DNA binding modulation enhancers can be a TAL-nuclease fusion protein (e.g., a TAL-FokI fusion protein).
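The distance constraint described above, in which a DNA binding modulation enhancer is positioned within two hundred base pairs (or a narrower range such as about 30 to about 200) of a nucleic acid locus, can be checked with simple coordinate arithmetic. The following is a minimal sketch under stated assumptions: sites are given as hypothetical 0-based, inclusive coordinates on one strand, distance is taken as the number of base pairs lying between the two sites, and the upstream/downstream orientation is not distinguished. The names Site, gap_bp, and within_window are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A site on one strand, as 0-based inclusive coordinates (start <= end)."""
    start: int
    end: int


def gap_bp(a: Site, b: Site) -> int:
    """Base pairs lying strictly between two sites; 0 if they touch or overlap."""
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0


def within_window(enhancer: Site, locus: Site, min_bp: int = 0, max_bp: int = 200) -> bool:
    """True if the enhancer site lies between min_bp and max_bp base pairs from the locus."""
    return min_bp <= gap_bp(enhancer, locus) <= max_bp


# Hypothetical coordinates for an editing locus and two candidate enhancer sites.
locus = Site(1000, 1019)
near = Site(1050, 1067)  # 30 bp past the end of the locus
far = Site(1300, 1317)   # 280 bp past the end of the locus
print(gap_bp(near, locus), within_window(near, locus))
print(gap_bp(far, locus), within_window(far, locus))
```

A fuller version would also record strand and orientation so that the "upstream of the second locus" and "downstream of the first locus" cases could be tested separately.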

DNA binding modulation enhancers

A "DNA binding modulation enhancer" as provided herein is an agent that is capable of binding to a corresponding sequence of a target locus in a cell (an enhancer binding sequence) and thereby chemically or structurally modulating the target locus. Upon binding to a locus of interest, the DNA binding modulation enhancers provided herein (including embodiments thereof) can modulate chromatin at the locus. Upon binding, a DNA binding modulation enhancer may convert densely packed heterochromatin regions upstream (5') or downstream (3') of the enhancer binding sequence into less densely packed euchromatin regions. This conversion can be achieved by dissociation of histones from the DNA to which they are bound at the target site (chromatin displacement). Alternatively, histones may rearrange within chromatin at the target locus (chromatin remodeling). Once the chromatin structure at the target locus is altered, the DNA becomes more accessible for subsequent modification of the target locus. This effect may be achieved by the binding of one or more DNA binding modulation enhancers (e.g., a first or second DNA binding modulation enhancer). Thus, in embodiments, the methods set forth herein comprise introducing a second DNA binding modulation enhancer capable of binding a second enhancer binding sequence of the target locus.

For the methods provided herein, the enhancer and the regulatory protein or complex can be introduced into the cell in a variety of ways. They may be introduced by transfection of nucleic acids (vectors) encoding the enhancer and the regulatory protein or complex. Alternatively, they may be introduced by transfection of mRNA encoding the enhancer and the regulatory protein or complex. They may also be introduced directly, by transfection of the enhancer, regulatory protein, or regulatory complex itself. One of ordinary skill in the art will immediately recognize that the half-life of an agent, regulatory protein, or complex in a cell (the time during which the agent is active and/or expressed in the cell) is determined by the physical form in which it is delivered to the cell. Without being bound by any particular scientific theory, delivery of a nucleic acid encoding an enhancer, regulatory protein, or complex will result in extended expression/presence of the enhancer, regulatory protein, or complex in a cell as compared with transfection of the enhancer, regulatory protein, or complex as the actual protein or complex.

In embodiments, introducing the first DNA binding modulation enhancer comprises introducing a vector encoding the first DNA binding modulation enhancer. In embodiments, introducing the first DNA binding modulation enhancer comprises introducing mRNA encoding the first DNA binding modulation enhancer. In embodiments, introducing a first DNA-binding modulation enhancer comprises introducing a first DNA-binding protein or a first DNA-binding nucleic acid.

In embodiments, introducing a second DNA binding modulation enhancer comprises introducing a vector encoding the second DNA binding modulation enhancer. In embodiments, introducing a second DNA binding modulation enhancer comprises introducing mRNA encoding the second DNA binding modulation enhancer. In embodiments, introducing a second DNA binding modulation enhancer comprises introducing a second DNA binding protein or a second DNA binding nucleic acid.

In embodiments, introducing the first regulatory protein comprises introducing a vector encoding the first regulatory protein. In embodiments, introducing the first regulatory protein comprises introducing mRNA encoding the first regulatory protein. In embodiments, introducing the first regulatory protein comprises introducing the first regulatory protein itself. In embodiments, introducing the first regulatory complex comprises introducing a vector encoding the first regulatory complex. In embodiments, introducing the first regulatory complex comprises introducing mRNA encoding the first regulatory complex. In an embodiment, introducing the first regulatory complex comprises introducing the first regulatory complex itself.

In embodiments, introducing the second regulatory protein comprises introducing a vector encoding the second regulatory protein. In embodiments, introducing the second regulatory protein comprises introducing mRNA encoding the second regulatory protein. In embodiments, introducing the second regulatory protein comprises introducing the second regulatory protein itself. In embodiments, introducing the second regulatory complex comprises introducing a vector encoding the second regulatory complex. In embodiments, introducing the second regulatory complex comprises introducing mRNA encoding the second regulatory complex. In an embodiment, introducing the second regulatory complex comprises introducing the second regulatory complex itself.

Exemplary DNA binding modulation enhancers that can be used in the methods and compositions provided herein include DNA binding proteins or DNA binding nucleic acids. The first and second DNA binding modulation enhancers may be the same or chemically different. In embodiments, the first DNA binding modulation enhancer is not endogenous to the cell. In embodiments, the second DNA binding modulation enhancer is not endogenous to the cell. In embodiments, the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid. In embodiments, the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA). In embodiments, the first DNA binding modulation enhancer is a first zinc finger DNA binding protein. In embodiments, the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid. In embodiments, the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

A "truncated gRNA" or "truncated guide RNA" is a ribonucleic acid that corresponds to a wild-type guide RNA but contains fewer nucleotides than the wild-type guide RNA. As provided herein, a truncated gRNA can bind to a Cas9 protein. Thus, a truncated guide RNA as provided herein can be an RNA that binds to a Cas9 protein and is capable of binding to a regulator binding sequence. A Cas9 protein bound to a truncated gRNA is unable to cleave the regulator binding sequence. Thus, in embodiments, the DNA binding modulation enhancer is a truncated gRNA bound to a Cas9 protein. In embodiments, the Cas9 protein bound to the truncated gRNA is a Streptococcus pyogenes Cas9 protein. A Streptococcus pyogenes Cas9 protein as provided herein is a Cas9 protein derived from the bacterium Streptococcus pyogenes.

Truncated gRNAs provided herein can be less than 16 nucleotides in length. In embodiments, the truncated gRNA is no more than 15 nucleotides in length. In embodiments, the truncated gRNA is 10 to 15 nucleotides in length. In embodiments, the truncated gRNA is 11 to 15 nucleotides in length. In embodiments, the truncated gRNA is 12 to 15 nucleotides in length. In embodiments, the truncated gRNA is 13 to 15 nucleotides in length. In embodiments, the truncated gRNA is 10 to 14 nucleotides in length. In embodiments, the truncated gRNA is 10 to 13 nucleotides in length. In embodiments, the truncated gRNA is 10 to 12 nucleotides in length. In an embodiment, the truncated gRNA is 16 nucleotides in length. In embodiments, the truncated gRNA is less than 15 nucleotides in length. In an embodiment, the truncated gRNA is 15 nucleotides in length. In embodiments, the truncated gRNA is less than 14 nucleotides in length. In an embodiment, the truncated gRNA is 14 nucleotides in length. In embodiments, the truncated gRNA is less than 13 nucleotides in length. In an embodiment, the truncated gRNA is 13 nucleotides in length. In embodiments, the truncated gRNA is less than 12 nucleotides in length. In an embodiment, the truncated gRNA is 12 nucleotides in length. In embodiments, the truncated gRNA is less than 11 nucleotides in length. In an embodiment, the truncated gRNA is 11 nucleotides in length. In embodiments, the truncated gRNA is less than 10 nucleotides in length. In an embodiment, the truncated gRNA is 10 nucleotides in length. In embodiments, the truncated gRNA is less than 9 nucleotides in length. In an embodiment, the truncated gRNA is 9 nucleotides in length. In embodiments, the truncated gRNA is less than 8 nucleotides in length. In an embodiment, the truncated gRNA is 8 nucleotides in length. In embodiments, the truncated gRNA is less than 7 nucleotides in length. In an embodiment, the truncated gRNA is 7 nucleotides in length. In embodiments, the truncated gRNA is less than 6 nucleotides in length. In an embodiment, the truncated gRNA is 6 nucleotides in length. In embodiments, the truncated gRNA is less than 5 nucleotides in length. In an embodiment, the truncated gRNA is 5 nucleotides in length. In embodiments, the truncated gRNA is less than 4 nucleotides in length. In an embodiment, the truncated gRNA is 4 nucleotides in length.
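The truncated gRNA lengths listed above can be illustrated by shortening a full-length spacer. This is a hedged sketch: the text does not state which end of the guide is shortened, so the code assumes, for illustration only, that nucleotides are removed from the 5' end of a 20-nucleotide S. pyogenes spacer, keeping the 3' (PAM-proximal) portion. The spacer sequence and the function name truncate_spacer are hypothetical, and the sketch handles only the spacer, not the rest of the guide RNA.

```python
FULL_SPACER_NT = 20  # assumed full-length S. pyogenes Cas9 spacer for this sketch


def truncate_spacer(spacer: str, length: int) -> str:
    """Keep the 3'-most `length` nucleotides of a full-length spacer.

    Assumption: truncation removes nucleotides from the 5' end, so the
    nucleotides nearest the PAM are retained.
    """
    spacer = spacer.upper()
    if set(spacer) - set("ACGU"):
        raise ValueError("spacer must contain only A, C, G, U")
    if not 1 <= length < len(spacer):
        raise ValueError("truncated length must be shorter than the full spacer")
    return spacer[-length:]


full = "GACGCAUAAAGAUGAGACGC"  # hypothetical 20-nt spacer
assert len(full) == FULL_SPACER_NT
for n in (15, 12, 10):
    print(n, truncate_spacer(full, n))
```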

Enhancer binding sequences

An "enhancer binding sequence" as provided herein is a nucleic acid sequence that forms part of a target locus and is bound by a DNA binding modulation enhancer. In embodiments, the enhancer binding sequence is a TAL nucleic acid binding cassette. As used herein, a "TAL nucleic acid binding cassette" (also referred to as a "TAL cassette") refers to a nucleic acid encoding a polypeptide that allows a protein comprising the polypeptide to bind a single base pair (e.g., A, T, C, or G). In embodiments, the protein will comprise more than one polypeptide encoded by a TAL nucleic acid binding cassette. The individual amino acid sequences of the encoded multimers are referred to as "TAL repeats." In embodiments, TAL repeats will be between twenty-eight and forty amino acids in length and (for the amino acids present) will share at least 60% (e.g., at least about 65%, at least about 70%, at least about 75%, at least about 80%, about 60% to about 95%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 80% to about 95%, about 85% to about 95%, about 60% to about 90%, about 60% to about 85%, about 65% to about 90%, about 70% to about 90%, about 75% to about 90%, etc.) identity with the following thirty-four amino acid sequence: LTPDQVVAIA SXXGGKQALE TVQRLLPVLC QAHG (SEQ ID NO: 118).
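The identity criterion above can be sketched as a position-by-position comparison of a candidate TAL repeat against the 34-amino-acid sequence of SEQ ID NO: 118. This sketch assumes an ungapped comparison starting at the amino terminus, skips the two variable X positions, and scores only the residues present, which is one reading of "for the amino acids present"; a real analysis would likely use a proper sequence alignment. The candidate repeat sequences and the function names are hypothetical.

```python
# SEQ ID NO: 118; "X" marks the variable positions 12 and 13.
CONSENSUS = "LTPDQVVAIASXXGGKQALETVQRLLPVLCQAHG"


def identity_to_consensus(repeat: str, consensus: str = CONSENSUS) -> float:
    """Ungapped, position-by-position identity over the residues present.

    Positions where the consensus has "X" are skipped; residues beyond the
    shorter of the two sequences are not compared.
    """
    compared = matched = 0
    for residue, expected in zip(repeat.upper(), consensus):
        if expected == "X":
            continue
        compared += 1
        matched += residue == expected
    if compared == 0:
        raise ValueError("no comparable positions")
    return matched / compared


def meets_threshold(repeat: str, threshold: float = 0.60) -> bool:
    """True if the repeat reaches the chosen identity threshold (60% by default)."""
    return identity_to_consensus(repeat) >= threshold


# Hypothetical repeat with two substitutions outside the variable positions.
variant = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQDHG"
print(identity_to_consensus(variant), meets_threshold(variant))
```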

In embodiments, the two X residues at positions twelve and thirteen in the above sequence represent the amino acids of the encoded TAL repeat that recognize a particular base in a nucleic acid molecule.

In embodiments, the last TAL repeat present at the carboxy terminus of a run of repeats is typically a partial TAL repeat, in which the carboxy-terminal portion may be deleted (e.g., leaving approximately the amino-terminal 15 to 20 amino acids of this last TAL repeat).
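Because the residues at positions twelve and thirteen of each repeat recognize one base, and the final partial repeat also contributes a base, a TAL array for an N-nucleotide enhancer binding sequence can be sketched as N residue pairs carried on N - 0.5 repeats (for example, 17.5 repeats for an 18-nucleotide site, consistent with the repeat counts given in this disclosure). The base-to-residue code below (NI for A, HD for C, NN for G, NG for T) is a commonly used convention and an assumption of this sketch, not a code specified in the text; the sketch also ignores the 5' thymine that many TALE designs require immediately before the repeat-bound bases. The target sequence and function names are hypothetical.

```python
from __future__ import annotations

# Commonly used residue-pair-to-base code (an assumption; not specified in the text).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str) -> list[str]:
    """One residue pair per target base, 5' to 3'; the last belongs to the half repeat."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


def repeat_count(target_length: int) -> float:
    """Full repeats plus the carboxy-terminal half repeat (counted as 0.5)."""
    if target_length < 1:
        raise ValueError("target must contain at least one base")
    return target_length - 0.5


rvds = design_rvds("TACGGT")  # hypothetical target
print("-".join(rvds))
print(repeat_count(18))
```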

In embodiments, the enhancer binding sequence is a nucleic acid sequence capable of binding (hybridizing) to a guide RNA binding sequence or a guide DNA binding sequence. In embodiments, the first enhancer binding sequence has the sequence of SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, or SEQ ID NO: 40. In embodiments, the second enhancer binding sequence has the sequence of SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 39, or SEQ ID NO: 41.

Regulatory proteins and regulatory complexes

The regulatory proteins and regulatory complexes provided herein may or may not be endogenous to the cell. The terms "regulatory protein" and "regulatory complex" as provided herein refer to a molecule (e.g., a protein or protein conjugate) or molecular complex (e.g., a ribonucleoprotein complex) capable of structurally and/or chemically altering a target locus. Changes in the structure or chemical composition of a target locus may include changes in the entire target locus or portions thereof. Examples of regulatory proteins include, but are not limited to, double-stranded nucleases, nickases, transcriptional activators, transcriptional repressors, nucleic acid methylases, nucleic acid demethylases, topoisomerases, gyrases, ligases, methyltransferases, transposases, glycosylases, integrases, kinases, phosphatases, thioureas, polymerases, proteins having fluorescent activity, and recombinases. Non-limiting examples of regulatory complexes provided herein include ribonucleoprotein complexes and deoxyribonucleoprotein complexes.

In embodiments, the first regulatory protein or the second regulatory protein comprises a DNA binding protein or a DNA-modifying enzyme. The DNA binding protein may be a transcriptional repressor or a transcriptional activator. In embodiments, the DNA-modifying enzyme is a nuclease, deaminase, methylase, or demethylase. In embodiments, the first regulatory protein or the second regulatory protein comprises a histone-modifying enzyme. In embodiments, the histone-modifying enzyme is a deacetylase or an acetylase.

In embodiments, the first regulatory protein or the second regulatory protein comprises a first DNA binding domain operably linked to a first DNA modification domain. In embodiments, the first DNA binding domain is a TAL effector domain and the first DNA modification domain is a transcriptional activation domain or a transcriptional repression domain. In embodiments, the first DNA modification domain is the VP16 domain. In embodiments, the first DNA modification domain is the VP64 domain. In embodiments, the first DNA modification domain is a VP16, VP32, or VP64 transcriptional activation domain, or a KRAB transcriptional repression domain.

In embodiments, the first regulatory protein is a first DNA-binding nuclease conjugate. In embodiments, the second regulatory protein is a second DNA-binding nuclease conjugate. As used herein, "DNA-binding nuclease conjugate" refers to one or more molecules, enzymes, or molecular complexes having nucleic acid cleavage activity (e.g., double-stranded nucleic acid cleavage activity). In most embodiments, the components of a DNA-binding nuclease conjugate will be proteins, nucleic acids, or a combination of both, but they may be conjugated to cofactors and/or other molecules. The DNA-binding nuclease conjugate will typically be selected based on a variety of factors, such as the efficiency of double-strand break (DSB) generation at the target locus, the ability to generate a DSB at an appropriate location at or near the target locus, a low potential to generate DSBs at undesired loci, low toxicity, and cost. Many of these factors will vary depending on the cell used and the locus of interest. Many DNA-binding nuclease conjugates are known in the art. For example, in some embodiments, the DNA-binding nuclease conjugate comprises one or more zinc finger proteins, transcription activator-like effectors (TALEs), CRISPR complexes (e.g., Cas9 or Cpf1), homing endonucleases or meganucleases, or Argonaute-nucleic acid complexes. In some embodiments, the DNA-binding nuclease conjugates will have activity that allows them to be localized to the nucleus (e.g., will comprise a nuclear localization signal (NLS)). In some embodiments, a single-stranded DNA donor may function with a nick or a combination of nicks.

In embodiments, the DNA-binding nuclease conjugate is a TAL effector fusion. As provided herein, "TAL effector fusion" refers to a TAL effector linked to another polypeptide or protein (e.g., alogenin) with which it is not naturally associated in nature. In embodiments, a non-TAL component in a TAL effector fusion will confer a functional activity (e.g., enzymatic activity) to the fusion protein. In embodiments, TAL effector fusions can have binding activity or can have activity that directly or indirectly triggers nucleic acid modification, e.g., nuclease activity.

In embodiments, the first DNA-binding nuclease conjugate comprises a first nuclease and the second DNA-binding nuclease conjugate comprises a second nuclease. In embodiments, the first nuclease and the second nuclease form a dimer. In embodiments, the first nuclease and the second nuclease are independently a transcription activator-like effector nuclease (TALEN). In embodiments, the first nuclease and the second nuclease are independently the FokI nuclease cleavage domain mutant KKR Sharkey. In an embodiment, the first nuclease and the second nuclease are independently the FokI nuclease cleavage domain mutant ELD Sharkey.
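The pairing logic behind the KKR and ELD FokI cleavage domain mutants can be sketched as a small compatibility check: these variants are designed as an obligate heterodimer, so a KKR domain pairs with an ELD domain but neither is intended to pair with itself. This is a simplified model, not a structural prediction: it treats wild-type/variant mixtures as non-pairing, and it assumes the Sharkey substitutions affect cleavage activity rather than partner choice. The class and function names are hypothetical.

```python
from dataclasses import dataclass

VALID_INTERFACES = {"WT", "ELD", "KKR"}
# Simplified pairing rule: wild type pairs with wild type; ELD pairs only with KKR.
COMPATIBLE = {frozenset({"WT"}), frozenset({"ELD", "KKR"})}


@dataclass(frozen=True)
class FokIDomain:
    interface: str         # "WT", "ELD", or "KKR"
    sharkey: bool = False  # assumed not to affect partner choice in this sketch

    def __post_init__(self) -> None:
        if self.interface not in VALID_INTERFACES:
            raise ValueError(f"unknown interface: {self.interface}")


def can_dimerize(a: FokIDomain, b: FokIDomain) -> bool:
    """True if the two cleavage domains may form the dimer needed for a DSB."""
    return frozenset({a.interface, b.interface}) in COMPATIBLE


left = FokIDomain("KKR", sharkey=True)
right = FokIDomain("ELD", sharkey=True)
print(can_dimerize(left, right), can_dimerize(left, left))
```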

In embodiments, the first DNA-binding nuclease conjugate comprises a first transcription activator-like (TAL) effector domain (e.g., a DNA-binding portion of a TAL protein) operably linked to a first nuclease (TALEN). In embodiments, the first DNA-binding nuclease conjugate comprises a first TAL effector domain operably linked to a first FokI nuclease. In embodiments, the second DNA-binding nuclease conjugate comprises a second TAL effector domain operably linked to a second nuclease (TALEN). In embodiments, the second DNA-binding nuclease conjugate comprises a second TAL effector domain operably linked to a second FokI nuclease. In embodiments, the first DNA-binding nuclease conjugate comprises a first zinc finger nuclease. In embodiments, the second DNA-binding nuclease conjugate comprises a second zinc finger nuclease.

As used herein, the term "zinc finger nuclease" refers to a protein comprising a polypeptide having a zinc-stabilized nucleic acid (e.g., DNA) binding domain. Individual DNA binding domains are often referred to as "fingers," such that a zinc finger protein or polypeptide has at least one finger, more typically two or three fingers, and may have four, five, six, or more fingers. In certain aspects, the zinc finger nuclease will comprise three or four zinc fingers. Each finger typically binds two to four base pairs of DNA and is a zinc-chelating DNA binding region that typically comprises about 30 amino acids (see, e.g., U.S. Patent Publication No. 2012/0329067 A1, the disclosure of which is incorporated herein by reference).

An example of a nuclease that forms part of the conjugates provided herein is the non-specific cleavage domain of the type IIS restriction endonuclease FokI (Kim, Y.G. et al., Proc. Natl. Acad. Sci. USA 93: 1156-60 (1996)). In zinc finger nucleases (ZFNs), the binding sites of the two monomers are typically separated by a spacer sequence of 5-7 base pairs. A pair of FokI cleavage domains is usually required to allow dimerization of the domains and cleavage of the non-palindromic target sequence from opposite strands. The DNA-binding domain of an individual Cys₂His₂ ZFN typically comprises 3 to 6 individual zinc finger repeats, which together recognize 9 to 18 base pairs.
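The footprint of a ZFN pair follows from the numbers above: each monomer's binding site spans 9 to 18 base pairs (3 to 6 fingers), and the two binding sites are separated by a 5-7 base pair spacer. The sketch below assumes exactly 3 base pairs per finger, the midpoint of the two to four base pairs per finger stated earlier; the function names are hypothetical.

```python
BP_PER_FINGER = 3  # assumption: ~3 bp per finger (the text gives two to four)


def half_site_length(fingers: int) -> int:
    """Base pairs bound by one ZFN monomer with 3 to 6 fingers."""
    if not 3 <= fingers <= 6:
        raise ValueError("expected 3 to 6 zinc finger repeats")
    return fingers * BP_PER_FINGER


def zfn_pair_target_length(left_fingers: int, right_fingers: int, spacer_bp: int) -> int:
    """Total length: left half-site + spacer + right half-site."""
    if not 5 <= spacer_bp <= 7:
        raise ValueError("expected a 5-7 bp spacer")
    return half_site_length(left_fingers) + spacer_bp + half_site_length(right_fingers)


print(zfn_pair_target_length(3, 3, 6))
print(zfn_pair_target_length(6, 6, 7))
```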

As used herein, a "transcription activator-like effector" (TALE) refers to a protein that consists of more than one TAL repeat and is capable of binding nucleic acids in a sequence-specific manner. TALEs represent a class of DNA binding proteins secreted by plant pathogenic bacterial species (e.g., Xanthomonas and Ralstonia) via their type III secretion system upon infection of plant cells. Natural TALEs have been shown, among other things, to bind to plant promoter sequences, thereby regulating gene expression and activating effector-specific host genes that promote bacterial proliferation (Römer, P. et al., Science 318: 645-648 (2007); Boch, J. et al., Annu. Rev. Phytopathol. 48: 419-436 (2010); Kay, S. et al., Science 318: 648-651 (2007); Kay, S. et al., Curr. Opin. Microbiol. 12: 37-43 (2009)).

The modular structure of TAL repeats allows DNA binding domains to be combined with effector molecules (e.g., nucleases). In particular, TALE nucleases have enabled the development of new genome engineering tools. TALEs used in some embodiments can generate double-strand breaks or can act in combination to generate double-strand breaks. For example, TAL-FokI nuclease fusions can be designed to bind at or near a target locus and to generate double-stranded nucleic acid cleavage activity through dimerization of the two FokI domains.

In some embodiments, the TALE will comprise greater than or equal to 6 TAL repeats (e.g., greater than or equal to 8, 10, 12, 15, or 17, or 6 to 25, 6 to 35, 8 to 25, 10 to 25, 12 to 25, 8 to 22, 10 to 22, 12 to 22, 6 to 20, 8 to 20, 10 to 22, 12 to 20, 6 to 18, 10 to 18, 12 to 18, etc.). In some embodiments, the TALE may comprise 18, 24, 17.5, or 23.5 TAL nucleic acid binding cassettes. In further embodiments, the TALE may comprise 15.5, 16.5, 18.5, 19.5, 20.5, 21.5, 22.5, or 24.5 TAL nucleic acid binding cassettes. A TALE will typically have at least one region comprising TAL repeats flanked by other polypeptide regions. In many embodiments, the flanking regions will be present at the amino terminus and the carboxy terminus of the TAL repeats. Exemplary TALEs are described in U.S. Patent Publication No. 2013/0274129 A1, the disclosure of which is incorporated herein by reference, and can be modified forms of naturally occurring proteins found in Burkholderia, Xanthomonas, and Ralstonia bacteria. In some embodiments, the TALE protein will contain a nuclear localization signal (NLS) that allows it to be transported to the nucleus.

For the methods and compositions provided herein, the nucleic acid targeting ability of the regulatory protein or regulatory complex is enhanced relative to the absence of a DNA binding regulatory enhancer. In embodiments, the rate of homologous recombination at a target locus is increased relative to the absence of a DNA binding modulation enhancer.

The agents provided herein may be endogenous or non-endogenous to the cells in which they are expressed. Thus, in embodiments, the first regulatory protein or first regulatory complex is not endogenous to the cell. In embodiments, the first regulatory protein, first regulatory complex, second regulatory protein, or second regulatory complex is not endogenous to the cell. In embodiments, the first regulatory protein and the second regulatory protein are not endogenous to the cell. In embodiments, the first regulatory complex and the second regulatory complex are not endogenous to the cell. In embodiments, the first DNA binding modulation enhancer or the second DNA binding modulation enhancer is not endogenous to the cell. In embodiments, the first DNA binding modulation enhancer and the second DNA binding modulation enhancer are not endogenous to the cell.

Applicants have surprisingly found that the distance of the first and/or second enhancer binding site from the regulator binding sequence influences the effect of the DNA binding modulation enhancer on the activity of the regulatory protein or regulatory complex. The distance between the first enhancer binding site and the regulator binding sequence is the number of nucleotides linking the 3'-most nucleotide of the first enhancer binding sequence to the 5'-most nucleotide of the regulator binding sequence. Similarly, the distance between the second enhancer binding site and the regulator binding sequence is the number of nucleotides linking the 3'-most nucleotide of the regulator binding sequence to the 5'-most nucleotide of the second enhancer binding sequence. The regulator binding sequence may be bound by a protein (e.g., a DNA binding protein) or a nucleic acid (e.g., a gRNA or gDNA). The regulatory site included in the regulator binding sequence is the position of a nucleotide in the regulator binding sequence that is recognized by the regulatory protein or regulatory complex and corresponds to a nucleotide whose bond to the rest of the regulator binding sequence is hydrolyzed.
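The two distance definitions above can be written directly as coordinate arithmetic. This minimal sketch assumes hypothetical 0-based, inclusive coordinates on the strand read 5' to 3', with the first enhancer binding sequence upstream and the second downstream of the regulator binding sequence; the class and function names are illustrative. The example layout uses 7-nucleotide spacings on both sides and an 18 + 16 + 18 nucleotide regulator binding sequence, matching embodiments described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """0-based inclusive coordinates on the strand read 5' to 3'."""
    start: int
    end: int


def first_enhancer_distance(first_ebs: Site, rbs: Site) -> int:
    """Nucleotides between the 3'-most nt of the first EBS and the 5'-most nt of the RBS."""
    gap = rbs.start - first_ebs.end - 1
    if gap < 0:
        raise ValueError("first enhancer binding sequence overlaps the regulator binding sequence")
    return gap


def second_enhancer_distance(rbs: Site, second_ebs: Site) -> int:
    """Nucleotides between the 3'-most nt of the RBS and the 5'-most nt of the second EBS."""
    gap = second_ebs.start - rbs.end - 1
    if gap < 0:
        raise ValueError("second enhancer binding sequence overlaps the regulator binding sequence")
    return gap


# Hypothetical layout: 18-nt EBS, 7-nt gap, 52-nt RBS, 7-nt gap, 18-nt EBS.
first_ebs = Site(0, 17)
rbs = Site(25, 76)
second_ebs = Site(84, 101)
print(first_enhancer_distance(first_ebs, rbs), second_enhancer_distance(rbs, second_ebs))
```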

In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by less than 200 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by less than 150 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by less than 100 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by less than 50 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by 4 to 30 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is 7 to 30 nucleotides apart from the regulator binding sequence. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by 4 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is 7 nucleotides apart from the regulator binding sequence. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is 12 nucleotides apart from the regulator binding sequence. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulator binding sequence by 20 nucleotides. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is 30 nucleotides apart from the regulator binding sequence.

In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is 10 to 40 nucleotides apart from the regulatory site. In embodiments, the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulatory site by 33 nucleotides.

In embodiments, the first enhancer binding sequence is 30 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 19 nucleotides apart from the regulator binding sequence. In other embodiments, the first enhancer binding sequence and the second enhancer binding sequence are each 18 nucleotides in length. In further embodiments, the regulator binding sequence comprises a first binding sequence and a second binding sequence that are each 18 nucleotides in length and are separated by 16 nucleotides.

Various forms are provided herein for enhancing the accessibility of a target locus to other components present in a cell (e.g., donor DNA molecules, regulatory proteins, regulatory complexes). Accessibility is enhanced by DNA binding modulation enhancers that bind to specific DNA sequences (enhancer binding sequences) within the target locus. The DNA binding modulation enhancer may be a truncated gRNA or a TAL effector domain. In embodiments, two DNA binding modulation enhancers (e.g., a first and a second DNA binding modulation enhancer) bind to a target locus. Where two DNA binding modulation enhancers (e.g., two TAL effector domains or two truncated gRNAs) bind to a target locus, they may flank a regulatory sequence, such as a nuclease cleavage site. Binding of the DNA binding modulation enhancers to their corresponding enhancer binding sequences can make the regulatory sequence of the target locus more accessible than it is in the absence of these enhancers.

The invention provides, inter alia, target loci in which two TAL effector domains each bind to their respective binding sequences (enhancer binding sequences), and these enhancer binding sequences flank a regulator binding sequence having a regulatory site (e.g., a nuclease cleavage site). Where the enhancer binding sequences flank the regulator binding sequence, the first enhancer binding sequence is linked to the second enhancer binding sequence through the regulator binding sequence. Thus, in the 5' to 3' direction, the target locus may encode a first enhancer binding sequence, linked to a regulator binding sequence, linked to a second enhancer binding sequence. Binding of the two TAL effector domains to their respective enhancer binding sequences increases the accessibility of the target locus, particularly where the regulator binding sequence is bound and/or modified by a pair of TALENs. Each of the two enhancer binding sequences may be separated from the regulator binding sequence by, for example, 7 nucleotides. In that case, the 3'-most nucleotide (i.e., the last nucleotide) of the first enhancer binding sequence is linked to the 5'-most nucleotide (i.e., the first nucleotide) of the regulator binding sequence by a sequence of 7 contiguous nucleotides. Similarly, the 5'-most nucleotide (i.e., the first nucleotide) of the second enhancer binding sequence is linked to the 3'-most nucleotide (i.e., the last nucleotide) of the regulator binding sequence by a sequence of 7 contiguous nucleotides. Where two regulatory proteins, two regulatory complexes, or a combination thereof bind to the regulator binding sequence, each may bind to a separate binding sequence (a first binding sequence and a second binding sequence, respectively).

The first binding sequence may be included in the 5' portion of the regulator binding sequence, and the second binding sequence may form part of the 3' portion. Thus, in the 5' to 3' direction, the regulator binding sequence may comprise a first binding sequence linked to a second binding sequence by at least one nucleotide. In embodiments, the 5'-most nucleotide (i.e., the first nucleotide) of the regulator binding sequence is the 5'-most nucleotide of the first binding sequence, and the 3'-most nucleotide (i.e., the last nucleotide) of the regulator binding sequence is the 3'-most nucleotide of the second binding sequence.

In addition, each of the two enhancer binding sequences may be separated from the cleavage site (regulatory site) by 33 nucleotides. In that case, the 3'-most nucleotide of the first enhancer binding sequence is linked to the 5'-most nucleotide of the regulatory site by a sequence of 33 contiguous nucleotides. Similarly, the 5'-most nucleotide of the second enhancer binding sequence is linked to the 3'-most nucleotide of the regulatory site by a sequence of 33 contiguous nucleotides.

Thus, in one embodiment, the target locus comprises a first enhancer binding sequence that binds to a first TAL effector protein; a second enhancer binding sequence that binds to a second TAL effector protein; a first DNA-binding nuclease conjugate consisting of a first TAL effector domain operably linked to a first TALEN, wherein the first conjugate binds to a regulator binding sequence at a first binding site; and a second DNA-binding nuclease conjugate consisting of a second TAL effector domain operably linked to a second TALEN, wherein the second conjugate binds to a regulator binding sequence at a second binding site. In another embodiment, the first enhancer binding sequence is 7 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 7 nucleotides apart from the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 7 nucleotides apart from the first binding sequence of the regulator binding sequence and the second enhancer binding sequence is 7 nucleotides apart from the second binding sequence of the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 33 nucleotides away from the regulatory site and the second enhancer binding sequence is 33 nucleotides away from the regulatory site. In another embodiment, the first enhancer binding sequence is 12 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 12 nucleotides apart from the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 4 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 4 nucleotides apart from the regulator binding sequence.

In one embodiment, the target locus comprises a first enhancer binding sequence that binds to a first TAL effector protein; a second enhancer binding sequence that binds to a second TAL effector protein; a first DNA-binding nuclease conjugate consisting of a first TAL effector domain operably linked to a first TALEN, wherein the first conjugate binds to a regulator binding sequence at a first binding site; and a second DNA-binding nuclease conjugate consisting of a second TAL effector domain operably linked to a second TALEN, wherein the second conjugate binds to a regulator binding sequence at a second binding site. In another embodiment, the first enhancer binding sequence is 30 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 19 nucleotides apart from the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 30 nucleotides apart from the first binding sequence of the regulator binding sequence and the second enhancer binding sequence is 19 nucleotides apart from the second binding sequence of the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 18 nucleotides in length and the second enhancer binding sequence is 18 nucleotides in length. In another embodiment, the first binding sequence of the regulator binding sequence is separated from the second binding sequence of the regulator binding sequence by 16 nucleotides.

In one embodiment, the target locus comprises a first enhancer binding sequence that binds to a first TAL effector protein; a second enhancer binding sequence that binds to a second TAL effector protein; and a ribonucleoprotein complex consisting of a Cas9 domain bound to a guide RNA, wherein the ribonucleoprotein complex is bound to a regulator binding sequence. In another embodiment, the first enhancer binding sequence is 7 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 7 nucleotides apart from the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 20 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 20 nucleotides apart from the regulator binding sequence.

In one embodiment, the target locus comprises a first enhancer binding sequence that binds to a first TAL effector protein; a second enhancer binding sequence that binds to a second TAL effector protein; and a DNA binding conjugate consisting of a TAL effector domain operably linked to a transcriptional activation domain, wherein the DNA binding conjugate binds to a regulator binding sequence. In another embodiment, the first enhancer binding sequence is 30 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 30 nucleotides apart from the regulator binding sequence. In another embodiment, the first enhancer binding sequence is 18 nucleotides in length and the second enhancer binding sequence is 18 nucleotides in length. In another embodiment, the regulator binding sequence is 18 nucleotides in length.

In another embodiment, the target locus comprises a first enhancer binding sequence that binds to a first truncated guide RNA that binds to Cas9 protein; a second enhancer-binding sequence that binds to a second truncated guide RNA that binds to Cas9 protein; and a ribonucleoprotein complex consisting of a Cas9 domain bound to a guide RNA, wherein the ribonucleoprotein complex is bound to a regulator binding sequence. In another embodiment, the first enhancer binding sequence is 30 nucleotides apart from the regulator binding sequence and the second enhancer binding sequence is 15 nucleotides apart from the regulator binding sequence.

In one embodiment, the regulator binding sequence is 52 nucleotides in length. In another embodiment, the first binding sequence is 18 nucleotides in length. In another embodiment, the second binding sequence is 18 nucleotides in length.
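As a worked check of the lengths in the TALEN embodiment described above (18-nt enhancer binding sequences spaced 30 and 19 nucleotides from the regulator binding sequence, and two 18-nt binding sequences separated by 16 nucleotides), the segments can be laid end to end. This Python sketch only performs the arithmetic; the segment names are hypothetical labels, and coordinates are assumed to be 0-based and half-open.

```python
def layout(segments: list[tuple[str, int]]) -> dict[str, tuple[int, int]]:
    """Place consecutive segments 5'->3' and return 0-based, half-open coordinates."""
    coords: dict[str, tuple[int, int]] = {}
    pos = 0
    for name, length in segments:
        coords[name] = (pos, pos + length)
        pos += length
    return coords


segments = [
    ("enhancer_1", 18),     # first enhancer binding sequence
    ("gap_5p", 30),         # first enhancer -> regulator binding sequence
    ("binding_seq_1", 18),  # first binding sequence (5' portion of the regulator binding sequence)
    ("spacer", 16),         # nucleotides between the two binding sequences
    ("binding_seq_2", 18),  # second binding sequence (3' portion)
    ("gap_3p", 19),         # regulator binding sequence -> second enhancer
    ("enhancer_2", 18),     # second enhancer binding sequence
]
coords = layout(segments)

regulator_len = coords["binding_seq_2"][1] - coords["binding_seq_1"][0]
total_len = coords["enhancer_2"][1]
print(regulator_len)  # → 52
print(total_len)      # → 137
```

The computed regulator binding sequence length, 18 + 16 + 18 = 52 nucleotides, agrees with the 52-nt embodiment stated above; under these assumptions the full span from the first to the second enhancer binding sequence is 137 nucleotides.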

In one embodiment, the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein.

In one embodiment, the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA. In another embodiment, the truncated gRNA binds to a Cas9 protein.

In one embodiment, the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA. In another embodiment, the truncated first gRNA binds to a first Cas9 protein and the truncated second gRNA binds to a second Cas9 protein.

In one embodiment, the first DNA binding modulation enhancer is a first TAL effector protein; the second DNA binding modulation enhancer is a second TAL effector protein; the first regulatory protein is a first DNA-binding nuclease conjugate consisting of a first TAL effector domain operably linked to a first TALEN; and the second regulatory protein is a second DNA-binding nuclease conjugate consisting of a second TAL effector domain operably linked to a second TALEN.

In one embodiment, the first DNA binding modulation enhancer is a first TAL effector protein; the second DNA binding modulation enhancer is a second TAL effector protein; and the regulatory protein complex is a Cas9 domain that binds to the guide RNA.

In one embodiment, the first DNA binding modulation enhancer is a truncated first gRNA that binds to a Cas9 protein; the second DNA binding modulation enhancer is a truncated second gRNA that binds to Cas9 protein; and the regulatory protein complex is a Cas9 domain that binds to the guide RNA.

In one embodiment, the first DNA binding modulation enhancer is a truncated first gRNA that binds to a Cas9 protein; the second DNA binding modulation enhancer is a truncated second gRNA that binds to Cas9 protein; the first regulatory protein is a first DNA-binding nuclease conjugate consisting of a first TAL effector domain operably linked to a first TALEN; and the second regulatory protein is a second DNA-binding nuclease conjugate consisting of a second TAL effector domain operably linked to a second TALEN.

Nucleic acid molecules for intracellular alterations

A donor nucleic acid molecule (e.g., a donor DNA molecule) typically contains at least one homology region corresponding to a nucleic acid at or near a target locus and an insertion region designed to modify the target locus. Donor nucleic acid molecules designed for homologous recombination will typically have at least three regions in the following order: (1) a first homology region corresponding to a nucleic acid at or near the target locus; (2) an insertion region; and (3) a second homology region corresponding to a nucleic acid at or near the target locus (see FIG. 38). Donor nucleic acid molecules may be single-stranded (SS) or double-stranded (DS), and each end may be blunt or may carry an overhang. Overhangs, when present, may be 5' overhangs, 3' overhangs, or both, and their length may vary. The insertion region may range from about one nucleotide to about several thousand nucleotides.

As mentioned above, the overhangs, when present, may have different sizes. The overhang may have about 1 to about 1,000 nucleotides (e.g., about 1 to about 1,000, about 5 to about 1,000, about 10 to about 1,000, about 25 to about 1,000, about 30 to about 1,000, about 40 to about 1,000, about 50 to about 1,000, about 60 to about 1,000, about 70 to about 1,000, about 80 to about 1,000, about 100 to about 1,000, about 1 to about 800, about 1 to about 700, about 1 to about 500, about 1 to about 400, about 1 to about 300, about 10 to about 600, about 10 to about 400, about 10 to about 250, about 30 to about 700, about 50 to about 600, about 50 to about 250, about 75 to about 800, about 80 to about 500, about 100 to about 800, about 100 to about 600 nucleotides, etc.).

The efficiency of homologous recombination is increased when one or both ends of the donor nucleic acid molecule "match" the ends of the double-strand break that is to be introduced. In addition, the donor nucleic acid molecule may be exposed to nucleases (e.g., endonucleases) both before and after it enters the cell. To limit alteration of the donor nucleic acid molecule by such nucleases, one or more nuclease-resistant groups may be present.

The intracellular nucleic acid molecule to be modified can be any intracellular nucleic acid molecule, including chromosomes, nuclear plasmids, chloroplast genomes, and mitochondrial genomes, and may be located anywhere in the cell.

FIG. 38 shows a number of variants of donor nucleic acid molecules that can be used in the methods set forth herein. Open circles at the ends represent nuclease-resistant groups, which may be located at many positions in the donor nucleic acid molecule. In donor nucleic acid molecule number 6, the 3'-terminal region of the lower strand extends beyond the nuclease-resistant group. In some cases, cellular nucleases will digest this portion of the donor nucleic acid molecule; digestion is then stopped or slowed at the nuclease-resistant group, which stabilizes the 3' end of the lower strand.

Compositions comprising nucleic acid molecules that contain one or more (e.g., one, two, three, four, five, six, seven, etc.) nuclease-resistant groups can be used to practice the methods described herein. In many cases, the nuclease-resistant groups will be located at one or both ends of the donor nucleic acid molecule, although the donor nucleic acid molecule may also contain such groups at internal positions. In many cases, some or all of these donor nucleic acid molecules will be processed within the cell to create ends that match the double-strand break site.
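One widely used nuclease-resistant modification is the phosphorothioate linkage, which this disclosure mentions for modified donor molecules. The Python sketch below writes an oligonucleotide sequence with phosphorothioate linkages at both ends, using a `*` after a nucleotide to mark a phosphorothioate bond to the next nucleotide. That notation, the function name, and the default of two modified linkages per end are assumptions for illustration, not requirements of the disclosure.

```python
def add_terminal_phosphorothioates(seq: str, n: int = 2) -> str:
    """Mark the first n and last n internucleotide linkages as phosphorothioates.

    A '*' written after a nucleotide denotes a phosphorothioate bond between
    that nucleotide and the next one (an assumed, vendor-style notation).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    linkages = len(seq) - 1  # linkage i joins seq[i] and seq[i + 1]
    if 2 * n > linkages:
        raise ValueError("sequence too short for the requested modifications")
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        if i < linkages and (i < n or i >= linkages - n):
            out.append("*")
    return "".join(out)


print(add_terminal_phosphorothioates("ACGTACGT", 2))  # → A*C*GTAC*G*T
```

Protecting only the terminal linkages reflects the emphasis above on nuclease-resistant groups at one or both ends; internal positions could be marked the same way if needed.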

Homology regions can vary in length and in their degree of sequence identity with the nucleic acid at the target locus. Generally, the efficiency of homologous recombination increases with both the length of the homology region and its sequence identity. The length of the homology region used is generally determined by factors such as the fragility of large nucleic acid molecules, transfection efficiency, and the ease of generating nucleic acid molecules containing homology regions.

The total length of the homology regions can be from about 20 bases to about 10,000 bases (e.g., from about 20 bases to about 100 bases, from about 30 bases to about 100 bases, from about 40 bases to about 100 bases, from about 50 bases to about 8,000 bases, from about 50 bases to about 7,000 bases, from about 50 bases to about 6,000 bases, from about 50 bases to about 5,000 bases, from about 50 bases to about 3,000 bases, from about 50 bases to about 2,000 bases, from about 50 bases to about 1,000 bases, from about 50 bases to about 800 bases, from about 50 bases to about 600 bases, from about 50 bases to about 500 bases, from about 50 bases to about 400 bases, from about 50 bases to about 300 bases, from about 50 bases to about 200 bases, from about 100 bases to about 8,000 bases, from about 100 bases to about 2,000 bases, from about 100 bases to about 700 bases, from about 100 bases to about 600 bases, from about 100 bases to about 400 bases, from about 100 bases to about 300 bases, from about 150 bases to about 1,000 bases, from about 150 bases to about 500 bases, from about 150 bases to about 400 bases, from about 200 bases to about 1,000 bases, from about 200 bases to about 600 bases, from about 200 bases to about 400 bases, from about 200 bases to about 300 bases, from about 250 bases to about 2,000 bases, from about 250 bases to about 1,000 bases, from about 350 bases to about 2,000 bases, from about 350 bases to about 1,000 bases, etc.).

In some cases, it may be desirable to use sequence homology regions of less than 200 bases in length. This will typically be the case when the donor nucleic acid molecule comprises a small insert (e.g., less than about 300 bases) and/or when the donor nucleic acid molecule has one or two overhangs that match the double strand break site.

The overhangs may be of various lengths, and the two ends of the same donor nucleic acid molecule may carry overhangs of different lengths. In many cases, these overhangs will form regions of sequence homology. For example, FIG. 38 shows a series of donor nucleic acid molecules, in both single-stranded and double-stranded forms, with 30-nucleotide single-stranded overhangs. Donor nucleic acid molecule number 1 in FIG. 38 is a single-stranded molecule with a 30-nucleotide sequence matching the predetermined double-strand break site, a 30-nucleotide insertion, and two nuclease-resistant groups at each end.

In general, the greater the sequence identity shared by the homology region and the target locus nucleic acid, the greater the efficiency of homologous recombination. A high level of sequence identity is particularly desirable when the homology region is relatively short (e.g., 50 bases). Typically, the sequence identity between the target locus and the homology region will be greater than 90% (e.g., about 90% to about 100%, about 90% to about 99%, about 90% to about 98%, about 95% to about 100%, about 95% to about 99%, about 95% to about 98%, about 97% to about 100%, etc.).
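Percent identity between a homology region and the target locus can be estimated as follows. This Python sketch assumes an ungapped comparison of two equal-length sequences (no insertions or deletions); gapped comparisons would need a proper alignment step instead. The function name and example sequences are hypothetical.

```python
def percent_identity(homology_region: str, target: str) -> float:
    """Ungapped percent identity between two equal-length DNA sequences."""
    if len(homology_region) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(homology_region.upper(), target.upper()))
    return 100.0 * matches / len(target)


arm = "ACGTACGTAC"
locus = "ACGTACGTAA"  # one mismatch at the last position
print(percent_identity(arm, locus))  # → 90.0
```

The comparison is case-insensitive so that soft-masked (lowercase) sequence does not count as a mismatch.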

The insertion region of the donor nucleic acid molecule can have a variety of lengths depending on its intended use. In many cases, the insertion region is about 1 to about 4,000 bases in length (e.g., about 1 to 3,000, about 1 to 2,000, about 1 to 1,500, about 1 to 1,000, about 2 to 1,000, about 3 to 1,000, about 5 to 1,000, about 10 to 400, about 10 to 50, about 15 to 65, about 2 to 15 bases, etc.).

Also provided herein are compositions and methods for introducing a small number of bases (e.g., about 1 to about 10, about 1 to about 6, about 1 to about 5, about 1 to about 2, about 2 to about 10, about 2 to about 6, about 3 to about 8, etc.) into a nucleic acid within a cell. For illustrative purposes, a donor nucleic acid molecule fifty-one base pairs in length can be prepared with two homology regions, each 25 base pairs in length, separated by a single intervening base pair. When the nucleic acid at the target locus substantially matches the two homology regions joined directly (i.e., without the intervening base pair), homologous recombination will introduce the single base pair at the target locus. Such homologous recombination reactions can be used, for example, to introduce a frameshift that disrupts a protein-coding reading frame in the nucleic acid within the cell. Thus, the present invention provides compositions and methods for introducing one or a small number of bases into intracellular nucleic acid molecules.
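The 51-base-pair design described above (two 25-bp homology regions flanking one inserted base pair) can be sketched in Python. The code assumes a known target sequence and a 0-based insertion position, models only one strand, and uses hypothetical function names and a toy target sequence.

```python
def insertion_donor(target: str, pos: int, insert: str, arm_len: int = 25) -> str:
    """Donor = arm_len bases 5' of `pos` + insert + arm_len bases starting at `pos`.

    `pos` is the 0-based index in `target` before which `insert` is placed.
    """
    if pos - arm_len < 0 or pos + arm_len > len(target):
        raise ValueError("target too short for the requested homology arms")
    return target[pos - arm_len:pos] + insert + target[pos:pos + arm_len]


def expected_edit(target: str, pos: int, insert: str) -> str:
    """Sequence expected at the locus after homologous recombination with the donor."""
    return target[:pos] + insert + target[pos:]


target = "ACGT" * 15  # toy 60-nt target, not a real locus
donor = insertion_donor(target, pos=30, insert="A")
print(len(donor))         # → 51
print(len("A") % 3 != 0)  # → True: a 1-bp insertion shifts the reading frame
```

Any insertion whose length is not a multiple of three shifts the downstream reading frame, which is why a single inserted base pair is sufficient to disrupt a coding sequence.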

The invention further provides compositions and methods for altering short nucleotide sequences in nucleic acid molecules within cells. One example is the alteration of a single nucleotide position, such as the correction or alteration of a single nucleotide polymorphism (SNP). For illustrative purposes, a donor nucleic acid molecule for SNP alteration can be designed with two homology regions, each 25 base pairs in length. Between these homology regions is a single base pair that is a "mismatch" relative to the corresponding base pair in the intracellular nucleic acid molecule. Homologous recombination can thus be used to change the SNP to a base pair considered wild-type or to another base (e.g., a different SNP). Cells that have correctly undergone homologous recombination can be identified by subsequently sequencing the target locus.
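A SNP-altering donor differs from an insertion donor only in that the central base replaces, rather than adds to, the target base. The Python sketch below builds such a donor and classifies a sequencing read from the target locus. It assumes reads are already trimmed to the same coordinates as the reference; all names and sequences are hypothetical.

```python
def substitution_donor(target: str, pos: int, new_base: str, arm_len: int = 25) -> str:
    """Donor = arm_len bases 5' of `pos` + new_base + arm_len bases 3' of `pos`."""
    if not arm_len <= pos < len(target) - arm_len:
        raise ValueError("target too short for the requested homology arms")
    if target[pos] == new_base:
        raise ValueError("new base is identical to the existing base")
    return target[pos - arm_len:pos] + new_base + target[pos + 1:pos + 1 + arm_len]


def classify_read(read: str, target: str, pos: int, new_base: str) -> str:
    """Label a locus-aligned read as 'edited', 'unedited', or 'other'."""
    if read == target:
        return "unedited"
    if read == target[:pos] + new_base + target[pos + 1:]:
        return "edited"
    return "other"


target = "ACGT" * 15  # toy 60-nt reference; target[30] is "G"
donor = substitution_donor(target, pos=30, new_base="T")
print(len(donor), donor[25])  # → 51 T
```

Reads labeled "other" (e.g., with unintended insertions or deletions) would be excluded when identifying correctly edited cells.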

The invention also includes compositions and methods for altering the genome for therapeutic applications, including SNP alterations. For illustrative purposes, two genetic diseases caused by SNP changes are listed below.

The most common SNP associated with sickle cell anemia is rs334, which changes a single codon from GAG to GTG, resulting in substitution of a glutamic acid residue with a valine residue. The compositions and methods set forth herein are suitable for changing this codon from GTG back to GAG, particularly in individuals homozygous for rs334. One reason relates to the toxicity that can be induced by introducing nucleic acid molecules into cells, the magnitude of which increases with the amount of nucleic acid introduced. As shown in the examples below, the efficiency of genomic insertion means that only relatively small amounts of donor DNA need to be introduced into the cells (see, e.g., the donor DNA-NLS conjugate data in FIGS. 11 and 13).
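The codon change described for rs334 can be checked with a few lines of Python. The sketch compares the two sense-strand codons named above and looks up their amino acids in a deliberately minimal codon table containing only those two codons; it is an illustration, not a variant-annotation tool.

```python
# Minimal sense-strand codon table: only the two codons discussed for rs334.
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val"}


def point_differences(current: str, desired: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, current base, desired base) for each mismatch."""
    if len(current) != len(desired):
        raise ValueError("codons must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(current, desired)) if a != b]


sickle, wild_type = "GTG", "GAG"
print(point_differences(sickle, wild_type))               # → [(1, 'T', 'A')]
print(CODON_TO_AA[sickle], "->", CODON_TO_AA[wild_type])  # → Val -> Glu
```

A single T-to-A change at the middle position of the codon is therefore sufficient to restore glutamic acid, so a single-base substitution donor applies.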

An exemplary ex vivo workflow for altering SNP rs334 in a patient would include removing bone marrow tissue from the patient, altering SNP rs334 in the harvested cells, and then reintroducing the edited cells into the patient.

One of the most common genomic alterations associated with cystic fibrosis is a three-base-pair deletion in the cystic fibrosis transmembrane conductance regulator (CFTR) gene (SNP rs199826652), which results in deletion of the phenylalanine at amino acid position 508.

An in vivo workflow for altering SNP rs199826652 in a patient includes delivering a donor DNA molecule to the patient's tracheal cells under conditions in which a three-base-pair insertion occurs that corrects SNP rs199826652.
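Why a three-base-pair insertion can restore the missing amino acid without disturbing the downstream reading frame can be shown with a toy example. The Python sketch below uses a short hypothetical open reading frame (not the CFTR sequence) and a minimal codon table covering only the codons it contains; all names and sequences are assumptions for illustration.

```python
# Minimal codon table covering only the toy sequences below.
CODONS = {"ATG": "M", "GCT": "A", "TTT": "F", "GGT": "G", "TAA": "*"}


def translate(orf: str) -> str:
    """Translate an in-frame ORF codon by codon using the minimal table."""
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return "".join(CODONS[orf[i:i + 3]] for i in range(0, len(orf), 3))


wild_type = "ATGGCTTTTGGTTAA"  # hypothetical: Met-Ala-Phe-Gly-stop
mutant = "ATGGCTGGTTAA"        # the same ORF with the TTT (Phe) codon deleted
repaired = mutant[:6] + "TTT" + mutant[6:]  # three-base insertion supplied by a donor

print(translate(mutant))      # → MAG*
print(translate(repaired))    # → MAFG*
print(repaired == wild_type)  # → True
```

Because both the deletion and the corrective insertion are three bases long, every codon downstream of the edit keeps its original frame.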

The low doses of donor nucleic acid required for efficient gene editing also make systemic delivery possible, because low doses are associated with reduced toxicity. Low donor DNA molecule levels are particularly important when modified nucleic acid molecules (e.g., nucleic acid molecules having phosphorothioate linkages) are used.

The donor nucleic acid molecule can be linked to an extracellular targeting moiety as well as an intracellular targeting moiety. An "extracellular targeting moiety" is a molecule that directs a donor nucleic acid molecule to one or more cell types; such moieties include cell surface receptor ligands and antibodies. Domain II of Pseudomonas exotoxin has been shown to be involved in translocation across cell membranes (Jinno et al., J. Biol. Chem. 263: 13203-13207 (1988)). Thus, an exemplary system for delivering nucleic acid molecules to subcellular locations in an organism may involve the following components: (1) a donor DNA molecule, (2) a nuclear localization signal (NLS), and (3) a fusion protein comprising an antibody that binds to a cell surface receptor and domain II of Pseudomonas exotoxin, wherein the NLS and the fusion protein are covalently bound to the donor DNA molecule. A donor DNA molecule of this type allows systemic delivery, with the donor DNA molecule being delivered to a subcellular location within cells that carry the cell surface receptor.

In each of the two examples described above, only one allele needs to be changed for the patient to gain considerable benefit. However, in many cells both alleles will be altered. Thus, the present invention encompasses the treatment of afflictions caused by both homozygous and heterozygous genotypes.

The donor nucleic acid molecule may also be designed to alter a chromosomal open reading frame so as to produce a functional coding region. One example is the removal of a stop codon from an open reading frame. Such a stop codon may be removed because it is absent from the wild-type open reading frame (i.e., it represents a departure from wild type), or it may be a stop codon naturally present at the end of the open reading frame. Stop codons may also be introduced into a coding region, which is particularly useful when attempting to disrupt an open reading frame.

In addition, a tag coding region may be introduced such that protein expression yields a tagged protein. Such tags can be introduced into intracellular nucleic acids such that the tag is present at the amino terminus, the carboxy terminus, and/or an internal position of the protein. Examples of tags include affinity tags (e.g., His tag, maltose binding protein (MBP) tag, cellulose binding domain (CBD) tag, glutathione S-transferase (GST) tag, and the like) and enzyme tags (e.g., horseradish peroxidase (HRP) tag, alkaline phosphatase (AP) tag, and the like).
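For a carboxy-terminal tag, the tag coding sequence must be placed in frame immediately before the stop codon. The Python sketch below does this for a His tag encoded by six histidine codons (CAT/CAC). The function name, the toy open reading frame, and the choice of six histidines are assumptions for illustration, and the sketch edits a sequence string rather than modeling the donor molecule.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS6 = "CATCACCATCACCATCAC"  # six histidine codons (CAT/CAC)


def c_terminal_tag(orf: str, tag: str) -> str:
    """Insert `tag` in frame immediately before the ORF's terminal stop codon."""
    if len(orf) % 3 or len(tag) % 3:
        raise ValueError("ORF and tag lengths must be multiples of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[-1] not in STOP_CODONS:
        raise ValueError("ORF must end in a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]) or any(
        tag[i:i + 3] in STOP_CODONS for i in range(0, len(tag), 3)
    ):
        raise ValueError("premature in-frame stop codon")
    return orf[:-3] + tag + codons[-1]


tagged = c_terminal_tag("ATGGCTTAA", HIS6)  # toy ORF: Met-Ala-stop
print(tagged)  # → ATGGCTCATCACCATCACCATCACTAA
```

Keeping the original stop codon after the tag means the fusion protein terminates immediately after the final histidine.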

Thus, one aspect includes compositions and methods for producing a non-naturally occurring protein without the need to clone a nucleic acid molecule encoding it. These methods are based, in part, on introducing a polypeptide coding region into an intracellular nucleic acid molecule at a location such that the modified intracellular nucleic acid molecule encodes a fusion protein, followed by expression of the encoded fusion protein and its isolation from the cell.

Cells

The cells provided herein (including embodiments thereof) include complexes capable of enhancing the accessibility of a genomic locus in a cell. The provided complexes can enhance the activity of a regulatory protein or complex at a genomic (target) locus by including an enhancer protein that can increase the accessibility of the regulatory protein to the locus. For example, upon binding of a DNA binding modulation enhancer provided herein to a genomic locus (a target locus), the locus is made more accessible to nuclease or other enzyme activity, thereby enhancing the efficiency and effectiveness of the nuclease or other enzyme activity.

The compositions and methods of the invention can be used to generate cell lines that can be used for many purposes. For example, a single locus or multiple loci can be altered. One example of a cell line that can be generated is the CHO cell line used to produce humanized antibodies. To generate such cell lines, a donor nucleic acid molecule encoding a humanized antibody sequence is introduced into the CHO cell line under conditions designed for insertion into the CHO cell genome. Typically, a selectable marker is also introduced into the genome to allow for selection of modified cells. Of course, any suitable cell line and essentially any desired coding sequence can be used. Thus, one aspect includes compositions and methods for producing cells that can be used for the biological production of gene products (e.g., proteins).

The compositions and methods of the invention can also be used to generate uniform pools of primary or cancer cells. Because gene editing is efficient, the edited cells can be used in "downstream" applications either directly or after minimal selection.

One exemplary workflow involves the synthesis of the simvastatin precursor monacolin J. The simvastatin precursor can be prepared chemically by a multi-step process involving lovastatin, a fungal polyketide produced by Aspergillus terreus. A lovastatin hydrolase found in Penicillium chrysogenum has been identified and characterized. This hydrolase efficiently hydrolyzes lovastatin to monacolin J but shows no detectable activity toward simvastatin (see Huang et al., "Single-step production of the simvastatin precursor monacolin J by engineering an industrial strain of A. terreus," Metabolic Engineering 42: 109-).

In this workflow, an A. terreus producer cell line is developed by stably introducing the P. chrysogenum lovastatin hydrolase into the A. terreus genome. Because lovastatin is a natural polyketide product of A. terreus, the engineered cells then convert lovastatin to monacolin J intracellularly. Thus, one workflow is to engineer A. terreus cells using the methods described herein such that a sufficient percentage of the cell population (e.g., more than 60%) expresses lovastatin hydrolase for the population to be used directly for monacolin J production. An alternative workflow is to apply positive or negative selection to the engineered A. terreus cells prior to use.

A workflow similar to that described above can be used to generate cells (e.g., primary mammalian cells, immortalized mammalian cells, etc.) for use in screening assays. One example is the modification of primary hepatocytes, which are then used in screens for drug-related hepatotoxicity.

Kits

The invention also provides kits for, in part, assembling and/or storing nucleic acid molecules and editing the genome of a cell. These kits may include materials and instructions for assembling nucleic acid molecules and preparing reaction mixtures, as well as for storing and using the kit components.

Kits of the invention will often contain one or more of the following components:

1. one or more DNA binding modulation enhancers (e.g., a TAL effector protein or a truncated gRNA that binds to a Cas9 protein),

2. one or more regulatory proteins (e.g., a DNA-binding nuclease conjugate comprising a TAL effector domain linked to a nuclease),

3. one or more regulatory complexes (e.g., one or more Cas9 domains that bind to gRNAs, an Argonaute domain that binds to guide DNA, etc.), and

4. instructions for how to use the kit components.

The kit reagents may be provided in any suitable container. The kit may provide, for example, one or more reaction or storage buffers. The reagents may be provided in a form that enables use in a particular reaction, or in a form that requires addition of one or more other components prior to use (e.g., in concentrate or lyophilized form). The buffer may be any buffer including, but not limited to, sodium carbonate buffer, sodium bicarbonate buffer, borate buffer, Tris buffer, MOPS buffer, HEPES buffer, and combinations thereof. In some embodiments, the buffer is basic. In some embodiments, the buffer has a pH of about 7 to about 10.

Examples of the invention

The following examples are provided to illustrate certain disclosed embodiments and are in no way to be construed as limiting the scope of the disclosure. The examples are not intended to indicate that the following experiments are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless otherwise indicated, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees celsius, and pressure is at or near atmospheric.

Example 1: promoter insertion

Materials

GENEART™ PLATINUM™ Cas9 nuclease, GENEART™ CRISPR gRNA design tool, GENEART™ Precise gRNA synthesis kit, 293FT cells, Dulbecco's Modified Eagle Medium (DMEM), fetal bovine serum (FBS), TRYPLE™ Express enzyme, 2% EX agarose gels, TranscriptAid T7 high-yield transcription kit, MEGACLEAR™ transcription clean-up kit, ZERO PCR cloning kit, Pro Quick96 plasmid purification kit, PURELINK™ PCR purification kit, RNA BR assay kit, 10 µL transfection system kit, OPTMIZER™ CTS™ T cell expansion SFM, recombinant human IL-2 CTS™, DYNABEADS™ MYONE™ Streptavidin C1, DYNABEADS™ Human T-Expander CD3/CD28, DYNABEADS™ UNTOUCHED™ Human T cell kit, IgG (total) human ELISA kit, polyclonal β-actin antibody, polyclonal epidermal growth factor receptor (EGFR) antibody, and Phusion Flash high-fidelity PCR master mix were obtained from Thermo Fisher Scientific. Ficoll-Paque PLUS was obtained from GE Healthcare Life Sciences. NU7026 was ordered from Tocris Bioscience. The sequences of the DNA oligonucleotides and donor DNA used in this study are listed in Table 12.

gRNA synthesis

DNA oligonucleotides and primers for gRNA synthesis were designed using the GENEART™ CRISPR gRNA design tool. The gRNAs were then synthesized using the GENEART™ Precise gRNA synthesis kit. gRNA concentration was determined using the RNA BR assay kit.

Production of long single-stranded DNA by asymmetric PCR

The donor DNA template was first amplified with a forward primer and a biotinylated reverse primer. The resulting PCR product (20 ng) was added to Phusion Flash high-fidelity PCR master mix containing 0.2 µM forward primer and 0.01 µM biotinylated reverse primer in a total volume of 50 µl. A total of 24 reactions were set up and run with the following PCR program: 98 °C for 30 seconds for one cycle; then 98 °C for 5 seconds, 55 °C for 10 seconds, and 72 °C for 45 seconds for a total of 24 cycles; and a final extension at 72 °C for 3 minutes. To remove the double-stranded DNA template, the PCR product was combined with 300 µl DYNABEADS™ MYONE™ Streptavidin C1 and incubated at room temperature with gentle rotation for 20 minutes. The magnetic beads were removed with a magnet, and the supernatant was purified using four PURELINK™ PCR purification columns and then concentrated by rapid evaporation. About 5 µg of single-stranded DNA was obtained.
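The thermal program and primer ratio above can be captured in a small data structure, for example to check run length before setting up the 24 reactions. This is a minimal Python sketch; the class and function names are hypothetical, and the total counts only hold times (instrument ramp times are ignored). The 20:1 forward-to-reverse primer ratio is what makes the reaction asymmetric: once the limiting biotinylated reverse primer is used up, mainly the non-biotinylated forward strand continues to be made, and the remaining biotinylated strands can then be captured on streptavidin beads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds


@dataclass(frozen=True)
class Stage:
    steps: tuple[Step, ...]
    cycles: int = 1

    def hold_time_s(self) -> int:
        # Each cycle runs every step of the stage once.
        return self.cycles * sum(step.seconds for step in self.steps)


# Program as described in the text (hypothetical encoding).
ASYMMETRIC_PCR = (
    Stage((Step(98, 30),)),                                       # initial denaturation
    Stage((Step(98, 5), Step(55, 10), Step(72, 45)), cycles=24),  # cycling
    Stage((Step(72, 180),)),                                      # final extension
)


def total_hold_time_s(program: tuple[Stage, ...]) -> int:
    """Sum of hold times only; ramp times are not modeled."""
    return sum(stage.hold_time_s() for stage in program)


def primer_excess(forward_um: float, reverse_um: float) -> float:
    """Fold excess of the forward primer over the limiting reverse primer."""
    return forward_um / reverse_um


print(total_hold_time_s(ASYMMETRIC_PCR))  # 1650 (27.5 min)
print(round(primer_excess(0.2, 0.01), 6))
```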

Genomic cleavage detection assay

Genomic cleavage efficiency was measured using the GENEART™ Genomic Cleavage Detection Kit (Thermo Fisher Scientific, catalog No. A24372) according to the manufacturer's instructions. The primer sequences used for PCR amplification of each genomic locus are listed in Table 12. Cells were analyzed 48 to 72 hours after transfection. Cleavage efficiency was calculated from the relative intensities of the agarose gel bands, which were quantified with a gel documentation system running software version 3.4.0.0 (ProteinSimple, San Jose, CA, USA).
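The text does not spell out how band intensities were converted to cleavage efficiency. A formula commonly used with mismatch-cleavage assays, assumed here rather than taken from the source, is 1 − √(1 − fraction cleaved), where the fraction cleaved is the summed intensity of the cleavage bands divided by the total intensity of all bands in the lane. The square root follows from assuming random reannealing and that any duplex containing a modified strand is cleaved: if a fraction f of alleles is modified, the cleaved fraction is 1 − (1 − f)², and solving for f gives the expression above. A minimal Python sketch:

```python
import math


def cleavage_efficiency(parental: float, cleaved_bands: list[float]) -> float:
    """Estimate the fraction of modified alleles from gel band intensities.

    Assumed formula: 1 - sqrt(1 - fraction_cleaved), where fraction_cleaved
    is the summed cleavage-band intensity over the total lane intensity.
    """
    cleaved = sum(cleaved_bands)
    total = parental + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cleaved / total
    return 1.0 - math.sqrt(1.0 - fraction_cleaved)


# Hypothetical intensities: parental band 60, cleavage products 25 and 15.
print(round(cleavage_efficiency(60, [25, 15]), 3))  # 0.225
```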

Isolation of human primary T cells

Human peripheral blood mononuclear cells (PBMCs) were isolated from peripheral blood by Ficoll-Paque PLUS density gradient centrifugation according to the manufacturer's instructions. Human primary T cells were then isolated using the DYNABEADS™ UNTOUCHED™ Human T cell kit and expanded in OPTMIZER™ CTS™ T cell expansion SFM supplemented with 200 IU/mL IL-2. Human T cells were activated and expanded using the DYNABEADS™ Human T-Expander CD3/CD28 kit. On day 3 after activation, T cells were harvested for transfection.

Cell transfection

293FT or A549 cells were maintained in DMEM supplemented with 10% FBS. On the day of transfection, cells were detached from the flask and counted. For each electroporation, 1.5 µg Cas9 protein and 360 ng gRNA were added to resuspension buffer R to a final volume of 7 µl, keeping the combined volume of Cas9 protein plus gRNA below 1 µl. After mixing, the samples were incubated at room temperature for 5 to 10 minutes to form Cas9 RNP complexes. Meanwhile, aliquots containing 1 × 10⁶ cells were washed once with Ca²⁺- and Mg²⁺-free DPBS, and the cell pellets were resuspended in 50 µl resuspension buffer R. A 5 µl aliquot of the cell suspension was then mixed with 7 µl Cas9 RNP, followed by the addition of 1 µl of the indicated amount of donor DNA. The cell suspension containing Cas9 RNP and donor was electroporated on an electroporation device (Thermo Fisher Scientific, catalog No. MPK5000) set to 1150 V, a pulse width of 20 ms, and 2 pulses. The electroporated cells were transferred to 48-well plates containing 0.5 ml medium. Samples without gRNA or donor DNA served as controls. At 48 hours post-transfection, cells were analyzed by flow cytometry. Alternatively, the genomic loci were PCR-amplified with the corresponding primers, and the resulting PCR fragments were analyzed using the genomic cleavage detection assay. The edited cells were further subjected to limiting dilution followed by clonal cell isolation. Clonal cells were characterized by PCR amplification and sequencing of the N-terminal and C-terminal junctions. Sequencing data were analyzed using VECTOR NTI 11.5 software (Thermo Fisher Scientific).
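The per-reaction arithmetic in this protocol can be checked with a few lines of Python. In the sketch below, the cell count per electroporation follows directly from the text (1 × 10⁶ cells in 50 µl, 5 µl per reaction). The molecular weights used to express the 1.5 µg Cas9 and 360 ng gRNA in picomoles are assumptions (about 160 kDa for an NLS-tagged SpCas9 and about 32 kDa for a ~100-nt sgRNA), not values given in the source; under those assumptions the gRNA is in roughly 1.2-fold molar excess.

```python
def cells_per_reaction(total_cells: float, resuspension_ul: float, aliquot_ul: float) -> float:
    """Cells delivered in one aliquot of a uniformly resuspended pellet."""
    return total_cells / resuspension_ul * aliquot_ul


def ng_to_pmol(mass_ng: float, mw_g_per_mol: float) -> float:
    """ng divided by g/mol gives nmol; multiply by 1000 for pmol."""
    return mass_ng / mw_g_per_mol * 1000.0


# Assumed molecular weights (not from the source text).
ASSUMED_CAS9_MW = 160_000.0  # g/mol, approximate SpCas9 with NLS
ASSUMED_SGRNA_MW = 32_000.0  # g/mol, approximate ~100-nt sgRNA

cells = cells_per_reaction(1e6, 50, 5)
cas9_pmol = ng_to_pmol(1500, ASSUMED_CAS9_MW)  # 1.5 ug Cas9
grna_pmol = ng_to_pmol(360, ASSUMED_SGRNA_MW)  # 360 ng gRNA
# Combined volume of the mixture: 5 ul cells + 7 ul RNP + 1 ul donor.
total_volume_ul = 5 + 7 + 1

print(f"{cells:.0f} cells per electroporation")  # 100000 cells per electroporation
print(f"Cas9 {cas9_pmol:.2f} pmol, gRNA {grna_pmol:.2f} pmol")
print(f"gRNA:Cas9 molar ratio {grna_pmol / cas9_pmol:.2f}")
print(f"{total_volume_ul} ul combined volume")
```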

When transfecting primary T cells, 1 × 10⁵ cells were used per electroporation, and the device was set to 1700 V, a pulse width of 20 ms, and 1 pulse. To assess the effect of chemical modification on HDR efficiency, phosphorothioate- or amine-modified nucleotides were incorporated at specific positions of the oligonucleotides during chemical synthesis. The resulting modified oligonucleotides were then used to amplify the donor DNA. When cells were treated with the NU7026 inhibitor, they were transfected as described above and then transferred to cell culture medium containing 30 µM NU7026. Cells were analyzed 48 hours after transfection.

Protein labeling strategy

Protein labeling allows researchers to visualize the subcellular localization of proteins and study their function. The strategy for labeling endogenous cellular proteins is depicted in fig. 1. A promoterless puromycin selection marker is linked to the reporter gene by a self-cleaving 2A peptide. The puromycin gene is located at the 5' end (for N-terminal labeling) or the 3' end (for C-terminal labeling) of the fusion. 35-nt homology arms were added to the 5' and 3' ends of the donor DNA by PCR amplification. Expression of puromycin is driven by the endogenous promoter, while the reporter gene is fused in frame to the endogenous gene. TALENs or CRISPRs are designed to target the genomic locus near the ATG start codon for N-terminal labeling, or near the stop codon for C-terminal labeling. The resulting TALENs or CRISPRs and donor DNA are then delivered into cells by lipid-mediated transfection or electroporation. 48 hours after transfection, cells were treated with puromycin for 7 days and then analyzed by fluorescence microscopy or by junction PCR and sequencing.
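The donor layout above can be sketched in code: the cassette order depends on which terminus is tagged, and the 35-nt homology arms are carried as 5′ tails on the PCR primers. The Python below is a hypothetical illustration; the function names, the 20-nt annealing length, and the toy sequences are assumptions, and a real design would also have to keep the reporter in frame with the endogenous coding sequence and account for the exact cut and insertion positions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cassette_order(terminus: str) -> list[str]:
    """Element order, 5' to 3', for the promoter capture donor."""
    if terminus == "N":
        return ["puromycin", "2A", "reporter"]  # marker before the reporter
    if terminus == "C":
        return ["reporter", "2A", "puromycin"]  # marker after the reporter
    raise ValueError("terminus must be 'N' or 'C'")


def design_donor(genome: str, insert_at: int, cassette: str,
                 arm_len: int = 35, anneal_len: int = 20) -> tuple[str, str, str]:
    """Return (forward primer, reverse primer, expected donor) for a cassette
    inserted before genome[insert_at]; arms are the flanking genomic bases."""
    if insert_at < arm_len or insert_at + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    if len(cassette) < anneal_len:
        raise ValueError("cassette shorter than the annealing region")
    left_arm = genome[insert_at - arm_len:insert_at]
    right_arm = genome[insert_at:insert_at + arm_len]
    # 5' tail = homology arm; 3' end anneals to the cassette.
    forward = left_arm + cassette[:anneal_len]
    reverse = reverse_complement(cassette[-anneal_len:] + right_arm)
    donor = left_arm + cassette + right_arm
    return forward, reverse, donor


# Toy example with made-up sequences.
genome = "A" * 40 + "C" * 40
fwd, rev, donor = design_donor(genome, 40, "GT" * 15)
print(cassette_order("C"))
print(len(fwd), len(rev), len(donor))  # 55 55 100
```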

Examples of N-terminal protein labeling

To evaluate strategies for labeling endogenous proteins, we fused the OFP gene to the N-terminus of β-actin. β-actin is one of the most abundant proteins in eukaryotes and can therefore be easily monitored by fluorescence microscopy. Cas9 protein was combined with a gRNA targeting the β-actin genomic locus near the ATG start codon, thereby forming RNP. The corresponding 35-nt homology arms were added to the sequence-verified promoterless puromycin-P2A-OFP DNA fragment by PCR amplification. The resulting donor PCR fragment was purified using the PURELINK™ PCR purification kit and then concentrated to a final concentration of about 1 µg/µl by rapid evaporation. To examine the effect of donor dose on HDR efficiency, we kept the amount of Cas9 RNP constant and varied the amount of donor DNA. Cas9 RNP and donor DNA were transfected into 293FT cells by electroporation. 48 hours after transfection, cells were analyzed by fluorescence microscopy. OFP-positive cells were not detected when cells were transfected with Cas9 RNP alone or with Cas9 protein plus donor DNA, whereas OFP-positive cells were observed when cells were transfected with Cas9 RNP and donor DNA. The percentage of OFP-positive cells was determined by flow cytometry. Without selection, the percentage of OFP-positive cells increased from about 5% to 20% as the amount of donor DNA increased from 25 ng to 500 ng (fig. 2A); the optimal amount of donor DNA was about 500 ng per reaction. In contrast, about 80% of the transfected cells were OFP-positive after 7 days of treatment with 1 µg/ml puromycin, and this percentage did not differ significantly between different amounts of donor DNA (fig. 2A). Next, we investigated the effect of homology arm length on HDR efficiency. Homology arms of various lengths were added to the promoterless puromycin-P2A-OFP DNA fragment by PCR amplification. As shown in fig. 2B, across the tested range of 12 nt to 80 nt, the percentage of OFP-positive cells increased with homology arm length and plateaued at around 35 nt.

Traditionally, plasmid donors are used to incorporate large DNA molecules into the genome. For comparison, we constructed a donor plasmid containing homology arms of approximately 500 nt. In addition, we prepared a long single-stranded DNA donor with 35-nt homology arms by asymmetric PCR. Cas9 RNP and the various forms of donor DNA were delivered into 293FT cells or human primary T cells by electroporation. At 48 hours post-transfection, we analyzed the percentage of OFP-positive cells by flow cytometry. As depicted in fig. 1C and 1D, in both 293FT cells and primary T cells the percentage of OFP-positive cells obtained with single-stranded (ss) or double-stranded (ds) DNA fragments carrying 35-nt homology arms was significantly higher than that obtained with donor plasmids carrying long homology arms. ssDNA donors were more efficient than dsDNA donors in primary T cells, whereas the two performed similarly in 293FT cells.

To check the consistency of integration sites, cells transfected with Cas9 RNP and donor DNA were subjected to puromycin selection, limiting dilution, and clonal cell isolation. A total of 48 colonies were randomly picked for junction PCR analysis. Only one of the 48 colonies failed to grow; with one external primer and one internal primer, all other 47 colonies produced PCR products for both the N-terminal and C-terminal junctions. When a pair of external primers was used, an approximately 420 bp PCR product corresponding to the genomic DNA fragment without insertion was also observed; no large PCR products were observed because the smaller, insertion-free fragment was preferentially amplified. Sequencing analysis of the PCR products confirmed that about 82% of the N-terminal junctions between genomic DNA and donor DNA showed precise HDR (fig. 3A(1)); the other 18% of clonal cells also contained insertions but had mutations in the junction region (fig. 3A(2)), including duplication of partial or full-length homology arm sequences. Some clones also carried deletions, and wild-type sequence was detected in others (fig. 3C).

TALENs (TAL effector nucleases) are another means of introducing double-strand breaks in mammalian genomes. Three pairs of TALEN mRNAs targeting the region near the β-actin ATG codon were designed and synthesized. TALEN mRNA alone, or TALEN mRNA with donor DNA, was transfected into HEK293FT cells by electroporation at 1150 volts, 20 milliseconds (ms), and two pulses. At 48 hours post-transfection, cells were lysed to measure genome editing efficiency (fig. 3D) or analyzed by flow cytometry (fig. 3E) to determine the percentage of OFP-positive cells (-). Alternatively, cells were treated with puromycin for 7 days before flow cytometry analysis (+) (fig. 3E). As depicted in fig. 3D, although the T1 and T3 targets produced indel frequencies of approximately 60% and 35%, respectively, the percentage of OFP-positive cells was very low without puromycin selection. After puromycin selection, however, the percentage of OFP-positive cells rose to about 60% for all three targets (fig. 3E).

In addition to β-actin, we evaluated other proteins in other cell lines. LRRK2 is a protein of approximately 280 kDa that is associated with Parkinson's disease. A gRNA targeting the LRRK2 genomic locus near the initiation codon was designed, and homology arms of approximately 35 nt were added by PCR amplification to a sequence-verified promoterless puromycin-P2A-EmGFP DNA fragment. Cas9 RNP and donor DNA were co-delivered into A549 cells by electroporation at 1050 volts, 30 milliseconds, and 2 pulses. Because LRRK2 is a relatively low-abundance protein, we were unable to detect the intracellular EmGFP signal; several commercial antibodies also failed to detect endogenous wild-type LRRK2 protein in whole-cell lysates by western blotting. To examine integration efficiency, cells were treated with 0.75 µg/ml puromycin for 7 days starting 48 hours post-transfection, followed by limiting dilution and clonal cell isolation. The junctions were analyzed by PCR using one internal and one external primer, or a pair of external primers, and the resulting PCR products were sequenced to determine the accuracy of integration. Surprisingly, all 86 colonies contained at least one copy of the insert. In all colonies, both the N- and C-terminal junctions showed precise HDR, with correct joining of genomic DNA and donor DNA (fig. 4A and 4B). Using genomic DNA isolated from the clones, we detected two PCR products for heterozygotes and one large PCR product for homozygotes. Based on sequencing analysis, about 20% of the population had precise integration of donor DNA in both alleles, whereas in the remaining 80% the second allele contained no insert, only a 7-nt deletion (fig. 4C). These results indicate that 100% integration efficiency and 100% precise HDR can be achieved.

Examples of C-terminal protein tags

The promoter capture strategy for C-terminal protein tagging differs slightly from that for N-terminal tagging: the promoterless selection marker is located after the reporter gene for C-terminal tagging and before the reporter gene for N-terminal tagging (fig. 1). As an example, we fused the EmGFP tag to the C-terminus of focal adhesion kinase (FAK). gRNAs targeting the FAK genomic locus near the stop codon were designed and synthesized (table 12). Short homology arms were added by PCR to the sequence-verified EmGFP-2A-puromycin cassette. Cas9 RNP and donor DNA were delivered into 293FT cells using the electroporation device. At 48 hours post-transfection, cells were selected with 0.75 µg/ml puromycin for 7 days, followed by limiting dilution and clonal cell isolation. The junctions were analyzed by PCR and sequencing. As depicted in fig. 5A and 5B, about 95% and 85% of the clones had correct junctions at the N- and C-terminus, respectively. The other clones also contained the insertion cassette but had indels at the junction or the Cas9 cleavage site. Again, we observed insertion of duplicated partial or full-length homology arms into the genome. Overall, all clones examined contained at least one copy of the donor DNA; about 70% of the clones had precise HDR at both the N- and C-termini, while the other 30% had imprecise HDR in one allele. About 30% of the clonal cells incorporated the donor in both alleles. About 70% of the cells had no insert in the second allele but had an indel at the Cas9 cleavage site. Only one clone with a wild-type second allele was detected (fig. 5C).

In addition to FAK, we examined other proteins, such as epidermal growth factor receptor (EGFR). EGFR has several isoforms; in this study, we fused EmGFP to the C-terminus of EGFR isoform 1. gRNAs were designed to cleave the EGFR genomic locus near the stop codon, and short homology arms were added to the insertion cassette by PCR. Cas9 RNP and donor DNA were delivered into 293FT cells by electroporation. After puromycin selection, cells were clonally isolated. Surprisingly, all 19 colonies carried the insertion cassette on at least one allele, with 100% correct N- and C-terminal junctions. About 17% of the colonies had biallelic integration, while 83% contained no insertion on the second allele, only a single "A" insertion at the Cas9 cleavage site (fig. 6). EmGFP-tagged EGFR was detected by western blot.

Effect of end modification of DNA donors and of NHEJ inhibitors on HDR efficiency

Linear dsDNA or ssDNA donors can be degraded in vivo by exonucleases, and terminal modification of donor DNA may prevent this degradation. To test this hypothesis, DNA primers with different modifications at the 5' end were chemically synthesized (table 12). Donor DNA containing the promoterless puromycin-P2A-OFP fragment was then prepared by PCR amplification using the modified primers, and the resulting PCR products were purified using the PURELINK™ PCR purification kit. Cas9 RNP targeting the β-actin genomic locus was co-delivered into primary T cells with the various forms of donor DNA by electroporation. 48 hours after transfection, the percentage of OFP-positive cells was determined by flow cytometry. As shown in FIG. 7A, phosphorothioate-modified DNA donors increased HDR efficiency approximately 2-fold compared with unmodified donor DNA, and donor DNA with amine modifications at both ends increased the percentage of OFP-positive cells about 4-fold. End modification of ssDNA donors also improved HDR efficiency; however, the efficiency of amine-modified dsDNA donors was about 2 times higher than that of modified ssDNA donors.

Disruption of the NHEJ repair pathway is known to improve HDR efficiency. Here, we examined how NHEJ inhibitors affect the integration of relatively large DNA molecules into human primary T cells. Immediately after Cas9 RNP and donor DNA were delivered into primary T cells by electroporation, the cells were transferred into medium containing 30 µM NU7026. At 48 hours post-transfection, the cells were analyzed by flow cytometry. As shown in figure 7B, NU7026 treatment increased the percentage of OFP-positive cells about 5-fold for unmodified donor DNA and 2-fold for amine-modified donor DNA. Similar results were obtained with other DNA-PK inhibitors, including NU7441 and KU-0060648.

Potential applications

Using the above method, we can easily integrate a large piece of DNA into the mammalian genome with near 100% integration efficiency, allowing researchers to clone foreign DNA of interest directly into the mammalian genome and express proteins for therapeutic applications.

Examples of expression cassettes

As an example, we prepared an approximately 4.2 kb human IgG expression cassette containing a promoterless selection marker, a cytomegalovirus (CMV) promoter, an IgG heavy chain, an IgG light chain, and a WPRE (woodchuck hepatitis virus post-transcriptional regulatory element). The CMV promoter drives expression of the IgG heavy and light chains, which are linked by a 2A self-cleaving peptide (fig. 8A). 35-nt short homology arms were added to the expression cassette by PCR, followed by PCR column purification. As described above, the expression cassette was inserted into the β-actin locus in 293FT cells. After 7 days of puromycin selection, we measured IgG titers in the stable cell pool by ELISA.

To characterize individual clonal cells in the stable pool, we performed limiting dilution and clonal cell isolation. The integration junctions were analyzed by PCR and sequencing. As depicted in fig. 8B and 8C, about 88% of the clonal cells had precise integration at the N-terminal junction, while 12% had some additional sequence inserted at the junction. At the C-terminus, about 41% of the clonal cells had correct junctions, while 59% had small mutations at the junction; for example, we observed base substitutions and insertions of one or several nucleotides in the WPRE poly(A) tail region. Because these small mutations occur after the stop codon, they may not affect IgG expression. To confirm this, we examined IgG titers from each clonal cell. As shown in fig. 8D, about 70% of the clonal cells were able to produce antibodies.

In this study, endogenous proteins were labeled, so the expression level of each chimeric protein depends on the abundance of the endogenous protein within the cell. For abundant proteins (e.g., β-actin), conventional wide-field fluorescence microscopy is sufficient for detection. For low-abundance proteins (e.g., LRRK2), however, fluorescent molecules inside living cells can be visualized with improved spatiotemporal resolution using high-resolution fluorescence techniques such as fluorescence resonance energy transfer (FRET) and continuous-wave ultrasound-switchable fluorescence (CW-USF) (Sekar et al., "Fluorescence resonance energy transfer (FRET) microscopy imaging of live cell protein localizations," J. Cell Biol. 160 (2003)).

Example 2: Attaching nuclear localization signals to donor DNA increases homology-based editing rates in mammalian cells.

It is hypothesized that delivery of donor DNA (single-stranded or double-stranded, linear or circular) to the nucleus will increase the local concentration of donor DNA near where editing occurs and thus bias repair towards this donor DNA rather than NHEJ.

Zanta, M.A. et al., Proc. Natl. Acad. Sci. USA 96: 91-96 (1999), demonstrated that an NLS bound to a DNA segment can increase delivery of that segment to the nucleus. It is therefore reasonable to assume that a similar approach can be used to enhance delivery of donor ssDNA to the nucleus, and that more donor DNA within the nucleus may increase the frequency of donor DNA integration at the cleavage site.

As the NLS, an evolved SV40 NLS (BP-SV40, KRTADGSEFESPKKKRKVEGG; SEQ ID NO: 13) was used. Hodel, M.R. et al., J. Biol. Chem. 276(2): 1317-1325 (2001), reported that this sequence localizes efficiently to the nucleus. The NLS peptide was conjugated to the ssDNA donor sequence using succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC) or another conjugation chemistry. The resulting NLS-oligonucleotide conjugate was purified by HPLC, and its mass was determined by MALDI-TOF. The two constructs shown in FIG. 9 were prepared. As shown in fig. 10, these donor DNAs allow screening by fluorescence.
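One input to interpreting the MALDI-TOF result is the expected mass of the NLS peptide itself. The Python sketch below computes the neutral monoisotopic mass of BP-SV40 (SEQ ID NO: 13) from standard residue masses. It covers only the free peptide; the mass of the conjugate would also include the oligonucleotide and whatever linker the conjugation chemistry leaves, which the text does not specify, and MALDI-TOF spectra typically report protonated ions rather than the neutral mass.

```python
# Standard monoisotopic residue masses (Da) for the residues in BP-SV40.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "K": 128.09496,
    "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # for the singly protonated [M+H]+ ion

BP_SV40 = "KRTADGSEFESPKKKRKVEGG"  # SEQ ID NO: 13


def peptide_monoisotopic_mass(seq: str) -> float:
    missing = set(seq) - set(RESIDUE_MASS)
    if missing:
        raise KeyError(f"no residue mass for {sorted(missing)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


mass = peptide_monoisotopic_mass(BP_SV40)
print(len(BP_SV40), round(mass, 2))  # 21 2333.25
print(round(mass + PROTON, 2))       # expected [M+H]+ of the free peptide
```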

Part 1: Converting GFP with a 6-base deletion to functional GFP using NLS-conjugated oligonucleotide donors.

The carboxy terminus of the NLS peptide BP-SV40 (SEQ ID NO: 13) was conjugated to the 5' end of the following oligonucleotide via a crosslinking chemistry:

5′-CGGGGTAGCGGCTGAAGCACTGCACGCCGTAGGTCAGGGTGGTCACGAGGGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAG-3′ (SEQ ID NO: 14)

The resulting NLS-oligonucleotide conjugate was purified by HPLC. The mass of the NLS-oligonucleotide was determined by MALDI-TOF.
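Comparing the donor with the gRNA target shows what the donor supplies: SEQ ID NO: 14 contains the disrupted-EmGFP target plus PAM (SEQ ID NO: 15) interrupted by six extra bases, consistent with repair of the 6-base deletion. As a side effect, a corrected allele would likely no longer carry the uninterrupted protospacer and PAM, which should protect it from re-cleavage. The Python sketch below searches for that insertion; the function is a hypothetical helper written for this illustration. Because an insert next to repeated bases can be placed at several equivalent positions, it reports the first placement found, and the checks only confirm that the placement reconstructs the donor.

```python
# Sequences from the text: donor oligonucleotide (SEQ ID NO: 14) and the
# gRNA target plus PAM in the disrupted EmGFP gene (SEQ ID NO: 15).
DONOR = ("CGGGGTAGCGGCTGAAGCACTGCACGCCGTAGGTCAGGGTGGTCACGAG"
         "GGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAG")
TARGET = "GCACGCCGTAGGTGGTCACGAGG"


def find_insertion(site: str, donor: str, max_insert: int = 50):
    """Return (split, inserted) such that site[:split] + inserted + site[split:]
    occurs in donor, or None. Split points are tried from left to right."""
    for split in range(1, len(site)):
        left, right = site[:split], site[split:]
        start = donor.find(left)
        while start != -1:
            after_left = start + len(left)
            end = donor.find(right, after_left)
            if end != -1 and 0 < end - after_left <= max_insert:
                return split, donor[after_left:end]
            start = donor.find(left, start + 1)
    return None


print(TARGET in DONOR)  # False: the intact target is interrupted in the donor
split, inserted = find_insertion(TARGET, DONOR)
print(split, inserted, len(inserted))
```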

The day before transfection, the EmGFP-disrupted GRIPTITE™ 293 cell line was seeded in 24-well plates at a density of 1 × 10⁵ cells per well. On the day of transfection, 0.5 µg Cas9 mRNA and 150 ng gRNA targeting the disrupted EmGFP gene (GCACGCCGTAGGTGGTCACGAGG; SEQ ID NO: 15) were added to 25 µl of culture medium in a sterile tube. NLS-oligonucleotide conjugates were dissolved in water, and different amounts of NLS-oligonucleotide were added to the tubes containing Cas9 and gRNA. Phosphorothioate-modified (PS) oligonucleotides, with two phosphorothioates at the 5' end and two at the 3' end, were used as controls. In a separate tube, 1.5 µl of LIPOFECTAMINE™ MESSENGERMAX™ was added to 25 µl of culture medium. The diluted LIPOFECTAMINE™ MESSENGERMAX™ was then transferred to the tubes containing Cas9, gRNA, and the indicated amount of NLS- or PS-oligonucleotide. After 5 minutes of incubation at room temperature, the mixture was added to wells of a 24-well plate containing 0.5 ml growth medium. At 48 hours post-transfection, cells were analyzed by flow cytometry to determine the percentage of EmGFP-positive cells.

As shown in fig. 11, the NLS-donor produced significantly higher editing in this cell line. At the optimal dose of 0.1 pmol NLS-donor, up to 52% of the cells were GFP-positive, whereas the standard PS-donor required 3 pmol, 30 times more material, to reach its maximum of 36%. Figure 12 shows that, at the same low dose of 0.03 pmol, the NLS-donor yielded much higher conversion to GFP-positive cells. Overall, as measured by GFP-positive cells, the NLS-donor achieved higher editing at much lower doses.
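Doses in this section are given in picomoles. To relate them to mass, the Python sketch below estimates the molecular weight of the unmodified donor oligonucleotide (SEQ ID NO: 14) with a widely used formula for single-stranded DNA lacking a 5′ phosphate, then converts pmol to ng. This is an assumption-laden approximation: it ignores the NLS peptide, the linker, and any phosphorothioate modifications. It also reproduces the 30-fold dose difference between 0.1 pmol and 3 pmol.

```python
# Approximate molecular weight increments (g/mol) per nucleotide in ssDNA.
NT_MW = {"A": 313.21, "T": 304.2, "C": 289.18, "G": 329.21}
TERMINAL_CORRECTION = 61.96  # assumed 5'-OH and 3'-OH ends


def ssdna_mw(seq: str) -> float:
    """Estimated molecular weight (g/mol) of an unmodified ssDNA oligo."""
    return sum(NT_MW[base] for base in seq.upper()) - TERMINAL_CORRECTION


def pmol_to_ng(pmol: float, mw_g_per_mol: float) -> float:
    """pmol times g/mol gives pg; divide by 1000 for ng."""
    return pmol * mw_g_per_mol / 1000.0


DONOR_SEQ14 = ("CGGGGTAGCGGCTGAAGCACTGCACGCCGTAGGTCAGGGTGGTCACGAG"
               "GGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAG")
mw = ssdna_mw(DONOR_SEQ14)
for dose_pmol in (0.03, 0.1, 3.0):
    print(f"{dose_pmol} pmol ~ {pmol_to_ng(dose_pmol, mw):.1f} ng")
print(f"PS-donor / NLS-donor dose ratio: {3.0 / 0.1:.0f}x")  # 30x
```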

Part 2: Converting BFP to functional GFP by a single-base change using NLS-conjugated oligonucleotide donors.

The carboxy terminus of the NLS peptide BP-SV40(SEQ ID No.: 13) was bound to the 5' terminus of the oligonucleotide by SMCC chemistry:

5′-GCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGA-3′ (SEQ ID NO: 16)

The resulting NLS-oligonucleotide conjugate was purified by HPLC. The mass of the NLS-oligonucleotide was determined by MALDI-TOF.

One day before transfection, the eBFP 293FT stable cell line was seeded in 24-well plates at a density of 1 × 10⁵ cells per well. On the day of transfection, 0.5 µg Cas9 mRNA and 150 ng gRNA targeting the eBFP gene (CTCGTGACCCCTGACCTGACCCCCACGG; SEQ ID NO: 17) were added to 25 µl of culture medium in a sterile tube. NLS-oligonucleotides were dissolved in water, and different amounts were added to the tubes containing Cas9 and gRNA. Unmodified oligonucleotides were used as controls. In a separate tube, 1.5 µl of LIPOFECTAMINE™ MESSENGERMAX™ was added to 25 µl of culture medium. The diluted LIPOFECTAMINE™ MESSENGERMAX™ was then transferred to the tubes containing Cas9, gRNA, and the indicated amount of NLS-oligonucleotide or unmodified oligonucleotide. After 5 minutes of incubation at room temperature, the mixture was added to wells of a 24-well plate containing 0.5 ml growth medium. At 48 hours post-transfection, cells were analyzed by flow cytometry to determine the percentage of GFP-positive cells.

As shown in fig. 13, at an optimal dose of 0.3 pmol, up to 76% of the cells converted from BFP to GFP-positive, compared with 58.5% in the presence of 10 pmol of the control PS oligonucleotide. Again, the NLS-donor achieved higher editing at an approximately 30-fold lower dose. The NLS-oligonucleotide also maintained higher levels of editing as the dose decreased: 21% of cells were edited at 0.01 pmol, compared with 6% of cells at 0.03 pmol of the control PS oligonucleotide.

The methods described herein have broad application in cell engineering, cell therapy, and biological production, among others. Unlike transient plasmid expression, relatively large expression cassettes can be inserted directly into a particular genomic locus for biological production. Endogenous promoters with the desired relative strength can be used to target safe harbor regions. Duplicated regions could also potentially be targeted to integrate multiple copies of the payload for higher expression levels. Alternatively, a strong promoter can be used to drive expression of a foreign gene of interest independently of the selection marker. Because of the high integration efficiency and specificity, the stable cell pool can be used directly for protein production without isolating clonal cells, saving time and cost. In some embodiments, this method is used to produce recombinant antibodies in ExpiCHO cells.

References

Liang et al., "Enhanced CRISPR/Cas9-mediated precise genome editing by improved design and delivery of gRNA, Cas9 nuclease, and donor DNA," J. Biotechnol. 241: 136-146 (2017).

Example 3. DNA binding adjacent to a targeted dsDNA break can facilitate chromatin displacement and/or DNA unwinding and improve access of designed nucleases.

"TAL-Buddy" consists of 18 repeats of TAL binders. "TAL-Buddy" is designed to be immediately adjacent to the designed nuclease binding region, one on each side (left-Lt, right-Rt, TALEN pairs and TAL-Buddy binding sequences are listed in table 12). "TAL-Buddy" is prepared as follows: BsaI was used to assemble an N-terminal fragment containing the T7 promoter and transcription/translation initiation element with an amino-terminal fragment of TAL, a hexatalvd trimer, and a C-terminal fragment containing the C-terminal domain, nuclear localization signal, and stop codon via the Golden Gate assembly reaction (shown in figure 14). SEQ ID NO: an example of a "TAL-Buddy" (CMPK1-TALEN 2-7 nt _ TAL-Buddy _ Lt) nucleotide sequence is listed in 35. The adjacent genomic sequence of CMPK1-C target is shown in SEQ ID NO: 36; and the relative positions of TALEN and TAL Buddy are shown in SEQ ID NO: 20 and SEQ ID NO: 21. further description of this example is provided in fig. 14-18, 22, and 32-36.

The full-length TAL-Buddy was amplified using primer pair TD1-F2 and TD8-R2 (SEQ ID NOS: 22-23), and the product was used as the template for mRNA synthesis with the mMESSAGE mMACHINE™ T7 ULTRA Transcription Kit (Thermo Fisher Scientific). 0, 25, 50, or 100 ng of Lt and Rt TAL-Buddy mRNA, together with 100 ng of the TALEN mRNA pair, were transfected into approximately 50,000 293FT human embryonic kidney cells using an electroporation device (Thermo Fisher Scientific) at a pulse voltage of 1300, a pulse width of 20, and 2 pulses. Cells were harvested and lysed 48 to 72 hours after transfection, and indel formation was analyzed using the GeneArt™ Genomic Cleavage Detection Kit (Thermo Fisher Scientific, catalog No. A24372) (FIG. 15).
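The binding specificity of an 18-repeat TAL-Buddy follows from the repeat-variable diresidue (RVD) carried by each repeat. The sketch below is illustrative only and rests on assumptions not stated in this example: it uses the commonly cited RVD code (NI for A, HD for C, NG for T, NN for G), expects the 5' T that natural TALE binding sites are typically preceded by, and groups the 18 RVDs into three six-repeat modules on the assumption that the repeat array is assembled from hexamer units. The function names are hypothetical.

```python
# Commonly used TALE RVD code (assumption: NN chosen for G; NH or NK are also used)
RVD_CODE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvds_for_site(site: str, n_repeats: int = 18) -> list[str]:
    """Translate a TAL binding site (5'->3', including the leading T0) into RVDs."""
    site = site.upper()
    if len(site) != n_repeats + 1:
        raise ValueError(f"expected T0 plus {n_repeats} nt, got {len(site)} nt")
    if site[0] != "T":
        raise ValueError("binding site should be preceded by a 5' T (T0)")
    unknown = set(site[1:]) - set(RVD_CODE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    # One repeat (and therefore one RVD) per base after T0
    return [RVD_CODE[base] for base in site[1:]]


def split_into_hexamers(rvds: list[str]) -> list[list[str]]:
    """Group RVDs into consecutive six-repeat modules (18 repeats -> 3 hexamers)."""
    if len(rvds) % 6:
        raise ValueError("repeat count must be a multiple of 6")
    return [rvds[i:i + 6] for i in range(0, len(rvds), 6)]


site = "T" + "ACGTACGTACGTACGTAC"  # hypothetical binding site
for module in split_into_hexamers(rvds_for_site(site)):
    print(" ".join(module))
# NI HD NN NG NI HD
# NN NG NI HD NN NG
# NI HD NN NG NI HD
```

Grouping into hexamers simply mirrors a modular assembly in which each pre-built module contributes six repeats; any other module size would only change the slicing step.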

One method that can be used to assemble TAL effectors is Golden Gate assembly.

In Golden Gate assembly, cloning is based on generating nucleic acid segments with "sticky" ends by cleavage with one or more Type IIS restriction endonucleases; the assembled nucleic acid molecule is then typically introduced into a suitable host cell. Type IIS restriction endonucleases are used because they recognize asymmetric sequences and cleave at a defined distance outside the recognition site. The recognition sites can therefore be placed at the ends of each DNA fragment, oriented so that digestion removes them and leaves complementary overhangs. Such ends can be joined seamlessly, forming a junction that lacks the original recognition site and leaves no scar.
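The overhang logic described above can be modeled in a few lines. This is a simplified sketch, not a protocol: it assumes BsaI (the enzyme named in Example 3), which recognizes GGTCTC and cuts 1 nt beyond the site on that strand and 5 nt beyond it on the complementary strand, leaving 4-nt overhangs. It ignores ligation kinetics, overhang fidelity, and the destination vector, and all function names and sequences are hypothetical.

```python
BSAI_SITE = "GGTCTC"     # BsaI recognition sequence on the top strand
BSAI_SITE_RC = "GAGACC"  # the same site in the reverse orientation


def excise(fragment: str) -> tuple[str, str, str]:
    """Model BsaI digestion of a fragment flanked by inward-facing sites.

    Returns (left_overhang, core, right_overhang), all written as
    top-strand sequence; both recognition sites are discarded.
    """
    fragment = fragment.upper()
    left = fragment.find(BSAI_SITE)
    right = fragment.rfind(BSAI_SITE_RC)
    if left < 0 or right < 0:
        raise ValueError("fragment must be flanked by inward-facing BsaI sites")
    start = left + len(BSAI_SITE) + 1  # top-strand cut 1 nt past the site
    end = right - 1                    # mirror-image cut at the right end
    if end - start < 8:
        raise ValueError("fragment too short to release two 4-nt overhangs")
    core = fragment[start + 4:end - 4]
    if BSAI_SITE in core or BSAI_SITE_RC in core:
        raise ValueError("internal BsaI site would be cut during assembly")
    return fragment[start:start + 4], core, fragment[end - 4:end]


def golden_gate(fragments: list[str]) -> str:
    """Ligate excised parts in order; return the assembled top strand."""
    if not fragments:
        raise ValueError("no fragments supplied")
    parts = [excise(f) for f in fragments]
    junctions = [p[0] for p in parts] + [parts[-1][2]]
    if len(set(junctions)) != len(junctions):
        raise ValueError("overhangs must be unique so that fragment order is enforced")
    product = parts[0][0]
    for i, (left_oh, core, right_oh) in enumerate(parts):
        if i > 0 and left_oh != parts[i - 1][2]:
            raise ValueError(f"overhang mismatch at junction {i}")
        product += core + right_oh
    return product


a = "GGTCTC" "A" "AATG" "CCCGGG" "TTGC" "A" "GAGACC"  # hypothetical part 1
b = "GGTCTC" "A" "TTGC" "AAATTT" "GCTT" "A" "GAGACC"  # hypothetical part 2
print(golden_gate([a, b]))  # AATGCCCGGGTTGCAAATTTGCTT
```

The shared overhang TTGC appears once in the product and no BsaI site survives, which is the seamless, scarless junction the paragraph describes. Requiring unique overhangs is what lets many parts (for example, TAL repeat modules) assemble in a single, ordered reaction.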

Type IIS restriction endonucleases can also be, and have been, used to generate the repeat regions of TAL effectors, to join suitable terminal protein-encoding nucleic acids to either side of the TAL effector repeats, and to join TAL effector coding regions to other nucleic acid molecules (e.g., vectors in which the TAL effector-encoding nucleic acid is operably linked to a promoter). Methods for assembling TAL effectors using Type IIS restriction endonucleases are described, for example, in Morbitzer et al., "Assembly of custom TALE-type DNA binding domains by modular cloning," Nucleic Acids Res. 39: 5790-9 (2011).

Results: When the TAL-Buddy was spaced 7 nt from the TALEN binding sequence (i.e., 33 nt from the TALEN cleavage site), indel formation at the CMPK1-C target increased approximately 2-fold (FIG. 15).

Example 4. TAL-Buddies with different spacings relative to the TALEN binding sequences (table 12) were designed and tested in 293FT cells using the method described in Example 3.

Results: TAL-Buddies were effective when spaced 7 to 30 nt from the TALEN binding sequence (FIG. 16). Optimal enhancement of TALEN cleavage occurred when the TAL-Buddies were 4 to 30 nt from the TALENs. Placing TAL-Buddies immediately next to the TALEN, or more than 50 nt from it, did not enhance TALEN cleavage (FIGS. 16 and 17).
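The spacing observations from Examples 3 and 4 can be written as a simple design check. This is a hypothetical sketch, assuming spacing is measured as the number of nucleotides between the TALEN binding sequence and the TAL-Buddy binding sequence on genomic coordinates (half-open intervals). Because this section states the effective window both as 7-30 nt and as 4-30 nt, the lower bound is a parameter defaulting to 4; the function names and coordinates are illustrative.

```python
def gap_nt(site_a: tuple[int, int], site_b: tuple[int, int]) -> int:
    """Nucleotides between two half-open [start, end) intervals; negative if they overlap."""
    return max(site_b[0] - site_a[1], site_a[0] - site_b[1])


def classify_spacing(spacing_nt: int, low: int = 4, high: int = 30, cutoff: int = 50) -> str:
    """Place a TAL-Buddy spacing relative to the observations in Examples 3-4."""
    if spacing_nt < 0:
        raise ValueError("binding sequences overlap")
    if spacing_nt == 0 or spacing_nt > cutoff:
        return "no enhancement observed"   # immediately adjacent, or > 50 nt away
    if low <= spacing_nt <= high:
        return "enhancement window"        # reported optimal range
    return "not characterized here"        # spacings this section does not report on


talen_site = (100, 118)  # hypothetical TALEN binding coordinates
buddy_site = (125, 143)  # hypothetical TAL-Buddy binding coordinates
print(classify_spacing(gap_nt(talen_site, buddy_site)))  # enhancement window
```

Keeping the thresholds as parameters makes it easy to tighten the window (for example, `low=7`) if only the directly tested spacings are to be trusted.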

Example 5. TAL-Buddies flanking the CRISPR sgRNA target at the UFSP2-SNP site were designed to be spaced 7 nt or 20 nt from the CRISPR sgRNA binding sequence.

The genomic sequence of the UFSP2-SNP target is set forth in SEQ ID NO: 25 and SEQ ID NO: 43.

Results: When the TAL-Buddies were spaced 7 nt or 20 nt from the binding sequence of a poorly performing CRISPR sgRNA (i.e., 23 nt and 37 nt from the CRISPR cleavage site, respectively), indel formation increased 10- to 20-fold. The results are shown in FIG. 17.

Example 6. Testing of mutant SpCas9 forms designed to minimize the off-target effects of wild-type SpCas9

Enhancing the accessibility of the DNA target locus can enhance the activity of poorly performing Cas9 proteins, for example: high-fidelity Cas9 nucleases such as those described by Kleinstiver, Benjamin P. et al., "High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects," Nature (2016), PubMed PMID: 26735016; Cas9 proteins that recognize modified PAMs; and other orthologous Cas proteins, such as CRISPR from Prevotella and Francisella 1 (Cpf1). Any commonly known and described mutant form of Cas9 can be used in the methods and compositions provided herein. Non-limiting examples of Cas9 proteins contemplated for use in the methods and compositions provided herein are described in Slaymaker, Ian M. et al., "Rationally engineered Cas9 nucleases with improved specificity," Science (2016), and in Kleinstiver et al. (2016), cited above, each incorporated by reference in its entirety for all purposes. However, the on-target cleavage efficiency of these two mutant forms (eSpCas9 and SpCas9-HF1) is also compromised. TAL-Buddies spaced 20 nt from the sgRNA binding sequence were therefore tested in combination with RNPs formed from sgRNA and eSpCas9 or SpCas9-HF1. 100 ng each of Lt and Rt TAL-Buddy mRNA and CRISPR-RNP (1000 ng of SpCas9-HF1 or eSpCas9 protein and 200 ng of sgRNA) were transfected into approximately 50,000 293FT human embryonic kidney cells using an electroporation device (Thermo Fisher Scientific) at a pulse voltage of 1150, a pulse width of 20, and 2 pulses. Cells were harvested and lysed 48 to 72 hours after transfection, and indel formation was analyzed using the GeneArt™ Genomic Cleavage Detection Kit (Thermo Fisher Scientific, catalog No. A24372).

Results: When TAL-Buddies spaced 20 nt from the sgRNA binding sequence were added, indel formation by CRISPR-RNPs formed with sgRNA and SpCas9-HF1 or eSpCas9 increased 5-fold and 14-fold, respectively (FIG. 18).

Example 7. Truncated gRNAs 15 nt in length ("CR-PAL") show dsDNA binding activity but no cleavage activity in the presence of wild-type Cas9.

A schematic of the templates used to make the sgRNAs and CR-PALs is shown in FIG. 19, and the function of CR-PAL is illustrated in FIG. 20. 15-mer gRNAs (CR-PALs) binding near the CRISPR cleavage site were designed and prepared by in vitro transcription. The genomic DNA sequence and the relative position of the full-length sgRNA binding sequence are set forth in SEQ ID NO: 44 and SEQ ID NO: 45, respectively.
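How the 15-nt CR-PAL sites were chosen is not specified beyond their lying near the CRISPR cleavage site. As a hypothetical sketch, assuming wild-type SpCas9 with an NGG PAM, a top-strand-only scan, and a user-chosen distance limit, candidate truncated-guide sites could be enumerated as below. A practical design would also scan the reverse strand and check specificity; the function name, parameters, and distance definition are illustrative, not the method used in this example.

```python
import re
from typing import Optional

# SpCas9 NGG PAM; the lookahead lets overlapping PAMs (e.g. in GGG) all match
PAM = re.compile(r"(?=[ACGT]GG)")


def truncated_guide_sites(seq: str, cut_site: int, max_distance: int = 50,
                          length: int = 15,
                          exclude: Optional[tuple[int, int]] = None) -> list[tuple[int, str]]:
    """Return (start, spacer) for top-strand spacers of `length` nt 5' of an NGG PAM.

    A site is kept when the gap between `cut_site` and its spacer + PAM interval
    is at most `max_distance` nt and the interval does not overlap `exclude`
    (for example, the full-length sgRNA binding site).
    """
    seq = seq.upper()
    sites = []
    for match in PAM.finditer(seq):
        pam = match.start()
        start, end = pam - length, pam + 3  # half-open spacer + PAM interval
        if start < 0:
            continue
        if max(start - cut_site, cut_site - end, 0) > max_distance:
            continue
        if exclude is not None and start < exclude[1] and exclude[0] < end:
            continue
        sites.append((start, seq[start:pam]))
    return sites


demo = "A" * 30 + "C" * 15 + "AGG" + "T" * 30  # hypothetical locus
print(truncated_guide_sites(demo, cut_site=60))
```

Passing the full-length sgRNA interval as `exclude` keeps candidate CR-PALs from competing with the cutting guide for the same site.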

Results: Indel formation increased more than 60-fold with both the left (Lt) and right (Rt) CR-PALs (FIGS. 21 and 34).

Example 8: Cas9 NLS variants

Cas9 v2 (BPSV40 tag/nucleoplasmin), IDT Cas9 (catalog No. 1074181), and Cas9 v1 (-/3x SV40) were compared in A549 cells at two targets (HPRT and PRKCG) using 4-fold serial dilutions to determine how protein concentration affects functional performance. HPRT is considered an easy target to modify, whereas PRKCG is more difficult to modify.

RNP complexes were formed using 1 μg of Cas9 protein (from each source) and 250 ng of gRNA (HPRT or PRKCG). After 10 minutes of incubation, the complexes were diluted in an appropriate volume of OPTI-MEM™ medium, and 4-fold serial dilutions were prepared from this initial concentration. Each serial dilution was mixed with LIPOFECTAMINE™ CRISPRMAX™ reagent according to the manufacturer's instructions and then added to approximately 50,000 293FT cells. Transfected cells were grown for 3 days, and editing efficiency was measured by genomic cleavage assay.
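The 4-fold dilution series can be tabulated as below. The number of dilution steps is not stated in this example, so `steps` is a hypothetical parameter. Because Cas9 and gRNA are diluted together as a preformed complex, their 4:1 mass ratio (1 μg : 250 ng) stays constant down the series.

```python
def serial_dilution(initial: float, fold: float = 4.0, steps: int = 5) -> list[float]:
    """Amount per tube in a serial dilution, starting with the undiluted amount."""
    if fold <= 1 or steps < 1:
        raise ValueError("fold must exceed 1 and steps must be at least 1")
    return [initial / fold ** i for i in range(steps)]


cas9_ng = serial_dilution(1000.0)  # 1 μg Cas9 protein
grna_ng = serial_dilution(250.0)   # 250 ng gRNA
for protein, guide in zip(cas9_ng, grna_ng):
    print(f"Cas9 {protein:8.2f} ng  gRNA {guide:7.2f} ng")
```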

Several SpyCas9 backbone variants with various NLS or affinity tags added to the N- or C-terminus were also tested (see FIG. 43; data not shown).

Of the three forms compared in FIG. 44, Cas9 v2 showed clearly the highest activity across the dilution range.

Example 9: TALEN cleavage and homology-directed repair efficiencies

The TALEN designs for the targets used to generate the data in FIGS. 48-50 are set forth in Table 8 below. Tables 9-11 below set forth the data used to generate FIGS. 48-50.

TABLE 8

For each 50,000 cells grown in 96-well plates, 100 ng of forward TALEN mRNA, 100 ng of reverse TALEN mRNA, and/or 10 pmol of donor single-stranded oligonucleotide were used. The donor oligonucleotide contained a 6-nucleotide HindIII recognition site in the middle and 35-nucleotide homology arms at the 5' and 3' ends. The two most distal nucleotides at each of the 5' and 3' ends carry phosphorothioate linkages to prevent nuclease degradation.
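The donor layout described above (35-nt arm, 6-nt HindIII site, 35-nt arm, 76 nt in total) can be expressed as a small builder. This is a sketch with hypothetical function names: the arm sequences would come from the genomic sequence flanking each TALEN cut site, and the phosphorothioate end modifications are specified at oligonucleotide synthesis and are not modeled here. The check that the donor itself contains exactly one AAGCTT site reflects the assumption that the HindIII readout should report only the donor-derived site.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence (palindromic)


def build_donor(left_arm: str, right_arm: str, arm_length: int = 35) -> str:
    """Return a donor ssODN: 5' homology arm + HindIII site + 3' homology arm."""
    left_arm, right_arm = left_arm.upper(), right_arm.upper()
    for name, arm in (("5'", left_arm), ("3'", right_arm)):
        if len(arm) != arm_length:
            raise ValueError(f"{name} arm is {len(arm)} nt; expected {arm_length}")
    donor = left_arm + HINDIII_SITE + right_arm
    # A second site (in an arm or at an arm/site junction) would confound the readout
    if donor.count(HINDIII_SITE) != 1:
        raise ValueError("donor should contain exactly one HindIII site")
    return donor


donor = build_donor("A" * 35, "C" * 35)  # placeholder arms for illustration
print(len(donor))  # 76
```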

On the day of transfection, cells were prepared as follows: (1) calculate the total number of cells required (50,000 cells per transfection); (2) split the cells and count them; (3) centrifuge the required number of cells at 1,000 rpm for 5 minutes; (4) wash the cell pellet once with DPBS and centrifuge again at 1,000 rpm for 5 minutes; (5) resuspend the pellet in 5 μl of Resuspension Buffer R per 50,000 cells (Thermo Fisher Scientific, transfection system 100 μL kit, catalog No. MPK10096); and (6) add 100 ng of forward TALEN mRNA, 100 ng of reverse TALEN mRNA, 10 pmol of donor single-stranded oligonucleotide, and 5 μl of Buffer R to each 5 μl of cell suspension in Buffer R.
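The per-reaction quantities in steps (1)-(6) scale linearly with the number of electroporations. The helper below is a hypothetical convenience sketch; the optional `overage` fraction for pipetting loss is an assumption not stated in this example.

```python
from dataclasses import dataclass

# Per-electroporation quantities from steps (1)-(6)
CELLS = 50_000
RESUSPENSION_UL = 5.0  # Buffer R used to resuspend each 50,000 cells
MIX_BUFFER_UL = 5.0    # Buffer R added together with the nucleic acids
TALEN_MRNA_NG = 100.0  # each of forward and reverse TALEN mRNA
DONOR_PMOL = 10.0      # donor single-stranded oligonucleotide


@dataclass
class ReactionPlan:
    total_cells: int
    resuspension_ul: float
    forward_mrna_ng: float
    reverse_mrna_ng: float
    donor_pmol: float
    mix_buffer_ul: float


def plan_electroporations(n_reactions: int, overage: float = 0.0) -> ReactionPlan:
    """Scale the per-reaction amounts to n_reactions, with an optional fractional overage."""
    if n_reactions < 1 or overage < 0:
        raise ValueError("need at least one reaction and a non-negative overage")
    scale = n_reactions * (1 + overage)
    return ReactionPlan(
        total_cells=round(CELLS * scale),
        resuspension_ul=RESUSPENSION_UL * scale,
        forward_mrna_ng=TALEN_MRNA_NG * scale,
        reverse_mrna_ng=TALEN_MRNA_NG * scale,
        donor_pmol=DONOR_PMOL * scale,
        mix_buffer_ul=MIX_BUFFER_UL * scale,
    )


print(plan_electroporations(8))
```

Each reaction totals 10 μl (5 μl of cells plus 5 μl of nucleic acid mix in Buffer R), which matches the 10 μl tip volume used for electroporation.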

Electroporation was performed using a 10 μl tip and the electroporation pipette system (Thermo Fisher Scientific, catalog No. MPK5000). The electroporation conditions were as follows: for 293FT cells, pulse voltage 1300, pulse width 20, 2 pulses; for U2OS cells, pulse voltage 1400, pulse width 20, 2 pulses; and for A549 cells, pulse voltage 1150, pulse width 30, 2 pulses.

The electroporated cells were then transferred to 100 μl of pre-warmed growth medium in 96-well culture plates. Cells were harvested 48-72 hours post-transfection; cleavage efficiency was analyzed using the GeneArt™ Genomic Cleavage Detection Kit (Thermo Fisher Scientific, catalog No. A24372), and HDR efficiency was determined by HindIII digestion.
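Both readouts reduce to band-intensity ratios from a gel. The sketch below assumes densitometry values in consistent arbitrary units. For the mismatch-cleavage readout it uses the heteroduplex correction commonly applied to such assays, 1 - sqrt(1 - fraction cleaved); check the kit manual for the exact formula it specifies. For the HindIII readout it assumes the fraction of amplicon cut by HindIII is taken directly as the HDR frequency. Function names are hypothetical.

```python
import math


def fraction_cleaved(cleaved_bands: list[float], parental_band: float) -> float:
    """Fraction of signal in cleaved bands relative to cleaved + parental."""
    cleaved = sum(cleaved_bands)
    total = cleaved + parental_band
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return cleaved / total


def indel_frequency(cleaved_bands: list[float], parental_band: float) -> float:
    """Mismatch-cleavage estimate corrected for random heteroduplex formation."""
    return 1 - math.sqrt(1 - fraction_cleaved(cleaved_bands, parental_band))


def hdr_frequency(hindiii_bands: list[float], uncut_band: float) -> float:
    """HindIII estimate: fraction of amplicon cut at the donor-derived site."""
    return fraction_cleaved(hindiii_bands, uncut_band)


print(round(indel_frequency([30.0, 20.0], 50.0), 3))  # 0.293
print(hdr_frequency([25.0], 75.0))                     # 0.25
```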

Description of amino acid and nucleotide sequences

Table 12 provides a listing of some of the sequences referenced herein.

Table 12: various nucleotide and amino acid sequences as referred to herein

The names, headings, and sub-headings provided herein should not be construed as limiting the various aspects of the disclosure. Accordingly, the terms defined below are more fully defined by reference to the specification as a whole.

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. See, e.g., Singleton et al., Dictionary of Microbiology and Molecular Biology, 2nd ed., J. Wiley & Sons (New York, N.Y., 1994); Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (Cold Spring Harbor, N.Y., 1989). Any methods, devices, and materials similar or equivalent to those described herein can be used in the practice of the present invention. Definitions are provided herein to facilitate understanding of certain terms used frequently herein and are not meant to limit the scope of the present disclosure.

In this application, the use of "or" means "and/or" unless stated otherwise. In the context of multiple dependent claims, "or" refers back to more than one preceding independent or dependent claim in the alternative only. It should be further noted that, as used in this specification and the appended claims, the singular forms "a," "an," and "the," and any singular use of any word, include plural referents unless expressly and unequivocally limited to one referent. As used herein, the term "comprise" and grammatical variations thereof are intended to be non-limiting, such that listing items in a list does not exclude other similar items that may be substituted or added to the listed items.

As used herein, any concentration range, percentage range, ratio range, or integer range is to be understood as including any integer value within the range, and where appropriate, fraction thereof (e.g., tenth and hundredth of an integer), unless otherwise indicated.

Units, prefixes, and symbols are expressed in a form accepted by the international system of units (SI). Numerical ranges include the numbers defining the range. The measured values are to be understood as approximations, taking into account the significant figures and the errors associated with the measurement.

The foregoing written description is considered to be sufficient to enable those skilled in the art to practice the embodiments. The foregoing description and examples detail certain embodiments and describe the best mode contemplated by the inventors. It should be understood, however, that the embodiments may be practiced in many ways regardless of the degree of detail set forth in the text, and should be construed in accordance with the appended claims and any equivalents thereof. The specification and exemplary embodiments should not be considered as limiting.

For the purposes of the present specification and appended claims, unless otherwise indicated, all numbers expressing quantities, percentages or proportions, and other numerical values used in the specification and claims, are to be understood as being modified in all instances by the term "about" to the extent they have not been so modified. Accordingly, unless indicated to the contrary, the numerical parameters set forth in the specification and attached claims are approximations that may vary depending upon the desired properties sought to be obtained. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques.

When a term such as "less than or equal to" or "greater than or equal to" precedes a list of values or ranges, the term modifies all values or ranges provided in the list. In some embodiments, the numerical values are rounded to the nearest integer or significant figure.

Exemplary embodiments of the invention are set out in the following items:

Item 1. A method for homologous recombination in an initial nucleic acid molecule, said method comprising: (a) creating a double-strand break in the initial nucleic acid molecule to produce a cleaved nucleic acid molecule; and (b) contacting the cleaved nucleic acid molecule with a donor nucleic acid molecule, wherein the initial nucleic acid molecule comprises a promoter and a gene, and wherein the donor nucleic acid molecule comprises: (i) matched ends of 12 bp to 250 bp in length at the 5' and 3' ends, (ii) a promoterless selection marker, (iii) a reporter gene, (iv) a self-cleaving peptide linking the promoterless selection marker to the reporter gene, or LoxP sites located on either side of the promoterless selection marker, and (v) optionally, a linker between the promoterless selection marker and the reporter gene.

Item 2. The method according to item 1, wherein the double-strand break in the nucleic acid molecule is located: (i) 250 bp or less from the ATG start codon, for N-terminal labeling of the cleaved nucleic acid molecule; or (ii) 250 bp or less from the stop codon, for C-terminal labeling of the cleaved nucleic acid molecule.
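The 250 bp window recited above for N- and C-terminal labeling amounts to a distance check between the double-strand break and the start or stop codon. As a minimal sketch, assuming 0-based coordinates on the coding strand (the break position, the A of ATG, and the first base of the stop codon), a candidate cut site could be screened as follows; the function name is hypothetical.

```python
MAX_DISTANCE_BP = 250  # window recited for N- and C-terminal labeling


def tagging_options(cut_site: int, start_codon: int, stop_codon: int,
                    max_distance: int = MAX_DISTANCE_BP) -> list[str]:
    """Return the terminal-labeling strategies compatible with a cut site."""
    options = []
    if abs(cut_site - start_codon) <= max_distance:
        options.append("N-terminal")
    if abs(cut_site - stop_codon) <= max_distance:
        options.append("C-terminal")
    return options


print(tagging_options(cut_site=120, start_codon=100, stop_codon=5000))  # ['N-terminal']
```

For a very short gene a single cut site can satisfy both windows, in which case either labeling strategy is geometrically possible.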

Item 3. The method according to item 1, wherein the double-strand break is induced by at least one nucleic acid cleavage entity or by electroporation.

Item 4. The method according to item 3, wherein the at least one nucleic acid cleavage entity comprises a nuclease comprising one or more zinc finger proteins, one or more transcription activator-like effectors (TALEs), one or more CRISPR complexes, one or more Argonaute-nucleic acid complexes, or one or more meganucleases.

Item 5. The method according to item 3, wherein the at least one nucleic acid cleavage entity is administered using an expression vector, a plasmid, a ribonucleoprotein complex (RNC), or an mRNA.

Item 6. The method according to item 1, wherein the promoter-free selectable marker comprises a protein, an antibiotic resistance selectable marker, a cell surface protein, a metabolite, or an active fragment thereof.

Item 7. The method according to item 6, wherein the promoter-free selectable marker is a protein.

Item 8. The method according to item 7, wherein the protein is focal adhesion kinase (FAK), angiopoietin-associated growth factor (AGF) receptor, or epidermal growth factor receptor (EGFR).

Item 9. The method according to item 6, wherein the promoterless selection marker is an antibiotic resistance selection marker.

Item 10. The method according to item 9, wherein the antibiotic resistance selection marker is a recombinant antibody.

Item 11. The method according to item 10, wherein the antibiotic resistance selection marker is a human IgG antibody.

Item 12. The method according to item 1, wherein the reporter gene comprises a fluorescent protein reporter.

Item 13. The method according to item 12, wherein the fluorescent protein reporter is an Emerald Green Fluorescent Protein (EmGFP) reporter or an Orange Fluorescent Protein (OFP) reporter.

Item 14. The method according to item 1, wherein the promoter-free selectable marker is: (i) linked to the 5' end of the reporter gene for N-terminal labeling of the cleaved nucleic acid molecule; or (ii) linked to the 3' end of the reporter gene for C-terminal labeling of the cleaved nucleic acid molecule.

Item 15. The method according to item 1, wherein the donor nucleic acid molecule comprises a linker between the promoterless selection marker and the reporter gene.

Item 16. The method according to item 15, wherein the distance between the promoter-free selection marker and the reporter gene is less than or equal to 300 nt, 240 nt, 180 nt, 150 nt, 120 nt, 90 nt, 60 nt, 30 nt, 15 nt, 12 nt, or 9 nt.

Item 17. The method according to item 16, wherein the distance is 6 nt.

Item 18. The method according to item 15, wherein the linker is a polyglycine linker.

Item 19. The method according to item 1, wherein the self-cleaving peptide is a self-cleaving 2A peptide.

Item 20. The method according to item 1, wherein the matched ends are added to the 5' and 3' ends of the donor nucleic acid molecule by PCR amplification.

Item 21. The method according to item 1, wherein the matched ends share at least 95% sequence identity.

Item 22. The method according to item 1, wherein the matched ends comprise single-stranded DNA or double-stranded DNA.

Item 23. The method according to item 1, wherein the matched ends at the 5' and 3' ends of the donor nucleic acid molecule have a length of 12 bp to 200 bp, 12 bp to 150 bp, 12 bp to 100 bp, 12 bp to 50 bp, or 12 bp to 40 bp.

Item 24. The method according to item 23, wherein the matched ends have a length of 35 bp.

Item 25. The method according to item 1, wherein the initial nucleic acid molecule is in a cell or a plasmid.

Item 26. The method according to item 1, wherein the donor nucleic acid molecule has a length of less than or equal to 1 kb, 2 kb, 3 kb, 5 kb, 10 kb, 15 kb, 20 kb, 25 kb, or 30 kb.

Item 27. The method according to item 1, wherein the donor nucleic acid molecule is integrated into the cleaved nucleic acid molecule by homology-directed repair (HDR).

Item 28. The method according to item 27, wherein the HDR efficiency is greater than or equal to 10%, 25%, 50%, 75%, 90%, 95%, 98%, 99%, or 100%.

Item 29. The method according to item 1, wherein the integration efficiency of the donor nucleic acid molecule is greater than or equal to 50%, 75%, 90%, 95%, 98%, 99%, or 100%.

Item 30. The method according to item 1, further comprising modifying the donor nucleic acid molecule at the 5' end, the 3' end, or both the 5' and 3' ends.

Item 31. The method according to item 30, wherein the donor nucleic acid molecule is modified at the 5' and 3' ends.

Item 32. The method according to item 30, wherein the donor nucleic acid molecule is modified to have one or more nuclease-resistant groups in at least one strand at at least one terminus.

Item 33. The method according to item 32, wherein the one or more nuclease-resistant groups comprise one or more phosphorothioate groups, one or more amino groups, 2'-O-methyl nucleotides, 2'-deoxy-2'-fluoro nucleotides, 2'-deoxy nucleotides, 5-C-methyl nucleotides, or a combination thereof.

Item 34. The method according to item 1, further comprising treating the donor nucleic acid molecule with at least one non-homologous end joining (NHEJ) inhibitor.

Item 35. The method according to item 34, wherein the at least one NHEJ inhibitor is an inhibitor of DNA-dependent protein kinase (DNA-PK), DNA ligase IV, poly(ADP-ribose) polymerase 1 or 2 (PARP-1 or PARP-2), or a combination thereof.

Item 36. The method according to item 35, wherein the DNA-PK inhibitor is NU7026 (2-(4-morpholinyl)-4H-naphtho[1,2-b]pyran-4-one), NU7441 (8-(4-dibenzothienyl)-2-(4-morpholinyl)-4H-1-benzopyran-4-one), KU-0060648 (4-ethyl-N-[4-[2-(4-morpholinyl)-4-oxo-4H-1-benzopyran-8-yl]-1-dibenzothienyl]-1-piperazineacetamide), Compound 401 (2-(4-morpholinyl)-4H-pyrimido[2,1-a]isoquinolin-4-one), DMNB (4,5-dimethoxy-2-nitrobenzaldehyde), ETP 45658 (3-[1-methyl-4-(4-morpholinyl)-1H-pyrazolo[3,4-d]pyrimidin-6-yl]phenol), LTURM 34 (8-(4-dibenzothienyl)-2-(4-morpholinyl)-4H-1,3-benzoxazin-4-one), or PI-103 hydrochloride (3-[4-(4-morpholinyl)pyrido[3',2':4,5]furo[3,2-d]pyrimidin-2-yl]phenol hydrochloride).

Item 37. The method according to item 1, wherein the mammal is a human, a mammalian laboratory animal, a mammalian farm animal, a mammalian sport animal, or a mammalian pet.

Item 38. The method according to item 37, wherein the mammal is a human.

Item 39. A cell or plasmid made by the method according to item 1.

Item 40. The cell according to item 39, wherein the cell is a eukaryotic cell.

Item 41. The cell according to item 40, wherein the eukaryotic cell is a mammalian cell.

Item 42. A method of cell therapy comprising administering to a subject in need thereof an effective amount of a cell according to item 41.

Item 43. The method according to item 42, wherein the cell is a T cell and the promoterless selection marker is a chimeric antigen receptor (CAR).

Item 44. A method for producing a promoter-free selectable marker, comprising activating a promoter of a cell or plasmid prepared according to the method of item 1 to produce the promoter-free selectable marker.

Item 45. A composition comprising the promoter-free selectable marker produced by the method of item 44.

Item 46. A method for therapeutically treating a subject in need thereof, comprising administering an effective amount of a promoter-free selectable marker produced according to the method of item 44.

Item 47. A drug screening assay comprising the promoter-free selectable marker produced according to the method of item 44.

Item 48. A kit for generating a promoter-free selectable marker, comprising a promoter-free selectable marker linked to a reporter gene by a self-cleaving peptide, or LoxP sites located on either side of the selectable marker.

Item 49. The kit according to item 48, wherein the reporter gene is GFP or OFP.

Item 50. The kit according to item 48, further comprising at least one nucleic acid cleaving entity.

Item 51. The kit according to item 50, further comprising at least one NHEJ inhibitor.

Item 52. The kit according to item 51, further comprising one or more nuclease-resistant groups.

Item 53. A recombinant antibody expression cassette comprising: (i) matched ends at the 5' and 3' ends of the expression cassette, wherein the length of the matched ends is less than or equal to 250 bp; (ii) a promoter-free selectable marker; (iii) a reporter gene; (iv) a self-cleaving peptide linking the promoter-free selectable marker to the reporter gene; and (v) optionally, a linker between the promoterless selection marker and the reporter gene, wherein the promoterless selection marker is linked to the 5' end of the reporter gene for N-terminal labeling of the cleaved nucleic acid molecule or to the 3' end of the reporter gene for C-terminal labeling of the cleaved nucleic acid molecule.

Item 54. A method of enhancing accessibility of a target locus in a cell, the method comprising: (1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus, thereby enhancing accessibility of the target locus relative to the absence of the first DNA binding modulation enhancer.

Item 55. The method according to item 54, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

Item 56. The method according to item 54, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

Item 57. The method according to item 54, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

Item 58. The method according to item 54, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

Item 59. The method according to item 54, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 60. The method according to item 54, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

Item 61. The method according to item 54, wherein the first enhancer binding sequence has the sequence of SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, or SEQ ID NO: 40.

Item 62. A method of replacing chromatin of a target locus in a cell, the method comprising: (1) introducing a first DNA binding modulation enhancer into the cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence of the target locus, thereby replacing chromatin of the target locus.

Item 63. The method according to item 62, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

Item 64. The method according to item 62, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

Item 65. The method according to item 62, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

Item 66. The method according to item 62, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

Item 67. The method according to item 62, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 68. The method according to item 62, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

Item 69. A method of reconfiguring chromatin of a target locus in a cell, the method comprising: (1) introducing a first DNA binding modulation enhancer into a cell comprising a nucleic acid encoding a locus of interest, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence of the target locus, thereby reconfiguring chromatin of the target locus.

Item 70. The method according to item 69, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

Item 71. The method according to item 69, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

Item 72. The method according to item 69, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

Item 73. The method according to item 69, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

Item 74. The method according to item 69, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 75. The method according to item 69, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

A method of enhancing accessibility of a target locus in a cell, the method comprising: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first DNA binding modulation enhancer, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (ii) a second DNA binding modulation enhancer, wherein the second DNA binding modulation enhancer is not endogenous to the cell; (2) a first enhancer binding sequence that binds the first DNA binding modulation enhancer to the target locus; and (3) a second enhancer binding sequence that binds the second DNA binding modulation enhancer to the target locus, relative to the absence of the first DNA binding modulation enhancer or the second DNA binding modulation enhancer, thereby enhancing accessibility of the target locus.

The method according to item 77, item 76, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

The method according to clause 76, clause 78, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

The method according to item 76, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

The method according to item 80, item 76, wherein said introducing a second DNA binding modulation enhancer comprises introducing a vector encoding said second DNA binding modulation enhancer.

The method according to item 81, item 76, wherein said introducing a second DNA binding modulation enhancer comprises introducing mRNA encoding said second DNA binding modulation enhancer.

The method according to item 76, item 82, wherein said introducing a second DNA binding modulation enhancer comprises introducing a second DNA binding protein or a second DNA binding nucleic acid.

The method according to clause 76, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

The method according to item 84, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

The method according to item 76, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide rna (grna).

The method according to item 76, wherein the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid.

The method according to item 76, wherein the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

Item 88. the method according to item 76, wherein the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein.

The method according to item 76, wherein the first DNA binding modulation enhancer is a TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA.

Item 90. The method according to item 76, wherein the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA.

Item 91. The method according to item 76, wherein the first DNA binding modulation enhancer is a truncated gRNA and the second DNA binding modulation enhancer is a TAL effector protein.

Item 92. The method according to item 76, wherein the first enhancer binding sequence has the sequence of SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, or SEQ ID NO: 40.

Item 93. The method, wherein the second enhancer binding sequence has the sequence of SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 39, or SEQ ID NO: 41.

A method of displacing chromatin of a target locus in a cell, the method comprising: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first DNA binding modulation enhancer, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (ii) a second DNA binding modulation enhancer, wherein the second DNA binding modulation enhancer is not endogenous to the cell; (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus; and (3) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus, thereby displacing chromatin of the target locus.

The method according to item 94, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

The method according to item 94, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

Item 97. The method according to item 94, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

The method according to item 94, wherein said introducing a second DNA binding modulation enhancer comprises introducing a vector encoding said second DNA binding modulation enhancer.

The method according to item 94, wherein said introducing a second DNA binding modulation enhancer comprises introducing mRNA encoding said second DNA binding modulation enhancer.

The method of item 94, wherein said introducing a second DNA binding modulation enhancer comprises introducing a second DNA binding protein or a second DNA binding nucleic acid.

The method according to item 94, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

The method according to item 94, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 103. The method according to item 94, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

The method according to item 94, wherein the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid.

Item 105. The method according to item 94, wherein the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

Item 106. The method of item 94, wherein the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein.

Item 107. The method according to item 94, wherein the first DNA binding modulation enhancer is a TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA.

Item 108. The method according to item 94, wherein the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA.

Item 109. The method according to item 94, wherein the first DNA binding modulation enhancer is a truncated gRNA and the second DNA binding modulation enhancer is a TAL effector protein.

The method according to item 94, wherein the first enhancer binding sequence has the sequence of SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, or SEQ ID NO: 40.

The method according to item 94, wherein the second enhancer binding sequence has the sequence of SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 39, or SEQ ID NO: 41.

A method of reconfiguring chromatin of a target locus in a cell, the method comprising: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first DNA binding modulation enhancer, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (ii) a second DNA binding modulation enhancer, wherein the second DNA binding modulation enhancer is not endogenous to the cell; (2) allowing the first DNA binding modulation enhancer to bind to a first enhancer binding sequence of the target locus; and (3) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus, thereby reconfiguring chromatin of the target locus.

Item 113. The method of item 112, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

The method of item 112, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

Item 115. The method of item 112, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

Item 116. The method according to item 112, wherein said introducing a second DNA binding modulation enhancer comprises introducing a vector encoding said second DNA binding modulation enhancer.

Item 117. The method, wherein said introducing a second DNA binding modulation enhancer comprises introducing mRNA encoding said second DNA binding modulation enhancer.

The method according to item 112, wherein said introducing a second DNA binding modulation enhancer comprises introducing a second DNA binding protein or a second DNA binding nucleic acid.

Item 119. The method according to item 112, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

Item 120. The method according to item 112, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 121. The method of item 112, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

Item 122. The method of item 112, wherein the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid.

Item 123. The method according to item 112, wherein the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

Item 124. The method of item 112, wherein the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein.

Item 125. The method according to item 112, wherein the first DNA binding modulation enhancer is a TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA.

Item 126. The method according to item 112, wherein the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA.

The method according to item 112, wherein the first DNA binding modulation enhancer is a truncated gRNA and the second DNA binding modulation enhancer is a TAL effector protein.

A method of enhancing the activity of a regulatory protein or regulatory complex at a target locus in a cell, the method comprising: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a regulator binding sequence of the target locus, wherein the regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding a first enhancer binding sequence of the target locus; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence, thereby enhancing the activity of the first regulatory protein or the first regulatory complex at the target locus in the cell.

Item 129. The method, further comprising introducing a second DNA binding modulation enhancer capable of binding a second enhancer binding sequence of the target locus.

Item 130. The method of item 128, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

Item 131. The method of item 128, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

Item 132. The method, wherein said introducing a first DNA binding modulation enhancer comprises introducing a first DNA binding protein or a first DNA binding nucleic acid.

Item 133. The method according to item 129, wherein said introducing a second DNA binding modulation enhancer comprises introducing a vector encoding said second DNA binding modulation enhancer.

Item 134. The method according to item 129, wherein said introducing a second DNA binding modulation enhancer comprises introducing mRNA encoding said second DNA binding modulation enhancer.

Item 135. The method, wherein said introducing a second DNA binding modulation enhancer comprises introducing a second DNA binding protein or a second DNA binding nucleic acid.

The method of item 128, wherein said first regulatory protein or said first regulatory complex is not endogenous to said cell.

The method according to item 128, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

Item 138. The method of item 129, wherein the second enhancer binding sequence is linked to the first enhancer binding sequence through the regulator binding sequence.

Item 139. The method of item 128, further comprising introducing a second regulatory protein or a second regulatory complex capable of binding to the regulator binding sequence.

Item 140. The method according to item 128, wherein said introducing a first regulatory protein comprises introducing a vector encoding said first regulatory protein.

Item 141. The method, wherein said introducing a first regulatory protein comprises introducing mRNA encoding said first regulatory protein.

Item 142. The method according to item 128, wherein said introducing a first regulatory protein comprises introducing the first regulatory protein itself.

Item 143. The method according to item 128, wherein said introducing a first regulatory complex comprises introducing a vector encoding said first regulatory complex.

Item 144. The method of item 128, wherein said introducing a first regulatory complex comprises introducing mRNA encoding said first regulatory complex.

Item 145. The method of item 128, wherein said introducing a first regulatory complex comprises introducing the first regulatory complex itself.

Item 146. The method of item 139, wherein said introducing a second regulatory protein comprises introducing a vector encoding said second regulatory protein.

Item 147. The method according to item 139, wherein said introducing a second regulatory protein comprises introducing mRNA encoding said second regulatory protein.

Item 148. The method of item 139, wherein said introducing a second regulatory protein comprises introducing the second regulatory protein itself.

Item 149. The method according to item 139, wherein said introducing a second regulatory complex comprises introducing a vector encoding said second regulatory complex.

Item 150. The method of item 139, wherein said introducing a second regulatory complex comprises introducing mRNA encoding said second regulatory complex.

Item 151. The method of item 139, wherein said introducing a second regulatory complex comprises introducing the second regulatory complex itself.

The method of item 139, wherein the first regulatory protein or the second regulatory protein comprises a DNA binding protein or a DNA regulatory enzyme.

Item 153. The method, wherein the DNA binding protein is a transcriptional repressor or a transcriptional activator.

Item 154. The method, wherein the DNA regulatory enzyme is a nuclease, deaminase, methylase, or demethylase.

Item 155. The method of item 128, wherein the first regulatory protein or the second regulatory protein comprises a histone regulatory enzyme.

Item 156. The method, wherein the histone regulatory enzyme is a deacetylase or an acetylase.

Item 157. The method according to item 128, wherein the first regulatory protein is a first DNA binding protein nuclease conjugate.

Item 158. The method, wherein the second regulatory protein is a second DNA binding protein nuclease conjugate.

Item 159. The method according to item 158, wherein the first DNA binding protein nuclease conjugate comprises a first nuclease and the second DNA binding protein nuclease conjugate comprises a second nuclease.

Item 160. The method according to item 159, wherein said first nuclease and said second nuclease form a dimer.

Item 161. The method according to item 159, wherein the first nuclease and the second nuclease are independently a transcription activator-like effector nuclease (TALEN).

Item 162. The method according to item 159, wherein the first DNA binding protein nuclease conjugate comprises a first transcription activator-like (TAL) effector domain operably linked to a first nuclease (TALEN).

Item 163. The method according to item 159, wherein the first DNA binding protein nuclease conjugate comprises a first TAL effector domain operably linked to a first FokI nuclease.

Item 164. The method according to item 159, wherein the second DNA binding protein nuclease conjugate comprises a second TAL effector domain operably linked to a second nuclease (TALEN).

Item 165. The method of item 159, wherein the second DNA binding protein nuclease conjugate comprises a second TAL effector domain operably linked to a second FokI nuclease.

Item 166. The method according to item 159, wherein the first DNA binding protein nuclease conjugate comprises a first zinc finger nuclease.

Item 167. The method according to item 159, wherein the second DNA binding protein nuclease conjugate comprises a second zinc finger nuclease.

Item 168. The method of item 128, wherein the first regulatory complex is a first ribonucleoprotein complex.

Item 169. The method of item 139, wherein the second regulatory complex is a second ribonucleoprotein complex.

Item 170. The method according to item 168, wherein the first ribonucleoprotein complex comprises a CRISPR-associated protein 9 (Cas9) domain that binds to a gRNA or an Argonaute domain that binds to a guide DNA (gDNA).

Item 171. The method according to item 169, wherein the second ribonucleoprotein complex comprises a CRISPR-associated protein 9 (Cas9) domain that binds to a gRNA or an Argonaute domain that binds to a guide DNA (gDNA).

The method of item 139, wherein the first regulatory protein, the first regulatory complex, the second regulatory protein, or the second regulatory complex is not endogenous to the cell.

The method of item 139, wherein the first regulatory protein and the second regulatory protein are not endogenous to the cell.

The method of item 139, wherein the first regulatory complex and the second regulatory complex are not endogenous to the cell.

Item 175. The method of item 168, wherein the first DNA binding modulation enhancer or the second DNA binding modulation enhancer is not endogenous to the cell.

The method according to item 129, wherein the first DNA binding modulation enhancer and the second DNA binding modulation enhancer are not endogenous to the cell.

Item 177. The method, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 178. The method, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

Item 179. The method, wherein the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid.

Item 180. The method according to item 129, wherein the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

Item 181. The method of item 129, wherein the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein.

The method according to item 129, wherein the first DNA binding modulation enhancer is a TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA.

Item 183. The method according to item 129, wherein the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA.

Item 184. The method according to item 129, wherein the first DNA binding modulation enhancer is a truncated gRNA and the second DNA binding modulation enhancer is a TAL effector protein.

Item 185. The method of item 139, wherein the first regulatory protein is a first DNA-binding nuclease conjugate and the second regulatory protein is a second DNA-binding nuclease conjugate.

The method of item 139, wherein the first regulatory protein is a DNA-binding nuclease conjugate and the second regulatory complex is a ribonucleoprotein complex.

The method according to item 139, wherein the first regulatory complex is a first ribonucleoprotein complex and the second regulatory complex is a second ribonucleoprotein complex.

Item 188. The method of item 139, wherein the first regulatory complex is a ribonucleoprotein complex and the second regulatory protein is a DNA-binding nuclease conjugate.

Item 189. The method according to item 129, wherein the first enhancer binding sequence and/or second enhancer binding sequence is separated from the regulator binding sequence by less than 200 nucleotides, less than 150 nucleotides, less than 100 nucleotides, or less than 50 nucleotides.

Item 190. The method according to item 129, wherein the first enhancer binding sequence and/or second enhancer binding sequence is separated from the regulator binding sequence by 4 to 30 nucleotides or 7 to 30 nucleotides.

Item 191. The method, wherein the first enhancer binding sequence and/or the second enhancer binding sequence is separated from the regulator binding sequence by 4 nucleotides, 7 nucleotides, 12 nucleotides, 20 nucleotides, or 30 nucleotides.

Item 192. The method, wherein the first enhancer binding sequence and/or second enhancer binding sequence is separated from the regulator binding sequence by less than 200 nucleotides, less than 150 nucleotides, less than 100 nucleotides, or less than 50 nucleotides.

The method of item 129, wherein the first enhancer binding sequence and/or the second enhancer binding sequence is 10 to 40 nucleotides apart from the regulatory site.

Item 194. The method according to item 129, wherein the first enhancer binding sequence and/or the second enhancer binding sequence is separated from the regulatory site by 33 nucleotides.

The method of item 139, wherein the first or second DNA binding modulation enhancer enhances the activity of the first regulatory protein, the first regulatory complex, the second regulatory protein, or the second regulatory complex at the regulatory site.
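Several items above (for example, items 189 and 190) define embodiments by the number of nucleotides separating an enhancer binding sequence from the regulator binding sequence or from the regulatory site. The Python sketch below illustrates that arithmetic under stated assumptions: each sequence is represented as a hypothetical 0-based, half-open interval on the target locus, the separation is counted as the nucleotides strictly between the two intervals, and ranges such as "4 to 30 nucleotides" are read as inclusive. The names `Site`, `gap_nt`, and `spacing_embodiments` and the example coordinates are illustrative only and are not part of the disclosure.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """A sequence on the target locus as a 0-based, half-open interval [start, end)."""
    start: int
    end: int


def gap_nt(a: Site, b: Site) -> int:
    """Count the nucleotides strictly between two non-overlapping sites (0 if adjacent)."""
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    raise ValueError("sites overlap")


def spacing_embodiments(gap: int) -> list[str]:
    """List the recited spacing ranges that a given gap falls into.

    Assumption: "4 to 30" and "7 to 30" are treated as inclusive ranges.
    """
    matches = [f"< {limit} nt" for limit in (200, 150, 100, 50) if gap < limit]
    if 4 <= gap <= 30:
        matches.append("4-30 nt")
    if 7 <= gap <= 30:
        matches.append("7-30 nt")
    if gap in (4, 7, 12, 20, 30):
        matches.append(f"exactly {gap} nt")
    return matches


# Hypothetical layout: a 20-nt enhancer binding sequence that ends 12 nt
# upstream of a 30-nt regulator binding sequence.
enhancer = Site(100, 120)
regulator = Site(132, 162)
gap = gap_nt(enhancer, regulator)
print(gap)  # 12
print(spacing_embodiments(gap))
```

With these coordinates the gap is 12 nucleotides, which falls within every spacing range recited for the regulator binding sequence. The 10 to 40 nucleotide and 33 nucleotide embodiments measured to the regulatory site could be checked the same way by passing the regulatory site, rather than the whole regulator binding sequence, as its own interval; how the regulatory site would be delimited is itself an assumption here.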

A method of modulating a target locus in a cell, the method comprising: (1) introducing into a cell comprising a nucleic acid encoding a locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a regulator binding sequence of the target locus, wherein the regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding a first enhancer binding sequence of the target locus; and (2) allowing said first regulatory protein or said first regulatory complex to modulate said regulatory site, thereby modulating said locus of interest in a cell.

Item 197. The method, further comprising introducing a second DNA binding modulation enhancer capable of binding to a second enhancer binding sequence of the target locus.

Item 198. The method, wherein said introducing a first DNA binding modulation enhancer comprises introducing into a cell: (1) a vector encoding the first DNA binding modulation enhancer, (2) mRNA encoding the first DNA binding modulation enhancer, or (3) the first DNA binding modulation enhancer itself.

The method according to item 197, wherein said introducing a second DNA binding modulation enhancer comprises introducing into the cell: (1) a vector encoding the second DNA binding modulation enhancer, (2) mRNA encoding the second DNA binding modulation enhancer, or (3) the second DNA binding modulation enhancer itself.

Item 200. The method of item 199, wherein said introducing a second DNA binding modulation enhancer comprises introducing mRNA encoding the second DNA binding modulation enhancer.

The method according to item 197, wherein said introducing a second DNA binding modulation enhancer comprises introducing a second DNA binding protein or a second DNA binding nucleic acid.

The method according to item 196, wherein said first regulatory protein or said first regulatory complex is not endogenous to said cell.

Item 203. The method according to item 196, wherein the rate at which homologous recombination occurs at the target locus is increased relative to the absence of the first DNA binding modulation enhancer.

Item 204. The method according to item 197, wherein the second enhancer binding sequence is linked to the first enhancer binding sequence through the regulator binding sequence.

The method of item 196, further comprising introducing a second regulatory protein or a second regulatory complex capable of binding to the regulator binding sequence.

Item 206. The method, wherein said introducing a first regulatory protein comprises introducing a vector encoding said first regulatory protein.

Item 207. The method according to item 196, wherein said introducing a first regulatory protein comprises introducing mRNA encoding said first regulatory protein.

Item 208. The method according to item 196, wherein said introducing a first regulatory protein comprises introducing the first regulatory protein itself.

Item 209. The method according to item 196, wherein said introducing a first regulatory complex comprises introducing a vector encoding said first regulatory complex.

The method according to item 196, wherein said introducing a first regulatory complex comprises introducing mRNA encoding said first regulatory complex.

Item 211. The method, wherein said introducing a first regulatory complex comprises introducing the first regulatory complex itself.

Item 212. The method according to item 205, wherein said introducing a second regulatory protein comprises introducing a vector encoding said second regulatory protein.

Item 213. The method according to item 205, wherein said introducing a second regulatory protein comprises introducing mRNA encoding said second regulatory protein.

Item 214. The method of item 205, wherein said introducing a second regulatory protein comprises introducing the second regulatory protein itself.

Item 215. The method of item 205, wherein said introducing a second regulatory complex comprises introducing a vector encoding said second regulatory complex.

The method according to item 205, wherein said introducing a second regulatory complex comprises introducing mRNA encoding said second regulatory complex.

Item 217. The method of item 205, wherein said introducing a second regulatory complex comprises introducing the second regulatory complex itself.

Item 218. The method of item 205, wherein the first regulatory protein or the second regulatory protein comprises a DNA binding protein or a DNA regulatory enzyme.

Item 219. The method, wherein the DNA binding protein is a transcriptional repressor or a transcriptional activator.

Item 220. The method, wherein the DNA regulatory enzyme is a nuclease, deaminase, methylase, or demethylase.

Item 221. The method, wherein said first regulatory protein or said second regulatory protein comprises a histone regulatory enzyme.

Item 222. The method, wherein the histone regulatory enzyme is a deacetylase or an acetylase.

The method according to item 196, wherein the first regulatory protein is a first DNA binding protein nuclease conjugate.

The method according to item 205, wherein the second regulatory protein is a second DNA binding protein nuclease conjugate.

Item 225. The method of item 224, wherein the first DNA binding protein nuclease conjugate comprises a first nuclease and the second DNA binding protein nuclease conjugate comprises a second nuclease.

Item 226. The method, wherein said first nuclease and said second nuclease form a dimer.

Item 227. The method according to item 225, wherein the first nuclease and the second nuclease are independently a transcription activator-like effector nuclease (TALEN).

Item 228. The method according to item 225, wherein the first DNA binding protein nuclease conjugate comprises a first transcription activator-like (TAL) effector domain operably linked to a first nuclease (TALEN).

Item 229. The method according to item 228, wherein the first DNA binding protein nuclease conjugate comprises a first TAL effector domain operably linked to a first FokI nuclease.

Item 230. The method of item 227, wherein the second DNA binding protein nuclease conjugate comprises a second TAL effector domain operably linked to a second nuclease (TALEN).

Item 231. The method according to item 230, wherein the second DNA binding protein nuclease conjugate comprises a second TAL effector domain operably linked to a second FokI nuclease.

Item 232. The method of item 196, wherein the first DNA binding protein nuclease conjugate comprises a first zinc finger nuclease.

Item 233. The method of item 205, wherein the second DNA binding protein nuclease conjugate comprises a second zinc finger nuclease.

Item 234. The method, wherein the first regulatory complex is a first ribonucleoprotein complex.

Item 235. The method according to item 197, wherein the second regulatory complex is a second ribonucleoprotein complex.

Item 236. The method according to item 234, wherein the first ribonucleoprotein complex comprises a CRISPR-associated protein 9 (Cas9) domain that binds to a gRNA or an Argonaute domain that binds to a guide DNA (gDNA).

Item 237. The method according to item 235, wherein the second ribonucleoprotein complex comprises a CRISPR-associated protein 9 (Cas9) domain that binds to a gRNA or an Argonaute domain that binds to a guide DNA (gDNA).

Item 238. The method of item 205, wherein the first regulatory protein, the first regulatory complex, the second regulatory protein, or the second regulatory complex is not endogenous to the cell.

Item 239. The method, wherein said first regulatory protein and said second regulatory protein are not endogenous to said cell.

Item 240. The method of item 205, wherein the first regulatory complex and the second regulatory complex are not endogenous to the cell.

The method according to item 197, wherein the first DNA binding modulation enhancer or the second DNA binding modulation enhancer is not endogenous to the cell.

The method according to item 197, wherein the first DNA binding modulation enhancer and the second DNA binding modulation enhancer are not endogenous to the cell.

Item 243. The method, wherein the first DNA binding modulation enhancer is a first DNA binding protein or a first DNA binding nucleic acid.

Item 244. The method, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector protein or a truncated first guide RNA (gRNA).

The method according to item 197, wherein the second DNA binding modulation enhancer is a second DNA binding protein or a second DNA binding nucleic acid.

Item 246. The method of item 197, wherein the second DNA binding modulation enhancer is a TAL effector protein or a truncated gRNA.

The method of item 197, wherein the first DNA binding modulation enhancer is a first TAL effector protein and the second DNA binding modulation enhancer is a second TAL effector protein.

The method according to item 197, wherein the first DNA binding modulation enhancer is a TAL effector protein and the second DNA binding modulation enhancer is a truncated gRNA.

The method according to item 197, wherein the first DNA binding modulation enhancer is a truncated first gRNA and the second DNA binding modulation enhancer is a truncated second gRNA.

The method according to item 197, wherein the first DNA binding modulation enhancer is a truncated gRNA and the second DNA binding modulation enhancer is a TAL effector protein.

Item 251. The method of item 205, wherein the first regulatory protein is a first DNA binding protein nuclease conjugate and the second regulatory protein is a second DNA binding protein nuclease conjugate.

The method of item 205, wherein the first regulatory protein is a DNA-binding nuclease conjugate and the second regulatory complex is a ribonucleoprotein complex.

The method according to item 252, wherein the first regulatory complex is a first ribonucleoprotein complex and the second regulatory complex is a second ribonucleoprotein complex.

Item 254. The method, wherein the first regulatory complex is a ribonucleoprotein complex and the second regulatory protein is a DNA binding protein nuclease conjugate.

Item 255. The method according to item 196, wherein the first enhancer binding sequence is separated from the regulator binding sequence by less than 200 nucleotides, less than 150 nucleotides, less than 100 nucleotides, or less than 50 nucleotides.

Item 256. The method according to item 196, wherein the first enhancer binding sequence is separated from the regulator binding sequence by 4 to 30 nucleotides or 7 to 30 nucleotides.

Item 257. The method, wherein the first enhancer binding sequence is separated from the regulator binding sequence by 4 nucleotides, 7 nucleotides, 12 nucleotides, 20 nucleotides, or 30 nucleotides.

Item 258. The method, wherein the second enhancer binding sequence is separated from the regulator binding sequence by less than 200 nucleotides, less than 150 nucleotides, less than 100 nucleotides, or less than 50 nucleotides.

Item 259. The method, wherein the second enhancer binding sequence is separated from the regulator binding sequence by 4 to 30 nucleotides or 7 to 30 nucleotides.

Item 260. The method according to item 197, wherein the second enhancer binding sequence is separated from the regulator binding sequence by 4 nucleotides, 7 nucleotides, 12 nucleotides, 20 nucleotides, or 30 nucleotides.

The method according to item 197, wherein the first enhancer binding sequence or the second enhancer binding sequence is 10 to 40 nucleotides apart from the regulatory site.

The method according to item 197, wherein the first enhancer binding sequence or the second enhancer binding sequence is separated from the regulatory site by 33 nucleotides.
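The spacing limits recited in the items above can be checked with simple interval arithmetic. The sketch below is illustrative only: it assumes 0-based, half-open coordinates on one strand and measures separation as the number of nucleotides lying between the two sites (edge to edge), a convention the text does not itself define. The names `Site`, `gap_nt` and `within_spacing` are hypothetical.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """A binding sequence on the target locus, as 0-based half-open [start, end)."""
    start: int
    end: int


def gap_nt(a: Site, b: Site) -> int:
    """Nucleotides lying strictly between two non-overlapping sites (assumed convention)."""
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    raise ValueError("sites overlap")


def within_spacing(enhancer: Site, regulator: Site, low: int = 4, high: int = 30) -> bool:
    """True if the enhancer-to-regulator gap lies in [low, high] nucleotides."""
    return low <= gap_nt(enhancer, regulator) <= high


# An 18-nt enhancer site ending 12 nt before a 20-nt regulator site.
enhancer = Site(100, 118)
regulator = Site(130, 150)
print(gap_nt(enhancer, regulator))          # 12
print(within_spacing(enhancer, regulator))  # True (4 to 30 nt window)
```

The "less than 200, 150, 100 or 50 nucleotides" variants correspond to calling `within_spacing` with `low=0` and `high` set one below the stated bound.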

The method according to clause 197, wherein the first or second DNA binding modulation enhancer enhances the activity of the first regulatory protein, the first regulatory complex, the second regulatory protein or the second regulatory complex at the regulatory site.

A cell comprising a nucleic acid encoding a locus regulatory complex of interest, said complex comprising: (i) a target locus comprising a first enhancer binding sequence and a regulator binding sequence comprising a regulatory site; (ii) a first regulatory protein or first regulatory complex that binds to the regulator binding sequence; and (iii) a first DNA binding modulation enhancer that binds to the first enhancer binding sequence.

Item 265. The cell of item 264, wherein the locus of interest further comprises a second enhancer binding sequence linked to the first enhancer binding sequence by the regulator binding sequence.

Item 266. The cell of item 264, comprising a second DNA binding modulation enhancer bound to the second enhancer binding sequence.

A cell comprising a nucleic acid encoding a complex of a locus of interest, the complex comprising: (i) a locus of interest comprising a first enhancer binding sequence; and (ii) a first DNA binding modulation enhancer bound to the first enhancer binding sequence, wherein the first DNA binding modulation enhancer is not endogenous to the cell and wherein the first DNA binding modulation enhancer is capable of enhancing accessibility of the target locus relative to the absence of the first DNA binding modulation enhancer.

A cell comprising a nucleic acid encoding a complex of a locus of interest, the complex comprising: (1) a target locus, the target locus comprising: (i) a first enhancer binding sequence; and (ii) a second enhancer binding sequence; (2) a first DNA binding modulation enhancer that binds to the first enhancer binding sequence of the target locus, wherein the first DNA binding modulation enhancer is not endogenous to the cell; and (3) a second DNA binding modulation enhancer bound to the second enhancer binding sequence of the target locus, wherein the second DNA binding modulation enhancer is not endogenous to the cell, wherein the first and second DNA binding modulation enhancers are capable of enhancing accessibility of the target locus relative to the absence of the first and second DNA binding modulation enhancers.

Item 269. A kit comprising: (i) a first regulatory protein or first regulatory complex; and (ii) a first DNA binding modulation enhancer.

A method for altering an endogenous nucleic acid molecule present in a cell, said method comprising introducing a donor DNA molecule into said cell, wherein said donor DNA molecule is operably linked to one or more intracellular targeting moieties that enable localization of said donor DNA molecule to a location in said cell where said endogenous nucleic acid molecule is located.

The method of item 271, wherein the endogenous nucleic acid molecule is located in the nucleus, mitochondria or chloroplast of the cell.

The method of item 272, wherein the one or more intracellular targeting moieties is a nuclear localization signal.

The method of clause 273, wherein the donor DNA molecule is from about 25 to about 8,000 nucleotides in length.

The method of clause 274, wherein the donor DNA molecule is single-stranded, double-stranded, or partially double-stranded.

Item 275. The method of item 270, wherein the donor DNA molecule has one or more nuclease resistant groups within 50 nucleotides of at least one terminus.

The method of item 276, wherein the nuclease resistant group is a phosphorothioate group, an amino group, a 2'-O-methyl nucleotide, a 2'-deoxy-2'-fluoro nucleotide, a 2'-deoxy nucleotide, a 5-C-methyl nucleotide, or a combination thereof.

Item 277. The method of item 276, wherein there are two phosphorothioate groups within 50 nucleotides of at least one terminus.
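The items above place nuclease-resistant groups relative to the donor's termini. As a hedged illustration, the sketch below assumes the vendor-style oligonucleotide notation in which `*` between two bases denotes a phosphorothioate linkage (for example `A*C*GT...`) and counts such linkages within 50 nucleotides of each end. The notation and the helper names are assumptions, not part of the disclosure.

```python
def ps_linkages_near_ends(oligo: str, window: int = 50) -> tuple[int, int]:
    """Count '*' (phosphorothioate) linkages within `window` nt of the 5' and 3' ends.

    A '*' written after base i joins bases i and i + 1 (0-based).
    """
    linkage_after = []  # index of the base on the 5' side of each linkage
    base_index = -1
    for ch in oligo:
        if ch == "*":
            linkage_after.append(base_index)
        else:
            base_index += 1
    n_bases = base_index + 1
    five_prime = sum(1 for i in linkage_after if i < window)
    three_prime = sum(1 for i in linkage_after if i + 1 >= n_bases - window)
    return five_prime, three_prime


def two_ps_near_a_terminus(oligo: str, window: int = 50) -> bool:
    """True if at least two phosphorothioate linkages lie near at least one terminus."""
    return max(ps_linkages_near_ends(oligo, window)) >= 2


# Two phosphorothioate linkages at each end of a 104-nt donor.
donor = "A*C*" + "G" * 100 + "*T*A"
print(ps_linkages_near_ends(donor))   # (2, 2)
print(two_ps_near_a_terminus(donor))  # True
```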

The method of clause 278, wherein the donor DNA molecule contains a positive selectable marker and a negative selectable marker.

The method of clause 278, wherein the negative selectable marker is herpes simplex virus thymidine kinase.

The method according to clause 270, wherein the donor DNA molecule has two regions of sequence complementarity to a target locus present in the cell.

Item 281. The method of item 278, wherein the positive selectable marker is located between two regions of the donor DNA molecule that have sequence complementarity.

The method of clause 282, wherein the negative selectable marker is not located between two regions of the donor DNA molecule that have sequence complementarity.

The method of clause 283, wherein the cell is contacted with one or more of the following: (1) one or more nucleic acid cleavage entities, (2) one or more nucleic acid molecules encoding at least one component of a nucleic acid cleavage entity, (3) one or more DNA binding modulation enhancers, (4) one or more nucleic acid molecules encoding at least one component of a DNA binding modulation enhancer, or (5) one or more non-homologous end-joining (NHEJ) inhibitors.

The method of clause 283, wherein the one or more non-homologous end joining (NHEJ) inhibitors is a DNA-dependent protein kinase inhibitor.

The method of clause 284, wherein at least one of the one or more non-homologous end-joining (NHEJ) inhibitors is selected from the group consisting of: (1) NU7026, (2) NU7441, (3) KU-0060648, (4) DMNB, (5) ETP 45658, (6) LTURM 34, and (7) PI-103 hydrochloride.

The method of clause 286, wherein at least one of the one or more nucleic acid cleavage entities is selected from the group consisting of: (1) zinc finger nucleases, (2) TAL effector nucleases and (3) CRISPR complexes.

The method of clause 283, wherein at least one of the one or more DNA binding modulation enhancers is selected from the group consisting of: (1) zinc finger proteins (e.g., zinc finger proteins without a heterologous nuclease domain), (2) TAL effector proteins (e.g., TALE proteins without a heterologous nuclease domain), and (3) CRISPR complexes (e.g., CRISPR complexes comprising dCas9 protein).

The method of clause 288, wherein at least one of the one or more DNA binding modulation enhancers is designed to bind within 50 nucleotides of the target locus.

A method for homologous recombination in a eukaryotic cell, the method comprising contacting the cell with: (1) a donor DNA molecule and (2) (i) a nucleic acid cleaving entity, (ii) a nucleic acid encoding the nucleic acid cleaving entity, or (iii) at least one component of the nucleic acid cleaving entity and a nucleic acid encoding at least one component of the nucleic acid cleaving entity, wherein the donor DNA molecule is bound to an intracellular targeting moiety that is capable of localizing the donor DNA molecule to a location in the cell where the endogenous nucleic acid molecule is located.

Item 290. The method of item 289, further comprising contacting the cell with one or more of: (1) one or more non-homologous end-joining (NHEJ) inhibitors, (2) one or more DNA-binding modulation enhancers, (3) one or more nucleic acids encoding a DNA-binding modulation enhancer, and (4) at least one component of one or more DNA-binding modulation enhancers and a nucleic acid encoding at least one component of one or more DNA-binding modulation enhancers.

A composition comprising a DNA molecule, wherein the DNA molecule is covalently linked to one or more intracellular targeting moieties and wherein the DNA molecule is from about 25 nucleotides to about 8,000 nucleotides in length.

The composition of item 292, wherein the DNA molecule is a donor DNA molecule.

The composition of item 291, wherein the one or more intracellular targeting moieties is a nuclear localization signal.

The composition of item 291, wherein two or more intracellular targeting moieties are covalently linked to the DNA molecule.

Item 295. The composition of item 291, wherein the one or more intracellular targeting moieties are selected from the group consisting of: (1) nuclear localization signal, (2) chloroplast targeting signal, and (3) mitochondrial targeting signal.

Item 296. A Cas9 protein comprising two or more bipartite nuclear localization signals.

Item 297. A Cas9 protein according to item 296, wherein the two or more bipartite nuclear localization signals are located within twenty amino acids of at least one terminus.

Item 298. A Cas9 protein according to item 296, wherein the two or more bipartite nuclear localization signals are individually located within twenty amino acids of the N- and C-termini of the protein.

Item 299. A Cas9 protein according to item 296, wherein the two or more bipartite nuclear localization signals comprise different amino acid sequences.

Item 300. A Cas9 protein according to item 296, further comprising at least one monopartite nuclear localization signal.

Item 301. A Cas9 protein according to item 296, further comprising an affinity tag.

Item 302. A Cas9 protein according to item 296, wherein at least one of the nuclear localization signals has an amino acid sequence selected from the group consisting of: (A) KRTADGSEFESPKKKRKVE (SEQ ID NO: 48), (B) KRTADGSEFESPKKARKVE (SEQ ID NO: 49), (C) KRTADGSEFESPKKKAKVE (SEQ ID NO: 50), (D) KRPAATKKAGQAKKKK (SEQ ID NO: 51), and (E) KRTADGSEFEPAAKRVKLDE (SEQ ID NO: 52).

Item 303. A Cas9 protein according to item 296, wherein at least one of the nuclear localization signals has an amino acid sequence selected from the group consisting of: (A) KRX(5-15)KKN₁N₂KV (SEQ ID NO: 53), (B) RX(5-15)K(K/R)(K/R)(1-2) (SEQ ID NO: 54), and (C) KRX(5-15)K(K/R)X(K/R)(1-2) (SEQ ID NO: 55), wherein X is an amino acid sequence of 5 to 15 amino acids in length, N₁ is L or A, and N₂ is L, A or R.
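Consensus patterns (A) to (C) translate naturally into regular expressions. The sketch below is one possible reading, not a definitive one: it takes X(5-15) as any run of 5 to 15 standard amino acids, treats the lone X in pattern (C) as a single residue, and reads the damaged subscript on the final (K/R) as one or two K/R residues. The pattern names and the helper `matching_patterns` are hypothetical.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter amino acid codes
X_5_15 = f"[{AA}]{{5,15}}"    # X: any run of 5 to 15 residues

# Assumed regex readings of consensus patterns (A), (B) and (C).
NLS_PATTERNS = {
    "A": re.compile(f"KR{X_5_15}KK[LA][LAR]KV"),           # N1 in {L, A}; N2 in {L, A, R}
    "B": re.compile(f"R{X_5_15}K[KR][KR]{{1,2}}"),
    "C": re.compile(f"KR{X_5_15}K[KR][{AA}][KR]{{1,2}}"),  # lone X read as one residue
}


def matching_patterns(peptide: str) -> list[str]:
    """Names of the consensus patterns found anywhere in the peptide."""
    return [name for name, pattern in NLS_PATTERNS.items() if pattern.search(peptide)]


print(matching_patterns("KRTADGSEFESPKKKRKVE"))  # SEQ ID NO: 48 -> ['B', 'C']
print(matching_patterns("KRTADGSEFESPKKARKVE"))  # SEQ ID NO: 49 -> ['A', 'C']
```

Because `re.search` scans the whole string, the same helper can be run on a full Cas9 fusion sequence to locate candidate NLS motifs.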

Item 304. A Cas9 protein according to item 296, comprising the amino acid sequence shown in figure 42.

A TALE protein comprising amino acids: amino acids 811-830 of FIG. 46, wherein the amino acids at positions 815-816 and 824-825 may be Gly-Ser or Gly-Gly.

Item 306. The TALE protein according to item 305, comprising amino acids 810-1029 of FIG. 46, wherein the amino acids at positions 1022-1023 can be Gly-Ser or Gly-Gly.

The TALE protein according to item 305, comprising amino acids 752-1021 of FIG. 46.

A TALE protein comprising amino acids 20-165 of FIG. 47, wherein the amino acids at positions 28-29 can be Gly-Ser or Gly-Gly and wherein the amino acids at positions 108-110 and 823-824 can be Arg-Gly-Ala or Gln-Trp-Ser.

A TALE protein comprising amino acids (amino acids 821-.

A TALE protein according to item 308 comprising amino acids corresponding to figure 46.

The TALE protein according to item 311, comprising a repeat region comprising 4 to 25 repeat units.

A method of engineering an intracellular nucleic acid in a cell, the method comprising introducing into the cell a TALE protein according to item 306 or a nucleic acid encoding a TALE protein according to item 2, wherein the TALE protein is designed to bind to a locus of interest within the cell.

The method of item 313, further comprising introducing a donor nucleic acid molecule into the cell, wherein the donor nucleic acid molecule has one or more regions of sequence homology to a nucleic acid within 50 nucleotides of the target locus.

A method of causing homologous recombination of intracellular nucleic acid molecules within a population of cells at a cleavage site, the method comprising: (a) causing a double-strand break in the intracellular nucleic acid molecule at the cleavage site to produce a cleaved nucleic acid molecule; and (b) contacting the cleaved nucleic acid molecule with a donor nucleic acid molecule, wherein the donor nucleic acid molecule has at least ten nucleotides or base pairs that are homologous to nucleic acids within 100 base pairs located on each side of the cleavage site, wherein at least 95% of cells within the population of cells undergo homology-directed repair of the donor nucleic acid molecule at the cleavage site.

The method according to item 315, wherein the donor nucleic acid molecule comprises a selectable marker or reporter gene, which is operably linked to a promoter present in the intracellular nucleic acid molecule following homology directed repair.

The method according to item 314, wherein the donor nucleic acid molecule is linked to one or more nuclear localization signals that allow the localization of the donor nucleic acid molecule to the nucleus of the cell population.

Item 317. The method according to item 314, further comprising contacting the population of cells with one or more of: (1) one or more nucleic acid cleavage entities, (2) one or more nucleic acid molecules encoding at least one component of a nucleic acid cleavage entity, (3) one or more DNA binding modulation enhancers, (4) one or more nucleic acid molecules encoding at least one component of a DNA binding modulation enhancer, or (5) one or more non-homologous end-joining (NHEJ) inhibitors.

Item 318. The method according to item 314, wherein the donor nucleic acid molecule is single-stranded, double-stranded or partially double-stranded.

The method of item 314, wherein the population of cells is contacted with one or more nucleic acid cleavage entities or one of a plurality of nucleic acid molecules encoding one or more nucleic acid cleavage entities prior to contacting the population of cells with one or more donor nucleic acid molecules.

The method of item 320, wherein after contacting the population of cells with the one or more nucleic acid cleavage entities or one of the plurality of nucleic acid molecules encoding the one or more nucleic acid cleavage entities, the population of cells is contacted with one or more donor nucleic acid molecules for 5 to 60 minutes.

A method of enhancing the activity of a regulatory protein or regulatory complex at a target locus in a cell, the method comprising: (1) introducing into a cell comprising a nucleic acid encoding the locus of interest: (i) a first regulatory protein or a first regulatory complex capable of binding to a first regulator binding sequence of the target locus, wherein the first regulator binding sequence comprises a regulatory site; and (ii) a first DNA binding modulation enhancer capable of binding a first enhancer binding sequence of the target locus; and (2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence, thereby enhancing the activity of the first regulatory protein or the first regulatory complex at the target locus in the cell.

Item 322. The method according to item 321, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

Item 323. The method of item 321, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

The method of item 321, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector.

The method of item 325, further comprising: (1) introducing a second DNA binding modulation enhancer into the cell; and (2) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus.

Item 326. The method of item 324, wherein the first enhancer binding sequence and the second enhancer binding sequence are located within 180 base pairs relative to each other.

Item 327. The method of item 324, wherein the first enhancer binding sequence and the second enhancer binding sequence are on opposite sides of the regulator binding sequence.

The method of item 328, wherein the first regulatory protein is a DNA-binding nuclease fusion protein.

The method of item 328, wherein the DNA-binding nuclease fusion protein is a TALE-FokI fusion protein.

Item 330. The method according to item 321, wherein the first regulatory complex is a CRISPR/gRNA complex with nuclease activity.

Item 331. The method of item 330, wherein the first regulatory complex is a Cas9/gRNA complex with nuclease activity.

The method of item 321, further comprising introducing into the cell a second regulatory protein or a second regulatory complex capable of binding to a second regulator binding sequence of the target locus, wherein the second regulator binding sequence comprises a regulatory site.

Item 333. The method of item 332, wherein the first regulatory protein is a DNA-binding nuclease fusion protein.

The method of item 334, wherein the DNA-binding nuclease fusion protein is a TALE-FokI fusion protein.

Item 335. The method according to item 334, wherein the second regulatory complex is a CRISPR/gRNA complex with nuclease activity.

Item 336. The method according to item 335, wherein the second regulatory complex is a Cas9/gRNA complex with nuclease activity.

Claims

1. A method of altering an endogenous nucleic acid molecule present within a cell, the method comprising introducing a donor DNA molecule into the cell,

wherein the donor DNA molecule is operably linked to one or more intracellular targeting moieties capable of localizing the donor DNA molecule to the location of the endogenous nucleic acid molecule in the cell.

2. The method of claim 1, wherein the location in the cell at which the endogenous nucleic acid molecule is located is in the nucleus, mitochondria, or chloroplast.

3. The method of claim 1, wherein the one or more intracellular targeting moieties is a nuclear localization signal.

4. The method of claim 1, wherein the donor DNA molecule is from about 25 to about 8,000 nucleotides in length.

5. The method of claim 1, wherein the donor DNA molecule is single stranded.

6. The method of claim 1, wherein the donor DNA molecule has one or more nuclease resistant groups within 50 nucleotides of one or more ends.

7. The method of claim 6, wherein the nuclease-resistant group is selected from the group consisting of: phosphorothioate group, amino group, 2'-O-methyl nucleotide, 2'-deoxy-2'-fluoro nucleotide, 2'-deoxy nucleotide, 5-C-methyl nucleotide, or a combination thereof.

8. The method of claim 1, wherein the donor DNA is double stranded or partially double stranded.

9. The method of claim 1, wherein the donor DNA molecule contains a positive selectable marker and a negative selectable marker.

10. The method of claim 9, wherein the negative selectable marker is a nucleic acid encoding herpes simplex virus thymidine kinase.

11. The method of claim 1, wherein the donor DNA molecule has two regions of sequence complementarity to a target locus present in the cell.

12. The method of claim 11, wherein the selectable marker is located between two regions of the donor DNA molecule having sequence complementarity.

13. The method of claim 1, wherein the cell is contacted with one or more of:

(1) one or more nucleic acid cleavage entities,

(2) one or more nucleic acid molecules encoding one or more components of a nucleic acid cleavage entity,

(3) one or more DNA binding modulation enhancing agents,

(4) one or more nucleic acid molecules encoding one or more components of a DNA binding modulation enhancer, or

(5) one or more non-homologous end-joining (NHEJ) inhibitors.

14. The method of claim 13, wherein the one or more non-homologous end joining (NHEJ) inhibitors are DNA-dependent protein kinase inhibitors.

15. The method of claim 13, wherein one or more of the one or more nucleic acid cleaving entities is selected from the group consisting of:

(1) zinc finger nucleases,

(2) TAL effector nucleases, and

(3) a CRISPR complex.

16. The method of claim 13, wherein one or more of the one or more DNA binding modulation enhancers are selected from the group consisting of:

(1) a zinc finger protein,

(2) a TALE protein, and

(3) a CRISPR complex.

17. The method of claim 16, wherein the TALE protein does not have a heterologous nuclease domain.

18. The method according to claim 16, wherein the CRISPR complex comprises dCas9 protein.

19. The method of claim 13, wherein one or more of the one or more DNA binding modulation enhancers are designed to bind within 50 nucleotides of the target locus.

20. A TALE protein comprising amino acids: amino acids 811-830 of FIG. 46, wherein the amino acids at positions 815-816 and 824-825 may be Gly-Ser or Gly-Gly.

21. A TALE protein according to claim 20, comprising amino acids 810-1029 of FIG. 46, wherein the amino acids at positions 1022-1023 can be Gly-Ser or Gly-Gly.

22. A TALE protein according to claim 20, comprising amino acids 752-1021 of FIG. 46.

23. A TALE protein comprising amino acids 20-27, 30-107, and 111-165 of FIG. 47, wherein the amino acids at positions 28-29 are Gly-Ser or Gly-Gly and wherein the amino acids at positions 108-110 are Arg-Gly-Ala or Gln-Trp-Ser.

24. A TALE protein comprising amino acids: amino acids 821-826 and 829-840 of FIG. 47, wherein the amino acids at positions 827-828 are Gly-Ser or Gly-Gly.

25. A TALE protein according to claim 24 comprising amino acids corresponding to figure 47.

26. A TALE protein according to claim 24 comprising a repeat region comprising 4 to 25 repeat units.

27. A method of engineering an intracellular nucleic acid in a cell, the method comprising introducing into the cell a TALE protein according to claim 23 or a nucleic acid encoding a TALE protein according to claim 23.

28. The method of claim 27, further comprising introducing a donor nucleic acid molecule into the cell, wherein the donor nucleic acid molecule has one or more regions of sequence homology to a nucleic acid located within 50 nucleotides of the target locus.

29. A Cas9 protein comprising two or more bipartite nuclear localization signals.

30. The Cas9 protein according to claim 29, wherein the two or more bipartite nuclear localization signals are located within twenty amino acids of one or more termini.

31. The Cas9 protein according to claim 29, wherein the two or more bipartite nuclear localization signals are individually located within twenty amino acids of the N- and C-termini of the protein.

32. The Cas9 protein according to claim 29, wherein the two or more bipartite nuclear localization signals comprise different amino acid sequences.

33. A Cas9 protein according to claim 29, further comprising one or more monopartite nuclear localization signals.
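The terminal placement in claims 30 and 31 can be checked directly on a sequence. A minimal sketch, assuming that "within twenty amino acids" means at most twenty residues separate the nuclear localization signal from the terminus, is shown below; the function names are hypothetical, and the example reuses SEQ ID NO: 48 and SEQ ID NO: 51 from item 302 only as test strings around a placeholder body that is not a real Cas9 sequence.

```python
def occurrences(protein: str, motif: str) -> list[tuple[int, int]]:
    """0-based half-open spans of every occurrence of motif (overlaps allowed)."""
    spans = []
    start = protein.find(motif)
    while start != -1:
        spans.append((start, start + len(motif)))
        start = protein.find(motif, start + 1)
    return spans


def nls_at_both_termini(protein: str, n_nls: str, c_nls: str, window: int = 20) -> bool:
    """True if n_nls starts within `window` residues of the N-terminus and
    c_nls ends within `window` residues of the C-terminus (assumed reading)."""
    near_n = any(start <= window for start, _ in occurrences(protein, n_nls))
    near_c = any(len(protein) - end <= window for _, end in occurrences(protein, c_nls))
    return near_n and near_c


NLS_48 = "KRTADGSEFESPKKKRKVE"  # SEQ ID NO: 48
NLS_51 = "KRPAATKKAGQAKKKK"     # SEQ ID NO: 51
fusion = "M" + NLS_48 + "A" * 200 + NLS_51 + "GS"  # placeholder body, not Cas9
print(nls_at_both_termini(fusion, NLS_48, NLS_51))  # True
```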

34. A Cas9 protein according to claim 29, further comprising an affinity tag.

35. A method of engineering intracellular nucleic acid in a cell, the method comprising introducing into the cell a Cas9 protein according to claim 29 or a nucleic acid encoding a Cas9 protein according to claim 29, wherein the Cas9 protein is designed to bind to a target locus within the cell.

36. The method of claim 35, wherein the Cas9 protein is introduced into the cell as a Cas9/gRNA complex.

37. The method of claim 35, further comprising introducing a donor nucleic acid molecule into the cell, wherein the donor nucleic acid molecule has one or more regions of sequence homology to a nucleic acid located within 50 nucleotides of the target locus.

38. A method of causing homologous recombination of intracellular nucleic acid molecules within a population of cells at a cleavage site, the method comprising:

(a) causing a double-strand break in the intracellular nucleic acid molecule at the cleavage site to produce a cleaved nucleic acid molecule, and

(b) contacting the cleaved nucleic acid molecule with a donor nucleic acid molecule, wherein the donor nucleic acid molecule has at least ten nucleotides or base pairs that are homologous to nucleic acids within 100 base pairs located on each side of the cleavage site,

wherein at least 95% of the cells within the population of cells undergo homology-directed repair of the donor nucleic acid molecule at the cleavage site.
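Claim 38 requires at least ten nucleotides of donor homology to sequence within 100 bp on each side of the cleavage site. The following sketch tests that condition under simplifying assumptions the claim does not state: homology is taken as an exact shared 10-mer on the same strand (reverse complements and mismatches are ignored), and the cut is given as a 0-based index between two bases. `donor_flanks_cut` and the example sequences are hypothetical.

```python
def shares_kmer(a: str, b: str, k: int) -> bool:
    """True if sequences a and b share any exact substring of length k."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def donor_flanks_cut(genome: str, cut: int, donor: str,
                     window: int = 100, min_homology: int = 10) -> bool:
    """Check for >= min_homology nt of exact homology on each side of the cut."""
    left = genome[max(0, cut - window):cut]  # up to `window` bp upstream of the break
    right = genome[cut:cut + window]         # up to `window` bp downstream of the break
    return (shares_kmer(donor, left, min_homology)
            and shares_kmer(donor, right, min_homology))


genome = "A" * 50 + "CGTACGGATC" + "T" * 40 + "GGCCTTAAGC" + "A" * 60
cut = 100  # break between the T run and GGCCTTAAGC
donor = "CGTACGGATC" + "GGGGGG" + "GGCCTTAAGC"
print(donor_flanks_cut(genome, cut, donor))  # True
```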

39. The method of claim 38, wherein the donor nucleic acid molecule contains a selectable marker or reporter gene operably linked to a promoter present in the intracellular nucleic acid molecule following homology-directed repair.

40. The method of claim 38, wherein the donor nucleic acid molecule is linked to one or more nuclear localization signals that allow the donor nucleic acid molecule to localize to the nuclei of the cell population.

41. The method of claim 38, further comprising contacting the population of cells with one or more of:

(1) one or more nucleic acid cleavage entities,

(2) one or more DNA binding modulation enhancing agents, or

(3) one or more non-homologous end-joining (NHEJ) inhibitors.

42. The method of claim 38, wherein the donor nucleic acid molecule is single stranded.

43. The method of claim 38, wherein the donor DNA molecule is operably linked to one or more nuclear localization signals.

44. The method of claim 38, wherein the population of cells is contacted with the one or more nucleic acid cleavage entities or one of a plurality of nucleic acid molecules encoding one or more nucleic acid cleavage entities and the population of cells is subsequently contacted with one or more donor nucleic acid molecules.

45. The method of claim 44, wherein the population of cells is contacted with one or more donor nucleic acid molecules for 5 to 60 minutes after contacting the population of cells with the one or more nucleic acid cleavage entities or one of a plurality of nucleic acid molecules encoding one or more nucleic acid cleavage entities.

46. A method for homologous recombination in an initial nucleic acid molecule, comprising:

(a) causing a double strand break in said initial nucleic acid molecule to produce a cleaved nucleic acid molecule, and

(b) contacting the cleaved nucleic acid molecule with a donor nucleic acid molecule,

wherein the initial nucleic acid molecule comprises a promoter and a gene, and

wherein the donor nucleic acid molecule comprises: (i) a 5' end and a 3' end of 12 bp to 250 bp in length; (ii) a promoter-free selectable marker; (iii) a reporter gene; (iv) a self-cleaving peptide linking the promoter-free selectable marker to the reporter gene, or a loxP site located on either side of the promoter-free selectable marker; and (v) optionally, a linker between the promoter-free selectable marker and the reporter gene.

47. The method of claim 46, wherein the double strand break in the nucleic acid molecule is: (i) less than or equal to 250 bp from the ATG start codon, for N-terminal labeling of the cleaved nucleic acid molecule; or (ii) less than or equal to 250 bp from the stop codon, for C-terminal labeling of the cleaved nucleic acid molecule.
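Claim 47 ties the location of the break to the type of tag. As an illustration only, the helper below decides whether a proposed cut site qualifies for N-terminal or C-terminal tagging. It assumes distances are measured in bp from the first base of the ATG or stop codon in one coordinate system, and that the nearer codon wins when both qualify; that tie-break rule and the name `tagging_mode` are assumptions.

```python
from typing import Optional


def tagging_mode(atg: int, stop: int, cut: int, max_dist: int = 250) -> Optional[str]:
    """Return "N-terminal", "C-terminal" or None for a proposed double-strand break."""
    d_atg = abs(cut - atg)
    d_stop = abs(cut - stop)
    candidates = []
    if d_atg <= max_dist:
        candidates.append((d_atg, "N-terminal"))
    if d_stop <= max_dist:
        candidates.append((d_stop, "C-terminal"))
    if not candidates:
        return None
    return min(candidates)[1]  # nearer codon wins (assumed tie-break)


print(tagging_mode(atg=1000, stop=2500, cut=1100))  # N-terminal
print(tagging_mode(atg=1000, stop=2500, cut=1700))  # None
```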

48. The method of claim 46, wherein the double strand break is induced by one or more nucleic acid cleavage entities.

49. The method of claim 48, wherein the one or more nucleic acid cleavage entities comprise a nuclease comprising one or more zinc finger proteins, one or more transcription activator-like effectors (TALEs), one or more CRISPR complexes, one or more Argonaute-nucleic acid complexes, or one or more meganucleases.

50. The method of claim 48, wherein the one or more nucleic acid cleaving entities are administered using an expression vector, a plasmid, a ribonucleoprotein complex (RNP), or an mRNA.

51. The method of claim 46, wherein the promoter-free selection marker comprises a protein, an antibiotic resistance selection marker, a cell surface protein, a metabolite, or an active fragment thereof.

52. The method of claim 46, wherein the self-cleaving peptide is a self-cleaving 2A peptide.

53. The method of claim 46, wherein the matching end comprises single-stranded DNA or double-stranded DNA.

54. The method of claim 46, wherein the matching ends at the 5' and 3' ends of the donor nucleic acid molecule have a length of 12 bp to 200 bp, 12 bp to 150 bp, 12 bp to 100 bp, 12 bp to 50 bp, or 12 bp to 40 bp.

55. The method of claim 46, wherein the donor nucleic acid molecule is integrated into the cleaved nucleic acid molecule by Homology Directed Repair (HDR).

56. The method of claim 46, further comprising modifying the donor nucleic acid molecule at the 5' terminus, the 3' terminus, or the 5' and 3' termini.

57. The method of claim 56, wherein the donor nucleic acid molecule is modified at the 5' and 3' ends.

58. The method of claim 56, wherein the donor nucleic acid molecule is modified to have one or more nuclease-resistant groups in one or more strands at one or more ends.

59. The method of claim 58, wherein the one or more nuclease-resistant groups comprise one or more phosphorothioate groups, one or more amino groups, 2'-O-methyl nucleotides, 2'-deoxy-2'-fluoro nucleotides, 2'-deoxy nucleotides, 5-C-methyl nucleotides, or a combination thereof.

60. The method of claim 46, further comprising treating the donor nucleic acid molecule with one or more non-homologous end-joining (NHEJ) inhibitors.

61. A nucleic acid molecule prepared according to the method of claim 46.

62. A recombinant antibody expression cassette comprising:

(i) matching ends located at the 5' and 3' ends of the expression cassette, wherein the length of the matching ends is less than or equal to 250 bp;

(ii) a promoter-free selectable marker;

(iii) a reporter gene;

(iv) a self-cleaving peptide linking the promoter-free selectable marker to the reporter gene; and

(v) optionally, a linker between the promoter-free selection marker and the reporter gene,

wherein the promoter-free selectable marker is linked to the 5' end of the reporter gene for N-terminal labeling of the cleaved nucleic acid molecule or is linked to the 3' end of the reporter gene for C-terminal labeling of the cleaved nucleic acid molecule.
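The element order recited in claim 62 can be expressed as a short assembly function. This is a sketch with placeholder sequences: it assumes the optional linker sits between the self-cleaving peptide sequence and the reporter (or marker), which the claim leaves open, and it enforces the 12 bp minimum from claim 46 together with the 250 bp maximum from claim 62. `assemble_cassette` and every sequence in the example are hypothetical.

```python
MIN_ARM_BP, MAX_ARM_BP = 12, 250


def assemble_cassette(arm_5p: str, arm_3p: str, marker: str, reporter: str,
                      self_cleaving: str, linker: str = "", terminus: str = "N") -> str:
    """Concatenate cassette elements 5' to 3' for N- or C-terminal labeling."""
    for arm in (arm_5p, arm_3p):
        if not MIN_ARM_BP <= len(arm) <= MAX_ARM_BP:
            raise ValueError(f"matching end of {len(arm)} bp is outside 12-250 bp")
    if terminus == "N":
        # Marker joined to the 5' end of the reporter (N-terminal labeling).
        payload = marker + self_cleaving + linker + reporter
    elif terminus == "C":
        # Marker joined to the 3' end of the reporter (C-terminal labeling).
        payload = reporter + linker + self_cleaving + marker
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    return arm_5p + payload + arm_3p


cassette = assemble_cassette("A" * 12, "T" * 12, marker="CCC", reporter="GGG",
                             self_cleaving="AAA")
print(len(cassette))  # 33
```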

63. A method of enhancing the activity of a regulatory protein or regulatory complex at a target locus in a cell, the method comprising:

(1) introducing into a cell comprising a nucleic acid encoding the locus of interest:

(i) a first regulatory protein or a first regulatory complex capable of binding to a first regulator binding sequence of the target locus, wherein the first regulator binding sequence comprises a regulatory site; and

(ii) a first DNA binding modulation enhancer capable of binding a first enhancer binding sequence of the target locus; and

(2) allowing the first DNA binding modulation enhancer to bind to the first enhancer binding sequence, thereby enhancing the activity of the first regulatory protein or the first regulatory complex at a target locus in a cell.

64. The method of claim 63, wherein said introducing a first DNA binding modulation enhancer comprises introducing a vector encoding said first DNA binding modulation enhancer.

65. The method of claim 63, wherein said introducing a first DNA binding modulation enhancer comprises introducing mRNA encoding said first DNA binding modulation enhancer.

66. The method of claim 63, wherein the first DNA binding modulation enhancer is a first transcription activator-like (TAL) effector.

67. The method of claim 63, further comprising:

(1) introducing a second DNA binding modulation enhancer into the cell; and

(2) allowing the second DNA binding modulation enhancer to bind to a second enhancer binding sequence of the target locus.

68. The method according to claim 64, wherein the first enhancer binding sequence and the second enhancer binding sequence are located within 180 base pairs relative to each other.

69. The method according to claim 64, wherein the first enhancer binding sequence and the second enhancer binding sequence are on opposite sides of the regulator binding sequence.
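Claims 68 and 69 constrain where the two enhancer binding sequences sit relative to each other and to the regulator binding sequence. The sketch below checks both conditions for 0-based half-open intervals. Reading "within 180 base pairs relative to each other" as the gap between the inner edges of the two sites is an assumption, as is the helper name `enhancers_flank_regulator`.

```python
Interval = tuple[int, int]  # 0-based half-open [start, end)


def enhancers_flank_regulator(enh1: Interval, enh2: Interval, regulator: Interval,
                              max_gap: int = 180) -> bool:
    """True if the enhancer sites lie on opposite sides of the regulator site
    and are no more than max_gap bp apart (inner edge to inner edge)."""
    upstream, downstream = sorted([enh1, enh2])
    opposite_sides = upstream[1] <= regulator[0] and regulator[1] <= downstream[0]
    gap = downstream[0] - upstream[1]
    return opposite_sides and gap <= max_gap


print(enhancers_flank_regulator((0, 18), (60, 78), (25, 50)))    # True
print(enhancers_flank_regulator((0, 18), (300, 318), (25, 50)))  # False (gap 282 bp)
```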

70. The method of claim 63, wherein the first regulatory protein is a DNA binding nuclease fusion protein.

71. The method of claim 70, wherein the DNA-binding nuclease fusion protein is a TALE-FokI fusion protein.

72. The method of claim 63, wherein the first regulatory complex is a CRISPR/gRNA complex with nuclease activity.

73. The method of claim 72, wherein the first regulatory complex is a Cas9/gRNA complex with nuclease activity.

74. The method of claim 63, further comprising introducing into the cell a second regulatory protein or a second regulatory complex capable of binding a second regulator binding sequence of the target locus, wherein the second regulator binding sequence comprises the regulatory site.

75. The method of claim 74, wherein the first regulatory protein is a DNA binding nuclease fusion protein.

76. The method of claim 75, wherein the DNA-binding nuclease fusion protein is a TALE-FokI fusion protein.

77. The method of claim 76, wherein the second regulatory complex is a CRISPR/gRNA complex with nuclease activity.

78. The method of claim 77, wherein the second regulatory complex is a Cas9/gRNA complex with nuclease activity.