Disclosure of Invention
The invention aims to overcome the limitation of a gene editing system in the prior art in terms of targeting range and delivery efficiency and provides a compact CRISPR/Cas9 gene editing system and application.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
In a first aspect, the invention provides a Cas9 protein whose amino acid sequence is shown in SEQ ID NO. 1, or is a sequence derived from SEQ ID NO. 1 by substitution, deletion or insertion of at least one amino acid.
The Cas9 protein of the invention addresses a known difficulty in the prior art: the large size of Cas9 proteins severely restricts their use in common gene delivery vectors (such as AAV). By developing a compact CRISPR/Cas9 system, the molecular weight of the Cas9 protein is markedly reduced while gene editing activity is maintained, so that the protein is easier to deliver with capacity-limited vectors. In addition, building on the delivery characteristics of compact proteins, delivery strategies can be further optimized, for example by designing non-viral vectors such as nanoparticles and liposomes, or by combining with physical delivery techniques such as electroporation, to increase the delivery efficiency of the editing system in a variety of cells and tissues. The miniaturized design can markedly enhance the potential of CRISPR technology for in vivo gene editing and can be extended to clinical settings such as gene therapy.
As a preferred embodiment of the Cas9 protein of the present invention, the substitution comprises at least one of N43R, Q56R, G116R, G162R, V164R, G166R, N169R, T186R, Y211R, E213R, G216R, S219R, W230R, Y231R, G236R, T239R, Y240R, F257R, G392R, T393R, S395R, Q415R, T420R, L447R, Y652R, N779R, L782R, D785R, I787R, Y788R, S789R, T802R, Y866R, S877R, Y880R, G886R, L889R, Y895R, Q907R, N983R, S984R and T887R.
As a preferred embodiment of the Cas9 protein of the present invention, the Cas9 protein specifically recognizes the PAM sequence of NNGA.
Here N = A, C, G or T, where A is adenine, T is thymine, C is cytosine and G is guanine (DNA bases).
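The degenerate PAM notation above can be illustrated with a minimal sketch. The function name `matches_pam` and the lookup table are illustrative assumptions, not part of the invention; the sketch only shows how a concrete 4-nt site is tested against NNGA.

```python
# IUPAC codes used in the text: N stands for any of A, C, G, T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def matches_pam(site: str, pam: str = "NNGA") -> bool:
    """Return True if `site` is a concrete instance of the degenerate `pam`.

    Hypothetical helper for illustration only.
    """
    site = site.upper()
    if len(site) != len(pam):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, pam))

print(matches_pam("ACGA"))  # True: positions 3-4 are G and A
print(matches_pam("ACGT"))  # False: position 4 is not A
```

Because two of the four positions are fixed, 16 of the 256 possible 4-nt sequences satisfy NNGA.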
In a second aspect, the invention provides a nucleic acid molecule encoding the Cas9 protein.
In a third aspect, the invention provides a compact CRISPR/Cas9 gene editing system comprising said Cas9 protein.
As a preferred embodiment of the compact CRISPR/Cas9 gene editing system, the compact CRISPR/Cas9 gene editing system further comprises a DR sequence with a nucleotide sequence shown as SEQ ID NO. 2.
As a preferred embodiment of the compact CRISPR/Cas9 gene editing system, the system also comprises a tracrRNA sequence with a nucleotide sequence shown as SEQ ID NO. 3.
As a preferred embodiment of the compact CRISPR/Cas9 gene editing system, the system also comprises a scaffold sequence with a nucleotide sequence shown as SEQ ID NO. 5.
As a preferred embodiment of the compact CRISPR/Cas9 gene editing system of the invention, the identified PAM sequence is NNGA.
In a fourth aspect, the invention provides use of the Cas9 protein, the nucleic acid molecule, or the compact CRISPR/Cas9 gene editing system in gene editing and/or gene delivery, including gene editing and/or gene delivery in prokaryotic and eukaryotic systems.
Compared with the prior art, the invention has the beneficial effects that:
1. Development and verification of a brand new CRISPR/Cas9 system
The invention identifies, by bioinformatic screening, and verifies a novel compact CRISPR/Cas9 gene editing system named 4th6. The system includes a Cas9 protein, a CRISPR array and a tracrRNA element. Prokaryotic depletion and interference experiments establish the PAM recognition capability of 4th6, showing that 4th6 recognizes NNGA PAM sequences. In addition, the system exhibits stable and strong DNA cleavage activity in a prokaryotic environment.
2. Optimizing scaffold to promote editing efficiency
According to the invention, the 4th6 scaffold is systematically optimized: the optimal DR length, tracrRNA range and spacer length are determined, and a mature scaffold sequence is finally designed, which remarkably improves the accuracy and efficiency of the gene editing system.
3. Significant enhancement of editing efficiency by protein engineering
After verifying the function of 4th6 in a eukaryotic environment, the invention further engineers the Cas9 protein through arginine/lysine substitution, thereby remarkably improving the editing efficiency. The editing efficiency of the optimized 4th6 variants is up to 1.3 times that of the original system.
Detailed Description
For a better description of the objects, technical solutions and advantages of the present invention, the present invention will be further described with reference to the following specific examples. It will be appreciated by persons skilled in the art that the specific embodiments described herein are for purposes of illustration only and are not intended to be limiting.
The test methods used in the examples are conventional methods unless otherwise specified, and the materials, reagents, etc. used, unless otherwise specified, are commercially available.
Example 1 Bioinformatic mining of a novel CRISPR/Cas9 gene editing system
Public databases such as NCBI GenBank, CRISPRdb and Pfam were searched in combination with the CRISPRFinder tool, and loci containing a conserved Cas1 protein and a CRISPR array meeting the specified parameters (repeat length ≡24 bp, spacer length ≡58 bp) were identified as candidate systems.
After preliminary candidates were obtained, homologous protein searches were performed using sequence alignment tools such as BLAST and HMMER, and phylogenetic trees were constructed with MAFFT and Clustal Omega to determine the evolutionary relationships of the target sequences. The candidate proteins were then subjected to domain analysis using Pfam and InterPro, focusing on the critical functional regions of Cas9 proteins, including the RuvC domain, the HNH domain and the PAM recognition domain. From these candidates, a novel Cas9 editing system was screened and identified, and named 4th6.
The 4th6 protein consists of 1052 amino acids (SEQ ID NO. 1); its structural features are shown in Figure 1. The mechanism of action of this protein follows the typical CRISPR/Cas9 pathway: the CRISPR array is transcribed and processed into mature crRNA, which comprises two key components. The spacer is responsible for recognizing the sequence complementary to the target DNA, while the DR contributes to crRNA maturation and pairs with the tracrRNA to form a stable mature RNA secondary structure.
The Cas9 protein forms an RNP complex with the mature RNA and locates target sites by scanning for PAM sequences; the HNH and RuvC domains of Cas9 then cleave the target and non-target strands, respectively, producing a DNA double-strand break. The DR sequence of the 4th6 gene editing system is shown as SEQ ID NO. 2, and the tracrRNA sequence is shown as SEQ ID NO. 3.
Example 2 Identification and verification of the PAM sequence of the 4th6 gene editing system
The PAM recognized by the 4th6 gene editing system was identified by a prokaryotic PAM depletion experiment, and its function was verified by a prokaryotic interference experiment. The specific steps are as follows:
(1) Design and construction of PAM random library
A PAM library containing 6 random bases (NNNNNN) was designed, in which each position can be any of the four bases (A, T, C, G), covering all 4096 possible PAM sequences. The library contained a fixed 30 nt spacer sequence (SEQ ID NO. 4) upstream of the 6 nt random region. After amplification, the library was cloned into the pACYC184 vector to give the 6 nt PAM random library plasmid, which is Amp-resistant. The PAM sequence distribution of the library was checked by high-throughput sequencing to ensure randomness and uniformity.
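The library size stated above follows from 4^6 = 4096. A minimal sketch that enumerates the library and checks how much of it is represented in a set of sequencing counts is given here; the function names and the toy counts are assumptions for illustration and do not reflect the actual analysis pipeline.

```python
from itertools import product

BASES = "ATCG"  # the four bases listed in the text

def enumerate_pam_library(n_random: int = 6) -> list[str]:
    """Enumerate every sequence of n_random bases (hypothetical helper)."""
    return ["".join(p) for p in product(BASES, repeat=n_random)]

def library_coverage(observed_counts: dict[str, int], n_random: int = 6) -> float:
    """Fraction of possible PAMs seen at least once in the sequencing counts."""
    expected = set(enumerate_pam_library(n_random))
    seen = {pam for pam, n in observed_counts.items() if n > 0 and pam in expected}
    return len(seen) / len(expected)

library = enumerate_pam_library()
print(len(library))  # 4096
```

A coverage close to 1.0, together with roughly even counts per PAM, would be one way to express the randomness and uniformity check described in this step.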
(2) Construction of Cas9 prokaryotic expression vector
The 4th6 protein sequence was codon-optimized for prokaryotic expression and cloned into the Kana-resistant prokaryotic vector pET28a to give the pET28a-4th6 expression vector used in subsequent experiments.
(3) PAM depletion assay
pET28a-4th6 and the PAM random library plasmid were co-electroporated into DH5α competent cells. After electroporation, the cells were recovered at 37 °C for 1 hour and then plated on plates containing both Amp and Kana for overnight culture. The next day, all colonies were scraped and the mixed plasmid was extracted and designated 4th6-PAM.
(4) PAM sequence determination and analysis
Using the 4th6-PAM plasmid as a template, specific primers flanking the NNNNNN region were designed for PCR amplification, and the purified product was subjected to high-throughput sequencing. With the empty vector as the control group, the change in relative abundance of each PAM sequence in the experimental group was calculated, and the PAM sequences with the highest depletion values were selected. The selected PAM sequences were analyzed and visualized with WebLogo 3.
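The relative abundance comparison described in this step can be sketched as follows. The text does not specify the exact depletion metric, so the log2 ratio of control to experimental frequency and the pseudocount used here are assumptions, as are the function name and the toy counts.

```python
import math

def depletion_scores(control: dict[str, int],
                     experiment: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """Assumed metric: log2(control frequency / experimental frequency).

    Positive scores mean the PAM is depleted in the experimental group.
    A pseudocount avoids division by zero for PAMs that vanish completely.
    """
    pams = set(control) | set(experiment)
    c_total = sum(control.values()) + pseudocount * len(pams)
    e_total = sum(experiment.values()) + pseudocount * len(pams)
    scores = {}
    for pam in pams:
        c_freq = (control.get(pam, 0) + pseudocount) / c_total
        e_freq = (experiment.get(pam, 0) + pseudocount) / e_total
        scores[pam] = math.log2(c_freq / e_freq)
    return scores

# Toy counts (not real data): one PAM is strongly depleted, one is not.
control = {"AAGATT": 100, "AATTCC": 100}
experiment = {"AAGATT": 1, "AATTCC": 199}
scores = depletion_scores(control, experiment)
top = sorted(scores, key=scores.get, reverse=True)
print(top[0])  # the most depleted PAM in this toy example
```

The highest-scoring sequences would then be passed to a logo tool such as WebLogo 3 for visualization.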
(5) Prokaryotic interference experiment verification
According to the screening result, plasmids containing the corresponding PAM sequence were constructed; the sequence to be verified for 4th6 is NNGA. The Cas9 expression plasmid and the corresponding PAM plasmid were co-electroporated into DH5α competent cells. After recovery at 37 °C for 1 hour, the bacterial suspension was serially diluted 10-fold and the dilutions were plated on plates containing both Amp and Kana for overnight culture. Colonies were counted the next day to assess cleavage efficiency: the fewer the colonies, the higher the cleavage efficiency.
The results are shown in FIG. 2. FIG. 2A shows the PAM sequence of 4th6, which is NNGA. FIG. 2B shows the results of the prokaryotic interference experiment of 4th6 on the NNGA PAM. In this experiment, the leftmost column is the undiluted bacterial suspension, followed to the right by successive 10-fold dilutions. In the negative control group, the plasmid does not carry the PAM sequence of interest, so Cas9 cannot recognize or cleave the DNA; the bacteria therefore retain dual Amp and Kana resistance and grow normally on the double-antibiotic plates. In the experimental group, by contrast, 4th6 recognizes the corresponding PAM sequence and cleaves the double-stranded DNA, disrupting the Amp resistance gene, so that the bacteria fail to form colonies. Together, the prokaryotic PAM depletion and interference experiments show that 4th6 specifically recognizes the NNGA PAM sequence and cleaves effectively in a prokaryotic system.
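Colony counts from a 10-fold dilution series can be converted to a comparable titer with a short calculation. The following is only a sketch: the plated volume, the colony numbers and the helper names are hypothetical, and the source reports colony growth qualitatively rather than with this formula.

```python
def cfu_per_ml(colonies: int, dilution_steps: int, plated_volume_ml: float) -> float:
    """Back-calculate CFU/mL from a plate at the given number of 10-fold dilutions."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * (10 ** dilution_steps) / plated_volume_ml

def fold_reduction(control_cfu: float, experiment_cfu: float) -> float:
    """How many times fewer colonies the experimental group gives than the control."""
    if experiment_cfu == 0:
        return float("inf")  # no surviving colonies at all
    return control_cfu / experiment_cfu

# Hypothetical numbers for illustration only.
control = cfu_per_ml(colonies=50, dilution_steps=3, plated_volume_ml=0.5)
experiment = cfu_per_ml(colonies=5, dilution_steps=0, plated_volume_ml=0.5)
print(fold_reduction(control, experiment))
```

A larger fold reduction corresponds to fewer surviving colonies and therefore to stronger cleavage, in line with the readout described above.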
Example 3 Determination of the mature scaffold of the 4th6 gene editing system
In the CRISPR/Cas9 system, the scaffold of the sgRNA plays an important role in gene editing efficiency. This example focuses on optimizing the scaffold of the 4th6 system to determine its optimal length and range in order to increase gene editing efficiency. The specific steps are as follows:
(1) DR sequence and Length optimization
The 36 bp DR sequence identified in Example 1 was truncated to generate a series of candidate DR versions in which progressively different numbers of nucleotides were removed from the 3' end. For the 4th6 system, the truncated versions were 12 bp, 13 bp, 16 bp, 18 bp and 20 bp long. Plasmids combining these DR sequences of different lengths with the tracrRNA were constructed, and their editing efficiency was tested in in vitro experiments.
(2) Spacer sequence and length optimization
Taking the 30 bp spacer sequence used in Example 2 as the reference point, spacers of various lengths up to 30 bp were tested. By progressively reducing the number of bases in the spacer, the effect on editing efficiency was investigated in combination with the optimal DR and tracrRNA sequences. For the 4th6 system, the spacer versions tested were 18 bp, 20 bp, 22 bp, 24 bp, 26 bp, 28 bp and 30 bp long.
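The truncation series can be sketched as below, assuming that "removed from the 3' end" means each candidate keeps the 5' portion of the DR at the stated length. The placeholder sequence is NOT SEQ ID NO. 2; it is an arbitrary 36-nt string used only so the example runs.

```python
def truncate_3prime(seq: str, lengths: list[int]) -> dict[int, str]:
    """Keep the first n nucleotides (5' side) for each requested length n."""
    variants = {}
    for n in lengths:
        if not 0 < n <= len(seq):
            raise ValueError(f"length {n} is outside 1..{len(seq)}")
        variants[n] = seq[:n]
    return variants

# Arbitrary 36-nt placeholder, not the real DR sequence.
placeholder_dr = "ACGT" * 9
dr_variants = truncate_3prime(placeholder_dr, [12, 13, 16, 18, 20])
for n, s in dr_variants.items():
    print(n, s)
```

Each truncated DR would then be paired with the tracrRNA to build one candidate scaffold for testing.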
(3) Construction of eukaryotic expressed Cas9
The amino acid sequence of 4th6 described in example 1 was eukaryotic codon optimized and then constructed into a PX330 expression vector designated as PX330-4th6.
(4) Scaffold test
The optimized DR was combined with the tracrRNA to form a plurality of candidate scaffolds. The cleavage efficiency of the different scaffolds at the target sites was assessed separately using a fluorescence reporter system (SSA-GFP). The SSA-GFP system relies on GFP expression being restored when an interrupted GFP gene is repaired, so the reporter directly reflects the activity of the CRISPR system: stronger green fluorescence indicates higher editing efficiency.
(5) RNA secondary structure prediction
After the optimal DR and spacer were determined, the secondary structure of the mature guide RNA was modeled and analyzed using the RNAfold tool. The soundness of the guide RNA design was further verified by evaluating the stability of the secondary structure.
The results are shown in FIG. 3B and 3C: the optimal DR length of 4th6 is 13 bp and the optimal spacer length is 20 bp. The RNA structure of the mature scaffold of the 4th6 gene editing system is shown in FIG. 3A. The mature scaffold sequence of the 4th6 gene editing system is shown in SEQ ID NO. 5.
Example 4 4th6 protein evolution
To further improve the efficiency and applicability of the 4th6 system in gene editing, its core component, the Cas9 protein, was optimized by protein engineering: a series of amino acid mutants was designed and tested to explore the possibility of improving editing efficiency. The specific operations are as follows:
(1) Key amino acid site selection
Cas9 proteins with solved structures that are closely homologous to 4th6 were subjected to structural alignment analysis to determine important amino acid sites associated with sgRNA and target DNA binding. These sites were mapped onto the 4th6 protein sequence, and 42 sites of 4th6 were selected for protein engineering.
(2) Construction of single point arginine mutations
At each selected site, the codon encoding the original amino acid was replaced with an arginine (R) codon (AGG), generating a series of single-point mutants: N43R, Q56R, G116R, G162R, V164R, G166R, N169R, T186R, Y211R, E213R, G216R, S219R, W230R, Y231R, G236R, T239R, Y240R, F257R, G392R, T393R, S395R, Q415R, T420R, L447R, Y652R, N779R, L782R, D785R, I787R, Y788R, S789R, T802R, Y866R, S877R, Y880R, G886R, L889R, Y895R, Q907R, N983R, S984R and T887R. The mutant protein vectors were designated PX330-4th6-R, with the original unmutated vector as the control (WT).
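The single-point substitution step can be sketched in a few lines. The toy protein and coding sequence below are hypothetical stand-ins (not SEQ ID NO. 1), and the helper names are assumptions; only the mutation label format and the AGG codon come from the text.

```python
import re

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'N43R' into (wild-type residue, position, new residue)."""
    m = MUTATION_RE.match(label)
    if m is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wt, pos, new = m.groups()
    return wt, int(pos), new

def mutate_protein(protein: str, label: str) -> str:
    """Apply the substitution after checking the wild-type residue (1-based position)."""
    wt, pos, new = parse_mutation(label)
    if pos < 1 or pos > len(protein) or protein[pos - 1] != wt:
        raise ValueError(f"{label}: residue {pos} is not {wt}")
    return protein[:pos - 1] + new + protein[pos:]

def mutate_cds(cds: str, pos: int, codon: str = "AGG") -> str:
    """Replace the codon for residue `pos` with the arginine codon AGG."""
    start = 3 * (pos - 1)
    if pos < 1 or start + 3 > len(cds):
        raise ValueError("position outside the coding sequence")
    return cds[:start] + codon + cds[start + 3:]

# Toy example: protein M-N-Q encoded by ATG AAC CAG (hypothetical).
print(mutate_protein("MNQ", "N2R"))  # MRQ
print(mutate_cds("ATGAACCAG", 2))    # ATGAGGCAG
```

Checking the wild-type residue before substituting guards against off-by-one errors when mapping sites from an aligned homolog onto the 4th6 numbering.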
(3) Editing efficiency assessment
As in step (4) of Example 3, the gene editing efficiency of each mutant was evaluated by flow cytometry using the SSA-GFP fluorescence reporter system. In this reporter system, green fluorescent protein is expressed upon repair of the cleaved target DNA, and editing efficiency is quantified as the proportion of green fluorescent cells.
The results are shown in FIG. 4: among the 42 mutation sites of 4th6, 3 sites (Q56R, T802R, S877R) significantly improve the editing efficiency, by up to 1.3 times.
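The fold improvement relative to WT is a simple ratio of GFP-positive fractions. The sketch below uses made-up percentages purely to show the arithmetic; they are not the measured values behind FIG. 4, and `fold_over_wt` is a hypothetical helper.

```python
def fold_over_wt(gfp_percent: dict[str, float], wt_key: str = "WT") -> dict[str, float]:
    """Divide each mutant's GFP-positive percentage by the WT percentage."""
    wt = gfp_percent[wt_key]
    if wt <= 0:
        raise ValueError("WT signal must be positive to compute a ratio")
    return {name: value / wt for name, value in gfp_percent.items() if name != wt_key}

# Hypothetical flow cytometry readouts (percent GFP-positive cells).
readouts = {"WT": 10.0, "mutant_A": 13.0, "mutant_B": 8.0}
ratios = fold_over_wt(readouts)
improved = [name for name, r in ratios.items() if r > 1.0]
print(ratios)
print(improved)
```

In practice, replicate measurements and a statistical test would be needed before calling an improvement significant; the ratio alone only ranks the mutants.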
Example 5 Fidelity testing of evolved 4th6
The on-target editing and off-target effects of the evolved 4th6 from Example 4 (4th6-S877R is used in this example) were evaluated with the GUIDE-seq technique, which detects the insertion of double-stranded oligodeoxynucleotides (dsODNs). The specific operation steps are as follows:
(1) Construction of endogenous target plasmids
Based on the specific PAM sequence NNGA of 4th6 determined in Example 2 and the optimal spacer length determined in Example 3, this example selects 2 endogenous human genes (DYRK1A, RNF2) as targets and constructs the target plasmids.
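Candidate target sites that satisfy both constraints (a 20-nt spacer followed by an NNGA PAM) can be located with a simple scan. This sketch assumes the PAM lies immediately 3' of the protospacer, consistent with the library layout in Example 2, and searches only the given strand; the function name and the test sequence are hypothetical.

```python
def find_targets(seq: str, spacer_len: int = 20, pam: str = "NNGA") -> list[tuple[int, str, str]]:
    """Return (start index, protospacer, PAM) for every site on the given strand.

    'N' in the PAM pattern matches any base; other letters must match exactly.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - spacer_len - len(pam) + 1):
        protospacer = seq[i:i + spacer_len]
        candidate = seq[i + spacer_len:i + spacer_len + len(pam)]
        if all(p == "N" or p == b for p, b in zip(pam, candidate)):
            hits.append((i, protospacer, candidate))
    return hits

# Hypothetical 24-nt sequence: 20-nt protospacer followed by CCGA.
demo = "A" * 20 + "CCGA"
print(find_targets(demo))
```

To cover both strands, the same scan would be repeated on the reverse complement of the input sequence.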
(2) Cell electrotransformation
The dsODN, target plasmid and 4th6-S877R protein plasmid were electroporated into HEK293T cells using a Lonza electroporation device.
(3) GUIDE-seq library construction and sequencing analysis
72 hours after electroporation, cells were collected and DNA was extracted. In this example, after the dsODN is integrated at the Cas9 cleavage site, the tagged DNA is randomly fragmented and end-repaired, and sequencing adapters are added to both ends. Positive-strand and negative-strand PCR amplification follows. After next-generation sequencing, the amplified positive- and negative-strand libraries are aligned to the reference genome, and the on-target site and possible off-target sites are determined from the alignment; the sites with the highest total read coverage across the positive- and negative-strand libraries are regarded as on-target or off-target sites.
The results are shown in FIG. 5. According to the GUIDE-seq sequencing results, 4th6 shows high on-target read counts at the 2 selected targets, and no off-target events are detected, indicating that the 4th6 gene editing system of the invention has high editing capability and extremely high fidelity in a eukaryotic system.
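The site-ranking rule at the end of this step (total reads across the positive- and negative-strand libraries) can be sketched as follows. The site labels and read counts are invented for illustration, and `rank_sites` is a hypothetical helper, not part of any GUIDE-seq software.

```python
from collections import Counter

def rank_sites(plus_reads: dict[str, int], minus_reads: dict[str, int]) -> list[tuple[str, int]]:
    """Sum per-site reads from both strand libraries and sort by total, descending."""
    total = Counter(plus_reads)
    total.update(minus_reads)  # adds counts for sites present in either library
    return total.most_common()

# Invented counts keyed by a genomic position label.
plus = {"siteA": 50, "siteB": 2}
minus = {"siteA": 40}
ranking = rank_sites(plus, minus)
print(ranking)  # [('siteA', 90), ('siteB', 2)]
```

Sites near the top of the ranking would then be compared with the intended target sequence to separate the on-target site from candidate off-target sites.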
The invention develops a novel Cas protein 4th6 with PAM diversity, which can obviously reduce the limit of PAM sequences on editing target selection, thereby expanding the application range. By improving the ability of proteins to recognize non-standard PAM sequences, a wider coverage of genomic regions can be achieved, providing more options for complex genome editing tasks, and improving editing efficiency, particularly in regions lacking standard PAM.
The invention overcomes the problem that the large size of Cas9 proteins severely restricts their use in common gene delivery vectors (such as AAV). By developing a compact CRISPR/Cas9 system, the molecular weight of the Cas9 protein is markedly reduced while gene editing activity is maintained, so that the protein is easier to deliver with capacity-limited vectors. In addition, building on the delivery characteristics of compact proteins, delivery strategies can be further optimized, for example by designing non-viral vectors such as nanoparticles and liposomes, or by combining with physical delivery techniques such as electroporation, to increase the delivery efficiency of the editing system in a variety of cells and tissues. The miniaturized design can markedly enhance the potential of CRISPR technology for in vivo gene editing and can be extended to clinical settings such as gene therapy.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted equally without departing from the spirit and scope of the technical solution of the present invention.