Disclosure of Invention
The invention aims to provide a gene editing system for constructing HNF1A-mutant nuclear transfer donor cells for diabetes model pigs, and applications thereof.
The invention provides a method for preparing recombinant cells, comprising the following step: replacing the DNA molecule shown as SEQ ID NO. 19 in the chromosomal DNA of pig cells with the DNA molecule shown as SEQ ID NO. 18 to obtain recombinant cells.
The replacement of the DNA molecule shown as SEQ ID NO. 19 in the chromosomal DNA of the pig cells with the DNA molecule shown as SEQ ID NO. 18 is implemented by co-transfecting HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein into the pig cells. HNF1A-gU2 is an sgRNA whose target sequence binding region is shown as nucleotides 3-22 of SEQ ID NO. 16; HNF1A-gD1 is an sgRNA whose target sequence binding region is shown as nucleotides 3-22 of SEQ ID NO. 17; HNF1A-mutant-ss163 is the single-stranded DNA molecule shown as SEQ ID NO. 18; and the NCN protein is a Cas9 protein or a fusion protein containing a Cas9 protein.
Specifically, the NCN protein is shown as SEQ ID NO. 3.
Specifically, HNF1A-gU2 is shown as SEQ ID NO. 16.
Specifically, HNF1A-gD1 is shown as SEQ ID NO. 17.
Specifically, HNF1A-gU2 is shown as SEQ ID NO. 11.
Specifically, HNF1A-gD1 is shown as SEQ ID NO. 12.
The pig cells are pig fibroblasts.
The pig cells are primary pig fibroblasts.
The pig cells, HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein are used in the ratio 100,000 pig cells : 0.8-1.2 μg HNF1A-gU2 : 0.8-1.2 μg HNF1A-gD1 : 1.8-2.2 μg HNF1A-mutant-ss163 : 3-5 μg NCN protein.
Specifically, the ratio of the pig cells, HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein is 100,000 pig cells : 1 μg HNF1A-gU2 : 1 μg HNF1A-gD1 : 2 μg HNF1A-mutant-ss163 : 4 μg NCN protein.
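As a rough illustration of what the preferred 1:1:2:4 mass ratio means in molar terms, the sketch below converts each mass to picomoles. The molecular weights are assumptions, not values from this document: about 1,399 residues for the NCN protein (estimated from the SEQ ID NO. 1 coordinates, assuming nucleotides 9850-9852 are the stop codon and that NCN begins right after the EK site), about 102 nt for each sgRNA, 163 nt for HNF1A-mutant-ss163 (inferred only from its name), and average residue masses of 110 Da per amino acid, 320.5 Da per ribonucleotide and 330 Da per deoxynucleotide.

```python
# Rough mass-to-mole conversion for the co-transfection mix (per 100,000 cells).
# All lengths and average residue masses are illustrative ASSUMPTIONS.

AA_DA = 110.0      # assumed average mass per amino-acid residue (Da)
RNA_NT_DA = 320.5  # assumed average mass per ribonucleotide (Da)
DNA_NT_DA = 330.0  # assumed average mass per deoxynucleotide in ssDNA (Da)


def pmol(mass_ug: float, mw_da: float) -> float:
    """Convert micrograms to picomoles: ug * 1e-6 g / (g/mol) * 1e12 pmol/mol."""
    return mass_ug * 1e6 / mw_da


# (mass in ug from the preferred ratio, assumed molecular weight in Da)
components = {
    "HNF1A-gU2": (1.0, 102 * RNA_NT_DA),           # assumed ~102-nt sgRNA
    "HNF1A-gD1": (1.0, 102 * RNA_NT_DA),           # assumed ~102-nt sgRNA
    "HNF1A-mutant-ss163": (2.0, 163 * DNA_NT_DA),  # 163 nt inferred from name
    "NCN protein": (4.0, 1399 * AA_DA),            # ~1,399 aa estimate
}

moles = {name: pmol(m, mw) for name, (m, mw) in components.items()}

if __name__ == "__main__":
    cas9 = moles["NCN protein"]
    for name, p in moles.items():
        print(f"{name:20s} {p:6.1f} pmol  ({p / cas9:.2f} x NCN)")
```

Under these assumptions the preferred ratio corresponds to roughly 1.2 molecules of each sgRNA per NCN molecule; substituting the actual lengths of SEQ ID NOs. 3, 16, 17 and 18 would give exact values.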
The co-transfection is specifically performed by electroporation.
The electroporation parameters may be 1450 V, 10 ms, 3 pulses.
The co-transfection may specifically be performed using a mammalian nucleofection kit (Neon kit, Thermo Fisher) and the Neon™ Transfection System electroporation device.
The NCN protein is prepared by a method comprising the following steps:
(1) Introducing plasmid pKG-GE4 into Escherichia coli BL21(DE3) to obtain a recombinant strain;
(2) Culturing the recombinant strain in liquid medium at 30 ℃, then adding IPTG, performing induction culture at 25 ℃, and collecting the bacterial cells;
(3) Lysing the collected cells and collecting a crude protein solution;
(4) Purifying the His6-tagged fusion protein from the crude protein solution by affinity chromatography;
(5) Cleaving the His6-tagged fusion protein with His6-tagged enterokinase, and removing the His6-tagged proteins with Ni-NTA resin to obtain purified NCN protein.
Plasmid pKG-GE4 contains the fusion gene shown as nucleotides 5209-9852 of SEQ ID NO. 1.
Specifically, the NCN protein is prepared as follows:
(1) Introducing plasmid pKG-GE4 into E. coli BL21(DE3) to obtain a recombinant strain.
(2) Inoculating the recombinant strain obtained in step (1) into liquid LB medium containing ampicillin and culturing with shaking;
(3) Inoculating the culture obtained in step (2) into liquid LB medium, culturing with shaking at 30 ℃ and 230 rpm until OD600 = 1.0, adding IPTG to a final concentration of 0.5 mM, continuing shaking culture at 25 ℃ and 230 rpm for 12 hours, and collecting the bacterial cells by centrifugation;
(4) Washing the cells obtained in step (3) with PBS;
(5) Resuspending the cells obtained in step (4) in crude extraction buffer, lysing them, collecting the supernatant by centrifugation, filtering it through a 0.22 μm membrane, and collecting the filtrate;
(6) Purifying the His6-tagged fusion protein (the fusion protein shown in SEQ ID NO. 2) from the filtrate obtained in step (5) by affinity chromatography;
(7) Concentrating the post-column solution collected in step (6) with an ultrafiltration tube and diluting it with 25 mM Tris-HCl (pH 8.0);
(8) Adding His6-tagged recombinant bovine enterokinase to the solution obtained in step (7) and performing digestion;
(9) Mixing the solution obtained in step (8) with Ni-NTA resin, incubating, and collecting the supernatant by centrifugation;
(10) Concentrating the supernatant obtained in step (9) with an ultrafiltration tube and adding the concentrate to enzyme stock solution to obtain the NCN protein solution.
The His6-tagged fusion protein is purified from the filtrate obtained in step (5) by affinity chromatography as follows:
A Ni-NTA agarose column is first equilibrated with 5 column volumes of equilibration solution (flow rate 1 ml/min); 50 ml of the filtrate obtained in step (5) is loaded (flow rate 0.5-1 ml/min); the column is washed with 5 column volumes of equilibration solution (flow rate 1 ml/min) and then with 5 column volumes of buffer (flow rate 1 ml/min) to remove contaminating proteins; the bound protein is then eluted with 10 column volumes of eluent at 0.5-1 ml/min, and the post-column solution (90-100 ml) is collected.
The invention also protects a kit comprising any one of the above HNF1A-gU2, any one of the above HNF1A-gD1, HNF1A-mutant-ss163 and any one of the above NCN proteins.
The invention also protects a kit comprising any one of the above HNF1A-gU2, any one of the above HNF1A-gD1, HNF1A-mutant-ss163 and any one of the PRONCN proteins.
The invention also protects a kit comprising any one of the above HNF1A-gU2, any one of the above HNF1A-gD1, HNF1A-mutant-ss163 and the specific plasmid. This kit further comprises Escherichia coli BL21(DE3).
Any of the above kits further comprises pig cells.
The pig cells are pig fibroblasts.
The pig cells are primary pig fibroblasts.
The invention also protects the use of any one of the above HNF1A-gU2, any one of the above HNF1A-gD1, HNF1A-mutant-ss163 and any one of the above NCN proteins in preparing a kit.
The invention also protects the use of any one of the above HNF1A-gU2, any one of the above HNF1A-gD1, HNF1A-mutant-ss163 and any one of the PRONCN proteins in preparing a kit.
The invention also protects the use of any one of the above HNF1A-gU2, any one of the above HNF1A-gD1, HNF1A-mutant-ss163 and the specific plasmid in preparing a kit.
The kit is used for the following purposes: (a) preparing recombinant cells; (b) preparing a diabetes model pig; (c) preparing a diabetes cell model, a diabetes tissue model or a diabetes organ model.
Any one of the PRONCN proteins comprises the following elements, in order from the N-terminus to the C-terminus: a signal peptide, a chaperone protein, a protein tag, a protease cleavage site, a nuclear localization signal, a Cas9 protein and a nuclear localization signal.
The signal peptide promotes secretory expression of the protein. It may be the E. coli alkaline phosphatase (phoA) signal peptide, the Staphylococcus aureus protein A signal peptide, the E. coli outer membrane protein A (OmpA) signal peptide, or the signal peptide of any other prokaryotic gene, preferably the alkaline phosphatase signal peptide (phoA signal peptide). The alkaline phosphatase signal peptide directs secretion of the target protein into the bacterial periplasm, separating it from intracellular bacterial proteins; the target protein secreted into the periplasm is expressed in soluble form, and the signal peptide is cleaved off by periplasmic signal peptidase.
The chaperone protein increases the solubility of the protein. The chaperone may be any protein that assists disulfide bond formation, preferably thioredoxin (TrxA protein). As a molecular chaperone, thioredoxin helps the co-expressed target protein (e.g., Cas9 protein) form disulfide bonds, improving its stability and folding fidelity and increasing its solubility and activity.
The protein tag is used for protein purification. The tag may be a His tag (His-Tag, His6 tag), GST tag, FLAG tag, HA tag, c-Myc tag or any other protein tag, more preferably a His tag. The His tag binds to a Ni column, so the target protein can be purified by one-step Ni-column affinity chromatography, which greatly simplifies purification.
The protease cleavage site allows the non-functional segment to be cleaved off after purification, releasing Cas9 protein in its native form. The protease may be enterokinase, factor Xa, thrombin, TEV protease, HRV 3C protease, WELQut protease or any other endoprotease, more preferably enterokinase. EK denotes the enterokinase cleavage site, which allows the fused TrxA-His segment to be cut off by enterokinase to obtain native Cas9 protein. After the fusion protein is digested with commercial His-tagged enterokinase, both the TrxA-His segment and the His-tagged enterokinase can be removed in a single affinity chromatography step to obtain native Cas9 protein, avoiding damage to and loss of the target protein caused by repeated purification and dialysis.
The nuclear localization signal may be any nuclear localization signal, preferably the SV40 nuclear localization signal and/or the nucleoplasmin nuclear localization signal. NLS denotes a nuclear localization signal; placing one NLS at each of the N-terminus and C-terminus of Cas9 allows Cas9 to enter the nucleus more efficiently for gene editing.
The Cas9 protein may be saCas9 or spCas9, preferably spCas9 protein.
The PRONCN protein is specifically shown as SEQ ID NO. 2.
The specific plasmid comprises the following elements, in order from upstream to downstream: a promoter, an operator, a ribosome binding site, a gene encoding the PRONCN protein, and a terminator.
The promoter may specifically be a T7 promoter. The T7 promoter is a strong prokaryotic expression promoter and can efficiently drive the expression of exogenous genes.
The operator may specifically be the lac operator. The lac operator is a regulatory element for lactose-inducible expression: expression of the target protein can be induced at low temperature after the bacteria have grown to a sufficient density, which prevents premature expression of the target protein from impairing host growth, and low-temperature induction markedly improves the solubility of the expressed target protein.
The ribosome binding site is the site at which the ribosome binds the mRNA to initiate translation, and is required for protein translation.
The terminator may specifically be a T7 terminator. The T7 terminator can effectively terminate gene transcription at the tail end of the target gene, and prevent other downstream sequences except the target gene from being transcribed and translated.
The codons of the spCas9 coding sequence are optimized to match the codon preference of E. coli BL21(DE3), the high-efficiency expression strain selected in this application, thereby increasing the expression level of the Cas9 protein.
The T7 promoter is shown as nucleotides 5121-5139 of SEQ ID NO. 1.
The lac operator is shown as nucleotides 5140-5164 of SEQ ID NO. 1.
The ribosome binding site is shown as nucleotides 5178-5201 of SEQ ID NO. 1.
The coding sequence of the alkaline phosphatase signal peptide is shown as nucleotides 5209-5271 of SEQ ID NO. 1.
The coding sequence of the TrxA protein is shown as nucleotides 5272-5598 of SEQ ID NO. 1.
The coding sequence of the His-Tag is shown as nucleotides 5620-5637 of SEQ ID NO. 1.
The coding sequence of the enterokinase cleavage site is shown as nucleotides 5638-5652 of SEQ ID NO. 1.
The coding sequence of the nuclear localization signal upstream of spCas9 is shown as nucleotides 5656-5670 of SEQ ID NO. 1.
The coding sequence of the spCas9 protein is shown as nucleotides 5701-9801 of SEQ ID NO. 1.
The coding sequence of the nuclear localization signal downstream of spCas9 is shown as nucleotides 9802-9849 of SEQ ID NO. 1.
The T7 terminator is shown as nucleotides 9902-9949 of SEQ ID NO. 1.
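The coordinates above can be cross-checked for internal consistency. The sketch below is a minimal illustration rather than part of the described method. It verifies that every coding element starts in the same reading frame as the fusion gene beginning at nucleotide 5209 and that each element's length is divisible by 3. The helper `codon_index` converts a nucleotide position into a residue number of the fusion protein. Only the coordinates come from the text; the function names are hypothetical.

```python
# Consistency check of the SEQ ID NO. 1 feature coordinates (1-based, inclusive).
FUSION_START, FUSION_END = 5209, 9852  # fusion gene, nucleotides 5209-9852

CODING_FEATURES = {
    "phoA signal peptide": (5209, 5271),
    "TrxA": (5272, 5598),
    "His-Tag": (5620, 5637),
    "EK cleavage site": (5638, 5652),
    "NLS (N-terminal)": (5656, 5670),
    "spCas9": (5701, 9801),
    "NLS (C-terminal)": (9802, 9849),
}


def length_nt(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def codon_index(pos: int, orf_start: int = FUSION_START) -> int:
    """1-based codon (residue) number containing nucleotide `pos`."""
    return (pos - orf_start) // 3 + 1


def frame_problems() -> list[str]:
    """Describe every feature that would break the fusion gene's reading frame."""
    problems = []
    for name, (start, end) in CODING_FEATURES.items():
        if (start - FUSION_START) % 3:
            problems.append(f"{name}: starts out of frame")
        if length_nt(start, end) % 3:
            problems.append(f"{name}: length is not a multiple of 3")
    if length_nt(FUSION_START, FUSION_END) % 3:
        problems.append("fusion gene: length is not a multiple of 3")
    return problems


if __name__ == "__main__":
    print("frame problems:", frame_problems() or "none")
    for name, (start, end) in CODING_FEATURES.items():
        print(f"{name:20s} residues {codon_index(start)}-{codon_index(end)}")
```

With the listed coordinates the check reports no problems. The 21-codon phoA signal peptide is followed in frame by TrxA (residues 22-130), and the EK site ends at residue 148. The unannotated gaps, for example nucleotides 5599-5619, are presumably linker codons; the check only confirms that they preserve the frame.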
Specifically, the specific plasmid is plasmid pKG-GE4.
Plasmid pKG-GE4 contains the DNA molecule shown as nucleotides 5121-9949 of SEQ ID NO. 1.
Specifically, plasmid pKG-GE4 is shown as SEQ ID NO. 1.
The ratio of HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein is 0.8-1.2 μg HNF1A-gU2 : 0.8-1.2 μg HNF1A-gD1 : 1.8-2.2 μg HNF1A-mutant-ss163 : 3-5 μg NCN protein.
The proportion of HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein is 1 μg HNF1A-gU2:1 μg HNF1A-gD1:2 μg HNF1A-mutant-ss163:4 μg NCN protein.
The pig cells, HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein are used in the ratio 100,000 pig cells : 0.8-1.2 μg HNF1A-gU2 : 0.8-1.2 μg HNF1A-gD1 : 1.8-2.2 μg HNF1A-mutant-ss163 : 3-5 μg NCN protein.
Specifically, the ratio of the pig cells, HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein is 100,000 pig cells : 1 μg HNF1A-gU2 : 1 μg HNF1A-gD1 : 2 μg HNF1A-mutant-ss163 : 4 μg NCN protein.
The invention also protects the recombinant cells prepared by any of the above methods.
The invention also protects the use of the recombinant cells in preparing diabetes model pigs.
Cloned pigs, i.e., diabetes model pigs, can be obtained by somatic cell nuclear transfer using the recombinant cells as nuclear transfer donor cells.
The invention also protects pig tissue from a model pig prepared using the recombinant cells, i.e., a diabetes tissue model.
The invention also protects a pig organ from a model pig prepared using the recombinant cells, i.e., a diabetes organ model.
The invention also protects porcine cells (e.g., islet cells or hepatocytes) of a model pig prepared using the recombinant cells, i.e., a diabetes cell model.
The invention also protects the use of the recombinant cells, the diabetes tissue model, the diabetes organ model, the diabetes cell model or the diabetes model pig for any of the following (d1), (d2), (d3) or (d4):
(d1) Screening drugs for treating diabetes;
(d2) Evaluating the efficacy of diabetes drugs;
(d3) Evaluating the efficacy of gene therapy and/or cell therapy for diabetes;
(d4) Studying the pathogenesis of diabetes.
Any of the above pigs may specifically be a Congjiang Xiang pig.
Any of the above diabetes may be maturity-onset diabetes of the young (MODY).
Any of the above diabetes may be MODY3.
MODY3 is caused by mutation of the HNF1A gene; the mutation concerned here is the P291fsinsC mutation in exon 4 of HNF1A.
The pig (Sus scrofa) HNF1A gene encodes hepatocyte nuclear factor 1-alpha, is located on chromosome 14, and has GeneID 574067. The amino acid sequence of the protein encoded by the pig HNF1A gene is shown as SEQ ID NO. 8. The pig HNF1A gene contains the DNA segment shown as SEQ ID NO. 9.
Compared with the prior art, the invention has at least the following beneficial effects:
(1) The subject (pig) of the invention has better applicability than other animals (rats, mice, primates).
Rodents such as rats and mice differ greatly from humans in body size, organ size, physiology and pathology, and cannot faithfully reproduce normal human physiological and pathological states. Studies have shown that more than 95% of drugs validated in mice are ineffective in human clinical trials. Among large animals, primates are the closest relatives of humans, but they are small, reach sexual maturity late (mating begins at 6-7 years of age) and usually bear single offspring, so their populations expand extremely slowly and husbandry costs are high. In addition, primate cloning is inefficient, difficult and costly.
Apart from primates, the pig is the animal most closely related to humans. It resembles humans in body shape, weight and organ size, and in anatomy, physiology, immunology, nutritional metabolism and disease pathogenesis. Pigs also reach sexual maturity early (4-6 months), are highly fertile with large litters, and can form a sizable herd within 2-3 years. In addition, pig cloning technology is mature, and cloning and husbandry costs are much lower than for primates. Pigs are therefore well suited as models of human diseases.
(2) The vector constructed in the invention expresses the target protein from the strong T7-lac promoter and uses the signal peptide of the bacterial periplasmic protein alkaline phosphatase (phoA) to direct secretion of the target protein into the bacterial periplasm, separating it from intracellular bacterial proteins; the target protein secreted into the periplasm is expressed in soluble form. Cas9 is also expressed as a fusion with thioredoxin TrxA, which helps the co-expressed target protein form disulfide bonds, improves its stability and folding fidelity, and increases its solubility and activity. To facilitate purification, a His tag is included, so the target protein can be purified by one-step Ni-column affinity chromatography, which greatly simplifies purification. An enterokinase cleavage site placed after the His tag allows the fused TrxA-His fragment to be cut off, yielding Cas9 protein in its native form. After the fusion protein is digested with His-tagged enterokinase, the TrxA-His fragment and the His-tagged enterokinase can both be removed in a single affinity chromatography step, avoiding damage to and loss of the target protein caused by repeated purification and dialysis. In addition, one NLS is placed at each of the N-terminus and C-terminus of Cas9, so that Cas9 enters the nucleus more efficiently for gene editing.
In addition, E. coli BL21(DE3) is selected as the expression strain; this strain efficiently expresses exogenous genes cloned into expression vectors carrying the phage T7 promoter (e.g., pET-32a). The codons of the Cas9 protein are optimized to match the codon preference of the expression strain, increasing expression of the target protein. Furthermore, after the bacteria have grown to a sufficient density, IPTG is used to induce expression of the target protein at low temperature, which prevents premature expression from impairing host growth and markedly improves the solubility of the expressed protein. Through this optimized design and its experimental implementation, the activity of the resulting Cas9 protein is markedly higher than that of commercial Cas9 proteins.
(3) The high-efficiency Cas9 protein constructed and expressed in the invention is combined with in vitro transcribed gRNA for gene editing; the Cas9:gRNA dosage ratio was optimized, and a synthetic ssODN was used as the donor DNA. As a result, the proportion of single-cell clones carrying the target point mutation reaches 20%, far higher than conventional point-mutation efficiency (<5%).
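To make the practical effect of the 20% versus <5% figures concrete, the sketch below estimates how many single-cell clones must be screened to find at least one correctly edited clone with 95% probability. It assumes, as a simplification not stated in the text, that each clone is independently positive with probability p (a binomial model); 5% is used as an optimistic stand-in for the conventional efficiency.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent clones is positive."""
    return 1.0 - (1.0 - p) ** n


def clones_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n with prob_at_least_one(p, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.20, 0.05):
        n = clones_needed(p)
        print(f"p = {p:.2f}: screen {n} clones "
              f"(P(at least one) = {prob_at_least_one(p, n):.3f})")
```

Under this model about 14 clones suffice at 20% efficiency, versus about 59 at 5%.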
(4) Somatic cell nuclear transfer using a single-cell clone carrying the target point mutation directly yields cloned pigs carrying that mutation, and the mutation can be stably inherited.
The approach used to produce mouse models, in which gene-editing reagents are microinjected into fertilized eggs followed by embryo transfer, is unsuitable for producing models of large animals with long gestation periods (such as pigs): the probability of directly obtaining point-mutant offspring is very low (<1%), so the offspring must then be crossbred. The invention therefore edits primary cells in vitro, performs ssODN-mediated homologous recombination and screens positively edited single-cell clones, a technically demanding and challenging process, and subsequently obtains the corresponding disease model pigs directly by somatic cell nuclear transfer. This greatly shortens the production period of model pigs and saves labor, materials and cost.
The invention uses CRISPR/Cas9 and ssODN-mediated homologous recombination to perform site-directed modification of the HNF1A gene, mimicking the natural genetic basis of MODY3 diabetes, and obtains single-cell clones with precise site-directed modification of HNF1A. This lays the foundation for subsequently producing MODY3 diabetes model pigs by somatic cell nuclear transfer. The model pigs provide a powerful experimental tool for studying the pathogenesis of MODY3 diabetes and for drug research and development.
The invention lays a solid foundation for obtaining, by gene editing, MODY3 diabetes model pigs carrying an HNF1A mutation. It helps to study and reveal the pathogenesis of MODY3 diabetes caused by HNF1A mutation, can be used for drug screening, drug efficacy testing, gene therapy and cell therapy research, can provide useful experimental data for further clinical application, and thus offers a powerful experimental means toward the successful treatment of human MODY3 diabetes. The invention is of great value for developing drugs against MODY3 diabetes and for revealing its pathogenesis.
Detailed Description
The following detailed description of the invention is provided in connection with the accompanying drawings that are presented to illustrate the invention and not to limit the scope thereof. The examples provided below are intended as guidelines for further modifications by one of ordinary skill in the art and are not to be construed as limiting the invention in any way.
The experimental methods in the following examples are conventional methods unless otherwise specified, and are carried out according to techniques or conditions described in the literature in the field or according to the product instructions. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified. All recombinant plasmids constructed in the examples were verified by sequencing. Commercial Cas9-A protein and commercial Cas9-B protein are commercially available Cas9 proteins with good performance. Complete medium (by volume): 15% fetal bovine serum (Gibco), 83% DMEM (Gibco), 1% penicillin-streptomycin (Gibco), 1% HEPES (Solarbio). Cells were cultured at 37 ℃ in a constant-temperature incubator with 5% CO2 and 5% O2.
The primary pig fibroblasts used in the examples were prepared from ear tissue of a Congjiang Xiang pig. They were prepared as follows: ① 0.5 g of pig ear tissue was collected, hair and bone tissue were removed, and the tissue was soaked in 75% ethanol for 30-40 s, washed 5 times with PBS containing 5% (v/v) penicillin-streptomycin (Gibco), and washed once with PBS; ② the tissue was minced with scissors, digested with 5 mL of 0.1% collagenase solution (Sigma) at 37 ℃ for 1 h and centrifuged for 5 min, and the supernatant was discarded; ③ the pellet was resuspended in 1 mL of complete medium, transferred to a culture dish pre-coated with 0.2% gelatin (VWR) and containing 10 mL of complete medium, and cultured until the cells covered about 60% of the dish bottom; ④ the cells were then digested with trypsin for subsequent electroporation experiments.
Plasmid pKG-GE3 is a circular plasmid shown as SEQ ID NO. 2 of patent application 202010084343.6. In SEQ ID NO. 2 of patent application 202010084343.6, nucleotides 395-680 constitute the CMV enhancer; nucleotides 682-890 constitute the EF1a promoter; nucleotides 986-1006 and 1016-1036 each encode a nuclear localization signal (NLS); nucleotides 1037-5161 encode the Cas9 protein; nucleotides 5162-5209 and 5219-5266 each encode a nuclear localization signal (NLS); nucleotides 5276-5332 encode the self-cleaving polypeptide P2A (amino acid sequence "ATNFSLLKQAGDVEENPGP"; cleavage occurs between the first and second amino acid residues from the C-terminus); nucleotides 5333-6046 encode the EGFP protein; and nucleotides 6056-6109 encode the self-cleaving polypeptide T2A (amino acid sequence "EGRGSLLTCGDVEENPGP"; cleavage occurs between the first and second amino acid residues from the C-terminus), downstream of which the fusion gene encodes the Puro protein. In SEQ ID NO. 2 of patent application 202010084343.6, nucleotides 911-6706 form a fusion gene that expresses a fusion protein.
Because of the self-cleaving polypeptides P2A and T2A, the fusion protein spontaneously splits into three proteins: one containing the Cas9 protein, one containing the EGFP protein and one containing the Puro protein.
The pKG-U6gRNA vector (plasmid pKG-U6gRNA) is a circular plasmid shown as SEQ ID NO. 3 of patent application 202010084343.6. In SEQ ID NO. 3 of patent application 202010084343.6, nucleotides 2280-2539 constitute the hU6 promoter and nucleotides 2558-2637 are transcribed to form the gRNA backbone. In use, a DNA molecule of about 20 bp (transcribed into the target sequence binding region of the gRNA) is inserted into plasmid pKG-U6gRNA to form a recombinant plasmid, which is transcribed in cells to produce the gRNA.
Example 1 Construction of a prokaryotic vector for high-efficiency Cas9 expression
The structure of plasmid pET-32a is schematically shown in FIG. 1.
Plasmid pKG-GE4 was constructed by modifying the starting plasmid pET-32a. Plasmid pET32a-T7lac-phoA SP-TrxA-His-EK-NLS-spCas9-NLS-T7ter (plasmid pKG-GE4 for short) is a circular plasmid shown as SEQ ID NO. 1; its structure is shown schematically in FIG. 2.
In SEQ ID NO. 1, nucleotides 5121-5139 constitute the T7 promoter, nucleotides 5140-5164 constitute the lac operator, nucleotides 5178-5201 constitute the ribosome binding site (RBS), nucleotides 5209-5271 encode the alkaline phosphatase signal peptide (phoA signal peptide), nucleotides 5272-5598 encode the TrxA protein, nucleotides 5620-5637 encode the His-Tag, nucleotides 5638-5652 encode the enterokinase cleavage site (EK cleavage site), nucleotides 5656-5670 encode a nuclear localization signal, nucleotides 5701-9801 encode the spCas9 protein, nucleotides 9802-9849 encode a nuclear localization signal, and nucleotides 9902-9949 constitute the T7 terminator. The nucleotides encoding the spCas9 protein have been codon-optimized for the E. coli BL21(DE3) strain.
The main modifications in plasmid pKG-GE4 are: ① the coding region of the TrxA protein is retained, since TrxA helps the expressed target protein form disulfide bonds and increases its solubility and activity, and the coding sequence of the alkaline phosphatase signal peptide is added upstream of it, which directs the expressed target protein into the bacterial periplasm, where the signal peptide is cleaved by periplasmic signal peptidase; ② the coding sequence of the His-Tag is added after the coding sequence of the TrxA protein, so that the expressed target protein can be enriched via the His-Tag; ③ the coding sequence of the enterokinase cleavage site DDDDK (Asp-Asp-Asp-Asp-Lys) is added downstream of the His-Tag coding sequence, so that enterokinase can remove the His-Tag and the upstream fused TrxA protein from the purified protein; ④ the spCas9 gene, codon-optimized for expression in E. coli BL21(DE3), is inserted, and nuclear localization signal coding sequences are added upstream and downstream of it to improve nuclear localization of Cas9.
The fusion gene in plasmid pKG-GE4 is shown as nucleotides 5209-9852 of SEQ ID NO. 1 and encodes the fusion protein shown as SEQ ID NO. 2 (fusion protein TrxA-His-EK-NLS-spCas9-NLS, PRONCN protein for short). Because the fusion protein carries the alkaline phosphatase signal peptide and an enterokinase cleavage site, cleavage by enterokinase yields the protein shown as SEQ ID NO. 3, which is named NCN protein.
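The enterokinase step can be pictured as a sequence operation. The enzyme recognizes DDDDK and cuts on the C-terminal side of the lysine, so everything after the site becomes the released protein. The sketch below applies that rule to a short hypothetical sequence. It is a simplification that cuts only at the first DDDDK and ignores possible secondary cleavage. It does not use the real SEQ ID NO. 2, which is not reproduced here.

```python
EK_SITE = "DDDDK"  # enterokinase recognition sequence (Asp-Asp-Asp-Asp-Lys)


def enterokinase_cleave(protein: str) -> tuple[str, str]:
    """Split a one-letter protein sequence after the first DDDDK motif.

    Returns (upstream fragment including DDDDK, released C-terminal protein).
    """
    idx = protein.find(EK_SITE)
    if idx == -1:
        raise ValueError("no enterokinase site (DDDDK) found")
    cut = idx + len(EK_SITE)
    return protein[:cut], protein[cut:]


if __name__ == "__main__":
    # Hypothetical toy fusion: tag-like segment, EK site, NLS-like segment.
    toy = "MSDKIHHHHHHDDDDKPKKKRKV"
    tag_part, mature = enterokinase_cleave(toy)
    print("removed:", tag_part)   # removed: MSDKIHHHHHHDDDDK
    print("released:", mature)    # released: PKKKRKV
```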
Example 2 Preparation and purification of NCN protein
1. Induction of expression
1. Plasmid pKG-GE4 was introduced into E. coli BL21(DE3) to obtain a recombinant strain.
2. The recombinant strain obtained in step 1 was inoculated into liquid LB medium containing 100 μg/ml ampicillin and cultured overnight at 37 ℃ with shaking at 200 rpm.
3. The culture obtained in step 2 was inoculated into liquid LB medium and cultured at 30 ℃ and 230 rpm until OD600 = 1.0; isopropyl β-D-1-thiogalactopyranoside (IPTG) was then added to a final concentration of 0.5 mM, culture was continued at 25 ℃ and 230 rpm for 12 hours, and the bacterial cells were collected by centrifugation at 10,000 g for 15 minutes at 4 ℃.
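The additions in steps 2 and 3 are simple C1V1 = C2V2 dilutions. The sketch below assumes a 1 M IPTG stock, a 100 mg/ml ampicillin stock, a 1 L induction culture and a 100 ml overnight culture; none of these values is given in the text. It also ignores the small volume increase caused by adding the stock.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, culture_ml: float) -> float:
    """Volume of stock (ml) giving final_conc in culture_ml.

    final_conc and stock_conc must be expressed in the same unit.
    """
    if final_conc >= stock_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * culture_ml / stock_conc


if __name__ == "__main__":
    # Assumed: 1 M (1000 mM) IPTG stock, 1000 ml culture; 0.5 mM final is from the text.
    print(f"IPTG: {stock_volume_ml(0.5, 1000.0, 1000.0) * 1000:.0f} ul")
    # Assumed: 100 mg/ml (100000 ug/ml) stock, 100 ml culture; 100 ug/ml is from the text.
    print(f"ampicillin: {stock_volume_ml(100.0, 100000.0, 100.0) * 1000:.0f} ul")
```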
4. The cells obtained in step 3 were washed with PBS.
2. Purification of fusion protein TrxA-His-EK-NLS-spCas9-NLS
1. The cells obtained in part 1 were resuspended in crude extraction buffer and lysed with a homogenizer (three passes at 1000 bar), then centrifuged at 15,000 g for 30 min at 4 ℃; the supernatant was collected and filtered through a 0.22 μm membrane, and the filtrate was collected. In this step, 10 ml of crude extraction buffer was used per gram of wet cells.
The crude extraction buffer contained 20 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 5 mM imidazole and 1 mM PMSF, with the balance ddH2O.
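The lysis ratio (10 ml of buffer per gram of wet cells) and the buffer composition translate directly into amounts to weigh out. In the sketch below the 5 g wet-cell mass and the 1 M Tris-HCl (pH 8.0) stock are hypothetical; the molecular weights are standard formula weights. PMSF is shown by mass for completeness, although it is commonly added fresh from an organic-solvent stock because it is unstable in water.

```python
# Formula weights (g/mol)
FW = {"NaCl": 58.44, "imidazole": 68.08, "PMSF": 174.19}

# Crude extraction buffer from the text (final concentrations, mM)
CRUDE_BUFFER_MM = {"NaCl": 500.0, "imidazole": 5.0, "PMSF": 1.0}
TRIS_FINAL_MM = 20.0


def buffer_volume_ml(wet_cells_g: float, ml_per_g: float = 10.0) -> float:
    """Lysis buffer volume for a given wet-cell mass (10 ml per g in the text)."""
    return wet_cells_g * ml_per_g


def grams(conc_mM: float, volume_ml: float, fw: float) -> float:
    """Mass (g) = concentration (mol/L) x volume (L) x formula weight (g/mol)."""
    return conc_mM / 1000.0 * volume_ml / 1000.0 * fw


def recipe(wet_cells_g: float, tris_stock_mM: float = 1000.0) -> dict[str, float]:
    """Amounts needed to make crude extraction buffer for wet_cells_g of cells."""
    volume = buffer_volume_ml(wet_cells_g)
    out = {f"{name} (g)": grams(c, volume, FW[name]) for name, c in CRUDE_BUFFER_MM.items()}
    out["Tris-HCl stock (ml)"] = TRIS_FINAL_MM * volume / tris_stock_mM
    out["final volume (ml)"] = volume
    return out


if __name__ == "__main__":
    for item, amount in recipe(5.0).items():  # hypothetical 5 g of wet cells
        print(f"{item:22s} {amount:.4f}")
```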
2. The fusion protein was purified by affinity chromatography.
A Ni-NTA agarose column was first equilibrated with 5 column volumes of equilibration solution (flow rate 1 ml/min); 50 ml of the filtrate obtained in step 1 was loaded (flow rate 0.5-1 ml/min); the column was washed with 5 column volumes of equilibration solution (flow rate 1 ml/min) and then with 5 column volumes of buffer (flow rate 1 ml/min) to remove contaminating proteins; the bound protein was then eluted with 10 column volumes of eluent at 0.5-1 ml/min, and the post-column solution (90-100 ml) was collected.
Ni-NTA agarose column: GenScript, L00250/L00250-C, 10 ml packing.
The equilibration solution contained 20 mM Tris-HCl (pH 8.0), 0.5 M NaCl and 5 mM imidazole, with the balance ddH2O.
The buffer contained 20 mM Tris-HCl (pH 8.0), 0.5 M NaCl and 50 mM imidazole, with the balance ddH2O.
The eluent contained 20 mM Tris-HCl (pH 8.0), 0.5 M NaCl and 500 mM imidazole, with the balance ddH2O.
3. Cleavage of fusion protein TrxA-His-EK-NLS-spCas9-NLS and purification of NCN protein
1. A 15 ml portion of the eluate collected in step two was concentrated to 200 μl in an Amicon ultrafiltration tube (Sigma, UFC9100, 15 ml capacity). It was then diluted to 1 ml with 25 mM Tris-HCl (pH 8.0). Six ultrafiltration tubes were processed in this way, giving 6 ml in total.
2. His-tagged recombinant bovine enterokinase light chain (commercial, catalog no. C620031, Recombinant Bovine Enterokinase Light Chain, His) was added to the solution from step 1 (about 6 ml) at 2 units per 50 μg of protein. Digestion was carried out at 25 ℃ for 16 hours.
3. The solution from step 2 (about 6 ml) was mixed with 480 μl of Ni-NTA resin (GenScript, L00250/L00250-C) and rotated at room temperature for 15 min. This step captures the His-tagged enterokinase and the His-tagged TrxA fragment released by cleavage. After centrifugation at 7000 g for 3 min, the supernatant (4-5.5 ml) was collected.
4. The supernatant from step 3 was concentrated to 200 μl in an Amicon ultrafiltration tube (Sigma, UFC9100, 15 ml capacity). It was then diluted with enzyme stock solution to a protein concentration of 5 mg/ml, giving the NCN protein solution.
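Steps 2 and 4 each involve a small calculation:

- the enterokinase dose, at 2 units per 50 μg of protein;
- the volume of enzyme stock solution needed to bring the 200 μl concentrate to 5 mg/ml, using C1·V1 = C2·V2.

A minimal Python sketch follows. It assumes the protein concentrations have already been measured; the measurement method is not specified here. The function names and example inputs are hypothetical.

```python
def enterokinase_units(protein_ug: float, units_per_50_ug: float = 2.0) -> float:
    """Enterokinase units for a given mass of fusion protein (2 U per 50 ug, as in step 2)."""
    if protein_ug < 0:
        raise ValueError("protein mass must be non-negative")
    return protein_ug * units_per_50_ug / 50.0


def stock_solution_to_add(conc_mg_per_ml: float, volume_ul: float,
                          target_mg_per_ml: float = 5.0) -> float:
    """Volume (ul) of enzyme stock solution to add so the concentrate reaches the target.

    Uses C1 * V1 = C2 * V2, so V2 = C1 * V1 / C2 and the volume to add is V2 - V1.
    """
    if conc_mg_per_ml < target_mg_per_ml:
        raise ValueError("concentrate is already below the target concentration")
    final_volume_ul = conc_mg_per_ml * volume_ul / target_mg_per_ml
    return final_volume_ul - volume_ul


if __name__ == "__main__":
    # Hypothetical measured values, for illustration only.
    print(enterokinase_units(6000.0))          # 6 ml at an assumed 1 mg/ml -> 240.0
    print(stock_solution_to_add(20.0, 200.0))  # 200 ul at an assumed 20 mg/ml -> 600.0
```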
The protein in the NCN protein solution was subjected to N-terminal sequencing. Its first 15 amino acid residues matched positions 1-15 of SEQ ID NO. 3, confirming that it is the NCN protein.
The NCN protein used in all subsequent examples was taken from this NCN protein solution.
The enzyme stock solution (pH 7.4) contained 10 mM Tris, 300 mM NaCl, 0.1 mM EDTA, 1 mM DTT and 50% (v/v) glycerol in ddH2O.
Example 3 Performance of NCN protein
The two gRNA targets selected in the TTN gene were as follows:
TTN-gRNA1:AGAGCACAGTCAGCCTGGCG;
TTN-gRNA2:CTTCCAGAATTGGATCTCCG.
The primers used to amplify the TTN fragment containing both gRNA target sites were as follows:
TTN-F55:TACGGAATTGGGGAGCCAGCGGA;
TTN-R560:CAAAGTTAACTCTCTGTGTCT.
1. Preparation of gRNA
1. Preparation of TTN-T7-gRNA1 transcription template and TTN-T7-gRNA2 transcription template
The TTN-T7-gRNA1 transcription template is a double-stranded DNA molecule, and is shown as SEQ ID NO. 4.
The TTN-T7-gRNA2 transcription template is a double-stranded DNA molecule, and is shown as SEQ ID NO. 5.
2. In vitro transcription to obtain gRNA
The TTN-T7-gRNA1 transcription template was transcribed in vitro with the TranscriptAid T7 High Yield Transcription Kit (Fermentas, K0441). The product was recovered and purified with the MEGAclear™ Transcription Clean-Up Kit (Thermo, AM1908) to give TTN-gRNA1, a single-stranded RNA shown in SEQ ID NO. 6.
The TTN-T7-gRNA2 transcription template was transcribed and purified in the same way to give TTN-gRNA2, a single-stranded RNA shown in SEQ ID NO. 7.
2. Optimization of the dosage ratio of gRNAs to NCN protein
1. Co-transfected porcine primary fibroblasts
First group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and NCN protein at about 100,000 cells : 0.5 μg TTN-gRNA1 : 0.5 μg TTN-gRNA2 : 4 μg NCN protein.
Second group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and NCN protein at about 100,000 cells : 0.75 μg TTN-gRNA1 : 0.75 μg TTN-gRNA2 : 4 μg NCN protein.
Third group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and NCN protein at about 100,000 cells : 1 μg TTN-gRNA1 : 1 μg TTN-gRNA2 : 4 μg NCN protein.
Fourth group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and NCN protein at about 100,000 cells : 1.25 μg TTN-gRNA1 : 1.25 μg TTN-gRNA2 : 4 μg NCN protein.
Fifth group (no protein): porcine primary fibroblasts were co-transfected with TTN-gRNA1 and TTN-gRNA2 only, at about 100,000 cells : 1 μg TTN-gRNA1 : 1 μg TTN-gRNA2.
Co-transfection was performed by electroporation using a mammalian transfection kit (Neon kit, Thermo Fisher) and the Neon™ Transfection System (1450 V, 10 ms, 3 pulses).
2. After step 1, the cells were cultured in complete medium for 12-18 hours, after which the medium was replaced with fresh complete medium. The total culture time after electroporation was 48 hours.
3. After step 2, the cells were detached with trypsin and collected, and genomic DNA was extracted. PCR was performed with the primer pair TTN-F55/TTN-R560, followed by 1% agarose gel electrophoresis.
The electrophoresis pattern is shown in FIG. 3. The 505 bp band is the wild-type band (WT). The 254 bp band is the deletion-mutant band (MT); it corresponds to the theoretical deletion of about 251 bp from the 505 bp wild-type amplicon.
Gene deletion efficiency = (MT gray value / MT band length in bp) / (WT gray value / WT band length in bp + MT gray value / MT band length in bp) × 100%. The deletion efficiencies of groups one to four were 19.9%, 39.9%, 79.9% and 44.3%, respectively. No deletion was detected in group five.
Gene editing efficiency was highest when the mass ratio of TTN-gRNA1 : TTN-gRNA2 : NCN protein (NLS-spCas9-NLS) was 1:1:4, i.e. 1 μg : 1 μg : 4 μg. The optimal dose was therefore set at 1 μg of each gRNA and 4 μg of NCN protein per about 100,000 cells.
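The formula divides each band's gray value by its length. On the usual assumption that stain intensity scales with DNA mass, this converts band intensity into a quantity proportional to the number of molecules. The result is therefore an estimate of the molar fraction of deletion amplicons.

The Python sketch below implements the formula as written. The gray values in the example are hypothetical; in practice they would come from densitometry of the gel image.

```python
def deletion_efficiency(wt_gray: float, wt_bp: int, mt_gray: float, mt_bp: int) -> float:
    """Percentage of deletion (MT) amplicons, normalising each band's gray value by its length."""
    wt_molar = wt_gray / wt_bp  # relative number of wild-type molecules
    mt_molar = mt_gray / mt_bp  # relative number of deletion molecules
    total = wt_molar + mt_molar
    if total == 0:
        raise ValueError("no band signal")
    return 100.0 * mt_molar / total


if __name__ == "__main__":
    # Hypothetical gray values for the 505 bp (WT) and 254 bp (MT) bands.
    print(round(deletion_efficiency(1000.0, 505, 500.0, 254), 1))
```

Note that equal gray values do not mean 50% efficiency. The shorter 254 bp band carries about twice as many molecules per unit of signal as the 505 bp band.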
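The optimum is stated as a mass ratio. The sketch below gives a rough check of the corresponding molar ratio. It uses the common average-composition estimate for single-stranded RNA: MW ≈ 320.5 Da × length + 159 Da.

It also makes two hypothetical assumptions, since neither value is stated in this document:

- the TTN sgRNAs are 102 nt long, like SEQ ID NO. 16 and 17 (GG + 20-nt spacer + 80-nt scaffold);
- NCN protein has a molecular mass of about 160 kDa.

```python
def ssrna_mw_da(length_nt: int) -> float:
    """Approximate molecular weight (Da) of a single-stranded RNA of average composition."""
    return length_nt * 320.5 + 159.0


def picomoles(mass_ug: float, mw_da: float) -> float:
    """Convert micrograms to picomoles: pmol = ug * 1e6 / (g/mol)."""
    return mass_ug * 1e6 / mw_da


if __name__ == "__main__":
    SGRNA_NT = 102       # assumed sgRNA length (hypothetical for the TTN guides)
    NCN_DA = 160_000.0   # assumed molecular mass of NCN protein (hypothetical)
    guide_pmol = picomoles(1.0, ssrna_mw_da(SGRNA_NT))  # 1 ug of each gRNA
    protein_pmol = picomoles(4.0, NCN_DA)               # 4 ug of NCN protein
    print(f"each gRNA: {guide_pmol:.1f} pmol, NCN: {protein_pmol:.1f} pmol, "
          f"ratio {guide_pmol / protein_pmol:.2f}")
```

Under these assumptions, each guide comes to about 30 pmol against 25 pmol of protein. The 1:1:4 mass ratio is thus a slight (about 1.2-fold) molar excess of each gRNA over the protein.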
3. Comparison of Gene editing efficiency of NCN protein and commercial Cas9 protein
1. Co-transfected porcine primary fibroblasts
Cas9-A group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and commercial Cas9-A protein at about 100,000 cells : 1 μg TTN-gRNA1 : 1 μg TTN-gRNA2 : 4 μg Cas9-A protein.
pKG-GE4 group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and NCN protein at about 100,000 cells : 1 μg TTN-gRNA1 : 1 μg TTN-gRNA2 : 4 μg NCN protein.
Cas9-B group: porcine primary fibroblasts were co-transfected with TTN-gRNA1, TTN-gRNA2 and commercial Cas9-B protein at about 100,000 cells : 1 μg TTN-gRNA1 : 1 μg TTN-gRNA2 : 4 μg Cas9-B protein.
Control group: porcine primary fibroblasts were co-transfected with TTN-gRNA1 and TTN-gRNA2 only, at about 100,000 cells : 1 μg TTN-gRNA1 : 1 μg TTN-gRNA2.
Co-transfection was performed by electroporation using a mammalian transfection kit (Neon kit, Thermo Fisher) and the Neon™ Transfection System (1450 V, 10 ms, 3 pulses).
2. After step 1, the cells were cultured in complete medium for 12-18 hours, after which the medium was replaced with fresh complete medium. The total culture time after electroporation was 48 hours.
3. After step 2, the cells were detached with trypsin and collected, and genomic DNA was extracted. PCR was performed with the primer pair TTN-F55/TTN-R560, followed by 1% agarose gel electrophoresis.
The electrophoresis pattern is shown in FIG. 4. The deletion efficiencies were 28.5% for commercial Cas9-A protein, 85.6% for NCN protein and 16.6% for commercial Cas9-B protein.
These results show that the NCN protein prepared by the method of the invention gives markedly higher gene editing efficiency than either commercial Cas9 protein.
Example 4 Screening of efficient gRNA targets in the HNF1A gene
The porcine HNF1A gene (Sus scrofa, GeneID 574067) encodes hepatocyte nuclear factor 1-alpha and is located on chromosome 14. The amino acid sequence of the encoded protein is shown in SEQ ID NO. 8. In the genome, the porcine HNF1A gene comprises 10 exons. A partial sequence of the gene, covering exon 4 and 400 bp upstream and downstream of it, is shown in SEQ ID NO. 9.
1. Conservation analysis of the preset HNF1A point-mutation site and its flanking genomic sequence
Eighteen Jiangxiang pigs were used: 10 females (numbered 1 to 10) and 8 males (numbered A to H).
Genomic DNA extracted from the ear tissue of pig No. 1 was used as the template for PCR with different primer pairs, followed by 1% agarose gel electrophoresis. The electrophoresis pattern is shown in FIG. 5. In FIG. 5, group 1 used the primer pair HNF1A-E4-F174/HNF1A-E4-R724, and group 2 used HNF1A-E4-F228/HNF1A-E4-R716. The primer pair HNF1A-E4-F174/HNF1A-E4-R724 amplified the target fragment better.
PCR was then performed on the genomic DNA of each of the 18 pigs with the primer pair HNF1A-E4-F174/HNF1A-E4-R724, followed by 1% agarose gel electrophoresis. The electrophoresis pattern is shown in FIG. 6. The PCR products were recovered and sequenced, and the sequences were aligned with the HNF1A gene sequence in public databases. Regions conserved in all 18 pigs were selected for gRNA target design.
HNF1A-E4-F174:AGAGAGGCTAAGTCACTTGCTCA;
HNF1A-E4-R724:AGAGCTGATGATCAATGGAGTGG;
HNF1A-E4-F228:GTCTGCCAACCTCAAACACTCAG;
HNF1A-E4-R716:TGATCAATGGAGTGGAGAAAGCC.
2. Target screening
Candidate targets with an NGG PAM were first identified in the conserved region, avoiding possible mutation sites. Four targets were then selected from these candidates in preliminary experiments.
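Selecting the region shared by all 18 pigs amounts to finding runs of alignment columns in which every sequence carries the same base. The Python sketch below does this, assuming the 18 sequencing reads have already been trimmed and aligned to equal length, with gaps written as "-". The function name and the minimum-length parameter are illustrative choices, not part of the described method.

```python
def conserved_regions(aligned, min_len=20):
    """Return half-open (start, end) column ranges where all aligned sequences are identical.

    Columns containing a gap ('-') or an ambiguous base ('N') are treated as not conserved.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    regions = []
    start = None
    for i in range(length):
        column = {s[i].upper() for s in aligned}
        conserved = len(column) == 1 and not column & {"-", "N"}
        if conserved and start is None:
            start = i  # a conserved run begins
        elif not conserved and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and length - start >= min_len:
        regions.append((start, length))  # run reaching the end of the alignment
    return regions
```

Example: with three short reads that differ at one column, only the stretch before that column is reported.

```python
print(conserved_regions(["ACGTACGTAA", "ACGTACGTTA", "ACGTACGTAA"], min_len=4))  # [(0, 8)]
```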
The 4 targets were as follows:
HNF1A-E4-gU1:AGAAGCATTTCGGCACAAGT;
HNF1A-E4-gU2:ATTTCGGCACAAGTTGGCCA;
HNF1A-E4-gD1:GGGCAGACCAGGAGAGCTGT;
HNF1A-E4-gD2:GGGGCAGACCAGGAGAGCTG.
3. Preparation of recombinant plasmids
Plasmid pKG-U6gRNA was digested with the restriction enzyme BbsI, and the linearized vector backbone (about 3 kb) was recovered.
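The NGG screen for the targets listed above can be sketched as a scan of both strands for 20-nt protospacers followed by an NGG PAM. Candidates that overlap positions to be avoided are dropped.

The example input is a 26-nt stretch reconstructed by overlapping the HNF1A-E4-gU1 and HNF1A-E4-gU2 sequences. This reconstruction is an assumption; the full SEQ ID NO. 9 is not reproduced here. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, spacer_len: int = 20, avoid=frozenset()):
    """List (strand, start, spacer, pam) for every protospacer followed by an NGG PAM.

    `start` is 0-based on the strand being read (5'->3'). `avoid` holds 0-based
    forward-strand positions (for example known variant sites) that the
    protospacer plus PAM must not overlap.
    """
    seq = seq.upper()
    n = len(seq)
    site_len = spacer_len + 3
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - site_len + 1):
            pam = s[i + spacer_len:i + site_len]
            if pam[1:] != "GG":
                continue
            # Map the protospacer + PAM back to forward-strand coordinates.
            if strand == "+":
                covered = range(i, i + site_len)
            else:
                covered = range(n - i - site_len, n - i)
            if any(pos in avoid for pos in covered):
                continue
            hits.append((strand, i, s[i:i + spacer_len], pam))
    return hits


if __name__ == "__main__":
    fragment = "AGAAGCATTTCGGCACAAGTTGGCCA"  # gU1/gU2 overlap (reconstructed)
    print(find_spcas9_targets(fragment))
```

In this reconstructed fragment, the scan recovers HNF1A-E4-gU1 on the forward strand with TGG immediately downstream. A real screen would run over the whole conserved region and then rank candidates experimentally, as described above.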
HNF1A-E4-gU1-S and HNF1A-E4-gU1-A were synthesized separately, mixed and annealed to give a double-stranded DNA molecule with cohesive ends. This molecule was ligated to the vector backbone to give plasmid pKG-U6gRNA(HNF1A-E4-gU1). Plasmid pKG-U6gRNA(HNF1A-E4-gU1) expresses sgRNA HNF1A-E4-gU1, shown in SEQ ID NO: 10:
AGAAGCAUUUCGGCACAAGUguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuu.
HNF1A-E4-gU2-S and HNF1A-E4-gU2-A were synthesized separately, mixed and annealed to give a double-stranded DNA molecule with cohesive ends. This molecule was ligated to the vector backbone to give plasmid pKG-U6gRNA(HNF1A-E4-gU2). Plasmid pKG-U6gRNA(HNF1A-E4-gU2) expresses sgRNA HNF1A-E4-gU2, shown in SEQ ID NO: 11:
AUUUCGGCACAAGUUGGCCAguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuu.
HNF1A-E4-gD1-S and HNF1A-E4-gD1-A were synthesized separately, mixed and annealed to give a double-stranded DNA molecule with cohesive ends. This molecule was ligated to the vector backbone to give plasmid pKG-U6gRNA(HNF1A-E4-gD1). Plasmid pKG-U6gRNA(HNF1A-E4-gD1) expresses sgRNA HNF1A-E4-gD1, shown in SEQ ID NO: 12:
GGGCAGACCAGGAGAGCUGUguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuu.
HNF1A-E4-gD2-S and HNF1A-E4-gD2-A were synthesized separately, mixed and annealed to give a double-stranded DNA molecule with cohesive ends. This molecule was ligated to the vector backbone to give plasmid pKG-U6gRNA(HNF1A-E4-gD2). Plasmid pKG-U6gRNA(HNF1A-E4-gD2) expresses sgRNA HNF1A-E4-gD2, shown in SEQ ID NO: 13:
GGGGCAGACCAGGAGAGCUGguuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuu.
HNF1A-E4-gU1-S:caccgAGAAGCATTTCGGCACAAGT;
HNF1A-E4-gU1-A:aaacACTTGTGCCGAAATGCTTCTc;
HNF1A-E4-gU2-S:caccgATTTCGGCACAAGTTGGCCA;
HNF1A-E4-gU2-A:aaacTGGCCAACTTGTGCCGAAATc;
HNF1A-E4-gD1-S:caccGGGCAGACCAGGAGAGCTGT;
HNF1A-E4-gD1-A:aaacACAGCTCTCCTGGTCTGCCC;
HNF1A-E4-gD2-S:caccGGGGCAGACCAGGAGAGCTG;
HNF1A-E4-gD2-A:aaacCAGCTCTCCTGGTCTGCCCC.
HNF1A-E4-gU1-S, HNF1A-E4-gU1-A, HNF1A-E4-gU2-S, HNF1A-E4-gU2-A, HNF1A-E4-gD1-S, HNF1A-E4-gD1-A, HNF1A-E4-gD2-S and HNF1A-E4-gD2-A are all single-stranded DNA molecules.
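The oligonucleotide pairs above follow a regular pattern:

- the sense strand is "cacc" + spacer;
- the antisense strand is "aaac" + the reverse complement of the spacer;
- a g/c pair is added when the spacer does not begin with G, because the U6 promoter starts transcription at a G.

The Python sketch below regenerates the listed oligos from the spacer sequences. It assumes the CACC/AAAC overhangs match the BbsI-cut ends of pKG-U6gRNA. That is inferred from the oligos rather than stated in the text, and the function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bbsi_oligos(spacer: str) -> tuple:
    """Return (sense, antisense) oligos for annealing into a BbsI-cut U6 vector.

    Overhangs and the optional extra g/c pair are lowercase; the spacer is uppercase.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt DNA spacer")
    extra_g = spacer[0] != "G"  # U6 transcription starts at G; add one if missing
    sense = "cacc" + ("g" if extra_g else "") + spacer
    antisense = "aaac" + revcomp(spacer) + ("c" if extra_g else "")
    return sense, antisense


if __name__ == "__main__":
    for name, spacer in [("HNF1A-E4-gU1", "AGAAGCATTTCGGCACAAGT"),
                         ("HNF1A-E4-gD1", "GGGCAGACCAGGAGAGCTGT")]:
        s, a = bbsi_oligos(spacer)
        print(f"{name}-S:{s}")
        print(f"{name}-A:{a}")
```

Running it reproduces HNF1A-E4-gU1-S/A and HNF1A-E4-gD1-S/A exactly as listed above.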
4. Editing efficiency comparison of different targets
1. Co-transfection
First group: plasmids pKG-U6gRNA(HNF1A-E4-gU1) and pKG-GE3 were co-transfected into porcine primary fibroblasts at about 200,000 cells : 0.92 μg pKG-U6gRNA(HNF1A-E4-gU1) : 1.08 μg pKG-GE3.
Second group: plasmids pKG-U6gRNA(HNF1A-E4-gU2) and pKG-GE3 were co-transfected into porcine primary fibroblasts at about 200,000 cells : 0.92 μg pKG-U6gRNA(HNF1A-E4-gU2) : 1.08 μg pKG-GE3.
Third group: plasmids pKG-U6gRNA(HNF1A-E4-gD1) and pKG-GE3 were co-transfected into porcine primary fibroblasts at about 200,000 cells : 0.92 μg pKG-U6gRNA(HNF1A-E4-gD1) : 1.08 μg pKG-GE3.
Fourth group: plasmids pKG-U6gRNA(HNF1A-E4-gD2) and pKG-GE3 were co-transfected into porcine primary fibroblasts at about 200,000 cells : 0.92 μg pKG-U6gRNA(HNF1A-E4-gD2) : 1.08 μg pKG-GE3.
Fifth group (mock control): porcine primary fibroblasts were electroporated with the same parameters but without plasmid.
Co-transfection was performed by electroporation using a mammalian transfection kit (Neon kit, Thermo Fisher) and the Neon™ Transfection System (1450 V, 10 ms, 3 pulses).
2. After step 1, the cells were cultured in complete medium for 12-18 hours, after which the medium was replaced with fresh complete medium. The total culture time after electroporation was 48 hours.
3. After step 2, the cells were detached with trypsin, collected and lysed, and genomic DNA was extracted. PCR was performed with the primer pair HNF1A-E4-F174/HNF1A-E4-R724, followed by 1% agarose gel electrophoresis to detect mutations at the target gene. The electrophoresis pattern is shown in FIG. 7.
The target band was excised, recovered and sent to a sequencing company. The sequencing chromatograms were analysed with the web-based Synthego ICE tool to obtain the editing efficiency of each target. Editing efficiencies for groups one to four were 21%, 67%, 73% and 40%, respectively, and no editing occurred in group five. HNF1A-E4-gU2 and HNF1A-E4-gD1 therefore showed the highest editing efficiencies.
Example 5 Preparation of single-cell clones with precise site-directed modification of the HNF1A gene for somatic cell cloning
The two efficient gRNA targets screened in Example 4, HNF1A-E4-gU2 and HNF1A-E4-gD1, were used.
1. Preparation of gRNA
1. Preparation of HNF1A-T7-gU2 transcription template and HNF1A-T7-gD1 transcription template
The HNF1A-T7-gU2 transcription template is a double-stranded DNA molecule, as shown in SEQ ID NO. 14.
The HNF1A-T7-gD1 transcription template is a double-stranded DNA molecule, as shown in SEQ ID NO. 15.
2. In vitro transcription to obtain gRNA
The HNF1A-T7-gU2 transcription template was transcribed in vitro with the TranscriptAid T7 High Yield Transcription Kit (Fermentas, K0441). The product was recovered and purified with the MEGAclear™ Transcription Clean-Up Kit (Thermo, AM1908) to give HNF1A-gU2, a single-stranded RNA shown in SEQ ID NO. 16.
The HNF1A-T7-gD1 transcription template was transcribed and purified in the same way to give HNF1A-gD1, a single-stranded RNA shown in SEQ ID NO. 17.
HNF1A-gU2(SEQ ID NO.16):
GGAUUUCGGCACAAGUUGGCCAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU.
HNF1A-gD1(SEQ ID NO.17):
GGGGGCAGACCAGGAGAGCUGUGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU.
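SEQ ID NO. 16 and 17 share one layout:

1. a 5' GG, added because the T7 promoter initiates most efficiently with G at the first transcribed positions;
2. the 20-nt spacer of the corresponding target, written as RNA;
3. the same 80-nt scaffold.

This is why nucleotides 3-22 form the target-binding region. The Python sketch below assembles that layout from a spacer. The function name is illustrative, and the transcription templates (SEQ ID NO. 14 and 15) are not reproduced here.

```python
# 80-nt sgRNA scaffold shared by SEQ ID NO. 16 and 17 (ends in the UUUU terminator)
SCAFFOLD = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU"


def ivt_sgrna(spacer_dna: str, leader: str = "GG") -> str:
    """Assemble an in vitro transcribed sgRNA: leader + spacer (as RNA) + scaffold."""
    spacer_rna = spacer_dna.upper().replace("T", "U")
    if len(spacer_rna) != 20 or set(spacer_rna) - set("ACGU"):
        raise ValueError("expected a 20-nt spacer")
    return leader + spacer_rna + SCAFFOLD


if __name__ == "__main__":
    gu2 = ivt_sgrna("ATTTCGGCACAAGTTGGCCA")  # HNF1A-E4-gU2 spacer
    print(len(gu2), gu2[2:22])               # nucleotides 3-22 = spacer
```

Assembling the HNF1A-E4-gU2 and HNF1A-E4-gD1 spacers this way reproduces SEQ ID NO. 16 and 17, each 102 nt long.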
2. Synthesis of a single-stranded donor DNA carrying a single-base insertion at the HNF1A target site
A single-stranded DNA carrying a single-base insertion at the HNF1A target site was synthesized as the donor DNA. Besides the intended site-directed modification, it carries synonymous mutations in the PAM sequences of the HNF1A-E4-gU2 and HNF1A-E4-gD1 targets, which prevents the edited allele from being cut again. The single-stranded donor DNA was designated HNF1A-mutant-ss163.
HNF1A-mutant-ss163 is shown in SEQ ID NO. 18.
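The PAM changes in HNF1A-mutant-ss163 must be synonymous and must also break the NGG motif. Each candidate base change can therefore be checked two ways:

- translate the affected codon before and after the change, and compare;
- test whether the PAM still matches NGG.

The Python sketch below uses the standard genetic code. The codons and PAMs in the tests are hypothetical, because the reading frame around the two PAMs is not given in this section.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if two codons encode the same amino acid (or are both stop codons)."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]


def pam_disrupted(ref_pam: str, alt_pam: str) -> bool:
    """True if a PAM that matched NGG no longer matches NGG after the change."""
    return ref_pam.upper()[1:] == "GG" and alt_pam.upper()[1:] != "GG"
```

Both checks are needed. For example, AGG to CGG keeps arginine but leaves an NGG PAM intact, so that change is synonymous yet does not block recutting.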
3. Transfection of porcine primary fibroblasts
1. HNF1A-gU2, HNF1A-gD1, HNF1A-mutant-ss163 and NCN protein were co-transfected into porcine primary fibroblasts at about 100,000 cells : 1 μg HNF1A-gU2 : 1 μg HNF1A-gD1 : 2 μg HNF1A-mutant-ss163 : 4 μg NCN protein. Co-transfection was performed by electroporation using a mammalian transfection kit (Neon kit, Thermo Fisher) and the Neon™ Transfection System (1450 V, 10 ms, 3 pulses).
2. After step 1, the cells were cultured in complete medium for 16-18 hours, after which the medium was replaced with fresh complete medium. The total culture time after electroporation was 48 hours.
3. After step 2, the cells were detached with trypsin, collected, washed with complete medium and resuspended in complete medium. Single cells were then picked individually and transferred to 96-well plates (one cell per well, 100 μl of complete medium per well). They were cultured for 2 weeks, with the medium replaced by fresh complete medium every 2-3 days.
4. After step 3, the cells in each well were detached with trypsin and collected. About two-thirds of the cells from each well were seeded into a 6-well plate containing complete medium, and the remaining one-third was collected in a 1.5 ml centrifuge tube.
5. The cells in the 6-well plates from step 4 were cultured to 80% confluence, then detached with trypsin and collected. They were cryopreserved in freezing medium (90% complete medium + 10% DMSO, v/v).
6. The cells in the centrifuge tubes from step 4 were lysed and genomic DNA was extracted. PCR was performed with the primer pair HNF1A-E4-F174/HNF1A-E4-R724, followed by electrophoresis. Porcine primary fibroblasts served as the wild-type control (WT). The electrophoresis pattern is shown in FIG. 8; lane numbers in FIG. 8 correspond to the clone numbers in Table 1.
7. After step 6, the PCR products were recovered and sequenced.
The porcine primary fibroblasts gave a single sequence, and their genotype was homozygous wild type. Genotypes of the single-cell clones were assigned relative to this wild-type sequence. Here a mutation means a deletion, insertion or substitution of one or more nucleotides.
- Two distinct sequences, one wild type and one mutated: heterozygous.
- Two distinct sequences, both mutated: biallelic with different mutations.
- A single sequence that is mutated: biallelic with identical mutations.
- A single sequence identical to the wild type: homozygous wild type.
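The genotyping rules above form a small decision procedure, encoded in the Python sketch below. It assumes each clone's allele sequences have already been resolved into individual sequences; how the two alleles were separated is not described here. The function name and labels are illustrative paraphrases of the four genotype classes.

```python
def classify_genotype(alleles, wild_type: str) -> str:
    """Assign one of the four genotype classes from a clone's allele sequences."""
    distinct = set(alleles)
    if not 1 <= len(distinct) <= 2:
        raise ValueError("a diploid clone should show one or two distinct sequences")
    mutated = [seq for seq in distinct if seq != wild_type]
    if len(distinct) == 1:
        # One sequence type: either both alleles wild type or both carry the same mutation.
        return "biallelic, identical mutations" if mutated else "homozygous wild type"
    if len(mutated) == 1:
        return "heterozygous"  # one wild-type and one mutated allele
    return "biallelic, different mutations"
```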
The results are shown in Table 1.
- Homozygous wild type: clones 6, 14, 18, 22, 36, 37, 44 and 51.
- Heterozygous: clones 1, 3, 12, 17, 19, 23, 25, 26, 28, 29, 33, 35, 39, 42, 45, 47, 49, 50, 52, 54 and 55.
- Biallelic with different mutations: clones 2, 8, 13, 15, 24, 27, 30, 38, 43 and 53.
- Biallelic with identical mutations: clones 4, 5, 7, 9, 10, 11, 16, 20, 21, 31, 32, 34, 40, 41, 46 and 48.

Among these, clones 8, 13, 17, 24, 27, 43, 50 and 53 were heterozygous for the site-directed modification: the single-stranded donor DNA replaced the target on one of the two homologous chromosomes. Clones 4, 11 and 32 carried the same site-directed modification on both alleles, with the donor replacing the target on both homologous chromosomes. Of the 55 clones, 85.5% carried an edit in the HNF1A gene. The 11 clones listed above (20%) carried the site-directed modification of the target site.
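The two percentages follow directly from the clone lists: 47 of the 55 clones are not homozygous wild type, and 11 carry the site-directed modification. The Python sketch below recomputes them from the clone numbers given in the paragraph above; the variable and function names are illustrative.

```python
WILD_TYPE = [6, 14, 18, 22, 36, 37, 44, 51]
HETEROZYGOUS = [1, 3, 12, 17, 19, 23, 25, 26, 28, 29, 33, 35, 39, 42, 45, 47, 49, 50, 52, 54, 55]
BIALLELIC_DIFFERENT = [2, 8, 13, 15, 24, 27, 30, 38, 43, 53]
BIALLELIC_SAME = [4, 5, 7, 9, 10, 11, 16, 20, 21, 31, 32, 34, 40, 41, 46, 48]
SITE_DIRECTED = [4, 11, 32, 8, 13, 17, 24, 27, 43, 50, 53]


def summarize():
    """Return (total clones, % edited, % with site-directed modification)."""
    groups = [WILD_TYPE, HETEROZYGOUS, BIALLELIC_DIFFERENT, BIALLELIC_SAME]
    all_clones = [c for group in groups for c in group]
    if len(all_clones) != len(set(all_clones)):
        raise ValueError("a clone appears in more than one genotype class")
    total = len(all_clones)
    edited = total - len(WILD_TYPE)
    return (total,
            round(100 * edited / total, 1),
            round(100 * len(SITE_DIRECTED) / total, 1))


if __name__ == "__main__":
    print(summarize())  # (55, 85.5, 20.0)
```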
Representative sequencing alignments are shown in FIGS. 9-14. Each figure aligns the forward and reverse sequencing reads of one single-cell clone with the sequence expected after site-directed modification of the target site.
- FIG. 9: clone No. 6, homozygous wild type.
- FIG. 10: clone No. 1, heterozygous.
- FIG. 11: clone No. 15, biallelic with different mutations.
- FIG. 12: clone No. 7, biallelic with identical mutations.
- FIG. 13: clone No. 17, heterozygous for the site-directed modification.
- FIG. 14: clone No. 4, carrying the same site-directed modification of the target site on both alleles.
Table 1 Genotyping results of single-cell clones for the HNF1A gene
Note: site-directed modification of the target site means that the single-stranded donor DNA has replaced the target. In other words, the DNA molecule shown in SEQ ID NO. 19 in the chromosomal DNA has been replaced by the DNA molecule shown in SEQ ID NO. 18.
Recombinant cells carrying the site-directed modification of the target site, whether heterozygous or homozygous, can be used to produce cloned pigs. Using these cells as nuclear transfer donor cells for somatic cell cloning yields cloned pigs that serve as MODY3 diabetes model pigs.
The present application has been described in detail above. It will be apparent to those skilled in the art that the application can be practiced within a wide range of equivalent parameters, concentrations and conditions without departing from its spirit and scope and without undue experimentation. While the application has been described with respect to specific embodiments, it may be further modified. In general, this application is intended to cover any variations, uses or adaptations that follow its general principles, including departures from the present disclosure that come within known or customary practice in the art to which it pertains. Some of the essential features may be applied within the scope of the appended claims.
Sequence listing
<110> Nanjing Kidney Gene Engineering Co., Ltd.
<120> Gene editing system for constructing diabetes model pig nuclear transfer donor cells with HNF1A gene mutation and application thereof
<130> GNCYX211868
<160> 19
<170> SIPOSequenceListing 1.0
<210> 1
<211> 9974
<212> DNA
<213> Artificial Sequence
<400> 1
tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60
cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120
ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 180
gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 240
acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 300
ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 360
ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 420
acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaatttcag gtggcacttt 480
tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt caaatatgta 540
tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa ggaagagtat 600
gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt gccttcctgt 660
ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt tgggtgcacg 720
agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt ttcgccccga 780
agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg tattatcccg 840
tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga atgacttggt 900
tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa gagaattatg 960
cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga caacgatcgg 1020
aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa ctcgccttga 1080
tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca ccacgatgcc 1140
tgcagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta ctctagcttc 1200
ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac ttctgcgctc 1260
ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc gtgggtctcg 1320
cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag ttatctacac 1380
gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga taggtgcctc 1440
actgattaag cattggtaac tgtcagacca agtttactca tatatacttt agattgattt 1500
aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata atctcatgac 1560
caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag aaaagatcaa 1620
aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa caaaaaaacc 1680
accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt ttccgaaggt 1740
aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc cgtagttagg 1800
ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa tcctgttacc 1860
agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa gacgatagtt 1920
accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc ccagcttgga 1980
gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa gcgccacgct 2040
tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa caggagagcg 2100
cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg ggtttcgcca 2160
cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc tatggaaaaa 2220
cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg ctcacatgtt 2280
ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg agtgagctga 2340
taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg aagcggaaga 2400
gcgcctgatg cggtattttc tccttacgca tctgtgcggt atttcacacc gcatatatgg 2460
tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatac actccgctat 2520
cgctacgtga ctgggtcatg gctgcgcccc gacacccgcc aacacccgct gacgcgccct 2580
gacgggcttg tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct 2640
gcatgtgtca gaggttttca ccgtcatcac cgaaacgcgc gaggcagctg cggtaaagct 2700
catcagcgtg gtcgtgaagc gattcacaga tgtctgcctg ttcatccgcg tccagctcgt 2760
tgagtttctc cagaagcgtt aatgtctggc ttctgataaa gcgggccatg ttaagggcgg 2820
ttttttcctg tttggtcact gatgcctccg tgtaaggggg atttctgttc atgggggtaa 2880
tgataccgat gaaacgagag aggatgctca cgatacgggt tactgatgat gaacatgccc 2940
ggttactgga acgttgtgag ggtaaacaac tggcggtatg gatgcggcgg gaccagagaa 3000
aaatcactca gggtcaatgc cagcgcttcg ttaatacaga tgtaggtgtt ccacagggta 3060
gccagcagca tcctgcgatg cagatccgga acataatggt gcagggcgct gacttccgcg 3120
tttccagact ttacgaaaca cggaaaccga agaccattca tgttgttgct caggtcgcag 3180
acgttttgca gcagcagtcg cttcacgttc gctcgcgtat cggtgattca ttctgctaac 3240
cagtaaggca accccgccag cctagccggg tcctcaacga caggagcacg atcatgcgca 3300
cccgtggggc cgccatgccg gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 3360
gaccagtgac gaaggcttga gcgagggcgt gcaagattcc gaataccgca agcgacaggc 3420
cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 3480
gcacctgtcc tacgagttgc atgataaaga agacagtcat aagtgcggcg acgatagtca 3540
tgccccgcgc ccaccggaag gagctgactg ggttgaaggc tctcaagggc atcggtcgag 3600
atcccggtgc ctaatgagtg agctaactta cattaattgc gttgcgctca ctgcccgctt 3660
tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 3720
gcggtttgcg tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc 3780
tgattgccct tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc 3840
cccagcaggc gaaaatcctg tttgatggtg gttaacggcg ggatataaca tgagctgtct 3900
tcggtatcgt cgtatcccac taccgagatg tccgcaccaa cgcgcagccc ggactcggta 3960
atggcgcgca ttgcgcccag cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 4020
atgccctcat tcagcatttg catggtttgt tgaaaaccgg acatggcact ccagtcgcct 4080
tcccgttccg ctatcggctg aatttgattg cgagtgagat atttatgcca gccagccaga 4140
cgcagacgcg ccgagacaga acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 4200
aatgcgacca gatgctccac gcccagtcgc gtaccgtctt catgggagaa aataatactg 4260
ttgatgggtg tctggtcaga gacatcaaga aataacgccg gaacattagt gcaggcagct 4320
tccacagcaa tggcatcctg gtcatccagc ggatagttaa tgatcagccc actgacgcgt 4380
tgcgcgagaa gattgtgcac cgccgcttta caggcttcga cgccgcttcg ttctaccatc 4440
gacaccacca cgctggcacc cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 4500
gacggcgcgt gcagggccag actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 4560
gccagttgtt gtgccacgcg gttgggaatg taattcagct ccgccatcgc cgcttccact 4620
ttttcccgcg ttttcgcaga aacgtggctg gcctggttca ccacgcggga aacggtctga 4680
taagagacac cggcatactc tgcgacatcg tataacgtta ctggtttcac attcaccacc 4740
ctgaattgac tctcttccgg gcgctatcat gccataccgc gaaaggtttt gcgccattcg 4800
atggtgtccg ggatctcgac gctctccctt atgcgactcc tgcattagga agcagcccag 4860
tagtaggttg aggccgttga gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc 4920
gcccaacagt cccccggcca cggggcctgc caccataccc acgccgaaac aagcgctcat 4980
gagcccgaag tggcgagccc gatcttcccc atcggtgatg tcggcgatat aggcgccagc 5040
aaccgcacct gtggcgccgg tgatgccggc cacgatgcgt ccggcgtaga ggatcgagat 5100
cgatctcgat cccgcgaaat taatacgact cactataggg gaattgtgag cggataacaa 5160
ttcccctcta gaaataattt tgtttaactt taagaaggag atatacatat gaaacaaagc 5220
actattgcac tggcactctt accgttactg tttacccctg tgacaaaagc catgagcgat 5280
aaaattattc acctgactga cgacagtttt gacacggatg tactcaaagc ggacggggcg 5340
atcctcgtcg atttctgggc agagtggtgc ggtccgtgca aaatgatcgc cccgattctg 5400
gatgaaatcg ctgacgaata tcagggcaaa ctgaccgttg caaaactgaa catcgatcaa 5460
aaccctggca ctgcgccgaa atatggcatc cgtggtatcc cgactctgct gctgttcaaa 5520
aacggtgaag tggcggcaac caaagtgggt gcactgtcta aaggtcagtt gaaagagttc 5580
ctcgacgcta acctggccgg ttctggttct ggccatatgc accatcatca tcatcatgac 5640
gatgacgata agatgcccaa aaagaaacga aaggtgggta tccacggagt cccagcagcc 5700
gacaaaaaat atagcatcgg cctggacatc ggtaccaaca gcgttggctg ggcagtgatc 5760
actgatgaat acaaagttcc atccaaaaaa tttaaagtac tgggcaacac cgaccgtcac 5820
tctatcaaaa aaaacctgat tggtgctctg ctgtttgaca gcggcgaaac tgctgaggct 5880
acccgtctga aacgtacggc tcgccgtcgc tacactcgtc gtaaaaaccg catctgttat 5940
ctgcaggaaa ttttctctaa cgaaatggca aaagttgatg atagcttctt tcatcgtctg 6000
gaagagagct tcctggtgga agaagataaa aaacacgaac gtcacccgat tttcggtaac 6060
attgtggatg aggttgccta ccacgagaaa tatccgacca tctaccatct gcgtaaaaaa 6120
ctggttgata gcactgacaa agcggatctg cgtctgatct acctggctct ggcacacatg 6180
atcaaattcc gtggtcactt cctgatcgaa ggtgatctga accctgataa ctccgacgtg 6240
gacaaactgt tcattcagct ggttcagacc tataaccagc tgttcgaaga aaacccgatc 6300
aacgcgtccg gtgtagacgc taaggcaatt ctgtctgcgc gtctgtctaa gtctcgtcgt 6360
ctggaaaacc tgattgcgca actgccaggt gaaaagaaaa acggcctgtt cggcaatctg 6420
atcgccctgt ccctgggtct gactccgaac tttaaatcca actttgacct ggcggaagat 6480
gccaagctgc agctgagcaa agatacctat gacgatgacc tggataacct gctggcacag 6540
atcggtgatc agtatgccga tctgttcctg gccgcgaaaa acctgtctga tgcgattctg 6600
ctgtctgata tcctgcgcgt taacactgaa attactaaag cgccgctgag cgcatccatg 6660
attaaacgtt acgatgaaca ccaccaggat ctgaccctgc tgaaagcgct ggtgcgtcag 6720
cagctgccgg aaaaatacaa ggagatcttc ttcgaccaga gcaaaaacgg ttacgcgggc 6780
tacattgatg gtggtgcatc tcaggaggaa ttctacaaat tcattaaacc gatcctggaa 6840
aaaatggatg gtactgaaga gctgctggtt aaactgaatc gtgaagatct gctgcgcaaa 6900
cagcgtacct tcgataacgg ttccatcccg catcagattc atctgggcga actgcacgct 6960
atcctgcgcc gtcaggaaga cttttatccg ttcctgaaag acaaccgtga gaaaattgaa 7020
aaaatcctga ccttccgtat tccgtactat gtaggtccgc tggcgcgtgg taactcccgt 7080
ttcgcttgga tgacccgcaa aagcgaagaa accatcaccc cgtggaattt cgaagaagtc 7140
gttgacaaag gcgcgtccgc gcagtctttc atcgaacgca tgacgaactt cgacaaaaac 7200
ctgccgaacg agaaagtgct gccgaaacac tctctgctgt acgagtactt cactgtgtac 7260
aacgaactga ccaaagtgaa atacgtcacc gaaggtatgc gtaaaccggc attcctgtcc 7320
ggtgagcaaa aaaaagcaat cgtggatctg ctgttcaaaa ccaaccgtaa agtaaccgtg 7380
aaacagctga aggaagacta tttcaagaaa atcgaatgtt ttgattctgt tgaaatctcc 7440
ggcgtggaag atcgcttcaa tgcgtccctg ggtacgtatc acgacctgct gaaaattatc 7500
aaagacaaag attttctgga caacgaggaa aacgaagaca tcctggagga tattgtactg 7560
accctgaccc tgttcgaaga ccgtgagatg atcgaagaac gcctgaaaac ctacgcccac 7620
ctgttcgatg acaaggtaat gaagcagctg aaacgtcgtc gttataccgg ctggggtcgt 7680
ctgtcccgta aactgatcaa tggcatccgt gataaacagt ctggcaaaac catcctggac 7740
ttcctgaaat ccgacggttt cgcgaatcgt aacttcatgc aactgattca tgacgattct 7800
ctgactttca aagaagacat ccagaaagca caggtttccg gccagggtga ctctctgcac 7860
gagcacattg ccaatctggc tggttctccg gctattaaaa agggtattct gcagactgtg 7920
aaagtagttg atgagctggt caaagtaatg ggccgtcaca agccggaaaa cattgtgatc 7980
gaaatggcac gtgaaaacca gacgacccag aaaggtcaga aaaactctcg tgaacgcatg 8040
aaacgtatcg aagaaggcat caaagaactg ggctctcaga tcctgaagga acaccctgta 8100
gaaaataccc agctgcagaa cgaaaagctg tatctgtatt acctgcagaa cggccgcgat 8160
atgtatgtgg accaggaact ggatatcaac cgcctgtccg attacgatgt agatcacatc 8220
gtgccgcaaa gcttcctgaa agacgacagc attgacaaca aagtactgac ccgttctgat 8280
aagaaccgtg gcaaatccga taacgtcccg tctgaagaag ttgttaaaaa aatgaaaaac 8340
tattggcgtc agctgctgaa cgcgaaactg atcacccagc gtaagttcga caatctgact 8400
aaagctgagc gcggtggtct gtccgaactg gataaagcgg gttttatcaa acgccagctg 8460
gttgaaaccc gtcagatcac gaagcacgtt gcgcagattc tggactctcg tatgaacacc 8520
aaatacgacg aaaacgacaa actgatccgc gaggttaagg ttatcaccct gaaaagcaaa 8580
ctggtatccg attttcgtaa agactttcag ttctacaaag tgcgcgaaat taacaactat 8640
caccacgctc acgatgcata tctgaatgca gttgttggca cggcgctgat caaaaagtat 8700
ccgaaactgg aatctgaatt cgtatacggc gattacaaag tgtatgacgt tcgtaagatg 8760
atcgcaaaat ccgagcagga aattggtaag gcgacggcga aatacttctt ttattccaat 8820
attatgaact ttttcaaaac cgaaatcacc ctggcgaatg gtgaaattcg taaacgcccg 8880
ctgatcgaaa ccaacggtga aactggtgaa atcgtttggg acaaaggccg cgacttcgcg 8940
accgtgcgta aagttctgtc tatgccgcaa gtgaacatcg tcaagaagac cgaagtacaa 9000
accggcggtt ttagcaaaga gagcattctg ccaaaacgta actccgacaa actgatcgcg 9060
cgcaagaaag actgggatcc gaaaaaatac ggtggtttcg attctccaac cgttgcttat 9120
tccgttctgg tggtagccaa agttgagaaa ggtaaaagca aaaaactgaa atccgtaaag 9180
gaactgctgg gtattactat catggagcgt agctccttcg aaaaaaaccc gatcgatttt 9240
ctggaagcga aaggctataa agaagtcaaa aaggacctga tcatcaaact gccaaaatac 9300
agcctgttcg agctggaaaa cggccgtaaa cgtatgctgg catctgcggg cgaactgcag 9360
aaaggcaacg agctggctct gccgtccaaa tacgtgaact ttctgtacct ggcctctcac 9420
tacgaaaaac tgaaaggttc cccggaagac aacgaacaga aacagctgtt cgtagagcag 9480
cacaaacact acctggacga gatcatcgaa cagatttctg aattttctaa acgtgtgatt 9540
ctggctgatg cgaatctgga taaagttctg tctgcctata acaagcatcg tgacaaaccg 9600
atccgcgaac aggctgagaa catcatccac ctgttcactc tgactaacct gggcgcgcca 9660
gcggctttca agtactttga taccaccatt gaccgcaagc gttacacctc cactaaagaa 9720
gtgctggacg cgactctgat ccaccagtcc atcaccggtc tgtacgagac ccgtatcgat 9780
ctgagccagc tgggcggtga caaaaggccg gcggccacga aaaaggccgg ccaggcaaaa 9840
aagaaaaagt gacaaagccc gaaaggaagc tgagttggct gctgccaccg ctgagcaata 9900
actagcataa ccccttgggg cctctaaacg ggtcttgagg ggttttttgc tgaaaggagg 9960
aactatatcc ggat 9974
<210> 2
<211> 1547
<212> PRT
<213> Artificial sequence
<400> 2
Met Lys Gln Ser Thr Ile Ala Leu Ala Leu Leu Pro Leu Leu Phe Thr
1 5 10 15
Pro Val Thr Lys Ala Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp
20 25 30
Ser Phe Asp Thr Asp Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp
35 40 45
Phe Trp Ala Glu Trp Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu
50 55 60
Asp Glu Ile Ala Asp Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu
65 70 75 80
Asn Ile Asp Gln Asn Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly
85 90 95
Ile Pro Thr Leu Leu Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys
100 105 110
Val Gly Ala Leu Ser Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn
115 120 125
Leu Ala Gly Ser Gly Ser Gly His Met His His His His His His Asp
130 135 140
Asp Asp Asp Lys Met Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly
145 150 155 160
Val Pro Ala Ala Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr
165 170 175
Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser
180 185 190
Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys
195 200 205
Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala
210 215 220
Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn
225 230 235 240
Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val
245 250 255
Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu
260 265 270
Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu
275 280 285
Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys
290 295 300
Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala
305 310 315 320
Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp
325 330 335
Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val
340 345 350
Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly
355 360 365
Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg
370 375 380
Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu
385 390 395 400
Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys
405 410 415
Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp
420 425 430
Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln
435 440 445
Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu
450 455 460
Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu
465 470 475 480
Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr
485 490 495
Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu
500 505 510
Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly
515 520 525
Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu
530 535 540
Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp
545 550 555 560
Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln
565 570 575
Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe
580 585 590
Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr
595 600 605
Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg
610 615 620
Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn
625 630 635 640
Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu
645 650 655
Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro
660 665 670
Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr
675 680 685
Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser
690 695 700
Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg
705 710 715 720
Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu
725 730 735
Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala
740 745 750
Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp
755 760 765
Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu
770 775 780
Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys
785 790 795 800
Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg
805 810 815
Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly
820 825 830
Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser
835 840 845
Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser
850 855 860
Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly
865 870 875 880
Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile
885 890 895
Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys
900 905 910
Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg
915 920 925
Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met
930 935 940
Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys
945 950 955 960
Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu
965 970 975
Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp
980 985 990
Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser
995 1000 1005
Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp
1010 1015 1020
Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys
1025 1030 1035 1040
Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr
1045 1050 1055
Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser
1060 1065 1070
Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg
1075 1080 1085
Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr
1090 1095 1100
Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1105 1110 1115 1120
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr
1125 1130 1135
Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu
1140 1145 1150
Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu
1155 1160 1165
Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met
1170 1175 1180
Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe
1185 1190 1195 1200
Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala
1205 1210 1215
Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr
1220 1225 1230
Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys
1235 1240 1245
Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln
1250 1255 1260
Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
1265 1270 1275 1280
Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly
1285 1290 1295
Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val
1300 1305 1310
Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly
1315 1320 1325
Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe
1330 1335 1340
Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1345 1350 1355 1360
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met
1365 1370 1375
Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro
1380 1385 1390
Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu
1395 1400 1405
Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln
1410 1415 1420
His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser
1425 1430 1435 1440
Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala
1445 1450 1455
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile
1460 1465 1470
Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys
1475 1480 1485
Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu
1490 1495 1500
Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu
1505 1510 1515 1520
Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala
1525 1530 1535
Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1540 1545
<210> 3
<211> 1399
<212> PRT
<213> Artificial sequence
<400> 3
Met Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly Val Pro Ala Ala
1 5 10 15
Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val Gly
20 25 30
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
35 40 45
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
50 55 60
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
65 70 75 80
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
85 90 95
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
100 105 110
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
115 120 125
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
130 135 140
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
145 150 155 160
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
165 170 175
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
180 185 190
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
195 200 205
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
210 215 220
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
225 230 235 240
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
245 250 255
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
260 265 270
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
275 280 285
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
290 295 300
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
305 310 315 320
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
325 330 335
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
340 345 350
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
355 360 365
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
370 375 380
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
385 390 395 400
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
405 410 415
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
420 425 430
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
435 440 445
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
450 455 460
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
465 470 475 480
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
485 490 495
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
500 505 510
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
515 520 525
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
530 535 540
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
545 550 555 560
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
565 570 575
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
580 585 590
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
595 600 605
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
610 615 620
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
625 630 635 640
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
645 650 655
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
660 665 670
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
675 680 685
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
690 695 700
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
705 710 715 720
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
725 730 735
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
740 745 750
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
755 760 765
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
770 775 780
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
785 790 795 800
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
805 810 815
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
820 825 830
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
835 840 845
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp
850 855 860
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
865 870 875 880
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
885 890 895
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
900 905 910
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
915 920 925
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
930 935 940
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
945 950 955 960
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
965 970 975
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
980 985 990
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
995 1000 1005
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
1010 1015 1020
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser
1025 1030 1035 1040
Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn
1045 1050 1055
Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1060 1065 1070
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val
1075 1080 1085
Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met
1090 1095 1100
Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe
1105 1110 1115 1120
Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala
1125 1130 1135
Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1140 1145 1150
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys
1155 1160 1165
Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met
1170 1175 1180
Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys
1185 1190 1195 1200
Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr
1205 1210 1215
Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala
1220 1225 1230
Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1235 1240 1245
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1250 1255 1260
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr
1265 1270 1275 1280
Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile
1285 1290 1295
Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1300 1305 1310
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe
1315 1320 1325
Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr
1330 1335 1340
Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala
1345 1350 1355 1360
Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp
1365 1370 1375
Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1380 1385 1390
Gly Gln Ala Lys Lys Lys Lys
1395
<210> 4
<211> 225
<212> DNA
<213> Artificial sequence
<400> 4
ggcttgtcgg actcttcgct attacgccag ctggcgaagg gggatgtgct gcaaggcgat 60
taagttgggt aacgccaggg ttttcccagt cacgacgtta ggaaattaat acgactcact 120
ataggagagc acagtcagcc tggcggtttt agagctagaa atagcaagtt aaaataaggc 180
tagtccgtta tcaacttgaa aaagtggcac cgagtcggtg ctttt 225
<210> 5
<211> 225
<212> DNA
<213> Artificial sequence
<400> 5
ggcttgtcgg actcttcgct attacgccag ctggcgaagg gggatgtgct gcaaggcgat 60
taagttgggt aacgccaggg ttttcccagt cacgacgtta ggaaattaat acgactcact 120
ataggcttcc agaattggat ctccggtttt agagctagaa atagcaagtt aaaataaggc 180
tagtccgtta tcaacttgaa aaagtggcac cgagtcggtg ctttt 225
<210> 6
<211> 102
<212> RNA
<213> Artificial sequence
<400> 6
ggagagcaca gucagccugg cgguuuuaga gcuagaaaua gcaaguuaaa auaaggcuag 60
uccguuauca acuugaaaaa guggcaccga gucggugcuu uu 102
<210> 7
<211> 102
<212> RNA
<213> Artificial sequence
<400> 7
ggcuuccaga auuggaucuc cgguuuuaga gcuagaaaua gcaaguuaaa auaaggcuag 60
uccguuauca acuugaaaaa guggcaccga gucggugcuu uu 102
<210> 8
<211> 631
<212> PRT
<213> Sus scrofa
<400> 8
Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
1 5 10 15
Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
20 25 30
Pro Gly Pro Tyr Leu Leu Ala Gly Asp Gly Ala Leu Asp Lys Gly Glu
35 40 45
Ser Cys Gly Gly Ala Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
50 55 60
Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65 70 75 80
Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
85 90 95
Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro
100 105 110
Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile
115 120 125
Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu
130 135 140
Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala
145 150 155 160
Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln
165 170 175
Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp
180 185 190
Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro
195 200 205
Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro
210 215 220
Ser Lys Glu Glu Arg Glu Ala Leu Val Glu Glu Cys Asn Arg Ala Glu
225 230 235 240
Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser
245 250 255
Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
260 265 270
Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly
275 280 285
Pro Pro Pro Gly Pro Gly Pro Gly Pro Ala Leu Pro Ala His Ser Ser
290 295 300
Pro Gly Leu Pro Pro Thr Ala Leu Ser Pro Ser Lys Val His Gly Val
305 310 315 320
Arg Tyr Gly Gln Ser Ala Thr Ser Glu Gly Ala Glu Val Pro Ser Ser
325 330 335
Ser Gly Gly Pro Leu Val Thr Val Ser Ala Pro Leu His Gln Val Ser
340 345 350
Pro Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys
355 360 365
Leu Val Ser Ala Thr Gly Gly Pro Leu Pro Pro Val Ser Thr Leu Thr
370 375 380
Ala Leu His Ser Leu Glu Gln Thr Ser Pro Gly Leu Asn Gln Gln Pro
385 390 395 400
Gln Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Ala Ile Gly Pro
405 410 415
Ser Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser
420 425 430
Thr Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val
435 440 445
Ile Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe
450 455 460
Ser Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Ser Val
465 470 475 480
Gln Ser His Val Ala Gln Ser Pro Phe Met Ala Thr Met Ala Gln Leu
485 490 495
Gln Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr
500 505 510
Thr His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr
515 520 525
Asn Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr
530 535 540
Ser Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser
545 550 555 560
Gln Ala Thr Thr Ile His Ile Pro Ser Gln Asp Pro Ala Gly Ile Gln
565 570 575
His Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser
580 585 590
Ser Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Thr Asn Gly His Ser
595 600 605
His Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr
610 615 620
Gln Met Ala Ser Ser Ser Gln
625 630
<210> 9
<211> 1042
<212> DNA
<213> Sus scrofa
<400> 9
aaggctgggg aaggggagag gggctttggg tgctgaggga ggctccccag gttttgaaag 60
ctcctgctgt tggcccagga gttctcagct cctgggctga gtgtctgaaa cccagctcca 120
tttctggtgc ccccccaccc cactgaccca aacaaccttt gagtggctgc tcgactccct 180
catcctcact acaaccctat gtttattgtg cccactccct gaagagacta agagaggcta 240
agtcacttgc tcaaggtcac acagcagact gagattgaaa ctgagtctgc caacctcaaa 300
cactcaggta gatctctcat tctcagaacc ctccccccac ctccaaggag agggttcttc 360
tgtgcctggc ctggaggctc acaagtggcc attcctgcag ggcggagtgc atccagaggg 420
gggtgtcacc atcacaggca caggggctgg gctccaacct cgtcacggag gtgcgcgtct 480
acaactggtt tgccaatcgg cgcaaggaag aagcatttcg gcacaagttg gccatggaca 540
cgtacagtgg gccaccaccg gggccaggtc cgggccctgc actgcctgcc cacagctctc 600
ctggtctgcc cccaaccgcc ctctccccca gtaaggtcca cggtgagtgc catgtgggca 660
gggggactgg acagtggtta gagggactct gagggtaggt gggagagttg gggagcacca 720
cctcattggc agcagccacc cacgcctcct ggctttctcc actccattga tcatcagctc 780
tacccattcc atattcactc caactctttt tttttttttt ttttggtctt tttaggacca 840
cacatgcagc atgtggaagt tcccaggcta ggggtctaat gggagctgta gccgccagcc 900
tatgccacag ccacaacaac accagatcag agcctcatct gtgacctaca tcacagccca 960
cagcaatgct ggattcttaa cccactgaga gaggccaggg atcaaacctg cgtcctcatg 1020
gatactaata agatttgtta tc 1042
<210> 10
<211> 100
<212> RNA
<213> Artificial sequence
<400> 10
agaagcauuu cggcacaagu guuuuagagc uagaaauagc aaguuaaaau aaggcuaguc 60
cguuaucaac uugaaaaagu ggcaccgagu cggugcuuuu 100
<210> 11
<211> 100
<212> RNA
<213> Artificial sequence
<400> 11
auuucggcac aaguuggcca guuuuagagc uagaaauagc aaguuaaaau aaggcuaguc 60
cguuaucaac uugaaaaagu ggcaccgagu cggugcuuuu 100
<210> 12
<211> 100
<212> RNA
<213> Artificial sequence
<400> 12
gggcagacca ggagagcugu guuuuagagc uagaaauagc aaguuaaaau aaggcuaguc 60
cguuaucaac uugaaaaagu ggcaccgagu cggugcuuuu 100
<210> 13
<211> 100
<212> RNA
<213> Artificial sequence
<400> 13
ggggcagacc aggagagcug guuuuagagc uagaaauagc aaguuaaaau aaggcuaguc 60
cguuaucaac uugaaaaagu ggcaccgagu cggugcuuuu 100
<210> 14
<211> 225
<212> DNA
<213> Artificial sequence
<400> 14
ggcttgtcgg actcttcgct attacgccag ctggcgaagg gggatgtgct gcaaggcgat 60
taagttgggt aacgccaggg ttttcccagt cacgacgtta ggaaattaat acgactcact 120
ataggatttc ggcacaagtt ggccagtttt agagctagaa atagcaagtt aaaataaggc 180
tagtccgtta tcaacttgaa aaagtggcac cgagtcggtg ctttt 225
<210> 15
<211> 225
<212> DNA
<213> Artificial sequence
<400> 15
ggcttgtcgg actcttcgct attacgccag ctggcgaagg gggatgtgct gcaaggcgat 60
taagttgggt aacgccaggg ttttcccagt cacgacgtta ggaaattaat acgactcact 120
atagggggca gaccaggaga gctgtgtttt agagctagaa atagcaagtt aaaataaggc 180
tagtccgtta tcaacttgaa aaagtggcac cgagtcggtg ctttt 225
<210> 16
<211> 102
<212> RNA
<213> Artificial sequence
<400> 16
ggauuucggc acaaguuggc caguuuuaga gcuagaaaua gcaaguuaaa auaaggcuag 60
uccguuauca acuugaaaaa guggcaccga gucggugcuu uu 102
<210> 17
<211> 102
<212> RNA
<213> Artificial sequence
<400> 17
gggggcagac caggagagcu guguuuuaga gcuagaaaua gcaaguuaaa auaaggcuag 60
uccguuauca acuugaaaaa guggcaccga gucggugcuu uu 102
<210> 18
<211> 163
<212> DNA
<213> Artificial sequence
<400> 18
tctacaactg gtttgccaat cggcgcaagg aagaagcatt tcggcacaag ctagcaatgg 60
acacgtacag tgggccacca cccggggcca ggtccgggcc ctgcactgcc tgtccacagc 120
tctcctggtc tgcccccaac cgccctctcc cccagtaagg tcc 163
<210> 19
<211> 162
<212> DNA
<213> Sus scrofa
<400> 19
tctacaactg gtttgccaat cggcgcaagg aagaagcatt tcggcacaag ttggccatgg 60
acacgtacag tgggccacca ccggggccag gtccgggccc tgcactgcct gcccacagct 120
ctcctggtct gcccccaacc gccctctccc ccagtaaggt cc 162