CA2451957A1 - Methods, vectors, cell lines and kits for selecting nucleic acids having a desired feature - Google Patents

Methods, vectors, cell lines and kits for selecting nucleic acids having a desired feature

Publication number
CA2451957A1
Authority
CA
Canada
Prior art keywords
site
nucleic acid
recombinase
vector
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002451957A
Other languages
French (fr)
Inventor
Christian Lanctot
Rock Gingras
Marie-Helene Gaumond
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Phenogene Therapeutiques Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CA2451957A1
Abandoned

Abstract

Methods, vectors, cell lines and kits for the screening and identification of nucleic acids are described. The invention is based on the use of a site-specific recombinase to excise, from an expression vector into which an exogenous nucleic acid having a desired feature has been inserted, a region of the vector which is excisable by site-specific recombination. Insertion into the vector of a nucleic acid having a desired feature, such as nucleic acids capable of changing the expression of cellular genes or the state of cellular metabolism or signaling pathways, triggers the synthesis and/or activity of a site specific recombinase, the action of the recombinase allowing an easy selection of the expression vector containing the exogenous nucleic acid having a desired feature.

Description

METHODS, VECTORS, CELL LINES AND KITS
FOR SELECTING NUCLEIC ACIDS HAVING A DESIRED FEATURE
RELATED APPLICATION
This application claims priority of United States Provisional Application 60/301,149 filed June 28, 2001, the disclosure of which is incorporated by reference herein in its entirety.
BACKGROUND OF THE INVENTION
a) Field of the invention
The present invention relates to the screening of nucleic acids. More particularly, the present invention is concerned with the identification of nucleic acids having a desired feature, such as nucleic acids encoding signaling molecules, transcription factors or other proteins involved in changes of cell metabolism.
b) Brief description of the prior art
Large-scale gene sequencing projects are currently generating huge amounts of genetic information. In silico analysis and transcriptional profiling are important tools to help decipher this information. Ultimately, however, the function encoded by novel sequences will have to be determined within the proper biological context. A powerful approach to do so is the so-called expression screening strategy, i.e. transfection of a library of vectors harboring different nucleic acids ("expression vectors") into host cells and selective retrieval of those nucleic acids encoding a specific function.
Cell-based screening technology can be viewed as a tool that sends out a "signal" if, and only if, a particular "target" nucleic acid possessing the activity being screened for has been incorporated into a cell. This technology is based on a reporter system that is kept inactive (no signal) in the absence of the target gene. Typically, reporter systems are based on the conditional expression of marker proteins. Appearance of this marker in a cell transfected with an expression vector containing a nucleic acid having a desired feature allows the selection of this cell from the rest of the transfected cell population. When performed in vertebrate cells, the need to select cells greatly limits the throughput of the method since techniques to do so are complex, cumbersome or lengthy. In some cases, for example, selection is achieved with fluorescent markers coupled to very sophisticated sorting equipment. In others, cells having the desired phenotype are selected either after limiting dilution or by clonal growth followed by colony picking, both long and laborious techniques. Furthermore, the cell selection step may increase the occurrence of false positives. In fact, the cell selection step is merely a prerequisite to recover the expression vector whose transfection has triggered the appearance of a desired phenotype and to identify the nucleic acid contained therein.
Site-specific recombinases such as Cre recombinase from bacteriophage P1 and Flp from S. cerevisiae have previously been used for recombining exogenous molecules in a heterologous system (Sauer and Henderson, 1988; O'Gorman et al., 1991). The Cre and Flp recombinases bind well-characterized DNA elements ("recombination target sequences") and mediate excision or inversion of the intervening sequence, depending on the orientation of the recombination target sequences relative to one another (Sauer, 1994). The Cre recombinase is extensively used to create so-called conditional mouse knock-out mutants, i.e. to control the spatial and temporal inactivation of engineered genetic loci in vivo. One way to achieve this spatial and temporal selectivity is to control the transcription of Cre via cis regulatory elements having the desired properties (Metzger and Feil, 1999). An alternative is to exploit the fact that the recombinase must be localized in the nucleus to carry out recombination of target sequences. In this technique, the recombinase is fused to a ligand binding domain of a nuclear receptor (e.g. estrogen receptor), such that nuclear localization of the fusion protein is dependent on the presence of the nuclear receptor ligand molecule (e.g. estradiol in the case of the estrogen receptor) (Logie and Stewart, 1995; Angrand et al., 1998).
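The dependence of the outcome on target-site orientation can be illustrated with a minimal Python sketch. It is not part of the original disclosure: the feature names, the "RTS" label and the function `recombine` are hypothetical, and a construct is modeled simply as an ordered list of (name, strand) pairs rather than as real sequence.

```python
from typing import List, Tuple

# A feature is (name, strand); strand is "+" or "-". Hypothetical model.
Feature = Tuple[str, str]


def flip(feature: Feature) -> Feature:
    """Return the same feature on the opposite strand."""
    name, strand = feature
    return (name, "-" if strand == "+" else "+")


def recombine(construct: List[Feature], site: str = "RTS") -> List[Feature]:
    """Apply one recombination event between the first two target sites."""
    positions = [i for i, (name, _) in enumerate(construct) if name == site]
    if len(positions) < 2:
        return list(construct)  # fewer than two sites: nothing to recombine
    i, j = positions[0], positions[1]
    if construct[i][1] == construct[j][1]:
        # Sites in the same orientation: the intervening segment is excised
        # and a single target site is left behind.
        return construct[: i + 1] + construct[j + 1 :]
    # Sites in opposite orientation: the intervening segment is inverted.
    inverted = [flip(f) for f in reversed(construct[i + 1 : j])]
    return construct[: i + 1] + inverted + construct[j:]


direct = [("promoter", "+"), ("RTS", "+"), ("stuffer", "+"), ("RTS", "+"), ("cDNA", "+")]
print(recombine(direct))
```

With two sites in the same orientation the "stuffer" entry disappears from the list; with the second site on the opposite strand the entries between the sites are reversed and flipped instead.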
Although the prior art describes some methods to render the activity of a site-specific recombinase dependent on the occurrence of specific cellular events, the use of recombinases for screening nucleic acids has never been suggested. Similarly, no one has ever produced expression vectors comprising a nucleic acid sequence excisable by site-specific recombinase for the screening of exogenous nucleic acids in eukaryotic cells.
In view of the above, it is clear that there is a need for a nucleic acid expression screening method that bypasses the time-consuming cell selection step and that does not require marking of proteins. There is more particularly a need for new methods taking advantage of the recombinase activity for the selective retrieval of nucleic acids encoding a desired function.
The present invention aims to overcome the limits and obviate the problems known in the art for screening genes and nucleic acids by providing a molecule, a method and a kit for rapidly and efficiently identifying and retrieving nucleic acids encoding a desired function in eukaryotic cells. The purpose of the invention is also to fulfill other needs that will be apparent to those skilled in the art upon reading the following specification.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide methods, vectors, cell lines and kits for identifying and/or selecting a nucleic acid having a desired feature. A non-limitative list of nucleic acids having a desired feature includes nucleic acids having transcriptional activity, nucleic acids encoding proteins involved in signal transduction pathways, and nucleic acids encoding proteins involved in cell metabolism or differentiation state.
According to a first aspect, the invention relates to expression vectors or viral-based expression vectors containing one or more recombination target sequences, these vectors being modified after the action of a site-specific recombinase in such a way as to be differentiable from non recombined vectors.
The invention also relates to libraries of expression vectors or viral-based expression vectors containing exogenous nucleic acids. Most preferred vectors according to the invention are those whose sequences are set forth in SEQ ID NOS: 1, 2 and 3.
According to a preferred embodiment, the vector is useful for expressing an exogenous nucleic acid in eukaryotic cells and it comprises a nucleic acid sequence excisable by site-specific recombination.
According to another preferred embodiment, the vector is an expression vector which comprises a nucleic acid sequence, the vector comprising a recombinase substrate and a transcription unit.
According to a further preferred embodiment, the vector comprises i) a site-specific recombinase coding sequence operatively linked to a termination sequence; and ii) a recombinase substrate excisable specifically by a recombinase encoded by the site-specific recombinase coding sequence.
Preferably, the recombinase substrate in the vectors of the invention comprises a stuffer region flanked by recombination target sequences.
Preferably, the stuffer region is removable by site-specific recombination.
More preferably, the stuffer region comprises a restriction site. Preferably also, the vector's transcription unit comprises an enhancer sequence, a promoter sequence and a termination sequence operatively linked together.
In one embodiment, the nucleic acid sequence of the vector comprises at least two fragments of a viral genome for packaging the vector or a fragment thereof into infectious viral particles. These fragments may derive from a retrovirus or an adenovirus. The recombinase substrate and the transcription unit are preferably located between these two viral fragments.
Preferably also, the nucleic acid sequence of the vector comprises a nucleic acid sequence encoding an inactive gene conferring resistance to an antibiotic in bacteria. According to the invention, the activity of the inactive gene is restorable by site-specific recombination of the vector nucleic acid sequence.
According to another preferred embodiment, the vector nucleic acid sequence comprises a recombinase substrate and a transcription unit incorporated into a viral genome. According to the invention, the recombinase substrate comprises a stuffer region advantageously excisable by site-specific recombination, and formation of viral particles is dependent upon excision of the stuffer region. Preferably, the viral genome consists of a cDNA copy of an alphaviral genome, and presence of the stuffer region blocks translation of viral proteins encoded by the cDNA copy of the alphaviral genome. The cDNA copy of the alphaviral genome may derive from Sindbis virus genome or from Semliki Forest virus genome. More preferably, the recombinase substrate is present in a 5' untranslated region of the cDNA copy of the alphaviral genome.
According to a second aspect, the invention relates to modified cell lines and transgenic animals having incorporated in their genome a DNA segment comprising a site-specific recombinase operatively linked to regulatory elements.
A related aspect concerns the use of such cell line(s) and animal(s) for screening, among libraries of expression vectors, those vectors containing a nucleic acid that activates, directly or indirectly, transcription of regulatory element(s).
According to a preferred embodiment, there is provided a eukaryotic cell line which comprises an expressible site-specific recombinase coding sequence.
This expressible site-specific recombinase coding sequence is operatively linked to a minimal promoter and to at least one cis-acting regulatory element. In one embodiment, the site-specific recombinase is expressed upon activation of the at least one cis-acting regulatory element. The cis-acting regulatory element(s) may be activated by elevation of intracellular cAMP or cGMP levels, elevation of intracellular calcium concentration, and/or change in the phosphorylation state of specific proteins (e.g. mitogen-activated protein kinase (MAPK), c-jun N-terminal protein kinase (JNK) and phosphatidyl inositol-3 kinase (PI-3 kinase)). The cis-acting regulatory element(s) may also be activated during differentiation of mesenchymal stem cells into bone, cartilage, adipocytes or myoblasts. More preferably, the site-specific recombinase coding sequence is optimized for enhanced synthesis, stability or translation in eukaryote cells. The expressible site-specific recombinase coding sequence may be chosen from Flp coding sequence from Saccharomyces cerevisiae, Cre coding sequence from bacteriophage P1 and a-recombinase coding sequence from Bacillus subtilis.
Preferred Flp coding sequences are those comprising SEQ ID NO:4, SEQ ID NO:6, or a functional homologue thereof, particularly those homologues coding for proteins having substantially the same biological activity as SEQ ID NO:5 or SEQ ID NO:7. In a related aspect, the invention relates to nucleic acids and amino acid sequences comprising an optimized Flp recombinase. Preferred optimized Flp coding sequences are those coding for the amino acid sequence set forth in SEQ ID NO:7 (including SEQ ID NO:6) and functional homologues thereof.
According to another aspect, the invention relates to a method for identifying nucleic acids encoding a desired feature from a library of exogenous nucleic acids. In a preferred embodiment, a plurality of nucleic acids from the library are inserted into a plurality of expression vectors comprising a nucleic acid sequence excisable by site-specific recombination. Preferably, the vectors are then inserted into a eukaryotic cell line (as defined previously) or into a transgenic animal comprising a nucleic acid encoding an inactive site-specific recombinase whose activity is restorable. This site-specific recombinase may be inactive for instance due to a lack of sufficient expression or due to sequestration outside of the cell nucleus. In one embodiment, the activity of the inactive site-specific recombinase is restored upon expression by said vector of an exogenous nucleic acid having the desired feature. Thereafter, the active site-specific recombinase preferably excises a fragment from the expression vectors, thereby forming recombined expression vectors that comprise a nucleic acid having the desired feature and that can be differentiated from unrecombined expression vectors.
In a more specific embodiment, the method of the invention is used for screening exogenous nucleic acids having a desired feature within eukaryotic cells. The method comprises the steps of:
a) providing a plurality of expression vectors each capable, when present in a suitable host, of expressing an exogenous nucleic acid inserted therein, these vectors comprising a nucleic acid sequence excisable by site-specific recombination;
b) providing a cell line or a transgenic animal comprising a nucleic acid encoding an inactive site-specific recombinase whose activity is restorable;
c) inserting at least one exogenous nucleic acid from a library of nucleic acids into a plurality of the expression vectors, in order to provide a library of recombinant expression vectors;
d) introducing, into cells of the cell line or of the transgenic animal of step (b), a plurality of recombinant expression vectors from the library obtained at step (c);
e) allowing the recombinant expression vectors introduced at step (d) to express the exogenous nucleic acid inserted therein, wherein only exogenous nucleic acids encoding the desired feature are capable of restoring the activity of the site-specific recombinase of step (b);
f) allowing the site-specific recombinase whose activity has been restored in step (e) to excise the excisable nucleic acid sequence from recombinant expression vectors which have expressed an exogenous nucleic acid having restored the activity of the site-specific recombinase;
g) recovering recombinant expression vectors from cells of the cell line or transgenic animal; and
h) selecting recombined expression vectors having undergone site-specific recombination at step (f), said recombined vectors containing an exogenous nucleic acid encoding the desired feature.
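The logic of steps (a) to (h) can be sketched as a small Python simulation. This is an illustrative sketch only, not the procedure of the invention: the class `ExpressionVector`, the flag `has_desired_feature` and the function `run_screen` are hypothetical names, and the model assumes one vector per cell and no background recombination.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ExpressionVector:
    insert: str                # name of the exogenous nucleic acid (step c)
    has_desired_feature: bool  # hypothetical flag: can the insert restore recombinase activity?
    recombined: bool = False   # True once the excisable sequence has been removed


def run_screen(library: List[ExpressionVector]) -> List[str]:
    """Return the inserts carried by vectors that underwent recombination."""
    recovered = []
    for vector in library:
        # Steps (d)-(e): the vector enters a cell and expresses its insert.
        # Assumption: the recombinase becomes active only if the insert
        # has the desired feature.
        recombinase_active = vector.has_desired_feature
        # Step (f): an active recombinase excises the excisable sequence.
        if recombinase_active:
            vector = replace(vector, recombined=True)
        # Step (g): every vector is recovered, recombined or not.
        recovered.append(vector)
    # Step (h): only recombined vectors are selected.
    return [v.insert for v in recovered if v.recombined]


library = [
    ExpressionVector("cDNA_1", False),
    ExpressionVector("cDNA_2", True),
    ExpressionVector("cDNA_3", False),
]
print(run_screen(library))
```

The point of the sketch is that selection acts on the recovered vectors themselves, so no step in the loop sorts or picks cells.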
In an even more specific embodiment, the method of the invention is used for screening exogenous nucleic acids having a transcriptional activity within eukaryotic cells (e.g. regulatory elements such as enhancers, promoters, etc.).
The method comprises the steps of:
a) providing a vector comprising: i) a site-specific recombinase coding sequence operatively linked to a termination sequence; and ii) a recombinase substrate excisable specifically by a site-specific recombinase encoded by the site-specific recombinase coding sequence;
b) inserting into a plurality of vectors as defined at step (a) at least one exogenous nucleic acid taken from a library of exogenous nucleic acids in order to provide a library of recombinant vectors;
c) inserting a plurality of recombinant vectors from the library obtained at step (b) into a suitable eukaryotic host;
d) allowing the exogenous nucleic acid inserted at step (b) to activate transcription of the site-specific recombinase coding sequence which is comprised in the vector, thereby producing the site-specific recombinase;
e) allowing the site-specific recombinase so produced to excise the recombinase substrate in the recombinant vector harboring the exogenous nucleic acid having activated the transcription of the site-specific recombinase;
f) following step (e), recovering a plurality of recombinant vectors from the eukaryotic host; and
g) selecting recombinant vectors having undergone site-specific recombination, most of these vectors containing an exogenous nucleic acid having transcriptional activity.
It is further an object of the invention to provide methods for specifically recovering recombined vectors after screening of libraries of expression vectors or viral-based expression vectors.
According to one embodiment, the nucleic acid sequence of the vector comprises an inactive gene conferring resistance to an antibiotic in bacteria, the activity of the inactive gene being restored by site-specific recombination of said nucleic acid sequence. Therefore, recombined vectors may be isolated from unrecombined vectors by:
i) extracting DNA from cells into which the expression vectors have been introduced;
ii) transforming bacteria with DNA extracted at step (i);
iii) growing bacteria transformed at step (ii) in presence of the antibiotic; and
iv) selecting bacterial colonies resistant to the antibiotic.
The resistant bacterial colonies comprise expression vectors having undergone site-specific recombination.
This method may further comprise the steps of:
v) extracting expression vectors from colonies selected at step (iv); and
vi) identifying an exogenous nucleic acid found in said extracted vectors.
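Steps (i) to (vi) amount to a filter on the recovered plasmids, which can be sketched in Python as follows. The dictionary keys `insert` and `stuffer_present` and the three function names are hypothetical; the sketch assumes that a plasmid confers antibiotic resistance exactly when the stuffer region has been excised.

```python
from typing import Dict, List


def confers_resistance(plasmid: Dict) -> bool:
    """Assumption: the resistance gene is active only once the stuffer is gone."""
    return not plasmid["stuffer_present"]


def select_resistant_colonies(extracted_dna: List[Dict]) -> List[Dict]:
    """Steps (ii)-(iv): only bacteria carrying a recombined plasmid grow."""
    return [p for p in extracted_dna if confers_resistance(p)]


def identify_inserts(colonies: List[Dict]) -> List[str]:
    """Steps (v)-(vi): list the distinct exogenous nucleic acids recovered."""
    return sorted({p["insert"] for p in colonies})


# Step (i): DNA extracted from the transfected cells (toy data).
extracted = [
    {"insert": "cDNA_1", "stuffer_present": True},
    {"insert": "cDNA_2", "stuffer_present": False},
    {"insert": "cDNA_2", "stuffer_present": False},
    {"insert": "cDNA_3", "stuffer_present": True},
]
print(identify_inserts(select_resistant_colonies(extracted)))
```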
According to another embodiment, the nucleic acid sequence of the vector comprises a recombinase substrate having a stuffer region flanked by recombination target sequences. The stuffer region also comprises a cleavable restriction site. Therefore, recombined vectors may be isolated from unrecombined vectors by:

i) extracting DNA from cells into which the expression vectors have been introduced;
ii) contacting DNA extracted at step (i) with a restriction enzyme recognizing said cleavable restriction site;
iii) optionally degrading DNA fragments cleaved by the restriction enzyme with an exonuclease; and
iv) optionally amplifying a DNA fragment from the expression vectors, the fragment comprising the exogenous nucleic acid.
Accordingly, recombined expression vectors are not cleaved by the restriction enzyme, but unrecombined expression vectors are cleaved by the restriction enzyme.
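The reasoning behind this restriction-based selection can be checked with a short Python sketch. It uses the SwaI recognition sequence ATTTAAAT as the example site, since SwaI digestion is the one mentioned in the figure descriptions; everything else (the primer-binding sequences, the placeholder target site, the toy insert and the function names) is hypothetical and chosen only for illustration. DNA is treated as a linear, single-strand string.

```python
from typing import List

SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence, cut in the middle (blunt)

# Hypothetical toy sequences, not taken from the vectors of the invention.
FWD_PRIMER_SITE = "GATCCAGT"
REV_PRIMER_SITE = "TGACCTGA"
TARGET_SITE = "CTCTCTCT"      # placeholder, not a real recombination target sequence
INSERT = "ATGGCCAAGTAA"

UNRECOMBINED = (FWD_PRIMER_SITE + TARGET_SITE + "GGC" + SWAI_SITE + "CGG"
                + TARGET_SITE + INSERT + REV_PRIMER_SITE)
RECOMBINED = FWD_PRIMER_SITE + TARGET_SITE + INSERT + REV_PRIMER_SITE


def digest(dna: str, site: str = SWAI_SITE) -> List[str]:
    """Cut linear DNA in the middle of every occurrence of the site."""
    cut_offset = len(site) // 2
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def amplifiable(fragments: List[str]) -> bool:
    """PCR yields a product only if one fragment carries both primer sites in order."""
    for fragment in fragments:
        i = fragment.find(FWD_PRIMER_SITE)
        j = fragment.find(REV_PRIMER_SITE)
        if i != -1 and j != -1 and i < j:
            return True
    return False


print(amplifiable(digest(UNRECOMBINED)), amplifiable(digest(RECOMBINED)))
```

In this toy model the unrecombined template is split between the two primer sites and gives no product, while the recombined template, which has lost the stuffer and its restriction site, stays intact and can be amplified.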
In another aspect, the invention concerns a screening kit comprising 1) a vector as defined herein; and/or 2) a cell line as defined herein; and at least one further element selected from the group consisting of instructions for using the kit, reaction buffer(s), enzyme(s), probe(s) and pool(s) of nucleotide molecules to be screened.
An advantage of the present invention is that it obviates the expensive and time-consuming task of selecting cells that express a gene of interest. The invention is also much more rapid, efficient and accurate for selecting a particular nucleic acid having a desired feature, characteristic or function. The invention can also selectively retrieve, from a library of nucleic acids, a nucleic acid having a desired feature, such as a nucleic acid encoding a signaling molecule, a transcription factor or a protein involved in promoting changes in cell metabolism or differentiation state (for instance a kinase, a phosphatase, or a transcription factor).
Other objects and advantages of the present invention will be apparent upon reading the following non-restrictive description of several preferred embodiments, made with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schema illustrating how site-specific recombination can be used to screen for nucleic acids encoding a specific biological function.

Figures 2A and 2B are schemas showing preferred embodiments of a transcription unit of an expression vector according to the invention.
Figures 3A, 3B, and 3C are schemas showing preferred embodiments of a recombinase substrate of an expression vector according to the invention, and also preferred methods for specifically retrieving whole or parts of recombined expression vectors or viral-based expression vectors according to the invention.
Figure 4A shows an alignment of the first 116 codons and corresponding amino acids of wild type Flp recombinase (Flp; SEQ ID NOS: 4 and 5) and of an optimized recombinase coding sequence (oFlp; SEQ ID NOS: 6 and 7) according to the present invention. Amino acid substitutions to enhance thermostability are shown in bold. The putative internal polyadenylation signal is underlined.
Figure 4B is a picture of a Northern analysis comparing the expression of wild type Flp (411) and optimized recombinase (oFlp, 412) after transfection of appropriate constructs in HEK293 cells. Flp signal (arrowhead) is not detected in mock-transfected cells (410).
Figure 4C is a picture of a Western analysis comparing the amount of Flp protein produced after transfection of HEK293 cells with vectors expressing either the wild type coding sequence (421) or the optimized coding sequence (422) according to the invention. Flp signal (arrowhead) is not detected in mock-transfected cells (420).
Figures 5A, 5B and 5C schematize construction of an expression vector (RC43) according to a preferred embodiment of the invention, the expression vector comprising a transcription unit and a recombinase substrate disrupting a gene conferring resistance to kanamycin. Nucleic acid sequence of RC43 is set forth in SEQ ID NO:1.
Figure 6 schematizes the construction of a plasmid containing cis-acting regulatory elements operatively linked to an optimized recombinase coding sequence (oFlp).
Figure 7 is a picture showing the results of a Northern analysis performed to detect expression of oFlp mRNA in subclones of HEK293 cells obtained after stable transfection of a plasmid comprising a coding sequence for oFlp operatively linked to regulatory elements activated by the Gal4VP16 protein (RE-oFlp). Lane 701, wild type HEK293 cells; lane 702, subclone 6 transfected with a control vector expressing green fluorescent protein; lane 703, subclone 6 transfected with an expression vector for Gal4VP16; lane 704, subclone 10 transfected with a control vector expressing green fluorescent protein; lane 705, subclone 10 transfected with an expression vector for Gal4VP16. oFlp signal is indicated by an arrowhead. The signal indicated by an asterisk is an artefact arising from transfection of the control vector.
Figure 8A schematizes fragments from non recombined (801) and recombined (802) expression vectors. Arrows indicate the approximate positions of primers used in the PCR analysis presented on figure 8B.
Figure 8B is a picture showing results of a PCR analysis performed to selectively amplify fragments from recombined expression vectors according to a preferred embodiment of the invention. Expression vectors were recovered from HEK293/RE-oFlp subclone 6 transfected with a vector expressing green fluorescent protein (lane 813), Gal4VP16 (lane 814) or from wild type HEK293 cells transfected with a vector expressing Gal4VP16 (lane 815). DNA was subjected to PCR after digestion with SwaI. A control fragment was amplified from a non recombined vector expressing Gal4VP16 (lane 811).

Figures 9A and 9B schematize the construction of an expression vector (plasmid RC49-2) according to a preferred embodiment of the invention. The plasmid may generate an adenovirus-based expression vector, and comprises a transcription unit and a recombinase substrate with a restriction site. The approximate position of primers 18-64V and 18-106V used in subsequent PCR is indicated. Nucleic acid sequence of RC49-2 is set forth in SEQ ID NO:2.
Figure 10 is a picture of a Northern analysis showing the expression of optimized Flp mRNA in distinct subclones of Hela cells obtained after stable transfection of a vector comprising an optimized Flp coding sequence linked operatively to a cytomegalovirus enhancer and promoter according to a preferred embodiment of the invention. Lane 1001, wild type Hela cells; lane 1002, subclone Hela/oFlp2-3; lane 1003, subclone Hela/oFlp3-2; lane 1004, subclone Hela/oFlp6-2; lane 1005, subclone Hela/oFlp6-3.
Figure 11A is a picture showing results of a PCR analysis performed to determine the amount of recombined adenovirus-based expression vector according to a preferred embodiment of the invention using DNA extracted from wild type Hela cells (1101); infected Hela/oFlp6-2 cells (1102); infected Hela/oFlp6-2 cells and digested by SwaI (1103); infected Hela/oFlp2-3 cells (1104). Lane 1100 shows the migration of a molecular marker. Fragments amplified from non recombined adenovirus-based expression vector (1105). Fragments amplified from recombined adenovirus-based expression vector (1106).
Figure 11B is a picture showing results of a PCR analysis performed to detect fragments of recombined adenovirus-based expression vectors according to a preferred embodiment of the invention after infection of populations of Hela cells containing 10% of Hela/oFlp6-2 cells (lanes 1110, 1111) or 0.1% of Hela/oFlp6-2 cells (lanes 1112, 1113). DNA extracted from infected cells was subjected to PCR after digestion with SwaI (lanes 1111, 1113).

Figure 11C is a picture showing results of a semi-nested PCR analysis performed on amplicons obtained from SwaI-digested DNA extracted from populations of Hela cells containing 10% of Hela/oFlp6-2 cells (lanes 1120) or 0.1% of Hela/oFlp6-2 cells (lanes 1121). Lane 1122 shows the migration of a molecular marker. Fragments amplified from non recombined adenovirus-based expression vector (1123). Fragments amplified from recombined adenovirus-based expression vector (1124).
Figures 12A and 12B schematize the construction of an expression vector (RC77) according to a preferred embodiment of the invention, the expression vector comprising a transcription unit embedded in a viral genome whose translation is disrupted by a recombinase substrate. Nucleic acid sequence of RC77 is set forth in SEQ ID NO:3.
Figure 13 shows the properties of an expression vector containing a transcription unit embedded in a cDNA copy of the Sindbis virus genome whose translation is disrupted by a recombinase substrate according to a preferred embodiment of the invention. Figure 13A, Image 1301, expression of GFP inserted in such an expression vector (RC77). Image 1302, immunofluorescence against the C viral protein in HEK293A cells transfected with RC77 and a control expression vector (VB35). Image 1303, immunofluorescence against the C viral protein in BHK-21 cells infected with culture medium from HEK293A cells transfected with RC77 and VB35. Image 1304, immunofluorescence against the C viral protein in HEK293A cells transfected with RC77 and a vector expressing oFlp (RC59). Image 1305, immunofluorescence against the C viral protein in BHK-21 cells infected with culture medium from HEK293A cells transfected with RC77 and RC59. Figure 13B is a picture showing results of an RT-PCR analysis performed to detect fragments of engineered Sindbis virus genomes after co-transfection in HEK293A cells of RC77 with either VB35 (lane 1313) or RC59 (lane 1314). Fragment amplified from RC77 plasmid DNA (lane 1312). Control reaction on RNA extracted from untransfected cells (lane 1311). Lane 1310 shows migration of a 100bp ladder.
Similar reference numerals are used in different figures to denote similar components.
DETAILED DESCRIPTION OF THE INVENTION
A) Definitions
Throughout the text, the word "kilobase" is generally abbreviated as "kb", the words "deoxyribonucleic acid" as "DNA", the words "ribonucleic acid" as "RNA", the words "complementary DNA" as "cDNA", the words "polymerase chain reaction" as "PCR", and the words "reverse transcription" as "RT". Nucleotide sequences are written in the 5' to 3' orientation unless stated otherwise.
In order to provide an even clearer and more consistent understanding of the specification and the claims, including the scope given herein to such terms, the following definitions are provided:
Desired feature: Refers to a nucleic acid encoding a peptide or a protein having a desired property or function. A non-limitative list of examples of a nucleic acid having a "desired property" or "desired function" includes nucleic acids encoding a specific signal transduction activity (e.g. a kinase or a phosphatase), a specific gene regulation activity (e.g. a transcription factor), or a specific cellular function (e.g. a protein promoting changes in cell metabolism or differentiation state), etc.
Exogenous nucleic acid: A nucleic acid (such as cDNA, cDNA fragments, genomic DNA fragments, antisense RNA, oligonucleotide) which is not naturally part of another nucleic acid molecule. The "exogenous nucleic acid" may be from any organism or purely synthetic.
Expression: The process whereby an exogenous nucleic acid is transcribed. In the case of cDNAs, cDNA fragments, genomic DNA fragments and oligonucleotides, the transcribed exogenous nucleic acid can be subsequently translated into a peptide or a protein in order to carry out its function, if any.

Expression vector: a vector capable of mediating the expression of an exogenous nucleic acid once introduced into a host. Preferably, expression vectors according to the present invention are capable of expressing an exogenous nucleic acid inserted therein in eukaryotic cells and comprise a 5 recombinase substrate and a transcription unit. In addition, the expression vectors of the invention preferably contain a signal for the termination of transcription and the polyadenylation of transcripts generated from enhancer and promoter sequences (see transcription unit definition). The expression vectors also preferably comprise unique restriction sites between the promoter 10 sequences and the termination sequence for inserting the exogenous nucleic acid to be expressed.
Functional homologue: As is generally understood and used herein, refers to a non native polypeptide or nucleic acid molecule that possesses a functional biological activity that is substantially similar to the biological activity of 15 a native polypeptide or a nucleic acid molecule. A functional homologue typically refers to a polypeptide or a nucleic acid molecule having at least 50%, more preferably at least 55%, even more preferably at least 60%, still more preferably at least 65-70%, and yet even more preferably greater than 85%, 90%, 95% or 95% similarity or identity at the level of nucleotide or amino acid sequence to at least one or more regions of a given nucleotide or amino acid sequence. The functional homologue may exist naturally or may be obtained following a single or multiple amino acid substitutions, deletions and/or additions relative to the naturally occurring enzymes) using methods and principles well known in the art.
A functional homologue of a protein may or may not contain post-translational modifications such as covalently linked carbohydrate, if such modification is not necessary for the performance of a specific function. It should be noted, however, that nucleotide or amino acid sequences may have similarities below the above given percentages and still encode a proteinic molecule having a desired activity, and such proteinic molecules may still be considered within the scope of the present invention where they have regions of sequence conservation. The term "functional homologue" is intended to include the "fragments", "segments", "variants", "analogs" or "chemical derivatives" of a polypeptide or a nucleic acid molecule.

Fragment: refers to a section of a molecule, such as protein/polypeptide or nucleic acid, and is meant to refer to any portion of the amino acid or nucleotide sequence.
Host: A cell, tissue, organ or organism capable of providing cellular components for allowing expression of an exogenous nucleic acid inserted into an expression vector. This term is intended to also include hosts which have been modified in order to accomplish these functions. Bacteria, fungi, animals (cells, tissues, or organisms) and plants (cells, tissues, or organisms) are examples of a host. Preferred hosts according to the present invention are eukaryotic cells and animals.
Insertion: The process by which a nucleic acid is introduced into another nucleic acid. A typical example includes insertion of an exogenous nucleic acid into an expression vector to create a "recombinant" or "genetically modified" expression vector. Methods for inserting a nucleic acid into another normally require the use of restriction enzymes and such methods of insertion are well known in the art.
Knock-in: Refers to the process by which a specific region of the genome of a host is replaced by an exogenous nucleic acid through a reaction involving homologous recombination. According to a preferred embodiment of the present invention, this process is used to replace the first coding exon of a host gene by the coding sequence of a site-specific recombinase.
Library: A collection or a pool of nucleic acid molecules. This includes genomic libraries, RNA libraries, cDNA libraries, expressed sequence tag libraries and artificial sequence libraries, including randomized artificial sequence libraries.
Minimal promoter: A short DNA sequence harboring minimal requirements for initiating transcription of a genetic sequence. The minimal promoter is not sufficient to activate transcription of a linked gene. A sequence harboring a so-called "TATA" box at about 30 nucleotides upstream of the site of initiation of transcription is an example of a minimal promoter.
Nucleic acid: Any DNA, RNA sequence or molecule having one nucleotide or more, including nucleotide sequences encoding a complete gene.

The term is intended to encompass all nucleic acids whether occurring naturally or non-naturally in a particular cell, tissue or organism. This includes DNA and fragments thereof, RNA and fragments thereof, cDNAs and fragments thereof, expressed sequence tags, and artificial sequences including randomized artificial sequences.
Optimized coding sequence: refers to a wild type nucleic acid sequence which has been modified to give higher levels of transcripts and/or products when expressed in a given host which is different from the host of the wild type nucleic acid. A typical example is the replacement of codons not efficiently translated in a given host by codons preferred in this host.
Recombinant: The term "recombinant" in association with "expression vector" refers to an expression vector which has been modified to contain a non-native exogenous nucleic acid.
Recombinase substrate: A nucleic acid molecule comprising a stuffer region flanked by recombination target sequences in direct or reverse orientation relative to one another. Typically the stuffer region is a nucleic acid sequence which is excisable by site-specific recombination.
Recombination target sequence: A short DNA segment acted upon by a site-specific recombinase. Generally, it is composed of two inverted sequences (such as SEQ ID NO:8) that are bound by a site-specific recombinase and that are separated by a spacer sequence of defined length. According to a preferred embodiment of the invention, an additional binding sequence is typically present at the 5' end of the recombination target sequence.
Regulatory element: Refers to a DNA sequence that can, under specific cellular conditions, mediate the activation or repression of the transcription of nucleic acid sequences that are operatively linked thereto. Typically, regulatory elements comprise one or more fragments of sequences naturally occurring in the enhancer or promoter regions of cellular genes. Purely synthetic regulatory elements can also be made by assembling one or more oligonucleotides corresponding to binding sites of specific transcription factors.
Site-specific recombinase: A protein capable of mediating site-specific recombination.

Site-specific recombination: The process by which a recombinase substrate is acted upon by a site-specific recombinase. Typically, this activity results in the excision of the stuffer region and of one recombination target sequence if the recombination target sequences are in the direct orientation relative to one another, or, if the recombination target sequences are in the reverse orientation relative to one another, in the inversion of the stuffer region.
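The two outcomes named in this definition (excision for target sequences in direct orientation, inversion for target sequences in reverse orientation) can be illustrated with a minimal Python sketch operating on plain strings. The target sequence `SITE` below is a hypothetical placeholder built from two inverted arms and an asymmetric spacer; it is not SEQ ID NO:8, and the functions `revcomp` and `recombine` are illustrative names only. The sketch models the net sequence change, not the enzymatic mechanism.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


# Hypothetical target sequence: two inverted arms around an asymmetric spacer.
# The asymmetric spacer is what gives the site an orientation.
SITE = "GAAGTTCC" + "TCTAGAAA" + "GGAACTTC"


def recombine(substrate, site=SITE):
    """Toy model of site-specific recombination on a linear string.

    Direct repeats: the stuffer and one target sequence are excised.
    Inverted repeats: the stuffer between the two sites is inverted.
    No pair of sites found: the substrate is returned unchanged.
    """
    first = substrate.find(site)
    if first == -1:
        return substrate
    after_first = first + len(site)

    # Second site in the same (direct) orientation: excision.
    direct = substrate.find(site, after_first)
    if direct != -1:
        return substrate[:after_first] + substrate[direct + len(site):]

    # Second site in the reverse orientation: inversion of the stuffer.
    inverted = substrate.find(revcomp(site), after_first)
    if inverted != -1:
        stuffer = substrate[after_first:inverted]
        return substrate[:after_first] + revcomp(stuffer) + substrate[inverted:]

    return substrate


if __name__ == "__main__":
    stuffer = "CCGCGGCCGCTT"
    print(recombine("AAAA" + SITE + stuffer + SITE + "TTTT"))
    print(recombine("AAAA" + SITE + stuffer + revcomp(SITE) + "TTTT"))
```

In the direct-orientation case a single copy of the target sequence remains in the product, which corresponds to the "remaining recombination target sequence" referred to elsewhere in the description.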
Transcription unit: As used herein, refers to a region of a vector which comprises an enhancer sequence, a promoter sequence and a termination sequence, all operatively linked together. Preferably, the enhancer and promoter sequences are constitutively active and are operatively linked to an exogenous nucleic acid inserted into the vector. Enhancer and promoter sequences can be derived for example from the cytomegalovirus (CMV) immediate-early genes or from the Rous sarcoma virus (RSV) long terminal repeat.
Transfection: the process of introducing nucleic acids into eukaryotic cells by any means such as electroporation, lipofection, precipitate uptake or micro-injection. A cell having incorporated an exogenous nucleic acid (e.g. an expression vector or a recombinant expression vector) is said to be transfected.
Vector: An RNA or DNA molecule which can be used to transfer an RNA or DNA segment from one organism to another.
Viral-based expression vector: Refers to an expression vector or parts thereof embedded in a viral genome that can be packaged into infectious viral particles. Typically, the parts of an expression vector embedded in a viral genome consist of the transcription unit and the recombinase substrate. Viral-based expression vectors provide a way to better control the delivery of exogenous nucleic acids to host cells via infectious viral particles.
B) General overview of the invention
The invention is based on the use of a site-specific recombinase to modify an expression vector containing an exogenous nucleic acid having a desired feature. As will be outlined in greater detail below, insertion of a nucleic acid having a desired feature, such as nucleic acids capable of changing the expression of cellular genes or the state of cellular metabolism or signaling pathways, triggers the synthesis and/or activity of a site-specific recombinase, the action of the recombinase allowing an easy selection of the expression vector containing the exogenous nucleic acid having a desired feature.
C) Methods for selecting a nucleic acid having a desired feature
According to a first aspect, the present invention relates to methods for screening and/or identifying exogenous nucleic acids having a desired feature within eukaryotic cells.
In its most basic version, the invention is used to screen for nucleic acids encoding a specific gene regulatory activity (e.g. a kinase, a phosphatase, or a transcription factor). Figure 1 depicts a preferred specific embodiment of a screening method according to the invention. As shown, a vector (101) which is capable, when present in a suitable host (102), of expressing an exogenous nucleic acid inserted therein (103), is provided. The vector comprises a nucleic acid sequence (104) which is excisable by site-specific recombination.
A cell line or a transgenic animal is also provided. The cell line or transgenic animal comprises a nucleic acid minimally encoding an inactive site-specific recombinase (105) whose activity is restorable.
A library of recombinant expression vectors is then prepared. This is achieved by inserting, into a plurality of expression vectors such as the one defined previously, at least one exogenous nucleic acid from a library of exogenous nucleic acids. Next, a plurality of recombinant expression vectors from this library is inserted into the cell line or transgenic animal provided previously.
Thereafter, these recombinant expression vectors are allowed to express the exogenous nucleic acid inserted therein. According to the invention, only exogenous nucleic acids encoding the desired feature will be capable of restoring (106) the activity of the site-specific recombinase of the host. A site-specific recombinase (107) whose activity is restored may then excise the excisable nucleic acid sequence from the recombinant expression vector(s) which have expressed an exogenous nucleic acid that restored such site-specific recombinase activity.
Recombinant expression vectors are then recovered (108) from the transfected cells or transgenic animal and recombinant expression vectors having undergone site-specific recombination are selected (109). According to the invention, most of these vectors contain an exogenous nucleic acid encoding the desired feature.
In another embodiment, the invention is used for screening nucleic acids having a transcriptional activity (e.g. regulatory elements such as enhancers, promoters and the like). A preferred screening method comprises the steps of:
a) providing a vector comprising: i) a site-specific recombinase coding sequence operatively linked to a termination sequence; and ii) a recombinase substrate excisable specifically by a site-specific recombinase encoded by the site-specific recombinase coding sequence;
b) inserting into a plurality of vectors as defined at step (a) at least one exogenous nucleic acid taken from a library of exogenous nucleic acids in order to provide a library of recombinant vectors;
c) inserting a plurality of recombinant vectors from the library obtained at step (b) into a suitable eukaryotic host;
d) allowing the exogenous nucleic acid inserted at step (b) to activate transcription of the site-specific recombinase coding sequence which is comprised in the vector, thereby producing the site-specific recombinase;
e) allowing the site-specific recombinase so produced to excise the recombinase substrate in the recombinant vector harboring the exogenous nucleic acid having activated the transcription of the site-specific recombinase;
f) following step (e), recovering a plurality of recombinant vectors from the eukaryotic host; and
g) selecting recombinant vectors having undergone site-specific recombination, most of these vectors containing an exogenous nucleic acid having transcriptional activity.
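The logic of steps (c) to (g) can be summarized in a short, idealized Python sketch. Everything in it is an assumption made for illustration: the class `RecombinantVector`, the boolean flag `activates_transcription` (standing in for whatever the inserted nucleic acid does in the host) and the function `screen` are hypothetical names. The sketch is deterministic, whereas the method itself only states that most selected vectors contain an active insert.

```python
from dataclasses import dataclass


@dataclass
class RecombinantVector:
    """Hypothetical record for one member of the recombinant vector library."""
    insert_name: str
    activates_transcription: bool  # assumed property of the inserted nucleic acid
    recombined: bool = False       # set once the recombinase substrate is excised


def screen(library):
    """Idealized walk through steps (c) to (g) for vectors already in the host."""
    for vector in library:
        # Step (d): the recombinase is produced only if the insert activates
        # transcription of the recombinase coding sequence carried in cis.
        recombinase_produced = vector.activates_transcription
        # Step (e): the recombinase excises the substrate of that same vector.
        if recombinase_produced:
            vector.recombined = True
    # Step (f): recover the vectors from the host.
    recovered = list(library)
    # Step (g): select the vectors that underwent site-specific recombination.
    return [v.insert_name for v in recovered if v.recombined]


if __name__ == "__main__":
    library = [
        RecombinantVector("fragment_A", False),
        RecombinantVector("fragment_B", True),
        RecombinantVector("fragment_C", False),
    ]
    print(screen(library))
```

The point of the sketch is that selection acts on a physical mark left on the vector itself (the excised substrate), so the active insert can be recovered together with the mark.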

i) Site-specific recombinase
As will now be explained in more detail, the present invention uses a site-specific recombinase as a tool to screen for nucleic acids encoding a specific function or having a desired feature. Site-specific recombinases are part of the larger integrase family of recombinases that are mainly involved in the insertion, deletion or inversion of genetic material. Site-specific recombinases have been used to recombine DNA molecules transfected into eukaryotic cells, particularly Cre from bacteriophage P1 and Flp from Saccharomyces cerevisiae (Sauer and Henderson, 1988; O'Gorman et al., 1991). These proteins cooperatively bind to specific DNA sequences arranged as palindromes ("recombination target sequences"; e.g. SEQ ID NO:8) (Jayaram, 1985). Recombination between target sequences results in the deletion of the intervening sequence and of one target sequence if the recombination target sequences are in the same orientation relative to one another.
Theoretically, any site-specific recombinase can be used according to the present invention. Examples include prokaryotic β-recombinase (Diaz et al., 1999) in addition to Cre and Flp mentioned above. However, it may be necessary to optimize the coding sequence of the recombinase as the preferred codon usage in the organism from which it originates may differ greatly from the preferred codon usage in the screening host. This can impair either stability or efficient translation of the recombinase mRNA. In one of the enclosed examples (Example 1), codons in the first 345 nt of the Flp coding sequence from Saccharomyces cerevisiae (SEQ ID NO:4) have been changed to optimal codons for translation in mammalian cells. In preferred embodiments of the present invention, an optimized coding sequence of Flp (SEQ ID NO:4) is used as a tool to screen nucleic acids.
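Replacing codons in only the first part of a coding sequence, as described for the first 345 nt of Flp, can be sketched as follows. This is a minimal illustration under stated assumptions: the `PREFERRED` table is a hypothetical placeholder containing four synonymous substitutions (Leu, Arg, Ile, Gly) and is not the substitution set used in Example 1; a real table would be derived from codon usage data for the screening host. The function name `optimize_prefix` is likewise illustrative.

```python
# Hypothetical synonymous substitutions (source codon -> host-preferred codon).
# Placeholder values for illustration only, not measured codon usage data.
PREFERRED = {
    "TTA": "CTG",  # Leu
    "AGA": "CGC",  # Arg
    "ATA": "ATC",  # Ile
    "GGT": "GGC",  # Gly
}


def optimize_prefix(cds, n_nt=345, preferred=PREFERRED):
    """Replace codons within the first n_nt nucleotides of a coding sequence.

    Codons listed in `preferred` are swapped for their synonymous
    replacement; all other codons, and everything after n_nt, are kept.
    """
    if n_nt % 3 != 0:
        raise ValueError("n_nt must be a multiple of 3 to stay in frame")
    # Never run past the last complete codon of a short sequence.
    n_nt = min(n_nt, len(cds) - len(cds) % 3)
    codons = [cds[i:i + 3] for i in range(0, n_nt, 3)]
    return "".join(preferred.get(codon, codon) for codon in codons) + cds[n_nt:]


if __name__ == "__main__":
    toy_cds = "ATGTTAAGAATAGGTTTA"
    print(optimize_prefix(toy_cds, n_nt=9))
```

Because every substitution is synonymous, the encoded protein is unchanged and the sequence length is preserved; only the nucleotide sequence of the optimized region differs.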
ii) The expression vector
The expression vector according to the present invention is minimally composed of a transcription unit (200; Fig 2A) and a selectable recombinase substrate unit (309; Figs 3A to 3C).

According to an embodiment of the invention shown in Figure 2A, the transcription unit (200) comprises cloning sites (204), a promoter (202), enhancer elements (201), transcription termination and polyadenylation signals (203).
An exogenous nucleic acid (205) is inserted into the cloning sites (204) of the transcription unit (200). According to another embodiment of the invention shown in Figure 2B, the transcription unit comprises a recombinase coding sequence (211) operatively linked to a minimal promoter (210) and an exogenous nucleic acid (205) is placed upstream of the promoter (210).
The transcription unit (200) as schematized on Figure 2A serves to express exogenous nucleic acids and comprises enhancer (201) and promoter (202) sequences, followed by signals (203) for the termination of transcription and the polyadenylation of transcripts generated from the enhancer and promoter sequences. Enhancer and promoter sequences driving robust expression in a wide variety of cells are generally preferred. These include but are not limited to sequences derived from cytomegalovirus immediate-early genes (CMV; GenBank™ acc. No. AF477200) and Rous sarcoma virus long terminal repeat (RSV; GenBank™ acc. No. M83236.1) as well as sequences derived from widely expressed cellular genes such as chicken β-actin and human elongation factor 1α. Alternatively, enhancer and promoter sequences driving expression in specific cells, tissues or organs can be used. In this case, expression of the exogenous nucleic acid will be limited to the cells, tissues or organs in which the enhancer and promoter sequences can activate transcription. This can be desirable when constructing libraries of viral-based expression vectors.
Indeed, as will be described in more detail below, such construction requires the introduction of expression vectors into cells at some point, a step that can lead to the loss of vectors comprising exogenous nucleic acids whose expression is deleterious or toxic to the cell type used in the library construction procedure.
Such losses can be avoided by ensuring that exogenous nucleic acids are expressed from cell-specific or tissue-specific enhancer and promoter sequences that are not active in the cell type used for library construction. The following DNA fragments are just a few examples of enhancer and promoter sequences that can activate the expression of exogenous nucleic acid only in specific cell populations. Nucleotides are numbered relative to the site of initiation of transcription (+1). A fragment encompassing nucleotides -1700 to +1 of the rat osteocalcin gene can be used to achieve osteoblast-specific expression (Baker et al., 1992). A fragment encompassing nucleotides -1542 to -1 of the kidney androgen-regulated protein can be used to achieve kidney-specific expression (Ding et al., 1997). Signals for the termination and polyadenylation of transcripts are well known in the art. Examples include part of the 3' untranslated region of the bovine growth hormone gene or of the SV40 virus. Unique cloning sites (204) are introduced between the promoter sequences and the termination signal and are used to insert exogenous nucleic acids (205). To decrease the probability of cleaving the exogenous nucleic acids during their insertion process, these sites are generally recognized by or compatible with sites recognized by enzymes which infrequently cut DNA molecules (e.g. NotI, SalI).
Alternatively, the transcription unit, as schematized on Figure 2B, comprises a recombinase coding sequence (211) linked at its 5' end to a minimal promoter sequence (210) and at its 3' end to a transcription termination and polyadenylation sequence (203). The minimal promoter is typically approximately 30-40 nucleotides in length. Its only functional element is a "TATA box" about 30 nt upstream of the site of initiation of transcription. The minimal promoter sequence can be derived from naturally occurring genes (e.g. pro-opiomelanocortin (Therrien and Drouin, 1991)) or be entirely synthetic. The minimal promoter is chosen such that the level of expression of the recombinase in the screening host is insufficient to mediate efficient recombination of substrate.
Unique cloning sites (204) are present, generally immediately upstream of the minimal promoter, to insert an exogenous nucleic acid (205) to be tested for its transcriptional properties. It is understood that the minimal promoter can be omitted to screen for sequences containing complete transcriptional activity.
According to preferred embodiments of the present invention, the expression vector also comprises a recombinase substrate (309). In one embodiment, depicted on Figure 3A, the recombinase substrate (309) is composed of a stuffer region (302) containing a restriction site (R) flanked by recombination target sequences in the same orientation (301). Site-specific recombination (303) leads to removal of the stuffer and one recombination target sequence as well as disappearance of the restriction site. Recombined expression vectors (305) can be distinguished from non-recombined expression vectors (304) after digestion with restriction enzyme R and PCR amplification using primers located upstream and downstream (306, 307) of the site of recombination.
More preferably, the recombination target sequences (301) are in direct orientation relative to one another and are separated by a stuffer region (302) which comprises one or more rare restriction sites (R; e.g. NotI, PacI, SwaI).
Site-specific recombination (303) leads to removal of the stuffer and one recombination target sequence. Consequently, the rare restriction site is also deleted from the recombined molecule. Thus, unrecombined expression vectors (304) can be distinguished from recombined expression vectors (305) by their size and by the restriction patterns obtained after digestion by the enzyme cleaving at said rare restriction site. Ideally, the restriction site present in the stuffer region should be unique in the expression vector such that unrecombined expression vectors can be distinguished from recombined expression vectors by their sensitivity to the enzyme cleaving at the rare restriction site.
Furthermore, the restriction site should be rarely found in DNA molecules to decrease the probability of cleaving the exogenous nucleic acid, thereby allowing a region comprising the exogenous nucleic acid to be amplified by PCR from recombined vectors using primers located upstream and downstream (306, 307) of the recombination site.
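The digestion-then-PCR selection described for Figure 3A can be illustrated with a minimal Python sketch on plain strings. It assumes that the rare site in the stuffer is NotI (recognition sequence GCGGCCGC) and treats the vector as a linear top-strand string; the function `pcr_after_digest` and all primer, target and insert sequences are hypothetical values chosen for illustration. A template that is cut between the two primer positions is modeled as yielding no product.

```python
NOTI = "GCGGCCGC"  # NotI recognition sequence (palindromic, so one strand suffices)


def pcr_after_digest(template, upstream_primer, downstream_site, cut_site=NOTI):
    """Return the amplicon, or None if no product is expected.

    `upstream_primer` and `downstream_site` are given as top-strand
    sequences marking the two ends of the amplified region. If the
    restriction site lies between them, digestion severs the template
    and amplification across the region is modeled as failing.
    """
    start = template.find(upstream_primer)
    if start == -1:
        return None
    end = template.find(downstream_site, start + len(upstream_primer))
    if end == -1:
        return None
    amplicon = template[start:end + len(downstream_site)]
    return None if cut_site in amplicon else amplicon


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    upstream = "ACGTACGTAC"
    downstream = "TGCATGCATG"
    target = "GAAGTTCCTCTAGAAAGGAACTTC"
    stuffer = "TT" + NOTI + "AA"
    insert = "ATGAAACCCGGGTAA"

    unrecombined = "CC" + upstream + target + stuffer + target + insert + downstream + "GG"
    recombined = "CC" + upstream + target + insert + downstream + "GG"

    print(pcr_after_digest(unrecombined, upstream, downstream))
    print(pcr_after_digest(recombined, upstream, downstream))
```

The sketch also shows why the site should be rare: if the exogenous insert itself contained the recognition sequence, a recombined vector would be cut as well and would be lost from the selection.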
In another embodiment of the invention, depicted on Figure 3B, the recombinase substrate (309) is composed of a stuffer region (302) which is flanked by recombination target sequences in the same orientation (301), which disrupts a coding sequence conferring resistance to a given antibiotic (310), and which is expressed from a prokaryotic promoter (311). According to this embodiment, the vector is designed such that the remaining recombination target sequence (312) after site-specific recombination (303) no longer interferes with the production of protein conferring resistance to the antibiotic. Therefore, recombined expression vectors (314) give rise to colonies (316) when transformed into bacteria whereas non-recombined expression vectors (313) do not (315).
More preferably, the DNA segment which comprises recombination target sequences (301) in direct orientation relative to one another and separated by a stuffer region (302), is inserted in the expression vector within a gene conferring resistance to a given antibiotic (310) (e.g. aminoglycoside phosphotransferase conferring resistance to kanamycin) such that it disrupts its proper function. Disruption can be achieved by interrupting the coding sequence of said gene or by abolishing the expression of said gene through insertion of recombination target sequences and stuffer in essential promoter sequences or between promoter (311) and coding sequences. Site-specific recombination (303) will lead to removal of one recombination target sequence and the stuffer segment. According to this embodiment, the expression vector is designed such that the remaining recombination target sequence (312) no longer interferes with the proper function of the gene conferring resistance to a given antibiotic. Thus, bacteria transformed with recombined expression vectors (314) will be resistant to a given antibiotic (316) whereas those transformed with unrecombined expression vectors (313) will not (315). It is understood that a rare restriction site can be introduced in the stuffer segment, as described above, to distinguish recombined and unrecombined expression vectors after digestion with an enzyme recognizing such a rare restriction site.
In yet another embodiment according to the present invention, depicted on Figure 3C, the recombinase substrate (309) is composed of a stuffer region (302) flanked by recombination target sequences in the same orientation (301) which disrupts translation of a viral genome (321) in which a transcription unit (320) comprising an exogenous nucleic acid (205) has been embedded. The defective viral genome is expressed from eukaryotic promoter and enhancer elements (322). Signals for transcription termination and polyadenylation of transcripts are provided (203). According to this embodiment, the vector is designed such that the remaining recombination target sequence (312) after site-specific recombination (303) no longer interferes with the translation of the viral genome.

Therefore, recombined expression vectors (325) give rise to viral particles whereas non-recombined expression vectors (324) do not.
According to this preferred embodiment, the transcription unit and the associated exogenous nucleic acid (320) are embedded in a viral genome (321) whose translation and replication have been disrupted by insertion of a DNA segment comprising recombination target sequences (301) in direct orientation relative to one another and separated by a stuffer region (302). More preferably, the viral genome is a cDNA copy of a Sindbis virus replicon (see GenBank™ acc. No. NC_001547; and WO 02/16572 incorporated herein by reference) cloned into a DNA-based plasmid downstream of constitutively active enhancer and promoter sequences (322) and whose translation is disrupted by insertion of said DNA segment in the 5' untranslated region of the viral genome. Preferably, the enhancer and promoter sequences driving expression of the disrupted viral genome are different from those comprised in the transcription unit and driving expression of the exogenous nucleic acid. The viral genome contains the transcription unit and associated exogenous nucleic acid between the viral coding sequence and the 3' untranslated region (323). Once transfected, such a DNA plasmid will lead to expression of the exogenous nucleic acid (205). Site-specific recombination (303) will lead to removal of one recombination target sequence and the stuffer segment. According to this embodiment, the expression vector is designed such that the remaining recombination target sequence (312) no longer interferes with the translation and replication of the Sindbis virus replicon. Thus, only recombined expression vectors (325) produce self-replicating and self-packaging viral genomes that contain the exogenous nucleic acid whereas unrecombined expression vectors (324) do not.
Advantageously, the expression vector or part thereof is embedded in a viral genome to generate a viral-based expression vector that minimally contains 1) a transcription unit and 2) a recombinase substrate having at least one of the properties described above. Such a viral-based expression vector may then be packaged within infectious viral particles. These are particularly useful to deliver the expression vector into a whole organism or into cells that are difficult to transfect by conventional methods (e.g. primary cells, immortalized cell lines of hematopoietic origin). Engineered retroviruses and adenoviruses are commonly used to introduce nucleic acid into cells (Ragot et al., 1998; Pear et al., 1993).
Standard methodology may be used according to the invention to insert an expression vector in a viral genome and package the resulting viral genome within infectious viral particles. Preferably, the transcription unit and the recombinase substrate of the expression vector should be flanked by viral sequences that are either essential for replication and packaging or that are sufficient to insert components of the expression vector in a viral genome via homologous recombination.
iii) Production of a cell line or a transgenic animal in which the activity of the site-specific recombinase is regulated in a specific manner
The present invention relies on the conditional activity of a site-specific recombinase to select expression vectors containing nucleic acids having a desired feature such as those encoding a peptide/protein with a specific cellular function. It is understood that the recombinase activity should somehow be dependent on the occurrence of the specific cellular function. Before choosing a screening host (cell line or transgenic animal or plant), it is important to ascertain that 1) the specific cellular function does not occur in the absence of an "activating" exogenous nucleic acid, i.e. that the recombinase is not active under basal conditions; and 2) that the cellular function can occur if the right conditions are met, for example transfection of an expression vector containing an "activating" exogenous nucleic acid.
In one embodiment of the present invention, the specific cellular function being screened for is the activation of a particular gene or set of genes. In this case, the recombinase coding sequence is placed under the control of regulatory elements known to be responsible for the activation of this particular gene or set of genes. Thus, according to this embodiment, the recombinase will be expressed solely if an expression vector contains an exogenous nucleic acid that can activate transcription from the regulatory elements. Various regulatory elements have been described in the prior art. They are generally composed of repeats of synthetic oligonucleotides or relatively small gene fragments. They activate transcription under known conditions. For example, transcription from cyclic AMP response elements (CRE) is activated by increased intracellular cyclic AMP levels, a well-known second messenger to many hormones (Tamai et al., 1997).
As another example, transcription from a 1.7 kb fragment of the osteocalcin gene is activated upon osteoblast terminal differentiation (Baker et al., 1992).
Thus, regulatory elements can be operatively linked to the recombinase coding sequence to obtain a conditionally active form of the recombinase.
Transcription termination and polyadenylation signals are also added to the 3' end of the recombinase coding sequence. Another approach can be used to place the expression of a recombinase coding sequence under the control of specific regulatory elements. This is the so-called "knock-in" technique, whereby, according to a preferred embodiment of the present invention, the whole or part of the expressed sequence of a specific cellular gene is replaced by a recombinase coding sequence, or whereby a recombinase coding sequence is inserted into a specific cellular gene. The result of such replacement or insertion is that the expression of a recombinase coding sequence mimics the expression of a cellular gene. Methods to insert into or replace specific cellular genomic sequences are known in the prior art. By targeting a cellular gene known to be activated under the desired conditions (e.g. activation of a specific cellular pathway or cell differentiation), a recombinase coding sequence can be expressed solely under the desired conditions, thereby creating a conditionally-active form of the recombinase.
In another embodiment, the specific cellular function being screened for is the translocation of a signaling molecule from the cytosol to the nucleus. In this case, the recombinase is fused to the signaling molecule and the mRNA encoding the fusion protein is constitutively expressed from enhancer and promoter sequences. A number of signaling molecules are known to shuttle between the cytosol and the nucleus depending on the activation state of certain cellular pathways. For example, part of the NF-κB complex translocates to the nucleus upon activation of lymphocytes. Smad4 is another example of a signaling molecule that translocates to the nucleus as a result of TGFβ binding to its cognate receptor at the cell surface (Wrana and Attisano, 2000). Furthermore, it is well known that many nuclear receptors (e.g. glucocorticoid receptor) translocate to the nucleus upon ligand binding. In the absence of activation, the signaling molecule is retained in the cytosol, thereby leading to retention of the fused recombinase in the cytosol. Since the recombinase must be located in the nucleus to perform site-specific recombination, the recombinase is inactive when retained in the cytosol. Upon activation of the signaling molecule (i.e. a specific cellular pathway), the fusion protein translocates to the nucleus, where the recombinase moiety acts on the expression vector containing an exogenous nucleic acid whose expression triggered activation of the specific pathway.
Thus, a conditionally-active form of the recombinase can be obtained by fusion with certain signaling molecules.
In a further embodiment of the present invention, the specific cellular function being screened for is the stabilization of a particular messenger RNA. It is known in the prior art that certain messenger RNAs are unstable due to the fact that they contain one or more "destabilizing" sequences, usually located in their 3' untranslated region. Various mRNA-destabilizing sequences have been reported. To screen for nucleic acids whose expression leads to stabilization and therefore increased translation of specific mRNA, the recombinase coding sequence is fused to a specific "destabilizing" sequence. The chimeric mRNA is expressed from constitutively active enhancer and promoter sequences. However, the recombinase is not produced because of the instability of the chimeric mRNA. Thus, a conditionally-active recombinase can be obtained by inserting in its mRNA a chosen destabilizing sequence.
Methods to incorporate DNA segments into the genome of a cell are well known in the art. According to a preferred embodiment of the invention, the conditionally active recombinase sequence is inserted into a plasmid containing a gene conferring resistance to a selective agent (e.g. puromycin-N-acetyltransferase conferring resistance to puromycin). The resulting construct is transfected into cells using standard protocols (e.g. electroporation) and selection is applied. Surviving and growing cells are thought to have incorporated the plasmid and are cloned. Individual clones are analyzed by Southern blotting to confirm the presence of the construct within the genomic DNA. Ultimately, positive cellular clones are tested to determine whether site-specific recombination can be activated when the conditions initially set forth are met, for example when transcription from regulatory elements linked to the recombinase coding sequence is activated or when the fusion protein containing a recombinase moiety is translocated to the nucleus. Cellular clones containing a conditionally-active form of a recombinase are used as screening hosts.
Methods to incorporate DNA segments into the genome of an organism are also well known in the art. According to a preferred embodiment of the invention, the conditionally active recombinase sequence is inserted into a fertilized egg (e.g. of a mouse), which is re-implanted into a pseudo-pregnant mother. DNA extracted from resulting organisms (e.g. embryos, pups or adults) is analyzed by Southern blotting to determine whether the organism is transgenic. Positive animals are bred and used as screening hosts. Alternatively, the conditionally-active form of a recombinase can be incorporated in the genome of embryonic stem ("ES") cells (e.g. of mouse origin). The transgenic ES cells can be aggregated with morulae or injected into blastocysts to obtain chimeric animals. If the transgenic ES cells have populated the germline, then the resulting chimeric animal can be bred to obtain a line of transgenic animals, which can be used as screening hosts.
iv) Insertion of exogenous nucleic acids in the expression vector and production of libraries of recombinant expression vectors containing exogenous nucleic acids
The exogenous nucleic acid may be derived from any source, i.e. any organism, tissue or cell type, disease state, etc. In one embodiment of the invention, a plurality of different nucleic acids is inserted into a plurality of copies of an expression vector to provide a plurality of recombinant expression vectors each expressing a unique exogenous nucleic acid and/or encoding a unique protein or peptide. Alternatively, a nucleic acid encoding one particular exogenous protein or peptide may be inserted into the expression vector.
Preferably, the exogenous nucleic acid is derived from a nucleic acid library and a plurality of exogenous nucleic acids are inserted into multiple expression vector copies to yield a pool of recombinant expression vectors.
The library may be obtained from a tissue or a cell type of interest or synthesized artificially. This library may be a cDNA library, a genomic library, an RNA library, an expressed sequence tag library, a library made of randomized artificial sequences, or any other kind of library comprising nucleic acids from any kind of organism, tissue, or cell type known to the skilled artisan. Preferably the library is derived from a mammalian source. However, the library may also be derived from reptilian, amphibian, avian, insect, plant, fungal or bacterial cells, etc. The exogenous nucleic acid may be derived from mRNA isolated from a tissue or cell type of interest. In this case, the mRNA would be purified and reverse transcribed into cDNA using methods well known in the art. In some instances, the nucleic acid library will be derived from a subtractive library, for example a library which comprises cDNAs differentially expressed in a disease state when compared to the corresponding healthy tissue. Suitable nucleic acid libraries may be generated using standard methods (see for example Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor (1989)).
Although exogenous nucleic acids of any type can be screened and selected using the present invention, examples given below rely on cDNA, fragments of cDNA or fragments of genomic DNA as a source of exogenous nucleic acids. In the case of fragments of cDNAs, initiation and termination codons may be provided by the expression vector upstream and downstream of the cloning site(s) for the fragments of cDNAs, respectively. In the case of fragments of genomic DNA, a library is made starting either with whole genomic DNA or DNA insert(s) from λ bacteriophage, cosmid or bacterial artificial chromosome containing genomic DNA. Exogenous nucleic acids are generated by partial digestion of the DNA with a restriction enzyme cutting DNA frequently (e.g. Sau3A, RsaI) and can be size-selected by sieve chromatography (e.g. Sepharose™ CL2B column).
Preferably, the exogenous nucleic acids are cloned into the expression vectors to produce recombinant expression vectors. The resulting population of recombinant expression vectors is transformed into Escherichia coli by electroporation according to standard procedures. A typical yield is 5x10⁵ to 5x10⁷ transformants/µg of cDNA depending on the expression vector. A person skilled in the art will understand that the required number of individual transformants depends on the predicted abundance of exogenous nucleic acids having the desired feature within the starting population of exogenous nucleic acids. Plasmids may be prepared and purified according to standard procedures.
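The dependence of the required number of transformants on predicted abundance can be illustrated with a standard sampling argument. The sketch below is an assumption added for illustration, not a calculation taken from the present description: it supposes that transformants are independent draws from the starting population and that nucleic acids with the desired feature occur at a frequency `frequency`. The function name and the default confidence level are hypothetical.

```python
import math


def transformants_needed(frequency: float, confidence: float = 0.99) -> int:
    """Number of independent transformants needed so that at least one
    carries a nucleic acid present at the given frequency, with the given
    probability. Solves 1 - (1 - frequency)**N >= confidence for N."""
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


if __name__ == "__main__":
    # Hypothetical abundances, from fairly common to rare.
    for freq in (1e-3, 1e-4, 1e-5, 1e-6):
        print(f"frequency {freq:g}: {transformants_needed(freq)} transformants")
```

Under these assumptions the required number scales roughly as the inverse of the abundance, so a tenfold rarer nucleic acid calls for about ten times more transformants.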
Additional steps may be needed to obtain a population of viral-based expression vectors. In the case of retroviruses, plasmids comprise viral sequences essential for replication and packaging as well as components of an expression vector. The population of plasmids is transfected into a cell line expressing the viral proteins necessary for replication and packaging to generate a plurality of recombinant viral genomes that are subsequently packaged. Thus, a plurality of retroviral-based expression vectors is obtained. In the case of adenoviruses, plasmids comprise parts of the viral genome separated by the components of an expression vector. The population of plasmids is transfected, along with a replication-defective viral genome, into a cell line that can complement the replication defect, usually HEK293 cells. Homologous recombination between a plasmid and a viral genome generates a recombinant viral genome into which the components of the expression vector have been inserted. This recombinant viral genome is subsequently packaged and can be propagated in HEK293 cells. A plurality of recombinant viral genomes is thus produced and packaged to obtain a plurality of adenoviral-based expression vectors.
v) Insertion of the expression vector into a suitable host and recombination of expression vector comprising a heterologous nucleic acid encoding the desired function
As outlined above, a suitable host should be able to perform the cellular function being screened for, but it should not exhibit this function in the absence of an "activating" condition (e.g. expression of an appropriate exogenous nucleic acid). If screening is performed using viral-based expression vectors, it is necessary that the host be infected with the recombinant viral particles. In preferred embodiments of the present invention, the genome of the host should also harbor a conditionally-active form of a recombinase.

Introduction of a recombinant expression vector into a eukaryotic host cell can be carried out using a number of different well known procedures. Transfection by electroporation, lipofection, calcium phosphate, and micro-injection are only a few of the available techniques to introduce nucleic acids into eukaryotic cells. Introduction of recombinant viral-based expression vectors into eukaryotic host cells is simply carried out by incubating the host with the viral particles and allowing infection to proceed. In order to ensure that a recombinant viral-based expression vector is introduced in almost every cell, infection is usually performed at a multiplicity of infection (m.o.i.) greater than 1 (e.g.

plaque-forming units/cell). Introduction of a recombinant expression vector into a transgenic animal or plant or part thereof can be carried out by electroporation or by injection of complexes comprising lipid derivatives and DNA (e.g. intravenous or peritoneal injections). Introduction of a recombinant viral-based expression vector into a transgenic organism is performed by injection of viral particles, e.g. in the case of recombinant adenoviruses, 10⁸ plaque forming units in 0.05 ml of saline intraperitoneally (Mittal et al., 1993).
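The reason an m.o.i. greater than 1 is needed to reach almost every cell can be illustrated with the Poisson model commonly used for viral infection. This model is an assumption introduced here for illustration and is not stated in the present description; the function names are hypothetical.

```python
import math


def fraction_infected(moi: float) -> float:
    """Fraction of cells receiving at least one viral particle, assuming
    particles are distributed over cells according to a Poisson law."""
    if moi < 0:
        raise ValueError("m.o.i. cannot be negative")
    return 1.0 - math.exp(-moi)


def moi_for_fraction(target: float) -> float:
    """m.o.i. required so that the given fraction of cells is infected
    under the same Poisson assumption."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    return -math.log(1.0 - target)


if __name__ == "__main__":
    for moi in (0.5, 1, 2, 5, 10):
        print(f"m.o.i. {moi}: {fraction_infected(moi):.4f} of cells infected")
    print(f"m.o.i. for 99% of cells: {moi_for_fraction(0.99):.2f}")
```

Under this model an m.o.i. of exactly 1 leaves more than a third of the cells uninfected, which is consistent with the use of a higher multiplicity.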
Once the expression vector or viral-based expression vector has been introduced into the host, the transfected or infected cell should provide most of the molecular machinery for the proper expression and/or function of the exogenous nucleic acid contained therein. The biological function encoded by the exogenous nucleic acid, if any, is carried out directly or through the corresponding protein or peptide if it contains an open reading frame. If this biological function somehow triggers the activity of the conditionally-active recombinase present in the host (e.g. by activating its expression, by inducing its translocation into the nucleus, by stabilizing its mRNA), then the expression vector or viral-based expression vector containing said exogenous nucleic acid will be recombined.
vi) Recovery and identification of recombined expression vector
After recombination has taken place, genomic and extrachromosomal DNA are extracted using standard techniques to recover a pool of expression vectors. Typically, this step is done 24 to 72 hours after introduction into the host of the expression vector having incorporated an appropriate exogenous nucleic acid. This step generally involves lysis of cells using a buffer containing ionic detergent (e.g. 1% SDS) followed by digestion of proteins using proteinase K and purification of DNA by ion-exchange chromatography or phenol extraction and ethanol precipitation. In the case of expression vectors smaller than approximately 12 kb, it may be preferable to extract specifically extrachromosomal DNA using a modified Hirt procedure involving lysis of cells by ionic detergents (e.g. 1.2% SDS) followed by precipitation of the genomic DNA (e.g. using KOAc 3M, pH 5) and purification of extrachromosomal DNA by ion-exchange chromatography (e.g. QIAQUICK™ column from Qiagen Inc.).
In one embodiment of the present invention, activation of a site-specific recombinase results in the modification of the expression vector, this modification leading to the removal of a unique restriction site contained in the expression vector. Thus, unrecombined expression vectors can be cut by the enzyme recognizing the restriction site whereas recombined expression vectors can not.
Since it is known that cleaved vectors are much less efficiently transformed in bacteria than uncleaved vectors, this property may be used to identify expression vectors that have been recombined. According to a preferred embodiment of the invention the method to identify recombined expression vectors comprises the steps of:
a) extracting DNA from cells into which the expression vector(s) with an exogenous nucleic acid sequence has been introduced;
b) digesting the DNA of step a) with a restriction enzyme recognizing a restriction site present in the stuffer region of the expression vector, between recombination target sequences;
c) transforming bacteria with the digested DNA molecules of step b);
d) culturing the transformed bacteria in a selection media (e.g. media with an antibiotic such as ampicillin) so as to select for bacterial colonies resistant to the selection media by virtue of having been transformed by an expression vector;
e) extracting the expression vector molecules from bacterial colonies selected in step d); and
f) identifying exogenous nucleic acid(s) found in the expression vectors extracted at step e).
To further increase the specificity of recovery of expression vectors at step e) herein above, it may be preferable to degrade cleaved unrecombined expression vectors (step b) with a nuclease that acts on extremities of double strand DNA (e.g. lambda exonuclease). Circular uncleaved recombined molecules are protected from the action of such nucleases.
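The logic of steps a) to f) above can be sketched in silico. The following is a toy model under stated assumptions, not the procedure of the invention: `TARGET` is a made-up stand-in for a recombination target sequence, the NotI recognition sequence is an arbitrary choice of unique restriction site (the description does not name the enzyme at this point), and all vector sequences and function names are hypothetical. Recombination is modeled as deletion of the stuffer and one target copy, and digestion as a simple search for the site on a circular molecule.

```python
# Toy sequences only: TARGET is a made-up placeholder for a recombination
# target site, and NotI is an arbitrary illustrative restriction enzyme.
TARGET = "GAAGTTCCTATTC"
NOTI_SITE = "GCGGCCGC"


def make_vector(insert: str) -> str:
    """Circular vector written linearly: backbone, exogenous insert,
    target, stuffer carrying the restriction site, target, backbone."""
    stuffer = "CC" + NOTI_SITE + "GG"
    return "ATGCATGCAA" + insert + TARGET + stuffer + TARGET + "TTAACC"


def excise(vector: str, target: str = TARGET) -> str:
    """Model site-specific recombination: remove the region between two
    directly repeated targets, leaving a single target copy behind."""
    first = vector.find(target)
    if first == -1:
        return vector
    second = vector.find(target, first + len(target))
    if second == -1:
        return vector
    return vector[:first] + vector[second:]


def is_cut(vector: str, site: str = NOTI_SITE) -> bool:
    """True if the site occurs anywhere on the circular molecule,
    including across the point where the linear string wraps around."""
    wrapped = vector + vector[: len(site) - 1]
    return site in wrapped


def recover_uncut(pool: dict) -> list:
    """Names of vectors that escape digestion and would therefore be
    transformed efficiently into bacteria (steps b to d)."""
    return sorted(name for name, seq in pool.items() if not is_cut(seq))


if __name__ == "__main__":
    pool = {
        # insert_A is assumed to have triggered the recombinase in its host cell.
        "insert_A": excise(make_vector("ACGTACGT")),
        # insert_B did not, so its vector still carries the stuffer.
        "insert_B": make_vector("TTTTCCCC"),
    }
    print(recover_uncut(pool))
```

The sketch also makes one practical limitation visible: an exogenous nucleic acid that itself contains the chosen restriction site would be cut and lost even if its vector had been recombined.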
According to another embodiment of the invention, activation of a site specific recombinase results in the modification of the expression vector, this modification leading to the reconstitution of a sequence encoding a peptide/protein conferring a resistance to a suppressive condition, such as resistance to a given antibiotic (e.g. aminoglycoside phosphotransferase conferring resistance to kanamycin). Hence, according to an embodiment of the invention, the method to identify recombined expression vectors comprises the steps of:
a) extracting DNA from host(s) into which the expression vector with an exogenous nucleic acid sequence has been introduced;
b) transforming bacteria with the DNA molecules extracted at step a);
c) selecting for bacterial colonies resistant to a selection media (e.g. culture media with an antibiotic such as kanamycin) by virtue of having been transformed by a recombined expression vector;
d) extracting the expression vector molecules from bacterial colonies selected at step c); and
e) identifying the exogenous nucleic acid(s) found in the expression vectors from step d).
It is also possible to use the polymerase chain reaction (PCR) to identify exogenous nucleic acids inserted into expression vectors which have been recombined. This is particularly useful when screening is performed with viral-based expression vectors. According to an embodiment of the invention, identification of exogenous nucleic acids inserted into expression vectors is achieved using a PCR-based method which comprises the steps of:
a) extracting DNA from host(s) into which the expression vector with an exogenous nucleic acid sequence has been introduced;
b) digesting the DNA of step a) with a restriction enzyme recognizing a site present in the stuffer region of the expression vector, between recombination target sequences;
c) amplifying a DNA fragment ("amplicon") from the expression vector using a forward primer located upstream of the site of insertion of the exogenous nucleic acid(s) and a reverse primer located downstream of the recombinase substrate; and
d) identifying exogenous nucleic acid(s) found in the amplicon obtained from step c).
To further increase the specificity of the PCR reaction, it may be preferable to degrade cleaved unrecombined expression vectors (step b) with a nuclease that acts on extremities of double strand DNA (e.g. lambda exonuclease).
Uncleaved circular recombined molecules are protected from the action of such nucleases. In the case of viral-based expression vector derived from some linear DNA viruses (e.g. adenoviruses), cleaved unrecombined expression vectors can be degraded by a nuclease that acts specifically on free 5' extremities of double strand DNA (e.g. bacteriophage lambda exonuclease VII). Since the 5' extremities of adenovirus derivatives are covalently linked to a protein moiety, uncleaved linear recombined molecules are protected from the action of such nucleases.
The amplicon can be cloned in a general purpose bacterial plasmid (e.g. pBLUESCRIPT™ KS II). If recombination has resulted in the reconstitution of a sequence encoding resistance to a given antibiotic (e.g. neomycin phosphotransferase conferring resistance to kanamycin) and if the amplicon contains the whole antibiotic resistance coding sequence, then bacteria transformed with a plasmid harboring the amplicon may be preferably selected on medium containing an appropriate antibiotic, thereby ensuring that only amplicons derived from recombined molecules are cloned.
According to a third embodiment of the present invention, activation of a site-specific recombinase results in the modification of the expression vector, this modification leading to the removal from the expression vector of a stuffer region preventing the replication and translation of a cDNA copy of the Sindbis virus genome that comprises the exogenous nucleic acid. Preferably the exogenous nucleic acid is inserted immediately upstream of the 3' untranslated region of the cDNA copy of the Sindbis virus genome and it is expressed from enhancer and promoter sequences preferably also embedded in the cDNA copy of the Sindbis virus genome. Thus, recombined expression vectors produce self-replicating and self-packaging viral genomes that contain the exogenous nucleic acid whereas unrecombined expression vectors do not. Viral particles are therefore produced only in cells in which an exogenous nucleic acid encoding a desired function has been expressed. Because the exogenous nucleic acid and, generally, the promoter and enhancer sequences, are embedded in the cDNA copy of the viral genome, the viral particles that are produced after recombination contain a copy of the desired exogenous nucleic acid. Viral particles are collected i) from the culture medium of host cells transfected with a library of expression vectors, or ii) from extracellular fluids (e.g. blood, lymph) or whole tissues if the screening host is a transgenic organism. Viral particles can be infectious to allow the propagation of the viral genome comprising desired exogenous nucleic acids. Viral genomes can be recovered from infectious viral particles by infecting a susceptible cell line (e.g. BHK-21 in the case of recombinant Sindbis viral particles) and extracting nucleic acids from infected cells (e.g. RNA in the case of recombinant Sindbis virus). A DNA fragment containing the exogenous nucleic acid can be obtained by PCR (after reverse transcription of RNA in the case of Sindbis virus) using primers located upstream and downstream of the site of insertion of the exogenous nucleic acid. U.S. application No 09/641,931 of the present inventors (incorporated herein by reference) contains numerous details and explanations on the construction and use of recombinant Sindbis viral particles and genomes.
Alternatively, viral particles produced after recombination of expression vectors can be conditionally infectious to prevent unwanted effects of a viral infection on the screening host, particularly in the case of transgenic animal screening hosts. It is known that conditionally-infectious Sindbis virus particles can be obtained by preventing the cleavage of the p62 envelope precursor protein, usually by introducing a deleterious mutation in the sequence coding for the cleavage site (Berglund et al., 1993). Viral particles produced under these conditions or from such a modified Sindbis virus genome can be recovered and partially purified (e.g. by centrifugation on density gradient or by heparin-agarose affinity chromatography). Alternatively, viral particles produced from p62 cleavage deficient mutants can be rendered infectious after recovery from screening host by controlled digestion with chymotrypsin and used to infect a susceptible cell line (e.g. BHK-21 fibroblasts).
The identity of the exogenous nucleic acid inserted into a recombined expression vector can be determined by sequencing appropriate region(s) of the plasmids recovered from bacteria or by sequencing appropriate region(s) of the DNA fragment comprising the exogenous nucleic acid and obtained, for example, by PCR amplification. Sequence comparisons with known polynucleotide sequences in databases may confirm the function of the isolated exogenous nucleic acid and/or reveal homologies with nucleic acids encoding known functions. The exogenous nucleic acids inserted into recombined expression vectors can also be i) analyzed by digestion with restriction enzymes followed by gel electrophoresis; ii) used as hybridization probe(s) in expression profiling or microarray analysis; iii) otherwise characterized.
vii) Applications of the identified exogenous nucleic acids having a desired property
The exogenous nucleic acids selected and identified according to the methods of the invention, as well as the peptides and proteins encoded by the same, may have many uses. They may be useful for research applications and laboratory use. For instance, they may be used for further screening procedures, e.g. as a library; they may serve as probes for the discovery and isolation of various genes and/or diseases; they may be used for the production of antibodies; or they may be used for the development and the use of oligonucleotide or oligoribonucleotide sequences, antisense DNA or RNA molecules, or ribozymes. Some of the genes and gene products identified and isolated by the method of the present invention may directly be used as therapeutic agents or, alternatively, as therapeutic targets. These applications and others are known in the art as well as the manner in which they can be reduced to practice.
EXAMPLES
As it will now be demonstrated by way of examples hereinafter, the invention provides a very rapid, efficient and accurate method to select a particular nucleic acid having a desired feature. Example 1 shows the properties of a Flp recombinase whose coding sequence has been partially optimized (oFlp). Example 2 gives an example of a plasmid-based expression vector, a cell line in which the activity of oFlp is regulated in a specific manner, and methods to selectively recover recombined forms of plasmid-based expression vectors.
Example 3 gives an example of a viral-based expression vector and a method to selectively recover its recombined form after infection of cells constitutively expressing an active form of oFlp. Example 4 gives an example of a vector carrying a cDNA copy of the Sindbis virus genome inactivated through insertion of a recombinase substrate and a method to recover viral particles after recombination of this vector. Example 5 gives a hypothetical example of a screening performed in a transgenic animal using a virus-based expression vector. Example 6 gives a hypothetical example of an in vivo screening performed to identify tissue-specific regulatory elements.
Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.
A) Materials and methods
The following are experimental procedures and materials that were used for the examples set forth below.
Enzymes and reagents
Restriction enzymes and DNA-modifying enzymes were purchased from New England Biolabs (Cambridge, Ma.) unless otherwise stated. TITAN™ one-tube RT-PCR system was purchased from Roche Molecular (Laval, Quebec, Canada). Taq DNA polymerase was purchased from Amersham Pharmacia Biotech (Baie d'Urfe, Quebec, Canada). Synthetic oligonucleotides were obtained either from Hukabel Ltd. (Montreal, Quebec, Canada), Life Technologies (Burlington, Ontario, Canada) or MWG Biotech Inc. (High Point, North Carolina). Cell culture reagents were from Life Technologies unless otherwise stated.
Plasmids
PQuantox™, pQB125fc3™ and pQBIAdBN™ were purchased from Quantum Biotechnologies, Montreal, Canada. pREP4 (GenBank™ accession number A25856) and pQE30™ were from Qiagen Inc. pBluescript II™ SK (+) was from Stratagene (California). DH-BB, pSinRep5 and pcDNA1.1/Amp were from Invitrogen (Carlsbad, Ca.).
Cloning of Flp recombinase and synthesis of a partially optimized coding sequence
The sequence of Flp recombinase (SEQ ID NO:4) was amplified by 30 cycles of PCR from approximately 200 ng of yeast DNA using 25 pmoles of forward primer 20-5 (SEQ ID NO:29), 25 pmoles of reverse primer 23-1 (SEQ ID NO:30) and 1 U of high-fidelity Vent DNA polymerase (New England Biolabs, Ma.) in 50 µl of 1x Vent reaction buffer supplemented with 3% DMSO and 200 µM dNTP. Comparison of the sequence of our clone with published Flp sequence (GenBank™ accession number J01347) revealed no difference. pCMVneo was derived from pQB125fc3™ (Quantum Biotechnologies, Montreal, Canada) by deletion of a 758 bp SacII-ApaI fragment. The Flp PCR fragment was cloned at the NruI site of expression vector pCMVneo in order to achieve production of the Flp recombinase in transfected cells. The resulting plasmid is designated RC6. A 1654 bp BsmI-DraIII fragment was deleted from RC6 to generate RC26. Finally, a filled-in BamHI/PvuII 1392 bp fragment from pPur™ (BD Clontech, Ca.) was cloned at a NaeI site of RC26 to generate RC33. Oligonucleotide-mediated gene synthesis was used to optimize the 5' third of Flp coding sequence. Briefly, 35 pmoles of oligonucleotides 89-1 (SEQ ID NO:31), 86-1 (SEQ ID NO:32), 83-1 (SEQ ID NO:33), 82-1 (SEQ ID NO:34), 80-1 (SEQ ID NO:35) and 74-1 (SEQ ID NO:36) were phosphorylated and annealed in 70 µl of 10mM Tris-HCl pH 7.5/100mM NaCl/1mM EDTA by heating at 85°C for 10 minutes and decreasing the temperature at a rate of 1°C/minute. Gaps were filled using 3 U of polymerase and extremities were ligated using 1 Weiss U of T4 DNA ligase. The resulting 391 bp fragment was isolated by electrophoresis on a 2% agarose gel, purified using the QiaQuick™ kit (Qiagen) and amplified by 25 cycles of PCR using 1 U Vent DNA polymerase and 25 pmoles of primers 24-6 (SEQ ID NO:37) and 18-88 (SEQ ID NO:38) in 90 µl of ThermoPol™ 1x buffer containing 6% dimethylsulfoxide and 200 µM dNTP. The PCR product was digested by BamHI and EcoRV and inserted into a BamHI-EcoRV digested RC33. The resulting plasmid is designated RC59.
Cell culture and transfection
HEK293A cells (ATCC no. CRL-1575) are grown in Dulbecco's minimal essential medium supplemented with 10% (v/v) fetal bovine serum, 100 U/ml penicillin and 100 mg/ml streptomycin. Cells are passaged when reaching 80-95% confluence by incubating with 0.05% (v/v) trypsin/0.5mM EDTA (Wisent Inc.). Lipofection is performed as follows. Lipid:DNA complexes are formed in 100 µl of culture medium without serum using 3 µl of 1 mg/ml PEI (Sigma, St Louis) per µg of DNA. Cells are transfected the day after plating (typically 10,000 cells/cm²) by adding the lipid:DNA complex to the culture medium. After a 3 hour incubation, the medium is changed and cells are usually processed after 48 hours.
Production of recombinant proteins and polyclonal antibodies
pQE30™ (Qiagen, Mississauga, Ontario, Canada) contains an origin of replication, the β-lactamase coding sequence, and the taq promoter controlling the expression of a given fusion protein containing 6 histidines at its N-terminus. An 865 bp HindIII fragment from plasmid RC6 was subcloned in HindIII-digested pQE30™. The hexahistidine tag coordinates a nickel atom, thereby allowing purification of the fusion protein by metal affinity chromatography. pQE30™ containing Flp~-~86 described above is transformed in strain M15[pREP4]. The fusion protein is produced and purified under denaturing conditions (6M guanidine hydrochloride, 200mM NaCl, 100mM sodium phosphate, 10mM Tris pH 8.0, 2mM imidazole, 5mM β-mercaptoethanol) according to the manufacturer's instructions (QIAEXPRESSIONIST™ kit, Qiagen, Mississauga, Ontario, Canada). The protein solution is dialyzed at 4°C against 4 liters of PBS. Approximately 200 µg of recombinant protein mixed with complete Freund's adjuvant (VWR Canlab, Montreal, Quebec, Canada) is injected subcutaneously into New Zealand White rabbits on day 1. On days 15 and 28, another 100 µg of recombinant protein mixed with incomplete Freund's adjuvant is similarly injected. Rabbits are bled 7 days after the last injection.
Nuclear extracts, Western blotting, immunofluorescence
For immunofluorescence, cells are rinsed twice with PBS and fixed with 2% (w/v) paraformaldehyde in PBS. Cells are washed with PBS and fixative is quenched by incubating 10 minutes in PBS supplemented with 50mM NH4Cl. Cells are then incubated overnight in PBS supplemented with 1% (w/v) bovine serum albumin fraction V (BSA), 0.1% (w/v) dried low fat milk and 0.05% (v/v) Triton X-100™. Cells are incubated in a 1/25 dilution of antiserum in PBS/BSA 0.1%/milk 0.1%, washed and incubated with anti-rabbit IgG coupled to TRITC.
To prepare crude nuclear extracts, cells are rinsed with PBS and collected in PBS/1mM EDTA. Cells are incubated for 15 minutes on ice in 0.8 ml of buffer A (10mM HEPES pH 7.9; 10mM KCl; 0.1mM EDTA; 1mM DTT; 10 µg/ml aprotinin). Igepal CA-630™ (50 µl of a 10% (v/v) solution) is added, the solution is briefly vortexed and centrifuged. The nuclear pellet is incubated for 15 minutes on ice in 0.05 ml of buffer B (20mM HEPES pH 7.9; 400mM NaCl; 1mM EDTA; 1mM DTT; 10 µg/ml aprotinin). The solution is centrifuged and the insoluble pellet is resuspended in Laemmli buffer (50mM Tris-HCl, pH 6.8, 100mM dithiothreitol, 2% sodium dodecyl sulfate (w/v), 0.1% bromophenol blue (w/v), 10% glycerol (v/v)) and boiled for 5 minutes. Proteins are electrophoresed on denaturing polyacrylamide gel and transferred to 0.22 µm nitrocellulose according to standard protocols. The nitrocellulose membrane is incubated overnight in Tris-buffered saline (TBS; 25mM Tris-HCl, pH 7.4, 137mM NaCl, 2.7mM KCl) supplemented with 5% (w/v) dried milk and 0.1% (v/v) TWEEN-20™ (Sigma, St. Louis, Mo.). It is then incubated for 1.5 hours at room temperature with affinity-purified antibody to Flp at a concentration of approximately 3 µg/ml in TBS supplemented with 0.1% (w/v) dried milk and 0.1% (v/v) TWEEN-20™. The membrane is washed twice with TBS supplemented with 0.1% (v/v) TWEEN-20™. It is then incubated for 1 hour at room temperature with goat anti-rabbit coupled to horseradish peroxidase (Sigma, St. Louis, Mo.) diluted 1/30,000 in TBS supplemented with 0.1% (v/v) TWEEN-20™. The membrane is washed twice with TBS supplemented with 0.1% (v/v) TWEEN-20™. Detection of the protein bound to the antibody complex is performed with the ECL™ reagent according to the manufacturer's instructions (Amersham Pharmacia Biotech, Baie d'Urfe, Canada).
RNA extraction and Northern analysis
Total RNA is purified either by the guanidinium isothiocyanate/acid phenol method or using the RNEASY™ kit according to the manufacturer's instructions (Qiagen). For Northern analysis, 2.5 µg of total RNA was electrophoresed on 1.2% agarose/1.2% formaldehyde gel and transferred onto nylon membrane by capillarity. After UV crosslinking, the blot was probed with a radioactively-labeled full length wild type Flp fragment. After hybridization, the membrane was rinsed with 2xSSC/0.1% SDS and washed at 65°C once with 2xSSC/0.1% SDS and twice with 0.2xSSC/0.1% SDS. Signal was revealed by autoradiography.
Extraction of genomic and extrachromosomal DNA. Polymerase chain reaction.
Cells were washed twice with PBS and collected using PBS supplemented with 1 mM EDTA. Cells were centrifuged for 5 minutes at 3000 g. Total nuclear DNA was extracted and purified using the Qiagen DNeasy™ tissue kit. Extrachromosomal DNA was extracted using a modified Hirt procedure as follows (Arid, 1998). Cell pellet was resuspended in 250 µl of buffer A (50 mM Tris-HCl pH 7.5/10mM EDTA/100 µg/ml RNAse A). Cells were incubated for 5 minutes at room temperature in 250 µl lysis buffer (1.2% w/v sodium dodecyl sulfate). Cellular debris and chromosomal DNA were precipitated 15 minutes on ice using 350 µl of buffer B (3M cesium chloride/1 M potassium acetate/0.67M glacial acetic acid). After centrifugation, DNA was purified from the supernatant by using a QIAquick™ kit (Qiagen). Concentration of DNA solutions was determined by fluorimetry. PCR was typically performed for 30-40 cycles on 100 ng of DNA using the Expand™ enzyme mix (Roche Molecular, Laval, Canada), 25 pmoles of each primer, 2% (v/v) DMSO, 200 µM dNTP in a 1x buffer supplied by the manufacturer.
B) Example 1: Partially optimized coding sequence for the Flp recombinase
The Flp recombinase was chosen for this and subsequent experiments. This example illustrates the properties of a partially optimized coding sequence of the Flp recombinase. Analysis of the Flp coding sequence (SEQ ID NO:4) indicated that the 5' third of the sequence contained a number of codons rarely found in mammalian genes and presumably poorly translated in cells of mammalian origin. Most notably, 3 ATA and 5 TTA or CTA codons, encoding isoleucine (Ile) and leucine (Leu) respectively, are present in the yeast sequence but are the least preferred codons in mammals. Furthermore, the 5' third of the yeast Flp coding sequence is AT-rich (64%) and contains a putative site of transcription termination (AATAAA, position 220). Changes were therefore introduced in the first 345 base pairs of the Flp coding sequence to remove the putative sites of transcription termination and to replace ATA (Ile) and TTA or CTA (Leu) codons by ATC (most frequent Ile codon in mammalian cells) and CTG or CTC (most frequent Leu codons in mammalian cells), respectively. Furthermore, mutations were introduced to substitute a serine for proline at position 2, a serine for a leucine at position 33 and a serine for a leucine at position 108. It has been reported in the prior art that these mutations enhance the thermostability of Flp (Buchholz et al., 1998). The Flp sequence was optimized by an oligonucleotide-mediated gene synthesis technique (see Materials and Methods).
Alignment of wild-type Flp (SEQ ID NOS:4 and 5) and optimized Flp sequences (referred to hereinafter as oFlp; SEQ ID NOS:6 and 7) is shown in Figure 4A.
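The codon replacement rule described in this example can be illustrated with a minimal sketch. It assumes an in-frame coding sequence and always picks CTG for leucine, although the text allows either CTG or CTC. The function names and the toy sequence are hypothetical and are not taken from SEQ ID NO:4 or SEQ ID NO:6. The sketch does not model the removal of the AATAAA site or the amino acid substitutions.

```python
# Rare codons named in the text and the preferred mammalian codons replacing them.
# Assumption: CTG is chosen for Leu (the text permits CTG or CTC).
RARE_TO_PREFERRED = {"ATA": "ATC", "TTA": "CTG", "CTA": "CTG"}


def optimize_5prime(cds: str, window_bp: int = 345) -> str:
    """Replace rare Ile/Leu codons within the first `window_bp` of an in-frame CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_window = window_bp // 3  # number of codons covered by the window
    return "".join(
        RARE_TO_PREFERRED.get(codon, codon) if idx < n_window else codon
        for idx, codon in enumerate(codons)
    )


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


toy = "ATGATATTACTAAAA"  # toy sequence for illustration only, not Flp
print(optimize_5prime(toy))               # ATGATCCTGCTGAAA
print(optimize_5prime(toy, window_bp=6))  # ATGATCTTACTAAAA
```

Restricting the substitutions to a window mirrors the partial optimization described above, in which only the first 345 base pairs were changed.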
To assess the combined effect of codon optimization and thermostability-increasing mutations on the levels of Flp produced, wild type Flp (SEQ ID NO:4) and oFlp (SEQ ID NO:6) sequences were inserted into CMV-based expression vectors to obtain RC33 and RC59, respectively (see Materials and Methods). Recombinant expression vectors (2 µg) were transfected into HEK293A cells by lipofection. Total RNA was extracted from cells 24 hours after transfection and nuclear protein extracts were prepared 48 hours after transfection to evaluate both the transcript and protein levels of the two forms of Flp. As shown by Northern analysis (Figure 4B), transcript levels of optimized Flp (lane 412) were approximately 20-fold higher than those of wild type Flp (lane 411), even though both were expressed from the same promoter/enhancer elements (derived from CMV) and transfection efficiencies were considered similar in both cases, as judged by the equal number of fluorescent cells after co-transfection of Flp-expressing vectors with a vector expressing green fluorescent protein as a marker. No Flp transcript is detected in cells transfected only with a vector expressing green fluorescent protein (lane 410). Flp protein levels were determined by Western analysis (Figure 4C). Using this technique, Flp is undetectable when expressed as a wild type coding sequence (lane 421) but a robust signal is detected when Flp is expressed as an optimized coding sequence (lane 422). As expected, no Flp is detected in cells transfected only with a vector expressing green fluorescent protein (lane 420). Taken together, these results indicate that both transcript and protein levels of Flp are greatly increased by expressing an optimized coding sequence.
The increased amount of Flp produced from an optimized coding sequence can be useful in a screening experiment. Indeed, if a regulatory element linked to a Flp coding sequence is activated by a given stimulus, then more Flp transcript and protein shall be produced if the regulatory element is linked to an optimized Flp coding sequence rather than to a wild type Flp coding sequence, as was shown for the CMV enhancer/promoter elements. This may help to achieve higher sensitivity, particularly if transcription from the regulatory element is weakly activated by the stimulus of interest.

C) Example 2: Plasmid-based expression vector, cell line in which the activity of oFlp is regulated in a specific manner, and methods to selectively recover recombined forms of plasmid-based expression vectors.
This example illustrates the various functionalities of a plasmid-based expression vector designed according to the present invention. It also shows that an expression vector can be selected after transfection in an engineered cell line, provided the vector expresses an exogenous nucleic acid capable of triggering Flp activity. Plasmid construction is schematized on Figure 5. Oligonucleotides 62-2 (SEQ ID NO:9) and 62-3 (SEQ ID NO:10) (500) were annealed and the protruding extremities were blunted by the Klenow fragment of DNA polymerase I (501). The resulting fragment was cloned in an SspI site of plasmid pQuantox™ (502). The resulting plasmid, RC1, was partially digested with HincII and completely digested with KpnI, the extremities were blunted and the plasmid was recircularized (503) to generate RC20a. A 953 bp fragment was amplified from plasmid pREP4 (504) using forward primer 72-2 (SEQ ID NO:11) and reverse primer 18-73 (SEQ ID NO:13) (505). This fragment was cloned in the unique EcoRV site of pBluescript II SK(+) to generate RC17 (506). A 358 bp BamHI fragment from RC20a (507) was cloned into the BamHI site of RC17 to generate RC22 (508). The unique NotI site of the latter plasmid was removed by digestion, fill-in and recircularization (509) to create RC24. A 1607 bp PvuII-HincII fragment (510) obtained by partial digestion of RC24 was ligated to a 4602 bp DraIII-BsmI fragment of pQBI25fc3 that had been blunted (511). The resulting vector is RC32.
The transcription unit of RC32 comprises the cytomegalovirus immediate early gene enhancer/promoter regions followed by the coding sequence for green fluorescent protein (GFP) and a bovine growth hormone polyadenylation signal. The recombinase substrate, composed of a recombination target sequence followed by a stuffer region and by another recombination target sequence, is inserted between the lacI promoter derived from plasmid pBluescript II™ SK(+) and nt 419 to 1318 of pREP4, encoding residues 26 to 267 of neomycin phosphotransferase (GenBank™ accession number AAK28133). The XhoI site upstream of the CMV enhancer/promoter elements in RC32 was removed by partial digestion, fill-in and recircularization (512). Finally, annealed oligonucleotides 24-7 (SEQ ID NO:17) and 24-8 (SEQ ID NO:18) (513) were cloned in a ScaI site of the resulting plasmid to generate RC43 (SEQ ID NO:1) (514), which contains a unique SwaI site in the stuffer region. A map of RC43 is given in Figure 5C and Table 1 hereinafter.
Table 1: Map of RC43 (Fig 5C; SEQ ID NO:1)

Region   Function                                                    Position
CMV      cytomegalovirus immediate/early enhancer and promoter       375-903
GFP      green fluorescent protein coding sequence                   1074-1796
pA       bovine growth hormone polyadenylation signal                1954-2178
lac      lacI prokaryotic promoter                                   2496-2618
FRT1     Flp recombination target sequence (SEQ ID NO:8)             2719-2775
GSTt1    stuffer                                                     2829-3095
FRT2     Flp recombination target sequence (SEQ ID NO:8)             3108-3155
KanR     truncated neomycin phosphotransferase                       3156-3888
AmpR     β-lactamase                                                 5336-6196
SwaI     Restriction site in stuffer                                 2979

We next established a system in which the activity of an oFlp recombinase was conditional on the transfection of an appropriate expression vector. We chose to place the expression of oFlp under the control of cis-acting regulatory elements recognized by the Gal4VP16 chimeric transcription factor ('UAS' sites). Gal4VP16 is composed of the DNA-binding domain of Gal4 and the transcriptional activation domain of VP16 (Sadowski et al., 1988). Binding of Gal4VP16 to its cognate element in the context of a minimal promoter activates transcription of a sequence operatively linked to the minimal promoter (Webster et al., 1988). We constructed a plasmid comprising oFlp operatively linked to a minimal promoter downstream of two copies of a consensus UAS site (Webster et al., 1988). This was done essentially as schematized on Figure 6.
The minimal promoter of CMV was amplified from pcDNA1.1 (Invitrogen) using forward primer 31-6 (SEQ ID NO:24) and reverse primer 72-3 (SEQ ID NO:23) (601). This fragment encompasses the first 36 nt upstream of the site of initiation of transcription followed by 55 nt of the 5' untranslated region of the Sindbis virus genome. It was cloned in an EcoRV site in pBluescript II KS(+) (602) to generate plasmid pKS-PCRCMV. Oligonucleotides 21-1 (SEQ ID NO:25) and 21-8 (SEQ ID NO:26) were phosphorylated, annealed and ligated using standard protocols. The multimers of double-stranded oligonucleotide were cloned in a BglII site of pKS-PCRCMV (603). A 131 bp EcoRI-MunI fragment was excised from the resulting plasmid and substituted for a 785 bp XhoI-NotI fragment in RC59 (604). The resulting plasmid is RC71. To obtain an expression vector for Gal4VP16, we replaced the GFP coding sequence in RC43 with that of Gal4VP16 and obtained RC74.
To obtain a cell line having integrated one or more copies of RC71 in its genome, 3.5 million HEK293 cells were electroporated at 600 V/cm with a mixture of 9 µg of ScaI-linearized RC71 and 10 µg of denatured salmon sperm DNA. Cells were plated and selection (2.5 µg/ml puromycin) was applied 24 hours later. The concentration of puromycin was reduced to 0.5 µg/ml on day 4 and colonies were picked on day 11. Induction levels of oFlp expression were determined after transfection with RC74. Northern analysis revealed that oFlp mRNA levels were strongly induced in one subclone after transfection with RC74 (subclone 293-UASoFlp/6; Figure 7, lane 703). No induction was observed in another subclone (subclone 293-UASoFlp/10; Figure 7, lane 705). No Flp signal was detected in wild type HEK293 cells (Figure 7, lane 701) or after transfection of a control vector in either subclone (Figure 7, lanes 702 and 704).
To assess the activity of oFlp in the 293-UASoFlp/6 subclone, expression vectors (RC43 or RC74) were transfected in these cells, recovered after 48 hours and subjected to a selection procedure. The procedure relies on the reconstitution, in the expression vector, of a gene conferring resistance to kanamycin when transformed in E. coli; reconstitution occurs only if the nucleic acid expressed by the vector is able to trigger the activity of oFlp. DNA (0.4 µg) was transfected in approximately 200,000 cells by lipofection using the Effectene™ reagent (Qiagen Inc.) according to the manufacturer's recommendations. Extrachromosomal DNA was extracted from cells by a modified Hirt technique. 20 ng of recovered DNA was transformed in E. coli DH10B cells by electroporation. A small aliquot of the transformation (1/1,000) was plated on plates containing 100 µg/ml ampicillin to control for the efficiency of DNA recovery and electroporation. The remainder was plated on plates containing 50 µg/ml ampicillin and 20 µg/ml kanamycin. The number of colonies on plates containing kanamycin is a measure of the amount of expression vector that was recombined. In this example, transformation of DNA extracted from 293-UASoFlp/6 cells transfected with RC43 gave rise to 25 kanamycin-resistant colonies per 100 ampicillin-resistant colonies. Transformation of DNA extracted from 293-UASoFlp/6 cells transfected with RC74 gave rise to 1,000 kanamycin-resistant colonies per 100 ampicillin-resistant colonies.
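The colony counts reported above can be compared with a short calculation. This is only a sketch of the arithmetic: the function name is hypothetical, and the two inputs are the kanamycin-resistant colony counts per 100 ampicillin-resistant colonies as stated in the text.

```python
def fold_enrichment(test_kan_per_100_amp: float, control_kan_per_100_amp: float) -> float:
    """Ratio of normalized kanamycin-resistant colony counts (test over control)."""
    if control_kan_per_100_amp <= 0:
        raise ValueError("control count must be positive")
    return test_kan_per_100_amp / control_kan_per_100_amp


# Values reported in the text, per 100 ampicillin-resistant colonies.
rc43_control = 25    # GFP-expressing vector (does not induce oFlp)
rc74_test = 1000     # Gal4VP16-expressing vector (induces oFlp)

print(fold_enrichment(rc74_test, rc43_control))  # 40.0
```

Because both counts are normalized to the same number of ampicillin-resistant colonies, the ratio does not depend on how the plated aliquots are scaled.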
According to a preferred embodiment of the invention and as schematized on Figure 8A, site-specific recombination of an expression vector expressing an exogenous nucleic acid (205) results in the removal of a unique SwaI restriction site (R) located in the stuffer region (309). As will be shown below, a fragment can be specifically amplified from recombined expression vectors (802) using primers (803, 804) flanking the recombination region and the exogenous nucleic acid. Expression vectors recovered after transfection in 293-UASoFlp/6 cells were therefore also analyzed by PCR. In this case, half of the DNA recovered by the modified Hirt technique was subjected to SwaI digestion in order to cut unrecombined molecules (801). Digestion was carried out at room temperature for 4 hours. The reaction was then purified by phenol extraction and DNA was precipitated with ethanol. One fifth of the precipitate was used in a PCR performed with 0.7 U of Expand Mix DNA polymerase (Roche Molecular, Laval, Canada), 12.5 pmoles of 18-64 (SEQ ID NO:19) and 12.5 pmoles of 18-106 (SEQ ID NO:20) in 25 µl of 1x Expand buffer (Roche Molecular) supplemented with 2% (v/v) dimethyl sulfoxide and 200 µM deoxynucleotides. Cycling conditions were 94°C/30 s, 52°C/30 s, 68°C/4 min for 25 cycles. One fifth of the reaction was analyzed by electrophoresis on a 1% (w/v) agarose gel. Results are shown on Figure 8B. A 3076 bp fragment (indicated by arrow 816) is obtained only when an expression vector for Gal4VP16 (RC74) is transfected (lane 814). By comparison, a 3462 bp fragment (indicated by arrow 817) is obtained after 15 cycles of PCR on RC74 before transfection (lane 811). No fragment is amplified from DNA extracted from untransfected 293-UASoFlp/6 cells (lane 812), from 293-UASoFlp/6 cells transfected with a control expression vector (RC43, lane 813) or from 293 WT cells transfected with RC74 (lane 815).
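The selection logic of this example, in which recombination removes the SwaI site so that only recombined vectors remain amplifiable after digestion, can be sketched as follows. The coordinates are those listed for RC43 in Table 1. The class and function names are hypothetical, and the model assumes complete SwaI digestion, which is an idealization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int
    end: int

    def contains(self, position: int) -> bool:
        return self.start <= position <= self.end


# Coordinates of the recombinase substrate in RC43 (Table 1).
FRT1 = Region("FRT1", 2719, 2775)
STUFFER = Region("stuffer", 2829, 3095)
FRT2 = Region("FRT2", 3108, 3155)
SWAI_SITE = 2979


def swai_site_is_excisable() -> bool:
    """True if the SwaI site lies in the stuffer and the stuffer lies between the FRT sites."""
    return (
        STUFFER.contains(SWAI_SITE)
        and FRT1.end < STUFFER.start
        and STUFFER.end < FRT2.start
    )


def amplifiable(recombined: bool, swai_digested: bool) -> bool:
    """A template is lost only if it still carries the SwaI site and was digested.

    Assumption: digestion is complete and cleavage between the primers
    fully prevents amplification.
    """
    has_swai_site = not recombined
    return not (swai_digested and has_swai_site)


print(swai_site_is_excisable())                           # True
print(amplifiable(recombined=True, swai_digested=True))   # True
print(amplifiable(recombined=False, swai_digested=True))  # False
```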
D) Example 3: Viral-based expression vector and method to selectively recover its recombined form after infection of cells constitutively expressing an active form of oFlp.
This example illustrates the various functionalities of an adenovirus-based expression vector designed according to the present invention. Plasmid construction is schematized on Figure 9. Annealed oligonucleotides 19-6 (SEQ ID NO:13) and 19-7 (SEQ ID NO:14) were cloned in the 5417 bp fragment of HindIII/NotI-digested vector pQBIAdBN after extremities were blunted using the Klenow fragment of DNA polymerase I (900). Annealed oligonucleotides 22-5 (SEQ ID NO:15) and 22-6 (SEQ ID NO:16) were cloned in the unique ClaI site of RC44 to introduce a unique PmeI site in the resulting plasmid RC46 (901). Finally, a 4112 bp SalI fragment from RC43 (see Example 2) was cloned in the XhoI site of RC46 (902). The resulting viral-based expression vector (RC49-2; SEQ ID NO:2) comprises a transcription unit and recombinase substrate flanked by nucleotides 1-102 and nucleotides 3334-5779 of the Adenovirus serotype 5 genome (GenBank accession number 9626187). A map of RC49-2 is given in Figure 9B and Table 2 hereinafter.

Table 2: Map of RC49-2 (Fig 9B; SEQ ID NO:2)

Region   Function                                                                      Position
ITR      Inverted terminal repeat (nucleotides 1-102 of adenovirus serotype 5 genome)  239-341
CMV      cytomegalovirus immediate/early enhancer and promoter                         993-1521
GFP      green fluorescent protein coding sequence                                     1692-2414
pA       bovine growth hormone polyadenylation signal                                  2572-2796
lac      lacI prokaryotic promoter                                                     3114-3236
FRT1     Flp recombination target sequence (SEQ ID NO:8)                               3337-3393
GSTt1    stuffer                                                                       3447-3713
FRT2     Flp recombination target sequence (SEQ ID NO:8)                               3726-3773
KanR     truncated neomycin phosphotransferase                                         3774-4506
Ad5      Nt 3334-5779 of adenovirus serotype 5 genome                                  4799-7244
Ori      Origin of plasmid replication (ColE1)                                         7612-8375
AmpR     β-lactamase                                                                   8539-9398
SwaI     Restriction site in stuffer                                                   3597

The transcription unit and recombinase substrate of RC49-2 were incorporated in an adenoviral genome by in vivo homologous recombination. This was done by co-transfecting 5 µg of PmeI-linearized RC49-2 with 5 µg of AdCMVlacZΔE1/ΔE3, a replication-defective genome obtained commercially (Quantum Biotechnologies, Montreal, Canada). Co-transfection of DNA molecules in HEK293 cells was carried out by means of a calcium phosphate precipitate using standard protocols. Two days post-transfection, cells were overlaid with medium containing 1.25% (w/v) low melting agarose. The recombinant viral genome resulting from homologous recombination between RC49-2 and the replication-defective adenoviral genome can be propagated in HEK293 cells, as indicated by the presence of viral plaques composed of GFP-expressing cells. A stock of viral particles was obtained after 2 successive rounds of plaque-purification according to standard protocols.

In order to test the adenoviral-based expression vectors, we obtained a cell line constitutively expressing an active oFlp. This was done by integrating a plasmid, designated RC59, in the genome of Hela cells. RC59 contains two distinct transcription units: one expressing the optimized Flp sequence from the CMV enhancer/promoter elements and the other conferring resistance to puromycin to stably transfected cells. RC59 was linearized at a unique ScaI restriction site to facilitate integration in the host genome. The linearized vector (10 µg) was electroporated in 5 million Hela cells. Electroporated cells were plated in five 100 mm petri dishes and allowed to recover for 24 hours, at which time puromycin (1 µg/ml) was added to the medium. After 4 days of selective pressure, the concentration of puromycin was decreased 10-fold to allow growth of cell colonies. Isolated colonies (20-100 cells) were picked 10 days later.
Clones were tested for expression of the optimized Flp coding sequence by Northern analysis. Expressing clones were cloned again by limiting dilution to ensure that the population of Flp-expressing cells is monoclonal. Figure 10 shows the Flp transcript levels in 4 subclones, Hela/oFlp2-3 (lane 1002), Hela/oFlp3-2 (lane 1003), Hela/oFlp6-1 (lane 1004) and Hela/oFlp6-2 (lane 1005). Wild type Hela cells do not express Flp (lane 1001). It is well known that expression of a transgene can vary from one subclone to another, depending for example on the number of copies of the transgene and/or its site of integration.
To verify that the viral-based expression vector contained within these viral particles could be recombined by the Flp recombinase, Hela/oFlp6-2, Hela/oFlp2-3 and wild type Hela cells were infected at a multiplicity of infection (m.o.i.) of 50. At this m.o.i., expression of GFP was seen in over 95% of cells. 48 hours after infection, genomic DNA was extracted and a fragment was amplified using primers located on either side of the site of recombination (18-64V (SEQ ID NO:19) and 18-106V (SEQ ID NO:20)) as described in the Materials and Methods section. Analysis of the PCR products by agarose gel electrophoresis (Figure 11A) revealed that about 50% of the viral-based expression vectors were recombined in Hela/oFlp6-2 cells (lane 1102). The expected sizes of the amplicons are 3544 bp from the non recombined viral-based expression vector (1105) and 3182 bp from the recombined viral-based expression vector (1106).

Because recombination leads to removal of a unique SwaI site from the viral-based expression vector, the signal arising from non recombined viral-based expression vectors can be eliminated by SwaI digestion. Genomic DNA (1.5 µg) was therefore digested with 20 U of SwaI for 6 hours at 22°C, purified by phenol extraction and ethanol precipitation and subjected to PCR as described above.
Consistent with the fact that a DNA molecule cleaved between primers can no longer serve as template in PCR, amplicons from non recombined viral-based vectors (1105) are no longer detected after digestion of genomic DNA from infected cells with SwaI (lane 1103). However, amplicons from recombined viral-based vectors (1106) are still readily detected after SwaI digestion of genomic DNA extracted from Hela/oFlp6-2 cells infected with viral-based expression vectors (lane 1103).
Interestingly, no recombined viral-based expression vector was detected after infection of Hela/oFlp2-3 cells (lane 1104), showing that the level of Flp expression in these cells is not sufficient to mediate site-specific recombination.
We can therefore hypothesize that some leakage of Flp expression from regulatory elements, i.e. low levels of expression in the absence of activators, will not give rise to high background in the context of a screening experiment.
The following was performed in order to mimic a screening experiment. Decreasing numbers of optimized Flp-expressing Hela cells (Hela/oFlp6-2: 15,000 cells; 150 cells) were mixed with wild type non-expressing cells (WT: 135,000 cells; 149,850 cells). The resulting cell population was infected with the adenoviral-based vector expressing GFP described above at an m.o.i. of 50. Genomic and viral DNA were extracted 48 hours after infection. Two micrograms of DNA was digested with 20 U of SwaI for 6 hours at 37°C before 1/10 of the digested DNA was subjected to 40 cycles of PCR as described above.
Figure 11B shows the analysis of the PCR products by agarose gel electrophoresis. Shorter amplicons from recombined viral-based expression vectors are readily detected when the infected cell population is composed of 15,000 Hela/oFlp6-2 cells mixed with 135,000 WT cells (lane 1111). No amplicon is detected when the infected cell population consists of 150 Hela/oFlp6-2 cells mixed with 149,850 WT cells (lane 1113). To ascertain the presence of viral DNA in the tested cell populations, PCR was performed on DNA that had not been digested by SwaI. As expected, longer fragments derived from non recombined expression vectors are easily detected in both cases (lanes 1110 and 1112). To increase the sensitivity of the PCR detection method when performed on DNA digested by SwaI, a semi-nested PCR was performed on 1/25 of the initial PCR using primers 18-112V (SEQ ID NO:21) and 18-106V (SEQ ID NO:20). As can be seen on Figure 11C, a fragment amplified from recombined viral-based expression vectors (expected size 838 bp, label 1124) was detected after semi-nested PCR on DNA extracted from both cell populations (lanes 1120, 1121).
Semi-nested PCR on DNA extracted from an infected cell population consisting of 150 Hela/oFlp6-2 cells mixed with 149,850 WT cells (lane 1121) also reveals a fragment from non recombined DNA molecules (expected size 1201 bp, label 1123), presumably due to the incomplete digestion of the viral DNA by SwaI.
Taken together, these results indicate that our method is sensitive enough to detect recombined viral-based expression vector when only 0.1% of an infected cell population expresses the oFlp recombinase.
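The proportions behind the mixing experiment can be checked with a short calculation. This is a sketch of the arithmetic only: the function names are hypothetical, the cell numbers and m.o.i. are those given in the text, and the m.o.i. is taken as the number of infectious units per cell.

```python
def expressing_fraction(expressing_cells: int, wild_type_cells: int) -> float:
    """Fraction of the mixed population that expresses oFlp."""
    return expressing_cells / (expressing_cells + wild_type_cells)


def infectious_units_needed(total_cells: int, moi: float) -> float:
    """Infectious units required to infect `total_cells` at a given m.o.i."""
    return total_cells * moi


# (Hela/oFlp6-2 cells, wild type cells) for the two mixtures described in the text.
mixtures = [(15_000, 135_000), (150, 149_850)]

for expressing, wild_type in mixtures:
    total = expressing + wild_type
    fraction = expressing_fraction(expressing, wild_type)
    print(f"{expressing} of {total} cells: {fraction:.1%}")
# 15000 of 150000 cells: 10.0%
# 150 of 150000 cells: 0.1%

print(infectious_units_needed(150_000, 50))  # 7500000
```

Both mixtures contain 150,000 cells in total, so the second mixture corresponds to the 0.1% detection limit stated above.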
E) Example 4: Production of Sindbis viral particles dependent on site-specific recombination
This example illustrates how a cDNA copy of the Sindbis virus genome can be engineered such that replication and packaging are dependent on the activity of the Flp recombinase. Plasmid construction is schematized on Figure 12. The Sindbis virus genome is a positive-strand RNA molecule. Because Flp acts on a DNA substrate, we constructed a CMV-based vector expressing a cDNA derived from the Sindbis virus genome as follows. A 4415 bp BamHI-XhoI fragment from DH-BB comprising the Sindbis virus structural proteins coding sequence (SP) was subcloned into a 9241 bp BamHI-XhoI fragment of plasmid pSinRep5 to generate VB220 (1200). A 1230 bp fragment was amplified from pcDNA1.1/Amp by 25 cycles of PCR using high-fidelity Vent DNA polymerase and primers 18-100 (SEQ ID NO:22) and 72-3 (SEQ ID NO:23) (1201). This fragment comprises the first nucleotide of the Sindbis virus genome positioned at the putative site of initiation of transcription of the cytomegalovirus immediate early enhancer/promoter elements (CMV). The PCR product was digested by HincII and MunI (1202) and a 639 bp fragment was inserted into a 13529 bp fragment of VB220 resulting from digestion by SacI, blunting of the extremities by T4 DNA polymerase followed by partial MunI digestion (1203). The resulting plasmid is called VB233b. A 714 bp fragment comprising signals for transcription stop and polyadenylation of transcripts was obtained by digestion of pcDNA1.1/Amp with SphI and NcoI (1204). After blunting its extremities with T4 DNA polymerase, the fragment was cloned in the blunted XhoI site of VB233b to generate VB250b (1205). A 449 bp fragment comprising a recombinase substrate was amplified from RC24 (see Example 2) using primers 18-112 (SEQ ID NO:21) and 20-22 (SEQ ID NO:27) and high-fidelity Vent DNA polymerase (1206). This fragment was cloned in a partially-digested VB250b at a MunI site located in the 5' untranslated region of the cDNA copy of the Sindbis viral genome (1207). The resulting vector is VB271b. Finally, a GFP coding sequence linked to promoter and enhancer sequences derived from the Rous sarcoma virus long terminal repeat (RSV LTR) was introduced between the structural proteins coding sequence and the 3' untranslated region of the viral genome. Sequences from RSV LTR were chosen because they are strongly active in a wide variety of cell types (Gorman et al., 1982). Cloning was performed as follows. The GFP coding sequence was amplified (1208) from pQBIfc3™ (Quantum Biotechnologies, Montreal, Canada) and cloned (1209) into a 4005 bp EcoRV fragment of pRcRSV (Invitrogen, Carlsbad, CA) to generate VB288b. A fragment comprising the RSV LTR and the GFP coding sequence was excised from VB288b (1210) and inserted at the blunted ApaI site of VB271b. The resulting expression vector is RC77 (SEQ ID NO:3). A map of RC77 is given in Figure 12C and Table 3 hereinafter.

Table 3: Map of RC77 (Fig 12C; SEQ ID NO:3)

Region   Function                                                    Position
AmpR     β-lactamase                                                 166-1024
CMV      cytomegalovirus immediate/early enhancer and promoter       1801-2397
FRT2     Flp recombination target sequence (SEQ ID NO:8)             2508-2461
GSTt1    stuffer                                                     2763-2521
FRT1     Flp recombination target sequence (SEQ ID NO:8)             2873-2817
nSP      Sindbis virus non structural proteins                       2911-10449
SP       Sindbis virus structural proteins                           10563-14297
RSV      Rous sarcoma virus enhancer and promoter                    14377-14773
GFP      green fluorescent protein coding sequence                   14918-15640
3' SV    Sindbis virus 3' untranslated region                        15709-16051
pA       SV40 polyadenylation signal/small t intron                  16092-16790

To verify that this vector could express an exogenous nucleic acid (i.e. GFP in this case), 500 ng of RC77 was transfected in HEK293A cells by lipofection using the Effectene™ reagent (Qiagen Inc.). Forty-eight hours after transfection, cells were fixed with 4% paraformaldehyde and observed by fluorescence microscopy. As shown on Figure 13A, expression of GFP is detectable in approximately 2-5% of cells (image 1301). This result indicates that the transcription unit embedded in the cDNA copy of a disrupted Sindbis virus genome is active. To further verify that RC77 could lead to production of viral particles when recombined, 350 ng of expression vector RC77 was transfected in HEK293A cells by lipofection with either 650 ng of a vector expressing an optimized Flp coding sequence (RC59) or 650 ng of a control vector expressing luciferase (VB35). Forty-eight hours after transfection, cells were fixed with 4% (v/v) paraformaldehyde and processed for anti-C protein immunofluorescence.
Only cells co-transfected with RC77 and RC59 showed expression of this viral protein (Figure 13A, image 1304), presumably due to recombination of the expression vector, excision of the disruptive recombinase substrate and subsequent production of viral particles. By assaying for C protein immunoreactivity in monolayers of BHK-21 fibroblasts incubated with culture medium from co-transfected cells, we found that the expression vector RC77 gives rise to infectious viral particles when co-transfected with RC59 (Figure 13A, image 1305) but not when co-transfected with VB35 (Figure 13A, image 1303).
Our interpretation was further confirmed by RT-PCR analysis of total RNA (500 ng) extracted from co-transfected cells, using primers flanking the disruptive recombinase substrate (20-22V (SEQ ID NO:27) and 18-101V (SEQ ID NO:28)). The reaction was performed using the Titan™ one-step RT-PCR (Roche Molecular) as follows: 50°C, 30 min; 94°C, 2 min; 25 cycles of 94°C, 30 s, 54°C, 30 s and 68°C, 30 s. No product is amplified from RNA extracted from untransfected BHK-21 cells (Figure 13B, lane 1311). A 470 bp fragment (Figure 13B, arrow 1315) is amplified from RC77 plasmid DNA (Figure 13B, lane 1312) and from total RNA extracted from cells co-transfected with RC77 and either VB35 (Figure 13B, lane 1313) or RC59 (Figure 13B, lane 1314). However, an additional 108 bp fragment is specifically amplified from cells co-transfected with RC77 and RC59 (Figure 13B, arrow 1316). The difference in the size of the amplicon is due to excision of the disruptive recombinase substrate.
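The size shift caused by excision can be computed from the amplicon sizes reported in Examples 2 to 4. This is a minimal sketch of the arithmetic: the dictionary labels and the function name are hypothetical, and the sizes are the values stated in the text.

```python
# (unrecombined amplicon, recombined amplicon) in bp, as reported in the text.
AMPLICONS_BP = {
    "Example 2, plasmid PCR": (3462, 3076),
    "Example 3, first-round PCR": (3544, 3182),
    "Example 3, semi-nested PCR": (1201, 838),
    "Example 4, RT-PCR": (470, 108),
}


def excised_length(unrecombined_bp: int, recombined_bp: int) -> int:
    """Length removed by site-specific recombination, inferred from amplicon sizes."""
    if recombined_bp >= unrecombined_bp:
        raise ValueError("recombined amplicon must be shorter than the unrecombined one")
    return unrecombined_bp - recombined_bp


for assay, (unrecombined, recombined) in AMPLICONS_BP.items():
    print(f"{assay}: {excised_length(unrecombined, recombined)} bp excised")
# Example 2, plasmid PCR: 386 bp excised
# Example 3, first-round PCR: 362 bp excised
# Example 3, semi-nested PCR: 363 bp excised
# Example 4, RT-PCR: 362 bp excised
```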
Taken together, these results indicate that RC77 can be specifically recombined by oFlp and that removal of a stuffer region in the 5' untranslated portion of a cDNA copy of the Sindbis virus genome can restore production of infectious viral particles.
Example 5: Screening in a transgenic animal using a virus-based expression vector.
This hypothetical example illustrates the design of a screening conducted in a transgenic animal to identify cDNAs encoding dominant activators of osteoblast differentiation. Of course, this example could easily be adapted and used for identifying and selecting cDNAs encoding other types of genes.
According to this example, the gene delivery vector that is used is an adenovirus particle containing a genome engineered as described in Example 3.
A library of adenoviral particles is constructed as outlined in section iv) starting from RNAs extracted from osteoblasts undergoing differentiation. The screening host is a transgenic mouse obtained as follows. The oFlp coding sequence and the 3' untranslated region of the bovine growth hormone cDNA are operatively linked to a fragment encompassing 1.7 kb found immediately upstream of the site of transcription initiation of the mouse osteocalcin gene. Osteocalcin is a well-known marker of osteoblast differentiation whose expression is controlled by this 1.7 kb cell-specific regulatory fragment. The resulting construct is injected into a pseudo-fertilized egg to obtain lines of transgenic mice according to standard protocols. Preparations of total RNA extracted from various tissues of heterozygous animals are tested by Northern analysis to ensure that oFlp expression is restricted to the differentiated osteoblast, as is that of the endogenous osteocalcin gene. Transgenic animals are then injected intraperitoneally with plaque-forming units of the adenovirus-based expression vector library. At various times after injection (typically 2 to 10 days), animals are sacrificed and recombined expression vectors, if any, are detected as in Example 3 from muscle and adipose tissue. These tissues are selected because it is believed that they harbor cells that can differentiate into osteoblasts given the proper stimulus.
According to the design of the screen, exogenous nucleic acids (cDNAs) comprised in recombined expression vectors so produced in the mouse have the capacity to activate oFlp transcription from the osteocalcin regulatory fragment and may therefore be hypothesized to be dominant activators of osteoblast differentiation.
Example 6: In vivo screening for regulatory elements
This hypothetical example illustrates the design of a screening conducted in a mouse to identify fragments of mouse genomic DNA that can confer tissue-specific expression to the linked oFlp coding sequence.
According to this example, the expression vector that is used comprises a transcription unit devoid of regulatory elements, as depicted on Figure 2B, and a recombinase substrate constructed as depicted on Figure 3A. Fragments of mouse genomic DNA are obtained by partial digestion with Sau3A and cloned in the vector upstream of the oFlp coding sequence. The library of vectors is transfected in vivo by injection of lipid:DNA complexes formed using commercially available transfection reagents. At various times after transfection in vivo, animals are sacrificed, extrachromosomal DNA is extracted from various tissues and digested with the enzyme cutting in the stuffer region of the recombinase substrate. Exogenous nucleic acids from uncleaved recombined vectors are amplified by PCR using a primer located upstream of their site of insertion and another downstream of the recombinase substrate. Exogenous nucleic acids are reinserted in the vector and subjected to another round of in vivo screening.
Exogenous nucleic acids that are finally retrieved correspond to genomic DNA fragments capable of activating the expression of oFlp in the tissue from which they were retrieved. By comparing sequences of fragments retrieved from different tissues, it is therefore possible to identify genomic fragments whose transcriptional activity is tissue-restricted.
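The comparison step of this hypothetical screen can be sketched as a set operation. The sketch assumes that each retrieved fragment has already been reduced to an identifier (for example, its sequence) and that a fragment is called tissue-restricted when it is retrieved from exactly one tissue. The function name, tissue names and fragment identifiers are all hypothetical.

```python
from collections import defaultdict


def tissue_restricted(retrieved: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each tissue, the fragments retrieved from that tissue only."""
    tissues_by_fragment: dict[str, set[str]] = defaultdict(set)
    for tissue, fragments in retrieved.items():
        for fragment in fragments:
            tissues_by_fragment[fragment].add(tissue)

    restricted: dict[str, set[str]] = {tissue: set() for tissue in retrieved}
    for fragment, tissues in tissues_by_fragment.items():
        if len(tissues) == 1:
            (only_tissue,) = tissues
            restricted[only_tissue].add(fragment)
    return restricted


# Hypothetical screening results: fragment identifiers retrieved per tissue.
results = {
    "liver": {"fragA", "fragB"},
    "muscle": {"fragB", "fragC"},
    "brain": {"fragB"},
}

print(tissue_restricted(results))
```

In this toy data set, fragA is restricted to liver and fragC to muscle, whereas fragB is retrieved from all three tissues and is therefore not reported as tissue-restricted.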
REFERENCES
1- Angrand, P.-O., et al. (1998), Nucl. Acids Res., 26, 3263-3269.
2- Arad, U. (1998), Biotechniques, 24, 761-762.
3- Baker, A.R., et al. (1992), Mol. Cell Biol., 12, 5541-5547.
4- Berglund, P., et al. (1993), Bio/Technology, 11, 916-920.
5- Buchholz, F., et al. (1998), Nat. Biotech., 16, 657-662.
6- Diaz, V., et al. (1999), J. Biol. Chem., 274, 6634-6640.
7- Ding, Y., et al. (1997), J. Biol. Chem., 272, 28142-28148.
8- Gorman, C.M., et al. (1982), Proc. Natl. Acad. Sci. U.S.A., 79, 6777-6781.
9- Jayaram, M. (1985), Proc. Natl. Acad. Sci. U.S.A., 82, 5875-5879.
10- Logie, C. and Stewart, A.F. (1995), Proc. Natl. Acad. Sci. U.S.A., 92, 5940-5944.
11- Metzger, D. and Feil, R. (1999), Curr. Opin. Biotech., 10, 470-476.
12- Mittal, S., et al. (1993), Virus Res., 28, 67-90.
13- O'Gorman, S., et al. (1991), Science, 251, 1351-1355.
14- Pear, W.S., et al. (1993), Proc. Natl. Acad. Sci. U.S.A., 90, 8392-8396.
15- Ragot, T., et al. (1998), Meth. Cell Biol., 52, 229-260.
16- Sadowski, I., et al. (1988), Nature, 335, 563-4.
17- Sauer, B. and Henderson, N. (1988), Proc. Natl. Acad. Sci. U.S.A., 85, 5166-5170.
18- Tamai, K.T., et al. (1997), Recent Prog. Horm. Res., 52, 121-139.
19- Therrien, M. and Drouin, J. (1991), Mol. Cell Biol., 11, 3492-3503.
20- Webster, N., et al. (1988), Cell, 52, 169.
21- Wrana, J.L. and Attisano, L. (2000), Cytokine Growth Factor Rev., 11, 5-13.
While several embodiments of the invention have been described, it will be understood that the present invention is capable of further modifications, and this application is intended to cover any variations, uses, or adaptations of the invention, following in general the principles of the invention and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains, and as may be applied to the essential features hereinbefore set forth and falling within the scope of the invention or the limits of the appended claims.

SEQUENCE LISTING
<110> Phenogene Therapeutiques Inc.
<120> METHODS, VECTORS, CELL LINES AND KITS FOR SELECTING NUCLEIC ACIDS HAVING A DESIRED FEATURE
<130> 000783-0006
<150> US 60/301,149
<151> 2001-06-28
<160> 38
<170> PatentIn version 3.1
<210> 1
<211> 6334
<212> DNA
<213> artificial sequence
<220>
<223> Sequence is completely synthesized
<400> 1

gacggatcgggagatctcccgatcccctatggtcgactctcagtacaatctgctctgatg60 ccgcatagttaagccagtatctgctccctgcttgtgtgttggaggtcgctgagtagtgcg120 cgagcaaaatttaagctacaacaaggcaaggcttgaccgacaattgcatgaagaatctgc180 ttagggttaggcgttttgcgctgcttcgcctcgatcgaggcctggccattgcatacgttg240 tatccatatcataatatgtacatttatattggctcatgtccaacattaccgccatgttga300 cattgattattgactagttattaatagtaatcaattacggggtcattagttcatagccca360 tatatggagttccgcgttacataacttacggtaaatggcccgcctggctgaccgcccaac420 gacccccgcccattgacgtcaataatgacgtatgttcccatagtaacgccaatagggact480 ttccattgacgtcaatgggtggagtatttacggtaaactgcccacttggcagtacatcaa540 gtgtatcatatgccaagtacgccccctattgacgtcaatgacggtaaatggcccgcctgg600 cattatgcccagtacatgaccttatgggactttcctacttggcagtacatctacgtatta660 gtcatcgctattaccatggtgatgcggttttggcagtacatcaatgggcgtggatagcgg720 tttgactcacggggatttccaagtctccaccccattgacgtcaatgggagtttgttttgg780 caccaaaatcaacgggactttccaaaatgtcgtaacaactccgccccattgacgcaaatg840 ggcggtaggcgtgtacggtgggaggtctatataagcagagctcgtttagtgaaccgtcag900 atcgcctggagacgccatccacgctgttttgacctccatagaagacaccgggaccgatcc960 agcctcccggttaccggtaccggatcctcgaggtgaccgtttagcttggtaccgagctcg1020 -gatccactagtaacggccgccagtgtgctggaattctgcagatcgcgcaagaaatggcta1080 gcaaaggagaagaactcttcactggagttgtcccaattcttgttgaattagatggtgatg1140 ttaacggccacaagttctctgtcagtggagagggtgaaggtgatgcaacatacggaaaac1200 ttaccctgaagttcatctgcactactggcaaactgcctgttccatggccaacactagtca1260 ctactctgtgctatggtgttcaatgcttttcaagatacccggatcatatgaaacggcatg1320 actttttcaagagtgccatgcccgaaggttatgtacaggaaaggaccatcttcttcaaag1380 atgacggcaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaata1440 gaatcgagttaaaaggtattgacttcaaggaagatggcaacattctgggacacaaattgg1500 aatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatca1560 aagtgaacttcaagacccgccacaacattgaagatggaagcgttcaactagcagaccatt1620 atcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgt1680 ccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttg1740 agtttgtaacagctgctgggattacacatggcatggatgaactgtacaacatcgatggag1800 gcggaggtggaaagggcccggttaccggtaccggatccagatatctgggcggccgctcag1860 
caagctaaacgcggccgcacaattctgactaactagacgcgttggccctattctatagtg1920 tcacctaaatgctagagctcgctgatcagcctcgactgtgccttctagttgccagccatc1980 tgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactcccactgtcct2040 ttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggg2100 gggtggggtggggcaggacagcaagggggaggattgggaagacaatagcaggcatgctgg2160 ggatgcggtgggctctatggcttctgaggcggaaagaaccagctggggctctagggggta2220 tccccacgcgccctgtagcggcgcattaagcgcggcgggtgtggtggttacgcgcagcgt2280 gaccgctacacttgccagcgccctagcgcccgctcctttcgctttcttcccttcctttct2340 cgccacgttcgccggctttccccgtcaagctctaaatcggggcatccctttagggttccg2400 atttagtgctttacggcacctcgaccccaaaaaacttgattagggtgatggttcacctgg2460 cacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaattaatgtgagttag2520 ctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtgga2580 attgtgagcggataacaatttcacacaggaaacagctatgaccatgattacgccaagctc2640 gaaattaaccctcactaaagggaacaaaagctggagctccaccgcggtggcggccggccg2700 ctctagaactagtggatccttgaagttcctattccgaagttcctattctctagaaagtat2760 aggaacttcagcgcgattctgaaatgagctgttccaattcgccctatagtgagtcgtatt2820 acgcgcgaaatggtgatcatgtaacccatcctgacttcatgttgtatgacgctcttgatg2880 ttgttttatacatggacccaatgtgcctggatgcgttcccaaaattagtttgttttaaaa2940 aacgtattgaagctatcccacaaattgataagttcgcgatttaaattaattaagcttact3000 tgaaatccagcaagtatatagcatggcctttgcagggctggcaagccacgtttggtggtg3060 gcgaccatcctccaaaatcggatctggttccgcgtggatcccccggggaagttcctattc3120 cgaagttcctattctctagaaagtataggaacttcctatgactgggcacaacagacaatc3180 ggctgctctgatgccgccgtgttccggctgtcagcgcaggggcgcccggttctttttgtc3240 aagaccgacctgtccggtgccctgaatgaactgcaggacgaggcagcgcggctatcgtgg3300 ctggccacgacgggcgttccttgcgcagctgtgctcgacgttgtcactgaagcgggaagg3360 gactggctgctattgggcgaagtgccggggcaggatctcctgtcatctcaccttgctcct3420 gccgagaaagtatccatcatggctgatgcaatgcggcggctgcatacgcttgatccggct3480 acctgcccattcgaccaccaagcgaaacatcgcatcgagcgagcacgtactcggatggaa3540 gccggtcttgtcgatcaggatgatctggacgaagagcatcaggggctcgcgccagccgaa3600 ctgttcgccaggctcaaggcgcgcatgcccgacggcgaggatctcgtcgtgacccatggc3660 
gatgcctgcttgccgaatatcatggtggaaaatggccgcttttctggattcatcgactgt3720 ggccggctgggtgtggcggaccgctatcaggacatagcgttggctacccgtgatattgct3780 gaagagcttggcggcgaatgggctgaccgcttcctcgtgctttacggtatcgccgctccc3840 gattcgcagcgcatcgccttctatcgccttcttgacgagttcttctgagcgggactctgg3900 ggttcgaaatgaccgaccaagcgacgcccaacctgccatcacgagattcgaattccaccg3960 ccgccttctatgaaaggttgggcttcggaatcgttttccgggacgccggctggatgatcc4020 tccagcgcggggatctcatgctggagttcttcgcccgggctgcaggaattagcttatcga4080 taccgtccattctagttgtggtttgtccaaactcatcaatgtatcttatcatgtctgtat4140 accgtcgacctctagctagagcttggcgtaatcatggtcatagctgtttcctgtgtgaaa4200 ttgttatccgctcacaattccacacaacatacgagccggaagcataaagtgtaaagcctg4260 gggtgcctaatgagtgagctaactcacattaattgcgttgcgctcactgcccgctttcca4320 gtcgggaaacctgtcgtgccagctgcattaatgaatcggccaacgcgcggggagaggcgg4380 tttgcgtattgggcgctcttccgcttcctcgctcactgactcgctgcgctcggtcgttcg4440 gctgcggcgagcggtatcagctcactcaaaggcggtaatacggttatccacagaatcagg4500 ggataacgcaggaaagaacatgtgagcaaaaggccagcaaaaggccaggaaccgtaaaaa4560 ggccgcgttgctggcgtttttccataggctccgcccccctgacgagcatcacaaaaatcg4620 acgctcaagtcagaggtggcgaaacccgacaggactataaagataccaggcgtttccccc4680 -tggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccggatacctgtccgc4740 ctttctcccttcgggaagcgtggcgctttctcaatgctcacgctgtaggtatctcagttc4800 ggtgtaggtcgttcgctccaagctgggctgtgtgcacgaaccccccgttcagcccgaccg4860 ctgcgccttatccggtaactatcgtcttgagtccaacccggtaagacacgacttatcgcc4920 actggcagcagccactggtaacaggattagcagagcgaggtatgtaggcggtgctacaga4980 gttcttgaagtggtggcctaactacggctacactagaaggacagtatttggtatctgcgc5040 tctgctgaagccagttaccttcggaaaaagagttggtagctcttgatccggcaaacaaac5100 caccgctggtagcggtggtttttttgtttgcaagcagcagattacgcgcagaaaaaaagg5160 atctcaagaagatcctttgatcttttctacggggtctgacgctcagtggaacgaaaactc5220 acgttaagggattttggtcatgagattatcaaaaaggatcttcacctagatccttttaaa5280 ttaaaaatgaagttttaaatcaatctaaagtatatatgagtaaacttggtctgacagtta5340 ccaatgcttaatcagtgaggcacctatctcagcgatctgtctatttcgttcatccatagt5400 tgcctgactccccgtcgtgtagataactacgatacgggagggcttaccatctggccccag5460 
tgctgcaatgataccgcgagacccacgctcaccggctccagatttatcagcaataaacca5520 gccagccggaagggccgagcgcagaagtggtcctgcaactttatccgcctccatccagtc5580 tattaattgttgccgggaagctagagtaagtagttcgccagttaatagtttgcgcaacgt5640 tgttgccattgctacaggcatcgtggtgtcacgctcgtcgtttggtatggcttcattcag5700 ctccggttcccaacgatcaaggcgagttacatgatcccccatgttgtgcaaaaaagcggt5760 tagctccttcggtcctccgatcgttgtcagaagtaagttggccgcagtgttatcactcat5820 ggttatggcagcactgcataattctcttactgtcatgccatccgtaagatgcttttctgt5880 gactggtgagtactcaaccaagtcattctgagaatagtgtatgcggcgaccgagttgctc5940 ttgcccggcgtcaatacgggataataccgcgccacatagcagaactttaaaagtgctcat6000 cattggaaaacgttcttcggggcgaaaactctcaaggatcttaccgctgttgagatccag6060 ttcgatgtaacccactcgtgcacccaactgatcttcagcatcttttactttcaccagcgt6120 ttctgggtgagcaaaaacaggaaggcaaaatgccgcaaaaaagggaataagggcgacacg6180 gaaatgttgaatactcatactcttcctttttcaatattattgaagcatttatcagggtta6240 ttgtctcatgagcggatacatatttgaatgtatttagaaaaataaacaaataggggttcc6300 gcgcacatttccccgaaaagtgccacctgacgtc 6334 <210>

<211> 9604
<212> DNA
<213> artificial sequence
<220>
<223> Sequence is completely synthesized
<400>

gaattcatcgagcttagtgccccttctgggttatagacgagcgtgtccgtatgctcgctt60 ttataggtgggatcattgttgttgacgaccacctgggcggtatcgtctttttcggccata120 ttaatgccgatcatcaccagaaccgccagtgatagcacaatgataacccaacgtctggct180 ttactcatcgtcgagcgtttaaactagaagcgatgaattctcatgtttgacagcttatca240 tcatcaataatataccttattttggattgaagccaatatgataatgagggggtggagttt300 gtgacgtggcgcggggcgtgggaacggggcgggtgacgtagtagtgtggcggaagtgtga360 tgttgcaagtgtggcggaacacatgtaagcgacggatgtggcaaaagtgacgtttttggt420 gtgcgccggtgtacacaggaagtgacaattttcgcgcggttttaggcggatgttgtagta480 aatttgggcgtaaccgagtaagatttggccattttcgcgggaaaactgaataagaggaag540 tgaaatctgaataattttgtgttactcatagcgcgtaatatttgtctagggccgccagat600 ctggcgcgccggtacctcgacggatcgggagatctcccgatcccctatggtcgactctca660 gtacaatctgctctgatgccgcatagttaagccagtatctgctccctgcttgtgtgttgg720 aggtcgctgagtagtgcgcgagcaaaatttaagctacaacaaggcaaggcttgaccgaca780 attgcatgaagaatctgcttagggttaggcgttttgcgctgcttcgcctcgatcgaggcc840 tggccattgcatacgttgtatccatatcataatatgtacatttatattggctcatgtcca900 acattaccgccatgttgacattgattattgactagttattaatagtaatcaattacgggg960 tcattagttcatagcccatatatggagttccgcgttacataacttacggtaaatggcccg1020 cctggctgaccgcccaacgacccccgcccattgacgtcaataatgacgtatgttcccata1080 gtaacgccaatagggactttccattgacgtcaatgggtggagtatttacggtaaactgcc1140 cacttggcagtacatcaagtgtatcatatgccaagtacgccccctattgacgtcaatgac1200 ggtaaatggcccgcctggcattatgcccagtacatgaccttatgggactttcctacttgg1260 cagtacatctacgtattagtcatcgctattaccatggtgatgcggttttggcagtacatc1320 aatgggcgtggatagcggtttgactcacggggatttccaagtctccaccccattgacgtc1380 aatgggagtttgttttggcaccaaaatcaacgggactttccaaaatgtcgtaacaactcc1440 gccccattgacgcaaatgggcggtaggcgtgtacggtgggaggtctatataagcagagct1500 cgtttagtgaaccgtcagatcgcctggagacgccatccacgctgttttgacctccataga1560 agacaccgggaccgatccagcctcccggttaccggtaccggatcctcgaggtgaccgttt1620 agcttggtaccgagctcggatccactagtaacggccgccagtgtgctggaattctgcaga1680 tcgcgcaaga aatggctagc aaaggagaag aactcttcac tggagttgtc ccaattcttg 1740 ttgaattaga tggtgatgtt aacggccaca agttctctgt cagtggagag ggtgaaggtg 1800 atgcaacata cggaaaactt accctgaagt tcatctgcac tactggcaaa 
ctgcctgttc 1860 catggccaac actagtcact actctgtgct atggtgttca atgcttttca agatacccgg 1920 atcatatgaa acggcatgac tttttcaaga gtgccatgcc cgaaggttat gtacaggaaa 1980 ggaccatctt cttcaaagat gacggcaact acaagacacg tgctgaagtc aagtttgaag 2040 gtgataccct tgttaataga atcgagttaa aaggtattga cttcaaggaa gatggcaaca 2100 ttctgggaca caaattggaa tacaactata actcacacaa tgtatacatc atggcagaca 2160 aacaaaagaa tggaatcaaa gtgaacttca agacccgcca caacattgaa gatggaagcg 2220 ttcaactagc agaccattat caacaaaata ctccaattgg cgatggccct gtccttttac 2280 cagacaacca ttacctgtcc acacaatctg ccctttcgaa agatcccaac gaaaagagag 2340 accacatggt ccttcttgag tttgtaacag ctgctgggat tacacatggc atggatgaac 2400 tgtacaacat cgatggaggc ggaggtggaa agggcccggt taccggtacc ggatccagat 2460 atctgggcgg ccgctcagca agctaaacgc ggccgcacaa ttctgactaa ctagacgcgt 2520 tggccctatt ctatagtgtc acctaaatgc tagagctcgc tgatcagcct cgactgtgcc 2580 ttctagttgc cagccatctg ttgtttgccc ctcccccgtg ccttccttga ccctggaagg 2640 tgccactccc actgtccttt cctaataaaa tgaggaaatt gcatcgcatt gtctgagtag 2700 gtgtcattct attctggggg gtggggtggg gcaggacagc aagggggagg attgggaaga 2760 caatagcagg catgctgggg atgcggtggg ctctatggct tctgaggcgg aaagaaccag 2820 ctggggctct agggggtatc cccacgcgcc ctgtagcggc gcattaagcg cggcgggtgt 2880 ggtggttacg cgcagcgtga ccgctacact tgccagcgcc ctagcgcccg ctcctttcgc 2940 tttcttccct tcctttctcg ccacgttcgc cggctttccc cgtcaagctc taaatcgggg 3000 catcccttta gggttccgat ttagtgcttt acggcacctc gaccccaaaa aacttgatta 3060 gggtgatggt tcacctggca cgacaggttt cccgactgga aagcgggcag tgagcgcaac 3120 gcaattaatg tgagttagct cactcattag gcaccccagg ctttacactt tatgcttccg 3180 gctcgtatgt tgtgtggaat tgtgagcgga taacaatttc acacaggaaa cagctatgac 3240 catgattacg ccaagctcga aattaaccct cactaaaggg aacaaaagct ggagctccac 3300 cgcggtggcg gccggccgct ctagaactag tggatccttg aagttcctat tccgaagttc 3360 ctattctcta gaaagtatag gaacttcagc gcgattctga aatgagctgt tccaattcgc 3420 cctatagtga gtcgtattac gcgcgaaatg gtgatcatgt aacccatcct gacttcatgt 3480 tgtatgacgc tcttgatgtt gttttataca tggacccaat gtgcctggat gcgttcccaa~ 
3540 aattagtttgttttaaaaaacgtattgaagctatcccacaaattgataagttcgcgattt3600 aaattaattaagcttacttgaaatccagcaagtatatagcatggcctttgcagggctggc3660 aagccacgtttggtggtggcgaccatcctccaaaatcggatctggttccgcgtggatccc3720 ccggggaagttcctattccgaagttcctattctctagaaagtataggaacttcctatgac3780 tgggcacaacagacaatcggctgctctgatgccgccgtgttccggctgtcagcgcagggg3840 cgcccggttctttttgtcaagaccgacctgtccggtgccctgaatgaactgcaggacgag3900 gcagcgcggctatcgtggctggccacgacgggcgttccttgcgcagctgtgctcgacgtt3960 gtcactgaagcgggaagggactggctgctattgggcgaagtgccggggcaggatctcctg4020 tcatctcaccttgctcctgccgagaaagtatccatcatggctgatgcaatgcggcggctg4080 catacgcttgatccggctacctgcccattcgaccaccaagcgaaacatcgcatcgagcga4140 gcacgtactcggatggaagccggtcttgtcgatcaggatgatctggacgaagagcatcag4200 gggctcgcgccagccgaactgttcgccaggctcaaggcgcgcatgcccgacggcgaggat4260 ctcgtcgtgacccatggcgatgcctgcttgccgaatatcatggtggaaaatggccgcttt4320 tctggattcatcgactgtggccggctgggtgtggcggaccgctatcaggacatagcgttg4380 gctacccgtgatattgctgaagagcttggcggcgaatgggctgaccgcttcctcgtgctt4440 tacggtatcgccgctcccgattcgcagcgcatcgccttctatcgccttcttgacgagttc4500 ttctgagcgggactctggggttcgaaatgaccgaccaagcgacgcccaacctgccatcac4560 gagattcgaattccaccgccgccttctatgaaaggttgggcttcggaatcgttttccggg4620 acgccggctggatgatcctccagcgcggggatctcatgctggagttcttcgcccgggctg4680 caggaattagcttatcgataccgtccattctagttgtggtttgtccaaactcatcaatgt4740 atcttatcatgtctgtataccgtcgagaagcttggcgcgccgaactaggccgcggatctg4800 gaaggtgctgaggtacgatgagacccgcaccaggtgcagaccctgcgagtgtggcggtaa4860 acatattaggaaccagcctgtgatgctggatgtgaccgaggagctgaggcccgatcactt4920 ggtgctggcctgcacccgcgctgagtttggctctagcgatgaagatacagattgaggtac4980 tgaaatgtgtgggcgtggcttaagggtgggaaagaatatataaggtgggggtcttatgta5040 gttttgtatctgttttgcagcagccgccgccgccatgagcaccaactcgtttgatggaag5100 cattgtgagctcatatttgacaacgcgcatgcccccatgggccggggtgcgtcagaatgt5160 gatgggctccagcattgatggtcgccccgtcctgcccgcaaactctactaccttgaccta5220 cgagaccgtgtctggaacgccgttggagactgcagcctccgccgccgcttcagccgctgc5280 agccaccgcccgcgggattgtgactgactttgctttcctgagcccgcttgcaagcagtgc5340 _ 8/29 _ 
agcttcccgttcatccgcccgcgatgacaagttgacggctcttttggcacaattggattc5400 tttgacccgggaacttaatgtcgtttctcagcagctgttggatctgcgccagcaggtttc5460 tgccctgaaggcttcctcccctcccaatgcggtttaaaacataaataaaaaaccagactc5520 tgtttggatttggatcaagcaagtgtcttgctgtctttatttaggggttttgcgcgcgcg5580 gtaggcccgggaccagcggtctcggtcgttgagggtcctgtgtattttttccaggacgtg5640 gtaaaggtgactctggatgttcagatacatgggcataagcccgtctctggggtggaggta5700 gcaccactgcagagcttcatgctgcggggtggtgttgtagatgatccagtcgtagcagga5760 gcgctgggcgtggtgcctaaaaatgtctttcagtagcaagctgattgccaggggcaggcc5820 cttggtgtaagtgtttacaaagcggttaagctgggatgggtgcatacgtggggatatgag5880 atgcatcttggactgtatttttaggttggctatgttcccagccatatccctccggggatt5940 catgttgtgcagaaccaccagcacagtgtatccggtgcacttgggaaatttgtcatgtag6000 cttagaaggaaatgcgtggaagaacttggagacgcccttgtgacctccaagattttccat6060 gcattcgtccataatgatggcaatgggcccacgggcggcggcctgggcgaagatatttct6120 gggatcactaacgtcatagttgtgttccaggatgagatcgtcataggecatttttacaaa6180 gcgcgggcggagggtgccagactgcggtataatggttccatccggcccaggggcgtagtt6240 accctcacagatttgcatttcccacgctttgagttcagatggggggatcatgtctacctg6300 cggggcgatgaagaaaacggtttccggggtaggggagatcagctgggaagaaagcaggtt6360 cctgagcagctgcgacttaccgcagccggtgggcccgtaaatcacacctattaccgggtg6420 caactggtagttaagagagctgcagctgccgtcatccctgagcaggggggccacttcgtt6480 aagcatgtccctgactcgcatgttttccctgaccaaatccgccagaaggcgctcgccgcc6540 cagcgatagcagttcttgcaaggaagcaaagtttttcaacggtttgagaccgtccgccgt6600 aggcatgcttttgagcgtttgaccaagcagttccaggcggtcccacagctcggtcacctg6660 ctctacggcatctcgatccagcatatctcctcgtttcgcgggttggggcggctttcgctg6720 tacggcagtagtcggtgctcgtccagacgggccagggtcatgtctttccacgggcgcagg6780 gtcctcgtcagcgtagtctgggtcacggtgaaggggtgcgctccgggctgcgcgctggcc6840 agggtgcgcttgaggctggtcctgctggtgctgaagcgctgccggtcttcgccctgcgcg6900 tcggccaggtagcatttgaccatggtgtcatagtccagcccctccgcggcgtggcccttg6960 gcgcgcagcttgcccttggaggaggcgccgcacgaggggcagtgcagacttttgagggcg7020 tagagcttgggcgcgagaaataccgattccggggagtaggcatccgcgccgcaggccccg7080 cagacggtctcgcattccacgagccaggtgagctctggccgttcggggtcaaaaaccagg7140 
tttcccccatgctttttgatgcgtttcttacctctggtttccatgagccggtgtccacgc7200 - ~ ° ~ a tcggtgacgaaaaggctgtccgtgtccccgtatacagacttgagaggcctgtcctcgacc7260 gatgcccttgagagccttcaacccagtcagctccttccggtgggcgcggggcatgactat7320 cgtcgccgcacttatgactgtcttctttatcatgcaactcgtaggacaggtgccggcagc7380 gctctgggtcattttcggcgaggaccgctttcgctggagcgcgacgatgatcggcctgtc7440 gcttgcggtattcggaatcttgcacgccctcgctcaagccttcgtcactggtcccgccac7500 caaacgtttcggcgagaagcaggccattatcgccggcatggcggccgacgcgctgggcta7560 cgtcttgctggcgttcgcgacgcgaggctggatggccttccccattatgattcttctcgc7620 ttccggcggcatcgggatgcccgcgttgcaggccatgctgtccaggcaggtagatgacga7680 ccatcagggacagcttcaaggatcgctcgcggctcttaccagctgagcaaaaggccagca7740 aaaggccaggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccgcccccc7800 tgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactata7860 aagataccaggcgtttccccCtggaagCtCCCtCgtgCgCtCtCCtgttCCgaCCCtgCC7920 gcttaccggatacctgtccgcctttctcccttcgggaagcgtggcgctttctcaatgctc7980 acgctgtaggtatctcagttcggtgtaggtcgttcgctccaagctgggctgtgtgcacga8040 aCCCCCCgttCagCCCgaCCgctgcgccttatCCggtaaCtatCgtCttgagtCCaaCCC8100 ggtaagacacgacttatcgccactggcagcagccactggtaacaggattagcagagcgag8160 gtatgtaggcggtgctacagagttcttgaagtggtggcctaactacggctacactagaag8220 gacagtatttggtatctgcgctctgctgaagccagttaccttcggaaaaagagttggtag8280 ctcttgatccggcaaacaaaccaccgctggtagcggtggtttttttgtttgcaagcagca8340 gattacgcgcagaaaaaaaggatctcaagaagatcctttgatcttttctacggggtctga8400 cgctcagtggaacgaaaactcacgttaagggattttggtcatgagattatcaaaaaggat8460 cttcacctagatccttttaaattaaaaatgaagttttaaatcaatctaaagtatatatga8520 gtaaacttggtctgacagttaccaatgcttaatcagtgaggcacctatctcagcgatctg8580 tctatttcgttcatccatagttgcctgactccccgtcgtgtagataactacgatacggga8640 gggcttaccatctggccccagtgctgcaatgataccgcgagacccacgctcaccggctcc8700 agatttatcagcaataaaccagccagccggaagggccgagcgcagaagtggtcctgcaac8760 tttatccgcctccatccagtctattaattgttgccgggaagctagagtaagtagttcgcc8820 agttaatagtttgcgcaacgttgttgccattgctgcaggcatcgtggtgtcacgctcgtc8880 gtttggtatggcttcattcagctccggttcccaacgatcaaggcgagttacatgatcccc8940 
catgttgtgcaaaaaagcggttagctccttcggtcctccgatcgttgtcagaagtaagtt9000 -ggccgcagtgttatcactcatggttatggcagcactgcataattctcttactgtcatgcc9060 atccgtaagatgcttttctgtgactggtgagtactcaaccaagtcattctgagaatagtg9120 tatgcggcgaccgagttgctcttgcccggcgtcaacacgggataataccgcgccacatag9180 cagaactttaaaagtgctcatcattggaaaacgttcttcggggcgaaaactctcaaggat9240 cttaccgctgttgagatccagttcgatgtaacccactcgtgcacccaactgatcttcagc9300 atcttttactttcaccagcgtttctgggtgagcaaaaacaggaaggcaaaatgccgcaaa9360 aaagggaataagggcgacacggaaatgttgaatactcatactcttcctttttcaatatta9420 ttgaagcatttatcagggttattgtctcatgagcggatacatatttgaatgtatttagaa9480 aaataaacaaataggggttccgcgcacatttccccgaaaagtgccacctgacgtctaaga9540 aaccattattatcatgacattaacctataaaaataggcgtatcacgaggccctttcgtct9600 tcaa 9604 <210>

<211>
<212> DNA
<213> artificial sequence
<220>
<223> Sequence is completely synthesized
<400>

taatcagggaattaattcttgaagacgaaagggccaggtggcacttttcggggaaatgtg60 cgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgaga120 caataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacat180 ttCCgtgtCgCCCttattCCCttttttgCggCattttgCCttCCtgtttttgctcaccca240 gaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatc300 gaactggatctcaacagcggtaagatccttgagagttttcgccccgaagaacgttttcca360 atgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtgttgacgccggg420 caagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcacca480 gtcacagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccata540 accatgagtgataacactgcggccaacttacttctgacaacgatcggaggaccgaaggag600 ctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttgggaaccg660 gagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggca720 acaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaatta780 atagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggct840 ggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgca900 gcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcag960 gcaactatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcat1020 tggtaactgtcagaccaagtttactcatatatactttagattgatttaaaacttcatttt1080 taatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaa1140 cgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttga1200 gatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcg1260 gtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagc1320 agagcgcagataccaaatactgtccttctagtgtagccgtagttaggccaccacttcaag1380 aactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgcc1440 agtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcg1500 cagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctac1560 accgaactgagatacctacagcgtgagcattgagaaagcgccacgcttcccgaagggaga1620 aaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagctt1680 ccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgag1740 cgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcg1800 gacattgattattgactagttattaatagtaatcaattacggggtcattagttcatagcc1860 
catatatggagttccgcgttacataacttacggtaaatggcccgcctggctgaccgccca1920 acgacccccgcccattgacgtcaataatgacgtatgttcccatagtaacgccaataggga1980 ctttccattgacgtcaatgggtggactatttacggtaaactgcccacttggcagtacatc,2040 aagtgtatcatatgccaagtacgccccctattgacgtcaatgacggtaaatggcccgcct2100 ggcattatgcccagtacatgaccttatgggactttcctacttggcagtacatctacgtat2160 tagtcatcgctattaccatggtgatgcggttttggcagtacatcaatgggcgtggatagc2220 ggtttgactcacggggatttccaagtctccaccccattgacgtcaatgggagtttgtttt2280 ggcaccaaaatcaacgggactttccaaaatgtcgtaacaactccgccccattgacgcaaa2340 tgggcggtaggcgtgtacggtgggaggtctatataagcagagctctctggctaactagat2400 tgacggcgtagtacacactattgaatcaaacagccgaccaattgttgtgcccagtcaaag2460 gaagttcctatactttctagagaataggaacttcggaataggaacttccccgggggatcc2520 acgcggaaccagatccgattttggaggatggtcgccaccaccaaacgtggcttgccagcc2580 ctgcaaaggccatgctatatacttgctggatttcaagtacttatcaatttgtgggatagc2640 ttcaatacgttttttaaaacaaactaattttgggaacgcatccaggcacattgggtccat2700 ~ ~~ss~r. ~r ~ o eu ~ ~

-gtataaaacaacatcaagagcgtcatacaacatgaagtcaggatgggttacatgatcacc2760 atttcgcgcgtaatacgactcactatagggcgaattggaacagctcatttcagaatcgcg2820 ctgaagttcctatactttctagagaataggaacttcggaataggaacttcaaggatccac2880 tagttctagagcaattgcactaccatcacaatggagaagccagtagtaaacgtagacgta2940 gacccccagagtccgtttgtcgtgcaactgcaaaaaagcttcccgcaatttgaggtagta3000 gcacagcaggtcactccaaatgaccatgctaatgccagagcattttcgcatctggccagt3060 aaactaatcgagctggaggttcctaccacagcgacgatcttggacataggcagcgcaccg3120 gctcgtagaatgttttccgagcaccagtatcattgtgtctgccccatgcgtagtccagaa3180 gacccggaccgcatgatgaaatacgccagtaaactggcggaaaaagcgtgcaagattaca3240 aacaagaacttgcatgagaagattaaggatctccggaccgtacttgatacgccggatgct3300 gaaacaccatcgctctgctttcacaacgatgttacctgcaacatgcgtgccgaatattcc3360 gtcatgcaggacgtgtatatcaacgctcccggaactatctatcatcaggctatgaaaggc3420 gtgcggaccctgtactggattggcttcgacaccacccagttcatgttctcggctatggca3480 ggttcgtaccctgcgtacaacaccaactgggccgacgagaaagtccttgaagcgcgtaac3540 atcggactttgcagcacaaagctgagtgaaggtaggacaggaaaattgtcgataatgagg3600 aagaaggagttgaagcccgggtcgcgggtttatttctccgtaggatcgacactttatcca3660 gaacacagagccagcttgcagagctggcatcttccatcggtgttccacttgaatggaaag3720 cagtcgtacacttgccgctgtgatacagtggtgagttgcgaaggctacgtagtgaagaaa3780 atcaccatcagtcccgggatcacgggagaaaccgtgggatacgcggttacacacaatagc3840 gagggcttcttgctatgcaaagttactgacacagtaaaaggagaacgggtatcgttccct3900 gtgtgcacgtacatcccggccaccatatgcgatcagatgactggtataatggccacggat3960 atatcacctgacgatgcacaaaaacttctggttgggctcaaccagcgaattgtcattaac4020 ggtaggactaacaggaacaccaacaccatgcaaaattaccttctgccgatcatagcacaa4080 gggttcagcaaatgggctaaggagcgcaaggatgatcttgataacgagaaaatgctgggt4140 actagagaacgcaagcttacgtatggctgcttgtgggcgtttcgcactaagaaagtacat4200 tcgttttatcgcccacctggaacgcagacctgcgtaaaagtcccagcctcttttagcgct4260 tttcccatgtcgtccgtatggacgacctctttgcccatgtcgctgaggcagaaattgaaa~

ctggcattgcaaccaaagaaggaggaaaaactgctgcaggtctcggaggaattagtcatg4380 gaggccaaggctgcttttgaggatgctcaggaggaagccagagcggagaagctccgagaa4440 gcacttccaccattagtggcagacaaaggcatcgaggcagccgcagaagttgtctgcgaa4500 gtggaggggctccaggcggacatcggagcagcattagttgaaaccccgcgcggtcacgta4560 aggataatacctcaagcaaatgaccgtatgatcggacagtatatcgttgtctcgccaaac4620 tctgtgctgaagaatgccaaactcgcaccagcgcacccgctagcagatcaggttaagatc4680 ataacacactccggaagatcaggaaggtacgcggtcgaaccatacgacgctaaagtactg4740 atgccagcaggaggtgccgtaccatggccagaattcctagcactgagtgagagcgccacg4800 ttagtgtacaacgaaagagagtttgtgaaccgcaaactataccacattgccatgcatggc4860 cccgccaagaatacagaagaggagcagtacaaggttacaaaggcagagcttgcagaaaca4920 gagtacgtgtttgacgtggacaagaagcgttgcgttaagaaggaagaagcctcaggtctg4980 gtcctctcgggagaactgaccaaccctccctatcatgagctagctctggagggactgaag5040 acccgacctgcggtcccgtacaaggtcgaaacaataggagtgataggcacaccggggtcg5100 ggcaagtcagctattatcaagtcaactgtcacggcacgagatcttgttaccagcggaaag5160 aaagaaaattgtcgcgaaattgaggccgacgtgctaagactgaggggtatgcagattacg5220 tcgaagacagtagattcggttatgctcaacggatgccacaaagccgtagaagtgctgtac5280 gttgacgaagcgttcgcgtgccacgcaggagcactacttgccttgattgctatcgtcagg5340 ccccgcaagaaggtagtactatgcggagaccccatgcaatgcggattcttcaacatgatg5400 caactaaaggtacatttcaatcaccctgaaaaagacatatgcaccaagacattctacaag5460 tatatctcccggcgttgcacacagccagttacagctattgtatcgacactgcattacgat5520 ggaaagatgaaaaccacgaacccgtgcaagaagaacattgaaatcgatattacaggggcc5580 acaaagccgaagccaggggatatcatcctgacatgtttccgcgggtgggttaagcaattg5640 caaatcgactatcccggacatgaagtaatgacagccgcggcctcacaagggctaaccaga5700 ~aaaggagtgtatgccgtccggcaaaaagtcaatgaaaacccactgtacgcgatcacatca5760 gagcatgtgaacgtgttgctcacccgcactgaggacaggctagtgtggaaaaccttgcag5820 ggcgacccatggattaagcagcccactaacatacctaaaggaaactttcaggctactata5880 gaggactgggaagctgaacacaagggaataattgctgcaataaacagccccactccccgt5940 gccaatccgttcagctgcaagaccaacgtttgctgggcgaaagcattggaaccgatacta6000 gccacggccggtatcgtacttaccggttgccagtggagcga.actgttcccacagtttgcg6060 gatgacaaaccacattcggccatttacgccttagacgtaatttgcattaagtttttcggc6120 
atggacttgacaagcggactgttttctaaacagagcatcccactaacgtaccatcccgcc6180 gattcagcgaggccggtagctcattgggacaacagcccaggaacccgcaagtatgggtac6240 gatcacgccattgccgccgaactctcccgtagatttccggtgttccagctagctgggaag6300 ggcacacaacttgatttgcagacggggagaaccagagttatctctgcacagcataacctg6360 -gtcccggtgaaccgcaatcttcctcacgccttagtccccgagtacaaggagaagcaaccc6420 ggcccggtcaaaaaattcttgaaccagttcaaacaccactcagtacttgtggtatcagag6480 gaaaaaattgaagctccccgtaagagaatcgaatggatcgccccgattggcatagccggt6540 gcagataagaactacaacctggctttcgggtttccgccgcaggcacggtacgacctggtg6600 ttcatcaacattggaactaaatacagaaaccaccactttcagcagtgcgaagaccatgcg6660 gCgaCCttaaaaaCCCtttCgCgttCggCCCtgaattgCCttaaCCCaggaggcaccctc6720 gtggtgaagtcctatggctacgccgaccgcaacagtgaggacgtagtcaccgctcttgcc6780 agaaagtttgtcagggtgtctgcagcgagaccagattgtgtctcaagcaatacagaaatg6840 tacctgattttccgacaactagacaacagccgtacacggcaattcaccccgcaccatctg6900 aattgcgtgatttcgtccgtgtatgagggtacaagagatggagttggagccgcgccgtca6960 taccgcaccaaaagggagaatattgctgactgtcaagaggaagcagttgtcaacgcagcc7020 aatccgctgggtagaccaggcgaaggagtctgccgtgccatctataaacgttggccgacc7080 agttttaccgattcagccacggagacaggcaccgcaagaatgactgtgtgcctaggaaag7140 aaagtgatccacgcggtcggccctgatttccggaagcacccagaagcagaagccttgaaa7200 ttgctacaaaacgcctaccatgcagtggcagacttagtaaatgaacataacatcaagtct7260 gtcgccattccactgctatctacaggcatttacgcagccggaaaagaccgccttgaagta7320 tcacttaactgcttgacaaccgcgctagacagaactgacgcggacgtaaccatctattgc7380 ctggataagaagtggaaggaaagaatcgacgcggcactccaacttaaggagtctgtaaca7440 gagctgaaggatgaagatatgga.gatcgacgatgagttagtatggattcatccagacagt7500 tgcttgaagggaagaaagggattcagtactacaaaaggaaaattgtattcgtacttcgaa7560 ggcaccaaattccatcaagcagcaaaagacatggcggagataaaggtcctgttccctaat7620 gaccaggaaagtaatgaacaactgtgtgcctacatattgggtgagaccatggaagcaatc7680 cgcgaaaagtgcccggtcgaccataacccgtcgtctagcccgcccaaaacgttgccgtgc7740 ctttgcatgtatgccatgacgccagaaagggtccacagacttagaagcaataacgtcaaa7800 gaagttacagtatgctcctccaccccccttcctaagcacaaaattaagaatgttcagaag7860 gttcagtgcacgaaagtagtcctgtttaatccgcacactcccgcattcgttcccgcccgt7920 
aagtacatagaagtgccagaacagcctaccgctcctcctgcacaggccgaggaggccccc7980 gaagttgtagcgacaccgtcaccatctacagctgataacacctcgcttgatgtcacagac8040 atctcactggatatggatgacagtagcgaaggctcacttttttcgagctttagcggatcg8100 gacaactctattactagtatggacagttggtcgtcaggacctagttcactagagatagta8160 gaccgaaggcaggtggtggtggctgacgttcatgccgtccaagagcctgcccctattcca8220 ggaaagatgaaaaccacgaacccg ccgccaaggc taaagaagat ggcccgcctg gcagcggcaa gaaaagagcc cactccaccg 8280 gcaagcaata gctctgagtc cctccacctc tcttttggtg gggtatccat gtccctcgga 8340 tcaattttcg acggagagac ggcccgccag gcagcggtac aacccctggc aacaggcccc 8400 acggatgtgc ctatgtcttt cggatcgttt tccgacggag agattgatga gctgagccgc 8460 agagtaactg agtccgaacc cgtcctgttt ggatcatttg aaccgggcga agtgaactca 8520 attatatcgt cccgatcagc cgtatctttt ccactacgca agcagagacg tagacgcagg 8580 agcaggagga ctgaatactg actaaccggg gtaggtgggt acatattttc gacggacaca 8640 ggccctgggc acttgcaaaa gaagtccgtt ctgcagaacc agcttacaga accgaccttg 8700 gagcgcaatg tcctggaaag aattcatgcc ccggtgctcg acacgtcgaa agaggaacaa 8760 ctcaaactca ggtaccagat gatgcccacc gaagccaaca aaagtaggta ccagtctcgt 8820 aaagtagaaa atcagaaagc cataaccact gagcgactac tgtcaggact acgactgtat 8880 aactctgcca cagatcagcc agaatgctat aagatcacct atccgaaacc attgtactcc 8940 agtagcgtac cggcgaacta ctccgatcca cagttcgctg tagctgtctg taacaactat 9000 ctgcatgaga actatccgac agtagcatct tatcagatta ctgacgagta cgatgcttac 9060 ttggatatgg tagacgggac agtcgcctgc ctggatactg caaccttctg ccccgctaag 9120 cttagaagtt acccgaaaaa acatgagtat agagccccga atatccgcag tgcggttcca 9180 tcagcgatgc agaacacgct acaaaatgtg ctcattgccg caactaaaag aaattgcaac 9240 gtcacgcaga tgcgtgaact gccaacactg gactcagcga cattcaatgt cgaatgcttt 9300 cgaaaatatg catgtaatga cgagtattgg gaggagttcg ctcggaagcc aattaggatt 9360 accactgagt ttgtcaccgc atatgtagct agactgaaag gccctaaggc cgccgcacta 9420 tttgcaaaga cgtataattt ggtcccattg caagaagtgc ctatggatag attcgtcatg 9480 gacatgaaaa gagacgtgaa agttacacca ggcacgaaac acacagaaga aagaccgaaa 9540 gtacaagtga tacaagccgc agaacccctg gcgactgctt acttatgcgg gattcaccgg 9600 gaattagtgc 
gtaggcttac ggccgtcttg cttccaaaca ttcacacgct ttttgacatg 9660 tcggcggagg attttgatgc aatcatagca gaacacttca agcaaggcga cccggtactg 9720 gagacggata tcgcatcatt cgacaaaagc caagacgacg ctatggcgtt aaccggtctg 9780 atgatcttgg aggacctggg tgtggatcaa ccactactcg acttgatcga gtgcgccttt 9840 ggagaaatat catccaccca tctacctacg ggtactcgtt ttaaattcgg ggcgatgatg 9900 aaatccggaa tgttcctcac actttttgtc aacacagttt tgaatgtcgt tatcgccagc 9960 agagtactag aagagcggct taaaacgtcc agatgtgcag cgttcattgg cgacgacaac 10020 atcatacatg gagtagtatc tgacaaagaa atggctgaga ggtgcgccac ctggctcaac 10080 atggaggtta agatcatcga cgcagtcatc ggtgagagac caccttactt ctgcggcgga 10140 tttatcttgc aagattcggt tacttccaca gcgtgccgcg tggcggatcc cctgaaaagg 10200 ctgtttaagt tgggtaaacc gctcccagcc gacgacgagc aagacgaaga cagaagacgc 10260 gctctgctag atgaaacaaa ggcgtggttt agagtaggta taacaggcac tttagcagtg 10320 gccgtgacga cccggtatga ggtagacaat attacacctg tcctactggc attgagaact 10380 tttgcccaga gcaaaagagc attccaagcc atcagagggg aaataaagca tctctacggt 10440 ggtcctaaat agtcagcata gtacatttca tctgactaat,actacaacac caccacctct 10500 agacgcgagc ttgattagtc agcatagtac atttcatctg actaatacta caacaccacc 10560 accatgaata~°gaggattctt taacatgctc ggccgccgcc ccttcccggc ccccactgcc atgtggaggc cgcggagaag gaggcaggcg gccccgatgc ctgcccgcaa cgggctggct 10680 tctcaaatcc agcaactgac cacagccgtc agtgccctag tcattggaca ggcaactaga 10740 cctcaacccc cacgtccacg cccgccaccg cgccagaaga agcaggcgcc caagcaacca 10800 ccgaagccga agaaaccaaa aacgcaggag aagaagaaga agcaacctgc aaaacccaaa 10860 cccggaaaga gacagcgcat ggcacttaag ttggaggccg acagattgtt cgacgtcaag 10920 aacgaggacg gagatgtcat cgggcacgca ctggccatgg aaggaaaggt aatgaaacct 10980 ctgcacgtga aaggaaccat cgaccaccct gtgctatcaa agctcaaatt taccaagtcg 11040 tcagcatacg acatggagtt cgcacagttg ccagtcaaca tgagaagtga ggcattcacc 11100 tacaccagtg aacaccccga aggattctat aactggcacc acggagcggt gcagtatagt 11160 ggaggtagat ttaccatccc tcgcggagta ggaggcagag gagacagcgg tcgtccgatc 11220 atggataact ccggtcgggt tgtcgcgata gtcctcggtg gcgctgatga aggaacacga 11280 
actgcccttt cggtcgtcac ctggaatagt aaagggaaga caattaagac gaccccggaa 11340 gggacagaag agtggtccgc agcaccactg gtcacggcaa tgtgtttgct cggaaatgtg 11400 agcttcccat gcgaccgccc gcccacatgc tatacccgcg aaccttccag agccctcgac 11460 atccttgaag agaacgtgaa ccatgaggcc tacgataccc tgctcaatgc catattgcgg 11520 tgcggatcgt ctggcagaag caaaagaagc gtcattgacg actttaccct gaccagcccc 11580 tacttgggca catgctcgta ctgccaccat actgtaccgt gcttcagccc tgttaagatc 11640 gagcaggtct gggacgaagc ggacgataac accatacgca tacagacttc cgcccagttt 11700 ggatacgacc aaagcggagc agcaagcgca aacaagtacc gctacatgtc gcttaagcag 11760 gatcacaccg ttaaagaagg caccatggat gacatcaaga ttagcacctc aggaccgtgt 11820 agaaggctta gctacaaagg atactttctc ctcgcaaaat gccctccagg ggacagcgta 11880 acggttagca tagtgagtag caactcagca acgtcatgta cactggcccg caagataaaa 11940 ccaaaattcg tgggacggga aaaatatgat ctacctcccg ttcacggtaa aaaaattcct 12000 tgcacagtgt acgaccgtct gaaagaaaca actgcaggct acatcactat gcacaggccg 12060 agaccgcacg cttatacatc ctacctggaa gaatcatcag ggaaagttta cgcaaagccg 12120 ccatctggga agaacattac gtatgagtgc aagtgcggcg actacaagac cggaaccgtt 12180 tcgacccgca ccgaaatcac tggttgcacc gccatcaagc agtgcgtcgc ctataagagc 12240 gaccaaacga agtgggtctt caactcaccg gacttgatca gacatgacga ccacacggcc 12300 caagggaaat tgcatttgcc tttcaagttg atcccgagta cctgcatggt ccctgttgcc 12360 cacgcgccga atgtaataca tggctttaaa cacatcagcc tccaattaga tacagaccac 12420 ttgacattgc tcaccaccag gagactaggg gcaaacccgg aaccaaccac tgaatggatc 12480 gtcggaaaga cggtcagaaa cttcaccgtc gaccgagatg gcctggaata catatgggga 12540 aatcatgagc cagtgagggt ctatgcccaa gagtcagcac caggagaccc tcacggatgg 12600 ccacacgaaa tagtacagca ttactaccat cgccatcctg tgtacaccat cttagccgtc 12660 gcatcagcta ccgtggcgat gatgattggc gtaactgttg cagtgttatg tgcctgtaaa 12720 gcgcgccgtg agtgcctgac gccatacgcc ctggccccaa acgccgtaat cccaacttcg 12780 ctggcactct tgtgctgcgt taggtcggcc aatgctgaaa cgttcaccga gaccatgagt 12840 tacttgtggt cgaacagtca gccgttcttc tgggtccagt tgtgcatacc tttggccgct 12900 ttcatcgttc taatgcgctg ctgctcctgc tgcctgcctt ttttagtggt 
tgccggcgcc 1290 tacctggcga aggtagacgc ctacgaacat gcgaccactg ttccaaatgt gccacagata 13020 ccgtataagg cacttgttga aagggcaggg tatgccccgc tcaatttgga gatcactgtc 13080 atgtcctcgg aggttttgcc ttccaccaac caagagtaca ttacctgcaa attcaccact 13140 gtggtcccct ccccaaaaat caaatgctgc ggctccttgg aatgtcagcc ggccgctcat 13200 gcagactata cctgcaaggt cttcggaggg gtctacccct ttatgtgggg aggagcgcaa 13260 tgtttttgcg acagtgagaa cagccagatg agtgaggcgt acgtcgaatt gtcagcagat 13320 tgcgcgtctg accacgcgca ggcgattaag gtgcacactg ccgcgatgaa agtaggactg 13380 cgtattgtgt acgggaacac taccagtttc ctagatgtgt acgtgaacgg agtcacacca 13440 ggaacgtcta aagacttgaa agtcatagct ggaccaattt cagcatcgtt tacgccattc 13500 gatcataagg tcgttatcca tcgcggcctg gtgtacaact atgacttccc ggaatatgga 13560 gcgatgaaac caggagcgtt tggagacatt caagctacct ccttgactag caaggatctc 13620 atcgccagca cagacattag gctactcaag ccttccgcca agaacgtgca tgtcccgtac 13680 acgcaggcct catcaggatt tgagatgtgg aaaaacaact caggccgccc actgcaggaa 13740 accgcacctt tcgggtgtaa gattgcagta aatccgctcc gagcggtgga ctgttcatac 13800 gggaacattc ccatttctat tgacatcccg aacgctgcct ttatcaggac atcagatgca 13860 ccactggtct caacagtcaa atgtgaagtc agtgagtgca cttattcagc agacttcggc 13920 gggatggcca ccctgcagta tgtatccgac cgcgaaggtc aatgccccgt acattcgcat 13980 tcgagcacag caactctcca agagtcgaca gtacatgtcc tggagaaagg agcggtgaca 14040 gtacacttta gcaccgcgag tccacaggcg aactttatcg tatcgctgtg tgggaagaag 14100 acaacatgca atgcagaatg taaaccacca gctgaccata tcgtgagcac cccgcacaaa 14160 aatgaccaag aatttcaagc cgccatctca aaaacatcat ggagttggct gtttgccctt 14220 ttcggcggcg cctcgtcgct attaattata ggacttatga tttttgcttg cagcatgatg 14280 ctgactagca cacgaagatg accgctacgc cccaatgatc cgaccagcaa aactcgatgt 14340 acttccgagg aactgatgtg cataatgcat ctctagcgat gtacgggcca gatatacgcg 14400 tatctgaggg gactagggtg tgtttaggcg aaaagcgggg cttcggttgt acgcggttag 14460 gagtcccctc aggatatagt agtttcgctt ttgcataggg agggggaaat gtagtcttat 14520 gcaatacact tgtagtcttg caacatggta acgatgagtt agcaacatgc cttacaagga 14580 gagaaaaagc accgtgcatg ccgattggtg 
gaagtaaggt ggtacgatcg tgccttatta 14640 ggaaggcaac agacaggtct gacatggatt ggacgaacca ctgaattccg cattgcagag 14700 ataattgtat ttaagtgcct agctcgatac aataaacgcc atttgaccat tcaccacatt 14760 ggtgtgcacc tccaagcttg gtaccgagct cggatccact agtaacggcc gccagtgtgc 14820 tggaattctg cagatcatag tgtgaattcg cggccgctct attggatcca ctagtaacgg 14880 ccgccagtgt gctggaattc tgcagatcgc gcaagaaatg gctagcaaag gagaagaact 14940 cttcactgga gttgtcccaa ttcttgttga attagatggt gatgttaacg gccacaagtt 15000 ctctgtcagt ggagagggtg aaggtgatgc aacatacgga aaacttaccc tgaagttcat 15060 ctgcactact ggcaaactgc ctgttccatg gccaacacta gtcactactc tgtgctatgg 15120 tgttcaatgc ttttcaagat acccggatca tatgaaacgg catgactttt tcaagagtgc 15180 catgcccgaa ggttatgtac aggaaaggac catcttcttc aaagatgacg gcaactacaa 15240 gacacgtgct gaagtcaagt ttgaaggtga tacccttgtt aatagaatcg agttaaaagg 15300 tattgacttc aaggaagatg gcaacattct gggacacaaa ttggaataca actataactc 15360 acacaatgta tacatcatgg cagacaaaca aaagaatgga atcaaagtga acttcaagac 15420 ccgccacaac attgaagatg gaagcgttca actagcagac cattatcaac aaaatactcc 15480 aattggcgat ggccctgtcc ttttaccaga caaccattac ctgtccacac aatctgccct 15540 ttcgaaagat cccaacgaaa agagagacca catggtcctt cttgagtttg taacagctgc 15600 tgggattaca catggcatgg atgaactgta caacatcgat ggaggcggag gtggaaaggg 15660 cccggttacc ggtaccggat cccggctcga gcatgcaggc cttgggccca atgatccgac 15720 cagcaaaact cgatgtactt ccgaggaact gatgtgcata atgcatcagg ctggtacatt 15780 agatccccgc ttaccgcggg caatatagca acactaaaaa ctcgatgtac ttccgaggaa 15840 gcgcagtgca taatgctgcg cagtgttgcc acataaccac tatattaacc atttatctag 15900 cggacgccaa aaactcaatg tatttctgag gaagcgtggt gcataatgcc acgcagcgtc 15960 tgcataactt ttattatttc ttttattaat caacaaaatt ttgtttttaa catttcaaaa 16020 aaaaaaaaaa aaaaaaaaaa aaaaaaaatt taaattaatt aagcttaatt cctcgattaa 16080 ttaagcggcc ctagaggatc tttgtgaagg aaccttactt ctgtggtgtg acataattgg 16140 acaaactacc tacagagatt taaagctcta aggtaaatat aaaattttta agtgtataat 16200 gtgttaaact actgattcta attgtttgtg tattttagat tccaacctat ggaactgatg 16260 aatgggagca 
gtggtggaat gcctttaatg aggaaaacct gttttgctca gaagaaatgc 16320 catctagtga tgatgaggct actgctgact ctcaacattc tactcctcca aaaaagaaga 16380 gaaaggtaga agaccccaag gacttt cctt cagaattgct aagttttttg agtcatgctg 16440 tgtttagtaa tagaactctt gcttgctttg ctatttacac cacaaaggaa aaagctgcac 16500 tgctatacaa gaaaattatg gaaaaatatt tgatgtatag tgccttgact agagatcata 16560 atcagccata ccacatttgt agaggtttta cttgctttaa aaaacctccc acacctcccc 16620 ctgaacctga aacataaaat gaatgcaatt gttgttgtta acttgtttat tgcagcttat 16680 aatggttaca aataaagcaa tagcatcaca aatttcacaa ataaagcatt tttttcactg 16740 cattctagtt gtggtttgtc caaactcatc aatgtatctt atcatgtctg gatcatcccg 16800 ccatggtatc aacgccatat ttctatttac agtagggacc tcttcgttgt gtaggtaccg 16860 ctgtattcct agggaaatag tagaggcacc ttgaactgtc tgcatcagcc atatagcccc 16920 cgctgttcga cttacaaaca caggcacagt aaattaat 16958 <210> 4 <211> 345 <212> DNA
<213> Saccharomyces cerevisiae <220>
<221> CDS
<222> (1)..(345) <223>
<400> 4
atgcca caatttggt atatta tgtaaaaca ccacctaag gtgcttgtt 48
MetPro GlnPheGly IleLeu CysLysThr ProProLys ValLeuVal
cgtcag tttgtggaa aggttt gaaagacct tcaggtgag aaaatagca 96
ArgGln PheValGlu ArgPhe GluArgPro SerGlyGlu LysIleAla
ttatgt gctgctgaa ctaacc tatttatgt tggatgatt acacataac 144
LeuCys AlaAlaGlu LeuThr TyrLeuCys TrpMetIle ThrHisAsn
ggaaca gcaatcaag agagcc acattcatg agctataat actatcata 192
GlyThr AlaIleLys ArgAla ThrPheMet SerTyrAsn ThrIleIle
agcaat tcgctgagt ttcgat attgtcaataaa tcactc cagtttaaa 240
SerAsn SerLeuSer PheAsp IleValAsnLys SerLeu GlnPheLys
tacaag acgcaaaaa gcaaca attctggaagcc tcatta aagaaattg 288
TyrLys ThrGlnLys AlaThr IleLeuGluAla SerLeu LysLysLeu
attcct gcttgggaa tttaca attattccttac tatgga caaaaacat 336
IlePro AlaTrpGlu PheThr IleIleProTyr TyrGly GlnLysHis
caatct gat 345
GlnSer Asp
<210> 5 <211> 115 <212> PRT
<213> Saccharomyces cerevisiae
<400> 5
Met Pro Gln Phe Gly Ile Leu Cys Lys Thr Pro Pro Lys Val Leu Val
1 5 10 15
Arg Gln Phe Val Glu Arg Phe Glu Arg Pro Ser Gly Glu Lys Ile Ala
Leu Cys Ala Ala Glu Leu Thr Tyr Leu Cys Trp Met Ile Thr His Asn
Gly Thr Ala Ile Lys Arg Ala Thr Phe Met Ser Tyr Asn Thr Ile Ile
Ser Asn Ser Leu Ser Phe Asp Ile Val Asn Lys Ser Leu Gln Phe Lys
Tyr Lys Thr Gln Lys Ala Thr Ile Leu Glu Ala Ser Leu Lys Lys Leu
Ile Pro Ala Trp Glu Phe Thr Ile Ile Pro Tyr Tyr Gly Gln Lys His
Gln Ser Asp
<210> 6 <211> 345 <212> DNA
<213> artificial sequence <220>
<223> Sequence is derived from Saccharomyces cerevisiae (Flp recombinase) <220>
<221> CDS
<222> (1)..(345) <223>
<400> 6
atgagccagttc ggcatcctg tgcaagact ccacct aaggtgctggtc 48
MetSerGlnPhe GlyIleLeu CysLysThr ProPro LysValLeuVal
cggcagtttgtg gaaaggttc gagagaccc agcgga gagaaaatcgca 96
ArgGlnPheVal GluArgPhe GluArgPro SerGly GluLysIleAla
agctgtgccgct gaactcacc tatctgtgc tggatg attacacacaac 144
SerCysAlaAla GluLeuThr TyrLeuCys TrpMet IleThrHisAsn
ggcaccgccatc aagagagcc accttcatg tcctac aatacaatcatc 192
GlyThrAlaIle LysArgAla ThrPheMet SerTyr AsnThrIleIle
agcaattctctg agcttcgac attgtcaac aagagc ctccagttcaag 240
SerAsnSerLeu SerPheAsp IleValAsn LysSer LeuGlnPheLys
tacaagacccag aaggctacc atcctggag gcctcc ctgaagaagctg 288
TyrLysThrGln LysAlaThr IleLeuGlu AlaSer LeuLysLysLeu
atcccagcatgg gagtttccc atcatccct tacaac gggcagaagcac 336
IleProAlaTrp GluPhePro IleIlePro TyrAsn GlyGlnLysHis
cagagcgat 345
GlnSerAsp
<210> 7 <211> 115 <212> PRT
<213> artificial sequence <220>
<223> Sequence is derived from Saccharomyces cerevisiae (Flp recombinase)
<400> 7
Met Ser Gln Phe Gly Ile Leu Cys Lys Thr Pro Pro Lys Val Leu Val
Arg Gln Phe Val Glu Arg Phe Glu Arg Pro Ser Gly Glu Lys Ile Ala
Ser Cys Ala Ala Glu Leu Thr Tyr Leu Cys Trp Met Ile Thr His Asn
Gly Thr Ala Ile Lys Arg Ala Thr Phe Met Ser Tyr Asn Thr Ile Ile
Ser Asn Ser Leu Ser Phe Asp Ile Val Asn Lys Ser Leu Gln Phe Lys
Tyr Lys Thr Gln Lys Ala Thr Ile Leu Glu Ala Ser Leu Lys Lys Leu
Ile Pro Ala Trp Glu Phe Pro Ile Ile Pro Tyr Asn Gly Gln Lys His
Gln Ser Asp
<210> 8 <211> 48 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 8 gaagttccta ttccgaagtt cctattctct agaaagtata ggaacttc 48 <210> 9 <211> 62 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 9 cgcgggatcc ttgaagttcc tattccgaag ttcctattct ctagaaagta taggaacttc 60 ag 62 <210> 10 <211> 62 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 10 cgcgctgaag ttcctatact ttctagagaa taggaacttc ggaataggaa cttcaaggat 60 cc 62 <210> 11 <211> 72 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 11 ggggaagttc ctattccgaa gttcctattc tctagaaagt ataggaactt cctatgactg 60 ggcacaacag ac 72 <210> 12 <211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 12 gggcgaagaa ctccagca 18 <210> 13 <211> 19 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 13 ggcctagttc ggcgcgcca 19 <210> 14 <211> 19 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 14 agcttggcgc gccgaacta 19 <210> 15 <211> 22 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 15 cgcttctagt ttaaacgctc ga 22 <210> 16 <211> 22 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 16 cgtcgagcgt ttaaactaga ag 22 <210> 17 <211> 24 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 17 tcgcgattta aattaattaa gctt 24 <210> 18 <211> 24 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 18 aagcttaatt aatttaaatc gcga 24 <210> 19 <211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 19 cccatatatg gagttccg 18 <210> 20
<211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 20 ggtcggtcat ttcgaacc 18 <210> 21 <211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 21 gctctagaac tagtggat 18 <210> 22 <211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 22 gaattatgca gtgctgcc 18 <210> 23 <211> 72 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 23 tggtagtgca attggtcggc tgtttgattc aatagtgtgt actacgccgt caatctagtt 60 agccagagag ct 72 <210> 24 <211> 31 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 24 tggaattcag atctgggagg tctatataag c 31 <210> 25 <211> 21 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 25 gatccggagg actgtcctcc g 21 <210> 26 <211> 21 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 26 gatccggagg acagtcctcc g 21 <210> 27 <211> 20 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 27 gttgtgccca gtcaaaggaa 20 <210> 28 <211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 28 gtttactact ggcttctc 18 <210> 29 <211> 20 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 29 aagcatgcca caatttggta 20 <210> 30 <211> 23 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 30 gtacttatat gcgtctattt atg 23 <210> 31 <211> 89 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 31 atcgctctgg tgcttctgcc cgttgtaagg gatgatggta aactcccatg ctgggatcag 60 cttcttcagg gaggcctcca ggatggtag 89 <210> 32 <211> 86 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 32 gtagaggatc cagacctctg ggcggccgct cagcaagctt cgaagcatga gccagttcgg 60 catcctgtgc aagactccac ctaagg 86 <210> 33 <211> 83 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 33 gcaattctct gagcttcgac attgtgaaca agagcctcca gttcaagtac aagacccaga 60 aggctaccat cctggaggcc tcc 83 <210> 34 <211> 82 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 34 gtcgaagctc agagaattgc tgatgattgt attgtaggac atgaaggtgg ctctcttgat 60 ggcggtgccg ttgtgtgtaa tc 82 <210> 35 <211> 80 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 35 gacccagcgg agagaaaatc gccagctgtg ccgctgaact cacctatctg tgctggatga 60 ttacacacaa cggcaccgcc 80 <210> 36 <211> 74 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 36 cgattttctc tccgctgggt ctctcgaacc tttccacaaa ctgccggacc agcaccttag 60 gtggagtctt gcac 74 <210> 37 <211> 24 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 37 gctgatatcg ctctggtgct tctg 24 <210> 38 <211> 18 <212> DNA
<213> artificial sequence <220>
<223> Sequence is completely synthesized <400> 38 gtagaggatc cagacctc 18
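
Several of the synthetic oligonucleotides in the listing above appear to be complementary pairs that could be annealed into short double-stranded adapters. The listing itself only describes them as "completely synthesized", so the pairing is an assumption made here for illustration. The following Python sketch is not part of the patent; the function names and the `min_duplex` threshold are hypothetical. It checks whether two listed oligonucleotides are reverse complements of each other and, if so, how many unpaired nucleotides are left at the 5' end of the second strand.

```python
# Illustrative check (not from the patent): do two oligonucleotides from the
# sequence listing anneal to each other, and with what 5' overhang?

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str, min_duplex: int = 12):
    """Return the number of unpaired 5' nucleotides on `bottom` when it is
    annealed to `top`, or None if no duplex of at least `min_duplex` base
    pairs (an arbitrary illustrative threshold) can be formed."""
    top = top.replace(" ", "").lower()
    bottom = bottom.replace(" ", "").lower()
    rc_top = reverse_complement(top)
    # Slide along the bottom strand: its suffix must match the start of the
    # reverse complement of the top strand for the two strands to pair.
    for k in range(len(bottom) - min_duplex + 1):
        if rc_top.startswith(bottom[k:]):
            return k
    return None


# Sequences copied from the listing; the pairing of consecutive SEQ ID NOs
# is an assumption.
OLIGO_PAIRS = {
    (13, 14): ("ggcctagttc ggcgcgcca", "agcttggcgc gccgaacta"),
    (15, 16): ("cgcttctagt ttaaacgctc ga", "cgtcgagcgt ttaaactaga ag"),
    (17, 18): ("tcgcgattta aattaattaa gctt", "aagcttaatt aatttaaatc gcga"),
    (25, 26): ("gatccggagg actgtcctcc g", "gatccggagg acagtcctcc g"),
}

if __name__ == "__main__":
    for (a, b), (top, bottom) in OLIGO_PAIRS.items():
        k = five_prime_overhang(top, bottom)
        print(f"SEQ ID NO: {a}/{b}: 5' overhang = {k}")
```

Under these assumptions, SEQ ID NO: 17 and 18 are exact reverse complements (no overhang), while the pairs 13/14 and 25/26 leave four unpaired nucleotides and the pair 15/16 leaves two, which is consistent with adapters designed to have cohesive ends.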

Claims (49)

CLAIMS:
1. A vector for expressing an exogenous nucleic acid in eukaryotic cells, said vector comprising a nucleic acid sequence excisable by site-specific recombination.
2. An expression vector comprising a nucleic acid sequence, said sequence comprising a recombinase substrate and a transcription unit.
3. The expression vector of claim 2, wherein said recombinase substrate comprises a stuffer region flanked by recombination target sequences.
4. The expression vector of claim 3, wherein said stuffer region is removable by site-specific recombination.
5. The vector of claim 4, wherein the stuffer region comprises a restriction site.
6. The expression vector of any one of claims 2 to 5, wherein said transcription unit comprises an enhancer sequence, a promoter sequence and a termination sequence operatively linked together.
7. The expression vector of any one of claims 2 to 6, wherein said nucleic acid sequence comprises at least two fragments of a viral genome for packaging said vector or a fragment thereof into infectious viral particles.
8. The expression vector of claim 7, wherein the recombinase substrate and the transcription unit are located between said at least two fragments.
9. The vector of claim 7 or 8, wherein said at least two fragments derive from a retrovirus or an adenovirus.
10. The expression vector of any one of claims 2 to 9, comprising a nucleic acid sequence encoding an inactive gene conferring resistance to an antibiotic in bacteria, and wherein activity of said inactive gene is restorable by site-specific recombination of the recombinase substrate.
11. A vector comprising: i) a site-specific recombinase coding sequence operatively linked to a termination sequence; and ii) a recombinase substrate excisable specifically by a recombinase encoded by said site-specific recombinase coding sequence.
12. The expression vector of claim 11, wherein said recombinase substrate comprises a stuffer region flanked by recombination target sequences.
13. The expression vector of claim 12, wherein said stuffer region is removable by site-specific recombination.
14. The vector of claim 13, wherein the stuffer region comprises a restriction site.
15. The expression vector of any one of claims 11 to 14, wherein said nucleic acid sequence comprises at least two fragments of a viral genome for packaging said vector or a fragment thereof into infectious viral particles.
16. The expression vector of claim 15, wherein the recombinase substrate is located between said at least two fragments.
17. The vector of claim 15 or 16, wherein said at least two fragments derive from a retrovirus or an adenovirus.
18. The expression vector of any one of claims 11 to 17, comprising a nucleic acid sequence encoding an inactive gene conferring resistance to an antibiotic in bacteria, and wherein activity of said inactive gene is restorable by site-specific recombination of the recombinase substrate.
19. An expression vector comprising a nucleic acid sequence, said nucleic acid sequence comprising a recombinase substrate and a transcription unit incorporated into a viral genome.
20. The expression vector of claim 19, wherein said recombinase substrate comprises a stuffer region excisable by site-specific recombination, and wherein formation of viral particles is dependent upon excision of said stuffer region.
21. The vector of claim 19 or 20, wherein said viral genome consists of a cDNA copy of an alphaviral genome, and presence of the stuffer region blocks translation of viral proteins encoded by said cDNA copy of the alphaviral genome.
22. The vector of claim 21, wherein said recombinase substrate is present in a 5' untranslated region of the cDNA copy of the alphaviral genome.
23. The vector of claim 21 or 22, wherein said cDNA copy of the alphaviral genome derives from Sindbis virus genome or from Semliki Forest virus genome.
24. An expression vector comprising a sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, and SEQ ID NO: 3.
25. A eukaryotic cell line comprising an expressible site-specific recombinase coding sequence, wherein said expressible site-specific recombinase coding sequence is operatively linked to a minimal promoter and to at least one cis-acting regulatory element.
26. The cell line of claim 25, wherein said site-specific recombinase is expressed upon activation of said at least one cis-acting regulatory element.
27. The cell line of claim 25 or 26, wherein said cis-acting regulatory element is activatable by elevation of intracellular cAMP or cGMP levels, elevation of intracellular calcium concentration, and/or change in phosphorylation state of proteins.
28. The cell line of claim 25 or 26, wherein said cis-acting regulatory element is activated during differentiation of mesenchymal stem cells into bone, cartilage, adipocytes or myoblasts.
29. The cell line of any one of claims 25 to 28, wherein said site-specific recombinase coding sequence is optimized for enhanced synthesis, stability or translation in eukaryotic cells.
30. The cell line of any one of claims 25 to 28, wherein said expressible site-specific recombinase coding sequence consists of a site-specific recombinase coding sequence selected from the group consisting of Flp from Saccharomyces cerevisiae, Cre from bacteriophage P1 and β-recombinase from Bacillus subtilis.
31. The cell line of claim 30, wherein said site-specific recombinase coding sequence comprises SEQ ID NO: 5, SEQ ID NO: 6, or a functional homologue thereof.
32. Use of a vector as defined in any one of claims 1 to 24, or a cell line as defined in any one of claims 25 to 31, for identifying or selecting an exogenous nucleic acid having a desired feature.
33. A method for identifying nucleic acids encoding a desired feature from a library of exogenous nucleic acids, wherein a plurality of nucleic acids from said library are inserted into a plurality of vectors, said vectors comprising a nucleic acid sequence excisable by site-specific recombination.
34. The method of claim 33, wherein said vectors are inserted into a eukaryotic cell line or into a transgenic animal comprising a nucleic acid encoding an inactive site-specific recombinase whose activity is restorable.
35. The method of claim 34, wherein the activity of said inactive site-specific recombinase is restored upon expression by said vector of an exogenous nucleic acid having said desired feature.
36. The method of claim 35, wherein said activated site-specific recombinase excises a fragment of the nucleic acid sequence of said expression vectors, thereby forming recombined expression vectors comprising a nucleic acid having said desired feature.
37. The method of any one of claims 34 to 36 wherein said inactive site-specific recombinase is inactive due to a lack of sufficient expression or due to sequestration outside of the cell nucleus.
38. The method of any one of claims 34 to 37, wherein said nucleic acid having a desired feature is selected from the group consisting of nucleic acids encoding transcription factors, nucleic acids encoding proteins involved in signal transduction pathways, and nucleic acids encoding proteins involved in cell metabolism or differentiation state.
39. The method of claim 35, wherein said vectors are inserted into a suitable eukaryotic host, and wherein said vectors encode and express a site-specific recombinase.
40. A method for screening exogenous nucleic acids having a desired feature within eukaryotic cells, said method comprising the steps of:
a) providing a plurality of expression vectors each capable, when present in a suitable host, of expressing an exogenous nucleic acid inserted therein, said vectors comprising a nucleic acid sequence excisable by site-specific recombination;

b) providing a cell line or a transgenic animal comprising a nucleic acid encoding an inactive site-specific recombinase whose activity is restorable;
c) inserting at least one exogenous nucleic acid from a library of nucleic acids into a plurality of said expression vectors, to provide a library of recombinant expression vectors;

d) introducing, into cells of the cell line or of the transgenic animal of step (b), a plurality of recombinant expression vectors from the library obtained at step (c);

e) allowing the recombinant expression vectors introduced at step (d) to express the exogenous nucleic acid inserted therein, wherein only exogenous nucleic acids encoding said desired feature are capable of restoring the activity of the site-specific recombinase of step (b);

f) allowing the site-specific recombinase whose activity has been restored in step (e) to excise said excisable nucleic acid sequence from recombinant expression vectors which have expressed an exogenous nucleic acid having restored the activity of the site-specific recombinase;

g) recovering recombinant expression vectors from cells of said cell line or transgenic animal; and
h) selecting recombined expression vectors having undergone site-specific recombination at step (f), said recombined vectors containing an exogenous nucleic acid encoding the desired feature.
41. A method for screening exogenous nucleic acids having a transcriptional activity within eukaryotic cells, said method comprising the steps of:

a) providing a vector comprising: i) a site-specific recombinase coding sequence operatively linked to a termination sequence; and ii) a recombinase substrate excisable specifically by a site-specific recombinase encoded by said site-specific recombinase coding sequence;

b) inserting into a plurality of vectors as defined at step (a) at least one exogenous nucleic acid taken from a library of exogenous nucleic acids in order to provide a library of recombinant vectors;
c) inserting a plurality of recombinant vectors from the library obtained at step (b) into a suitable eukaryotic host;
d) allowing the exogenous nucleic acid inserted at step (b) to activate transcription of the site-specific recombinase coding sequence which is comprised in the vector, thereby producing said site-specific recombinase;
e) allowing the site-specific recombinase so produced to excise the recombinase substrate in the recombinant vector harboring the exogenous nucleic acid having activated the transcription of the site-specific recombinase;
f) following step e), recovering a plurality of recombinant vectors from said eukaryotic host; and
g) selecting recombinant vectors having undergone site-specific recombination, most of these vectors containing an exogenous nucleic acid having transcriptional activity.
42. The method of claim 40 or 41, wherein the nucleic acid sequence of said vector encodes an inactive gene conferring resistance to an antibiotic in bacteria, and wherein activity of said inactive gene is restored by site-specific recombination of said nucleic acid sequence.
43. The method of claim 42, wherein the step of selecting recombined expression vectors having undergone site-specific recombination comprises the steps of:
i) extracting DNA from cells into which the expression vectors have been introduced;
ii) transforming bacteria with DNA extracted at step (i);
iii) growing bacteria transformed at step (ii) in presence of said antibiotic; and
iv) selecting bacterial colonies resistant to said antibiotic;

whereby said resistant bacterial colonies comprise excised expression vectors having undergone site-specific recombination.
44. The method of claim 43, further comprising the steps of:
v) extracting expression vectors from colonies selected at step (iv); and
vi) identifying an exogenous nucleic acid found in said extracted vectors.
45. The method of claim 40 or 41, wherein the nucleic acid sequence of said vector comprises a recombinase substrate having a stuffer region flanked by recombination target sequences, and wherein said stuffer region comprises a cleavable restriction site.
46. The method of claim 45, wherein the step of selecting recombined expression vectors having undergone site-specific recombination comprises the steps of:
i. extracting DNA from cells into which the expression vectors have been introduced; and
ii. contacting DNA extracted at step (i) with a restriction enzyme recognizing said cleavable restriction site;
whereby recombined expression vectors are not cleaved by said restriction enzyme, and whereby unrecombined expression vectors are cleaved by said restriction enzyme.
47. The method of claim 46, further comprising the step of degrading DNA fragments cleaved by said restriction enzyme with an exonuclease.
48. The method of claim 46, further comprising the step of amplifying a DNA fragment from the expression vectors, said fragment comprising said exogenous nucleic acid.
49. A screening kit comprising:
1) a vector as defined in any one of claims 1 to 24; or 2) a cell line as defined in any one of claims 25 to 31;
and at least one further element selected from the group consisting of instructions for using said kit, reaction buffer(s), enzyme(s), probe(s) and pool(s) of nucleotide molecules to be screened.
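
The selection principle recited in claims 45 and 46 can be illustrated with a small string model. The Python sketch below is not taken from the patent and is only a simplified illustration under stated assumptions: vectors are modelled as linear lowercase strings, the recombination target is taken to be SEQ ID NO: 8 (which the listing describes only as synthesized), the two targets are in direct orientation, and the restriction site occurs only inside the stuffer. The flanking sequences, the insert, the stuffer and the function names are all hypothetical.

```python
# Illustrative model (not the patent's implementation) of claims 45-46:
# excision of a stuffer flanked by two recombination targets removes the
# restriction site, so only unrecombined vectors are cleaved.

# SEQ ID NO: 8 from the sequence listing, used here as the recombination target.
TARGET = "gaagttcctattccgaagttcctattctctagaaagtataggaacttc"


def excise_stuffer(vector: str, target: str = TARGET) -> str:
    """Remove everything between two directly repeated targets, together
    with one copy of the target. Vectors with fewer than two targets are
    returned unchanged."""
    first = vector.find(target)
    if first == -1:
        return vector
    second = vector.find(target, first + len(target))
    if second == -1:
        return vector
    return vector[: first + len(target)] + vector[second + len(target):]


def is_cleaved(vector: str, restriction_site: str) -> bool:
    """A vector is cleaved if it still carries the restriction site."""
    return restriction_site in vector


def select_recombined(vectors, restriction_site):
    """Keep only the vectors that resist digestion (claim 46)."""
    return [v for v in vectors if not is_cleaved(v, restriction_site)]


# Hypothetical example data: an 8-bp site chosen for illustration only.
SITE = "ttaattaa"
INSERT = "atggctagcaaaggagaa"          # stands in for the exogenous nucleic acid
STUFFER = "ccgg" + SITE + "ggcc"       # stuffer carrying the restriction site
unrecombined = "acgtacgtac" + INSERT + TARGET + STUFFER + TARGET + "tgcatgcatg"
recombined = excise_stuffer(unrecombined)

if __name__ == "__main__":
    survivors = select_recombined([unrecombined, recombined], SITE)
    print(len(survivors), "vector(s) resist digestion")
```

In this model the recombined vector keeps the exogenous insert and a single copy of the target but loses the restriction site, so digestion followed by transformation or amplification would enrich for vectors that have undergone site-specific recombination.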
CA002451957A | 2001-06-28 | 2002-06-28 | Methods, vectors, cell lines and kits for selecting nucleic acids having a desired feature | Abandoned | CA2451957A1 (en)

Applications Claiming Priority (3)

Application Number | Priority Date | Filing Date | Title
US30114901P | 2001-06-28 | 2001-06-28
US60/301,149 | 2001-06-28
PCT/CA2002/000997 | WO2003002735A2 (en) | 2001-06-28 | 2002-06-28 | Methods, vectors, cell lines and kits for selecting nucleic acids having a desired feature

Publications (1)

Publication Number | Publication Date
CA2451957A1 (en) | 2003-01-09

Family

ID=23162156

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CA002451957A, Abandoned, CA2451957A1 (en) | Methods, vectors, cell lines and kits for selecting nucleic acids having a desired feature | 2001-06-28 | 2002-06-28

Country Status (3)

Country | Link
EP (1) | EP1402020A2 (en)
CA (1) | CA2451957A1 (en)
WO (1) | WO2003002735A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115747167A (en) * | 2022-10-17 | 2023-03-07 | 北京博晖创新生物技术集团股份有限公司 | A reference product for human gene molecular diagnosis and its preparation method and application
CN116716350A (en) * | 2023-01-18 | 2023-09-08 | 中国科学院深圳先进技术研究院 | SINV vector for expressing IL-12 and application thereof in preparation of antitumor drugs

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5919676A (en) * | 1993-06-24 | 1999-07-06 | Advec, Inc. | Adenoviral vector system comprising Cre-loxP recombination
JP4216350B2 (en) * | 1994-09-19 | 2009-01-28 | 大日本住友製薬株式会社 | Recombinant DNA viral vector for animal cell infection
US6830885B1 (en) * | 2000-08-18 | 2004-12-14 | Phenogene Therapeutiques Inc. | Nucleic acid molecule, method and kit for selecting a nucleic acid having a desired feature

Also Published As

Publication number | Publication date
WO2003002735A3 (en) | 2003-05-30
EP1402020A2 (en) | 2004-03-31
WO2003002735A2 (en) | 2003-01-09

Similar Documents

Publication | Publication Date | Title
US6025192A (en) | Modified retroviral vectors
US10006048B2 (en) | Synthetic genes and genetic constructs
EP1857549B1 (en) | Control of gene expression
JP5075833B2 (en) | Recombinant expression of multiprotein complexes using polygenes
EP1015620B1 (en) | Dual selection cassette and plasmids containing same
EP1141361B1 (en) | Compositions and methods for packaging of alphavirus vectors
US6255071B1 (en) | Mammalian viral vectors and their uses
JP4383530B2 (en) | Alphavirus vectors with reduced inhibition of cellular macromolecular synthesis
JPH01165395A (en) | Manifestation system for manifestation of recombinant protein
US6830885B1 (en) | Nucleic acid molecule, method and kit for selecting a nucleic acid having a desired feature
EP1012319B1 (en) | Insect expression vectors
JP2001519165A (en) | Recombinant alphavirus-based vectors with reduced inhibition of cell macromolecule synthesis
JP2011078429A (en) | Recombinant alphavirus particle
WO1998012339A9 (en) | Viral vectors and their uses
JPH03280883A (en) | Recombination dna containing sequence from rna viruse, and gene-manipulating method using dna thereof
AU678982B2 (en) | Insect viruses and their uses in protecting plants
CA2451957A1 (en) | Methods, vectors, cell lines and kits for selecting nucleic acids having a desired feature
EP1417323A2 (en) | Humanised baculovirus
Zou et al. | Translation of the reovirus M1 gene initiates from the first AUG codon in both infected and transfected cells
US7045685B2 (en) | Insect viruses and their uses in protecting plants
CZ302282B6 (en) | Process for preparing recombinant adenoviruses and adenovirus libraries
CA2224475A1 (en) | Protein interaction and transcription factor trap
JP3713038B2 (en) | Recombinant adenovirus
CA2719413A1 (en) | Compositions and methods for packaging of alphavirus vectors
CA2284728A1 (en) | Insect expression vectors

Legal Events

Date | Code | Title | Description
FZDE | Discontinued
