UNIVERSAL INFLUENZA VIRUS PROBE SET FOR ENRICHMENT OF ANY INFLUENZA VIRUS NUCLEIC ACID
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 62/611,734, filed December 29, 2017, which is herein incorporated by reference in its entirety.
FIELD
This disclosure concerns an influenza virus probe set capable of hybridizing with nucleic acid molecules from all influenza viruses, and its use for enriching and/or detecting influenza virus nucleic acid molecules in a sample and diagnosing an influenza virus infection.
INCORPORATION OF ELECTRONIC SEQUENCE LISTING
The nucleic acid sequences listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand. The Sequence Listing is submitted as an ASCII text file, created on December 12, 2018, 12.6 MB, which is incorporated by reference herein.
BACKGROUND
Influenza virus is a negative-stranded RNA virus belonging to the Orthomyxovirus family that infects several warm-blooded animals, including avian and mammalian species, but its natural reservoir is thought to be wild, free-ranging waterfowl and shorebirds (Webster et al., Microbiol Rev 56(1):152-179, 1992). When infecting humans, influenza virus causes mild to severe symptoms that include high fever, runny nose, sore throat, muscle pain, headache, cough, and fatigue, and can sometimes lead to death. Even with the availability of influenza vaccines, influenza virus still causes yearly seasonal infections, some sporadic outbreaks, and more rarely large pandemics (Morens et al., Clin Infect Dis 51(12):1442-1444, 2010). Annual influenza epidemics worldwide are estimated to result in about 3 to 5 million cases of severe illness, and about 250,000 to 500,000 deaths. One major reason for the extensive influenza morbidity each year is the lack of proofreading activity of the influenza virus RNA polymerase, which leads to the constant creation of new viral genetic variants to evade the host immune system (Elena and Sanjuan, J Virol 79(18):11555-11558, 2005). Random mutations can be rapidly selected for or against, depending upon the evolutionary pressures applied, including novel host environment, response to pre-existing immunity leading to antigenic drift, or antiviral drug pressure leading to resistance (Taubenberger and Kash, Cell Host Microbe 7(6):440-451, 2010). Therefore, information on surveillance, transmission, infection mechanism, pathology, vaccinology, and rapid diagnosis of influenza is still needed to understand and prevent infection and reduce the disease burden.
SUMMARY
Described is a nucleic acid probe set capable of detecting nucleic acid molecules from all influenza virus isolates. The probe set can be used, for example, to enrich influenza virus nucleic acid molecules, detect influenza virus nucleic acid molecules and/or diagnose a subject as having an influenza virus infection.
Provided herein is an influenza virus probe set that includes deoxyribonucleic acid (DNA) probes at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 99% identical to the nucleotide sequences of SEQ ID NOs: 1-46953; or includes ribonucleic acid (RNA) probes at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 99% identical to the nucleotide sequences of SEQ ID NOs: 1-46953, wherein uracil (U) is substituted for thymidine (T). In some embodiments, the probes are labelled, such as with biotin.
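The U-for-T substitution that relates an RNA probe to the corresponding DNA sequence can be illustrated with a minimal Python sketch. The function name and the example sequence are hypothetical and are used only for illustration; the example sequence is not taken from the sequence listing.

```python
def dna_probe_to_rna(dna_seq: str) -> str:
    """Return the RNA form of a DNA probe sequence by substituting U for T."""
    return dna_seq.upper().replace("T", "U")


# Hypothetical 12-nucleotide sequence, not one of SEQ ID NOs: 1-46953.
example_probe = "ATGCTTAGGCTA"
print(dna_probe_to_rna(example_probe))  # prints AUGCUUAGGCUA
```

Every other base is left unchanged, so the RNA probe has the same length and base order as the DNA sequence from which it is derived.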
In some examples, the influenza virus probe set includes a subset of DNA probes, such as at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% of the probes of SEQ ID NOs: 1-46953 (e.g., at least 44,000, at least 44,500, at least 44,600, at least 45,000, at least 45,500, at least 46,000, at least 46,500, or at least 46,900 of the probes shown in SEQ ID NOs: 1-46953). In some examples, the influenza virus probe set includes a subset of RNA probes, such as at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% of the probes of SEQ ID NOs: 1-46953 (e.g., at least 44,000, at least 44,500, at least 44,600, at least 45,000, at least 45,500, at least 46,000, at least 46,500, or at least 46,900 of the probes shown in SEQ ID NOs: 1-46953).
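For orientation, the percentage thresholds recited above can be converted into minimum whole-probe counts. The following Python sketch is illustrative only: the helper name is hypothetical, and the rounded counts recited in the text (for example, at least 44,000) are separate examples that are not derived from this calculation.

```python
import math
from fractions import Fraction

TOTAL_PROBES = 46953  # number of probes in SEQ ID NOs: 1-46953


def min_probe_count(percent: str, total: int = TOTAL_PROBES) -> int:
    """Smallest whole number of probes that is at least `percent` of `total`.

    Exact rational arithmetic is used so that a value such as 99.9%
    is not affected by floating-point rounding.
    """
    return math.ceil(Fraction(percent) * total / 100)


for pct in ("95", "96", "97", "98", "99", "99.9"):
    print(f"at least {pct}% -> at least {min_probe_count(pct):,} probes")
```

Under this reading, a subset containing at least 95% of the probes includes at least 44,606 of the 46,953 sequences, and a subset containing at least 99% includes at least 46,484.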
Also provided are kits that include the probe set. In some embodiments, the kit further includes streptavidin-labelled magnetic beads.
Further provided is a method of enriching influenza virus nucleic acid in a sample. In some embodiments, the method includes contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid molecule present in the sample to probes of the probe set, and isolating the probes from the sample. Also provided herein is a method of detecting influenza virus nucleic acid molecules in a sample. In some embodiments, the method includes contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid molecules present in the sample to probes of the probe set, isolating the probes from the sample, and detecting the presence of influenza virus nucleic acid molecules hybridized to the isolated probes.
A method of diagnosing a subject as having an influenza virus infection is also described. In some embodiments, the method includes contacting a sample obtained from the subject with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid molecules present in the sample to probes of the probe set, isolating the probes from the sample; and detecting the presence of influenza virus nucleic acid molecules hybridized to the isolated probes.
The foregoing and other objects and features of the disclosure will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is a schematic showing the process applied to the downloaded influenza reads for enrichment probe design.
SEQUENCE LISTING
In the accompanying sequence listing:
SEQ ID NOs: 1-46953 are influenza virus probes.
SEQ ID NOs: 46954-46969 are H1N1 real-time RT-PCR primers.
SEQ ID NOs: 46970-46985 are influenza B virus real-time RT-PCR primers.
SEQ ID NOs: 46986-46987 are HER2 real-time RT-PCR primers.
SEQ ID NOs: 46988-47019 are H1 to H16 real-time RT-PCR primers.
DETAILED DESCRIPTION
I. Abbreviations
FISH fluorescence in situ hybridization
HA hemagglutinin
IAV influenza A virus
IBV influenza B virus
ICV influenza C virus
ISH in situ hybridization
NA neuraminidase
PCR polymerase chain reaction
II. Terms and Methods
Unless otherwise noted, technical terms are used according to conventional usage.
Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).
In order to facilitate review of the various embodiments of the disclosure, the following explanations of specific terms are provided:
Amplification (of nucleic acid): Increasing the number of copies of a nucleic acid molecule, such as a gene or fragment of a gene. The products of an amplification reaction are called amplification products (e.g., amplicons). An example of in vitro amplification is the polymerase chain reaction (PCR), in which a sample (such as a biological sample containing nucleic acid molecules) is contacted with one or more oligonucleotide primers, under conditions that allow for hybridization of the primer(s) to a nucleic acid molecule in the sample. The primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid molecule. Other examples of in vitro amplification techniques include real-time PCR; quantitative real-time PCR (qPCR); reverse transcription PCR (RT-PCR); quantitative RT-PCR (qRT-PCR); real-time RT-PCR; loop-mediated isothermal amplification (LAMP; see Notomi et al., Nucl. Acids Res. 28:e63, 2000); reverse-transcription LAMP (RT-LAMP); strand displacement amplification (see U.S. Patent No. 5,744,311); transcription-mediated amplification (U.S. Patent No. 5,399,491); transcription-free isothermal amplification (see U.S. Patent No. 6,033,881); repair chain reaction amplification (see WO 90/01069); ligase chain reaction amplification (see U.S. Patent No. 5,686,272); gap filling ligase chain reaction amplification (see U.S. Patent No. 5,427,930); coupled ligase detection and PCR (see U.S. Patent No. 6,027,889); and NASBA™ RNA transcription-free amplification (see U.S. Patent No. 6,025,134).
Biotin: A molecule (also known as vitamin H or vitamin B7) that binds with high affinity to avidin and streptavidin. Biotin is often used to label nucleic acids and proteins for subsequent detection by avidin or streptavidin linked to a detectable label, or for subsequent isolation using avidin or streptavidin linked to a solid support (such as a magnetic bead).
Biotinylation refers to the process of adding or attaching (such as covalently) a biotin label to a protein or nucleic acid. As used herein, the term “biotin” includes derivatives or analogs that participate in a binding reaction with avidin. Biotin analogs and derivatives include, but are not limited to, N-hydroxysuccinimide-iminobiotin (NHS-iminobiotin), amino or sulfhydryl derivatives of 2-iminobiotin, amidobiotin, desthiobiotin, biotin sulfone, caproylamidobiotin and biocytin, biotinyl-ε-aminocaproic acid-N-hydroxysuccinimide ester, sulfo-succinimide-iminobiotin, biotinbromoacetylhydrazide, p-diazobenzoyl biocytin, 3-(N-maleimidopropionyl) biocytin, 6-(6-biotinamidohexanamido)hexanoate and 2-biotinamidoethanethiol. Biotin derivatives are also commercially available, such as DSB-X™ Biotin (Invitrogen). Additional biotin analogs and derivatives are known in the art (see, for example, U.S. Patent No. 5,168,049; U.S. Patent Application Publication Nos. 2004/0024197, 2001/0016343, and 2005/0048012; and PCT Publication No. WO 1995/007466).
Biotin binding protein: A protein that binds biotin with sufficiently great affinity for an intended purpose. Examples of biotin binding proteins include avidin, streptavidin, NeutrAvidin, and monoclonal antibodies or receptor molecules that specifically bind biotin. In the context of this disclosure, streptavidin could be replaced with any other biotin-binding protein, or a combination of biotin binding proteins.
Consists essentially of: In the context of the present disclosure, “consists essentially of” indicates that the probe set contains all or at least 99% of the disclosed 46,953 probes. In some examples, “consists essentially of” indicates that no more than 100, no more than 200, no more than 300, no more than 400 or no more than 500 other probes are included in the probe set.
Contacting: Placement in direct physical association; includes both in solid and liquid form. “Contacting” is often used interchangeably with “exposed.” For example, contacting can occur in vitro with one or more primers and/or probes and a biological sample (such as a sample including nucleic acid molecules) in solution.
Control: A reference standard, for example a positive control or negative control. A positive control is known to provide a positive test result. A negative control is known to provide a negative test result. However, the reference standard can be a theoretical or computed result, for example a result obtained in a population.
Diagnosis: The process of identifying a disease by its signs, symptoms and results of various tests. The conclusion reached through that process is also called “a diagnosis.” Forms of testing commonly performed include blood tests, medical imaging, and biopsy.
Enrich: The process of increasing the quantity or concentration of a desired component. In the context of the present disclosure, enriching targeted nucleic acid molecules in a sample refers to the process of increasing the number of copies of the target nucleic acid molecules (such as influenza nucleic acid molecules) in a given sample volume.
Fluorophore: A chemical compound, which when excited by exposure to a particular wavelength of light, emits light (i.e., fluoresces), for example at a different wavelength than that to which it was exposed. Also encompassed by the term “fluorophore” are luminescent molecules, which are chemical compounds which do not require exposure to a particular wavelength of light to fluoresce; luminescent compounds naturally fluoresce. Therefore, the use of luminescent signals eliminates the need for an external source of electromagnetic radiation, such as a laser. An example of a luminescent molecule includes, but is not limited to, aequorin (Tsien, 1998, Ann. Rev. Biochem. 67:509).
In some embodiments herein, a probe (such as any of SEQ ID NOs: 1-46953) is labeled with (e.g., has attached thereto) a fluorophore, such as at the 5' end of the probe. Probes used for real-time PCR assays typically include a fluorophore and a quencher. Fluorophores suitable for use with real-time PCR assays, such as TaqMan™ PCR, include, but are not limited to, 6-carboxyfluorescein (FAM), tetrachlorofluorescein (TET), tetramethylrhodamine (TMR), hexachlorofluorescein (HEX), JOE, ROX, CAL Fluor™, Pulsar™, Quasar™, Texas Red™, Cy™3 and Cy™5.
Other examples of fluorophores that can be used with the probes provided herein (such as any of SEQ ID NOs: 1-46953) are provided in U.S. Patent No. 5,866,366. These include: 4-acetamido-4'-isothiocyanatostilbene-2,2'-disulfonic acid, acridine and derivatives such as acridine and acridine isothiocyanate, 5-(2'-aminoethyl)amino-naphthalene-1-sulfonic acid (EDANS), 4-amino-N-[3-(vinylsulfonyl)phenyl]-naphthalimide-3,5 disulfonate (Lucifer Yellow VS), N-(4-anilino-1-naphthyl)-maleimide, anthranilamide, Brilliant Yellow, coumarin and derivatives such as coumarin, 7-amino-4-methylcoumarin (AMC, Coumarin 120), 7-amino-4-trifluoromethylcoumarin (Coumarin 151); cyanosine; 4',6-diamidino-2-phenylindole (DAPI); 5',5"-dibromopyrogallol-sulfonephthalein (Bromopyrogallol Red); 7-diethylamino-3-(4'-isothiocyanatophenyl)-4-methylcoumarin; diethylenetriamine pentaacetate; 4,4'-diisothiocyanatodihydro-stilbene-2,2'-disulfonic acid; 4,4'-diisothiocyanatostilbene-2,2'-disulfonic acid; 5-[dimethyl-amino]naphthalene-1-sulfonyl chloride (DNS, dansyl chloride); 4-(4'-dimethyl-aminophenylazo)benzoic acid (DABCYL); 4-dimethylaminophenylazophenyl-4'-isothiocyanate (DABITC); eosin and derivatives such as eosin and eosin isothiocyanate; erythrosin and derivatives such as erythrosin B and erythrosin isothiocyanate; ethidium; fluorescein and derivatives such as 5-carboxyfluorescein (FAM), 5-(4,6-dichlorotriazin-2-yl)aminofluorescein (DTAF), 2'7'-dimethoxy-4'5'-dichloro-6-carboxyfluorescein (JOE), fluorescein, fluorescein isothiocyanate (FITC), and QFITC (XRITC); fluorescamine; IR144; IR1446; Malachite Green isothiocyanate; 4-methylumbelliferone; ortho cresolphthalein; nitrotyrosine; pararosaniline; Phenol Red; B-phycoerythrin; o-phthaldialdehyde; pyrene and derivatives such as pyrene, pyrene butyrate and succinimidyl 1-pyrene butyrate; Reactive Red 4 (Cibacron® Brilliant Red 3B-A); rhodamine and derivatives such as 6-carboxy-X-rhodamine (ROX), 6-carboxyrhodamine (R6G), lissamine rhodamine B sulfonyl chloride, rhodamine (Rhod), rhodamine B, rhodamine 123, rhodamine X isothiocyanate, sulforhodamine B, sulforhodamine 101 and sulfonyl chloride derivative of sulforhodamine 101 (Texas Red); N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); tetramethyl rhodamine; tetramethyl rhodamine isothiocyanate (TRITC); riboflavin; rosolic acid and terbium chelate derivatives.
Other fluorophores that can be used include thiol-reactive europium chelates that emit at approximately 617 nm (Heyduk and Heyduk, Analyt. Biochem. 248:216-27, 1997; J. Biol. Chem. 274:3315-22, 1999).
Other fluorophores that can be used include cyanine, merocyanine, styryl, and oxonyl compounds, such as those disclosed in U.S. Patent Nos. 5,627,027; 5,486,616; 5,569,587; and 5,569,766, and in published PCT application no. US98/00475, each of which is incorporated herein by reference. Specific examples of fluorophores disclosed in one or more of these patent documents include Cy3 and Cy5, for instance, and substituted versions of these fluorophores.
Other fluorophores that can be used include GFP, Lissamine™, diethylaminocoumarin, fluorescein chlorotriazinyl, naphthofluorescein, 4,7-dichlororhodamine and xanthene (as described in U.S. Patent No. 5,800,996 to Lee et al., herein incorporated by reference) and derivatives thereof. Other fluorophores are known to those skilled in the art and are commercially available from known sources.
Hybridization: Oligonucleotides (such as primers and probes) and their analogs hybridize by hydrogen bonding, which includes Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary bases. Generally, nucleic acid consists of nitrogenous bases that are either pyrimidines (cytosine (C), uracil (U), and thymine (T)) or purines (adenine (A) and guanine (G)). These nitrogenous bases form hydrogen bonds between a pyrimidine and a purine, and the bonding of the pyrimidine to the purine is referred to as “base pairing.” More specifically, A will hydrogen bond to T or U, and G will bond to C.
“Complementary” refers to the base pairing that occurs between two distinct nucleic acid sequences or two distinct regions of the same nucleic acid sequence.
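The Watson-Crick pairing rules stated above (A with T or U, and G with C) can be illustrated with a short Python sketch that derives the complementary strand of a displayed sequence. The function names and example sequences are hypothetical; the sketch considers only perfect Watson-Crick complementarity and does not model Hoogsteen pairing, mismatches, or hybridization conditions.

```python
# Map each base to its Watson-Crick partner; U (RNA) pairs with A like T does.
# The complementary strand is written here as DNA.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3', as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_perfectly_complementary(strand_a: str, strand_b: str) -> bool:
    """True if strand_b (a DNA sequence) base pairs with strand_a at every position."""
    return reverse_complement(strand_a) == strand_b.upper()


print(reverse_complement("ATGC"))                  # prints GCAT
print(is_perfectly_complementary("ATGC", "GCAT"))  # prints True
```

As noted in the definitions that follow, an oligonucleotide need not be 100% complementary to its target to be specifically hybridizable, so a check of this kind is stricter than what hybridization requires.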
“Specifically hybridizable” and “specifically complementary” are terms that indicate a sufficient degree of complementarity such that stable and specific binding occurs between the oligonucleotide (or its analog) and the DNA or RNA target. The oligonucleotide or oligonucleotide analog need not be 100% complementary to its target sequence to be specifically hybridizable. An oligonucleotide or analog is specifically hybridizable when there is a sufficient degree of complementarity to out-compete non-specific binding of the oligonucleotide or analog to non-target sequences under conditions where specific binding is desired, for example under physiological conditions in the case of in vivo assays or systems. Such binding is referred to as specific hybridization.
Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing nucleic acid sequences. Generally, the temperature of hybridization and the ionic strength (especially the Na+ and/or Mg++ concentration) of the hybridization buffer will determine the stringency of hybridization, though wash times also influence stringency.
Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989, chapters 9 and 11; and Ausubel et al., Short Protocols in Molecular Biology, 4th ed., John Wiley & Sons, Inc., 1999.
Infection: A state resulting from invasion and/or multiplication of a microorganism (such as a virus, for example an influenza virus) in body tissues or cells. An infection can be local or systemic, and acute, subacute or chronic.
Influenza virus: A segmented negative-strand RNA virus that belongs to the Orthomyxoviridae family. There are three types of influenza viruses, A, B and C.
Influenza A virus (IAV): A negative-sense, single-stranded, segmented RNA virus, which has eight RNA segments (PB2, PB1, PA, NP, M, NS, HA and NA) that code for 11 proteins, including RNA-directed RNA polymerase proteins (PB2, PB1 and PA), nucleoprotein (NP), neuraminidase (NA), hemagglutinin (subunits HA1 and HA2), the matrix proteins (M1 and M2) and the non-structural proteins (NS1 and NS2). This virus is prone to rapid evolution by error-prone polymerase and by segment reassortment. The host range of influenza A is diverse, and includes humans, birds (e.g., chickens and aquatic birds), horses, marine mammals, pigs, bats, mice, ferrets, cats, tigers, leopards, and dogs. In animals, most influenza A viruses cause mild localized infections of the respiratory and intestinal tract. However, highly pathogenic influenza A strains, such as H5N1, cause systemic infections in poultry in which mortality may reach 100%. Animals infected with influenza A often act as a reservoir for the influenza viruses and certain subtypes have been shown to cross the species barrier to humans. Influenza A viruses can be classified into subtypes based on allelic variations in antigenic regions of two genes that encode surface glycoproteins, namely, hemagglutinin (HA) and neuraminidase (NA), which are required for viral attachment and cellular release. There are currently 18 different influenza A virus HA antigenic subtypes (H1 to H18) and 11 different influenza A virus NA antigenic subtypes (N1 to N11). H1-H16 and N1-N9 are found in wild bird hosts and may be a pandemic threat to humans. H17-H18 and N10-N11 have been described in bat hosts and are not currently thought to be a pandemic threat to humans.
Specific examples of influenza A include, but are not limited to: H1N1 (such as 1918 H1N1), H1N2, H1N7, H2N2 (such as 1957 H2N2), H2N1, H3N1, H3N2, H3N8, H4N8, H5N1, H5N2, H5N8, H5N9, H6N1, H6N2, H6N5, H7N1, H7N2, H7N3, H7N4, H7N7, H7N9, H8N4, H9N2, H10N1, H10N7, H10N8, H11N1, H11N6, H12N5, H13N6, and H14N5. In one example, influenza A includes those that circulate in humans such as H1N1, H1N2 and H3N2, or cause zoonotic infections, such as H7N9 and H5N1.
In animals, most influenza A viruses cause self-limited localized infections of the respiratory tract in mammals and/or the intestinal tract in birds. However, highly pathogenic influenza A strains, such as H5N1, cause systemic infections in poultry in which mortality may reach 100%. In 2009, H1N1 influenza was the most common cause of human influenza. A new strain of swine-origin H1N1 emerged in 2009 and was declared pandemic by the World Health Organization. This strain was referred to as "swine flu." H1N1 influenza A viruses were also responsible for the Spanish flu pandemic in 1918, the Fort Dix outbreak in 1976, and the Russian flu epidemic in 1977-1978.
Influenza B virus (IBV): A negative-sense, single-stranded, RNA virus, which has eight RNA segments. The capsid of IBV is enveloped while its virion includes an envelope, matrix protein, nucleoprotein complex, a nucleocapsid, and a polymerase complex. The surface proteins are neuraminidase (NA) and hemagglutinin (HA). This virus is less prone to evolution than influenza A, but it mutates enough such that lasting immunity has not been achieved. The host range of influenza B is narrower than influenza A, and is only known to infect humans and seals.
Influenza B viruses are not divided into subtypes, but can be further broken down into lineages and strains. Specific examples of influenza B include, but are not limited to: B/Yamagata, B/Victoria, B/Shanghai/361/2002 and B/Hong Kong/330/2001.
Influenza C virus (ICV): A negative-sense, single-stranded, RNA virus, which has seven RNA segments that encode nine proteins. ICV is a genus in the virus family Orthomyxoviridae. ICV infects humans and pigs and generally causes only minor symptoms, but can be severe and cause local epidemics. Unlike IAV and IBV, ICV does not have the HA and NA proteins. Instead, ICV expresses a single glycoprotein called hemagglutinin-esterase fusion (HEF).
Isolated: An “isolated” biological component (such as a nucleic acid molecule, protein or virus) has been substantially separated or purified away from other biological components (such as cell debris, or other proteins or nucleic acid molecules). Biological components that have been “isolated” include those components purified by standard purification methods. The term also embraces recombinant nucleic acids or proteins, as well as chemically synthesized nucleic acids or peptides.
Label: A detectable compound or composition that is conjugated directly or indirectly to another molecule (such as any of SEQ ID NOs: 1-46953) to facilitate detection or isolation of that molecule. Specific, non-limiting examples of labels include radioactive isotopes (such as 32P, 33P, 35S, and 125I), enzymes, co-factors, ligands, chemiluminescent or fluorescent molecules, haptens (such as biotin, digoxigenin, and fluorescein), and affinity tags. Labels may be natural or synthetic, and may also be heterologous in the sense that they do not naturally occur in combination with the molecule to which they are conjugated. Conjugation can occur, for example, by covalent attachment of the label to the other molecule. The label can be directly detectable (e.g., optically detectable) or indirectly detectable (for example, via interaction with one or more additional molecules that are in turn detectable). Methods for labeling nucleic acids, and guidance in the choice of labels useful for various purposes, are discussed, e.g., in Green and Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Fourth Edition, 2012, and Ausubel et al., Short Protocols in Molecular Biology, Current Protocols, Fifth Edition, 2002.
Library: In the context of the present disclosure, a “library” refers to a collection of nucleic acid molecules with special adapters or indexes, such as a nucleic acid fragment from a biological sample, such as a sample that includes influenza nucleic acid molecules.
Polymerase Chain Reaction (PCR): An in vitro amplification technique that increases the number of copies of a nucleic acid molecule (for example, a nucleic acid molecule in a sample or specimen). In an example, a biological sample collected from a subject is contacted with one or more oligonucleotide primers, under conditions that allow for the hybridization of the primers to nucleic acid template in the sample. The primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid. The product of PCR (amplicons) can be characterized, for example by electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing.
Primer: Primers are short nucleic acids, generally DNA oligonucleotides 10 nucleotides or more in length (such as 10-60, 15-50, 20-40, 20-50, 25-50, or 30-60 nucleotides in length). Primers may be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs or sets of primers (such as 2, 3, 4, 5, 6, or more primers) can be used for amplification of a target nucleic acid, e.g., by PCR, LAMP, RT-LAMP, or other nucleic acid amplification methods. Amplification primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA).
Probe: A probe typically comprises an isolated nucleic acid (for example, at least 10 or more nucleotides in length), generally with an attached detectable label or reporter molecule. Exemplary labels include radioactive isotopes, ligands, haptens, chemiluminescent agents, fluorescent molecules (e.g., fluorophores), and enzymes. In the context of the present disclosure, probes are about 120 nucleotides in length, such as about 80-140, about 90-130, about 100-125, or about 105-120 nucleotides in length. In particular examples, the probes are about 80, about 85, about 90, about 95, about 100, about 105, about 110, about 115, about 120, about 125, about 130, about 135 or about 140 nucleotides in length. In certain examples, the probes are labelled, such as with biotin. Methods for labeling oligonucleotides and guidance in the choice of labels appropriate for various purposes are discussed, e.g., in Green and Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Fourth Edition, 2012, and Ausubel et al., Short Protocols in Molecular Biology, Current Protocols, Fifth Edition, 2002. Methods for preparing and using nucleic acid probes and primers are described, for example, in Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989; Ausubel et al., Short Protocols in Molecular Biology, 4th ed., John Wiley & Sons, Inc., 1999; and Innis et al., PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., San Diego, CA, 1990.
Sample: Encompasses a sample obtained from an animal, plant, or the environment, whether unfixed, frozen, or fixed in formalin and/or paraffin. As used herein, samples include all clinical samples useful for detection of viral infection in subjects, including, but not limited to, cells, tissues, aspirates, and bodily fluids. In some embodiments, the sample is a biological sample obtained from a human or veterinary subject, such as, for example, a fluid, cell and/or tissue sample, such as one from the blood or lung. In some examples herein, the biological sample is a fluid sample. Fluid samples include, but are not limited to, serum, blood, plasma, urine, feces, saliva, mucus, nasal wash, cerebrospinal fluid (CSF) or other bodily fluid.
Biological samples can also refer to cells or tissue samples, such as biopsy samples (for example, skin biopsies or needle aspirates), tissue sections (such as brain tissue and lung tissue), corneal tissue samples, or isolated leukocytes.
Sequence identity: The similarity between amino acid or nucleic acid sequences is expressed in terms of the similarity between the sequences, otherwise referred to as sequence identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more similar the two sequences are. Homologs or variants of a given gene or protein will possess a relatively high degree of sequence identity when aligned using standard methods.
Methods of alignment of sequences for comparison are known. Various programs and alignment algorithms are described in: Smith and Waterman, Adv. Appl. Math. 2:482, 1981; Needleman and Wunsch, J. Mol. Biol. 48:443, 1970; Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444, 1988; Higgins and Sharp, Gene 73:237-244, 1988; Higgins and Sharp, CABIOS 5:151-153, 1989; Corpet et al., Nucleic Acids Research 16:10881-10890, 1988; and Altschul et al., Nature Genet. 6:119-129, 1994.
The NCBI Basic Local Alignment Search Tool (BLAST™) (Altschul et al., J. Mol. Biol. 215:403-410, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, MD) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx.
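As an illustration of how a percentage identity can be computed once two sequences have been aligned, the following Python sketch counts matching positions over the aligned length. It is a simplified example under stated assumptions and is not the calculation performed by BLAST or any other program named above: the function name is hypothetical, the inputs are assumed to be pre-aligned strings of equal length with "-" marking gaps, and the alignment length is used as the denominator, although other conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a base counts as a non-matching column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical 8-nucleotide sequences differing at one position.
print(percent_identity("ATGCATGC", "ATGAATGC"))  # prints 87.5
```

Under this convention, a 120-nucleotide probe that differs from a reference sequence at 6 aligned positions would be 95% identical to it.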
Shear: To cut, break or fragment.
Subject: Living multi-cellular vertebrate organisms, a category that includes both human and non-human animals, such as non-human mammals (such as pigs, mice, rats, rabbits, sheep, horses, cows, bats and non-human primates, or any other animal that can be infected by an influenza virus). In one example the subject is a bird.
Under conditions sufficient for: A phrase that is used to describe any environment that permits the desired activity.
Unless otherwise explained, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. “Comprising A or B” means including A, or B, or A and B. It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, given for nucleic acids or polypeptides are approximate, and are provided for description. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including explanations of terms, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
III. Introduction
Since the isolation of influenza virus in 1901, there have been tremendous advances in understanding and preventing influenza virus infection in humans and other animals. One aspect of these advances is the obtainment of sequence information of influenza viruses found in different infected samples. Disclosed herein is the development of a universal influenza enrichment probe set that increases the sensitivity of sequence-based virus detection and characterization for all influenza viruses. In some examples, this universal influenza enrichment probe set contains 46,953 biotin-labeled probes that were designed based on all presently available influenza virus sequences for the enrichment of nucleic acid from any and all influenza viruses. The significant enrichment effects achieved using the probe set were demonstrated in H1N1 and influenza B virus spiked human samples by real time PCR and cultured H2N1 viral stock by Illumina sequencing. When one mallard rectal swab sample was sequenced by Illumina technology without enrichment, no influenza virus sequences were detected. However, after applying an enrichment approach using the disclosed universal influenza virus probe set, a mixture of infection by different influenza subtypes was detected in the same sample. Another two mallard rectal swab samples were enriched using the universal influenza virus probe set. Sequencing of these enriched samples identified influenza virus infection and/or infection by a mixture of influenza viruses. The results described herein demonstrate that the disclosed universal influenza virus enrichment probe set can capture and enrich influenza virus nucleic acid molecules, and confirm that the probe set can be used to investigate influenza virus nucleic acid sequences in different samples, such as in partially degraded samples or samples with low-copy nucleic acid, such as samples containing less than 1000, less than 900, less than 800, less than 700, less than 600, less than 500, less than 400, less than 300, less than 200 or less than 100 fg DNA and/or RNA.
IV. Overview of Several Embodiments
The present disclosure describes a nucleic acid probe set capable of detecting nucleic acid molecules from all influenza virus isolates. The probe set can be used, for example, to enrich influenza virus nucleic acid molecules from a sample, detect influenza virus nucleic acid molecules in a sample and/or diagnose a subject as having an influenza virus infection. Provided herein is an influenza virus probe set. In some embodiments, the probe set includes DNA probes that are at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 99% identical to the nucleotide sequences of SEQ ID NOs: 1-46953. In other embodiments, the probe set includes RNA probes that are at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 99% identical to the nucleotide sequences of SEQ ID NOs: 1-46953, where uracil (U) is substituted for thymidine (T). In some examples, the probe set includes DNA probes comprising the nucleotide sequences of SEQ ID NOs: 1-46953, or RNA probes comprising the nucleotide sequences of SEQ ID NOs: 1-46953, wherein uracil (U) is substituted for thymidine (T). Thus, in some examples, the probe set includes 46,953 unique probe sequences (i.e., the set includes each of SEQ ID NOs: 1-46953), which can be DNA and/or RNA molecules.
In other embodiments, the probe set includes a combination of the DNA probes and the RNA probes. In yet other embodiments, the probe set consists essentially of the DNA probes comprising the nucleotide sequences of SEQ ID NOs: 1-46953 or consists essentially of the corresponding RNA probes comprising the nucleotide sequences of SEQ ID NOs: 1-46953, where U is substituted for T. In some examples, the probes are at least 120, at least 125, at least 130, at least 135 or at least 140 nucleotides in length.
In other embodiments, the probe set includes a subset of the DNA probes as set forth in SEQ ID NOs: 1-46953 (or the corresponding RNA probes), such as at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% of the probes set forth as SEQ ID NOs: 1-46953. In some examples, the probe set includes a subset of the disclosed DNA probes, such as at least 44,000, at least 44,500, at least 44,600, at least 45,000, at least 45,500, at least 46,000, at least 46,500, or at least 46,900 of the probes shown in SEQ ID NOs: 1-46953, such as 44,000 to 46,953, 44,500 to 46,953, 44,600 to 46,953, 45,000 to 46,953, 45,500 to 46,953, 46,000 to 46,953, 46,500 to 46,953, or 46,900 to 46,953 of the probes shown in SEQ ID NOs: 1-46953. In some examples, the probe set includes a subset of RNA probes, such as at least 44,000, at least 44,500, at least 44,600, at least 45,000, at least 45,500, at least 46,000, at least 46,500, or at least 46,900 of the probes shown in SEQ ID NOs: 1-46953, such as 44,000 to 46,953, 44,500 to 46,953, 44,600 to 46,953, 45,000 to 46,953, 45,500 to 46,953, 46,000 to 46,953, 46,500 to 46,953, or 46,900 to 46,953 of the probes shown in SEQ ID NOs: 1-46953. In some examples, the probe set includes all influenza A virus-specific probes. In other examples, the probe set includes all influenza B virus-specific probes. In yet other examples, the probe set includes all influenza C virus-specific probes. In other embodiments, the probe set includes DNA probes that are at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identical to the nucleotide sequences of SEQ ID NOs: 1-46953.
In other embodiments, the probe set includes RNA probes that are at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identical to the nucleotide sequences of SEQ ID NOs: 1-46953, where uracil (U) is substituted for thymidine (T).
In other embodiments, the probe set includes DNA probes that include no more than 1, no more than 2, no more than 3, no more than 4, no more than 5, no more than 6, no more than 7, no more than 8, no more than 9, no more than 10, no more than 11 or no more than 12 substitutions (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 substitutions) relative to the nucleotide sequences of SEQ ID NOs: 1-46953. In other embodiments, the probe set includes RNA probes that include no more than 1, no more than 2, no more than 3, no more than 4, no more than 5, no more than 6, no more than 7, no more than 8, no more than 9, no more than 10, no more than 11 or no more than 12 substitutions (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 substitutions) relative to the nucleotide sequences of SEQ ID NOs: 1-46953, where uracil (U) is substituted for thymidine (T).
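The substitution limit and the U-for-T correspondence between DNA and RNA probes described above can be sketched as follows. This is an illustrative check only: the function names are hypothetical, the comparison assumes a candidate probe and a reference sequence of equal length, and only substitutions (not insertions or deletions) are counted.

```python
def to_rna(dna: str) -> str:
    """Return the RNA form of a DNA sequence, with uracil (U) in place of thymidine (T)."""
    return dna.upper().replace("T", "U")


def count_substitutions(probe: str, reference: str) -> int:
    """Count positions at which an equal-length probe differs from a reference.

    U and T are treated as equivalent so that an RNA probe can be compared
    against a reference written as DNA.
    """
    if len(probe) != len(reference):
        raise ValueError("probe and reference must have equal length")
    p = probe.upper().replace("U", "T")
    r = reference.upper().replace("U", "T")
    return sum(1 for x, y in zip(p, r) if x != y)


def within_substitution_limit(probe: str, reference: str, max_subs: int = 12) -> bool:
    """True if the probe has no more than max_subs substitutions relative to the reference."""
    return count_substitutions(probe, reference) <= max_subs


if __name__ == "__main__":
    reference = "ACGTACGTACGT"       # hypothetical short reference, not a real probe
    candidate = to_rna("ACGTACCTACGT")  # one substitution, written as RNA
    print(candidate, count_substitutions(candidate, reference))
```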
In some embodiments, the probe set includes DNA probes that are at least 100 nucleotides in length and comprise at least 100 consecutive nucleotides of SEQ ID NOs: 1-46953. In some examples, the probe set includes DNA probes that are at least 105 nucleotides in length and comprise at least 105 consecutive nucleotides of SEQ ID NOs: 1-46953. In some examples, the probe set includes DNA probes that are at least 110 nucleotides in length and comprise at least 110 consecutive nucleotides of SEQ ID NOs: 1-46953. In some examples, the probe set includes DNA probes that are at least 115 nucleotides in length and comprise at least 115 consecutive nucleotides of SEQ ID NOs: 1-46953.
In some embodiments, the probe set includes RNA probes that are at least 100 nucleotides in length and comprise at least 100 consecutive nucleotides of SEQ ID NOs: 1-46953, where U is substituted for T. In some examples, the probe set includes RNA probes that are at least 105 nucleotides in length and comprise at least 105 consecutive nucleotides of SEQ ID NOs: 1-46953, where U is substituted for T. In some examples, the probe set includes RNA probes that are at least 110 nucleotides in length and comprise at least 110 consecutive nucleotides of SEQ ID NOs: 1-46953, where U is substituted for T. In some examples, the probe set includes RNA probes that are at least 115 nucleotides in length and comprise at least 115 consecutive nucleotides of SEQ ID NOs: 1-46953, where U is substituted for T. In some embodiments, the probes of the probe set are labelled, for example thereby allowing their detection. In some examples, the probes are labelled at their 5' end. In other examples, the probes are labelled at their 3' end.
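The "consecutive nucleotides" criterion for DNA and RNA probes described above can be illustrated with a simple window comparison. The sketch below is one possible way to test whether a candidate probe shares a run of at least a given length with a reference sequence; the function name and the set-of-windows approach are assumptions chosen for clarity rather than efficiency.

```python
def shares_consecutive_run(probe: str, reference: str, min_run: int = 100) -> bool:
    """True if the probe contains at least min_run consecutive nucleotides of the reference.

    U and T are treated as equivalent so that RNA probes can be checked
    against a reference written as DNA.
    """
    p = probe.upper().replace("U", "T")
    r = reference.upper().replace("U", "T")
    if len(p) < min_run or len(r) < min_run:
        return False
    # Every window of length min_run found in the reference.
    windows = {r[i:i + min_run] for i in range(len(r) - min_run + 1)}
    # The probe qualifies if any of its own windows is one of them.
    return any(p[i:i + min_run] in windows for i in range(len(p) - min_run + 1))


if __name__ == "__main__":
    # Hypothetical short sequences; a small min_run keeps the example readable.
    print(shares_consecutive_run("AAACGUAA", "CCACGTCC", min_run=4))
```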
The probes of the probe set can be labelled with any suitable label. Generally, the label will be selected based on the intended use of the probe or the desired readout. In some examples, the probe is labelled with one component of a specific binding pair, such as biotin or a derivative thereof, to enable isolation of the probe using the second component of the specific binding pair, such as streptavidin/avidin or a derivative or analog thereof. In other examples, the label is an enzyme, a fluorophore or a radioactive isotope.
Also provided are kits that include the influenza virus probe set disclosed herein. In some embodiments, the kit includes a biotin-labelled probe set. In some examples, the kit further includes a solid support comprising immobilized streptavidin (or another biotin binding protein). In non-limiting examples, the solid support comprises magnetic beads. The kit can also include microplates or columns.
Further provided are methods of enriching influenza virus nucleic acid molecules present in a sample that comprises nucleic acid molecules. In some embodiments, the method includes contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid molecules present in the sample to probes of the probe set, and isolating the probes from the sample.
In some embodiments, the method includes shearing the nucleic acid molecules in the sample, contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, and isolating the probes from the sample. Shearing can be performed, for example, by physical methods such as acoustic shearing, sonication or hydrodynamic shearing; by enzymatic methods, such as by using an endonuclease or transposase; or by chemical fragmentation, such as by heat digestion using metal cations. In one example, shearing is carried out using an acoustic method, such as by using a Covaris S2 machine.
In some embodiments, the method includes shearing the nucleic acid molecules in the sample, preparing a sequencing library from the sheared nucleic acid molecules, contacting the sequencing library with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the library to probes of the probe set, and isolating the probes.
In some embodiments of these methods, the probes are labelled with biotin and isolating the probes from the sample includes contacting the sample (or library) with streptavidin-labelled magnetic beads. Also provided are methods of detecting influenza virus nucleic acid in a sample comprising nucleic acid molecules. In some embodiments, the method includes contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, isolating the probes from the sample, and detecting the presence of influenza virus nucleic acid hybridized to the isolated probes.
In some embodiments, the method includes shearing the nucleic acid molecules in the sample, contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, isolating the probes from the sample, and detecting the presence of influenza virus nucleic acid hybridized to the isolated probes.
In some embodiments, the method includes shearing the nucleic acid molecules in the sample, preparing a library from the sheared nucleic acid, contacting the library with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the library to probes of the probe set, isolating the probes, and detecting the presence of influenza virus nucleic acid hybridized to the isolated probes.
In some embodiments, the method includes shearing the nucleic acid molecules in the sample, preparing a library from the sheared nucleic acid, contacting the library with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the library to probes of the probe set, isolating the probes, amplifying the influenza virus nucleic acid hybridized to the isolated probes, and detecting the amplified influenza virus nucleic acid. In some examples, amplifying the influenza virus nucleic acid comprises, for example, PCR amplification or linear amplification. In some examples, detecting the amplified influenza virus nucleic acid comprises sequencing the amplified influenza virus nucleic acid. In some examples, the method further includes eluting the influenza virus nucleic acid hybridized to the isolated probes prior to amplification and/or detection of the influenza virus nucleic acid.
In some embodiments of the disclosed methods, the probes are labelled with biotin and isolating the probes from the sample (or library) comprises contacting the sample with streptavidin-labelled magnetic beads.
Further provided are methods of diagnosing a subject as having an influenza virus infection. In some embodiments, the methods include contacting a sample comprising nucleic acid molecules obtained from the subject with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, isolating the probes from the sample, and detecting the presence of influenza virus nucleic acid hybridized to the isolated probes.
In some embodiments, the method includes providing a sample containing nucleic acid molecules obtained from the subject, shearing the nucleic acid molecules in the sample, contacting the sample with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, isolating the probes from the sample, and detecting the presence of influenza virus nucleic acid hybridized to the isolated probes.
In some embodiments, the method includes providing a sample containing nucleic acid molecules obtained from the subject, shearing the nucleic acid molecules in the sample, preparing a library from the sheared nucleic acid, contacting the library with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, isolating the probes, and detecting the presence of influenza virus nucleic acid hybridized to the isolated probes.
In some embodiments, the method includes providing a sample containing nucleic acid molecules obtained from the subject, shearing the nucleic acid molecules in the sample, preparing a library from the sheared nucleic acid, contacting the library with the probe set disclosed herein under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set, isolating the probes, amplifying the influenza virus nucleic acid hybridized to the isolated probes, and detecting the amplified influenza virus nucleic acid. In some examples, amplifying the influenza virus nucleic acid comprises, for example PCR amplification or linear amplification. In some examples, detecting the amplified influenza virus nucleic acid comprises sequencing the amplified influenza virus nucleic acid. In some examples, the method further includes eluting the influenza virus nucleic acid hybridized to the isolated probes prior to amplification and/or detection of the influenza virus nucleic acid.
In some embodiments of the disclosed methods, the probes are labelled with biotin and isolating the probes from the sample comprises contacting the sample with streptavidin-labelled magnetic beads.
In some embodiments of the enrichment, detection and diagnostic methods disclosed herein, the sample is a biological sample. In some examples, the biological sample comprises a blood, saliva, mucus, nasal wash or swab sample. In other examples, the biological sample comprises a tissue sample, such as a paraffin-embedded tissue sample.
V. Detection of Influenza Virus
A need exists for efficient methods to detect and identify influenza viruses, especially for newly emerging viral variants, for the purposes of surveillance, prevention, and treatment. Ideal methods would be capable of detecting influenza virus in a variety of different sample types, such as samples from wild birds, domestic animals, and human patients. Various technological advances allow different methods to be developed to identify influenza viruses from these samples, such as methods based on culture, antibody, serological assay, nucleic acid amplification and nucleic acid sequencing (Vemula et al., Viruses 8(4):96, 2016). The latest and most comprehensive, but still costly, approach is the incorporation of high throughput sequencing technology.
High-throughput sequencing has been used for detecting influenza A virus and norovirus infections in patients (Nakamura et al., PLoS One 4(1):e4219, 2009), uncovering mixed infection with the 2009 pandemic influenza A viruses (Ghedin et al., J Infect Dis 203(2):168-174, 2011), sequencing influenza B viruses (Zhou et al., J Clin Microbiol 52(5):1330-1337, 2014), evaluating genetic stability of influenza vaccine viruses (Laassri et al., PLoS One 10(9):e0138650, 2015), revealing antigenic variants at low frequencies in influenza A virus-infected patients (Dinis et al., J Virol 90(7):3355-3365, 2016), and identification of influenza A (H3N2) virus antigenic drift variants (Mishin et al., J Clin Microbiol 55(1):145-154, 2017). However, because of the limited amount of virus in these samples, all of these studies used virus-specific primers and viral-specific PCR amplifications to enrich the influenza virus nucleic acid sequences.
In addition to PCR contamination issues faced in influenza laboratories, PCR-introduced errors have been reported in next generation sequencing. It has been shown that the most commonly used PCR enzymes, including those marketed as high fidelity enzymes, all have error rates of 10⁻⁵ to 10⁻⁶ point mutations/bp/duplication (McInerney et al., Mol Biol Int 2014:287430, 2014). Besides well-characterized polymerase base substitution errors, other sources of error were found to be equally prevalent, including PCR-mediated recombination, template-switching and DNA damage introduced during temperature cycling (Potapov and Ong, PLoS One 12(1):e0169774, 2017). In fact, the Primer ID method was developed to distinguish PCR-introduced errors from real single nucleotide polymorphisms (SNPs) that occurred during virus evolution (Zhou et al., J Virol 89(16):8540-8555, 2015). Another challenge to using PCR for enriching influenza sequences is that in many cases, the isolated influenza RNA is degraded, making it difficult to amplify using influenza virus universal primers (Hoffmann et al., Arch Virol 146(12):2275-2289, 2001). Furthermore, in some situations, subtype specific primers cannot be used because the subtype of influenza virus in the sample is unknown. To address these problems, the present disclosure describes the design of a set of universal influenza virus probes for enrichment of any and all influenza virus nucleic acid sequences by hybridization. Hybridization capture methods were first used to enrich sequence targets from the human genome. The designed oligonucleotide probes that capture the sequencing targets can be attached to a solid phase (Albert et al., Nat Methods 4(11):903-905, 2007; Hodges et al., Nat Genet 39(12):1522-1527, 2007; Okou et al., Nat Methods 4(11):907-909, 2007), or hybridization can be performed in solution (Gnirke et al., Nat Biotechnol 27(2):182-189, 2009; Tewhey et al., Genome Biol 10(10):R116, 2009). A virome capture sequencing platform can be used (Briese et al., MBio 6(5):e01491-01415, 2015). Currently, many commercial companies provide design and capturing methods with differing enrichment effects of the target sequences (Bodi et al., J Biomol Tech 24(2):73-86, 2013; Garcia-Garcia et al., Sci Rep 6:20948, 2016).
VI. Other uses of the influenza virus enrichment probes
The influenza virus nucleic acid-specific enrichment probes disclosed herein (SEQ ID NOs: 1-46953) can also be used for a variety of different nucleic acid-based influenza virus detection methods. In some embodiments, one or more of the disclosed probes is used to detect the presence of influenza virus nucleic acid, such as amplified nucleic acid, and/or to quantify amplified nucleic acid.
In some examples, RNA is isolated from a sample obtained from a subject, such as a fluid sample (for example, a serum, blood, plasma, urine, feces, saliva, mucus, nasal wash or CSF sample) or tissue sample. General methods for RNA extraction are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995). In one example, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers. In some examples, total RNA from cells (such as those obtained from a subject) can be isolated using QIAGEN® RNeasy mini-columns. Other commercially available RNA isolation kits include MASTERPURE® Complete DNA and RNA Purification Kit (EPICENTRE® Madison, Wis.) and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from a biological sample can be isolated, for example, by cesium chloride density gradient centrifugation.
Methods for quantitating nucleic acid, including RNA, include RT-PCR. The first step in amplification via RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. Two exemplary reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is primed, for example, using specific primers, random hexamers, oligo-dT primers, or gene-specific primers, depending on the target nucleic acid. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer’s instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction. Amplified DNA is then detected using a specific probe, such as any one or more of the influenza virus probes disclosed herein and set forth as SEQ ID NOs: 1-46953.
In situ hybridization (ISH), including fluorescence-based ISH (FISH), is another exemplary method for detecting the presence of a nucleic acid of interest, such as an influenza virus nucleic acid. ISH applies and extrapolates the technology of nucleic acid hybridization to the single cell level, and allows for the localization of sequences to specific cells within populations, such as tissues and blood samples. ISH is a type of hybridization that uses a complementary nucleic acid (e.g., a probe) to localize one or more specific nucleic acid sequences in a portion or section of tissue (in situ), or, if the tissue is small enough, in the entire tissue (whole mount ISH).
Protocols for RNA ISH (including RNA FISH) are well established. RNA ISH protocols generally include bathing the sample in a high concentration of nucleic acid probe (or probes) that is complementary to the target RNA species, driving hybridization of the probe to the target. After hybridization, excess unbound probe is washed away, leaving only those probes specifically bound to the target molecule. Differences between the variants of RNA ISH typically revolve around the type of nucleic acid used for the probe and the type of labeling scheme used to detect the probe via microscopy. Radiolabeled cDNA probes complementary to the appropriate target, as well as fluorescence-based ISH (FISH) approaches using DNA or RNA probes, can be utilized.
Sample cells or tissues are treated to increase their permeability to allow a probe to enter the cells. The probe is added to the treated cells, allowed to hybridize at a pertinent temperature, and excess probe is washed away. A complementary probe is labeled, for example with a radioactive, fluorescent or antigenic tag, so that the probe’s location and quantity in the tissue can be determined using autoradiography, fluorescence microscopy or immunoassay. The sample may be any sample as herein described, such as a biological fluid sample. One or more of the influenza virus-specific enrichment probes disclosed herein (SEQ ID NOs: 1-46953) can be used for ISH, including FISH.
The following examples are provided to illustrate certain particular features and/or embodiments. These examples should not be construed to limit the disclosure to the particular features or embodiments described.
EXAMPLES
Example 1: Methods
This example describes the materials and methods used for the studies described in Example 2.
Enrichment probe design
All influenza virus sequences were downloaded from the NCBI influenza database. Details of the generation of the final sequence data set for enrichment probe design are described in Example 2. Enrichment probes were synthesized by Agilent Technologies (Santa Clara, CA) based on the final influenza sequence data set as a reference. The synthesized probes were each 120 nucleotides in length, using 5X tiling or probes spaced every 24 bp, to produce a total of 46,953 biotinylated cRNA probes.
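The tiling scheme described above (120-nucleotide probes with a new probe starting every 24 bases, so that interior positions are covered by 120/24 = 5 probes) can be sketched as follows. This is an illustrative reconstruction, not the vendor's design procedure: the function name is hypothetical, and the extra final probe placed flush with the 3' end of the reference is an assumption added so that the end of the sequence is covered.

```python
def tile_probes(sequence: str, probe_len: int = 120, step: int = 24):
    """Return (start, probe) pairs tiling a reference sequence.

    Probes are probe_len nucleotides long and start every `step` bases.
    Assumption: if the regular grid does not reach the 3' end, one extra
    probe is placed flush with the end of the sequence.
    """
    seq = sequence.upper()
    if len(seq) < probe_len:
        return []  # reference too short to yield a full-length probe
    last_start = len(seq) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, seq[s:s + probe_len]) for s in starts]


if __name__ == "__main__":
    reference = "ACGT" * 60  # hypothetical 240-nt reference, not a real segment
    probes = tile_probes(reference)
    print(len(probes), [start for start, _ in probes])
```

With the default parameters, a 240-nucleotide reference yields probes starting at positions 0, 24, 48, 72, 96 and 120.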
Data analysis
All hemagglutinin (HA) and neuraminidase (NA) sequences were downloaded from the NCBI's Influenza Virus Resource (Bao et al., J Virol 82(2):596-601, 2008); A/Brevig Mission/1/1918(H1N1) and A/Hangzhou/1/2013(H7N9) sequences were downloaded from NCBI; all human virus sequences and all human genome sequences were downloaded from NCBI. The Blastn program (Altschul et al., J Mol Biol 215(3):403-410, 1990) from the BLAST+ (version 2.2.31) package was used to search enrichment probes against different downloaded databases using percent identity 90 and e-value 0.001 as cutoffs. Blastn results from searching A/Brevig Mission/1/1918(H1N1) and A/Hangzhou/1/2013(H7N9) sequences were converted to a SAM file using blast2bam version 0.1. SAMtools (version 1.4) was used to convert the SAM file to a BAM file, sort the generated BAM file, and index the sorted BAM file (Li et al., Bioinformatics 25(16):2078-2079, 2009). BEDTools (version 2.25.0) was used to calculate overall coverage of the reads for each segment (Quinlan and Hall, Bioinformatics 26(6):841-842, 2010). Integrative Genomics Viewer (version 2.3.60) was used to generate the HA and NA coverage figures (Robinson et al., Nat Biotechnol 29(1):24-26, 2011). Sequencing was performed on the Illumina NextSeq machine. Samples were demultiplexed and FastQ files were generated using Illumina software. Reads were mapped to the Bowtie2 (version 2.2.5) indexed A/Hangzhou/1/2013(H7N9) genome using Tophat2 (release 2.0.13) downloaded from the Center for Computational Biology, Johns Hopkins University (Trapnell et al., Bioinformatics 25(9):1105-1111, 2009).
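The percent identity and e-value cutoffs described above can be illustrated by filtering BLAST results after the search. The sketch below assumes the results were written in the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the text does not state which output format was used or whether the cutoffs were applied during or after the search, and the function name and the example identifiers are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text: str, min_identity: float = 90.0, max_evalue: float = 1e-3):
    """Keep hits meeting the identity and e-value cutoffs.

    Assumes 12-column tab-separated BLAST output, with percent identity in
    column 3 and e-value in column 11. Returns (query, subject, identity,
    e-value) tuples.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident = float(row[2])
        evalue = float(row[10])
        if pident >= min_identity and evalue <= max_evalue:
            hits.append((row[0], row[1], pident, evalue))
    return hits


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not results from the study.
    example = (
        "probe_1\tseg_A\t98.33\t120\t2\t0\t1\t120\t301\t420\t1e-50\t200\n"
        "probe_2\tseg_A\t85.00\t120\t18\t0\t1\t120\t501\t620\t1e-20\t110\n"
        "probe_3\tseg_B\t95.00\t20\t1\t0\t1\t20\t10\t29\t0.5\t30.1\n"
    )
    for hit in filter_hits(example):
        print(hit)
```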
Plasmids spike-in and real-time PCR
The H1N1 (A/California/04/2009(H1N1)) plasmid set was constructed previously (Memoli et al., Clin Infect Dis 60(5):693-702, 2015). The influenza B virus plasmid set was constructed by cloning the 8 viral gene segments of the wild-type virus in the pHH21 vector (E. Hoffmann (1997) Ph.D. thesis, Justus-Liebig-University, Giessen, Germany). Plasmids were isolated by QIAprep Spin Miniprep Kit (Qiagen, Hilden, Germany).
Real-time PCR primers for H1N1:
HA-fwd: TAA ACA CCA GCC TCC CAT TTC (SEQ ID NO: 46954)
HA-rev: CCT GTG GCC AGT CTC AAT TT (SEQ ID NO: 46955)
MX-fwd: CAG TGC TGG TCT GAA AGA TGA (SEQ ID NO: 46956)
MX-rev: GAC GAG AGG ATC ACT TGA ATC G (SEQ ID NO: 46957)
NA-fwd: GGG CCT TGC TAA ATG ACA AAC (SEQ ID NO: 46958)
NA-rev: GGA GAG GGA ACT TCA CCA ATA G (SEQ ID NO: 46959)
NP-fwd: GTA CTC ACT GGT CGG GAT AGA (SEQ ID NO: 46960)
NP-rev: GAC TCT TGT GAG CTG GGT TT (SEQ ID NO: 46961)
NS-fwd: GGG AAA CAA ATC GTG GAA TGG (SEQ ID NO: 46962)
NS-rev: GAG GGT CAT GTC AGA AAG GTA G (SEQ ID NO: 46963)
PA-fwd: GCA AGC ATG AGG AAC TAT (SEQ ID NO: 46964)
PA-rev: CAT TGA GCA AGG CCG TAT TTA TG (SEQ ID NO: 46965)
PB1-fwd: GAC AAT ATA CTG GTG GGA TGG G (SEQ ID NO: 46966)
PB1-rev: CTG TCC ACT CCT GCT TGT ATT (SEQ ID NO: 46967)
PB2-fwd: GCA ATA GGG TTG AGG ATT A (SEQ ID NO: 46968)
PB2-rev: CCC GTT AGC ACT TCT T (SEQ ID NO: 46969)
Real-time PCR primers for influenza B virus:
HA-fwd: CCT CAT CTG CTA ATG GAG TAA CC (SEQ ID NO: 46970)
HA-rev: TTT GTG GTA GTC CTT CGT CTT C (SEQ ID NO: 46971)
MX-fwd: CCT TAT CGG GAA TGG GAA CAA (SEQ ID NO: 46972)
MX-rev: CTG AGC TTT CAT GGC CTT CT (SEQ ID NO: 46973)
NA-fwd: GGA AAC TCA GCT CCC TTG ATA A (SEQ ID NO: 46974)
NA-rev: TGA GCT GCA TAA TGG GTT AGA G (SEQ ID NO: 46975)
NP-fwd: AGT CTT GGC TTT GAT GTC TCT C (SEQ ID NO: 46976)
NP-rev: TCA AAG GAG GCG GAA CTT TAG (SEQ ID NO: 46977)
NS-fwd: CTG GAA TTG AAG GGT TTG (SEQ ID NO: 46978)
NS-rev: CCC TGG TGT TGA AGG GTA AT (SEQ ID NO: 46979)
PA-fwd: GCA AGG ATG TCT CCC TTA GTA TC (SEQ ID NO: 46980)
PA-rev: CTT CTG GTA GCT CAT GGT TGT (SEQ ID NO: 46981)
PB1-fwd: GCC CGT AGG TGG AAA TGA AA (SEQ ID NO: 46982)
PB1-rev: CTG TTA CTG TCA TGC TGA TCC C (SEQ ID NO: 46983)
PB2-fwd: TAT CAC CCG GGA GGG AAT AA (SEQ ID NO: 46984)
PB2-rev: TGG GTT TGA TGC GAC TAT TGA (SEQ ID NO: 46985)
Real-time PCR primers for H1 to H16 segments:
H1-fwd: GTT AGA GGA CAG GCA GGC AG (SEQ ID NO: 46988)
H1-rev: CGG AGT CAG ACC CCT TGT TC (SEQ ID NO: 46989)
H2-fwd: TGA CGA TGC GGA ACA AAG GA (SEQ ID NO: 46990)
H2-rev: CCC CCT TGT CCG TTG ACT TT (SEQ ID NO: 46991)
H3-fwd: ACG TTC AGG CAT CAG GAA GG (SEQ ID NO: 46992)
H3-rev: ACT AGT ACA TCC CCC GGC TT (SEQ ID NO: 46993)
H4-fwd: AGA GTG ACT GTC TCC ACC CA (SEQ ID NO: 46994)
H4-rev: CCT CTC ACC CAC GGT CTA CT (SEQ ID NO: 46995)
H5-fwd: TCA TCC TCT TGC CAT TGG GG (SEQ ID NO: 46996)
H5-rev: CCA TTC CTT GCC ATC CTC CT (SEQ ID NO: 46997)
H6-fwd: CCA AGT CAG CGT ATC CA (SEQ ID NO: 46998)
H6-rev: TGC TCA TTG GTG TCA GGA GG (SEQ ID NO: 46999)
H7-fwd: TGG GGC ATT CAT AGC TCC TG (SEQ ID NO: 47000)
H7-rev: CCC CAC TGT GGA AGC AAT CT (SEQ ID NO: 47001)
H8-fwd: GGG GCA TTC TGA AAA GGG GA (SEQ ID NO: 47002)
H8-rev: GGC ATT TCG TGT GGC AGT TT (SEQ ID NO: 47003)
H9-fwd: TTG TCA ATG GTC AGC AGG GG (SEQ ID NO: 47004)
H9-rev: TCA CTC GCA ATG TCT GAC CC (SEQ ID NO: 47005)
H10-fwd: CAC CGA GAA CTG TGG GTC AA (SEQ ID NO: 47006)
H10-rev: CCA AAC AGG CCT CTC CCT TG (SEQ ID NO: 47007)
H11-fwd: GCT GGG TTC ATA GAG GGT GG (SEQ ID NO: 47008)
H11-rev: CTG CAG CAA TCC CTG TAC CT (SEQ ID NO: 47009)
H12-fwd: CCA TTC ACC CAC CAA CA (SEQ ID NO: 47010)
H12-rev: AGT GGT GAC TGA GGA GAG GG (SEQ ID NO: 47011)
H13-fwd: CGC ACC TAC TTC TTG GGG AG (SEQ ID NO: 47012)
H13-rev: TGT TGG TCC CCG TAT TGT CC (SEQ ID NO: 47013)
H14-fwd: AGG TGG CAA CAG GGA GAG TA (SEQ ID NO: 47014)
H14-rev: AGA TGC TTA TCC TGC CGC TC (SEQ ID NO: 47015)
H15-fwd: ATG CCG TAG CAA ATG GGA CA (SEQ ID NO: 47016)
H15-rev: GTC CAC CGC TTT CTT CCC TT (SEQ ID NO: 47017)
H16-fwd: AGA GGG GTT TGT TTG GTG CT (SEQ ID NO: 47018)
H16-rev: GGC TTT CTG GGT GGA CAC TT (SEQ ID NO: 47019)
Primers for human HER2 gene:
HER2-fwd: ACA ACC AAG TGA GGC AGG TC (SEQ ID NO: 46986)
HER2-rev: GTA TTG TTC AGC GGG TCT CC (SEQ ID NO: 46987)
All primers were synthesized by Integrated DNA Technologies (Coralville, IA). The isolated plasmids were diluted at least 1 million-fold before mixing with 500 ng HeLa human DNA (New England Biolabs, Ipswich, MA). Mixed DNAs were sheared to 150 bp on the Covaris S2 machine (Covaris, Woburn, MA) and sequencing libraries (enriched and unenriched) were made following standard protocols, omitting the RNA-to-cDNA steps. Real-time PCR reactions using 1 µl of sequencing library were performed on an Applied Biosystems 7500 Real-Time PCR System (Foster City, CA) using Power SYBR® Green PCR Master Mix (Thermo Fisher Scientific, Waltham, MA) with the following program: 50°C for 2 min; 95°C for 3 min; followed by 40 cycles of 95°C for 15 sec and 60°C for 1 min.
RNA isolation
Virus stock was cultured in Madin-Darby canine kidney (MDCK) cells and RNA was isolated using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). Cloacal swabs from mallards were inoculated in embryonated specific pathogen-free (SPF) chicken eggs and total RNA was extracted from first-passage allantoic fluid using the E.Z.N.A. Viral RNA Kit according to the manufacturer’s protocol (OMEGA bio-tek, Norcross, GA).
Library construction, enrichment and sequencing
Isolated total RNA was amplified with the Ovation RNA-Seq System V2 (NuGEN, San Carlos, CA) following the kit specifications. For each sample, 5 µl of total RNA was used as input for the Ovation RNA-Seq System V2. The amplified total cDNAs were analyzed on an Agilent 2100 Bioanalyzer using the Agilent High Sensitivity DNA Kit (Agilent Technologies, Santa Clara, CA) and sheared to 150 bp on a Covaris S2 machine (Covaris, Woburn, MA). Then, about 300 ng of amplified cDNA was used to make Illumina sequencing libraries. For influenza-enriched libraries, the Agilent SureSelectXT Target Enrichment Kit for Illumina Multiplex Sequencing (Agilent, Santa Clara, CA) was used with the designed influenza enrichment probes using a protocol for 200 ng DNA samples. Briefly, the ends of the sheared cDNA were repaired, adenylated at the 3' ends, ligated to adaptors, and amplified according to the protocol.
The enrichment steps were as follows. Use a vacuum concentrator to concentrate the amplified samples (30 µl) from the previous step at less than 45°C to a final volume of 3.4 µl. Prepare the hybridization buffer and block mix according to the protocol. Then add 5.6 µl of the block mix to 3.4 µl of sample (total 9 µl). Incubate the mixture on a thermal cycler using the following program: 95°C for 5 min, then hold at 65°C. Take the synthesized influenza enrichment probe set from -80°C and thaw on ice. Add 2 µl of enrichment probe set (full cocktail) to 5 µl of diluted RNase block solution and keep it on ice (total 7 µl). Bring the hybridization buffer to 65°C using a heat block or PCR machine. Put the 7 µl of enrichment probe and RNase block mixture on the PCR machine at 65°C. After that, quickly add 13 µl of 65°C hybridization buffer and then 9 µl of the sample and block mixture (total 29 µl). Incubate the hybridization mixture on the PCR machine at 65°C with a heated lid at 105°C for 16 to 24 hours. Vigorously resuspend the Dynabeads MyOne Streptavidin T1 magnetic beads (Thermo Fisher Scientific, Waltham, MA) on a vortex mixer. For each hybridization sample, 50 µl of the resuspended beads is needed. Wash the resuspended Streptavidin T1 magnetic beads with 200 µl of SureSelect Binding Buffer 3 times on a magnetic separator device and resuspend the beads in 200 µl of SureSelect Binding Buffer. Then keep the hybridization tubes at 65°C on the PCR machine while adding 200 µl of washed streptavidin beads to each tube, and mix well by pipetting slowly. Put the hybridization and bead mixture on a tube/plate mixer and rotate or mix for 30 minutes at room temperature. Put the tube in a magnetic separator to collect the beads. Wait until the solution is clear, then remove and discard the supernatant. Resuspend the beads in 200 µl of SureSelect Wash Buffer 1. Incubate the samples for 15 minutes at room temperature. Briefly spin the tube and put it on the magnetic separator. Wait for the solution to clear, then discard the supernatant. Resuspend the beads in 200 µl of Wash Buffer 2 pre-warmed to 65°C and incubate the sample for 10 minutes at 65°C on the thermal cycler. Put the tube in the magnetic separator. Wait for the solution to clear, then discard the supernatant. Repeat the wash step for a total of 3 washes. Make sure all of the wash buffer has been removed after the final wash. Add 30 µl of nuclease-free water to each sample well. Pipette up and down to resuspend the beads. Captured DNA is retained on the streptavidin beads during the step of post-capture amplification with indexing primers, which is the final step of the enrichment library construction according to the procedure of the Agilent SureSelectXT Target Enrichment Kit (Agilent, Santa Clara, CA). Sixteen PCR amplification cycles were used because the capture library is 1 kb to 0.5 Mb according to the protocol. For un-enriched libraries, the TruSeq RNA Library Prep Kit v2 (Illumina, San Diego, CA) was used following the manufacturer’s instructions, omitting the cDNA synthesis steps. All the Illumina sequencing libraries were analyzed on an Agilent 2100 Bioanalyzer using the Agilent High Sensitivity DNA Kit. Libraries were then clustered on an Illumina cBot machine and sequenced on an Illumina GAIIx or NextSeq sequencer according to the manufacturer’s instructions (Illumina, San Diego, CA).
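As a quick check of the pipetting volumes quoted in the hybridization steps, the following minimal Python sketch adds up the components of one reaction. The function names and the 10% master-mix overage are illustrative assumptions and are not part of the kit protocol.

```python
# Volumes (in microliters) quoted in the hybridization steps above.
SAMPLE_UL = 3.4        # concentrated library
BLOCK_MIX_UL = 5.6     # block mix added to the sample
PROBE_SET_UL = 2.0     # enrichment probe set (full cocktail)
RNASE_BLOCK_UL = 5.0   # diluted RNase block solution
HYB_BUFFER_UL = 13.0   # pre-warmed hybridization buffer


def hybridization_volumes():
    """Return the intermediate and final volumes of one hybridization reaction."""
    sample_block = SAMPLE_UL + BLOCK_MIX_UL
    probe_rnase = PROBE_SET_UL + RNASE_BLOCK_UL
    total = probe_rnase + HYB_BUFFER_UL + sample_block
    return {"sample_block": sample_block, "probe_rnase": probe_rnase, "total": total}


def scale_for_samples(volume_ul, n_samples, overage=0.10):
    """Scale a per-reaction volume to a master mix.

    The 10% overage is a hypothetical allowance for pipetting loss.
    """
    return volume_ul * n_samples * (1 + overage)
```

The totals returned for one reaction correspond to the 9 µl, 7 µl and 29 µl figures stated in the text.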
Example 2: Universal Influenza Virus Enrichment Probe Set
This example describes the development and testing of a universal influenza virus probe set for enrichment of influenza virus nucleic acid from a sample.
Design Influenza Universal Enrichment Probes
All influenza sequences were downloaded from the NCBI influenza database. A total of 408,140 influenza sequences were obtained. Among them, 390,301 (95.6%) were clean sequences and 17,839 (4.4%) were non-clean sequences (containing bases other than A, T, G and C).
First, the clean sequences were collapsed into a unique set of 277,949 sequences (“clean sequence set”) using FASTX-Toolkit version 0.0.13. Sequences with ambiguous bases (17,839) were discarded if the ambiguous bases together accounted for more than 10% of the total length of the sequence; this step discarded 649 sequences. The remaining 17,190 sequences were clustered at the 90% identity level by cd-hit-v4.3 (Li and Godzik, Bioinformatics 22: 1658-1659, 2006), resulting in 492 sequences. These sequences were split wherever they contained 8 consecutive “N” residues, and the resulting fragments were kept if they were longer than 20 bp, which generated 649 sequences as the “non-clean sequence set.” Then, the clean sequence set (277,949) and the non-clean sequence set (649) were combined and clustered together at the 90% identity level by cd-hit-v4.3 (Li and Godzik, Bioinformatics 22: 1658-1659, 2006). This process generated 905 sequences, with 823 sequences from the clean sequence set and 82 sequences from the non-clean sequence set. After blasting the 82 sequences from the non-clean sequence set against the 823 sequences from the clean sequence set, it was determined that 78 of the 82 sequences had relatively high homology to the clean sequences, with the lowest percent identity being 82.24%. Thus, these 78 sequences were eliminated from the dataset. Therefore, the final data set contained 823 sequences from the clean sequence set and 4 sequences from the non-clean sequence set.
The 4 sequences from the non-clean sequence set are:
(1) gi|21335490|gb|AX399742| Sequence 19 from WO/0224876 (2397bp with 1 ambiguous base)
(2) gi|293631646|gb|GU943184|Influenza B virus (B/Singapore/DSO_050230/2006) segment 4 hemagglutinin (HA) gene, partial cds l (1815bp with 3 ambiguous bases)
(3) gi|343527428|gb|CY096794|Influenza A virus (A/Albany/5/1967(mixed)) hemagglutinin (HA) gene, partial cds l (179bp with 4 ambiguous bases)
(4) gi|426308777|gb|CY100576|Influenza A virus (A/Mexico/InDRE2246/2005(H3N2)) polymerase PB1 (PB1) and putative PB1-F2 protein (PB1-F2) genes, complete cds (174bp with 52 ambiguous bases)
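The handling of sequences with ambiguous bases described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: the function names are hypothetical, the cd-hit clustering step is omitted, and a run of 8 or more “N” residues is assumed to be the split point.

```python
import re

AMBIGUOUS = re.compile(r"[^ACGT]")   # any base other than A, C, G, T
N_RUN = re.compile(r"N{8,}")         # assumed split point: 8 or more consecutive N


def is_clean(seq):
    """True if the sequence contains only A, C, G and T."""
    return AMBIGUOUS.search(seq.upper()) is None


def ambiguous_fraction(seq):
    """Fraction of positions that are not A, C, G or T."""
    seq = seq.upper()
    return len(AMBIGUOUS.findall(seq)) / len(seq) if seq else 0.0


def split_non_clean(seq, max_ambiguous=0.10, min_length=20):
    """Discard sequences with more than 10% ambiguous bases, then split at
    N runs and keep only fragments longer than 20 bases."""
    seq = seq.upper()
    if ambiguous_fraction(seq) > max_ambiguous:
        return []
    return [frag for frag in N_RUN.split(seq) if len(frag) > min_length]
```

In the described workflow the surviving sequences were clustered with cd-hit before splitting; that step would sit between the 10% filter and the split in a fuller implementation.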
The process for generation of the final sequence data set from all downloaded influenza sequences is shown in FIG. 1. The final sequences (825) were used by Agilent Technologies (Santa Clara, CA) to make enrichment RNA oligonucleotides at 5X tiling, which resulted in 46,953 biotin-labeled 120-bp RNA universal influenza enrichment probes. The enrichment probe sequences are set forth in the Sequence Listing.
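To illustrate what tiling a target sequence with overlapping 120-nt probes can look like, the following Python sketch generates probe windows from one sequence. It assumes that 5X tiling corresponds to a step of 120/5 = 24 nt between probe starts and that a final probe is anchored to the end of the sequence; the vendor's actual design rules are not described here and may differ.

```python
def tile_probes(seq, probe_len=120, tiling=5):
    """Return overlapping probe sequences covering `seq`.

    Assumptions (not from the source): the step between probe starts is
    probe_len // tiling, a last probe is anchored at the 3' end so no bases
    are left uncovered, and sequences no longer than one probe yield a
    single (possibly shorter) probe.
    """
    step = probe_len // tiling
    if len(seq) <= probe_len:
        return [seq]
    last_start = len(seq) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [seq[s:s + probe_len] for s in starts]
```

With these assumptions, interior bases of a long target are covered by about five probes, while bases near the ends are covered by fewer.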
In silico evaluation of designed probes
Influenza A viruses (IAV) are the major cause of human infection. Based on the hemagglutinin (HA) segment, they can be categorized into two phylogenetic groups: group 1 contains H1, H2, H5, H6, H8, H9, H11, H12, H13 and H16, and group 2 contains H3, H4, H7, H10, H14 and H15 (Air, Proc Natl Acad Sci USA 78:7639-7643, 1981; Nobusawa et al., Virology 182, 475-485, 1991). Therefore, the 1918 H1N1 pandemic virus (A/Brevig Mission/1/1918(H1N1)) (group 1) and the 2013 H7N9 outbreak virus (A/Hangzhou/1/2013(H7N9)) (group 2) were used to evaluate the enrichment probes in silico. The enrichment probes were aligned against both viral genomes using the blastn program (with cutoffs of 90 for percent identity and 0.001 for e-value) (Altschul et al., J Mol Biol 215, 403-410, 1990). Table 1A shows the coverage of each segment from the two different viruses by the designed probes. The results demonstrated that the enrichment probes provided 100% coverage of all segments from both influenza strains.
Table 1A. Probe coverages of each segment of two influenza strains by blastn search
In addition, all sequences of complete HA and NA segments of influenza A viruses were downloaded from the Influenza Research Database (Zhang et al., Nucleic Acids Res 45, D466-D474, 2017). One HA sequence from each HA subtype (H1 to H16) was randomly picked from a total of 55,300 unique downloaded HA sequences and one NA sequence from each NA subtype (N1 to N11) was randomly picked from a total of 45,834 unique downloaded NA sequences. Enrichment probes were aligned to the 18 randomly picked HA sequences (one from each HA subtype) and the 11 randomly picked NA sequences (one from each NA subtype), respectively, using the blastn program. Table 1B shows the coverage of the 18 HA segments (H1 to H18) and Table 1C shows that of the 11 NA segments (N1 to N11) by the enrichment probes. Similarly, the enrichment probes covered more than 95% of each segment for all the tested HA and NA subtypes.
Table 1B. Probe coverages of HA segments by blastn search
Table 1C. Probe coverages of NA segments by blastn search
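The segment coverage values discussed above can be derived from blastn hit coordinates by merging overlapping alignments on the target. The following Python sketch shows one way to do this; the function name is hypothetical and the input is assumed to be a list of 1-based, inclusive target start and end positions (as in the sstart and send columns of tabular BLAST output).

```python
def percent_coverage(hits, target_len):
    """Percent of target positions covered by at least one alignment.

    `hits` is an iterable of (start, end) target coordinates, 1-based and
    inclusive. Minus-strand hits (start > end) are normalized first.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / target_len
```

A segment is fully covered when the merged intervals span every position from 1 to the segment length.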
Next, the enrichment probes were blasted against all human viruses downloaded from NCBI, a dataset containing 3140 representative human virus sequences, including 100 influenza sequences. In this dataset, 18,704 probes hit 122 human viral sequences (90 unique NC numbers). Among the 122 human viral sequences hit, 100 were influenza sequences, including influenza A, B and C (71 unique NC numbers), and 22 were non-influenza viral sequences (19 unique NC numbers from 19 human viruses). However, the average alignment length between query and target for these 22 non-influenza hits was only 26.83 bp, with the smallest being 21 bp and the largest being 35 bp, while the average alignment length between query and target for influenza hits was 107.0 bp. Therefore, the homology of the enrichment probes to human non-influenza viruses is not significant.
Finally, the enrichment probes were blasted against the human genome downloaded from NCBI. The results demonstrated that 25 enrichment probes hit the human genome (90 unique gi numbers). Similarly, when looking at the query/target alignment length, the average alignment length of hits was 34.63 bp with the smallest being 30 bp and largest being 43 bp. Therefore, in silico analysis indicated that the enrichment probes hybridize selectively to the tested influenza viruses.
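The alignment length statistics used in the specificity analysis can be summarized from tabular BLAST output with a few lines of Python. The sketch below assumes the default 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name and the optional identity filter are illustrative.

```python
from statistics import mean


def alignment_length_summary(tabular_lines, min_identity=0.0):
    """Summarize alignment lengths from tab-separated BLAST hits.

    Column 3 (pident) and column 4 (length) of the default tabular layout
    are used. Returns None when no hit passes the identity filter.
    """
    lengths = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        pident, length = float(fields[2]), int(fields[3])
        if pident >= min_identity:
            lengths.append(length)
    if not lengths:
        return None
    return {"n": len(lengths), "mean": mean(lengths),
            "min": min(lengths), "max": max(lengths)}
```

Running this separately on influenza hits and on non-influenza or human genome hits would give the kind of mean, minimum and maximum alignment lengths compared above.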
Enrichment effects on two different influenza viruses spiked-in human DNA samples by real-time PCR
In order to test the effectiveness of the enrichment probes, 2 sets of 8 plasmids, each set containing the 8 segments from one of two different influenza viruses, H1N1 (A/California/04/2009(H1N1)) and influenza B, were spiked into human DNA and Illumina sequencing libraries were made. Real-time PCR reactions were performed to assess the amount of each of the 8 influenza segments in the libraries before and after the influenza-specific enrichment steps. The results are shown in Table 2A. Using the human HER2 gene as a reference, the enrichment effect was significant. For the H1N1 spike-in library, the lowest enrichment effect, from the PB2 segment, was more than 10 CT values (the difference in CT between unenriched and enriched libraries). For the influenza B spike-in library, the lowest enrichment effect, from the NS segment, was more than 8 CT values. Therefore, based on the real-time PCR results for the human reference gene and the 8 mixed segments, the enrichment probes demonstrated a substantial enrichment effect on two different influenza viruses.
Furthermore, 16 HA plasmids (H1 to H16) (Qi et al., MBio 5:e02116, 2014) were spiked into human DNA and the enrichment effect of the probe set was tested in the same way. The results are shown in Table 2B and demonstrate significant enrichment effects: the average CT value decreased by 9.8 after enrichment for the H1 to H16 plasmid spike-ins, while the CT value of the human HER2 gene increased by 9.9 after enrichment.
Table 2A. Real-time PCR results of enrichment of spike-in libraries
Table 2B. Real-time PCR results of enrichment of spike-in libraries of H1 to H16 plasmids
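For orientation, a CT difference can be converted into an approximate fold change. The Python sketch below assumes ideal amplification (a doubling of product per cycle, efficiency = 2.0), which is a simplification; real primer efficiencies are typically somewhat lower, so the numbers are rough estimates only. The function names are hypothetical.

```python
def fold_change_from_ct(ct_unenriched, ct_enriched, efficiency=2.0):
    """Approximate fold increase in template implied by a drop in CT.

    Assumes a constant per-cycle amplification factor (2.0 = perfect doubling).
    """
    return efficiency ** (ct_unenriched - ct_enriched)


def relative_enrichment(delta_ct_target, delta_ct_reference, efficiency=2.0):
    """Enrichment of a target relative to a reference gene.

    Each delta is CT(unenriched) - CT(enriched): positive when the CT
    decreases after enrichment, negative when it increases.
    """
    return efficiency ** (delta_ct_target - delta_ct_reference)
```

Under this idealized model, a decrease of 10 CT values corresponds to roughly a 1000-fold increase, and a target decrease of 9.8 combined with a reference increase of 9.9 (deltas of 9.8 and -9.9) corresponds to a relative enrichment on the order of 10^5 to 10^6.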
Enrichment effects on cultured influenza stock by Illumina sequencing
The enrichment effects were further evaluated by Illumina sequencing of cultured influenza stock virus. Four sequencing libraries, two un-enriched and two enriched, were made from cultured influenza A stocks (H2N1) and then sequenced on an Illumina NextSeq. The results show that the number of reads mapped to the stock influenza virus was significantly increased (about 10 times) by the universal influenza virus enrichment probes (Table 3). For library 1, the mapped influenza read number was only 2.9% of the total reads before enrichment, while after enrichment, the mapped read number had increased to 81.0% of the total reads. Similarly, for library 2, the mapped influenza read number was only 8.3% of the total reads before enrichment, while after enrichment, the mapped read number was 85.4% of the total reads.
Table 3. Illumina sequencing results of enrichment of cultured virus libraries
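The enrichment seen by sequencing can be expressed as the ratio of the on-target read percentage after enrichment to that before enrichment. The Python sketch below is a minimal illustration with hypothetical function names.

```python
def percent_mapped(mapped_reads, total_reads):
    """Percentage of reads in a library that map to the target virus."""
    return 100.0 * mapped_reads / total_reads


def fold_enrichment(pct_before, pct_after):
    """Ratio of on-target read percentage after versus before enrichment."""
    if pct_before <= 0:
        raise ValueError("pct_before must be positive")
    return pct_after / pct_before
```

Applied to the percentages quoted above, `fold_enrichment(2.9, 81.0)` gives a ratio of about 28 for library 1 and `fold_enrichment(8.3, 85.4)` gives a ratio of about 10 for library 2.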
Detection of avian IAV in wild bird primary cloacal swab surveillance samples by viral isolation
The three cloacal swab samples used in this study were collected from mallard ducks (Anas platyrhynchos) during routine IAV surveillance conducted in Ohio during 2013. The presence of IAV was determined by inoculation techniques in embryonated specific pathogen-free (SPF) chicken eggs and the HA and NA subtype of each viral isolate was determined by hemagglutinin and neuraminidase inhibition techniques at The National Veterinary Service Laboratory (NVSL), as previously described (Fries et al., J Virol 89, 5371-5381, 2015). The subtype of sample 1 (A/mallard/Ohio/13OS1979/) could not be determined using traditional methods. Sample 2 (A/mallard/Ohio/13OS1980) was subtyped as H2N8, and sample 3 (A/mallard/Ohio/13OS1351) was subtyped as H1N8.
Enrichment effects on wild bird cloacal swab samples by Illumina sequencing
From duck cloacal swab sample 1, an Illumina sequencing library was made without enrichment and 207,271,317 reads were obtained. However, after these reads were mapped to all of the HA and NA sequences downloaded from the Influenza Research Database, only 3 reads mapped to NA sequences (N8). An influenza-enriched sequencing library was then made from the same sample using the universal influenza probes. It was sequenced on an Illumina NextSeq and a total of 329,619,786 reads were obtained. Among them, 575,532 reads mapped to HA sequences and 409,989 reads mapped to NA sequences. Among the HA hits, there were 118,317 unique reads aligning to H10 and 457,215 unique reads aligning to H1. In addition, no read aligned to both H10 and H1. Similarly, among the NA hits, there were 65,934 unique reads aligning to N5 and 344,055 unique reads aligning to N8, and no read aligned to both N5 and N8. Therefore, enrichment with the designed universal influenza enrichment probes not only allowed detection of influenza sequences, but also demonstrated that a mixture of 2 HAs and 2 NAs was present in this sample.
Based on the successful enrichment of Sample 1, enrichment libraries were made from Sample 2 and Sample 3 and sequenced on the Illumina machine. From Sample 2, 438,249,661 reads were obtained. Among them, 9,673 reads mapped to HAs, with 235 reads on H1 and 9,438 reads on H2, and 16,485,335 reads mapped to NAs, with 1,342 on N1 and 16,483,993 on N8, which matched the detection of H2N8 by the traditional method. From Sample 3, 633,893,806 reads were obtained. Among them, 34,418,497 reads mapped to HAs, with 24,076,524 reads on H1 and 10,341,973 reads on H8, and 21,574,593 reads mapped to NAs, with 8,491,101 on N4 and 13,083,492 on N8. Similarly, for Sample 3, the H1N8 infection identified by the traditional method was confirmed, and sequences from an additional influenza virus, possibly H8N4, were detected. The enrichment effect on mallard samples is shown in Table 4.
Table 4. Illumina sequencing results of enrichment of mallard swab samples
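The per-subtype read counts reported above lend themselves to a simple tally that ranks subtypes and flags samples in which more than one subtype is supported. The Python sketch below is illustrative only: the function names are hypothetical, and the 1% minimum read fraction is an arbitrary threshold chosen for the example, not a criterion used in this disclosure.

```python
def summarize_subtypes(read_counts):
    """Rank subtypes by read count and report each subtype's read fraction.

    `read_counts` maps a subtype label (e.g. "H1") to its mapped read count.
    """
    total = sum(read_counts.values())
    if total == 0:
        return []
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]


def subtypes_above(read_counts, min_fraction=0.01):
    """Subtypes whose read fraction meets a (hypothetical) minimum threshold."""
    return [name for name, _, frac in summarize_subtypes(read_counts)
            if frac >= min_fraction]
```

For example, the Sample 1 HA counts (`{"H1": 457215, "H10": 118317}`) yield two subtypes above the threshold, with H1 ranked first, consistent with the mixed infection described in the text.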
Clinical Applications
Disclosed herein is the generation of a unique dataset from all available influenza virus sequences, and based on this dataset, a set of universal influenza enrichment probes was designed. Their homology was analyzed in silico against a data set of influenza virus sequences, a data set of all human viruses, and the human genome, and preferential homology to influenza viruses was demonstrated. Subsequently, experiments were performed to test their enrichment effects on: 1) libraries made by spiking in plasmid DNAs encoding influenza genes, assessed by real-time PCR; 2) cultured influenza virus stocks; and 3) wild bird primary cloacal swab surveillance samples. For all of these materials, a significant enrichment of influenza sequences was demonstrated. Mixed infections with different avian IAV subtypes were detected in the mallard samples; such mixed infections may not be detected using traditional subtyping methods (Dugan et al., PLoS Pathog 4:e1000076, 2008; Wang et al., Virology 375, 182-189, 2008). With the cost of deep sequencing technology decreasing and sequence output increasing, more and more influenza researchers can apply this advanced technology to their research, surveillance, and diagnostic efforts because it not only provides an opportunity to recover the entire influenza genome (Ren et al., Emerg Infect Dis 19, 1881-1884, 2013), but also allows investigation of the quasispecies variants in the population (Doud et al., PLoS Pathog 13, e1006271, 2017). Most traditional molecular-based approaches have utilized virus-specific primers to PCR amplify influenza genomes or genome segments. One recent study used sequence-independent PCR amplification on RNA isolated from purified viral particles, but this method required filtration and ultracentrifugation (Ren et al., Emerg Infect Dis 19, 1881-1884, 2013). Enrichment strategies using universal influenza probes not only avoid influenza-specific amplification but also allow enrichment from samples in which the RNA is degraded. Examples of such samples include RNA (maximum length about 100 nucleotides) isolated in a previous study that sequenced IAV from a formalin-fixed, paraffin-embedded (FFPE) autopsy lung tissue sample from the 1918 influenza pandemic (Xiao et al., J Pathol 229, 535-545, 2013), RNA from fixed clinical nasal swabs (Krafft et al., J Clin Microbiol 43, 1768-1775, 2005), and RNA from bird cloacal swabs (Wang et al., Virology 375, 182-189, 2008). The RNA isolated from FFPE tissue samples or fixed swab samples can be highly degraded, making reverse transcription using conserved non-coding region primers and PCR amplification using full gene segment primers difficult or impossible. In addition, prior knowledge of the infecting influenza virus type or IAV subtype(s) is unnecessary when using the influenza universal enrichment probes described herein. Even RNA from emerging influenza strains will likely be enriched because the enrichment process is hybridization-based and the probes are 120 nt in length. It has been shown in a study of a related method that sequences up to 40% different from the known virus genomes used for designing a probe library can be captured (Briese et al., MBio 6, e01491-01415, 2015), and the probe hybridization temperature and the wash conditions can be adjusted to change the stringency of hybridization.
From three primary cloacal swab samples from wild mallards, using the disclosed enrichment probes, not only were influenza sequences recovered from the deep sequencing libraries, but also evidence of mixed infection was identified, reflected by two HA subtype sequences and two NA subtype sequences. However, when using the traditional methods, sample 1 could not be subtyped, likely reflecting the mixed infection seen by sequence analysis. For samples 2 and 3, only one subtype was identified from the egg-cultured sample. It has been reported that sequencing viral samples without culturing increases the detection of mixed infections and enhances the identification of viral strains that might be outgrown during adaptation to egg culture (Lindsay et al., Viruses 5, 1964-1977, 2013). In the sequencing data disclosed herein, the subtypes identified by traditional methods are the ones that have the largest number of reads. Therefore, the reason for detecting only one viral subtype by culture could be that, during the culturing process, the major subtype outgrows the minor one.
Aquatic birds are thought to be the reservoir of influenza virus (Webster et al., Microbiol Rev 56, 152-179, 1992), from which the virus occasionally spills over to other species, including intermediate hosts such as dogs and horses (Parrish et al., J Virol 89, 2990-2994, 2015). Mixed infection with different subtypes of influenza virus and reassortment have been found in wild birds (Dugan et al., PLoS Pathog 4, e1000076, 2008; Dugan et al., Virology 417, 98-105, 2011; Lindsay et al., Viruses 5, 1964-1977, 2013; Wang et al., Virology 375, 182-189, 2008). Based on the sequencing results, evidence of mixed infection was noted in the enriched cloacal swab sample libraries. The coverage of the HA or NA gene segments varied from 15.9% of H10 and 12.6% of N5 from sample 2 to 90.5% of H8 and 85.7% of N4 in sample 3.
Example 3: Coverage of influenza virus sequences in the GISAID influenza database
In order to check the comprehensiveness of the enrichment probes, coverage of the probe set was tested against the Global Initiative on Sharing All Influenza Data (GISAID) influenza database, which is a larger influenza-specific data set, particularly for avian subtypes. All avian IAV GISAID-only isolates (uploaded only to GISAID, not imported from GenBank) were downloaded, totaling 4,289 isolates and 23,147 sequences. Among these 23,147 sequences, there were 2,974 sequences containing ambiguous bases other than A, T, G and C (non-clean sequences) and 20,173 sequences containing only A, T, G and C bases (clean sequences). The influenza virus probe set (46,953 120-nt probe sequences) was blasted against the non-clean sequences (2,974). The average length coverage of these 2,974 sequences by the probes was 99.19%, and the average coverage for each base was 48.61 (each base position had an average of 48.61 probes covering it). The average aligned probe length was 109.94 bases. The 20,173 clean sequences were collapsed into 14,925 unique sequences and the same analysis was performed. The average length coverage of these sequences by the probes was 99.88%, and the average coverage of each base was 47.95 (each base position had an average of 47.95 probes covering it). The average aligned probe length was 112.58 bases. Therefore, the disclosed probe set exhibits good coverage of the influenza sequences in the GISAID database.
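The two coverage measures reported in this example, length coverage and average coverage per base, can be computed together from probe alignment coordinates on a target sequence. The Python sketch below is one way to do so, using a difference array; the function name is hypothetical and the input is assumed to be 1-based, inclusive target coordinates of each aligned probe.

```python
def coverage_stats(hits, target_len):
    """Return (percent of positions covered, mean number of probes per base).

    `hits` is an iterable of (start, end) target coordinates, 1-based and
    inclusive; coordinates are normalized and clipped to the target.
    """
    diff = [0] * (target_len + 2)  # difference array over positions 1..target_len
    for s, e in hits:
        lo, hi = max(min(s, e), 1), min(max(s, e), target_len)
        if lo > hi:
            continue  # hit lies entirely outside the target
        diff[lo] += 1
        diff[hi + 1] -= 1
    depth = covered = total_depth = 0
    for pos in range(1, target_len + 1):
        depth += diff[pos]       # running depth at this position
        total_depth += depth
        if depth > 0:
            covered += 1
    return 100.0 * covered / target_len, total_depth / target_len
```

Averaging the two returned values over all target sequences would give summary figures of the kind reported above for the GISAID sequences.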
In view of the many possible embodiments to which the principles of the disclosure may be applied, it should be recognized that the illustrated embodiments are only examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.