In addition, all sequences of complete HA and NA segments of influenza A viruses were downloaded from the Influenza Research Database (Zhang et al, Nucleic Acids Res 45, D466-D474, 2017). One HA sequence from each HA subtype (H1 to H16) was randomly picked from a total of 55,300 unique downloaded HA sequences and one NA sequence from each NA subtype (N1 to N11) was randomly picked from a total of 45,834 unique downloaded NA sequences. Enrichment probes were aligned to the 18 randomly picked HA sequences and the 11 randomly picked NA sequences, respectively, using the blastn program. Table 1B shows the coverage of the 18 HA segments (H1 to H18) and Table 1C shows that of the 11 NA segments (N1 to N11) by the enrichment probes. Similarly, the enrichment probes display more than 95% segment coverage for all the tested HA and NA subtypes.

Table 1B. Probe coverages of HA segments by blastn search

Table 1C. Probe coverages of NA segments by blastn search

Next, the enrichment probes were blasted against all human viruses downloaded from NCBI, a dataset that contains 3140 representative human virus sequences including 100 influenza sequences. From the downloaded human virus dataset, there were 18,704 probes that hit 122 human viral sequences (90 unique NC numbers). Among the 122 human viral sequences hit, 100 were influenza, including influenza A, B and C (71 unique NC numbers), and 22 were non-influenza viral sequences (19 unique NC numbers from 19 human viruses). However, the average alignment length between query and target for these 22 non-influenza hits was only 26.83 bp, with the smallest being 21 bp and the largest being 35 bp, while the average alignment length between query and target for the influenza hits was 107.0 bp. Therefore, the homology of the enrichment probes to human non-influenza viruses is not significant.

Finally, the enrichment probes were blasted against the human genome downloaded from NCBI. The results demonstrated that 25 enrichment probes hit the human genome (90 unique gi numbers). Similarly, when looking at the query/target alignment length, the average alignment length of hits was 34.63 bp with the smallest being 30 bp and largest being 43 bp. Therefore, in silico analysis indicated that the enrichment probes hybridize selectively to the tested influenza viruses.
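The comparison of alignment lengths described above can be illustrated with a short script. The following is a minimal sketch, assuming the blastn results were written in the default 12-column tabular format (-outfmt 6), in which the fourth column is the alignment length. The probe identifiers, subject identifiers, and values in the example rows are hypothetical and are not data from this disclosure; the function name and the way influenza subjects are recognized are likewise illustrative assumptions.

```python
from statistics import mean

# Hypothetical blastn tabular rows (-outfmt 6 default columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# All identifiers and values below are made up for illustration.
EXAMPLE_ROWS = """\
probe_00001\tinfluenza_A_seg4\t98.3\t120\t2\t0\t1\t120\t15\t134\t1e-55\t210
probe_00002\tinfluenza_B_seg1\t95.0\t100\t5\t0\t1\t100\t40\t139\t1e-40\t160
probe_00003\tother_virus_X\t100.0\t21\t0\t0\t50\t70\t900\t920\t0.5\t40
probe_00004\tother_virus_Y\t96.0\t35\t1\t0\t10\t44\t300\t334\t0.01\t55
"""


def summarize_alignment_lengths(rows, is_influenza):
    """Return {group: (mean, min, max)} of alignment lengths per hit group."""
    lengths = {"influenza": [], "non-influenza": []}
    for line in rows.strip().splitlines():
        fields = line.split("\t")
        subject_id, aln_len = fields[1], int(fields[3])
        group = "influenza" if is_influenza(subject_id) else "non-influenza"
        lengths[group].append(aln_len)
    # Skip groups with no hits so that mean() is never called on an empty list.
    return {g: (mean(v), min(v), max(v)) for g, v in lengths.items() if v}


summary = summarize_alignment_lengths(
    EXAMPLE_ROWS, is_influenza=lambda sid: sid.startswith("influenza")
)
for group, (avg, shortest, longest) in summary.items():
    print(f"{group}: mean={avg:.2f} bp, min={shortest} bp, max={longest} bp")
```

In practice the subject identifiers would be accession numbers, so the is_influenza test would need a lookup from accession to virus name rather than a string prefix.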

Enrichment effects on two different influenza viruses spiked-in human DNA samples by real-time PCR

In order to test the effectiveness of the enrichment probes, 2 sets of 8 plasmids, each set containing the 8 segments from one of two different influenza viruses, H1N1 (A/California/04/2009(H1N1)) and influenza B, were spiked into human DNA and Illumina sequencing libraries were made. Real-time PCR reactions were performed to assess the amount of each of the 8 influenza segments in the libraries before and after the influenza-specific enrichment steps. The results are shown in Table 2A. Using the human HER2 gene as a reference, the enrichment effect was significant. For the H1N1 spike-in library, the lowest enrichment effect was more than 10 CT values (the difference in CT between unenriched and enriched libraries) from the PB2 segment. For the influenza B spike-in library, the lowest enrichment effect was more than 8 CT values from the NS segment. Therefore, based on the mixed human reference gene and the real-time PCR results on the 8 mixed segments, the enrichment probes demonstrated a substantial enrichment effect on two different influenza viruses.

Furthermore, 16 HA plasmids (H1 to H16) (Qi et al., MBio 5:e02116, 2014) were spiked into human DNA and the enrichment effect of the probe set was tested in the same way. The results are shown in Table 2B and demonstrate significant enrichment effects: the average CT value decreased by 9.8 after enrichment for the H1 to H16 plasmid spike-ins, while the CT value of the human HER2 gene increased by 9.9 after enrichment.

Table 2A. Real-time PCR results of enrichment of spike-in libraries

Table 2B. Real-time PCR results of enrichment of spike-in libraries of H1 to H16 plasmids
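The CT shifts reported above can be translated into approximate fold changes. The following is a minimal sketch, assuming the common idealization that each PCR cycle doubles the amount of product (100% amplification efficiency); that assumption, the function name, and the efficiency parameter are illustrative and are not stated in this disclosure.

```python
def fold_change_from_delta_ct(delta_ct, efficiency=1.0):
    """Approximate fold change corresponding to a CT shift.

    Assumes each cycle multiplies the product by (1 + efficiency);
    efficiency=1.0 means perfect doubling, which is an idealization.
    """
    return (1.0 + efficiency) ** delta_ct


# CT shifts reported in the text (unenriched CT minus enriched CT).
for label, delta_ct in [
    ("H1N1 PB2 segment (lowest shift, more than 10 CT)", 10.0),
    ("Influenza B NS segment (lowest shift, more than 8 CT)", 8.0),
    ("H1 to H16 plasmids (average shift)", 9.8),
]:
    print(f"{label}: about {fold_change_from_delta_ct(delta_ct):.0f}-fold")
```

Under this idealized doubling assumption, a shift of 10 cycles corresponds to roughly a 1,000-fold change and a shift of 8 cycles to roughly a 250-fold change. Real amplification efficiencies are usually below 100%, so these figures should be read as upper-end approximations.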

Enrichment effects on cultured influenza stock by Illumina sequencing

The enrichment effects were further evaluated by Illumina sequencing on cultured influenza stock virus. Four sequencing libraries were made, two un-enriched and two enriched libraries from cultured influenza A stocks (H2N1), and then sequenced on Illumina NextSeq. The results show that the numbers of reads mapped to the stock influenza virus were significantly increased (about 10 times) by the universal influenza virus enrichment probes (Table 3). For library 1, the mapped influenza read number was only 2.9% of the total reads before enrichment, while after enrichment, the mapped read number had increased to 81.0% of the total reads. Similarly, for library 2, the mapped influenza read number was only 8.3% of the total reads before enrichment, while after enrichment, the mapped read number was 85.4% of the total reads.

Table 3. Illumina sequencing results of enrichment of cultured virus libraries

Detection of avian IAV in wild bird primary cloacal swab surveillance samples by viral isolation

The three cloacal swab samples used in this study were collected from mallard ducks (Anas platyrhynchos) during routine IAV surveillance conducted in Ohio during 2013. The presence of IAV was determined by inoculation techniques in embryonated specific pathogen-free (SPF) chicken eggs and the HA and NA subtype of each viral isolate was determined by hemagglutinin and neuraminidase inhibition techniques at The National Veterinary Service Laboratory (NVSL), as previously described (Fries et al., J Virol 89, 5371-5381, 2015). The subtype of sample 1 (A/mallard/Ohio/13OS1979/) could not be determined using traditional methods. Sample 2 (A/mallard/Ohio/13OS1980) was subtyped as H2N8, and sample 3 (A/mallard/Ohio/13OS1351) was subtyped as H1N8.

Enrichment effects on wild bird cloacal swab samples by Illumina sequencing

From duck rectal swab sample 1, an Illumina sequencing library was made without enrichment and 207271317 reads were obtained. However, after these reads were mapped to all of the HA and NA sequences downloaded from the Influenza Research Database, there were only 3 reads that mapped to NA sequences (N8). An influenza-enriched sequencing library was then made from the same sample using the universal influenza probes. It was sequenced on Illumina NextSeq and a total of 329619786 reads were obtained. Among them, 575532 reads were mapped to HA sequences and 409989 reads to NA sequences. Among the HA hits, there were 118317 unique reads aligning to H10 and 457215 unique reads aligning to H1. In addition, no read aligned to both H10 and H1. Similarly, for the NA hits, there were 65934 unique reads aligning to N5 and 344055 unique reads aligning to N8. Also, no unique hits from N5 and N8 overlapped. Therefore, enrichment with the designed universal influenza enrichment probes not only allowed for detection of influenza sequences, but also demonstrated that a mixture of 2 HAs and 2 NAs was present in this sample.

Based on the successful enrichment of Sample 1, enrichment libraries were made from Sample 2 and Sample 3 and sequenced on the Illumina machine. From Sample 2, 438249661 reads were obtained. Among them, 9673 reads were mapped to HAs, with 235 reads on H1 and 9438 reads on H2; and 16485335 reads were mapped to NAs, with 1342 on N1 and 16483993 on N8, which matched the detection of H2N8 by the traditional method. From Sample 3, 633893806 reads were obtained. Among them, 34418497 reads were mapped to HAs, with 24076524 reads on H1 and 10341973 reads on H8; and 21574593 reads were mapped to NAs, with 8491101 on N4 and 13083492 on N8. Similarly, for Sample 3, the H1N8 result from the traditional method was confirmed, and another influenza virus, possibly an H8N4 infection, was also detected. The enrichment effect on mallard samples is shown in Table 4.

Table 4. Illumina sequencing results of enrichment of mallard swab samples
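The per-subtype read counts reported above for Sample 3 can be expressed as shares of the mapped reads, which makes the major and minor subtypes easy to compare. The following is a minimal sketch; the function name is illustrative, and the counts are the Sample 3 values given in the text.

```python
def subtype_fractions(read_counts):
    """Return each subtype's share of the mapped reads, largest first."""
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(subtype, count / total) for subtype, count in ranked]


# Mapped read counts for Sample 3 as reported in the text.
sample3_ha = {"H1": 24076524, "H8": 10341973}
sample3_na = {"N4": 8491101, "N8": 13083492}

for segment, counts in [("HA", sample3_ha), ("NA", sample3_na)]:
    shares = ", ".join(f"{s}: {f:.1%}" for s, f in subtype_fractions(counts))
    print(f"Sample 3 {segment} -> {shares}")
```

For Sample 3, the subtypes with the most reads are H1 and N8, consistent with the H1N8 result obtained by the traditional method, while H8 and N4 account for the remaining mapped reads.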

Clinical Applications

Disclosed herein is the generation of a unique dataset for all available influenza virus sequences, and based on this dataset, a set of universal influenza enrichment probes was designed. Their homology was analyzed in silico on a data set of influenza virus sequences, a data set of all human viruses, and the human genome, and homology specific to influenza viruses was demonstrated. Subsequently, experiments were performed to test their enrichment effects on: 1) libraries made by spiking influenza gene-encoding plasmid DNAs, assessed by real-time PCR; 2) cultured influenza virus stocks; and 3) wild bird primary cloacal swab surveillance samples. From all of these materials, a significant enrichment of influenza sequences was demonstrated. Mixed infections with different avian IAV subtypes were detected in the mallard samples, which may not be detected using traditional subtyping methods (Dugan et al., PLoS Pathog 4:e1000076, 2008; Wang et al., Virology 375, 182-189, 2008). With the cost of deep sequencing technology decreasing and sequence output increasing, more and more influenza researchers can apply this advanced technology to their research, surveillance, and diagnostic efforts because it not only provides an opportunity to recover the entire influenza genome (Ren et al, Emerg Infect Dis 19, 1881-1884, 2013), but also allows investigation of the quasispecies variants in the population (Doud et al, PLoS Pathog 13, e1006271, 2017). Most traditional molecular-based approaches have utilized viral specific primers to PCR amplify influenza genomes or genome segments. One recent study used sequence-independent PCR amplification on RNA isolated from purified viral particles, but this method required filtration and ultracentrifugation (Ren et al, Emerg Infect Dis 19, 1881-1884, 2013). Enrichment strategies using universal influenza probes not only avoid influenza-specific amplification but also allow enrichment from samples in which the RNA is degraded.
Examples include RNA (maximum length about 100 nucleotides) isolated in a previous study that sequenced IAV from a formalin-fixed, paraffin-embedded (FFPE) autopsy lung tissue sample from the 1918 influenza pandemic (Xiao et al, J Pathol 229, 535-54, 2013), RNA from fixed clinical nasal swabs (Krafft et al, J Clin Microbiol 43, 1768-1775, 2005), and RNA from bird cloacal swabs (Wang et al, Virology 375, 182-189, 2008). The RNA isolated from FFPE tissue samples or fixed swab samples can be highly degraded, making reverse transcription using conserved non-coding region primers and PCR amplification using full gene segment primers difficult or impossible. In addition, prior knowledge of the infecting influenza virus type or IAV subtype(s) is unnecessary when using the influenza universal enrichment probes described herein. Even RNA from emerging influenza strains will likely be enriched because the enrichment process is hybridization-based, and the probes are 120 nt in length. It has been shown in a study of a related method that sequences up to 40% different from the known virus genomes used for designing a probe library can be captured (Briese et al, MBio 6, e01491-01415, 2015), and the probe hybridization temperature and the wash conditions can be adjusted to compensate for stringency of hybridization.

From three primary cloacal swab samples from wild mallards, using the disclosed enrichment probes, not only were influenza sequences recovered from the deep sequencing libraries, but also evidence of mixed infection was identified, reflected by two HA subtype sequences and two NA subtype sequences. However, when using the traditional methods, sample 1 could not be subtyped, likely reflecting the mixed infection seen by sequence analysis. For samples 2 and 3, only one subtype was identified from the egg-cultured sample. It has been reported that sequencing viral samples without culturing increases the detection of mixed infections and enhances the identification of viral strains that might be outgrown during adaptation to egg culture (Lindsay et al., Viruses 5, 1964-1977, 2013). In the sequencing data disclosed herein, the subtypes identified by traditional methods are the ones that have the largest number of reads. Therefore, the reason for detecting only one viral subtype by culture could be that, during the culturing process, the major subtype outgrows the minor one.

Aquatic birds are thought to be the reservoir of influenza virus (Webster et al., Microbiol Rev 56, 152-179, 1992), which occasionally spills over to other species including intermediate hosts, like dogs and horses (Parrish et al., J Virol 89, 2990-2994, 2015). Mixed infection with different subtypes of influenza viruses and reassortment have been found in wild birds (Dugan et al., PLoS Pathog 4, e1000076, 2008; Dugan et al, Virology 417, 98-105, 2011; Lindsay et al, Viruses 5, 1964-1977, 2013; Wang et al, Virology 375, 182-189, 2008). Based on the sequencing results, evidence of mixed infection was noted in the enriched cloacal swab sample libraries. The coverage of the HA or NA gene segments varied from 15.9% of H10 and 12.6% of N5 from sample 2 to 90.5% of H8 and 85.7% of N4 in sample 3.

Example 3: Coverage of influenza virus sequences in the GISAID influenza database

In order to check the comprehensiveness of the enrichment probes, coverage of the probe set was tested against the Global Initiative on Sharing All Influenza Data (GISAID) influenza database, which is a larger influenza-specific data set, particularly for avian subtypes. All avian IAV GISAID-only isolates (only uploaded in GISAID, not imported from GenBank) were downloaded, totaling 4,289 isolates, and 23,147 sequences on. Among these 23,147 sequences, there were 2,974 sequences containing ambiguous non-A, T, G, C bases (non-clean sequences) and 20,173 sequences containing only A, T, G, C bases (clean sequences). The influenza virus probe set (46,953 probe sequences of 120 nt) was blasted against the 2,974 non-clean sequences. The average length coverage of these 2,974 sequences by the probes was 99.19%, and the average coverage for each base was 48.61 (each base position had an average of 48.61 probes covering it). The average aligned probe length was 109.94 bases. The total of 20,173 clean sequences was collapsed into 14,925 unique sequences and the same analysis was performed. The average length coverage of these sequences by the probes was 99.88%, and the average coverage of each base was 47.95 (each base position had an average of 47.95 probes covering it). The average aligned probe length was 112.58 bases. Therefore, the disclosed probe set exhibits good coverage of the influenza sequences in the GISAID database.
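The two coverage measures reported above (length coverage and average coverage per base) can be illustrated for a single target sequence. The following is a minimal sketch under stated assumptions: length coverage is taken as the fraction of target positions covered by at least one probe alignment, and per-base coverage as the mean number of alignments overlapping each position. The disclosure does not specify the exact computation, and the function name and example intervals are hypothetical.

```python
def coverage_stats(target_length, intervals):
    """Compute length coverage and mean per-base depth for one target.

    intervals: 1-based inclusive (start, end) positions of probe alignments
    on the target. Reverse-strand hits, where start > end, are reordered.
    """
    depth = [0] * target_length
    for start, end in intervals:
        lo, hi = min(start, end), max(start, end)
        for pos in range(lo - 1, hi):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / target_length, sum(depth) / target_length


# Hypothetical 20-base target hit by three probe alignments.
length_cov, mean_depth = coverage_stats(20, [(1, 10), (6, 15), (15, 8)])
print(f"length coverage = {length_cov:.0%}, mean depth = {mean_depth:.2f}")
```

In this toy example, positions 1 to 15 are covered, giving 75% length coverage, and the three alignments contribute 28 covered bases in total, giving a mean depth of 1.4. Averaging these per-target values over all targets would give dataset-level figures of the kind reported above.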

In view of the many possible embodiments to which the principles of the disclosure may be applied, it should be recognized that the illustrated embodiments are only examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.

Claims

1. An influenza virus probe set, comprising:

deoxyribonucleic acid (DNA) probes having at least 80% identity with the nucleotide sequences of SEQ ID NOs: 1-46953, or a subset thereof comprising at least 95% of the probes of SEQ ID NOs: 1-46953; or

ribonucleic acid (RNA) probes having at least 80% identity with the nucleotide sequences of SEQ ID NOs: 1-46953, wherein uracil (U) is substituted for thymidine (T), or a subset thereof comprising at least 95% of the probes of SEQ ID NOs: 1-46953.

2. The influenza virus probe set of claim 1, wherein:

the subset of DNA probes comprises at least 98% of the probes of SEQ ID NOs: 1-46953; or

the subset of RNA probes comprises at least 98% of the probes of SEQ ID NOs: 1-46953.

3. The influenza virus probe set of claim 1 or claim 2, wherein the probe set comprises or consists of:

46,953 unique DNA probes having at least 80% identity with the nucleotide sequences of each of SEQ ID NOs: 1-46953; or

46,953 unique RNA probes having at least 80% identity with the nucleotide sequences of each of SEQ ID NOs: 1-46953, wherein uracil (U) is substituted for thymidine (T).

4. The influenza virus probe set of any one of claims 1-3, wherein the probe set or subset thereof comprises:

DNA probes having at least 90% identity with the nucleotide sequences of SEQ ID NOs: 1-46953; or

RNA probes having at least 90% identity with the nucleotide sequences of SEQ ID NOs: 1-46953, wherein uracil (U) is substituted for thymidine (T).

5. The influenza virus probe set of any one of claims 1-4, wherein the probe set, or subset thereof, comprises:

DNA probes comprising the nucleotide sequences of SEQ ID NOs: 1-46953; or

RNA probes comprising the nucleotide sequences of SEQ ID NOs: 1-46953, wherein uracil (U) is substituted for thymidine (T).

6. The influenza virus probe set of any one of claims 1-5, wherein the probes are at least 120 nucleotides in length.

7. The influenza virus probe set of any one of claims 1-6, wherein the probes are labelled.

8. The influenza virus probe set of claim 7, wherein the probes are labelled at the 5' end.

9. The influenza virus probe set of claim 7, wherein the probes are labelled at the 3' end.

10. The influenza virus probe set of any one of claims 7-9, wherein the probes are labelled with biotin or a radioactive isotope.

11. A kit comprising the influenza virus probe set of any one of claims 1-10.

12. The kit of claim 11, wherein the probes are labelled with biotin and the kit further comprises streptavidin-labelled magnetic beads.

13. A method of enriching influenza virus nucleic acid in a sample comprising nucleic acid molecules, comprising:

contacting the sample with the probe set of any one of claims 1-10 under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set; and

isolating the probes from the sample, thereby enriching influenza virus nucleic acid in the sample.

14. The method of claim 13, further comprising shearing the nucleic acid molecules in the sample prior to contacting the sample with the probe set.

15. The method of claim 14, further comprising preparing a library from the sheared nucleic acid molecules and contacting the library with the probe set.

16. The method of any one of claims 13-15, wherein the probes are labelled with biotin and isolating the probes from the sample comprises contacting the sample with streptavidin-labelled magnetic beads.

17. The method of any one of claims 13-16, wherein the sample is a biological sample.

18. The method of claim 17, wherein the biological sample comprises a blood, saliva, mucus, nasal wash or swab sample.

19. The method of claim 17, wherein the biological sample comprises a tissue sample.

20. The method of claim 19, wherein the tissue sample is a paraffin-embedded tissue sample.

21. A method of detecting influenza virus nucleic acid in a sample comprising nucleic acid molecules, comprising:

contacting the sample with the probe set of any one of claims 1-10 under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set;

isolating the probes from the sample; and

detecting the presence of influenza virus nucleic acid hybridized to the isolated probes, thereby detecting influenza virus nucleic acid in the sample.

22. The method of claim 21, further comprising shearing the nucleic acid molecules in the sample prior to contacting the sample with the probe set.

23. The method of claim 22, further comprising preparing a library from the sheared nucleic acid and contacting the library with the probe set.

24. The method of any one of claims 21-23, further comprising:

amplifying the influenza virus nucleic acid hybridized to the isolated probes; and

detecting the amplified influenza virus nucleic acid.

25. The method of claim 24, wherein amplifying the influenza virus nucleic acid comprises PCR amplification.

26. The method of claim 24 or claim 25, wherein detecting the amplified influenza virus nucleic acid comprises sequencing the amplified influenza virus nucleic acid.

27. The method of any one of claims 21-26, wherein the probes are labelled with biotin and isolating the probes from the sample comprises contacting the sample with streptavidin-labelled magnetic beads.

28. The method of any one of claims 21-27, wherein the sample is a biological sample.

29. The method of claim 28, wherein the biological sample comprises a blood, saliva, mucus, nasal wash or swab sample.

30. The method of claim 28, wherein the biological sample comprises a tissue sample.

31. The method of claim 30, wherein the tissue sample is a paraffin-embedded tissue sample.

32. A method of diagnosing a subject as having an influenza virus infection, comprising:

contacting a sample comprising nucleic acid molecules obtained from the subject with the probe set of any one of claims 1-10 under conditions sufficient to allow hybridization of any influenza virus nucleic acid present in the sample to probes of the probe set;

isolating the probes from the sample; and

detecting the presence of influenza virus nucleic acid hybridized to the isolated probes, thereby diagnosing the subject as having an influenza virus infection.

33. The method of claim 32, further comprising shearing the nucleic acid molecules in the sample prior to contacting the sample with the probe set.

34. The method of claim 33, further comprising preparing a library from the sheared nucleic acid and contacting the library with the probe set.

35. The method of any one of claims 32-34, further comprising:

36. The method of claim 35, wherein amplifying the influenza virus nucleic acid comprises PCR amplification or linear amplification.

37. The method of claim 35 or claim 36, wherein detecting the amplified influenza virus nucleic acid comprises sequencing the amplified influenza virus nucleic acid.

38. The method of any one of claims 32-37, wherein the probes are labelled with biotin and isolating the probes from the sample comprises contacting the sample with streptavidin-labelled magnetic beads.

39. The method of any one of claims 32-38, wherein the sample is a biological sample.

40. The method of claim 39, wherein the biological sample comprises a blood, saliva, mucus, nasal wash or swab sample.

41. The method of claim 39, wherein the biological sample comprises a tissue sample.

42. The method of claim 41, wherein the tissue sample is a paraffin-embedded tissue sample.