WO2019108906A1 - Genomic dna methylation associated with disease prediction - Google Patents

Genomic dna methylation associated with disease prediction

Info

Publication number
WO2019108906A1
Authority
WO
WIPO (PCT)
Prior art keywords
methylation
cancer
probes
ess
individual
Prior art date
Application number
PCT/US2018/063266
Other languages
French (fr)
Inventor
Robert A. WATERLAND
Timothy VAN BAAK, Esq.
Cristian COARFA
Original Assignee
Baylor College Of Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baylor College Of Medicine
Publication of WO2019108906A1

Abstract

Embodiments of the disclosure include methods of determining a risk for cancer for an individual. In particular embodiments, the disclosure includes methods of identifying a methylation status for one or more CpG sites, such as the sites of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and/or PF4.

Description

GENOMIC DNA METHYLATION ASSOCIATED WITH DISEASE PREDICTION
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 62/592,507, filed November 30, 2017, which is incorporated by reference herein in its entirety.
[0002] This invention was made with government support under CRIS 3092-5-001-059 awarded by United States Department of Agriculture. The government has certain rights in the invention.
TECHNICAL FIELD
[0003] Embodiments of fields of the subject matter of the disclosure include at least cell biology, molecular biology, epigenetics, and medicine, including at least cancer medicine.
BACKGROUND
[0004] Epigenetic mechanisms govern cell type-specific differences in gene expression potential [1]. DNA methylation, which occurs predominantly at CpG dinucleotides in the mammalian genome, is a stable epigenetic mark critical to genomic imprinting, silencing of retrotransposons, and cell type-specific gene expression. Thirty years ago it was proposed that aberrant DNA methylation could function as an ‘epimutation’ and contribute to human disease, analogously to genetic mutations [2]. Indeed, DNA methylation is implicated in cancer [3] and a host of human diseases. Advancing our understanding of the role of DNA methylation in human disease is complicated, however, by the cellular heterogeneity of epigenetic marks, the influence of genetics on epigenetics, and the potential for reverse causality [4, 5].
[0005] The characteristics of metastable epialleles (MEs) circumvent these obstacles, offering outstanding opportunities to understand how inter-individual epigenetic variation contributes to human disease. MEs are epigenetic variants that are set stochastically in the early embryo and maintained during subsequent cellular differentiation [6]. Consequently, MEs function as epigenetic polymorphisms, i.e. stable and systemic (not cell type-specific) individual variants. Epigenetic metastability was discovered due to visible phenotypic differences among isogenic inbred mice [7]. Inter-individual variation in gene expression and phenotype is correlated with stable individual differences in DNA methylation at murine MEs [8].
73689209.1
[0006] Reasoning that monozygotic twins offer a human analog of inbred mice, the inventors explored a publicly-available genome-scale CpG methylation data set for monozygotic (MZ) and dizygotic (DZ) twins [9] based on the widely-utilized Illumina Infinium HumanMethylation450 (HM450) array. It was discovered that many candidate MEs exhibit inordinately high epigenetic similarity in MZ twin pairs - a phenomenon the inventors have termed ‘epigenetic supersimilarity’. Embodiments of the disclosure explain this phenomenon, characterize genomic and epigenomic features of epigenetically supersimilar loci and, in a large prospective epidemiologic study, show that methylation at these loci in at least peripheral blood DNA years before diagnosis is associated with risk of cancer.
[0007] The present disclosure satisfies a long-felt need in the art for prediction of disease for a subject, such as prediction based on methylation patterns in one or more particular genomic loci.
BRIEF SUMMARY
[0008] Embodiments of the disclosure include analysis for one or more differences in DNA methylation in an individual that associate the individual with a risk for developing one or more particular medical conditions, including a risk greater than the average individual of a population. In particular embodiments, the difference(s) are not cell-type specific and may be identified from DNA from any cell of the individual. The methods allow determination of a risk of various diseases based upon detection of the methylation status (methylated vs. not methylated) at one or more loci in the genome of an individual, including at one or more CpG dinucleotides. In certain embodiments, a region of the genome is evaluated for the degree to which one or more CpG sites are methylated, and the outcome of the evaluation indicates being at risk for contracting a medical condition or indicates being at reduced risk of contracting a medical condition. Because, in any individual, the level of methylation at these special CpG sites is stable and systemic, they may be referred to as epigenetic polymorphisms. Consequently, the methods of the disclosure for analyzing methylation status may be performed only once for an individual.
[0009] Embodiments of the disclosure include compositions and methods for identifying a risk for a subject to develop cancer, such as by determining from a sample from the subject the methylation status of one or more loci selected from the group consisting of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, and a combination thereof, for example. In specific embodiments, determining the methylation status refers to measuring average methylation at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or more CpG sites in one or more of the loci. In certain cases, the loci include all of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and PF4, although in other cases the loci include 6, 5, 4, 3, or 2 of the loci of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and PF4. In some cases, one or more loci from Tables 1, 3, 4, or 5 are assayed in methods as encompassed herein, including with or without ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and/or PF4.
[0010] In particular embodiments of methods encompassed by the disclosure, the subject is an adult, adolescent, child, or infant, including a newborn, or the methods may be performed in utero such as by testing genetic material from fetal cells or cell fragments that are circulating or in the mother’s womb. The cancer for which the subject's risk is assessed may be any cancer, including cancer of the lung, breast, brain, colon, pancreas, uterus, bone, skin, endometrium, testes, spleen, liver, kidney, stomach, thyroid, gall bladder, esophagus, prostate, or hematopoietic lineages, for example.
[0011] Samples from which DNA is obtained for analysis of the methylation status include at least peripheral blood, biopsy, hair, saliva, cheek scrapings, cerebrospinal fluid, urine, nipple aspirate, fecal material, semen, sputum, mucus, fingernails, or a combination thereof. In some cases the sample is obtained from one or more fingernails from a newborn. In some methods, the method includes the step of obtaining a sample from the subject.
[0012] In certain embodiments, once an individual is determined to be at risk for cancer, the method further comprises the step of providing a therapeutic and/or a preventative therapy to the individual, and the therapeutic and/or a preventative therapy may be provided to the individual once or more than once. In cases wherein the therapeutic and/or a preventative therapy is provided more than once, the duration of time between deliveries may be on the order of 1-60 minutes, 1-24 hours, 1-7 days, 1-4 weeks, 1-12 months or 1-70 years, including any ranges there between. A therapeutic may comprise surgery, drug, radiation, immunotherapy, hormone therapy, or a combination thereof. A preventative therapy may comprise watchful waiting, surgery, drug, radiation, immunotherapy, hormone therapy, or a combination thereof.
[0013] The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.
[0015] FIGS. 1a-1f: Methylation in MZ twin pairs is highly concordant at candidate metastable epialleles (MEs). Each plot shows probe-specific β values for 97 MZ (blue, twin 2 > twin 1) and 162 DZ (red, twin 2 < twin 1) twin pairs, at loci previously identified as bona fide or candidate MEs [13, 14]. Insets show locus-average mean square errors (MSE) across all the MZ and DZ twins. MSE is much lower in MZ compared to DZ twins. 1a, VTRNA2-1, 15 probes, 10.6-fold lower MSE (in MZ vs. DZ). 1b, DUSP22, 11 probes, 16.5-fold lower. 1c, PAX8, 8 probes, 2.5-fold lower. 1d, CYP2E1, 3 probes, 10.8-fold lower. 1e, SFT2D3, 4 probes, 3.1-fold lower. 1f, CFD, 1 probe, 6.6-fold lower.
[0016] FIGS. 2a-2e: Some HM450 probes exhibit epigenetic supersimilarity (ESS). 2a, Distribution of probe-specific narrow-sense heritability (h²) estimates from [9]. (Shown are data on 24,839 probes; 9,566 probes with h² < 0.001 were excluded for clarity.) 1,058 probes show h² > 1, including most of the probes illustrated in Figure 1 (red box plot). 2b, Normalized DZ MSE vs. MZ MSE for the 34,405 probes (top 10%) from Grundberg et al. [9]. Histograms (right and top) show distribution; red curves show best normal fit. Normalized DZ MSE (mean ± sd = 0.76 ± 0.13) is normally distributed, but normalized MZ MSE (0.63 ± 0.23) is skewed left (P = 7.0 × 10⁻⁶⁶). Probes with h² > 1 are shown in blue. Probes to the left of the green line (y=2x) are classified as ESS. 2c, Associations between probe-level mQTL and heritability estimates (both from Grundberg et al. [9]). Among the 9,708 probes that are both in the top 10% of inter-individual variance and positive for mQTL (top panel), mean heritability is 0.64 (gray vertical line) and positively associated with the strength of mQTL. Among ESS probes positive for mQTL (middle panel), mean heritability is 0.90 and not associated with mQTL. Mean heritability of ESS probes negative for mQTL (0.99, bottom panel) is similar to that of mQTL-positive ESS probes. 2d, Model to explain ESS in MZ twins. Numbers on the dice represent different methylation states at a specific locus. If de novo methylation occurs after embryo cleavage (top), each MZ embryo undergoes independent establishment. If de novo methylation occurs prior to embryo cleavage (bottom), both MZ embryos inherit the same methylation state. 2e, Consistent with this model, bisulfite pyrosequencing in three tissues of 17 cadavers indicates that ESS probes also show systemic inter-individual variation (SIV). Two examples are shown: OR2L13 and HLA-DQB2.
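The ESS classification rule described for FIG. 2b (probes to the left of the line y=2x, i.e. DZ MSE greater than twice MZ MSE) can be illustrated with a minimal Python sketch. This is not the inventors' pipeline: the function names and the toy β values are hypothetical, and the normalization applied to the MSE values in the figure is omitted because it is not specified in this passage.

```python
from typing import Sequence, Tuple

TwinPairs = Sequence[Tuple[float, float]]


def pair_mse(pairs: TwinPairs) -> float:
    """Mean square error between co-twins' beta values at a single probe."""
    if not pairs:
        raise ValueError("at least one twin pair is required")
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def is_ess(mz_pairs: TwinPairs, dz_pairs: TwinPairs,
           ratio_cutoff: float = 2.0) -> bool:
    """Flag a probe as ESS when DZ MSE > ratio_cutoff * MZ MSE.

    Multiplying rather than dividing avoids a division by zero when
    the MZ twins are perfectly concordant.
    """
    return pair_mse(dz_pairs) > ratio_cutoff * pair_mse(mz_pairs)


# Hypothetical beta values (twin 1, twin 2) at one probe.
mz_pairs = [(0.20, 0.21), (0.55, 0.54), (0.80, 0.82)]
dz_pairs = [(0.20, 0.50), (0.55, 0.30), (0.80, 0.60)]
print(is_ess(mz_pairs, dz_pairs))
```

In this toy data the MZ co-twins are nearly identical while the DZ co-twins differ widely, so the probe is flagged.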
[0017] FIGS. 3a-3d: Epigenetically supersimilar (ESS) probes are enriched for systemic inter-individual variation (SIV). 3a, Schematic of analytical strategy applied to data of Lokk et al. [20] on abdominal aorta, gall bladder, and sciatic nerve from each of 4 individuals. Inter-individual and tissue-specific variation were quantified as the range of the individuals' mean beta values (μ1, μ2, μ3, μ4) and the tissues’ mean beta values (μaa, μgb, μsn), respectively. 3b, Tissue-specific vs. inter-individual variation for 344,151 probes. Histograms (top and side) indicate the density distribution. Green lines illustrate cutoffs used to identify 1,042 SIV probes (shaded region: inter-individual > 0.2 and tissue-specific < 1/3 of inter-individual variation). 3c, Examples of bisulfite pyrosequencing data confirming systemic inter-individual variation in selected loci: PF4 and LDHC. 3d, The 1,580 probes with evidence of ESS are 6.3-fold enriched for SIV (p < 10⁻¹⁰, chi-squared test).
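The SIV screen described for FIGS. 3a-3b can be sketched as follows. This is a hedged illustration only: the data layout, function names, and toy β values are assumptions, while the two cutoffs (inter-individual range > 0.2 and tissue-specific range < 1/3 of the inter-individual range) are taken from the text above.

```python
from statistics import mean
from typing import Dict, Tuple

# beta[individual][tissue] -> beta value at one probe (layout is an assumption)
ProbeBetas = Dict[str, Dict[str, float]]


def variation_ranges(beta: ProbeBetas) -> Tuple[float, float]:
    """Return (inter-individual range, tissue-specific range) for one probe."""
    individuals = list(beta)
    tissues = list(next(iter(beta.values())))
    # Range of the individuals' mean beta values (averaged over tissues).
    indiv_means = [mean(beta[i][t] for t in tissues) for i in individuals]
    # Range of the tissues' mean beta values (averaged over individuals).
    tissue_means = [mean(beta[i][t] for i in individuals) for t in tissues]
    return (max(indiv_means) - min(indiv_means),
            max(tissue_means) - min(tissue_means))


def is_siv(beta: ProbeBetas, min_inter: float = 0.2,
           max_ratio: float = 1 / 3) -> bool:
    """Apply the two cutoffs described for the shaded region of FIG. 3b."""
    inter, tissue = variation_ranges(beta)
    return inter > min_inter and tissue < max_ratio * inter


# Hypothetical probe: individuals differ, tissues within a person agree.
systemic = {
    "ind1": {"abdominal_aorta": 0.10, "gall_bladder": 0.12, "sciatic_nerve": 0.11},
    "ind2": {"abdominal_aorta": 0.40, "gall_bladder": 0.42, "sciatic_nerve": 0.41},
    "ind3": {"abdominal_aorta": 0.70, "gall_bladder": 0.72, "sciatic_nerve": 0.71},
    "ind4": {"abdominal_aorta": 0.90, "gall_bladder": 0.92, "sciatic_nerve": 0.91},
}
print(is_siv(systemic))
```

A probe whose methylation differs by tissue but not by individual yields an inter-individual range near zero and is not flagged.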
[0018] FIGS. 4a-4f: Regions of epigenetic supersimilarity (ESS) and systemic inter-individual variation (SIV) share genomic and epigenomic features. 4a, Normalized DZ MSE vs. MZ MSE for the 6,968 probes with range > 0.4, of which 489 (red) show substantial mQTL. Inset: ESS probes are 15-fold enriched for substantial mQTL (p < 10⁻¹⁰, chi-squared test). 4b, Tissue-specific vs. inter-individual variation at 344,151 probes, of which 2,702 (red) are substantial mQTL. Inset: SIV probes are 24-fold enriched for substantial mQTL (p < 10⁻¹⁰, chi-squared test). 4c, After filtering out substantial mQTL, ESS and SIV hits overlap > 2/3 of probes at previously identified MEs [13]. 4d, Relative to all probes in the top 10% of inter-individual variance, ESS and SIV probe sets are enriched for CpG islands (both comparisons p < 10⁻¹⁰, chi-squared test). 4e, Gene set enrichment analysis shows that both ESS and SIV probes are enriched for genes expressed in cancer (P = 4.7 × 10⁻⁸ and 4.8 × 10⁻⁹, respectively). Each row represents a different type of cancer in The Cancer Genome Atlas [24]. 4f, Association of probe sets with epigenomic feature annotations derived from 111 reference epigenomes [25]. ESS and SIV probes are enriched for active promoters (TssA) and underrepresented at enhancers (Enh) (all 4 comparisons P < 10⁻¹⁰).
[0019] FIGS. 5a-5f: Interactions between DNA methylation and local sequence context at some top ESS regions. 5a, 5b, 5c, and 5d show average methylation vs. SNP genotype at ESS regions within CYP2E1, DUSP22, SPATC1L, and ZFP57, respectively. In each panel, gene diagram (top) shows location of ESS region where methylation analysis was performed (asterisk) relative to that of a SNP that was genotyped in 64 Gambian children. Grid summarizes normalized linkage disequilibrium (D’) across these ~3kb regions in a Gambian population in Western Gambia (GWD, 1000 Genomes Project [77]). With the exception of G/G individuals at rs3129057 (ZFP57), there is substantial inter-individual variation in average methylation within each genotype class. At CYP2E1 (5a), average methylation is not associated with SNP genotype (P = 0.31). At DUSP22, SPATC1L, and ZFP57 (5b, 5c, and 5d), average methylation is associated with genotype (P = 0.002, 0.02, and 0.0001, respectively). At these same loci, inter-individual variance differs between the two homozygous genotypes; i.e. C/C vs. T/T at DUSP22 (P = 0.02), G/G vs. A/A at SPATC1L (P = 0.04), and G/G vs. A/A at ZFP57 (P = 1.9 × 10⁻⁶). 5e and 5f, Clonal bisulfite sequencing data in two homozygous individuals at each of SPATC1L and ZFP57, respectively, confirm dramatic inter-individual variation in DNA methylation in the absence of local sequence variation. Black, empty, and gray circles represent methylated, unmethylated, and indeterminate CpG sites, respectively. Vertical red line indicates the position of the SNP.
[0020] FIGS. 6a-6b: Sites of epigenetic supersimilarity (ESS) and systemic interindividual variation (SIV) are enriched for effects of periconceptional environment on DNA methylation. 6a, Relative to all probes on the HM450 array, mQTL-filtered ESS and SIV probes (but not negative control probes) are highly enriched for significant (FDR < 10%) associations with season of conception in rural Gambia. 6b, Heat map of average effect of season of conception at loci that show a significant seasonal difference in methylation (FDR <10%). At both ESS and SIV probes, as in previous studies of MEs in independent cohorts [13, 14], children conceived in the rainy season have higher methylation.
[0021] FIGS. 7a-7g: At clusters of probes showing epigenetic supersimilarity (ESS), peripheral blood methylation at baseline is associated with risk of later cancer. Manhattan plots illustrating results of conditional logistic regression analyses of the association between baseline probe-specific methylation (HM450) and risk of later 7a, Breast cancer, 7b, Colorectal cancer, 7c, Kidney cancer, 7d, Lung cancer, 7e, Mature B-cell neoplasm, 7f, Prostate cancer, and 7g, Urothelial cell carcinoma. Only probes within clusters of > 2 probes are shown. Probes plotted with positive values (red) have positive coefficients (i.e. more methylation in cases than controls) and probes plotted with negative values (green) have negative coefficients (delta beta value scale indicated). The dotted lines indicate P = 0.05. Among the 10 most CpG-rich ESS clusters, colored boxes indicate 7 at which methylation is significantly associated with later cancer: ZFP57 (colorectal cancer, P = 0.008), SPATC1L (colorectal cancer, P = 0.009, and prostate cancer, P = 0.01), OR2L13 (lung cancer, P = 0.010), VTRNA2-1 (lung cancer, P = 0.025, and MBCN, P = 0.009), DUSP22 (MBCN, P = 0.001, and UCC, P = 0.001), HCG4B (prostate cancer, P = 0.007), and PF4 (UCC, P = 0.013).
[0022] FIGS. 8a-8c: Statistical significance of ESS probes. 8a, Unlike the expected normal distribution, the ratio of DZ MSE to MZ MSE is skewed strongly to the right. (ESS probes are selected based on (DZ MSE)/(MZ MSE) > 2.) Relative to a normal distribution modeled on the left side of the data (red, mean ± sd = 1.075 ± 0.18), probability of ESS probes is P < 0.0001. 8b, The same analysis as in FIG. 2b, but including only probes with inter-individual β range > 0.4 in the Grundberg et al. data set. Compared to FIG. 2b, the DZ/RZ MSE (right histogram) remains normally distributed, but the left skewing of the MZ/RZ MSE (top histogram) is more pronounced. (Red curves show normal curves fitted to the data.) 8c, DZ MSE to MZ MSE distribution of probes with range > 0.4 in the Grundberg et al. data set. Relative to a normal distribution modeled on the left side of the data (red, mean ± sd = 1.025 ± 0.18), enrichment of ESS probes remains highly significant (P < 0.0001).
[0023] FIG. 9: Bisulfite pyrosequencing validation data in ESS hits, to test for SIV. Each plot shows inter-tissue correlation for tissues from 17 cadavers. ZNF714, RYR1, PRDM9, and HCG4B yielded at least one inter-tissue correlation with r² > 0.50 (i.e. validated). WDSUB1, LEPR, and CABLES1 failed to validate.
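The validation criterion used for FIGS. 9 and 10 (at least one inter-tissue correlation with r² > 0.50) can be expressed as a short Python sketch. The tissue labels, function names, and toy methylation values are hypothetical; only the cutoff and the "at least one tissue pair" rule come from the text.

```python
from itertools import combinations
from typing import Dict, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with >= 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one tissue: treat as uncorrelated
    return (sxy * sxy) / (sxx * syy)


def validates_as_siv(by_tissue: Dict[str, Sequence[float]],
                     cutoff: float = 0.50) -> bool:
    """True when any pair of tissues has r^2 above the cutoff.

    by_tissue maps a tissue label to per-individual methylation values,
    listed in the same individual order for every tissue.
    """
    return any(r_squared(by_tissue[a], by_tissue[b]) > cutoff
               for a, b in combinations(by_tissue, 2))


# Hypothetical per-individual methylation (%) in three unnamed tissues.
locus = {
    "tissue_a": [10.0, 35.0, 50.0, 72.0, 90.0],
    "tissue_b": [12.0, 30.0, 55.0, 70.0, 88.0],
    "tissue_c": [40.0, 42.0, 39.0, 41.0, 40.0],
}
print(validates_as_siv(locus))
```

Because the rule requires only one qualifying tissue pair, a locus can validate even when a third tissue shows little inter-individual variation.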
[0024] FIG. 10: Bisulfite pyrosequencing validation data in hits from the SIV screen. Each plot shows inter-tissue correlation for tissues from 17 cadavers. ERICH1, SRPRB, and C1ORF86 yielded at least one inter-tissue correlation with r² > 0.50 (i.e. validated). PM20D1, LOC101927932, KCNE1, and KLHL35 failed to validate (although PM20D1 and KLHL35 achieved inter-tissue correlations just short of the cutoff).
[0025] FIG. 11. Cutoff for substantial mQTL. Of the 34,304 probes Shi et al. identified as showing significant cis-mQTL, only 4,306 (blue) have at least 33% of their variation explained by mQTL.
[0026] FIG. 12. HM450 probes identified as mQTL in three different studies. Venn diagram illustrates overlap of probes identified as showing mQTL in Grundberg et al. (n=36,139, conservative P value cutoff), Volkov et al. (n=15,208), and Shi et al. (n=4,306, substantial mQTL (i.e. fisNP > 0.33)). Over half of the Shi et al. substantial mQTL probes overlap with either of the other two sets.
[0027] FIGS. 13a-13c. Negative control probes. 13a, Negative control probes all have inter-individual range > 0.4 in the Grundberg data set. 13b, Negative control probes (red dots) compared to FIG. 2b; they are distinct from ESS hits. 13c, Negative control probes (red dots) compared to FIG. 3b; they are distinct from SIV hits.
[0028] FIGS. 14a-14b. Both ESS and SIV probe sets are enriched in subtelomeric regions. 14a, Distribution of ESS and SIV probes in the 10 Mb from all chromosome ends, relative to negative control probes. Both ESS and SIV probes are concentrated in the 2 Mb subtelomeric regions. 14b, Same plots, but excluding probes with evidence of substantial mQTL. The same enrichment is observed, indicating that the subtelomeric enrichment is not due to the greater concentration of SNPs (and hence greater potential for mQTL) in subtelomeric regions.
[0029] FIGS. 15a-15g. Examples of associations between cluster-average methylation in adipose tissue and expression of associated genes in adipose tissue (left), lymphoblastoid cell lines (LCL, middle), and skin (right). Associations are illustrated for 15a, ZFP57, 15b, PM20D1, 15c, LDHC, 15d, SRPRB, 15e, TRIM61, 15f, PSMB9, and 15g, C22ORF34. At ZFP57, methylation in adipose tissue is inversely associated with expression in LCL but not in adipose tissue or skin. At LDHC, methylation in adipose tissue is strongly inversely associated with expression in adipose tissue, LCL, and skin. At SRPRB and C22ORF34, conversely, methylation in adipose tissue is positively associated with expression in adipose tissue, LCL, and skin.
[0030] FIGS. 16a-16g. At clusters of negative control probes, peripheral blood methylation at baseline is not strongly associated with risk of later cancer. Associations are tested with respect to 16a, Breast cancer, 16b, Colorectal cancer, 16c, Kidney cancer, 16d, Lung cancer, 16e, Mature B cell neoplasm, 16f, Prostate cancer, and 16g, Urothelial cell carcinoma. Unlike at the ESS clusters (FIG. 7), methylation is significantly associated with risk of later cancer at only a few of the top 13 most CpG-rich negative control clusters (FDR<0.25, labeled boxes). In every case, the magnitude of the effect is extremely small (Delta Beta < 0.01).
[0031] FIGS. 17a-17g. Associations with later cancer among those of the top 10 ESS clusters containing no substantial mQTL probes. Associations are tested with respect to 17a, Breast cancer, 17b, Colorectal cancer, 17c, Kidney cancer, 17d, Lung cancer, 17e, Mature B cell neoplasm, 17f, Prostate cancer, and 17g, Urothelial cell carcinoma. Of the 10 significant associations illustrated in FIG. 7, 6 are at ESS clusters with no evidence of mQTL.
[0032] FIG. 18. Season of conception study: Principal component analysis scree plot.
DETAILED DESCRIPTION
[0033] As used herein in the specification, “a” or “an” may mean one or more. As used herein in the claim(s), when used in conjunction with the word “comprising”, the words “a” or “an” may mean one or more than one. As used herein “another” may mean at least a second or more. Still further, the terms “having”, “including”, “containing” and “comprising” are interchangeable and one of skill in the art is cognizant that these terms are open ended terms. Some embodiments of the disclosure may consist of or consist essentially of one or more elements, method steps, and/or methods of the disclosure. It is contemplated that any method or composition described herein can be implemented with respect to any other method or composition described herein.
[0034] The present disclosure concerns methods and compositions for determining whether or not a subject is at risk for developing a medical condition, such as cancer. The risk may be ascertained for onset of the medical condition later in the life of the individual, given that the evaluation may occur decades before onset of the medical condition because of the stable nature of the epigenetic polymorphism. In specific embodiments, methods of the disclosure provide information on whether or not an individual will develop a medical condition such as cancer and may be ascertained at any stage of life of the individual, including upon birth or as an infant, child, or adolescent, for example. Such methods and compositions encompass ascertaining the degree to which one or more certain sites in genomic DNA of the subject are methylated, including isolated CpG sites or one or more CpG sites within a cluster of CpG sites, including a CpG island. The CpG sites are located in particular loci in the genome, in particular embodiments. Particular aspects of the methods include analysis of the methylation status of one or more CpG sites in a certain locus or loci for the specific and intended purpose of determining the risk of a subject for developing cancer.
[0035] In particular embodiments, a particular genomic methylation pattern (including epigenetic variants, for example) for an individual occurs stably and systemically (i.e. is the same in essentially all different tissues). In specific embodiments, methods of the disclosure do not assay genomic DNA from cells in disease-affected tissue itself or do not assay genomic DNA from cells in tissue for which disease development is of particular concern. Instead, genomic DNA analysis from one tissue and/or body fluid provides information on risk for development of disease in another tissue and/or body fluid. In specific embodiments, genomic DNA is analyzed from a first tissue and/or body fluid from an individual that provides information on risk for development of cancer in a second tissue and/or body fluid, including risk for future development on the order of time of years or even decades.
[0036] In certain embodiments, a subject is born with a particular methylation pattern in the subject’s genomic DNA in which the pattern is associated with at least one CpG in a region selected from the group consisting of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, and a combination thereof. Such an individual at the time of analysis does not or may not have cancer but the methylation pattern present at the time of birth in the genomic DNA of the subject is indicative of a risk for cancer development later in life, including years later, for example.
[0037] Embodiments of the disclosure concern methods of screening for a risk for cancer in a sample comprising genomic DNA obtained from a subject, the method comprising: 1) assaying a methylation state of at least one marker in genomic DNA obtained from the subject; and 2) identifying the subject as having a risk for a cancer when the methylation state of the marker is different than a methylation state of the marker assayed in a subject that does not have a neoplasm or different from a standard, wherein the marker comprises at least one CpG in a region selected from a group consisting of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, and a combination thereof.
[0038] Embodiments of the disclosure encompass methods for characterizing a sample obtained from a subject, comprising: a) obtaining a sample comprising genomic DNA from the subject; b) assaying a methylation state of one or more markers in the genomic DNA, wherein the one or more markers comprise at least one CpG site; c) comparing the methylation state of the assayed marker to the methylation state of the marker assayed in a subject that does not have a neoplasm or comparing the methylation state to a standard. The marker may comprise at least one CpG in a region selected from the group consisting of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, and a combination thereof.
[0039] In particular embodiments, for a particular CpG site, elevated methylation at that site is associated with elevated risk for cancer, whereas in other cases elevated methylation at a particular site is associated with reduced risk for cancer. In embodiments when multiple CpG sites are evaluated, high methylation at all of the sites indicates the risk for cancer, whereas in other cases low methylation at all of the sites indicates the risk for cancer. In some cases when multiple CpG sites are evaluated, a risk for cancer is indicated based on methylation at only a subset of these.
[0040] In specific embodiments, the methods are performed as a routine part of health care for the subject. Such routine screening may occur in utero, upon birth, for an infant, child, adolescent, or adult, for example. In other cases the methods are performed for a subject for other reasons, such as having concern for the risk or presence of cancer. In such cases, the individual may be considered to be at risk because of aging, tobacco use, sun exposure, radiation exposure, exposure to chemicals and other substances, some viruses and bacteria, certain hormones, family or personal history of cancer, alcohol, poor diet, lack of physical activity, or being overweight, for example. Individuals for which the methods of the disclosure may be performed include humans, dogs, cats, horses, cows, and so forth.
[0041] In some embodiments, the technology is related to assessing the presence of and/or risk of and/or methylation state of one or more of the markers identified herein in a biological sample. In particular embodiments of the disclosure, methylation status refers to measuring average methylation of one or more CpG sites across a plurality of genomic DNA molecules from a sample. In particular embodiments, the methods encompass ascertaining the degree to which one or more certain sites in genomic DNA of the subject are methylated, including isolated CpG sites or one or more CpG sites within a cluster of CpG sites, including a CpG island. The measurements, in at least specific cases, may refer to a quantitative measure of methylation (i.e., proportional (0-1) or percent (0-100%)). For example, among a plurality of genomic DNA molecules from a sample, a certain percentage of them have methylation at a particular CpG site, and that percentage is compared to a standard. For indications of risk for an individual, that percentage may be higher or lower than the standard, depending on the particular marker. Thus, markers include isolated CpG sites or CpG islands and comprise one or more regions as provided herein.
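The quantitative comparison described above (the fraction of DNA molecules methylated at a CpG site, compared to a standard, with the risk direction depending on the marker) can be illustrated with a minimal Python sketch. The function names, the standard value, the tolerance parameter, and the read counts are all hypothetical and carry no clinical meaning; the disclosure does not specify a particular threshold here.

```python
from typing import Iterable


def methylation_fraction(calls: Iterable[bool]) -> float:
    """Proportion (0-1) of DNA molecules called methylated at one CpG site.

    Each element of `calls` represents one molecule (True = methylated).
    """
    calls = list(calls)
    if not calls:
        raise ValueError("no molecules were assayed at this site")
    return sum(calls) / len(calls)


def indicates_risk(fraction: float, standard: float,
                   risk_direction: str, tolerance: float = 0.0) -> bool:
    """Compare a measured fraction to a standard for one marker.

    risk_direction is "higher" when elevated methylation is the risk
    signal for this marker and "lower" when reduced methylation is.
    """
    if risk_direction == "higher":
        return fraction > standard + tolerance
    if risk_direction == "lower":
        return fraction < standard - tolerance
    raise ValueError("risk_direction must be 'higher' or 'lower'")


# Hypothetical sample: 62 of 100 molecules methylated at the site.
calls = [True] * 62 + [False] * 38
fraction = methylation_fraction(calls)
print(f"{fraction:.2f} ({fraction * 100:.0f}%)")
print(indicates_risk(fraction, standard=0.40, risk_direction="higher"))
```

Keeping the direction as an explicit per-marker parameter reflects the statement that the measured percentage may be higher or lower than the standard depending on the marker.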
[0042] Methylation state is assessed in embodiments of the technology. As such, the technology provided herein is not restricted in the method by which a methylation state of one or more loci is measured. In specific embodiments, the DNA methylation profiles are determined by hybridization-based arrays or by the use of next-generation sequencing techniques, polymerase-based as well as ligase-based sequencing technologies like pyrosequencing, sequencing by ligation, single-molecule sequencing, and/or nanopore sequencing, alone or in combination with the bisulfite treatment of cytosine nucleotides. In some embodiments the assaying comprises using methylation specific polymerase chain reaction, nucleic acid sequencing, mass spectrometry, methylation specific nuclease, mass-based separation, and/or target capture. In some embodiments, the assaying comprises use of a methylation specific oligonucleotide. In some embodiments, the technology uses massively parallel sequencing (e.g., next-generation sequencing) to determine methylation state, e.g., sequencing-by-synthesis, real time (e.g., single-molecule) sequencing, bead emulsion sequencing, nanopore sequencing, etc.
[0043] In specific embodiments, the methylation status of at least one of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and PF4 is different in an individual’s sample compared to a control sample.
[0044] Also provided herein are compositions and kits for practicing the methods. For example, in some embodiments, reagents (e.g., primers, probes) specific for one or more markers are provided alone or in sets (e.g., sets of primer pairs for amplifying a plurality of markers). Additional reagents for conducting a detection assay may also be provided (e.g., enzymes, buffers, positive and negative controls for conducting QuARTS, PCR, sequencing, bisulfite, or other assays). In some embodiments, kits containing one or more reagents necessary, sufficient, or useful for conducting a method are provided. Also provided are reaction mixtures containing the reagents. Further provided are master mix reagent sets containing a plurality of reagents that may be added to each other and/or to a test sample to complete a reaction mixture.
[0045] Particular kit embodiments may be provided, e.g., a kit comprising a bisulfite reagent; and a control nucleic acid comprising a sequence from a particular locus and having a methylation state associated with a subject who does not have a cancer. In some embodiments, kits comprise a bisulfite reagent and an oligonucleotide as encompassed herein. In some embodiments, kits comprise a bisulfite reagent; and a control nucleic acid comprising a sequence from a particular locus and having a methylation state associated with a subject who has a cancer. Some kit embodiments comprise a sample collector for obtaining a sample from a subject; reagents for isolating a nucleic acid from the sample; a bisulfite reagent; and/or one or more oligonucleotides as encompassed herein. Enzymes that facilitate methylation analysis may be included in the kit.
[0046] Furthermore, in some embodiments the region of the genome associated with the epigenetic polymorphism to be analyzed may be considerably greater than the particular CpG dinucleotide for which methylation status is analyzed. In some cases, a region is analyzed of 100 or fewer bases, 200 or fewer bases, 300 or fewer bases, 400 or fewer bases, 500 or fewer bases, 600 or fewer bases, 700 or fewer bases, 800 or fewer bases, 900 or fewer bases, 1000 or fewer bases, 5000 or fewer bases, or, in some embodiments, the marker is one or two bases that may or may not be contiguous. In some embodiments the marker is in a high CpG density region. The region that is analyzed may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more epigenetic polymorphisms that are CpG dinucleotides.
[0047] In some embodiments a sample obtained from a subject is a stool sample, a tissue sample, a pancreatic juice sample, a pancreatic cyst fluid sample, a blood sample (e.g., plasma, serum, follicles, fingernails, whole blood and including peripheral blood), an excretion, a urine sample, biopsy, hair, saliva, cheek scrapings, cerebrospinal fluid, urine, nipple aspirate, semen, sputum, mucus, or a combination thereof.
[0048] Although certain embodiments of the disclosure concern methods for risk analysis, other methods concern diagnosis, prognosis and/or therapy monitoring of a disease in a subject, comprising determining a DNA methylation profile in a subject sample comprising genomic DNA from cells, tissue and/or fluid such as peripheral blood (for example); and comparing the DNA methylation profile in the sample with the DNA methylation profile from a normal subject not having the disease, wherein a difference in the DNA methylation profile is indicative of a disease or of the risk for developing the disease or for a prediction of therapy effects or therapy outcome.
[0049] In particular aspects, the present technology provides compositions and methods for identifying whether or not a subject is at risk for cancer, determining that a subject has cancer, classifying a cancer, and/or determining whether or not an individual will be responsive to a therapy. Embodiments of the disclosure include risk assessment of one or more subjects including evaluation of one or more factors, such as DNA methylation profiles, mutations, biomarkers, etc., in order to predict the risk of future events and, in some cases, in order to decide about the type, manner, doses, regimen of therapy and/or treatment for the individual subject.
[0050] In some embodiments of the technology, methods are provided that comprise the step of contacting a nucleic acid (e.g., genomic DNA, e.g., isolated from a body fluid(s) and/or tissue(s)) obtained from a subject with at least one reagent or series of reagents that determines the degree to which CpG dinucleotides within at least one marker are methylated.
[0051] Genomic DNA may be isolated by any means, including the use of commercially available kits. Briefly, wherein the DNA of interest is encapsulated by a cellular membrane, the biological sample must be disrupted and lysed by enzymatic, chemical or mechanical means. The DNA solution may then be cleared of proteins and other contaminants, e.g., by digestion with proteinase K. The genomic DNA is then recovered from the solution. This may be carried out by means of a variety of methods including salting out, organic extraction, or binding of the DNA to a solid phase support. The choice of method will be affected by several factors including time, expense, and required quantity of DNA. All clinical sample types are suitable for use in the present method, e.g., cell lines, histological slides, biopsies, paraffin-embedded tissue, body fluids, stool, colonic effluent, urine, blood plasma, blood serum, whole blood, isolated blood cells, cells isolated from the blood, and combinations thereof.
[0052] In some embodiments, when risk for a cancer is determined for a subject and the subject develops cancer (metastatic or not), the technology includes methods for refining the treatment of the subject (e.g., a subject with cancer, with early stage cancer, or who may develop cancer), the method comprising determining the methylation state of one or more loci as provided herein and administering a treatment to the subject based on the results of determining the methylation state. The treatment may be administration of a pharmaceutical compound, a vaccine, performing a surgery, imaging the subject or a part of the subject, performing another test, or a combination thereof. In some embodiments, the use is in a method of clinical screening, a method of prognosis assessment, a method of monitoring the results of therapy, a method to identify patients most likely to respond to a particular therapeutic treatment, a method of imaging a patient or subject, and/or a method for drug screening and development.
[0053] In some embodiments of the technology, a method for diagnosing a cancer in a subject is provided. The terms "diagnosing" and "diagnosis" as used herein refer to methods by which the skilled artisan can estimate and even determine whether or not a subject is suffering from a given disease or condition or may develop a given disease or condition in the future. The skilled artisan often makes a diagnosis on the basis of one or more diagnostic indicators, such as for example a biomarker (e.g., one or more loci as disclosed herein), the methylation state of which is indicative of the risk for, presence, severity, or absence of the condition.
[0054] Along with identification of risk, in some embodiments one can determine the aggressiveness of the cancer and the likelihood of tumor recurrence to plan the most effective therapy. If a more accurate prognosis can be made or even a potential risk for developing the cancer can be assessed, appropriate therapy, and in some instances less severe therapy for the patient can be chosen. Assessment (e.g., determining methylation state) of cancer biomarkers is useful to separate subjects with good prognosis and/or low risk of developing cancer who will need no therapy or limited therapy from those more likely to develop cancer or suffer a recurrence of cancer who might benefit from more intensive treatments.
[0055] In some embodiments, a subject is diagnosed as having a cancer if, when compared to a control methylation state, there is a measurable difference in the methylation state of at least one biomarker in the sample. Conversely, when no change in methylation state is identified in the biological sample, the subject can be identified as not having cancer, not being at risk for the cancer, or as having a low risk of the cancer. In this regard, subjects having the cancer or risk thereof can be differentiated from subjects having low to substantially no cancer or risk thereof. Those subjects having a risk of developing a cancer can be placed on a more intensive and/or regular screening schedule, including surveillance of any kind. On the other hand, those subjects having low to substantially no risk may avoid being subjected to one or more procedures, until such time as a future screening, for example, a screening conducted in accordance with the present technology, indicates that a risk of cancer has appeared in those subjects.
[0056] As mentioned above, depending on the embodiment of the method of the present technology, detecting a change in methylation state of the one or more biomarkers can be a qualitative determination or it can be a quantitative determination. As such, the step of diagnosing a subject as having, or at risk of developing, a cancer indicates that certain threshold measurements are made, e.g., the methylation state of the one or more biomarkers in the biological sample varies from a predetermined control methylation state. In some embodiments of the method, the control methylation state is any detectable methylation state of the biomarker. In other embodiments of the method where a control sample is tested concurrently with the biological sample, the predetermined methylation state is the methylation state in the control sample. In other embodiments of the method, the predetermined methylation state is based upon and/or identified by a standard curve. In other embodiments of the method, the predetermined methylation state is a specific state or range of states. As such, the predetermined methylation state can be chosen, within acceptable limits that will be apparent to those skilled in the art, based in part on the embodiment of the method being practiced and the desired specificity, etc.
[0057] Encompassed herein are methods of measuring for (or detecting) the methylation state of one or more regions of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, or a combination thereof, including from individuals suspected of being at risk for a particular medical condition, such as cancer. In specific embodiments, the analysis utilizes comparison to a control or standard (for example, wherein one determines the methylation state in one or more individuals that are known not to have cancer).
[0058] Embodiments of the disclosure include methods of detecting the risk for cancer in an individual by obtaining a sample from an individual at an elevated risk for cancer when compared to the general population and detecting the degree of methylation at one or more CpG dinucleotides in one or more genes comprising, consisting of, or consisting essentially of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, or a combination thereof. Embodiments also include methods of prognosticating and treating cancer in an individual by obtaining a sample from an individual suspected of being at risk for cancer; prognosticating a risk for cancer when one or more particular methylation states are detected in one or more regions of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, or a combination thereof; and administering one or more cancer vaccines and/or therapies and/or preventative measures (lifestyle changes, such as improvement of diet and/or exercise) and/or monitoring of the individual for the onset of cancer.
[0059] In particular embodiments of the disclosure, there are compositions that facilitate determination of a subject’s risk for cancer development. Such compositions include any nucleic acid (polynucleotide and/or oligonucleotide, for example) that facilitates analysis of methylation of any loci encompassed herein. Compositions also include substrates that have attached thereto nucleic acids to which probes are subjected, and the substrates and/or probes are associated with one or more genes comprising, consisting of, or consisting essentially of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, or a combination thereof.
[0060] It is envisioned that quantitation of methylation at ESS CpGs may be evaluated by a variety of methods, including (but not limited to) methylation-sensitive endonuclease digestion and gel chromatography, bisulfite pyrosequencing, MassArray (Epityper®), oligonucleotide array-based approaches, target capture bisulfite sequencing, or direct nanopore sequencing, as examples.
[0061] Further with respect to diagnostic methods, a particular subject is a vertebrate subject, such as a mammal including human and animal subjects. Thus, veterinary therapeutic uses are provided herein, including animals kept as pets or in zoos. Examples of such animals include but are not limited to cats, dogs, swine, ruminants, ungulates, etc. Thus, also provided is the diagnosis and treatment of livestock, including, but not limited to, domesticated swine, ruminants, ungulates, horses (including race horses), and the like.
[0062] Methods of the disclosure encompass identifying a methylation status for one or more CpG sites, such as any one or more of the specific sites listed in Table 1. The methods may include analysis of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, including all, of the specific sites listed in Table 1, Table 3, Table 4, or Table 5. In specific embodiments, one or more of the specific CpG sites in ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and/or PF4 may be analyzed, including those listed in Table 4 or Table 5, at least. In some cases, epigenetic polymorphisms other than those listed herein may be analyzed for methylation status.
EXAMPLES
[0063] The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples that follow represent techniques discovered by the inventors to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
EXAMPLE 1
EARLY EMBRYONIC EPIGENETIC DEVELOPMENT AND ADULT DISEASE
[0064] Background: Monozygotic twins have long been studied to estimate heritability and explore epigenetic influences on phenotypic variation. The phenotypic and epigenetic similarity of monozygotic twins has been assumed to be largely due to their genetic identity.
[0065] Results: Here, by analyzing data from a genome-scale study of DNA methylation in monozygotic and dizygotic twins, the inventors identified genomic regions at which the epigenetic similarity of monozygotic twins is substantially greater than can be explained by their genetic identity. This ‘epigenetic supersimilarity’ apparently results from locus-specific establishment of epigenotype prior to embryo cleavage during twinning. Epigenetically supersimilar loci exhibit systemic inter-individual epigenetic variation and plasticity to periconceptional environment, and are enriched in sub-telomeric regions. In case-control studies nested in a prospective cohort, blood DNA methylation at these loci years before diagnosis was associated with risk of developing several types of cancer.
[0066] Conclusions: These results establish a link between early embryonic epigenetic development and adult disease. More broadly, epigenetic supersimilarity is a previously unrecognized phenomenon that may contribute to the phenotypic similarity of monozygotic twins.
EXAMPLE 2
EPIGENETIC SUPERSIMILARITY IN MZ TWINS
[0067] Rather than being predominantly determined by genetics, inter-individual variation in DNA methylation at MEs is determined, at least in part, stochastically [6] and influenced by the nutritional milieu of the preimplantation embryo [10-12]. It was therefore considered that, at MEs, methylation concordance within MZ twin pairs would be greater than that of unrelated individuals, but comparable to that within DZ twin pairs. To test this, the inventors analyzed a genome-scale DNA methylation data set from Grundberg et al., who used the HM450 array to assess methylation in adipose tissue from adult female twins of European descent (97 MZ twin pairs and 162 DZ twin pairs) [9]. Following Grundberg et al., low-quality probes potentially affected by single nucleotide polymorphisms (SNPs) were discarded and, of the remaining 344,303 probes, analysis was focused on the 10% (34,405) with the highest inter-individual variance (hereafter referred to as the top 10%).
[0068] Within regions previously identified as candidate or bona fide MEs [13, 14], the inventors assessed twin-twin methylation concordance inversely by probe-specific mean square error (MSE) of β values. MSE assesses the deviation of a twin pair from the line of identity, providing a direct measure of discordance. Contrary to the expectation, MZ twin concordance in putative ME regions was between 2.5- and 16.5-fold higher than that of DZ twins (FIG. 1). This indicated that establishment of DNA methylation at these regions is under genetic control. To test this, the inventors examined the probe-specific narrow-sense heritability (h²) estimates (based on the ACE method [15]) from Grundberg et al. [9]; h² is the proportion of phenotypic variation in a population that is attributable to genetic variation [16]. Strikingly, 1,058 probes (3% of total) showed h² estimates > 1 (FIG. 2a). Most of the probes within the candidate MEs featured in FIG. 1 were among them (FIG. 2a), indicating that these superordinate h² values are not simply a result of sampling error.
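To illustrate how a twin-based heritability estimate can exceed 1, the sketch below uses Falconer's formula, h² = 2(r_MZ − r_DZ), where r is the within-pair correlation. This is a simpler stand-in for the ACE model cited above, not the method used to produce the estimates discussed here, and the β values are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer estimate: twice the gap between MZ and DZ correlations."""
    r_mz = pearson(*zip(*mz_pairs))
    r_dz = pearson(*zip(*dz_pairs))
    return 2 * (r_mz - r_dz)


# Hypothetical beta values (twin 1, twin 2) at one probe.
mz_pairs = [(0.10, 0.12), (0.50, 0.48), (0.90, 0.88), (0.30, 0.31)]  # near-identical
dz_pairs = [(0.10, 0.60), (0.50, 0.20), (0.90, 0.70), (0.30, 0.40)]  # weakly correlated
h2 = falconer_h2(mz_pairs, dz_pairs)  # exceeds 1 because r_MZ - r_DZ > 0.5
```

Under purely additive genetic determination r_DZ is expected to be about half of r_MZ, so the estimate stays at or below 1; a value above 1 signals that MZ pairs are more alike than genetics alone predicts.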
[0069] Clearly, h² values > 1 are difficult to interpret. To better understand this observation, MSE was calculated for all 34,405 top 10% probes [9]. To elucidate the extent to which DZ and MZ twins are more similar than pairs of unrelated individuals, probe-specific MSEs were normalized relative to randomized pairs (RZ), simulating pairwise MSE within the general population. DZ/RZ MSE and MZ/RZ MSE were generally <1, as expected (FIG. 2b). Genetic influences on CpG methylation generally occur when the local sequence context in cis (i.e., a haplotype) affects establishment of methylation [17]. Given that DZ twins are identical by descent at 50% of haplotypes [18] and MZ twins at 100% of haplotypes, a model based on genetic determination predicts that the mean normalized DZ MSE should be no more than twice the mean normalized MZ MSE. Hence, for probes to the left of the green line (y=2x) in FIG. 2b, MZ twin pairs show greater-than-expected similarity in DNA methylation. The inventors refer to this phenomenon as ‘epigenetic supersimilarity’ (ESS). According to the central limit theorem, assuming that probe-specific methylation is determined by many unobserved (genetic) factors, the mean intra-pair errors should be normally distributed. Indeed, normalized DZ MSE are, but normalized MZ MSE are skewed to the left (P = 7.0 × 10⁻⁶⁶) (FIG. 2b). Each probe with DZ/MZ MSE > 2 (corresponding to those left of the green line in FIG. 2b) is > 5 sd away from the expected normal mean (P < 0.0001) (FIG. 8a), well beyond the range of sampling error. Most of the probes for which Grundberg et al. estimated h² > 1 are characterized as ESS (FIG. 2b). Initial validation studies found that many ESS probes with inter-individual β range < 0.4 in the Grundberg et al. data set [9] are essentially unmethylated in several human primary tissues. The inventors therefore refined the selection criteria to MSE DZ/MZ > 2 and an inter-individual β range > 0.4, identifying 1,580 probes (4.6% of the 34,405) as ESS (FIG. 8b, and Table 1 in Appendix A). Across all probes with β range > 0.4, normalized DZ MSE remained normally distributed, but normalized MZ MSE were shifted even further to the left (FIGS. 8b, 8c).
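The ESS selection criteria described in this paragraph can be sketched as follows. This is an illustrative Python outline under stated assumptions: MSE is taken as the mean squared within-pair difference in β, randomized pairs (RZ) are drawn from the pooled twin values, and all function names and example values are hypothetical rather than taken from the disclosure.

```python
import random


def pair_mse(pairs):
    """Mean squared within-pair difference in beta (0 = perfect concordance)."""
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def randomized_pair_mse(values, n_pairs=1000, seed=0):
    """MSE for randomly paired individuals (RZ), used for normalization."""
    rng = random.Random(seed)
    return pair_mse([rng.sample(values, 2) for _ in range(n_pairs)])


def screen_probe(mz_pairs, dz_pairs, ratio_cutoff=2.0, range_cutoff=0.4):
    """Return normalized MZ and DZ MSE and whether both ESS criteria are met."""
    values = [beta for pair in mz_pairs + dz_pairs for beta in pair]
    rz = randomized_pair_mse(values)
    mz, dz = pair_mse(mz_pairs), pair_mse(dz_pairs)
    beta_range = max(values) - min(values)
    # dz > cutoff * mz is the DZ/MZ > 2 test written without a division,
    # so a probe with perfectly concordant MZ pairs (mz == 0) is handled.
    ess = dz > ratio_cutoff * mz and beta_range > range_cutoff
    nan = float("nan")
    return {
        "mz_norm": mz / rz if rz else nan,
        "dz_norm": dz / rz if rz else nan,
        "ess": ess,
    }


# Hypothetical beta values (twin 1, twin 2) at one probe.
mz_pairs = [(0.10, 0.11), (0.50, 0.52), (0.80, 0.79), (0.30, 0.30)]
dz_pairs = [(0.10, 0.40), (0.50, 0.20), (0.80, 0.60), (0.30, 0.55)]
result = screen_probe(mz_pairs, dz_pairs)
```

Because both MSE values are divided by the same RZ term, the DZ/MZ ratio is unchanged by normalization; the normalized values are returned only to mirror the axes of the comparison described above.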
[0070] To further test whether the superordinate heritability estimates of Grundberg et al. might somehow result from the genetic identity of MZ twin pairs, data on methylation quantitative trait loci (mQTL), i.e., sequence variants correlated with methylation at specific CpG sites [19], were analyzed. Grundberg et al. [9] combined their genotyping and HM450 data on 603 adipose tissue samples and applied a conservative significance threshold (P < 1.2 × 10⁻⁹), identifying 9,708 mQTL probes within the top 10% of inter-individual variance. Among these, as expected, the strength of the mQTL association was positively associated with heritability (FIG. 2c, top). There was no such association across ESS probes (FIG. 2c, middle). If the superordinate heritability associated with ESS results from the genetic identity of MZ twins, the mean heritability of ESS probes with mQTL should be higher than that of those without mQTL. This was not the case (FIG. 2c, middle and bottom). This analysis, using mQTL data from the same samples in which the inventors identified ESS, provides strong evidence that ESS is not simply a consequence of the isogenicity of MZ twins.
Testing a model for ESS
[0071] During MZ twinning, if de novo DNA methylation at a particular locus occurs prior to embryo cleavage, both twins will inherit the same epigenotype at the locus simply because of developmental timing, rather than as a consequence of their genetic identity [14]. This provides a potential explanation for ESS (FIG. 2d). If correct, methylation at ESS loci must be established in the cleavage-stage embryo. If the epigenetic state is maintained during subsequent cellular differentiation, these loci should show systemic inter-individual variation in DNA methylation.
[0072] To test this, the inventors selected 13 ESS regions and assessed systemic inter-individual variation (SIV) by bisulfite pyrosequencing in liver, kidney, and brain of cadaver tissues [13]. Methylation tended to be correlated in these tissues derived from the different embryonic germ layers (FIG. 2e and FIG. 9). Overall, 9 (69%) of the 13 loci showed evidence of SIV. For a broader evaluation, the inventors analyzed a previously published data set from Lokk et al., who profiled multiple tissues from each of several individuals using the HM450 platform [20]. From each of 4 individuals, data were considered for all SNP-free and high-quality HM450 probes for tissues representing the three embryonic germ layer lineages: gall bladder (endodermal), abdominal aorta (mesodermal), and sciatic nerve (ectodermal). Individual- and tissue-specific methylation were estimated as the average across the 3 tissues and the 4 individuals, respectively, and variation was quantitated as the range of these averages (FIG. 3a). Though most probes showed little of either (see histograms, FIG. 3b), tissue-specific variation was generally greater than inter-individual variation (FIG. 3b). To focus on robust SIV, the inventors restricted analysis to probes with inter-individual variation that was at least 0.2 delta β and 3 times greater than tissue-specific variation (FIG. 3b, shaded region). These cutoffs identified 1042 probes with evidence of SIV (Table 1 in Appendix A). Bisulfite pyrosequencing in cadaver tissues (FIG. 3c and FIG. 10) confirmed SIV at 8 (67%) of 12 regions evaluated.
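The SIV screen described above can be sketched as below. This is a minimal Python outline, assuming β values arranged as a nested mapping of individual to tissue; the data layout, function names, example values, and the choice to treat both cutoffs as inclusive are assumptions made for illustration.

```python
def variation_ranges(beta):
    """Return (inter-individual range, tissue-specific range) for one probe.

    `beta[individual][tissue]` holds the beta value for that sample.
    """
    individuals = list(beta)
    tissues = list(beta[individuals[0]])
    # Individual-specific methylation: average across tissues.
    ind_means = [sum(beta[i][t] for t in tissues) / len(tissues) for i in individuals]
    # Tissue-specific methylation: average across individuals.
    tis_means = [sum(beta[i][t] for i in individuals) / len(individuals) for t in tissues]
    return max(ind_means) - min(ind_means), max(tis_means) - min(tis_means)


def shows_siv(beta, min_range=0.2, fold=3.0):
    """Apply the two cutoffs: range of at least 0.2 and 3x the tissue range."""
    inter, tissue = variation_ranges(beta)
    return inter >= min_range and inter >= fold * tissue


# Hypothetical probe: individuals differ, tissues within an individual agree.
siv_probe = {
    "A": {"gall_bladder": 0.80, "aorta": 0.82, "sciatic_nerve": 0.78},
    "B": {"gall_bladder": 0.20, "aorta": 0.22, "sciatic_nerve": 0.18},
    "C": {"gall_bladder": 0.50, "aorta": 0.52, "sciatic_nerve": 0.48},
    "D": {"gall_bladder": 0.65, "aorta": 0.67, "sciatic_nerve": 0.63},
}
# Hypothetical probe: tissues differ, individuals agree.
tissue_probe = {
    name: {"gall_bladder": 0.90, "aorta": 0.10, "sciatic_nerve": 0.50}
    for name in "ABCD"
}
flags = (shows_siv(siv_probe), shows_siv(tissue_probe))
```

Requiring the inter-individual range to dominate the tissue range is what separates systemic variation from ordinary tissue-specific methylation.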
[0073] Perfect overlap between ESS and SIV probe sets was not anticipated for two reasons. First, as they survey only 4 individuals, the Lokk et al. data cannot capture all inter-individual variation. Second, epigenetic states established prior to gastrulation may not be maintained in all differentiated lineages (i.e., early embryonic establishment is necessary but not sufficient for SIV). Nonetheless, relative to the 5388 non-ESS probes with inter-individual range > 0.4, the 1,580 ESS probes were 6.3-fold enriched for SIV (p < 10⁻¹⁰, chi-squared test) (FIG. 3d), supporting the model for the developmental basis of ESS (FIG. 2d).
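A fold enrichment with a chi-squared test of the kind reported above can be computed as in the following sketch. The counts used here are toy values chosen only to exercise the code and are not the counts underlying the reported 6.3-fold enrichment; the function names are hypothetical, and the test shown is a Pearson chi-squared test on a 2x2 table without continuity correction.

```python
import math


def fold_enrichment(hits_a, total_a, hits_b, total_b):
    """Ratio of the hit proportion in set A to the hit proportion in set B."""
    return (hits_a / total_a) / (hits_b / total_b)


def chi_squared_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-squared statistic and P value for a 2x2 table (1 d.f.)."""
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts: 40 of 200 probes in set A and 50 of 1000 probes in set B are hits.
fold = fold_enrichment(40, 200, 50, 1000)
chi2, p_value = chi_squared_2x2(40, 200, 50, 1000)
```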
ESS and SIV sites share genomic and epigenomic features, and are enriched for MEs
[0074] ESS appears to be a marker for individual-specific epigenetic states that are established in the cleavage-stage embryo. Such states could be established under genetic influence, or stochastically; only the latter are consistent with epigenetic metastability [6]. The mQTL data of Grundberg et al. (FIG. 2c) demonstrate that ESS is not generally associated with genetic effects. To test this more generally, additional data sets were evaluated in which the HM450 platform was used to assess mQTL in at least 100 individuals [21]. Volkov et al. [22] profiled SNPs and DNA methylation in adipose tissue of 119 men and identified 15,208 CpG sites with significant cis-mQTL. Shi et al. [23] assessed mQTL in histologically normal lung tissue from 210 individuals and reported estimates of the proportion of methylation variance explained by neighboring SNPs (which the inventors refer to as h²SNP). The inventors considered probes with h²SNP > 0.33 as exhibiting substantial mQTL. Of the 34,304 probes Shi et al. identified with statistically significant cis-mQTL, only 4,306 (12.6%) showed substantial mQTL (FIG. 11). Although both the Grundberg et al. [9] and Volkov et al. data [22] were based on adipose tissue, less than half of the mQTL probes identified by either were identified in both (FIG. 12). Conversely, most of the Shi et al. substantial mQTL probes were also identified by the other two studies (FIG. 12). Moreover, >80% of the probes Shi et al. reported as substantial mQTL in lung also exhibited significant mQTL in independent studies of breast and kidney [23]. For these reasons, the inventors focus the subsequent analyses on the Shi et al. substantial mQTL probe set. (Data on all three mQTL lists in the annotation of ESS and SIV probes are provided in Table 1 in Appendix A.)
[0075] Relative to the probe sets from which they were drawn, those with evidence of either ESS (FIG. 4a) or SIV (FIG. 4b) were enriched for substantial mQTL (15- and 24-fold, respectively, both p < 10⁻¹⁰, chi-squared test). Substantial mQTL affected 25.1% of ESS and 17.9% of SIV probes (Table 1 in Appendix A). In particular embodiments, ESS probes without evidence of substantial mQTL are considered to be candidate MEs. Likewise, since the SIV analysis is analogous to previous ME screens [13, 14], SIV probes without evidence of substantial mQTL are also candidate MEs. Indeed, most of the HM450 probes identified as MEs in a previous screen that employed genome-wide bisulfite sequencing [13] overlap with mQTL-filtered ESS or SIV hits (FIG. 4c). After excluding those with substantial mQTL, ESS probes remained 5.6-fold enriched for SIV (p < 10⁻¹⁰) (FIG. 4c), indicating that the common epigenetic behavior of these probe sets is not due to genetic effects. Importantly, most ESS probes do not show substantial mQTL (FIG. 4a), further evidence that ESS is not simply a consequence of the genetic identity of MZ twins. To directly assess the influence of local sequence on inter-individual variation at ESS loci, several top hits were validated by performing genotyping and methylation analysis by pyrosequencing in peripheral blood DNA of 64 Gambian children [13]. Each genotyping assay targeted a nearby common SNP with minor allele frequency >25%. Two regions negative for mQTL (CYP2E1 and DUSP22) and two with some mQTL-positive probes (SPATC1L and ZFP57) showed substantial inter-individual variation even among individuals of the same genotype (FIGS. 5a-5d). These regions show strong linkage disequilibrium, indicating that SNP genotype is generally an indicator of haplotype. Notably, the SNP that was genotyped at ZFP57, rs3129057, was recently reported to be the strongest index SNP in phase with haplotype-dependent allele-specific methylation (Hap-ASM) in the region [17]. Significant mQTL was detected for ESS CpGs at DUSP22, SPATC1L, and ZFP57 (FIGS. 5b, 5c, and 5d). At these same loci, however, inter-individual variance of methylation was associated with haplotype, providing the novel insight that the local sequence context can influence epigenetic metastability. The pyrosequencing results were further validated by clonal bisulfite sequencing for selected individuals at SPATC1L and ZFP57 (FIGS. 5e and 5f), confirming that even in regions of substantial mQTL, individuals with the same local sequence context can exhibit dramatic inter-individual variation in DNA methylation.
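The comparison of methylation among individuals sharing a genotype can be sketched as follows. This is an illustrative Python outline only; the genotype labels and methylation values are invented, and the function name is hypothetical.

```python
from collections import defaultdict


def range_by_genotype(samples):
    """Methylation range among individuals sharing the same SNP genotype.

    `samples` is a list of (genotype, methylation) tuples, one per individual.
    """
    groups = defaultdict(list)
    for genotype, methylation in samples:
        groups[genotype].append(methylation)
    return {g: max(values) - min(values) for g, values in groups.items()}


# Hypothetical individuals genotyped at one nearby SNP.
samples = [
    ("AA", 0.15), ("AA", 0.70), ("AA", 0.40),
    ("AG", 0.30), ("AG", 0.85),
    ("GG", 0.55), ("GG", 0.60),
]
ranges = range_by_genotype(samples)
```

A wide range inside a single genotype group is the pattern described above: variation that the local sequence context alone does not account for.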
[0076] Relative to negative control probes with inter-individual variation comparable to ESS probes but no evidence of ESS or SIV (FIG. 13), ESS and SIV probes were 3.6- and 5.0-fold enriched for CpG islands, respectively (FIG. 4d) (P < 10⁻¹⁰ for both comparisons). Likewise, ESS and SIV probes were 3.3- and 2.4-fold enriched in subtelomeric regions (<2 Mb from chromosome ends) (FIG. 14a) (P < 10⁻¹⁰ for both comparisons). Since subtelomeric regions are rich in genetic variation, the inventors tested whether the subtelomeric enrichment might be due to mQTL. However, similar enrichments were found in the ESS and SIV probe subsets not associated with substantial mQTL (FIG. 14b). The ESS and SIV gene lists each included 6 genomically imprinted genes, no different from what is expected by chance; imprinted loci among these two classes are ANO1, GNAS, GRB10, NAP1L5, NLRP, and VTRNA2-1 (ESS) and DLGAP2, KCNQ1OT1, NAP1L5, NLRP2, and VTRNA2-1 (SIV) (www.geneimprint.com). Gene set enrichment analysis (GSEA) using data from The Cancer Genome Atlas [24] showed that, relative to negative controls, both ESS and SIV probes are more likely to be annotated to genes expressed in a wide range of tumors (FIG. 4e). Across 111 reference epigenomes encompassing a wide range of cell lines and primary tissues [25], both probe sets were enriched for active promoters and depleted for enhancers (FIG. 4f). ESS and SIV CpGs were identified independently but exhibit highly overlapping genomic and epigenomic features, indicating that they share similar fundamental biological properties.
Periconceptional environment affects establishment of methylation at ESS and SIV CpGs
[0077] Mouse [10, 11, 26] and human studies [13, 14, 27, 28] have shown that establishment of DNA methylation at MEs is sensitive to periconceptional environment. Previous studies tested this using a ‘natural experiment’ exploiting seasonal variation in maternal nutritional status in The Gambia [12]. Here, an independent set of 128 blood samples was analyzed; these were collected from 2-year-old Gambian participants in the Early Nutrition and Immune Development (ENID) trial [29] who were conceived at the peak of either the rainy or the dry season. Based on the notion that MEs are largely free of genetic influence, it was tested whether ESS and SIV probes without substantial mQTL show season-of-conception effects. Surprisingly, there were comparable and highly significant enrichments for season-of-conception effects in ESS and SIV probe sets regardless of substantial mQTL. The inventors therefore examined the unfiltered probe sets, and found that both ESS and SIV probes, but not negative control probes, were significantly enriched for season-of-conception effects (P = 3.3 × 10⁻⁹ and 1.4 × 10⁻²³, respectively, FIG. 6a). Consistent with previous studies of candidate MEs in independent cohorts [12-14], being conceived during the rainy season generally resulted in higher levels of DNA methylation at both ESS and SIV loci (FIG. 6b). These findings further support the conjecture that, regardless of mQTL, many ESS and SIV probes are MEs. This season-of-conception effect also provides independent support for the model (FIG. 2d) that ESS arises due to early embryonic establishment of epigenotype.
Prospective associations between DNA methylation in blood and later cancer
[0078] Although the probe signature of ESS was identified from a study of adult twins, it appears to be a consequence of methylation establishment in the early embryo and hence must be stable from embryonic development to adulthood. Since ESS is associated with genes expressed in tumors (FIG. 4e), it was considered whether inter-individual variation in DNA methylation at ESS loci predicts risk of later cancer in adults. To test this, data were examined from the Melbourne Collaborative Cohort Study (MCCS), which enrolled 41,514 healthy adult volunteers between 1990 and 1994 [30]. Peripheral blood samples and information on health-related behaviors were collected at enrollment, and incident cases of cancer were ascertained prospectively by linkage to the Victorian Cancer Registry, which receives mandatory notification of all new cancer cases in Victoria, Australia. The systemic nature of inter-individual variation at ESS probes enabled use of DNA methylation in peripheral blood as an indicator of methylation in various tissues. A control was matched to each incident case on sex, country of birth, and age at enrollment, using density sampling. Using the Illumina HM450 platform, DNA methylation at baseline was assessed on 3464 case-control pairs overall in studies of seven types of cancer (Table 2); average time from sample collection to diagnosis was 9.2 ± 4.9 years (mean ± sd).
[0079] Table 2: MCCS HM450 data sets, and average years from sample collection to diagnosis
Figure imgf000026_0001
[0080] Regardless of potential genetic influences, the data indicate that inter-individual epigenetic variation at ESS probes occurs systemically and is stable over time. The inventors therefore evaluated ESS probes without regard to mQTL. Combined effects across multiple CpGs (i.e. differentially methylated regions) are more likely to demonstrate long-term stability and affect gene expression [31]. Hence, rather than analyze individual probes, the inventors focused on the 198 clusters of multiple ESS probes separated by no more than 500 bp (523 CpGs total, Table 3 in Appendix B). Analysis of expression [32] and methylation data [9] from Grundberg et al. showed that at many ESS clusters, average methylation in adipose tissue is associated with gene expression, not only in adipose tissue but also in skin and lymphoblastoid cell lines (FIG. 15). These results provide evidence that methylation at ESS loci in one tissue yields information about epigenetic regulation in additional tissues. To test for probe-specific associations between peripheral blood DNA methylation at baseline and risk of specific cancer diagnosis, the inventors performed conditional logistic regression, adjusting for estimated leukocyte composition (using the Houseman algorithm [33]) and other covariates. Statistical significance of associations at the cluster level was then evaluated by permutation testing. Relative to negative control clusters, the 198 ESS clusters were four-fold enriched for associations with later cancer (P = 1.5 x 10^-5). To minimize multiple testing, the inventors focused on the 10 ESS clusters with the largest number of CpG sites. Remarkably, at 7 of these, peripheral blood DNA methylation at baseline was significantly associated with later cancer (FIG. 7 and Table 4 in Appendix C); three (SPATC1L, VTRNA2-1, and DUSP22) were significantly associated with multiple types of cancer. Elevated methylation in a cluster of 6 CpG sites at SPATC1L was associated with reduced risk of colorectal and prostate cancer (FIGS. 7b and 7f), and elevated methylation in a cluster of 12 CpGs at VTRNA2-1 was associated with higher risk of lung cancer and mature B-cell neoplasm (FIGS. 7d and 7e). Interestingly, elevated methylation in a cluster of 8 CpG sites at DUSP22 was associated with increased risk of mature B-cell neoplasm (FIG. 7e) yet reduced risk of urothelial cell carcinoma (FIG. 7g). The 154 negative control clusters showed few and relatively weak associations with later cancer (FIG. 16). Results are also shown for the 128 ESS clusters that included no probes with substantial mQTL (FIG. 17).
Significance of Certain Embodiments
[0081] Because they offer the potential to test the hypothesis that inter-individual epigenetic variation (in the absence of genetic variation) determines human phenotype, MZ twins have long been a focus of epigenetic investigation [34-38]. Such studies depend upon the existence of stochastic (i.e. not genetically mediated) epigenetic differences within pairs of MZ twins. Conversely, a set of human genomic regions has been identified herein at which MZ twins exhibit epigenetic similarity that is not genetically mediated. Based on the frequent occurrence of SIV in ESS regions, and their epigenetic plasticity to periconceptional environment, in particular embodiments of the disclosure ESS arises due to establishment of DNA methylation prior to embryo cleavage during MZ twinning.
[0082] Accordingly, at ESS loci one would expect greater epigenetic similarity in MZ twins that separate later compared to those that separate earlier. This can be tested based on chorionicity; cleavage before day 4 of gestation results in MZ twins each with their own placenta (dichorionic); later cleavage results in a shared placenta (monochorionic). In one of the earliest genome-scale studies of DNA methylation in twins, Kaminsky et al. [36] studied buccal epithelial cells and reported that monochorionic MZ twins exhibit greater epigenetic discordance than dichorionic, contrary to the inventors’ thesis. A slightly larger study, however, recently assessed genome-scale DNA methylation in blood and came to the exact opposite conclusion [39]. Given that monochorionic twins share hematopoietic stem cells during fetal development [40], blood is not the ideal tissue in which to study epigenetic correlates of chorionicity. Definitive studies in non-blood tissues and focused on ESS regions are needed. Another predicted consequence of ESS is that estimates of methylation heritability from twin studies will be inflated relative to those from family-based designs. Indeed, whereas Grundberg et al. estimated median genome-wide narrow-sense h² = 0.34 [9], a recent large family-based study (also using the HM450 platform) estimated an average genome-wide h² = 0.19 [41].
[0083] After decades of epigenetic studies in MZ twins, it is remarkable that ESS has not been previously reported. Despite their seemingly unsupportive findings in monochorionic vs. dichorionic twins, Kaminsky et al. proposed that in addition to their genetic identity, “epigenetic similarity at the time of blastocyst splitting may also contribute to the phenotypic similarities in MZ co-twins,” exactly as the findings suggest. The excessive h² estimates in twin studies of epigenetic heritability have, in fact, been waiting to be discovered. Grundberg et al. obtained but did not comment upon HM450 probe-specific h² estimates > 1. Likewise, in a more recent study using the HM450 array to assess genome-scale DNA methylation in whole blood of MZ and DZ twins, van Dongen et al. [42] reported 3792 probes for which their heritability model failed to converge. Of the 631 of these “NA” probes among the high-variance set from which the ESS probes were drawn, 365 (58%) are classified as ESS. Hence, two recent large studies of DNA methylation in MZ and DZ twins detected but did not explore these very interesting genomic regions.
[0084] The findings indicate complex relationships among genetic variation, ESS, and epigenetic metastability. To clarify, mQTL assesses pairwise associations between methylation at a specific CpG site and a specific genetic variant [19], while hap-ASM describes allelic biases in methylation that are associated with haplotype [17]. Because of the linkage disequilibrium among neighboring SNPs and the regional correlation of CpG methylation, mQTL (specifically, cis mQTL) provides a means of assessing hap-ASM [17]. The analyses focused on mQTL because many targeted analyses of hap-ASM [21] show poor overlap with probes on the HM450 platform [17]. The results show that ESS loci are enriched for mQTL. Although this may seem to suggest that ESS is a consequence of genetic determination, several lines of evidence to the contrary are provided herein. According to the model (FIG. 2d), mQTL is consistent with ESS, because any epigenetic state (whether under genetic influence or not) that is established prior to embryo cleavage during MZ twinning and thereafter maintained with high fidelity will exhibit ESS. Seminal studies in isogenic mice led to the concept that inter-individual variation at MEs is determined stochastically, free of genetic influence [43]. The characterization of ESS loci (many of which appear to be MEs) suggests the novel concept that establishment of epigenotype at MEs need not be completely free of genetic influence. In particular, mQTL and epigenetic metastability appear to occur at the same loci (FIG. 5) and ESS loci - even those associated with substantial mQTL - are labile to periconceptional environment (FIG. 6). Like nutrition [10-12] and other environmental influences [27, 44], haplotype (i.e. local sequence context) may perhaps be viewed as a determinant of the microenvironment that shifts the probability distribution of stochastic methylation processes during early embryonic development. Building upon this, the validation studies indicate allelic biases in epigenetic metastability. In the clearest example, at ZFP57 (FIG. 5d), the most common allele in the population showed greater inter-individual variation, consistent with the thesis that propensity for stochastic epigenetic variation may be both genetically inherited and evolutionarily advantageous [45].
[0085] It may seem surprising that ESS loci include some genomically imprinted genes. Based on their parent-of-origin specific epigenetic regulation one would expect the mean MSE at imprinted loci to be similar in MZ and DZ twin pairs. The data at VTRNA2-1, however, (FIG. 1a) show this is clearly not the case. Known imprinted genes were not significantly enriched among ESS loci, but there is evidence that two more of the top hits (PAX8 and DUSP22) are imprinted in humans, in at least some tissues [46, 47]. In particular embodiments, inter-individual variation at imprinted loci may in some cases occur stochastically; for example, the VTRNA2-1 hypomethylation that is observed in 10-20% of individuals [13, 48, 49] may reflect loss of the maternally-inherited methylation mark in the early embryo. At the population level many ESS loci exhibit clusters of three methylation states (FIG. 1 and FIG. 15). This suggests these loci behave as bistable epigenetic switches (i.e. the combination of two alleles yields three preferred average states). This is actually consistent with the bimodal distribution of somatic CpG methylation genome-wide (i.e. methylation at most loci is either close to 0 or close to 100%). In this regard the presence of imprinted loci - paradigmatic bistable epigenetic switches - among ESS loci is not surprising. Although identified purely on the basis of the methylation MSE ratio of adult DZ to MZ twins, ESS probes are 3-fold enriched in subtelomeric regions. This makes sense; subtelomeric regions are packed with transposable elements, known to be targets of de novo DNA methylation in the pre-implantation embryo [50]. A similar enrichment was observed in the genome-wide screen for MEs [13], but most of those hits were filtered out due to proximity to SNPs. The current results, showing that the subtelomeric enrichment is not associated with mQTL, suggest that this filtering was overly conservative. Intriguingly, since epigenetic regulation in subtelomeric regions regulates telomere shortening [50], the Gambian data showing season-of-conception effects at ESS regions suggest that periconceptional events could influence the process of telomere maintenance, deregulation of which is an almost universal characteristic of aging and cancer.
[0086] Because of their early embryonic establishment and systemic inter-individual variation, ESS loci are attractive candidate regions for studies of epigenetics and disease. The HM450 array was built upon a platform initially focused on regions aberrantly methylated in cancer, motivating the focus on cancer. Methylation at three clusters (SPATC1L, VTRNA2-1, and DUSP22) showed significant associations with two types of cancer. Little is known about the speriolin-like protein SPATC1L, but elevated methylation at the small noncoding RNA VTRNA2-1 has previously been reported to predict poor prognosis in acute myeloid leukemia [49] and other types of cancer [13], consistent with the positive association that was found between VTRNA2-1 methylation and lung cancer and mature B-cell neoplasm (FIGS. 7d & 7e). Likewise, rearrangements disrupting the dual specificity phosphatase gene DUSP22 are associated with T-cell and B-cell lymphoma [51], consistent with the finding of a positive association between methylation at DUSP22 and mature B-cell neoplasm (FIG. 7e). Methylation at DUSP22 was also negatively associated with risk of urothelial cell carcinoma (FIG. 7g), reminiscent of situations in which the same genetic variant is associated oppositely with risk of different types of cancer [52]. ZFP57 encodes a master regulator of genomic imprinting and is epigenetically labile to periconceptional nutrition [13]. The finding that elevated methylation at ZFP57 is associated with a reduced risk of later colorectal cancer (FIG. 7b) is consistent with data suggesting ZFP57 is an oncogene [53]. Likewise, PF4 (platelet factor 4) inhibits tumor growth and metastasis by suppressing neovascularization [54], consistent with the positive association found between PF4 methylation and later urothelial cell carcinoma (FIG. 7g). Although significant associations between methylation and later cancer were detected, the effect sizes in most cases were modest. It is likely that effects of some epimutations are limited to specific cancer subtypes. Likewise, these epigenetic variants likely interact with genetic variation and environmental exposures to affect cancer risk. It is possible that some of these associations might reflect mQTL/hap-ASM association with cancer-associated SNPs identified in GWAS. Targeted studies in larger cohorts are needed. Nonetheless, the data indicate that individual epigenetic variation at ESS loci has phenotypic consequences: methylation in peripheral blood is associated with risk of specific cancer diagnoses years later. Broader-scale identification of ESS loci throughout the genome may enable epigenetic risk models for cancer prediction, even during early life.
[0087] The findings indicate a partial explanation for missing heritability, in specific embodiments. Because heritability is defined as the phenotypic variance explained by genetics divided by the total phenotypic variation in a population [16], rather than something heritable per se, what is actually missing is genetic variance [55]. Twin models of estimating heritability rely on the assumption that the greater phenotypic similarity of MZ relative to DZ twin pairs is attributable to their genetic identity. Hence, to the extent that epigenetic variation at ESS loci influences phenotype, estimates of heritability based on twin studies will be inflated. Indeed, twin studies often yield higher heritability estimates than family studies [56, 57]. Further, although heritability does not definitively connote transgenerational inheritance, transmission of sequence-independent epigenetic events across generations could contribute to missing heritability [55]. In this regard, genomically imprinted loci that behave as epi-alleles (such as VTRNA2-1) potentiate transgenerational inheritance of epigenetic traits, in particular embodiments.
Conclusions
[0088] Overall, the data show for the first time that, independent of their genetic identity, human MZ twin pairs share an additional level of similarity at the epigenetic level. ESS appears to result from establishment of mitotically heritable epigenetic states prior to embryo cleavage during MZ twinning. Because of ESS, human MZ twins clearly cannot be viewed as the epigenetic equivalent of isogenic inbred mice, which originate from separate zygotes. To the extent that epigenetic variation at ESS loci influences human phenotype, as the data indicate, the existence of ESS establishes a link between early embryonic epigenetic development and adult disease and may call into question heritability estimates based on twin studies.
Examples of Methods
Identification and characterization of ESS and SIV probes
Analysis of twin, SIV, and mQTL data sets
[0089] Grundberg et al. used the Illumina HM450 array to assess methylation in adipose tissue from 662 female twins of European descent, including 97 MZ pairs and 162 DZ pairs. Methylation scores were normalized by quantile normalization. SNP-associated probes were excluded, leaving 344,303 probes [9]. The analyses focused on the 34,405 probes in the top 10% by variance. To calculate the metrics used, the inventors pooled the MZ and DZ twins into a single population, and calculated probe-specific β-value ranges (max - min) from this population. Individuals in this population were randomly paired to simulate unrelated individuals (RZ). For each probe, the inventors calculated the MSE of MZ, DZ, and RZ pairs from the line of identity (i.e. the mean square difference between twins). For n twin pairs, each comprising twins A and B,
Figure imgf000032_0001
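The formula itself survives only as an image reference. Read literally, the verbal definition above (the mean square difference between co-twins over n pairs) corresponds to the expression below. The symbols β_{A,i} and β_{B,i}, denoting the methylation β-values of twins A and B in pair i, are notation assumed here and are not taken from the original figure; any constant scaling factor would cancel in the DZ/MZ and MZ/RZ ratios used afterwards.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \beta_{A,i} - \beta_{B,i} \right)^{2}
```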
[0090] Criteria for ESS probes were DZ/MZ MSE > 2 and overall inter-individual β-value range (max - min) > 0.4; additionally, 14 probes with MZ/RZ MSE > 0.5 were excluded. Probe-specific h² estimates from Ref. [9] were kindly provided by Elin Grundberg. Lokk et al. used the Illumina HM450 array to assess methylation in 17 tissues from four autopsied individuals [20]. The inventors analyzed the methylation data for three tissues representing the three germ layers: gall bladder (endodermal), abdominal aorta (mesodermal), and sciatic nerve (ectodermal). Starting with the 344,303 high-quality probes that were the basis of the Grundberg et al. analysis, the inventors excluded any probes with missing values in any of the 12 samples (4 individuals, 3 tissues), leaving 344,151 probes. Inter-individual variation was calculated by taking the mean β-value across each individual’s 3 tissues, then calculating the range of these means across all 4 individuals. Tissue-specific variation was calculated by taking the mean β-value of each tissue over all individuals, then calculating the range of these means across all 3 tissues. Negative control probes were selected by maintaining the criterion of inter-individual range > 0.4 in the Grundberg et al. data set, but requiring MZ/RZ MSE > 0.5 and (in the Lokk data set) requiring tissue-specific variation to be at least twice inter-individual variation (FIG. 13). Figures were made in R 3.3.1 using ggplot2 [58]. For the analysis of the Shi et al. mQTL data [23], senior author Maria Landi kindly shared their estimates of the proportion of methylation variance (h²) explained by neighboring SNPs.
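The ESS selection criteria described above can be sketched in a few lines of Python. This is a minimal illustration and not the inventors' code: the function names, the single random pairing used for RZ, and the toy β-values are assumptions made for the example, and the MSE is taken as the mean square difference between pair members.

```python
import random

def pair_mse(pairs):
    """Mean square difference between the two members of each pair."""
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

def random_pairs(values, rng):
    """Shuffle the pooled individuals and pair them off as mock unrelated (RZ) pairs."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]

def is_ess_probe(mz_pairs, dz_pairs, rng, min_ratio=2.0, min_range=0.4, max_mz_rz=0.5):
    """Apply the three stated criteria to the beta-values of one probe."""
    pooled = [value for pair in mz_pairs + dz_pairs for value in pair]
    beta_range = max(pooled) - min(pooled)
    mse_mz = pair_mse(mz_pairs)
    mse_dz = pair_mse(dz_pairs)
    mse_rz = pair_mse(random_pairs(pooled, rng))
    # Ratios are compared by cross-multiplication to avoid dividing by a zero MSE.
    return (
        mse_dz > min_ratio * mse_mz        # DZ/MZ MSE > 2
        and beta_range > min_range         # inter-individual range > 0.4
        and mse_mz <= max_mz_rz * mse_rz   # exclude probes with MZ/RZ MSE > 0.5
    )

# Toy probe: MZ co-twins nearly identical, DZ co-twins discordant.
mz = [(0.10, 0.10), (0.50, 0.52), (0.90, 0.88)]
dz = [(0.10, 0.50), (0.90, 0.50), (0.50, 0.10)]
print(is_ess_probe(mz, dz, random.Random(0)))
```

In a real analysis the same test would be applied probe by probe across the high-variance probe set, and the random pairing might be repeated or averaged; the text does not say which.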
Validation studies
[0091] Quantitative analysis of selected candidate MEs was performed by bisulfite pyrosequencing [59] across endodermal (liver), mesodermal (kidney), and ectodermal (brain) tissue in 17 Asian cadavers [13]. Prior to use, all pyrosequencing assays were validated for linearity and sensitivity using human genomic methylation standards [12, 13]. To assess SIV, for each pyrosequencing assay methylation was averaged across multiple CpG sites for each sample, and inter-tissue correlation coefficients were calculated across the 17 cadavers (kidney vs. liver, brain vs. liver, and brain vs. kidney). Regions yielding an inter-tissue correlation of R² > 0.50 (R > 0.71) were considered positive for SIV [12]. Pyrosequencing was also used to perform SNP genotyping at specific loci [13, 14]. Associations between SNP genotype and average methylation were evaluated by linear regression (SAS), and effects of genotype on variance were evaluated by Bartlett’s test (implemented in R). Clonal bisulfite sequencing was performed as previously described [60], using appropriate primers.
Gene set enrichment analysis
[0092] For each of the probe sets analyzed (e.g. ESS, SIV, and negative controls), associated gene sets were determined based on the HM450 probe annotations. For 24 cancer types profiled by The Cancer Genome Atlas (TCGA) [24], the inventors downloaded the RNA-Seq gene expression profiles using the firebrowse.org portal [61], selected genes with maximum FPKM across all samples exceeding 1, then generated a rank file for Gene Set Enrichment Analysis (GSEA) [62] as previously described [63] by comparing the tumor samples and the adjacent normal samples. Next, GSEA was run for each cancer type rank file and each CpG-associated gene set; enrichments with q-value < 0.25 were considered significant. For visualization purposes, the inventors represented the Normalized Enrichment Score (NES) for each significant enrichment; by convention, NES for enrichments with q-value >= 0.25 was set to 0.
Epigenomic distribution of CpG probes
[0093] For each of the probe sets analyzed (e.g. ESS, SIV, and negative controls), genomic coordinates on the human genome build UCSC hg19 were determined based on the HM450 probe definitions. The inventors considered fifteen-state genome-wide epigenomic partitions for 127 distinct epigenomes as defined by the NIH Epigenomic Roadmap Consortium [25], based on a collection of uniformly collected histone modification ChIP-Seq profiles and using the ChromHMM algorithm [64]. Using the BEDTOOLS software, the relative distribution of each CpG probe set over the fifteen epigenomic states was determined for each distinct epigenome.
Evaluating the relationship between DNA methylation and gene expression at ESS clusters
[0094] As described above, HM450 methylation data were used for subcutaneous adipose tissue from MZ and DZ twins [9]. Gene expression data in skin, adipose tissue, and lymphoblastoid cell lines from the same set of twins were downloaded from ArrayExpress (accession # E-TABM-1140) [32, 65]. DNA methylation (β-values) was first averaged across probes within each ESS cluster. Correlation between cluster-level DNA methylation and associated gene expression was evaluated using the Spearman (rank) correlation coefficient, as implemented in the Python scientific libraries.
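The two steps just described, averaging β-values within a cluster and then computing a rank correlation against expression, can be illustrated as follows. The sketch is written against the Python standard library only, so that it is self-contained; in practice a library routine such as `scipy.stats.spearmanr` would normally be used. The function names and toy values are assumptions made for the example.

```python
from math import sqrt

def cluster_means(beta_rows):
    """Average beta-values across the probes (rows) of one cluster, per sample (column)."""
    return [sum(column) / len(column) for column in zip(*beta_rows)]

def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Two probes in one cluster, three samples; expression falls as methylation rises.
methylation = cluster_means([[0.2, 0.4, 0.9], [0.4, 0.6, 0.7]])
expression = [5.0, 3.0, 1.0]
print(spearman(methylation, expression))
```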
Season of conception analyses
Sample selection and preparation
[0095] The data were collected as part of a study in The Gambia (in sub-Saharan West Africa) aimed at identifying biomarkers and understanding mechanisms for the relationship between aflatoxin exposure and child stunting. A total of 251 blood samples (3 ml) were collected from children aged 2 years as part of the Early Nutrition and Immune Development (ENID) Trial [29].
Bisulfite conversion and DNA methylation assay
[0096] DNA was extracted from white blood cells following a previously described protocol [12]. An additional 6 samples were included as technical replicates. Genome-scale methylation profiles were obtained using HM450 Infinium methylation bead arrays (Illumina, San Diego, USA). DNA samples (500 ng) were bisulfite-modified using the EZ DNA Methylation kit (Zymo Research, D5001) following manufacturer’s instructions for the HM450 array. Modified DNA was stored at -20 °C until used. Amplification, labelling, hybridization and scanning were performed as previously described [13].
Methylation data QC and normalization
[0097] Methylation data pre-processing was performed using the R/Bioconductor minfi package [66], along with other functions and bespoke R scripts as appropriate (R version 3.2.3; Bioconductor version 3.2). Briefly, data for 485,512 HM450 probes measured in 257 samples were imported from raw IDAT files. Analysis of internal HM450 bisulfite conversion control probes revealed one sample with poor bisulfite conversion efficiency, which was excluded. Functional normalization [67] was used to reduce unwanted technical variation using control probes on the array. Seven samples with discordant sex were removed following sex prediction based on median values of measurement on X and Y chromosomes using the minfi addSex() function. Using a probe detection p-value threshold of 0.01, 5 samples failing in >1% of probes were removed, along with 32,488 probes failing in 1 or more samples. All technical replicates showed beta-value Pearson correlations > 0.994 and visual inspection of replicate correlation scatterplots revealed no anomalies. Following removal of technical replicates and X and Y chromosome probes, methylation data for 442,869 probes measured in 239 individuals remained. Correction for differences in HM450 beta-value distributions due to type-I and type-II probes on the array was conducted using the Beta Mixture Quantile Dilation (BMIQ) method [68]. Finally, 28,509 cross-reactive probes [69] and 41,334 probes within 10 bp of common (MAF > 1%) African SNPs identified using the R package Illumina450ProbeVariants.db were removed. After all QC and filtering, 373,026 probes remained.
Identification of CpGs associated with Gambian Season of Conception
[0098] Statistical analysis was performed to identify HM450 probes associated with Gambian season of conception, described hereafter as ‘season of conception differentially methylated probes’ (SoC-DMPs). This analysis was restricted to 128 individuals conceived at the peak of either the Gambian dry (February-April) or rainy (July-September) seasons (based on date of birth). These seasonal windows have been used in previous studies investigating seasonal effects on DNA methylation [12, 13]. Robust linear regression (the R rlm function) was used to model the association between SoC and DNA methylation (measured as HM450 beta-values), in order to account for potential heteroscedasticity and influential outliers [70, 71]. The regression model included infant sex, and the first 3 principal components identified in an unsupervised principal components analysis of genome-wide methylation (FIG. 18). The model was additionally adjusted for the effects of cell heterogeneity using an established method that uses methylation data to estimate the relative proportions of 6 white blood cell types [33]. Additional analyses were performed i) without cell composition adjustment to assess sensitivity to cell composition effects; and ii) with the inclusion of one further principal component (PC 8), which was associated with SoC and would therefore be expected to dilute the SoC effect (hence providing a more conservative estimate of SoC-associated ME enrichment). A correction for multiple testing was applied by controlling the false discovery rate (FDR).
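The text does not name the FDR procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up method and returns an adjusted p-value for each probe; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical probe-level p-values from the season-of-conception regression.
probe_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(probe_p)
significant = [i for i, q in enumerate(adjusted_p) if q < 0.10]
print(adjusted_p, significant)
```

Probes whose adjusted value falls below the chosen FDR threshold would then be carried forward as differentially methylated.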
Enrichment analysis
[0099] Probes with an FDR < 10% were identified as SoC-DMPs. Different sets of HM450 probes were tested for SoC-DMP enrichment using Fisher’s exact test, against a background of all 373,026 probes passing QC and filtering steps. Enrichment results for the main analysis are presented in FIG. 6a. Additional enrichment tests were performed without adjustment for cell composition and with the inclusion of one further principal component (see previous section).
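An enrichment test of this kind reduces to a 2 x 2 table: probes inside or outside the set of interest, crossed with SoC-DMP or not. The sketch below computes a one-sided Fisher's exact p-value as the upper tail of the hypergeometric distribution. Whether the original analysis was one- or two-sided is not stated, so the one-sided form is an assumption, as are the function name and the small example counts.

```python
from math import comb

def fisher_enrichment_p(hits_in_set, set_size, hits_total, background_size):
    """One-sided Fisher's exact p-value: P(X >= hits_in_set) under the hypergeometric null.

    hits_in_set     -- number of significant probes inside the probe set
    set_size        -- number of probes in the probe set
    hits_total      -- number of significant probes in the whole background
    background_size -- number of probes in the background
    """
    max_hits = min(set_size, hits_total)
    denominator = comb(background_size, set_size)
    tail = sum(
        comb(hits_total, k) * comb(background_size - hits_total, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    # Exact integer arithmetic; may be slow for very large backgrounds.
    return tail / denominator

# Small illustrative counts: 3 of 5 set probes are hits, 5 hits among 20 background probes.
print(fisher_enrichment_p(3, 5, 5, 20))
```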
Identification of ESS clusters associated with cancer risk
Sample collection, data generation, and quality control
[0100] Methylation data were available for participants in one of seven case-control studies of breast, colorectal, kidney, lung, mature B-cell malignancies, prostate or urothelial cancer [72-74] nested within the Melbourne Collaborative Cohort Study [30]. DNA was extracted from samples of peripheral blood mononuclear cells (PBMC), buffy coats or dried blood spots (DBS) stored on Guthrie card diagnostic cellulose filter paper. Samples were collected at recruitment to the cohort (baseline) or at follow-up approximately ten years later. Controls were individually matched to cases on age (they had to be free of cancer at an age within one year of the age at diagnosis of the corresponding case), sex, ethnicity and blood DNA source (DBS, PBMC or buffy coat). For all but the colorectal cancer study, controls were matched to cases on year of birth. For the lung cancer study, controls were matched on smoking status at the time of blood collection. Case-control pairs were placed at random positions on the same chip of the assay to minimize batch effects.
[0101] DNA was extracted from mononuclear cells using QIAamp mini spin columns (Qiagen, Hilden, Germany). Dried blood spot DNA was extracted as previously described [75]. Briefly, twenty blood spots of 3.2 mm diameter were punched from the Guthrie card and lysed in phosphate buffered saline using TissueLyser (Qiagen). The resulting supernatant was processed using Qiagen mini spin columns according to the manufacturer’s protocol. The quality and quantity of DNA was assessed using the Quant-iT™ Picogreen® dsDNA assay measured on the Qubit® Fluorometer (Life Technologies, Grand Island, NY), with a minimum of 0.3 µg DNA considered acceptable for methylation analysis.
[0102] Bisulfite conversion was performed using the Zymo Gold single tube kit (EZ DNA Methylation-Gold kit, Zymo Research, Irvine, CA) according to the manufacturer’s instructions. Post-conversion quality control was performed using SYBR Green-based quantitative PCR, an in-house assay designed to determine the success of bisulphite conversion by comparing amplification of the test sample with positive and negative controls. All samples were processed in the same laboratory on 96-well plates, each using eight HM450 BeadChips to assay batches of 12 samples.
[0103] The methylation data were background corrected and normalized based on internal control probes using the manufacturer’s background correction, with the R library minfi [66]. The inventors also applied subset-quantile within-array normalization (SWAN) [76] to correct for technical discrepancies between type I and type II probes on the assay. A β-value (interpreted as percentage methylation) was calculated for each CpG site using minfi. Methylation measures with a detection p-value higher than 0.01 were considered missing. Samples with more than 5% missing values were excluded; then, CpGs that were missing for more than 20% of samples were excluded. β-values were transformed into M-values for analysis: M = log2(β/(1-β)).
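The β-to-M transformation is a base-2 logit. A minimal sketch is given below; the small offset `eps`, which keeps the logarithm finite when β is exactly 0 or 1, is an assumption added for the example and is not described in the text.

```python
from math import log2

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value (0..1) to an M-value, M = log2(beta / (1 - beta))."""
    clamped = min(max(beta, eps), 1.0 - eps)  # assumed guard against log of 0 or division by 0
    return log2(clamped / (1.0 - clamped))

def m_to_beta(m_value):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m_value / (2.0 ** m_value + 1.0)

for beta in (0.1, 0.5, 0.8):
    print(beta, round(beta_to_m(beta), 3))
```

A β-value of 0.5 maps to M = 0, and values above or below 0.5 map symmetrically to positive or negative M-values.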
Logistic regression and permutation analyses
[0104] For each CpG probe set, its clustering structure was first determined by considering all CpGs within 500 bp of each other; groups of at least 2 such CpGs were considered clusters. ESS, SIV, and negative control cluster annotations are provided in Table 3 in Appendix B. For each of the 7 case-control cohorts described above, normalized DNA methylation data at the CpG probe level were obtained. Since methylation was measured in peripheral blood DNA, cell type composition estimates using established methods [33] were also included for each sample (specifically, proportions of B-cells, granulocytes, monocytes, NK cells, CD4 T-cells, and CD8 T-cells). Clinical data variables indicating body mass index (BMI), alcohol consumption, and smoking status were included for each subject. Many ESS probes showed highly non-normal methylation distributions within each cohort. To avoid incorrect assumptions about the data distribution, the M-values for each probe were rank-normalized in ascending order across all samples using the R statistical system. Using conditional logistic regression as implemented in the R survival package, the inventors determined for each probe the significance of the association between methylation rank and cancer status, in a model including both cell type proportion and the clinical variables described above. For the purposes of permutation testing (see below), associations were considered statistically significant at p<0.05.
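The clustering rule at the start of this paragraph can be sketched as a single pass over sorted probe coordinates. The sketch assumes that "within 500 bp of each other" means that consecutive CpGs on the same chromosome are chained while the gap between neighbors is at most 500 bp; the function name and the coordinates are hypothetical.

```python
def cluster_cpgs(positions, max_gap=500, min_size=2):
    """Group CpGs into clusters of neighboring sites.

    positions -- iterable of (chromosome, coordinate) tuples
    Returns a list of clusters, each a list of (chromosome, coordinate) tuples.
    """
    clusters = []
    current = []
    for chrom, pos in sorted(positions):
        # Close the running group on a chromosome change or a gap larger than max_gap.
        if current and (chrom != current[-1][0] or pos - current[-1][1] > max_gap):
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append((chrom, pos))
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

probes = [("chr1", 100), ("chr1", 400), ("chr1", 1000),
          ("chr1", 5000), ("chr2", 5100), ("chr2", 5300)]
print(cluster_cpgs(probes))
```

Ranking clusters by `len(cluster)` would then give the "top 10 clusters by total number of CpGs" used in the permutation analysis.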
[0105] These probe-specific p-values were then used to evaluate the statistical significance of associations at the cluster level. Analysis was focused on the top 10 clusters by total number of CpGs. The significance of two event types was assessed. For each cluster (C), cancer type (T), and random assignment of the case-control status for each matched pair (S_rnd), the inventors denoted the number of significant probes at p<0.05 with concordant coefficients, as determined by conditional logistic regression, as N(C,T,S_rnd). The corresponding number for the actual case-control status from the MCCS cohort is denoted N(C,T,S_obs). The inventors likewise denote the minimum p-value obtained across all the probes in a cluster, for the randomly assigned and actual case-control pairing, as P_min(C,T,S_rnd) and P_min(C,T,S_obs). The event was defined:
i) N(C,T,S_rnd) >= N(C,T,S_obs) and P_min(C,T,S_rnd) <= P_min(C,T,S_obs).
[0106] Next, for each random assignment S_rnd the inventors defined Recurrence(C,S_rnd) as the number of cancer types for which a cluster C contains at least two significant probes (p<0.05) with concordant coefficients. The corresponding value for the actual case-control status is Recurrence(C,S_obs). The event was defined:
ii) Recurrence(C,S_rnd) >= Recurrence(C,S_obs).
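The two events defined above can be written as small predicates over probe-level results. The sketch below assumes each probe is summarized by a (p-value, coefficient) pair and interprets "concordant coefficients" as the larger group of significant probes sharing the same coefficient sign; that interpretation, and all function names, are assumptions for illustration rather than the inventors' implementation.

```python
ALPHA = 0.05  # probe-level significance threshold stated in the text


def concordant_significant(results, alpha=ALPHA):
    """Count significant probes whose coefficients share a sign.

    results: list of (p_value, coefficient) for one cluster and one cancer type.
    Returns the size of the larger same-sign group (assumed reading of "concordant").
    """
    positive = sum(1 for p, coef in results if p < alpha and coef > 0)
    negative = sum(1 for p, coef in results if p < alpha and coef < 0)
    return max(positive, negative)


def min_p(results):
    """Minimum p-value across all probes in the cluster."""
    return min(p for p, _ in results)


def event_i(random_results, observed_results):
    """Event i): the random labelling is at least as extreme as the observed one."""
    return (concordant_significant(random_results) >= concordant_significant(observed_results)
            and min_p(random_results) <= min_p(observed_results))


def recurrence(results_by_cancer):
    """Number of cancer types with at least two concordant significant probes."""
    return sum(1 for r in results_by_cancer.values()
               if concordant_significant(r) >= 2)


def event_ii(random_by_cancer, observed_by_cancer):
    """Event ii): recurrence under random labelling reaches the observed recurrence."""
    return recurrence(random_by_cancer) >= recurrence(observed_by_cancer)
```

Event i) is evaluated per cluster and cancer type, while event ii) is evaluated per cluster across all cancer types.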
[0107] The null hypothesis is that both i) and ii) occur by chance. Similar to widely used methods such as GSEA [62], permutation testing was used to establish a null distribution for S. The inventors generated 20,000 permutations for each individual cancer site by keeping the sample pairing (as indicated by the patient ID) but randomly assigning the case/control status within each pair. For each permutation S the inventors applied conditional logistic regression for each cancer type and counted events i) and ii) as described above. The inventors assigned each event a p-value corresponding to the relative number of permutations (out of 20,000) for which events i) or ii) were observed. Statistical significance was achieved at the FDR<0.25 level, across the top 10 most CpG-rich clusters.
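The permutation scheme described here (keep each matched pair, randomly reassign case/control status within it, and report the fraction of permutations at least as extreme as the observation) can be sketched as below. Two parts are assumptions for illustration only: the test statistic is a simple absolute mean within-pair difference standing in for the conditional logistic regression results, and the FDR is computed with the Benjamini-Hochberg procedure, which the text does not name.

```python
import random


def swap_within_pairs(pairs, rng):
    """Randomly reassign case/control status within each matched pair.

    pairs: list of (case_value, control_value); the pairing itself is preserved.
    """
    return [(ctrl, case) if rng.random() < 0.5 else (case, ctrl)
            for case, ctrl in pairs]


def abs_mean_pair_difference(pairs):
    """Stand-in statistic (NOT conditional logistic regression)."""
    return abs(sum(case - ctrl for case, ctrl in pairs) / len(pairs))


def empirical_p_value(pairs, statistic, n_perm=20000, seed=0):
    """Fraction of permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(pairs)
    hits = sum(statistic(swap_within_pairs(pairs, rng)) >= observed
               for _ in range(n_perm))
    return hits / n_perm


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under these assumptions, clusters would be called significant when their adjusted value falls below 0.25, for example `[q < 0.25 for q in benjamini_hochberg(cluster_p_values)]`, where `cluster_p_values` is a hypothetical list holding one permutation p-value per cluster.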
REFERENCES
[0108] All publications mentioned in this specification are indicative of the level of those skilled in the art to which the invention pertains. All publications herein are incorporated by reference to the same extent as if each individual publication was specifically and individually indicated to be incorporated by reference in their entirety.
1. Jaenisch R, Bird A: Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet 2003, 33 Suppl:245-254.
2. Holliday R: The inheritance of epigenetic defects. Science 1987, 238:163-170.
3. Baylin SB, Jones PA: A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer 2011, 11:726-734.
4. Birney E, Smith GD, Greally JM: Epigenome-wide Association Studies and the Interpretation of Disease -Omics. PLoS Genet 2016, 12:e1006105.
5. Rakyan VK, Down TA, Balding DJ, Beck S: Epigenome-wide association studies for common human diseases. Nat Rev Genet 2011, 12:529-541.
6. Rakyan VK, Blewitt ME, Druker R, Preis JI, Whitelaw E: Metastable epialleles in mammals. Trends Genet 2002, 18:348-351.
7. Duhl DM, Vrieling H, Miller KA, Wolff GL, Barsh GS: Neomorphic agouti mutations in obese yellow mice. Nat Genet 1994, 8:59-65.
8. Morgan HD, Sutherland HG, Martin DI, Whitelaw E: Epigenetic inheritance at the agouti locus in the mouse. Nat Genet 1999, 23:314-318.
9. Grundberg E, Meduri E, Sandling JK, Hedman AK, Keildson S, Buil A, Busche S, Yuan W, Nisbet J, Sekowska M, et al: Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am J Hum Genet 2013, 93:876-890.
10. Waterland RA, Jirtle RL: Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol Cell Biol 2003, 23:5293-5300.
11. Waterland RA, Dolinoy DC, Lin JR, Smith CA, Shi X, Tahiliani KG: Maternal methyl supplements increase offspring DNA methylation at Axin fused. Genesis 2006, 44:401-406.
12. Dominguez-Salas P, Moore SE, Baker MS, Bergen AW, Cox SE, Dyer RA, Fulford AJ, Guan Y, Laritsky E, Silver MJ, et al: Maternal nutrition at conception modulates DNA methylation of human metastable epialleles. Nat Commun 2014, 5:3746.
13. Silver MJ, Kessler NJ, Hennig BJ, Dominguez-Salas P, Laritsky E, Baker MS, Coarfa C, Hernandez-Vargas H, Castelino JM, Routledge MN, et al: Independent genomewide screens identify the tumor suppressor VTRNA2-1 as a human epiallele responsive to periconceptional environment. Genome Biol 2015, 16:118.
14. Waterland RA, Kellermayer R, Laritsky E, Rayco-Solon P, Harris RA, Travisano M, Zhang W, Torskaya MS, Zhang J, Shen L, et al: Season of conception in rural Gambia affects DNA methylation at putative human metastable epialleles. PLoS Genet 2010, 6:e1001252.
15. Visscher PM, Benyamin B, White I: The use of linear mixed models to estimate variance components from data on twin pairs by maximum likelihood. Twin Res 2004, 7:670-674.
16. Visscher PM, Hill WG, Wray NR: Heritability in the genomics era - concepts and misconceptions. Nat Rev Genet 2008, 9:255-266.
17. Do C, Lang CF, Lin J, Darbary H, Krupska I, Gaba A, Petukhova L, Vonsattel JP, Gallagher MP, Goland RS, et al: Mechanisms and Disease Associations of Haplotype-Dependent Allele-Specific DNA Methylation. Am J Hum Genet 2016, 98:934-955.
18. Titlestad IL, Kyvik KO, Kristensen T, Lillevang S: HLA haplotypes in dizygotic twin pairs: are dizygotic twins more similar than sibs? Twin Res 2002, 5:287-288.
19. Yet I, Tsai PC, Castillo-Fernandez JE, Carnero-Montoro E, Bell JT: Genetic and environmental impacts on DNA methylation levels in twins. Epigenomics 2016, 8:105-117.
20. Lokk K, Modhukur V, Rajashekar B, Martens K, Magi R, Kolde R, Koltsina M, Nilsson TK, Vilo J, Salumets A, Tonisson N: DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns. Genome Biol 2014, 15:r54.
21. Do C, Shearer A, Suzuki M, Terry MB, Gelernter J, Greally JM, Tycko B: Genetic-epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol 2017, 18:120.
22. Volkov P, Olsson AH, Gillberg L, Jorgensen SW, Brons C, Eriksson KF, Groop L, Jansson PA, Nilsson E, Ronn T, et al: A Genome-Wide mQTL Analysis in Human Adipose Tissue Identifies Genetic Variants Associated with DNA Methylation, Gene Expression and Metabolic Traits. PLoS One 2016, 11:e0157776.
23. Shi J, Marconett CN, Duan J, Hyland PL, Li P, Wang Z, Wheeler W, Zhou B, Campan M, Lee DS, et al: Characterizing the genetic basis of methylome diversity in histologically normal human lung tissue. Nat Commun 2014, 5:3365.
24. Cancer Genome Atlas Research N, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM: The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013, 45:1113-1120.
25. Roadmap Epigenomics C, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, et al: Integrative analysis of 111 reference human epigenomes. Nature 2015, 518:317-330.
26. Dolinoy DC, Huang D, Jirtle RL: Maternal nutrient supplementation counteracts bisphenol A-induced DNA hypomethylation in early development. Proc Natl Acad Sci U S A 2007, 104:13056-13061.
27. Estill MS, Bolnick JM, Waterland RA, Bolnick AD, Diamond MP, Krawetz SA: Assisted reproductive technology alters deoxyribonucleic acid methylation profiles in bloodspots of newborn infants. Fertil Steril 2016, 106:629-639 e610.
28. Kuhnen P, Handke D, Waterland RA, Hennig BJ, Silver M, Fulford AJ, Dominguez-Salas P, Moore SE, Prentice AM, Spranger J, et al: Inter-individual Variation in DNA Methylation at a Putative POMC Metastable Epiallele Is Associated with Obesity. Cell Metab 2016, 24:502-509.
29. Moore SE, Fulford AJ, Darboe MK, Jobarteh ML, Jarjou LM, Prentice AM: A randomized trial to investigate the effects of pre-natal and infant nutritional supplementation on infant immune development in rural Gambia: the ENID trial: Early Nutrition and Immune Development. BMC Pregnancy Childbirth 2012, 12:107.
30. Giles GG, English DR: The Melbourne Collaborative Cohort Study. IARC Sci Publ 2002, 156:69-70.
31. Bock C: Analysing and interpreting DNA methylation data. Nat Rev Genet 2012, 13:705-719.
32. Grundberg E, Small KS, Hedman AK, Nica AC, Buil A, Keildson S, Bell JT, Yang TP, Meduri E, Barrett A, et al: Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat Genet 2012, 44:1084-1089.
33. Jaffe AE, Irizarry RA: Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol 2014, 15:R31.
34. Bell JT, Spector TD: A twin approach to unraveling epigenetics. Trends Genet 2011, 27:116-125.
35. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML, Heine-Suner D, Cigudosa JC, Urioste M, Benitez J, et al: Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A 2005, 102:10604-10609.
36. Kaminsky ZA, Tang T, Wang SC, Ptak C, Oh GH, Wong AH, Feldcamp LA, Virtanen C, Halfvarson J, Tysk C, et al: DNA methylation profiles in monozygotic and dizygotic twins. Nat Genet 2009, 41:240-245.
37. Oates NA, van Vliet J, Duffy DL, Kroes HY, Martin NG, Boomsma DI, Campbell M, Coulthard MG, Whitelaw E, Chong S: Increased DNA methylation at the AXIN1 gene in a monozygotic twin from a pair discordant for a caudal duplication anomaly. Am J Hum Genet 2006, 79:155-162.
38. Wong AH, Gottesman II, Petronis A: Phenotypic differences in genetically identical organisms: the epigenetic perspective. Hum Mol Genet 2005, 14 Spec No 1:R11-18.
39. Bui M, Benyamin B, Shah S, Henders AK, Martin NG, Montgomery GW, McRae AF: Sharing a Placenta is Associated With a Greater Similarity in DNA Methylation in Monochorionic Versus Dichorionic Twin Pairs in Blood at Age 14. Twin Res Hum Genet 2015, 18:680-685.
40. Weksberg R, Shuman C, Caluseriu O, Smith AC, Fei YF, Nishikawa J, Stockley TF, Best F, Chitayat D, Olney A, et al: Discordant KCNQ1OT1 imprinting in sets of monozygotic twins discordant for Beckwith-Wiedemann syndrome. Hum Mol Genet 2002, 11:1317-1325.
41. McRae AF, Powell JE, Henders AK, Bowdler F, Hemani G, Shah S, Painter JN, Martin NG, Visscher PM, Montgomery GW: Contribution of genetic variation to transgenerational inheritance of DNA methylation. Genome Biol 2014, 15:R73.
42. van Dongen J, Nivard MG, Willemsen G, Hottenga JJ, Helmer Q, Dolan CV, Ehli EA, Davies GE, van Iterson M, Breeze CE, et al: Genetic and environmental influences interact with age and sex in shaping the human methylome. Nat Commun 2016, 7:11115.
43. Rakyan VK: Metastable epialleles in mammals. Trends Genet 2002, 18:348-351.
44. Dolinoy DC, Weidman JR, Waterland RA, Jirtle RL: Maternal genistein alters coat color and protects Avy mouse offspring from obesity by modifying the fetal epigenome. Environ Health Perspect 2006, 114:567-572.
45. Feinberg AP, Irizarry RA: Evolution in health and medicine Sackler colloquium: Stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc Natl Acad Sci U S A 2010, 107 Suppl 1:1757-1764.
46. Green BB, Kappil M, Lambertini L, Armstrong DA, Guerin DJ, Sharp AJ, Lester BM, Chen J, Marsit CJ: Expression of imprinted genes in placenta is associated with infant neurobehavioral development. Epigenetics 2015, 10:834-841.
47. Kukurba KR, Zhang R, Li X, Smith KS, Knowles DA, How Tan M, Piskol R, Lek M, Snyder M, Macarthur DG, et al: Allelic expression of deleterious protein-coding variants across human tissues. PLoS Genet 2014, 10:e1004304.
48. Romanelli V, Nakabayashi K, Vizoso M, Moran S, Iglesias-Platas I, Sugahara N, Simon C, Hata K, Esteller M, Court F, Monk D: Variable maternal methylation overlapping the nc886/vtRNA2-1 locus is locked between hypermethylated repeats and is frequently altered in cancer. Epigenetics 2014, 9:783-790.
49. Treppendahl MB, Qiu X, Sogaard A, Yang X, Nandrup-Bus C, Hother C, Andersen MK, Kjeldsen L, Mollgard L, Hellstrom-Lindberg E, et al: Allelic methylation levels of the noncoding VTRNA2-1 located on chromosome 5q31.1 predict outcome in AML. Blood 2012, 119:206-216.
50. Blasco MA: The epigenetic regulation of mammalian telomeres. Nat Rev Genet 2007, 8:299-309.
51. Hapgood G, Savage KJ: The biology and management of systemic anaplastic large cell lymphoma. Blood 2015, 126:17-25.
52. Kamper-Jorgensen M, Biggar RJ, Tjonneland A, Hjalgrim H, Kroman N, Rostgaard K, Stamper CL, Olsen A, Andersen AM, Gadi VK: Opposite effects of microchimerism on breast and colon cancer. Eur J Cancer 2012, 48:2227-2235.
53. Tada Y, Yamaguchi Y, Kinjo T, Song X, Akagi T, Takamura H, Ohta T, Yokota T, Koide H: The stem cell transcription factor ZFP57 induces IGF2 expression to promote anchorage-independent growth in cancer cells. Oncogene 2015, 34:752-760.
54. Lippi G, Favaloro EJ: Recombinant platelet factor 4: a therapeutic, anti-neoplastic chimera? Semin Thromb Hemost 2010, 36:558-569.
55. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH: Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 2010, 11:446-450.
56. Costa AM, Breitenfeld L, Silva AJ, Pereira A, Izquierdo M, Marques MC: Genetic inheritance effects on endurance and muscle strength: an update. Sports Med 2012, 42:449-458.
57. Gordon H, Trier Moller F, Andersen V, Harbord M: Heritability in inflammatory bowel disease: from the first twin study to genome-wide association studies. Inflamm Bowel Dis 2015, 21:1428-1434.
58. Wickham H: ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2009.
59. Shen L, Guo Y, Chen X, Ahmed S, Issa JP: Optimizing annealing temperature overcomes bias in bisulfite PCR methylation analysis. Biotechniques 2007, 42:48-58.
60. Waterland RA, Kellermayer R, Rached MT, Tatevian N, Gomes MV, Zhang J, Zhang L, Chakravarty A, Zhu W, Laritsky E, et al: Epigenomic profiling indicates a role for DNA methylation in early postnatal liver development. Hum Mol Genet 2009, 18:3026-3038.
61. Deng M, Bragelmann J, Kryukov I, Saraiva-Agostinho N, Perner S: FirebrowseR: an R client to the Broad Institute's Firehose Pipeline. Database (Oxford) 2017, 2017.
62. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005, 102:15545-15550.
63. He B, Lanz RB, Fiskus W, Geng C, Yi P, Hartig SM, Rajapakshe K, Shou J, Wei L, Shah SS, et al: GATA2 facilitates steroid receptor coactivator recruitment to the androgen receptor complex. Proc Natl Acad Sci U S A 2014, 111:18261-18266.
64. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al: Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011, 473:43-49.
65. Yang J, Huang T, Petralia F, Long Q, Zhang B, Argmann C, Zhao Y, Mobbs CV, Consortium GT, Schadt EE, et al: Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases. Sci Rep 2015, 5:15145.
66. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, Irizarry RA: Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 2014, 30:1363-1369.
67. Fortin JP, Labbe A, Lemire M, Zanke BW, Hudson TJ, Fertig EJ, Greenwood CMT, Hansen KD: Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol 2014, 15.
68. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D: A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450k DNA methylation data. Bioinformatics 2013, 29.
69. Chen Y, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R: Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 2013, 8:203-209.
70. Joubert BR, Felix JF, Yousefi P, Bakulski KM, Just AC, Breton C, Reese SE, Markunas CA, Richmond RC, Xu CJ, et al: DNA Methylation in Newborns and Maternal Smoking in Pregnancy: Genome-wide Consortium Meta-analysis. Am J Hum Genet 2016, 98:680-696.
71. Engel SM, Joubert BR, Wu MC, Olshan AF, Haberg SE, Ueland PM, Nystad W, Nilsen RM, Vollset SE, Peddada SD, London SJ: Neonatal genome-wide methylation patterns in relation to birth weight in the Norwegian Mother and Child Cohort. Am J Epidemiol 2014, 179:834-842.
72. Dugue PA, Brinkman MT, Milne RL, Wong EM, FitzGerald LM, Bassett JK, Joo JE, Jung CH, Makalic E, Schmidt DF, et al: Genome-wide measures of DNA methylation in peripheral blood and the risk of urothelial cell carcinoma: a prospective nested case-control study. Br J Cancer 2016, 115:664-673.
73. Severi G, Southey MC, English DR, Jung CH, Lonie A, McLean C, Tsimiklis H, Hopper JL, Giles GG, Baglietto L: Epigenome-wide methylation in DNA from peripheral blood as a marker of risk for breast cancer. Breast Cancer Res Treat 2014, 148:665-673.
74. Wong Doo N, Makalic E, Joo JE, Vajdic CM, Schmidt DF, Wong EM, Jung CH, Severi G, Park DJ, Chung J, et al: Global measures of peripheral blood-derived DNA methylation as a risk factor in the development of mature B-cell neoplasms. Epigenomics 2016.
75. Joo JE: The use of DNA from archival dried blood spots with the Infinium HumanMethylation450 array. BMC Biotechnol 2013, 13:23.
76. Maksimovic J, Gordon L, Oshlack A: SWAN: Subset-quantile within array normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol 2012, 13:R44.
77. Machiela MJ, Chanock SJ: LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 2015, 31:3555-3557.
[0102] Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
APPENDIX A
Table 1: Annotated list of ESS and SIV probes
Figures imgf000048_0001 through imgf000119_0001 (table images)
APPENDIX B
Table 3: ESS clusters
Figures imgf000120_0001 through imgf000129_0001 (table images)
APPENDIX C
Table 4: MCCS permutation testing results for 10 most CpG-rich ESS clusters
Figures imgf000129_0002, imgf000130_0001, and imgf000131_0001 (table images)
APPENDIX D
Table 5: CpG sites in Cancer-Associated Clusters
Figures imgf000132_0001 and imgf000133_0001 (table images)

Claims

CLAIMS
What is claimed is:
1. A method of identifying a risk for a subject to develop cancer, comprising the step of determining from a genomic DNA sample from the subject the methylation status of one or more loci selected from the group consisting of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, PF4, and a combination thereof.
2. The method of claim 1, wherein determining the methylation status comprises identifying methylation at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or more CpG sites in one or more of the loci.
3. The method of claim 1 or 2, wherein the loci include all of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and PF4.
4. The method of claim 1 or 2, wherein the loci include 6, 5, 4, 3, or 2 of ZFP57, SPATC1L, OR2L-13, VTRNA2-1, DUSP22, HCG4B, and PF4.
5. The method of any one of claims 1-4, wherein the individual is an adult, adolescent, child, or infant.
6. The method of any one of claims 1-5, wherein the cancer is cancer of the lung, breast, brain, colon, pancreas, uterus, bone, skin, endometrium, testes, uterus, spleen, liver, kidney, stomach, thyroid, gall bladder, blood, prostate, hematopoietic lineages, esophagus or is urothelial cancer.
7. The method of any one of claims 1-6, wherein the sample comprises peripheral blood, biopsy, hair, follicles, fingernails, saliva, cheek scrapings, cerebrospinal fluid, urine, nipple aspirate, fecal material, semen, sputum, mucus, or a combination thereof.
8. The method of any one of claims 1-7, wherein the method further comprises the step of obtaining a sample from the individual.
9. The method of any one of claims 1-8, wherein the method further comprises the step of providing a therapeutic and/or a preventative therapy to the individual.
10. The method of claim 9, wherein the therapeutic comprises surgery, drug, radiation, immunotherapy, hormone therapy, or a combination thereof.
11. The method of claim 9, wherein the preventative therapy comprises watchful waiting, surgery, drug, radiation, immunotherapy, hormone therapy, or a combination thereof.
12. The method of any one of claims 1-11, wherein the genomic DNA sample is from a tissue or body fluid other than the tissue or body fluid at risk for development of the cancer.
PCT/US2018/063266 | 2017-11-30 | 2018-11-30 | Genomic dna methylation associated with disease prediction | WO2019108906A1 (en)

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
US201762592507P | 2017-11-30 | 2017-11-30 |
US62/592,507 | 2017-11-30

Publications (1)

Publication Number | Publication Date
WO2019108906A1 (en) | 2019-06-06

Family

ID=66664619

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/US2018/063266 | WO2019108906A1 (en): Genomic dna methylation associated with disease prediction | 2017-11-30 | 2018-11-30

Country Status (1)

Country | Link
WO (1) | WO2019108906A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20130022974A1 (en)* | 2011-06-17 | 2013-01-24 | The Regents Of The University Of Michigan | Dna methylation profiles in cancer
US20150072947A1 (en)* | 2011-08-30 | 2015-03-12 | National Defense Medical Center | Gene biomarkers for prediction of susceptibility of ovarian neoplasms and/or prognosis or malignancy of ovarian cancers
US20160326593A1 (en)* | 2013-11-25 | 2016-11-10 | The Broad Institute Inc. | Compositions and methods for diagnosing, evaluating and treating cancer
WO2015095359A1 (en)* | 2013-12-17 | 2015-06-25 | Harry Stylli | Methods of detecting diseases or conditions

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DO ET AL.: "Genetic-Epigenetic Interactions in Cis: A Major Focus in the Post-GWAS Era", GENOME BIOLOGY, vol. 18, no. 120, 19 June 2017 (2017-06-19), pages 1 - 22, XP055618497*
JOO ET AL.: "Heritable DNA Methylation Marks Associated with Susceptibility to Breast Cancer", NATURE COMMUNICATIONS, vol. 9, no. 867, 28 February 2018 (2018-02-28), pages 1 - 12, XP055618500*
ROMANELLI ET AL.: "Variable maternal methylation overlapping the nc886/vtRNA2-1 locus is locked between hypermethylated repeats and is frequently altered in cancer", EPIGENETICS, vol. 9, no. 5, 3 March 2014 (2014-03-03), pages 783 - 790, XP055618511*
VAN BAAK ET AL.: "Epigenetic supersimilarity of monozygotic twin pairs", GENOME BIOLOGY, vol. 19, no. 2, 9 January 2018 (2018-01-09), pages 1 - 20, XP055618501*

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111593069A (en)* | 2020-04-03 | 2020-08-28 | 中国中医科学院西苑医院 | Construction method and application of plasmid for over-expressing vault RNA2-1 gene
CN115011691A (en)* | 2022-05-16 | 2022-09-06 | 重庆美普蓝科技有限公司 | Application of DUSP22 in preparation of NASH, HCC marker and medicine
CN115011691B (en)* | 2022-05-16 | 2023-07-18 | 重庆美普蓝科技有限公司 | Application of DUSP22 in preparation of NASH and HCC markers and medicines

Similar Documents

Publication | Publication Date | Title
Van Baak et al. |  | Epigenetic supersimilarity of monozygotic twin pairs
EP3899030B1 (en) |  | Methods for analysis of circulating cells
US20220056509A1 (en) |  | Methods for cancer detection and monitoring
JP6700333B2 (en) |  | Methods and materials for assessing loss of heterozygosity
CA3226132A1 (en) |  | Methods for determining velocity of tumor growth
Zaidan et al. |  | A-to-I RNA editing in the rat brain is age-dependent, region-specific and sensitive to environmental stress across generations
CN118076750A (en) |  | Method for detecting neoplasms in pregnant women
CN109971852A (en) |  | Detect the mutation and ploidy in chromosome segment
US20160108476A1 (en) |  | Colorectal cancer markers
Marin et al. |  | Validation of a targeted next generation sequencing-based comprehensive chromosome screening platform for detection of triploidy in human blastocysts
Zemet et al. |  | Parental mosaicism for apparent de novo genetic variants: scope, detection, and counseling challenges
WO2019108906A1 (en) |  | Genomic dna methylation associated with disease prediction
US20240084389A1 (en) |  | Use of simultaneous marker detection for assessing difuse glioma and responsiveness to treatment
US10770183B2 (en) |  | Methods of assessing a risk of developing necrotizing meningoencephalitis
RU2811503C2 (en) |  | Methods of detecting and monitoring cancer by personalized detection of circulating tumor dna
US20240352513A1 (en) |  | Detecting mutations and ploidy in chromosomal segments
Went |  | Deciphering genetic susceptibility to multiple myeloma
Zhang |  | The integrated genomic analyses of human cancers
Sewda |  | Analysis of rare and common variants to identify inherited and maternal genes associated with conotruncal heart defects

Legal Events

Date | Code | Title | Description
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18882456; Country of ref document: EP; Kind code of ref document: A1
 | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) |
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 18882456; Country of ref document: EP; Kind code of ref document: A1
