Genetics. 2016 Oct;204(2):783-798.
doi: 10.1534/genetics.116.188391. Epub 2016 Aug 3.

Gene and Network Analysis of Common Variants Reveals Novel Associations in Multiple Complex Diseases

Priyanka Nakka et al. Genetics. 2016 Oct.

Abstract

Genome-wide association (GWA) studies typically lack power to detect genotypes significantly associated with complex diseases, where different causal mutations of small effect may be present across cases. A common, tractable approach for identifying genomic elements associated with complex traits is to evaluate combinations of variants in known pathways or gene sets with shared biological function. Such gene-set analyses require the computation of gene-level P-values or gene scores; these gene scores are also useful when generating hypotheses for experimental validation. However, commonly used methods for generating GWA gene scores are computationally inefficient, biased by gene length, imprecise, or have low true positive rate (TPR) at low false positive rates (FPR), leading to erroneous hypotheses for functional validation. Here we introduce a new method, PEGASUS, for analytically calculating gene scores. PEGASUS produces gene scores with as much as 10 orders of magnitude higher numerical precision than competing methods. In simulation, PEGASUS outperforms existing methods, achieving up to 30% higher TPR when the FPR is fixed at 1%. We use gene scores from PEGASUS as input to HotNet2 to identify networks of interacting genes associated with multiple complex diseases and traits; this is the first application of HotNet2 to common variation. In ulcerative colitis and waist-hip ratio, we discover networks that include genes previously associated with these phenotypes, as well as novel candidate genes. In contrast, existing methods fail to identify these networks. We also identify networks for attention-deficit/hyperactivity disorder, in which GWA studies have yet to identify any significant SNPs.

Keywords: GWAS; common variants; complex diseases; pathway analysis; quantitative traits.

Copyright © 2016 by the Genetics Society of America.

Figures

Figure 1
Schematic representations of PEGASUS and the three other methods (minSNP, permSNP, and VEGAS) assessed in this study. (A) minSNP defines the gene score to be the lowest of the SNP-level P-values within the gene observed in a GWA study. (B) permSNP (Ballard et al. 2010) performs M permutations of case–control labels in genotype data, recomputes GWA P-values for each SNP for each permuted data set, averages SNP P-values within each gene, and computes an empirical gene P-value based on the number of times the observed gene P-value is lower than permuted P-values. (C) VEGAS performs multivariate normal simulations from a null distribution of χ₁² statistics where the χ₁² statistics are correlated by empirical LD calculated from genotype data. M simulations are performed, the null statistics are summed within each gene, and the empirical gene P-value is the number of times the observed χ₁² statistic is lower than the permuted χ₁² statistic. (D) In PEGASUS, for each gene, we numerically integrate the distribution of the sum of correlated χ₁² statistics at the observed gene statistic to determine the gene score. We also assess the performance of SKAT (Wu et al. 2010, 2011), which is not depicted here. SKAT uses a multiple linear/logistic regression framework, where genotypes for variants in a gene set and covariates are regressed onto phenotype to generate gene scores.
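The contrast between panels A and C can be made concrete with a minimal sketch. It is not the authors' code or the VEGAS implementation: the function names (`min_snp`, `cholesky`, `simulated_gene_pvalue`) are hypothetical, the LD matrix is assumed to be positive definite, and the empirical P-value is taken as the fraction of simulated null statistics at least as large as the observed one. PEGASUS, as described in panel D, replaces this simulation loop with numerical integration of the same null distribution; that step is not reproduced here.

```python
import math
import random


def min_snp(snp_pvalues):
    """minSNP-style gene score: the smallest SNP-level P-value in the gene."""
    return min(snp_pvalues)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix (assumed)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulated_gene_pvalue(observed_stat, ld_matrix, n_sim, seed=0):
    """Simulation-based gene P-value in the spirit of panel C.

    Draws correlated standard normals whose correlation is ld_matrix, sums
    their squares to get a null gene statistic, and returns the fraction of
    simulations whose statistic is at least the observed one.
    """
    rng = random.Random(seed)
    lower = cholesky(ld_matrix)
    n = len(ld_matrix)
    exceed = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        correlated = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        null_stat = sum(value * value for value in correlated)
        if null_stat >= observed_stat:
            exceed += 1
    return exceed / n_sim


# Two uncorrelated SNPs: the null statistic is chi-square with 2 df, so the
# estimate should land near exp(-observed / 2).
independent_ld = [[1.0, 0.0], [0.0, 1.0]]
estimate = simulated_gene_pvalue(4.0, independent_ld, n_sim=20000, seed=1)
```

Because the estimate is a count divided by `n_sim`, it can never resolve a nonzero value smaller than `1 / n_sim`.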
Figure 2
Quantile–quantile plots comparing gene scores produced by PEGASUS against those produced by minSNP, permSNP, and VEGAS. (A) Quantile–quantile plots of PEGASUS gene scores vs. minSNP gene scores. Each point represents a gene and is colored yellow, red, or blue based on gene length percentile, 0–25%, 25–75%, and 75–100%, respectively. The phenotype used is waist–hip ratio adjusted for body mass index (WHR). minSNP gene scores are smaller than PEGASUS gene scores and decrease with increasing number of SNPs in a gene. The deviations from y = x show that minSNP scores are biased toward being smaller than PEGASUS scores, and this bias increases for increasing gene length (genes colored in blue and red). (B) Base-10 logarithm of PEGASUS gene scores vs. base-10 logarithm of permSNP gene scores for acute lymphoblastic leukemia (ALL). permSNP can determine gene scores only as low as the reciprocal of the number of permutations (10,000 in this case) whereas PEGASUS can determine gene scores as low as 2.22 × 10⁻¹⁶ (the numerical precision of R). Note that the minimum permSNP scores of 10⁻⁴ differ widely from their P-values computed by PEGASUS. (C) Base-10 logarithm of PEGASUS gene scores vs. base-10 logarithm of VEGAS gene scores. Using 1 million simulations, the lowest gene scores output by VEGAS are 10⁻⁶, while PEGASUS determines gene scores as low as 2.22 × 10⁻¹⁶. In addition, for gene scores close to the reciprocal of the maximum number of simulations performed, VEGAS can return inaccurate gene scores compared to PEGASUS.
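The resolution limits quoted in this caption follow from simple arithmetic, sketched below. The helper name `empirical_floor` is hypothetical. Python's `sys.float_info.epsilon` is used as a stand-in for R's double-precision epsilon, since both are the IEEE 754 value of about 2.22 × 10⁻¹⁶.

```python
import math
import sys


def empirical_floor(n_resamples):
    """Smallest nonzero P-value an empirical method can report from n_resamples draws."""
    return 1.0 / n_resamples


perm_floor = empirical_floor(10_000)      # permSNP with 10,000 permutations
sim_floor = empirical_floor(1_000_000)    # VEGAS with 1 million simulations
double_eps = sys.float_info.epsilon       # IEEE 754 double epsilon, about 2.22e-16

# Gap, in orders of magnitude, between the simulation floor and double precision.
gap_orders = math.log10(sim_floor / double_eps)
```

Under these assumptions the gap between a 1-million-simulation floor and double precision is a little under 10 orders of magnitude.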
Figure 3
PEGASUS gene hits are enriched for known GWA study associations compared to minSNP gene hits. Shown are the numbers of minSNP gene hits (blue) and PEGASUS gene hits (orange) that contain known GWA study associations and gene hits not previously found in GWA studies (gray) for 12 GWA study data sets. To the right of each bar are positive predictive values (PPV) for each gene score for every data set; where possible, boldface type indicates the gene score with the highest PPV for each disease and “NA” means that PPV is undefined, which occurs when there are zero gene hits. A gene hit is a gene with a score of < 2.8 × 10⁻⁶, and known GWA study associations are genes containing genome-wide significant SNPs in GWA studies conducted with different data sets from the 12 data sets analyzed here. We find that PEGASUS gene hits have as much as 2.8-fold higher PPV than minSNP gene hits.
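The PPV computation described in this caption can be sketched as follows. The function names and the gene labels are hypothetical, and the example scores are made up for illustration rather than taken from the study. Only the hit threshold of 2.8 × 10⁻⁶ and the "NA when there are zero hits" rule come from the caption.

```python
HIT_THRESHOLD = 2.8e-6  # gene-hit cutoff stated in the caption


def gene_hits(gene_scores, threshold=HIT_THRESHOLD):
    """Genes whose score (a gene-level P-value) falls below the threshold."""
    return {gene for gene, score in gene_scores.items() if score < threshold}


def positive_predictive_value(hits, known_associations):
    """Fraction of hits that are known associations; None plays the role of NA."""
    if not hits:
        return None
    return len(hits & known_associations) / len(hits)


# Made-up scores for four placeholder genes.
example_scores = {"GENE_A": 1e-8, "GENE_B": 5e-7, "GENE_C": 0.01, "GENE_D": 1e-6}
example_known = {"GENE_A", "GENE_C"}

example_hits = gene_hits(example_scores)
example_ppv = positive_predictive_value(example_hits, example_known)
```

Returning `None` instead of zero keeps "no hits at all" distinct from "hits, none of them known".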
Figure 4
Receiver operating characteristic (ROC) curves from GWA conducted with simulated phenotypes show gene scores controlling for LD achieve higher true positive rates at low false positive rates. We performed a GWA study for a simulated phenotype with known underlying true causal genes (see Materials and Methods) and determined true positive rate (TPR; genes truly associated with phenotype that were identified as such) and false positive rate (FPR; genes identified as causal by a gene score method that were not truly associated with the simulated phenotype) for minSNP, VEGAS, SKAT, and PEGASUS for various gene score thresholds (see Materials and Methods). We find that PEGASUS and VEGAS, which control for pairwise correlations between SNPs within genes, outperform minSNP and SKAT with higher TPRs at very low FPRs.
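One point on such an ROC curve can be computed as in the sketch below. It is not the authors' evaluation code, and the function names, gene labels, and scores are hypothetical. It uses the standard definitions: TPR is the fraction of causal genes called, and FPR is the fraction of non-causal genes called. Both groups are assumed to be non-empty.

```python
def tpr_fpr(gene_scores, causal_genes, threshold):
    """Return (TPR, FPR) when genes with score < threshold are called associated."""
    called = {gene for gene, score in gene_scores.items() if score < threshold}
    non_causal = set(gene_scores) - causal_genes
    true_positives = len(called & causal_genes)
    false_positives = len(called & non_causal)
    return true_positives / len(causal_genes), false_positives / len(non_causal)


def roc_points(gene_scores, causal_genes, thresholds):
    """ROC curve as a list of (FPR, TPR) pairs, one per threshold."""
    points = []
    for threshold in sorted(thresholds):
        tpr, fpr = tpr_fpr(gene_scores, causal_genes, threshold)
        points.append((fpr, tpr))
    return points


# Made-up scores: two causal and three non-causal placeholder genes.
sim_scores = {"G1": 1e-9, "G2": 0.2, "G3": 1e-4, "G4": 0.5, "G5": 0.9}
sim_causal = {"G1", "G2"}

tpr_example, fpr_example = tpr_fpr(sim_scores, sim_causal, threshold=1e-3)
curve = roc_points(sim_scores, sim_causal, [1e-6, 1e-3, 0.3, 1.0])
```

Sweeping the threshold from strict to lenient traces the curve from (0, 0) toward (1, 1). The figure compares methods at the low-FPR end of that curve.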
Figure 5
Subnetworks for ulcerative colitis (A–C), attention-deficit/hyperactivity disorder (D–F), and waist–hip ratio adjusted for body mass index (G and H) from significant runs of HotNet2 (Leiserson et al. 2015) (P < 0.05 for multiple subnetwork sizes), using PEGASUS gene scores as input. Circles represent genes in each subnetwork and are colored by heat score (negative log-transformed PEGASUS gene scores); the color bar indicates the lowest heat score (blue or “cold” genes) and the highest heat score (red or “hot” genes) in each subnetwork for a given phenotype. Lines between genes indicate a direct gene–gene interaction from the HINT database (Das and Yu 2012). Gene names that are underlined, in boldface type, and italicized represent genes that have been previously associated with ulcerative colitis (Duerr et al. 2006; Raelson et al. 2007; WTCCC 2007; Barrett et al. 2008; Silverberg et al. 2009; Franke et al. 2010; McGovern et al. 2010; Anderson et al. 2011; Festen et al. 2011; Ellinghaus et al. 2012; Jostins et al. 2012; Scharl et al. 2012; Parkes et al. 2013), attention-deficit/hyperactivity disorder (Wan et al. 1998; Maden 2007; Davis et al. 2008; Naka et al. 2008; McGrath et al. 2009; Need et al. 2009; Chen et al. 2010; Cirulli et al. 2010, 2012; Fuentealba et al. 2010; Neale et al. 2010; Hu et al. 2011; Luciano et al. 2011; Tang et al. 2011; De Jager et al. 2012; Rivière et al. 2012; Schuurs-Hoeijmakers et al. 2012; Peixoto and Abel 2013; Rietveld et al. 2013), and waist–hip ratio (Cantile et al. 2003; Eguchi et al. 2008, 2011; Guth et al. 2009; Hagberg et al. 2010; Heid et al. 2010; Siervo et al. 2012; Tchkonia et al. 2013; Karpe and Pinnick 2015) phenotypes in GWA or functional studies.

References

Anderson C. A., Boucher G., Lees C. W., Franke A., D’Amato M., et al., 2011. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat. Genet. 43: 246–252.
Auton A., Abecasis G. R., Altshuler D. M., Durbin R. M., Bentley D. R., et al., 2015. A global reference for human genetic variation. Nature 526: 68–74.
Backes C., Meder B., Lai A., Stoll M., Rühle F., et al., 2016. Pathway-based variant enrichment analysis on the example of dilated cardiomyopathy. Hum. Genet. 135: 31–40.
Baker M., Gaukrodger N., Mayosi B. M., Imrie H., Farrall M., et al., 2005. Association between common polymorphisms of the proopiomelanocortin gene and body fat distribution: a family study. Diabetes 54: 2492–2496.
Ballard D. H., Cho J., Zhao H., 2010. Comparisons of multi-marker association methods to detect association between a candidate region and disease. Genet. Epidemiol. 34: 201–212.
