Stat Anal Data Min. 2013 Dec 1;6(6):496-505.
doi: 10.1002/sam.11196.

A Weighted Random Forests Approach to Improve Predictive Performance

Stacey J Winham et al. Stat Anal Data Min.

Abstract

Identifying genetic variants associated with complex disease in high-dimensional data is a challenging problem, and complicated etiologies such as gene-gene interactions are often ignored in analyses. The data-mining method Random Forests (RF) can handle high dimensions; however, in high-dimensional data, RF is not an effective filter for identifying risk factors associated with the disease trait via complex genetic models such as gene-gene interactions without strong marginal components. Here we propose an extension called Weighted Random Forests (wRF), which incorporates tree-level weights to emphasize more accurate trees in prediction and calculation of variable importance. We demonstrate through simulation and application to data from a genetic study of addiction that wRF can outperform RF in high-dimensional data, although the improvements are modest and limited to situations with effect sizes that are larger than what is realistic in genetics of complex disease. Thus, the current implementation of wRF is unlikely to improve detection of relevant predictors in high-dimensional genetic data, but may be applicable in other situations where larger effect sizes are anticipated.
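
The tree-level weighting described in the abstract can be illustrated with a minimal Python sketch. It assumes that each tree's prediction error (written tPE in the figure captions) has already been estimated on data not used to grow that tree, and that weights take the form (1/tPE)^power. The function names, the normalisation, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def tree_weights(tree_errors: Sequence[float], power: float = 2.0) -> List[float]:
    """Weight each tree by (1 / tPE) ** power, normalised to sum to 1.

    power = 0 gives every tree the same weight, i.e. an unweighted forest.
    """
    eps = 1e-12  # guard against division by zero for a perfect tree
    raw = [(1.0 / max(err, eps)) ** power for err in tree_errors]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_vote(tree_votes: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[float]:
    """Combine 0/1 votes into a weighted score per sample.

    tree_votes[t][i] is the vote of tree t for sample i.
    With weights summing to 1, a score above 0.5 means class 1 wins.
    """
    n_samples = len(tree_votes[0])
    return [
        sum(w * votes[i] for w, votes in zip(weights, tree_votes))
        for i in range(n_samples)
    ]


# Hypothetical toy forest: three trees, two samples.
errors = [0.2, 0.4, 0.5]            # assumed per-tree prediction errors
votes = [[1, 0], [0, 1], [0, 1]]    # the most accurate tree disagrees with the other two

equal = weighted_vote(votes, tree_weights(errors, power=0.0))
weighted = weighted_vote(votes, tree_weights(errors, power=2.0))
print("unweighted scores:", equal)
print("weighted scores:  ", weighted)
```

In this toy case the unweighted majority follows the two less accurate trees, while the weighted score follows the single most accurate tree, which is the kind of emphasis the abstract describes.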

Keywords: Random Forests; gene-gene interactions; genetic data; genome wide association; high-dimensional data; weighting.

Figures

Figure 1
Simulation Results (wPE and wAUC), m=2 causal pairs. Weighted Prediction Error (top row) and AUC (bottom row) plotted against total heritability for models with 2 causal interacting pairs. Results are only displayed for a subset of weights and for H2M/H2I = 0.5, 2.0; for the full results, see Supporting Information. The results for weights that are not plotted are similar and in the same range.
Figure 2
Simulation Results (wPE and wAUC), m=10 causal pairs. Weighted Prediction Error (top row) and AUC (bottom row) plotted against total heritability for models with 10 causal interacting pairs. Results are only displayed for a subset of weights and for H2M/H2I = 0.5, 2.0; for the full results, see Supporting Information. The results for weights that are not plotted are similar and in the same range.
Figure 3
Variable importance results from real data analysis of the NMDA-dependent AMPA trafficking cascade pathway. Importance is plotted for each SNP by chromosomal location for A: traditional RF using OOB estimates, and wRF with cross-validation and weights of B: x=1, C: x=(1/tPE)^2, D: x=(1/tPE)^5, and E: x=rank(1/tPE). Importance is plotted for each SNP by chromosomal location for A: traditional RF using OOB estimates, and wRF with a single data split and weights of B: x=1, C: x=(1/tPE)^2, D: x=(1/tPE)^5, and E: x=rank(1/tPE). Colors indicate gene membership, as specified on the x-axis. Only SNPs with importance > 0 are plotted.
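
As a rough illustration of how a rank-based weight such as x=rank(1/tPE) could enter a variable importance calculation, the sketch below ranks trees by 1/tPE and takes a weighted average of per-tree importance scores. The weighted-average aggregation, the tie handling, the function names, and the toy numbers are all assumptions made for illustration; the caption does not specify how the authors combine per-tree importances.

```python
from typing import List, Sequence


def rank_weights(tree_errors: Sequence[float]) -> List[float]:
    """Weight each tree by the rank of 1 / tPE.

    The least accurate tree gets rank 1 and the most accurate tree gets
    the largest rank. Ties are broken by tree order (an assumption).
    """
    eps = 1e-12  # guard against division by zero for a perfect tree
    inverse = [1.0 / max(err, eps) for err in tree_errors]
    order = sorted(range(len(inverse)), key=lambda t: inverse[t])
    ranks = [0.0] * len(inverse)
    for rank, tree in enumerate(order, start=1):
        ranks[tree] = float(rank)
    return ranks


def weighted_importance(per_tree_importance: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Weighted average of per-tree importance scores for each variable.

    per_tree_importance[t][j] is the importance of variable j in tree t.
    """
    total = sum(weights)
    n_vars = len(per_tree_importance[0])
    return [
        sum(w * imp[j] for w, imp in zip(weights, per_tree_importance)) / total
        for j in range(n_vars)
    ]


# Hypothetical toy values: three trees, two variables (e.g. two SNPs).
errors = [0.2, 0.4, 0.5]
importances = [[0.3, 0.0], [0.1, 0.1], [0.0, 0.2]]

ranks = rank_weights(errors)
print("rank weights:", ranks)
print("weighted importance:", weighted_importance(importances, ranks))
print("unweighted importance:", weighted_importance(importances, [1.0, 1.0, 1.0]))
```

Setting all weights to 1 corresponds to the x=1 panel, so the same function covers both the unweighted and weighted cases.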