Nat Methods. 2022 Dec;19(12):1599-1611.
doi: 10.1038/s41592-022-01640-x. Epub 2022 Oct 27.

A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies

Zilin Li #, Xihao Li #, Hufeng Zhou, Sheila M Gaynor, Margaret Sunitha Selvaraj, Theodore Arapoglou, Corbin Quick, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K Arnett, Paul L Auer, Lawrence F Bielak, Joshua C Bis, Thomas W Blackwell, John Blangero, Eric Boerwinkle, Donald W Bowden, Jennifer A Brody, Brian E Cade, Matthew P Conomos, Adolfo Correa, L Adrienne Cupples, Joanne E Curran, Paul S de Vries, Ravindranath Duggirala, Nora Franceschini, Barry I Freedman, Harald H H Göring, Xiuqing Guo, Rita R Kalyani, Charles Kooperberg, Brian G Kral, Leslie A Lange, Bridget M Lin, Ani Manichaikul, Alisa K Manning, Lisa W Martin, Rasika A Mathias, James B Meigs, Braxton D Mitchell, May E Montasser, Alanna C Morrison, Take Naseri, Jeffrey R O'Connell, Nicholette D Palmer, Patricia A Peyser, Bruce M Psaty, Laura M Raffield, Susan Redline, Alexander P Reiner, Muagututi'a Sefuiva Reupena, Kenneth M Rice, Stephen S Rich, Jennifer A Smith, Kent D Taylor, Margaret A Taub, Ramachandran S Vasan, Daniel E Weeks, James G Wilson, Lisa R Yanek, Wei Zhao, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Jerome I Rotter, Cristen J Willer, Pradeep Natarajan, Gina M Peloso, Xihong Lin

Zilin Li et al. Nat Methods. 2022 Dec.

Abstract

Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.

© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.

Conflict of interest statement

Competing interests

S.M.G. is now an employee of Regeneron Genetics Center. J.B.M. is an Academic Associate for Quest Diagnostics R&D. For B.D.M.: The Amish Research Program receives partial support from Regeneron Pharmaceuticals. M.E.M. reports a grant from Regeneron Pharmaceuticals unrelated to the present work. B.M.P. serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. L.M.R. is a consultant for the TOPMed Administrative Coordinating Center (through Westat). For S.R.: Jazz Pharma, Eli Lilly, Apnimed, unrelated to the present work. The spouse of C.J.W. works at Regeneron Pharmaceuticals. P.N. reports investigator-initiated grants from Amgen, Apple, AstraZeneca, Boston Scientific, and Novartis; personal fees from Apple, AstraZeneca, Blackstone Life Sciences, Foresite Labs, Novartis, and Roche / Genentech; is a co-founder of TenSixteen Bio; is a shareholder of geneXwell and TenSixteen Bio; and reports spousal employment at Vertex, all unrelated to the present work. X. Lin is a consultant for AbbVie Pharmaceuticals and Verily Life Sciences. The remaining authors declare no competing interests.

Figures

Extended Data Fig. 1 | Rare variant (MAF < 0.01) distribution in the discovery phase using TOPMed cohorts (n = 21,015).
Variant categories are defined by GENCODE VEP categories.
Extended Data Fig. 2 | Manhattan plots and Q-Q plots for unconditional gene-centric noncoding analysis and sliding window analysis of high-density lipoprotein cholesterol (HDL-C) in the discovery phase (n = 21,015).
a, Manhattan plots for unconditional gene-centric noncoding analysis of protein-coding genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 3.57 × 10⁻⁷. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(20,000 × 7) = 3.57 × 10⁻⁷. Different symbols represent the STAAR-O P value of the protein-coding gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS). Promoter_CAGE and promoter_DHS are the promoters that overlap Cap Analysis of Gene Expression (CAGE) sites and DNase hypersensitivity (DHS) sites for a given gene, respectively. Enhancer_CAGE and enhancer_DHS are the enhancers in GeneHancer-predicted regions that overlap CAGE sites and DHS sites for a given gene, respectively.
b, Quantile-quantile plots for unconditional gene-centric noncoding analysis of protein-coding genes. Different symbols represent the STAAR-O P value of the gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS).
c, Manhattan plots for unconditional gene-centric noncoding analysis of ncRNA genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 2.50 × 10⁻⁶. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/20,000 = 2.50 × 10⁻⁶.
d, Quantile-quantile plots for unconditional gene-centric noncoding analysis of ncRNA genes.
e, Manhattan plot for 2-kb sliding windows. The horizontal line indicates a genome-wide P value threshold of 1.88 × 10⁻⁸. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(2.66 × 10⁶) = 1.88 × 10⁻⁸.
f, Quantile-quantile plot for 2-kb sliding windows.
In panels a, c and e, chromosome numbers are indicated by the colors of the dots. In all panels, STAAR-O is a two-sided test.
Extended Data Fig. 3 | Manhattan plots and Q-Q plots for unconditional gene-centric noncoding analysis and sliding window analysis of low-density lipoprotein cholesterol (LDL-C) in the discovery phase (n = 21,015).
a, Manhattan plots for unconditional gene-centric noncoding analysis of protein-coding genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 3.57 × 10⁻⁷. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(20,000 × 7) = 3.57 × 10⁻⁷. Different symbols represent the STAAR-O P value of the protein-coding gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS). Promoter_CAGE and promoter_DHS are the promoters that overlap Cap Analysis of Gene Expression (CAGE) sites and DNase hypersensitivity (DHS) sites for a given gene, respectively. Enhancer_CAGE and enhancer_DHS are the enhancers in GeneHancer-predicted regions that overlap CAGE sites and DHS sites for a given gene, respectively.
b, Quantile-quantile plots for unconditional gene-centric noncoding analysis of protein-coding genes. Different symbols represent the STAAR-O P value of the gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS).
c, Manhattan plots for unconditional gene-centric noncoding analysis of ncRNA genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 2.50 × 10⁻⁶. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/20,000 = 2.50 × 10⁻⁶.
d, Quantile-quantile plots for unconditional gene-centric noncoding analysis of ncRNA genes.
e, Manhattan plot for 2-kb sliding windows. The horizontal line indicates a genome-wide P value threshold of 1.88 × 10⁻⁸. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(2.66 × 10⁶) = 1.88 × 10⁻⁸.
f, Quantile-quantile plot for 2-kb sliding windows.
In panels a, c and e, chromosome numbers are indicated by the colors of the dots. In all panels, STAAR-O is a two-sided test.
Extended Data Fig. 4 | Manhattan plots and Q-Q plots for unconditional gene-centric noncoding analysis and sliding window analysis of triglycerides (TG) in the discovery phase (n = 21,015).
a, Manhattan plots for unconditional gene-centric noncoding analysis of protein-coding genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 3.57 × 10⁻⁷. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(20,000 × 7) = 3.57 × 10⁻⁷. Different symbols represent the STAAR-O P value of the protein-coding gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS). Promoter_CAGE and promoter_DHS are the promoters that overlap Cap Analysis of Gene Expression (CAGE) sites and DNase hypersensitivity (DHS) sites for a given gene, respectively. Enhancer_CAGE and enhancer_DHS are the enhancers in GeneHancer-predicted regions that overlap CAGE sites and DHS sites for a given gene, respectively.
b, Quantile-quantile plots for unconditional gene-centric noncoding analysis of protein-coding genes. Different symbols represent the STAAR-O P value of the gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS).
c, Manhattan plots for unconditional gene-centric noncoding analysis of ncRNA genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 2.50 × 10⁻⁶. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/20,000 = 2.50 × 10⁻⁶.
d, Quantile-quantile plots for unconditional gene-centric noncoding analysis of ncRNA genes.
e, Manhattan plot for 2-kb sliding windows. The horizontal line indicates a genome-wide P value threshold of 1.88 × 10⁻⁸. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(2.66 × 10⁶) = 1.88 × 10⁻⁸.
f, Quantile-quantile plot for 2-kb sliding windows.
In panels a, c and e, chromosome numbers are indicated by the colors of the dots. In all panels, STAAR-O is a two-sided test.
Extended Data Fig. 5 | Manhattan plots and Q-Q plots for unconditional gene-centric noncoding analysis and sliding window analysis of total cholesterol (TC) in the discovery phase (n = 21,015).
a, Manhattan plots for unconditional gene-centric noncoding analysis of protein-coding genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 3.57 × 10⁻⁷. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(20,000 × 7) = 3.57 × 10⁻⁷. Different symbols represent the STAAR-O P value of the protein-coding gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS). Promoter_CAGE and promoter_DHS are the promoters that overlap Cap Analysis of Gene Expression (CAGE) sites and DNase hypersensitivity (DHS) sites for a given gene, respectively. Enhancer_CAGE and enhancer_DHS are the enhancers in GeneHancer-predicted regions that overlap CAGE sites and DHS sites for a given gene, respectively.
b, Quantile-quantile plots for unconditional gene-centric noncoding analysis of protein-coding genes. Different symbols represent the STAAR-O P value of the gene using different functional categories (upstream, downstream, UTR, promoter_CAGE, promoter_DHS, enhancer_CAGE, enhancer_DHS).
c, Manhattan plots for unconditional gene-centric noncoding analysis of ncRNA genes. The horizontal line indicates a genome-wide STAAR-O P value threshold of 2.50 × 10⁻⁶. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/20,000 = 2.50 × 10⁻⁶.
d, Quantile-quantile plots for unconditional gene-centric noncoding analysis of ncRNA genes.
e, Manhattan plot for 2-kb sliding windows. The horizontal line indicates a genome-wide P value threshold of 1.88 × 10⁻⁸. The significance threshold is defined by multiple comparisons using the Bonferroni correction: 0.05/(2.66 × 10⁶) = 1.88 × 10⁻⁸.
f, Quantile-quantile plot for 2-kb sliding windows.
In panels a, c and e, chromosome numbers are indicated by the colors of the dots. In all panels, STAAR-O is a two-sided test.
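The Bonferroni thresholds quoted in the Extended Data Fig. 2-5 captions can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the test counts (20,000 genes, 7 functional categories, 2.66 × 10⁶ sliding windows) are the ones stated in the captions.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05  # family-wise error rate used in the captions

thresholds = {
    # 20,000 protein-coding genes, each tested in 7 functional categories
    "protein_coding_gene_centric": bonferroni_threshold(ALPHA, 20_000 * 7),
    # 20,000 ncRNA genes, one test each
    "ncrna_gene_centric": bonferroni_threshold(ALPHA, 20_000),
    # 2.66 million 2-kb sliding windows
    "sliding_window_2kb": bonferroni_threshold(ALPHA, 2.66e6),
}

for name, value in thresholds.items():
    print(f"{name}: {value:.2e}")
```

Rounded to three significant figures, these come out as 3.57e-07, 2.50e-06 and 1.88e-08, matching the horizontal lines described in panels a, c and e.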
Fig. 1 | Workflow of STAARpipeline.
(a) Prepare the input data of STAARpipeline, including genotypes, phenotypes and covariates. (b) Annotate all variants in the genome using FAVORannotator through the FAVOR database and calculate the (sparse) genetic relatedness matrix. (c) Define analysis units in the noncoding genome: eight functional categories of regulatory regions, sliding windows and dynamic windows using SCANG. (d) Obtain genome-wide significant associations and perform analytical follow-up via conditional analysis.
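Step (c) lists sliding windows as one kind of analysis unit. As a rough illustration of what that means, the sketch below enumerates fixed-length windows over a region and collects the variant positions that fall in each. The 2-kb window length comes from the Extended Data captions; the 1-kb step, the function names and the toy positions are assumptions made for this example and are not taken from the STAARpipeline code.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int]  # inclusive (start, end) in base pairs


def sliding_windows(start: int, end: int, length: int = 2000, step: int = 1000) -> Iterator[Window]:
    """Yield overlapping fixed-length windows covering [start, end].

    The final window is truncated at `end`, and iteration stops once a
    window reaches `end` so that no redundant tail windows are produced.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    pos = start
    while pos <= end:
        win_end = min(pos + length - 1, end)
        yield pos, win_end
        if win_end == end:
            break
        pos += step


def variants_in_window(positions: List[int], window: Window) -> List[int]:
    """Return the variant positions that lie inside the inclusive window."""
    lo, hi = window
    return [p for p in positions if lo <= p <= hi]


# Toy example: five hypothetical rare-variant positions on a 5-kb region.
positions = [150, 1200, 1999, 2500, 4100]
windows = list(sliding_windows(1, 5000))
counts = [len(variants_in_window(positions, w)) for w in windows]

for w, c in zip(windows, counts):
    print(w, c)
```

Each window would then be treated as one variant set for association testing. Because neighboring windows overlap, a variant usually contributes to more than one test.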

Similar articles

  • Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale.
    Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Aslibekyan S, Ballantyne CM, Bielak LF, Blangero J, Boerwinkle E, Bowden DW, Broome JG, Conomos MP, Correa A, Cupples LA, Curran JE, Freedman BI, Guo X, Hindy G, Irvin MR, Kardia SLR, Kathiresan S, Khan AT, Kooperberg CL, Laurie CC, Liu XS, Mahaney MC, Manichaikul AW, Martin LW, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Moore JE, Morrison AC, O'Connell JR, Palmer ND, Pampana A, Peralta JM, Peyser PA, Psaty BM, Redline S, Rice KM, Rich SS, Smith JA, Tiwari HK, Tsai MY, Vasan RS, Wang FF, Weeks DE, Weng Z, Wilson JG, Yanek LR; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Lipids Working Group; Neale BM, Sunyaev SR, Abecasis GR, Rotter JI, Willer CJ, Peloso GM, Natarajan P, Lin X. Nat Genet. 2020 Sep;52(9):969-983. doi: 10.1038/s41588-020-0676-4. Epub 2020 Aug 24. PMID: 32839606. Free PMC article.
  • Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed whole-genome sequencing study.
    Wang Y, Selvaraj MS, Li X, Li Z, Holdcraft JA, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Cade BE, Carlson JC, Carson AP, Chen YI, Curran JE, de Vries PS, Dutcher SK, Ellinor PT, Floyd JS, Fornage M, Freedman BI, Gabriel S, Germer S, Gibbs RA, Guo X, He J, Heard-Costa N, Hildalgo B, Hou L, Irvin MR, Joehanes R, Kaplan RC, Kardia SL, Kelly TN, Kim R, Kooperberg C, Kral BG, Levy D, Li C, Liu C, Lloyd-Jone D, Loos RJ, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Murabito JM, Naseri T, O'Connell JR, Palmer ND, Preuss MH, Psaty BM, Raffield LM, Rao DC, Redline S, Reiner AP, Rich SS, Ruepena MS, Sheu WH, Smith JA, Smith A, Tiwari HK, Tsai MY, Viaud-Martinez KA, Wang Z, Yanek LR, Zhao W; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; Rotter JI, Lin X, Natarajan P, Peloso GM. Am J Hum Genet. 2023 Oct 5;110(10):1704-1717. doi: 10.1016/j.ajhg.2023.09.003. PMID: 37802043. Free PMC article.
  • Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies.
    Li Z, Li X, Liu Y, Shen J, Chen H, Zhou H, Morrison AC, Boerwinkle E, Lin X. Am J Hum Genet. 2019 May 2;104(5):802-814. doi: 10.1016/j.ajhg.2019.03.002. Epub 2019 Apr 12. PMID: 30982610. Free PMC article.
  • Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits.
    Weissenkampen JD, Jiang Y, Eckert S, Jiang B, Li B, Liu DJ. Curr Protoc Hum Genet. 2019 Apr;101(1):e83. doi: 10.1002/cphg.83. Epub 2019 Mar 8. PMID: 30849219. Free PMC article. Review.
  • Molecular genetic studies of complex phenotypes.
    Marian AJ. Transl Res. 2012 Feb;159(2):64-79. doi: 10.1016/j.trsl.2011.08.001. Epub 2011 Aug 31. PMID: 22243791. Free PMC article. Review.
