GWAS GUI: graphical browser for the results of whole-genome association studies with high-dimensional phenotypes

Wei Chen*, Liming Liang and Gonçalo R. Abecasis
Center for Statistical Genetics, Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA
*To whom correspondence should be addressed.

Associate Editor: Martin Bishop

Bioinformatics, Volume 25, Issue 2, January 2009, Pages 284–285, https://doi.org/10.1093/bioinformatics/btn600
Received: 06 August 2008; Revision received: 03 November 2008; Accepted: 14 November 2008; Published: 20 November 2008

Abstract

Summary: We describe an interactive package that provides graphical overviews of the results of whole-genome association studies in datasets with rich multi-dimensional phenotypic information, such as global surveys of gene expression. Windows, Linux and Mac binaries are available from our website.

Availability:  http://www.sph.umich.edu/csg/weich/software.html

Contact:  [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Recently, genome-wide association scans (GWAS) have been used to successfully dissect a variety of complex traits, ranging from discrete clinical outcomes such as asthma and diabetes (Moffatt et al., 2007; Scott et al., 2007; WTCCC, 2007) to continuous traits as diverse as height, weight, global gene expression and blood lipid levels (Dixon et al., 2007; Frayling et al., 2007; Sanna et al., 2008; Scuteri et al., 2007; Willer et al., 2008). The amount of information generated in these studies is staggering, and interpreting their results requires efficient computational tools for data analysis and visualization. This challenge is most noticeable when high-dimensional data (such as microarray gene expression data or proteomics data) are analyzed. In this case, the results of whole-genome association studies can include billions of data points (Cheung et al., 2005; Dixon et al., 2007; Moffatt et al., 2007). Realizing the full benefits of these studies requires an efficient way to share data among collaborators and with other researchers, both before and after the data are published. Here, we present a tool that facilitates interactive browsing of the results from whole-genome association studies. To illustrate the capabilities of our browser, we used it to create an interactive interface for the results of a recent genome-wide association study of global gene expression (Dixon et al., 2007). The objective of the Dixon et al. (2007) study was to build a database that would allow researchers to systematically examine potential effects of disease-associated variants on transcript expression, and our interactive browser makes it easy for many researchers to explore the data.
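As a rough illustration of the scale involved, the short calculation below multiplies the SNP and transcript counts reported for the Dixon et al. (2007) dataset in Section 2. It assumes, purely for illustration, that every SNP is tested against every transcript; the variable names are ours.

```python
# Counts reported for the Dixon et al. (2007) dataset (see Section 2).
n_snps = 408_273
n_transcripts = 54_675

# Assumption for illustration: one association test per SNP-transcript pair.
n_tests = n_snps * n_transcripts

print(f"{n_tests:,} SNP-transcript pairs (about {n_tests / 1e9:.1f} billion)")
```

Under that assumption, the number of pairs is in the tens of billions, which is consistent with the "billions of data points" described above.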

A diverse set of statistical methods can be used to examine the association between phenotypes of interest and single nucleotide polymorphism (SNP) data. For example, χ2 test statistics, P-values, effect size estimates and their standard errors, as well as SNP-specific heritability estimates are all commonly reported in GWAS. When there are tens of thousands of phenotypic outcomes and hundreds of thousands of SNPs, the result set is usually very large, containing several million statistics and easily totaling several gigabytes. These datasets can be integrated into specialized local databases for further investigation, but it can be challenging for researchers without extensive database or programming skills to access results. Our GWAS GUI (Graphical User Interface) is intended to provide a convenient tool for interacting with arbitrary GWAS result sets and to facilitate searches and displays of GWAS results in graph or tabular form. We hope our tool will facilitate data sharing within collaborative groups and with the public at large.

2 FEATURES OF GWAS GUI BROWSER

Our GWAS GUI browser is an interactive package that facilitates rapid browsing of whole-genome association study results. It is designed to handle thousands of phenotypes, and thus can handle very rich datasets, such as those where global surveys of gene expression are combined with genome-wide SNP data. The browser also allows users to interact with the results of simpler scans, such as scans that focus on a single discrete outcome or a small number of related traits. To evaluate the program, we have applied it to several large datasets, including a study evaluating the association between 408 273 SNPs and the levels of 54 675 transcripts representing 20 599 known genes and assessed in lymphoblastoid cell lines from approximately 400 children (Dixon et al., 2007). After this initial evaluation, we released an early version of the program, named the mRNA by SNP browser (MRBS), when the Dixon et al. (2007) paper was published. In addition to the visualization tool, the full GWAS GUI browser includes a data preparation tool that can be used to organize tabulated results into an indexed database for rapid browsing. There are two main browsing interfaces within our browser: (i) an interface that retrieves all results for a specific trait and (ii) an interface that retrieves all results in a specific genomic region. In either view, results can typically be retrieved almost instantaneously. In the ‘trait-centric’ view, the browser can tabulate and sort a summary of user-provided association test results (e.g. effect size, standard error, heritability estimates, test statistics and P-value) and quickly generate plots that summarize the distribution of a user-specified test statistic along the genome. Alternatively, in the ‘position-centric’ view, the browser can tabulate all significant association test statistics (using a user-defined threshold) in a target region and plot the results for multiple traits.
Optionally, information such as the location of nearby genes can also be displayed (Fig. 1). For convenience, both interfaces allow the browser to link the results to external databases chosen by the user, such as the University of California Santa Cruz (UCSC) genome browser, where users can examine the genomic context of each result in detail. When the user requests a SNP that is not included in the current dataset, linkage disequilibrium (LD) and tag information from the International HapMap Consortium can be used to suggest a backup tag-SNP. Figure 1 is an illustration of the browser interface after searching for a specific SNP position using the ‘position-centric’ view. Four SNPs of interest have been highlighted by the user in the tabular view (bottom left) and are circled in the graphical view.
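The two views can be pictured as two lookups over the same set of results. The following Python sketch is illustrative only and is not the browser's implementation (which is written in C++); the `Result` record, the `ResultStore` class, its method names and the toy data are all hypothetical. It keeps one index keyed by trait and one list per chromosome sorted by position, so that a region query can be answered by bisection and filtered with a user-defined threshold.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    trait: str   # e.g. a transcript identifier
    snp: str
    chrom: str
    pos: int     # base-pair position
    stat: float  # association test statistic (larger means stronger)


class ResultStore:
    """Hypothetical in-memory store offering a trait view and a region view."""

    def __init__(self, results):
        self._by_trait = defaultdict(list)
        self._by_chrom = defaultdict(list)
        for r in results:
            self._by_trait[r.trait].append(r)
            self._by_chrom[r.chrom].append(r)
        # Sort each chromosome by position so regions can be located by bisection.
        self._positions = {}
        for chrom, rows in self._by_chrom.items():
            rows.sort(key=lambda r: r.pos)
            self._positions[chrom] = [r.pos for r in rows]

    def trait_view(self, trait):
        """All results for one trait, strongest statistic first."""
        return sorted(self._by_trait.get(trait, []), key=lambda r: -r.stat)

    def region_view(self, chrom, start, end, threshold):
        """Results for any trait in [start, end] whose statistic meets the threshold."""
        positions = self._positions.get(chrom, [])
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        rows = self._by_chrom.get(chrom, [])
        return [r for r in rows[lo:hi] if r.stat >= threshold]


# Toy data, invented for this example.
store = ResultStore([
    Result("traitA", "snp1", "5", 100, 12.5),
    Result("traitA", "snp2", "5", 250, 3.1),
    Result("traitB", "snp1", "5", 100, 20.0),
    Result("traitB", "snp3", "7", 400, 15.0),
])
top = store.trait_view("traitA")
hits = store.region_view("5", 50, 200, threshold=10.0)
```

Here `top` lists the results for `traitA` sorted by statistic, while `hits` gathers every trait with a sufficiently large statistic in the requested window, mirroring the multi-trait display of the position-centric view.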

Fig. 1. An illustration of the GWAS GUI browser interface. This example demonstrates how to display the results for a specific region. Several large statistics have been highlighted with blue circles by selecting the corresponding rows. The top transcripts ordered by maximum statistic within the region are tabulated in the right panel.

3 EXAMPLES OF APPLICATION

Allowing large groups of scientists to browse and interact with the results of large multi-dimensional GWAS can be extremely helpful. For example, prior to the publication of the Dixon et al. (2007) gene expression paper, we used an early version of our browser to share preliminary results with several colleagues. This led to the observation that SNPs in an intergenic region on chromosome 5p13 that were associated with Crohn's disease were also associated with transcript levels of PTGER4, suggesting that PTGER4 may be the primary candidate gene for Crohn's disease on chromosome 5. The Crohn's-associated SNPs are >200 kb away from the nearest annotated gene. The result is published and described in detail elsewhere (Libioulle et al., 2007). Since then, many others have browsed our results, leading to several potential links between SNPs, human disease and mRNA transcript levels.

The current version of the GWAS GUI browser program is not restricted to gene-expression data, but is intended as a general tool that provides graphical overviews of whole-genome association study results for arbitrary phenotypes. The extended program allows users to load their own data files, test statistics and genomic annotation files into the browser in a standardized text format. Generally, the traits can be any outcomes of interest, such as case–control indicators, expression values and many other continuous or categorical measurements. Arbitrary meta-data about each trait can be tracked and displayed. We expect that the browser will be particularly helpful when multiple related traits are studied. In this setting, the browser simplifies the initial comparison of signals for different related traits in regions of interest.

4 IMPLEMENTATION

The GWAS GUI browser program was implemented in C++ using the Qt4 toolkit (open-source version 4.4, Trolltech Inc.). It has been tested on Windows, Linux and Mac workstations. The system requirements depend on the size of the input datasets, which can range from a dataset examining a single trait and hundreds of thousands of genetic markers to large-scale genome-wide gene-expression datasets with tens of thousands of traits and markers. On a modern Windows workstation, the initial indexing of a set of results generated by PLINK (Purcell et al., 2007), MERLIN (Chen et al., 2007) or another whole-genome analysis tool and including approximately 300 000 SNPs requires ∼200 MB of RAM and 5–10 min of computing time. After indexing, opening the same dataset and browsing the data should be nearly instantaneous and require only 60 MB of RAM.
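The pattern of a slow one-time indexing pass followed by fast, low-memory browsing can be illustrated with a byte-offset index over a tabulated results file. The Python sketch below shows that general technique only; it is not the format or algorithm used by the GWAS GUI data preparation tool, and the column names, tab-separated layout and function names are assumptions made for this example.

```python
import io
from collections import defaultdict


def build_offset_index(handle, key_column=0, sep=b"\t"):
    """Scan a binary file handle once, recording the byte offset of each row per key."""
    index = defaultdict(list)
    handle.seek(0)
    handle.readline()  # skip the header line
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        key = line.rstrip(b"\n").split(sep)[key_column].decode()
        index[key].append(offset)
    return dict(index)


def fetch_rows(handle, index, key, sep="\t"):
    """Seek directly to the rows for one key instead of rescanning the file."""
    rows = []
    for offset in index.get(key, []):
        handle.seek(offset)
        rows.append(handle.readline().decode().rstrip("\n").split(sep))
    return rows


# Toy results table with hypothetical column names; a real file would be opened with open(path, "rb").
table = (
    b"TRAIT\tSNP\tCHR\tPOS\tSTAT\n"
    b"traitA\tsnp1\t5\t100\t12.5\n"
    b"traitB\tsnp1\t5\t100\t20.0\n"
    b"traitA\tsnp2\t5\t250\t3.1\n"
)
handle = io.BytesIO(table)
index = build_offset_index(handle)
rows_a = fetch_rows(handle, index, "traitA")
```

Only the index needs to stay in memory after the initial pass; each query then reads just the rows it needs, which is one plausible reason why browsing can require less memory than indexing.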

Funding: National Human Genome Research Institute; National Heart Lung and Blood Institute.

Conflict of Interest: G.R.A. is a Pew Scholar of the Biomedical Sciences and is supported by the Pew Charitable Trusts.

REFERENCES

Chen WM et al. Family-based association tests for genome-wide association scans. Am. J. Hum. Genet., 2007, vol. 81 (pg. 913–926)
Cheung VG et al. Mapping determinants of human gene expression by regional and genome-wide association. Nature, 2005, vol. 437 (pg. 1365–1369)
Dixon AL et al. A genome-wide association study of global gene expression. Nat. Genet., 2007, vol. 39 (pg. 1202–1207)
Frayling TM et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science, 2007, vol. 316 (pg. 889–894)
Libioulle C et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet., 2007, vol. 3 (pg. e58)
Moffatt MF et al. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature, 2007, vol. 448 (pg. 470–473)
Purcell S et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet., 2007, vol. 81 (pg. 559–575)
Sanna S et al. Common variants in the GDF5-UQCC region are associated with variation in human height. Nat. Genet., 2008, vol. 40 (pg. 198–203)
Scott LJ et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 2007, vol. 316 (pg. 1341–1345)
Scuteri A et al. Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet., 2007, vol. 3 (pg. 1200–1210)
Wellcome Trust Case Control Consortium. Genome-wide association study of 14 000 cases of seven common diseases and 3000 shared controls. Nature, 2007, vol. 447 (pg. 661–678)
Willer CJ et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat. Genet., 2008, vol. 40 (pg. 161–169)

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email:[email protected]