Associate Editor: Martin Bishop
Summary: We describe an interactive package that provides graphical overviews of the results of whole-genome association studies in datasets with rich multi-dimensional phenotypic information, such as global surveys of gene expression. Windows, Linux and Mac binaries are available from our website.
Availability: http://www.sph.umich.edu/csg/weich/software.html
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Recently, genome-wide association scans (GWAS) have been used to successfully dissect a variety of complex traits, ranging from discrete clinical outcomes such as asthma and diabetes (Moffatt et al., 2007; Scott et al., 2007; WTCCC, 2007) to continuous traits as diverse as height, weight, global gene expression and blood lipid levels (Dixon et al., 2007; Frayling et al., 2007; Sanna et al., 2008; Scuteri et al., 2007; Willer et al., 2008). The amount of information generated in these studies is staggering, and interpreting their results requires efficient computational tools for data analysis and visualization. This challenge is most noticeable when high-dimensional data (such as microarray gene expression data or proteomics data) are analyzed. In this case, the results of whole-genome association studies can include billions of data points (Cheung et al., 2005; Dixon et al., 2007; Moffatt et al., 2007). Realizing the full benefits of these studies requires an efficient way to share data among collaborators and with other researchers, both before and after the data are published. Here, we present a tool that facilitates interactive browsing of the results from whole-genome association studies. To illustrate the capabilities of our browser, we used it to create an interactive interface for the results of a recent genome-wide association study of global gene expression (Dixon et al., 2007). The objective of the Dixon et al. (2007) study was to build a database that would allow researchers to systematically examine potential effects of disease-associated variants on transcript expression, and our interactive browser makes it easy for many researchers to explore the data.
A diverse set of statistical methods can be used to examine the association between phenotypes of interest and single nucleotide polymorphism (SNP) data. For example, χ² test statistics, P-values, effect size estimates and their standard errors, as well as SNP-specific heritability estimates, are all commonly reported in GWAS. When there are tens of thousands of phenotypic outcomes and hundreds of thousands of SNPs, the result set is usually very large, containing several million statistics and easily totaling several gigabytes. These datasets can be integrated into specialized local databases for further investigation, but it can be challenging for researchers without extensive database or programming skills to access results. Our GWAS GUI (Graphic User Interface) is intended to provide a convenient tool for interacting with arbitrary GWAS result sets and to facilitate searches and displays of GWAS results in graph or tabular form. We hope our tool will facilitate data sharing within collaborative groups and with the public at large.
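As a small illustration of how the reported quantities relate to one another, the sketch below converts an effect size and its standard error into a 1-df Wald χ² statistic and then into a P-value. It assumes a single-degree-of-freedom test; the function names are hypothetical and are not part of the GWAS GUI package, which displays whatever statistics the user supplies.

```python
import math


def wald_chi2(beta: float, se: float) -> float:
    """1-df Wald chi-square statistic from an effect size and its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    return z * z


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail P-value of a chi-square statistic with one degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if chi2 < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Tests with more than one degree of freedom would need a general χ² survival function instead of this closed form.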
Our GWAS GUI browser is a package that facilitates rapid interactive browsing of whole-genome association study results. It is designed to handle thousands of phenotypes, and thus can handle very rich datasets, such as those where global surveys of gene expression are combined with genome-wide SNP data. The browser also allows users to interact with the results of simpler scans, such as scans that focus on a single discrete outcome or a small number of related traits. To evaluate the program, we have applied it to several large datasets, including a study evaluating the association between 408 273 SNPs and the levels of 54 675 transcripts representing 20 599 known genes and assessed in lymphoblastoid cell lines from approximately 400 children (Dixon et al., 2007). After this initial evaluation, we released an early version of the program, named the mRNA by SNP browser (MRBS), when the Dixon et al. (2007) paper was published. In addition to the visualization tool, the full GWAS GUI browser includes a data preparation tool that can be used to organize tabulated results into an indexed database for rapid browsing. There are two main browsing interfaces within our browser: (i) an interface that retrieves all results for a specific trait and (ii) an interface that retrieves all results in a specific genomic region. In either view, results can typically be retrieved almost instantaneously. In the ‘trait-centric’ view, the browser can tabulate and sort a summary of user-provided association test results (e.g. effect size, standard error, heritability estimates, test statistics and P-value) and quickly generate plots that summarize the distribution of a user-specified test statistic along the genome. Alternatively, in the ‘position-centric’ view, the browser can tabulate all significant association test statistics (using a user-defined threshold) in a target region and plot the results for multiple traits.
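The two retrieval modes can be sketched with a small in-memory index. This is only an illustration of the idea, not the browser's actual implementation (which is written in C++ and uses its own on-disk database); the `Result` record, the `ResultIndex` class and the example identifiers are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    trait: str   # e.g. a transcript or phenotype identifier
    snp: str
    chrom: str
    pos: int     # base-pair position of the SNP
    stat: float  # association statistic; larger means stronger evidence


class ResultIndex:
    """Index association results by trait and by chromosomal position."""

    def __init__(self, results):
        self._by_trait = defaultdict(list)
        self._by_chrom = defaultdict(list)
        for r in results:
            self._by_trait[r.trait].append(r)
            self._by_chrom[r.chrom].append(r)
        # Sort each chromosome by position so regions can be found by bisection.
        for rows in self._by_chrom.values():
            rows.sort(key=lambda r: r.pos)
        self._positions = {
            chrom: [r.pos for r in rows] for chrom, rows in self._by_chrom.items()
        }

    def trait_view(self, trait):
        """Trait-centric view: all results for one trait, strongest first."""
        return sorted(self._by_trait.get(trait, []),
                      key=lambda r: r.stat, reverse=True)

    def position_view(self, chrom, start, end, threshold):
        """Position-centric view: results in [start, end] with stat >= threshold."""
        rows = self._by_chrom.get(chrom, [])
        positions = self._positions.get(chrom, [])
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return [r for r in rows[lo:hi] if r.stat >= threshold]
```

For example, `ResultIndex(rows).position_view("5", 50, 250, threshold=10.0)` would return every result on chromosome 5 between positions 50 and 250 whose statistic is at least 10, across all traits.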
Optionally, information such as the location of nearby genes can also be displayed (Fig. 1). For convenience, both interfaces allow the browser to link the results to external databases chosen by the user, such as the University of California Santa Cruz (UCSC) genome browser, where users can examine the genomic context of each result in detail. When the user requests a SNP that is not included in the current dataset, linkage disequilibrium (LD) and tag information from the International HapMap Consortium can be used to suggest a backup tag-SNP. Figure 1 illustrates the browser interface after searching for a specific SNP position using the ‘position-centric’ view. Four SNPs of interest have been highlighted by the user in the tabular view (bottom left) and are circled in the graphical view.
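The LD-based fallback for a SNP that is missing from the dataset might look roughly like the following. This is a minimal sketch under stated assumptions: pairwise r² values are supplied as a dictionary keyed by SNP pairs, the 0.8 cutoff is a commonly used convention rather than a value taken from the browser, and the function name is hypothetical.

```python
def suggest_proxy(query_snp, available, ld_pairs, min_r2=0.8):
    """Suggest a stand-in for a SNP that is absent from the result set.

    available : set of SNP identifiers present in the current dataset
    ld_pairs  : dict mapping (snp_a, snp_b) tuples to pairwise r-squared
    min_r2    : assumed minimum r-squared for an acceptable proxy
    Returns (snp, r2) for the best proxy, or None if none qualifies.
    """
    if query_snp in available:
        return query_snp, 1.0  # no proxy needed
    best = None
    for (a, b), r2 in ld_pairs.items():
        if a == query_snp:
            other = b
        elif b == query_snp:
            other = a
        else:
            continue
        if other in available and r2 >= min_r2:
            if best is None or r2 > best[1]:
                best = (other, r2)
    return best
```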
Fig. 1. An illustration of the GWAS GUI browser interface. This example demonstrates how to display the results for a specific region. Several large statistics have been highlighted with blue circles by selecting the corresponding rows. The top transcripts ordered by maximum statistic within the region are tabulated in the right panel.
Allowing large groups of scientists to browse and interact with the results of large multi-dimensional GWAS can be extremely helpful. For example, prior to the publication of the Dixon et al. (2007) gene expression paper, we used an early version of our browser to share preliminary results with several colleagues. This led to the observation that SNPs in an intergenic region on chromosome 5p13 that were associated with Crohn's disease were also associated with transcript levels of PTGER4, suggesting that PTGER4 may be the primary candidate gene for Crohn's disease on chromosome 5. The Crohn's-associated SNPs are >200 kb away from the nearest annotated gene. The result is published and described in detail elsewhere (Libioulle et al., 2007). Since then, many others have browsed our results, which has led to several potential links between SNPs, human disease and mRNA transcript levels.
The current version of the GWAS GUI browser program is not restricted to gene-expression data, but is intended as a general tool that provides graphical overviews of whole-genome association study results for arbitrary phenotypes. The extended program allows users to load their own data files, test statistics and genomic annotation files into the browser in a standardized text format. Generally, the traits can be any outcomes of interest, such as case–control indicators, expression values and many other continuous or categorical measurements. Arbitrary meta-data about each trait can be tracked and displayed. We expect that the browser will be particularly helpful when multiple related traits are studied. In this setting, the browser simplifies the initial comparison of signals for different related traits in regions of interest.
The GWAS GUI browser program was implemented in C++ using the Qt4 toolkit (open-source version 4.4, Trolltech Inc.). It has been tested on Windows, Linux and Mac workstations. The system requirements depend on the size of the input datasets, which can range from a dataset examining a single trait and hundreds of thousands of genetic markers to large-scale genome-wide gene-expression datasets with tens of thousands of traits and markers. On a modern Windows workstation, the initial indexing of a set of results generated by PLINK (Purcell et al., 2007), MERLIN (Chen et al., 2007) or another whole-genome analysis tool and including approximately 300 000 SNPs requires ∼200 Mb of RAM and 5–10 min of computing time. After indexing, opening the same dataset and browsing the data should be nearly instantaneous and require only 60 Mb RAM.
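The reason a one-time indexing pass makes later browsing cheap can be illustrated with a byte-offset index over a delimited results file: the file is scanned once to record where each trait's rows begin, and later queries seek directly to those rows instead of rereading the file. This is a generic sketch, not the browser's actual database format; the tab-delimited layout, the header row and the function names are assumptions made for illustration.

```python
import io
from collections import defaultdict


def build_offset_index(fh, key_column=0, delimiter=b"\t"):
    """Scan a binary file handle once and map each key to its row offsets.

    Assumes one header line followed by delimited rows (hypothetical layout).
    """
    index = defaultdict(list)
    fh.seek(0)
    fh.readline()  # skip the header row
    while True:
        offset = fh.tell()
        line = fh.readline()
        if not line:
            break  # end of file
        if not line.strip():
            continue  # ignore blank lines
        key = line.rstrip(b"\r\n").split(delimiter)[key_column].decode()
        index[key].append(offset)
    return dict(index)


def fetch_rows(fh, index, key, delimiter=b"\t"):
    """Return the rows for one key by seeking to the stored offsets."""
    rows = []
    for offset in index.get(key, []):
        fh.seek(offset)
        fields = fh.readline().rstrip(b"\r\n").split(delimiter)
        rows.append([f.decode() for f in fields])
    return rows
```

With a file opened in binary mode (or an `io.BytesIO` buffer for testing), `build_offset_index` is run once and `fetch_rows(fh, index, "T1")` then reads only the rows for trait `T1`. A production tool would typically persist the index to disk so that it does not have to be rebuilt on every launch.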
Funding: National Human Genome Research Institute; National Heart Lung and Blood Institute.
Conflict of Interest: G.R.A. is a Pew Scholar of the Biomedical Sciences and is supported by the Pew Charitable Trusts.