
Simultaneous analysis of genetic associations with multiple phenotypes may reveal shared genetic susceptibility across traits (pleiotropy). CPBayes is a Bayesian meta-analysis method for studying cross-phenotype genetic associations. It uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. The CPBayes model is based on a spike and slab prior.

Installation

install.packages("CPBayes")
library("CPBayes")

How to run CPBayes for uncorrelated summary statistics.

library("CPBayes")
# Load the beta hat vector
BetaHatfile <- system.file("extdata", "BetaHat.rda", package = "CPBayes")
load(BetaHatfile)
BetaHat

BetaHat contains example data: the main genetic effect (beta/log(odds ratio)) estimates for a single nucleotide polymorphism (SNP) obtained from 10 separate case-control studies for 10 different diseases. Since the studies do not have any overlapping subjects, the beta-hat values across the diseases can be assumed to be uncorrelated.

# Load the standard error vector
SEfile <- system.file("extdata", "SE.rda", package = "CPBayes")
load(SEfile)
SE

SE contains the standard errors corresponding to the above beta hat vector across 10 separate case-control studies.
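Before running CPBayes, it can help to look at the trait-specific Wald z-statistics and two-sided p-values implied by a beta-hat vector and its standard errors. The following is a minimal base R sketch using made-up values; beta_hat_demo and se_demo are hypothetical names and numbers, not the BetaHat and SE objects shipped with the package.

```r
# Hypothetical effect estimates (log odds ratios) and standard errors
# for three traits. These are illustrative values, not the package data.
beta_hat_demo <- c(Disease1 = 0.10, Disease2 = -0.02, Disease3 = 0.00)
se_demo       <- c(Disease1 = 0.02, Disease2 = 0.02, Disease3 = 0.02)

# Wald z-statistic and two-sided p-value for each trait.
z_demo <- beta_hat_demo / se_demo
p_demo <- 2 * pnorm(-abs(z_demo))

# The same estimates on the odds-ratio scale.
or_demo <- exp(beta_hat_demo)

print(round(cbind(z = z_demo, p = p_demo, OR = or_demo), 4))
```

With the packaged data, the same two lines for z_demo and p_demo could presumably be applied to BetaHat and SE directly, since both are plain numeric vectors of equal length.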

# Specify the name of the traits and the genetic variant.
traitNames <- paste("Disease", 1:10, sep = "")
SNP1 <- "rs1234"
traitNames
SNP1

Now, since the studies are non-overlapping, the summary statistics across traits are uncorrelated. Here we run the analytic_locFDR_BF_uncor function for this example data.

# Run analytic_locFDR_BF_uncor function to analytically compute locFDR and log10BF for uncorrelated summary statistics.
result <- analytic_locFDR_BF_uncor(BetaHat, SE)
str(result)

This function provides the analytically computed locFDR [result$locFDR] and log10(Bayes factor) [result$log10_BF] for uncorrelated summary statistics. When locFDR (BF) is computed analytically, a fixed value of the slab variance is used.

Now we implement CPBayes (based on MCMC) for this example data. Since the studies are non-overlapping, we run the cpbayes_uncor function.

# Run the uncorrelated version of CPBayes.
result <- cpbayes_uncor(BetaHat, SE, Phenotypes = traitNames, Variant = SNP1)

When it finishes, cpbayes_uncor prints the list of important traits for which the trait-specific posterior probability of association (PPAj) is > 20%. However, the printed output is only a part of ‘result’, which is a list consisting of various components. An overall summary of ‘result’ can be seen by using the str() function (as shown below).

# Overall summary of the primary results produced by cpbayes_uncor.
str(result)

A detailed interpretation of all the outputs is given in the Value section of cpbayes_uncor in the CPBayes manual.

The post_summaries function provides important insights into an observed pleiotropic signal, e.g., the direction of associations, trait-specific posterior probability of associations (PPAj), posterior mean/median and 95% credible interval (Bayesian analog of the confidence interval) of the unknown true genetic effect (beta/odds ratio) on each trait, etc.

# Post summary of the MCMC data produced by cpbayes_uncor.
PleioSumm <- post_summaries(result, level = 0.05)
str(PleioSumm)

So we pass the list ‘result’ returned by cpbayes_uncor as the first argument and ‘level’ as the second argument to the post_summaries function. If ‘level’ is not specified, the default value is 0.05. For a detailed description of the outputs provided by this function, see the Value section of post_summaries in the CPBayes manual.
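To make the role of ‘level’ concrete, the sketch below shows how a posterior mean, median and an equal-tailed 100(1 - level)% credible interval can be obtained from a set of posterior draws in base R. It is only an illustration under stated assumptions: beta_draws is a hypothetical vector of simulated draws, and post_summaries may compute its summaries differently (for example, from a particular subset of the MCMC sample).

```r
set.seed(1)

# Hypothetical posterior draws of the log odds ratio for one trait.
# In practice the draws come from the MCMC output of CPBayes; the values
# here are simulated purely for illustration.
beta_draws <- rnorm(5000, mean = 0.08, sd = 0.02)

level <- 0.05

# Equal-tailed 100 * (1 - level)% credible interval from the draws.
ci <- quantile(beta_draws, probs = c(level / 2, 1 - level / 2))

post_summary_demo <- c(
  mean   = mean(beta_draws),
  median = median(beta_draws),
  lower  = unname(ci[1]),
  upper  = unname(ci[2])
)
print(round(post_summary_demo, 4))

# The same interval expressed on the odds-ratio scale.
print(round(exp(ci), 4))
```

With level = 0.05 the interval covers the central 95% of the draws; a smaller level gives a wider interval.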

Next we run the forest_cpbayes function to create a forest plot that presents the pleiotropy result produced by cpbayes_uncor.

# Forest plot for the pleiotropy result obtained by cpbayes_uncor.
forest_cpbayes(result, level = 0.05)

As with the post_summaries function, we pass the same list ‘result’ returned by cpbayes_uncor as the first argument to the function. The second argument is the level, whose default value is 0.05. In the forest plot, the 100(1 - level)% confidence interval of the beta/log odds ratio parameter is plotted for each trait. For more details, see the section on the forest_cpbayes function in the CPBayes manual.

How to run CPBayes for correlated summary statistics.

Next we demonstrate how to run CPBayes for correlated summary statistics. Get the path to the data.

# Load the beta-hat vector
datafile <- system.file("extdata", "cBetaHat.rda", package = "CPBayes")
load(datafile)
cBetaHat

Here the ‘c’ in cBetaHat stands for the correlated case. cBetaHat contains example data: the main genetic association parameter (beta/log odds ratio) estimates for a SNP across 10 overlapping case-control studies for 10 different diseases. Each of the 10 studies has a distinct set of 7000 cases and a common set of 10000 controls shared across all the studies. Since the studies have overlapping subjects, the beta-hat values across the diseases are correlated.

# Load the standard error vector
datafile <- system.file("extdata", "cSE.rda", package = "CPBayes")
load(datafile)
cSE

cSE contains the standard errors corresponding to the above beta hat vector across 10 overlapping case-control studies.

# Load the correlation matrix of the beta-hat vector (cBetaHat)
datafile <- system.file("extdata", "cor.rda", package = "CPBayes")
load(datafile)
cor

The correlation matrix of the beta-hat vector (cBetaHat) is given by ‘cor’, which we estimated by employing the estimate_corln function (demonstrated later in this tutorial) using the sample-overlap matrices (explained later in this tutorial).
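A correlation matrix supplied together with a vector of standard errors should at least be symmetric, have a unit diagonal and be positive definite. The base R sketch below runs these generic checks and shows how standard errors and a correlation matrix combine into a covariance matrix of the beta-hat vector. The objects cse_demo and cor_c_demo are hypothetical three-trait examples, not the packaged cSE and cor, and the checks are general sanity checks rather than anything CPBayes is documented to require.

```r
# Hypothetical standard errors and correlation matrix for three traits.
cse_demo   <- c(0.020, 0.030, 0.025)
cor_c_demo <- matrix(c(1.0, 0.4, 0.4,
                       0.4, 1.0, 0.4,
                       0.4, 0.4, 1.0), nrow = 3, byrow = TRUE)

# Generic sanity checks on the correlation matrix.
eig_vals <- eigen(cor_c_demo, symmetric = TRUE, only.values = TRUE)$values
stopifnot(isSymmetric(cor_c_demo),
          all(diag(cor_c_demo) == 1),
          min(eig_vals) > 0)

# Covariance matrix: Sigma[j, k] = SE[j] * SE[k] * cor[j, k].
cov_c_demo <- diag(cse_demo) %*% cor_c_demo %*% diag(cse_demo)
print(cov_c_demo)

# Converting back recovers the correlation matrix.
print(all.equal(cov2cor(cov_c_demo), cor_c_demo))
```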

Since the summary statistics across traits are correlated, we run the analytic_locFDR_BF_cor function for this example data.

# Run analytic_locFDR_BF_cor function to analytically compute locFDR and log10BF for correlated summary statistics.
result <- analytic_locFDR_BF_cor(cBetaHat, cSE, cor)
str(result)

This function analytically computes the locFDR [result$locFDR] and log10(Bayes factor) [result$log10_BF] for correlated summary statistics. Next we run the correlated version of CPBayes (based on MCMC) for this example data.

# Run the correlated version of CPBayes.
result <- cpbayes_cor(cBetaHat, cSE, cor, Phenotypes = traitNames, Variant = SNP1)

When it finishes, cpbayes_cor prints the list of important traits for which the trait-specific posterior probability of association (PPAj) is > 20%. However, the printed output is only a part of ‘result’, which is a list consisting of various components. An overall summary of ‘result’ can be seen by using the str() function (as shown below).

# Overall summary of the primary results produced by cpbayes_cor.
str(result)

A detailed interpretation of all the outputs is given in the Value section of cpbayes_cor in the CPBayes manual.

# Post summary of the MCMC data produced by cpbayes_cor.
PleioSumm <- post_summaries(result, level = 0.05)
str(PleioSumm)

post_summaries works in exactly the same way for both cpbayes_cor and cpbayes_uncor. For a detailed description of the outputs provided by post_summaries, see the Value section of post_summaries in the CPBayes manual.

Next we run the forest_cpbayes function to create a forest plot that presents the pleiotropy result produced by cpbayes_cor.

# Forest plot for the pleiotropy result obtained by cpbayes_cor.
forest_cpbayes(result, level = 0.05)

Note that forest_cpbayes works in exactly the same way for both cpbayes_cor and cpbayes_uncor. For more details, see the section on the forest_cpbayes function in the CPBayes manual.

How to run estimate_corln.

The function estimate_corln estimates the correlation matrix of the beta-hat vector for multiple overlapping case-control studies using the sample-overlap count matrices, which describe the number of cases or controls shared between studies/traits and the number of subjects who are a case for one study/trait but a control for another study/trait. For a cohort study, the phenotypic correlation matrix should be a reasonable substitute for this correlation matrix.

# Example data of sample-overlap matrices
SampleOverlapMatrixFile <- system.file("extdata", "SampleOverlapMatrix.rda", package = "CPBayes")
load(SampleOverlapMatrixFile)
SampleOverlapMatrix

SampleOverlapMatrix is a list that contains an example of the sample-overlap matrices for five different diseases in the Kaiser GERA cohort (real data). The list consists of three matrices, as follows. SampleOverlapMatrix$n11 provides the number of cases shared between all possible pairs of studies/traits. SampleOverlapMatrix$n00 provides the number of controls shared between all possible pairs of studies/traits. SampleOverlapMatrix$n10 provides the number of subjects who are a case for one study/trait and a control for another study/trait. For a more detailed explanation, see the Arguments section of estimate_corln in the CPBayes manual.
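As an illustration of how such matrices might be assembled by hand, the sketch below builds them for the simulated design used earlier in this tutorial: 10 studies, each with a distinct set of 7000 cases and a common set of 10000 controls. The names n11_demo, n00_demo and n10_demo are hypothetical. The sketch also assumes that the diagonals of n11 and n00 hold each study's own case and control counts and that the diagonal of n10 is zero; these conventions should be checked against the Arguments section of estimate_corln before use.

```r
K      <- 10     # number of studies/traits
n_case <- 7000   # distinct cases per study
n_ctrl <- 10000  # controls shared by all studies

# Cases are distinct, so no pair of studies shares any cases.
# ASSUMPTION: the diagonal holds each study's own number of cases.
n11_demo <- diag(n_case, nrow = K, ncol = K)

# Every pair of studies shares the full control set.
# ASSUMPTION: the diagonal holds each study's own number of controls.
n00_demo <- matrix(n_ctrl, nrow = K, ncol = K)

# Under this design no subject is a case in one study and a control
# in another, so all entries are zero.
n10_demo <- matrix(0, nrow = K, ncol = K)

print(n11_demo[1:3, 1:3])
print(n00_demo[1:3, 1:3])
print(n10_demo[1:3, 1:3])

# With CPBayes installed, these could then be passed on, e.g.:
# cor_demo <- CPBayes::estimate_corln(n11_demo, n00_demo, n10_demo)
```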

# Estimate the correlation matrix of correlated beta-hat vector
n11 <- SampleOverlapMatrix$n11
n00 <- SampleOverlapMatrix$n00
n10 <- SampleOverlapMatrix$n10
cor <- estimate_corln(n11, n00, n10)
cor

The function estimate_corln computes an approximate correlation matrix of the correlated beta-hat vector obtained from multiple overlapping case-control studies using the sample-overlap matrices. Note that for a cohort study, the phenotypic correlation matrix should be a reasonable substitute for this correlation matrix. These approximations of the correlation structure are accurate when none of the diseases/traits is associated with the environmental covariates or the genetic variant. While demonstrating cpbayes_cor, we used simulated data for 10 overlapping case-control studies, with each study having a distinct set of 7000 cases and a common set of 10000 controls shared across all the studies. We used the estimate_corln function to estimate the correlation matrix of the correlated beta-hat vector using the sample-overlap matrices.

Important note on the estimation of the correlation structure of a correlated beta-hat vector: In general, environmental covariates are expected to be present in a study and associated with the phenotypes of interest. Also, a small proportion of genome-wide genetic variants are expected to be associated. Hence the above approximations of the correlation matrix may not be accurate. So in general, we recommend an alternative strategy that estimates the correlation matrix from the genome-wide summary statistics data across traits, as follows.

First, extract all the SNPs whose trait-specific univariate association p-values are > 0.1 for every trait. The trait-specific univariate association p-values are obtained using the beta-hat and standard error for each trait. Each of the SNPs selected in this way is either weakly associated or not associated with any of the phenotypes (null SNP). Next, select a set of independent null SNPs from the initial set of null SNPs by using a threshold of r^2 < 0.01 (r: the correlation between the genotypes at a pair of SNPs). In the absence of in-sample linkage disequilibrium (LD) information, one can use the reference panel LD information for this screening. Finally, compute the correlation matrix of the effect estimates (beta-hat vector) as the sample correlation matrix of the beta-hat vector across all the selected independent null SNPs.

This strategy is more general and applicable to a cohort study or multiple overlapping studies for binary or quantitative traits with arbitrary distributions. It is also useful when the beta-hat vectors for multiple non-overlapping studies become correlated due to genetically related individuals across studies. Misspecification of the correlation structure can affect the results produced by CPBayes to some extent. Hence, if genome-wide summary statistics data across traits are available, we recommend this alternative strategy to estimate the correlation matrix of the beta-hat vector.
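The null-SNP strategy described above can be sketched in base R as follows. This is only an illustration on simulated data: beta_mat and se_mat are hypothetical SNP-by-trait matrices of effect estimates and standard errors, the simulated correlation of 0.4 is arbitrary, and the LD-pruning step is assumed to have been carried out beforehand with external tools, so it is not shown.

```r
set.seed(42)
n_snp   <- 2000  # hypothetical number of SNPs, assumed already LD-pruned
n_trait <- 3

# Simulate correlated z-statistics under the null, standing in for
# genome-wide summary statistics across traits.
true_cor <- matrix(0.4, nrow = n_trait, ncol = n_trait)
diag(true_cor) <- 1
z_mat <- matrix(rnorm(n_snp * n_trait), nrow = n_snp) %*% chol(true_cor)

se_mat   <- matrix(0.02, nrow = n_snp, ncol = n_trait)  # hypothetical SEs
beta_mat <- z_mat * se_mat

# Step 1: trait-specific two-sided p-values from beta-hat and SE.
p_mat <- 2 * pnorm(-abs(beta_mat / se_mat))

# Step 2: keep SNPs whose p-value exceeds 0.1 for every trait (null SNPs).
is_null <- apply(p_mat > 0.1, 1, all)

# Step 3 (selecting independent SNPs with r^2 < 0.01) is assumed done.

# Step 4: sample correlation matrix of beta-hat across the null SNPs.
cor_null <- cor(beta_mat[is_null, , drop = FALSE])

cat("Null SNPs retained:", sum(is_null), "of", n_snp, "\n")
print(round(cor_null, 3))
```

The resulting matrix plays the role of the correlation matrix that would otherwise come from estimate_corln.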

Getting more details