The Generalized Berk-Jones (GBJ) statistic was developed to perform set-based inference in genetic association studies. It is an alternative to tests such as the Sequence Kernel Association Test (SKAT), Generalized Higher Criticism (GHC), and Minimum p-value (minP).
GBJ is a generalization of the Berk-Jones (BJ) statistic, which offers, in a certain sense, asymptotic power guarantees for detection of rare and weak signals. GBJ modifies BJ to account for correlation between factors in a set. GBJ has been demonstrated to outperform other tests when signals are moderately sparse (more precisely, when the number of signals is between d^(1/4) and d^(1/2), where d is the number of factors in the set).
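To make the "moderately sparse" range concrete, the short sketch below evaluates the two bounds for a set of d = 50 factors. The value 50 is only an illustrative choice, and the variable names are made up for this example.

```r
# Number of factors in the set (illustrative value).
d <- 50

# Bounds of the moderately sparse regime: d^(1/4) and d^(1/2).
lower <- d^(1/4)
upper <- d^(1/2)

# Print both bounds rounded to two decimals.
print(round(c(lower = lower, upper = upper), 2))
```

For d = 50 the bounds are about 2.66 and 7.07, so the regime corresponds to roughly 3 to 7 true signals among the 50 factors.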
Other advantages include:

1. Analytic p-value calculation (no need for permutation inference).
2. Can be applied to individual-level genotype data or GWAS summary statistics.
3. No tuning parameters.
4. Accepts standard inputs (similar to the glm() function).
We show a simple example for testing the association between a set of 50 SNPs (which could be, for example, from the same gene or pathway) and a binary outcome.
    library(GBJ)
    set.seed(1000)

    # Case-control study, 1000 subjects
    cancer_status <- c(rep(1, 500), rep(0, 500))

    # We have 50 SNPs each with minor allele frequency of 0.3 in this example
    genotype_data <- matrix(data = rbinom(n = 1000 * 50, size = 2, prob = 0.3), nrow = 1000)
    age <- round(runif(n = 1000, min = 30, max = 80))
    gender <- rbinom(n = 1000, size = 1, prob = 0.5)

    # Fit the null model, calculate marginal score statistics for each SNP
    # (asymptotically equivalent to those calculated by, for example, PLINK)
    null_mod <- glm(cancer_status ~ age + gender, family = binomial(link = "logit"))
    log_reg_stats <- calc_score_stats(null_model = null_mod, factor_matrix = genotype_data, link_function = "logit")

    # Run the test
    GBJ(test_stats = log_reg_stats$test_stats, cor_mat = log_reg_stats$cor_mat)
    #> $GBJ
    #> [1] 1.43984
    #>
    #> $GBJ_pvalue
    #> [1] 0.330911
    #>
    #> $err_code
    #> [1] 0

We may not have convinced you that GBJ is the best option for your application. If that is the case, you may still be interested in trying the Berk-Jones (BJ), Generalized Higher Criticism (GHC), Higher Criticism (HC), or Minimum p-value (minP) tests, which can be run with the same inputs, e.g. GHC(test_stats=score_stats, cor_mat=cor_Z) to run the GHC. We have also developed an omnibus test which combines information from multiple different methods. Please see the vignette for more details.
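Because these tests take the same inputs, one way to compare them is to loop over the test functions. The following is a minimal sketch, assuming the GBJ package is installed and that BJ(), GHC(), HC(), and minP() accept the test_stats and cor_mat arguments in the same way as GBJ(). The `tests` and `results` names are introduced here for illustration only, and the structure of each returned object is not assumed beyond what str() prints.

```r
library(GBJ)
set.seed(1000)

# Recreate the simulated case-control data from the example above.
cancer_status <- c(rep(1, 500), rep(0, 500))
genotype_data <- matrix(data = rbinom(n = 1000 * 50, size = 2, prob = 0.3), nrow = 1000)
age <- round(runif(n = 1000, min = 30, max = 80))
gender <- rbinom(n = 1000, size = 1, prob = 0.5)

# Null model and marginal score statistics, as before.
null_mod <- glm(cancer_status ~ age + gender, family = binomial(link = "logit"))
log_reg_stats <- calc_score_stats(null_model = null_mod,
                                  factor_matrix = genotype_data,
                                  link_function = "logit")

# Hypothetical helper list: the alternative tests named in the text.
tests <- list(BJ = BJ, GHC = GHC, HC = HC, minP = minP)

# Run each test on the same statistics and correlation matrix.
results <- lapply(tests, function(test_fun) {
  test_fun(test_stats = log_reg_stats$test_stats,
           cor_mat = log_reg_stats$cor_mat)
})

# Inspect whatever each test returned.
str(results)
```

Keeping the results in a named list makes it straightforward to line up the statistics and p-values from the different tests side by side.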