When looking at multivariate binomial data with the aim of learningabout the dependence that is present, possibly after correcting for somecovariates many models and methods are available:
in the mets package you can fit the
Pairwise odds ratio model
Bivariate Probit model
Additive gamma random effects model
These last three models are all fitted in the mets package usingcomposite likelihoods based on pairs within clusters. The models can befitted specifically based on specifying which pairs one wants to use forthe composite score.
The models are described in futher details in the binomial-twinvignette.
We start by simulating family data with and additive gamma structureon ACE form. Here 10000 families consisting of two parents and twochildren. The response is ybin and there is one covariate x.
library(mets)library(timereg)#> Loading required package: survival#>#> Attaching package: 'timereg'#> The following objects are masked from 'package:mets':#>#> Event, event.split, kmplot, plotConfregionset.seed(100) data<-simbinClaytonOakes.family.ace(500,2,1,beta=NULL,alpha=NULL) data$number<-c(1,2,3,4) data$child<-1*(data$number==3)head(data)#> ybin x type cluster number child#> 1 1 1 mother 1 1 0#> 2 1 0 father 1 2 0#> 3 1 0 child 1 3 1#> 4 1 1 child 1 4 0#> 5 1 1 mother 2 1 0#> 6 1 0 father 2 2 0We fit the marginal models, and here find a covariate effect at 0.3for x. The marginals can be specified as desired.
aa<- margbin<-glm(ybin~x,data=data,family=binomial())summary(aa)#>#> Call:#> glm(formula = ybin ~ x, family = binomial(), data = data)#>#> Coefficients:#> Estimate Std. Error z value Pr(>|z|)#> (Intercept) 0.49908 0.06387 7.814 5.52e-15 ***#> x 0.17531 0.09355 1.874 0.0609 .#> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#>#> (Dispersion parameter for binomial family taken to be 1)#>#> Null deviance: 2610.2 on 1999 degrees of freedom#> Residual deviance: 2606.7 on 1998 degrees of freedom#> AIC: 2610.7#>#> Number of Fisher Scoring iterations: 4For the Additive Gamma model we set-up the random effects included insuch a family to make the ACE valid using some special functions forthis.
The model is constructed with one enviromental effect shared by allin the family and 8 genetic random effects with size 1/4 geneticvariance. Looking at the first family we see that the mother and fatherboth share half the genes with the children and that the two childrenalso share half their genes with this specification. Below we also showan alternative specification of this model using all pairs.
# make ace random effects designout<-ace.family.design(data,member="type",id="cluster")out$pardes#> [,1] [,2]#> [1,] 0.25 0#> [2,] 0.25 0#> [3,] 0.25 0#> [4,] 0.25 0#> [5,] 0.25 0#> [6,] 0.25 0#> [7,] 0.25 0#> [8,] 0.25 0#> [9,] 0.00 1head(out$des.rv,4)#> m1 m2 m3 m4 f1 f2 f3 f4 env#> [1,] 1 1 1 1 0 0 0 0 1#> [2,] 0 0 0 0 1 1 1 1 1#> [3,] 1 1 0 0 1 1 0 0 1#> [4,] 1 0 1 0 1 0 1 0 1We can now fit the model calling the two-stage function
# fitting ace model for family structurets<-binomial.twostage(margbin,data=data,clusters=data$cluster,theta=c(2,1),random.design=out$des.rv,theta.des=out$pardes)summary(ts)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#> theta se#> dependence1 2.817486 0.8981749#> dependence2 1.137637 0.2806719#>#> $type#> [1] "clayton.oakes"#>#> $h#> Estimate Std.Err 2.5% 97.5% P-value#> dependence1 0.7124 0.09262 0.5308 0.8939 1.457e-14#> dependence2 0.2876 0.09262 0.1061 0.4692 1.899e-03#>#> $vare#> NULL#>#> $vartot#> Estimate Std.Err 2.5% 97.5% P-value#> p1 3.955 0.8668 2.256 5.654 5.051e-06#>#> attr(,"class")#> [1] "summary.mets.twostage"# true variance parametersc(2,1)#> [1] 2 1# total variance3#> [1] 3We now specify the same model via extracting all pairs.
The random effecs structure is typically simpler when just looking atpairs, but we start by specifying the random effects jointly for wholefamily. A special function writes up all combinations of pairs. Thereare 6 pairs within each family, and we keep track of who belongs to thedifferent families. We first simply give the pairs and we then shouldget the same result as before.
mm<-familycluster.index(data$cluster)head(mm$familypairindex,n=20)#> [1] 1 2 1 3 1 4 2 3 2 4 3 4 5 6 5 7 5 8 6 7pairs<- mm$pairsdim(pairs)#> [1] 3000 2head(pairs,12)#> [,1] [,2]#> [1,] 1 2#> [2,] 1 3#> [3,] 1 4#> [4,] 2 3#> [5,] 2 4#> [6,] 3 4#> [7,] 5 6#> [8,] 5 7#> [9,] 5 8#> [10,] 6 7#> [11,] 6 8#> [12,] 7 8Now with the pairs we fit the model
tsp<-binomial.twostage(margbin,data=data,clusters=data$cluster,theta=c(2,1),detail=0,random.design=out$des.rv,theta.des=out$pardes,pairs=pairs)summary(tsp)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#> theta se#> dependence1 2.817486 0.8981749#> dependence2 1.137637 0.2806719#>#> $type#> [1] "clayton.oakes"#>#> $h#> Estimate Std.Err 2.5% 97.5% P-value#> dependence1 0.7124 0.09262 0.5308 0.8939 1.457e-14#> dependence2 0.2876 0.09262 0.1061 0.4692 1.899e-03#>#> $vare#> NULL#>#> $vartot#> Estimate Std.Err 2.5% 97.5% P-value#> p1 3.955 0.8668 2.256 5.654 5.051e-06#>#> attr(,"class")#> [1] "summary.mets.twostage"Here a random sample of pairs are given instead and we get otherestimates.
set.seed(100)ssid<-sort(sample(1:nrow(pairs),nrow(pairs)/2))tsd<-binomial.twostage(aa,data=data,clusters=data$cluster,theta=c(2,1),step=1.0,random.design=out$des.rv,iid=1,Nit=10,theta.des=out$pardes,pairs=pairs[ssid,])summary(tsd)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#> theta se#> dependence1 3.0275402 1.6045129#> dependence2 0.8135103 0.3895686#>#> $type#> [1] "clayton.oakes"#>#> $h#> Estimate Std.Err 2.5% 97.5% P-value#> dependence1 0.7882 0.1511 0.49210 1.0843 1.816e-07#> dependence2 0.2118 0.1511 -0.08431 0.5079 1.609e-01#>#> $vare#> NULL#>#> $vartot#> Estimate Std.Err 2.5% 97.5% P-value#> p1 3.841 1.402 1.093 6.589 0.00615#>#> attr(,"class")#> [1] "summary.mets.twostage"To specify such a model when only the pairs are available we show howto specify the model. We here use the same marginal “aa” to make theresults comparable. The marginal can also be fitted based on availabledata.
We start by selecting the data related to the pairs, and sets up newid’s and to start we specify the model using the full design with 9random effects. Below we show how one can use with only the randomeffects needed for each pair, which is typically simpler.
head(pairs[ssid,])#> [,1] [,2]#> [1,] 1 2#> [2,] 1 3#> [3,] 2 3#> [4,] 2 4#> [5,] 3 4#> [6,] 6 8ids<-sort(unique(c(pairs[ssid,])))pairsids<-c(pairs[ssid,])pair.new<-matrix(fast.approx(ids,c(pairs[ssid,])),ncol=2)head(pair.new)#> [,1] [,2]#> [1,] 1 2#> [2,] 1 3#> [3,] 2 3#> [4,] 2 4#> [5,] 3 4#> [6,] 5 7dataid<-dsort(data[ids,],"cluster")outid<-ace.family.design(dataid,member="type",id="cluster")outid$pardes#> [,1] [,2]#> [1,] 0.25 0#> [2,] 0.25 0#> [3,] 0.25 0#> [4,] 0.25 0#> [5,] 0.25 0#> [6,] 0.25 0#> [7,] 0.25 0#> [8,] 0.25 0#> [9,] 0.00 1head(outid$des.rv)#> m1 m2 m3 m4 f1 f2 f3 f4 env#> [1,] 1 1 1 1 0 0 0 0 1#> [2,] 0 0 0 0 1 1 1 1 1#> [3,] 1 1 0 0 1 1 0 0 1#> [4,] 1 0 1 0 1 0 1 0 1#> [5,] 0 0 0 0 1 1 1 1 1#> [6,] 1 1 0 0 1 1 0 0 1Now fitting the model with the data set up
aa<-glm(ybin~x,data=dataid,family=binomial())tsdid<-binomial.twostage(aa,data=dataid,clusters=dataid$cluster,theta=c(2,1),random.design=outid$des.rv,theta.des=outid$pardes,pairs=pair.new)summary(tsdid)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#> theta se#> dependence1 3.1389859 1.649758#> dependence2 0.8010544 0.399870#>#> $type#> [1] "clayton.oakes"#>#> $h#> Estimate Std.Err 2.5% 97.5% P-value#> dependence1 0.7967 0.1492 0.50421 1.0892 9.358e-08#> dependence2 0.2033 0.1492 -0.08917 0.4958 1.731e-01#>#> $vare#> NULL#>#> $vartot#> Estimate Std.Err 2.5% 97.5% P-value#> p1 3.94 1.438 1.121 6.759 0.006153#>#> attr(,"class")#> [1] "summary.mets.twostage"We now specify the design specifically using the pairs. Therandom.design and design on the parameters are now given for each pair,as a 3 dimensional matrix. with a direct specification of random.designand the design on the parameters theta.design. In addition we need alsoto give the number of random effects for each pair. These basic thingsare constructed by certain functions for the ACE design.
pair.types<-matrix(dataid[c(t(pair.new)),"type"],byrow=T,ncol=2)head(pair.new,7)#> [,1] [,2]#> [1,] 1 2#> [2,] 1 3#> [3,] 2 3#> [4,] 2 4#> [5,] 3 4#> [6,] 5 7#> [7,] 6 7head(pair.types,7)#> [,1] [,2]#> [1,] "mother" "father"#> [2,] "mother" "child"#> [3,] "father" "child"#> [4,] "father" "child"#> [5,] "child" "child"#> [6,] "father" "child"#> [7,] "child" "child"###theta.des<-rbind(c(rbind(c(1,0),c(1,0),c(0,1),c(0,0))),c(rbind(c(0.5,0),c(0.5,0),c(0.5,0),c(0,1))))random.des<-rbind(c(1,0,1,0),c(0,1,1,0),c(1,1,0,1),c(1,0,1,1))mf<-1*(pair.types[,1]=="mother"& pair.types[,2]=="father")## pair, rv related to pairs, theta.des related to pairpairs.new<-cbind(pair.new,(mf==1)*1+(mf==0)*3,(mf==1)*2+(mf==0)*4,(mf==1)*1+(mf==0)*2,(mf==1)*3+(mf==0)*4)pairs.new is matrix with
columns 1:2 giving the indeces of the data points
columns 3:4 giving the indeces of the random.design for thedifferent pairs
columns 5 giving the indeces of the theta.des written asrows
columns 6 giving the number of random variables for thispair
The length of all rows of theta.des are the maximum number of randomeffects\(\times\) the number ofparameters. These two numbers are given in the call. In this case 4\(\times\) 2. So theta.des has rows oflength\(8\), possibly including some0’s for rows not relevant due to fewer random effects, as is the casehere for pairs that do not share genetic effects.
For pair 1 that is a mother/farther pair, we see that they share 1environmental random effect of size 1. There are also two geneticeffects that are unshared between the two. So a total of 3 randomeffects are needed here. The theta.des relates the 3 random effects topossible relationships in the parameters. Here the genetic effects arefull and so is the environmental effect. In contrast we also consider amother/child pair that share half the genes, now with random effectswith (1/2) gene variance. We there need 4 random effects, 2 non-sharedhalf-gene, 1 shared half-gene, and one shared full environmentaleffect.
# 3 rvs hererandom.des#> [,1] [,2] [,3] [,4]#> [1,] 1 0 1 0#> [2,] 0 1 1 0#> [3,] 1 1 0 1#> [4,] 1 0 1 1theta.des#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]#> [1,] 1.0 1.0 0.0 0 0 0 1 0#> [2,] 0.5 0.5 0.5 0 0 0 0 1head(pairs.new)#> [,1] [,2] [,3] [,4] [,5] [,6]#> [1,] 1 2 1 2 1 3#> [2,] 1 3 3 4 2 4#> [3,] 2 3 3 4 2 4#> [4,] 2 4 3 4 2 4#> [5,] 3 4 3 4 2 4#> [6,] 5 7 3 4 2 4Now fitting the model, and we see that it is a lot quicker due to thefewer random effects needed for pairs. We need to also specify thenumber of parameters in this case.
tsdid2<-binomial.twostage(aa,data=dataid,clusters=dataid$cluster,theta=c(2,1),random.design=random.des,theta.des=theta.des,pairs=pairs.new,dim.theta=2)summary(tsdid2)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#> theta se#> dependence1 3.1389859 1.649758#> dependence2 0.8010544 0.399870#>#> $type#> [1] "clayton.oakes"#>#> $h#> Estimate Std.Err 2.5% 97.5% P-value#> dependence1 0.7967 0.1492 0.50421 1.0892 9.358e-08#> dependence2 0.2033 0.1492 -0.08917 0.4958 1.731e-01#>#> $vare#> NULL#>#> $vartot#> Estimate Std.Err 2.5% 97.5% P-value#> p1 3.94 1.438 1.121 6.759 0.006153#>#> attr(,"class")#> [1] "summary.mets.twostage"The same model can be specifed even simpler via the kinshipcoefficient. For this speicification there are 4 random effects for eachpair, but some have variance 0. The mother-father pair, here shares arandom effect with variance 0, and have two non-shared genetic effectswith full variance, in addition to a fully shared environmentaleffect.
kinship<-rep(0.5,nrow(pair.types))kinship[pair.types[,1]=="mother"& pair.types[,2]=="father"]<-0head(kinship,n=10)#> [1] 0.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5out<-make.pairwise.design(pair.new,kinship,type="ace")tsdid3<-binomial.twostage(aa,data=dataid,clusters=dataid$cluster,theta=c(2,1)/9,random.design=out$random.design,theta.des=out$theta.des,pairs=out$new.pairs,dim.theta=2)summary(tsdid3)#> Dependence parameter for Clayton-Oakes model#> Variance of Gamma distributed random effects#> $estimates#> theta se#> dependence1 3.1389859 1.649758#> dependence2 0.8010544 0.399870#>#> $type#> [1] "clayton.oakes"#>#> $h#> Estimate Std.Err 2.5% 97.5% P-value#> dependence1 0.7967 0.1492 0.50421 1.0892 9.358e-08#> dependence2 0.2033 0.1492 -0.08917 0.4958 1.731e-01#>#> $vare#> NULL#>#> $vartot#> Estimate Std.Err 2.5% 97.5% P-value#> p1 3.94 1.438 1.121 6.759 0.006153#>#> attr(,"class")#> [1] "summary.mets.twostage"library(mets)set.seed(1000)data<-simbinClaytonOakes.family.ace(500,2,1,beta=NULL,alpha=NULL)head(data)#> ybin x type cluster#> 1 0 1 mother 1#> 2 0 0 father 1#> 3 0 1 child 1#> 4 0 0 child 1#> 5 1 0 mother 2#> 6 1 0 father 2data$number<-c(1,2,3,4)data$child<-1*(data$number==3)mm<-familycluster.index(data$cluster)head(mm$familypairindex,n=20)#> [1] 1 2 1 3 1 4 2 3 2 4 3 4 5 6 5 7 5 8 6 7pairs<- mm$pairsdim(pairs)#> [1] 3000 2head(pairs,12)#> [,1] [,2]#> [1,] 1 2#> [2,] 1 3#> [3,] 1 4#> [4,] 2 3#> [5,] 2 4#> [6,] 3 4#> [7,] 5 6#> [8,] 5 7#> [9,] 5 8#> [10,] 6 7#> [11,] 6 8#> [12,] 7 8 dtypes<-interaction( data[pairs[,1],"type"], data[pairs[,2],"type"]) dtypes<-droplevels(dtypes)table(dtypes)#> dtypes#> child.child father.child mother.child mother.father#> 500 1000 1000 500 dm<-model.matrix(~-1+factor(dtypes))Now with the pairs we fit the model
aa<-glm(ybin~x,data=data,family=binomial())tsp<-binomial.twostage(aa,data=data,clusters=data$cluster,theta.des=dm,pairs=cbind(pairs,1:nrow(dm)))summary(tsp)#> Dependence parameter for Odds-Ratio (Plackett) model#> $estimates#> theta se#> factor(dtypes)child.child 3.520109 0.7143594#> factor(dtypes)father.child 3.871698 0.6055928#> factor(dtypes)mother.child 3.922180 0.6233304#> factor(dtypes)mother.father 1.328730 0.2657202#>#> $log.or#> Estimate Std.Err 2.5% 97.5% P-value#> factor(dtypes)child.child 1.2585 0.2029 0.8607 1.6562 5.596e-10#> factor(dtypes)father..... 1.3537 0.1564 1.0471 1.6603 4.952e-18#> factor(dtypes)mother..... 1.3666 0.1589 1.0552 1.6781 8.017e-18#> factor(dtypes)mother......1 0.2842 0.2000 -0.1077 0.6762 1.552e-01#>#> $type#> [1] "plackett"#>#> attr(,"class")#> [1] "summary.mets.twostage"To fit the pairwise odds-ratio model in the case of apair-specification there are two options for fitting the model.
Starting by the second option. We need to start by specify the designof the odds-ratio of each pair. We set up the data and find allcombinations within the pairs. Subsequently, we remove all the emptygroups, by grouping together the factor levels 4:9, and then weconstruct the design.
tdp<-cbind( dataid[pair.new[,1],],dataid[pair.new[,2],])names(tdp)<-c(paste(names(dataid),"1",sep=""),paste(names(dataid),"2",sep=""))tdp<-transform(tdp,tt=interaction(type1,type2))dlevel(tdp)#> tt #levels=:6#> [1] "child.child" "father.child" "mother.child" "child.father"#> [5] "father.father" "mother.father"#> -----------------------------------------drelevel(tdp,newlevels=list(mother.father=4:9))<- obs.types~ttdtable(tdp,~tt+obs.types)#>#> obs.types mother.father child.child father.child mother.child#> tt#> child.child 0 232 0 0#> father.child 0 0 525 0#> mother.child 0 0 0 510#> child.father 0 0 0 0#> father.father 0 0 0 0#> mother.father 233 0 0 0tdp<-model.matrix(~-1+factor(obs.types),tdp)We then can fit the pairwise model using the pairs and thepair-design for descrbing the OR. The results are consistent with thethe ACE model as the mother-father have a lower dependence as is dueonly the environmental effects. All other combinations should have thesame dependence as also seem to be the case.
To fit the OR model it is generally recommended to use the var.linkto use the parmetrization with log-odd-ratio regression.
sessionInfo()#> R version 4.5.1 (2025-06-13)#> Platform: aarch64-apple-darwin25.0.0#> Running under: macOS Tahoe 26.0.1#>#> Matrix products: default#> BLAS: /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRblas.dylib#> LAPACK: /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.1#>#> locale:#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8#>#> time zone: Europe/Copenhagen#> tzcode source: internal#>#> attached base packages:#> [1] stats graphics grDevices utils datasets methods base#>#> other attached packages:#> [1] timereg_2.0.7 survival_3.8-3 mets_1.3.8#>#> loaded via a namespace (and not attached):#> [1] cli_3.6.5 knitr_1.50 rlang_1.1.6#> [4] xfun_0.53 jsonlite_2.0.0 future.apply_1.20.0#> [7] listenv_0.9.1 lava_1.8.1 htmltools_0.5.8.1#> [10] sass_0.4.10 rmarkdown_2.30 grid_4.5.1#> [13] evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0#> [16] numDeriv_2016.8-1.1 yaml_2.3.10 mvtnorm_1.3-3#> [19] lifecycle_1.0.4 compiler_4.5.1 codetools_0.2-20#> [22] Rcpp_1.1.0 future_1.67.0 lattice_0.22-7#> [25] digest_0.6.37 R6_2.6.1 parallelly_1.45.1#> [28] parallel_4.5.1 splines_4.5.1 Matrix_1.7-4#> [31] bslib_0.9.0 tools_4.5.1 globals_0.18.0#> [34] cachem_1.1.0