Here we describe the details of the calculations for the Kenward-Roger degrees of freedom and the adjusted covariance matrix of the coefficients.
The model definition is the same as in Details of the model fitting in mmrm, and we use the same notation.
For each subject \(i\) we observe a vector \[Y_i = (y_{i1}, \dotsc, y_{im_i})^\top \in \mathbb{R}^{m_i}\] and given a design matrix \[X_i \in \mathbb{R}^{m_i \times p}\] and a corresponding coefficient vector \(\beta \in \mathbb{R}^{p}\) we assume that the observations are multivariate normal distributed: \[Y_i \sim N(X_i\beta, \Sigma_i)\] where the covariance matrix \(\Sigma_i \in \mathbb{R}^{m_i \times m_i}\) is derived by subsetting the overall covariance matrix \(\Sigma \in \mathbb{R}^{m \times m}\) appropriately by \[\Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2}\] where the subsetting matrix \(S_i \in \{0, 1\}^{m \times m_i}\) contains in each of its \(m_i\) columns a single 1 indicating which overall time point is matching \(t_{ih}\), and \(G_i \in \mathbb{R}_{> 0}^{m_i \times m_i}\) is the diagonal weight matrix.
Conditional on the design matrices \(X_i\), the coefficient vector \(\beta\) and the covariance matrix \(\Sigma\), we assume that the observations are independent between the subjects.
We can write the linear model for all subjects together as \[Y = X\beta + \epsilon\] where \(Y \in \mathbb{R}^N\) combines all subject-specific observation vectors \(Y_i\) such that we have in total \(N = \sum_{i = 1}^{n}{m_i}\) observations, \(X \in \mathbb{R}^{N \times p}\) combines all subject-specific design matrices and \(\epsilon \in \mathbb{R}^N\) has a multivariate normal distribution \[\epsilon \sim N(0, \Omega)\] where \(\Omega \in \mathbb{R}^{N\times N}\) is block-diagonal with the subject-specific covariance matrices \(\Sigma_i\) on the diagonal and 0 in the remaining entries.
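To make the stacked representation concrete, the following is a minimal R sketch (not the mmrm implementation) that builds the stacked design matrix \(X\) and the block-diagonal \(\Omega\) from hypothetical per-subject pieces \(X_i\) and \(\Sigma_i\):

```r
# Minimal sketch (not the mmrm implementation): stack hypothetical per-subject
# design matrices X_i and combine the Sigma_i into a block-diagonal Omega.
library(Matrix)

set.seed(42)
m <- c(2, 3, 3)                              # visits per subject m_i (hypothetical)
p <- 2                                       # number of coefficients
X_list <- lapply(m, function(mi) matrix(rnorm(mi * p), mi, p))
Sigma_list <- lapply(m, function(mi) {
  A <- matrix(rnorm(mi * mi), mi, mi)
  crossprod(A) + diag(mi)                    # a positive definite Sigma_i
})

X <- do.call(rbind, X_list)                  # N x p stacked design matrix
Omega <- as.matrix(bdiag(Sigma_list))        # N x N block-diagonal covariance
```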
The mathematical derivation of the Kenward-Roger method is based on a Taylor expansion of the obtained covariance matrix of \(\hat\beta\) to get a more accurate estimate for it. All these derivations are based on the restricted maximum likelihood. Following the same notation, the covariance matrix \(\Omega\) can be represented as a function of the covariance matrix parameters \(\theta = (\theta_1, \dotsc, \theta_k)^\top\), i.e. \(\Omega(\theta)\). After model fitting with mmrm, we obtain the estimate \(\hat\beta = \Phi(\hat\theta)X^\top\Omega(\hat\theta)^{-1}Y\), where \(\Phi(\theta) = \left\{X^\top \Omega(\theta)^{-1} X\right\}^{-1}\) is the asymptotic covariance matrix of \(\hat\beta\). However, Kackar and Harville (1984) suggest that although \(\hat\beta\) is unbiased for \(\beta\), the covariance matrix \(\hat\Phi = \left\{X^\top \hat\Omega^{-1} X\right\}^{-1}\) can be biased. They showed that the variability of \(\hat\beta\) can be partitioned into two components,
\[ \Phi_A = \Phi + \Lambda\]
where \(\Phi\) is the variance-covariance matrix of the asymptotic distribution of \(\hat\beta\) as \(n \rightarrow \infty\) as defined above, and \(\Lambda\) represents the amount by which the asymptotic variance-covariance matrix underestimates \(\Phi_A\).
Based on a Taylor series expansion around \(\theta\), \(\Lambda\) can be approximated by
\[ \Lambda \simeq \Phi \left\{\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}(Q_{hj} - P_h \Phi P_j)}}\right\} \Phi\] where \[ P_h = X^\top \frac{\partial{\Omega^{-1}}}{\partial \theta_h} X\] \[ Q_{hj} = X^\top \frac{\partial{\Omega^{-1}}}{\partial \theta_h} \Omega \frac{\partial{\Omega^{-1}}}{\partial \theta_j} X\]
\(W\) is the inverse of the Hessian matrix of the negative log-likelihood function of \(\theta\) evaluated at the estimate \(\hat\theta\), i.e. the inverse of the observed Fisher information matrix, which is a consistent estimator of the variance-covariance matrix of \(\hat\theta\).
Again, based on a Taylor series expansion about \(\theta\), Kenward and Roger (1997) show that \[ \hat\Phi \simeq \Phi + \sum_{h=1}^k{(\hat\theta_h - \theta_h)\frac{\partial{\Phi}}{\partial{\theta_h}}} + \frac{1}{2} \sum_{h=1}^k{\sum_{j=1}^k{(\hat\theta_h - \theta_h)(\hat\theta_j - \theta_j)\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}}}}\] Ignoring the possible bias in \(\hat\theta\), \[ E(\hat\Phi) \simeq \Phi + \frac{1}{2}\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}}}}\] Using previously defined notation, this can be further written as \[ \frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}} = \Phi (P_h \Phi P_j + P_j \Phi P_h - Q_{hj} - Q_{jh} + R_{hj}) \Phi\] where \[ R_{hj} = X^\top\Omega^{-1}\frac{\partial^2\Omega}{\partial{\theta_h}\partial{\theta_j}}\Omega^{-1} X\]
Substituting \(\Phi\) and \(\Lambda\) back into \(\hat\Phi_A\), we have
\[ \hat\Phi_A = \hat\Phi + 2\hat\Phi\left\{\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}(Q_{hj} - P_h \hat\Phi P_j -\frac{1}{4}R_{hj})} }\right\} \hat\Phi\]
where \(\Omega(\hat\theta)\) replaces \(\Omega(\theta)\) on the right-hand side.
Note that if we ignore \(R_{hj}\), the term involving the second-order derivatives, we obtain a different estimate of the adjusted covariance matrix; we call this the linear Kenward-Roger approximation.
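As an illustration of the formula above, here is a minimal R sketch (not the mmrm implementation) computing \(\hat\Phi_A\), assuming \(\hat\Phi\) and \(W\) are available as matrices and the \(P_h\), \(Q_{hj}\) and \(R_{hj}\) matrices are stored in hypothetical (nested) lists `P`, `Q` and `R`:

```r
# Sketch of the adjusted covariance:
# Phi_A = Phi + 2 Phi {sum_hj W_hj (Q_hj - P_h Phi P_j - R_hj / 4)} Phi.
# Set linear = TRUE to drop the R_hj term (linear Kenward-Roger approximation).
kr_adjusted_vcov <- function(Phi_hat, W, P, Q, R, linear = FALSE) {
  k <- nrow(W)
  inner <- matrix(0, nrow(Phi_hat), ncol(Phi_hat))
  for (h in seq_len(k)) {
    for (j in seq_len(k)) {
      term <- Q[[h]][[j]] - P[[h]] %*% Phi_hat %*% P[[j]]
      if (!linear) {
        term <- term - R[[h]][[j]] / 4
      }
      inner <- inner + W[h, j] * term
    }
  }
  Phi_hat + 2 * Phi_hat %*% inner %*% Phi_hat
}
```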
In mmrm models, \(\Omega\) is a block-diagonal matrix, hence we can calculate \(P\), \(Q\) and \(R\) for each subject and add them up.
\[ P_h = \sum_{i=1}^{n}{P_{ih}} = \sum_{i=1}^{n}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} X_i}\]
\[ Q_{hj} = \sum_{i=1}^{n}{Q_{ihj}} = \sum_{i=1}^{n}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} \Sigma_i \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_j} X_i}\]
\[ R_{hj} = \sum_{i=1}^{n}{R_{ihj}} = \sum_{i=1}^{n}{X_i^\top\Sigma_i^{-1}\frac{\partial^2\Sigma_i}{\partial{\theta_h}\partial{\theta_j}}\Sigma_i^{-1} X_i}\]
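A minimal R sketch of this per-subject accumulation (for one fixed pair \(h, j\)), assuming hypothetical lists of the per-subject quantities \(X_i\), \(\Sigma_i\), \(\Sigma_i^{-1}\) and the relevant derivatives are already available:

```r
# Sketch of the per-subject accumulation of P_h, Q_hj and R_hj for one fixed
# pair (h, j). All arguments are hypothetical lists indexed by subject:
# X_list (X_i), Sigma_list (Sigma_i), Sigma_inv_list (Sigma_i^{-1}),
# dSigma_inv_h / dSigma_inv_j (d Sigma_i^{-1} / d theta), d2Sigma_hj.
accumulate_PQR <- function(X_list, Sigma_list, Sigma_inv_list,
                           dSigma_inv_h, dSigma_inv_j, d2Sigma_hj) {
  p <- ncol(X_list[[1]])
  P <- Q <- R <- matrix(0, p, p)
  for (i in seq_along(X_list)) {
    Xi <- X_list[[i]]
    P <- P + t(Xi) %*% dSigma_inv_h[[i]] %*% Xi
    Q <- Q + t(Xi) %*% dSigma_inv_h[[i]] %*% Sigma_list[[i]] %*%
      dSigma_inv_j[[i]] %*% Xi
    R <- R + t(Xi) %*% Sigma_inv_list[[i]] %*% d2Sigma_hj[[i]] %*%
      Sigma_inv_list[[i]] %*% Xi
  }
  list(P_h = P, Q_hj = Q, R_hj = R)
}
```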
The derivative of the overall covariance matrix \(\Sigma\) with respect to the variance parameters can be calculated through the derivatives of the Cholesky factor, and hence obtained through automatic differentiation, following matrix identity calculations. \[ \frac{\partial{\Sigma}}{\partial{\theta_h}} = \frac{\partial{LL^\top}}{\partial{\theta_h}} = \frac{\partial{L}}{\partial{\theta_h}}L^\top + L\frac{\partial{L^\top}}{\partial{\theta_h}}\]
\[ \frac{\partial^2{\Sigma}}{\partial{\theta_h}\partial{\theta_j}} = \frac{\partial^2{L}}{\partial{\theta_h}\partial{\theta_j}}L^\top + L\frac{\partial^2{L^\top}}{\partial{\theta_h}\partial{\theta_j}} + \frac{\partial{L}}{\partial{\theta_h}}\frac{\partial{L^\top}}{\partial{\theta_j}} + \frac{\partial{L}}{\partial{\theta_j}}\frac{\partial{L^\top}}{\partial{\theta_h}}\]
The derivatives of \(\Sigma^{-1}\) can be calculated through
\[ \frac{\partial{\Sigma\Sigma^{-1}}}{\partial{\theta_h}} = \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1} + \Sigma\frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}} = 0\] \[ \frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}} = - \Sigma^{-1} \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1}\]
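These two identities translate directly into code; here is a small R sketch, assuming a hypothetical matrix `dL` containing \(\partial L / \partial\theta_h\) (e.g. obtained via automatic differentiation):

```r
# Sketch of the derivative identities for one parameter theta_h:
# d Sigma = dL L' + L dL'   and   d Sigma^{-1} = -Sigma^{-1} (d Sigma) Sigma^{-1}.
dSigma_from_chol <- function(L, dL) {
  dL %*% t(L) + L %*% t(dL)
}
dSigma_inv_from_dSigma <- function(Sigma_inv, dSigma) {
  -Sigma_inv %*% dSigma %*% Sigma_inv
}
```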
If a subject does not have all visits, the corresponding covariance matrix can be represented as \[ \Sigma_i = S_i^\top \Sigma S_i\]
and the derivatives can be obtained through
\[ \frac{\partial{\Sigma_i}}{\partial{\theta_h}} = S_i^\top\frac{\partial{\Sigma}}{\partial{\theta_h}} S_i\]
\[ \frac{\partial^2{\Sigma_i}}{\partial{\theta_h}\partial{\theta_j}} =S_i^\top \frac{\partial^2{\Sigma}}{\partial{\theta_h}\partial{\theta_j}}S_i\]
The derivative of \(\Sigma_i^{-1}\), \(\frac{\partial\Sigma_i^{-1}}{\partial{\theta_h}}\), can be calculated through \(\Sigma_i^{-1}\) and \(\frac{\partial{\Sigma_i}}{\partial{\theta_h}}\) using the above.
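A small R sketch of this subsetting, assuming a hypothetical vector `visits` with the indices of the time points observed for subject \(i\) (so that index subsetting plays the role of \(S_i\)):

```r
# Sketch: for a subject with missing visits, Sigma_i = S_i' Sigma S_i is
# obtained by index subsetting, and the derivatives are subset the same way.
subset_sigma <- function(Sigma, dSigma, visits) {
  Sigma_i <- Sigma[visits, visits, drop = FALSE]
  dSigma_i <- dSigma[visits, visits, drop = FALSE]
  Sigma_i_inv <- solve(Sigma_i)
  list(
    Sigma_i = Sigma_i,
    dSigma_i = dSigma_i,
    dSigma_i_inv = -Sigma_i_inv %*% dSigma_i %*% Sigma_i_inv
  )
}
```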
When fitting grouped mmrm models, the covariance matrix for subject \(i\) of group \(g(i)\) can be written as \[ \Sigma_i = S_i^\top \Sigma_{g(i)} S_i\] Assuming there are \(B\) groups, the number of parameters is increased by a factor of \(B\). Since for each group the corresponding \(\theta\) does not affect the other groups, we will have block-diagonal \(P\), \(Q\) and \(R\) matrices, where the blocks are given by:
\[P_h = \begin{pmatrix}P_{h, 1} & \dots & P_{h, B}\end{pmatrix}\]
\[Q_{hj} = \begin{pmatrix}Q_{hj, 1} & 0 & \dots & \dots & 0 \\0 & Q_{hj, 2} & 0 & \dots & 0\\\vdots & & \ddots & & \vdots \\\vdots & & & \ddots & \vdots \\0 & \dots & \dots & 0 & Q_{hj, B}\end{pmatrix}\]
\[R_{hj} = \begin{pmatrix}R_{hj, 1} & 0 & \dots & \dots & 0 \\0 & R_{hj, 2} & 0 & \dots & 0\\\vdots & & \ddots & & \vdots \\\vdots & & & \ddots & \vdots \\0 & \dots & \dots & 0 & R_{hj, B}\end{pmatrix}\]
Using \(P_{h, b}\) to denote the block for group \(b\), we have \[ P_{h,b} = \sum_{g(i) = b}{P_{ih}} = \sum_{g(i) = b}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} X_i}\]
\[ Q_{hj,b} = \sum_{g(i) = b}{Q_{ihj}} = \sum_{g(i) = b}{X_i^\top\frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} \Sigma_i\frac{\partial{\Sigma_i^{-1}}}{\partial \theta_j} X_i}\]
\[ R_{hj,b} = \sum_{g(i) = b}{R_{ihj}} = \sum_{g(i) =b}{X_i^\top\Sigma_i^{-1}\frac{\partial^2\Sigma_i}{\partial{\theta_h}\partial{\theta_j}}\Sigma_i^{-1} X_i}\]
Under the weighted mmrm model, the covariance matrix for subject \(i\) can be represented as
\[ \Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2}\]
where \(G_i\) is a diagonal matrix of the weights. Then, when deriving \(P\), \(Q\) and \(R\), there are no mathematical differences, as the weights are constant; having \(G_i\) in addition to \(S_i\) does not change the algorithms and we can simply multiply the formulas with \(G_i^{-1/2}\), similarly as above for the subsetting matrix.
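For illustration, a small R sketch of the weighted subsetting, assuming a hypothetical vector `weights_i` with the positive weights on the diagonal of \(G_i\):

```r
# Sketch: Sigma_i = G_i^{-1/2} S_i' Sigma S_i G_i^{-1/2}, with index
# subsetting playing the role of S_i and `weights_i` the diagonal of G_i.
weighted_sigma_i <- function(Sigma, visits, weights_i) {
  G_inv_sqrt <- diag(1 / sqrt(weights_i), nrow = length(weights_i))
  G_inv_sqrt %*% Sigma[visits, visits, drop = FALSE] %*% G_inv_sqrt
}
```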
Suppose we are testing the linear combination of \(\beta\), \(C^\top\beta\), with \(C \in \mathbb{R}^{p\times c}\). We can use the following F-statistic \[ F = \frac{1}{c} (\hat\beta - \beta)^\top C (C^\top \hat\Phi_A C)^{-1} C^\top (\hat\beta - \beta)\] and the scaled statistic \[ F^* = \lambda F\] approximately follows an \(F_{c,m}\) distribution.
When we have only one coefficient to test, \(C\) is a column vector with a single entry equal to \(1\) and \(0\) elsewhere. We can still follow the same calculations as for the multi-dimensional case. This recovers the degrees of freedom results of the Satterthwaite method.
\(\lambda\) and \(m\) can be calculated through
\[ M = C (C^\top \Phi C)^{-1} C^\top\]
\[ A_1 = \sum_{h=1}^k{\sum_{j=1}^k{W_{hj} tr(M \Phi P_h \Phi) tr(M \Phi P_j \Phi)}}\]
\[ A_2 = \sum_{h=1}^k{\sum_{j=1}^k{W_{hj} tr(M \Phi P_h \Phi M \Phi P_j\Phi)}}\]
\[ B = \frac{1}{2c}(A_1 + 6A_2)\]
\[ g = \frac{(c+1)A_1 - (c+4)A_2}{(c+2)A_2}\]
\[ c_1 = \frac{g}{3c+2(1-g)}\]
\[ c_2 = \frac{c-g}{3c+2(1-g)}\]
\[ c_3 = \frac{c+2-g}{3c+2(1-g)}\]\[E^*={\left\{1-\frac{A_2}{c}\right\}}^{-1}\]\[V^*=\frac{2}{c}{\left\{\frac{1+c_1B}{(1-c_2 B)^2(1-c_3 B)}\right\}}\]
\[\rho =\frac{V^{*}}{2(E^*)^2}\]
\[m = 4 + \frac{c+2}{c\rho - 1}\]\[\lambda = \frac{m}{E^*(m-2)}\]
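The following is a minimal R sketch of these quantities (not the mmrm implementation), assuming \(\Phi\), \(\hat\Phi_A\), \(W\), a hypothetical list `P` of the \(P_h\) matrices, a contrast matrix `C_mat` (\(p \times c\)) and the estimate `beta_hat` are available, and testing the null hypothesis \(C^\top\beta = 0\):

```r
# Sketch of the Kenward-Roger F test: compute lambda and m from A1, A2 and
# return the scaled statistic F* = lambda * F with degrees of freedom (c, m).
kr_f_test <- function(beta_hat, C_mat, Phi, Phi_A, W, P) {
  c_dim <- ncol(C_mat)
  k <- nrow(W)
  M <- C_mat %*% solve(t(C_mat) %*% Phi %*% C_mat) %*% t(C_mat)
  A1 <- A2 <- 0
  for (h in seq_len(k)) {
    MPh <- M %*% Phi %*% P[[h]] %*% Phi
    for (j in seq_len(k)) {
      MPj <- M %*% Phi %*% P[[j]] %*% Phi
      A1 <- A1 + W[h, j] * sum(diag(MPh)) * sum(diag(MPj))
      A2 <- A2 + W[h, j] * sum(diag(MPh %*% MPj))
    }
  }
  B <- (A1 + 6 * A2) / (2 * c_dim)
  g <- ((c_dim + 1) * A1 - (c_dim + 4) * A2) / ((c_dim + 2) * A2)
  denom <- 3 * c_dim + 2 * (1 - g)
  c1 <- g / denom
  c2 <- (c_dim - g) / denom
  c3 <- (c_dim + 2 - g) / denom
  E_star <- 1 / (1 - A2 / c_dim)
  V_star <- (2 / c_dim) * (1 + c1 * B) / ((1 - c2 * B)^2 * (1 - c3 * B))
  rho <- V_star / (2 * E_star^2)
  m <- 4 + (c_dim + 2) / (c_dim * rho - 1)
  lambda <- m / (E_star * (m - 2))
  Cb <- t(C_mat) %*% beta_hat                       # C' beta_hat (null: = 0)
  F_stat <- drop(t(Cb) %*% solve(t(C_mat) %*% Phi_A %*% C_mat) %*% Cb) / c_dim
  list(F_scaled = lambda * F_stat, df1 = c_dim, df2 = m)
}
```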
While the Kenward-Roger adjusted covariance matrix uses a Taylor series to approximate the true value, the choice of parameterization can change the result. In the simple example of an unstructured covariance structure, in our current approach, where the parameters are elements of the Cholesky factor of \(\Sigma\) (see parameterization), the second-order derivatives of \(\Sigma\) with respect to our parameters are non-zero matrices. However, if we use the elements of \(\Sigma\) as our parameters, then the second-order derivatives are zero matrices. In practice the differences are typically small and do not materially affect the inference. If you would like to match SAS results for the unstructured covariance model, you can use the linear Kenward-Roger approximation.
In the mmrm package, we have implemented the Kenward-Roger calculations based on the previous sections. Specifically, for the first-order and second-order derivatives, we use automatic differentiation to obtain the results easily for non-spatial covariance structures. For spatial covariance structures, we derive the exact results.
For the spatial exponential covariance structure, we have
\[\theta = (\theta_1,\theta_2)\]\[\sigma = e^{\theta_1}\]\[\rho = \frac{e^{\theta_2}}{1 +e^{\theta_2}}\]
\[\Sigma_{ij} = \sigma\rho^{d_{ij}}\] where \(d_{ij}\) is the distance between time point \(i\) and time point \(j\).
So the first-order derivatives can be written as:
\[ \frac{\partial{\Sigma_{ij}}}{\partial\theta_1} = \frac{\partial\sigma}{\partial\theta_1} \rho^{d_{ij}} = e^{\theta_1}\rho^{d_{ij}} = \Sigma_{ij}\]
\[ \frac{\partial{\Sigma_{ij}}}{\partial\theta_2} = \sigma\frac{\partial{\rho^{d_{ij}}}}{\partial\theta_2} = \sigma \rho^{d_{ij}-1}{d_{ij}}\frac{\partial\rho}{\partial\theta_2} = \sigma \rho^{d_{ij}-1}{d_{ij}}\rho(1-\rho) = \sigma \rho^{d_{ij}} {d_{ij}} (1-\rho)\]
Second-order derivatives can be written as:
\[ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_1} = \frac{\partial\Sigma_{ij}}{\partial\theta_1} = \Sigma_{ij}\]
\[ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_2} = \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_1} = \frac{\partial\Sigma_{ij}}{\partial\theta_2} = \sigma\rho^{d_{ij}-1}{d_{ij}}\rho(1-\rho) = \sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)\]
\[ \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_2} = \frac{\partial{\sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)}}{\partial\theta_2} = \sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)\left(d_{ij}(1-\rho) - \rho\right)\]
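These expressions can be written compactly in R; the following is a minimal sketch, assuming a hypothetical distance matrix `d` between the observed time points:

```r
# Sketch of the spatial exponential covariance and its exact derivatives,
# given theta = (theta_1, theta_2) and a distance matrix d (element-wise).
spatial_exp_derivs <- function(theta, d) {
  sigma <- exp(theta[1])
  rho <- exp(theta[2]) / (1 + exp(theta[2]))
  Sigma <- sigma * rho^d
  list(
    Sigma = Sigma,
    dSigma_d1 = Sigma,                                        # d Sigma / d theta_1
    dSigma_d2 = sigma * rho^d * d * (1 - rho),                # d Sigma / d theta_2
    d2Sigma_d1d1 = Sigma,
    d2Sigma_d1d2 = sigma * rho^d * d * (1 - rho),
    d2Sigma_d2d2 = sigma * rho^d * d * (1 - rho) * (d * (1 - rho) - rho)
  )
}
```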