Precision matrix estimation requires selecting an appropriate regularization parameter $\lambda$ to balance sparsity (number of edges) against model fit (likelihood), and a mixing parameter $\alpha$ to trade off element-wise (individual-level) and block-wise (group-level) penalties.
In a Gaussian graphical model (GGM), the data matrix $X_{n \times d}$ consists of $n$ independent and identically distributed observations $X_1, \ldots, X_n$ drawn from $N_d(\mu, \Sigma)$. Let $\Omega = \Sigma^{-1}$ denote the precision matrix, and define the empirical covariance matrix as $S = n^{-1} \sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^\top$. Up to an additive constant, the negative log-likelihood (nll) for $\Omega$ simplifies to $$\mathrm{nll}(\Omega) = \frac{n}{2}\left[-\log\det(\Omega) + \mathrm{tr}(S\Omega)\right].$$ The edge set $E(\Omega)$ is determined by the non-zero off-diagonal entries: an edge $(i, j)$ is included if and only if $\omega_{ij} \neq 0$ for $i < j$. The number of edges is therefore given by $|E(\Omega)|$.
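To make these quantities concrete, the following minimal Python sketch (the function names and the zero-threshold are illustrative, not from any particular package) computes the empirical covariance $S$, the negative log-likelihood $\mathrm{nll}(\Omega)$, and the edge count $|E(\Omega)|$ for a given precision matrix.

```python
import numpy as np

def empirical_covariance(X):
    """S = n^{-1} * sum_i (X_i - Xbar)(X_i - Xbar)^T."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def nll(Omega, S, n):
    """nll(Omega) = (n/2)[-log det(Omega) + tr(S Omega)]; Omega assumed positive definite."""
    _, logdet = np.linalg.slogdet(Omega)
    return 0.5 * n * (-logdet + np.trace(S @ Omega))

def edge_count(Omega, tol=1e-8):
    """|E(Omega)|: number of non-zero off-diagonal entries omega_ij with i < j."""
    return int((np.abs(np.triu(Omega, k=1)) > tol).sum())
```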
Several information criteria select among candidate estimates by penalizing the negative log-likelihood with the number of edges:
$$\hat{\Omega}_{\mathrm{AIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + 2\,|E(\Omega)|\right\}.$$
$$\hat{\Omega}_{\mathrm{BIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + \log(n)\,|E(\Omega)|\right\}.$$
$$\hat{\Omega}_{\mathrm{EBIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + \log(n)\,|E(\Omega)| + 4\xi\log(d)\,|E(\Omega)|\right\},$$
where $\xi \in [0, 1]$ is a tuning parameter. Setting $\xi = 0$ reduces the EBIC to the classic BIC.
$$\hat{\Omega}_{\mathrm{HBIC}} = \arg\min_{\Omega}\left\{2\,\mathrm{nll}(\Omega) + \log[\log(n)]\log(d)\,|E(\Omega)|\right\}.$$
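For illustration, the self-contained sketch below (function name and zero-threshold are illustrative) evaluates all four criteria for a single candidate $\Omega$; in practice each criterion is minimized over the candidate estimates produced along the tuning-parameter path.

```python
import numpy as np

def information_criteria(Omega, S, n, d, xi=0.5):
    """AIC, BIC, EBIC, and HBIC scores for one candidate precision matrix."""
    _, logdet = np.linalg.slogdet(Omega)                  # Omega assumed positive definite
    val = n * (-logdet + np.trace(S @ Omega))             # 2 * nll(Omega)
    k = int((np.abs(np.triu(Omega, k=1)) > 1e-8).sum())   # |E(Omega)|
    return {
        "AIC":  val + 2 * k,
        "BIC":  val + np.log(n) * k,
        "EBIC": val + np.log(n) * k + 4 * xi * np.log(d) * k,
        "HBIC": val + np.log(np.log(n)) * np.log(d) * k,
    }
```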
Figure 1 illustrates the $K$-fold cross-validation procedure used to tune the parameters $\lambda$ and $\alpha$. The notation $\#\lambda$ and $\#\alpha$ denotes the number of candidate values considered for $\lambda$ and $\alpha$, respectively, forming a grid of $\#\lambda \times \#\alpha$ total parameter combinations. In each of the $K$ iterations, the negative log-likelihood loss is evaluated for every parameter combination, yielding $K$ performance values per combination. The optimal parameter pair is selected as the one achieving the lowest average loss across the $K$ iterations.
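A rough Python sketch of this grid search is given below; `fit_precision(S, lam, alpha)` is a placeholder for whichever penalized precision-matrix estimator is being tuned, and `lambdas` and `alphas` are the candidate grids, so the double loop covers all $\#\lambda \times \#\alpha$ combinations.

```python
import numpy as np

def cv_select(X, lambdas, alphas, fit_precision, K=5, seed=0):
    """Pick (lambda, alpha) with the lowest average held-out negative log-likelihood."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % K   # random fold assignment
    loss = np.zeros((len(lambdas), len(alphas)))
    for k in range(K):
        train, test = X[folds != k], X[folds == k]
        S_train = np.cov(train, rowvar=False, bias=True)
        S_test = np.cov(test, rowvar=False, bias=True)
        for i, lam in enumerate(lambdas):
            for j, alpha in enumerate(alphas):
                Omega = fit_precision(S_train, lam, alpha)
                _, logdet = np.linalg.slogdet(Omega)
                # held-out negative log-likelihood loss (up to constants)
                loss[i, j] += -logdet + np.trace(S_test @ Omega)
    # dividing by K gives the average loss; it does not change the argmin
    i_best, j_best = np.unravel_index(np.argmin(loss / K), loss.shape)
    return lambdas[i_best], alphas[j_best]
```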