Selection Criteria for Parameters in grasps

Introduction

Precision matrix estimation requires selecting an appropriate regularization parameter λ, which balances sparsity (the number of edges) against model fit (the likelihood), and a mixing parameter α, which trades off the element-wise (individual-level) and block-wise (group-level) penalties.

Background: Negative Log-Likelihood

In a Gaussian graphical model (GGM), the data matrix $X \in \mathbb{R}^{n \times d}$ consists of $n$ independent and identically distributed observations $X_1, \ldots, X_n$ drawn from $N_d(\mu, \Sigma)$. Let $\Omega = \Sigma^{-1}$ denote the precision matrix, and define the empirical covariance matrix as $S = n^{-1} \sum_{i=1}^n (X_i-\bar{X})(X_i-\bar{X})^\top$. Up to an additive constant, the negative log-likelihood (nll) for $\Omega$ simplifies to$$\mathrm{nll}(\Omega) = \frac{n}{2}\left[-\log\det(\Omega) + \mathrm{tr}(S\Omega)\right].$$ The edge set $E(\Omega)$ is determined by the non-zero off-diagonal entries: an edge $(i, j)$ is included if and only if $\omega_{ij} \neq 0$ for $i < j$. The number of edges is therefore $|E(\Omega)|$.
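As a concrete illustration, the two quantities above can be computed directly from a candidate precision matrix and the empirical covariance. The following is a minimal numpy sketch, not the grasps implementation; the function names `neg_log_lik` and `num_edges` are chosen here for exposition only.

```python
import numpy as np

def neg_log_lik(Omega, S, n):
    """nll(Omega) = (n/2) * [-log det(Omega) + tr(S @ Omega)], up to a constant."""
    sign, logdet = np.linalg.slogdet(Omega)
    if sign <= 0:
        return np.inf  # Omega must be positive definite
    return 0.5 * n * (-logdet + np.trace(S @ Omega))

def num_edges(Omega, tol=1e-10):
    """|E(Omega)|: count of non-zero off-diagonal entries omega_ij with i < j."""
    upper = Omega[np.triu_indices_from(Omega, k=1)]
    return int(np.sum(np.abs(upper) > tol))

# S can be formed from the data X (n x d) as in the text:
# S = np.cov(X, rowvar=False, bias=True)   # divides by n rather than n - 1
```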

Selection Criteria

  1. AIC: Akaike information criterion (Akaike 1973)

$$\hat{\Omega}_{\mathrm{AIC}} = \arg\min_{\Omega}\bigl\{2\,\mathrm{nll}(\Omega) + 2\,|E(\Omega)|\bigr\}.$$

  2. BIC: Bayesian information criterion (Schwarz 1978)

$$\hat{\Omega}_{\mathrm{BIC}} = \arg\min_{\Omega}\bigl\{2\,\mathrm{nll}(\Omega) + \log(n)\,|E(\Omega)|\bigr\}.$$

  3. EBIC: Extended Bayesian information criterion (Chen and Chen 2008; Foygel and Drton 2010)

$$\hat{\Omega}_{\mathrm{EBIC}} = \arg\min_{\Omega}\bigl\{2\,\mathrm{nll}(\Omega) + \log(n)\,|E(\Omega)| + 4\xi\log(d)\,|E(\Omega)|\bigr\},$$

where $\xi \in [0, 1]$ is a tuning parameter. Setting $\xi = 0$ reduces EBIC to the classic BIC.

  4. HBIC: High-dimensional Bayesian information criterion (Wang, Kim, and Li 2013; Fan et al. 2017)

$$\hat{\Omega}_{\mathrm{HBIC}} = \arg\min_{\Omega}\bigl\{2\,\mathrm{nll}(\Omega) + \log[\log(n)]\,\log(d)\,|E(\Omega)|\bigr\}.$$

  5. K-fold cross-validation with negative log-likelihood loss.

Figure 1 illustrates the K-fold cross-validation procedure used for tuning the parameters λ and α. The notation #λ and #α denotes the number of candidate values considered for λ and α, respectively, forming a grid of #λ × #α parameter combinations in total. In each of the K iterations, the negative log-likelihood loss is evaluated for every parameter combination, yielding K performance values per combination. The optimal parameter pair is the one achieving the lowest average loss across the K iterations (an illustrative sketch of this selection, alongside the information criteria above, follows Figure 1).

Figure 1: K-fold cross-validation procedure for tuning (λ, α).
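The sketch below shows, under stated assumptions, how a criterion-based and a cross-validation-based choice of the regularization level can be made on a grid of candidates. It is not the grasps API: it reuses `neg_log_lik` and `num_edges` from the earlier sketch and, as a self-contained stand-in estimator, fits a plain graphical lasso with scikit-learn's `GraphicalLasso` (its `alpha` argument plays the role of the element-wise penalty λ here and is unrelated to the mixing parameter α). The grasps estimator, which also accepts α, would replace `fit_precision`, and the grid would then range over (λ, α) pairs as in Figure 1.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.model_selection import KFold

def fit_precision(X, lam):
    # Stand-in fit: element-wise L1 penalty only (no block-wise term).
    return GraphicalLasso(alpha=lam, max_iter=200).fit(X).precision_

def ebic_score(Omega, S, n, d, xi=0.5):
    # 2 nll + log(n)|E| + 4 xi log(d)|E|; xi = 0 gives the classic BIC.
    E = num_edges(Omega)
    return 2 * neg_log_lik(Omega, S, n) + (np.log(n) + 4 * xi * np.log(d)) * E

def select_by_ebic(X, lambdas, xi=0.5):
    # Pick the candidate minimizing EBIC on the full data.
    n, d = X.shape
    S = np.cov(X, rowvar=False, bias=True)
    scores = [ebic_score(fit_precision(X, lam), S, n, d, xi) for lam in lambdas]
    return lambdas[int(np.argmin(scores))]

def select_by_cv(X, lambdas, K=5, seed=0):
    # Average held-out negative log-likelihood over K folds for each candidate.
    losses = np.zeros(len(lambdas))
    for train, test in KFold(n_splits=K, shuffle=True, random_state=seed).split(X):
        S_test = np.cov(X[test], rowvar=False, bias=True)
        for j, lam in enumerate(lambdas):
            losses[j] += neg_log_lik(fit_precision(X[train], lam), S_test, len(test))
    return lambdas[int(np.argmin(losses / K))]
```

With a two-dimensional grid over (λ, α), the same average-loss rule from Figure 1 applies: each pair receives K held-out losses, and the pair with the lowest mean loss is selected.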

References

Akaike, Hirotogu. 1973. “Information Theory and an Extension of the Maximum Likelihood Principle.” In Second International Symposium on Information Theory, edited by Boris Nikolaevich Petrov and Frigyes Csáki, 267–81. Budapest, Hungary: Akadémiai Kiadó.
Chen, Jiahua, and Zehua Chen. 2008. “Extended Bayesian Information Criteria for Model Selection with Large Model Spaces.” Biometrika 95 (3): 759–71. https://doi.org/10.1093/biomet/asn034.
Fan, Jianqing, Han Liu, Yang Ning, and Hui Zou. 2017. “High Dimensional Semiparametric Latent Graphical Model for Mixed Data.” Journal of the Royal Statistical Society Series B: Statistical Methodology 79 (2): 405–21. https://doi.org/10.1111/rssb.12168.
Foygel, Rina, and Mathias Drton. 2010. “Extended Bayesian Information Criteria for Gaussian Graphical Models.” In Advances in Neural Information Processing Systems 23 (NIPS 2010), edited by J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, 604–12. Red Hook, NY, USA: Curran Associates, Inc. https://dl.acm.org/doi/10.5555/2997189.2997257.
Schwarz, Gideon. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6 (2): 461–64. https://doi.org/10.1214/aos/1176344136.
Wang, Lan, Yongdai Kim, and Runze Li. 2013. “Calibrating Nonconvex Penalized Regression in Ultra-High Dimension.” The Annals of Statistics 41 (5): 2505–36. https://doi.org/10.1214/13-AOS1159.
