PosiR provides tools for post-selection inference (PoSI) in linear regression models. Post-selection inference addresses the challenge of performing valid statistical inference after model selection, ensuring that confidence intervals maintain their nominal coverage probability (e.g., 95%) even when the model is chosen based on the data. The package implements simultaneous confidence intervals using bootstrap-based max-t statistics, following Algorithm 1 from Kuchibhotla, Kolassa, and Kuffner (2022).
You can install the development version of PosiR from GitHub:
```r
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install PosiR; devtools::install() with no arguments installs the package
# found in the current working directory (e.g., a local clone of the repository)
devtools::install()

# Optional dependencies for vignette and examples
install.packages(c("dplyr", "pbapply"))
```

This example demonstrates how to use `simultaneous_ci()` to compute simultaneous confidence intervals for regression coefficients across a set of models:
```r
library(PosiR)

# Simulate data
set.seed(123)
X <- matrix(rnorm(100 * 3), 100, 3)
colnames(X) <- c("X1", "X2", "X3")
y <- 1 + X[, "X1"] * 0.5 + rnorm(100)  # True intercept = 1, X1 coefficient = 0.5

# Define model universe (column indices; in the output below, index 1 is the intercept)
Q <- list(
  model1 = 1:2,  # Intercept and X1
  model2 = 1:3   # Intercept, X1, and X2
)

# Compute simultaneous confidence intervals
result <- simultaneous_ci(X, y, Q, B = 500, verbose = FALSE)

# View results
print(result$intervals)
#>   model_id coefficient_name   estimate      lower     upper psi_hat_nqj
#> 1   model1      (Intercept) 0.96831201  0.7198033 1.2168207    1.084196
#> 2   model1               X1 0.44983825  0.2037940 0.6958825    1.062799
#> 3   model2      (Intercept) 0.97292290  0.7230406 1.2228052    1.096215
#> 4   model2               X1 0.45219170  0.2012421 0.7031413    1.105600
#> 5   model2               X2 0.04485171 -0.1971332 0.2868366    1.028019
#>      se_nqj
#> 1 0.1041248
#> 2 0.1030922
#> 3 0.1047003
#> 4 0.1051475
#> 5 0.1013913

# Plot the intervals
plot(result, main = "Simultaneous Confidence Intervals", las.labels = 1)
```
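The printed table can be cross-checked with a few lines of base R. The sketch below hard-codes the numbers shown in the output above and tests two relationships that are inferred from those numbers only, not from documented behaviour of PosiR: that every interval has the form `estimate ± K * se_nqj` with one shared multiplier `K`, and that `se_nqj` equals `sqrt(psi_hat_nqj / n)` with `n = 100`. The names `tab` and `K_implied` are illustrative.

```r
# Values copied from the printed result$intervals table
tab <- data.frame(
  estimate = c(0.96831201, 0.44983825, 0.97292290, 0.45219170, 0.04485171),
  lower    = c(0.7198033, 0.2037940, 0.7230406, 0.2012421, -0.1971332),
  upper    = c(1.2168207, 0.6958825, 1.2228052, 0.7031413, 0.2868366),
  psi      = c(1.084196, 1.062799, 1.096215, 1.105600, 1.028019),
  se       = c(0.1041248, 0.1030922, 0.1047003, 0.1051475, 0.1013913)
)
n <- 100  # sample size used in the example

# Multiplier implied by each row: half-width divided by the standard error
K_implied <- (tab$upper - tab$lower) / (2 * tab$se)
print(round(K_implied, 3))

# Pointwise normal critical value for a 95% interval, for comparison
print(qnorm(0.975))

# Largest discrepancy between se_nqj and sqrt(psi_hat_nqj / n)
print(max(abs(tab$se - sqrt(tab$psi / n))))
```

All five rows imply the same multiplier of about 2.39, compared with about 1.96 for a pointwise normal interval, which is where the extra width of the simultaneous intervals comes from.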
## Interpretation
The output `result$intervals` provides the coefficient estimates and simultaneous 95% confidence intervals for each model in `Q`. For example:

- The `(Intercept)` and `X1` intervals in `model1` should contain their true values (1 and 0.5, respectively).
- The intervals are wider than naive intervals to account for model selection uncertainty, ensuring valid coverage across all models in `Q`.
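To see why a max-t critical value widens the intervals, here is a minimal base-R sketch of the general idea. It is a generic illustration only: it uses a pairs (case-resampling) bootstrap with sandwich standard errors, which is an assumption made for this example and is not PosiR's implementation, nor necessarily the procedure in Algorithm 1 of the source paper. The names `models`, `fit_model`, `max_t`, and `K` are hypothetical.

```r
# Generic max-t bootstrap sketch in base R (not PosiR's code)
set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), n, 2))  # intercept plus two predictors
colnames(X) <- c("(Intercept)", "X1", "X2")
y <- 1 + 0.5 * X[, "X1"] + rnorm(n)

# Hypothetical model universe: column indices of X for each candidate model
models <- list(m1 = 1:2, m2 = 1:3)

# OLS estimates with sandwich (heteroskedasticity-robust) standard errors
fit_model <- function(Xq, y) {
  XtX_inv <- solve(crossprod(Xq))
  beta <- drop(XtX_inv %*% crossprod(Xq, y))
  res <- drop(y - Xq %*% beta)
  meat <- crossprod(Xq * res)  # sum of x_i x_i' * residual_i^2
  se <- sqrt(diag(XtX_inv %*% meat %*% XtX_inv))
  list(beta = beta, se = se)
}

fits <- lapply(models, function(q) fit_model(X[, q, drop = FALSE], y))

# For each bootstrap resample, record the largest absolute standardized
# deviation over every coefficient of every model
B <- 300
max_t <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)
  t_b <- unlist(lapply(names(models), function(m) {
    q <- models[[m]]
    beta_b <- fit_model(X[idx, q, drop = FALSE], y[idx])$beta
    abs(beta_b - fits[[m]]$beta) / fits[[m]]$se
  }))
  max_t[b] <- max(t_b)
}

# One shared critical value: the 95% quantile of the bootstrap maxima
K <- unname(quantile(max_t, 0.95))

ci <- do.call(rbind, lapply(names(models), function(m) {
  f <- fits[[m]]
  data.frame(model = m, coef = names(f$beta), estimate = f$beta,
             lower = f$beta - K * f$se, upper = f$beta + K * f$se,
             row.names = NULL)
}))
print(K)
print(ci)
```

Because `K` is a quantile of the maximum over all model-coefficient pairs, it exceeds the pointwise value `qnorm(0.975)`, so every interval is wider than its naive counterpart while coverage holds jointly over the whole model universe.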
Vignette: Run `vignette("Vignette")`.
Source Paper: Kuchibhotla, A., Kolassa, J., & Kuffner, T. (2022). Post-selection inference. Annual Review of Statistics and Its Application, 9(1), 505–527. DOI: 10.1146/annurev-statistics-100421-044639.