Usage of functions LF and QF in Linear/Logistic regression settings

We present a few examples of using SIHR on simulated datasets. We show how to conduct inference for linear functionals (LF) and quadratic functionals (QF) in both linear and logistic regression settings.

Load the library:

library(SIHR)

Linear Regression Setting

We consider the setting \(n=200, p=150\) with \[X_i \sim N(\textbf{0}_p, \textbf{I}_p),\; Y_i = \alpha + X_i^\intercal\beta + \epsilon_i, \; \epsilon_i\sim N(0,1),\;\textrm{where }\; \alpha = -0.5, \; \beta = (0.5, \textbf{1}_4,\textbf{0}_{p-5}).\] Our goal is to construct valid inference for the following objectives:

\(\beta_1 = 0.5\)
\(\beta_1 + \beta_2 = 1.5\)
\(\beta_{G}^\intercal \Sigma_{G,G}\beta_{G} = 3.25\), where \(\Sigma=\mathbb{E}[X_i X_i^\intercal] =\textbf{I}_p\) and \(G=\{1,2,3,4\}\).

The 1st and 2nd objectives will be handled together by LF(), while the 3rd objective will be handled by QF().
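As a quick sanity check, the true values of the three objectives can be recomputed from \(\beta\) and \(\Sigma\) in base R. This is only a sketch of the arithmetic; the names obj1, obj2 and obj3 are ours and are not part of SIHR.

```r
# True parameters of the linear setting described above
p <- 150
beta <- c(0.5, rep(1, 4), rep(0, p - 5))
Sigma <- diag(p)          # Sigma = I_p
G <- 1:4

obj1 <- beta[1]                                        # beta_1
obj2 <- beta[1] + beta[2]                              # beta_1 + beta_2
obj3 <- drop(t(beta[G]) %*% Sigma[G, G] %*% beta[G])   # beta_G' Sigma_GG beta_G

c(obj1, obj2, obj3)
#> [1] 0.50 1.50 3.25
```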

Generate Data

set.seed(0)
n <- 200
p <- 150
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
# note: as written, this line adds no noise term epsilon_i to y
y <- -0.5 + X %*% c(0.5, rep(1, 4), rep(0, p - 5))

LF: Linear Functionals

Loadings for Linear Functionals

loading1 <- c(1, rep(0, p - 1)) # for 1st objective, true value = 0.5
loading2 <- c(1, 1, rep(0, p - 2)) # for 2nd objective, true value = 1.5
loading.mat <- cbind(loading1, loading2)

Conduct inference by calling LF with model = "linear":

Est <- LF(X, y, loading.mat, model = "linear", intercept = TRUE, intercept.loading = FALSE, verbose = TRUE)
#> ---> Computing for loading (1/2)...
#> The projection direction is identified at mu = 0.044356at step =4
#> ---> Computing for loading (2/2)...
#> The projection direction is identified at mu = 0.044356at step =4

The parameter intercept indicates whether the model is fitted with or without an intercept term. The parameter intercept.loading indicates whether the intercept term is included in the inference objective. In this example, the model is fitted with an intercept, but we do not include it in our final objective.
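To make the distinction concrete, the sketch below computes the population target for the first loading under both choices. It assumes that intercept.loading = TRUE simply adds \(\alpha\) to the linear functional; this is our reading of the description above rather than something verified against the package internals, and the variable names are ours.

```r
# True parameters of the linear setting described above
p <- 150
alpha <- -0.5
beta <- c(0.5, rep(1, 4), rep(0, p - 5))
loading1 <- c(1, rep(0, p - 1))

# intercept.loading = FALSE: the target is loading' beta
target_without <- sum(loading1 * beta)           # 0.5

# ASSUMPTION: intercept.loading = TRUE targets alpha + loading' beta
target_with <- alpha + sum(loading1 * beta)      # 0
```

Under this reading, the choice changes the quantity being estimated, not how the model is fitted, which is controlled by intercept.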

Methods for LF

ci(Est)
#>   loading     lower     upper
#> 1       1 0.4892515 0.5115561
#> 2       2 1.4886438 1.5184631
summary(Est)
#> Call:
#> Inference for Linear Functional
#>
#> Estimators:
#>  loading est.plugin est.debias Std. Error z value Pr(>|z|)
#>        1     0.4764     0.5004   0.005690   87.94        0 ***
#>        2     1.4533     1.5036   0.007607  197.65        0 ***

Notice that the true values are \(0.5\) and \(1.5\) for the 1st and 2nd objectives, respectively, and both are included in their corresponding confidence intervals. It is also evident that our bias-corrected estimators (est.debias) are much closer to the true values than the Lasso plug-in estimators (est.plugin).
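The intervals reported by ci() can be approximately reproduced by hand from the summary() table. The sketch below assumes a two-sided 95% normal interval of the form est.debias ± z × Std. Error; the 95% level is our assumption, and the inputs are the rounded values printed above, so agreement is only up to rounding.

```r
# Values copied from the summary(Est) output above (rounded as printed)
est <- c(0.5004, 1.5036)        # est.debias
se  <- c(0.005690, 0.007607)    # Std. Error

# ASSUMPTION: two-sided 95% normal interval
z <- qnorm(0.975)
ci_manual <- data.frame(
  loading = 1:2,
  lower   = est - z * se,
  upper   = est + z * se
)
ci_manual
```

The resulting bounds match the ci(Est) output to about three decimal places.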

QF: Quadratic Functionals

For quadratic functionals, we need to specify the subset \(G \subseteq [p]\). If the argument A is not specified (default = NULL), inference is automatically conducted on \(\beta_G^\intercal \Sigma_{G,G} \beta_G\).

G <- c(1:4) # 3rd objective, true value = 3.25

Conduct inference by calling QF with model = "linear". The argument split indicates whether we split samples or not for computing the initial estimator.

Est <- QF(X, y, G, A = NULL, model = "linear", intercept = TRUE, verbose = TRUE)
#> The projection direction is identified at mu = 0.062729at step =3

ci method for QF

ci(Est)
#>    tau    lower    upper
#> 1 0.25 2.239521 3.890422
#> 2 0.50 2.233725 3.896219
#> 3 1.00 2.222250 3.907693

summary method for QF

summary(Est)
#> Call:
#> Inference for Quadratic Functional
#>
#>   tau est.plugin est.debias Std. Error z value  Pr(>|z|)
#>  0.25        2.9      3.065     0.4212   7.278 3.400e-13 ***
#>  0.50        2.9      3.065     0.4241   7.227 4.947e-13 ***
#>  1.00        2.9      3.065     0.4300   7.128 1.016e-12 ***

In the output, each row gives the result for a different value of \(\tau\), the enlargement factor for the asymptotic variance used to handle super-efficiency. Notice that the true value of the 3rd objective is \(3.25\), which is included in the confidence interval for every value of \(\tau\).

Logistic Regression Setting

The usage in the logistic regression setting is almost the same as in the linear setting, except that we need to specify the argument model = "logistic" or model = "logistic_alter" instead of model = "linear". We propose two different debiasing methods for logistic regression; both work theoretically and empirically.

We consider the setting \(n=200, p=120\) with \[X_i \sim N(\textbf{0}_p, \textbf{I}_p),\; P_i = \frac{\exp(\alpha + X_i^\intercal \beta)}{1+\exp(\alpha + X_i^\intercal \beta)},\; Y_i \sim {\rm Binomial}(1, P_i),\;\textrm{where }\; \alpha = -1.5, \;\beta = (0.5, 1, \textbf{0}_{p-2}).\] Our goal is to construct valid inference for the following objectives:

\(\beta_1 + \beta_2 = 1.5\)
\(-\frac{1}{2}\beta_1 - \beta_2 =-1.25\)
\(\beta_{G}^\intercal \Sigma_{G,G}\beta_{G} = 1.25\), where \(\Sigma=\mathbb{E}[X_i X_i^\intercal] =\textbf{I}_p\) and \(G=\{1,2,3\}\).

The 1st and 2nd objectives will be handled together by LF(), while the 3rd objective will be handled by QF().

Generate Data

set.seed(1)
n <- 200
p <- 120
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
val <- -1.5 + X[, 1] * 0.5 + X[, 2] * 1
prob <- exp(val) / (1 + exp(val))
y <- rbinom(n, 1, prob)
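The success probability in the data-generating code is the logistic (inverse logit) transform of the linear predictor. As a small aside, base R provides this transform as plogis() in the stats package; the sketch below checks that the two forms agree on a few illustrative values (the vector val here is hypothetical, not the one from the simulation).

```r
# A few illustrative linear-predictor values (hypothetical)
set.seed(1)
val <- rnorm(5)

prob_manual <- exp(val) / (1 + exp(val))  # form used in the simulation code
prob_plogis <- plogis(val)                # built-in logistic CDF

all.equal(prob_manual, prob_plogis)
#> [1] TRUE
```

For very large positive linear predictors, exp(val) can overflow, so plogis() is the numerically safer choice.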

LF: Linear Functionals

Loadings for Linear Functionals

loading1 <- c(1, 1, rep(0, p - 2)) # for 1st objective, true value = 1.5
loading2 <- c(-0.5, -1, rep(0, p - 2)) # for 2nd objective, true value = -1.25
loading.mat <- cbind(loading1, loading2)

Conduct inference by calling LF with model = "logistic" or model = "logistic_alter":

Est <- LF(X, y, loading.mat, model = "logistic", verbose = TRUE)
#> ---> Computing for loading (1/2)...
#> The projection direction is identified at mu = 0.028911at step =5
#> ---> Computing for loading (2/2)...
#> The projection direction is identified at mu = 0.028911at step =5

Methods for LF

ci(Est)
#>   loading      lower     upper
#> 1       1  0.6510009  1.866141
#> 2       2 -1.3927844 -0.424308
summary(Est)
#> Call:
#> Inference for Linear Functional
#>
#> Estimators:
#>  loading est.plugin est.debias Std. Error z value  Pr(>|z|)
#>        1     0.3745     1.2586     0.3100   4.060 4.907e-05 ***
#>        2    -0.2762    -0.9085     0.2471  -3.677 2.357e-04 ***

Notice that the true values are \(1.5\) and \(-1.25\) for the 1st and 2nd objectives, respectively, and both are included in their corresponding confidence intervals. It is also evident that our bias-corrected estimators (est.debias) are much closer to the true values than the Lasso plug-in estimators (est.plugin).
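The gap between the plug-in and bias-corrected estimators can be quantified directly from the printed table. The sketch below uses only the rounded values from the summary() output above; the column names err.plugin and err.debias are ours.

```r
# True values of the two logistic objectives
truth  <- c(1.5, -1.25)

# Values copied from the summary(Est) output above (rounded as printed)
plugin <- c(0.3745, -0.2762)    # est.plugin
debias <- c(1.2586, -0.9085)    # est.debias

err <- data.frame(
  loading    = 1:2,
  err.plugin = abs(plugin - truth),
  err.debias = abs(debias - truth)
)
err
```

In this single simulated dataset, the absolute error falls from about 1.13 to 0.24 for the first loading and from about 0.97 to 0.34 for the second.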

QF: Quadratic Functionals

For quadratic functionals, we find that a sufficiently large sample size is needed for good empirical results, since we need to split samples to obtain initial estimators. Thus, we generate another simulated dataset with a larger sample size \(n=400\).

set.seed(0)
n <- 400
p <- 120
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
val <- -1.5 + X[, 1] * 0.5 + X[, 2] * 1
prob <- exp(val) / (1 + exp(val))
y <- rbinom(n, 1, prob)
G <- c(1:3) # 3rd objective, true value = 1.25

Conduct inference by calling QF with model = "logistic_alter".

Est <- QF(X, y, G, A = NULL, model = "logistic_alter", intercept = TRUE, verbose = TRUE)
#> The projection direction is identified at mu = 0.029056at step =5

ci method for QF

ci(Est)
#>    tau     lower    upper
#> 1 0.25 0.2274503 2.048520
#> 2 0.50 0.1339998 2.141970
#> 3 1.00 0.0000000 2.306665

summary method for QF

summary(Est)
#> Call:
#> Inference for Quadratic Functional
#>
#>   tau est.plugin est.debias Std. Error z value Pr(>|z|)
#>  0.25     0.6434      1.138     0.4646   2.450  0.01430 *
#>  0.50     0.6434      1.138     0.5122   2.222  0.02631 *
#>  1.00     0.6434      1.138     0.5963   1.908  0.05633 .
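The true value \(1.25\) of the 3rd objective lies inside the interval for every \(\tau\). The printed intervals are also consistent with a normal interval whose lower bound is truncated at zero, which is natural because a quadratic form with a positive semi-definite matrix cannot be negative. The sketch below reproduces them from the rounded summary() values, assuming a 95% level and truncation at zero; both are our assumptions, not statements about the package internals.

```r
# Values copied from the summary(Est) output above (rounded as printed)
tau <- c(0.25, 0.50, 1.00)
est <- 1.138                          # est.debias (same for every tau)
se  <- c(0.4646, 0.5122, 0.5963)      # Std. Error grows with tau

# ASSUMPTION: two-sided 95% normal interval, lower bound clipped at 0
z <- qnorm(0.975)
ci_manual <- data.frame(
  tau   = tau,
  lower = pmax(est - z * se, 0),
  upper = est + z * se
)
ci_manual
```

Larger values of \(\tau\) inflate the standard error and therefore widen the interval; for \(\tau = 1\) the untruncated lower bound would be slightly negative, so it is reported as 0.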
