Movatterモバイル変換

Here’s a basic example using a simulated dataset to demonstrate howto utilize the main functions of the plsmmLasso package. This exampleassumes an effect of a grouping variable and different nonlinearfunctions for each group.

library(plsmmLasso)# Simulate a datasetset.seed(123)data_sim<-simulate_group_inter(N =50,n_mvnorm =3,grouped =TRUE,timepoints =3:5,nonpara_inter =TRUE,sample_from =seq(0,52,13),cos =FALSE,A_vec =c(1,1.5))sim<- data_sim$sim# Fit the datax<-as.matrix(sim[,-1:-3])y<- sim$yseries<- sim$seriest<- sim$tbases<-create_bases(t)lambda<-0.0046gamma<-0.00001plsmm_output<-plsmm_lasso(x, y, series, t,name_group_var ="group", bases$bases,gamma = gamma,lambda = lambda,timexgroup =TRUE,criterion ="BIC")

One of the most important output of theplsmm_lasso()function is the estimates of the fixed-effects.

plsmm_output$lasso_output$theta#>  Intercept      group         x1         x2         x3         x4         x5#> 0.19729689 3.15155151 1.91905369 0.74891414 0.02274130 0.01291499 0.01049015

Here, we observe that some covariates have small values, but most arenon-zero. If we desire more regularization for the fixed-effects, we canuse a larger value for lambda.

Hyperparameter tuning

lambda<-0.1plsmm_output<-plsmm_lasso(x, y, series, t,name_group_var ="group", bases$bases,gamma = gamma,lambda = lambda,timexgroup =TRUE,criterion ="BIC")plsmm_output$lasso_output$theta#> Intercept     group        x1        x2        x3        x4        x5#> 3.5544003 0.4262652 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000

head(plsmm_output$lasso_output$alpha)#> [1] -5.576108e-02 -4.720350e-01 -8.130512e-03  2.368476e-10 -7.129181e-06#> [6]  0.000000e+00

Similar behavior would be observed for the alpha values if we were toincrease the value of gamma.

To find optimal values for gamma and lambda, we tune thesehyperparameters using BIC-type criteria with thetune_plsmm() function and a grid search.

lambdas<- gammas<-round(exp(seq(log(1),log(1*0.00001),length.out =5)),digits =5)tuned_plsmm<-tune_plsmm(x, y, series, t,name_group_var ="group", bases$bases,gamma_vec = gammas,lambda_vec = lambdas,timexgroup =TRUE,criterion ="BIC")

Thetune_plsmm() function tries every possiblecombination of the values from lambdas and gammas and returns the modelwith the best BIC (other options are BICC and EBIC). This example is forillustration purposes only; in practice, a more exhaustive grid shouldbe used.

Plotting the results

Theplot_fit() function allows for the visualization ofthe estimated mean trajectories as well as the estimate of the nonlinearfunctions. By default, only the observed time points are being used. Touse continuous time points, the argumentpredicted can beset toTRUE.

plot_fit(x, y, series, t,name_group_var ="group", tuned_plsmm)

plot_fit(x, y, series, t,name_group_var ="group", tuned_plsmm,predicted =TRUE)

Post-selection inference

To compute p-values on the fixed-effects, thedebias_plsmm() function can be used.

debias_plsmm(x, y, series, tuned_plsmm)#>         Estimate   Debiased Std. Error   Lower 95% Upper 95%      p-value#> group 3.20441442 3.31447827 0.33608394  2.65575376 3.9732028 6.079223e-23#> x1    1.95569696 2.03694234 0.21339004  1.61869786 2.4551868 1.352832e-21#> x2    0.72925412 0.68895582 0.26577859  0.16802979 1.2098818 9.535955e-03#> x3    0.02755599 0.03780433 0.04645279 -0.05324313 0.1288518 4.157466e-01#> x4    0.01691564 0.02535323 0.04110770 -0.05521786 0.1059243 5.373988e-01#> x5    0.01569818 0.02732562 0.04056090 -0.05217374 0.1068250 5.005061e-01

The function reports the original coefficients, debiasedcoefficients, standard errors, confidence intervals and p-values. Thesep-values are already adjusted for the selection process of the lasso,and provide valid inference.

Test on the nonlinearfunctions

Finally, we can perform tests on the nonlinear functions. The firstelement of the list is an overall test of equality. If the p-value is\(< 0.05\), we reject the nullhypothesis of equality and conclude that overall the two nonlinearfunctions are different. To obtain a comparison at each time point,confidence bands are computed, and a figure displaying these confidencebands is generated. For the time points associated with confidence bandsthat include\(0\), we cannot rejectthe null hypothesis that the nonlinear functions are the same for thistime point. The data frame that is used to generate this figure can befound in the second element of the output list.

test_f_results<-test_f(x, y, series, t,name_group_var ="group", tuned_plsmm,n_boot =10,verbose =TRUE)#>   |                                                                              |                                                                      |   0%  |                                                                              |=======                                                               |  10%  |                                                                              |==============                                                        |  20%  |                                                                              |=====================                                                 |  30%  |                                                                              |============================                                          |  40%  |                                                                              |===================================                                   |  50%  |                                                                              |==========================================                            |  60%  |                                                                              |=================================================                     |  70%  |                                                                              |========================================================              |  80%  |                                                                              |===============================================================       |  90%  |                                                                              |======================================================================| 100%#>#> Completed fitting Bootstrap samples. Now formatting results, and generating figure.

test_f_results[[1]]#>           T p-value#> 1 0.4785269       0head(test_f_results[[2]])#>     t   f diff.  Lower 95% Upper 95%#> 1 0.0 0.2365327 0.00586791 0.4414301#> 2 0.1 0.2400843 0.01051297 0.4443028#> 3 0.2 0.2436013 0.01512247 0.4471456#> 4 0.3 0.2470832 0.01969562 0.4499582#> 5 0.4 0.2505294 0.02423163 0.4527404#> 6 0.5 0.2539394 0.02872971 0.4554919

Similarly to theplot_fit() function, the argumentpredicted can be changed toTRUE to displaythe joint confidence bands as a continuous function of time rather thanat the observed time points only.

test_f_results<-test_f(x, y, series, t,name_group_var ="group", tuned_plsmm,n_boot =10,predicted =TRUE)#>   |                                                                              |                                                                      |   0%  |                                                                              |=======                                                               |  10%  |                                                                              |==============                                                        |  20%  |                                                                              |=====================                                                 |  30%  |                                                                              |============================                                          |  40%  |                                                                              |===================================                                   |  50%  |                                                                              |==========================================                            |  60%  |                                                                              |=================================================                     |  70%  |                                                                              |========================================================              |  80%  |                                                                              |===============================================================       |  90%  |                                                                              |======================================================================| 100%#>#> Completed fitting Bootstrap samples. Now formatting results, and generating figure.

Example II: Flexibility ofthe Model

Modelfitting without group-specific nonlinear functions

The plsmmLasso package offers flexibility; if the nonlinear functionsdo not differ between groups, thetimexgroup argument canbe set toFALSE. Below, we simulate a dataset where thenonlinear functions are the same and fit the data accordingly.

# Simulate a dataset with equal nonlinear functions per groupset.seed(123)data_sim<-simulate_group_inter(N =50,n_mvnorm =3,grouped =TRUE,timepoints =3:5,nonpara_inter =FALSE,sample_from =seq(0,52,13),cos =FALSE,A_vec =c(1,1.5))sim<- data_sim$sim# Fit the datax<-as.matrix(sim[,-1:-3])y<- sim$yseries<- sim$seriest<- sim$tbases<-create_bases(t)tuned_plsmm<-tune_plsmm(x, y, series, t,name_group_var ="group", bases$bases,gamma_vec = gammas,lambda_vec = lambdas,timexgroup =FALSE,criterion ="BIC")plot_fit(x, y, series, t,name_group_var ="group", tuned_plsmm)

Model fitting withoutgrouping variable

It is also possible not to use any grouping variable for thename_group_var argument. Here, we simulate a datasetwithout the effect of a grouping variable, resulting in no difference inthe height of the overall mean trajectories and using the same nonlinearfunctions.

# Simulate a dataset with no group effectset.seed(123)data_sim<-simulate_group_inter(N =50,n_mvnorm =3,grouped =FALSE,timepoints =3:5,nonpara_inter =FALSE,sample_from =seq(0,52,13),cos =FALSE,A_vec =c(1,1.5))sim<- data_sim$sim# Fit the datax<-as.matrix(sim[,-1:-3])y<- sim$yseries<- sim$seriest<- sim$tbases<-create_bases(t)tuned_plsmm<-tune_plsmm(x, y, series, t,name_group_var =NULL, bases$bases,gamma_vec = gammas,lambda_vec = lambdas,timexgroup =FALSE,criterion ="BIC")plot_fit(x, y, series, t,name_group_var =NULL, tuned_plsmm)