Movatterモバイル変換


[0]ホーム

URL:


Type:Package
Title:High Dimensional Penalized Generalized Linear Mixed Models(pGLMM)
Version:1.5.4.8
Date:2024-08-29
Description:Fits high dimensional penalized generalized linear mixed models using the Monte Carlo Expectation Conditional Minimization (MCECM) algorithm. The purpose of the package is to perform variable selection on both the fixed and random effects simultaneously for generalized linear mixed models. The package supports fitting of Binomial, Gaussian, and Poisson data with canonical links, and supports penalization using the MCP, SCAD, or LASSO penalties. The MCECM algorithm is described in Rashid et al. (2020) <doi:10.1080/01621459.2019.1671197>. The techniques used in the minimization portion of the procedure (the M-step) are derived from the procedures of the 'ncvreg' package (Breheny and Huang (2011) <doi:10.1214/10-AOAS388>) and 'grpreg' package (Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2>), with appropriate modifications to account for the estimation and penalization of the random effects. The 'ncvreg' and 'grpreg' packages also describe the MCP, SCAD, and LASSO penalties.
License:GPL-2 |GPL-3 [expanded from: GPL (≥ 2)]
Encoding:UTF-8
Imports:ggplot2, Matrix, methods, ncvreg, reshape2, rstan (≥ 2.18.1),stringr, mvtnorm, MASS, survival, rstantools, RcppParallel (≥5.0.1)
Depends:lme4, bigmemory, Rcpp (≥ 0.12.0), R (≥ 3.6.0)
LinkingTo:BH (≥ 1.66.0), bigmemory, Rcpp (≥ 0.12.0), RcppArmadillo,RcppEigen (≥ 0.3.3.3.0), rstan (≥ 2.18.1), StanHeaders (≥2.18.0), RcppParallel (≥ 5.0.1)
RoxygenNote:7.1.2
NeedsCompilation:yes
Packaged:2024-08-29 19:06:09 UTC; hheiling
Author:Hillary Heiling [aut, cre], Naim Rashid [aut], Quefeng Li [aut], Joseph Ibrahim [aut]
Suggests:testthat, knitr, rmarkdown
Biarch:true
SystemRequirements:GNU make
Maintainer:Hillary Heiling <hmheiling@gmail.com>
Repository:CRAN
Date/Publication:2024-08-29 19:30:01 UTC

Calculation of Penalty Parameter Sequence (Lambda Sequence)

Description

Calculates the sequence of penalty parameters used in the model selection procedure.This function calls functions from packagencvreg.

Usage

LambdaSeq(  X,  y,  family,  offset = NULL,  alpha = 1,  lambda.min = NULL,  nlambda = 10,  penalty.factor = NULL)

Arguments

X

matrix of standardized fixed effects (seestd function inncvreg documenation). X should not include intercept.

y

numeric vector of response values. If "survival" family,y are the event indicatorvalues (0 if censored, 1 if event)

family

a description of the error distribution and link function to be used in the model. Currently, theglmmPen algorithm allows the Binomial ("binomial" or binomial()), Gaussian ("gaussian" or gaussian()), and Poisson ("poisson" or poisson()) familieswith canonical links only. SeephmmPen for variable selection withinproportional hazards mixed models for survival data.

offset

numeric vector that can be used to specify an a priori known component to be included in the linear predictor during fitting. This should beNULL or a numeric vector of length equal to the number of observations

alpha

Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty.alpha=1 is equivalent to the MCP/SCAD/LASSO penalty, whilealpha=0 is equivalent to ridge regression. However,alpha=0 is not supported;alpha may be arbitrarily small, but not exactly zero

lambda.min

numeric fraction between 0 and 1. The sequence of the lambda penalty parametersranges from the maximum lambda where all fixed and random effects are penalized to 0 and a minimum lambda value, which equals a small fraction of the maximum lambda. The parameterlambda.min specifies this fraction. Default value is set toNULL, whichautomatically selectslambda.min to equal 0.01 when the number of observations isgreater than the number of fixed effects predictors and 0.05 otherwise.Only usedif eitherlambda0_seq orlambda1_seq remain unspecified by the user(one or both of these sequence arguments set toNULL) and, consequently, one or more default sequences needto be calculated.

nlambda

positive integer specifying number of penalty parameters (lambda) with which to fit a model.

penalty.factor

an optional numeric vector equal to thefixef_noPen argumentinglmmPen

Value

numeric sequence of penalty parameters of lengthnlambda ranging from theminimum penalty parameter (first element) equal to fractionlambda.min multiplied by the maximum penalty parameter to the maximum penalty parameter (last element)


Control of Metropolis-within-Gibbs Adaptive Random Walk Sampling ProcedureControls the adaptive random walk Metropolis-within-Gibbs sampling procedure.

Description

Control of Metropolis-within-Gibbs Adaptive Random Walk Sampling Procedure

Controls the adaptive random walk Metropolis-within-Gibbs sampling procedure.

Usage

adaptControl(batch_length = 100, offset = 0)

Arguments

batch_length

positive integer specifying the number of posterior samples to collectbefore the proposal variance is adjusted based on the acceptance rate of the lastbatch_length accepted posterior samples. Default is set to 100. Batch length restrictedto be no less than 50.

offset

non-negative integer specifying an offset value for the increment of the proposalvariance adjustment. Optionally used to ensure the required diminishing adaptation condition. Default set to 0.⁠increment = min(0.01, 1 / sqrt(batch*batch_length + offset))⁠

Value

Function returns a list (inheriting from class "adaptControl") containing parameter specifications for the adaptive random walk sampling procedure.


Basal dataset: A composition of cancer datasets with top scoring pairs(TSPs) as covariates and binary response indicating if the subject's cancersubtype was basal-like.A dataset composed of four datasets combined from studies that contain gene expression data from subjects with several types of cancer.Two of these datasets contain gene expression data for subjects with Pancreatic Ductal Adenocarcinoma (PDAC), one dataset contains data for subjects with Breast Cancer, and the fourth dataset contains data for subjects with Bladder Cancer. The response of interest is whether or not the subject'scancer subtype was the basal-like subtype.See articles Rashid et al. (2020) "Modeling Between-Study Heterogeneity forImproved Replicability in Gene Signature Selection and Clinical Prediction"and Moffitt et al. (2015) "Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma" for further details on these four datasets.

Description

Basal dataset: A composition of cancer datasets with top scoring pairs(TSPs) as covariates and binary response indicating if the subject's cancersubtype was basal-like.

A dataset composed of four datasets combined from studies that contain gene expression data from subjects with several types of cancer.Two of these datasets contain gene expression data for subjects with Pancreatic Ductal Adenocarcinoma (PDAC), one dataset contains data for subjects with Breast Cancer, and the fourth dataset contains data for subjects with Bladder Cancer. The response of interest is whether or not the subject'scancer subtype was the basal-like subtype.See articles Rashid et al. (2020) "Modeling Between-Study Heterogeneity forImproved Replicability in Gene Signature Selection and Clinical Prediction"and Moffitt et al. (2015) "Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma" for further details on these four datasets.

Usage

data("basal")

Format

A list containing the following elements:

y

binary response vector; 1 indicates that the subject's cancerwas of the basal-like subtype, 0 otherwise

X

matrix of 50 top scoring pair (TSP) covariates

group

factor indicating which cancer study the observation belongs to,which are given the following descriptions:UNC PDAC, TCGA PDAC, TCGA Bladder Cancer, and UNC Breast Cancer

Z

model matrix for random effects; organized first by variable, thenby group (i.e. by cancer study)


Fit a Generalized Mixed Model via Monte Carlo Expectation Conditional Minimization (MCECM)

Description

glmm is used to fit a single generalized mixed model via Monte Carlo Expectation Conditional Minimization (MCECM). UnlikeglmmPen, no model selection is performed.

Usage

glmm(  formula,  data = NULL,  family = "binomial",  covar = NULL,  offset = NULL,  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = lambdaControl(),  progress = TRUE,  ...)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

family

a description of the error distribution and link function to be used in the model. Currently, theglmmPen algorithm allows the Binomial ("binomial" or binomial()), Gaussian ("gaussian" or gaussian()), and Poisson ("poisson" or poisson()) familieswith canonical links only. SeephmmPen for variable selection withinproportional hazards mixed models for survival data.

covar

character string specifying whether the covariance matrix should be unstructured("unstructured") or diagonal with no covariances between variables ("independent").Default is set toNULL. Ifcovar is set toNULL and the number of random effectspredictors (not including the intercept) is greater than or equal to 10 (i.e. high dimensional), then the algorithm automatically assumes an independent covariance structure andcovar is set to "independent". Otherwise ifcovaris set toNULL and the number of random effects predictors is less than 10, then thealgorithm automatically assumes an unstructured covariance structure andcovar is set to "unstructured".

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

...

additional arguments that could be passed intoglmmPen. SeeglmmPenfor further details.

Details

Theglmm function can be used to fit a single generalized mixed model.While this approach is meant to be used in the case where the user knows whichcovariates belong in the fixed and random effects and no penalization is required, one isallowed to specify non-zero fixed and random effects penalties usinglambdaControland the (...) arguments. The (...) allow for specification of penalty-related arguments; seeglmmPen for details. For a high dimensional situation, the user may want to fit a minimal penalty model using a small penalty for the fixed and random effects and save the posteriorsamples from this minimal penalty model for use in any BIC-ICQ calculations during selection withinglmmPen. Specifying a file name in the 'BICq_posterior' argument will save the posterior samples from theglmm model into a big.matrix with this file name, see the Details section ofglmmPen for additional details.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"))


Fit Penalized Generalized Mixed Models via Monte Carlo Expectation Conditional Minimization (MCECM)

Description

glmmPen is used to fit penalized generalized mixed models via Monte Carlo Expectation Conditional Minimization (MCECM). The purpose of the function is to perform variable selection on both the fixed and random effects simultaneously for thegeneralized linear mixed model.glmmPen selects the best model using BIC-type selection criteria (seeselectControl documentation for further details). To improve the speed of the algorithm, consider settingvar_restrictions = "fixef" within theoptimControl options.

Usage

glmmPen(  formula,  data = NULL,  family = "binomial",  covar = NULL,  offset = NULL,  fixef_noPen = NULL,  penalty = c("MCP", "SCAD", "lasso"),  alpha = 1,  gamma_penalty = switch(penalty[1], SCAD = 4, 3),  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = selectControl(),  BICq_posterior = NULL,  progress = TRUE,  ...)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

family

a description of the error distribution and link function to be used in the model. Currently, theglmmPen algorithm allows the Binomial ("binomial" or binomial()), Gaussian ("gaussian" or gaussian()), and Poisson ("poisson" or poisson()) familieswith canonical links only. SeephmmPen for variable selection withinproportional hazards mixed models for survival data.

covar

character string specifying whether the covariance matrix should be unstructured("unstructured") or diagonal with no covariances between variables ("independent").Default is set toNULL. Ifcovar is set toNULL and the number of random effectspredictors (not including the intercept) is greater than or equal to 10 (i.e. high dimensional), then the algorithm automatically assumes an independent covariance structure andcovar is set to "independent". Otherwise ifcovaris set toNULL and the number of random effects predictors is less than 10, then thealgorithm automatically assumes an unstructured covariance structure andcovar is set to "unstructured".

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

fixef_noPen

Optional vector of 0's and 1's of the same length as the number of fixed effects covariatesused in the model. Value 0 indicates the variable should not have its fixed effect coefficientpenalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.

penalty

character describing the type of penalty to use in the variable selection procedure.Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariancematrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.

alpha

Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty.alpha=1 is equivalent to the MCP/SCAD/LASSO penalty, whilealpha=0 is equivalent to ridge regression. However,alpha=0 is not supported;alpha may be arbitrarily small, but not exactly zero

gamma_penalty

The scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty.Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

BICq_posterior

an optional character string expressing the path and file basename of a file combination that will file-back or currently file-backs abig.matrix of the posterior samples from the minimal penalty model used for the BIC-ICQ calculation used for model selection. T(BIC-ICQ reference: Ibrahim et al (2011)<doi:10.1111/j.1541-0420.2010.01463.x>). If this argument isspecified asNULL (default) and BIC-ICQ calculations are requested (seeselectControl)for details), the posterior sampleswill be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc'in the working directory.See 'Details' section for additional details about the required format ofBICq_posteriorand the file-backed big matrix.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

...

additional arguments that could be passed intoglmmPen. SeeglmmPen_FAandphmmPen_FA for further details.

Details

ArgumentBICq_posterior details: If theBIC_option inselectControl (tuning_options) is specified to be 'BICq', this requests the calculation of the BIC-ICQ criterion during the selectionprocess. For the BIC-ICQ criterion to be calculated, a minimal penalty model assuming a small valued lambda penalty needs to be fit, and the posterior samples from this minimal penalty model need to be used. In order to avoid repetitive calculations ofthis minimal penalty model (i.e. if the user wants to re-runglmmPen with a differentset of penalty parameters), abig.matrix of these posterior samples will be file-backed as two files: a backing file with extension '.bin' and a descriptor file with extension '.desc'. TheBICq_posterior argument should contain a path and a filename of the form "./path/filename" such that the backingfile andthe descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively.IfBICq_posterior is set toNULL, then by default, the backingfile and descriptorfile are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc".If the big matrix of posterior samples is already file-backed,BICq_posterior shouldspecify the path and basename of the appropriate files (again of form "./path/filename"); the minimal penalty model will not be fit again and the big.matrix of posterior samples will be read using theattach.big.matrix function of thebigmemory package and used in the BIC-ICQ calculations. If the appropriate files do not exist orBICq_posterior is specified asNULL, the minimal penalty model will be fit and the minimal penalty model posteriorsamples will be saved as specified above. The algorithm will save 10^4 posterior samples automatically.

Trace details: The value of 0 (default) does not output any extra information. The value of 1additionally outputs the updated coefficients, updated covariance matrix values, and thenumber of coordinate descent iterations used for the M step for eachEM iteration. When pre-screening procedure is used and/or if the BIC-ICQ criterion isrequested, trace = 1 gives additional information about the penalties usedfor the 'minimal penalty model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus gibbs acceptance rate informationfor the adaptive random walk and independence samplers and the updated proposal standard deviationfor the adaptive random walk.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"), see ?pglmmObj for additional documentation)


Fit Penalized Generalized Mixed Models via Monte Carlo Expectation Conditional Minimization (MCECM)

Description

glmmPen_FA is used to fit penalized generalized linear mixed models via Monte Carlo Expectation Conditional Minimization (MCECM) using a factor model decomposition ofthe random effects. The purpose of the function is to perform variable selection on both the fixed and random effects simultaneously for thegeneralized linear mixed model. This function uses a factor model decomposition on the random effects. This assumptionreduces the latent space in the E-step (Expectation step) of the algorithm,which reduces the computational complexity of the algorithm. This improvesthe speed of the algorithm and enables the scaling of the algorithm to larger dimensions.Besides the modeling of the random effects, this function is similar toglmmPen.glmmPen_FA selects the best model using BIC-type selection criteria (seeselectControl documentation for further details). Function is currently available for Binomial and Poisson families with canonical links.To improve the speed of the algorithm, consider settingvar_restrictions = "fixef" within theoptimControl options.

Usage

glmmPen_FA(  formula,  data = NULL,  family = "binomial",  offset = NULL,  r_estimation = rControl(),  fixef_noPen = NULL,  penalty = c("MCP", "SCAD", "lasso"),  alpha = 1,  gamma_penalty = switch(penalty[1], SCAD = 4, 3),  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = selectControl(),  BICq_posterior = NULL,  progress = TRUE,  ...)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

family

a description of the error distribution and link function to be used in the model. Currently, theglmmPen algorithm allows the Binomial ("binomial" or binomial()), Gaussian ("gaussian" or gaussian()), and Poisson ("poisson" or poisson()) familieswith canonical links only. SeephmmPen for variable selection withinproportional hazards mixed models for survival data.

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

r_estimation

a list of class "rControl" from functionrControl containing the control parameters for the estimation of the number of latentfactors to use in theglmmPen_FA andglmm_FA estimation procedures.

fixef_noPen

Optional vector of 0's and 1's of the same length as the number of fixed effects covariatesused in the model. Value 0 indicates the variable should not have its fixed effect coefficientpenalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.

penalty

character describing the type of penalty to use in the variable selection procedure.Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariancematrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.

alpha

Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty.alpha=1 is equivalent to the MCP/SCAD/LASSO penalty, whilealpha=0 is equivalent to ridge regression. However,alpha=0 is not supported;alpha may be arbitrarily small, but not exactly zero

gamma_penalty

The scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty.Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

BICq_posterior

an optional character string expressing the path and file basename of a file combination that will file-back or currently file-backs abig.matrix of the posterior samples from the minimal penalty model used for the BIC-ICQ calculation used for model selection. T(BIC-ICQ reference: Ibrahim et al (2011)<doi:10.1111/j.1541-0420.2010.01463.x>). If this argument isspecified asNULL (default) and BIC-ICQ calculations are requested (seeselectControl)for details), the posterior sampleswill be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc'in the working directory.See 'Details' section for additional details about the required format ofBICq_posteriorand the file-backed big matrix.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

...

additional arguments that could be passed intoglmmPen_FA. SeephmmPen_FA for further details (e.g.survival_options argument).

Details

ArgumentBICq_posterior details: If theBIC_option inselectControl (tuning_options) is specified to be 'BICq', this requests the calculation of the BIC-ICQ criterion during the selectionprocess. For the BIC-ICQ criterion to be calculated, a full model assuming a small valued lambda penalty needs to be fit, and the posterior draws from this full model need to be used. In order to avoid repetitive calculations ofthis full model (i.e. if the user wants to re-runglmmPen with a differentset of penalty parameters), abig.matrix of these posterior draws will be file-backed as two files: a backing file with extention '.bin' and a descriptor file with extension '.desc'. TheBICq_posterior argument should contain a path and a filename with no extension of the form "./path/filename" such that the backingfile andthe descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively.IfBICq_posterior is set toNULL, then by default, the backingfile and descriptorfile are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc".If the big matrix of posterior draws is already file-backed,BICq_posterior shouldspecify the path and basename of the appropriate files (again of form "./path/filename"); the full model will not be fit again and the big.matrix of posterior draws will be read using theattach.big.matrix function of thebigmemory package and used in the BIC-ICQ calcuations. If the appropriate files do not exist orBICq_posterior is specified asNULL, the full model will be fit and the full model posteriordraws will be saved as specified above. The algorithm will save 10^4 posterior draws automatically.

Trace details: The value of 0 (default) does not output any extra information. The value of 1additionally outputs the updated coefficients, updated covariance matrix values, and thenumber of coordinate descent iterations used for the M step for eachEM iteration. When pre-screening procedure is used and/or if the BIC-ICQ criterion isrequested, trace = 1 gives additional information about the penalties usedfor the 'full model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus gibbs acceptance rate informationfor the adaptive random walk and independence samplers and the updated proposal standard deviationfor the adaptive random walk.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"), see ?pglmmObj for additional documentation)


Fit a Generalized Mixed Model via Monte Carlo Expectation Conditional Minimization (MCECM)

Description

glmm_FA is used to fit a single generalized linear mixed model via Monte Carlo Expectation Conditional Minimization (MCECM) using a factor model decomposition ofthe random effects. No model selection is performed. This function uses a factor model decomposition on the random effects. This assumptionreduces the latent space in the E-step (Expectation step) of the algorithm,which reduces the computational complexity of the algorithm. This improvesthe speed of the algorithm and enables the scaling of the algorithm to larger dimensions.Besides the modeling of the random effects, this function is similar toglmm.

Usage

glmm_FA(  formula,  data = NULL,  family = "binomial",  offset = NULL,  r_estimation = rControl(),  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = lambdaControl(),  progress = TRUE,  ...)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

family

a description of the error distribution and link function to be used in the model. Currently, theglmmPen algorithm allows the Binomial ("binomial" or binomial()), Gaussian ("gaussian" or gaussian()), and Poisson ("poisson" or poisson()) familieswith canonical links only. SeephmmPen for variable selection withinproportional hazards mixed models for survival data.

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

r_estimation

a list of class "rControl" from functionrControl containing the control parameters for the estimation of the number of latentfactors to use in theglmmPen_FA andglmm_FA estimation procedures.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

...

additional arguments that could be passed intoglmmPen_FA. SeeglmmPen_FA for further details (e.g. arguments related to variable selectionthat could be used to fit a single penalized GLMM model).

Details

Theglmm_FA function can be used to fit a single generalized linear mixed model.While this approach is meant to be used in the case where the user knows whichcovariates belong in the fixed and random effects and no penalization is required, one isallowed to specify non-zero fixed and random effects penalties usinglambdaControland the (...) arguments. The (...) allow for specification of penalty-related arguments; seeglmmPen_FA for details. For a high dimensional situation, the user may want to fit a full model using a small penalty for the fixed and random effects and save the posteriordraws from this full model for use in any BIC-ICQ calculations during selection withinglmmPen_FA. Specifying a file name in the 'BICq_posterior' argument will save the posterior draws from theglmm_FA model into a big.matrix with this file name, see the Details section ofglmmPen_FA for additional details.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"))


Control of Penalization Parameters and Selection Criteria

Description

Constructs control structures for penalized mixed model fitting.

Usage

lambdaControl(lambda0 = 0, lambda1 = 0)selectControl(  lambda0_seq = NULL,  lambda1_seq = NULL,  nlambda = 10,  search = c("abbrev", "full_grid"),  BIC_option = c("BICq", "BICh", "BIC", "BICNgrp"),  logLik_calc = switch(BIC_option[1], BICq = FALSE, TRUE),  lambda.min = NULL,  pre_screen = TRUE,  lambda.min.presc = NULL)

Arguments

lambda0

a non-negative numeric penalty parameter for the fixed effects coefficients

lambda1

a non-negative numeric penalty parameter for the (grouped) random effectscovariance coefficients

lambda0_seq,lambda1_seq

a sequence of non-negative numeric penalty parameters for the fixed and random effect coefficients (lambda0_seq andlambda1_seq, respectively). IfNULL, then a sequence will be automatically calculated. See 'Details' section for more details on these default calculations.

nlambda

positive integer specifying number of penalty parametersto use for the fixed and random effects penalty parameters. Default set to 10. Only usedif eitherlambda0_seq orlambda1_seq remain unspecified by the user(one or both of these arguments set toNULL) and, consequently, one or more default sequences needto be calculated.

search

character string of "abbrev" (default) or "full_grid" indicating if the search of models over the penalty parameter space should be the full grid search (total number of models equals'nlambda'^2 or length('lambda0_seq')*length('lambda1_seq')) or an abbreviated grid search.The abbreviated grid search is described in more detail in the Details section. Te authorshighly recommend the abbreviated grid search.

BIC_option

character string specifying the selection criteria used to select the 'best' model.Default "BICq" option specifies the BIC-ICQ criterion (Ibrahim et al (2011)<doi:10.1111/j.1541-0420.2010.01463.x>),which requires a fit of a 'minimum penalty' model; a small penalty (the minimum of the penalty sequence) is used for the fixed and random effects. See "Details" section for what these small penalties will be.The "BICh" option utilizes the hybrid BIC value described in Delattre, Lavielle, and Poursat (2014) <doi:10.1214/14-EJS890>.The regular "BIC" option penalty term uses (total non-zero coefficients)*(length(y) = total numberobservations). The "BICNgrp" option penalty term uses (total non-zero coefficients)*(nlevels(group) = numbergroups).

logLik_calc

logical value specifying if the log likelihood (and log-likelihood based calculations BIC, BICh, and BICNgrp) should be calculated for all of the models in the selection procedure. If BIC-ICQ is used for selection, the log-likelihood is not needed for each model. However, if users are interestedin comparing the best models from BIC-ICQ and other BIC-type selection criteria, settinglogLik_calc toTRUE will calculate these other quantities for all of the models.

lambda.min

numeric fraction between 0 and 1. The sequence of the lambda penalty parametersranges from the maximum lambda where all fixed and random effects are penalized to 0 and a minimum lambda value, which equals a small fraction of the maximum lambda. The parameterlambda.min specifies this fraction. Default value is set toNULL, whichautomatically selectslambda.min to equal 0.01 when the number of observations isgreater than the number of fixed effects predictors and 0.05 otherwise.Only usedif eitherlambda0_seq orlambda1_seq remain unspecified by the user(one or both of these sequence arguments set toNULL) and, consequently, one or more default sequences needto be calculated.

pre_screen

logical value indicating whether pre-screening should be performed beforemodel selection (defaultTRUE). If the number of random effects covariatesconsidered is 4 or less, thenno pre-screening will be performed. Pre-screening removes random effects from considerationduring the model selection process, which can significantly speed up the algorithm.See "Details" section for a further discussion.

lambda.min.presc

numeric fraction between 0 and 1. During pre-screening and the minimal penaltymodel fit for the BIC-ICQ calculation, the small penalty used on the random effect isthe fractionlambda.min.presc multiplied by the maximum penalty parameter that penalizesall fixed and random effects to 0. If left asNULL, the default value is 0.01 when the numberof random effect covariates is 10 or less and 0.05 otherwise.Only used iflambda1_seq remains unspecified by the user(this argument set toNULL so the random effects penalty parameter sequenceneeds to be automatically calculated) AND either the pre-screening procedure is selected by the argumentpre_screen or the BIC-ICQ is selected as the model selection criteria,i.e.,BIC_option = "BICq". See the "Details" section for a further discussion.

Details

If left as the defaultNULL values, thelambda0_seq andlambda1_seq numeric sequences are automatically calculated. The sequence will be calculated in the same manner asncvreg calculates the range: the max value (let's denote this aslambda_max) penalizes all fixed and random effects to 0, the min value is a small portion of max (lambda.min*lambda_max), and the sequence is composed ofnlambda values ranging from these min and max values spread evenly on the log scale. Unlikencvreg, the order of penaltyvalues used in the algorithm must run from the min lambda to the max lambda (as opposed to running from max lambda to min lambda). The length of the sequence is specified bynlambda. By default, these sequences are calculated usingLambdaSeq.

Thelambda0 andlambda1 arguments used within theglmm function allow for a user to fit a model with a single non-zero penalty parameter combination. However, this is generally not recommended.

Abbreviated grid search: The abbreviated grid search proceeds in two stages. In stage 1, thealgorithm fits the following series of models: the fixed effects penalty parameter remains afixed value evaluated at the minimum of the fixed effects penalty parameters, and allrandom effects penalty parameters are examined. The 'best' model from this first stage of modelsdetermines the optimum random effect penalty parameter. In stage 2, the algorithm fits the following series of models: the random effects penalty parameter remains fixed at the value ofthe optimum random effect penalty parameter (from stage 1) and all fixed effects penaltyparameters are considered. The best overall model is the best model from stage 2. This reduces the number of models considered to length('lambda0_seq') + length('lambda1_seq'). The authors foundthat this abbreviated grid search worked well in simulations, and performed considerablyfaster than the full grid search that examined all possible fixed and random effect penaltyparameter combinations.

The argumentsnlambda andlambda.min are only usedif one or both of thelambda0_seq andlambda1_seq penalty sequences(corresponding to the fixed and random effects penalty sequences, respectively) remain unspecified by the user (one or both of these arguments left asNULL),indicating that the algorithm needs to calculate default penalty sequences.

The argumentlambda.min.presc is only used under the following condition:lambda1_seq remains unspecified by the user(this argument set toNULL so the random effects penalty parameter sequenceneeds to be calculated) AND either the pre-screening procedure is selected by the argumentpre_screen or the BIC-ICQ is selected as the model selection criteria,i.e.,BIC_option = "BICq". Iflambda1_seq is specified by the user, the minimumvalue in that sequence will be used as the random effect penalty in thepre-screening procedure and/or the minimal penalty model for the BIC-ICQ calculation.

BIC-ICQ calculation: This model selection criteria requires the fitting of a 'minimal penalty'model, which fits a model with a small penalty on the fixed and random effects.For the fixed effects penalty, the minimal penalty is: (a) 0 if the number of fixed effects covariates is 4 or less or (b) the minimum fixed effect penalty from the fixedeffects penalty sequence (either from the default sequence or from the sequencespecified by the user). For the random effects penalty, the minimal penaltyis (a) 0 if the number of random effects covariates is 4 or less; (b) the minimum random effect penalty from the random effects penalty sequence specified by the user, or (c)lambda.min.presc multiplied to thelambda_max maximum penaltyspecified above when a default random effects penalty sequence is calculated.

Pre-screening: The minimum fixed effects penalty used in the pre-screening stage will be the minimum penalty of the fixed effects penalty sequence,lambda0_seq.The minimum random effects penalty used in the pre-screening stage will be either (a) the minimum random effects penalty in the sequencelambda1_seq if this sequence specified by the user, or (b)lambda.min.pres xlambda_max,wherelambda_max was described above.

Value

The *Control functions return a list (inheriting from class "pglmmControl") containing parameter values that determine settings for variable selection.


Control of Penalized Generalized Linear Mixed Model Fitting

Description

Constructs the control structure for the optimization of the penalized mixed model fit algorithm.

Usage

optimControl(  var_restrictions = c("none", "fixef"),  conv_EM = 0.0015,  conv_CD = 5e-04,  nMC_burnin = NULL,  nMC_start = NULL,  nMC_max = NULL,  nMC_report = 5000,  maxitEM = NULL,  maxit_CD = 50,  M = 10000,  t = 2,  mcc = 2,  sampler = c("stan", "random_walk", "independence"),  var_start = "recommend",  step_size = 1,  standardization = TRUE,  convEM_type = c("AvgEuclid1", "maxdiff", "AvgEuclid2", "Qfun"),  B_init_type = c("deterministic", "data", "random"))

Arguments

var_restrictions

character string indicating how the random effect covariancematrix should be initialized at the beginning of the algorithmwhen penalties are applied to the coefficients. If "none" (default), all random effect predictors are initialized to have non-zero variances.If "fixef", the code first examines the initialized fixed effects (initialized using a regularpenalized GLM), and only the random effect predictors that are initialized with non-zero fixed effectsare initialized with non-zero variances.

conv_EM

a non-negative numeric convergence criteria for the convergence of the EM algorithm. Default is 0.0015. EM algorithm is considered to have converge if the average Euclidean distance between the current coefficient estimates and the coefficient estimates fromt EM iterations back is less thanconv_EMmcc times in a row.Seet andmcc for more details.

conv_CD

a non-negative numeric convergence criteria for the convergence of the grouped coordinate descent loop within the M step of the EM algorithm. Default 0.0005.

nMC_burnin

positive integer specifying the number of posterior samples to use asburn-in for each E step in the EM algorithm. If set toNULL, the algorithm inputsthe following defaults: Default 250 when the number of random effects predictors is less than or equal to 10; default 100 otherwise. Function will not allownMC_burninto be less than 100.

nMC_start

a positive integer for the initial number of Monte Carlo draws. If set toNULL, the algorithm inputs the following defaults: Default 250 when the number of random effects predictors is less than or equal to 10; default 100 otherwise.

nMC_max

a positive integer for the maximum number of allowed Monte Carlo draws usedin each step of the EM algorithm. If set toNULL, the algorithm inputs the following defaults: When the number of random effect covariates is greater than 10,the default is set to 1000; when the number of random effect covariates is10 or less, the default is set to 2500.

nMC_report

a positive integer for the number of posterior samples to save from the finalmodel. These posterior samples can be used for diagnostic purposes, seeplot_mcmc.Default set to 5000.

maxitEM

a positive integer for the maximum number of allowed EM iterations. If set toNULL, then the algorithm inputs the following defaults:Default equals 50 for the Binomial and Poisson families, 65 for the Gaussian family.

maxit_CD

a positive integer for the maximum number of allowed iterations for thecoordinate descent algorithms used within the M-step of each EM iteration. Default equals 50.

M

positive integer specifying the number of posterior samples to use within the Pajor log-likelihood calculation. Default is 10^4; minimum allowed value is 5000.

t

the convergence criteria is based on the average Euclidean distance between the most recent coefficient estimates and the coefficient estimates fromt EM iterations back.Positive integer, default equals 2.

mcc

the number of times the convergence criteria must be met before the algorithm isseen as having converged (mcc for 'meet condition counter'). Default set to 2. Value restricted to be no less than 2.

sampler

character string specifying whether the posterior samples of the random effectsshould be drawn using Stan (default, from package rstan) or the Metropolis-within-Gibbs procedure incorporating an adaptive random walk sampler ("random_walk") or anindependence sampler ("independence"). If using the random walk sampler, seeadaptControlfor some additional control structure parameters.

var_start

either the character string "recommend" or a positive number specifying the starting values to initialize the variance of the covariance matrix. ForglmmPen,the default "recommend" firstfits a simple model with a fixed and random intercept only using the lme4 R package, seeglmer for details on fitting generalized linear mixed models orlmer for details on fitting linear mixed models. The random intercept variance estimate from this model is then multiplied by 2 and used as the starting variance. ForglmmPen_FA, the default is set to 0.10 (seeB_init_type for further information).

step_size

positive numeric value indicating the starting step size to use in the Majorization-Minimization scheme of the M-step. Only relevant when the distributional assumptionused is not Binomial or Gaussian with canonical links (e.g. Poisson with log link)

standardization

logical value indicating whether covariates shouldstandardized (TRUE, default) or unstandardized (FALSE) before beingused within the algorithm. Ifstandardization = TRUE, then the standardized covariateswill also be used to create the Z matrix used in the estimation of the random effects.

convEM_type

character string indicating the type of convergence criteria to use within the EM algorithm to determine when a model has converged. The default is "AvgEuclid1",which calculates the average Euclidean distance between the most recent coefficient vector andthe coefficient vectort EM iterations back (Euclidean distance divided by the numberof non-zero coefficientst EM iterations back). Alternative convergence options include"maxdiff", which determines convergence based on the maximum difference between the coefficient vectors; "AvgEuclid2", which is similar to "AvgEuclid1" except it divides the Euclidean distance by the square-rootof the number of non-zero coefficients; and "Qfun", which determines convergence based onthe relative difference in the Q-function estimates calculated with the most recent coefficient vector and the coefficient vectort EM iterations back.

B_init_type

character string indicating how the B matrix within theglmmPen_FAmethod should be initialized. (This argument is not used within theglmmPen function.)The default "deterministic" initializes all non-zero variance and covariance values of the random effect covariance matrix to the value ofvar_start,such that each non-zero element of the B matrix issqrt(var_start / r) (wherer isthe number of latent factors). Option "data" is similar to "deterministic", but thevar_start value is the default data-driven variance estimate used inglmmPen (see argumentvar_start for more details).

Details

Several arguments are set to a default value ofNULL. If these arguments are left asNULL by the user, then these values will be filled in with appropriatedefault values as specified above, which may depend on the number of random effects orthe family of the data. If the userspecifies particular values for these arguments, no additional modifications to these arguments will be done within the algorithm.

Value

Function returns a list inheriting from classoptimControlcontaining fit and optimization criteria values used in optimization routine.


ClasspglmmObj of Fitted Penalized Generalized Mixed-Effects Models for packageglmmPen

Description

The functionsglmm,glmmPen,glmm_FA, andglmmPen_FA from the packageglmmPen output the reference class object of typepglmmObj.

Usage

## S3 method for class 'pglmmObj'fixef(object, ...)## S3 method for class 'pglmmObj'ranef(object, ...)## S3 method for class 'pglmmObj'sigma(object, ...)## S3 method for class 'pglmmObj'coef(object, ...)## S3 method for class 'pglmmObj'family(object, ...)## S3 method for class 'pglmmObj'nobs(object, ...)## S3 method for class 'pglmmObj'ngrps(object, ...)## S3 method for class 'pglmmObj'formula(x, fixed.only = FALSE, random.only = FALSE, ...)## S3 method for class 'pglmmObj'model.frame(formula, fixed.only = FALSE, ...)## S3 method for class 'pglmmObj'model.matrix(object, type = c("fixed", "random"), ...)## S3 method for class 'pglmmObj'fitted(object, fixed.only = TRUE, ...)## S3 method for class 'pglmmObj'predict(  object,  newdata = NULL,  type = c("link", "response"),  fixed.only = TRUE,  ...)## S3 method for class 'pglmmObj'residuals(object, type = c("deviance", "pearson", "response", "working"), ...)## S3 method for class 'pglmmObj'print(x, digits = c(fef = 4, ref = 4), ...)## S3 method for class 'pglmmObj'summary(  object,  digits = c(fef = 4, ref = 4),  resid_type = switch(object$family$family, gaussian = "pearson", "deviance"),  ...)## S3 method for class 'pglmmObj'logLik(object, ...)## S3 method for class 'pglmmObj'BIC(object, ...)## S3 method for class 'pglmmObj'plot(x, fixed.only = FALSE, type = NULL, ...)

Arguments

object

pglmmObj object output fromglmm,glmmPen, orglmmPen_FineSearch

...

potentially further arguments passed from other methods

x

an R object of classpglmmObj

fixed.only

logical value; defaultTRUE indicates that only the fixed effects should be used in the fitted value/prediction, whileFALSE indicates that both the fixed and random effects posterior modes should be used in the fitted value/prediction

random.only

logical value used informula;TRUE indicates that only the formula elements relating to the random effects should be returned

formula

in the case of model.frame, apglmmObj object

type

See details oftype options for each function under "Functions" section.

newdata

optional new data.frame containing the same variables used in the model fit procedure

digits

number of significant digits for printing; default of 4

resid_type

type of residuals to summarize in output. Seepredict.pglmmObjfor residual options available.

Value

The pglmmObj object returns the following items:

fixef

vector of fixed effects coefficients

ranef

matrix of random effects coefficients for each explanatory variable for each level of the grouping factor

sigma

random effects covariance matrix

scale

if family is Gaussian, returns the residual error variance

posterior_samples

Samples from the posterior distribution of the random effects,taken at the end of the model fit (after convergence or after maximum iterations allowed).Can be used for diagnositics purposes. Note: These posterior samples are from a single chain.

sampling

character string for type of sampling used to calculate the posterior samplesin the E-step of the algorithm

results_all

matrix of results from all model fits during variable selection (if selectionperformed). Output for each model includes: penalty parameters for fixed (lambda0) and random (lambda1) effects, BIC-derived quantities and the log-likelihood (note: the argumentsBIC_option andlogLik_calc inselectControldetermine which of these quantities are calculated for each model), the number of non-zero fixed and random effects (includes intercept),number of EM iterations used for model fit, whether or not the model converged (0 for no vs 1 for yes), and the fixed and random effects coefficients

results_optim

results from the 'best' model fit; see results_all for details. BICh, BIC, BICNgrp, and LogLik computed for this best model if not previously calculated.

family

Family

penalty_info

list of penalty information

call

arguments plugged intoglmm,glmmPen,glmm_FA, orglmmPen_FA

formula

formula

fixed_vars

names of fixed effects variables

data

list of data used in model fit, including the response y, the fixed effects covariates matrix X, the random effects model matrix Z (which is composed of values from the standardized fixed effects model matrix), the grouping factor, offset, model frame,and standarization information used to standardize the fixed effects covariates

optinfo

Information about the optimization of the 'best' model

control_info

optimization parameters used for the model fit

Estep_init

materials that can be used to initialize another E-step, if desired

Gibbs_info

list of materials to perform diagnositics on the Metropolis-within-Gibbssample chains, including the Gibbs acceptance rates (included for both the independenceand adaptive random walk samplers) and the final proposal standard deviations (included for the adaptive random walk sampler only))

r_estimation

list of output related to estimation of number of latent common factors, r. Only relevant for the output of functionsglmm_FA andglmmPen_FA, which are currently in development and are not yet ready forgeneral use.

showClass("pglmmObj")methods(class = "pglmmObj")

Functions


Fit a Proportional Hazards Mixed Model via Monte Carlo Expectation Conditional Minimization (MCECM)using a Piecewise Constant Hazard Mixed Model Approximation

Description

phmm is used to approximate a single proportional hazards mixed model using a piecewise constant hazard mixed model approximationvia Monte Carlo Expectation Conditional Minimization (MCECM). No model selection is performed.

Usage

phmm(  formula,  data = NULL,  covar = NULL,  offset = NULL,  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = lambdaControl(),  survival_options = survivalControl(),  progress = TRUE,  ...)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The response must be aSurv object (seeSurv from thesurvival package).Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

covar

character string specifying whether the covariance matrix should be unstructured("unstructured") or diagonal with no covariances between variables ("independent").Default is set toNULL. Ifcovar is set toNULL and the number of random effectspredictors (not including the intercept) is greater than or equal to 10 (i.e. high dimensional), then the algorithm automatically assumes an independent covariance structure andcovar is set to "independent". Otherwise ifcovaris set toNULL and the number of random effects predictors is less than 10, then thealgorithm automatically assumes an unstructured covariance structure andcovar is set to "unstructured".

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

survival_options

a structure of class "survivalControl" created from functionsurvivalControl that specifies several parameters needed to properly fit the input survival data using a piecewise constant hazard mixed model. See the documentation forsurvivalControl for more details on defaults.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

...

additional arguments that could be passed intophmmPen. SeephmmPenfor further details.

Details

Thephmm function can be used to approximate a single proportional hazardsmixed model using a piecewise constant hazard mixed model.While this approach is meant to be used in the case where the user knows whichcovariates belong in the fixed and random effects and no penalization is required, one isallowed to specify non-zero fixed and random effects penalties usinglambdaControland the (...) arguments. The (...) allow for specification of penalty-related arguments; seeglmmPen andglmmPen_FAfor details. For a high dimensional situation, the user may want to fit a full model using a small penalty for the fixed and random effects and save the posteriordraws from this full model for use in any BIC-ICQ calculations during selection withinphmmPen. Specifying a file name in the 'BICq_posterior' argument will save the posterior draws from thephmm model into a big.matrix with this file name, see the Details section ofphmmPen for additional details.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"))


Fit Penalized Proportional Hazards Mixed Models via Monte Carlo Expectation Conditional Minimization (MCECM) using a Piecewise Constant Hazard Mixed Model Approximation

Description

phmmPen_FA is used to fit penalized proportional hazards mixed models using a piecewise constant hazard mixed model approximation via Monte Carlo Expectation Conditional Minimization (MCECM). The purpose of the function is to perform variable selection on both the fixed and random effects simultaneously for thepiecewise constant hazard mixed model.phmmPen selects the best model using BIC-type selection criteria (seeselectControl documentation for further details). To improve the speed of the algorithm, consider settingvar_restrictions = "fixef" within theoptimControl options.

Usage

phmmPen(  formula,  data = NULL,  covar = NULL,  offset = NULL,  fixef_noPen = NULL,  penalty = c("MCP", "SCAD", "lasso"),  alpha = 1,  gamma_penalty = switch(penalty[1], SCAD = 4, 3),  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = selectControl(),  survival_options = survivalControl(),  BICq_posterior = NULL,  progress = TRUE)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The response must be aSurv object (seeSurv from thesurvival package).Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

covar

character string specifying whether the covariance matrix should be unstructured("unstructured") or diagonal with no covariances between variables ("independent").Default is set toNULL. Ifcovar is set toNULL and the number of random effectspredictors (not including the intercept) is greater than or equal to 10 (i.e. high dimensional), then the algorithm automatically assumes an independent covariance structure andcovar is set to "independent". Otherwise ifcovaris set toNULL and the number of random effects predictors is less than 10, then thealgorithm automatically assumes an unstructured covariance structure andcovar is set to "unstructured".

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

fixef_noPen

Optional vector of 0's and 1's of the same length as the number of fixed effects covariatesused in the model. Value 0 indicates the variable should not have its fixed effect coefficientpenalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.

penalty

character describing the type of penalty to use in the variable selection procedure.Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariancematrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.

alpha

Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty.alpha=1 is equivalent to the MCP/SCAD/LASSO penalty, whilealpha=0 is equivalent to ridge regression. However,alpha=0 is not supported;alpha may be arbitrarily small, but not exactly zero

gamma_penalty

The scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty.Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

survival_options

a structure of class "survivalControl" created from functionsurvivalControl that specifies several parameters needed to properly fit the input survival data using a piecewise constant hazard mixed model. See the documentation forsurvivalControl for more details on defaults.

BICq_posterior

an optional character string expressing the path and file basename of a file combination that will file-back or currently file-backs abig.matrix of the posterior samples from the minimal penalty model used for the BIC-ICQ calculation used for model selection. T(BIC-ICQ reference: Ibrahim et al (2011)<doi:10.1111/j.1541-0420.2010.01463.x>). If this argument isspecified asNULL (default) and BIC-ICQ calculations are requested (seeselectControl)for details), the posterior sampleswill be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc'in the working directory.See 'Details' section for additional details about the required format ofBICq_posteriorand the file-backed big matrix.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

Details

ArgumentBICq_posterior details: If theBIC_option inselectControl (tuning_options) is specified to be 'BICq', this requests the calculation of the BIC-ICQ criterion during the selectionprocess. For the BIC-ICQ criterion to be calculated, a full model assuming a small valued lambda penalty needs to be fit, and the posterior draws from this full model need to be used. In order to avoid repetitive calculations ofthis full model (i.e. if the user wants to re-runphmmPen with a differentset of penalty parameters), abig.matrix of these posterior draws will be file-backed as two files: a backing file with extention '.bin' and a descriptor file with extension '.desc'. TheBICq_posterior argument should contain a path and a filename with no extension of the form "./path/filename" such that the backingfile andthe descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively.IfBICq_posterior is set toNULL, then by default, the backingfile and descriptorfile are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc".If the big matrix of posterior draws is already file-backed,BICq_posterior shouldspecify the path and basename of the appropriate files (again of form "./path/filename"); the full model will not be fit again and the big.matrix of posterior draws will be read using theattach.big.matrix function of thebigmemory package and used in the BIC-ICQ calcuations. If the appropriate files do not exist orBICq_posterior is specified asNULL, the full model will be fit and the full model posteriordraws will be saved as specified above. The algorithm will save 10^4 posterior draws automatically.

Trace details: The value of 0 (default) does not output any extra information. The value of 1additionally outputs the updated coefficients, updated covariance matrix values, and thenumber of coordinate descent iterations used for the M step for eachEM iteration. When pre-screening procedure is used and/or if the BIC-ICQ criterion isrequested, trace = 1 gives additional information about the penalties usedfor the 'full model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus gibbs acceptance rate informationfor the adaptive random walk and independence samplers and the updated proposal standard deviationfor the adaptive random walk.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"), see ?pglmmObj for additional documentation)


Fit a Penalized Proportional Hazards Mixed Model via Monte Carlo Expectation Conditional Minimization (MCECM) usinga Piecewise Constant Hazard Mixed Model Approximation

Description

phmmPen_FA is used to approximate penalized proportional hazards mixed models usingusing a piecewise constant hazard mixed survival model approximation via Monte Carlo Expectation Conditional Minimization (MCECM) and a factor model decomposition ofthe random effects. The purpose of the function is to perform variable selection on both the fixed and random effects simultaneously for thepiecewise constant hazard mixed model. This function uses a factor model decomposition on the random effects. This assumptionreduces the latent space in the E-step (Expectation step) of the algorithm,which reduces the computational complexity of the algorithm. This improvesthe speed of the algorithm and enables thescaling of the algorithm to larger dimensions.phmmPen_FA selects the best model using BIC-type selection criteria (seeselectControl documentation for further details). To improve the speed of the algorithm, consider settingvar_restrictions = "fixef" within theoptimControl options.

Usage

phmmPen_FA(  formula,  data = NULL,  offset = NULL,  r_estimation = rControl(),  fixef_noPen = NULL,  penalty = c("MCP", "SCAD", "lasso"),  alpha = 1,  gamma_penalty = switch(penalty[1], SCAD = 4, 3),  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = selectControl(),  survival_options = survivalControl(),  BICq_posterior = NULL,  progress = TRUE)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The response must be aSurv object (seeSurv from thesurvival package).Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

r_estimation

a list of class "rControl" from functionrControl containing the control parameters for the estimation of the number of latentfactors to use in theglmmPen_FA andglmm_FA estimation procedures.

fixef_noPen

Optional vector of 0's and 1's of the same length as the number of fixed effects covariatesused in the model. Value 0 indicates the variable should not have its fixed effect coefficientpenalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.

penalty

character describing the type of penalty to use in the variable selection procedure.Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariancematrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.

alpha

Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty.alpha=1 is equivalent to the MCP/SCAD/LASSO penalty, whilealpha=0 is equivalent to ridge regression. However,alpha=0 is not supported;alpha may be arbitrarily small, but not exactly zero

gamma_penalty

The scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty.Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

survival_options

a structure of class "survivalControl" created from functionsurvivalControl that specifies several parameters needed to properly fit the input survival data using a piecewise constant hazard mixed model. See the documentation forsurvivalControl for more details on defaults.

BICq_posterior

an optional character string expressing the path and file basename of a file combination that will file-back or currently file-backs abig.matrix of the posterior samples from the minimal penalty model used for the BIC-ICQ calculation used for model selection. T(BIC-ICQ reference: Ibrahim et al (2011)<doi:10.1111/j.1541-0420.2010.01463.x>). If this argument isspecified asNULL (default) and BIC-ICQ calculations are requested (seeselectControl)for details), the posterior sampleswill be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc'in the working directory.See 'Details' section for additional details about the required format ofBICq_posteriorand the file-backed big matrix.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

Details

ArgumentBICq_posterior details: If theBIC_option inselectControl (tuning_options) is specified to be 'BICq', this requests the calculation of the BIC-ICQ criterion during the selectionprocess. For the BIC-ICQ criterion to be calculated, a full model assuming a small valued lambda penalty needs to be fit, and the posterior draws from this full model need to be used. In order to avoid repetitive calculations ofthis full model (i.e. if the user wants to re-runphmmPen with a differentset of penalty parameters), abig.matrix of these posterior draws will be file-backed as two files: a backing file with extention '.bin' and a descriptor file with extension '.desc'. TheBICq_posterior argument should contain a path and a filename with no extension of the form "./path/filename" such that the backingfile andthe descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively.IfBICq_posterior is set toNULL, then by default, the backingfile and descriptorfile are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc".If the big matrix of posterior draws is already file-backed,BICq_posterior shouldspecify the path and basename of the appropriate files (again of form "./path/filename"); the full model will not be fit again and the big.matrix of posterior draws will be read using theattach.big.matrix function of thebigmemory package and used in the BIC-ICQ calcuations. If the appropriate files do not exist orBICq_posterior is specified asNULL, the full model will be fit and the full model posteriordraws will be saved as specified above. The algorithm will save 10^4 posterior draws automatically.

Trace details: The value of 0 (default) does not output any extra information. The value of 1additionally outputs the updated coefficients, updated covariance matrix values, and thenumber of coordinate descent iterations used for the M step for eachEM iteration. When pre-screening procedure is used and/or if the BIC-ICQ criterion isrequested, trace = 1 gives additional information about the penalties usedfor the 'full model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus gibbs acceptance rate informationfor the adaptive random walk and independence samplers and the updated proposal standard deviationfor the adaptive random walk.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"), see ?pglmmObj for additional documentation)


Fit a Proportional Hazards Mixed Model via Monte Carlo Expectation Conditional Minimization (MCECM) usinga Piecewise Constant Hazard Mixed Model Approximation

Description

phmm_FA is used to fit a single piecewise constant hazard mixed model as anapproximation to a proportional hazards mixed model via Monte Carlo Expectation Conditional Minimization (MCECM). This piecewise constant hazard mixed modeluses a factor model decomposition ofthe random effects. No model selection is performed.

Usage

phmm_FA(  formula,  data = NULL,  offset = NULL,  r_estimation = rControl(),  optim_options = optimControl(),  adapt_RW_options = adaptControl(),  trace = 0,  tuning_options = lambdaControl(),  survival_options = survivalControl(),  progress = TRUE,  ...)

Arguments

formula

a two-sided linear formula object describing both the fixed effects and random effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The response must be aSurv object (seeSurv from thesurvival package).Random-effects terms are distinguished by vertical bars ("|") separating expression for design matrices from the grouping factor.formula should be of the same format needed forglmer in packagelme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.The offset must be specified outside of the formula in the 'offset' argument.

data

an optional data frame containing the variables named informula. Ifdata is omitted, variables will be taken from the environment offormula.

offset

This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offsetargument should specify the name of a column in the data.frame.

r_estimation

a list of class "rControl" from functionrControl containing the control parameters for the estimation of the number of latentfactors to use in theglmmPen_FA andglmm_FA estimation procedures.

optim_options

a structure of class "optimControl" created from functionoptimControl that specifies several optimization parameters. See the documentation foroptimControl for more details on defaults.

adapt_RW_options

a list of class "adaptControl" from functionadaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored ifoptimControl parametersampler is set to "stan" (default) or "independence".

trace

an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.

tuning_options

a list of class "selectControl" or "lambdaControl" resulting fromselectControl orlambdaControl containing additional control parameters.When functionglmm is used,the algorithm may be run using one specific set ofpenalty parameterslambda0 andlambda1 by specifying such values inlambdaControl(). The default forglmm is to run the model fit with no penalization (lambda0 =lambda1 = 0).When functionglmmPen is run,tuning_options is specified usingselectControl(). See thelambdaControl andselectControl documentation for further details.

survival_options

a structure of class "survivalControl" created from functionsurvivalControl that specifies several parameters needed to properly fit the input survival data using a piecewise constant hazard mixed model. See the documentation forsurvivalControl for more details on defaults.

progress

a logical value indicating if additional output should be given showing theprogress of the fit procedure. IfTRUE, such output includes iteration-level informationfor the fit procedure (iteration number EM_iter,number of MCMC samples nMC, average Euclidean distance between current coefficients and coefficientsfrom t–defined inoptimControl–iterations back EM_conv, and number of non-zero fixed and random effects covariatesnot including the intercept). Additionally,progress = TRUEgives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimatesfor each model fit.Default isTRUE.

...

additional arguments that could be passed intophmmPen_FA. SeephmmPen_FAfor further details.

Details

Thephmm_FA function can be used to approximate a single proportional hazardsmixed model using a piecewise constant hazard mixed model.While this approach is meant to be used in the case where the user knows whichcovariates belong in the fixed and random effects and no penalization is required, one isallowed to specify non-zero fixed and random effects penalties usinglambdaControland the (...) arguments. The (...) allow for specification of penalty-related arguments; seephmmPen_FA for details. For a high dimensional situation, the user may want to fit a full model using a small penalty for the fixed and random effects and save the posteriordraws from this full model for use in any BIC-ICQ calculations during selection withinphmmPen_FA. Specifying a file name in the 'BICq_posterior' argument will save the posterior draws from thephmm_FA model into a big.matrix with this file name, see the Details section ofphmmPen_FA for additional details.

Value

A reference class object of classpglmmObj for which many methods are available (e.g.methods(class = "pglmmObj"))


Plot Diagnostics for MCMC Posterior Draws of the Random Effects

Description

Provides graphical diagnostics of the random effect posterior draws from the (best) model.Availabile diagnostics include the sample path, histograms, cummulative sums, and autocorrelation.

Usage

plot_mcmc(  object,  plots = "sample.path",  grps = "all",  vars = "all",  numeric_grp_order = FALSE,  bin_width = NULL)

Arguments

object

an object of classpglmmObj

plots

a character string or a vector of character strings specifying which graphicaldiagnostics to provide. Options include a sample path plot (default, "sample.path"), autocorrelation plots ("autocorr"), histograms ("histogram"), cumulative sum plots ("cumsum"),and all four possible plot options ("all"). While the "all" option will produce all fourpossible plots, subsets of the types of plots (e.g. sample path plots and autocorrelation plotsonly) can be specified with a vector of the relevant character strings (e.g. c("sample.path","autocorr"))

grps

a character string or a vector of character strings specifying which groups should have diagnostics provided. The names of the groups match the input group factor levels.Default is set to 'all' for all groups.

vars

a character string or a vector of character strings specifying which variablesshould have diagnostics provided. Default is set to'all', which picks all variables with non-zero random effects.Tip: can find the names of the random effect variables inthe output sigma matrix found in thepglmmObj object, runsigma(object).

numeric_grp_order

if TRUE, specifies that the groups factor should be converted to numeric values. This option could be used to ensure that the organization of the groups is in the proper numeric order (e.g. groups with levels 1-10 are ordered 1-10, not 1, 10, 2-9).

bin_width

optional binwidth argument forgeom_histogram from theggplot2 package. Default set toNULL, which specifies the defaultgeom_histogram binwidth. This argument only applies if the "histogram" plot type is selected.

Value

a list of ggplot graphics, each faceted by group and random effect variable. Type of plots specified in theplots argument.


Control of Latent Factor Model Number EstimationConstructs the control structure for the estimation of the number of latent factors (r) for use within theglmmPen_FA andglmm_FA estimation procedures.

Description

Control of Latent Factor Model Number Estimation

Constructs the control structure for the estimation of the number of latent factors (r) for use within theglmmPen_FA andglmm_FA estimation procedures.

Usage

rControl(  r = NULL,  r_max = NULL,  r_est_method = "GR",  size = 25,  sample = FALSE)

Arguments

r

positive integer specifying number of latent common factors to assume in the model. IfNULL (default), this value estimated from the data. Seer_est_method for available estimation procedures, and the Detailssection for further details on the general estimation procedure.Ifr is specified, the no estimation procedure is performed and the algorithmuses the input value or r. All other parameters for this function arerelevant for the estimation procedure.

r_max

positive integer specifying maximum number of latent factors to consider.IfNULL (default), this value is automatically calculated.

r_est_method

character string indicating method used to estimate numberof latent factorsr. Default "GR" uses the Growth Ratio method ofAhn and Horenstein (2013) (<doi:10.3982/ECTA8968>). Other available options include "ER" forthe Eigenvalue Ratio method of Ahn and Horenstein (2013) (<doi:10.3982/ECTA8968>) and "BN1" or "BN2",the Bai and Ng (2002) method (<dio:10.1111/1468-0262.00273>) using one of two penalties: (1)(d + p) / (d p) log(d p/(d+p)) or(2)(d + p) / (d p) log(min(d,p)) where d is the number of groups inthe data and p is the number of total random effect covariates (including the intercept)

size

positive integer specifying the total number of pseudo randomeffect estimates to use in the estimation procedure for the number of latent factorsr, which is restricted to be no less than 25. If thissize is greaterthan the number of groups in the data (i.e.~the number of levels of the groupingvariable), then a sampling procedure is used to increase the number of pseudo estimatesto the value ofsize if the value ofsample isTRUE.

sample

logical value indicating if the total number of pseudo random effectestimates to use in the estimation procedure for the number of latent common factors rshould be larger than the number of unique groups in the data, where the number of pseudo estimates are increased to the value ofsize using a sampling procedure. Default isFALSE. IfTRUE, the sampling procedure is onlyperformed if the value ofsize is greater than the number of groups in the data.

Details

Estimation ofr procedure: For each level of the group variable separately,we identify the observations within that group and fit a regular penalized generalized linear model where the penalty value is theminimum fixed effect penalty. These group-specific estimates, which we label as 'pseudo random effects',are placed into a matrixG(rows = number of levels of the grouping variable, columns = number of random effect covariates),and this pseudo random effects matrix is treated as the observed outcome matrix used inthe "GR", "ER", and "BN" estimation procedures described above in the description ofr_est_method.


Simulates data to use for theglmmPen package

Description

Simulates data to use for testing theglmmPen package.sim.data simulates data forglmmPen,sim.data.FA simulates data forglmmPen_FA,sim.data.piecewise.exp simulates survival data forphmmPen andphmmPen_FA,andsim.data.weibull simulates alternative survival data forphmmPen andphmmPen_FA.Possible parameters to specify includes number of total covariates,number of non-zero fixed and random effects, and the magnitudeof the random effect covariance values.

Usage

sim.data(  n,  ptot,  pnonzero,  nstudies,  sd_raneff = 1,  family = "binomial",  corr = NULL,  seed,  imbalance = 0,  beta = NULL,  pnonzerovar = 0,  sd_x = 1)sim.data.FA(  n,  ptot,  pnonzero,  nstudies,  sd_raneff = 0,  family = "binomial",  B = NULL,  r = 2,  corr = NULL,  seed,  imbalance = 0,  beta = NULL,  pnonzerovar = 0,  sd_x = 1)sim.data.piecewise.exp(  n,  ptot,  pnonzero,  nstudies,  sd_raneff = 0,  B = NULL,  r = 2,  cut_points = c(0, 0.5, 1, 1.5, 2),  lhaz_vals = c(-1.5, 1, 2.7, 3.7, 6.8),  cens_type = c("unif", "exp"),  cens_max = 5,  exp_rate = 0.15,  seed,  imbalance = 0,  beta = NULL,  pnonzerovar = 0,  sd_x = 1)sim.data.weibull(  n,  ptot,  pnonzero,  nstudies,  sd_raneff = 0,  B = NULL,  r = 2,  lhaz_base = 1,  alpha_PH = 5,  exp_rate = 0.15,  cens_type = c("unif", "exp"),  cens_max = 5,  seed,  imbalance = 0,  beta = NULL,  pnonzerovar = 0,  sd_x = 1)

Arguments

n

integer specifying total number of samples to generate

ptot

integer specifying total number of covariates to generate (values randomly generated from the standard normal distribution)

pnonzero

integer specifying how may of the covariates should havenon-zero fixed and random effects

nstudies

number of studies/groups to have in the data

sd_raneff

non-negative value specifying the standard deviation of therandom effects covariance matrix (applied to the non-zero random effects)

family

character string specifying which family to generate data from.Family options include "binomial" (default), "poisson", and "gaussian".

corr

optional value to specify correlation between covariatesin the model matrix. DefaultNULL, only available withinsim.data.

seed

integer to use for the setting of a random seed

imbalance

integer of 0 or 1 indicating whether the observations shouldbe equally distributed among the groups (0) or unequally distributed (1).

beta

numeric vector of the fixed effects (including intercept)

pnonzerovar

non-negative integer specifying the number of covariates with a zero-valued fixed effect but a non-zero random effect.

sd_x

non-negative value specifying the standard deviation of thesimulated covariates (drawn from a normal distribution with mean 0,standard deviationsd_x)

B

matrix specifying the factor loadings matrix for the random effects,only used withinsim.data.FA.Dimensions: number of columns is the number of random effects (including the intercept),and number of rows is the number of latent common factors (r).The random effect covariance matrix is specified as Sigma = B x t(B) + diag(sd_raneff)

r

positive integer specifying number of latent common factors that describe the random effects,only used withinsim.data.FA

cut_points

vector of cut points to use for the time intervals when simulating piecewise exponential data. Length of cut points must equal length oflhaz_vals, and the value of the first cut point must be 0.

lhaz_vals

vector of the log baseline hazard values for each time interval (whichcorrespond to the time intervals defined by thecut_points argument) withinpiecewise exponential data. Hazards are assumed to be constant within each time interval.

cens_type

character specifying type of censoring to implement. Default "unif" specifiesuniform censoring from 0 tocens_max. The option "exp" specifies exponentialcensoring with rateexp_rate

cens_max

numeric value used to characterize the range of the uniform censoring procedure (from 0 tocens_max)

exp_rate

numeric value used to characterize the exponential censoring rate (where rateis defined as the rate used inrexp)

lhaz_base

numeric value that gives the log of the scale parameter for the Weibull distribution (for description of Weibull scale parameter without log transformation, see "lambdas" argument insimsurv and the lambda notation in section "Weibull distribution" in the simsurv vignette https://cran.r-project.org/web/packages/simsurv/vignettes/simsurv_technical.html)

alpha_PH

numeric value > 0 that gives the shape parameter for the Weibull distribution (for description of Weibull shape paraemeter,see "gammas" argument insimsurv and the gamma notation in section "Weibulldistribution" in the simsurv vignette https://cran.r-project.org/web/packages/simsurv/vignettes/simsurv_technical.html)

Value

list containing the following elements:

y

vector of the response

X

model matrix for the fixed effects

Z

model matrix for the random effects, organized first by variableand then by group

pnonzero

number of non-zero fixed effects

z1

values of the random effects for each variable for each level of the grouping factor

group

grouping factor

X0

model matrix for just the non-zero fixed effects


Control for Fitting Piecewise Constant Hazard Mixed Models as an Approximation to Fitting Cox Proportional Hazards Mixed Models

Description

Constructs the control structure for additional parameters needed toproperly fit survival data using a piecewise constant hazard mixed model

Usage

survivalControl(  cut_num = 8,  interval_type = c("equal", "manual", "group"),  cut_points = NULL,  time_scale = 1)

Arguments

cut_num

positive integer specifying the number of time intervals to include inthe piecewise constant hazard model assumptions for the sampling step. Default is 8.General recommendation: use between 5 and 10 intervals. See the Details section foradditional information.

interval_type

character specifying how the time intervals are calculated.Options include 'equal' (default), 'manual', or 'group'.If 'equal' (default), time intervalsare calculated such that there are approximately equal numbers of events per time interval.If 'manual', the user needs to inputtheir own cut points (seecut_points for details).If 'group', time intervals are calculated such that there are approximatelyequal numbers of events per time interval AND there are at least 4 events observedby each group within each time interval. The input number of time intervalscut_nummay be modified to a lower number in order to accomplish this goal. This option is generally difficult to perform when there are a large number of groups in the data.

cut_points

numeric vector specifying the value of the cut points to usein the calculation of the time intervals for the piecewise constant hazard model. Ifinterval_type set to 'equal' or 'group', then this argument is not utilized.Ifinterval_type set to 'manual', then this argument is required.First value must be 0, and all values must be ordered smallest to largest.

time_scale

positive numeric value (greater than 1) used to scale the time variable in the survival data. In order to calculate the piecewise constant hazard mixed model,the log of the time a subject survived within a particular time intervalis used as an offset in the model fit. Sometimes multiplying the time scaleby a factor greater than 1 improves the stability of the model fit algorithm.

Details

In the piecewise constant hazard model, there is an assumption that the time line of the data can be cut intocut_numtime intervals and the baseline hazard is constant withineach of these time intervals. In the fit algorithm, we estimatethese baseline hazard values (specifically, we estimate the log of the baselinehazard values). By default, we determine cut points by specifying the total number of cutsto make (cut_num) and then specifying time values for cut points suchthat each time interval has an approximately equal number of events (interval_type = equal).The authors of this package have found simulations to work well using this defaultinterval_type = equal,but if desired, users can further specify that each group has at least some (4) events observedwithin each time interval. Regardless of theinterval_type choice, users should be aware that having too many cut points could result in having too fewevents for each time interval needed for a stable estimation of the baseline hazard estimates.Additionally, data with few events could result in too few events per time intervaleven for a small number of cut points. Alternatively, having too few cut points could result in a sub-par approximation of the baseline hazard, which can lead to biased estimation for the coefficients corresponding to the input variables of interest.We generally recommend having8 total time intervals (more broadly, between 5 and 10 total time intervals). Warnings or errorswill occur for cases when there are 1 or 0 events for a time interval. If this happens, either adjust thecut_num value appropriately,or in the case when the data simply has a very small number of events,consider not using this software for your estimation purposes.

Value

Function returns a list inheriting from classsurvivalControlcontaining fit and optimization criteria values used in optimization routine.


Convert Input Survival Data Into Long-Form Data Needed for Fitting a PiecewiseExponential Model

Description

Converts the input survival data with one row or element corresponding to a single observation or subject into a long-form dataset where one observation or subjectcontributesj rows, wherej is the number of time intervals that a subject survived at least part-way through.

Usage

survival_data(y, X, Z, group, offset_fit = NULL, survival_options)

Arguments

y

response, which must be aSurv object (seeSurv from thesurvival package)

X

matrix of fixed effects covariates

Z

matrix of random effects covariates

group

vector specifying the group assignment for each subject

offset_fit

vector specifying the offset. This can be used to specify ana priori known component to be included in the linear predictor during fitting. Default set toNULL (no offset). If the data argument is notNULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector).

survival_options

a structure of class "survivalControl" created from functionsurvivalControl that specifies several parameters needed to properly fit the input survival data using a piecewise constant hazard mixed model. See the documentation forsurvivalControl for more details on defaults.


[8]ページ先頭

©2009-2025 Movatter.jp