Movatterモバイル変換


[0]ホーム

URL:


Type:Package
Version:0.2-6.8
Title:Generalised Joint Regression Modelling
Description:Routines for fitting various joint (and univariate) regression models, with several types of covariate effects, in the presence of equations' errors association.
Author:Giampiero Marra [aut, cre], Rosalba Radice [aut]
Maintainer:Giampiero Marra <giampiero.marra@ucl.ac.uk>
Depends:R (≥ 3.6.0)
Imports:mgcv, magic, VGAM, survey, trust, VineCopula, graphics, stats,utils, grDevices, ggplot2, matrixStats, mnormt, gamlss.dist,Rmpfr, scam, survival, psych, copula, numDeriv, evd, ismev,methods, distrEx
Enhances:sp
LazyLoad:yes
License:GPL-2 |GPL-3 [expanded from: GPL (≥ 2)]
URL:https://www.ucl.ac.uk/statistics/people/giampieromarra
Repository:CRAN
Date/Publication:2025-06-23 21:20:02 UTC
NeedsCompilation:no
Packaged:2025-06-23 20:04:25 UTC; Giampiero

Generalised Joint Regression Modelling

Description

This package provides a function for fitting various generalised joint regression models with several types of covariate effects and distributions. Many modelling options are supported and all parameters of the joint distribution can be specified as flexible functions of covariates.

The orginal name of this package wasSemiParBIVProbit which was designed to fit flexible bivariate binary response models. However, since then the package has expanded so much that its orginal name no longer gave a clue about all modelling options available. The new name should more closely reflect past, current and future developments.

The main fitting functions are listed below.

gjrm() which fits bivariate regression models with binary responses (useful for fitting bivariate binary models in the presence of (i) non-random sample selection or (ii) associated responses/endogeneity or (iii) partial observability), bivariate models with binary/discrete/continuous/survival margins in the presence of associated responses/endogeneity, bivariate sample selection models with continuous/discrete response, trivariate binary models (with and without double sample selection). This function essentially merges all previously available fitting functions, namelySemiParBIV(),SemiParTRIV(),copulaReg() andcopulaSampleSel().

gamlss() fits flexible univariate regression models where the response can be binary (only the extreme value distribution is allowed for), continuous, discrete and survival. The purpose of this function was only to provide, in some cases, starting values for the above functions, but it has now been made available in the form of a proper function should the user wish to fit univariate models using the general estimation approach of this package.

We are currently working on several multivariate extensions.

Details

GJRM provides functions for fitting general joint models in various situations. The estimation approach isbased on a very generic penalized maximum likelihood based framework, where any (parametric) distribution can in principle be employed,and the smoothers (representing several types of covariate effects) are set up using penalised regression splines.Several marginal and copula distributions are available and the numerical routine carries out function minimization using a trust region algorithm in combination withan adaptation of an automatic multiple smoothing parameter estimation procedure for GAMs (seemgcv for more details on this last point). The smoothers supported by this package are those available inmgcv.

Confidence intervals for smooth components and nonlinear functions of the modelparameters are derived using a Bayesian approach. P-values for testing individual smooth terms for equality to the zero function are also provided and based on the approachimplemented inmgcv. The usual plotting and summary functions are also available. Model/variable selection is also possible via the use of shrinakge smoothers and/or information criteria.

Author(s)

Giampiero Marra (University College London, Department of Statistical Science) and Rosalba Radice (Bayes Business School, Faculty of Actuarial Science and Insurance, City, University of London)

with help and contributions from Panagiota Filippou (trivariate binary models), Francesco Donat (bivariate models with ordinal and continuous margins, and ordinal margins), Matteo Fasiolo (pdf and cdf, and related derivatives, of the Tweedie distribution), Alessia Eletti and Javier Rubio Alvarez (univariate survival models with mixed censoring and excess hazards), Kiron Das(Galambos copula), Eva Cantoni and William Aeberhard (robust gamlss).

Thanks to Bear Braumoeller for suggesting the implementation of bivariate models with partial observability, and Carmen Cadarso for suggesting the inclusion of various modelling extensions.

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

Part funded by EPSRC: EP/J006742/1 and EP/T033061/1

References

Key methodological references (ordered by year of publication):

Marra G., Radice R., Zimmer D. (2024), A Unifying Switching Regime Regression Framework with Applications in Health Economics.Econometric Reviews, 43(1), 52-70.

Eletti A., Marra G., Quaresma M., Radice R., Rubio F.J. (2022), A Unifying Framework for Flexible Excess Hazard Modeling with Applications in Cancer Epidemiology.Journal of the Royal Statistical Society Series C, 71(4), 1044-1062.

Petti D., Eletti A., Marra G., Radice R. (2022), Copula Link-Based Additive Models for Bivariate Time-to-Event Outcomes with General Censoring Scheme.Computational Statistics and Data Analysis, 107550.

Ranjbar S., Cantoni E., Chavez-Demoulin V., Marra G., Radice R., Jaton-Ogay K. (2022), Modelling the Extremes of Seasonal Viruses and Hospital Congestion: The Example of Flu in a Swiss Hospital.Journal of the Royal Statistical Society Series C, 71(4), 884-905.

Aeberhard W.H., Cantoni E., Marra G., Radice R. (2021), Robust Fitting for Generalized Additive Models for Location, Scale and Shape.Statistics and Computing, 31(11), 1-16.

Marra G., Farcomeni A., Radice R. (2021), Link-Based Survival Additive Models under Mixed Censoring to Assess Risks of Hospital-Acquired Infections.Computational Statistics and Data Analysis, 155, 107092.

Hohberg M., Donat F., Marra G., Kneib T. (2021), Beyond Unidimensional Poverty Analysis Using Distributional Copula Models for Mixed Ordered-Continuous Outcomes.Journal of the Royal Statistical Society Series C, 70(5), 1365-1390.

Dettoni R., Marra G., Radice R. (2020), Generalized Link-Based Additive Survival Models with Informative Censoring.Journal of Computational and Graphical Statistics, 29(3), 503-512.

Marra G., Radice R. (2020), Copula Link-Based Additive Models for Right-Censored Event Time Data.Journal of the American Statistical Association, 115(530), 886-895.

Filippou P., Kneib T., Marra G., Radice R. (2019), A Trivariate Additive Regression Model with Arbitrary Link Functions and Varying Correlation Matrix.Journal of Statistical Planning and Inference, 199, 236-248.

Klein N., Kneib T., Marra G., Radice R., Rokicki S., McGovern M.E. (2019), Mixed Binary-Continuous Copula Regression Models with Application to Adverse Birth Outcomes.Statistics in Medicine, 38(3), 413-436.

Filippou P., Marra G., Radice R. (2017), Penalized Likelihood Estimation of a Trivariate Additive Probit Model.Biostatistics, 18(3), 569-585.

Marra G., Radice R. (2017), Bivariate Copula Additive Models for Location, Scale and Shape.Computational Statistics and Data Analysis, 112, 99-113.

Marra G., Radice R., Barnighausen T., Wood S.N., McGovern M.E. (2017), A Simultaneous Equation Approach to Estimating HIV Prevalence with Non-Ignorable Missing Responses.Journal of the American Statistical Association, 112(518), 484-496.

Marra G., Radice R., Filippou P. (2017), Testing the Hypothesis of Exogeneity in Regression Spline Bivariate Probit Models.Communications in Statistics - Simulation and Computation, 46(3), 2283-2298.

Radice R., Marra G., Wojtys M. (2016), Copula Regression Spline Models for Binary Outcomes.Statistics and Computing, 26(5), 981-995.

Marra G., Radice R. (2013), A Penalized Likelihood Estimation Approach to Semiparametric Sample Selection Binary Response Modeling.Electronic Journal of Statistics, 7, 1432-1455.

Marra G., Radice R. (2013), Estimation of a Regression Spline Sample Selection Model.Computational Statistics and Data Analysis, 61, 158-173.

Marra G., Radice R. (2011), Estimation of a Semiparametric Recursive Bivariate Probit in the Presence of Endogeneity.Canadian Journal of Statistics, 39(2), 259-279.

For applied case studies seehttps://www.homepages.ucl.ac.uk/~ucakgm0/pubs.htm.

See Also

gjrm,gamlss


Average Treatment Effect of a binary or continuous treatment variable

Description

ATE can be used to calculate the causal average treatment effect of a binary or continuous Gaussian treatment variable, with corresponding interval obtained using posterior simulation.

Usage

ATE(x, trt, trt.val = NULL, int.var = NULL, eq = NULL, joint = TRUE, n.sim = 100,     prob.lev = 0.05, length.out = NULL, percentage = FALSE)

Arguments

x

A fittedgjrm object as produced by the respective fitting function.

trt

Name of the treatment variable.

trt.val

Numeric value for the treatment variable. This is only required when the endogenous variable is Gaussian.

int.var

A vector made up of the name of the variable interacted withtrt, and a value for it. It can also be a list.

eq

Number of equation containing the treatment variable. This is only used for trivariate models.

joint

IfFALSE then the effect is obtained from the univariate model which neglects the presence of unobserved confounders. WhenTRUE, the effect is obtained from the simultaneous model which accounts for observed and unobserved confounders.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. It may be increased if more precision is required.

prob.lev

Overall probability of the left and right tails of the AT distribution used for interval calculations.

length.out

Length of the sequence to be used when calculating the effect that a continuoustreatment has on a binary outcome.

percentage

Only for the Roy model, whenTRUE it provides results in terms of percentage.

Details

ATE measures the causal average difference in outcomes under treatment (the binary predictor or treatment assumes value 1) and under control (the binary treatment assumes value 0). Posterior simulation is used to obtain a confidence/credible interval. See the references below for details.

ATE can also calculate the effect that a continuous Gaussian endogenous variable has on a binary outcome. In this case the effect will depend on the unit increment chosen (as shown by the plot produced).

Value

res

It returns three values: lower confidence interval limit, estimated AT and upper interval limit.

prob.lev

Probability level used.

sim.ATE

It returns a vector containing simulated values of the average treatment effect. This is used to calculate intervals.

Effects

For the case of continuous/discrete endogenous variable and binary outcome, it returns a matrix made up of three columns containing the effects for each incremental value in the endogenous variable and respective intervals.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Marra G. and Radice R. (2011), Estimation of a Semiparametric Recursive Bivariate Probit in the Presence of Endogeneity.Canadian Journal of Statistics, 39(2), 259-279.

See Also

GJRM-package,gjrm


Internal Function

Description

It evaluates the cdf of several copulae.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal fitting function

Description

Internal fitting and set up function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Differentiable penalties

Description

work in progress, temp function

Usage

Dpens(params, type = "lasso", lambda = 1, w.alasso = NULL,       gamma = 1, a = 3.7, eps = 1e-08)

Arguments

params

coefficients.

type

lasso, alasso or scad.

lambda

smoothing parameter.

w.alasso

for alasso.

gamma

default 1.

a

for scad.

eps

tolerance.

Details

work in progress.

Value

The function returns a penalty.


Internal Function

Description

This and other similar internal functions calculate the Hessian for trivariate binary models.

Author(s)

Author: Panagiota Filippou

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Lagrange Multiplier Test (Score Test)

Description

Before fitting a bivariate probit model,LM.bpm can be used to test the hypothesis of absence of endogeneity, correlated model equations/errors or non-random sample selection.

Usage

LM.bpm(formula, data = list(), weights = NULL, subset = NULL, model, hess = TRUE)

Arguments

formula

A list of two formulas, one for equation 1 and the other for equation 2.s terms are used to specify smooth smooth functions of predictors. Note that ifmodel = "BSS" then the first formula MUST refer to the selection equation.

data

An optional data frame, list or environment containing the variables in the model. If not found indata, thevariables are taken fromenvironment(formula).

weights

Optional vector of prior weights to be used in fitting.

subset

Optional vector specifying a subset of observations to be used in the fitting process.

model

It indicates the type of model to be used in the analysis. Possible values are "B" (bivariate model) and "BSS" (bivariate model with sample selection). The two marginal equations have probit links.

hess

IfFALSE then the expected (rather than observed) information matrix is employed.

Details

This Lagrange multiplier test (also known as score test) is used here for testing the null hypothesis that\theta is equal to 0 (i.e. no endogeneity, non-random sample selection or correlated model equations/errors, depending on the model being fitted). Its main advantage is that it does not require an estimate of the model parameter vector under the alternative hypothesis. Asymptotically, it takes a Chi-squared distribution with one degree of freedom. Full details can be found in Marra et al. (2014) and Marra et al. (2017).

Value

It returns a numeric p-value corresponding to the null hypothesis that the correlation,\theta, is equal to 0.

WARNINGS

This test's implementation is ONLY valid for bivariate binary probit models with normal errors.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Marra G., Radice R. and Filippou P. (2017), Regression Spline Bivariate Probit Models: A Practical Approach to Testing for Exogeneity.Communications in Statistics - Simulation and Computation, 46(3), 2283-2298.

Marra G., Radice R. and Missiroli S. (2014), Testing the Hypothesis of Absence of Unobserved Confounding in Semiparametric Bivariate Probit Models.Computational Statistics, 29(3-4), 715-741.

See Also

gjrm

Examples

## see examples for gjrm

Causal odds ratio of a binary/continuous treatment variable

Description

OR can be used to calculate the causal odds ratio of a binary/continuous treatment variable, with corresponding interval obtained using posterior simulation.

Usage

OR(x, trt, trt.val = NULL, int.var = NULL, joint = TRUE, n.sim = 100, prob.lev = 0.05,    length.out = NULL)

Arguments

x

A fittedgjrm object.

trt

Name of the treatment variable.

trt.val

Numeric value for the treatment variable. This is only required when the endogenous variable is Gaussian.

int.var

A vector made up of the name of the variable interacted withtrt, and a value for it. It can also be a list.

joint

IfFALSE then the effect is obtained from the univariate model which neglects the presence of unobserved confounders. WhenTRUE, the effect is obtained from the simultaneous model which accounts for observed and unobserved confounders.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used whendelta = FALSE. It may be increased if more precision is required.

prob.lev

Overall probability of the left and right tails of the OR distribution used for interval calculations.

length.out

Ddesired length of the sequence to be used when calculating the effect that a continuoustreatment has on a binary outcome.

Details

OR calculates the causal odds ratio for a binary/continuous Gaussian treatment. Posterior simulation is used to obtain a confidence/credible interval.

Value

prob.lev

Probability level used.

sim.OR

It returns a vector containing simulated values of the average OR. This is used to calculate intervals.

Ratios

For the case of continuous endogenous treatment and binary outcome, it returns a matrix made up of three columns containing the odds ratios for each incremental value in the endogenous variable and respective intervals.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Partial effect from a binary bivariate model

Description

PE can be used to calculate the sample treatment effect from a a binary bivariate model, with corresponding interval obtained using posterior simulation.

Usage

PE(x1, idx, n.sim = 100, prob.lev = 0.05,    plot = FALSE,    main = "Histogram of Simulated Average Effects",    xlab = "Simulated Average Effects", ...)

Arguments

x1

A fittedgjrm object.

idx

This is useful to pick a particular individual and must be provided.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used whendelta = FALSE. It may be increased if more precision is required.

prob.lev

Overall probability of the left and right tails of the AT distribution used for interval calculations.

plot

IfTRUE then a plot of the histogram and kernel density estimate of the simulated average effects is produced.

main

Title for the plot.

xlab

Title for the x axis.

...

Other graphics parameters to pass on to plotting commands. These are used only whenhd.plot = TRUE.

Details

PE measures the sample average effect from a binary bivariate model when a binary response (associated with a continuous outcome) takes values 0 and 1. Posterior simulation is used to obtain a confidence/credible interval.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Causal risk ratio of a binary/continuous treatment variable

Description

RR can be used to calculate the causal risk ratio of a binary/continuous treatment variable, with corresponding interval obtained using posterior simulation.

Usage

RR(x, trt, trt.val = NULL, int.var = NULL, joint = TRUE, n.sim = 100, prob.lev = 0.05,    length.out = NULL)

Arguments

x

A fittedgjrm object.

trt

Name of the treatment variable.

trt.val

Numeric value for the treatment variable. This is only required when the endogenous variable is Gaussian.

int.var

A vector made up of the name of the variable interacted withtrt, and a value for it. It can also be a list.

joint

IfFALSE then the effect is obtained from the univariate model which neglects the presence of unobserved confounders. WhenTRUE, the effect is obtained from the simultaneous model which accounts for observed and unobserved confounders.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used whendelta = FALSE. It may be increased if more precision is required.

prob.lev

Overall probability of the left and right tails of the RR distribution used for interval calculations.

length.out

Ddesired length of the sequence to be used when calculating the effect that a continuoustreatment has on a binary outcome.

Details

RR calculates the causal risk ratio of the probabilities of positive outcome under treatment (the binary predictor or treatment assumes value 1) and under control (the binary treatment assumes value 0). Posterior simulation is used to obtain a confidence/credible interval.

RR works also for the case of continuous Gaussian endogenous treatment variable.

Value

prob.lev

Probability level used.

sim.RR

It returns a vector containing simulated values of the average RR. This is used to calculate intervals.

Ratios

For the case of continuous endogenous variable and binary outcome, it returns a matrix made up of three columns containing the risk ratios for each incremental value in the endogenous variable and respective intervals.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Internal Function

Description

It provides penalty matrices in a format suitable for automatic multiple smoothing parameter estimation.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Survival Average Treatment Effects of a binary treatment variable

Description

SATE can be used to calculate the survival treatment effects of a binary treatment variable, with corresponding interval obtained using posterior simulation.

Usage

SATE(x, trt, surv.t = NULL, int.var = NULL, joint = TRUE,     n.sim = 100, prob.lev = 0.05, ls = 10, plot.type = "none", ...)

Arguments

x

A fittedgjrm object as produced by the respective fitting function.

trt

Name of the treatment variable.

surv.t

Numeric value or vector for time. If not provided, the function will be calculate the SATE for each time point of a grid of lenghtls, calculated from the observed outcome.

int.var

A vector made up of the name of the variable interacted withtrt, and a value for it. It can also be a list.

joint

IfFALSE then the effects are obtained from the univariate model which neglects the presence of unobserved confounders. WhenTRUE, the effects are obtained from the simultaneous model which accounts for observed and unobserved confounders.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. It may be increased if more precision is required.

prob.lev

Overall probability of the left and right tails of the SAT distribution used for interval calculations.

ls

Length of sequence to use for time variable. Only used whensurv.t = NULL.

plot.type

Values allowed are: "none", "survival" (for survival plots under treatment = 0 (grey lines)and treatment = 1 (black lines)) and "sate" (SATE evaluated at several time points).

...

Other graphics parameters to pass on to plotting commands.

Details

SATE measures the average survival difference in outcomes under treatment (the binary predictor or treatment assumes value 1) and under control (the binary treatment assumes value 0). Posterior simulation is used to obtain a confidence/credible interval.

Value

res

It returns three values: lower interval limit(s), estimated SATE(s) and upper interval limit(s).

prob.lev

Probability level used.

sim.SATE

It returns a vector containing simulated values of the survival average treatment effect for the case in which a specific time is chosen. This is used to calculate intervals.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Internal fitting function

Description

Internal fitting set up function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

Wrapper of core algorithm.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

This and other similar internal functions calculate useful post estimation quantities.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal fitting function

Description

Internal fitting set up function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal fitting function

Description

Internal fitting set up function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It approximates the trivariate normal integral.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Adjustment for the covariance matrix from a fitted gjrm model

Description

adjCov can be used to adjust the covariance matrix of a fittedgjrm object.

Usage

adjCov(x, id)

Arguments

x

A fittedgjrm object as produced by the respective fitting function.

id

Cluster identifier.

Details

This adjustment can be made when dealing with clustered data and the cluster structure is neglected when fitting the model. The basic idea is that the model is fitted as though observations were independent, and subsequently adjust the covariance matrix of the parameter estimates. Using theterminology of Liang and Zeger (1986), this would correspond to using an independence structure within the context of generalized estimating equations. The parameter estimators are still consistent but are inefficient as comparedto a model which accounts for the correct cluster dependence structure. The covariance matrix of the independence estimators can be adjusted as described in Liang and Zeger (1986, Section 2).

Value

This function returns a fitted object which is identical to that supplied inadjCov but with adjusted covariance matrix.

WARNINGS

This correction may not be appropriate for models fitted using penalties.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Liang K.-Y. and Zeger S. (1986), Longitudinal Data Analysis Using Generalized Linear Models.Biometrika, 73(1), 13-22.

See Also

GJRM-package,gjrm


Adjustment for the covariance matrix from a gjrm model fitted to complex survey data.

Description

adjCovSD can be used to adjust the covariance matrix of a fittedgjrm object.

Usage

adjCovSD(x, design)

Arguments

x

A fittedgjrm object as produced by the respective fitting function.

design

Asvydesign object as produced bysvydesign() from thesurvey package.

Details

This function has been extracted from thesurvey package and adapted to the class of this package's models. It computes the sandwich variance estimator for a copula model fitted to data from a complex sample survey (Lumley, 2004).

Value

This function returns a fitted object which is identical to that supplied inadjCovSD but with adjusted covariance matrix.

WARNINGS

This correction may not be appropriate for models fitted using penalties.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Lumley T. (2004), Analysis of Complex Survey Samples.Journal of Statistical Software, 9(8), 1-19.

See Also

GJRM-package,gjrm


Internal Function

Description

This and other similar internal functions provide the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula models with continuous margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

This and other similar internal functions provide the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula models with discrete and continuous margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

This and other similar internal functions provide the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula models with discrete margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed/Fisher information matrix for penalized/unpenalized maximum likelihood optimization when copula models with binary outcomes are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula models with binary and continuous margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula sample selection models with continuous margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when fitting univariate models with discrete/continuous response.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula models with binary and discrete margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed information matrix for penalized/unpenalized maximum likelihood optimization when copula sample selection models with discrete margins are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed or expected information matrix for penalized/unpenalized maximum likelihood optimization when bivariate probit models with partialobservability are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides the log-likelihood, gradient and observed/Fisher information matrix for penalized/unpenalized maximum likelihood optimization when copula sample selection models with binary outcomes are employed.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Clarke test

Description

The Clarke test is a likelihood-ratio-based test that can be used for choosing between two non-nested models.

Usage

clarke.test(obj1, obj2, sig.lev = 0.05)

Arguments

obj1,obj2

Objects of the two fitted bivariate non-nested models.

sig.lev

Significance level used for testing.

Details

The Clarke (2007) test is a likelihood-ratio-based tests for model selection that use the Kullback-Leibler information criterion, and that can be employed for choosing between two bivariate models which are non-nested.

If the two models are statistically equivalent then the log-likelihood ratios of the observations should be evenly distributed around zero and around half of the ratios should be larger than zero. The test follows asymptotically a binomial distribution with parametersn and 0.5. Critical values can be obtained as shown in Clarke (2007). Intuitively, modelobj1 is preferred overobj2 if the value of the test is significantly larger than its expected value under the null hypothesis (n/2), and vice versa. If the value is not significantly different fromn/2 thenobj1 can be thought of as equivalent toobj2.

Value

It returns a decision.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Clarke K. (2007), A Simple Distribution-Free Test for Non-Nested Model Selection.Political Analysis, 15, 347-363.

Examples

## see examples for gjrm

Conditional Mean/Variance from a Copula Model

Description

Functioncond.mv can be used to calculate conditional means/variances from a copula model, with corresponding interval obtained using posterior simulation.

Usage

cond.mv(x, eq, y1 = NULL, y2 = NULL, newdata, fun = "mean", n.sim = 100,         prob.lev = 0.05)

Arguments

x

A fittedcond.mv object as produced by the respective fitting function.

eq

Equation of interest. From this, conditioning is also deduced.

y1,y2

Values for y1 and y2. Depending on the fitted model, one of them may be required.

newdata

A data frame with one row, which must be provided.

fun

Either mean or variance.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters.

prob.lev

Overall probability of the left and right tails of the simulated distribution used for interval calculations.

Details

cond.mv() calculates the conditional mean or variance of copula models. Posterior simulation is used to obtain a confidence/credible interval.

Value

res

It returns three values: lower confidence interval limit, estimated conditional mean or variance and upper interval limit.

prob.lev

Probability level used.

sim.mv

It returns a vector containing simulated values of the conditional mean or variance. This is used to calculate intervals.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Some convergence diagnostics

Description

It takes a fitted model object and produces some diagnostic information about the fitting procedure.

Usage

conv.check(x, blather = FALSE)

Arguments

x

gjrm object.

blather

IfTRUE then more diagnostic information is provided.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gamlss,gjrm


Internal Function

Description

This and other similar internal functions evaluate the first and second derivatives with respect to the margins and association parameter of several copulae.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Copula probabilities (joint and conditional) from a fitted simultaneous model

Description

copula.prob can be used to calculate the joint or conditional copula probabilities from a fitted simultaneous model with intervals obtained via posterior simulation.

Usage

copula.prob(x, y1, y2, y3 = NULL, newdata, joint = TRUE, cond = 0,            intervals = FALSE, n.sim = 100, prob.lev = 0.05,             theta = FALSE, tau = FALSE, min.pr = 1e-323, max.pr = 1)

Arguments

x

A fittedgjrm object as produced by the respective fitting function.

y1

Value of response for first margin.

y2

Value of response for second margin.

y3

Value of response for third margin if a trivariate model is employed.

newdata

A data frame with one row, which must be provided.

joint

IfTRUE then the calculation is done using the fitted joint model. IfFALSE then the calculation is done from univariate fits.

cond

There are three possible values: 0 (joint probabilities are delivered), 1 (conditional probabilities are delivered and conditioning is with the respect to the first margin), 2 (as before but conditioning is with the respect to the second margin).

intervals

IfTRUE then intervals for the probabilities are also produced.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is usedfor interval calculations.

prob.lev

Overall probability of the left and right tails of the probabilities' distributions used for interval calculations.

theta

IfTRUE the theta dependence parameter will be shown. This is especially useful for prediction purposes when theta is specified as a function of covariate effects.

tau

IfTRUE the Kendall's tau will also be calculated and provided in output. Note that the calculation adopted here assumes continuous margins. In all other cases, this may provide a rough indication of dependence under certain assumptions. Note that, for the F, PL and J0 (and the relatedrotations), computing times may be longer than for the other cases. This is especially useful for prediction purposes when theta is specified as a function of covariate effects, with an interest in analysing a more interpretable measure of dependence for certain copulae.

min.pr,max.pr

Allowed minimum and maximum for estimated probabities.

Details

This function calculates joint or conditional copula probabilities from a fitted simultaneous model or a model assuming independence, with intervals obtained via posterior simulation.

Value

res

It returns several values including: estimated probabilities (p12), with lower and upper interval limits (CIpr) ifintervals = TRUE, andp1,p2 andp3 (the marginal probabilities).

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Internal fitting function

Description

Internal fitting and set up function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Cross validation for informative censoring univariate survival models

Description

cv.inform carries out cross validation to help choosing the set of informative covariates.

Usage

cv.inform(x, K = 5, data, informative = "yes")

Arguments

x

A fittedgamlss object as produced by the respective fitting function.

K

No. of folds.

data

Data.

informative

If no then cv is carried out for the case of no informative censoring. This is useful for comparison purposes.

Details

cv.inform carries out cross validation to help choosing the set of informative covariates.

Value

sl

Overall sum of predicted likelihood contributions.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gamlss


Internal Function

Description

This and other similar internal functions evaluate the margins' derivatives needed in the likelihood function for the binary, discrete and continuous cases.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

This and other similar internal functions map certain key quantities into a feasible parameter space. Some functions carry out some general consistency checks.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

This and other similar internal functions calculate the score for trivariate binary models.

Author(s)

Author: Panagiota Filippou

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Generalised Additive Models for Location, Scale and Shape and Beyond

Description

gamlss fits flexible univariate regression models for several continuous and discrete distributions as well as survival outcomes, and types of covariate effects. When first designed, the purpose of this function was only to provide, in some cases, starting values for the simultaneous models in the package. At a later stage, it was made available in the form of a proper function should the user wish to fit univariate models using the general estimation approach of this package. The continuous and discrete distributions used here are parametrised according to Rigby and Stasinopoulos (2005).

Usage

gamlss(formula, data = list(), weights = NULL, subset = NULL, offset = NULL,        family = "N", cens = NULL, type.cens = "R", ub.t = NULL, left.trunc = 0,       robust = FALSE, rc = 3, lB = NULL, uB = NULL, infl.fac = 1,        rinit = 1, rmax = 100, iterlimsp = 50, tolsp = 1e-07,       gc.l = FALSE, parscale, gev.par = -0.25,       chunk.size = 10000, knots = NULL,       informative = "no", inform.cov = NULL, family2 = "-cloglog",        fp = FALSE, sp = NULL,       drop.unused.levels = TRUE, siginit = NULL, shinit = NULL,       sp.method = "perf", hrate = NULL, d.lchrate = NULL, d.rchrate = NULL,       d.lchrate.td = NULL, d.rchrate.td = NULL, truncation.time = NULL,       min.dn = 1e-40, min.pr = 1e-16, max.pr = 0.9999999, ygrid.tol = 1e-08)

Arguments

formula

List of equations. This should contain one or more equations.

data

A data frame.

weights

Optional vector of prior weights to be used in fitting.

subset

Optional vector specifying a subset of observations to be used in the fitting process.

offset

Optional vector specifying an offset for use in fitting. Option introduced for dealing with offset with discrete distributions.

family

Possible choices are Gaussian ("N"), truncated Gaussian ("tN"), Tweedie ("TW"),log-normal ("LN"), Gumbel ("GU"), reverse Gumbel ("rGU"), generelised Pareto ("GP"), generelised Pareto II ("GPII") where the shape parameter is forced to be > -0.5, generelised Pareto (with orthogonal parametrisation) ("GPo") where the shape parameter is forced to be > -0.5,discrete generelised Pareto ("DGP"),discrete generelised Pareto II ("DGPII") where the shape parameter is forced to be positive, discrete generelised Pareto derivedunder the scenario in which shape = 0 ("DGP0"), logistic ("LO"), Weibull ("WEI"), Inverse Gaussian ("IG"), gamma ("GA"), Dagum ("DAGUM"), Singh-Maddala ("SM"), beta ("BE"), Fisk ("FISK", also known as log-logistic), Poisson ("P"), truncated Poisson ("tP"), negative binomial - type I ("NBI"), negative binomial - type II ("NBII"), Poisson inverse Gaussian ("PIG"), truncated negative binomial - type I ("tNBI"), truncated negative binomial - type II ("tNBII"), truncated Poisson inverse Gaussian ("tPIG"), generalised extreme value link function ("GEVlink", this is used for binary responses and is more stable and faster than theR packagebgeva). For survival models,family can be "-cloglog" (similar to generalised proportional hazards), "-logit" (similar to generalised proportional odds), "-probit" (generalised probit).

cens

This is required for a survival model. Whentype.cens is different frommixed, this variable can be equal to 1 if the event occurred and 0 otherwise. Iftype.cens = "mixed" thencens is a mixed factor variable (made up of four possible categories:I for interval,L for left,R for right, andU for uncensored.

type.cens

Type of censoring mechanism. This can be "R", "L", "I" or "mixed".

ub.t

Variable name of right/upper bound whentype.cens = "I" ortype.cens = "mixed" and interval censoring is present.

left.trunc

Value of truncation at left. Currently done for count distributions only.

robust

IfTRUE then the robust version of the model is fitted.

rc

Robust constant.

lB,uB

Bounds for integral in robust case.

infl.fac

Inflation factor for the model degrees of freedom in the approximate AIC. Smoother models can be obtained setting this parameter to a value greater than 1.

rinit

Starting trust region radius. The trust region radius is adjusted as the algorithm proceeds.

rmax

Maximum allowed trust region radius. This may be set very large. If set small, the algorithm traces a steepest descent path.

iterlimsp

A positive integer specifying the maximum number of loops to be performed before the smoothing parameter estimation step is terminated.

tolsp

Tolerance to use in judging convergence of the algorithm when automatic smoothing parameter estimation is used.

gc.l

This is relevant when working with big datasets. IfTRUE then the garbage collector is called more often than it is usually done. This keeps the memory footprint down but it will slow down the routine.

parscale

The algorithm will operate as if optimizing objfun(x / parscale, ...) where parscale is a scalar. If missing then no rescaling is done. See the documentation oftrust for more details.

gev.par

GEV link parameter.

chunk.size

This is used for discrete robust models.

knots

Optional list containing user specified knot values to be used for basis construction.

informative

If "yes" then informative censoring is assumed when using a survival model.

inform.cov

If above is "yes" then a set of informative covariates must be provided.

family2

In the informative survival case, the family for the censored equation can be different from that of the survival equation. Choices are"-cloglog" (siilar to generalised proportional hazards), "-logit" (similar to generalised proportional odds), "-probit" (generalised probit).

fp

IfTRUE then a fully parametric model with unpenalised regression splines if fitted.

sp

A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model equation(s).

drop.unused.levels

By default unused levels are dropped from factors before fitting. For some smooths involving factor variables this may have to be turned off (only use if you know what you are doing).

siginit,shinit

For the GP and DGP distributions, initial values for sigma and shape may be provided.

sp.method

Multiple smoothing automatic parameter selection is perf. efs is an alternative and only sensible optionfor robust models.

hrate

Vector of population hazard rates computed at time of death of each uncensored patient. The length ofhrate should be equal to the number of uncensored observations in the dataset. Needed in the context of excess hazard modelling when uncensored observations are present. Note that this includes left truncated uncensored observations as well.

d.lchrate

Vector of differences of population cumulative excess hazards computed at the age of the patient when the left censoring occurred and at the initial age of the patient. The length ofd.lchrate should be equal to the number of left and/or interval censored observations in the dataset. Needed in the context of excess hazard modelling if left censored and/or interval censored observations are present. In the latter case,d.rchrate also need be provided.

d.rchrate

Vector of differences of population cumulative excess hazards computed at the age of the patient when the at the right interval censoring time and at the initial age of the patient. The length ofd.rchrate should be equal to the number of right censored and/or interval censored observations in the dataset. Needed in the context of excess hazard modelling if right censored and/or interval censored observations are present. In the latter case,d.lchrate also need be provided.

d.lchrate.td

Vector of differences of population cumulative excess hazards computed at the age of the patient when the left censoring occurred and at the age of the patient when the truncation occurred. The length ofd.lchrate.td should be equal to the number of left truncated left censored and/or left truncated interval censored observations in the dataset. Needed in the context of excess hazard modelling if left truncated left censored and/or left truncated interval censored observations are present. In the latter case,d.rchrate.td also need be provided.

d.rchrate.td

Vector of differences of population cumulative excess hazards computed at the age of the patient when the right censoring occurred and at the age of the patient when the truncation occurred. The length ofd.rchrate.td should be equal to the number of left truncated right censored and/or left truncated interval censored observations in the dataset. Needed in the context of excess hazard modelling if left truncated right censored and/or left truncated interval censored observations are present. In the latter case,d.lchrate.td also need be provided.

truncation.time

Variable name of truncation time.

min.dn,min.pr,max.pr

These values are used to set, depending on the model used for modelling, the minimum and maximum allowed for the densities and probabilities. Theseparameters are employed to avoid potential overflows/underflows in the calculations and the default values seem to offer a good compromise. Functionconv.check() provides some relevant diagnostic information which can be used, for example, to check whether the lower bounds ofmin.dn andmin.pr have been reached. So based on this or if the user wishes to do some sensitivity analysis then this can be easily carried out using these three arguments.However, the user has to be cautious. For instance, it would not make much sense to choose formin.dn andmin.pr values bigger than the default ones. Bear in mind that the bounds can be reached for ill-defined models. For certain distributions/models, if convergence failure occurs and the bounds have been reached then the usercan try a sensitivity analysis as mentioned above.

ygrid.tol

Tolerance used to choose grid of response values for robust discrete models. Values smaller than 1e-160 are not allowed for.

Details

The underlying algorithm is described in ?gjrm.

There are many continuous/discrete distributions to choose from and we plan to include more options. Get in touch if you are interested in a particular distribution.

The"GEVlink" option is used for binary response additive models and is more stable and faster than theR packagebgeva.This model has been incorporated into this package to take advantage of the richer set of smoother choices, and of the estimation approach. Details on the model can be found in Calabrese, Marra and Osmetti (2016).

Value

The function returns an object of classgamlss as described ingamlssObject.

WARNINGS

Convergence can be checked usingconv.check which provides some information about the score and information matrix associated with the fitted model. The former should be close to 0 and the latter positive definite.gamlss() will produce some warnings if there is a convergence issue.

Convergence failure may sometimes occur. This is not necessarily a bad thing as it may indicate specific problems with a fitted model. In such a situation, the user may use rescaling (seeparscale). However, the user should especially considerre-specifying/simplifying the model, and/or checking that the chosen distribution fits the response well.In our experience, we found that convergence failure typically occurs when the model has been misspecified and/or the sample size is low compared to the complexity of the model. It is also worth bearing in mind that the use of three parameter distributions requires the datato be more informative than a situation in which two parameter distributions are used instead.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Aeberhard W.H., Cantoni E., Marra G., Radice R. (2021), Robust Fitting for Generalized Additive Models for Location, Scale and Shape.Statistics and Computing, 31(11), 1-16.

Eletti A., Marra G., Quaresma M., Radice R., Rubio F.J. (2022), A Unifying Framework for Flexible Excess Hazard Modeling with Applications in Cancer Epidemiology.Journal of the Royal Statistical Society Series C, 71(4), 1044-1062.

Marra G., Farcomeni A., Radice R. (2021), Link-Based Survival Additive Models under Mixed Censoring to Assess Risks of Hospital-Acquired Infections.Computational Statistics and Data Analysis, 155, 107092.

Marra G., Radice R. (2017), Bivariate Copula Additive Models for Location, Scale and Shape.Computational Statistics and Data Analysis, 112, 99-113.

Ranjbar S., Cantoni E., Chavez-Demoulin V., Marra G., Radice R., Jaton-Ogay K. (2022), Modelling the Extremes of Seasonal Viruses and Hospital Congestion: The Example of Flu in a Swiss Hospital.Journal of the Royal Statistical Society Series C, 71(4), 884-905.

Calabrese R., Marra G., Osmetti SA (2016), Bankruptcy Prediction of Small and Medium Enterprises Using a Flexible Binary Generalized Extreme Value Model.Journal of the Operational Research Society, 67(4), 604-615.

Marincioni V., Marra G., Altamirano-Medina H. (2018), Development of Predictive Models for the Probabilistic Moisture Risk Assessment of Internal Wall Insulation.Building and Environment, 137, 5257-267.

See Also

GJRM-package,gamlssObject,conv.check,summary.gamlss

Examples

## Not run:  library(GJRM)set.seed(0)n <- 400x1 <- round(runif(n))x2 <- runif(n)x3 <- runif(n)f1 <- function(x) cos(pi*2*x) + sin(pi*x)y1 <- -1.55 + 2*x1 + f1(x2) + rnorm(n)dataSim <- data.frame(y1, x1, x2, x3)resp.check(y1, "N")eq.mu <- y1 ~ x1 + s(x2) + s(x3)eq.s  <-    ~ s(x3)fl    <- list(eq.mu, eq.s)out <- gamlss(fl, data = dataSim)conv.check(out)res.check(out)plot(out, eq = 1, scale = 0, pages = 1, seWithMean = TRUE)plot(out, eq = 2, seWithMean = TRUE)summary(out)AIC(out)BIC(out)################# Robust example################eq.mu <- y1 ~ x1 + x2 + x3fl    <- list(eq.mu)out <- gamlss(fl, data = dataSim, family = "N", robust = TRUE,                   rc = 3, lB = -Inf, uB = Inf)conv.check(out)summary(out)rob.const(out, 100)##eq.s  <-    ~ x3fl    <- list(eq.mu, eq.s)out <- gamlss(fl, data = dataSim, family = "N", robust = TRUE)conv.check(out)summary(out)##eq.mu <- y1 ~ x1 + s(x2) + s(x3)eq.s  <-    ~ s(x3)fl    <- list(eq.mu, eq.s)out1 <- gamlss(fl, data = dataSim, family = "N", robust = TRUE,                sp.method = "efs")conv.check(out1)summary(out1)AIC(out, out1)plot(out1, eq = 1, all.terms = TRUE, pages = 1, seWithMean = TRUE)plot(out1, eq = 2, seWithMean = TRUE)############################ GEV link binary example########################### this incorporates the bgeva# model implemented in the bgeva package# however this implementation is more general, # stable and efficientset.seed(0)n <- 400x1 <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)f1 <- function(x) cos(pi*2*x) + sin(pi*x)f2 <- function(x) x+exp(-30*(x-0.5)^2)   y  <- ifelse(-3.55 + 2*x1 + f1(x2) + rnorm(n) > 0, 1, 0)dataSim <- data.frame(y, x1, x2, x3)out1 <- gamlss(list(y ~ x1 + x2 + x3), family = "GEVlink", data = dataSim)out2 <- gamlss(list(y ~ x1 + s(x2) + s(x3)), family = "GEVlink", data = dataSim)conv.check(out1)conv.check(out2)summary(out1)summary(out2)AIC(out1, out2)BIC(out1, out2)plot(out2, eq = 1, all.terms = TRUE, pages = 1, seWithMean = TRUE)################### prediction of Pr################### Calculate eta (that is, X*model.coef)# For a new data set the argument newdata should be usedeta <- predict(out2, eq = 1, type = "link")# extract gev tail parametergev.par <- out2$gev.par# multiply gev tail parameter by etagevpeta <- gev.par*eta   # establish for which values the model is defined   gevpetaIND <- ifelse(gevpeta < -1, FALSE, TRUE) gevpeta <- gevpeta[gevpetaIND]    # estimate probabilities  pr <- exp(-(1 + gevpeta)^(-1/gev.par))##################################### Flexible survival model examples##################################### Simulate proportional hazards data ##set.seed(0)n  <- 2000c  <- runif(n, 3, 8)u  <- runif(n, 0, 1)z1 <- rbinom(n, 1, 0.5)z2 <- runif(n, 0, 1)t  <- rep(NA, n)beta_0 <- -0.2357beta_1 <- 1f <- function(t, beta_0, beta_1, u, z1, z2){   S_0 <- 0.7 * exp(-0.03*t^1.9) + 0.3*exp(-0.3*t^2.5)  exp(-exp(log(-log(S_0))+beta_0*z1 + beta_1*z2))-u}for (i in 1:n){   t[i] <- uniroot(f, c(0, 8), tol = .Machine$double.eps^0.5,                    beta_0 = beta_0, beta_1 = beta_1, u = u[i],                    z1 = z1[i], z2 = z2[i], extendInt = "yes" )$root}delta   <- ifelse(t < c, 1, 0)u       <- apply(cbind(t, c), 1, min)dataSim <- data.frame(u, delta, z1, z2)1-mean(delta) # average censoring rate# log(u) helps obtaining smoother hazardsout <- gamlss(list(u ~ s(log(u), bs = "mpi") + z1 + s(z2) ), data = dataSim,               family = "-cloglog", cens = delta)res.check(out)summary(out)AIC(out)BIC(out)plot(out, eq = 1, scale = 0, pages = 1)haz.surv(out, newdata = data.frame(z1 = 0, z2 = 0), shade = TRUE,         n.sim = 1000, baseline = TRUE)haz.surv(out, type = "haz", newdata = data.frame(z1 = 0, z2 = 0),         shade = TRUE, n.sim = 1000, baseline = TRUE)# library(mgcv)# out1 <- mgcv::gam(u ~ z1 + s(z2), family = cox.ph(), #                   data = dataSim, weights = delta)# summary(out1)# estimates of z1 and s(z2) are# nearly identical between out and out1 ####################################### Simulate proportional odds data #######################################set.seed(0)n <- 2000c <- runif(n, 4, 8)u <- runif(n, 0, 1)z <- rbinom(n, 1, 0.5)beta_0 <- -1.05t <- rep(NA, n)f <- function(t, beta_0, u, z){   S_0 <- 0.7 * exp(-0.03*t^1.9) + 0.3*exp(-0.3*t^2.5)  1/(1 + exp(log((1-S_0)/S_0)+beta_0*z))-u}for (i in 1:n){    t[i] <- uniroot(f, c(0, 8), tol = .Machine$double.eps^0.5,                     beta_0 = beta_0, u = u[i], z = z[i],                     extendInt="yes" )$root}delta   <- ifelse(t < c,1, 0)u       <- apply(cbind(t, c), 1, min)dataSim <- data.frame(u, delta, z)1-mean(delta) # average censoring rateout <- gamlss(list(u ~ s(log(u), bs = "mpi") + z ), data = dataSim,               family = "-logit", cens = delta)res.check(out)summary(out)AIC(out)BIC(out)plot(out, eq = 1, scale = 0)haz.surv(out, newdata = data.frame(z = 0), shade = TRUE, n.sim = 1000,        baseline = TRUE)haz.surv(out, type = "haz", newdata = data.frame(z = 0),         shade = TRUE, n.sim = 1000)                                       ############################### Mixed censoring example ###############################                          f1 <- function(t, u, z1, z2, z3, z4, s1, s2){     S_0 <- 0.7 * exp(-0.03*t^1.8) + 0.3*exp(-0.3*t^2.5)       exp( -exp(log(-log(S_0)) + 1.3*z1 + 0.5*z2 + s1(z3) + s2(z4)  ) ) - u                 }    datagen <- function(n, z1, z2, z3, z4, s1, s2, f1){    u <- runif(n, 0, 1)  t <- rep(NA, n)    for (i in 1:n) t[i] <- uniroot(f1, c(0, 100), tol = .Machine$double.eps^0.5,                                  u = u[i], s1 = s1, s2 = s2, z1 = z1[i], z2 = z2[i],                                  z3 = z3[i], z4 = z4[i], extendInt = "yes")$root   c1 <-      runif(n, 0, 2)  c2 <- c1 + runif(n, 0, 6)     df <- data.frame(u1 = t, u2 = t, cens = character(n), stringsAsFactors = FALSE)for (i in 1:n){  if(t[i] <= c1[i]) {        df[i, 1] <- c1[i]        df[i, 2] <- NA        df[i, 3] <- "L"         }else if(c1[i] < t[i] && t[i] <= c2[i]){        df[i, 1] <- c1[i]        df[i, 2] <- c2[i]        df[i, 3] <- "I"          }else if(t[i] > c2[i]){        df[i, 1] <- c2[i]        df[i, 2] <- NA        df[i, 3] <- "R"}}uncens <- (df[, 3] %in% c("L", "I")) + (rbinom(n, 1, 0.2) == 1) == 2 df[uncens, 1] <- t[uncens]df[uncens, 2] <- NAdf[uncens, 3] <- "U"dataSim <- data.frame(u1 = df$u1, u2 = df$u2, cens = as.factor(df$cens), z1, z2, z3, z4, t)dataSim  }set.seed(0)n      <- 1000SigmaC <- matrix(0.5, 4, 4); diag(SigmaC) <- 1cov    <- rMVN(n, rep(0,4), SigmaC)cov    <- pnorm(cov)z1     <- round(cov[, 1])z2     <- round(cov[, 2])z3     <- cov[, 3]z4     <- cov[, 4]s1     <- function(x) -0.075*exp(3.2 * x) s2     <- function(x) sin(2*pi*x)  eq1    <- u1 ~ s(log(u1), bs = "mpi") + z1 + z2 + s(z3) + s(z4)dataSim <- datagen(n, z1, z2, z3, z4, s1, s2, f1)out <- gamlss(list(eq1), data = dataSim, family = "-cloglog",               cens = cens, type.cen = "mixed", ub.t = "u2")conv.check(out)summary(out)plot(out, eq = 1, scale = 0, pages = 1)               ndf <- data.frame(z1 = 1, z2 = 0, z3 = 0.2, z4 = 0.5)haz.surv(out, eq = 1, newdata = ndf, type = "surv")haz.surv(out, eq = 1, newdata = ndf, type = "haz", n.sim = 1000)         ## End(Not run)

Fitted gamlssObject object

Description

A fitted gamlss object returned by functiongamlss and of class "gamlss" and "SemiParBIV".

Value

fit

List of values and diagnostics extracted from the output of the algorithm.

gam1,gam2,gam3

Univariate starting values' fits.

coefficients

The coefficients of the fitted model.

weights

Prior weights used during model fitting.

sp

Estimated smoothing parameters of the smooth components.

iter.sp

Number of iterations performed for the smoothing parameter estimation step.

iter.if

Number of iterations performed in the initial step of the algorithm.

iter.inner

Number of iterations performed within the smoothing parameter estimation step.

n

Sample size.

X1,X2,X3,...

Design matrices associated with the linear predictors.

X1.d2,X2.d2,X3.d2,...

Number of columns ofX1,X2,X3, etc.

l.sp1,l.sp2,l.sp3,...

Number of smooth components in the equations.

He

Penalized -hessian/Fisher. This is the same asHeSh for unpenalized models.

HeSh

Unpenalized -hessian/Fisher.

Vb

Inverse ofHe. This corresponds to the Bayesian variance-covariance matrix used for confidence/credible interval calculations.

F

This is obtained multiplying Vb by HeSh.

t.edf

Total degrees of freedom of the estimated bivariate model. It is calculated assum(diag(F)).

edf1,edf2,edf3,...

Degrees of freedom for the model's equations.

wor.c

Working model quantities.

eta1,eta2,eta3,...

Estimated linear predictors.

y1

Response.

logLik

Value of the (unpenalized) log-likelihood evaluated at the (penalized or unpenalized) parameter estimates.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gamlss,summary.gamlss


ggmtrust for penalised network

Description

penalised network, work in progress.

Usage

ggmtrust(s, n, data = NULL, lambda = 1, pen = "lasso", params = NULL, method = "BHHH",          w.alasso = NULL, gamma = 1, a = 3.7)

Arguments

s

Sample covariance matrix.

n

Sample size.

data

Data.

lambda

Regularisation parameter.

pen

Either "lasso" or "ridge".

params

If different from null then these are taken as the starting values.

method

Either "H" or "BHHH".

w.alasso

weight for alasso.

gamma

alasso param.

a

scad param.

Details

penalised network, work in progress.

Value

The function returns an object of classggmtrust.


Generalised Joint Regression Models with Binary/Continuous/Discrete/Survival Margins

Description

gjrm fits flexible joint models with binary/continuous/discrete/survival margins, with several types of covariate effects, copula and marginal distributions.

Usage

gjrm(formula, data = list(), weights = NULL, subset = NULL,       offset1 = NULL, offset2 = NULL,     copula = "N", copula2 = "N", margins, model, dof = 3, dof2 = 3,       cens1 = NULL, cens2 = NULL, cens3 = NULL, dep.cens = FALSE,     ub.t1 = NULL, ub.t2 = NULL, left.trunc1 = 0, left.trunc2 = 0,       uni.fit = FALSE, fp = FALSE, infl.fac = 1,      rinit = 1, rmax = 100, iterlimsp = 50, tolsp = 1e-07,     gc.l = FALSE, parscale, knots = NULL,     penCor = "unpen", sp.penCor = 3,      Chol = FALSE, gamma = 1, w.alasso = NULL,     drop.unused.levels = TRUE,      min.dn = 1e-40, min.pr = 1e-16, max.pr = 0.999999,      h.margins = FALSE)

Arguments

formula

In the basic setup this will be a list of two (or three) formulas, one for equation 1, the other for equation 2 and another onefor equation 3 if a trivariate model is fitted to the data. Otherwise, more equations can be used depending on the number of distributional parameters.s terms are used to specify smooth functions of predictors; see the documentation ofmgcv for further details on formula specifications. Note that if a selection model is employed (that is,model = "BSS"ormodel = "TSS") then the first formula (and the second as well for trivariate models) MUST refer to the selection equation(s).When one outcome is binary and the other continuous/discrete then the first equation should refer to the binary outcome whereas the second to the continuous/discrete one. When one outcome is discrete and the other continuous then the first equation has to be the discrete one.

data

A data frame.

weights

Optional vector of prior weights to be used in fitting.

subset

Optional vector specifying a subset of observations to be used in the fitting process.

offset1,offset2

They can be used to supply model offsets for use in fitting. These have been introduced for dealing with offsets in the case of discrete marginal distributions.

copula

Type of bivariate error distribution employed. Possible choices are "N", "C0", "C90", "C180", "C270", "GAL0", "GAL90", "GAL180", "GAL270", "J0", "J90", "J180", "J270", "G0", "G90", "G180", "G270", "F", "AMH", "FGM", "T", "PL", "HO" which stand for bivariate normal, Clayton, rotated Clayton (90 degrees), survival Clayton,rotated Clayton (270 degrees), Galambos, rotated Galambos (90 degrees), survival Galambos,rotated Galambos (270 degrees), Joe, rotated Joe (90 degrees), survival Joe, rotated Joe (270 degrees),Gumbel, rotated Gumbel (90 degrees), survival Gumbel, rotated Gumbel (270 degrees), Frank, Ali-Mikhail-Haq,Farlie-Gumbel-Morgenstern, Student-t withdof, Plackett, Hougaard. Each of the Clayton, Galambos, Joe and Gumbel copulae is allowed to be mixed with a rotated version of the samefamily. The options are: "C0C90", "C0C270", "C180C90", "C180C270", "GAL0GAL90", "GAL0GAL270", "GAL180GAL90", "GAL180GAL270", "G0G90", "G0G270", "G180G90","G180G270", "J0J90", "J0J270", "J180J90" and "J180J270". This allows the user to model negative and positive tail dependencies.

copula2

As above but used only for Roy models.

margins

It indicates the distributions used for margins. Possible distributions are Gaussian ("N"), truncated Gaussian ("tN"), Tweedie ("TW"), log-normal ("LN"), Gumbel ("GU"), reverse Gumbel ("rGU"), logistic ("LO"), Weibull ("WEI"), Inverse Gaussian ("IG"), gamma ("GA"), Dagum ("DAGUM"), Singh-Maddala ("SM"), beta ("BE"), Fisk ("FISK", also known as log-logistic distribution), Poisson ("P"), truncated Poisson ("tP"), negative binomial - type I ("NBI"), negative binomial - type II ("NBII"), Poisson inverse Gaussian ("PIG"), truncated negative binomial - type I ("tNBI"), truncated negative binomial - type II ("tNBII"), truncated Poisson inverse Gaussian ("tPIG"). If the responses are binary thenpossible link functions are "probit", "logit", "cloglog". For survival models, the margins can be "-cloglog" (similar to generalised proportional hazards), "-logit" (similar to generalised proportional odds), "-probit" (generalised probit). For ordinal marginals, the choices are "ord.probit" and"ord.logit". For extreme value models, there are also options we are working on, which are already implemented in the univariate gamlss() function. These are the generelised Pareto ("GP"), generelised Pareto II ("GPII") where the shape parameter is forced to be > -0.5,generelised Pareto (with orthogonal parametrisation) ("GPo") where the shape parameter is forced to be > -0.5,discrete generelised Pareto ("DGP"), discrete generelised Pareto II ("DGPII") where the shape parameter is forced to be positive, discrete generelised Pareto derivedunder the scenario in which shape = 0 ("DGP0"). Regarding the Tweedie, this margin can currently only be used together with a binary margin; weare working on the discrete/continuous margin extension.

model

Possible values are "B" (bivariate model), "T" (trivariate model), "BSS" (bivariate model with non-random sample selection), "TSS" (trivariate model with double non-random sample selection),"TESS" (trivariate model with endogeneity and non-random sample selection),"BPO" (bivariate model with partial observability)and "BPO0" (bivariate model with partial observability and zero correlation).Options "T", "TESS" and "TSS" are currently for trivariate binary models only. "BPO" and "BPO0" are for bivariate binary models only. "SWITCH" is for the switching regression model.

dof

Ifcopula = "T" then the degrees of freedom can be set to a value greater than 2 and smaller than 249. Only for continuous margins,this will be taken as a starting value and the dof estimated from the data.

dof2

As above but used only for Roy models.

cens1

Censoring indicator for the first equation. For the case of right censored data only, this variable can be equal to 1 if the event occurred and 0 otherwise. However, if there are several censoring mechanisms thencens will have to be specified as a factor variable made up of four possible categories:I for interval,L for left,R for right, andU for uncensored.

cens2

Same as above but for the second equation.

cens3

Binary censoring indicator employed only whendep.cens = TRUE and administrative censoring is present.

dep.cens

If TRUE then the dependence censored model is employed.

ub.t1,ub.t2

Variable names of right/upper bounds when interval censoring is present.

left.trunc1,left.trunc2

Values of truncation at left. Currently done for count distributions only.

uni.fit

Ifuni.fit = TRUE then gamlss or gam univariate models are also fitted. This is useful for obtaining starting values, for instance.

fp

IfTRUE then a fully parametric model with unpenalised regression splines if fitted. See the Example 2 below.

infl.fac

Inflation factor for the model degrees of freedom in the approximate AIC. Smoother models can be obtained setting this parameter to a value greater than 1.

rinit

Starting trust region radius. The trust region radius is adjusted as the algorithm proceeds. See the documentationoftrust for further details.

rmax

Maximum allowed trust region radius. This may be set very large. If set small, the algorithm traces a steepest descent path.

iterlimsp

A positive integer specifying the maximum number of loops to be performed before the smoothing parameter estimation step is terminated.

tolsp

Tolerance to use in judging convergence of the algorithm when automatic smoothing parameter estimation is used.

gc.l

This is relevant when working with big datasets. IfTRUE then the garbage collector is called more often than it is usually done. This keeps the memory footprint down but it will slow down the routine.

parscale

The algorithm will operate as if optimizing objfun(x / parscale, ...) where parscale is a scalar. If missing then no rescaling is done. See the documentation oftrust for more details.

knots

Optional list containing user specified knot values to be used for basis construction.

penCor

This and the arguments below are only for trivariate binary models. Type of penalty for correlation coefficients. Possible values are "unpen", "lasso", "ridge", "alasso".

sp.penCor

Starting value for smoothing parameter ofpenCor.

Chol

IfTRUE then the Cholesky method instead of the eigenvalue method is employed for the correlation matrix.

gamma

Inflation factor used only for the alasso penalty.

w.alasso

When using the alasso penalty a weight vector made up of three values must be provided.

drop.unused.levels

By default unused levels are dropped from factors before fitting. For some smooths involving factor variables this may have to be turned off (only use if you know what you are doing).

min.dn,min.pr,max.pr

These values are used to set, depending on the model used for modelling, the minimum and maximum allowed for the densities and probabilities; recall that the margins of copula models have to be in the range (0,1). Theseparameters are employed to avoid potential overflows/underflows in the calculations and the default values seem to offer a good compromise. Functionconv.check() provides some relevant diagnostic information which can be used, for example, to check whether the lower bounds ofmin.dn andmin.pr have been reached. So based on this or if the user wishes to do some sensitivity analysis then this can be easily carried out using these three arguments.However, the user has to be cautious. For instance, it would not make much sense to choose formin.dn andmin.pr values bigger than the default ones. Bear in mind that the bounds can be reached for ill-defined models. For certain distributions/models, if convergence failure occurs and the bounds have been reached then the usercan try a sensitivity analysis as mentioned above.

h.margins

IfTRUE then the marginal distributions are treated as copula h-functions. This is still experimental.

Details

The joint models considered by this function consist of two or three model equations which depend on flexible linear predictors andwhose dependence between the responses is modelled through one or more parameters of a chosen multivariate distribution. The additive predictors of the equations are flexibly specified using parametric components and smooth functions of covariates. The same can be done for the dependence parameter(s) if it makes sense.Estimation is achieved within a penalized likelihood framework with integrated automatic multiple smoothing parameter selection. The use of penalty matrices allows for the suppression of that part of smooth term complexity which has no support from the data. The trade-off between smoothness and fitness is controlled by smoothing parameters associated with the penalty matrices. Smoothing parameters are chosen to minimise an approximate AIC.

For sample selection models, if there are factors in the model then before fitting the user has to ensure that the numbers of factor variables' levels in the selected sample are the same as those in the complete dataset. Even if a model could be fitted in such a situation,the model may produce fits which are not coherent with the nature of the correction sought. As an example consider the situation in which the complete dataset contains a factor variable with five levels and that only three of them appear in the selected sample. For the outcome equation (which is the one of interest) only three levels of such variable exist in the population, but their effects will be corrected for non-random selection using a selection equation in which five levels exist instead. Having differing numbers of factors' levels between complete and selected samples will also make prediction not feasible (an aspect which may be particularly important for selection models);clearly it is not possible to predict the response of interest for the missing entries using a dataset that contains all levels of a factor variable but using an outcome model estimated using a subset of these levels.

There are many continuous/discrete/survival distributions and copula functions to choose from and we plan to include more options. Get in touch if you are interested in a particular distribution.

Value

The function returns an object of classgjrm as described ingjrmObject.

WARNINGS

Convergence can be checked usingconv.check which provides some information about the score and information matrix associated with the fitted model. The former should be close to 0 and the latter positive definite.gjrm() will produce some warnings if there is a convergence issue.

Convergence failure may sometimes occur. This is not necessarily a bad thing as it may indicate specific problems with a fitted model.In such a situation, the user may use rescaling (seeparscale). Usinguni.fit = TRUE is typically more effective than the first two options asthis will provide better calibrated starting values as compared to those obtained from the default starting value procedure.The default option is, however,uni.fit = FALSE only because it tends to be computationally cheaper and because the default procedure has typically been found to do a satisfactory job in most cases. (The results obtained when usinguni.fit = FALSE anduni.fit = TRUE could also be compared to check if starting values make any difference.)

The above suggestions may help, especially the latter option. However, the user should also considerre-specifying/simplifying the model, and/or using a diferrent dependence structure and/or checking that the chosen marginal distributions fit the responses well.In our experience, we found that convergence failure typically occurs when the model has been misspecified and/or the sample size is low compared to the complexity of the model. Examplesof misspecification include using a Clayton copula rotated by 90 degrees when a positiveassociation between the margins is present instead, using marginal distributions that do not fitthe responses, and employing a copula which does not accommodate the type and/or strength ofthe dependence between the margins (e.g., using AMH when the association between the margins is strong). When using smooth functions, if the covariate's values are too sparse then convergence may be affected by this.It is also worth bearing in mind that the use of three parameter marginal distributions requires the datato be more informative than a situation in which two parameter distributions are used instead.

In the contexts of endogeneity and non-random sample selection, extra attention is required when specifyingthe dependence parameter as a function of covariates. This is because in these situations the dependence parameter mainly models the association between the unobserved confounders in the two equations. Therefore, this option would make sense when it is believed that the strength of the association between the unobservables in the two equations varies based on some grouping factor or across geographical areas, for instance. In any case, a clear rationale is typically needed in such cases.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

See help("GJRM-package").

See Also

adjCov,vuong.test,clarke.test,GJRM-package,gjrmObject,conv.check,summary.gjrm

Examples

library(GJRM)##################################### JOINT MODELS WITH BINARY MARGINS ###################################################### EXAMPLE 1 ##set.seed(0)n <- 400Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)x1 <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2)   y1 <- ifelse(-1.55 + 2*x1    + f1(x2) + u[,1] > 0, 1, 0)y2 <- ifelse(-0.25 - 1.25*x1 + f2(x2) + u[,2] > 0, 1, 0)dataSim <- data.frame(y1, y2, x1, x2, x3)## CLASSIC BIVARIATE PROBITout  <- gjrm(list(y1 ~ x1 + x2 + x3,                        y2 ~ x1 + x2 + x3),                        data = dataSim,                        margins = c("probit", "probit"),                       model = "B")conv.check(out)summary(out)AIC(out)BIC(out)## Not run:  ## BIVARIATE PROBIT with Splinesout  <- gjrm(list(y1 ~ x1 + s(x2) + s(x3),                   y2 ~ x1 + s(x2) + s(x3)),                    data = dataSim,                  margins = c("probit", "probit"),                  model = "B")conv.check(out)summary(out)AIC(out)## estimated smooth function plotsplot(out, eq = 1, pages = 1, seWithMean = TRUE, scale = 0)plot(out, eq = 2, pages = 1, seWithMean = TRUE, scale = 0)## BIVARIATE PROBIT with Splines and ## varying dependence parametereq.mu.1  <- y1 ~ x1 + s(x2)eq.mu.2  <- y2 ~ x1 + s(x2)eq.theta <-    ~ x1 + s(x2)fl <- list(eq.mu.1, eq.mu.2, eq.theta)outD <- gjrm(fl, data = dataSim,             margins = c("probit", "probit"),             model = "B")             conv.check(outD)  summary(outD)summary(outD$theta)plot(outD, eq = 1, seWithMean = TRUE)plot(outD, eq = 2, seWithMean = TRUE)plot(outD, eq = 3, seWithMean = TRUE)graphics.off()################# EXAMPLE 2 #### Generate data with one endogenous variable ## and exclusion restriction (or instrument)set.seed(0)n <- 400Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)cov   <- rMVN(n, rep(0,2), Sigma)cov   <- pnorm(cov)x1 <- round(cov[,1]); x2 <- cov[,2]f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2)   y1 <- ifelse(-1.55 + 2*x1    + f1(x2) + u[,1] > 0, 1, 0)y2 <- ifelse(-0.25 - 1.25*y1 + f2(x2) + u[,2] > 0, 1, 0)dataSim <- data.frame(y1, y2, x1, x2)### Testing the hypothesis of absence of endogeneity... #LM.bpm(list(y1 ~ x1 + s(x2), y2 ~ y1 + s(x2)), dataSim, model = "B")## CLASSIC RECURSIVE BIVARIATE PROBITout <- gjrm(list(y1 ~ x1 + x2,                  y2 ~ y1 + x2),                  data = dataSim,                 margins = c("probit", "probit"), model = "B")conv.check(out)                        summary(out)## FLEXIBLE RECURSIVE BIVARIATE PROBITout <- gjrm(list(y1 ~ x1 + s(x2),                  y2 ~ y1 + s(x2)),                  data = dataSim,                 margins = c("probit", "probit"),                 model = "B")conv.check(out)                        summary(out)### Testing the hypothesis of absence of endogeneity post estimation... gt.bpm(out)### Causal effects## average treatment effect, risk ratio and odds ratio with CIs mb(y1, y2, model = "B")ATE(out, trt = "y1") RR(out, trt = "y1") OR(out, trt = "y1") ATE(out, trt = "y1", joint = FALSE) ## try a Clayton copula model... outC <- gjrm(list(y1 ~ x1 + s(x2),                   y2 ~ y1 + s(x2)),                   data = dataSim, copula = "C0",                  margins = c("probit", "probit"),                  model = "B")conv.check(outC)                         summary(outC)ATE(outC, trt = "y1") ## try a Joe copula model... outJ <- gjrm(list(y1 ~ x1 + s(x2),                   y2 ~ y1 + s(x2)),                   data = dataSim, copula = "J0",                  margins = c("probit", "probit"),                  model = "B")conv.check(outJ)summary(outJ)ATE(outJ, "y1") vuong.test(out, outJ)clarke.test(out, outJ)### recursive bivariate probit modelling with unpenalized splines ## can be achieved as followsoutFP <- gjrm(list(y1 ~ x1 + s(x2, bs = "cr", k = 5),                    y2 ~ y1 + s(x2, bs = "cr", k = 6)),                    fp = TRUE, data = dataSim,                   margins = c("probit", "probit"),                   model = "B")conv.check(outFP)                            summary(outFP)# in the above examples a third equation could be introduced# as illustrated in Example 1################# EXAMPLE 3 #### Generate data with a non-random sample selection mechanism ## and exclusion restrictionset.seed(0)n <- 2000Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)SigmaC <- matrix(0.5, 3, 3); diag(SigmaC) <- 1cov    <- rMVN(n, rep(0,3), SigmaC)cov    <- pnorm(cov)bi <- round(cov[,1]); x1 <- cov[,2]; x2 <- cov[,3]  f11 <- function(x) -0.7*(4*x + 2.5*x^2 + 0.7*sin(5*x) + cos(7.5*x))f12 <- function(x) -0.4*( -0.3 - 1.6*x + sin(5*x))  f21 <- function(x) 0.6*(exp(x) + sin(2.9*x)) ys <-  0.58 + 2.5*bi + f11(x1) + f12(x2) + u[, 1] > 0y  <- -0.68 - 1.5*bi + f21(x1) +         + u[, 2] > 0yo <- y*(ys > 0)  dataSim <- data.frame(y, ys, yo, bi, x1, x2)### Testing the hypothesis of absence of non-random sample selection... LM.bpm(list(ys ~ bi + s(x1) + s(x2), yo ~ bi + s(x1)), dataSim, model = "BSS")# p-value suggests presence of sample selection### SEMIPARAMETRIC SAMPLE SELECTION BIVARIATE PROBIT## the first equation MUST be the selection equationout <- gjrm(list(ys ~ bi + s(x1) + s(x2),                  yo ~ bi + s(x1)),                  data = dataSim, model = "BSS",                 margins = c("probit", "probit"))conv.check(out)                          gt.bpm(out)                        ## compare the two summary outputs below## the second output produces a summary of the results obtained when## selection bias is not accounted forsummary(out)summary(out$gam2)## corrected predicted probability that 'yo' is equal to 1mb(ys, yo, model = "BSS")prev(out)prev(out, joint = FALSE)## estimated smooth function plots## the red line is the true curve## the blue line is the univariate model curve not accounting for selection biasx1.s <- sort(x1[dataSim$ys>0])f21.x1 <- f21(x1.s)[order(x1.s)]-mean(f21(x1.s))plot(out, eq = 2, ylim = c(-1.65,0.95)); lines(x1.s, f21.x1, col="red")par(new = TRUE)plot(out$gam2, se = FALSE, col = "blue", ylim = c(-1.65,0.95),      ylab = "", rug = FALSE)#### try a Clayton copula model... outC <- gjrm(list(ys ~ bi + s(x1) + s(x2),                   yo ~ bi + s(x1)),                   data = dataSim, model = "BSS", copula = "C0",                  margins = c("probit", "probit"))conv.check(outC)summary(outC)prev(outC)################# EXAMPLE 4 #### Generate data with partial observabilityset.seed(0)n <- 1000Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)x1 <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)y1 <- ifelse(-1.55 + 2*x1 + x2 + u[,1] > 0, 1, 0)y2 <- ifelse( 0.45 - x3        + u[,2] > 0, 1, 0)y  <- y1*y2dataSim <- data.frame(y, x1, x2, x3)## BIVARIATE PROBIT with Partial Observabilityout  <- gjrm(list(y ~ x1 + x2,                   y ~ x3),                   data = dataSim, model = "BPO",                  margins = c("probit", "probit"))conv.check(out)summary(out)# first ten estimated probabilities for the four events from object outcbind(out$p11, out$p10, out$p00, out$p01)[1:10,]# case with smooth function f1 <- function(x) cos(pi*2*x) + sin(pi*x)y1 <- ifelse(-1.55 + 2*x1 + f1(x2) + u[,1] > 0, 1, 0)y2 <- ifelse( 0.45 - x3            + u[,2] > 0, 1, 0)y  <- y1*y2dataSim <- data.frame(y, x1, x2, x3)out  <- gjrm(list(y ~ x1 + s(x2),                   y ~ x3),                   data = dataSim, model = "BPO",                  margins = c("probit", "probit"))conv.check(out)summary(out)plot(out, eq = 1, scale = 0)####################################################### JOINT MODELS WITH BINARY AND/OR CONTINUOUS MARGINS ######################################################################## EXAMPLE 5 #### Generate data## Correlation between the two equations 0.5 - Sample size 400 set.seed(0)n <- 400Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)x1 <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2)   y1 <- -1.55 + 2*x1    + f1(x2) + u[,1]y2 <- -0.25 - 1.25*x1 + f2(x2) + u[,2]dataSim <- data.frame(y1, y2, x1, x2, x3)resp.check(y1, "N")resp.check(y2, "N")eq.mu.1     <- y1 ~ x1 + s(x2) + s(x3)eq.mu.2     <- y2 ~ x1 + s(x2) + s(x3)eq.sigma1   <-    ~ 1eq.sigma2   <-    ~ 1eq.theta    <-    ~ x1fl <- list(eq.mu.1, eq.mu.2, eq.sigma1, eq.sigma2, eq.theta)# the order above is the one to follow when# using more than two equationsout  <- gjrm(fl, data = dataSim, margins = c("N", "N"),             model = "B")conv.check(out)res.check(out)summary(out)AIC(out)BIC(out)nd <- data.frame(x1 = 1, x2 = 0.4, x3 = 0.6)copula.prob(out, y1 = 1.4, y2 = 2.3, newdata = nd, intervals = TRUE)################# EXAMPLE 6 #### Generate data with one endogenous binary variable ## and continuous outcomeset.seed(0)n <- 1000Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)cov   <- rMVN(n, rep(0,2), Sigma)cov   <- pnorm(cov)x1 <- round(cov[,1]); x2 <- cov[,2]f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2)   y1 <- ifelse(-1.55 + 2*x1    + f1(x2) + u[,1] > 0, 1, 0)y2 <-        -0.25 - 1.25*y1 + f2(x2) + u[,2] dataSim <- data.frame(y1, y2, x1, x2)## RECURSIVE Modelout <- gjrm(list(y1 ~ x1 + x2,                  y2 ~ y1 + x2),                  data = dataSim, margins = c("probit","N"),                 model = "B")conv.check(out)                        summary(out)res.check(out)## SEMIPARAMETRIC RECURSIVE Modeleq.mu.1   <- y1 ~ x1 + s(x2) eq.mu.2   <- y2 ~ y1 + s(x2)eq.sigma  <-    ~ 1eq.theta  <-    ~ 1fl <- list(eq.mu.1, eq.mu.2, eq.sigma, eq.theta)out <- gjrm(fl, data = dataSim,             margins = c("probit","N"), uni.fit = TRUE,            model = "B")conv.check(out)                        summary(out)res.check(out)ATE(out, trt = "y1")ATE(out, trt = "y1", joint = FALSE)################### EXAMPLE 7 #### Generate data with one endogenous continuous exposure ## and binary outcomeset.seed(0)n <- 1000Sigma <- matrix(0.5, 2, 2); diag(Sigma) <- 1u     <- rMVN(n, rep(0,2), Sigma)cov   <- rMVN(n, rep(0,2), Sigma)cov   <- pnorm(cov)x1 <- round(cov[,1]); x2 <- cov[,2]f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2) y1 <-        -0.25 - 2*x1    + f2(x2) + u[,2] y2 <- ifelse(-0.25 - 0.25*y1 + f1(x2) + u[,1] > 0, 1, 0)dataSim <- data.frame(y1, y2, x1, x2)eq.mu.1   <- y2 ~ y1 + s(x2) eq.mu.2   <- y1 ~ x1 + s(x2)eq.sigma  <-    ~ 1eq.theta  <-    ~ 1fl <- list(eq.mu.1, eq.mu.2, eq.sigma, eq.theta)out <- gjrm(fl, data = dataSim,             margins = c("probit","N"),            model = "B")conv.check(out)                        summary(out)res.check(out)ATE(out, trt = "y1", trt.val = 1)ATE(out, trt = "y1", trt.val = 1, joint = FALSE) RR(out, trt = "y1", trt.val = 1) RR(out, trt = "y1", trt.val = 1, joint = FALSE) OR(out, trt = "y1", trt.val = 1) OR(out, trt = "y1", trt.val = 1, joint = FALSE)######################### EXAMPLE 8       #### SURVIVAL MODELS ##set.seed(0)n  <- 2000c  <- runif(n, 3, 8)u  <- runif(n, 0, 1)z1 <- rbinom(n, 1, 0.5)z2 <- runif(n, 0, 1)t  <- rep(NA, n)beta_0 <- -0.2357beta_1 <- 1f <- function(t, beta_0, beta_1, u, z1, z2){   S_0 <- 0.7 * exp(-0.03*t^1.9) + 0.3*exp(-0.3*t^2.5)  exp(-exp(log(-log(S_0))+beta_0*z1 + beta_1*z2))-u}for (i in 1:n){   t[i] <- uniroot(f, c(0, 8), tol = .Machine$double.eps^0.5,                    beta_0 = beta_0, beta_1 = beta_1, u = u[i],                    z1 = z1[i], z2 = z2[i], extendInt = "yes" )$root}delta1  <- ifelse(t < c, 1, 0)u1      <- apply(cbind(t, c), 1, min)dataSim <- data.frame(u1, delta1, z1, z2)c <- runif(n, 4, 8)u <- runif(n, 0, 1)z <- rbinom(n, 1, 0.5)beta_0 <- -1.05t      <- rep(NA, n)f <- function(t, beta_0, u, z){   S_0 <- 0.7 * exp(-0.03*t^1.9) + 0.3*exp(-0.3*t^2.5)  1/(1 + exp(log((1-S_0)/S_0)+beta_0*z))-u}for (i in 1:n){    t[i] <- uniroot(f, c(0, 8), tol = .Machine$double.eps^0.5,                     beta_0 = beta_0, u = u[i], z = z[i],                     extendInt="yes" )$root}delta2 <- ifelse(t < c,1, 0)u2     <- apply(cbind(t, c), 1, min)dataSim$delta2 <- delta2dataSim$u2     <- u2dataSim$z      <- zeq1 <- u1 ~ s(log(u1), bs = "mpi") + z1 + s(z2)eq2 <- u2 ~ s(log(u2), bs = "mpi") + z eq3 <-    ~ s(z2)out <- gjrm(list(eq1, eq2), data = dataSim,            margins = c("-cloglog", "-logit"),             cens1 = delta1, cens2 = delta2, model = "B")                 # PH margin fit can also be compared with cox.ph from mgcv                 conv.check(out)res <- res.check(out)## martingale residualsmr1 <- out$cens1 - res$qr1mr2 <- out$cens2 - res$qr2# can be plotted against covariates# obs index, survival time, rank order of# surv times# to determine func form, one may use# res from null model against covariate# to test for PH, use:# library(survival)# fit <- coxph(Surv(u1, delta1) ~ z1 + z2, data = dataSim) # temp <- cox.zph(fit) # print(temp)                  # plot(temp, resid = FALSE)     summary(out)AIC(out); BIC(out)plot(out, eq = 1, scale = 0, pages = 1)plot(out, eq = 2, scale = 0, pages = 1)haz.surv(out, eq = 1, newdata = data.frame(z1 = 0, z2 = 0),         shade = TRUE, n.sim = 100, baseline = TRUE)haz.surv(out, eq = 1, newdata = data.frame(z1 = 0, z2 = 0),         shade = TRUE, n.sim = 100, type = "haz", baseline = TRUE,              intervals = FALSE)haz.surv(out, eq = 2, newdata = data.frame(z = 0),         shade = FALSE, n.sim = 100, baseline = TRUE)haz.surv(out, eq = 2, newdata = data.frame(z = 0),         shade = TRUE, n.sim = 100, type = "haz", baseline = TRUE)  newd0 <- newd1 <- data.frame(z = 0, z1 = mean(dataSim$z1),                              z2 = mean(dataSim$z2),                              u1 =  mean(dataSim$u1) + 1,                              u2 =  mean(dataSim$u2) + 1) newd1$z <- 1                   copula.prob(out, newdata = newd0, intervals = TRUE)copula.prob(out, newdata = newd1, intervals = TRUE)out1 <- gjrm(list(eq1, eq2, eq3), data = dataSim,                  margins = c("-cloglog", "-logit"),                   cens1 = delta1, cens2 = delta2, uni.fit = TRUE,                  model = "B")                     ###################################################### Joint continuous and survival outcomes####################################################  # this is complete, just testing, get in touch if interested## eq1 <- z2 ~ z1# eq2 <- u2 ~ s(u2, bs = "mpi") + z  # eq3 <-    ~ s(z2)                  # eq4 <-    ~ s(z2)                  #                   # f.l <- list(eq1, eq2, eq3, eq4)                  #   # out3 <- gjrm(f.l, data = dataSim,#                   margins = c("N", "-logit"), #                   cens1 = NULL, cens2 = delta2, #                   uni.fit = TRUE, model = "B")   # # conv.check(out3)# res.check(out3)# summary(out3)# AIC(out3); BIC(out3)# plot(out3, eq = 2, scale = 0, pages = 1)# plot(out3, eq = 3, scale = 0, pages = 1)  # plot(out3, eq = 4, scale = 0, pages = 1)                  # # newd <- newd1 <- data.frame(z = 0, z1 = mean(dataSim$z1), #                              z2 = mean(dataSim$z2), #                              u2 =  mean(dataSim$u2) + 1) # # copula.prob(out3, y1 = 0.6, newdata = newd, intervals = TRUE)                ########################################### JOINT MODELS WITH THREE BINARY MARGINS ############################################################ EXAMPLE 9 #### Generate data## Correlation between the two equations 0.5 - Sample size 400 set.seed(0)n <- 400Sigma <- matrix(0.5, 3, 3); diag(Sigma) <- 1u     <- rMVN(n, rep(0,3), Sigma)x1 <- round(runif(n)); x2 <- runif(n); x3 <- runif(n)f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2) y1 <- ifelse(-1.55 +    2*x1 - f1(x2) + u[,1] > 0, 1, 0)y2 <- ifelse(-0.25 - 1.25*x1 + f2(x2) + u[,2] > 0, 1, 0)y3 <- ifelse(-0.75 + 0.25*x1          + u[,3] > 0, 1, 0)dataSim <- data.frame(y1, y2, y3, x1, x2)f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1)              margs <- c("probit", "probit", "probit") out  <- gjrm(f.l, data = dataSim, model = "T", margins = margs)out1 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs)conv.check(out)summary(out)plot(out, eq = 1)plot(out, eq = 2)AIC(out)BIC(out)margs <- c("probit","logit","cloglog") out  <- gjrm(f.l, data = dataSim, model = "T", margins = margs)out1 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs)                    conv.check(out)summary(out)plot(out, eq = 1)plot(out, eq = 2)margs <- c("probit", "probit", "probit") f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1,               ~ 1, ~ 1, ~ 1)                out1 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs)   f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1,               ~ 1, ~ s(x2), ~ 1)                out2 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs)   f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1,               ~ x1, ~ s(x2), ~ x1 + s(x2))                out2 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs)   f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1,               ~ x1, ~ x1, ~ s(x2))                out2 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs) f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1,               ~ x1, ~ x1 + x2, ~ s(x2))                out2 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs) f.l <- list(y1 ~ x1 + s(x2),             y2 ~ x1 + s(x2),            y3 ~ x1,               ~ x1 + x2, ~ x1 + x2, ~ x1 + x2)                out2 <- gjrm(f.l, data = dataSim, Chol = TRUE, model = "T", margins = margs)                  nw <- data.frame( x1 = 0, x2 = 0.7 )          copula.prob(out2, 1, 1, 1, newdata = nw, cond = 0, intervals = TRUE, n.sim = 100) # with endogenous variable  f.l <- list(y1 ~ x1 + s(x2),             y2 ~ y1 + x1 + s(x2),            y3 ~ x1)              margs <- c("probit", "probit", "probit") out <- gjrm(f.l, data = dataSim, model = "T", margins = margs)conv.check(out)summary(out)ATE(out, trt = "y1", eq = 2, joint = TRUE)ATE(out, trt = "y1", eq = 2, joint = FALSE) ################## EXAMPLE 10 #### Generate data## with double sample selectionset.seed(0)n <- 5000Sigma <- matrix(c(1,   0.5, 0.4,                  0.5,   1, 0.6,                  0.4, 0.6,   1 ), 3, 3)u <- rMVN(n, rep(0,3), Sigma)f1   <- function(x) cos(pi*2*x) + sin(pi*x)f2   <- function(x) x+exp(-30*(x-0.5)^2) x1 <- runif(n)x2 <- runif(n)x3 <- runif(n)x4 <- runif(n)  y1 <-  1    + 1.5*x1 -     x2 + 0.8*x3 - f1(x4) + u[, 1] > 0y2 <-  1    - 2.5*x1 + 1.2*x2 +     x3          + u[, 2] > 0y3 <-  1.58 + 1.5*x1 - f2(x2)                   + u[, 3] > 0dataSim <- data.frame(y1, y2, y3, x1, x2, x3, x4)f.l <- list(y1 ~ x1 + x2 + x3 + s(x4),             y2 ~ x1 + x2 + x3,             y3 ~ x1 + s(x2))             out <- gjrm(f.l, data = dataSim, model = "TSS",            margins = c("probit", "probit", "probit"))conv.check(out)summary(out)plot(out, eq = 1)plot(out, eq = 3)prev(out)prev(out, joint = FALSE)############### EXAMPLE 11set.seed(0)n <- 2000rh <- 0.5      sigmau <- matrix(c(1, rh, rh, 1), 2, 2)u      <- rMVN(n, rep(0,2), sigmau)sigmac <- matrix(rh, 3, 3); diag(sigmac) <- 1cov    <- rMVN(n, rep(0,3), sigmac)cov    <- pnorm(cov)bi <- round(cov[,1]); x1 <- cov[,2]; x2 <- cov[,3]  f11 <- function(x) -0.7*(4*x + 2.5*x^2 + 0.7*sin(5*x) + cos(7.5*x))f12 <- function(x) -0.4*( -0.3 - 1.6*x + sin(5*x))  f21 <- function(x) 0.6*(exp(x) + sin(2.9*x)) ys <-  0.58 + 2.5*bi + f11(x1) + f12(x2) + u[, 1] > 0y  <- -0.68 - 1.5*bi + f21(x1) +           u[, 2]yo <- y*(ys > 0)  dataSim <- data.frame(ys, yo, bi, x1, x2)## CLASSIC SAMPLE SELECTION MODEL## the first equation MUST be the selection equationresp.check(yo[ys > 0], "N")out <- gjrm(list(ys ~ bi + x1 + x2,                  yo ~ bi + x1),                  data = dataSim, model = "BSS",                 margins = c("probit", "N"))conv.check(out)res.check(out)summary(out)AIC(out)BIC(out)## SEMIPARAMETRIC SAMPLE SELECTION MODELout <- gjrm(list(ys ~ bi + s(x1) + s(x2),                  yo ~ bi + s(x1)),                  data = dataSim, model = "BSS",                 margins = c("probit", "N"))conv.check(out) res.check(out)AIC(out)## compare the two summary outputs## the second output produces a summary of the results obtained when only ## the outcome equation is fitted, i.e. selection bias is not accounted forsummary(out)summary(out$gam2)## estimated smooth function plots## the red line is the true curve## the blue line is the naive curve not accounting for selection biasx1.s <- sort(x1[dataSim$ys>0])f21.x1 <- f21(x1.s)[order(x1.s)] - mean(f21(x1.s))plot(out, eq = 2, ylim = c(-1, 0.8)); lines(x1.s, f21.x1, col = "red")par(new = TRUE)plot(out$gam2, se = FALSE, lty = 3, lwd = 2, ylim = c(-1, 0.8),      ylab = "", rug = FALSE)#### SEMIPARAMETRIC SAMPLE SELECTION MODEL with association ## and dispersion parameters ## depending on covariates as welleq.mu.1   <- ys ~ bi + s(x1) + s(x2)eq.mu.2   <- yo ~ bi + s(x1)eq.sigma  <-    ~ bieq.theta  <-    ~ bi + x1fl <- list(eq.mu.1, eq.mu.2, eq.sigma, eq.theta)out <- gjrm(fl, data = dataSim, model = "BSS",                 margins = c("probit", "N"))conv.check(out)   res.check(out)summary(out)summary(out$sigma)summary(out$theta)nd <- data.frame(bi = 0, x1 = 0.2, x2 = 0.8)copula.prob(out, 0, 0.3, newdata = nd, intervals = TRUE)outC0 <- gjrm(fl, data = dataSim, copula = "C0", model = "BSS",              margins = c("probit", "N"))conv.check(outC0)res.check(outC0)## End(Not run)

Fitted gjrm object

Description

A fitted joint model returned by functiongjrm and of class "gjrm", "SemiParBIV", "SemiParTRIV", etc.

Value

fit

List of values and diagnostics extracted from the output of the algorithm.

gam1

Univariate fit for equation 1. See the documentation ofmgcv for full details.

gam2,gam3,...

Univariate fit for equation 2, equation 3, etc.

coefficients

The coefficients of the fitted model.

weights

Prior weights used during model fitting.

sp

Estimated smoothing parameters of the smooth components.

iter.sp

Number of iterations performed for the smoothing parameter estimation step.

iter.if

Number of iterations performed in the initial step of the algorithm.

iter.inner

Number of iterations performed within the smoothing parameter estimation step.

theta

Estimated dependence parameter linking the two equations.

n

Sample size.

X1,X2,X3,...

Design matrices associated with the linear predictors.

X1.d2,X2.d2,X3.d2,...

Number of columns ofX1,X2,X3, etc.

l.sp1,l.sp2,l.sp3,...

Number of smooth components in the equations.

He

Penalized -hessian/Fisher. This is the same asHeSh for unpenalized models.

HeSh

Unpenalized -hessian/Fisher.

Vb

Inverse ofHe. This corresponds to the Bayesian variance-covariance matrix used for confidence/credible interval calculations.

F

This is obtained multiplying Vb by HeSh.

t.edf

Total degrees of freedom of the estimated bivariate model. It is calculated assum(diag(F)).

edf1,edf2,edf3,...

Degrees of freedom for the two equations of the fitted bivariate model (and for the third and fourthequations if present. They are calculated when splines are used.

bs.mgfit

List of values and diagnostics extracted frommagic inmgcv.

conv.sp

IfTRUE then the smoothing parameter selection algorithm stopped before reaching the maximum number of iterations allowed.

wor.c

Working model quantities.

eta1,eta2,eta3,...

Estimated linear predictors for the two equations (as well as the third and fourth equations if present).

y1,y2

Responses of the two equations.

logLik

Value of the (unpenalized) log-likelihood evaluated at the (penalized or unpenalized) parameter estimates.

respvec

List containing response vectors.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gjrm,summary.gjrm


Gradient test

Description

gt.bpm can be used to test the hypothesis of absence of endogeneity, correlated model equations/errors or non-random sample selectionin binary bivariate probit models.

Usage

gt.bpm(x)

Arguments

x

A fittedgjrm object.

Details

The gradient test was first proposed by Terrell (2002) and it is based on classic likelihood theory. See Marra et al. (2017) for full details.

Value

It returns a numeric p-value corresponding to the null hypothesis that the correlation,\theta, is equal to 0.

WARNINGS

This test's implementation is only valid for bivariate binary probit models with normal errors.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Marra G., Radice R. and Filippou P. (2017), Regression Spline Bivariate Probit Models: A Practical Approach to Testing for Exogeneity.Communications in Statistics - Simulation and Computation, 46(3), 2283-2298.

Terrell G. (2002), The Gradient Statistic.Computing Science and Statistics, 34, 206-215.

Examples

## see examples for gjrm

Post-estimation calculation of hazard, cumulative hazard and survival functions

Description

This function produces estimated values, intervals and plots for the hazard, cumulative hazard and survival functions.

Usage

haz.surv(x, eq, newdata, type = "surv", t.range = NULL, t.vec = NULL,         intervals = TRUE, n.sim = 100, prob.lev = 0.05, shade = FALSE,         bars = FALSE, ylim, ylab, xlab, pch, ls = 100, baseline = FALSE,        min.dn = 1e-200, min.pr = 1e-200, max.pr = 1, plot = TRUE,         print.progress = TRUE, ...)

Arguments

x

A fittedgamlss/gjrm object.

eq

Equation number. This can be ignored for univariate models.

newdata

A data frame or list containing the values of the model covariates at which predictions are required. This must always be provided.

For the individual survival/hazard/cumulative hazard function, the data frame must have one row containing the values of the model covariates corresponding to the individual of interest.

For the (sub-)population survival/hazard/cumulative hazard function, the data frame must have as many rows as there are individuals in the (sub-)population of interest. Each row must contain the values of the model covariates of the corresponding individual.

type

Either"surv","haz" or"cum.haz". In the excess hazard setting these are, respectively, the net survival, the excess hazard and the cumulative excess hazard.

t.range

Time variable range. This must be a vector with only two elements: the minimum and maximum of the time range. IfNULL then it is determined automatically based on the observed data.

t.vec

Vector of time values. This can also be a single time. Note you cannot provide botht.range andt.vec as they are two mutually exclusive ways of defining the time variable. IfNULL then it is determined automatically based on the observed data.

intervals

IfTRUE then intervals are also produced.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is usedfor interval calculations.

prob.lev

Overall probability of the left and right tails of the probabilities' distributions used for interval calculations.

shade

IfTRUE then it produces shaded regions as confidence bands.

bars

IfTRUE then the confidence intervals are plotted as bars rather than continuous curves. Ift.vec is used and only one time value is provided, this is the only possible plotting option for the confidence intervals. Noteshade andbars are mutually exclusive.

ylim,ylab,xlab,pch

Usual plot arguments.

ls

Length of sequence to use for time variable.

baseline

If baseline is desired; this will set all covariate/smooth effects to zero.

min.dn,min.pr,max.pr

Allowed minimum and maximum for estimated probabities and densities for survival, hazard and cumulative hazard calculations.

plot

IfFALSE then the function does not produce a plot. The default isTRUE.

print.progress

IfFALSE then the function does not print progress made. The default isTRUE.

...

Other arguments to pass to plot.

Value

It produces estimated values, intervals and plots for the hazard, cumulative hazard and survival functions.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Kendall's tau a fitted joint model

Description

k.tau can be used to calculate the Kendall's tau from a fitted joint model with intervals obtained via posterior simulation.

Usage

k.tau(x, prob.lev = 0.05)

Arguments

x

A fittedgjrm object as produced by the respective fitting function.

prob.lev

Overall probability of the left and right tails of the probabilities' distributions used for interval calculations.

Details

This function calculates the Kendall's tau a fitted simultaneous model, with intervals obtained via posterior simulation. Note that this is derived under the assumption of continuous margins.

Value

res

It returns the estimated tau with lower and upper interval limits.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Internal Function

Description

Log-logistic robust function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Linear Model Fitting with Constraints

Description

Linear model fitting with positivity and sum-to-one constraints on the model's coefficients.

Usage

lmc(y, X, start.v = NULL, lambda = 1, pen = "none", gamma = 1, a = 3.7)

Arguments

y

Response vector.

X

Design matrix.

start.v

Starting values.

lambda

Tuning parameter.

pen

Type of penalty. Choices are: none, ridge, lasso, alasso, scad.

gamma

Power parameter of adaptive lasso.

a

Scad parameter.

Details

Linear model fitting with positivity and sum-to-one constraints on the model's coefficients.

Value

The function returns an object of classlmc.

Examples

## Not run:  library(GJRM)set.seed(1)n    <- 1000beta <- c(0.07, 0.08, 0.21, 0.12, 0.15, 0.17, 0.2)l    <- length(beta)X    <- matrix(runif(n*l), n, l)y    <- X%*%beta + rnorm(n)out <- lmc(y, X)conv.check(out)out1 <- lmc(y, X, start.v = beta)conv.check(out1)coef(out)                    # estimated   coefficientsround(out$c.coefficients, 3) # constrained coefficientssum(out$c.coefficients)round(out1$c.coefficients, 3) sum(out1$c.coefficients)# penalised estimationout1 <- lmc(y, X, pen = "alasso", lambda = 0.02)conv.check(out1)coef(out1)                    round(out1$c.coefficients, 3)sum(out1$c.coefficients)AIC(out, out1)BIC(out, out1)round(cbind(out$c.coefficients, out1$c.coefficients), 3)# scadn    <- 10000beta <- c(0.2, 0, 0, 0.02, 0.01, 0.01, 0.01, 0.08, 0.21, 0.12, 0.15, 0.17, 0.02)l    <- length(beta)X    <- matrix(runif(n*l), n, l)y    <- X%*%beta + rnorm(n)out1 <- lmc(y, X, pen = "scad", lambda = 0.01)conv.check(out1)coef(out1)  sum(out1$c.coefficients)                  round(cbind(beta, out1$c.coefficients), 2)## End(Not run)

Extract the log likelihood for a fitted copula model

Description

It extracts the log-likelihood for a fittedgjrm model.

Usage

## S3 method for class 'SemiParBIV'logLik(object, ...)

Arguments

object

A fittedgjrm object.

...

Un-used for this function.

Details

Modification of the classiclogLik which accounts for the estimated degrees of freedom used ingjrm.This function is provided so that information criteria work correctly by using the correct number of degrees of freedom.

Value

StandardlogLik object.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

AIC,BIC


Marginal Mean/Variance

Description

Functionmarg.mv can be used to calculate marginal means/variances, with corresponding interval obtained using posterior simulation.

Usage

marg.mv(x, eq, newdata, fun = "mean", n.sim = 100, prob.lev = 0.05, bin.model = NULL)

Arguments

x

A fittedmarg.mv object as produced by the respective fitting function.

eq

Number of equation of interest.

newdata

A data frame with one row, which must be provided.

fun

Either mean or variance.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters.

prob.lev

Overall probability of the left and right tails of the simulated distribution used for interval calculations.

bin.model

If a two part or hurdle model is used then this is the object of a binary regression model fitted using gam() from mgcv.

Details

marg.mv() calculates the marginal mean or variance. Posterior simulation is used to obtain a confidence/credible interval.

Value

res

It returns three values: lower confidence interval limit, estimated marginal mean or variance and upper interval limit.

prob.lev

Probability level used.

sim.mv

It returns a vector containing simulated values of the marginal mean or variance. This is used to calculate intervals.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

GJRM-package,gjrm


Nonparametric (worst-case and IV) Manski's bounds

Description

mb can be used to calculate the (worst-case and IV) Manski's bounds and confidence interval covering the true effect of interestwith a fixed probability.

Usage

mb(treat, outc, IV = NULL, model, B = 100, sig.lev = 0.05)

Arguments

treat

Binary treatment/selection variable.

outc

Binary outcome variable.

IV

An instrumental binary variable can be used if available.

model

Possible values are "B" (model with endogenous variable) and "BSS" (model with non-random sample selection).

B

Number of bootstrap replicates. This is used to obtain some components needed for confidence interval calculations.

sig.lev

Significance level.

Details

Based on Manski (1990), this function returns the nonparametric lower and upper (worst-case) Manski's bounds for the average treatment effect (ATE) whenmodel = "B" or prevalence whenmodel = "BSS". When an IV is employedthe function returns IV Manski bounds.

For comparison, it also returns the estimated effect assuming random assignment (i.e., the treatment received or selection relies on the assumption of ignorable observed and unobserved selection). Note that this is equivalent to what provided byATE orprev whentype = "naive", and is different from what obtainedbyATE orprev whentype = "univariate" as observed confounders are accounted forand the assumption here is of ignorable unobserved selection.

A confidence interval covering the true ATE/prevalence with a fixed probability is also provided. This is based on the approach described in Imbens and Manski (2004). NOTE that this interval is typically very close (if not identical) to the lowerand upper bounds.

The ATE can be at most 1 (or 100 in percentage) and the worst-case Manski's bounds have width 1. This means that 0 is always included within the possibilites of these bounds. Nevertheless, this may be useful to check whether the effect from a bivariate recursive model is included within the possibilites of the bounds.

When estimating a prevalance the worst-case Manski's bounds have width equal to the non-response probability,which provides a measure of the uncertainty about the prevalence caused by non-response. Again, this may be useful to check whether the prevalence from a bivariate non-random sample selection model is included within the possibilites of the bounds.

Seegjrm for some examples.

Value

LB,UP

Lower and upper bounds for the true effect of interest.

CI

Confidence interval covering the true effect of interest with a fixed probability.

ate.ra

Estimated effect of interest assuming random assignment.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Manski C.F. (1990), Nonparametric Bounds on Treatment Effects.American Economic Review, Papers and Proceedings, 80(2), 319-323.

Imbens G.W. and Manski C.F (2004), Confidence Intervals for Partially Identified Parameters.Econometrica, 72(6), 1845-1857.

See Also

gjrm

Examples

## see examples for gjrm

Internal Function

Description

This and other similar internal functions calculate numerical derivatives.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides an overall penalty matrix in a format suitable for estimation conditional on smoothing parameters.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Plotting function

Description

It takes a fittedgjrm object produced bygjrm() and plots the estimated smooth functions on the scale of the linear predictors. This function is a wrapper ofplot.gam() inmgcv. Please see the documentation ofplot.gam() for full details.

Usage

## S3 method for class 'SemiParBIV'plot(x, eq, ...)

Arguments

x

A fittedgjrm object.

eq

The equation from which smooth terms should be considered for printing.

...

Other graphics parameters to pass on to plotting commands, as described forplot.gam() inmgcv.

Details

This function produces plots showing the smooth terms of a fitted semiparametric bivariate probit model. In the case of 1-D smooths, the x axis of each plot is labelled using the name of the regressor, while the y axis is labelled ass(regr, edf) whereregr is the regressor's name, andedf the effective degrees of freedom of the smooth. For 2-D smooths, perspective plots are produced with the x axes labelled with the first and second variable names and the y axis is labelled ass(var1, var2, edf), which indicates the variables of which the term is a function and theedf for the term.

IfseWithMean = TRUE then the intervals include the uncertainty about the overall mean. Note that the smooths are still shown centred. The theoretical arguments and simulation study of Marra and Wood (2012) suggest thatseWithMean = TRUE results in intervals withclose to nominal frequentist coverage probabilities.

Value

The function generates plots.

WARNING

The function can not deal with smooths of more than 2 variables.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Marra G. and Wood S.N. (2012), Coverage Properties of Confidence Intervals for Generalized Additive Model Components.Scandinavian Journal of Statistics, 39(1), 53-74.

See Also

gjrm


Geographic map with regions defined as polygons

Description

This function produces a map with geographic regions defined by polygons. It is essentially the same function aspolys.plot() inmgcv but with added argumentszlim andrev.col and a wider set of choices forscheme.

Usage

polys.map(lm, z, scheme = "gray", lab = "", zlim, rev.col = FALSE, ...)

Arguments

lm

Named list of matrices where each matrix has two columns. The matrix rows each define the vertex of a boundary polygon.

z

A vector of values associated with each area (item) oflm.

scheme

Possible values are"heat","terrain","topo","cm" and"gray", indicating how to fill the polygons in accordance with the value ofz.

lab

label for plot.

zlim

If missing then the range of z will be chosen usingpretty(z) otherwise the range provided will be used.

rev.col

IfTRUE then coloring scheme is reversed.

...

other arguments to pass to plot.

Details

See help file ofpolys.plot inmgcv.

Value

It produces a plot.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Set up geographic polygons

Description

This function creates geographic polygons in a format suitable for smoothing.

Usage

polys.setup(object)

Arguments

object

An RDS file object as extracted from http://www.gadm.org.

Value

It produces a list with polygons (polys), and various names (names0,names1 - first level of aggregation,names2 - second level of aggregation).

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

Thanks to Guy Harling for suggesting the implementation of this function.

Examples

?hiv

Function to predict quantiles from GP and DGP distributions

Description

It takes a fittedgamlss object produced bygamlss() and produces the desired quntities and respective intervals.

Usage

pred.gp(x, p = 0.5, newdata, n.sim = 100, prob.lev = 0.05)

Arguments

x

A fittedgamlss object.

p

Value of p.

newdata

A data frame or list containing the values of the model covariates at which predictions are required. If not provided then predictions corresponding to the original data are returned.When newdata is provided, it should contain all the variables needed for prediction.

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals. It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gamlss


Prediction function

Description

It takes a fittedgjrm object for the ordinal-continuous case and, for each equation, produces predictions for a new set of values of the model covariates or the original values used for the model fit. Standard errors of predictions can be produced and are based on the posterior distribution of the model coefficients.

Usage

## S3 method for class 'CopulaCLM'predict(object, eq, type = "link", ...)

Arguments

object

A fittedgjrm object.

eq

The equation to be considered for prediction.

type

Type of prediction.

...

Other arguments as inpredict.gam() inmgcv.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gjrm


Prediction function

Description

It takes a fittedgjrm object and, for each equation, produces predictions for a new set of values of the model covariates or the original values used for the model fit. Standard errors of predictions can be produced and are based on the posterior distribution of the model coefficients. This function is a wrapper forpredict.gam() inmgcv. Please see the documentation ofpredict.gam() for full details.

Usage

## S3 method for class 'SemiParBIV'predict(object, eq, ...)

Arguments

object

A fittedgjrm object.

eq

The equation to be considered for prediction.

...

Other arguments as inpredict.gam() inmgcv.

WARNINGS

Whentype = "response" this function will provide prediction assuming that the identity link function is adopted.This means thattype = "link" andtype = "response" will produce the same results, which for some distributions is fine.This is because, for internal reasons, the model object used always assumes an identity link. There are other functions in the packagewhich will produce predictions for the response correctly and we are currently working on extending them to all models in the package.For all the othertype values the function will produce the correct results.

When predicting based on a new data set, this function can not return correct predictions for models based on a copula value of "C0C90", "C0C270", "C180C90", "C180C270", "G0G90", "G0G270", "G180G90","G180G270", "J0J90", "J0J270", "J180J90" or "J180J270".

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gjrm


Estimated overall prevalence from a sample selection model

Description

prev can be used to calculate the overall estimated prevalence from a sample selection model with binay outcome, with corresponding intervalobtained using posterior simulation.

Usage

prev(x, sw = NULL, joint = TRUE, n.sim = 100, prob.lev = 0.05)

Arguments

x

A fittedgjrm object.

sw

Survey weights.

joint

IfFALSE then the prevalence is obtained from the univariate model which neglects the presence of unobserved confounders. WhenTRUE, the prevalence is obtained from the simultaneous model which accounts for observed and unobserved confounders.

n.sim

Number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. It may be increased if more precision is required.

prob.lev

Overall probability of the left and right tails of the prevalence distribution used for interval calculations.

Details

prev estimates the overall prevalence of a disease (e.g., HIV) when there are missing values that are not at random. An interval for the estimated prevalence can be obtained using posterior simulation.

Value

res

It returns three values: lower confidence interval limit, estimated prevalence and upper confidence interval limit.

prob.lev

Probability level used.

sim.prev

Vector containing simulated values of the prevalence. This is used to calculate an interval.

Author(s)

Authors: Giampiero Marra, Rosalba Radice, Guy Harling, Mark E McGovern

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Marra G., Radice R., Barnighausen T., Wood S.N. and McGovern M.E. (2017), A Simultaneous Equation Approach to Estimating HIV Prevalence with Non-Ignorable Missing Responses.Journal of the American Statistical Association, 112(518), 484-496.

See Also

GJRM-package,gjrm


Print an ATE object

Description

The print method for anATE object.

Usage

## S3 method for class 'ATE'print(x, ...)

Arguments

x

ATE object produced byATE().

...

Other arguments.

Details

print.ATE prints the lower confidence interval limit, estimated ATE and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

ATE


Print an OR object

Description

The print method for anOR object.

Usage

## S3 method for class 'OR'print(x, ...)

Arguments

x

OR object produced byOR().

...

Other arguments.

Details

print.OR prints the lower confidence interval limit, estimated OR and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

OR


Print an PE object

Description

The print method for anPE object.

Usage

## S3 method for class 'PE'print(x, ...)

Arguments

x

PE object produced byPE().

...

Other arguments.

Details

print.PE prints the lower confidence interval limit, estimated PE and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

PE


Print an RR object

Description

The print method for anRR object.

Usage

## S3 method for class 'RR'print(x, ...)

Arguments

x

RR object produced byRR().

...

Other arguments.

Details

print.RR prints the lower confidence interval limit, estimated RR and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

RR


Print an SATE object

Description

The print method for aSATE object.

Usage

## S3 method for class 'SATE'print(x, ...)

Arguments

x

SATE object produced bySATE().

...

Other arguments.

Details

print.SATE prints the lower confidence interval limit, estimated SATE and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

SATE


Print a SemiParBIV object

Description

The print method for aSemiParBIV object.

Usage

## S3 method for class 'SemiParBIV'print(x, ...)

Arguments

x

SemiParBIV object.

...

Other arguments.

Details

It prints out the family, model equations, total number of observations, estimated association coefficient and total effective degrees of freedom for the penalized or unpenalized model.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Print a SemiParROY object

Description

The print method for aSemiParROY object.

Usage

## S3 method for class 'SemiParROY'print(x, ...)

Arguments

x

SemiParROY object.

...

Other arguments.

Details

It prints out the family, model equations, total number of observations, estimated association coefficient, etc for the penalized or unpenalized model.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Print a SemiParTRIV object

Description

The print method for aSemiParTRIV object.

Usage

## S3 method for class 'SemiParTRIV'print(x, ...)

Arguments

x

SemiParTRIV object.

...

Other arguments.

Details

It prints out the family, model equations, total number of observations, estimated association coefficient and total effective degrees of freedom for the penalized or unpenalized model.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Print a cond.mv object

Description

The print method for acond.mv object.

Usage

## S3 method for class 'cond.mv'print(x, ...)

Arguments

x

cond.mv object produced bycond.mv().

...

Other arguments.

Details

print.cond.mv prints the lower confidence interval limit, estimated conditonal mean or variance and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

cond.mv


Print a copulaSampleSel object

Description

The print method for acopulaSampleSel object.

Usage

## S3 method for class 'copulaSampleSel'print(x, ...)

Arguments

x

copulaSampleSel object.

...

Other arguments.

Details

It prints out the family, model equations, total number of observations, estimated association coefficient, etc for the penalized or unpenalized model.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Print a gamlss object

Description

The print method for agamlss object.

Usage

## S3 method for class 'gamlss'print(x, ...)

Arguments

x

gamlss object produced bygamlss().

...

Other arguments.

Details

print.gamlss prints out the family, model equations, total number of observations, etc for the penalized or unpenalized model.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gamlss


Print a gjrm object

Description

The print method for agjrm object.

Usage

## S3 method for class 'gjrm'print(x, ...)

Arguments

x

gjrm object produced bygjrm().

...

Other arguments.

Details

print.gjrm prints out the family, model equations, total number of observations, estimated association coefficient, etc for the penalized or unpenalized model.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gjrm


Print a marg.mv object

Description

The print method for amarg.mv object.

Usage

## S3 method for class 'marg.mv'print(x, ...)

Arguments

x

marg.mv object produced bymarg.mv().

...

Other arguments.

Details

print.marg.mv prints the lower confidence interval limit, estimated conditonal mean or variance and upper confidence interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

marg.mv


Print an mb object

Description

The print method for anmb object.

Usage

## S3 method for class 'mb'print(x, ...)

Arguments

x

mb object produced bymb().

...

Other arguments.

Details

print.mb prints the lower and upper bounds, confidence interval, and effect assuming random assignment.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

mb


Print an prev object

Description

The print method for anprev object.

Usage

## S3 method for class 'prev'print(x, ...)

Arguments

x

prev object produced byprev().

...

Other arguments.

Details

print.prev prints the lower interval limit, estimated prevalence and upper interval limit.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

prev


Internal Function

Description

Internal fitting function.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Multivariate Normal Variates

Description

This function simply generates random multivariate normal variates.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It applies one of two regularisations on the information matrix if desired. These are based on the Cholesky and eigen decompositions.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Diagnostic plots for discrete/continuous response margins

Description

It produces diagnostic plots based on (randomised) quantile residuals.

Usage

res.check(x, intervals = FALSE, n.sim = 100, prob.lev = 0.05)

Arguments

x

A fittedgjrm object.

intervals

IfTRUE then intervals for the qqplots are produced.

n.sim

Number of replicate datasets used to simulate quantiles of the residual distribution.

prob.lev

Overall probability of the left and right tails of the probabilities' distributions used for interval calculations.

Details

If the model fits the response well then the plots should look normally distributed.When fitting models with discrete and/or continuous margins, four plots will be produced. In this case,the argumentsmain2 andxlab2 come into play and allow for differentlabelling across the plots.

Value

qr

It returns the (randomised) quantile residuals for the continuous or discrete margin when fitting a model that involves a binary response.

qr1

As above but for first equation (this applies when fitting models with continuous/discrete margins).

qr2

As above but for second equation.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gjrm


Diagnostic plot for a variable

Description

It produces a normal Q-Q plot for the (randomised) normalised quantile response. It also provides the log-likelihood for AIC calculation, for instance. It is also used for internal purposes.

Usage

resp.check(y, margin = "N", print.par = FALSE, plots = TRUE,            loglik = FALSE, os = FALSE, i.f = FALSE,            min.dn = 1e-40, min.pr = 1e-16, max.pr = 0.999999,           left.trunc = 0)

Arguments

y

Response.

margin

The distributions allowed are: normal ("N"), log-normal ("LN"), generelised Pareto ("GP"), discrete generelised Pareto ("DGP"),Gumbel ("GU"), reverse Gumbel ("rGU"), logistic ("LO"), Weibull ("WEI"), inverse Gaussian ("iG"), gamma ("GA"),Dagum ("DAGUM"), Singh-Maddala ("SM"), beta ("BE"), Fisk ("FISK"), Poisson ("P"), zero truncated Poisson ("ZTP"), negative binomial - type I ("NBI"), negative binomial - type II ("NBII"), Poisson inverse Gaussian ("PIG").

print.par

IfTRUE then the estimated parameters used to construct the plot are returned.

plots

IfFALSE then no plot is produced and only parameter estimates returned.

loglik

IfTRUE then it returns the logLik.

os

IfTRUE then the estimated parameters are returned on the original scale.

i.f

Internal fitting option. This is not for user purposes.

min.dn,min.pr,max.pr

Allowed minimum and maximum for estimated probabities and densities for parameter estimation.

left.trunc

Value of truncation at left. Currently done for count distributions only.

Details

Prior to fitting a model with discrete and/or continuous margins, the distributions for the outcome variablesmay be chosen by checking the normalised quantile responses. These will provide a rough guide to the adequacy of the chosen distribution.The latter are defined as the quantile standard normal function of the cumulative distribution function of the response with scale and locationestimated by MLE. These should behave approximately as normally distributed variables (even though the original observations are not). Therefore, a normal Q-Q plot is appropriate here.

Ifloglik = TRUE then this function also provides the log-likelihood for AIC calculation, for instance.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gjrm


Bootstrap procedure to help select the robust constant in a GAMLSS

Description

It helps finding the robust constant for a GAMLSS.

Usage

rob.const(x, B = 100, left.trunc = 0)

Arguments

x

A fittedgjrm object.

B

Number of bootstrap replicates.

left.trunc

If a truncated count distribution is employed then this is the left truncation point.

Details

It helps finding the robust constant for a GAMLSS based on the mean or median.

Value

rc

Robust constant used in fitting.

sw

Sum of weights for each bootstrap replicate.

m1

Mean.

m2

Median.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gamlss


Tool for tuning bounds of integral in robust models

Description

Tool for tuning bounds of integral in robust GAMLSS with continuous distribution.

Usage

rob.int(x, rc, l.grid = 1000, tol = 1e-4, var.range = NULL)

Arguments

x

A fittedgjrm object, typically from a non-robust fit.

rc

Robust tuning constant.

l.grid

Length of grid.

tol

Tolerance

var.range

Range of values, min and max, to use in calculations.

Details

Tool for tuning bounds of integral in robust GAMLSS.

Value

lB,uB

Lower and upper bounds.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

See Also

gamlss


SemiParBIV summary

Description

It takes a fittedSemiParBIV object and produces some summaries from it.

Usage

## S3 method for class 'SemiParBIV'summary(object, n.sim = 100, prob.lev = 0.05, gm = FALSE, ...)        ## S3 method for class 'summary.SemiParBIV'print(x, digits = max(3, getOption("digits") - 3),            signif.stars = getOption("show.signif.stars"), ...)

Arguments

object

A fittedSemiParBIV object.

x

summary.SemiParBIV object produced bysummary.SemiParBIV().

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals for the association parameter, dispersion coefficient and other measures (e.g., gamma measure). It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

gm

If TRUE then intervals for the gamma measure and odds ratio are calculated.

digits

Number of digits printed in output.

signif.stars

By default significance stars are printed alongside output.

...

Other arguments.

Details

Using some low level functions inmgcv, based on the results of Marra and Wood (2012), ‘Bayesian p-values’ are returned for the smooth terms. These have better frequentist performance than their frequentist counterpart. See the help file ofsummary.gam inmgcv for further details. Covariate selection can also be achieved using a single penalty shrinkage approach as shown in Marra and Wood (2011).

Posterior simulation is used to obtain intervals of nonlinear functions of parameters, such as the association and dispersion parametersas well as the odds ratio and gamma measure discussed by Tajar et al. (2001) ifgm = TRUE.

print.summary.SemiParBIV prints model term summaries.

Value

tableP1

Table containing parametric estimates, their standard errors, z-values and p-values for equation 1.

tableP2,tableP3,...

As above but for equation 2 and equations 3 and 4 if present.

tableNP1

Table of nonparametric summaries for each smooth component including effective degrees of freedom, estimated rank, approximate Wald statistic for testing the null hypothesis that the smooth term is zero and corresponding p-value, for equation 1.

tableNP2,tableNP3,...

As above but for equation 2 and equations 3 and 4 if present.

n

Sample size.

theta

Estimated dependence parameter linking the two equations.

formula1,formula2,formula3,...

Formulas used for the model equations.

l.sp1,l.sp2,l.sp3,...

Number of smooth components in model equations.

t.edf

Total degrees of freedom of the estimated bivariate model.

CItheta

Interval(s) for\theta.

n.sel

Number of selected observations in the sample selection case.

OR,CIor

Odds ratio and related CI. The odds ratio is a measure of association between binary random variables and is defined as p00p11/p10p01. In the case of independence this ratio is equal to 1. It can take values in the range (-Inf, Inf) and it does not depend on the marginal probabilities (Tajar et al., 2001). Interval is calculated using posterior simulation.

GM,CIgm

Gamma measure and related CI. This measure of association was proposed by Goodman and Kruskal (1954). It is defined as (OR - 1)/(OR + 1), can take values in the range (-1, 1) and does not depend on the marginal probabilities.Interval is calculated using posterior simulation.

WARNINGS

Note that the Kendall's tau (and related interval), as implemented here, is a valid measure of dependence for continuous margins and it will only provide a crude indication of dependence in other cases.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Marra G. and Wood S.N. (2011), Practical Variable Selection for Generalized Additive Models.Computational Statistics and Data Analysis, 55(7), 2372-2387.

Marra G. and Wood S.N. (2012), Coverage Properties of Confidence Intervals for Generalized Additive Model Components.Scandinavian Journal of Statistics, 39(1), 53-74.

Tajar M., Denuit M. and Lambert P. (2001), Copula-Type Representation for Random Couples with Bernoulli Margins. Discussion Papaer 0118, Universite Catholique De Louvain.

See Also

ATE,prev


SemiParROY summary

Description

It takes a fittedSemiParROY object and produces some summaries from it.

Usage

## S3 method for class 'SemiParROY'summary(object, n.sim = 100, prob.lev = 0.05, ...)  ## S3 method for class 'summary.SemiParROY'print(x, digits = max(3, getOption("digits") - 3),            signif.stars = getOption("show.signif.stars"), ...)

Arguments

object

A fittedSemiParROY object.

x

summary.SemiParROY object produced bysummary.SemiParROY().

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals for the association parameter, dispersion coefficient, for instance It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

digits

Number of digits printed in output.

signif.stars

By default significance stars are printed alongside output.

...

Other arguments.

Details

print.summary.SemiParROY prints model term summaries.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

Examples

## see examples for gjrm

SemiParTRIV summary

Description

It takes a fittedSemiParTRIV object and produces some summaries from it.

Usage

## S3 method for class 'SemiParTRIV'summary(object, n.sim = 100, prob.lev = 0.05, ...)## S3 method for class 'summary.SemiParTRIV'print(x, digits = max(3, getOption("digits") - 3),            signif.stars = getOption("show.signif.stars"), ...)

Arguments

object

A fittedSemiParTRIV object.

x

summary.SemiParTRIV object produced bysummary.SemiParTRIV().

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals for the association parameter and other measures. It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

digits

Number of digits printed in output.

signif.stars

By default significance stars are printed alongside output.

...

Other arguments.

Details

print.summary.SemiParTRIV prints model term summaries.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

Examples

## see examples for gjrm

copulaSampleSel summary

Description

It takes a fittedcopulaSampleSel object and produces some summaries from it.

Usage

## S3 method for class 'copulaSampleSel'summary(object, n.sim = 100, prob.lev = 0.05, ...)  ## S3 method for class 'summary.copulaSampleSel'print(x, digits = max(3, getOption("digits") - 3),            signif.stars = getOption("show.signif.stars"), ...)

Arguments

object

A fittedcopulaSampleSel object.

x

summary.copulaSampleSel object produced bysummary.copulaSampleSel().

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals for the association parameter, dispersion coefficient, for instance It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

digits

Number of digits printed in output.

signif.stars

By default significance stars are printed alongside output.

...

Other arguments.

Details

print.summary.copulaSampleSel prints model term summaries.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

Examples

## see examples for gjrm

gamlss summary

Description

It takes a fittedgamlss object and produces some summaries from it.

Usage

## S3 method for class 'gamlss'summary(object, n.sim = 100, prob.lev = 0.05, ...)   ## S3 method for class 'summary.gamlss'print(x, digits = max(3, getOption("digits") - 3),            signif.stars = getOption("show.signif.stars"), ...)

Arguments

object

A fittedgamlss object.

x

summary.gamlss object produced bysummary.gamlss().

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals for various parameters. It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

digits

Number of digits printed in output.

signif.stars

By default significance stars are printed alongside output.

...

Other arguments.

Details

print.summary.gamlss prints model term summaries.

Value

tableP1

Table containing parametric estimates, their standard errors, z-values and p-values for equation 1.

tableP2,tableP3

As above but for equations 2 and 3 if present.

tableNP1

Table of nonparametric summaries for each smooth component including effective degrees of freedom, estimated rank, approximate Wald statistic for testing the null hypothesis that the smooth term is zero and corresponding p-value, for equation 1.

tableNP2,tableNP3

As above but for equations 2 and 3.

n

Sample size.

sigma,nu

Estimated distribution specific parameters.

formula1,formula2,formula3

Formulas used for the model equations.

l.sp1,l.sp2,l.sp3

Number of smooth components in model equation.

t.edf

Total degrees of freedom of the estimated bivariate model.

CIsig,CInu

Intervals for distribution specific parameters.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

Examples

## see examples for gamlss

gjrm summary

Description

It takes a fittedgjrm object and produces some summaries from it.

Usage

## S3 method for class 'gjrm'summary(object, n.sim = 100, prob.lev = 0.05, ...)   ## S3 method for class 'summary.gjrm'print(x, digits = max(3, getOption("digits") - 3),            signif.stars = getOption("show.signif.stars"), ...)

Arguments

object

A fittedgjrm object.

x

summary.gjrm object produced bysummary.gjrm().

n.sim

The number of simulated coefficient vectors from the posterior distribution of the estimated model parameters. This is used to calculate intervals for the association parameter, dispersion coefficient etc. It may be increased ifmore precision is required.

prob.lev

Probability of the left and right tails of the posterior distribution used for interval calculations.

digits

Number of digits printed in output.

signif.stars

By default significance stars are printed alongside output.

...

Other arguments.

Details

print.summary.gjrm prints model term summaries.

Value

tableP1

Table containing parametric estimates, their standard errors, z-values and p-values for equation 1.

tableP2,tableP3,...

As above but for equation 2 and equations 3 and 4 if present.

tableNP1

Table of nonparametric summaries for each smooth component including effective degrees of freedom, estimated rank, approximate Wald statistic for testing the null hypothesis that the smooth term is zero and corresponding p-value, for equation 1.

tableNP2,tableNP3,...

As above but for equation 2 and equations 3 and 4 if present.

n

Sample size.

theta

Estimated dependence parameter linking the two equations.

sigma1,sigma2

Estimated distribution specific parameters for equations 1 and 2.

nu1,nu2

Estimated distribution specific parameters for equations 1 and 2.

formula1,formula2,formula3,...

Formulas used for the model equations.

l.sp1,l.sp2,l.sp3,...

Number of smooth components in model equations.

t.edf

Total degrees of freedom of the estimated bivariate model.

CItheta

Interval(s) for\theta.

CIsig1,CIsig2,CInu1,CInu2

Intervals for distribution specific parameters

WARNINGS

Note that the Kendall's tau (and related interval), as implemented here, is a valid measure of dependence for continuous margins and it will only provide a crude indication of dependence in other cases.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Internal Function

Description

It provides score and Hessian for trivariate binary models.

Author(s)

Author: Panagiota Filippou

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


Vuong test

Description

The Vuong test is likelihood-ratio-based tests that can be used for choosing between two non-nested models.

Usage

vuong.test(obj1, obj2, sig.lev = 0.05)

Arguments

obj1,obj2

Objects of the two fitted bivariate non-nested models.

sig.lev

Significance level used for testing.

Details

The Vuong test is a likelihood-ratio-based tests for model selection that use the Kullback-Leibler information criterion, and that can be employed for choosing between two bivariate models which are non-nested.

The null hypothesis is that the two models are equally close to the actual model, whereas the alternative is that one model is closer. The test follows asymptotically a standard normal distribution under the null. Assume that the critical region is(-c,c), wherec is typically set to 1.96. If the value of the test is higher thanc then we reject the null hypothesis that the models are equivalent in favor of modelobj1. Viceversa if the value is smaller thanc. If the value falls in[-c,c] then we cannot discriminate between the two competing models given the data.

Value

It returns a decision.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk

References

Vuong Q.H. (1989), Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.Econometrica, 57(2), 307-333.

Examples

## see examples for gjrm

Internal Function

Description

It efficiently calculates the working model quantities needed to implement the automatic multiple smoothing parameter estimation procedure by exploiting a result which leads to very fast and stable calculations.

Author(s)

Maintainer: Giampiero Marragiampiero.marra@ucl.ac.uk


[8]ページ先頭

©2009-2025 Movatter.jp