| Title: | Longitudinal Gaussian Process Regression |
| Version: | 1.2.5 |
| Description: | Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan'. The modeling approach and methods are described in detail in Timonen et al. (2021) <doi:10.1093/bioinformatics/btab021>. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| Biarch: | true |
| Depends: | R (≥ 3.4.0), methods |
| Imports: | Rcpp (≥ 0.12.0), RcppParallel (≥ 5.0.2), RCurl (≥ 1.98), rstan (≥ 2.26.0), rstantools (≥ 2.3.1), bayesplot (≥ 1.7.0), MASS (≥ 7.3-50), stats (≥ 3.4), ggplot2 (≥ 3.1.0), gridExtra (≥ 0.3.0) |
| LinkingTo: | BH (≥ 1.75.0-0), Rcpp (≥ 1.0.6), RcppEigen (≥ 0.3.3.9.1), RcppParallel (≥ 5.0.2), rstan (≥ 2.26.0), StanHeaders (≥ 2.26.0) |
| SystemRequirements: | GNU make |
| NeedsCompilation: | yes |
| RoxygenNote: | 7.3.2 |
| Suggests: | knitr, rmarkdown, testthat, covr |
| URL: | https://github.com/jtimonen/lgpr |
| BugReports: | https://github.com/jtimonen/lgpr/issues |
| VignetteBuilder: | knitr |
| Packaged: | 2025-10-30 18:12:13 UTC; juhotimonen |
| Author: | Juho Timonen |
| Maintainer: | Juho Timonen <juho.timonen@iki.fi> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-30 23:50:14 UTC |
The 'lgpr' package.
Description
Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan' (rstan). The modeling approach and methods are described in detail in Timonen et al. (2021).
Core functions
Main functionality of the package consists of creating and fitting an additive GP model:
lgp: Specify and fit an additive GP model with one command.
create_model: Define an lgpmodel object.
sample_model: Fit a model by sampling the posterior distribution of its parameters and create an lgpfit object.
pred: Computing model predictions and inferred covariate effects after fitting a model.
relevances: Assessing covariate relevances after fitting a model.
prior_pred: Prior predictive sampling to check if your prior makes sense.
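The core functions listed above can be chained roughly as in the sketch below. It is only an illustration: the wrapper name fit_and_summarise and the column names y, age and id are hypothetical, and the calls to relevances and pred assume that their default arguments are suitable for the fitted model. Running the wrapper needs the 'lgpr' package and a working 'Stan' toolchain.

```r
# Sketch only: defining this function does not require 'lgpr', calling it does.
fit_and_summarise <- function(dat) {
  # Specify and fit an additive GP model with one command.
  # 'y', 'age' and 'id' are hypothetical column names in 'dat'.
  fit <- lgpr::lgp(y ~ age + age | id, data = dat)

  # Assess covariate relevances and compute predictions from the fit.
  rel <- lgpr::relevances(fit)
  prd <- lgpr::pred(fit)

  list(fit = fit, relevances = rel, pred = prd)
}
```

Wrapping the calls in a function keeps the sketch free of side effects until it is called with a suitable data frame.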
Visualization
plot_pred: Plot model predictions.
plot_components: Visualize inferred model components.
plot_draws: Visualize parameter draws.
plot_data: Visualize longitudinal data.
Data
The data that you wish to analyze with 'lgpr' should be in an R data.frame where columns correspond to measured variables and rows correspond to observations. Some functions that can help working with such data frames are:
new_x: Creating new test points where the posterior distribution of any function component or sum of all components, or the posterior predictive distribution can be computed after model fitting.
Other functions: add_factor, add_factor_crossing, add_dis_age, adjusted_c_hat.
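As a minimal illustration of the expected layout, the base-R sketch below builds a long-format data frame with one row per observation and one column per measured variable. All column names and values are made up for the example.

```r
# Hypothetical longitudinal data: 4 individuals, 3 time points each.
dat <- data.frame(
  id  = factor(rep(1:4, each = 3)),           # individual identifier (factor)
  age = rep(c(12, 24, 36), times = 4),        # measurement time (numeric)
  sex = factor(rep(c("F", "M"), each = 6)),   # categorical covariate
  y   = c(1.2, 1.5, 1.9, 0.8, 1.1, 1.6,
          2.0, 2.2, 2.9, 1.4, 1.3, 1.8)       # response variable
)
str(dat)
```

Categorical covariates are stored as factors and continuous ones as numeric columns, which matches the distinction the formula syntax makes between the two.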
Vignettes and tutorials
See https://jtimonen.github.io/lgpr-usage/index.html. The tutorials focus on code and use cases, whereas the Mathematical description of lgpr models vignette describes the statistical models and how they can be customized in 'lgpr'.
Citation
Run citation("lgpr") to get citation information.
Feedback
Bug reports, PRs, enhancement ideas or user experiences in general are welcome and appreciated. Create an issue on GitHub or email the author.
Author(s)
Juho Timonen (first.last at iki.fi)
References
Timonen, J. et al. (2021). lgpr: an interpretable non-parametric method for inferring covariate effects from longitudinal data. Bioinformatics, doi:10.1093/bioinformatics/btab021.
Carpenter, B. et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76(1).
An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model
Description
An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model
Usage
## S4 method for signature 'GaussianPrediction'
show(object)
## S4 method for signature 'GaussianPrediction'
component_names(object)
## S4 method for signature 'GaussianPrediction'
num_components(object)
## S4 method for signature 'GaussianPrediction'
num_paramsets(object)
## S4 method for signature 'GaussianPrediction'
num_evalpoints(object)
Arguments
object | GaussianPrediction object for which to apply a class method. |
Methods (by generic)
show(GaussianPrediction): Print a summary about the object.
component_names(GaussianPrediction): Get names of components.
num_components(GaussianPrediction): Get number of components.
num_paramsets(GaussianPrediction): Get number of parameter combinations (different parameter vectors) using which predictions were computed.
num_evalpoints(GaussianPrediction): Get number of points where predictions were computed.
Slots
f_comp_mean: component means
f_comp_std: component standard deviations
f_mean: signal mean (on normalized scale)
f_std: signal standard deviation (on normalized scale)
y_mean: predictive mean (on original data scale)
y_std: predictive standard deviation (on original data scale)
x: a data frame of points (covariate values) where the function posteriors or predictive distributions have been evaluated
An S4 class to represent input for kernel matrix computations
Description
An S4 class to represent input for kernel matrix computations
Usage
## S4 method for signature 'KernelComputer'
show(object)
## S4 method for signature 'KernelComputer'
num_components(object)
## S4 method for signature 'KernelComputer'
num_evalpoints(object)
## S4 method for signature 'KernelComputer'
num_paramsets(object)
## S4 method for signature 'KernelComputer'
component_names(object)
Arguments
object | The object for which to call a class method. |
Methods (by generic)
show(KernelComputer): Print a summary about the object.
num_components(KernelComputer): Get number of components.
num_evalpoints(KernelComputer): Get number of evaluation points.
num_paramsets(KernelComputer): Get number of parameter sets.
component_names(KernelComputer): Get component names.
Slots
input: Common input (for example parameter values).
K_input: Input for computing kernel matrices between data points (N x N). A list.
Ks_input: Input for computing kernel matrices between data and output points (P x N). A list.
Kss_input: Input for computing kernel matrices between output points (P x P). A list, empty if full_covariance=FALSE.
comp_names: Component names (character vector).
full_covariance: Boolean value determining if this can compute full predictive covariance matrices (or just marginal variance at each point).
no_separate_output_points: Boolean value determining if Ks_input and Kss_input are the same thing. Using this knowledge can reduce unnecessary computations of kernel matrices.
STREAM: external pointer (for calling 'Stan' functions)
An S4 class to represent prior or posterior draws from an additive function distribution.
Description
An S4 class to represent prior or posterior draws from an additive function distribution.
Usage
## S4 method for signature 'Prediction'
show(object)
## S4 method for signature 'Prediction'
component_names(object)
## S4 method for signature 'Prediction'
num_components(object)
## S4 method for signature 'Prediction'
num_paramsets(object)
## S4 method for signature 'Prediction'
num_evalpoints(object)
Arguments
object | Prediction object for which to apply a class method. |
Methods (by generic)
show(Prediction): Print a summary about the object.
component_names(Prediction): Get names of components.
num_components(Prediction): Get number of components.
num_paramsets(Prediction): Get number of parameter combinations (different parameter vectors) using which predictions were computed.
num_evalpoints(Prediction): Get number of points where predictions were computed.
Slots
f_comp: component draws
f: signal draws
h: predictions (signal draws + scaling factor c_hat, transformed through inverse link function)
x: a data frame of points (covariate values) where the functions/predictions have been evaluated/sampled
extrapolated: Boolean value telling if the function draws are original MCMC draws or if they have been created by extrapolating such draws.
Easily add the disease-related age variable to a data frame
Description
Creates the disease-related age covariate vector based on the disease initiation times and adds it to the data frame
Usage
add_dis_age(data, t_init, id_var = "id", time_var = "age")
Arguments
data | the original data frame |
t_init | A named vector containing the observed initiation or onset time for each individual. The names, i.e. |
id_var | name of the id variable in |
time_var | name of the time variable in |
Value
A data frame with one column added. The new column will be called dis_age. For controls, its value will be NaN.
See Also
Other data frame handling functions: add_factor(), add_factor_crossing(), adjusted_c_hat(), new_x(), split()
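The idea behind the disease-related age covariate can be sketched in base R as follows. This is an illustration and not the package implementation: it assumes that disease-related age is the measurement time minus the individual's onset time, and that individuals missing from t_init are controls. The data values are hypothetical.

```r
# Hypothetical data: ids 1 and 2 are cases, id 3 is a control.
dat <- data.frame(
  id  = factor(rep(1:3, each = 3)),
  age = rep(c(12, 24, 36), times = 3)
)
t_init <- c("1" = 20, "2" = 30)  # observed onset times, named by id

# Look up each row's onset time by id; ids without an entry give NA.
onset <- unname(t_init[as.character(dat$id)])

# Assumed definition: time since onset. Controls are marked with NaN.
dis_age <- dat$age - onset
dis_age[is.na(dis_age)] <- NaN
dat$dis_age <- dis_age
dat
```

Negative values correspond to measurements taken before the onset time.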
Easily add a categorical covariate to a data frame
Description
Easily add a categorical covariate to a data frame
Usage
add_factor(data, x, id_var = "id")
Arguments
data | the original data frame |
x | A named vector containing the category for each individual. The names should specify the individual id. |
id_var | name of the id variable in |
Value
A data frame with one column added. The new column will have the same name as the variable passed as input x.
See Also
Other data frame handling functions: add_dis_age(), add_factor_crossing(), adjusted_c_hat(), new_x(), split()
Add a crossing of two factors to a data frame
Description
Add a crossing of two factors to a data frame
Usage
add_factor_crossing(data, fac1, fac2, new_name)
Arguments
data | a data frame |
fac1 | name of first factor, must be found in |
fac2 | name of second factor, must be found in |
new_name | name of the new factor |
Value
a data frame
See Also
Other data frame handling functions: add_dis_age(), add_factor(), adjusted_c_hat(), new_x(), split()
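For intuition, a crossing of two factors can be built in base R with interaction(), as in the sketch below. This is an analogue only; how add_factor_crossing names or orders the levels of the new factor is not shown here, and the column names are hypothetical.

```r
dat <- data.frame(
  sex = factor(c("F", "F", "M", "M")),
  loc = factor(c("A", "B", "A", "B"))
)

# One level for each observed combination of the two factors.
dat$sex_loc <- interaction(dat$sex, dat$loc, drop = TRUE)
nlevels(dat$sex_loc)
```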
Set the GP mean vector, taking TMM or other normalization into account
Description
Creates the c_hat input for lgp, so that it accounts for normalization between data points in the "poisson" or "nb" observation model
Usage
adjusted_c_hat(y, norm_factors)
Arguments
y | response variable, vector of length |
norm_factors | normalization factors, vector of length |
Value
a vector of length n, which can be used as the c_hat input to the lgp function
See Also
Other data frame handling functions: add_dis_age(), add_factor(), add_factor_crossing(), new_x(), split()
Apply variable scaling
Description
Apply variable scaling
Usage
apply_scaling(scaling, x, inverse = FALSE)
Arguments
scaling | an object of class lgpscaling |
x | object to which the scaling is applied (numeric) |
inverse | whether scaling should be done in inverse direction |
Value
an object similar to x
See Also
Other variable scaling functions: create_scaling()
Character representations of different formula objects
Description
Character representations of different formula objects
Usage
## S4 method for signature 'lgpexpr'
as.character(x)
## S4 method for signature 'lgpterm'
as.character(x)
## S4 method for signature 'lgpformula'
as.character(x)
Arguments
x | an object of some S4 class |
Value
a character representation of the object
Create a model
Description
See the Mathematical description of lgpr models vignette for more information about the connection between different options and the created statistical model.
Usage
create_model(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian")
)
Arguments
formula | The model formula, where
See the "Model formula syntax" section below ( |
data | A |
likelihood | Determines the observation model. Must be either |
prior | A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below ( |
c_hat | The GP mean. This should only be given if
where |
num_trials | This argument (number of trials) is only needed when likelihood is |
options | A named list with the following possible fields:
If |
prior_only | Should likelihood be ignored? See also |
verbose | Should some informative messages be printed? |
sample_f | Determines if the latent function values are sampled (must be |
Value
An object of class lgpmodel, containing the Stan input created based on parsing the specified formula, prior, and other options.
See Also
Other main functions: draw_pred(), get_draws(), lgp(), pred(), prior_pred(), sample_model()
Parse the covariates and model components from given data and formula
Description
Parse the covariates and model components from given data and formula
Usage
create_model.covs_and_comps(data, model_formula, x_cont_scl, verbose)
Arguments
data | A |
model_formula | an object of class lgpformula |
x_cont_scl | Information on how to scale the continuous covariates. This can either be
|
verbose | Should some informative messages be printed? |
Value
parsed input to Stan and covariate scaling, and other info
See Also
Other internal model creation functions: create_model.formula(), create_model.likelihood(), create_model.prior()
Create a model formula
Description
Checks if formula is in advanced format and translates if not.
Usage
create_model.formula(formula, data, verbose = FALSE)
Arguments
formula | The model formula, where
See the "Model formula syntax" section below ( |
data | A |
verbose | Should some informative messages be printed? |
Value
an object of class lgpformula
See Also
Other internal model creation functions: create_model.covs_and_comps(), create_model.likelihood(), create_model.prior()
Parse the response variable and its likelihood model
Description
Parse the response variable and its likelihood model
Usage
create_model.likelihood(
  data,
  likelihood,
  c_hat,
  num_trials,
  y_name,
  sample_f,
  verbose
)
Arguments
data | A |
likelihood | Determines the observation model. Must be either |
c_hat | The GP mean. This should only be given if
where |
num_trials | This argument (number of trials) is only needed when likelihood is |
y_name | Name of response variable |
sample_f | Determines if the latent function values are sampled (must be |
verbose | Should some informative messages be printed? |
Value
a list of parsed options
See Also
Other internal model creation functions: create_model.covs_and_comps(), create_model.formula(), create_model.prior()
Parse the given modeling options
Description
Parse the given modeling options
Usage
create_model.options(options, verbose)
Arguments
options | A named list with the following possible fields:
If |
verbose | Should some informative messages be printed? |
Value
a named list of parsed options
Parse given prior
Description
Parse given prior
Usage
create_model.prior(prior, stan_input, verbose)
Arguments
prior | A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below ( |
stan_input | a list of stan input fields |
verbose | Should some informative messages be printed? |
Value
a named list of parsed options
See Also
Other internal model creation functions: create_model.covs_and_comps(), create_model.formula(), create_model.likelihood()
Helper function for plots
Description
Helper function for plots
Usage
create_plot_df(object, x = "age", group_by = "id")
Arguments
object | model or fit |
x | x-axis variable name |
group_by | grouping variable name (use |
Value
a data frame
Create a standardizing transform
Description
Create a standardizing transform
Usage
create_scaling(x, name)
Arguments
x | variable measurements (might contain |
name | variable name |
Value
an object of class lgpscaling
See Also
Other variable scaling functions: apply_scaling()
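A standardizing transform and its inverse can be sketched in base R as below. This only illustrates the idea of a location and scale transform; the helper names scale_fwd and scale_inv are hypothetical, and ignoring NaN values when computing the location and scale is an assumption, not a statement about how create_scaling treats them.

```r
x <- c(10, 12, NaN, 16, 18)

# Location and scale, computed while ignoring missing values (assumption).
loc <- mean(x, na.rm = TRUE)
scl <- stats::sd(x, na.rm = TRUE)

# Forward transform to zero mean and unit variance, and its inverse.
scale_fwd <- function(x) (x - loc) / scl
scale_inv <- function(z) z * scl + loc

z <- scale_fwd(x)
x_back <- scale_inv(z)
```

Applying the inverse after the forward transform recovers the original values, which is the role of the inverse argument in apply_scaling.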
Density and quantile functions of the inverse gamma distribution
Description
Uses the same parametrization as Stan; see the Stan documentation for details.
Usage
dinvgamma_stanlike(x, alpha, beta, log = FALSE)
qinvgamma_stanlike(p, alpha, beta)
Arguments
x | point where to compute the density |
alpha | positive real number |
beta | positive real number |
log | is log-scale used? |
p | quantile (must be between 0 and 1) |
Value
density/quantile value
See Also
Other functions related to the inverse-gamma distribution: plot_invgamma(), priors
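For reference, the inverse gamma density with shape alpha and scale beta is beta^alpha / Gamma(alpha) * x^(-(alpha + 1)) * exp(-beta / x) for x > 0, which is the parametrization Stan uses. The base-R sketch below implements it under hypothetical helper names; it is not the package code. The quantile function relies on the fact that if X is inverse gamma distributed, then 1/X is gamma distributed with shape alpha and rate beta.

```r
# Hypothetical helper names; not the lgpr functions themselves.
dinvgamma_sketch <- function(x, alpha, beta, log = FALSE) {
  # Log density: alpha*log(beta) - lgamma(alpha) - (alpha + 1)*log(x) - beta/x
  log_p <- alpha * log(beta) - lgamma(alpha) - (alpha + 1) * log(x) - beta / x
  if (log) log_p else exp(log_p)
}

qinvgamma_sketch <- function(p, alpha, beta) {
  # P(X <= q) = p  is equivalent to  P(1/X <= 1/q) = 1 - p
  1 / stats::qgamma(1 - p, shape = alpha, rate = beta)
}

# Consistency check: the density integrated up to the p-quantile gives p.
q30 <- qinvgamma_sketch(0.3, alpha = 3, beta = 2)
mass <- stats::integrate(dinvgamma_sketch, 0, q30, alpha = 3, beta = 2)$value
```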
Draw pseudo-observations from posterior or prior predictive distribution
Description
Draw pseudo-observations from predictive distribution. If pred contains draws from the component posterior (prior) distributions, then the output is draws from the posterior (prior) predictive distribution. If pred is not specified, then whether output draws are from prior or posterior predictive distribution depends on whether fit is created using the lgp option prior_only=TRUE or not.
Usage
draw_pred(fit, pred = NULL)
Arguments
fit | An object of class lgpfit that has been created using the |
pred | An object of class Prediction, containing draws of each model component. If |
Value
An array with shape S x P, where S is the number of draws that pred contains and P is the length of each function draw. Each row s = 1, ..., S of the output is one vector drawn from the predictive distribution, given parameter draws.
See Also
Other main functions: create_model(), get_draws(), lgp(), pred(), prior_pred(), sample_model()
Quick way to create an example lgpfit, useful for debugging
Description
Quick way to create an example lgpfit, useful for debugging
Usage
example_fit(
  formula = y ~ id + age + age | SEX + age | LOC,
  likelihood = "gaussian",
  chains = 1,
  iter = 30,
  num_indiv = 6,
  num_timepoints = 5,
  ...
)
Arguments
formula | model formula |
likelihood | observation model |
chains | number of chains to run |
iter | number of iterations to run |
num_indiv | number of individuals (data simulation) |
num_timepoints | number of time points (data simulation) |
... | additional arguments to |
Value
An lgpfit object created by fitting the example model.
Print a fit summary.
Description
Print a fit summary.
Usage
fit_summary(fit, ignore_pars = c("f_latent", "eta", "teff_raw", "lp__"))
Arguments
fit | an object of class lgpfit |
ignore_pars | parameters and generated quantities to ignore from output |
Value
object invisibly.
Extract parameter draws from lgpfit or stanfit
Description
Uses extract with permuted = FALSE and inc_warmup = FALSE.
Usage
get_draws(object, draws = NULL, reduce = NULL, ...)
Arguments
object | An object of class lgpfit or |
draws | Indices of the parameter draws. |
reduce | Function used to reduce all parameter draws into one set of parameters. Ignored if |
... | Additional arguments to |
Value
The return value is always a 2-dimensional array of shape num_param_sets x num_params.
See Also
Other main functions: create_model(), draw_pred(), lgp(), pred(), prior_pred(), sample_model()
Extract model predictions and function posteriors
Description
NOTE: It is not recommended for users to call this. Use pred instead.
Usage
get_pred(fit, draws = NULL, reduce = NULL, verbose = TRUE)
Arguments
fit | An object of class lgpfit. |
draws | Indices of parameter draws to use, or |
reduce | Reduction for parameter draws. Can be a function that is applied to reduce all parameter draws into one parameter set, or |
verbose | Should more information and a possible progress bar be printed? |
Value
an object of class GaussianPrediction or Prediction
Compute a kernel matrix (covariance matrix)
Description
These have STAN_kernel_* counterparts. These R versions are provided for reference and are not optimized for speed. These are used when generating simulated data, and not during model inference.
Usage
kernel_eq(x1, x2, alpha = 1, ell)
kernel_ns(x1, x2, alpha = 1, ell, a)
kernel_zerosum(x1, x2, M)
kernel_bin(x1, x2, pos_class = 0)
kernel_cat(x1, x2)
kernel_varmask(x1, x2, a, vm_params)
kernel_beta(beta, idx1_expand, idx2_expand)
Arguments
x1 | vector of length |
x2 | vector of length |
alpha | marginal std (default = 1) |
ell | lengthscale |
a | steepness of the warping function rise |
M | number of categories |
pos_class | binary (mask) kernel function has value one if both inputs have this value, otherwise it is zero |
vm_params | vector of two mask function parameters. |
beta | a parameter vector (row vector) of length |
idx1_expand | integer vector of length |
idx2_expand | integer vector of length |
Value
A matrix of size n x m.
Functions
kernel_eq(): Uses the exponentiated quadratic kernel.
kernel_ns(): Uses the non-stationary kernel (input warping + squared exponential).
kernel_zerosum(): Uses the zero-sum kernel. Here, x1 and x2 must be integer vectors (integers denoting different categories). Returns a binary matrix.
kernel_bin(): Uses the binary (mask) kernel. Here, x1 and x2 must be integer vectors (integers denoting different categories). Returns a binary matrix.
kernel_cat(): Uses the categorical kernel. Here, x1 and x2 must be integer vectors (integers denoting different categories). Returns a binary matrix.
kernel_varmask(): Computes variance mask multiplier matrix. NaN's in x1 and x2 will be replaced by 0.
kernel_beta(): Computes the heterogeneity multiplier matrix. NOTE: idx_expand needs to be given so that idx_expand[j]-1 tells the index of the beta parameter that should be used for the jth observation. If observation j doesn't correspond to any beta parameter, then idx_expand[j] should be 1.
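Two of the kernels above are simple enough to sketch in a few lines of base R. The helper names below are hypothetical and the code is an illustration, not the package source. It assumes the usual form of the exponentiated quadratic kernel, alpha^2 * exp(-(x1 - x2)^2 / (2 * ell^2)), with alpha as the marginal standard deviation, and a categorical kernel that equals one when the two categories match and zero otherwise.

```r
# Hypothetical reference sketches; not the lgpr functions themselves.
eq_kernel_sketch <- function(x1, x2, alpha = 1, ell) {
  d <- outer(x1, x2, "-")              # pairwise differences, n x m
  alpha^2 * exp(-d^2 / (2 * ell^2))    # kernel variance is alpha^2
}

cat_kernel_sketch <- function(x1, x2) {
  1 * outer(x1, x2, "==")              # 1 where categories match, else 0
}

K_eq  <- eq_kernel_sketch(c(0, 1, 2), c(0, 1), alpha = 2, ell = 1)
K_cat <- cat_kernel_sketch(c(1L, 2L, 2L), c(1L, 2L))
dim(K_eq)
```

Both results have one row per element of x1 and one column per element of x2, matching the n x m shape stated under Value.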
Main function of the 'lgpr' package
Description
Creates an additive Gaussian process model using create_model and fits it using sample_model. See the Mathematical description of lgpr models vignette for more information about the connection between different options and the created statistical model.
Usage
lgp(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian"),
  quiet = FALSE,
  skip_postproc = sample_f,
  ...
)
Arguments
formula | The model formula, where
See the "Model formula syntax" section below ( |
data | A |
likelihood | Determines the observation model. Must be either |
prior | A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below ( |
c_hat | The GP mean. This should only be given if
where |
num_trials | This argument (number of trials) is only needed when likelihood is |
options | A named list with the following possible fields:
If |
prior_only | Should likelihood be ignored? See also |
verbose | Can messages be printed during model creation? Has no effect if |
sample_f | Determines if the latent function values are sampled (must be |
quiet | Should all output messages be suppressed? You also need to set |
skip_postproc | Should all postprocessing be skipped? If this is |
... | Optional arguments passed to |
Value
Returns an object of the S4 class lgpfit.
Model formula syntax
There are two ways to define the model formula:
Using a common formula-like syntax, like in y ~ age + age|id + sex. Terms can consist of a single variable, such as age, or an interaction of two variables, such as age|id. In single-variable terms, the variable can be either continuous (numeric) or categorical (factor), whereas in interaction terms the variable on the left-hand side of the vertical bar (|) has to be continuous and the one on the right-hand side has to be categorical. Formulae specified using this syntax are translated to the advanced format so that
  single-variable terms become gp(x) if variable x is numeric and zs(x) if x is a factor
  interaction terms x|z become gp(x)*zs(z)
Using the advanced syntax, like in y ~ gp(age) + gp(age)*zs(id) + het(id)*gp_vm(disAge). This creates lgprhs objects, which consist of lgpterms, which consist of lgpexprs. This approach must be used if creating nonstationary, heterogeneous or temporally uncertain components.
Either one of the approaches should be used and they should not be mixed.
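The translation rules can be made concrete by writing the same model in both syntaxes. The sketch below only constructs R formula objects and does not call 'lgpr'; it assumes that age is numeric and that id and sex are factors, so that the rules give gp(age), gp(age)*zs(id) and zs(sex).

```r
# Common formula-like syntax: single-variable terms and one interaction term.
f_simple <- y ~ age + age | id + sex

# Hand-translated advanced syntax, assuming 'age' is numeric and
# 'id' and 'sex' are factors.
f_advanced <- y ~ gp(age) + gp(age) * zs(id) + zs(sex)

# Both formulas refer to the same variables.
all.vars(f_simple)
all.vars(f_advanced)
```

Either object could then be passed as the formula argument, but terms from the two syntaxes should not be combined in one formula.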
Defining priors
The prior argument must be a named list, like list(alpha=student_t(4), wrp=igam(30,10)). See examples in tutorials. Possible allowed names are
"alpha" = component magnitude parameters
"ell" = component lengthscale parameters
"wrp" = input warping steepness parameters
"sigma" = noise magnitude (Gaussian obs. model)
"phi" = inv. overdispersion (negative binomial obs. model)
"gamma" = overdispersion (beta-binomial obs. model)
"beta" = heterogeneity parameters
"effect_time" = uncertain effect time parameters
"effect_time_info" = additional options for the above
See priors for functions that can be used to define the list elements. If a parameter of a model is not given in this list, a default prior will be used for it.
When to not use default priors
It is not recommended to use default priors blindly. Rather, priors should be specified according to the knowledge about the problem at hand, as in any Bayesian analysis. In lgpr this is especially important when
Using a non-Gaussian likelihood or otherwise setting sample_f = TRUE. In this case the response variable is not normalized, so the scale on which the data varies must be taken into account when defining priors of the signal magnitude parameters alpha and possible noise parameters (sigma, phi, gamma). Also it should be checked if c_hat is set in a sensible way.
Using a model that contains a gp_ns(x) or gp_vm(x) expression in its formula. In this case the corresponding covariate x is not normalized, and the prior for the input warping steepness parameter wrp must be set according to the expected width of the window in which the nonstationary effect of x occurs. By default, the width of this window is about 36, which has been set assuming that the unit of x is months.
See Also
Other main functions: create_model(), draw_pred(), get_draws(), pred(), prior_pred(), sample_model()
An S4 class to represent an lgp expression
Description
An S4 class to represent an lgp expression
Slots
covariate: name of a covariate
fun: function name
See Also
See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.
An S4 class to represent the output of the lgp function
Description
An S4 class to represent the output of the lgp function
Usage
## S4 method for signature 'lgpfit'
show(object)
## S4 method for signature 'lgpfit'
component_names(object)
## S4 method for signature 'lgpfit'
num_components(object)
## S4 method for signature 'lgpfit'
postproc(object, verbose = TRUE)
## S4 method for signature 'lgpfit'
contains_postproc(object)
## S4 method for signature 'lgpfit'
clear_postproc(object)
## S4 method for signature 'lgpfit'
get_model(object)
## S4 method for signature 'lgpfit'
get_stanfit(object)
## S4 method for signature 'lgpfit'
is_f_sampled(object)
## S4 method for signature 'lgpfit,missing'
plot(x, y)
Arguments
object | The object for which to apply a class method. |
verbose | Can the method print any messages? |
x | an lgpfit object to visualize |
y | unused argument |
Methods (by generic)
show(lgpfit): Print information and summary about the fit object.
component_names(lgpfit): Get names of model components.
num_components(lgpfit): Get number of model components. Returns a positive integer.
postproc(lgpfit): Apply postprocessing. Returns an updated lgpfit object (copies data).
contains_postproc(lgpfit): Check if object contains postprocessing information.
clear_postproc(lgpfit): Returns an updated (copies data) lgpfit object without any postprocessing information.
get_model(lgpfit): Get the stored lgpmodel object. Various properties of the returned object can be accessed as explained in the documentation of lgpmodel.
get_stanfit(lgpfit): Get the stored stanfit object. Various properties of the returned object can be accessed or plotted as explained in the documentation of stanfit.
is_f_sampled(lgpfit): Determine if inference was done by sampling the latent signal f (and its components).
plot(x = lgpfit, y = missing): Visualize parameter draws using plot_draws.
Slots
stan_fit: An object of class stanfit.
model: An object of class lgpmodel.
num_draws: Total number of parameter draws.
postproc_results: A named list containing possible postprocessing results.
See Also
For extracting parameter draws, see get_draws, or the rstan methods for stanfit objects.
For more detailed plotting functions, see plot_draws, plot_beta, plot_warp, plot_effect_times
An S4 class to represent an lgp formula
Description
An S4 class to represent an lgp formula
Slots
terms: an object of class lgprhs
y_name: name of the response variable
call: original formula call
See Also
See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.
An S4 class to represent an additive GP model
Description
An S4 class to represent an additive GP model
Usage
## S4 method for signature 'lgpmodel'
show(object)
## S4 method for signature 'lgpmodel'
parameter_info(object, digits = 3)
## S4 method for signature 'lgpmodel'
component_info(object)
## S4 method for signature 'lgpmodel'
num_components(object)
## S4 method for signature 'lgpmodel'
covariate_info(object)
## S4 method for signature 'lgpmodel'
component_names(object)
## S4 method for signature 'lgpmodel'
is_f_sampled(object)
Arguments
object | The object for which to apply a class method. |
digits | number of digits to show for floating point numbers |
Methods (by generic)
show(lgpmodel): Print information and summary about the object. Returns object invisibly.
parameter_info(lgpmodel): Get a parameter summary (bounds and priors). Returns a data.frame.
component_info(lgpmodel): Get a data frame with information about each model component.
num_components(lgpmodel): Get number of model components. Returns a positive integer.
covariate_info(lgpmodel): Get covariate information.
component_names(lgpmodel): Get names of model components.
is_f_sampled(lgpmodel): Determine if inference of the model requires sampling the latent signal f (and its components).
Slots
formula: An object of class lgpformula
data: The original unmodified data.
stan_input: The data to be given as input to rstan::sampling
var_names: List of variable names grouped by type.
var_scalings: A named list with fields
  y - Response variable normalization function and its inverse operation. Must be an lgpscaling object.
  x_cont - Continuous covariate normalization functions and their inverse operations. Must be a named list where each element is an lgpscaling object.
var_info: A named list with fields
  x_cat_levels - Names of the levels of categorical covariates before converting from factor to numeric.
info: Other info in text format.
sample_f: Whether the signal f is sampled or marginalized.
full_prior: Complete prior information.
An S4 class to represent the right-hand side of an lgp formula
Description
An S4 class to represent the right-hand side of an lgp formula
Slots
summands: a list of one or more lgpterms
See Also
See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.
An S4 class to represent variable scaling
Description
An S4 class to represent variable scaling
Slots
loc: original location (mean)
scale: original scale (standard deviation)
var_name: variable name
An S4 class to represent a data set simulated using the additive GP formalism
Description
An S4 class to represent a data set simulated using the additive GP formalism
Usage
## S4 method for signature 'lgpsim'
show(object)
## S4 method for signature 'lgpsim,missing'
plot(x, y, ...)
Arguments
object | an lgpsim object |
x | an lgpsim object to plot |
y | not used |
... | optional arguments passed to |
Methods (by generic)
show(lgpsim): Show summary of object.
plot(x = lgpsim, y = missing): Plot the data and generating process. For more information see plot_sim.
Slots
data: the actual data
response: name of the response variable in the data
components: the drawn function components
kernel_matrices: the covariance matrices for each gp
info: A list with fields
  par_ell: the used lengthscale parameters
  par_cont: the parameters used to generate the continuous covariates
  p_signal: signal proportion
effect_times: A list with fields
  true: possible true effect times that generate the disease effect
  observed: possible observed effect times
An S4 class to represent one formula term
Description
An S4 class to represent one formula term
Slots
factors: a list of at most two lgpexprs
See Also
See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.
Print a model summary.
Description
Print a model summary.
Usage
model_summary(object, digits = 3)
param_summary(object, digits = 3)
Arguments
object | a model or fit |
digits | number of digits to round floats to |
Value
object invisibly.
Create test input points for prediction
Description
Replaces a continuous variable x in the data frame, and possibly another continuous variable x_ns derived from it, with new values, for each level of a grouping factor (usually id).
Usage
new_x(data, x_values, group_by = "id", x = "age", x_ns = NULL)
Arguments
data | A data frame. Can also be an lgpfit or lgpmodel object, in which case data is extracted from it. |
x_values | the values of |
group_by | name of the grouping variable, must be a factor in |
x | name of the variable along which to extend, must be a numeric in |
x_ns | name of a nonstationary variable derived from |
Value
a data frame containing the following columns:
- all factors in the original data
- x
- x_ns (unless it is NULL)
See Also
Other data frame handling functions: add_dis_age(), add_factor(), add_factor_crossing(), adjusted_c_hat(), split()
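As a minimal sketch of how new_x might be called, the following builds a prediction grid from the built-in testdata_001 data set, which has a factor id and a numeric age column. The grid values are arbitrary and chosen only for illustration.

```r
library(lgpr)

# Illustrative grid of ages (arbitrary values, not taken from the package docs)
age_grid <- seq(0, 100, by = 10)

# The grid is repeated for each level of the grouping factor
x_pred <- new_x(testdata_001, x_values = age_grid, group_by = "id", x = "age")

head(x_pred)
```

A data frame built this way is the kind of input that the x argument of pred expects.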
Operations on formula terms and expressions
Description
Operations on formula terms and expressions
Usage
## S4 method for signature 'lgprhs,lgprhs'
e1 + e2
## S4 method for signature 'lgpterm,lgpterm'
e1 + e2
## S4 method for signature 'lgprhs,lgpterm'
e1 + e2
## S4 method for signature 'lgpterm,lgpterm'
e1 * e2
Arguments
e1 | The first sum, term or expression |
e2 | The second sum, term or expression |
Value
The behaviour and return type depend on the types of e1 and e2. You can
Plot a generated/fit model component
Description
Data frames specified in arguments df and df_err must have a format where
- The first column is the grouping factor (usually id).
- The second column is the x-axis variable (usually age).
- The third column is the coloring factor. If the name of the third column is NA, coloring is not done.
- A column named y must contain the y-axis variable (not for df_err).
- A column named lower (upper) must contain the lower (upper) bound of error bar (only for df_err).
- The posterior draw using which the fit has been computed can be specified with a factor named _draw_ (only for df).
Usage
plot_api_c(
  df,
  df_err = NULL,
  alpha = 1,
  alpha_err = 0.2,
  no_err = FALSE,
  no_line = FALSE
)
Arguments
df | a data frame |
df_err | a data frame |
alpha | line opacity |
alpha_err | ribbon opacity |
no_err | hide error bar even when it would normally be plotted? |
no_line | hide line even when it would normally be plotted? |
Value
A ggplot object.
See Also
Other internal plot API functions: plot_api_g()
Plot longitudinal data and/or model fit so that each subject/group hastheir own panel
Description
Data frames specified in arguments df_data, df_signal, df_fit, and df_fit_err must have a format where
- the first column is the grouping factor (usually id)
- the second column is the x-axis variable (usually age)
- a column named y must contain the y-axis variable (not for df_fit_err)
- a column named lower (upper) must contain the lower (upper) bound of error bar (only for df_fit_err)
- a column named draw must be a factor that specifies the posterior draw using which the fit has been computed (only for df_fit)
Usage
plot_api_g(
  df_data,
  df_signal = NULL,
  df = NULL,
  df_err = NULL,
  teff_signal = NULL,
  teff_obs = NULL,
  i_test = NULL,
  color_signal = color_palette(2)[1],
  color = color_palette(2)[2],
  color_err = colorset("red", "light_highlight"),
  color_vlines = colorset("gray", "mid_highlight"),
  alpha = 1,
  alpha_err = 0.5,
  nrow = NULL,
  ncol = NULL,
  y_transform = function(x) x
)
Arguments
df_data | A data frame containing the observations. |
df_signal | A data frame containing the true signal. Omitted if |
df | A data frame containing the model fit, or a list of data frames. The list version can be used for example so that each list element corresponds to the fit computed using one parameter draw. Omitted if |
df_err | A data frame containing error bars. Omitted if |
teff_signal | A named vector containing true effect times used to generate the signal. Omitted if |
teff_obs | A named vector containing observed effect times. Omitted if |
i_test | Indices of test points. |
color_signal | Line color for true signal. |
color | Line color for model fit. |
color_err | Color of the error ribbon. |
color_vlines | Two line colors for vertical lines (true and obs. effect time). |
alpha | Line opacity for model fit. |
alpha_err | Opacity of the error ribbon. |
nrow | number of rows, an argument for |
ncol | number of columns, an argument for |
y_transform | A function to be applied to the third column of |
Value
A ggplot object.
See Also
Other internal plot API functions: plot_api_c()
Visualize all model components
Description
This calls plot_f for all model components.
Usage
plot_components(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  color_by = NA,
  no_err = FALSE,
  ylim = NULL,
  draw = TRUE,
  nrow = NULL,
  ncol = NULL,
  gg_add = NULL,
  x = NULL,
  ...
)
Arguments
fit | An object of class lgpfit. |
pred | An object of class GaussianPrediction or Prediction. If |
group_by | name of the grouping variable (use |
t_name | name of the x-axis variable |
MULT_STD | a multiplier for standard deviation |
verbose | Can this print any messages? |
draws | Only has effect if |
reduce | Only has effect if |
color_by | Names of coloring factors. Can have length 1 or equal to the number of components. See the |
no_err | Should the error ribbons be skipped even though they otherwise would be shown? Can have length 1 or equal to number of components + 1. See the |
ylim | a vector of length 2 (upper and lower y-axis limits), or NULL |
draw | if this is TRUE, the plot grid is drawn using |
nrow | number of grid rows |
ncol | number of grid columns |
gg_add | additional ggplot object to add to each plot |
x | Deprecated argument. This is now taken from the |
... | additional arguments to |
Value
a list of ggplot objects invisibly
See Also
Other main plot functions: plot_draws(), plot_pred()
Visualizing longitudinal data
Description
Visualizing longitudinal data
Usage
plot_data(
  data,
  x_name = "age",
  y_name = "y",
  group_by = "id",
  facet_by = NULL,
  color_by = NULL,
  highlight = NULL,
  main = NULL,
  sub = NULL
)
Arguments
data | A data frame. |
x_name | Name of x-axis variable. |
y_name | Name of the y-axis variable. |
group_by | Name of grouping variable (must be a factor). |
facet_by | Name of the faceting variable (must be a factor). |
color_by | Name of coloring variable (must be a factor). |
highlight | Value of category of the |
main | main plot title |
sub | plot subtitle |
Value
a ggplot object
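A short sketch of a typical call, using the built-in testdata_001 data set, whose columns id and sex are factors. The choice of coloring variable is only illustrative.

```r
library(lgpr)

# Plot y against age, one line per individual, colored by sex
p <- plot_data(
  testdata_001,
  x_name = "age",
  y_name = "y",
  group_by = "id",
  color_by = "sex"
)
print(p)
```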
Visualize the distribution of parameter draws
Description
Visualize the distribution of parameter draws
Usage
plot_draws(
  fit,
  type = "intervals",
  regex_pars = c("alpha", "ell", "wrp", "sigma", "phi", "gamma"),
  ...
)
plot_beta(fit, type = "dens", verbose = TRUE, ...)
plot_warp(
  fit,
  num_points = 300,
  window_size = 48,
  color = colorset("red", "dark"),
  alpha = 0.5
)
plot_effect_times(fit, type = "areas", verbose = TRUE, ...)
Arguments
fit | an object of class lgpfit |
type | plot type, allowed options are "intervals", "dens", "areas", and "trace" |
regex_pars | regex for parameter names to plot |
... | additional arguments for the |
verbose | Can any output be printed? |
num_points | number of plot points |
window_size | width of time window |
color | line color |
alpha | line alpha |
Value
a ggplot object or list of them
Functions
- plot_draws(): visualizes the distribution of any set of model parameters (defaults to kernel hyperparameters and possible observation model parameters)
- plot_beta(): visualizes the distribution of the individual-specific disease effect magnitude parameter draws
- plot_warp(): visualizes the input warping function for different draws of the warping steepness parameter
- plot_effect_times(): visualizes the input warping function for different parameter draws
See Also
Other main plot functions: plot_components(), plot_pred()
Visualize input warping function with several steepness parameter values
Description
Visualize input warping function with several steepness parameter values
Usage
plot_inputwarp(wrp, x, color = colorset("red", "dark"), alpha = 0.5)
Arguments
wrp | a vector of values of the warping steepness parameter |
x | a vector of input values |
color | line color |
alpha | line alpha |
Value
a ggplot object
Plot the inverse gamma-distribution pdf
Description
Plot the inverse gamma-distribution pdf
Usage
plot_invgamma(
  alpha,
  beta,
  by = 0.01,
  log = FALSE,
  IQR = 0.95,
  return_quantiles = FALSE,
  linecolor = colorset("red", "dark"),
  fillcolor = colorset("red", "mid")
)
Arguments
alpha | positive real number |
beta | positive real number |
by | grid size |
log | is log-scale used? |
IQR | inter-quantile range width |
return_quantiles | should this return a list |
linecolor | line color |
fillcolor | fill color |
Value
a ggplot object
See Also
Other functions related to the inverse-gamma distribution: dinvgamma_stanlike(), priors
Visualizing model predictions or inferred covariate effects
Description
- Function draws at data points can be visualized using plot_pred. If the pred argument is NULL, it is computed using the pred function with x=NULL.
- The total signal f or any of its additive components can be plotted using plot_f.
Usage
plot_pred(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  x = NULL,
  ...
)
plot_f(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  comp_idx = NULL,
  color_by = NA,
  x = NULL,
  ...
)
Arguments
fit | An object of class lgpfit. |
pred | An object of class GaussianPrediction or Prediction. If |
group_by | name of the grouping variable (use |
t_name | name of the x-axis variable |
MULT_STD | a multiplier for standard deviation |
verbose | Can this print any messages? |
draws | Only has effect if |
reduce | Only has effect if |
x | Deprecated argument. This is now taken from the |
... | additional arguments to |
comp_idx | Index of component to plot. The total sum is plotted if this is |
color_by | name of coloring factor |
Value
a ggplot object
See Also
Other main plot functions: plot_components(), plot_draws()
Visualize an lgpsim object (simulated data)
Description
Visualize an lgpsim object (simulated data)
Usage
plot_sim(
  simdata,
  group_by = "id",
  x_name = "age",
  h_name = "h",
  y_name = "y",
  comp_idx = NULL,
  color_by = NA,
  verbose = TRUE,
  ...
)
Arguments
simdata | an object of class lgpsim |
group_by | grouping factor |
x_name | name of x-axis variable |
h_name | name of the signal in |
y_name | name of response variable |
comp_idx | Possible index of a component to be shown. If this is NULL, the data and total signal are shown. |
color_by | coloring factor |
verbose | should some information be printed? |
... | additional arguments to |
Value
a ggplot object
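The following sketch shows one way to simulate a small data set and visualize it. The simulate_data call is the one from that function's own examples; the seed is an arbitrary addition for reproducibility.

```r
library(lgpr)

set.seed(123) # arbitrary seed so that the simulated data are reproducible

# 4 individuals measured at 5 time points, Gaussian noise
sim <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3)

# The simulated data frame is stored in the 'data' slot of the lgpsim object
head(sim@data)

# Plot the data together with the generating signal
p <- plot_sim(sim)
print(p)
```

Passing comp_idx to plot_sim shows a single generated component instead of the data and total signal.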
Graphical posterior predictive checks
Description
Graphical posterior predictive checks
Usage
ppc(fit, data = NULL, fun = default_ppc_fun(fit), verbose = TRUE, ...)
Arguments
fit | An object of class lgpfit that has been created with |
data | the original data frame (deprecated argument with no effect, now obtained from fit object) |
fun |
|
verbose | Can this print any messages? |
... | additional arguments passed to the default |
Value
a ggplot object
See Also
Introduction to graphical posterior predictive checks: here. Prior predictive check can be done by calling prior_pred and then bayesplot::pp_check().
Posterior predictions and function posteriors
Description
- If fit is for a model that marginalizes the latent signal f (i.e. is_f_sampled(fit) is FALSE), this computes the analytic conditional posterior distributions of each model component, their sum, and the conditional predictive distribution. All these are computed for each (hyper)parameter draw (defined by draws), or other parameter set (obtained by a reduction defined by reduce). Results are stored in a GaussianPrediction object which is then returned.
- If fit is for a model that samples the latent signal f (i.e. is_f_sampled(fit) is TRUE), this will extract these function samples, compute their sum, and a version of the sum f that is transformed through the inverse link function. If x is not NULL, the function draws are extrapolated to the points specified by x using kernel regression. Results are stored in a Prediction object which is then returned.
Usage
pred(
  fit,
  x = NULL,
  reduce = function(x) base::mean(x),
  draws = NULL,
  verbose = TRUE,
  STREAM = get_stream(),
  c_hat_pred = NULL,
  force = FALSE,
  debug_kc = FALSE
)
Arguments
fit | An object of class lgpfit. |
x | A data frame of points where function posterior distributions and predictions should be computed or sampled. The function |
reduce | Reduction for parameter draws. Can be a function that is applied to reduce all parameter draws into one parameter set, or |
draws | Indices of parameter draws to use, or |
verbose | Should more information and a possible progress bar be printed? |
STREAM | External pointer. By default obtained with |
c_hat_pred | This is only used if the latent signal |
force | This is by default |
debug_kc | If this is |
Value
An object of class GaussianPrediction or Prediction.
See Also
Other main functions: create_model(), draw_pred(), get_draws(), lgp(), prior_pred(), sample_model()
Prior (predictive) sampling
Description
These functions take an lgpmodel object, and
- prior_pred samples from the prior predictive distribution of the model
- sample_param_prior samples only its parameter prior using sampling
Usage
prior_pred(
  model,
  verbose = TRUE,
  quiet = FALSE,
  refresh = 0,
  STREAM = get_stream(),
  ...
)
sample_param_prior(model, verbose = TRUE, quiet = FALSE, ...)
Arguments
model | An object of class lgpmodel. |
verbose | Should more information and a possible progress bar be printed? |
quiet | This forces |
refresh | Argument for |
STREAM | External pointer. By default obtained with |
... | Additional arguments for |
Value
prior_pred returns a list with components
- y_draws: A matrix containing the prior predictive draws as rows. Can be passed to bayesplot::pp_check() for graphical prior predictive checking.
- pred_draws: an object of class Prediction, containing prior draws of each model component and their sum
- param_draws: a stanfit object of prior parameter draws (obtained by calling sample_param_prior internally)
sample_param_prior returns an object of class stanfit
See Also
Other main functions: create_model(), draw_pred(), get_draws(), lgp(), pred(), sample_model()
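A hedged sketch of prior predictive sampling follows. It assumes that create_model (documented elsewhere in this manual) accepts a formula and a data frame as its first two arguments; the formula itself is an illustrative assumption, not a recommended model.

```r
library(lgpr)

# Assumed call: create_model(formula, data); the formula is only illustrative
model <- create_model(y ~ age + age | id, testdata_001)

# Draw from the prior predictive distribution of the model
pp <- prior_pred(model, refresh = 0)

# Each row of y_draws is one prior predictive draw
dim(pp$y_draws)

# The prior parameter draws are returned as a stanfit object
class(pp$param_draws)
```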
Convert given prior to numeric format
Description
Convert given prior to numeric format
Usage
prior_to_num(desc)
Arguments
desc | Prior description as a named list, containing fields
Other list fields are interpreted as hyperparameters. |
Value
a named list of parsed options
Prior definitions
Description
These use the same parametrizations as defined in the 'Stan' documentation. See the docs for gamma and inverse gamma distributions.
Usage
uniform(square = FALSE)
normal(mu, sigma, square = FALSE)
student_t(nu, square = FALSE)
gam(shape, inv_scale, square = FALSE)
igam(shape, scale, square = FALSE)
log_normal(mu, sigma, square = FALSE)
bet(a, b)
Arguments
square | is prior for a square-transformed parameter? |
mu | mean |
sigma | standard deviation |
nu | degrees of freedom |
shape | shape parameter (alpha) |
inv_scale | inverse scale parameter (beta) |
scale | scale parameter (beta) |
a | shape parameter |
b | shape parameter |
Value
a named list
See Also
Other functions related to the inverse-gamma distribution: dinvgamma_stanlike(), plot_invgamma()
Examples
# Log-normal prior
log_normal(mu = 1, sigma = 1)
# Cauchy prior
student_t(nu = 1)
# Exponential prior with rate = 0.1
gam(shape = 1, inv_scale = 0.1)
# Create similar priors as in LonGP (Cheng et al., 2019)
# Not recommended, because a lengthscale close to 0 is possible.
a <- log(1) - log(0.1)
log_normal(mu = 0, sigma = a / 2) # for continuous lengthscale
student_t(nu = 4) # for interaction lengthscale
igam(shape = 0.5, scale = 0.005, square = TRUE) # for sigma
Function for reading the built-in proteomics data
Description
Function for reading the built-in proteomics data
Usage
read_proteomics_data(parentDir = NULL, protein = NULL, verbose = TRUE)
Arguments
parentDir | Path to local parent directory for the data. If this is |
protein | Index or name of protein. |
verbose | Can this print some output? |
Value
a data.frame
Assess component relevances
Description
Assess component relevances
Usage
relevances(fit, reduce = function(x) base::mean(x), verbose = TRUE, ...)
Arguments
fit | an object of class |
reduce | a function to apply to reduce the relevances given each parameter draw into one value |
verbose | Can this print any messages? |
... | currently has no effect |
Value
a named vector with length equal to num_comps + 1
S4 generics for lgpfit, lgpmodel, and other objects
Description
S4 generics for lgpfit, lgpmodel, and other objects
Usage
parameter_info(object, digits)
component_info(object)
covariate_info(object)
component_names(object)
get_model(object)
is_f_sampled(object)
get_stanfit(object)
postproc(object, ...)
contains_postproc(object)
clear_postproc(object)
num_paramsets(object)
num_evalpoints(object)
num_components(object)
Arguments
object | object for which to apply the generic |
digits | number of digits to show |
... | additional optional arguments to pass |
Value
- parameter_info returns a data frame with one row for each parameter and columns for parameter name, parameter bounds, and the assigned prior
- component_info returns a data frame with one row for each model component, and columns encoding information about model components
- covariate_info returns a list with names continuous and categorical, with information about both continuous and categorical covariates
- component_names returns a character vector with component names
- is_f_sampled returns a logical value
- get_stanfit returns a stanfit (rstan)
- postproc applies postprocessing and returns an updated lgpfit
- clear_postproc removes postprocessing information and returns an updated lgpfit
- num_paramsets, num_evalpoints and num_components return an integer
Functions
- parameter_info(): Get parameter information (priors etc.).
- component_info(): Get component information.
- covariate_info(): Get covariate information.
- component_names(): Get component names.
- get_model(): Get lgpmodel object.
- is_f_sampled(): Determine if signal f is sampled or marginalized.
- get_stanfit(): Extract stanfit object.
- postproc(): Perform postprocessing.
- contains_postproc(): Determine if object contains postprocessing information.
- clear_postproc(): Clear postprocessing information (to reduce size of object).
- num_paramsets(): Get number of parameter sets.
- num_evalpoints(): Get number of points where posterior is evaluated.
- num_components(): Get number of model components.
See Also
To find out which methods have been implemented for which classes, see lgpfit, lgpmodel, Prediction and GaussianPrediction.
Fitting a model
Description
- sample_model takes an lgpmodel object and fits it using sampling.
- optimize_model takes an lgpmodel object and fits it using optimizing.
Usage
sample_model(
  model,
  verbose = TRUE,
  quiet = FALSE,
  skip_postproc = is_f_sampled(model),
  ...
)
optimize_model(model, ...)
Arguments
model | An object of class lgpmodel. |
verbose | Can messages be printed? |
quiet | Should all output messages be suppressed? You also need to set |
skip_postproc | Should all postprocessing be skipped? If this is |
... | Optional arguments passed to |
Value
- sample_model returns an object of class lgpfit containing the parameter draws, the original model object, and possible postprocessing results. See documentation of lgpfit for more information.
- optimize_model directly returns the list returned by optimizing. See its documentation for more information.
See Also
Other main functions: create_model(), draw_pred(), get_draws(), lgp(), pred(), prior_pred()
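The sketch below strings together model creation, sampling, and relevance assessment on the built-in testdata_001. It assumes that create_model takes a formula and a data frame, and that iter, chains and refresh are forwarded through ... to the Stan sampler; the formula and the small sampler settings are illustrative only and not suitable for real inference.

```r
library(lgpr)

# Assumed call: create_model(formula, data); the formula is only illustrative
model <- create_model(y ~ age + age | id, testdata_001)

# Sampler settings are kept small so that the example runs quickly
fit <- sample_model(model, iter = 500, chains = 1, refresh = 0)

# Summaries of the fitted model
model_summary(fit)

# Named vector of relevances, with length num_comps + 1
rel <- relevances(fit)
print(rel)

# Strict selection with the default threshold
sel <- select(fit, threshold = 0.95)
print(sel)
```

For a model that samples the latent signal, skip_postproc defaults to TRUE, so postprocessing can be run afterwards with postproc.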
Select relevant components
Description
- select performs strict selection, returning either TRUE or FALSE for each component.
- select.integrate is like select, but instead of a fixed threshold, computes probabilistic selection by integrating over a threshold density.
- select_freq performs the selection separately using each parameter draw and returns the frequency at which each component was selected.
- select_freq.integrate is like select_freq, but instead of a fixed threshold, computes probabilistic selection frequencies by integrating over a threshold density.
Usage
select(fit, reduce = function(x) base::mean(x), threshold = 0.95, ...)
select_freq(fit, threshold = 0.95, ...)
select.integrate(
  fit,
  reduce = function(x) base::mean(x),
  p = function(x) stats::dbeta(x, 100, 5),
  h = 0.01,
  verbose = TRUE,
  ...
)
select_freq.integrate(
  fit,
  p = function(x) stats::dbeta(x, 100, 5),
  h = 0.01,
  verbose = TRUE,
  ...
)
Arguments
fit | An object of class |
reduce | The |
threshold | Threshold for relevance sum. Must be a value between 0 and 1. |
... | Additional arguments to |
p | A threshold density over interval [0,1]. |
h | A discretization parameter for computing a quadrature. |
verbose | Should this show a progress bar? |
Value
See description.
Printing formula object info using the show generic
Description
Printing formula object info using the show generic
Usage
## S4 method for signature 'lgpformula'
show(object)
## S4 method for signature 'lgprhs'
show(object)
## S4 method for signature 'lgpterm'
show(object)
Arguments
object | an object of some S4 class |
Value
the object invisibly
Simulate latent function components for longitudinal data analysis
Description
Simulate latent function components for longitudinal data analysis
Usage
sim.create_f(
  X,
  covariates,
  relevances,
  lengthscales,
  X_affected,
  dis_fun,
  bin_kernel,
  steepness,
  vm_params,
  force_zeromean
)
Arguments
X | input data matrix (generated by |
covariates | Integer vector that defines the types of covariates (other than id and age). Different integers correspond to the following covariate types:
|
relevances | Relative relevance of each component. Must be a vector so that |
lengthscales | A vector so that |
X_affected | which individuals are affected by the disease |
dis_fun | A function or a string that defines the disease effect. If this is a function, that function is used to generate the effect. If |
bin_kernel | Should the binary kernel be used for categorical covariates? If this is |
steepness | Steepness of the input warping function. This is only used if the disease component is in the model. |
vm_params | Parameters of the variance mask function. This is only needed if |
force_zeromean | Should each component (excluding the disease age component) be forced to have a zero mean? |
Value
a data frame FFF where one column corresponds to one additive component
Create an input data frame X for simulated data
Description
Create an input data frame X for simulated data
Usage
sim.create_x(
  N,
  covariates,
  names,
  n_categs,
  t_data,
  t_jitter,
  t_effect_range,
  continuous_info
)
Arguments
N | Number of individuals. |
covariates | Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:
|
names | Covariate names. |
n_categs | An integer vector defining the number of categories for each categorical covariate, so that |
t_data | Measurement times (same for each individual, unless |
t_jitter | Standard deviation of the jitter added to the given measurement times. |
t_effect_range | Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual. |
continuous_info | Info for generating continuous covariates. Must be a list containing fields
|
Value
a list
Simulate noisy observations
Description
Simulate noisy observations
Usage
sim.create_y(noise_type, f, snr, phi, gamma, N_trials)
Arguments
noise_type | Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial). |
f | The underlying signal. |
snr | The desired signal-to-noise ratio. This argument is valid only when |
phi | The inverse overdispersion parameter for negative binomial data. The variance is |
gamma | The dispersion parameter for beta-binomial data. |
N_trials | The number of trials parameter for binomial data. |
Value
A list out, where
- out$h is f mapped through an inverse link function (times N_trials if noise_type is binomial or beta-binomial)
- out$y is the noisy response variable.
Compute all kernel matrices when simulating data
Description
Compute all kernel matrices when simulating data
Usage
sim.kernels(
  X,
  types,
  lengthscales,
  X_affected,
  bin_kernel,
  useMaskedVarianceKernel,
  steepness,
  vm_params
)
Arguments
X | covariates |
types | vector of covariate types, so that
|
lengthscales | vector of lengthscales |
X_affected | which individuals are affected by the disease |
bin_kernel | whether or not binary (mask) kernel should be used for categorical covariates (if not, the zerosum kernel is used) |
useMaskedVarianceKernel | should the masked variance kernel be used for drawing the disease component |
steepness | steepness of the input warping function |
vm_params | parameters of the variance mask function |
Value
a 3D array
Generate an artificial longitudinal data set
Description
Generate an artificial longitudinal data set.
Usage
simulate_data(
  N,
  t_data,
  covariates = c(),
  names = NULL,
  relevances = c(1, 1, rep(1, length(covariates))),
  n_categs = rep(2, sum(covariates %in% c(2, 3))),
  t_jitter = 0,
  lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))),
  f_var = 1,
  noise_type = "gaussian",
  snr = 3,
  phi = 1,
  gamma = 0.2,
  N_affected = round(N/2),
  t_effect_range = "auto",
  t_observed = "after_0",
  c_hat = 0,
  dis_fun = "gp_warp_vm",
  bin_kernel = FALSE,
  steepness = 0.5,
  vm_params = c(0.025, 1),
  continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)),
  N_trials = 1,
  force_zeromean = TRUE
)
Arguments
N | Number of individuals. |
t_data | Measurement times (same for each individual, unless |
covariates | Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:
|
names | Covariate names. |
relevances | Relative relevance of each component. Must be a vector so that |
n_categs | An integer vector defining the number of categories for each categorical covariate, so that |
t_jitter | Standard deviation of the jitter added to the given measurement times. |
lengthscales | A vector so that |
f_var | variance of f |
noise_type | Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial). |
snr | The desired signal-to-noise ratio. This argument is valid only when |
phi | The inverse overdispersion parameter for negative binomial data. The variance is |
gamma | The dispersion parameter for beta-binomial data. |
N_affected | Number of diseased individuals that are affected by the disease. This defaults to the number of diseased individuals. This argument can only be given if |
t_effect_range | Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual. |
t_observed | Determines how the disease effect time is observed. This can be any function that takes the real disease effect time as an argument and returns the (possibly randomly generated) observed onset/initiation time. Alternatively, this can be a string of the form |
c_hat | a constant added to f |
dis_fun | A function or a string that defines the disease effect. If this is a function, that function is used to generate the effect. If |
bin_kernel | Should the binary kernel be used for categorical covariates? If this is |
steepness | Steepness of the input warping function. This is only used if the disease component is in the model. |
vm_params | Parameters of the variance mask function. This is only needed if |
continuous_info | Info for generating continuous covariates. Must be a list containing fields
|
N_trials | The number of trials parameter for binomial data. |
force_zeromean | Should each component (excluding the disease age component) be forced to have a zero mean? |
Value
An object of class lgpsim.
Examples
# Generate Gaussian data
dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3)
# Generate negative binomially (NB) distributed count data
dat <- simulate_data(
  N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb",
  phi = 2
)
Split data into training and test sets
Description
- split_by_factor splits according to given factor
- split_within_factor splits according to given data point indices within the same level of a factor
- split_within_factor_random selects k points from each level of a factor uniformly at random as test data
- split_random splits uniformly at random
- split_data splits according to given data rows
Usage
split_by_factor(data, test, var_name = "id")
split_within_factor(data, idx_test, var_name = "id")
split_within_factor_random(data, k_test = 1, var_name = "id")
split_random(data, p_test = 0.2, n_test = NULL)
split_data(data, i_test, sort_ids = TRUE)
Arguments
data | a data frame |
test | the levels of the factor that will be used as test data |
var_name | name of a factor in the data |
idx_test | data point indices within each level of the factor |
k_test | desired number of test data points per each level of the factor |
p_test | desired proportion of test data |
n_test | desired number of test data points (if NULL, |
i_test | test data row indices |
sort_ids | should the test indices be sorted into increasing order |
Value
a named list with names train, test, i_train and i_test
See Also
Other data frame handling functions: add_dis_age(), add_factor(), add_factor_crossing(), adjusted_c_hat(), new_x()
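A brief sketch of two of the splitting functions applied to the built-in testdata_001, whose id factor has levels 1 to 4. The held-out levels, the test proportion and the seed are arbitrary choices for illustration.

```r
library(lgpr)

# Hold out all observations of individuals 1 and 2 as test data
s <- split_by_factor(testdata_001, test = c("1", "2"), var_name = "id")
names(s)
nrow(s$train)
nrow(s$test)

# Alternatively, hold out a proportion of the rows uniformly at random
set.seed(1) # arbitrary seed for reproducibility
s2 <- split_random(testdata_001, p_test = 0.25)
nrow(s2$test)
```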
A very small artificial test data, used mostly for unit tests
Description
A very small artificial test data, used mostly for unit tests
Usage
testdata_001
Format
A data frame with 24 rows and 6 variables:
- id
individual id, a factor with levels: 1, 2, 3, 4
- age
age
- dis_age
disease-related age
- blood
a continuous variable
- sex
a factor with 2 levels: Male, Female
- y
a continuous variable
See Also
Other built-in datasets: testdata_002
Medium-size artificial test data, used mostly for tutorials
Description
Medium-size artificial test data, used mostly for tutorials
Usage
testdata_002
Format
A data frame with 96 rows and 6 variables:
- id
individual id, a factor with levels: 01-12
- age
age
- diseaseAge
disease-related age
- sex
a factor with 2 levels: Male, Female
- group
a factor with 2 levels: Case, Control
- y
a continuous variable
See Also
Other built-in datasets: testdata_001
Validate S4 class objects
Description
Validate S4 class objects
Usage
validate_lgpexpr(object)
validate_lgpformula(object)
validate_lgpscaling(object)
validate_lgpfit(object)
validate_GaussianPrediction(object)
validate_Prediction(object)
Arguments
object | an object to validate |
Value
TRUE if valid, otherwise reasons for invalidity
Variance masking function
Description
Variance masking function
Usage
var_mask(x, stp)
Arguments
x | a vector of length |
stp | a positive real number (steepness of mask function) |
Value
a vector of length n
See Also
Other kernel utility functions: warp_input()
Input warping function
Description
Input warping function
Usage
warp_input(x, a)
Arguments
x | a vector of length |
a | steepness of the warping function rise |
Value
a vector of warped inputs w(x), length n
See Also
Other kernel utility functions: var_mask()
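The two kernel utility functions can be tried out on a simple grid, as in the sketch below. The input grid and the steepness values are arbitrary and only meant to show the shape of the functions.

```r
library(lgpr)

x <- seq(-10, 10, by = 0.5) # illustrative input grid

w <- warp_input(x, a = 0.5) # warped inputs w(x)
m <- var_mask(x, stp = 0.5) # variance mask values

# Both results have one value per input point
length(w) == length(x)
length(m) == length(x)

# Plot both curves against the input
plot(x, w, type = "l", ylab = "value")
lines(x, m, lty = 2)
```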