
Title:

Longitudinal Gaussian Process Regression

Version:

1.2.5

Description:

Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan'. The modeling approach and methods are described in detail in Timonen et al. (2021) <doi:10.1093/bioinformatics/btab021>.

License:

GPL (≥ 3)

Encoding:

UTF-8

LazyData:

true

Biarch:

true

Depends:

R (≥ 3.4.0), methods

Imports:

Rcpp (≥ 0.12.0), RcppParallel (≥ 5.0.2), RCurl (≥ 1.98), rstan (≥ 2.26.0), rstantools (≥ 2.3.1), bayesplot (≥ 1.7.0), MASS (≥ 7.3-50), stats (≥ 3.4), ggplot2 (≥ 3.1.0), gridExtra (≥ 0.3.0)

LinkingTo:

BH (≥ 1.75.0-0), Rcpp (≥ 1.0.6), RcppEigen (≥ 0.3.3.9.1), RcppParallel (≥ 5.0.2), rstan (≥ 2.26.0), StanHeaders (≥ 2.26.0)

SystemRequirements:

GNU make

NeedsCompilation:

yes

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown, testthat, covr

URL:

https://github.com/jtimonen/lgpr

BugReports:

https://github.com/jtimonen/lgpr/issues

VignetteBuilder:

knitr

Packaged:

2025-10-30 18:12:13 UTC; juhotimonen

Author:

Juho Timonen [aut, cre], Andrew Johnson [ctb]

Maintainer:

Juho Timonen <juho.timonen@iki.fi>

Repository:

CRAN

Date/Publication:

2025-10-30 23:50:14 UTC

The 'lgpr' package.

Description

Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan' (rstan). The modeling approach and methods are described in detail in Timonen et al. (2021).

Core functions

Main functionality of the package consists of creating and fitting an additive GP model:

lgp: Specify and fit an additive GP model with one command.
create_model: Define an lgpmodel object.
sample_model: Fit a model by sampling the posterior distribution of its parameters and create an lgpfit object.
pred: Computing model predictions and inferred covariate effects after fitting a model.
relevances: Assessing covariate relevances after fitting a model.
prior_pred: Prior predictive sampling to check if your prior makes sense.

Visualization

plot_pred: Plot model predictions.
plot_components: Visualize inferred model components.
plot_draws: Visualize parameter draws.
plot_data: Visualize longitudinal data.

Data

The data that you wish to analyze with 'lgpr' should be in an R data.frame where columns correspond to measured variables and rows correspond to observations. Some functions that can help working with such data frames are:

new_x: Creating new test points where the posterior distribution of any function component or sum of all components, or the posterior predictive distribution can be computed after model fitting.
Other functions: add_factor, add_factor_crossing, add_dis_age, adjusted_c_hat.
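
As a rough illustration of the expected layout, the sketch below builds a small long-format data frame in base R. The column names (id, age, sex, y) and the values are made up for this example; the only points taken from the documentation are that rows are observations, continuous variables are numeric, and categorical variables are factors.

```r
# Illustrative long-format data: one row per observation.
# Column names and values are hypothetical, chosen only for this sketch.
set.seed(1)
dat <- data.frame(
  id  = factor(rep(1:3, each = 4)),               # categorical covariate, stored as factor
  age = rep(c(12, 24, 36, 48), times = 3),        # continuous covariate, stored as numeric
  sex = factor(rep(c("F", "M", "F"), each = 4)),  # categorical covariate, stored as factor
  y   = rnorm(12)                                 # numeric response without missing values
)
str(dat)
```

A data frame of this shape could then be passed as the data argument of the modeling functions.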

Vignettes and tutorials

See https://jtimonen.github.io/lgpr-usage/index.html. The tutorials focus on code and use cases, whereas the "Mathematical description of lgpr models" vignette describes the statistical models and how they can be customized in 'lgpr'.

Citation

Run citation("lgpr") to get citation information.

Feedback

Bug reports, PRs, enhancement ideas or user experiences in general are welcome and appreciated. Create an issue on GitHub or email the author.

Author(s)

Juho Timonen (first.last at iki.fi)

References

Timonen, J. et al. (2021). lgpr: an interpretable non-parametric method for inferring covariate effects from longitudinal data. Bioinformatics, doi:10.1093/bioinformatics/btab021.
Carpenter, B. et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76(1).

An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model

Description

An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model

Usage

## S4 method for signature 'GaussianPrediction'
show(object)

## S4 method for signature 'GaussianPrediction'
component_names(object)

## S4 method for signature 'GaussianPrediction'
num_components(object)

## S4 method for signature 'GaussianPrediction'
num_paramsets(object)

## S4 method for signature 'GaussianPrediction'
num_evalpoints(object)

Arguments

object

GaussianPrediction object for which to apply a class method.

Methods (by generic)

show(GaussianPrediction): Print a summary about the object.
component_names(GaussianPrediction): Get names of components.
num_components(GaussianPrediction): Get number of components.
num_paramsets(GaussianPrediction): Get number of parameter combinations (different parameter vectors) using which predictions were computed.
num_evalpoints(GaussianPrediction): Get number of points where predictions were computed.

Slots

f_comp_mean: component means
f_comp_std: component standard deviations
f_mean: signal mean (on normalized scale)
f_std: signal standard deviation (on normalized scale)
y_mean: predictive mean (on original data scale)
y_std: predictive standard deviation (on original data scale)
x: a data frame of points (covariate values) where the function posteriors or predictive distributions have been evaluated

An S4 class to represent input for kernel matrix computations

Description

An S4 class to represent input for kernel matrix computations

Usage

## S4 method for signature 'KernelComputer'
show(object)

## S4 method for signature 'KernelComputer'
num_components(object)

## S4 method for signature 'KernelComputer'
num_evalpoints(object)

## S4 method for signature 'KernelComputer'
num_paramsets(object)

## S4 method for signature 'KernelComputer'
component_names(object)

Arguments

object

The object for which to call a class method.

Methods (by generic)

show(KernelComputer): Print a summary about the object.
num_components(KernelComputer): Get number of components.
num_evalpoints(KernelComputer): Get number of evaluation points.
num_paramsets(KernelComputer): Get number of parameter sets.
component_names(KernelComputer): Get component names.

Slots

input: Common input (for example parameter values).
K_input: Input for computing kernel matrices between data points (N x N). A list.
Ks_input: Input for computing kernel matrices between data and output points (P x N). A list.
Kss_input: Input for computing kernel matrices between output points (P x P). A list, empty if full_covariance=FALSE.
comp_names: Component names (character vector).
full_covariance: Boolean value determining if this can compute full predictive covariance matrices (or just marginal variance at each point).
no_separate_output_points: Boolean value determining if Ks_input and Kss_input are the same thing. Using this knowledge can reduce unnecessary computations of kernel matrices.
STREAM: external pointer (for calling 'Stan' functions)

An S4 class to represent prior or posterior draws from an additive function distribution.

Description

An S4 class to represent prior or posterior draws from an additive function distribution.

Usage

## S4 method for signature 'Prediction'
show(object)

## S4 method for signature 'Prediction'
component_names(object)

## S4 method for signature 'Prediction'
num_components(object)

## S4 method for signature 'Prediction'
num_paramsets(object)

## S4 method for signature 'Prediction'
num_evalpoints(object)

Arguments

object

Prediction object for which to apply a class method.

Methods (by generic)

show(Prediction): Print a summary about the object.
component_names(Prediction): Get names of components.
num_components(Prediction): Get number of components.
num_paramsets(Prediction): Get number of parameter combinations (different parameter vectors) using which predictions were computed.
num_evalpoints(Prediction): Get number of points where predictions were computed.

Slots

f_comp: component draws
f: signal draws
h: predictions (signal draws + scaling factor c_hat, transformed through inverse link function)
x: a data frame of points (covariate values) where the functions/predictions have been evaluated/sampled
extrapolated: Boolean value telling if the function draws are original MCMC draws or if they have been created by extrapolating such draws.

Easily add the disease-related age variable to a data frame

Description

Creates the disease-related age covariate vector based on the disease initiation times and adds it to the data frame.

Usage

add_dis_age(data, t_init, id_var = "id", time_var = "age")

Arguments

data

the original data frame

t_init

A named vector containing the observed initiation or onset time for each individual. The names, i.e. names(t_init), should specify the individual id.

id_var

name of the id variable in data

time_var

name of the time variable in data

Value

A data frame with one column added. The new column will be called dis_age. For controls, its value will be NaN.
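
The idea can be sketched in base R as follows. This is not the package's implementation: the helper dis_age_sketch is hypothetical, and it assumes that the disease-related age is the observation time minus the individual's initiation time, with individuals missing from t_init treated as controls.

```r
# Hypothetical re-implementation of the idea, NOT the package's code.
# Assumption: disease-related age = time - initiation time of the individual.
dis_age_sketch <- function(data, t_init, id_var = "id", time_var = "age") {
  ids <- as.character(data[[id_var]])
  onset <- unname(t_init[ids])          # NA for ids that are absent from t_init
  dis_age <- data[[time_var]] - onset
  dis_age[is.na(dis_age)] <- NaN        # controls get NaN, as documented above
  data$dis_age <- dis_age
  data
}

dat <- data.frame(id = factor(c(1, 1, 2, 2)), age = c(10, 20, 10, 20))
t_init <- c("1" = 15)                   # only individual 1 has an onset time
out <- dis_age_sketch(dat, t_init)
print(out)
```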

Easily add a categorical covariate to a data frame

Description

Easily add a categorical covariate to a data frame

Usage

add_factor(data, x, id_var = "id")

Arguments

data

the original data frame

x

A named vector containing the category for each individual. The names should specify the individual id.

id_var

name of the id variable in data

Value

A data frame with one column added. The new column will have the same name as the variable passed as input x.
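
A base R sketch of the same mapping is shown below. The helper add_factor_sketch is hypothetical and is not the package's code; it assumes that the category is looked up by individual id and stored as a factor, and it mimics the documented naming rule by taking the column name from the variable passed as x.

```r
# Hypothetical sketch, NOT the package's code.
add_factor_sketch <- function(data, x, id_var = "id") {
  name <- deparse(substitute(x))        # new column is named after the input variable
  ids <- as.character(data[[id_var]])
  data[[name]] <- factor(unname(x[ids]))  # look up each row's category by id
  data
}

dat <- data.frame(id = factor(c(1, 1, 2, 2)), age = c(10, 20, 10, 20))
sex <- c("1" = "F", "2" = "M")          # one category per individual, named by id
out <- add_factor_sketch(dat, sex)
print(out)
```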

Add a crossing of two factors to a data frame

Description

Add a crossing of two factors to a data frame

Usage

add_factor_crossing(data, fac1, fac2, new_name)

Arguments

data

a data frame

fac1

name of first factor, must be found in data

fac2

name of second factor, must be found in data

new_name

name of the new factor

Value

a data frame
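
For intuition, a crossing of two factors can be formed in base R with interaction(). This is only a stand-in for what add_factor_crossing does; the column names and the naming of the resulting levels are assumptions of this sketch.

```r
# Base R stand-in for a factor crossing; column names are hypothetical.
dat <- data.frame(
  sex = factor(c("F", "F", "M", "M")),
  loc = factor(c("A", "B", "A", "B"))
)
# Each combination of the two factors becomes one level of the new factor.
dat$sex_loc <- interaction(dat$sex, dat$loc)
print(levels(dat$sex_loc))
```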

Set the GP mean vector, taking TMM or other normalization into account

Description

Creates the c_hat input for lgp, so that it accounts for normalization between data points in the "poisson" or "nb" observation model

Usage

adjusted_c_hat(y, norm_factors)

Arguments

y

response variable, vector of length n

norm_factors

normalization factors, vector of length n

Value

a vector of length n, which can be used as the c_hat input to the lgp function
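
One plausible form of such an adjustment is sketched below. It is an assumption of this example, not confirmed by this page, that the normalization factors enter additively on the log scale, i.e. c_hat = log(mean(y)) + log(norm_factors); the helper c_hat_sketch is hypothetical.

```r
# Assumed form: log(mean(y)) + log(norm_factors). Not verified against the package.
c_hat_sketch <- function(y, norm_factors) {
  stopifnot(length(y) == length(norm_factors), all(norm_factors > 0))
  rep(log(mean(y)), length(y)) + log(norm_factors)
}

y  <- c(4, 10, 7, 3)      # made-up counts
nf <- c(1, 2, 0.5, 1)     # made-up normalization factors
c_hat <- c_hat_sketch(y, nf)
print(c_hat)
```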

Apply variable scaling

Description

Apply variable scaling

Usage

apply_scaling(scaling, x, inverse = FALSE)

Arguments

scaling

an object of class lgpscaling

x

object to which the scaling is applied (numeric)

inverse

whether scaling should be done in inverse direction

Value

an object similar to x

Character representations of different formula objects

Description

Character representations of different formula objects

Usage

## S4 method for signature 'lgpexpr'
as.character(x)

## S4 method for signature 'lgpterm'
as.character(x)

## S4 method for signature 'lgpformula'
as.character(x)

Arguments

x

an object of some S4 class

Value

a character representation of the object

Create a model

Description

See the "Mathematical description of lgpr models" vignette for more information about the connection between different options and the created statistical model.

Usage

create_model(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian")
)

Arguments

formula

The model formula, where

it must contain exactly one tilde (~), with the response variable on the left-hand side and model terms on the right-hand side
terms are separated by a plus (+) sign
all variables appearing in formula must be found in data

See the "Model formula syntax" section below (lgp) for instructions on how to specify the model terms.

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

prior

A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below (lgp).

c_hat

The GP mean. This should only be given if sample_f is TRUE, otherwise the GP will always have zero mean. If sample_f is TRUE, the given c_hat can be a vector of length dim(data)[1], or a real number defining a constant GP mean. If not specified and sample_f is TRUE, c_hat is set to

c_hat = mean(y), if likelihood is "gaussian",
c_hat = log(mean(y)) if likelihood is "poisson" or "nb",
c_hat = log(p/(1-p)), where p = mean(y/num_trials) if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.
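
The default rules listed above can be written out in a few lines of base R. The helper default_c_hat is hypothetical and only restates the documented formulas; it is not the package's internal code.

```r
# Hypothetical helper restating the documented defaults for c_hat.
default_c_hat <- function(y, likelihood, num_trials = NULL) {
  switch(likelihood,
    gaussian = mean(y),
    poisson  = ,                       # falls through to the "nb" rule
    nb       = log(mean(y)),
    binomial = ,                       # falls through to the "bb" rule
    bb       = {
      p <- mean(y / num_trials)        # average success proportion
      log(p / (1 - p))                 # logit of that proportion
    },
    stop("unknown likelihood: ", likelihood)
  )
}

print(default_c_hat(c(1, 2, 3), "gaussian"))
print(default_c_hat(c(2, 4), "nb"))
print(default_c_hat(c(5, 5), "binomial", num_trials = 10))
```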

num_trials

This argument (number of trials) is only needed when likelihood is "binomial" or "bb". Must have length one or equal to the number of data points. Setting num_trials=1 and likelihood="binomial" corresponds to the Bernoulli observation model.

options

A named list with the following possible fields:

delta Amount of added jitter to ensure positive definite covariance matrices.
vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).
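
To illustrate why jitter helps, the sketch below adds the documented default delta to the diagonal of a rank-deficient covariance matrix before taking a Cholesky factor. The matrix is made up for the example, and adding delta to the diagonal is an assumption about how the jitter is applied; a factorization of the unjittered matrix would typically fail because it is not strictly positive definite.

```r
# A rank-one covariance matrix, e.g. what three identical inputs would produce.
K <- matrix(1, nrow = 3, ncol = 3)
delta <- 1e-8                          # documented default jitter
K_jit <- K + delta * diag(nrow(K))     # assumed: jitter is added to the diagonal
L <- chol(K_jit)                       # upper-triangular Cholesky factor
print(diag(L))
```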

prior_only

Should likelihood be ignored? See also sample_param_prior which can be used for any lgpmodel, and whose runtime is independent of the number of observations.

verbose

Should some informative messages be printed?

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.

Value

An object of class lgpmodel, containing the Stan input created based on parsing the specified formula, prior, and other options.

Parse the covariates and model components from given data and formula

Description

Parse the covariates and model components from given data and formula

Usage

create_model.covs_and_comps(data, model_formula, x_cont_scl, verbose)

Arguments

data

model_formula

an object of class lgpformula

x_cont_scl

Information on how to scale the continuous covariates. This can either be

an existing list of objects with class lgpscaling, or
NA, in which case such a list is created by computing mean and standard deviation from data

verbose

Should some informative messages be printed?

Value

parsed input to Stan and covariate scaling, and other info

Create a model formula

Description

Checks if formula is in advanced format and translates if not.

Usage

create_model.formula(formula, data, verbose = FALSE)

Arguments

formula

The model formula, where

it must contain exactly one tilde (~), with the response variable on the left-hand side and model terms on the right-hand side
terms are separated by a plus (+) sign
all variables appearing in formula must be found in data

See the "Model formula syntax" section below (lgp) for instructions on how to specify the model terms.

data

verbose

Should some informative messages be printed?

Value

an object of classlgpformula

Parse the response variable and its likelihood model

Description

Parse the response variable and its likelihood model

Usage

create_model.likelihood(
  data,
  likelihood,
  c_hat,
  num_trials,
  y_name,
  sample_f,
  verbose
)

Arguments

data

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

c_hat

c_hat = mean(y), if likelihood is "gaussian",
c_hat = log(mean(y)) if likelihood is "poisson" or "nb",
c_hat = log(p/(1-p)), where p = mean(y/num_trials) if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.

num_trials

y_name

Name of response variable

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.

verbose

Should some informative messages be printed?

Value

a list of parsed options

Parse the given modeling options

Description

Parse the given modeling options

Usage

create_model.options(options, verbose)

Arguments

options

A named list with the following possible fields:

delta Amount of added jitter to ensure positive definite covariance matrices.
vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).

verbose

Should some informative messages be printed?

Value

a named list of parsed options

Parse given prior

Description

Parse given prior

Usage

create_model.prior(prior, stan_input, verbose)

Arguments

prior

A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below (lgp).

stan_input

a list of stan input fields

verbose

Should some informative messages be printed?

Value

a named list of parsed options

Helper function for plots

Description

Helper function for plots

Usage

create_plot_df(object, x = "age", group_by = "id")

Arguments

object

model or fit

x

x-axis variable name

group_by

grouping variable name (use NULL for no grouping)

Value

a data frame

Create a standardizing transform

Description

Create a standardizing transform

Usage

create_scaling(x, name)

Arguments

x

variable measurements (might contain NA or NaN)

name

variable name

Value

an object of class lgpscaling
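
A standardizing transform and its inverse can be sketched in base R as below. The helpers make_scaling and scale_apply are hypothetical stand-ins that use a plain list instead of the S4 class; the field names loc, scale and var_name follow the slots documented for lgpscaling, and ignoring missing values when computing the mean and standard deviation is an assumption.

```r
# Hypothetical stand-ins for create_scaling() and apply_scaling(), using a plain list.
make_scaling <- function(x, name) {
  list(
    loc = mean(x, na.rm = TRUE),       # location: mean of the non-missing values
    scale = sd(x, na.rm = TRUE),       # scale: standard deviation of the non-missing values
    var_name = name
  )
}

scale_apply <- function(scaling, x, inverse = FALSE) {
  if (inverse) {
    x * scaling$scale + scaling$loc    # back to the original scale
  } else {
    (x - scaling$loc) / scaling$scale  # to zero mean and unit variance
  }
}

age <- c(10, 20, NA, 30)
s <- make_scaling(age, "age")
z <- scale_apply(s, age)
back <- scale_apply(s, z, inverse = TRUE)
print(z)
```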

Density and quantile functions of the inverse gamma distribution

Description

Uses the same parametrization as Stan. See the Stan documentation for more information.

Usage

dinvgamma_stanlike(x, alpha, beta, log = FALSE)

qinvgamma_stanlike(p, alpha, beta)

Arguments

x

point where to compute the density

alpha

positive real number

beta

positive real number

log

is log-scale used?

p

quantile (must be between 0 and 1)

Value

density/quantile value
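
For reference, Stan's inverse gamma density with shape alpha and scale beta is beta^alpha / Gamma(alpha) * x^(-(alpha + 1)) * exp(-beta / x). The base R sketch below evaluates this density and the matching quantile function via the gamma distribution of 1/x. The function names are hypothetical and the code is not the package's implementation.

```r
# Hypothetical base R versions under Stan's shape/scale parametrization.
dinvgamma_sketch <- function(x, alpha, beta, log = FALSE) {
  logd <- alpha * base::log(beta) - lgamma(alpha) -
    (alpha + 1) * base::log(x) - beta / x
  if (log) logd else exp(logd)
}

# If X ~ InvGamma(alpha, beta), then 1/X ~ Gamma(shape = alpha, rate = beta).
qinvgamma_sketch <- function(p, alpha, beta) {
  1 / qgamma(1 - p, shape = alpha, rate = beta)
}

print(dinvgamma_sketch(1, alpha = 1, beta = 1))
print(qinvgamma_sketch(0.5, alpha = 1, beta = 1))
```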

Draw pseudo-observations from posterior or prior predictive distribution

Description

Draw pseudo-observations from predictive distribution. If pred contains draws from the component posterior (prior) distributions, then the output is draws from the posterior (prior) predictive distribution. If pred is not specified, then whether output draws are from prior or posterior predictive distribution depends on whether fit is created using the lgp option prior_only=TRUE or not.

Usage

draw_pred(fit, pred = NULL)

Arguments

fit

An object of class lgpfit that has been created using the lgp option sample_f=TRUE.

pred

An object of class Prediction, containing draws of each model component. If NULL, this is obtained using get_pred(fit).

Value

An array with shape S x P, where S is the number of draws that pred contains and P is the length of each function draw. Each row s = 1, ..., S of the output is one vector drawn from the predictive distribution, given parameter draws.

Quick way to create an example lgpfit, useful for debugging

Description

Quick way to create an example lgpfit, useful for debugging

Usage

example_fit(
  formula = y ~ id + age + age | SEX + age | LOC,
  likelihood = "gaussian",
  chains = 1,
  iter = 30,
  num_indiv = 6,
  num_timepoints = 5,
  ...
)

Arguments

formula

model formula

likelihood

observation model

chains

number of chains to run

iter

number of iterations to run

num_indiv

number of individuals (data simulation)

num_timepoints

number of time points (data simulation)

...

additional arguments to lgp

Value

An lgpfit object created by fitting the example model.

Print a fit summary.

Description

Print a fit summary.

Usage

fit_summary(fit, ignore_pars = c("f_latent", "eta", "teff_raw", "lp__"))

Arguments

fit

an object of class lgpfit

ignore_pars

parameters and generated quantities to ignore from output

Value

object invisibly.

Extract parameter draws from lgpfit or stanfit

Description

Uses rstan::extract() with permuted = FALSE and inc_warmup = FALSE.

Usage

get_draws(object, draws = NULL, reduce = NULL, ...)

Arguments

object

An object of class lgpfit or stanfit.

draws

Indices of the parameter draws. NULL corresponds to all post-warmup draws.

reduce

Function used to reduce all parameter draws into one set of parameters. Ignored if NULL, or if draws is not NULL.

...

Additional arguments to rstan::extract().

Value

The return value is always a 2-dimensional array of shape num_param_sets x num_params.
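
The effect of reduce on the shape of the result can be sketched with a plain matrix of made-up draws. It is an assumption of this sketch that the reduction is applied separately to each parameter across draws; the parameter names are illustrative only.

```r
# Made-up draws: 4 parameter sets (rows) of 3 parameters (columns).
set.seed(1)
draws <- matrix(rnorm(12), nrow = 4, ncol = 3,
                dimnames = list(NULL, c("alpha", "ell", "sigma")))

reduce <- function(x) mean(x)          # reduction applied to each parameter's draws

# Assumed behaviour: reduce each column, keeping a 2-dimensional (1 x num_params) result.
reduced <- matrix(apply(draws, 2, reduce), nrow = 1,
                  dimnames = list(NULL, colnames(draws)))
print(dim(reduced))
```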

Extract model predictions and function posteriors

Description

NOTE: It is not recommended for users to call this. Use pred instead.

Usage

get_pred(fit, draws = NULL, reduce = NULL, verbose = TRUE)

Arguments

fit

An object of class lgpfit.

draws

Indices of parameter draws to use, or NULL to use all draws.

reduce

Reduction for parameter draws. Can be a function that is applied to reduce all parameter draws into one parameter set, or NULL (no reduction). Has no effect if draws is specified.

verbose

Should more information and a possible progress bar be printed?

Value

an object of class GaussianPrediction or Prediction

Compute a kernel matrix (covariance matrix)

Description

These have STAN_kernel_* counterparts. These R versions are provided for reference and are not optimized for speed. These are used when generating simulated data, and not during model inference.

Usage

kernel_eq(x1, x2, alpha = 1, ell)

kernel_ns(x1, x2, alpha = 1, ell, a)

kernel_zerosum(x1, x2, M)

kernel_bin(x1, x2, pos_class = 0)

kernel_cat(x1, x2)

kernel_varmask(x1, x2, a, vm_params)

kernel_beta(beta, idx1_expand, idx2_expand)

Arguments

x1

vector of length n

x2

vector of length m

alpha

marginal std (default = 1)

ell

lengthscale

a

steepness of the warping function rise

M

number of categories

pos_class

the binary (mask) kernel function has value one if both inputs have this value, otherwise it is zero

vm_params

vector of two mask function parameters.

beta

a parameter vector (row vector) of length N_cases

idx1_expand

integer vector of length n

idx2_expand

integer vector of length m

Value

A matrix of size n x m.

Functions

kernel_eq(): Uses the exponentiated quadratic kernel.
kernel_ns(): Uses the non-stationary kernel (input warping + squared exponential).
kernel_zerosum(): Uses the zero-sum kernel. Here, x1 and x2 must be integer vectors (integers denoting different categories). Returns a binary matrix.
kernel_bin(): Uses the binary (mask) kernel. Here, x1 and x2 must be integer vectors (integers denoting different categories). Returns a binary matrix.
kernel_cat(): Uses the categorical kernel. Here, x1 and x2 must be integer vectors (integers denoting different categories). Returns a binary matrix.
kernel_varmask(): Computes variance mask multiplier matrix. NaN's in x1 and x2 will be replaced by 0.
kernel_beta(): Computes the heterogeneity multiplier matrix. NOTE: idx_expand needs to be given so that idx_expand[j]-1 tells the index of the beta parameter that should be used for the jth observation. If observation j doesn't correspond to any beta parameter, then idx_expand[j] should be 1.
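
Two of these kernels are simple enough to sketch in base R. The functions below are hypothetical re-implementations written for illustration, not the package's code. They assume the standard exponentiated quadratic form alpha^2 * exp(-(x1 - x2)^2 / (2 * ell^2)), with alpha being the marginal standard deviation as documented, and a categorical kernel that equals one when the categories match and zero otherwise.

```r
# Hypothetical sketches of two kernels, NOT the package's implementations.
kernel_eq_sketch <- function(x1, x2, alpha = 1, ell) {
  d2 <- outer(x1, x2, "-")^2           # squared distances, an n x m matrix
  alpha^2 * exp(-d2 / (2 * ell^2))
}

kernel_cat_sketch <- function(x1, x2) {
  1 * outer(x1, x2, "==")              # 1 if same category, else 0
}

K_eq  <- kernel_eq_sketch(c(0, 1, 2), c(0, 1), alpha = 2, ell = 1)
K_cat <- kernel_cat_sketch(c(1, 2), c(1, 1, 2))
print(K_eq)
print(K_cat)
```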

Main function of the 'lgpr' package

Description

Creates an additive Gaussian process model using create_model and fits it using sample_model. See the "Mathematical description of lgpr models" vignette for more information about the connection between different options and the created statistical model.

Usage

lgp(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian"),
  quiet = FALSE,
  skip_postproc = sample_f,
  ...
)

Arguments

formula

The model formula, where

it must contain exactly one tilde (~), with the response variable on the left-hand side and model terms on the right-hand side
terms are separated by a plus (+) sign
all variables appearing in formula must be found in data

See the "Model formula syntax" section below (lgp) for instructions on how to specify the model terms.

data

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

prior

A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below (lgp).

c_hat

c_hat = mean(y), if likelihood is "gaussian",
c_hat = log(mean(y)) if likelihood is "poisson" or "nb",
c_hat = log(p/(1-p)), where p = mean(y/num_trials) if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.

num_trials

options

A named list with the following possible fields:

delta Amount of added jitter to ensure positive definite covariance matrices.
vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).

prior_only

Should likelihood be ignored? See also sample_param_prior which can be used for any lgpmodel, and whose runtime is independent of the number of observations.

verbose

Can messages be printed during model creation? Has no effect if quiet=TRUE.

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.

quiet

Should all output messages be suppressed? You also need to set refresh=0 if you want to suppress the progress update messages from sampling.

skip_postproc

Should all postprocessing be skipped? If this is TRUE, the returned lgpfit object will likely be much smaller (if sample_f=FALSE).

...

Optional arguments passed to sampling or optimizing.

Value

Returns an object of the S4 class lgpfit.

Model formula syntax

There are two ways to define the model formula:

Using a common formula-like syntax, like in y ~ age + age|id + sex. Terms can consist of a single variable, such as age, or an interaction of two variables, such as age|id. In single-variable terms, the variable can be either continuous (numeric) or categorical (factor), whereas in interaction terms the variable on the left-hand side of the vertical bar (|) has to be continuous and the one on the right-hand side has to be categorical. Formulae specified using this syntax are translated to the advanced format so that
- single-variable terms become gp(x) if variable x is numeric and zs(x) if x is a factor
- interaction terms x|z become gp(x)*zs(z)
Using the advanced syntax, like in y ~ gp(age) + gp(age)*zs(id) + het(id)*gp_vm(disAge). This creates lgprhs objects, which consist of lgpterms, which consist of lgpexprs. This approach must be used if creating nonstationary, heterogeneous or temporally uncertain components.

Either one of the approaches should be used and they should not be mixed.
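
As an illustration of the translation rules above, the two formulas below should describe the same model, assuming age is numeric and id and sex are factors. The sketch only constructs the formula objects in base R and compares the variables they mention; it does not call 'lgpr', and the actual translation is done by the package.

```r
# Common syntax and its advanced-syntax counterpart under the stated rules.
# Assumes age is numeric and that id and sex are factors.
f_simple   <- y ~ age + age | id + sex
f_advanced <- y ~ gp(age) + gp(age) * zs(id) + zs(sex)

# Both formulas refer to the same response and covariates.
print(all.vars(f_simple))
print(all.vars(f_advanced))
```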

Defining priors

The prior argument must be a named list, like list(alpha=student_t(4), wrp=igam(30,10)). See examples in tutorials. Allowed names are

"alpha" = component magnitude parameters
"ell" = component lengthscale parameters
"wrp" = input warping steepness parameters
"sigma" = noise magnitude (Gaussian obs. model)
"phi" = inv. overdispersion (negative binomial obs. model)
"gamma" = overdispersion (beta-binomial obs. model)
"beta" = heterogeneity parameters
"effect_time" = uncertain effect time parameters
"effect_time_info" = additional options for the above

See priors for functions that can be used to define the list elements. If a parameter of a model is not given in this list, a default prior will be used for it.

When to not use default priors

It is not recommended to use default priors blindly. Rather, priors should be specified according to the knowledge about the problem at hand, as in any Bayesian analysis. In lgpr this is especially important when

Using a non-Gaussian likelihood or otherwise setting sample_f = TRUE. In this case the response variable is not normalized, so the scale on which the data varies must be taken into account when defining priors of the signal magnitude parameters alpha and possible noise parameters (sigma, phi, gamma). Also it should be checked if c_hat is set in a sensible way.
Using a model that contains a gp_ns(x) or gp_vm(x) expression in its formula. In this case the corresponding covariate x is not normalized, and the prior for the input warping steepness parameter wrp must be set according to the expected width of the window in which the nonstationary effect of x occurs. By default, the width of this window is about 36, which has been set assuming that the unit of x is months.

An S4 class to represent an lgp expression

Description

An S4 class to represent an lgp expression

Slots

covariate: name of a covariate
fun: function name

An S4 class to represent the output of the lgp function

Description

An S4 class to represent the output of the lgp function

Usage

## S4 method for signature 'lgpfit'
show(object)

## S4 method for signature 'lgpfit'
component_names(object)

## S4 method for signature 'lgpfit'
num_components(object)

## S4 method for signature 'lgpfit'
postproc(object, verbose = TRUE)

## S4 method for signature 'lgpfit'
contains_postproc(object)

## S4 method for signature 'lgpfit'
clear_postproc(object)

## S4 method for signature 'lgpfit'
get_model(object)

## S4 method for signature 'lgpfit'
get_stanfit(object)

## S4 method for signature 'lgpfit'
is_f_sampled(object)

## S4 method for signature 'lgpfit,missing'
plot(x, y)

Arguments

object

The object for which to apply a class method.

verbose

Can the method print any messages?

x

an lgpfit object to visualize

y

unused argument

Methods (by generic)

show(lgpfit): Print information and summary about the fit object.
component_names(lgpfit): Get names of model components.
num_components(lgpfit): Get number of model components. Returns a positive integer.
postproc(lgpfit): Apply postprocessing. Returns an updated lgpfit object (copies data).
contains_postproc(lgpfit): Check if object contains postprocessing information.
clear_postproc(lgpfit): Returns an updated (copies data) lgpfit object without any postprocessing information.
get_model(lgpfit): Get the stored lgpmodel object. Various properties of the returned object can be accessed as explained in the documentation of lgpmodel.
get_stanfit(lgpfit): Get the stored stanfit object. Various properties of the returned object can be accessed or plotted as explained in the documentation of stanfit.
is_f_sampled(lgpfit): Determine if inference was done by sampling the latent signal f (and its components).
plot(x = lgpfit, y = missing): Visualize parameter draws using plot_draws.

Slots

stan_fit: An object of class stanfit.
model: An object of class lgpmodel.
num_draws: Total number of parameter draws.
postproc_results: A named list containing possible postprocessing results.

An S4 class to represent an lgp formula

Description

An S4 class to represent an lgp formula

Slots

terms: an object of class lgprhs
y_name: name of the response variable
call: original formula call

An S4 class to represent an additive GP model

Description

An S4 class to represent an additive GP model

Usage

## S4 method for signature 'lgpmodel'
show(object)

## S4 method for signature 'lgpmodel'
parameter_info(object, digits = 3)

## S4 method for signature 'lgpmodel'
component_info(object)

## S4 method for signature 'lgpmodel'
num_components(object)

## S4 method for signature 'lgpmodel'
covariate_info(object)

## S4 method for signature 'lgpmodel'
component_names(object)

## S4 method for signature 'lgpmodel'
is_f_sampled(object)

Arguments

object

The object for which to apply a class method.

digits

number of digits to show for floating point numbers

Methods (by generic)

show(lgpmodel): Print information and summary about the object. Returns object invisibly.
parameter_info(lgpmodel): Get a parameter summary (bounds and priors). Returns a data.frame.
component_info(lgpmodel): Get a data frame with information about each model component.
num_components(lgpmodel): Get number of model components. Returns a positive integer.
covariate_info(lgpmodel): Get covariate information.
component_names(lgpmodel): Get names of model components.
is_f_sampled(lgpmodel): Determine if inference of the model requires sampling the latent signal f (and its components).

Slots

formula

An object of class lgpformula

data

The original unmodified data.

stan_input

The data to be given as input to rstan::sampling

var_names

List of variable names grouped by type.

var_scalings

A named list with fields

y - Response variable normalization function and its inverse operation. Must be an lgpscaling object.
x_cont - Continuous covariate normalization functions and their inverse operations. Must be a named list where each element is an lgpscaling object.

var_info

A named list with fields

x_cat_levels - Names of the levels of categorical covariates before converting from factor to numeric.

info

Other info in text format.

sample_f

Whether the signal f is sampled or marginalized.

full_prior

Complete prior information.

An S4 class to represent the right-hand side of an lgp formula

Description

An S4 class to represent the right-hand side of an lgp formula

Slots

summands: a list of one or more lgpterms

An S4 class to represent variable scaling

Description

An S4 class to represent variable scaling

Slots

loc: original location (mean)
scale: original scale (standard deviation)
var_name: variable name
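
The loc and scale slots describe a standardization of a variable. The following is a minimal base R sketch of such a scaling and its inverse, not the package's implementation. The helper make_scaling and the list fields fun and fun_inv are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper (not part of lgpr): build a normalization function
# and its inverse from the mean and standard deviation of a variable.
make_scaling <- function(values, var_name) {
  loc <- mean(values)            # original location (mean)
  scale <- stats::sd(values)     # original scale (standard deviation)
  list(
    loc = loc,
    scale = scale,
    var_name = var_name,
    fun = function(x) (x - loc) / scale,     # normalize
    fun_inv = function(z) z * scale + loc    # map back to the original scale
  )
}

age <- c(12, 24, 36, 48, 60)
sc <- make_scaling(age, "age")
z <- sc$fun(age)            # zero mean, unit standard deviation
age_back <- sc$fun_inv(z)   # recovers the original values
```

Storing the inverse next to the forward transform lets results computed on the normalized scale be reported in the original units.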

An S4 class to represent a data set simulated using the additive GP formalism

Description

An S4 class to represent a data set simulated using the additive GP formalism

Usage

## S4 method for signature 'lgpsim'
show(object)

## S4 method for signature 'lgpsim,missing'
plot(x, y, ...)

Arguments

object

an lgpsim object

x

an lgpsim object to plot

y

not used

...

optional arguments passed to plot_sim

Methods (by generic)

show(lgpsim): Show summary of object.
plot(x = lgpsim, y = missing): Plot the data and generating process. For more information see plot_sim.

Slots

data

the actual data

response

name of the response variable in the data

components

the drawn function components

kernel_matrices

the covariance matrices for each GP

info

A list with fields

par_ell - the lengthscale parameters used
par_cont - the parameters used to generate the continuous covariates
p_signal - signal proportion

effect_times

A list with fields

true - possible true effect times that generate the disease effect
observed - possible observed effect times

An S4 class to represent one formula term

Description

An S4 class to represent one formula term

Slots

factors: a list of at most two lgpexprs

Print a model summary.

Description

Print a model summary.

Usage

model_summary(object, digits = 3)

param_summary(object, digits = 3)

Arguments

object

a model or fit

digits

number of digits to round floats to

Value

object invisibly.

Create test input points for prediction

Description

Replaces a continuous variable x in the data frame, and possibly another continuous variable x_ns derived from it, with new values, for each level of a grouping factor (usually id).

Usage

new_x(data, x_values, group_by = "id", x = "age", x_ns = NULL)

Arguments

data

A data frame. Can also be an lgpfit or lgpmodel object, in which case data is extracted from it.

x_values

the values of x to set for each individual

group_by

name of the grouping variable, must be a factor in data (or use group_by=NA to create a dummy grouping factor which has only one value)

x

name of the variable along which to extend, must be a numeric in data

x_ns

name of a nonstationary variable derived from x, must be a numeric in data

Value

a data frame containing the following columns

all factors in the original data
x
x_ns (unless it is NULL)
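
A minimal usage sketch for new_x, following the signature shown above. The toy data frame dat and its column sex are made up for this illustration, and the call is only evaluated if the lgpr package is installed.

```r
# Toy longitudinal data (illustrative only): 3 individuals, 4 time points each
set.seed(1)
dat <- data.frame(
  id = factor(rep(1:3, each = 4)),
  age = rep(c(10, 20, 30, 40), times = 3),
  sex = factor(rep(c("F", "M", "F"), each = 4)),
  y = rnorm(12)
)

# A denser grid of age values to use for every individual
x_values <- seq(0, 50, by = 5)

if (requireNamespace("lgpr", quietly = TRUE)) {
  # According to the Value section, the result should contain the factors
  # of dat (here id and sex) together with the new age values.
  x_pred <- lgpr::new_x(dat, x_values, group_by = "id", x = "age")
  str(x_pred)
}
```

The resulting data frame is the kind of input that the x argument of pred expects.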

Operations on formula terms and expressions

Description

Operations on formula terms and expressions

Usage

## S4 method for signature 'lgprhs,lgprhs'
e1 + e2

## S4 method for signature 'lgpterm,lgpterm'
e1 + e2

## S4 method for signature 'lgprhs,lgpterm'
e1 + e2

## S4 method for signature 'lgpterm,lgpterm'
e1 * e2

Arguments

e1

The first sum, term or expression

e2

The second sum, term or expression

Value

The behaviour and return type depend on the types of e1 and e2. You can

sum (+) two lgprhs's to yield an lgprhs
sum (+) two lgpterm's to yield an lgprhs
sum (+) an lgprhs and an lgpterm to yield an lgprhs
multiply (*) two lgpterm's to yield an lgpterm

Plot a generated/fit model component

Description

Data frames specified in arguments df and df_err must have a format where

The first column is the grouping factor (usually id).
The second column is the x-axis variable (usually age).
The third column is the coloring factor. If name of the third column is NA, coloring is not done.
A column named y must contain the y-axis variable (not for df_err).
A column named lower (upper) must contain the lower (upper) bound of error bar (only for df_err).
The posterior draw using which the fit has been computed can be specified with a factor named _draw_ (only for df).

Usage

plot_api_c(
  df,
  df_err = NULL,
  alpha = 1,
  alpha_err = 0.2,
  no_err = FALSE,
  no_line = FALSE
)

Arguments

df

a data frame

df_err

a data frame

alpha

line opacity

alpha_err

ribbon opacity

no_err

hide error bar even when it would normally be plotted?

no_line

hide line even when it would normally be plotted?

Value

A ggplot object.

Plot longitudinal data and/or model fit so that each subject/group has their own panel

Description

Data frames specified in arguments df_data, df_signal, df, and df_err must have a format where

the first column is the grouping factor (usually id)
the second column is the x-axis variable (usually age)
a column named y must contain the y-axis variable (not for df_err)
a column named lower (upper) must contain the lower (upper) bound of error bar (only for df_err)
a column named draw must be a factor that specifies the posterior draw using which the fit has been computed (only for df)

Usage

plot_api_g(
  df_data,
  df_signal = NULL,
  df = NULL,
  df_err = NULL,
  teff_signal = NULL,
  teff_obs = NULL,
  i_test = NULL,
  color_signal = color_palette(2)[1],
  color = color_palette(2)[2],
  color_err = colorset("red", "light_highlight"),
  color_vlines = colorset("gray", "mid_highlight"),
  alpha = 1,
  alpha_err = 0.5,
  nrow = NULL,
  ncol = NULL,
  y_transform = function(x) x
)

Arguments

df_data

A data frame containing the observations.

df_signal

A data frame containing the true signal. Omitted if NULL.

df

A data frame containing the model fit, or a list of data frames. The list version can be used for example so that each list element corresponds to the fit computed using one parameter draw. Omitted if NULL.

df_err

A data frame containing error bars. Omitted if NULL. Must be NULL if df is a list.

teff_signal

A named vector containing true effect times used to generate the signal. Omitted if NULL.

teff_obs

A named vector containing observed effect times. Omitted if NULL.

i_test

Indices of test points.

color_signal

Line color for true signal.

color

Line color for model fit.

color_err

Color of the error ribbon.

color_vlines

Two line colors for vertical lines (true and obs. effect time).

alpha

Line opacity for model fit.

alpha_err

Opacity of the error ribbon.

nrow

number of rows, an argument for facet_wrap

ncol

number of columns, an argument for facet_wrap

y_transform

A function to be applied to the third column of df_data.

Value

A ggplot object.

Visualize all model components

Description

This calls plot_f for all model components.

Usage

plot_components(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  color_by = NA,
  no_err = FALSE,
  ylim = NULL,
  draw = TRUE,
  nrow = NULL,
  ncol = NULL,
  gg_add = NULL,
  x = NULL,
  ...
)

Arguments

fit

An object of class lgpfit.

pred

An object of class GaussianPrediction or Prediction. If pred=NULL, the pred function is called with the given reduce and draws arguments.

group_by

name of the grouping variable (use group_by=NA to avoid grouping)

t_name

name of the x-axis variable

MULT_STD

a multiplier for standard deviation

verbose

Can this print any messages?

draws

Only has effect if pred=NULL.

reduce

Only has effect if pred=NULL.

color_by

Names of coloring factors. Can have length 1 or equal to the number of components. See the color_by argument of plot_f.

no_err

Should the error ribbons be skipped even though they otherwise would be shown? Can have length 1 or equal to number of components + 1. See the no_err argument of plot_api_c.

ylim

a vector of length 2 (upper and lower y-axis limits), or NULL

draw

if this is TRUE, the plot grid is drawn using arrangeGrob

nrow

number of grid rows

ncol

number of grid columns

gg_add

additional ggplot object to add to each plot

x

Deprecated argument. This is now taken from the pred object to ensure compatibility.

...

additional arguments to plot_api_c

Value

a list of ggplot objects invisibly

Visualizing longitudinal data

Description

Visualizing longitudinal data

Usage

plot_data(
  data,
  x_name = "age",
  y_name = "y",
  group_by = "id",
  facet_by = NULL,
  color_by = NULL,
  highlight = NULL,
  main = NULL,
  sub = NULL
)

Arguments

data

A data frame.

x_name

Name of x-axis variable.

y_name

Name of the y-axis variable.

group_by

Name of grouping variable (must be a factor).

facet_by

Name of the faceting variable (must be a factor).

color_by

Name of coloring variable (must be a factor).

highlight

Value of category of the group_by variable that is highlighted. Can only be used if color_by is NULL.

main

main plot title

sub

plot subtitle

Value

a ggplot object

Visualize the distribution of parameter draws

Description

Visualize the distribution of parameter draws

Usage

plot_draws(
  fit,
  type = "intervals",
  regex_pars = c("alpha", "ell", "wrp", "sigma", "phi", "gamma"),
  ...
)

plot_beta(fit, type = "dens", verbose = TRUE, ...)

plot_warp(
  fit,
  num_points = 300,
  window_size = 48,
  color = colorset("red", "dark"),
  alpha = 0.5
)

plot_effect_times(fit, type = "areas", verbose = TRUE, ...)

Arguments

fit

an object of class lgpfit

type

plot type, allowed options are "intervals", "dens", "areas", and "trace"

regex_pars

regex for parameter names to plot

...

additional arguments for the bayesplot function mcmc_intervals, mcmc_dens, mcmc_areas or mcmc_trace

verbose

Can any output be printed?

num_points

number of plot points

window_size

width of time window

color

line color

alpha

line alpha

Value

a ggplot object or list of them

Functions

plot_draws(): visualizes the distribution of any set of model parameters (defaults to kernel hyperparameters and possible observation model parameters)
plot_beta(): visualizes the distribution of the individual-specific disease effect magnitude parameter draws
plot_warp(): visualizes the input warping function for different draws of the warping steepness parameter
plot_effect_times(): visualizes the input warping function for different parameter draws

Visualize input warping function with several steepness parameter values

Description

Visualize input warping function with several steepness parameter values

Usage

plot_inputwarp(wrp, x, color = colorset("red", "dark"), alpha = 0.5)

Arguments

wrp

a vector of values of the warping steepness parameter

x

a vector of input values

color

line color

alpha

line alpha

Value

a ggplot object

Plot the inverse gamma-distribution pdf

Description

Plot the inverse gamma-distribution pdf

Usage

plot_invgamma(
  alpha,
  beta,
  by = 0.01,
  log = FALSE,
  IQR = 0.95,
  return_quantiles = FALSE,
  linecolor = colorset("red", "dark"),
  fillcolor = colorset("red", "mid")
)

Arguments

alpha

positive real number

beta

positive real number

by

grid size

log

is log-scale used?

IQR

inter-quantile range width

return_quantiles

should this return a list

linecolor

line color

fillcolor

fill color

Value

a ggplot object
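
For reference, the inverse gamma density with shape alpha and scale beta (the parametrization used in the 'Stan' documentation) can be written in base R as below. This is a sketch for illustration, not the code behind plot_invgamma. The function name dinvgamma_sketch is hypothetical, and computing a central interval of width 0.95 from gamma quantiles is an assumption about what the IQR argument refers to.

```r
# Hypothetical helper: density of the inverse gamma distribution,
# f(x) = beta^alpha / Gamma(alpha) * x^(-alpha - 1) * exp(-beta / x), x > 0
dinvgamma_sketch <- function(x, alpha, beta, log_scale = FALSE) {
  stopifnot(alpha > 0, beta > 0)
  logd <- rep(-Inf, length(x))   # zero density outside the support
  pos <- x > 0
  logd[pos] <- alpha * log(beta) - lgamma(alpha) -
    (alpha + 1) * log(x[pos]) - beta / x[pos]
  if (log_scale) logd else exp(logd)
}

alpha <- 3
beta <- 2

# The density should integrate to one over the positive real line
total <- stats::integrate(dinvgamma_sketch, 0, Inf,
                          alpha = alpha, beta = beta)$value

# If X ~ InvGamma(alpha, beta), then 1/X ~ Gamma(shape = alpha, rate = beta),
# so quantiles of X follow from reciprocals of gamma quantiles.
interval <- 1 / stats::qgamma(c(0.975, 0.025), shape = alpha, rate = beta)
mass <- stats::integrate(dinvgamma_sketch, interval[1], interval[2],
                         alpha = alpha, beta = beta)$value
```

Here interval is a central 95% interval, so mass should be close to 0.95.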

Visualizing model predictions or inferred covariate effects

Description

Function draws at data points can be visualized using plot_pred. If the pred argument is NULL, it is computed using the pred function with x=NULL.
The total signal f or any of its additive components can be plotted using plot_f.

Usage

plot_pred(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  x = NULL,
  ...
)

plot_f(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  comp_idx = NULL,
  color_by = NA,
  x = NULL,
  ...
)

Arguments

fit

An object of class lgpfit.

pred

An object of class GaussianPrediction or Prediction. If pred=NULL, the pred function is called with the given reduce and draws arguments.

group_by

name of the grouping variable (use group_by=NA to avoid grouping)

t_name

name of the x-axis variable

MULT_STD

a multiplier for standard deviation

verbose

Can this print any messages?

draws

Only has effect if pred=NULL.

reduce

Only has effect if pred=NULL.

x

Deprecated argument. This is now taken from the pred object to ensure compatibility.

...

additional arguments to plot_api_g or plot_api_c

comp_idx

Index of component to plot. The total sum is plotted if this is NULL.

color_by

name of coloring factor

Value

a ggplot object

Visualize an lgpsim object (simulated data)

Description

Visualize an lgpsim object (simulated data)

Usage

plot_sim(
  simdata,
  group_by = "id",
  x_name = "age",
  h_name = "h",
  y_name = "y",
  comp_idx = NULL,
  color_by = NA,
  verbose = TRUE,
  ...
)

Arguments

simdata

an object of class lgpsim

group_by

grouping factor

x_name

name of x-axis variable

h_name

name of the signal in simdata$components ("h" or "f")

y_name

name of response variable

comp_idx

Possible index of a component to be shown. If this is NULL, the data and total signal are shown.

color_by

coloring factor

verbose

should some information be printed?

...

additional arguments to plot_api_g or plot_api_c

Value

a ggplot object

Graphical posterior predictive checks

Description

Graphical posterior predictive checks

Usage

ppc(fit, data = NULL, fun = default_ppc_fun(fit), verbose = TRUE, ...)

Arguments

fit

An object of class lgpfit that has been created with sample_f=TRUE.

data

the original data frame (deprecated argument with no effect, now obtained from fit object)

fun

bayesplot function name

verbose

Can this print any messages?

...

additional arguments passed to the default pp_check method in bayesplot

Value

a ggplot object

Posterior predictions and function posteriors

Description

If fit is for a model that marginalizes the latent signal f (i.e. is_f_sampled(fit) is FALSE), this computes the analytic conditional posterior distributions of each model component, their sum, and the conditional predictive distribution. All these are computed for each (hyper)parameter draw (defined by draws), or other parameter set (obtained by a reduction defined by reduce). Results are stored in a GaussianPrediction object which is then returned.
If fit is for a model that samples the latent signal f (i.e. is_f_sampled(fit) is TRUE), this will extract these function samples, compute their sum, and a version of the sum f that is transformed through the inverse link function. If x is not NULL, the function draws are extrapolated to the points specified by x using kernel regression. Results are stored in a Prediction object which is then returned.

Usage

pred(
  fit,
  x = NULL,
  reduce = function(x) base::mean(x),
  draws = NULL,
  verbose = TRUE,
  STREAM = get_stream(),
  c_hat_pred = NULL,
  force = FALSE,
  debug_kc = FALSE
)

Arguments

fit

An object of class lgpfit.

x

A data frame of points where function posterior distributions and predictions should be computed or sampled. The function new_x provides an easy way to create it. If this is NULL, the data points are used.

reduce

Reduction for parameter draws. Can be a function that is applied to reduce all parameter draws into one parameter set, or NULL (no reduction). Has no effect if draws is specified.

draws

Indices of parameter draws to use, or NULL to use all draws.

verbose

Should more information and a possible progress bar be printed?

STREAM

External pointer. By default obtained with rstan::get_stream().

c_hat_pred

This is only used if the latent signal f was sampled. This input contains the values added to the sum f before passing through inverse link function. Must be a vector with length equal to the number of prediction points. If original c_hat was constant, then c_hat_pred can be ignored, in which case this will by default use the same constant.

force

This is by default FALSE to prevent unintended large computations that might crash R or take forever. Set it to TRUE to try computing no matter what.

debug_kc

If this is TRUE, this only returns a KernelComputer object that is created internally. Meant for debugging.

Value

An object of class GaussianPrediction or Prediction.
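
The difference between the reduce and draws arguments can be illustrated on a plain matrix of parameter draws. The sketch below uses base R only and made-up draws. Applying the reduction separately to each parameter is an assumption about how reduce is used, made for illustration.

```r
# Made-up posterior draws: rows are draws, columns are parameters
set.seed(42)
draws <- cbind(
  alpha = rexp(200),
  ell = rlnorm(200),
  sigma = rexp(200, rate = 5)
)

# reduce: collapse all draws into ONE parameter set (default is the mean)
reduce <- function(x) base::mean(x)
paramset_mean <- apply(draws, 2, reduce)

# A different reduction, e.g. the median of each parameter
paramset_median <- apply(draws, 2, stats::median)

# draws: pick specific draws by index, keeping one parameter set per draw
draw_idx <- c(1, 50, 100)
paramsets_subset <- draws[draw_idx, , drop = FALSE]
```

Using a single reduced parameter set is much cheaper than computing predictions for every draw, at the cost of ignoring hyperparameter uncertainty.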

Prior (predictive) sampling

Description

These functions take an lgpmodel object, and

prior_pred samples from the prior predictive distribution of the model
sample_param_prior samples only its parameter prior using sampling

Usage

prior_pred(
  model,
  verbose = TRUE,
  quiet = FALSE,
  refresh = 0,
  STREAM = get_stream(),
  ...
)

sample_param_prior(model, verbose = TRUE, quiet = FALSE, ...)

Arguments

model

An object of class lgpmodel.

verbose

Should more information and a possible progress bar be printed?

quiet

This forces verbose to be FALSE. If you also want to suppress the output from Stan, give the additional argument refresh=0.

refresh

Argument for sampling.

STREAM

External pointer. By default obtained with rstan::get_stream().

...

Additional arguments for sampling.

Value

prior_pred returns a list with components
- y_draws: A matrix containing the prior predictive draws as rows. Can be passed to bayesplot::pp_check() for graphical prior predictive checking.
- pred_draws: an object of class Prediction, containing prior draws of each model component and their sum
- param_draws: a stanfit object of prior parameter draws (obtained by calling sample_param_prior internally)
sample_param_prior returns an object of class stanfit

Convert given prior to numeric format

Description

Convert given prior to numeric format

Usage

prior_to_num(desc)

Arguments

desc

Prior description as a named list, containing fields

dist - Distribution name. Must be one of 'uniform', 'normal', 'student-t', 'gamma', 'inv-gamma', or 'log-normal' (case-insensitive)
square - Is the prior for a square-transformed parameter.

Other list fields are interpreted as hyperparameters.

Value

a named list of parsed options

Prior definitions

Description

These use the same parametrizations as defined in the 'Stan' documentation. See the docs for gamma and inverse gamma distributions.

Usage

uniform(square = FALSE)

normal(mu, sigma, square = FALSE)

student_t(nu, square = FALSE)

gam(shape, inv_scale, square = FALSE)

igam(shape, scale, square = FALSE)

log_normal(mu, sigma, square = FALSE)

bet(a, b)

Arguments

square

is prior for a square-transformed parameter?

mu

mean

sigma

standard deviation

nu

degrees of freedom

shape

shape parameter (alpha)

inv_scale

inverse scale parameter (beta)

scale

scale parameter (beta)

a

shape parameter

b

shape parameter

Value

a named list

Examples

# Log-normal prior
log_normal(mu = 1, sigma = 1)

# Cauchy prior
student_t(nu = 1)

# Exponential prior with rate = 0.1
gam(shape = 1, inv_scale = 0.1)

# Create similar priors as in LonGP (Cheng et al., 2019)
# Not recommended, because a lengthscale close to 0 is possible.
a <- log(1) - log(0.1)
log_normal(mu = 0, sigma = a / 2) # for continuous lengthscale
student_t(nu = 4) # for interaction lengthscale
igam(shape = 0.5, scale = 0.005, square = TRUE) # for sigma
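
The comments in the examples above rely on a few standard distribution identities. They can be checked with base R density functions, without lgpr. The variable names below are chosen for this illustration only.

```r
# A gamma distribution with shape 1 and inverse scale (rate) 0.1
# is an exponential distribution with rate 0.1.
x <- c(0.5, 1, 2, 5)
exp_vs_gamma <- all.equal(
  stats::dgamma(x, shape = 1, rate = 0.1),
  stats::dexp(x, rate = 0.1)
)

# A Student-t distribution with 1 degree of freedom is the standard Cauchy.
t_grid <- c(-3, -1, 0, 1, 3)
cauchy_vs_t <- all.equal(stats::dt(t_grid, df = 1), stats::dcauchy(t_grid))

# With a = log(1) - log(0.1) = log(10), a log-normal prior with mu = 0 and
# sigma = a / 2 places mu +/- 2 * sigma (on the log scale) at 0.1 and 10.
a <- log(1) - log(0.1)
interval <- exp(c(-2, 2) * (a / 2))

# Prior mass inside that interval (two standard deviations of a normal)
mass <- stats::plnorm(interval[2], meanlog = 0, sdlog = a / 2) -
  stats::plnorm(interval[1], meanlog = 0, sdlog = a / 2)
```

The remaining mass below 0.1 is what allows lengthscales close to 0, which is the reason the LonGP-style prior is marked as not recommended.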

Function for reading the built-in proteomics data

Description

Function for reading the built-in proteomics data

Usage

read_proteomics_data(parentDir = NULL, protein = NULL, verbose = TRUE)

Arguments

parentDir

Path to local parent directory for the data. If this is NULL, data is downloaded from https://github.com/jtimonen/lgpr-usage/tree/master/data/proteomics.

protein

Index or name of protein.

verbose

Can this print some output?

Value

a data.frame

Assess component relevances

Description

Assess component relevances

Usage

relevances(fit, reduce = function(x) base::mean(x), verbose = TRUE, ...)

Arguments

fit

an object of class lgpfit

reduce

a function to apply to reduce the relevances given each parameter draw into one value

verbose

Can this print any messages?

...

currently has no effect

Value

a named vector with length equal to num_comps + 1

S4 generics for lgpfit, lgpmodel, and other objects

Description

S4 generics for lgpfit, lgpmodel, and other objects

Usage

parameter_info(object, digits)

component_info(object)

covariate_info(object)

component_names(object)

get_model(object)

is_f_sampled(object)

get_stanfit(object)

postproc(object, ...)

contains_postproc(object)

clear_postproc(object)

num_paramsets(object)

num_evalpoints(object)

num_components(object)

Arguments

object

object for which to apply the generic

digits

number of digits to show

...

additional optional arguments to pass

Value

parameter_info returns a data frame with one row for each parameter and columns for parameter name, parameter bounds, and the assigned prior
component_info returns a data frame with one row for each model component, and columns encoding information about model components
covariate_info returns a list with names continuous and categorical, with information about both continuous and categorical covariates
component_names returns a character vector with component names
get_model for lgpfit objects returns an lgpmodel
is_f_sampled returns a logical value
get_stanfit returns a stanfit (rstan)
postproc applies postprocessing and returns an updated lgpfit
clear_postproc removes postprocessing information and returns an updated lgpfit
num_paramsets, num_evalpoints and num_components return an integer

Functions

parameter_info(): Get parameter information (priors etc.).
component_info(): Get component information.
covariate_info(): Get covariate information.
component_names(): Get component names.
get_model(): Get lgpmodel object.
is_f_sampled(): Determine if signal f is sampled or marginalized.
get_stanfit(): Extract stanfit object.
postproc(): Perform postprocessing.
contains_postproc(): Determine if object contains postprocessing information.
clear_postproc(): Clear postprocessing information (to reduce size of object).
num_paramsets(): Get number of parameter sets.
num_evalpoints(): Get number of points where posterior is evaluated.
num_components(): Get number of model components.

Fitting a model

Description

sample_model takes an lgpmodel object and fits it using sampling.
optimize_model takes an lgpmodel object and fits it using optimizing.

Usage

sample_model(
  model,
  verbose = TRUE,
  quiet = FALSE,
  skip_postproc = is_f_sampled(model),
  ...
)

optimize_model(model, ...)

Arguments

model

An object of class lgpmodel.

verbose

Can messages be printed?

quiet

Should all output messages be suppressed? You also need to set refresh=0 if you want to suppress the progress update messages from sampling.

skip_postproc

Should all postprocessing be skipped? If this is TRUE, the returned lgpfit object will likely be much smaller (if sample_f=FALSE).

...

Optional arguments passed to sampling or optimizing.

Value

sample_model returns an object of class lgpfit containing the parameter draws, the original model object, and possible postprocessing results. See documentation of lgpfit for more information.
optimize_model directly returns the list returned by optimizing. See its documentation for more information.

Select relevant components

Description

select performs strict selection, returning either TRUE or FALSE for each component.
select.integrate is like select, but instead of a fixed threshold, computes probabilistic selection by integrating over a threshold density.
select_freq performs the selection separately using each parameter draw and returns the frequency at which each component was selected.
select_freq.integrate is like select_freq, but instead of a fixed threshold, computes probabilistic selection frequencies by integrating over a threshold density.

Usage

select(fit, reduce = function(x) base::mean(x), threshold = 0.95, ...)

select_freq(fit, threshold = 0.95, ...)

select.integrate(
  fit,
  reduce = function(x) base::mean(x),
  p = function(x) stats::dbeta(x, 100, 5),
  h = 0.01,
  verbose = TRUE,
  ...
)

select_freq.integrate(
  fit,
  p = function(x) stats::dbeta(x, 100, 5),
  h = 0.01,
  verbose = TRUE,
  ...
)

Arguments

fit

An object of class lgpfit.

reduce

The reduce argument for relevances.

threshold

Threshold for relevance sum. Must be a value between 0 and 1.

...

Additional arguments to relevances.

p

A threshold density over interval [0,1].

h

A discretization parameter for computing a quadrature.

verbose

Should this show a progress bar?

Value

See description.
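
The idea of threshold-based selection and its integrated variant can be sketched in base R on a made-up relevance vector. This is one plausible reading of the description above, not the package's implementation. It assumes that the noise relevance always counts towards the sum and that components are added in decreasing order of relevance until the sum reaches the threshold. The function names select_sketch and select_integrate_sketch, the element name "noise", and the midpoint rule are all hypothetical choices for this illustration.

```r
# Hypothetical sketch: strict selection with a fixed threshold
select_sketch <- function(rel, threshold = 0.95, noise_name = "noise") {
  comp <- rel[names(rel) != noise_name]
  ord <- order(comp, decreasing = TRUE)
  # cum[k + 1] = noise relevance + sum of the k largest component relevances
  cum <- rel[[noise_name]] + c(0, cumsum(comp[ord]))
  k <- which(cum >= threshold)[1] - 1
  if (is.na(k)) k <- length(comp)   # threshold never reached: select all
  selected <- stats::setNames(rep(FALSE, length(comp)), names(comp))
  selected[names(comp)[ord][seq_len(k)]] <- TRUE
  selected
}

# Hypothetical sketch: average the strict selection over a threshold density p,
# using a midpoint rule on [0, 1] with step h and normalized weights
select_integrate_sketch <- function(rel,
                                    p = function(x) stats::dbeta(x, 100, 5),
                                    h = 0.01) {
  grid <- seq(h / 2, 1 - h / 2, by = h)
  w <- p(grid)
  w <- w / sum(w)
  sel <- do.call(cbind, lapply(grid, function(t) select_sketch(rel, threshold = t)))
  drop((sel + 0) %*% w)   # selection probability of each component
}

# Made-up relevances of three components and noise (they sum to one)
rel <- c(age = 0.55, id_age = 0.30, sex = 0.03, noise = 0.12)
strict <- select_sketch(rel, threshold = 0.95)
probs <- select_integrate_sketch(rel)
```

With these numbers, age and id_age are selected at the fixed threshold while sex is not. In the integrated version, sex receives a selection probability strictly between 0 and 1, because only part of the threshold density lies above 0.97.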

Printing formula object info using the show generic

Description

Printing formula object info using the show generic

Usage

## S4 method for signature 'lgpformula'
show(object)

## S4 method for signature 'lgprhs'
show(object)

## S4 method for signature 'lgpterm'
show(object)

Arguments

object

an object of some S4 class

Value

the object invisibly

Simulate latent function components for longitudinal data analysis

Description

Simulate latent function components for longitudinal data analysis

Usage

sim.create_f(
  X,
  covariates,
  relevances,
  lengthscales,
  X_affected,
  dis_fun,
  bin_kernel,
  steepness,
  vm_params,
  force_zeromean
)

Arguments

X

input data matrix (generated by sim.create_x)

covariates

Integer vector that defines the types of covariates (other than id and age). Different integers correspond to the following covariate types:

0 = disease-related age
1 = other continuous covariate
2 = a categorical covariate that interacts with age
3 = a categorical covariate that acts as a group offset
4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

relevances

Relative relevance of each component. Must be a vector so that
length(relevances) = 2 + length(covariates).
First two values define the relevance of the individual-specific age and shared age component, respectively.

lengthscales

A vector so that
length(lengthscales) = 2 + sum(covariates %in% c(0,1,2)).

X_affected

which individuals are affected by the disease

dis_fun

A function or a string that defines the disease effect. If this is a function, that function is used to generate the effect. If dis_fun is "gp_vm" or "gp_ns", the disease component is drawn from a nonstationary GP prior ("vm" is the variance masked version of it).

bin_kernel

Should the binary kernel be used for categorical covariates? If this is TRUE, the effect will exist only for group 1.

steepness

Steepness of the input warping function. This is only used if the disease component is in the model.

vm_params

Parameters of the variance mask function. This is only needed if useMaskedVarianceKernel = TRUE.

force_zeromean

Should each component (excluding the disease age component) be forced to have a zero mean?

Value

a data frame F where one column corresponds to one additive component

Create an input data frame X for simulated data

Description

Create an input data frame X for simulated data

Usage

sim.create_x(
  N,
  covariates,
  names,
  n_categs,
  t_data,
  t_jitter,
  t_effect_range,
  continuous_info
)

Arguments

N

Number of individuals.

covariates

Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:

0 = disease-related age
1 = other continuous covariate
2 = a categorical covariate that interacts with age
3 = a categorical covariate that acts as a group offset
4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

names

Covariate names.

n_categs

An integer vector defining the number of categories for each categorical covariate, so that length(n_categs) equals the number of 2's and 3's in the covariates vector.

t_data

Measurement times (same for each individual, unless t_jitter > 0 in which case they are perturbed).

t_jitter

Standard deviation of the jitter added to the given measurement times.

t_effect_range

Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual.

continuous_info

Info for generating continuous covariates. Must be a list containing fields lambda and mu, which have length 3. The continuous covariates are generated so that x <- sin(a*t + b) + c, where

t <- seq(0, 2*pi, length.out = k)
a <- mu[1] + lambda[1]*stats::runif(1)
b <- mu[2] + lambda[2]*stats::runif(1)
c <- mu[3] + lambda[3]*stats::runif(1)
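
The recipe above can be run directly in base R. In the sketch below, k is taken to be the number of measurement times per individual (an assumption), the values of mu and lambda are the defaults of continuous_info shown in the usage of simulate_data, and the offset is stored in c0 instead of c to avoid masking the base function c().

```r
set.seed(1)

k <- 7                          # assumed: number of time points per individual
mu <- c(pi / 8, pi, -0.5)       # default mu of continuous_info in simulate_data
lambda <- c(pi / 8, pi, 1)      # default lambda of continuous_info

t <- seq(0, 2 * pi, length.out = k)
a <- mu[1] + lambda[1] * stats::runif(1)    # random frequency
b <- mu[2] + lambda[2] * stats::runif(1)    # random phase
c0 <- mu[3] + lambda[3] * stats::runif(1)   # random vertical offset ("c" above)

# One continuous covariate trajectory
x <- sin(a * t + b) + c0
```

Because the sine term lies in [-1, 1], every generated value stays within one unit of the random offset c0.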

Value

a list

Simulate noisy observations

Description

Simulate noisy observations

Usage

sim.create_y(noise_type, f, snr, phi, gamma, N_trials)

Arguments

noise_type

Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial).

f

The underlying signal.

snr

The desired signal-to-noise ratio. This argument is valid only when noise_type is "gaussian".

phi

The inverse overdispersion parameter for negative binomial data. The variance is g + g^2/phi.

gamma

The dispersion parameter for beta-binomial data.

N_trials

The number of trials parameter for binomial data.

Value

A list out, where

out$h is f mapped through an inverse link function (times N_trials if noise_type is binomial or beta-binomial)
out$y is the noisy response variable.
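
The variance formula g + g^2/phi given for the phi argument matches the mean and size parametrization of stats::rnbinom, which can be checked by simulation in base R. The sketch assumes that g denotes the mean of the negative binomial distribution. It does not call sim.create_y.

```r
set.seed(123)

g <- 5      # assumed meaning: mean of the negative binomial distribution
phi <- 2    # inverse overdispersion parameter

# stats::rnbinom with arguments mu and size has variance mu + mu^2 / size
y <- stats::rnbinom(1e5, mu = g, size = phi)

var_theory <- g + g^2 / phi       # 5 + 25 / 2 = 17.5
var_empirical <- stats::var(y)    # should be close to var_theory
mean_empirical <- mean(y)         # should be close to g
```

As phi grows, the term g^2/phi vanishes and the variance approaches the mean, which is the Poisson case.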

Compute all kernel matrices when simulating data

Description

Compute all kernel matrices when simulating data

Usage

sim.kernels(
  X,
  types,
  lengthscales,
  X_affected,
  bin_kernel,
  useMaskedVarianceKernel,
  steepness,
  vm_params
)

Arguments

X

covariates

types

vector of covariate types, so that

1 = ID
2 = age
3 = diseaseAge
4 = other continuous covariate
5 = a categorical covariate that interacts with age
6 = a categorical covariate that acts as an offset

lengthscales

vector of lengthscales

X_affected

which individuals are affected by the disease

bin_kernel

whether or not binary (mask) kernel should be used for categorical covariates (if not, the zerosum kernel is used)

useMaskedVarianceKernel

should the masked variance kernel be used for drawing the disease component

steepness

steepness of the input warping function

vm_params

parameters of the variance mask function

Value

a 3D array

Generate an artificial longitudinal data set

Description

Generate an artificial longitudinal data set.

Usage

simulate_data(
  N,
  t_data,
  covariates = c(),
  names = NULL,
  relevances = c(1, 1, rep(1, length(covariates))),
  n_categs = rep(2, sum(covariates %in% c(2, 3))),
  t_jitter = 0,
  lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))),
  f_var = 1,
  noise_type = "gaussian",
  snr = 3,
  phi = 1,
  gamma = 0.2,
  N_affected = round(N/2),
  t_effect_range = "auto",
  t_observed = "after_0",
  c_hat = 0,
  dis_fun = "gp_warp_vm",
  bin_kernel = FALSE,
  steepness = 0.5,
  vm_params = c(0.025, 1),
  continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)),
  N_trials = 1,
  force_zeromean = TRUE
)

Arguments

N

Number of individuals.

t_data

Measurement times (same for each individual, unless t_jitter > 0 in which case they are perturbed).

covariates

Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:

0 = disease-related age
1 = other continuous covariate
2 = a categorical covariate that interacts with age
3 = a categorical covariate that acts as a group offset
4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

names

Covariate names.

relevances

n_categs

An integer vector defining the number of categories for each categorical covariate, so that length(n_categs) equals the number of 2's and 3's in the covariates vector.

t_jitter

Standard deviation of the jitter added to the given measurement times.

lengthscales

A vector so that length(lengthscales) = 2 + sum(covariates %in% c(0, 1, 2)).

f_var

variance of f

noise_type

Either "gaussian", "poisson", "nb" (negative binomial),"binomial", or "bb" (beta-binomial).

snr

The desired signal-to-noise ratio. This argument is valid only when noise_type is "gaussian".

phi

The inverse overdispersion parameter for negative binomial data. The variance is g + g^2/phi.

gamma

The dispersion parameter for beta-binomial data.

N_affected

Number of diseased individuals that are affected by the disease. This defaults to the number of diseased individuals. This argument can only be given if covariates contains a zero.

t_effect_range

Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual.

t_observed

Determines how the disease effect time is observed. This can be any function that takes the real disease effect time as an argument and returns the (possibly randomly generated) observed onset/initiation time. Alternatively, this can be a string of the form "after_n" or "random_p" or "exact".

c_hat

a constant added to f

dis_fun

bin_kernel

Should the binary kernel be used for categorical covariates? If this is TRUE, the effect will exist only for group 1.

steepness

Steepness of the input warping function. This is only used if the disease component is in the model.

vm_params

Parameters of the variance mask function. This is only needed if useMaskedVarianceKernel = TRUE.

continuous_info

Info for generating continuous covariates. Must be a list containing fields lambda and mu, which have length 3. The continuous covariates are generated so that x <- sin(a*t + b) + c, where

t <- seq(0, 2*pi, length.out = k)
a <- mu[1] + lambda[1]*stats::runif(1)
b <- mu[2] + lambda[2]*stats::runif(1)
c <- mu[3] + lambda[3]*stats::runif(1)

N_trials

The number of trials parameter for binomial data.

force_zeromean

Should each component (excluding the disease age component) be forced to have a zero mean?

Value

An object of class lgpsim.

Examples

# Generate Gaussian data
dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3)

# Generate negative binomially (NB) distributed count data
dat <- simulate_data(
  N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb",
  phi = 2
)
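Two of the argument descriptions above can be illustrated in base R without calling lgpr itself. The following is a minimal sketch, not the package's implementation: draw_continuous_covariate and nb_variance are hypothetical helper names, and the sketch assumes that phi plays the role of the size argument of stats::rnbinom, whose documented variance mu + mu^2/size has the same form as g + g^2/phi.

```r
# Sketch only: these helpers are hypothetical and are not part of lgpr.

# Draw one continuous covariate at k points, following the recipe
# documented for `continuous_info`: x = sin(a*t + b) + c.
draw_continuous_covariate <- function(k, mu, lambda) {
  stopifnot(length(mu) == 3, length(lambda) == 3)
  t <- seq(0, 2 * pi, length.out = k)
  a <- mu[1] + lambda[1] * stats::runif(1)
  b <- mu[2] + lambda[2] * stats::runif(1)
  c0 <- mu[3] + lambda[3] * stats::runif(1) # called c in the text above
  sin(a * t + b) + c0
}

# Theoretical negative binomial variance for mean g and parameter phi.
nb_variance <- function(g, phi) {
  g + g^2 / phi
}

set.seed(1)
info <- list(mu = c(pi / 8, pi, -0.5), lambda = c(pi / 8, pi, 1))
x <- draw_continuous_covariate(k = 5, mu = info$mu, lambda = info$lambda)

# Assumption: phi corresponds to `size` in stats::rnbinom(mu = , size = ).
draws <- stats::rnbinom(1e5, mu = 4, size = 2)
c(theoretical = nb_variance(4, 2), empirical = stats::var(draws))
```

With the default continuous_info values, the offset c lies between -0.5 and 0.5, so each simulated covariate stays within [-1.5, 1.5]. A smaller phi gives a larger variance for the same mean g.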

Split data into training and test sets

Description

split_by_factor splits according to given factor
split_within_factor splits according to given data point indices within the same level of a factor
split_within_factor_random selects k points from each level of a factor uniformly at random as test data
split_random splits uniformly at random
split_data splits according to given data rows

Usage

split_by_factor(data, test, var_name = "id")
split_within_factor(data, idx_test, var_name = "id")
split_within_factor_random(data, k_test = 1, var_name = "id")
split_random(data, p_test = 0.2, n_test = NULL)
split_data(data, i_test, sort_ids = TRUE)

Arguments

data

a data frame

test

the levels of the factor that will be used as test data

var_name

name of a factor in the data

idx_test

indices of the test data points within each level of the factor

k_test

desired number of test data points for each level of the factor

p_test

desired proportion of test data

n_test

desired number of test data points (if NULL, p_test is used to compute this)

i_test

test data row indices

sort_ids

should the test indices be sorted into increasing order

Value

a named list with names train, test, i_train and i_test
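To make the shape of the return value concrete, here is a minimal base R sketch of a uniformly random split. It is not the lgpr source: split_random_sketch is a hypothetical stand-in, and rounding p_test * nrow(data) to obtain the number of test rows is an assumption about how n_test would be derived.

```r
# Sketch only: `split_random_sketch` is a hypothetical stand-in, not lgpr code.
split_random_sketch <- function(data, p_test = 0.2, n_test = NULL) {
  n <- nrow(data)
  if (is.null(n_test)) {
    n_test <- round(p_test * n) # assumption: simple rounding
  }
  # Sample test rows uniformly at random, keep them in increasing order
  i_test <- sort(sample.int(n, size = n_test))
  i_train <- setdiff(seq_len(n), i_test)
  list(
    train = data[i_train, , drop = FALSE],
    test = data[i_test, , drop = FALSE],
    i_train = i_train,
    i_test = i_test
  )
}

set.seed(123)
toy <- data.frame(id = factor(rep(1:4, each = 5)), age = rep(1:5, times = 4))
parts <- split_random_sketch(toy, p_test = 0.2)
names(parts)
```

The train and test data frames partition the rows of the input, and i_train and i_test record which original rows went where, so predictions on the test set can be matched back to the full data.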

A very small artificial test data set, used mostly for unit tests

Description

A very small artificial test data set, used mostly for unit tests

Usage

testdata_001

Format

A data frame with 24 rows and 6 variables:

id: individual id, a factor with levels: 1, 2, 3, 4
age: age
dis_age: disease-related age
blood: a continuous variable
sex: a factor with 2 levels: Male, Female
y: a continuous variable

Medium-size artificial test data, used mostly for tutorials

Description

Medium-size artificial test data, used mostly for tutorials

Usage

testdata_002

Format

A data frame with 96 rows and 6 variables:

id: individual id, a factor with levels: 01-12
age: age
diseaseAge: disease-related age
sex: a factor with 2 levels: Male, Female
group: a factor with 2 levels: Case, Control
y: a continuous variable

Validate S4 class objects

Description

Validate S4 class objects

Usage

validate_lgpexpr(object)
validate_lgpformula(object)
validate_lgpscaling(object)
validate_lgpfit(object)
validate_GaussianPrediction(object)
validate_Prediction(object)

Arguments

object

an object to validate

Value

TRUE if valid, otherwise reasons for invalidity
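The return convention described here (TRUE, or a character vector of reasons) is the standard contract for S4 validity functions in the methods package. The sketch below shows that pattern on a toy class. ToyScaling and validate_toy_scaling are hypothetical names invented for illustration and do not correspond to any lgpr class or validator.

```r
library(methods)

# Hypothetical validator for a toy class, not one of the lgpr validators.
# Returns TRUE if the object is valid, otherwise a character vector of reasons.
validate_toy_scaling <- function(object) {
  errors <- character()
  if (length(object@loc) != 1) {
    errors <- c(errors, "loc must have length 1")
  }
  if (length(object@scale) != 1 || any(object@scale <= 0)) {
    errors <- c(errors, "scale must be a single positive number")
  }
  if (length(errors) == 0) TRUE else errors
}

setClass(
  "ToyScaling",
  representation(loc = "numeric", scale = "numeric"),
  validity = validate_toy_scaling
)

# A valid object is created normally
ok <- new("ToyScaling", loc = 0, scale = 2)

# An invalid object makes new() fail; the reasons appear in the error message
bad <- tryCatch(
  new("ToyScaling", loc = 0, scale = -1),
  error = function(e) conditionMessage(e)
)
```

Registering the function through the validity argument means it runs whenever new() is called with slot values, and again on demand through validObject().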

Variance masking function

Description

Variance masking function

Usage

var_mask(x, stp)

Arguments

x

a vector of length n

stp

a positive real number (steepness of mask function)

Value

a vector of length n
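The page does not state the formula of the mask. As a rough illustration only, the sketch below assumes a logistic form 1 / (1 + exp(-stp * x)), which rises from 0 to 1 around x = 0 with steepness stp. The function name var_mask_sketch is hypothetical, and the package source remains the authority on the actual definition.

```r
# Sketch only, assuming a logistic mask; not taken from the lgpr source.
var_mask_sketch <- function(x, stp) {
  stopifnot(is.numeric(x), length(stp) == 1, stp > 0)
  1 / (1 + exp(-stp * x))
}

x <- seq(-10, 10, by = 5)
m <- var_mask_sketch(x, stp = 1)
round(m, 3)
```

Under this assumption the mask is close to 0 well before x = 0 and close to 1 well after it, and a larger stp makes the transition sharper.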

Input warping function

Description

Input warping function

Usage

warp_input(x, a)

Arguments

x

a vector of length n

a

steepness of the warping function rise

Value

a vector of warped inputs w(x), length n
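The warping formula is not given on this page. The sketch below assumes a rescaled sigmoid, w(x) = 2 * (1 / (1 + exp(-a * x)) - 0.5), which maps the real line to (-1, 1) and rises through 0 at x = 0. The name warp_input_sketch is hypothetical, and the functional form should be checked against the package source before relying on it.

```r
# Sketch only, assuming a rescaled sigmoid warp; not taken from the lgpr source.
warp_input_sketch <- function(x, a) {
  stopifnot(is.numeric(x), length(a) == 1, a > 0)
  2 * (-0.5 + 1 / (1 + exp(-a * x)))
}

x <- seq(-24, 24, by = 12)
w <- warp_input_sketch(x, a = 0.5)
round(w, 4)
```

Under this assumption the warp equals tanh(a * x / 2), so inputs far from 0 are squashed towards -1 or 1 and only the neighbourhood of 0 retains resolution.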
