Title: Longitudinal Gaussian Process Regression
Version: 1.2.5
Description: Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan'. The modeling approach and methods are described in detail in Timonen et al. (2021) <doi:10.1093/bioinformatics/btab021>.
License: GPL (≥ 3)
Encoding: UTF-8
LazyData: true
Biarch: true
Depends: R (≥ 3.4.0), methods
Imports: Rcpp (≥ 0.12.0), RcppParallel (≥ 5.0.2), RCurl (≥ 1.98), rstan (≥ 2.26.0), rstantools (≥ 2.3.1), bayesplot (≥ 1.7.0), MASS (≥ 7.3-50), stats (≥ 3.4), ggplot2 (≥ 3.1.0), gridExtra (≥ 0.3.0)
LinkingTo: BH (≥ 1.75.0-0), Rcpp (≥ 1.0.6), RcppEigen (≥ 0.3.3.9.1), RcppParallel (≥ 5.0.2), rstan (≥ 2.26.0), StanHeaders (≥ 2.26.0)
SystemRequirements: GNU make
NeedsCompilation: yes
RoxygenNote: 7.3.2
Suggests: knitr, rmarkdown, testthat, covr
URL: https://github.com/jtimonen/lgpr
BugReports: https://github.com/jtimonen/lgpr/issues
VignetteBuilder: knitr
Packaged: 2025-10-30 18:12:13 UTC; juhotimonen
Author: Juho Timonen [aut, cre], Andrew Johnson [ctb]
Maintainer: Juho Timonen <juho.timonen@iki.fi>
Repository: CRAN
Date/Publication: 2025-10-30 23:50:14 UTC

The 'lgpr' package.

Description

Interpretable nonparametric modeling of longitudinal data using additive Gaussian process regression. Contains functionality for inferring covariate effects and assessing covariate relevances. Models are specified using a convenient formula syntax, and can include shared, group-specific, non-stationary, heterogeneous and temporally uncertain effects. Bayesian inference for model parameters is performed using 'Stan' (rstan). The modeling approach and methods are described in detail in Timonen et al. (2021).

Core functions

The main functionality of the package consists of creating and fitting an additive GP model.

Visualization

Data

The data that you wish to analyze with 'lgpr' should be in an R data.frame where columns correspond to measured variables and rows correspond to observations. Functions that can help when working with such data frames include add_dis_age(), add_factor(), add_factor_crossing(), adjusted_c_hat(), new_x() and split().

Vignettes and tutorials

See https://jtimonen.github.io/lgpr-usage/index.html. The tutorials focus on code and use cases, whereas the "Mathematical description of lgpr models" vignette describes the statistical models and how they can be customized in 'lgpr'.

Citation

Run citation("lgpr") to get citation information.

Feedback

Bug reports, PRs, enhancement ideas and user experiences in general are welcome and appreciated. Create an issue on GitHub or email the author.

Author(s)

Juho Timonen (first.last at iki.fi)

References

  1. Timonen, J. et al. (2021). lgpr: an interpretable non-parametric method for inferring covariate effects from longitudinal data. Bioinformatics, doi:10.1093/bioinformatics/btab021.

  2. Carpenter, B. et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software 76(1).

See Also

Useful links:

  • https://github.com/jtimonen/lgpr

  • Report bugs at https://github.com/jtimonen/lgpr/issues


An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model

Description

An S4 class to represent analytically computed predictive distributions (conditional on hyperparameters) of an additive GP model

Usage

## S4 method for signature 'GaussianPrediction'
show(object)

## S4 method for signature 'GaussianPrediction'
component_names(object)

## S4 method for signature 'GaussianPrediction'
num_components(object)

## S4 method for signature 'GaussianPrediction'
num_paramsets(object)

## S4 method for signature 'GaussianPrediction'
num_evalpoints(object)

Arguments

object

GaussianPrediction object for which to apply a class method.

Methods (by generic)

Slots

f_comp_mean

component means

f_comp_std

component standard deviations

f_mean

signal mean (on normalized scale)

f_std

signal standard deviation (on normalized scale)

y_mean

predictive mean (on original data scale)

y_std

predictive standard deviation (on original data scale)

x

a data frame of points (covariate values) where the function posteriors or predictive distributions have been evaluated

See Also

Prediction


An S4 class to represent input for kernel matrix computations

Description

An S4 class to represent input for kernel matrix computations

Usage

## S4 method for signature 'KernelComputer'
show(object)

## S4 method for signature 'KernelComputer'
num_components(object)

## S4 method for signature 'KernelComputer'
num_evalpoints(object)

## S4 method for signature 'KernelComputer'
num_paramsets(object)

## S4 method for signature 'KernelComputer'
component_names(object)

Arguments

object

The object for which to call a class method.

Methods (by generic)

Slots

input

Common input (for example parameter values).

K_input

Input for computing kernel matrices between data points (N x N). A list.

Ks_input

Input for computing kernel matrices between data and output points (P x N). A list.

Kss_input

Input for computing kernel matrices between output points (P x P). A list, empty if full_covariance=FALSE.

comp_names

Component names (character vector).

full_covariance

Boolean value determining if this can compute full predictive covariance matrices (or just marginal variance at each point).

no_separate_output_points

Boolean value determining if Ks_input and Kss_input are the same thing. Using this knowledge can reduce unnecessary computations of kernel matrices.

STREAM

external pointer (for calling 'Stan' functions)


An S4 class to represent prior or posterior draws from an additive function distribution.

Description

An S4 class to represent prior or posterior draws from an additive function distribution.

Usage

## S4 method for signature 'Prediction'
show(object)

## S4 method for signature 'Prediction'
component_names(object)

## S4 method for signature 'Prediction'
num_components(object)

## S4 method for signature 'Prediction'
num_paramsets(object)

## S4 method for signature 'Prediction'
num_evalpoints(object)

Arguments

object

Prediction object for which to apply a class method.

Methods (by generic)

Slots

f_comp

component draws

f

signal draws

h

predictions (signal draws + scaling factor c_hat, transformed through inverse link function)

x

a data frame of points (covariate values) where the functions/predictions have been evaluated/sampled

extrapolated

Boolean value telling if the function draws are original MCMC draws or if they have been created by extrapolating such draws.

See Also

GaussianPrediction


Easily add the disease-related age variable to a data frame

Description

Creates the disease-related age covariate vector based on the disease initiation times and adds it to the data frame.

Usage

add_dis_age(data, t_init, id_var = "id", time_var = "age")

Arguments

data

the original data frame

t_init

A named vector containing the observed initiation or onset time for each individual. The names, i.e. names(t_init), should specify the individual id.

id_var

name of the id variable in data

time_var

name of the time variable in data

Value

A data frame with one column added. The new column will be called dis_age. For controls, its value will be NaN.
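The following base-R sketch illustrates the idea on toy data. It assumes that the disease-related age is the observation time minus the individual's initiation time, which is not stated explicitly above. The function name add_dis_age_sketch and the toy data are hypothetical, and this is not the package's implementation.

```r
# Hypothetical stand-in for add_dis_age(); assumes dis_age = time - t_init[id].
add_dis_age_sketch <- function(data, t_init, id_var = "id", time_var = "age") {
  ids <- as.character(data[[id_var]])
  # Look up each row's initiation time by id; ids that are not named in
  # t_init (the controls) yield NA here.
  onset <- unname(t_init[ids])
  dis_age <- data[[time_var]] - onset
  dis_age[is.na(onset)] <- NaN # controls get NaN, as documented above
  data$dis_age <- dis_age
  data
}

toy <- data.frame(
  id = factor(c(1, 1, 2, 2)),
  age = c(10, 20, 10, 20)
)
t_init <- c("1" = 15) # individual 1 is a case, individual 2 is a control
out <- add_dis_age_sketch(toy, t_init)
print(out)
```

Rows of individual 1 get negative values before the initiation time and positive values after it, while rows of the control individual are NaN.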

See Also

Other data frame handling functions: add_factor(), add_factor_crossing(), adjusted_c_hat(), new_x(), split()


Easily add a categorical covariate to a data frame

Description

Easily add a categorical covariate to a data frame

Usage

add_factor(data, x, id_var = "id")

Arguments

data

the original data frame

x

A named vector containing the category for each individual. The names should specify the individual id.

id_var

name of the id variable in data

Value

A data frame with one column added. The new column will have the same name as the variable passed as input x.

See Also

Other data frame handling functions: add_dis_age(), add_factor_crossing(), adjusted_c_hat(), new_x(), split()


Add a crossing of two factors to a data frame

Description

Add a crossing of two factors to a data frame

Usage

add_factor_crossing(data, fac1, fac2, new_name)

Arguments

data

a data frame

fac1

name of the first factor, must be found in data

fac2

name of the second factor, must be found in data

new_name

name of the new factor

Value

a data frame
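As a rough illustration of what a factor crossing is, the base-R function interaction() builds a factor with one level per combination of two factors. This is only a sketch of the concept on hypothetical toy data; the level names and ordering produced by add_factor_crossing() may differ.

```r
toy <- data.frame(
  sex = factor(c("F", "F", "M", "M")),
  loc = factor(c("A", "B", "A", "B"))
)
# interaction() creates a factor whose levels are combinations of the inputs;
# drop = TRUE keeps only the combinations that occur in the data.
toy$sex_loc <- interaction(toy$sex, toy$loc, drop = TRUE)
print(levels(toy$sex_loc))
```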

See Also

Other data frame handling functions: add_dis_age(), add_factor(), adjusted_c_hat(), new_x(), split()


Set the GP mean vector, taking TMM or other normalization into account

Description

Creates the c_hat input for lgp, so that it accounts for normalization between data points in the "poisson" or "nb" observation model

Usage

adjusted_c_hat(y, norm_factors)

Arguments

y

response variable, vector of length n

norm_factors

normalization factors, vector of length n

Value

a vector of length n, which can be used as the c_hat input to the lgp function

See Also

Other data frame handling functions: add_dis_age(), add_factor(), add_factor_crossing(), new_x(), split()


Apply variable scaling

Description

Apply variable scaling

Usage

apply_scaling(scaling, x, inverse = FALSE)

Arguments

scaling

an object of class lgpscaling

x

object to which the scaling is applied (numeric)

inverse

whether scaling should be done in the inverse direction

Value

an object similar to x
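A standardizing transform and its inverse can be sketched in base R as follows. The sketch assumes that the scaling is defined by a location (mean) and a scale (standard deviation), as the lgpscaling slots documented later suggest. The list-based representation and the *_sketch function names are hypothetical; the package itself uses an S4 class.

```r
# Hypothetical list-based stand-in for an lgpscaling object.
create_scaling_sketch <- function(x, name) {
  list(
    loc = mean(x, na.rm = TRUE), # ignore NA/NaN when estimating the location
    scale = stats::sd(x, na.rm = TRUE),
    var_name = name
  )
}

apply_scaling_sketch <- function(scaling, x, inverse = FALSE) {
  if (inverse) {
    x * scaling$scale + scaling$loc # back to the original scale
  } else {
    (x - scaling$loc) / scaling$scale # to zero mean and unit sd
  }
}

age <- c(12, 24, NA, 36, 48)
scl <- create_scaling_sketch(age, "age")
z <- apply_scaling_sketch(scl, age)
back <- apply_scaling_sketch(scl, z, inverse = TRUE)
```

Applying the transform and then its inverse recovers the original values, with missing values left in place.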

See Also

Other variable scaling functions: create_scaling()


Character representations of different formula objects

Description

Character representations of different formula objects

Usage

## S4 method for signature 'lgpexpr'
as.character(x)

## S4 method for signature 'lgpterm'
as.character(x)

## S4 method for signature 'lgpformula'
as.character(x)

Arguments

x

an object of some S4 class

Value

a character representation of the object


Create a model

Description

See the "Mathematical description of lgpr models" vignette for more information about the connection between different options and the created statistical model.

Usage

create_model(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian")
)

Arguments

formula

The model formula, where

  • it must contain exactly one tilde (~), with the response variable on the left-hand side and model terms on the right-hand side

  • terms are separated by a plus (+) sign

  • all variables appearing in formula must be found in data

See the "Model formula syntax" section below (lgp) for instructions on how to specify the model terms.

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

prior

A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below (lgp).

c_hat

The GP mean. This should only be given if sample_f is TRUE, otherwise the GP will always have zero mean. If sample_f is TRUE, the given c_hat can be a vector of length dim(data)[1], or a real number defining a constant GP mean. If not specified and sample_f is TRUE, c_hat is set to

  • c_hat = mean(y), if likelihood is "gaussian",

  • c_hat = log(mean(y)) if likelihood is "poisson" or "nb",

  • c_hat = log(p/(1-p)), where p = mean(y/num_trials) if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.
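The documented defaults can be written out as a small base-R helper. This is a sketch that only mirrors the three formulas listed above; the function name default_c_hat_sketch is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper mirroring the documented defaults for c_hat.
default_c_hat_sketch <- function(y, likelihood, num_trials = NULL) {
  switch(likelihood,
    gaussian = mean(y),
    poisson = , # falls through to the "nb" case
    nb = log(mean(y)),
    binomial = , # falls through to the "bb" case
    bb = {
      p <- mean(y / num_trials)
      log(p / (1 - p)) # logit of the average success proportion
    },
    stop("unknown likelihood: ", likelihood)
  )
}

c_pois <- default_c_hat_sketch(c(2, 4, 6), "poisson")
c_binom <- default_c_hat_sketch(c(1, 3), "binomial", num_trials = 4)
print(c(c_pois, c_binom))
```

In the binomial example the average success proportion is 0.5, so the resulting constant GP mean on the logit scale is zero.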

num_trials

This argument (number of trials) is only needed when likelihood is "binomial" or "bb". Must have length one or equal to the number of data points. Setting num_trials=1 and likelihood="binomial" corresponds to the Bernoulli observation model.

options

A named list with the following possible fields:

  • delta Amount of added jitter to ensure positive definite covariance matrices.

  • vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).
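To see why a small jitter helps, consider a kernel matrix computed at two identical input points. The base-R sketch below uses a hypothetical 2 x 2 matrix and assumes the jitter is added to the diagonal, which is the usual technique; how the package applies delta internally is not described here.

```r
# Two identical inputs make this kernel matrix exactly singular.
K <- matrix(1, nrow = 2, ncol = 2)
# chol() is expected to fail here because K is not positive definite.
fails <- inherits(try(chol(K), silent = TRUE), "try-error")

# Add delta to the diagonal, using the documented default value.
opts <- list(delta = 1e-8, vm_params = c(0.025, 1))
K_jit <- K + opts$delta * diag(nrow(K))
L <- chol(K_jit) # upper-triangular factor with t(L) %*% L equal to K_jit
```

A larger delta makes the factorization more robust but perturbs the covariance more, so the value is a trade-off.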

prior_only

Should likelihood be ignored? See also sample_param_prior which can be used for any lgpmodel, and whose runtime is independent of the number of observations.

verbose

Should some informative messages be printed?

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.

Value

An object of class lgpmodel, containing the Stan input created based on parsing the specified formula, prior, and other options.

See Also

Other main functions: draw_pred(), get_draws(), lgp(), pred(), prior_pred(), sample_model()


Parse the covariates and model components from given data and formula

Description

Parse the covariates and model components from given data and formula

Usage

create_model.covs_and_comps(data, model_formula, x_cont_scl, verbose)

Arguments

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.

model_formula

an object of class lgpformula

x_cont_scl

Information on how to scale the continuous covariates. This can either be

  • an existing list of objects with class lgpscaling, or

  • NA, in which case such a list is created by computing the mean and standard deviation from data

verbose

Should some informative messages be printed?

Value

parsed input to Stan and covariate scaling, and other info

See Also

Other internal model creation functions: create_model.formula(), create_model.likelihood(), create_model.prior()


Create a model formula

Description

Checks if the formula is in the advanced format and translates it if not.

Usage

create_model.formula(formula, data, verbose = FALSE)

Arguments

formula

The model formula, where

  • it must contain exactly one tilde (~), with the response variable on the left-hand side and model terms on the right-hand side

  • terms are separated by a plus (+) sign

  • all variables appearing in formula must be found in data

See the "Model formula syntax" section below (lgp) for instructions on how to specify the model terms.

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.

verbose

Should some informative messages be printed?

Value

an object of class lgpformula

See Also

Other internal model creation functions: create_model.covs_and_comps(), create_model.likelihood(), create_model.prior()


Parse the response variable and its likelihood model

Description

Parse the response variable and its likelihood model

Usage

create_model.likelihood(
  data,
  likelihood,
  c_hat,
  num_trials,
  y_name,
  sample_f,
  verbose
)

Arguments

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

c_hat

The GP mean. This should only be given if sample_f is TRUE, otherwise the GP will always have zero mean. If sample_f is TRUE, the given c_hat can be a vector of length dim(data)[1], or a real number defining a constant GP mean. If not specified and sample_f is TRUE, c_hat is set to

  • c_hat = mean(y), if likelihood is "gaussian",

  • c_hat = log(mean(y)) if likelihood is "poisson" or "nb",

  • c_hat = log(p/(1-p)), where p = mean(y/num_trials) if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.

num_trials

This argument (number of trials) is only needed when likelihood is "binomial" or "bb". Must have length one or equal to the number of data points. Setting num_trials=1 and likelihood="binomial" corresponds to the Bernoulli observation model.

y_name

Name of response variable

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.

verbose

Should some informative messages be printed?

Value

a list of parsed options

See Also

Other internal model creation functions: create_model.covs_and_comps(), create_model.formula(), create_model.prior()


Parse the given modeling options

Description

Parse the given modeling options

Usage

create_model.options(options, verbose)

Arguments

options

A named list with the following possible fields:

  • delta Amount of added jitter to ensure positive definite covariance matrices.

  • vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).

verbose

Should some informative messages be printed?

Value

a named list of parsed options


Parse given prior

Description

Parse given prior

Usage

create_model.prior(prior, stan_input, verbose)

Arguments

prior

A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below (lgp).

stan_input

a list of stan input fields

verbose

Should some informative messages be printed?

Value

a named list of parsed options

See Also

Other internal model creation functions: create_model.covs_and_comps(), create_model.formula(), create_model.likelihood()


Helper function for plots

Description

Helper function for plots

Usage

create_plot_df(object, x = "age", group_by = "id")

Arguments

object

model or fit

x

x-axis variable name

group_by

grouping variable name (use NULL for no grouping)

Value

a data frame


Create a standardizing transform

Description

Create a standardizing transform

Usage

create_scaling(x, name)

Arguments

x

variable measurements (might contain NA or NaN)

name

variable name

Value

an object of class lgpscaling

See Also

Other variable scaling functions: apply_scaling()


Density and quantile functions of the inverse gamma distribution

Description

These functions use the same parametrization as Stan.

Usage

dinvgamma_stanlike(x, alpha, beta, log = FALSE)

qinvgamma_stanlike(p, alpha, beta)

Arguments

x

point where to compute the density

alpha

positive real number

beta

positive real number

log

is log-scale used?

p

quantile (must be between 0 and 1)

Value

density/quantile value
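For reference, Stan's inverse gamma distribution has shape alpha and scale beta, so X is inverse gamma distributed exactly when 1/X is gamma distributed with shape alpha and rate beta. The base-R sketch below writes the density and quantile function under that parametrization. The *_sketch names are hypothetical and the code is an independent illustration, not the package's source.

```r
# Hypothetical re-implementations of the density and quantile functions.
dinvgamma_sketch <- function(x, alpha, beta, log = FALSE) {
  # Log-density: alpha*log(beta) - lgamma(alpha) - (alpha+1)*log(x) - beta/x
  log_dens <- alpha * base::log(beta) - lgamma(alpha) -
    (alpha + 1) * base::log(x) - beta / x
  if (log) log_dens else exp(log_dens)
}

qinvgamma_sketch <- function(p, alpha, beta) {
  # P(X <= q) = p is equivalent to P(1/X <= 1/q) = 1 - p for 1/X ~ Gamma.
  1 / stats::qgamma(1 - p, shape = alpha, rate = beta)
}

x <- c(0.5, 1, 2)
dens <- dinvgamma_sketch(x, alpha = 3, beta = 2)
med <- qinvgamma_sketch(0.5, alpha = 3, beta = 2)
```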

See Also

Other functions related to the inverse-gamma distribution: plot_invgamma(), priors


Draw pseudo-observations from posterior or prior predictive distribution

Description

Draw pseudo-observations from the predictive distribution. If pred contains draws from the component posterior (prior) distributions, then the output is draws from the posterior (prior) predictive distribution. If pred is not specified, then whether output draws are from the prior or posterior predictive distribution depends on whether fit is created using the lgp option prior_only=TRUE or not.

Usage

draw_pred(fit, pred = NULL)

Arguments

fit

An object of class lgpfit that has been created using the lgp option sample_f=TRUE.

pred

An object of class Prediction, containing draws of each model component. If NULL, this is obtained using get_pred(fit).

Value

An array with shape S x P, where S is the number of draws that pred contains and P is the length of each function draw. Each row s = 1, ..., S of the output is one vector drawn from the predictive distribution, given parameter draws.

See Also

Other main functions: create_model(), get_draws(), lgp(), pred(), prior_pred(), sample_model()


Quick way to create an example lgpfit, useful for debugging

Description

Quick way to create an example lgpfit, useful for debugging

Usage

example_fit(
  formula = y ~ id + age + age | SEX + age | LOC,
  likelihood = "gaussian",
  chains = 1,
  iter = 30,
  num_indiv = 6,
  num_timepoints = 5,
  ...
)

Arguments

formula

model formula

likelihood

observation model

chains

number of chains to run

iter

number of iterations to run

num_indiv

number of individuals (data simulation)

num_timepoints

number of time points (data simulation)

...

additional arguments to lgp

Value

An lgpfit object created by fitting the example model.


Print a fit summary.

Description

Print a fit summary.

Usage

fit_summary(fit, ignore_pars = c("f_latent", "eta", "teff_raw", "lp__"))

Arguments

fit

an object of class lgpfit

ignore_pars

parameters and generated quantities to ignore from output

Value

object invisibly.


Extract parameter draws from lgpfit or stanfit

Description

Uses rstan::extract() with permuted = FALSE and inc_warmup = FALSE.

Usage

get_draws(object, draws = NULL, reduce = NULL, ...)

Arguments

object

An object of class lgpfit or stanfit.

draws

Indices of the parameter draws. NULL corresponds to all post-warmup draws.

reduce

Function used to reduce all parameter draws into one set of parameters. Ignored if NULL, or if draws is not NULL.

...

Additional arguments to rstan::extract().

Value

The return value is always a 2-dimensional array of shape num_param_sets x num_params.

See Also

Other main functions: create_model(), draw_pred(), lgp(), pred(), prior_pred(), sample_model()


Extract model predictions and function posteriors

Description

NOTE: It is not recommended for users to call this. Use pred instead.

Usage

get_pred(fit, draws = NULL, reduce = NULL, verbose = TRUE)

Arguments

fit

An object of class lgpfit.

draws

Indices of parameter draws to use, or NULL to use all draws.

reduce

Reduction for parameter draws. Can be a function that is applied to reduce all parameter draws into one parameter set, or NULL (no reduction). Has no effect if draws is specified.

verbose

Should more information and a possible progress bar be printed?

Value

an object of class GaussianPrediction or Prediction


Compute a kernel matrix (covariance matrix)

Description

These have STAN_kernel_* counterparts. These R versions are provided for reference and are not optimized for speed. They are used when generating simulated data, and not during model inference.

Usage

kernel_eq(x1, x2, alpha = 1, ell)

kernel_ns(x1, x2, alpha = 1, ell, a)

kernel_zerosum(x1, x2, M)

kernel_bin(x1, x2, pos_class = 0)

kernel_cat(x1, x2)

kernel_varmask(x1, x2, a, vm_params)

kernel_beta(beta, idx1_expand, idx2_expand)

Arguments

x1

vector of length n

x2

vector of length m

alpha

marginal std (default = 1)

ell

lengthscale

a

steepness of the warping function rise

M

number of categories

pos_class

binary (mask) kernel function has value one if both inputs have this value, otherwise it is zero

vm_params

vector of two mask function parameters.

beta

a parameter vector (row vector) of length N_cases

idx1_expand

integer vector of length n

idx2_expand

integer vector of length m

Value

A matrix of size n x m.
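As an illustration of how such n x m kernel matrices arise, the base-R sketch below computes two common kernels. It assumes the standard squared exponential form alpha^2 * exp(-(x - x')^2 / (2 * ell^2)) for the continuous kernel, and a zero-sum kernel that equals 1 for matching categories and -1/(M - 1) otherwise. These forms are assumptions made for this example and the *_sketch functions are hypothetical, so details may differ from the package's kernel_eq() and kernel_zerosum().

```r
# Hypothetical sketches of two kernels; the formulas are assumed, see above.
kernel_eq_sketch <- function(x1, x2, alpha = 1, ell) {
  d <- outer(x1, x2, "-") # n x m matrix of pairwise differences
  alpha^2 * exp(-d^2 / (2 * ell^2))
}

kernel_zerosum_sketch <- function(x1, x2, M) {
  same <- outer(x1, x2, "==") # TRUE where the categories match
  ifelse(same, 1, -1 / (M - 1))
}

K_eq <- kernel_eq_sketch(c(0, 1, 2), c(0, 1), alpha = 2, ell = 1)
K_zs <- kernel_zerosum_sketch(1:3, 1:3, M = 3)
print(dim(K_eq))
print(rowSums(K_zs))
```

With all M categories present, each row of the zero-sum kernel matrix sums to zero, which is what gives the kernel its name.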

Functions


Main function of the 'lgpr' package

Description

Creates an additive Gaussian process model using create_model and fits it using sample_model. See the "Mathematical description of lgpr models" vignette for more information about the connection between different options and the created statistical model.

Usage

lgp(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian"),
  quiet = FALSE,
  skip_postproc = sample_f,
  ...
)

Arguments

formula

The model formula, where

  • it must contain exactly one tilde (~), with the response variable on the left-hand side and model terms on the right-hand side

  • terms are separated by a plus (+) sign

  • all variables appearing in formula must be found in data

See the "Model formula syntax" section below (lgp) for instructions on how to specify the model terms.

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

prior

A named list, defining the prior distribution of model (hyper)parameters. See the "Defining priors" section below (lgp).

c_hat

The GP mean. This should only be given if sample_f is TRUE, otherwise the GP will always have zero mean. If sample_f is TRUE, the given c_hat can be a vector of length dim(data)[1], or a real number defining a constant GP mean. If not specified and sample_f is TRUE, c_hat is set to

  • c_hat = mean(y), if likelihood is "gaussian",

  • c_hat = log(mean(y)) if likelihood is "poisson" or "nb",

  • c_hat = log(p/(1-p)), where p = mean(y/num_trials) if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.

num_trials

This argument (number of trials) is only needed when likelihood is "binomial" or "bb". Must have length one or equal to the number of data points. Setting num_trials=1 and likelihood="binomial" corresponds to the Bernoulli observation model.

options

A named list with the following possible fields:

  • delta Amount of added jitter to ensure positive definite covariance matrices.

  • vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).

prior_only

Should likelihood be ignored? See also sample_param_prior which can be used for any lgpmodel, and whose runtime is independent of the number of observations.

verbose

Can messages be printed during model creation? Has no effect if quiet=TRUE.

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.

quiet

Should all output messages be suppressed? You also need to set refresh=0 if you want to suppress the progress update messages from sampling.

skip_postproc

Should all postprocessing be skipped? If this is TRUE, the returned lgpfit object will likely be much smaller (if sample_f=FALSE).

...

Optional arguments passed to sampling or optimizing.

Value

Returns an object of the S4 class lgpfit.

Model formula syntax

There are two ways to define the model formula:

  1. Using a common formula-like syntax, like in y ~ age + age|id + sex. Terms can consist of a single variable, such as age, or an interaction of two variables, such as age|id. In single-variable terms, the variable can be either continuous (numeric) or categorical (factor), whereas in interaction terms the variable on the left-hand side of the vertical bar (|) has to be continuous and the one on the right-hand side has to be categorical. Formulae specified using this syntax are translated to the advanced format so that

    • single-variable terms become gp(x) if variable x is numeric and zs(x) if x is a factor

    • interaction terms x|z become gp(x)*zs(z)

  2. Using the advanced syntax, like in y ~ gp(age) + gp(age)*zs(id) + het(id)*gp_vm(disAge). This creates lgprhs objects, which consist of lgpterms, which consist of lgpexprs. This approach must be used if creating nonstationary, heterogeneous or temporally uncertain components.

Either one of the approaches should be used and they should not be mixed.
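The translation rules above can be illustrated by writing the same model in both syntaxes. The sketch below assumes that age is numeric and that id and sex are factors in the data, so the advanced form follows from the stated rules. Building a formula object does not evaluate gp() or zs(), so the snippet runs in base R without the package; the variable names are hypothetical.

```r
# Common syntax: a shared age effect, an id-specific age effect, and sex.
f_common <- y ~ age + age | id + sex

# The same model in the advanced syntax, following the translation rules.
f_advanced <- y ~ gp(age) + gp(age) * zs(id) + zs(sex)

# Both formulas refer to the same set of variables in the data.
vars_common <- all.vars(f_common)
vars_advanced <- all.vars(f_advanced)
print(setequal(vars_common, vars_advanced))
```

Either formula object would then be passed as the formula argument, together with a data frame that contains all of the listed variables.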

Defining priors

The prior argument must be a named list, like list(alpha=student_t(4), wrp=igam(30,10)). See examples in tutorials. Possible allowed names are

See priors for functions that can be used to define the list elements. If a parameter of a model is not given in this list, a default prior will be used for it.

When to not use default priors

It is not recommended to use default priors blindly. Rather, priors should be specified according to the knowledge about the problem at hand, as in any Bayesian analysis. In lgpr this is especially important when

  1. Using a non-Gaussian likelihood or otherwise setting sample_f = TRUE. In this case the response variable is not normalized, so the scale on which the data varies must be taken into account when defining priors of the signal magnitude parameters alpha and possible noise parameters (sigma, phi, gamma). Also it should be checked if c_hat is set in a sensible way.

  2. Using a model that contains a gp_ns(x) or gp_vm(x) expression in its formula. In this case the corresponding covariate x is not normalized, and the prior for the input warping steepness parameter wrp must be set according to the expected width of the window in which the nonstationary effect of x occurs. By default, the width of this window is about 36, which has been set assuming that the unit of x is months.

See Also

Other main functions: create_model(), draw_pred(), get_draws(), pred(), prior_pred(), sample_model()


An S4 class to represent an lgp expression

Description

An S4 class to represent an lgp expression

Slots

covariate

name of a covariate

fun

function name

See Also

See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.


An S4 class to represent the output of the lgp function

Description

An S4 class to represent the output of the lgp function

Usage

## S4 method for signature 'lgpfit'
show(object)

## S4 method for signature 'lgpfit'
component_names(object)

## S4 method for signature 'lgpfit'
num_components(object)

## S4 method for signature 'lgpfit'
postproc(object, verbose = TRUE)

## S4 method for signature 'lgpfit'
contains_postproc(object)

## S4 method for signature 'lgpfit'
clear_postproc(object)

## S4 method for signature 'lgpfit'
get_model(object)

## S4 method for signature 'lgpfit'
get_stanfit(object)

## S4 method for signature 'lgpfit'
is_f_sampled(object)

## S4 method for signature 'lgpfit,missing'
plot(x, y)

Arguments

object

The object for which to apply a class method.

verbose

Can the method print any messages?

x

an lgpfit object to visualize

y

unused argument

Methods (by generic)

Slots

stan_fit

An object of class stanfit.

model

An object of class lgpmodel.

num_draws

Total number of parameter draws.

postproc_results

A named list containing possible postprocessing results.

See Also

For extracting parameter draws, see get_draws, or the rstan methods for stanfit objects.

For more detailed plotting functions, see plot_draws, plot_beta, plot_warp, plot_effect_times


An S4 class to represent an lgp formula

Description

An S4 class to represent an lgp formula

Slots

terms

an object of class lgprhs

y_name

name of the response variable

call

original formula call

See Also

See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.


An S4 class to represent an additive GP model

Description

An S4 class to represent an additive GP model

Usage

## S4 method for signature 'lgpmodel'
show(object)

## S4 method for signature 'lgpmodel'
parameter_info(object, digits = 3)

## S4 method for signature 'lgpmodel'
component_info(object)

## S4 method for signature 'lgpmodel'
num_components(object)

## S4 method for signature 'lgpmodel'
covariate_info(object)

## S4 method for signature 'lgpmodel'
component_names(object)

## S4 method for signature 'lgpmodel'
is_f_sampled(object)

Arguments

object

The object for which to apply a class method.

digits

number of digits to show for floating point numbers

Methods (by generic)

Slots

formula

An object of class lgpformula

data

The original unmodified data.

stan_input

The data to be given as input to rstan::sampling

var_names

List of variable names grouped by type.

var_scalings

A named list with fields

  • y - Response variable normalization function and its inverse operation. Must be an lgpscaling object.

  • x_cont - Continuous covariate normalization functions and their inverse operations. Must be a named list where each element is an lgpscaling object.

var_info

A named list with fields

  • x_cat_levels - Names of the levels of categorical covariates before converting from factor to numeric.

info

Other info in text format.

sample_f

Whether the signal f is sampled or marginalized.

full_prior

Complete prior information.


An S4 class to represent the right-hand side of an lgp formula

Description

An S4 class to represent the right-hand side of an lgp formula

Slots

summands

a list of one or more lgpterms

See Also

See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.


An S4 class to represent variable scaling

Description

An S4 class to represent variable scaling

Slots

loc

original location (mean)

scale

original scale (standard deviation)

var_name

variable name


An S4 class to represent a data set simulated using the additive GP formalism

Description

An S4 class to represent a data set simulated using the additive GP formalism

Usage

## S4 method for signature 'lgpsim'
show(object)

## S4 method for signature 'lgpsim,missing'
plot(x, y, ...)

Arguments

object

an lgpsim object

x

an lgpsim object to plot

y

not used

...

optional arguments passed to plot_sim

Methods (by generic)

Slots

data

the actual data

response

name of the response variable in the data

components

the drawn function components

kernel_matrices

the covariance matrices for each GP

info

A list with fields

  • par_ell - the lengthscale parameters used

  • par_cont - the parameters used to generate the continuous covariates

  • p_signal - signal proportion

effect_times

A list with fields

  • true - possible true effect times that generate the disease effect

  • observed - possible observed effect times


An S4 class to represent one formula term

Description

An S4 class to represent one formula term

Slots

factors

a list of at most two lgpexprs

See Also

See operations for performing arithmetic on lgprhs, lgpterm and lgpexpr objects.


Print a model summary.

Description

Print a model summary.

Usage

model_summary(object, digits = 3)

param_summary(object, digits = 3)

Arguments

object

a model or fit

digits

number of digits to round floats to

Value

object invisibly.


Create test input points for prediction

Description

Replaces a continuous variable x in the data frame, and possibly another continuous variable x_ns derived from it, with new values, for each level of a grouping factor (usually id).

Usage

new_x(data, x_values, group_by = "id", x = "age", x_ns = NULL)

Arguments

data

A data frame. Can also be an lgpfit or lgpmodel object, in which case data is extracted from it.

x_values

the values of x to set for each individual

group_by

name of the grouping variable, must be a factor in data (or use group_by=NA to create a dummy grouping factor which has only one value)

x

name of the variable along which to extend, must be numeric in data

x_ns

name of a nonstationary variable derived from x, must be numeric in data

Value

a data frame containing the following columns

See Also

Other data frame handling functions: add_dis_age(), add_factor(), add_factor_crossing(), adjusted_c_hat(), split()
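The replacement described for new_x can be pictured with a small base-R sketch. This is not the package implementation: the helper name expand_x_sketch and the toy data are hypothetical, and the sketch only builds the grouping factor and x columns (it ignores x_ns and any other covariates).

```r
# Hypothetical helper: one row per (group level, x value) pair.
expand_x_sketch <- function(data, x_values, group_by = "id", x = "age") {
  stopifnot(is.factor(data[[group_by]]), is.numeric(data[[x]]))
  lev <- levels(data[[group_by]])
  out <- data.frame(
    g = factor(rep(lev, each = length(x_values)), levels = lev),
    v = rep(x_values, times = length(lev))
  )
  names(out) <- c(group_by, x)
  out
}

# Toy data with two individuals (assumed layout, not a package dataset)
toy <- data.frame(id = factor(c(1, 1, 2, 2)), age = c(10, 20, 10, 20))
x_test <- expand_x_sketch(toy, x_values = seq(0, 30, by = 10))
print(x_test)
```

Each level of the grouping factor receives the same grid of x values, which is the situation the x_values argument describes.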


Operations on formula terms and expressions

Description

Operations on formula terms and expressions

Usage

## S4 method for signature 'lgprhs,lgprhs'
e1 + e2

## S4 method for signature 'lgpterm,lgpterm'
e1 + e2

## S4 method for signature 'lgprhs,lgpterm'
e1 + e2

## S4 method for signature 'lgpterm,lgpterm'
e1 * e2

Arguments

e1

The first sum, term or expression

e2

The second sum, term or expression

Value

The behaviour and return type depend on the types of e1 and e2. You can


Plot a generated/fit model component

Description

Data frames specified in arguments df and df_err must have a format where

Usage

plot_api_c(
  df,
  df_err = NULL,
  alpha = 1,
  alpha_err = 0.2,
  no_err = FALSE,
  no_line = FALSE
)

Arguments

df

a data frame

df_err

a data frame

alpha

line opacity

alpha_err

ribbon opacity

no_err

hide error bar even when it would normally be plotted?

no_line

hide line even when it would normally be plotted?

Value

A ggplot object.

See Also

Other internal plot API functions: plot_api_g()


Plot longitudinal data and/or model fit so that each subject/group has their own panel

Description

Data frames specified in arguments df_data, df_signal, df, and df_err must have a format where

Usage

plot_api_g(
  df_data,
  df_signal = NULL,
  df = NULL,
  df_err = NULL,
  teff_signal = NULL,
  teff_obs = NULL,
  i_test = NULL,
  color_signal = color_palette(2)[1],
  color = color_palette(2)[2],
  color_err = colorset("red", "light_highlight"),
  color_vlines = colorset("gray", "mid_highlight"),
  alpha = 1,
  alpha_err = 0.5,
  nrow = NULL,
  ncol = NULL,
  y_transform = function(x) x
)

Arguments

df_data

A data frame containing the observations.

df_signal

A data frame containing the true signal. Omitted if NULL.

df

A data frame containing the model fit, or a list of data frames. The list version can be used for example so that each list element corresponds to the fit computed using one parameter draw. Omitted if NULL.

df_err

A data frame containing error bars. Omitted if NULL. Must be NULL if df is a list.

teff_signal

A named vector containing true effect times used to generate the signal. Omitted if NULL.

teff_obs

A named vector containing observed effect times. Omitted if NULL.

i_test

Indices of test points.

color_signal

Line color for true signal.

color

Line color for model fit.

color_err

Color of the error ribbon.

color_vlines

Two line colors for vertical lines (true and obs. effect time).

alpha

Line opacity for model fit.

alpha_err

Opacity of the error ribbon.

nrow

number of rows, an argument for facet_wrap

ncol

number of columns, an argument for facet_wrap

y_transform

A function to be applied to the third column of df_data.

Value

A ggplot object.

See Also

Other internal plot API functions: plot_api_c()


Visualize all model components

Description

This calls plot_f for all model components.

Usage

plot_components(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  color_by = NA,
  no_err = FALSE,
  ylim = NULL,
  draw = TRUE,
  nrow = NULL,
  ncol = NULL,
  gg_add = NULL,
  x = NULL,
  ...
)

Arguments

fit

An object of class lgpfit.

pred

An object of class GaussianPrediction or Prediction. If pred=NULL, the pred function is called with the given reduce and draws arguments.

group_by

name of the grouping variable (use group_by=NA to avoid grouping)

t_name

name of the x-axis variable

MULT_STD

a multiplier for standard deviation

verbose

Can this print any messages?

draws

Only has effect if pred=NULL.

reduce

Only has effect if pred=NULL.

color_by

Names of coloring factors. Can have length 1 or equal to the number of components. See the color_by argument of plot_f.

no_err

Should the error ribbons be skipped even though they otherwise would be shown? Can have length 1 or equal to number of components + 1. See the no_err argument of plot_api_c.

ylim

a vector of length 2 (upper and lower y-axis limits), or NULL

draw

if this is TRUE, the plot grid is drawn using arrangeGrob

nrow

number of grid rows

ncol

number of grid columns

gg_add

additional ggplot object to add to each plot

x

Deprecated argument. This is now taken from the pred object to ensure compatibility.

...

additional arguments to plot_api_c

Value

a list of ggplot objects invisibly

See Also

Other main plot functions: plot_draws(), plot_pred()


Visualizing longitudinal data

Description

Visualizing longitudinal data

Usage

plot_data(
  data,
  x_name = "age",
  y_name = "y",
  group_by = "id",
  facet_by = NULL,
  color_by = NULL,
  highlight = NULL,
  main = NULL,
  sub = NULL
)

Arguments

data

A data frame.

x_name

Name of x-axis variable.

y_name

Name of the y-axis variable.

group_by

Name of grouping variable (must be a factor).

facet_by

Name of the faceting variable (must be a factor).

color_by

Name of coloring variable (must be a factor).

highlight

Value of category of the group_by variable that is highlighted. Can only be used if color_by is NULL.

main

main plot title

sub

plot subtitle

Value

a ggplot object


Visualize the distribution of parameter draws

Description

Visualize the distribution of parameter draws

Usage

plot_draws(
  fit,
  type = "intervals",
  regex_pars = c("alpha", "ell", "wrp", "sigma", "phi", "gamma"),
  ...
)

plot_beta(fit, type = "dens", verbose = TRUE, ...)

plot_warp(
  fit,
  num_points = 300,
  window_size = 48,
  color = colorset("red", "dark"),
  alpha = 0.5
)

plot_effect_times(fit, type = "areas", verbose = TRUE, ...)

Arguments

fit

an object of class lgpfit

type

plot type, allowed options are "intervals", "dens", "areas", and "trace"

regex_pars

regex for parameter names to plot

...

additional arguments for the bayesplot function mcmc_intervals, mcmc_dens, mcmc_areas or mcmc_trace

verbose

Can any output be printed?

num_points

number of plot points

window_size

width of time window

color

line color

alpha

line alpha

Value

a ggplot object or list of them

Functions

See Also

Other main plot functions: plot_components(), plot_pred()


Visualize input warping function with several steepness parameter values

Description

Visualize input warping function with several steepness parameter values

Usage

plot_inputwarp(wrp, x, color = colorset("red", "dark"), alpha = 0.5)

Arguments

wrp

a vector of values of the warping steepness parameter

x

a vector of input values

color

line color

alpha

line alpha

Value

a ggplot object


Plot the inverse gamma-distribution pdf

Description

Plot the inverse gamma-distribution pdf

Usage

plot_invgamma(
  alpha,
  beta,
  by = 0.01,
  log = FALSE,
  IQR = 0.95,
  return_quantiles = FALSE,
  linecolor = colorset("red", "dark"),
  fillcolor = colorset("red", "mid")
)

Arguments

alpha

positive real number

beta

positive real number

by

grid size

log

is log-scale used?

IQR

inter-quantile range width

return_quantiles

should this return a list

linecolor

line color

fillcolor

fill color

Value

a ggplot object

See Also

Other functions related to the inverse-gamma distribution: dinvgamma_stanlike(), priors
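For readers who want to see the density that such a plot shows, here is a base-R sketch of the inverse-gamma pdf in the shape/scale parametrization used by Stan. The function name dinvgamma_sketch and its log_scale argument are hypothetical and are not the package's dinvgamma_stanlike. The cross-check relies on the standard fact that 1/X is gamma distributed with the same shape and with rate equal to beta.

```r
# Inverse-gamma density with shape alpha and scale beta (Stan-style parametrization):
#   p(x) = beta^alpha / Gamma(alpha) * x^(-(alpha + 1)) * exp(-beta / x)
dinvgamma_sketch <- function(x, alpha, beta, log_scale = FALSE) {
  stopifnot(alpha > 0, beta > 0, all(x > 0))
  ld <- alpha * log(beta) - lgamma(alpha) - (alpha + 1) * log(x) - beta / x
  if (log_scale) ld else exp(ld)
}

alpha <- 2
beta <- 1
x <- c(0.25, 0.5, 1, 2)

# Cross-check: if X ~ InvGamma(alpha, beta), then 1/X ~ Gamma(shape = alpha, rate = beta)
via_gamma <- dgamma(1 / x, shape = alpha, rate = beta) / x^2
print(cbind(x, sketch = dinvgamma_sketch(x, alpha, beta), via_gamma))

# Central 95% interval obtained through the same change of variables
interval <- 1 / qgamma(c(0.975, 0.025), shape = alpha, rate = beta)
print(interval)
```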


Visualizing model predictions or inferred covariate effects

Description

Usage

plot_pred(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  x = NULL,
  ...
)

plot_f(
  fit,
  pred = NULL,
  group_by = "id",
  t_name = "age",
  MULT_STD = 2,
  verbose = TRUE,
  draws = NULL,
  reduce = function(x) base::mean(x),
  comp_idx = NULL,
  color_by = NA,
  x = NULL,
  ...
)

Arguments

fit

An object of class lgpfit.

pred

An object of class GaussianPrediction or Prediction. If pred=NULL, the pred function is called with the given reduce and draws arguments.

group_by

name of the grouping variable (use group_by=NA to avoid grouping)

t_name

name of the x-axis variable

MULT_STD

a multiplier for standard deviation

verbose

Can this print any messages?

draws

Only has effect ifpred=NULL.

reduce

Only has effect ifpred=NULL.

x

Deprecated argument. This is now taken from the pred object to ensure compatibility.

...

additional arguments to plot_api_g or plot_api_c

comp_idx

Index of component to plot. The total sum is plotted if this is NULL.

color_by

name of coloring factor

Value

a ggplot object

See Also

Other main plot functions: plot_components(), plot_draws()


Visualize an lgpsim object (simulated data)

Description

Visualize an lgpsim object (simulated data)

Usage

plot_sim(
  simdata,
  group_by = "id",
  x_name = "age",
  h_name = "h",
  y_name = "y",
  comp_idx = NULL,
  color_by = NA,
  verbose = TRUE,
  ...
)

Arguments

simdata

an object of class lgpsim

group_by

grouping factor

x_name

name of x-axis variable

h_name

name of the signal in simdata$components ("h" or "f")

y_name

name of response variable

comp_idx

Possible index of a component to be shown. If this is NULL, the data and total signal are shown.

color_by

coloring factor

verbose

should some information be printed?

...

additional arguments to plot_api_g or plot_api_c

Value

a ggplot object


Graphical posterior predictive checks

Description

Graphical posterior predictive checks

Usage

ppc(fit, data = NULL, fun = default_ppc_fun(fit), verbose = TRUE, ...)

Arguments

fit

An object of class lgpfit that has been created with sample_f=TRUE.

data

the original data frame (deprecated argument with no effect, now obtained from fit object)

fun

bayesplot function name

verbose

Can this print any messages?

...

additional arguments passed to the default pp_check method in bayesplot

Value

a ggplot object

See Also

Introduction to graphical posterior predictive checks: here. Prior predictive check can be done by calling prior_pred and then bayesplot::pp_check().


Posterior predictions and function posteriors

Description

Usage

pred(
  fit,
  x = NULL,
  reduce = function(x) base::mean(x),
  draws = NULL,
  verbose = TRUE,
  STREAM = get_stream(),
  c_hat_pred = NULL,
  force = FALSE,
  debug_kc = FALSE
)

Arguments

fit

An object of class lgpfit.

x

A data frame of points where function posterior distributions and predictions should be computed or sampled. The function new_x provides an easy way to create it. If this is NULL, the data points are used.

reduce

Reduction for parameter draws. Can be a function that is applied to reduce all parameter draws into one parameter set, or NULL (no reduction). Has no effect if draws is specified.

draws

Indices of parameter draws to use, or NULL to use all draws.

verbose

Should more information and a possible progress bar be printed?

STREAM

External pointer. By default obtained with rstan::get_stream().

c_hat_pred

This is only used if the latent signal f was sampled. This input contains the values added to the sum f before passing through inverse link function. Must be a vector with length equal to the number of prediction points. If original c_hat was constant, then c_hat_pred can be ignored, in which case this will by default use the same constant.

force

This is by default FALSE to prevent unintended large computations that might crash R or take forever. Set it to TRUE to try computing no matter what.

debug_kc

If this is TRUE, this only returns a KernelComputer object that is created internally. Meant for debugging.

Value

An object of class GaussianPrediction or Prediction.

See Also

Other main functions: create_model(), draw_pred(), get_draws(), lgp(), prior_pred(), sample_model()
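The difference between the reduce and draws arguments can be illustrated on a plain matrix of parameter draws. The sketch below uses base R only; the matrix, its column names and the chosen indices are made up for illustration and do not come from an actual lgpfit object.

```r
set.seed(1)

# Hypothetical posterior draws: 100 draws (rows) of three parameters (columns)
param_draws <- cbind(
  alpha = rgamma(100, shape = 2, rate = 2),
  ell   = rlnorm(100),
  sigma = rexp(100)
)

# reduce: collapse all draws into a single parameter set
reduce <- function(x) base::mean(x)
reduced <- apply(param_draws, 2, reduce)
print(reduced)

# draws: keep selected draws as they are (no reduction is applied)
draw_idx <- c(1, 50, 100)
selected <- param_draws[draw_idx, , drop = FALSE]
print(dim(selected))
```

With a reduction, predictions are computed once for a single parameter set; with selected draws, they would be computed once per retained row.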


Prior (predictive) sampling

Description

These functions take an lgpmodel object, and

Usage

prior_pred(
  model,
  verbose = TRUE,
  quiet = FALSE,
  refresh = 0,
  STREAM = get_stream(),
  ...
)

sample_param_prior(model, verbose = TRUE, quiet = FALSE, ...)

Arguments

model

An object of class lgpmodel.

verbose

Should more information and a possible progress bar be printed?

quiet

This forces verbose to be FALSE. If you want to suppress also the output from Stan, give the additional argument refresh=0.

refresh

Argument for sampling.

STREAM

External pointer. By default obtained with rstan::get_stream().

...

Additional arguments for sampling.

Value

See Also

Other main functions: create_model(), draw_pred(), get_draws(), lgp(), pred(), sample_model()


Convert given prior to numeric format

Description

Convert given prior to numeric format

Usage

prior_to_num(desc)

Arguments

desc

Prior description as a named list, containing fields

  • dist - Distribution name. Must be one of 'uniform', 'normal', 'student-t', 'gamma', 'inv-gamma', or 'log-normal' (case-insensitive)

  • square - Is the prior for a square-transformed parameter.

Other list fields are interpreted as hyperparameters.

Value

a named list of parsed options


Prior definitions

Description

These use the same parametrizations as defined in the 'Stan' documentation. See the docs for gamma and inverse gamma distributions.

Usage

uniform(square = FALSE)

normal(mu, sigma, square = FALSE)

student_t(nu, square = FALSE)

gam(shape, inv_scale, square = FALSE)

igam(shape, scale, square = FALSE)

log_normal(mu, sigma, square = FALSE)

bet(a, b)

Arguments

square

is prior for a square-transformed parameter?

mu

mean

sigma

standard deviation

nu

degrees of freedom

shape

shape parameter (alpha)

inv_scale

inverse scale parameter (beta)

scale

scale parameter (beta)

a

shape parameter

b

shape parameter

Value

a named list

See Also

Other functions related to the inverse-gamma distribution: dinvgamma_stanlike(), plot_invgamma()

Examples

# Log-normal prior
log_normal(mu = 1, sigma = 1)

# Cauchy prior
student_t(nu = 1)

# Exponential prior with rate = 0.1
gam(shape = 1, inv_scale = 0.1)

# Create similar priors as in LonGP (Cheng et al., 2019)
# Not recommended, because a lengthscale close to 0 is possible.
a <- log(1) - log(0.1)
log_normal(mu = 0, sigma = a / 2) # for continuous lengthscale
student_t(nu = 4) # for interaction lengthscale
igam(shape = 0.5, scale = 0.005, square = TRUE) # for sigma
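The comments in these examples rest on standard distributional identities, which can be checked with base R density functions alone. The sketch below does not call any lgpr function; it assumes that inv_scale corresponds to the rate argument of dgamma and that student_t(nu) refers to a standard (location 0, scale 1) Student-t distribution.

```r
x <- seq(0.1, 5, by = 0.1)

# A gamma distribution with shape 1 and rate 0.1 is an exponential with rate 0.1
gamma_dens <- dgamma(x, shape = 1, rate = 0.1)
exp_dens <- dexp(x, rate = 0.1)
print(isTRUE(all.equal(gamma_dens, exp_dens)))

# A Student-t distribution with 1 degree of freedom is a standard Cauchy
t_dens <- dt(x, df = 1)
cauchy_dens <- dcauchy(x)
print(isTRUE(all.equal(t_dens, cauchy_dens)))

# Log-normal with mu = 0 and sigma = a / 2, where a = log(1) - log(0.1):
# two log-scale standard deviations span the range from 0.1 to 10
a <- log(1) - log(0.1)
q95 <- qlnorm(c(0.025, 0.975), meanlog = 0, sdlog = a / 2)
print(q95)
```

The last interval shows why the log-normal choice still leaves noticeable prior mass on small lengthscales, which is the concern raised in the example comment.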

Function for reading the built-in proteomics data

Description

Function for reading the built-in proteomics data

Usage

read_proteomics_data(parentDir = NULL, protein = NULL, verbose = TRUE)

Arguments

parentDir

Path to local parent directory for the data. If this is NULL, data is downloaded from https://github.com/jtimonen/lgpr-usage/tree/master/data/proteomics.

protein

Index or name of protein.

verbose

Can this print some output?

Value

a data.frame


Assess component relevances

Description

Assess component relevances

Usage

relevances(fit, reduce = function(x) base::mean(x), verbose = TRUE, ...)

Arguments

fit

an object of class lgpfit

reduce

a function to apply to reduce the relevances given each parameter draw into one value

verbose

Can this print any messages?

...

currently has no effect

Value

a named vector with length equal to num_comps + 1


S4 generics for lgpfit, lgpmodel, and other objects

Description

S4 generics for lgpfit, lgpmodel, and other objects

Usage

parameter_info(object, digits)

component_info(object)

covariate_info(object)

component_names(object)

get_model(object)

is_f_sampled(object)

get_stanfit(object)

postproc(object, ...)

contains_postproc(object)

clear_postproc(object)

num_paramsets(object)

num_evalpoints(object)

num_components(object)

Arguments

object

object for which to apply the generic

digits

number of digits to show

...

additional optional arguments to pass

Value

Functions

See Also

To find out which methods have been implemented for which classes, see lgpfit, lgpmodel, Prediction and GaussianPrediction.


Fitting a model

Description

Usage

sample_model(
  model,
  verbose = TRUE,
  quiet = FALSE,
  skip_postproc = is_f_sampled(model),
  ...
)

optimize_model(model, ...)

Arguments

model

An object of class lgpmodel.

verbose

Can messages be printed?

quiet

Should all output messages be suppressed? You also need to set refresh=0 if you want to suppress the progress update messages from sampling.

skip_postproc

Should all postprocessing be skipped? If this is TRUE, the returned lgpfit object will likely be much smaller (if sample_f=FALSE).

...

Optional arguments passed to sampling or optimizing.

Value

See Also

Other main functions: create_model(), draw_pred(), get_draws(), lgp(), pred(), prior_pred()


Select relevant components

Description

Usage

select(fit, reduce = function(x) base::mean(x), threshold = 0.95, ...)

select_freq(fit, threshold = 0.95, ...)

select.integrate(
  fit,
  reduce = function(x) base::mean(x),
  p = function(x) stats::dbeta(x, 100, 5),
  h = 0.01,
  verbose = TRUE,
  ...
)

select_freq.integrate(
  fit,
  p = function(x) stats::dbeta(x, 100, 5),
  h = 0.01,
  verbose = TRUE,
  ...
)

Arguments

fit

An object of class lgpfit.

reduce

The reduce argument for relevances.

threshold

Threshold for relevance sum. Must be a value between 0 and 1.

...

Additional arguments to relevances.

p

A threshold density over interval [0,1].

h

A discretization parameter for computing a quadrature.

verbose

Should this show a progress bar?

Value

See description.
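One plausible reading of a "threshold for relevance sum" is a greedy rule: take entries in decreasing order of relevance until their cumulative sum reaches the threshold. The sketch below illustrates that rule on a made-up relevance vector. It is an assumption and not the package's selection code; in particular, select_sketch is a hypothetical name and the sketch treats the noise entry like any other entry, which may differ from what select does.

```r
# Hypothetical greedy selection over a named relevance vector that sums to 1
select_sketch <- function(rel, threshold = 0.95) {
  stopifnot(threshold > 0, threshold <= 1, !is.null(names(rel)))
  ord <- order(rel, decreasing = TRUE)
  k <- which(cumsum(rel[ord]) >= threshold)[1]
  if (is.na(k)) k <- length(rel) # guard against floating point shortfall
  names(rel)[ord[seq_len(k)]]
}

# Made-up relevances for three components plus noise
rel <- c(age = 0.50, "age|id" = 0.30, sex = 0.02, noise = 0.18)
selected <- select_sketch(rel, threshold = 0.95)
print(selected)
```

Lowering the threshold shrinks the selected set, for example to only the two largest entries with threshold = 0.8.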


Printing formula object info using the show generic

Description

Printing formula object info using the show generic

Usage

## S4 method for signature 'lgpformula'
show(object)

## S4 method for signature 'lgprhs'
show(object)

## S4 method for signature 'lgpterm'
show(object)

Arguments

object

an object of some S4 class

Value

the object invisibly


Simulate latent function components for longitudinal data analysis

Description

Simulate latent function components for longitudinal data analysis

Usage

sim.create_f(
  X,
  covariates,
  relevances,
  lengthscales,
  X_affected,
  dis_fun,
  bin_kernel,
  steepness,
  vm_params,
  force_zeromean
)

Arguments

X

input data matrix (generated by sim.create_x)

covariates

Integer vector that defines the types of covariates (other than id and age). Different integers correspond to the following covariate types:

  • 0 = disease-related age

  • 1 = other continuous covariate

  • 2 = a categorical covariate that interacts with age

  • 3 = a categorical covariate that acts as a group offset

  • 4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

relevances

Relative relevance of each component. Must be a vector so that
length(relevances) = 2 + length(covariates).
First two values define the relevance of the individual-specific age and shared age component, respectively.

lengthscales

A vector so that
length(lengthscales) = 2 + sum(covariates %in% c(0,1,2)).

X_affected

which individuals are affected by the disease

dis_fun

A function or a string that defines the disease effect. If this is a function, that function is used to generate the effect. If dis_fun is "gp_vm" or "gp_ns", the disease component is drawn from a nonstationary GP prior ("vm" is the variance masked version of it).

bin_kernel

Should the binary kernel be used for categorical covariates? If this is TRUE, the effect will exist only for group 1.

steepness

Steepness of the input warping function. This is only used if the disease component is in the model.

vm_params

Parameters of the variance mask function. This is only needed if useMaskedVarianceKernel = TRUE.

force_zeromean

Should each component (excluding the disease age component) be forced to have a zero mean?

Value

a data frame FFF where one column corresponds to one additive component
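The length requirements on relevances and lengthscales follow directly from the covariates vector, so they can be computed before calling the simulation functions. The helper below is a base-R sketch with a hypothetical name (expected_lengths); it also includes the n_categs rule that appears in the default of simulate_data.

```r
# Hypothetical helper: argument lengths implied by a covariate type vector
expected_lengths <- function(covariates) {
  list(
    relevances   = 2 + length(covariates),
    lengthscales = 2 + sum(covariates %in% c(0, 1, 2)),
    n_categs     = sum(covariates %in% c(2, 3))
  )
}

# One covariate of each of the types 0, 1, 2 and 3
covariates <- c(0, 1, 2, 3)
lens <- expected_lengths(covariates)
str(lens)
```

The constant 2 accounts for the individual-specific age and shared age components, which are always present.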


Create an input data frame X for simulated data

Description

Create an input data frame X for simulated data

Usage

sim.create_x(
  N,
  covariates,
  names,
  n_categs,
  t_data,
  t_jitter,
  t_effect_range,
  continuous_info
)

Arguments

N

Number of individuals.

covariates

Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:

  • 0 = disease-related age

  • 1 = other continuous covariate

  • 2 = a categorical covariate that interacts with age

  • 3 = a categorical covariate that acts as a group offset

  • 4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

names

Covariate names.

n_categs

An integer vector defining the number of categories for each categorical covariate, so that length(n_categs) equals the number of 2's and 3's in the covariates vector.

t_data

Measurement times (same for each individual, unless t_jitter > 0 in which case they are perturbed).

t_jitter

Standard deviation of the jitter added to the given measurement times.

t_effect_range

Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual.

continuous_info

Info for generating continuous covariates. Must be a list containing fields lambda and mu, which have length 3. The continuous covariates are generated so that x <- sin(a*t + b) + c, where

  • t <- seq(0, 2*pi, length.out = k)

  • a <- mu[1] + lambda[1]*stats::runif(1)

  • b <- mu[2] + lambda[2]*stats::runif(1)

  • c <- mu[3] + lambda[3]*stats::runif(1)
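The recipe above can be run as is in base R. In the sketch below, k is assumed to be the number of measurement times per individual (it is not defined in this section), the values of mu and lambda are the defaults shown for simulate_data, and the offset is stored in c_off instead of c only to avoid masking the base function of that name.

```r
set.seed(123)

continuous_info <- list(
  mu = c(pi / 8, pi, -0.5),
  lambda = c(pi / 8, pi, 1)
)
mu <- continuous_info$mu
lambda <- continuous_info$lambda

k <- 5 # assumed: number of measurement times for one individual
t <- seq(0, 2 * pi, length.out = k)

# Random frequency, phase and offset, each uniform on [mu, mu + lambda]
a <- mu[1] + lambda[1] * stats::runif(1)
b <- mu[2] + lambda[2] * stats::runif(1)
c_off <- mu[3] + lambda[3] * stats::runif(1)

x <- sin(a * t + b) + c_off
print(round(x, 3))
```

Because the sine term lies in [-1, 1], every generated value falls within one unit of the sampled offset.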

Value

a list


Simulate noisy observations

Description

Simulate noisy observations

Usage

sim.create_y(noise_type, f, snr, phi, gamma, N_trials)

Arguments

noise_type

Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial).

f

The underlying signal.

snr

The desired signal-to-noise ratio. This argument is valid only when noise_type is "gaussian".

phi

The inverse overdispersion parameter for negative binomial data. The variance is g + g^2/phi.

gamma

The dispersion parameter for beta-binomial data.

N_trials

The number of trials parameter for binomial data.

Value

A list out, where


Compute all kernel matrices when simulating data

Description

Compute all kernel matrices when simulating data

Usage

sim.kernels(
  X,
  types,
  lengthscales,
  X_affected,
  bin_kernel,
  useMaskedVarianceKernel,
  steepness,
  vm_params
)

Arguments

X

covariates

types

vector of covariate types, so that

  • 1 = ID

  • 2 = age

  • 3 = diseaseAge

  • 4 = other continuous covariate

  • 5 = a categorical covariate that interacts with age

  • 6 = a categorical covariate that acts as an offset

lengthscales

vector of lengthscales

X_affected

which individuals are affected by the disease

bin_kernel

whether or not binary (mask) kernel should be used for categorical covariates (if not, the zerosum kernel is used)

useMaskedVarianceKernel

should the masked variance kernel be used for drawing the disease component

steepness

steepness of the input warping function

vm_params

parameters of the variance mask function

Value

a 3D array
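To make the idea of a 3D array of kernel matrices concrete, the sketch below stacks one exponentiated quadratic kernel matrix per lengthscale for a single continuous input. Both the kernel choice and the index order (n x n x number of components) are assumptions made for illustration; the function eq_kernel is hypothetical and this section does not state how sim.kernels lays out its result.

```r
# Exponentiated quadratic kernel: k(x, x') = exp(-(x - x')^2 / (2 * ell^2))
eq_kernel <- function(x, ell) {
  d <- outer(x, x, "-")
  exp(-d^2 / (2 * ell^2))
}

age <- c(6, 12, 24, 36)
lengthscales <- c(12, 24)

# One n x n slice per lengthscale (assumed layout)
K <- array(NA_real_, dim = c(length(age), length(age), length(lengthscales)))
for (j in seq_along(lengthscales)) {
  K[, , j] <- eq_kernel(age, lengthscales[j])
}
print(dim(K))
print(round(K[, , 1], 3))
```

A larger lengthscale gives larger off-diagonal entries, meaning that function values at distant ages are more strongly correlated.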


Generate an artificial longitudinal data set

Description

Generate an artificial longitudinal data set.

Usage

simulate_data(
  N,
  t_data,
  covariates = c(),
  names = NULL,
  relevances = c(1, 1, rep(1, length(covariates))),
  n_categs = rep(2, sum(covariates %in% c(2, 3))),
  t_jitter = 0,
  lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))),
  f_var = 1,
  noise_type = "gaussian",
  snr = 3,
  phi = 1,
  gamma = 0.2,
  N_affected = round(N/2),
  t_effect_range = "auto",
  t_observed = "after_0",
  c_hat = 0,
  dis_fun = "gp_warp_vm",
  bin_kernel = FALSE,
  steepness = 0.5,
  vm_params = c(0.025, 1),
  continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)),
  N_trials = 1,
  force_zeromean = TRUE
)

Arguments

N

Number of individuals.

t_data

Measurement times (same for each individual, unless t_jitter > 0 in which case they are perturbed).

covariates

Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:

  • 0 = disease-related age

  • 1 = other continuous covariate

  • 2 = a categorical covariate that interacts with age

  • 3 = a categorical covariate that acts as a group offset

  • 4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

names

Covariate names.

relevances

Relative relevance of each component. Must be a vector so that
length(relevances) = 2 + length(covariates).
First two values define the relevance of the individual-specific age and shared age component, respectively.

n_categs

An integer vector defining the number of categories for each categorical covariate, so that length(n_categs) equals the number of 2's and 3's in the covariates vector.

t_jitter

Standard deviation of the jitter added to the given measurement times.

lengthscales

A vector so that
length(lengthscales) = 2 + sum(covariates %in% c(0,1,2)).

f_var

variance of f

noise_type

Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial).

snr

The desired signal-to-noise ratio. This argument is valid only when noise_type is "gaussian".

phi

The inverse overdispersion parameter for negative binomial data. The variance is g + g^2/phi.

gamma

The dispersion parameter for beta-binomial data.

N_affected

Number of diseased individuals that are affected by the disease. This defaults to the number of diseased individuals. This argument can only be given if covariates contains a zero.

t_effect_range

Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual.

t_observed

Determines how the disease effect time is observed. This can be any function that takes the real disease effect time as an argument and returns the (possibly randomly generated) observed onset/initiation time. Alternatively, this can be a string of the form "after_n" or "random_p" or "exact".

c_hat

a constant added to f

dis_fun

A function or a string that defines the disease effect. If this is a function, that function is used to generate the effect. If dis_fun is "gp_vm" or "gp_ns", the disease component is drawn from a nonstationary GP prior ("vm" is the variance masked version of it).

bin_kernel

Should the binary kernel be used for categorical covariates? If this is TRUE, the effect will exist only for group 1.

steepness

Steepness of the input warping function. This is only used if the disease component is in the model.

vm_params

Parameters of the variance mask function. This is only needed if useMaskedVarianceKernel = TRUE.

continuous_info

Info for generating continuous covariates. Must be a list containing fields lambda and mu, which have length 3. The continuous covariates are generated so that x <- sin(a*t + b) + c, where

  • t <- seq(0, 2*pi, length.out = k)

  • a <- mu[1] + lambda[1]*stats::runif(1)

  • b <- mu[2] + lambda[2]*stats::runif(1)

  • c <- mu[3] + lambda[3]*stats::runif(1)

N_trials

The number of trials parameter for binomial data.

force_zeromean

Should each component (excluding the disease age component) be forced to have a zero mean?

Value

An object of class lgpsim.

Examples

# Generate Gaussian data
dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3)

# Generate negative binomially (NB) distributed count data
dat <- simulate_data(
  N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb",
  phi = 2
)
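The variance formula given for the phi argument, g + g^2/phi, can be checked empirically with base R, independently of the package. The sketch assumes that g denotes the mean of the negative binomial distribution and uses the mean/size parametrization of stats::rnbinom, where size plays the role of phi.

```r
set.seed(42)

g <- 5   # assumed meaning: mean of the count distribution
phi <- 2 # inverse overdispersion

# rnbinom with mu and size has variance mu + mu^2 / size
y <- stats::rnbinom(1e5, mu = g, size = phi)

theory_var <- g + g^2 / phi
print(c(sample_mean = mean(y), sample_var = var(y), theory_var = theory_var))
```

As phi grows, the g^2/phi term vanishes and the variance approaches the Poisson value g, which is why phi is described as an inverse overdispersion parameter.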

Split data into training and test sets

Description

Usage

split_by_factor(data, test, var_name = "id")

split_within_factor(data, idx_test, var_name = "id")

split_within_factor_random(data, k_test = 1, var_name = "id")

split_random(data, p_test = 0.2, n_test = NULL)

split_data(data, i_test, sort_ids = TRUE)

Arguments

data

a data frame

test

the levels of the factor that will be used as test data

var_name

name of a factor in the data

idx_test

indices of the test points within each level of the factor

k_test

desired number of test data points per each level of the factor

p_test

desired proportion of test data

n_test

desired number of test data points (if NULL, p_test is used to compute this)

i_test

test data row indices

sort_ids

should the test indices be sorted into increasing order

Value

a named list with names train, test, i_train and i_test

See Also

Other data frame handling functions: add_dis_age(), add_factor(), add_factor_crossing(), adjusted_c_hat(), new_x()
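The structure of the returned list can be illustrated with a base-R sketch of a random split. The function split_random_sketch and the toy data frame are hypothetical; the sketch mirrors only the documented arguments p_test and n_test and the four documented list names, and may differ from the package in details such as rounding.

```r
# Hypothetical random split returning train, test, i_train and i_test
split_random_sketch <- function(data, p_test = 0.2, n_test = NULL) {
  n <- nrow(data)
  if (is.null(n_test)) n_test <- round(p_test * n)
  i_test <- sort(sample.int(n, n_test))
  i_train <- setdiff(seq_len(n), i_test)
  list(
    train = data[i_train, , drop = FALSE],
    test = data[i_test, , drop = FALSE],
    i_train = i_train,
    i_test = i_test
  )
}

set.seed(7)
toy <- data.frame(
  id = factor(rep(1:4, each = 5)),
  age = rep(1:5, times = 4),
  y = rnorm(20)
)
s <- split_random_sketch(toy, p_test = 0.2)
print(s$i_test)
print(dim(s$train))
```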


A very small artificial test data, used mostly for unit tests

Description

A very small artificial test data, used mostly for unit tests

Usage

testdata_001

Format

A data frame with 24 rows and 6 variables:

id

individual id, a factor with levels: 1, 2, 3, 4

age

age

dis_age

disease-related age

blood

a continuous variable

sex

a factor with 2 levels: Male, Female

y

a continuous variable

See Also

Other built-in datasets: testdata_002


Medium-size artificial test data, used mostly for tutorials

Description

Medium-size artificial test data, used mostly for tutorials

Usage

testdata_002

Format

A data frame with 96 rows and 6 variables:

id

individual id, a factor with levels: 01-12

age

age

diseaseAge

disease-related age

sex

a factor with 2 levels: Male, Female

group

a factor with 2 levels: Case, Control

y

a continuous variable

See Also

read_proteomics_data

Other built-in datasets: testdata_001


Validate S4 class objects

Description

Validate S4 class objects

Usage

validate_lgpexpr(object)

validate_lgpformula(object)

validate_lgpscaling(object)

validate_lgpfit(object)

validate_GaussianPrediction(object)

validate_Prediction(object)

Arguments

object

an object to validate

Value

TRUE if valid, otherwise reasons for invalidity


Variance masking function

Description

Variance masking function

Usage

var_mask(x, stp)

Arguments

x

a vector of length n

stp

a positive real number (steepness of mask function)

Value

a vector of length n

See Also

Other kernel utility functions: warp_input()


Input warping function

Description

Input warping function

Usage

warp_input(x, a)

Arguments

x

a vector of length n

a

steepness of the warping function rise

Value

a vector of warped inputs w(x), length n

See Also

Other kernel utility functions: var_mask()
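The role of the steepness parameter can be pictured with a logistic-type warp that maps the real line onto (-1, 1) and rises around zero. The exact formula used by warp_input is not given in this section, so the function below (warp_sketch, a hypothetical name) is only an assumed form chosen for illustration.

```r
# Assumed sigmoidal warp: w(x) = 2 / (1 + exp(-a * x)) - 1
warp_sketch <- function(x, a) {
  stopifnot(a > 0)
  2 / (1 + exp(-a * x)) - 1
}

x <- seq(-24, 24, by = 6)
steepness <- c(0.1, 0.5, 1)

# One column of warped inputs per steepness value
W <- sapply(steepness, function(a) warp_sketch(x, a))
colnames(W) <- paste0("a=", steepness)
print(round(cbind(x, W), 3))
```

Under this form, larger values of a concentrate the change of w(x) in a narrower window around x = 0, so a nonstationary effect defined on the warped input varies quickly near zero and flattens out elsewhere.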

