Title: Clustering and Prediction using Multi-Task Gaussian Processes with Common Mean
Version: 1.2.1
Description: An implementation of the multi-task Gaussian processes with common mean framework. Two main algorithms, called 'Magma' and 'MagmaClust', are available to perform predictions for supervised learning problems, in particular for time series or any functional/continuous data applications. The corresponding articles were proposed, respectively, by Arthur Leroy, Pierre Latouche, Benjamin Guedj and Servane Gey (2022) <doi:10.1007/s10994-022-06172-1>, and Arthur Leroy, Pierre Latouche, Benjamin Guedj and Servane Gey (2023) https://jmlr.org/papers/v24/20-1321.html. These approaches leverage the learning of cluster-specific mean processes, which are common across similar tasks, to provide enhanced prediction performance (even far from data) at a linear computational cost (in the number of tasks). 'MagmaClust' is a generalisation of 'Magma' where the tasks are simultaneously clustered into groups, each being associated with a specific mean process. User-oriented functions in the package are divided into training, prediction and plotting functions. Some basic features (classic kernels, training, prediction) of standard Gaussian processes are also implemented.
License: MIT + file LICENSE
URL: https://github.com/ArthurLeroy/MagmaClustR, https://arthurleroy.github.io/MagmaClustR/
BugReports: https://github.com/ArthurLeroy/MagmaClustR/issues
Imports: broom, dplyr, ggplot2, magrittr, methods, mvtnorm, plyr, purrr, Rcpp, rlang, stats, tibble, tidyr, tidyselect
Suggests: gganimate, gifski, gridExtra, knitr, plotly, png, rmarkdown, testthat (≥ 3.0.0), transformr
LinkingTo: Rcpp
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Depends: R (≥ 2.10)
NeedsCompilation: yes
Packaged: 2024-06-28 20:01:23 UTC; Arthur Leroy
Author: Arthur Leroy (ORCID iD) [aut, cre], Pierre Latouche [aut], Pierre Pathé [ctb], Alexia Grenouillat [ctb], Hugo Lelievre [ctb]
Maintainer: Arthur Leroy <arthur.leroy.pro@gmail.com>
Repository: CRAN
Date/Publication: 2024-06-28 20:20:02 UTC

MagmaClustR : Clustering and Prediction using Multi-Task Gaussian Processes

Description

The MagmaClustR package implements two main algorithms, called Magma and MagmaClust, using a multi-task GP model to perform predictions for supervised learning problems. These approaches leverage the learning of cluster-specific mean processes, which are common across similar tasks, to provide enhanced prediction performance (even far from data) at a linear computational cost (in the number of tasks). MagmaClust is a generalisation of Magma where the tasks are simultaneously clustered into groups, each being associated with a specific mean process. User-oriented functions in the package are divided into training, prediction and plotting functions. Some basic features of standard GPs are also implemented.

Details

For a quick introduction to MagmaClustR, please refer to the README at https://github.com/ArthurLeroy/MagmaClustR

Author(s)

Arthur Leroy, Pierre Pathe and Pierre Latouche
Maintainer: Arthur Leroy - arthur.leroy.pro@gmail.com

References

Arthur Leroy, Pierre Latouche, Benjamin Guedj, and Servane Gey.
MAGMA: Inference and Prediction with Multi-Task Gaussian Processes. Machine Learning, 2022, https://link.springer.com/article/10.1007/s10994-022-06172-1

Arthur Leroy, Pierre Latouche, Benjamin Guedj, and Servane Gey.
Cluster-Specific Predictions with Multi-Task Gaussian Processes. Journal of Machine Learning Research, 2023, https://jmlr.org/papers/v24/20-1321.html

Examples

Simulate a dataset, train and predict with Magma

library(MagmaClustR)  # also makes the %>% pipe available

set.seed(4242)
# Simulate 11 individuals (one cluster), 10 observations each
data_magma <- simu_db(M = 11, N = 10, K = 1)
# Train on individuals 1 to 10; keep 7 points of the 11th for prediction
magma_train <- data_magma %>% subset(ID %in% 1:10)
magma_test <- data_magma %>% subset(ID == 11) %>% head(7)

magma_model <- train_magma(data = magma_train)
magma_pred <- pred_magma(data = magma_test, trained_model = magma_model,
                         grid_inputs = seq(0, 10, 0.01))

Simulate a dataset, train and predict with MagmaClust

set.seed(4242)
# Simulate 3 clusters of 4 individuals (12 in total), 10 observations each
data_magmaclust <- simu_db(M = 4, N = 10, K = 3)
list_ID <- unique(data_magmaclust$ID)
# Train on the first 11 individuals; keep 5 points of the 12th for prediction
magmaclust_train <- data_magmaclust %>% subset(ID %in% list_ID[1:11])
magmaclust_test <- data_magmaclust %>% subset(ID == list_ID[12]) %>% head(5)

magmaclust_model <- train_magmaclust(data = magmaclust_train)
magmaclust_pred <- pred_magmaclust(data = magmaclust_test,
                                   trained_model = magmaclust_model,
                                   grid_inputs = seq(0, 10, 0.01))

Author(s)

Maintainer: Arthur Leroy arthur.leroy.pro@gmail.com (ORCID)


Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).


Round a matrix to make it symmetric

Description

If a matrix is non-symmetric due to numerical errors, round with a decreasing number of digits until the matrix becomes symmetric.

Usage

check_symmetric(mat, digits = 10)

Arguments

mat

A matrix, possibly non-symmetric.

digits

A number, the starting number of digits to round from if mat is not symmetric.

Value

A matrix, rounded approximation of mat that is symmetric.

Examples

TRUE
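
The rounding strategy described above can be sketched in base R. This is only an illustration, not the package's code: the helper name round_until_symmetric and the stopping rule at 0 digits are assumptions.

```r
# Hypothetical helper illustrating the idea; not the MagmaClustR implementation.
round_until_symmetric <- function(mat, digits = 10) {
  # Try progressively coarser rounding until the matrix equals its transpose.
  for (d in seq(digits, 0)) {
    rounded <- round(mat, d)
    if (all(rounded == t(rounded))) {
      return(rounded)
    }
  }
  stop("Matrix could not be made symmetric by rounding.")
}

# A matrix that is symmetric up to a tiny numerical error
mat <- matrix(c(1, 0.5 + 1e-12, 0.5, 2), nrow = 2)
sym <- round_until_symmetric(mat)
```

Starting from a large number of digits keeps the rounded matrix as close as possible to the original.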

Invert a matrix using an adaptive jitter term

Description

Invert a matrix from its Cholesky decomposition. If it is (nearly) singular, increase the order of magnitude of the jitter term added to the diagonal until the matrix becomes non-singular.

Usage

chol_inv_jitter(mat, pen_diag)

Arguments

mat

A matrix, possibly singular.

pen_diag

A number, a jitter term to add on the diagonal.

Value

A matrix, inverse of mat plus an adaptive jitter term added on the diagonal.

Examples

TRUE
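
A minimal base-R sketch of the adaptive jitter idea follows. The helper name chol_inv_adaptive, the factor of 10 between attempts and the max_tries cap are assumptions made for illustration; the package may proceed differently.

```r
# Hypothetical helper illustrating the idea; not the MagmaClustR implementation.
chol_inv_adaptive <- function(mat, pen_diag = 1e-10, max_tries = 20) {
  jitter <- pen_diag
  for (i in seq_len(max_tries)) {
    # chol() raises an error when the matrix is not positive definite
    attempt <- tryCatch(
      chol2inv(chol(mat + diag(jitter, nrow(mat)))),
      error = function(e) NULL
    )
    if (!is.null(attempt)) {
      return(attempt)
    }
    jitter <- jitter * 10  # increase the jitter by one order of magnitude
  }
  stop("Matrix could not be inverted within the allowed number of attempts.")
}

singular <- matrix(1, nrow = 3, ncol = 3)   # rank one, hence singular
inv_singular <- chol_inv_adaptive(singular)
inv_regular <- chol_inv_adaptive(diag(2, 2))
```

Note that the result is the inverse of the jittered matrix, not of the original one, which is why a small starting value of pen_diag is preferable.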

Allocate training data into the most probable cluster

Description

Allocate training data into the most probable cluster

Usage

data_allocate_cluster(trained_model)

Arguments

trained_model

A list, containing the information coming from a MagmaClust model, previously trained using the train_magmaclust function.

Value

The original dataset used to train the MagmaClust model, with additional 'Cluster' and associated 'Proba' columns, indicating the most probable cluster for each individual/task at the end of the training procedure.

Examples

TRUE
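
To make the 'Cluster' and 'Proba' columns concrete, here is a base-R sketch that picks the most probable cluster from a table of mixture probabilities. The toy mixture table and its column names K1 and K2 are assumptions; the structure stored in a trained model may differ.

```r
# Toy mixture probabilities (assumed layout: one row per ID, one column per cluster)
mixture <- data.frame(
  ID = c("a", "b", "c"),
  K1 = c(0.9, 0.2, 0.4),
  K2 = c(0.1, 0.8, 0.6)
)

probs <- as.matrix(mixture[, c("K1", "K2")])
best <- max.col(probs, ties.method = "first")  # column index of the row maximum

allocation <- data.frame(
  ID = mixture$ID,
  Cluster = colnames(probs)[best],
  Proba = probs[cbind(seq_len(nrow(probs)), best)]
)
```

Merging allocation with the training data on ID would then attach the two columns to every observation.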

Compute the Multivariate Gaussian likelihood

Description

Modification of the function dmvnorm() from the package mvtnorm, providing an implementation of the multivariate Gaussian likelihood. This version takes the inverse of the covariance matrix as an argument, instead of the traditional covariance matrix.

Usage

dmnorm(x, mu, inv_Sigma, log = FALSE)

Arguments

x

A vector, containing values the likelihood is evaluated on.

mu

A vector or matrix, specifying the mean parameter.

inv_Sigma

A matrix, specifying the inverse of covariance parameter.

log

A logical value, indicating whether we return the log-likelihood.

Value

A number, corresponding to the multivariate Gaussian likelihood (or log-likelihood if log = TRUE).

Examples

TRUE
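
Working with the inverse covariance avoids a second inversion when it has already been computed. The base-R sketch below evaluates the Gaussian log-density from the precision matrix; the helper name log_gauss_inv is hypothetical and this is not the package's code.

```r
# Hypothetical helper: Gaussian log-density parameterised by the inverse covariance.
log_gauss_inv <- function(x, mu, inv_Sigma) {
  diff <- x - mu
  n <- length(x)
  # log|Sigma| = -log|Sigma^{-1}|
  log_det_inv <- as.numeric(determinant(inv_Sigma, logarithm = TRUE)$modulus)
  quad <- as.numeric(t(diff) %*% inv_Sigma %*% diff)
  -0.5 * (n * log(2 * pi) - log_det_inv + quad)
}

x <- c(0.5, -1)
mu <- c(0, 0)
inv_Sigma <- diag(c(1, 1 / 4))  # covariance diag(1, 4), i.e. sd = c(1, 2)
val <- log_gauss_inv(x, mu, inv_Sigma)

# Reference value from independent univariate normals
ref <- sum(dnorm(x, mean = mu, sd = c(1, 2), log = TRUE))
```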

Draw a number

Description

Draw uniformly a number within a specified interval

Usage

draw(int)

Arguments

int

An interval of values we want to draw uniformly in.

Value

A 2-decimals-rounded random number

Examples

TRUE
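
A one-line base-R equivalent, assuming that int is a length-2 vector c(lower, upper) (the manual does not state the expected format) and using the hypothetical name draw_uniform:

```r
# Hypothetical helper: uniform draw within an interval, rounded to 2 decimals.
draw_uniform <- function(int) {
  round(stats::runif(1, min = int[1], max = int[2]), 2)
}

set.seed(1)
value <- draw_uniform(c(0, 1))
```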

E-Step of the EM algorithm

Description

Expectation step of the EM algorithm to compute the parameters of the hyper-posterior Gaussian distribution of the mean process in Magma.

Usage

e_step(db, m_0, kern_0, kern_i, hp_0, hp_i, pen_diag)

Arguments

db

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified.

m_0

A vector, corresponding to the prior mean of the mean GP.

kern_0

A kernel function, associated with the mean GP.

kern_i

A kernel function, associated with the individual GPs.

hp_0

A named vector, tibble or data frame of hyper-parameters associated with kern_0.

hp_i

A tibble or data frame of hyper-parameters associated with kern_i.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A named list, containing the elements mean, a tibble containing the Input and associated Output of the hyper-posterior's mean parameter, and cov, the hyper-posterior's covariance matrix.

Examples

TRUE
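
As we understand the Magma article cited in the References, the hyper-posterior of the mean process is Gaussian with covariance (K_0^{-1} + sum_i Psi_i^{-1})^{-1} and mean equal to that covariance times (K_0^{-1} m_0 + sum_i Psi_i^{-1} y_i). The base-R sketch below computes both in a simplified setting where all individuals share the same input grid and the same covariance Psi; the kernel helper se_cov, its parameterisation and all numerical values are assumptions for illustration.

```r
# Hypothetical squared-exponential covariance on a 1-D grid
se_cov <- function(t, variance, lengthscale) {
  variance * exp(-0.5 * outer(t, t, "-")^2 / lengthscale^2)
}

n <- 5
t_grid <- seq(0, 1, length.out = n)
m_0 <- rep(0, n)                                  # prior mean of the mean GP
K_0 <- se_cov(t_grid, 1, 0.5) + diag(1e-6, n)     # mean GP covariance (with jitter)
Psi <- se_cov(t_grid, 0.5, 0.3) + diag(0.1, n)    # individual covariance plus noise

set.seed(1)
Y <- matrix(rnorm(3 * n), nrow = n)               # 3 individuals, one per column

inv_K0 <- solve(K_0)
inv_Psi <- solve(Psi)

# Hyper-posterior covariance and mean (common grid, common Psi)
post_cov <- solve(inv_K0 + ncol(Y) * inv_Psi)
post_mean <- as.numeric(post_cov %*% (inv_K0 %*% m_0 + inv_Psi %*% rowSums(Y)))
```

In the general case each individual has its own inputs, so the matrices Psi_i are first expanded onto the union of all observed inputs before being summed.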

Penalised elbo for multiple mean GPs with common HPs

Description

Penalised elbo for multiple mean GPs with common HPs

Usage

elbo_GP_mod_common_hp_k(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing values we want to compute elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

mean

A list of the K mean GPs at the union of observed timestamps.

kern

A kernel function used to compute the covariance matrix at corresponding timestamps.

post_cov

A list of the K posterior covariances of the mean GPs (mu_k). Used to compute the correction term (cor_term).

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

The value of the penalised Gaussian elbo for the sum of the K mean GPs with common HPs.

Examples

TRUE

Evidence Lower Bound for a mixture of GPs

Description

Evidence Lower Bound for a mixture of GPs

Usage

elbo_clust_multi_GP(hp, db, hyperpost, kern, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing the values we want to compute the elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

hyperpost

List of parameters for the K mean GPs.

kern

A kernel function used to compute the covariance matrix at corresponding timestamps.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

The value of the penalised Gaussian elbo for a mixture of GPs

Examples

TRUE

Penalised elbo for multiple individual GPs with common HPs

Description

Penalised elbo for multiple individual GPs with common HPs

Usage

elbo_clust_multi_GP_common_hp_i(hp, db, hyperpost, kern, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing values we want to compute elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

hyperpost

List of parameters for the K mean Gaussian processes.

kern

A kernel function used to compute the covariance matrix at corresponding timestamps.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

The value of the penalised Gaussian elbo for the sum of the M individual GPs with common HPs.

Examples

TRUE

Evidence Lower Bound maximised in MagmaClust

Description

Evidence Lower Bound maximised in MagmaClust

Usage

elbo_monitoring_VEM(hp_k, hp_i, db, kern_i, kern_k, hyperpost, m_k, pen_diag)

Arguments

hp_k

A tibble, data frame or named vector of hyper-parameters for each cluster.

hp_i

A tibble, data frame or named vector of hyper-parameters for each individual.

db

A tibble containing values we want to compute elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

kern_i

Kernel used to compute the covariance matrix of individual GPs at corresponding inputs.

kern_k

Kernel used to compute the covariance matrix of the mean GPs at corresponding inputs.

hyperpost

A list of parameters for the variational distributions of the K mean GPs.

m_k

Prior value of the mean parameter of the mean GPs (mu_k). Length = 1 or nrow(db).

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

Value of the elbo that is maximised during the VEM algorithm used for training in MagmaClust.

Examples

TRUE

Expand a grid of inputs

Description

Expand a grid of inputs

Usage

expand_grid_inputs(Input, ...)

Arguments

Input

A vector of inputs.

...

As many vectors of covariates as desired. We advise giving explicit names when using the function.

Value

A tibble containing all the combinations of values of the parameters.

Examples

TRUE
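
The same kind of grid can be built with base R's expand.grid(), shown here as an analogy rather than as the package's implementation (which returns a tibble). The covariate name Covariate is an arbitrary choice for this sketch.

```r
# All combinations of 3 reference inputs and 2 covariate values
grid <- expand.grid(Input = c(0, 1, 2), Covariate = c(10, 20))
```

Naming each vector explicitly, as advised above, gives the resulting columns readable names.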

Gradient of the logLikelihood of a Gaussian Process

Description

Gradient of the logLikelihood of a Gaussian Process

Usage

gr_GP(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing the values we want to compute the logL on. Required columns: Input, Output. Additional covariate columns are allowed.

mean

A vector, specifying the mean of the GP at the reference inputs.

kern

A kernel function.

post_cov

(optional) A matrix, corresponding to the covariance parameter of the hyper-posterior. Used to compute the hyper-prior distribution of a new individual in Magma.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A named vector, corresponding to the value of the hyper-parameters' gradients for the Gaussian log-likelihood (where the covariance can be the sum of the individual and the hyper-posterior's mean process covariances).

Examples

TRUE
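
For a Gaussian log-likelihood with covariance K depending on a hyper-parameter theta, the standard identity is d/dtheta log N(y | m, K) = 0.5 * tr((alpha alpha' - K^{-1}) dK/dtheta), with alpha = K^{-1}(y - m). The base-R sketch below applies it to a single variance parameter v, with K = v * R + noise * I, and compares the result with a finite difference. All names are hypothetical, and the sign convention (log-likelihood rather than its negative) is an assumption; the package may differ.

```r
# Gaussian log-likelihood with K = v * R + noise * I
log_lik <- function(v, y, m, R, noise) {
  K <- v * R + diag(noise, length(y))
  diff <- y - m
  log_det <- as.numeric(determinant(K, logarithm = TRUE)$modulus)
  -0.5 * (length(y) * log(2 * pi) + log_det + sum(diff * solve(K, diff)))
}

# Analytical gradient with respect to v, using dK/dv = R
grad_log_lik <- function(v, y, m, R, noise) {
  K <- v * R + diag(noise, length(y))
  inv_K <- solve(K)
  alpha <- inv_K %*% (y - m)
  0.5 * sum(diag((alpha %*% t(alpha) - inv_K) %*% R))
}

t_grid <- c(0, 0.5, 1, 1.5)
R <- exp(-0.5 * outer(t_grid, t_grid, "-")^2)   # unit-variance SE correlation
y <- c(0.3, -0.1, 0.8, 0.2)
m <- rep(0, 4)

analytic <- grad_log_lik(1.2, y, m, R, noise = 0.1)
h <- 1e-5
numeric_fd <- (log_lik(1.2 + h, y, m, R, 0.1) - log_lik(1.2 - h, y, m, R, 0.1)) / (2 * h)
```

A finite-difference comparison of this kind is a cheap way to validate a hand-written gradient before passing it to an optimiser.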

Gradient of the modified logLikelihood for GPs in Magma

Description

Gradient of the modified logLikelihood for GPs in Magma

Usage

gr_GP_mod(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing the values we want to compute the logL on. Required columns: Input, Output. Additional covariate columns are allowed.

mean

A vector, specifying the mean of the GPs at the reference inputs.

kern

A kernel function.

post_cov

A matrix, covariance parameter of the hyper-posterior. Used to compute the correction term.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A named vector, corresponding to the value of the hyper-parameters' gradients for the modified Gaussian log-likelihood involved in Magma.

Examples

TRUE

Gradient of the modified logLikelihood with common HPs for GPs in Magma

Description

Gradient of the modified logLikelihood with common HPs for GPs in Magma

Usage

gr_GP_mod_common_hp(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble or data frame containing hyper-parameters for all individuals.

db

A tibble containing the values we want to compute the logL on. Required columns: ID, Input, Output. Additional covariate columns are allowed.

mean

A vector, specifying the mean of the GPs at the reference inputs.

kern

A kernel function.

post_cov

A matrix, covariance parameter of the hyper-posterior. Used to compute the correction term.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A named vector, corresponding to the value of the hyper-parameters' gradients for the modified Gaussian log-likelihood involved in Magma with the 'common HP' setting.

Examples

TRUE

Gradient of the penalised elbo for multiple mean GPs with common HPs

Description

Gradient of the penalised elbo for multiple mean GPs with common HPs

Usage

gr_GP_mod_common_hp_k(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing the values we want to compute the elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

mean

A list of the k means of the GPs at union of observed timestamps.

kern

A kernel function

post_cov

A list of the K posterior covariances of the mean GPs (mu_k). Used to compute the correction term (cor_term).

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

The gradient of the penalised Gaussian elbo for the sum of the K mean GPs with common HPs.

Examples

TRUE

Gradient of the elbo for a mixture of GPs

Description

Gradient of the elbo for a mixture of GPs

Usage

gr_clust_multi_GP(hp, db, hyperpost, kern, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing the values we want to compute the elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

hyperpost

List of parameters for the K mean Gaussian processes.

kern

A kernel function.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

The gradient of the penalised Gaussian elbo for a mixture of GPs

Examples

TRUE

Gradient of the penalised elbo for multiple individual GPs with common HPs

Description

Gradient of the penalised elbo for multiple individual GPs with common HPs

Usage

gr_clust_multi_GP_common_hp_i(hp, db, hyperpost, kern, pen_diag = NULL)

Arguments

hp

A tibble, data frame or named vector of hyper-parameters.

db

A tibble containing values we want to compute elbo on. Required columns: Input, Output. Additional covariate columns are allowed.

hyperpost

List of parameters for the K mean Gaussian processes.

kern

A kernel function used to compute the covariance matrix at corresponding timestamps.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

The gradient of the penalised Gaussian elbo for the sum of the M individual GPs with common HPs.

Examples

TRUE

Gradient of the mixture of Gaussian likelihoods

Description

Compute the gradient of a sum of Gaussian log-likelihoods, weighted by their mixture probabilities.

Usage

gr_sum_logL_GP_clust(hp, db, mixture, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector of hyper-parameters.

db

A tibble containing data we want to evaluate the logL on. Required columns: Input, Output. Additional covariate columns are allowed.

mixture

A tibble or data frame, indicating the mixture probabilities of each cluster for the new individual/task.

mean

A list of hyper-posterior mean parameters for all clusters.

kern

A kernel function.

post_cov

A list of hyper-posterior covariance parameters for all clusters.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A named vector, corresponding to the value of the hyper-parameters' gradients for the mixture of Gaussian log-likelihoods involved in the prediction step of MagmaClust.

Examples

TRUE

Generate random hyper-parameters

Description

Generate a set of random hyper-parameters, specific to the chosen type of kernel, under the format that is used in Magma.

Usage

hp(kern = "SE", list_ID = NULL, list_hp = NULL, noise = FALSE, common_hp = FALSE)

Arguments

kern

A function, or a character string indicating the chosen type of kernel among:

  • "SE": the Squared Exponential kernel,

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

In case of a custom kernel function, the argument list_hp has to be provided as well, for designing a tibble with the correct names of hyper-parameters.

list_ID

A vector, associating an ID value with each individual for whom hyper-parameters are generated. If NULL (default), only one set of hyper-parameters is returned, without the ID column.

list_hp

A vector of characters, providing the name of each hyper-parameter, in the case where kern is a custom kernel function.

noise

A logical value, indicating whether a 'noise' hyper-parameter should be included.

common_hp

A logical value, indicating whether the set of hyper-parameters is assumed to be common to all individuals.

Value

A tibble, providing a set of random hyper-parameters associated with the kernel specified through the argument kern.

Examples

TRUE
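
The rule that '*' must come before '+' follows from the left-to-right treatment of kernel formulas. The base-R sketch below mimics a sequential evaluation on plain numbers standing in for kernel values; the function eval_left_to_right is hypothetical and only illustrates the ordering issue, not the package's parser.

```r
# Hypothetical sequential evaluator: no operator precedence, strictly left to right.
eval_left_to_right <- function(formula, values) {
  tokens <- strsplit(formula, " +")[[1]]   # split on whitespace
  result <- values[[tokens[1]]]
  i <- 2
  while (i < length(tokens)) {
    op <- tokens[i]
    rhs <- values[[tokens[i + 1]]]
    result <- if (op == "*") {
      result * rhs
    } else if (op == "+") {
      result + rhs
    } else {
      stop("Unknown operator: ", op)
    }
    i <- i + 2
  }
  result
}

values <- c(SE = 2, LIN = 3, RQ = 4)
valid <- eval_left_to_right("SE * LIN + RQ", values)     # (2 * 3) + 4
reordered <- eval_left_to_right("RQ + SE * LIN", values) # (4 + 2) * 3
```

With sequential treatment, the second formula computes (RQ + SE) * LIN rather than RQ + (SE * LIN), which is why it is described as not valid.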

Compute the hyper-posterior distribution in Magma

Description

Compute the parameters of the hyper-posterior Gaussian distribution of the mean process in Magma (similarly to the expectation step of the EM algorithm used for learning). This hyper-posterior distribution, evaluated on a grid of inputs provided through the grid_inputs argument, is a key component for making predictions in Magma, and is required in the function pred_magma.

Usage

hyperposterior(
  trained_model = NULL,
  data = NULL,
  hp_0 = NULL,
  hp_i = NULL,
  kern_0 = NULL,
  kern_i = NULL,
  prior_mean = NULL,
  grid_inputs = NULL,
  pen_diag = 1e-10
)

Arguments

trained_model

A list, containing the information coming from a Magma model, previously trained using the train_magma function. If trained_model is not provided, the arguments data, hp_0, hp_i, kern_0, and kern_i are all required.

data

A tibble or data frame. Required columns: 'Input', 'Output'. Additional columns for covariates can be specified. The 'Input' column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The 'Output' column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference 'Input'. Recovered from trained_model if not provided.

hp_0

A named vector, tibble or data frame of hyper-parameters associated with kern_0. Recovered from trained_model if not provided.

hp_i

A tibble or data frame of hyper-parameters associated with kern_i. Recovered from trained_model if not provided.

kern_0

A kernel function, associated with the mean GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not). Recovered from trained_model if not provided.

kern_i

A kernel function, associated with the individual GPs. ("SE", "PERIO" and "RQ" are also available here). Recovered from trained_model if not provided.

prior_mean

Hyper-prior mean parameter of the mean GP. This argument can be specified under various formats, such as:

  • NULL (default). The hyper-prior mean would be set to 0 everywhere.

  • A number. The hyper-prior mean would be a constant function.

  • A vector of the same length as all the distinct Input values in the data argument. This vector would be considered as the evaluation of the hyper-prior mean function at the training Inputs.

  • A function. This function is defined as the hyper-prior mean.

  • A tibble or data frame. Required columns: Input, Output. The Input values should include at least the same values as in the data argument.

grid_inputs

A vector or a data frame, indicating the grid of additional reference inputs on which the mean process' hyper-posterior should be evaluated.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A list gathering the parameters of the mean process' hyper-posterior distribution.

Examples

TRUE
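
The accepted formats for prior_mean can all be reduced to a vector evaluated at the reference inputs. The base-R sketch below shows one way to do so; the helper resolve_prior_mean is hypothetical and does not reproduce the package's internal handling.

```r
# Hypothetical helper: turn any accepted prior_mean format into a numeric vector.
resolve_prior_mean <- function(prior_mean, inputs) {
  if (is.null(prior_mean)) {
    return(rep(0, length(inputs)))                 # zero everywhere
  }
  if (is.function(prior_mean)) {
    return(prior_mean(inputs))                     # evaluate the mean function
  }
  if (is.data.frame(prior_mean)) {
    # Look up the Output value stored for each reference Input
    return(prior_mean$Output[match(inputs, prior_mean$Input)])
  }
  if (length(prior_mean) == 1) {
    return(rep(prior_mean, length(inputs)))        # constant function
  }
  if (length(prior_mean) == length(inputs)) {
    return(prior_mean)                             # already evaluated
  }
  stop("Unsupported format for prior_mean.")
}

inputs <- c(0, 1, 2)
from_null <- resolve_prior_mean(NULL, inputs)
from_number <- resolve_prior_mean(2, inputs)
from_function <- resolve_prior_mean(function(x) 2 * x, inputs)
from_table <- resolve_prior_mean(data.frame(Input = c(2, 1, 0), Output = c(5, 4, 3)), inputs)
```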

Compute the hyper-posterior distribution for each cluster in MagmaClust

Description

Recompute the E-step of the VEM algorithm in MagmaClust for a new set of reference Input. Once training is completed, it can be necessary to evaluate the hyper-posterior distributions of the mean processes at specific locations, for which we want to make predictions. This process is directly implemented in the pred_magmaclust function but the user might want to use hyperposterior_clust for a tailored control of the prediction procedure.

Usage

hyperposterior_clust(
  trained_model = NULL,
  data = NULL,
  mixture = NULL,
  hp_k = NULL,
  hp_i = NULL,
  kern_k = NULL,
  kern_i = NULL,
  prior_mean_k = NULL,
  grid_inputs = NULL,
  pen_diag = 1e-10
)

Arguments

trained_model

A list, containing the information coming from a MagmaClust model, previously trained using the train_magmaclust function. If trained_model is not provided, the arguments data, mixture, hp_k, hp_i, kern_k, and kern_i are all required.

data

A tibble or data frame. Required columns: ID, Input, Output. Additional columns for covariates can be specified. The ID column contains the unique names/codes used to identify each individual/task (or batch of data). The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference Input. Recovered from trained_model if not provided.

mixture

A tibble or data frame, indicating the mixture probabilities of each cluster for each individual. Required column: ID. Recovered from trained_model if not provided.

hp_k

A tibble or data frame of hyper-parameters associated with kern_k. Recovered from trained_model if not provided.

hp_i

A tibble or data frame of hyper-parameters associated with kern_i. Recovered from trained_model if not provided.

kern_k

A kernel function, associated with the mean GPs. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not). Recovered from trained_model if not provided.

kern_i

A kernel function, associated with the individual GPs. ("SE", "LIN", "PERIO" and "RQ" are also available here). Recovered from trained_model if not provided.

prior_mean_k

The set of hyper-prior mean parameters (m_k) for the K mean GPs, one value for each cluster. This argument can be specified under various formats, such as:

  • NULL (default). All hyper-prior means would be set to 0 everywhere.

  • A numerical vector of the same length as the number of clusters. Each number is associated with one cluster, and considered to be the hyper-prior mean parameter of the cluster (i.e. a constant function at all Input).

  • A list of functions. Each function is associated with one cluster. These functions are all evaluated at all Input values, to provide specific hyper-prior mean vectors for each cluster.

grid_inputs

A vector or a data frame, indicating the grid of additional reference inputs on which the mean process' hyper-posterior should be evaluated.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A list containing the parameters of the mean processes' hyper-posterior distributions.

Examples

TRUE

Run a k-means algorithm to initialise clusters' allocation

Description

Run a k-means algorithm to initialise clusters' allocation

Usage

ini_kmeans(data, k, nstart = 50, summary = FALSE)

Arguments

data

A tibble containing common Input and associated Output values to cluster.

k

A number of clusters assumed for running the kmeans algorithm.

nstart

A number, indicating how many re-starts of kmeans are set.

summary

A boolean, indicating whether we want an outcome summary

Value

A tibble containing the initial clustering obtained through kmeans.

Examples

TRUE

Mixture initialisation with kmeans

Description

Provide an initial kmeans allocation of the individuals/tasks in a dataset into a definite number of clusters, and return the associated mixture probabilities.

Usage

ini_mixture(data, k, name_clust = NULL, nstart = 50)

Arguments

data

A tibble or data frame. Required columns: ID, Input, Output.

k

A number, indicating the number of clusters.

name_clust

A vector of characters. Each element should correspond to the name of one cluster.

nstart

A number, indicating how many restarts are used in the underlying kmeans algorithm.

Value

A tibble indicating, for each ID, the cluster it belongs to after a kmeans initialisation.

Examples

TRUE
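
A k-means initialisation of this kind can be sketched with stats::kmeans(). The example assumes that all individuals are observed on a common Input grid, so that each individual becomes one row of a matrix; the toy data and the column names K1 and K2 are assumptions, and the package's own preprocessing may differ.

```r
set.seed(42)
# Toy data: 6 individuals on a common grid, 3 around 0 and 3 around 10
data <- data.frame(
  ID = rep(as.character(1:6), each = 3),
  Input = rep(c(0, 1, 2), times = 6),
  Output = c(rep(0, 9), rep(10, 9)) + rnorm(18, sd = 0.1)
)

# One row per individual, one column per Input value
wide <- do.call(rbind, split(data$Output, data$ID))

km <- kmeans(wide, centers = 2, nstart = 50)

# Hard allocation expressed as mixture probabilities (1 for the assigned cluster)
mixture <- data.frame(
  ID = rownames(wide),
  K1 = as.numeric(km$cluster == 1),
  K2 = as.numeric(km$cluster == 2)
)
```

Using several restarts (nstart) reduces the risk of starting the training procedure from a poor local optimum of k-means.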

Create covariance matrix from a kernel

Description

kern_to_cov() creates a covariance matrix between input values (that could be either scalars or vectors) evaluated within a kernel function, which is characterised by specified hyper-parameters. This matrix is a finite-dimensional evaluation of the infinite-dimensional covariance structure of a GP, defined thanks to this kernel.

Usage

kern_to_cov(input, kern = "SE", hp, deriv = NULL, input_2 = NULL)

Arguments

input

A vector, matrix, data frame or tibble containing all inputs for one individual. If a vector, the elements are used as reference; otherwise, one column should be named 'Input' to indicate that it represents the reference (e.g. 'Input' would contain the timestamps in time-series applications). The other columns are considered as being covariates. If no column is named 'Input', the first one is used by default.

kern

A kernel function. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

hp

A list, data frame or tibble containing the hyper-parameters used in the kernel. The names of the elements (or columns) should correspond exactly to those used in the kernel definition. If hp contains an element or a column 'Noise', its value will be added on the diagonal of the covariance matrix.

deriv

A character, indicating according to which hyper-parameter the derivative should be computed. If NULL (default), the function simply returns the covariance matrix.

input_2

(optional) A vector, matrix, data frame or tibble under the same format as input. This argument should be used only when the kernel needs to be evaluated between two different sets of inputs, typically resulting in a non-square matrix.

Value

A covariance matrix, where elements are evaluations of the associated kernel for each pair of reference inputs.

Examples

TRUE
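As an illustration of what such a covariance matrix looks like, here is a minimal base R sketch. The function `se_cov` is a hypothetical stand-in written for this example, using one common form of the Squared Exponential kernel; the package's own parametrisation of the hyper-parameters may differ. It shows the square case with a noise term on the diagonal, and the non-square case obtained with a second set of inputs.

```r
# Hypothetical stand-in for an SE kernel evaluation (not the package's code).
se_cov <- function(input, variance, lengthscale, noise = 0, input_2 = NULL) {
  square <- is.null(input_2)
  if (square) input_2 <- input
  # Squared distances between every pair of reference inputs.
  d2 <- outer(input, input_2, function(a, b) (a - b)^2)
  K <- variance * exp(-0.5 * d2 / lengthscale^2)
  # The noise is only added on the diagonal of a square matrix.
  if (square) K <- K + diag(noise, length(input))
  K
}

input <- c(0, 1, 2.5)
K <- se_cov(input, variance = 2, lengthscale = 1, noise = 0.1)
K_cross <- se_cov(input, variance = 2, lengthscale = 1, input_2 = c(0.5, 1.5))
```

Here `K` is a symmetric 3 x 3 matrix, while `K_cross` is a 3 x 2 matrix between the two sets of inputs.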

Create inverse of a covariance matrix from a kernel

Description

kern_to_inv() creates the inverse of a covariance matrix between input values (that could be either scalars or vectors) evaluated within a kernel function, which is characterised by specified hyper-parameters. This matrix is a finite-dimensional evaluation of the infinite-dimensional covariance structure of a GP, defined thanks to this kernel.

Usage

kern_to_inv(input, kern, hp, pen_diag = 1e-10, deriv = NULL)

Arguments

input

A vector, matrix, data frame or tibble containing all inputs for one individual. If a vector, the elements are used as reference; otherwise, one column should be named 'Input' to indicate that it represents the reference (e.g. 'Input' would contain the timestamps in time-series applications). The other columns are considered as being covariates. If no column is named 'Input', the first one is used by default.

kern

A kernel function. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. For combining kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' must always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

hp

A list, data frame or tibble containing the hyper-parameters used in the kernel. The names of the elements (or columns) should correspond exactly to those used in the kernel definition.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

deriv

A character string, indicating the hyper-parameter with respect to which the derivative should be computed. If NULL (default), the function simply returns the inverse covariance matrix.

Value

The inverse of a covariance matrix whose elements are evaluations of the associated kernel for each pair of reference inputs.

Examples

TRUE
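The role of the jitter term can be sketched in base R as follows. The helper `inv_with_jitter` is hypothetical and only assumes that the jitter is added on the diagonal before a Cholesky-based inversion; it is not claimed to be the package's implementation.

```r
# Hypothetical helper: add a small jitter on the diagonal, then invert
# through the Cholesky factorisation.
inv_with_jitter <- function(K, pen_diag = 1e-10) {
  K_jit <- K + diag(pen_diag, nrow(K))
  chol2inv(chol(K_jit))
}

# Two identical inputs give two identical rows: this matrix is singular,
# and only becomes invertible once the jitter is added.
K_singular <- matrix(1, nrow = 2, ncol = 2)
K_inv <- inv_with_jitter(K_singular)
```

The inverse of a nearly singular matrix has very large entries, so the jitter should stay as small as numerical stability allows.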

Linear Kernel

Description

Linear Kernel

Usage

lin_kernel(x, y, hp, deriv = NULL, vectorized = FALSE)

Arguments

x

A vector (or matrix if vectorized = T) of inputs.

y

A vector (or matrix if vectorized = T) of inputs.

hp

A tibble, data frame or named vector, containing the kernel's hyper-parameters. Required columns: 'lin_slope' and 'lin_offset'.

deriv

A character string, indicating the hyper-parameter with respect to which the derivative should be computed. If NULL (default), the function simply returns the evaluation of the kernel.

vectorized

A logical value, indicating whether the function provides a vectorized version for faster calculations. If TRUE, the x and y arguments should be the vector or matrix containing all inputs for which the kernel is evaluated on all pairs of elements. If FALSE, the x and y arguments are simply two inputs.

Value

A scalar, corresponding to the evaluation of the kernel.

Examples

TRUE
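The difference between a pairwise and a vectorized evaluation can be sketched in base R. Both functions below are hypothetical, and the form slope * <x, y> + offset is an assumption made for illustration; the package may parametrise 'lin_slope' and 'lin_offset' differently.

```r
# Hypothetical linear kernel between two inputs (assumed form).
lin_pair <- function(x, y, slope, offset) slope * sum(x * y) + offset

# Vectorized counterpart: all pairs at once, for a matrix with one input per row.
lin_all_pairs <- function(X, slope, offset) slope * tcrossprod(X) + offset

X <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)   # three 2-dimensional inputs
K_vec <- lin_all_pairs(X, slope = 0.5, offset = 1)

# Same matrix, built by evaluating the kernel pair by pair.
K_loop <- outer(1:3, 1:3, Vectorize(function(i, j) {
  lin_pair(X[i, ], X[j, ], slope = 0.5, offset = 1)
}))
```

Both approaches give the same matrix; the vectorized one replaces the loop over pairs with a single matrix product.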

Compute a covariance matrix for multiple individuals

Description

Compute the covariance matrices associated with all individuals in the database, taking into account their specific inputs and hyper-parameters.

Usage

list_kern_to_cov(data, kern, hp, deriv = NULL)

Arguments

data

A tibble or data frame of input data. Required column: 'ID'. Suggested column: 'Input' (for indicating the reference input).

kern

A kernel function.

hp

A tibble or data frame, containing the hyper-parameters associated with each individual.

deriv

A character string, indicating the hyper-parameter with respect to which the derivative should be computed. If NULL (default), the function simply returns the list of covariance matrices.

Value

A named list containing all of the covariance matrices.

Examples

TRUE

Compute an inverse covariance matrix for multiple individuals

Description

Compute the inverse covariance matrices associated with all individuals in the database, taking into account their specific inputs and hyper-parameters.

Usage

list_kern_to_inv(db, kern, hp, pen_diag, deriv = NULL)

Arguments

db

A tibble or data frame of input data. Required column: 'ID'. Suggested column: 'Input' (for indicating the reference input).

kern

A kernel function.

hp

A tibble or data frame, containing the hyper-parameters associated with each individual.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

deriv

A character string, indicating the hyper-parameter with respect to which the derivative should be computed. If NULL (default), the function simply returns the list of inverse covariance matrices.

Value

A named list containing all of the inverse covariance matrices.

Examples

TRUE

Log-Likelihood function of a Gaussian Process

Description

Log-Likelihood function of a Gaussian Process

Usage

logL_GP(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector containing hyper-parameters.

db

A tibble containing the values we want to compute the logL on. Required columns: Input, Output. Additional covariate columns are allowed.

mean

A vector, specifying the mean of the GP at the reference inputs.

kern

A kernel function.

post_cov

(optional) A matrix, corresponding to the covariance parameter of the hyper-posterior. Used to compute the hyper-prior distribution of a new individual in Magma.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A number, corresponding to the value of the Gaussian log-Likelihood (where the covariance can be the sum of the individual and the hyper-posterior's mean process covariances).

Examples

TRUE
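For reference, a Gaussian log-likelihood of this kind can be sketched in base R. The function `gauss_logL` is hypothetical and takes an explicit covariance matrix rather than a kernel and hyper-parameters; the package's sign convention and internals may differ.

```r
# Hypothetical helper: log-density of y under N(mean, cov), with a jitter
# added on the diagonal for numerical stability.
gauss_logL <- function(y, mean, cov, pen_diag = 1e-10) {
  n <- length(y)
  U <- chol(cov + diag(pen_diag, n))              # cov = t(U) %*% U
  z <- backsolve(U, y - mean, transpose = TRUE)   # solves t(U) z = y - mean
  # log|cov| = 2 * sum(log(diag(U))) and the quadratic form equals sum(z^2).
  -0.5 * (n * log(2 * pi) + 2 * sum(log(diag(U))) + sum(z^2))
}

y <- c(0.3, -1.2)
ll <- gauss_logL(y, mean = c(0, 0), cov = diag(2), pen_diag = 0)
```

When a hyper-posterior covariance is available, the same function could be called with the sum of the individual covariance and post_cov as the `cov` argument.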

Modified log-Likelihood function for GPs

Description

Log-Likelihood function involved in Magma during the maximisation step of the training. The log-Likelihood is defined as a simple Gaussian likelihood plus a correction trace term.

Usage

logL_GP_mod(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble, data frame or named vector of hyper-parameters.

db

A tibble containing the values we want to compute the logL on. Required columns: Input, Output. Additional covariate columns are allowed.

mean

A vector, specifying the mean of the GP at the reference inputs.

kern

A kernel function.

post_cov

A matrix, covariance parameter of the hyper-posterior. Used to compute the correction term.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A number, corresponding to the value of the modified Gaussian log-Likelihood defined in Magma.

Examples

TRUE
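The structure "Gaussian likelihood plus a trace term" can be sketched in base R. The function `mod_logL` is hypothetical, and it assumes that the correction equals -0.5 * tr(cov^{-1} post_cov), which is the term that appears when the Gaussian log-density is averaged over a mean process with covariance post_cov. The package may use a different sign convention (for instance, returning the negative value for minimisation).

```r
# Hypothetical helper: Gaussian log-density minus half the trace of
# solve(cov) %*% post_cov (assumed form of the correction term).
mod_logL <- function(y, mean, cov, post_cov) {
  n <- length(y)
  inv <- solve(cov)
  r <- y - mean
  log_det <- as.numeric(determinant(cov)$modulus)   # log|cov|
  gauss <- -0.5 * (n * log(2 * pi) + log_det + sum(r * (inv %*% r)))
  # For symmetric matrices, sum(inv * post_cov) equals tr(inv %*% post_cov).
  gauss - 0.5 * sum(inv * post_cov)
}

y <- c(1, 2)
m <- c(0, 0)
S <- diag(2)
no_correction <- mod_logL(y, m, S, post_cov = matrix(0, 2, 2))
with_correction <- mod_logL(y, m, S, post_cov = S)
```

With a zero post_cov the value reduces to the plain Gaussian log-likelihood; with post_cov equal to the identity and two observations, the correction is exactly -1.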

Modified log-Likelihood function with common HPs for GPs

Description

Log-Likelihood function involved in Magma during the maximisation step of the training, in the particular case where the hyper-parameters are shared by all individuals. The log-Likelihood is defined as a sum over all individuals of Gaussian likelihoods plus correction trace terms.

Usage

logL_GP_mod_common_hp(hp, db, mean, kern, post_cov, pen_diag)

Arguments

hp

A tibble or data frame of hyper-parameters.

db

A tibble containing the values we want to compute the logL on. Required columns: ID, Input, Output. Additional covariate columns are allowed.

mean

A vector, specifying the mean of the GP at the reference inputs.

kern

A kernel function.

post_cov

A matrix, covariance parameter of the hyper-posterior. Used to compute the correction term.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A number, corresponding to the value of the modified Gaussian log-Likelihood with common hyper-parameters defined in Magma.

Examples

TRUE

Log-Likelihood for monitoring the EM algorithm in Magma

Description

Log-Likelihood for monitoring the EM algorithm in Magma

Usage

logL_monitoring(
  hp_0,
  hp_i,
  db,
  m_0,
  kern_0,
  kern_i,
  post_mean,
  post_cov,
  pen_diag
)

Arguments

hp_0

A named vector, tibble or data frame, containing the hyper-parameters associated with the mean GP.

hp_i

A tibble or data frame, containing the hyper-parameters associated with the individual GPs.

db

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified.

m_0

A vector, corresponding to the prior mean of the mean GP.

kern_0

A kernel function, associated with the mean GP.

kern_i

A kernel function, associated with the individual GPs.

post_mean

A tibble, coming out of the E step, containing the Input and associated Output of the hyper-posterior mean parameter.

post_cov

A matrix, coming out of the E step, being the hyper-posterior covariance parameter.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A number, the expectation of the joint log-likelihood of the model. This quantity is supposed to increase at each step of the EM algorithm, and is thus used for monitoring the procedure.

Examples

TRUE
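Since the monitored quantity is expected to increase at each EM step, a trace of its values can be checked with a few lines of base R. The helper `check_em_progress`, its relative tolerance and the numbers below are all made up for illustration; they are not outputs or stopping rules of the package.

```r
# Hypothetical helper: check that a log-likelihood trace is non-decreasing and
# whether the last relative improvement falls below a tolerance.
check_em_progress <- function(logL_trace, tol = 1e-3) {
  diffs <- diff(logL_trace)
  list(
    monotone = all(diffs >= 0),
    converged = length(diffs) > 0 &&
      abs(tail(diffs, 1)) < tol * abs(tail(logL_trace, 1))
  )
}

# Made-up values of the monitored quantity over four EM iterations.
trace <- c(-250.0, -180.4, -172.9, -172.85)
res <- check_em_progress(trace)
```

A decrease between two consecutive iterations would usually point to a numerical issue or a failed optimisation in the M step.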

M-Step of the EM algorithm

Description

Maximisation step of the EM algorithm to compute hyper-parameters of all the kernels involved in Magma.

Usage

m_step(
  db,
  m_0,
  kern_0,
  kern_i,
  old_hp_0,
  old_hp_i,
  post_mean,
  post_cov,
  common_hp,
  pen_diag
)

Arguments

db

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified.

m_0

A vector, corresponding to the prior mean of the mean GP.

kern_0

A kernel function, associated with the mean GP.

kern_i

A kernel function, associated with the individual GPs.

old_hp_0

A named vector, tibble or data frame, containing the hyper-parameters from the previous M-step (or initialisation) associated with the mean GP.

old_hp_i

A tibble or data frame, containing the hyper-parameters from the previous M-step (or initialisation) associated with the individual GPs.

post_mean

A tibble, coming out of the E step, containing the Input and associated Output of the hyper-posterior mean parameter.

post_cov

A matrix, coming out of the E step, being the hyper-posterior covariance parameter.

common_hp

A logical value, indicating whether the set of hyper-parameters is assumed to be common to all individuals.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A named list, containing the elements hp_0, a tibble containing the hyper-parameters associated with the mean GP, and hp_i, a tibble containing the hyper-parameters associated with the individual GPs.

Examples

TRUE

Periodic Kernel

Description

Periodic Kernel

Usage

perio_kernel(x, y, hp, deriv = NULL, vectorized = FALSE)

Arguments

x

A vector (or matrix if vectorized = T) of inputs.

y

A vector (or matrix if vectorized = T) of inputs.

hp

A tibble, data frame or named vector, containing the kernel's hyper-parameters. Required columns: 'perio_variance', 'perio_lengthscale', and 'period'.

deriv

A character string, indicating the hyper-parameter with respect to which the derivative should be computed. If NULL (default), the function simply returns the evaluation of the kernel.

vectorized

A logical value, indicating whether the function provides a vectorized version for faster calculations. If TRUE, the x and y arguments should be the vector or matrix containing all inputs for which the kernel is evaluated on all pairs of elements. If FALSE, the x and y arguments are simply two inputs.

Value

A scalar, corresponding to the evaluation of the kernel.

Examples

TRUE

Plot smoothed curves of raw data

Description

Display raw data under the Magma format as smoothed curves.

Usage

plot_db(data, cluster = FALSE, legend = FALSE)

Arguments

data

A data frame or tibble with the columns: ID, Input, Output.

cluster

A boolean indicating whether data should be coloured by cluster. Requires a column named 'Cluster'.

legend

A boolean indicating whether the legend should be displayed.

Value

Graph of smoothed curves of raw data.

Examples

TRUE
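A data set under this format can be built by hand in base R, as in the following sketch. The values are random and the cluster labels "K1" and "K2" are arbitrary names chosen for illustration.

```r
set.seed(1)

# Two individuals observed on the same five reference inputs.
db <- data.frame(
  ID = rep(c("1", "2"), each = 5),
  Input = rep(seq(0, 10, length.out = 5), times = 2),
  Output = rnorm(10)
)

# Optional column, only needed when plotting with cluster = TRUE.
db$Cluster <- ifelse(db$ID == "1", "K1", "K2")
```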

Create a GIF of Magma or GP predictions

Description

Create a GIF animation displaying how Magma or classic GP predictions evolve and improve when the number of data points increases.

Usage

plot_gif(
  pred_gp,
  x_input = NULL,
  data = NULL,
  data_train = NULL,
  prior_mean = NULL,
  y_grid = NULL,
  heatmap = FALSE,
  prob_CI = 0.95,
  size_data = 3,
  size_data_train = 1,
  alpha_data_train = 0.5,
  export_gif = FALSE,
  path = "gif_gp.gif",
  ...
)

Arguments

pred_gp

A tibble, typically coming from the pred_gif function. Required columns: 'Input', 'Mean', 'Var' and 'Index'.

x_input

A vector of character strings, indicating which input should be displayed. If NULL (default), the 'Input' column is used for the x-axis. If providing a 2-dimensional vector, the corresponding columns are used for the x-axis and y-axis.

data

(Optional) A tibble or data frame. Required columns: 'Input', 'Output'. Additional columns for covariates can be specified. The 'Input' column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The 'Output' column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference 'Input'.

data_train

(Optional) A tibble or data frame, containing the training data of the Magma model. The data set should have the same format as the data argument with an additional column 'ID' for identifying the different individuals/tasks. If provided, those data are displayed as coloured points in the background (each colour corresponding to one individual/task).

prior_mean

(Optional) A tibble or a data frame, containing the 'Input' and associated 'Output' prior mean parameter of the GP prediction.

y_grid

A vector, indicating the grid of values on the y-axis for which probabilities should be computed for heatmaps of 1-dimensional predictions. If NULL (default), a vector of length 50 is defined, ranging between the min and max 'Output' values contained in pred_gp.

heatmap

A logical value indicating whether the GP prediction should be represented as a heatmap of probabilities for 1-dimensional inputs. If FALSE (default), the mean curve and associated 95% CI are displayed.

prob_CI

A number between 0 and 1 (default is 0.95), indicating the level of the Credible Interval associated with the posterior mean curve.

size_data

A number, controlling the size of the data points.

size_data_train

A number, controlling the size of the data_train points.

alpha_data_train

A number, between 0 and 1, controlling the transparency of the data_train points.

export_gif

A logical value indicating whether the animation should be exported as a .gif file.

path

A character string defining the path where the GIF file should be exported.

...

Any additional parameters that can be passed to the function transition_states from the gganimate package.

Value

Visualisation of a Magma or GP prediction (optional: display data points, training data points and the prior mean function), where data points are added sequentially for visualising changes in prediction as information increases.

Examples

TRUE

Plot Magma or GP predictions

Description

Display Magma or classic GP predictions. According to the dimension of the inputs, the graph may be a mean curve + Credible Interval or a heatmap of probabilities.

Usage

plot_gp(
  pred_gp,
  x_input = NULL,
  data = NULL,
  data_train = NULL,
  prior_mean = NULL,
  y_grid = NULL,
  heatmap = FALSE,
  samples = FALSE,
  nb_samples = 50,
  plot_mean = TRUE,
  alpha_samples = 0.3,
  prob_CI = 0.95,
  size_data = 3,
  size_data_train = 1,
  alpha_data_train = 0.5
)

plot_magma(
  pred_gp,
  x_input = NULL,
  data = NULL,
  data_train = NULL,
  prior_mean = NULL,
  y_grid = NULL,
  heatmap = FALSE,
  samples = FALSE,
  nb_samples = 50,
  plot_mean = TRUE,
  alpha_samples = 0.3,
  prob_CI = 0.95,
  size_data = 3,
  size_data_train = 1,
  alpha_data_train = 0.5
)

Arguments

pred_gp

A tibble or data frame, typically coming from the pred_magma or pred_gp functions. Required columns: 'Input', 'Mean', 'Var'. Additional covariate columns may be present in case of multi-dimensional inputs.

x_input

A vector of character strings, indicating which input should be displayed. If NULL (default), the 'Input' column is used for the x-axis. If providing a 2-dimensional vector, the corresponding columns are used for the x-axis and y-axis.

data

(Optional) A tibble or data frame. Required columns: 'Input', 'Output'. Additional columns for covariates can be specified. This argument corresponds to the raw data on which the prediction has been performed.

data_train

(Optional) A tibble or data frame, containing the training data of the Magma model. The data set should have the same format as the data argument with an additional required column 'ID' for identifying the different individuals/tasks. If provided, those data are displayed as coloured points in the background (each colour corresponding to one individual/task).

prior_mean

(Optional) A tibble or a data frame, containing the 'Input' and associated 'Output' prior mean parameter of the GP prediction.

y_grid

A vector, indicating the grid of values on the y-axis for which probabilities should be computed for heatmaps of 1-dimensional predictions. If NULL (default), a vector of length 50 is defined, ranging between the min and max 'Output' values contained in pred_gp.

heatmap

A logical value indicating whether the GP prediction should be represented as a heatmap of probabilities for 1-dimensional inputs. If FALSE (default), the mean curve and associated Credible Interval are displayed.

samples

A logical value indicating whether the GP prediction should be represented as a collection of samples drawn from the posterior. If FALSE (default), the mean curve and associated Credible Interval are displayed.

nb_samples

A number, indicating the number of samples to be drawn from the predictive posterior distribution. For two-dimensional graphs, only one sample can be displayed.

plot_mean

A logical value, indicating whether the mean prediction should be displayed on the graph when samples = TRUE.

alpha_samples

A number, controlling transparency of the sample curves.

prob_CI

A number between 0 and 1 (default is 0.95), indicating the level of the Credible Interval associated with the posterior mean curve. If this argument is set to 1, the Credible Interval is not displayed.

size_data

A number, controlling the size of the data points.

size_data_train

A number, controlling the size of the data_train points.

alpha_data_train

A number, between 0 and 1, controlling the transparency of the data_train points.

Value

Visualisation of a Magma or GP prediction (optional: display data points, training data points and the prior mean function). For 1-D inputs, the prediction is represented as a mean curve and its associated 95% Credible Interval, as a collection of samples drawn from the posterior if samples = TRUE, or as a heatmap of probabilities if heatmap = TRUE. For 2-D inputs, the prediction is represented as a heatmap, where each pair of inputs on the x-axis and y-axis is associated with a gradient of colours for the posterior mean values, whereas the uncertainty is indicated by the transparency (the narrower the Credible Interval, the more opaque the associated colour, and vice versa).

Examples

TRUE
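The link between the 'Mean' and 'Var' columns, the prob_CI level and the displayed interval can be sketched in base R. The helper `ci_bounds` and the column names 'CI_inf' and 'CI_sup' are hypothetical, and the prediction values are made up; the sketch only assumes a Gaussian predictive distribution at each input.

```r
# Hypothetical helper: bounds of a central Gaussian credible interval computed
# from the predictive mean and variance at each input.
ci_bounds <- function(pred, prob_CI = 0.95) {
  z <- qnorm((1 + prob_CI) / 2)   # e.g. about 1.96 for a 95% level
  pred$CI_inf <- pred$Mean - z * sqrt(pred$Var)
  pred$CI_sup <- pred$Mean + z * sqrt(pred$Var)
  pred
}

# Made-up prediction with the required columns.
pred <- data.frame(
  Input = c(0, 1, 2),
  Mean = c(0.0, 0.5, 1.0),
  Var = c(0.04, 0.01, 0.09)
)
ci <- ci_bounds(pred)
```

With prob_CI = 1 the quantile is infinite, which is consistent with the interval not being displayed in that case.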

Plot MagmaClust predictions

Description

Display MagmaClust predictions. According to the dimension of the inputs, the graph may be a mean curve (dim inputs = 1) or a heatmap (dim inputs = 2) of probabilities. Moreover, MagmaClust can provide credible intervals only by visualising cluster-specific predictions (e.g. for the most probable cluster). When visualising the full mixture-of-GPs prediction, which can be multimodal, the user should choose between the simple mean function or the full heatmap of probabilities (more informative but slower).

Usage

plot_magmaclust(
  pred_clust,
  cluster = "all",
  x_input = NULL,
  data = NULL,
  data_train = NULL,
  col_clust = FALSE,
  prior_mean = NULL,
  y_grid = NULL,
  heatmap = FALSE,
  samples = FALSE,
  nb_samples = 50,
  plot_mean = TRUE,
  alpha_samples = 0.3,
  prob_CI = 0.95,
  size_data = 3,
  size_data_train = 1,
  alpha_data_train = 0.5
)

Arguments

pred_clust

A list of predictions, typically coming from pred_magmaclust. Required elements: pred, mixture, mixture_pred.

cluster

A character string, indicating which cluster to plot from. If 'all' (default), the mixture of GPs prediction is displayed as a mean curve (1-D inputs) or a mean heatmap (2-D inputs). Alternatively, if the name of one cluster is provided, the classic mean curve + credible interval is displayed (1-D inputs), or a heatmap with colour gradient for the mean and transparency gradient for the Credible Interval (2-D inputs).

x_input

A vector of character strings, indicating which input should be displayed. If NULL (default), the 'Input' column is used for the x-axis. If providing a 2-dimensional vector, the corresponding columns are used for the x-axis and y-axis.

data

(Optional) A tibble or data frame. Required columns: Input, Output. Additional columns for covariates can be specified. This argument corresponds to the raw data on which the prediction has been performed.

data_train

(Optional) A tibble or data frame, containing the training data of the MagmaClust model. The data set should have the same format as the data argument with an additional required column ID for identifying the different individuals/tasks. If provided, those data are displayed as coloured points in the background (each colour corresponding to one individual or a cluster, see col_clust below).

col_clust

A boolean indicating whether background points are coloured according to the individuals or to their most probable cluster. To colour by clusters, a column Cluster must be present in data_train. We advise using data_allocate_cluster for automatically creating a well-formatted dataset from a trained MagmaClust model.

prior_mean

(Optional) A list providing, for each cluster, a tibble containing prior mean parameters of the prediction. This argument typically comes from the hyperpost$mean element, available through the train_magmaclust and pred_magmaclust functions.

y_grid

A vector, indicating the grid of values on the y-axis for which probabilities should be computed for heatmaps of 1-dimensional predictions. If NULL (default), a vector of length 50 is defined, ranging between the min and max 'Output' values contained in pred.

heatmap

A logical value indicating whether the GP mixture should be represented as a heatmap of probabilities for 1-dimensional inputs. If FALSE (default), the mean curve (and associated Credible Interval if available) is displayed.

samples

A logical value indicating whether the GP mixture should be represented as a collection of samples drawn from the posterior. If FALSE (default), the mean curve (and associated Credible Interval if available) is displayed.

nb_samples

A number, indicating the number of samples to be drawn from the predictive posterior distribution. For two-dimensional graphs, only one sample can be displayed.

plot_mean

A logical value, indicating whether the mean prediction should be displayed on the graph when samples = TRUE.

alpha_samples

A number, controlling transparency of the sample curves.

prob_CI

A number between 0 and 1 (default is 0.95), indicating the level of the Credible Interval associated with the posterior mean curve. If this argument is set to 1, the Credible Interval is not displayed.

size_data

A number, controlling the size of the data points.

size_data_train

A number, controlling the size of the data_train points.

alpha_data_train

A number, between 0 and 1, controlling the transparency of the data_train points.

Value

Visualisation of a MagmaClust prediction (optional: display data points, training data points and the prior mean functions). For 1-D inputs, the prediction is represented as a mean curve (and its associated 95% Credible Interval for cluster-specific predictions), or as a heatmap of probabilities if heatmap = TRUE. In the case of MagmaClust, the heatmap representation should be preferred for clarity, although the default display remains the mean curve for quicker execution. For 2-D inputs, the prediction is represented as a heatmap, where each pair of inputs on the x-axis and y-axis is associated with a gradient of colours for the posterior mean values, whereas the uncertainty is indicated by the transparency (the narrower the Credible Interval, the more opaque the associated colour, and vice versa). As for 1-D inputs, Credible Interval information is only available for cluster-specific predictions.

Examples

TRUE

Display realisations from a (mixture of) GP prediction

Description

Display samples drawn from the posterior of a GP, Magma or MagmaClust prediction. According to the dimension of the inputs, the graph may represent curves or a heatmap.

Usage

plot_samples(
  pred = NULL,
  samples = NULL,
  nb_samples = 50,
  x_input = NULL,
  plot_mean = TRUE,
  alpha_samples = 0.3
)

Arguments

pred

A list, typically coming from the pred_gp, pred_magma or pred_magmaclust functions, using the argument 'get_full_cov = TRUE'. Required elements: pred, cov. This argument is needed if samples is missing.

samples

A tibble or data frame, containing the samples generated from a GP, Magma, or MagmaClust prediction. Required columns: Input, Sample, Output. This argument is needed if pred is missing.

nb_samples

A number, indicating the number of samples to be drawn from the predictive posterior distribution. For two-dimensional graphs, only one sample can be displayed.

x_input

A vector of character strings, indicating which 'column' should be displayed in the case of multidimensional inputs. If NULL (default), the 'Input' column is used for the x-axis. If providing a 2-dimensional vector, the corresponding columns are used for the x-axis and the y-axis.

plot_mean

A logical value, indicating whether the mean prediction should be displayed on the graph.

alpha_samples

A number, controlling transparency of the sample curves.

Value

Graph of samples drawn from a posterior distribution of a GP, Magma, or MagmaClust prediction.

Examples

TRUE
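Samples with the required columns Input, Sample and Output can be generated from a predictive mean and a full covariance matrix, as in this base R sketch. The helper `draw_samples` is hypothetical and is only meant to show the expected layout of a samples table; it is not the sampling routine used by the package.

```r
# Hypothetical helper: draw nb_samples curves from N(mean, cov) and return them
# in long format with the columns Input, Sample and Output.
draw_samples <- function(input, mean, cov, nb_samples = 50, pen_diag = 1e-10) {
  n <- length(mean)
  U <- chol(cov + diag(pen_diag, n))          # cov = t(U) %*% U
  z <- matrix(rnorm(n * nb_samples), nrow = n)
  out <- mean + t(U) %*% z                    # each column is one draw
  data.frame(
    Input = rep(input, times = nb_samples),
    Sample = rep(seq_len(nb_samples), each = n),
    Output = as.vector(out)
  )
}

set.seed(42)
s <- draw_samples(
  input = c(0, 1, 2),
  mean = c(0, 0, 0),
  cov = diag(3),
  nb_samples = 4
)
```

Each value of 'Sample' identifies one curve, evaluated on every reference input.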

Magma prediction for plotting GIFs

Description

Generate a Magma or classic GP prediction under a format that is compatible with a further GIF visualisation of the results. For a Magma prediction, either the trained_model or hyperpost argument is required. Otherwise, a classic GP prediction is applied and the prior mean can be specified through the mean argument.

Usage

pred_gif(
  data,
  trained_model = NULL,
  grid_inputs = NULL,
  hyperpost = NULL,
  mean = NULL,
  hp = NULL,
  kern = "SE",
  pen_diag = 1e-10
)

Arguments

data

A tibble or data frame. Required columns: 'Input', 'Output'. Additional columns for covariates can be specified. The 'Input' column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The 'Output' column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference 'Input'.

trained_model

A list, containing the information coming from a Magma model, previously trained using the train_magma function.

grid_inputs

The grid of inputs (reference Input and covariates) values on which the GP should be evaluated. Ideally, this argument should be a tibble or a data frame, providing the same columns as data, except 'Output'. Nonetheless, in cases where data provides only one 'Input' column, the grid_inputs argument can be NULL (default) or a vector. This vector would be used as reference input for prediction and if NULL, a vector of length 500 is defined, ranging between the min and max Input values of data.

hyperpost

A list, containing the elements 'mean' and 'cov', the parameters of the hyper-posterior distribution of the mean process. Typically, this argument should come from a previous learning using train_magma, or a previous prediction with pred_magma, with the argument get_hyperpost set to TRUE. The 'mean' element should be a data frame with two columns 'Input' and 'Output'. The 'cov' element should be a covariance matrix with colnames and rownames corresponding to the 'Input' in 'mean'. In all cases, the column 'Input' should contain all the values appearing both in the 'Input' column of data and in grid_inputs.

mean

Mean parameter of the GP. This argument can be specified under various formats, such as:

  • NULL (default). The mean would be set to 0 everywhere.

  • A number. The mean would be a constant function.

  • A function. This function is defined as the mean.

  • A tibble or data frame. Required columns: Input, Output. The Input values should include at least the same values as in the data argument.

hp

A named vector, tibble or data frame of hyper-parameters associated with kern. The columns/elements should be named according to the hyper-parameters that are used in kern. The function train_gp can be used to learn maximum-likelihood estimators of the hyper-parameters.

kern

A kernel function, defining the covariance structure of the GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. For combining kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' must always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A tibble, representing Magma or GP predictions as two columns 'Mean' and 'Var', evaluated on the grid_inputs. The column 'Input' and additional covariate columns are associated with each predicted value. An additional 'Index' column is created for the sake of GIF creation using the function plot_gif.

Examples

TRUE

Gaussian Process prediction

Description

Compute the posterior distribution of a standard GP, using the formalism of Magma. By providing observed data, the prior mean and covariance matrix (by defining a kernel and its associated hyper-parameters), the mean and covariance parameters of the posterior distribution are computed on the grid of inputs that has been specified. This predictive distribution can be evaluated on any arbitrary inputs since a GP is an infinite-dimensional object.
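The standard GP posterior computation described above can be sketched in base R. The function `gp_posterior`, the kernel `se` and the explicit `noise` argument are hypothetical and written for this illustration, with a constant prior mean and 1-dimensional inputs; they do not reproduce the package's interface or internals.

```r
# Hypothetical helper: posterior mean and covariance of a GP on new inputs,
# given noisy observations, a constant prior mean and a kernel function.
gp_posterior <- function(x, y, x_new, kern, prior_mean = 0, noise = 0.01) {
  K <- outer(x, x, kern) + diag(noise, length(x))   # covariance of the data
  K_star <- outer(x_new, x, kern)                   # grid vs data
  K_new <- outer(x_new, x_new, kern)                # grid vs grid
  alpha <- solve(K, y - prior_mean)
  list(
    mean = prior_mean + as.vector(K_star %*% alpha),
    cov = K_new - K_star %*% solve(K, t(K_star))
  )
}

# A unit-variance, unit-lengthscale SE kernel, chosen for illustration.
se <- function(a, b) exp(-0.5 * (a - b)^2)

post <- gp_posterior(
  x = c(0, 1, 2),
  y = c(0.1, 0.9, 0.2),
  x_new = c(0.5, 1, 3),
  kern = se
)
```

The posterior variance is smallest at grid points that coincide with observed inputs and grows towards the prior variance away from the data.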

Usage

pred_gp(
  data = NULL,
  grid_inputs = NULL,
  mean = NULL,
  hp = NULL,
  kern = "SE",
  get_full_cov = FALSE,
  plot = TRUE,
  pen_diag = 1e-10
)

Arguments

data

A tibble or data frame. Required columns: 'Input', 'Output'. Additional columns for covariates can be specified. The 'Input' column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The 'Output' column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference 'Input'. If NULL, the prior GP is returned.

grid_inputs

The grid of inputs (reference Input and covariates) valueson which the GP should be evaluated. Ideally, this argument should be atibble or a data frame, providing the same columns asdata, except'Output'. Nonetheless, in cases wheredata provides only one'Input' column, thegrid_inputs argument can be NULL (default) or avector. This vector would be used as reference input for prediction and ifNULL, a vector of length 500 is defined, ranging between the min and maxInput values ofdata.

mean

Mean parameter of the GP. This argument can be specified undervarious formats, such as:

  • NULL (default). The mean would be set to 0 everywhere.

  • A number. The mean would be a constant function.

  • A tibble or data frame. Required columns: Input, Output. The Inputvalues should include at least the same values as in thedataargument.

hp

A named vector, tibble or data frame of hyper-parametersassociated withkern. The columns/elements should be namedaccording to the hyper-parameters that are used inkern. If NULL(default), the functiontrain_gp is called with randominitial values for learning maximum-likelihood estimators of thehyper-parameters associated withkern.

kern

A kernel function, defining the covariance structure of the GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, provide a formula as a character string in which elements are separated by whitespace (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' must always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).
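To see why the ordering rule matters, the sketch below evaluates a kernel formula strictly from left to right, ignoring the usual precedence of '*' over '+'. It is only an illustration of what sequential treatment implies: `eval_left_to_right` is a hypothetical helper, the kernel values are made-up numbers for a single pair of inputs, and the package's actual parsing may differ.

```r
# Hypothetical helper: combine kernel values token by token, left to right.
eval_left_to_right <- function(formula, values) {
  tokens <- strsplit(formula, " +")[[1]]   # split on whitespace
  result <- values[[tokens[1]]]
  i <- 2
  while (i < length(tokens)) {
    term <- values[[tokens[i + 1]]]
    result <- if (tokens[i] == "+") result + term else result * term
    i <- i + 2
  }
  result
}

# Illustrative kernel evaluations for one pair of inputs (made-up values).
k <- list(SE = 2, LIN = 3, RQ = 5)

valid <- eval_left_to_right("SE * LIN + RQ", k)    # (SE * LIN) + RQ = 11
invalid <- eval_left_to_right("RQ + SE * LIN", k)  # (RQ + SE) * LIN = 21
c(valid = valid, invalid = invalid)
```

Under this reading, 'RQ + SE * LIN' silently becomes (RQ + SE) * LIN rather than RQ + (SE * LIN), which is why products should be written first.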

get_full_cov

A logical value, indicating whether the full posterior covariance matrix should be returned.

plot

A logical value, indicating whether a plot of the results is automatically displayed.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.
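The effect of the jitter term can be illustrated with a deliberately singular matrix. This is a base R sketch, independent of the package; the helper `fails` is hypothetical and simply reports whether the Cholesky factorisation throws an error.

```r
# A rank-one matrix: perfectly correlated inputs give a singular covariance.
K <- matrix(1, nrow = 3, ncol = 3)

# Hypothetical helper: TRUE when the Cholesky factorisation fails.
fails <- function(m) inherits(try(chol(m), silent = TRUE), "try-error")

fails(K)                     # factorisation of the singular matrix
fails(K + diag(1e-10, 3))    # same matrix with a small jitter on the diagonal
```

A jitter of the default magnitude (1e-10) is negligible relative to the entries of the matrix, yet it is enough to make the matrix numerically positive definite.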

Value

A tibble, representing the GP predictions as two columns, 'Mean' and 'Var', evaluated on the grid_inputs. The 'Input' column and any additional covariate columns are associated with each predicted value. If the get_full_cov argument is TRUE, the function returns a list, in which the tibble described above is defined as 'pred' and the full posterior covariance matrix is defined as 'cov'.

Examples

TRUE

Magma prediction

Description

Compute the posterior predictive distribution in Magma. Providing data from any new individual/task, its trained hyper-parameters and a previously trained Magma model, the predictive distribution is evaluated on any arbitrary inputs that are specified through the 'grid_inputs' argument.

Usage

pred_magma(
  data = NULL,
  trained_model = NULL,
  grid_inputs = NULL,
  hp = NULL,
  kern = "SE",
  hyperpost = NULL,
  get_hyperpost = FALSE,
  get_full_cov = FALSE,
  plot = TRUE,
  pen_diag = 1e-10
)

Arguments

data

A tibble or data frame. Required columns: 'Input', 'Output'. Additional columns for covariates can be specified. The 'Input' column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The 'Output' column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference 'Input'. If NULL, the mean process from trained_model is returned as a generic prediction.

trained_model

A list, containing the information coming from a Magma model, previously trained using the train_magma function.

grid_inputs

The grid of input values (reference Input and covariates) on which the GP should be evaluated. Ideally, this argument should be a tibble or a data frame, providing the same columns as data, except 'Output'. Nonetheless, in cases where data provides only one 'Input' column, the grid_inputs argument can be NULL (default) or a vector. This vector would be used as reference input for prediction and, if NULL, a vector of length 500 is defined, ranging between the min and max Input values of data.

hp

A named vector, tibble or data frame of hyper-parameters associated with kern. The columns/elements should be named according to the hyper-parameters that are used in kern. The function train_gp can be used to learn maximum-likelihood estimators of the hyper-parameters.

kern

A kernel function, defining the covariance structure of the GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, provide a formula as a character string in which elements are separated by whitespace (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' must always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

hyperpost

A list, containing the elements 'mean' and 'cov', the parameters of the hyper-posterior distribution of the mean process. Typically, this argument should come from a previous learning using train_magma, or a previous prediction with pred_magma, with the argument get_hyperpost set to TRUE. The 'mean' element should be a data frame with two columns 'Input' and 'Output'. The 'cov' element should be a covariance matrix with colnames and rownames corresponding to the 'Input' in 'mean'. In all cases, the column 'Input' should contain all the values appearing both in the 'Input' column of data and in grid_inputs.

get_hyperpost

A logical value, indicating whether the hyper-posterior distribution of the mean process should be returned. This can be useful when planning to perform several predictions on the same grid of inputs, since recomputation of the hyper-posterior can be prohibitive for high dimensional grids.

get_full_cov

A logical value, indicating whether the full posterior covariance matrix should be returned.

plot

A logical value, indicating whether a plot of the results is automatically displayed.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A tibble, representing Magma predictions as two columns, 'Mean' and 'Var', evaluated on the grid_inputs. The 'Input' column and any additional covariate columns are associated with each predicted value. If the get_full_cov or get_hyperpost arguments are TRUE, the function returns a list, in which the tibble described above is defined as 'pred_gp', the full posterior covariance matrix is defined as 'cov', and the hyper-posterior distribution of the mean process is defined as 'hyperpost'.

Examples

TRUE

MagmaClust prediction

Description

Compute the posterior predictive distribution in MagmaClust. Providing data from any new individual/task, its trained hyper-parameters and a previously trained MagmaClust model, the multi-task posterior distribution is evaluated on any arbitrary inputs that are specified through the 'grid_inputs' argument. Due to the nature of the model, the prediction is defined as a mixture of Gaussian distributions. Therefore the present function computes the parameters of the predictive distribution associated with each cluster, as well as the posterior mixture probabilities for this new individual/task.
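Since the prediction is a mixture of Gaussian distributions, its overall mean and variance at each input follow from the standard moment formulas for mixtures. The base R sketch below uses made-up per-cluster means, variances and mixture probabilities (cluster names 'K1' and 'K2' are assumptions) and is not taken from the package.

```r
# Hypothetical per-cluster predictions at three inputs (one column per cluster).
mean_k <- cbind(K1 = c(0, 1, 2), K2 = c(4, 5, 6))
var_k  <- cbind(K1 = c(1, 1, 1), K2 = c(0.5, 0.5, 0.5))

# Hypothetical posterior mixture probabilities for the new individual/task.
mixture <- c(K1 = 0.25, K2 = 0.75)

# Mixture mean: probability-weighted average of the cluster means.
mix_mean <- as.vector(mean_k %*% mixture)

# Mixture variance: E[X^2] - E[X]^2, with E[X^2] = sum_k tau_k (var_k + mean_k^2).
mix_var <- as.vector((var_k + mean_k^2) %*% mixture) - mix_mean^2

data.frame(Mean = mix_mean, Var = mix_var)
```

The mixture variance exceeds the weighted average of the cluster variances whenever the cluster means differ, which reflects the extra uncertainty about cluster membership.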

Usage

pred_magmaclust(
  data = NULL,
  trained_model = NULL,
  grid_inputs = NULL,
  mixture = NULL,
  hp = NULL,
  kern = "SE",
  hyperpost = NULL,
  prop_mixture = NULL,
  get_hyperpost = FALSE,
  get_full_cov = TRUE,
  plot = TRUE,
  pen_diag = 1e-10
)

Arguments

data

A tibble or data frame. Required columns: Input, Output. Additional columns for covariates can be specified. The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference 'Input'. If NULL, the mixture of mean processes from trained_model is returned as a generic prediction.

trained_model

A list, containing the information coming from a MagmaClust model, previously trained using the train_magmaclust function. If trained_model is set to NULL, the hyperpost and prop_mixture arguments are mandatory to perform required re-computations for the prediction to succeed.

grid_inputs

The grid of input values (reference Input and covariates) on which the GP should be evaluated. Ideally, this argument should be a tibble or a data frame, providing the same columns as data, except 'Output'. Nonetheless, in cases where data provides only one 'Input' column, the grid_inputs argument can be NULL (default) or a vector. This vector would be used as reference input for prediction and, if NULL, a vector of length 500 is defined, ranging between the min and max Input values of data.

mixture

A tibble or data frame, indicating the mixture probabilities of each cluster for the new individual/task. If NULL, the train_gp_clust function is used to compute these posterior probabilities according to data.

hp

A named vector, tibble or data frame of hyper-parameters associated with kern. The columns/elements should be named according to the hyper-parameters that are used in kern. The train_gp_clust function can be used to learn maximum-likelihood estimators of the hyper-parameters.

kern

A kernel function, defining the covariance structure of the GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, provide a formula as a character string in which elements are separated by whitespace (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' must always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

hyperpost

A list, containing the elements mean, cov and mixture, the parameters of the hyper-posterior distributions of the mean processes. Typically, this argument should come from a previous learning using train_magmaclust, or a previous prediction with pred_magmaclust, with the argument get_hyperpost set to TRUE.

prop_mixture

A tibble or a named vector of the mixture proportions. Each name of column or element should refer to a cluster. The value associated with each cluster is a number between 0 and 1. If both mixture and trained_model are set to NULL, this argument allows the mixture probabilities to be recomputed, thanks to the hyperpost argument and the train_gp_clust function.

get_hyperpost

A logical value, indicating whether the hyper-posterior distributions of the mean processes should be returned. This can be useful when planning to perform several predictions on the same grid of inputs, since recomputation of the hyper-posterior can be prohibitive for high dimensional grids.

get_full_cov

A logical value, indicating whether the full posterior covariance matrices should be returned.

plot

A logical value, indicating whether a plot of the results is automatically displayed.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A list of GP prediction results composed of:

Examples

TRUE

Indicates the most probable cluster

Description

Indicates the most probable cluster

Usage

proba_max_cluster(mixture)

Arguments

mixture

A tibble or data frame containing mixture probabilities.

Value

A tibble, retaining only the most probable cluster. The column Cluster indicates the cluster's name whereas Proba refers to its associated probability. If ID is initially a column of mixture (optional), the function returns the most probable cluster for all the different ID values.
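The selection described above can be sketched in base R. The layout of `mixture` used here (an ID column plus one probability column per cluster, named 'K1' to 'K3') is an assumption made for illustration; the sketch does not call proba_max_cluster itself.

```r
# Hypothetical mixture probabilities for two individuals and three clusters.
mixture <- data.frame(
  ID = c("a", "b"),
  K1 = c(0.7, 0.1),
  K2 = c(0.2, 0.3),
  K3 = c(0.1, 0.6)
)

# Keep only the probability columns and find the largest one in each row.
probs <- as.matrix(mixture[, setdiff(names(mixture), "ID")])
best <- max.col(probs, ties.method = "first")

res <- data.frame(
  ID = mixture$ID,
  Cluster = colnames(probs)[best],
  Proba = probs[cbind(seq_len(nrow(probs)), best)]
)
res
```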

Examples

TRUE

Regularise a grid of inputs in a dataset

Description

Modify the original grid of inputs to make it more 'regular' (in the sense that the interval between each observation is constant, or corresponds to a specific pattern defined by the user). In particular, this function can also be used to summarise several data points into one, at a specific location. In this case, the output values are averaged according to the 'summarise_fct' argument.
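The idea can be sketched in base R for a single input column: each input is snapped to the closest point of an equispaced grid, and outputs that land on the same point are summarised. This is an illustrative sketch with made-up data and a hypothetical helper `snap`, not the package's implementation.

```r
data <- data.frame(
  ID = 1,
  Input = c(0, 0.4, 2.2, 2.9, 5),
  Output = c(1, 3, 10, 20, 7)
)

# Regular grid with `size_grid` equispaced points between min and max inputs.
size_grid <- 3
grid <- seq(min(data$Input), max(data$Input), length.out = size_grid)

# Hypothetical helper: closest grid point to a given input value.
snap <- function(x) grid[which.min(abs(grid - x))]
data$Input <- vapply(data$Input, snap, numeric(1))

# Summarise the outputs sharing the same (ID, Input) location with the mean.
regular <- aggregate(Output ~ ID + Input, data = data, FUN = mean)
regular
```

Replacing `FUN = mean` with another summary (for instance `median`) corresponds to changing the 'summarise_fct' argument.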

Usage

regularize_data(
  data,
  size_grid = 30,
  grid_inputs = NULL,
  summarise_fct = base::mean
)

regularise_data(
  data,
  size_grid = 30,
  grid_inputs = NULL,
  summarise_fct = base::mean
)

Arguments

data

A tibble or data frame. Required columns: ID, Input, Output. The ID column contains the unique names/codes used to identify each individual/task (or batch of data). The Input column corresponds to observed locations (an explanatory variable). The Output column specifies the associated observed values (the response variable). The data frame can also provide as many additional inputs as desired, with no constraints on the column names.

size_grid

An integer, which indicates the number of equispaced points each column must contain. Each original input value will be collapsed to the closest point of the new regular grid, and the associated outputs are averaged using the 'summarise_fct' function. This argument is used when 'grid_inputs' is left to 'NULL'. Default value is 30.

grid_inputs

A data frame, corresponding to a pre-defined grid of inputs according to which we want to regularise a dataset. Column names must be similar to those appearing in data. If NULL (default), a default grid of inputs is defined: for each input column in data, a regular sequence is created from the min to the max values, with a number of equispaced points being equal to the 'size_grid' argument.

summarise_fct

A character string or a function. If several similar inputs are associated with different outputs, the user can choose the summarising function for the output among the following: min, max, mean, median. A custom function can be defined if necessary. Default is "mean".

Value

A data frame, where input columns have been regularised as desired.

Examples

data = tibble::tibble(ID = 1, Input = 0:100, Output = -50:50)

## Define a 1D input grid of 10 points
regularize_data(data, size_grid = 10)

## Define a 1D custom grid
my_grid = tibble::tibble(Input = c(5, 10, 25, 50, 100))
regularize_data(data, grid_inputs = my_grid)

## Define a 2D input grid of 5x5 points
data_2D = cbind(ID = 1, expand.grid(Input=1:10, Input2=1:10), Output = 1:100)
regularize_data(data_2D, size_grid = 5)

## Define a 2D custom input grid
my_grid_2D = MagmaClustR::expand_grid_inputs(c(2, 4, 8), 'Input2' = c(3, 5))
regularize_data(data_2D, grid_inputs = my_grid_2D)

Rational Quadratic Kernel

Description

Rational Quadratic Kernel

Usage

rq_kernel(x, y, hp, deriv = NULL, vectorized = FALSE)

Arguments

x

A vector (or matrix if vectorized = T) of inputs.

y

A vector (or matrix if vectorized = T) of inputs.

hp

A tibble, data frame or named vector, containing the kernel's hyperparameters. Required columns: 'rq_variance', 'rq_lengthscale', and 'rq_scale'.

deriv

A character, indicating according to which hyper-parameter the derivative should be computed. If NULL (default), the function simply returns the evaluation of the kernel.

vectorized

A logical value, indicating whether the function provides a vectorized version for faster calculations. If TRUE, the x and y arguments should be the vector or matrix containing all inputs for which the kernel is evaluated on all pairs of elements. If FALSE, the x and y arguments are simply two inputs.

Value

A scalar, corresponding to the evaluation of the kernel.
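For reference, a textbook form of the Rational Quadratic kernel can be written in a few lines of base R. The parametrisation below (variance, lengthscale and scale on their natural scale) is an assumption; this manual does not state how rq_kernel maps 'rq_variance', 'rq_lengthscale' and 'rq_scale' to the formula, so values may not match the package's output. The function name `rq_cov` is hypothetical.

```r
# Textbook Rational Quadratic kernel:
#   k(x, y) = variance * (1 + (x - y)^2 / (2 * scale * lengthscale^2))^(-scale)
rq_cov <- function(x, y, variance = 1, lengthscale = 1, scale = 1) {
  d2 <- outer(x, y, "-")^2   # squared distances between all pairs
  variance * (1 + d2 / (2 * scale * lengthscale^2))^(-scale)
}

rq_cov(0, 1)                     # a single pair of inputs
rq_cov(c(0, 1, 2), c(0, 1, 2))   # all pairs at once, returned as a matrix
```

Evaluating all pairs in one call, as in the second line, is the kind of computation the vectorized argument refers to.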

Examples

TRUE

Draw samples from a posterior GP/Magma distribution

Description

Draw samples from a posterior GP/Magma distribution

Usage

sample_gp(pred_gp, nb_samples = 50)

sample_magma(pred_gp, nb_samples = 50)

Arguments

pred_gp

A list, typically coming from pred_magma or pred_gp functions, with argument 'get_full_cov = TRUE'. Required elements: pred, cov.

nb_samples

A number, indicating the number of samples to be drawn from the predictive posterior distribution. For two-dimensional graphs, only one sample can be displayed.

Value

A tibble or data frame, containing the samples generated from a GP prediction. Format: Input, Sample, Output.
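Drawing such samples amounts to sampling from a multivariate Gaussian defined by the predicted mean and full covariance. The base R sketch below builds a made-up prediction (`pred` and `cov` are illustrative stand-ins for the elements listed above) and uses a Cholesky factor; it is not the package's sampling code.

```r
set.seed(1)

# Made-up prediction: mean on a grid and a full covariance matrix.
grid <- seq(0, 1, length.out = 20)
pred <- data.frame(Input = grid, Mean = sin(2 * pi * grid))
cov <- 0.1 * exp(-0.5 * outer(grid, grid, "-")^2 / 0.2^2) + diag(1e-6, 20)

nb_samples <- 5
L <- t(chol(cov))   # lower-triangular factor, cov = L %*% t(L)
z <- matrix(rnorm(length(grid) * nb_samples), ncol = nb_samples)
draws <- pred$Mean + L %*% z   # one column per sample

# Long format: one row per (Input, Sample) pair.
samples <- data.frame(
  Input = rep(grid, times = nb_samples),
  Sample = rep(seq_len(nb_samples), each = length(grid)),
  Output = as.vector(draws)
)
head(samples)
```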

Examples

TRUE

Draw samples from a MagmaClust posterior distribution

Description

Draw samples from a MagmaClust posterior distribution

Usage

sample_magmaclust(pred_clust, nb_samples = 50)

Arguments

pred_clust

A list, typically coming from pred_magmaclust, with argument 'get_full_cov = TRUE'. Required elements: pred, cov, mixture.

nb_samples

A number, indicating the number of samples to be drawn from the predictive posterior distribution. For two-dimensional graphs, only one sample can be displayed.

Value

A tibble or data frame, containing the samples generated from a GP prediction. Format: Cluster, Proba, Input, Sample, Output.

Examples

TRUE

Squared Exponential Kernel

Description

Squared Exponential Kernel

Usage

se_kernel(x, y, hp, deriv = NULL, vectorized = FALSE)

Arguments

x

A vector (or matrix if vectorized = T) of inputs.

y

A vector (or matrix if vectorized = T) of inputs.

hp

A tibble, data frame or named vector, containing the kernel's hyperparameters. Required columns: 'se_variance', 'se_lengthscale'.

deriv

A character, indicating according to which hyper-parameter the derivative should be computed. If NULL (default), the function simply returns the evaluation of the kernel.

vectorized

A logical value, indicating whether the function provides a vectorized version for faster calculations. If TRUE, the x and y arguments should be the vector or matrix containing all inputs for which the kernel is evaluated on all pairs of elements. If FALSE, the x and y arguments are simply two inputs.

Value

A scalar, corresponding to the evaluation of the kernel.
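The role of the deriv argument can be illustrated with a textbook squared exponential kernel and its partial derivatives. The sketch below is an assumption-laden illustration: `se_cov` is a hypothetical function using hyper-parameters on their natural scale, which may differ from how se_kernel interprets 'se_variance' and 'se_lengthscale'.

```r
# Textbook SE kernel for two scalar inputs, with optional partial derivatives.
se_cov <- function(x, y, variance, lengthscale, deriv = NULL) {
  d2 <- (x - y)^2
  k <- variance * exp(-0.5 * d2 / lengthscale^2)
  if (is.null(deriv)) return(k)
  switch(deriv,
    variance = k / variance,               # dk / d(variance)
    lengthscale = k * d2 / lengthscale^3,  # dk / d(lengthscale)
    stop("unknown hyper-parameter: ", deriv)
  )
}

x <- 0.3; y <- 1.1; v <- 2; l <- 0.7

# Check the analytical derivative against a central finite difference.
eps <- 1e-6
num_dl <- (se_cov(x, y, v, l + eps) - se_cov(x, y, v, l - eps)) / (2 * eps)
c(analytical = se_cov(x, y, v, l, deriv = "lengthscale"), numerical = num_dl)
```

Analytical derivatives of this kind are what gradient-based optimisers rely on when learning hyper-parameters.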

Examples

TRUE

Select the optimal number of clusters

Description

In MagmaClust, as for any clustering method, the number K of clusters has to be provided as a hypothesis of the model. This function implements a model selection procedure, by maximising a variational BIC criterion, computed for different values of K. A heuristic for a fast approximation of the procedure is proposed as well, although the corresponding models would not be properly trained.

Usage

select_nb_cluster(
  data,
  fast_approx = TRUE,
  grid_nb_cluster = 1:10,
  ini_hp_k = NULL,
  ini_hp_i = NULL,
  kern_k = "SE",
  kern_i = "SE",
  plot = TRUE,
  ...
)

Arguments

data

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified. The ID column contains the unique names/codes used to identify each individual/task (or batch of data). The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference Input.

fast_approx

A boolean, indicating whether a fast approximation should be used for selecting the number of clusters. If TRUE, each Magma or MagmaClust model will perform only one E-step of the training, using the same fixed values for the hyper-parameters (ini_hp_k and ini_hp_i, or random values if not provided) in all models. The resulting models should not be considered as trained, but this approach provides a convenient heuristic to avoid a cumbersome model selection procedure.

grid_nb_cluster

A vector of integers, corresponding to the grid of values that will be tested for the number of clusters.

ini_hp_k

A tibble or data frame of hyper-parameters associated with kern_k. The hp function can be used to draw custom hyper-parameters with the correct format.

ini_hp_i

A tibble or data frame of hyper-parameters associated with kern_i. The hp function can be used to draw custom hyper-parameters with the correct format.

kern_k

A kernel function associated with the mean processes.

kern_i

A kernel function associated with the individuals/tasks.

plot

A boolean indicating whether the plot of V-BIC values for all numbers of clusters should be displayed.

...

Any additional argument that could be passed to train_magmaclust.

Value

A list, containing the results of the model selection procedure for selecting the optimal number of clusters by maximising a V-BIC criterion. The elements of the list are:

Examples

TRUE

Simulate a dataset tailored for MagmaClustR

Description

Simulate a complete training dataset, which may be representative of various applications. Several flexible arguments allow adjustment of the number of individuals, of observed inputs, and the values of many parameters controlling the data generation.

Usage

simu_db(
  M = 10,
  N = 10,
  K = 1,
  covariate = FALSE,
  grid = seq(0, 10, 0.05),
  grid_cov = seq(0, 10, 0.5),
  common_input = TRUE,
  common_hp = TRUE,
  add_hp = FALSE,
  add_clust = FALSE,
  int_mu_v = c(4, 5),
  int_mu_l = c(0, 1),
  int_i_v = c(1, 2),
  int_i_l = c(0, 1),
  int_i_sigma = c(0, 0.2),
  lambda_int = c(30, 40),
  m_int = c(0, 10),
  lengthscale_int = c(30, 40),
  m0_slope = c(-5, 5),
  m0_intercept = c(-50, 50)
)

Arguments

M

An integer. The number of individuals per cluster.

N

An integer. The number of observations per individual.

K

An integer. The number of underlying clusters.

covariate

A logical value indicating whether the dataset should include an additional input covariate named 'Covariate'.

grid

A vector of numbers defining a grid of observations (i.e. the reference inputs).

grid_cov

A vector of numbers defining a grid of observations (i.e. the covariate reference inputs).

common_input

A logical value indicating whether the reference inputs are common to all individuals.

common_hp

A logical value indicating whether the hyper-parameters are common to all individuals. If TRUE and K>1, the hyper-parameters remain different between the clusters.

add_hp

A logical value indicating whether the values of hyper-parameters should be added as columns in the dataset.

add_clust

A logical value indicating whether the name of the clusters should be added as a column in the dataset.

int_mu_v

A vector of 2 numbers, defining an interval of admissible values for the variance hyper-parameter of the mean process' kernel.

int_mu_l

A vector of 2 numbers, defining an interval of admissible values for the lengthscale hyper-parameter of the mean process' kernel.

int_i_v

A vector of 2 numbers, defining an interval of admissible values for the variance hyper-parameter of the individual process' kernel.

int_i_l

A vector of 2 numbers, defining an interval of admissible values for the lengthscale hyper-parameter of the individual process' kernel.

int_i_sigma

A vector of 2 numbers, defining an interval of admissible values for the noise hyper-parameter.

lambda_int

A vector of 2 numbers, defining an interval of admissible values for the lambda parameter of the 2D exponential.

m_int

A vector of 2 numbers, defining an interval of admissible values for the mean of the 2D exponential.

lengthscale_int

A vector of 2 numbers, defining an interval of admissible values for the lengthscale parameter of the 2D exponential.

m0_slope

A vector of 2 numbers, defining an interval of admissible values for the slope of m0.

m0_intercept

A vector of 2 numbers, defining an interval of admissible values for the intercept of m0.

Value

A full dataset of simulated training data.

Examples

## Generate a dataset with 3 clusters of 4 individuals, observed at 10 inputs
data = simu_db(M = 4, N = 10, K = 3)

## Generate a 2-D dataset with an additional input 'Covariate'
data = simu_db(covariate = TRUE)

## Generate a dataset where input locations are different among individuals
data = simu_db(common_input = FALSE)

## Generate a dataset with an additional column indicating the true clusters
data = simu_db(K = 3, add_clust = TRUE)

Simulate a batch of data

Description

Simulate a batch of output data, corresponding to one individual, coming from a GP with the Squared Exponential kernel as covariance structure, and specified hyper-parameters and input.
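A simulation of this kind can be sketched in base R as a single draw from a multivariate Gaussian. The function `simulate_individual` is hypothetical, and its interpretation of the hyper-parameters (natural scale for `v` and `l`, `sigma` as a noise standard deviation) is an assumption that may differ from simu_indiv_se.

```r
# Hypothetical sketch: one GP draw with an SE covariance plus noise.
simulate_individual <- function(id, input, mean, v, l, sigma) {
  K <- v * exp(-0.5 * outer(input, input, "-")^2 / l^2)
  K <- K + diag(sigma^2 + 1e-10, length(input))   # noise + small jitter
  output <- mean + as.vector(t(chol(K)) %*% rnorm(length(input)))
  data.frame(ID = id, Input = input, Output = output)
}

set.seed(42)
input <- seq(0, 10, 0.5)
indiv <- simulate_individual("1", input, mean = rep(0, length(input)),
                             v = 1, l = 1, sigma = 0.2)
head(indiv)
```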

Usage

simu_indiv_se(ID, input, mean, v, l, sigma)

Arguments

ID

An identification code, whether numeric or character.

input

A vector of numbers. The input variable that is used as 'reference' for input and outputs.

mean

A vector of numbers. Prior mean values of the GP.

v

A number. The variance hyper-parameter of the SE kernel.

l

A number. The lengthscale hyper-parameter of the SE kernel.

sigma

A number. The noise hyper-parameter.

Value

A tibble containing a batch of output data along with input and additional information for a simulated individual.

Examples

TRUE

Compute a mixture of Gaussian log-likelihoods

Description

During the prediction step of MagmaClust, an EM algorithm is used to compute the maximum likelihood estimator of the hyper-parameters along with mixture probabilities for the new individual/task. This function implements the quantity that is maximised (i.e. a sum of Gaussian log-likelihoods, weighted by their mixture probabilities). It can also be used to monitor the EM algorithm when providing the 'prop_mixture' argument, for proper penalisation of the full log-likelihood.
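The quantity described above, a sum of Gaussian log-likelihoods weighted by mixture probabilities, can be sketched in base R. The helpers `log_dmvnorm` and `weighted_logL` are hypothetical, the per-cluster means and covariances are made-up, and the additional penalisation tied to 'prop_mixture' is deliberately left out because its exact form is not given here.

```r
# Log-density of a multivariate Gaussian, computed from a Cholesky factor.
log_dmvnorm <- function(y, mean, cov) {
  R <- chol(cov)                                   # cov = t(R) %*% R
  z <- backsolve(R, y - mean, transpose = TRUE)    # solves t(R) z = y - mean
  -0.5 * (length(y) * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2))
}

# Sum over clusters of mixture probability times Gaussian log-likelihood.
weighted_logL <- function(y, means, covs, mixture) {
  total <- 0
  for (k in names(mixture)) {
    total <- total + mixture[[k]] * log_dmvnorm(y, means[[k]], covs[[k]])
  }
  total
}

y <- c(1, 2)
means <- list(K1 = c(0, 0), K2 = c(1, 2))
covs <- list(K1 = diag(2), K2 = diag(2))
mixture <- c(K1 = 0.3, K2 = 0.7)

val <- weighted_logL(y, means, covs, mixture)
val
```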

Usage

sum_logL_GP_clust(
  hp,
  db,
  mixture,
  mean,
  kern,
  post_cov,
  prop_mixture = NULL,
  pen_diag
)

Arguments

hp

A tibble, data frame or named vector of hyper-parameters.

db

A tibble containing data we want to evaluate the logL on. Required columns: Input, Output. Additional covariate columns are allowed.

mixture

A tibble or data frame, indicating the mixture probabilities of each cluster for the new individual/task.

mean

A list of hyper-posterior mean parameters for all clusters.

kern

A kernel function.

post_cov

A list of hyper-posterior covariance parameters for all clusters.

prop_mixture

A tibble or a named vector. Each name of column or element should refer to a cluster. The value associated with each cluster is a number between 0 and 1, corresponding to the mixture proportions.

pen_diag

A jitter term that is added to the covariance matrix to avoid numerical issues when inverting, in cases of nearly singular matrices.

Value

A number, expectation of mixture of Gaussian log-likelihoods in the prediction step of MagmaClust. This quantity is supposed to increase at each step of the EM algorithm, and can be used for monitoring the procedure.

Examples

TRUE

French swimmers performances data on 100m freestyle events

Description

A subset of data from reported performances of French swimmers during 100m freestyle competitions between 2002 and 2016. See https://link.springer.com/article/10.1007/s10994-022-06172-1 and https://www.mdpi.com/2076-3417/8/10/1766 for dedicated description and analysis.

Usage

swimmers

Format

swimmers

A data frame with 76,832 rows and 4 columns:

ID

Identifying number associated with each swimmer

Input

Age in years

Output

Performance in seconds on a 100m freestyle event

Gender

Competition gender

Source

https://ffn.extranat.fr/webffn/competitions.php?idact=nat


Learning hyper-parameters of a Gaussian Process

Description

Learning hyper-parameters of any new individual/task in Magma is required in the prediction procedure. This function can also be used to learn hyper-parameters of a simple GP (just leave the hyperpost argument set to NULL, and use prior_mean instead). When used within Magma, by providing data for the new individual/task, the hyper-posterior mean and covariance parameters, and initialisation values for the hyper-parameters, the function computes maximum likelihood estimates of the hyper-parameters.
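For the simple GP case, maximum likelihood estimation of hyper-parameters can be sketched with base R's optim. This is an illustrative sketch rather than the procedure used by train_gp: the function `neg_log_lik` is hypothetical, the SE kernel uses a textbook parametrisation with a noise standard deviation, and the optimiser, bounds and simulated data are arbitrary choices.

```r
# Negative Gaussian log marginal likelihood for an SE kernel plus noise.
neg_log_lik <- function(log_hp, input, output, prior_mean = 0,
                        pen_diag = 1e-10) {
  hp <- exp(log_hp)   # optimise on the log scale to keep parameters positive
  K <- hp[1] * exp(-0.5 * outer(input, input, "-")^2 / hp[2]^2) +
    diag(hp[3]^2 + pen_diag, length(input))
  R <- chol(K)
  z <- backsolve(R, output - prior_mean, transpose = TRUE)
  0.5 * (length(output) * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2))
}

set.seed(1)
input <- seq(0, 10, length.out = 30)
output <- sin(input) + rnorm(30, sd = 0.1)

# Initial values, in the spirit of ini_hp (names are illustrative).
ini_hp <- c(variance = 1, lengthscale = 1, noise = 0.5)

fit <- optim(
  par = log(ini_hp), fn = neg_log_lik,
  input = input, output = output,
  method = "L-BFGS-B",
  lower = log(rep(1e-2, 3)), upper = log(rep(1e2, 3))
)
trained_hp <- exp(fit$par)
trained_hp
```

The box constraints keep the covariance matrix well conditioned during the search; without them an optimiser can wander into regions where the Cholesky factorisation fails.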

Usage

train_gp(
  data,
  prior_mean = NULL,
  ini_hp = NULL,
  kern = "SE",
  hyperpost = NULL,
  pen_diag = 1e-10
)

Arguments

data

A tibble or data frame. Required columns: Input, Output. Additional columns for covariates can be specified. The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference Input.

prior_mean

Mean parameter of the GP. This argument can be specified under various formats, such as:

  • NULL (default). The hyper-posterior mean would be set to 0 everywhere.

  • A number. The hyper-posterior mean would be a constant function.

  • A vector of the same length as all the distinct Input values in the data argument. This vector would be considered as the evaluation of the hyper-posterior mean function at the training Inputs.

  • A function. This function is defined as the hyper-posterior mean.

  • A tibble or data frame. Required columns: Input, Output. The Input values should include at least the same values as in the data argument.

ini_hp

A named vector, tibble or data frame of hyper-parameters associated with the kern of the new individual/task. The columns should be named according to the hyper-parameters that are used in kern. In cases where the model includes a noise term, ini_hp should contain an additional 'noise' column. If NULL (default), random values are used as initialisation. The hp function can be used to draw custom hyper-parameters with the correct format.

kern

A kernel function, defining the covariance structure of the GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. To combine kernels, provide a formula as a character string in which elements are separated by whitespace (e.g. "SE + PERIO"). As the elements are treated sequentially from left to right, the product operator '*' must always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

hyperpost

A list, containing the elements 'mean' and 'cov', the parameters of the hyper-posterior distribution of the mean process. Typically, this argument should come from a previous learning using train_magma, or from the hyperposterior function. If hyperpost is provided, the likelihood that is maximised is the one involved during Magma's prediction step, and the prior_mean argument is ignored. For classic GP training, leave hyperpost set to NULL.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A tibble, containing the trained hyper-parameters for the kernel of the new individual/task.

Examples

TRUE

Prediction in MagmaClust: learning new HPs and mixture probabilities

Description

Learning hyper-parameters and mixture probabilities of any new individual/task is required in MagmaClust in the prediction procedure. By providing data for the new individual/task, the hyper-posterior mean and covariance parameters, the mixture proportions, and initialisation values for the hyper-parameters, train_gp_clust uses an EM algorithm to compute maximum likelihood estimates of the hyper-parameters and hyper-posterior mixture probabilities of the new individual/task.

Usage

train_gp_clust(
  data,
  prop_mixture = NULL,
  ini_hp = NULL,
  kern = "SE",
  hyperpost = NULL,
  pen_diag = 1e-10,
  n_iter_max = 25,
  cv_threshold = 0.001
)

Arguments

data

A tibble or data frame. Required columns: Input, Output. Additional columns for covariates can be specified. The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference Input.

prop_mixture

A tibble or a named vector. Each name of column or element should refer to a cluster. The value associated with each cluster is a number between 0 and 1, corresponding to the mixture proportions.

ini_hp

A tibble or data frame of hyper-parameters associated with kern, the individual process kernel. The hp function can be used to draw custom hyper-parameters with the correct format.

kern

A kernel function, defining the covariance structure of the GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. For combining kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from the left to the right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

hyperpost

A list, containing the elements mean, cov and mixture, the parameters of the hyper-posterior distributions of the mean processes. Typically, this argument should come from a previous learning using train_magmaclust, or a previous prediction with pred_magmaclust, with the argument get_hyperpost set to TRUE.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

n_iter_max

A number, indicating the maximum number of iterations of the EM algorithm to proceed while not reaching convergence.

cv_threshold

A number, indicating the threshold of the likelihood gain under which the EM algorithm will stop.

Value

A list, containing the results of the EM algorithm used during the prediction step of MagmaClust. The elements of the list are:

Examples

TRUE

Training Magma with an EM algorithm

Description

The hyper-parameters and the hyper-posterior distribution involved in Magma can be learned thanks to an EM algorithm implemented in train_magma. By providing a dataset, the model hypotheses (hyper-prior mean parameter and covariance kernels) and initialisation values for the hyper-parameters, the function computes maximum likelihood estimates of the HPs as well as the mean and covariance parameters of the Gaussian hyper-posterior distribution of the mean process.

Usage

train_magma(
  data,
  prior_mean = NULL,
  ini_hp_0 = NULL,
  ini_hp_i = NULL,
  kern_0 = "SE",
  kern_i = "SE",
  common_hp = TRUE,
  grid_inputs = NULL,
  pen_diag = 1e-10,
  n_iter_max = 25,
  cv_threshold = 0.001,
  fast_approx = FALSE
)

Arguments

data

A tibble or data frame. Required columns: ID, Input, Output. Additional columns for covariates can be specified. The ID column contains the unique names/codes used to identify each individual/task (or batch of data). The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference Input.
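The expected layout can be sketched with a small base R data frame. All values and the covariate name (height) are hypothetical and only illustrate the column requirements described above.

```r
# Hypothetical toy dataset: 2 individuals observed at 3 time points each.
db <- data.frame(
  ID     = rep(c("child_1", "child_2"), each = 3),  # individual/task identifier
  Input  = rep(c(0, 6, 12), times = 2),             # reference variable (e.g. time)
  Output = c(3.4, 7.5, 9.6, 3.1, 7.0, 9.2),         # observed response
  height = c(50, 67, 75, 49, 66, 74)                # optional covariate, free name
)

# Check that the required columns are present before training.
required <- c("ID", "Input", "Output")
has_required <- all(required %in% names(db))
```

A long format of this kind, with one row per observation, is what the ID, Input and Output columns describe.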

prior_mean

Hyper-prior mean parameter (m_0) of the mean GP. This argument can be specified under various formats, such as:

  • NULL (default). The hyper-prior mean would be set to 0 everywhere.

  • A number. The hyper-prior mean would be a constant function.

  • A vector of the same length as all the distinct Input values in the data argument. This vector would be considered as the evaluation of the hyper-prior mean function at the training Inputs.

  • A function. This function is defined as the hyper-prior mean.

  • A tibble or data frame. Required columns: Input, Output. The Input values should include at least the same values as in the data argument.

ini_hp_0

A named vector, tibble or data frame of hyper-parameters associated with kern_0, the mean process' kernel. The columns/elements should be named according to the hyper-parameters that are used in kern_0. If NULL (default), random values are used as initialisation. The hp function can be used to draw custom hyper-parameters with the correct format.

ini_hp_i

A tibble or data frame of hyper-parameters associated with kern_i, the individual processes' kernel. Required column: ID. The ID column contains the unique names/codes used to identify each individual/task. The other columns should be named according to the hyper-parameters that are used in kern_i. Compared to ini_hp_0, ini_hp_i should contain an additional 'noise' column to initialise the noise hyper-parameter of the model. If NULL (default), random values are used as initialisation. The hp function can be used to draw custom hyper-parameters with the correct format.

kern_0

A kernel function, associated with the mean GP. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. For combining kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from the left to the right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

kern_i

A kernel function, associated with the individual GPs. ("SE", "PERIO" and "RQ" are also available here).

common_hp

A logical value, indicating whether the set of hyper-parameters is assumed to be common to all individuals.

grid_inputs

A vector, indicating the grid of additional reference inputs on which the mean process' hyper-posterior should be evaluated.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

n_iter_max

A number, indicating the maximum number of iterations of the EM algorithm to proceed while not reaching convergence.

cv_threshold

A number, indicating the threshold of the likelihood gain under which the EM algorithm will stop. The convergence condition is defined as the difference of likelihoods between two consecutive steps, divided by the absolute value of the last one ((LL_n - LL_{n-1}) / |LL_n|).
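The stopping rule can be sketched in base R as follows. The log-likelihood values in ll_trace are hypothetical, and relative_gain is an illustrative helper rather than a function of the package; the sketch only applies the formula (LL_n - LL_{n-1}) / |LL_n| to a sequence of values.

```r
# Relative gain between two consecutive log-likelihood values,
# following the formula (LL_n - LL_{n-1}) / |LL_n|.
relative_gain <- function(ll_new, ll_old) {
  (ll_new - ll_old) / abs(ll_new)
}

# Hypothetical sequence of log-likelihoods across EM iterations.
ll_trace <- c(-250, -120, -101, -100.05, -100.04)
cv_threshold <- 0.001

# Gains for iterations 2, 3, 4 and 5, each compared with the previous one.
gains <- relative_gain(ll_trace[-1], ll_trace[-length(ll_trace)])

# First iteration at which the gain drops below the threshold.
stop_at <- which(gains < cv_threshold)[1] + 1
```

With these values, the gain only falls below cv_threshold at the fifth iteration, which is where such a loop would stop (unless n_iter_max is reached first).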

fast_approx

A boolean, indicating whether the EM algorithm should stop after only one iteration of the E-step. This advanced feature is mainly used to provide a faster approximation of the model selection procedure, by preventing any optimisation over the hyper-parameters.

Details

The user can specify custom kernel functions for the arguments kern_0 and kern_i. The hyper-parameters used in the kernel should have explicit names, and be contained within the hp argument. hp should typically be defined as a named vector or a data frame. Although it is not mandatory for the train_magma function to run, gradients can be provided within the kernel function definition. See for example se_kernel to create a custom kernel function displaying an adequate format to be used in Magma.

Value

A list, gathering the results of the EM algorithm used for training in Magma. The elements of the list are:

Examples

TRUE

Training MagmaClust with a Variational EM algorithm

Description

The hyper-parameters and the hyper-posterior distributions involved in MagmaClust can be learned thanks to a VEM algorithm implemented in train_magmaclust. By providing a dataset, the model hypotheses (hyper-prior mean parameters, covariance kernels and number of clusters) and initialisation values for the hyper-parameters, the function computes maximum likelihood estimates of the HPs as well as the mean and covariance parameters of the Gaussian hyper-posterior distributions of the mean processes.

Usage

train_magmaclust(
  data,
  nb_cluster = NULL,
  prior_mean_k = NULL,
  ini_hp_k = NULL,
  ini_hp_i = NULL,
  kern_k = "SE",
  kern_i = "SE",
  ini_mixture = NULL,
  common_hp_k = TRUE,
  common_hp_i = TRUE,
  grid_inputs = NULL,
  pen_diag = 1e-10,
  n_iter_max = 25,
  cv_threshold = 0.001,
  fast_approx = FALSE
)

Arguments

data

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified. The ID column contains the unique names/codes used to identify each individual/task (or batch of data). The Input column should define the variable that is used as reference for the observations (e.g. time for longitudinal data). The Output column specifies the observed values (the response variable). The data frame can also provide as many covariates as desired, with no constraints on the column names. These covariates are additional inputs (explanatory variables) of the models that are also observed at each reference Input.

nb_cluster

A number, indicating the number of clusters of individuals/tasks that are assumed to exist among the dataset.

prior_mean_k

The set of hyper-prior mean parameters (m_k) for the K mean GPs, one value for each cluster. This argument can be specified under various formats, such as:

  • NULL (default). All hyper-prior means would be set to 0 everywhere.

  • A numerical vector of the same length as the number of clusters. Each number is associated with one cluster, and considered to be the hyper-prior mean parameter of the cluster (i.e. a constant function at all Input).

  • A list of functions. Each function is associated with one cluster. These functions are all evaluated at all Input values, to provide specific hyper-prior mean vectors for each cluster.

ini_hp_k

A tibble or data frame of hyper-parameters associated with kern_k, the mean process' kernel. Required column: ID. The ID column contains the unique names/codes used to identify each cluster. The other columns should be named according to the hyper-parameters that are used in kern_k. The hp function can be used to draw custom hyper-parameters with the correct format.

ini_hp_i

A tibble or data frame of hyper-parameters associated with kern_i, the individual processes' kernel. Required column: ID. The ID column contains the unique names/codes used to identify each individual/task. The other columns should be named according to the hyper-parameters that are used in kern_i. The hp function can be used to draw custom hyper-parameters with the correct format.

kern_k

A kernel function, associated with the mean GPs. Several popular kernels (see The Kernel Cookbook) are already implemented and can be selected within the following list:

  • "SE": (default value) the Squared Exponential Kernel (also called Radial Basis Function or Gaussian kernel),

  • "LIN": the Linear kernel,

  • "PERIO": the Periodic kernel,

  • "RQ": the Rational Quadratic kernel.

Compound kernels can be created as sums or products of the above kernels. For combining kernels, simply provide a formula as a character string where elements are separated by whitespaces (e.g. "SE + PERIO"). As the elements are treated sequentially from the left to the right, the product operator '*' shall always be used before the '+' operators (e.g. 'SE * LIN + RQ' is valid whereas 'RQ + SE * LIN' is not).

kern_i

A kernel function, associated with the individual GPs. (See details above in kern_k).

ini_mixture

Initial values of the probability to belong to each cluster for each individual (the ini_mixture function can be used for a k-means initialisation; it is used by default if NULL).

common_hp_k

A boolean indicating whether hyper-parameters are common among the mean GPs.

common_hp_i

A boolean indicating whether hyper-parameters are common among the individual GPs.

grid_inputs

A vector, indicating the grid of additional reference inputs on which the mean processes' hyper-posteriors should be evaluated.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

n_iter_max

A number, indicating the maximum number of iterations of the VEM algorithm to proceed while not reaching convergence.

cv_threshold

A number, indicating the threshold of the likelihood gain under which the VEM algorithm will stop. The convergence condition is defined as the difference of ELBO between two consecutive steps, divided by the absolute value of the last one ((ELBO_n - ELBO_{n-1}) / |ELBO_n|).

fast_approx

A boolean, indicating whether the VEM algorithm should stop after only one iteration of the VE-step. This advanced feature is mainly used to provide a faster approximation of the model selection procedure, by preventing any optimisation over the hyper-parameters.

Details

The user can specify custom kernel functions for the arguments kern_k and kern_i. The hyper-parameters used in the kernel should have explicit names, and be contained within the hp argument. hp should typically be defined as a named vector or a data frame. Although it is not mandatory for the train_magmaclust function to run, gradients can be provided within the kernel function definition. See for example se_kernel to create a custom kernel function displaying an adequate format to be used in MagmaClust.

Value

A list, containing the results of the VEM algorithm used in the training step of MagmaClust. The elements of the list are:

Examples

TRUE

Update the mixture probabilities for each individual and each cluster

Description

Update the mixture probabilities for each individual and each cluster

Usage

update_mixture(db, mean_k, cov_k, hp, kern, prop_mixture, pen_diag)

Arguments

db

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified.

mean_k

A list of the K hyper-posterior mean parameters.

cov_k

A list of the K hyper-posterior covariance matrices.

hp

A named vector, tibble or data frame of hyper-parameters associated with kern, the individual process' kernel. The columns/elements should be named according to the hyper-parameters that are used in kern.

kern

A kernel function, defining the covariance structure of the individual GPs.

prop_mixture

A tibble containing the hyper-parameters associated with each individual, indicating in which cluster it belongs.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

Compute the hyper-posterior multinomial distributions by updating mixture probabilities.
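A generic Bayes-rule update of mixture probabilities can be sketched in base R as below. This is an illustration under simplifying assumptions, not the package's computation: log_dmvnorm and update_probs are hypothetical helpers, each cov_k element is treated as the full covariance of the observations under a cluster, and the toy values are invented. Each cluster's prior proportion is weighted by the Gaussian likelihood of the observations and the result is normalised.

```r
# Log-density of a multivariate Gaussian, via a Cholesky factorisation.
log_dmvnorm <- function(y, mean, cov) {
  R <- chol(cov)                                   # cov = t(R) %*% R
  z <- backsolve(R, y - mean, transpose = TRUE)    # solves t(R) z = y - mean
  -0.5 * (length(y) * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2))
}

# Posterior probabilities: proportional to prior proportion x likelihood.
update_probs <- function(y, mean_k, cov_k, prop_mixture) {
  log_w <- log(prop_mixture) + mapply(
    function(m, S) log_dmvnorm(y, m, S), mean_k, cov_k
  )
  w <- exp(log_w - max(log_w))   # subtract the max for numerical stability
  w / sum(w)
}

# Hypothetical example with K = 2 clusters and 2 observed outputs.
mean_k <- list(K1 = c(0, 0), K2 = c(5, 5))
cov_k <- list(K1 = diag(2), K2 = diag(2))
prop_mixture <- c(K1 = 0.5, K2 = 0.5)
probs <- update_probs(c(0.2, -0.1), mean_k, cov_k, prop_mixture)
```

Working on the log scale before normalising avoids underflow when one cluster is much more likely than the others.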

Examples

TRUE

E-Step of the VEM algorithm

Description

Expectation step of the Variational EM algorithm used to compute the parameters of the hyper-posterior distributions for the mean processes and mixture variables involved in MagmaClust.

Usage

ve_step(db, m_k, kern_k, kern_i, hp_k, hp_i, old_mixture, iter, pen_diag)

Arguments

db

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified.

m_k

A named list of vectors, corresponding to the prior mean parameters of the K mean GPs.

kern_k

A kernel function, associated with the K mean GPs.

kern_i

A kernel function, associated with the M individual GPs.

hp_k

A named vector, tibble or data frame of hyper-parameters associated with kern_k.

hp_i

A named vector, tibble or data frame of hyper-parameters associated with kern_i.

old_mixture

A list of mixture values from the previous iteration.

iter

A number, indicating the current iteration of the VEM algorithm.

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A named list, containing the elements mean, a tibble containing the Input and associated Output of the hyper-posterior mean parameters, cov, the hyper-posterior covariance matrices, and mixture, the probabilities to belong to each cluster for each individual.

Examples

TRUE

M-Step of the VEM algorithm

Description

Maximization step of the Variational EM algorithm used to compute hyper-parameters of all the kernels involved in MagmaClust.

Usage

vm_step(
  db,
  old_hp_k,
  old_hp_i,
  list_mu_param,
  kern_k,
  kern_i,
  m_k,
  common_hp_k,
  common_hp_i,
  pen_diag
)

Arguments

db

A tibble or data frame. Columns required: ID, Input, Output. Additional columns for covariates can be specified.

old_hp_k

A named vector, tibble or data frame, containing the hyper-parameters from the previous M-step (or initialisation) associated with the mean GPs.

old_hp_i

A named vector, tibble or data frame, containing the hyper-parameters from the previous M-step (or initialisation) associated with the individual GPs.

list_mu_param

List of parameters of the K mean GPs.

kern_k

A kernel used to compute the covariance matrix of the mean GP at corresponding timestamps.

kern_i

A kernel used to compute the covariance matrix of the individual GPs at corresponding timestamps.

m_k

A named list of prior mean parameters for the K mean GPs. Length = 1 or nrow(unique(db$Input))

common_hp_k

A boolean indicating whether hp are common among mean GPs (for each mu_k)

common_hp_i

A boolean indicating whether hp are common among individual GPs (for each y_i)

pen_diag

A number. A jitter term, added on the diagonal to prevent numerical issues when inverting nearly singular matrices.

Value

A named list, containing the elements hp_k, a tibble containing the hyper-parameters associated with each cluster, hp_i, a tibble containing the hyper-parameters associated with the individual GPs, and prop_mixture_k, a tibble containing the hyper-parameters associated with each individual, indicating the probabilities to belong to each cluster.

Examples

TRUE

Weight follow-up data of children in Singapore

Description

A subset of data from the GUSTO project (https://www.gusto.sg/) collecting the weight over time of several children in Singapore. See https://arxiv.org/abs/2011.07866 for dedicated description and analysis.

Usage

weight

Format

weight

A data frame with 3,629 rows and 4 columns:

ID

Identifying number associated with each child

sex

Biological gender

Input

Age in months

Output

Weight in kilograms

Source

https://www.gusto.sg/

