In this example, we simulate data from a multivariate (convolved) GPmodel. See details of this model in Chapter 8 of Shi, J. Q., and Choi,T. (2011), “Gaussian Process Regression Analysis for Functional Data”,CRC Press.
We simulate\(30\) realisations ofthree dependent outputs, with\(250\)time points on\([0,1]\) for eachoutput.
set.seed(123)nrep<-30n1<-250n2<-250n3<-250N<-3n<- n1+n2+n3input1<-sapply(1:n1,function(x) (x-min(1:n1))/max(1:n1-min(1:n1)))input2<- input1input3<- input1# storing input vectors in a listData<-list()Data$input<-list(input1, input2, input3)# true hyperparameter valuesnu0s<-c(6,4,2)nu1s<-c(0.1,0.05,0.01)a0s<-c(500,500,500)a1s<-c(100,100,100)sigm<-0.05hp<-c(nu0s,log(nu1s),log(a0s),log(a1s),log(sigm))# Calculate covariance matrixPsi<-mgpCovMat(Data=Data,hp=hp)We need an index vector identifying to which output the datacorresponds:
Covariance functions\(\text{Cov}\big[X_j(t), X_\ell(0) \big]\)can be plotted as follows. The argumentsoutput andoutputp correspond to\(j\) and\(\ell\), respectively.
Given the hyperparametershp, we can plot the auto- andcross-covariance functions as follows:
# Plotting an auto-covariance functionplotmgpCovFun(type="Cov",output=1,outputp=1,Data=Data,hp=hp,idx=idx)# Plotting a cross-covariance functionplotmgpCovFun(type="Cov",output=1,outputp=2,Data=Data,hp=hp,idx=idx)Corresponding correlation functions can be plotted by settingtype=Cor:
# Plotting an auto-correlation functionplotmgpCovFun(type="Cor",output=1,outputp=1,Data=Data,hp=hp,idx=idx)# Plotting a cross-correlation functionplotmgpCovFun(type="Cor",output=1,outputp=2,Data=Data,hp=hp,idx=idx)We assume that the mean functions for each output are\(\mu_1(t) = 5t\),\(\mu_2(t) = 10t\), and\(\mu_3(t) = -3t\) and simulate the data asfollows
mu<-c(5*input1,10*input2,-3*input3)Y<-t(mvrnorm(n=nrep,mu=mu,Sigma=Psi))response<-list()for(jin1:N){ response[[j]]<- Y[idx==j,,drop=F]}# storing the response in the listData$response<- responseBelow we estimate the mean and covariance functions using a subset ofdata including\(m=100\) observations(out of\(750\) of the sample) aimingfor a faster estimation. These\(m\)observations are chosen randomly. For the mean functions, we choose thelinear model by setttingmeanModel = 't'.
Next, based on the estimated model, we want to predict the values ofthe three outputs at new time points:
n_star<-60*Ninput1star<-seq(min(input1),max(input1),length.out = n_star/N)input2star<-seq(min(input2),max(input2),length.out = n_star/N)input3star<-seq(min(input3),max(input3),length.out = n_star/N)DataNew<-list()DataNew$input<-list(input1star, input2star, input3star)We have trained the model using\(m\) time points. However, for visualisationpurposes, it is more interesting to see predictions based on very fewdata points. Therefore, let’s use a very small subset of observationsand make predictions given this small subset. We will use observationsfrom the fifth multivariate realisation stored in `Data’.
realisation<-5obsSet<-list()obsSet[[1]]<-c(5,10,23,50,80,200)obsSet[[2]]<-c(10,23,180)obsSet[[3]]<-c(3,11,30,240)DataObs<-list()DataObs$input[[1]]<- Data$input[[1]][obsSet[[1]]]DataObs$input[[2]]<- Data$input[[2]][obsSet[[2]]]DataObs$input[[3]]<- Data$input[[3]][obsSet[[3]]]DataObs$response[[1]]<- Data$response[[1]][obsSet[[1]], realisation]DataObs$response[[2]]<- Data$response[[2]][obsSet[[2]], realisation]DataObs$response[[3]]<- Data$response[[3]][obsSet[[3]], realisation]ThemgprPredict function returns a list containing thepredictive mean and standard deviation for the curves of each output atthe new time points.
# Calculate predictions for the test set given some observationspredCGP<-mgprPredict(train=res,DataObs=DataObs,DataNew=DataNew)str(predCGP)#> List of 3#> $ pred.mean :List of 3#> ..$ : num [1:60, 1] -2.93 -2.72 -2.32 -1.83 -1.3 ...#> .. ..- attr(*, "dimnames")=List of 2#> .. .. ..$ : chr [1:60] "1" "2" "3" "4" ...#> .. .. ..$ : NULL#> ..$ : num [1:60, 1] -0.65281 -0.46403 -0.28736 -0.13731 -0.00933 ...#> .. ..- attr(*, "dimnames")=List of 2#> .. .. ..$ : chr [1:60] "1" "2" "3" "4" ...#> .. .. ..$ : NULL#> ..$ : num [1:60, 1] -0.33 -0.346 -0.355 -0.36 -0.362 ...#> .. ..- attr(*, "dimnames")=List of 2#> .. .. ..$ : chr [1:60] "1" "2" "3" "4" ...#> .. .. ..$ : NULL#> $ pred.sd :List of 3#> ..$ : num [1:60] 0.136 0.0673 0.0662 0.0866 0.0932 ...#> ..$ : num [1:60] 0.2731 0.15 0.0713 0.0892 0.0984 ...#> ..$ : num [1:60] 0.0715 0.0613 0.0606 0.0633 0.0653 ...#> $ noiseFreePred: logi FALSE#> - attr(*, "class")= chr "mgpr"The predictions (with 95% confidence inverval) for the\(5\)th curve at the new time points can bevisualised by using the model estimated by themgprfunction:
Let’s assume that we have additional information for the first twofunctions by also including their 100th and 150th observations:
obsSet[[1]]<-c(5,10,23,50,80,100,150,200)obsSet[[2]]<-c(10,23,100,150,180)DataObs$input[[1]]<- Data$input[[1]][obsSet[[1]]]DataObs$input[[2]]<- Data$input[[2]][obsSet[[2]]]DataObs$response[[1]]<- Data$response[[1]][obsSet[[1]], realisation]DataObs$response[[2]]<- Data$response[[2]][obsSet[[2]], realisation]Now notice how predictions for the third function are affected by theinformation added to the other functions.