Movatterモバイル変換


[0]ホーム

URL:


Spatial Regression Models

In this article, we discuss the following functions -

These functions can be used to fit Gaussian and non-Gaussian spatialpoint-referenced data.

set.seed(1729)

Bayesian Gaussian spatial regression models

In this section, we thoroughly illustrate our method on syntheticGaussian as well as non-Gaussian spatial data and provide code toanalyze the output of our functions. We start by loading thepackage.

library(spStack)

Some synthetic spatial data are lazy-loaded which includes syntheticspatial Gaussian datasimGaussian, Poisson datasimPoisson, binomial datasimBinom and binarydatasimBinary. One can use the functionsim_spData() to simulate spatial data. We will be applyingour functions on these datasets.

Using fixed hyperparameters

We first load the datasimGaussian and set up thepriors. Supplying the priors is optional. See the documentation ofspLMexact() to learn more about the default priors.Besides, setting the priors, we also fix the values of the spatialprocess parameters (spatial decay\(\phi\) and smoothness\(\nu\)) and the noise-to-spatial varianceratio (\(\delta^2\)).

data("simGaussian")dat<- simGaussian[1:200, ]# work with first 200 rowsmuBeta<-c(0,0)VBeta<-cbind(c(1E4,0.0),c(0.0,1E4))sigmaSqIGa<-2sigmaSqIGb<-2phi0<-3nu0<-0.5noise_sp_ratio<-0.8prior_list<-list(beta.norm =list(muBeta, VBeta),sigma.sq.ig =c(sigmaSqIGa, sigmaSqIGb))nSamples<-1000

We define the spatial model using aformula, similar tothat in the widely usedlm() function in thestats package. Here, the formulay ~ x1corresponds to the spatial linear model\[y(s) = \beta_0 + \beta_1 x_1(s) + z(s) +\epsilon(s)\;,\] where they corresponds to theresponse variable\(y(s)\), which isregressed on the predictorx1 given by\(x_1(s)\). The intercept is automaticallyconsidered within the model, and hencey ~ x1 isfunctionally equivalent toy ~ 1 + x1. Moreover, a spatialrandom effect is inherent in the model, where the spatial correlationmatrix is governed by the spatial correlation function specified by theargumentcor.fn. Supported correlation functions are"exponential" and"matern". The exponentialcovariogram is specified by the hyperparameter\(\phi\) and the Matern covariogram isspecified by the hyperparameters\(\phi\) and\(\nu\). Fixed values of thesehyperparameters are supplied through the argumentspParams.In addition, the noise-to-spatial variance ration is also fixed throughthe argumentnoise_sp_ratio.

If interested in calculation of leave-one-out predictive densities(LOO-PD),loopd must be setTRUE (the defaultisFALSE). Method of LOO-PD calculation can be also set bythe optionloopd.method which support the keywords"exact" and"psis". The option"exact" exploits the analytically available expressions ofthe predictive density and implements an efficient row-deletion Choleskyfactor update for fast calculation and avoids refitting the model\(n\) times, where\(n\) is the sample size. On the other hand,"psis" implements Pareto-smoothed importance sampling andfinds approximate LOO-PD and is much faster than"exact".

We pass these arguments into the functionspLMexact().

mod1<-spLMexact(y~ x1,data = dat,coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",priors = prior_list,spParams =list(phi = phi0,nu = nu0),noise_sp_ratio = noise_sp_ratio,n.samples = nSamples,loopd =TRUE,loopd.method ="exact",verbose =TRUE)#> ----------------------------------------#>  Model description#> ----------------------------------------#> Model fit with 200 observations.#>#> Number of covariates 2 (including intercept).#>#> Using the matern spatial correlation function.#>#> Priors:#>  beta: Gaussian#>  mu: 0.00    0.00#>  cov:#>   10000.00    0.00#>   0.00    10000.00#>#>  sigma.sq: Inverse-Gamma#>  shape = 2.00, scale = 2.00.#>#> Spatial process parameters:#>  phi = 3.00, and, nu = 0.50.#> Noise-to-spatial variance ratio = 0.80.#>#> Number of posterior samples = 1000.#>#> LOO-PD calculation method = exact.#> ----------------------------------------

Next, we can summarize the posterior samples of the fixed effects asfollows.

post_beta<- mod1$samples$betasummary_beta<-t(apply(post_beta,1,function(x)quantile(x,c(0.025,0.5,0.975))))rownames(summary_beta)<- mod1$X.namesprint(summary_beta)#>                 2.5%      50%    97.5%#> (Intercept) 1.688957 2.284946 2.955524#> x1          4.866534 4.954401 5.040862

Leave-one-out predictive densities using PSIS

Out of curiosity, we find the LOO-PD for the same model using theapproximate method that uses Pareto-smoothed importance sampling, orPSIS. SeeVehtari, Gelman, and Gabry (2017) for details.

mod2<-spLMexact(y~ x1,data = dat,coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",priors = prior_list,spParams =list(phi = phi0,nu = nu0),noise_sp_ratio = noise_sp_ratio,n.samples = nSamples,loopd =TRUE,loopd.method ="PSIS",verbose =FALSE)

Subsquently, we compare the LOO-PD obtained by the two methods.

loopd_exact<- mod1$loopdloopd_psis<- mod2$loopdloopd_df<-data.frame(exact = loopd_exact,psis = loopd_psis)library(ggplot2)plot1<-ggplot(data = loopd_df,aes(x = exact))+geom_point(aes(y = psis),size =1,alpha =0.5)+geom_abline(slope =1,intercept =0,color ="red",alpha =0.5)+xlab("Exact")+ylab("PSIS")+theme_bw()+theme(panel.background =element_blank(),panel.grid =element_blank(),aspect.ratio =1)plot1

Using predictive stacking

Next, we move on to the Bayesian spatial stacking algorithm forGaussian data. We supply the same prior list and provide some candidatevalues of spatial process parameters and noise-to-spatial varianceratio.

mod3<-spLMstack(y~ x1,data = dat,coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",priors = prior_list,params.list =list(phi =c(1.5,3,5),nu =c(0.5,1,1.5),noise_sp_ratio =c(0.5,1.5)),n.samples =1000,loopd.method ="exact",parallel =FALSE,solver ="ECOS",verbose =TRUE)#>#> STACKING WEIGHTS:#>#>            | phi | nu  | noise_sp_ratio | weight |#> +----------+-----+-----+----------------+--------+#> | Model 1  |  1.5|  0.5|             0.5| 0.000  |#> | Model 2  |  3.0|  0.5|             0.5| 0.000  |#> | Model 3  |  5.0|  0.5|             0.5| 0.000  |#> | Model 4  |  1.5|  1.0|             0.5| 0.242  |#> | Model 5  |  3.0|  1.0|             0.5| 0.000  |#> | Model 6  |  5.0|  1.0|             0.5| 0.751  |#> | Model 7  |  1.5|  1.5|             0.5| 0.000  |#> | Model 8  |  3.0|  1.5|             0.5| 0.000  |#> | Model 9  |  5.0|  1.5|             0.5| 0.000  |#> | Model 10 |  1.5|  0.5|             1.5| 0.000  |#> | Model 11 |  3.0|  0.5|             1.5| 0.000  |#> | Model 12 |  5.0|  0.5|             1.5| 0.000  |#> | Model 13 |  1.5|  1.0|             1.5| 0.000  |#> | Model 14 |  3.0|  1.0|             1.5| 0.000  |#> | Model 15 |  5.0|  1.0|             1.5| 0.006  |#> | Model 16 |  1.5|  1.5|             1.5| 0.000  |#> | Model 17 |  3.0|  1.5|             1.5| 0.000  |#> | Model 18 |  5.0|  1.5|             1.5| 0.000  |#> +----------+-----+-----+----------------+--------+

The user can check the solver status and runtime by issuing thefollowing.

print(mod3$solver.status)#> [1] "optimal"print(mod3$run.time)#>    user  system elapsed#>   0.586   0.157   0.586

Analyzing samples from the stacked posterior

To sample from the stacked posterior, the package provides a helperfunction calledstackedSampler(). Subsequent inferenceproceeds from these samples obtained from the stacked posterior.

post_samps<-stackedSampler(mod3)

We then collect the samples of the fixed effects and summarize themas follows.

post_beta<- post_samps$betasummary_beta<-t(apply(post_beta,1,function(x)quantile(x,c(0.025,0.5,0.975))))rownames(summary_beta)<- mod3$X.namesprint(summary_beta)#>                 2.5%      50%    97.5%#> (Intercept) 1.087011 2.220825 3.060573#> x1          4.872795 4.951893 5.036646

The synthetic datasimGaussian was simulated using thetrue value\(\beta = (2, 5)^{ \scriptstyle\top }\). We notice that the stacked posterior is concentratedaround the truth.

library(tidyr)library(dplyr)#>#> Attaching package: 'dplyr'#> The following objects are masked from 'package:stats':#>#>     filter, lag#> The following objects are masked from 'package:base':#>#>     intersect, setdiff, setequal, unionpost_beta_df<-as.data.frame(post_beta)post_beta_df<- post_beta_df%>%mutate(row =paste0("beta",row_number()-1))%>%pivot_longer(-row,names_to ="sample",values_to ="value")# True values of beta0 and beta1truth<-data.frame(row =c("beta0","beta1"),true_value =c(2,5))ggplot(post_beta_df,aes(x = value))+geom_density(fill ="lightblue",alpha =0.6)+geom_vline(data = truth,aes(xintercept = true_value),color ="red",linetype ="dashed",linewidth =0.5)+facet_wrap(~ row,scales ="free")+labs(x ="",y ="Density")+theme_bw()+theme(panel.background =element_blank(),panel.grid =element_blank(),aspect.ratio =1)

Posterior distributions of the fixed effects

Furthermore, we compare the posterior samples of the spatial randomeffects with their corresponding true values.

post_z<- post_samps$zpost_z_summ<-t(apply(post_z,1,function(x)quantile(x,c(0.025,0.5,0.975))))z_combn<-data.frame(z = dat$z_true,zL = post_z_summ[,1],zM = post_z_summ[,2],zU = post_z_summ[,3])plotz<-ggplot(data = z_combn,aes(x = z))+geom_point(aes(y = zM),size =0.75,color ="darkblue",alpha =0.5)+geom_errorbar(aes(ymin = zL,ymax = zU),width =0.05,alpha =0.15,color ="skyblue")+geom_abline(slope =1,intercept =0,color ="red")+xlab("True z")+ylab("Stacked posterior of z")+theme_bw()+theme(panel.background =element_blank(),panel.grid =element_blank(),aspect.ratio =1)plotz

Comparison of stacked posterior with the true values

The package also provides helper functions to plot interpolatedspatial surfaces in order for visualization purposes. The functionsurfaceplot() creates a single spatial surface plot, whilesurfaceplot2() creates two side-by-side surface plots. Weare using the later to visually inspect the interpolated spatialsurfaces of the true spatial effects and their posterior medians.

postmedian_z<-apply(post_z,1, median)dat$z_hat<- postmedian_zplot_z<-surfaceplot2(dat,coords_name =c("s1","s2"),var1_name ="z_true",var2_name ="z_hat")library(ggpubr)ggarrange(plotlist = plot_z,common.legend =TRUE,legend ="right")

Comparison of the interploated spatial surfaces of the true random effects and the posterior medians.

Analysis of spatial non-Gaussian data

We also offer functions for Bayesian analysis of spatiallypoint-referenced Poisson, binomial count, and binary data.

Spatial Poisson count data

We first load and plot the point-referenced Poisson count data.

data("simPoisson")dat<- simPoisson[1:200, ]# work with first 200 observationsggplot(dat,aes(x = s1,y = s2))+geom_point(aes(color = y),alpha =0.75)+scale_color_distiller(palette ="RdYlGn",direction =-1,label =function(x)sprintf("%.0f", x))+guides(alpha ='none')+theme_bw()+theme(axis.ticks =element_line(linewidth =0.25),panel.background =element_blank(),panel.grid =element_blank(),legend.title =element_text(size =10,hjust =0.25),legend.box.just ="center",aspect.ratio =1)

Under fixed hyperparameters

Next, we demonstrate the functionspGLMexact() whichdelivers posterior samples of the fixed effects and the spatial randomeffects. The optionfamily must be specified correctlywhile using this function. For instance, in the following example, theformulay ~ x1 andfamily = "poisson"corresponds to the spatial regression model\[y(s) \sim \mathsf{Poisson} (\lambda(s)), \quad\log \lambda(s) = \beta_0 + \beta_1 x_1(s) + z(s)\;.\]

We provide fixed values of the spatial process parameters and aboundary adjustment parameter, given by the argumentboundary, which if not supplied, defaults to 0.5. Fordetails on the priors and its default value, see functiondocumentation.

mod1<-spGLMexact(y~ x1,data = dat,family ="poisson",coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",spParams =list(phi = phi0,nu = nu0),priors =list(nu.beta =5,nu.z =5),boundary =0.5,n.samples =1000,verbose =TRUE)#> Some priors were not supplied. Using defaults.#> ----------------------------------------#>  Model description#> ----------------------------------------#> Model fit with 200 observations.#>#> Family = poisson.#>#> Number of covariates 2 (including intercept).#>#> Using the matern spatial correlation function.#>#> Priors:#>  beta: Gaussian#>  mu: 0.00    0.00#>  cov:#>   100.00  0.00#>   0.00    100.00#>#>  sigmaSq.beta ~ IG(nu.beta/2, nu.beta/2)#>  sigmaSq.z ~ IG(nu.z/2, nu.z/2)#>  nu.beta = 5.00, nu.z = 5.00.#>  sigmaSq.xi = 0.10.#>  Boundary adjustment parameter = 0.50.#>#> Spatial process parameters:#>  phi = 3.00, and, nu = 0.50.#>#> Number of posterior samples = 1000.#> ----------------------------------------

We next collect the samples of the fixed effects and summarize them.The true value of the fixed effects with which the data was simulated is\(\beta = (2, -0.5)\) (for moredetails, see the documentation of the datasimPoisson).

post_beta<- mod1$samples$betasummary_beta<-t(apply(post_beta,1,function(x)quantile(x,c(0.025,0.5,0.975))))rownames(summary_beta)<- mod1$X.namesprint(summary_beta)#>                   2.5%       50%      97.5%#> (Intercept)  0.7767448  2.043858  3.1654221#> x1          -0.6480264 -0.551603 -0.4539841

Posterior recovery of scale parameters

The analytic tractability of the posterior distribution under the\(\mathsf{GCM}\) framework is enabledby marginalizing out the scale parameters\(\sigma^2_\beta\) and\(\sigma^2_z\) associated with the fixedeffects\(\beta\) and the spatialrandom effects\(z\), respectively.However, posterior samples of\(\sigma^2_\beta\) and\(\sigma^2_z\) can be recovered using thefunctionrecoverGLMscale().

mod1<-recoverGLMscale(mod1)

We visualize the posterior distributions of\(\sigma_\beta\) and\(\sigma_z\) through histograms.

post_scale_df<-data.frame(value =sqrt(c(mod1$samples$sigmasq.beta, mod1$samples$sigmasq.z)),group =factor(rep(c("sigma.beta","sigma.z"),each =length(mod1$samples$sigmasq.beta))))ggplot(post_scale_df,aes(x = value))+geom_density(fill ="lightblue",alpha =0.6)+facet_wrap(~ group,scales ="free")+labs(x ="",y ="Density")+theme_bw()+theme(panel.background =element_blank(),panel.grid =element_blank(),aspect.ratio =1)

Posterior distributions of the scale parameters of the fixed and the random effects.

Using predictive stacking

Next, we move on to the functionspGLMstack() that willimplement our proposed stacking algorithm. The argumentloopd.controls is used to provide details on what algorithmto be used to find LOO-PD. Valid options for the tagmethodis"exact" and"CV". We use\(K\)-fold cross-validation by assigningmethod = "CV"andCV.K = 10. The tagnMC decides the number of Monte Carlo samples to be used tofind the LOO-PD.

mod2<-spGLMstack(y~ x1,data = dat,family ="poisson",coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",params.list =list(phi =c(3,7,10),nu =c(0.5,1.5),boundary =c(0.5,0.6)),n.samples =1000,priors =list(mu.beta =5,nu.z =5),loopd.controls =list(method ="CV",CV.K =10,nMC =1000),parallel =TRUE,solver ="ECOS",verbose =TRUE)#> Some priors were not supplied. Using defaults.#>#> STACKING WEIGHTS:#>#>            | phi | nu  | boundary | weight |#> +----------+-----+-----+----------+--------+#> | Model 1  |    3|  0.5|       0.5| 0.000  |#> | Model 2  |    7|  0.5|       0.5| 0.000  |#> | Model 3  |   10|  0.5|       0.5| 0.000  |#> | Model 4  |    3|  1.5|       0.5| 0.000  |#> | Model 5  |    7|  1.5|       0.5| 0.000  |#> | Model 6  |   10|  1.5|       0.5| 0.000  |#> | Model 7  |    3|  0.5|       0.6| 0.000  |#> | Model 8  |    7|  0.5|       0.6| 0.000  |#> | Model 9  |   10|  0.5|       0.6| 0.000  |#> | Model 10 |    3|  1.5|       0.6| 0.005  |#> | Model 11 |    7|  1.5|       0.6| 0.724  |#> | Model 12 |   10|  1.5|       0.6| 0.272  |#> +----------+-----+-----+----------+--------+

We can extract information on solver status and runtime by thefollowing.

print(mod2$solver.status)#> [1] "optimal"print(mod2$run.time)#>    user  system elapsed#>   7.901   0.807   7.941

Further, we can recover the posterior samples of the scale parametersby passing the output obtained by runningspGLMstack() onceagain throughrecoverGLMscale().

mod2<-recoverGLMscale(mod2)

Sampling from stacked posterior

We first obtain final posterior samples by sampling from the stackedsampler.

post_samps<-stackedSampler(mod2)

Subsequently, we summarize the posterior samples of the fixedeffects.

post_beta<- post_samps$betasummary_beta<-t(apply(post_beta,1,function(x)quantile(x,c(0.025,0.5,0.975))))rownames(summary_beta)<- mod3$X.namesprint(summary_beta)#>                   2.5%        50%      97.5%#> (Intercept)  0.9925623  2.1080561  3.1036063#> x1          -0.6200144 -0.5462358 -0.4774726

The synthetic datasimPoisson was simulated using\(\beta = (2, -0.5)^{ \scriptstyle \top}\).

post_beta_df<-as.data.frame(post_beta)post_beta_df<- post_beta_df%>%mutate(row =paste0("beta",row_number()-1))%>%pivot_longer(-row,names_to ="sample",values_to ="value")# True values of beta0 and beta1truth<-data.frame(row =c("beta0","beta1"),true_value =c(2,-0.5))ggplot(post_beta_df,aes(x = value))+geom_density(fill ="lightblue",alpha =0.6)+geom_vline(data = truth,aes(xintercept = true_value),color ="red",linetype ="dashed",linewidth =0.5)+facet_wrap(~ row,scales ="free")+labs(x ="",y ="Density")+theme_bw()+theme(panel.background =element_blank(),panel.grid =element_blank(),aspect.ratio =1)

Posterior distributions of the fixed effects

Finally, we analyze the posterior samples of the spatial randomeffects.

post_z<- post_samps$zpost_z_summ<-t(apply(post_z,1,function(x)quantile(x,c(0.025,0.5,0.975))))z_combn<-data.frame(z = dat$z_true,zL = post_z_summ[,1],zM = post_z_summ[,2],zU = post_z_summ[,3])plotz<-ggplot(data = z_combn,aes(x = z))+geom_point(aes(y = zM),size =0.75,color ="darkblue",alpha =0.5)+geom_errorbar(aes(ymin = zL,ymax = zU),width =0.05,alpha =0.15,color ="skyblue")+geom_abline(slope =1,intercept =0,color ="red")+xlab("True z")+ylab("Stacked posterior of z")+theme_bw()+theme(panel.background =element_blank(),panel.grid =element_blank(),aspect.ratio =1)plotz

We can also compare the interpolated spatial surfaces of the truespatial effects with that of their posterior median.

postmedian_z<-apply(post_z,1, median)dat$z_hat<- postmedian_zplot_z<-surfaceplot2(dat,coords_name =c("s1","s2"),var1_name ="z_true",var2_name ="z_hat")library(ggpubr)ggarrange(plotlist = plot_z,common.legend =TRUE,legend ="right")

Spatial binomial count data

This will follow the same workflow as Poisson data with the exceptionthat the structure offormula that defines the model willalso contain the total number of trials at each location. Here, wepresent only thespGLMexact() function for brevity.

data("simBinom")dat<- simBinom[1:200, ]# work with first 200 rowsmod1<-spGLMexact(cbind(y, n_trials)~ x1,data = dat,family ="binomial",coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",spParams =list(phi =3,nu =0.5),boundary =0.5,n.samples =1000,verbose =FALSE)

Similarly, we collect the posterior samples of the fixed effects andsummarize them. The true value of the fixed effects with which the datawas simulated is\(\beta = (0.5,-0.5)\).

post_beta<- mod1$samples$betasummary_beta<-t(apply(post_beta,1,function(x)quantile(x,c(0.025,0.5,0.975))))rownames(summary_beta)<- mod1$X.namesprint(summary_beta)#>                   2.5%        50%      97.5%#> (Intercept) -1.2299612  0.7347727  2.7381889#> x1          -0.5837158 -0.4044253 -0.2245974

Spatial binary data

Finally, we present only thespGLMexact() function forspatial binary data to avoid repetition. In this case, unlike thebinomial model, almost nothing changes from that of in the case ofspatial Poisson data.

data("simBinary")dat<- simBinary[1:200, ]mod1<-spGLMexact(y~ x1,data = dat,family ="binary",coords =as.matrix(dat[,c("s1","s2")]),cor.fn ="matern",spParams =list(phi =4,nu =0.4),boundary =0.5,n.samples =1000,verbose =FALSE)

Similarly, we collect the posterior samples of the fixed effects andsummarize them. The true value of the fixed effects with which the datawas simulated is\(\beta = (0.5,-0.5)\).

post_beta<- mod1$samples$betasummary_beta<-t(apply(post_beta,1,function(x)quantile(x,c(0.025,0.5,0.975))))rownames(summary_beta)<- mod1$X.namesprint(summary_beta)#>                   2.5%        50%      97.5%#> (Intercept) -1.2015085  0.3212393 2.20401790#> x1          -0.6653012 -0.3229634 0.07179601

References

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017.“PracticalBayesian Model Evaluation Using Leave-One-OutCross-Validation and WAIC.”Statistics and Computing 27(5): 1413–32.https://doi.org/10.1007/s11222-016-9696-4.

[8]ページ先頭

©2009-2025 Movatter.jp