Let\(Y = \{y_i\}\) and\(X = \{x_i\}\) be the two spatial crosssectional variable, where\(i = 1, 2, \dots,n\) denotes spatial units (e.g., regions or grid cells), theshadow manifolds of\(X\) can beconstructed using the different spatial lag values of all spatialunits:
\[M_{x} = \begin{bmatrix}S_{(\tau)}(x_1) & S_{(2\tau)}(x_1) & \cdots &S_{(E\tau)}(x_1) \\S_{(\tau)}(x_2) & S_{(2\tau)}(x_2) & \cdots &S_{(E\tau)}(x_2) \\\vdots & \vdots & \ddots & \vdots \\S_{(\tau)}(x_n) & S_{(2\tau)}(x_n) & \cdots &S_{(E\tau)}(x_n)\end{bmatrix}\]
Here,\(S_{(j)}(x_i)\) denotes the\(j\) th-order spatial lag value ofspatial unit\(i\),\(\tau\) is the step size for the spatial lagorder, and\(E\) is the embeddingdimension and\(M_{x}\) corresponds tothe shadow manifolds of\(X\).
With the reconstructed shadow manifolds\(M_{x}\), the state of Y can be predictedwith the state of X through
\[\hat{Y}_s \mid M_x = \sum\limits_{i=1}^k \left(\omega_{si}Y_{si} \midM_x \right)\]
where\(s\) represents a spatialunit at which the value of\(Y\) needsto be predicted,\(\hat{Y}_s\) is theprediction result,\(k\) is the numberof nearest neighbors used for prediction,\(si\) is the spatial unit used in theprediction,\(Y_{si}\) is theobservation value of\(Y\) at\(si\).\(\omega_{si}\) is the corresponding weightdefined as:
\[\omega_{si} \mid M_x = \frac{weight\left(\psi\left(M_x,s_i\right),\psi\left(M_x,s\right)\right)}{\sum_{i=1}^{L+1}weight\left(\psi\left(M_x,s_i\right),\psi\left(M_x,s\right)\right)}\]
where\(\psi(M_x, s_i)\) is thestate vector of spatial unit\(s_i\) inthe shadow manifold\(M_x\), and\(weight (\ast, \ast)\) is the weightfunction between two states in the shadow manifold, defined as:
\[weight \left(\psi\left(M_x,s_i\right),\psi\left(M_x,s\right)\right) =\exp \left(- \frac{dis\left(\psi\left(M_x,s_i\right),\psi\left(M_x,s\right)\right)}{dis\left(\psi\left(M_x,s_1\right),\psi\left(M_x,s\right)\right)} \right)\]
where\(\exp\) is the exponentialfunction and\(dis\left(\ast,\ast\right)\) represents the distance function betweentwo states in the shadow manifold defined as:
\[dis \left( \psi(M_x, s_i), \psi(M_x, s) \right) = \frac{1}{E}\sum_{m=1}^{E} \left| \psi_m(M_x, s_i) - \psi_m(M_x, s) \right|\]
where\(dis \left( \psi(M_x, s_i),\psi(M_x, s) \right)\) denotes the average absolute differencebetween corresponding elements of the two state vectors in the shadowmanifold\(M_x\),\(E\) is the embedding dimension, and\(\psi_m(M_x, s_i)\) is the\(m\)-th element of the state vector\(\psi(M_x, s_i)\).
The skill of cross-mapping prediction\(\rho\) is measured by the Pearsoncorrelation coefficient between the true observations and correspondingpredictions, and the confidence interval of\(\rho\) can be estimated based the\(z\)-statistics with the normaldistribution:
\[\rho = \frac{Cov\left(Y,\hat{Y}\mid M_x\right)}{\sqrt{Var\left(Y\right)Var\left(\hat{Y}\mid M_x\right)}}\]
\[t = \rho \sqrt{\frac{n-2}{1-\rho^2}}\]
where\(n\) is the number ofobservations to be predicted, and
\[z = \frac{1}{2} \ln \left(\frac{1+\rho}{1-\rho}\right)\]
The prediction skill\(\rho\) variesby setting different sizes of libraries, which means the quantity ofobservations used in reconstruction of the shadow manifold. We can usethe convergence of\(\rho\) to inferthe causal associations. For GCCM, the convergence means that\(\rho\) increases with the size of librariesand is statistically significant when the library becomes largest.
\[\rho_{x \to y} = \lim_{L \to \infty} cor \left( Y,\hat{Y}\mid M_x\right)\]
where\(\rho_{x \to y}\) is thecorrelation after convergence, used to measure the causation effect from\(Y\) to\(X\), despite the notation suggesting thereverse direction.
Load thespEDM package and its county-level populationdensity data:
library(spEDM)popd_nb= spdep::read.gal(system.file("case/popd_nb.gal",package ="spEDM"))## Warning in spdep::read.gal(system.file("case/popd_nb.gal", package = "spEDM")):## neighbour object has 4 sub-graphspopd= readr::read_csv(system.file("case/popd.csv",package ="spEDM"))## Rows: 2806 Columns: 7## ── Column specification ────────────────────────────────────────────────────────## Delimiter: ","## dbl (7): lon, lat, popd, elev, tem, pre, slope#### ℹ Use `spec()` to retrieve the full column specification for this data.## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.popd_sf= sf::st_as_sf(popd,coords =c("lon","lat"),crs =4326)popd_sf## Simple feature collection with 2806 features and 5 fields## Geometry type: POINT## Dimension: XY## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346## Geodetic CRS: WGS 84## # A tibble: 2,806 × 6## popd elev tem pre slope geometry## * <dbl> <dbl> <dbl> <dbl> <dbl> <POINT [°]>## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879)## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877)## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548)## 4 258. 23 17.4 1555. 0.932 (116.241 30.104)## 5 211. 101 16.3 1494. 3.34 (116.173 30.495)## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839)## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412)## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514)## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558)## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983)## # ℹ 2,796 more rowsDetermining optimal embedding dimension:
spEDM::simplex(popd_sf,"pre","pre",E =2:10,k =12)## The suggested E,k,tau for variable pre is 3, 12 and 1spEDM::simplex(popd_sf,"popd","popd",E =2:10,k =12)## The suggested E,k,tau for variable popd is 9, 12 and 1Run GCCM:
startTime=Sys.time()pd_res= spEDM::gccm(data = popd_sf,cause ="pre",effect ="popd",libsizes =seq(100,2800,by =200),E =c(3,9),k =12,nb = popd_nb,progressbar =FALSE)endTime=Sys.time()print(difftime(endTime,startTime,units ="mins"))## Time difference of 2.147843 minspd_res## libsizes pre->popd popd->pre## 1 100 0.1199174 0.03313697## 2 300 0.2359430 0.06783942## 3 500 0.3036477 0.09908984## 4 700 0.3589912 0.12790924## 5 900 0.4002043 0.15290218## 6 1100 0.4410549 0.17431029## 7 1300 0.4850465 0.19399537## 8 1500 0.5252593 0.21250237## 9 1700 0.5681300 0.22907550## 10 1900 0.6087514 0.24370783## 11 2100 0.6471818 0.25669742## 12 2300 0.6822049 0.26774678## 13 2500 0.7154519 0.27716611## 14 2700 0.7470211 0.28586628Visualize the result:
Load thespEDM package and its farmland NPP data:
library(spEDM)npp= terra::rast(system.file("case/npp.tif",package ="spEDM"))# To save the computation time, we will aggregate the data by 3 timesnpp= terra::aggregate(npp,fact =3,na.rm =TRUE)npp## class : SpatRaster## size : 135, 161, 5 (nrow, ncol, nlyr)## resolution : 30000, 30000 (x, y)## extent : -2625763, 2204237, 1867078, 5917078 (xmin, xmax, ymin, ymax)## coord. ref. : CGCS2000_Albers## source(s) : memory## names : npp, pre, tem, elev, hfp## min values : 187.50, 390.3351, -47.8194, -110.1494, 0.04434316## max values : 15381.89, 23734.5330, 262.8576, 5217.6431, 42.68803711# Inspect NA valuesterra::global(npp,"isNA")## isNA## npp 14815## pre 14766## tem 14766## elev 14760## hfp 14972terra::ncell(npp)## [1] 21735nnamat= terra::as.matrix(npp[[1]],wide =TRUE)nnaindice=which(!is.na(nnamat),arr.ind =TRUE)dim(nnaindice)## [1] 6920 2# Select 1500 non-NA pixels to predict:set.seed(2025)indices=sample(nrow(nnaindice),size =1500,replace =FALSE)libindice= nnaindice[-indices,]predindice= nnaindice[indices,]Determining optimal embedding dimension:
spEDM::simplex(npp,"pre","pre",E =2:10,k =12,lib = nnaindice,pred = predindice)## The suggested E,k,tau for variable pre is 2, 12 and 1spEDM::simplex(npp,"npp","npp",E =2:10,k =12,lib = nnaindice,pred = predindice)## The suggested E,k,tau for variable npp is 10, 12 and 1Run GCCM:
startTime=Sys.time()npp_res= spEDM::gccm(data = npp,cause ="pre",effect ="npp",libsizes =matrix(rep(seq(10,130,20),2),ncol =2),E =c(2,10),k =12,lib = nnaindice,pred = predindice,progressbar =FALSE)endTime=Sys.time()print(difftime(endTime,startTime,units ="mins"))## Time difference of 1.097801 minsnpp_res## libsizes pre->npp npp->pre## 1 10 0.1235069 0.1061684## 2 30 0.2629126 0.2340358## 3 50 0.3780635 0.3031695## 4 70 0.4980299 0.3450645## 5 90 0.5954949 0.3434175## 6 110 0.7297306 0.3453495## 7 130 0.8272545 0.3677895Visualize the result:
plot(npp_res,xlimits =c(5,135),ylimits =c(0.05,1),draw_ci =TRUE)+ ggplot2::theme(legend.justification =c(0.25,1))