| Type: | Package |
| Title: | Nonparametric Modeling and Monitoring of Spatio-Temporal Data |
| Version: | 1.0.3 |
| Date: | 2023-9-29 |
| Description: | Spatio-temporal data have become increasingly popular in many research fields. Such data often have complex structures that are difficult to describe and estimate. This package provides reliable tools for modeling complicated spatio-temporal data. It also includes tools of online process monitoring to detect possible change-points in a spatio-temporal process over time. More specifically, the package implements the spatio-temporal mean estimation procedure described in Yang and Qiu (2018) <doi:10.1002/sim.7622>, the spatio-temporal covariance estimation procedure discussed in Yang and Qiu (2019) <doi:10.1002/sim.8315>, the three-step method for the joint estimation of spatio-temporal mean and covariance functions suggested by Yang and Qiu (2022) <doi:10.1007/s10463-021-00787-2>, the spatio-temporal disease surveillance method discussed in Qiu and Yang (2021) <doi:10.1002/sim.9150> that can accommodate the covariate effect, the spatial-LASSO-based process monitoring method proposed by Qiu and Yang (2023) <doi:10.1080/00224065.2022.2081104>, and the online spatio-temporal disease surveillance method described in Yang and Qiu (2020) <doi:10.1080/24725854.2019.1696496>. |
| License: | GPL (≥ 3) |
| NeedsCompilation: | yes |
| Packaged: | 2023-09-29 23:26:49 UTC; kaiyang |
| RoxygenNote: | 7.2.0 |
| Depends: | R (≥ 3.5.0) |
| Imports: | glmnet, MASS, ggplot2, maps, mapproj, knitr, rmarkdown |
| VignetteBuilder: | knitr, rmarkdown |
| Author: | Kai Yang [aut, cre], Peihua Qiu [ctb] |
| Maintainer: | Kai Yang <kayang@mcw.edu> |
| Repository: | CRAN |
| Date/Publication: | 2023-09-30 01:12:49 UTC |
Nonparametric Modeling and Monitoring of Spatio-Temporal Data
Description
Spatio-temporal data have become increasingly popular in many research fields. Such data often have complex structures that are difficult to describe and estimate. This package provides reliable tools for modeling complicated spatio-temporal data. It also includes tools for online process monitoring to detect possible change-points in a spatio-temporal process over time. More specifically, it implements the nonparametric spatio-temporal data modeling methods described in Yang and Qiu (2018, 2019, and 2022), as well as the online spatio-temporal process monitoring methods discussed in Qiu and Yang (2021 and 2023) and Yang and Qiu (2020).
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
Maintainer: Kai Yang <kayang@mcw.edu>
References
Qiu, P. and Yang, K. (2021). Effective Disease Surveillance by Using Covariate Information. Statistics in Medicine, 40, 5725-5745.
Qiu, P. and Yang, K. (2023). Spatio-Temporal Process Monitoring Using Exponentially Weighted Spatial LASSO. Journal of Quality Technology, 55, 163-180.
Yang, K. and Qiu, P. (2018). Spatio-Temporal Incidence Rate Data Analysis by Nonparametric Regression. Statistics in Medicine, 37, 2094-2107.
Yang, K. and Qiu, P. (2019). Nonparametric Estimation of the Spatio-Temporal Covariance Structure. Statistics in Medicine, 38, 4555-4565.
Yang, K. and Qiu, P. (2020). Online Sequential Monitoring of Spatio-Temporal Disease Incidence Rates. IISE Transactions, 52, 1218-1233.
Yang, K. and Qiu, P. (2022). A Three-Step Local Smoothing Approach for Estimating the Mean and Covariance Functions of Spatio-Temporal Data. Annals of the Institute of Statistical Mathematics, 74, 49-68.
Cross-validation mean squared prediction error
Description
The spatio-temporal covariance function is estimated by the weighted moment estimation method in Yang and Qiu (2019). The function cv_mspe is developed to select the bandwidths (gt, gs) used in the estimation of the spatio-temporal covariance function.
Usage
cv_mspe(y, st, gt = NULL, gs = NULL)
Arguments
y | A vector of length |
st | An |
gt | A sequence of temporal kernel bandwidth |
gs | A sequence of spatial kernel bandwidth |
Value
bandwidth | A matrix containing all the bandwidths ( |
mspe | The mean squared prediction errors for all the bandwidths provided by users. |
bandwidth.opt | The bandwidths |
mspe.opt | The minimal mean squared prediction error. |
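How the returned fields relate to one another can be pictured with a small grid search in base R. This is only a sketch: `toy_score` is a hypothetical stand-in for the cross-validated mean squared prediction error, and the one-row-per-pair layout of the bandwidth matrix is an assumption rather than something taken from the package.

```r
# Candidate bandwidths, matching the values used in the example on this page
gt <- seq(0.3, 0.4, 0.1)
gs <- seq(0.3, 0.4, 0.1)

# One row per (gt, gs) pair; this layout is an assumption
bandwidth <- as.matrix(expand.grid(gt = gt, gs = gs))

# Hypothetical stand-in for the cross-validated MSPE of one bandwidth pair
toy_score <- function(gt, gs) (gt - 0.4)^2 + (gs - 0.3)^2

score <- mapply(toy_score, bandwidth[, "gt"], bandwidth[, "gs"])

# Keep the pair with the smallest score, together with that score
best <- which.min(score)
bandwidth.opt <- bandwidth[best, ]
score.opt <- score[best]
```

The selected pair and its score play the roles of bandwidth.opt and mspe.opt in the table above.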
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Yang, K. and Qiu, P. (2019). Nonparametric Estimation of the Spatio-Temporal Covariance Structure. Statistics in Medicine, 38, 4555-4565.
Examples
library(SpTe2M)
data(sim_dat)
y <- sim_dat$y; st <- sim_dat$st
# candidate temporal and spatial bandwidths
gt <- seq(0.3,0.4,0.1); gs <- seq(0.3,0.4,0.1)
# work with the first 500 observations only
ids <- 1:500; y.sub <- y[ids]; st.sub <- st[ids,]
mspe <- cv_mspe(y.sub,st.sub,gt,gs)
Florida influenza-like illness data
Description
Daily influenza-like illness (ILI) incidence rates in 67 Florida counties during the years 2012-2014. The ILI incidence rates were collected by the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE), which was developed by the Florida Department of Health. Researchers can access the ESSENCE database after a proper online registration. Moreover, some weather conditions during the years 2012-2014 can be obtained from the official website of the National Oceanic and Atmospheric Administration of the United States. The ILI dataset used here contains 8 variables, including County, Date, Lat, Long, Time, Rate (ILI incidence rate), Temp (temperature) and RH (relative humidity), from the two databases mentioned above, where Long and Lat refer to the longitude and latitude of the geometric center of each Florida county, respectively.
Usage
data(ili_dat)
Format
A data frame containing N = 73,432 observations of 8 variables.
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
Modified cross-validation for bandwidth selection
Description
The spatio-temporal mean function can be estimated by the local linear kernel smoothing procedure (cf., Yang and Qiu 2018). The function mod_cv provides a reliable tool for selecting the bandwidths (ht, hs) used in the local linear kernel smoothing procedure in cases when data are spatio-temporally correlated.
Usage
mod_cv(y, st, ht = NULL, hs = NULL, eps = 0.1)
Arguments
y | A vector of the spatio-temporal response |
st | A three-column matrix specifying the spatial locations and times for all the spatio-temporal observations in |
ht | A sequence of temporal kernel bandwidth |
hs | A sequence of spatial kernel bandwidth |
eps | The value of this parameter is between 0 and 1. Default is 0.1. The following bimodal kernel function (cf., Yang and Qiu 2018) is used when calculating the modified cross-validation score:
The argument |
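A bimodal kernel of this kind can be sketched in base R. The form below, an Epanechnikov kernel whose center is replaced by a linear ramp through zero, is an assumption made for illustration and is not confirmed here as the exact kernel used by mod_cv; `bimodal_kernel` is a hypothetical helper name. Because the kernel vanishes at the origin, an observation gets no weight at its own location and observations very close to it get little weight, which is the usual motivation for this device with correlated data.

```r
# Hypothetical helper: Epanechnikov-based bimodal kernel (assumed form).
# For |x| >= eps it is proportional to the Epanechnikov kernel on [-1, 1];
# for |x| < eps it rises linearly from 0 at the origin.
bimodal_kernel <- function(x, eps = 0.1) {
  stopifnot(eps > 0, eps < 1)
  ax <- abs(x)
  outer_part <- 0.75 * (1 - x^2) * (ax <= 1)
  inner_part <- 0.75 * (1 - eps^2) / eps * ax
  # Normalizing constant chosen so that the kernel integrates to 1
  4 / (4 - 3 * eps - eps^3) * ifelse(ax >= eps, outer_part, inner_part)
}

bimodal_kernel(0)                       # zero weight at the origin
integrate(bimodal_kernel, -1, 1)$value  # approximately 1
```

In this sketch a larger eps widens the down-weighted region around the origin.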
Value
bandwidth | A matrix containing all the bandwidths ( |
mcv | The modified cross-validation scores for all the bandwidths provided by users. |
bandwidth.opt | The selected bandwidths |
mcv.opt | The modified cross-validation score of the selected bandwidths. |
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Yang, K. and Qiu, P. (2018). Spatio-Temporal Incidence Rate Data Analysis by Nonparametric Regression. Statistics in Medicine, 37, 2094-2107.
Examples
library(SpTe2M)
data(sim_dat)
y <- sim_dat$y; st <- sim_dat$st
# candidate temporal and spatial bandwidths
ht <- seq(0.10,0.15,0.05); hs <- seq(0.20,0.30,0.10)
# work with the first 500 observations only
ids <- 1:500; y.sub <- y[ids]; st.sub <- st[ids,]
mcv <- mod_cv(y.sub,st.sub,ht,hs,eps=0.1)
PM2.5 concentration data
Description
Daily PM2.5 concentration levels at 183 major cities in China during the years 2014-2016. This dataset was collected by the China National Environmental Monitoring Centre (CNEMC). It can be downloaded directly from the CNEMC official web page. The PM2.5 dataset used here contains 6 variables, including Year, Time, Long (longitude), Lat (latitude), City, and PM2.5.
Usage
data(pm25_dat)
Format
A data frame containing N = 200,385 observations of 6 variables.
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
A simulated spatio-temporal dataset
Description
This simulated dataset is saved as a list, and it contains the following three elements:
- y
A vector of length N; it contains the data of the observed response variable y.
- x
A vector of length N; it contains the data of the covariate x.
- st
An N \times 3 matrix containing the spatial locations and times for all the observations in the dataset.
Usage
data(sim_dat)
Format
A list containing N = 10,000 observations.
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
Examples
library(MASS)
set.seed(100)
n <- 100; m <- 100; N <- n*m
# observation times and spatial locations on a regular grid
t <- rep(seq(0.01,1,0.01),each=m)
su <- sv <- seq(0.1,1,0.1)
su <- rep(su,each=10); sv <- rep(sv,10)
su <- rep(su,n); sv <- rep(sv,n)
st <- matrix(0,N,3)
st[,1] <- su; st[,2] <- sv; st[,3] <- t
# spatio-temporal mean function
mu <- rep(0,N)
for(i in 1:N) {
  mu[i] <- 2+sin(pi*su[i])*sin(pi*sv[i])+sin(2*pi*t[i])
}
dist <- matrix(0,m,m) # distance matrix
for(i in 1:m) {
  for(j in 1:m) {
    dist[i,j] <- sqrt((su[i]-su[j])^2+(sv[i]-sv[j])^2)
  }
}
cov.s <- matrix(0,m,m) # spatial covariance matrix
for(i in 1:m) {
  for(j in 1:m) {
    cov.s[i,j] <- 0.3^2*exp(-30*dist[i,j])
  }
}
# noise that is spatially correlated and AR(1) over time
noise <- matrix(0,n,m)
noise[1,] <- MASS::mvrnorm(1,mu=rep(0,m),Sigma=cov.s)
for(i in 2:n) {
  noise[i,] <- 0.1*noise[i-1,]+sqrt(1-0.1^2)*
    MASS::mvrnorm(1,mu=rep(0,m),Sigma=cov.s)
}
noise <- c(t(noise)); x <- rnorm(N,0,0.3)
beta <- 0.5; y <- mu+x*beta+noise
sim_dat <- list(); sim_dat$y <- y
sim_dat$x <- x; sim_dat$st <- st
Estimate the spatio-temporal covariance function
Description
The function spte_covest is developed to estimate the spatio-temporal covariance V(t,t';s,s') = \mbox{Cov}(y(t,s), y(t',s')) by the weighted moment estimation procedure (cf., Yang and Qiu 2019). It should be noted that the estimated covariance from spte_covest may not be positive semidefinite and thus it may not be a legitimate covariance function. In such cases, the projection-based modification needs to be used to make it positive semidefinite (cf., Yang and Qiu 2019).
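One common way to carry out a modification of this kind is to project a symmetric matrix onto the set of positive semidefinite matrices by setting its negative eigenvalues to zero. The base R sketch below illustrates that idea on a small made-up matrix; whether it matches the modification of Yang and Qiu (2019) in every detail is an assumption, and `project_psd` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: eigenvalue-clipping projection onto PSD matrices
project_psd <- function(V) {
  V <- (V + t(V)) / 2                       # enforce symmetry
  eig <- eigen(V, symmetric = TRUE)
  lam <- pmax(eig$values, 0)                # set negative eigenvalues to zero
  eig$vectors %*% diag(lam, nrow = length(lam)) %*% t(eig$vectors)
}

# A symmetric matrix with unit diagonal that is not positive semidefinite
V <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), 3, 3)
min(eigen(V, symmetric = TRUE)$values)      # negative

V.psd <- project_psd(V)
min(eigen(V.psd, symmetric = TRUE)$values)  # no longer negative, up to rounding
```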
Usage
spte_covest(y, st, gt = NULL, gs = NULL, stE1 = NULL, stE2 = NULL)
Arguments
y | A vector of length |
st | An |
gt | The temporal kernel bandwidth |
gs | The spatial kernel bandwidth |
stE1 | An |
stE2 | An |
Value
stE1 | Same as the one in the arguments. |
stE2 | Same as the one in the arguments. |
bandwidth | The bandwidths |
covhat | An |
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Yang, K. and Qiu, P. (2019). Nonparametric Estimation of the Spatio-Temporal Covariance Structure. Statistics in Medicine, 38, 4555-4565.
Examples
library(SpTe2M)
data(sim_dat)
y <- sim_dat$y; st <- sim_dat$st
# work with the first 500 observations only
ids <- 1:500; y.sub <- y[ids]; st.sub <- st[ids,]
cov.est <- spte_covest(y.sub,st.sub)
Decorrelate the spatio-temporal data
Description
The function spte_decor uses the estimated spatio-temporal mean and covariance to decorrelate the observed spatio-temporal data. After data decorrelation, each decorrelated observation should have an asymptotic mean of 0 and an asymptotic variance of 1, and the decorrelated observations should be asymptotically uncorrelated with each other.
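The idea behind decorrelation can be sketched in base R for the simplified case where the mean vector mu and the covariance matrix Sigma are known. With the Cholesky factorization Sigma = L L', the transformed vector L^{-1}(y - mu) has mean 0 and identity covariance. The package works with estimated quantities instead, and its procedure may differ in detail, so the code below is an illustration under these assumptions and not the implementation of spte_decor.

```r
set.seed(1)
m <- 4
mu <- rep(2, m)
# AR(1)-type covariance matrix, used here only as a toy example
Sigma <- 0.5^abs(outer(1:m, 1:m, "-"))

# Lower-triangular Cholesky factor, so that Sigma = L %*% t(L)
L <- t(chol(Sigma))

# Simulate one correlated vector, then decorrelate it
y <- mu + drop(L %*% rnorm(m))
e <- forwardsolve(L, y - mu)     # e = L^{-1} (y - mu)

# The same linear transform maps Sigma to the identity matrix
A <- solve(L)
round(A %*% Sigma %*% t(A), 10)
```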
Usage
spte_decor(y, st, y0, st0, T = 1, ht = NULL, hs = NULL, gt = NULL, gs = NULL)
Arguments
y | A vector of |
st | A three-column matrix specifying the spatial locations and observation times of the observations to decorrelate. |
y0 | A vector of |
st0 | A three-column matrix specifying the spatial locations and times for all the spatio-temporal observations in |
T | The period of the spatio-temporal mean and covariance. Default value is 1. |
ht | The temporal kernel bandwidth |
hs | The spatial kernel bandwidth |
gt | The temporal kernel bandwidth |
gs | The spatial kernel bandwidth |
Value
st | Same as the one in the arguments. |
std.res | The decorrelated data. |
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Yang, K. and Qiu, P. (2020). Online Sequential Monitoring of Spatio-Temporal Disease Incidence Rates. IISE Transactions, 52, 1218-1233.
Examples
library(SpTe2M)
data(sim_dat)
y <- sim_dat$y; st <- sim_dat$st
# work with the first 500 observations only
ids <- 1:500; y.sub <- y[ids]; st.sub <- st[ids,]
# the same subset serves as the data to decorrelate and as (y0, st0)
decor <- spte_decor(y.sub,st.sub,y0=y.sub,st0=st.sub)
Estimate the spatio-temporal mean function
Description
The function spte_meanest provides a major tool for estimating the spatio-temporal mean function nonparametrically (cf., Yang and Qiu 2018 and 2022).
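As a one-dimensional illustration of local linear kernel smoothing, which is the kind of nonparametric estimator referred to in Yang and Qiu (2018), the sketch below fits a kernel-weighted least-squares line around each evaluation point and keeps its intercept. The Epanechnikov kernel and the single bandwidth `h` are choices made for this example, and `local_linear` is a hypothetical helper rather than part of the package, which smooths over space and time jointly.

```r
# Hypothetical helper: local linear estimate of E(y | x = x0) in one dimension
local_linear <- function(x, y, x0, h) {
  sapply(x0, function(u) {
    d <- (x - u) / h
    w <- 0.75 * (1 - d^2) * (abs(d) <= 1)    # Epanechnikov kernel weights
    X <- cbind(1, x - u)                     # local intercept and slope
    beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))
    beta[1]                                  # intercept = fitted value at u
  })
}

set.seed(1)
x <- seq(0, 1, length.out = 101)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.2)
muhat <- local_linear(x, y, x0 = c(0.25, 0.5, 0.75), h = 0.15)
```

A useful check of such an estimator is that it reproduces an exactly linear mean function without bias, including at the boundary of the design region.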
Usage
spte_meanest(y, st, ht = NULL, hs = NULL, cor = FALSE, stE = NULL)
Arguments
y | A vector of spatio-temporal observations. |
st | A three-column matrix specifying the spatial locations and times for all the spatio-temporal observations in |
ht | The temporal kernel bandwidth |
hs | The spatial kernel bandwidth |
cor | A logical indicator where |
stE | A three-column matrix specifying the spatial locations and times where we want to calculate the estimate of the mean. Default is NULL, and |
Value
bandwidth | The bandwidths ( |
stE | Same as the one in the arguments. |
muhat | The estimated mean values at the spatial locations and times specified by |
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Yang, K. and Qiu, P. (2018). Spatio-Temporal Incidence Rate Data Analysis by Nonparametric Regression. Statistics in Medicine, 37, 2094-2107.
Yang, K. and Qiu, P. (2022). A Three-Step Local Smoothing Approach for Estimating the Mean and Covariance Functions of Spatio-Temporal Data. Annals of the Institute of Statistical Mathematics, 74, 49-68.
Examples
library(SpTe2M)
data(sim_dat)
y <- sim_dat$y; st <- sim_dat$st
# work with the first 500 observations only
ids <- 1:500; y.sub <- y[ids]; st.sub <- st[ids,]
mean.est <- spte_meanest(y.sub,st.sub)
Fit the semiparametric spatio-temporal model
Description
The function spte_semiparmreg fits the semiparametric spatio-temporal model to study the relationship between the response y and covariates \bm{x} by the method discussed in Qiu and Yang (2021), in which an iterative algorithm is used to compute the estimated regression coefficients.
Usage
spte_semiparmreg(y, st, x, ht = NULL, hs = NULL, maxIter = 1000, tol = 10^(-4), stE = NULL)
Arguments
y | A vector of length |
st | An |
x | An |
ht | The temporal kernel bandwidth |
hs | The spatial kernel bandwidth |
maxIter | A positive integer specifying the maximum number of iterations allowed. Default value is 1,000. |
tol | A positive numeric value specifying the tolerance level for the convergence criterion. Default value is 0.0001. |
stE | A three-column matrix specifying the spatial locations and times where we want to calculate the estimate of the mean. Default is NULL, and |
Value
bandwidth | The bandwidths ( |
stE | Same as the one in the arguments. |
muhat | The estimated mean values at spatial locations and times specified by |
beta | The vector of estimated regression coefficients. |
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Qiu, P. and Yang, K. (2021). Effective Disease Surveillance by Using Covariate Information. Statistics in Medicine, 40, 5725-5745.
Examples
library(SpTe2M)
data(sim_dat)
y <- sim_dat$y; st <- sim_dat$st; x <- sim_dat$x
# work with the first 500 observations only
ids <- 1:500; y.sub <- y[ids]; st.sub <- st[ids,]; x.sub <- x[ids]
# cap the iterative algorithm at two iterations
semi.est <- spte_semiparmreg(y.sub,st.sub,x.sub,maxIter=2)
Online spatio-temporal process monitoring by a CUSUM chart
Description
The function sptemnt_cusum implements the sequential online monitoring procedure described in Yang and Qiu (2020).
Usage
sptemnt_cusum(y, st, type, ARL0 = 200, gamma = 0.1, B = 1000, bs = 5, T = 1, ht = NULL, hs = NULL, gt = NULL, gs = NULL)
Arguments
y | A vector of |
st | An |
type | A vector of |
ARL0 | The pre-specified IC average run length. Default is 200. |
gamma | The pre-specified allowance constant in the CUSUM chart. Default is 0.1. |
B | The bootstrap size used in the block bootstrap procedure for determining the control limit. Default value is 1,000. |
bs | The block size of the block bootstrap procedure. Default value is 5. |
T | The period of the spatio-temporal mean and covariance. Default value is 1. |
ht | The temporal kernel bandwidth |
hs | The spatial kernel bandwidth |
gt | The temporal kernel bandwidth |
gs | The spatial kernel bandwidth |
Value
ARL0 | Same as the one in the arguments. |
gamma | Same as the one in the arguments. |
cstat | The charting statistics which can be used to make a plot for the control chart. |
cl | The control limit that is determined by the block bootstrap. |
signal_time | The signal time (i.e., the first time point when the charting statistic |
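For orientation, a generic upward CUSUM recursion with allowance constant gamma is C_n = max(0, C_{n-1} + z_n - gamma), and the chart signals at the first time C_n exceeds the control limit. The base R sketch below applies this recursion to a made-up sequence `z` with a made-up control limit. It does not reproduce the charting statistic of Yang and Qiu (2020) or the block bootstrap that determines the control limit in the package; `cusum_chart` is a hypothetical helper, and the return names merely mirror the table above.

```r
# Hypothetical helper: generic upward CUSUM on a numeric sequence z
cusum_chart <- function(z, gamma = 0.1, cl) {
  cstat <- numeric(length(z))
  prev <- 0
  for (i in seq_along(z)) {
    prev <- max(0, prev + z[i] - gamma)   # reset to 0 when the sum goes negative
    cstat[i] <- prev
  }
  signal <- which(cstat > cl)
  list(cstat = cstat, cl = cl,
       signal_time = if (length(signal) > 0) signal[1] else NA_integer_)
}

# Made-up input: the level shifts upward from the third observation on
z <- c(0.25, -0.5, 0.75, 0.75, 0.75)
out <- cusum_chart(z, gamma = 0.25, cl = 1.25)
out$cstat
out$signal_time
```

A larger allowance constant makes the chart less sensitive to small drifts, since more of each observation is subtracted before accumulation.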
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Yang, K. and Qiu, P. (2020). Online Sequential Monitoring of Spatio-Temporal Disease Incidence Rates. IISE Transactions, 52, 1218-1233.
Examples
library(SpTe2M)
data(ili_dat)
n <- 365; m <- 67
y <- ili_dat$Rate; st <- ili_dat[,3:5]
# label each observation as IC1, IC2 or Mnt by its position in the data
type <- rep(c('IC1','IC2','Mnt'),c(m*(n+1),(m*n),(m*n)))
# keep the first 5*m observations of each of the three periods
ids <- c(1:(5*m),((n+1)*m+1):(m*(n+6)),((2*n+1)*m+1):(m*(2*n+6)))
y.sub <- y[ids]; st.sub <- st[ids,]; type.sub <- type[ids]
ili.cusum <- sptemnt_cusum(y.sub,st.sub,type.sub,ht=0.05,hs=6.5,gt=0.25,gs=1.5)
Spatio-temporal process monitoring using covariate information
Description
The function sptemnt_ewmac is developed to solve spatio-temporal process monitoring problems in cases when the information in covariates needs to be used. Please refer to Qiu and Yang (2021) for more details of the method.
Usage
sptemnt_ewmac(y, x, st, type, ARL0 = 200, ARL0.z = 200, lambda = 0.1, B = 1000, bs = 5, T = 1, ht = NULL, hs = NULL, gt = NULL, gs = NULL)
Arguments
y | A vector of |
x | An |
st | An |
type | A vector of |
ARL0 | The pre-specified IC average run length. Default is 200. |
ARL0.z | The pre-specified IC average run length for the covariate chart. Default is 200. Usually, set |
lambda | The pre-specified weighting parameter in the EWMAC chart. Default is 0.1. |
B | The bootstrap size used in the block bootstrap procedure for determining the control limit. Default value is 1,000. |
bs | The block size of the block bootstrap procedure. Default value is 5. |
T | The period of the spatio-temporal mean and covariance. Default value is 1. |
ht | The temporal kernel bandwidth |
hs | The spatial kernel bandwidth |
gt | The temporal kernel bandwidth |
gs | The spatial kernel bandwidth |
Value
ARL0 | Same as the one in the arguments. |
lambda | Same as the one in the arguments. |
cstat | The charting statistics which can be used to make a plot for the control chart. |
cl | The control limit that is determined by the block bootstrap. |
signal_time | The signal time (i.e., the first time point when the charting statistic |
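The weighting parameter lambda plays the usual role it has in an exponentially weighted moving average, E_n = lambda * z_n + (1 - lambda) * E_{n-1}, so smaller values give more weight to history. The base R sketch below shows only this generic recursion on a made-up sequence. The EWMAC chart of Qiu and Yang (2021) additionally uses covariate information and a bootstrap control limit, neither of which is modeled here, and `ewma` is a hypothetical helper.

```r
# Hypothetical helper: generic EWMA recursion on a numeric sequence z
ewma <- function(z, lambda = 0.1, start = 0) {
  stopifnot(lambda > 0, lambda <= 1)
  e <- numeric(length(z))
  prev <- start
  for (i in seq_along(z)) {
    prev <- lambda * z[i] + (1 - lambda) * prev   # blend new value with history
    e[i] <- prev
  }
  e
}

# Made-up input: the level jumps from 0 to 4 at the third observation
z <- c(0, 0, 4, 4, 4)
ewma(z, lambda = 0.5)   # moves toward 4 step by step
ewma(z, lambda = 1)     # lambda = 1 reproduces z itself
```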
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Qiu, P. and Yang, K. (2021). Effective Disease Surveillance by Using Covariate Information. Statistics in Medicine, 40, 5725-5745.
Examples
library(SpTe2M)
data(ili_dat)
n <- 365; m <- 67
# columns 7 and 8 of ili_dat are used as the covariates
y <- ili_dat$Rate; x <- as.matrix(ili_dat[,7:8]); st <- ili_dat[,3:5]
type <- rep(c('IC1','IC2','Mnt'),c(m*(n+1),(m*n),(m*n)))
# keep the first 5*m observations of each of the three periods
ids <- c(1:(5*m),((n+1)*m+1):(m*(n+6)),((2*n+1)*m+1):(m*(2*n+6)))
y.sub <- y[ids]; x.sub <- x[ids,]; st.sub <- st[ids,]; type.sub <- type[ids]
ili.ewmac <- sptemnt_ewmac(y.sub,x.sub,st.sub,type.sub,ht=0.05,hs=6.5,gt=0.25,gs=1.5)
Spatio-temporal process monitoring using exponentially weighted spatial LASSO
Description
Implementation of the online spatio-temporal process monitoring procedure described in Qiu and Yang (2023), in which spatial locations with the detected shifts are guaranteed to be small clustered spatial regions by the exponentially weighted spatial LASSO.
Usage
sptemnt_ewsl(y, st, type, ARL0 = 200, lambda = 0.1, B = 1000, bs = 5, T = 1, ht = NULL, hs = NULL, gt = NULL, gs = NULL)
Arguments
y | A vector of |
st | An |
type | A vector of |
ARL0 | The pre-specified IC average run length. Default is 200. |
lambda | The pre-specified weighting parameter in the exponentially weighted spatial LASSO (EWSL) chart. Default is 0.1. |
B | The bootstrap size used in the block bootstrap procedure for determining the control limit. Default value is 1,000. |
bs | The block size of the block bootstrap procedure. Default value is 5. |
T | The period of the spatio-temporal mean and covariance. Default value is 1. |
ht | The temporal kernel bandwidth |
hs | The spatial kernel bandwidth |
gt | The temporal kernel bandwidth |
gs | The spatial kernel bandwidth |
Value
ARL0 | Same as the one in the arguments. |
lambda | Same as the one in the arguments. |
cstat | The charting statistics which can be used to make a plot for the control chart. |
cl | The control limit that is determined by the block bootstrap. |
signal_time | The signal time (i.e., the first time point when the charting statistic |
Author(s)
Kai Yang <kayang@mcw.edu> and Peihua Qiu
References
Qiu, P. and Yang, K. (2023). Spatio-Temporal Process Monitoring Using Exponentially Weighted Spatial LASSO. Journal of Quality Technology, 55, 163-180.
Examples
library(SpTe2M)
data(ili_dat)
n <- 365; m <- 67
y <- ili_dat$Rate; st <- ili_dat[,3:5]
# label each observation as IC1, IC2 or Mnt by its position in the data
type <- rep(c('IC1','IC2','Mnt'),c(m*(n+1),(m*n),(m*n)))
# keep the first 5*m observations of each of the three periods
ids <- c(1:(5*m),((n+1)*m+1):(m*(n+6)),((2*n+1)*m+1):(m*(2*n+6)))
y.sub <- y[ids]; st.sub <- st[ids,]; type.sub <- type[ids]
ili.ewsl <- sptemnt_ewsl(y.sub,st.sub,type.sub,ht=0.05,hs=6.5,gt=0.25,gs=1.5)