Spatio-temporal data are common in practice. Such data often havecomplex structures that are difficult to describe and estimate. Toovercome this difficulty, theR packageSpTe2M has been developed to estimate the underlyingspatio-temporal mean and covariance structures. This package alsoincludes tools of online process monitoring to detect change-points in aspatio-temporal process over time. More specifically, it implements thenonparametric spatio-temporal data modeling methods described in Yangand Qiu (2018, 2019, and 2022), as well as the online spatio-temporalprocess monitoring methods discussed in Qiu and Yang (2021 and 2023) andYang and Qiu (2020). In this vignette, we will demonstrate the mainfunctions inSpTe2M by using the Florida influenza-likeillness (ILI) data, which is a built-in dataset of the package.
The Florida ILI dataset, named asili_dat inSpTe2M, contains daily ILI incidence rates at 67Florida counties during years 2012-2014. The observed IIL incidencerates was collected by the Electronic Surveillance System for the EarlyNotification of Community-based Epidemics of the Florida Department ofHealth. In addition to observations of the variableRate(i.e., ILI incidence rate), the dataset also includes observations ofthe following 7 variables:County,Date,Lat (i.e., latitude of the geometric center of a county),Long (i.e., longitude of the geometric center of a county),Time,Temp (i.e., air temperature), andRH (i.e., relative humidity). First of all, let us installand load the packageSpTe2M and take a quick look atthe ILI data.
First, we run the followingR code to install thepackage from CRAN and load the package inR:
Then, we read the built-in datasetili_dat and use theR functionhead() to display its first 6rows.
data(ili_dat)head(ili_dat)#> County Date Lat Long Time Rate Temp RH#> 1 alachua 1/1/2012 29.65320 -82.32440 0.00273224 8.020372e-06 60.5 78.6#> 2 baker 1/1/2012 30.27236 -82.21351 0.00273224 0.000000e+00 62.0 78.6#> 3 bay 1/1/2012 30.16190 -85.65297 0.00273224 3.532404e-05 61.5 78.6#> 4 bradford 1/1/2012 29.96893 -82.12255 0.00273224 0.000000e+00 61.0 78.6#> 5 brevard 1/1/2012 28.70765 -80.89049 0.00273224 0.000000e+00 68.0 78.6#> 6 broward 1/1/2012 26.05192 -80.14526 0.00273224 2.527846e-05 71.0 78.6Next, we make some geographic maps to investigate the spatio-temporalpatterns of the ILI incidence. To do this, we first combineili_dat with the map shape files included in the packagesmaps andmapproj and then create mapsby using the functionggplot() in the packageggplot2.
library(maps)library(ggplot2)library(mapproj)# turn shape files in the maps packages into a data frameFL<-map_data('county','florida')names(FL)[6]<-'County'# Only plot maps on Jun 15 and Dec 15subdat<-subset(ili_dat,Date%in%c('6/15/2012','6/15/2013','6/15/2014','12/15/2012','12/15/2013','12/15/2014'))subdat$Date<-factor(subdat$Date,levels=c('6/15/2012','6/15/2013','6/15/2014','12/15/2012','12/15/2013','12/15/2014'))# merge ILI data with map datamydat<-merge(FL,subdat)# make maps using ggplotmaps<-ggplot(data=mydat,aes(long,lat,group=group,fill=Rate))+geom_polygon()+facet_wrap(~Date,ncol=3)+geom_path(colour='grey10',lwd=0.5)+scale_fill_gradient2('',low='cyan',mid='white',high='navy',guide='colorbar',limits=c(0,0.0003),na.value='orange',breaks=c(0,0.0001,0.0002,0.0003),labels=c('0','1e-4','2e-4','3e-4'))+guides(fill=guide_colorbar(barwidth=25,barheight=1,direction='horizontal'),cex=1)+theme_bw(base_size=15)+xlab('longitude')+ylab('latitude')+theme(legend.position='bottom',axis.ticks=element_blank(),line=element_blank(),axis.text=element_blank(),panel.border=element_rect(color='black',linewidth=1.2),axis.line=element_line(colour='black'),legend.margin=margin(-10,0,0,0),legend.box.margin=margin(0,0,0,0))# display the mapsmapsFrom the maps, it can be seen that ILI incidence rates have obviousseasonal patterns with more ILI cases in the winters and fewer ILI casesin the summers. Moreover, there seems to be an unusual pattern of ILIincidence rates in the winter of 2014 because the incidence rates on12/15/2014 are much higher than those on 12/15/2012 and 12/15/2013.Next, we appply the packageSpTe2M to this dataset toestimate the spatio-temporal mean and covariance structures and monitorthe ILI incidence sequentially over time.
In this part, we estimate the spatio-temporal mean and covariancestructures in year 2013 by using the functionsspte_meanest() andspte_covest() inSpTe2M. To this end, we first use the following code toextract the observed ILI data in year 2013.
n<-365; m<-67; N1<- (n+1)*m; N2<- n*m# extract the observed ILI data in year 2013ili13<- ili_dat[(N1+1):(N1+N2),]y13<- ili13$Rate; st13<- ili13[,c('Lat','Long','Time')]After obtaining the ILI data in year 2013, the spatio-temporal meanstructure can be estimated by applying the functionspte_meanest() to the extracted data.
Note that we don’t specify the argumentsht andhs when using the functionspte_meanest(). Insuch a case, the two bandwidthsht andhswould be selected automatically by the modified cross-validationprocedure (cf., Yang and Qiu 2018).
To visually check whether the estimated means describe the observedILI data well, we use the code below to plot the estimated means for the4 Florida counties Broward, Lake, Pinellas, and Seminole, along with theobserved ILI incidence rates at the 4 counties.
mu<- mu.est$muhat; mu<-t(matrix(mu,m,n))obs<-t(matrix(y13,m,n))ids<-c(6,34,52,57)# IDs for Broward, Lake, Pinellas, Seminolepar(mfrow=c(2,2),mgp=c(2.4,1,0))par(mar=c(3.5,3.5,1.5,0.5))plot(1:365,mu[,ids[1]],type='l',lty=1,lwd=1.5,xaxt='n',ylim=c(0,8e-5),main='Broward',xlab='Time',ylab='Incidence rate',cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)points(1:365,obs[,ids[1]],cex=1)axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306,367)),label=c('Jan','Mar','May','July','Sep','Nov','Jan'))par(mar=c(3.5,3,1.5,1))plot(1:365,mu[,ids[2]],type='l',lty=1,lwd=1.5,xaxt='n',ylim=c(0,8e-5),main='Lake',xlab='Time',ylab='',cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)points(1:365,obs[,ids[2]],cex=1)axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306,367)),label=c('Jan','Mar','May','July','Sep','Nov','Jan'))par(mar=c(3.5,3.5,1.5,0.5))plot(1:365,mu[,ids[3]],type='l',lty=1,lwd=1.5,xaxt='n',ylim=c(0,8e-5),main='Pinellas',xlab='Time',ylab='Incidence rate',cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)points(1:365,obs[,ids[3]],cex=1)axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306,367)),label=c('Jan','Mar','May','July','Sep','Nov','Jan'))par(mar=c(3.5,3,1.5,1))plot(1:365,mu[,ids[4]],type='l',lty=1,lwd=1.5,xaxt='n',ylim=c(0,8e-5),main='Seminole',xlab='Time',ylab=' ',cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)points(1:365,obs[,ids[4]],cex=1)axis(1.2,at=c(1+c(1,62,123,184,245,306,367)),label=c('Jan','Mar','May','July','Sep','Nov','Jan'))It can be seen from the figure that the estimated spatio-temporalmean structure can well describe the longitudinal pattern of theobserved ILI incidence rates. Thus, the functionspte_meanest() provides a reliable tool for estimating thespatio-temporal mean function of the ILI data.
To estimate the spatio-temporal covariance, we can use the followingcode:
In the above command, we don’t specify the argumentsgtandgs. Then, the two bandwidthsgt andgs would be determined automatically by minimizing the meansquared prediction error, as discussed in Yang and Qiu (2019).
In this part, we discuss how to monitor the observed ILI data byusing the functionssptemnt_cusum(),sptemnt_ewmac() andsptemnt_ewsl(), whichcorrespond to the CUSUM chart (Yang and Qiu 2020), the EWMAC chart (Qiuand Yang 2021) and the EWSL chart (Qiu and Yang 2023), respectively.
Notice that the construction of the CUSUM chart (cf., Yang and Qiu2020) consists of 2 main steps. The first step is to model an in-control(IC) spatio-temporal dataset and estimate the regular spatio-temporalpattern of the IC process. Then, in the second step, the control limitof the chart is first determined by the block bootstrap procedure (cf.,Yang and Qiu 2020) and the chart can then be used for online processmonitoring. In this example, we use the data in years 2012 and 2013 asthe IC data for setting up the chart. These IC observations are dividedinto two parts: the IC data in year 2013 are used for estimating themean and covariance functions of the IC spatio-temporal data, and thedata in year 2012 are used for determining the control limit of thechart. After that, the chart starts to monitor the ILI incidence ratesfrom the beginning of year 2014.
y<- ili_dat$Rate; st<- ili_dat[,c('Lat','Long','Time')]# specify the argument type# data with type=IC1 are used to determine the control limit# data with type=IC2 are used to estimate the regular pattern# data with type =Mnt are used for sequential process monitoringtype<-rep(c('IC1','IC2','Mnt'),c(N1,N2,N2))By using the code below, we can monitor the ILI incidence rates inyear 2014 by the CUSUM chart:
In this example, the bandwidthsht andhsfor estimating the mean function are chosen based on the modifiedcross-validation procedure (cf., Yang and Qiu 2018), which can beimplemented by using the functionmod_cv() inSpTe2M, and the selected bandwidths forhtandhs are 0.05 and 6.50, respectively. Regarding thebandwidthsgt andgs for estimating thecovariance function, they are determined by the mean square predictionerror criterion (cf., Yang and Qiu 2019). After running the functioncv_mspe(), they are chosen to be 0.25 and 1.50,respectively, in this example.
After obtaining the charting statistics from the above code, we canplot the charting statistic values versus the observation times by usingthe following code:
cstat<- ili.cusum$cstat; cl<- ili.cusum$clpar(mfrow=c(1,2),mgp=c(2.1,1,0))# plot the CUSUM chartpar(mar=c(3.5,3.5,1.5,0.5))plot(1:n,cstat,type="l",xlab="Time",xaxt="n",ylab="Charting statistic",main='CUSUM',cex=0.6,lwd=1.5,cex.lab=0.7,cex.axis=0.7,cex.main=0.8)abline(h=cl,lty=2,lwd=1.5)axis(1,cex.axis=0.7,at=c(1,59,121,182,243,304,365),labe=c('Jan','Mar','May','July','Sep','Nov','Jan'))# plot the zoom-in partpar(mar=c(3.5,3,1.5,1))plot(259:305,cstat[259:305],type="b",xlab="Time",xaxt="n",ylab=" ",main='First signal time: 10/16/2014',cex=0.6,lwd=1.5,cex.lab=0.7,cex.axis=0.7,cex.main=0.8)abline(h=cl,lty=2,lwd=1.5)axis(1,cex.axis=0.7,at=c(259,274,289,304),label=c('Sep 15','Sep 30','Oct 15','Oct 31'))The resulting plot is shown in the left panel of the above figure. Tobetter perceive charting statistic values around its first signal time10/16/2014, the charting statistic values during the time period from09/15/2014 to 10/31/2014 are presented in the right panel of thefigure.
In the ILI dataset, there are observations of the two covariatesTemp andRH available. It is our belief thatthe performance of a control chart can be improved by using theinformation in the covariates. To take advantage of the covariateinformation, we can use the EWMAC chart (Qiu and Yang 2021) for processmonitoring, which can be implemented by using the functionsptemnt_ewmac(). The following code is an example about howto use the functionsptemnt_ewmac() to monitor the ILIincidence rates in 2014:
x<-as.matrix(ili_dat[,c('Temp','RH')])# the covariatesili.ewmac<-sptemnt_ewmac(y,x,st,type,ht=0.05,hs=6.5,gt=0.25,gs=1.5)After running the above code, we use the code below to plot the EWMACchart, as well as its zoom-in part during the time period from09/15/2014 to 10/31/2014.
cstat<- ili.ewmac$cstat; cl<- ili.ewmac$clpar(mfrow=c(1,2),mgp=c(2.1,1,0))# plot the EWMAC chartpar(mar=c(3.5,3.5,1.5,0.5))plot(1:n,cstat,type="l",xlab="Time",xaxt="n",ylab="Charting statistic",main='EWMAC',cex=0.6,lwd=1.5,cex.lab=0.7,cex.axis=0.7,cex.main=0.8)abline(h=cl,lty=2,lwd=1.5)axis(1,cex.axis=0.7,at=c(1,59,121,182,243,304,365),labe=c('Jan','Mar','May','July','Sep','Nov','Jan'))# plot its zoom-in partpar(mar=c(3.5,3,1.5,1))plot(259:305,cstat[259:305],type="b",xlab="Time",xaxt="n",ylab=" ",main='First signal time: 9/23/2014',cex=0.6,lwd=1.5,cex.lab=0.7,cex.axis=0.7,cex.main=0.8)abline(h=cl,lty=2,lwd=1.5)axis(1,cex.axis=0.7,at=c(259,274,289,304),label=c('Sep 15','Sep 30','Oct 15','Oct 31'))From the figure, it is clear that the EWMAC chart gives its firstsignal on 09/23/2014, which is 23 days earlier than the first signal ofthe CUSUM chart. So, it is beneficial to use the covariate informationwhen monitoring the ILI incidence rates.
In many applications, anomalies in spatio-temporal processes start insome small regions that are spatially clustered. To take into accountthis spatial feature, the EWSL chart was developed for spatio-temporalprocess monitoring, which is effective in cases when anomalies occur inspatially clustered regions.
Next, we demonstrate the application of the EWSL chart (cf., Qiu andYang 2023) by using the function sptemnt_ewsl(). To this end, let us usethe code as follows:
The plot of the EWSL chart and its zoom-in part during the periodfrom 09/15/2014 to 10/31/2014 can be generated by the code givenbelow.
cstat<- ili.ewsl$cstat; cl<- ili.ewsl$clpar(mfrow=c(1,2),mgp=c(2.1,1,0))# plot the EWSL chartpar(mar=c(3.5,3.5,1.5,0.5))plot(1:n,cstat,type="l",xlab="Time",xaxt="n",ylab="Charting statistic",main='EWSL',cex=0.6,lwd=1.5,cex.lab=0.7,cex.axis=0.7,cex.main=0.8)abline(h=cl,lty=2,lwd=1.5)axis(1,cex.axis=0.8,at=c(1,59,121,182,243,304,365),labe=c('Jan','Mar','May','July','Sep','Nov','Jan'))# plot its zoom-in partpar(mar=c(3.5,3,1.5,1))plot(259:305,cstat[259:305],type="b",xlab="Time",xaxt="n",ylab=" ",main='First signal time: 10/6/2014',cex=0.6,lwd=1.5,cex.lab=0.7,cex.axis=0.7,cex.main=0.8)abline(h=cl,lty=2,lwd=1.5)axis(1,cex.axis=0.7,at=c(259,274,289,304),label=c('Sep 15','Sep 30','Oct 15','Oct 31'))It can be seen from the figure that the EWSL chart gives its firstsignal on 10/6/2014, which is 10 days earlier than that of the CUSUMchart.
Qiu, P. and Yang, K. (2021). Effective Disease Surveillance byUsing Covariate Information.Statistics in Medicine,40, 5725-5745.
Qiu, P. and Yang, K. (2023). Spatio-Temporal Process MonitoringUsing Exponentially Weighted Spatial LASSO.Journal of QualityTechnology,55, 163-180.
Yang, K. and Qiu, P. (2018). Spatio-Temporal Incidence Rate DataAnalysis by Nonparametric Regression.Statistics in Medicine,37, 2094-2107.
Yang, K. and Qiu, P. (2019). Nonparametric Estimation of theSpatio-Temporal Covariance Structure.Statistics in Medicine,38, 4555-4565.
Yang, K. and Qiu, P. (2020). Online Sequential Monitoring ofSpatio-Temporal Disease Incidence Rates.IISE Transactions,52, 1218-1233.
Yang, K. and Qiu, P. (2022). A Three-Step Local SmoothingApproach for Estimating the Mean and Covariance Functions ofSpatio-Temporal Data.Annals of the Institute of StatisticalMathematics,74, 49-68.