Thesmerc package is focused onStatisticalMethods forRegionalCounts. It particularlyfocuses on the spatial scan method (Kulldorff, 1997) and many of itsextensions.
In what follows, we demonstrate some of the basic functionality ofthesmerc package using 281 observations related toleukeumia cases in an 8 county area of the state of New York. The datawere made available in Waller and Gotway (2005) and details are providedthere.
The leukemia data are available in thenydf dataframe.Each row of the dataframe contains information regarding a differentregion. Each row of the dataframe includeslongitude andlatitude information for the centroid of each region,transformedx andy coordinates available inthe original dataset provided by Waller and Gotway (2005), as well asthepopulation of each region and the number of leukemiacases.
## # This research was partially supported under NSF grants 1463642 and 1915277## 'data.frame': 281 obs. of 6 variables:## $ longitude : num -75.9 -75.9 -75.9 -75.9 -75.9 ...## $ latitude : num 42.1 42.1 42.1 42.1 42.1 ...## $ population: num 3540 3560 3739 2784 2571 ...## $ cases : num 3.08 4.08 1.09 1.07 3.06 ...## $ x : num 4.07 4.64 5.71 7.61 7.32 ...## $ y : num -67.4 -66.9 -67 -66 -67.3 ...We plot the study area below using information in the dataobject.
## Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUEThe spatial scan test can be perfomed using thescan.test function. The user must supply the coordinates ofthe region centroids, the number of cases in each region, and thepopulation (i.e., the number persons at risk). There are many additionaloptional arguments described in the documentation. In the followingexample, we reduce the number of Monte Carlo simulations used toestimate the p-value.
coords= nydf[,c("x","y")]# extract coordinatescases= nydf$cases# extract casespop= nydf$population# extract populationscan_out=scan.test(coords, cases, pop,nsim =99)# perform scan test## computing statistics for simulated data:Thescan.test function (and most of the other*.test functions in thesmerc package)will return an object of classsmerc_cluster.
## [1] "smerc_cluster"Asmerc_cluster object has a default print option thatsummarizes relevant details about the object produced by thefunction.
In this case, we receive the following:
scan_out# print scan.test results## method: circular scan## statistic: poisson## simulation: multinomial## realizations: 99## population upperbound: 50%## minimum cases: 2In this particular case, we see that thescan_out objectsummarizes the results of thecircular scan method using apoisson statistic (the other choice isbinomial). The Monte Carlo p-value was estimated using99 realizations from amultinomialdistribution (poisson andbinomial are theother possibilities). The candidate zones were limited to having no morethan50% of the total population and at least2 cases had to be in a candidate zone to be considered acluster. Other methods will provide some of the same information, butsome of the information will be different based on the relevantparameters of the models.
Asmerc_cluster also has asummary functionthat summarizes the significant (or most likely) clusters returned bythe method.
## nregions max_dist cases ex rr stat p## 1 24 12.3 95.33108 55.8 1.8 13.1 0.01## 2 11 26.5 49.71990 27.1 1.9 8.0 0.07Thesummary ofscan_out reveals that therewere two significant clusters. The first cluster was comprised of 24regions with a maximum distance of 12.3 between centroids. The number ofcases observed in the cluster is 95.33, while 55.8 cases were expected.The relative risk of the cluster is 1.8, with an associated teststatistic of 13.1, and a Monte Carlo p-value of 0.01. A second clusteris also summarized with similar information.
A very basicplot mechanism is provided forsmerc_cluster objects. The plot will show the locations ofevery centroid, with significant clusters shown with different colorsand symbols. An attempt is also made to show the connectivity of theregions. In the plot below, we see the two significant clusters shown inyellow (middle) and purple (bottom).
If you want to extract the detected clusters from asmerc_cluster object, then theclustersfunction can be used. The function returns a list: each element of thelist is a vector with the indices of the regions included in thecluster. The first element of the list contains the regions comprisingthe most likely cluster, the second element contains the regions for thesecond most likely cluster, etc.
## [[1]]## [1] 52 50 49 48 15 16 1 51 37 38 47 2 14 39 53 13 34 40 3 44 17 43 12 46## ## [[2]]## [1] 88 86 87 89 92 91 85 93 84 90 259If a polygon-like structure is associated with each region, then thecolor.clusters can be used to easily produce nicer plots.E.g., we can use thenysf object to create a nicer map ofour results.
Other scan methods available include:
elliptic.test}.flex.test}.rflex.test}.uls.test}.dmst.test}.edmst.test}.dc.test}.mlink.test}.fast.test}.mlf.test}.Other non-scan method for the detection of clusters and/or clusteringof cases for regional data are provided.
bn.test performs the Besag-Newell test (Besag andNewell, 1991) and returns asmerc_cluster. Theprint,summary, andplot methodsare available, as before.
bn_out=bn.test(coords = coords,cases = cases,pop = pop,cstar =20,alpha =0.01)# perform besag-newell testbn_out# print results## method: Besag-Newell## case radius: 20## modified p-value: FALSEsummary(bn_out)# summarize results## nregions max_dist cases ex rr stat p## 1 5 3.0 20.42516 10.2 2.0 5 0.004132413## 2 5 13.4 20.15705 10.5 1.9 5 0.006017272## 3 3 2.0 20.39443 10.8 1.9 3 0.008039295## 4 5 2.7 25.45904 11.0 2.4 5 0.009113555## 5 5 2.4 20.29863 11.1 1.9 5 0.009962904plot(bn_out)# plot resultsThe Cluster Evaluation Permutation Procedure (CEPP) proposed byTurnbull et al. (1990) can be performed usingcepp.test.The function returns asmerc_cluster object withprint,summary, andplotfunctions.
# perform CEPP testcepp_out=cepp.test(coords = coords,cases = cases,pop = pop,nstar =5000,nsim =99,alpha =0.1)cepp_out# print results## method: CEPP## simulation: multinomial## realizations: 99## population radius: 5000summary(cepp_out)# summarize results## nregions max_dist cases ex rr stat p## 1 2 3.6 9.589796 2.8 3.5 9.6 0.07plot(cepp_out)# plot resultsTango’s clustering detection test (Tango, 1995) can be performed viathetango.test function. Thedweights functioncan be used to construct the weights for the test. Thetango.test function produces and object of classtango, which has a nativeprint function and aplot function that compares the goodness-of-fit and spatialautocorrelation components of Tango’s statistic for the observed dataand the simulated data sets.
w=dweights(coords,kappa =1)# construct weights matrixtango_out=tango.test(cases, pop, w,nsim =49)# perform tango's testtango_out# print results## method: Tango's index## index: 0.0029## goodness-of-fit component: 0.0026## spatial autocorrelation component: 0.00033## chi-square statistic: 210## chi-square df: 110## chi-square p-value: 1.4e-08## Monte Carlo p-value: 0.02plot(tango_out)# plot resultsNearly all cluster detection methods have both a*.zonesfunction that returns all the candidate zones for the method and a*.sim function that is used to produce results for datasimulated under the null hypothesis. These are unlikely to interest mostusers, but may be useful for individuals wanting to better understandhow these methods work or are interested in developing new methods basedoff existing ones. The*.sim functions are not meant forgeneral use, as they are written to take very specific arguments thatthe user could easily misuse.
As an example, the following code produces all elliptical-shapedzones considered by the elliptic scan method (Kulldorff et al., 2006)using a population upperbound of no more than 10%. The function returnsa list with the 248,213 candidate zones, as well as the associated shapeand angle.
# obtain zones for elliptical scan methodezones=elliptic.zones(coords, pop,ubpop =0.1)# view structure of ezonesstr(ezones)## List of 3## $ zones:List of 239228## ..$ : int 1## ..$ : int [1:2] 1 2## ..$ : int [1:3] 1 2 15## ..$ : int [1:4] 1 2 15 49## ..$ : int [1:5] 1 2 15 49 50## ..$ : int [1:6] 1 2 15 49 50 13## ..$ : int [1:7] 1 2 15 49 50 13 3## ..$ : int [1:8] 1 2 15 49 50 13 3 14## ..$ : int [1:9] 1 2 15 49 50 13 3 14 48## ..$ : int [1:10] 1 2 15 49 50 13 3 14 48 47## ..$ : int [1:11] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:12] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:13] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:14] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:15] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:16] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:17] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:18] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:19] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:20] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:21] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:22] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:23] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:24] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:25] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:26] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:27] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:28] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int [1:29] 1 2 15 49 50 13 3 14 48 47 ...## ..$ : int 2## ..$ : int [1:3] 2 1 3## ..$ : int [1:4] 2 1 3 13## ..$ : int [1:5] 2 1 3 13 15## ..$ : int [1:6] 2 1 3 13 15 14## ..$ : int [1:7] 2 1 3 13 15 14 49## ..$ : int [1:8] 2 1 3 13 15 14 49 47## ..$ : int [1:9] 2 1 3 13 15 14 49 47 12## ..$ : int [1:10] 2 1 3 13 15 14 49 47 12 50## ..$ : int [1:11] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:12] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:13] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:14] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:15] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:16] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:17] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:18] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:20] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:21] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:22] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:23] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:24] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:25] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:27] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int [1:28] 2 1 3 13 15 14 49 47 12 50 ...## ..$ : int 3## ..$ : int [1:2] 3 13## ..$ : int [1:3] 3 13 2## ..$ : int [1:4] 3 13 2 12## ..$ : int [1:5] 3 13 2 12 14## ..$ : int [1:6] 3 13 2 12 14 5## ..$ : int [1:7] 3 13 2 12 14 5 1## ..$ : int [1:8] 3 13 2 12 14 5 1 10## ..$ : int [1:9] 3 13 2 12 14 5 1 10 11## ..$ : int [1:10] 3 13 2 12 14 5 1 10 11 4## ..$ : int [1:11] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:12] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:13] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:14] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:15] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:16] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:17] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:18] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:19] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:20] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:21] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:22] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:23] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:24] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:25] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:26] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:27] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int [1:29] 3 13 2 12 14 5 1 10 11 4 ...## ..$ : int 4## ..$ : int [1:2] 4 6## ..$ : int [1:3] 4 6 5## ..$ : int [1:4] 4 6 5 12## ..$ : int [1:5] 4 6 5 12 35## ..$ : int [1:6] 4 6 5 12 35 7## ..$ : int [1:7] 4 6 5 12 35 7 10## ..$ : int [1:8] 4 6 5 12 35 7 10 3## ..$ : int [1:9] 4 6 5 12 35 7 10 3 11## ..$ : int [1:10] 4 6 5 12 35 7 10 3 11 9## ..$ : int [1:11] 4 6 5 12 35 7 10 3 11 9 ...## ..$ : int [1:12] 4 6 5 12 35 7 10 3 11 9 ...## ..$ : int [1:13] 4 6 5 12 35 7 10 3 11 9 ...## ..$ : int [1:14] 4 6 5 12 35 7 10 3 11 9 ...## ..$ : int [1:15] 4 6 5 12 35 7 10 3 11 9 ...## ..$ : int [1:16] 4 6 5 12 35 7 10 3 11 9 ...## ..$ : int [1:17] 4 6 5 12 35 7 10 3 11 9 ...## .. [list output truncated]## $ shape: num [1:239228] 1 1 1 1 1 1 1 1 1 1 ...## $ angle: num [1:239228] 90 90 90 90 90 90 90 90 90 90 ...Assuncao, R.M., Costa, M.A., Tavares, A. and Neto, S.J.F. (2006).Fast detection of arbitrarily shaped disease clusters, Statistics inMedicine, 25, 723-742.
Besag, J. and Newell, J. (1991). The detection of clusters in rarediseases, Journal of the Royal Statistical Society, Series A, 154,327-333.
Costa, M.A. and Assuncao, R.M. and Kulldorff, M. (2012) Constrainedspanning tree algorithms for irregularly-shaped spatial clustering,Computational Statistics & Data Analysis, 56(6), 1771-1783.
Kulldorff, M. (1997) A spatial scan statistic. Communications inStatistics - Theory and Methods, 26(6): 1481-1496,
Kulldorff, M., Huang, L., Pickle, L. and Duczmal, L. (2006) Anelliptic spatial scan statistic. Statististics in Medicine,25:3929-3943.
Neill, D. B. (2012), Fast subset scan for spatial pattern detection.Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 74: 337-360.
Patil, G.P. & Taillie, C. Upper level set scan statistic fordetecting arbitrarily shaped hotspots. Environmental and EcologicalStatistics (2004) 11(2):183-197.
Tango, T. (1995) A class of tests for detecting “general” and“focused” clustering of rare diseases. Statistics in Medicine. 14,2323-2334.
Tango, T., & Takahashi, K. (2005). A flexibly shaped spatial scanstatistic for detecting clusters. International journal of healthgeographics, 4(1), 11. Kulldorff, M. (1997) A spatial scan statistic.Communications in Statistics – Theory and Methods 26, 1481-1496.
Tango, T. and Takahashi, K. (2012), A flexible spatial scan statisticwith a restricted likelihood ratio for detecting disease clusters.Statist. Med., 31: 4207-4218.
Turnbull, Bruce W., Iwano, Eric J., Burnett, William S., Howe, HollyL., Clark, Larry C. (1990). Monitoring for Clusters of Disease:Application to Leukemia Incidence in Upstate New York, American Journalof Epidemiology, 132(supp1):136-143.
Waller, L.A. and Gotway, C.A. (2005). Applied Spatial Statistics forPublic Health Data. Hoboken, NJ: Wiley.
Yao, Z., Tang, J., & Zhan, F. B. (2011). Detection ofarbitrarily-shaped clusters using a neighbor-expanding approach: A casestudy on murine typhus in South Texas. International journal of healthgeographics, 10(1), 1.