Movatterモバイル変換

Introduction

Thesmerc package is focused onStatisticalMethods forRegionalCounts. It particularlyfocuses on the spatial scan method (Kulldorff, 1997) and many of itsextensions.

In what follows, we demonstrate some of the basic functionality ofthesmerc package using 281 observations related toleukeumia cases in an 8 county area of the state of New York. The datawere made available in Waller and Gotway (2005) and details are providedthere.

The leukemia data are available in thenydf dataframe.Each row of the dataframe contains information regarding a differentregion. Each row of the dataframe includeslongitude andlatitude information for the centroid of each region,transformedx andy coordinates available inthe original dataset provided by Waller and Gotway (2005), as well asthepopulation of each region and the number of leukemiacases.

library(smerc)# load package

## # This research was partially supported under NSF grants 1463642 and 1915277

data(nydf)# load datastr(nydf)# look at structure

## 'data.frame':    281 obs. of  6 variables:##  $ longitude : num  -75.9 -75.9 -75.9 -75.9 -75.9 ...##  $ latitude  : num  42.1 42.1 42.1 42.1 42.1 ...##  $ population: num  3540 3560 3739 2784 2571 ...##  $ cases     : num  3.08 4.08 1.09 1.07 3.06 ...##  $ x         : num  4.07 4.64 5.71 7.61 7.32 ...##  $ y         : num  -67.4 -66.9 -67 -66 -67.3 ...

We plot the study area below using information in the dataobject.

data(nysf)# load nysf datalibrary(sf)# load sf package for plotting

## Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE

plot(st_geometry(nysf))# plot study area

Scan methods

The spatial scan test can be perfomed using thescan.test function. The user must supply the coordinates ofthe region centroids, the number of cases in each region, and thepopulation (i.e., the number persons at risk). There are many additionaloptional arguments described in the documentation. In the followingexample, we reduce the number of Monte Carlo simulations used toestimate the p-value.

coords= nydf[,c("x","y")]# extract coordinatescases= nydf$cases# extract casespop= nydf$population# extract populationscan_out=scan.test(coords, cases, pop,nsim =99)# perform scan test

## computing statistics for simulated data:

Thescan.test function (and most of the other*.test functions in thesmerc package)will return an object of classsmerc_cluster.

class(scan_out)

## [1] "smerc_cluster"

Asmerc_cluster object has a default print option thatsummarizes relevant details about the object produced by thefunction.

In this case, we receive the following:

scan_out# print scan.test results## method: circular scan## statistic: poisson## simulation: multinomial## realizations: 99## population upperbound: 50%## minimum cases: 2

In this particular case, we see that thescan_out objectsummarizes the results of thecircular scan method using apoisson statistic (the other choice isbinomial). The Monte Carlo p-value was estimated using99 realizations from amultinomialdistribution (poisson andbinomial are theother possibilities). The candidate zones were limited to having no morethan50% of the total population and at least2 cases had to be in a candidate zone to be considered acluster. Other methods will provide some of the same information, butsome of the information will be different based on the relevantparameters of the models.

Asmerc_cluster also has asummary functionthat summarizes the significant (or most likely) clusters returned bythe method.

summary(scan_out)# summarize scan.test results

##   nregions max_dist    cases   ex  rr stat    p## 1       24     12.3 95.33108 55.8 1.8 13.1 0.01## 2       11     26.5 49.71990 27.1 1.9  8.0 0.07

Thesummary ofscan_out reveals that therewere two significant clusters. The first cluster was comprised of 24regions with a maximum distance of 12.3 between centroids. The number ofcases observed in the cluster is 95.33, while 55.8 cases were expected.The relative risk of the cluster is 1.8, with an associated teststatistic of 13.1, and a Monte Carlo p-value of 0.01. A second clusteris also summarized with similar information.

A very basicplot mechanism is provided forsmerc_cluster objects. The plot will show the locations ofevery centroid, with significant clusters shown with different colorsand symbols. An attempt is also made to show the connectivity of theregions. In the plot below, we see the two significant clusters shown inyellow (middle) and purple (bottom).

plot(scan_out)# basic plot of scan.test results

If you want to extract the detected clusters from asmerc_cluster object, then theclustersfunction can be used. The function returns a list: each element of thelist is a vector with the indices of the regions included in thecluster. The first element of the list contains the regions comprisingthe most likely cluster, the second element contains the regions for thesecond most likely cluster, etc.

clusters(scan_out)

## [[1]]##  [1] 52 50 49 48 15 16  1 51 37 38 47  2 14 39 53 13 34 40  3 44 17 43 12 46## ## [[2]]##  [1]  88  86  87  89  92  91  85  93  84  90 259

If a polygon-like structure is associated with each region, then thecolor.clusters can be used to easily produce nicer plots.E.g., we can use thenysf object to create a nicer map ofour results.

plot(st_geometry(nysf),col =color.clusters(scan_out))#nicer plot of scan.test results

Other scan methods available include:

The elliptic scan method (Kulldorff et al., 2006), which can becalled viaelliptic.test}.
The flexible scanning method the flexibly-shaped scan method (Tangoand Takahashi, 2005), which can be called viaflex.test}.
The restricted flexible-scanning method (Tango and Takahashi, 2005),which can be called viarflex.test}.
The Upper Level Set (ULS) method (Patil and Taillie, 2004), whichcan be called viauls.test}.
The dynamic minimum spanning tree (DMST) method (Assuncao et al.,2006), which can be called viadmst.test}.
The early-stopping DMST method (Costa et al., 2012), which can becalled vaiedmst.test}.
The double connection method (Costa et al., 2012), which can becalled viadc.test}.
The maximum linkage scan test (Costa et al., 2012), which can becalled viamlink.test}.
The fast scan method (Neill, 2012), which can be called viafast.test}.
The maxima likelihood first scan test (Yao et al., 2011), which canbe called viamlf.test}.

Other methods

Other non-scan method for the detection of clusters and/or clusteringof cases for regional data are provided.

bn.test performs the Besag-Newell test (Besag andNewell, 1991) and returns asmerc_cluster. Theprint,summary, andplot methodsare available, as before.

bn_out=bn.test(coords = coords,cases = cases,pop = pop,cstar =20,alpha =0.01)# perform besag-newell testbn_out# print results## method: Besag-Newell## case radius: 20## modified p-value: FALSEsummary(bn_out)# summarize results##   nregions max_dist    cases   ex  rr stat           p## 1        5      3.0 20.42516 10.2 2.0    5 0.004132413## 2        5     13.4 20.15705 10.5 1.9    5 0.006017272## 3        3      2.0 20.39443 10.8 1.9    3 0.008039295## 4        5      2.7 25.45904 11.0 2.4    5 0.009113555## 5        5      2.4 20.29863 11.1 1.9    5 0.009962904plot(bn_out)# plot results

The Cluster Evaluation Permutation Procedure (CEPP) proposed byTurnbull et al. (1990) can be performed usingcepp.test.The function returns asmerc_cluster object withprint,summary, andplotfunctions.

# perform CEPP testcepp_out=cepp.test(coords = coords,cases = cases,pop = pop,nstar =5000,nsim =99,alpha =0.1)cepp_out# print results## method: CEPP## simulation: multinomial## realizations: 99## population radius: 5000summary(cepp_out)# summarize results##   nregions max_dist    cases  ex  rr stat    p## 1        2      3.6 9.589796 2.8 3.5  9.6 0.07plot(cepp_out)# plot results

Tango’s clustering detection test (Tango, 1995) can be performed viathetango.test function. Thedweights functioncan be used to construct the weights for the test. Thetango.test function produces and object of classtango, which has a nativeprint function and aplot function that compares the goodness-of-fit and spatialautocorrelation components of Tango’s statistic for the observed dataand the simulated data sets.

w=dweights(coords,kappa =1)# construct weights matrixtango_out=tango.test(cases, pop, w,nsim =49)# perform tango's testtango_out# print results## method: Tango's index## index: 0.0029## goodness-of-fit component: 0.0026## spatial autocorrelation component: 0.00033## chi-square statistic: 210## chi-square df: 110## chi-square p-value: 1.4e-08## Monte Carlo p-value: 0.02plot(tango_out)# plot results

Additional details about cluster detection methods

Nearly all cluster detection methods have both a*.zonesfunction that returns all the candidate zones for the method and a*.sim function that is used to produce results for datasimulated under the null hypothesis. These are unlikely to interest mostusers, but may be useful for individuals wanting to better understandhow these methods work or are interested in developing new methods basedoff existing ones. The*.sim functions are not meant forgeneral use, as they are written to take very specific arguments thatthe user could easily misuse.

As an example, the following code produces all elliptical-shapedzones considered by the elliptic scan method (Kulldorff et al., 2006)using a population upperbound of no more than 10%. The function returnsa list with the 248,213 candidate zones, as well as the associated shapeand angle.

# obtain zones for elliptical scan methodezones=elliptic.zones(coords, pop,ubpop =0.1)# view structure of ezonesstr(ezones)

## List of 3##  $ zones:List of 239228##   ..$ : int 1##   ..$ : int [1:2] 1 2##   ..$ : int [1:3] 1 2 15##   ..$ : int [1:4] 1 2 15 49##   ..$ : int [1:5] 1 2 15 49 50##   ..$ : int [1:6] 1 2 15 49 50 13##   ..$ : int [1:7] 1 2 15 49 50 13 3##   ..$ : int [1:8] 1 2 15 49 50 13 3 14##   ..$ : int [1:9] 1 2 15 49 50 13 3 14 48##   ..$ : int [1:10] 1 2 15 49 50 13 3 14 48 47##   ..$ : int [1:11] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:12] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:13] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:14] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:15] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:16] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:17] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:18] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:19] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:20] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:21] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:22] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:23] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:24] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:25] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:26] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:27] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:28] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int [1:29] 1 2 15 49 50 13 3 14 48 47 ...##   ..$ : int 2##   ..$ : int [1:3] 2 1 3##   ..$ : int [1:4] 2 1 3 13##   ..$ : int [1:5] 2 1 3 13 15##   ..$ : int [1:6] 2 1 3 13 15 14##   ..$ : int [1:7] 2 1 3 13 15 14 49##   ..$ : int [1:8] 2 1 3 13 15 14 49 47##   ..$ : int [1:9] 2 1 3 13 15 14 49 47 12##   ..$ : int [1:10] 2 1 3 13 15 14 49 47 12 50##   ..$ : int [1:11] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:12] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:13] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:14] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:15] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:16] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:17] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:18] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:20] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:21] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:22] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:23] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:24] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:25] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:27] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int [1:28] 2 1 3 13 15 14 49 47 12 50 ...##   ..$ : int 3##   ..$ : int [1:2] 3 13##   ..$ : int [1:3] 3 13 2##   ..$ : int [1:4] 3 13 2 12##   ..$ : int [1:5] 3 13 2 12 14##   ..$ : int [1:6] 3 13 2 12 14 5##   ..$ : int [1:7] 3 13 2 12 14 5 1##   ..$ : int [1:8] 3 13 2 12 14 5 1 10##   ..$ : int [1:9] 3 13 2 12 14 5 1 10 11##   ..$ : int [1:10] 3 13 2 12 14 5 1 10 11 4##   ..$ : int [1:11] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:12] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:13] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:14] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:15] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:16] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:17] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:18] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:19] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:20] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:21] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:22] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:23] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:24] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:25] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:26] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:27] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int [1:29] 3 13 2 12 14 5 1 10 11 4 ...##   ..$ : int 4##   ..$ : int [1:2] 4 6##   ..$ : int [1:3] 4 6 5##   ..$ : int [1:4] 4 6 5 12##   ..$ : int [1:5] 4 6 5 12 35##   ..$ : int [1:6] 4 6 5 12 35 7##   ..$ : int [1:7] 4 6 5 12 35 7 10##   ..$ : int [1:8] 4 6 5 12 35 7 10 3##   ..$ : int [1:9] 4 6 5 12 35 7 10 3 11##   ..$ : int [1:10] 4 6 5 12 35 7 10 3 11 9##   ..$ : int [1:11] 4 6 5 12 35 7 10 3 11 9 ...##   ..$ : int [1:12] 4 6 5 12 35 7 10 3 11 9 ...##   ..$ : int [1:13] 4 6 5 12 35 7 10 3 11 9 ...##   ..$ : int [1:14] 4 6 5 12 35 7 10 3 11 9 ...##   ..$ : int [1:15] 4 6 5 12 35 7 10 3 11 9 ...##   ..$ : int [1:16] 4 6 5 12 35 7 10 3 11 9 ...##   ..$ : int [1:17] 4 6 5 12 35 7 10 3 11 9 ...##   .. [list output truncated]##  $ shape: num [1:239228] 1 1 1 1 1 1 1 1 1 1 ...##  $ angle: num [1:239228] 90 90 90 90 90 90 90 90 90 90 ...

References

Assuncao, R.M., Costa, M.A., Tavares, A. and Neto, S.J.F. (2006).Fast detection of arbitrarily shaped disease clusters, Statistics inMedicine, 25, 723-742.

Besag, J. and Newell, J. (1991). The detection of clusters in rarediseases, Journal of the Royal Statistical Society, Series A, 154,327-333.

Costa, M.A. and Assuncao, R.M. and Kulldorff, M. (2012) Constrainedspanning tree algorithms for irregularly-shaped spatial clustering,Computational Statistics & Data Analysis, 56(6), 1771-1783.

Kulldorff, M. (1997) A spatial scan statistic. Communications inStatistics - Theory and Methods, 26(6): 1481-1496,

Kulldorff, M., Huang, L., Pickle, L. and Duczmal, L. (2006) Anelliptic spatial scan statistic. Statististics in Medicine,25:3929-3943.

Neill, D. B. (2012), Fast subset scan for spatial pattern detection.Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 74: 337-360.

Patil, G.P. & Taillie, C. Upper level set scan statistic fordetecting arbitrarily shaped hotspots. Environmental and EcologicalStatistics (2004) 11(2):183-197.

Tango, T. (1995) A class of tests for detecting “general” and“focused” clustering of rare diseases. Statistics in Medicine. 14,2323-2334.

Tango, T., & Takahashi, K. (2005). A flexibly shaped spatial scanstatistic for detecting clusters. International journal of healthgeographics, 4(1), 11. Kulldorff, M. (1997) A spatial scan statistic.Communications in Statistics – Theory and Methods 26, 1481-1496.

Tango, T. and Takahashi, K. (2012), A flexible spatial scan statisticwith a restricted likelihood ratio for detecting disease clusters.Statist. Med., 31: 4207-4218.

Turnbull, Bruce W., Iwano, Eric J., Burnett, William S., Howe, HollyL., Clark, Larry C. (1990). Monitoring for Clusters of Disease:Application to Leukemia Incidence in Upstate New York, American Journalof Epidemiology, 132(supp1):136-143.

Waller, L.A. and Gotway, C.A. (2005). Applied Spatial Statistics forPublic Health Data. Hoboken, NJ: Wiley.

Yao, Z., Tang, J., & Zhan, F. B. (2011). Detection ofarbitrarily-shaped clusters using a neighbor-expanding approach: A casestudy on murine typhus in South Texas. International journal of healthgeographics, 10(1), 1.

Movatterモバイル変換

smerc-demo

Joshua P. French

2024-10-24

Introduction

Scan methods

Other methods

Additional details about cluster detection methods

References