The way in which clusters are assigned in cluster randomized trials (CRTs) can profoundly affect the efficiency of the trial. Allocating clusters by algorithm makes it easy to generate alternative cluster allocations for any given trial site, both for real-world trials and for exploring this neglected aspect of trial design in simulations. The CRTspat package contains R functions developed for this purpose.
Input to the package is in the form of a data frame with one record for each geo-location in a trial area. Most of the functions of the package return a list of class CRTsp, which consists of the input data frame augmented with additional vectors (e.g. coding clusters, arms, or buffer zones), and lists containing descriptors of the dataset. Objects of class CRTsp can also be used as input to most of the functions.
After each step, summary() can be used to provide a description of the output CRTsp object, and plotCRT() can be used to output a descriptive plot or a map of the locations, clusters, arms, buffer zones or other geographically structured analysis results.
In general the package functions do not expect to find repeated values for outcomes for the same location. The aggregateCRT() function is used to aggregate data with the same co-ordinates so that this condition is satisfied. In particular, if the input database contains outcome data (e.g. if it contains baseline survey results), these should be provided in the form of a numerator base_num and denominator base_denom for each record. These values will be summed by aggregateCRT() over all records with the same co-ordinates. An object of class CRTsp is output.
The specify_clusters() function carries out algorithmic assignment of clusters and outputs a CRTsp object augmented with the cluster assignments. One of three different algorithms must be selected:
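The aggregation step described above can be illustrated in base R without the package. This is only a sketch of the idea (summing a numerator and a denominator over records that share co-ordinates), not the internals of aggregateCRT(); the toy data frame and the names tests and aggregated are hypothetical.

```r
# Toy data: one record per test; two tests share the location (0, 0).
# Column names x, y, base_num and base_denom follow the text above.
tests <- data.frame(
  x = c(0, 0, 1.5),
  y = c(0, 0, 2.0),
  base_num = c(1, 0, 1),    # 1 = positive test, 0 = negative test
  base_denom = c(1, 1, 1)   # each record is a single test
)

# Sum numerator and denominator over records with identical co-ordinates
aggregated <- aggregate(cbind(base_num, base_denom) ~ x + y,
                        data = tests, FUN = sum)
aggregated
```

After aggregation there is one row per unique location, and base_num / base_denom gives the proportion positive at that location.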
algorithm = "NN" implements a nearest neighbour algorithm. One household is selected and a cluster of size k is constructed by adding its k-1 nearest neighbours (NN). These points are then removed from the data set, and the step is repeated iteratively until all the points have been allocated. This algorithm will often lead to connected clusters, in a “fish scale” manner. This is the default option.
algorithm = "TSP" implements the repetitive_nn option of the TSP package for solving the travelling salesman problem. This finds an efficient path through the study locations. Clusters are formed by grouping the required number of locations sequentially along the path. Note that this is not guaranteed to give rise to congruent clusters.
algorithm = "kmeans" implements a k-means algorithm that aims to partition the locations into the required number of clusters in which each observation belongs to the cluster with the nearest cluster centroid. k-means clustering minimizes within-cluster variances (squared Euclidean distances) but does not necessarily give equal-sized clusters.
Irrespective of the algorithm, the target number of points allocated to each cluster is specified by the parameter h.
The randomizeCRT() function carries out a simple randomization of clusters to arms, and outputs a CRTsp object augmented with the assignments. (If baseline data are available, matched pair randomization is available as an option.)
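The nearest neighbour procedure and the simple randomization described above can be sketched in base R. This is an illustration under stated assumptions, not the CRTspat implementation: the function nn_clusters(), its argument k, and the arm labels are hypothetical, and the choice of the first unallocated point as the starting household is an assumption, since the text does not say how that household is chosen.

```r
# Greedy nearest-neighbour clustering: a base-R illustration only.
nn_clusters <- function(xy, k) {
  n <- nrow(xy)
  cluster <- rep(NA_integer_, n)
  remaining <- seq_len(n)
  id <- 0L
  while (length(remaining) > 0) {
    id <- id + 1L
    seed <- remaining[1]  # assumption: start from the first unallocated point
    # squared Euclidean distance from the seed to every unallocated point
    d2 <- (xy$x[remaining] - xy$x[seed])^2 + (xy$y[remaining] - xy$y[seed])^2
    # the seed itself (distance 0) plus its k - 1 nearest neighbours
    members <- remaining[order(d2)][seq_len(min(k, length(remaining)))]
    cluster[members] <- id
    # remove the allocated points and repeat
    remaining <- setdiff(remaining, members)
  }
  cluster
}

set.seed(1)
xy <- data.frame(x = runif(100), y = runif(100))
xy$cluster <- nn_clusters(xy, k = 10)
table(xy$cluster)

# Simple randomization: half of the clusters to each arm
ids <- unique(xy$cluster)
arm_of_cluster <- sample(rep(c("control", "intervention"),
                             length.out = length(ids)))
xy$arm <- arm_of_cluster[match(xy$cluster, ids)]
```

When the number of points is not a multiple of k, the last cluster formed by this sketch is smaller than the others.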
The units to be randomized will usually be households, but the algorithms can be used to generate clusters with equal geographical areas by randomizing pixels. In this case a dataset containing x,y coordinates for each pixel should be used as input.
The example uses locations and baseline test positivity data from a site in Kenya. The input dataset contains a single record for each test so there are multiple records of test positivity for many locations.
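A pixel dataset of the kind described above can be built in base R. The sketch below assumes a hypothetical rectangular study area of 2 km by 3 km, pixels of 100 m, and co-ordinates expressed in km; the study area, the pixel size and the names pixel_km and pixels are all illustrative.

```r
# Hypothetical 2 km x 3 km study area covered by 100 m x 100 m pixels.
pixel_km <- 0.1
pixels <- expand.grid(
  x = (seq_len(20) - 0.5) * pixel_km,  # pixel centres along x (km)
  y = (seq_len(30) - 0.5) * pixel_km   # pixel centres along y (km)
)
nrow(pixels)  # one record per pixel
```

Because every pixel covers the same area, clusters containing equal numbers of pixels have equal geographical areas.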
library(CRTspat)
example_locations <- readdata('example_site.csv')
# assign the denominator to the baseline data
example_locations$base_denom <- 1
# convert to a `CRTsp` object
exampleCRT <- CRTsp(example_locations)
summary(exampleCRT)
## ===============================CLUSTER RANDOMISED TRIAL ===========================
##
## Summary of coordinates
## ----------------------
##      Min.   : 1st Qu.: Median : Mean   : 3rd Qu.: Max.   :
## x   -3.20    -1.31    -0.24    0.00     1.35     5.16
## y   -5.08    -2.84    -0.17    0.00     2.49     6.16
##
## Total area (within 0.2 km of a location) : 27.6 sq.km
## Total area (convex hull)                 : 48.2 sq.km
##
## Locations and Clusters
## ----------------------                     -
## Coordinate system                          (x, y)
## Not aggregated. Total records: 3172. Unique locations: 1181
## Available clusters (across both arms)      Not assigned
## No randomization                           -
## No power calculations to report            -
##
## Other variables in dataset
## --------------------------    RDT_test_result  base_denom
# Aggregate data for multiple observations for the same location Only the (x,y) co-ordinates and numerical
# auxiliary variables
example <- aggregateCRT(exampleCRT, auxiliaries = c("RDT_test_result", "base_denom"))
summary(example)
## ===============================CLUSTER RANDOMISED TRIAL ===========================
##
## Summary of coordinates
## ----------------------
##      Min.   : 1st Qu.: Median : Mean   : 3rd Qu.: Max.   :
## x   -3.20    -1.40    -0.30    -0.07    1.26     5.16
## y   -5.08    -2.84    0.19     0.05     2.49     6.16
##
## Total area (within 0.2 km of a location) : 27.6 sq.km
## Total area (convex hull)                 : 48.2 sq.km
##
## Locations and Clusters
## ----------------------                     -
## Coordinate system                          (x, y)
## Locations:                                 1181
## Available clusters (across both arms)      Not assigned
## No randomization                           -
## No power calculations to report            -
##
## Other variables in dataset
## --------------------------    RDT_test_result  base_denom
Fig 1.1 Map of locations
In the example shown here a target cluster size of 50 locations is set, but the heterogeneity in spatial density of the locations leads to considerable variation in the number of locations assigned to each cluster.
example_clustered <- specify_clusters(trial = example, h = 50, algorithm = 'NN')
summary(example_clustered)
## ===============================CLUSTER RANDOMISED TRIAL ===========================
##
## Summary of coordinates
## ----------------------
##      Min.   : 1st Qu.: Median : Mean   : 3rd Qu.: Max.   :
## x   -3.20    -1.40    -0.30    -0.07    1.26     5.16
## y   -5.08    -2.84    0.19     0.05     2.49     6.16
##
## Total area (within 0.2 km of a location) : 27.6 sq.km
## Total area (convex hull)                 : 48.2 sq.km
##
## Locations and Clusters
## ----------------------                     -
## Coordinate system                          (x, y)
## Locations:                                 1181
## Available clusters (across both arms)      24
## Per cluster mean number of points          49.2
## Per cluster s.d. number of points          3.9
## No randomization                           -
## No power calculations to report            -
##
## Other variables in dataset
## --------------------------    RDT_test_result  base_denom
Fig 1.2 Map of clusters
A smoothed map of the baseline prevalence surface is produced using a geostatistical model in R-INLA. Details of the implementation in CRTspat are in the documentation of CRTanalysis and of Use Case 5.
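The cluster count and mean cluster size reported in the summary above are mutually consistent, which can be checked with simple arithmetic. The ceiling rule in the last line is only an observation that matches the reported count of 24; how the package actually derives the number of clusters from h is not stated here, so treat it as an assumption.

```r
n_locations <- 1181  # from the summary output
n_clusters  <- 24    # from the summary output
h           <- 50    # target cluster size passed to specify_clusters()

# Mean number of locations per cluster, as reported in the summary
mean_size <- round(n_locations / n_clusters, 1)
mean_size

# Assumption: rounding the ratio up reproduces the reported cluster count
ceiling(n_locations / h)
```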
library(Matrix)
examplemesh100 <- readdata("examplemesh100.rds")
baselineanalysis <- CRTanalysis(trial = example_clustered, method = 'INLA', link = 'logit',
    baselineOnly = TRUE, baselineNumerator = "RDT_test_result", baselineDenominator = "base_denom",
    clusterEffects = FALSE, spatialEffects = TRUE, requireMesh = TRUE, inla_mesh = examplemesh100)
## Analysis of baseline only, using INLA
Fig 1.3 Smoothed surface of baseline prevalence
A summary of the baseline prevalence at cluster level is used in this example to match clusters on baseline prevalence and then generate a randomisation based on matched pairs.
example_randomized <- randomizeCRT(example_clustered, matchedPair = TRUE,
    baselineNumerator = "RDT_test_result", baselineDenominator = "base_denom")
## *** computed distance to nearest measurements in discordant arm ***
## ===============================CLUSTER RANDOMISED TRIAL ===========================
##
## Summary of coordinates
## ----------------------
##                   Min.   : 1st Qu.: Median : Mean   : 3rd Qu.: Max.   :
## x                -3.20    -1.40    -0.30    -0.07    1.26     5.16
## y                -5.08    -2.84    0.19     0.05     2.49     6.16
## nearestDiscord   -1.64    -0.39    0.01     0.03     0.40     2.57
##
## Total area (within 0.2 km of a location) : 27.6 sq.km
## Total area (convex hull)                 : 48.2 sq.km
##
## Locations and Clusters
## ----------------------                     -
## Coordinate system                          (x, y)
## Locations:                                 1181
## Available clusters (across both arms)      24
## Per cluster mean number of points          49.2
## Per cluster s.d. number of points          3.9
## Cluster randomization:                     Matched pairs randomized
## No power calculations to report            -
##
## Other variables in dataset
## --------------------------    RDT_test_result  base_denom  base_num  pair
Fig 1.4 Map of arm assignments
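One simple way to build a matched pair randomization of the kind used above is to rank clusters by baseline prevalence, pair neighbours in the ranking, and randomize the two arms within each pair. The base-R sketch below illustrates that idea only; whether randomizeCRT() forms its pairs in this way is not stated here, and the simulated cluster summaries, the data frame names and the arm labels are hypothetical.

```r
set.seed(42)
# Hypothetical cluster-level baseline summaries for 24 clusters
clusters <- data.frame(
  cluster = 1:24,
  base_num = rbinom(24, size = 50, prob = 0.3),
  base_denom = 50
)
clusters$prevalence <- clusters$base_num / clusters$base_denom

# Rank clusters by baseline prevalence and pair neighbours in the ranking
ranked <- clusters[order(clusters$prevalence), ]
ranked$pair <- rep(seq_len(nrow(ranked) / 2), each = 2)

# Within each pair, assign the two arms in random order
ranked$arm <- unlist(lapply(unique(ranked$pair),
                            function(p) sample(c("control", "intervention"))))

table(ranked$pair, ranked$arm)  # one cluster per arm in every pair
```

Pairing on baseline prevalence keeps the two arms balanced on the baseline outcome, which simple randomization does not guarantee with a small number of clusters.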