Currently, there are 8 functions associated with thecalculate verb in thesgsR package:
| Algorithm | Description |
|---|---|
calculate_representation() | Determine representation of strata in existing sampleunit |
calculate_distance() | Per pixel distance to closestaccessvector |
calculate_pcomp() | Principal components of inputmraster |
calculate_sampsize() | Estimated sample sizes based on relative standarderror |
calculate_allocation() | Proportional / optimal / equal / manualallocation |
calculate_coobs() | Detail howexisting sample units aredistributed amongmraster |
calculate_pop() | Population level information (PCA / quantile matrix /covariance matrix) ofmraster |
calculate_lhsOpt() | Determine optimal Latin hypercube sampling parametersincluding sample size |
calculate_representation()calculate_representation() function allows the users toverify how a stratification is spatially represented in anexisting sample. Result in tabular and graphical (ifplot = TRUE) outputs that compare strata coverage frequencyand sampling frequency.
#--- quantile sraster ---#quantiles<-strat_quantiles(mraster = mraster$zq90,nStrata =8)#--- random samples ---#srs<-sample_srs(raster = sraster,nSamp =50)#--- calculate representation ---#calculate_representation(sraster = quantiles,existing = srs,plot =TRUE)#> # A tibble: 8 × 6#> strata srasterFreq sampleFreq diffFreq nSamp need#> <dbl> <dbl> <dbl> <dbl> <int> <dbl>#> 1 1 0.13 0.12 -0.0100 6 1#> 2 2 0.13 0.08 -0.05 4 3#> 3 3 0.12 0.14 0.0200 7 -1#> 4 4 0.13 0.1 -0.03 5 2#> 5 5 0.13 0.16 0.03 8 -1#> 6 6 0.12 0.16 0.04 8 -2#> 7 7 0.13 0.16 0.03 8 -1#> 8 8 0.12 0.08 -0.04 4 2The tabular output presents the frequency of coverage for each strata(srasterFreq) (what % of the landscape does the stratacover) and the sampling frequency within each strata(sampleFreq) (what % of totalexisting samplesare in the strata). The difference (diffFreq) betweencoverage frequency and sampling frequency determines whether the valuesare over-represented (positive numbers) or under-represented (negativenumbers). This value translates to a discreteneedattribute that defines whether there is a need to add or remove samplesto meet the number of samples necessary to be considered representativeof the spatial coverage of strata inputted insraster.
Performing the algorithm on a sample set derived usingsample_strat() exhibits proportional sampling to stratacoverage given that samples were allocated proportionally to spatialcoverage.
#> # A tibble: 4 × 6#> strata srasterFreq sampleFreq diffFreq nSamp need#> <dbl> <dbl> <dbl> <dbl> <int> <dbl>#> 1 1 0.25 0.26 0.0100 51 -1#> 2 2 0.25 0.25 0 50 0#> 3 3 0.25 0.24 -0.0100 49 1#> 4 4 0.25 0.25 0 50 0TIP!
Presence of very small (negligible) differences betweensrasterFreq andsampleFreq is common.In these situations, it is important for the user to determinewhether to add or remove the samples.
calculate_distancecalculate_distance() uses inputraster andaccess data and outputs the per pixel distance to thenearest access point. This algorithm has a specific value forconstraining sampling protocols, such as withsample_clhs(), where the output raster layer can be used asthecost for the constraint. The output raster consists ofthe input appended with the calculated distance layer(dist2access). Theslope parameters alsoexists to calculate slope distance instead of geographic distance, whichbecomes very handy in the case of steep mountainous areas and is fasterfrom a computational point of view. Ifslope == TRUE, themraster provided should be a digital terrain model.
calculate_distance(raster = sraster,# inputaccess = access,# define access road networkplot =TRUE)# plot#> |---------|---------|---------|---------|=========================================#> class : SpatRaster #> dimensions : 277, 373, 2 (nrow, ncol, nlyr)#> resolution : 20, 20 (x, y)#> extent : 431100, 438560, 5337700, 5343240 (xmin, xmax, ymin, ymax)#> coord. ref. : UTM Zone 17, Northern Hemisphere #> source(s) : memory#> varnames : mraster #> mraster #> names : strata, dist2access #> min values : 1, 0.3438 #> max values : 4, 1061.6599This function performs considerably slower whenaccess has many features. Consider mergeing features forimproved efficiency.
calculate_pcompcalculate_pcomp() usesmraster as the inputand performs principal component analysis. The number of componentsdefined by thenComp parameter specifies the number ofcomponents that will be rasterized.
calculate_pcomp(mraster = mraster,# inputnComp =3,# number of components to outputplot =TRUE,# plotdetails =TRUE)# details about the principal component analysis appended#> $pca#> Standard deviations (1, .., p=3):#> [1] 1.5479878 0.7359109 0.2493371#> #> Rotation (n x k) = (3 x 3):#> PC1 PC2 PC3#> zq90 0.6286296 0.1795433 0.7566961#> pzabove2 0.5104140 -0.8293596 -0.2272450#> zsd 0.5867729 0.5290812 -0.6130014#> #> $raster#> class : SpatRaster #> dimensions : 277, 373, 3 (nrow, ncol, nlyr)#> resolution : 20, 20 (x, y)#> extent : 431100, 438560, 5337700, 5343240 (xmin, xmax, ymin, ymax)#> coord. ref. : UTM Zone 17, Northern Hemisphere #> source(s) : memory#> varnames : mraster #> mraster #> mraster #> names : PC1, PC2, PC3 #> min values : -4.402269, -2.155242, -1.510955 #> max values : 5.282663, 5.357801, 1.446156calculate_sampsizecalculate_sampsize() function allows the user toestimate an appropriate sample size using the relative standard error(rse) of input metrics. If the inputmrastercontains multiple layers, the sample sizes will be determined for alllayers. Ifplot = TRUE andrse is defined, asequence ofrse values will be visualized with theindicators and the values for the matching sample size.
#--- determine sample size based on relative standard error (rse) of 1% ---#calculate_sampsize(mraster = mraster,rse =0.01)#> nSamp rse var#> 1 1394 0.01 zq90#> 2 1341 0.01 pzabove2#> 3 1859 0.01 zsd#--- change default threshold sequence values ---##--- if increment and rse are not divisible the closest value will be taken ---#p<-calculate_sampsize(mraster = mraster,rse =0.025,start =0.01,end =0.08,increment =0.01,plot =TRUE)#> 'rse' not perfectly divisible by 'increment'. Selecting closest sample size (rse = 0.03) based on values.p#> $nSamp#> # A tibble: 3 × 3#> # Groups: var [3]#> nSamp rse var#> <dbl> <dbl> <chr>#> 1 157 0.03 zq90#> 2 151 0.03 pzabove2#> 3 211 0.03 zsd#>#> $plotcalculate_allocationcalculate_allocation() determines how sample units areallocated based on the desired sample size (nSamp) and theinputsraster. It is used internally in a number ofalgorithms, includingsample_strat.Currently, there are four allocation methods: proportional(prop; default), optimal (optim), equal(equal), and manual (manual).
Samples are allocated based on the spatial coverage area of thestrata. This is the default allocation method.
#--- perform grid sampling ---#calculate_allocation(sraster = sraster,nSamp =200)#> strata total#> 1 1 51#> 2 2 50#> 3 3 49#> 4 4 50In this case the sraster has equally sized strata, so the number ofallocated sample units is always the same.
#--- calculate existing samples to include ---#e.sr<-extract_strata(sraster = sraster,existing = existing)calculate_allocation(sraster = sraster,nSamp =200,existing = e.sr)#> strata total need#> 1 1 0 51#> 2 2 0 50#> 3 3 0 49#> 4 4 0 50At times, values undertotal can be negative. Negativevalues indicate thatexisting sample units over representthose strata and that some sample units could removed to preventover-representation.$total indicates the number of sampleunits that could be added or removed.
Sample units are allocated based on within strata variance. Theoptimal allocation method uses the variation within the strata metric toallocate sample units. This means that in addition to providing ansraster, that a specific metric (mraster) mustbe provided to calculate variation to optimally allocate sampleunits.
calculate_allocation(sraster = sraster,# stratified rasternSamp =200,# desired sample numberexisting = e.sr,# existing samplesallocation ="optim",# optimal allocationmraster = mraster$zq90,# metric rasterforce =TRUE)# force nSamp number#> # A tibble: 4 × 3#> # Rowwise:#> strata total need#> <dbl> <dbl> <dbl>#> 1 1 27 78#> 2 2 -14 36#> 3 3 -25 24#> 4 4 12 62There may be situations where the user wants an equal number ofsample units allocated to each strata. In these situations useallocation = equal. In this case,nSamp refersto the total number of sample units per strata, instead of the overalltotal number.
calculate_allocation(sraster = sraster,# stratified rasternSamp =20,# desired sample numberallocation ="equal")# optimal allocation#> Implementing equal allocation of samples.#> # A tibble: 4 × 2#> strata total#> <dbl> <dbl>#> 1 1 20#> 2 2 20#> 3 3 20#> 4 4 20The code in the demonstration above yields a total of 80 samples (20nSamp for each of the 4 strata insraster).
The user may wish to manually assign weights to strata. In this case,allocation = manual can be used andweightsmust be provided as a numeric vector(e.g. weights = c(0.2, 0.2, 0.2, 0.4) wheresum(weights) == 1). In this case,nSamp willbe allocated based onweights.
weights<-c(0.2,0.2,0.2,0.4)calculate_allocation(sraster = sraster,# stratified rasternSamp =20,# desired sample numberallocation ="manual",# manual allocationweights = weights)# weights adding to 1#> Implementing allocation of samples based on user-defined weights.#> strata total#> 1 1 4#> 2 2 4#> 3 3 4#> 4 4 8The code above yields a total of 20 sample units with plots beingallocated based on theweights provided in ascending strataorder.
The following algorithms were initially developed by Dr. BrendanMalone from the University of Sydney. Dr. Malone and his colleaguessupplied an in depth description of the functionality of thesealgorithms, which were originally developed to improve soil samplingstrategies. These functions were modified and implemented to be used forstructurally guided sampling approaches. Many thanks to Dr. Malone forbeing a proponent of open source algorithms.
Please consult the original reference for these scripts and ideas astheir publication holds helpful and valuable information to understandtheir rationale for sampling and algorithm development.
Malone BP, Minansy B, Brungard C. 2019. Some methods to improvethe utility of conditioned Latin hypercube sampling. PeerJ 7:e6451 DOI10.7717/peerj.6451
calculate_coobscalculate_coobs() function performs theCOunt ofOBServations (coobs)algorithm using anexisting sample andmraster. This algorithm helps the user understand how anexisting sample is distributed among the landscape inrelation tomraster data.
TIP!
The output coobs raster can be used to constrain clhs sampling usingthesample_clhs() function to the areas that areunder-represented.
The coobs raster determines how many observations are similar interms of the covariate space at every pixel, and takes advantage ofparallel processing routines.
The following 2 algorithms present the means to maximize theeffectiveness of thelatin hypercube samplingprotocols.
calculate_popcalculate_pop() calculates population level statisticsof anmraster, including calculating the principalcomponents, quantile & covariate distributions, and Kullback-Leiblerdivergence testing.
The outputs produced from this functions are required to use thecalculate_lhsOpt() function described in the followingsection.
TIP!
This algorithm can be pre-emptively used to calculatematQ andMatCov, two values that are used forthesample_ahels() function. This will save processing timeduring sampling.
The output list contains the following:
$values - Pixel values frommraster
$pcaLoad - PCA loadings
$matQ - Quantile matrix
$matCov - Covariate matrix
#--- statistical analyses can be chosen by setting their parameter to `FALSE` ---#mat<-calculate_pop(mraster = mraster,# inputnQuant =10)# desired number of quantiles#--- use matrix output within sample ahels algorithm ---#sample_ahels(mraster = mraster,# inputexisting = existing,# existing samplenQuant =10,# number of desired quantilesnSamp =50,# desired sample sizematCov = mat)# covariance matrixcalculate_lhsOptcalculate_lhsOpt() function performs a bootstrappedlatin hypercube sampling approach where population level analysis ofmraster data is performed to determine the optimal latinhypercube sample size.
calculate_pop() and varying sample sizes defined byminSamp,maxSamp,step andrep. Sampling protocols are conducted and statisticaleffectiveness of those sampling outcomes are evaluated to determinewhere sample size is minimized and statistical representation ismaximized.