Title: Structure for Organizing Monte Carlo Simulation Designs
Version: 2.22
Description: Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
Imports: methods, testthat, parallel, parallelly, dplyr, sessioninfo, beepr, pbapply (≥ 1.3-0), future, future.apply, progressr, R.utils, codetools, clipr, stats
Suggests: snow, knitr, ggplot2, tidyr, purrr, shiny, copula, extraDistr, renv, cli, job, future.batchtools, FrF2, rmarkdown, RPushbullet, httr
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
ByteCompile: yes
LazyData: true
URL: http://philchalmers.github.io/SimDesign/, https://github.com/philchalmers/SimDesign/wiki
RoxygenNote: 7.3.3
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2025-12-16 18:29:39 UTC; phil
Author: Phil Chalmers [aut, cre], Matthew Sigal [ctb], Ogreden Oguzhan [ctb], Mikko Rönkkö [aut], Moritz Ketzer [ctb]
Maintainer: Phil Chalmers <rphilip.chalmers@gmail.com>
Repository: CRAN
Date/Publication: 2025-12-17 06:40:53 UTC

Structure for Organizing Monte Carlo Simulation Designs

Description

Structure for Organizing Monte Carlo Simulation Designs

Details

Provides tools to help organize Monte Carlo simulations in R. The package controls the structure and back-end of Monte Carlo simulations by utilizing a general generate-analyse-summarise strategy. The functions provided control common simulation issues such as re-simulating non-convergent results, support parallel back-end computations with proper random number generation within each simulation condition, save and restore temporary files, aggregate results across independent nodes, and provide native support for debugging. The primary function for organizing the simulations is runSimulation, while for array jobs submitted to HPC clusters (e.g., SLURM) see runArraySimulation and the associated package vignettes.
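
As a minimal sketch of this generate-analyse-summarise workflow (the design factors and the t-test analysis below are purely illustrative):

library(SimDesign)

# fully crossed simulation conditions
Design <- createDesign(N = c(30, 60, 120),
                       d = c(0, 0.5))

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    data.frame(DV = c(rnorm(N), rnorm(N, mean = d)),
               group = gl(2, N, labels = c('G1', 'G2')))
}

Analyse <- function(condition, dat, fixed_objects) {
    c(p = t.test(DV ~ group, data = dat)$p.value)
}

Summarise <- function(condition, results, fixed_objects) {
    EDR(results, alpha = .05)   # empirical detection rate per condition
}

res <- runSimulation(design = Design, replications = 1000,
                     generate = Generate, analyse = Analyse,
                     summarise = Summarise)
res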

For an in-depth tutorial of the package please refer to Chalmers and Adkins (2020; doi:10.20982/tqmp.16.4.p248). For an earlier didactic presentation of the package users can refer to Sigal and Chalmers (2016; doi:10.1080/10691898.2016.1246953). Finally, see the associated wiki on GitHub (https://github.com/philchalmers/SimDesign/wiki) for other tutorial material, examples, and applications of SimDesign to real-world simulations.

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953


Compute estimates and statistics

Description

Compute all relevant test statistics, parameter estimates, detection rates, and so on. This is the computational heavy lifting portion of the Monte Carlo simulation. Users may define a single Analyse function to perform all the analyses in the same function environment, or may define a list of named functions to runSimulation to allow for a more modularized approach to performing the analyses in independent blocks (but that share the same generated data). Note that if a suitable Generate function was not supplied then this function can be used to both generate and analyse the Monte Carlo data (though in general this setup is not recommended for larger simulations).

Usage

Analyse(condition, dat, fixed_objects)

Arguments

condition

a single row from the design input (as a data.frame), indicating the simulation conditions

dat

the dat object returned from the Generate function (usually a data.frame, matrix, vector, or list)

fixed_objects

object passed down from runSimulation

Details

In some cases, it may be easier to change the output to a named list containing different parameter configurations (e.g., when determining RMSE values for a large set of population parameters).

The use of try functions is generally not required in this function because Analyse is internally wrapped in a try call. Therefore, if a function stops early then this will cause the function to halt internally, the message which triggered the stop will be recorded, and Generate will be called again to obtain a different dataset. That said, it may be useful for users to throw their own stop commands if the data should be re-drawn for other reasons (e.g., an estimated model terminated correctly but the maximum number of iterations was reached).
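
For instance, the following sketch throws a stop() when a fitted model fails an (illustrative) convergence check, which signals the back-end to re-draw the data and try again; the column names y and x are assumed here:

Analyse <- function(condition, dat, fixed_objects) {
    mod <- glm(y ~ x, family = binomial, data = dat)
    # illustrative check: treat a non-converged fit as invalid so the data are re-drawn
    if(!mod$converged)
        stop('glm() did not converge; re-drawing data')
    c(slope = unname(coef(mod)[2]))
}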

Value

returns a named numeric vector or data.frame with the values of interest (e.g., p-values, effect sizes, etc), or a list containing values of interest (e.g., separate matrix and vector of parameter estimates corresponding to elements in parameters). If a data.frame is returned with more than 1 row then these objects will be wrapped into suitable list objects

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

stop, AnalyseIf, manageWarnings

Examples

## Not run: 
analyse <- function(condition, dat, fixed_objects) {
    # require packages/define functions if needed, or better yet index with the :: operator
    require(stats)
    mygreatfunction <- function(x) print('Do some stuff')

    # wrap computational statistics in try() statements to control estimation problems
    welch <- t.test(DV ~ group, dat)
    ind <- stats::t.test(DV ~ group, dat, var.equal=TRUE)

    # In this function the p values for the t-tests are returned,
    #  and make sure to name each element, for future reference
    ret <- c(welch = welch$p.value,
             independent = ind$p.value)
    return(ret)
}

# A more modularized example approach

analysis_welch <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)
    ret <- c(p=welch$p.value)
    ret
}

analysis_ind <- function(condition, dat, fixed_objects) {
    ind <- t.test(DV ~ group, dat, var.equal=TRUE)
    ret <- c(p=ind$p.value)
    ret
}

# pass functions as a named list
# runSimulation(..., analyse=list(welch=analysis_welch, independent=analysis_ind))

## End(Not run)

Perform a test that indicates whether a given Analyse() function should be executed

Description

This function is designed to prevent specific analysis function executions when the design conditions are not met. Primarily useful when the analyse argument to runSimulation was input as a named list object but some of the analysis functions are not interesting/compatible with the generated data and should therefore be skipped.

Usage

AnalyseIf(x, condition = NULL)

Arguments

x

logical statement to evaluate. If the statement evaluates to TRUE then the remainder of the defined function will be evaluated

condition

(optional) the current design condition. This does not need to be supplied if the expression in x evaluates to a valid logical (e.g., use Attach(condition) prior to using AnalyseIf, or use with(condition, AnalyseIf(someLogicalTest)))

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

Analyse, runSimulation

Examples

## Not run: 
Design <- createDesign(N=c(10,20,30), var.equal = c(TRUE, FALSE))

Generate <- function(condition, fixed_objects) {
  Attach(condition)
  dat <- data.frame(DV = rnorm(N*2), IV = gl(2, N, labels=c('G1', 'G2')))
  dat
}

# always run this analysis for each row in Design
Analyse1 <- function(condition, dat, fixed_objects) {
  mod <- t.test(DV ~ IV, data=dat)
  mod$p.value
}

# Only perform analysis when variances are equal and N = 20 or 30
Analyse2 <- function(condition, dat, fixed_objects) {
  AnalyseIf(var.equal && N %in% c(20, 30), condition)
  mod <- t.test(DV ~ IV, data=dat, var.equal=TRUE)
  mod$p.value
}

Summarise <- function(condition, results, fixed_objects) {
  ret <- EDR(results, alpha=.05)
  ret
}

#-------------------------------------------------------------------
# append names 'Welch' and 'independent' to associated output
res <- runSimulation(design=Design, replications=100, generate=Generate,
                     analyse=list(Welch=Analyse1, independent=Analyse2),
                     summarise=Summarise)
res

# leave results unnamed
res <- runSimulation(design=Design, replications=100, generate=Generate,
                     analyse=list(Analyse1, Analyse2),
                     summarise=Summarise)

## End(Not run)

Attach objects for easier reference

Description

The behaviour of this function is very similar to attach, however it is environment specific, and therefore only remains defined in a given function rather than in the Global Environment. Hence, this function is much safer to use than the attach function, which incidentally should never be used in your code. This is useful primarily as a convenience function when you prefer to call the variable names in condition directly rather than indexing with condition$sample_size or with(condition, sample_size), for example.

Usage

Attach(
  ...,
  omit = NULL,
  check = TRUE,
  attach_listone = TRUE,
  RStudio_flags = FALSE,
  clip = interactive()
)

Arguments

...

a comma separated list of data.frame, tibble, list, or matrix objects containing (column) elements that should be placed in the current working environment

omit

an optional character vector containing the names of objects that should not be attached to the current environment. For instance, if the objects named 'a' and 'b' should not be attached then use omit = c('a', 'b'). When NULL (default) all objects are attached

check

logical; check to see if the function will accidentally replace previously defined variables with the same names as in condition? Default is TRUE, which will avoid this error

attach_listone

logical; if the element to be assigned is a list of length one then assign the first element of this list with the associated name. This generally avoids adding an often unnecessary list 1 index, such as name <- list[[1L]]

RStudio_flags

logical; print R script output comments that disable flagged missing variables in RStudio? Requires the form Attach(Design, RStudio_flags=TRUE) or, in an interactive debugging session, Attach(condition, RStudio_flags=TRUE)

clip

when RStudio_flags = TRUE should the output be copied to the clipboard using clipr? Makes it easier to paste into existing code. Only clipped in interactive mode

Details

Note that if you are using RStudio with the "Warn if variable used has no definition in scope" diagnostic flag then using Attach() will raise suspicions. To suppress such issues, you can either disable such flags (the atomic solution) or evaluate the following output in the R console and place the output in your working simulation file.

Attach(Design, RStudio_flags = TRUE)

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

runSimulation, Generate

Examples

Design <- createDesign(N1=c(10,20),
                       N2=c(10,20),
                       sd=c(1,2))
Design

# does not use Attach()
Generate <- function(condition, fixed_objects ) {
    # condition = single row of Design input (e.g., condition <- Design[1,])
    N1 <- condition$N1
    N2 <- condition$N2
    sd <- condition$sd

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

# similar to above, but using the Attach() function instead of indexing
Generate <- function(condition, fixed_objects ) {
    Attach(condition) # N1, N2, and sd are now 'attached' and visible
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

#####################
# NOTE: if you're using RStudio with code diagnostics on then evaluate + add the
# following output to your source file to manually support the flagged variables
Attach(Design, RStudio_flags=TRUE)

# Below is the same example, however with false positive missing variables suppressed
# when # !diagnostics ... is added to the source file(s)

# !diagnostics suppress=N1,N2,sd
Generate <- function(condition, fixed_objects ) {
    Attach(condition) # N1, N2, and sd are now 'attached' and visible
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

Example simulation from Brown and Forsythe (1974)

Description

Example results from the Brown and Forsythe (1974) article on robust estimators for variance ratio tests. Statistical tests are organized by columns and the unique design conditions are organized by rows. See BF_sim_alternative for an alternative form of the same simulation. Code for this simulation is available on the wiki (https://github.com/philchalmers/SimDesign/wiki).

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
data(BF_sim)
head(BF_sim)

# Type I errors
subset(BF_sim, var_ratio == 1)

## End(Not run)

(Alternative) Example simulation from Brown and Forsythe (1974)

Description

Example results from the Brown and Forsythe (1974) article on robust estimators for variance ratio tests. Statistical tests and distributions are organized by columns and the unique design conditions are organized by rows. See BF_sim for an alternative form of the same simulation where distributions are also included in the rows. Code for this simulation is available on the wiki (https://github.com/philchalmers/SimDesign/wiki).

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
data(BF_sim_alternative)
head(BF_sim_alternative)

# Type I errors
subset(BF_sim_alternative, var_ratio == 1)

## End(Not run)

Bradley's (1978) empirical robustness interval

Description

Robustness interval criteria for empirical detection rate estimates and empirical coverage estimates defined by Bradley (1978). See EDR and ECR to obtain such estimates.
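
As a point of reference, under the conventional reading of Bradley's liberal criterion (.5*alpha to 1.5*alpha) a nominal alpha = .05 test would be labelled 'robust' when its empirical detection rate falls between .025 and .075, 'conservative' below that interval, and 'liberal' above it; the exact bounds used for each type can be printed by calling Bradley1978() with no rate input.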

Usage

Bradley1978(
  rate,
  alpha = 0.05,
  type = "liberal",
  CI = FALSE,
  out.logical = FALSE,
  out.labels = c("conservative", "robust", "liberal"),
  unname = FALSE
)

Arguments

rate

(optional) numeric vector containing the empirical detection rate(s) or empirical confidence interval estimates. If supplied, a character vector with elements defined in out.labels or a logical vector will be returned indicating whether the detection rate estimate is considered 'robust'.

When the input is an empirical coverage rate the argument CI must be set to TRUE.

If this input is missing, the interval criteria will be printed to the console

alpha

Type I error rate to be evaluated (default is .05)

type

character vector indicating the type of interval classification to use. Default is 'liberal', however can be 'stringent' to use Bradley's more stringent robustness criteria

CI

logical; should this robust interval be constructed on empirical detection rates (FALSE) or empirical coverage rates (TRUE)?

out.logical

logical; should the output vector be TRUE/FALSE indicating whether the supplied empirical detection rate/CI should be considered "robust"? Default is FALSE, in which case the out.labels elements are used instead

out.labels

character vector of length three indicating the classification labels according to the desired robustness interval

unname

logical; apply unname to the results to remove any variable names?

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

EDR, ECR, Serlin2000

Examples

# interval criteria used for empirical detection rates
Bradley1978()
Bradley1978(type = 'stringent')
Bradley1978(alpha = .01, type = 'stringent')

# intervals applied to empirical detection rate estimates
edr <- c(test1 = .05, test2 = .027, test3 = .051, test4 = .076, test5 = .024)
Bradley1978(edr)
Bradley1978(edr, out.logical=TRUE) # is robust?

#####
# interval criteria used for coverage estimates
Bradley1978(CI = TRUE)
Bradley1978(CI = TRUE, type = 'stringent')
Bradley1978(CI = TRUE, alpha = .01, type = 'stringent')

# intervals applied to empirical coverage rate estimates
ecr <- c(test1 = .950, test2 = .973, test3 = .949, test4 = .924, test5 = .976)
Bradley1978(ecr, CI=TRUE)
Bradley1978(ecr, CI=TRUE, out.logical=TRUE) # is robust?

Compute congruence coefficient

Description

Computes the congruence coefficient, also known as an "unadjusted" correlation or Tucker's congruence coefficient.

Usage

CC(x, y = NULL, unname = FALSE)

Arguments

x

a vector or data.frame/matrix containing the variables to use. If a vector then the input y is required, otherwise the congruence coefficient is computed for all bivariate combinations

y

(optional) the second vector input to use if x is a vector

unname

logical; apply unname to the results to remove any variable names?

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

cor

Examples

vec1 <- runif(1000)
vec2 <- runif(1000)
CC(vec1, vec2)

# compare to cor()
cor(vec1, vec2)

# column input
df <- data.frame(vec1, vec2, vec3 = runif(1000))
CC(df)
cor(df)

Compute empirical coverage rates

Description

Computes the detection rate for determining empirical coverage rates given a set of estimated confidence intervals. Note that using 1 - ECR(CIs, parameter) will provide the empirical detection rate. Also supports computing the average width of the CIs, which may be useful when comparing the efficiency of CI estimators.

Usage

ECR(
  CIs,
  parameter,
  tails = FALSE,
  CI_width = FALSE,
  complement = FALSE,
  names = NULL,
  unname = FALSE
)

Arguments

CIs

a numeric vector or matrix of confidence interval values for a given parameter value, where the first element/column indicates the lower confidence interval and the second element/column the upper confidence interval. If a vector of length 2 is passed instead then the returned value will be either a 1 or 0 to indicate whether the parameter value was or was not within the interval, respectively. Otherwise, the input must be a matrix with an even number of columns

parameter

a numeric scalar indicating the fixed parameter value. Alternatively, a numeric vector object with length equal to the number of rows as CIs (use to compare sets of parameters at once)

tails

logical; when TRUE returns a vector of length 2 to indicate the proportion of times the parameter was lower or higher than the supplied interval, respectively. This is mainly only useful when the coverage region is not expected to be symmetric, and therefore is generally not required. Note that 1 - sum(ECR(CIs, parameter, tails=TRUE)) == ECR(CIs, parameter)

CI_width

logical; rather than returning the overall coverage rate, return the average width of the CIs instead? Useful when comparing the efficiency of different CI estimators

complement

logical; rather than computing the proportion of population parameters within the CI, return the proportion outside the advertised CI (1 - ECR = alpha). In the case where only one value is provided, which normally would return a 0 if outside the CI or 1 if inside, the values will be switched (useful when using, for example, CI tests for the significance of parameters)

names

an optional character vector used to name the returned object. Generally useful when more than one CI estimate is investigated at once

unname

logical; apply unname to the results to remove any variable names?

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

EDR

Examples

CIs <- matrix(NA, 100, 2)
for(i in 1:100){
   dat <- rnorm(100)
   CIs[i,] <- t.test(dat)$conf.int
}
ECR(CIs, 0)
ECR(CIs, 0, tails = TRUE)
ECR(CIs, 0, complement = TRUE) # proportion outside interval

# single vector input
CI <- c(-1, 1)
ECR(CI, 0)
ECR(CI, 0, complement = TRUE)
ECR(CI, 2)
ECR(CI, 2, complement = TRUE)
ECR(CI, 2, tails = TRUE)

# parameters of the same size as CI
parameters <- 1:10
CIs <- cbind(parameters - runif(10), parameters + runif(10))
parameters <- parameters + rnorm(10)
ECR(CIs, parameters)

# average width of CIs
ECR(CIs, parameters, CI_width=TRUE)

# ECR() for multiple CI estimates in the same object
parameter <- 10
CIs <- data.frame(lowerCI_1=parameter - runif(10),
                  upperCI_1=parameter + runif(10),
                  lowerCI_2=parameter - 2*runif(10),
                  upperCI_2=parameter + 2*runif(10))
head(CIs)
ECR(CIs, parameter)
ECR(CIs, parameter, tails=TRUE)
ECR(CIs, parameter, CI_width=TRUE)

# often a good idea to provide names for the output
ECR(CIs, parameter, names = c('this', 'that'))
ECR(CIs, parameter, CI_width=TRUE, names = c('this', 'that'))
ECR(CIs, parameter, tails=TRUE, names = c('this', 'that'))

Compute the empirical detection/rejection rate for Type I errors and Power

Description

Computes the detection/rejection rate for determining empirical Type I error and power rates using information from p-values.

Usage

EDR(p, alpha = 0.05, unname = FALSE)

Arguments

p

a numeric vector or matrix/data.frame of p-values from the desired statistical estimator. If a matrix, each statistic must be organized by column, where the number of rows is equal to the number of replications

alpha

the detection threshold (typical values are .10, .05, and .01). Default is .05

unname

logical; apply unname to the results to remove any variable names?

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

ECR, Bradley1978

Examples

rates <- numeric(100)
for(i in 1:100){
   dat <- rnorm(100)
   rates[i] <- t.test(dat)$p.value
}
EDR(rates)
EDR(rates, alpha = .01)

# multiple rates at once
rates <- cbind(runif(1000), runif(1000))
EDR(rates)

Generate data

Description

Generate data from a single row in the design input (see runSimulation). R contains numerous approaches to generate data, some of which are contained in the base package, as well as in SimDesign (e.g., rmgh, rValeMaurelli, rHeadrick). However the majority can be found in external packages. See CRAN's list of possible distributions here: https://CRAN.R-project.org/view=Distributions. Note that this function technically can be omitted if the data generation is provided in the Analyse step, though in general this is not recommended for larger simulations.

Usage

Generate(condition, fixed_objects)

Arguments

condition

a single row from the design input (as a data.frame), indicating the simulation conditions

fixed_objects

object passed down from runSimulation

Details

The use of try functions is generally not required in this function because Generate is internally wrapped in a try call. Therefore, if a function stops early then this will cause the function to halt internally, the message which triggered the stop will be recorded, and Generate will be called again to obtain a different dataset. That said, it may be useful for users to throw their own stop commands if the data should be re-drawn for other reasons (e.g., an estimated model terminated correctly but the maximum number of iterations was reached).

Value

returns a single object containing the data to be analyzed (usually a vector, matrix, data.frame, or list)

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

addMissing, Attach, rmgh, rValeMaurelli, rHeadrick

Examples

## Not run: 
generate <- function(condition, fixed_objects) {
    N1 <- condition$sample_sizes_group1
    N2 <- condition$sample_sizes_group2
    sd <- condition$standard_deviations

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    # just a silly example of a simulated parameter
    pars <- list(random_number = rnorm(1))

    list(dat=dat, parameters=pars)
}

# similar to above, but using the Attach() function instead of indexing
generate <- function(condition, fixed_objects) {
    Attach(condition)
    N1 <- sample_sizes_group1
    N2 <- sample_sizes_group2
    sd <- standard_deviations

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

generate2 <- function(condition, fixed_objects) {
    mu <- sample(c(-1,0,1), 1)
    dat <- rnorm(100, mu)
    dat        # return simple vector (discard mu information)
}

generate3 <- function(condition, fixed_objects) {
    mu <- sample(c(-1,0,1), 1)
    dat <- data.frame(DV = rnorm(100, mu))
    dat
}

## End(Not run)

Perform a test that indicates whether a given Generate() function should be executed

Description

This function is designed to prevent specific generate function executions when the design conditions are not met. Primarily useful when the generate argument to runSimulation was input as a named list object but the generation function should only be applied for some specific design condition (otherwise, the data generation moves to the next function in the list).

Usage

GenerateIf(x, condition = NULL)

Arguments

x

logical statement to evaluate. If the statement evaluates to TRUE then the remainder of the defined function will be evaluated

condition

(optional) the current design condition. This does not need to be supplied if the expression in x evaluates to a valid logical (e.g., use Attach(condition) prior to using GenerateIf, or use with(condition, GenerateIf(someLogicalTest)))

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

Analyse, runSimulation

Examples

## Not run: 
# SimFunctions(nGenerate = 2)
Design <- createDesign(N=c(10,20,30), var.equal = c(TRUE, FALSE))

Generate.G1 <- function(condition, fixed_objects) {
  GenerateIf(condition$var.equal == FALSE) # only run when unequal vars
  Attach(condition)
  dat <- data.frame(DV = c(rnorm(N), rnorm(N, sd=2)),
                    IV = gl(2, N, labels=c('G1', 'G2')))
  dat
}

Generate.G2 <- function(condition, fixed_objects) {
  Attach(condition)
  dat <- data.frame(DV = rnorm(N*2), IV = gl(2, N, labels=c('G1', 'G2')))
  dat
}

# always run this analysis for each row in Design
Analyse <- function(condition, dat, fixed_objects) {
  mod <- t.test(DV ~ IV, data=dat)
  mod$p.value
}

Summarise <- function(condition, results, fixed_objects) {
  ret <- EDR(results, alpha=.05)
  ret
}

#-------------------------------------------------------------------
# append names 'Welch' and 'independent' to associated output
res <- runSimulation(design=Design, replications=1000,
                     generate=list(G1=Generate.G1, G2=Generate.G2),
                     analyse=Analyse,
                     summarise=Summarise)
res

## End(Not run)

Compute the integrated root mean-square error

Description

Computes the average/cumulative deviation given two continuous functions and an optional function representing the probability density function. Only one-dimensional integration is supported.

Usage

IRMSE(
  estimate,
  parameter,
  fn,
  density = function(theta, ...) 1,
  lower = -Inf,
  upper = Inf,
  ...
)

Arguments

estimate

a vector of parameter estimates

parameter

a vector of population parameters

fn

a continuous function where the first argument is to be integrated and the second argument is a vector of parameters or parameter estimates. This function represents an implied continuous function which uses the sample estimates or population parameters

density

(optional) a density function used to marginalize (i.e., average), where the first argument is to be integrated, and must be of the form density(theta, ...) or density(theta, param1, param2), where param1 is a placeholder name for the hyper-parameters associated with the probability density function. If omitted then the cumulative difference between the respective functions will be computed instead

lower

lower bound to begin numerical integration from

upper

upper bound to finish numerical integration to

...

additional parameters to pass to fn, density, and integrate

Details

The integrated root mean-square error (IRMSE) is of the form

IRMSE = \sqrt{\int [f(\theta, \hat{\psi}) - f(\theta, \psi)]^2 g(\theta, ...)\, d\theta}

where g(\theta, ...) is the density function used to marginalize the continuous sample (f(\theta, \hat{\psi})) and population (f(\theta, \psi)) functions.

Value

returns a single numeric term indicating the average/cumulative deviation given the supplied continuous functions

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

RMSE

Examples

# logistic regression function with one slope and intercept
fn <- function(theta, param) 1 / (1 + exp(-(param[1] + param[2] * theta)))

# sample and population sets
est <- c(-0.4951, 1.1253)
pop <- c(-0.5, 1)

theta <- seq(-10,10,length.out=1000)
plot(theta, fn(theta, pop), type = 'l', col='red', ylim = c(0,1))
lines(theta, fn(theta, est), col='blue', lty=2)

# cumulative result (i.e., standard integral)
IRMSE(est, pop, fn)

# integrated RMSE result by marginalizing over a N(0,1) distribution
den <- function(theta, mean, sd) dnorm(theta, mean=mean, sd=sd)
IRMSE(est, pop, fn, den, mean=0, sd=1)

# this specification is equivalent to the above
den2 <- function(theta, ...) dnorm(theta, ...)
IRMSE(est, pop, fn, den2, mean=0, sd=1)

Compute the mean absolute error

Description

Computes the average absolute deviation of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form.

Usage

MAE(estimate, parameter = NULL, type = "MAE", percent = FALSE, unname = FALSE)

Arguments

estimate

a numeric vector, matrix/data.frame, or list of parameter estimates. If a vector, the length is equal to the number of replications. If a matrix/data.frame the number of rows must equal the number of replications. list objects will be looped over using the same rules as above after first translating the information into one-dimensional vectors and re-creating the structure upon return

parameter

a numeric scalar/vector or matrix indicating the fixed parameter values. If a single value is supplied and estimate is a matrix/data.frame then the value will be recycled for each column; otherwise, each element will be associated with each respective column in the estimate input. If NULL, then it will be assumed that the estimate input is in a deviation form (therefore mean(abs(estimate)) will be returned)

type

type of deviation to compute. Can be 'MAE' (default) for the mean absolute error, 'NMAE' for the normalized MAE (MAE / (max(estimate) - min(estimate))), or 'SMAE' for the standardized MAE (MAE / sd(estimate))

percent

logical; change returned result to percentage by multiplying by 100? Default is FALSE

unname

logical; apply unname to the results to remove any variable names?

Value

returns a numeric vector indicating the overall mean absolute error in the estimates

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

RMSE

Examples

pop <- 1
samp <- rnorm(100, 1, sd = 0.5)
MAE(samp, pop)

dev <- samp - pop
MAE(dev)
MAE(samp, pop, type = 'NMAE')
MAE(samp, pop, type = 'SMAE')

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
MAE(mat, parameter = 2)

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
MAE(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
MAE(estimates, parameters)

Compute the relative performance behavior of collections of standard errors

Description

The mean-square relative standard error (MSRSE) compares standard error estimates to the standard deviation of the respective parameter estimates. Values close to 1 indicate that the behavior of the standard errors closely matched the sampling variability of the parameter estimates.

Usage

MSRSE(SE, SD, percent = FALSE, unname = FALSE)

Arguments

SE

a numeric scalar/vector indicating the average standard errors across the replications, or a matrix of collected standard error estimates themselves to be used to compute the average standard errors. Each column/element in this input corresponds to the column/element in SD

SD

a numeric scalar/vector indicating the standard deviation across the replications, or a matrix of collected parameter estimates themselves to be used to compute the standard deviations. Each column/element in this input corresponds to the column/element in SE

percent

logical; change returned result to percentage by multiplying by 100? Default is FALSE

unname

logical; apply unname to the results to remove any variable names?

Details

Mean-square relative standard error (MSRSE) is expressed as

MSRSE = \frac{E[SE(\psi)^2]}{SD(\psi)^2} = \frac{\frac{1}{R} \sum_{r=1}^R SE(\psi_r)^2}{SD(\psi)^2}

where SE(\psi_r) represents the estimate of the standard error at the rth simulation replication, and SD(\psi) represents the standard deviation estimate of the parameters across all R replications. Note that SD(\psi)^2 is used, which corresponds to the variance of \psi.
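
To make the formula concrete, the ratio can also be evaluated directly from a single column of collected estimates; a small sketch (the MSRSE() call passes the collected values in the matrix form described above, and small discrepancies from the variance denominator convention are possible):

R <- 5000
ests <- rnorm(R, mean = 0, sd = 1/sqrt(50))   # parameter estimates across R replications
SEs  <- 1/sqrt(50) + rnorm(R, sd = .005)      # matching standard error estimates

# direct evaluation of E(SE^2) / SD(est)^2
mean(SEs^2) / var(ests)

# roughly the same quantity via MSRSE()
MSRSE(SE = cbind(SEs), SD = cbind(ests))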

Value

returns a vector of ratios indicating the relative performance of the standard error estimates to the observed parameter standard deviation. Values less than 1 indicate that the standard errors were larger than the standard deviation of the parameters (hence, the SEs are interpreted as more conservative), while values greater than 1 indicate that the standard errors were smaller than the standard deviation of the parameters (i.e., more liberal SEs)

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

Generate <- function(condition, fixed_objects) {
   X <- rep(0:1, each = 50)
   y <- 10 + 5 * X + rnorm(100, 0, .2)
   data.frame(y, X)
}

Analyse <- function(condition, dat, fixed_objects) {
   mod <- lm(y ~ X, dat)
   so <- summary(mod)
   ret <- c(SE = so$coefficients[,"Std. Error"],
            est = so$coefficients[,"Estimate"])
   ret
}

Summarise <- function(condition, results, fixed_objects) {
   MSRSE(SE = results[,1:2], SD = results[,3:4])
}

results <- runSimulation(replications=500, generate=Generate,
                         analyse=Analyse, summarise=Summarise)
results

Probabilistic Bisection Algorithm

Description

The function PBA searches a specified interval for a root (i.e., zero) of the function f(x) with respect to its first argument. However, this function differs from deterministic cousins such as uniroot in that f may contain stochastic error components, and instead provides a Bayesian interval where the root is likely to lie. Note that it is assumed that E[f(x)] is non-decreasing in x and that the root lies within the search interval (evaluated approximately when check.interval=TRUE). See Waeber, Frazier, and Henderson (2013) for details.
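
In rough terms, and as a sketch of the standard probabilistic bisection scheme rather than a line-by-line account of this implementation, each iteration evaluates the sign of the noisy f at the current posterior median m_n and re-weights the posterior density for the root location as

f_{n+1}(x) \propto f_n(x) \cdot p for x on the side of m_n indicated by the observed sign, and f_{n+1}(x) \propto f_n(x) \cdot (1 - p) on the other side,

where p is the assumed probability of a correct sign response (the p argument below). After renormalizing, the posterior median serves as the updated root estimate and a credible interval around it gives the reported Bayesian interval.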

Usage

PBA(
  f.root,
  interval,
  ...,
  p = 0.6,
  integer = FALSE,
  tol = if (integer) 0.01 else 1e-04,
  maxiter = 300L,
  miniter = 100L,
  wait.time = NULL,
  f.prior = NULL,
  resolution = 10000L,
  check.interval = TRUE,
  check.interval.only = FALSE,
  verbose = interactive()
)

## S3 method for class 'PBA'
print(x, ...)

## S3 method for class 'PBA'
plot(x, type = "posterior", main = "Probabilistic Bisection Posterior", ...)

Arguments

f.root

noisy function for which the root is sought

interval

a vector containing the end-points of the interval to be searched for the root, of the form c(lower, upper).

Note that if the interval is specified as c(upper, lower), where upper > lower, then the search will be organized such that increasing the value of the root estimate will result in lower f(x) values

...

additional named arguments to be passed to f

p

assumed constant for probability of correct responses (must be > 0.5)

integer

logical; should the values of the root be considered integer or numeric? The former uses a discrete grid to track the updates, while the latter currently creates a grid with resolution points

tol

tolerance criteria for convergence based on average of the f(x) evaluations

maxiter

the maximum number of iterations (default 300)

miniter

minimum number of iterations (default 100)

wait.time

(optional) instead of terminating after specific estimate criteria are satisfied (e.g., tol), terminate after a specific wait time. Input is specified either as a numeric vector in seconds or as a character vector to be formatted by timeFormater. Note that users should increase the number of maxiter as well so that termination can occur if either the maximum iterations are satisfied or the specified wait time has elapsed (whichever occurs first)

f.prior

density function indicating the likely location of the prior (e.g., if root is within [0,1] then dunif works, otherwise custom functions will be required)

resolution

constant indicating the number of equally spaced grid points to track when integer = FALSE.

check.interval

logical; should an initial check be made to determine whether f(interval[1L]) and f(interval[2L]) have opposite signs? Default is TRUE

check.interval.only

logical; return only TRUE or FALSE to test whether there is a likely root given interval? Setting this to TRUE can be useful when you are unsure about the root location interval and may want to use a higher replication input from SimSolve

verbose

logical; should the iterations and estimate be printed to the console?

x

an object of class PBA

type

type of plot to draw for PBA object. Can be either 'posterior' or 'history' to plot the PBA posterior distribution or the median iteration history

main

plot title

References

Horstein, M. (1963). Sequential transmission using noiseless feedback. IEEE Trans. Inform. Theory, 9(3):136-143.

Waeber, R., Frazier, P. I. & Henderson, S. G. (2013). Bisection Search with Noisy Responses. SIAM Journal on Control and Optimization, Society for Industrial & Applied Mathematics (SIAM), 51, 2261-2279.

See Also

uniroot, RobbinsMonro

Examples

# find x that solves f(x) - b = 0 for the following
f.root <- function(x, b = .6) 1 / (1 + exp(-x)) - b
f.root(.3)

xs <- seq(-3,3, length.out=1000)
plot(xs, f.root(xs), type = 'l', ylab = "f(x)", xlab='x', las=1)
abline(h=0, col='red')

retuni <- uniroot(f.root, c(0,1))
retuni
abline(v=retuni$root, col='blue', lty=2)

# PBA without noisy root
retpba <- PBA(f.root, c(0,1))
retpba
retpba$root
plot(retpba)
plot(retpba, type = 'history')

# Same problem, however root function is now noisy. Hence, need to solve
#  fhat(x) - b + e = 0, where E(e) = 0
f.root_noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd=.02)
sapply(rep(.3, 10), f.root_noisy)

# uniroot "converges" unreliably
set.seed(123)
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root

# probabilistic bisection provides better convergence
retpba.noise <- PBA(f.root_noisy, c(0,1))
retpba.noise
plot(retpba.noise)
plot(retpba.noise, type = 'history')

## Not run: 
# ignore termination criteria and instead run for 30 seconds or 50000 iterations
retpba.noise_30sec <- PBA(f.root_noisy, c(0,1), wait.time = "0:30", maxiter=50000)
retpba.noise_30sec

## End(Not run)

Compute the relative absolute bias of multiple estimators

Description

Computes the relative absolute bias given the bias estimates for multiple estimators.

Usage

RAB(x, percent = FALSE, unname = FALSE)

Arguments

x

a numeric vector of bias estimates (see bias), where the first element will be used as the reference

percent

logical; change returned result to percentage by multiplying by 100? Default is FALSE

unname

logical; apply unname to the results to remove any variable names?

Value

returns a vector of absolute bias ratios indicating the relative bias effects compared to the first estimator. Values less than 1 indicate better bias estimates than the first estimator, while values greater than 1 indicate worse bias than the first estimator

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

pop <- 1
samp1 <- rnorm(5000, 1)
bias1 <- bias(samp1, pop)

samp2 <- rnorm(5000, 1)
bias2 <- bias(samp2, pop)

RAB(c(bias1, bias2))
RAB(c(bias1, bias2), percent = TRUE) # as a percentage

Compute the relative difference

Description

Computes the relative difference statistic of the form (est - pop) / pop, which is equivalent to the form est/pop - 1. If matrices are supplied then an equivalent matrix variant will be used of the form (est - pop) * solve(pop). Values closer to 0 indicate better relative parameter recovery. Note that for single variable inputs this is equivalent to bias(..., type = 'relative').
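
For example, an estimate of 1.1 for a population value of 1.0 gives a relative difference of (1.1 - 1.0)/1.0 = 0.10, i.e., a 10% overestimate of the parameter.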

Usage

RD(est, pop, as.vector = TRUE, unname = FALSE)

Arguments

est

a numeric vector, matrix/data.frame, or list containing the parameter estimates

pop

a numeric vector or matrix containing the true parameter values. Must be of comparable dimension to est

as.vector

logical; always wrap the result in an as.vector function before returning?

unname

logical; apply unname to the results to remove any variable names?

Value

returns a vector or matrix depending on the inputs and whether as.vector was used

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

# vector
pop <- seq(1, 100, length.out=9)
est1 <- pop + rnorm(9, 0, .2)
(rds <- RD(est1, pop))
summary(rds)

# matrix
pop <- matrix(c(1:8, 10), 3, 3)
est2 <- pop + rnorm(9, 0, .2)
RD(est2, pop, as.vector = FALSE)
(rds <- RD(est2, pop))
summary(rds)

Compute the relative efficiency of multiple estimators

Description

Computes the relative efficiency given the RMSE (default) or MSE values for multiple estimators.

Usage

RE(x, MSE = FALSE, percent = FALSE, unname = FALSE)

Arguments

x

a numeric vector of root mean square error values (see RMSE), where the first element will be used as the reference. Otherwise, the object could contain MSE values if the flag MSE = TRUE is also included

MSE

logical; are the input values mean squared errors instead of root mean square errors?

percent

logical; change returned result to percentage by multiplying by 100? Default is FALSE

unname

logical; apply unname to the results to remove any variable names?

Value

returns a vector of variance ratios indicating the relative efficiency compared to the first estimator. Values less than 1 indicate better efficiency than the first estimator, while values greater than 1 indicate worse efficiency than the first estimator

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

pop <- 1
samp1 <- rnorm(100, 1, sd = 0.5)
RMSE1 <- RMSE(samp1, pop)

samp2 <- rnorm(100, 1, sd = 1)
RMSE2 <- RMSE(samp2, pop)

RE(c(RMSE1, RMSE2))
RE(c(RMSE1, RMSE2), percent = TRUE) # as a percentage

# using MSE instead
mse <- c(RMSE1, RMSE2)^2
RE(mse, MSE = TRUE)

Compute the (normalized) root mean square error

Description

Computes the average deviation (root mean square error; also known as the root mean square deviation) of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form.

Usage

RMSE(
  estimate,
  parameter = NULL,
  type = "RMSE",
  MSE = FALSE,
  percent = FALSE,
  unname = FALSE
)

RMSD(
  estimate,
  parameter = NULL,
  type = "RMSE",
  MSE = FALSE,
  percent = FALSE,
  unname = FALSE
)

Arguments

estimate

a numeric vector, matrix/data.frame, or list of parameter estimates. If a vector, the length is equal to the number of replications. If a matrix/data.frame, the number of rows must equal the number of replications. list objects will be looped over using the same rules as above after first translating the information into one-dimensional vectors and re-creating the structure upon return

parameter

a numeric scalar/vector indicating the fixed parameter values. If a single value is supplied and estimate is a matrix/data.frame then the value will be recycled for each column; otherwise, each element will be associated with each respective column in the estimate input. If NULL then it will be assumed that the estimate input is in a deviation form (therefore sqrt(mean(estimate^2)) will be returned)

type

type of deviation to compute. Can be 'RMSE' (default) for the root mean square-error, 'NRMSE' for the normalized RMSE (RMSE / (max(estimate) - min(estimate))), 'SRMSE' for the standardized RMSE (RMSE / sd(estimate)), 'CV' for the coefficient of variation, or 'RMSLE' for the root mean-square log-error

MSE

logical; return the mean square error equivalent of the results instead of the root mean-square error (in other words, the result is squared)? Default is FALSE

percent

logical; change returned result to percentage by multiplying by 100? Default is FALSE

unname

logical; apply unname to the results to remove any variable names?

Value

returns a numeric vector indicating the overall average deviation in the estimates

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

bias

MAE

Examples

pop <- 1
samp <- rnorm(100, 1, sd = 0.5)
RMSE(samp, pop)

dev <- samp - pop
RMSE(dev)
RMSE(samp, pop, type = 'NRMSE')
RMSE(dev, type = 'NRMSE')
RMSE(dev, pop, type = 'SRMSE')
RMSE(samp, pop, type = 'CV')
RMSE(samp, pop, type = 'RMSLE')

# percentage reported
RMSE(samp, pop, type = 'NRMSE')
RMSE(samp, pop, type = 'NRMSE', percent = TRUE)

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
RMSE(mat, parameter = 2)
RMSE(mat, parameter = c(2, 3))

# different parameter associated with each column
mat <- cbind(M1=rnorm(1000, 2, sd = 0.25), M2 = rnorm(1000, 3, sd = .25))
RMSE(mat, parameter = c(2,3))

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
RMSE(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
RMSE(estimates, parameters)

Compute the relative standard error ratio

Description

Computes the relative standard error ratio given the set of estimated standard errors (SE) and the deviation across the R simulation replications (SD). The ratio is formed by finding the expectation of the SE terms, and compares this expectation to the general variability of their respective parameter estimates across the R replications (ratio should equal 1). This is used to roughly evaluate whether the SEs being advertised by a given estimation method match the sampling variability of the respective estimates across samples.

Usage

RSE(SE, ests, unname = FALSE)

Arguments

SE

a numeric matrix of SE estimates across the replications (extracted from the results object in the Summarise step). Alternatively, can be a vector containing the mean of the SE estimates across the R simulation replications

ests

a numeric matrix object containing the parameter estimates under investigation found within the Summarise function. This input is used to compute the standard deviation/variance estimates for each column to evaluate how well the expected SE matches the standard deviation

unname

logical; apply unname to the results to remove any variable names?

Value

returns a vector of variance ratios (RSV = SE^2/SD^2)

Author(s)

Phil Chalmers <rphilip.chalmers@gmail.com>

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

R <- 10000
par_ests <- cbind(rnorm(R), rnorm(R, sd=1/10),
                  rnorm(R, sd=1/15))
colnames(par_ests) <- paste0("par", 1:3)
(SDs <- colSDs(par_ests))

SEs <- cbind(1 + rnorm(R, sd=.01),
             1/10 + rnorm(R, sd=.01),
             1/15 + rnorm(R, sd=.01))
(E_SEs <- colMeans(SEs))

RSE(SEs, par_ests)

# equivalent to the form
colMeans(SEs) / SDs

Robbins-Monro (1951) stochastic root-finding algorithm

Description

Function performs stochastic root solving for the provided f(x) using the Robbins-Monro (1951) algorithm. Differs from deterministic cousins such as uniroot in that f may contain stochastic error components, where the root is obtained through the running average method provided by the noise filter (see also PBA). Assumes that E[f(x)] is non-decreasing in x.
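
As a sketch of the classical Robbins-Monro recursion underlying this approach (the sign convention shown assumes a non-decreasing E[f(x)]; implementation details may differ), the estimate is updated at each iteration n as

p_{n+1} = p_n - a_n f(p_n),

where the step sizes a_n follow a sequence such as a_n = a/(n + c)^b (the default form of the fn.a argument below), so that positive evaluations of the noisy f push the estimate down toward the root, negative evaluations push it up, and the shrinking steps progressively filter out the noise.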

Usage

RobbinsMonro(
  f,
  p,
  ...,
  Polyak_Juditsky = FALSE,
  maxiter = 500L,
  miniter = 100L,
  k = 3L,
  tol = 1e-05,
  verbose = interactive(),
  fn.a = function(iter, a = 1, b = 1/2, c = 0, ...) a/(iter + c)^b
)

## S3 method for class 'RM'
print(x, ...)

## S3 method for class 'RM'
plot(x, par = 1, main = NULL, Polyak_Juditsky = FALSE, ...)

Arguments

f

noisy function for which the root is sought

p

vector of starting values to be passed as f(p, ...)

...

additional named arguments to be passed to f

Polyak_Juditsky

logical; apply the Polyak and Juditsky (1992) running-average method? Returns the final running average estimate using the Robbins-Monro updates (also applies to plot). Note that this should only be used when the step-sizes are sufficiently large so that the Robbins-Monro updates have the ability to stochastically explore around the root (not just approach it from one side, which occurs when using small steps)

maxiter

the maximum number of iterations (default 500)

miniter

minimum number of iterations (default 100)

k

number of consecutive tol criteria required before terminating

tol

tolerance criteria for convergence on the changes in the updated p elements. Must be achieved on k (default 3) successive occasions

verbose

logical; should the iterations and estimate be printed to theconsole?

fn.a

function to create the a coefficient in the Robbins-Monro noise filter. Requires that the first argument is the current iteration (iter), followed by one or more additional arguments, and (optionally) the .... The sequence function is of the form recommended by Spall (2000).

Note that if a different function is provided it must satisfy the property that \sum^\infty_{i=1} a_i = \infty and \sum^\infty_{i=1} a_i^2 < \infty

x

an object of class RM

par

which parameter in the original vector p to include in the plot

main

plot title
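As a rough sketch, a user-supplied gain sequence might look as follows (the function f.noisy and the constants in my.a are hypothetical; any sequence satisfying the summability conditions above may be substituted):

# noisy version of f(x) = 1/(1 + exp(-x)) - .6, similar to the Examples below
f.noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd = .02)

# a slower-decaying gain sequence, a_i = 1/(i + 10)^0.6, which still satisfies
# sum(a_i) = Inf and sum(a_i^2) < Inf
my.a <- function(iter, a = 1, b = 0.6, c = 10, ...) a / (iter + c)^b

RobbinsMonro(f.noisy, p = .9, fn.a = my.a)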

References

Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of Stochastic Approximation by Averaging. SIAM Journal on Control and Optimization, 30(4):838.

Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statistics, 22:400-407.

Spall, J.C. (2000). Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45, 1839-1853.

See Also

uniroot, PBA

Examples

# find x that solves f(x) - b = 0 for the following
f.root <- function(x, b = .6) 1 / (1 + exp(-x)) - b
f.root(.3)

xs <- seq(-3,3, length.out=1000)
plot(xs, f.root(xs), type = 'l', ylab = "f(x)", xlab='x')
abline(h=0, col='red')

retuni <- uniroot(f.root, c(0,1))
retuni
abline(v=retuni$root, col='blue', lty=2)

# Robbins-Monro without noisy root, start with p=.9
retrm <- RobbinsMonro(f.root, .9)
retrm
plot(retrm)

# Same problem, however root function is now noisy. Hence, need to solve
#  fhat(x) - b + e = 0, where E(e) = 0
f.root_noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd=.02)
sapply(rep(.3, 10), f.root_noisy)

# uniroot "converges" unreliably
set.seed(123)
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root

# Robbins-Monro provides better convergence
retrm.noise <- RobbinsMonro(f.root_noisy, .9)
retrm.noise
plot(retrm.noise)

# different power (b) for fn.a()
retrm.b2 <- RobbinsMonro(f.root_noisy, .9, b = .01)
retrm.b2
plot(retrm.b2)

# use Polyak-Juditsky averaging (b should be closer to 0 to work well)
retrm.PJ <- RobbinsMonro(f.root_noisy, .9, b = .01,
                         Polyak_Juditsky = TRUE)
retrm.PJ   # final Polyak_Juditsky estimate
plot(retrm.PJ) # Robbins-Monro history
plot(retrm.PJ, Polyak_Juditsky = TRUE) # Polyak_Juditsky history

Surrogate Function Approximation via the Generalized Linear Model

Description

Given a simulation that was executed with runSimulation, potentially with the argument store_results = TRUE to store the unsummarised analysis results, fit a surrogate function approximation (SFA) model to the results and (optionally) perform a root-solving step to solve a target quantity. See Schoemann et al. (2014) for details.

Usage

SFA(
  results,
  formula,
  family = "binomial",
  b = NULL,
  design = NULL,
  CI = 0.95,
  interval = NULL,
  ...
)

## S3 method for class 'SFA'
print(x, ...)

Arguments

results

data returned from runSimulation. This can be the original results object or the extracted results stored when store_results = TRUE was included to store the analysis results.

formula

formula to specify for the regression model

family

character vector indicating the family of GLMs to use (see family)

b

(optional) Target quantity to use for root solving given the fitted surrogate function (e.g., find the sample size associated with an SFA-implied power of .80)

design

(optional) data.frame object containing all the information relevant for the surrogate model (passed to newdata in predict) with an NA value in the variable to be solved

CI

advertised confidence interval of SFA prediction around solved target

interval

interval to be passed to uniroot. If not specified then the lowest and highest values from results for the respective variable will be used

...

additional arguments to pass to glm

x

an object of class SFA

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Schoemann, A. M., Miller, P., Pornprasertmanit, S., and Wu, W. (2014). Using Monte Carlo simulations to determine power and sample size for planned missing designs. International Journal of Behavioral Development, SAGE Publications, 38, 471-479.

See Also

runSimulation, SimSolve

Examples

## Not run: 
# create long Design object to fit surrogate over
Design <- createDesign(N = 100:500,
                       d = .2)
Design

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    group1 <- rnorm(N)
    group2 <- rnorm(N, mean=d)
    dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    p <- c(p = t.test(DV ~ group, dat, var.equal=TRUE)$p.value)
    p
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha = .05)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Estimate power over N

# Use small number of replications given range of sample sizes
## note that due to the lower replications disabling the
## RAM printing will help reduce overhead
sim <- runSimulation(design=Design, replications=10,
                     generate=Generate, analyse=Analyse,
                     summarise=Summarise, store_results=TRUE, save=FALSE,
                     progress=FALSE, control=list(print_RAM=FALSE))
sim

# total of 4010 replications
sum(sim$REPLICATIONS)

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# fitted model
sfa <- SFA(sim_results, formula = sig ~ N)
sfa
summary(sfa)

# plot the observed and SFA expected values
plot(p ~ N, sim, las=1, pch=16, main='Rejection rates with R=10')
pred <- predict(sfa, type = 'response')
lines(sim_results$N, pred, col='red', lty=2)

# fitted model + root-solved solution given f(.) = b,
#   where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N,
                b=.8, design=design)
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

################
# example with smaller range but higher precision
Design <- createDesign(N = 375:425,
                       d = .2)
Design

sim2 <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse,
                     summarise=Summarise, store_results=TRUE, save=FALSE,
                     progress=FALSE, control=list(print_RAM=FALSE))
sim2
sum(sim2$REPLICATIONS) # more replications in total

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim2)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# fitted model
sfa <- SFA(sim_results, formula = sig ~ N)
sfa
summary(sfa)

# plot the observed and SFA expected values
plot(p ~ N, sim2, las=1, pch=16, main='Rejection rates with R=100')
pred <- predict(sfa, type = 'response')
lines(sim_results$N, pred, col='red', lty=2)

# fitted model + root-solved solution given f(.) = b,
#   where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N,
                b=.8, design=design, interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

###################
# vary multiple parameters (e.g., sample size + effect size) to fit
# multi-parameter surrogate
Design <- createDesign(N = seq(from=10, to=500, by=10),
                       d = seq(from=.1, to=.5, by=.1))
Design

sim3 <- runSimulation(design=Design, replications=50,
                      generate=Generate, analyse=Analyse,
                      summarise=Summarise, store_results=TRUE, save=FALSE,
                      progress=FALSE, control=list(print_RAM=FALSE))
sim3
sum(sim3$REPLICATIONS)

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim3)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# additive effects (logit(sig) ~ N + d)
sfa0 <- SFA(sim_results, formula = sig ~ N+d)
sfa0

# multiplicative effects (logit(sig) ~ N + d + N:d)
sfa <- SFA(sim_results, formula = sig ~ N*d)
sfa

# multiplicative better fit (sample size interacts with effect size)
anova(sfa0, sfa, test = "LRT")
summary(sfa)

# plot the observed and SFA expected values
library(ggplot2)
sim3$pred <- predict(sfa, type = 'response', newdata=sim3)
ggplot(sim3, aes(N, p, color = factor(d))) +
  geom_point() + geom_line(aes(y=pred)) +
  facet_wrap(~factor(d))

# fitted model + root-solved solution given f(.) = b,
#   where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N * d,
                b=.8, design=design, interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

# root prediction where d *not* used in original data
design <- data.frame(N=NA, d=.25)
sfa.root <- SFA(sim_results, formula = sig ~ N * d,
                b=.8, design=design, interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.25)

## End(Not run)

Empirical detection robustness method suggested by Serlin (2000)

Description

Hypothesis test to determine whether an observed empirical detection rate, coupled with a given robustness interval, statistically differs from the population value. Uses the methods described by Serlin (2000), as well as generating critical values (similar to confidence intervals, but defining a fixed window of robustness). Critical values may be computed without performing the simulation experiment (hence, can be obtained a priori).

Usage

Serlin2000(p, alpha, delta, R, CI = 0.95)

Arguments

p

(optional) a vector containing the empirical detection rate(s) to be tested. Omitting this input will compute only the CV1 and CV2 values, while including this input will perform a one-sided hypothesis test for robustness

alpha

Type I error rate (e.g., often set to .05)

delta

(optional) symmetric robustness interval around alpha (e.g., a value of .01 when alpha = .05 would test the robustness window .04-.06)

R

number of replications used in the simulation

CI

confidence interval for alpha as a proportion. Default of 0.95 indicates a 95% interval

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Serlin, R. C. (2000). Testing for Robustness in Monte Carlo Studies. Psychological Methods, 5, 230-240.

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

# Cochran's criteria at alpha = .05 (i.e., .05 +- .01), assuming N = 2000
Serlin2000(p = .051, alpha = .05, delta = .01, R = 2000)

# Bradley's liberal criteria given p = .06 and .076, assuming N = 1000
Serlin2000(p = .060, alpha = .05, delta = .025, R = 1000)
Serlin2000(p = .076, alpha = .05, delta = .025, R = 1000)

# multiple p-values
Serlin2000(p = c(.05, .06, .07), alpha = .05, delta = .025, R = 1000)

# CV values computed before simulation performed
Serlin2000(alpha = .05, R = 2500)

Function for decomposing the simulation into ANOVA-based effect sizes

Description

Given the results from a simulation with runSimulation, form an ANOVA table (without p-values) with effect sizes based on the eta-squared statistic. These results provide approximate indications of observable simulation effects; therefore, these ANOVA-based results are generally useful as exploratory rather than inferential tools.

Usage

SimAnova(formula, dat, subset = NULL, rates = TRUE)

Arguments

formula

an R formula generally of a form suitable for lm or aov. However, if the dependent variable (left side of the equation) is omitted then all the dependent variables in the simulation will be used and the result will return a list of analyses

dat

an object returned from runSimulation of class 'SimDesign'

subset

an optional argument to be passed to subset with the same name. Used to subset the results object while preserving the associated attributes

rates

logical; does the dependent variable consist of rates (e.g., returned from ECR or EDR)? Default is TRUE, which will use the logit of the DV to help stabilize the proportion-based summary statistics when computing the parameters and effect sizes

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

data(BF_sim)

# all results (not usually good to mix Power and Type I results together)
SimAnova(alpha.05.F ~ (groups_equal + distribution)^2, BF_sim)

# only use anova for Type I error conditions
SimAnova(alpha.05.F ~ (groups_equal + distribution)^2, BF_sim, subset = var_ratio == 1)

# run all DVs at once using the same formula
SimAnova(~ groups_equal * distribution, BF_sim, subset = var_ratio == 1)

Check for missing files in array simulations

Description

Given the files saved from a runArraySimulation remote evaluation, check whether all .rds files have been saved. If any are missing, the missing row condition numbers will be returned.

Usage

SimCheck(dir = NULL, files = NULL, min = 1L, max = NULL)

Arguments

dir

character vector input indicating the directory containing the .rds files (see files)

files

vector of file names referring to the saved simulation files. E.g., c('mysim-1.rds', 'mysim-2.rds', ...)

min

minimum number after the '-' delimiter. Default is 1

max

maximum number after the '-' delimiter. If not specified it is extracted from the attributes in the first file

Value

returns an invisible list of indices of the empty, missing, and empty-and-missing row conditions. If nothing is missing then an empty list is returned
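For instance, a minimal sketch of inspecting this output (the directory name 'sim_files/' is hypothetical):

# after the array jobs have finished writing their .rds files
missed <- SimCheck(dir = 'sim_files/')
str(missed)   # inspect which row conditions were empty and/or missing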

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

See Also

runArraySimulation, SimCollect

Examples

## Not run: 
# if files are in mysimfiles/ directory
SimCheck('mysimfiles')

# specifying files explicitly
setwd('mysimfiles/')
SimCheck(files=dir())

## End(Not run)

Removes/cleans files and folders that have been saved

Description

This function is mainly used in pilot studies where results and datasets have been temporarily saved by runSimulation but should be removed before beginning the full Monte Carlo simulation (e.g., remove files and folders which contained bugs/biased results).

Usage

SimClean(
  ...,
  dirs = NULL,
  temp = TRUE,
  results = FALSE,
  seeds = FALSE,
  save_details = list()
)

Arguments

...

one or more character objects indicating which files to remove. Used to remove .rds files which were saved with saveRDS or when using the save and filename inputs to runSimulation

dirs

a character vector indicating which directories to remove

temp

logical; remove the temporary file saved when passing save = TRUE?

results

logical; remove the .rds results files saved when passing save_results = TRUE?

seeds

logical; remove the seed files saved when passing save_seeds = TRUE?

save_details

a list pertaining to information about how and where files were saved (see the corresponding list in runSimulation)

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

## Not run: 
# remove file called 'results.rds'
SimClean('results.rds')

# remove default temp file
SimClean()

# remove customized saved-results directory called 'mydir'
SimClean(results = TRUE, save_details = list(save_results_dirname = 'mydir'))

## End(Not run)

Collapse separate simulation files into a single result

Description

This function collects and aggregates the results from SimDesign's runSimulation into a single object suitable for post-analyses, or combines all the saved results directories into one. This is useful when results are run piece-wise on one node (e.g., 500 replications in one batch, 500 again at a later date, though be careful about the set.seed use as the random numbers will tend to correlate the more it is used) or run independently across different nodes/computing cores (e.g., see runArraySimulation).

Usage

SimCollect(
  dir = NULL,
  files = NULL,
  filename = NULL,
  select = NULL,
  check.only = FALSE,
  target.reps = NULL,
  warning_details = FALSE,
  error_details = TRUE,
  gc = FALSE
)

aggregate_simulations(...)

Arguments

dir

a character vector pointing to the directory name containing the .rds files. All .rds files in this directory will be used after first checking their status with SimCheck. For greater specificity use the files argument

files

a character vector containing the names of the simulation's final .rds files.

filename

(optional) name of .rds file to save the aggregate simulation file to. If not specified then the results will only be returned in the R console.

select

a character vector indicating the variables (columns) to select from the SimExtract(what='results') information. This is mainly useful when RAM is an issue given simulations with many stored estimates. Default includes the results objects in their entirety, though to omit all internally stored simulation results pass the character 'NONE'. To investigate the stored warnings and error messages in isolation pass 'WARNINGS' or 'ERRORS', respectively

check.only

logical; for larger simulation file sets, such as those generated by runArraySimulation, return the design conditions that do not satisfy target.reps and throw a warning if files are unexpectedly missing (the latter criterion is reported from SimCheck, which can be used to obtain the conditions associated with the missing files)

target.reps

(optional) number of replications to check against to evaluate whether the simulation files returned the desired number of replications. If missing, the highest detected value from the collected set of replication information will be used

warning_details

logical; include the aggregate of the warnings to be extracted via SimExtract?

error_details

logical; include the aggregate of the errors to be extracted via SimExtract?

gc

logical; explicitly call R's garbage collector gc? May help when memory is severely constrained during the file read-ins. Otherwise, the select argument should be used to take more memory-friendly subsets

...

not used

Value

returns a data.frame/tibble with the (weighted) average/aggregate of the simulation results

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

runSimulation, runArraySimulation, SimCheck

Examples

## Not run: 
setwd('my_working_directory')

## run simulations to save the .rds files (or move them to the working directory)
# seeds1 <- genSeeds(design)
# seeds2 <- genSeeds(design, old.seeds=seeds1)
# ret1 <- runSimulation(design, ..., seed=seeds1, filename='file1')
# ret2 <- runSimulation(design, ..., seed=seeds2, filename='file2')

# saves to the hard-drive and stores in workspace
final <- SimCollect(files = c('file1.rds', 'file2.rds'))
final

# If filename not included, can be extracted from results
# files <- c(SimExtract(ret1, 'filename'), SimExtract(ret2, 'filename'))
# final <- SimCollect(files = files)

#################################################
# Example where each row condition is repeated, evaluated independently,
# and later collapsed into a single analysis object

# Each condition repeated four times (hence, replications
# should be set to desired.reps/4)
Design <- createDesign(mu = c(0,5),
                       N  = c(30, 60))
Design

# assume the N=60 takes longer, and should be spread out across more arrays
Design_long <- expandDesign(Design, c(2,2,4,4))
Design_long

replications <- c(rep(50, 4), rep(25,8))
data.frame(Design_long, replications)

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, mean=mu))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), SD=sd(dat))
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- colMeans(results)
    ret
}

#-------------------------------------------------------------------

# create directory to store all final simulation files
dir.create('sim_files/')
iseed <- genSeeds()

# distribute jobs independently
sapply(1:nrow(Design_long), \(i) {
  runArraySimulation(design=Design_long, replications=replications,
                generate=Generate, analyse=Analyse, summarise=Summarise,
                arrayID=i, dirname='sim_files/', filename='job', iseed=iseed)
}) |> invisible()

# check for any missed files
SimCheck(dir='sim_files/')

# check that all replications satisfy target
SimCollect('sim_files/', check.only = TRUE)

# specify files explicitly
SimCollect(files = list.files(path='sim_files/', pattern="*.rds", full.names=TRUE),
   check.only = TRUE)

# this would have been returned were the target.rep supposed to be 1000
SimCollect('sim_files/', check.only = TRUE, target.reps=1000)

# aggregate into single object
sim <- SimCollect('sim_files/')
sim

# view list of error messages (if there were any raised)
SimCollect('sim_files/', select = 'ERRORS')

SimClean(dir='sim_files/')

## End(Not run)

Function to extract extra information from SimDesign objects

Description

Function used to extract any error or warning messages, the seeds associated with any error or warning messages, and any analysis results that were stored in the final simulation object.

Usage

SimExtract(object, what, fuzzy = TRUE, append = TRUE)

Arguments

object

object returned fromrunSimulation

what

character vector indicating what information to extract, written in agnostic casing (e.g., 'ERRORS' and 'errors' are equivalent).

Possible inputs include 'errors' to return a tibble object containing counts of any error messages, 'warnings' to return a data.frame object containing counts of any warning messages, 'seeds' for the specified random number generation seeds, 'Random.seeds' for the complete list of .Random.seed states across replications (only stored when runSimulation(..., control = list(store_Random.seeds=TRUE))), 'error_seeds' and 'warning_seeds' to extract the .Random.seed values associated with the ERROR/WARNING messages, 'prepare_seeds' to extract the .Random.seed states captured before prepare() was called for each condition, 'prepare_error_seed' to extract the .Random.seed state when prepare() encountered an error (useful for debugging with load_seed_prepare), 'results' to extract the simulation results if the option store_results was passed to runSimulation, 'filename' and 'save_results_dirname' for extracting the saved file/directory name information (if used), 'functions' to extract the defined functions used in the experiment, and 'design' to extract the original design object

Note that 'warning_seeds' are not stored automatically in simulations and require passing store_warning_seeds = TRUE to runSimulation.

fuzzy

logical; use fuzzy string matching to reduce effectively identical messages? For example, when attempting to invert a matrix the error messages "System is computationally singular: reciprocal condition number = 1.92747e-17" and "System is computationally singular: reciprocal condition number = 2.15321e-16" are effectively the same, and likely should be reported in the same columns of the extracted output (a brief sketch follows this argument list)

append

logical; append the design conditions when extracting error/warning messages?
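As a minimal sketch of these two options (assuming res is an object previously returned from runSimulation, as in the Examples below):

# raw error tallies, without the appended design columns and without
# fuzzy matching of near-identical messages
SimExtract(res, what = 'errors', fuzzy = FALSE, append = FALSE)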

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
Generate <- function(condition, fixed_objects) {
    int <- sample(1:10, 1)
    if(int > 5) warning('GENERATE WARNING: int greater than 5')
    if(int == 1) stop('GENERATE ERROR: integer is 1')
    rnorm(5)
}
Analyse <- function(condition, dat, fixed_objects) {
    int <- sample(1:10, 1)
    if(int > 5) warning('ANALYSE WARNING: int greater than 5')
    if(int == 1) stop('ANALYSE ERROR: int is 1')
    c(ret = 1)
}
Summarise <- function(condition, results, fixed_objects) {
    mean(results)
}

res <- runSimulation(replications = 100, seed=1234,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

SimExtract(res, what = 'errors')
SimExtract(res, what = 'warnings')
seeds <- SimExtract(res, what = 'error_seeds')
seeds[,1:3]

# replicate a specific error for debugging (type Q to exit debugger)
res <- runSimulation(replications = 100, load_seed=seeds[,1], debug='analyse',
                     generate=Generate, analyse=Analyse, summarise=Summarise)

## End(Not run)

Template-based generation of the Generate-Analyse-Summarise functions

Description

This function prints template versions of the required Design and Generate-Analyse-Summarise functions for SimDesign to run simulations. Templated output comes complete with the correct inputs, class of outputs, and optional comments to help with the initial definitions. Use this at the start of your Monte Carlo simulation study. Following the definition of the SimDesign template file, please refer to the detailed information in runSimulation for how to edit this template to make a working simulation study.

Usage

SimFunctions(
  filename = NULL,
  dir = getwd(),
  save_structure = "single",
  extra_file = FALSE,
  nAnalyses = 1,
  nGenerate = 1,
  summarise = TRUE,
  comments = FALSE,
  openFiles = TRUE,
  spin_header = TRUE,
  SimSolve = FALSE
)

Arguments

filename

a character vector indicating whether the output should be saved to two respective files containing the simulation design and the functional components, respectively. Using this option is generally the recommended approach when beginning to write a Monte Carlo simulation

dir

the directory to write the files to. Default is the working directory

save_structure

character indicating the number of files to break the simulation code into when filename is included (default is 'single' for one file). When save_structure = 'double' the output is saved to two separate files containing the functions and design definitions, and when save_structure = 'all' the generate, analyse, summarise, and execution code are all saved into separate files. The purpose of this structure is that multiple structured files often make organization and debugging slightly easier for larger Monte Carlo simulations, though, in principle, all files could be stored in a single R script

extra_file

logical; should an extra file be saved containing user-defined functions or objects? Default is FALSE

nAnalyses

number of analysis functions to create (default is 1). Increasing the value of this argument when independent analyses are being performed allows function definitions to be better partitioned and potentially more modular

nGenerate

number of generate functions to create (default is 1). Increase the value of this argument when the data generation functions are very different and should be isolated from each other (otherwise, if there is much in common between the generate steps, the default of 1 should be preferred). Otherwise, if nGenerate == 0 then no generate function will be provided and instead this data-generation step can be defined in the analysis function(s) (only recommended for smaller simulations)

summarise

include summarise function? Default is TRUE

comments

logical; include helpful comments? Default is FALSE

openFiles

logical; after files have been generated, open them in your text editor (e.g., if RStudio is running the scripts will open in a new tab)?

spin_header

logical; include a basic knitr::spin header to allow the simulation to be knitted? Default is TRUE. For those less familiar with spin documents see https://bookdown.org/yihui/rmarkdown-cookbook/spin.html for further details

SimSolve

logical; should a template intended for a SimSolve implementation be generated? Default is FALSE

Details

The recommended approach to organizing Monte Carlo simulation files is to first save the template generated by this function to the hard-drive by passing a suitable filename argument (which, if users are interacting with R via the RStudio IDE, will also open the template file after it has been saved). For larger simulations, two separate files could also be used (achieved by changing the save_structure argument), and may be easier for debugging/sourcing the simulation code; however, this is a matter of preference and does not change any functionality in the package.

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

SimFunctions()
SimFunctions(comments = TRUE) #with helpful comments

## Not run: 
# write output files to a single file with comments
SimFunctions('mysim', comments = TRUE)

# Multiple analysis functions for optional partitioning
SimFunctions(nAnalyses = 2)
SimFunctions(nAnalyses = 3)

# Multiple analysis + generate functions
SimFunctions(nAnalyses = 2, nGenerate=2)

# save multiple files for the purpose of designing larger simulations
#  (also include extra_file for user-defined objects/functions)
SimFunctions('myBigSim', save_structure = 'all',
   nAnalyses = 3, nGenerate=2, extra_file = TRUE)

## End(Not run)

Function to read in saved simulation results

Description

If runSimulation was passed the flag save_results = TRUE then the row results corresponding to the design object will be stored to a suitable sub-directory as individual .rds files. While users could use readRDS directly to read these files in themselves, this convenience function will read the desired rows in automatically given the returned object from the simulation. Can be used to read in 1 or more .rds files at once (if more than 1 file is read in then the result will be stored in a list).

Usage

SimResults(obj, which, prefix = "results-row", wd = getwd(), rbind = FALSE)

Arguments

obj

object returned from runSimulation where save_results = TRUE or store_results was used. If the former then the remaining function arguments can be useful for reading in specific files.

Alternatively, the object can be from the Spower package, evaluated using either Spower() or SpowerBatch()

which

a numeric vector indicating which rows should be read in. If missing, all rows will be read in

prefix

character indicating prefix used for stored files

wd

working directory; default is found with getwd.

rbind

logical; should the results be combined by row or returned as a list? Only applicable when the supplied obj was obtained from the function Spower::SpowerBatch()

Value

the returned result is either a nested list (when length(which) > 1) or a single list (when length(which) == 1) containing the simulation results. Each read-in result refers to a list of 4 elements:

condition

the associated row (ID) and conditions from the respective design object

results

the object returned from the analyse function, potentially simplified into a matrix or data.frame

errors

a table containing the message and number of errors that caused the generate-analyse steps to be rerun. These should be inspected carefully as they could indicate validity issues with the simulation that should be noted

warnings

a table containing the message and number of non-fatal warnings which arose from the analyse step. These should be inspected carefully as they could indicate validity issues with the simulation that should be noted
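For instance, a minimal sketch of inspecting these components (assuming obj was returned from runSimulation with save_results = TRUE, as in the Examples below):

row1 <- SimResults(obj, which = 1)
row1$condition      # associated design row (ID) and conditions
head(row1$results)  # stored analysis results for this condition
row1$errors         # error tallies (if any)
row1$warnings       # warning tallies (if any)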

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
# store results (default behaviour)
sim <- runSimulation(..., store_results = TRUE)
SimResults(sim)

# store results to drive if RAM issues are present
obj <- runSimulation(..., save_results = TRUE)

# row 1 results
row1 <- SimResults(obj, 1)

# rows 1:5, stored in a named list
rows_1to5 <- SimResults(obj, 1:5)

# all results
rows_all <- SimResults(obj)

## End(Not run)

Generate a basic Monte Carlo simulation GUI template

Description

This function generates suitable stand-alone code from the shiny package to create simple web-interfaces for performing single condition Monte Carlo simulations. The template generated is relatively minimalistic, but allows the user to quickly and easily edit the saved files to customize the associated shiny elements as they see fit.

Usage

SimShiny(filename = NULL, dir = getwd(), design, ...)

Arguments

filename

an optional name of a text file to save the server and UI components (e.g., 'mysimGUI.R'). If omitted, the code will be printed to the R console instead

dir

the directory to write the files to. Default is the working directory

design

design object from runSimulation

...

arguments to be passed to runSimulation. Note that the design object is not used directly, and instead provides options to be selected in the GUI

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

## Not run: 
Design <- createDesign(sample_size = c(30, 60, 90, 120),
                       group_size_ratio = c(1, 4, 8),
                       standard_deviation_ratio = c(.5, 1, 2))

Generate <- function(condition, fixed_objects) {
    N <- condition$sample_size
    grs <- condition$group_size_ratio
    sd <- condition$standard_deviation_ratio
    if(grs < 1){
        N2 <- N / (1/grs + 1)
        N1 <- N - N2
    } else {
        N1 <- N / (grs + 1)
        N2 <- N - N1
    }
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)), DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)
    ind <- t.test(DV ~ group, dat, var.equal=TRUE)

    # In this function the p values for the t-tests are returned,
    #  and make sure to name each element, for future reference
    ret <- c(welch = welch$p.value, independent = ind$p.value)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    #find results of interest here (e.g., alpha < .1, .05, .01)
    ret <- EDR(results, alpha = .05)
    ret
}

# test that it works
# Final <- runSimulation(design=Design, replications=5,
#                       generate=Generate, analyse=Analyse, summarise=Summarise)

# print code to console
SimShiny(design=Design, generate=Generate, analyse=Analyse,
         summarise=Summarise)

# save shiny code to file
SimShiny('app.R', design=Design, generate=Generate, analyse=Analyse,
         summarise=Summarise)

# run the application
shiny::runApp()
shiny::runApp(launch.browser = TRUE) # in web-browser

## End(Not run)

One Dimensional Root (Zero) Finding in Simulation Experiments

Description

Function provides a stochastic root-finding approach to solve specific quantities in simulation experiments (e.g., solving for a specific sample size to meet a target power rate) using the Probabilistic Bisection Algorithm with Bolstering and Interpolations (ProBABLI; Chalmers, 2024). The structure follows the three functional steps outlined in runSimulation, however portions of the design input are taken as variables to be estimated rather than fixed, where an additional constant b is required in order to solve the root equation f(x) - b = 0.

Usage

SimSolve(
  design,
  interval,
  b,
  generate,
  analyse,
  summarise,
  replications = list(burnin.iter = 15L, burnin.reps = 50L, max.reps = 500L,
    min.total.reps = 9000L, increase.by = 10L),
  integer = TRUE,
  formula = y ~ poly(x, 2),
  family = "binomial",
  parallel = FALSE,
  cl = NULL,
  save = TRUE,
  resume = TRUE,
  method = "ProBABLI",
  wait.time = NULL,
  ncores = parallelly::availableCores(omit = 1L),
  type = ifelse(.Platform$OS.type == "windows", "PSOCK", "FORK"),
  maxiter = 100L,
  check.interval = TRUE,
  predCI = 0.95,
  predCI.tol = NULL,
  lastSolve = NULL,
  verbose = interactive(),
  control = list(),
  ...
)

## S3 method for class 'SimSolve'
summary(object, tab.only = FALSE, reps.cutoff = 300, ...)

## S3 method for class 'SimSolve'
plot(x, y, ...)

Arguments

design

a tibble or data.frame object containing the Monte Carlo simulation conditions to be studied, where each row represents a unique condition and each column a factor to be varied (see createDesign). However, exactly one column of this object in each row must be specified with NA placeholders to indicate that the missing value should be estimated via the selected stochastic optimizer

interval

a vector of length two, or matrix with nrow(design) rows and two columns, containing the end-points of the interval to be searched per row condition. If a vector then the interval will be used for all rows in the supplied design object when passed to the PBA engine

b

a single constant used to solve the root equation f(x) - b = 0

generate

generate function. See runSimulation

analyse

analysis function. See runSimulation

summarise

summary function that returns a single number corresponding to the function evaluation f(x) in the equation f(x) = b to be solved as a root f(x) - b = 0. Unlike in the standard runSimulation() definitions this input is required. For further information on this function specification, see runSimulation

replications

a named list or vector indicating the number of replications to use for each design condition per PBA iteration. By default the input is a list with the arguments burnin.iter = 15L, specifying the number of burn-in iterations to use, burnin.reps = 50L to indicate how many replications to use in each burn-in iteration, max.reps = 500L to prevent the replications from increasing higher than this number, min.total.reps = 9000L to avoid termination when very few replications have been explored (lower bound of the replication budget), and increase.by = 10L to indicate how many replications to increase per iteration after the burn-in stage. Defaults can be overwritten by explicit definition (e.g., replications = list(increase.by = 25L); see the sketch following the Details section below).

Vector inputs can specify the exact replications for each respective iteration. As a general rule, early iterations should be relatively low for initial searches to avoid unnecessary computations when locating the approximate location of the root, while the number of replications should gradually increase after this burn-in to reduce the sampling variability.

integer

logical; should the values of the root be considered integer or numeric? If TRUE then bolstered directional decisions will be made in the PBA function based on the collected sampling history

formula

regression formula to use when interpolate = TRUE. Default fits an orthogonal polynomial of degree 2

family

family argument passed to glm. By default the 'binomial' family is used, as this function defaults to power analysis setups where isolated results passed to summarise will return 0/1s, however other families should be used if summarise returns something else (e.g., if solving for a particular standard error then a 'gaussian' family would be more appropriate).

Note that if individual results from the analyse steps should not be used (i.e., only the aggregate from summarise is meaningful) then set control = list(summarise.reg_data = TRUE) to override the default behaviour, thereby using only the aggregate information and weights

parallel

for parallel computing for slower simulation experiments (see runSimulation for details)

cl

see runSimulation

save

logical; store temporary file in case of crashes. If detected in the working directory will automatically be loaded to resume (see runSimulation for similar behaviour)

resume

logical; if a temporary SimDesign file is detected should the simulation resume from this location? Keeping this TRUE is generally recommended, however this should be disabled if using SimSolve within runSimulation to avoid reading improper save states

method

optimizer method to use. Default is the stochastic root-finder 'ProBABLI', but can also be the deterministic options 'Brent' (which uses the function uniroot) or 'bisection' for the classical bisection method. If using deterministic root-finders then replications must either equal a single constant to reflect the number of replications to use per deterministic iteration or be a vector of length maxiter to indicate the replications to use per iteration

wait.time

(optional) argument passed to PBA to indicate the time to wait (specified in minutes if a numeric vector is passed) per row in the Design object, rather than using pre-determined termination criteria based on the estimates. For example, if three conditions were defined in Design, and wait.time="5", then the total search will terminate after 15 minutes regardless of independently specified termination criteria in control. See timeFormater for alternative specifications

ncores

see runSimulation

type

type of cluster (see makeCluster) or plotting type to use.

If type is used in plot then it can be 'density' to plot the density of the iteration history after the burn-in stage, or 'iterations' for a bubble plot with inverse replication weights. If not specified then the default PBA plots are provided (see PBA)

maxiter

the maximum number of iterations (default 100) except when wait.time is specified (automatically increased to 3000 to avoid early termination)

check.interval

logical; should an initial check be made to determine whether f(interval[1L]) and f(interval[2L]) have opposite signs? If FALSE, the specified interval is assumed to contain a root, where f(interval[1]) < 0 and f(interval[2]) > 0. Default is TRUE

predCI

advertised confidence interval probability for final model-based prediction of target b given the root input estimate. Returned as an element in the summary() list output

predCI.tol

(optional) rather than relying on the changes between successive estimates (default), if the predicting CI is consistently within this supplied tolerance range then the search will be terminated. This provides termination behaviour based on the predicted precision of the root solutions rather than their stability history, and therefore can be used to obtain estimates with a particular level of advertised accuracy. For example, when solving for a sample size value (N), if the solution associated with b = .80 requires that the advertised 95% interval is consistently between [.795, .805] then predCI.tol = .01 should be used to reflect this tolerance range

lastSolve

stub for Spower package; not to be used by front-end users

verbose

logical; print information to the console?

control

a list of the algorithm control parameters. If not specified, the defaults described below are used.

tol

tolerance criteria for early termination (.1 for integer = TRUE searches; .00025 for non-integer searches)

rel.tol

relative tolerance criteria for early termination (default .0001)

k.success

number of consecutive tolerance successes given rel.tol and tol criteria (default is 3)

bolster

logical; should the PBA evaluations use bolstering based on previous evaluations? Default is TRUE, though only applicable when integer = TRUE

interpolate.R

number of replications to collect prior to performing the interpolation step (default is 3000 after accounting for data exclusion from burnin.iter). Setting this to 0 will disable any interpolation computations

include_reps

logical; include a column in the condition elements to indicate how many replications are currently being evaluated? Mainly useful when further tuning within each ProBABLI iteration is desirable (e.g., for increasing/decreasing bootstrap draws as the search progresses). Default is FALSE

summarise.reg_data

logical; should the aggregate results from Summarise (along with its associated weights) be used for the interpolation steps, or the raw data from the Analyse step? Set this to TRUE when the individual results from Analyse give less meaningful information

...

additional arguments to be passed to PBA

object

object of class 'SimSolve'

tab.only

logical; print only the (reduced) table of estimates?

reps.cutoff

integer indicating the rows to omit from the output if the number of replications is less than this value

x

object of class 'SimSolve'

y

design row to plot. If omitted defaults to 1

Details

Root finding is performed using a progressively bolstered version of the probabilistic bisection algorithm (PBA) to find the associated root given the noisy simulation objective function evaluations. Information is collected throughout the search to make more accurate predictions about the associated root via interpolation. If interpolations fail, then the last iteration of the PBA search is returned as the best guess.

For greater advertised accuracy with ProBABLI, termination criteria can be based on the width of the advertised predicting interval (via predCI.tol) or by specifying how long the investigator is willing to wait for the final estimates (via wait.time, where longer wait times lead to progressively better accuracy in the final estimates).
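As a rough sketch of how the replications schedule mentioned above might be adjusted, the following hypothetical power-analysis setup overrides two of the default list elements while leaving the remaining defaults untouched:

Design <- createDesign(N = NA, d = .5)

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, data.frame(DV = c(rnorm(N), rnorm(N, mean = d)),
                                      group = gl(2, N)))
    dat
}
Analyse <- function(condition, dat, fixed_objects)
    c(p = t.test(DV ~ group, dat, var.equal = TRUE)$p.value)
Summarise <- function(condition, results, fixed_objects)
    EDR(results, alpha = .05)

# larger burn-in replications and faster growth than the defaults
solved <- SimSolve(design = Design, b = .8, interval = c(10, 500),
                   generate = Generate, analyse = Analyse, summarise = Summarise,
                   replications = list(burnin.reps = 100L, increase.by = 25L))
solved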

Value

the filled-in design object containing the associated lower and upper interval estimates from the stochastic optimization

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P. (2024). Solving Variables with Monte Carlo Simulation Experiments: A Stochastic Root-Solving Approach. Psychological Methods. doi:10.1037/met0000689

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

See Also

SFA

Examples

## Not run: 
##########################
# A Priori Power Analysis
##########################

# GOAL: Find specific sample size in each group for independent t-test
# corresponding to a power rate of .8

## For ease of the setup, assume the groups are the same size, and the mean
# difference corresponds to Cohen's d values of .2, .5, and .8
# This example can be solved numerically using the pwr package (see below),
# though the following simulation setup is far more general and can be
# used for any generate-analyse combination of interest

# SimFunctions(SimSolve=TRUE)

#### Step 1 --- Define your conditions under study and create design data.frame.
#### However, use NA placeholder for sample size as it must be solved,
#### and add desired power rate to object

Design <- createDesign(N = NA,
                       d = c(.2, .5, .8),
                       sig.level = .05)
Design    # solve for NA's

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    group1 <- rnorm(N)
    group2 <- rnorm(N, mean=d)
    dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    p <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value
    p
}

Summarise <- function(condition, results, fixed_objects) {
    # Must return a single number corresponding to f(x) in the
    # root equation f(x) = b
    ret <- c(power = EDR(results, alpha = condition$sig.level))
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize N over the rows in design

### (For debugging) may want to see if simulation code works as intended first
### for some given set of inputs
# runSimulation(design=createDesign(N=100, d=.8, sig.level=.05),
#              replications=10, generate=Generate, analyse=Analyse,
#              summarise=Summarise)

# Initial search between N = [10,500] for each row using the default
#   integer solver (integer = TRUE). In this example, b = target power
solved <- SimSolve(design=Design, b=.8, interval=c(10, 500),
                generate=Generate, analyse=Analyse,
                summarise=Summarise)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# also can plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(d=.2, power = .8) # sig.level/alpha = .05 by default
pwr.t.test(d=.5, power = .8)
pwr.t.test(d=.8, power = .8)

# use estimated N results to see how close power was
N <- solved$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

# with rounding
N <- ceiling(solved$N)
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

### failing analytic formula, confirm results with more precise
###  simulation via runSimulation()
###  (not required, if accuracy is important then ProBABLI should be run longer)
# csolved <- solved
# csolved$N <- ceiling(solved$N)
# confirm <- runSimulation(design=csolved, replications=10000, parallel=TRUE,
#                         generate=Generate, analyse=Analyse,
#                         summarise=Summarise)
# confirm

# Similarly, terminate if the prediction interval is consistently predicted
#   to be between [.795, .805]. Note that maxiter increased as well
solved_predCI <- SimSolve(design=Design, b=.8, interval=c(10, 500),
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     maxiter=200, predCI.tol=.01)
solved_predCI
summary(solved_predCI) # note that predCI.b are all within [.795, .805]

N <- solved_predCI$N
pwr.t.test(d=.2, n=N[1])
pwr.t.test(d=.5, n=N[2])
pwr.t.test(d=.8, n=N[3])

# Alternatively, and often more realistically, wait.time can be used
# to specify how long the user is willing to wait for a final estimate.
# Solutions involving more iterations will be more accurate,
# and therefore it is recommended to run the ProBABLI root-solver as long as
# the analyst can tolerate if the most accurate estimates are desired.
# Below executes the simulation for 2 minutes per condition

solved_2min <- SimSolve(design=Design[1, ], b=.8, interval=c(10, 500),
                generate=Generate, analyse=Analyse, summarise=Summarise,
                wait.time="2")
solved_2min
summary(solved_2min)

# use estimated N results to see how close power was
N <- solved_2min$N
pwr.t.test(d=.2, n=N[1])

#------------------------------------------------

########################
# Sensitivity Analysis
########################

# GOAL: solve effect size d given sample size and power inputs (inputs
# for root no longer required to be an integer)

# Generate-Analyse-Summarise functions identical to above, however
# Design input includes NA for d element
Design <- createDesign(N = c(100, 50, 25),
                       d = NA,
                       sig.level = .05)
Design    # solve for NA's

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions (same as above)

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize d over the rows in design

# search between d = [.1, 2] for each row
# In this example, b = target power
# note that integer = FALSE to allow smooth updates of d
solved <- SimSolve(design=Design, b = .8, interval=c(.1, 2),
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(n=100, power = .8)
pwr.t.test(n=50, power = .8)
pwr.t.test(n=25, power = .8)

# use estimated d results to see how close power was
pwr.t.test(n=100, d = solved$d[1])
pwr.t.test(n=50, d = solved$d[2])
pwr.t.test(n=25, d = solved$d[3])

### failing analytic formula, confirm results with more precise
###  simulation via runSimulation() (not required; if accuracy is important
###  PROBABLI should just be run longer)
# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
#                         generate=Generate, analyse=Analyse,
#                         summarise=Summarise)
# confirm

#------------------------------------------------

######################
# Criterion Analysis
######################

# GOAL: solve Type I error rate (alpha) given sample size, effect size, and
# power inputs (inputs for root no longer required to be an integer). Only useful
# when Type I error is less important than achieving the desired 1-beta (power)

Design <- createDesign(N = 50,
                       d = c(.2, .5, .8),
                       sig.level = NA)
Design    # solve for NA's

# all other function definitions same as above

# search for alpha within [.0001, .8]
solved <- SimSolve(design=Design, b = .8, interval=c(.0001, .8),
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(n=50, power = .8, d = .2, sig.level=NULL)
pwr.t.test(n=50, power = .8, d = .5, sig.level=NULL)
pwr.t.test(n=50, power = .8, d = .8, sig.level=NULL)

# use estimated alpha results to see how close power was
pwr.t.test(n=50, d = .2, sig.level=solved$sig.level[1])
pwr.t.test(n=50, d = .5, sig.level=solved$sig.level[2])
pwr.t.test(n=50, d = .8, sig.level=solved$sig.level[3])

### failing analytic formula, confirm results with more precise
###  simulation via runSimulation() (not required; if accuracy is important
###  PROBABLI should just be run longer)
# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
#                         generate=Generate, analyse=Analyse,
#                         summarise=Summarise)
# confirm

## End(Not run)

Summarise simulated data using various population comparison statistics

Description

This collapses the simulation results within each condition to composite estimates such as RMSE, bias, Type I error rates, coverage rates, etc. See the See Also section below for useful functions to be used within Summarise.

Usage

Summarise(condition, results, fixed_objects)

Arguments

condition

a single row from the design input from runSimulation (as a data.frame), indicating the simulation conditions

results

a tibble data frame (if Analyse returned a named numeric vector of any length) or a list (if Analyse returned a list or multi-rowed data.frame) containing the analysis results from Analyse, where each cell is stored in a unique row/list element

fixed_objects

object passed down from runSimulation

Value

for best results should return a named numeric vector or data.frame with the desired meta-simulation results. Named list objects can also be returned, however the subsequent results must be extracted via SimExtract
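For instance, a minimal sketch combining several of the helper functions listed in the See Also section (the results columns est and p, and the design column mu, are hypothetical names assumed to have been defined elsewhere in the simulation):

Summarise <- function(condition, results, fixed_objects) {
    c(bias = bias(results$est, parameter = condition$mu),
      RMSE = RMSE(results$est, parameter = condition$mu),
      EDR  = EDR(results$p, alpha = .05))
}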

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

bias, RMSE, RE, EDR, ECR, MAE, SimExtract

Examples

## Not run: 
summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (alpha < .1, .05, .01)
    lessthan.05 <- EDR(results, alpha = .05)

    # return the results that will be appended to the design input
    ret <- c(lessthan.05=lessthan.05)
    ret
}

## End(Not run)

Add missing values to a vector given a MCAR, MAR, or MNAR scheme

Description

Given an input vector, replace elements of this vector with missing values according to some scheme. The default method replaces input values with a MCAR scheme (where on average 10% of the values will be replaced with NAs). MAR and MNAR are supported by replacing the default fun argument.

Usage

addMissing(y, fun = function(y, rate = 0.1, ...) rep(rate, length(y)), ...)

Arguments

y

an input vector that should contain missing data in the form of NA's

fun

a user-defined function indicating the missing data mechanism for each element in y. The function must return a vector of probability values with length equal to the length of y. Each value in the returned vector indicates the probability that the respective element in y will be replaced with NA. The function must contain the argument y, representing the input vector, however any number of additional arguments can be included

...

additional arguments to be passed to fun

Details

Given an input vector y, and other relevant variables inside (X) and outside (Z) the data-set, the three types of missingness are:

MCAR

Missing completely at random (MCAR). This is realized by randomly sampling the values of the input vector (y) irrespective of the possible values in X and Z. Therefore missing values are randomly sampled, do not depend on any data characteristics, and are truly random

MAR

Missing at random (MAR). This is realized when values in the dataset (X) predict the missing data mechanism in y; conceptually this is equivalent to P(y = NA | X). This requires the user to define a custom missing data function

MNAR

Missing not at random (MNAR). This is similar to MAR except that the missing mechanism comes from the value of y itself or from variables outside the working dataset; conceptually this is equivalent to P(y = NA | X, Z, y). This requires the user to define a custom missing data function

Value

the input vector y with the sampled NA values (according to the fun scheme)

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
set.seed(1)
y <- rnorm(1000)

## 10% missing rate with default FUN
head(ymiss <- addMissing(y), 10)

## 50% missing with default FUN
head(ymiss <- addMissing(y, rate = .5), 10)

## missing values only when female and low
X <- data.frame(group = sample(c('male', 'female'), 1000, replace=TRUE),
                level = sample(c('high', 'low'), 1000, replace=TRUE))
head(X)

fun <- function(y, X, ...){
    p <- rep(0, length(y))
    p[X$group == 'female' & X$level == 'low'] <- .2
    p
}

ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missingness as a function of elements in X (i.e., a type of MAR)
fun <- function(y, X){
   # missingness with a logistic regression approach
   df <- data.frame(y, X)
   mm <- model.matrix(y ~ group + level, df)
   cfs <- c(-5, 2, 3) # intercept, group, and level coefs
   z <- cfs %*% t(mm)
   plogis(z)
}
ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missing values when y elements are large (i.e., a type of MNAR)
fun <- function(y) ifelse(abs(y) > 1, .4, 0)
ymiss <- addMissing(y, fun=fun)
tail(cbind(y, ymiss), 10)

## End(Not run)

Compute (relative/standardized) bias summary statistic

Description

Computes the (relative) bias of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form. If relative bias is requested the estimate and parameter inputs are both required.

Usage

bias(
  estimate,
  parameter = NULL,
  type = "bias",
  abs = FALSE,
  percent = FALSE,
  unname = FALSE
)

Arguments

estimate

a numeric vector, matrix/data.frame, or list of parameter estimates. If a vector, the length is equal to the number of replications. If a matrix/data.frame, the number of rows must equal the number of replications. list objects will be looped over using the same rules as above after first translating the information into one-dimensional vectors and re-creating the structure upon return

parameter

a numeric scalar/vector indicating the fixed parameters. If a single value is supplied and estimate is a matrix/data.frame then the value will be recycled for each column; otherwise, each element will be associated with each respective column in the estimate input. If NULL then it will be assumed that the estimate input is in a deviation form (therefore mean(estimate) will be returned)

type

type of bias statistic to return. Default ('bias') computes the standard bias (average difference between sample and population), 'relative' computes the relative bias statistic (i.e., divide the bias by the value in parameter; note that multiplying this by 100 gives the "percent bias" measure, or if Type I error rates (alpha) are supplied this results in the "percentage error"), 'abs_relative' computes the relative bias for each replication independently, takes the absolute value of each term, then computes the mean estimate, and 'standardized' computes the standardized bias estimate (standard bias divided by the standard deviation of the sample estimates). See also the illustrative check following the Examples below

abs

logical; find the absolute bias between the parameters and estimates? This effectively just applies the abs transformation to the returned result. Default is FALSE

percent

logical; change returned result to percentage by multiplying by 100? Default is FALSE

unname

logical; apply unname to the results to remove any variable names?

Value

returns anumeric vector indicating the overall (relative/standardized)bias in the estimates

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

RMSE

Examples

pop <- 2
samp <- rnorm(100, 2, sd = 0.5)
bias(samp, pop)
bias(samp, pop, type = 'relative')
bias(samp, pop, type = 'standardized')
bias(samp, pop, type = 'abs_relative')

dev <- samp - pop
bias(dev)

# equivalent here
bias(mean(samp), pop)

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
bias(mat, parameter = 2)
bias(mat, parameter = 2, type = 'relative')
bias(mat, parameter = 2, type = 'standardized')

# different parameter associated with each column
mat <- cbind(M1=rnorm(1000, 2, sd = 0.25), M2 = rnorm(1000, 3, sd = .25))
bias(mat, parameter = c(2,3))
bias(mat, parameter = c(2,3), type='relative')
bias(mat, parameter = c(2,3), type='standardized')

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
bias(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
bias(estimates, parameters)

# relative difference dividing by the magnitude of parameters
bias(estimates, parameters, type = 'abs_relative')

# relative bias as a percentage
bias(estimates, parameters, type = 'abs_relative', percent = TRUE)

# percentage error (PE) statistic given alpha (Type I error) and EDR() result
# edr <- EDR(results, alpha = .05)
edr <- c(.04, .05, .06, .08)
bias(matrix(edr, 1L), .05, type = 'relative', percent = TRUE)
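The following short check is not part of the original manual; it is added purely to illustrate how the type options map onto the definitions given in the Arguments section (the simulated est and pop values are arbitrary, and the documented unname argument is used to drop names for the comparison).

set.seed(42)
est <- rnorm(500, mean = 2.1, sd = 0.5)   # sample estimates
pop <- 2                                  # population parameter

# 'bias' is the mean raw deviation, 'relative' divides it by the parameter,
# and 'standardized' divides it by the SD of the estimates
all.equal(bias(est, pop, unname = TRUE), mean(est - pop))
all.equal(bias(est, pop, type = 'relative', unname = TRUE), mean(est - pop) / pop)
all.equal(bias(est, pop, type = 'standardized', unname = TRUE), mean(est - pop) / sd(est))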

Compute prediction estimates for the replication size using bootstrap MSE estimates

Description

This function computes bootstrap mean-square error estimates to approximate the sampling behavior of the meta-statistics in SimDesign's summarise functions. A single design condition is supplied, and a simulation with max(Rstar) replications is performed whereby the generate-analyse results are collected. After obtaining these replication values, the replications are further drawn from (with replacement) using the differing sizes in Rstar to approximate the bootstrap MSE behavior given different replication sizes. Finally, given these bootstrap estimates, linear regression models are fitted using the predictor term one_sqrtR = 1 / sqrt(Rstar) to allow extrapolation to replication sizes not observed in Rstar. For more information about the method and subsequent bootstrap MSE plots, refer to Koehler, Brown, and Haneuse (2009).

Usage

bootPredict(
  condition,
  generate,
  analyse,
  summarise,
  fixed_objects = NULL,
  ...,
  Rstar = seq(100, 500, by = 100),
  boot_draws = 1000
)

boot_predict(...)

Arguments

condition

a data.frame consisting of one row from the original design input object used within runSimulation

generate

see runSimulation

analyse

see runSimulation

summarise

see runSimulation

fixed_objects

see runSimulation

...

additional arguments to be passed to runSimulation

Rstar

a vector containing the size of the bootstrap subsets to obtain. Defaultinvestigates the vector [100, 200, 300, 400, 500] to compute the respective MSE terms

boot_draws

number of bootstrap replications to draw. Default is 1000

Value

returns a list of linear model objects (via lm), one for each meta-statistic returned by the summarise() function

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Koehler, E., Brown, E., & Haneuse, S. J.-P. A. (2009). On the Assessment of Monte Carlo Error inSimulation-Based Statistical Analyses.The American Statistician, 63, 155-162.

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

set.seed(4321)
Design <- createDesign(sigma = c(1, 2))

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects) {
    dat <- rnorm(100, 0, condition$sigma)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    CIs <- t.test(dat)$conf.int
    names(CIs) <- c('lower', 'upper')
    ret <- c(mean = mean(dat), CIs)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- c(mu_bias = bias(results[,"mean"], 0),
             mu_coverage = ECR(results[,c("lower", "upper")], parameter = 0))
    ret
}

## Not run: 
# boot_predict supports only one condition at a time
out <- bootPredict(condition=Design[1L, , drop=FALSE],
    generate=Generate, analyse=Analyse, summarise=Summarise)
out # list of fitted linear model(s)

# extract first meta-statistic
mu_bias <- out$mu_bias

dat <- model.frame(mu_bias)
print(dat)

# original R metric plot
R <- 1 / dat$one_sqrtR^2
plot(R, dat$MSE, type = 'b', ylab = 'MSE', main = "Replications by MSE")

plot(MSE ~ one_sqrtR, dat, main = "Bootstrap prediction plot", xlim = c(0, max(one_sqrtR)),
     ylim = c(0, max(MSE)), ylab = 'MSE', xlab = expression(1/sqrt(R)))
beta <- coef(mu_bias)
abline(a = 0, b = beta, lty = 2, col='red')

# what is the replication value when x-axis = .02? What's its associated expected MSE?
1 / .02^2 # number of replications
predict(mu_bias, data.frame(one_sqrtR = .02)) # y-axis value

# approximately how many replications to obtain MSE = .001?
(beta / .001)^2

## End(Not run)

Set RNG sub-stream for Pierre L'Ecuyer's RngStreams

Description

Sets the sub-stream RNG state within Pierre L'Ecuyer's (1999) algorithm. Should be used within distributed array jobs after suitable L'Ecuyer (1999) seeds have been distributed to each array, and when each array is further defined to use multi-core processing. See clusterSetRNGStream for further information.

Usage

clusterSetRNGSubStream(cl, seed)

Arguments

cl

A cluster from theparallel package, or(ifNULL) the registered cluster

seed

An integer vector of length 7 as given by.Random.seed whenthe L'Ecuyer-CMR RNG is in use. SeeRNG for the valid values

Value

invisible NULL
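This entry provides no example, so the following minimal sketch (added here for illustration; the cluster size and seed value are arbitrary assumptions) shows the calling pattern described above within a single array job that also spawns a local cluster.

library(parallel)

RNGkind("L'Ecuyer-CMRG")   # make .Random.seed a length-7 L'Ecuyer-CMRG state
set.seed(1234)             # job-level seed (e.g., previously distributed to this array)
cl <- makeCluster(2L)      # local multi-core cluster within the array job
clusterSetRNGSubStream(cl, seed = .Random.seed)  # advance workers to sub-streams
# ... distribute the replication work over cl ...
stopCluster(cl)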


Form Column Standard Deviation and Variances

Description

Form column standard deviation and variances for numeric arrays (or data frames).

Usage

colVars(x, na.rm = FALSE, unname = FALSE)

colSDs(x, na.rm = FALSE, unname = FALSE)

Arguments

x

an array of two dimensions containing numeric, complex, integer or logical values,or a numeric data frame

na.rm

logical; remove missing values in each respective column?

unname

logical; applyunname to the results to remove any variablenames?

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

See Also

colMeans

Examples

results <- matrix(rnorm(100), ncol=4)
colnames(results) <- paste0('stat', 1:4)
colVars(results)
colSDs(results)

results[1,1] <- NA
colSDs(results)
colSDs(results, na.rm=TRUE)
colSDs(results, na.rm=TRUE, unname=TRUE)

Create the simulation design object

Description

Create a partially or fully-crossed data object reflecting the unique simulation design conditions. Each row of the returned object represents a unique simulation condition, and each column represents the named factor variables under study.

Usage

createDesign(
  ...,
  subset,
  fractional = NULL,
  tibble = TRUE,
  stringsAsFactors = FALSE,
  fully.crossed = TRUE
)

## S3 method for class 'Design'
print(x, list2char = TRUE, pillar.sigfig = 5, show.IDs = FALSE, ...)

## S3 method for class 'Design'
x[i, j, ..., drop = FALSE]

rbindDesign(..., keep.IDs = FALSE)

Arguments

...

comma separated list of named input objects representing the simulation factors to completely cross. Note that these arguments are passed to expand.grid to perform the complete crossings

subset

(optional) a logical vector indicating elements or rows to keep to create a partially crossed simulation design

fractional

a fractional design matrix returned from the FrF2 package. Note that the order of the factor names/labels is associated with the respective ... inputs

tibble

logical; return a tibble object instead of a data.frame? Default is TRUE

stringsAsFactors

logical; should character variable inputs be coerced to factors when building a data.frame? Default is FALSE

fully.crossed

logical; create a fully-crossed design object? Setting to FALSE will attempt to combine the design elements column-wise via data.frame(...) instead of expand.grid(...)

x

object of class 'Design'

list2char

logical; for tibble objects, re-evaluate list elements as character vectors for better printing of the levels? Note that this does not change the original classes of the object, just how they are printed. Default is TRUE

pillar.sigfig

number of significant digits to print. Default is 5

show.IDs

logical; print the internally stored Design ID indicators?

i

row index

j

column index

drop

logical; drop to lower dimension class?

keep.IDs

logical; keep the internal ID variables in the Design objects? Use this when row-binding conditions that are matched with previous conditions (e.g., when using expandDesign)

Value

atibble ordata.frame containing the simulation experimentconditions to be evaluated inrunSimulation

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

expandDesign

Examples

## Not run: 
# modified example from runSimulation()
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2))
Design

# remove N=10, SD=2 row from initial definition
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2),
                       subset = !(N == 10 & SD == 2))
Design

# example with list inputs
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2),
                       combo = list(c(0,0), c(0,0,1)))
Design   # notice levels printed (not typical for tibble)
print(Design, list2char = FALSE)   # standard tibble output

Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2),
                       combo = list(c(0,0), c(0,0,1)),
                       combo2 = list(c(5,10,5), c(6,7)))
Design
print(Design, list2char = FALSE)   # standard tibble output

# design without crossing (inputs taken as-is)
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2), fully.crossed=FALSE)
Design   # only 2 rows

###########
# fractional factorial example

library(FrF2)
# help(FrF2)

# 7 factors in 32 runs
fr <- FrF2(32, 7)
dim(fr)
fr[1:6,]

# Create working simulation design given -1/1 combinations
fDesign <- createDesign(sample_size=c(100,200),
                        mean_diff=c(.25, 1, 2),
                        variance.ratio=c(1,4, 8),
                        equal_size=c(TRUE, FALSE),
                        dists=c('norm', 'skew'),
                        same_dists=c(TRUE, FALSE),
                        symmetric=c(TRUE, FALSE),
                        # remove same-normal combo
                        subset = !(symmetric & dists == 'norm'),
                        fractional=fr)
fDesign

## End(Not run)

Expand the simulation design object for array computing

Description

Repeat each design row the specified number of times. This is primarily used for cluster computing where jobs are distributed with batches of replications and later aggregated into a complete simulation object (see runArraySimulation and SimCollect).

Usage

expandDesign(Design, repeat_conditions)

Arguments

Design

object created by createDesign which should have its rows repeated for optimal HPC schedulers

repeat_conditions

integer vector used to repeat each design row the specified number of times. Can either be a single integer, which repeats each row this many times, or an integer vector equal to the number of total rows in the created object.

This argument is useful when distributing independent row conditions to cluster computing environments, particularly with different replication information. For example, if 1000 replications in total are the target but the condition is repeated over 4 rows then only 250 replications per row would be required across the repeated conditions. See SimCollect for combining the simulation objects once complete

Value

atibble ordata.frame containing the simulation experimentconditions to be evaluated inrunSimulation

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

expandReplications, createDesign, SimCollect, runArraySimulation

Examples

## Not run: 
# repeat each row 4 times (for cluster computing)
Design <- createDesign(N = c(10, 20),
                       SD.equal = c(TRUE, FALSE))
Design4 <- expandDesign(Design, 4)
Design4

# repeat first two rows 2x and the rest 4 times (for cluster computing
#   where first two conditions are faster to execute)
Design <- createDesign(SD.equal = c(TRUE, FALSE),
                       N = c(10, 100, 1000))
Design24 <- expandDesign(Design, c(2,2,rep(4, 4)))
Design24

## End(Not run)

Expand the replications to matchexpandDesign

Description

Expands the replication budget to match the expandDesign structure.

Usage

expandReplications(replications, repeat_conditions)

Arguments

replications

number of replications. Can be a scalar to reflect the samereplications overall, or a vector of unequal replication budgets.

repeat_conditions

integer vector used to repeat each design rowthe specified number of times. Can either be a single integer, which repeatseach row this many times, or an integer vector equal to the number of totalrows in the created object.

Value

an integer vector of the replication budget matchingthe expanded structure inexpandDesign

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

expandDesign

Examples

## Not run: 
# repeat each row 4 times (for cluster computing)
Design <- createDesign(N = c(10, 20),
                       SD.equal = c(TRUE, FALSE))
Design4 <- expandDesign(Design, 4)
Design4

# match the replication budget. Target is 1000 replications
(replications4 <- expandReplications(1000, 4))

# hence, evaluate each row in Design4 250 times
cbind(Design4, replications4)

####
# Unequal Design intensities

Design24 <- createDesign(SD.equal = c(TRUE, FALSE),
                       N = c(10, 100, 1000))

# split first two conditions into half rows, next two conditions into quarters,
#  while N=1000 condition into tenths
expand <- c(2,2,4,4,10,10)
eDesign <- expandDesign(Design24, expand)
eDesign

# target replications is R=1000 per condition
(replications24 <- expandReplications(1000, expand))
cbind(eDesign, replications24)

## End(Not run)

Generate random seeds

Description

Generate seeds to be passed to runSimulation's seed input. Values are sampled from 1 to 2147483647, or are generated using the L'Ecuyer-CMRG (2002) method (returning either a list if arrayID is omitted, or the specific row value from this list if arrayID is included).

Usage

genSeeds(design = 1L, iseed = NULL, arrayID = NULL, old.seeds = NULL)

gen_seeds(...)

Arguments

design

design matrix that requires a unique seed per condition, or a number indicating the number of seeds to generate. Default generates one number

iseed

the initial set.seed number used to generate a sequence of independent seeds according to the L'Ecuyer-CMRG (2002) method. This is recommended whenever quality random number generation is required across similar (if not identical) simulation jobs (e.g., see runArraySimulation). If arrayID is not specified then this will return a list of the associated seeds for the full design

arrayID

(optional) single integer input corresponding to the specific row in the design object when using the iseed input. This is used in functions such as runArraySimulation to pull out the specific seed rather than manage a complete list, and is therefore more memory efficient

old.seeds

(optional) vector or matrix of last seeds used in previous simulations to avoid repeating the same seed on a subsequent run. Note that this approach should be used sparingly as seeds set more frequently are more likely to correlate, and therefore provide less optimal random number behaviour (e.g., if performing a simulation on two runs to achieve 5000 * 2 = 10,000 replications this is likely reasonable, but for simulations with 100 * 2 = 200 replications this is more likely to be sub-optimal). Length must be equal to the number of rows in design

...

does nothing

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

Examples

# generate 1 seed (default)
genSeeds()

# generate 5 unique seeds
genSeeds(5)

# generate from nrow(design)
design <- createDesign(factorA=c(1,2,3),
                       factorB=letters[1:3])
seeds <- genSeeds(design)
seeds

# construct new seeds that are independent from original (use this sparingly)
newseeds <- genSeeds(design, old.seeds=seeds)
newseeds

# can be done in batches too
newseeds2 <- genSeeds(design, old.seeds=cbind(seeds, newseeds))
cbind(seeds, newseeds, newseeds2) # all unique

############
# generate seeds for runArraySimulation()
(iseed <- genSeeds())  # initial seed
seed_list <- genSeeds(design, iseed=iseed)
seed_list

# expand number of unique seeds given iseed (e.g., in case more replications
# are required at a later date)
seed_list_tmp <- genSeeds(nrow(design)*2, iseed=iseed)
str(seed_list_tmp) # first 9 seeds identical to seed_list

# more usefully for HPC, extract only the seed associated with an arrayID
arraySeed.15 <- genSeeds(nrow(design)*2, iseed=iseed, arrayID=15)
arraySeed.15

Get job array ID (e.g., from SLURM or other HPC array distributions)

Description

Get the array ID from an HPC array distribution job (e.g., from SLURM or from optional command line arguments). The array ID is used to index the rows in the design object in runArraySimulation. For instance, a SLURM array with 10 independent jobs might have the following shell instructions.

Usage

getArrayID(type = "slurm", trailingOnly = TRUE, ID.shift = 0L)

Arguments

type

an integer indicating the element from the result of commandArgs to extract, or a character specifying the type of array job. Default is 'slurm'

trailingOnly

logical value passed tocommandArgs.Only used whentype is an integer

ID.shift

single integer value used to shift the array ID by a constant. Useful when there are array range limitations that must be specified in the shell files (e.g., the array can only be 10000 but there are more rows in the design object). For example, if the array ID should be 10000 through 12000, but the cluster computing environment does not allow these indices, then including the array range as 1-2000 in the shell file with ID.shift=9999 would add this constant to the detected arrayID, thereby indexing the remaining row elements in the design object

Details

#!/bin/bash -l
#SBATCH --time=00:01:00
#SBATCH --array=1-10

which names the associated jobs with the numbers 1 through 10.getArrayID() then extracts this information per array, whichis used as therunArraySimulation(design, ..., arrayID = getArrayID()) topass specific rows for thedesign object.

See Also

runArraySimulation

Examples

## Not run: 
# get slurm array ID
arrayID <- getArrayID()

# get ID based on first optional argument in shell specification
arrayID <- getArrayID(type = 1)

# pass to
# runArraySimulation(design, ...., arrayID = arrayID)

# increase detected arrayID by constant 9999 (for array specification limitations)
arrayID <- getArrayID(ID.shift=9999)

## End(Not run)

List All Available Notifiers

Description

Automatically detects all S3 classes that have a specialized notify() method (like notify.MyNotifier) and prints them as a character vector of class names (e.g., "PushbulletNotifier", "TelegramNotifier").

Note that only classes defined and loaded at the time you call this function willappear. If you just created a new notifier in another file or package, ensure it'ssourced/loaded first.

Usage

listAvailableNotifiers()

Value

A character vector of class names that havenotify.<ClassName> methods.

Examples

## Not run: 
listAvailableNotifiers()
# [1] "PushbulletNotifier" "TelegramNotifier"

## End(Not run)

Increase the intensity or suppress the output of an observed message

Description

Function provides more nuanced management of known message outputs that appear in function calls outside the front-end user's control (e.g., functions written in third-party packages). Specifically, this function provides a less nuclear approach than quiet and friends, which suppress all cat and message outputs raised, and instead allows specific messages to be raised either to warnings or, even more extremely, to errors. Note that for messages that are not suppressed the order in which the output and message calls appear in the original function is not retained.

Usage

manageMessages(
  expr,
  allow = NULL,
  message2warning = NULL,
  message2error = NULL,
  ...
)

Arguments

expr

expression to be evaluated (e.g., ret <- myfun(args)). Function should either be used as a wrapper, such as manageMessages(ret <- myfun(args), ...) or ret <- manageMessages(myfun(args), ...), or more readably as a pipe, ret <- myfun(args) |> manageMessages(...)

allow

(optional) a character vector indicating messages that should still appear, while all other messages should remain suppressed. Each supplied message is matched using a grepl expression, so partial matching is supported (though more specific messages are less likely to throw false positives). If NULL, all messages will be suppressed unless they appear in message2error or message2warning

message2warning

(optional) Input can be a character vector containing messages that should probably be considered warning messages for the current application instead. Each supplied character vector element is matched using a grepl expression, so partial matching is supported (though more specific messages are less likely to throw false positives).

message2error

(optional) Input can be a character vector containing known-to-be-severe messages that should be converted to errors for the current application. See message2warning for details.

...

additional arguments passed togrepl

Value

returns the original result of eval(expr), with messages either left the same, raised to warnings or errors, or suppressed (depending on the input specifications)

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

See Also

manageWarnings,quiet

Examples

## Not run: 
myfun <- function(x, warn=FALSE){
   message('This function is rather chatty')
   cat("It even prints in different output forms!\n")
   message('And even at different ')
   cat(" many times!\n")
   cat("Too many messages can be annoying \n")
   if(warn)
     warning('It may even throw warnings ')
   x
}

out <- myfun(1)
out

# tell the function to shhhh
out <- quiet(myfun(1))
out

# same default behaviour as quiet(), but potential for nuance
out2 <- manageMessages(myfun(1))
identical(out, out2)

# allow some messages to still get printed
out2 <- manageMessages(myfun(1), allow = "many times!")
out2 <- manageMessages(myfun(1), allow = "This function is rather chatty")

# note: . matches single character (regex)
out2 <- manageMessages(myfun(1), allow = c("many times.",
                                           "This function is rather chatty"))

# convert specific message to warning
out3 <- manageMessages(myfun(1), message2warning = "many times!")
identical(out, out3)

# other warnings also get through
out3 <- manageMessages(myfun(1, warn=TRUE), message2warning = "times!")
identical(out, out3)

# convert message to error
manageMessages(myfun(1), message2error = "m... times!")

# multiple message intensity changes
manageMessages(myfun(1),
  message2warning = "It even prints in different output forms",
  message2error = "many times!")

manageMessages(myfun(1),
  allow = c("This function is rather chatty",
            "Too many messages can be annoying"),
  message2warning = "It even prints in different output forms",
  message2error = "many times!")

## End(Not run)

Manage specific warning messages

Description

Function provides more nuanced management of known warning messages that appear in function calls outside the front-end user's control (e.g., functions written in third-party packages). Specifically, this function provides a less nuclear approach than suppressWarnings, which suppresses all warning messages rather than only those known to be innocuous to the current application, or than globally setting options(warn=2), which has the opposite effect of treating all warning messages as errors during function execution. To avoid these two extreme behaviors, character vectors can instead be supplied to this function to either leave the raised warnings as-is (default behaviour), raise only specific warning messages to errors, or specify warning messages that can generally be ignored (and therefore suppressed) while allowing new or yet-to-be-discovered warnings to still be raised.

Usage

manageWarnings(expr, warning2error = FALSE, suppress = NULL, ...)

Arguments

expr

expression to be evaluated (e.g., ret <- myfun(args)). Function should either be used as a wrapper, such as manageWarnings(ret <- myfun(args), ...) or ret <- manageWarnings(myfun(args), ...), or more readably as a pipe, ret <- myfun(args) |> manageWarnings(...)

warning2error

logical or character vector to control the conversion of warnings to errors. Setting this input to TRUE will treat all observed warning messages as errors (same behavior as options(warn=2), though defined on a per-expression basis rather than globally), while setting to FALSE (default) will leave all warning messages as-is, retaining the default behavior

Alternatively, and more useful for specificity reasons, input can be a character vector containing known-to-be-severe warning messages that should be converted to errors. Each supplied character vector element is matched using a grepl expression, so partial matching is supported (though more specific messages are less likely to throw false positives).

suppress

a character vector indicating warning messages that are known to be innocuous a priori and can therefore be suppressed. Each supplied warning message is matched using a grepl expression, so partial matching is supported (though more specific messages are less likely to throw false positives). If NULL, no warning messages will be suppressed

...

additional arguments passed togrepl

Details

In general, global/nuclear handling of warning messages should be avoided as it is generally bad practice. On one extreme, when suppressing all warning messages using suppressWarnings, potentially important warning messages become muffled, which can be problematic if the code developer has not become aware of these (now muffled) warnings. Moreover, this can become a long-term sustainability issue when third-party functions that the developer's code depends upon throw new warnings in the future, as the code developer will be less likely to become aware of these newly implemented warnings.

On the other extreme, where all warning messages are turned into errors using options(warn=2), innocuous warning messages can and will be (unwantedly) raised to errors. This negatively affects the logical workflow of the developer's functions, where more error messages must now be manually managed (e.g., via tryCatch), including the known-to-be-innocuous warning messages as these will now be considered errors.

To avoid these extremes, front-end users should first make note of the warning messages that have been raised in their prior executions, and organize these messages into vectors of ignorable warnings (least severe), known/unknown warnings that should remain as warnings (even if not yet known by the code developer), and explicit warnings that ought to be considered errors for the current application (most severe). Once collected, these can be passed to the respective warning2error argument to increase the intensity of a specific raised warning, or to the suppress argument to suppress only the messages that have been deemed ignorable a priori (thereby allowing all other warning messages to be raised).

Value

returns the original result of eval(expr), with warning messages either left the same, increased to errors, or suppressed (depending on the input specifications)

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

See Also

manageMessages,quiet

Examples

## Not run: 
fun <- function(warn1=FALSE, warn2=FALSE, warn3=FALSE,
                warn_trailing = FALSE, error=FALSE){
   if(warn1) warning('Message one')
   if(warn2) warning('Message two')
   if(warn3) warning('Message three')
   if(warn_trailing) warning(sprintf('Message with lots of random trailings: %s',
                             paste0(sample(letters, sample(1:20, 1)), collapse=',')))
   if(error) stop('terminate function call')
   return('Returned from fun()')
}

# normal run (no warnings or errors)
out <- fun()
out

# these are all the same
manageWarnings(out <- fun())
out <- manageWarnings(fun())
out <- fun() |> manageWarnings()

# errors treated normally
fun(error=TRUE)
fun(error=TRUE) |> manageWarnings()

# all warnings/returns treated normally by default
ret1 <- fun(warn1=TRUE)
ret2 <- fun(warn1=TRUE) |> manageWarnings()
identical(ret1, ret2)

# all warnings converted to errors (similar to options(warn=2), but local)
fun(warn1=TRUE) |> manageWarnings(warning2error=TRUE)
fun(warn2=TRUE) |> manageWarnings(warning2error=TRUE)

# Specific warnings treated as errors (others stay as warnings)
# Here, treat first warning message as error but not the second or third
ret <- fun(warn1=TRUE) # warning
ret <- fun(warn1=TRUE) |> manageWarnings("Message one")  # now error
ret <- fun(warn2=TRUE) |> manageWarnings("Message one")  # still a warning

# multiple warnings raised but not converted as they do not match criteria
fun(warn2=TRUE, warn3=TRUE)
fun(warn2=TRUE, warn3=TRUE) |> manageWarnings("Message one")

# Explicitly convert multiple warning messages, allowing others through.
#   This is generally the best use of the function's specificity
fun(warn1=TRUE, warn2=TRUE)
fun(warn1=TRUE) |>   # error given either message
        manageWarnings(c("Message one", "Message two"))
fun(warn2=TRUE) |>
       manageWarnings(c("Message one", "Message two"))

# last warning gets through (left as valid warning)
ret <- fun(warn3=TRUE) |>
            manageWarnings(c("Message one", "Message two"))
ret

# suppress warnings that have only partial matching
fun(warn_trailing=TRUE)
fun(warn_trailing=TRUE)
fun(warn_trailing=TRUE)

# partial match, therefore suppressed
fun(warn_trailing=TRUE) |>
  manageWarnings(suppress="Message with lots of random trailings: ")

# multiple suppress strings
fun(warn_trailing=TRUE) |>
  manageWarnings(suppress=c("Message with lots of random trailings: ",
                           "Suppress this too"))

# could also use .* to catch all remaining characters (finer regex control)
fun(warn_trailing=TRUE) |>
  manageWarnings(suppress="Message with lots of random trailings: .*")

###########
# Combine with quiet() and suppress argument to suppress innocuous messages

fun <- function(warn1=FALSE, warn2=FALSE, warn3=FALSE, error=FALSE){
   message('This function is rather chatty')
   cat("It even prints in different output forms!\n")
   if(warn1) warning('Message one')
   if(warn2) warning('Message two')
   if(warn3) warning('Message three')
   if(error) stop('terminate function call')
   return('Returned from fun()')
}

# normal run (no warnings or errors, but messages)
out <- fun()
out <- quiet(fun()) # using "indoor voice"

# suppress all print messages and warnings (not recommended)
fun(warn2=TRUE) |> quiet()
fun(warn2=TRUE) |> quiet() |> suppressWarnings()

# convert all warnings to errors, and keep suppressing messages via quiet()
fun(warn2=TRUE) |> quiet() |> manageWarnings(warning2error=TRUE)

# define tolerable warning messages (only warn1 deemed ignorable)
ret <- fun(warn1=TRUE) |> quiet() |>
  manageWarnings(suppress = 'Message one')

# all other warnings raised to an error except ignorable ones
fun(warn1=TRUE, warn2=TRUE) |> quiet() |>
  manageWarnings(warning2error=TRUE, suppress = 'Message one')

# only warn2 raised to an error explicitly (warn3 remains as warning)
ret <- fun(warn1=TRUE, warn3=TRUE) |> quiet() |>
  manageWarnings(warning2error = 'Message two',
                 suppress = 'Message one')

fun(warn1=TRUE, warn2 = TRUE, warn3=TRUE) |> quiet() |>
  manageWarnings(warning2error = 'Message two',
                 suppress = 'Message one')

###########################
# Practical example, converting warning into error for model that
# failed to converge normally

library(lavaan)

## The industrialization and Political Democracy Example
## Bollen (1989), page 332
model <- '
  # latent variable definitions
     ind60 =~ x1 + x2 + x3
     dem60 =~ y1 + a*y2 + b*y3 + c*y4
     dem65 =~ y5 + a*y6 + b*y7 + c*y8

  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60

  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

# throws a warning
fit <- sem(model, data = PoliticalDemocracy, control=list(iter.max=60))

# for a simulation study, often better to treat this as an error
fit <- sem(model, data = PoliticalDemocracy, control=list(iter.max=60)) |>
   manageWarnings(warning2error = "the optimizer warns that a solution has NOT been found!")

## End(Not run)

Auto-named Concatenation of Vector or List

Description

This is a wrapper to the function c, however it names the respective elements according to their input object name. For this reason, nesting nc() calls is not recommended (joining independent nc() calls via c() is, however, reasonable).

Usage

nc(..., use.names = FALSE, error.on.duplicate = TRUE)

Arguments

...

objects to be concatenated

use.names

logical indicating if names should be preserved (unlike c, the default is FALSE)

error.on.duplicate

logical; if the same object name appears in the returning object should an error be thrown? Default is TRUE

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

A <- 1
B <- 2
C <- 3
names(C) <- 'LetterC'

# compare the following
c(A, B, C) # unnamed
nc(A, B, C) # named
nc(this=A, B, C) # respects override named (same as c() )
nc(this=A, B, C, use.names = TRUE) # preserve original name

## Not run: 
# throws errors if names not unique
nc(this=A, this=B, C)
nc(LetterC=A, B, C, use.names=TRUE)

## End(Not run)

# poor input choice names
nc(t.test(c(1:2))$p.value, t.test(c(3:4))$p.value)

# better to explicitly provide name
nc(T1 = t.test(c(1:2))$p.value,
   T2 = t.test(c(3:4))$p.value)

# vector of unnamed inputs
A <- c(5,4,3,2,1)
B <- c(100, 200)
nc(A, B, C) # A's and B's numbered uniquely
c(A, B, C)  # compare
nc(beta=A, B, C) # replacement of object name

# retain names attributes (but append object name, when appropriate)
names(A) <- letters[1:5]
nc(A, B, C)
nc(beta=A, B, C)
nc(A, B, C, use.names=TRUE)

# mix and match if some named elements work while others do not
c( nc(A, B, use.names=TRUE), nc(C))

## Not run: 
# error, 'b' appears twice
names(B) <- c('b', 'b2')
nc(A, B, C, use.names=TRUE)

## End(Not run)

# List input
A <- list(1)
B <- list(2:3)
C <- list('C')
names(C) <- 'LetterC'

# compare the following
c(A, B, C) # unnamed
nc(A, B, C) # named
nc(this=A, B, C) # respects override named (same as c() and list() )
nc(this=A, B, C, use.names = TRUE) # preserve original name

Create a Pushbullet Notifier

Description

Constructs a notifier object for sending messages via Pushbullet. This requires a Pushbullet account, the Pushbullet application installed on both a mobile device and computer, and a properly configured JSON file (typically ~/.rpushbullet.json, set up using RPushbullet::pbSetup()).

Usage

new_PushbulletNotifier(
  config_path = "~/.rpushbullet.json",
  verbose_issues = FALSE
)

Arguments

config_path

A character string specifying the path to the Pushbullet configuration file.Defaults to"~/.rpushbullet.json".

verbose_issues

Logical. IfTRUE, includes detailed information about warningsand errors in notifications. Default isFALSE.

Details

To use RPushbullet in SimDesign, create a PushbulletNotifier object using new_PushbulletNotifier() and pass it to the notifier argument in runSimulation().

Value

An S3 object of class"PushbulletNotifier" and"Notifier".

Examples

# Create a Pushbullet notifier (requires a valid configuration file)
pushbullet_notifier <- new_PushbulletNotifier(verbose_issues = TRUE)
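As noted in the Details, the constructed notifier is then supplied to runSimulation()'s notifier argument. A minimal sketch (the Design object and generate/analyse/summarise functions are assumed to be defined elsewhere):

# push notifications as conditions and the full simulation complete
res <- runSimulation(design = Design, replications = 1000,
                     generate = Generate, analyse = Analyse, summarise = Summarise,
                     notifier = pushbullet_notifier)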

Create a Telegram Notifier

Description

Constructs a notifier object for sending messages via Telegram.Requires a valid Telegram bot token and chat ID.

Usage

new_TelegramNotifier(bot_token, chat_id, verbose_issues = FALSE)

Arguments

bot_token

A character string representing your Telegram bot token, typicallysomething like"123456:ABC-xxxx".

chat_id

A character string or numeric representing the chat/group to sendmessages to.

verbose_issues

Logical. If TRUE, provides detailed information about warnings and errors in the notifications.

Details

To send notifications over Telegram with httr in SimDesign, install httr, set up a Telegram bot, and obtain a bot token and chat ID. For more information, see the Telegram Bots API. Then use the new_TelegramNotifier() function to create a TelegramNotifier object and pass it to the notifier argument in runSimulation().

Value

An S3 object of class"TelegramNotifier".

Examples

# Create a Telegram notifier (requires setting up a Telegram Bot)
telegram_notifier <- new_TelegramNotifier(bot_token = "123456:ABC-xyz", chat_id = "987654321")

Send a simulation notification

Description

Package extensions can implement custom notifiers by creating S3 methods for this generic; a minimal sketch of such an extension is given at the end of this entry.

Usage

notify(notifier, event, event_data)

Arguments

notifier

The notifier object

event

Character string indicating the notification trigger ("condition" or "complete")

event_data

List containing context information for the notification
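The following sketch is added for illustration only; the class name ConsoleNotifier is hypothetical and not part of SimDesign. It shows the minimal structure a custom notifier and its notify() method might take. Once defined and loaded, the method would also be reported by listAvailableNotifiers().

# constructor for a hypothetical notifier that simply writes to the console
new_ConsoleNotifier <- function() {
    structure(list(), class = c("ConsoleNotifier", "Notifier"))
}

# S3 method dispatched when a notification event is triggered
notify.ConsoleNotifier <- function(notifier, event, event_data) {
    cat(sprintf("Simulation event '%s' at %s\n", event, format(Sys.time())))
    invisible(NULL)
}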


S3 method to send notifications via Pushbullet

Description

S3 method to send notifications via Pushbullet

Usage

## S3 method for class 'PushbulletNotifier'
notify(notifier, event, event_data)

Arguments

notifier

A PushbulletNotifier object created with new_PushbulletNotifier()

event

Character string indicating the notification trigger ("condition" or "complete")

event_data

List containing context information for the notification

Value

Invisibly returns NULL


S3 method to send notifications through the Telegram API.

Description

S3 method to send notifications through the Telegram API.

Usage

## S3 method for class 'TelegramNotifier'
notify(notifier, event, event_data)

Arguments

notifier

A TelegramNotifier object created with new_TelegramNotifier()

event

Character string indicating the notification trigger ("condition" or "complete")

event_data

List containing context information for the notification

Value

Invisibly returns NULL


Suppress verbose function messages

Description

This function is used to suppress information printed from external functions that make internal use of message and cat, which provide information in interactive R sessions. For simulations, the session is not interactive, and therefore this type of output should be suppressed. For similar behaviour when suppressing warning messages, see manageWarnings.

Usage

quiet(..., cat = TRUE, keep = FALSE, attr.name = "quiet.messages")

Arguments

...

the functional expression to be evaluated

cat

logical; also capture calls from cat? If FALSE only message will be suppressed

keep

logical; return a character vector of the captured message and cat/print strings as an attribute of the resulting object from expr(...)?

attr.name

attribute name to use when keep = TRUE

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

manageWarnings

Examples

myfun <- function(x, warn=FALSE){
   message('This function is rather chatty')
   cat("It even prints in different output forms!\n")
   message('And even at different....')
   cat("...times!\n")
   if(warn)
     warning('It may even throw warnings!')
   x
}

out <- myfun(1)
out

# tell the function to shhhh
out <- quiet(myfun(1))
out

# which messages are suppressed? Extract stored attribute
out <- quiet(myfun(1), keep = TRUE)
attr(out, 'quiet.messages')

# Warning messages still get through (see manageWarnings(suppress)
#  for better alternative than using suppressWarnings())
out2 <- myfun(2, warn=TRUE) |> quiet() # warning gets through
out2

# suppress warning message explicitly, allowing others to be raised if present
myfun(2, warn=TRUE) |> quiet() |>
   manageWarnings(suppress='It may even throw warnings!')

Generate non-normal data with Headrick's (2002) method

Description

Generate multivariate non-normal distributions using the fifth-order polynomialmethod described by Headrick (2002).

Usage

rHeadrick(
  n,
  mean = rep(0, nrow(sigma)),
  sigma = diag(length(mean)),
  skew = rep(0, nrow(sigma)),
  kurt = rep(0, nrow(sigma)),
  gam3 = NaN,
  gam4 = NaN,
  return_coefs = FALSE,
  coefs = NULL,
  control = list(trace = FALSE, max.ntry = 15, obj.tol = 1e-10, n.valid.sol = 1)
)

Arguments

n

number of samples to draw

mean

a vector of k elements for the mean of the variables

sigma

desired k x k covariance matrix between bivariate non-normal variables

skew

a vector of k elements for the skewness of the variables

kurt

a vector of k elements for the kurtosis of the variables

gam3

(optional) explicitly supply the gamma 3 value? Default computes this internally

gam4

(optional) explicitly supply the gamma 4 value? Default computes this internally

return_coefs

logical; return the estimated coefficients only? See below regarding why this is useful.

coefs

(optional) supply previously estimated coefficients? This is useful when there must be multipledata sets drawn and will avoid repetitive computations. Must be the object returned after passingreturn_coefs = TRUE

control

a list of control parameters when locating the polynomial coefficients

Details

This function is primarily a wrapper for the code written by Oscar L. Olvera Astivia(last edited Feb 26, 2015) with some modifications (e.g., better starting valuesfor the Newton optimizer, passing previously saved coefs, etc).

Author(s)

Oscar L. Olvera Astivia and Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Headrick, T. C. (2002). Fast fifth-order polynomial transforms for generating univariate andmultivariate nonnormal distributions.Computational Statistics & Data Analysis, 40, 685-711.

Olvera Astivia, O. L., & Zumbo, B. D. (2015). A Cautionary Note on the Use of the Vale and MaurelliMethod to Generate Multivariate, Nonnormal Data for Simulation Purposes.Educational and Psychological Measurement, 75, 541-567.

Examples

## Not run: 
set.seed(1)

N <- 200
mean <- c(rep(0,4))
Sigma <- matrix(.49, 4, 4)
diag(Sigma) <- 1
skewness <- c(rep(1,4))
kurtosis <- c(rep(2,4))

nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis)
# cor(nonnormal)
# psych::describe(nonnormal)

#-----------
# compute the coefficients, then supply them back to the function to avoid
# extra computations
cfs <- rHeadrick(N, mean, Sigma, skewness, kurtosis, return_coefs = TRUE)
cfs

# compare
system.time(nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis))
system.time(nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis,
                                   coefs=cfs))

## End(Not run)

Generate non-normal data with Vale & Maurelli's (1983) method

Description

Generate multivariate non-normal distributions using the third-order polynomial method describedby Vale & Maurelli (1983). If only a single variable is generated then this functionis equivalent to the method described by Fleishman (1978).

Usage

rValeMaurelli(
  n,
  mean = rep(0, nrow(sigma)),
  sigma = diag(length(mean)),
  skew = rep(0, nrow(sigma)),
  kurt = rep(0, nrow(sigma))
)

Arguments

n

number of samples to draw

mean

a vector of k elements for the mean of the variables

sigma

desired k x k covariance matrix between bivariate non-normal variables

skew

a vector of k elements for the skewness of the variables

kurt

a vector of k elements for the kurtosis of the variables

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Fleishman, A. I. (1978). A method for simulating non-normal distributions.Psychometrika, 43, 521-532.

Vale, C. & Maurelli, V. (1983). Simulating multivariate nonnormal distributions.Psychometrika, 48(3), 465-471.

Examples

set.seed(1)

# univariate with skew
nonnormal <- rValeMaurelli(10000, mean=10, sigma=5, skew=1, kurt=3)
# psych::describe(nonnormal)

# multivariate with skew and kurtosis
n <- 10000
r12 <- .4
r13 <- .9
r23 <- .1
cor <- matrix(c(1,r12,r13,r12,1,r23,r13,r23,1),3,3)
sk <- c(1.5,1.5,0.5)
ku <- c(3.75,3.5,0.5)

nonnormal <- rValeMaurelli(n, sigma=cor, skew=sk, kurt=ku)
# cor(nonnormal)
# psych::describe(nonnormal)

Combine two separate SimDesign objects by row

Description

This function combines two Monte Carlo simulations executed by SimDesign's runSimulation function which, for all intents and purposes, could have been executed in a single run. This situation arises when a simulation has been completed, however the Design object was later modified to include more levels in the defined simulation factors. Rather than re-executing the previously completed simulation combinations, only the new combinations need to be evaluated into a different object and then rbind together to create the complete object combinations.

Usage

## S3 method for class 'SimDesign'
rbind(...)

Arguments

...

two or moreSimDesign objects that should becombined by rows

Value

same object that is returned byrunSimulation

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
# modified example from runSimulation()
Design <- createDesign(N = c(10, 20),
                       SD = c(1, 2))

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, sd=SD))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- mean(dat) # mean of the sample data vector
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- c(mu=mean(results), SE=sd(results)) # mean and SD summary of the sample means
    ret
}

Final1 <- runSimulation(design=Design, replications=1000,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final1

###
# later decide that N = 30 should have also been investigated. Rather than
# running the following object ....
newDesign <- createDesign(N = c(10, 20, 30),
                          SD = c(1, 2))

# ... only the new subset levels are executed to save time
subDesign <- subset(newDesign, N == 30)
subDesign

Final2 <- runSimulation(design=subDesign, replications=1000,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final2

# glue results together by row into one object as though the complete 'Design'
# object were run all at once
Final <- rbind(Final1, Final2)
Final

summary(Final)

## End(Not run)

Run a summarise step for results that have been saved to the hard drive

Description

When runSimulation() uses the option save_results = TRUE the R replication results from the Generate-Analyse functions are stored to the hard drive. As such, additional summarise components may be required at a later time, whereby the respective .rds files must be read back into R to be summarised. This function performs the reading of these files, application of a provided summarise function, and final collection of the respective results.

Usage

reSummarise(
  summarise,
  dir = NULL,
  files = NULL,
  results = NULL,
  Design = NULL,
  fixed_objects = NULL,
  boot_method = "none",
  boot_draws = 1000L,
  CI = 0.95,
  prefix = "results-row"
)

Arguments

summarise

a summarise function to apply to the read-in files. See runSimulation for details.

Note that if the simulation contained only one design row then the new summarise function can be defined as either summarise <- function(results, fixed_objects), if fixed_objects is required, or summarise <- function(results) otherwise

dir

directory pointing to the .rds files to beread-in that were saved fromrunSimulation(..., save_results=TRUE).IfNULL, it is assumed the current working directory containsthe .rds files

files

(optional) names of files to read-in. IfNULL all fileslocated withindir will be used

results

(optional) the results ofrunSimulation when nosummarise function was provided. Can be either atibble ormatrix (indicating that exactly one design condition was evaluated),or alist ofmatrix/tibbleobjects indicating that multiple conditions were performed with no summarise evaluation.

Alternatively, ifstore_results = TRUE in therunSimulation() execution thenthe final SimDesign object may be passed, where the generate-analyse information will beextracted from the object instead

Design

(optional) ifresults input used, and design condition informationimportant in the summarise step, then the originaldesign object fromrunSimulation should be included

fixed_objects

(optional) seerunSimulation for details

boot_method

method for performing non-parametric bootstrap confidence intervalsfor the respective meta-statistics computed by theSummarise function.SeerunSimulation for details

boot_draws

number of non-parametric bootstrap draws to sample for thesummarisefunction after the generate-analyse replications are collected. Default is 1000

CI

bootstrap confidence interval level (default is 95%)

prefix

character indicating prefix used for stored files

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

Design <- createDesign(N = c(10, 20, 30))

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), median=median(dat)) # mean/median of sample data
    ret
}

Summarise <- function(condition, results, fixed_objects){
    colMeans(results)
}

## Not run: 
# run the simulation
runSimulation(design=Design, replications=50,
              generate=Generate, analyse=Analyse,
              summarise=Summarise, save_results=TRUE,
              save_details = list(save_results_dirname='simresults'))

res <- reSummarise(Summarise, dir = 'simresults/')
res

Summarise2 <- function(condition, results, fixed_objects){
    ret <- c(mean_ests=colMeans(results), SE=colSDs(results))
    ret
}

res2 <- reSummarise(Summarise2, dir = 'simresults/')
res2

SimClean(dir='simresults/')

## End(Not run)

###
# Similar, but with results stored within the final object
res <- runSimulation(design=Design, replications=50, store_results = TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

# same summarise but with bootstrapping
res2 <- reSummarise(Summarise, results = res, boot_method = 'basic')
res2

Rejection sampling (i.e., accept-reject method)

Description

This function supports the rejection sampling (i.e., accept-reject) approach to drawing values from seemingly difficult (probability) density functions by sampling values from more manageable proxy distributions.

Usage

rejectionSampling(
  n,
  df,
  dg,
  rg,
  M,
  method = "optimize",
  interval = NULL,
  logfuns = FALSE,
  maxM = 1e+05,
  parstart = rg(1L),
  ESRS_Mstart = 1.0001
)

Arguments

n

number of samples to draw

df

the desired (potentially un-normed) density function to draw independent samples from. Must be in the form of a function with a single input corresponding to the values sampled from rg. Function is assumed to be vectorized (if not, see Vectorize)

dg

the proxy (potentially un-normed) density function to draw samples from in lieu of drawing samples from df. The support for this density function should be the same as df (i.e., when df(x) > 0 then dg(x) > 0). Must be in the form of a function with a single input corresponding to the values sampled from rg. Function is assumed to be vectorized (if not, see Vectorize)

rg

the proxy random number generation function, associated with dg, used to draw proposal samples from. Must be in the form of a function with a single input corresponding to the number of values to draw, while the output can either be a vector or a matrix (if a matrix, each independent observation must be stored in a unique row). Function is assumed to be vectorized (if not, see Vectorize)

M

the upper-bound of the ratio of probability density functions to help minimize the number of discarded draws and define the corresponding rescaled proposal envelope. When missing, M is computed internally by finding a reasonable maximum of log(df(x)) - log(dg(x)), and this value is returned to the console. When both df and dg are true probability density functions (i.e., integrate to 1) the acceptance probability is equal to 1/M

method

when M is missing, the optimization of M is done either by finding the mode of the log-density values ("optimize") or by using the "Empirical Supremum Rejection Sampling" method ("ESRS")

interval

when M is missing, for univariate density function draws, the interval to search within via optimize. If not specified, a sample of 5000 values from the rg function definition will be collected, and the min/max will be obtained via this random sample

logfuns

logical; have the df and dg functions been written so as to return log-densities instead of the original densities? The FALSE default assumes the original densities are returned (use TRUE when higher accuracy is required when generating each density definition)

maxM

if, when optimizing M, the value is greater than this cut-off then stop; the sampler would likely be too inefficient, or the optimization is failing

parstart

starting value vector for optimization of M in multidimensional distributions

ESRS_Mstart

starting M value for the ESRS algorithm

Details

The accept-reject algorithm is a flexible approach to obtaining i.i.d. draws from a difficult-to-sample-from (probability) density function where either the transformation method fails or the inverse transform method is difficult to manage. The algorithm does so by sampling from a more "well-behaved" proxy distribution (with identical support, up to some proportionality constant M that reshapes the proposal density to envelope the target density), and accepts the draws if they are likely within the target density. Hence, the closer the shape of dg(x) is to the desired df(x), the more likely the draws are to be accepted; otherwise, many iterations of the accept-reject algorithm may be required, which decreases the computational efficiency.
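To make the acceptance step concrete, a bare-bones, unvectorized sketch of the accept-reject logic (illustrative only; rejectionSampling() itself is more careful about vectorization, log-densities, and choosing M) might look as follows, where dfn, dgn, and rgn stand in for user-supplied df, dg, and rg definitions:

# minimal accept-reject sketch (illustrative only)
accept_reject <- function(n, dfn, dgn, rgn, M) {
    out <- numeric(0)
    while (length(out) < n) {
        x <- rgn(1L)                       # proposal draw from the proxy distribution
        u <- runif(1L)                     # uniform height under the scaled envelope
        if (u <= dfn(x) / (M * dgn(x)))    # accept with probability df(x) / (M * dg(x))
            out <- c(out, x)
    }
    out
}

# e.g., draw from beta(2.7, 6.3) using a uniform proposal (M approx 2.67 here)
# accept_reject(1000, function(x) dbeta(x, 2.7, 6.3), dunif, runif, M = 2.67)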

Value

returns a vector or matrix of draws (corresponding to the output class from rg) from the desired df

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Caffo, B. S., Booth, J. G., and Davison, A. C. (2002). Empirical supremumrejection sampling.Biometrika, 89, 745–754.

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and ReliableMonte Carlo Simulations with the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statisticswith Monte Carlo simulation.Journal of Statistics Education, 24(3),136-156.doi:10.1080/10691898.2016.1246953

Examples

## Not run: 
# Generate X ~ beta(a,b), where a and b are a = 2.7 and b = 6.3,
# and the support is Y ~ Unif(0,1)
dfn <- function(x) dbeta(x, shape1 = 2.7, shape2 = 6.3)
dgn <- function(x) dunif(x, min = 0, max = 1)
rgn <- function(n) runif(n, min = 0, max = 1)

# when df and dg both integrate to 1, acceptance probability = 1/M
M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn)
M
dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M)
hist(dat, 100)
hist(rbeta(10000, 2.7, 6.3), 100) # compare

# obtain empirical estimate of M via ESRS method
M <- rejectionSampling(1000, df=dfn, dg=dgn, rg=rgn, method='ESRS')
M

# generate using better support function (here, Y ~ beta(2,6)),
#   and use log setup in initial calls (more numerically accurate)
dfn <- function(x) dbeta(x, shape1 = 2.7, shape2 = 6.3, log = TRUE)
dgn <- function(x) dbeta(x, shape1 = 2, shape2 = 6, log = TRUE)
rgn <- function(n) rbeta(n, shape1 = 2, shape2 = 6)
M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn, logfuns=TRUE) # better M
M

## Alternative estimation of M
## M <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, logfuns=TRUE,
##                        method='ESRS')

dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M, logfuns=TRUE)
hist(dat, 100)

#------------------------------------------------------
# sample from wonky (and non-normalized) density function, like below
dfn <- function(x){
    ret <- numeric(length(x))
    ret[x <= .5] <- dnorm(x[x <= .5])
    ret[x > .5] <-  dnorm(x[x > .5]) + dchisq(x[x > .5], df = 2)
    ret
}
y <- seq(-5, 5, length.out = 1000)
plot(y, dfn(y), type = 'l', main = "Function to sample from")

# choose dg/rg functions that have support within the range [-inf, inf]
rgn <- function(n) rnorm(n, sd=4)
dgn <- function(x) dnorm(x, sd=4)

## example M height from above graphic
##  (M selected using ESRS to help stochastically avoid local mins)
M <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, method='ESRS')
M
lines(y, dgn(y)*M, lty = 2)

dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M)
hist(dat, 100, prob=TRUE)

# true density (normalized)
C <- integrate(dfn, -Inf, Inf)$value
ndfn <- function(x) dfn(x) / C
curve(ndfn, col='red', lwd=2, add=TRUE)

#-----------------------------------------------------
# multivariate distribution
dfn <- function(x) sum(log(c(dnorm(x[1]) + dchisq(x[1], df = 5),
                   dnorm(x[2], -1, 2))))
rgn <- function(n) c(rnorm(n, sd=3), rnorm(n, sd=3))
dgn <- function(x) sum(log(c(dnorm(x[1], sd=3), dnorm(x[2], sd=3))))

# M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn, logfuns=TRUE)
dat <- rejectionSampling(5000, df=dfn, dg=dgn, rg=rgn, M=4.6, logfuns=TRUE)
hist(dat[,1], 30)
hist(dat[,2], 30)
plot(dat)

## End(Not run)

Generate integer values within specified range

Description

Efficiently generate positive and negative integer values with (default) or without replacement. This function is mainly a wrapper to the sample.int function (which itself is a much more efficient integer sampler than the more general sample); however, it is intended to work with both positive and negative integer ranges, since sample.int only returns positive integer values that must begin at 1L.
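For intuition, shifting an arbitrary finite integer range onto the 1-based indices that sample.int requires could be sketched roughly as below; this illustrates the idea only and is not the package's internal code:

# illustrative sketch of sampling integers in [min, max] via sample.int()
rint_sketch <- function(n, min, max, replace = TRUE, prob = NULL) {
    stopifnot(is.finite(min), is.finite(max), max >= min)
    k <- max - min + 1                           # number of candidate integers
    min + sample.int(k, n, replace = replace, prob = prob) - 1L
}

summary(rint_sketch(1000, min = -20, max = 20))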

Usage

rint(n, min, max, replace = TRUE, prob = NULL)

Arguments

n

number of samples to draw

min

lower limit of the distribution. Must be finite

max

upper limit of the distribution. Must be finite

replace

should sampling be with replacement?

prob

a vector of probability weights for obtaining the elements of the vector being sampled

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

set.seed(1)

# sample 1000 integer values within 20 to 100
x <- rint(1000, min = 20, max = 100)
summary(x)

# sample 1000 integer values within 100 to 100 million
x <- rint(1000, min = 100, max = 1e8)
summary(x)

# compare speed to sample()
system.time(x <- rint(1000, min = 100, max = 1e8))
system.time(x2 <- sample(100:1e8, 1000, replace = TRUE))

# sample 1000 integer values within -20 to 20
x <- rint(1000, min = -20, max = 20)
summary(x)

Generate data with the inverse Wishart distribution

Description

Function generates data in the form of symmetric matrices from the inverse Wishart distribution given a covariance matrix and degrees of freedom.

Usage

rinvWishart(n = 1, df, sigma)

Arguments

n

number of matrix observations to generate. By default n = 1, which returns a single symmetric matrix. If n > 1 then a list of n symmetric matrices is returned instead

df

degrees of freedom

sigma

positive definite covariance matrix

Value

a numeric matrix with columns equal to ncol(sigma) when n = 1, or a list of n matrices with the same properties

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

# random inverse Wishart matrix given variances [3,6], covariance 2, and df=15
sigma <- matrix(c(3,2,2,6), 2, 2)
x <- rinvWishart(sigma = sigma, df = 15)
x

# list of matrices
x <- rinvWishart(20, sigma = sigma, df = 15)
x

Generate data with the multivariate g-and-h distribution

Description

Generate non-normal distributions using the multivariate g-and-h distribution. Can be used to generate several different classes of univariate and multivariate distributions.

Usage

rmgh(n, g, h, mean = rep(0, length(g)), sigma = diag(length(mean)))

Arguments

n

number of samples to draw

g

the g parameter(s) which control the skew of a distribution in terms of both direction and magnitude

h

the h parameter(s) which control the tail weight or elongation of a distribution and are positively related to kurtosis

mean

a vector of k elements for the mean of the variables

sigma

desired k x k covariance matrix between bivariate non-normal variables

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

Examples

set.seed(1)

# univariate
norm <- rmgh(10000, 1e-5, 0)
hist(norm)

skew <- rmgh(10000, 1/2, 0)
hist(skew)

neg_skew_platykurtic <- rmgh(10000, -1, -1/2)
hist(neg_skew_platykurtic)

# multivariate
sigma <- matrix(c(2,1,1,4), 2)
mean <- c(-1, 1)
twovar <- rmgh(10000, c(-1/2, 1/2), c(0,0),
    mean=mean, sigma=sigma)
hist(twovar[,1])
hist(twovar[,2])
plot(twovar)

Generate data with the multivariate normal (i.e., Gaussian) distribution

Description

Function generates data from the multivariate normal distribution given some mean vector and/or covariance matrix.

Usage

rmvnorm(n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)))

Arguments

n

number of observations to generate

mean

mean vector, default is rep(0, length = ncol(sigma))

sigma

positive definite covariance matrix, default is diag(length(mean))

Value

a numeric matrix with columns equal to length(mean)

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

# random normal values with mean [5, 10] and variances [3,6], and covariance 2
sigma <- matrix(c(3,2,2,6), 2, 2)
mu <- c(5,10)
x <- rmvnorm(1000, mean = mu, sigma = sigma)
head(x)
summary(x)
plot(x[,1], x[,2])

Generate data with the multivariate t distribution

Description

Function generates data from the multivariate t distribution given a covariance matrix, non-centrality parameter (or mode), and degrees of freedom.

Usage

rmvt(n, sigma, df, delta = rep(0, nrow(sigma)), Kshirsagar = FALSE)

Arguments

n

number of observations to generate

sigma

positive definite covariance matrix

df

degrees of freedom. df = 0 and df = Inf correspond to the multivariate normal distribution

delta

the vector of non-centrality parameters of length n which specifies either the modes (default) or the non-centrality parameters

Kshirsagar

logical; triggers whether to generate data with non-centrality parameters or to adjust the simulated data to the mode of the distribution. The default uses the mode

Value

a numeric matrix with columns equal to ncol(sigma)

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

# random t values given variances [3,6], covariance 2, and df = 15
sigma <- matrix(c(3,2,2,6), 2, 2)
x <- rmvt(1000, sigma = sigma, df = 15)
head(x)
summary(x)
plot(x[,1], x[,2])

Generate a random set of values within a truncated range

Description

Function generates data from a supplied random number generating function, where the generated values are constrained to fall within a particular range. Sampled values outside this range are discarded and re-sampled until the desired criterion has been met.

Usage

rtruncate(n, rfun, range, ..., redraws = 100L)

Arguments

n

number of observations to generate. This should be the first argument passed to rfun

rfun

a function to generate random values. Function can return a numeric/integer vector or matrix, and additional arguments required for this function are passed through the argument ...

range

a numeric vector of length two, where the first element indicates the lower bound and the second the upper bound. When values are generated outside these two bounds then data are redrawn until the bound criterion is met. When the output of rfun is a matrix then this input can be specified as a matrix with two rows, where the first row corresponds to the lower bound and the second row the upper bound for each generated column in the output

...

additional arguments to be passed to rfun

redraws

the maximum number of redraws to take before terminating the iterative sequence. This is in place as a safety in case the range is too small given the random number generator, causing too many consecutive rejections. Default is 100

Details

In simulations it is often useful to draw numbers from truncated distributions rather than across the full theoretical range. For instance, sampling parameters within the range [-4,4] from a normal distribution. The rtruncate function has been designed to accept any sampling function, where the first argument is the number of values to sample, and will draw values iteratively until the number of values within the specified bound is obtained. In situations where it is unlikely for the bounds to be located (e.g., sampling from a standard normal distribution where all values are within [-10,-6]) the sampling scheme will throw an error if too many re-sampling executions are required (by default it will stop if more than 100 calls to rfun are required).
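The redraw logic amounts to something like the following sketch for the vector case (illustrative only; the actual function also handles matrix output and per-column bounds):

# illustrative redraw loop for a vector-valued rfun (not the package internals)
rtruncate_sketch <- function(n, rfun, range, ..., redraws = 100L) {
    out <- numeric(0)
    for (i in seq_len(redraws)) {
        draw <- rfun(n, ...)
        out <- c(out, draw[draw >= range[1] & draw <= range[2]])   # keep in-bound values
        if (length(out) >= n) return(out[1:n])
    }
    stop('Too many redraws required; is the range too restrictive?')
}

summary(rtruncate_sketch(1000, rnorm, c(-2, 3)))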

Value

either a numeric vector or matrix, where all values are within the desired range

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and ReliableMonte Carlo Simulations with the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statisticswith Monte Carlo simulation.Journal of Statistics Education, 24(3),136-156.doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

# n = 1000 truncated normal vector between [-2,3]
vec <- rtruncate(1000, rnorm, c(-2,3))
summary(vec)

# truncated correlated multivariate normal between [-1,4]
mat <- rtruncate(1000, rmvnorm, c(-1,4),
   sigma = matrix(c(2,1,1,1),2))
summary(mat)

# truncated correlated multivariate normal between [-1,4] for the
#  first column and [0,3] for the second column
mat <- rtruncate(1000, rmvnorm, cbind(c(-1,4), c(0,3)),
   sigma = matrix(c(2,1,1,1),2))
summary(mat)

# truncated chi-square with df = 4 between [2,6]
vec <- rtruncate(1000, rchisq, c(2,6), df = 4)
summary(vec)

Run a Monte Carlo simulation using array job submissions per condition

Description

This function has the same purpose as runSimulation, however rather than evaluating each row in a design object (potentially with a parallel computing architecture) this function evaluates the simulation per independent row condition. This is mainly useful when distributing the jobs to HPC clusters where a job array number is available (e.g., via SLURM), where the simulation results must be saved to independent files as they complete. Use of expandDesign is useful for distributing replications to different jobs, while genSeeds is required to ensure high-quality random number generation across the array submissions. See the associated vignette for a brief tutorial of this setup.

Usage

runArraySimulation(
  design,
  ...,
  replications,
  iseed,
  filename,
  dirname = NULL,
  arrayID = getArrayID(),
  array2row = function(arrayID) arrayID,
  addArrayInfo = TRUE,
  parallel = FALSE,
  cl = NULL,
  ncores = parallelly::availableCores(omit = 1L),
  save_details = list(),
  control = list(),
  verbose = interactive()
)

Arguments

design

design object containing simulation conditions on a per-row basis. This function is designed to submit each row as an independent job on an HPC cluster. See runSimulation for further details

...

additional arguments to be passed to runSimulation

replications

number of independent replications to perform per condition (i.e., each row in design). See runSimulation for further details

iseed

initial seed to be passed to genSeeds's argument of the same name, along with the supplied arrayID

filename

file name to save simulation files to (does not need to specify the extension). However, the array ID will be appended to each filename. For example, if filename = 'mysim' then the files stored will be 'mysim-1.rds', 'mysim-2.rds', and so on for each row ID in design

dirname

directory to save the files associated with filename to. If omitted the files will be stored in the same working directory where the script was submitted

arrayID

array identifier from the scheduler. Must be a number between 1 and nrow(design). If not specified then getArrayID will be called automatically, which assumes the environmental variables are available according to the SLURM scheduler

array2row

user defined function with the single argument arrayID. Used to convert the detected arrayID into a suitable row index in the design object input. By default each arrayID is associated with its respective row in design.

For example, if each arrayID should evaluate 10 rows in the design object then the function function(arrayID){1:10 + 10 * (arrayID-1)} can be passed to array2row

addArrayInfo

logical; should the array ID and original design row number be added to the SimResults(...) output?

parallel

logical; use parallel computations via a "SOCK" cluster? Only useful when the instruction shell file requires more than 1 core (number of cores detected via ncores). For this application the random seeds are further distributed using nextRNGSubStream

cl

cluster definition. If omitted a "SOCK" cluster will be defined

ncores

number of cores to use when parallel=TRUE. Note that the default uses 1 minus the number of available cores, therefore this will only be useful when ncores > 2 as defined in the shell instruction file

save_details

optional list of extra file saving details. See runSimulation

control

control list passed to runSimulation. In addition to the original control elements two additional arguments have been added: max_time and max_RAM, both of which are specified as character vectors with one element.

max_time specifies the maximum time allowed for a single simulation condition to execute (by default no time limit is set), and is formatted according to the specification in timeFormater. This is primarily useful when the HPC cluster will time out after some known elapsed time. In general, this input should be set to somewhere around 80-90% of the allocated time so that the results collected before the cluster is terminated can be saved. Default applies no time limit

Similarly, max_RAM controls the (approximate) maximum size that the simulation storage objects can grow to before RAM becomes an issue. This can be specified either in terms of megabytes (MB), gigabytes (GB), or terabytes (TB). For example, max_RAM = "4GB" indicates that if the simulation storage objects are larger than 4GB then the workflow will terminate early, returning only the successful results up to this point. Useful for larger HPC cluster jobs with RAM constraints that could terminate abruptly. As a rule of thumb this should be set to around 90% of the maximum RAM available. Default applies no memory limit

verbose

logical; pass a verbose flag to runSimulation. Unlike runSimulation this is set to FALSE during interactive sessions, though set to TRUE when non-interactive and information about the session itself should be stored (e.g., in SLURM .out files)

Details

Due to the nature of how the replications are split it is important that the L'Ecuyer-CMRG (2002) method of random seeds is used across all array ID submissions (cf. runSimulation's parallel approach, which uses this method to distribute random seeds within each isolated condition rather than between all conditions). As such, this function requires the seeds to be generated using genSeeds with the iseed and arrayID inputs to ensure that each job is analyzing a high-quality set of random numbers via L'Ecuyer-CMRG's (2002) method, incremented using nextRNGStream.

Additionally, for timed simulations on HPC clusters it is also recommended to pass a control = list(max_time) value to avoid discarding conditions that require more than the specified time in the shell script. The max_time value should be less than the maximum time allocated on the HPC cluster (e.g., approximately 90% of the allotted wall-clock time, though this depends on how long, and how variable, each replication is). Simulations with missing replications should submit a new set of jobs at a later time to collect the missing information.
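For example, if the submitted shell script requests roughly four hours of wall-clock time, a buffer along the following lines leaves room for completed results to be written before the scheduler kills the job (the specific values are illustrative only, and Design, Generate, Analyse, Summarise, and iseed are assumed to be defined as in the Examples below):

# assuming the .slurm script requested about 4 hours (e.g., #SBATCH --time=04:00:00),
# stop each condition at roughly 3.5 hours so completed replications can still be saved
runArraySimulation(design=Design, replications=1000,
                   generate=Generate, analyse=Analyse, summarise=Summarise,
                   iseed=iseed, filename='mysim',
                   control = list(max_time = "03:30:00"))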

Author(s)

Phil Chalmersrphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248

See Also

runSimulation, expandDesign, genSeeds, SimCheck, SimCollect, getArrayID

Examples

library(SimDesign)

Design <- createDesign(N = c(10, 20, 30))

Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), median=median(dat)) # mean/median of sample data
    ret
}

Summarise <- function(condition, results, fixed_objects){
    colMeans(results)
}

## Not run: 
# define initial seed (do this only once to keep it constant!)
# iseed <- genSeeds()
iseed <- 554184288

### On cluster submission, the active array ID is obtained via getArrayID(),
###   and therefore should be used in real SLURM submissions
arrayID <- getArrayID(type = 'slurm')

# However, the following example arrayID is set to
#  the first row only for testing purposes
arrayID <- 1L

# run the simulation (results not caught on job submission, only files saved)
res <- runArraySimulation(design=Design, replications=50,
                      generate=Generate, analyse=Analyse,
                      summarise=Summarise, arrayID=arrayID,
                      iseed=iseed, filename='mysim') # saved as 'mysim-1.rds'
res
SimResults(res) # condition and replication count stored

# same, but evaluated with multiple cores
res <- runArraySimulation(design=Design, replications=50,
                      generate=Generate, analyse=Analyse,
                      summarise=Summarise, arrayID=arrayID,
                      parallel=TRUE, ncores=3,
                      iseed=iseed, filename='myparsim')
res
SimResults(res) # condition and replication count stored

dir()
SimClean(c('mysim-1.rds', 'myparsim-1.rds'))

########################
# Same submission job as above, however split the replications over multiple
# evaluations and combine when complete
Design5 <- expandDesign(Design, 5)
Design5

# iseed <- genSeeds()
iseed <- 554184288

# arrayID <- getArrayID(type = 'slurm')
arrayID <- 14L

# run the simulation (replications reduced per row, but same in total)
runArraySimulation(design=Design5, replications=10,
                   generate=Generate, analyse=Analyse,
                   summarise=Summarise, iseed=iseed,
                   filename='mylongsim', arrayID=arrayID)

res <- readRDS('mylongsim-14.rds')
res
SimResults(res) # condition and replication count stored

SimClean('mylongsim-14.rds')

###
# Emulate the arrayID distribution, storing all results in a 'sim/' folder
# (if 'sim/' does not exist in runArraySimulation() it will be
# created automatically)
dir.create('sim/')

# Emulate distribution to nrow(Design5) = 15 independent job arrays
##  (just used for presentation purposes on local computer)
sapply(1:nrow(Design5), \(arrayID)
    runArraySimulation(design=Design5, replications=10,
         generate=Generate, analyse=Analyse,
         summarise=Summarise, iseed=iseed, arrayID=arrayID,
         filename='condition', dirname='sim', # files: "sim/condition-#.rds"
         control = list(max_time="04:00:00", max_RAM="4GB"))) |> invisible()

#  If necessary, conditions above will manually terminate before
#  4 hours and 4GB of RAM are used, returning any
#  successfully completed results before the HPC session times
#  out (provided .slurm script specified more than 4 hours)

# list saved files
dir('sim/')

# check that all files saved (warnings will be raised if missing files)
SimCheck('sim/')

condition14 <- readRDS('sim/condition-14.rds')
condition14
SimResults(condition14)

# aggregate simulation results into single file
final <- SimCollect('sim/')
final

# clean simulation directory
SimClean(dirs='sim/')

############
# same as above, however passing different amounts of information depending
# on the array ID
array2row <- function(arrayID){
  switch(arrayID,
    "1"=1:8,
    "2"=9:14,
    "3"=15)
}

# arrayID 1 does row 1 through 8, arrayID 2 does 9 to 14
array2row(1)
array2row(2)
array2row(3)  # arrayID 3 does 15 only

# emulate remote array distribution with only 3 arrays
sapply(1:3, \(arrayID)
    runArraySimulation(design=Design5, replications=10,
         generate=Generate, analyse=Analyse,
         summarise=Summarise, iseed=iseed, arrayID=arrayID,
         filename='condition', dirname='sim', array2row=array2row)) |> invisible()

# list saved files
dir('sim/')

# Also works if the replication input were a suitable vector (not run)
if(FALSE){
    reps <- c(rep(10, 14), 100)
    cbind(Design5, reps)

    sapply(1:3, \(arrayID)
       runArraySimulation(design=Design5, replications=reps,
         generate=Generate, analyse=Analyse,
         summarise=Summarise, iseed=iseed, arrayID=arrayID,
         filename='condition', dirname='sim', array2row=array2row)) |> invisible()
}

# note that all row conditions are still stored separately, though note that
#  arrayID is now 2 instead
condition14 <- readRDS('sim/condition-14.rds')
condition14
SimResults(condition14)

# aggregate simulation results into single file
final <- SimCollect('sim/')
final

# clean simulation directory
SimClean(dirs='sim/')

## End(Not run)

Run a Monte Carlo simulation given conditions and simulation functions

Description

This function runs a Monte Carlo simulation study given a set of predefined simulation functions, design conditions, and number of replications. Results can be saved as temporary files in case of interruptions and may be restored by re-running runSimulation, provided that the respective temp file can be found in the working directory. runSimulation supports parallel and cluster computing (with the parallel and future packages; see also runArraySimulation for submitting array jobs to HPC clusters), global and local debugging, error handling (including fail-safe stopping when functions fail too often, even across nodes), provides bootstrap estimates of the sampling variability (optional), and automatic tracking of error and warning messages with their associated .Random.seed states. For convenience, all functions available in the R work-space are exported across all nodes so that they are more easily accessible (however, other R objects are not, and therefore must be passed to the fixed_objects input to become available across nodes).

Usage

runSimulation(
  design,
  replications,
  generate,
  analyse,
  summarise,
  prepare = NULL,
  fixed_objects = NULL,
  packages = NULL,
  filename = NULL,
  debug = "none",
  load_seed = NULL,
  load_seed_prepare = NULL,
  save = any(replications > 2),
  store_results = TRUE,
  save_results = FALSE,
  parallel = FALSE,
  ncores = parallelly::availableCores(omit = 1L),
  cl = NULL,
  notification = "none",
  notifier = NULL,
  beep = FALSE,
  sound = 1,
  check.globals = FALSE,
  CI = 0.95,
  seed = NULL,
  boot_method = "none",
  boot_draws = 1000L,
  max_errors = 50L,
  resume = TRUE,
  save_details = list(),
  control = list(),
  not_parallel = NULL,
  progress = TRUE,
  verbose = interactive()
)

## S3 method for class 'SimDesign'
summary(object, ...)

## S3 method for class 'SimDesign'
print(x, list2char = TRUE, ...)

Arguments

design

a tibble or data.frame object containing the Monte Carlo simulation conditions to be studied, where each row represents a unique condition and each column a factor to be varied. See createDesign for the standard approach to create this simulation design object

replications

number of independent replications to perform per condition (i.e., each row in design). Can be a single number, which will be used for each design condition, or an integer vector with length equal to nrow(design). All inputs must be greater than 0, though setting this to less than 3 (for initial testing purposes) will disable the save and control$stop_on_fatal flags

generate

user-defined data and parameter generating function (or named list of functions). See Generate for details. Note that this argument may be omitted by the user if they wish to generate the data within the analyse step, but for real-world simulations this is generally not recommended. If multiple generate functions are provided as a list then the generate functions are executed in order until the first valid generate function is executed, after which the subsequent generation functions are ignored (see GenerateIf to only apply data generation for specific conditions).

analyse

user-defined analysis function (or named list of functions) that acts on the data generated from Generate (or, if generate was omitted, can be used to generate and analyse the simulated data). See Analyse for details

summarise

optional (but strongly recommended) user-defined summary function from Summarise to be used to compute meta-statistical summary information after all the replications have completed within each design condition. The return of this function, in order of increasing complexity, should be: a named numeric vector or data.frame with one row, a matrix or data.frame with more than one row, and, failing these more atomic types, a named list. When a list is returned the final simulation object will contain a column SUMMARISE containing the summary results for each respective condition

Note that unlike the Generate and Analyse steps, the Summarise portion is not as important to perfectly organize as the results can be summarised later on by using the built-in reSummarise function (provided either store_results = TRUE or save_results = TRUE were included).

Omitting this function will return a tibble with the Design and associated results information for all nrow(Design) * replications evaluations if the results from each Analyse() call were a one-dimensional vector; for more general objects returned by Analyse() (such as lists), a list containing the results returned from Analyse is returned instead. This is generally only recommended for didactic purposes because the results will leave out a large amount of information (e.g., try-errors, warning messages, saving files, etc.), can run into memory-related issues if the Analyse function returns larger objects, and generally is not as flexible internally. However, it may be useful when replications are expensive and ANOVA-based decompositions involving the within-condition replication information are of interest, though of course this can be circumvented by using store_results = TRUE or save_results = TRUE with or without a supplied summarise definition.

Finally, there are keywords that should not be returned from this function, since they will cause a conflict with the aggregated simulation objects. These are currently those listed in capital letters (e.g., ERRORS, WARNINGS, REPLICATIONS, etc.), all of which can be avoided if the returned objects are not entirely capitalized (e.g., Errors, errors, ErRoRs, ..., will all avoid conflicts)

prepare

(optional) a function that executes once per simulation condition (i.e., once per row in design) to load or prepare condition-specific objects into fixed_objects before replications are run. This function should accept condition and fixed_objects as arguments and return the modified fixed_objects.

The primary use case is to load pre-computed objects from disk that were generated offline:

prepare <- function(condition, fixed_objects) {

# Create filename from design parameters

fname <- paste0('prepare/N', condition$N, '_SD', condition$SD, '.rds')

fixed_objects$expensive_stuff <- readRDS(fname)

fixed_objects

}

This approach allows you to: (1) pre-generate expensive condition-specific objects prior to running the simulation, (2) save them as individual RDS files, and (3) load them efficiently during the simulation (see the sketch following this argument's description). This is preferable to generating objects within prepare() itself because it allows you to inspect the objects, ensures reproducibility, and separates object generation from the simulation workflow.

Note: Objects added to fixed_objects in prepare() are not stored by runSimulation() to conserve memory, as they are typically large. You should maintain your own records of these objects outside the simulation.

RNG Warning: If you generate objects within prepare() using random number generation (e.g., rnorm(), runif()), reproducibility requires explicit RNG state management via store_Random.seeds=TRUE and load_seed_prepare. Pre-computing and saving objects as RDS files is the recommended approach for reproducible simulations.

The function signature should be: prepare <- function(condition, fixed_objects) { ... return(fixed_objects) }

Default isNULL, in which case no preparation step is performed
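A hedged sketch of the offline pre-generation step described above might look like the following, where the 'prepare/' folder and the make_expensive_object() helper are purely hypothetical placeholders:

# hypothetical offline step: pre-compute and save one object per design condition
dir.create('prepare', showWarnings = FALSE)
for (i in seq_len(nrow(Design))) {
    condition <- Design[i, ]
    obj <- make_expensive_object(N = condition$N, SD = condition$SD)  # placeholder helper
    saveRDS(obj, paste0('prepare/N', condition$N, '_SD', condition$SD, '.rds'))
}
# the prepare() definition shown above then simply calls readRDS() on the matching file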

fixed_objects

(optional) an object (usually a named list) containing additional user-defined objects that should remain fixed across conditions. This is useful when including large vectors/matrices of population parameters, fixed data information that should be used across all conditions and replications (e.g., including a common design matrix for linear regression models), or simply control constant global elements (e.g., a constant for sample size)

packages

a character vector of external packages to be used during the simulation (e.g., c('MASS', 'extraDistr', 'simsem')). Use this input when running code in parallel to use non-standard functions from additional packages. Note that any previously attached packages explicitly loaded via library or require will be automatically added to this list, provided that they are visible in the otherPkgs element from sessionInfo. Alternatively, functions can be called explicitly without attaching the package with the :: operator (e.g., extraDistr::rgumbel())

filename

(optional) the name of the .rds file to save the final simulation results to. If the extension .rds is not included in the file name (e.g. "mysimulation" versus "mysimulation.rds") then the .rds extension will be automatically added to the file name to ensure the file extension is correct.

Note that if the same file name already exists in the working directory at the time of saving then a new file will be generated instead and a warning will be thrown. This helps to avoid accidentally overwriting existing files. Default is NULL, indicating no file will be saved by default

debug

a string indicating where to initiate a browser() call for editing and debugging, and pairs particularly well with the load_seed argument for precise debugging. General options are 'none' (default; no debugging), 'error', which starts the debugger when any error in the code is detected in one of the three generate-analyse-summarise functions, and 'all', which debugs all the user defined functions regardless of whether an error was thrown or not. Specific options include: 'generate' to debug the data simulation function, 'analyse' to debug the computational function, and 'summarise' to debug the aggregation function.

If the Analyse argument is supplied as a named list of functions then it is also possible to debug the specific function of interest by passing the name of the respective function in the list. For instance, if analyse = list(A1=Analyse.A1, A2=Analyse.A2) then passing debug = 'A1' will debug only the first function in this list, and all remaining analysis functions will be ignored.

For debugging specific rows in the Design input (e.g., when a number of initial rows successfully complete but the kth row fails) the row number can be appended to the standard debug input using a '-' separator. For instance, debugging whenever an error is raised in the second row of Design can be declared via debug = 'error-2'.

Finally, users may place browser calls within the respective functions for debugging at specific lines, which is useful when debugging based on conditional evaluations (e.g., if(this == 'problem') browser()). Note that parallel computation flags will automatically be disabled when a browser() is detected or when a debugging argument other than 'none' is supplied
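As a quick illustration of these options (calls shown commented out since they launch an interactive browser):

# drop into the browser inside the analyse definition for every replication
# runSimulation(design=Design, replications=2, generate=Generate,
#               analyse=Analyse, summarise=Summarise, debug='analyse')

# debug only when an error is raised while evaluating the second design row
# runSimulation(design=Design, replications=1000, generate=Generate,
#               analyse=Analyse, summarise=Summarise, debug='error-2')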

load_seed

used to replicate an exact simulation state, which is primarily useful for debugging purposes. Input can be a character object indicating which file to load from when the .Random.seeds have been saved (after a call with save_seeds = TRUE), or an integer vector indicating the actual .Random.seed values (e.g., extracted after using store_seeds). E.g., load_seed = 'design-row-2/seed-1' will load the first seed in the second row of the design input, or explicitly passing the elements from .Random.seed (see SimExtract to extract the seeds associated explicitly with errors during the simulation, where each column represents a unique seed). If the input is a character vector then it is important NOT to modify the design input object, otherwise the path may not point to the correct saved location, while if the input is an integer vector (or single column tbl object) then it WILL be important to modify the design input in order to load this exact seed for the corresponding design row. Default is NULL

load_seed_prepare

similar to load_seed, but specifically for debugging the prepare function. Used to replicate the exact RNG state when prepare is called for a given condition. Accepts the same input formats as load_seed: a character string path (e.g., 'design-row-2/prepare-seed'), an integer vector containing the .Random.seed state, or a tibble/data.frame with seed values. This is particularly useful when prepare encounters an error and you need to reproduce the exact state. The prepare error seed can be extracted using SimExtract(res, 'prepare_error_seed'). Default is NULL

save

logical; save the temporary simulation state to the hard-drive? This is useful for simulations which require an extended amount of time, though for shorter simulations it can be disabled to slightly improve computational efficiency. When TRUE, which is the default when evaluating replications > 2, a temp file will be created in the working directory which allows the simulation state to be saved and recovered (in case of power outages, crashes, etc). As well, triggering this flag will save any fatal .Random.seed states when conditions unexpectedly crash (where each seed is stored row-wise in an external .rds file), which provides a much easier mechanism to debug issues (see load_seed for details). Upon completion, this temp file will be removed.

To recover your simulation at the last known location (having patched the issues in the previous execution code) simply re-run the code you used to initially define the simulation and the external file will automatically be detected and read in. Default is TRUE when replications > 2 and FALSE otherwise

store_results

logical; store the complete tables of simulation results in the returned object? This is TRUE by default, though if RAM is anticipated to be an issue see save_results instead. Note that if the Design object is omitted from the call to runSimulation(), or the number of rows in Design is exactly 1, then this argument is automatically set to TRUE as RAM storage is no longer an issue.

To extract these results pass the returned object to either SimResults or SimExtract with what = 'results', which will return a named list of all the simulation results for each condition if nrow(Design) > 1; otherwise, if nrow(Design) == 1 or Design was missing, the results object will be stored as-is

save_results

logical; save the results returned from Analyse to external .rds files located in the defined save_results_dirname directory/folder? Use this if you would like to keep track of the individual parameters returned from the analysis function. Each saved object will contain a list of three elements containing the condition (row from design), results (as a list or matrix), and try-errors. See SimResults for an example of how to read these .rds files back into R after the simulation is complete. Default is FALSE.

WARNING: saving results to your hard-drive can fill up space very quickly for larger simulations. Be sure to test this option using a smaller number of replications before the full Monte Carlo simulation is performed. See also reSummarise for applying summarise functions from saved simulation results

parallel

logical; use parallel processing from the parallel package over each unique condition? This distributes the independent replications across the defined nodes, and is repeated for each row condition in the design input.

Alternatively, if the future package approach is desired then passing parallel = 'future' to runSimulation() will use the defined plan for execution. This allows for greater flexibility when specifying the general computing plan (e.g., plan(multisession) for parallel computing on the same machine, plan(future.batchtools::batchtools_torque) or plan(future.batchtools::batchtools_slurm) for common MPI/SLURM schedulers, etc.). However, it is the responsibility of the user to use plan(sequential) to reset the computing plan when the jobs are completed

ncores

number of cores to be used in parallel execution (ignored if using the future package approach). Default uses all available minus 1

cl

cluster object defined by makeCluster used to run code in parallel (ignored if using the future package approach). If NULL and parallel = TRUE, a local cluster object will be defined which selects the maximum number of cores available and will be stopped when the simulation is complete. Note that supplying a cl object will automatically set the parallel argument to TRUE. Define and supply this cluster object yourself whenever you have multiple nodes and can link them together manually

If the future package has been attached prior to executing runSimulation() then the associated plan() will be followed instead

notification

an optional character vector input that can be used to send notifications with information about execution time and recorded errors and warnings. Pass one of the following supported options: 'none' (default; send no notification), 'condition' to send a notification after each condition has completed, or 'complete' to send a notification only when the simulation has finished. When notification is set to 'condition' or 'complete', the notifier argument must be supplied with a valid notifier object (or a list of notifier objects).

notifier

A notifier object (or a list of notifier objects, allowing for multiple notification methods) required when notification is not set to "none". See listAvailableNotifiers for a list of available notifiers and how to use them.

Example usage:

telegram_notifier <- new_TelegramNotifier(bot_token = "123456:ABC-xyz", chat_id = "987654321")

runSimulation(..., notification = "condition", notifier = telegram_notifier)

Using multiple notifiers:

pushbullet_notifier <- new_PushbulletNotifier()

runSimulation(..., notification = "complete", notifier = list(telegram_notifier, pushbullet_notifier))

See the R/notifications.R file for reference on implementing a custom notifier.

beep

logical; call the beepr package when the simulation is completed?

sound

sound argument passed to beepr::beep()

check.globals

logical; should the functions be inspected for potential global variable usage? Using global object definitions will raise issues with parallel processing, and therefore any global object to be used in the simulation should either be defined within the code itself or included in the fixed_objects input. Setting this value to TRUE will return a character vector of potentially problematic objects/functions that appear global, the latter of which can be ignored. Careful inspection of this list may prove useful in tracking down object export issues

CI

bootstrap confidence interval level (default is 95%)

seed

a vector or list of integers to be used for reproducibility. The length of the vector must be equal to the number of rows in design. If the input is a vector then set.seed or clusterSetRNGStream will be called for each condition, respectively. If a list is provided then these numbers must have been generated from genSeeds. The list approach ensures random number generation independence across conditions and replications, while the vector input ensures independence within the replications per condition but not necessarily across conditions. Default randomly generates seeds within the range 1 to 2147483647 for each condition via genSeeds

boot_method

method for performing non-parametric bootstrap confidence intervals for the respective meta-statistics computed by the Summarise function. Can be 'basic' for the empirical bootstrap CI, 'percentile' for percentile CIs, 'norm' for normal approximation CIs, or 'studentized' for Studentized CIs (should only be used for simulations with lower replications due to its computational intensity). Alternatively, CIs can be constructed using the argument 'CLT', which computes the intervals according to the large-sample standard error approximation SD(results)/sqrt(R). Default is 'none', which performs no CI computations

boot_draws

number of non-parametric bootstrap draws to sample for the summarise function after the generate-analyse replications are collected. Default is 1000

max_errors

the simulation will terminate when more than this number of consecutive errors are thrown in any given condition, causing the simulation to continue to the next unique design condition. This is included to avoid getting stuck in infinite re-draws, and to indicate that something fatally problematic is going wrong in the generate-analyse phases. Default is 50

resume

logical; if a temporary SimDesign file is detected should the simulation resume from this location? Keeping this TRUE is generally recommended, however this should be disabled if using runSimulation within runSimulation to avoid reading improper save states. Alternatively, if an integer is supplied then the simulation will continue at the associated row location in design (e.g., resume=10). This is useful to overwrite a previously evaluated element in the temporary files that was detected to contain fatal errors that require re-evaluation, without discarding the originally valid rows in the simulation

save_details

a list pertaining to information regarding how and where files should be saved when the save or save_results flags are triggered.

safe

logical; trigger whether safe-saving should be performed. When TRUE files will never be overwritten accidentally, and where appropriate the program will either stop or generate new files with unique names. Default is TRUE

compname

name of the computer running the simulation. Normally this doesn't need to be modified, but in the event that a manual node breaks down while running a simulation the results from the temp files may be resumed on another computer by changing the name of the node to match the broken computer. Default is the result of evaluating unname(Sys.info()['nodename'])

out_rootdir

root directory to save all files to. Default uses the current working directory

save_results_dirname

a string indicating the name of the folder to save result objects to when save_results = TRUE. If a directory/folder does not exist in the current working directory then a unique one will be created automatically. Default is 'SimDesign-results_' with the associated compname appended if no filename is defined, otherwise the filename is used to replace 'SimDesign' in the string

save_results_filename

a string indicating the name of the file to store, where the Design row ID will be appended to ensure uniqueness across rows. Specifying this input will disable any checking for the uniqueness of the file folder, thereby allowing independent runSimulation calls to write to the same save_results_dirname. Useful when the files should all be stored in the same working directory, however the rows of Design are evaluated in isolation (e.g., for HPC structures that allow asynchronous file storage). WARNING: the uniqueness of the file names is not checked using this approach, therefore please ensure that each generated name will be unique a priori, such as naming the file based on the supplied row condition information

save_seeds_dirname

a string indicating the name of the folder to save .Random.seed objects to when save_seeds = TRUE. If a directory/folder does not exist in the current working directory then one will be created automatically. Default is 'SimDesign-seeds_' with the associated compname appended if no filename is defined, otherwise the filename is used to replace 'SimDesign' in the string

tmpfilename

string indicating the temporary file name to save provisional information to. If not specified the following will be used: paste0('SIMDESIGN-TEMPFILE_', compname, '.rds')

control

a list for extra information flags for controlling less commonly used features. These include

stop_on_fatal

logical (default is FALSE); should the simulation be terminated immediately when the maximum number of consecutive errors (max_errors) is reached? If FALSE, the simulation will continue as though errors did not occur, however a column FATAL_TERMINATION will be included in the resulting object indicating the final error message observed, and NA placeholders will be placed in all other row-elements. Default is FALSE, though is automatically set to TRUE when replications < 3 for the purpose of debugging

warnings_as_errors

logical (default is FALSE); treat warning messages as error messages during the simulation? Default is FALSE, therefore warnings are only collected and not used to restart the data generation step, and the seeds associated with the warning message conditions are not stored within the final simulation object.

Note that this argument is generally intended for debugging/early planning stages when designing a simulation experiment. If specific warnings are known to be problematic and should be treated as errors then please use manageWarnings instead

save_seeds

logical; save the.Random.seed states prior to performingeach replication intoplain text files located in the definedsave_seeds_dirname directory/folder?Use this if you would like to keep track of every simulation state within eachreplication and designcondition. This can be used to completely replicate any cell in the simulation if need be.As well, see theload_seed inputto load a given.Random.seed to exactly replicate the generated data andanalysis state (mostly usefulfor debugging). WhenTRUE, temporary files will also be savedto the working directory (in the same way as whensave = TRUE).Additionally, if aprepare function is provided, the RNG state beforeprepare() execution is saved to'design-row-X/prepare-seed'.Default isFALSE

Note, however, that this option is not typically necessary or recommended since the .Random.seed states for simulation replications that throw errors during the execution are automatically stored within the final simulation object, and can be extracted and investigated using SimExtract. Hence, this option is only of interest when all of the replications must be reproducible (which occurs very rarely), otherwise the defaults to runSimulation should be sufficient

store_Random.seeds

logical; store the complete .Random.seed states for each simulation replicate? Default is FALSE as this can take up a great deal of unnecessary RAM (see related save_seeds), however this may be useful when used with runArraySimulation. To extract use SimExtract(..., what = 'stored_Random.seeds'). Additionally, if a prepare function is provided, the RNG state before prepare() execution is stored and can be extracted with SimExtract(..., what = 'prepare_seeds')

store_warning_seeds

logical (default is FALSE); in addition to storing the .Random.seed states whenever error messages are raised, also store the .Random.seed states when warnings are raised? This is disabled by default since warnings are generally less problematic than errors, and because many more warning messages may be raised throughout the simulation (potentially causing RAM-related issues when constructing the final simulation object, as any given simulation replicate could generate numerous warnings and storing the seed states could add up quickly).

Set this to TRUE when replicating warning messages is important, however be aware that too many warning messages raised during the simulation implementation could cause RAM-related issues.

include_replication_index or include_reps

logical (default is FALSE); should a REPLICATION element be added to the condition object when performing the simulation to track which specific replication experiment is being evaluated? This is useful when, for instance, attempting to run external software programs (e.g., Mplus) that require saving temporary data sets to the hard-drive (see the Wiki for examples)
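As a rough sketch of the idea (the file-naming scheme below is hypothetical, and an N column is assumed to exist in the Design object):

Generate <- function(condition, fixed_objects) {
    dat <- data.frame(y = rnorm(condition$N))
    # unique temporary file per design row and replication (hypothetical naming)
    tmp <- paste0('tempdat_row-', condition$ID, '_rep-', condition$REPLICATION, '.dat')
    write.table(dat, tmp, row.names=FALSE)   # e.g., input file for an external program
    dat
}

res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     control=list(include_replication_index=TRUE))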

try_all_analyse

logical; when analyse is a list, should every generated data set be analyzed by each function definition in the analyse list? Default is TRUE.

Note that this TRUE default can be computationally demanding when some analysis functions require more computational resources than others and the data set should be discarded early as an invalid candidate (e.g., estimating a model via maximum-likelihood in one analyse component, while estimating a model using MCMC estimation in another). Hence, the main benefit of using FALSE instead is that the data set may be rejected earlier, in which case the easier/faster-to-estimate analyse definitions should be placed earlier in the list since the functions are evaluated in sequence (e.g., Analyse = list(MLE=MLE_definition, MCMC=MCMC_definition))
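A minimal sketch of this setup (MLE_definition and MCMC_definition are hypothetical analyse functions; the faster MLE version is listed first so that invalid data sets can be rejected before the slower MCMC analysis is attempted):

res <- runSimulation(design=Design, replications=1000,
                     generate=Generate,
                     analyse=list(MLE=MLE_definition, MCMC=MCMC_definition),
                     summarise=Summarise,
                     control=list(try_all_analyse=FALSE))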

allow_na

logical (default is FALSE); should NAs be allowed in the analyse step as a valid result from the simulation analysis?

allow_nan

logical (default is FALSE); should NaNs be allowed in the analyse step as a valid result from the simulation analysis?

type

default type of cluster to create for the cl object if not supplied. For Windows OS this defaults to "PSOCK", otherwise "SOCK" is selected (suitable for Linux and macOS). This is ignored if the user specifies their own cl object

print_RAM

logical (default is TRUE); print the amount of RAM used throughout the simulation? Set to FALSE if unnecessary or if the call to gc is unnecessarily time consuming

global_fun_level

determines how many levels to search until the global environment frame is located. Default is 2, though for runArraySimulation this is set to 3. Use 3 or more whenever runSimulation is used within the context of another function

max_time

Similar to runArraySimulation, specifies the (approximate) maximum time that the simulation is allowed to run. Default sets no time limit. See timeFormater for the input specifications; otherwise, this can be specified as a numeric input reflecting the maximum time in seconds.

Note that when parallel = TRUE the max_time can only be checked on a per-condition basis.
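For example, a minimal sketch using the timeFormater-style specification or, equivalently, a numeric number of seconds (the remaining objects are assumed to be defined as in the Examples below):

# stop (approximately) after 2 days
res <- runSimulation(design=Design, replications=10000,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     control=list(max_time="2-00:00:00"))

# stop (approximately) after one hour, specified in seconds
res <- runSimulation(design=Design, replications=10000,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     control=list(max_time=3600))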

max_RAM

Similar to runArraySimulation, specifies the (approximate) maximum RAM that the simulation is allowed to occupy. However, unlike the implementation in runArraySimulation, this is evaluated on a per-condition basis, where max_RAM is only checked after every row in the design object has been completed (hence, it is notably more approximate and has the potential to overshoot by a wider margin). Default sets no RAM limit. See runArraySimulation for the input specifications.

not_parallel

integer vector indicating which rows in the design object should not be run in parallel. This is useful when a parallel back-end is used but, for some design rows, the overhead associated with parallel processing results in longer execution times than simply running those rows with a single core.
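A minimal sketch, assuming the first two design rows are small enough that serial execution is faster:

res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     control=list(not_parallel=1:2))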

progress

logical; display a progress bar (using the pbapply package) for each simulation condition? In interactive sessions, shows a timer-based progress bar. In non-interactive sessions (e.g., HPC cluster jobs), displays text-based progress updates that are visible in log files. This is useful when simulation conditions take a long time to run (see also the notification argument). Default is TRUE

verbose

logical; print messages to the R console? Default is TRUE when in interactive mode

object

SimDesign object returned from runSimulation

...

additional arguments

x

SimDesign object returned from runSimulation

list2char

logical; for tibble objects, re-evaluate list elements as character vectors for better printing of the levels? Note that this does not change the original classes of the object, just how they are printed. Default is TRUE

Details

For an in-depth tutorial of the package please refer to Chalmers and Adkins (2020; doi:10.20982/tqmp.16.4.p248). For an earlier didactic presentation of the package refer to Sigal and Chalmers (2016; doi:10.1080/10691898.2016.1246953). Finally, see the associated wiki on Github (https://github.com/philchalmers/SimDesign/wiki) for tutorial material, examples, and applications of SimDesign to real-world simulation experiments, as well as the various vignette files associated with the package.

The strategy for organizing the Monte Carlo simulation work-flow is to

1)

Define a suitable Design object (a tibble or data.frame) containing fixed conditional information about the Monte Carlo simulations. Each row of this design object pertains to a unique simulation condition to study, while each column represents a simulation factor under investigation (e.g., sample size, distribution type, etc.). This is often expedited by using the createDesign function and, if necessary, the argument subset can be used to remove redundant or non-applicable rows (see the sketch following this list)

2)

Define the three step functions to generate the data (Generate; see also https://CRAN.R-project.org/view=Distributions for a list of distributions in R), analyse the generated data by computing the respective parameter estimates, detection rates, etc. (Analyse), and finally summarise the results across the total number of replications (Summarise).

3)

Pass the design object and the three defined R functions to runSimulation, and declare the number of replications to perform with the replications input. This function will return a suitable tibble object with the complete simulation results and execution details

4)

Analyze the output from runSimulation, possibly using ANOVA techniques (SimAnova) and generating suitable plots and tables
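To illustrate step 1), the following sketch (with hypothetical simulation factors) builds a fully crossed design and uses the subset argument to drop rows that are not meaningful to study:

Design <- createDesign(sample_size = c(50, 100, 200),
                       distribution = c('norm', 'chi'),
                       df = c(NA, 3, 6),
                       # keep df only for the chi-square conditions (hypothetical rule)
                       subset = (distribution == 'norm' & is.na(df)) |
                                (distribution == 'chi' & !is.na(df)))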

Expressing the above more succinctly, the functions to be called have the following form, with the exact functional arguments listed:

Design <- createDesign(...)
Generate <- function(condition, fixed_objects) {...}
Analyse <- function(condition, dat, fixed_objects) {...}
Summarise <- function(condition, results, fixed_objects) {...}
res <- runSimulation(design=Design, replications, generate=Generate, analyse=Analyse, summarise=Summarise)

The condition object above represents a single row from the design object, indicating a unique Monte Carlo simulation condition. The condition object also contains two additional elements to help track the simulation's state: an ID variable, indicating the respective row number in the design object, and a REPLICATION element indicating the replication iteration number (an integer value between 1 and replications). This setup allows users to easily locate the rth replication (e.g., REPLICATION == 500) within the jth row in the simulation design (e.g., ID == 2). The REPLICATION input is also useful when temporarily saving files to the hard-drive when calling external command line utilities (see examples on the wiki).

For a template-based version of the work-flow, which is often useful when initially defining a simulation, use the SimFunctions function. This function will write a template simulation to one or two files so that modifying the required functions and objects can begin immediately. This means that users can focus on their Monte Carlo simulation details right away rather than worrying about the repetitive administrative code-work required to organize the simulation's execution flow.

Finally, examples, presentation files, and tutorials can be found on the package wiki located athttps://github.com/philchalmers/SimDesign/wiki.

Value

a tibble from the dplyr package (also of class 'SimDesign') with the original design conditions in the left-most columns, simulation results in the middle columns, and additional information in the right-most columns (see below).

The right-most column information for each condition are: REPLICATIONS to indicate the number of Monte Carlo replications, SIM_TIME to indicate how long (in seconds) it took to complete all the Monte Carlo replications for each respective design condition, RAM_USED for the amount of RAM that was in use at the time of completing each simulation condition, COMPLETED to indicate the date on which the given simulation condition completed, SEED for the integer values in the seed argument (if applicable; not relevant when the L'Ecuyer-CMRG method is used), and, if applicable, ERRORS and WARNINGS, which contain counts for the number of error or warning messages that were caught (if no errors/warnings were observed these columns will be omitted). Note that to extract the specific error and warning messages see SimExtract. Finally, if boot_method was a valid input other than 'none' then the final right-most columns will contain the labels BOOT_ followed by the name of the associated meta-statistic defined in summarise() and the bootstrapped confidence interval location for the meta-statistics.
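For instance, these execution details can be inspected directly after a run; a minimal sketch (res is assumed to be a completed runSimulation object):

res$SIM_TIME                       # seconds required per design condition
res$COMPLETED                      # completion dates
SimExtract(res, what = 'errors')   # captured error messages (if any were observed)
SimExtract(res, what = 'warnings') # captured warning messages (if any were observed)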

Saving data, results, seeds, and the simulation state

To conserve RAM, temporary objects (such as data generated across conditions and replications) are discarded; however, these can be saved to the hard-disk by passing the appropriate flags. For longer simulations it is recommended to use the save_results flag to write the analysis results to the hard-drive.

The store_Random.seeds or save_seeds options can be invoked to save R's .Random.seed state to allow for complete reproducibility of each replication within each condition. These individual .Random.seed terms can then be read in with the load_seed input to reproduce the exact simulation state at any given replication. Most often, though, saving the complete list of seeds is unnecessary as problematic seeds are automatically stored in the final simulation object to allow for easier replicability of potentially problematic errors (which incidentally can be extracted using SimExtract(res, 'error_seeds') and passed to the load_seed argument). Finally, providing a vector of seeds is also possible to ensure that each simulation condition is macro reproducible under the single/multi-core method selected.
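As a rough sketch of this debugging workflow (assuming res is a completed simulation that caught at least one error, that the extracted object stores one .Random.seed state per column, and that the error arose in the second design row):

seeds <- SimExtract(res, what = 'error_seeds')
picked <- seeds[, 1]    # first problematic .Random.seed state

# re-run a single replication of the affected design row with the stored seed
runSimulation(design=Design[2, ], replications=1,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              load_seed=picked, debug='analyse')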

Finally, when the Monte Carlo simulation is complete it is recommended to write the results to a hard-drive for safe keeping, particularly with the filename argument provided (for reasons that are more obvious in the parallel computation descriptions below). Using the filename argument supplied is safer than using, for instance, saveRDS directly because files will never accidentally be overwritten, and instead a new file name will be created when a conflict arises; this type of implementation safety is prevalent in many locations in the package to help avoid unrecoverable (yet surprisingly common) mistakes during the process of designing and executing Monte Carlo simulations.

Resuming temporary results

In the event of a computer crash, power outage, etc., if save = TRUE was used (the default) then the original code used to execute runSimulation() need only be re-run to resume the simulation. The saved temp file will be read into the function automatically, and the simulation will continue on the condition where it left off before the simulation state was terminated. If users wish to remove this temporary simulation state entirely so as to start anew then simply pass SimClean(temp = TRUE) in the R console to remove any previously saved temporary objects.

A note on parallel computing

When running simulations in parallel (either with parallel = TRUE or when using the future approach with a plan() other than sequential) R objects defined in the global environment will generally not be visible across nodes. Hence, you may see errors such as Error: object 'something' not found if you try to use an object that is defined in the work space but is not passed to runSimulation. To avoid this type of error, simply pass additional objects to the fixed_objects input (usually it's convenient to supply a named list of these objects). Fortunately, however, custom functions defined in the global environment are exported across nodes automatically. This makes it convenient when writing code because custom functions will always be available across nodes if they are visible in the R work space. As well, note the packages input to declare packages which must be loaded via library() in order to make specific non-standard R functions available across nodes.
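A minimal sketch of both mechanisms (pop_parameters is a hypothetical list used inside the defined Generate() function, and extraDistr is only an example of a package such functions might require):

pop_parameters <- list(mu = 0, sigma = 2)

res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     fixed_objects=pop_parameters,  # made visible on each node
                     packages='extraDistr')         # loaded via library() on each node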

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

SimFunctions, createDesign, Generate, Analyse, Summarise, SimExtract, reSummarise, SimClean, SimAnova, SimResults, SimCollect, Attach, AnalyseIf, SimShiny, manageWarnings, runArraySimulation

Examples

#-------------------------------------------------------------------------------
# Example 1: Sampling distribution of mean

# This example demonstrates some of the simpler uses of SimDesign,
# particularly for classroom settings. The only factor varied in this simulation
# is sample size.

# skeleton functions to be saved and edited
SimFunctions()

#### Step 1 --- Define your conditions under study and create design data.frame

Design <- createDesign(N = c(10, 20, 30))

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

# help(Generate)
Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

# help(Analyse)
Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat)) # mean of the sample data vector
    ret
}

# help(Summarise)
Summarise <- function(condition, results, fixed_objects) {
    # mean and SD summary of the sample means
    ret <- c(mu=mean(results$mean), SE=sd(results$mean))
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design

# run the simulation in testing mode (replications = 2)
Final <- runSimulation(design=Design, replications=2,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final
SimResults(Final)

# reproduce exact simulation
Final_rep <- runSimulation(design=Design, replications=2, seed=Final$SEED,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final_rep
SimResults(Final_rep)

## Not run: 
# run with more standard number of replications
Final <- runSimulation(design=Design, replications=1000,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final
SimResults(Final)

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Extras

# compare SE estimates to the true SEs from the formula sigma/sqrt(N)
5 / sqrt(Design$N)

# To store the results from the analyse function either
#   a) omit a definition of summarise() to return all results,
#   b) use store_results = TRUE (default) to store results internally and later
#      extract with SimResults(), or
#   c) pass save_results = TRUE to runSimulation() and read the results in with SimResults()
#
#   Note that method c) should be adopted for larger simulations, particularly
#   if RAM storage could be an issue and error/warning message information is important.

# a) approach
res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse)
res

# b) approach (store_results = TRUE by default)
res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res
SimResults(res)

# c) approach
Final <- runSimulation(design=Design, replications=100, save_results=TRUE,
                       generate=Generate, analyse=Analyse, summarise=Summarise)

# read-in all conditions (can be memory heavy)
res <- SimResults(Final)
res
head(res[[1]]$results)

# just first condition
SimResults(Final, which=1)

# obtain empirical bootstrapped CIs during an initial run
# the simulation was completed (necessarily requires save_results = TRUE)
res <- runSimulation(design=Design, replications=1000, boot_method = 'basic',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

# alternative bootstrapped CIs that use saved results via reSummarise().
# Default directory saved to:
dirname <- paste0('SimDesign-results_', unname(Sys.info()['nodename']), "/")
res <- reSummarise(summarise=Summarise, dir=dirname, boot_method = 'basic')
res

# remove the saved results from the hard-drive if you no longer want them
SimClean(results = TRUE)

## End(Not run)

#-------------------------------------------------------------------------------
# Example 2: t-test and Welch test when varying sample size, group sizes, and SDs

# skeleton functions to be saved and edited
SimFunctions()

## Not run: 
# in real-world simulations it's often better/easier to save
# these functions directly to your hard-drive with
SimFunctions('my-simulation')

## End(Not run)

#### Step 1 --- Define your conditions under study and create design data.frame

Design <- createDesign(sample_size = c(30, 60, 90, 120),
                       group_size_ratio = c(1, 4, 8),
                       standard_deviation_ratio = c(.5, 1, 2))
Design

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    N <- condition$sample_size      # could use Attach() to make objects available
    grs <- condition$group_size_ratio
    sd <- condition$standard_deviation_ratio
    if(grs < 1){
        N2 <- N / (1/grs + 1)
        N1 <- N - N2
    } else {
        N1 <- N / (grs + 1)
        N2 <- N - N1
    }
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)), DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)$p.value
    independent <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value

    # In this function the p values for the t-tests are returned,
    #  and make sure to name each element, for future reference
    ret <- nc(welch, independent)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    ret <- EDR(results, alpha = .05)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design

# first, test to see if it works
res <- runSimulation(design=Design, replications=2,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

## Not run: 
# complete run with 1000 replications per condition
res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res
View(res)

## save final results to a file upon completion, and play a beep when done
runSimulation(design=Design, replications=1000, parallel=TRUE, filename = 'mysim',
              generate=Generate, analyse=Analyse, summarise=Summarise, beep=TRUE)

## same as above, but send a notification via Pushbullet upon completion
library(RPushbullet) # read-in default JSON file
pushbullet_notifier <- new_PushbulletNotifier(verbose_issues = TRUE)
runSimulation(design=Design, replications=1000, parallel=TRUE, filename = 'mysim',
              generate=Generate, analyse=Analyse, summarise=Summarise,
              notification = 'complete', notifier = pushbullet_notifier)

## Submit as RStudio job (requires job package and active RStudio session)
job::job({
  res <- runSimulation(design=Design, replications=100,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
}, title='t-test simulation')
res  # object res returned to console when completed

## Debug the generate function. See ?browser for help on debugging
##   Type help to see available commands (e.g., n, c, where, ...),
##   ls() to see what has been defined, and type Q to quit the debugger
runSimulation(design=Design, replications=1000,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              parallel=TRUE, debug='generate')

## Alternatively, place a browser() within the desired function line to
##   jump to a specific location
Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    browser()
    ret <- EDR(results, alpha = .05)
    ret
}

## The following debugs the analyse function for the
## second row of the Design input
runSimulation(design=Design, replications=1000,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              parallel=TRUE, debug='analyse-2')

## End(Not run)

####################################
## Not run: 
## Network linked via ssh (two way ssh key-paired connection must be
## possible between master and slave nodes)
##
## Define IP addresses, including primary IP
primary <- '192.168.2.20'
IPs <- list(
    list(host=primary, user='phil', ncore=8),
    list(host='192.168.2.17', user='phil', ncore=8))
spec <- lapply(IPs, function(IP)
                   rep(list(list(host=IP$host, user=IP$user)), IP$ncore))
spec <- unlist(spec, recursive=FALSE)
cl <- parallel::makeCluster(type='PSOCK', master=primary, spec=spec)
res <- runSimulation(design=Design, replications=1000, parallel = TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise, cl=cl)

## Using parallel='future' to allow the future framework to be used instead
library(future) # future structure to be used internally
plan(multisession) # specify different plan (default is sequential)
res <- runSimulation(design=Design, replications=100, parallel='future',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
head(res)

# The progressr package is used for progress reporting with futures. To redefine
#  use progressr::handlers() (see below)
library(progressr)
with_progress(res <- runSimulation(design=Design, replications=100, parallel='future',
                     generate=Generate, analyse=Analyse, summarise=Summarise))
head(res)

# re-define progressr's bar (below requires cli)
handlers(handler_pbcol(
   adjust = 1.0,
   complete = function(s) cli::bg_red(cli::col_black(s)),
   incomplete = function(s) cli::bg_cyan(cli::col_black(s))
))
with_progress(res <- runSimulation(design=Design, replications=100, parallel='future',
                     generate=Generate, analyse=Analyse, summarise=Summarise))

# reset future computing plan when complete (good practice)
plan(sequential)

## End(Not run)

####################################
###### Post-analysis: Analyze the results via functions like lm() or SimAnova(), and create
###### tables (dplyr) or plots (ggplot2) to help visualize the results.
###### This is where you get to be a data analyst!

library(dplyr)
res %>% summarise(mean(welch), mean(independent))
res %>% group_by(standard_deviation_ratio, group_size_ratio) %>%
   summarise(mean(welch), mean(independent))

# quick ANOVA analysis method with all two-way interactions
SimAnova( ~ (sample_size + group_size_ratio + standard_deviation_ratio)^2, res,
  rates = TRUE)

# or more specific ANOVAs
SimAnova(independent ~ (group_size_ratio + standard_deviation_ratio)^2,
    res, rates = TRUE)

# make some plots
library(ggplot2)
library(tidyr)
dd <- res %>%
   select(group_size_ratio, standard_deviation_ratio, welch, independent) %>%
   pivot_longer(cols=c('welch', 'independent'), names_to = 'stats')
dd

ggplot(dd, aes(factor(group_size_ratio), value)) + geom_boxplot() +
    geom_abline(intercept=0.05, slope=0, col = 'red') +
    geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
    geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
    facet_wrap(~stats)

ggplot(dd, aes(factor(group_size_ratio), value, fill = factor(standard_deviation_ratio))) +
    geom_boxplot() + geom_abline(intercept=0.05, slope=0, col = 'red') +
    geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
    geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
    facet_grid(stats~standard_deviation_ratio) +
    theme(legend.position = 'none')

#-------------------------------------------------------------------------------
# Example with prepare() function - Loading pre-computed objects

## Not run: 
# Step 1: Pre-generate expensive objects offline (can be parallelized)
Design <- createDesign(N = c(10, 20), rho = c(0.3, 0.7))

# Create directory for storing prepared objects
dir.create('prepared_objects', showWarnings = FALSE)

# Generate and save correlation matrices for each condition
for(i in 1:nrow(Design)) {
  cond <- Design[i, ]

  # Generate correlation matrix
  corr_matrix <- matrix(cond$rho, nrow=cond$N, ncol=cond$N)
  diag(corr_matrix) <- 1

  # Create filename based on design parameters
  fname <- paste0('prepared_objects/N', cond$N, '_rho', cond$rho, '.rds')

  # Save to disk
  saveRDS(corr_matrix, file = fname)
}

# Step 2: Use prepare() to load these objects during simulation
prepare <- function(condition, fixed_objects) {
  # Create matching filename
  fname <- paste0('prepared_objects/N', condition$N,
                  '_rho', condition$rho, '.rds')

  # Load the pre-computed correlation matrix
  fixed_objects$corr_matrix <- readRDS(fname)
  return(fixed_objects)
}

generate <- function(condition, fixed_objects) {
  # Use the loaded correlation matrix to generate multivariate data
  dat <- MASS::mvrnorm(n = 50,
                       mu = rep(0, condition$N),
                       Sigma = fixed_objects$corr_matrix)
  data.frame(dat)
}

analyse <- function(condition, dat, fixed_objects) {
  # Calculate mean correlation in generated data
  obs_corr <- cor(dat)
  c(mean_corr = mean(obs_corr[lower.tri(obs_corr)]))
}

summarise <- function(condition, results, fixed_objects) {
  c(mean_corr = mean(results$mean_corr))
}

# Run simulation - prepare() loads objects efficiently
results <- runSimulation(design = Design,
                         replications = 2,
                         prepare = prepare,
                         generate = generate,
                         analyse = analyse,
                         summarise = summarise,
                         verbose = FALSE)
results

# Cleanup
SimClean(dirs='prepared_objects')

## End(Not run)

Format time string to suitable numeric output

Description

Format a time input string into a suitable numeric output metric (e.g., seconds). Input follows the SBATCH utility specifications. Accepted time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

Usage

timeFormater(time, output = "sec")

Arguments

time

a character string to be formatted. If a numeric vector is supplied then this will be interpreted as minutes due to character coercion.

output

type of numeric output to convert time into. Currently supported are 'sec' for seconds (default), 'min' for minutes, 'hour', and 'day'

Details

For example, time = "60" indicates a maximum time of 60 minutes, time = "03:00:00" a maximum time of 3 hours, time = "4-12" a maximum of 4 days and 12 hours, and time = "2-02:30:00" a maximum of 2 days, 2 hours and 30 minutes.

Examples

# Test cases (outputs in seconds)
timeFormater("4-12")        # day-hours
timeFormater("4-12:15")     # day-hours:minutes
timeFormater("4-12:15:30")  # day-hours:minutes:seconds

timeFormater("30")          # minutes
timeFormater("30:30")       # minutes:seconds
timeFormater("4:30:30")     # hours:minutes:seconds

# output in hours
timeFormater("4-12", output = 'hour')
timeFormater("4-12:15", output = 'hour')
timeFormater("4-12:15:30", output = 'hour')

timeFormater("30", output = 'hour')
timeFormater("30:30", output = 'hour')
timeFormater("4:30:30", output = 'hour')

# numeric input is understood as minutes
timeFormater(42)               # seconds
timeFormater(42, output='min') # minutes
