| Title: | Structure for Organizing Monte Carlo Simulation Designs |
| Version: | 2.22 |
| Description: | Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>. |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 4.1.0) |
| Imports: | methods, testthat, parallel, parallelly, dplyr, sessioninfo, beepr, pbapply (≥ 1.3-0), future, future.apply, progressr, R.utils, codetools, clipr, stats |
| Suggests: | snow, knitr, ggplot2, tidyr, purrr, shiny, copula, extraDistr, renv, cli, job, future.batchtools, FrF2, rmarkdown, RPushbullet, httr |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| ByteCompile: | yes |
| LazyData: | true |
| URL: | http://philchalmers.github.io/SimDesign/, https://github.com/philchalmers/SimDesign/wiki |
| RoxygenNote: | 7.3.3 |
| Encoding: | UTF-8 |
| NeedsCompilation: | no |
| Packaged: | 2025-12-16 18:29:39 UTC; phil |
| Author: | Phil Chalmers |
| Maintainer: | Phil Chalmers <rphilip.chalmers@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-17 06:40:53 UTC |
Structure for Organizing Monte Carlo Simulation Designs
Description
Structure for Organizing Monte Carlo Simulation Designs
Details
Provides tools to help organize Monte Carlo simulations in R. The package controls the structure and back-end of Monte Carlo simulations by utilizing a general generate-analyse-summarise strategy. The functions provided control common simulation issues such as re-simulating non-convergent results, support parallel back-end computations with proper random number generation within each simulation condition, save and restore temporary files, aggregate results across independent nodes, and provide native support for debugging. The primary function for organizing the simulations is runSimulation, while for array jobs submitted to HPC clusters (e.g., SLURM) see runArraySimulation and the associated package vignettes.
For an in-depth tutorial of the package please refer to Chalmers and Adkins (2020; doi:10.20982/tqmp.16.4.p248). For an earlier didactic presentation of the package users can refer to Sigal and Chalmers (2016; doi:10.1080/10691898.2016.1246953). Finally, see the associated wiki on GitHub (https://github.com/philchalmers/SimDesign/wiki) for other tutorial material, examples, and applications of SimDesign to real-world simulations.
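As a quick orientation, the following is a minimal sketch of the generate-analyse-summarise workflow described above, following the conventions used in the Examples throughout this manual (the design factor and analysis shown here are arbitrary illustrations):

library(SimDesign)

Design <- createDesign(N = c(30, 60, 120))

Generate <- function(condition, fixed_objects) {
    dat <- rnorm(condition$N)             # data for one replication of this condition
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(p = t.test(dat)$p.value)     # named statistic(s) of interest
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha = .05)      # empirical detection rate across replications
    ret
}

res <- runSimulation(design = Design, replications = 100, generate = Generate,
                     analyse = Analyse, summarise = Summarise)
res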
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Compute estimates and statistics
Description
Compute all relevant test statistics, parameter estimates, detection rates, and so on. This is the computational heavy-lifting portion of the Monte Carlo simulation. Users may define a single analysis function to perform all the analyses in the same function environment, or may pass a list of named functions to runSimulation to allow for a more modularized approach in which the analyses are performed in independent blocks (but share the same generated data). Note that if a suitable Generate function was not supplied then this function can be used to both generate and analyse the Monte Carlo data (though in general this setup is not recommended for larger simulations).
Usage
Analyse(condition, dat, fixed_objects)
Arguments
condition | a single row from the design input (as a data.frame) |
dat | the data object returned from the Generate function to be analysed |
fixed_objects | object passed down from runSimulation |
Details
In some cases, it may be easier to change the output to a named list containing different parameter configurations (e.g., when determining RMSE values for a large set of population parameters).
The use of try functions is generally not required in this function because Analyse is internally wrapped in a try call. Therefore, if a function stops early then this will cause the function to halt internally, the message which triggered the stop will be recorded, and Generate will be called again to obtain a different dataset. That said, it may be useful for users to throw their own stop commands if the data should be re-drawn for other reasons (e.g., an estimated model terminated correctly but the maximum number of iterations was reached).
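As an illustration of this last point, here is a hypothetical sketch of raising a custom stop() inside an analysis function so that the data are re-drawn whenever a user-defined criterion is not met (the criterion below is only a stand-in; the variable names match the Examples at the end of this section):

analyse <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)
    # stand-in criterion: force a re-draw when the statistic is unusable
    if(!is.finite(welch$p.value))
        stop('p-value could not be computed; re-drawing data')
    c(p = welch$p.value)
}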
Value
returns a named numeric vector or data.frame with the values of interest (e.g., p-values, effect sizes, etc.), or a list containing values of interest (e.g., separate matrix and vector of parameter estimates corresponding to elements in parameters). If a data.frame is returned with more than 1 row then these objects will be wrapped into suitable list objects.
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: 
analyse <- function(condition, dat, fixed_objects) {

    # require packages/define functions if needed, or better yet index with the :: operator
    require(stats)
    mygreatfunction <- function(x) print('Do some stuff')

    # wrap computational statistics in try() statements to control estimation problems
    welch <- t.test(DV ~ group, dat)
    ind <- stats::t.test(DV ~ group, dat, var.equal=TRUE)

    # In this function the p values for the t-tests are returned,
    # and make sure to name each element, for future reference
    ret <- c(welch = welch$p.value, independent = ind$p.value)
    return(ret)
}

# A more modularized example approach

analysis_welch <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)
    ret <- c(p=welch$p.value)
    ret
}

analysis_ind <- function(condition, dat, fixed_objects) {
    ind <- t.test(DV ~ group, dat, var.equal=TRUE)
    ret <- c(p=ind$p.value)
    ret
}

# pass functions as a named list
# runSimulation(..., analyse=list(welch=analysis_welch, independent=analysis_ind))

## End(Not run)

Perform a test that indicates whether a given Analyse() function should be executed
Description
This function is designed to prevent specific analysis function executions when the design conditions are not met. Primarily useful when the analyse argument to runSimulation was input as a named list object but some of the analysis functions are not interesting/compatible with the generated data and should therefore be skipped.
Usage
AnalyseIf(x, condition = NULL)
Arguments
x | logical statement to evaluate. If the statement evaluates to |
condition | (optional) the current design condition. This does not need to be suppliedif the expression in |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: 
Design <- createDesign(N=c(10,20,30), var.equal = c(TRUE, FALSE))

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    dat <- data.frame(DV = rnorm(N*2), IV = gl(2, N, labels=c('G1', 'G2')))
    dat
}

# always run this analysis for each row in Design
Analyse1 <- function(condition, dat, fixed_objects) {
    mod <- t.test(DV ~ IV, data=dat)
    mod$p.value
}

# Only perform analysis when variances are equal and N = 20 or 30
Analyse2 <- function(condition, dat, fixed_objects) {
    AnalyseIf(var.equal && N %in% c(20, 30), condition)
    mod <- t.test(DV ~ IV, data=dat, var.equal=TRUE)
    mod$p.value
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha=.05)
    ret
}

#-------------------------------------------------------------------

# append names 'Welch' and 'independent' to associated output
res <- runSimulation(design=Design, replications=100, generate=Generate,
                     analyse=list(Welch=Analyse1, independent=Analyse2),
                     summarise=Summarise)
res

# leave results unnamed
res <- runSimulation(design=Design, replications=100, generate=Generate,
                     analyse=list(Analyse1, Analyse2),
                     summarise=Summarise)

## End(Not run)

Attach objects for easier reference
Description
The behaviour of this function is very similar to attach, however it is environment specific, and therefore only remains defined in a given function rather than in the Global Environment. Hence, this function is much safer to use than attach, which incidentally should never be used in your code. This is useful primarily as a convenience function when you prefer to call the variable names in condition directly rather than indexing with condition$sample_size or with(condition, sample_size), for example.
Usage
Attach(..., omit = NULL, check = TRUE, attach_listone = TRUE, RStudio_flags = FALSE, clip = interactive())
Arguments
... | a comma separated list of |
omit | an optional character vector containing the names of objects that should notbe attached to the current environment. For instance, if the objects named 'a' and 'b' shouldnot be attached then use |
check | logical; check to see if the function will accidentally replace previously definedvariables with the same names as in |
attach_listone | logical; if the element to be assigned is a list of length one then assign the first element of this list with the associated name. This generally avoids adding an often unnecessary list 1 index, such as |
RStudio_flags | logical; print R script output comments that disable flaggedmissing variables in RStudio? Requires the form |
clip | when |
Details
Note that if you are using RStudio with the "Warn if variable used has no definition in scope" diagnostic flag then using Attach() will raise suspicions. To suppress such issues, you can either disable such flags (the atomic solution) or evaluate the following output in the R console and place the output in your working simulation file.
Attach(Design, RStudio_flags = TRUE)
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
Design <- createDesign(N1=c(10,20),
                       N2=c(10,20),
                       sd=c(1,2))
Design

# does not use Attach()
Generate <- function(condition, fixed_objects) {
    # condition = single row of Design input (e.g., condition <- Design[1,])
    N1 <- condition$N1
    N2 <- condition$N2
    sd <- condition$sd

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

# similar to above, but using the Attach() function instead of indexing
Generate <- function(condition, fixed_objects) {
    Attach(condition) # N1, N2, and sd are now 'attached' and visible

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

#####################
# NOTE: if you're using RStudio with code diagnostics on then evaluate + add the
# following output to your source file to manually support the flagged variables

Attach(Design, RStudio_flags=TRUE)

# Below is the same example, however with false positive missing variables suppressed
# when # !diagnostics ... is added to the source file(s)

# !diagnostics suppress=N1,N2,sd
Generate <- function(condition, fixed_objects) {
    Attach(condition) # N1, N2, and sd are now 'attached' and visible

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

Example simulation from Brown and Forsythe (1974)
Description
Example results from the Brown and Forsythe (1974) article on robust estimators for variance ratio tests. Statistical tests are organized by columns and the unique design conditions are organized by rows. See BF_sim_alternative for an alternative form of the same simulation. Code for this simulation is available on the wiki (https://github.com/philchalmers/SimDesign/wiki).
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
## Not run: 
data(BF_sim)
head(BF_sim)

# Type I errors
subset(BF_sim, var_ratio == 1)

## End(Not run)

(Alternative) Example simulation from Brown and Forsythe (1974)
Description
Example results from the Brown and Forsythe (1974) article on robust estimators for variance ratio tests. Statistical tests and distributions are organized by columns and the unique design conditions are organized by rows. See BF_sim for an alternative form of the same simulation where distributions are also included in the rows. Code for this simulation is available on the wiki (https://github.com/philchalmers/SimDesign/wiki).
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
## Not run: 
data(BF_sim_alternative)
head(BF_sim_alternative)

# Type I errors
subset(BF_sim_alternative, var_ratio == 1)

## End(Not run)

Bradley's (1978) empirical robustness interval
Description
Robustness interval criteria for empirical detection rate estimates and empirical coverage estimates defined by Bradley (1978). See EDR and ECR to obtain such estimates.
Usage
Bradley1978(rate, alpha = 0.05, type = "liberal", CI = FALSE, out.logical = FALSE, out.labels = c("conservative", "robust", "liberal"), unname = FALSE)
Arguments
rate | (optional) numeric vector containing the empirical detectionrate(s) or empirical confidence interval estimates.If supplied a character vector with elements defined in When the input is an empirical coverage rate the argument If this input is missing, the interval criteria will be printed to the console |
alpha | Type I error rate to evaluated (default is .05) |
type | character vector indicating the type of interval classification to use.Default is 'liberal', however can be 'stringent' to use Bradley's morestringent robustness criteria |
CI | logical; should this robust interval be constructed on empirical detectionrates ( |
out.logical | logical; should the output vector be TRUE/FALSE indicating whetherthe supplied empirical detection rate/CI should be considered "robust"? Default isFALSE, in which case the out.labels elements are used instead |
out.labels | character vector of length three indicating the classificationlabels according to the desired robustness interval |
unname | logical; apply |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
# interval criteria used for empirical detection rates
Bradley1978()
Bradley1978(type = 'stringent')
Bradley1978(alpha = .01, type = 'stringent')

# intervals applied to empirical detection rate estimates
edr <- c(test1 = .05, test2 = .027, test3 = .051, test4 = .076, test5 = .024)
Bradley1978(edr)
Bradley1978(edr, out.logical=TRUE) # is robust?

#####
# interval criteria used for coverage estimates
Bradley1978(CI = TRUE)
Bradley1978(CI = TRUE, type = 'stringent')
Bradley1978(CI = TRUE, alpha = .01, type = 'stringent')

# intervals applied to empirical coverage rate estimates
ecr <- c(test1 = .950, test2 = .973, test3 = .949, test4 = .924, test5 = .976)
Bradley1978(ecr, CI=TRUE)
Bradley1978(ecr, CI=TRUE, out.logical=TRUE) # is robust?

Compute congruence coefficient
Description
Computes the congruence coefficient, also known as an "unadjusted" correlation or Tucker's congruence coefficient.
Usage
CC(x, y = NULL, unname = FALSE)
Arguments
x | a vector or |
y | (optional) the second vector input to use if |
unname | logical; apply |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
vec1 <- runif(1000)
vec2 <- runif(1000)
CC(vec1, vec2)

# compare to cor()
cor(vec1, vec2)

# column input
df <- data.frame(vec1, vec2, vec3 = runif(1000))
CC(df)
cor(df)

Compute empirical coverage rates
Description
Computes the detection rate for determining empirical coverage rates given a set of estimated confidence intervals. Note that using 1 - ECR(CIs, parameter) will provide the empirical detection rate. Also supports computing the average width of the CIs, which may be useful when comparing the efficiency of CI estimators.
Usage
ECR(CIs, parameter, tails = FALSE, CI_width = FALSE, complement = FALSE, names = NULL, unname = FALSE)
Arguments
CIs | a |
parameter | a numeric scalar indicating the fixed parameter value. Alternative, a |
tails | logical; when TRUE returns a vector of length 2 to indicate the proportion of timesthe parameter was lower or higher than the supplied interval, respectively. This is mainly onlyuseful when the coverage region is not expected to be symmetric, and therefore is generally notrequired. Note that |
CI_width | logical; rather than returning the overall coverage rate, return theaverage width of the CIs instead? Useful when comparing the efficiency of different CIestimators |
complement | logical; rather than computing the proportion of population parameters within the CI, return the proportion outside the advertised CI (1 - ECR = alpha). In the case where only one value is provided, which normally would return a 0 if outside the CI or 1 if inside, the values will be switched (useful when using, for example, CI tests for the significance of parameters) |
names | an optional character vector used to name the returned object. Generally usefulwhen more than one CI estimate is investigated at once |
unname | logical; apply |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
CIs <- matrix(NA, 100, 2)
for(i in 1:100){
    dat <- rnorm(100)
    CIs[i,] <- t.test(dat)$conf.int
}

ECR(CIs, 0)
ECR(CIs, 0, tails = TRUE)
ECR(CIs, 0, complement = TRUE) # proportion outside interval

# single vector input
CI <- c(-1, 1)
ECR(CI, 0)
ECR(CI, 0, complement = TRUE)
ECR(CI, 2)
ECR(CI, 2, complement = TRUE)
ECR(CI, 2, tails = TRUE)

# parameters of the same size as CI
parameters <- 1:10
CIs <- cbind(parameters - runif(10), parameters + runif(10))
parameters <- parameters + rnorm(10)
ECR(CIs, parameters)

# average width of CIs
ECR(CIs, parameters, CI_width=TRUE)

# ECR() for multiple CI estimates in the same object
parameter <- 10
CIs <- data.frame(lowerCI_1=parameter - runif(10),
                  upperCI_1=parameter + runif(10),
                  lowerCI_2=parameter - 2*runif(10),
                  upperCI_2=parameter + 2*runif(10))
head(CIs)
ECR(CIs, parameter)
ECR(CIs, parameter, tails=TRUE)
ECR(CIs, parameter, CI_width=TRUE)

# often a good idea to provide names for the output
ECR(CIs, parameter, names = c('this', 'that'))
ECR(CIs, parameter, CI_width=TRUE, names = c('this', 'that'))
ECR(CIs, parameter, tails=TRUE, names = c('this', 'that'))

Compute the empirical detection/rejection rate for Type I errors and Power
Description
Computes the detection/rejection rate for determining empirical Type I error and power rates using information from p-values.
Usage
EDR(p, alpha = 0.05, unname = FALSE)
Arguments
p | a |
alpha | the detection threshold (typical values are .10, .05, and .01).Default is .05 |
unname | logical; apply |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
rates <- numeric(100)
for(i in 1:100){
    dat <- rnorm(100)
    rates[i] <- t.test(dat)$p.value
}

EDR(rates)
EDR(rates, alpha = .01)

# multiple rates at once
rates <- cbind(runif(1000), runif(1000))
EDR(rates)

Generate data
Description
Generate data from a single row in the design input (see runSimulation). R contains numerous approaches to generate data, some of which are contained in the base package, as well as in SimDesign (e.g., rmgh, rValeMaurelli, rHeadrick). However, the majority can be found in external packages. See CRAN's list of possible distributions here: https://CRAN.R-project.org/view=Distributions. Note that this function technically can be omitted if the data generation is provided in the Analyse step, though in general this is not recommended for larger simulations.
Usage
Generate(condition, fixed_objects)
Arguments
condition | a single row from the design input (as a data.frame) |
fixed_objects | object passed down from runSimulation |
Details
The use of try functions is generally not required in this function because Generate is internally wrapped in a try call. Therefore, if a function stops early then this will cause the function to halt internally, the message which triggered the stop will be recorded, and Generate will be called again to obtain a different dataset. That said, it may be useful for users to throw their own stop commands if the data should be re-drawn for other reasons (e.g., an estimated model terminated correctly but the maximum number of iterations was reached).
Value
returns a single object containing the data to be analyzed (usually a vector, matrix, or data.frame), or a list
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
addMissing, Attach, rmgh, rValeMaurelli, rHeadrick
Examples
## Not run: 
generate <- function(condition, fixed_objects) {
    N1 <- condition$sample_sizes_group1
    N2 <- condition$sample_sizes_group2
    sd <- condition$standard_deviations

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    # just a silly example of a simulated parameter
    pars <- list(random_number = rnorm(1))

    list(dat=dat, parameters=pars)
}

# similar to above, but using the Attach() function instead of indexing
generate <- function(condition, fixed_objects) {
    Attach(condition)
    N1 <- sample_sizes_group1
    N2 <- sample_sizes_group2
    sd <- standard_deviations

    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

generate2 <- function(condition, fixed_objects) {
    mu <- sample(c(-1,0,1), 1)
    dat <- rnorm(100, mu)
    dat    # return simple vector (discard mu information)
}

generate3 <- function(condition, fixed_objects) {
    mu <- sample(c(-1,0,1), 1)
    dat <- data.frame(DV = rnorm(100, mu))
    dat
}

## End(Not run)

Perform a test that indicates whether a given Generate() function should be executed
Description
This function is designed to prevent specific generate function executions when the design conditions are not met. Primarily useful when the generate argument to runSimulation was input as a named list object but a given generation function should only be applied for some specific design condition (otherwise, the data generation moves to the next function in the list).
Usage
GenerateIf(x, condition = NULL)
Arguments
x | logical statement to evaluate. If the statement evaluates to |
condition | (optional) the current design condition. This does not need to be suppliedif the expression in |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: 
# SimFunctions(nGenerate = 2)

Design <- createDesign(N=c(10,20,30), var.equal = c(TRUE, FALSE))

Generate.G1 <- function(condition, fixed_objects) {
    GenerateIf(condition$var.equal == FALSE) # only run when unequal vars
    Attach(condition)
    dat <- data.frame(DV = c(rnorm(N), rnorm(N, sd=2)),
                      IV = gl(2, N, labels=c('G1', 'G2')))
    dat
}

Generate.G2 <- function(condition, fixed_objects) {
    Attach(condition)
    dat <- data.frame(DV = rnorm(N*2), IV = gl(2, N, labels=c('G1', 'G2')))
    dat
}

# always run this analysis for each row in Design
Analyse <- function(condition, dat, fixed_objects) {
    mod <- t.test(DV ~ IV, data=dat)
    mod$p.value
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha=.05)
    ret
}

#-------------------------------------------------------------------

# append names 'Welch' and 'independent' to associated output
res <- runSimulation(design=Design, replications=1000,
                     generate=list(G1=Generate.G1, G2=Generate.G2),
                     analyse=Analyse, summarise=Summarise)
res

## End(Not run)

Compute the integrated root mean-square error
Description
Computes the average/cumulative deviation given two continuous functions and an optional function representing the probability density function. Only one-dimensional integration is supported.
Usage
IRMSE(estimate, parameter, fn, density = function(theta, ...) 1, lower = -Inf, upper = Inf, ...)
Arguments
estimate | a vector of parameter estimates |
parameter | a vector of population parameters |
fn | a continuous function where the first argument is to be integrated and the second argument is a vector of parameters or parameter estimates. This function represents an implied continuous function which uses the sample estimates or population parameters |
density | (optional) a density function used to marginalize (i.e., average), where the firstargument is to be integrated, and must be of the form |
lower | lower bound to begin numerical integration from |
upper | upper bound to finish numerical integration to |
... | additional parameters to pass to |
Details
The integrated root mean-square error (IRMSE) is of the form
IRMSE(\theta) = \sqrt{\int [f(\theta, \hat{\psi}) - f(\theta, \psi)]^2 g(\theta, ...)\, d\theta}
where g(\theta, ...) is the density function used to marginalize the continuous sample (f(\theta, \hat{\psi})) and population (f(\theta, \psi)) functions.
Value
returns a single numeric term indicating the average/cumulative deviation given the supplied continuous functions
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
Examples
# logistic regression function with one slope and intercept
fn <- function(theta, param) 1 / (1 + exp(-(param[1] + param[2] * theta)))

# sample and population sets
est <- c(-0.4951, 1.1253)
pop <- c(-0.5, 1)

theta <- seq(-10,10,length.out=1000)
plot(theta, fn(theta, pop), type = 'l', col='red', ylim = c(0,1))
lines(theta, fn(theta, est), col='blue', lty=2)

# cumulative result (i.e., standard integral)
IRMSE(est, pop, fn)

# integrated RMSE result by marginalizing over a N(0,1) distribution
den <- function(theta, mean, sd) dnorm(theta, mean=mean, sd=sd)
IRMSE(est, pop, fn, den, mean=0, sd=1)

# this specification is equivalent to the above
den2 <- function(theta, ...) dnorm(theta, ...)
IRMSE(est, pop, fn, den2, mean=0, sd=1)

Compute the mean absolute error
Description
Computes the average absolute deviation of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form.
Usage
MAE(estimate, parameter = NULL, type = "MAE", percent = FALSE, unname = FALSE)
Arguments
estimate | a |
parameter | a |
type | type of deviation to compute. Can be |
percent | logical; change returned result to percentage by multiplying by 100?Default is FALSE |
unname | logical; apply |
Value
returns a numeric vector indicating the overall mean absolute error in the estimates
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
RMSE
Examples
pop <- 1
samp <- rnorm(100, 1, sd = 0.5)
MAE(samp, pop)

dev <- samp - pop
MAE(dev)
MAE(samp, pop, type = 'NMAE')
MAE(samp, pop, type = 'SMAE')

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
MAE(mat, parameter = 2)

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
MAE(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
MAE(estimates, parameters)

Compute the relative performance behavior of collections of standard errors
Description
The mean-square relative standard error (MSRSE) compares standard error estimates to the standard deviation of the respective parameter estimates. Values close to 1 indicate that the behavior of the standard errors closely matched the sampling variability of the parameter estimates.
Usage
MSRSE(SE, SD, percent = FALSE, unname = FALSE)
Arguments
SE | a |
SD | a |
percent | logical; change returned result to percentage by multiplying by 100?Default is FALSE |
unname | logical; apply |
Details
Mean-square relative standard error (MSRSE) is expressed as
MSRSE = \frac{E[SE(\psi)^2]}{SD(\psi)^2} = \frac{\frac{1}{R} \sum_{r=1}^R SE(\psi_r)^2}{SD(\psi)^2}
where SE(\psi_r) represents the estimate of the standard error at the rth simulation replication, and SD(\psi) represents the standard deviation estimate of the parameters across all R replications. Note that SD(\psi)^2 is used, which corresponds to the variance of \psi.
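As a more direct illustration than the full simulation shown in the Examples below, here is a minimal sketch applying MSRSE() to matrices of advertised standard errors and their associated parameter estimates (the SE/SD inputs mirror the usage in the Examples; the particular numbers are arbitrary):

R <- 5000
# parameter estimates for two quantities across R replications
ests <- cbind(est1 = rnorm(R, sd = 0.10), est2 = rnorm(R, sd = 0.20))
# advertised standard errors that track the true sampling variability
SEs <- cbind(SE1 = 0.10 + rnorm(R, sd = 0.005), SE2 = 0.20 + rnorm(R, sd = 0.005))
MSRSE(SE = SEs, SD = ests)   # both values should be close to 1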
Value
returns a vector of ratios indicating the relative performance of the standard error estimates to the observed parameter standard deviation. Values less than 1 indicate that the standard errors were larger than the standard deviation of the parameters (hence, the SEs are interpreted as more conservative), while values greater than 1 indicate that the standard errors were smaller than the standard deviation of the parameters (i.e., more liberal SEs)
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
Generate <- function(condition, fixed_objects) {
    X <- rep(0:1, each = 50)
    y <- 10 + 5 * X + rnorm(100, 0, .2)
    data.frame(y, X)
}

Analyse <- function(condition, dat, fixed_objects) {
    mod <- lm(y ~ X, dat)
    so <- summary(mod)
    ret <- c(SE = so$coefficients[,"Std. Error"],
             est = so$coefficients[,"Estimate"])
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    MSRSE(SE = results[,1:2], SD = results[,3:4])
}

results <- runSimulation(replications=500, generate=Generate,
                         analyse=Analyse, summarise=Summarise)
results

Probabilistic Bisection Algorithm
Description
The function PBA searches a specified interval for a root (i.e., zero) of the function f(x) with respect to its first argument. However, this function differs from deterministic cousins such as uniroot in that f may contain stochastic error components, and instead provides a Bayesian interval where the root is likely to lie. Note that it is assumed that E[f(x)] is non-decreasing in x and that the root lies within the supplied search interval (evaluated approximately when check.interval=TRUE). See Waeber, Frazier, and Henderson (2013) for details.
Usage
PBA(f.root, interval, ..., p = 0.6, integer = FALSE, tol = if (integer) 0.01 else 1e-04, maxiter = 300L, miniter = 100L, wait.time = NULL, f.prior = NULL, resolution = 10000L, check.interval = TRUE, check.interval.only = FALSE, verbose = interactive())

## S3 method for class 'PBA'
print(x, ...)

## S3 method for class 'PBA'
plot(x, type = "posterior", main = "Probabilistic Bisection Posterior", ...)

Arguments
f.root | noisy function for which the root is sought |
interval | a vector containing the end-points of the intervalto be searched for the root of the form Note that if the interval is specified as |
... | additional named arguments to be passed to |
p | assumed constant for probability of correct responses (must be > 0.5) |
integer | logical; should the values of the root be considered integer or numeric? The former uses a discrete grid to track the updates, while the latter currently creates a grid with |
tol | tolerance criteria for convergence based on average of the |
maxiter | the maximum number of iterations (default 300) |
miniter | minimum number of iterations (default 100) |
wait.time | (optional) instead of terminating after specific estimate criteriaare satisfied (e.g., |
f.prior | density function indicating the likely location of the prior(e.g., if root is within [0,1] then |
resolution | constant indicating thenumber of equally spaced grid points to track when |
check.interval | logical; should an initial check be made to determinewhether |
check.interval.only | logical; return only TRUE or FALSE to testwhether there is a likely root given |
verbose | logical; should the iterations and estimate be printed to theconsole? |
x | an object of class |
type | type of plot to draw for PBA object. Can be either 'posterior' or 'history' to plot the PBA posterior distribution or the median iteration history |
main | plot title |
References
Horstein, M. (1963). Sequential transmission using noiseless feedback. IEEE Trans. Inform. Theory, 9(3):136-143.
Waeber, R., Frazier, P. I. & Henderson, S. G. (2013). Bisection Search with Noisy Responses. SIAM Journal on Control and Optimization, Society for Industrial & Applied Mathematics (SIAM), 51, 2261-2279.
See Also
Examples
# find x that solves f(x) - b = 0 for the following
f.root <- function(x, b = .6) 1 / (1 + exp(-x)) - b
f.root(.3)

xs <- seq(-3,3, length.out=1000)
plot(xs, f.root(xs), type = 'l', ylab = "f(x)", xlab='x', las=1)
abline(h=0, col='red')

retuni <- uniroot(f.root, c(0,1))
retuni
abline(v=retuni$root, col='blue', lty=2)

# PBA without noisy root
retpba <- PBA(f.root, c(0,1))
retpba
retpba$root
plot(retpba)
plot(retpba, type = 'history')

# Same problem, however root function is now noisy. Hence, need to solve
# fhat(x) - b + e = 0, where E(e) = 0
f.root_noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd=.02)
sapply(rep(.3, 10), f.root_noisy)

# uniroot "converges" unreliably
set.seed(123)
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root

# probabilistic bisection provides better convergence
retpba.noise <- PBA(f.root_noisy, c(0,1))
retpba.noise
plot(retpba.noise)
plot(retpba.noise, type = 'history')

## Not run: 
# ignore termination criteria and instead run for 30 seconds or 50000 iterations
retpba.noise_30sec <- PBA(f.root_noisy, c(0,1), wait.time = "0:30", maxiter=50000)
retpba.noise_30sec

## End(Not run)

Compute the relative absolute bias of multiple estimators
Description
Computes the relative absolute bias given the bias estimates for multiple estimators.
Usage
RAB(x, percent = FALSE, unname = FALSE)
Arguments
x | a |
percent | logical; change returned result to percentage by multiplying by 100?Default is FALSE |
unname | logical; apply |
Value
returns a vector of absolute bias ratios indicating the relative bias effects compared to the first estimator. Values less than 1 indicate better bias estimates than the first estimator, while values greater than 1 indicate worse bias than the first estimator
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
pop <- 1
samp1 <- rnorm(5000, 1)
bias1 <- bias(samp1, pop)
samp2 <- rnorm(5000, 1)
bias2 <- bias(samp2, pop)

RAB(c(bias1, bias2))
RAB(c(bias1, bias2), percent = TRUE) # as a percentage

Compute the relative difference
Description
Computes the relative difference statistic of the form (est - pop)/pop, which is equivalent to the form est/pop - 1. If matrices are supplied then an equivalent matrix variant will be used of the form (est - pop) * solve(pop). Values closer to 0 indicate better relative parameter recovery. Note that for single variable inputs this is equivalent to bias(..., type = 'relative').
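As a small arithmetic check of the vector definition above (the values here are arbitrary):

pop <- c(1, 2, 4)
est <- c(1.1, 1.8, 4.4)
RD(est, pop)        # relative difference
(est - pop) / pop   # computed directly
est / pop - 1       # equivalent form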
Usage
RD(est, pop, as.vector = TRUE, unname = FALSE)
Arguments
est | a |
pop | a |
as.vector | logical; always wrap the result in a |
unname | logical; apply |
Value
returns a vector or matrix depending on the inputs and whether as.vector was used
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
# vector
pop <- seq(1, 100, length.out=9)
est1 <- pop + rnorm(9, 0, .2)
(rds <- RD(est1, pop))
summary(rds)

# matrix
pop <- matrix(c(1:8, 10), 3, 3)
est2 <- pop + rnorm(9, 0, .2)
RD(est2, pop, as.vector = FALSE)
(rds <- RD(est2, pop))
summary(rds)

Compute the relative efficiency of multiple estimators
Description
Computes the relative efficiency given the RMSE (default) or MSE values for multiple estimators.
Usage
RE(x, MSE = FALSE, percent = FALSE, unname = FALSE)
Arguments
x | a |
MSE | logical; are the input values mean squared errors instead of root mean square errors? |
percent | logical; change returned result to percentage by multiplying by 100?Default is FALSE |
unname | logical; apply |
Value
returns a vector of variance ratios indicating the relative efficiency compared to the first estimator. Values less than 1 indicate better efficiency than the first estimator, while values greater than 1 indicate worse efficiency than the first estimator
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
pop <- 1
samp1 <- rnorm(100, 1, sd = 0.5)
RMSE1 <- RMSE(samp1, pop)
samp2 <- rnorm(100, 1, sd = 1)
RMSE2 <- RMSE(samp2, pop)

RE(c(RMSE1, RMSE2))
RE(c(RMSE1, RMSE2), percent = TRUE) # as a percentage

# using MSE instead
mse <- c(RMSE1, RMSE2)^2
RE(mse, MSE = TRUE)

Compute the (normalized) root mean square error
Description
Computes the average deviation (root mean square error; also known as the root mean square deviation) of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form.
Usage
RMSE(estimate, parameter = NULL, type = "RMSE", MSE = FALSE, percent = FALSE, unname = FALSE)

RMSD(estimate, parameter = NULL, type = "RMSE", MSE = FALSE, percent = FALSE, unname = FALSE)

Arguments
estimate | a |
parameter | a |
type | type of deviation to compute. Can be |
MSE | logical; return the mean square error equivalent of the results instead of the rootmean-square error (in other words, the result is squared)? Default is |
percent | logical; change returned result to percentage by multiplying by 100?Default is FALSE |
unname | logical; apply |
Value
returns a numeric vector indicating the overall average deviation in the estimates
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
MAE
Examples
pop <- 1
samp <- rnorm(100, 1, sd = 0.5)
RMSE(samp, pop)

dev <- samp - pop
RMSE(dev)
RMSE(samp, pop, type = 'NRMSE')
RMSE(dev, type = 'NRMSE')
RMSE(dev, pop, type = 'SRMSE')
RMSE(samp, pop, type = 'CV')
RMSE(samp, pop, type = 'RMSLE')

# percentage reported
RMSE(samp, pop, type = 'NRMSE')
RMSE(samp, pop, type = 'NRMSE', percent = TRUE)

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
RMSE(mat, parameter = 2)
RMSE(mat, parameter = c(2, 3))

# different parameter associated with each column
mat <- cbind(M1=rnorm(1000, 2, sd = 0.25), M2 = rnorm(1000, 3, sd = .25))
RMSE(mat, parameter = c(2,3))

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
RMSE(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
RMSE(estimates, parameters)

Compute the relative standard error ratio
Description
Computes the relative standard error ratio given the set of estimated standard errors (SE) and the deviation across the R simulation replications (SD). The ratio is formed by finding the expectation of the SE terms, and compares this expectation to the general variability of their respective parameter estimates across the R replications (the ratio should equal 1). This is used to roughly evaluate whether the SEs being advertised by a given estimation method match the sampling variability of the respective estimates across samples.
Usage
RSE(SE, ests, unname = FALSE)
Arguments
SE | a |
ests | a |
unname | logical; apply |
Value
returns a vector of variance ratios (RSV = SE^2/SD^2)
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
R <- 10000
par_ests <- cbind(rnorm(R), rnorm(R, sd=1/10), rnorm(R, sd=1/15))
colnames(par_ests) <- paste0("par", 1:3)
(SDs <- colSDs(par_ests))

SEs <- cbind(1 + rnorm(R, sd=.01), 1/10 + rnorm(R, sd=.01), 1/15 + rnorm(R, sd=.01))
(E_SEs <- colMeans(SEs))

RSE(SEs, par_ests)

# equivalent to the form
colMeans(SEs) / SDs

Robbins-Monro (1951) stochastic root-finding algorithm
Description
Function performs stochastic root solving for the provided f(x) using the Robbins-Monro (1951) algorithm. Differs from deterministic cousins such as uniroot in that f may contain stochastic error components, where the root is obtained through the running average method provided by the noise filter (see also PBA). Assumes that E[f(x)] is non-decreasing in x.
Usage
RobbinsMonro(f, p, ..., Polyak_Juditsky = FALSE, maxiter = 500L, miniter = 100L, k = 3L, tol = 1e-05, verbose = interactive(), fn.a = function(iter, a = 1, b = 1/2, c = 0, ...) a/(iter + c)^b)

## S3 method for class 'RM'
print(x, ...)

## S3 method for class 'RM'
plot(x, par = 1, main = NULL, Polyak_Juditsky = FALSE, ...)

Arguments
f | noisy function for which the root is sought |
p | vector of starting values to be passed as |
... | additional named arguments to be passed to |
Polyak_Juditsky | logical; apply the Polyak and Juditsky (1992)running-average method? Returns the final running average estimateusing the Robbins-Monro updates (also applies to |
maxiter | the maximum number of iterations (default 500) |
miniter | minimum number of iterations (default 100) |
k | number of consecutive |
tol | tolerance criteria for convergence on the changes in theupdated |
verbose | logical; should the iterations and estimate be printed to theconsole? |
fn.a | function to create the Note that if a different function is provided it must satisfy the propertythat |
x | an object of class |
par | which parameter in the original vector |
main | plot title |
References
Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of Stochastic Approximation by Averaging. SIAM Journal on Control and Optimization, 30(4):838.
Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statistics, 22:400-407.
Spall, J.C. (2000). Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Control 45, 1839-1853.
See Also
Examples
# find x that solves f(x) - b = 0 for the following
f.root <- function(x, b = .6) 1 / (1 + exp(-x)) - b
f.root(.3)

xs <- seq(-3,3, length.out=1000)
plot(xs, f.root(xs), type = 'l', ylab = "f(x)", xlab='x')
abline(h=0, col='red')

retuni <- uniroot(f.root, c(0,1))
retuni
abline(v=retuni$root, col='blue', lty=2)

# Robbins-Monro without noisy root, start with p=.9
retrm <- RobbinsMonro(f.root, .9)
retrm
plot(retrm)

# Same problem, however root function is now noisy. Hence, need to solve
# fhat(x) - b + e = 0, where E(e) = 0
f.root_noisy <- function(x) 1 / (1 + exp(-x)) - .6 + rnorm(1, sd=.02)
sapply(rep(.3, 10), f.root_noisy)

# uniroot "converges" unreliably
set.seed(123)
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root
uniroot(f.root_noisy, c(0,1))$root

# Robbins-Monro provides better convergence
retrm.noise <- RobbinsMonro(f.root_noisy, .9)
retrm.noise
plot(retrm.noise)

# different power (b) for fn.a()
retrm.b2 <- RobbinsMonro(f.root_noisy, .9, b = .01)
retrm.b2
plot(retrm.b2)

# use Polyak-Juditsky averaging (b should be closer to 0 to work well)
retrm.PJ <- RobbinsMonro(f.root_noisy, .9, b = .01, Polyak_Juditsky = TRUE)
retrm.PJ # final Polyak_Juditsky estimate
plot(retrm.PJ) # Robbins-Monro history
plot(retrm.PJ, Polyak_Juditsky = TRUE) # Polyak_Juditsky history

Surrogate Function Approximation via the Generalized Linear Model
Description
Given a simulation that was executed with runSimulation, potentially with the argument store_results = TRUE to store the unsummarised analysis results, fit a surrogate function approximation (SFA) model to the results and (optionally) perform a root-solving step to solve a target quantity. See Schoemann et al. (2014) for details.
Usage
SFA(results, formula, family = "binomial", b = NULL, design = NULL, CI = 0.95, interval = NULL, ...)

## S3 method for class 'SFA'
print(x, ...)

Arguments
results | data returned from |
formula | formula to specify for the regression model |
family | character vector indicating the family of GLMs to use(see |
b | (optional) Target quantity to use for root solving given the fittedsurrogate function (e.g., find sample size associated with SFA implied power of .80) |
design | (optional) |
CI | advertised confidence interval of SFA prediction around solved target |
interval | interval to be passed to |
... | additional arguments to pass to |
x | an object of class |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Schoemann, A. M., Miller, P., Pornprasertmanit, S., and Wu, W. (2014). Using Monte Carlo simulations to determine power and sample size for planned missing designs. International Journal of Behavioral Development, SAGE Publications, 38, 471-479.
See Also
Examples
## Not run: 
# create long Design object to fit surrogate over
Design <- createDesign(N = 100:500, d = .2)
Design

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    Attach(condition)
    group1 <- rnorm(N)
    group2 <- rnorm(N, mean=d)
    dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    p <- c(p = t.test(DV ~ group, dat, var.equal=TRUE)$p.value)
    p
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- EDR(results, alpha = .05)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Estimate power over N

# Use small number of replications given range of sample sizes
## note that due to the lower replications disabling the
## RAM printing will help reduce overhead

sim <- runSimulation(design=Design, replications=10,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     store_results=TRUE, save=FALSE, progress=FALSE,
                     control=list(print_RAM=FALSE))
sim

# total of 4010 replications
sum(sim$REPLICATIONS)

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# fitted model
sfa <- SFA(sim_results, formula = sig ~ N)
sfa
summary(sfa)

# plot the observed and SFA expected values
plot(p ~ N, sim, las=1, pch=16, main='Rejection rates with R=10')
pred <- predict(sfa, type = 'response')
lines(sim_results$N, pred, col='red', lty=2)

# fitted model + root-solved solution given f(.) = b,
# where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N, b=.8, design=design)
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

################
# example with smaller range but higher precision
Design <- createDesign(N = 375:425, d = .2)
Design

sim2 <- runSimulation(design=Design, replications=100,
                      generate=Generate, analyse=Analyse, summarise=Summarise,
                      store_results=TRUE, save=FALSE, progress=FALSE,
                      control=list(print_RAM=FALSE))
sim2
sum(sim2$REPLICATIONS) # more replications in total

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim2)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# fitted model
sfa <- SFA(sim_results, formula = sig ~ N)
sfa
summary(sfa)

# plot the observed and SFA expected values
plot(p ~ N, sim2, las=1, pch=16, main='Rejection rates with R=100')
pred <- predict(sfa, type = 'response')
lines(sim_results$N, pred, col='red', lty=2)

# fitted model + root-solved solution given f(.) = b,
# where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N, b=.8, design=design,
                interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

###################
# vary multiple parameters (e.g., sample size + effect size) to fit
# multi-parameter surrogate

Design <- createDesign(N = seq(from=10, to=500, by=10),
                       d = seq(from=.1, to=.5, by=.1))
Design

sim3 <- runSimulation(design=Design, replications=50,
                      generate=Generate, analyse=Analyse, summarise=Summarise,
                      store_results=TRUE, save=FALSE, progress=FALSE,
                      control=list(print_RAM=FALSE))
sim3
sum(sim3$REPLICATIONS)

# use the unsummarised results for the SFA, and include p.values < alpha
sim_results <- SimResults(sim3)
sim_results <- within(sim_results, sig <- p < .05)
sim_results

# additive effects (logit(sig) ~ N + d)
sfa0 <- SFA(sim_results, formula = sig ~ N+d)
sfa0

# multiplicative effects (logit(sig) ~ N + d + N:d)
sfa <- SFA(sim_results, formula = sig ~ N*d)
sfa

# multiplicative better fit (sample size interacts with effect size)
anova(sfa0, sfa, test = "LRT")
summary(sfa)

# plot the observed and SFA expected values
library(ggplot2)
sim3$pred <- predict(sfa, type = 'response', newdata=sim3)
ggplot(sim3, aes(N, p, color = factor(d))) +
    geom_point() + geom_line(aes(y=pred)) +
    facet_wrap(~factor(d))

# fitted model + root-solved solution given f(.) = b,
# where b = target power of .8
design <- data.frame(N=NA, d=.2)
sfa.root <- SFA(sim_results, formula = sig ~ N * d, b=.8, design=design,
                interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.2)

# root prediction where d *not* used in original data
design <- data.frame(N=NA, d=.25)
sfa.root <- SFA(sim_results, formula = sig ~ N * d, b=.8, design=design,
                interval=c(100, 500))
sfa.root

# true root
pwr::pwr.t.test(power=.8, d=.25)

## End(Not run)

Empirical detection robustness method suggested by Serlin (2000)
Description
Hypothesis test to determine whether an observed empirical detection rate, coupled with a given robustness interval, statistically differs from the population value. Uses the methods described by Serlin (2000) to generate critical values (similar to confidence intervals, but defining a fixed window of robustness). Critical values may be computed without performing the simulation experiment (hence, they can be obtained a priori).
Usage
Serlin2000(p, alpha, delta, R, CI = 0.95)
Arguments
p | (optional) a vector containing the empirical detection rate(s) to be tested.Omitting this input will compute only the CV1 and CV2 values, while including thisinput will perform a one-sided hypothesis test for robustness |
alpha | Type I error rate (e.g., often set to .05) |
delta | (optional) symmetric robustness interval around |
R | number of replications used in the simulation |
CI | confidence interval for |
Author(s)
Phil Chalmers <rphilip.chalmers@gmail.com>
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Serlin, R. C. (2000). Testing for Robustness in Monte Carlo Studies. Psychological Methods, 5, 230-240.
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
# Cochran's criteria at alpha = .05 (i.e., .05 +- .01), assuming N = 2000
Serlin2000(p = .051, alpha = .05, delta = .01, R = 2000)

# Bradley's liberal criteria given p = .06 and .076, assuming N = 1000
Serlin2000(p = .060, alpha = .05, delta = .025, R = 1000)
Serlin2000(p = .076, alpha = .05, delta = .025, R = 1000)

# multiple p-values
Serlin2000(p = c(.05, .06, .07), alpha = .05, delta = .025, R = 1000)

# CV values computed before simulation performed
Serlin2000(alpha = .05, R = 2500)
Function for decomposing the simulation into ANOVA-based effect sizes
Description
Given the results from a simulation produced by runSimulation, form an ANOVA table (without p-values) with effect sizes based on the eta-squared statistic. These results provide approximate indications of observable simulation effects; therefore, these ANOVA-based results are generally useful as exploratory rather than inferential tools.
Usage
SimAnova(formula, dat, subset = NULL, rates = TRUE)Arguments
formula | an R formula generally of a form suitable for |
dat | an object returned from |
subset | an optional argument to be passed to |
rates | logical; does the dependent variable consist of rates (e.g., returned from |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
data(BF_sim)

# all results (not usually good to mix Power and Type I results together)
SimAnova(alpha.05.F ~ (groups_equal + distribution)^2, BF_sim)

# only use anova for Type I error conditions
SimAnova(alpha.05.F ~ (groups_equal + distribution)^2, BF_sim,
         subset = var_ratio == 1)

# run all DVs at once using the same formula
SimAnova(~ groups_equal * distribution, BF_sim, subset = var_ratio == 1)
Check for missing files in array simulations
Description
Given the files saved from a runArraySimulation remote evaluation, check whether all .rds files have been saved. If any are missing, the missing row condition numbers will be returned.
Usage
SimCheck(dir = NULL, files = NULL, min = 1L, max = NULL)Arguments
dir | character vector input indicating the directorycontaining the |
files | vector of file names referring to the saved simulation files.E.g. |
min | minimum number after the |
max | maximum number after the |
Value
returns an invisible list of indices for the empty, missing, and empty-and-missing row conditions. If none are missing, an empty list is returned
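For instance, a minimal sketch of re-submitting only the rows that were not saved (the list element name missing, and the Design_long/replications/Generate/Analyse/Summarise/iseed objects, are assumptions for illustration taken from a typical runArraySimulation workflow):

# minimal sketch; 'missing' element name and the simulation objects are assumed
chk <- SimCheck(dir = 'sim_files/')
for(i in chk$missing)
    runArraySimulation(design=Design_long, replications=replications,
                       generate=Generate, analyse=Analyse, summarise=Summarise,
                       arrayID=i, dirname='sim_files/', filename='job',
                       iseed=iseed)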
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
See Also
Examples
## Not run: 
# if files are in mysimfiles/ directory
SimCheck('mysimfiles')

# specifying files explicitly
setwd('mysimfiles/')
SimCheck(files=dir())

## End(Not run)
Removes/cleans files and folders that have been saved
Description
This function is mainly used in pilot studies where results and datasets have been temporarily saved by runSimulation but should be removed before beginning the full Monte Carlo simulation (e.g., to remove files and folders which contained bugs/biased results).
Usage
SimClean( ..., dirs = NULL, temp = TRUE, results = FALSE, seeds = FALSE, save_details = list())Arguments
... | one or more character objects indicating which files to remove. Used to remove |
dirs | a character vector indicating which directories to remove |
temp | logical; remove the temporary file saved when passing |
results | logical; remove the |
seeds | logical; remove the seed filessaved when passing |
save_details | a list pertaining to information about how and where files were saved(see the corresponding list in |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: 
# remove file called 'results.rds'
SimClean('results.rds')

# remove default temp file
SimClean()

# remove customized saved-results directory called 'mydir'
SimClean(results = TRUE, save_details = list(save_results_dirname = 'mydir'))

## End(Not run)
Collapse separate simulation files into a single result
Description
This function collects and aggregates the results from SimDesign's runSimulation into a single object suitable for post-analyses, or combines all the saved results directories into one. This is useful when results are run piece-wise on one node (e.g., 500 replications in one batch, 500 again at a later date, though be careful about the set.seed use as the random numbers will tend to correlate the more it is used) or run independently across different nodes/computing cores (e.g., see runArraySimulation).
Usage
SimCollect( dir = NULL, files = NULL, filename = NULL, select = NULL, check.only = FALSE, target.reps = NULL, warning_details = FALSE, error_details = TRUE, gc = FALSE)aggregate_simulations(...)Arguments
dir | a |
files | a |
filename | (optional) name of .rds file to save aggregate simulation file to. If not specifiedthen the results will only be returned in the R console. |
select | a character vector indicating columns to variables to select from the |
check.only | logical; for larger simulations file sets, such as those generated by |
target.reps | (optional) number of replications to check against to evaluate whetherthe simulation files returned the desired number of replications. If missing, the highestdetected value from the collected set of replication information will be used |
warning_details | logical; include the aggregate of the warnings to be extracted via |
error_details | logical; include the aggregate of the errors to be extracted via |
gc | logical; explicitly call R's garbage collector |
... | not used |
Value
returns a data.frame/tibble with the (weighted) average/aggregate of the simulation results
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
runSimulation,runArraySimulation,SimCheck
Examples
## Not run: 
setwd('my_working_directory')

## run simulations to save the .rds files (or move them to the working directory)
# seeds1 <- genSeeds(design)
# seeds2 <- genSeeds(design, old.seeds=seeds1)
# ret1 <- runSimulation(design, ..., seed=seeds1, filename='file1')
# ret2 <- runSimulation(design, ..., seed=seeds2, filename='file2')

# saves to the hard-drive and stores in workspace
final <- SimCollect(files = c('file1.rds', 'file2.rds'))
final

# If filename not included, can be extracted from results
# files <- c(SimExtract(ret1, 'filename'), SimExtract(ret2, 'filename'))
# final <- SimCollect(files = files)

#################################################
# Example where each row condition is repeated, evaluated independently,
# and later collapsed into a single analysis object

# Each condition repeated four times (hence, replications
# should be set to desired.reps/4)
Design <- createDesign(mu = c(0,5), N = c(30, 60))
Design

# assume the N=60 takes longer, and should be spread out across more arrays
Design_long <- expandDesign(Design, c(2,2,4,4))
Design_long

replications <- c(rep(50, 4), rep(25,8))
data.frame(Design_long, replications)

#-------------------------------------------------------------------
Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, mean=mu))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat), SD=sd(dat))
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- colMeans(results)
    ret
}

#-------------------------------------------------------------------
# create directory to store all final simulation files
dir.create('sim_files/')
iseed <- genSeeds()

# distribute jobs independently
sapply(1:nrow(Design_long), \(i) {
    runArraySimulation(design=Design_long, replications=replications,
                       generate=Generate, analyse=Analyse, summarise=Summarise,
                       arrayID=i, dirname='sim_files/', filename='job',
                       iseed=iseed)
}) |> invisible()

# check for any missed files
SimCheck(dir='sim_files/')

# check that all replications satisfy target
SimCollect('sim_files/', check.only = TRUE)

# specify files explicitly
SimCollect(files = list.files(path='sim_files/', pattern="*.rds",
                              full.names=TRUE), check.only = TRUE)

# this would have been returned were the target.rep supposed to be 1000
SimCollect('sim_files/', check.only = TRUE, target.reps=1000)

# aggregate into single object
sim <- SimCollect('sim_files/')
sim

# view list of error messages (if there were any raised)
SimCollect('sim_files/', select = 'ERRORS')

SimClean(dir='sim_files/')

## End(Not run)
Function to extract extra information from SimDesign objects
Description
Function used to extract any error or warning messages, the seeds associated with any error or warning messages, and any analysis results that were stored in the final simulation object.
Usage
SimExtract(object, what, fuzzy = TRUE, append = TRUE)Arguments
object | object returned from |
what | character vector indicating what information to extract, written in agnostic casing(e.g., Possible inputs include Note that |
fuzzy | logical; use fuzzy string matching to reduce effectively identical messages?For example, when attempting to invert a matrix the error message"System is computationally singular: reciprocal condition number = 1.92747e-17" and"System is computationally singular: reciprocal condition number = 2.15321e-16" areeffectively the same, and likely should be reported in the same columns of the extracted output |
append | logical; append the design conditions when extracting error/warning messages? |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
## Not run: 
Generate <- function(condition, fixed_objects) {
    int <- sample(1:10, 1)
    if(int > 5) warning('GENERATE WARNING: int greater than 5')
    if(int == 1) stop('GENERATE ERROR: integer is 1')
    rnorm(5)
}

Analyse <- function(condition, dat, fixed_objects) {
    int <- sample(1:10, 1)
    if(int > 5) warning('ANALYSE WARNING: int greater than 5')
    if(int == 1) stop('ANALYSE ERROR: int is 1')
    c(ret = 1)
}

Summarise <- function(condition, results, fixed_objects) {
    mean(results)
}

res <- runSimulation(replications = 100, seed=1234,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

SimExtract(res, what = 'errors')
SimExtract(res, what = 'warnings')
seeds <- SimExtract(res, what = 'error_seeds')
seeds[,1:3]

# replicate a specific error for debugging (type Q to exit debugger)
res <- runSimulation(replications = 100, load_seed=seeds[,1], debug='analyse',
                     generate=Generate, analyse=Analyse, summarise=Summarise)

## End(Not run)
Template-based generation of the Generate-Analyse-Summarise functions
Description
This function prints template versions of the required Design and Generate-Analyse-Summarise functions for SimDesign to run simulations. Templated output comes complete with the correct inputs, class of outputs, and optional comments to help with the initial definitions. Use this at the start of your Monte Carlo simulation study. Following the definition of the SimDesign template file, please refer to the detailed information in runSimulation for how to edit this template into a working simulation study.
Usage
SimFunctions( filename = NULL, dir = getwd(), save_structure = "single", extra_file = FALSE, nAnalyses = 1, nGenerate = 1, summarise = TRUE, comments = FALSE, openFiles = TRUE, spin_header = TRUE, SimSolve = FALSE)Arguments
filename | a character vector indicating whether the output should be saved to two respective filescontaining the simulation design and the functional components, respectively. Using this optionis generally the recommended approach when beginning to write a Monte Carlo simulation |
dir | the directory to write the files to. Default is the working directory |
save_structure | character indicating the number of files to break the simulation code intowhen |
extra_file | logical; should an extra file be saved containing user-defined functions or objects?Default is |
nAnalyses | number of analysis functions to create (default is 1). Increasing the valueof this argument when independent analysis are being performed allows function definitionsto be better partitioned and potentially more modular |
nGenerate | number of generate functions to create (default is 1). Increase the value of this argument when the data generation functions are very different and should be isolated from each other (otherwise, if there is much in common between the generate steps, the default of 1 should be preferred). Otherwise, if |
summarise | include |
comments | logical; include helpful comments? Default is |
openFiles | logical; after files have been generated, open them in your text editor(e.g., if Rstudio is running the scripts will open in a new tab)? |
spin_header | logical; include a basic |
SimSolve | logical; should the template be generated that is intended for a |
Details
The recommended approach to organizing Monte Carlo simulation files is to first save the template generated by this function to the hard-drive by passing a suitable filename argument (which, if users are interacting with R via the RStudio IDE, will also open the template file after it has been saved). For larger simulations, two separate files could also be used (achieved by changing out.files), and may be easier for debugging/sourcing the simulation code; however, this is a matter of preference and does not change any functionality in the package.
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
SimFunctions()
SimFunctions(comments = TRUE) # with helpful comments

## Not run: 
# write output files to a single file with comments
SimFunctions('mysim', comments = TRUE)

# Multiple analysis functions for optional partitioning
SimFunctions(nAnalyses = 2)
SimFunctions(nAnalyses = 3)

# Multiple analysis + generate functions
SimFunctions(nAnalyses = 2, nGenerate=2)

# save multiple files for the purpose of designing larger simulations
# (also include extra_file for user-defined objects/functions)
SimFunctions('myBigSim', save_structure = 'all',
             nAnalyses = 3, nGenerate=2, extra_file = TRUE)

## End(Not run)
Function to read in saved simulation results
Description
If runSimulation was passed the flag save_results = TRUE then the row results corresponding to the design object will be stored to a suitable sub-directory as individual .rds files. While users could use readRDS directly to read these files in themselves, this convenience function will read the desired rows in automatically given the returned object from the simulation. Can be used to read in 1 or more .rds files at once (if more than 1 file is read in then the result will be stored in a list).
Usage
SimResults(obj, which, prefix = "results-row", wd = getwd(), rbind = FALSE)Arguments
obj | object returned from Alternatively, the object can be from the |
which | a numeric vector indicating which rows should be read in. If missing, all rows will beread in |
prefix | character indicating prefix used for stored files |
wd | working directory; default is found with |
rbind | logical; should the results be combined by row or returned asa list? Only applicable when the supplied |
Value
the returned result is either a nested list (when length(which) > 1) or a single list (when length(which) == 1) containing the simulation results. Each read-in result refers to a list of 4 elements:

condition: the associated row (ID) and conditions from the respective design object

results: the object returned from the analyse function, potentially simplified into a matrix or data.frame

errors: a table containing the message and number of errors that caused the generate-analyse steps to be rerun. These should be inspected carefully as they could indicate validity issues with the simulation that should be noted

warnings: a table containing the message and number of non-fatal warnings which arose from the analyse step. These should be inspected carefully as they could indicate validity issues with the simulation that should be noted
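For example, a minimal sketch of inspecting a single read-in row (assuming an object obj returned from runSimulation(..., save_results = TRUE), as in the Examples below):

row1 <- SimResults(obj, which = 1)
row1$condition       # design row (ID) and condition values
head(row1$results)   # raw analyse() output for this condition
row1$errors          # tabled error messages (if any occurred)
row1$warnings        # tabled warning messages (if any occurred)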
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
## Not run: 
# store results (default behaviour)
sim <- runSimulation(..., store_results = TRUE)
SimResults(sim)

# store results to drive if RAM issues are present
obj <- runSimulation(..., save_results = TRUE)

# row 1 results
row1 <- SimResults(obj, 1)

# rows 1:5, stored in a named list
rows_1to5 <- SimResults(obj, 1:5)

# all results
rows_all <- SimResults(obj)

## End(Not run)
Generate a basic Monte Carlo simulation GUI template
Description
This function generates suitable stand-alone code from the shiny package to create simple web-interfaces for performing single condition Monte Carlo simulations. The template generated is relatively minimalistic, but allows the user to quickly and easily edit the saved files to customize the associated shiny elements as they see fit.
Usage
SimShiny(filename = NULL, dir = getwd(), design, ...)Arguments
filename | an optional name of a text file to save the server and UI components(e.g., 'mysimGUI.R'). If omitted, the code will be printed to the R console instead |
dir | the directory to write the files to. Default is the working directory |
design |
|
... | arguments to be passed to |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: Design <- createDesign(sample_size = c(30, 60, 90, 120), group_size_ratio = c(1, 4, 8), standard_deviation_ratio = c(.5, 1, 2))Generate <- function(condition, fixed_objects) { N <- condition$sample_size grs <- condition$group_size_ratio sd <- condition$standard_deviation_ratio if(grs < 1){ N2 <- N / (1/grs + 1) N1 <- N - N2 } else { N1 <- N / (grs + 1) N2 <- N - N1 } group1 <- rnorm(N1) group2 <- rnorm(N2, sd=sd) dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)), DV = c(group1, group2)) dat}Analyse <- function(condition, dat, fixed_objects) { welch <- t.test(DV ~ group, dat) ind <- t.test(DV ~ group, dat, var.equal=TRUE) # In this function the p values for the t-tests are returned, # and make sure to name each element, for future reference ret <- c(welch = welch$p.value, independent = ind$p.value) ret}Summarise <- function(condition, results, fixed_objects) { #find results of interest here (e.g., alpha < .1, .05, .01) ret <- EDR(results, alpha = .05) ret}# test that it works# Final <- runSimulation(design=Design, replications=5,# generate=Generate, analyse=Analyse, summarise=Summarise)# print code to consoleSimShiny(design=Design, generate=Generate, analyse=Analyse, summarise=Summarise)# save shiny code to fileSimShiny('app.R', design=Design, generate=Generate, analyse=Analyse, summarise=Summarise)# run the applicationshiny::runApp()shiny::runApp(launch.browser = TRUE) # in web-browser## End(Not run)One Dimensional Root (Zero) Finding in Simulation Experiments
Description
Function provides a stochastic root-finding approach to solve for specific quantities in simulation experiments (e.g., solving for a specific sample size to meet a target power rate) using the Probabilistic Bisection Algorithm with Bolstering and Interpolations (ProBABLI; Chalmers, 2024). The structure follows the three functional steps outlined in runSimulation, however portions of the design input are taken as variables to be estimated rather than fixed, where an additional constant b is required in order to solve the root equation f(x) - b = 0.
Usage
SimSolve( design, interval, b, generate, analyse, summarise, replications = list(burnin.iter = 15L, burnin.reps = 50L, max.reps = 500L, min.total.reps = 9000L, increase.by = 10L), integer = TRUE, formula = y ~ poly(x, 2), family = "binomial", parallel = FALSE, cl = NULL, save = TRUE, resume = TRUE, method = "ProBABLI", wait.time = NULL, ncores = parallelly::availableCores(omit = 1L), type = ifelse(.Platform$OS.type == "windows", "PSOCK", "FORK"), maxiter = 100L, check.interval = TRUE, predCI = 0.95, predCI.tol = NULL, lastSolve = NULL, verbose = interactive(), control = list(), ...)## S3 method for class 'SimSolve'summary(object, tab.only = FALSE, reps.cutoff = 300, ...)## S3 method for class 'SimSolve'plot(x, y, ...)Arguments
design | a |
interval | a |
b | a single constant used to solve the root equation |
generate | generate function. See |
analyse | analysis function. See |
summarise | summary function that returns a single number correspondingto the function evaluation |
replications | a named Vector inputs can specify the exact replications for each respectiveiteration. As a general rule, early iterationsshould be relatively low for initial searches to avoid unnecessary computationswhen locating the approximate location of the root,while the number of replications should gradually increase after this burn-into reduce the sampling variability. |
integer | logical; should the values of the root be considered integeror numeric? If |
formula | regression formula to use when |
family |
Note that if individual results from the |
parallel | for parallel computing for slower simulation experiments(see |
cl | |
save | logical; store temporary file in case of crashes. If detectedin the working directory will automatically be loaded to resume (see |
resume | logical; if a temporary |
method | optimizer method to use. Default is the stochastic root-finder |
wait.time | (optional) argument passed to |
ncores | |
type | type of cluster (see If |
maxiter | the maximum number of iterations (default 100) except when |
check.interval | logical; should an initial check be made to determinewhether |
predCI | advertised confidence interval probability for finalmodel-based prediction of target |
predCI.tol | (optional) rather than relying on the changes between successiveestimates (default), if the predicting CI is consistently within thissupplied tolerance range then the search will be terminated.This provides termination behaviour based on the predictedprecision of the root solutions rather than their stability history, and thereforecan be used to obtain estimates with a particular level of advertised accuracy.For example, when solving for a sample size value ( |
lastSolve | stub for |
verbose | logical; print information to the console? |
control | a
|
... | additional arguments to be pasted to |
object | object of class |
tab.only | logical; print only the (reduced) table of estimates? |
reps.cutoff | integer indicating the rows to omit from the outputif the number of replications are less than this value |
x | object of class |
y | design row to plot. If omitted defaults to 1 |
Details
Root finding is performed using a progressively bolstered version of the probabilistic bisection algorithm (PBA) to find the associated root given the noisy simulation objective function evaluations. Information is collected throughout the search to make more accurate predictions about the associated root via interpolation. If interpolations fail, then the last iteration of the PBA search is returned as the best guess.
For greater advertised accuracy with ProBABLI, termination criteria can be based on the width of the advertised prediction interval (via predCI.tol) or by specifying how long the investigator is willing to wait for the final estimates (via wait.time, where longer wait times lead to progressively better accuracy in the final estimates).
Value
the filled-in design object containing the associated lower and upper interval estimates from the stochastic optimization
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P. (2024). Solving Variables with Monte Carlo Simulation Experiments: AStochastic Root-Solving Approach.Psychological Methods. DOI: 10.1037/met0000689
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
See Also
Examples
## Not run: ############################ A Priori Power Analysis########################### GOAL: Find specific sample size in each group for independent t-test# corresponding to a power rate of .8## For ease of the setup, assume the groups are the same size, and the mean# difference corresponds to Cohen's d values of .2, .5, and .8# This example can be solved numerically using the pwr package (see below),# though the following simulation setup is far more general and can be# used for any generate-analyse combination of interest# SimFunctions(SimSolve=TRUE)#### Step 1 --- Define your conditions under study and create design data.frame.#### However, use NA placeholder for sample size as it must be solved,#### and add desired power rate to objectDesign <- createDesign(N = NA, d = c(.2, .5, .8), sig.level = .05)Design # solve for NA's#~~~~~~~~~~~~~~~~~~~~~~~~#### Step 2 --- Define generate, analyse, and summarise functionsGenerate <- function(condition, fixed_objects) { Attach(condition) group1 <- rnorm(N) group2 <- rnorm(N, mean=d) dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')), DV = c(group1, group2)) dat}Analyse <- function(condition, dat, fixed_objects) { p <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value p}Summarise <- function(condition, results, fixed_objects) { # Must return a single number corresponding to f(x) in the # root equation f(x) = b ret <- c(power = EDR(results, alpha = condition$sig.level)) ret}#~~~~~~~~~~~~~~~~~~~~~~~~#### Step 3 --- Optimize N over the rows in design### (For debugging) may want to see if simulation code works as intended first### for some given set of inputs# runSimulation(design=createDesign(N=100, d=.8, sig.level=.05),# replications=10, generate=Generate, analyse=Analyse,# summarise=Summarise)# Initial search between N = [10,500] for each row using the default # integer solver (integer = TRUE). In this example, b = target powersolved <- SimSolve(design=Design, b=.8, interval=c(10, 500), generate=Generate, analyse=Analyse, summarise=Summarise)solvedsummary(solved)plot(solved, 1)plot(solved, 2)plot(solved, 3)# also can plot median history and estimate precisionplot(solved, 1, type = 'history')plot(solved, 1, type = 'density')plot(solved, 1, type = 'iterations')# verify with true power from pwr packagelibrary(pwr)pwr.t.test(d=.2, power = .8) # sig.level/alpha = .05 by defaultpwr.t.test(d=.5, power = .8)pwr.t.test(d=.8, power = .8)# use estimated N results to see how close power wasN <- solved$Npwr.t.test(d=.2, n=N[1])pwr.t.test(d=.5, n=N[2])pwr.t.test(d=.8, n=N[3])# with roundingN <- ceiling(solved$N)pwr.t.test(d=.2, n=N[1])pwr.t.test(d=.5, n=N[2])pwr.t.test(d=.8, n=N[3])### failing analytic formula, confirm results with more precise### simulation via runSimulation()### (not required, if accuracy is important then ProBABLI should be run longer)# csolved <- solved# csolved$N <- ceiling(solved$N)# confirm <- runSimulation(design=csolved, replications=10000, parallel=TRUE,# generate=Generate, analyse=Analyse,# summarise=Summarise)# confirm# Similarly, terminate if the prediction interval is consistently predicted# to be between [.795, .805]. 
Note that maxiter increased as wellsolved_predCI <- SimSolve(design=Design, b=.8, interval=c(10, 500), generate=Generate, analyse=Analyse, summarise=Summarise, maxiter=200, predCI.tol=.01)solved_predCIsummary(solved_predCI) # note that predCI.b are all within [.795, .805]N <- solved_predCI$Npwr.t.test(d=.2, n=N[1])pwr.t.test(d=.5, n=N[2])pwr.t.test(d=.8, n=N[3])# Alternatively, and often more realistically, wait.time can be used# to specify how long the user is willing to wait for a final estimate.# Solutions involving more iterations will be more accurate,# and therefore it is recommended to run the ProBABLI root-solver as long# the analyst can tolerate if the most accurate estimates are desired.# Below executes the simulation for 2 minutes per conditionsolved_2min <- SimSolve(design=Design[1, ], b=.8, interval=c(10, 500), generate=Generate, analyse=Analyse, summarise=Summarise, wait.time="2")solved_2minsummary(solved_2min)# use estimated N results to see how close power wasN <- solved_2min$Npwr.t.test(d=.2, n=N[1])#------------------------------------------------######################### Sensitivity Analysis######################## GOAL: solve effect size d given sample size and power inputs (inputs# for root no longer required to be an integer)# Generate-Analyse-Summarise functions identical to above, however# Design input includes NA for d elementDesign <- createDesign(N = c(100, 50, 25), d = NA, sig.level = .05)Design # solve for NA's#~~~~~~~~~~~~~~~~~~~~~~~~#### Step 2 --- Define generate, analyse, and summarise functions (same as above)#~~~~~~~~~~~~~~~~~~~~~~~~#### Step 3 --- Optimize d over the rows in design# search between d = [.1, 2] for each row# In this example, b = target power# note that integer = FALSE to allow smooth updates of dsolved <- SimSolve(design=Design, b = .8, interval=c(.1, 2), generate=Generate, analyse=Analyse, summarise=Summarise, integer=FALSE)solvedsummary(solved)plot(solved, 1)plot(solved, 2)plot(solved, 3)# plot median history and estimate precisionplot(solved, 1, type = 'history')plot(solved, 1, type = 'density')plot(solved, 1, type = 'iterations')# verify with true power from pwr packagelibrary(pwr)pwr.t.test(n=100, power = .8)pwr.t.test(n=50, power = .8)pwr.t.test(n=25, power = .8)# use estimated d results to see how close power waspwr.t.test(n=100, d = solved$d[1])pwr.t.test(n=50, d = solved$d[2])pwr.t.test(n=25, d = solved$d[3])### failing analytic formula, confirm results with more precise### simulation via runSimulation() (not required; if accuracy is important### PROBABLI should just be run longer)# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,# generate=Generate, analyse=Analyse,# summarise=Summarise)# confirm#------------------------------------------------####################### Criterion Analysis###################### GOAL: solve Type I error rate (alpha) given sample size, effect size, and# power inputs (inputs for root no longer required to be an integer). 
Only useful# when Type I error is less important than achieving the desired 1-beta (power)Design <- createDesign(N = 50, d = c(.2, .5, .8), sig.level = NA)Design # solve for NA's# all other function definitions same as above# search for alpha within [.0001, .8]solved <- SimSolve(design=Design, b = .8, interval=c(.0001, .8), generate=Generate, analyse=Analyse, summarise=Summarise, integer=FALSE)solvedsummary(solved)plot(solved, 1)plot(solved, 2)plot(solved, 3)# plot median history and estimate precisionplot(solved, 1, type = 'history')plot(solved, 1, type = 'density')plot(solved, 1, type = 'iterations')# verify with true power from pwr packagelibrary(pwr)pwr.t.test(n=50, power = .8, d = .2, sig.level=NULL)pwr.t.test(n=50, power = .8, d = .5, sig.level=NULL)pwr.t.test(n=50, power = .8, d = .8, sig.level=NULL)# use estimated alpha results to see how close power waspwr.t.test(n=50, d = .2, sig.level=solved$sig.level[1])pwr.t.test(n=50, d = .5, sig.level=solved$sig.level[2])pwr.t.test(n=50, d = .8, sig.level=solved$sig.level[3])### failing analytic formula, confirm results with more precise### simulation via runSimulation() (not required; if accuracy is important### PROBABLI should just be run longer)# confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,# generate=Generate, analyse=Analyse,# summarise=Summarise)# confirm## End(Not run)Summarise simulated data using various population comparison statistics
Description
This collapses the simulation results within each condition to composite estimates such as RMSE, bias, Type I error rates, coverage rates, etc. See the See Also section below for useful functions to be used within Summarise.
Usage
Summarise(condition, results, fixed_objects)Arguments
condition | a single row from the |
results | a |
fixed_objects | object passed down from |
Value
for best results should return a named numeric vector or data.frame with the desired meta-simulation results. Named list objects can also be returned, however the subsequent results must be extracted via SimExtract
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
bias,RMSE,RE,EDR,ECR,MAE,SimExtract
Examples
## Not run: 
summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (alpha < .1, .05, .01)
    lessthan.05 <- EDR(results, alpha = .05)

    # return the results that will be appended to the design input
    ret <- c(lessthan.05=lessthan.05)
    ret
}

## End(Not run)
Add missing values to a vector given a MCAR, MAR, or MNAR scheme
Description
Given an input vector, replace elements of this vector with missing values according to some scheme. The default method replaces input values with a MCAR scheme (where, on average, 10% of the values will be replaced with NAs). MAR and MNAR are supported by replacing the default FUN argument.
Usage
addMissing(y, fun = function(y, rate = 0.1, ...) rep(rate, length(y)), ...)Arguments
y | an input vector that should contain missing data in the form of |
fun | a user defined function indicating the missing data mechanism for each element in |
... | additional arguments to be passed to |
Details
Given an input vector y, and other relevant variables inside (X) and outside (Z) the data-set, the three types of missingness are:
- MCAR
Missing completely at random (MCAR). This is realized by randomly sampling the values of the input vector (y) irrespective of the possible values in X and Z. Therefore missing values are randomly sampled and do not depend on any data characteristics and are truly random
- MAR
Missing at random (MAR). This is realized when values in the dataset (X) predict the missing data mechanism in y; conceptually this is equivalent to P(y = NA | X). This requires the user to define a custom missing data function
- MNAR
Missing not at random (MNAR). This is similar to MAR except that the missing mechanism comes from the value of y itself or from variables outside the working dataset; conceptually this is equivalent to P(y = NA | X, Z, y). This requires the user to define a custom missing data function
Value
the input vector y with the sampled NA values (according to the FUN scheme)
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
## Not run: 
set.seed(1)
y <- rnorm(1000)

## 10% missing rate with default FUN
head(ymiss <- addMissing(y), 10)

## 50% missing with default FUN
head(ymiss <- addMissing(y, rate = .5), 10)

## missing values only when female and low
X <- data.frame(group = sample(c('male', 'female'), 1000, replace=TRUE),
                level = sample(c('high', 'low'), 1000, replace=TRUE))
head(X)

fun <- function(y, X, ...){
    p <- rep(0, length(y))
    p[X$group == 'female' & X$level == 'low'] <- .2
    p
}

ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missingness as a function of elements in X (i.e., a type of MAR)
fun <- function(y, X){
    # missingness with a logistic regression approach
    df <- data.frame(y, X)
    mm <- model.matrix(y ~ group + level, df)
    cfs <- c(-5, 2, 3) # intercept, group, and level coefs
    z <- cfs %*% t(mm)
    plogis(z)
}

ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)

## missing values when y elements are large (i.e., a type of MNAR)
fun <- function(y) ifelse(abs(y) > 1, .4, 0)
ymiss <- addMissing(y, fun=fun)
tail(cbind(y, ymiss), 10)

## End(Not run)
Compute (relative/standardized) bias summary statistic
Description
Computes the (relative) bias of a sample estimate from the parameter value. Accepts estimate and parameter values, as well as estimate values which are in deviation form. If relative bias is requested the estimate and parameter inputs are both required.
Usage
bias( estimate, parameter = NULL, type = "bias", abs = FALSE, percent = FALSE, unname = FALSE)Arguments
estimate | a |
parameter | a |
type | type of bias statistic to return. Default ( |
abs | logical; find the absolute bias between the parameters and estimates? This effectively just applies the absolute value function to the deviation terms |
percent | logical; change returned result to percentage by multiplying by 100?Default is FALSE |
unname | logical; apply |
Value
returns a numeric vector indicating the overall (relative/standardized) bias in the estimates
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
pop <- 2
samp <- rnorm(100, 2, sd = 0.5)
bias(samp, pop)
bias(samp, pop, type = 'relative')
bias(samp, pop, type = 'standardized')
bias(samp, pop, type = 'abs_relative')

dev <- samp - pop
bias(dev)

# equivalent here
bias(mean(samp), pop)

# matrix input
mat <- cbind(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
bias(mat, parameter = 2)
bias(mat, parameter = 2, type = 'relative')
bias(mat, parameter = 2, type = 'standardized')

# different parameter associated with each column
mat <- cbind(M1=rnorm(1000, 2, sd = 0.25), M2 = rnorm(1000, 3, sd = .25))
bias(mat, parameter = c(2,3))
bias(mat, parameter = c(2,3), type='relative')
bias(mat, parameter = c(2,3), type='standardized')

# same, but with data.frame
df <- data.frame(M1=rnorm(100, 2, sd = 0.5), M2 = rnorm(100, 2, sd = 1))
bias(df, parameter = c(2,2))

# parameters of the same size
parameters <- 1:10
estimates <- parameters + rnorm(10)
bias(estimates, parameters)

# relative difference dividing by the magnitude of parameters
bias(estimates, parameters, type = 'abs_relative')

# relative bias as a percentage
bias(estimates, parameters, type = 'abs_relative', percent = TRUE)

# percentage error (PE) statistic given alpha (Type I error) and EDR() result
# edr <- EDR(results, alpha = .05)
edr <- c(.04, .05, .06, .08)
bias(matrix(edr, 1L), .05, type = 'relative', percent = TRUE)
Compute prediction estimates for the replication size using bootstrap MSE estimates
Description
This function computes bootstrap mean-square error estimates to approximate the sampling behavior of the meta-statistics in SimDesign's summarise functions. A single design condition is supplied, and a simulation with max(Rstar) replications is performed, whereby the generate-analyse results are collected. After obtaining these replication values, the replications are further drawn from (with replacement) using the differing sizes in Rstar to approximate the bootstrap MSE behavior given different replication sizes. Finally, given these bootstrap estimates, linear regression models are fitted using the predictor term one_sqrtR = 1 / sqrt(Rstar) to allow extrapolation to replication sizes not observed in Rstar. For more information about the method and subsequent bootstrap MSE plots, refer to Koehler, Brown, and Haneuse (2009).
Usage
bootPredict( condition, generate, analyse, summarise, fixed_objects = NULL, ..., Rstar = seq(100, 500, by = 100), boot_draws = 1000)boot_predict(...)Arguments
condition | a |
generate | |
analyse | |
summarise | |
fixed_objects | |
... | additional arguments to be passed to |
Rstar | a vector containing the size of the bootstrap subsets to obtain. Defaultinvestigates the vector [100, 200, 300, 400, 500] to compute the respective MSE terms |
boot_draws | number of bootstrap replications to draw. Default is 1000 |
Value
returns a list of linear model objects (via lm) for each meta-statistic returned by the summarise() function
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Koehler, E., Brown, E., & Haneuse, S. J.-P. A. (2009). On the Assessment of Monte Carlo Error inSimulation-Based Statistical Analyses.The American Statistician, 63, 155-162.
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
set.seed(4321)
Design <- createDesign(sigma = c(1, 2))

#-------------------------------------------------------------------
Generate <- function(condition, fixed_objects) {
    dat <- rnorm(100, 0, condition$sigma)
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    CIs <- t.test(dat)$conf.int
    names(CIs) <- c('lower', 'upper')
    ret <- c(mean = mean(dat), CIs)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    ret <- c(mu_bias = bias(results[,"mean"], 0),
             mu_coverage = ECR(results[,c("lower", "upper")], parameter = 0))
    ret
}

## Not run: 
# boot_predict supports only one condition at a time
out <- bootPredict(condition=Design[1L, , drop=FALSE],
                   generate=Generate, analyse=Analyse, summarise=Summarise)
out # list of fitted linear model(s)

# extract first meta-statistic
mu_bias <- out$mu_bias

dat <- model.frame(mu_bias)
print(dat)

# original R metric plot
R <- 1 / dat$one_sqrtR^2
plot(R, dat$MSE, type = 'b', ylab = 'MSE', main = "Replications by MSE")

plot(MSE ~ one_sqrtR, dat, main = "Bootstrap prediction plot",
     xlim = c(0, max(one_sqrtR)), ylim = c(0, max(MSE)),
     ylab = 'MSE', xlab = expression(1/sqrt(R)))
beta <- coef(mu_bias)
abline(a = 0, b = beta, lty = 2, col='red')

# what is the replication value when x-axis = .02? What's its associated expected MSE?
1 / .02^2 # number of replications
predict(mu_bias, data.frame(one_sqrtR = .02)) # y-axis value

# approximately how many replications to obtain MSE = .001?
(beta / .001)^2

## End(Not run)
Set RNG sub-stream for Pierre L'Ecuyer's RngStreams
Description
Sets the sub-stream RNG state within a cluster for Pierre L'Ecuyer's (1999) algorithm. Should be used within distributed array jobs after suitable L'Ecuyer (1999) seeds have been distributed to each array, and when each array is further defined to use multi-core processing. See clusterSetRNGStream for further information.
Usage
clusterSetRNGSubStream(cl, seed)Arguments
cl | A cluster from the |
seed | An integer vector of length 7 as given by |
Value
invisible NULL
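A minimal usage sketch (the two-worker local cluster and the freshly generated L'Ecuyer-CMRG seed below are assumptions for illustration; in practice the length-7 seed would be one of the seeds distributed to the array job):

library(parallel)
cl <- makeCluster(2)
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
seed <- .Random.seed   # integer vector of length 7 under L'Ecuyer-CMRG
clusterSetRNGSubStream(cl, seed)
stopCluster(cl)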
Form Column Standard Deviation and Variances
Description
Form column standard deviation and variances for numeric arrays (or data frames).
Usage
colVars(x, na.rm = FALSE, unname = FALSE)colSDs(x, na.rm = FALSE, unname = FALSE)Arguments
x | an array of two dimensions containing numeric, complex, integer or logical values,or a numeric data frame |
na.rm | logical; remove missing values in each respective column? |
unname | logical; apply |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
See Also
Examples
results <- matrix(rnorm(100), ncol=4)
colnames(results) <- paste0('stat', 1:4)

colVars(results)
colSDs(results)

results[1,1] <- NA
colSDs(results)
colSDs(results, na.rm=TRUE)
colSDs(results, na.rm=TRUE, unname=TRUE)
Create the simulation design object
Description
Create a partially or fully-crossed data object reflecting the unique simulation design conditions. Each row of the returned object represents a unique simulation condition, and each column represents the named factor variables under study.
Usage
createDesign( ..., subset, fractional = NULL, tibble = TRUE, stringsAsFactors = FALSE, fully.crossed = TRUE)## S3 method for class 'Design'print(x, list2char = TRUE, pillar.sigfig = 5, show.IDs = FALSE, ...)## S3 method for class 'Design'x[i, j, ..., drop = FALSE]rbindDesign(..., keep.IDs = FALSE)Arguments
... | comma separated list of named input objects representing the simulationfactors to completely cross. Note that these arguments are passed to |
subset | (optional) a logical vector indicating elements or rows to keepto create a partially crossed simulation design |
fractional | a fractional design matrix returned from the |
tibble | logical; return a |
stringsAsFactors | logical; should character variable inputs be coercedto factors when building a |
fully.crossed | logical; create a fully-crossed design object? Setting to |
x | object of class |
list2char | logical; for |
pillar.sigfig | number of significant digits to print. Default is 5 |
show.IDs | logical; print the internally stored Design ID indicators? |
i | row index |
j | column index |
drop | logical; drop to lower dimension class? |
keep.IDs | logical; keep the internal ID variables in the |
Value
a tibble or data.frame containing the simulation experiment conditions to be evaluated in runSimulation
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: 
# modified example from runSimulation()
Design <- createDesign(N = c(10, 20), SD = c(1, 2))
Design

# remove N=10, SD=2 row from initial definition
Design <- createDesign(N = c(10, 20), SD = c(1, 2),
                       subset = !(N == 10 & SD == 2))
Design

# example with list inputs
Design <- createDesign(N = c(10, 20), SD = c(1, 2),
                       combo = list(c(0,0), c(0,0,1)))
Design   # notice levels printed (not typical for tibble)
print(Design, list2char = FALSE)   # standard tibble output

Design <- createDesign(N = c(10, 20), SD = c(1, 2),
                       combo = list(c(0,0), c(0,0,1)),
                       combo2 = list(c(5,10,5), c(6,7)))
Design
print(Design, list2char = FALSE)   # standard tibble output

# design without crossing (inputs taken as-is)
Design <- createDesign(N = c(10, 20), SD = c(1, 2), cross=FALSE)
Design   # only 2 rows

###########
# fractional factorial example
library(FrF2)
# help(FrF2)

# 7 factors in 32 runs
fr <- FrF2(32,7)
dim(fr)
fr[1:6,]

# Create working simulation design given -1/1 combinations
fDesign <- createDesign(sample_size=c(100,200),
                        mean_diff=c(.25, 1, 2),
                        variance.ratio=c(1,4, 8),
                        equal_size=c(TRUE, FALSE),
                        dists=c('norm', 'skew'),
                        same_dists=c(TRUE, FALSE),
                        symmetric=c(TRUE, FALSE),
                        # remove same-normal combo
                        subset = !(symmetric & dists == 'norm'),
                        fractional=fr)
fDesign

## End(Not run)
Expand the simulation design object for array computing
Description
Repeat each design row the specified number of times. This is primarily used for cluster computing where jobs are distributed with batches of replications and later aggregated into a complete simulation object (see runArraySimulation and SimCollect).
Usage
expandDesign(Design, repeat_conditions)Arguments
Design | object created by |
repeat_conditions | integer vector used to repeat each design rowthe specified number of times. Can either be a single integer, which repeatseach row this many times, or an integer vector equal to the number of totalrows in the created object. This argument is useful when distributing independent row conditions tocluster computing environments, particularly with different |
Value
a tibble or data.frame containing the simulation experiment conditions to be evaluated in runSimulation
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
expandReplications,createDesign,SimCollect,runArraySimulation
Examples
## Not run: 
# repeat each row 4 times (for cluster computing)
Design <- createDesign(N = c(10, 20), SD.equal = c(TRUE, FALSE))
Design4 <- expandDesign(Design, 4)
Design4

# repeat first two rows 2x and the rest 4 times (for cluster computing
# where first two conditions are faster to execute)
Design <- createDesign(SD.equal = c(TRUE, FALSE), N = c(10, 100, 1000))
Design24 <- expandDesign(Design, c(2,2,rep(4, 4)))
Design24

## End(Not run)
Expand the replications to match expandDesign
Description
Expands the replication budget to match the expandDesign structure.
Usage
expandReplications(replications, repeat_conditions)Arguments
replications | number of replications. Can be a scalar to reflect the samereplications overall, or a vector of unequal replication budgets. |
repeat_conditions | integer vector used to repeat each design rowthe specified number of times. Can either be a single integer, which repeatseach row this many times, or an integer vector equal to the number of totalrows in the created object. |
Value
an integer vector of the replication budget matching the expanded structure in expandDesign
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
## Not run: 
# repeat each row 4 times (for cluster computing)
Design <- createDesign(N = c(10, 20), SD.equal = c(TRUE, FALSE))
Design4 <- expandDesign(Design, 4)
Design4

# match the replication budget. Target is 1000 replications
(replications4 <- expandReplications(1000, 4))

# hence, evaluate each row in Design4 250 times
cbind(Design4, replications4)

##### Unequal Design intensities
Design24 <- createDesign(SD.equal = c(TRUE, FALSE),
                         N = c(10, 100, 1000))

# split first two conditions into half rows, next two conditions into quarters,
# while N=1000 condition into tenths
expand <- c(2,2,4,4,10,10)
eDesign <- expandDesign(Design24, expand)
eDesign

# target replications is R=1000 per condition
(replications24 <- expandReplications(1000, expand))
cbind(eDesign, replications24)

## End(Not run)
Generate random seeds
Description
Generate seeds to be passed to runSimulation's seed input. Values are sampled from 1 to 2147483647, or are generated using L'Ecuyer-CMRG's (2002) method (returning either a list if arrayID is omitted, or the specific row value from this list if arrayID is included).
Usage
genSeeds(design = 1L, iseed = NULL, arrayID = NULL, old.seeds = NULL)gen_seeds(...)Arguments
design | design matrix that requires a unique seed per condition, ora number indicating the number of seeds to generate. Default generates onenumber |
iseed | the initial |
arrayID | (optional) single integer input corresponding to the specificrow in the |
old.seeds | (optional) vector or matrix of last seeds used inprevious simulations to avoid repeating the same seed on a subsequent run.Note that this approach should be used sparingly as seeds set more frequentlyare more likely to correlate, and therefore provide less optimal randomnumber behaviour (e.g., if performing a simulation on two runs to achieve5000 * 2 = 10,000 replications this is likely reasonable,but for simulations with 100 * 2 = 200 replications this is morelikely to be sub-optimal).Length must be equal to the number of rows in |
... | does nothing |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
Examples
# generate 1 seed (default)
genSeeds()

# generate 5 unique seeds
genSeeds(5)

# generate from nrow(design)
design <- createDesign(factorA=c(1,2,3), factorB=letters[1:3])
seeds <- genSeeds(design)
seeds

# construct new seeds that are independent from original (use this sparingly)
newseeds <- genSeeds(design, old.seeds=seeds)
newseeds

# can be done in batches too
newseeds2 <- genSeeds(design, old.seeds=cbind(seeds, newseeds))
cbind(seeds, newseeds, newseeds2) # all unique

#############
# generate seeds for runArraySimulation()
(iseed <- genSeeds()) # initial seed
seed_list <- genSeeds(design, iseed=iseed)
seed_list

# expand number of unique seeds given iseed (e.g., in case more replications
# are required at a later date)
seed_list_tmp <- genSeeds(nrow(design)*2, iseed=iseed)
str(seed_list_tmp) # first 9 seeds identical to seed_list

# more usefully for HPC, extract only the seed associated with an arrayID
arraySeed.15 <- genSeeds(nrow(design)*2, iseed=iseed, arrayID=15)
arraySeed.15
Get job array ID (e.g., from SLURM or other HPC array distributions)
Description
Get the array ID from an HPC array distribution job (e.g., from SLURM or from optional command line arguments). The array ID is used to index the rows in the design object in runArraySimulation. For instance, a SLURM array with 10 independent jobs might have the following shell instructions.
Usage
getArrayID(type = "slurm", trailingOnly = TRUE, ID.shift = 0L)Arguments
type | an integer indicating the element from the result of |
trailingOnly | logical value passed to |
ID.shift | single integer value used to shift the array ID by a constant.Useful when there are array range limitation that must be specified in theshell files (e.g., array can only be 10000 but there are more rowsin the |
Details
#!/bin/bash -l
#SBATCH --time=00:01:00
#SBATCH --array=1-10

which names the associated jobs with the numbers 1 through 10. getArrayID() then extracts this information per array, which is then passed via runArraySimulation(design, ..., arrayID = getArrayID()) to evaluate only the specific rows of the design object.
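The shell file above would then execute a companion R script along the lines of the following sketch (the replication count, file names, and the Design/Generate/Analyse/Summarise objects are placeholders for illustration):

library(SimDesign)
# Design, Generate, Analyse, and Summarise assumed to be defined earlier in the script
arrayID <- getArrayID(type = 'slurm')
runArraySimulation(design=Design, replications=100,
                   generate=Generate, analyse=Analyse, summarise=Summarise,
                   iseed=1234, arrayID=arrayID,
                   dirname='sim_files', filename='job')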
See Also
Examples
## Not run: 
# get slurm array ID
arrayID <- getArrayID()

# get ID based on first optional argument in shell specification
arrayID <- getArrayID(type = 1)

# pass to
# runArraySimulation(design, ...., arrayID = arrayID)

# increase detected arrayID by constant 9999 (for array specification limitations)
arrayID <- getArrayID(ID.shift=9999)

## End(Not run)
List All Available Notifiers
Description
Automatically detects all S3 classes that have a specialized notify() method (like notify.MyNotifier) and prints them as a character vector of class names (e.g., "PushbulletNotifier", "TelegramNotifier").
Note that only classes defined and loaded at the time you call this function will appear. If you just created a new notifier in another file or package, ensure it is sourced/loaded first.
Usage
listAvailableNotifiers()Value
A character vector of class names that havenotify.<ClassName> methods.
Examples
## Not run: 
listAvailableNotifiers()
# [1] "PushbulletNotifier" "TelegramNotifier"
## End(Not run)

Increase the intensity or suppress the output of an observed message
Description
Function provides more nuanced management of known messages that appear in function calls outside the front-end user's control (e.g., functions written in third-party packages). Specifically, this function provides a less nuclear approach than quiet and friends, which suppress all cat and message output raised, and instead allows specific messages to be raised either to warnings or, even more extremely, to errors. Note that for messages that are not suppressed the order in which the output and message calls appear in the original function is not retained.
Usage
manageMessages( expr, allow = NULL, message2warning = NULL, message2error = NULL, ...)Arguments
expr | expression to be evaluated (e.g., ret <- |
allow | (optional) a |
message2warning | (optional) Input can be a |
message2error | (optional) Input can be a |
... | additional arguments passed to |
Value
returns the original result ofeval(expr), with warningmessages either left the same, increased to errors, or suppressed (dependingon the input specifications)
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
See Also
Examples
## Not run: myfun <- function(x, warn=FALSE){ message('This function is rather chatty') cat("It even prints in different output forms!\n") message('And even at different ') cat(" many times!\n") cat("Too many messages can be annoying \n") if(warn) warning('It may even throw warnings ') x}out <- myfun(1)out# tell the function to shhhhout <- quiet(myfun(1))out# same default behaviour as quiet(), but potential for nuanceout2 <- manageMessages(myfun(1))identical(out, out2)# allow some messages to still get printedout2 <- manageMessages(myfun(1), allow = "many times!")out2 <- manageMessages(myfun(1), allow = "This function is rather chatty")# note: . matches single character (regex)out2 <- manageMessages(myfun(1), allow = c("many times.", "This function is rather chatty"))# convert specific message to warningout3 <- manageMessages(myfun(1), message2warning = "many times!")identical(out, out3)# other warnings also get throughout3 <- manageMessages(myfun(1, warn=TRUE), message2warning = "times!")identical(out, out3)# convert message to errormanageMessages(myfun(1), message2error = "m... times!")# multiple message intensity changesmanageMessages(myfun(1), message2warning = "It even prints in different output forms", message2error = "many times!")manageMessages(myfun(1), allow = c("This function is rather chatty", "Too many messages can be annoying"), message2warning = "It even prints in different output forms", message2error = "many times!")## End(Not run)Manage specific warning messages
Description
Function provides more nuanced management of known warning messages that appear in function calls outside the front-end user's control (e.g., functions written in third-party packages). Specifically, this function provides a less nuclear approach than suppressWarnings, which suppresses all warning messages rather than only those known to be innocuous to the current application, and than globally setting options(warn=2), which has the opposite effect of treating all warning messages as errors during function execution. To avoid these two extreme behaviours, character vectors can instead be supplied to this function to either leave the raised warnings as-is (default behaviour), raise only specific warning messages to errors, or specify specific warning messages that can generally be ignored (and therefore suppressed) while allowing new or yet to be discovered warnings to still be raised.
Usage
manageWarnings(expr, warning2error = FALSE, suppress = NULL, ...)Arguments
expr | expression to be evaluated (e.g., ret <- |
warning2error |
Alternatively, and more useful for specificity reasons,input can be a |
suppress | a |
... | additional arguments passed to |
Details
In general, global/nuclear handling of warning messages should be avoided as it is generally bad practice. On one extreme, when suppressing all warning messages using suppressWarnings, potentially important warning messages will become muffled, which can be problematic if the code developer has not become aware of these (now muffled) warnings. Moreover, this can become a long-term sustainability issue when third-party functions that the developer's code depends upon throw new warnings in the future, as the code developer will be less likely to become aware of these newly implemented warnings.
On the other extreme, where all warning messages are turned into errors using options(warn=2), innocuous warning messages can and will be (unwantedly) raised to errors. This negatively affects the logical workflow of the developer's functions, where more error messages must now be manually managed (e.g., via tryCatch), including the known-to-be-innocuous warning messages as these will now be considered errors.
To avoid these extremes, front-end users should first make note of the warning messages that have been raised in their prior executions, and organize these messages into vectors of ignorable warnings (least severe), known/unknown warnings that should remain as warnings (even if not yet known by the code developer), and explicit warnings that ought to be considered errors for the current application (most severe). Once collected, these can be passed to the respective warning2error argument to increase the intensity of a specific raised warning, or to the suppress argument to suppress only the messages that have been deemed ignorable a priori (thereby allowing all other warning messages to be raised).
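As a concrete sketch of this triage (the message strings and the analysis function below are hypothetical placeholders, not taken from this documentation):

# warning messages collected and classified from earlier runs
ignorable  <- c("some known, innocuous warning")
must_error <- c("estimation failed", "did not converge")

ret <- my_analysis(dat) |>                   # assumed user analysis call
    manageWarnings(warning2error = must_error,
                   suppress = ignorable)
# any warning matching neither vector is raised as a regular warning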
Value
returns the original result ofeval(expr), with warningmessages either left the same, increased to errors, or suppressed (dependingon the input specifications)
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
See Also
Examples
## Not run: fun <- function(warn1=FALSE, warn2=FALSE, warn3=FALSE, warn_trailing = FALSE, error=FALSE){ if(warn1) warning('Message one') if(warn2) warning('Message two') if(warn3) warning('Message three') if(warn_trailing) warning(sprintf('Message with lots of random trailings: %s', paste0(sample(letters, sample(1:20, 1)), collapse=','))) if(error) stop('terminate function call') return('Returned from fun()')}# normal run (no warnings or errors)out <- fun()out# these are all the samemanageWarnings(out <- fun())out <- manageWarnings(fun())out <- fun() |> manageWarnings()# errors treated normallyfun(error=TRUE)fun(error=TRUE) |> manageWarnings()# all warnings/returns treated normally by defaultret1 <- fun(warn1=TRUE)ret2 <- fun(warn1=TRUE) |> manageWarnings()identical(ret1, ret2)# all warnings converted to errors (similar to options(warn=2), but local)fun(warn1=TRUE) |> manageWarnings(warning2error=TRUE)fun(warn2=TRUE) |> manageWarnings(warning2error=TRUE)# Specific warnings treated as errors (others stay as warnings)# Here, treat first warning message as error but not the second or thirdret <- fun(warn1=TRUE) # warningret <- fun(warn1=TRUE) |> manageWarnings("Message one") # now errorret <- fun(warn2=TRUE) |> manageWarnings("Message one") # still a warning# multiple warnings raised but not converted as they do not match criteriafun(warn2=TRUE, warn3=TRUE)fun(warn2=TRUE, warn3=TRUE) |> manageWarnings("Message one")# Explicitly convert multiple warning messages, allowing others through.# This is generally the best use of the function's specificityfun(warn1=TRUE, warn2=TRUE)fun(warn1=TRUE) |> # error given either message manageWarnings(c("Message one", "Message two"))fun(warn2=TRUE) |> manageWarnings(c("Message one", "Message two"))# last warning gets through (left as valid warning)ret <- fun(warn3=TRUE) |> manageWarnings(c("Message one", "Message two"))ret# suppress warnings that have only partial matchingfun(warn_trailing=TRUE)fun(warn_trailing=TRUE)fun(warn_trailing=TRUE)# partial match, therefore suppressedfun(warn_trailing=TRUE) |> manageWarnings(suppress="Message with lots of random trailings: ")# multiple suppress stringsfun(warn_trailing=TRUE) |> manageWarnings(suppress=c("Message with lots of random trailings: ", "Suppress this too"))# could also use .* to catch all remaining characters (finer regex control)fun(warn_trailing=TRUE) |> manageWarnings(suppress="Message with lots of random trailings: .*")############ Combine with quiet() and suppress argument to suppress innocuous messagesfun <- function(warn1=FALSE, warn2=FALSE, warn3=FALSE, error=FALSE){ message('This function is rather chatty') cat("It even prints in different output forms!\n") if(warn1) warning('Message one') if(warn2) warning('Message two') if(warn3) warning('Message three') if(error) stop('terminate function call') return('Returned from fun()')}# normal run (no warnings or errors, but messages)out <- fun()out <- quiet(fun()) # using "indoor voice"# suppress all print messages and warnings (not recommended)fun(warn2=TRUE) |> quiet()fun(warn2=TRUE) |> quiet() |> suppressWarnings()# convert all warning to errors, and keep suppressing messages via quiet()fun(warn2=TRUE) |> quiet() |> manageWarnings(warning2error=TRUE)# define tolerable warning messages (only warn1 deemed ignorable)ret <- fun(warn1=TRUE) |> quiet() |> manageWarnings(suppress = 'Message one')# all other warnings raised to an error except ignorable onesfun(warn1=TRUE, warn2=TRUE) |> quiet() |> manageWarnings(warning2error=TRUE, suppress = 'Message one')# 
only warn2 raised to an error explicitly (warn3 remains as warning)ret <- fun(warn1=TRUE, warn3=TRUE) |> quiet() |> manageWarnings(warning2error = 'Message two', suppress = 'Message one')fun(warn1=TRUE, warn2 = TRUE, warn3=TRUE) |> quiet() |> manageWarnings(warning2error = 'Message two', suppress = 'Message one')############################ Practical example, converting warning into error for model that# failed to converged normally library(lavaan)## The industrialization and Political Democracy Example## Bollen (1989), page 332model <- ' # latent variable definitions ind60 =~ x1 + x2 + x3 dem60 =~ y1 + a*y2 + b*y3 + c*y4 dem65 =~ y5 + a*y6 + b*y7 + c*y8 # regressions dem60 ~ ind60 dem65 ~ ind60 + dem60 # residual correlations y1 ~~ y5 y2 ~~ y4 + y6 y3 ~~ y7 y4 ~~ y8 y6 ~~ y8'# throws a warningfit <- sem(model, data = PoliticalDemocracy, control=list(iter.max=60))# for a simulation study, often better to treat this as an errorfit <- sem(model, data = PoliticalDemocracy, control=list(iter.max=60)) |> manageWarnings(warning2error = "the optimizer warns that a solution has NOT been found!")## End(Not run)Auto-named Concatenation of Vector or List
Description
This is a wrapper to the function c, however it names the respective elements according to their input object name. For this reason, nesting nc() calls is not recommended (joining independent nc() calls via c() is however reasonable).
Usage
nc(..., use.names = FALSE, error.on.duplicate = TRUE)Arguments
... | objects to be concatenated |
use.names | logical indicating if |
error.on.duplicate | logical; if the same object name appears in the returning object should an error be thrown? Default is TRUE |
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
A <- 1B <- 2C <- 3names(C) <- 'LetterC'# compare the followingc(A, B, C) # unnamednc(A, B, C) # namednc(this=A, B, C) # respects override named (same as c() )nc(this=A, B, C, use.names = TRUE) # preserve original name## Not run: # throws errors if names not uniquenc(this=A, this=B, C)nc(LetterC=A, B, C, use.names=TRUE)## End(Not run)# poor input choice namesnc(t.test(c(1:2))$p.value, t.test(c(3:4))$p.value)# better to explicitly provide namenc(T1 = t.test(c(1:2))$p.value, T2 = t.test(c(3:4))$p.value)# vector of unnamed inputsA <- c(5,4,3,2,1)B <- c(100, 200)nc(A, B, C) # A's and B's numbered uniquelyc(A, B, C) # comparenc(beta=A, B, C) # replacement of object name# retain names attributes (but append object name, when appropriate)names(A) <- letters[1:5]nc(A, B, C)nc(beta=A, B, C)nc(A, B, C, use.names=TRUE)# mix and match if some named elements work while others do notc( nc(A, B, use.names=TRUE), nc(C))## Not run: # error, 'b' appears twicenames(B) <- c('b', 'b2')nc(A, B, C, use.names=TRUE)## End(Not run)# List inputA <- list(1)B <- list(2:3)C <- list('C')names(C) <- 'LetterC'# compare the followingc(A, B, C) # unnamednc(A, B, C) # namednc(this=A, B, C) # respects override named (same as c() and list() )nc(this=A, B, C, use.names = TRUE) # preserve original nameCreate a Pushbullet Notifier
Description
Constructs a notifier object for sending messages via Pushbullet. This requires a Pushbullet account, the Pushbullet application installed on both a mobile device and computer, and a properly configured JSON file (typically ~/.rpushbullet.json, using RPushbullet::pbSetup()).
Usage
new_PushbulletNotifier( config_path = "~/.rpushbullet.json", verbose_issues = FALSE)Arguments
config_path | A character string specifying the path to the Pushbullet configuration file. Defaults to ~/.rpushbullet.json |
verbose_issues | Logical. If TRUE, provides detailed information about warnings and errors in the notifications |
Details
To use RPushbullet in SimDesign, create a PushbulletNotifier object using new_PushbulletNotifier() and pass it to the notifier argument in runSimulation().
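For instance, a minimal sketch of this workflow (assuming the Design object and the Generate/Analyse/Summarise functions have already been defined elsewhere):

pushbullet_notifier <- new_PushbulletNotifier(verbose_issues = TRUE)

res <- runSimulation(design=Design, replications=1000,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     notifier=pushbullet_notifier)  # notifications sent via Pushbullet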
Value
An S3 object of class"PushbulletNotifier" and"Notifier".
Examples
# Create a Pushbullet notifier (requires a valid configuration file)
pushbullet_notifier <- new_PushbulletNotifier(verbose_issues = TRUE)

Create a Telegram Notifier
Description
Constructs a notifier object for sending messages via Telegram.Requires a valid Telegram bot token and chat ID.
Usage
new_TelegramNotifier(bot_token, chat_id, verbose_issues = FALSE)Arguments
bot_token | A character string representing your Telegram bot token, typically something like "123456:ABC-xyz" |
chat_id | A character string or numeric representing the chat/group to sendmessages to. |
verbose_issues | Logical. If TRUE, provides detailed information about warnings and errors in the notifications. |
Details
To send notifications over Telegram with httr in SimDesign, install httr, set up a Telegram bot, and obtain a bot token and chat ID. For more information, see the Telegram Bots API. Then use the new_TelegramNotifier() function to create a TelegramNotifier object and pass it to the notifier argument in runSimulation().
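Concretely, the hand-off looks something like the following sketch (the token and chat ID are placeholders, and Design plus the Generate/Analyse/Summarise functions are assumed to exist):

telegram_notifier <- new_TelegramNotifier(bot_token = "123456:ABC-xyz",
                                          chat_id = "987654321")

res <- runSimulation(design=Design, replications=1000,
                     generate=Generate, analyse=Analyse, summarise=Summarise,
                     notifier=telegram_notifier)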
Value
An S3 object of class"TelegramNotifier".
Examples
# Create a Telegram notifier (requires setting up a Telegram Bot)
telegram_notifier <- new_TelegramNotifier(bot_token = "123456:ABC-xyz", chat_id = "987654321")

Send a simulation notification
Description
Package extensions can implement custom notifiers by creating S3 methodsfor this generic.
Usage
notify(notifier, event, event_data)Arguments
notifier | The notifier object |
event | Character string indicating the notification trigger ("condition" or "complete") |
event_data | List containing context information for the notification |
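For example, a user-defined extension might look like the following sketch (the class name, constructor, and mail-sending helper are hypothetical and only meant to illustrate the S3 contract):

# hypothetical constructor returning an object that inherits from "Notifier"
new_EmailNotifier <- function(address){
  structure(list(address = address), class = c("EmailNotifier", "Notifier"))
}

# S3 method dispatched when a notification event is triggered
notify.EmailNotifier <- function(notifier, event, event_data){
  msg <- sprintf("SimDesign event '%s' (%s)", event,
                 paste(names(event_data), collapse = ", "))
  # send_email(notifier$address, msg)  # replace with a real mail-sending call
  invisible(NULL)
}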
S3 method to send notifications via Pushbullet
Description
S3 method to send notifications via Pushbullet
Usage
## S3 method for class 'PushbulletNotifier'
notify(notifier, event, event_data)

Arguments
notifier | A PushbulletNotifier object created with new_PushbulletNotifier() |
event | Character string indicating the notification trigger ("condition" or "complete") |
event_data | List containing context information for the notification |
Value
Invisibly returns NULL
S3 method to send notifications through the Telegram API.
Description
S3 method to send notifications through the Telegram API.
Usage
## S3 method for class 'TelegramNotifier'
notify(notifier, event, event_data)

Arguments
notifier | A TelegramNotifier object created with new_TelegramNotifier() |
event | Character string indicating the notification trigger ("condition" or "complete") |
event_data | List containing context information for the notification |
Value
Invisibly returns NULL
Suppress verbose function messages
Description
This function is used to suppress information printed from external functions that make internal use of message and cat, which provide information in interactive R sessions. For simulations, the session is not interactive, and therefore this type of output should be suppressed. For similar behaviour for suppressing warning messages, see manageWarnings.
Usage
quiet(..., cat = TRUE, keep = FALSE, attr.name = "quiet.messages")Arguments
... | the functional expression to be evaluated |
cat | logical; also capture calls from cat? |
keep | logical; return a character vector of the message and concatenate-and-print (cat) strings as an attribute to the resulting object from expr? |
attr.name | attribute name to use when keep = TRUE |
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
myfun <- function(x, warn=FALSE){ message('This function is rather chatty') cat("It even prints in different output forms!\n") message('And even at different....') cat("...times!\n") if(warn) warning('It may even throw warnings!') x}out <- myfun(1)out# tell the function to shhhhout <- quiet(myfun(1))out# which messages are suppressed? Extract stored attributeout <- quiet(myfun(1), keep = TRUE)attr(out, 'quiet.messages')# Warning messages still get through (see manageWarnings(suppress)# for better alternative than using suppressWarnings())out2 <- myfun(2, warn=TRUE) |> quiet() # warning gets throughout2# suppress warning message explicitly, allowing others to be raised if presentmyfun(2, warn=TRUE) |> quiet() |> manageWarnings(suppress='It may even throw warnings!')Generate non-normal data with Headrick's (2002) method
Description
Generate multivariate non-normal distributions using the fifth-order polynomial method described by Headrick (2002).
Usage
rHeadrick( n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)), skew = rep(0, nrow(sigma)), kurt = rep(0, nrow(sigma)), gam3 = NaN, gam4 = NaN, return_coefs = FALSE, coefs = NULL, control = list(trace = FALSE, max.ntry = 15, obj.tol = 1e-10, n.valid.sol = 1))Arguments
n | number of samples to draw |
mean | a vector of k elements for the mean of the variables |
sigma | desired k x k covariance matrix between bivariate non-normal variables |
skew | a vector of k elements for the skewness of the variables |
kurt | a vector of k elements for the kurtosis of the variables |
gam3 | (optional) explicitly supply the gamma 3 value? Default computes this internally |
gam4 | (optional) explicitly supply the gamma 4 value? Default computes this internally |
return_coefs | logical; return the estimated coefficients only? See below regarding why this is useful. |
coefs | (optional) supply previously estimated coefficients? This is useful when there must be multiple data sets drawn and will avoid repetitive computations. Must be the object returned after passing return_coefs = TRUE |
control | a list of control parameters when locating the polynomial coefficients |
Details
This function is primarily a wrapper for the code written by Oscar L. Olvera Astivia (last edited Feb 26, 2015) with some modifications (e.g., better starting values for the Newton optimizer, passing previously saved coefs, etc).
Author(s)
Oscar L. Olvera Astivia and Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Headrick, T. C. (2002). Fast fifth-order polynomial transforms for generating univariate andmultivariate nonnormal distributions.Computational Statistics & Data Analysis, 40, 685-711.
Olvera Astivia, O. L., & Zumbo, B. D. (2015). A Cautionary Note on the Use of the Vale and MaurelliMethod to Generate Multivariate, Nonnormal Data for Simulation Purposes.Educational and Psychological Measurement, 75, 541-567.
Examples
## Not run: set.seed(1)N <- 200mean <- c(rep(0,4))Sigma <- matrix(.49, 4, 4)diag(Sigma) <- 1skewness <- c(rep(1,4))kurtosis <- c(rep(2,4))nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis)# cor(nonnormal)# psych::describe(nonnormal)#-----------# compute the coefficients, then supply them back to the function to avoid# extra computationscfs <- rHeadrick(N, mean, Sigma, skewness, kurtosis, return_coefs = TRUE)cfs# comparesystem.time(nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis))system.time(nonnormal <- rHeadrick(N, mean, Sigma, skewness, kurtosis, coefs=cfs))## End(Not run)Generate non-normal data with Vale & Maurelli's (1983) method
Description
Generate multivariate non-normal distributions using the third-order polynomial method described by Vale & Maurelli (1983). If only a single variable is generated then this function is equivalent to the method described by Fleishman (1978).
Usage
rValeMaurelli( n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)), skew = rep(0, nrow(sigma)), kurt = rep(0, nrow(sigma)))Arguments
n | number of samples to draw |
mean | a vector of k elements for the mean of the variables |
sigma | desired k x k covariance matrix between bivariate non-normal variables |
skew | a vector of k elements for the skewness of the variables |
kurt | a vector of k elements for the kurtosis of the variables |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Fleishman, A. I. (1978). A method for simulating non-normal distributions.Psychometrika, 43, 521-532.
Vale, C. & Maurelli, V. (1983). Simulating multivariate nonnormal distributions.Psychometrika, 48(3), 465-471.
Examples
set.seed(1)# univariate with skewnonnormal <- rValeMaurelli(10000, mean=10, sigma=5, skew=1, kurt=3)# psych::describe(nonnormal)# multivariate with skew and kurtosisn <- 10000r12 <- .4r13 <- .9r23 <- .1cor <- matrix(c(1,r12,r13,r12,1,r23,r13,r23,1),3,3)sk <- c(1.5,1.5,0.5)ku <- c(3.75,3.5,0.5)nonnormal <- rValeMaurelli(n, sigma=cor, skew=sk, kurt=ku)# cor(nonnormal)# psych::describe(nonnormal)Combine two separate SimDesign objects by row
Description
This function combines two Monte Carlo simulations executed by SimDesign's runSimulation function which, for all intents and purposes, could have been executed in a single run. This situation arises when a simulation has been completed, however the Design object was later modified to include more levels in the defined simulation factors. Rather than re-executing the previously completed simulation combinations, only the new combinations need to be evaluated into a different object and then rbind-ed together to create the complete object combinations.
Usage
## S3 method for class 'SimDesign'rbind(...)Arguments
... | two or more |
Value
same object that is returned byrunSimulation
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
## Not run: # modified example from runSimulation()Design <- createDesign(N = c(10, 20), SD = c(1, 2))Generate <- function(condition, fixed_objects) { dat <- with(condition, rnorm(N, 10, sd=SD)) dat}Analyse <- function(condition, dat, fixed_objects) { ret <- mean(dat) # mean of the sample data vector ret}Summarise <- function(condition, results, fixed_objects) { ret <- c(mu=mean(results), SE=sd(results)) # mean and SD summary of the sample means ret}Final1 <- runSimulation(design=Design, replications=1000, generate=Generate, analyse=Analyse, summarise=Summarise)Final1#### later decide that N = 30 should have also been investigated. Rather than# running the following object ....newDesign <- createDesign(N = c(10, 20, 30), SD = c(1, 2))# ... only the new subset levels are executed to save timesubDesign <- subset(newDesign, N == 30)subDesignFinal2 <- runSimulation(design=subDesign, replications=1000, generate=Generate, analyse=Analyse, summarise=Summarise)Final2# glue results together by row into one object as though the complete 'Design'# object were run all at onceFinal <- rbind(Final1, Final2)Finalsummary(Final)## End(Not run)Run a summarise step for results that have been saved to the hard drive
Description
When runSimulation() uses the option save_results = TRUE the R replication results from the Generate-Analyse functions are stored to the hard drive. As such, additional summarise components may be required at a later time, whereby the respective .rds files must be read back into R to be summarised. This function performs the reading of these files, application of a provided summarise function, and final collection of the respective results.
Usage
reSummarise( summarise, dir = NULL, files = NULL, results = NULL, Design = NULL, fixed_objects = NULL, boot_method = "none", boot_draws = 1000L, CI = 0.95, prefix = "results-row")Arguments
summarise | a summarise function to apply to the read-in files.See Note that if the simulation containedonly one row then the new summarise function can be defined as either |
dir | directory pointing to the .rds files to beread-in that were saved from |
files | (optional) names of files to read-in. If |
results | (optional) the results of Alternatively, if |
Design | (optional) if |
fixed_objects | (optional) see |
boot_method | method for performing non-parametric bootstrap confidence intervalsfor the respective meta-statistics computed by the |
boot_draws | number of non-parametric bootstrap draws to sample for the |
CI | bootstrap confidence interval level (default is 95%) |
prefix | character indicating prefix used for stored files |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
Design <- createDesign(N = c(10, 20, 30))Generate <- function(condition, fixed_objects) { dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5) dat}Analyse <- function(condition, dat, fixed_objects) { ret <- c(mean=mean(dat), median=median(dat)) # mean/median of sample data ret}Summarise <- function(condition, results, fixed_objects){ colMeans(results)}## Not run: # run the simulationrunSimulation(design=Design, replications=50, generate=Generate, analyse=Analyse, summarise=Summarise, save_results=TRUE, save_details = list(save_results_dirname='simresults'))res <- reSummarise(Summarise, dir = 'simresults/')resSummarise2 <- function(condition, results, fixed_objects){ ret <- c(mean_ests=colMeans(results), SE=colSDs(results)) ret}res2 <- reSummarise(Summarise2, dir = 'simresults/')res2SimClean(dir='simresults/')## End(Not run)#### Similar, but with results stored within the final objectres <- runSimulation(design=Design, replications=50, store_results = TRUE, generate=Generate, analyse=Analyse, summarise=Summarise)res# same summarise but with bootstrappingres2 <- reSummarise(Summarise, results = res, boot_method = 'basic')res2Rejection sampling (i.e., accept-reject method)
Description
This function supports the rejection sampling (i.e., accept-reject) approach to drawing values from seemingly difficult (probability) density functions by sampling values from more manageable proxy distributions.
Usage
rejectionSampling( n, df, dg, rg, M, method = "optimize", interval = NULL, logfuns = FALSE, maxM = 1e+05, parstart = rg(1L), ESRS_Mstart = 1.0001)Arguments
n | number of samples to draw |
df | the desired (potentially un-normed) density function to drawindependent samples from. Must be in the form of a |
dg | the proxy (potentially un-normed) density function todraw samples from in lieu of drawing samples from |
rg | the proxy random number generation function, associated with |
M | the upper-bound of the ratio of probability density functions to helpminimize the number of discarded draws and define the correspondingrescaled proposal envelope. When missing, |
method | when M is missing, the optimization of M is done either byfinding the mode of the log-density values ( |
interval | when M is missing, for univariate density function draws,the interval to search within via |
logfuns | logical; have the |
maxM | if, when optimizing M, the value is greater than this cut-off then stop; the sampler would likely be too inefficient, or the optimization is failing |
parstart | starting value vector for optimization of M inmultidimensional distributions |
ESRS_Mstart | starting M value for the ESRS algorithm |
Details
The accept-reject algorithm is a flexible approach to obtaining i.i.d.'s from a difficult-to-sample-from (probability) density function where either the transformation method fails or the inverse transform method is difficult to manage. The algorithm does so by sampling from a more "well-behaved" proxy distribution (with identical support, up to some proportionality constant M that reshapes the proposal density to envelope the target density), and accepts the draws if they are likely within the target density. Hence, the closer the shape of dg(x) is to the desired df(x), the more likely the draws are to be accepted; otherwise, many iterations of the accept-reject algorithm may be required, which decreases the computational efficiency.
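In compact form (a standard statement of the algorithm rather than a quote from this documentation): draw a candidate x from dg and an independent u from Uniform(0, 1), and accept x as a draw from df whenever

u <= df(x) / (M * dg(x)),

otherwise discard x and draw again. When df and dg are both properly normed the expected acceptance probability is 1/M, which is why a small (but still valid) M is preferable.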
Value
returns a vector or matrix of draws (corresponding to theoutput class fromrg) from the desireddf
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Caffo, B. S., Booth, J. G., and Davison, A. C. (2002). Empirical supremumrejection sampling.Biometrika, 89, 745–754.
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and ReliableMonte Carlo Simulations with the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statisticswith Monte Carlo simulation.Journal of Statistics Education, 24(3),136-156.doi:10.1080/10691898.2016.1246953
Examples
## Not run: # Generate X ~ beta(a,b), where a and b are a = 2.7 and b = 6.3,# and the support is Y ~ Unif(0,1)dfn <- function(x) dbeta(x, shape1 = 2.7, shape2 = 6.3)dgn <- function(x) dunif(x, min = 0, max = 1)rgn <- function(n) runif(n, min = 0, max = 1)# when df and dg both integrate to 1, acceptance probability = 1/MM <- rejectionSampling(df=dfn, dg=dgn, rg=rgn)Mdat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M)hist(dat, 100)hist(rbeta(10000, 2.7, 6.3), 100) # compare# obtain empirical estimate of M via ESRS methodM <- rejectionSampling(1000, df=dfn, dg=dgn, rg=rgn, method='ESRS')M# generate using better support function (here, Y ~ beta(2,6)),# and use log setup in initial calls (more numerically accurate)dfn <- function(x) dbeta(x, shape1 = 2.7, shape2 = 6.3, log = TRUE)dgn <- function(x) dbeta(x, shape1 = 2, shape2 = 6, log = TRUE)rgn <- function(n) rbeta(n, shape1 = 2, shape2 = 6)M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn, logfuns=TRUE) # better MM## Alternative estimation of M## M <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, logfuns=TRUE,## method='ESRS')dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M, logfuns=TRUE)hist(dat, 100)#------------------------------------------------------# sample from wonky (and non-normalized) density function, like belowdfn <- function(x){ ret <- numeric(length(x)) ret[x <= .5] <- dnorm(x[x <= .5]) ret[x > .5] <- dnorm(x[x > .5]) + dchisq(x[x > .5], df = 2) ret}y <- seq(-5,5, length.out = 1000)plot(y, dfn(y), type = 'l', main = "Function to sample from")# choose dg/rg functions that have support within the range [-inf, inf]rgn <- function(n) rnorm(n, sd=4)dgn <- function(x) dnorm(x, sd=4)## example M height from above graphic## (M selected using ESRS to help stochastically avoid local mins)M <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, method='ESRS')Mlines(y, dgn(y)*M, lty = 2)dat <- rejectionSampling(10000, df=dfn, dg=dgn, rg=rgn, M=M)hist(dat, 100, prob=TRUE)# true density (normalized)C <- integrate(dfn, -Inf, Inf)$valuendfn <- function(x) dfn(x) / Ccurve(ndfn, col='red', lwd=2, add=TRUE)#-----------------------------------------------------# multivariate distributiondfn <- function(x) sum(log(c(dnorm(x[1]) + dchisq(x[1], df = 5), dnorm(x[2], -1, 2))))rgn <- function(n) c(rnorm(n, sd=3), rnorm(n, sd=3))dgn <- function(x) sum(log(c(dnorm(x[1], sd=3), dnorm(x[1], sd=3))))# M <- rejectionSampling(df=dfn, dg=dgn, rg=rgn, logfuns=TRUE)dat <- rejectionSampling(5000, df=dfn, dg=dgn, rg=rgn, M=4.6, logfuns=TRUE)hist(dat[,1], 30)hist(dat[,2], 30)plot(dat)## End(Not run)Generate integer values within specified range
Description
Efficiently generate positive and negative integer values with (default) or without replacement. This function is mainly a wrapper to the sample.int function (which itself is a much more efficient integer sampler than the more general sample), however it is intended to work with both positive and negative integer ranges since sample.int only returns positive integer values that must begin at 1L.
Usage
rint(n, min, max, replace = TRUE, prob = NULL)Arguments
n | number of samples to draw |
min | lower limit of the distribution. Must be finite |
max | upper limit of the distribution. Must be finite |
replace | should sampling be with replacement? |
prob | a vector of probability weights for obtaining the elements of the vector being sampled |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
set.seed(1)# sample 1000 integer values within 20 to 100x <- rint(1000, min = 20, max = 100)summary(x)# sample 1000 integer values within 100 to 10 billionx <- rint(1000, min = 100, max = 1e8)summary(x)# compare speed to sample()system.time(x <- rint(1000, min = 100, max = 1e8))system.time(x2 <- sample(100:1e8, 1000, replace = TRUE))# sample 1000 integer values within -20 to 20x <- rint(1000, min = -20, max = 20)summary(x)Generate data with the inverse Wishart distribution
Description
Function generates data in the form of symmetric matrices from the inverse Wishart distribution given a covariance matrix and degrees of freedom.
Usage
rinvWishart(n = 1, df, sigma)Arguments
n | number of matrix observations to generate. By default n = 1, which returns a single matrix |
df | degrees of freedom |
sigma | positive definite covariance matrix |
Value
a numeric matrix with columns equal toncol(sigma) whenn = 1, or a listofn matrices with the same properties
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
# random inverse Wishart matrix given variances [3,6], covariance 2, and df=15
sigma <- matrix(c(3,2,2,6), 2, 2)
x <- rinvWishart(sigma = sigma, df = 15)
x

# list of matrices
x <- rinvWishart(20, sigma = sigma, df = 15)
x

Generate data with the multivariate g-and-h distribution
Description
Generate non-normal distributions using the multivariate g-and-h distribution. Can be used to generate several different classes of univariate and multivariate distributions.
Usage
rmgh(n, g, h, mean = rep(0, length(g)), sigma = diag(length(mean)))Arguments
n | number of samples to draw |
g | the g parameter(s) which control the skew of a distribution in terms of both direction and magnitude |
h | the h parameter(s) which control the tail weight or elongation of a distribution and is positively related with kurtosis |
mean | a vector of k elements for the mean of the variables |
sigma | desired k x k covariance matrix between bivariate non-normal variables |
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
Examples
set.seed(1)# univariatenorm <- rmgh(10000,1e-5,0)hist(norm)skew <- rmgh(10000,1/2,0)hist(skew)neg_skew_platykurtic <- rmgh(10000,-1,-1/2)hist(neg_skew_platykurtic)# multivariatesigma <- matrix(c(2,1,1,4), 2)mean <- c(-1, 1)twovar <- rmgh(10000, c(-1/2, 1/2), c(0,0), mean=mean, sigma=sigma)hist(twovar[,1])hist(twovar[,2])plot(twovar)Generate data with the multivariate normal (i.e., Gaussian) distribution
Description
Function generates data from the multivariate normal distribution given some mean vector and/or covariance matrix.
Usage
rmvnorm(n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)))Arguments
n | number of observations to generate |
mean | mean vector, default is a vector of zeros with length equal to nrow(sigma) |
sigma | positive definite covariance matrix, default is an identity matrix with dimensions equal to length(mean) |
Value
a numeric matrix with columns equal tolength(mean)
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
# random normal values with mean [5, 10] and variances [3,6], and covariance 2
sigma <- matrix(c(3,2,2,6), 2, 2)
mu <- c(5,10)
x <- rmvnorm(1000, mean = mu, sigma = sigma)
head(x)
summary(x)
plot(x[,1], x[,2])

Generate data with the multivariate t distribution
Description
Function generates data from the multivariate t distribution given a covariance matrix, non-centrality parameter (or mode), and degrees of freedom.
Usage
rmvt(n, sigma, df, delta = rep(0, nrow(sigma)), Kshirsagar = FALSE)Arguments
n | number of observations to generate |
sigma | positive definite covariance matrix |
df | degrees of freedom. |
delta | the vector of non-centrality parameters of length nrow(sigma) |
Kshirsagar | logical; triggers whether to generate data with non-centrality parametersor to adjust the simulated data to the mode of the distribution. The default uses the mode |
Value
a numeric matrix with columns equal toncol(sigma)
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with MonteCarlo simulation.Journal of Statistics Education, 24(3), 136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
# random t values given variances [3,6], covariance 2, and df = 15
sigma <- matrix(c(3,2,2,6), 2, 2)
x <- rmvt(1000, sigma = sigma, df = 15)
head(x)
summary(x)
plot(x[,1], x[,2])

Generate a random set of values within a truncated range
Description
Function generates values from a supplied random number generating function that are constructed to fall within a particular range. Sampled values outside this range are discarded and re-sampled until the desired criteria have been met.
Usage
rtruncate(n, rfun, range, ..., redraws = 100L)Arguments
n | number of observations to generate. This should be the first argument passed to rfun |
rfun | a function to generate random values. Function can return a numeric/integer vector or matrix, and additional arguments required for this function are passed through the ... argument |
range | a numeric vector of length two, where the first element indicates the lower bound and the second the upper bound. When values are generated outside these two bounds then data are redrawn until the bounded criteria is met. When the output of rfun is a matrix then range can instead be supplied as a matrix (e.g., via cbind), where each column provides the lower and upper bound for the respective column of the output |
... | additional arguments to be passed to rfun |
redraws | the maximum number of redraws to take before terminating the iterative sequence. This is in place as a safety in case the specified range is unlikely (or impossible) to be satisfied by rfun |
Details
In simulations it is often useful to draw numbers from truncated distributions rather than across the full theoretical range. For instance, sampling parameters within the range [-4,4] from a normal distribution. The rtruncate function has been designed to accept any sampling function, where the first argument is the number of values to sample, and will draw values iteratively until the number of values within the specified bound are obtained. In situations where it is unlikely for the bounds to be located (e.g., sampling from a standard normal distribution where all values are within [-10,-6]) the sampling scheme will throw an error if too many re-sampling executions are required (by default it will stop if more than 100 calls to rfun are required).
Value
either a numeric vector or matrix, where all values are within thedesiredrange
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and ReliableMonte Carlo Simulations with the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statisticswith Monte Carlo simulation.Journal of Statistics Education, 24(3),136-156.doi:10.1080/10691898.2016.1246953
See Also
Examples
# n = 1000 truncated normal vector between [-2,3]vec <- rtruncate(1000, rnorm, c(-2,3))summary(vec)# truncated correlated multivariate normal between [-1,4]mat <- rtruncate(1000, rmvnorm, c(-1,4), sigma = matrix(c(2,1,1,1),2))summary(mat)# truncated correlated multivariate normal between [-1,4] for the# first column and [0,3] for the second columnmat <- rtruncate(1000, rmvnorm, cbind(c(-1,4), c(0,3)), sigma = matrix(c(2,1,1,1),2))summary(mat)# truncated chi-square with df = 4 between [2,6]vec <- rtruncate(1000, rchisq, c(2,6), df = 4)summary(vec)Run a Monte Carlo simulation using array job submissions per condition
Description
This function has the same purpose as runSimulation, however rather than evaluating each row in a design object (potentially with parallel computing architecture) this function evaluates the simulation per independent row condition. This is mainly useful when distributing the jobs to HPC clusters where a job array number is available (e.g., via SLURM), where the simulation results must be saved to independent files as they complete. Use of expandDesign is useful for distributing replications to different jobs, while genSeeds is required to ensure high-quality random number generation across the array submissions. See the associated vignette for a brief tutorial of this setup.
Usage
runArraySimulation( design, ..., replications, iseed, filename, dirname = NULL, arrayID = getArrayID(), array2row = function(arrayID) arrayID, addArrayInfo = TRUE, parallel = FALSE, cl = NULL, ncores = parallelly::availableCores(omit = 1L), save_details = list(), control = list(), verbose = interactive())Arguments
design | design object containing simulation conditions on a per row basis. This function is designed to submit each row as an independent job on an HPC cluster. See |
... | additional arguments to be passed to |
replications | number of independent replications to perform per condition (i.e., each row in design) |
iseed | initial seed to be passed to genSeeds |
filename | file name to save simulation files to (does not need to specify extension). However, the array ID will be appended to each saved file name |
dirname | directory to save the files associated with |
arrayID | array identifier from the scheduler. Must be a number between1 and |
array2row | user defined function with the single argument For example, if each |
addArrayInfo | logical; should the array ID and original design row numberbe added to the |
parallel | logical; use parallel computations via a "SOCK" cluster? Only useful when the instruction shell file requires more than 1 core (number of cores detected via |
cl | cluster definition. If omitted a "SOCK" cluster will be defined |
ncores | number of cores to use when |
save_details | optional list of extra file saving details.See |
control | control list passed to
Similarly, |
verbose | logical; pass a verbose flag to |
Details
Due to the nature of how the replications are split it is important that the L'Ecuyer-CMRG (2002) method of random seeds is used across all array ID submissions (cf. runSimulation's parallel approach, which uses this method to distribute random seeds within each isolated condition rather than between all conditions). As such, this function requires the seeds to be generated using genSeeds with the iseed and arrayID inputs to ensure that each job is analyzing a high-quality set of random numbers via L'Ecuyer-CMRG's (2002) method, incremented using nextRNGStream.
Additionally, for timed simulations on HPC clusters it is also recommended to pass a control = list(max_time) value to avoid discarding conditions that require more than the specified time in the shell script. The max_time value should be less than the maximum time allocated on the HPC cluster (e.g., approximately 90% of the allotted walltime, though this ultimately depends on how long, and how variable, each replication is). Simulations with missing replications should submit a new set of jobs at a later time to collect the missing information.
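Putting these recommendations together, a sketch of an array-friendly call might look like the following (the walltime limit and file names are placeholders, and Design plus the Generate/Analyse/Summarise functions are assumed to be defined):

iseed <- genSeeds()   # generate once, store the value, and keep it fixed for the whole study

runArraySimulation(design=Design, replications=1000,
                   generate=Generate, analyse=Analyse, summarise=Summarise,
                   iseed=iseed,
                   arrayID=getArrayID(type='slurm'),
                   dirname='sim', filename='condition',
                   # stop safely before the scheduler's walltime is reached
                   control=list(max_time="03:30:00"))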
Author(s)
Phil Chalmersrphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulationswith the SimDesign Package.The Quantitative Methods for Psychology, 16(4), 248-280.doi:10.20982/tqmp.16.4.p248
See Also
runSimulation,expandDesign,genSeeds,SimCheck,SimCollect,getArrayID
Examples
library(SimDesign)Design <- createDesign(N = c(10, 20, 30))Generate <- function(condition, fixed_objects) { dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5) dat}Analyse <- function(condition, dat, fixed_objects) { ret <- c(mean=mean(dat), median=median(dat)) # mean/median of sample data ret}Summarise <- function(condition, results, fixed_objects){ colMeans(results)}## Not run: # define initial seed (do this only once to keep it constant!)# iseed <- genSeeds()iseed <- 554184288### On cluster submission, the active array ID is obtained via getArrayID(),### and therefore should be used in real SLURM submissionsarrayID <- getArrayID(type = 'slurm')# However, the following example arrayID is set to# the first row only for testing purposesarrayID <- 1L# run the simulation (results not caught on job submission, only files saved)res <- runArraySimulation(design=Design, replications=50, generate=Generate, analyse=Analyse, summarise=Summarise, arrayID=arrayID, iseed=iseed, filename='mysim') # saved as 'mysim-1.rds'resSimResults(res) # condition and replication count stored# same, but evaluated with multiple coresres <- runArraySimulation(design=Design, replications=50, generate=Generate, analyse=Analyse, summarise=Summarise, arrayID=arrayID, parallel=TRUE, ncores=3, iseed=iseed, filename='myparsim')resSimResults(res) # condition and replication count storeddir()SimClean(c('mysim-1.rds', 'myparsim-1.rds'))######################### Same submission job as above, however split the replications over multiple# evaluations and combine when completeDesign5 <- expandDesign(Design, 5)Design5# iseed <- genSeeds()iseed <- 554184288# arrayID <- getArrayID(type = 'slurm')arrayID <- 14L# run the simulation (replications reduced per row, but same in total)runArraySimulation(design=Design5, replications=10, generate=Generate, analyse=Analyse, summarise=Summarise, iseed=iseed, filename='mylongsim', arrayID=arrayID)res <- readRDS('mylongsim-14.rds')resSimResults(res) # condition and replication count storedSimClean('mylongsim-14.rds')#### Emulate the arrayID distribution, storing all results in a 'sim/' folder# (if 'sim/' does not exist in runArraySimulation() it will be# created automatically)dir.create('sim/')# Emulate distribution to nrow(Design5) = 15 independent job arrays## (just used for presentation purposes on local computer)sapply(1:nrow(Design5), \(arrayID) runArraySimulation(design=Design5, replications=10, generate=Generate, analyse=Analyse, summarise=Summarise, iseed=iseed, arrayID=arrayID, filename='condition', dirname='sim', # files: "sim/condition-#.rds" control = list(max_time="04:00:00", max_RAM="4GB"))) |> invisible()# If necessary, conditions above will manually terminate before# 4 hours and 4GB of RAM are used, returning any# successfully completed results before the HPC session times# out (provided .slurm script specified more than 4 hours)# list saved filesdir('sim/')# check that all files saved (warnings will be raised if missing files)SimCheck('sim/')condition14 <- readRDS('sim/condition-14.rds')condition14SimResults(condition14)# aggregate simulation results into single filefinal <- SimCollect('sim/')final# clean simulation directorySimClean(dirs='sim/')############# same as above, however passing different amounts of information depending# on the array IDarray2row <- function(arrayID){ switch(arrayID, "1"=1:8, "2"=9:14, "3"=15)}# arrayID 1 does row 1 though 8, arrayID 2 does 9 to 14array2row(1)array2row(2)array2row(3) # arrayID 3 does 15 only# emulate remote array distribution 
with only 3 arrayssapply(1:3, \(arrayID) runArraySimulation(design=Design5, replications=10, generate=Generate, analyse=Analyse, summarise=Summarise, iseed=iseed, arrayID=arrayID, filename='condition', dirname='sim', array2row=array2row)) |> invisible()# list saved filesdir('sim/')# Also works if the replication input were a suitable vector (not run)if(FALSE){ reps <- c(rep(10, 14), 100) cbind(Design5, reps) sapply(1:3, \(arrayID) runArraySimulation(design=Design5, replications=reps, generate=Generate, analyse=Analyse, summarise=Summarise, iseed=iseed, arrayID=arrayID, filename='condition', dirname='sim', array2row=array2row)) |> invisible()}# note that all row conditions are still stored separately, though note that# arrayID is now 2 insteadcondition14 <- readRDS('sim/condition-14.rds')condition14SimResults(condition14)# aggregate simulation results into single filefinal <- SimCollect('sim/')final# clean simulation directorySimClean(dirs='sim/')## End(Not run)Run a Monte Carlo simulation given conditions and simulation functions
Description
This function runs a Monte Carlo simulation study given a set of predefined simulation functions, design conditions, and number of replications. Results can be saved as temporary files in case of interruptions and may be restored by re-running runSimulation, provided that the respective temp file can be found in the working directory. runSimulation supports parallel and cluster computing (with the parallel and future packages; see also runArraySimulation for submitting array jobs to HPC clusters), global and local debugging, error handling (including fail-safe stopping when functions fail too often, even across nodes), provides bootstrap estimates of the sampling variability (optional), and automatic tracking of error and warning messages with their associated .Random.seed states. For convenience, all functions available in the R work-space are exported across all nodes so that they are more easily accessible (however, other R objects are not, and therefore must be passed to the fixed_objects input to become available across nodes).
Usage
runSimulation( design, replications, generate, analyse, summarise, prepare = NULL,
  fixed_objects = NULL, packages = NULL, filename = NULL, debug = "none",
  load_seed = NULL, load_seed_prepare = NULL, save = any(replications > 2),
  store_results = TRUE, save_results = FALSE, parallel = FALSE,
  ncores = parallelly::availableCores(omit = 1L), cl = NULL, notification = "none",
  notifier = NULL, beep = FALSE, sound = 1, check.globals = FALSE, CI = 0.95,
  seed = NULL, boot_method = "none", boot_draws = 1000L, max_errors = 50L,
  resume = TRUE, save_details = list(), control = list(), not_parallel = NULL,
  progress = TRUE, verbose = interactive())

## S3 method for class 'SimDesign'
summary(object, ...)

## S3 method for class 'SimDesign'
print(x, list2char = TRUE, ...)

Arguments
design | a |
replications | number of independent replications to perform percondition (i.e., each row in |
generate | user-defined data and parameter generating function (or named list of functions).See |
analyse | user-defined analysis function (or named list of functions)that acts on the data generated from |
summarise | optional (but strongly recommended) user-defined summary functionfrom Note that unlike the Generate and Analysesteps, the Summarise portion is not as important to perfectly organizeas the results can be summarised later on by using the built-in Omitting this function will return a tibble with the Finally, there are keywords that should not be returned from thisfunction, since they will cause a conflict with the aggregated simulationobjects. These are currently those listed in capital letters (e.g., |
prepare | (optional) a function that executes once per simulation condition(i.e., once per row in The primary use case is to load pre-computed objects from disk that weregenerated offline:
This approach allows you to: (1) pre-generate expensive condition-specific objectsprior to running the simulation , (2) save them as individualRDS files, and (3) load them efficiently during the simulation. This is preferable togenerating objects within Note: Objects added to RNG Warning: If you generate objects within The function signature should be: Default is |
fixed_objects | (optional) an object (usually a named |
packages | a character vector of external packages to be used during the simulation (e.g., |
filename | (optional) the name of the | Note that if the same file name already exists in the working directory at the time of saving then a new file will be generated instead and a warning will be thrown. This helps to avoid accidentally overwriting existing files. Default is NULL |
debug | a string indicating where to initiate a If the For debugging specific rows in the Finally, users may place |
load_seed | used to replicate an exact simulation state, which isprimarily useful for debugging purposes.Input can be a character object indicating which file to load from when the |
load_seed_prepare | similar to |
save | logical; save the temporary simulation state to the hard-drive? This is usefulfor simulations which require an extended amount of time, though for shorter simulationscan be disabled to slightly improve computational efficiency. When To recover your simulation at the last known location (having patched the issues in theprevious execution code) simply re-run the code you used toinitially define the simulation and the external file will automatically be detected and read-in.Default is |
store_results | logical; store the complete tables of simulation results in the returned object? This is To extract these results pass the returned object to either |
save_results | logical; save the results returned from WARNING: saving results to your hard-drive can fill up space very quickly for larger simulations. Be sure to test this option using a smaller number of replications before the full Monte Carlo simulation is performed. See also |
parallel | logical; use parallel processing from the Alternatively, if the |
ncores | number of cores to be used in parallel execution (ignored if using the |
cl | cluster object defined by If the |
notification | an optional character vector input that can be used to send notifications with information about execution time and recorded errors and warnings. Pass one of the following supported options: |
notifier | A notifier object (or a list of notifier objects, allowing for multiple notification methods) required when Example usage (see also the sketch following the argument descriptions below):
Using multiple notifiers:
See the |
beep | logical; call the |
sound |
|
check.globals | logical; should the functions be inspected for potential global variable usage? Using global object definitions will raise issues with parallel processing, and therefore any global object to be used in the simulation should either be defined within the code itself or included in the |
CI | bootstrap confidence interval level (default is 95%) |
seed | a vector or list of integers to be used for reproducibility. The length of the vector must equal the number of rows in |
boot_method | method for performing non-parametric bootstrap confidence intervals for the respective meta-statistics computed by the |
boot_draws | number of non-parametric bootstrap draws to sample for the |
max_errors | the simulation will terminate when more than this number of consecutive errors are thrown in any given condition, causing the simulation to continue to the next unique |
resume | logical; if a temporary |
save_details | a list controlling how and where files should be saved when the
|
control | a list of additional flags for controlling less commonly used features. These include
|
not_parallel | integer vector indicating which rows in the |
progress | logical; display a progress bar (using the |
verbose | logical; print messages to the R console? Default is |
object | SimDesign object returned from |
... | additional arguments |
x | SimDesign object returned from |
list2char | logical; for |
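As a rough sketch of the notifier usage referenced in the notifier argument above (hedged: this reuses the new_PushbulletNotifier constructor and the notification = 'complete' option that appear in the Examples section; other notifier constructors may be available depending on the package version):

    # assumes the RPushbullet package has been configured with a default JSON key file
    library(RPushbullet)
    pushbullet_notifier <- new_PushbulletNotifier(verbose_issues = TRUE)

    # send a notification with execution time and error/warning information on completion
    runSimulation(design=Design, replications=1000,
                  generate=Generate, analyse=Analyse, summarise=Summarise,
                  notification = 'complete', notifier = pushbullet_notifier)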
Details
For an in-depth tutorial of the package please refer to Chalmers and Adkins (2020; doi:10.20982/tqmp.16.4.p248). For an earlier didactic presentation of the package refer to Sigal and Chalmers (2016; doi:10.1080/10691898.2016.1246953). Finally, see the associated wiki on Github (https://github.com/philchalmers/SimDesign/wiki) for tutorial material, examples, and applications of SimDesign to real-world simulation experiments, as well as the various vignette files associated with the package.
The strategy for organizing the Monte Carlo simulation work-flow is to
1) Define a suitable Design object (a tibble or data.frame) containing fixed conditional information about the Monte Carlo simulation. Each row of this design object pertains to a unique set of simulation conditions to study, while each column represents a simulation factor under investigation (e.g., sample size, distribution type, etc.). This is often expedited by using the createDesign function, and if necessary the argument subset can be used to remove redundant or non-applicable rows.

2) Define the three step functions to generate the data (Generate; see also https://CRAN.R-project.org/view=Distributions for a list of distributions in R), analyse the generated data by computing the respective parameter estimates, detection rates, etc. (Analyse), and finally summarise the results across the total number of replications (Summarise).

3) Pass the design object and the three defined R functions to runSimulation, and declare the number of replications to perform with the replications input. This function will return a suitable tibble object with the complete simulation results and execution details.

4) Analyze the output from runSimulation, possibly using ANOVA techniques (SimAnova) and generating suitable plots and tables.
Expressing the above more succinctly, the functions to be called have the following form, with the exact functional arguments listed:
Design <- createDesign(...)

Generate <- function(condition, fixed_objects) {...}
Analyse <- function(condition, dat, fixed_objects) {...}
Summarise <- function(condition, results, fixed_objects) {...}

res <- runSimulation(design=Design, replications, generate=Generate,
                     analyse=Analyse, summarise=Summarise)
The condition object above represents a single row from the design object, indicating a unique Monte Carlo simulation condition. The condition object also contains two additional elements to help track the simulation's state: an ID variable, indicating the respective row number in the design object, and a REPLICATION element indicating the replication iteration number (an integer value between 1 and replications). This setup allows users to easily locate the rth replication (e.g., REPLICATION == 500) within the jth row in the simulation design (e.g., ID == 2). The REPLICATION input is also useful when temporarily saving files to the hard-drive when calling external command line utilities (see examples on the wiki).
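As a small illustrative sketch (the file-name pattern and the N column are assumptions, not part of the package API), these tracking elements can be used inside a Generate definition as follows:

    Generate <- function(condition, fixed_objects) {
        # condition$ID gives the design row; condition$REPLICATION the replication number
        tmpfile <- sprintf('tmp_row-%i_rep-%i.dat', condition$ID, condition$REPLICATION)
        dat <- rnorm(condition$N)       # assumes the design contains an N column
        # write.table(dat, tmpfile)     # e.g., when calling an external command line utility
        dat
    }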
For a template-based version of the work-flow, which is often useful when initially defining a simulation, use the SimFunctions function. This function will write a template simulation to one/two files so that modifying the required functions and objects can begin immediately. This means that users can focus on their Monte Carlo simulation details right away rather than worrying about the repetitive administrative code-work required to organize the simulation's execution flow.
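For instance, either of the following calls (both shown in the Examples below) gets the template in place:

    SimFunctions()                  # print an editable skeleton to the console
    SimFunctions('my-simulation')   # write the skeleton file(s) to the working directory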
Finally, examples, presentation files, and tutorials can be found on the package wiki located at https://github.com/philchalmers/SimDesign/wiki.
Value
a tibble from the dplyr package (also of class 'SimDesign') with the original design conditions in the left-most columns, simulation results in the middle columns, and additional information in the right-most columns (see below).
The right-most columns for each condition are: REPLICATIONS to indicate the number of Monte Carlo replications, SIM_TIME to indicate how long (in seconds) it took to complete all the Monte Carlo replications for each respective design condition, RAM_USED for the amount of RAM that was in use at the time of completing each simulation condition, COMPLETED to indicate the date on which the given simulation condition completed, SEED for the integer values in the seed argument (if applicable; not relevant when the L'Ecuyer-CMRG method is used), and, if applicable, ERRORS and WARNINGS, which contain counts for the number of error or warning messages that were caught (if no errors/warnings were observed these columns will be omitted). Note that to extract the specific error and warning messages see SimExtract. Finally, if boot_method was a valid input other than 'none' then the final right-most columns will contain the labels BOOT_ followed by the name of the associated meta-statistic defined in summarise() and the bootstrapped confidence interval location for the meta-statistics.
Saving data, results, seeds, and the simulation state
To conserve RAM, temporary objects (such as data generated across conditions and replications) are discarded; however, these can be saved to the hard-disk by passing the appropriate flags. For longer simulations it is recommended to use the save_results flag to write the analysis results to the hard-drive.
The store_seeds or save_seeds options can be invoked to save R's .Random.seed state to allow for complete reproducibility of each replication within each condition. These individual .Random.seed terms can then be read in with the load_seed input to reproduce the exact simulation state at any given replication. Most often, though, saving the complete list of seeds is unnecessary, as problematic seeds are automatically stored in the final simulation object to allow for easier replicability of potentially problematic errors (which incidentally can be extracted using SimExtract(res, 'error_seeds') and passed to the load_seed argument). Finally, providing a vector of seeds is also possible to ensure that each simulation condition is macro reproducible under the single/multi-core method selected.
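A hedged sketch of that replication-level debugging workflow (the object names, chosen design row, and the exact form in which the extracted seed is supplied to load_seed are illustrative):

    # extract the .Random.seed states associated with the caught errors
    err_seeds <- SimExtract(res, 'error_seeds')

    # restore one problematic seed to reproduce the failing replication,
    # typically combined with the debug argument to step through the code
    runSimulation(design=Design[2, ], replications=1,
                  generate=Generate, analyse=Analyse, summarise=Summarise,
                  load_seed=err_seeds[,1], debug='analyse')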
Finally, when the Monte Carlo simulation is complete it is recommended to write the results to the hard-drive for safe keeping, particularly by supplying the filename argument (for reasons that are more obvious in the parallel computation descriptions below). Using the filename argument is safer than using, for instance, saveRDS directly because files will never accidentally be overwritten; instead a new file name will be created when a conflict arises. This type of implementation safety is prevalent in many locations in the package to help avoid unrecoverable (yet surprisingly common) mistakes during the process of designing and executing Monte Carlo simulations.
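For example (the file name is illustrative; the saved object can later be read back with readRDS()):

    # writes the final SimDesign object to disk on completion; if a file named
    # 'mysim' already exists a new, non-conflicting name is generated instead
    res <- runSimulation(design=Design, replications=1000, filename='mysim',
                         generate=Generate, analyse=Analyse, summarise=Summarise)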
Resuming temporary results
In the event of a computer crash, power outage, etc., if save = TRUE was used (the default) then the original code used to execute runSimulation() need only be re-run to resume the simulation. The saved temp file will be read into the function automatically, and the simulation will continue from the condition where it left off before the simulation state was terminated. If users wish to remove this temporary simulation state entirely, so as to start anew, simply run SimClean(temp = TRUE) in the R console to remove any previously saved temporary objects.
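In code, the recovery steps look something like the following (assuming the same Design and step functions as before):

    # (1) after a crash, re-run the identical call; the temporary file in the
    #     working directory is detected and the simulation resumes where it stopped
    res <- runSimulation(design=Design, replications=1000,
                         generate=Generate, analyse=Analyse, summarise=Summarise)

    # (2) or, to discard the saved state entirely and start fresh
    SimClean(temp = TRUE)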
A note on parallel computing
When running simulations in parallel (either with parallel = TRUE or when using the future approach with a plan() other than sequential), R objects defined in the global environment will generally not be visible across nodes. Hence, you may see errors such as Error: object 'something' not found if you try to use an object that is defined in the work space but is not passed to runSimulation. To avoid this type of error, simply pass additional objects to the fixed_objects input (usually it's convenient to supply a named list of these objects). Fortunately, however, custom functions defined in the global environment are exported across nodes automatically. This makes it convenient when writing code because custom functions will always be available across nodes if they are visible in the R work space. As well, note the packages input to declare packages which must be loaded via library() in order to make specific non-standard R functions available across nodes.
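A minimal sketch of passing work-space objects and packages to parallel workers (the object and package names here are illustrative):

    pop_pars <- list(mu = 0, sigma2 = 1)    # defined in the global work-space

    Generate <- function(condition, fixed_objects) {
        # access the exported list inside the worker
        rnorm(condition$N, mean=fixed_objects$mu, sd=sqrt(fixed_objects$sigma2))
    }

    res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
                         fixed_objects=pop_pars,
                         packages='copula',   # loaded via library() on each node
                         generate=Generate, analyse=Analyse, summarise=Summarise)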
Author(s)
Phil Chalmers rphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
See Also
SimFunctions, createDesign, Generate, Analyse, Summarise, SimExtract, reSummarise, SimClean, SimAnova, SimResults, SimCollect, Attach, AnalyseIf, SimShiny, manageWarnings, runArraySimulation
Examples
#-------------------------------------------------------------------------------
# Example 1: Sampling distribution of mean

# This example demonstrates some of the simpler uses of SimDesign,
# particularly for classroom settings. The only factor varied in this simulation
# is sample size.

# skeleton functions to be saved and edited
SimFunctions()

#### Step 1 --- Define your conditions under study and create design data.frame

Design <- createDesign(N = c(10, 20, 30))

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

# help(Generate)
Generate <- function(condition, fixed_objects) {
    dat <- with(condition, rnorm(N, 10, 5)) # distributed N(10, 5)
    dat
}

# help(Analyse)
Analyse <- function(condition, dat, fixed_objects) {
    ret <- c(mean=mean(dat)) # mean of the sample data vector
    ret
}

# help(Summarise)
Summarise <- function(condition, results, fixed_objects) {
    # mean and SD summary of the sample means
    ret <- c(mu=mean(results$mean), SE=sd(results$mean))
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design

# run the simulation in testing mode (replications = 2)
Final <- runSimulation(design=Design, replications=2,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final
SimResults(Final)

# reproduce exact simulation
Final_rep <- runSimulation(design=Design, replications=2, seed=Final$SEED,
                           generate=Generate, analyse=Analyse, summarise=Summarise)
Final_rep
SimResults(Final_rep)

## Not run:
# run with more standard number of replications
Final <- runSimulation(design=Design, replications=1000,
                       generate=Generate, analyse=Analyse, summarise=Summarise)
Final
SimResults(Final)

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Extras

# compare SE estimates to the true SEs from the formula sigma/sqrt(N)
5 / sqrt(Design$N)

# To store the results from the analyse function either
# a) omit a definition of summarise() to return all results,
# b) use store_results = TRUE (default) to store results internally and later
#    extract with SimResults(), or
# c) pass save_results = TRUE to runSimulation() and read the results in with SimResults()
#
# Note that method c) should be adopted for larger simulations, particularly
# if RAM storage could be an issue and error/warning message information is important.

# a) approach
res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse)
res

# b) approach (store_results = TRUE by default)
res <- runSimulation(design=Design, replications=100,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res
SimResults(res)

# c) approach
Final <- runSimulation(design=Design, replications=100, save_results=TRUE,
                       generate=Generate, analyse=Analyse, summarise=Summarise)

# read-in all conditions (can be memory heavy)
res <- SimResults(Final)
res
head(res[[1]]$results)

# just first condition
SimResults(Final, which=1)

# obtain empirical bootstrapped CIs during an initial run
# the simulation was completed (necessarily requires save_results = TRUE)
res <- runSimulation(design=Design, replications=1000, boot_method = 'basic',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

# alternative bootstrapped CIs that use saved results via reSummarise().
# Default directory saved to:
dirname <- paste0('SimDesign-results_', unname(Sys.info()['nodename']), "/")
res <- reSummarise(summarise=Summarise, dir=dirname, boot_method = 'basic')
res

# remove the saved results from the hard-drive if you no longer want them
SimClean(results = TRUE)

## End(Not run)

#-------------------------------------------------------------------------------
# Example 2: t-test and Welch test when varying sample size, group sizes, and SDs

# skeleton functions to be saved and edited
SimFunctions()

## Not run:
# in real-world simulations it's often better/easier to save
# these functions directly to your hard-drive with
SimFunctions('my-simulation')

## End(Not run)

#### Step 1 --- Define your conditions under study and create design data.frame

Design <- createDesign(sample_size = c(30, 60, 90, 120),
                       group_size_ratio = c(1, 4, 8),
                       standard_deviation_ratio = c(.5, 1, 2))
Design

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects) {
    N <- condition$sample_size      # could use Attach() to make objects available
    grs <- condition$group_size_ratio
    sd <- condition$standard_deviation_ratio
    if(grs < 1){
        N2 <- N / (1/grs + 1)
        N1 <- N - N2
    } else {
        N1 <- N / (grs + 1)
        N2 <- N - N1
    }
    group1 <- rnorm(N1)
    group2 <- rnorm(N2, sd=sd)
    dat <- data.frame(group = c(rep('g1', N1), rep('g2', N2)),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects) {
    welch <- t.test(DV ~ group, dat)$p.value
    independent <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value

    # In this function the p values for the t-tests are returned,
    # and make sure to name each element, for future reference
    ret <- nc(welch, independent)
    ret
}

Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    ret <- EDR(results, alpha = .05)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Collect results by looping over the rows in design

# first, test to see if it works
res <- runSimulation(design=Design, replications=2,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res

## Not run:
# complete run with 1000 replications per condition
res <- runSimulation(design=Design, replications=1000, parallel=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
res
View(res)

## save final results to a file upon completion, and play a beep when done
runSimulation(design=Design, replications=1000, parallel=TRUE, filename = 'mysim',
              generate=Generate, analyse=Analyse, summarise=Summarise, beep=TRUE)

## same as above, but send a notification via Pushbullet upon completion
library(RPushbullet) # read-in default JSON file
pushbullet_notifier <- new_PushbulletNotifier(verbose_issues = TRUE)
runSimulation(design=Design, replications=1000, parallel=TRUE, filename = 'mysim',
              generate=Generate, analyse=Analyse, summarise=Summarise,
              notification = 'complete', notifier = pushbullet_notifier)

## Submit as RStudio job (requires job package and active RStudio session)
job::job({
    res <- runSimulation(design=Design, replications=100,
                         generate=Generate, analyse=Analyse, summarise=Summarise)
}, title='t-test simulation')
res # object res returned to console when completed

## Debug the generate function. See ?browser for help on debugging
## Type help to see available commands (e.g., n, c, where, ...),
## ls() to see what has been defined, and type Q to quit the debugger
runSimulation(design=Design, replications=1000,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              parallel=TRUE, debug='generate')

## Alternatively, place a browser() within the desired function line to
## jump to a specific location
Summarise <- function(condition, results, fixed_objects) {
    # find results of interest here (e.g., alpha < .1, .05, .01)
    browser()
    ret <- EDR(results[,nms], alpha = .05)
    ret
}

## The following debugs the analyse function for the
## second row of the Design input
runSimulation(design=Design, replications=1000,
              generate=Generate, analyse=Analyse, summarise=Summarise,
              parallel=TRUE, debug='analyse-2')

## End(Not run)

####################################
## Not run:
## Network linked via ssh (two way ssh key-paired connection must be
## possible between master and slave nodes)
##
## Define IP addresses, including primary IP
primary <- '192.168.2.20'
IPs <- list(
    list(host=primary, user='phil', ncore=8),
    list(host='192.168.2.17', user='phil', ncore=8))
spec <- lapply(IPs, function(IP)
    rep(list(list(host=IP$host, user=IP$user)), IP$ncore))
spec <- unlist(spec, recursive=FALSE)
cl <- parallel::makeCluster(type='PSOCK', master=primary, spec=spec)
res <- runSimulation(design=Design, replications=1000, parallel = TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise, cl=cl)

## Using parallel='future' to allow the future framework to be used instead
library(future)     # future structure to be used internally
plan(multisession)  # specify different plan (default is sequential)

res <- runSimulation(design=Design, replications=100, parallel='future',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
head(res)

# The progressr package is used for progress reporting with futures. To redefine
# use progressr::handlers() (see below)
library(progressr)
with_progress(res <- runSimulation(design=Design, replications=100, parallel='future',
                                   generate=Generate, analyse=Analyse, summarise=Summarise))
head(res)

# re-define progressr's bar (below requires cli)
handlers(handler_pbcol(
    adjust = 1.0,
    complete = function(s) cli::bg_red(cli::col_black(s)),
    incomplete = function(s) cli::bg_cyan(cli::col_black(s))))
with_progress(res <- runSimulation(design=Design, replications=100, parallel='future',
                                   generate=Generate, analyse=Analyse, summarise=Summarise))

# reset future computing plan when complete (good practice)
plan(sequential)

## End(Not run)

########################################
###### Post-analysis: Analyze the results via functions like lm() or SimAnova(), and create
###### tables (dplyr) or plots (ggplot2) to help visualize the results.
###### This is where you get to be a data analyst!

library(dplyr)
res %>% summarise(mean(welch), mean(independent))
res %>% group_by(standard_deviation_ratio, group_size_ratio) %>%
    summarise(mean(welch), mean(independent))

# quick ANOVA analysis method with all two-way interactions
SimAnova( ~ (sample_size + group_size_ratio + standard_deviation_ratio)^2, res,
          rates = TRUE)

# or more specific ANOVAs
SimAnova(independent ~ (group_size_ratio + standard_deviation_ratio)^2, res,
         rates = TRUE)

# make some plots
library(ggplot2)
library(tidyr)
dd <- res %>%
    select(group_size_ratio, standard_deviation_ratio, welch, independent) %>%
    pivot_longer(cols=c('welch', 'independent'), names_to = 'stats')
dd

ggplot(dd, aes(factor(group_size_ratio), value)) + geom_boxplot() +
    geom_abline(intercept=0.05, slope=0, col = 'red') +
    geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
    geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
    facet_wrap(~stats)

ggplot(dd, aes(factor(group_size_ratio), value, fill = factor(standard_deviation_ratio))) +
    geom_boxplot() +
    geom_abline(intercept=0.05, slope=0, col = 'red') +
    geom_abline(intercept=0.075, slope=0, col = 'red', linetype='dotted') +
    geom_abline(intercept=0.025, slope=0, col = 'red', linetype='dotted') +
    facet_grid(stats~standard_deviation_ratio) +
    theme(legend.position = 'none')

#-------------------------------------------------------------------------------
# Example with prepare() function - Loading pre-computed objects

## Not run:
# Step 1: Pre-generate expensive objects offline (can be parallelized)
Design <- createDesign(N = c(10, 20), rho = c(0.3, 0.7))

# Create directory for storing prepared objects
dir.create('prepared_objects', showWarnings = FALSE)

# Generate and save correlation matrices for each condition
for(i in 1:nrow(Design)) {
    cond <- Design[i, ]

    # Generate correlation matrix
    corr_matrix <- matrix(cond$rho, nrow=cond$N, ncol=cond$N)
    diag(corr_matrix) <- 1

    # Create filename based on design parameters
    fname <- paste0('prepared_objects/N', cond$N, '_rho', cond$rho, '.rds')

    # Save to disk
    saveRDS(corr_matrix, file = fname)
}

# Step 2: Use prepare() to load these objects during simulation
prepare <- function(condition, fixed_objects) {
    # Create matching filename
    fname <- paste0('prepared_objects/N', condition$N, '_rho', condition$rho, '.rds')

    # Load the pre-computed correlation matrix
    fixed_objects$corr_matrix <- readRDS(fname)
    return(fixed_objects)
}

generate <- function(condition, fixed_objects) {
    # Use the loaded correlation matrix to generate multivariate data
    dat <- MASS::mvrnorm(n = 50, mu = rep(0, condition$N),
                         Sigma = fixed_objects$corr_matrix)
    data.frame(dat)
}

analyse <- function(condition, dat, fixed_objects) {
    # Calculate mean correlation in generated data
    obs_corr <- cor(dat)
    c(mean_corr = mean(obs_corr[lower.tri(obs_corr)]))
}

summarise <- function(condition, results, fixed_objects) {
    c(mean_corr = mean(results$mean_corr))
}

# Run simulation - prepare() loads objects efficiently
results <- runSimulation(design = Design, replications = 2,
                         prepare = prepare,
                         generate = generate,
                         analyse = analyse,
                         summarise = summarise,
                         verbose = FALSE)
results

# Cleanup
SimClean(dirs='prepared_objects')

## End(Not run)

Format time string to suitable numeric output
Description
Format time input string into a suitable numeric output metric (e.g., seconds). Input follows the SBATCH utility specifications. Accepted time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
Usage
timeFormater(time, output = "sec")

Arguments
time | a character string to be formatted. If a numeric vector is supplied then this will be interpreted as minutes due to character coercion. |
output | type of numeric output to convert time into.Currently supported are |
Details
For example, time = "60" indicates a maximum time of 60 minutes, time = "03:00:00" a maximum time of 3 hours, time = "4-12" a maximum of 4 days and 12 hours, and time = "2-02:30:00" a maximum of 2 days, 2 hours and 30 minutes.
Examples
# Test cases (outputs in seconds)
timeFormater("4-12")        # day-hours
timeFormater("4-12:15")     # day-hours:minutes
timeFormater("4-12:15:30")  # day-hours:minutes:seconds

timeFormater("30")          # minutes
timeFormater("30:30")       # minutes:seconds
timeFormater("4:30:30")     # hours:minutes:seconds

# output in hours
timeFormater("4-12", output = 'hour')
timeFormater("4-12:15", output = 'hour')
timeFormater("4-12:15:30", output = 'hour')

timeFormater("30", output = 'hour')
timeFormater("30:30", output = 'hour')
timeFormater("4:30:30", output = 'hour')

# numeric input is understood as minutes
timeFormater(42)                 # seconds
timeFormater(42, output='min')   # minutes