Movatterモバイル変換

Type:

Package

Title:

Computationally-Efficient Confidence Intervals for Mean Shiftfrom Permutation Methods

Version:

0.2.3

Date:

2022-06-21

Description:

Implements computationally-efficient construction of confidence intervals from permutation or randomization tests for simple differences in means, based on Nguyen (2009) <doi:10.15760/etd.7798>.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.1.2

Imports:

matrixStats

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

Language:

en-US

URL:

https://github.com/ColbyStatSvyRsch/CIPerm/

BugReports:

https://github.com/ColbyStatSvyRsch/CIPerm/issues

NeedsCompilation:

Packaged:

2022-06-21 12:51:24 UTC; jawieczo

Author:

Emily Tupaj [aut], Jerzy Wieczorek

[cre, aut], Minh Nguyen [ctb], Mara Tableman [ctb]

Maintainer:

Jerzy Wieczorek <jawieczo@colby.edu>

Repository:

CRAN

Date/Publication:

2022-06-21 13:10:10 UTC

CIPerm: Computationally-Efficient Confidence Intervals for Mean Shift from Permutation Methods

Description

Implements computationally-efficient construction ofconfidence intervals from permutation tests or randomization testsfor simple differences in means.The method is based on Minh D. Nguyen's 2009 MS thesis paper,"Nonparametric Inference using Randomization and PermutationReference Distribution and their Monte-Carlo Approximation,"<doi:10.15760/etd.7798>See thenguyen vignette for a brief summary of the method.First usedset to tabulate summary statistics for each permutation.Then pass the results intocint to compute a confidence interval,or intopval to calculate p-values.

Details

Our R function arguments and outputs are structured differentlythan the similarly-named R functions in Nguyen (2009),but the results are equivalent. In thenguyen vignettewe use our functions to replicate Nguyen's results.

Following Ernst (2004) and Nguyen (2009), we use "permutation methods"to include both randomization tests and permutation tests.In the simple settings in this R package,the randomization and permutation test mechanics are identical,but their interpretations may differ.

We say "randomization test" under the model wherethe units are not necessarily a random sample, but the treatment assignmentwas random. The null hypothesis is that the treatment has no effect.In this case we can make causal inferences about thetreatment effect (difference between groups) for this set of individuals,but cannot necessarily generalize to other populations.

By contrast, we say "permutation test" under the model wherethe units were randomly sampled from two distinct subpopulations.The null hypothesis is that the two groups have identical CDFs.In this case we can make inferences about differences between subpopulations,but there's not necessarily any "treatment" to speak ofand causal inferences may not be relevant.

References

Ernst, M.D. (2004)."Permutation Methods: A Basis for Exact Inference,"Statistical Science, vol. 19, no. 4, 676-685,<doi:10.1214/088342304000000396>.

Nguyen, M.D. (2009)."Nonparametric Inference using Randomization and PermutationReference Distribution and their Monte-Carlo Approximation"[unpublished MS thesis; Mara Tableman, advisor], Portland State University.Dissertations and Theses. Paper 5927.<doi:10.15760/etd.7798>.

Permutation-methods confidence interval for difference in means

Description

Calculate confidence interval for a simple difference in meansfrom a two-sample permutation or randomization test.In other words, we set up a permutation or randomization test to evaluateH_0: \mu_A - \mu_B = 0, then use those same permutations toconstruct a CI for the parameter\delta = (\mu_A - \mu_B).

Usage

cint(dset, conf.level = 0.95, tail = c("Two", "Left", "Right"))

Arguments

dset

The output ofdset.

conf.level

Confidence level (default 0.95 corresponds to 95% confidence level).

tail

Which tail? Either "Two"- or "Left"- or "Right"-tailed interval.

Details

If the desiredconf.level is not exactly feasible,the achieved confidence level will be slightly anti-conservative.We use the default numeric tolerance inall.equal to checkif(1-conf.level) * nrow(dset) is an integer for one-tailed CIs,or if(1-conf.level)/2 * nrow(dset) is an integer for two-tailed CIs.If so,conf.level.achieved will be the desiredconf.level.Otherwise, we will use the next feasible integer,thus slightly reducing the confidence level.For example, in the example below the randomization test has 35 combinations,and a two-sided CI must have at least one combination value in each tail,so the largest feasible confidence level for a two-sided CI is 1-(2/35) or around 94.3%.If we request a 95% or 99% CI, we will have to settle for a 94.3% CI instead.

Value

A list containing the following components:

conf.int: Numeric vector with the CI's two endpoints.
conf.level.achieved: Numeric value of the achieved confidence level.

Examples

x <- c(19, 22, 25, 26)y <- c(23, 33, 40)demo <- dset(x, y)cint(dset = demo, conf.level = .95, tail = "Two")

Permutation-methods summary statistics

Description

Calculate table of differences in means, medians, etc. for eachcombination (or permutation, if using Monte Carlo approx.),as needed in order to compute a confidence interval usingcintand/or a p-value usingpval.

Usage

dset(group1, group2, nmc = 10000, returnData = FALSE)

Arguments

group1

Vector of numeric values for first group.

group2

Vector of numeric values for second group.

nmc

Threshold for whether to use Monte Carlo draws or completeenumeration. If the number of all possible combinationschoose(n1+n2, n1) <= nmc, we use complete enumeration.Otherwise, we take a Monte Carlo sample ofnmc permutations.You can setnmc = 0 to force complete enumeration regardless ofhow many combinations there are.

returnData

Whether the returned dataframe should include columns forthe permuted data itself (if TRUE), or only the derived columns that areneeded for confidence intervals and p-values (if FALSE, default).

Value

A data frame ready to be used incint() orpval().

Examples

x <- c(19, 22, 25, 26)y <- c(23, 33, 40)demo <- dset(x, y, returnData = TRUE)knitr::kable(demo, digits = 2)

Permutations-methods p-values for difference in means, medians, or Wilcoxon rank sum test

Description

Calculate p-values for a two-sample permutation or randomization test.In other words, we set up a permutation or randomization test to evaluatethe null hypothesis that groups A and B have the same distribution,then calculate p-values for several alternatives:a difference in means (value="m"),a difference in medians (value="d"),or the Wilcoxon rank sum test (value="w").

Usage

pval(  dset,  tail = c("Two", "Left", "Right"),  value = c("m", "s", "d", "w", "a"))

Arguments

dset

The output ofdset.

tail

Which tail? Either "Two"- or "Left"- or "Right"-tailed test.

value

Either "m" for difference in means (default);"s" for sum of Group 1 values[equivalent to "m" and included only for sake of checking results againstNguyen (2009) and Ernst (2004)];"d" for difference in medians;or "w" for Wilcoxon rank sum statistic;or "a" for a named vector of all four p-values.

Value

Numeric p-value for the selected type of test,or a named vector of all four p-values ifvalue="a".

Examples

x <- c(19, 22, 25, 26)y <- c(23, 33, 40)demo <- dset(x, y)pval(dset = demo, tail = "Left", value = "s")pval(dset = demo, tail = "Left", value = "a")