Flexible Inference via Permutations in R
permaverse/flipr
The goal of the flipr package is to provide a flexible framework for making inference via permutation. The idea is to promote the permutation framework as an incredibly well-suited tool for inference on complex data. You supply your data, as complex as it might be, in the form of lists in which each entry stores one data point in a representation that suits you, and flipr takes care of the permutation magic and provides you with either point estimates or confidence regions or $p$-values of hypothesis tests.
You can install the package from CRAN with:

```r
install.packages("flipr")
```

Alternatively, you can install the development version of flipr from GitHub with:

```r
# install.packages("pak")
pak::pak("permaverse/flipr")
```
```r
library(flipr)
```

Here we use the very simple $t$-test for comparing the means of two univariate samples to show how easy it is to carry out a permutation test with flipr.
Let us first generate two samples of size $n = 15$ from Gaussian distributions with equal variance but different means:
```r
set.seed(123)
n <- 15
x <- rnorm(n = n, mean = 0, sd = 1)
y <- rnorm(n = n, mean = 1, sd = 1)
```
Given the data we simulated, the parameter of interest here is the difference between the means of the distributions, say $\delta = \mu_y - \mu_x$.
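In the context of null hypothesis testing, we consider the null hypothesis $H_0: \mu_y - \mu_x = \delta$. Under $H_0$, subtracting $\delta$ from each data point of the second sample makes the two samples exchangeable.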
We can define a proper function to do this, termed the null specification function, which takes two input arguments:

- `y`, which is a list storing the data points in the second sample;
- `parameters`, which is a numeric vector of values for the parameters under investigation (here only $\delta$, and thus `parameters` is of length $1$ with `parameters[1] = delta`).
In our simple example, it boils down to:
```r
null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
```
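To make the effect of the null specification function concrete, here is a minimal sketch that applies it to a toy second sample. It uses base R's `lapply()` in place of `purrr::map()`, an assumption made only to keep the example free of dependencies (both return a list). The name `null_spec_base` and the toy values are hypothetical.

```r
# Base R stand-in for the null specification function shown above.
# Assumption: lapply() replaces purrr::map() to avoid the purrr dependency.
null_spec_base <- function(y, parameters) {
  lapply(y, function(yi) yi - parameters[1])
}

# Toy second sample stored as a list, one data point per entry.
y_toy <- list(1.5, 2.0, 3.5)

# With delta = 1, every data point of the second sample is shifted by -1.
shifted <- null_spec_base(y_toy, parameters = 1)
unlist(shifted)
#> [1] 0.5 1.0 2.5
```

The function returns a list of the same length as its input, so each shifted data point keeps the representation of the original one.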
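Next, we need to decide which test statistic(s) we are going to use for performing the test. Here, we are only interested in one parameter, namely the mean difference $\delta$. Since both samples share the same variance, a natural choice is the $t$-statistic with a pooled variance estimate.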
This statistic can be easily computed using `stats::t.test(x, y, var.equal = TRUE)$statistic`. However, we want to extend its evaluation to any permuted version of the data. Test statistic functions compatible with flipr should have at least two mandatory input arguments:

- `data`, which is either a concatenated list of size $n_x + n_y$ regrouping the data points of both samples or a distance matrix of size $(n_x + n_y) \times (n_x + n_y)$ stored as an object of class `dist`;
- `indices1`, which is an integer vector of size $n_x$ storing the indices of the data points belonging to the first sample in the current permuted version of the data.
Some test statistics are already implemented in flipr and ready to use. User-defined test statistics can be used as well, with the use of the helper function `use_stat(nsamples = 2, stat_name = )`. This function creates and saves an `.R` file in the `R/` folder of the current working directory and populates it with the following template:
```r
#' Test Statistic for the Two-Sample Problem
#'
#' This function computes the test statistic...
#'
#' @param data A list storing the concatenation of the two samples from which
#'   the user wants to make inference. Alternatively, a distance matrix stored
#'   in an object of class \code{\link[stats]{dist}} of pairwise distances
#'   between data points.
#' @param indices1 An integer vector that contains the indices of the data
#'   points belong to the first sample in the current permuted version of the
#'   data.
#'
#' @return A numeric value evaluating the desired test statistic.
#' @export
#'
#' @examples
#' # TO BE DONE BY THE DEVELOPER OF THE PACKAGE
stat_{{{name}}} <- function(data, indices1) {
  n <- if (inherits(data, "dist"))
    attr(data, "Size")
  else if (inherits(data, "list"))
    length(data)
  else
    stop("The `data` input should be of class either list or dist.")

  indices2 <- seq_len(n)[-indices1]

  x <- data[indices1]
  y <- data[indices2]

  # Here comes the code that computes the desired test
  # statistic from input samples stored in lists x and y

}
```
For instance, a flipr-compatible version of the $t$-statistic with pooled variance looks like:
```r
my_t_stat <- function(data, indices1) {
  n <- if (inherits(data, "dist"))
    attr(data, "Size")
  else if (inherits(data, "list"))
    length(data)
  else
    stop("The `data` input should be of class either list or dist.")

  indices2 <- seq_len(n)[-indices1]

  x <- data[indices1]
  y <- data[indices2]

  # Here comes the code that computes the desired test
  # statistic from input samples stored in lists x and y
  x <- unlist(x)
  y <- unlist(y)

  stats::t.test(x, y, var.equal = TRUE)$statistic
}
```
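As a quick check of the `data` / `indices1` contract, the sketch below evaluates such a statistic on the original labelling and on one randomly permuted labelling. It is self-contained, so it defines a condensed copy of the function restricted to list input. The name `my_t_stat_list` is hypothetical, and this only illustrates how the two arguments interact, not how flipr calls the function internally.

```r
# Condensed copy of the statistic above, restricted to list input for brevity.
# The name my_t_stat_list is hypothetical.
my_t_stat_list <- function(data, indices1) {
  stopifnot(inherits(data, "list"))
  indices2 <- seq_len(length(data))[-indices1]
  x <- unlist(data[indices1])
  y <- unlist(data[indices2])
  stats::t.test(x, y, var.equal = TRUE)$statistic
}

set.seed(123)
n <- 15
x <- rnorm(n = n, mean = 0, sd = 1)
y <- rnorm(n = n, mean = 1, sd = 1)

# Concatenate both samples into a single list, one data point per entry.
data <- c(as.list(x), as.list(y))

# Original labelling: the first n entries form the first sample.
t_obs <- my_t_stat_list(data, indices1 = seq_len(n))

# One permuted labelling: n indices drawn at random without replacement.
perm_indices1 <- sample.int(2 * n, size = n)
t_perm <- my_t_stat_list(data, indices1 = perm_indices1)

# The original labelling reproduces the usual pooled-variance t statistic.
same_as_t_test <- all.equal(
  unname(t_obs),
  unname(stats::t.test(x, y, var.equal = TRUE)$statistic)
)
```

Only `indices1` changes between the two calls; the concatenated `data` list is never reordered.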
Here, we are only going to use the $t$-statistic, so the list of test statistic functions has a single element:
```r
stat_functions <- list(my_t_stat)
```
Finally, we need to define a named list that tells flipr which test statistics among the ones declared in the `stat_functions` list should be used for each parameter under investigation. This is used to determine bounds on each parameter for the plausibility function. This list, often termed `stat_assignments`, should therefore have as many elements as there are parameters under investigation. Each element should be named after a parameter under investigation and should list the indices corresponding to the test statistics that should be used for that parameter in `stat_functions`. In our example, it boils down to:
```r
stat_assignments <- list(delta = 1)
```
Now we can instantiate a plausibility function as follows:
```r
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
#> ! Setting the seed for sampling permutations is mandatory for obtaining a continuous p-value function. Using `seed = 1234`.
```
Now, assume we want to test the following hypotheses:

$$H_0: \delta = 0 \quad \text{vs.} \quad H_1: \delta \neq 0.$$
We use the `$get_value()` method for this purpose, which essentially evaluates the permutation $p$-value:
```r
pf$get_value(0)
#> [1] 0.1078921
```
We can compare the resulting $p$-value with the one obtained from the parametric $t$-test:
```r
t.test(x, y, var.equal = TRUE)$p.value
#> [1] 0.1030946
```
The permutation $p$-value does not exactly match the parametric one, for two reasons:
- The resolution of a permutation $p$-value is of the order of $1/(B+1)$, where $B$ is the number of sampled permutations. By default, the plausibility function is instantiated with $B = 1000$:

  ```r
  pf$nperms
  #> [1] 1000
  ```
- We randomly sample $B$ permutations out of the $\binom{n_x+n_y}{n_x}$ possible permutations and therefore introduce extra variability in the $p$-value.
If we were to ask for more permutations, say $B = 1000000$, the permutation $p$-value would be much closer to the parametric one:
```r
pf$set_nperms(1000000)
pf$get_value(0)
#> [1] 0.1029879
```
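Both sources of discrepancy can be illustrated with a hand-rolled Monte Carlo permutation test in base R. This is a minimal sketch under stated assumptions, not flipr's internal implementation: the helper names `t_stat` and `perm_p_value` are hypothetical, and the estimator $(1 + \#\{|T_b| \ge |T_{\mathrm{obs}}|\}) / (B + 1)$ is a common choice for a two-sided permutation $p$-value that may differ from what flipr computes.

```r
set.seed(123)
n <- 15
x <- rnorm(n = n, mean = 0, sd = 1)
y <- rnorm(n = n, mean = 1, sd = 1)

# Pooled-variance t statistic for a given labelling of the pooled data.
t_stat <- function(pooled, indices1) {
  stats::t.test(pooled[indices1], pooled[-indices1], var.equal = TRUE)$statistic
}

# Two-sided Monte Carlo permutation p-value based on B random labellings.
perm_p_value <- function(x, y, B, seed) {
  set.seed(seed)
  pooled <- c(x, y)
  n_x <- length(x)
  t_obs <- abs(t_stat(pooled, seq_len(n_x)))
  t_perm <- replicate(B, abs(t_stat(pooled, sample.int(length(pooled), n_x))))
  # Counting the observed labelling itself keeps the p-value in [1/(B+1), 1],
  # so the p-value is always a multiple of 1/(B+1).
  (1 + sum(t_perm >= t_obs)) / (B + 1)
}

# Number of distinct ways to choose which data points form the first sample.
n_labellings <- choose(2 * n, n)
n_labellings
#> [1] 155117520

# Same data, same B, different seeds: the p-values generally differ slightly
# because only B of the possible labellings are sampled.
p1 <- perm_p_value(x, y, B = 1000, seed = 1)
p2 <- perm_p_value(x, y, B = 1000, seed = 2)
```

With $B = 1000$ the $p$-value can only take values on a grid of step $1/1001$, and changing the seed moves it around that grid. Increasing $B$ refines the grid and reduces the sampling variability.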
