vignettes/flipr.qmd (+21 lines changed: 21 additions & 0 deletions)
@@ -126,6 +126,7 @@ We can define a proper function to do this, termed the *null specification* func
- `parameters` which is a numeric vector of values for the parameters under investigation (here only $\delta$, and thus `parameters` is of length $1$ with `parameters[1] = delta`).

In our simple example, it boils down to:
+
```{r}
null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
@@ -142,20 +143,23 @@ This statistic can be easily computed using `stats::t.test(x, y, var.equal = TRU
- `indices1` which is an integer vector of size $n_x$ storing the indices of the data points belonging to the first sample in the current permuted version of the data.

A [**flipr**](https://permaverse.github.io/flipr/)-compatible version of the $t$-statistic is already implemented in [**flipr**](https://permaverse.github.io/flipr/) and ready to use as `stat_student` or its alias `stat_t`. Here, we only use the $t$-statistic for this example, but we might want to use more than one statistic for a parameter, or we might have several parameters under investigation, each requiring a different test statistic. We therefore group all the test statistics that we need into a single list (a hand-rolled sketch of such a statistic follows the next chunk):
+
```{r}
stat_functions <- list(stat_t)
```
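
To make the statistic interface more concrete, here is a minimal hand-rolled sketch of what a compatible statistic could look like. It is purely illustrative and not part of the vignette's pipeline: it assumes, as described in the bullet above, that a statistic function receives the pooled data together with `indices1`, and the name `stat_student_manual` is hypothetical.

```{r, eval=FALSE}
# Hypothetical hand-rolled Student t-statistic (illustration only).
# `data` is assumed to be the pooled samples as a list and `indices1`
# the indices of the observations assigned to the first sample in the
# current permutation.
stat_student_manual <- function(data, indices1) {
  x <- unlist(data[indices1])
  y <- unlist(data[-indices1])
  stats::t.test(x, y, var.equal = TRUE)$statistic
}
```

In practice, we rely on the built-in `stat_t` here.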

### Statistic assignments

Finally, we need to define a named list that tells [**flipr**](https://permaverse.github.io/flipr/) which test statistics among the ones declared in the `stat_functions` list should be used for each parameter under investigation. This is used to determine bounds on each parameter for the plausibility function. This list, often termed `stat_assignments`, should therefore have as many elements as there are parameters under investigation. Each element should be named after a parameter under investigation and should list the indices of the test statistics in `stat_functions` that should be used for that parameter. In our example, it boils down to:
+
```{r}
stat_assignments <- list(delta = 1)
```
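
The assignment mechanism generalizes directly to several parameters and statistics. As a purely hypothetical illustration (the parameter `rho` and a second statistic do not exist in this vignette), if `stat_functions` contained two statistics and two parameters were under investigation, the assignments could read:

```{r, eval=FALSE}
# Hypothetical: assign the first statistic to `delta` and the second
# to a second parameter `rho` (illustration only).
stat_assignments <- list(delta = 1, rho = 2)
```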

### Instantiation of the plausibility function

In [**flipr**](https://permaverse.github.io/flipr/), the plausibility function is implemented as an [R6Class](https://r6.r-lib.org/reference/R6Class.html) object. Assuming we observed two samples stored in the lists `x` and `y`, we instantiate a plausibility function for these data as follows:
+
```{r, eval=FALSE}
pf <- PlausibilityFunction$new(
  null_spec = null_spec,
@@ -170,6 +174,7 @@ Now, assume we want to test the following hypotheses:
We use the `$get_value()` method for this purpose, which essentially evaluates the permutation $p$-value of a two-sided test by default:
+
```{r, eval=FALSE}
pf$get_value(0)
```
@@ -183,6 +188,7 @@ By default, the number of sampled permutations is `1000L`. It is accessible thro
### Scenario A

Let us instantiate the plausibility function for the data simulated under scenario A:
+
```{r, eval=FALSE}
pfa <- PlausibilityFunction$new(
  null_spec = null_spec,
@@ -194,15 +200,19 @@ pfa$set_nperms(B)
```

We can compute a point estimate of the mean difference and store it inside the plausibility function object via the `$set_point_estimate()` method:
+
```{r, eval=FALSE}
pfa$set_point_estimate(mean(a2) - mean(a1))
```

The computed value can then be accessed via the `$point_estimate` field:
+
```{r}
pfa$point_estimate
```
+
or by displaying the list of parameters under investigation, which is stored in the `$parameters` field:
+
```{r, eval=FALSE}
pfa$parameters
```
@@ -215,6 +225,7 @@ p
```

In this list, one can see that parameters come with an unknown range by default. We can, however, compute their bounds by defining a maximum confidence level through the `$set_max_conf_level()` method of the `PlausibilityFunction` class. When a plausibility function is instantiated, the default value for the `$max_conf_level` field is $0.99$ (a sketch of how this default might be changed follows the next chunk). To set parameter bounds automatically, use the `$set_parameter_bounds()` method:
+
```{r, eval=FALSE}
pfa$set_parameter_bounds(
  point_estimate = pfa$point_estimate,
@@ -223,11 +234,13 @@ pfa$set_parameter_bounds(
```
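
As an aside, the $0.99$ default mentioned above could be changed before computing the bounds. This is a minimal sketch under the assumption that `$set_max_conf_level()` takes the desired maximum confidence level as its single argument; check the package documentation for the exact signature.

```{r, eval=FALSE}
# Assumed signature: the desired maximum confidence level as a single
# numeric value (see the flipr documentation to confirm).
pfa$set_max_conf_level(0.999)
```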

We can now inspect the list of parameters under investigation again to see the updated bounds:
+
```{r}
pfa$parameters
```

Once bounds are known for each parameter, it becomes possible to generate a grid for later evaluating the plausibility function. This is done through the `$set_grid()` method as follows:
+
```{r, eval=FALSE}
pfa$set_grid(
  parameters = pfa$parameters,
@@ -236,6 +249,7 @@ pfa$set_grid(
```

We can then take a look at the newly created grid:
+
```{r, eval=FALSE}
pfa$grid
```
@@ -245,16 +259,19 @@ select(pfa$grid, -pvalue)
```

We can go a step further and evaluate the plausibility function on that grid using the `$evaluate_grid()` method as follows:
+
```{r, eval=FALSE}
pfa$evaluate_grid(grid = pfa$grid)
```

Again, we can then take a look at the updated grid:
+
```{r}
pfa$grid
```

We can add to this grid the $p$-value computed from the $t$-test assuming normality of the data:
+
```{r}
dfa <- pfa$grid %>%
  mutate(
@@ -471,11 +488,13 @@ by-product as we will show in the next sections.
One can obtain a point estimate of the parameter under investigation by
searching for which value of the parameter reaches the maximum of the
$p$-value function (which is $1$). One can use the `$set_point_estimate()` method to do that:
+
```{r, eval=FALSE}
pfa$set_point_estimate(overwrite = TRUE)
```

The computed point estimate is then stored in the `$point_estimate` field and can be retrieved as:
+
```{r}
pfa$point_estimate
```
@@ -486,6 +505,7 @@ One can obtain a confidence interval for the parameter under
investigation by searching for which values of the parameter the
$p$-value function remains above a pre-specified significance level
$\alpha$. This is achieved via the `$set_parameter_bounds()` method:
+
```{r, eval=FALSE}
pfa$set_parameter_bounds(
  point_estimate = pfa$point_estimate,
@@ -504,6 +524,7 @@ hypothesis is $H_0: \delta = \delta_0$ is immediate from the $p$-value
function as it boils down to evaluating the $p$-value function at
$\delta_0$. Hence we can, for instance, test $H_0: \delta = 3$ against the
alternative $H_1: \delta \ne 3$ using the following piece of code:
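
A minimal sketch of such a test reuses the `$get_value()` method introduced earlier, which evaluates the permutation $p$-value at the supplied null value (two-sided by default):

```{r, eval=FALSE}
# Sketch: permutation p-value for H0: delta = 3 against H1: delta != 3,
# reusing the $get_value() method shown earlier in the vignette.
pfa$get_value(3)
```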