```r
library(amp)
```

| Argument | Description |
|---|---|
| n_peld_mc_samples | Number of samples to be used in approximating the estimated limiting distribution of the parameter estimate under the null. Increasing this value reduces the approximation error of the test statistic. |
| nrm_type | The type of norm to be used for the test. Generally the l_p norm. |
| perf_meas | The preferred measure used to generate the test statistic. |
| pos_lp_norms | The index of the norms to be considered. For example, if we use the l_p norm, pos_lp_norms specifies the different p’s to try. |
| ld_est_meth | String indicating the method for estimating the limiting distribution of the test statistic: parametric bootstrap or permutation. |
| ts_ld_bs_samp | The number of test statistic limiting distribution bootstrap samples to be drawn. |
| other_output | A vector indicating additional data that should be returned. Currently `"var_est"` and `"obs_data"` are supported. |
| … | Other arguments needed in other places. |

Throughout, we will use a simple data generating mechanism:

```r
x_data <- matrix(rnorm(500), ncol = 5)
y_data <- rnorm(100) + 0.02 * x_data[, 2]
obs_data <- data.frame(y_data, x_data)
```

There are multiple options when defining a test statistic outside of the specification of the parameter estimator, \(\hat{\Psi}\), and the corresponding IC estimator, \(\hat{IC}\) (which are specified in the `param_est` argument). Four arguments control these options.
The first argument, `perf_meas`, specifies the performance measure used to define the test statistic. Loosely defined, a performance measure is a function that provides information about the performance of a simple test at a specified alternative. It takes as arguments a norm \(\varphi\), an alternative \(x\), and a limiting distribution \(P_0\), and considers the performance of a test defined by \[ \text{reject if } \varphi\left(\hat{\psi}\right) > c_{\alpha}\] if the parameter value \(\psi\) were equal to \(x\). The `perf_meas` argument specifies which measure of performance to use. The package currently implements three such measures:
- p-value (specified by setting `perf_meas = "pval"`): The p-value of the test if \(\hat{\psi} = x\), defined by \[\Gamma(x, P_0) := \text{pr}(\varphi(x) < \varphi(Z)) \text{ where } Z \sim P_0\]
- acceptance rate (specified by setting `perf_meas = "est_acc"`): The acceptance rate of the test if \(\hat{\psi}\) is normally distributed and centered at \(x\), defined by \[\Gamma(x, P_0) := \text{pr}(\varphi(x + Z) < c_\alpha) \text{ where } Z \sim P_0 \text{ and } c_\alpha = F_{\varphi(Z)}^{-1}(1 - \alpha)\]
- multiplicative distance (specified by setting `perf_meas = "mag"`): The minimum \(s\) such that \[\text{pr}(\varphi(s x + Z) < c_\alpha) \text{ where } Z \sim P_0 \text{ and } c_\alpha = F_{\varphi(Z)}^{-1}(1 - \alpha)\] is lower than \(0.2\).
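
To make the three definitions concrete, the following base R sketch approximates each measure by Monte Carlo. It is an illustration under stated assumptions, not the package's implementation: the helper names (`pval_meas`, `acc_meas`, `mag_meas`) are hypothetical, \(P_0\) is taken to be a standard multivariate normal, \(\varphi\) is the \(\ell_2\) norm, and the multiplicative distance is found by a simple grid search.

```r
# Hypothetical helpers for illustration only; not functions exported by amp.
set.seed(1)
d <- 3
# Draws Z ~ P_0, assumed here to be a standard multivariate normal.
z_draws <- matrix(rnorm(1000 * d), ncol = d)
l2_norm <- function(x) sqrt(sum(x^2))  # the norm phi

# "pval": pr(phi(x) < phi(Z)), approximated by a Monte Carlo average.
pval_meas <- function(x, z_draws, phi) {
  mean(phi(x) < apply(z_draws, 1, phi))
}

# "est_acc": pr(phi(x + Z) < c_alpha), where c_alpha is the (1 - alpha)
# quantile of phi(Z).
acc_meas <- function(x, z_draws, phi, alpha = 0.05) {
  c_alpha <- unname(quantile(apply(z_draws, 1, phi), 1 - alpha))
  shifted <- sweep(z_draws, 2, x, "+")  # adds x to every draw
  mean(apply(shifted, 1, phi) < c_alpha)
}

# "mag": smallest s on a grid for which the acceptance rate at s * x drops
# below 0.2 (NA if no grid value qualifies, e.g. when x = 0).
mag_meas <- function(x, z_draws, phi, alpha = 0.05,
                     s_grid = seq(0, 5, by = 0.1)) {
  acc <- vapply(s_grid, function(s) acc_meas(s * x, z_draws, phi, alpha),
                numeric(1))
  s_grid[which(acc < 0.2)[1]]
}

pval_meas(rep(0, d), z_draws, l2_norm)  # 1: nothing is less extreme than x = 0
pval_meas(rep(3, d), z_draws, l2_norm)  # near 0 for a distant alternative
acc_meas(rep(0, d), z_draws, l2_norm)   # about 1 - alpha
mag_meas(rep(1, d), z_draws, l2_norm)
mag_meas(rep(2, d), z_draws, l2_norm)   # smaller: x is already farther from 0
```

Note the direction of each measure: small values of all three indicate an alternative that a test based on \(\varphi\) would detect easily.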
Recommendation: Based on what we know currently, we recommend that users use the multiplicative distance performance measure. The other measures can have limiting distributions that are highly concentrated near 0, which can cause issues when approximating the p-value of the test.
We will discuss specification of the norm in the next section. For more details on the procedure, including why performance measures are good for defining a test statistic, see A general adaptive framework for multivariate point null testing.
Two arguments are used to specify the norm used in defining the test statistic. The first is `nrm_type`, which can be either `"ssq"` or `"lp"`. These norms are defined as:

- `nrm_type = "lp"`: \[\ell_p: (x_1, x_2, \ldots, x_d) \mapsto \sqrt[p]{\sum_{i = 1}^d |x_i|^p}\]
- `nrm_type = "ssq"`: \[\jmath_{p}: (x_1, x_2, \ldots, x_d) \mapsto \left\{\textstyle\sum_{j=1}^{p} x^2_{(d-j+1)}\right\}^{1/2}\]

The choice of \(p\) is specified by the `pos_lp_norms` argument. If `pos_lp_norms` is assigned a single value, a non-adaptive version of the test will be performed. If instead `pos_lp_norms` is assigned multiple values, an adaptive test will be carried out. More information can be found in our paper. For the \(\ell_p\) norm, it is possible to set \(p = \infty\). To make this specification in R, include `"max"` in the vector of values assigned to `pos_lp_norms`.
The next argument we review specifies the method by which you wish to estimate the limiting distribution of the test statistic (\(\Gamma(\hat{\psi}, \hat{P}_0)\)). There are two options for this argument:
- Parametric bootstrap (`ld_est_meth = "par_boot"`): When using the parametric bootstrap version of the test, the estimated limiting distribution of \(\Gamma(\hat{\psi}, \hat{P}_0)\) is approximated by assuming that \(\hat{\psi}\) has a distribution equal to \(\hat{P}_0\) and that \(\hat{P}_0\) is a normal distribution.
- Permutation (`ld_est_meth = "perm"`): When using the permutation version of the test, the estimated limiting distribution of \(\Gamma(\hat{\psi}, \hat{P}_0)\) is approximated by repeatedly permuting the data and recalculating \(\hat{\psi}\) using the permuted data. This method may provide better finite sample performance. However, it comes at the cost of computational efficiency. Also note that, depending on the parameter of interest, the permutation based test may not have the desired null hypothesis. Thus, care must be taken when using this method.

The next two controls specify the accuracy of the approximation of the testing procedure.
To understand this control argument, it is important to distinguish between our parameter estimator \(\hat{\psi}\) and our test statistic, which is a function of \(\hat{\psi}\) and the estimated limiting distribution of \(\hat{\psi}\) under the null hypothesis (that \(\psi = 0\)), denoted by \(\hat{P}_0\). Letting \(\Gamma\) denote our performance measure, conditional on our observations, the true value of the test statistic is fixed and equal to \(\Gamma(\hat{\psi}, \hat{P}_0)\).
The `n_peld_mc_samples` argument determines how accurate the approximation of the test statistic \(\Gamma(\hat{\psi}, \hat{P}_0)\) will be. The performance measure is frequently a function of \(\hat{P}_0\) through some probability statement (see the `perf_meas` discussion for examples). To approximate these probabilities, a Monte Carlo (MC) approximation is used, and `n_peld_mc_samples` determines how many MC draws are taken.
Considering this argument in practice, note that the testing procedure only approximates the test statistic:
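
As a rough guide to choosing this value, suppose (as an assumption made for illustration) that the performance measure is a single probability \(\gamma\) approximated by the fraction \(\hat{\gamma}_m\) of \(m\) independent MC draws for which an event occurs. The standard binomial calculation then gives the MC standard error

\[
\text{se}(\hat{\gamma}_m) = \sqrt{\frac{\gamma (1 - \gamma)}{m}},
\qquad
\gamma = 0.9:\quad
m = 50 \Rightarrow \text{se} \approx 0.042,
\quad
m = 1000 \Rightarrow \text{se} \approx 0.0095 .
\]

Under this assumption, halving the approximation error requires roughly four times as many MC draws.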
```r
tc <- amp::test.control(n_peld_mc_samples = 50, pos_lp_norms = "2")
set.seed(10)
test_1 <- amp::mv_pn_test(obs_data = obs_data, param_est = amp::ic.pearson,
                          control = tc)
set.seed(20)
test_2 <- amp::mv_pn_test(obs_data = obs_data, param_est = amp::ic.pearson,
                          control = tc)
print(c(test_1$test_stat, test_2$test_stat))
#> [1] 0.92 0.94
```

In order to better approximate the test statistic, one may increase the value of this control argument:
```r
mc_draw_vals <- c(10, 50)
all_res <- list()
for (mc_draws in mc_draw_vals) {
  set.seed(121)
  tc <- amp::test.control(n_peld_mc_samples = mc_draws, pos_lp_norms = 2,
                          perf_meas = "est_acc")
  # Recompute the test statistic 50 times on the same data
  test_stat <- replicate(50, amp::mv_pn_test(obs_data = obs_data,
                                             param_est = amp::ic.pearson,
                                             control = tc)$test_stat)
  all_res[[as.character(mc_draws)]] <- data.frame("mc_draws" = mc_draws,
                                                  test_stat)
}
# Side-by-side histograms of the approximated test statistic
oldpar <- par(mfrow = c(1, 2))
yl <- 25
hist(all_res[[1]]$test_stat, main = "MC draws = 10", xlab = "Test Statistic",
     xlim = c(0, 1), ylim = c(0, yl), breaks = seq(0, 1, 0.1))
hist(all_res[[2]]$test_stat, main = "MC draws = 50", xlab = "Test Statistic",
     xlim = c(0, 1), ylim = c(0, yl), breaks = seq(0, 1, 0.1))
par(oldpar)
```

The other parameter that determines the approximation accuracy of the testing procedure is `ts_ld_bs_samp`. This argument determines the number of draws taken from the estimated limiting distribution of \(\Gamma(\hat{\psi}, \hat{P}_0)\). This is different from `n_peld_mc_samples`, which determines the accuracy of each of these draws and of the test statistic.
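
The distinction between the two controls can be sketched as a two-level simulation in base R. This is a hand-rolled illustration of the parametric bootstrap idea, not the package's internal code: `sigma_hat`, `draw_null`, and `gamma_hat` are hypothetical names, the null covariance is assumed to be the identity, and the p-value performance measure with the max norm is used for simplicity. Here `n_mc` plays the role of `n_peld_mc_samples` and `n_bs` plays the role of `ts_ld_bs_samp`.

```r
# Illustration only: a hand-rolled two-level approximation, not amp internals.
set.seed(2)
d <- 3
sigma_hat <- diag(d)  # assumed covariance of the estimated null distribution
n_mc <- 200           # inner draws: accuracy of each evaluation of Gamma
n_bs <- 500           # outer draws: size of the null sample of Gamma

# n draws from N(0, sigma_hat); chol() returns R with t(R) %*% R = sigma_hat.
draw_null <- function(n) matrix(rnorm(n * d), ncol = d) %*% chol(sigma_hat)
max_norm <- function(x) max(abs(x))

# Inner level: MC approximation of Gamma(x, P_0-hat) for the p-value measure.
gamma_hat <- function(x) {
  z <- draw_null(n_mc)
  mean(max_norm(x) < apply(z, 1, max_norm))
}

# Outer level: draw psi from P_0-hat and evaluate Gamma at each draw.
null_stats <- apply(draw_null(n_bs), 1, gamma_hat)
length(null_stats)  # n_bs values approximating the null distribution of Gamma
summary(null_stats)
```

Increasing `n_mc` makes each entry of `null_stats` less noisy, while increasing `n_bs` gives a finer picture of the distribution those entries are drawn from.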
The last argument determines the output of the `mv_pn_test` function. The standard output of the test function is a list containing the following:
- `pvalue`: The approximate p-value of the test.
- `test_stat`: The approximate value of the test statistic (\(\Gamma(\hat{\psi}, \hat{P}_0)\)).
- `test_st_eld`: The approximate limiting distribution of the test statistic (with length equal to `ts_ld_bs_samp`).
- `chosen_norm`: A vector indicating which norm was chosen by the adaptive test.
- `param_ests`: The parameter estimate (\(\hat{\psi}\)).
- `param_ses`: An estimate of the standard error of each element of \(\hat{\psi}\).
- `oth_ic_inf`: Any other information provided by the `param_est` function when calculating the IC and parameter estimates.

`other_output` is a character vector. Currently, `other_output` provides the option of returning two additional output elements:

- If `"var_est"` is contained in `other_output`, the test output will include `var_mat`, the empirical second moment of the IC (asymptotically equal to the variance estimator). This matrix can be quite large in higher dimensions, which is why there is a separate control for this option.
- If `"obs_data"` is contained in `other_output`, the test output will include the data passed to the testing function.