library(amp)Here, we will go through one of the examples from the paper for whichwe have already built an influence curve (IC). For a more formalintroduction of influence curves, seethis article. In thisexample,\(Y\) is the outcome ofinterest, and\(X_1\),…,\(X_d\) are\(d\) covariates measured on each(independent) individual. The parameter in this example is the pearsoncorrelation coefficient between the outcome of interest and eachcovariate, that is\(\psi_i = \text{corr}(Y,X_1)\). The null hypothesis we wish to test is that\(\psi_1 = \psi_2 =\)…\(=\psi_d=0\).
x_data<-matrix(rnorm(5000),ncol =5)y_data<-rnorm(1000)+0.02* x_data[,2]obs_data<-data.frame(y_data, x_data)Here, we have already written a function that calculates both theparameter estimate and the IC for this parameter (we also choose to makeno assumptions about the data generating mechanism).
amp::ic.pearson(obs_data,what ="est")#> $est#> X1 X2 X3 X4 X5#> [1,] -0.02897102 -0.02868844 -0.06009732 0.05895252 -0.007182061cor(y_data, x_data)#> [,1] [,2] [,3] [,4] [,5]#> [1,] -0.02897102 -0.02868844 -0.06009732 0.05895252 -0.007182061We are now ready to test the multivariate point null.
test_res<- amp::mv_pn_test(obs_data,param_est = amp::ic.pearson,control = amp::test.control())hist(test_res$test_st_eld)abline(v = test_res$pvalue,col ="red")The derivation of influence functions for parameter estimators isquite technically challenging. However, once the IC and the parameterestimators have been derived, they can be used by this package.
As an example, we will consider the influence function for the samplemean which is just\(x - \bar{x}\). Thefunction that is passed to the mv_pn_test will always take threearguments.
observ) should be the observeddatawhat) will specify if the functionshould return the estimate (“est”), the influence curve (“ic”), or both(“both”).control takes any other argumentsused to control the function. Even if your function does not use otherarguments, the function definition should still containcontrol = NULL.The object returned by the function should be a list whichcontains
"est" ifwhat is either"est"or"both""ic" ifwhat is either"ic"or"both"ic.mean<-function(observ,what ="both",control =NULL){if (!(what%in%c("ic","est","both"))) {stop("what must be one of ic (influence curve), est (estimate), or both") } ret<-list()if (what%in%c("ic","both")) { col_means<-colMeans(x = observ) infl<-sweep(x = observ,MARGIN =2,STATS = col_means,FUN ="-") ret$ic<- infl }if (what%in%c("est","both")) { ret$est<-colMeans(x = observ) }return(ret)}Now that we have our newly defined IC, we can pass it to the testingfunction. We first create a new dataset for which we are interested intesting if each covariate has mean zero:
set.seed(100)obs_data_mean1<-matrix(rnorm(n =500)+rep(c(0,0,0,0,0.01),each =100),ncol =5,byrow =FALSE)res_1<- amp::mv_pn_test(obs_data_mean1,param_est = ic.mean,control = amp::test.control())obs_data_mean2<-matrix(rnorm(n =500)+rep(c(0,0,0.32,0,0.07),each =100),ncol =5,byrow =FALSE)res_2<- amp::mv_pn_test(obs_data_mean2,param_est = ic.mean,control = amp::test.control())print(c(res_1$pvalue, res_2$pvalue))#> [1] 0.788 0.080In cases where control will modify the behavior of the IC orparameter estimation, these arguments should be appended to the list ofalready created arguments inamp::test.control() and passedto thecontrol argument of themv_pn_test.
When passing control arguments to theparam_estfunction, it is recommended that if needed, these functions explicitlypull these arguments out of control before passing them to a nestedfunction:
## Correct usagenested_fun1<-function(x)return(x)ic.mean1<-function(observ,what ="both",control =NULL){if (!(what%in%c("ic","est","both"))) {stop("what must be one of ic (influence curve), est (estimate), or both") } ret<-list()if (what%in%c("ic","both")) { mult<-nested_fun1(control$extra_arg) col_means<- mult*colMeans(x = observ) infl<-sweep(x = observ,MARGIN =2,STATS = col_means,FUN ="-") ret$ic<- infl }if (what%in%c("est","both")) { ret$est<-colMeans(x = observ) }return(ret)}test1<- amp::mv_pn_test(obs_data_mean1,param_est = ic.mean1,control =c(amp::test.control(),"extra_arg"=2))## Incorrect usagenested_fun2<-function(x)return(x$extra_arg)ic.mean2<-function(observ,what ="both",control =NULL){if (!(what%in%c("ic","est","both"))) {stop("what must be one of ic (influence curve), est (estimate), or both") } ret<-list()if (what%in%c("ic","both")) { mult<-nested_fun2(control) col_means<- mult*colMeans(x = observ) infl<-sweep(x = observ,MARGIN =2,STATS = col_means,FUN ="-") ret$ic<- infl }if (what%in%c("est","both")) { ret$est<-colMeans(x = observ) }return(ret)}test2<- amp::mv_pn_test(obs_data_mean1,param_est = ic.mean2,control =c(amp::test.control(),"extra_arg"=2))#> Warning in .checkargs(obs_data, param_est, control): Some arguments appear not to be used by the test. These are:#> extra_arg#> To avoid this warning, explicitly call these arguments in your param_est function (rather than just passing all control arguments to another function.