In applied work, in order to evaluate the effect of a set of exogenous variables on an outcome, it is very common to estimate a parametric model such as the linear model with ordinary least squares (OLS). But such parametric specifications may not capture the true relationship between the outcome and the exogenous variables. In fact, if the chosen parametric model is a bad approximation of the true model, then counterfactual analysis will be flawed. For this reason, over the past forty years a literature on specification tests has developed in order to assess whether a parametric specification is right or wrong. SpeTestNP is a package which implements heteroskedasticity-robust specification tests of parametric models from Bierens (1982), Zheng (1996), Escanciano (2006), Lavergne and Patilea (2008), and Lavergne and Patilea (2012).
Hippolyte Boucher (Hippolyte.Boucher@outlook.com) is the author of SpeTestNP and Pascal Lavergne (lavergnetse@gmail.com) is a contributor. Both Hippolyte Boucher and Pascal Lavergne are maintainers and any question or bug should be reported to one of them. This vignette describes the principle behind each test available in SpeTestNP, then how to use SpeTestNP to test a parametric specification in practice, with an illustration using expected earnings conditional on education and age.
In order to present the specification tests available in SpeTestNP, we first describe the model being considered and define the null and alternative hypotheses, second we highlight the principle behind each test, third we derive the test statistics and their rejection rules (based on either the bootstrap or Gaussian asymptotics), and fourth we briefly discuss and compare the size and power performances of the tests.
Consider a sample \((y_j,x_j')_{j=1}^{n}\) of independent observations with \(y_j\) the scalar outcome and \(x_j\) a \(k\times 1\) vector of exogenous explanatory variables. Then as long as \(\mathbb{E}(|y_j|)<+\infty\) there exists some Borel-measurable regression function \(g(\cdot)\) such that \(g(x_j)=\mathbb{E}(y_j|x_j) \ \ a.s\). That is, the true model linking \(y_j\) and \(x_j\) writes
\[ y_j=g(x_j)+\varepsilon_j, \qquad\mathbb{E}(\varepsilon_j|x_j)=0 \quad a.s\]
for \(j=1,2,\dots,n\) and where \(\varepsilon_j\) denotes the part of \(y_j\) which is unexplained by \(x_j\) in terms of the mean. But instead, in practice, some parametric model characterized by a parametric family of functions \(\mathcal{F}=\{f(\cdot,\tilde{\theta}):\tilde{\theta}\in\Theta\subset\mathbb{R}^k\}\) is considered
\[ y_j=f(x_j,\theta)+u_j \]
where \(\theta=\underset{\tilde{\theta}\in\Theta}{\mathrm{Argmin}}\ \mathbb{E}((y_j-f(x_j,\tilde{\theta}))^2)\) is the parameter which yields the best mean square error fit for this parametric model, and where \(u_j\) is the error induced by this parametric model. A typical estimator of \(\theta\) is the non-linear least squares (NLS) estimator denoted by \(\hat{\theta}\); when \(\mathcal{F}\) is the family of linear functions, \(\hat{\theta}\) is the OLS estimator. Next notice that if \(g(\cdot)\in\mathcal{F}\) then \(\mathbb{E}(u_j|x_j)=0 \ a.s\) or equivalently \(\mathbb{E}(y_j|x_j)=f(x_j,\theta)\). Indeed, if \(g(\cdot)\in\mathcal{F}\) then by properties of projections
\[ g(\cdot)=\underset{\tilde{g}}{\mathrm{Argmin}}\ \mathbb{E}((y_j-\tilde{g}(x_j))^2)=\underset{\tilde{g}\in\mathcal{F}}{\mathrm{Argmin}}\ \mathbb{E}((y_j-\tilde{g}(x_j))^2)=\underset{\tilde{\theta}\in\Theta}{\mathrm{Argmin}}\ \mathbb{E}((y_j-f(x_j,\tilde{\theta}))^2)=f(\cdot,\theta) \]
Consequently, when modeling the true relationship between \(y\) and \(x\) with a parametric model, the implicit null hypothesis is
\[ H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s\]
And the alternative hypothesis is
\[H_1:\mathbb{P}(\mathbb{E}(u_j|x_j)=0)<1 \]
Equivalently, the null and alternative hypotheses write
\[H_0: g(x_j)=f(x_j,\theta) \quad a.s,\qquad H_1:\mathbb{P}(g(x_j)=f(x_j,\theta))<1 \]
Next, to construct specification tests, the null hypothesis is reformulated into moment conditions from which statistics can be derived. The five reformulations of the null hypothesis are presented in turn.
Bierens (1982) proves that the conditional moment condition of the null hypothesis is equivalent to an infinite number of moment conditions, which is in turn equivalent to an integrated conditional moment condition
\[ H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\ \mathbb{E}(u_j \exp(i\beta' x_j))=0 \ \ \forall\beta\in\mathbb{R}^{k} \ \Leftrightarrow\ \int_{\mathbb{R}^k}\left|\mathbb{E}(u_j \exp(i\beta' x_j))\right|^2d\mu(\beta)=0 \]
where \(\mu(\cdot)\) is any positive almost everywhere measure, \(|\cdot|\) denotes the modulus, and \(i\) is the imaginary unit.
Instead, Zheng (1996) finds an equivalence between the conditional moment condition and an unconditional one
\[ H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\ \mathbb{E}(u_j\mathbb{E}(u_j|x_j)f(x_j))=0 \]
where \(f(\cdot)\) denotes the probability density function of \(x_j\).
Escanciano (2006) proves the equivalence between the null hypothesis, an infinite number of moment conditions which differ from those of Bierens (1982), and an integrated moment condition
\[ H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\ \mathbb{E}(u_j 1\{\beta' x_j\leqslant l\})=0 \ \ \forall(\beta,l)\in\mathbb{S}^{k}\times \mathbb{R} \ \Leftrightarrow\ \int_{\mathbb{S}^k\times\mathbb{R}}\mathbb{E}^2(u_j 1\{\beta' x_j\leqslant l\})f_\beta(l)\,d\beta\, dl=0 \]
where \(1\{\cdot\}\) denotes the indicator function, \(\mathbb{S}^{k}=\{\beta\in\mathbb{R}^k:|\beta|=1\}\) denotes the unit sphere, and \(f_\beta(\cdot)\) denotes the probability density function of \(\beta' x_j\).
Lavergne and Patilea (2008) show that the null hypothesis is equivalent to an infinite number of unconditional moment conditions
\[ H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\ \underset{||\beta||=1}{\max}\ \mathbb{E}(u_j\mathbb{E}(u_j|\beta' x_j)\omega(\beta' x_j))=0 \]
for any \(\omega(\cdot)\) such that \(\forall \beta\in\mathbb{R}^k\), \(\omega(\beta' x_j)>0\) on the support of \(\mathbb{E}(u_j|\beta' x_j)\). This condition resembles that of Zheng (1996) with \(\beta' x_j\) replacing \(x_j\) in an effort to remove the curse of dimensionality.
Finally, Lavergne and Patilea (2012) prove the equivalence between the null and an integrated moment condition
\[ H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\ \int_B\mathbb{E}(\mathbb{E}^2(u_j|\beta' x_j)f_\beta(\beta' x_j))d\beta=0 \]
where \(B\subseteq \mathbb{S}^k\) and \(f_\beta(\cdot)\) denotes the density of \(\beta' x_j\). This moment condition combines the integrated moments approaches of Bierens (1982) and Escanciano (2006) and the dimension reduction device used in Lavergne and Patilea (2008).
Each test relies on reformulating the null hypothesis into a moment condition for which an empirical counterpart exists. Thus the test statistics are sample analogs of the moments defining the null hypothesis, possibly multiplied by the sample size in order to obtain variation at the limit. Denote by \(\hat{\theta}\) a consistent estimator of \(\theta\) and let \(\hat{u}_j=y_j-f(x_j,\hat{\theta})\) denote the residual for individual \(j\). The five test statistics are derived in turn.
An empirical counterpart of the integrated conditional moment \(\int_{\mathbb{R}^k}\left|\mathbb{E}(u_j \exp(i\beta' x_j))\right|^2d\mu(\beta)\) of Bierens (1982) is
\[ T_{icm}=\int_{\mathbb{R}^k}\left|\frac{1}{\sqrt{n}}\sum_{j=1}^n\hat{u}_j \exp(i\beta' x_j)\right|^2d\mu(\beta) \]
with some positive almost everywhere measure \(\mu(\cdot)\) and where \(|\cdot|\) denotes the modulus. Using properties of the modulus and of the Fourier transform it can then be shown that
\[T_{icm}=\frac{1}{n}\sum_{j,j'}\hat{u}_j\hat{u}_{j'}K(x_j-x_{j'})=\frac{1}{n}\hat{u}'W_{icm}\hat{u}\]
where \(K(\cdot)\) is the Fourier transform of \(\mu(\cdot)\), \(\hat{u}=(\hat{u}_1,\dots,\hat{u}_n)'\) is the \(n\times 1\) vector of stacked residuals, and \(W_{icm}\) is the matrix with entries \(K(x_j-x_{j'})\) for any row \(j\) and column \(j'\). Although this statistic can be used as is, \(\mu(\cdot)\) is typically assumed to be a symmetric probability measure which is strictly positive almost everywhere. This simplifies the asymptotic theory and the derivation of the test statistic in practice. Indeed, as a consequence the Fourier transform \(K(\cdot)\) of \(\mu(\cdot)\) is a symmetric bounded density. Hence candidates for \(K(\cdot)\) include logistic, triangular, normal, Student, or Cauchy densities, see Johnson, Kotz and Balakrishnan (1995, section 23.3) and Dreier and Kotz (2002). Furthermore, to control for scale, we impose either that the integral of the square of \(K(\cdot)\) equals one or that the distribution associated with \(K(\cdot)\) has variance one.
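As an illustration, \(T_{icm}\) can be computed in a few lines of base R for a scalar regressor, taking the standard normal density as \(K(\cdot)\). This is a minimal sketch: the function name `icm_stat` and the simulated data are illustrative, not the package's internals.

```r
# Sketch of T_icm = (1/n) u' W u with W[j, j'] = K(x_j - x_j'),
# assuming a scalar regressor and a standard normal density for K(.)
icm_stat <- function(u_hat, x) {
  n <- length(u_hat)
  W <- dnorm(outer(x, x, "-"))      # entries K(x_j - x_j')
  drop(t(u_hat) %*% W %*% u_hat) / n
}

set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)           # a truly linear model
u_hat <- resid(lm(y ~ x))           # OLS residuals
T_icm <- icm_stat(u_hat, x)
```

Since the Gaussian kernel is positive definite, the resulting statistic is nonnegative by construction.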
The test statistic of Zheng (1996) is the sample analog of \(\mathbb{E}(u_j\mathbb{E}(u_j|x_j)f(x_j))\), which is derived by estimating both the density \(f(\cdot)\) of \(x_j\) and the conditional mean \(\mathbb{E}(u_j|x_j=\cdot)\) with kernels. For any \(\tilde{x}\in\mathbb{R}^k\) define
\[ \hat{f}(\tilde{x})=\frac{1}{nh^k}\sum_jK\left(\frac{\tilde{x}-x_j}{h}\right), \qquad\hat{\mathbb{E}}(u_j|x_j=\tilde{x})=\frac{1}{nh^k}\sum_j\frac{u_j}{\hat{f}(\tilde{x})}K\left(\frac{\tilde{x}-x_j}{h}\right)\]
where \(K(\cdot)\) is a kernel function which is nonnegative, symmetric, bounded, continuous and which integrates to one, and \(h\) is a bandwidth such that \(h\underset{n\rightarrow+\infty}{\rightarrow}0\) and \(nh^k\underset{n\rightarrow+\infty}{\rightarrow}+\infty\). Then the test statistic is the sample analog of the moment \(\mathbb{E}(u_j\mathbb{E}(u_j|x_j)f(x_j))\)
\[ T_{zheng}=\frac{1}{n}\sum_j\hat{u}_j\hat{\mathbb{E}}(u_{j'}|x_{j'}=x_j)\hat{f}(x_j)\]
It can be rewritten as
\[ T_{zheng}=\frac{1}{n(n-1)h^k}\sum_{j,j'\neq j}\hat{u}_j\hat{u}_{j'}K\left(\frac{x_j-x_{j'}}{h}\right)=\frac{1}{n(n-1)h^k}\hat{u}'W_{zheng}\hat{u} \]
where \(W_{zheng}\) is a matrix whose diagonal elements are equal to zero and whose other entries are equal to \(K\left(\frac{x_j-x_{j'}}{h}\right)\) for any row \(j\) and column \(j'\) such that \(j\neq j'\).
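The statistic above can likewise be sketched in base R for a scalar regressor with a normal kernel; `zheng_stat` and its rule-of-thumb bandwidth default are assumptions of this sketch, not the package's code.

```r
# Sketch of T_zheng: leave-one-out kernel-weighted sum of residual
# cross-products, scalar regressor, normal kernel
zheng_stat <- function(u_hat, x, h = 1.06 * length(x)^(-1/5)) {
  n <- length(u_hat)
  W <- dnorm(outer(x, x, "-") / h)  # entries K((x_j - x_j')/h)
  diag(W) <- 0                      # drop the j' = j terms
  drop(t(u_hat) %*% W %*% u_hat) / (n * (n - 1) * h)
}

set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)         # correctly specified linear model
T_zheng <- zheng_stat(resid(lm(y ~ x)), x)
```

Setting the diagonal of `W` to zero implements the \(j'\neq j\) restriction in the double sum.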
The test statistic of Escanciano (2006) is the sample analog of \(\int_{\mathbb{S}^k\times\mathbb{R}}\mathbb{E}^2(u_j 1\{\beta' x_j\leqslant l\})f_\beta(l)\,d\beta\, dl\) times \(n\), which is derived by approximating the density \(f_\beta(\cdot)\) by a probability mass function. Let \(\hat{f}_\beta(l)=\frac{1}{n}\sum_r 1\{\beta' x_r=l\}\), then the statistic is
\[ T_{esca}=\int_{\mathbb{S}^k\times\mathbb{R}}\left(\frac{1}{\sqrt{n}}\sum_j\hat{u}_j 1\{\beta' x_j\leqslant l\}\right)^2\hat{f}_\beta(l)\,d\beta\, dl \]
It can be proven that it has the same form as the other teststatistics
\[ T_{esca} =\frac{1}{n}\sum_{j,j'}\hat{u}_j\hat{u}_{j'}\frac{1}{n}\sum_r\int_{\mathbb{S}^k}1\{\beta'x_j\leqslant \beta' x_r,\beta' x_{j'}\leqslant \beta'x_r\}d\beta=\frac{1}{n}\hat{u}'W_{esca}\hat{u}\]
where \(W_{esca}\) has elements \(\frac{1}{n}\sum_r W_{esca,j,j',r}\) with \(W_{esca,j,j',r}=\int_{\mathbb{S}^k}1\{\beta' x_j\leqslant \beta' x_r,\beta' x_{j'}\leqslant \beta' x_r\}d\beta\) for any row \(j\) and column \(j'\). Approximating the integrals in \(W_{esca}\) is unnecessary because
\[ W_{esca,j,j',r}=W_{esca,j,j',r}^{(0)}\frac{\pi^{k/2-1}}{\Gamma(k/2+1)},\qquad W_{esca,j,j',r}^{(0)}=\left|\pi-\arccos\left(\frac{(x_j-x_{r})'(x_{j'}-x_r)}{|x_j-x_{r}||x_{j'}-x_r|}\right)\right| \]
See Appendix B in Escanciano (2006) for more details. Note that \(n^3\) operations are necessary to compute \(W_{esca}\), which means that this statistic takes much more time to compute than the others.
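A single element \(W_{esca,j,j',r}\) can be computed directly from the closed form. This sketch assumes the degenerate cases \(x_j=x_r\) or \(x_{j'}=x_r\) are handled separately (as in the paper's appendix); `esca_weight` is an illustrative name.

```r
# Sketch of W^(0)_{esca,j,j',r} = |pi - arccos(cosine of the angle
# between (x_j - x_r) and (x_j' - x_r))|, then scaled by
# pi^(k/2 - 1) / Gamma(k/2 + 1); assumes x_j != x_r and x_j' != x_r
esca_weight0 <- function(xj, xjp, xr) {
  a <- xj - xr
  b <- xjp - xr
  cosv <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  cosv <- max(-1, min(1, cosv))     # guard against rounding error
  abs(pi - acos(cosv))
}

esca_weight <- function(xj, xjp, xr, k = length(xj)) {
  esca_weight0(xj, xjp, xr) * pi^(k / 2 - 1) / gamma(k / 2 + 1)
}
```

For scalar \(x\) (\(k=1\)) the cosine is \(\pm 1\), so \(W^{(0)}\) is either \(\pi\) or \(0\) depending on whether \(x_j\) and \(x_{j'}\) lie on the same side of \(x_r\).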
Lavergne and Patilea (2008) consider a sample analog of the moment \(\mathbb{E}(u_j\mathbb{E}(u_j|x_j)\omega(\beta' x_j))\) and replace \(\omega(\cdot)\) by \(f_\beta(\cdot)\), the density of \(\beta' x_j\). In addition, they replace \(\beta\) by the value in the unit hypersphere which maximizes the moment taken to the square. This way the test is given the direction which best rejects the null hypothesis under the alternative. Thus first define for any \(\beta\in\mathbb{S}^k\)
\[ \mathcal{Q}(\beta)=\frac{1}{n(n-1)h}\sum_{j,j'\neq j}\hat{u}_j\hat{u}_{j'}K\left(\frac{\beta'(x_j-x_{j'})}{h}\right) \]
where \(K(\cdot)\) is a bounded symmetric density with bounded variation, and \(h\) is a bandwidth such that \(h\underset{n\rightarrow+\infty}{\longrightarrow}0\) and \(\frac{(nh^2)^{\delta}}{\log(n)}\underset{n\rightarrow+\infty}{\longrightarrow}+\infty\) for some \(\delta\in(0,1)\). \(\mathcal{Q}(\beta)\) cannot be used directly; instead define \(\hat{\beta}\), the direction which best captures the correlation between the residuals and the explanatory variables
\[ \hat{\beta}=\underset{\beta\in\mathbb{S}^k}{\mathrm{Argmax}}\ \left|n\sqrt{h}\mathcal{Q}(\beta)-\alpha_n 1\{\beta\neq\beta^*\}\right| \]
where \(\beta^*\) represents a favored direction chosen a priori, and \(\alpha_n\underset{n\rightarrow+\infty}{\rightarrow}0\) is the weight given to this favored direction. \(\beta^*\) and \(\alpha_n\) significantly improve the power properties of the test in small samples. Note that in practice the unit hypersphere \(\mathbb{S}^k\) is approximated by a finite number of points. The test statistic is then the criterion evaluated at \(\hat{\beta}\)
\[ T_{pala}=\mathcal{Q}(\hat{\beta})=\frac{1}{n(n-1)h}\sum_{j,j'\neq j}\hat{u}_j\hat{u}_{j'}K\left(\frac{\hat{\beta}'(x_j-x_{j'})}{h}\right)=\frac{1}{n(n-1)h}\hat{u}'W_{pala}\hat{u} \]
where \(W_{pala}\) is a matrix with diagonal elements equal to zero and other entries equal to \(K\left(\frac{\hat{\beta}'(x_j-x_{j'})}{h}\right)\) for any row \(j\) and column \(j'\) such that \(j\neq j'\).
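The search for \(\hat{\beta}\) can be sketched for \(k=2\) regressors by maximizing \(\mathcal{Q}(\beta)\) over a crude grid on the half unit circle; here no favored direction is used (\(\alpha_n=0\)), and all names, the grid size, and the bandwidth are illustrative assumptions.

```r
# Sketch of Q(beta) and of beta_hat for k = 2, normal kernel,
# no favored direction (alpha_n = 0)
Q_beta <- function(u_hat, X, beta, h) {
  n <- length(u_hat)
  s <- drop(X %*% beta)                       # single index beta'x_j
  W <- dnorm(outer(s, s, "-") / h)
  diag(W) <- 0                                # j' != j
  drop(t(u_hat) %*% W %*% u_hat) / (n * (n - 1) * h)
}

set.seed(1)
n <- 200
X <- matrix(rnorm(2 * n), n, 2)
y <- X[, 1] + X[, 1]^2 + rnorm(n)             # nonlinear in x_1
u_hat <- resid(lm(y ~ X))                     # misspecified linear fit
h <- 1.06 * n^(-1/6)                          # rule of thumb for k = 2

angles <- seq(0, pi, length.out = 50)         # half circle suffices
grid <- cbind(cos(angles), sin(angles))
q_vals <- apply(grid, 1, function(b) Q_beta(u_hat, X, b, h))
beta_hat <- grid[which.max(abs(n * sqrt(h) * q_vals)), ]
T_pala <- Q_beta(u_hat, X, beta_hat, h)       # criterion at beta_hat
```

In the package the hypersphere is likewise approximated by a finite set of directions, though the number of points and the treatment of \(\beta^*\) differ from this sketch.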
Finally, Lavergne and Patilea (2012) use as a test statistic the sample analog of \(\int_B\mathbb{E}(\mathbb{E}^2(u_j|\beta' x_j)f_\beta(\beta' x_j))d\beta=0\) for some \(B\subseteq\mathbb{S}^k\). To derive it, notice that an empirical counterpart of \(\mathbb{E}(\mathbb{E}^2(u_j|\beta' x_j)f_\beta(\beta' x_j))\) is \(\mathcal{Q}(\beta)\) as defined previously. Hence their test statistic, which they call the smooth integrated conditional moment statistic, writes
\[ T_{sicm}=\int_B\mathcal{Q}(\beta)d\beta=\int_B\frac{1}{n(n-1)h}\sum_{j,j'\neq j}\hat{u}_j\hat{u}_{j'}K\left(\frac{\beta'(x_j-x_{j'})}{h}\right)d\beta=\frac{1}{n(n-1)h}\hat{u}'W_{sicm}\hat{u} \]
where \(W_{sicm}\) has diagonal elements equal to zero and its other elements are equal to \(\int_B K\left(\frac{\beta'(x_j-x_{j'})}{h}\right)d\beta\) for any row \(j\) and any column \(j'\neq j\). Clearly \(T_{sicm}\) is a smooth version of \(T_{icm}\) because of the bandwidth \(h\). Furthermore, it is also a smooth version of \(T_{pala}\) in the sense that instead of being based on the squared error in the worst direction of \(\beta' x_j\), it is based on a continuum of directions. In practice, to compute the integral a finite number of points are drawn randomly from \(B\), and \(B\) does not have to be the whole unit hypersphere \(\mathbb{S}^k\). For instance, half hyperspheres can be considered, such as \(\{\beta\in\mathbb{R}^k:\beta_m\geqslant 0,||\beta||=1\}\) where \(\beta_m\) denotes the \(m\)-th element of the vector \(\beta\).
The five test statistics can be normalized. Not only does this improve the finite sample properties of the tests, but it also allows the use of Gaussian asymptotics when deciding whether to reject the null hypothesis with the tests of Zheng (1996), Lavergne and Patilea (2008), and Lavergne and Patilea (2012). This is extremely useful in large samples, where it avoids the cost of the bootstrap.
The normalized test statistics are of the following form:
\[ \hat{T}_{icm}=\hat{u}'\hat{W}_{icm}\hat{u}, \qquad \hat{W}_{icm}=\frac{W_{icm}}{\sqrt{2\sum_{j,j'}\hat{\sigma}^2_j\hat{\sigma}^2_{j'}K^2(x_j-x_{j'})}} \]
\[ \hat{T}_{zheng}=\hat{u}'\hat{W}_{zheng}\hat{u}, \qquad \hat{W}_{zheng}=\frac{W_{zheng}}{\sqrt{2\sum_{j,j'\neq j}\hat{\sigma}^2_j\hat{\sigma}^2_{j'}K^2\left(\frac{x_j-x_{j'}}{h}\right)}} \]
\[ \hat{T}_{esca}=\hat{u}'\hat{W}_{esca}\hat{u}, \qquad \hat{W}_{esca}=\frac{W_{esca}}{\sqrt{2\sum_{j,j'}\hat{\sigma}_j^2\hat{\sigma}^2_{j'}\left(\frac{1}{n}\sum_r\int_{\mathbb{S}^k}1\{\beta' x_j\leqslant\beta' x_r,\beta' x_{j'}\leqslant\beta' x_r\}d\beta\right)^2}} \]
\[ \hat{T}_{pala}=\hat{u}'\hat{W}_{pala}\hat{u}, \qquad \hat{W}_{pala}=\frac{W_{pala}}{\sqrt{2\sum_{j,j'\neq j}\hat{\sigma}^2_j\hat{\sigma}_{j'}^2K^2\left(\frac{\hat{\beta}'(x_j-x_{j'})}{h}\right)}} \]
\[ \hat{T}_{sicm}=\hat{u}'\hat{W}_{sicm}\hat{u}, \qquad \hat{W}_{sicm}=\frac{W_{sicm}}{\sqrt{2\sum_{j,j'\neq j}\hat{\sigma}_j^2\hat{\sigma}^2_{j'}\left(\int_B K\left(\frac{\beta'(x_j-x_{j'})}{h}\right)d\beta\right)^2}} \]
where \(\hat{\sigma}_j^2\) controls for the conditional variance of the error \(u_j\). A naive approach to the normalization, which works very well in large samples, is to directly replace \(\hat{\sigma}_j^2\) by the squared residual \(\hat{u}_j^2\). Another approach to the normalization is to replace \(\hat{\sigma}_j^2\) by an estimator such as the nonparametric kernel variance estimator of Yin, Geng, Li and Wang (2010), which writes
\[ \hat{\sigma}^2(\tilde{x})=\frac{\frac{1}{nh_v}\sum_j(y_j-\overline{y}(\tilde{x}))^2K\left(\frac{\tilde{x}-x_j}{h_v}\right)}{\frac{1}{nh_v}\sum_j K\left(\frac{\tilde{x}-x_j}{h_v}\right)},\qquad \overline{y}(\tilde{x})=\frac{\frac{1}{nh_v}\sum_j y_j K\left(\frac{\tilde{x}-x_j}{h_v}\right)}{\frac{1}{nh_v}\sum_j K\left(\frac{\tilde{x}-x_j}{h_v}\right)} \]
where \(K\) is a kernel function and \(h_v\) is a bandwidth which can be different from \(h\).
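The variance estimator above can be sketched for a scalar regressor with a normal kernel; note that the common factor \(\frac{1}{nh_v}\) cancels between numerator and denominator. The function name and bandwidth default are illustrative assumptions.

```r
# Sketch of the kernel conditional variance estimator of Yin, Geng,
# Li and Wang (2010): local mean of y at x0, then local variance
sigma2_hat <- function(x0, y, x, h_v = 1.06 * length(x)^(-1/5)) {
  w <- dnorm((x0 - x) / h_v)        # kernel weights K((x0 - x_j)/h_v)
  y_bar <- sum(w * y) / sum(w)      # local mean of y at x0
  sum(w * (y - y_bar)^2) / sum(w)   # local variance at x0
}

set.seed(1)
x <- rnorm(500)
y <- 1 + 2 * x + rnorm(500)         # homoskedastic errors
s2 <- sigma2_hat(0, y, x)
```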
Both the naive and nonparametric approaches to the normalization areimplemented.
To decide whether or not to reject the null hypothesis we need to compute quantiles of the distribution of each statistic under the null conditional on \(x=(x_1,\dots,x_n)'\). Then \(H_0\) is rejected at level 5% if the test statistic is above the 95% quantile of its distribution under the null. To compute these quantiles we propose two solutions.
First, we consider computing the quantiles using the fixed design bootstrap. \(x\) is held fixed, so for each test statistic the central matrix \(W\) is held fixed, and an \(n\times 1\) vector of residuals \(\hat{u}_b\) is drawn using the fixed design wild bootstrap of Wu (1986) or the smooth conditional moment bootstrap of Gozalo (1997), which also controls for potential heteroskedasticity. Using this bootstrapped vector of residuals and the maintained central matrix \(W\), a bootstrapped statistic can be computed. After repeating this operation many times we obtain a vector of bootstrapped statistics, whose quantiles can then be used to reject \(H_0\) or not. As an example, if the test we consider is that of Bierens (1982), a bootstrapped statistic is
\[ T_{icm,b}=\frac{1}{n}\hat{u}'_b W_{icm}\hat{u}_b \]
By repeating this operation \(B\) times we obtain \(B\) bootstrapped statistics \((T_{icm,b})_{b=1}^B\) which mimic the behavior of \(T_{icm}\) under the null hypothesis. Consequently, the parametric specification will be rejected at level 5% if \(T_{icm}>q_{95\%}\) where \(q_{95\%}\) is the 95% quantile of \((T_{icm,b})^B_{b=1}\). The same procedure can be applied to the other tests and their normalized versions to decide whether or not to reject the null hypothesis.
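This fixed design wild bootstrap can be sketched as follows for the ICM statistic, assuming Rademacher weights (\(\pm 1\) with equal probability; the two-point Mammen distribution is another common choice) and a standard normal density as \(K(\cdot)\); the number of draws and variable names are illustrative.

```r
# Sketch of the fixed-design wild bootstrap of Wu (1986) applied to
# T_icm: W is held fixed and residuals are multiplied by iid weights
# with mean zero and variance one
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
u_hat <- resid(lm(y ~ x))
W <- dnorm(outer(x, x, "-"))                   # central matrix, fixed

T_obs <- drop(t(u_hat) %*% W %*% u_hat) / n
T_boot <- replicate(199, {
  u_b <- u_hat * sample(c(-1, 1), n, replace = TRUE)  # wild residuals
  drop(t(u_b) %*% W %*% u_b) / n
})
p_value <- mean(T_boot >= T_obs)               # bootstrap p-value
```

Multiplying each residual by an independent sign preserves its conditional variance, which is what makes the wild bootstrap robust to heteroskedasticity.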
Second, we consider using the quantiles of the standard normal. As mentioned, the normalized versions of the statistics of Zheng (1996), Lavergne and Patilea (2008), and Lavergne and Patilea (2012) are asymptotically standard normal. Thus if one of these normalized test statistics is used, we can use the quantiles of a standard normal to reject \(H_0\) or not. As an example, if the test we consider is that of Zheng (1996) with a normalization, then the parametric specification will be rejected at level 5% if \(|\hat{T}_{zheng}|>1.96\).
Each test can be proven to be valid, in the sense that under the null hypothesis the probability of rejecting the null converges to the nominal level, and to be consistent, in the sense that under any fixed alternative the probability of rejecting the null converges to one.
But these five tests differ significantly in terms of power in practice. The test of Zheng (1996) seems to be the least powerful test in practice: it has no power against Pitman alternatives and has difficulty rejecting the null when the number \(k\) of exogenous variables is large. The test of Bierens (1982) possesses more than trivial power against Pitman alternatives, but it also has trouble rejecting the null when \(k\) is large. The test of Escanciano (2006) does not depend on a choice of weighting function and does not require numerical integration; however, deriving its statistic requires \(n^3\) operations, making it very slow and hard to apply in practice. In addition, its power largely depends on the true alternative and is low when \(k\) is large. The tests of Lavergne and Patilea (2008) and Lavergne and Patilea (2012) are more powerful than the other tests when \(k\) is large because of their use of a continuum of single indices \(\beta' x_j\) to summarize the correlation between \(u_j\) and \(x_j\). At the same time, when \(k\) is small the two tests are at least as powerful as the others. As mentioned, the power of the Lavergne and Patilea (2008) test comes from the “worst” single-index alternative whereas the power of the Lavergne and Patilea (2012) test comes from a continuum of single-index alternatives. Thus in practice, under the alternative, the nature of the correlation between \(u_j\) and \(x_j\) will determine which of these two tests is more powerful.
See the references for more details.
Previously we have described the principle behind the five nonparametric specification tests, how to derive the test statistics and the rejection rules, and discussed their properties. Next we show how to use SpeTestNP to test parametric models in practice, with first the installation, second a description of how to use the test, third a thorough description of the arguments of the package's main function SpeTest, and fourth an illustration to determine the true shape of expected wages conditional on years of education and age.
To install SpeTestNP from CRAN simply run the following command:
install.packages("SpeTestNP")

To install SpeTestNP from Github, the package devtools should be installed and the following commands should be run:
install.packages("devtools")
library("devtools")
install_github("HippolyteBoucher/SpeTestNP")

To choose where and how the package is installed check help(install_github) and help(install.packages). Alternatively, users can download the package and directly install it with the CMD. SpeTestNP requires the packages stats (already installed and loaded by default), foreach, parallel and doParallel (if parallel computing is used to generate the bootstrap vector) to be installed.
Recall the true model and the model induced by the parametric specification characterized by \(\mathcal{F}=\{f(\cdot,\tilde{\theta}):\tilde{\theta}\in\Theta\subset\mathbb{R}^k\}\)
\[ y_j=g(x_j)+\varepsilon_j, \qquad y_j=f(x_j,\theta)+u_j \] where \(\mathbb{E}(y_j|x_j)=g(x_j) \ a.s\) and \(\theta=\underset{\tilde{\theta}\in\Theta}{\mathrm{Argmin}}\ \mathbb{E}((y_j-f(x_j,\tilde{\theta}))^2)\).
Then, to test the parametric specification, or equivalently to test \(H_0:\mathbb{E}(u_j|x_j)=0 \ a.s\), the function SpeTest of the package SpeTestNP can be directly used by filling the first argument eq with a fitted model of class lm or nls. In case the parametric specification is linear or can be rewritten in a linear form, eq should be an object of class lm. In case of non-linear models, eq should be an object of class nls, which stands for non-linear least squares (from the package stats). Note that in order to perform the specification test by feeding SpeTest with an nls model, the arguments in nls must be given in the right order. Then by running the following command the parametric specification characterized by \(\mathcal{F}\) is tested
SpeTest(eq)

The function returns an object of class STNP which, when printed with print or print.STNP, returns the test statistic and its p-value. An object of class STNP is a list which not only contains the test statistic stat and its p-value pval but also the type of the test type, the rejection rule rejection, the test statistic normalization norma, the kernel function \(K(\cdot)\) used to compute the test statistic central matrix ker, the standardization method of the test statistic central matrix knorm, the type of bootstrap used to compute the p-value boot, the number of bootstrap samples used to compute the p-value nboot, the bandwidths cch and hv, etc. To obtain a summary of the test and its options the method summary or summary.STNP can be used on objects of class STNP.
By default the test of Bierens (1982) with the standard normal density as the central matrix function is applied and the test p-value is obtained using 50 wild bootstrap samples with a naive estimator of the conditional variance of the errors. Among many options, by changing the argument rejection from bootstrap (the default) to asymptotics if type = "zheng", type = "pala" or type = "sicm", the test p-value is then based on the asymptotic normality of these normalized test statistics under the null. In addition, by default the test statistic is not normalized, i.e. by default the denominator in \(T_{zheng}\), \(T_{pala}\) and \(T_{sicm}\) is set to one. This can be changed by setting norma = "naive" to normalize the statistic using a naive estimator of the errors' conditional variance, or by setting norma = "np" to normalize the statistic using a nonparametric estimator of the errors' conditional variance. If rejection = "bootstrap", setting para to TRUE greatly speeds up the computation of the p-value by deriving bootstrapped statistics in parallel. For more details refer to the next section or help(SpeTest).
Note that the functions SpeTest_Stat and SpeTest_Dist are also available. Both functions take similar arguments to SpeTest. SpeTest_Stat computes the specification test statistic, while SpeTest_Dist generates a vector of size nboot from the specification test statistic distribution under the null hypothesis using the bootstrap. The argument para is also available to SpeTest_Dist. SpeTest_Stat and SpeTest_Dist make it easy to perform simulation exercises.
To be more specific about the arguments of the function SpeTest:
Argument eq should be the fitted parametric model, of class lm or nls, of the parametric specification of interest \(\mathcal{F}\)
Argument type refers to the type of the test
If type = "icm" the test of Bierens (1982) is performed (default)
If type = "zheng" the test of Zheng (1996) is performed
If type = "esca" the test of Escanciano (2006) is performed, which significantly increases computing time
If type = "pala" the test of Lavergne and Patilea (2008) is performed
If type = "sicm" the test of Lavergne and Patilea (2012) is performed
Argument rejection refers to the rejection rule
If rejection = "bootstrap" the p-value of the test is based on the bootstrap (default)
If rejection = "asymptotics" and type = "zheng", type = "pala" or type = "sicm", the p-value of the test is based on the asymptotic normality of the normalized version of the test statistic under the null hypothesis
If type = "icm" or type = "esca" the argument rejection is ignored and the p-value is based on the bootstrap
Argument norma refers to the normalization of the test statistic
If norma = "no" the test statistic is not normalized (default)
If norma = "naive" the test statistic is normalized with a naive estimator of the errors variance
If norma = "np" the test statistic is normalized with a nonparametric estimator of the errors variance
Argument boot refers to the bootstrap method used to compute the test p-value when rejection = "bootstrap"
If boot = "wild" the wild bootstrap of Wu (1986) is used (default)
If boot = "smooth" the smooth conditional moments bootstrap of Gozalo (1997) is used
Argument nboot is the number of bootstrap samples used to compute the test p-value, by default nboot = 50
Argument para determines whether parallel computing is used when rejection = "bootstrap"
If para = FALSE parallel computing is not used to generate the bootstrap samples to compute the test p-value (default)
If para = TRUE parallel computing is used to generate the bootstrap samples to compute the test p-value, which significantly decreases computing time and makes use of all CPU cores except one
Argument ker refers to the Kernel function used in the central matrix and for the nonparametric covariance estimator if there is any
If ker = "normal" the central matrix Kernel function is the normal p.d.f (default)
If ker = "triangle" the central matrix Kernel function is the triangular p.d.f
If ker = "logistic" the central matrix Kernel function is the logistic p.d.f
If ker = "sinc" the central matrix Kernel function is the sine cardinal function
Argument knorm refers to the normalization of the Kernel function
If knorm = "sd" then the standard deviation computed with the Kernel function equals 1 (default)
If knorm = "sq" then the integral of the squared Kernel function equals 1
Argument cch is the central matrix Kernel bandwidth
If type = "icm" or type = "esca" then cch always equals 1
If type = "zheng" the default bandwidth is the scaled rule of thumb: cch = 1.06*n^(-1/5)
If type = "sicm" or type = "pala" the default bandwidth is the scaled rule of thumb: cch = 1.06*n^(-1/(4+k)) where k is the number of regressors
The user may change the bandwidth when type = "zheng", type = "sicm" or type = "pala".
Argument hv is the bandwidth of the nonparametric errors covariance estimator when norma = "np", or when rejection = "bootstrap" and boot = "smooth"
By default the bandwidth is the scaled rule of thumb hv = 1.06*n^(-1/(4+k))
Argument nbeta refers to the number of elements \(\beta\) used to represent the unit hypersphere \(\mathbb{S}^k\) when type = "pala" or type = "sicm"
Computing time increases as nbeta gets larger
By default it is equal to 20 times the square root of the number of exogenous control variables
Argument direct refers to the default "directions" for the tests of Lavergne and Patilea (2008) and Lavergne and Patilea (2012)
If type = "pala", direct is the favored direction for \(\beta\); by default it is the OLS estimator if class(eq) = "lm"
If type = "sicm", direct is the initial direction for \(\beta\). This direction should be a vector of 0 (for no direction), 1 (for positive direction) and -1 (for negative direction)
For example, c(1,-1,0) indicates that the user thinks that the 1st regressor has a positive effect on the dependent variable, that the 2nd regressor has a negative effect on the dependent variable, and that they have no idea about the effect of the 3rd regressor
By default no direction is given to the hypersphere
Argument alphan refers to the weight given to the favored direction for \(\beta\) when type = "pala"
By default it is equal to log(n)*n^(-3/2)
Before changing the default options of arguments norma, direct and alphan we strongly advise the user to read the tests' references.
To finish, we use data on 1,000 individuals from the Current Population Survey as in Stock and Watson (2007) to find the true shape of their expected earnings conditional on their years of education and their age, using the test of Bierens (1982).
library(SpeTestNP)
library(AER)

### Loading the data and taking a first look
data( CPSSW8 )
summary ( CPSSW8 )
#>     earnings          gender           age            region     
#>  Min.   : 2.003   male  :34348   Min.   :21.00   Northeast:12371  
#>  1st Qu.:11.058   female:27047   1st Qu.:33.00   Midwest  :15136  
#>  Median :16.250                  Median :41.00   South    :18963  
#>  Mean   :18.435                  Mean   :41.23   West     :14925  
#>  3rd Qu.:23.558                  3rd Qu.:49.00                    
#>  Max.   :72.115                  Max.   :64.00                    
#>    education    
#>  Min.   : 6.00  
#>  1st Qu.:12.00  
#>  Median :13.00  
#>  Mean   :13.64  
#>  3rd Qu.:16.00  
#>  Max.   :20.00  

Thus the dependent variable we consider is earnings, and the explanatory variables we use to build the conditional expectation are education and age. First we fit a linear specification of conditional earnings.
lm_lin <- lm( earnings ~ age + education, data = CPSSW8[1:1000,] )
summary ( lm_lin )
#> 
#> Call:
#> lm(formula = earnings ~ age + education, data = CPSSW8[1:1000, ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -27.313  -6.464  -1.445   4.804  42.092 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -14.18639    2.10661  -6.734 2.78e-11 ***
#> age           0.15846    0.02747   5.767 1.07e-08 ***
#> education     1.93904    0.12286  15.782  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9.465 on 997 degrees of freedom
#> Multiple R-squared:  0.2176, Adjusted R-squared:  0.216 
#> F-statistic: 138.7 on 2 and 997 DF,  p-value: < 2.2e-16

Both variables are very significant. Then we perform two tests of the linear specification: the test of Bierens (1982) using the bootstrap decision rule, and the asymptotic test of Zheng (1996) with a naive normalization.
SpeTest(lm_lin, type = "icm", rejection = "bootstrap")
#> 
#>  Bierens (1982) integrated conditional moment test 
#> 
#>  Test statistic : 27.31333 
#>  Bootstrap p-value : 0 
#> 

SpeTest(lm_lin, type = "zheng", rejection = "asymptotics")
#> 
#>  Zheng (1996) test 
#> 
#>  Normalized test statistic : 1.47353 
#>  Asymptotic p-value : 0.0703 
#> 

The linear specification is rejected at the 1% level by the test of Bierens (1982) and at the 10% level by the test of Zheng (1996). So we fit a quadratic specification and perform the same tests.
lm_quad <- lm(earnings ~ age + I(age^2) + education + I(education^2), data = CPSSW8[1:1000,])
summary(lm_quad)
#> 
#> Call:
#> lm(formula = earnings ~ age + I(age^2) + education + I(education^2), 
#>     data = CPSSW8[1:1000, ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -32.167  -6.242  -1.412   4.665  41.753 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    -3.353005   8.633125  -0.388  0.69781    
#> age             1.011953   0.212083   4.772  2.1e-06 ***
#> I(age^2)       -0.010051   0.002456  -4.093  4.6e-05 ***
#> education      -2.079218   1.041245  -1.997  0.04611 *  
#> I(education^2)  0.140968   0.036501   3.862  0.00012 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9.323 on 995 degrees of freedom
#> Multiple R-squared:  0.2424, Adjusted R-squared:  0.2393 
#> F-statistic: 79.58 on 4 and 995 DF,  p-value: < 2.2e-16

SpeTest(lm_quad, type = "icm", rejection = "bootstrap")
#> 
#>  Bierens (1982) integrated conditional moment test 
#> 
#>  Test statistic : 1.45746 
#>  Bootstrap p-value : 0.1 
#> 

SpeTest(lm_quad, type = "zheng", rejection = "asymptotics")
#> 
#>  Zheng (1996) test 
#> 
#>  Normalized test statistic : -0.98736 
#>  Asymptotic p-value : 0.16173 
#> 

Both the squared age and squared education terms are very significant. In addition, the p-values of both tests are above 15%, so we cannot reject the quadratic specification. Finally, we test a highly non-linear specification with age, age squared, education, education squared, and their products included as controls:
lm_nlin <- lm(earnings ~ age + I(age^2) + education + I(education^2) + I(education*age) + I(education^2*age) + I(education*age^2) + I(education^2*age^2), data = CPSSW8[1:1000,])
summary(lm_nlin)
#> 
#> Call:
#> lm(formula = earnings ~ age + I(age^2) + education + I(education^2) + 
#>     I(education * age) + I(education^2 * age) + I(education * 
#>     age^2) + I(education^2 * age^2), data = CPSSW8[1:1000, ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -33.135  -6.212  -1.485   4.515  41.920 
#> 
#> Coefficients:
#>                          Estimate Std. Error t value Pr(>|t|)
#> (Intercept)             6.006e+01  1.334e+02   0.450    0.653
#> age                    -3.545e-01  6.060e+00  -0.058    0.953
#> I(age^2)               -1.043e-02  6.707e-02  -0.155    0.876
#> education              -9.404e+00  1.924e+01  -0.489    0.625
#> I(education^2)          3.335e-01  6.815e-01   0.489    0.625
#> I(education * age)      1.277e-01  8.738e-01   0.146    0.884
#> I(education^2 * age)   -2.053e-03  3.093e-02  -0.066    0.947
#> I(education * age^2)    6.633e-04  9.669e-03   0.069    0.945
#> I(education^2 * age^2) -4.410e-05  3.420e-04  -0.129    0.897
#> 
#> Residual standard error: 9.316 on 991 degrees of freedom
#> Multiple R-squared:  0.2467, Adjusted R-squared:  0.2406 
#> F-statistic: 40.56 on 8 and 991 DF,  p-value: < 2.2e-16

SpeTest(lm_nlin, type = "icm", rejection = "bootstrap")
#> 
#>  Bierens (1982) integrated conditional moment test 
#> 
#>  Test statistic : 0.02541 
#>  Bootstrap p-value : 0.78 
#> 

SpeTest(lm_nlin, type = "zheng", rejection = "asymptotics")
#> 
#>  Zheng (1996) test 
#> 
#>  Normalized test statistic : -1.8227 
#>  Asymptotic p-value : 0.03417 
#> 

This time none of the variables is individually significant. This does not mean that the specification is wrong; in fact it nests the quadratic specification. Note that the p-value of the test of Bierens (1982) is very high, while the p-value of the asymptotic test of Zheng (1996) is about 3%. This difference can be explained by the fact that both tests suffer from important size distortions when the number of explanatory variables is "large".
Thus we perform a final check with the test of Lavergne and Patilea (2008), using both the asymptotic and the bootstrap decision rules.
SpeTest(lm_nlin, type = "pala", rejection = "asymptotics", nbeta = 40)
#> 
#>  Lavergne and Patilea (2008) test 
#> 
#>  Normalized test statistic : -0.8568 
#>  Asymptotic p-value : 0.19578 
#> 

SpeTest(lm_nlin, type = "pala", rejection = "bootstrap", nboot = 10, nbeta = 10)
#> 
#>  Lavergne and Patilea (2008) test 
#> 
#>  Test statistic : -96.46926 
#>  Bootstrap p-value : 0.3 
#> 

Both p-values are high, so we cannot reject this highly non-linear specification.
H.J. Bierens (1982), “Consistent Model Specification Tests”, Journal of Econometrics, 20 (1), 105-134
I. Dreier and S. Kotz (2002), “A note on the characteristic function of the t-distribution”, Statistics & Probability Letters, 57 (3), 221-224
J.C. Escanciano (2006), “A Consistent Diagnostic Test for Regression Models Using Projections”, Econometric Theory, 22 (6), 1030-1051
P.L. Gozalo (1997), “Nonparametric Bootstrap Analysis with Applications to Demographic Effects in Demand Functions”, Journal of Econometrics, 81 (2), 357-393
N.L. Johnson, S. Kotz and N. Balakrishnan (1995), “Continuous Univariate Distributions”, Volume 2, Wiley Series in Probability and Statistics: Applied Probability and Statistics, Wiley & Sons
P. Lavergne and V. Patilea (2008), “Breaking the Curse of Dimensionality in Nonparametric Testing”, Journal of Econometrics, 143 (1), 103-122
P. Lavergne and V. Patilea (2012), “One for All and All for One: Regression Checks with Many Regressors”, Journal of Business & Economic Statistics, 30 (1), 41-52
J.H. Stock and M.W. Watson (2007), “Why Has U.S. Inflation Become Harder to Forecast?”, Journal of Money, Credit and Banking, 39 (1), 3-33
C.F.J. Wu (1986), “Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis (with discussion)”, The Annals of Statistics, 14 (4), 1261-1295
J. Yin, Z. Geng, R. Li and H. Wang (2010), “Nonparametric covariance model”, Statistica Sinica, 20 (1), 469-479
J.X. Zheng (1996), “A Consistent Test of Functional Form via Nonparametric Estimation Techniques”, Journal of Econometrics, 75 (2), 263-289