
nortsTest is an R package for assessing normality of stationary processes: it tests whether a given data set follows a stationary Gaussian process. The package works as an extension of the nortest package, which performs normality tests on random samples (independent data). The package's principal functions are:
elbouch.test() function that computes the bivariate El Bouch et al. test,
epps.test() function that implements the Epps test,
epps_bootstrap.test() function that implements the Epps test, approximating the p-values using a sieve-bootstrap procedure,
lobato.test() function that implements the Lobato and Velasco test,
lobato_bootstrap.test() function that implements the Lobato and Velasco test, approximating the p-values using a sieve-bootstrap procedure,
rp.test() function that implements the random projections test of Nieto-Reyes, Cuesta-Albertos and Gamboa,
vavra.test() function that implements the Psaradakis and Vávra test,
jb_bootstrap.test() function that implements the Jarque and Bera test, approximating the p-values using a sieve-bootstrap procedure,
shapiro_bootstrap.test() function that implements the Shapiro test, approximating the p-values using a sieve-bootstrap procedure,
cvm_bootstrap.test() function that implements the Cramér-von Mises test, approximating the p-values using a sieve-bootstrap procedure.
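Several of the functions listed above approximate their p-values with a sieve bootstrap. The idea can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the helper name `sieve_pvalue` and its arguments are hypothetical, the Shapiro-Wilk statistic is used only as a convenient example, and drawing Gaussian innovations to impose the null hypothesis is an assumption of this sketch.

```r
# Hypothetical helper (not part of nortsTest): sieve-bootstrap p-value
# for the Shapiro-Wilk statistic of a stationary series y.
sieve_pvalue <- function(y, reps = 200, burn = 100, order_max = 10) {
  y <- as.numeric(y)
  n <- length(y)

  # Step 1: approximate the process by an AR(p) model, with p chosen by AIC.
  fit <- ar(y, aic = TRUE, order.max = order_max)
  sigma <- sqrt(fit$var.pred)

  # Step 2: statistic on the observed series (small W suggests non-normality).
  w_obs <- unname(shapiro.test(y)$statistic)

  # Step 3: regenerate series from the fitted AR model with Gaussian
  # innovations (assumption: this imposes the null) and recompute W.
  w_boot <- replicate(reps, {
    e <- rnorm(n + burn, sd = sigma)
    y_star <- if (fit$order > 0) {
      stats::filter(e, filter = fit$ar, method = "recursive")
    } else {
      e
    }
    y_star <- as.numeric(y_star)[(burn + 1):(burn + n)]  # drop the burn-in
    unname(shapiro.test(y_star)$statistic)
  })

  # Step 4: p-value = share of bootstrap statistics at least as extreme.
  mean(w_boot <= w_obs)
}

set.seed(123)
x <- arima.sim(n = 150, model = list(ar = 0.5))  # Gaussian AR(1) series
p_value <- sieve_pvalue(x)
p_value
```

Because the bootstrap series inherit the estimated autocorrelation, the reference distribution of the statistic accounts for the dependence in the data, which the classical p-value does not.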
Additionally, inspired by the function checkresiduals() of the forecast package, we provide the check_residuals methods for checking a model's assumptions using the estimated residuals. The function checks stationarity, homoscedasticity and normality, and presents a report of the tests used and their conclusions.
    library(nortsTest)

Classic hypothesis tests for normality, such as Shapiro & Wilk, Anderson & Darling, or Jarque & Bera, do not perform well on dependent data. Therefore, these tests should not be used to check whether a given time series has been drawn from a Gaussian process. As a simple example, we generate a stationary ARMA(1,1) process whose innovations follow a Student's t distribution with 7 degrees of freedom, and perform the Anderson-Darling test from the nortest package.
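    x = arima.sim(n = 100, model = list(ar = 0.32, ma = 0.25), rand.gen = rt, df = 7)
    nortest::ad.test(x)
    #>
    #>  Anderson-Darling normality test
    #>
    #> data:  x
    #> A = 0.50769, p-value = 0.1954

The null hypothesis is that the data has a normal distribution and, therefore, follows a Gaussian process. At the usual 0.05 significance level, a p-value of 0.1954 does not reject this hypothesis, so the test fails to detect that the series is non-Gaussian. Applying the Lobato and Velasco test from our package to the same series: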
    lobato.test(x)
    #>
    #>  Lobatos and Velascos test
    #>
    #> data:  x
    #> lobato = 16.864, df = 2, p-value = 0.0002177
    #> alternative hypothesis: x does not follow a Gaussian Process

In the next example we generate a stationary AR(2) process, using an exponential distribution with a rate of 5, and perform the Epps test and the random projections (RP) test with k = 5 random projections. At a significance level of \(\alpha=0.05\), the null hypothesis of normality is rejected.
    set.seed(298)
    # Simulating the AR(2) process
    x = arima.sim(n = 250, model = list(ar = c(0.2, 0.3)), rand.gen = rexp, rate = 5)
    # tests
    epps.test(x)
    #>
    #>  epps test
    #>
    #> data:  x
    #> epps = 38.158, df = 2, p-value = 5.178e-09
    #> alternative hypothesis: x does not follow a Gaussian Process
    rp.test(x, k = 5)
    #>
    #>  k random projections test
    #>
    #> data:  x
    #> k = 5, lobato = 188.771, epps = 28.385, p-value = 0.0007823
    #> alternative hypothesis: x does not follow a Gaussian Process

In the next example we generate a stationary VAR(1) process of dimension p = 2, using two independent Gaussian AR(1) processes, and perform El Bouch's test. At a significance level of \(\alpha = 0.05\), the null hypothesis of normality is not rejected.
    set.seed(298)
    # Simulating the VAR(1) process from two independent AR(1) series
    x1 = arima.sim(n = 250, model = list(ar = 0.2))
    x2 = arima.sim(n = 250, model = list(ar = 0.3))
    # test
    elbouch.test(y = x1, x = x2)
    #>
    #>  El Bouch, Michel & Comon's test
    #>
    #> data:  w = (y, x)
    #> Z = 0.1438, p-value = 0.4428
    #> alternative hypothesis: w = (y, x) does not follow a Gaussian Process

cardox data

As an example, we analyze the monthly mean carbon dioxide (in ppm) from the astsa package, measured at Mauna Loa Observatory, Hawaii, from March 1958 to November 2018. The carbon dioxide data, measured as the mole fraction in dry air on Mauna Loa, constitute the longest record of direct measurements of CO2 in the atmosphere. The measurements were started by C. David Keeling of the Scripps Institution of Oceanography in March 1958 at a facility of the National Oceanic and Atmospheric Administration.
    library(astsa)
    data("cardox")
    autoplot(cardox, xlab = "years", ylab = " CO2 (ppm)", color = "darkred",
             size = 1, main = "Carbon Dioxide Levels at Mauna Loa")
The time series clearly has trend and seasonal components. To analyze the cardox data we propose a Gaussian linear state space model. We use the model's implementation from the forecast package as follows:
    library(forecast)
    #>
    #> Attaching package: 'forecast'
    #> The following object is masked from 'package:astsa':
    #>
    #>     gas
    model = ets(cardox)
    summary(model)
    #> ETS(M,A,A)
    #>
    #> Call:
    #>  ets(y = cardox)
    #>
    #>   Smoothing parameters:
    #>     alpha = 0.5591
    #>     beta  = 0.0072
    #>     gamma = 0.1061
    #>
    #>   Initial states:
    #>     l = 314.6899
    #>     b = 0.0696
    #>     s = 0.6611 0.0168 -0.8536 -1.9095 -3.0088 -2.7503
    #>         -1.2155 0.6944 2.1365 2.7225 2.3051 1.2012
    #>
    #>   sigma:  9e-04
    #>
    #>      AIC     AICc      BIC
    #> 3136.280 3137.140 3214.338
    #>
    #> Training set error measures:
    #>                     ME     RMSE       MAE         MPE       MAPE      MASE
    #> Training set 0.0232403 0.312003 0.2430829 0.006308831 0.06883992 0.1559102
    #>                    ACF1
    #> Training set 0.07275949

The best fitted model, ETS(M,A,A), is a state space model with multiplicative errors, additive trend and additive seasonality. If the model's assumptions are satisfied, then the model's errors behave like a Gaussian stationary process. These assumptions can be checked using our check_residuals functions.
In this case, we use an Augmented Dickey-Fuller test for the stationarity assumption, and a random projections test for normality.
    check_residuals(model, unit_root = "adf", normality = "rp", plot = TRUE)
    #>
    #>  ***************************************************
    #>
    #>  Unit root test for stationarity:
    #>
    #>  Augmented Dickey-Fuller Test
    #>
    #> data:  y
    #> Dickey-Fuller = -9.7249, Lag order = 8, p-value = 0.01
    #> alternative hypothesis: stationary
    #>
    #>
    #>  Conclusion: y is stationary
    #>  ***************************************************
    #>
    #>  Goodness of fit test for Gaussian Distribution:
    #>
    #>  k random projections test
    #>
    #> data:  y
    #> k = 2, lobato = 3.8260, epps = 1.3156, p-value = 0.3328
    #> alternative hypothesis: y does not follow a Gaussian Process
    #>
    #>
    #>  Conclusion: y follows a Gaussian Process
    #>
    #>  ***************************************************
After all the model's assumptions are checked, the model can be used to forecast.
    autoplot(forecast(model, h = 12), include = 100, xlab = "years", ylab = " CO2 (ppm)",
             main = "Forecast: Carbon Dioxide Levels at Mauna Loa")
How to install nortsTest?

The current development version can be downloaded from GitHub via
    if (!requireNamespace("remotes")) install.packages("remotes")
    remotes::install_github("asael697/nortsTest", dependencies = TRUE)

The nortsTest package offers additional functions for descriptive analysis of univariate time series.
uroot.test: performs a unit root test for checking stationarity in linear time series. The Ljung-Box, Augmented Dickey-Fuller, Phillips-Perron and KPSS tests can be selected with the unit_root option parameter.
seasonal.test: performs a seasonal unit root test for checking stationarity in seasonal time series. The hegy, ch and ocsb tests are available with the seasonal option parameter.
arch.test: checks for ARCH effects in time series. The Ljung-Box and Lagrange Multiplier tests can be selected with the arch option parameter.
normal.test: checks for a normal distribution in time series and random samples. The tests presented above can be chosen for stationary time series. For random samples (independent data), the Anderson & Darling, Shapiro & Wilk, and Jarque-Bera tests are available with the normality option parameter.
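The distinction between stationary time series and random samples matters because classical tests assume independence. A small base-R simulation can make this concrete by comparing how often the Shapiro-Wilk test rejects normality, at the 5% level, for independent Gaussian data and for a strongly autocorrelated Gaussian AR(1) process. This is only an illustrative sketch: the helper `rejection_rate`, the sample size, the AR coefficient of 0.9 and the number of replications are all assumptions chosen for the example.

```r
set.seed(2024)

# Hypothetical helper: share of Shapiro-Wilk rejections at level `alpha`
# over `reps` simulated series; `simulate` returns one series per call.
rejection_rate <- function(simulate, reps = 500, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(as.numeric(simulate()))$p.value < alpha))
}

n <- 100

# Independent Gaussian sample: the test's assumptions hold.
rate_iid <- rejection_rate(function() rnorm(n))

# Gaussian AR(1) process: marginally normal, but strongly dependent.
rate_ar1 <- rejection_rate(function() arima.sim(n = n, model = list(ar = 0.9)))

c(independent = rate_iid, dependent = rate_ar1)
```

Both processes are Gaussian, so any gap between the two rejection rates reflects the effect of dependence on the test rather than a departure from normality.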
For visual diagnostics, we offer ggplot2 methods for numeric and time-series data. Most of the functions were adapted from Rob Hyndman's forecast package.
autoplot: for plotting time series objects (ts class).
gghist: histograms for numeric and univariate time series.
ggnorm: quantile-quantile plot for numeric and univariate time series.
ggacf & ggpacf: autocorrelation and partial autocorrelation function plots for numeric and univariate time series.
check_plot: summary diagnostic plot for univariate stationary time series.
Currently our check_residuals() and check_plot() methods are valid for the following models and classes:
ts: for univariate time series
numeric: for numeric vectors
arima0: from the stats package
Arima: from the forecast package
fGARCH: from the fGarch package
lm: from the stats package
glm: from the stats package
Holt and Winters: from the stats and forecast packages
ets: from the forecast package
forecast methods: from the forecast package.
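As a short sketch of how one of the other supported classes might be used, the example below fits an arima0 model from the stats package to a simulated Gaussian AR(2) series and passes it to check_residuals(). The simulated data and model order are assumptions made for illustration, and the unit_root and normality arguments are assumed to behave as in the ets example above. The call is guarded so the snippet still runs when nortsTest is not installed.

```r
set.seed(298)

# Simulated Gaussian AR(2) series (illustrative data, not from the package).
x <- arima.sim(n = 250, model = list(ar = c(0.2, 0.3)))

# Fit the model with stats::arima0, one of the classes listed above.
fit <- arima0(x, order = c(2, 0, 0))

# Assumption: the arima0 method accepts the same options as the ets example.
if (requireNamespace("nortsTest", quietly = TRUE)) {
  nortsTest::check_residuals(fit, unit_root = "adf", normality = "rp")
}
```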
For overloading more functions, methods or packages, please make a pull request or send a mail to: asael_am@hotmail.com.
El Bouch, S., Michel, O. & Comon, P. (2022). A normality test for multivariate dependent samples. Journal of Signal Processing. Volume 201.
Psaradakis, Z. & Vávra, M. (2020). Normality tests for dependent data: large-sample and bootstrap approaches. Communications in Statistics - Simulation and Computation. 49(2), ISSN 0361-0918.
Psaradakis, Z. & Vávra, M. (2017). A distance test of normality for a wide class of stationary processes. Journal of Econometrics and Statistics. 2, 50-60.
Nieto-Reyes, A., Cuesta-Albertos, J. & Gamboa, F. (2014). A random-projection based test of Gaussianity for stationary processes. Computational Statistics & Data Analysis, Elsevier. 75(C), 124-141.
Hyndman, R. & Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software. 26(3), 1-22.
Lobato, I. & Velasco, C. (2004). A simple test of normality for time series. Journal Econometric Theory. 20(4), 671-689.
Epps, T.W. (1987). Testing that a stationary time series is Gaussian. The Annals of Statistics. 15(4), 1683-1698.