
Thekardl package is an R tool for estimating symmetricand asymmetric Autoregressive Distributed Lag (ARDL) and Nonlinear ARDL(NARDL) models, designed for econometricians and researchers analyzingcointegration and dynamic relationships in time series data. It offersflexible model specifications, allowing users to include deterministicvariables, asymmetric effects for short- and long-run dynamics, andtrend components. The package supports customizable lag structures,model selection criteria (AIC, BIC, AICc, HQ), and parallel processingfor computational efficiency. Key features include:
asym(),asymL(), andasymS() tomodel asymmetric effects in short- and long-run dynamics, anddeterministic() for dummy variables."quick","grid","grid_custom") or user-defined lags.This vignette demonstrates how to use thekardl()function to estimate an asymmetric ARDL model, perform diagnostic tests,and visualize results, using economic data from Turkey.
kardl in R can easily be installed from its CRANrepository:
install.packages("kardl")library(kardl)Alternatively, you can use thedevtools package to loaddirectly from GitHub:
# Install required packagesinstall.packages(c("stats","msm","lmtest","nlWaldTest","car","strucchange","utils"))# Install kardl from GitHubinstall.packages("devtools")devtools::install_github("karamelikli/kardl")Load the package:
library(kardl)This example estimates an asymmetric ARDL model to analyze thedynamics of exchange rate pass-through to domestic prices in Turkey,using a sample dataset (imf_example_data) with variablesfor Consumer Price Index (CPI), Exchange Rate (ER), Producer Price Index(PPI), and a COVID-19 dummy variable.
Assumeimf_example_data contains monthly data for CPI,ER, PPI, and a COVID dummy variable. We prepare the data by ensuringproper formatting and adding the dummy variable. We retrieve data fromthe IMF’s International Financial Statistics (IFS) dataset and prepareit for analysis.
Note: Theimf_example_data is a placeholder fordemonstration purposes. You should replace it with your actual dataset.The data can be loaded byreadxl or other data importfunctions.
We define the model formula using R’s formula syntax, incorporatingasymmetric effects and deterministic variables. We useasym() for variables with both short- and long-runasymmetry,deterministic() for fixed effects, andtrend for a linear time trend.
# Define the model formulaMyFormula<- CPI~ ER+ PPI+asym(ER+ PPI)+deterministic(covid)+ trendWe estimate the ARDL model using differentmode settingsto demonstrate flexibility in lag selection. Thekardl()function supports various modes:"grid","grid_custom","quick", or a user-defined lagvector.
mode = "grid"The"grid" mode evaluates all lag combinations up tomaxlag and provides console feedback.
# Set model optionskardl_set(criterion ="BIC",differentAsymLag =TRUE,data=imf_example_data)# Estimate model with grid modekardl_model<-kardl(data=imf_example_data,model= MyFormula,maxlag =4,mode ="grid")# View resultskardl_model# Display model summarysummary(kardl_model)Specify custom lags to bypass automatic lag selection:
kardl_model2<-kardl(data=imf_example_data, MyFormula,mode =c(2,1,1,3,0))# View resultskardl_model2$properLagUse the. operator to include all variables except thedependent variable:
kardl_set(data=imf_example_data)kardl(model = CPI~ .+deterministic(covid),mode ="grid")TheLagCriteria component contains lag combinations andtheir criterion values. We visualize these to compare model selectioncriteria (AIC, BIC, HQ).
library(dplyr)library(tidyr)library(ggplot2)# Convert LagCriteria to a data frameLagCriteria<-as.data.frame(kardl_model[["LagCriteria"]])colnames(LagCriteria)<-c("lag","AIC","BIC","AICc","HQ")LagCriteria<- LagCriteria%>%mutate(across(c(AIC, BIC, HQ), as.numeric))# Pivot to long formatLagCriteria_long<- LagCriteria%>%select(-AICc)%>%pivot_longer(cols =c(AIC, BIC, HQ),names_to ="Criteria",values_to ="Value")# Find minimum valuesmin_values<- LagCriteria_long%>%group_by(Criteria)%>%slice_min(order_by = Value)%>%ungroup()# Plotggplot(LagCriteria_long,aes(x = lag,y = Value,color = Criteria,group = Criteria))+geom_line()+geom_point(data = min_values,aes(x = lag,y = Value),color ="red",size =3,shape =8)+geom_text(data = min_values,aes(x = lag,y = Value,label = lag),vjust =1.5,color ="black",size =3.5)+labs(title ="Lag Criteria Comparison",x ="Lag Configuration",y ="Criteria Value")+theme_minimal()+theme(axis.text.x =element_text(angle =45,hjust =1))We calculate long-run coefficients usingkardl_longrun(), which standardizes coefficients bydividing them by the negative of the dependent variable’s long-runparameter.
# Long-run coefficientsmylong<-kardl_longrun(kardl_model)mylongTheasymmetrytest() function performs Wald tests toassess short- and long-run asymmetry in the model.
ast<- imf_example_data%>%kardl(CPI~ ER+ PPI+asym(ER+ PPI)+deterministic(covid)+ trend,mode =c(1,2,3,0,1))%>%asymmetrytest()ast# View long-run hypothesesast$Lhypotheses# Detailed resultssummary(ast)We perform cointegration tests to assess long-term relationshipsusingpssf(),psst(),narayan(),banerjee(), andrecmt().
Thepssf() function tests for cointegration using thePesaran, Shin, and Smith F Bound test.
A<- kardl_model%>%pssf(case =3,signif_level ="0.05")cat(paste0("The F statistic = ", A$statistic," where k = ", A$k,"."))cat(paste0("\nWe found '", A$Cont,"' at ", A$siglvl,"."))A$criticalValuessummary(A)Thepsst() function tests the significance of the laggeddependent variable’s coefficient.
A<- kardl_model%>%psst(case =3,signif_level ="0.05")cat(paste0("The t statistic = ", A$statistic," where k = ", A$k,"."))cat(paste0("\nWe found '", A$Cont,"' at ", A$siglvl,"."))A$criticalValuessummary(A)Thenarayan() function is tailored for small samplesizes.
A<- kardl_model%>%narayan(case =3,signif_level ="0.05")cat(paste0("The F statistic = ", A$statistic," where k = ", A$k,"."))cat(paste0("\nWe found '", A$Cont,"' at ", A$siglvl,"."))A$criticalValuessummary(A)Thebanerjee() function is designed for small datasets(≤100 observations).
A<- kardl_model%>%banerjee(signif_level ="0.05")cat(paste0("The ECM parameter = ", A$coef,", k = ", A$k,", t statistic = ", A$statistic,"."))cat(paste0("\nWe found '", A$Cont,"' at ", A$siglvl,"."))A$criticalValuessummary(A)Therecmt() function tests for cointegration using anError Correction Model.
recmt_model<- imf_example_data%>%recmt(MyFormula,mode ="grid_custom",case =3)recmt_model# View resultssummary(recmt_model)Thearchtest() function checks for autoregressiveconditional heteroskedasticity in the model residuals.
arch_result<-archtest(kardl_model$finalModel$model$residuals,q =2)summary(arch_result)We demonstrate how to customize prefixes and suffixes for asymmetricvariables usingkardl_set().
# Set custom prefixes and suffixeskardl_reset()kardl_set(AsymPrefix =c("asyP_","asyN_"),AsymSuffix =c("_PP","_NN"))kardl_custom<-kardl(data=imf_example_data, MyFormula,mode ="grid_custom")kardl_custom$properLagkardl(data, model, maxlag, mode, ...):
data: A time series dataset (e.g., a data frame withCPI, ER, PPI).model: A formula specifying the long-run equation,e.g.,y ~ x + z + asym(z) + asymL(x2 + x3) + asymS(x3 + x4) + deterministic(dummy1 + dummy2) + trend.Supports:asym(): Asymmetric effects for both short- and long-rundynamics.asymL(): Long-run asymmetric variables.asymS(): Short-run asymmetric variables.deterministic(): Fixed dummy variables.trend: Linear time trend.maxlag: Maximum number of lags (default: 4). Usesmaller values (e.g., 2) for small datasets, larger values (e.g., 8) forlong-term dependencies.mode: Estimation mode:"quick": Verbose output for interactive use."grid": Verbose output with lag optimization."grid_custom": Silent, efficient execution.c(1, 2, 4, 5) orc(CPI = 2, ER_POS = 3, ER_NEG = 1, PPI = 3)).inputs,finalModel,start_time,end_time,properLag,TimeSpan,OptLag,LagCriteria,type (“kardlmodel”).kardl_set(...): Configures optionslikecriterion (AIC, BIC, AICc, HQ),differentAsymLag,AsymPrefix,AsymSuffix,ShortCoef, andLongCoef. Usekardl_get() to retrieve settingsandkardl_reset() to restore defaults.
kardl_longrun(model): Calculatesstandardized long-run coefficients, returningtype(“kardl_longrun”),coef,delta_se,results, andstarsDesc.
asymmetrytest(model): Performs Waldtests for short- and long-run asymmetry, returningLhypotheses,Lwald,Shypotheses,Swald, andtype (“asymmetrytest”).
pssf(model, case, signif_level):Performs the Pesaran, Shin, and Smith F Bound test for cointegration,supporting cases 1–5 and significance levels (“auto”, 0.01, 0.025, 0.05,0.1, 0.10).
psst(model, case, signif_level):Performs the PSS t Bound test, focusing on the lagged dependentvariable’s coefficient.
narayan(model, case, signif_level):Conducts the Narayan test for cointegration, optimized for small samples(cases 2–5).
banerjee(model, signif_level):Performs the Banerjee cointegration test for small datasets (≤100observations).
recmt(data, model, maxlag, mode, case, signif_level, ...):Conducts the Restricted ECM test for cointegration, with similarparameters tokardl() and case/significance leveloptions.
archtest(resid, q): Tests for ARCHeffects in model residuals, returningtype,statistic,parameter,p.value,andFval.
For detailed documentation, use?kardl,?kardl_set,?kardl_longrun,?asymmetrytest,?pssf,?psst,?narayan,?banerjee,?recmt, or?archtest.
The options for the KARDL package are set by thekardl_set() function in R. The default values are set inthekardl_set list. You can change the options by using thekardl_set() function with the desired parameters. Thefollowing options are available:
| Option Name | Default | Description |
|---|---|---|
| data | NULL | The data to be used for the model estimation |
| model | NULL | The formula to be used for the model estimation |
| maxlag | 4 | The maximum number of lags to be considered for the modelestimation |
| mode | “quick” | The mode of the model estimation, can be “quick”, “grid”,“grid_custom” or a user-defined vector |
| criterion | “AIC” | The criterion for model selection, can be “AIC”, “BIC”, “HQ” or auser-defined function |
| differentAsymLag | FALSE | If TRUE, the asymmetry lags will be different for positive andnegative shocks |
| AsymPrefix | c() | Prefix for asymmetry variables, default is empty |
| AsymSuffix | c(“_POS”, “_NEG”) | Suffix for asymmetry variables, default is “_POS” and “_NEG” |
| LongCoef | “L{lag}.{varName}” | Prefix for long-run coefficients, default is “L1.” |
| ShortCoef | “L{lag}.d.{varName}” | Prefix for short-run coefficients, default is “L1.d.” |
| batch | “1/1” | Batch size for parallel processing, default is “1/1” |
The details of the options are as follows:
data is a data frame or time series object containingthe variables to be used in the model estimation. The default value isNULL, which means that the user must provide a dataset whencalling thekardl() function.
Thedata parameter is essential for thekardl() function to perform model estimation. It shouldcontain all the variables specified in the model formula, including thedependent variable and any independent variables, asymmetric components,and deterministic variables defined in the formula. The trend will begenerated automatically if specified in the formula. Input data can bein the form of a data frame, tibble, or time series object (e.g.,ts,xts,zoo).
When providing thedata, ensure that: - The dataset isclean and free of missing values for the variables used in the model. -The variables are appropriately formatted (e.g., numeric for continuousvariables). - The time series data is ordered correctly, especially ifthe analysis involves lagged variables.
model is a formula object specifying the model to beestimated. The default value isNULL, which means that theuser must provide a model formula when calling thekardl()function.
Themodel parameter defines the structure of the ARDL orNARDL model to be estimated. It should include the dependent variable onthe left side of the formula and the independent variables, asymmetriccomponents, deterministic variables, and trend (if applicable) on theright side. The formula can include: -asym(): To specifyvariables with asymmetric effects in both short- and long -run dynamics.-asymL(): To specify variables with asymmetric effectsonly in the long-run -dynamics. -asymS(): To specifyvariables with asymmetric effects only in the short-run -dynamics. -deterministic(): To include fixed dummy variables (e.g.,seasonal d -ummies, event dummies). -trend: To include alinear time trend in the model. When constructing themodelformula, ensure that: - All variables used in the formula are present inthedata provided. - The formula is syntactically correctand follows R’s formula conventions. - The use of asymmetric anddeterministic functions is appropriate for the research question anddata characteristics.
maxlag is an integer value specifying the maximum numberof lags to be considered for the model estimation. The default value is4.
Themaxlag parameter sets the upper limit for the numberof lags that thekardl() function will evaluate whenoptimizing the lag structure of the model. This is particularlyimportant when using modes like"grid" or"grid_custom", where the function systematically testsdifferent lag combinations up to the specified maximum. When choosing avalue formaxlag, consider the following: -DataFrequency: For monthly data, amaxlag of 4 isoften sufficient to capture short-term dynamics. For quarterly data, alowermaxlag ( e.g., 2) may be appropriate, while for dailydata, a highermaxlag (e.g., 8 or more) might be necessary.-Sample Size: A largermaxlag increasesthe number of parameters to estimate, which can be problematic withsmall sample sizes. Ensure that the sample size is adequate to supportthe number of lags being considered. -ModelComplexity: Highermaxlag values lead to morecomplex models, which may overfit the data. Balance the need forcapturing dynamics with the risk of overfitting. -ComputationalResources: Evaluating a large number of lag combinations can becomputationally intensive. Consider the available resources and timeconstraints when settingmaxlag.
mode is a character string or numeric vector specifyingthe mode of the model estimation. The default value is"quick". The available options are:
-“quick”: This mode provides a fast estimation of themodel without optimizing the lags. It is suitable for initialexplorations or when the user has a predefined lag structure. -“grid”: This mode performs a grid search over allpossible lag combinations up to the specifiedmaxlag. Itprovides verbose output, including the lag criteria for eachcombination, and is useful for thorough lag optimization. -“grid_custom”: Similar to"grid", but withsilent execution. It is more efficient for large datasets or when theuser wants to avoid console output during the lag optimization process.-User-defined vector: The user can specify a customlag structure by providing a numeric vector (e.g.,c(1, 2, 4, 5)) or a named vector (e.g.,c(CPI = 2, ER_POS = 3, ER_NEG = 1, PPI = 3)). This allowsfor complete control over the lag selection process.
Themode parameter determines how thekardl() function approaches the estimation of the ARDL orNARDL model. Each mode has its advantages and is suited to differentscenarios: - Use"quick" for rapid assessments when the lagstructure is already known or when computational speed is a priority. -Use"grid" for comprehensive lag optimization, especiallywhen the optimal lag structure is unknown. This mode is ideal forexploratory analysis and model selection. - Use"grid_custom" for efficient lag optimization withoutconsole output, particularly for large datasets or when running multiplemodels in batch mode. - Use a user-defined vector when the user hasspecific knowledge about the appropriate lags for each variable,allowing for tailored model specifications. When using"grid" or"grid_custom", ensure that themaxlag parameter is set appropriately to balance thethoroughness of the search with computational feasibility.
criterion is a character string specifying the criterionto be used for selecting the optimal lags. The default value is"AIC". The available options are:
modelCriterion function.For detailed information on the model selection criteria used in themethods, see the documentation for themodelCriterionfunction.
The choice of the criterion can significantly impact the selected laglength and, consequently, the performance of the model. Each criterionhas its strengths and is suited to specific scenarios:
"AIC" for general purposes, especially whenprioritizing a good fit over simplicity."BIC" when you prefer a more parsimonious model,particularly with large datasets."AICc" when working with small sample sizes toavoid overfitting."HQ" for a balance between AIC and BIC, often ineconometrics or time series models.Ensure that the selected criterion aligns with the goals of youranalysis and the characteristics of your data.
kardl_set(criterion ="AIC")kardl(data, MyFormula)kardl_set(criterion ="BIC")kardl(data, MyFormula)kardl_set(criterion ="AICc")data%>%kardl(MyFormula)kardl_set(criterion ="HQ")kardl(data, MyFormula)differentAsymLag is a logical value (TRUEorFALSE) indicating whether positive and negativeasymmetric variables should be assigned different lags during theestimation process. The default value isFALSE, meaningthat both positive and negative components will use the same lag.
Asymmetric decomposition separates a variable into its positive andnegative changes. In some models, it may be desirable to assigndifferent lags to these components to capture distinct dynamicbehaviors. SettingdifferentAsymLag = TRUE allows thefunction to optimize lags for positive and negative componentsindependently. WhendifferentAsymLag = FALSE, bothcomponents will share the same lag.
This parameter is particularly useful when:
Attention!
differentAsymLag = TRUE, ensure that the model hassufficient data to estimate separate lags reliably.differentAsymLag = FALSE may be more robust andcomputationally efficient.kadrl_set(differentAsymLag =FALSE)kardl(data, MyFormula)kardl_set(differentAsymLag =TRUE)kardl(data, MyFormula)AsymPrefix is a character vector specifying the prefixesused for naming asymmetric variables created during positive andnegative decomposition. The default value is an empty vectorc(), indicating that no prefixes are added by default.
When specified, the prefixes are added to the beginning of variablenames to represent the positive and negative decomposition:
Asymmetric decomposition is used to analyze the separate effects ofpositive and negative changes in a variable. For example, given avariableX, prefixes can be used to generatePOS_X andNEG_X for the positive and negativecomponents, respectively.
By default, no prefixes are applied (AsymPrefix = c()).However, users can define custom prefixes by providing a vector with twoelements. For example:
kardl_set(AsymPrefix = c("POS_", "NEG_")) results invariable names such asPOS_X andNEG_X.kardl_set(AsymPrefix = c("Increase_", "Decrease_"))results in variable names such asIncrease_X andDecrease_X.Attention!
AsymSuffix), ensure that the resulting variable names aremeaningful and do not conflict.kardl_set(AsymPrefix =c())kardl_set(AsymPrefix =c("POS_","NEG_"))kardl_set(AsymPrefix =c("Change_","Fall_"),AsymSuffix =c("_High","_Low"))AsymSuffix is a character vector specifying the suffixesused for naming asymmetric variables created during positive andnegative decomposition. The default value isc("_POS", "_NEG"), where:
"_POS" is the suffix appended to variables representingthe positive decomposition."_NEG" is the suffix appended to variables representingthe negative decomposition.The order of the suffixes is important:
Asymmetric decomposition is commonly used in models to separate theeffects of positive and negative changes in a variable. For example,given a variableX, the decomposition may result inX_POS andX_NEG to represent its positive andnegative components, respectively.
By default, the suffixes"_POS" and"_NEG"are used, but users can customize them as needed by providing a customvector. For example:
AsymSuffix = c("_Increase", "_Decrease") results invariable names such asX_Increase andX_Decrease.AsymSuffix = c("_Up", "_Down") results in variablenames such asX_Up andX_Down.Attention!
LongCoef is a character string specifying the prefixformat for naming long-run coefficients in the model output. The defaultvalue is"L{lag}.{varName}", where:
{lag} is a placeholder for the lag number.{varName} is a placeholder for the variable name.This format generates names likeL1.X for the first lagof variableX.
Long-run coefficients represent the long-term relationships betweenthe dependent variable and independent variables in an ARDL or NARDLmodel. TheLongCoef parameter allows users to customize howthese coefficients are named in the output, making it easier to identifyand interpret them. The default format"L{lag}.{varName}"is widely used and provides clear information about the lag and variableassociated with each coefficient. Users can modify the format bychanging theLongCoef string. For example:
LongCoef = "LongRun_{varName}_Lag{lag}" results innames likeLongRun_X_Lag1.LongCoef = "LR_{varName}_L{lag}" results in names likeLR_X_L1.Attention!
{lag} and{varName} are included in the custom format to maintainclarity in the coefficient names.ShortCoef is a character string specifying the prefixformat for naming short-run coefficients in the model output. Thedefault value is"L{lag}.d.{varName}", where:
{lag} is a placeholder for the lag number.{varName} is a placeholder for the variable name.This format generates names likeL1.d.X for the firstlag of the differenced variableX.
Short-run coefficients capture the immediate effects of changes inindependent variables on the dependent variable in an ARDL or NARDLmodel. TheShortCoef parameter allows users to customizehow these coefficients are named in the output, facilitating easieridentification and interpretation. The default format"L{lag}.d.{varName}" is commonly used and provides clearinformation about the lag, differencing, and variable associated witheach coefficient. Users can modify the format by changing theShortCoef string. For example:
ShortCoef = "ShortRun_{varName}_Lag{lag}" results innames likeShortRun_X_Lag1.ShortCoef = "SR_{varName}_L{lag}" results in names likeSR_X_L1.Attention!
{lag} and{varName} are included in the custom format to maintainclarity in the coefficient names.batch is a character string specifying the batch sizefor parallel processing during model estimation. The default value is"1/1", indicating that the model estimation will beexecuted as a single job without batching.
Thebatch parameter is particularly useful when dealingwith large datasets or complex models that require significantcomputational resources. By specifying a batch size, users can dividethe model estimation process into smaller, more manageable segments,which can be processed in parallel. The format for thebatch parameter is"m/n", where:
m is the number of batches to be processed inparallel.n is the total number of batches.For example, settingbatch = "2/4" would divide theestimation into 4 batches, with 2 batches being processedsimultaneously. This can significantly reduce computation time,especially for models with extensive lag structures or large numbers ofvariables.
Attention!
kardl_set(batch ="1/1")kardl(data, MyFormula)kardl_set(batch ="2/4")kardl(data, MyFormula)kardl_set(batch ="3/6")kardl(data, MyFormula)Thekardl package is a versatile tool for econometricanalysis, offering robust support for symmetric and asymmetricARDL/NARDL modeling, cointegration tests, stability diagnostics, andheteroskedasticity checks. Its flexible formula specification, lagoptimization, and support for parallel processing make it ideal forstudying complex economic relationships. For more information, visithttps://github.com/karamelikli/kardl or contact theauthors athakperest@gmail.com.