| Version: | 1.0.2 |
| Date: | 2025-12-06 |
| Title: | Tools for Highfrequency Data Analysis |
| Description: | Provide functionality to manage, clean and match highfrequency trades and quotes data, calculate various liquidity measures, estimate and forecast volatility, detect price jumps and investigate microstructure noise and intraday periodicity. A detailed vignette can be found in the open-access paper "Analyzing Intraday Financial Data in R: The highfrequency Package" by Boudt, Kleen, and Sjoerup (2022, <doi:10.18637/jss.v104.i08>). |
| License: | GPL-2 |GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| LazyData: | true |
| URL: | https://github.com/jonathancornelissen/highfrequency |
| BugReports: | https://github.com/jonathancornelissen/highfrequency/issues |
| Depends: | R (≥ 3.5.0) |
| Imports: | xts, zoo, Rcpp, graphics, methods, stats, utils, grDevices,robustbase, data.table (≥ 1.12.0), RcppRoll, quantmod,sandwich, numDeriv, Rsolnp |
| LinkingTo: | Rcpp, RcppArmadillo |
| Suggests: | mvtnorm, covr, FKF, rugarch, testthat, knitr, rmarkdown |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2025-12-08 09:28:31 UTC; onnokleen |
| Author: | Kris Boudt |
| Maintainer: | Kris Boudt <kris.boudt@ugent.be> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-08 10:50:02 UTC |
highfrequency: Tools for Highfrequency Data Analysis
Description
Thehighfrequency package provides numerous tools for analyzing high-frequency financial data, including functionality to:
Clean, handle, and manage high frequency trades and quotes data.
Calculate liquidity measures
Calculate (multivariate) realized measures of the distribution of high-frequency returns
Estimate models for realized measures of volatility and the corresponding forecasts
Detect jumps in prices
Analyze market microstructure noise in asset prices
Estimate spot volatility and drift as well as analyze intraday periodicity of spot volatility
Author(s)
Kris Boudt, Jonathan Cornelissen, Onno Kleen, Scott Payseur, Emil SjoerupMaintainer: Kris Boudt <Kris.Boudt@ugent.be>
Contributors: Giang Nguyen
Thanks: We would like to thank Brian Peterson, Chris Blakely, Dirk Eddelbuettel, Maarten Schermer, and Eric Zivot
See Also
Useful links:
Ait-Sahalia and Jacod (2009) tests for the presence of jumps in the price series.
Description
This test examines the presence of jumps in highfrequency price series. It is based on the theory of Ait-Sahalia and Jacod (2009). It consists in comparing the multi-power variation of equi-spaced returns computed at a fast time scale (h),r_{t,i} (i=1, \ldots,N) and those computed at the slower time scale (kh),y_{t,i}(i=1, \ldots ,\mbox{N/k}).
They found that the limit (forN\to\infty ) of the realized power variation is invariant for different sampling scales and that their ratio is1 in case of jumps and\mbox{k}^{p/2}-1 if no jumps.Therefore the AJ test detects the presence of jump using the ratio of realized power variation sampled from two scales. The null hypothesis is no jumps.
The function returns three outcomes: 1.z-test value 2.critical value under confidence level of95\% and 3.p-value.
Assume there isN equispaced returns in periodt. Letr_{t,i} be a return (withi=1, \ldots,N) in periodt.
And there isN/k equispaced returns in periodt. Lety_{t,i} be a return (withi=1, \ldots ,\mbox{N/k}) in periodt.
Then the AJjumpTest is given by:
\mbox{AJjumpTest}_{t,N}= \frac{S_t(p,k,h)-k^{p/2-1}}{\sqrt{V_{t,N}}}
in which,
\mbox{S}_t(p,k,h)= \frac{PV_{t,M}(p,kh)}{PV_{t,M}(p,h)}
\mbox{PV}_{t,N}(p,kh)= \sum_{i=1}^{N/k}{|y_{t,i}|^p}
\mbox{PV}_{t,N}(p,h)= \sum_{i=1}^{N}{|r_{t,i}|^p}
\mbox{V}_{t,N}= \frac{N(p,k) A_{t,N(2p)}}{N A_{t,N(p)}}
\mbox{N}(p,k)= \left(\frac{1}{\mu_p^2}\right)(k^{p-2}(1+k))\mu_{2p} + k^{p-2}(k-1) \mu_p^2 - 2k^{p/2-1}\mu_{k,p}
\mbox{A}_{t,n(2p)}= \frac{(1/N)^{(1-p/2)}}{\mu_p} \sum_{i=1}^{N}{|r_{t,i}|^p} \ \ \mbox{for} \ \ |r_j|< \alpha(1/N)^w
\mu_{k,p}= E(|U|^p |U+\sqrt{k-1}V|^p)
U, V: independent standard normal random variables;h=1/N;p, k, \alpha, w: parameters.
Usage
AJjumpTest( pData, p = 4, k = 2, alignBy = NULL, alignPeriod = NULL, alphaMultiplier = 4, alpha = 0.975, ...)Arguments
pData | either an |
p | can be chosen among 2 or 3 or 4. The author suggests 4. 4 by default. |
k | can be chosen among 2 or 3 or 4. The author suggests 2. 2 by default. |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5 minute frequency, set |
alphaMultiplier | alpha multiplier |
alpha | numeric of length one with the significance level to use for the jump test(s). Defaults to 0.975. |
... | used internally |
Details
The theoretical framework underlying jump test is that the logarithmic price processX_t belongs to the class of Brownian semimartingales, which can be written as:
\mbox{X}_{t}= \int_{0}^{t} a_udu + \int_{0}^{t}\sigma_{u}dW_{u} + Z_t
wherea is the drift term,\sigma denotes the spot volatility process,W is a standard Brownian motion andZ is a jump process defined by:
\mbox{Z}_{t}= \sum_{j=1}^{N_t}k_j
wherek_j are nonzero random variables. The counting process can be either finite or infinite for finite or infinite activity jumps.
Using the convergence properties of power variation and its dependence on the time scale on which it is measured, Ait-Sahalia and Jacod (2009) define a new variable which converges to 1 in the presence of jumps in the underlying return series, or to another deterministic and known number in the absence of jumps (Theodosiou and Zikes, 2009).
Value
a list orxts in depending on whether input prices span more than one day.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Ait-Sahalia, Y. and Jacod, J. (2009). Testing for jumps in a discretely observed process. The Annals of Statistics, 37(1), 184-222.
Theodosiou, M., & Zikes, F. (2009). A comprehensive comparison of alternative tests for jumps in asset prices. Unpublished manuscript, Graduate School of Business, Imperial College London.
Examples
jt <- AJjumpTest(sampleTData[, list(DT, PRICE)], p = 2, k = 3, alignBy = "seconds", alignPeriod = 5, makeReturns = TRUE)Barndorff-Nielsen and Shephard (2006) tests for the presence of jumps in the price series.
Description
This test examines the presence of jumps in highfrequency price series. It is based on theory of Barndorff-Nielsen and Shephard (2006). The null hypothesis is that there are no jumps.
Usage
BNSjumpTest( rData, IVestimator = "BV", IQestimator = "TP", type = "linear", logTransform = FALSE, max = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, alpha = 0.975)Arguments
rData | either an |
IVestimator | can be chosen among jump robust integrated variance estimators: |
IQestimator | can be chosen among jump robust integrated quarticity estimators: |
type | a method of BNS testing: can be linear or ratio. Linear by default. |
logTransform | boolean, should be |
max | boolean, should be |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5 minute frequency, set |
makeReturns | boolean, should be |
alpha | numeric of length one with the significance level to use for the jump test(s). Defaults to 0.975. |
Details
Assume there isN equispaced returns in periodt. Assume the Realized variance (RV), IVestimator and IQestimator are based onN equi-spaced returns.
Letr_{t,i} be a return (withi = 1, \ldots, N) in periodt.
Then the BNSjumpTest is given by
\mbox{BNSjumpTest}= \frac{\code{RV} - \code{IVestimator}}{\sqrt{(\theta-2)\frac{1}{N} {\code{IQestimator}}}}.
The options forIVestimator andIQestimator are listed above.\theta depends on the chosenIVestimator (Huang and Tauchen, 2005).
The theoretical framework underlying the jump test is that the logarithmic price processX_t belongs to the class of Brownian semimartingales, which can be written as:
\mbox{X}_{t}= \int_{0}^{t} a_u \ du + \int_{0}^{t}\sigma_{u} \ dW_{u} + Z_t
wherea is the drift term,\sigma denotes the spot volatility process,W is a standard Brownian motion andZ is a jump process defined by:
\mbox{Z}_{t}= \sum_{j=1}^{N_t}k_j
wherek_j are nonzero random variables. The counting process can be either finite or infinite for finite or infinite activity jumps.
Since the realized volatility converges to the sum of integrated variance and jump variation, while the robustIVestimator converges to the integrated variance, it follows that the difference betweenRV and theIVestimator captures the jump part only, and this observation underlines the BNS test for jumps (Theodosiou and Zikes, 2009).
Value
a list orxts (depending on whether input prices span more than one day)with the following values:
z-test value.critical value (with confidence level of 95%).
p-value of the test.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., and Shephard, N. (2006). Econometrics of testing for jumps in financial economics using bipower variation.Journal of Financial Econometrics, 4, 1-30.
Corsi, F., Pirino, D., and Reno, R. (2010). Threshold bipower variation and the impact of jumps on volatility forecasting.Journal of Econometrics, 159, 276-288.
Huang, X., and Tauchen, G. (2005). The relative contribution of jumps to total price variance.Journal of Financial Econometrics, 3, 456-499.
Theodosiou, M., and Zikes, F. (2009). A comprehensive comparison of alternative tests for jumps in asset prices. Unpublished manuscript, Graduate School of Business, Imperial College London.
Examples
bns <- BNSjumpTest(sampleTData[, list(DT, PRICE)], IVestimator= "rMinRVar", IQestimator = "rMedRQuar", type= "linear", makeReturns = TRUE)bnsInternal HEAVY functions
Description
Internal HEAVY functions
Usage
Bj(j, parVarEq, parRMEq)Heterogeneous autoregressive (HAR) model for realized volatility model estimation
Description
Function returns the estimates for the heterogeneous autoregressive model (HAR)for realized volatility discussed in Andersen et al. (2007) and Corsi (2009).This model is mainly used to forecast the next day's volatility based on the high-frequency returns of the past.
Usage
HARmodel( data, periods = c(1, 5, 22), periodsJ = c(1, 5, 22), periodsQ = c(1), leverage = NULL, RVest = c("rCov", "rBPCov", "rQuar"), type = "HAR", inputType = "RM", jumpTest = "ABDJumptest", alpha = 0.05, h = 1, transform = NULL, externalRegressor = NULL, periodsExternal = c(1), ...)Arguments
data | an |
periods | a vector of integers indicating over how days the realized measures in the model should be aggregated. By default |
periodsJ | a vector of integers indicating over what time periods the jump components in the model should be aggregated. By default |
periodsQ | a vector of integers indicating over what time periods the realized quarticity in the model should be aggregated. By default |
leverage | a vector of integers indicating over what periods the negative returns should be aggregated.See Corsi and Reno (2012) for more information. By default |
RVest | a character vector with one, two, or three elements. The first element always refers to the name of the function to estimate the daily integrated variance (non-jump-robust).The second and third element depends on which type of model is estimated: If |
type | a string referring to the type of HAR model you would like to estimate. By default |
inputType | a string denoting if the input data consists of realized measures or high-frequency returns. Default "RM" is the only way to denote realized measures and everything else denotes returns. |
jumpTest | the function name of a function used to test whether the test statistic which determines whether the jump variability is significant that day. By default |
alpha | a real indicating the confidence level used in testing for jumps. By default |
h | an integer indicating the number over how many days the dependent variable should be aggregated.By default, |
transform | optionally a string referring to a function that transforms both the dependent and explanatory variables in the model. By default |
externalRegressor | an |
periodsExternal | a vector of integers indicating over how days |
... | extra arguments for jump test. |
Details
The basic specification in Corsi (2009) is as follows. LetRV_{t} be the realized variances at dayt andRV_{t-k:t} the averagerealized variance in betweent-k andt,k \geq 0.
The dynamics of the model are given by
RV_{t+1} = \beta_0 + \beta_1 \ RV_{t} + \beta_2 \ RV_{t-4:t} + \beta_3 \ RV_{t-21:t} + \varepsilon_{t+1}
which is estimated by ordinary least squares under the assumption that at timet,the conditional mean of\varepsilon_{t+1} is equal to zero.
For other specifications, please refer to the cited papers.
The standard errors reporting in theprint andsummary methods are Newey-West standard errors calculated with thesandwich package.
Value
The function outputs an object of classHARmodel andlm (soHARmodel is a subclass oflm). Objectsof classHARmodel has the following methodsplot.HARmodel,predict.HARmodel,print.HARmodel, andsummary.HARmodel.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Andersen, T. G., Bollerslev, T., and Diebold, F. (2007). Roughing it up: Including jump components in the measurement, modelling and forecasting of return volatility.The Review of Economics and Statistics, 89, 701-720.
Corsi, F. (2009). A simple approximate long memory model of realized volatility.Journal of Financial Econometrics, 7, 174-196.
Corsi, F. and Reno R. (2012). Discrete-time volatility forecasting with persistent leverage effect and the link with continuous-time volatility modeling.Journal of Business & Economic Statistics, 30, 368-380.
Bollerslev, T., Patton, A., and Quaedvlieg, R. (2016). Exploiting the errors: A simple approach for improved volatility forecasting,Journal of Econometrics, 192, 1-18.
Examples
# Example 1: HAR# Forecasting daily Realized volatility for the S&P 500 using the basic HARmodel: HARlibrary(xts)RVSPY <- as.xts(SPYRM$RV5, order.by = SPYRM$DT)x <- HARmodel(data = RVSPY , periods = c(1,5,22), RVest = c("rCov"), type = "HAR", h = 1, transform = NULL, inputType = "RM")class(x)xsummary(x)plot(x)predict(x)# Example 2: HARQ# Get the highfrequency returnsdat <- as.xts(sampleOneMinuteData[, makeReturns(STOCK), by = list(DATE = as.Date(DT))])x <- HARmodel(dat, periods = c(1,5,10), periodsJ = c(1,5,10), periodsQ = c(1), RVest = c("rCov", "rQuar"), type="HARQ", inputType = "returns")# Estimate the HAR model of type HARQclass(x)x# plot(x)# predict(x)# Example 3: HARQJ with already computed realized measuresdat <- SPYRM[, list(DT, RV5, BPV5, RQ5)]x <- HARmodel(as.xts(dat), periods = c(1,5,22), periodsJ = c(1), periodsQ = c(1), type = "HARQJ")# Estimate the HAR model of type HARQJclass(x)x# plot(x)predict(x)# Example 4: CHAR with already computed realized measuresdat <- SPYRM[, list(DT, RV5, BPV5)]x <- HARmodel(as.xts(dat), periods = c(1, 5, 22), type = "CHAR")# Estimate the HAR model of type CHARclass(x)x# plot(x)predict(x)# Example 5: CHARQ with already computed realized measuresdat <- SPYRM[, list(DT, RV5, BPV5, RQ5)]x <- HARmodel(as.xts(dat), periods = c(1,5,22), periodsQ = c(1), type = "CHARQ")# Estimate the HAR model of type CHARQclass(x)x# plot(x)predict(x)# Example 6: HARCJ with pre-computed test-statistics## BNSJumptest manually calculated.testStats <- sqrt(390) * (SPYRM$RV1 - SPYRM$BPV1)/sqrt((pi^2/4+pi-3 - 2) * SPYRM$medRQ1)model <- HARmodel(cbind(as.xts(SPYRM[, list(DT, RV5, BPV5)]), testStats), type = "HARCJ")HEAVY model estimation
Description
This function calculates the High frEquency bAsed VolatilitY (HEAVY) model proposed in Shephard and Sheppard (2010).
Usage
HEAVYmodel(data, startingValues = NULL)Arguments
data | an |
startingValues | a vector of alternative starting values: first three arguments for variance equation and last three arguments for measurement equation. |
Details
Letr_{t} andRM_{t} be series of demeaned returns and realized measures ofdaily stock price variation. The HEAVY model is a two-component model.We assumer_{t} = h_{t}^{1/2} Z_{t} whereZ_t is an i.i.d. zero-mean and unit-variance innovation term. The dynamics of the HEAVY model are given by
h_{t} = \omega + \alpha RM_{t-1} + \beta h_{t-1}
and
\mu_{t} = \omega_{R} + \alpha_{R} RM_{t-1} + \beta_{R} \mu_{t-1}.
The two equations are estimated separately as mentioned in Shephard and Sheppard (2010).We report robust standard errors based on the matrix-product of inverted Hessians andthe outer product of gradients.
Note that we always demean the returns in the data input as we don't include a constant in the mean equation.
Value
The function outputs an object of classHEAVYmodel, a list containing
coefficients = estimated coefficients.
se = robust standard errors based on inverted Hessian matrix.
residuals = the residuals in the return equation.
llh = the two-component log-likelihood values.
varCondVariances = conditional variances in the variance equation.
RMCondVariances = conditional variances in the RM equation.
data = the input data.
The class HEAVYmodel has the following methods: plot.HEAVYmodel, predict.HEAVYmodel, print.HEAVYmodel, and summary.HEAVYmodel.
Author(s)
Onno Kleen and Emil Sjorup.
References
Shephard, N. and Sheppard, K. (2010). Realising the future: Forecasting with high frequency based volatility (HEAVY) models. Journal of Applied Econometrics 25, 197–231.
See Also
Examples
# Calculate returns in percentageslogReturns <- 100 * makeReturns(SPYRM$CLOSE)[-1]# Combine both returns and realized measures into one xts# Due to return calculation, the first observation is missingdataSPY <- xts::xts(cbind(logReturns, SPYRM$BPV5[-1] * 10000), order.by = SPYRM$DT[-1])# Fit the HEAVY modelfittedHEAVY <- HEAVYmodel(dataSPY)# Examine the estimated coefficients and robust standard errorsfittedHEAVY# Calculate iterative multi-step-ahead forecastspredict(fittedHEAVY, stepsAhead = 12)Estimators of the integrated covariance
Description
This documentation page functions as a point of reference to quickly look up the estimators of the integrated covariance provided in thehighfrequency package.
The implemented estimators are:
Realized covariancerCov
Realized bipower covariancerBPCov
Hayashi-Yoshida realized covariancerHYCov
Realized kernel covariancerKernelCov
Realized outlyingness-weighted covariancerOWCov
Realized threshold covariancerThresholdCov
Realized two-scale covariancerTSCov
Robust realized two-scale covariancerRTSCov
Subsampled realized covariancerAVGCov
Realized semi-covariancerSemiCov
Modulated Realized covariancerMRCov
Realized Cholesky covariancerCholCov
Beta-adjusted realized covariancerBACov
See Also
IVar for a list of implemented estimators of the integrated variance.
Estimators of the integrated variance
Description
This documentation page functions as a point of reference to quickly look up the estimators of the integrated variance provided in thehighfrequency package.
The implemented estimators are:Realized VariancerRVar
Median realized variancerMedRVar
Minimum realized variancerMinRVar
Realized quadpower variancerQPVar
Realized multipower variancerMPVar
Realized semivariancerSVar
Note that almost all estimators in the list inICov also work yield estimates of the integrated variance on the diagonals.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Function returns the value, the standard error and the confidence band of the integrated variance (IV) estimator.
Description
This function supplies information about standard error and confidence band of integrated variance (IV) estimators under Brownian semimartingales model such as: bipower variation, rMinRV, rMedRV. Depending on users' choices of estimator (integrated variance (IVestimator), integrated quarticity (IQestimator)) and confidence level, the function returns the result.(Barndorff (2002))Function returns three outcomes: 1.value of IV estimator 2.standard error of IV estimator and 3.confidence band of IV estimator.
Assume there isN equispaced returns in periodt.
Then the IVinference is given by:
\mbox{standard error}= \frac{1}{\sqrt{N}} *sd
\mbox{confidence band}= \hat{IV} \pm cv*se
in which,
\mbox{sd}= \sqrt{\theta \times \hat{IQ}}
cv: critical value.
se: standard error.
\theta: depending on IQestimator,\theta can take different value (Andersen et al. (2012)).
\hat{IQ} integrated quarticity estimator.
Usage
IVinference( rData, IVestimator = "RV", IQestimator = "rQuar", confidence = 0.95, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData |
|
IVestimator | can be chosen among integrated variance estimators: RV, BV, rMinRV or rMedRV. RV by default. |
IQestimator | can be chosen among integrated quarticity estimators: rQuar, realized tri-power quarticity (TPQ), quad-power quarticity (QPQ), rMinRQuar or rMedRQuar. TPQ by default. |
confidence | confidence level set by users. 0.95 by default. |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5 minute frequency, set |
makeReturns | boolean, should be |
... | additional arguments. |
Details
The theoretical framework is the logarithmic price processX_t belongs to the class of Brownian semimartingales, which can be written as:
\mbox{X}_{t}= \int_{0}^{t} a_udu + \int_{0}^{t}\sigma_{u}dW_{u}
wherea is the drift term,\sigma denotes the spot vivInferenceolatility process,W is a standard Brownian motion (assume that there are no jumps).
Value
list
Author(s)
Giang Nguyen, Jonathan Cornelissen and Kris Boudt
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
Barndorff-Nielsen, O. E. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 253-280.
Examples
## Not run: library("xts") # This function only accepts xts data currentlyivInf <- IVinference(as.xts(sampleTData[, list(DT, PRICE)]), IVestimator= "rMinRV", IQestimator = "rMedRQ", confidence = 0.95, makeReturns = TRUE)ivInf## End(Not run)Jiang and Oomen (2008) tests for the presence of jumps in the price series.
Description
This test examines the jump in highfrequency data. It is based on theory of Jiang and Oomen (JO). They found that the difference of simple return and logarithmic return can capture one half of integrated variance if there is no jump in the underlying sample path. The null hypothesis is no jumps.
Function returns three outcomes: 1.z-test value 2.critical value under confidence level of95\% and 3.p-value.
Assume there isN equispaced returns in periodt.
Letr_{t,i} be a logarithmic return (withi=1, \ldots,N) in periodt.
LetR_{t,i} be a simple return (withi=1, \ldots,N) in periodt.
Then the JOjumpTest is given by:
\mbox{JOjumpTest}_{t,N}= \frac{N BV_{t}}{\sqrt{\Omega_{SwV}} \left(1-\frac{RV_{t}}{SwV_{t}} \right)}
in which,BV: bipower variance;RV: realized variance (defined by Andersen et al. (2012));
\mbox{SwV}_{t}=2 \sum_{i=1}^{N}(R_{t,i}-r_{t,i})
\Omega_{SwV}= \frac{\mu_6}{9} \frac{{N^3}{\mu_{6/p}^{-p}}}{N-p-1} \sum_{i=0}^{N-p}\prod_{k=1}^{p}|r_{t,i+k}|^{6/p}
\mu_{p}= \mbox{E}[|\mbox{U}|^{p}] = 2^{p/2} \frac{\Gamma(1/2(p+1))}{\Gamma(1/2)} % \mbox{E}[|\mbox{U}|^p]=
U: independent standard normal random variables
p: parameter (power).
Usage
JOjumpTest( pData, power = 4, alignBy = NULL, alignPeriod = NULL, alpha = 0.975, ...)Arguments
pData | a zoo/xts object containing all prices in period t for one asset. |
power | can be chosen among 4 or 6. 4 by default. |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5 minute frequency, set |
alpha | numeric of length one with the significance level to use for the jump test(s). Defaults to 0.975. |
... | Used internally, do not set. |
Details
The theoretical framework underlying jump test is that the logarithmic price processX_t belongs to the class of Brownian semimartingales, which can be written as:
\mbox{X}_{t}= \int_{0}^{t} a_udu + \int_{0}^{t}\sigma_{u}dW_{u} + Z_t
wherea is the drift term,\sigma denotes the spot volatility process,W is a standard Brownian motion andZ is a jump process defined by:
\mbox{Z}_{t}= \sum_{j=1}^{N_t}k_j
wherek_j are nonzero random variables. The counting process can be either finite or infinite for finite or infinite activity jumps.
The the Jiang and Ooment test is that in the absence of jumps, the accumulated difference between the simple returns and log returns captures half of the integrated variance. (Theodosiou and Zikes, 2009).If this difference is too great, the null hypothesis of no jumps is rejected.
Value
list
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75- 93.
Jiang, J. G., and Oomen, R. C. A (2008). Testing for jumps when asset prices are observed with noise- a "swap variance" approach.Journal of Econometrics, 144, 352-370.
Theodosiou, M., Zikes, F. (2009). A comprehensive comparison of alternative tests for jumps in asset prices. Unpublished manuscript, Graduate School of Business, Imperial College London.
Examples
joDT <- JOjumpTest(sampleTData[, list(DT, PRICE)])DEPRECATEDDEPRECATED USErRVar
Description
DEPRECATEDDEPRECATED USErRVar
Usage
RV(rData)Arguments
rData | DEPRECATED USE |
ReMeDI
Description
This function estimates the auto-covariance of market-microstructure noise
Let the observed priceY_{t} be given asY_{t} = X_{t} + \varepsilon_{t}, whereX_{t} is the efficient price and\varepsilon_t is the market microstructure noise
The estimator of thel'th lag of the market microstructure is defined as:
\hat{R}^{n}_{t,l} = \frac{1}{n_{t}} \sum_{i=2k_{n}}^{n_{t}-k_{n}-l} \left(Y_{i+l}^n - Y_{i+l+k_{n}}^{n} \right) \left(Y_{i}^n - Y_{i- 2k_{n}}^{n} \right),
wherek_{n} is a tuning parameter. In the functionknChooseReMeDI, we provide a function to estimate the optimalk_{n} parameter.
Usage
ReMeDI(pData, kn = 1, lags = 1, makeCorrelation = FALSE)Arguments
pData |
|
kn | numeric of length 1 determining the tuning parameter kn this controls the lengths of the non-overlapping interval in the ReMeDI estimation |
lags | numeric containing integer values indicating the lags for which to estimate the (co)variance |
makeCorrelation | logical indicating whether to transform the autocovariances into autocorrelations. The estimate of variance is imprecise and thus, constructing the correlation like this may show correlations that fall outside |
Note
We Thank Merrick Li for contributing his Matlab code for this estimator.
Author(s)
Emil Sjoerup.
References
Li, M. and Linton, O. (2021). A ReMeDI for microstructure noise. Econometrica, forthcoming
Examples
remed <- ReMeDI(sampleTData[as.Date(DT) == "2018-01-02", ], kn = 2, lags = 1:8)# We can also use the algorithm for choosing the kn tuning parameteroptimalKn <- knChooseReMeDI(sampleTData[as.Date(DT) == "2018-01-02",], knMax = 10, tol = 0.05, size = 3, lower = 2, upper = 5, plot = TRUE)optimalKnremed <- ReMeDI(sampleTData[as.Date(DT) == "2018-01-02", ], kn = optimalKn, lags = 1:8)Asymptotic variance of ReMeDI estimator
Description
Estimates the asymptotic variance of the ReMeDI estimator.
Usage
ReMeDIAsymptoticVariance(pData, kn, lags, phi, i)Arguments
pData |
|
kn | numerical value determining the tuning parameter kn this controls the lengths of the non-overlapping interval in the ReMeDI estimation |
lags | numeric containing integer values indicating the lags for which to estimate the (co)variance |
phi | tuning parameter phi |
i | tuning parameter i |
Details
Some notation is needed for the estimator of the asymptotic covariance of the ReMeDI estimator.Let
\delta\left(n, i\right) = t_{i}^{n}-t_{t-1}^{n}, i\geq 1,
\hat{\delta}_{t}^{n}=\left(\frac{k_{n}\delta\left(n,i+1+k_{n}\right)-t_{i+2+2k_{n}}^{n}+t_{i+2+k_{n}}^{n}}{\left(t_{i+k_{n}}^{n}-t_{i}^{n}\right)\vee\phi_{n}}\right)^{2},
U\left(1\right)_{t}^{n}=\sum_{i=0}^{n_{t}-\omega\left(1\right)_{n}}\hat{\delta}_{i}^{n},
U\left(2,\boldsymbol{j}\right)_{t}^{n}=\sum_{i=0}^{n_{t}-\omega\left(2\right)_{n}}\hat{\delta}_{i}^{n}\Delta_{\boldsymbol{j}}\left(Y\right)_{i+\omega\left(2\right)_{2}^{n}}^{n},
U\left(3,\boldsymbol{j},\boldsymbol{j}'\right)_{t}^{n}=\sum_{i=0}^{n_{t}-\omega\left(3\right)_{n}}\hat{\delta}_{i}^{n}\Delta_{\boldsymbol{j}}\left(Y\right)_{i+\omega\left(3\right)_{2}^{n}}^{n}\Delta_{\boldsymbol{j}'}\left(Y\right)_{i+\omega\left(3\right)_{3}^{n}}^{n},
U\left(4;\boldsymbol{j},\boldsymbol{j}'\right)_{t}^{n}=-\sum_{i=2^{q-1}k_{n}}^{n_{t}-\omega\left(4\right)_{n}}\Delta_{\boldsymbol{j}}\left(Y\right)\Delta_{\boldsymbol{j}^{\prime}}\left(Y\right)_{i+\omega\left(3\right)_{3}^{n}}^{n},
U\left(5,k;\boldsymbol{j},\boldsymbol{j}'\right)_{t}^{n}=\sum_{Q_{q}\in\mathcal{Q}_{q}}\sum_{i=2^{e\left(Q_{q}\right)}k_{n}}^{n_{t}-\omega\left(5\right)_{n}}\Delta_{\boldsymbol{j}_{Q_{q}\oplus\left(\boldsymbol{j}\prime_{Q_{q'}}\left(+k\right)\right)}}\left(Y\right)_{i}^{n}\prod_{\ell:l_{\ell}\in Q_{q}^{c}}\Delta_{\left(j_{l_{\ell}},j\prime_{l_{\ell}}+k\right)\left(Y\right)_{i+\omega\left(5\right)_{\ell+1}^{n}\prime}},
U\left(6,k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)=\sum_{j_{l}\in\boldsymbol{j},j_{l^{\prime}}^{\prime}\in\boldsymbol{j}^{\prime}}\sum_{i=2k_{n}}^{n_{t}-\omega\left(6\right)n}\Delta_{\left(j_{l},j_{l^{\prime}}^{\prime}+k\right)}\left(Y\right)_{i}^{n}\Delta_{\boldsymbol{j}_{-l}}\left(Y\right)_{i+\omega\left(6\right)_{2}^{n}}^{n}\Delta_{\boldsymbol{j}_{-l^{\prime}}^{\prime}}\left(Y\right)_{i+\omega\left(6\right)_{3}^{n}}^{n} \\ -\sum_{j_{l}\in\boldsymbol{j}}\sum_{i=2^{q}k_{n}}^{n_{t}-\omega^{\prime}\left(6\right)_{n}}\Delta_{\left\{ j_{l}\right\} \oplus\boldsymbol{j}^{\prime}\left(+k\right)}\left(Y\right)_{i}^{n}\Delta_{\boldsymbol{j}-l}\left(Y\right)_{i+\omega^{\prime}\left(6\right)_{2}^{n}}^{n} \\ -\sum_{j_{l^{\prime}\in\boldsymbol{j}^{\prime}}^{\prime}}\sum_{i=2^{q}k_{n}}^{n_{t}-\omega^{\prime\prime}\left(6\right)n}\Delta_{\left\{ j_{l^{\prime}}^{\prime}+k\right\} \oplus\boldsymbol{j}}\left(Y\right)_{i}^{n}\Delta_{\boldsymbol{j}_{-l^{\prime}}^{\prime}}\left(Y\right)_{i+\omega^{\prime\prime}\left(6\right)_{2}^{n}\prime}^{n},
U\left(7,k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=ReMeDI\left(\boldsymbol{j}\oplus\boldsymbol{j}^{\prime}\left(+k\right)\right)_{t}^{n},
U\left(k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=\sum_{\ell=5}^{7}U\left(\ell,k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n},
U\left(k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=\sum_{\ell=5}^{7}U\left(\ell,k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n},
Where the indices are given by:
\omega\left(1\right)_{n}=2+2k_{n},\ \omega\left(2\right)_{2}^{n}=2+\left(3+2^{q-1}\right)k_{n},\ \omega\left(2\right)_{n}=\omega\left(2\right)_{2}^{n}+j_{1}+k_{n},
\omega\left(3\right)_{2}^{n}=2+\left(3+2^{q-1}\right)k_{n},\ \omega\left(3\right)_{3}^{n}=2+\left(5+2^{q-1}+2^{q^{\prime}-1}\right)k_{n}+j_{1},
\omega\left(3\right)_{n}=\omega\left(3\right)_{3}^{n}+j_{1}^{\prime}+k_{n},\ \omega\left(4\right)_{2}^{n}=2k_{n}+q_{n}^{\prime}+j_{1},\ \omega\left(4\right)_{n}=\omega\left(4\right)_{2}^{n}+j_{1}^{\prime}+k_{n},
e\left(Q_{q}\right)=\left(2\left|Q_{q}\right|+q^{\prime}-q-1\right)\vee1,\ \omega\left(5\right)_{\ell+1}^{n}=4\ell k_{n}+\sum_{\ell^{\prime}=1}^{\ell}j_{l_{\ell^{\prime}}}\vee\left(j_{l_{\ell}}^{\prime}+k\right)\textrm{for}\ell\geq 1,
\omega\left(5\right)_{n}=\omega\left(5\right)_{\left|Q_{q}^{c}\right|+1}^{n}+j_{l_{\left|Q_{q}^{c}\right|}}\vee\left(j_{l_{\left|Q_{q}^{c}\right|}}+k\right)+k_{n},
\omega\left(6\right)_{2}^{n}=\left(2^{q-2}+2\right)k_{n}+j_{\ell}\vee\left(j_{\ell^{\prime}}^{\prime}+k\right),\ \omega\left(6\right)_{3}^{n}=\left(2^{q-2}+2^{q^{\prime}-2}+2\right)k_{n}+j_{1}+j_{\ell}\vee\left(j_{\ell}^{\prime}+k\right),
\omega^{\prime}\left(6\right)_{2}^{n}=\left(2^{q-2}+2\right)k_{n}+j_{\ell}\vee\left(j_{1}^{\prime}+k\right),\ \omega^{\prime\prime}\left(6\right)_{2}^{n}=\left(2^{q^{\prime}-2}+1\right)k_{n}+\left(j_{\ell^{\prime}}^{\prime}+k\right)\vee j_{1},
\omega\left(6\right)_{n}=\omega\left(6\right)_{3}^{n}+j^{\prime}+k_{n},\ \omega^{\prime}\left(6\right)_{n}=\omega^{\prime}\left(6\right)_{2}^{n}+j_{1}+k_{n},\ \omega^{\prime\prime}\left(6\right)_{n}=\omega^{\prime\prime}\left(6\right)_{2}^{n}j_{1}^{\prime}+k_{n},
The asymptotic variance estimator is then given by
\hat{\sigma}\left(\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=\frac{1}{n_{t}}\sum_{\ell=1}^{3}\hat{\sigma}_{\ell}\left(\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n},
where
\hat{\sigma}_{1}\left(\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=U\left(0;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)+\sum_{k=1}^{i_{n}}\left(U\left(k;\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}\right)+\left(2i_{n}+1\right)U\left(4;\boldsymbol{j},\boldsymbol{j}\right)_{t}^{n},
\hat{\sigma}_{2}\left(\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=U\left(3;\boldsymbol{j},\boldsymbol{j}^{\prime}\right),
\hat{\sigma}_{3}\left(\boldsymbol{j},\boldsymbol{j}^{\prime}\right)_{t}^{n}=\frac{1}{n_{t}^{2}}\textrm{ReMeDI}\left(Y,\boldsymbol{j}\right)_{t}^{n}\textrm{ReMeDI}\left(Y,\boldsymbol{j}^{\prime}\right)_{t}^{n}U\left(1\right)_{t}^{n}\\,
-\frac{1}{n_{t}}\left(\textrm{ReMeDI}\left(Y,\boldsymbol{j}\right)_{t}^{n}U\left(2,\boldsymbol{j}^{\prime}\right)_{t}^{n}+\textrm{ReMeDI}\left(Y,\boldsymbol{j}^{\prime}\right)_{t}^{n}U\left(2,\boldsymbol{j}\right)_{t}^{n}\right),
Value
a list with componentsReMeDI andasympVar containing the ReMeDI estimation and it's asymptotic variance respectively
Note
We Thank Merrick Li for contributing his Matlab code for this estimator.
Examples
kn <- knChooseReMeDI(sampleTDataEurope[, list(DT, PRICE)])remedi <- ReMeDI(sampleTDataEurope[, list(DT, PRICE)], kn = kn, lags = 0:15)asympVar <- ReMeDIAsymptoticVariance(sampleTDataEurope[, list(DT, PRICE)], kn = kn, lags = 0:15, phi = 0.9, i = 2)SPY realized measures
Description
Realized measures for the SPY ETF calculated at 1 and 5 minute sampling.
Usage
SPYRMFormat
Adata.table object
Note
The CLOSE column is NOT the official close price, but simply the last recorded price of the day. Thus, this may be slightly different from other sources.
Aggregate a time series but keep first and last observation
Description
Function to aggregate high frequency data by last tick aggregation to an arbitrary periodicity based on wall clocks.Alternatively the aggregation can be done by number of ticks. In case we DON'T do tick-based aggregation, this function accepts arbitrary number of symbols over a arbitrary number of days. Although the function has the word Price in the name,the function is general and works on arbitrary time series, eitherxts ordata.table objects the latter requires aDTcolumn containing POSIXct time stamps.
Usage
aggregatePrice( pData, alignBy = "minutes", alignPeriod = 1, marketOpen = "09:30:00", marketClose = "16:00:00", fill = FALSE, tz = NULL)Arguments
pData |
|
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5 minute frequency, set |
marketOpen | the market opening time, by default: |
marketClose | the market closing time, by default: |
fill | indicates whether rows without trades should be added with the most recent value, FALSE by default. |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
The time stamps of the new time series are the closing times and/or days of the intervals. The element of the returned series with e.g. time stamp 09:35:00 containsthe last observation up to that point, including the value at 09:35:00 itself.
In casealignBy = "ticks", the sampling is done such the sampling starts on the first tick, and the last tick is always included.For example, if 14 observations are made on one day, and these are 1, 2, 3, ... 14.Then, withalignBy = "ticks" andalignPeriod = 3, the output will be 1, 4, 7, 10, 13, 14.
Value
Adata.table orxts object containing the aggregated time series.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Examples
# Aggregate price data to the 30-second frequencyaggregatePrice(sampleTData, alignBy = "secs", alignPeriod = 30)# Aggregate price data to 30-minute frequency including zero return price changesaggregatePrice(sampleTData, alignBy = "minutes", alignPeriod = 30, fill = TRUE)Aggregate adata.table orxts object containing quote data
Description
Aggregate tick-by-tick quote data and return adata.table orxts object containing the aggregated quote data.SeesampleQData for an example of the argument qData. This function accepts arbitrary number of symbols over an arbitrary number of days.
Usage
aggregateQuotes( qData, alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00", tz = NULL)Arguments
qData |
|
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5 minute frequency, set |
marketOpen | the market opening time, by default: |
marketClose | the market closing time, by default: |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
The output "BID" and "OFR" columns are constructed using previous tick aggregation.
The variables "BIDSIZ" and "OFRSIZ" are aggregated by taking the sum of the respective inputs over each interval.
The timestamps of the new time series are the closing times of the intervals.
Please note: Returned objects always contain the first observation (i.e. opening quotes,...).
Value
Adata.table or anxts object containing the aggregated quote data.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Examples
# Aggregate quote data to the 30 second frequencyqDataAggregated <- aggregateQuotes(sampleQData, alignBy = "seconds", alignPeriod = 30)qDataAggregated # Show the aggregated dataAggregate a time series
Description
Aggregate a time series asxts ordata.table object. It can handle irregularly spaced time series and returns a regularly spaced one.Use univariate time series as input for this function and check outaggregateTradesandaggregateQuotes to aggregate Trade or Quote data objects.
Usage
aggregateTS( ts, FUN = "previoustick", alignBy = "minutes", alignPeriod = 1, weights = NULL, dropna = FALSE, tz = NULL, ...)Arguments
ts |
|
FUN | function to apply over each interval. By default, previous tick aggregation is done. Alternatively one can set e.g. FUN = "mean".In case weights are supplied, this argument is ignored and a weighted average is taken. |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5 minute frequency, set |
weights | By default, no weighting scheme is used. When you assign an |
dropna | boolean, which determines whether empty intervals should be dropped.By default, an NA is returned in case an interval is empty, except when the user optsfor previous tick aggregation, by setting |
tz | character denoting which timezone the output should be in. Defaults to |
... | extra parameters passed on to |
Details
The time stamps of the new time series are the closing times and/or days of the intervals. For example, for a weekly aggregation the new time stamp is the last day in that particular week (namely Sunday).
In case of previous tick aggregation, foralignBy is either"seconds""minutes", or"hours",the element of the returned series with e.g. timestamp 09:35:00 contains the last observation up to that point, including the value at 09:35:00 itself.
Please note: In case an interval is empty, by default an NA is returned.. In case e.g. previous tick aggregation it makes sense to fill these NAs by the functionna.locf(last observation carried forward) from thezoo package.
In casealignBy = "ticks", the sampling is done such the sampling starts on the first tick and the last tick is always included.For example, if 14 observations are made on one day, and these are 1, 2, 3, ... 14.Then, withalignBy = "ticks" andalignPeriod = 3, the output will be 1, 4, 7, 10, 13, 14.
Value
Anxts object containing the aggregated time series.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
Examples
# Load sample price data## Not run: library(xts)ts <- as.xts(sampleTData[, list(DT, PRICE, SIZE)])# Previous tick aggregation to the 5-minute sampling frequency:tsagg5min <- aggregateTS(ts, alignBy = "minutes", alignPeriod = 5)head(tsagg5min)# Previous tick aggregation to the 30-second sampling frequency:tsagg30sec <- aggregateTS(ts, alignBy = "seconds", alignPeriod = 30)tail(tsagg30sec)tsagg3ticks <- aggregateTS(ts, alignBy = "ticks", alignPeriod = 3)## End(Not run)Aggregate adata.table orxts object containing trades data´
Description
Aggregate tick-by-tick trade data and return a time series as adata.table orxts object where first observation is always the opening priceand subsequent observations are the closing prices over the interval. This function accepts arbitrary number of symbols over an arbitrary number of days.
Usage
aggregateTrades( tData, alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00", tz = NULL)Arguments
tData |
|
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5 minute frequency, set |
marketOpen | the market opening time, by default: |
marketClose | the market closing time, by default: |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
The time stamps of the new time series are the closing times and/or days of the intervals.
The output"PRICE" column is constructed using previous tick aggregation.
The variable"SIZE" is aggregated by taking the sum over each interval.
The variable"VWPRICE" is the aggregated price weighted by volume.
The time stamps of the new time series are the closing times of the intervals.
In case of previous tick aggregation oralignBy = "seconds"/"minutes"/"hours",the element of the returned series with e.g. time stamp 09:35:00 contains the last observation up to that point, including the value at 09:35:00 itself.
Value
Adata.table orxts object containing the aggregated time series.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Examples
# Aggregate trade data to 5 minute frequencytDataAggregated <- aggregateTrades(sampleTData, alignBy = "minutes", alignPeriod = 5)tDataAggregatedRetain only data from the stock exchange with the highest volume
Description
Filters raw quote data and return only data that stems from the exchange with the highestvalue for the sum of"BIDSIZ" and"OFRSIZ", i.e. the highest quote volume.
Usage
autoSelectExchangeQuotes(qData, printExchange = TRUE)Arguments
qData | a |
printExchange | indicates whether the chosen exchange is printed on the console, default is
|
Value
data.table orxts object depending on input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Examples
autoSelectExchangeQuotes(sampleQDataRaw)Retain only data from the stock exchange with the highest trading volume
Description
Filters raw trade data and return only data that stems from the exchange with the highestvalue for the variable"SIZE", i.e. the highest trade volume.
Usage
autoSelectExchangeTrades(tData, printExchange = TRUE)Arguments
tData | an |
printExchange | indicates whether the chosen exchange is printed on the console, default is
|
Value
data.table orxts object depending on input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Examples
autoSelectExchangeTrades(sampleTDataRaw)Business time aggregation
Description
Time series aggregation based on 'business time' statistics. Instead of equidistant sampling based on time during a trading day, business time sampling creates measures and samples equidistantly using these instead.For example when sampling based on volume, business time aggregation will result in a time series that has an equal amount of volume between each observation (if possible).
Usage
businessTimeAggregation( pData, measure = "volume", obs = 390, bandwidth = 0.075, tz = NULL, ...)Arguments
pData |
|
measure | character denoting which measure to use. Valid options are |
obs | integer valued numeric of length 1 denoting how many observations is wanted after the aggregation procedure. |
bandwidth | numeric of length one, denoting which bandwidth parameter to use in the trade intensity process estimation of Oomen (2005). |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
... | extra arguments passed on to |
Value
A list containing"pData" which is the aggregated data and a list containing the intensity process, split up day by day.
Author(s)
Emil Sjoerup.
References
Dong, Y., and Tse, Y. K. (2017). Business time sampling scheme with applications to testing semi-martingale hypothesis and estimating integrated volatility.Econometrics, 5, 51.
Oomen, R. C. A. (2006). Properties of realized variance under alternative sampling schemes.Journal of Business & Economic Statistics, 24, 219-237
Examples
pData <- sampleTData[,list(DT, PRICE, SIZE)]# Aggregate based on the trade intensity measure. Getting 390 observations.agged <- businessTimeAggregation(pData, measure = "intensity", obs = 390, bandwidth = 0.075)# Plot the trade intensity measureplot.ts(agged$intensityProcess$`2018-01-02`)rCov(agged$pData[, list(DT, PRICE)], makeReturns = TRUE)rCov(pData[,list(DT, PRICE)], makeReturns = TRUE, alignBy = "minutes", alignPeriod = 1)# Aggregate based on the volume measure. Getting 78 observations.agged <- businessTimeAggregation(pData, measure = "volume", obs = 78)rCov(agged$pData[,list(DT, PRICE)], makeReturns = TRUE)rCov(pData[,list(DT, PRICE)], makeReturns = TRUE, alignBy = "minutes", alignPeriod = 5)Inference on drift burst hypothesis
Description
Calculates the test-statistic for the drift burst hypothesis
Let the efficient log-price be defined as:
dX_{t} = \mu_{t}dt + \sigma_{t}dW_{t} + dJ_{t},
where\mu_{t},\sigma_{t}, andJ_{t} are the spot drift, the spot volatility, and a jump process respectively.However, due to microstructure noise, the observed log-price is
Y_{t} = X_{t} + \varepsilon_{t}
In order robustify the results to the presence of market microstructure noise, the pre-averaged returns are used:
\Delta_{i}^{n}\overline{Y} = \sum_{j=1}^{k_{n}-1}g_{j}^{n}\Delta_{i+j}^{n}Y,
whereg(\cdot) is a weighting function,min(x, 1-x), andk_{n} is the pre-averaging horizon.
The test statistic for the Drift Burst Hypothesis can then be calculated as
\bar{T}_{t}^{n} = \sqrt{\frac{h_{n}}{K_{2}}}\frac{\hat{\bar{\mu}}_{t}^{n}}{\sqrt{\hat{\bar{\sigma}}_{t}^{n}}},
where
\hat{\bar{\mu}}_{t}^{n} = \frac{1}{h_{n}}\sum_{i=1}^{n-k_{n}+2}K\left(\frac{t_{i-1}-t}{h_{n}}\right)\Delta_{i-1}^{n}\overline{Y},
and
\hat{\bar{\sigma}}_{t}^{n} = \frac{1}{h_{n}'}\bigg[\sum_{i=1}^{n-k_{n}+2}\left(K\left(\frac{t_{i-1}-t}{h'_{n}}\right)\Delta_{i-1}^{n}\overline{Y}\right)^{2} \\ + 2\sum_{L=1}^{L_{n}}\omega\left(\frac{L}{L_{n}}\right)\sum_{i=1}^{n-k_{n}-L+2}K\left(\frac{t_{i-1}-t}{h_{n}'}\right)K\left(\frac{t_{i+L-1}-t}{h_{n}'}\right)\Delta_{i-1}^{n}\overline{Y}\Delta_{i-1+L}^{n}\overline{Y}\bigg],
where\omega(\cdot) is a smooth kernel function, in this case the Parzen kernel.L_{n} is the lag length for adjusting for auto-correlation andK(\cdot)is a kernel weighting function, which in this case is the left-sided exponential kernel.
Usage
driftBursts( pData, testTimes = seq(34260, 57600, 60), preAverage = 5, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L, parallelize = FALSE, nCores = NA, warnings = TRUE)Arguments
pData | Either a |
testTimes | A |
preAverage | A positive |
ACLag | A positive |
meanBandwidth | An |
varianceBandwidth | An |
parallelize | A |
nCores | An |
warnings | A |
Details
If thetestTimes vector contains instructions to test before the first trade, or more than 15 minutes after the last trade, these entries will be deleted, as not doing so may cause crashes.The test statistic is unstable beforemax(meanBandwidth , varianceBandwidth) seconds has passed.The lags from the Newey-West algorithm is increased by2 * (preAveage-1) due to the pre-averaging we know at least this many lags should be corrected for.The maximum of 20 lags is also increased by this factor for the same reason.
Value
An object of classDBH andlist containing the series of the drift burst hypothesis test-statistic as well as the estimated spot drift and variance series. The list also contains some information such as the variance and mean bandwidths along with the pre-averaging setting and the amount of observations. Additionally, the list will contain information on whether testing happened for alltestTimes entries.Objects of classDBH has the methodsprint.DBH,plot.DBH, andgetCriticalValues.DBH which prints, plots, andretrieves critical values for the test described in appendix B of Christensen, Oomen, and Reno (2020).
Author(s)
Emil Sjoerup
References
Christensen, K., Oomen, R., and Reno, R. (2020) The drift burst hypothesis. Journal of Econometrics. Forthcoming.
Examples
# Usage with data.table objectdat <- sampleTData[as.Date(DT) == "2018-01-02"]# Testing every 60 seconds after 09:45:00DBH1 <- driftBursts(dat, testTimes = seq(35100, 57600, 60), preAverage = 2, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)print(DBH1)plot(DBH1, pData = dat)# Usage with xts object (1 column)library("xts")dat <- xts(sampleTData[as.Date(DT) == "2018-01-03"]$PRICE, order.by = sampleTData[as.Date(DT) == "2018-01-03"]$DT)# Testing every 60 seconds after 09:45:00DBH2 <- driftBursts(dat, testTimes = seq(35100, 57600, 60), preAverage = 2, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)plot(DBH2, pData = dat)## Not run: # This block takes some timedat <- xts(sampleTDataEurope$PRICE, order.by = sampleTDataEurope$DT)# Testing every 60 seconds after 09:00:00system.time({DBH4 <- driftBursts(dat, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)})system.time({DBH4 <- driftBursts(dat, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L, parallelize = TRUE, nCores = 8)})plot(DBH4, pData = dat)# The print method for DBH objects takes an argument alpha that determines the confidence level# of the test performedprint(DBH4, alpha = 0.99)# Additionally, criticalValue can be passed directlyprint(DBH4, criticalValue = 3)max(abs(DBH4$tStat)) > getCriticalValues(DBH4, 0.99)$quantile## End(Not run)Extract data from anxts object for the exchange hours only
Description
Filter raw trade data such and return only data between market close and market open. By default,marketOpen = "09:30:00" andmarketClose = "16:00:00" (see Brownlees and Gallo (2006) for more information on good choices for these arguments).
Usage
exchangeHoursOnly( data, marketOpen = "09:30:00", marketClose = "16:00:00", tz = NULL)Arguments
data | a |
marketOpen | character in the format of |
marketClose | character in the format of |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Value
xts ordata.table object depending on input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Brownlees, C. T. and Gallo, G. M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, pages 2232-2245.
Examples
exchangeHoursOnly(sampleTDataRaw)Make TAQ format
Description
Convenience function to gather data from onexts ordata.table with at least"DT", and d columns containing price data to a"DT","SYMBOL", and"PRICE"column. This function the opposite ofspreadPrices.
Usage
gatherPrices(data)Arguments
data | An |
Value
adata.table with columnsDT,SYMBOL, andPRICE
Author(s)
Emil Sjoerup
See Also
Examples
## Not run: library(data.table)data1 <- copy(sampleTData)[, `:=`(PRICE = PRICE * runif(.N, min = 0.99, max = 1.01), DT = DT + runif(.N, 0.01, 0.02))]data2 <- copy(sampleTData)[, SYMBOL := 'XYZ']dat1 <- rbind(data1[, list(DT, SYMBOL, PRICE)], data2[, list(DT, SYMBOL, PRICE)])setkeyv(dat1, c("DT", "SYMBOL"))dat1dat <- spreadPrices(dat1) # Easy to use for realized measuresdatdat <- gatherPrices(dat)datall.equal(dat1, dat) # We have changed to RM format and back.## End(Not run)Get high frequency data from Alpha Vantage
Description
Function to retrieve high frequency data from Alpha Vantage - wrapper around quantmod's getSymbols.av function
Usage
getAlphaVantageData( symbols = NULL, interval = "5min", outputType = "xts", apiKey = NULL, doSleep = TRUE)Arguments
symbols | character vector with the symbols to import. |
interval | the sampling interval of the data retrieved. Should be one of one of "1min", "5min", "15min", "30min", or "60min" |
outputType | string either |
apiKey | string with the api key provided by Alpha Vantage. |
doSleep | logical when the length of symbols > 5 the function will sleep for 12 seconds by default. |
Details
ThedoSleep argument is set to true as default because Alpha Vantage has a limit of five calls per minute. The function does not try to extract when the last API call was made which means that ifyou made successive calls to get 3 symbols in rapid succession, the function may not retrieve all the data.
Value
An object of typexts ordata.table in case the length of symbols is 1. If the length of symbols > 1 thexts anddata.table objects will be put into a list.
Author(s)
Emil Sjoerup (wrapper only) Paul Teetor (for quantmod's getSymbols.av)
See Also
The getSymbols.av function in the quantmod package
Examples
## Not run: # Get data for SPY at an interval of 1 minute in the standard xts format.data <- getAlphaVantageData(symbols = "SPY", apiKey = "yourKey", interval = "1min")# Get data for 3M and Goldman Sachs at a 5 minute interval in the data.table format. # The data.tables will be put in a list.data <- getAlphaVantageData(symbols = c("MMM", "GS"), interval = "5min", outputType = "DT", apiKey = 'yourKey')# Get data for JPM and Citicorp at a 15 minute interval in the xts format. # The xts objects will be put in a list.data <- getAlphaVantageData(symbols = c("JPM", "C"), interval = "15min", outputType = "xts", apiKey = "yourKey")## End(Not run)Get critical value for the drift burst hypothesis t-statistic
Description
Method for DBH objects to calculate the critical value for the presence of a burst of drift.The critical value is that of the test described in appendix B in Christensen Oomen Reno
Usage
getCriticalValues(x, alpha = 0.95)Arguments
x | object of class |
alpha | numeric denoting the confidence level for the critical value. Possible values are |
Author(s)
Emil Sjoerup
References
Christensen, K., Oomen, R., and Reno, R. (2020) The drift burst hypothesis. Journal of Econometrics. Forthcoming.
Compute Liquidity Measure
Description
Function returns anxts ordata.table object containing 23 liquidity measures. Please see details below.
Note that this assumes a regular time grid.
Usage
getLiquidityMeasures(tqData, win = 300)Arguments
tqData | A |
win | A windows length for the forward-prices used for ‘realized’spread |
Details
NOTE:xts ordata.table should only contain one day of observationsSome markets have publish information about whether it was a buyer or a seller who initiated the trade. This information can be passed in a columnDIRECTION this column must only have 1 or -1 as values.
The respective liquidity measures are defined as follows:
- effectiveSpread
\mbox{effective spread}_t = 2*D_t*(\mbox{PRICE}_{t} - \frac{(\mbox{BID}_{t}+\mbox{OFR}_{t})}{2}),where
D_tis 1 (-1) iftrade_twas buy (sell) (see Boehmer (2005), Bessembinder (2003)). Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered quote timestamp).- realizedSpread: realized spread
\mbox{realized spread}_t = 2*D_t*(\mbox{PRICE}_{t} - \frac{(\mbox{BID}_{t+300}+\mbox{OFR}_{t+300})}{2}),where
D_tis 1 (-1) iftrade_twas buy (sell) (see Boehmer (2005), Bessembinder (2003)). Note that the time indication of\mbox{BID}and\mbox{OFR}refers to the registered time of the quote in seconds.- valueTrade: trade value
\mbox{trade value}_t = \mbox{SIZE}_{t}*\mbox{PRICE}_{t}.- signedValueTrade: signed trade value
\mbox{signed trade value}_t = D_t * (\mbox{SIZE}_{t}*\mbox{PRICE}_{t}),where
D_tis 1 (-1) iftrade_twas buy (sell) (see Boehmer (2005), Bessembinder (2003)).- depthImbalanceDifference: depth imbalance (as a difference)
\mbox{depth imbalance (as difference)}_t = \frac{D_t *(\mbox{OFRSIZ}_{t}-\mbox{BIDSIZ}_{t})}{(\mbox{OFRSIZ}_{t}+\mbox{BIDSIZ}_{t})},where
D_tis 1 (-1) iftrade_twas buy (sell) (see Boehmer (2005), Bessembinder (2003)). Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered quote timestamp).- depthImbalanceRatio: depth imbalance (as ratio)
\mbox{depth imbalance (as ratio)}_t = (\frac{\mbox{OFRSIZ}_{t}}{\mbox{BIDSIZ}_{t}})^{D_t},where
D_tis 1 (-1) iftrade_twas buy (sell) (see Boehmer (2005), Bessembinder (2003)). Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered quote timestamp).- proportionalEffectiveSpread: proportional effective spread
\mbox{proportional effective spread}_t = \frac{\mbox{effective spread}_t}{(\mbox{OFR}_{t}+\mbox{BID}_{t})/2}(Venkataraman, 2001).
Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered quote timestamp).
- proportionalRealizedSpread: proportional realized spread
\mbox{proportional realized spread}_t = \frac{\mbox{realized spread}_t}{(\mbox{OFR}_{t}+\mbox{BID}_{t})/2}(Venkataraman, 2001).
Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered
- priceImpact: price impact
\mbox{price impact}_t = \frac{\mbox{effective spread}_t - \mbox{realized spread}_t}{2}(see Boehmer (2005), Bessembinder (2003)).
- proportionalPriceImpact: proportional price impact
\mbox{proportional price impact}_t = \frac{\frac{(\mbox{effective spread}_t - \mbox{realized spread}_t)}{2}}{\frac{\mbox{OFR}_{t}+\mbox{BID}_{t}}{2}}(Venkataraman, 2001).Note that the input of this function consists of the matched trades andquotes, so this is where the time indication refers to (and thus not to theregistered quote timestamp).
- halfTradedSpread: half traded spread
\mbox{half traded spread}_t = D_t*(\mbox{PRICE}_{t} - \frac{(\mbox{BID}_{t}+\mbox{OFR}_{t})}{2}),where
D_tis 1 (-1) iftrade_twas buy (sell) (see Boehmer (2005), Bessembinder (2003)). Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered quote timestamp).- proportionalHalfTradedSpread: proportional half traded spread
\mbox{proportional half traded spread}_t = \frac{\mbox{half traded spread}_t}{\frac{\mbox{OFR}_{t}+\mbox{BID}_{t}}{2}}.Note that the input of this function consists of the matched trades and quotes,so this is were the time indication refers to (and thus not to the registered quote timestamp).
- squaredLogReturn: squared log return on trade prices
\mbox{squared log return on Trade prices}_t = (\log(\mbox{PRICE}_{t})-\log(\mbox{PRICE}_{t-1}))^2.- absLogReturn: absolute log return on trade prices
\mbox{absolute log return on Trade prices}_t = |\log(\mbox{PRICE}_{t})-\log(\mbox{PRICE}_{t-1})|.- quotedSpread: quoted spread
\mbox{quoted spread}_t = \mbox{OFR}_{t}-\mbox{BID}_{t}Note that the input of this function consists of the matched trades andquotes, so this is where the time indication refers to (and thus not to theregistered quote timestamp).
- proportionalQuotedSpread: proportional quoted spread
\mbox{proportional quoted spread}_t = \frac{\mbox{quoted spread}_t}{\frac{\mbox{OFR}_{t}+\mbox{BID}_{t}}{2}}(Venkataraman, 2001).Note that the input of this function consists of the matched trades andquotes, so this is where the time indication refers to (and thus not to theregistered quote timestamp).
- logQuotedSpread: log quoted spread
\mbox{log quoted spread}_t = \log(\frac{\mbox{OFR}_{t}}{\mbox{BID}_{t}})(Hasbrouck and Seppi, 2001).Note that the input of this function consists of the matched trades andquotes, so this is where the time indication refers to (and thus not to theregistered quote timestamp).
- logQuotedSize: log quoted size
\mbox{log quoted size}_t = \log(\mbox{OFRSIZ}_{t})+\log(\mbox{BIDSIZ}_{t})(Hasbrouck and Seppi, 2001).Note that the input of this function consists of the matched trades andquotes, so this is where the time indication refers to (and thus not to theregistered quote timestamp).
- quotedSlope: quoted slope
\mbox{quoted slope}_t = \frac{\mbox{quoted spread}_t}{\mbox{log quoted size}_t}(Hasbrouck and Seppi, 2001).
- logQSlope: log quoted slope
\mbox{log quoted slope}_t = \frac{\mbox{log quoted spread}_t}{\mbox{log quoted size}_t}.- midQuoteSquaredReturn: midquote squared return
\mbox{midquote squared return}_t = (\log(\mbox{midquote}_{t})-\log(\mbox{midquote}_{t-1}))^2,where
\mbox{midquote}_{t} = \frac{\mbox{BID}_{t} + \mbox{OFR}_{t}}{2}.- midQuoteAbsReturn: midquote absolute return
\mbox{midquote absolute return}_t = |\log(\mbox{midquote}_{t})-\log(\mbox{midquote}_{t-1})|,where
\mbox{midquote}_{t} = \frac{\mbox{BID}_{t} + \mbox{OFR}_{t}}{2}.- signedTradeSize: signed trade size
\mbox{signed trade size}_t = D_t * \mbox{SIZE}_{t},where
D_tis 1 (-1) iftrade_twas buy (sell).
Value
A modified (enlarged)xts ordata.table with the new measures.
References
Bessembinder, H. (2003). Issues in assessing trade execution costs.Journal of Financial Markets, 223-257.
Boehmer, E. (2005). Dimensions of execution quality: Recent evidence for US equity markets.Journal of Financial Economics, 78, 553-582.
Hasbrouck, J. and Seppi, D. J. (2001). Common factors in prices, order flows and liquidity.Journal of Financial Economics, 59, 383-411.
Venkataraman, K. (2001). Automated versus floor trading: An analysis of execution costs on the Paris and New York exchanges.The Journal of Finance, 56, 1445-1485.
Examples
tqData <- matchTradesQuotes(sampleTData[as.Date(DT) == "2018-01-02"], sampleQData[as.Date(DT) == "2018-01-02"])res <- getLiquidityMeasures(tqData)resGet trade direction
Description
Function returns a vector with the inferred trade direction which is determined using the Lee and Ready algorithm (Lee and Ready, 1991).
Usage
getTradeDirection(tqData)Arguments
tqData |
|
Details
NOTE: By convention the first observation is always marked as a buy.
Value
A vector which has values 1 or (-1) if the inferred trade directionis buy or sell respectively.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup. Special thanks to Dirk Eddelbuettel.
References
Lee, C. M. C. and Ready, M. J. (1991). Inferring trade direction from intraday data.Journal of Finance, 46, 733-746.
Examples
# Generate matched trades and quote data settqData <- matchTradesQuotes(sampleTData[as.Date(DT) == "2018-01-02"], sampleQData[as.Date(DT) == "2018-01-02"])directions <- getTradeDirection(tqData)head(directions)Intraday jump tests
Description
This function can be used to test for jumps in intraday price paths.
The tests are of the formL(t) = (R(t) - mu(t))/sigma(t).
SeespotVol andspotDrift for Estimators for\sigma(t) and\mu(t), respectively.
Usage
intradayJumpTest( pData, volEstimator = "RM", driftEstimator = "none", alpha = 0.95, alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00", tz = NULL, n = NULL, ...)Arguments
pData |
|
volEstimator | character denoting which volatility estimator to use for the tests. See |
driftEstimator | character denoting which drift estimator to use for the tests. See |
alpha | numeric of length one determining what confidence level to use when constructing the critical values. |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5 minute frequency, set |
marketOpen | the market opening time. This should be in the time zonespecified by |
marketClose | the market closing time. This should be in the time zonespecified by |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
n | number of observation to use in the calculation of the critical values of the test statistic. If this is left as |
... | extra arguments passed on to The null hypothesis of the tests in this function is that there are no jumps in the price series |
Author(s)
Emil Sjoerup
References
Christensen, K., Oomen, R. C. A., Podolskij, M. (2014): Fact or Friction: Jumps at ultra high frequency.Journal of Financial Economics, 144, 576-599
Examples
## Not run: # We can easily make a Lee-Mykland jump test.LMtest <- intradayJumpTest(pData = sampleTData[, list(DT, PRICE)], volEstimator = "RM", driftEstimator = "none", RM = "rBPCov", lookBackPeriod = 20, alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00")plot(LMtest)# We can just as easily use the pre-averaged version from the "Fact or Friction" paperFoFtest <- intradayJumpTest(pData = sampleTData[, list(DT, PRICE)], volEstimator = "PARM", driftEstimator = "none", RM = "rBPCov", lookBackPeriod = 20, theta = 1.2, marketOpen = "09:30:00", marketClose = "16:00:00")plot(FoFtest)## End(Not run)ReMeDI tuning parameter
Description
Function to choose the tuning parameter, kn in ReMeDI estimation. The optimal parameterkn is the smallest value that where the criterion:
SqErr(k_{n})^{n}_{t} = \left(\hat{R}^{n,k_{n}}_{t,0} - \hat{R}^{n,k_{n}}_{t,1} - \hat{R}^{n,k_{n}}_{t,2} + \hat{R}^{n,k_{n}}_{t,3} - \hat{R}^{n, k_{n}}_{t,l}\right)^{2}
is perceived to be zero. The tuning parametertol can be set to choose the tolerance of the perception of 'close to zero', a higher tolerance will lead to a higher optimal value.
Usage
knChooseReMeDI( pData, knMax = 10, tol = 0.05, size = 3, lower = 2, upper = 5, plot = FALSE)Arguments
pData |
|
knMax | max value of |
tol | tolerance for the minimizing value. If |
size | size of the local window. |
lower | lower boundary for the method if it fails to find an optimal value. If this is the case, the best kn between lower and upper is returned |
upper | upper boundary for the method if it fails to find an optimal value. If this is the case, the best kn between lower and upper is returned |
plot | logical whether to plot the errors. |
Details
This is the algorithm B.2 in the appendix of the Li and Linton (2019) working paper.
Value
integer containing the optimal kn
Note
We Thank Merrick Li for contributing his Matlab code for this estimator.
Author(s)
Emil Sjoerup.
References
Li, M. and Linton, O. (2019). A ReMeDI for microstructure noise. Cambridge Working Papers in Economics 1908.
Examples
optimalKn <- knChooseReMeDI(sampleTData[as.Date(DT) == "2018-01-02",], knMax = 10, tol = 0.05, size = 3, lower = 2, upper = 5, plot = TRUE)optimalKn## Not run: # We can also have a much larger search-spaceoptimalKn <- knChooseReMeDI(sampleTDataEurope, knMax = 50, tol = 0.05, size = 3, lower = 2, upper = 5, plot = TRUE)optimalKn## End(Not run)Lead-Lag estimation
Description
Function that estimates whether one series leads (or lags) another.
LetX_{t} andY_{t} be two observed price over the time interval[0,1].
For every integerk \in \cal{Z}, we form the shifted time series
Y_{\left(k+i\right)/n}, \quad i = 1, 2, \dots
H=\left(\underline{H},\overline{H}\right] is an interval for\vartheta\in\Theta, define the shift intervalH_{\vartheta}=H+\vartheta=\left(\underline{H}+\vartheta,\overline{H}+\vartheta\right] then let
X\left(H\right)_{t}=\int_{0}^{t}1_{H}\left(s\right)\textrm{d}X_{s}
Which will be abbreviated:
X\left(H\right)=X\left(H\right)_{T+\delta}=\int_{0}^{T+\delta}1_{H}\left(s\right)\textrm{d}X_{s}
Then the shifted HY contrast function is:
\tilde{\vartheta}\rightarrow U^{n}\left(\tilde{\vartheta}\right)= \\ 1_{\tilde{\vartheta}\geq0}\sum_{I\in{\cal{I}},J\in{\cal{J}},\overline{I}\leq T}X\left(I\right)Y\left(J\right)1_{\left\{ I\cap J_{-\tilde{\vartheta}}\neq\emptyset\right\}} \\ +1_{\tilde{\vartheta}<0}\sum_{I\in{\cal{I}},J\in{\cal{J}},\overline{J}\leq T}X\left(I\right)Y\left(Y\right)1_{\left\{ J\cap I_{\tilde{\vartheta}}\neq\emptyset\right\} }
This contrast function is then calculated for all the lags passed in the argumentlags
Usage
leadLag( price1 = NULL, price2 = NULL, lags = NULL, resolution = "seconds", normalize = TRUE, parallelize = FALSE, nCores = NA)Arguments
price1 |
|
price2 |
|
lags | a numeric denoting which lags (in units of |
resolution | the resolution at which the lags is measured. The default is "seconds", use this argument to gain 1000 times resolution by setting it to either "ms", "milliseconds", or "milli-seconds". |
normalize | logical denoting whether the contrasts should be normalized by the product of the L2 norms of both the prices. Default = TRUE. This does not change the value of the lead-lag-ratio. |
parallelize | logical denoting whether to use a parallelized version of the C++ code (parallelized using OPENMP). Default = FALSE |
nCores | integer valued numeric denoting how many cores to use for the lead-lag estimation procedure in case parallelize is TRUE. Default is NA, which does not parallelize the code. |
Details
The lead-lag-ratio (LLR) can be used to see if one asset leads the other. If LLR < 1, then price1 MAY be leading price2 and vice versa if LLR > 1.
Value
A list with classleadLag which containscontrasts,lead-lag-ratio, andlags, denoting the estimated values for each lag calculated,the lead-lag-ratio, and the tested lags respectively.
References
Hoffmann, M., Rosenbaum, M., and Yoshida, N. (2013). Estimation of the lead-lag parameter from non-synchronous data.Bernoulli, 19, 1-37.
Examples
## Not run: # Toy example to show the usage# Spread pricesspread <- spreadPrices(sampleMultiTradeData[SYMBOL %in% c("ETF", "AAA")])# Use lead-lag estimatorllEmpirical <- leadLag(spread[!is.na(AAA), list(DT, PRICE = AAA)], spread[!is.na(ETF), list(DT, PRICE = ETF)], seq(-15,15))plot(llEmpirical)## End(Not run)Available kernels
Description
Returns a vector of the available kernels.
Usage
listAvailableKernels()Details
The available kernels are:
Rectangular:
K(x) = 1.Bartlett:
K(x) = 1 - x.Second-order:
K(x) = 1 - 2x - x^2.Epanechnikov:
K(x) = 1 - x^2.Cubic:
K(x) = 1 - 3 x^2 + 2 x^3.Fifth:
K(x) = 1 - 10 x^3 + 15 x^4 - 6 x^5.Sixth:
K(x) = 1 - 15 x^4 + 24 x^5 - 10 x^6Seventh:
K(x) = 1 - 21 x^5 + 35 x^6 - 15 x^7.Eighth:
K(x) = 1 - 28 x^6 + 48 x^7 - 21 x^8.Parzen:
K(x) = 1- 6 x^2 + 6 x^3ifk \leq 0.5andK(x) = 2 (1-x)^3ifk > 0.5.TukeyHanning:
K(x) = 1 + \sin(\pi/2 - \pi \cdot x))/2.ModifiedTukeyHanning:
K(x) = (1 - \sin(\pi/2 - \pi \ (1 - x)^2 ) / 2.
Value
a character vector.
Author(s)
Scott Payseur.
References
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise.Econometrica, 76, 1481-1536.
Examples
listAvailableKernelsUtility function listing the available estimators for the CholCov estimation
Description
Utility function listing the available estimators for the CholCov estimation
Usage
listCholCovEstimators()Value
This function returns a character vector containing the available estimators.
Make Open-High-Low-Close-Volume bars
Description
This function makes OHLC-V bars at arbitrary intervals. If the SIZE column is not present in the input, no volume column is created.
Usage
makeOHLCV(pData, alignBy = "minutes", alignPeriod = 5, tz = NULL)Arguments
pData |
|
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5 minute frequency, set |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Author(s)
Emil Sjoerup
Examples
## Not run: minuteBars <- makeOHLCV(sampleTDataEurope, alignBy = "minutes", alignPeriod = 1)# We can use the quantmod package's chartSeries function to plot the ohlcv dataquantmod::chartSeries(minuteBars)minuteBars <- makeOHLCV(sampleTDataEurope[,], alignBy = "minutes", alignPeriod = 1)# Again we plot the series with chartSeriesquantmod::chartSeries(minuteBars)# We can also handle data across multiple days.fiveMinuteBars <- makeOHLCV(sampleTData)# Again we plot the series with chartSeriesquantmod::chartSeries(fiveMinuteBars)# We can use arbitrary alignPeriod, here we choose pibars <- makeOHLCV(sampleTDataEurope, alignBy = "seconds", alignPeriod = pi)# Again we plot the series with chartSeriesquantmod::chartSeries(bars)## End(Not run)Returns the positive semidefinite projection of a symmetric matrix using the eigenvalue method
Description
Function returns the positive semidefinite projection of a symmetric matrix using the eigenvalue method.
Usage
makePsd(S, method = "covariance")Arguments
S | a non-PSD matrix. |
method | character, indicating whether the negative eigenvalues of the correlation or covariance should be replaced by zero. Possible values are "covariance" and "correlation". |
Details
We use the eigenvalue method to transformS into a positivesemidefinite covariance matrix (see, e.g., Barndorff-Nielsen and Shephard, 2004, and Rousseeuw and Molenberghs, 1993). Let\Gamma be theorthogonal matrix consisting of thep eigenvectors ofS. Denote\lambda_1^+,\ldots,\lambda_p^+ itsp eigenvalues, whereby the negative eigenvalues have been replaced by zeroes.Under this approach, the positive semi-definiteprojection ofS is S^+ = \Gamma' \mbox{diag}(\lambda_1^+,\ldots,\lambda_p^+) \Gamma.
If method = "correlation", the eigenvalues of the correlation matrix corresponding to the matrixS are transformed, see Fan et al. (2010).
Value
A matrix containing the positive semi definite matrix.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E. and Shephard, N. (2004). Measuring the impact of jumps in multivariate price processes using bipower covariation. Discussion paper, Nuffield College, Oxford University.
Fan, J., Li, Y., and Yu, K. (2012). Vast volatility matrix estimation using high frequency data for portfolio selection.Journal of the American Statistical Association, 107, 412-428
Rousseeuw, P. and Molenberghs, G. (1993). Transformation of non positive semidefinite correlation matrices.Communications in Statistics - Theory and Methods, 22, 965-984.
DEPRECATED usespreadPrices
Description
DEPRECATED usespreadPrices
Usage
makeRMFormat(data)Arguments
data | DEPRECATED |
Compute log returns
Description
Convenience function to calculate log-returns, also used extensively internally.Acceptsxts andmatrix-like objects. If you use this with adata.table object, remember to not pass theDT column.
\mbox{log return}_t = (\log(\mbox{PRICE}_{t})-\log(\mbox{PRICE}_{t-1})).
Usage
makeReturns(ts)Arguments
ts | a possibly multivariate matrix-like object containing prices in levels. If |
Details
Note: the first (row of) observation(s) is set to zero.
Value
Depending on input, either amatrix or anxts object containing the log returns.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup
Match trade and quote data
Description
Match the trades and quotes of the input data. All trades are retained and the latest bids and offers are retained,while 'old' quotes are discarded.
Usage
matchTradesQuotes( tData, qData, lagQuotes = 0, BFM = FALSE, backwardsWindow = 3600, forwardsWindow = 0.5, plot = FALSE, ...)Arguments
tData |
|
qData |
|
lagQuotes | numeric, number of seconds the quotes are registered faster thanthe trades (should be round and positive). Default is 0. For older datasets, i.e. before 2010, it may be a good idea to set this to e.g. 2. See Vergote (2005) |
BFM | a logical determining whether to conduct 'Backwards - Forwards matching' of trades and quotes.The algorithm tries to match trades that fall outside the bid - ask and first tries to match a small window forwards and if this fails, it tries to match backwards in a bigger window.The small window is a tolerance for inaccuracies in the timestamps of bids and asks. The backwards window allow for matching of late reported trades. I.e. block trades. |
backwardsWindow | a numeric denoting the length of the backwards window used when |
forwardsWindow | a numeric denoting the length of the forwards window used when |
plot | a logical denoting whether to visualize the forwards, backwards, and unmatched trades in a plot. |
... | used internally. Don't set this parameter |
Value
Depending on the input data type, we return either adata.table or anxts object containing the matched trade and quote data.When using the BFM algorithm, a report of the matched and unmatched trades are also returned (This is omitted when we call this function from thetradesCleanupUsingQuotes function).
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Vergote, O. (2005). How to match trades and quotes for NYSE stocks? K.U.Leuven working paper.
Christensen, K., Oomen, R. C. A., Podolskij, M. (2014): Fact or Friction: Jumps at ultra high frequency.Journal of Financial Economics, 144, 576-599
Examples
# Multi-day input allowedtqData <- matchTradesQuotes(sampleTData, sampleQData)# Show outputtqDataMerge multiple quote entries with the same time stamp
Description
Merge quote entries that have the same time stamp to a single one and returns anxts or adata.table objectwith unique time stamps only.
Usage
mergeQuotesSameTimestamp(qData, selection = "median")Arguments
qData | an |
selection | indicates how the bid and ask price for a certain time stampshould be calculated in case of multiple observation for a certain timestamp. By default,
|
Value
Depending on the input data type, we return either adata.table or anxts object containing the quote data which has been cleaned.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Merge multiple transactions with the same time stamp
Description
Merge trade entries that have the same time stamp to a single one and returns anxts or adata.table objectwith unique time stamps only.
Usage
mergeTradesSameTimestamp(tData, selection = "median")Arguments
tData | an |
selection | indicates how the price for a certain time stampshould be calculated in case of multiple observation for a certain timestamp. By default,
|
Value
data.table orxts object depending on input.
Note
previously this function returned the mean of the size of the merged trades (pre version 0.7 and when not using max.volume as the criterion), now it returns the sum.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
to use when p,k different from range [4,6]
Description
to use when p,k different from range [4,6]
Usage
mukp(p, k, t = 1e+06)Delete the observations where the price is zero
Description
Function deletes the observations where the price is zero.
Usage
noZeroPrices(tData)Arguments
tData | an |
Value
anxts ordata.table object depending on input.
Author(s)
Jonathan Cornelissen and Kris Boudt.
Delete the observations where the bid or ask is zero
Description
Function deletes the observations where the bid or ask is zero.
Usage
noZeroQuotes(qData)Arguments
qData | an |
Value
xts object ordata.table depending on type of input.
Author(s)
Jonathan Cornelissen and Kris Boudt.
Plotting method forDBH objects
Description
Plotting method forDBH objects
Usage
## S3 method for class 'DBH'plot(x, ...)Arguments
x | an object of class |
... | optional arguments, see details |
Details
The plotting method has the following optional parameters:
pDataA
data.tableor anxtsobject, containing the prices and timestamps of the data used to calculate the test statistic.If specified, andwhich = "tStat", the price will be shown on the right y-axis along with the test statisticwhichA string denoting which of four plots to make.
"tStat"denotes plotting the test statistic."sigma"denotes plotting theestimated volatility process."mu"denotes plotting the estimated drift process. Ifwhich = c("sigma", "mu")orwhich = c("mu", "sigma"),both the drift and volatility processes are plotted. CaPiTAlizAtIOn doesn't matter
Author(s)
Emil Sjoerup
Examples
# Testing every 60 seconds after 09:15:00DBH <- driftBursts(sampleTDataEurope, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)plot(DBH)plot(DBH, pData = sampleTDataEurope)plot(DBH, which = "sigma")plot(DBH, which = "mu")plot(DBH, which = c("sigma", "mu"))Plotting method for HARmodel objects
Description
Plotting method for HARmodel objects
Usage
## S3 method for class 'HARmodel'plot(x, ...)Arguments
x | an object of class |
... | extra arguments, see details |
Details
The plotting method has the following optional parameter:
legend.locA string denoting the location of the legend passed on to
addLegendof thexts package
Plotting method for HEAVYmodel objects
Description
Plotting method for HEAVYmodel objects
Usage
## S3 method for class 'HEAVYmodel'plot(x, ...)Arguments
x | an object of class |
... | extra arguments, see details. |
Details
The plotting method has the following optional parameter:
legend.locA string denoting the location of the legend passed on to
addLegendof thexts packagetypeA string denoting the type of lot to be made. If
typeis"condVar"the fitted values of the conditional variance of the returnsis shown. Iftypeis different from"condVar", the fitted values of the realized measure is shown. Default is"condVar"
Plot Trade and Quote data
Description
Plot trade and quote data, trades are marked by crosses, and quotes are plotted as boxes denoting the bid-offer spread for all the quotes.
Usage
plotTQData( tData, qData = NULL, xLim = NULL, tradeCol = "black", quoteCol = "darkgray", format = "%H:%M:%S", axisCol = "black", ...)Arguments
tData | cleaned trades data |
qData | cleaned quotes data |
xLim | timestamps for the start and the end of the plots. |
tradeCol | color in which to paint the trade crosses. |
quoteCol | color in which to fill out the bid-offer spread. |
format | format string to pass to |
axisCol | string to denote which color to use for the x axis |
... | passed to |
Examples
## Not run: cleanedQuotes = quotesCleanup(qDataRaw = sampleQDataRaw, report = FALSE, printExchange = FALSE)cleanedTrades <- tradesCleanupUsingQuotes( tData = tradesCleanup(tDataRaw = sampleTDataRaw, report = FALSE, printExchange = FALSE), qData = quotesCleanup(qDataRaw = sampleQDataRaw, report = FALSE, printExchange = FALSE) )[as.Date(DT) == "2018-01-03"]xLim <- range(as.POSIXct(c("2018-01-03 15:30:00", "2018-01-03 16:00:00"), tz = "EST"))plotTQData(cleanedTrades, cleanedQuotes, xLim = xLim, main = "Raw trade and quote data from NYSE TAQ")## End(Not run)Predict method for objects of typeHARmodel
Description
Predict method for objects of typeHARmodel
Usage
## S3 method for class 'HARmodel'predict(object, ...)Arguments
object | an object of class |
... | extra arguments. See details |
Details
The print method has the following optional parameters:
newdatanew data to use for forecasting
warningsA logical denoting whether to display warnings, detault is
TRUEbacktransformA string. If the model is estimated with transformation this parameter can be set to transform the prediction back into varianceThe possible values are
"simple"which means inverse of transformation, i.e.expwhen log-transformation is applied. If using log transformation,the option"parametric"can also be used to transform back. The parametric method adds a correction that stems from using the log-transformation
Iterative multi-step-ahead forecasting for HEAVY models
Description
Calculates forecasts forh_{T+k}, whereT denotes the end of the estimationperiod for fitting the HEAVYmodel andk = 1, \dots, \code{stepsAhead}.
Usage
## S3 method for class 'HEAVYmodel'predict(object, stepsAhead = 10, ...)Arguments
object | an object of class HEAVYmodel. |
stepsAhead | the number of days iterative forecasts are calculated for (default 10). |
... | further arguments passed to or from other methods. |
Printing method forDBH objects
Description
Printing method forDBH objects
Usage
## S3 method for class 'DBH'print(x, ...)Arguments
x | an object of class |
... | optional arguments, see details |
Details
The print method has the following optional parameters:
criticalValueA numeric denoting a custom critical value of the test.
alphaA numeric denoting the confidence level of the test. The alpha value is passed on to
getCriticalValues.The default value is 0.95
Author(s)
Emil Sjoerup
Examples
## Not run: DBH <- driftBursts(sampleTDataEurope, testTimes = seq(32400 + 900, 63000, 300), preAverage = 2, ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)print(DBH)print(DBH, criticalValue = 1) # This value doesn't make sense - don't actually use it!print(DBH, alpha = 0.95) # 5% confidence level - this is the standardprint(DBH, alpha = 0.99) # 1% confidence level## End(Not run)Printing method forHARmodel objects
Description
Printing method forHARmodel objects
Usage
## S3 method for class 'HARmodel'print(x, ...)Arguments
x | object of type |
... | extra options |
Details
The printing method has the extra optiondigits which can be used to set the number of digits for printing passlag to determine the maximum order of the Newey West estimator. Default is22
Cleans quote data
Description
This is a wrapper function for cleaning the quote data in the entire folderdataSource. The result is saved in the folderdataDestination.
In case you supply the argumentqDataRaw, the on-disk functionality is ignoredand the function returns the cleaned quotes asxts ordata.table object (see examples).
The following cleaning functions are performed sequentially:noZeroQuotes,exchangeHoursOnly,autoSelectExchangeQuotes orselectExchange,rmNegativeSpread,rmLargeSpreadmergeQuotesSameTimestamp,rmOutliersQuotes.
Usage
quotesCleanup( dataSource = NULL, dataDestination = NULL, exchanges = "auto", qDataRaw = NULL, report = TRUE, selection = "median", maxi = 50, window = 50, type = "standard", marketOpen = "09:30:00", marketClose = "16:00:00", rmoutliersmaxi = 10, printExchange = TRUE, saveAsXTS = FALSE, tz = NULL)Arguments
dataSource | character indicating the folder in which the original data is stored. |
dataDestination | character indicating the folder in which the cleaned data is stored. |
exchanges | vector of stock exchange symbols for all data in dataSource, e.g.
. The default value is |
qDataRaw |
|
report | boolean and |
selection | argument to be passed on to the cleaning routine |
maxi | spreads which are greater than median spreads of the day times |
window | argument to be passed on to the cleaning routine |
type | argument to be passed on to the cleaning routine |
marketOpen | passed to |
marketClose | passed to |
rmoutliersmaxi | argument to be passed on to the cleaning routine |
printExchange | Argument passed to |
saveAsXTS | indicates whether data should be saved in |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
Using the on-disk functionality with .csv.zip files which is the standard from the WRDS databasewill write temporary files on your machine - we try to clean up after it, but cannot guarantee that there won't be files that slip through the crack if the permission settings on your machine does not match ours.
If the inputdata.table does not contain aDT column but it does containDATE andTIME_M columns, we create theDT column by REFERENCE, altering thedata.table that may be in the user's environment!
Value
The function converts every (compressed) csv (or rds) file indataSource into multiplexts ordata.table files.
IndataDestination, there will be one folder for each symbol containing .rds files with cleaned data stored either indata.table orxts format.
In case you supply the argumentqDataRaw, the on-disk functionality is ignoredand the function returns a list with the cleaned quotes as anxts ordata.table object depending on input (see examples).
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes. Econometrics Journal 12, C1-C32.
Brownlees, C.T. and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, pages 2232-2245.
Falkenberry, T.N. (2002). High frequency data filtering. Unpublished technical report.
Examples
# Consider you have raw quote data for 1 stock for 2 dayshead(sampleQDataRaw)dim(sampleQDataRaw)qDataAfterCleaning <- quotesCleanup(qDataRaw = sampleQDataRaw, exchanges = "N")qDataAfterCleaning$reportdim(qDataAfterCleaning$qData)# In case you have more data it is advised to use the on-disk functionality# via "dataSource" and "dataDestination" argumentsRealized covariances via subsample averaging
Description
Calculates realized variances via averaging across partiallyoverlapping grids, first introduced by Zhang et al. (2005).This estimator is basically an average across differentrCov estimates that start atdifferent points in time, see details below.
Usage
rAVGCov( rData, cor = FALSE, alignBy = "minutes", alignPeriod = 5, k = 1, makeReturns = FALSE, ...)Arguments
rData | an |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
k | numeric denoting which horizon to use for the subsambles. This can be a fraction as long as |
makeReturns | boolean, should be |
... | used internally, do not change. |
Details
Suppose that in periodt, there areN equispaced returnsr_{i,t} and let\Delta be equal toalignPeriod. For\ i \geq \Delta, we define the subsampled\Delta-period return as
\tilde r_{t,i} = \sum_{k = 0}^{\Delta - 1} r_{t,i-k}, .
Now defineN^*(j) = N/\Delta ifj = 0 andN^*(j) = N/\Delta - 1 otherwise.Thej-th component of therAVGCov estimator is given by
RV_t^j = \sum_{i = 1}^{N^*(j)} \tilde r_{t, j + i \cdot \Delta}^2.
Now taking the average across the differentRV_t^j, \ j = 0, \dots, \Delta-1, defines therAVGCov estimator.The multivariate version follows analogously.
Note that Liu et al. (2015) show thatrAVGCov is not only theoretically but also empirically a more reliable estimator than rCov.
Value
in case the input is and contains data from one day, anN byN matrix is returned. If the data is a univariatexts object with multiple days, anxts is returned.If the data is multivariate and contains multiple days (xts ordata.table), the function returns a list containingN byN matrices. Each item in the list has a name which corresponds to the date for the matrix.
Author(s)
Scott Payseur, Onno Kleen, and Emil Sjoerup.
References
Liu, L. Y., Patton, A. J., Sheppard, K. (2015). Does anything beat 5-minute RV? A comparison of realized measures across multiple asset classes.Journal of Econometrics, 187, 293-311.
Zhang, L., Mykland, P. A. , and Ait-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data.Journal of the American Statistical Association, 100, 1394-1411.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Average subsampled realized variance/covariance aligned at one minute returns at# 5 sub-grids (5 minutes).# Univariate subsampled realized variancervAvgSub <- rAVGCov(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", makeReturns = TRUE)rvAvgSub# Multivariate subsampled realized variancervAvgCovSub <- rAVGCov(rData = sampleOneMinuteData[1:391], makeReturns = TRUE)rvAvgCovSubrBACov
Description
The Beta Adjusted Covariance (BAC) equals the pre-estimator plus a minimal adjustment matrix such that the covariance-implied stock-ETF beta equals a target beta.
The BAC estimator works by applying a minimum correction factor to a pre-estimated covariance matrix such that a target beta derived from the ETF is reached.
Let
\bar{\beta}
denote the implied beta derived from the pre-estimator, and
\beta_{\bullet}
denote the target beta, then the correction factor is calculated as:
L\left(\bar{\beta}-\beta_{\bullet}\right),
where
L=\left(I_{d^{2}}-\frac{1}{2}{\cal Q}\right)\bar{W}^{\prime}\left(I_{d^{2}}\left(\sum_{k=1}^{d}\frac{\sum_{k=1}^{n_{k}}\left(w_{t_{m-1}^{k}}^{k}\right)^{2}}{n_{k}}\right)-\frac{\bar{W}{\cal Q}\bar{W}^{\prime}}{2}\right)^{-1},
whered is the number of assets in the ETF, andn_{k} is the number of trades in thekth asset, and
\bar{W}^{k}=\left(0_{\left(k-1\right)d}^{\prime},\frac{1}{n_{1}}\sum_{m=1}^{n_{1}}w_{t_{m-1}^{1}}^{1},\dots,\frac{1}{n_{d}}\sum_{m=1}^{n_{d}}w_{t_{m-1}^{d}}^{d},0_{\left(d-k\right)d}^{\prime}\right),
wherew_{t_{m-1}^{k}}^{k} is the weight of thekth asset in the ETF.
and
{\cal Q}^{\left(i-1\right)d+j}
is defined by the following two cases:
\left(0_{\left(i-1\right)d+j-1}^{\prime},1,0_{\left(d-i+1\right)d-j}^{\prime}\right)+\left(0_{\left(j-1\right)d+i-1}^{\prime},-1,0_{\left(d-j+1\right)d-i}^{\prime}\right) \quad \textrm{if }i\neq j;
0_{d^{2}}^{\prime} \quad \textrm{otherwise}.
\bar{W}^k has dimensionsd \times d^2 and{\cal Q}^{\left(i-1\right)d+j} has dimensionsd^2 \times d^2.
The Beta-Adjusted Covariance is then
\Sigma^{\textrm{BAC}} = \Sigma - L\left(\bar{\beta}-\beta_{\bullet}\right),
where\Sigma is the pre-estimated covariance matrix.
Usage
rBACov( pData, shares, outstanding, nonEquity, ETFNAME = "ETF", unrestricted = TRUE, targetBeta = c("HY", "VAB", "expert"), expertBeta = NULL, preEstimator = "rCov", noiseRobustEstimator = rTSCov, noiseCorrection = FALSE, returnL = FALSE, ...)Arguments
pData | a named list. Each list-item contains an |
shares | a |
outstanding | number of shares outstanding of the ETF |
nonEquity | aggregated value of the additional components (like cash, money-market funds, bonds, etc.) of the ETF which are not included in the components in |
ETFNAME | a |
unrestricted | a |
targetBeta | a |
expertBeta | a |
preEstimator | a |
noiseRobustEstimator | a |
noiseCorrection | a |
returnL | a |
... | extra arguments passed to |
Author(s)
Emil Sjoerup, (Kris Boudt and Kirill Dragun for the Python version)
References
Boudt, K., Dragun, K., Omauri, S., and Vanduffel, S. (2021) Beta-Adjusted Covariance Estimation (working paper).
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
## Not run: # Since we don't have any data in this package that is of the required format we must simulate it.library(xts)library(highfrequency)# The mvtnorm package is needed for this example# Please install this package before running this examplelibrary("mvtnorm")# Set the seed for replicationset.seed(123)iT <- 23400 # Number of observations# Simulate returnsrets <- rmvnorm(iT * 3 + 1, mean = rep(0,4), sigma = matrix(c(0.1, -0.03 , 0.02, 0.04, -0.03, 0.05, -0.03, 0.02, 0.02, -0.03, 0.05, -0.03, 0.04, 0.02, -0.03, 0.08), ncol = 4))# We assume that the assets don't trade in a synchronous mannerw1 <- rets[sort(sample(1:nrow(rets), size = nrow(rets) * 0.5)), 1]w2 <- rets[sort(sample(1:nrow(rets), size = nrow(rets) * 0.75)), 2]w3 <- rets[sort(sample(1:nrow(rets), size = nrow(rets) * 0.65)), 3]w4 <- rets[sort(sample(1:nrow(rets), size = nrow(rets) * 0.8)), 4]w5 <- rnorm(nrow(rets) * 0.9, mean = 0, sd = 0.005)timestamps1 <- seq(34200, 57600, length.out = length(w1))timestamps2 <- seq(34200, 57600, length.out = length(w2))timestamps3 <- seq(34200, 57600, length.out = length(w3))timestamps4 <- seq(34200, 57600, length.out = length(w4))timestamps4 <- seq(34200, 57600, length.out = length(w4))timestamps5 <- seq(34200, 57600, length.out = length(w5))w1 <- xts(w1 * c(0,sqrt(diff(timestamps1) / (max(timestamps1) - min(timestamps1)))), as.POSIXct(timestamps1, origin = "1970-01-01"), tzone = "UTC")w2 <- xts(w2 * c(0,sqrt(diff(timestamps2) / (max(timestamps2) - min(timestamps2)))), as.POSIXct(timestamps2, origin = "1970-01-01"), tzone = "UTC")w3 <- xts(w3 * c(0,sqrt(diff(timestamps3) / (max(timestamps3) - min(timestamps3)))), as.POSIXct(timestamps3, origin = "1970-01-01"), tzone = "UTC")w4 <- xts(w4 * c(0,sqrt(diff(timestamps4) / (max(timestamps4) - min(timestamps4)))), as.POSIXct(timestamps4, origin = "1970-01-01"), tzone = "UTC")w5 <- xts(w5 * c(0,sqrt(diff(timestamps5) / (max(timestamps5) - min(timestamps5)))), as.POSIXct(timestamps5, origin = "1970-01-01"), tzone = "UTC")p1 <- exp(cumsum(w1))p2 <- exp(cumsum(w2))p3 <- exp(cumsum(w3))p4 <- exp(cumsum(w4))weights <- runif(4) * 1:4weights <- weights / sum(weights)p5 <- xts(rowSums(cbind(w1 * weights[1], w2 * weights[2], w3 * weights[3], w4 * weights[4]), na.rm = TRUE), index(cbind(p1, p2, p3, p4)))p5 <- xts(cumsum(rowSums(cbind(p5, w5), na.rm = TRUE)), index(cbind(p5, w5)))p5 <- exp(p5[sort(sample(1:length(p5), size = nrow(rets) * 0.9))])BAC <- rBACov(pData = list( "ETF" = p5, "STOCK 1" = p1, "STOCK 2" = p2, "STOCK 3" = p3, "STOCK 4" = p4 ), shares = 1:4, outstanding = 1, nonEquity = 0, ETFNAME = "ETF", unrestricted = FALSE, preEstimator = "rCov", noiseCorrection = FALSE, returnL = FALSE, K = 2, J = 1)# Noise robust version of the estimatornoiseRobustBAC <- rBACov(pData = list( "ETF" = p5, "STOCK 1" = p1, "STOCK 2" = p2, "STOCK 3" = p3, "STOCK 4" = p4 ), shares = 1:4, outstanding = 1, nonEquity = 0, ETFNAME = "ETF", unrestricted = FALSE, preEstimator = "rCov", noiseCorrection = TRUE, noiseRobustEstimator = rHYCov, returnL = FALSE, K = 2, J = 1)# Use the Variance Adjusted Beta method# Also use a different pre-estimator.VABBAC <- rBACov(pData = list( "ETF" = p5, "STOCK 1" = p1, "STOCK 2" = p2, "STOCK 3" = p3, "STOCK 4" = p4 ), shares = 1:4, outstanding = 1, nonEquity = 0, ETFNAME = "ETF", unrestricted = FALSE, targetBeta = "VAB", preEstimator = "rHYov", noiseCorrection = FALSE, returnL = FALSE, Lin = FALSE, L = 0, K = 2, J = 1) ## End(Not run)Realized bipower covariance
Description
Calculate the Realized BiPower Covariance (rBPCov),defined in Barndorff-Nielsen and Shephard (2004).
Letr_{t,i} be an intradayN x 1 return vector andi=1,...,Mthe number of intraday returns.
The rBPCov is defined as the process whose value at timetis theN-dimensional square matrix withk,q-th element equal to
\mbox{rBPCov}[k,q]_t = \frac{\pi}{8} \bigg( \sum_{i=2}^{M} \left| r_{(k)t,i} + r_{(q)t,i} \right| \ \left| r_{(k)t,i-1} + r_{(q)t,i-1} \right| \\ - \left| r_{(k)t,i} - r_{(q)t,i} \right| \ \left| r_{(k)t,i-1} - r_{(q)t,i-1} \right| \bigg),
wherer_{(k)t,i} is thek-th component of the return vectorr_{i,t}.
Usage
rBPCov( rData, cor = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, makePsd = FALSE, ...)Arguments
rData | an |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
makePsd | boolean, in case it is |
... | used internally, do not change. |
Value
in case the input is and contains data from one day, anN byN matrix is returned. If the data is a univariatexts object with multiple days, anxts is returned.If the data is multivariate and contains multiple days (xts ordata.table), the function returns a list containingN byN matrices. Each item in the list has a name which corresponds to the date for the matrix.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., and Shephard, N. (2004). Measuring the impact of jumps in multivariate price processes using bipower covariation. Discussion paper, Nuffield College, Oxford University.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Realized Bipower Variance/Covariance for a price series aligned# at 5 minutes.# Univariate:rbpv <- rBPCov(rData = sampleTData[, list(DT, PRICE)], alignBy ="minutes", alignPeriod = 5, makeReturns = TRUE)# Multivariate:rbpc <- rBPCov(rData = sampleOneMinuteData, makeReturns = TRUE, makePsd = TRUE)rbpcRealized beta
Description
Depending on users' choices of estimator (realized covariance (RCOVestimator) and realized variance (RVestimator)), the function returns the realized beta, defined as the ratio between both.
The realized beta is given by
\beta_{jm} = \frac {RCOVestimator_{jm}}{RVestimator_{m}}
in which
RCOVestimator: Realized covariance of asset j and market indexm.
RVestimator: Realized variance of market indexm.
Usage
rBeta( rData, rIndex, RCOVestimator = "rCov", RVestimator = "rRVar", makeReturns = FALSE, ...)Arguments
rData | a |
rIndex | a |
RCOVestimator | can be chosen among realized covariance estimators: |
RVestimator | can be chosen among realized variance estimators: |
makeReturns | boolean, should be |
... | arguments passed to |
Details
Suppose there areN equispaced returns on dayt for the assetj and the indexm. Denoter_{(j)i,t},r_{(m)i,t} as theith return on dayt for assetj and indexm (withi=1, \ldots,N).
By default, the RCov is used and the realized beta coefficient is computed as:
\hat{\beta}_{(jm)t}= \frac{\sum_{i=1}^{N} r_{(j)i,t} r_{(m)i,t}}{\sum_{i=1}^{N} r_{(m)i,t}^2}.
Note: The function does not support to calculate betas across multiple days.
Value
numeric
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E. and Shephard, N. (2004). Econometric analysis of realized covariation: high frequency based covariance, regression, and correlation infinancial economics.Econometrica, 72, 885-925.
Examples
## Not run: library("xts")a <- as.xts(sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, MARKET)])b <- as.xts(sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, STOCK)])rBeta(a, b, RCOVestimator = "rBPCov", RVestimator = "rMinRVar", makeReturns = TRUE)## End(Not run)CholCov estimator
Description
Positive semi-definite covariance estimation using the CholCov algorithm.The algorithm estimates the integrated covariance matrix by sequentially adding series and using 'refreshTime' to synchronize the observations.This is done in order of liquidity, which means that the algorithm uses more data points than most other estimation techniques.
Usage
rCholCov( pData, IVest = "rMRCov", COVest = "rMRCov", criterion = "squared duration", ...)Arguments
pData | a list. Each list-item i contains an |
IVest | integrated variance estimator, default is |
COVest | covariance estimator, default is |
criterion | criterion to use for sorting the data according to liquidity. Possible values are |
... | additional arguments to pass to |
Details
Additional arguments forIVest andCOVest should be passed in the ... argument.For therMRCov estimator, which is the default, thetheta anddelta parameters can be set. These default to 1 and 0.1 respectively.
The CholCov estimation algorithm is useful for estimating covariances ofd series that are sampled asynchronously and with different liquidities.The CholCov estimation algorithm is as follows:
First sort the series in terms of decreasing liquidity according to a liquidity criterion, such that series
1is the most liquid, and seriesdthe least.Step 1:
Apply refresh-time on
{a} = \{1\}to obtain the grid\tau^{a}.Estimate
\hat{g}_{11}using an IV estimator onf_{\tau^{a}_j}^{(1)}= \hat{u}_{\tau^{a}_j}^{(1)}.Step 2:
Apply refresh-time on
{b} = \{1,2\}to obtain the grid\tau^{b}.Estimate
\hat{h}^{b}_{21}as the realized beta betweenf_{\tau^{b}_j}^{(1)}and\hat{u}_{\tau^{b}_j}^{(2)}. Set\hat{h}_{21}=\hat{h}^{b}_{21}.Estimate
\hat{g}_{22}using an IV estimator onf_{\tau^{b}_j}^{(2)}= \hat{u}_{\tau^{b}_j}^{(2)}-\hat{h}_{21}f_{\tau^{b}_j}^{(1)}.Step 3:
Apply refresh-time on
{c} = \{1,3\}to obtain the grid\tau^{c}.Estimate
\hat{h}^{c}_{31}as the realized beta betweenf_{\tau^{c}_j}^{(1)}and\hat{u}_{\tau^{c}_j}^{(3)}. Set\hat{h}_{31}= \hat{h}^{c}_{31}.Apply refresh-time on
{d} = \{1,2,3\}to obtain the grid\tau^{d}.Re-estimate
\hat{h}_{21}^{d}at the new grid, such that the projectionsf_{\tau^{d}_j}^{(1)}andf_{\tau^{d}_j}^{(2)}are orthogonal.Estimate
\hat{h}^{d}_{32}as the realized beta betweenf_{\tau^{d}_j}^{(2)}and\hat{u}_{\tau^{d}_j}^{(3)}. Set\hat{h}_{32} = \hat{h}^{d}_{32}.Estimate
\hat{g}_{33}using an IV estimator onf_{\tau^{d}_j}^{(3)}= \hat{u}_{\tau^{d}_j}^{(3)}-\hat{h}_{32}f_{\tau^{d}_j}^{(2)} -\hat{h}_{31}f_{\tau^{d}_j}^{(1)}.Step 4 to d:
Continue in the same fashion by sampling over
{1,...,k,l}to estimateh_{lk}using the smallest possible set.Re-estimate the
h_{nm}withm<n\leq kat every new grid to obtain orthogonal projections.Estimate the
g_{kk}as the IV of projections based on the final estimates,\hat{h}.
Value
a list containing the covariance matrix"CholCov", and the Cholesky decomposition"L" and"G" such that\code{L} \times \code{G} \times \code{L}' = \code{CholCov}.
Author(s)
Emil Sjoerup
References
Boudt, K., Laurent, S., Lunde, A., Quaedvlieg, R., and Sauri, O. (2017). Positive semidefinite integrated covariance estimation, factorizations and asynchronicity.Journal of Econometrics, 196, 347-367.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Realized covariance
Description
Function returns the Realized Covariation (rCov).Letr_{t,i} be an intradayN \times M return vector andi=1,...,Mthe number of intraday returns.
Then, the rCov is given by
\mbox{rCov}_{t}=\sum_{i=1}^{M}r_{t,i}r'_{t,i}.
Usage
rCov( rData, cor = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Value
in case the input is and contains data from one day, anN \times N matrix is returned. If the data is a univariatexts object with multiple days, anxts is returned.If the data is multivariate and contains multiple days (xts ordata.table), the function returns a list containing N by N matrices. Each item in the list has a name which corresponds to the date for the matrix.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Realized Variance/Covariance for prices aligned at 5 minutes.# Univariate:rv = rCov(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rv# Multivariate:rc = rCov(rData = sampleOneMinuteData, makeReturns = TRUE)rcHayashi-Yoshida covariance
Description
Calculates the Hayashi-Yoshida Covariance estimator
Usage
rHYCov( rData, cor = FALSE, period = 1, alignBy = "seconds", alignPeriod = 1, makeReturns = FALSE, makePsd = TRUE, ...)Arguments
rData | an |
cor | boolean, in case it is |
period | Sampling period |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
makePsd | boolean, in case it is |
... | used internally. Do not set. |
Author(s)
Scott Payseur and Emil Sjoerup.
References
Hayashi, T. and Yoshida, N. (2005). On covariance estimation of non-synchronously observed diffusion processes.Bernoulli, 11, 359-379.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
library("xts")hy <- rHYCov(rData = as.xts(sampleOneMinuteData)["2001-08-05"], period = 5, alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)Realized kernel estimator
Description
Realized covariance calculation using a kernel estimator. The different types of kernels available can be found usinglistAvailableKernels.
Usage
rKernelCov( rData, cor = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, kernelType = "rectangular", kernelParam = 1, kernelDOFadj = TRUE, ...)Arguments
rData | an |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregate based on a 5-minute frequency, set |
makeReturns | boolean, should be |
kernelType | Kernel name. |
kernelParam | Kernel parameter. |
kernelDOFadj | Kernel degree of freedom adjustment. |
... | used internally, do not change. |
Details
Letr_{t,i} beN returns in periodt,i = 1, \ldots, N. The returns or prices do not have to be equidistant. The kernel estimator forH = \code{kernelParam} is given by
\gamma_0 + 2 \sum_{h = 1}^H k \left(\frac{h-1}{H}\right) \gamma_h,
wherek(x) is the chosen kernel function and
\gamma_h = \sum_{i = h}^N r_{t,i} \times r_{t,i-h}
is the empirical autocovariance function. The multivariate version employs the cross-covariances instead.
Value
in case the input is and contains data from one day, anN byN matrix is returned. If the data is a univariatexts object with multiple days, anxts is returned.If the data is multivariate and contains multiple days (xts ordata.table), the function returns a list containingN byN matrices. Each item in the list has a name which corresponds to the date for the matrix.
Author(s)
Scott Payseur, Onno Kleen, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise.Econometrica, 76, 1481-1536.
Hansen, P. and Lunde, A. (2006). Realized variance and market microstructure noise.Journal of Business and Economic Statistics, 24, 127-218.
Zhou., B. (1996). High-frequency data and volatility in foreign-exchange rates.Journal of Business & Economic Statistics, 14, 45-52.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Univariate:rvKernel <- rKernelCov(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rvKernel# Multivariate:rcKernel <- rKernelCov(rData = sampleOneMinuteData, makeReturns = TRUE)rcKernelRealized kurtosis of highfrequency return series.
Description
Calculate the realized kurtosis as defined in Amaya et al. (2015).
Assume there areN equispaced returns in periodt. Letr_{t,i} be a return (withi=1, \ldots,N) in periodt.Then,rKurt is given by
\mbox{rKurt}_{t} = \frac{N \sum_{i=1}^{N}(r_{t,i})^4}{\left( \sum_{i=1}^N r_{t,i}^2 \right)^2}.
Usage
rKurt(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Amaya, D., Christoffersen, P., Jacobs, K., and Vasquez, A. (2015). Does realized skewness and kurtosis predict the cross-section of equity returns?Journal of Financial Economics, 118, 135-167.
Examples
rk <- rKurt(sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rkDEPRECATED
Description
DEPRECATED USErMPVar
Usage
rMPV(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | DEPRECATED |
alignBy | DEPRECATED |
alignPeriod | DEPRECATED |
makeReturns | DEPRECATED |
Realized multipower variation
Description
Calculate the Realized Multipower Variation rMPVar, defined in Andersen et al. (2012).
Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N. Then, the rMPVar is given by
\mbox{rMPVar}_{N}(m,p)= d_{m,p} \frac{N^{p/2}}{N-m+1} \sum_{i=1}^{N-m+1}|r_{t,i}|^{p/m} \ldots |r_{t,i+m-1}|^{p/m}
in which
d_{m,p} = \mu_{p/m}^{-m}:
m: the window size of return blocks;
p: the power of the variation;
andm >p/2.
Usage
rMPVar( rData, m = 2, p = 2, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
m | the window size of return blocks. 2 by default. |
p | the power of the variation. 2 by default. |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Value
numeric
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
See Also
IVar for a list of implemented estimators of the integrated variance.
Examples
mpv <- rMPVar(sampleTData[, list(DT, PRICE)], m = 2, p = 3, alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)mpvDEPRECATED rMRC
Description
DEPRECATED USErMRCov
Usage
rMRC(pData, pairwise = FALSE, makePsd = FALSE, theta = 0.8, ...)Arguments
pData | DEPRECATED USE |
pairwise | DEPRECATED USE |
makePsd | DEPRECATED USE |
theta | DEPRECATED USE |
... | DEPRECATED USE |
Modulated realized covariance
Description
Calculate univariate or multivariate pre-averaged estimator, as defined in Hautsch and Podolskij (2013).
Usage
rMRCov( pData, pairwise = FALSE, makePsd = FALSE, theta = 0.8, crossAssetNoiseCorrection = FALSE, ...)Arguments
pData | a list. Each list-item contains an |
pairwise | boolean, should be |
makePsd | boolean, in case it is |
theta | a |
crossAssetNoiseCorrection | a |
... | used internally, do not change. |
Details
In practice, market microstructure noise leads to a departure from the pure semimartingale model. We consider the processY in period\tau:
\mbox{Y}_{\tau} = X_{\tau} + \epsilon_{\tau},
where the observedd dimensional log-prices are the sum of underlying Brownian semimartingale processX and a noise term\epsilon_{\tau}.
\epsilon_{\tau} is ani.i.d. process withX.
It is intuitive that under mean zeroi.i.d. microstructure noise some form of smoothing of the observed log-price should tend to diminish the impact of the noise.Effectively, we are going to approximate a continuous function by an average of observations ofY in a neighborhood, the noise being averaged away.
Assume there isN equispaced returns in period\tau of a list (after refreshing data). Letr_{\tau_i} be a return (withi=1, \ldots,N) of an asset in period\tau. Assume there isd assets.
In order to define the univariate pre-averaging estimator, we first define the pre-averaged returns as
\bar{r}_{\tau_j}^{(k)}= \sum_{h=1}^{k_N-1}g\left(\frac{h}{k_N}\right)r_{\tau_{j+h}}^{(k)}
where g is a non-zero real-valued functiong:[0,1]\rightarrowR given byg(x) =\min(x,1-x).k_N is a sequence of integers satisfying\mbox{k}_{N} = \lfloor\theta N^{1/2}\rfloor.We use\theta = 0.8 as recommended in Hautsch and Podolskij (2013). The pre-averaged returns are simply a weighted average over the returns in a local window.This averaging diminishes the influence of the noise. The order of the window sizek_n is chosen to lead to optimal convergence rates.The pre-averaging estimator is then simply the analogue of the realized variance but based on pre-averaged returns and an additional term to remove bias due to noise
\hat{C}= \frac{N^{-1/2}}{\theta \psi_2}\sum_{i=0}^{N-k_N+1} (\bar{r}_{\tau_i})^2-\frac{\psi_1^{k_N}N^{-1}}{2\theta^2\psi_2^{k_N}}\sum_{i=0}^{N}r_{\tau_i}^2
with
\psi_1^{k_N}= k_N \sum_{j=1}^{k_N}\left(g\left(\frac{j+1}{k_N}\right)-g\left(\frac{j}{k_N}\right)\right)^2,\quad
\psi_2^{k_N}= \frac{1}{k_N}\sum_{j=1}^{k_N-1}g^2\left(\frac{j}{k_N}\right).
\psi_2= \frac{1}{12}
The multivariate counterpart is very similar. The estimator is called the Modulated Realized Covariance (rMRCov) and is defined as
\mbox{MRC}= \frac{N}{N-k_N+2}\frac{1}{\psi_2k_N}\sum_{i=0}^{N-k_N+1}\bar{\boldsymbol{r}}_{\tau_i}\cdot \bar{\boldsymbol{r}}'_{\tau_i} -\frac{\psi_1^{k_N}}{\theta^2\psi_2^{k_N}}\hat{\Psi}
where\hat{\Psi}_N = \frac{1}{2N}\sum_{i=1}^N \boldsymbol{r}_{\tau_i}(\boldsymbol{r}_{\tau_i})'. It is a bias correction to make it consistent.However, due to this correction, the estimator is not ensured PSD.An alternative is to slightly enlarge the bandwidth such that\mbox{k}_{N} = \lfloor\theta N^{1/2+\delta}\rfloor.\delta = 0.1 results in a consistent estimate without the bias correction and a PSD estimate, in which case:
\mbox{MRC}^{\delta}= \frac{N}{N-k_N+2}\frac{1}{\psi_2k_N}\sum_{i=0}^{N-k_N+1}\bar{\boldsymbol{r}}_i\cdot \bar{\boldsymbol{r}}'_i
Value
Ad \times d covariance matrix.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Hautsch, N., and Podolskij, M. (2013). Preaveraging-based estimation of quadratic variation in the presence of noise and jumps: theory, implementation, and empirical Evidence.Journal of Business & Economic Statistics, 31, 165-183.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
## Not run: library("xts")# Note that this ought to be tick-by-tick data and this example is only to show the usage.a <- list(as.xts(sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, MARKET)]), as.xts(sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, STOCK)]))rMRCov(a, pairwise = TRUE, makePsd = TRUE)# We can also add use data.tables and use a named list to convey asset namesa <- list(foo = sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, MARKET)], bar = sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, STOCK)])rMRCov(a, pairwise = TRUE, makePsd = TRUE)## End(Not run)DEPRECATED
Description
DEPRECATED USErMedRQuar
Usage
rMedRQ(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | DEPRECATED USE |
alignBy | DEPRECATED USE |
alignPeriod | DEPRECATED USE |
makeReturns | DEPRECATED USE |
An estimator of integrated quarticity from applying the median operator on blocks of three returns
Description
Calculate the rMedRQ, defined in Andersen et al. (2012). Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N. Then, the rMedRQ is given by
\mbox{rMedRQ}_{t}=\frac{3\pi N}{9\pi +72 - 52\sqrt{3}} \left(\frac{N}{N-2}\right) \sum_{i=2}^{N-1} \mbox{med}(|r_{t,i-1}|, |r_{t,i}|, |r_{t,i+1}|)^4.
Usage
rMedRQuar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
Examples
rq <- rMedRQuar(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rqDEPRECATED
Description
DEPRECATED USErMedRVar
Usage
rMedRV(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | DEPRECATED USE |
alignBy | DEPRECATED USE |
alignPeriod | DEPRECATED USE |
makeReturns | DEPRECATED USE |
rMedRVar
Description
Calculate the rMedRVar, defined in Andersen et al. (2012). Letr_{t,i} be a return (withi=1,\ldots,M) in periodt.Then, the rMedRVar is given by
\mbox{rMedRVar}_{t}=\frac{\pi}{6-4\sqrt{3}+\pi}\left(\frac{M}{M-2}\right) \sum_{i=2}^{M-1} \mbox{med}(|r_{t,i-1}|,|r_{t,i}|, |r_{t,i+1}|)^2
Usage
rMedRVar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Details
The rMedRVar belongs to the class of realized volatility measures in this packagethat use the series of high-frequency returnsr_{t,i} of a daytto produce an ex post estimate of the realized volatility of that dayt.rMedRVar is designed to be robust to price jumps.The difference between RV and rMedRVar is an estimate of the realized jumpvariability. Disentangling the continuous and jump components in RVcan lead to more precise volatility forecasts,as shown in Andersen et al. (2012)
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
See Also
IVar for a list of implemented estimators of the integrated variance.
Examples
medrv <- rMedRVar(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)medrvDEPRECATED
Description
DEPRECATED USErMinRQuar
Usage
rMinRQ(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | DEPRECATED USE |
alignBy | DEPRECATED USE |
alignPeriod | DEPRECATED USE |
makeReturns | DEPRECATED USE |
An estimator of integrated quarticity from applying the minimum operator on blocks of two returns
Description
Calculate the rMinRQuar, defined in Andersen et al. (2012).Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N.Then, the rMinRQuar is given by
\mbox{rMinRQuar}_{t}=\frac{\pi N}{3 \pi - 8} \left(\frac{N}{N-1}\right) \sum_{i=1}^{N-1} \mbox{min}(|r_{t,i}| ,|r_{t,i+1}|)^4
Usage
rMinRQuar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
Examples
rq <- rMinRQuar(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rqDEPRECATED
Description
DEPRECATED USErMinRVar
Usage
rMinRV(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | DEPRECATED USE |
alignBy | DEPRECATED USE |
alignPeriod | DEPRECATED USE |
makeReturns | DEPRECATED USE |
rMinRVar
Description
Calculate the rMinRVar, defined in Andersen et al. (2009).Letr_{t,i} be a return (withi=1,\ldots,M) in periodt.Then, the rMinRVar is given by
\mbox{rMinRVar}_{t}=\frac{\pi}{\pi - 2}\left(\frac{M}{M-1}\right) \sum_{i=1}^{M-1} \mbox{min}(|r_{t,i}| ,|r_{t,i+1}|)^2
Usage
rMinRVar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Jonathan Cornelissen, Kris Boudt, Emil Sjoerup.
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
See Also
IVar for a list of implemented estimators of the integrated variance.
Examples
minrv <- rMinRVar(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)minrvRealized outlyingness weighted covariance
Description
Calculate the Realized Outlyingness Weighted Covariance (rOWCov), defined in Boudt et al. (2008).
Letr_{t,i}, fori = 1,..., M be a sampleofM high-frequency(N \times 1) return vectors andd_{t,i}their outlyingness given by the squared Mahalanobis distance betweenthe return vector and zero in terms of the reweighted MCD covarianceestimate based on these returns.
Then, the rOWCov is given by
\mbox{rOWCov}_{t}=c_{w}\frac{\sum_{i=1}^{M}w(d_{t,i})r_{t,i}r'_{t,i}}{\frac{1}{M}\sum_{i=1}^{M}w(d_{t,i})},
The weightw_{i,\Delta} is one if the multivariate jump test statistic forr_{i,\Delta} in Boudt et al. (2008) is lessthan the 99.9% percentile of the chi-square distribution withN degrees of freedom and zero otherwise.The scalarc_{w} is a correction factor ensuring consistency of the rOWCov for the Integrated Covariance,under the Brownian Semimartingale with Finite Activity Jumps model.
Usage
rOWCov( rData, cor = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, seasadjR = NULL, wFunction = "HR", alphaMCD = 0.75, alpha = 0.001, ...)Arguments
rData | a |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
seasadjR | a |
wFunction | determines whethera zero-one weight function (one if no jump is detected based on |
alphaMCD | a numeric parameter, controlling the size ofthe subsets over which the determinant is minimized.Allowed values are between 0.5 and 1 andthe default is 0.75. See Boudt et al. (2008) or the |
alpha | is a parameter between 0 and 0.5,that determines the rejection threshold value(see Boudt et al. (2008) for details). |
... | used internally, do not change. |
Details
Advantages of the rOWCov compared to therBPCov include a higher statistical efficiency, positive semi-definiteness and affine equi-variance.However, the rOWCov suffers from a curse of dimensionality.The rOWCov gives a zero weight to a return vectorif at least one of the components is affected by a jump.In the case of independent jump occurrences, the average proportion of observationswith at least one component being affected by jumps increases fast with the dimensionof the series. This means that a potentially large proportion of the returns receivesa zero weight, due to which the rOWCov can have a low finite sample efficiency in higher dimensions.
Value
anN \times N matrix
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Boudt, K., Croux, C., and Laurent, S. (2008). Outlyingness weighted covariation.Journal of Financial Econometrics, 9, 657–684.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
## Not run: library("xts")# Realized Outlyingness Weighted Variance/Covariance for prices aligned# at 1 minutes.# Univariate:row <- rOWCov(rData = as.xts(sampleOneMinuteData[as.Date(DT) == "2001-08-04", list(DT, MARKET)]), makeReturns = TRUE)row# Multivariate:rowc <- rOWCov(rData = as.xts(sampleOneMinuteData[as.Date(DT) == "2001-08-04",]), makeReturns = TRUE)rowc## End(Not run)Realized quad-power variation of intraday returns
Description
Calculate the realized quad-power variation, defined in Andersen et al. (2012).
Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N. Then, the rQPVar is given by
\mbox{rQPVar}_{t}=N*\frac{N}{N-3} \left(\frac{\pi^2}{4} \right)^{-4} \mbox({|r_{t,i}|} {|r_{t,i-1}|} {|r_{t,i-2}|} {|r_{t,i-3}|})
Usage
rQPVar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
See Also
IVar for a list of implemented estimators of the integrated variance.
Examples
qpv <- rQPVar(rData= sampleTData[, list(DT, PRICE)], alignBy= "minutes", alignPeriod =5, makeReturns= TRUE)qpvRealized quarticity
Description
Calculate the realized quarticity (rQuar), defined in Andersen et al. (2012).
Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N.
Then, the rQuar is given by
\mbox{rQuar}_{t}=\frac{N}{3} \sum_{i=1}^{N} \mbox(r_{t,i}^4)
Usage
rQuar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
Examples
rq <- rQuar(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rqRobust two time scale covariance estimation
Description
Calculate the robust two time scale covariance matrix proposed in Boudt and Zhang (2010).Unlike therOWCov, but similarly to therThresholdCov, therRTSCov uses univariate jump detection rulesto truncate the effect of jumps on the covarianceestimate. By the use of two time scales, this covariance estimateis not only robust to price jumps, but also to microstructure noise and non-synchronic trading.
Usage
rRTSCov( pData, cor = FALSE, startIV = NULL, noisevar = NULL, K = 300, J = 1, KCov = NULL, JCov = NULL, KVar = NULL, JVar = NULL, eta = 9, makePsd = FALSE, ...)Arguments
pData | a list. Each list-item i contains an |
cor | boolean, in case it is |
startIV | vector containing the first step estimates of the integrated variance of the assets, needed in the truncation. Is |
noisevar | vector containing the estimates of the noise variance of the assets, needed in the truncation. Is |
K | positive integer, slow time scale returns are computed on prices that are |
J | positive integer, fast time scale returns are computed on prices that are |
KCov | positive integer, for the extradiagonal covariance elements the slow time scale returns are computed on prices that are |
JCov | positive integer, for the extradiagonal covariance elements the fast time scale returns are computed on prices that are |
KVar | vector of positive integers, for the diagonal variance elements the slow time scale returns are computed on prices that are |
JVar | vector of positive integers, for the diagonal variance elements the fast time scale returns are computed on prices that are |
eta | positive real number, squared standardized high-frequency returns that exceed eta are detected as jumps. |
makePsd | boolean, in case it is |
... | used internally, do not change. |
Details
The rRTSCov requires the tick-by-tick transaction prices. (Co)variances are then computed using log-returns calculated on a rolling basison stock prices that areK (slow time scale) andJ (fast time scale) steps apart.
The diagonal elements of the rRTSCov matrix are the variances, computed for log-price seriesX withn price observationsat times \tau_1,\tau_2,\ldots,\tau_n as follows:
(1-\frac{\overline{n}_K}{\overline{n}_J})^{-1}(\{X,X\}_T^{(K)^{*}}-\frac{\overline{n}_K}{\overline{n}_J}\{X,X\}_T^{(J)^{*}}),
where\overline{n}_K=(n-K+1)/K,\overline{n}_J=(n-J+1)/J and
\{X,X\}_T^{(K)^{*}} =\frac{c_\eta^{*}}{K}\frac{\sum_{i=1}^{n-K+1}(X_{t_{i+K}}-X_{t_i})^2I_X^K(i;\eta)}{\frac{1}{n-K+1}\sum_{i=1}^{n-K+1}I_X^K(i;\eta)}.
The constantc_\eta adjusts for the bias due to the thresholding andI_{X}^K(i;\eta) is a jump indicator functionthat is one if
\frac{(X_{t_{i+K}}-X_{t_{i}})^2}{(\int_{t_{i}}^{t_{i+K}} \sigma^2_sds +2\sigma_{\varepsilon_{\mbox{\tiny X}}}^2)} \ \ \leq \ \ \eta
and zero otherwise. The elements in the denominator are the integrated variance (estimated recursively) and noise variance (estimated by the method in Zhang et al, 2005).
The extradiagonal elements of the rRTSCov are the covariances.For their calculation, the data is first synchronized by the refresh time method proposed by Harris et al (1995).It uses the functionrefreshTime to collect first the so-called refresh times at which all assets have traded at least oncesince the last refresh time point. Suppose we have two log-price series:X andY. Let \Gamma =\{ \tau_1,\tau_2,\ldots,\tau_{N^{\mbox{\tiny X}}_{\mbox{\tiny T}}}\} and\Theta=\{\theta_1,\theta_2,\ldots,\theta_{N^{\mbox{\tiny Y}}_{\mbox{\tiny T}}}\}be the set of transaction times of these assets.The first refresh time corresponds to the first time at which both stocks have traded, i.e.\phi_1=\max(\tau_1,\theta_1). The subsequent refresh time is defined as the first time when both stocks have again traded, i.e.\phi_{j+1}=\max(\tau_{N^{\mbox{\tiny{X}}}_{\phi_j}+1},\theta_{N^{\mbox{\tiny{Y}}}_{\phi_j}+1}). Thecomplete refresh time sample grid is\Phi=\{\phi_1,\phi_2,...,\phi_{M_N+1}\}, whereM_N is the total number of paired returns. Thesampling points of assetX andY are defined to bet_i=\max\{\tau\in\Gamma:\tau\leq \phi_i\} ands_i=\max\{\theta\in\Theta:\theta\leq \phi_i\}.
Given these refresh times, the covariance is computed as follows:
c_{N}( \{X,Y\}^{(K)}_T-\frac{\overline{n}_K}{\overline{n}_J}\{X,Y\}^{(J)}_T ),
where
\{X,Y\}^{(K)}_T =\frac{1}{K} \frac{\sum_{i=1}^{M_N-K+1}c_i (X_{t_{i+K}}-X_{t_{i}})(Y_{s_{i+K}}-Y_{s_{i}})I_{X}^K(i;\eta)I_{Y}^K(i;\eta)}{\frac{1}{M_N-K+1}\sum_{i=1}^{M_N-K+1}{I_X^K(i;\eta)I_Y^K(i;\eta)}},
withI_{X}^K(i;\eta) the same jump indicator function as for the variance andc_N a constant to adjust for the bias due to the thresholding.
Unfortunately, the rRTSCov is not always positive semidefinite.By setting the argumentmakePsd = TRUE, the functionmakePsd is used to return a positive semidefinitematrix. This function replaces the negative eigenvalues with zeroes.
Value
anN \times N matrix
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Boudt K. and Zhang, J. 2010. Jump robust two time scale covariance estimation and realized volatility budgets. Mimeo.
Harris, F., McInish, T., Shoesmith, G., and Wood, R. (1995). Cointegration, error correction, and price discovery on informationally linked security markets.Journal of Financial and Quantitative Analysis, 30, 563-581.
Zhang, L., Mykland, P. A., and Ait-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data.Journal of the American Statistical Association, 100, 1394-1411.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
## Not run: library(xts)set.seed(123)start <- strptime("1970-01-01", format = "%Y-%m-%d", tz = "UTC")timestamps <- start + seq(34200, 57600, length.out = 23401)dat <- cbind(rnorm(23401) * sqrt(1/23401), rnorm(23401) * sqrt(1/23401))dat <- exp(cumsum(xts(dat, timestamps)))price1 <- dat[,1]price2 <- dat[,2]rcRTS <- rRTSCov(pData = list(price1, price2))# Note: List of prices as inputrcRTS## End(Not run)An estimator of realized variance.
Description
Calculates the daily Realized Variance.Letr_{t,i} be an intraday return vector withi=1,...,M number of intraday returns.
Then, the realized variance is given by
\mbox{RVar}_{t}=\sum_{i=1}^{M}r_{t,i}^{2}
Usage
rRVar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
See Also
IVar for a list of implemented estimators of the integrated variance.
Examples
rv <- rRVar(sampleOneMinuteData, makeReturns = TRUE)plot(rv[, DT], rv[, MARKET], xlab = "Date", ylab = "Realized Variance", type = "l")DEPRECATED
Description
DEPRECATED USErSVar
Usage
rSV(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | DEPRECATED USE |
alignBy | DEPRECATED USE |
alignPeriod | DEPRECATED USE |
makeReturns | DEPRECATED USE |
Realized semivariance of highfrequency return series
Description
Calculate the realized semivariances, defined in Barndorff-Nielsen et al. (2008).
Function returns two outcomes:
Downside realized semivariance
Upside realized semivariance.
Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N.
Then, therSVar is given by
\mbox{rSVardownside}_{t}= \sum_{i=1}^{N} (r_{t,i})^2 \ \times \ I [ r_{t,i} < 0]
\mbox{rSVarupside}_{t}= \sum_{i=1}^{N} (r_{t,i})^2 \ \times \ I [ r_{t,i} > 0]
Usage
rSVar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example to aggregate.based on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally |
Value
list with two entries, the realized positive and negative semivariances
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., Kinnebrock, S., and Shephard N. (2010).Measuring downside risk: realised semivariance. In: Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle,(Edited by Bollerslev, T., Russell, J., and Watson, M.), 117-136. Oxford University Press.
See Also
IVar for a list of implemented estimators of the integrated variance.
Examples
sv <- rSVar(sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)svRealized semicovariance
Description
Calculate the Realized Semicovariances (rSemiCov).Let r_{t,i} be an intradayM x N return matrix andi=1,...,Mthe number of intraday returns. Then, letr_{t,i}^{+} = max(r_{t,i},0) andr_{t,i}^{-} = min(r_{t,i},0).
Then, the realized semicovariance is given by the following three matrices:
\mbox{pos}_t =\sum_{i=1}^{M} r^{+}_{t,i} r^{+'}_{t,i}
\mbox{neg}_t =\sum_{i=1}^{M} r^{-}_{t,i} r^{-'}_{t,i}
\mbox{mixed}_t =\sum_{i=1}^{M} (r^{+}_{t,i} r^{-'}_{t,i} + r^{-}_{t,i} r^{+'}_{t,i})
The mixed covariance matrix will have 0 on the diagonal.From these three matrices, the realized covariance can be constructed aspos + neg + mixed.The concordant semicovariance matrix ispos + neg.The off-diagonals of the concordant matrix is always positive, while for the mixed matrix, it is always negative.
Usage
rSemiCov( rData, cor = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Details
In the case that cor isTRUE, the mixed matrix will be anN \times N matrix filled with NA as mapping the mixed covariance matrix into correlation space is impossible due to the 0-diagonal.
Value
In case the data consists of one day a list of fiveN \times N matrices are returned. These matrices are namedmixed,positive,negative,concordant, andrCov.The latter matrix corresponds to the realized covariance estimator and is thus named like the functionrCov.In case the data spans more than one day, the list for each day will be put into another list named according to the date of the estimates.
Author(s)
Emil Sjoerup.
References
Bollerslev, T., Li, J., Patton, A. J., and Quaedvlieg, R. (2020). Realized semicovariances.Econometrica, 88, 1515-1551.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Realized semi-variance/semi-covariance for prices aligned# at 5 minutes.# Univariate:rSVar = rSemiCov(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)rSVar## Not run: library("xts")# Multivariate multi day:rSC <- rSemiCov(sampleOneMinuteData, makeReturns = TRUE) # rSC is a list of lists# We extract the covariance between stock 1 and stock 2 for all three covariances.mixed <- sapply(rSC, function(x) x[["mixed"]][1,2])neg <- sapply(rSC, function(x) x[["negative"]][1,2])pos <- sapply(rSC, function(x) x[["positive"]][1,2])covariances <- xts(cbind(mixed, neg, pos), as.Date(names(rSC)))colnames(covariances) <- c("mixed", "neg", "pos")# We make a quick plot of the different covariancesplot(covariances)addLegend(lty = 1) # Add legend so we can distinguish the series.## End(Not run)Realized skewness
Description
Calculate the realized skewness, defined in Amaya et al. (2015).
Assume there areN equispaced returns in periodt. Letr_{t,i} be a return (withi=1, \ldots,N) in periodt. Then,rSkew is given by
\mbox{rSkew}_{t}= \frac{\sqrt{N} \sum_{i=1}^{N}(r_{t,i})^3}{\left(\sum r_{i,t}^2\right)^{3/2}}.
Usage
rSkew(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Amaya, D., Christoffersen, P., Jacobs, K., and Vasquez, A. (2015). Does realized skewness and kurtosis predict the cross-section of equity returns?Journal of Financial Economics, 118, 135-167.
Examples
rs <- rSkew(sampleTData[, list(DT, PRICE)],alignBy ="minutes", alignPeriod =5, makeReturns = TRUE)rsRealized tri-power quarticity
Description
Calculate the rTPQuar, defined in Andersen et al. (2012).
Assume there areN equispaced returnsr_{t,i} in periodt,i=1, \ldots,N. Then, the rTPQuar is given by
\mbox{rTPQuar}_{t}=N\frac{N}{N-2} \left(\frac{\Gamma \left(0.5\right)}{ 2^{2/3}\Gamma \left(7/6\right)} \right)^{3} \sum_{i=3}^{N} \mbox({|r_{t,i}|}^{4/3} {|r_{t,i-1}|}^{4/3} {|r_{t,i-2}|}^{4/3})
Usage
rTPQuar(rData, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE)Arguments
rData | an |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
Value
In case the input is an
xtsobject with data from one day, a numeric of the same length as the number of assets.If the input data spans multiple days and is in
xtsformat, anxtswill be returned.If the input data is a
data.tableobject, the function returns adata.tablewith the same column names as the input data, containing the date and the realized measures.
Author(s)
Giang Nguyen, Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation.Journal of Econometrics, 169, 75-93.
Examples
tpq <- rTPQuar(rData = sampleTData[, list(DT, PRICE)], alignBy = "minutes", alignPeriod = 5, makeReturns = TRUE)tpqTwo time scale covariance estimation
Description
Calculate the two time scale covariance matrix proposed in Zhang et al. (2005) and Zhang (2010).By the use of two time scales, this covariance estimateis robust to microstructure noise and non-synchronic trading.
Usage
rTSCov( pData, cor = FALSE, K = 300, J = 1, KCov = NULL, JCov = NULL, KVar = NULL, JVar = NULL, makePsd = FALSE, ...)Arguments
pData | a list. Each list-item i contains an |
cor | boolean, in case it is |
K | positive integer, slow time scale returns are computed on prices that are |
J | positive integer, fast time scale returns are computed on prices that are |
KCov | positive integer, for the extradiagonal covariance elements the slow time scale returns are computed on prices that are |
JCov | positive integer, for the extradiagonal covariance elements the fast time scale returns are computed on prices that are |
KVar | vector of positive integers, for the diagonal variance elements the slow time scale returns are computed on prices that are |
JVar | vector of positive integers, for the diagonal variance elements the fast time scale returns are computed on prices that are |
makePsd | boolean, in case it is |
... | used internally, do not change. |
Details
The rTSCov requires the tick-by-tick transaction prices. (Co)variances are then computed using log-returns calculated on a rolling basison stock prices that areK (slow time scale) andJ (fast time scale) steps apart.
The diagonal elements of the rTSCov matrix are the variances, computed for log-price seriesX withn price observationsat times \tau_1,\tau_2,\ldots,\tau_n as follows:
(1-\frac{\overline{n}_K}{\overline{n}_J})^{-1}([X,X]_T^{(K)}- \frac{\overline{n}_K}{\overline{n}_J}[X,X]_T^{(J))}
where\overline{n}_K=(n-K+1)/K,\overline{n}_J=(n-J+1)/J and
[X,X]_T^{(K)} =\frac{1}{K}\sum_{i=1}^{n-K+1}(X_{t_{i+K}}-X_{t_i})^2.
The extradiagonal elements of the rTSCov are the covariances.For their calculation, the data is first synchronized by the refresh time method proposed by Harris et al (1995).It uses the functionrefreshTime to collect first the so-called refresh times at which all assets have traded at least oncesince the last refresh time point. Suppose we have two log-price series:X andY. Let \Gamma =\{ \tau_1,\tau_2,\ldots,\tau_{N^{\mbox{\tiny X}}_{\mbox{\tiny T}}}\} and\Theta=\{\theta_1,\theta_2,\ldots,\theta_{N^{\mbox{\tiny Y}}_{\mbox{\tiny T}}}\}be the set of transaction times of these assets.The first refresh time corresponds to the first time at which both stocks have traded, i.e.\phi_1=\max(\tau_1,\theta_1). The subsequent refresh time is defined as the first time when both stocks have again traded, i.e.\phi_{j+1}=\max(\tau_{N^{\mbox{\tiny{X}}}_{\phi_j}+1},\theta_{N^{\mbox{\tiny{Y}}}_{\phi_j}+1}). Thecomplete refresh time sample grid is\Phi=\{\phi_1,\phi_2,...,\phi_{M_N+1}\}, whereM_N is the total number of paired returns. Thesampling points of assetX andY are defined to bet_i=\max\{\tau\in\Gamma:\tau\leq \phi_i\} ands_i=\max\{\theta\in\Theta:\theta\leq \phi_i\}.
Given these refresh times, the covariance is computed as follows:
c_{N}( [X,Y]^{(K)}_T-\frac{\overline{n}_K}{\overline{n}_J}[X,Y]^{(J)}_T ),
where
[X,Y]^{(K)}_T =\frac{1}{K} \sum_{i=1}^{M_N-K+1} (X_{t_{i+K}}-X_{t_{i}})(Y_{s_{i+K}}-Y_{s_{i}}).
Unfortunately, the rTSCov is not always positive semidefinite.By setting the argument makePsd = TRUE, the functionmakePsd is used to return a positive semidefinitematrix. This function replaces the negative eigenvalues with zeroes.
Value
in case the input is and contains data from one day, an N by N matrix is returned. If the data is a univariatexts object with multiple days, anxts is returned.If the data is multivariate and contains multiple days (xts ordata.table), the function returns a list containing N by N matrices. Each item in the list has a name which corresponds to the date for the matrix.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Harris, F., McInish, T., Shoesmith, G., and Wood, R. (1995). Cointegration, error correction, and price discovery on informationally linked security markets.Journal of Financial and Quantitative Analysis, 30, 563-581.
Zhang, L., Mykland, P. A., and Ait-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data.Journal of the American Statistical Association, 100, 1394-1411.
Zhang, L. (2011). Estimating covariation: Epps effect, microstructure noise.Journal of Econometrics, 160, 33-47.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Robust Realized two timescales Variance/Covariance# Multivariate:## Not run: library(xts)set.seed(123)start <- strptime("1970-01-01", format = "%Y-%m-%d", tz = "UTC")timestamps <- start + seq(34200, 57600, length.out = 23401)dat <- cbind(rnorm(23401) * sqrt(1/23401), rnorm(23401) * sqrt(1/23401))dat <- exp(cumsum(xts(dat, timestamps)))price1 <- dat[,1]price2 <- dat[,2]rcovts <- rTSCov(pData = list(price1, price2))# Note: List of prices as inputrcovts## End(Not run)Threshold Covariance
Description
Calculate the threshold covariance matrix proposed in Gobbi and Mancini (2009).Unlike therOWCov, the rThresholdCov uses univariate jump detection rules to truncate the effect of jumps on the covarianceestimate. As such, it remains feasible in high dimensions, but it is less robust to small cojumps.
Letr_{t,i} be an intradayN x 1 return vector ofN assets wherei=1,...,M andM being the number of intraday returns.
Then, thek,q-th element of the threshold covariance matrix is defined as
\mbox{thresholdcov}[k,q]_{t} = \sum_{i=1}^{M} r_{(k)t,i} 1_{\{r_{(k)t,i}^2 \leq TR_{M}\}} \ \ r_{(q)t,i} 1_{\{r_{(q)t,i}^2 \leq TR_{M}\}},
with the threshold valueTR_{M} set to3 \Delta^{0.49} times the daily realized bi-power variation of assetk,as suggested in Jacod and Todorov (2009).
Usage
rThresholdCov( rData, cor = FALSE, alignBy = NULL, alignPeriod = NULL, makeReturns = FALSE, ...)Arguments
rData | an |
cor | boolean, in case it is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. For example, to aggregatebased on a 5-minute frequency, set |
makeReturns | boolean, should be |
... | used internally, do not change. |
Value
in case the input is and contains data from one day, anN \times N matrix is returned. If the data is a univariatexts object with multiple days, anxts is returned.If the data is multivariate and contains multiple days (xts ordata.table), the function returns a list containingN \times N matrices. Each item in the list has a name which corresponds to the date for the matrix.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Emil Sjoerup.
References
Barndorff-Nielsen, O. and Shephard, N. (2004). Measuring the impact of jumps in multivariate price processes using bipower covariation. Discussion paper, Nuffield College, Oxford University.
Jacod, J. and Todorov, V. (2009). Testing for common arrival of jumps in discretely-observed multidimensional processes.Annals of Statistics, 37, 1792-1838.
Mancini, C. and Gobbi, F. (2012). Identifying the Brownian covariation from the co-jumps given discrete observations.Econometric Theory, 28, 249-273.
See Also
ICov for a list of implemented estimators of the integrated covariance.
Examples
# Realized threshold Variance/Covariance:# Multivariate:## Not run: library("xts")set.seed(123)start <- strptime("1970-01-01", format = "%Y-%m-%d", tz = "UTC")timestamps <- start + seq(34200, 57600, length.out = 23401)dat <- cbind(rnorm(23401) * sqrt(1/23401), rnorm(23401) * sqrt(1/23401))dat <- exp(cumsum(xts(dat, timestamps)))rcThreshold <- rThresholdCov(dat, alignBy = "minutes", alignPeriod = 1, makeReturns = TRUE)rcThreshold## End(Not run)Rank jump test
Description
Calculate the rank jump test of Li et al. (2019).The procedure tests for the rank of the jump matrix at simultaneous jump events in market returns as well as individual assets.
Usage
rankJumpTest( marketPrice, stockPrices, alpha = c(5, 3), coarseFreq = 10, localWindow = 30, rank = 1, BoxCox = 1, quantiles = c(0.9, 0.95, 0.99), nBoot = 1000, dontTestAtBoundaries = TRUE, alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00", tz = NULL)Arguments
marketPrice | data.table or |
stockPrices | list containing the individual stock prices in either data.table or |
alpha | significance level (in standard deviations) to use for the jump detections. Default is |
coarseFreq | numeric denoting the coarse sampling frequency. Default is |
localWindow | numeric denoting the local window for the bootstrap algorithm. Default is |
rank | rank of the jump matrix under the null hypothesis. Default is |
BoxCox | numeric of exponents for the Box-Cox transformation, default is |
quantiles | numeric denoting which quantiles of the bootstrapped critical values to return and compare against. Default is |
nBoot | numeric denoting how many replications to be used for the bootstrap algorithm. Default is |
dontTestAtBoundaries | logical determining whether to exclude data across different days. Default is |
alignBy | character, indicating the time scale in which |
alignPeriod | positive numeric, indicating the number of periods to aggregate over. E.g. to aggregatebased on a 5 minute frequency, set |
marketOpen | the market opening time, by default: |
marketClose | the market closing time, by default: |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
Let the jump times be defined as:
{\cal I}_{n} = \left\{ i:\left|\Delta_{i}^{n}Z\right|>u_{n}\right\}
Then the estimated jump matrix is:
\hat{\boldsymbol{J}_{n}}=\left[\Delta_{i,k}^{n}\boldsymbol{X}\right]_{i\in{\cal I}_{n}}
Let\hat{\lambda}_{n,1}^{2}\geq\hat{\lambda}_{n,2}^{2}\geq\cdots\geq\hat{\lambda}_{n,d}^{2} be the ordered eigenvalues of\hat{\boldsymbol{J}}_{n}\hat{\boldsymbol{J}}_{n}^{\prime}, then test statistic is
\hat{S}_{n,t}=\sum_{j=r+1}^{d}\hat{\lambda}_{n,j}^{2}.
The critical values are computed by applying a bootstrapping method
The singular value decomposition of the jump matrix\hat{\boldsymbol{J}}_{n} is:
\hat{\boldsymbol{J}}=\hat{\boldsymbol{U}}_{n}\hat{\boldsymbol{D}}_{n}\hat{\boldsymbol{V}}_{n}^{\prime}
then\hat{\boldsymbol{U}}_{n}=\left[\hat{\boldsymbol{U}}_{1n}:\hat{\boldsymbol{U}}_{2n}\right] and\hat{\boldsymbol{V}}_{n}=\left[\hat{\boldsymbol{V}}_{1n}:\hat{\boldsymbol{V}}_{2n}\right]
\boldsymbol{\upsilon}_{n}=\left(\upsilon_{j,n}\right)_{1\leq j\leq d} such that\upsilon_{j,n}\asymp\Delta_{n}^{\varpi} for \varpi\in\left(0,1/2\right) which is used to trim jumps. The bootstrapping method is calculated by the following algorithm
Step 1.
For each
i\in{\cal I}_{n}, draw\kappa_{i}^{\star}\sim\textrm{Uniform}\left[0,1\right]and draw with equal probability,\boldsymbol{\xi}_{n,i-}^{\star} \textrm{from}\left\{ \min\left(\max\left(\Delta_{i-j}^{n}\boldsymbol{X},-\boldsymbol{\upsilon}_{n}\right),\boldsymbol{\upsilon}_{n}\right):1\leq j\leq k_{n}\right\},\boldsymbol{\xi}_{n,i+}^{\star} \textrm{from}\left\{ \min\left(\max\left(\Delta_{i+j}^{n}\boldsymbol{X},-\boldsymbol{\upsilon}_{n}\right),\boldsymbol{\upsilon}_{n}\right):1\leq j\leq k_{n}\right\},and set
\boldsymbol{\zeta}_{n,i}^{\star}=\sqrt{\kappa_{i}^{\star}}\boldsymbol{\xi}_{n,i-}^{\star}+\sqrt{k-\kappa_{i}^{\star}}\boldsymbol{\xi}_{n,i+}^{\star}and\boldsymbol{\zeta}_{n}^{\star}=\left[\boldsymbol{\zeta}_{n,i}^{\star}\right]_{i\in{\cal I}_{n}}Step 2.
Repeat 1 for a large number of iterations. Set
c\upsilon_{n,\alpha}as as the1-\alphaquantile of\left\Vert \hat{\boldsymbol{U}}_{2n}^{\prime}\boldsymbol{\xi}_{n}^{\star}\hat{\boldsymbol{V}}_{2n}\right\Vert ^{2}in the simulated sample.
Value
A list containingcriticalValues which are the bootstrapped critical values,testStatistic the test statistic of the jump test,dimensions which are the dimensions of the jump matrixmarketJumpDetections the jumps detected in the market prices,stockJumpDetections the co-jumps detected in the individual stock prices, andjumpIndices which are the indices of the detected jumps.
Author(s)
Emil Sjoerup, based on Matlab code provided by Li et al. (2019)
References
Li, j., Todorov, V., Tauchen, G., and Lin, H. (2019). Rank Tests at Jump Events.Journal of Business & Economic Statistics, 37, 312-321.
Synchronize (multiple) irregular timeseries by refresh time
Description
This function implements the refresh time synchronization scheme proposed by Harris et al. (1995). It picks the so-called refresh times at which all assets have traded at least once since the last refresh time point. For example, the first refresh time corresponds to the first time at which all stocks have traded.The subsequent refresh time is defined as the first time when all stocks have traded again.This process is repeated until the end of one time series is reached.
Usage
refreshTime(pData, sort = FALSE, criterion = "squared duration")Arguments
pData | a list. Each list-item contains an |
sort | logical determining whether to sort the index based on a criterion (will only sort descending; i.e., most liquid first). Default is |
criterion | character determining which criterion used. Currently supports |
Value
Anxts ordata.table object containing the synchronized time series - depending on the input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Harris, F., T. McInish, Shoesmith, G., and Wood, R. (1995). Cointegration, error correction, and price discovery on informationally linked security markets.Journal of Financial and Quantitative Analysis, 30, 563-581.
Examples
# Suppose irregular timepoints:start <- as.POSIXct("2010-01-01 09:30:00")ta <- start + c(1,2,4,5,9)tb <- start + c(1,3,6,7,8,9,10,11)# Yielding the following timeseries:a <- xts::as.xts(1:length(ta), order.by = ta)b <- xts::as.xts(1:length(tb), order.by = tb)# Calculate the synchronized timeseries:refreshTime(list(a,b))Delete entries for which the spread is more thanmaxi times the median spread
Description
Function deletes entries for which the spread is more than"maxi" times the medianspread on that day.
Usage
rmLargeSpread(qData, maxi = 50, tz = NULL)Arguments
qData | an |
maxi | an integer. By default |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Value
xts ordata.table object depending on input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Delete entries for which the spread is negative
Description
Function deletes entries for which the spread is negative.
Usage
rmNegativeSpread(qData)Arguments
qData | an |
Value
data.table orxts object
Author(s)
Jonathan Cornelissen, Kris Boudt and Onno Kleen
Examples
rmNegativeSpread(sampleQDataRaw)Remove outliers in quotes
Description
Delete entries for which the mid-quote is outlying with respect to surrounding entries.
Usage
rmOutliersQuotes(qData, maxi = 10, window = 50, type = "advanced", tz = NULL)Arguments
qData | a |
maxi | an integer, indicating the maximum number of median absolute deviations allowed. |
window | an integer, indicating the time window for which the "outlyingness" is considered. |
type | should be |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
If
type = "standard": Function deletes entries for which the mid-quote deviated by more than "maxi"median absolute deviations from a rolling centered median (excludingthe observation under consideration) of window observations.If
type = "advanced": Function deletes entries for which the mid-quote deviates by more than "maxi"median absolute deviations from the value closest to the mid-quote ofthese three options:Rolling centered median (excluding the observation under consideration)
Rolling median of the following window of observations
Rolling median of the previous window of observations
The advantage of this procedure compared to the "standard" proposedby Barndorff-Nielsen et al. (2010) is that it will not incorrectly removelarge price jumps. Therefore this procedure has been set as the defaultfor removing outliers.
Note that the median absolute deviation is taken over the entireday. In case it is zero (which can happen if mid-quotes don't change much), the median absolute deviation is taken over a subsample without constant mid-quotes.
Value
xts object ordata.table depending on type of input.
Author(s)
Jonathan Cornelissen and Kris Boudt.
References
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2009). Realized kernels in practice: Trades and quotes.Econometrics Journal, 12, C1-C32.
Brownlees, C.T., and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns.Computational Statistics & Data Analysis, 51, 2232-2245.
Remove outliers in trades without using quote data
Description
Delete entries for which the price is outlying with respect to surrounding entries. In comparison totradesCleanupUsingQuotes, this function doesn't need quote data.
Usage
rmOutliersTrades(pData, maxi = 10, window = 50, type = "advanced", tz = NULL)Arguments
pData | a |
maxi | an integer, indicating the maximum number of median absolute deviations allowed. |
window | an integer, indicating the time window for which the "outlyingness" is considered. |
type | should be |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
If
type = "standard": Function deletes entries for which the price deviated by more than "maxi"median absolute deviations from a rolling centered median (excludingthe observation under consideration) of window observations.If
type = "advanced": Function deletes entries for which the price deviates by more than "maxi"median absolute deviations from the value closest to the price ofthese three options:Rolling centered median (excluding the observation under consideration)
Rolling median of the following window of observations
Rolling median of the previous window of observations
The advantage of this procedure compared to the "standard" proposedby Barndorff-Nielsen et al. (2010, footnote 8) is that it will not incorrectly removelarge price jumps. Therefore this procedure has been set as the defaultfor removing outliers.
Note that the median absolute deviation is taken over the entireday. In case it is zero (which can happen if prices don't change much), the median absolute deviation is taken over a subsample without constant prices.
Value
xts object ordata.table depending on type of input.
Author(s)
Jonathan Cornelissen, Kris Boudt, and Onno Kleen.
References
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2009). Realized kernels in practice: Trades and quotes.Econometrics Journal, 12, C1-C32.
Delete transactions with unlikely transaction prices
Description
Function deletes entries with prices that are above the ask plus the bid-ask spread.Similar for entries with prices below the bid minus the bid-ask spread.
Usage
rmTradeOutliersUsingQuotes( tData, qData, lagQuotes = 0, nSpreads = 1, BFM = FALSE, backwardsWindow = 3600, forwardsWindow = 0.5, plot = FALSE, ...)Arguments
tData | a |
qData | a |
lagQuotes | numeric, number of seconds the quotes are registered faster thanthe trades (should be round and positive). Default is 0. For older datasets, i.e. before 2010, it may be a good idea to set this to e.g. 2. See Vergote (2005) |
nSpreads | numeric of length 1 denotes how far above the offer and below bid we allow outliers to be. Trades are filtered out if they are MORE THAN nSpread * spread above (below) the offer (bid) |
BFM | a logical determining whether to conduct 'Backwards - Forwards matching' of trades and quotes.The algorithm tries to match trades that fall outside the bid - ask and first tries to match a small window forwards and if this fails, it tries to match backwards in a bigger window.The small window is a tolerance for inaccuracies in the timestamps of bids and asks. The backwards window allow for matching of late reported trades, i.e. block trades. |
backwardsWindow | a numeric denoting the length of the backwards window. Default is 3600, corresponding to one hour. |
forwardsWindow | a numeric denoting the length of the forwards window. Default is 0.5, corresponding to one half second. |
plot | a logical denoting whether to visualize the forwards, backwards, and unmatched trades in a plot. |
... | used internally |
Details
Note: in order to work correctly, the input data of this function should becleaned trade (tData) and quote (qData) data respectively.In older high frequency datasets the trades frequently lag the quotes. In newer datasets this tends to happen only during extreme market activity when exchange networks are at maximum capacity.
Value
xts ordata.table object depending on input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Vergote, O. (2005). How to match trades and quotes for NYSE stocks? K.U.Leuven working paper.
Christensen, K., Oomen, R. C. A., Podolskij, M. (2014): Fact or Friction: Jumps at ultra high frequency.Journal of Financial Economics, 144, 576-599
salesCondition is deprecated. UsetradesCondition instead.
Description
salesCondition is deprecated. UsetradesCondition instead.
Usage
salesCondition( tData, validConds = c("", "@", "E", "@E", "F", "FI", "@F", "@FI", "I", "@I"))Arguments
tData | salesCondition is deprecated. UsetradesCondition instead. |
validConds | salesCondition is deprecated. UsetradesCondition instead. |
Multivariate tick by tick data
Description
Cleaned Tick by tick data for a sector ETF, calledETF and two stock components of that ETF,these stocks are namedAAA andBBB.
Usage
sampleMultiTradeDataFormat
Adata.table object
One minute data
Description
One minute data price of one stock and a market proxy. This is data from the US market.
Usage
sampleOneMinuteDataFormat
Adata.table object
Sample of cleaned quotes for stock XXX for 2 days measured in microseconds
Description
Adata.table object containing the quotes for the pseudonymized stock XXX for 2 days. This is the cleaned version of the data samplesampleQDataRaw, usingquotesCleanup.
Usage
sampleQDataFormat
data.table object
Examples
## Not run: # The code to create the sampleQData dataset from raw data issampleQData <- quotesCleanup(qDataRaw = sampleQDataRaw, exchanges = "N", type = "standard", report = FALSE)## End(Not run)Sample of raw quotes for stock XXX for 2 days measured in microseconds
Description
Adata.table object containing the raw quotes the pseudonymized stock XXX for 2 days, in the typical NYSE TAQ database format.
Usage
sampleQDataRawFormat
data.table object
Sample of cleaned trades for stock XXX for 2 days
Description
Adata.table object containing the trades for the pseudonymized stock XXX for 2 days, in the typical NYSE TAQ database format.This is the cleaned version of the data samplesampleTDataRaw, usingtradesCleanupUsingQuotes.
Usage
sampleTDataFormat
A data.table object.
Examples
## Not run: # The code to create the sampleTData dataset from raw data issampleQData <- quotesCleanup(qDataRaw = sampleQDataRaw, exchanges = "N", type = "standard", report = FALSE)tradesAfterFirstCleaning <- tradesCleanup(tDataRaw = sampleTDataRaw, exchanges = "N", report = FALSE)sampleTData <- tradesCleanupUsingQuotes( tData = tradesAfterFirstCleaning, qData = sampleQData, lagQuotes = 0)[, c("DT", "EX", "SYMBOL", "PRICE", "SIZE")]# Only some columns are included. These are the ones that were historically included.# For most applications, we recommend aggregating the data at a high frequency# For example, every second.aggregated <- aggregatePrice(sampleTData[, list(DT, PRICE)], alignBy = "seconds", alignPeriod = 1)acf(diff(aggregated[as.Date(DT) == "2018-01-02", PRICE]))acf(diff(aggregated[as.Date(DT) == "2018-01-03", PRICE]))signature <- function(x, q){res <- x[, (rCov(diff(log(PRICE), lag = q, differences = 1))/q), by = as.Date(DT)]return(res[[2]])}rvAgg <- matrix(nrow = 100, ncol = 2)for(i in 1:100) rvAgg[i, ] <- signature(aggregated, i)plot(rvAgg[,1], type = "l")plot(rvAgg[,2], type = "l")## End(Not run)European data
Description
Trade data of one stock on one day in the European stock market.
Usage
sampleTDataEuropeFormat
Adata.table object
Sample of raw trades for stock XXX for 2 days
Description
An imaginarydata.table object containing the raw trades the pseudonymized stock XXX for 2 days, in the typical NYSE TAQ database format.
Usage
sampleTDataRawFormat
A data.table object.
Retain only data from a single stock exchange
Description
Filter raw trade data to only contain specified exchanges
Usage
selectExchange(data, exch = "N")Arguments
data | an |
exch | The (vector of) symbol(s) of the stock exchange(s) that should be selected.By default the NYSE is chosen (
|
Value
xts ordata.table object depending on input.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
Spot Drift Estimation
Description
Function used to estimate the spot drift of intraday (tick) stock prices/returns
Usage
spotDrift( data, method = "mean", alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00", tz = NULL, ...)Arguments
data | Can be one of two input types, |
method | Which method to be used to estimate the spot-drift. Currently, three methods are available, rolling mean and median as well as the kernel method of Christensen et al. (2018).The kernel is a left hand exponential kernel that will weigh newer observations more heavily than older observations. |
alignBy | character, indicating the time scale in which |
alignPeriod | How often should the estimation take place? If |
marketOpen | Opening time of the market, standard is "09:30:00". |
marketClose | Closing time of the market, standard is "16:00:00". |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
... | Additional arguments for the individual methods. See ‘Details’. |
Details
The additional arguments for the mean and median methods are:
periodsfor the rolling window length which is 5 by default.aligncontrols the alignment. The default is"right".
For the kernel mean estimator, the argumentsmeanBandwidth can be used to control the bandwidth of the drift estimator and thepreAverage argument, which can be used to control the pre-averaging horizon. These arguments default to 300 and 5 respectively.
The following estimation methods can be specified inmethod:
Rolling window mean ("mean")
Estimates the spot drift by applying a rolling mean over returns.
\hat{\mu_{t}} = \sum_{t = k}^{T} \textrm{mean} \left(r_{t-k : t} \right),
wherek is the argumentperiods.Parameters:
periodshow big the window for the estimation should be. The estimator will have
periodsNAs at the beginning of each trading day.alignalignment method for returns. Defaults to
"left", which includes only past data, but other choices,"center"and"right"are available.Warning: These values includes future data.
Outputs:
mua matrix containing the spot drift estimates
Rolling window median ("median")
Estimates the spot drift by applying a rolling mean over returns.
\hat{\mu_{t}} = \sum_{t = k}^{T} \textrm{median} \left(r_{t-k : t} \right),
wherek is the argumentperiods.Parameters:
periodsHow big the window for the estimation should be. The estimator will have
periodsNAs at the beginning of each trading day.alignAlignment method for returns. Defaults to
"left", which includes only past data, but other choices,"center"and"right"are available.These values includes FUTURE DATA, so beware!
Outputs:
mua matrix containing the spot drift estimates
kernel spot drift estimator ("kernel")
dX_{t} = \mu_{t}dt + \sigma_{t}dW_{t} + dJ_{t},
where\mu_{t},\sigma_{t}, andJ_{t} are the spot drift, the spot volatility, and a jump process respectively.However, due to microstructure noise, the observed log-price is
Y_{t} = X_{t} + \varepsilon_{t}
In order robustify the results to the presence of market microstructure noise, the pre-averaged returns are used:
\Delta_{i}^{n}\overline{Y} = \sum_{j=1}^{k_{n}-1}g_{j}^{n}\Delta_{i+j}^{n}Y,
whereg(\cdot) is a weighting function,min(x, 1-x), andk_{n} is the pre-averaging horizon.The spot drift estimator is then:
\hat{\bar{\mu}}_{t}^{n} = \sum_{i=1}^{n-k_{n}+2}K\left(\frac{t_{i-1}-t}{h_{n}}\right)\Delta_{i-1}^{n}\overline{Y},
The kernel estimation method has the following parameters:
preAveragea positive
integerdenoting the length of pre-averaging window for the log-prices. Default is 5meanBandwidthan
integerdenoting the bandwidth for the left-sided exponential kernel for the mean. Default is300L
Outputs:
mua matrix containing the spot drift estimates
Value
An object of class"spotDrift" containing at least the estimated spot drift process. Input on what this class should contain and methods for it is welcome.
Author(s)
Emil Sjoerup.
References
Christensen, K., Oomen, R., and Reno, R. (2020) The drift burst hypothesis. Journal of Econometrics. Forthcoming.
Examples
# Example 1: Rolling mean and median estimators for 2 daysmeandrift <- spotDrift(data = sampleTData, alignPeriod = 1)mediandrift <- spotDrift(data = sampleTData, method = "median", alignBy = "seconds", alignPeriod = 30, tz = "EST")plot(meandrift)plot(mediandrift)## Not run: # Example 2: Kernel based estimator for one day with data.table formatprice <- sampleTData[as.Date(DT) == "2018-01-02", list(DT, PRICE)]kerneldrift <- spotDrift(sampleTDataEurope, method = "driftKernel", alignBy = "minutes", alignPeriod = 1)plot(kerneldrift)## End(Not run)Spot volatility estimation
Description
Estimates a wide variety of spot volatility estimators.
Usage
spotVol( data, method = "detPer", alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00", marketClose = "16:00:00", tz = "GMT", m = NULL, n = NULL, online = NULL, ...)Arguments
data | Can be one of two input types, |
method | specifies which method will be used to estimate the spotvolatility. Valid options are |
alignBy | character, indicating the time scale in which |
alignPeriod | positive integer, indicating the number of periods to aggregateover. For example, to aggregate an |
marketOpen | the market opening time. This should be in the time zonespecified by |
marketClose | the market closing time. This should be in the time zonespecified by |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
m | method-specific tuning parameters for the piecewise constantvolatility estimator ( |
n | see the description of parameter |
online | see the description of parameter |
... | method-specific parameters (see ‘Details’ below). |
Details
The following estimation methods can be specified inmethod:
Deterministic periodicity method ("detPer")
Parameters:
dailyVolA string specifying the estimation method for the daily components_t.Possible values are"rBPCov", "rRVar", "rMedRVar"."rBPCov"by default.periodicVolA string specifying the estimation method for the component of intraday volatility,that depends in a deterministic way on the intraday time at which the return is observed.Possible values are"SD", "WSD", "TML", "OLS". See Boudt et al. (2011) for details. Default ="TML".P1A positive integer corresponding to the number of cosine terms used in the flexible Fourierspecification of the periodicity function, see Andersen et al. (1997) for details. Default = 5.P2Same asP1, but for the sine terms. Default = 5.dummiesBoolean: in case it isTRUE, the parametric estimator of periodic standard deviationspecifies the periodicity function as the sum of dummy variables corresponding to each intraday period.If it isFALSE, the parametric estimator uses the flexible Fourier specification. Default isFALSE.
Outputs (see ‘Value’ for a full description of each component):
spotdailyperiodic
Let there beT days ofN equally-spaced log-returnsr_{i,t},i = 1, \dots, N andi = 1, \dots, T.In case ofmethod = "detPer", the returns are modeled as
r_{i,t} = f_i s_t u_{i,t}
with independentu_{i,t} \sim \mathcal{N}(0,1).The spot volatility is decomposed into a deterministic periodic factorf_{i} (identical for every day in the sample) and a daily factors_{t} (identical for all observations within a day). Both components are then estimated separately, see Taylor and Xu (1997)and Andersen and Bollerslev (1997). The jump robust versions by Boudt et al.(2011) have also been implemented.
IfperiodicVol = "SD", we have
\hat f_i^{SD} = \frac{SD_i}{\sqrt{\frac{1}{\lfloor{\lambda / \Delta}\rfloor} \sum_{j = 1}^N SD_j^2}}
with\Delta = 1 / N, cross-daily averagesSD_i = \sqrt{1/T \sum_{i = t}^T r_{i,t}^2}, and\lambda being the length of the intraday time intervals.
IfperiodicVol = "WSD", we have another nonparametric estimator that is robust to jumps in contrast toperiodicVol = "SD". The definition of this estimator can be found in Boudt et al. (2011, Eqs. 2.9-2.12).
The estimates whenperiodicVol = "OLS" andperiodicVol = "TML" are based on the regression equation
\log \left| 1/T \sum_{t = 1}^T r_{i,t} \right| - c = \log f_i + \varepsilon_i
withi.i.d. zero-mean error term\varepsilon_i andc = -0.63518.periodicVol = "OLS" employs ordinary-least-squares estimation andperiodicVol = "TML" truncated maximum-likelihood estimation (see Boudt et al., 2011, Section 2.2, for further details).
Stochastic periodicity method ("stochPer")
Parameters:
P1: A positive integer corresponding to the number of cosine terms used in the flexible Fourierspecification of the periodicity function. Default = 5.P2: Same asP1, but for the sine terms. Default = 5.init: A named list of initial values to be used in the optimization routine ("BFGS"inoptim).Default =list(sigma = 0.03, sigma_mu = 0.005, sigma_h = 0.005, sigma_k = 0.05, phi = 0.2, rho = 0.98, mu = c(2, -0.5), delta_c = rep(0, max(1,P1)),delta_s = rep(0, max(1,P2))). The naming of the parameters follows Beltratti and Morana (2001), the corresponding model equations are listed below.initcan contain any number of these parameters.For parameters not specified ininit, the default initial value will be used.control: A list of options to be passed down tooptim.
Outputs (see ‘Value’ for a full description of each component):
spotpar
This method by Beltratti and Morana (2001) assumes the periodicity factor tobe stochastic. The spot volatility estimation is split into four components:a random walk, an autoregressive process, a stochastic cyclical process anda deterministic cyclical process. The model is estimated using aquasi-maximum likelihood method based on the Kalman Filter. The packageFKF is used to apply the Kalman filter. In addition tothe spot volatility estimates, all parameter estimates are returned.
The model for the intraday change in the return series is given by
r_{t,n} = \sigma_{t,n} \varepsilon_{t,n}, \ t = 1, \dots, T; \ n = 1, \dots, N,
where\sigma_{t,n} is the conditional standard deviation of then-th intervalof dayt and\varepsilon_{t,n} is ai.i.d. mean-zero unit-variance process.The conditional standard deviations are modeled as
\sigma_{t,n} = \sigma \exp \left(\frac{\mu_{t,n} + h_{t,n} + c_{t,n}}{2} \right)
with\sigma being a scaling factor and\mu_{t,n} is the non-stationary volatilitycomponent
\mu_{t,n} = \mu_{t,n-1} + \xi_{t,n}
with independent\xi_{t,n} \sim \mathcal{N}(0,\sigma_\xi^2).h_{t,n} is the stochastic stationary acyclical volatility component
h_{t,n} = \phi h_{t,n-1} + \nu_{t,n}
with independent\eta_{t,n} \sim \mathcal{N}(0,\sigma_\eta^2) and| \phi | \leq 1.The cyclical component is separated in two components:
c_{t,n} = c_{1,t,n} + c_{2,t,n}
The first component is written in state-space form,
\left( \begin{array}{r}c_{1,t,n} \\ c_{1,t,n}^*\end{array}\right) =\rho \left(\begin{array}{rr}\cos \lambda & \sin \lambda \\ -\sin \lambda & \cos \lambda\end{array}\right)\left(\begin{array}{r}c_{1,t,n - 1} \\ c_{1,t,n-1}^*\end{array}\right)+\left(\begin{array}{r}\kappa_{1,t,n} \\ \kappa_{1,t,n}^*\end{array}\right)
with0 \leq \rho \leq 1 and\kappa_{1,t,n}, \kappa_{1,t,n}^* are mutually independent zero-mean normal random variables with variance\sigma_\kappa^2.All other parameters and the processc_{1,t,n}^* in the state-space representation are only of instrumental use and are not part ofthe return value which is why we won't introduce them in detailin this vignette; see Beltratti and Morana (2001, pp. 208-209) for more information.
The second component is given by
c_{2,t,n} = \mu_1 n_1 + \mu_2 n_2 + \sum_{p = 2}^P (\delta_{cp} \cos(p\lambda) + \delta_{sp} \sin (p \lambda n))
withn_1 = 2n / (N+1) andn_2 = 6n^2 / (N+1) / (N+2).
Nonparametric filtering ("kernel")
Parameters:
typeString specifying the type of kernel to be used. Optionsinclude"gaussian", "epanechnikov", "beta". Default ="gaussian".hScalar or vector specifying bandwidth(s) to be used in kernel.Ifhis a scalar, it will be assumed equal throughout the sample. Ifit is a vector, it should contain bandwidths for each day. If left empty,it will be estimated. Default =NULL.estString specifying the bandwidth estimation method. Possiblevalues include"cv", "quarticity". Method"cv"equalscross-validation, which chooses the bandwidth that minimizes the IntegratedSquare Error."quarticity"multiplies the simple plug-in estimatorby a factor based on the daily quarticity of the returns.estisobsolete ifhhas already been specified by the user."cv"by default.lowerLower bound to be used in bandwidth optimization routine,when using cross-validation method. Default is0.1n^{-0.2}.upperUpper bound to be used in bandwidth optimization routine,when using cross-validation method. Default isn^{-0.2}.
Outputs (see ‘Value’ for a full description of each component):
spotpar
This method by Kristensen (2010) filters the spot volatility in anonparametric way by applying kernel weights to the standard realizedvolatility estimator. Different kernels and bandwidths canbe used to focus on specific characteristics of the volatility process.
Estimation results heavily depend on the bandwidth parameterh, so itis important that this parameter is well chosen. However, it is difficult tocome up with a method that determines the optimal bandwidth for any kind ofdata or kernel that can be used. Although some estimation methods areprovided, it is advised that you specifyh yourself, or make sure thatthe estimation results are appropriate.
One way to estimateh, is by using cross-validation. For each day inthe sample,h is chosen as to minimize the Integrated Square Error,which is a function ofh. However, this function often has multiplelocal minima, or no minima at all (h \rightarrow \infty). To ensure a reasonableoptimum is reached, strict boundaries have to be imposed onh. Thesecan be specified bylower andupper, which by default are0.1n^{-0.2} andn^{-0.2} respectively, wheren is thenumber of observations in a day.
When using the method"kernel", in addition to the spot volatilityestimates, all used values of the bandwidthh are returned.
A formal definition of the estimator is too extensive for the context of this vignette.Please refer to Kristensen (2010) for more detailed information. Our parameternames are aligned with this reference.
Piecewise constant volatility ("piecewise")
Parameters:
typestring specifying the type of test to be used. Optionsinclude
"MDa", "MDb", "DM". See Fried (2012) for details. Default ="MDa".mnumber of observations to include in reference window.Default =
40.nnumber of observations to include in test window.Default =
20.alphasignificance level to be used in tests. Note that the testwill be executed many times (roughly equal to the total number ofobservations), so it is advised to use a small value for
alpha, toavoid a lot of false positives. Default =0.005.volEststring specifying the realized volatility estimator to beused in local windows. Possible values are
"rBPCov", "rRVar", "rMedRVar".Default ="rBPCov".onlineboolean indicating whether estimations at a certain point
tshould be done online (using only information available att-1), or ex post (using all observations between two change points).Default =TRUE.
Outputs (see ‘Value’ for a full description of each component):
spotcp
This nonparametric method by Fried (2012) is a two-step approach and assumes the volatility to bepiecewise constant over local windows. Robust two-sample tests are applied todetect changes in variability between subsequent windows. The spot volatilitycan then be estimated by evaluating regular realized volatility estimatorswithin each local window."MDa", "MDb" refer to different test statistics, see Section 2.2 in Fried (2012).
Along with the spot volatility estimates, this method will return thedetected change points in the volatility level. When plotting aspotVol object containingcp, these change points will bevisualized.
GARCH models with intraday seasonality ("garch")
Parameters:
modelstring specifying the type of test to be used. Optionsinclude"sGARCH", "eGARCH". Seeugarchspecin therugarchpackage. Default ="eGARCH".garchordernumeric value of length 2, containing the order ofthe GARCH model to be estimated. Default =c(1,1).diststring specifying the distribution to be assumed on theinnovations. Seedistribution.modelinugarchspecforpossible options. Default ="norm".solver.controllist containing solver options.Seeugarchfitfor possible values. Default =list().P1a positive integer corresponding to the number of cosineterms used in the flexible Fourier specification of the periodicity function.Default = 5.P2same asP1, but for the sinus terms. Default = 5.
Outputs (see ‘Value’ for a full description of each component):
spotugarchfit
Along with the spot volatility estimates, this method will return theugarchfit object used by therugarch package.
In this model, daily returnsr_t based on intraday observationsr_{i,t}, i = 1, \dots, Nare modeled as
r_t = \sum_{i = 1}^N r_{i,t} = \sigma_t \frac{1}{\sqrt{N}} \sum_{i = 1}^N s_i Z_{i,t}.
with\sigma_t > 0, intraday seasonalitys_i > 0, andZ_{i,t} being a zero-mean unit-variance error term.
The overall approach is as in Appendix B of Andersen and Bollerslev (1997).This method generates the external regressorss_i needed to model the intradayseasonality with a flexible Fourier form (Andersen and Bollerslev, 1997, Eqs. A.1-A.4). Therugarch package is then employed to estimate the specified intraday GARCH(1,1) model on the residualsr_{i,t} / s_i.
Realized Measures ("RM")
This estimator takes trailing rolling window observations of intraday returns to estimate the spot volatility.
Parameters:
RMstring denoting which realized measure to use to estimate the local volatility. Possible values are:"rBPCov", "rMedRVar", "rMinRVar", "rCov", "rRVar".Default ="rBPCov".lookBackPeriodpositive integer denoting the amount of sub-sampled returns to use for the estimation of the local volatility. Default is10.dontIncludeLastlogical indicating whether to omit the last return in the calculation of the local volatility.This is done in Lee-Mykland (2008) to produce jump-robust estimates of spot volatility. Setting this toTRUEwill then uselookBackPeriod - 1returns in the construction of the realized measures. Default =FALSE.
Outputs (see ‘Value’ for a full description of each component):
spotRMlookBackPeriod
This method returns the estimates of the spot volatility, a string containing the realized measure used, and the lookBackPeriod.
(Non-overlapping) Pre-Averaged Realized Measures ("PARM")
This estimator takes rolling historical window observations of intraday returns to estimate the spot volatility as in the option"RM" but adds return pre-averaging of the realized measures. For a description of return pre-averaging see the details onspotDrift.
Parameters:
RMString denoting which realized measure to use to estimate the local volatility.Possible values are:"rBPCov", "rMedRVar", "rMinRVar", "rCov", and "rRVar". Default ="rBPCov".lookBackPeriodpositive integer denoting the amount of sub-sampled returns to use for the estimation of the local volatility. Default = 50.
Outputs (see ‘Value’ for a full description of each component):
spotRMlookBackPeriodkn
Value
AspotVol object, which is a list containing one or more of thefollowing outputs, depending on the method used:
spotAn
xtsormatrixobject (depending on the input) containingspot volatility estimates\sigma_{t,i}, reported for each intervalibetweenmarketOpenandmarketClosefor every daytindata. The length of the intervals is specified byalignPeriodandalignBy. Methods that provide this output: All.dailyAn
xtsornumericobject (depending on the input) containingestimates of the daily volatility levels for each daytindata,if the used method decomposed spot volatility into a daily and an intradaycomponent. Methods that provide this output:"detPer".periodicAn
xtsornumericobject (depending on the input) containingestimates of the intraday periodicity factor for each day intervalibetweenmarketOpenandmarketClose, if the spot volatility wasdecomposed into a daily and an intraday component. If the output is inxtsformat, this periodicity factor will be dated to the first day ofthe input data, but it is identical for each day in the sample. Methods thatprovide this output:"detPer".parA named list containing parameter estimates, for methods that estimate oneor more parameters. Methods that provide this output:
"stochper", "kernel".cpA vector containing the change points in the volatility, i.e. the observationindices after which the volatility level changed, according to the appliedtests. The vector starts with a 0. Methods that provide this output:
"piecewise".ugarchfitA
ugarchfitobject, as used by therugarchpackage, containing all output from fitting the GARCH model to the data.Methods that provide this output:"garch".The
spotVolfunction offers several methods to estimate spotvolatility and its intraday seasonality, using high-frequency data. Itreturns an object of classspotVol, which can contain various outputs,depending on the method used. See ‘Details’ for a description of each method.In any case, the output will contain the spot volatility estimates.The input can consist of price data or return data, either tick by tick orsampled at set intervals. The data will be converted to equispacedhigh-frequency returns
r_{t,i}(read: thei-th return on dayt).
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Andersen, T. G. and Bollerslev, T. (1997). Intraday periodicity and volatility persistence in financial markets.Journal of Empirical Finance, 4, 115-158.
Beltratti, A. and Morana, C. (2001). Deterministic and stochastic methods for estimation of intraday seasonal components with high frequency data.Economic Notes, 30, 205-234.
Boudt K., Croux C., and Laurent S. (2011). Robust estimation of intraweek periodicity in volatility and jump detection.Journal of Empirical Finance, 18, 353-367.
Fried, R. (2012). On the online estimation of local constant volatilities.Computational Statistics and Data Analysis, 56, 3080-3090.
Kristensen, D. (2010). Nonparametric filtering of the realized spot volatility: A kernel-based approach.Econometric Theory, 26, 60-93.
Taylor, S. J. and Xu, X. (1997). The incremental volatility information in one million foreign exchange quotations.Journal of Empirical Finance, 4, 317-340.
Examples
## Not run: init <- list(sigma = 0.03, sigma_mu = 0.005, sigma_h = 0.007, sigma_k = 0.06, phi = 0.194, rho = 0.986, mu = c(1.87,-0.42), delta_c = c(0.25, -0.05, -0.2, 0.13, 0.02), delta_s = c(-1.2, 0.11, 0.26, -0.03, 0.08))# Next method will take around 370 iterationsvol1 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "stochPer", init = init)plot(vol1$spot[1:780])legend("topright", c("stochPer"), col = c("black"), lty=1)## End(Not run)# Various kernel estimates## Not run: h1 <- bw.nrd0((1:nrow(sampleOneMinuteData[, list(DT, PRICE = MARKET)]))*60)vol2 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "kernel", h = h1)vol3 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "kernel", est = "quarticity")vol4 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "kernel", est = "cv")plot(cbind(vol2$spot, vol3$spot, vol4$spot))xts::addLegend("topright", c("h = simple estimate", "h = quarticity corrected", "h = crossvalidated"), col = 1:3, lty=1)## End(Not run)# Piecewise constant volatility## Not run: vol5 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "piecewise", m = 200, n = 100, online = FALSE)plot(vol5)## End(Not run)# Compare regular GARCH(1,1) model to eGARCH, both with external regressors## Not run: vol6 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "garch", model = "sGARCH")vol7 <- spotVol(sampleOneMinuteData[, list(DT, PRICE = MARKET)], method = "garch", model = "eGARCH")plot(as.numeric(t(vol6$spot)), type = "l")lines(as.numeric(t(vol7$spot)), col = "red")legend("topleft", c("GARCH", "eGARCH"), col = c("black", "red"), lty = 1)## End(Not run)## Not run: # Compare realized measure spot vol estimation to pre-averaged versionvol8 <- spotVol(sampleTDataEurope[, list(DT, PRICE)], method = "RM", marketOpen = "09:00:00", marketClose = "17:30:00", tz = "UTC", alignPeriod = 1, alignBy = "mins", lookBackPeriod = 10)vol9 <- spotVol(sampleTDataEurope[, list(DT, PRICE)], method = "PARM", marketOpen = "09:00:00", marketClose = "17:30:00", tz = "UTC", lookBackPeriod = 10)plot(zoo::na.locf(cbind(vol8$spot, vol9$spot)))## End(Not run)Convert to format for realized measures
Description
Convenience function to split data from onexts ordata.table with at least"DT","SYMBOL", and"PRICE" columns to a format that can be used in the functions for calculation of realized measures. This is the opposite ofgatherPrices.
Usage
spreadPrices(data)Arguments
data | An |
Value
Anxts or adata.table object with columns"DT" and a column named after each unique entrance in the"SYMBOL" column of the input. These columns contain the price of the associated symbol. We drop all other columns, e.g.SIZE.
Author(s)
Emil Sjoerup.
See Also
Examples
## Not run: library(data.table)data1 <- copy(sampleTData)[, `:=`(PRICE = PRICE * runif(.N, min = 0.99, max = 1.01), DT = DT + runif(.N, 0.01, 0.02))]data2 <- copy(sampleTData)[, SYMBOL := 'XYZ']dat <- rbind(data1, data2)setkey(dat, "DT")dat <- spreadPrices(dat)rCov(dat, alignBy = 'minutes', alignPeriod = 5, makeReturns = TRUE, cor = TRUE) ## End(Not run)Summary forHARmodel objects
Description
Summary forHARmodel objects
Usage
## S3 method for class 'HARmodel'summary(object, ...)Arguments
object | An object of class |
... | pass |
Value
A modifiedsummary.lm
Cleans trade data
Description
This is a wrapper function for cleaning the trade data of all stock data inside the folder dataSource. The result is saved in the folder dataDestination.
In case you supply the argumentrawtData, the on-disk functionality is ignored. The function returns a vectorindicating how many trades were removed at each cleaning step in this case.and the function returns anxts ordata.table object.
The following cleaning functions are performed sequentially:noZeroPrices,autoSelectExchangeTrades orselectExchange,tradesCondition, andmergeTradesSameTimestamp.
Since the functionrmTradeOutliersUsingQuotesalso requires cleaned quote data as input, it is not incorporated here andthere is a separate wrapper calledtradesCleanupUsingQuotes.
Usage
tradesCleanup( dataSource = NULL, dataDestination = NULL, exchanges = "auto", tDataRaw = NULL, report = TRUE, selection = "median", validConds = c("", "@", "E", "@E", "F", "FI", "@F", "@FI", "I", "@I"), marketOpen = "09:30:00", marketClose = "16:00:00", printExchange = TRUE, saveAsXTS = FALSE, tz = NULL)Arguments
dataSource | character indicating the folder in which the original data is stored. |
dataDestination | character indicating the folder in which the cleaned data is stored. |
exchanges | vector of stock exchange symbols for all data in
The default value is |
tDataRaw |
|
report | boolean and |
selection | argument to be passed on to the cleaning routine |
validConds | character vector containing valid sales conditions. Passed through to |
marketOpen | character in the format of |
marketClose | character in the format of |
printExchange | Argument passed to |
saveAsXTS | indicates whether data should be saved in |
tz | fallback time zone used in case we we are unable to identify the timezone of the data, by default: |
Details
Using the on-disk functionality with .csv.zip files from the WRDS databasewill write temporary files on your machine in order to unzip the files - we try to clean up after it,but cannot guarantee that there won't be files that slip through the crack if the permission settings on your machine does not match ours.
If the inputdata.table does not contain a DT column but it does contain DATE and TIME_M columns, we create the DT column by REFERENCE, altering thedata.table that may be in the user's environment.
Value
For each day anxts ordata.table object is saved into the folder of that date, containing the cleaned data.This procedure is performed for each stock in"ticker".The function returns a vector indicating how many trades remained after each cleaning step.
In case you supply the argumentrawtData, the on-disk functionality is ignoredand the function returns a list with the cleaned trades asxts object (see examples).
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup
References
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes.Econometrics Journal, 12, C1-C32.
Brownlees, C.T. and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns.Computational Statistics & Data Analysis, 51, 2232-2245.
Examples
# Consider you have raw trade data for 1 stock for 2 days head(sampleTDataRaw)dim(sampleTDataRaw)tDataAfterFirstCleaning <- tradesCleanup(tDataRaw = sampleTDataRaw, exchanges = list("N"))tDataAfterFirstCleaning$reportdim(tDataAfterFirstCleaning$tData)# In case you have more data it is advised to use the on-disk functionality# via "dataSource" and "dataDestination" argumentsPerform a final cleaning procedure on trade data
Description
Function performs cleaning procedurermTradeOutliersUsingQuotes for the trades of all stocks data in "dataDestination". Note that preferably the input data for this function is trade and quote data cleaned by respectively e.g.tradesCleanupandquotesCleanup.
Usage
tradesCleanupUsingQuotes( tradeDataSource = NULL, quoteDataSource = NULL, dataDestination = NULL, tData = NULL, qData = NULL, lagQuotes = 0, nSpreads = 1, BFM = FALSE, backwardsWindow = 3600, forwardsWindow = 0.5, plot = FALSE)Arguments
tradeDataSource | character indicating the folder in which the original trade data is stored. |
quoteDataSource | character indicating the folder in which the original quote data is stored. |
dataDestination | character indicating the folder in which the cleaned data is stored, folder of |
tData |
|
qData |
|
lagQuotes | numeric, number of seconds the quotes are registered faster thanthe trades (should be round and positive). Default is 0. For older datasets, i.e. before 2010, it may be a good idea to set this to, e.g., 2 (see, Vergote, 2005). |
nSpreads | numeric of length 1 denotes how far above the offer and below bid we allow outliers to be. Trades are filtered out if they are MORE THAN nSpread * spread above (below) the offer (bid) |
BFM | a logical determining whether to conduct "Backwards - Forwards matching" of trades and quotes.The algorithm tries to match trades that fall outside the bid - ask and first tries to match a small window forwards and if this fails, it tries to match backwards in a bigger window.The small window is a tolerance for inaccuracies in the timestamps of bids and asks. The backwards window allow for matching of late reported trades, i.e. block trades. |
backwardsWindow | a numeric denoting the length of the backwards window used when |
forwardsWindow | a numeric denoting the length of the forwards window used when |
plot | a logical denoting whether to visualize the forwards, backwards, and unmatched trades in a plot. Passed on to |
Details
In case you supply the argumentstData andqData, the on-disk functionality is ignoredand the function returns cleaned trades as adata.table orxts object (see examples).
When using the on-disk functionality and tradeDataSource and quoteDataSource are the same, the quote files are all files in the folder that contains 'quote', and the rest are treated as containing trade data.
Value
For each day anxts object is saved into the folder of that date, containing the cleaned data.
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes.Econometrics Journal, 12, C1-C32.
Brownlees, C.T., and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns.Computational Statistics & Data Analysis, 51, 2232-2245.
Christensen, K., Oomen, R. C. A., Podolskij, M. (2014): Fact or Friction: Jumps at ultra high frequency.Journal of Financial Economics, 144, 576-599
Examples
# Consider you have raw trade data for 1 stock for 2 days ## Not run: tDataAfterFirstCleaning <- tradesCleanup(tDataRaw = sampleTDataRaw, exchanges = "N", report = FALSE)qData <- quotesCleanup(qDataRaw = sampleQDataRaw, exchanges = "N", report = FALSE)dim(tDataAfterFirstCleaning)tDataAfterFinalCleaning <- tradesCleanupUsingQuotes(qData = qData[as.Date(DT) == "2018-01-02"], tData = tDataAfterFirstCleaning[as.Date(DT) == "2018-01-02"])dim(tDataAfterFinalCleaning)## End(Not run)# In case you have more data it is advised to use the on-disk functionality# via the "tradeDataSource", "quoteDataSource", and "dataDestination" argumentsDelete entries with abnormal trades condition.
Description
Delete entries with abnormal trades condition
Usage
tradesCondition( tData, validConds = c("", "@", "E", "@E", "F", "FI", "@F", "@FI", "I", "@I"))Arguments
tData | an |
validConds | a character vector containing valid sales conditions defaults to |
Details
To get more information on the sales conditions, see the NYSE documentation. Section about Daily TAQ Trades File.The current version (as of May 2020) can be found online atNYSE's webpage
Value
xts ordata.table object depending on input.
Note
Some CSV readers and the WRDS API parses empty strings as NAs. We transformNA values in COND to"".
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.