This vignette illustrates the usage of the package fitHeavyTail to estimate the mean vector and covariance matrix of heavy-tailed multivariate distributions such as the angular Gaussian, Cauchy, or Student's \(t\) distribution. The results are compared against existing benchmark functions from different packages.
The package can be installed from CRAN or GitHub:
```r
# install stable version from CRAN
install.packages("fitHeavyTail")

# install development version from GitHub
devtools::install_github("convexfi/fitHeavyTail")
```

To get help:
```r
library(fitHeavyTail)
help(package = "fitHeavyTail")
?fit_mvt
```

To cite fitHeavyTail in publications:
citation("fitHeavyTail")To illustrate the simple usage of the packagefitHeavyTail,let’s start by generating some multivariate data under a Student’s\(t\) distribution with significant heavytails (degrees of freedom\(\nu=4\)):
```r
library(mvtnorm)  # package for multivariate t distribution

N <- 10   # number of variables
T <- 80   # number of observations
nu <- 4   # degrees of freedom for heavy tails

set.seed(42)
mu <- rep(0, N)
U <- t(rmvnorm(n = round(0.3*N), sigma = 0.1*diag(N)))
Sigma_cov <- U %*% t(U) + diag(N)  # covariance matrix with factor model structure
Sigma_scatter <- (nu-2)/nu * Sigma_cov
X <- rmvt(n = T, delta = mu, sigma = Sigma_scatter, df = nu)  # generate data
```

We can first estimate the mean vector and covariance matrix via the traditional sample estimates (i.e., sample mean and sample covariance matrix):
```r
mu_sm <- colMeans(X)
Sigma_scm <- cov(X)
```

Then we can compute the robust estimates via the package fitHeavyTail:
```r
library(fitHeavyTail)
fitted <- fit_mvt(X)
```

We can now compute the estimation errors and see the significant improvement:
```r
sum((mu_sm - mu)^2)
#> [1] 0.2857323
sum((fitted$mu - mu)^2)
#> [1] 0.1487845
sum((Sigma_scm - Sigma_cov)^2)
#> [1] 5.861138
sum((fitted$cov - Sigma_cov)^2)
#> [1] 3.031499
```

To get a visual idea of the robustness, we can plot the shapes of the covariance matrices (true and estimated ones) on two dimensions. Observe how the heavy-tailed estimation follows the true one more closely than the sample covariance matrix:
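The corresponding plot is not reproduced here. As a minimal base-R sketch of how such a two-dimensional comparison could be drawn (the helper draw_ellipse() and the 95% confidence level are our own choices for illustration, not the plotting code used in the vignette):

```r
# sketch: draw 2-D ellipses (first two variables) for the true and estimated covariance matrices
draw_ellipse <- function(Sigma2x2, center = c(0, 0), col = "black", lty = 1, level = 0.95) {
  theta <- seq(0, 2*pi, length.out = 200)
  circle <- rbind(cos(theta), sin(theta))
  radius <- sqrt(qchisq(level, df = 2))                  # radius of the chosen confidence level
  pts <- center + radius * t(chol(Sigma2x2)) %*% circle  # map the unit circle through the Cholesky factor
  lines(t(pts), col = col, lty = lty, lwd = 2)
}

plot(X[, 1], X[, 2], pch = 19, col = "gray", xlab = "variable 1", ylab = "variable 2")
draw_ellipse(Sigma_cov[1:2, 1:2],  center = mu[1:2],        col = "black")
draw_ellipse(Sigma_scm[1:2, 1:2],  center = mu_sm[1:2],     col = "red",  lty = 2)
draw_ellipse(fitted$cov[1:2, 1:2], center = fitted$mu[1:2], col = "blue", lty = 3)
legend("topright", legend = c("true", "sample covariance", "fit_mvt"),
       col = c("black", "red", "blue"), lty = 1:3, lwd = 2)
```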
In the following, we generate multivariate heavy-tailed Student's \(t\) distributed data and compare the performance of many different existing packages via 100 Monte Carlo simulations in terms of estimation accuracy, measured by the mean squared error (MSE), and CPU time.
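As a rough illustration of how such a comparison could be coded (this is only a sketch, not the actual simulation code behind the figures; it reuses the variables T, mu, nu, Sigma_scatter, and Sigma_cov defined earlier and compares just two estimators):

```r
# sketch of a small Monte Carlo comparison of the covariance-matrix MSE
library(mvtnorm)
library(fitHeavyTail)

MSE <- function(Sigma_hat) sum((Sigma_hat - Sigma_cov)^2)

n_sim <- 100
res <- t(sapply(1:n_sim, function(i) {
  X <- rmvt(n = T, delta = mu, sigma = Sigma_scatter, df = nu)
  c(SCM     = MSE(cov(X)),
    fit_mvt = MSE(fit_mvt(X)$cov))
}))
colMeans(res)  # average MSE over the Monte Carlo runs
# the CPU time of each estimator can be measured similarly, e.g., with system.time()
```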
The following plot gives a nice overall perspective of the MSE vs. CPU time tradeoff of the different methods (note the ellipse at the bottom left that embraces the best four methods: fitHeavyTail::fit_Tyler(), fitHeavyTail::fit_Cauchy(), fitHeavyTail::fit_mvt(), and fitHeavyTail::fit_mvt() with fixed nu = 6):
From the numerical results we can draw several observations:
- stats::cov() is the sample covariance matrix (SCM). As expected, it is not robust to heavy tails and has the worst estimation error, although it enjoys the lowest computational cost. It is not acceptable for heavy-tailed distributions.
- QRM::fit.mst() assumes the data follows a multivariate \(t\) distribution; it has one of the highest computational costs with a not-so-good estimation error.
- MASS::cov.trob() (with fixed nu = 6) assumes the data follows a multivariate \(t\) distribution; it shows a good performance in terms of MSE and CPU time. It is probably the best choice among the benchmark existing packages (with the advantage that it comes preinstalled with base R in the package MASS).
- MASS::cov.mve() shows one of the worst performances in terms of both estimation error and computational cost.
- robustbase::covMcd() also shows one of the worst performances in terms of both estimation error and computational cost.
- robust::covRob() has a low computational cost but a bad estimation error.
- covRobust::cov.nnve() shows a bad performance in terms of both estimation error and CPU time.
- rrcov::CovMrcd() also shows one of the worst performances in terms of both estimation error and computational cost.
- sn::selm() (with nu = 6) has a very good performance but with a high computational cost.
- fitHeavyTail::fit_Tyler() normalizes the data (to get rid of the shape of the tail); it shows a very small estimation error with an acceptable computational cost.
- fitHeavyTail::fit_Cauchy() assumes a multivariate Cauchy distribution and has a performance similar to fitHeavyTail::fit_Tyler() but with a slightly higher computational cost.
- fitHeavyTail::fit_mvt() assumes the data follows a multivariate \(t\) distribution; it shows a small estimation error with an acceptable computational cost.
- fitHeavyTail::fit_mvt() with fixed nu = 6 seems to perform similarly to the previous case (which also estimates nu).

Concluding, the top choices seem to be (in order):
1. fitHeavyTail::fit_mvt() (either without fixing nu or with nu = 6),
2. fitHeavyTail::fit_Cauchy(),
3. fitHeavyTail::fit_Tyler(), and
4. MASS::cov.trob() (with the advantage of being preinstalled with base R, but with a worse estimation error).

The overall winner is fitHeavyTail::fit_mvt() by a big margin.
The empirical distribution of daily returns of some financial variables, such as exchange rates, equity prices, and interest rates, is often skewed. There are several different formulations of multivariate skewed \(t\) distributions appearing in the literature (Lee and McLachlan, 2014) (Aas and Haff, 2006). The package now supports the multivariate generalized hyperbolic (GH) skewed \(t\) distribution and provides a method to estimate the parameters of such a distribution, implemented in the function fitHeavyTail::fit_mvst(). Below is a simple example to illustrate its usage:
```r
# parameter setting for the GH skewed t distribution
N <- 5
T <- 200
nu <- 6
mu <- rnorm(N)
scatter <- diag(N)
gamma <- rnorm(N)

# generate GH skewed t data via the hierarchical structure
taus <- rgamma(n = T, shape = nu/2, rate = nu/2)
X <- matrix(data = mu, nrow = T, ncol = N, byrow = TRUE) +
     matrix(data = gamma, nrow = T, ncol = N, byrow = TRUE) / taus +
     mvtnorm::rmvnorm(n = T, mean = rep(0, N), sigma = scatter) / sqrt(taus)

# fit the GH skewed t model
fitted <- fit_mvst(X)
```

In essence, all the algorithms are based on the maximum likelihood estimation (MLE) of some assumed distribution given the observed data. The difficulty comes from the fact that the optimal solution to such MLE formulations becomes too involved, in the form of a fixed-point equation, and the framework of Majorization-Minimization (MM) algorithms (Sun et al., 2017) becomes key to derive efficient algorithms.
In some cases, the probability distribution function becomes too complicated to manage directly (like the multivariate Student's \(t\) distribution) and it is necessary to resort to a hierarchical distribution that involves some latent variables. In order to deal with such hidden variables, one has to resort to the Expectation-Maximization (EM) algorithm, which interestingly is an instance of the MM algorithm.
The following is a concise description of the algorithms used by the fitting functions (note that the current version of the R package fitHeavyTail does not yet allow a regularization term with a target):
The function fitHeavyTail::fit_Tyler() normalizes the centered samples \(\bar{\boldsymbol{x}}_t = \boldsymbol{x}_t - \boldsymbol{\mu}\) (where \(\boldsymbol{\mu}\) has been previously estimated), which then have an angular Gaussian distribution on the sphere, and performs an MLE based on the MM algorithm (Sun et al., 2014). The formulation including a regularization term is
\[\begin{array}{ll}
\underset{\boldsymbol{\Sigma}}{\textsf{minimize}} &
\begin{aligned}[t]
\frac{T}{2}\log\det(\boldsymbol{\Sigma}) + \frac{N}{2}\sum\limits_{t=1}^{T}\log{\left(\bar{\boldsymbol{x}}_t^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\bar{\boldsymbol{x}}_t\right)}\hspace{2cm}\\
\color{darkred}{+ \;\alpha\left(N\log\left(\textsf{Tr}\left(\boldsymbol{\Sigma}^{-1}\boldsymbol{T}\right)\right) + \log\det(\boldsymbol{\Sigma})\right)}
\end{aligned}
\end{array}\]
where \(\boldsymbol{T}\) is the target matrix (e.g., \(\boldsymbol{T} = \boldsymbol{I}\) or \(\boldsymbol{T} = \frac{1}{N}\textsf{Tr}(\boldsymbol{S})\times\boldsymbol{I}\), with \(\boldsymbol{S}\) being the sample covariance matrix). This leads to the iteration step
\[\boldsymbol{\Sigma}_{k+1} = (1 - \rho)\frac{N}{T}\sum\limits_{t=1}^{T}\frac{\bar{\boldsymbol{x}}_t\bar{\boldsymbol{x}}_t^{\mkern-1mu\raise-1mu\textsf{T}}}{\bar{\boldsymbol{x}}_t^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}_k^{-1}\bar{\boldsymbol{x}}_t} + \rho\frac{N}{\textsf{Tr}\left(\boldsymbol{\Sigma}_k^{-1}\boldsymbol{T}\right)}\boldsymbol{T},\]
where \(\rho = \frac{\alpha}{T/2 + \alpha}\) or, equivalently, \(\alpha = \frac{T}{2}\frac{\rho}{1 - \rho}\), and the initial point is \(\boldsymbol{\Sigma}_{0} = \boldsymbol{S}\). For better numerical stability, one can further normalize the estimate at each iteration: \(\boldsymbol{\Sigma}_{k+1}\leftarrow\boldsymbol{\Sigma}_{k+1}/\textsf{Tr}\left(\boldsymbol{\Sigma}_{k+1}\right)\). The iterations converge to the solution up to a scaling factor if and only if \(1 + \frac{2}{T}\alpha > \frac{N}{T}\) or, equivalently, \(\rho > 1 - \frac{T}{N}\) (Sun et al., 2014) (the correct scaling factor is later obtained via a robust fitting method). If instead the regularization term \(\color{darkred}{\textsf{Tr}\left(\boldsymbol{\Sigma}^{-1}\boldsymbol{T}\right) + \log\det(\boldsymbol{\Sigma})}\) is used, the iteration step becomes
\[\boldsymbol{\Sigma}_{k+1} = (1 - \rho)\frac{N}{T}\sum\limits_{t=1}^{T}\frac{\bar{\boldsymbol{x}}_t\bar{\boldsymbol{x}}_t^{\mkern-1mu\raise-1mu\textsf{T}}}{\bar{\boldsymbol{x}}_t^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}_k^{-1}\bar{\boldsymbol{x}}_t} + \rho\boldsymbol{T}.\]
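For intuition, here is a minimal R sketch of the unregularized fixed-point iteration above (i.e., \(\rho = 0\) with trace normalization at each step); it is only an illustration of the update formula, not the implementation inside fit_Tyler():

```r
# sketch of Tyler's fixed-point iteration with no regularization (rho = 0)
tyler_iteration_sketch <- function(X, max_iter = 100, tol = 1e-6) {
  T <- nrow(X); N <- ncol(X)
  X_bar <- scale(X, center = TRUE, scale = FALSE)  # centered samples
  Sigma <- cov(X)                                  # initial point: sample covariance matrix
  for (k in 1:max_iter) {
    w <- 1 / rowSums((X_bar %*% solve(Sigma)) * X_bar)  # 1 / (x_t' Sigma^{-1} x_t)
    Sigma_new <- (N / T) * crossprod(sqrt(w) * X_bar)   # weighted sum of outer products
    Sigma_new <- Sigma_new / sum(diag(Sigma_new))       # trace normalization for stability
    converged <- max(abs(Sigma_new - Sigma)) < tol
    Sigma <- Sigma_new
    if (converged) break
  }
  Sigma  # shape matrix, correct only up to a scaling factor
}
```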
The function fitHeavyTail::fit_Cauchy() assumes that the data follows a multivariate Cauchy distribution (a \(t\) distribution with \(\nu=1\)) and performs an MLE based on the MM algorithm (Sun et al., 2015). The formulation including a regularization term is
\[\begin{array}{ll}
\underset{\boldsymbol{\mu},\boldsymbol{\Sigma}}{\textsf{minimize}} &
\begin{aligned}[t]
& \frac{T}{2}\log\det(\boldsymbol{\Sigma}) + \frac{N+1}{2}\sum\limits_{t=1}^{T}\log{\left(1 + (\boldsymbol{x}_t - \boldsymbol{\mu})^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}_t - \boldsymbol{\mu})\right)}\\
& \color{darkred}{+\;\alpha\left(N\log\left(\textsf{Tr}\left(\boldsymbol{\Sigma}^{-1}\boldsymbol{T}\right)\right) + \log\det(\boldsymbol{\Sigma})\right) + \gamma \log{\left(1 + (\boldsymbol{\mu} - \boldsymbol{t})^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - \boldsymbol{t})\right)}}
\end{aligned}
\end{array}\]
where \(\boldsymbol{t}\) and \(\boldsymbol{T}\) are the targets for \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), respectively. This leads to the following (accelerated) iteration step (Algorithm 4 in (Sun et al., 2015)):
\[\boldsymbol{\mu}_{k+1} = \frac{(N+1)\sum_{t=1}^{T}w_t\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right)\boldsymbol{x}_t + 2\gamma\, w_{\textsf{tgt}}\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right)\boldsymbol{t}}{(N+1)\sum_{t=1}^{T}w_t\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right) + 2\gamma\, w_{\textsf{tgt}}\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right)}\]
and
\[\boldsymbol{\Sigma}_{k+1} = \beta_k\left\{(1 - \rho)\frac{N+1}{T}\sum\limits_{t=1}^{T}w_t\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right)\left(\boldsymbol{x}_t - \boldsymbol{\mu}_{k+1}\right)\left(\boldsymbol{x}_t - \boldsymbol{\mu}_{k+1}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\\
\hspace{6cm}+ \rho\left(\frac{N}{\textsf{Tr}\left(\boldsymbol{\Sigma}_k^{-1}\boldsymbol{T}\right)}\boldsymbol{T} + \frac{\gamma}{\alpha}w_{\textsf{tgt}}\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right)\left(\boldsymbol{t} - \boldsymbol{\mu}_{k+1}\right)\left(\boldsymbol{t} - \boldsymbol{\mu}_{k+1}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\right)\right\}\]
where \(\rho = \frac{\alpha}{T/2 + \alpha}\),
\[\begin{aligned}
w_t\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right) &= \frac{1}{1 + \left(\boldsymbol{x}_t - \boldsymbol{\mu}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}_t - \boldsymbol{\mu}\right)},\\
w_{\textsf{tgt}}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right) &= \frac{1}{1 + \left(\boldsymbol{t} - \boldsymbol{\mu}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{t} - \boldsymbol{\mu}\right)},\\
\beta_k &= \frac{T + 2\gamma}{(N+1)\sum_{t=1}^{T}w_t\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right) + 2\gamma\, w_{\textsf{tgt}}\left(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\right)},
\end{aligned}\]
and the initial point is \(\boldsymbol{\mu}_{0} = \frac{1}{T}\sum_{t=1}^{T}\boldsymbol{x}_t\) and \(\boldsymbol{\Sigma}_{0} = \boldsymbol{S}\) (note that this initial point is not totally correct due to a scaling factor). The iterations converge to the solution if and only if the conditions of Corollary 3 in (Sun et al., 2015) are satisfied.
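As with the Tyler estimator, a bare-bones R sketch of the unregularized case (\(\rho = 0\), \(\gamma = 0\)) of the accelerated iteration above may help fix ideas; it is not the implementation inside fit_Cauchy():

```r
# sketch of the Cauchy MLE iteration with no regularization (rho = 0, gamma = 0)
cauchy_iteration_sketch <- function(X, max_iter = 100, tol = 1e-6) {
  T <- nrow(X); N <- ncol(X)
  mu <- colMeans(X)   # initial point
  Sigma <- cov(X)     # initial point (correct only up to a scaling factor)
  for (k in 1:max_iter) {
    Xc <- sweep(X, 2, mu)                               # x_t - mu_k
    w <- 1 / (1 + rowSums((Xc %*% solve(Sigma)) * Xc))  # Cauchy weights w_t(mu_k, Sigma_k)
    mu_new <- colSums(w * X) / sum(w)                   # weighted mean
    Xc <- sweep(X, 2, mu_new)                           # x_t - mu_{k+1}
    Sigma_new <- crossprod(sqrt(w) * Xc) / sum(w)       # accelerated scatter update (beta_k absorbed)
    converged <- max(abs(mu_new - mu)) < tol && max(abs(Sigma_new - Sigma)) < tol
    mu <- mu_new; Sigma <- Sigma_new
    if (converged) break
  }
  list(mu = mu, scatter = Sigma)
}
```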
The function fitHeavyTail::fit_mvt() assumes the data follows a multivariate Student's \(t\) distribution and performs an MLE based on the EM algorithm (Liu and Rubin, 1995; Liu et al., 1998). The MLE formulation (without missing values) is
\[\begin{array}{ll}
\underset{\boldsymbol{\mu},\boldsymbol{\Sigma},\nu}{\textsf{minimize}} &
\begin{aligned}[t]
\frac{T}{2}\log\det(\boldsymbol{\Sigma}) + \frac{N+\nu}{2}\sum\limits_{t=1}^{T}\log{\left(1 + \frac{1}{\nu}(\boldsymbol{x}_t - \boldsymbol{\mu})^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}_t - \boldsymbol{\mu})\right)}\\
-\; T\log{\Gamma\left(\frac{N+\nu}{2}\right)} + T\log{\Gamma\left(\frac{\nu}{2}\right)} + \frac{TN}{2}\log{\nu}.
\end{aligned}
\end{array}\]
Since its direct minimization is complicated, the EM algorithm instead iteratively optimizes the Q function at iteration \(k\):
\[\begin{array}{ll}
\underset{\boldsymbol{\mu},\boldsymbol{\Sigma},\nu}{\textsf{minimize}} &
\begin{aligned}[t]
\frac{T}{2}\log\det(\boldsymbol{\Sigma}) + \sum\limits_{t=1}^{T}\left\{\frac{\textsf{E}_k[\tau_t]}{2}(\boldsymbol{x}_t - \boldsymbol{\mu})^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}_t - \boldsymbol{\mu}) + \frac{\nu}{2}\textsf{E}_k[\tau_t] - \frac{\nu}{2}\textsf{E}_k[\log{\tau_t}]\right\}\\
-\; \frac{T\nu}{2}\log{\frac{\nu}{2}} + T\log{\Gamma\left(\frac{\nu}{2}\right)}
\end{aligned}
\end{array}\]
where
\[\textsf{E}_k[\tau_t] = \frac{\nu_k + N}{\nu_k + \left(\boldsymbol{x}_t - \boldsymbol{\mu}_k\right)^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}_k^{-1}\left(\boldsymbol{x}_t - \boldsymbol{\mu}_k\right)}.\]
The (accelerated) solution is given by
\[\boldsymbol{\mu}_{k+1} = \frac{\sum_{t=1}^{T}\textsf{E}_k[\tau_t]\boldsymbol{x}_t}{\sum_{t=1}^{T}\textsf{E}_k[\tau_t]},\]
\[\boldsymbol{\Sigma}_{k+1} = \frac{1}{\alpha_k}\frac{1}{T}\sum_{t=1}^{T}\textsf{E}_k[\tau_t]\left(\boldsymbol{x}_t - \boldsymbol{\mu}_{k+1}\right)\left(\boldsymbol{x}_t - \boldsymbol{\mu}_{k+1}\right)^{\mkern-1mu\raise-1mu\textsf{T}},\]
with \(\alpha_k = \frac{1}{T}\sum_{t=1}^{T}\textsf{E}_k[\tau_t]\), and \(\nu_{k+1}\) can be found by either of the following two updates:
by minimizing the \(\nu\)-dependent terms of the Q function (a one-dimensional problem; see the sketch after these formulas):
\[\nu_{k+1} = \arg\min_\nu\left\{\frac{\nu}{2}\sum_{t=1}^{T}\left(\textsf{E}_k[\tau_t] - \textsf{E}_k[\log{\tau_t}]\right) - \frac{\nu}{2}T\log{\frac{\nu}{2}} + T\log{\Gamma\left(\frac{\nu}{2}\right)}\right\};\]
or, alternatively, by minimizing the \(\nu\)-dependent terms of the original negative log-likelihood (with \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) fixed at their latest updates):
\[\nu_{k+1} = \arg\min_\nu \left\{\frac{N + \nu}{2}\sum_{t=1}^{T}\log{\left(\nu + \left(\boldsymbol{x}_t - \boldsymbol{\mu}_{k+1}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}_{k+1}^{-1}\left(\boldsymbol{x}_t - \boldsymbol{\mu}_{k+1}\right)\right)}\\
\hspace{6cm}- T\log{\Gamma\left(\frac{N + \nu}{2}\right)} + T\log{\Gamma\left(\frac{\nu}{2}\right)} - \frac{\nu}{2}T\log{\nu}\right\}.\]
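Both updates are one-dimensional minimizations over \(\nu\) that can be solved numerically. As a rough sketch of the first one (the search interval is our own arbitrary choice, and E_tau, E_logtau stand for the vectors \(\textsf{E}_k[\tau_t]\) and \(\textsf{E}_k[\log\tau_t]\) computed in the E-step):

```r
# sketch: nu update via a one-dimensional search of the Q function
update_nu_Q <- function(E_tau, E_logtau, interval = c(2.5, 100)) {
  T <- length(E_tau)
  Q_nu <- function(nu)
    (nu / 2) * sum(E_tau - E_logtau) - (T * nu / 2) * log(nu / 2) + T * lgamma(nu / 2)
  optimize(Q_nu, interval = interval)$minimum
}
```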
The initial point is \(\boldsymbol{\mu}_{0} = \frac{1}{T}\sum_{t=1}^{T}\boldsymbol{x}_t\), \(\boldsymbol{\Sigma}_{0} = \frac{\nu_0-2}{\nu_0}\boldsymbol{S}\), and \(\nu_0 = 2/\kappa + 4\), with \(\kappa = \left[\frac{1}{3}\frac{1}{N}\sum_{i=1}^N \textsf{kurt}_i\right]^+\) and
\[\textsf{kurt}_i = \frac{(T-1)}{(T-2)(T-3)}\left((T+1)\left(\frac{m_i^{(4)}}{\big(m_i^{(2)}\big)^2} - 3\right) + 6\right),\]
where \(m_i^{(q)} = \frac{1}{T}\sum_{t=1}^{T}(x_{it}-\bar{x}_i)^q\) denotes the \(q\)th order sample moment. The algorithm with missing values in \(\boldsymbol{x}_t\) becomes more cumbersome, but it is essentially the same idea. This function can also incorporate a factor model structure on the covariance matrix, \(\boldsymbol{\Sigma} = \boldsymbol{B}\boldsymbol{B}^{\mkern-1mu\raise-1mu\textsf{T}} + \textsf{Diag}(\boldsymbol{\psi})\), which requires a more sophisticated algorithm (Zhou et al., 2019).
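Putting the pieces together, the following bare-bones R sketch runs the E-step and M-step above with \(\nu\) held fixed (omitting the \(\nu\) update, the acceleration safeguards, missing-value handling, and the factor model option); it is an illustration of the algorithm, not the code inside fit_mvt():

```r
# bare-bones EM iteration for the multivariate t with nu held fixed
mvt_em_sketch <- function(X, nu = 6, max_iter = 100, tol = 1e-6) {
  T <- nrow(X); N <- ncol(X)
  mu <- colMeans(X)                # initial point
  Sigma <- (nu - 2) / nu * cov(X)  # initial scatter from the sample covariance matrix
  for (k in 1:max_iter) {
    # E-step: E_k[tau_t] given the current mu and Sigma
    Xc <- sweep(X, 2, mu)
    delta <- rowSums((Xc %*% solve(Sigma)) * Xc)   # Mahalanobis distances
    E_tau <- (nu + N) / (nu + delta)
    # M-step: weighted mean and (accelerated) scatter update
    mu_new <- colSums(E_tau * X) / sum(E_tau)
    Xc <- sweep(X, 2, mu_new)
    Sigma_new <- crossprod(sqrt(E_tau) * Xc) / sum(E_tau)
    converged <- max(abs(mu_new - mu)) < tol && max(abs(Sigma_new - Sigma)) < tol
    mu <- mu_new; Sigma <- Sigma_new
    if (converged) break
  }
  list(mu = mu, scatter = Sigma, cov = nu / (nu - 2) * Sigma)
}
```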
The function fitHeavyTail::fit_mvst() assumes the data follows a multivariate skewed \(t\) distribution and performs an MLE based on the EM algorithm (Aas and Haff, 2006). Suppose \(\boldsymbol{x}_{t}\) is a random vector following the GH skewed \(t\) distribution; its probability density function (pdf) is determined by four parameters, namely, the location vector \(\boldsymbol{\mu}\), the scatter matrix \(\boldsymbol{\Sigma}\), the degrees of freedom \(\nu\), and the skewness vector \(\boldsymbol{\gamma}\). When \(\boldsymbol{\gamma} = \boldsymbol{0}\), the GH skewed \(t\) distribution reduces to the Student's \(t\) distribution (considered in fitHeavyTail::fit_mvt()). The expression of the GH skewed \(t\) pdf is rather complicated:
\[f\left(\boldsymbol{x}\right) = \frac{2^{1-\frac{\nu+N}{2}}}{\Gamma\left(\frac{\nu}{2}\right)\left(\pi\nu\right)^{\frac{N}{2}}\vert\boldsymbol{\Sigma}\vert^{\frac{1}{2}}}\frac{K_{\frac{\nu+N}{2}}\left(\left[\left(\nu+d\left(\boldsymbol{x}\right)\right)\boldsymbol{\gamma}^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\gamma}\right]^{\frac{1}{2}}\right)\exp\left(\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\gamma}\right)}{\left[\left(\nu+d\left(\boldsymbol{x}\right)\right)\boldsymbol{\gamma}^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\gamma}\right]^{-\frac{\nu+N}{4}}\left(1+\frac{d\left(\boldsymbol{x}\right)}{\nu}\right)^{\frac{\nu+N}{2}}},\]
where \(K_{\alpha}\) is the modified Bessel function and \(d\left(\boldsymbol{x}\right) = \left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\mkern-1mu\raise-1mu\textsf{T}}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\). Fortunately, \(\boldsymbol{x}_{t}\) can be represented via a hierarchical structure:
\[\begin{aligned}
\boldsymbol{x}_{t}\mid\tau_{t} &\sim \mathcal{N}\left(\boldsymbol{\mu} + \frac{1}{\tau_{t}}\boldsymbol{\gamma}, \frac{1}{\tau_{t}}\boldsymbol{\Sigma}\right),\\
\tau_{t} &\sim \text{Gamma}\left(\frac{\nu}{2},\frac{\nu}{2}\right).
\end{aligned}\]
Making use of such structure, an EM-based algorithm has been proposed in (Aas and Haff, 2006) to obtain the MLE estimators of these parameters.
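Going back to the fit_mvst() example above, one can sanity-check the estimates against the parameters used to generate the data (assuming, per the package documentation, that the returned list contains the elements mu, gamma, and nu):

```r
# compare estimated parameters with the true ones from the generation example above
cbind(mu_true = mu, mu_hat = fitted$mu)
cbind(gamma_true = gamma, gamma_hat = fitted$gamma)
c(nu_true = nu, nu_hat = fitted$nu)
```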