| Type: | Package |
| Title: | Distributions Hermite Polynomial Approximation |
| Version: | 1.3.3 |
| Date: | 2023-11-29 |
| Author: | Potanin Bogdan |
| Maintainer: | Potanin Bogdan <bogdanpotanin@gmail.com> |
| Description: | Multivariate conditional and marginal densities, moments, cumulative distribution functions as well as binary choice and sample selection models based on Hermite polynomial approximation which was proposed and described by A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>. |
| License: | GPL-3 |
| Imports: | Rcpp (≥ 1.0.10), RcppParallel (≥ 5.0.0) |
| LinkingTo: | Rcpp, RcppArmadillo, RcppParallel |
| RoxygenNote: | 7.2.3 |
| Encoding: | UTF-8 |
| Suggests: | ggplot2, mvtnorm, titanic, sampleSelection, GA (≥ 3.2) |
| NeedsCompilation: | yes |
| SystemRequirements: | GNU make |
| Packaged: | 2023-11-29 05:10:28 UTC; Bogdan |
| Repository: | CRAN |
| Date/Publication: | 2023-11-29 07:00:10 UTC |
B-splines generation, estimation and combination
Description
Function bsplineGenerate generates a list of all basis splines with the appropriate knots vector and degree. Function bsplineComb allows one to get linear combinations of these b-splines with particular weights. Function bsplineEstimate estimates the spline at points x. The structure of this spline should be provided via the m and knots arguments.
Usage
bsplineGenerate(knots, degree, is_names = TRUE)

bsplineEstimate(x, m, knots)

bsplineComb(splines, weights)

Arguments
knots | numeric vector of spline knots sorted in ascending order. |
degree | positive integer representing degree of the spline. |
is_names | logical; if TRUE (default) then rows and columns of the spline matrices will have names. Set it to FALSE in order to get a notable speed boost. |
x | numeric vector representing the points at which the spline should be estimated. |
m | numeric matrix whose rows correspond to spline intervals while columns represent variable powers. Therefore the element in the i-th row and j-th column is the coefficient associated with the variable that 1) belongs to the i-th interval, i.e. between the i-th and (i + 1)-th knots, and 2) is raised to the power of (j - 1). |
splines | list returned by the bsplineGenerate function. |
weights | numeric vector of weights of the same length as the splines list. |
Details
In contrast to the bs function, bsplineGenerate generates a spline basis in the form of a list containing information concerning the structure of these b-splines. In order to evaluate one of these b-splines at particular points the bsplineEstimate function should be applied.
Value
Function bsplineGenerate returns a list. Each element of this list is a list containing the following information concerning the b-spline structure:
knots - knots vector of the b-spline.
m - matrix representing polynomial coefficients for each interval of the spline in the same manner as for the m argument (see this argument's description above).
ind - index of the b-spline.
Function bsplineComb returns a list with the following elements:
knots - knots vector of the splines.
m - linear combination of the splines matrices; coefficients of this linear combination are given via the weights argument.
Function bsplineEstimate returns a numeric vector of spline values calculated at points x via the spline with knots vector knots and matrix m.
Examples
# Let's generate all b-splines of degree 3 with knots
# vector (-2.1, 1.5, 1.5, 2.2, 3.7, 4.2, 5)
b <- bsplineGenerate(knots = c(-2.1, 1.5, 1.5, 2.2, 3.7, 4.2, 5),
                     degree = 3)

# Get the first of these b-splines
b[[1]]

# Take a linear combination of these splines with
# weights 1.6, -1.2 and 3.2
b_comb <- bsplineComb(splines = b, weights = c(1.6, -1.2, 3.2))

# Estimate this spline value at points (-3, 0.7, 2.5, 3.8, 10)
b_values <- bsplineEstimate(x = c(-3, 0.7, 2.5, 3.8, 10),
                            knots = b_comb$knots,
                            m = b_comb$m)

# Visualize the spline
s <- seq(from = 0, to = 5, length = 1000)
b_values_s <- bsplineEstimate(x = s,
                              knots = b_comb$knots,
                              m = b_comb$m)
plot(s, b_values_s)

Extract coefficients from hpaBinary object
Description
Extract coefficients from hpaBinary object
Usage
## S3 method for class 'hpaBinary'
coef(object, ...)

Arguments
object | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
Extract coefficients from hpaML object
Description
Extract coefficients from hpaML object
Usage
## S3 method for class 'hpaML'
coef(object, ...)

Arguments
object | Object of class "hpaML" |
... | further arguments (currently ignored) |
Extract coefficients from hpaSelection object
Description
Extract coefficients from hpaSelection object
Usage
## S3 method for class 'hpaSelection'
coef(object, ..., type = "all")

Arguments
object | Object of class "hpaSelection" |
... | further arguments (currently ignored) |
type | character; if "all" (default) then all estimated parameter values will be returned. If "selection" then selection equation coefficient estimates will be provided. If "outcome" then outcome equation coefficient estimates will be returned. |
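A minimal usage sketch (assuming a fitted "hpaSelection" object named model; the object name is hypothetical) illustrating the type argument:

# All estimated parameters (default)
coef(model)
# Selection equation coefficients only
coef(model, type = "selection")
# Outcome equation coefficients only
coef(model, type = "outcome")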
Calculate normal pdf in parallel
Description
Calculate in parallel, for each value from vector x, the density function of the normal distribution with mean equal to mean and standard deviation equal to sd.
Usage
dnorm_parallel(x, mean = 0, sd = 1, is_parallel = FALSE)

Arguments
x | numeric vector of quantiles. |
mean | double value representing the mean of the normal distribution. |
sd | positive double value representing the standard deviation of the normal distribution. |
is_parallel | if TRUE then parallel computations will be used to speed up the calculations. |
Examples
## Consider normal distribution with mean 3 and standard deviation 5.
## Calculate its density function at points 2 and 3.

# Create vector of points
my_points <- c(2, 3)

# Calculate pdf at these points
# (set is_parallel = TRUE in order
# to turn on parallel computations)
dnorm_parallel(my_points, 3, 5, is_parallel = FALSE)

Semi-nonparametric single index binary choice model estimation
Description
This function performs semi-nonparametric (SNP) maximum likelihood estimation of a single index binary choice model using the Hermite polynomial based approximating function proposed by Gallant and Nychka in 1987. Please see the dhpa 'Details' section to get more information concerning this approximating function.
Usage
hpaBinary(formula, data, K = 1L, mean_fixed = NA_real_, sd_fixed = NA_real_,
          constant_fixed = 0, coef_fixed = TRUE, is_x0_probit = TRUE,
          is_sequence = FALSE, x0 = numeric(0), cov_type = "sandwich",
          boot_iter = 100L, is_parallel = FALSE, opt_type = "optim",
          opt_control = NULL, is_validation = TRUE)

Arguments
formula | an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. All variables in formula should be numeric vectors. |
data | data frame containing the variables in the model. |
K | non-negative integer representing polynomial degree (order). |
mean_fixed | numeric value for the binary choice equation random error density mean parameter. Set it to NA in order to estimate this parameter rather than fix it. |
sd_fixed | numeric value for the binary choice equation random error density sd parameter. Set it to NA in order to estimate this parameter rather than fix it. |
constant_fixed | numeric value for the binary choice equation constant parameter. Set it to NA in order to estimate this parameter rather than fix it. |
coef_fixed | logical value indicating whether the binary equation first independent variable coefficient should be fixed (TRUE) or estimated (FALSE). |
is_x0_probit | logical; if TRUE (default) then initial values for the optimization routine are obtained from a probit model (see 'Details' concerning the related scale adjustment). |
is_sequence | if TRUE then the function calculates models with polynomial degrees from 0 to K, each time using initial values obtained from the previous step. In this case the function will return a list of models where the i-th list element corresponds to the model calculated under K = (i - 1). |
x0 | numeric vector of optimization routine initial values. Note that |
cov_type | character determining the type of covariance matrix to be returned and used for summary. If cov_type = "sandwich" (default) then the sandwich covariance matrix estimator is used; if cov_type = "bootstrap" then the covariance matrix is estimated via bootstrap (see boot_iter). |
boot_iter | the number of bootstrap iterations for the cov_type = "bootstrap" covariance matrix estimator. |
is_parallel | if TRUE then parallel calculations will be applied where possible. |
opt_type | string value determining the type of the optimization routine to be applied. The default is "optim" while the alternative is "GA" (genetic algorithm); see 'Details'. |
opt_control | a list containing arguments to be passed to the optimization routine depending on the opt_type argument value; see 'Details'. |
is_validation | logical value indicating whether function input arguments should be validated. Set it to FALSE in order to get a slight performance boost (default value is TRUE). |
Details
The densities Hermite polynomial approximation approach has been proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate an unknown distribution density with a scaled Hermite polynomial. For more information please refer to the literature listed below.
Let's use the notations introduced in the dhpa 'Details' section. Function hpaBinary maximizes the following quasi log-likelihood function:
\ln L(\gamma_{0}, \gamma, \alpha, \mu, \sigma; x) = \sum\limits_{i:z_{i}=1} \ln\left(\overline{F}_{\xi}(-(\gamma_{0}+x_{i}\gamma), \infty;\alpha, \mu, \sigma)\right) +
+\sum\limits_{i:z_{i}=0} \ln\left(\overline{F}_{\xi}(-\infty, -(\gamma_{0} + x_{i}\gamma);\alpha, \mu, \sigma)\right),
where (in addition to previously defined notations):
x_{i} - is a row vector of regressors derived from data according to formula.

\gamma - is a column vector of regression coefficients.

\gamma_{0} - is the constant.

z_{i} - is the binary (0 or 1) dependent variable defined in formula.
Note that \xi is one-dimensional and K corresponds to K = K_{1}.
The first polynomial coefficient (zero powers) is set to 1 for identification purposes, i.e. \alpha_{0}=1.

If coef_fixed is TRUE then the coefficient for the first independent variable in formula will be fixed to 1, i.e. \gamma_{1}=1.

If mean_fixed is not NA then \mu = mean_fixed is fixed.

If sd_fixed is not NA then \sigma = sd_fixed is fixed. However, if is_x0_probit = TRUE then parameter \sigma will be scale adjusted in order to provide a better initial point for the optimization routine. Please extract the adjusted \sigma value from the function's output list. The same holds for mean_fixed.
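A minimal sketch (assuming a hypothetical data frame df with a binary variable y and numeric regressors x1 and x2) of combining these fixing arguments:

# Fix the random error mean to 0 and estimate the remaining parameters;
# do not fix the first regressor's coefficient to 1
model_fixed <- hpaBinary(y ~ x1 + x2, data = df, K = 2,
                         mean_fixed = 0, coef_fixed = FALSE)
summary(model_fixed)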
Rows in data which have at least one NA value in the variables mentioned in formula will be ignored.

All variables mentioned in formula should be numeric vectors.
The function calculates standard errors via the sandwich estimator and significance levels are reported taking into account quasi maximum likelihood estimator (QMLE) asymptotic normality. If one wants to switch from QMLE to the semi-nonparametric estimator (SNPE) during hypothesis testing then the covariance matrix should be estimated again using bootstrap.
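A minimal sketch (same hypothetical df, y, x1 and x2 as above) of re-estimating the covariance matrix via bootstrap for SNPE based inference:

# Bootstrap covariance matrix estimate with 50 iterations
model_boot <- hpaBinary(y ~ x1 + x2, data = df, K = 2,
                        cov_type = "bootstrap", boot_iter = 50)
summary(model_boot)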
This function maximizes the (quasi) log-likelihood function via the optim function, setting its method argument to "BFGS". If opt_type = "GA" then a genetic algorithm from the ga function will be additionally applied (after optim, putting its solution (par) into the suggestions matrix) in order to perform global optimization. Note that global optimization takes much more time (usually minutes but sometimes hours or even days). The number of iterations and population size of the genetic algorithm will grow linearly along with the number of estimated parameters. If it seems that the global maximum has not been found then it is possible to continue the search by restarting the function with its input argument x0 set to the x1 output value. Note that if cov_type = "bootstrap" then the ga function will not be used during bootstrap iterations since it may be extremely time consuming.
If opt_type = "GA" then opt_control should be the list containing the values to be passed to the ga function. It is possible to pass arguments lower, upper, popSize, pcrossover, pmutation, elitism, maxiter, suggestions, optim, optimArgs, seed and monitor. Note that it is possible to set population, selection, crossover and mutation arguments, changing ga default parameters, via the gaControl function. Information on these arguments is provided in the ga documentation. In order to provide manual values for lower and upper bounds please follow the parameters ordering mentioned above for the x0 argument. If these bounds are not provided manually then they (except those related to the polynomial coefficients) will depend on the estimates obtained by local optimization via the optim function (these estimates will be in the middle between lower and upper). Specifically, for each sd parameter the lower (upper) bound is 5 times lower (higher) than this parameter's optim estimate. For each mean and regression coefficient parameter its lower and upper bounds deviate from the corresponding optim estimate by two absolute values of this estimate. Finally, lower and upper bounds for each polynomial coefficient are -10 and 10 correspondingly (and do not depend on their optim estimates).
The following arguments differ from their defaults in ga:
pmutation = 0.2, optim = TRUE, optimArgs = list("method" = "Nelder-Mead", "poptim" = 0.2, "pressel" = 0.5), seed = 8, elitism = 2 + round(popSize * 0.1).
Let's denote by n_reg the number of regressors included into the formula. The arguments popSize and maxiter of the ga function have been set proportional to the number of estimated polynomial coefficients and independent variables:
popSize = 10 + 5 * (K + 1) + 2 * n_reg
maxiter = 50 * (1 + K) + 10 * n_reg
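A minimal sketch (same hypothetical df, y, x1 and x2 as above) of passing manual genetic algorithm settings via opt_control:

# Apply the genetic algorithm after local optimization and
# override some of its defaults
model_ga <- hpaBinary(y ~ x1 + x2, data = df, K = 2,
                      opt_type = "GA",
                      opt_control = list(popSize = 30,
                                         maxiter = 100,
                                         pmutation = 0.3))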
Value
This function returns an object of class "hpaBinary".
An object of class "hpaBinary" is a list containing the following components:
optim - optim function output. If opt_type = "GA" then it is the list containing optim and ga functions outputs.
x1 - numeric vector of distribution parameters estimates.
mean - mean (mu) parameter of density function estimate.
sd - sd (sigma) parameter of density function estimate.
pol_coefficients - polynomial coefficients estimates.
pol_degrees - the same as the K input parameter.
coefficients - regression (single index) coefficients estimates.
cov_mat - covariance matrix estimate.
marginal_effects - marginal effects matrix where columns are variables and rows are observations.
results - numeric matrix representing estimation results.
log-likelihood - value of the Log-Likelihood function.
AIC - AIC value.
errors_exp - random error expectation estimate.
errors_var - random error variance estimate.
dataframe - data frame containing variables mentioned in formula without NA values.
model_Lists - lists containing information about fixed parameters and parameters indexes in x1.
n_obs - number of observations.
z_latent - latent variable (single index) estimates.
z_prob - probabilities of positive outcome (i.e. 1) estimates.
References
A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>
See Also
summary.hpaBinary, predict.hpaBinary, plot.hpaBinary, logLik.hpaBinary
Examples
## Estimate survival probability on Titanic

library("titanic")

# Prepare data set converting
# all variables to numeric vectors
h <- data.frame("male" = as.numeric(titanic_train$Sex == "male"))
h$class_1 <- as.numeric(titanic_train$Pclass == 1)
h$class_2 <- as.numeric(titanic_train$Pclass == 2)
h$class_3 <- as.numeric(titanic_train$Pclass == 3)
h$sibl <- titanic_train$SibSp
h$survived <- titanic_train$Survived
h$age <- titanic_train$Age
h$parch <- titanic_train$Parch
h$fare <- titanic_train$Fare

# Estimate model parameters
model_hpa_1 <- hpaBinary(survived ~ class_1 + class_2 +
                         male + age + sibl + parch + fare,
                         K = 3, data = h)
# Get summary
summary(model_hpa_1)

# Get predicted probabilities
pred_hpa_1 <- predict(model_hpa_1)

# Calculate number of correct predictions
hpa_1_correct_0 <- sum((pred_hpa_1 < 0.5) &
                       (model_hpa_1$dataframe$survived == 0))
hpa_1_correct_1 <- sum((pred_hpa_1 >= 0.5) &
                       (model_hpa_1$dataframe$survived == 1))
hpa_1_correct <- hpa_1_correct_1 + hpa_1_correct_0

# Plot random errors density approximation
plot(model_hpa_1)

## Estimate parameters on data simulated from Student distribution

library("mvtnorm")
set.seed(123)

# Simulate independent variables from normal distribution
n <- 5000
X <- rmvnorm(n = n, mean = c(0, 0),
             sigma = matrix(c(1, 0.5, 0.5, 1), ncol = 2))

# Simulate random errors from Student distribution
epsilon <- rt(n, 5) * (3 / sqrt(5))

# Calculate latent and observable variables values
z_star <- 1 + X[, 1] + X[, 2] + epsilon
z <- as.numeric((z_star > 0))

# Store the results into data frame
h <- as.data.frame(cbind(z, X))
names(h) <- c("z", "x1", "x2")

# Estimate model parameters
model <- hpaBinary(formula = z ~ x1 + x2, data = h, K = 3)
summary(model)

# Get predicted probabilities of 1 values
predict(model)

# Plot density function approximation
plot(model)

Probabilities and Moments Hermite Polynomial Approximation
Description
Approximation of truncated, marginal and conditional densities, moments and cumulative probabilities of multivariate distributions via the Hermite polynomial based approach proposed by Gallant and Nychka in 1987.
The density approximating function is a scale adjusted product of two terms. The first one is a squared multivariate polynomial of pol_degrees degrees with pol_coefficients coefficients vector. The second is a product of independent normal random variables' densities with expected values and standard deviations given by the mean and sd vectors correspondingly. The approximating function satisfies the properties of a density function, thus generating a broad family of distributions. Characteristics of these distributions (moments, quantiles, probabilities and so on) may provide accurate approximations to characteristics of other distributions. Moreover, it is usually possible to provide an arbitrarily close approximation by increasing the polynomial degrees.
Usage
dhpa(x, pol_coefficients = numeric(0), pol_degrees = numeric(0),
     given_ind = numeric(0), omit_ind = numeric(0), mean = numeric(0),
     sd = numeric(0), is_parallel = FALSE, log = FALSE, is_validation = TRUE)

phpa(x, pol_coefficients = numeric(0), pol_degrees = numeric(0),
     given_ind = numeric(0), omit_ind = numeric(0), mean = numeric(0),
     sd = numeric(0), is_parallel = FALSE, log = FALSE, is_validation = TRUE)

ihpa(x_lower = numeric(0), x_upper = numeric(0), pol_coefficients = numeric(0),
     pol_degrees = numeric(0), given_ind = numeric(0), omit_ind = numeric(0),
     mean = numeric(0), sd = numeric(0), is_parallel = FALSE, log = FALSE,
     is_validation = TRUE)

ehpa(x = numeric(0), pol_coefficients = numeric(0), pol_degrees = numeric(0),
     given_ind = numeric(0), omit_ind = numeric(0), mean = numeric(0),
     sd = numeric(0), expectation_powers = numeric(0), is_parallel = FALSE,
     is_validation = TRUE)

etrhpa(tr_left = numeric(0), tr_right = numeric(0),
       pol_coefficients = numeric(0), pol_degrees = numeric(0),
       mean = numeric(0), sd = numeric(0), expectation_powers = numeric(0),
       is_parallel = FALSE, is_validation = TRUE)

dtrhpa(x, tr_left = numeric(0), tr_right = numeric(0),
       pol_coefficients = numeric(0), pol_degrees = numeric(0),
       given_ind = numeric(0), omit_ind = numeric(0), mean = numeric(0),
       sd = numeric(0), is_parallel = FALSE, log = FALSE, is_validation = TRUE)

itrhpa(x_lower = numeric(0), x_upper = numeric(0), tr_left = numeric(0),
       tr_right = numeric(0), pol_coefficients = numeric(0),
       pol_degrees = numeric(0), given_ind = numeric(0), omit_ind = numeric(0),
       mean = numeric(0), sd = numeric(0), is_parallel = FALSE, log = FALSE,
       is_validation = TRUE)

dhpaDiff(x, pol_coefficients = numeric(0), pol_degrees = numeric(0),
         given_ind = numeric(0), omit_ind = numeric(0), mean = numeric(0),
         sd = numeric(0), type = "pol_coefficients", is_parallel = FALSE,
         log = FALSE, is_validation = TRUE)

ehpaDiff(x = numeric(0), pol_coefficients = numeric(0),
         pol_degrees = numeric(0), given_ind = numeric(0),
         omit_ind = numeric(0), mean = numeric(0), sd = numeric(0),
         expectation_powers = numeric(0), type = "pol_coefficients",
         is_parallel = FALSE, log = FALSE, is_validation = TRUE)

ihpaDiff(x_lower = numeric(0), x_upper = numeric(0),
         pol_coefficients = numeric(0), pol_degrees = numeric(0),
         given_ind = numeric(0), omit_ind = numeric(0), mean = numeric(0),
         sd = numeric(0), type = "pol_coefficients", is_parallel = FALSE,
         log = FALSE, is_validation = TRUE)

qhpa(p, x = matrix(1, 1), pol_coefficients = numeric(0),
     pol_degrees = numeric(0), given_ind = numeric(0), omit_ind = numeric(0),
     mean = numeric(0), sd = numeric(0))

rhpa(n, pol_coefficients = numeric(0), pol_degrees = numeric(0),
     mean = numeric(0), sd = numeric(0))

Arguments
x | numeric matrix of function arguments and conditional values. Note that the number of columns of x should be the same as the length of the pol_degrees vector. |
pol_coefficients | numeric vector of polynomial coefficients. |
pol_degrees | non-negative integer vector of polynomial degrees (orders). |
given_ind | logical or numeric vector indicating whether the corresponding random vector component is conditioned. By default it is a logical vector of FALSE values. If numeric then it should contain the indexes of the conditioned components (see 'Details'). |
omit_ind | logical or numeric vector indicating whether the corresponding random vector component is omitted. By default it is a logical vector of FALSE values. If numeric then it should contain the indexes of the omitted components (see 'Details'). |
mean | numeric vector of expected values. |
sd | positive numeric vector of standard deviations. |
is_parallel | if TRUE then parallel calculations will be applied where possible. |
log | logical; if TRUE then the natural logarithm of the calculated values will be returned. |
is_validation | logical value indicating whether function input arguments should be validated. Set it to FALSE in order to get a slight performance boost (default value is TRUE). |
x_lower | numeric matrix of lower integration limits. Note that x_lower should be of the same dimensions as x_upper. |
x_upper | numeric matrix of upper integration limits. Note that x_upper should have the same number of columns as the length of the pol_degrees vector. |
expectation_powers | integer vector of random vector components powers. |
tr_left | numeric matrix of left (lower) truncation limits. Note that tr_left should have the same number of columns as tr_right. |
tr_right | numeric matrix of right (upper) truncation limits. Note that tr_right should have the same number of columns as tr_left. |
type | determines the partial derivatives to be included into the gradient. If type = "pol_coefficients" (default) then derivatives with respect to the polynomial coefficients are calculated; other available values include "mean", "sd", "x" (for dhpaDiff), "x_lower" and "x_upper" (for ihpaDiff) and "all" (see 'Examples'). |
p | numeric vector of probabilities |
n | positive integer representing the number of observations |
Details
It is possible to approximate densities (dhpa), cumulative probabilities (phpa, ihpa) and moments (ehpa) as well as their truncated (dtrhpa, itrhpa, etrhpa) forms and gradients (dhpaDiff, ihpaDiff). Note that phpa is a special case of ihpa where x corresponds to x_upper while x_lower is a matrix of negative infinity values. So phpa is intended to approximate cumulative distribution functions while ihpa approximates probabilities that random vector components will be between values determined by rows of the x_lower and x_upper matrices. Further details are given below.
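A minimal sketch (parameter values are purely illustrative) of this relationship between phpa and ihpa in the univariate case:

# Univariate distribution with a degree 2 polynomial
pol_degrees <- 2
pol_coefficients <- c(1, 0.1, -0.01)
x <- matrix(c(-0.5, 0, 0.5), ncol = 1)
# Cumulative probabilities via phpa
phpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees)
# The same values via ihpa with lower limits set to negative infinity
ihpa(x_lower = matrix(-Inf, nrow = nrow(x), ncol = 1), x_upper = x,
     pol_coefficients = pol_coefficients, pol_degrees = pol_degrees)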
Since the density approximating function is non-negative and integrates to 1 it is a density function for some m-variate random vector \xi. The approximating function f_{\xi}(x) has the following form:
f_{\xi }(x) = f_{\xi }(x;\mu, \sigma, \alpha) =\frac{1}{\psi }\prod\limits_{t=1}^{m}\phi ({x}_{t};{\mu }_{t},{\sigma }_{t}){{\left( \sum\limits_{{i}_{1}=0}^{{K}_{1}}{...}\sum\limits_{{i}_{m}=0}^{{K}_{m}}{{{\alpha }_{({{i}_{1}},...,{{i}_{m}})}}\prod\limits_{r=1}^{m}x_{r}^{{{i}_{r}}}} \right)}^{2}}
\psi =\sum\limits_{{i}_{1}=0}^{{K}_{1}}{...}\sum\limits_{{i}_{m}=0}^{{K}_{m}}{\sum\limits_{{j}_{1}=0}^{{K}_{1}}{...}\sum\limits_{{j}_{m}=0}^{{K}_{m}}{{{\alpha }_{({i}_{1},\cdots,{i}_{m})}}{{\alpha }_{({j}_{1},\cdots,{j}_{m})}}\prod\limits_{r=1}^{m}\mathcal{M}({i}_{r}+{j}_{r};{{\mu }_{r}},{\sigma }_{r})}},
where:
x = (x_{1},...,x_{m}) - is the vector of arguments, i.e. rows of the x matrix in dhpa.

{\alpha }_{({i}_{1},\cdots,{i}_{m})} - is the polynomial coefficient corresponding to the pol_coefficients[k] element. In order to investigate the correspondence between k and ({i}_{1},\cdots,{i}_{m}) values please see the 'Examples' section below or the polynomialIndex function 'Details', 'Value' and 'Examples' sections. Note that if m=1 then pol_coefficients[k] simply corresponds to \alpha_{k-1}.

(K_{1},...,K_{m}) - are polynomial degrees (orders) provided via the pol_degrees argument so pol_degrees[i] determines K_{i}.

\phi(.;{\mu }_{t},{\sigma }_{t}) - is the normal random variable density function where \mu_{t} and \sigma_{t} are the mean and standard deviation determined by the mean[t] and sd[t] arguments values.

\mathcal{M}(q;{\mu }_{r},{\sigma }_{r}) - is the q-th order moment of a normal random variable with mean {\mu }_{r} and standard deviation {\sigma }_{r}. Note that the function normalMoment allows one to calculate and differentiate normal random variable's moments.

\psi - is the constant term ensuring that f_{\xi}(x) is a density function.
Therefore dhpa allows one to calculate f_{\xi}(x) values at points determined by rows of the x matrix given polynomial degrees pol_degrees (K) as well as mean (\mu), sd (\sigma) and pol_coefficients (\alpha) parameters values. Note that mean, sd and pol_degrees are m-variate vectors while pol_coefficients has prod(pol_degrees + 1) elements.
Cumulative probabilities could be approximated as follows:
P\left(\underline{x}_{1}\leq\xi_{1}\leq\overline{x}_{1},...,\underline{x}_{m}\leq\xi_{m}\leq\overline{x}_{m}\right) =
= \bar{F}_{\xi}(\underline{x},\bar{x}) = \bar{F}_{\xi}(\underline{x},\bar{x};\mu, \sigma, \alpha) =\frac{1}{\psi }\prod\limits_{t=1}^{m}(\Phi ({{{\bar{x}}}_{t}};{{\mu }_{t}},{{\sigma }_{t}})-\Phi ({{{\underline{x}}}_{t}};{{\mu }_{t}},{{\sigma }_{t}})) *
* \sum\limits_{{{i}_{1}}=0}^{{{K}_{1}}}{...}\sum\limits_{{{i}_{m}}=0}^{{{K}_{m}}}{\sum\limits_{{{j}_{1}}=0}^{{{K}_{1}}}{...}\sum\limits_{{{j}_{m}}=0}^{{{K}_{m}}}{{{\alpha }_{({{i}_{1}},...,{{i}_{m}})}}{{\alpha }_{({{j}_{1}},...,{{j}_{m}})}}}}\prod\limits_{r=1}^{m}\mathcal{M}_{TR}\left({i}_{r}+{j}_{r};\underline{x}_{r},\overline{x}_{r},\mu_{r},\sigma_{r}\right)
where:
\Phi(.;{\mu }_{t},{\sigma }_{t}) - is the normal random variable's cumulative distribution function where \mu_{t} and \sigma_{t} are the mean and standard deviation determined by the mean[t] and sd[t] arguments values.

\mathcal{M}_{TR}(q;\underline{x}_{r},\overline{x}_{r},\mu_{r},\sigma_{r}) - is the q-th order moment of the truncated (from above by \overline{x}_{r} and from below by \underline{x}_{r}) normal random variable with mean {\mu }_{r} and standard deviation {\sigma }_{r}. Note that the function truncatedNormalMoment allows one to calculate and differentiate truncated normal random variable's moments.

\overline{x} = (\overline{x}_{1},...,\overline{x}_{m}) - is the vector of upper integration limits, i.e. rows of the x_upper matrix in ihpa.

\underline{x} = (\underline{x}_{1},...,\underline{x}_{m}) - is the vector of lower integration limits, i.e. rows of the x_lower matrix in ihpa.
Therefore ihpa allows one to calculate interval distribution function \bar{F}_{\xi}(\underline{x},\bar{x}) values at points determined by rows of the x_lower (\underline{x}) and x_upper (\overline{x}) matrices. The rest of the arguments are similar to dhpa.
Expected value powered product approximation is as follows:
E\left( \prod\limits_{t=1}^{m}\xi_{t}^{{{k}_{t}}} \right)=\frac{1}{\psi }\sum\limits_{{{i}_{1}}=0}^{{{K}_{1}}}{...}\sum\limits_{{{i}_{m}}=0}^{{{K}_{m}}}{\sum\limits_{{{j}_{1}}=0}^{{{K}_{1}}}{...}\sum\limits_{{{j}_{m}}=0}^{{{K}_{m}}}{{{\alpha }_{({{i}_{1}},...,{{i}_{m}})}}{{\alpha }_{({{j}_{1}},...,{{j}_{m}})}}}}\prod\limits_{r=1}^{m}\mathcal{M}({{i}_{r}}+{{j}_{r}}+{{k}_{r}};{{\mu }_{r}},{{\sigma }_{r}})
where (k_{1},...,k_{m}) are integer powers determined by the expectation_powers argument of ehpa so expectation_powers[t] assigns k_{t}. Note that the x argument in ehpa allows one to determine conditional values.
By expanding polynomial degrees (K_{1},...,K_{m}) it is possible to provide an arbitrarily close approximation to the density of some m-variate random vector \xi^{\star}. So actually f_{\xi}(x) approximates f_{\xi^{\star}}(x). Accurate approximation requires an appropriate selection of the mean, sd and pol_coefficients values. In order to get sample estimates of these parameters please apply the hpaML function.
In order to perform calculation for the marginal distribution of some \xi components please provide the omitted components via the omit_ind argument. For example, if one assumes an m = 5-variate distribution and wants to deal with the 1-st, 3-rd and 5-th components only, i.e. (\xi_{1},\xi_{3},\xi_{5}), then set omit_ind = c(FALSE, TRUE, FALSE, TRUE, FALSE), indicating that \xi_{2} and \xi_{4} should be 'omitted' from \xi since the 2-nd and 4-th values of omit_ind are TRUE. Then x still should be a 5 column matrix, but values in its 2-nd and 4-th columns will not affect calculation results. Meanwhile note that the marginal distribution of t components of \xi usually does not coincide with any marginal distribution generated by a t-variate density approximating function.
In order to perform calculation for a conditional distribution, i.e. given fixed values for some \xi components, please provide these components via the given_ind argument. For example, if one assumes an m = 5-variate distribution and wants to deal with the 1-st, 3-rd and 5-th components given fixed values (suppose 8 and 10) for the other two components, i.e. (\xi|\xi_{2} = 8, \xi_{4} = 10), then set given_ind = c(FALSE, TRUE, FALSE, TRUE, FALSE) and x[2] = 8, x[4] = 10, where for simplicity it is assumed that x is a single row 5 column matrix; it is possible to provide different conditional values for the same components simply by setting different values in different x rows.
Note that it is possible to combine the given_ind and omit_ind arguments. However it is wrong to set both given_ind[i] and omit_ind[i] to TRUE. Also at least one value should be FALSE both for given_ind and omit_ind.
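A minimal sketch of the m = 5 scenario described above (all polynomial coefficients are illustratively set to 1):

# 5-variate distribution with a first degree polynomial in each dimension
pol_degrees <- rep(1, 5)
pol_coefficients <- rep(1, prod(pol_degrees + 1))
# x must still be a 5 column matrix; columns 2 and 4 hold
# the conditional values 8 and 10
x <- matrix(c(0.1, 8, 0.3, 10, 0.5), nrow = 1)
# Density of (X1, X3, X5) given X2 = 8 and X4 = 10
dhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     given_ind = c(FALSE, TRUE, FALSE, TRUE, FALSE))
# Equivalently, using the numeric form of given_ind
dhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     given_ind = c(2, 4))
# Marginal density of (X1, X3) given X2 = 8 and X4 = 10 (X5 omitted);
# the value in the 5-th column of x is then ignored
dhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     given_ind = c(2, 4), omit_ind = c(FALSE, FALSE, FALSE, FALSE, TRUE))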
In order to consider the truncated distribution of \xi, i.e. \left(\xi|\overline{a}_{1}\leq\xi_{1}\leq\overline{b}_{1},\cdots,\overline{a}_{m}\leq\xi_{m}\leq\overline{b}_{m}\right), please set lower (left) truncation points \overline{a} and upper (right) truncation points \overline{b} via the tr_left and tr_right arguments correspondingly. Note that if lower truncation points are negative infinite and upper truncation points are positive infinite then dtrhpa, itrhpa and etrhpa are similar to dhpa, ihpa and ehpa correspondingly.
In order to calculate the Jacobian of f_{\xi}(x;\mu, \sigma, \alpha) and \bar{F}_{\xi}(\underline{x},\bar{x};\mu, \sigma, \alpha) w.r.t. all or some particular parameters please apply the dhpaDiff and ihpaDiff functions correspondingly, specifying the parameters of interest via the type argument. If x or x_lower and x_upper are single row matrices then gradients will be calculated.
For further information please see the 'Examples' section. Note that examples are given separately for each function.

If given_ind and (or) omit_ind are numeric vectors then they are insensitive to the order of elements. For example given_ind = c(5, 2, 3) is similar to given_ind = c(2, 3, 5).
The densities Hermite polynomial approximation approach has been proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate an unknown distribution density with a scaled Hermite polynomial. For more information please refer to the literature listed below.
Value
Functions dhpa, phpa and dtrhpa return a vector of probabilities of length nrow(x).

Functions ihpa and itrhpa return a vector of probabilities of length nrow(x_upper).

If the x argument has not been provided or is a single row matrix then function ehpa returns a moment value. Otherwise it returns a vector of length nrow(x) containing moments values.

If the tr_left and tr_right arguments are single row matrices then function etrhpa returns a moment value. Otherwise it returns a vector of length max(nrow(tr_left), nrow(tr_right)) containing moments values.

Functions dhpaDiff and ihpaDiff return a Jacobian matrix. The number of columns depends on the type argument. The number of rows is nrow(x) for dhpaDiff and nrow(x_upper) for ihpaDiff.

If mean or sd are not specified they assume the default values of m-dimensional vectors of 0 and 1, respectively. If x_lower is not specified then it is a matrix of the same size as x_upper containing negative infinity values only. If expectation_powers is not specified then it is an m-dimensional vector of 0 values.
Please see 'Details' section for additional information.
References
A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>
Examples
## Example demonstrating dhpa function application.
## Let's approximate some three random variables (i.e. X1, X2 and X3)
## joint density function at points x = (0.1, 0.2, 0.3) and
## y = (0.5, 0.8, 0.6) with Hermite polynomial of (1, 2, 3) degrees whose
## polynomial coefficients equal 1 except the coefficient related to the
## x1*(x3^2) polynomial element which equals 2. Also suppose that the normal
## density related mean vector equals (1.1, 1.2, 1.3) while the standard
## deviations vector is (2.1, 2.2, 2.3).

# Prepare initial values
x <- matrix(c(0.1, 0.2, 0.3), nrow = 1) # x point as a single row matrix
y <- matrix(c(0.5, 0.8, 0.6), nrow = 1) # y point as a single row matrix
x_y <- rbind(x, y)                      # matrix which rows are x and y
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial
# elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate density approximation
# at point x (note that x should be a matrix)
dhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd)
# at points x and y
dhpa(x = x_y, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd)

# Condition second component to be 0.5 i.e. X2 = 0.5.
# Substitute x and y second components with conditional value 0.5
x <- matrix(c(0.1, 0.5, 0.3), nrow = 1) # or simply x[2] <- 0.5
y <- matrix(c(0.4, 0.5, 0.6), nrow = 1) # or simply y[2] <- 0.5
x_y <- rbind(x, y)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.5) density approximation
# at point x
dhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind)
# at points x and y
dhpa(x = x_y, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind)

# Consider third component marginal distribution conditioned on the
# second component 0.5 value i.e. (X3 | X2 = 0.5).
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.5) marginal (for X3) density approximation
# at point x
dhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind)
# at points x and y
dhpa(x = x_y, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind)

## Example demonstrating phpa function application.
## Let's approximate some three random variables (X1, X2, X3)
## joint cumulative distribution function (cdf) at point (0.1, 0.2, 0.3)
## with Hermite polynomial of (1, 2, 3) degrees whose polynomial
## coefficients equal 1 except the coefficient related to the x1*(x3^2)
## polynomial element which equals 2.
## Also suppose that the normal density related
## mean vector equals (1.1, 1.2, 1.3) while the standard deviations
## vector is (2.1, 2.2, 2.3).

# Prepare initial values
x <- matrix(c(0.1, 0.2, 0.3), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial
# elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate cdf approximation at point x
phpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd)

# Condition second component to be 0.5
# Substitute x second component with conditional value 0.5
x <- matrix(c(0.1, 0.5, 0.3), nrow = 1) # or simply x[2] <- 0.5
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.5) cdf approximation at point x
phpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind)

# Consider third component marginal distribution
# conditioned on the second component 0.5 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.5) marginal (for X3) cdf
# approximation at point x
phpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind)

## Example demonstrating ihpa function application.
## Let's approximate some three random variables (X1, X2, X3) joint interval
## distribution function (intdf) at lower and upper points (0.1, 0.2, 0.3)
## and (0.4, 0.5, 0.6) correspondingly with Hermite polynomial of (1, 2, 3)
## degrees whose polynomial coefficients equal 1 except the coefficient
## related to the x1*(x3^2) polynomial element which equals 2.
## Also suppose that the normal
## density related mean vector equals (1.1, 1.2, 1.3) while the standard
## deviations vector is (2.1, 2.2, 2.3).

# Prepare initial values
x_lower <- matrix(c(0.1, 0.2, 0.3), nrow = 1)
x_upper <- matrix(c(0.4, 0.5, 0.6), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial
# elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate intdf approximation at points x_lower and x_upper
ihpa(x_lower = x_lower, x_upper = x_upper,
     pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd)

# Condition second component to be 0.7
# Substitute x_upper second component with conditional value 0.7
x_upper <- matrix(c(0.4, 0.7, 0.6), nrow = 1)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.7) intdf approximation
# at points x_lower and x_upper
ihpa(x_lower = x_lower, x_upper = x_upper,
     pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind)

# Consider third component marginal distribution
# conditioned on the second component 0.7 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.7) marginal (for X3)
# intdf approximation at points x_lower and x_upper
ihpa(x_lower = x_lower, x_upper = x_upper,
     pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind)

## Example demonstrating ehpa function application.
## Let's approximate some three random variables (X1, X2, X3) powered product
## expectation for powers (3, 2, 1) with Hermite polynomial of (1, 2, 3)
## degrees whose polynomial coefficients equal 1 except the coefficient
## related to the x1*(x3^2) polynomial element which equals 2.
## Also suppose that the normal density related mean vector equals
## (1.1, 1.2, 1.3) while the standard deviations vector is (2.1, 2.2, 2.3).

# Prepare initial values
expectation_powers <- c(3, 2, 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate expected powered product approximation
ehpa(pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, expectation_powers = expectation_powers)

# Condition second component to be 0.5
# Substitute x second component with conditional value 0.5
x <- matrix(c(NA, 0.5, NA), nrow = 1)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.5) expected powered product approximation
ehpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, expectation_powers = expectation_powers,
     given_ind = given_ind)

# Consider third component marginal distribution
# conditioned on the second component 0.5 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.5) marginal (for X3) expected powered
# product approximation
ehpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd, expectation_powers = expectation_powers,
     given_ind = given_ind, omit_ind = omit_ind)

## Example demonstrating etrhpa function application.
## Let's approximate some three truncated random variables (X1, X2, X3)
## powered product expectation for powers (3, 2, 1) with Hermite polynomial
## of (1, 2, 3) degrees whose polynomial coefficients equal 1 except the
## coefficient related to the x1*(x3^2) polynomial element which equals 2.
## Also suppose that the normal density related mean vector equals
## (1.1, 1.2, 1.3) while the standard deviations vector is (2.1, 2.2, 2.3).
## Suppose that the lower and upper truncation points are (-1.1, -1.2, -1.3)
## and (1.1, 1.2, 1.3) correspondingly.

# Prepare initial values
expectation_powers <- c(3, 2, 1)
tr_left <- matrix(c(-1.1, -1.2, -1.3), nrow = 1)
tr_right <- matrix(c(1.1, 1.2, 1.3), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate expected powered product approximation for truncated distribution
etrhpa(pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, expectation_powers = expectation_powers,
       tr_left = tr_left, tr_right = tr_right)

## Example demonstrating dtrhpa function application.
## Let's approximate some three random variables (X1, X2, X3) joint density
## function at point (0.1, 0.2, 0.3) with Hermite polynomial of (1, 2, 3)
## degrees whose polynomial coefficients equal 1 except the coefficient
## related to the x1*(x3^2) polynomial element which equals 2. Also suppose
## that the normal density related mean vector equals (1.1, 1.2, 1.3) while
## the standard deviations vector is (2.1, 2.2, 2.3).
## Suppose that the lower and upper truncation points are (-1.1, -1.2, -1.3)
## and (1.1, 1.2, 1.3) correspondingly.

# Prepare initial values
x <- matrix(c(0.1, 0.2, 0.3), nrow = 1)
tr_left <- matrix(c(-1.1, -1.2, -1.3), nrow = 1)
tr_right <- matrix(c(1.1, 1.2, 1.3), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate density approximation at point x
dtrhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, tr_left = tr_left, tr_right = tr_right)

# Condition second component to be 0.5
# Substitute x second component with conditional value 0.5
x <- matrix(c(0.1, 0.5, 0.3), nrow = 1)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.5) density approximation at point x
dtrhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, given_ind = given_ind,
       tr_left = tr_left, tr_right = tr_right)

# Consider third component marginal distribution
# conditioned on the second component 0.5 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.5) marginal (for X3)
# density approximation at point x
dtrhpa(x = x, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind,
       tr_left = tr_left, tr_right = tr_right)

## Example demonstrating itrhpa function application.
## Let's approximate some three truncated random variables (X1, X2, X3) joint
## interval distribution function at lower and upper points (0.1, 0.2, 0.3)
## and (0.4, 0.5, 0.6) correspondingly with Hermite polynomial of (1, 2, 3)
## degrees whose polynomial coefficients equal 1 except the coefficient
## related to the x1*(x3^2) polynomial element which equals 2. Also suppose
## that the normal density related mean vector equals (1.1, 1.2, 1.3) while
## the standard deviations vector is (2.1, 2.2, 2.3).
## Suppose that the lower and upper truncation points are (-1.1, -1.2, -1.3)
## and (1.1, 1.2, 1.3) correspondingly.

# Prepare initial values
x_lower <- matrix(c(0.1, 0.2, 0.3), nrow = 1)
x_upper <- matrix(c(0.4, 0.5, 0.6), nrow = 1)
tr_left <- matrix(c(-1.1, -1.2, -1.3), nrow = 1)
tr_right <- matrix(c(1.1, 1.2, 1.3), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial
# elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate intdf approximation at points x_lower and x_upper
itrhpa(x_lower = x_lower, x_upper = x_upper,
       pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, tr_left = tr_left, tr_right = tr_right)

# Condition second component to be 0.7
# Substitute x_upper second component with conditional value 0.7
x_upper <- matrix(c(0.4, 0.7, 0.6), nrow = 1)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.7) intdf
# approximation at points x_lower and x_upper
itrhpa(x_lower = x_lower, x_upper = x_upper,
       pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, given_ind = given_ind,
       tr_left = tr_left, tr_right = tr_right)

# Consider third component marginal distribution
# conditioned on the second component 0.7 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.7) marginal (for X3) intdf
# approximation at points x_lower and x_upper
itrhpa(x_lower = x_lower, x_upper = x_upper,
       pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
       mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind,
       tr_left = tr_left, tr_right = tr_right)

## Example demonstrating dhpaDiff function application.
## Let's approximate some three random variables (X1, X2, X3) joint density
## function at point (0.1, 0.2, 0.3) with Hermite polynomial of (1, 2, 3)
## degrees whose polynomial coefficients equal 1 except the coefficient
## related to the x1*(x3^2) polynomial element which equals 2. Also suppose
## that the normal density related mean vector equals (1.1, 1.2, 1.3) while
## the standard deviations vector is (2.1, 2.2, 2.3).
## In this example let's calculate the density approximating function's
## gradient with respect to various parameters.

# Prepare initial values
x <- matrix(c(0.1, 0.2, 0.3), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial
# elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate density approximation gradient
# respect to polynomial coefficients at point x
dhpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd)

# Condition second component to be 0.5
# Substitute x second component with conditional value 0.5
x <- matrix(c(0.1, 0.5, 0.3), nrow = 1)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.5) density approximation's
# gradient respect to polynomial coefficients at point x
dhpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd,
         given_ind = given_ind)

# Consider third component marginal distribution
# conditioned on the second component 0.5 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.5) marginal (for X3) density
# approximation's gradient respect to:
# polynomial coefficients
dhpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd,
         given_ind = given_ind, omit_ind = omit_ind)
# mean
dhpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd,
         given_ind = given_ind, omit_ind = omit_ind, type = "mean")
# sd
dhpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd,
         given_ind = given_ind, omit_ind = omit_ind, type = "sd")
# x
dhpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd,
         given_ind = given_ind, omit_ind = omit_ind, type = "x")

## Example demonstrating ehpaDiff function application.
## Let's approximate some three random variables (X1, X2, X3) expectation
## of the form E((X1 ^ 3) * (X2 ^ 1) * (X3 ^ 2)) and calculate the gradient

# Distribution parameters
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)
pol_coefficients_n <- prod(pol_degrees + 1)
pol_coefficients <- rep(1, pol_coefficients_n)

# Set powers for expectation
expectation_powers <- c(3, 1, 2)

# Calculate expectation approximation gradient
# respect to all parameters
ehpaDiff(pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, expectation_powers = expectation_powers,
         type = "all")

# Let's calculate gradient of E(X1 ^ 3 | (X2 = 1, X3 = 2))
x <- c(0, 1, 2)                  # x[1] may be arbitrary (not NA) value
expectation_powers <- c(3, 0, 0) # expectation_powers[2:3] may be
                                 # arbitrary (not NA) values
given_ind <- c(2, 3)
ehpaDiff(x = x, pol_coefficients = pol_coefficients,
         pol_degrees = pol_degrees, mean = mean, sd = sd,
         given_ind = given_ind, expectation_powers = expectation_powers,
         type = "all")

## Example demonstrating ihpaDiff function application.
## Let's approximate some three random variables (X1, X2, X3) joint interval
## distribution function (intdf) at lower and upper points (0.1, 0.2, 0.3)
## and (0.4, 0.5, 0.6) correspondingly with Hermite polynomial of (1, 2, 3)
## degrees whose polynomial coefficients equal 1 except the coefficient
## related to the x1*(x3^2) polynomial element which equals 2.
## Also suppose that the normal density related mean vector equals
## (1.1, 1.2, 1.3) while the standard deviations vector is (2.1, 2.2, 2.3).
## In this example let's calculate the interval distribution approximating
## function gradient with respect to polynomial coefficients.

# Prepare initial values
x_lower <- matrix(c(0.1, 0.2, 0.3), nrow = 1)
x_upper <- matrix(c(0.4, 0.5, 0.6), nrow = 1)
mean <- c(1.1, 1.2, 1.3)
sd <- c(2.1, 2.2, 2.3)
pol_degrees <- c(1, 2, 3)

# Create polynomial powers and indexes correspondence matrix
pol_ind <- polynomialIndex(pol_degrees)
# Set all polynomial coefficients to 1
pol_coefficients <- rep(1, ncol(pol_ind))
pol_degrees_n <- length(pol_degrees)
# Assign coefficient 2 to the polynomial element (x1 ^ 1)*(x2 ^ 0)*(x3 ^ 2)
pol_coefficients[apply(pol_ind, 2, function(x) all(x == c(1, 0, 2)))] <- 2

# Visualize correspondence between polynomial
# elements and their coefficients
as.data.frame(rbind(pol_ind, pol_coefficients),
              row.names = c("x1 power", "x2 power", "x3 power",
                            "coefficients"),
              optional = TRUE)
printPolynomial(pol_degrees, pol_coefficients)

# Calculate intdf approximation gradient respect to
# polynomial coefficients at points x_lower and x_upper
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd)

# Condition second component to be 0.7
# Substitute x_upper second component with conditional value 0.7
x_upper <- matrix(c(0.4, 0.7, 0.6), nrow = 1)
# Set TRUE to the second component indicating that it is conditioned
given_ind <- c(FALSE, TRUE, FALSE)
# Calculate conditional (on X2 = 0.7) intdf approximation gradient
# respect to polynomial coefficients at points x_lower and x_upper
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, given_ind = given_ind)

# Consider third component marginal distribution
# conditioned on the second component 0.7 value
# Set TRUE to the first component indicating that it is omitted
omit_ind <- c(TRUE, FALSE, FALSE)
# Calculate conditional (on X2 = 0.7) marginal (for X3) intdf approximation
# gradient respect to:
# polynomial coefficients
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind)
# mean
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind,
         type = "mean")
# sd
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind,
         type = "sd")
# x_lower
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind,
         type = "x_lower")
# x_upper
ihpaDiff(x_lower = x_lower, x_upper = x_upper,
         pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
         mean = mean, sd = sd, given_ind = given_ind, omit_ind = omit_ind,
         type = "x_upper")

## Examples demonstrating qhpa function application.
## Sub-example 1 - univariate distribution
## Consider random variable X

# Distribution parameters
mean <- 1
sd <- 2
pol_degrees <- 2
pol_coefficients <- c(1, 0.1, -0.01)

# The level of quantile
p <- 0.7

# Calculate quantile of X
qhpa(p = p, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd)

## Sub-example 2 - marginal distribution
## Consider random vector (X1, X2) and quantile of X1

# Distribution parameters
mean <- c(1, 1.2)
sd <- c(2, 3)
pol_degrees <- c(2, 2)
pol_coefficients <- c(1, 0.1, -0.01, 0.2, 0.012,
                      0.0013, 0.0042, 0.00025, 0)

# The level of quantile
p <- 0.7

# Calculate quantile of X1
qhpa(p = p, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd,
     omit_ind = 2) # set omitted variable index

## Sub-example 3 - marginal and conditional distribution
## Consider random vector (X1, X2, X3) and
## quantiles of X1|X3 and X1|(X2,X3)
mean <- c(1, 1.2, 0.9)
sd <- c(2, 3, 2.5)
pol_degrees <- c(1, 1, 1)
pol_coefficients <- c(1, 0.1, -0.01, 0.2, 0.012,
                      0.0013, 0.0042, 0.00025)

# The level of quantile
p <- 0.7

# Calculate quantile of X1|X3 = 0.2
qhpa(p = p,
     x = matrix(c(NA, NA, 0.2), nrow = 1), # set any values to
                                           # unconditioned and
                                           # omitted components
     pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd,
     omit_ind = 2,  # set omitted variable index
     given_ind = 3) # set conditioned variable index

# Calculate quantile of X1|(X2 = 0.5, X3 = 0.2)
qhpa(p = p,
     x = matrix(c(NA, 0.5, 0.2), nrow = 1), # set any values to
                                            # unconditioned and
                                            # omitted components
     pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd,
     given_ind = c(2, 3)) # set conditioned variables indexes

## Examples demonstrating rhpa function application.

# Set seed for reproducibility
set.seed(123)

# Distribution parameters
mean <- 1
sd <- 2
pol_degrees <- 2
pol_coefficients <- c(1, 0.1, -0.01)

# Simulate two observations from this distribution
rhpa(n = 2, pol_coefficients = pol_coefficients, pol_degrees = pol_degrees,
     mean = mean, sd = sd)

Fast pdf and cdf for standardized univariate PGN distribution
Description
This function uses fast algorithms to calculate densities and probabilities (along with their derivatives) related to the standardized PGN distribution.
Usage
dhpa0(x, pc, mean = 0, sd = 1, is_parallel = FALSE,
      log = FALSE, is_validation = TRUE, is_grad = FALSE)

phpa0(x, pc, mean = 0, sd = 1, is_parallel = FALSE,
      log = FALSE, is_validation = TRUE, is_grad = FALSE)
Arguments
x | numeric vector of functions arguments. |
pc | polynomial coefficients without the first term. |
mean | expected value (mean) of the distribution. |
sd | standard deviation of the distribution. |
is_parallel | logical; if TRUE then multiple cores will be used for some calculations. Currently unavailable. |
log | logical; if |
is_validation | logical value indicating whether function input arguments should be validated. Set it to |
is_grad | logical; if |
Details
Functions dhpa0 and phpa0 are similar to dhpa and phpa correspondingly. However there are two key differences. First, dhpa0 and phpa0 deal with the univariate PGN distribution only. Second, this distribution is standardized to zero mean and unit variance. Moreover pc is similar to the pol_coefficients argument of dhpa but without the first component i.e. pc = pol_coefficients[-1]. Also mean and sd are not the arguments of the normal density but the actual mean and standard deviation of the resulting distribution. So if these arguments differ from 0 and 1 correspondingly then the standardized PGN distribution will be linearly transformed to have mean mean and standard deviation sd.
Value
Both functions return a list. Function dhpa0 returns a list with an element named "den" which is a numeric vector of density values. Function phpa0 returns a list with an element named "prob" which is a numeric vector of probabilities.
If is_grad = TRUE then elements "grad_x" and "grad_pc" will be added to the list, containing gradients with respect to the input argument x and the parameters pc correspondingly. If log = TRUE then additional elements will be added to the list, containing density, probability and gradient values for the logarithms of the corresponding functions. These elements will be named "grad_x_log", "grad_pc_log", "grad_prob_log" and "grad_den_log".
Examples
# Calculate density and probability of the standardized PGN
# distribution

# distribution parameters
pc <- c(0.5, -0.2)
# function arguments
x <- c(-0.3, 0.8, 1.5)
# probability density function
dhpa0(x, pc)
# cumulative distribution function
phpa0(x, pc)

# Additionally calculate gradients with respect to arguments
# and parameters of the PGN distribution
dhpa0(x, pc, is_grad = TRUE)
phpa0(x, pc, is_grad = TRUE)

# Let's denote by X the standardized PGN random variable and repeat
# the calculations for 2 * X + 1
dhpa0(x, pc, is_grad = TRUE, mean = 1, sd = 2)
phpa0(x, pc, is_grad = TRUE, mean = 1, sd = 2)
Semi-nonparametric maximum likelihood estimation
Description
This function performs semi-nonparametric (SNP) maximum likelihood estimation of an unknown (possibly truncated) multivariate density using the Hermite polynomial based approximating function proposed by Gallant and Nychka in 1987. Please, see the dhpa 'Details' section to get more information concerning this approximating function.
Usage
hpaML(data, pol_degrees = numeric(0),
      tr_left = numeric(0), tr_right = numeric(0),
      given_ind = numeric(0), omit_ind = numeric(0),
      x0 = numeric(0), cov_type = "sandwich", boot_iter = 100L,
      is_parallel = FALSE, opt_type = "optim", opt_control = NULL,
      is_validation = TRUE)
Arguments
data | numeric matrix which rows are realizations of independent identically distributed random vectors while columns correspond tovariables. |
pol_degrees | non-negative integer vector of polynomial degrees (orders). |
tr_left | numeric vector of left (lower) truncation limits. |
tr_right | numeric vector of right (upper) truncation limits. |
given_ind | logical or numeric vector indicating whether corresponding random vector component is conditioned. By default it is a logical vector of |
omit_ind | logical or numeric vector indicating whether correspondingrandom component is omitted. By default it is a logical vector of |
x0 | numeric vector of optimization routine initial values.Note that |
cov_type | character determining the type of covariance matrix to bereturned and used for summary. If |
boot_iter | the number of bootstrap iterationsfor |
is_parallel | if |
opt_type | string value determining the type of the optimizationroutine to be applied. The default is |
opt_control | a list containing arguments to be passed to theoptimization routine depending on |
is_validation | logical value indicating whether function input arguments should be validated. Set it to |
Details
The densities Hermite polynomial approximation approach was proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate the unknown distribution density with a scaled Hermite polynomial. For more information please refer to the literature listed below.
Let's use the notations introduced in the dhpa 'Details' section. Function hpaML maximizes the following quasi log-likelihood function:
\ln L(\alpha, \mu, \sigma; x) = \sum\limits_{i=1}^{n} \ln\left(f_{\xi}(x_{i};\alpha, \mu, \sigma)\right),
where (in addition to previously defined notations):
x_{i} - are observations i.e. data matrix rows.
n - is the sample size i.e. the number of data matrix rows.
Arguments pol_degrees, tr_left, tr_right, given_ind and omit_ind affect the form of f_{\xi}\left(x_{i};\alpha, \mu, \sigma\right) in a way described in the dhpa 'Details' section. Note that a change of given_ind and omit_ind values may result in an estimator whose statistical properties have not been rigorously investigated yet.
The first polynomial coefficient (zero powers) is set to 1 for identification purposes i.e. \alpha_{(0,...,0)}=1.
All NA and NaN values will be removed from the data matrix.
The function calculates standard errors via the sandwich estimator and significance levels are reported taking into account quasi maximum likelihood estimator (QMLE) asymptotic normality. If one wants to switch from QMLE to the semi-nonparametric estimator (SNPE) during hypothesis testing then the covariance matrix should be estimated again using bootstrap.
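For instance, switching to bootstrap-based standard errors only requires changing cov_type (and, optionally, boot_iter). The following sketch is illustrative rather than part of the original manual; the simulated sample below is hypothetical:

# Illustrative sketch: bootstrap covariance matrix estimation
set.seed(123)
x <- matrix(rt(1000, df = 5), ncol = 1)       # hypothetical simulated sample
ml_boot <- hpaML(data = x, pol_degrees = 3,
                 cov_type = "bootstrap", boot_iter = 50)
summary(ml_boot)    # standard errors now based on bootstrap replications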
This function maximizes the (quasi) log-likelihood function via the optim function setting its method argument to "BFGS". If opt_type = "GA" then the genetic algorithm from the ga function will additionally be applied (after optim, putting its solution (par) into the suggestions matrix) in order to perform global optimization. Note that global optimization takes much more time (usually minutes but sometimes hours or even days). The number of iterations and the population size of the genetic algorithm will grow linearly along with the number of estimated parameters. If it seems that the global maximum has not been found then it is possible to continue the search by restarting the function setting its input argument x0 to the x1 output value. Note that if cov_type = "bootstrap" then the ga function will not be used for bootstrap iterations since it may be extremely time consuming.
If opt_type = "GA" then opt_control should be the list containing the values to be passed to the ga function. It is possible to pass arguments lower, upper, popSize, pcrossover, pmutation, elitism, maxiter, suggestions, optim, optimArgs, seed and monitor. Note that it is possible to set population, selection, crossover and mutation arguments changing ga default parameters via the gaControl function. Information on these arguments is reported in the ga documentation. In order to provide manual values for the lower and upper bounds please follow the parameters ordering mentioned above for the x0 argument. If these bounds are not provided manually then they (except those related to the polynomial coefficients) will depend on the estimates obtained by local optimization via the optim function (these estimates will be in the middle between lower and upper). Specifically, for each sd parameter the lower (upper) bound is 5 times lower (higher) than this parameter's optim estimate. For each mean and regression coefficient parameter its lower and upper bounds deviate from the corresponding optim estimate by two absolute values of this estimate. Finally, the lower and upper bounds for each polynomial coefficient are -10 and 10 correspondingly (and do not depend on their optim estimates).
The following arguments differ from their defaults in ga:
pmutation = 0.2, optim = TRUE, optimArgs = list("method" = "Nelder-Mead", "poptim" = 0.2, "pressel" = 0.5), seed = 8, elitism = 2 + round(popSize * 0.1).
The arguments popSize and maxiter of the ga function have been set proportional to the number of estimated polynomial coefficients:
popSize = 10 + (prod(pol_degrees + 1) - 1) * 2
maxiter = 50 * (prod(pol_degrees + 1))
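A minimal sketch of a call requesting global optimization (assuming the suggested GA package is installed); the simulated data and the control values below are purely illustrative:

# Illustrative sketch: global optimization via the genetic algorithm
set.seed(123)
x <- matrix(rt(500, df = 5), ncol = 1)        # hypothetical simulated sample
ml_ga <- hpaML(data = x, pol_degrees = 2,
               opt_type = "GA",
               opt_control = list(maxiter = 20, popSize = 20))
summary(ml_ga)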
Value
This function returns an object of class "hpaML".
An object of class "hpaML" is a list containing the following components:
optim - optim function output. If opt_type = "GA" then it is the list containing optim and ga functions outputs.
x1 - numeric vector of distribution parameters estimates.
mean - density function mean vector estimate.
sd - density function sd vector estimate.
pol_coefficients - polynomial coefficients estimates.
tr_left - the same as tr_left input parameter.
tr_right - the same as tr_right input parameter.
omit_ind - the same as omit_ind input parameter.
given_ind - the same as given_ind input parameter.
cov_mat - covariance matrix estimate.
results - numeric matrix representing estimation results.
log-likelihood - value of Log-Likelihood function.
AIC - AIC value.
data - the same as data input parameter but without NA observations.
n_obs - number of observations.
bootstrap - list where bootstrap estimation results are stored.
References
A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>
See Also
summary.hpaML,predict.hpaML,logLik.hpaML,plot.hpaML
Examples
## Approximate Student (t) distribution

# Set seed for reproducibility
set.seed(123)

# Simulate 5000 realizations of Student distribution 
# with 5 degrees of freedom
n <- 5000
df <- 5
x <- matrix(rt(n, df), ncol = 1)
pol_degrees <- c(4)

# Apply pseudo maximum likelihood routine
ml_result <- hpa::hpaML(data = x, pol_degrees = pol_degrees)
summary(ml_result)

# Get predicted probabilities (density values) approximations
predict(ml_result)

# Plot density approximation
plot(ml_result)

## Approximate chi-squared distribution

# Set seed for reproducibility
set.seed(123)

# Simulate 5000 realizations of chi-squared distribution 
# with 5 degrees of freedom
n <- 5000
df <- 5
x <- matrix(rchisq(n, df), ncol = 1)
pol_degrees <- c(5)

# Apply pseudo maximum likelihood routine
ml_result <- hpaML(data = x,
                   pol_degrees = as.vector(pol_degrees),
                   tr_left = 0)
summary(ml_result)

# Get predicted probabilities (density values) approximations
predict(ml_result)

# Plot density approximation
plot(ml_result)

## Approximate multivariate Student (t) distribution
## Note that calculations may take up to a minute

# Set seed for reproducibility
set.seed(123)

# Simulate 5000 realizations of three dimensional Student distribution 
# with 5 degrees of freedom
library("mvtnorm")
cov_mat <- matrix(c(1, 0.5, -0.5, 0.5, 1, 0.5, -0.5, 0.5, 1), ncol = 3)
x <- rmvt(n = 5000, sigma = cov_mat, df = 5)

# Estimate approximating joint distribution parameters
ml_result <- hpaML(data = x, pol_degrees = c(1, 1, 1))

# Get summary
summary(ml_result)

# Get predicted values for joint density function
predict(ml_result)

# Plot density approximation for the
# second random variable
plot(ml_result, ind = 2)

# Plot density approximation for the
# second random variable conditioning
# on x1 = 1
plot(ml_result, ind = 2, given = c(1, NA, NA))

## Approximate Student (t) distribution and plot densities approximated
## under different Hermite polynomial degrees against 
## the true density (of Student distribution)

# Simulate 5000 realizations of t-distribution with 5 degrees of freedom
n <- 5000
df <- 5
x <- matrix(rt(n, df), ncol = 1)

# Apply pseudo maximum likelihood routine
# Create matrix of lists where i-th element contains hpaML results for K = i
ml_result <- matrix(list(), 4, 1)
for (i in 1:4)
{
  ml_result[[i]] <- hpa::hpaML(data = x, pol_degrees = i)
}

# Generate test values
test_values <- seq(qt(0.001, df), qt(0.999, df), 0.001)
n0 <- length(test_values)

# t-distribution density function at test values points
true_pred <- dt(test_values, df)

# Create matrix of lists where i-th element contains 
# densities predictions for K = i
PGN_pred <- matrix(list(), 4, 1)
for (i in 1:4)
{
  PGN_pred[[i]] <- predict(object = ml_result[[i]],
                           newdata = matrix(test_values, ncol = 1))
}

# Plot the result
library("ggplot2")

# prepare the data
h <- data.frame("values" = rep(test_values, 5),
                "predictions" = c(PGN_pred[[1]], PGN_pred[[2]],
                                  PGN_pred[[3]], PGN_pred[[4]],
                                  true_pred),
                "Density" = c(rep("K=1", n0), rep("K=2", n0),
                              rep("K=3", n0), rep("K=4", n0),
                              rep("t-distribution", n0)))

# build the plot
ggplot(h, aes(values, predictions)) +
  geom_point(aes(color = Density)) +
  theme_minimal() +
  theme(legend.position = "top",
        text = element_text(size = 26),
        legend.title = element_text(size = 20),
        legend.text = element_text(size = 28)) +
  guides(colour = guide_legend(override.aes = list(size = 10)))

# Get informative estimates summary for K = 4
summary(ml_result[[4]])
Perform semi-nonparametric selection model estimation
Description
This function performs semi-nonparametric (SNP) maximum likelihood estimation of a sample selection model using the Hermite polynomial based approximating function proposed by Gallant and Nychka in 1987. Please, see the dhpa 'Details' section to get more information concerning this approximating function.
Usage
hpaSelection(selection, outcome, data,
             selection_K = 1L, outcome_K = 1L,
             pol_elements = 3L, is_Newey = FALSE,
             x0 = numeric(0), is_Newey_loocv = FALSE,
             cov_type = "sandwich", boot_iter = 100L,
             is_parallel = FALSE, opt_type = "optim",
             opt_control = NULL, is_validation = TRUE)
Arguments
selection | an object of class "formula" (or one that can be coerced to that class): a symbolic description of the selection equation form. All variables in |
outcome | an object of class "formula" (or one that can be coerced to that class): a symbolic description of the outcome equation form. All variables in |
data | data frame containing the variables in the model. |
selection_K | non-negative integer representing polynomial degree related to selection equation. |
outcome_K | non-negative integer representing polynomial degree related to outcome equation. |
pol_elements | number of conditional expectation approximating terms for Newey's method. If |
is_Newey | logical; if TRUE then returns only Newey's method estimation results (default value is FALSE). |
x0 | numeric vector of optimization routine initial values.Note that |
is_Newey_loocv | logical; if TRUE then number of conditional expectation approximating terms for Newey's method will be selectedbased on leave-one-out cross-validation criteria iterating through 0 to pol_elements number of these terms. |
cov_type | character determining the type of covariance matrix to bereturned and used for summary. If |
boot_iter | the number of bootstrap iterationsfor |
is_parallel | if |
opt_type | string value determining the type of the optimizationroutine to be applied. The default is |
opt_control | a list containing arguments to be passed to theoptimization routine depending on |
is_validation | logical value indicating whether function input arguments should be validated. Set it to |
Details
The densities Hermite polynomial approximation approach was proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate the unknown distribution density with a scaled Hermite polynomial. For more information please refer to the literature listed below.
Let's use the notations introduced in the dhpa 'Details' section. Function hpaSelection maximizes the following quasi log-likelihood function:
\ln L(\gamma, \beta, \alpha, \mu, \sigma; x) = \sum\limits_{i:z_{i}=1} \ln\left(\overline{F}_{\left(\xi_{1}|\xi_{2}=y_{i}-x_{i}^{o}\beta\right)}\left(-\gamma x_{i}^{s}, \infty;\alpha, \mu, \sigma\right)f_{\xi_{2}}\left(y_{i}-x_{i}^{o}\beta\right)\right)+
+\sum\limits_{i:z_{i}=0} \ln\left(\overline{F}_{\xi}(-\infty, -x_{i}^{s}\gamma;\alpha, \mu, \sigma)\right),
where (in addition to previously defined notations):
x_{i}^{s} - is the row vector of selection equation regressors derived from data according to the selection formula.
x_{i}^{o} - is the row vector of outcome equation regressors derived from data according to the outcome formula.
\gamma - is the column vector of selection equation regression coefficients (the constant will not be added by default).
\beta - is the column vector of outcome equation regression coefficients (the constant will not be added by default).
z_{i} - is the binary (0 or 1) dependent variable defined in the selection formula.
y_{i} - is the continuous dependent variable defined in the outcome formula.
Note that \xi is two dimensional and selection_K corresponds to K_{1} while outcome_K determines K_{2}.
The first polynomial coefficient (zero powers) is set to 1 for identification purposes i.e. \alpha_{0}=1.
Rows in data corresponding to variables mentioned in the selection and outcome formulas which have at least one NA value will be ignored. The exception is the continuous dependent variable y which may have NA values for observations where z_{i}=0.
Note that the coefficient for the first independent variable in selection will be fixed to 1 i.e. \gamma_{1}=1.
All variables mentioned in selection and outcome should be numeric vectors.
The function calculates standard errors via the sandwich estimator and significance levels are reported taking into account quasi maximum likelihood estimator (QMLE) asymptotic normality. If one wants to switch from QMLE to the semi-nonparametric estimator (SNPE) during hypothesis testing then the covariance matrix should be estimated again using bootstrap.
Initial values for the optimization routine are obtained by Newey's method (see the reference below). In order to obtain initial values via least squares please set pol_elements = 0. Initial values for the selection equation are obtained via the hpaBinary function setting K to selection_K.
Note that the selection equation dependent variable should have exactly two levels (0 and 1) where "0" stands for the selection result which leads to unobservable values of the dependent variable in the outcome equation.
This function maximizes the (quasi) log-likelihood function via the optim function setting its method argument to "BFGS". If opt_type = "GA" then the genetic algorithm from the ga function will additionally be applied (after optim, putting its solution (par) into the suggestions matrix) in order to perform global optimization. Note that global optimization takes much more time (usually minutes but sometimes hours or even days). The number of iterations and the population size of the genetic algorithm will grow linearly along with the number of estimated parameters. If it seems that the global maximum has not been found then it is possible to continue the search by restarting the function setting its input argument x0 to the x1 output value. Note that if cov_type = "bootstrap" then the ga function will not be used for bootstrap iterations since it may be extremely time consuming.
If opt_type = "GA" then opt_control should be the list containing the values to be passed to the ga function. It is possible to pass arguments lower, upper, popSize, pcrossover, pmutation, elitism, maxiter, suggestions, optim, optimArgs, seed and monitor. Note that it is possible to set population, selection, crossover and mutation arguments changing ga default parameters via the gaControl function. Information on these arguments is reported in the ga documentation. In order to provide manual values for the lower and upper bounds please follow the parameters ordering mentioned above for the x0 argument. If these bounds are not provided manually then they (except those related to the polynomial coefficients) will depend on the estimates obtained by local optimization via the optim function (these estimates will be in the middle between lower and upper). Specifically, for each sd parameter the lower (upper) bound is 5 times lower (higher) than this parameter's optim estimate. For each mean and regression coefficient parameter its lower and upper bounds deviate from the corresponding optim estimate by two absolute values of this estimate. Finally, the lower and upper bounds for each polynomial coefficient are -10 and 10 correspondingly (and do not depend on their optim estimates).
The following arguments differ from their defaults in ga:
pmutation = 0.2, optim = TRUE, optimArgs = list("method" = "Nelder-Mead", "poptim" = 0.2, "pressel" = 0.5), seed = 8, elitism = 2 + round(popSize * 0.1).
Let's denote by n_reg the number of regressors included into the selection and outcome formulas. The arguments popSize and maxiter of the ga function have been set proportional to the number of estimated polynomial coefficients and independent variables (here z_K and y_K correspond to selection_K and outcome_K respectively):
popSize = 10 + 5 * (z_K + 1) * (y_K + 1) + 2 * n_reg
maxiter = 50 * (z_K + 1) * (y_K + 1) + 10 * n_reg
Value
This function returns an object of class "hpaSelection".
An object of class "hpaSelection" is a list containing the following components:
optim - optim function output. If opt_type = "GA" then it is the list containing optim and ga functions outputs.
x1 - numeric vector of distribution parameters estimates.
Newey - list containing information concerning Newey's method estimation results.
selection_mean - estimate of the Hermite polynomial mean parameter related to selection equation random error marginal distribution.
outcome_mean - estimate of the Hermite polynomial mean parameter related to outcome equation random error marginal distribution.
selection_sd - estimate of the sd parameter related to selection equation random error marginal distribution.
outcome_sd - estimate of the Hermite polynomial sd parameter related to outcome equation random error marginal distribution.
pol_coefficients - polynomial coefficients estimates.
pol_degrees - numeric vector which first element is selection_K and the second is outcome_K.
selection_coef - selection equation regression coefficients estimates.
outcome_coef - outcome equation regression coefficients estimates.
cov_mat - covariance matrix estimate.
results - numeric matrix representing estimation results.
log-likelihood - value of Log-Likelihood function.
re_moments - list which contains information about random errors expectations, variances and correlation.
data_List - list containing model variables and their partition according to outcome and selection equations.
n_obs - number of observations.
ind_List - list which contains information about parameters indexes in x1.
selection_formula - the same as selection input parameter.
outcome_formula - the same as outcome input parameter.
The abovementioned list Newey has class "hpaNewey" and contains the following components:
outcome_coef - regression coefficients estimates (except the constant term which is part of the conditional expectation approximating polynomial).
selection_coef - regression coefficients estimates related to selection equation.
constant_biased - biased estimate of the constant term.
inv_mills - inverse Mills ratios estimates and their powers (including constant).
inv_mills_coef - coefficients related to inv_mills.
pol_elements - the same as pol_elements input parameter. However if is_Newey_loocv is TRUE then it will equal the number of conditional expectation approximating terms for Newey's method which minimizes the leave-one-out cross-validation criteria.
outcome_exp_cond - dependent variable conditional expectation estimates.
selection_exp - selection equation random error expectation estimate.
selection_var - selection equation random error variance estimate.
hpaBinaryModel - object of class "hpaBinary" which contains selection equation estimation results.
The abovementioned list re_moments contains the following components:
selection_exp - selection equation random errors expectation estimate.
selection_var - selection equation random errors variance estimate.
outcome_exp - outcome equation random errors expectation estimate.
outcome_var - outcome equation random errors variance estimate.
errors_covariance - outcome and selection equation random errors covariance estimate.
rho - outcome and selection equation random errors correlation estimate.
rho_std - outcome and selection equation random errors correlation estimator standard error estimate.
References
A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>
W. K. Newey (2009) <https://doi.org/10.1111/j.1368-423X.2008.00263.x>
Mroz T. A. (1987) <doi:10.2307/1911029>
See Also
summary.hpaSelection,predict.hpaSelection,plot.hpaSelection,logLik.hpaSelection
Examples
## Let's estimate a wage equation accounting for non-random selection.
## See the reference to Mroz T. A. (1987) to get additional details about
## the data these examples use

# Prepare data
library("sampleSelection")
data("Mroz87")
h <- data.frame("kids" = as.numeric(Mroz87$kids5 + Mroz87$kids618 > 0),
                "age" = as.numeric(Mroz87$age),
                "faminc" = as.numeric(Mroz87$faminc),
                "educ" = as.numeric(Mroz87$educ),
                "exper" = as.numeric(Mroz87$exper),
                "city" = as.numeric(Mroz87$city),
                "wage" = as.numeric(Mroz87$wage),
                "lfp" = as.numeric(Mroz87$lfp))

# Estimate model parameters
model <- hpaSelection(selection = lfp ~ educ + age + I(age ^ 2) +
                                  kids + log(faminc),
                      outcome = log(wage) ~ exper + I(exper ^ 2) +
                                educ + city,
                      selection_K = 2, outcome_K = 3,
                      data = h,
                      pol_elements = 3, is_Newey_loocv = TRUE)
summary(model)

# Plot outcome equation random errors density
plot(model, type = "outcome")
# Plot selection equation random errors density
plot(model, type = "selection")

## Estimate semi-nonparametric sample selection model
## parameters on simulated data given chi-squared random errors

set.seed(100)
library("mvtnorm")

# Sample size
n <- 1000

# Simulate independent variables
X_rho <- 0.5
X_sigma <- matrix(c(1, X_rho, X_rho,
                    X_rho, 1, X_rho,
                    X_rho, X_rho, 1),
                  ncol = 3)
X <- rmvnorm(n = n, mean = c(0, 0, 0), sigma = X_sigma)

# Simulate random errors
epsilon <- matrix(0, n, 2)
epsilon_z_y <- rchisq(n, 5)
epsilon[, 1] <- (rchisq(n, 5) + epsilon_z_y) * (sqrt(3 / 20)) - 3.8736
epsilon[, 2] <- (rchisq(n, 5) + epsilon_z_y) * (sqrt(3 / 20)) - 3.8736

# Simulate selection equation
z_star <- 1 + 1 * X[, 1] + 1 * X[, 2] + epsilon[, 1]
z <- as.numeric((z_star > 0))

# Simulate outcome equation
y_star <- 1 + 1 * X[, 1] + 1 * X[, 3] + epsilon[, 2]
z <- as.numeric((z_star > 0))
y <- y_star
y[z == 0] <- NA
h <- as.data.frame(cbind(z, y, X))
names(h) <- c("z", "y", "x1", "x2", "x3")

# Estimate parameters
model <- hpaSelection(selection = z ~ x1 + x2,
                      outcome = y ~ x1 + x3,
                      data = h,
                      selection_K = 1, outcome_K = 3)
summary(model)

# Get conditional predictions for outcome equation
model_pred_c <- predict(model, is_cond = TRUE)
# Conditional predictions y|z=1
model_pred_c$y_1
# Conditional predictions y|z=0
model_pred_c$y_0

# Get unconditional predictions for outcome equation
model_pred_u <- predict(model, is_cond = FALSE)
model_pred_u$y

# Get conditional predictions for selection equation
# Note that for z=0 these predictions are NA
predict(model, is_cond = TRUE, type = "selection")
# Get unconditional predictions for selection equation
predict(model, is_cond = FALSE, type = "selection")
Probabilities and Moments Hermite Spline Approximation
Description
A set of functions similar to the dhpa-like functions. The difference is that instead of a polynomial these functions utilize a spline.
Usage
dhsa(x, m, knots, mean = 0, sd = 1, log = FALSE)

ehsa(m, knots, mean = 0, sd = 1, power = 1)
Arguments
x | numeric vector of values for which the function should be estimated. |
m | numeric matrix which rows correspond to spline intervalswhile columns represent variables powers. Therefore the element in i-th row and j-th column represents the coefficient associated withthe variable that 1) belongs to the i-th interval i.e. between i-th and(i + 1)-th knots 2) raised to the power of (j - 1). |
knots | sorted in ascending order numeric vector representingknots of the spline. |
mean | expected value of a normal distribution. |
sd | standard deviation of a normal distribution. |
log | logical; if |
power | non-negative integer representing the power of the expected value i.e. E(X ^ power) will be estimated. |
Details
In contrast to the dhpa-like functions these functions may deal with univariate distributions only. In the future these functions will be generalized to work with multivariate distributions. The main idea of these functions is to use a squared spline instead of a squared polynomial in order to provide greater numeric stability and approximation accuracy. To provide spline parameters please use the m and knots arguments (i.e. instead of the pol_degrees and pol_coefficients arguments that were used to specify the polynomial for the dhpa-like functions).
Value
Function dhsa returns a vector of density values of the same length as x. Function ehsa returns the moment value.
See Also
Examples
## Examples demonstrating dhsa and ehsa functions application.

# Generate b-splines
b <- bsplineGenerate(knots = c(-2.1, 1.5, 1.5, 2.2, 3.7, 4.2, 5),
                     degree = 3)

# Combine b-splines into a spline
spline <- bsplineComb(splines = b, weights = c(1.6, -1.2, 3.2))

# Assign parameters using the spline created above
knots <- spline$knots
m <- spline$m
mean <- 1
sd <- 2

# Estimate the density at particular points
x <- c(2, 3.7, 8)
dhsa(x, m = m, knots = knots, mean = mean, sd = sd)

# Calculate expected value
ehsa(m = m, knots = knots, mean = mean, sd = sd, power = 1)

# Evaluate the third moment
ehsa(m = m, knots = knots, mean = mean, sd = sd, power = 3)
Calculates log-likelihood for "hpaBinary" object
Description
This function calculates log-likelihood for "hpaBinary" object
Usage
## S3 method for class 'hpaBinary'logLik(object, ...)Arguments
object | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
Calculates log-likelihood for "hpaML" object
Description
This function calculates log-likelihood for "hpaML" object
Usage
## S3 method for class 'hpaML'
logLik(object, ...)
Arguments
object | Object of class "hpaML" |
... | further arguments (currently ignored) |
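A hypothetical usage sketch (the fitted object ml_result below is illustrative and mirrors the hpaML examples elsewhere in this manual):

# Fit a simple model and extract its log-likelihood
set.seed(123)
x <- matrix(rnorm(1000), ncol = 1)      # hypothetical simulated sample
ml_result <- hpaML(data = x, pol_degrees = 2)
logLik(ml_result)                       # log-likelihood at the estimates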
Calculates log-likelihood for "hpaSelection" object
Description
This function calculates log-likelihood for "hpaSelection" object
Usage
## S3 method for class 'hpaSelection'logLik(object, ...)Arguments
object | Object of class "hpaSelection" |
... | further arguments (currently ignored) |
Calculates log-likelihood for "hpaBinary" object
Description
This function calculates log-likelihood for "hpaBinary" object
Usage
logLik_hpaBinary(object)Arguments
object | Object of class "hpaBinary" |
Calculates log-likelihood for "hpaML" object
Description
This function calculates log-likelihood for "hpaML" object
Usage
logLik_hpaML(object)Arguments
object | Object of class "hpaML" |
Calculates log-likelihood for "hpaSelection" object
Description
This function calculates log-likelihood for "hpaSelection" object
Usage
logLik_hpaSelection(object)Arguments
object | Object of class "hpaSelection" |
Calculates multivariate empirical cumulative distribution function
Description
This function calculates the multivariate empirical cumulative distribution function at each point of the sample.
Usage
mecdf(x)
Arguments
x | numeric matrix which rows are observations |
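A small illustrative sketch (assumed usage; mecdf presumably returns one empirical cdf value per row of x):

# Bivariate sample and its empirical cdf evaluated at each observation
set.seed(123)
x <- cbind(rnorm(100), rnorm(100))
F_hat <- mecdf(x)
head(F_hat)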
Calculate k-th order moment of normal distribution
Description
This function recursively calculates k-th order moment of normal distribution.
Usage
normalMoment(k = 0L, mean = 0, sd = 1,
             return_all_moments = FALSE, is_validation = TRUE,
             is_central = FALSE, diff_type = "NO")
Arguments
k | non-negative integer moment order. |
mean | numeric expected value. |
sd | positive numeric standard deviation. |
return_all_moments | logical; if |
is_validation | logical value indicating whether function input arguments should be validated. Set it to |
is_central | logical; if |
diff_type | string value indicating the type of the argumentthe moment should be differentiated respect to.Default value is |
Details
This function estimates the k-th order moment of the normal distribution whose mean equals mean and whose standard deviation equals sd.
Note that the parameter k value is automatically converted to integer. So passing a non-integer k value will not cause any errors but the calculations will be performed for the rounded k value only.
Value
This function returns the k-th order moment of the normal distribution whose mean equals mean and whose standard deviation is sd. If return_all_moments is TRUE then see this argument's description above for output details.
Examples
## Calculate 5-th order moment of normal random variable whose
## mean equals 3 and standard deviation equals 5.

# 5-th moment
normalMoment(k = 5, mean = 3, sd = 5)

# (0-5)-th moments
normalMoment(k = 5, mean = 3, sd = 5, return_all_moments = TRUE)

# 5-th moment derivative with respect to mean
normalMoment(k = 5, mean = 3, sd = 5, diff_type = "mean")

# 5-th moment derivative with respect to sd
normalMoment(k = 5, mean = 3, sd = 5, diff_type = "sd")
Plot hpaBinary random errors approximated density
Description
Plot hpaBinary random errors approximated density
Usage
## S3 method for class 'hpaBinary'plot(x, y = NULL, ...)Arguments
x | Object of class "hpaBinary" |
y | this parameter currently ignored |
... | further arguments to be passed to |
Plot approximated marginal density using hpaML output
Description
Plot approximated marginal density using hpaML output
Usage
## S3 method for class 'hpaML'
plot(x, y = NULL, ..., ind = 1, given = NULL)
Arguments
x | Object of class "hpaML" |
y | this parameter currently ignored |
... | further arguments to be passed to |
ind | index of random variable for whichapproximation to marginal density should be plotted |
given | numeric vector of the same length as given_indfrom |
Plot hpaSelection random errors approximated density
Description
Plot hpaSelection random errors approximated density
Usage
## S3 method for class 'hpaSelection'plot(x, y = NULL, ..., type = "outcome")Arguments
x | Object of class "hpaSelection" |
y | this parameter currently ignored |
... | further arguments to be passed to |
type | character; if "outcome" then function plots the graph for outcome equation random errors, if "selection" then plot for selection equation random errors will be generated. |
Calculate normal cdf in parallel
Description
Calculate in parallel, for each value from vector x, the distribution function of the normal distribution with mean equal to mean and standard deviation equal to sd.
Usage
pnorm_parallel(x, mean = 0, sd = 1, is_parallel = FALSE)
Arguments
x | vector of quantiles: should be numeric vector,not just double value. |
mean | double value. |
sd | double positive value. |
is_parallel | if |
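A short illustration; the values should agree with base R's pnorm:

# Normal cdf at several points
x <- c(-1.5, 0, 1.5)
pnorm_parallel(x, mean = 0, sd = 1)
pnorm(x, mean = 0, sd = 1)                 # base R reference
pnorm_parallel(x, mean = 0, sd = 1,
               is_parallel = TRUE)         # may use multiple cores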
Multivariate Polynomial Representation
Description
Function polynomialIndex provides a matrix which allows to iterate through the elements of a multivariate polynomial being aware of these elements' powers. So the (i, j)-th element of the matrix is the power of the i-th variable in the j-th multivariate polynomial element.
Function printPolynomial prints a multivariate polynomial given its degrees (pol_degrees) and coefficients (pol_coefficients) vectors.
Usage
polynomialIndex(pol_degrees = numeric(0), is_validation = TRUE)

printPolynomial(pol_degrees, pol_coefficients, is_validation = TRUE)
Arguments
pol_degrees | non-negative integer vector of polynomial degrees (orders). |
is_validation | logical value indicating whether function input arguments should be validated. Set it to |
pol_coefficients | numeric vector of polynomial coefficients. |
Details
Multivariate polynomial of degrees (K_{1},...,K_{m}) (pol_degrees) has the form:
a_{(0,...,0)}x_{1}^{0}*...*x_{m}^{0}+ ... + a_{(K_{1},...,K_{m})}x_{1}^{K_{1}}*...*x_{m}^{K_{m}},
where a_{(i_{1},...,i_{m})} are polynomial coefficients, while polynomial elements are:
a_{(i_{1},...,i_{m})}x_{1}^{i_{1}}*...*x_{m}^{i_{m}},
where (i_{1},...,i_{m}) are polynomial element's powers corresponding to variables (x_{1},...,x_{m}) respectively. Note that i_{j}\in \{0,...,K_{j}\}.
Function printPolynomial removes polynomial elements whose coefficients are zero and variables whose powers are zero. The output may contain long coefficient representations as they are not rounded.
Value
Function polynomialIndex returns a matrix whose rows correspond to variables while columns correspond to polynomial elements. So the (i, j)-th element of this matrix is the power of the x_{i} variable in the j-th polynomial element. Therefore the j-th column of this matrix contains the vector of powers (i_{1},...,i_{m}) for the j-th polynomial element. So the function transforms m-dimensional elements indexing to one-dimensional.
Function printPolynomial returns the string which contains the polynomial symbolic representation.
Examples
## Get polynomial indexes matrix for the polynomial 
## which degrees are (1, 3, 5)
polynomialIndex(c(1, 3, 5))

## Consider multivariate polynomial of degrees (2, 1) such that coefficients
## for elements which powers sum is even are 2 and for those which powers sum
## is odd are 5. So the polynomial is 2+5x2+5x1+2x1x2+2x1^2+5x1^2x2 where
## x1 and x2 are polynomial variables.

# Create variable to store polynomial degrees
pol_degrees <- c(2, 1)

# Let's represent its powers (not coefficients) in a matrix form
pol_matrix <- polynomialIndex(pol_degrees)

# Calculate polynomial elements' powers sums
pol_powers_sum <- pol_matrix[1, ] + pol_matrix[2, ]

# Let's create polynomial coefficients vector filling it
# with NA values
pol_coefficients <- rep(NA, (pol_degrees[1] + 1) * (pol_degrees[2] + 1))

# Now let's fill coefficients vector with correct values
pol_coefficients[pol_powers_sum %% 2 == 0] <- 2
pol_coefficients[pol_powers_sum %% 2 != 0] <- 5

# Finally, let's check that correspondence is correct
printPolynomial(pol_degrees, pol_coefficients)

## Let's represent polynomial 0.3+0.5x2-x2^2+2x1+1.5x1x2+x1x2^2
pol_degrees <- c(1, 2)
pol_coefficients <- c(0.3, 0.5, -1, 2, 1.5, 1)
printPolynomial(pol_degrees, pol_coefficients)
Predict method for hpaBinary
Description
Predict method for hpaBinary
Usage
## S3 method for class 'hpaBinary'
predict(object, ..., newdata = NULL, is_prob = TRUE)
Arguments
object | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
newdata | An optional data frame (forhpaBinary andhpaSelection) or numeric matrix (forhpaML)in which to look for variables with which to predict. If omitted,the original data frame (matrix) used. |
is_prob | logical; if TRUE (default) then function returns predicted probabilities. Otherwise latent variable(single index) estimates will be returned. |
Value
This function returns predicted probabilities based on hpaBinary estimation results.
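A hedged usage sketch; the hpaBinary call below assumes the interface (formula, data and polynomial degree K) described in its own help entry, and the simulated data are purely illustrative:

# Fit a simple binary choice model and obtain predictions
set.seed(123)
h <- data.frame(x = rnorm(500))
h$z <- as.numeric(h$x + rnorm(500) > 0)
model <- hpaBinary(z ~ x, data = h, K = 2)   # hypothetical call
predict(model)                     # predicted probabilities
predict(model, is_prob = FALSE)    # latent variable (single index) estimates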
Predict method for hpaML
Description
Predict method for hpaML
Usage
## S3 method for class 'hpaML'predict(object, ..., newdata = matrix(c(0)))Arguments
object | Object of class "hpaML" |
... | further arguments (currently ignored) |
newdata | An optional data frame (forhpaBinary andhpaSelection) or numeric matrix (forhpaML)in which to look for variables with which to predict. If omitted,the original data frame (matrix) used. |
Value
This function returns predictions based onhpaML estimation results.
Predict outcome and selection equation values from hpaSelection model
Description
This function predicts outcome and selection equation values from hpaSelection model.
Usage
## S3 method for class 'hpaSelection'
predict(object, ..., newdata = NULL, method = "HPA",
        is_cond = TRUE, type = "outcome")
Arguments
object | Object of class "hpaSelection" |
... | further arguments (currently ignored) |
newdata | An optional data frame (forhpaBinary andhpaSelection) or numeric matrix (forhpaML)in which to look for variables with which to predict. If omitted,the original data frame (matrix) used. |
method | string value indicating prediction method based on hermite polynomial approximation "HPA" or Newey method "Newey". |
is_cond | logical; if |
type | character; if "outcome" (default) then predictions for selection equation will be estimated according to |
Details
Note that the Newey method cannot predict conditional outcomes when the selection equation value is zero. Conditional probabilities for the selection equation can be estimated only when the dependent variable from the outcome equation is observable.
Value
This function returns a list whose structure depends on the method, is_cond and type argument values.
Predict method for hpaBinary
Description
Predict method for hpaBinary
Usage
predict_hpaBinary(object, newdata = NULL, is_prob = TRUE)Arguments
object | Object of class "hpaBinary" |
newdata | An optional data frame (forhpaBinary andhpaSelection) or numeric matrix (forhpaML)in which to look for variables with which to predict. If omitted,the original data frame (matrix) used. |
is_prob | logical; if TRUE (default) then function returns predicted probabilities. Otherwise latent variable(single index) estimates will be returned. |
Value
This function returns predicted probabilities based onhpaBinary estimation results.
Predict method for hpaML
Description
Predict method for hpaML
Usage
predict_hpaML(object, newdata = matrix(1, 1))Arguments
object | Object of class "hpaML" |
newdata | An optional data frame (forhpaBinary andhpaSelection) or numeric matrix (forhpaML)in which to look for variables with which to predict. If omitted,the original data frame (matrix) used. |
Value
This function returns predictions based onhpaML estimation results.
Predict outcome and selection equation values from hpaSelection model
Description
This function predicts outcome and selection equation values from hpaSelection model.
Usage
predict_hpaSelection(object, newdata = NULL, method = "HPA",
                     is_cond = TRUE, is_outcome = TRUE)
Arguments
object | Object of class "hpaSelection" |
newdata | An optional data frame (forhpaBinary andhpaSelection) or numeric matrix (forhpaML)in which to look for variables with which to predict. If omitted,the original data frame (matrix) used. |
method | string value indicating prediction method based on hermite polynomial approximation "HPA" or Newey method "Newey". |
is_cond | logical; if |
is_outcome | logical; if |
Details
Note that the Newey method cannot predict conditional outcomes when the selection equation value is zero. Conditional probabilities for the selection equation can be estimated only when the dependent variable from the outcome equation is observable.
Value
This function returns a list whose structure depends on the method, is_cond and is_outcome argument values.
Print method for "hpaBinary" object
Description
Print method for "hpaBinary" object
Usage
## S3 method for class 'hpaBinary'print(x, ...)Arguments
x | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
Print method for "hpaML" object
Description
Print method for "hpaML" object
Usage
## S3 method for class 'hpaML'print(x, ...)Arguments
x | Object of class "hpaML" |
... | further arguments (currently ignored) |
Print method for "hpaSelection" object
Description
Print method for "hpaSelection" object
Usage
## S3 method for class 'hpaSelection'print(x, ...)Arguments
x | Object of class "hpaSelection" |
... | further arguments (currently ignored) |
Summary for "hpaBinary" object
Description
Summary for "hpaBinary" object
Usage
## S3 method for class 'summary.hpaBinary'print(x, ...)Arguments
x | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
Summary for hpaML output
Description
Summary for hpaML output
Usage
## S3 method for class 'summary.hpaML'print(x, ...)Arguments
x | Object of class "hpaML" |
... | further arguments (currently ignored) |
Summary for "hpaSelection" object
Description
Summary for "hpaSelection" object
Usage
## S3 method for class 'summary.hpaSelection'print(x, ...)Arguments
x | Object of class "hpaSelection" |
... | further arguments (currently ignored) |
Summary for hpaBinary output
Description
Summary for hpaBinary output
Usage
print_summary_hpaBinary(x)Arguments
x | Object of class "hpaML" |
Summary for hpaML output
Description
Summary for hpaML output
Usage
print_summary_hpaML(x)Arguments
x | Object of class "hpaML" |
Summary for hpaSelection output
Description
Summary for hpaSelection output
Usage
print_summary_hpaSelection(x)Arguments
x | Object of class "hpaSelection" |
Summarizing hpaBinary Fits
Description
Summarizing hpaBinary Fits
Usage
## S3 method for class 'hpaBinary'summary(object, ...)Arguments
object | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
Value
This function returns the same list ashpaBinary function changing its class to "summary.hpaBinary".
Summarizing hpaML Fits
Description
Summarizing hpaML Fits
Usage
## S3 method for class 'hpaML'summary(object, ...)Arguments
object | Object of class "hpaML" |
... | further arguments (currently ignored) |
Value
This function returns the same list ashpaML function changing its class to "summary.hpaML".
Summarizing hpaSelection Fits
Description
This function summarizing hpaSelection Fits
Usage
## S3 method for class 'hpaSelection'summary(object, ...)Arguments
object | Object of class "hpaSelection" |
... | further arguments (currently ignored) |
Value
This function returns the same list ashpaSelection function changing its class to "summary.hpaSelection".
Summarizing hpaBinary Fits
Description
Summarizing hpaBinary Fits
Usage
summary_hpaBinary(object)Arguments
object | Object of class "hpaBinary" |
Value
This function returns the same list ashpaBinary function changing its class to "summary.hpaBinary".
Summarizing hpaML Fits
Description
Summarizing hpaML Fits
Usage
summary_hpaML(object)Arguments
object | Object of class "hpaML" |
Value
This function returns the same list ashpaML function changing its class to "summary.hpaML".
Summarizing hpaSelection Fits
Description
This function summarizing hpaSelection Fits.
Usage
summary_hpaSelection(object)Arguments
object | Object of class "hpaSelection". |
Value
This function returns the same list ashpaSelection function changing its class to "summary.hpaSelection".
Calculate k-th order moment of truncated normal distribution
Description
This function recursively calculates k-th order moment of truncated normal distribution.
Usage
truncatedNormalMoment(k = 1L, x_lower = numeric(0), x_upper = numeric(0),
                      mean = 0, sd = 1,
                      pdf_lower = numeric(0), cdf_lower = numeric(0),
                      pdf_upper = numeric(0), cdf_upper = numeric(0),
                      cdf_difference = numeric(0),
                      return_all_moments = FALSE, is_validation = TRUE,
                      is_parallel = FALSE, diff_type = "NO")
Arguments
k | non-negative integer moment order. |
x_lower | numeric vector of lower truncation points. |
x_upper | numeric vector of upper truncation points. |
mean | numeric expected value. |
sd | positive numeric standard deviation. |
pdf_lower | non-negative numeric matrix of precalculated normaldensity functions with mean |
cdf_lower | non-negative numeric matrix of precalculated normal cumulative distribution functionswith mean |
pdf_upper | non-negative numeric matrix of precalculated normaldensity functions with mean |
cdf_upper | non-negative numeric matrix ofprecalculated normal cumulative distribution functionswith mean |
cdf_difference | non-negative numeric matrix of precalculated |
return_all_moments | logical; if |
is_validation | logical value indicating whether function input arguments should be validated. Set it to |
is_parallel | if |
diff_type | string value indicating the type of the argumentthe moment should be differentiated respect to.Default value is |
Details
This function estimates the k-th order moment of the normal distribution whose mean equals mean and whose standard deviation equals sd, truncated at points given by x_lower and x_upper. Note that the function is vectorized so you can provide x_lower and x_upper as vectors of equal size. If vector values for x_lower and x_upper are not provided then their default values will be set to -(.Machine$double.xmin * 0.99) and (.Machine$double.xmax * 0.99) correspondingly.
Note that the parameter k value is automatically converted to integer. So passing a non-integer k value will not cause any errors but the calculations will be performed for the rounded k value only.
If there are precalculated density or cumulative distribution functions at standardized truncation points (subtract mean and then divide by sd) then it is possible to provide them through the pdf_lower, pdf_upper, cdf_lower and cdf_upper arguments in order to decrease the number of calculations.
Value
This function returns a vector of k-th order moments for a normally distributed random variable with mean = mean and standard deviation = sd under truncation points x_lower and x_upper correspondingly. If return_all_moments is TRUE then see this argument's description above for output details.
Examples
## Calculate 3-rd order moments of five truncated normal random 
## variables (x1, ..., x5) whose mean is 3 and standard deviation is 5. 
## These random variables truncation points are given 
## as follows: -1<x1<1, 0<x2<2, 1<x3<3, -Inf<x4<2, -Inf<x5<Inf.
k <- 3
x_lower <- c(-1, 0, 1, -Inf, -Inf)
x_upper <- c(1, 2, 3, 2, Inf)
mean <- 3
sd <- 5

# get the moments
truncatedNormalMoment(k, x_lower, x_upper, mean, sd)

# get matrix of (0-3)-th moments (columns) for each variable (rows)
truncatedNormalMoment(k, x_lower, x_upper, mean, sd,
                      return_all_moments = TRUE)

# get the moments derivatives with respect to mean
truncatedNormalMoment(k, x_lower, x_upper, mean, sd, diff_type = "mean")

# get the moments derivatives with respect to standard deviation
truncatedNormalMoment(k, x_lower, x_upper, mean, sd, diff_type = "sd")
Extract covariance matrix from hpaBinary object
Description
Extract covariance matrix from hpaBinary object
Usage
## S3 method for class 'hpaBinary'vcov(object, ...)Arguments
object | Object of class "hpaBinary" |
... | further arguments (currently ignored) |
Extract covariance matrix from hpaML object
Description
Extract covariance matrix from hpaML object
Usage
## S3 method for class 'hpaML'
vcov(object, ...)
Arguments
object | Object of class "hpaML" |
... | further arguments (currently ignored) |
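A hypothetical usage sketch (the fitted object ml_result below is illustrative):

# Covariance matrix and standard errors of the estimates
set.seed(123)
ml_result <- hpaML(data = matrix(rnorm(500), ncol = 1), pol_degrees = 2)
vcov(ml_result)                    # covariance matrix estimate
sqrt(diag(vcov(ml_result)))        # implied standard errors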
Extract covariance matrix from hpaSelection object
Description
Extract covariance matrix from hpaSelection object
Usage
## S3 method for class 'hpaSelection'vcov(object, ...)Arguments
object | Object of class "hpaSelection" |
... | further arguments (currently ignored) |