Movatterモバイル変換


[0]ホーム

URL:


Version:2.1.1
Date:2023-04-04
Title:The Skew-Normal and Related Distributions Such as the Skew-t andthe SUN
Maintainer:Adelchi Azzalini <adelchi.azzalini@unipd.it>
Depends:R (≥ 3.0.0), methods, stats4
Imports:mnormt (≥ 2.0.0), numDeriv, utils, quantreg
Suggests:R.rsp
VignetteBuilder:R.rsp
Description:Build and manipulate probability distributions of the skew-normal family and some related ones, notably the skew-t and the SUN families. For the skew-normal and the skew-t distributions, statistical methods are provided for data fitting and model diagnostics, in the univariate and the multivariate case.
License:GPL-2 |GPL-3
URL:http://azzalini.stat.unipd.it/SN/
Encoding:UTF-8
NeedsCompilation:no
Packaged:2023-04-04 17:13:55 UTC; aa
Author:Adelchi AzzaliniORCID iD [aut, cre]
Repository:CRAN
Date/Publication:2023-04-04 18:10:02 UTC

Packagesn: overview, background and history

Description

Thesn package provides facilities to define and manipulate probability distributions of the skew-normal (SN) family and some related ones, notably the skew-t (ST) and the unified skew-normal (SUN) families. For a number of these families, statistical methods are provided, to perform data fitting and model diagnostics, in the univariate and the multivariate case.

Overview of the package structure and commands

A separatate document is entirely dedicated to the presentation of the package structure and its basic functions; see thepackage overview.

Background information and references

The package adopts the terminology, notation and general framework of themonograph by Azzalini and Capitanio (2014). This matching constitutes a reason for the numerous references to the book in the documentation of the package.

An additional reason for referring to that monograph instead of the originalresearch papers is that the book provides a relatively not-so-formal account of material which has been elaborated in a number of publications, sometimesvery technical, or re-elabotated over a few papers or possibly mixing the information of key interest with other material. In other words, the motivation behind this policy is readability,not indulgence in self-citation.

When one or a few original sources appeared to deliver the requiredinformation in a compact and accessible form, they have been cited directly. In any case, the cited sections of the book include bibliographic noteswhich refer back to the original sources.

A bit of history

The first version of the package was written in 1997, and it was uploaded onCRAN in 1998.Subsequent versions have evolved gradually up to version 0.4-18 in May 2013.

In January 2014, version 1.0-0 has been uploaded toCRAN.This represented a substantial re-writing of the earlier ‘version 0.x’,developed in broad connection with the book by Azzalini and Capitanio (2014). Differences between the ‘version 0’ and the ‘version 1’ series are radical; they concern the core computational and graphical part as well as the user interface. Since version 1.0-0, the S4 protocol for classes and methods has been adopted.

After various versions 1.x-y, version 2.0.0 has appeared in March 2021,providing support for theSUN distribution.

Additional information on the evolution of the package is provided inNEWS file, accessible from the package documentation index page.

Backward compatibility versus ‘version 0.4-18’

There is a partial backward compatibility of newer version versus‘version 0-4.18’ of the package.Some functions of the older version would work as beforewith virtually no change; a wider set arguments is now allowed. Functionsdsn,dst,dmsn and alike fall inthis category: in some cases, the names of the arguments have been altered, but they work as before if called with unnamed arguments; similar cases aremsn.mle,sn.cumulants andT.Owen.Notice, however, thatmsn.mle and other fitting functions haveeffectively been subsumed into the more comprehensive fitting functionselm.

A second group of functions will work with little or even minimal changes.Specific examples are functionssn.mle andst.mle which havebecomesn.mple andst.mple, with some additionalarguments (again, one can achieve the same result viaselm). Another example is constitude by the group of functionsdp.to.cp,cp.to.dp andst.cumulants.inversion, which have been replaced by the more general functionsdp2cp andcp2dp; one only needs to pay attention to conversion from 3rd and 4th order cumulants to their standardized form in connection with the replacement ofst.cumulants.inversion.

Finally, some functions are not there any longer, with no similarly-workingfunctions in the new version. This is the case ofsn.mle.groupedandst.mle.grouped for maximum likelihood estimation fromgrouped data, that is, data recorded as intervals and corresponding frequencies.

Requirements

R version 2.15-3 or higher, plus packagesmnormt,numDeriv andquantreg, in addition to standardpackages (methods,graphics,stats4, etc.)

Version

The commandcitation("sn") indicates, among other information,the running version of the package.The most recent version of the package can be obtained fromthe web page:http://azzalini.stat.unipd.it/SN/which also provides related material.

From the above-indicated web page, one can also obtain the package ‘sn0’ which is essentially the last ‘version 0’ (that is, 0.4-18) with suitable renaming of certain ingredients.This allows to have both the current and the old package installed at the same time.

Author

Adelchi Azzalini.Please send comments, error reportset cetera to the author, whose web page ishttp://azzalini.stat.unipd.it/.

Licence

This package and its documentation are usable under the terms of the “GNU General Public License” version 3 or version 2,as you prefer; a copy of them is available fromhttps://www.R-project.org/Licenses/.

While the software is freely usable, it would be appreciatedif a reference is inserted in publications or other workwhich makes use of it. For the appropriate way of referencing it, see the commandcitation("sn").

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

package-overview


Penalty function for log-likelihood ofselm models

Description

Penalty function for the log-likelihood ofselm models whenmethod="MPLE".Qpenalty is the default function;MPpenalty is an example of a user-defined function effectivelycorresponding to a prior distributio onalpha.

Usage

Qpenalty(alpha_etc, nu = NULL, der = 0)MPpenalty(alpha, der = 0)

Arguments

alpha_etc,alpha

in the univariate case, a single valuealpha; in the multivariate case, a two-component list whose first component is the vectoralpha, the second one is matrixcov2cor(Omega).

nu

degrees of freedom, only required ifselm is calledwithfamily="ST".

der

a numeric value in the set0,1,2 which indicates therequired numer of derivatives of the function. In the multivariate casethe function will only be called withder equal to 0 or 1.

Details

The penalty is a function ofalpha, but its expression maydepend on other ingredients, specificallynu andcov2cor(Omega).See ‘Details’ ofselm for additional information.

The penalty mechanism allows to introduce a prior distribution\pi for\alpha by settingQ=-\log\pi, leading to a maximuma posteriori estimate in the stated sense.

As a simple illustration of this mechanism, functionMPpenaltyimplements the ‘matching prior’ distribution for the univariateSNdistribution studied by Cabraset al. (2012); a brief summary of theproposal is provided in Section 3.2 of Azzalini and Capitanio (2014). Notethat, besidesalpha=+/-Inf, this choice also penalizesalpha=0withQ=Inf, effectively removingalpha=0 from the parameterspace.

Starting from the code of functionMPpenalty, a user should be able to introduce an alternative prior distribution if so desired.

Value

A positive numberQ representing the penalty, possiblywith attributesattr(Q, "der1") andattr(Q, "der2"), depending onthe input valueder.

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Cabras, S., Racugno, W., Castellanos, M. E., and Ventura, L. (2012).A matching prior for the shape parameter of the skew-normal distribution.Scand. J. Statist.39, 236–247.

See Also

selm function

Examples

data(frontier)m2 <- selm(frontier ~ 1)  # no penaltym2a <- selm(frontier ~ 1, method="MPLE") # penalty="Qpenalty" is implied herem2b <- selm(frontier ~ 1, method="MPLE", penalty="MPpenalty")

Class"SECdistrMv"

Description

A class of objects representing multivariate skew-elliptically contoured (SEC) distributions.

Objects from the Class

Objects can be created by a call to functionmakeSECdistr,when its argumentdp is a list, or by a suitable transformation of some object of this class. They can also obtained from an object generatedbyselm using the functionextractSEDdistr.

Slots

family:

a character string which identifies the parametricfamily; currently, possible values are:"SN","ESN","ST","SC".

dp:

a list of parameters; its length depends on the selectedfamily.

name:

a character string with the name of the multivariatevariable; it can be an empty string.

compNames:

a vector of character strings with the names of the component variables.

Methods

show

signature(object = "SECdistrMv-class"): ...

plot

signature(x = "SECdistrMv-class"): ...

summary

signature(object = "SECdistrMv-class"): ...

mean

signature(x = "SECdistrMv"): ...

vcov

signature(object = "SECdistrMv"): ...

Note

SeemakeSECdistr for a detailed description offamily anddp.

Note that here methodsmean andvcov are not appliedto data or to a fitted model, but to aprobabilitydistribution instead, of which they provide the mean (vector) valueand the variance-covariance matrix. If methodsmean andvcov are applied to a distribution for which the mean or the variance donot exist, aNULL value is returned and a warning messageis issued.

Author(s)

Adelchi Azzalini

See Also

SECdistrUv,plot,SECdistrMv-method,summary,SECdistrMv-method,affineTransSECdistr,marginalSECdistr,extractSECdistr

Examples

  dp0 <- list(xi=1:2, Omega=diag(3:4), alpha=c(3, -5))  f10 <- makeSECdistr(dp=dp0, family="SN", name="SN-2D", compNames=c("x", "y"))  show(f10)  plot(f10)  summary(f10)  mean(f10)  # the mean value of the probability distribution  vcov(f10)  # the variance-covariance matrix of the probability distribution

Class"SECdistrUv"

Description

A class of objects representing univariate skew-ellipticallycontoured (SEC) distributions.

Objects from the class

Objects can be created by a call to functionmakeSECdistr when its argumentdp is a vector. They can also obtained froman object generated byselm using the functionextractSEDdistr.

Slots

family:

a character string which selects the parametricfamily; currently, possible values are:"SN","ESN","ST","SC".

dp:

a numeric vector of parameters; its length depends on the selectedfamily.

name:

a character string with name of the distribution.

Methods

show

signature(object = "SECdistrUv"): ...

plot

signature(x = "SECdistrUv"): ...

summary

signature(object = "SECdistrUv"): ...

mean

signature(x = "SECdistrUv"): ...

sd

signature(object = "SECdistrUv"): ...

Note

SeemakeSECdistr for a detailed description offamily anddp.

Unlike various other packages, methodsmean andsd here are not targeted to data or to a fitted model, but to aprobabilitydistribution instead, of which they provide the mean valueand the standard deviation. If these methods are appliedto a distribution of which the mean or the variance do not exist, aNULL value is returned and a warning message is issued.

Author(s)

Adelchi Azzalini

See Also

SECdistrMv,plot,SECdistrUv-method,summary,SECdistrUv-method,extractSECdistr

Examples

f2 <- makeSECdistr(dp=c(3, 5, -pi, 6), family="ST", name="My first ST")show(f2)plot(f2)plot(f2, probs=c(1,5,9)/10)plot(f2, range=c(-30,10), probs=NULL, col=2, main=NULL)summary(f2)mean(f2)  # the mean value of the probability distributionsd(f2)  # the standard deviation of the distribution

The Unified Skew-Normal (SUN) probability distribution

Description

Density, distribution function, random number generation, the mean value,the variance-covariance matrix and the Mardia's measures of multivariateskewness and kurtosis of theSUN probability distribution.

Usage

dsun(x, xi, Omega, Delta, tau, Gamma, dp = NULL, log = FALSE, silent=FALSE, ...)psun(x, xi, Omega, Delta, tau, Gamma, dp = NULL, log = FALSE, silent=FALSE, ...)rsun(n=1, xi, Omega, Delta, tau, Gamma, dp = NULL, silent=FALSE)sunMean(xi, Omega, Delta, tau, Gamma, dp = NULL, silent=FALSE, ...)sunVcov(xi, Omega, Delta, tau, Gamma, dp = NULL, silent=FALSE, ...)sunMardia(xi, Omega, Delta, tau, Gamma, dp = NULL, silent=FALSE, ...)

Arguments

x

either a vector of lengthd, whered=ncol(Omega),with the coordinates of the point where the density or thedistribution function must be evaluated, or alternativelyad-column matrix whose rows represent a set of points.

xi

a numeric vector of lengthd representing the location parameter of the distribution; see ‘Background’.In a call todsun andpsun,xi can be a matrix,whose rows represent a set of location parameters;in this case, its dimensions must match those ofx.

Omega

a symmetric positive definite matrix of dimension(d,d);see ‘Details’.

Delta

a matrix of size(d,m), wherem=length(tau);see ‘Details’ about its constraints.

tau

a vector of lengthm, say.

Gamma

a symmetric positive definite matrix of dimension(m,m)with 1's on its main diagonal, that is, a correlation matrix

dp

a list with five elements, representingxi (which must be a vector in this case),Omega,Delta,tau andGamma, with restrictions indicated in the ‘Details’.Its default value isNULL; ifdp is assigned, the individual parameters must not be specified.

n

a positive integer value.

log

a logical value (default value:FALSE); ifTRUE, log-densities and log-probabilities are returned.

silent

a logical value which indicates the action to take in the casem=1, which could be more convenently handled by functions for theSN/ESN family. Ifsilent=FALSE (default value), a warningmessage is issued; otherwise this is suppressed.

...

additional tuning arguments passed either topmnorm (fordsun,psun andsunMean) or tomom.mtruncnorm (forsunVcov andsunMardia); see also ‘Details’.

Details

A member of theSUN family of distributions is identified by five parameters, which are described in the ‘Background’ section. The five parameters can be supplied by combining them in a list, denoteddp, in which case the individual parameters mustnot besupplied. The elements ofdp must appear in the above-indicated orderand must be named.

The optional arguments in... passed topmnorm,which usesptriv.nt whend=3,biv.nt.prob whend=2 and andsadmvn whend>2.In practice these arguments are effective only ifd>3, since for lower dimensions the computations are made to full available precision anyway.A similar fact applies to the... argument passed tomom.mtruncnorm.

Some numerical inaccuracy is inevitably involved in these computations.In most cases, they are of negligible extent, but they can possibly becomemore relevant, especially in the computation of higher order momentsinvolved bysunMardia, depending on the dimensiond and on the specific parameter values.Consider the ‘Warning’ section inrecintabwhich is used bymom.mtruncnorm.

The above-described functions operate following the traditionalR scheme for probability distributions. Another scheme, coexisting with the classical one, works withSUNdistr-class objects, which representSUN distributions, by encapsulating their parameters and other characteristics. These objects are created bymakeSUNdistr, and various methods exist for them; seeSUNdistr-class. Moreover these objects can be manipulated by a number of tools, described inSUNdistr-op, leading to new objects of the same class.

Value

The structure of the returned value depends on the called function, as follows:

dsun, psun a vector of lengthnrow(x) representing density or probability values,
or their log-transformed values iflog=TRUE,
rsun a matrix of size(n,d), where each row represents aSUN random vectors,
sunMean a vector of lengthd representing the mean value,
sunVcov a matrix of size(d,d) representing the variance-covariance matrix,
sunMardia a vector of length two with the Mardia's measures of multivariate skewness and kurtosis.

Background

A member of theSUN family is characterized by twodimensionality indices, denotedd andm, and a set of fiveparameters blocks (vector and matrices, as explained soon). The valued represents the number of observablecomponents; the valuem represents the number of latent (or hidden)variables notionally involved in the construction of the distribution.The parameters and their correspondingR variables are as follows:

\xixi a vector of lengthd,
\OmegaOmega a matrix of size(d,d),
\DeltaDelta a matrix of size(d,m),
\tautau a vector of lengthm,
\GammaGamma a matrix of size(m,m),

and must satisfy the following conditions:

  1. \Omega is a symmetric positive definite matrix;

  2. \Gamma is a symmetric positive definite matrix with 1's on the main diagonal, hence a correlation matrix;

  3. if\bar\Omega denotes the correlation matrix associated to\Omega, the matrix of size(d+m)\times(d+m)formed by the2 x 2 blocks

    \bar\Omega\Delta
    \Delta'\Gamma

    must be a positive definite correlation matrix.

The formulation adopted here has arisen as the evolution of earlier constructions, which are recalled very briefly next.A number of extensions of the multivariate skew-normal distributions, all involving a numberm (withm\ge1) of latent variables (instead ofm=1 like the skew-normal distribution), have been put-forward in close succession in the years 2003-2005. Special attention has been drawn by the ‘closed skew-normal (CSN)’distribution developed by González-Faríaset alii (2004a, 2004b)and the ‘fundamental skew-normal (FUSN)’ distributiondeveloped by Arellano-Valle and Genton (2005),but other formulations have been considered too.

Arellano Valle and Azzalini (2006) have shown the essential equivalenceof these apparently alternative constructions, after appropriatereparameterizations, and underlined the necessity of removing over-parameterizations in some cases, to avoid lack of identifiability. This elaboration has led to theSUN formulation. A relatively less technical account of their development is provided in Section 7.1 of Azzalini and Capitanio (2014), using very slightly modified notation and parameterization, which are the ones adopted here.

Additional results have been presented by Arellano-Valle and Azzalini (2021), such as expressions for the variance matrixand higher order moments, the Mardia's measures of multivariate skewnessand kurtosis, which are implemented here. Another result is theconditional distribution when the conditioning event is representedby an orthant.

Note

The present structure and user interface of this function, and of other ones related to theSUN distribution, must be considered experimental, and they might possibly change in the future.

Author(s)

Adelchi Azzalini

References

Arellano-Valle, R. B., and Azzalini, A. (2006).On the unification of families of skew-normal distributions.Scand. J. Stat.33, 561-574. Corrigendum in49 (2022), 1418-1419.

Arellano-Valle, R. B. and Azzalini, A. (2021).Some properties of the unified skew-normal distribution.Statistical Papers63, 461-487,doi:10.1007/s00362-021-01235-2;see alsoarXiv:2011.06316

Arellano-Valle, R. B. and Genton, M. G. (2005). On fundamental skew distributions.J. Multivariate Anal.96, 93–1116.

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

González-Farías, G., Domínguez-Molina, J. A., & Gupta, A. K. (2004a). Additive properties of skew normal random vectors.J. Statist. Plann. Inference126, 521-534.

González-Farías, G., Domínguez-Molina, J. A., & Gupta, A. K. (2004b). The closed skew-normal distribution. In M. G. Genton (Ed.),Skew-elliptical Distributions and Their Applications: a Journey Beyond Normality, Chapter 2, (pp. 25–42). Chapman & Hall/CRC.

See Also

makeSUNdistr to build aSUN distribution object, with related methods inSUNdistr-class,and other facilities inSUNdistr-op

convertCSN2SUNpar to convert a parameter set of the Closed Skew-Normal formulation to the equivalentSUN parameter set

Examples

xi <- c(1, 0, -1)Omega <- matrix(c(2,1,1, 1,3,1, 1,1,4), 3, 3)Delta <- matrix(c(0.72,0.20, 0.51,0.42, 0.88, 0.94), 3, 2, byrow=TRUE)Gamma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)dp3 <- list(xi=xi, Omega=Omega, Delta=Delta, tau=c(-0.5, 0), Gamma=Gamma)x <- c(0.8, 0.5, -1.1)f1 <- dsun(x, xi, Omega, Delta, c(-0.5, 0), Gamma) # mode 1f2 <- dsun(x, dp=dp3)   # mode 2, equivalent to mode 1set.seed(1)xm <- rsun(10, dp=dp3)f3 <- dsun(xm, dp=dp3) psun(xm, dp=dp3)sunMean(dp=dp3)sunVcov(dp=dp3)sunMardia(dp=dp3)

Class"SUNdistr" and its methods

Description

A class of objects representing Unified Skew-Normal (SUN) distributions.

Details

SeeSUNdistr-base for a description of the required structure ofdp.

Note that here the methodsmean andvcov are not appliedto data or to a fitted model, but to aprobabilitydistribution, of which they provide the mean (expected) valueand the variance-covariance matrix.

The object of this class follow the S4 protocol.

Objects from the class

Objects can be created by a call to the functionmakeSUNdistr or by a suitable transformation of some object of this class.

Slots

dp:

a list of parameters of length five, as described inSUNdistr-base

name:

a character string with the name of the multivariatevariable; it can be an empty string.

compNames:

a vector of character strings with the names of the component variables.

HcompNames

a vector of character strings with the names of the hidden variables.

Methods

show

signature(object = "SUNdistr-class"): ...

plot

signature(x = "SUNdistr-class"): ...

summary

signature(object = "SUNdistr-class"): ...

mean

signature(x = "SUNdistr"): ...

vcov

signature(object = "SUNdistr"): ...

Author(s)

Adelchi Azzalini

See Also

plot,SUNdistr-method,\quadsummary,SUNdistr-method,\quadaffineTransSUNdistr,marginalSUNdistr

convertSN2SUNdistr to convert aSECdistr object withfamily"SN" or"ESN" to the equivalentSUNdistr-classobject

Examples

  xi <- c(1, 0, -1)  Omega <- matrix(c(2,1,1, 1,3,1, 1,1,4), 3, 3)  Delta <- matrix(c(0.72,0.20, 0.51,0.42, 0.88, 0.94), 3, 2, byrow=TRUE)  Gamma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)  dp3 <- list(xi=xi, Omega=Omega, Delta=Delta, tau=c(-0.5, 0), Gamma=Gamma)  sun3 <- makeSUNdistr(dp=dp3, name="firstSUN", compNames=c("x", "w", "z"))  show(sun3)  plot(sun3)  mean(sun3)  # the mean value of the probability distribution  vcov(sun3)  # the variance-covariance matrix of the probability distribution  summary(sun3)  # a more detailed summary

Operations on SUNdistr-class objects

Description

Given an object ofSUNdistr-class, or possibly two such thingsin some cases, the functions perform various operations,and produce a new object of the same class.

Usage

affineTransSUNdistr(object, a, A, name, compNames, HcompNames, drop = TRUE)conditionalSUNdistr(object, comp, values, eventType = "=", name, drop = TRUE) convolutionSUNdistr(object1, object2, name, compNames, HcompNames) joinSUNdistr(object1, object2, name, compNames, HcompNames) marginalSUNdistr(object, comp, name, drop=TRUE)

Arguments

object,object1,object2

objects of classSUNdistr

a

a numeric vector; see ‘Details’

A

a numeric matrix; see ‘Details’

name

an optional character string with the name of the returneddistribution

compNames

an optional vector of character strings with the names of the component variables of the returned distribution

HcompNames

an optional vector of character strings with the names of the hidden variables of the returned distribution

drop

a logical value (default:TRUE) relevant only in thecasem=1. When bothm=1 anddrop=TRUE, the returned object is of class eitherSECdistrUv orSECdistrMv,depending on the dimension of the returned object, and family"SN" or"ESN", as appropriate.

comp

a vector of integers representing the selected components

values

a numeric vector which identifies the conditioning event

eventType

a single character value which indicates the type of theconditioning event, as described in the ‘Details’ section;possible values are"=" (default) and">"

Details

For anobject which represents the distribution of a multivariateSUN random variableY of dimensiond, say, a number ofoperations are possible, producing a new object of the same class.Thisobject could have been created bymakeSUNdistr or it could be the outcome from some previous call to one of the functionsdescribed here.

The functionaffineTransSUNdistr computes the distribution ofa+A'Y, providedA is a full-rank matrix withnrow(A)=d andlength(a)=ncol(A).See equation (7.6) of Azzalini & Capitanio (2014).

The functionmarginalSUNdistr builds aSUN distribution from the components selected by thecomp vector.

A conditional distribution can be computed usingconditionalSUNdistr for two type of events, selected byeventType. The"=" case corresponds to the eventY_1=y_1 whereY_1 is the subset of components identifiedby thecomp argument,y_1 is vector specified by thevalues argument and the equality sign must hold for each component.See equation (7.6) of Azzalini & Capitanio (2014).

IfconditionalSUNdistr is used witheventType=">",the conditiong refers to the eventY_1>y_1, where the inequality must be interpreted components-wise;see Arellano-Valle & Azzalini (2021) for the underlying mathematical result.If the conditional distribution is required for the reverse inequality condition,"<" say, this is equivalent to consideration of the event-Y_1>-y_1. The corresponding distribution can be obtained in two steps: first a new variable is constructed reversing the sign of the requiredcomponents usingaffineTransSUNdistr;thenconditionalSUNdistr is applied to this new variable withthe">" condition and values-y_1. More complex conditions, where the"<" and">" signs are mixed for different component varables, can be handled similarly, by introducing a square matrixA foraffineTransSUNdistr having an appropriate combination of1s' and-1's on its maindiagonal, and 0's elsewhere, and matching changes of sign to the components ofy_1.

FunctionsconvolutionSUNdistr andjoinSUNdistr operate under the assumptions thatobject1 andobject2 refer to independent variables. Specifically,convolutionSUNdistr computes the convolution of thetwo objects (i.e. the distribution of the sum of two independent variables),which must have the same dimensiond. FunctionjoinSUNdistr combines two objects into a joint distribution.

If the argumentsname,compNames andHcompNamesare missing, they are composed from the supplied arguments.

Value

an object ofSUNdistr-class

Note

The present structure and user interface of this function, and of other ones related to theSUN distribution, must be considered experimental, and they might possibly change in the future.

Author(s)

Adelchi Azzalini

References

Arellano-Valle, R. B. and Azzalini, A. (2021).Some properties of the unified skew-normal distribution.Statistical Papers,doi:10.1007/s00362-021-01235-2andarXiv:2011.06316

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

SUNdistr-base,makeSUNdistr,SUNdistr-class

Examples

xi <- c(1, 0, -1)Omega <- matrix(c(2,1,1, 1,3,1, 1,1,4), 3, 3)Delta <- matrix(c(0.72,0.20, 0.51,0.42, 0.88, 0.94), 3, 2, byrow=TRUE)Gamma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)dp3 <- list(xi=xi, Omega=Omega, Delta=Delta, tau=c(-0.5, 0), Gamma=Gamma)sun3 <- makeSUNdistr(dp=dp3, name="SUN3", compNames=c("x", "w", "z"))#a <- c(1,-2)A <- matrix(1:6, 3, 2)sun2at <- affineTransSUNdistr(sun3, a, A, "SUN2at", compNames=c("at1", "at2"))sun2m <- marginalSUNdistr(sun3, comp=c(1,3), name="SUN2m")sun1c <- conditionalSUNdistr(sun3, comp=c(1,3), values=c(1.1, 0.8),            eventType=">", name="SUN1c", drop=FALSE)#Omega <- matrix(c(5, 1, 1, 6), 2, 2)Delta <- matrix(c(0.30, 0.50, 0.50, 0.85), 2, 2, byrow=TRUE)Gamma <- matrix(c(1, 0.18, 0.18, 1), 2, 2)tau <- c(0.4, -0.8)dp2 <- list(x=c(1, 0), Omega=Omega, Delta=Delta, tau=tau, Gamma=Gamma)sun2 <-  makeSUNdistr(dp=dp2, name="SUN2", compNames=c("u", "v"))#sun2conv <- convolutionSUNdistr(sun2, sun2m, name="SUN2sum")sun5 <- joinSUNdistr(sun3, sun2)

Owen's function

Description

Evaluates functionT(h,a) studied by D.B.Owen

Usage

T.Owen(h, a, jmax=50, cut.point=8)

Arguments

h

a numeric vector. Missing values (NAs) andInf areallowed.

a

a numeric value.Inf is allowed.

jmax

an integer scalar value which regulates the accuracy of the result.See Section ‘Details’ below for explanation.

cut.point

a scalar value which regulates the behaviour of the algorithm,as explained in Section ‘Details’ below (default value:8).

Details

Ifa>1 and0<h<=cut.point, a series expansion is used,truncated afterjmax terms.Ifa>1 andh>cut.point, an asymptotic approximation is used.In the other cases, various reflection properties of the functionare exploited. See the reference below for more information.

Value

a numeric vector.

Background

The functionT(h,a) studied by Owen (1956) is useful for the computation of the bivariate normal distribution function and related quantities, including the distribution function of a skew-normal variate; seepsn.See the reference below for more information on functionT(h,a).

Author(s)

Adelchi Azzalini and Francesca Furlan

References

Owen, D. B. (1956).Tables for computing bivariate normal probabilities.Ann. Math. Statist.27, 1075-1090.

See Also

psn

Examples

 owen <- T.Owen(1:10, 2)

Affine transformations and marginals of a skew-elliptical distribution

Description

Given a multivariate random variableY with skew-elliptical(SEC) distribution, compute the distribution of a (possibly multivariate) marginal or the distributionof an affine transformationa + A^{\top}Y.

Usage

  affineTransSECdistr(object, a, A, name, compNames, drop=TRUE)  marginalSECdistr(object, comp, name, drop=TRUE)

Arguments

object

an object of classSECdistrMv which identifies the source random variable, as created bymakeSECdistr or byextractSECdistr or by a previous call to these functions

a

a numeric vector with the lengthncol(A).

A

a full-rank matrix withnrow(A) equal to the dimensionalityd of the random variable identified byobject.

name

an optional character string representing the name of the outcome distribution; if missing, one such string is constructed.

compNames

an optional vector of lengthncol(A) of character strings with the names of the components of the outcome distribution; if missing, one such vector is constructed.

drop

a logical flag (default value:TRUE), operating only ifthe returned object has dimensiond=1, in which case it indicateswhether this object must be of classSECdistrUv.

comp

a vector formed by a subset of1:d which indicates whichcomponents must be extracted fromobject, on denoting byd its dimensionality.

Value

Ifobject defines the distribution of aSEC randomvariableY,affineTransSECdistr computes the distribution ofa+A'Y andmarginalSECdistr computes the marginaldistribution of thecomp components. In both cases the returnedobject is of classSECdistrMv, except whendrop=TRUEoperates, leading to an object of classSECdistrUv.

Background

These functions implement formulae given in Sections 5.1.4, 5.1.6 and 6.2.2 of the reference below.

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

makeSECdistr,extractSECdistr,SECdistrMv-class

Examples

dp3 <- list(xi=1:3, Omega=toeplitz(1/(1:3)), alpha=c(3,-1,2), nu=5)st3 <- makeSECdistr(dp3, family="ST", name="ST3", compNames=c("U", "V", "W"))A <- matrix(c(1,-1,1, 3,0,-2), 3, 2)new.st <- affineTransSECdistr(st3, a=c(-3,0), A=A)#st2 <- marginalSECdistr(st3, comp=c(3,1), name="2D marginal of ST3")

Australian Institute of Sport data

Description

Data on 102 male and 100 female athletes collected at the Australian Institute of Sport, courtesy of Richard Telford and Ross Cunningham.

Usage

data(ais)

Format

A data frame with 202 observations on the following 13 variables.

[,1]sex categorical, levels:female,male
[,2]sport categorical, levels:B_Ball,Field,Gym,Netball,Row,Swim,T_400m,
Tennis,T_Sprnt,W_Polo
[,3]RCC red cell count (numeric)
[,4]WCC white cell count (numeric)
[,5]Hc Hematocrit (numeric)
[,6]Hg Hemoglobin (numeric)
[,7]Fe plasma ferritin concentration (numeric)
[,8]BMI body mass index, weight/(height)^2 (numeric)
[,9]SSF sum of skin folds (numeric)
[,10]Bfat body fat percentage (numeric)
[,11]LBM lean body mass (numeric)
[,12]Ht height, cm (numeric)
[,13]Wt weight, kg (numeric)

Details

The data have been made publicly available in connection with thebook by Cook and Weisberg (1994).

References

Cook and Weisberg (1994),An Introduction to Regression Graphics. John Wiley & Sons, New York.

Examples

data(ais, package="sn")pairs(ais[,c(3:4,10:13)], col=as.numeric(ais[,1]), main = "AIS data")

Price of Barolo wine

Description

A data frame with prices of bottles of Barolo wine and some auxiliary variables

Usage

data(barolo)

Format

A data frame with 307 observations on five variables, as follows:

reseller reseller code (factor with levelsA, B, C, D)
vintage vintage year (numeric)
volume content volume in centilitres (numeric)
price price in Euro (numeric)
age age in 2010 (numeric)

For six items,vintage isNA's and so alsoage.Three items have a non-standard volume of 50 cl.

Details

The data have been obtained in July 2010 from the websites of four Italian wine resellers, selecting only quotations of Barolo wine, which is produced in the Piedmont region of Italy. The price does not include the delivery charge.

The data have been presented in Section 4.3.2 of the reference below,where a subset of them has been used for illustrative purposes.This subset refers to reseller"A" and bottles of 75cl.

Source

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Examples

data(barolo)attach(barolo)  f <- cut(age, c(0, 5,  6, 8, 11,  30))table(volume, f)plot(volume, price, col=as.numeric(f), pch=as.character(reseller))legend(400, 990, col=1:5, lty=1, title="age class",       legend=c("4-5", "6", "7-8", "9-11", "12-30"))#A75 <- (reseller=="A" & volume==75)hist(log(price[A75],10), col="gray85")# see Figure 4.7 of the source

Coefficients of objects created byselm

Description

coef method for classes"selm" and"mselm".

Usage

## S4 method for signature 'selm'coef(object, param.type = "CP", ...)## S4 method for signature 'mselm'coef(object, param.type = "CP", vector=TRUE, ...)

Arguments

object

an object of class"selm" or"mselm" as createdby a call to functionselm.

param.type

a character string which indicates the required type of parameter type; possible values are"CP" (default),"DP","pseudo-CP" and their equivalent lower-case expressions.

vector

a logical value (default isTRUE) which selects a vectoror a list format of the retuned value

...

not used, included for compatibility with the generic method

Value

a numeric vector or a list(the latter only formselm-class objects ifvector=FALSE)

Note

The possible options ofparam.type are described in thedocumentation ofdp2cp; their corresponding outcomes differ by an additive constant only. With the"CP" option (that is,the ‘centred parametrization’), the residuals are centred around 0, at least approximately; this is a reason for setting"CP" as the default option. For more information, see the ‘Note’ in the documentation ofsummary.selm.

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

dp2cp,summary.selm,selm function,selm-class

Examples

 data(wines, package="sn")m5 <- selm(acidity ~ phenols + wine, family="SN", data=wines)coef(m5)coef(m5, "dp")#m12 <- selm(cbind(acidity, alcohol) ~ phenols + wine,  family="SN", data=wines)coef(m12)coef(m12, "DP", vector=FALSE)

Skew-normal conditional distribution

Description

For a multivariate (extended) skew-normal distribution, compute its conditional distribution for given values of some of its components.

Usage

  conditionalSECdistr(object, fixed.comp, fixed.values, name, drop = TRUE)

Arguments

object

an object of classSECdistrMv withfamily="SN"orfamily="ESN".

fixed.comp

a vector containing a subset of1:d which selectsthe components whose values are to be fixed, ifd denotes thedimensionality of the distribution.

fixed.values

a numeric vector of values taken on by the componentsfixed.comp; it must be of the same length offixed.comp.

name

an optional character string with the name of the outcome distribution; if missing, one such string is constructed.

drop

logical (default=TRUE), to indicate whether thereturned object must be of classSECdistrUv whenlength(fixed.comp)+1=d.

Details

For background information, see Section 5.3.2 of the reference below.

Value

an object of classSECdistrMv, except in the case whendrop=TRUE operates, leading to an object of classSECdistrUv-class.

References

Azzalini, A. and Capitanio, A. (2014).The Skew-normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

makeSECdistr,SECdistrMv-class,affineTransSECdistr

Examples

Omega <- diag(3) + outer(1:3,1:3)sn <- makeSECdistr(dp=list(xi=rep(0,3), Omega=Omega, alpha=1:3), family="SN")esn <- conditionalSECdistr(sn, fixed.comp=2, fixed.values=1.5)show(esn)

Confidence intervals for parameters of a selm-class object

Description

Computes confidence intervals for parameters in a selm-class object producesbyselm fit when the response variable is univariate.

Usage

  ## S3 method for class 'selm'confint(object, parm, level=0.95, param.type, tol=1e-3, ...)

Arguments

object

an object of classselm as produced by a call tofunctionselm with univariate response.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required (default value is0.95).

param.type

a character string with the required parameterization; it must be either"CP" or"DP" or"pseudo-CP", or possibly their equivalent lowercase.

tol

the desired accuracy (convergence tolerance); this is a parameter passed touniroot for computing the roots of thelikelihood-based confidence interval foralpha.

...

not used, only there for compatibility reasons.

Details

A description of the methodology underlyingconfint.selm is provided in the technical note of Azzalini (2016).That document also explains why in certain cases an interval is not constructed andNA's are returned as endpoint.

Value

An invisible list whose components, described below, are partly different in the one- and the two-parameter cases.

call

the calling statement

<param1>

values of the first parameter

<param2>

values of the second parameter (in a two-parameter case)

logLik

numeric vector or matrix of the profile log-likelihood values

confint

in the one-parameter case, the confidence interval

level

in the one-parameter case, the confidence level

deviance.contour

in the two-parameter case, a list of lists whoseelements identify each curve of the contour plot

Author(s)

Adelchi Azzalini

References

Azzalini, A. (2016). Derivation of various types of intervals from aselm object.Technical note distributed with the documentation of theR packagesn in fileselm-intervals.pdf within section ‘User guide, package vignettes and other documentation’.

See Also

selm,summary.selm,profile.selm,

makeSECdistr for theCP/DP parameterizations,

uniroot for itstol argument

Examples

data(ais)m1 <-  selm(log(Fe) ~ BMI + LBM, family = "sn", data = ais)intervCP <- confint(m1)intervDP <- confint(m1, param.type="DP")confint(m1, parm=2:3)    confint(m1, parm=c("omega", "alpha"), param.type="DP")

Conversion ofCSN parameters toSUN parameters

Description

The parameter set of a Closed Skew-Normal (CSN)distribution is converted into the parameter set of the equivalent Unified Skew-Normal (SUN) distribution.

Usage

convertCSN2SUNpar(mu, Sigma, D, nu, Delta)

Arguments

mu

a numeric vector of lengthp, say.

Sigma

a positive definite variance matrix of sizec(p,p).

D

an arbitrary numeric matrix of size sayc(q, p), say.

nu

a numeric vector of lengthq.

Delta

a positive definite variance matrix of sizec(q,q).

Details

The arguments of the function match the parameters(\mu, \Sigma, D, \nu, \Delta) of theCSN distributionpresented by González-Faríaset alii (2004a, 2004b). These parameters are converted into those of the equivalentSUN distribution, which is unique. The converse operation, that is,mapping parameters from theSUN to theCSN family, is not handled here. Its solution would be non-unique, because theCSN family is over-parameterized.

Note that, having retained the exact notation of the above-quoted papers,there is aDelta argument which must not be confused with one of thearguments for theSUN distribution inSUNdistr-base. The coincidence of these names is entirely accidental.

TheCSN parameters must only satisfy the requirements that\Sigma and\Delta are symmetric positive definite matrices. Since these conditions are somewhat simpler to check than those for theSUN parameters, as indicated inSUNdistr-base, this function may provide a simple option for the specification of aCSN/SUN distribution.

The parameter listdp produced by this function can be used as aninput for the functions inSUNdistr-base or formakeSUNdistr.

Value

a list representing thedp parameter set of thecorrespondingSUN distribution

Author(s)

Adelchi Azzalini

References

González-Farías, G., Domínguez-Molina, J. A., & Gupta, A. K. (2004a). Additive properties of skew normal random vectors.J. Statist. Plann. Inference126, 521-534.

González-Farías, G., Domínguez-Molina, J. A., & Gupta, A. K. (2004b). The closed skew-normal distribution. In M. G. Genton (Ed.),Skew-elliptical Distributions and Their Applications: a Journey Beyond Normality, Chapter 2, (pp. 25–42). Chapman & Hall/CRC.

See Also

SUNdistr-base,makeSUNdistr

Examples

p <- 3q <- 2mu <- 1:pSigma <- toeplitz(1/(1:p))D <- matrix(sqrt(1:(p*q)), q, p)nu <- 1/(1:q)Delta <- diag(q) + outer(rep(1,q), rep(1,q))dp <- convertCSN2SUNpar(mu, Sigma, D, nu, Delta)

Convert a SN distribution into a SUN

Description

An object ofSECdistrMv-class orSECdistrUv-class representing aSN orESN distribution is converted into aSUNdistr-class object representing the same distribution.

Usage

convertSN2SUNdistr(object, HcompNames = "h", silent = FALSE)

Arguments

object

an object ofSECdistrMv-class withfamily of typeSN orESN.

HcompNames

an optional character string for the hidden component

silent

a logical value which controls the behaviour if the suppliedobject is not suitable. Ifsilent = FALSE (detault value)an error message is generated, otherwise aNULL is silentlyreturned.

Value

an object ofSUNdistr-class

Author(s)

Adelchi Azzalini

See Also

SUNdistr-class,SECdistrMv-class,SECdistrUv-class

Examples

esn <- makeSECdistr(dp=c(0, 1, 2, 0.5), family="ESN")sun <- convertSN2SUNdistr(esn)mean(sun) - mean(esn)vcov(sun) - sd(esn)^2#dp0 <- list(xi=1:2, Omega=diag(3:4), alpha=c(3, -5))f10 <- makeSECdistr(dp=dp0, family="SN", name="SN-2d", compNames=c("u1", "u2"))sun10 <- convertSN2SUNdistr(f10)mean(sun10) - mean(f10)vcov(sun10) - vcov(f10)

Multivariate skew-normal distribution

Description

Probability density function, distribution function and random number generation for the multivariate skew-normal (SN) distribution.

Usage

dmsn(x, xi=rep(0,length(alpha)), Omega, alpha, tau=0, dp=NULL, log=FALSE)pmsn(x, xi=rep(0,length(alpha)), Omega, alpha,  tau=0, dp=NULL, ...)rmsn(n=1, xi=rep(0,length(alpha)), Omega, alpha,  tau=0, dp=NULL)

Arguments

x

either a vector of lengthd, whered=length(alpha), or a matrix withd columns, giving the coordinates of the point(s) where the density or thedistribution function must be evaluated.

xi

a numeric vector of lengthd representing the location parameter of the distribution; see ‘Background’.In a call todmsn andpmsn,xi can be a matrix,whose rows represent a set of location parameters;in this case, its dimensions must match those ofx.

Omega

a symmetric positive-definite matrix of dimension(d,d);see ‘Background’.

alpha

a numeric vector which regulates the slant of the density; see ‘Background’.Inf values inalpha are not allowed.

tau

a single value representing the ‘hidden mean’ parameter of theESN distribution;tau=0 (default) corresponds to aSN distribution.

dp

a list with three elements, corresponding toxi,Omega andalpha described above; default valueFALSE. Ifdp is assigned, individual parameters must not be specified.

n

a numeric value which represents the number of random vectorsto be drawn.

log

logical (default value:FALSE); ifTRUE, log-densities are returned.

...

additional parameters passed topmnorm.

Details

Typical usages are

dmsn(x, xi=rep(0,length(alpha)), Omega, alpha, log=FALSE)dmsn(x, dp=, log=FALSE)pmsn(x, xi=rep(0,length(alpha)), Omega, alpha, ...)pmsn(x, dp=)rmsn(n=1, xi=rep(0,length(alpha)), Omega, alpha)rmsn(n=1, dp=)

For efficiency reasons,rmsn makes very limited checks on the validity of the arguments. For instance, failure to positive definiteness ofOmega would not be detected, and an uncontrolled crash occurs. Functionpmsn makes use ofpmnorm from packagemnormt;the accuracy of its computation can be controlled via...

Value

A vector of density values (dmsn) or of probabilities(pmsn) or a matrix of random points (rmsn).

Background

The multivariate skew-normal distribution is discussed by Azzalini and Dalla Valle (1996). The(Omega,alpha)parametrization adopted here is the one of Azzalini and Capitanio (1999).Chapter 5 of Azzalini and Capitanio (2014) provides an extensive account,including subsequent developments.

Notice that the location vectorxi does not represent the mean vector of the distribution. Similarly,Omega is notthe covariance matrix of the distribution, although it isa covariance matrix. Finally, the components ofalpha are not equal to the slant parameters of the marginal distributions; to fix the marginal parameters at prescribed values, it is convenient to start from the OP parameterization, as illustrated in the ‘Examples’ below. Another option is to start from theCPparameterization, but notice that, at variance from theOP, not allCP sets are invertible to lend aDP set.

References

Azzalini, A. and Capitanio, A. (1999).Statistical applications of the multivariate skew normal distribution.J.Roy.Statist.Soc. B61, 579–602. Full-length version available athttps://arXiv.org/abs/0911.2093

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Azzalini, A. and Dalla Valle, A. (1996).The multivariate skew-normal distribution.Biometrika83, 715–726.

See Also

dsn,dmst,pmnorm,op2dp,cp2dp

Examples

x <- seq(-3,3,length=15)xi <- c(0.5, -1)Omega <- diag(2)Omega[2,1] <- Omega[1,2] <- 0.5alpha <- c(2,-6)pdf <- dmsn(cbind(x, 2*x-1), xi, Omega, alpha)cdf <- pmsn(cbind(x, 2*x-1), xi, Omega, alpha)p1 <- pmsn(c(2,1), xi, Omega, alpha)p2 <- pmsn(c(2,1), xi, Omega, alpha, abseps=1e-12, maxpts=10000)#rnd <- rmsn(10, xi, Omega, alpha)## use OP parameters to fix marginal shapes at given lambda values:op <- list(xi=c(0,1), Psi=matrix(c(2,2,2,3), 2, 2), lambda=c(5, -2))rnd <- rmsn(10, dp=op2dp(op,"SN"))# # use CP parameters to fix mean vector, variance matrix and marginal skewness:cp <- list(mean=c(0,0), var.cov=matrix(c(3,2,2,3)/3, 2, 2), gamma1=c(0.8, 0.4))dp <- cp2dp(cp, "SN")rnd <- rmsn(5, dp=dp)

Multivariate skew-t distribution and skew-Cauchy distribution

Description

Probability density function, distribution function and random number generation for the multivariate skew-t (ST) andskew-Cauchy (SC) distributions.

Usage

dmst(x, xi=rep(0,length(alpha)), Omega, alpha, nu=Inf, dp=NULL, log=FALSE)pmst(x, xi=rep(0,length(alpha)), Omega, alpha, nu=Inf, dp=NULL, ...)rmst(n=1, xi=rep(0,length(alpha)), Omega, alpha, nu=Inf, dp=NULL)dmsc(x, xi=rep(0,length(alpha)), Omega, alpha, dp=NULL, log=FALSE)pmsc(x, xi=rep(0,length(alpha)), Omega, alpha, dp=NULL, ...)rmsc(n=1, xi=rep(0,length(alpha)), Omega, alpha, dp=NULL)

Arguments

x

fordmst anddmsc, this is either a vector of lengthd, whered=length(alpha), or a matrix withd columns,representing the coordinates of the point(s) where the density must beavaluated; forpmst andpmsc, only a vector of lengthd is allowed.

xi

a numeric vector of lengthd representing the location parameter of the distribution; see ‘Background’.In a call todmst ordmsc,xi can be a matrix,whose rows represent a set of location parameters; in this case, its dimensions must match those ofx.

Omega

a symmetric positive-definite matrix of dimension(d,d); see Section ‘Background’.

alpha

a numeric vector of lengthd which regulates the slantof the density; see Section ‘Background’.Inf values inalpha are not allowed.

nu

a positive value representing the degrees of freedom ofST distribution; does not need to be integer. Default value isnu=Inf which corresponds to the multivariateskew-normal distribution.

dp

a list with three elements namedxi,Omega,alpha andnu, containing quantities as described above. Ifdp is specified, this prevents specification of the individualparameters.

n

a numeric value which represents the number of random vectors to bedrawn; default value is1.

log

logical (default value:FALSE); ifTRUE,log-densities are returned.

...

additional parameters passed topmt.

Details

Typical usages are

dmst(x, xi=rep(0,length(alpha)), Omega, alpha, nu=Inf, log=FALSE)dmst(x, dp=, log=FALSE)pmst(x, xi=rep(0,length(alpha)), Omega, alpha, nu=Inf, ...)pmst(x, dp=, ...)rmst(n=1, xi=rep(0,length(alpha)), Omega, alpha, nu=Inf)rmst(n=1, dp=)dmsc(x, xi=rep(0,length(alpha)), Omega, alpha, log=FALSE)dmsc(x, dp=, log=FALSE)pmsc(x, xi=rep(0,length(alpha)), Omega, alpha, ...)pmsc(x, dp=, ...)rmsc(n=1, xi=rep(0,length(alpha)), Omega, alpha)rmsc(n=1, dp=)

For efficiency reasons,rmst, rmsc make very limited checks on the validity of the arguments. For instance, failure to positive definiteness ofOmega would not be detected, and an uncontrolled crash occurs.Functionpmst requiresdmt from packagemnormt; the accuracy of its computation can be controlled via argument....

Value

A vector of density values (dmst anddmsc) or a singleprobability (pmst andpmsc) or a matrix of random points(rmst andrmsc).

Background

The family of multivariateST distributions is an extension of the multivariate Student'st family, via the introduction of aalpha parameter which regulates asymmetry; whenalpha=0, the skew-tdistribution reduces to the commonly used form of multivariate Student'st. Further, location is regulated byxi and scale byOmega, when its diagonal terms are not all 1's.Whennu=Inf the distribution reduces to the multivariate skew-normal one; seedmsn. Notice that the location vectorxidoes not represent the mean vector of the distribution (which in factmay not even exist ifnu <= 1), and similarlyOmega is notthe covariance matrix of the distribution, although it isa covariance matrix. For additional information, see Section 6.2 of the reference below.

The family of multivariateSC distributions is the subset of theST family, obtained whennu=1. While in the univariate casethere are specialized functions for theSC distribution,dmsc,pmsc andrmsc simply make a call todmst,pmst, rmst with argumentnu set equal to 1.

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monograph series.

See Also

dst,dsc,dmsn,dmt,makeSECdistr

Examples

x <- seq(-4,4,length=15)xi <- c(0.5, -1)Omega <- diag(2)Omega[2,1] <- Omega[1,2] <- 0.5alpha <- c(2,2)pdf <- dmst(cbind(x,2*x-1), xi, Omega, alpha, 5)rnd <- rmst(10,  xi, Omega, alpha, 6)p1 <- pmst(c(2,1), xi, Omega, alpha, nu=5)p2 <- pmst(c(2,1), xi, Omega, alpha, nu=5, abseps=1e-12, maxpts=10000)

Conversion between parametrizations of a skew-elliptical distribution

Description

Convert direct parameters (DP) to centred parameters (CP) of a skew-elliptical distribution andvice versa.

Usage

dp2cp(dp, family, object = NULL, cp.type = "proper", upto = NULL) cp2dp(cp, family)dp2op(dp, family)op2dp(op, family)

Arguments

dp

a vector (in the univariate case) or a list (in the multivariatecase) as described inmakeSECdistr; see ‘Background and Details’ for an extented form of usage.

cp

a vector or a list, in agreement withdp as for type anddimension.

op

a vector or a list, in agreement withdp as for type anddimension.

family

a characther string with the family acronym, as described inmakeSECdistr, except that family"ESN" is not implemented.

object

optionally, an S4 object of classSECdistrUv orSECdistrMv, as produced bymakeSECdistr (default value:NULL). If this argument is notNULL, thenfamily anddp must not be set.

cp.type

character string, which has effect only iffamily="ST" or"SC", otherwise a warning message is generated. Possible values are"proper", "pseudo", "auto", which correspond to theCP parameter set, their 'pseudo-CP' version and an automaticselection based onnu>4, wherenu represents the degrees offreedom of theST distribution.

upto

numeric value (in1:length(dp), default=NULL) to select how manyCP components are computed. Default valueupto=NULL is equivalent tolength(dp).

Value

Fordp2cp, a matching vector (in the univariate case) or a list (in the multivariate case) ofcp parameters. Forcp2dp andop2dp, a similar object ofdp parameters,provided the set of input parameters is in the admissible region.Fordp2op, a similar set ofop parameters.

Background

For a description of theDPparameters, see Section ‘Details’ ofmakeSECdistr. TheCP form of parameterization is cumulant-based. For a univariatedistribution, theCP components are the mean value (first cumulant), the standard deviation (square root of the 2nd cumulant), the coefficient of skewness (3rd standardized cumulant) and, for theST, the coefficient of excess kurtosis (4th standardized cumulant). For a multivariate distribution, there exists an extension based on the same logic; its components represent thevector mean value, the variance matrix, the vector of marginal coefficients ofskewness and, only for theST, the Mardia's coefficient of excesskurtosis. The pseudo-CP variant provides an ‘approximate form’ ofCP when not all required cumulants exist; however, this parameter setis not uniquely invertible toDP. The names of pseudo-CPcomponents printed in summary output are composed by adding a~ after the usual component name; for example, the first one is denotedmean~.

Additional information is provided by Azzalini and Capitanio (2014).Specifically, their Section 3.1.4 presentsCP in the univariateSN case, Section 4.3.4CP for theST case and the 'pseudo-CP' version. Section 5.2.3 presents the multivariateextension for theSN distribution, Section 6.2.5 for the multivariateST case. For a more detailed discussion, see Arellano-Valle & Azzalini (2013).

TheOP parameterization is very similar toDP, from which it differs only for the components which regulate dispersion (or scatter) and slant. Its relevance lies essentially in the multivariate case, where the components of the slant parameter can be interpreted component-wise andremain unaffected if marginalization with respect to some other components is performed. In the multivariateSN case, the components ofOP, denoted\xi, \Psi, \lambda, are associated to the expression of the densityfunction (5.30) of Azzalini & Capitanio (2014); see pp.128–131 for moreinformation. In the univariate case, the slant component ofDPand the one ofOP coincide, that is,\alpha=\lambda,Parameter\xi and other parameters which may exist with other families remain the same of theDP set. The termOP stands for‘original parameterization’ since this is, up to a negligible difference, the parameterization adopted by Azzalini & Dalla Valle (1996).

Details

While any choice of the components ofDP orOP isadmissible, this is not true forCP. An implication is that acall tocp2dp may fail with an error message"non-admissible CP"for certain input values. The most extreme case is represented by theSC family, for whichCP never exists; hence it makesto sense to callcp2dp withfamily="SC".

It is possible to call the functions withdp orcp having morecomponents than those expected for a given family as described above and inmakeSECdistr. In the univariate case, this means thatdporcp can be vectors of longer length than indicated earlier. Thisoccurrence is interpreted in the sense that the additional components afterthe first one are regarded as regression coefficients of aselm model,and they are transferred unchanged to the matching components of thetransformed parameter set; the motivation is given in Section 3.1.4 ofAzzalini and Capitanio (2014). In the multivariate case,dp[[1]] andcp[[1]] can be matrices instead of vectors; the rows beyond the firstone are transferred unchanged tocp[[1]] anddp[[1]],respectively.

References

Arellano-Valle, R. B. and Azzalini, A. (2013, available on-line 12 June 2011). The centred parameterization and related quantities of the skew-t distribution.J. Multiv. Analysis113, 73-90.

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Azzalini, A. and Dalla Valle, A. (1996).The multivariate skew-normal distribution.Biometrika83, 715–726.

See Also

makeSECdistr,summary.SECdistr,sn.cumulants,

the ‘Note’ atsummary.selm for the reason whyCP is the default parameterization in that function and in related ones,

the ‘Examples’ atrmsn for use of theCPparameterization

Examples

# univariate casecp <- dp2cp(c(1, 2222, 3333, 2, 3), "SN")dp <- cp2dp(cp, "SN")# notice that the 2nd and the 3rd component remain unchanged## multivariate casedp3 <- list(xi=1:3, Omega=toeplitz(1/(1:3)), alpha=c(-3, 8, 5), nu=6)cp3 <- dp2cp(dp3, "ST")dp3.back <- cp2dp(cp3, "ST")#op3 <- dp2op(dp3, "ST")dp3back <- op2dp(op3,"ST")

Skew-Cauchy Distribution

Description

Density function, distribution function, quantiles and randomnumber generation for the skew-Cauchy (SC) distribution.

Usage

dsc(x, xi = 0, omega = 1, alpha = 0, dp = NULL, log = FALSE)psc(x, xi = 0, omega = 1, alpha = 0, dp = NULL)qsc(p, xi = 0, omega = 1, alpha = 0, dp = NULL) rsc(n = 1, xi = 0, omega = 1, alpha = 0, dp = NULL)

Arguments

x

vector of quantiles. Missing values (NAs) andInf'sare allowed.

p

vector of probabilities. Missing values (NAs) are allowed.

xi

vector of location parameters.

omega

vector of (positive) scale parameters.

alpha

vector of slant parameters.

dp

a vector of length 3 whose elements represent the parameters described above. Ifdp is specified, the individual parameters cannot be set.

n

sample size.

log

logical flag used indsc (defaultFALSE).WhenTRUE, the logarithm of the density values is returned.

Value

density (dsc), probability (psc), quantile (qsc)or random sample (rsc) from the skew-Cauchy distribution with givenxi,omega andalpha parameters or from the extendedskew-normal iftau!=0

Details

Typical usages are

dsc(x, xi=0, omega=1, alpha=0, log=FALSE)dsc(x, dp=, log=FALSE)psc(x, xi=0, omega=1, alpha=0)psc(x, dp= )qsc(p, xi=0, omega=1, alpha=0)qsc(x, dp=)rsc(n=1, xi=0, omega=1, alpha=0)rsc(x, dp=)

Background

The skew-Cauchy distribution can be thought as a skew-t with tail-weightparameternu=1. In this case, closed-form expressions of the distribution function and the quantile function have been obtained byBehboodianet al. (2006).The key facts are summarized in Complement 4.2 of Azzalini and Capitanio (2014).A multivariate version of the distribution exists.

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-normal and Related Families. Cambridge University Press, IMS Monographs series.

Behboodian, J., Jamalizadeh, A., and Balakrishnan, N. (2006).A new class of skew-Cauchy distributions.Statist. Probab. Lett.76, 1488–1493.

See Also

dst,dmsc

Examples

pdf <- dsc(seq(-5,5,by=0.1), alpha=3)cdf <- psc(seq(-5,5,by=0.1), alpha=3)q <- qsc(seq(0.1,0.9,by=0.1), alpha=-2)p <- psc(q, alpha=-2)  rn <- rsc(100, 5, 2, 5)

Skew-Normal Distribution

Description

Density function, distribution function, quantiles and randomnumber generation for the skew-normal (SN) and the extended skew-normal (ESN) distribution.

Usage

dsn(x, xi=0, omega=1, alpha=0, tau=0, dp=NULL, log=FALSE)psn(x, xi=0, omega=1, alpha=0, tau=0, dp=NULL, engine, ...)qsn(p, xi=0, omega=1, alpha=0, tau=0, dp=NULL, tol=1e-8, solver="NR", ...) rsn(n=1, xi=0, omega=1, alpha=0, tau=0,  dp=NULL)

Arguments

x

vector of quantiles. Missing values (NA's) andInf'sare allowed.

p

vector of probabilities. Missing values (NA's) are allowed

xi

vector of location parameters.

omega

vector of scale parameters; must be positive.

alpha

vector of slant parameter(s);+/- Inf is allowed.Forpsn, it must be of length 1 ifengine="T.Owen". Forqsn, it must be of length 1.

tau

a single value representing the ‘hidden mean’ parameter of theESN distribution;tau=0 (default) corresponds to aSN distribution.

dp

a vector of length 3 (in theSN case) or 4 (in theESN case), whose components represent the individual parameters described above. Ifdpis specified, the individual parameters cannot be set.

n

a positive integer representing the sample size.

tol

a scalar value which regulates the accuracy of the result ofqsn, measured on the probability scale.

log

logical flag used indsn (defaultFALSE).WhenTRUE, the logarithm of the density values is returned.

engine

a character string which selects the computing engine;this is either"T.Owen" or"biv.nt.prob", the latter from packagemnormt. Iftau != 0 orlength(alpha)>1,"biv.nt.prob" must be used. If this argument is missing, a default selection rule is applied.

solver

a character string which selects the numerical method used for solving the quantile equation; possible options are"NR" (default)and"RFB", described in the ‘Details’ section.

...

additional parameters passed toT.Owen

Value

density (dsn), probability (psn), quantile (qsn)or random sample (rsn) from the skew-normal distribution with givenxi,omega andalpha parameters or from the extendedskew-normal iftau!=0

Details

Typical usages are

dsn(x, xi=0, omega=1, alpha=0, log=FALSE)dsn(x, dp=, log=FALSE)psn(x, xi=0, omega=1, alpha=0,  ...)psn(x, dp=,  ...)qsn(p, xi=0, omega=1, alpha=0, tol=1e-8, ...)qsn(x, dp=, ...)rsn(n=1, xi=0, omega=1, alpha=0)rsn(x, dp=)

psn andqsn make use of functionT.Owenorbiv.nt.prob

Inqsn, the choicesolver="NR" selects the Newton-Raphson method for solving the quantile equation, while optionsolver="RFB"alternates a step ofregula falsi with one of bisection. The"NR" method is generally more efficient, but"RFB" is occasionally required in some problematic cases.

In version 1.6-2, the random number generation method forrsn haschanged; the so-called transformation method (also referred to as the‘additive representation’) has been adopted for all values oftau.Also, the code has been modified so that there is this form of consistency:providedset.seed() is reset similarly before calls, code likersn(5, dp=1:3) andrsn(10, dp=1:3), for instance, will start with the same initial values in the longer sequence as in the shorter sequence.

Background

The family of skew-normal distributions is an extension of the normalfamily, via the introdution of aalpha parameter which regulatesasymmetry; whenalpha=0, the skew-normal distribution reduces to the normal one. The density function of theSN distribution in the ‘normalized’ case havingxi=0 andomega=1 is2\phi(x)\Phi(\alpha x), if\phi and\Phi denote thestandard normal density and distribution function.An early discussion of the skew-normal distribution is given by Azzalini (1985); see Section 3.3 for theESN variant, up to a slight difference in the parameterization.

An updated exposition is provided in Chapter 2 of Azzalini and Capitanio (2014); theESN variant is presented Section 2.2. See Section 2.3 for an historical account. A multivariate version of the distribution is examined in Chapter 5.

References

Azzalini, A. (1985).A class of distributions which includes the normal ones.Scand. J. Statist.12, 171-178.

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

Functions used bypsn:T.Owen,biv.nt.prob

Related distributions:dmsn,dst,dmst

Examples

pdf <- dsn(seq(-3, 3, by=0.1), alpha=3)cdf <- psn(seq(-3, 3, by=0.1), alpha=3)q <- qsn(seq(0.1, 0.9, by=0.1), alpha=-2)r <- rsn(100, 5, 2, 5)qsn(1/10^(1:4), 0, 1, 5, 3, solver="RFB")

Skew-t Distribution

Description

Density function, distribution function, quantiles and random number generation for the skew-t (ST) distribution.

Usage

dst(x, xi=0, omega=1, alpha=0, nu=Inf, dp=NULL, log=FALSE) pst(x, xi=0, omega=1, alpha=0, nu=Inf, dp=NULL, method=0, lower.tail=TRUE,     log.p=FALSE, ...)qst(p, xi=0, omega=1, alpha=0, nu=Inf, tol=1e-08, dp=NULL, method=0, ...)rst(n=1, xi=0, omega=1, alpha=0, nu=Inf, dp=NULL)

Arguments

x

vector of quantiles. Missing values (NAs) are allowed.

p

vector of probabililities.

xi

vector of location parameters.

omega

vector of scale parameters; must be positive.

alpha

vector of slant parameters. Withpst andqst, it must be of length 1.

nu

a single positive value representing the degrees of freedom;it can be non-integer. Default value isnu=Inf which corresponds to the skew-normal distribution.

dp

a vector of length 4, whose elements represent location, scale(positive), slant and degrees of freedom, respectively. Ifdp isspecified, the individual parameters cannot be set.

n

a positive integer representing the sample size.

log,log.p

logical; ifTRUE, densities are given as log-densitiesand probabilitiesp are given aslog(p)

tol

a scalar value which regulates the accuracy of the result ofqsn, measured on the probability scale.

method

an integer value between0 and5 which selects the computing method; see ‘Details’ below for the meaning of thesevalues. Ifmethod=0 (default value), an automatic choice is madeamong the four actual computing methods, depending on the otherarguments.

lower.tail

logical; ifTRUE (default), probabilities areP\{X\le x\},otherwiseP\{X\ge x\}

...

additional parameters passed tointegrate orpmst.

Value

Density (dst), probability (pst), quantiles (qst) and random sample (rst) from the skew-t distribution with givenxi,omega,alpha andnu parameters.

Details

Typical usages are

dst(x, xi=0, omega=1, alpha=0, nu=Inf, log=FALSE)dst(x, dp=, log=FALSE)pst(x, xi=0, omega=1, alpha=0, nu=Inf, method=0, ...)pst(x, dp=, log=FALSE)qst(p, xi=0, omega=1, alpha=0, nu=Inf, tol=1e-8,  method=0, ...)qst(x, dp=, log=FALSE)rst(n=1, xi=0, omega=1, alpha=0, nu=Inf)rst(x, dp=, log=FALSE)

Background

The family of skew-t distributions is an extension of the Student'st family, via the introduction of aalpha parameter which regulates skewness; whenalpha=0, the skew-t distribution reduces to the usual Student'st distribution. Whennu=Inf, it reduces to the skew-normal distribution. Whennu=1, it reduces to a form of skew-Cauchy distribution.See Chapter 4 of Azzalini & Capitanio (2014) for additional information. A multivariate version of the distribution exists; seedmst.

Details

For evaluation ofpst, and so indirectly ofqst, four different methods are employed.In all the cases, the actual computations are performed for the normalized valuesz=(x-xi)/omega).Method 1 consists in usingpmst with dimensiond=1.Method 2 appliesintegrate to the density functiondst.Method 3 again usesintegrate too but with a different integrand,as given in Section 4.2 of Azzalini & Capitanio (2003, full version ofthe paper).Method 4 consists in the recursive procedure of Jamalizadeh, Khosravi andBalakrishnan (2009), which is recalled in Complement 4.3 on Azzalini & Capitanio (2014); the recursion overnu starts from the explicit expression fornu=1 given bypsc.Method 5 is targeted to tail probabilities only, and it returnsNAsfor non-extremex values (those withabs(z)<=20); it is based on expressions given in Complement 4.4 of Azzalini and Capitanio (2014).Method 1 and 4 are only suitable for integer values ofnu.Method 4 becomes progressively less efficient asnu increases,because the value ofnu determines the number of nested calls, but the decay of efficiency is slower for larger values oflength(x).If the default argument valuemethod=0 is retained, an automatic choiceamong the above four methods is made, which depends on the values ofnu, alpha, z. The numerical accuracy of methods 1, 2 and 3 can be regulated via the... argument, while method 4 is conceptually exact, up to machine precision.

Ifqst is called withnu>1e4, the computation is transferred toqsn.

References

Azzalini, A. and Capitanio, A. (2003).Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution.J.Roy. Statist. Soc. B65, 367–389.Full version of the paper athttps://arXiv.org/abs/0911.2342.

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-normal and Related Families. Cambridge University Press, IMS Monographs series.

Jamalizadeh, A., Khosravi, M., and Balakrishnan, N. (2009).Recurrence relations for distributions of a skew-t and a linearcombination of order statistics from a bivariate-t.Comp. Statist. Data An.53, 847–852.

See Also

dmst,dsn,dsc

Examples

pdf <- dst(seq(-4, 4, by=0.1), alpha=3, nu=5)rnd <- rst(100, 5, 2, -5, 8)q <- qst(c(0.25, 0.50, 0.75), alpha=3, nu=5)pst(q, alpha=3, nu=5)  # must give back c(0.25, 0.50, 0.75)#p1 <- pst(x=seq(-3,3, by=1), dp=c(0,1,pi, 3.5))p2 <- pst(x=seq(-3,3, by=1), dp=c(0,1,pi, 3.5), method=2, rel.tol=1e-9)

Extract the SEC error distribution from an object created byselm

Description

Given an object created by a call toselm, the function deliverstheSEC distribution representing the stochastic term of the fitted model

Usage

  extractSECdistr(object, name, compNames)

Arguments

object

an object of classselm ormselm, as created byselm.

name

an optional character string representing the name of the outcome distribution; if missing, a string is constructed from theobject ingredients.

compNames

in the multivariate case, an optional vector of character strings with the names of the components of the error distribution; if missing, one such vector is constructed from theobjectingredients.

Value

An object of classSECdistrMv orSECdistrUv, depending of the class ofobject.

Details

When the formula of the fitted model includes only the constant1,the returned object represents the fittedSEC distribution.If the formula includes additional terms, the linear predictor is eliminatedand the returned object corresponds to the error term of the model; hence the location parameterxi in theDP parameterizationis set to zero.

The returned object can be submitted to toolsavailable for objects created bymakeSECdistr, such assummary.SECdistr,conditionalSECdistr andand so on.

See Also

selm,makeSECdistr

Examples

data(ais)m2 <- selm(log(Fe) ~ 1, family="ST", data=ais, fixed=list(nu=8))f2 <- extractSECdistr(m2)show(f2)#m4 <- selm(cbind(BMI, LBM) ~ 1, family="SN", data=ais)f4 <- extractSECdistr(m4)mean(f4)vcov(f4)

Maximum-likelihood fitting of a univariate distribution from grouped data

Description

Maximum-likelihood fitting of a univariate distribution when the data arerepresented by a set of frequencies pertaining to given set of contiguous intervals.

Usage

fitdistr.grouped(breaks, counts, family, weights, trace = FALSE, wpar = NULL)

Arguments

breaks

A numeric vector of strictly increasing values which identify a set of contiguous intervals on the real line.See ‘Details’ for additional information.

counts

A vector of non-negative integers representing the number of observations falling in the intervals specified bybreaks;it is then required thatlength(counts)+1=length(breaks).

family

A character string specifying the parametric family of distributions to be used for fitted. Admissible names are:"normal","logistic","t","Cauchy","SN","ST","SC","gamma","Weibull";the names"gaussian" and"Gaussian" are also allowed, and are converted to"normal".

weights

An alias forcounts, allowed for analogy with theselm function.

trace

A logical value which indicates whether intermediate evaluations of the optimization process are printed (default:FALSE).

wpar

An optional vector with initial values of the ‘working parameters’for starting the maximization of the log-likelihood function; see ‘Details’ for their description.

Details

The original motivation of this function was fitting a univariateSN,ST orSC distribution from grouped data;its scope was later extended to include some other continuous distributions. The adopted name of the function reflects the broad similarity of its purpose with the one offitdistr, but there are substantial differences in the actual working of the two functions.

The parameter set of a givenfamily is the same as appearing in thecorrespondingd<basename> function, with the exception of the"t" distribution, for which a location and a scale parameter are included, besidesdf.

The range ofbreaks does not need to span the whole support of thechosenfamily of distributions, that is,(0, Inf) for"Weibull" and"gamma" families,(-Inf, Inf) for the other families.In fact, for the purpose of post-fitting plotting, an infiniterange(breaks) represents a complication, requiring anad hoc handling; so it is sensible to avoid it.However, at the maximum-likelihood fitting stage, the full support ofthe probability distribution is considered, with the following implications. Ifmax(breaks)=xR, say, andxR<Inf, then an additional interval(xR, Inf) is introduced, with valuecounts=0 assigned.A similar action is taken at the lower end: ifmin(breaks)=xL islarger than the infimum of the support of the distribution,an extra 0-counts interval is introduced as(0, xL) or(-Inf, xL), depending on the support of thefamily.

Maximum likelihood fitting is obtained by maximizing the pertaining multinomial log-likelihood using theoptim function withmethod="Nelder-Mead". For numerical convenience, the numerical search is performed using ‘working parameters’ in place of the originalones, with reverse conversion at the end. The working parameters coincidewith the original distribution parameters when they have unbounded range,while they are log-transformed in case of intrinsically positive parameters.This transformation applies to the parameters of the positive-valueddistributions ("gamma" and "Weibull"), all scale parameters anddfof the"t" distribution.

Value

An object of classfitdistr.grouped, whose components are described infitdistr.grouped-class.

Author(s)

Adelchi Azzalini

See Also

For methods pertaining to this class of objects, seefitdistr.grouped-class andplot.fitdistr.grouped; see alsodsn,dst,dsc,Distributions,dmultinom;see alsoselm for ungrouped data fitting and an exampleelaborated on below.

Examples

data(barolo)attach(barolo)A75 <- (reseller=="A" & volume==75)logPrice <- log(price[A75], 10) # used in documentation of 'selm'; see its fittingdetach(barolo)breaks<- seq(1, 3, by=0.25)f <- cut(logPrice, breaks = breaks)counts <- tabulate(f, length(levels(f))) logPrice.grouped <- fitdistr.grouped(breaks, counts, family='ST')summary(logPrice.grouped) # compare this fit with the ungrouped data fitting

Methods for objects of class created byfitdistr.grouped

Description

A successful call to functionfitdistr.grouped creates an object of class, also namedfitdistr.grouped, for which a set of methods exist.The structure of an object of this class is described in section ‘Object components’.

Usage

  ## S3 method for class 'fitdistr.grouped'logLik(object, ...)  ## S3 method for class 'fitdistr.grouped'coef(object, ...)  ## S3 method for class 'fitdistr.grouped'vcov(object, ...)  ## S3 method for class 'fitdistr.grouped'print(x, ...)  ## S3 method for class 'fitdistr.grouped'summary(object, cor=FALSE, ...)  ## S3 method for class 'fitdistr.grouped'fitted(object, full=FALSE, ...)

Arguments

x,object

an object of classfitdistr.grouped as created by a call to the function with this name.

cor

logical (default=FALSE); is the correlation matrix required?

full

logical (default=FALSE); must the vector of fitted frequencies include the boundary classes, when they are added to cover the full support of the fitted distribution?

...

further arguments passed to or from other methods.

Object components

Components of an object of classfitdistr.grouped:

call

the matched call

family

the selectedfamily of distributions

logL

the achieved maximum log-likelihood

param

a vector of estimated parameters

vcov

the approximate variance-covariance matrix of the estimates

input

a list with the input quantities and some derived ones

opt

a list as returned byoptim

Author(s)

Adelchi Azzalini

References

Possolo, A., Merkatas, C. and Bodnar, O. (2019).Asymmetrical uncertainties.Metrologia 56, 045009.

See Also

the functionfitdistr.grouped, the plotting methodplot.fitdistr.grouped

Examples

data(barolo)attach(barolo)A75 <- (reseller=="A" & volume==75)logPrice <- log(price[A75], 10) # as used in selm documentation; see its fittingdetach(barolo)breaks <- seq(1, 3, by=0.25)f <- cut(logPrice, breaks = breaks)counts <- tabulate(f, length(levels(f))) fit.logPrice.gr <- fitdistr.grouped(breaks, counts, family='ST')summary(fit.logPrice.gr) # compare this fit with the ungrouped data fitting print(fit.logPrice.gr)coef(fit.logPrice.gr)vcov(fit.logPrice.gr)data.frame(intervals=levels(f), counts, fitted=format(fitted(fit.logPrice.gr)))full.intervals <- c("(-Inf, 1]", levels(f), "(3, Inf)")data.frame("full-range intervals" = full.intervals,       "full-range counts" = c(0, counts, 0),       "full-range fit" = fitted(fit.logPrice.gr, full=TRUE))sum(counts) - sum(fitted(fit.logPrice.gr))  sum(counts) - sum(fitted(fit.logPrice.gr, full=TRUE))  # must be "nearly 0" #---# Use first entry in Table 3 of Possolo et al. (2019) and do similar fitting# to the *probability* values, not observation countsafcrc59 <- 1.141breaks <- c(-Inf, afcrc59 - 0.033, afcrc59, afcrc59 + 0.037, Inf)prob <-  c(0.16, 0.50, 0.84) cum.percent <- c(0, prob, 1)*100 fitSN <- fitdistr.grouped(breaks, counts=diff(cum.percent), family="SN") print(coef(fitSN))print(rbind(target=c(prob, 1)*100, fitted=cumsum(fitted(fitSN))), digits=5)# Note: given the nature of these data (i.e. probabilities, not counts), #       there is no point to use vcov, logLik and summary on the fitted object.

Four-number summary of a numeric vector

Description

Returns a quantile-based four-number summary of the input data

Usage

fournum(x, na.rm = TRUE, ...)

Arguments

x

a numeric vector, maybe includingNAs and+/-Inf's.At least 8 not-NA values are required.It works with objects which can be coerced to vector.

na.rm

logical; ifTRUE, allNA andNaNs are dropped, before the statistics are computed.

...

optional arguments passed toquantile

Details

Functionquantile is used to compute 7 octiles ofx, that is, quantiles of level(1:7)/8, denotedoct[1:7], and derive four summary quantities:

  1. the median, which corresponds tooct[4],

  2. the ‘(coefficient of) quartile deviation’ or semi-interquantile range:(oct[6] - oct[2])/2;

  3. the Galton-Bowley measure of asymmetry, that is, skewness:(oct[6] - 2 * oct[4] + oct[2])/(oct[6] - oct[2]);

  4. the Moors measure of kurtosis:(oct[7] - oct[5] + oct[3] - oct[1])/(oct[6] - oct[2])

The term ‘coefficient of quartile deviation’ is adopted from theEncyclopedia of Statistical Sciences; see the references below.What is called Galton-Bowley measure here is often named ‘Bowley's measure’, but some sources attribute it to Francis Galton. For the Moors measure, see Moors (1988).

Value

a vector of length four containing the median, the quartile deviation, the Galton-Bowley measure and the Moors measure. However,ifx does not contain at least 8 values (after removingNAs),rep(NA,4) is returned.

Note

Computation of octiles makes real sense only iflength(x) issubstantially larger than 8.

Author(s)

Adelchi Azzalini

References

‘Quartile deviation, coefficient of’, in:Encyclopedia of Statistical Sciences, 2nd edition (2006). Editors: Samuel Kotz (Editor-in-Chief), Campbell B. Read, N. Balakrishnan, Brani Vidakovic. Volume 10, p.6743.

‘Skewness, Bowleys's measures of’, in:Encyclopedia of Statistical Sciences, 2nd edition (2006). Editors: Samuel Kotz (Editor-in-Chief), Campbell B. Read, N. Balakrishnan, Brani Vidakovic. Volume 12, p.7771-7773.

Moors, J.J.A. (1988). A quantile alternative for kurtosis.Source: Journal of the Royal Statistical Society. Series D (The Statistician), Vol. 37, pp. 25-32

See Also

quantile,fivenum,IQR

Examples

fournum(datasets::rivers)

Simulated sample from a skew-normal distribution

Description

A sample simulated from the SN(0,1,5) distribution with samplecoefficient of skewness inside the admissible range (-0.9952719, 0.9952719) for the skew-normal family butmaximum likelihood estimate on the frontier of the parameter space.

Usage

data(frontier)

Format

A numeric vector of length 50.

Source

Generated by a run ofrsn(50, 0, 1, 5).

Examples

data(frontier, package="sn")fit <- selm(frontier ~ 1)plot(fit, which=2)#fit.p <- selm(frontier ~ 1, method="MPLE")plot(fit.p, which=2)

Mapping of the (Galton-Bowley, Moors) measures to the (alpha,nu) parameters of a ST distribution

Description

Given a pair of (Galton-Bowley, Moors) measures of skewness and kurtosis for a given sample,galton_moors2alpha_nu delivers values (alpha,nu) such that a skew-t (ST) distribution with these slant and tail-weight parameter has its (Galton-Bowley, Moors) measures equal to the input values.Its simplified versiongalton2alpha uses only a Galton-Bowley measureto deliver aalpha value, assuming aSN distribution.These functions are mainly intended for internal package usage.

Usage

galton_moors2alpha_nu(galton, moors, quick = TRUE, move.in = TRUE, verbose = 0,   abstol = 1e-04)galton2alpha(galton, move.in = TRUE)

Arguments

galton

a numeric value, representing a Galton-Bowley measure

moors

a numeric value, representing a Moors measure

quick

a logical value; ifTRUE, a quick mapping is performed

move.in

if the input values(galton, moors) are outside the feasibleST region, a suitable point within the feasible area is returned

verbose

a numeric value which regulates the amount of printed detail

abstol

the tolerance value of the mapping, only relevant isquick=FALSE

Details

For background information about the Galton-Bowley's and the Moors measures, see the documentation offournum.The working of the mapping by described in Azzalini and Salehi (2020).

Value

forgalton_moors2alpha_nu, named vector of length two, with one or more descriptive attributes; forgalton2alpha, a singlealpha value.

Note

These functions are mainly intended for internal package usage.Specifically they are used byst.prelimFit.

Author(s)

Adelchi Azzalini

References

Azzalini, A. and Salehi, M. (2020).Some computational aspects of maximum likelihood estimation of the skew-t distribution.In:Computational and Methodological Statistics and Biostatistics,edited by Andriëtte Bekker, Ding-Geng Chen and Johannes T. Ferreira.Springer. DOI: 10.1007/978-3-030-42196-0

See Also

fournum,st.prelimFit

Examples

galton_moors2alpha_nu(0.5, 3, quick=FALSE)  # input in the feasible areagalton_moors2alpha_nu(0.5, 3)    # very similar output, much more quicklygalton_moors2alpha_nu(0.5, 0.5)  # input outside the feasible area

Build a skew-elliptically contoured distribution

Description

Build an object which identifies a skew-elliptically contoured distribution (SEC), in the univariate and in the multivariate case.The term ‘skew-elliptical distribution’ is a synonym ofSEC distribution.

Usage

makeSECdistr(dp, family, name, compNames)

Arguments

dp

a numeric vector (in the univariate case) or a list (in themultivariate case) of parameters which identify the specific distributionwithin the namedfamily. See ‘Details’ for their expected structure.

family

a character string which identifies the parametricfamily; currently, possible values are:"SN","ESN","ST","SC". See ‘Details’ for additional information.

name

an optional character string with the name of the distribution.If missing, one is created.

compNames

in the multivariate case, an optional vector of characterstrings with the names of the component variables; its length must be equal to the dimensionality of the distribution being generated. If missing and the first component ofdp is a named vector,its names are used ascompNames; otherwisethe components are named"V1","V2", ...

Details

Ifdp is a numeric vector, a univariate distribution is built. Alternatively, ifdp is a list, a multivariate distribution is built. In both cases, the required number of components ofdp depends onfamily: it must be3 for"SN" and"SC"; it must be4 for"ESN" and"ST".

In the univariate case, the first three components ofdp represent what for the specific distributions are denotedxi (location),omega (scale, positive) andalpha (slant); see functionsdsn,dst,dsc for theirdescription. The fourth component, when it exists, represents eithertau (hidden variable mean) for"ESN" ornu (degrees of freedom) for"ST". The names of the individual parameters are attachedto the components ofdp in the returned object.

In the multivariate case,dp is a list with components having similar role as in the univariate case, butxi=dp[[1]] andalpha=dp[[3]] are now vectors and the scale parameterOmega=dp[[2]] is a symmetric positive-definite matrix. For a multivariate distribution of dimension 1 (which can be created, although a warning message is issued),Omega corresponds to the square ofomega in the univariate case. Vectorsxi andalpha must be of lengthncol(Omega).See also functionsdmsn,dmst anddmsc.The fourth component, when it exists, is a scalar with the same role as in the univariate case.

In the univariate casealpha=Inf is allowed, but in the multivariatecase all components of the vectoralpha must be finite.

An object built by this function operates according to the S4 protocol.

Value

In the univariate case, an object of classSECdistrUv;in the multivariate case, an object of classSECdistrMv.SeeSECdistrUv-class andSECdistrMv-classfor their description.

Background

For background information, see Azzalini and Capitanio (2014), specificallyChapters 2 and 4 for univariate cases, Chapters 5 and 6 for multivariatecases; Section 6.1 provides a general formulation ofSECdistributions.

If the slant parameteralpha is0 (or a vector of0's,in the multivariate case), the distribution is of classical elliptical type.

TheESN distribution is included here as a members of theSEC class, with a very slight extension of the original definitionof this class, since the only difference is the non-zero truncation pointof the unobserved component of the(d+1)-dimensionalEC variable.

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

The description of classesSECdistrUv-class andSECdistrMv-class

plot.SECdistr for plotting andsummary.SECdistr for summaries

Related functionsdsn,dst,dsc,dmsn,dmst,dp2cp

FunctionsaffineTransSECdistr andconditionalSECdistr to manipulate objects of classSECdistrMv-class

FunctionextractSECdistr to extract objects of classSECdistrMv-class andSECdistrUv-class representing theSEC distribution of aselm fit

Examples

    f1 <- makeSECdistr(dp=c(3,2,5), family="SN", name="First-SN")show(f1)summary(f1)plot(f1)plot(f1, probs=c(0.1, 0.9))#f2 <- makeSECdistr(dp=c(3, 5, -4, 8), family="ST", name="First-ST")f9 <- makeSECdistr(dp=c(5, 1, Inf, 0.5), family="ESN", name="ESN,alpha=Inf")#dp0 <- list(xi=1:2, Omega=diag(3:4), alpha=c(3, -5))f10 <- makeSECdistr(dp=dp0, family="SN", name="SN-2d", compNames=c("u1", "u2"))#dp1 <- list(xi=1:2, Omega=diag(1:2)+outer(c(3,3),c(2,2)), alpha=c(-3, 5), nu=6)f11 <- makeSECdistr(dp=dp1, family="ST", name="ST-2d", compNames=c("t1", "t2"))

Build an object representing a SUN distribution

Description

Build an object which identifies a Unified Skew-Normal distribution (SUN) within this parametric family.TheSUN family is essentially equivalent to some other parametric families examined in the literature, notably the Closed Skew-Normal.

Usage

makeSUNdistr(dp, name, compNames, HcompNames, drop = TRUE)

Arguments

dp

a list of parameters as described atSUNdistr-base.

name

an optional character string with the name of the distribution. If missing, one is created.

compNames

an optional vector of character strings with the names of the component variables; its length must be equal to the dimensionalityd of the distribution being generated.If missing, the components are named"V1","V2", ...

HcompNames

an optional vector of character strings with the names of the hidden component variables; its length must be equal to the dimensionality componentm described in the ‘Details’.If missing, the components are named"H1","H2", ...

drop

a logical value (default:TRUE) relevant only in thecasem=1. When bothm=1 anddrop=TRUE, the returned object is of class eitherSECdistrUv orSECdistrMv,depending on the value ofd, and family"SN" or"ESN", depending on thedp ingredients.

Details

The argumentdp is a list, whose components are described atSUNdistr-base; see especially the ‘Details’ there.In this respect, there is no difference between the univariate and theunivariate case, differently from the similar commandmakeSECdistr.

If the argumentsname,compNames andHcompNamesare missing, they are composed from the supplied arguments.

ASUNdistr-class object operates according to the S4 protocol.

Value

An object ofSUNdistr-class

Note

The present structure and user interface of this function, and of other ones related to theSUN distribution, must be considered experimental, and they might possibly change in the future.

Author(s)

Adelchi Azzalini

See Also

Basic information on the SUN distributionSUNdistr-base,the description of the classSUNdistr-class,

Related methods:show.SUNdistr for displaying the object constituents,plot.SUNdistr for plotting,mean.SUNdistr for the mean value,vcov.SUNdistr for the variance matrix,summary.SUNdistr for various summary quantities

FunctionsSUNdistr-op manipulate objects created by this function, producing newSUNdistr-class objects

Examples

xi <- c(1, 0, -1)Omega <- matrix(c(2,1,1, 1,3,1, 1,1,4), 3, 3)Delta <- matrix(c(0.72,0.20, 0.51,0.42, 0.88, 0.94), 3, 2, byrow=TRUE)Gamma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)dp3 <- list(xi=xi, Omega=Omega, Delta=Delta, tau=c(-0.5, 0), Gamma=Gamma)sun3 <- makeSUNdistr(dp=dp3, name="SUN3", compNames=c("x", "w", "z"))show(sun3)

vech, tr and other matrix operators

Description

vech and other matrix operators

Usage

vech(A)vech2mat(v)duplicationMatrix(n)tr(x)blockDiag(...)

Arguments

A

a (symmetric) square numeric matrix.

v

a numeric vector such thatlength(v)=n*(n+1)/2 for somepositive integern.

n

a positive integer number; default isn=1.

x

a square numeric matrix.

...

an abitrary numer of matrices or objects coercible into matrices.

Value

a vector in case ofvech, a scalar in case oftr, otherwise a matrix.

Details

For a square matrixA,vech(A) returns the vector formedby the lower triangular portion of the matrix, including the diagonal; usually, this only makes sense for a symmetric matrix of numeric values. Ifv=vech(M) whereM is a symmetric numeric matrix,vech2mat(v) performs the inverse operation and returns the original matrixM; this explain the requirement onlength(v).

For a positive integern,D=duplicationMatrix(n) is a matrix of dimension(n^2, n*(n+1)/2) such thatD %*% vech(M) returns thevec-form of a symmetric matrixM of ordern, that is, the vector which stacks the columns ofM;for more information, see Section 3.8 of Magnus and Neudecker (1988).

For a square numeric matrixx,tr(x) returns its trace.

blockDiag(...) creates a block-diagonal matrix from a set of matrices or objects coercible into matrices. Generally, this is useful only fornumeric objects.

Author

Adelchi Azzalini; the original Octave code ofduplicationMatrix is by Kurt Hornik.

References

Magnus, Jan R. and Neudecker, Heinz (1988).Matrix differentialcalculus with application in statistics and econometrics. Wiley series in probability and statistics.

Examples

M <- toeplitz(1:4)v <- vech(M)vech2mat(v) - MD <- duplicationMatrix(ncol(M))# D %*% vech(M) - as.vector(M), must be a one-column matrix of 0str(outer(1:4,2:5))blockDiag(M[1:2,], 1:2, diag(5:6))

The mode of a skew-elliptically contoured (SEC) distribution

Description

Compute compute the mode of a univariate or multivariateSECdistribution.

Usage

  modeSECdistr(dp, family, object=NULL)

Arguments

dp

a numeric vector (in the univariate case, for classSECdistrUv) or a list (in the multivariate case, , for classSECdistrUv) of parameters which identify the specific distribution within the namedfamily.

family

a character string which identifies the parametricfamily among those admissible for classesSECdistrUv orSECdistrMv.

object

an object of classSECdistrUv orSECdistrMv ascreated bymakeSECdistr orextractSECdistr;if this argument is used, argumentsdp andfamily must not be set, andvice versa.

Value

a numeric vector

Background

The mode is obtained through numerical maximization.In the multivariate case, the problem is reduced to a one-dimensional search using Propositions 5.14 and 6.2 of the reference below.

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

makeSECdistr andextractSECdistrfor additional information and for constructing a suitableobject,

SECdistrUv-class andSECdistrMv-classfor methodsmean andvcov which compute the mean (vector) and the variance (matrix) of theobject distribution

Examples

dp3 <- list(xi=1:3, Omega=toeplitz(1/(1:3)), alpha=c(3,-1,2), nu=5)st3 <- makeSECdistr(dp3, family="ST", name="ST3", compNames=c("U", "V", "W"))mode1 <- modeSECdistr(dp3, "ST")mode2 <- modeSECdistr(object=st3) # the same of mode1

Packagesn: overview of the structure and main commands

Description

The package provides facilities to build and manipulate probability distributions of the skew-normal (SN) and some related families, notably the skew-t (ST) and the ‘unified skew-normal’ (SUN) families. For theSN,ST and skew-Cauchy (SC) families,statistical methods are made available for data fitting and model diagnostics, in the univariate and the multivariate case.

Two main sides

The package comprises two main sides: one side provides facilities for the pertaining probability distributions; the other one deals with relatedstatistical methods.Underlying formulation, parameterizations of distributions and terminology are in agreement with the monograph of Azzalini and Capitanio (2014).

Probability side

There are two layers of support for the probability distributions of interest.At the basic level, there exist functions which follow the classicalR scheme for distributions.In addition, there exists facilities to build an object which incapsulates a probability distribution and then certain operations can be be performed on such an object;these probability objects operate according to the S4 protocol. The two schemes are described next.

ClassicalR scheme

The following functions work similary to{d,p,q,r}norm and otherRfunctions for probability distributions:

In addition to the usual specification of their parameters as a sequence ofindividual components, a parameter set can be specified as a singledpentity, namely a vector in the univariate case, a list in the multivariatecase;dp stands for ‘Direct Parameters’ (DP).

Another set of parameters is in use, denoted Centred Parameters (CP),which are more convenient for interpretability, since they correspond tofamiliar quantifies, such as the mean and the standard deviation. Conversion from thedp parameter set to the correspondingCPparameters can be accomplished using the functiondp2cp,while functioncp2dp performs the inverse transformation.

TheSUN family is mainly targeted to the multivariate context, and this is reflected in the organization of the pertaining functions, although univariateSUN distributions are supported.Density, distribution function and random numbers are handled by{d,p,r}sun. Mean value, variance matrix and Mardia's measuresof multivariate skewness and kurtosis are computed bysun{Mean,Vcov,Mardia}.

In addition, one can introduce a user-specified density function usingdSymmModulated anddmSymmModulated, in the univariate and themultivariate case, respectively. These densities are of the‘symmetry-modulated’ type, also called ‘skew-symmetric’, whereone can specify the base density and the modulation factor with high degree offlexibility. Random numbers can be sampled using the corresponding functionsrSymmModulated andrmSymmModulated. In the bivariate case,a dedicated plotting function exists.

Probability distribution objects:SEC families

FunctionmakeSECdistr can be used to build a ‘SECdistribution’ object representing a member of a specified parametric family(among the typesSN, ESN, ST, SC) with a givendp parameterset. This object can be used for various operations such as plotting orextraction of moments and other summary quantities. Another way of constructing aSEC distribution object is viaextractSECdistr which extracts suitable components of an object produced by functionselm to be described below.

Additional operations on these objects are possible in the multivariate case,namelymarginalSECdistr andaffineTransSECdistrfor marginalization and affine trasformations. For the multivariateSN family only (but includingESN),conditionalSECdistr performs a conditioning on the values taken on by some components of the multivariate variable.

Probability distribution objects: theSUN family

FunctionmakeSUNdistr can be used to build aSUNdistribution object representing a member of theSUN parametric family.This object can be used for various operations such as plotting orextraction of moments and other summary quantities.

Moreover there are several trasformation operations which can be performedon aSUN distribution object, or two such objects in some cases:computing a (multivariate) marginal distribution, a conditional distribution(on given values of some components or on one-sided intervals), an affinetrasformation, a convolution (that is, the distribution of the sum of twoindependent variables), and joining two distributions under assumption ofindependence.

Statistics side

The main function for data fitting is represented byselm, which allowsto specify a linear regression model for the location parameter, similarly to functionlm, but assuming askew-elliptical distributionof the random component;this explains the nameselm=(se+lm). Allowed types of distributionsareSN (but notESN),ST andSC.The fitted distribution is univariate or multivariate, depending on the natureof the response variable of the posited regression model. The model fittingmethod is either maximum likelihood or maximum penalized likelihood; the latter option effectively allows the introduction of a prior distribution on the slant parameter of the error distribution, hence leading to a ‘maximum a posteriori’ estimate.

Once the fitting process has been accomplished, an object of class eitherselm (for univariate response) ormselm (for multivariate response) is produced.A number of ‘methods’ are available for these objects:show,plot,summary,coef,residuals,logLik and others.For univariateselm-class objects, univariate and bivariate profilelog-likelihood functions can be obtained; apredict method also exists.These methods are built following the S4 protocol; however, the user must notbe concerned with the choice of the adopted protocol (unless this is wished).

The actual fitting process invoked viaselm is actually performed by aset of lower-level procedures. These are accessible for direct call, if so wished, typically for improved efficiency, at the expense of a little additional programming effort. Similarly, functions to compute the Fisher information matrix are available, in the expected and the observed form (with some restrictions depending on the selected distribution).

TheextractSECdistr function extracts the fittedSEC distribution fromselm-class andmselm-class objects, henceproviding a bridge with the probability side of the package.

The facilities for statistical work do not support theSUN family.

Additional material

Additional material is available in the section ‘User guides, package vignettes and other documentation’accessible from the front page of the documentation. See especially the documentpkg_sn-intro.pdf

Author

Adelchi Azzalini.Please send comments, error reportset cetera to the author, whose web page ishttp://azzalini.stat.unipd.it/.

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.


Plotting methods for classesSECdistrUv andSECdistrMv

Description

Plotting methods for classesSECdistrUv andSECdistrMv

Usage

## S4 method for signature 'SECdistrUv'plot(x, range, probs, main, npt = 251, ...)## S4 method for signature 'SECdistrMv'plot(x, range, probs, npt, landmarks = "auto",   main, comp, compLabs, data = NULL, data.par = NULL, gap = 0.5, ...)

Arguments

x

an object of classSECdistrUv orSECdistrMv.

range

in the univariate case, a vector of length 2 which defines the plotting range; in the multivariate case, a matrix with two rows where each column defines the plotting range of the corresponding component variable. If missing, a sensible choice is made.

probs

a vector of probability values. In the univariate case, thecorresponding quantiles are plotted on the horizontal axis; it can beskipped by settingprobs=NULL. In the multivariate case, each probability value corresponds to a contour level in each bivariate plot; at least one probability value is required. See ‘Details’ forfurther information. Default value:c(0.05, 0.25, 0.5, 0.75, 0.95)in the univariate case,c(0.25, 0.5, 0.75, 0.95) in themultivariate case.

npt

a numeric value or vector (in the univariate and in the multivariate case, respectively) to assign the number of evaluation pointsof the distribution, on an equally-spaced grid over therangedefined above. Default value: 251 in the univariate case, a vector of101's in the multivariate case.

landmarks

a character string which affects the placement of somelandmark values in the multivariate case, that is, the origin, the modeand the mean (or its substitute pseudo-mean), which are all aligned.Possible values:"proper","pseudo","auto"(default),"". The option"" prevents plotting of thelandmarks. With the other options, the landmarks are plotted, with somevariation in the last one:"proper" plots the proper mean value,"pseudo" plots the pseudo-mean, useful when the proper mean doesnot exists,"auto" plots the proper mean if it exists, otherwise itswitches automatically to the pseudo-mean. Seedp2cp formore information on pseudo-CP parameters, including pseudo-mean.

main

a character string for main title; if missing, one is builtfrom the available ingredients.

comp

a subset of the vector1:d, ifd denotes thedimensionality of the multivariate distribution.

compLabs

a vector of character strings or expressions used to denotethe variables in the plot; if missing,slot(object,"compNames") is used.

data

an optional set of data of matching dimensionity ofobject to be superimposed to the plot. The default valuedata=NULL produces no effect. In the univariate case, data are plotted usingrug at the top horizontal axis, unless ifprobs=NULL, in which case plotting is at the bottom axis. In the multivariate case, points are plotted in the form of a scatterplot or matrix of scatterplots; thiscan be regulated by argumentdata.par.

data.par

an optional list of graphical parameters used for plottingdata in the multivariate case, whendata is notNULL. Recognized parameters are:col,pch,cex.If missing, the analogous components ofpar() are used.

gap

a numeric value which regulates the gap between panels of amultivariate plot whend>2.

...

additional graphical parameters

Value

an invisible list. In the univariate case the list has three components:the input object representing the distribution and two numeric vectors with the coordinates of the plotted density values. In the multivariate case, the first element of the list is the input object representing the distribution and all subsequent list elements are lists with components of the panels comprising the matrix plot; the elements of these sub-lists are: the vectors ofx andy coordinates, the names of the variables, the density values at the(x,y) points, a vector of the density levels of the curves appearing in each panel plot, with the corresponding approximateprobability content as a vector attribute.

Details

For univariate density plots,probs are used to compute quantilesfrom the appropriate distribution, and these are superimposed to the plot ofthe density function, unlessprobs=NULL. In the multivariate case,each bivariate plot is constructed as a collection of contour curves,one curve for each probability level; consequently,probs cannot be missing orNULL. The level of the density contour lines are chosen so that each curve circumscribes a region with the quoted probability, to a good degree of approssimation; for additional information, see Azzalini and Capitanio (2014), specifically Complement 5.2 and p.179, and references therein.

Methods

signature(x = "SECdistrUv")

Plot an objectxof classSECdistrUv.

signature(x = "SECdistrMv")

Plot an objectxof classSECdistrMv.

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

makeSECdistr,summary.SECdistr,dp2cp

Examples

# d=1f1 <- makeSECdistr(dp=c(3,2,5), family="SC", name="Univariate Skew-Cauchy")plot(f1)plot(f1, range=c(-3,40), probs=NULL, col=4)#    # d=2Omega2 <- matrix(c(3, -3, -3, 5), 2, 2)    f2 <-  makeSECdistr(dp=list(c(10,30), Omega=Omega2, alpha=c(-3, 5)),                  family="sn", name="SN-2d", compNames=c("x1","x2"))plot(f2)    x2 <- rmsn(100, dp=slot(f2,"dp"))plot(f2, main="Distribution 'f2'", probs=c(0.5,0.9), cex.main=1.5, col=2,  cex=0.8, compLabs=c(expression(x[1]), expression(log(z[2]-beta^{1/3}))),   data=x2, data.par=list(col=4, cex=0.6, pch=5))

Plotting method for classSUNdistr

Description

Plotting method for classSUNdistr

Usage

## S4 method for signature 'SUNdistr'plot(x, range, nlevels = 8, levels, npt,  main, comp, compLabs, gap = 0.5, ...)

Arguments

x

an object of classSUNdistr

range

in the univariate case, a vector of length 2 which defines the plotting range; in the multivariate case, a matrix with two rows where each column defines the plotting range of the corresponding component variable. If missing, a sensible choice is made.

nlevels

number of contour levels desirediff levels is not supplied.

levels

numeric vector of levels at which to draw contour lines.

npt

a numeric value or vector (in the univariate and in the multivariate case, respectively) to assign the number of evaluation pointsof the distribution, on an equally-spaced grid over therangedefined above. Default value: 251 in the univariate case, a vector of101's in the multivariate case.

main

a character string for main title; if missing, one is builtfrom the available ingredients.

comp

an optional integer vector representing the subset of the vector1:d, ifd denotes the dimensionality of the distribution.

compLabs

a vector of character strings or expressions used to labelthe variables in the plot; if missing,slot(object,"compNames")[comp] is used.

gap

a numeric value which regulates the gap between panels of amultivariate plot whend>2; default:0.5.

...

additional graphical parameters

Details

For univariate density plots,probs are used to compute quantilesfrom the appropriate distribution, and these are superimposed to the plot ofthe density function, unlessprobs=NULL. In the multivariate case,each bivariate plot is constructed as a collection of contour curves,one curve for each probability level; consequently,probs cannot be missing orNULL. The level of the density contour lines are chosen so that each curve circumscribes a region with the quoted probability, to a good degree of approssimation; for additional information, see Azzalini and Capitanio (2014), specifically Complement 5.2 and p.179, and references therein.

Value

an invisible list. In the univariate case the list has three components:the input object representing the distribution and two numeric vectors with the coordinates of the plotted density values. In the multivariate case, the first element of the list is the input object representing the distribution and all subsequent list elements are lists with components of the panels comprising the matrix plot; the elements of these sub-lists are: the vectors ofx andy coordinates, the names of the variables, the density values at the(x,y) points, a vector of the density levels of the curves appearing in each panel plot.

Author(s)

Adelchi Azzalini

See Also

makeSUNdistr,SUNdistr-class

Examples

xi <- c(1, 0, -1)Omega <- matrix(c(2,1,1, 1,3,1, 1,1,4), 3, 3)Delta <- matrix(c(0.72,0.20, 0.51,0.42, 0.88, 0.94), 3, 2, byrow=TRUE)Gamma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)dp3 <- list(xi=xi, Omega=Omega, Delta=Delta, tau=c(-0.5, 0), Gamma=Gamma)sun3 <- makeSUNdistr(dp=dp3, name="SUN3", compNames=c("x", "w", "z"))plot(sun3, npt=rep(51,3))p <- plot(sun3, comp=2:3, compLabs=c(expression(x[2]), expression(x[3])))# str(p)

Plot an object generated byfitdistr.grouped

Description

The object produced byfitdistr.grouped is plotted in the formof an histogram of the input original observed frequencies with superimposed the fitted parametric density function.

Usage

## S3 method for class 'fitdistr.grouped'plot(x, freq = FALSE, col = "grey90",   border = "grey80",  pdfcol = "blue", main, sub = NULL, xlab, ylab, xlim, ylim,   axes = TRUE, labels = FALSE, ...)

Arguments

x

alink{fitdistr.grouped} object.

freq

logical; ifTRUE, the histogram graphic is to present a representation of frequencies; ifFALSE (default value),densities are plotted.

col

a colour to be used to fill the bars.

border

the colour of the border around the bars.

pdfcol

the colour of the fitted parametric distribution.

main,sub,xlab,ylab

these arguments totitle an be omitted,in which case default values are constructed.

xlim,ylim

the range ofx andy values with sensible defaults.

axes

logical, indicating if axes should be drawn.

labels

logical or character. Additionally draw labels on top of bars, if not FALSE; if TRUE, draw the counts or rounded densities; if labels is a character, draw itself.

...

further graphical parameters to title and axis.

Details

The function builds onplot.histogram, but some lesser features have been dropped.

Value

A list with the following components

hist

an object of classhistogram which produces the histogram.

x,y

the values used for plotting the fitted parametric distribution.

Author(s)

Adelchi Azzalini

See Also

fitdistr.grouped for generating a suitable object,plot.histogram for the related more general function.

Examples

data(barolo)attach(barolo)A75 <- (reseller=="A" & volume==75)logPrice <- log(price[A75],10) # as used in selm documentation; see its fittingdetach(barolo)breaks<- seq(1, 3, by=0.25)f <- cut(logPrice, breaks = breaks)freq <- tabulate(f, length(levels(f))) logPrice.grouped <- fitdistr.grouped(breaks, freq, family='ST')plot(logPrice.grouped)

Diagnostic plots forselm fits

Description

Diagnostic plots for objects of classselmandmselm generated by a call to functionselm

Usage

  ## S4 method for signature 'selm'plot(x, param.type="CP", which = c(1:4), caption,     panel = if (add.smooth) panel.smooth else points, main = "",     ask = prod(par("mfcol")) < length(which) && dev.interactive(), ...,     id.n = 3, labels.id = names(x@residuals.dp),     cex.id = 0.75, identline = TRUE, add.smooth = getOption("add.smooth"),     label.pos = c(4, 2), cex.caption = 1)  ## S4 method for signature 'mselm'plot(x, param.type="CP",  which, caption,     panel = if (add.smooth) panel.smooth else points,  main = "",     ask = prod(par("mfcol")) < length(which) && dev.interactive(), ...,     id.n = 3, labels.id = names(x@residuals.dp),     cex.id = 0.75, identline = TRUE, add.smooth = getOption("add.smooth"),     label.pos = c(4, 2), cex.caption = 1)

Arguments

x

an object of classselm ormselm.

param.type

a character string which selects the type of residualsto be used for some of of the plots; possible values are:"CP" (default),"DP","pseudo-CP". The various type of residuals only differ by an additive term; see ‘Details’ for more information.

which

if a subset of the plots is required, specify a subset of1:4; see ‘Details’ for a description of the plots.

caption

a vector of character strings with captions to appear above the plots.

panel

panel function. The useful alternative topoints,panel.smooth can be chosen byadd.smooth = TRUE.

main

title to each plot, in addition to the above caption.

ask

logical; ifTRUE, the user is asked before each plot.

...

other parameters to be passed through to plotting functions.

id.n

number of points to be labelled in each plot, starting with themost extreme.

labels.id

vector of labels, from which the labels for extreme pointswill be chosen.NULL uses observation numbers..

cex.id

magnification of point labels.

identline

logical indicating if an identity line should be added toQQ-plot and PP-plot (default:TRUE).

add.smooth

logical indicating if a smoother should be added to most plots; see alsopanel above.

label.pos

positioning of labels, for the left half and right half of the graph respectively, for plots 1-3.

cex.caption

controls the size ofcaption.

Details

The meaning ofparam.type is described indp2cp. However, for these plot only the first parametercomponent is relevant, which affects the location of the residuals; the othercomponents are not computed. Moreover, forQQ-plot andPP-plot,DP-residuals are used irrespectively ofparam.type; see Section ‘Background’.

Valueswhich=1 andwhich=2 have adifferent effect for object of class"selm" and class"mselm".In the univariate case,which=1 plots the residual values versus thefitted values ifp>1, wherep denotes the number of covariatesincluding the constant; ifp=1, a boxplot of the response is produced.Valuewhich=2 produces an histogram of the residuals with superimposedthe fitted curve, whenp>1; ifp=1, a similar plot is generatedusing the response variable instead of the residuals. Default value forwhich is1:4.

In the multivariate case,which=1 is feasible only ifp=1 and itdisplays the data scatter with superimposed the fitted distribution. Valuewhich=2 produces a similar plot but for residuals instead ofdata. Default value for codewhich is2:4 ifp>1, otherwisec(1,3,4).

Valuewhich=3 produces a QQ-plot, both in the univariate and in themultivariate case; the difference is that the squares of normalized residualsand suitably defined Mahalanobis distances, respectively, are used in the twocases. Similarly,which=4 produces a PP-plot, working in a similarfashion.

Background

Healy-type graphical diagnostics, in the form of QQ- and PP-plots, for the multivariate normal distribution have been extended to the skew-normaldistribution by Azzalini and Capitanio (1999, section 6.1), and subsequently to the skew-t distribution in Azzalini and Capitanio (2003). A brief explanation in the univariateSN case is provided in Section 3.1.1 of Azzalini and Capitanio (2014); see also Section 3.1.6.For the univariateST case, see p.102 and p.111 of the monograph.The multivariate case is discussed in Section 5.2.1 as for theSNdistribution, in Section 6.2.6 as for theST distribution.

Author(s)

Adelchi Azzalini

References

Azzalini, A. and Capitanio, A. (1999).Statistical applications of the multivariate skew normal distribution.J.Roy.Statist.Soc. B61, 579-602. Full-length version available athttps://arXiv.org/abs/0911.2093

Azzalini, A. and Capitanio, A. (2003).Distributions generated by perturbation of symmetry with emphasis ona multivariate skewt distribution.J.Roy. Statist. Soc. B65, 367-389.Full-length version available athttps://arXiv.org/abs/0911.2342

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

selm,dp2cp

Examples

data(wines)#m10 <- selm(flavanoids  ~ 1, family="SN", data=wines, subset=(wine=="Barolo"))plot(m10)plot(m10, which=c(1,3)) # fig 3.1 and 3.2(a) of Azzalini and Capitanio (2014)#m18 <- selm(acidity ~ sugar + nonflavanoids + wine, family="SN", data=wines)plot(m18)plot(m18, param.type="DP")#m28 <- selm(cbind(acidity, alcohol) ~ sugar + nonflavanoids + wine,             family="SN", data=wines)plot(m28, col=4)            # data(ais)m30 <-  selm(cbind(RCC, Hg, Fe) ~ 1, family="SN", data=ais)plot(m30, col=2, which=2)# multiple plots on the same sheet par(mfcol=c(2,2)) plot(m30, which=1:3)par(mfcol=c(1,1))

The distribution of the product of two jointly normal ort variables

Description

Consider the productW=X_1 X_2 from a bivariate random variable(X_1, X_2) having joint normal or Student'st distribution, with 0 location and unit scale parameters. Functions are provided for the distribution function ofW in the normal and thet case, and the quantile function for thet case.

Usage

pprodn2(x, rho)pprodt2(x, rho, nu)qprodt2(p, rho, nu, tol=1e-5, trace=0)

Arguments

x

a numeric vector

p

a numeric vector of probabilities

rho

a scalar value representing the correlation (or the matchingterm in thet case when correlation does not exists)

nu

a positive scalar representing the degrees of freedom

tol

the desired accuracy (convergence tolerance),passed to functionuniroot

trace

integer number for controlling tracing information, passed on touniroot

Details

Functionpprodt2 implements formulae in Theorem 1 of Wallgren (1980).Corresponding quantiles are obtained byqprodt2 by solving the pertaining non-linear equations with the aid ofuniroot,one such equation for each element ofp.

Functionpprodn2 implements results for the central case inTheorem 1 of Aroian et al. (1978).

Value

a numeric vector

Author(s)

Adelchi Azzalini

References

Aroian, L.A., Taneja, V.S, & Cornwell, L.W. (1978).Mathematical forms of the distribution of the product of two normal variables.Communications in statistics. Theory and methods, 7, 165-172

Wallgren, C. M. (1980).The distribution of the product of two correlatedt variates.Journal of the American Statistical Association, 75, 996-1000

See Also

uniroot

Examples

p <-  pprodt2(-3:3, 0.5, 8)qprodt2(p, 0.5, 8)

Predict method for selm-class objects

Description

Predicted values based on a model object produced byselm with univariate response.

Usage

  ## S3 method for class 'selm'predict(object, newdata, param.type = "CP",           interval = "none", level = 0.95, na.action = na.pass, ...)

Arguments

object

an object of classselm as produced by a call tofunctionselm with univariate response.

newdata

an optional data frame in which to look for variables withwhich to predict. If omitted, the fitted values are used.

param.type

a character string with the required parameterization; it must be one of"CP","DP","pseudo-CP", or possibly their equivalent lowercase.

interval

type of interval calculation among"none", "confidence", "prediction"; it can be abbreviated.

level

tolerance/confidence level (default value is0.95).

na.action

function determining what should be done with missing values in newdata. The default is to predictNA.

...

not used, only there for compatibility reasons.

Details

Predicted values are obtained by evaluating the regression function in the dataframenewdata (which defaults tomodel.frame(object)).Settinginterval other than"none" produces computation ofconfidence or prediction (tolerance) intervals at the specified level.

Ifnewdata is omitted the predictions are based on the data used for the fit.

The action taken in case of missing data is regulated by argumentna.action, along the lines of functionpredict.lm.

A detailed description of the methodology underlyingpredict.selmis available in the technical note of Azzalini (2016).

Value

a vector of predictions (ifinterval="none") or a matrix ofpredictions and bounds with column namesfit,lwr, andupr, ifinterval is set.

Author(s)

Adelchi Azzalini

References

Azzalini, A. (2016). Derivation of various types of intervals from aselm object.Technical note distributed with the documentation of theR packagesn, in fileselm-intervals.pdf within section ‘User guide, package vignettes and other documentation’.

See Also

selm,summary.selm,

makeSECdistr for theCP/DP parameterizations,

predict.lm for usage ofna.action

Examples

data(barolo)attach(barolo)  A75 <- (reseller=="A" & volume==75)detach(barolo)m3 <- selm(log(price, 10) ~ age, data=barolo[A75,], family="ST")

Profile log-likelihood function of selm-class objects

Description

One- or two-dimensional profile (penalized) log-likelihood function of aselm fit and corresponding confidence interval or regions

Usage

  ## S3 method for class 'selm'profile(fitted, param.type, param.name, param.values, npt,    opt.control = list(), plot.it = TRUE, log = TRUE, levels,    trace = FALSE, ...)

Arguments

fitted

an object of classselm as produced by a call tofunctionselm with univariate response.

param.type

a character string with the required parameterization; it must be either"CP" or"DP", or possibly their equivalent lowercase.

param.name

either a single character string or a vector of two such terms with the name(s) of the parameter(s) for which the profile log-likelihood is required; these names must match those appearing insummary.selm(object, param.type).

param.values

in the one-parameter case, a numeric vector with the values where the log-likelihood must be evaluated; in the two-parametercase, a list of two such vectors used to build a grid of coordinates of points. Their range must identify an interval or a rectangle whichincludes theMLE orMPLE obtained byselm.See ‘Details’ for more information.

npt

in case the vector or any of the vectors of argumentparam.values has length 2, an equally spaced grid of values is build with length equalto the corresponding component ofnpt.If the above condition is met but this argument is missing, a default choice is made, namely 51 or (26,26) in the one- or two-parametercase, respectively.

opt.control

an optional list passed as argumentcontrol tooptim tooptimize the log-likelihood; see ‘Details’ for more information.

plot.it

a logical value; ifTRUE (default value), a plot is produced representing the deviance, which is described in ‘Details’ below.In the one-parameter case, a confidence interval of prescribedlevelis marked on the plot; in the two-parameter case, the contour curves arelabelled with approximate confidence levels. See however for more information.

log

a logical value (default:TRUE) indicating whether thescale and tail-weight parameter (the latter only for theST family) must be log-transformed, if case any of them occurs inparam.name. This applies toomega andnu in theDP parameter set and tos.d. andgamma2 in theCP parameter set.

levels

a single probability value (in the one-parameter case) or a vector of such values (in the two-parameter case) for which the confidenceinterval or region is requited; see ‘Details’ for more information.

trace

a logical value (default:FALSE) to activate printingof intermediate outcome of the log-likelihood optimization process

...

optional graphical parameters passed to the plotting functions.

Details

For each chosen point of the parameter(s) to be profiled, the log-likelihood is maximized with respect to the remaining parameters.The optimization process is accomplished using theoptimoptimization function, withmethod="BFGS". This step can be regulated bythe user viaopt.control which is passed tooptim ascontrol argument, apart from elementfnscale whose use isreserved.

If the originalfitted object included a fixed parameter value, this is kept fixed here. If the estimation method was"MPLE",that choice carries on here; in case the penalty function was user-defined,it must still be accessible.

For plotting purposes and also in the numerical output, the deviancefunctionD is used, namely

D = 2\left[\max(\log L) - \log L\right]

whereL denotes the likelihood.

The range ofparam.values must enclose the maximum (penalized)likelihood estimates (MLE orMPLE) by an adequate extent such that suitable confidence intervals or regions can be established fromstandard asymptotic theory. If this condition does not hold, the functionstill proceeds, but no confidence interval or region is delivered.For theSN family andDP parameterization, the asymptotictheory is actually non-standard near the important point\alpha=0, but the correspondence with the regular case of theCP parameterization, still allows to derive confidence regions using standardprocedures; for additional information, see Section 3.1.6 ofAzzalini and Capitanio (2014).When theMLE occurs on the frontier of the parameter space, a message is issued and no confidence interval is produced, while in thetwo-parameter case the plot is not labelled with probability values, but onlywith deviance levels.

Value

An invisible list whose components, described below, are partly different in the one- and the two-parameter cases.

call

the calling statement

<param1>

values of the first parameter

<param2>

values of the second parameter (in the two-parameter case)

logLik

numeric vector or matrix of the profile log-likelihood values

confint

in the one-parameter case, the confidence interval

level

in the one-parameter case, the confidence level

deviance.contour

in the two-parameter case, a list of lists whoseelements identify each curve of the contour plot

Warnings

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

selm,summary.selm,

makeSECdistr for theCP/DP parameterizations,

optim for itscontrol argument

Examples

data(ais, package="sn")m1 <-  selm(log(Fe) ~ BMI + LBM, family = "sn", data = ais)pll <- profile(m1, "dp", param.name="alpha", param.val=c(-3,2))profile(m1, "cp", param.name="gamma1", param.val=seq(-0.7, 0.4, by=0.1))# in the next example, we reduce the grid points to save execution timepll <- profile(m1, "cp", param.name=c("(Intercept.CP)", "gamma1"),           param.val = list(c(1.5, 4), c(-0.8, 0.5)), npt=c(11,11) )

Residuals and fitted values fromselm fits

Description

residuals andfitted methods for classes"selm" and"mselm".

Usage

## S4 method for signature 'selm'residuals(object, param.type = "CP", ...)## S4 method for signature 'mselm'residuals(object, param.type = "CP", ...)## S4 method for signature 'selm'fitted(object, param.type = "CP", ...)## S4 method for signature 'mselm'fitted(object, param.type = "CP", ...)

Arguments

object

an object of class"selm" or"mselm" as createdby a call to functionselm.

param.type

a character string which indicates the required type of parameter type; possible values are"CP" (default),"DP","pseudo-CP" and their equivalent lower-case expressions.

...

not used, included for compatibility with the generic method.

Value

a numeric vector (forselm-class objects) or a matrix(formselm-class objects).

Note

The possible options ofparam.type are described in thedocumentation ofdp2cp; their corresponding outcomes differ by an additive constant only. With the"CP" option (that is,the ‘centred parametrization’), the residuals are centred around 0, at least approximately; this is a reason for setting"CP" as the default option. For more information, see the ‘Note’ in the documentation ofsummary.selm.

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

dp2cp,summary.selm,selm function,selm-class

Examples

 data(wines, package="sn")m5 <- selm(acidity ~ phenols + wine, family="SN", data=wines)residuals(m5)residuals(m5, "dp")fitted(m5, "dp")#m12 <- selm(cbind(acidity, alcohol) ~ phenols + wine,  family="SN", data=wines)residuals(m12)## see other examples at function selm

Standard deviation

Description

Thesd function from thestats is replaced by a newmethod in order to introduce a separate method to deal with objects of classSECdistrUv. The functionsd.default is an alias of the originalfunctionsd.

Usage

  sd(x, ...)  ## Default S3 method:sd(x, na.rm = FALSE, ...)

Arguments

x

a numeric vector, matrix or data frame.

na.rm

logical. Should missing values be removed?

...

further arguments passed to or from other methods.

See Also

sd,SECdistrUv


Fitting linear models with skew-elliptical error term

Description

Functionselm fits alinearmodelwithskew-elliptical error term. The term ‘skew-elliptical distribution’ is an abbreviated equivalent of skew-elliptically contoured (SEC) distribution.The function works for univariate and multivariate response variables.

Usage

selm(formula, family = "SN", data, weights, subset, na.action,   start = NULL, fixed.param = list(), method = "MLE",  penalty=NULL,   model = TRUE, x = FALSE, y = FALSE, contrasts = NULL, offset, ...)

Arguments

formula

an object of class"formula" (or one that can be coerced to that class): a symbolic description of themodel to be fitted, using the same syntax used for the similar parameter ofe.g."lm", with the restriction that the constantterm must not be removed from the linear predictor.

family

a character string which selects the parametric familyofSEC type assumed for the error term. It must be one of"SN" (default),"ST" or"SC", which correspond to theskew-normal, the skew-t and the skew-Cauchy family, respectively.SeemakeSECdistr for more information on these families andthe set ofSEC distributions; notice that the family"ESN" listed there is not allowed here.

data

an optional data frame containing the variables inthe model. If not found indata, the variables are taken fromenvironment(formula), typically the environment from whichselm is called.

weights

a numeric vector of weights associated to individualobservations. Weights are supposed to represent frequencies, hence must benon-negative integers (not all 0) andlength(weights) must equal thenumber of observations. If not assigned, a vector of all 1's is generated.

subset

an optional vector specifying a subset of observationsto be used in the fitting process. It works like the same parameterinlm.

na.action

a function which indicates what should happenwhen the data containNAs. The default is set by thena.action setting ofoptions. The ‘factory-fresh’ default isna.omit. Another possible value isNULL, no action.

start

a vector (in the univariate case) or a list (in the multivariate case) of initialDP values for searching the parameter estimates. See ‘Details’ about a choice ofstart to be avoided. Ifstart=NULL (default), initial values are selected by the procedure.Iffamily="ST", an additional option exists; see ‘Details’.

fixed.param

a list of assignments of parameter values which mustbe kept fixed in the estimation process. Currently, there only two types of admissible constraint: one is tosetalpha=0 to impose a symmetry condition of the distribution; the other is to setnu=<value>, to fix the degrees of freedom at the named<value> whenfamily="ST", for instancelist(nu=3). See ‘Details’ for additional information.

method

a character string which selects the estimation method to beused for fitting. Currently, two options exist:"MLE" (default) and"MPLE", corresponding to standard maximum likelihood and maximumpenalized likelihood estimation, respectively. See ‘Details’ foradditional information.

penalty

a character string which denotes the penalty function to besubtracted to the log-likelihood function, whenmethod="MPLE"; ifpenalty=NULL (default), a pre-defined function is adopted. See‘Details’ for a description of the default penalty function and forthe expected format of alternative specifications. Whenmethod="MLE", no penalization is applied and this argument has noeffect.

model,x,y

logicals. IfTRUE, the corresponding componentsof the fit are returned.

contrasts

an optional list. See thecontrasts.arg ofmodel.matrix.default.

offset

this can be used to specify ana priori knowncomponent to be included in the linear predictor during fitting. Thisshould beNULL or a numeric vector of length equal to the number ofcases. One or moreoffset terms can be included in theformula instead or as well, and if more than one are specified their sum is used.

...

optional control parameters, as follows.

Details

By default,selm fits the selected model by maximumlikelihood estimation (MLE), making use of some numericaloptimization method. Maximization is performed in oneparameterization, usuallyDP, and then the estimates are mapped toother parameter sets,CP and pseudo-CP; seedp2cp for more information on parameterizations. These parameter transformations are carried out trasparently to the user. The observed information matrix is used to obtain the estimated variance matrix of theMLE's and from this the standard errors. Background information onMLE in the context ofSEC distributions is provided by Azzalini and Capitanio (2014); see specifically Chapter 3, Sections 4.3, 5.2, 6.2.5–6. For additionalinformation, see the original research work referenced therein as well asthe sources quoted below.

Although the density functionof SEC distributions are expressed usingDP parameter sets, the methods associated to the objects createdby this function communicate, by default, their outcomes in theCPparameter set, or its variant form pseudo-CP whenCPdoes not exist; the ‘Note’ atsummary.selm explains why. A more detailed discussion is provided by Azzalini and Capitanio (1999, Section 5.2) and Arellano-Valle and Azzalini (2008, Section 4), for the univariate and the multivariate SN case, respectively; an abriged account is available in Sections 3.1.4–6 and 5.2.3 of Azzalini and Capitanio (2014). For the ST case, see Arellano-Valle and Azzalini (2013).

There is a known open issue which affects computation of the informationmatrix of the multivariate skew-normal distribution when the slantparameter\alpha approaches the null vector; see p.149 ofAzzalini and Capitanio (2014). Consequently, if a model withmultivariate response is fitted withfamily="SN" and the estimatealpha of\alpha is at the origin or neary so, theinformation matrix and the standard errors are not computed and awarning message is issued. In this unusual circumstance, a simplework-around is to re-fit the model withfamily="ST", which willwork except in remote cases when (i) the estimated degrees of freedomnu diverge and (ii) stillalpha remains at the origin.

The optional argumentfixed.param=list(alpha=0) imposes theconstraint\alpha=0 in the estimation process; in the multivariate case, the expression is interpreted in the sense that all the components of vector\alpha are zero, which implies symmetry of theerror distribution, irrespectively of the parameterization subsequently adopted for summaries and diagnostics.When this restriction is selected, the estimation method cannot beset to"MPLE". Under the constraint\alpha=0,iffamily="SN", the model is fitted similarly tolm, exceptthat hereMLE is used for estimation of the covariance matrix. Iffamily="ST" orfamily="SC", a symmetric Student'st or Cauchy distribution is adopted.

Under the constraint\alpha=0, the location parameter\xicoincides with the mode and the mean of the distribution, when the latterexists. In addition, when the covariance matrix of aSTdistribution exists, it differs from\Omega only by a multiplicativefactor. Consequently, the summaries of a model of this sort automaticallyadopt theDP parametrization.

The other possible form of constraint allows to fix the degrees offreedom whenfamily="ST". The two constraints can be combined writing, for instance,fixed.param=list(alpha=0, nu=6).The constraintnu=1 is equivalent to selectfamily="SC".In practice, an expression of typefixed.param=list(..) can beabbreviated tofixed=list(..).

Argumentstart allows to set the initial values, with respect to theDP parameterization, of the numerical optimization. However, there is a specific choice of start to be avoided.Whenfamily="SN", do not set the shape parameteralpha exactly at 0, as this would blow-up computation of the log-likelihood gradient and the Hessian matrix. This is not due to a software bug, but to a known peculiar behaviour of the log-likelihood function at that specific point. Therefore, in the univariate case for instance, do not set e.g.start=c(12, 21, 0), but set instead somethinglikestart=c(12, 21, 0.01). Recall that, if one needs to fit a model forcing 0 asymmetry, typically tocompare two log-likelihood functions with/without asymmetry, then the optionto use isfixed.param=list(alpha=0).

Since version 1.6.0, a new initialization procedure has been introducedfor the casefamily="ST", which adopts the method proposed by Azzalini & Salehi (2020), implemented in functionsst.prelimFit andmst.prelimFit.Correspondingly, thestart argument can now be of different type,namely a character with possible values"M0","M2" (detault in the univariate case) and"M3" (detault in the multivariate case).The choice"M0" selects the older method, in use prior to version1.6.0. For more information, see Azzalini & Salehi (2020).

In some cases, especially for small sample size, theMLE occurs onthe frontier of the parameter space, leading toDP estimates withabs(alpha)=Inf or to a similar situation in the multivariate case or in an alternative parameterization. Such outcome is regared by many asunsatisfactory; surely it prevents using the observed information matrix tocompute standard errors. This problem motivates the use of maximum penalizedlikelihood estimation (MPLE), where the regular log-likelihoodfunction\log~L is penalized by subtracting an amountQ, say, increasingly large as|\alpha| increases. Hence the function which is maximized at the optimization stage is now\log\,L~-~Q. Ifmethod="MPLE" andpenalty=NULL, the default functionQpenalty is used,which implements the penalization:

Q(\alpha) = c_1 \log(1 + c_2 \alpha_*^2)

wherec_1 andc_2 are positive constants, whichdepend on the degrees of freedomnu in theST case,

\alpha_*^2 = \alpha^\top \bar\Omega \alpha

and\bar\Omega denotes the correlation matrix associated to the scale matrixOmega described in connection withmakeSECdistr. In the univariate case\bar\Omega=1,so that\alpha_*^2=\alpha^2. Further information onMPLE and this choice of the penalty function is given in Section 3.1.8 and p.111 of Azzalini and Capitanio (2014); for a more detailed account, see Azzalini and Arellano-Valle (2013) and references therein.

It is possible to change the penalty function, to be declared via the argumentpenalty. For instance, if the calling statement includespenalty="anotherQ", the user must have defined

⁠ ⁠anotherQ <- function(alpha_etc, nu = NULL, der = 0)

with the following arguments.

This function must return a single numeric value, possibly with requiredattributes when is called withder>1.Sincesn imports functionsgrad andhessian from packagenumDeriv, one can rely on them for numerical evaluation of the derivatives, if they are not available in an explicit form.

This penalization scheme allows to introduce a prior distribution\pi for\alpha by settingQ=-\log\pi, leading to a maximuma posteriori estimate in the stated sense. SeeQpenalty for more information and an illustration.

The actual computations are not performed withinselm which only sets-up ingredients for work ofselm.fit and other functionsfurther below this one. Seeselm.fit for more information.

Value

an S4 object of classselm ormselm, depending on whetherthe response variable of the fitted model is univariate or multivariate;these objects are described in theselm class.

Cautionary notes

The first of these notes applies to the stagepreceding theuse ofselm and related fitting procedures. Before fitting a model ofthis sort, consider whether you have enough data for this task. In this respect, the passage below taken from p.63 of Azzalini and Capitanio (2014) is relevant.

“Before entering technical aspects, it is advisable to underline a qualitative effect of working with a parametric family which effectively is regulated by moments up to the third order. The implication is that the traditional rule of thumb by which a sample size is small up to ‘aboutn = 30’, and then starts to become ‘large’,while sensible for a normal population or other two-parameter distribution, is not really appropriate here. To give an indication of a new threshold is especially difficult, because the value of\alpha also has a role here. Under thiscaveat, numerical experience suggests that ‘aboutn = 50’ may be a more appropriate guideline in this context.”

The above passage referred to the univariate SN context. In the multivariate case, increase the sample size appropriately, especially so with theST family.This is not to say that one cannot attempt fitting these models with small or moderate sample size. However, one must be aware of the implications and not be surprised if problems appear.

The second cautionary note refers instead to the outcome of a call toselm and related function, or the lack of it.The estimates are obtained by numerical optimization methods and, asusual in similar cases, there is no guarantee that the maximum of theobjective function is achieved. Consideration of model simplicityand of numerical experience indicate that models withSN errorterms generally produce more reliable results compared to those with theST family. Take into account that models involving a traditional Student'st distribution with unknown degrees of freedom can already be problematic; the presence of the (multivariate) slant parameter\alpha in theST family cannot make things any simpler. Consequently, care must be exercised, especially so if one works with the (multivariate)ST family. Consider re-fitting a model with different starting values and, in theST case, building the profile log-likelihood for a range of\nu values; functionprofile.selm can be useful here.

Details on the numerical optimization which has produced objectobj can be extracted withslot(obj, "opt.method"); inspection of thiscomponent can be useful in problematic cases.# Be aware that occasionallyoptim andnlminb declare successful# completion of a regular minimization problem at a point where the Hessian # matrix is not positive-definite.

Author(s)

Adelchi Azzalini

References

Arellano-Valle, R. B., and Azzalini, A. (2008).The centred parametrization for the multivariate skew-normal distribution.J. Multiv. Anal.99, 1362–1382.Corrigendum:100 (2009), 816.

Arellano-Valle, R. B., and Azzalini, A. (2013, available online 12 June 2011).The centred parametrization and related quantities for the skew-t distribution.J. Multiv. Anal.113, 73–90.

Azzalini, A. and Capitanio, A. (1999).Statistical applications of the multivariate skew normal distribution.J.Roy.Statist.Soc. B61, 579–602. Full-length version available athttps://arXiv.org/abs/0911.2093

Azzalini, A. and Arellano-Valle, R. B. (2013, available online 30 June 2012). Maximum penalized likelihood estimation for skew-normal and skew-t distributions.J. Stat. Planning & Inference143, 419–433.

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Azzalini, A. and Salehi, M. (2020).Some computational aspects of maximum likelihood estimation of the skew-t distribution. InComputational and Methodological Statistics and Biostatistics, edited by A. Bekker, Ding-Geng Chen and Johannes T. Ferreira, pp.3-28. Springer Nature Switzerland.

See Also

Examples

data(ais)m1 <- selm(log(Fe) ~ BMI + LBM, family="SN", data=ais)print(m1)summary(m1)s <- summary(m1, "DP", cov=TRUE, cor=TRUE)plot(m1)plot(m1, param.type="DP")logLik(m1)coef(m1)coef(m1, "DP")var <- vcov(m1)#m1a <- selm(log(Fe) ~ BMI + LBM, family="SN", method="MPLE", data=ais)m1b <- selm(log(Fe) ~ BMI + LBM, family="ST", fixed.param=list(nu=8), data=ais)#data(barolo)attach(barolo)A75 <- (reseller=="A" & volume==75)logPrice <- log(price[A75],10) m <- selm(logPrice ~ 1, family="ST", opt.method="Nelder-Mead")summary(m)summary(m, "DP")plot(m, which=2, col=4, main="Barolo log10(price)")# cfr Figure 4.7 of Azzalini & Capitanio (2014), p.107detach(barolo)#-----# examples with multivariate response#m3 <- selm(cbind(BMI, LBM) ~ WCC + RCC, family="SN", data=ais)plot(m3, col=2, which=2)summary(m3, "dp")coef(m3)coef(m3, vector=FALSE)#data(wines)m28 <- selm(cbind(chloride, glycerol, magnesium) ~ 1, family="ST", data=wines)dp28 <- coef(m28, "DP", vector=FALSE) pcp28 <- coef(m28, "pseudo-CP", vector=FALSE) # the next statement takes a little more time than othersplot(m28)#m4 <- selm(cbind(alcohol,sugar)~1, family="ST", data=wines)m5 <- selm(cbind(alcohol,sugar)~1, family="ST", data=wines, fixed=list(alpha=0))print(1 - pchisq(2*as.numeric(logLik(m4)-logLik(m5)), 2)) # test for symmetry

Classesselm andmselm of objects created by functionselm

Description

A successful call to functionselm creates an object ofeither of these classes, having a structure described in section‘Slots’. A set of methods for these classes of objects exist, listed insection ‘Methods’.

Objects from the class

An object can be created by a successful call to functionselm.

Slots

call:

the calling statement.

family:

the parametric family of skew-ellitically contoured distributed (SEC) type.

logL:

log-likelihood or penalized log-likelihood valueachieved at the end of the maximization process.

method:

estimation method ("MLE" or"MPLE").

param:

estimated parameters, for various parameterizations.

param.var:

approximate variance matrices of the parameter estimates, for various parameterizations.

size:

a numeric vector with size of various components.

fixed.param:

a vector of parameters which have been keptfixed in the fitting process, if any.

residuals.dp:

residual values, for DP-type parameters.

fitted.values.dp:

fitted values, for DP-type parameters.

control:

a list with control parameters.

input:

a list of selected input values.

opt.method:

a list with details on the optimization method.

Methods

coefsignature(object = "selm"): ...
logLiksignature(object = "selm"): ...
plotsignature(x = "selm"): ...
showsignature(object = "selm"): ...
summarysignature(object = "selm"): ...
residualssignature(object = "selm"): ...
fittedsignature(object = "selm"): ...
vcovsignature(object = "selm"): ...
weightssignature(object = "selm"): ...
profilesignature(fitted = "selm"): ...
confintsignature(object = "selm"): ...
predictsignature(object = "selm"): ...
coefsignature(object = "mselm"): ...
logLiksignature(object = "mselm"): ...
plotsignature(x = "mselm"): ...
showsignature(object = "mselm"): ...
summarysignature(object = "mselm"): ...
residualssignature(object = "mselm"): ...
fittedsignature(object = "mselm"): ...
vcovsignature(object = "mselm"): ...
weightssignature(object = "mselm"): ...

Note

Seedp2cp for a description of possible parameter sets.WhenlogLik is used on an object obtained using the MPLE estimationmethod, the value reported is actually thepenalized log-likelihood.

Author(s)

Adelchi Azzalini

See Also

See alsoselm function,plot.selm,summary.selm,dp2cp

Examples

data(ais)m1 <- selm(log(Fe) ~ BMI + LBM, family="SN", data=ais)summary(m1)plot(m1)logLik(m1)res <- residuals(m1)fv <- fitted(m1)# data(wines, package="sn")m2 <- selm(alcohol ~ malic + phenols, data=wines)#m12 <- selm(cbind(acidity, alcohol) ~ phenols + wine, family="SN", data=wines)coef(m12)cp <- coef(m12, vector=FALSE)dp <- coef(m12, "DP", vector=FALSE)plot(m12)plot(m12, which=2, col="gray60", pch=20)

Fitting functions forselm models

Description

A call toselm activates a call toselm.fit andfrom here to some other function which actually performs the parametersearch, among those listed below. These lower-level functions can be called directly for increased efficiency, at the expense of some more programming effort and lack of methods for the returned object.

Usage

 selm.fit(x, y, family = "SN", start = NULL, w, fixed.param = list(),    offset = NULL, selm.control=list())sn.mple(x, y, cp = NULL, w, penalty = NULL, trace = FALSE, opt.method =   c("nlminb",  "Nelder-Mead", "BFGS", "CG", "SANN"), control = list()) st.mple(x, y, dp = NULL, w, fixed.nu = NULL, symmetr = FALSE, penalty = NULL,    trace = FALSE, opt.method = c("nlminb", "Nelder-Mead", "BFGS", "CG", "SANN"),    control = list()) msn.mle(x, y, start = NULL, w, trace = FALSE, opt.method = c("nlminb",    "Nelder-Mead", "BFGS", "CG", "SANN"), control = list())msn.mple(x, y, start = NULL, w, trace = FALSE, penalty = NULL,    opt.method = c("nlminb", "Nelder-Mead", "BFGS", "CG", "SANN"),    control = list()) mst.mple(x, y, start = NULL, w, fixed.nu = NULL, symmetr=FALSE,    penalty = NULL, trace = FALSE,    opt.method = c("nlminb", "Nelder-Mead", "BFGS", "CG", "SANN"),    control = list())

Arguments

x

a full-rank design matrix with the first column of all 1's.

y

a vector or a matrix of response values such thatNROW(y)=nrow(x).

family

a character string which selects the parametric family ofdistributions assumed for the error term of the regression model. It must one of"SN" (default),"ST" or"SC", which correspond to the skew-normal, the skew-t and the skew-Cauchy family, respectively.SeemakeSECdistr for more information on these families andthe skew-elliptically contoured (SEC) distributions; notice thatfamily"ESN" is not allowed here.

start,dp,cp

a vector or a list of initial parameter values,depeding whethery is a vector or a matrix. It is assumed thatcp is given in theCP parameterization,dp andstart in theDP parameterization. Forst.mple andmst.mple, see also the paragraph aboutstart in the documentation ‘Details’ ofselm.

w

a vector of non-negative integer weights of length equal toNROW(y); if missing, a vector of all 1's is generated.

fixed.param

a list of assignments of parameter values to be keptfixed during the optimization process. Currently, there is only one suchoption, namelyfixed.param=list(nu='value'), to fix the degreesof freedom at the named'value' whenfamily="ST", for instancelist(nu=3). Settingfixed.param=list(nu=1) is equivalent toselectfamily="SC".

penalty

an optional character string with the name of the penaltyfunction of the log-likelihood; default valueNULL corresponds to no penalty.

offset

this can be used to specify ana priori knowncomponent to be included in the linear predictor during fitting. Thisshould beNULL or a numeric vector of length equal to the number ofcases. One or moreoffset terms can be included in theformula instead or as well, and if more than one are specified their sum isused.

trace

a logical value which regulates printing of successive calls to the target function; default value isFALSE which suppresses printing.

fixed.nu

a positive value to keep fixed the parameternu of theST distribution in the optimization process; with defaultvalueNULL,nu is estimated like the other parameters.

symmetr

a logical flag indicating whether a contraint of symmetry isimposed on the slant parameter; default issymmetr=FALSE.

opt.method

a character string which selects the optimization methodwithin the setc("nlminb", "Nelder-Mead", "BFGS", "CG", "SANN");the last four of these are"methods" of functionoptim.

selm.control

a list whose components regulate the working ofselm.fit; see ‘Details’ for their description;

control

a list of control items passed to the optimization function.

Details

A call toselm produces a call toselm.fit whichselects the appropriate function amongsn.mple,st.mple,msn.mle,msn.mple,mst.mple, depending on thearguments of the calling statement. In the adopted scheme for function names,msn refers to a multivariate skew-normal distribution andmst refers to a multivariate skew-t distribution, whilemle andmple refers to maximum likelihood and maximumpenalized likelihood estimation, respectively.Of these functions,sn.mple works inCP space; the othersin theDP space. In all cases, a correspondig mapping to the alternative parameter space is performed before exitingselm.fit,in addition to the selected parameter set.

The components ofselm.control are as follows:

Functionmsn.mle, forMLE estimation of linear models withSN errors, is unchanged from version 0.4-x of the package. Functionmsn.mple is similar tomsn.mle but allows to introducea penalization of the log-likelihood; whenpenalty=NULL, a call tomsn.mle is more efficient.Functionssn.mple andmst.mple work likesn.mle andmst.mle in version 0.4-x if the argumentpenalty is not set or it is set toNULL, except thatmst.mple does nothandle a univariate response (usest.mple for that).

Value

A list whose specific components depend on the named function.Typical components are:

call

the calling statement

dp

vector or list of estimatedDP parameters

cp

vector or list of estimatedCP parameters

logL

the maximized (penalized) log-likelihood

aux

a list with auxiliary output values, depending on the function

opt.method

a list produced by the numericalopt.method

Background

Computational aspects of maximum likelihood estimation for univariateSN distributions are discussed in Section 3.1.7 of Azzalini andCapitanio (2014). The working ofsn.mple follows these lines; maximization is performed in theCP space. All other functionsoperate on theDP space.

The technique underlyingmsn.mle is based on a partial analytical maximization, leading implicitly to a form of profile log-likelihood.This scheme is formulated in detail in Section 6.1 of Azzalini and Capitanio (1999) and summarized in Section 5.2.1 of Azzalini and Capitanio (2014). The same procedure is not feasible when one adoptsMPLE; hence functionmsn.mple has to maximize over a larger parameter space.

When the SN family is fitted with the constraintalpha=0, this amountsto adopt a classical linear model with Gaussian distributional assumption.The correspondingMLE's are the same as those produced bylm, except that the denominator the of theMLE variance (matrix) has the ‘uncorrected’ form.In the multivariate case, the covariance matrix ofMLE is computed using expression (10) in Section 15.8 of Magnus and Neudecker (2007).

Maximization of the univariateST log-likelihood is speeded-up by using the expressions of the gradient given by DiCiccio and Monti (2011),reproduced with inessential variants in Section 4.3.3 of Azzalini and Capitanio (2014).

The working ofmst.mple is based on a re-parameterization described in Section 5.1 of Azzalini and Capitanio (2003). The expressions of the corresponding log-likelihood derivatives are given in Appendix B of the full version of the paper.

Author(s)

Adelchi Azzalini

References

Azzalini, A. and Capitanio, A. (1999).Statistical applications of the multivariate skew normal distribution.J.Roy.Statist.Soc. B61, 579–602. Full-length version available athttps://arXiv.org/abs/0911.2093

Azzalini, A. and Capitanio, A. (2003).Distributions generated by perturbation of symmetry with emphasis ona multivariate skewt distribution.J.Roy. Statist. Soc. B65, 367–389.Full-length version available athttps://arXiv.org/abs/0911.2342

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

DiCiccio, T. J. and Monti, A. C. (2011). Inferential aspects of the skewt-distribution.Quaderni di Statistica13, 1–21.

Magnus, J. R. and Neudecker, H. (2007).Matrix Differential Calculus with Applications in Statistics and Econometrics, third edition. John Wiley & Sons.

See Also

selm for a comprehensive higher level fitting function,Qpenalty for specification of a penalty function

Examples

 data(wines, package="sn")X <- model.matrix(~ phenols + wine, data=wines)fit <- msn.mle(x=X, y=cbind(wines$acidity, wines$alcohol), opt.method="BFGS")fit <- st.mple(x=X, y = wines$acidity, fixed.nu=4, penalty="Qpenalty")

Cumulants of univariate skew-normal and skew-t distributions

Description

Compute cumulants of univariate (extended) skew-normal and skew-t distributions up to a given order.

Usage

  sn.cumulants(xi=0, omega=1, alpha=0, tau=0, dp=NULL, n=4)  st.cumulants(xi=0, omega=1, alpha=0, nu=Inf, dp=NULL, n=4)

Arguments

xi

location parameters (numeric vector).

omega

scale parameters (numeric vector, positive).

alpha

slant parameters (numeric vector).

tau

hidden mean parameter (numeric scalar).

nu

degrees of freedom (numeric scalar, positive); the default valueisnu=Inf which corresponds to the skew-normal distribution.

dp

a vector containing the appropriate set of parameters. Ifdp is notNULL, the individual parameters must not be supplied.

n

maximal order of the cumulants. Forst.cumulants andforsn.cumulants withtau!=0 (ESN distribution), it cannot exceed 4.

Value

A vector of lengthn or a matrix withn columns, in case the input values are vectors.

Background

See Sections 2.1.4, 2.2.3 and 4.3.1 of the reference below

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

dsn,dsn

Examples

sn.cumulants(omega=2, alpha=c(0, 3, 5, 10), n=5)sn.cumulants(dp=c(0, 3, -8), n=6)st.cumulants(dp=c(0, 3, -8, 5), n=6) # only four of them are computedst.cumulants(dp=c(0, 3, -8, 3))

Expected and observed Fisher information forSN andST distributions

Description

Computes Fisher information for parameters of simple sample havingskew-normal (SN) or skew-t (ST) distribution orfor a regression model with errors term having such distributions, in theDP andCP parametrizations.

Usage

sn.infoUv(dp=NULL, cp=NULL, x=NULL, y, w, penalty=NULL,  norm2.tol=1e-06)     sn.infoMv(dp, x=NULL, y, w, penalty=NULL, norm2.tol=1e-06, at.MLE=TRUE)st.infoUv(dp = NULL, cp = NULL, x = NULL, y, w, fixed.nu = NULL,     symmetr = FALSE, penalty = NULL, norm2.tol = 1e-06) st.infoMv(dp, x = NULL, y, w, fixed.nu = NULL, symmetr = FALSE,     penalty = NULL, norm2.tol = 1e-06)

Arguments

dp,cp

direct or centred parameters, respectively; one of themcan be a non-NULL argument. For the univariateSNdistribution,sn.infoUv is to be used, and these arguments arevectors. In the multivariate case,sn.infoMv is to be used and thesearguments are lists. Seedp2cp for their description.

x

an optional matrix which represents the design matrix of a regression model

y

a numeric vector (forsn.infoUv andst.infoUv)or a matrix (forsn.infoMv andst.infoMv) representing theresponse. In theSN case (sn.infoUv andsn.infoMv),y can be missing, and in this case the expectedinformation matrix is computed; otherwise the observed information iscomputed. In theST case (st.infoUv andst.infoMv),y is a required argument, since only the observed information matrixforST distributions is implemented. See ‘Details’ foradditional information.

w

an optional vector of weights (only meaningful for the observed information, hence ify is missing); if missing, a vector of 1's isgenerated.

fixed.nu

an optional numeric value which declares a fixed value of thedegrees of freedom,nu. If notNULL, the information matrixhas a dimension reduced by 1.

symmetr

a logical flag which indicates whether a symmetry condition of the distribution is being imposed; default issymmetr=FALSE.

penalty

a optional character string with the name of the penalty function used in the call toselm; see this function for itsdescription.

norm2.tol

for the observed information case, the Mahalanobis squareddistance of the score function from 0 is evaluated; if it exceedsnorm2.tol, a warning message is issued, since the ‘informationmatrix’ so evaluated may be not positive-definite. See ‘Details’ foradditional information.

at.MLE

a logical flag; ifat.MLE=TRUE (default value), it generates a warning if the information matrix is not positive definite or an error if the observed information matrix is not evaluated at a maximum of the log-likelihood function.

Value

a list containing the following components:

dp,cp

one of the two arguments is the one supplied on input; the other one matches the previous one in the alternative parametrization.

type

the type of information matrix: "observed" or "expected".

info.dp,info.cp

matrices of Fisher (observed or expected) information in the two parametrizations.

asyvar.dp,asyvar.cp

inverse matrices of Fisher information in the twoparametrizations, when available; See ‘Details’ for additionalinformation.

aux

a list containing auxiliary elements, depending of the selected function and the type of computation.

Details

In the univariateSN case, whenx is not set, then a simplerandom sample is assumed and a matrixx with a single column of all 1's is constructed; in this case, the supplied vectordp orcpmust have length 3. Ifx is set, then the supplied vector of parameters,dp orcp, must have lengthncol(x)+2.In the multivariate case, a direct extension of this scheme applies.

If the observed information matrix is required,dp orcp shouldrepresent the maximum likelihood estimates (MLE) for the giveny,otherwise the information matrix may fail to be positive-definite and itwould be meaningless anyway. Therefore, the squared Mahalobis norm of the score vector is evaluated and compared withnorm2.tol. If it exceeds this threshold, this is taken as an indication that the suppliedparameter list is not at theMLE and a warning message is issued.The returned list still includesinfo.dp andinfo.cp, but in this case these represent merely the matrices of second derivatives;asyvar.dp andasyvar.cp are set toNULL.

Background

The information matrix for the the univariateSN distribution inthe two stated parameterizations in discussed in Sections 3.1.3–4 of Azzalini and Capitanio (2014). For the multivariate distribution, Section 5.2.2 of this monograph summarizes briefly the findings of Arellano-Valle and Azzalini (2008).

ForST distributions, only the observed information matrix is provided, at the moment. Computation for the univariate case is based on DiCiccio and Monti (2011). For the multivariate case, the score function iscomputed using an expression of Arellano-Valle (2010) followed by numericaldifferentiation.

References

Arellano-Valle, R. B. (2010).The information matrix of the multivariate skew-t distribution.Metron,LXVIII, 371–386.

Arellano-Valle, R. B., and Azzalini, A. (2008).The centred parametrization for the multivariate skew-normal distribution.J. Multiv. Anal.99, 1362–1382.Corrigendum:100 (2009), 816.

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

DiCiccio, T. J. and Monti, A. C. (2011). Inferential aspects of the skewt-distribution.Quaderni di Statistica13, 1–21.

See Also

dsn,dmsn,dp2cp

Examples

infoE <- sn.infoUv(dp=c(0,1,5))           # expected informationset.seed(1); rnd <- rsn(100, dp=c(0, 1, 3))fit <- selm(rnd~1, family="SN")infoO <- sn.infoUv(cp=coef(fit), y=rnd)   # observed information#data(wines)X <- model.matrix(~ pH + wine, data=wines)fit <- sn.mple(x=X, y=wines$alcohol)infoE <- sn.infoUv(cp=fit$cp, x=X)infoO <- sn.infoUv(cp=fit$cp, x=X, y=wines$alcohol)

Spreading grouped data over intervals

Description

Assuming thatcounts represents the frequencies of observationsfalling into intervals identified bybreaks, the function returnsa vector of values obtained by uniformly spreading each group of data over the pertaining interval.

Usage

spread.grouped(breaks, counts, shift = "centre")

Arguments

breaks

A numeric vector of strictly increasing finite values which identify a set of contiguous intervals on the real line.

counts

A vector of non-negative integers representing the number of observations falling in the intervals specified bybreaks;it is then required thatlength(counts)+1=length(breaks).

shift

a character string which regulates the positioning of the constructed points within a given interval, with possible values"left","center" (default choice) and"right",possibly abbreviated.

Value

A numeric vector of lengthsum(counts) of values withinrange(breaks).

Author(s)

Adelchi Azzalini

See Also

fitdistr.grouped

Examples

breaks <- c(10, 12, 15, 20)counts <- c(3, 2, 4)spread.grouped(breaks, counts)spread.grouped(breaks, counts, "l")

Compute preliminary estimates for a linear model with ST-distributed error term

Description

For a univariate or multivariate linear model where the error termis assumed to have skew-t (ST) distribution and the location parameteris a linear function of a set of explanatory values, the functions computepreliminary estimates to be used as initial values for a subsequentmaximization of the likelihood function.These functions are mainly intended for internal package use.

Usage

st.prelimFit(x, y, w, quick = TRUE, verbose = 0, max.nu = 30, SN=FALSE)mst.prelimFit(x, y, w, quick = TRUE, verbose = 0, max.nu = 30, SN=FALSE)

Arguments

x

design matrix of numeric values. It may be missing;if present, the first column must contain all 1's.

y

vector of observations of lengthn,or a matrix withn rows.

w

a vector of non-negative integer weights of lengthn; if missing, a vector of all 1's is generated.

quick

logical value which regulates whether a very quick estimate is produced (default valueTRUE); see ‘Details’ for additional information.

verbose

an integer value which regulates the amount of messagesprinted out; default value is 0.

max.nu

threshold for the estimated degrees of freedom

SN

logical value (default value:FALSE); ifTRUE, aSN distribution is assumed.

Details

The underlying methodology is the one presented by Azzalini and Salehi (2020).In its essence, it is based on the selection of parameter values achieving the best matching between certain quantile-based summaries of the ST distributionand the corresponding empirical quantities for the sample or, in the presence of explanatory variables, the same quantities computed from the residuals after fitting a median regression.

Argumentquick selects whether the above-described matching is performedin a quick or in an accurate way. Since the output values of this function are intended to be only initial values for subsequent likelihood maximization,this explains the default optionquick=TRUE. Other possible valuesareFALSE andNULL; the latter simply setsalpha=0andnu=10.

Since the methodology hinges on some selected sample quantiles, it can occasionally be spoiled by poor behaviour of these basic quantiles,especially for small or moderate sample sizes.The more visible effect of such situation is a very large value of theestimated degrees of freedom, which then hampers rather than help asubsequent likelihood maximization. It is therefore appropriate to setan upper limitmax.nu to this component.

Argumentx may be missing. In this case, a one-column matrix withall elements 1 is created.

Value

A call tost.prelimFit returns a list with these components:

dp

a vector of estimates in the DP parameterization

residuals

a vector of residual values

logLik

the corresponding log-likelihood value

A call tomst.prelimFit returns a list with these components:

dp

a list with the estimates in the DP parameterization

shrink.steps

the number of shrinking steps applied to the original estimate of the scale matrix to obtain an admissible matrix

dp.matrix

a numeric matrix formed by the component-wise DP estimates

logLik

the corresponding log-likelihood value

Note

These functions are mainly intended to be called byselm,but they could be of interest for people developing their own procedures.

Author(s)

Adelchi Azzalini

References

Azzalini, A. and Salehi, M. (2020).Some computational aspects of maximum likelihood estimation of the skew-t distribution.In:Computational and Methodological Statistics and Biostatistics,edited by Andriëtte Bekker, Ding-Geng Chen and Johannes T. Ferreira.Springer. DOI: 10.1007/978-3-030-42196-0

See Also

selm and eitherdst ordmst for explanation of the DP parameters

Examples

data(barolo)attach(barolo)  A75 <- (reseller=="A" & volume==75)  log.price <- log(price[A75], 10)prelimFit <- st.prelimFit(y=log.price)detach(barolo)# data(ais)attach(ais)prelim32 <- mst.prelimFit(y=cbind(BMI, LBM), x=cbind(1, Ht, Wt))detach(ais)

Summary of aSEC distribution object

Description

Produce a summary of an object of class either"SECdistrUv" or"SECdistrMv", which refer to a univariate or amultivariateSEC distribution, respectively. Both types ofobjects can be produced bymakeSECditr.

Usage

## S4 method for signature 'SECdistrUv'summary(object, cp.type = "auto", probs)## S4 method for signature 'SECdistrMv'summary(object, cp.type = "auto")

Arguments

object

an object of class"SECdistrUv" or"SECdistrMv".

cp.type

a character string to select the required variance ofCP parameterization; possible values are"proper","pseudo","auto" (default). For a description of thesecodes, seedp2cp.

probs

in the univariate case, a vector of probabilities for which the corresponding quantiles are required. If missing, the vectorc(0.05, 0.25, 0.50, 0.75, 0.95) is used.

Details

For a description of theDP,CP and pseudo-CP parameter sets included in the returned object, seedp2cp.

Theaux slot of the returned object includes other summary quantities,as described next. In the univariate case, the reported quantile-based measures of skewness andkurtosis refer to the Bowley and Moors measures, respectively;see Groeneveld (2006) and Moors (1988) for their specifications.In the multivariate case, the Mardia's measures of skewness and kurtosisare computed from the expressions given on p.153 and p.178 ofAzzalini and Capitanio (2014).

In the univariate case,delta is a simple transformation of the slant parameteralpha; it takes values in(-1, 1). In the multivariate case,delta is a vector with componentsof similar type; they correspond to the matching terms of the univariate components. Thealpha* anddelta* coefficients are univariatecomprehensive summary quantities of slant; see pp.132-3 of Azzalini and Capitanio (2014) for their expressions. These quantitiesplay an important role inSEC distributions; for instance,the Mardia's measures of multivariare skewness and kurtosis dependon the vector of slant parameters only viadelta* or, equivalently,viaalpha*.

The mode, which is unique for all these distributions, is computed by a numerical line search between theDP location and theCPlocation (or the pseudo-DP location, when the latter doesexists). This line search is univariate also in the multivariate case, using Propositions 5.14 and 6.2 of Azzalini and Capitanio (2014);see also Problem 5.14.

The examples below illustrate how extract various components fromaux and other slots of the returned object.

Value

A list with the following components:

family

name of the family within theSEC class, character

dp

DP parameters, a list or a vector

name

the name of the distribution, character string

compNames

in the multivariate case the names of the components,a character vector

cp

CP parameters, a list or a vector

cp.type

the name of the selected variant of theCP set

aux

a list with auxiliary ingredients (mode, coefficients of skewness and kurtosis, in the parametric and non-parametric variants, and more); see Section ‘Details’ for more information.

The list itemsdp andcp are vectors ifclass(object) isSECdistrUv (univariate distribution); they are lists ifclass(object) isSECdistrMv (multivariate distribution).

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Moors, I. J. A. (1988). A quantile alternative for kurtosis.The Statistician37, 25-32.

Groeneveld, R. A. (2006). Skewness, Bowley's measures of. In volume12, 7771-3, ofEncyclopedia of Statistical Sciences, 2nd edition, edited by Kotz et al. Wiley, New York.

See Also

makeSECdistr for building aSEC distribution

extractSECdistr for extracting aSECdistribution from aselm fit

methodsmean andsd for computing the mean and the standard deviation ofSECdistrUv-class objects,methodsmean andvcov for computing the mean vector and the variance matrix ofSECdistrMv-class objects

modeSECdistr for computing the mode directly

Examples

f3 <- makeSECdistr(dp=c(3,2,5), family="SC")summary(f3)s <- summary(f3, probs=(1:9)/10)print(slotNames(s)) print(names(slot(s,"aux")))   # the components of the 'aux' slotslot(s, "aux")$mode           # the same of modeSECdistr(object=f3)slot(s, "aux")$q.measures     # quantile-based measures of skewness and kurtosis#dp3 <- list(xi=1:3, Omega=toeplitz(1/(1:3)), alpha=c(-3, 8, 5), nu=6)st3 <- makeSECdistr(dp=dp3, family="ST", name="ST3", compNames=c("U", "V", "W"))s <- summary(st3)dp <- slot(s, "dp")           # the same of slot(st3, "dp")slot(s, "cp")$var.cov         # the same of vcov(st3)slot(s, "aux")$delta.star     # comprehensive coefficient of shapeslot(s, "aux")$mardia         # Mardia's measures of skewness and kurtosis#dp2 <- list(xi=rep(0,2), Omega=matrix(c(2,2,2,4),2,2), alpha=c(3,-5), tau=-1)esn2 <- makeSECdistr(dp=dp2, family="ESN", name="ESN-2d")summary(esn2)

Classessummary.SECdistrMv andsummary.SECdistrUv

Description

Summaries of objects of classesSECdistrMv andSECdistrUv

Objects from the Class

Objects can be created by calls of typesummary(object) whenobject is of class either"SECdistrMv" or"SECdistrUv".

Slots

family:

A character string which representsthe parametric family ofSEC type

dp:

Object of class"list" or"vector" for"SECdistrMv" and"SECdistrUv", respectively

name:

Object of class"character" with the name of distribution

compNames:

For"SECdistrMv" objects, a character vector with names of the components of the multivariate distribution

cp:

Object of class"list" or"vector" for"SECdistrMv" and"SECdistrUv", respectively

cp.type:

a character string of theCP version

aux:

A list of auxiliary quantities

Methods

show

signature(object = "summary.SECdistrMv"): ...

show

signature(object = "summary.SECdistrUv"): ...

Author(s)

Adelchi Azzalini

See Also

summary.SECdistrMv,summary.SECdistrUv,

makeSECdistr,dp2cp


Summary of aSUN distribution object

Description

Produce a summary of an object of class"SUNdistr"

Usage

## S4 method for signature 'SUNdistr'summary(object, ...)

Arguments

object

an object of class"SUNdistr".

...

optional arguments passed tomom.mtruncnorm for theregulation of its working.qq

Value

An S4-object with the following slots:

dp

the parameters of the distrbution, a list

name

the name of the distribution, a character string

compNames

the names of the components, a character vector

HcompNames

the names of the hidden components, a character vector

mean

the mean value, a vector

var.cov

the variance-covariance matrix

gamma1

the marginal indices of asymmetry, a vector

cum3

the third order cumulants, a three-dimensional array

mardia

the Mardia's measures of multivariate asymmetry andskewness, a vector of length two

Author(s)

Adelchi Azzalini

References

Arellano-Valle, R. B. and Azzalini, A. (2021).Some properties of the unified skew-normal distribution.Statistical Papers,doi:10.1007/s00362-021-01235-2andarXiv:2011.06316

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

makeSUNdistr for building aSUN distribution object

methodsmean andvcov for computing the mean vector and the variance matrix ofSUNdistr-class objects

Examples

Omega <- matrix(c(5, 1, 1, 6), 2, 2)Delta <- matrix(c(0.30, 0.50, 0.50, 0.85), 2, 2, byrow=TRUE)Gamma <- matrix(c(1, 0.18, 0.18, 1), 2, 2)tau <- c(0.4, -0.8)dp2 <- list(x=c(1, 0), Omega=Omega, Delta=Delta, tau=tau, Gamma=Gamma)sun2 <-  makeSUNdistr(dp=dp2, name="SUN2", compNames=c("u", "v"))s <- summary(sun2)

Classsummary.SUNdistr

Description

Summaries of objects of classesSUNdistr

Objects from the Class

Objects can be created by calls of typesummary(object) whenobject is of class"SUNdistr".

Slots

dp:

a list of parameters

name

the name of the distribution, a character string

compNames

the names of the components, a character vector

HcompNames

the names of the hidden components, a character vector

mean

the mean value, a vector

var.cov

the variance-covariance matrix

gamma1

the marginal indices of asymmetry, a vector

cum3

the third order cumulants, a three-dimensional array

mardia

the Mardia's measures of multivariate asymmetry andskewness, a vector of length two

Methods

show

signature(object = "summary.SUNdistr"): ...

Author(s)

Adelchi Azzalini

See Also

summary.SUNdistr,makeSUNdistr


Summarizingselm fits

Description

summary method for class"selm" and"mselm".

Usage

## S4 method for signature 'selm'summary(object, param.type = "CP", cov = FALSE, cor = FALSE)## S4 method for signature 'mselm'summary(object, param.type = "CP", cov = FALSE, cor = FALSE)

Arguments

object

an object of class"selm" or"mselm" as createdby a call to functionselm.

param.type

a character string which indicates the required type of parameter type; possible values are"CP" (default),"DP","pseudo-CP" and their equivalent lower-case expressions.

cov

a logical value, to indicate if an estimate of the variance and covariance matrix of the estimates is required (default:FALSE).

cor

a logical value, to indicate if an estimate of the correlation matrix of the estimates is required (default:FALSE).

Value

An S4 object of classsummary.selm with 12 slots.

call:

the calling statement.

family:

the parametric family of skew-ellitically contoured distributed (SEC) type.

logL:

the maximized log-likelihood or penalized log-likelihood value

method:

estimation method ("MLE" or"MPLE")

param.type:

a characer string with the chosen parameter set.

param.table:

table of parameters, std.errors and z-values

fixed.param:

a list of fixed parameter values

resid:

residual values

control:

a list with control parameters

aux:

a list of auxiliary quantities

size:

a numeric vector with various lengths and dimensions

boundary:

a logical value which indicates whether the estimates are on the boundary of the parameter space

Note

There are two reasons why the default choice ofparam.type isCP. One is the the easier interpretation of cumulant-based quantitiessuch as mean value, standard deviation, coefficient of skewness.

The other reason is more technical and applies only to cases when the estimate of the slant parameteralpha of theSN distribution is close to the origin: standard asymptotic distribution theory of maximum likelihood estimates (MLE's) does not apply in this case and thecorresponding standard errors are not trustworthy. The problem is especialy severe at\alpha=0 but to some extent propagates to its vicinity. Ifd=1, adoption ofCP leads to MLE's with regular asymptotic distribution across the parameter space, including\alpha=0. Ford>1 and\alpha=0, the problem is still unsolved at the present time, which is the reason whyselm issues a warningmessage when the MLE is in the vicinity of\alpha=0; see ‘Details’ ofselm. For background information, see Sections 3.1.4–6 and 5.2.3 of Azzalini and Capitanio (2014) and references therein.

This problem does not occur with the theSC and theSTdistribution (unless its tail-weight parameternu diverges, that is, when we are effectively approaching theSN case).

Author(s)

Adelchi Azzalini

References

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

See Also

selm function,selm (andmselm) class,plot.selm,dp2cp

Examples

 data(wines, package="sn")m5 <- selm(acidity ~ phenols + wine, family="SN", data=wines)summary(m5)summary(m5, "dp")s5 <- summary(m5, "dp", cor=TRUE, cov=TRUE)dp.cor <- slot(s5, "aux")$param.corcov2cor(vcov(m5, "dp")) # the same## m6 <- selm(acidity ~ phenols + wine, family="ST", data=wines) # boundary!?#m12 <- selm(cbind(acidity, alcohol) ~ phenols + wine,  family="SN", data=wines)s12 <- summary(m12)coef(m12, 'dp')coef(m12, "dp", vector=FALSE)## see other examples at function selm

Symmetry-modulated distributions

Description

Symmetry-modulated distributions, univariate and multivariate,AKA skew-symmetric distributions

Usage

dSymmModulated(x, xi=0, omega=1, f0, G0, w, par.f0, par.G0, odd="check",   log=FALSE, ...)rSymmModulated(n=1, xi=0, omega=1, f0, G0, w, par.f0, par.G0, odd="check", ...) dmSymmModulated(x, xi, Omega, f0, G0, w, par.f0, par.G0, odd="check",   log=FALSE, ...) rmSymmModulated(n=1, xi, Omega, f0, G0, w, par.f0, par.G0, odd="check", ...)plot2D.SymmModulated(range, npt=rep(101,2), xi=c(0,0), Omega, f0, G0, w,   par.f0, par.G0, odd="check", ...)

Arguments

x

a vector of coordinates where the density must be evaluated; for multivariate densities, evaluated bydmSymmModulated, a matrix is also allowed, each row representing a point.

xi

a numeric vector representing the location parameter; if must have length 1 fordSymmModulated andrSymmModulated,length 2 forplot2D.SymmModulated).

omega

a positive value representing the scale parameter.

f0

a character string denoting the symmetric density to be modulated;admissible values fordSymmModulated anddSymmModulated are"beta","cauchy","logistic" (or"logis"),"normal" (or"norm"),"t","uniform"; for the other functions the possible values are"cauchy","normal" (or"norm"),"t"; the meaning of the names is described in the ‘Details’ section.

G0

a character string denoting the symmetric distribution used in themodulating factor; admissible values are"beta","cauchy","logistic" (or"logis"),"normal" (or"norm"),"t","uniform", with meaning described in the‘Details’ section.

w

the name (not as a character string) of a user-defined function which satisfies the conditionw(-z)=-w(z) for allz;see the ‘Details’ section for additional specifications.

par.f0,par.G0

parameters required byf0 andG0, when they are of type"beta" or"t", otherwise ignored.

odd

a character string, with possible values"check" (default),"assume", "force", for regulation of the behaviour about the conditionthatw is an odd function, as explained in the ‘Details’section.

log

logical (default:FALSE); ifTRUE, densities are given as log(densities).

n

an integer value (default:n=1) indicating the number of random numbers.

Omega

a symmetric positive-definite matrix which regulates the dependence structure off0 and so of the final density.

range

a two-column matrix whose column-wise range is taken as theplotting intervals on the coordinated axes forming a bivariate grid ofpoints over which the density is plotted.

npt

a numeric vector with two elements representing the number of equally-spaced points on each axis spanning therange described above; default value isrep(101,2).

...

optional parameters regulating the functionw and, forplot2D.SymmModulated only, graphical parameters to be supplied tofunctioncontour.

Value

FordSymmModulated,rSymmModulated anddmSymmModulated, a numeric vector; fordmSymmModulated a matrix, unlessn=1.

Forplot2D.SymmModulated an invisible list containing thexandy coordinates forming the grid over which the densitypdfhas been evaluated for plotting.

Background

In the univariate case, start from symmetric density functionf_0, such thatf_0(z)=f_0(-z) for allz, and ‘modulate’ it in the form

f(z) = 2\, f_0(z)\, G_0\{w(z)\}

whereG_0 is a univariate symmetric (about 0) distribution function andw(z)is a real-valued odd function, hence satisfying the conditionw(-z)=-w(z);then $f(z)$ is a proper density function wich integrates to 1. A subsequent location and scale transformation applied tof(z) delivers the final density. Specifically, ifZ denotes a univariate random variable with densityf(z), then the computed density pertains to the transformed variable

\xi + \omega Z.

In the multivariate case, the scheme is similar, with natural adaptation.Densityf_0 is nowd-dimensional, whileG_0 is still univariate. The conditionsf_0(z)=f_0(-z) andw(-z)=-w(z) refer to ad-dimensional vectorz. Given ad \times d symmetric positive-definite matrix\Omega, we extract the the square roots\omega of the diagonal element of\Omega and correspondingly obtain the scale-free matrix

\bar\Omega = \mathrm{diag}(\omega)^{-1}\, \Omega\, \mathrm{diag}(\omega)^{-1}

which is used to regulate the dependence structure off_0(z) and so off(z). IfZ is multivariate random variable with densityf(z), then the final distribution refers to

\xi + \mathrm{diag}(\omega)\,Z

where\xi is ad-dimensional vector of location parametes.

This construction was put forward by Azzalini and Capitanio (2003). An essentially equivalent formulation has been presented by Wang et al. (2004).A summary account is available in Section 1.2 of Azzalini and Capitanio(2014); this includes, inter alia, an explanation of why the term‘symmetry-modulated’ distributions is preferred to ‘skew-symmetric’ distributions.

Random number generation is based on expression (1.11a) ofAzzalini and Capitanio (2014).

Details

FunctionsdSymmModulated andrSymmModulated deal with univariate distributions, for computing densities and generating random rumbers, respectively;dmSymmModulated andrmSymmModulated act similarly for multivariate distributions. For the bivariate case only,plot2D.SymmModulated computes a density over a grid ofcoordinates and produces acontour plot.

The distribution names used inf0 andG0 have, in the univariatecase, the same meaning as described in theDistributionspage, with the following exceptions, to achive symmetry about 0:"uniform" denotes a uniform distribution over the interval(-1, 1);"beta" denotes the a symmetric Beta distribution withsupport over the interval(-1, 1) and a common value of the shape parameters.

In the multivariate case, the available options"normal" and"t"forf0 refer to densities computed bydmnorm anddmt with 0 location and correlation matrix\bar\Omega, implied by\Omega. ArgumentG0 has the same meaning as in the univariate case.

Options"beta" and"t" forf0 andG0 require thespecification of a shape parameter, via the argumentspar.f0 andpar.G0, respectively. For"beta" the parameter representsthe common value of the shape parameters ofBeta;for"t", it representsdf ofTDist anddmt.

Functionw most be of the formw <- function(z, ...) where... are optional additional parameters andz represents valuedof the standardized form of the density; in the univariate case,x andz are related byz=(x-xi)/omega and an analogous fact holds inthe multivariate setting. The function must satisfy the conditionw(-z)=-w(z). It is assumed that the function is vectorized and, in themultivariate case, it will be called withz representing a matrix withd columns, ifd denotes the dimensionality of the randomvariable.

Argumentodd regulates the behaviour with respect to the conditionw(-z)=-w(z). If its value is"assume", the condition is justassumed to hold, and no action is taken. If the value is"check"(deafult), alimited check is performed; namely, in case of densities,the check is at 0 and the suppliedx points, while for random numbers the check is at 0 and the generated points.The value"force" ensures that the condition is satisfied by actually constructing a modified version of the user-supplied functionw, such that the required condition is enforced.

Author(s)

Adelchi Azzalini

References

Arellano-Valle, R. B., Gómez, H. W. and Quintana, F. A. (2004).A new class of skew-normal distributions.Comm. Stat., Theory &Methods,58, 111-121.

Azzalini, A. and Capitanio, A. (2003).Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution.J.Roy. Statist. Soc. B65, 367–389.Full version of the paper athttps://arXiv.org/abs/0911.2342

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Wang, J., Boyer, J. and Genton, M. G. (2004).A skew-symmetric representation of multivariate distributions.Statistica Sinica,14, 1259-1270.

See Also

Distributions,Beta,TDist,dmnorm,dmt,contour

Examples

x <- seq(2, 13, length=45)wLinear <- function(z, lambda) lambda*zy <- dSymmModulated(x, 5, 2, f0="normal", G0="normal", w=wLinear, lambda=3)# the same of dsn(x, 5, 2, 3), up to negligible numerical differences#wSGN <- function(z, lambda) z*lambda[1]/sqrt(1 + lambda[2]*z^2) y <- dSymmModulated(x, 5, 2, f0="normal", G0="normal", w=wSGN, lambda=c(3,5))# SGN distribution of Arellano-Valle et al. (2004)#wST <- function(z, lambda, nu) lambda*z*sqrt((nu+1)/(nu+z^2))y <- rSymmModulated(n=100, 5, 2, f0="t", G0="t", w=wST, par.f0=8, par.G0=9,         lambda=3, nu=8)# equivalent to rst(n=100, 5, 2, 3, 8) #wTrigs <- function(z, p, q) sin(z * p)/(1 + cos(z * q))x <- seq(-1, 1, length=51)y <- dSymmModulated(x, 0, 1, f0="beta", G0="logistic", w=wTrigs, par.f0=2,        par.G0=NULL, p=5, q=0.5)   plot(x, y, type="l")        # univariate analogue of the bivariate distribution on pp.372-3 of# Azzalini & Capitanio (2003)           #range <- cbind(c(-3,3), c(-3,3))wMvTrigs <- function(z, p, q) sin(z %*% p)/(1 + cos(z %*% q))plot2D.SymmModulated(range, xi=c(0,0), Omega=diag(2), f0="normal", G0="normal",    w=wMvTrigs, par.f0=NULL, par.G0=NULL, p=c(2,3), q=c(1,1), col=4)# w(.) as in (1.6) of Azzalini & Capitanio (2014, p.4) and plot as in # bottom-right panel of their Figure 1.1.

Piedmont wines data

Description

Data refer to chemical properties of 178 specimens of three types of wine produced in the Piedmont region of Italy.

Usage

data(wines)

Format

A data frame with 178 observations on the following 28 variables.

wine wine name (categorical, levels:Barbera,Barolo,Grignolino)
alcohol alcohol percentage (numeric)
sugar sugar-free extract (numeric)
acidity fixed acidity (numeric)
tartaric tartaric acid (numeric)
malic malic acid (numeric)
uronic uronic acids (numeric)
pH pH (numeric)
ash ash (numeric)
alcal_ash alcalinity of ash (numeric)
potassium potassium (numeric)
calcium calcium (numeric)
magnesium magnesium (numeric)
phosphate phosphate (numeric)
cloride chloride (numeric)
phenols total phenols (numeric)
flavanoids flavanoids (numeric)
nonflavanoids nonflavanoid phenols (numeric)
proanthocyanins proanthocyanins (numeric)
colour colour intensity (numeric)
hue hue (numeric)
OD_dwOD_{280}/OD_{315} of diluted wines (numeric)
OD_flOD_{280}/OD_{315} of flavanoids (numeric)
glycerol glycerol (numeric)
butanediol 2,3-butanediol (numeric)
nitrogen total nitrogen (numeric)
proline proline (numeric)
methanol methanol (numeric)

Details

The data represent 27 chemical measurements on each of 178 wine specimens belonging to three types of wine produced in the Piedmont region of Italy. The data have been presented and examined by Forinaet al. (1986) and were freely accessible from thePARVUS web-site until it was active.These data or, more often, a subset of them are now available from variousplaces, including someR packages. The present dataset includes all variables available on thePARVUS repository, which arethe variables listed by Forinaet al. (1986) with the exception of ‘Sulphate’. Moreover, it reveals the undocumented fact thatthe original dataset appears to include also the vintage year;see the final portion of the ‘Examples’ below.

Source

Forina, M., Lanteri, S. Armanino, C., Casolino, C., Casale, M. and Oliveri, P.V-PARVUS 2008: an extendible package of programs for esplorative data analysis, classification and regression analysis. Dip. Chimica e Tecnologie Farmaceutiche ed Alimentari, Università di Genova, Italia. Web-site (not accessible as of 2014): ‘⁠http://www.parvus.unige.it⁠

References

Forina M., Armanino C., Castino M. and Ubigli M. (1986). Multivariate data analysis as a discriminating method of the origin of wines.Vitis25, 189–201.

Examples

data(wines)pairs(wines[,c(2,3,16:18)], col=as.numeric(wines$wine))#code <- substr(rownames(wines), 1, 3)table(wines$wine, code)#year <- as.numeric(substr(rownames(wines), 6, 7))table(wines$wine, year)# coincides with Table 1(a) of Forina et al. (1986)

Function\log(2\,\Phi(x)) and its derivatives

Description

The functionlog(2*pnorm(x)) and its derivatives, including inverse Mills ratio.

Usage

zeta(k, x)

Arguments

k

an integer number between 0 and 5.

x

a numeric vector. Missing values (NAs) andInfs are allowed.

Details

Fork between 0 and 5, the derivative of orderk of\log(2\,\Phi(x)) is evaluated, where\Phi(x) denotes theN(0,1) cumulative distribution function.The derivative of orderk=0 refers to the function itself.Ifk is not an integer within0,..., 5,NULL is returned.

Value

a vector representing thek-th order derivative evaluated atx.

Background

The computation fork>1 is reduced to the casek=1, making useof expressions given by Azzalini and Capitanio (1999); see especially Section 4 of the full-length version of the paper. The main facts are summarized in Section 2.1.4 of Azzalini and Capitanio (2014).

For numerical stability, the evaluation ofzeta(1,x) whenx < -50 makes use of the asymptotic expansion (26.2.13) of Abramowitz and Stegun (1964).

zeta(1,-x) equalsdnorm(x)/pnorm(-x) (in principle, apart fromthe above-mentioned asymptotic expansion), called theinverse Mills ratio.

References

Abramowitz, M. and Stegun, I. A., editors (1964).Handbook of Mathematical Functions. Dover Publications.

Azzalini, A. and Capitanio, A. (1999).Statistical applications of the multivariate skew normal distribution.J.Roy.Statist.Soc. B61, 579–602. Full-length version available athttps://arXiv.org/abs/0911.2093

Azzalini, A. with the collaboration of Capitanio, A. (2014).The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.

Examples

y <- zeta(2,seq(-20,20,by=0.5))#for(k in 0:5) curve(zeta(k,x), from=-1.5, to=5, col = k+2, add = k > 0)legend(3.5, -0.5, legend=as.character(0:5), col=2:7, lty=1)

[8]ページ先頭

©2009-2025 Movatter.jp